Digitizing records

As part of the COADS project, a major effort has been underway since 1989 at the National Climatic Data Center (NCDC) to digitize U.S.-recruited merchant marine weather observations taken during World Wars I and II and thereby to help fill significant data voids (Fig. 1). The original plan was later expanded to include all available merchant marine data held in the U.S. archives but not yet in digital form (approximately three million observations) for the period 1912 to 1946. However, the original estimate provided by NCDC of approximately 18 million undigitized observations taken during 1913-1919 and 1937-1948 (Woodruff et al., 1987) now appears much too high. After actually retrieving the records from the archives, NCDC discovered that the World War II merchant marine records were missing and that a significant number of the Navy records were already digitized and available in COADS. Attempts to locate marine weather records for these periods in the world's other national archives have proven generally unsuccessful, as the World Wars disrupted routine observing practices and normal government functions such as data archiving.

Only recently, for example, did we discover that the U.S. World War II merchant marine records, as distinct from similar records for other periods, were under the jurisdiction of the U.S. Maritime Commission and War Shipping Administration (later the Maritime Administration). In 1974 the Maritime Administration decided to "destroy immediately" all the deck logbooks for the period 1940 through 1947. The logbooks were all archived at the New York and San Francisco Federal Archives and Records Centers, and the request for destruction was approved by the Archivist of the United States. Although the actual destruction had not been unequivocally verified at the time we prepared this article, it is highly probable that the order was carried out because of the large volume of space these records occupied. The justification attached to the disposition form stated: "The Maritime Deck Department Log Books for the WW II period have little if any research value." Unfortunately, these data would today be of significant importance to a number of research projects, including the quest to establish the validity of global warming.

At the outset of the project to digitize the World War I and II data, a digitizing strategy was developed that is proving to be generally applicable to keying historical ship logbook data. The goal was to key as much as possible of the data and metadata (information about the data) contained in the logbooks, even if they were not of immediate use in the COADS project. This would maximize information preservation as the paper forms deteriorate. These forms should still be considered for microfilming, however, because not all information is digitized (e.g., remarks in the daily journal, gale and storm reports, fog reports, and the abstract storm log) and because of the possibility of digitizing errors. Moreover, maximizing information preservation in one digitizing pass should prove more cost effective than handling the paper forms multiple times.

Specifically, two types of records were designed for keying. First, since each logbook form generally contains header information pertaining to an individual ship voyage, a single "voyage header record" was constructed to contain information such as the ship's name, captain, departure and destination, and observational metadata such as the method of observing sea surface temperature and any barometer correction. Second, "data records" were constructed to contain the actual observational data at each time and position. Each header record was assigned a unique voyage number, and the voyage number was also entered into each data record, so that the data records could be linked with the appropriate header during later data processing. This record management strategy minimizes keying, because the voyage header record is keyed only once, and it allows future "track checking" of the data records that compose a voyage.
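
The two-record design described above can be sketched in a few lines of Python. This is only an illustration of the linking idea, not the format actually keyed at NCDC: the class names, field names, and the link_records helper are all hypothetical, and timestamps are assumed to be ISO-style strings so that they sort chronologically.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class VoyageHeader:
    """Keyed once per logbook form (hypothetical field names)."""
    voyage_number: int
    ship_name: str
    captain: str
    departure: str
    destination: str
    sst_method: Optional[str] = None              # how sea surface temperature was observed
    barometer_correction: Optional[float] = None  # correction noted on the form, if any


@dataclass
class DataRecord:
    """One observation; carries the voyage number of its header."""
    voyage_number: int
    timestamp: str        # assumed ISO-style string, e.g. "1918-03-01T12:00"
    latitude: float
    longitude: float
    air_temperature: Optional[float] = None


def link_records(headers, records):
    """Group data records under their header by voyage number.

    Returns (voyages, orphans). voyages maps each voyage number to a
    (header, records sorted by time) pair; orphans lists data records
    whose voyage number matches no header.
    """
    header_index = {h.voyage_number: h for h in headers}
    by_voyage = defaultdict(list)
    for rec in records:
        by_voyage[rec.voyage_number].append(rec)

    voyages = {}
    for number, header in header_index.items():
        ordered = sorted(by_voyage.get(number, []), key=lambda r: r.timestamp)
        voyages[number] = (header, ordered)

    orphans = [r for r in records if r.voyage_number not in header_index]
    return voyages, orphans
```

Because each voyage's records come back in time order, a later track check could walk consecutive positions within a voyage; orphaned records point to a mis-keyed voyage number or a missing header.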

In order to accommodate the wide variety of original form types, a multitude of formats had to be devised. It will require a large software development effort to convert these digitized data to a common format for inclusion in COADS. It should also be noted that it is quite labor intensive to prepare the original forms for digitizing and to ensure the accuracy of keying through a stringent quality control process.

Some additional unanticipated difficulties arose in development of the digitizing procedures. The digitizing for this project began on a climatological data management system known as CLICOM. This system was developed for XT-class personal computers, and requires that the header information and data be kept in separate files. This file management requirement and the inability of the CLICOM system to be expanded to increase production led to a conversion to the operational system used at NCDC for its routine keying operations. A separate system was developed to manage and quality-control the data. NCDC's operational keying system also allowed the voyage header and data records to be keyed consecutively and maintained in the same file, thus simplifying the file management aspects of the process. Each file is processed through a quality control program that flags outliers and invalid codes so that personnel with the original observational forms can determine whether the information was keyed incorrectly or can be corrected using additional information on the form. It is often possible to correct columns that were transposed; miscalculated dates, times, and locations; incorrect barometer adjustments; mislabeled temperature scales; and other elements that were miscoded or miskeyed.
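
As a rough illustration of the kind of flagging described above, the following Python sketch marks out-of-range values, invalid codes, and statistical outliers for review against the paper form. It is not NCDC's quality control program: the range limits, the code table, the field names, and the use of a median-based (modified z-score) outlier test are all assumptions made for the example.

```python
import statistics

# Hypothetical limits and code table, chosen only for illustration.
VALID_RANGES = {
    "latitude": (-90.0, 90.0),
    "longitude": (-180.0, 180.0),
    "pressure_hpa": (870.0, 1085.0),
}
VALID_TEMPERATURE_SCALES = {"C", "F"}


def flag_record(record):
    """Return a list of flag strings for one keyed record (a dict).

    Missing fields are skipped; nothing is corrected automatically,
    since the decision belongs to someone holding the original form.
    """
    flags = []
    for field, (low, high) in VALID_RANGES.items():
        value = record.get(field)
        if value is not None and not (low <= value <= high):
            flags.append(f"{field}: out of range")
    scale = record.get("temperature_scale")
    if scale is not None and scale not in VALID_TEMPERATURE_SCALES:
        flags.append("temperature_scale: invalid code")
    return flags


def flag_outliers(values, threshold=3.5):
    """Return indices of values far from the median of the series.

    Uses the modified z-score based on the median absolute deviation,
    which is less distorted by the bad values than a mean-based test.
    """
    median = statistics.median(values)
    deviations = [abs(v - median) for v in values]
    mad = statistics.median(deviations)
    if mad == 0:
        return []  # no spread to measure against; flag nothing
    return [i for i, d in enumerate(deviations) if 0.6745 * d / mad > threshold]
```

A pressure keyed as 912.0 among values near 1012 passes the range test but is caught by flag_outliers, which is the sort of single-digit keying slip that can then be checked against the form.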

A companion project began in 1992 at NCDC to digitize data collected at manned stations on ice floes ("ice islands") and on ships over-wintering in the ice pack in the Arctic Ocean, dating back to 1893 (Fig. 1). The project is part of NCDC's contribution to COADS and is carried out in cooperation with the World Data Center-A Glaciology/National Snow and Ice Data Center (WDC-A/NSIDC) and the Polar Science Center (PSC), University of Washington. Because of their research interests, PSC and WDC-A/NSIDC provided data that they had collected for the missing periods and assisted in establishing the keying priorities.

Funding to start the keying was provided by the National Geophysical Data Center (NGDC). WDC-A/NSIDC is operated for NGDC by the University of Colorado as part of a cooperative agreement between NOAA/ERL and the University. Table 1 illustrates the available period of record of ice island data digitized or retrieved from several poorly documented digital data files under this initiative. Some data records are still missing.

Most of the ice island records that have been keyed to date came from T-3, often referred to as "Fletcher's Ice Island" after its discoverer, Joseph O. Fletcher, who was also instrumental in launching the COADS project a decade ago. It is appropriate that we will finally be able to add the highly valuable climate data from T-3 to COADS. Figure 2 illustrates recorded positions of T-3 based on the meteorological reports.

Other sources of unique data are being provided directly by various international organizations and governments (see Fig. 1). It is hoped that most of these sets can be completed in time for COADS Release 2 around the mid-1990s. Several nations are contributing to this effort:

  • The Arkeologisk Museum in Stavanger, Norway, has obtained a grant to key over 600 late nineteenth century (1867-1890) Norwegian logbooks (approximately 500,000 records) in cooperation with the COADS project, which provided keying instructions.

  • Germany is keying 30,000 observations for the period 1887-1890.

  • The Russian Federation provided approximately 3500 observations recently digitized from the Russian ship Vitiaz and other ships for the period 1804-1891 that appeared in the book by S.O. Makarov (1894). Figure 3 presents the geographical distribution of the observations.

  • Negotiations led to an agreement between NCDC and the Chinese National Oceanographic Data Center to establish a cooperative keying project to digitize approximately 1 million ship reports in the Maury Collection (primarily between 1820 and 1860).

In addition, other important historical data sets remain undigitized and need to be considered for digitization as time and resources permit. These include:

  • Japanese Kobe Collection. In the early 1960s Japan provided 623 rolls of microfilm from the Kobe Observatory containing merchant marine observations from the period 1890 to 1932 that have not been digitized, as well as those observations previously digitized from 1933 to 1961 (in COADS as decks 118 & 119). The microfilm also contains Japanese Navy observations from 1903 through 1944, although few observations are available past 1941. The total amount of undigitized data is estimated at between five and six million reports (Uwai et al., 1992; Elms, 1992).

  • Other undigitized U.S. ship logbooks. In addition to the Maury Collection (1796-1900), at least one other set of nineteenth century merchant marine logbooks housed in the U.S. National Archives needs to be considered for digitizing. Also, the Archives may possess undigitized U.S. Navy data.

  • East India Company logbooks (located in the India Office Collection of the British Museum). The East India Company operated uninterrupted from 1599 to 1834, with its ships collecting a "wealth of information about the wind and weather" (Smith, 1925).

There is no doubt that other valuable marine data sets not listed here remain undigitized. The COADS project welcomes all participants who have the resources to provide additional data to join the effort to produce a more complete data set for use by the scientific community.


U.S. National Oceanic and Atmospheric Administration hosts the icoads website
Document maintained by icoads@noaa.gov
Updated: Nov 8, 2005 23:57:45 UTC