===============================================================================
International Comprehensive Ocean-Atmosphere Data Set (ICOADS): Release 2.1
Digitizing Historical Records for COADS (Earth Sys. Mon. article)   27 Feb 2004
===============================================================================

This document is based on the article "Digitizing Historical Records for the Comprehensive Ocean-Atmosphere Data Set (COADS)--A search and rescue mission for marine weather observations" appearing in the Earth System Monitor, Vol. 4, No. 2, December 1993, with minor changes to conform to the e-doc standard format and some minor text corrections. One note was added to the text, and a correction noted in Table 1. Figures corresponding to the figure captions appearing in this text are found in a compressed postscript file.

Document Revision Information (previous version: 9 September 2002): Updates for Release 2.1 and ICOADS.
-------------------------------------------------------------------------------

Joe D. Elms, National Climatic Data Center, NOAA/NESDIS; Scott D. Woodruff, Climate Diagnostics Center, NOAA/ERL; Steven J. Worley, National Center for Atmospheric Research; and Claire S. Hanson, WDC-A Glaciology/National Snow and Ice Data Center

NOAA/NESDIS/NCDC E/CC22, Federal Building, Asheville, NC 28801-2733; NOAA/ERL R/E/CD, 325 Broadway, Boulder, CO 80303-3328; NCAR, P.O. Box 3000, Boulder, CO 80307-3000; NSIDC, University of Colorado, Boulder, CO 80309-0449.

Digital marine weather records taken over the global ocean, based primarily on merchant ship observations, date back to 1854. These records and related statistics are available in the Comprehensive Ocean-Atmosphere Data Set (COADS).
The cooperative effort that produced this widely used resource for climate research is a continuing project that started with COADS Release 1 (Slutz et al., 1985; Woodruff et al., 1987), which covers the period 1854-1979, and the recently completed COADS Release 1a (Woodruff et al., 1993), which extends the Release 1 products through 1992. There are plans to update the record following World War II (COADS Release 1b) over the next year, with the scope of this update guided partially by the needs of the global atmosphere re-analysis project (Jenne, 1992), and later to update the entire period of record as the goal for Release 2. These efforts are intended to improve coverage during data sparse periods since 1854, extend the period of record back in time prior to 1854, and correct known problems where possible. The work being accomplished today under the COADS project would not have been possible without the insight and devotion of many earlier mariners and researchers. Among the most important of these are Admiral Francis Beaufort who in 1805--as a Lieutenant in the British Navy--promoted the wind speed scale that now bears his name, and Matthew F. Maury, Superintendent of the U.S. Naval Depot of Charts and Instruments from 1842 to 1861, who helped organize international efforts to systematically collect marine data. Data collected by early mariners proved crucial to improving sailing times based on the knowledge of average weather and current conditions such as provided by Pilot Charts first developed by Maury. Unfortunately, large quantities of the earliest records as well as numerous records from more recent periods such as the two World Wars, were never saved in digital form. Approximately one million U.S. Merchant Marine and Ocean Station Vessel (1940-1945) observations for the period 1938-1948 were punched on cards from WB (Weather Bureau) Form 1210A-Marine and archived as Deck 115. 
The cards were discarded in November 1960 because:
-- punched cards were not then a recognized archive medium,
-- an estimated 40% of the records contained errors (and adequate technology to perhaps correct these errors was not available), and
-- the Ocean Station Vessel observations (a large portion of the deck) were maintained elsewhere.

All of the U.S.-recruited merchant marine logbook data prior to 1949, including Maury's own collection of logbooks, have not been digitized. Moreover, in many cases surviving original logbooks are in deteriorating condition.

This paper discusses efforts by the COADS project, with the vigorous cooperation of a number of other countries in the international community, to improve data coverage and quality by locating and digitizing as many as possible of these records.

{Digitizing records}

As part of the COADS project, a major effort has been underway since 1989 at the National Climatic Data Center (NCDC) to digitize U.S.-recruited merchant marine weather observations taken during World Wars I and II and thereby to help fill significant data voids (Fig. 1). The original plan was later expanded to include all available merchant marine data (approximately three million observations) held in the U.S. archives, not in digital form, for the period 1912 to 1946. However, the original estimate provided by NCDC of approximately 18 million undigitized observations taken during 1913-1919 and 1937-1948 (Woodruff et al., 1987) now appears much too high. After actually retrieving the records from the archives, NCDC discovered that the World War II merchant marine records were missing and that a significant number of the Navy records were already digitized and available in COADS. Locating marine weather records for these periods in any of the world's other national archives has proven generally unsuccessful, as the World Wars disrupted routine observing practices and normal government functions such as data archiving.
Only recently, for example, did we discover that the U.S. World War II merchant marine records, as distinct from similar records for other periods, were under the jurisdiction of the U.S. Maritime Commission and War Shipping Administration (later the Maritime Administration). In 1974 the Maritime Administration made the determination to "destroy immediately" all the deck logbooks for the period 1940 through 1947. These were all archived at the New York and San Francisco Federal Archives and Records Centers with the request for destruction approved by the Archivist of the United States. Although the actual destruction had not been unequivocally verified at the time we prepared this article, it is highly probable that the order was carried out because of the large volume of space these records occupied. The justification attached to the disposition form stated: "The Maritime Deck Department Log Books for the WW II period have little if any research value." Unfortunately, it is felt today that these data would have been of significant importance to a number of research projects, including the quest to establish the validity of global warming. At the outset of the project to digitize the World War I and II data, a strategy was developed for digitizing the data, which is proving to be generally applicable for keying historical ship logbook data. The goal was to key as much as possible of the data and metadata (information about the data) contained in the logbooks, even if they were not of immediate use in the COADS project. This would maximize information preservation as the paper forms deteriorate. These forms should still be considered for microfilming, however, because not all information is digitized (e.g., remarks in the daily journal, gale and storm reports, fog reports, and abstract storm log) and because of the possibility of digitizing errors. 
Moreover, maximizing information preservation in one digitizing pass should prove more cost effective than handling the paper forms multiple times. Specifically, two types of records were designed for keying. First, since each logbook form generally contains header information pertaining to an individual ship voyage, a single "voyage header record" was constructed to contain information such as the ship's name, captain, departure and destination, and observational metadata such as the method of observing sea surface temperature and any barometer correction. Second, "data records" were constructed to contain the actual observational data at each time and position. Each header record was assigned a unique voyage number, and the voyage number was also entered into each data record, so that the data records could be linked with the appropriate header during later data processing. This record management strategy minimizes keying, because the voyage header record is keyed only once and allows future "track checking" of the data records that compose a voyage. In order to accommodate the wide variety of original form types, a multitude of formats had to be devised. It will require a large software development effort to convert these digitized data to a common format for inclusion in COADS. It should also be noted that it is quite labor intensive to prepare the original forms for digitizing and to ensure the accuracy of keying through a stringent quality control process. Some additional unanticipated difficulties arose in development of the digitizing procedures. The digitizing for this project began on a climatological data management system known as CLICOM. This system was developed for XT-class personal computers, and requires that the header information and data be kept in separate files. 
This file management requirement and the inability of the CLICOM system to be expanded to increase production led to a conversion to the operational system used at NCDC for its routine keying operations. A separate system was developed to manage and quality control the data. NCDC's operational keying system also allowed for the voyage header and data records to be keyed consecutively and maintained in the same file thus simplifying the file management aspects of the process. Each file is processed through a quality control program that flags outliers and invalid codes so personnel with the original observational forms can determine whether the information was either keyed incorrectly or can be corrected using additional information on the form. It is often possible to correct columns that were transposed; miscalculated dates, times, and locations; incorrect barometer adjustments; mislabeled temperature scales; and other elements that were miscoded or miskeyed. A companion project to digitize data collected at manned stations on ice floes ("ice islands") and ships over-wintering in the ice pack in the Arctic Ocean, dating back to 1893 (Fig. 1), began in 1992 at NCDC as part of its contribution to COADS, but also in cooperation with the World Data Center-A Glaciology/ National Snow and Ice Data Center (WDC-A/NSIDC), and the Polar Science Center (PSC), University of Washington. Because of their research interests, PSC and WDC-A/NSIDC provided data that they had collected for the missing periods and assisted in establishing the keying priorities. Funding to start the keying was provided by the National Geophysical Data Center (NGDC). WDC-A/NSIDC is operated for NGDC by the University of Colorado as part of a cooperative agreement between NOAA/ERL and the University. Table 1 illustrates the available period of record of ice island data digitized or retrieved from several poorly documented digital data files under this initiative. Some data records are still missing. 
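The voyage header and data record scheme described above, together with the quality control pass that flags outliers and invalid codes, can be sketched as follows. This is only an illustrative sketch, not the NCDC keying or quality control software: the class names, field names, and range limits are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class VoyageHeader:
    """One record per voyage (hypothetical field names)."""
    voyage_number: int
    ship_name: str
    captain: str
    departure: str
    destination: str
    sst_method: str              # e.g. "bucket"
    barometer_correction: float  # units assumed for illustration


@dataclass
class DataRecord:
    """One record per observation, carrying the voyage number as the link."""
    voyage_number: int
    date_time: str
    latitude: float
    longitude: float
    air_temp_c: Optional[float] = None
    wind_dir_code: Optional[int] = None


def link_records(
    headers: List[VoyageHeader], records: List[DataRecord]
) -> Tuple[List[Tuple[VoyageHeader, DataRecord]], List[DataRecord]]:
    """Attach each data record to its header; return (linked, orphans)."""
    by_voyage = {h.voyage_number: h for h in headers}
    linked, orphans = [], []
    for rec in records:
        header = by_voyage.get(rec.voyage_number)
        if header is None:
            orphans.append(rec)  # no header keyed for this voyage number
        else:
            linked.append((header, rec))
    return linked, orphans


def qc_flags(rec: DataRecord) -> List[str]:
    """Flag suspect values for review against the paper form.

    The limits below are assumed, illustrative thresholds only.
    """
    flags = []
    if not -90.0 <= rec.latitude <= 90.0:
        flags.append("latitude out of range")
    if not -180.0 <= rec.longitude <= 180.0:
        flags.append("longitude out of range")
    # The 32-point scale runs 01-32; a 3-digit entry such as 240 is flagged.
    if rec.wind_dir_code is not None and not 1 <= rec.wind_dir_code <= 32:
        flags.append("invalid wind direction code")
    if rec.air_temp_c is not None and not -60.0 <= rec.air_temp_c <= 50.0:
        flags.append("air temperature outlier")
    return flags
```

Because the header is keyed once and referenced by number, grouping the linked records by voyage number also gives the ordered sequence of positions that a later track check would need.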
Most of the ice island records that have been keyed to date came from T-3, often referred to as "Fletcher's Ice Island" after its discoverer, Joseph O. Fletcher, who was also instrumental in launching the COADS project a decade ago. It is appropriate that we will finally be able to add the highly valuable climate data from T-3 to COADS. Figure 2 illustrates recorded positions of T-3 based on the meteorological reports.

Other sources of unique data are being provided directly by various international organizations and governments (see Fig. 1). It is hoped that most of these sets can be completed in time for COADS Release 2 around the mid-1990s. Several nations are contributing to this effort:
-- The Arkeologisk Museum in Stavanger, Norway, has obtained a grant to key over 600 late nineteenth century (1867-1890) Norwegian logbooks (approximately 500,000 records) in cooperation with the COADS project, which provided keying instructions.
-- Germany is keying 30,000 observations for the period 1887-1890.
-- The Russian Federation provided approximately 3500 observations recently digitized from the Russian ship Vitiaz and other ships for the period 1804-1891 that appeared in the book by S.O. Makarov (1894). Figure 3 presents the geographical distribution of the observations.
-- Negotiations led to an agreement between NCDC and the Chinese National Oceanographic Data Center to establish a cooperative keying project to digitize approximately 1 million ship reports in the Maury Collection (primarily between 1820 and 1860).

In addition, other important historical data sets remain undigitized and need to be considered for digitization as time and resources permit. These include:
-- Japanese Kobe Collection. In the early 1960s Japan provided 623 rolls of microfilm from the Kobe Observatory containing merchant marine observations from the period 1890 to 1932 that have not been digitized, as well as those observations previously digitized from 1933 to 1961 (in COADS as decks 118 & 119). The microfilm also contains Japanese Navy observations from 1903 through 1944, although few observations are available past 1941. The total amount of undigitized data is estimated between five and six million reports (Uwai and Komura, 1992; Elms, 1992).
-- Other undigitized U.S. ship logbooks. In addition to the Maury Collection (1796-1900), at least one other set of nineteenth century merchant marine logbooks housed in the U.S. National Archives needs to be considered for digitizing. Also, the Archives may possess undigitized U.S. Navy data.
-- East India Company logbooks (located in the India Office Collection of the British Museum). The East India Company operated uninterrupted from 1599 to 1834, with its ships collecting a "wealth of information about the wind and weather" (Smith, 1925).

There is no doubt that other valuable marine data sets not listed here remain undigitized. The COADS project welcomes all participants who have the resources to provide additional data to join the effort to produce a more complete data set for use by the scientific community.

{Impacts of changes in coding and observational procedures}

Changes in coding and observational procedures require that data adjustments be made to assure data continuity. Marine weather observations over time have been recorded on a number of different form types as communications technology, the science of meteorology, and subsequent coding practices evolved. The earliest observations from the East India logbooks, for example, pre-date establishment of the Beaufort Wind Scale and even the invention of the barometer in 1643.
This means much of the information from this early period would be incompatible with current codes and methods of measurement. Similarly, the earliest reports in the Maury Collection contain wind direction but no wind speed. Nevertheless, these data sets represent a unique resource of weather information having great value.

Unforeseen data continuity problems have been encountered in our endeavor to collect and digitize U.S. merchant (1912-46) and Arctic ice island data. Some problems surfacing in the retrospective data are the result of established observing practices in effect at the time, while others result from observer error due to carelessness or lack of procedural knowledge or training. Date and time are recurring problems throughout the original records because observers often miscoded the date and time when converting from local ship watch to local time to Greenwich time and date. Also, the dry bulb and wet bulb temperatures occasionally seem reversed on the observational form. In most of these cases the data are corrected during the QC process.

Changes in instructions to U.S.-recruited observers generally resulted in the issuance of a new edition of the standard Weather Bureau Instructions to the Marine Meteorological Observer, which later became known as Circular M (for "marine handbook"). Table 2 gives examples of such changes encompassing the period of the 1912-46 merchant data. We believe the instructions prior to the first edition in 1906 were typically attached to the logbooks. The instructions remained fairly consistent from 1906 until the introduction of major international code changes in 1949. Even the international code changes of 1929--when great progress was made in standardizing methods of reporting weather observations, especially by radio--did not substantially alter U.S. coding practices. Unfortunately, observers frequently did not adhere to the observing instructions.
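The date and time problem noted above can be illustrated with a short sketch. It assumes, purely for illustration, that the logbook time is local mean solar time at the ship's longitude (east positive), so that Greenwich time differs by one hour per 15 degrees of longitude; actual shipboard timekeeping varied, and the function names are hypothetical. A simple consistency test for possibly reversed dry and wet bulb entries is included, since a wet bulb reading should not exceed the dry bulb reading.

```python
from datetime import datetime, timedelta


def local_mean_time_to_gmt(local: datetime, longitude_deg: float) -> datetime:
    """Convert local mean solar time to Greenwich time and date.

    Assumption: longitude is in degrees, east positive, and the logged
    time is local mean solar time (one hour per 15 degrees of longitude).
    """
    offset = timedelta(hours=longitude_deg / 15.0)
    return local - offset


def bulbs_look_reversed(dry_bulb: float, wet_bulb: float) -> bool:
    """Return True when the wet bulb reading exceeds the dry bulb reading.

    Both readings must be in the same temperature scale.
    """
    return wet_bulb > dry_bulb


if __name__ == "__main__":
    # 06:00 local time on 2 March at 150E falls on 1 March at Greenwich,
    # the kind of date change that was easy to miscode by hand.
    print(local_mean_time_to_gmt(datetime(1915, 3, 2, 6, 0), 150.0))
    print(bulbs_look_reversed(dry_bulb=54.0, wet_bulb=58.0))
```

A check of this kind can only flag a suspect entry; deciding whether two columns were really transposed still requires the original form.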
One example of a problem associated with the 1912-46 merchant data is that wind directions were sometimes not coded in accordance with the U.S. instructions (32-point scale), occasionally appearing as 3 digits (e.g. 240). It is interesting to note that in the instructions in the 1938 and 1941 editions, the Weather Bureau broke somewhat with tradition by allowing the winds to be reported on the marine weather log form (1210A) in two acceptable codes: "The direction of the wind may be entered in the appropriate column on Form 1210A either directly in terms of compass points or in code, according to the scale 01-32, in which 08=E, 16=S, etc. However, inasmuch as the wind direction must be coded in figures whenever an observation is transmitted by radio, it is customary for most observers to enter the code number and this procedure is preferred by the Weather Bureau. Therefore, observers who have been accustomed to recording the wind direction directly in terms of compass points are urged to make a practice of using the numeric scale instead." This practice required that different form types be developed before keying the data, but did not intrinsically lead to any data biases in the wind directions. However, this does possibly lead to biases in other observed elements such as cloud amounts by inadvertently establishing a precedent of substituting the radio code for the established code to be used on the observational form. For example, the "total cloud amount" entered on the logbook forms appears to have come from two different codes, one developed for the log form entry in tenths (0-10) and the other a single digit code (0-9) adopted for radio transmission. Apparently, the observers, for convenience, often used the radio code instead of the form code to make their entry on the log form. 
Since there is generally no way of distinguishing which code was used, this innocent practice introduced an observational bias (e.g., overcast skies coded as 10 in the form code and as 8 in the radio code) that can only be corrected statistically after making certain assumptions regarding overall cloud distributions (Table 3). Careful checks will be required for conversion of digitized sea level pressure values into a common COADS format because of the many changes in instrumental or reporting procedures. In a number of cases all the necessary information needed to make these checks is not available on the original form. Before the time of radio transmissions, observers using mercurial barometers were instructed not to make the corrections for temperature and gravity, but simply enter the value as read; the Weather Bureau made the necessary corrections upon receiving the forms. Some of the early aneroid barometers issued did have an attached thermometer for convenience in reading the ambient air temperature (dry bulb). It is difficult to imagine, however, that the mariner would have mounted the valuable barometer in the open air exposed to all the weather elements and not in the protection of the cabin. There is also a large number of attached thermometer entries indicated to have come from an aneroid barometer that have the same value as the dry bulb temperature. This can be corrected by careful QC. Many of the U.S. merchant marine logbook forms located in the archives for the period 1912-46 contain only one observation per day at 0000 UTC. This is in contrast to both the mid- to late 1800s when U.S. observations were reported every two hours and to contemporary international ship observations, which are generally reported every six hours (0000, 0600, 1200, 1800 UTC). Starting sometime before 1906, the Weather Bureau required that radio reports be sent twice or even four times a day but requested only the 0000 UTC observation be sent by mail. 
The logic behind the practice of reporting only the 0000 UTC observation on the forms is explained in the 1906 edition. The Weather Bureau felt that with the advent of weather forecasting as a science, mariners could best determine which route to take based on conditions actually encountered (referencing the daily synoptic charts) rather than on average conditions. The 0000 UTC observational practice can bias the digital database, however. For example, at certain longitudes all reports are taken near the time of maximum heating in the average diurnal cycle, while at other longitudes reports are taken near the time of maximum cooling, with reports from intervening longitudes falling somewhere in between. These biases can be statistically corrected.

One partial solution to correcting the bias of once-daily logbook observations would be to supplement them with radio messages. However, few of these radio messages (often garbled) are available in their original format today. Reasons for this include:
-- lack of storage space,
-- lack of economical or viable technology at the time for archiving the information on film or digital media, and
-- deterioration over time of the teletype paper.

[NOTE: It appears that nearly all the U.S.-received radio messages, which were archived on teletype paper at NCDC, were destroyed around 1980.]

A large number of these messages were plotted on the Northern Hemisphere Charts and are available dating back to 1899. These observations would be very time consuming and expensive to digitize, and a number of the elements would have to be estimated because of the coarseness of the plotting code. For example, wind speeds were plotted only to the nearest 5 knots, wind direction would have to be based on direction of the plotted wind shaft, and ship position estimated from the location of the plotted station model.

{Remaining work}

Large tasks remain before COADS Release 2 can be completed.
One of the largest will be the conversion of all the additional data sets to the common format used for COADS. The conversion programs must be designed so that data elements are preserved for future research even though they may not be compatible with the COADS Release 2 data format and statistics. To provide adequate metadata will also be a sizeable undertaking. Lastly, all the data sources will have to be merged, duplicate observations eliminated, and quality control applied to the dataset with erroneous or suspect entries flagged. Those elements passing the quality control will then be used to compute the COADS Release 2 statistics which, along with the observations, are planned for general availability in the mid-1990s.

{Acknowledgements}

COADS is the result of a continuing cooperative project between the National Oceanic and Atmospheric Administration (NOAA)--specifically its Environmental Research Laboratories (ERL), National Climatic Data Center (NCDC), and Cooperative Institute for Research in Environmental Sciences (CIRES, conducted jointly with the University of Colorado)--and the National Science Foundation's National Center for Atmospheric Research (NCAR). The NOAA portion of COADS is currently supported by the NOAA Climate and Global Change Program and the NOAA Environmental Services Data and Information Management (ESDIM) Program. We also acknowledge the many individuals who have worked over the years through their national meteorological services and the World Meteorological Organization (WMO) and its predecessors to see that global scale marine observations were collected, preserved, and distributed for the use and benefit of all.

{References}

Elms, J.D., 1992: Status of NCDC Keying of Historical Marine Data. Proceedings of the International COADS Workshop, Boulder, Colorado, 13-15 January 1992. H.F. Diaz, K. Wolter, and S.D. Woodruff, Eds., NOAA Environmental Research Laboratories, Boulder, Colo., 37-45.

Jenne, R.L., 1992: The Importance of COADS for Global Reanalysis. Proceedings of the International COADS Workshop, Boulder, Colorado, 13-15 January 1992. H.F. Diaz, K. Wolter, and S.D. Woodruff, Eds., NOAA Environmental Research Laboratories, Boulder, Colo., 9-15.

Makarov, S.O., 1894: Vitiaz in Pacific Ocean, Volume I, St. Petersburg.

Slutz, R.J., S.J. Lubker, J.D. Hiscox, S.D. Woodruff, R.L. Jenne, D.H. Joseph, P.M. Steurer, and J.D. Elms, 1985: Comprehensive Ocean-Atmosphere Data Set; Release 1. NOAA Environmental Research Laboratories, Boulder, Colo., 268 pp. (NTIS PB86-105723).

Smith, H.T., 1925: Marine Meteorology, History and Progress. The Marine Observer, Vol. II, No. 15, 33-35.

Uwai, T. and K. Komura, 1992: The Collection of Historical Ships' Data in Kobe Marine Observatory. Proceedings of the International COADS Workshop, Boulder, Colorado, 13-15 January 1992. H.F. Diaz, K. Wolter, and S.D. Woodruff, Eds., NOAA Environmental Research Laboratories, Boulder, Colo., 47-59.

Woodruff, S.D., R.J. Slutz, R.L. Jenne, and P.M. Steurer, 1987: A comprehensive ocean-atmosphere data set. Bull. Amer. Meteor. Soc., 68, 1239-1250.

Woodruff, S.D., S.J. Lubker, K. Wolter, S.J. Worley, and J.D. Elms, 1993: Comprehensive Ocean-Atmosphere Data Set (COADS) Release 1a: 1980-92. Earth System Monitor, Vol. 4, No. 1.

Table 1. Dates of observations digitized for T-3 (Fletcher's Ice Island) and AIDJEX. These dates are not all-inclusive as occasionally several observations or days of observations are missing. [NOTE: The ending date of AIDJEX data digitization has been corrected from what appeared in the original Earth System Monitor article.]
-------------------------------------------------------------------------------
T-3 (Fletcher's Ice Island)
Source                                                   Dates (day/month/year)
===============================================================================
Deck 117 (previously digitized)..........................15/04/1952-14/05/1954
                                                         01/05/1955-16/09/1955
WBAN Form 610-7..........................................25/05/1957-31/03/1958
WBAN Form 10A & 10B......................................01/04/1958-25/10/1961
Summary of the Day WBAN Form 10A & 10B...................01/04/1958-25/10/1961
                                                         (January 1960 missing)
Plain Language Teletype Messages.........................19/02/1962-12/06/1966
                                     (used to fill in missing records)
Teletype Observations....................................12/07/1963-30/06/1966
TD 3280 (Previously digitized)...........................13/06/1966-14/04/1971*

*Note: T-3 records not located after 14/04/71. Ice Island abandoned
 September 1974.
-------------------------------------------------------------------------------
AIDJEX (Big Bear, Blue Fox, Caribou, Snowbird)
===============================================================================
                                                         April 1975-April 1976
-------------------------------------------------------------------------------

Table 2. U.S. Weather Bureau Instructions to Marine Meteorological Observers.
-------------------------------------------------------------------------------
             EXAMPLES OF CHANGES IN CODES AND OBSERVING PRACTICES
                              Wind      Wind              Sea       Mercury
EDITION      YEAR  Clouds     Direction Speed    Weather  Temp.     Barometer
===============================================================================
Instructions Late  Proportion Mean      Beaufort Beaufort Bucket    No
attached to  1800s of clear   magnetic  force    weather  temp.     corrections
forms              sky 10ths  direction          code               as read*

First        1906  Amount of  True wind "        "        "         "
                   clouds in  direction
                   10ths      32-point

Second       1908  "          "         "        "        "         "

Third        1910  "          "         "        "        "         "

Fourth       1925  "          "         "        "        "         No
                                                                    corrections
                                                                    as read**

Fifth        1929  "          "         "        "        "         "

Sixth        1938  "          "         "        00-99    Bucket/   "
                                                 pres.    injection
                                                 weather
                                                 code

Seventh      1941  "          "         "        "        "         "

Provisional  1949  Eighths    36-point  Knots    "        "         Corrected
                                                                    to sea
                                                                    level
-------------------------------------------------------------------------------
 *  Indications are that this was probably the practice, however, it is not
    clearly evident from reading the instructions.
 ** Corrections to be added for radio transmissions.

NOTE: Updates after 1941 include Circular M (editions 8-12) in 1950, 1954,
1959, 1963, 1964, and NWSOH No. 1 editions in 1971, 1982, and 1992. A new
update will be required for the November 2, 1994 approved WMO code changes.
----------

Table 3. Total Cloud Amount.
-------------------------------------------------------------------------------
Radio                                                               Code on
transmission                                                        observation
code figures  Proportion of sky covered (in tenths)                 form
===============================================================================
0             0                                                     0
1             Less than 0.1                                         *
2             0.1                                                   1
3             0.2 to 0.3                                            2,3
4             0.4 to 0.6                                            4,5,6
5             0.7 to 0.8                                            7,8
6             0.9                                                   9
7             More than 0.9 but with openings                       *
8             Sky completely covered with clouds                    10
9             Sky obscured by fog, dust storm, or other phenomenon  *
-------------------------------------------------------------------------------
* No corresponding code.
----------

Figure 1. Annual global marine reports after duplicate elimination (curve) for COADS Release 1 through 1979, continued by Release 1a through 1992. Horizontal lines span the time periods for data now being collected and digitized (World War I and II; Arctic) or proposed for future digitization, with the approximate numbers of reports shown in millions (M) or thousands (K).
Labelled ticks along the upper horizontal axis mark the starting years for Release 1a, and those planned for Releases 1b (1947) and 2 (1854, or earlier).

Figure 2. Fletcher's Ice Island (T-3) data available in digital form. Note that data prior to about 1961 had positions keyed only to whole degrees of latitude and longitude. (Figure courtesy of Ignatius Rigor, Polar Science Center, University of Washington.)

Figure 3. Russian S.O. Makarov Collection of data, 1804-1891, showing the observation locations of the Vitiaz (circles) and other ships (triangles) in the Collection.
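The code relationships in Table 3 and the 01-32 wind direction scale quoted in the text (08=E, 16=S) can be written as simple lookup logic. The sketch below is illustrative only and is not part of any COADS conversion software; the names are hypothetical, and the conversion of the 01-32 scale to degrees assumes evenly spaced compass points of 11.25 degrees each. It shows why a single logbook cloud entry can be ambiguous when it is not known whether the observer used the form code or the radio code.

```python
from typing import Dict, List

# Table 3: radio transmission code figure -> corresponding form code(s)
# in tenths. An empty list means Table 3 gives no corresponding form code.
RADIO_TO_FORM_TENTHS: Dict[int, List[int]] = {
    0: [0],
    1: [],
    2: [1],
    3: [2, 3],
    4: [4, 5, 6],
    5: [7, 8],
    6: [9],
    7: [],
    8: [10],
    9: [],
}


def possible_interpretations(entry: int) -> Dict[str, List[int]]:
    """Return what a logbook cloud entry could mean, in tenths,
    under each of the two codes it might have been written in."""
    candidates: Dict[str, List[int]] = {}
    if 0 <= entry <= 10:
        candidates["form"] = [entry]  # form code is already in tenths
    if entry in RADIO_TO_FORM_TENTHS:
        candidates["radio"] = RADIO_TO_FORM_TENTHS[entry]
    return candidates


def wind_code_to_degrees(code: int) -> float:
    """Convert a 01-32 compass point code to degrees (08=E, 16=S).

    Assumes 32 evenly spaced points; code 32 (north) maps to 0 degrees.
    """
    if not 1 <= code <= 32:
        raise ValueError(f"not a valid 32-point code: {code}")
    return (code * 11.25) % 360.0


if __name__ == "__main__":
    print(possible_interpretations(8))
    print(wind_code_to_degrees(8), wind_code_to_degrees(16))
```

For example, an entry of 8 could mean 8 tenths under the form code or a completely covered sky (10 tenths) under the radio code, whereas an entry of 10 can only have come from the form code.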