ICOADS Web information page (Friday, 21-Nov-2008 23:03:14 UTC):

Translation of the US Maury Collection: Inventories and Plots

Background: This page contains some preliminary inventory results and discussion of processing options prepared prior to quality control and conversion processing. Note that some of the information is outdated. There are 12,336 voyages (control numbers); 855,724 form type 1 (once a day) reports; 546,138 form type 2 reports; 1,401,862 reports total.

1. Problems in the CD-ROM data that need to be addressed prior to LMR conversion:

a) Hour-day sequence problems (Fig.1 and Table 1).
Proposed solution: Problems b)-d) need to be addressed first. A solution for the high-frequency cases may then be straightforward. For example, hours 03, 08, 04, 09, 12 keyed at day n become 15, 20 at day n-1 and 04, 09, 12 at day n.
b) Locations at midnight (Fig. 2 and Table 2).
Proposed solution: Locations at hour 00 make up 8% of form type 2. We will examine a limited number of microfilm examples to determine whether they were actually recorded at noon.
c) Typographical errors (Table 3).
Proposed solution: Corrections to 20,316 year, month, and day typographical errors have been made (at PSD). They were identified within a voyage by inconsistencies in year, nonconsecutive year-month-day duplicates, and too many hours in a day.
d) Data within a control number that appear to be from a different voyage (Table 4).
Proposed solution: Move headers. If a header is missing, insert a new one (creating a new control number), filled in either from the microfilm or with blank fields. Change observation control numbers accordingly. [Note: No action was taken on this element of the plan.]
2. Additional comments:
a) The data from the CD-ROM were partially cleaned up from what was received from China.

b) We plan to benchmark stages of the above correction process, so that lists of changes can be generated by an automated process (e.g., Unix "diff" commands) if needed at a future date.

c) The Collection is also known to contain problems with incorrect units indicators (e.g., temperature in Celsius versus Réaumur), although the extent of these problems is not yet known.
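
The benchmarking idea in item 2b) can be sketched as follows. This is a minimal illustration, not the procedure actually used: it substitutes Python's standard difflib module for the Unix "diff" command, and the function name, stage labels, and record strings are hypothetical.

```python
import difflib


def list_changes(before, after, before_name="stage_a", after_name="stage_b"):
    """Return unified-diff lines between two benchmarked stages.

    `before` and `after` are lists of record strings (one report per
    entry, without trailing newlines). The stage names are hypothetical
    labels used only in the diff header.
    """
    return list(
        difflib.unified_diff(
            before, after, fromfile=before_name, tofile=after_name, lineterm=""
        )
    )


# Invented records for illustration only (not taken from the Collection).
stage_a = ["0000001 1853 09 07", "0000001 1853 08 08"]
stage_b = ["0000001 1853 09 07", "0000001 1853 09 08"]

for line in list_changes(stage_a, stage_b):
    print(line)
```

Saving a snapshot after each correction stage would allow the change list between any two stages to be regenerated later without keeping a separate log.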

Figure 1. Inventory of hours present in form type 2 of the US Maury Collection. Form type 2 consists of reports more than once a day (39% of Collection); form type 1 is daily data.

Table 1. There are almost 7000 different combinations of hours per day for form type 2. The most frequent categories are listed (in decreasing order of frequency), with the first two categories (labeled 3-4 on the left) making up 69%. When the data were digitized, the day frequently was not assigned starting at midnight, but at other hours as listed. Moreover, the hour was not always keyed according to a 24-hour clock. These problems must be addressed while the data are still in voyage order so that the reports end up properly sequenced within each day. Otherwise, e.g., for data from the leading category 03, 08, 04, 09, 12, sorting by the keyed hour would place what is labeled hour 04 directly after 03, although (as in item 1a) 03 and 08 are actually hours 15 and 20 of the previous day.
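
The resequencing described for the leading category (item 1a) can be sketched as below. This is only an illustration under stated assumptions: the keyed hours are taken in voyage order, the hours before the first decrease are assumed to be afternoon hours on a 12-hour clock belonging to the previous day, and the function name is hypothetical. Other hour combinations in Table 1 would need their own rules.

```python
from datetime import date, timedelta


def resequence_day(day, keyed_hours):
    """Map hours keyed under one day label to (date, 24-hour) pairs.

    Assumption: hours before the first decrease (e.g. 03, 08 before 04)
    are p.m. hours of the previous day; later hours are kept as keyed.
    If there is no decrease, or a leading hour is already >= 12, the
    input is returned unchanged.
    """
    # Index of the first keyed hour that is smaller than its predecessor.
    split = next(
        (i for i in range(1, len(keyed_hours))
         if keyed_hours[i] < keyed_hours[i - 1]),
        None,
    )
    leading = keyed_hours[:split] if split is not None else []
    if split is None or any(h >= 12 for h in leading):
        return [(day, h) for h in keyed_hours]

    previous_day = day - timedelta(days=1)
    fixed = [(previous_day, h + 12) for h in leading]
    fixed += [(day, h) for h in keyed_hours[split:]]
    return fixed


# The date is invented for illustration; the hours are the leading category.
for d, h in resequence_day(date(1853, 8, 9), [3, 8, 4, 9, 12]):
    print(f"{d.isoformat()} {h:02d}")
# 1853-08-08 15
# 1853-08-08 20
# 1853-08-09 04
# 1853-08-09 09
# 1853-08-09 12
```

Because the split point is found from the order of the keyed hours, this only works while the reports are still in voyage order, which is the reason given above for fixing the problem before any sorting.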

Figure 2. Locations were typically recorded only once a day in the Collection. Due to navigational limitations, this should have been local noon, except in cases of dead reckoning. This is an inventory of the hour at which location was recorded per day, which shows cases at some other hours, including about 8% of form type 2 at hour 00 (midnight). We split the voyages into two groups: Group 2 consists of voyages with at least one location at hour 00, and Group 1 is all other voyages. Note that Group 2 is not distributed across the hours; for those voyages, the only locations are at midnight.
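
The grouping used for Figure 2 can be sketched as follows. This is a minimal illustration, assuming each report has been reduced to a (control number, hour, has-location) tuple; that record layout and the function names are hypothetical and are not the CD-ROM format.

```python
from collections import Counter


def split_by_midnight_location(reports):
    """Split voyages into Group 1 and Group 2.

    Group 2: voyages with at least one location recorded at hour 00.
    Group 1: all other voyages.
    `reports` is a list of (control_number, hour, has_location) tuples.
    """
    all_voyages = set()
    group2 = set()
    for control_number, hour, has_location in reports:
        all_voyages.add(control_number)
        if has_location and hour == 0:
            group2.add(control_number)
    return all_voyages - group2, group2


def location_hour_inventory(reports, voyages):
    """Count the hours at which locations were recorded for the given voyages."""
    return Counter(
        hour
        for control_number, hour, has_location in reports
        if has_location and control_number in voyages
    )


# Invented reports for illustration only.
reports = [
    ("A", 12, True), ("A", 4, False),
    ("B", 0, True), ("B", 9, False),
    ("C", 12, True), ("C", 0, False),
]
group1, group2 = split_by_midnight_location(reports)
print(sorted(group1), sorted(group2))
print(location_hour_inventory(reports, group2))
```

Voyage "C" stays in Group 1 because its hour-00 report carries no location; only located reports at hour 00 move a voyage into Group 2.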

Table 2. Examples of reports with location recorded at 00 or 24 (both interpreted as midnight). The first line is the header record, which is followed by observational records. Observational fields from left to right are control number, year, month, day, hour, latitude, and longitude (e.g., in the first case: 8600211, 1896, 05, 18, 00, 5008N, 218W), followed by data fields. The hour 00 cases are troublesome; examples should probably be located on microfilm. In the hour 24 examples, the location hours are probably noon, not midnight, since hour 21 (09) precedes and hour 03 (15) follows them (the day changes correctly at hour 14 (02)).

Table 3. Form type 1 (daily). At the cursor (right of day) is a second run of dates 1853 08 08 through 1853 08 12 (hour is blank, i.e., missing). The month in this second run should be corrected to September.
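
One of the checks mentioned in item 1c), nonconsecutive year-month-day duplicates within a voyage, can be sketched as below. This is an illustrative approach only, not the procedure used at PSD; the function name is hypothetical and the dates are invented to resemble the pattern described for Table 3.

```python
def nonconsecutive_duplicates(dates):
    """Return indices of dates that repeat an earlier, non-adjacent date.

    `dates` is a list of (year, month, day) tuples in voyage order.
    A date repeated on the immediately preceding record is not flagged.
    """
    last_seen = {}
    flagged = []
    for i, d in enumerate(dates):
        if d in last_seen and last_seen[d] != i - 1:
            flagged.append(i)
        last_seen[d] = i
    return flagged


# Invented daily sequence: the last two dates repeat August dates after
# the voyage has already reached September.
voyage_dates = [
    (1853, 8, 8), (1853, 8, 9), (1853, 8, 10),
    (1853, 9, 7),
    (1853, 8, 8), (1853, 8, 9),
]
print(nonconsecutive_duplicates(voyage_dates))  # [4, 5]
```

A check like this only identifies candidates; deciding that the month should be September rather than August still depends on the neighboring records.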

Table 4. This table illustrates a group of reports within a voyage (same control number) that actually appear to belong to a different voyage. The reports at the cursor (right of day) appear to belong with the reports above the header, and the reports above the header are either out of order or a header is missing. Although misplaced or missing headers probably explain many time discrepancies within a voyage, there is also evidence of voyages that contain duplicates or are not sequenced by time.


The U.S. National Oceanic and Atmospheric Administration hosts the ICOADS website.
Document maintained by icoads@noaa.gov
Updated: Nov 21, 2008 23:03:14 UTC