ICOADS Web information page (Wednesday, 29-Feb-2012 18:52:30 UTC):

US Maury Collection (deck 701; 1784-1863) (by ESRL)



1. Background

Background on the Collection and the format used for digitization is provided in the following texts from the CD-ROM (NCDC, 1998):

    about.txt
    format.txt

Updated format information is provided here:

    maury_format

Format translation specifications are located here:

    maury_transpec

Additional translation-related information is located on these Webpages:

    Time Adjustments
    Temperature Corrections
    Preliminary Inventories and Plots

Inventories of the ship names, and voyages, in the Collection are available here:

    mauri_out
    maury_invoy (1.8 MB text file)

As detailed in Table 1, the first 13 years of the Collection contain temporal discontinuities (missing months and years), and very few data.

Table 1. Numbers of reports and ships present in the earliest years of the US Maury Collection. Months are frequently composed entirely of reports from a single ship, and several months and years contain no data (until February 1796, which is the last missing month).
===============================================================================
Reports  Year/Month  ID fields (ship names abbreviated to 8 characters)
-------------------------------------------------------------------------------
      6  1784/02     EMPRE*_C
     26  1784/03     EMPRE*_C
     27  1784/04     EMPRE*_C
     20  1792/03     GRAND_TU
     30  1792/04     GRAND_TU
     31  1792/05     GRAND_TU
     30  1792/06     GRAND_TU
     31  1792/07     GRAND_TU
     21  1792/08     GRAND_TU PEGGY
     25  1792/09     PEGGY
     37  1793/02     GRAND_TU PEGGY
     62  1793/03     GRAND_TU PEGGY
     51  1793/04     GRAND_TU PEGGY
     55  1793/05     GRAND_TU PANTHER PEGGY
     55  1793/06     GRAND_TU PANTHER PEGGY
     27  1794/07     KATY
      5  1794/08     KATY
      4  1794/09     KATY
     31  1794/10     KATY
     12  1794/11     KATY
      2  1795/05     YORKTOWN
      1  1795/06     YORKTOWN
      1  1795/07     YORKTOWN
      1  1795/08     YORKTOWN
      1  1795/09     YORKTOWN
     22  1795/10     YORKTOWN
     27  1795/11     YORKTOWN
     15  1795/12     YORKTOWN
      1  1796/01     YORKTOWN
      6  1796/03     ARRABIDA
     30  1796/04     ARRABIDA
     84  1796/05     ARRABIDA BRIDGEWA
    242  1796/06     BRIDGEWA
    260  1796/07     BRIDGEWA
    170  1796/08     BRIDGEWA
     45  1796/09     BRIDGEWA
     77  1796/10     BRIDGEWA
     21  1796/11     KITTY
     26  1796/12     KITTY
   1618  TOTAL
-------------------------------------------------------------------------------

The US Maury Collection data as obtained from the CD-ROM contained a large number of problems, many arising from the difficulty of reading the original microfilm records, or otherwise introduced during digitization and assembly of the data. Five phases of processing were used to address many of these problems (secs. 2-4).

Some voyage number misassignments still exist in the US Maury data. These could not be resolved under our schedule constraints (extensive comparisons with the original microfilm would have been required). In most cases we suspect that the reports were keyed in proper order (with respect to the original microfilm sequence), but the voyage number was not updated properly during digitization (e.g., reports at the start of a new voyage inadvertently received the previous voyage number). This means that incorrect metadata such as ship name and type may be attached to some reports, but the basic meteorological data are probably at the correct times and locations. These problems may also have affected the results of position interpolation to some extent (as discussed in sec. 4).

2. Processing overview

Five general phases of processing (Fig. 1) were used to help expedite the work and for related technical reasons (this is a simplification of the actual processing, which involved additional steps and variations depending on the form type of the data). Processing through Phase C is done, with Assessment (Phase D) beginning.

          Time edit        Time/pos. assign.    Translation to LMR    Assessment
----------          -----------          -----------          -----------
 CD1 data  ---A--->  CD2 data   ---B--->   QC data   ---C--->  LMR data   ---->D
----------          -----------          -----------    ^     reject file
  ^   |_________________________________________________|       summary
  |                                                           -----------
Pre-edit
  |
----------
 CD data
----------

Figure 1. Processes (Pre-edit and A-D), and data and metadata outputs proposed for the US Maury Collection.

In the following overview of the Pre-edit and Phases A-D, the output data at each stage are represented in one of three formats: CD (format used for digitization and on the CD-ROM), QC (quality control format), and LMR (LMR6). "CD" is suffixed by a number to indicate that the data are still in the CD format, but edited. Further details on Phases A and B are given in secs. 3-4. The files for each processing phase were divided up according to the original microfilm reel numbers (one file for each of the 85 reels that were digitized).

Pre-edit

A few changes to the original CD data were required to manipulate the data on a Unix system. Most significantly the data contained null characters in place of some real characters. A few additional minor problems were detected and changes made: for three control numbers, the headers were not with the data records and were shifted; one corrupt and redundant header record was deleted; and an erroneous form type = 5 was changed to form type = 1 for a few records.

    Input:  CD
    Output: CD1

Phase A: Time edit (and other incidental editing)

First a number of modifications were made to the time elements (records were not moved, and voyage numbers were not changed). Then changes were made to day and hour to regularize the data for the 24-hour clock, and to fix apparent problems introduced by the noon-to-noon definition of day in some early data.

    Input:  CD1
    Output: CD2
    Additional information: see sec. 3. Also, since the record structure has not changed, a diff could be performed between the CD1 and CD2 data to obtain a complete list of differences.

Phase B: Time/position assignment

A condensed QC format forms the output from this process, containing the edited time elements, the originally reported or interpolated positions, and other information. If, for example, the interpolation failed to produce a latitude and longitude, one or both of latitude/longitude was missing and the report was rejected at the next (translation) phase.

    Input:  CD2
    Output: QC records (1-for-1 with CD1 and CD2)
    Additional information: see sec. 4.

Phase C: Translation to LMR

Fields were translated (as feasible) into the regular fields of the LMR format, plus data from the CD1 and QC records were placed in the supplemental attachment of each report. Note that we attached the CD1, rather than CD2, records in order to preserve the more original records (all the edited elements were provided via the QC records). Temperature units and other corrections were also made as part of this processing.

    Input:   CD1 + QC records (1-for-1)
    Outputs: Per reel: LMR, reject file, and conversion summary

Phase D: Assessments

Planned to include rechecking of ship tracks, climatological comparisons, etc.

    Input:   LMR
    Outputs: graphics, etc. (products not planned for archival)
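The record-by-record comparison of CD1 and CD2 mentioned under Phase A can be sketched as follows. This is a minimal illustration only, not the tool actually used: the function name diff_records and the sample record layout are hypothetical, and it relies solely on the stated property that the two files have the same record structure (one line per record, in the same order).

```python
from itertools import zip_longest


def diff_records(cd1_lines, cd2_lines):
    """Return (line_number, cd1_record, cd2_record) for every record that differs."""
    changes = []
    for n, (old, new) in enumerate(zip_longest(cd1_lines, cd2_lines), start=1):
        # The time edit did not add, delete, or move records, so a length
        # mismatch means the inputs are not a matching CD1/CD2 pair.
        if old is None or new is None:
            raise ValueError(f"record count mismatch at line {n}")
        if old != new:
            changes.append((n, old, new))
    return changes


# Hypothetical three-record reel; the field layout is invented for illustration.
cd1 = ["V0001 1850 03 14 12", "V0001 1805 03 15 12", "V0001 1850 03 16 12"]
cd2 = ["V0001 1850 03 14 12", "V0001 1850 03 15 12", "V0001 1850 03 16 12"]

changes = diff_records(cd1, cd2)
for n, old, new in changes:
    print(f"line {n}: {old!r} -> {new!r}")
```

Because the comparison is positional, it would not be meaningful for a phase that reorders or drops records.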

3. Details on time edit (Phase A)

Summary: Approximately four person-months were spent analyzing the voyages and correcting (almost exclusively) time problems. Some incidental location and other obvious problems were also corrected, but records were not moved (with respect to the original data sequence) and control numbers were not changed. This work, plus other phases, could be reiterated if significant new problems were discovered in the future.

The input CD data have the following counts:

      1,414,198  total lines (data + header records)
    -    12,336  header records (voyages)
      ---------
      1,401,862  data records (reports)

20,819 records were modified (~1.49%). These can be subdivided as follows:

               Changes to:
    ------------------------------
     year    month      day   hour
    13673     2608     4035   1875
    0.98%    0.19%    0.29%  0.13%

There were 268 "year jumps" in the original data, i.e., year difference not zero or one within an (apparent) voyage. Only about 60% of these were corrected (a subset of the above year corrections); the remainder were not corrected because they were found to have problems such as control number incorrect (different voyage/ship shares control number), duplicated reports, or data out-of-order.

Next the hours (and days) in the Collection were regularized according to the 24-hour clock (local time). The details of this processing are described on the Time Adjustments Webpage. The Preliminary Inventories and Plots Webpage provides some additional background information on Phase A problems.
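The "year jump" test described above (year difference not zero or one between successive reports of a voyage) can be sketched as follows. This is an illustration under assumptions, not the code actually used: the function name find_year_jumps is hypothetical, and skipping reports with a missing year is an assumption that the text does not address.

```python
def find_year_jumps(years):
    """Return the indices of reports whose year differs from the previous
    non-missing year in the same voyage by anything other than 0 or 1."""
    jumps = []
    prev = None
    for i, yr in enumerate(years):
        if yr is None:
            # Assumption: reports with a missing year are skipped.
            continue
        if prev is not None and (yr - prev) not in (0, 1):
            jumps.append(i)
        prev = yr
    return jumps


# Hypothetical voyage in which the third report was keyed as 1805, not 1850.
voyage_years = [1850, 1850, 1805, 1850, 1851]
jumps = find_year_jumps(voyage_years)
print(jumps)
```

A single mis-keyed year is flagged twice by this test, once on entering the bad year and once on leaving it, so the number of jumps is not the same as the number of bad records.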

4. Details on time/position assignment (Phase B)

The edited time elements were carried forward in this phase (from CD2 into the QC format). Also latitude/longitude were carried forward as reported or interpolated, together with a few other flags and metadata in the QC format.

Within each voyage, we attempted to interpolate missing latitude/longitude for any reports in sequence between two reports containing observed values of latitude/longitude (subject to constraints described below). This was facilitated by the fact that data in the CD format were organized into voyages, with the data generally digitized in proper time-sequence within the digitized files, even after the time edit. One exception was logform pattern 7 (discussed on the Time Adjustments Webpage), in which the reports containing observed latitude/longitude ended up out of sequence after time edit (special steps were taken to properly interpolate pattern 7 voyages). When data were otherwise out of time sequence (e.g., due to header misassignment problems), interpolation was not performed.

To avoid having to manipulate the entire (fairly voluminous) CD dataset, the interpolation output consisted of the abbreviated QC format, containing: voyage number, reel sequence number (with respect to the original microfilm reel), time (year, month, day, and hour), position (latitude and longitude), and the LMR lat/lon indicator (LI). LI was set to missing (if latitude and longitude ended up missing), or to one of the following values:

    3 = interpolated
    4 = degrees and minutes

Data were processed one reel (file) at a time. First latitude was interpolated (for the entire file), and then longitude, to take better advantage of more frequent reporting of observed latitude than observed longitude (owing to early navigational constraints). This could result in reports with latitude originally reported, but longitude interpolated (flagged LI=3). In hindsight, an additional LI value:

    6 = other (refer to metadata)

could have been used to identify the mixture of latitude observed and longitude interpolated (but the CD1 records are available in LMR format if it is desired to isolate this case).

Interpolation was performed using simple linear interpolation in each dimension (latitude or longitude), rather than spherical coordinates (great circle calculations). Except over large distances this probably produced satisfactory results (also considering the relatively coarse resolution of early reported positions).

If lat1 and lat2 (hr1 and hr2) were the two reported latitudes (hours), we calculated:

    dlat = lat2 - lat1
    dhr  = hr2 - hr1      (using "julian" hours)

Interpolation was not performed if dhr was negative (a jump backwards in time), if the two reports (ob1 and ob2) did not have the same voyage number, or if:

    dhr > 3 months
    |dlat/dhr| > 8 degrees in 24 hours (i.e., > 20 knots)
    |dlat| > 32 degrees

Since there are approximately 60 nautical miles or 111 km per degree of latitude (actually using 1852 m per international nautical mile):

    max 24-hour distance  =  889 km =  552 miles
    max abs value of dlat = 3556 km = 2209 miles

Then for each pair of corresponding longitudes, lon1 and lon2, we calculated:

    dlon = lon2 - lon1    (adjusted accordingly if voyage crossed dateline)
    wlon = dlon * cos(ylat)
    wdis = sqrt(wlon**2 + dlat**2)

where ylat was the mean of lat1 and lat2, lat1 (lat2) if lat2 (lat1) was missing, or 45 degrees if both were missing. Similarly to latitude, interpolation was not performed if:

    dhr > 3 months
    |wlon/dhr| > 8 degrees in 24 hours (i.e., > 20 knots)
    |wdis| > 32 degrees

Note that reports lacking latitude and/or longitude were rejected during the next phase of processing (translation to LMR). Reports with latitude successfully interpolated, but not longitude (due to stricter tests), were rejected by this means.
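The acceptance tests and the linear interpolation described in this section can be sketched as follows. This is a minimal illustration, not the code actually used. The function names are hypothetical, and several details the text leaves open are filled in by labeled assumptions: "3 months" is taken as 90 days, a zero time difference is rejected along with negative ones, the dateline adjustment takes the shorter way around, and dlat is taken as zero in wdis when either latitude is missing.

```python
import math

MAX_DHR = 90 * 24      # "3 months" taken as 90 days, in hours (assumption)
MAX_RATE = 8.0 / 24.0  # 8 degrees in 24 hours, as degrees per hour
MAX_SPAN = 32.0        # maximum separation in degrees


def time_ok(hr1, hr2):
    """Reject backward jumps and gaps over 3 months.
    A zero gap is also rejected to avoid division by zero (assumption)."""
    return 0 < (hr2 - hr1) <= MAX_DHR


def lat_ok(lat1, lat2, hr1, hr2):
    """Apply the latitude tests: time gap, speed, and total span."""
    if not time_ok(hr1, hr2):
        return False
    dlat = lat2 - lat1
    return abs(dlat / (hr2 - hr1)) <= MAX_RATE and abs(dlat) <= MAX_SPAN


def wrap_dlon(lon1, lon2):
    """Longitude difference, taking the shorter way around the dateline
    (an assumption about what "adjusted accordingly" meant)."""
    dlon = lon2 - lon1
    if dlon > 180.0:
        dlon -= 360.0
    elif dlon < -180.0:
        dlon += 360.0
    return dlon


def lon_ok(lon1, lon2, hr1, hr2, lat1=None, lat2=None):
    """Apply the longitude tests using wlon and wdis as defined in the text."""
    if not time_ok(hr1, hr2):
        return False
    lats = [v for v in (lat1, lat2) if v is not None]
    ylat = sum(lats) / len(lats) if lats else 45.0
    # Assumption: dlat contributes nothing to wdis if a latitude is missing.
    dlat = (lat2 - lat1) if len(lats) == 2 else 0.0
    wlon = wrap_dlon(lon1, lon2) * math.cos(math.radians(ylat))
    wdis = math.sqrt(wlon ** 2 + dlat ** 2)
    return abs(wlon / (hr2 - hr1)) <= MAX_RATE and wdis <= MAX_SPAN


def lerp(v1, dv, hr1, hr2, hr):
    """Linear interpolation at hour hr, given start value v1 and total change dv."""
    return v1 + dv * (hr - hr1) / (hr2 - hr1)


# Two observed positions 48 hours apart, with one report halfway between.
lat_mid = lerp(10.0, 12.0 - 10.0, 0, 48, 24)
lon_mid = lerp(179.0, wrap_dlon(179.0, -179.0), 0, 48, 24)
```

In this sketch the interpolated longitude is not wrapped back into a fixed range, so a track crossing the dateline can yield a value such as 180.0 or slightly beyond; a real implementation would need to normalize it to whatever longitude convention the output format uses.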

Reference

NCDC (National Climatic Data Center), 1998: The Maury Collection: Global Ship Observations, 1792-1910 (CD-ROM, Version 1.0, February 1998). NCDC, Asheville, NC



Document maintained by icoads@noaa.gov
Updated: Feb 29, 2012 18:52:30 UTC
http://icoads.noaa.gov/maury.html