===============================================================================
International Comprehensive Ocean-Atmosphere Data Set (ICOADS): Release 2.1
Data Preconditioning and Duplicate Elimination: 1980-97
27 February 2004
=================================================================
Document Revision Information (previous version: 9 September 2002):
Updates Release 2.1 and ICOADS.
-------------------------------------------------------------------------------

{1. Introduction}

This document describes the rules for a sequence of processing steps performed in the duplicate elimination (dupelim) program for 1980-97 (originally referred to as COADS Release 1a) data. Preconditioning (sec. 2), the first step in this sequence, was used to delete Long Marine Reports (LMR6), or to correct or modify individual data fields within a given report. The second step involved setting the LMR fields for platform type (PT) and ID indicator (II) (sec. 3). The final step was actual QC/dupelim processing (sec. 4). During dupelim, additional reports were eliminated, and a limited number of changes was made to the contents of reports by substitution between duplicates.

[NOTE: Because of processing differences, the three original COADS updates that compose ICOADS.DM, and accompanying documentation, are referred to as follows:
    Release 1a: 1980-97
    Release 1b: 1970-79
                1946-69
    Release 1c: 1784-1949
These four documents describe the "preconditioning" and duplicate elimination processing used to create LMR for the indicated periods. 1946-49 Release 1b data were replaced by Release 1c data.]

{2. Preconditioning}

Sec. 2.1 gives the rules for report deletions; sec. 2.2 gives the rules for field modifications. Similarly to setting platform type and the ID indicator (sec. 3), deck is the field that initially determines the rules to be used. Decks that are not specified are not subject to preconditioning.
Some rules are labelled according to a lower-case letter, which indicates that more than one rule applies to a deck. Dates indicated as part of the rules are inclusive, e.g., "October 1991-March 1992" refers to the beginning of October 1991 through the end of March 1992. {2.1 Report deletions} All decks Rules: Delete any report after 31 December 1997. Background: A small number of data sources continuing into 1998 were included in the input data. 1998 data are deleted at this stage of processing, e.g., to ensure that dupelim summary information strictly describes 1980-97. Deck 144: PMEL TOGA/TAO Buoys Rules: Delete all deck 144 reports assigned to source ID 86 or 87 that also contain a position flag (character 12 of the supplemental attachment) with a value of 8 or 9. Background: SID=86 and 87 contain RAM (downloaded from buoy memory after physical recovery of the buoy) or SPOT (Argos satellite receipts) data, respectively, from PMEL's new "standard archive" (the standard archive is frequently updated; the data used for this update carry a generation date, 17 November 1998, retained in the LMR supplemental attachment). The source ID check is needed because earlier deck 144 data assigned to SID=66 (and deck 145 data assigned to SID=67) have not yet been fully replaced by the standard archive (no quality control metadata were provided with the old archive). 
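The deck 144 deletion rule above can be sketched as a small predicate. This is only an illustration, not the dupelim code: the function name and the representation of the supplemental attachment as a plain character string are assumptions, and "character 12" is taken to be a 1-based position.

```python
def delete_deck144_report(deck, sid, supplemental):
    """Return True if a report falls under the deck 144 deletion rule.

    Hypothetical helper: 'supplemental' is assumed to be the supplemental
    attachment as a plain string, with the PMEL position flag in
    character 12 (1-based), i.e. index 11.
    """
    if deck != 144 or sid not in (86, 87):
        return False              # rule only covers the standard archive SIDs
    if len(supplemental) < 12:
        return False              # no position flag available to test
    return supplemental[11] in ("8", "9")   # drifted/drifting buoy positions
```

For example, a SID=86 report whose twelfth supplemental character is "9" would be deleted, while an older SID=66 report with the same character would be left alone.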
Definitions of the PMEL flag settings follow:
    0 = datum missing (instrument package never had sensor)
    1 = highest quality; pre/post-deployment calibrations agree (for future use)
    2 = standard quality; pre-deployment calibrations applied
    3 = lower quality; pre/post calibrations differ
    4 = questionable; doesn't agree with adjacent buoys, climatology, or other data sources
    5 = sensor or tube failed (positions can be flagged 5 after deployment but before Argos has registered the position)
    8 = position: buoy drifted or drifting with speed less than 0.1 knots (usually a buoy that is dragging its anchor)
    9 = position: buoy drifting and speed greater than 0.1 knots (usually a buoy that has broken, or been cut free, from its mooring)

Decks 201-255: UK Met. Office (UKMO) Main Marine Data Bank (MDB)
Rules: This rule is identical to rule b) under deck 926, except that it is applied to decks 201-255 (French correction).
Background: (See background under deck 926.)

Deck 500: Gulf Offshore Weather Observing Network (GOWON) (plat data)
Rules: Delete all deck 500 reports (Gulf of Mexico oil platform data).
Background: Deck 500 is documented in the "TD-1129 Marine Data Users Reference; 1970-Current" to cover January-December 1982. However, this rule deletes all reports regardless of date, in case of a documentation error. The 3,494 reports (~50% duplication rate) from deck 500 in the preliminary 1982-83 product were printed out for examination. There are questions about data quality and about whether all expected data from deck 500 have been translated into TD-1129. Consideration should be given to including these data in a later update.

Deck 714: Canadian Marine Environmental Data Service (MEDS) Buoys
Rules:
a) For 1980-92, delete any report from deck 714 with position flagged doubtful (i.e., lat/lon flag=3).
b) This rule is identical to rule b) under deck 893, except that it is applied to deck 714.
c) This rule is identical to rule c) under deck 893, except that it is applied to deck 714. d) Starting January 1986, delete any deck 714 report if the third position of the ID falls in the range 0-4 (moored buoys). e) This rule is identical to rule e) under deck 893, except that it is applied to deck 714. Background: General background on MEDS data: The COADS project provided MEDS with NCEP's Office Note 124 (ON124) GTS decode for 1980-January 1986 (ON124 report-type codes 561 and 562, corresponding to moored and drifting buoys, respectively). MEDS quality controlled both the moored and drifting buoy data for 1980-85, as well as MEDS-gathered drifting buoy data for 1986 forward (NOTE: some MEDS-gathered data also were added for the 1980-85 period). For the period starting 1986, MEDS also QC'd data transmitted from moored buoys in the DRIBU or BUOY code, such as PMEL TAO buoys. Detailed background on rules: a) Prior to 1993, most reports with lat/lon flag=3 were flagged because of unrealistic movements in the buoy trajectory as detected during MEDS track checking. Starting in 1993, however, MEDS started flagging large numbers of reports because the position was calculated from a different satellite pass (i.e., the time of the buoy position did not correspond to the time of data), in response to apparent software changes at ARGOS. Starting in 1993, therefore, excessive amounts of data were previously being rejected by application of the lat/lon flag=3 rule. Conversely, however, by non-application of the rule we are now accepting some data with non-trivial position (track check) problems. At present, we do not have enough information in the available MEDS format to determine why data were flagged, but work is underway to estimate the extent of remaining problems, and to design better solutions to these problems. b-d) We use delayed-mode data from PMEL and NDBC, instead of these or other GTS receipts, for the major US moored buoy arrays. 
For simplicity we use NCEP data starting in 1986 for all other/foreign moored buoys, since some may transmit in the SHIP code and thus not be available in MEDS data after 1985. [NOTE: Application of rule d) may retain some moored buoy data with unrecognizable/missing ID fields.] e) (See background under deck 893.) Deck 732: Russian Marine Meteorological Data Set (MORMET) (received at NCAR) Rules: This rule is identical to rule c) under deck 893, except that it is applied to deck 732. Background: (See background under deck 893.) [NOTE: Deck 732 matches with deck 883 were found in 1988 with PT=5 (ship), OS=2 (national telecom), and OP=3 (auxiliary ship).] Deck 780: Levitus World Ocean Database (WOD) Rules: a) Delete any report from deck 780 with sea surface temperature missing. b) Delete any deck 780 report with OCL Platform Code 7503. Background: a) See for background on the application of this rule to the earlier version of deck 780, the World Ocean Atlas (WOA). The situation is similar here, except that the WOD may contain surface meteorological elements in addition to SST. We delete reports lacking SST to eliminate reports containing no data; some reports (estimated ~60K) containing only non-SST data also are removed, which reduces confusion over platform type (PT) (e.g., a report with no SST but labelled PT=12 for XBT). [NOTE: If a suitable estimate of SST was not available from the profile, a "reference temperature," if available, was used for SST, but in that case the SST method indicator (SI) was set to a value (possibly missing) other than 11 or 12. Deck 780 was not available for the initial Release 1a update for 1980-92. WOA data available for the 1990-95 extension of Release 1a covered 1898-1993, and the applicable pre-1980 portion was included in Release 1b.] b) Data from the Global Temperature-Salinity Profile Program (GTSPP) are included in the WOD XBT archive. Some data in the GTSPP are derived from "fixed" platforms, such as the TAO moored buoy array. 
XBT's are not actually used in the TAO array; instead, the upper-ocean thermal structure is measured using a vertical string of thermistors attached below the buoy. Data from the TAO buoys are identified by the OCL Platform Code (right-justified in bytes 95-99 of the supplemental attachment; 69941 reports found during conversion to LMR). We remove these data to avoid possible duplication with regular TAO surface data. Deck 882: US National Data Buoy Center (NDBC) Data (obsolete) Rules: Delete all deck 882 reports. Background: This version of the NDBC data, converted from TD-1129 format, has a variety of format and data problems, and has been replaced by TD-1171 data (deck 883). Deck 883: US National Data Buoy Center (NDBC) Data Rules: Delete any deck 883 report if the third position of the ID falls in the range 5-9 (drifting buoys), applicable only to IDs not starting with "91." Background: MEDS provides a quality controlled set of global drifting buoy data. [NOTE: Since April 1988 NDBC has been quality controlling data from selected drifting buoys in the TOGA research program, and other drifting buoy data from NDBC date back to 1984 at NCDC. This rule will be applied to all years, although drifting buoy data appear to have been included only back to January 1988 in the TD-1171 files provided for Release 1a.] Deck 888: US Air Force Global Weather Central (GWC) Rules: a) Delete any report from deck 888 unless it contains source ID 79 (GWC reconversion). b) Delete any report from deck 888 containing a block number other than "999999" (block number for roving ships and drifting buoys). c) Delete any report from deck 888 containing a drifting buoy ID: "DRIB," or with the fifth position of ID extant (non-blank). d) Delete any report from deck 888 with SHIP, BUOY, RIGG, and PLAT (generic IDs), plus any report with a call sign not starting with an alpha character or less than four (non-blank) characters in length. e) Delete all reports from deck 888 during 1982-85. 
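As a rough illustration, the five deck 888 rules listed above can be applied in order by a single function that reports which rule (if any) removes a report. The function name, argument layout, and the convention that a missing ID is represented by blanks are assumptions made for this sketch; the actual dupelim implementation is not reproduced here.

```python
GENERIC_IDS = {"SHIP", "BUOY", "RIGG", "PLAT"}

def deck888_deletion_rule(sid, block_number, ident, year):
    """Return the letter of the first deck 888 rule that deletes the report,
    or None if the report is retained. Hypothetical sketch only."""
    ident = ident.ljust(5)            # pad so the fifth position can be tested
    call = ident.strip()
    if sid != 79:                     # a) keep only the GWC reconversion
        return "a"
    if block_number != "999999":      # b) roving ships/drifting buoys only
        return "b"
    if call == "DRIB" or ident[4] != " ":   # c) drifting buoy IDs
        return "c"
    first = ident[0]
    nonblank = len(ident.replace(" ", ""))
    if (call in GENERIC_IDS
            or not (first.isascii() and first.isalpha())
            or nonblank < 4):         # d) generic or suspicious call signs
        return "d"
    if 1982 <= year <= 1985:          # e) period with cloud and weather biases
        return "e"
    return None
```

Because the rules are tested in the order a) through e), a report that violates several rules is attributed to the first one; the set of deleted reports does not depend on that ordering.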
Background: a) The reconversion of the GWC data (SID=79) from DATSAV2 format should provide better quality and more complete data. An older version of the GWC data previously was available in Release 1a only for 1980-81. b-d) These three rules are intended to constrain the GWC data input to dupelim to clearly identified roving ship data. We rely on alternative sources for data types other than roving ships (i.e., drifting and moored buoy, OSV, rig/plat, and C-MAN data) for the entire Release 1a update time period. Preliminary runs without rule d) indicated that GWC reports with generic or suspicious IDs frequently matched moored and drifting buoy reports handled under the limited pass through rule for dupelim, hence rule d). It should also be noted that processing problems both internal and external to GWC make deck 888 drifting buoy data prior to 1 November 1988 unusable (further details have not been located about the nature of these problems). [NOTE: GWC ID fields were taken from the DATSAV2 4-character "CALL LTRS" field, except for drifting buoy data which were identified by "DRIB" in that field. In the case of "DRIB," ID was set to five characters extracted from DATSAV2 "remarks," which were expected to contain a 5-digit WMO drifting buoy number, or ID was retained as "DRIB" if the remarks were unavailable. Therefore, drifting buoy data should be identifiable either by DRIB or by an extant fifth ID position (since all other data types have ID limited to four characters). However, we hope to reprocess the GWC data at a future date to retain the DRIB information and/or ensure that a 5-digit buoy number actually is extracted.] e) GWC data during a period after the 1982 WMO Ship Code change are impacted by biases in total cloudiness (in many cases it appears that N>0 has been replaced by N=0) and other complex problems impacting subsidiary cloud fields and present and past weather. [NOTE: GWC data were included in this period in an initial set of 1980-97 data.] 
Deck 890: US National Meteorological Center (NMC, now NCEP) Data (obsolete) Rules: Delete all deck 890 reports. Background: This version of the NMC data, converted from TD-1129 format, has serious data problems, and has been replaced by decks 892-896. Decks 892-896: US Nat. Cntrs. for Environ. Pred. (NCEP) Data (obsolete version) Rules: a) Delete any report from decks 892-896, unless it contains source ID 29 (NCEP reconversion; 1980-92); not applicable after 1992. b) Delete any report from decks 892-896, unless it contains source ID 77 (NCEP reconversion; November 1994-December 1997); applicable only during November 1994-December 1997. Background: a) This reconversion of the NCEP data from Office Note 124 (ON124) format, covering 1980-92, corrects numerous problems. [NOTE: Volumes from a previous (1980-86) reconversion, which were also assigned SID=29 in preliminary Release 1a products, were not included in the selection of volumes to be input for the sort.] b) This later reconversion of the NCEP data from ON124 format, covering (nominally) Nov. 1994-Dec. 1997, is expected to correct problems during this period introduced by NCDC's MOPS processing, e.g., misassignments of data to generic IDs and losses of reported GTS wet bulb temperatures. [NOTE: The Nov. 1994 starting date is nominal because some late October 1994 data appear with SID=77 apparently due to the way that NCEP data are processed in "timeblocks." Overlapping GTS data in ON124 and BUFR format were produced at NCEP for approximately 1 March-19 April 1997, and thereafter ON124 was discontinued at NCEP. NCEP re-formatted BUFR data into a pseudo-ON124 format for archival at NCDC. It is assumed that the period 1 March-19 April is derived from genuine ON124 data, but this should be checked.] Deck 892: NCEP Ship Data Rules: Delete any report from deck 892 with ID set to BOUY. 
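The date-dependent source ID rules for decks 892-896, together with the deck 892 "BOUY" rule, can be sketched as follows. This is a minimal illustration under stated assumptions: the function name and arguments are hypothetical, the report date is reduced to year and month, and the further deck-specific rules for decks 893-896 are not covered.

```python
def delete_ncep_report(deck, sid, year, month, ident=""):
    """Return True if a report from decks 892-896 is deleted under the
    source ID rules a) and b), or under the deck 892 'BOUY' rule.
    Hypothetical sketch; other deck-specific rules are not included."""
    if not 892 <= deck <= 896:
        return False
    if deck == 892 and ident.strip() == "BOUY":
        return True                      # misspelled generic ID
    if year <= 1992:
        return sid != 29                 # a) keep only the 1980-92 reconversion
    if (1994, 11) <= (year, month) <= (1997, 12):
        return sid != 77                 # b) keep only the later reconversion
    return False                         # 1993-October 1994: neither rule applies
```

Note that between January 1993 and October 1994 neither source ID rule applies, so reports from these decks pass through this step regardless of source ID.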
Background: Reports containing this misspelling of the generic ID BUOY were found to match NDBC (deck 883) reports in 1984, 1986, and 1992 data. Deck 893: NCEP Moored Buoy Data Rules: a) Delete any report from deck 893 prior to January 1986. b) Delete any report from deck 893 containing an ID whose first five characters (ignoring any trailing characters) match one of the 84 TAO buoy numbers that may have appeared on GTS during 1985-June 1998 (rule does not test for date): 13008, 13009, 13010, 15001, 15002, 23001, 31001, 32303, 32304, 32305, 32315, 32316, 32317, 32318, 32319, 32320, 32321, 32322, 32323, 43001, 43301, 51006, 51007, 51008, 51009, 51010, 51011, 51012, 51013, 51014, 51015, 51016, 51017, 51018, 51019, 51020, 51021, 51022, 51023, 51301, 51302, 51303, 51304, 51305, 51306, 51307, 51308, 51309, 51310, 51311, 52001, 52002, 52003, 52004, 52006, 52007, 52008, 52010, 52011, 52012, 52301, 52302, 52303, 52304, 52305, 52306, 52307, 52308, 52309, 52310, 52311, 52312, 52313, 52314, 52315, 52316, 52317, 52318, 52319, 52320, 52321, 53001, 53002, 53003. [NOTE: "GTS Data: List of TAO WMO Numbers and Site Location, June 1998" obtained from http://www.pmel.noaa.gov/toga-tao/wmo.html, 29 July 1998.] 
c) Delete any report from deck 893 containing an ID whose first five characters (ignoring any trailing characters) match one of the 138 NDBC moored buoy numbers that may have appeared on GTS during 1980-97: 32301, 32302, 33301, 41001, 41002, 41003, 41004, 41005, 41006, 41007, 41008, 41009, 41010, 41011, 41015, 41016, 41017, 41018, 41021, 41022, 41023, 42001, 42002, 42003, 42005, 42006, 42007, 42008, 42009, 42010, 42011, 42012, 42014, 42015, 42016, 42017, 42018, 42019, 42020, 42025, 42035, 42036, 42037, 42039, 42040, 42107, 44001, 44002, 44003, 44004, 44005, 44006, 44007, 44008, 44009, 44010, 44011, 44012, 44013, 44014, 44015, 44019, 44023, 44025, 44026, 44028, 45001, 45002, 45003, 45004, 45005, 45006, 45007, 45008, 45009, 45010, 45011, 46001, 46002, 46003, 46004, 46005, 46006, 46010, 46011, 46012, 46013, 46014, 46016, 46017, 46018, 46019, 46020, 46021, 46022, 46023, 46024, 46025, 46026, 46027, 46028, 46029, 46030, 46031, 46032, 46033, 46034, 46035, 46036, 46037, 46038, 46039, 46040, 46041, 46042, 46043, 46045, 46047, 46048, 46050, 46051, 46053, 46054, 46059, 46060, 46061, 46062, 46125, 51001, 51002, 51003, 51004, 51005, 51026, 51027, 51028, 52005, 52009. This rule is applied without respect to date, with two exceptions: i) Starting in October 1987, reports from buoy 46036 are not deleted. ii) Starting in July 1988, reports from buoy 46004 are not deleted. d) Delete any report from deck 893 containing ID 44152 (ignoring any trailing characters) and mislocated in 10-degree box number 147 (Caspian Sea) (retain ID 44152 reports in all other 10-degree boxes). e) Delete any report from deck 893 containing an ID whose first two characters are "91" followed by three numeric characters (ignoring any trailing characters). Background: a-c) We use delayed-mode data from PMEL and NDBC, instead of these or other GTS receipts, for the major US moored buoy arrays. NCEP moored (and drifting) buoy data through December 1985 have been quality controlled by MEDS (deck 714). 
For simplicity we use NCEP data in deck 893 starting in 1986 for all other/foreign moored buoys, since some may transmit in the SHIP code and thus not be available in MEDS data after 1985. NDBC buoy numbers 46036 and 46004 were assigned to Canada effective after 22 September 1987 and after 29 June 1988, respectively (for simplicity in preconditioning, whole-month cutoffs were used rather than precise dates, i.e., a total of nine days of data may be lost). d) Reports from Canadian moored buoy 44152 are known to be mislocated in this area in the NCEP archive at least during June-November 1987, but the rule is applied to all time periods since moored buoys with such a number should never appear in the Caspian Sea. e) Beginning around November 1989, IDs for some NDBC Western Pacific C-MAN (Westpac) stations took a numeric form resembling a 5-digit WMO buoy number except beginning with "91" (not legitimate starting digits for a buoy number). Reports from these stations have been sporadically misassigned into moored and drifting buoy datastreams (e.g., decks 714, 893, and 896). [NOTE: These stations were equipped with both a primary and secondary data system. Data from the primary system were received through NDBC's normal satellite communication channels identified by the third digit less than 5 (91nxx), and data from the secondary system from Service Argos in the drifting buoy code identified by the third digit greater than or equal to 5 (91mxx). Data from both systems were quality controlled at NDBC and the best data released in deck 883 under 91nxx. 91mxx data appearing, e.g., in NCEP and MEDS data came directly from Service Argos and were not released by NDBC. Also, it is known that secondary wind directions identified under 91852 are erroneous. NDBC suggests that data from all Westpac stations identified as 91mxx be disregarded.] Deck 894: NCEP Drifting Buoy Data Rules: Delete all deck 894 reports. 
Background: MEDS provides a quality controlled set of global drifting buoy data, as discussed above. [NOTE: During the 1980-92 NCEP reconversion, NCEP data containing report type 562 ("drifting buoys") that did not contain a legitimate drifting buoy number were written out for later study to a separate file, not to be included in Release 1a.] Deck 895: NCEP Coastal Marine Automated Network (C-MAN) Data Rules: Delete all deck 895 reports. Background: NDBC provides a higher quality and more complete set of C-MAN data; this preconditioning step assumes that no other useful data will appear in deck 895. [NOTE: NDBC's C-MAN program is documented to begin in March 1983, and inventories confirm that the TD-1171 C-MAN data indeed start at that date. Earlier data appearing in deck 895 have been determined not to be from lightships as was previously thought and the quality of such data was found to be extremely poor. Therefore, as part of the 1980-92 NCEP reconversion, all data prior to March 1983 that would ordinarily go into deck 895 were written out for later study to a separate file, not to be used for Release 1a.] Deck 896: NCEP Miscellaneous (OSV, plat, and rig) Data Rules: This rule is identical to rule e) under deck 893, except that it is applied to deck 896. Background: (See background under deck 893.) Deck 926: International Maritime Meteorological (IMM) Data Rules: a) Canadian correction: Delete all deck 926 reports during 1982 with the US country code (C1=2). b) French correction: Delete all deck 926 reports during 1980-88 containing the French country code (C1=4), confined to the region 90E-90W across the dateline, except retain reports from the French correction tape: both those mislocated (SID=58; removed at dupelim processing termination), and those properly located (SID=60). c) UK correction: delete all deck 926 reports during 1982-89 containing the UK country code (C1=3), except retain all reports from the UK correction tapes (SID=59). 
d) Dutch deletions: Only for 1987 delayed receipts (SID=35), delete all deck 926 reports during 1982 containing the FRG country code (C1=21), which also contain the Dutch secondary country code (C2=0) or which have the secondary country code missing. [NOTE: The bad Dutch tape volume appears to be I3ZG41, which was omitted from the update data.] e) Delete all deck 926 reports with the US country code (C1=2) and with IMM receipt date (IRD) November 1994 or later (regardless of data date). f) Delete all deck 926 reports during 1996-97 with SID=32. Background: a) A replacement tape was received from Canada (tape volumes D6ZZ48 and I3ZG01) to correct erroneous country codes and wet bulb temperatures that were misinterpreted as dew point temperatures due to incorrect settings in the sign position. This rule may have the unintended effect of eliminating any actual US-keyed data sent to other countries and then for some reason sent back. However, this probably would be harmless or beneficial. b) See for general background on the problem. During past Release 1a updates, all deck 926 (logbook) reports during 1980-88 containing the French country code (C1=4) were deleted, confined to the region 90E-90W across the dateline. Since the problem was poorly understood at the time, this approach simply removed all French logbook data, including the correction data received from France; instead we relied on the assumption that GTS receipts would be available since 1980. For this update, we are using a different approach: Two copies of the French correction tape have been included as part of the input data: the SID=58 copy with mislocated longitudes, and the SID=60 copy with properly located longitudes. SID=58 is deleted at completion of dupelim (ref. Table 1), after comparison with other duplicates. This allows us to assess whether this report deletion rule was successful in removing all or the majority of the mislocated data. 
c) Replacement data were received from the UK to correct erroneous SST measurement method indicators. The US requested "a repeat of all British data, amended where necessary for the SST indicator" to allow a complete replacement. d) Dutch doubled wind speeds. IMMT data were received from the Dutch containing German ship data with doubled wind speeds; the 1982 period is known to be affected. Later, a correction tape was received from the Dutch. We believe that the erroneous tape is I3ZG41, but it has not been possible to determine if the original correction tape still exists in IMMT format, since any correspondence also was lost. Erroneous data were translated into TD-1129 and appear in the 1987 delayed receipts (SID=35); it is assumed that no other source ID is affected. Presumably the corrected data were also translated, but the resulting source ID is unknown (IMMT data are now tracked by secondary country code and receipt date). Therefore, corrected Dutch or, notably, original German receipts without the secondary country code that also happen to reside within SID=35 will be deleted together with the erroneous data. In any case, lost data hopefully will appear in GTS receipts. This problem should be looked into more carefully after Release 1a. e) WMO's new Global Collecting Centres (GCC) for centralized exchange of International Maritime Meteorological (IMM) ship data, located in Germany and the UK, became operational around 1994, with the first GCC tape received by NCDC in November 1994. The GCC tapes contain data received from all countries including US-recruited ship data provided by NCDC. Starting about November 1994, US-recruited data as provided by NCDC to the GCCs were transitioned to be almost entirely ship data from the Global Telecommunications System (GTS), instead of keyed ship logbook data as was the case previously. 
This change was due to the initiation of NCDC's new Marine Observations Processing System (MOPS), which essentially discontinued keying except to the extent that data were not transmitted over GTS [NOTE: keying was apparently reinitiated during sporadic time periods, e.g., ~1997, due to questions about GTS data quality]. However, we receive the GTS and any US-recruited logbook data in more original form via decks 892 (NCEP) and 927 (US-keyed). This rule is therefore intended to ensure that only data received from other countries by the GCCs are input to dupelim from deck 926 (current dupelim rules favor logbook over GTS data under the assumptions that IMM data generally are from countries other than the US, and of higher quality than GTS receipts). f) The new 131-character IMMT format changed from octant to quadrant effective starting with data received through the UK GCC for the third quarter of 1996 (data received July 1996 forward). But NCDC's program to convert from IMMT to TD-1129 was not promptly updated to reflect this format change, and as a result misinterpreted quadrant as octant for an unknown period of time. This problem was clearly evident on the 1997 delayed tape D17ZZ12 (data received during 1997 for previous years), and thus that tape was not used for this update (but SEAS data from D17ZZ12 were extracted and placed onto a new volume D17ZZ21). There is some evidence that the error may also impact annual receipts (SID=32) for 1996-97, hence this rule. Deck 927: International Marine (US-keyed ship data) Rules: Delete any report from deck 927, unless it contains source ID 78 (US-keyed Logbook Data Reconversion; data keyed during 1996-97), applicable only during 1996-97. Background: This reconversion of the US-keyed data is expected to correct problems during 1996-97 apparently introduced by NCDC's MOPS processing, including possible modification or deletion of the sea surface temperature method indicator (SI). 
Tests have shown that dupelim cannot be relied upon to select SID=78 over other sources of deck 927, thus this rule is needed to ensure retention of SID=78. [NOTE: Delayed data in SID=78 extend back as far as 1987, but the amount of data prior to 1995 is relatively small. Nonetheless, a reconversion, and some keying or re-keying, of deck 927 back several years may be desirable for a future update to obtain more complete and better quality data.] Source IDs 73-76: NCDC Marine Obs. Processing System (MOPS)-Related SIDs Rules: Delete all reports from SIDs 73-76. Background: These SIDs represent various redundant datastreams from MOPS, included in the data input to dupelim to enable intercomparisons of the MOPS datastreams. However, all the final MOPS data should be available in SID=32 (plus delayed receipts in SIDs=41-45). Dupelim cannot automatically be relied upon to retain the final MOPS data in preference to these other MOPS datastreams, thus this rule is applied. [NOTE: Due to many concerns about MOPS processing, most SID=32 data since Nov. 1994 also are deleted in favor of reconverted NCEP (decks 892-896) and US-keyed (deck 927) data under SIDs=77-78, as discussed under those decks above.] {2.2 Field modifications} Field modifications take the form of deleting, modifying, or adding a field. (see sec. 3 for the rules used for setting the LMR fields for platform type and ID indicator). Erroneous data values already stored in the error attachment of LMR may or may not be affected by preconditioning, as specified under the description of each set of preconditioning rules. In addition, deleted data fields, for example, are not written out to the error attachment. These actions are important to note for any user of the error attachment, in case of unexpected effects. All decks Rules: a) Left-justify ID, with missing right fill. This rule is only applied if leading missing positions are not associated with data in the error attachment. 
Any ID characters in the error attachment are similarly shifted to ensure that, e.g., they will assume the proper position in a printout.

b) Compute a missing dew point temperature if WBT and AT are extant; if SLP is missing 1015.0 is used as SLP. This rule is not applied if any of the data used for computation of DPT (i.e., SLP, AT, WBT, or T2) are in the error attachment, or if WBT is greater than AT. Constants ACON and BCON are set for computation of DPT relative to water: ACON=7.5 and BCON=237.3. The following Fortran code is then used to attempt computation of DPT:

      ESW = 6.1078*10.**(WBT*ACON/(WBT+BCON))
      E = ESW-(.00066*SLP)*(((.00115*WBT)+1)*(AT-WBT))
      IF(E.LT.0.) RETURN
      CCON = ALOG10(E/6.1078)
      DPT = BCON*CCON/(ACON-CCON)

where the RETURN if vapor pressure (E) is less than zero leads to an error diagnostic, and otherwise the resulting DPT is rounded to the nearest 0.1°C. To indicate that this calculation has taken place during preconditioning, T2 is set to 3, 4, 5, or 6, depending on whether the previous value of T2 was missing, 0, 1, or 2, respectively.

b') During January-May 1988, an amplification of this rule applies only to reports from deck 927 (International Marine; US-keyed ship data), and furthermore confined to reports containing a QC year and month of "8812" (original TD-1129 positions 141-144, assigned by NCDC, and available in Attm4): First, a test is made for equal temperatures (AT = WBT = DPT), in which case both WBT and DPT are deleted. Otherwise the contents of DPT are transferred into WBT, regardless of whether either quantity is missing, extant, or in the error attachment (i.e., any existing WBT data are deleted and replaced by the contents of the DPT field, and then any existing DPT data are deleted). Next the procedure outlined in b) above is followed to recompute DPT and set T2 (T2 should always have been missing in deck 927 before recomputation).
If recomputation fails or is not attempted due to variables in the error attachment, DPT is left missing and T2 unchanged.

b'') An amplification of this rule applies only during 1992-October 1994 to reports from deck 927 (International Marine; US-keyed ship data) with DPT extant (i.e., present and not in the error attachment): First, DPT is set to missing, but the "old" DPT is saved in a temporary variable for later comparison. Then the procedure outlined in b) above is followed to recompute DPT and set T2 (T2 should always have been missing in deck 927 before recomputation). If recomputation fails or is not attempted due to variables in the error attachment, DPT is left missing and T2 unchanged. Otherwise, compare the recomputed DPT to the old DPT, with different diagnostics issued depending on the amount of any difference.

c) Recover country code (C1) characters 00-40 from Attm5 by accepting an 11 overpunch over a 0-9 in either or both character positions (i.e., "}" and J-R), plus "{" (12 overpunch over zero).

Background:
a) Although some input formats such as TD-1129 are documented to have ID left-justified, errors may have occurred (note that blanks were translated into missing during conversion into TD-1129). After this step, leading positions set to missing in the ID array should correspond to erroneous (or out-of-range) characters appearing in the error attachment.

b) This step prepares for statistics by calculating DPT where it would otherwise be unnecessarily missing (see for background).

b') Errors occurred in computation by NCDC of deck 927 wet bulb and dew point temperatures, primarily impacting January-May 1988 data, and apparently confined to reports with QC year-month "8812" (additional deck 927 DPT errors are known to impact 1992-93; see rule b''). Based on indirect evidence, it appears that the January-May 1988 errors resulted from three erroneous processes:
i) Substitution of the original keyed WBT value into the DPT field.
   ii) Calculation of an erroneous WBT using the spurious contents of the DPT field (actually WBT).

   iii) Recalculation of DPT from the erroneous contents of the WBT field according to a flawed formula. [NOTE: Questions may still exist because we were unable to identify a uniform recalculation bias.]

This rule will recompute DPT using DPT in place of WBT, assuming that the contents of the DPT field provide the best available approximation within deck 927 of the original keyed WBT value during the January-May 1988 period. Evidence for process iii) is that the contents of the DPT field typically failed to match WBT observations recorded on the logbook forms (based on spot checks). In general, due to substitution of WBT into the DPT field, resulting WBT and DPT values were higher, and wet bulb and dew point depressions (AT - WBT and AT - DPT) were lower, than observed. Another aspect of the problem appears to be excessive frequency of equal (saturation) temperature values (AT = WBT = DPT). Note that it will not be possible to ascertain from T2 whether DPT was computed under rule b) in the event it was missing in the original deck 927 report, or recomputed or deleted (in the event of a computation failure) under rule b').

b'') See http://www.cdc.noaa.gov/coads/corrections.html for general background on the DPT computational errors at NCDC that were originally addressed by this rule as part of the Release 1a 1992-93 extension. We are unaware of similar errors prior to 1992, but the errors likely extend at least into early 1994 in deck 927. Since WBT is believed to be generally correct in deck 927, this extends the recomputation and diagnostic checks through October 1994. The October 1994 date has been chosen to (approximately) correspond to initiation of the 2 November 1994 WMO SHIP code change, which permitted the reporting over GTS of DPT and/or WBT, with related changes made around then to U.S.
ship logbook forms and to NCDC keying and data handling under the new Marine Observations Processing System (MOPS). Although MOPS is believed to have addressed the DPT errors in deck 927, due to the use of different algorithms for computation of DPT by MOPS versus ICOADS, less-significant computational biases with respect to ICOADS procedures appear to extend throughout 1994-95.

c) See for a description of overpunches. This is a larger set of overpunch combinations than was accepted during conversion from LMR5 to LMR6 (ref. ).

All decks except 883

Rules: Delete wave direction (WD) from all reports from all decks except 883 (NDBC). This rule is not applied to data in the error attachment, and the following comparison is made with wind direction (D) as a part of this check: WD values 1-36 must equal D/10, or WD=0 must correspond to D=361, or WD=38 must correspond to D=362. If WD does not match D exactly as stated, print a diagnostic message followed by a listing of the report, but still delete wave direction.

Background: Effective 1 January 1968 wave direction was no longer ordinarily reported by ships. However, NCDC substitutes wind direction into missing wave direction. [NOTE: Release 1, p. J4 says that QC has been changed to make a temporary substitution of wind direction into wave direction, so that this rule should have no effect on QC results. This rule should affect only data converted from TD-1129. We should add LMR documentation about when and why the wave direction is expected to be missing in ship data, and how wave directions (and other wave fields) reported from buoys differ from ship data.]

Deck 144: PMEL TOGA/TAO Buoys

Rules: Delete data elements wind (speed and direction), sea surface temperature, and air temperature, which are flagged 0, 4, or 5 by PMEL (applicable to reports assigned to source ID 86 or 87). This rule applies to data in the error attachment.

Background: See the rules for deck 144 in sec. 2.1 for definitions of the flags.
Data elements flagged 0 and 5 should already be missing.

Decks 201-255: UK Met. Office (UKMO) Main Marine Data Bank (MDB)

Rules: Delete all extant past weather (W1) data from decks 201-255, applicable also to data in the error attachment.

Background: Based on examination of patterns of rejected W1 fields during this time period, present weather (WW) data may inadvertently have been placed into the MDB flatfile field intended for W1 (e.g., rejected W1 values ranging up to 99, with ~50% rate of rejection). W1 data are deleted to avoid misinterpretation (since the meanings of WW=00-09 and W1=0-9 differ) and since W1 is used in dupelim.

Deck 732: Russian Marine Meteorological Data Set (MORMET) (received at NCAR)

Rules:

a) Delete an extant station/weather indicator (IX) from any deck 732 report. This rule is applied to data in the error attachment.

b) Delete an ID consisting of "000000".

c) Starting in 1975, delete extant country code (C1), applicable to data in the error attachment.

d) Starting in 1975, delete extant observation source (OS) and observation platform (OP), applicable to data in the error attachment.

e) Delete extant sea surface temperature method indicator (SI), applicable to data in the error attachment.

Background:

a) (See background under deck 926.)

b) Apparently meaningless IDs of "000000" are frequently present (mainly before 1982).

c)-e) (See .

----------

The special deck rules listed in Table 1 are described below in order of precedence, i.e., the [a] rules take precedence over the [b] rules in the event of an [a]/[b] match (error diagnostic messages may be suppressed due to print volume considerations, but a summary of the number of diagnostics issued for each rule is printed):

[a] Absolute pass through. These data should be duplicate free, and not available from any other source. Matches within this deck are ignored (all data are passed through).
In the event of a match of a report from this deck with a report from any other deck (including a different [a] deck), the match is ignored, but dupelim generates an error diagnostic followed by a listing of the matching reports with an indication of the type of match that occurred but was not acted upon (a small amount of matching may be expected, e.g., for ships servicing PMEL stations).

[b] Limited pass through. These data should be duplicate free, but may occasionally be available from other sources (e.g., where preconditioning failed due to earlier deck misassignment, etc.). Matches within this deck are ignored (all data are passed through). When a report from this deck matches a report from any other deck, dupelim generates an error diagnostic followed by a listing of the matching reports. If the two matching reports are from different [b] decks, dup selection is resolved according to the default rules (i.e., quality code, priority code, and sort order). Otherwise, the report from the [b] deck is automatically selected unless the following two conditions are met:

   i) the non-[b] deck has platform type indicating ship (PT=5), and

   ii) dup status for the match is less than 9.

If both conditions are met, the match is ignored (both reports are passed through and no error diagnostic is generated).

[NOTE: Examination of preliminary diagnostic outputs showed cases where deck 883 (NDBC buoy or C-MAN) or deck 714 (MEDS) data matched other decks, e.g., a ship report colocated at a buoy or C-MAN station location, or GTS receipts of NDBC data misassigned to deck 892 (with a C-MAN ID, or "BOUY"), with dup selection often in both situations resolved in favor of the non-NDBC data. The stated rules are intended to assure that no [b] reports are deleted except if matches occur between different [b] decks, and that in most cases genuine ship data colocated with a buoy or C-MAN station are retained, but misassigned GTS receipts of buoy and C-MAN data are not.]
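
The [b] selection logic described above can be sketched in Python as follows. This is only an illustrative reading of the rule text, not the dupelim code: the function name, the action labels, and the example deck set are hypothetical (Table 1 defines the actual [b] decks), the precedence of [a] decks is not modelled, and the absence of a diagnostic for matches within a single [b] deck is an assumption.

```python
from typing import FrozenSet, Tuple

SHIP_PT = 5  # platform type indicating ship, per condition i) of the [b] rule


def resolve_b_match(b_deck: int, other_deck: int, other_pt: int,
                    match_ds: int, b_decks: FrozenSet[int]) -> Tuple[str, bool]:
    """Return (action, diagnostic_issued) for a match involving a [b] report.

    action is one of (labels are hypothetical):
      "pass_both"     - match ignored, both reports passed through
      "default_rules" - resolve by quality code, priority code, sort order
      "select_b"      - the [b] report is automatically selected
    """
    if b_deck not in b_decks:
        raise ValueError("first report must come from a [b] deck")
    if other_deck == b_deck:
        # Matches within a [b] deck are ignored; assumed to be silent.
        return ("pass_both", False)
    if other_deck in b_decks:
        # Two different [b] decks: fall back to the default rules.
        return ("default_rules", True)
    if other_pt == SHIP_PT and match_ds < 9:
        # Both conditions i) and ii) met: keep both, no diagnostic.
        return ("pass_both", False)
    return ("select_b", True)


# Example deck set assumed for illustration only.
EXAMPLE_B_DECKS = frozenset({714, 883})
print(resolve_b_match(883, 892, other_pt=5, match_ds=8, b_decks=EXAMPLE_B_DECKS))
print(resolve_b_match(883, 892, other_pt=5, match_ds=11, b_decks=EXAMPLE_B_DECKS))
```

The two calls contrast a ship report matching below dup status 9 (both reports kept) with a time/space/ID match against the same deck (the [b] report selected), which is the distinction the NOTE above is concerned with.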
[c] Special rules for IMM data. For matches between a report from deck 926 and any other deck, the default rules hold. For matches within deck 926, dup selection is determined solely by source ID, such that the report with the largest SID is selected (or the default rules apply within a SID). If, however, a report that has been selected solely because of its higher SID also has a quality code that is higher (i.e., inferior quality) by 6 or more quality points, then the report with the lower SID is selected instead. In this case, dupelim generates an error diagnostic followed by a listing of the two reports (due to print volume considerations, error diagnostics for this rule are printed only when they occur among the selected duplicates that may be automatically listed for a given run).

[NOTE: This will select the IMM reconversion (SID=46-47) over any matching TD-1129 report that was previously converted from IMMT, all of which should reside in source IDs less than 46. Similarly, this will select the UK (SID=59) correction data in preference to anything else from deck 926, including SID=46-47 (in theory, all of the matches to SID=59 will be eliminated during preconditioning). The comparison of quality codes is intended to avoid problems that might arise since we are not guaranteed that a match will occur between two reports that were converted from the same original IMMT report that arrived at NCDC, due to all the "forwarding" of data by different countries that seems to have occurred. Otherwise, for example, a SID=46 report with very little data or data problems might get selected over a TD-1129 report with more data. However, the difference in quality code value must also be selected so as to overcome the effect of "manufactured" data in TD-1129 reports.]

[k] Automatic data rejection.
Some data known to be available from other sources, and some incorrectly located reports, were deliberately introduced into the datastream (e.g., to test for the presence of the French longitude problem). In the event of a match of a [k] report with any other report (including another [k] report), the match is ignored. Following such testing for all possible duplicates with [k] reports, all [k] reports are automatically deleted from the dupelim output.

[NOTE: Match results appear in dupelim summary Tables R5-R6, such that % BEST indicates selection under other rules supposing the automatic deletion rules were not in effect. Regarding diagnostic messages that appear in the dupelim output: [k] reports appear only in absolute pass through diagnostics; not in any other diagnostics.]

[z] Non-selection except for unique reports. Matches within this deck are resolved by selection according to the default rules. Matches of a report from this deck with a report from any other deck automatically result in the non-[z] report being considered the best duplicate; thus under these circumstances [z] reports are deleted unless they are unique (or uncertain duplicates). Preliminary work has demonstrated that much of the 1980-92 data from this deck are available at higher quality from other sources, but this deck is still believed to contain some unique data.

e) Revisions for dup status (DS)

Following are revised configurations proposed for the LMR field for dup status (DS); DS settings marked by asterisks are retained for compatibility with Release 1 data (* applied to 1854-1979 data; ** applied to 1854-1969 data):

     0 = unique
     1 = best duplicate
     2 = best duplicate with substitution
    *3 = worse duplicate, uncertain: weather elem. match with hour cross
     4 = worse duplicate, uncertain: weather elem. match with no cross
   **5 = worse duplicate, uncertain: weather elem. match with day cross
     6 = worse duplicate, uncertain: time/space match with ID mismatch
    *7 = worse duplicate, uncertain: weather elem. match with hour cross
     8 = worse duplicate, certain: weather elem. match with no cross
     9 = worse duplicate, certain: combined DS 4 and 6
    10 = worse duplicate, certain: combined DS 6 and 8
    11 = worse duplicate, certain: time/space/ID match
    12 = worse duplicate, certain: combined DS 4 and 11
    13 = worse duplicate, certain: combined DS 8 and 11
    14 = automatic data rejection (reports subject to [k] rule)

[NOTE: DS settings 6 and 7 were redefined (as listed above) for Releases 1a and 1b, and DS settings 8-14 added; reports with the following DS values (as previously defined) were not output for Release 1 (see Release 1, supp. K, p. K26):

     6 = worse duplicate, certain with hour cross
     7 = worse duplicate, certain with no cross

Also, DS setting 7 has been further redefined for Releases 1a and 1b in classifying it as uncertain so that it would be output.]

f) Revisions for dup check (DC)

The dup check (DC) field retains the same configurations defined for Release 1, except that GTS and "logbook" data are defined according to Table 1 (above); also, DC may be set if two reports match under either the time/space checks or the weather element checks (certain or uncertain):

     0 = GTS and logbook (or delayed mode) match with SLP and SST match
     1 = GTS and logbook (or delayed mode) match without SLP and SST match
     2 = not GTS and logbook (or delayed mode) match

[NOTE: For a future update we should reassess whether this flag is configured optimally for the current data mixture, since the original intent of the flag was to record the extent of ship logbook/GTS matches. Having a mixture of buoys and other platform types complicates matters, as does having a mixture of GTS and logbook data in the Russian deck.]
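
For readers working with the DS and DC fields programmatically, the two code tables above can be captured in a small Python lookup. This is a sketch only: the names DS_MEANING, ds_certainty, and dup_check are hypothetical, whether a pair of reports counts as a GTS/logbook pair is taken as an input (Table 1 governs that classification), and treating DC=0 as requiring both SLP and SST to match is an assumption about the wording of the table.

```python
from typing import Optional

# DS settings as listed above (Release 1a/1b definitions).
DS_MEANING = {
    0: "unique",
    1: "best duplicate",
    2: "best duplicate with substitution",
    3: "worse duplicate, uncertain: weather elem. match with hour cross",
    4: "worse duplicate, uncertain: weather elem. match with no cross",
    5: "worse duplicate, uncertain: weather elem. match with day cross",
    6: "worse duplicate, uncertain: time/space match with ID mismatch",
    7: "worse duplicate, uncertain: weather elem. match with hour cross",
    8: "worse duplicate, certain: weather elem. match with no cross",
    9: "worse duplicate, certain: combined DS 4 and 6",
    10: "worse duplicate, certain: combined DS 6 and 8",
    11: "worse duplicate, certain: time/space/ID match",
    12: "worse duplicate, certain: combined DS 4 and 11",
    13: "worse duplicate, certain: combined DS 8 and 11",
    14: "automatic data rejection (reports subject to [k] rule)",
}


def ds_certainty(ds: int) -> Optional[str]:
    """Return "uncertain", "certain", or None for non-duplicate settings."""
    text = DS_MEANING[ds]
    if "uncertain" in text:  # test first: "uncertain" contains "certain"
        return "uncertain"
    if "certain" in text:
        return "certain"
    return None


def dup_check(gts_logbook_pair: bool, slp_match: bool, sst_match: bool) -> int:
    """Return the DC setting for a matched pair of reports.

    Assumption: DC=0 requires both SLP and SST to match.
    """
    if not gts_logbook_pair:
        return 2
    return 0 if (slp_match and sst_match) else 1


for code in (4, 8, 14):
    print(code, ds_certainty(code), "-", DS_MEANING[code])
```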
g) Substitutions between duplicates

Rules: Starting in 1982, a station/weather indicator (IX) may be substituted from one report containing IX into another matching report that does not contain IX, subject to the following restrictions:

   i) Substitution may take place only among reports from decks 889, 892, 896, 926, and 927.

   ii) The two reports must match as "certain" duplicates without an hour cross (DS=8 or greater), such that the report to receive IX qualifies for DS=1, which is changed to DS=2 to indicate that substitution has taken place (thus reports with DS already set to a higher value as the result of an earlier match do not receive IX).

   iii) Only IX values indicating a manned station (IX=1-3) are substituted.

   iv) This rule is not applied if the report intended to receive IX has an IX, present weather (WW), or past weather (W1, W2) value in the error attachment.

   v) If IX=1 (weather data included), at least one of WW, W1, or W2 must be extant in the report intended to receive IX. Conversely, if IX=2 or 3 (weather data omitted because of no significant weather, or weather data not observed), all of WW, W1, and W2 must be missing in the report intended to receive IX.

Background: WMO added the station/weather indicator (IX) to the GTS SHIP code on 1 January 1982, but IX was not added to the IMMT format until March 1985, and is still omitted from IMMT data from some countries due to documentation problems. Furthermore, IX was not implemented in the NCEP format until 9 May 1984.

[NOTE: This rule is limited to the period starting in 1982, although it is possible that some countries may have supplied delayed data dating back before 1982 in the March 1985 version of the IMMT format. In addition, this rule does not handle IX values greater than 3 that may originate from automated platforms, because we expect these to be very infrequent.
Regarding implementation, if a report that had a previous substitution as indicated by DS=2 is subject to a change to a higher dup status as the result of a subsequent match, the previously substituted IX value is deleted after possible substitution into the newly chosen best report. This means that only reports with DS=2 will contain a substituted IX upon completion of dupelim, but that IX values may propagate from one report to another.]

[NOTE: Due to frequent disparities between GTS and logbook data, some more generalized form of report compositing may be highly desirable in the longer term, involving fields such as the following:

     ID/call sign
     Sea surface temperature method indicator (SI)
     Wind speed indicator (WI)
     Ship course and speed (SC, SS)
     Barometric tendency, amount of pressure change (A, PPP)

In planning any such future substitutions, an issue that needs to be carefully considered is whether the report should be re-QC'd each time a substitution takes place, and the quality code recomputed (IX is not considered during QC, thus changing it has no effect on the quality code). Note that if the quality of a report were to improve during one match, this could influence the results of subsequent matches with other reports.]
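
Restrictions i)-v) of the IX substitution rule in sec. g) can be read as a single eligibility test. The Python sketch below is one possible reading, not the dupelim implementation: the Report class, its field names, and can_substitute_ix are hypothetical, missing values are represented by None, and the ds field is taken to mean the dup status the receiving report qualifies for as a result of the match. The later deletion of a substituted IX when a report's dup status is raised is not modelled.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Decks among which IX substitution is permitted (restriction i).
IX_DECKS = frozenset({889, 892, 896, 926, 927})


@dataclass
class Report:
    deck: int
    ds: int = 1                      # dup status the report qualifies for
    ix: Optional[int] = None         # station/weather indicator
    ww: Optional[int] = None         # present weather
    w1: Optional[int] = None         # past weather
    w2: Optional[int] = None         # second past weather
    error_fields: FrozenSet[str] = frozenset()  # fields in error attachment


def can_substitute_ix(year: int, donor: Report, recipient: Report,
                      match_ds: int) -> bool:
    """Return True if the donor's IX may be copied into the recipient."""
    if year < 1982:
        return False
    if donor.deck not in IX_DECKS or recipient.deck not in IX_DECKS:
        return False                                   # restriction i)
    if match_ds < 8 or recipient.ds != 1:
        return False                                   # restriction ii)
    if donor.ix not in (1, 2, 3) or recipient.ix is not None:
        return False                                   # restriction iii)
    if recipient.error_fields & {"IX", "WW", "W1", "W2"}:
        return False                                   # restriction iv)
    weather = (recipient.ww, recipient.w1, recipient.w2)
    if donor.ix == 1:
        return any(v is not None for v in weather)     # restriction v), IX=1
    return all(v is None for v in weather)             # restriction v), IX=2-3


gts = Report(deck=892, ix=1)
logbook = Report(deck=926, ww=2)
if can_substitute_ix(1985, gts, logbook, match_ds=11):
    logbook.ix, logbook.ds = gts.ix, 2  # DS=1 becomes DS=2 on substitution
print(logbook)
```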