Overview of ICOADS Duplicate Elimination Procedures

This webpage provides a brief overview of the duplicate elimination procedure and related processing steps (collectively referred to as "Dupelim") used to blend varied historical data sources into ICOADS delayed-mode (DM) updates. The Appendix provides further details on current DM procedures, as well as on the simplified procedures used to create real-time (RT) "preliminary" updates based on Global Telecommunication System (GTS) receipts.

A fundamental contrast exists between the processing of marine data and the processing of land (temperature-only) data. Specifically, Dupelim operates largely on data organized by "marine report," i.e. the collection of meteorological/oceanographic observations, such as air and sea surface temperatures (AT and SST) and sea level pressure (SLP), made by a given marine platform (e.g. ship or buoy) at a specific date/time and geographic location.
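To make the notion of a "marine report" concrete, the following is a minimal sketch of how one such report might be represented. The class name, field names, and the choice of optional fields are illustrative assumptions and do not reflect the actual ICOADS record format.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class MarineReport:
    """Hypothetical container for one marine report (not the ICOADS format)."""
    platform_id: str             # ship or buoy identifier (assumed field)
    time: datetime               # observation date/time
    lat: float                   # latitude, degrees north
    lon: float                   # longitude, degrees east
    at: Optional[float] = None   # air temperature; None if not observed
    sst: Optional[float] = None  # sea surface temperature
    slp: Optional[float] = None  # sea level pressure

    def key(self):
        """Platform, time, and location that together identify the report."""
        return (self.platform_id, self.time, self.lat, self.lon)
```

All observations in the report share the same platform, time, and position, which is why the report, rather than the individual variable, is a natural unit for duplicate checking.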

The Dupelim procedure used for the most recent DM update of ICOADS is divided into several separate steps (e.g. sorting and QC-flagging input data, and preparing the final user output), as described in more detail in the Appendix. The two major steps are "Preconditioning," in which suspect reports are deleted and, in some cases, fields are modified/corrected, followed by the core "Dupelim" step, in which duplicate reports are flagged for later elimination.

The core Dupelim procedure considers reports within the same 1°x1° box and, only for earlier historical data, within plus or minus one hour or day, as possible duplicates. It checks seven weather elements (wind speed, visibility, present weather, past weather, SLP, AT, and SST) to determine the degree to which reports are duplicated ("dups"). These checks include "allowances," under which values are considered to match in some circumstances even though they are not exactly equal.
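The matching logic described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the dictionary keys, the tolerance values in ALLOWANCES, the integer "hours" time field, and the treatment of missing values are all hypothetical, and the actual allowances and time-window rules are those documented in the Appendix.

```python
import math

# The seven weather elements named in the text; the key names are illustrative.
ELEMENTS = ("wind_speed", "visibility", "present_weather", "past_weather",
            "slp", "at", "sst")

# Hypothetical allowances (absolute tolerances); not the real Dupelim values.
ALLOWANCES = {"slp": 0.5, "at": 0.5, "sst": 0.5}


def box_1deg(lat, lon):
    """Index of the 1-degree box containing the position."""
    return (math.floor(lat), math.floor(lon))


def element_matches(name, a, b):
    """True if two values agree within the (assumed) allowance for the element."""
    if a is None or b is None:
        return False  # assumption: a missing value never counts as a match
    return abs(a - b) <= ALLOWANCES.get(name, 0.0)


def count_matches(r1, r2):
    """Number of the seven weather elements on which two reports agree."""
    return sum(element_matches(n, r1.get(n), r2.get(n)) for n in ELEMENTS)


def is_candidate_pair(r1, r2, max_hours=0):
    """True if two reports share a 1-degree box and fall within the time window.

    'hours' is an assumed integer hour count; max_hours is a hypothetical
    parameter standing in for the time allowance.
    """
    same_box = box_1deg(r1["lat"], r1["lon"]) == box_1deg(r2["lat"], r2["lon"])
    return same_box and abs(r1["hours"] - r2["hours"]) <= max_hours
```

In a sketch like this, the count returned by count_matches would stand in for the "degree" of duplication: the more elements that agree, the more confidently a candidate pair could be treated as duplicates.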

A quality code, computed by a marine QC procedure originally designed by NCDC, generally serves as the initial basis for selecting one duplicate report over another. Additionally, priority codes are available if two reports match with equal quality codes (e.g. generally favoring delayed-mode ship/buoy reports over GTS receipts), together with a few special rules that can be set and applied to selected data sources. These rules can shield known high-quality sources from potential negative impacts, e.g. passing a high temporal resolution data source known to be internally unique through Dupelim without elimination of any apparent internal/external duplicates.
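The selection order described above (quality code first, then priority code, with special rules for shielded sources) might be sketched as below. The class and function names, the convention that lower codes are better, the tie-breaking rule, and the simplified handling of shielded sources are assumptions made for illustration; they are not taken from the ICOADS processing codes.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    """Hypothetical summary of a report that takes part in a duplicate match."""
    report_id: str
    quality: int             # assumption: lower value means better quality
    priority: int            # assumption: lower value means preferred source
    protected: bool = False  # assumption: source shielded by a special rule


def resolve(a: Candidate, b: Candidate) -> Tuple[List[Candidate], List[Candidate]]:
    """Return (kept, flagged) for a pair of apparent duplicates."""
    # Simplified special rule: if either report comes from a shielded source,
    # nothing is flagged for elimination.
    if a.protected or b.protected:
        return [a, b], []
    # Quality code decides first; priority code breaks ties in quality.
    if (a.quality, a.priority) <= (b.quality, b.priority):
        return [a], [b]  # assumption: on a full tie, the first report is kept
    return [b], [a]
```

For example, two reports with equal quality codes would be separated by priority alone, so a delayed-mode report given a more favorable priority code would be kept over the matching GTS receipt.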

Thus far, ICOADS has not generally made processing codes available, except for libraries and utilities intended for common public use. Looking to the future, however, we recognize the need (as resources permit) to develop software in other languages and to make the final processing codes publicly available so that they are fully open and transparent (see Appendix for further details).

