=============================================================================== International Comprehensive Ocean-Atmosphere Data Set (ICOADS): Release 2.4 Monthly Summary Groups (MSG1) 22 September 2007 ========================================================================== Document Revision Information (previous version: 27 February 2004): Updates (heading only) for Release 2.4. ------------------------------------------------------------------------------- {1. Introduction} This document provides a technical description of the Monthly Summary Groups (MSG) format. This format is designed to store both 1-degree, and 2-degree, latitude x longitude monthly summaries (the format also has the capability, unused at present, to store 0.5-degree data). MSG products are currently available covering the global domain (1-degree and/or 2-degree boxes) and an equatorial domain (1-degree), with 1-degree products available only for 1960 forward. The boundaries of the global 2-degree boxes fall on even degrees of latitude and longitude (e.g., 0-2E, 88-90N). The 2-degree box system is the same as LLN2F1 as described in Release 1, supp. G, except omitting the polar boxes (16,200 boxes total). The boundaries of the global 1-degree boxes fall on units of latitude and longitude (e.g., 0-1E, 89-90N) (64,800 boxes total). The equatorial domain comprises the latitude band 10.5N to 10.5S, and is global with respect to longitude. The equatorial 1-degree boxes are shifted half a degree in latitude (only) in comparison to the global domain, such that the center-latitude of the central row of boxes is the equator (e.g., 0-1E, 0.5S-0.5N). See for additional information about the details of products and time periods of data currently available in the MSG format. Six "groups" of variables make up MSG (Table 1a). The variables comprise the 19 variables that were produced for COADS Release 1 (groups 3-7) plus three additional variables that make up group 9: the cube of the wind speed, W**3, as well as the zonal and meridional contributions to the latent heat flux, U(QS-Q) and V(QS-Q). The ten statistics calculated for each variable are listed in Table 1b. Header fields, as defined below Table 1b, precede the statistics in the MSG format. The Fortran read program for the MSG format should provide ready access to the header fields and statistics included in each group (using the abbreviations given in Tables 1a and 1b, and for the header fields as follow Table 1b). As background, however, a technical description of the format layout (bit lengths) is given in sec. 2, and detailed information about the reconstruction of floating point data is given in sec. 3. Each MSG record contains the data for a single year, month and (2- or 1-degree) box. The MSG records are labeled with the year, month, and box-size, and the latitude/longitude coordinates of the lower-left corner of the box. The weighted mean position of all observations within a box can be obtained from the corner coordinates plus the longitude and latitude offset statistics, x and y, for each variable. MSG records were output only for year-months and boxes containing at least one observation of at least one variable. Records are supressed for boxes (and a few entire year-months) without any extant data. This provides an inherent form of data compression for sparsely populated spatial grids. The records are sorted by year, month, and either 2- or 1-degree box. The 2- and 1-degree box systems proceed east from the prime meridian (i.e., starting with the box with its SW corner longitude at 0E) and spiral down through each zone of latitude. Table 1a. Variables in MSG. Each group contains four variables and ten selected statistics (Table 1b) for each variable. Derived variables in groups 5-8 are computed as indicated from individual observations of other variables, e.g., the wind-stress parameter "X" is the product of W and U. In addition, QS denotes saturation Q at SST. ------------------------------------------------------------------------------- Group 3 4 5 6 7 9 =============================================================================== Sea sfc. temp. (S) Scalar wind (W) Total cloud. (C) D=S - A I=UA M=FU Air temp. (A) Wind U-comp.(U) R E=(S - A)W J=VA N=FV Specific hum. (Q) Wind V-comp.(V) X=WU F=QS - Q K=UQ B1=(W**3)* Relative hum. (R) SLP (P) Y=WV G=FW L=VQ B2=(W**3)* ------------------------------------------------------------------------------- * B1 and B2 are high- and low-resolution representations of B = W**3 (i.e., data increments of 0.5 or 5 m**3/s**3; see Table 4b). ---------- Table 1b. Statistics in MSG. Each statistic is assigned a number and a lower-case abbreviation (from Release 1, supp. A, Table A1-2). ------------------------------------------------------------------------------- No. Abbrev. Variable =============================================================================== 9 s1 1/6 sextile (a robust estimate of the mean minus 1 standard dev.) 11 s3 3/6 sextile (the median) 13 s5 5/6 sextile (a robust estimate of the mean plus 1 standard dev.) 6 m mean 5 n number of observations 7 s standard deviation 1 d mean day-of-month of observations 2 ht fraction of observations in daylight 3 x mean longitude of observations 4 y mean latitude of observations ------------------------------------------------------------------------------- Following is a definition of the MSG header fields preceding the statistics in the MSG format: RPTIN RPTID The RPTIN field is reserved for use of the RPTIN unblocking utility, where available (e.g., NCAR), and RPTID indicates the MSG format version number (1). YEAR The year can range from 1800 to 2054. MONTH 1=January, 2=February, ..., 12=December. BSZ box size 0 = 0.5 degree latitude x longitude (not currently used) 1 = 1 degree latitude x longitude 2 = 2 degree latitude x longitude BLO box left (W) corner longitude (E) BLA box lower (S) corner latitude (+N, -S) Coordinates (half-degree precision) of the lower-left (SW) corner of the box (BSZ gives the box size). For a given variable, the mean sample position of the observations within a box can be obtained from: mean longitude = BLO + x mean latitude = BLA + y Note that longitude is always measured in east coordinates. Because the x values can range up to two degrees (one degree) for 2-degree (1-degree) boxes, the resultant range of mean longitude is 0-360E. PID1 product identification part 1 PID2 product identification part 2 Presently, PID1 is unused. PID2: 0 = standard statistics (3.5 sigma trimming limits; ship data) 1 = enhanced statistics (4.5 sigma trimming limits; ship + other data) GRP group Group number (3-7 or 9). CK checksum A checksum was computed and stored with each packed summary as a measure of reliability during storage and transmission. The Fortran read program will recalculate the checksum and compare it to the stored checksum. If disagreement is found data processing will stop and an error statement will be issued. This problem indicates that the data file is corrupted or the access software is not correctly implemented. {2. Details of Monthly Summary Groups (MSG)} Table 2 shows the bit layout in common to any MSG (regardless of group number); Tables 3a through 3f show the bit layout of each of the 64-bit or 16-bit sections of groups 3-7 and 9 (note that Tables B1-1b through B1-1f in Release 1, supp. B give identical bit layouts for groups 3-7 in the MSTG1 format). Each variable is assigned a number and an upper-case abbreviation (from Release 1, supp. A, Table A1-1). Example of bit layout: If we denote the lower-case abbreviation for each statistic by "a" and the upper-case abbreviation for each variable by "B", group 3 contains, in order, RPTIN, RPTID, ..., CK followed by: ((aB, B = S,A,Q,R), a = s1,s3,s5,m,n,s,d,ht,x,y) I.e., s1 of S, s1 of A,..., s1 of R; s3 of S, s3 of A,..., s3 of R; ... ; y of S, y of A,..., y of R. The MSG format was developed as an extension and enhancement to the MSTG format; following is a summary of changes in comparison to MSTG2: a) Header information (first 64 bits): The 2- and 10-degree box numbers are omitted. Instead, the box size and coordinates are specified (thus accommodating different box systems within the MSG format). Data falling precisely at the North (or theoretically at the South) Pole are handled differently in MSG: there are no 2-deg (or 1-deg) boxes dedicated to +90 or -90 latitude, as in the 2-degree box system used for MSTG. Instead, data are assigned to boxes adjoining the poles based on reported longitude and other box inclusivity rules (see Release 1, supp. G). Header fields also were added for product identification (i.e., standard versus enhanced statistics). b) The 1st and 5th sextiles were added, and the estimated standard deviation was replaced by the actual standard deviation; the estimate can however still be computed, i.e., (s5 - s1)/2. c) The units and ranges (of true values) of the mean longitude and latitude of the observations (x and y) depend on the box size, as detailed in sec. 3. Table 2. The Monthly Summary Group (MSG1) format. Fields added (with respect to MSTG2) are marked by a double asterisk (**), and when the number of bits for a field has changed (*), the old MSTG2 value is shown in parentheses. ------------------------------------------------------------------------------- No. Abbrev. Description Bits =============================================================================== Header fields: RPTIN (reserved) 12 RPTID (reserved) 4 YEAR 8 MONTH 4 BSZ box size 3** BLO box left (W) corner longitude 10** BLA box lower (S) corner latitude 9** PID1 product identification part 1 3** PID2 product identification part 2 3** GRP group 4 CK checksum 4*(8) Statistics: 9 s1 1/6 sextile (est minus 1 sigma) 64** 11 s3 3/6 sextile (the median) 64 13 s5 5/6 sextile (est plus 1 sigma) 64** 6 m mean 64 5 n number of observations 64 7 s standard deviation 64** 1 d mean day-of-month of observations 16 2 ht fraction of observations in daylight 16 3 x mean longitude of observations 16 4 y mean latitude of observations 16 total 512*(384) = 64B ------------------------------------------------------------------------------- Table 3a. Group 3, 64-bit or 16-bit sections. ------------------------------------------------------------------------------- No. Abbr. Variable Bits Bits =============================================================================== 1 S sea surface temperature 16 4 2 A air temperature 16 4 8 Q specific humidity 16 4 9 R relative humidity 16 4 total 64 16 ------------------------------------------------------------------------------- Table 3b. Group 4, 64-bit or 16-bit sections. ------------------------------------------------------------------------------- No. Abbr. Variable Bits Bits =============================================================================== 3 W scalar wind 16 4 4 U vector wind eastward component 16 4 5 V vector wind northward component 16 4 6 P sea level pressure 16 4 total 64 16 ------------------------------------------------------------------------------- Table 3c. Group 5, 64-bit or 16-bit sections. ------------------------------------------------------------------------------- No. Abbr. Variable Bits Bits =============================================================================== 7 C total cloudiness 16 4 9 R relative humidity 16 4 14 X WU 16 4 15 Y WV (14-15 are wind stress parameters) 16 4 total 64 16 ------------------------------------------------------------------------------- Table 3d. Group 6, 64-bit or 16-bit sections. ------------------------------------------------------------------------------- No. Abbr. Variable Bits Bits =============================================================================== 10 D S - A = sea-air temperature difference 16 4 11 E (S - A)W = sea-air temperature difference*wind magnitude 16 4 12 F QS - Q = (saturation Q at S) - Q 16 4 13 G FW = (QS - Q)W (evaporation parameter) 16 4 total 64 16 ------------------------------------------------------------------------------- Table 3e. Group 7, 64-bit or 16-bit sections. ------------------------------------------------------------------------------- No. Abbr. Variable Bits Bits =============================================================================== 16 I UA 16 4 17 J VA 16 4 18 K UQ 16 4 19 L VQ (16-19 are sensible and latent heat transport parameters) 16 4 total 64 16 ------------------------------------------------------------------------------- Table 3f. Group 9, 64-bit or 16-bit sections. ------------------------------------------------------------------------------- No. Abbr. Variable Bits Bits =============================================================================== 20 M FU 16 4 21 N FV 16 4 22 B1 B = W**3 (high-resolution representation) 16 4 23 B2 B = W**3 (low-resolution representation) 16 4 total 64 16 ------------------------------------------------------------------------------- {3. Reconstruction of floating point data} The Fortran access program provides the logic necessary to transfer binary data into memory and then extract into INTEGER variables the bit strings whose lengths are given in sec. 2. Refer to Release 1, supp. H for more information about the techniques used. Compression was achieved by packing data represented as positive integers into fields whose lengths are specified in the bits column of Tables 2 and 3a through 3f. To accomplish this, a field's floating point true value was divided by its units (the smallest increment of the data that has been encoded). Then the base was subtracted to produce, after rounding, a coded positive integer, which was finally right-justified with zero fill in the field's position within the summary. Using the true value mean of sea surface temperature value 28.61 degrees C as an example, (28.61/0.01) - (-501) = 3362. Once a given field has been extracted into the coded value, the true value can be reconstructed by reversing the process: true value = (coded + base) * units The above true value example is reconstructed by (3362 + (-501)) * 0.01) = 28.61 degrees C. NOTE: in each coded value, zero is reserved as an indicator of missing data. The coded and true value ranges, the units, and the base associated with each header field and statistic will be found in Table 4a. In the case of the first and fifth sextiles, median, mean, and standard deviation, these quantities are different for each variable, hence cross-reference to Table 4b. Similarly for the mean longitude and latitude of the observations, where the units depend on box size, hence cross-reference to Table 4c. Table 4a. Unpacking header fields and statistics. Notation is as follows: m:n denotes m through n inclusive; @ is used as a plain text abbreviation for the degree symbol. "Units" gives the smallest increment of the data that has been encoded; thus a change of one unit in the integer coded value represents a change in the true value of one of the units shown (units of 1 are explained in the text). ------------------------------------------------------------------------------- Abbr. Description True value Units Base Coded =============================================================================== Header fields -------------------------------------------- RPTIN (reserved) n/a n/a n/a n/a RPTID (reserved) n/a n/a n/a n/a YEAR 1800:2054 1 1799 1:255 MONTH 1:12 1 0 same BSZ box size 0:2 1 -1 1:3 BLO box left (W) corner longitude (E) 0:359.5 0.5@ -1 1:720 BLA box lower (S) corner latitude (+N, -S) -90.0:90.0 0.5@ -181 1:361 PID1 product identification part 1 (presently unused) PID2 product identification part 2 0:1 1 -1 1:2 GRP group 3:9 1 0 same CK checksum n/a n/a n/a n/a Statistics -------------------------------------------- s1 1/6 sextile (est minus 1 sigma) (all as given in Table 4b) s3 3/6 sextile (the median) (all as given in Table 4b) s5 5/6 sextile (est plus 1 sigma) (all as given in Table 4b) m mean (all as given in Table 4b) n number of observations 1:65535 1 0 same s (e) standard deviation (or estimate; MSTG) 0:# Table 4b -1 1:# d mean day-of-month of observations* 2:30 2 days 0.0 1:15 ht fraction of observations in daylight 0.0:1.0 0.1 -1 1:11 x mean longitude of observations Table 4c Table 4c -1 1:11 y mean latitude of observations Table 4c Table 4c -1 1:11 ------------------------------------------------------------------------------- * A coded value of 16, which would otherwise result when the calculated mean day-of-month is 31 (e.g., 31/2 = 15.5, rounded = 16), is avoided by changing 16 into a coded value of 15 prior to storage. This means that the coded value 15 represents a slightly larger numeric interval than the other values. # Standard deviations have a true value ranging upwards from zero for all variables, thus the base is always -1. Units for each variable are still chosen from Table 4b. [NOTE: The MSTG format contains the standard deviation estimate instead of the standard deviation about the mean (s); this robust estimate is computed from the fifth and first sextiles: e = (s5 - s1)/2. For unpacking purposes, e is treated exactly like the corresponding standard deviation of each respective variable.] ---------- Table 4b. Unpacking variables (notation as for Table 4a). Variables 20-23 are unique to the MSG format; otherwise the information presented here follows Release 1, supp. A, Table A2-4b. ------------------------------------------------------------------------------- No. Abbrev. Variable True value* Units Base Coded =============================================================================== "Observed" ------------------------------ 1 S sea surface temperature -5.00:40.00 0.01 @C -501 1:4501 2 A air temperature -88.00:58.00 0.01 @C -8801 1:14601 3 W scalar wind 0.00:102.20 0.01 m/s -1 1:10221 4 U vector wind eastward comp. -102.20:102.20 0.01 m/s -10221 1:20441 5 V vector wind northward comp. -102.20:102.20 0.01 m/s -10221 1:20441 6 P sea level pressure 870.00:1074.60 0.01 hPa 86999 1:20461 7 C total cloudiness 0.0:8.0 0.1 okta -1 1:81 8 Q specific humidity 0.00:40.00 0.01 g/kg -1 1:4001 Derived ------------------------------ 9 R relative humidity 0.0:100.0 0.1 % -1 1:1001 10 D S - A -63.00:128.00 0.01 @C -6301 1:19101 11 E (S - A)W -1000.0:1000.0 0.1 @C m/s -10001 1:20001 12 F (saturation Q at S) - Q -40.00:40.00 0.01 g/kg -4001 1:8001 13 G FW -1000.0:1000.0 0.1 g/kg m/s -10001 1:20001 14 X WU -3000.0:3000.0 0.1 m**2/s**2 -30001 1:60001 15 Y WV -3000.0:3000.0 0.1 m**2/s**2 -30001 1:60001 16 I UA -2000.0:2000.0 0.1 @C m/s -20001 1:40001 17 J VA -2000.0:2000.0 0.1 @C m/s -20001 1:40001 18 K UQ -1000.0:1000.0 0.1 g/kg m/s -10001 1:20001 19 L VQ -1000.0:1000.0 0.1 g/kg m/s -10001 1:20001 20 M FU -1000.0:1000.0 0.1 g/kg m/s -10001 1:20001 21 N FV -1000.0:1000.0 0.1 g/kg m/s -10001 1:20001 22 B1 B = W**3 (high-resolution) 0.0:32767.0 0.5 m**3/s**3 -1 1:65535 23 B2 B = W**3 (low-resolution) 0:327670 5 m**3/s**3 -1 1:65535 ------------------------------------------------------------------------------- * Each individual observation is checked against the given range of true values (with B1 as an exception, as discussed below), and only individual observations within range are included in the statistics. The total cloudiness code N=9 for "sky obscured or cloud amount cannot be estimated" is thereby always rejected. Generally other variables should fall within the true value ranges due to the application of trimming (e.g., Release 1, Table C2-3) and other preprocessing. B1 is handled differently, in that each observation of W**3 is checked only against the true value column for B2 (not B1). Once final statistics for W**3 have been calculated, the attempt is made to store each W**3 statistic in both B1 and B2. Should a given statistic exceed the highest value allowed for B1, it is stored only in B2. Note that in this case one statistic (e.g., m) may be stored in both B1 and B2, but another (e.g., s5) only in B2. ---------- Table 4c. Unpacking mean longitude and latitude (x and y) of the observations, depending on box size (BSZ) (notation as for Table 4a). After unpacking, it should be noted that the resultant true values corresponding to the lower- and upper-most coded values are not centered in the numerical intervals that they represent. The lower portion of the table provides information about this issue, also listing "best" adjusted values. ------------------------------------------------------------------------------- Abbr. Description True value Units Base Coded =============================================================================== BSZ = 2 = 2 degree latitude x longitude: x mean longitude of observations 0.0:2.0 0.2@ -1 1:11 y mean latitude of observations 0.0:2.0 0.2@ -1 1:11 BSZ = 1 = 1 degree latitude x longitude: x mean longitude of observations 0.0:1.0 0.1@ -1 1:11 y mean latitude of observations 0.0:1.0 0.1@ -1 1:11 BSZ = 0 = 0.5 degree latitude x longitude (not currently used): x mean longitude of observations 0.0:0.5 0.05@ -1 1:11 y mean latitude of observations 0.0:0.5 0.05@ -1 1:11 Resultant (and best numeric fit) mappings of coded values into true values: Coded 2-degree 1-degree 0.5-degree ----- ------------ ------------ ------------ 1 0.0 (0.05) 0.0 (0.02) 0.0 (0.01) 2 0.2 0.1 0.05 3 0.4 0.2 0.1 4 0.6 0.3 0.15 5 0.8 0.4 0.2 6 1. 0.5 0.25 7 1.2 0.6 0.3 8 1.4 0.7 0.35 9 1.6 0.8 0.4 10 1.8 0.9 0.45 11 2.0 (1.95) 1.0 (0.98) 0.5 (0.49) -------------------------------------------------------------------------------