===============================================================================
International Comprehensive Ocean-Atmosphere Data Set (ICOADS):     Release 2.4
Software to Read/Write Long Marine Reports (LMR6)             22 September 2007
===============================================================================
Document Revision Information (previous version: 27 February 2004):  Updates
for Release 2.4 and for the ICOADS URL.
-------------------------------------------------------------------------------

{1. Introduction}

The Long Marine Report version 6 (LMR6) format is more complex than other
ICOADS formats because it is variable-length and mixes packed-binary data
with characters.  This document describes generalized, machine-transportable
Fortran 77 software available to help read and write LMR6.

Sec. 2 describes documentation conventions used here and in the additional
"software documentation" (s-doc) that prefaces each software file in the form
of program comments.  Sec. 3 lists the available files, and indicates those
needed for reading versus writing LMR6.  Sec. 4 reviews the structure of the
LMR6 format, and sec. 5 discusses a variety of computing environment issues
applicable to reading and/or writing LMR6.  Further details about the LMR6
read program are given in its s-doc, including its ability to read multiple
LMR6 input files and, optionally, to write one output LMR6 file based on the
input LMR6 data.  Sec. 6 and Appendix I, plus the s-doc prefacing the LMR6
write program, discuss the installation and operation of the LMR6 write
program.  Appendix II discusses additional issues specific to the IBM
mainframe (MVS) computing environment.

{2. Documentation conventions used for software}

Filename suffixes with "."
(or their absence) indicate:

   ".f"  : Fortran 77 library
   ".c"  : C language library
   ".o"  : object module compiled on Sun Solaris system (optional)
   ".tar": a multi-file "tarfile" created by the Unix tar command
   none  : Fortran 77 program enclosed in Sun Unix shell commands

Detailed software documentation (s-doc) appears as comments at the top of
each software file.  In the s-doc, curly brackets ({}) are used as a
notational device to enclose the names of programs or accompanying software
libraries, and also the names of subprograms within the programs or
libraries.  For example, {date.f} refers to a Fortran 77 library and
{date,time} refers to two "user-interface" routines that make up {date.f}.
"User-interface" routines are those routines within a library that are
intended to be accessed by users; other routines should not require direct
user access or any modifications.  NOTE: The brackets are strictly notational
(not part of the actual file name).

{3. Available files}

The following files located in http://icoads.noaa.gov/software/ are relevant
to the tasks of reading and/or writing LMR6:

   {date.f}     {date.o}
   {ebcasc.f}   {ebcasc.o}
   {gsbytes.c}  {gsbytes.o}
   {lmrlib}
   {rdlmr6}
   {rptin.f}    {rptin.o}
   {wrlmr6.tar}

The specific software file requirements for reading and/or writing LMR6, and
a brief functional summary in each case, follow:

   Reading (and optionally writing) LMR6:
      Fortran program required:            {rdlmr6}
      Libraries (or equivalent) required:  {ebcasc.f,gsbytes.c,rptin.f}
      Object files (optional):             {ebcasc.o,gsbytes.o,rptin.o}
      Functional summary:  Program {rdlmr6} reads and prints LMR6.  Multiple
         input LMR6 files can be read, and, optionally, one LMR6 output file
         can be written based on the input LMR6 data.

   Writing LMR6:
      Tarfile required:                    {wrlmr6.tar}
      Libraries (or equivalent) required:  {date.f,ebcasc.f,gsbytes.c,rptin.f}
      Object files (optional):             {date.o,ebcasc.o,gsbytes.o,rptin.o}
      Functional summary:  The tarfile {wrlmr6.tar} contains program {wrlmr6}
         plus "benchmark" results.
         Program {wrlmr6} includes an example main program {eg} and {test}
         routine, plus logical function {wrlmr}, which provides the core
         function of writing LMR6 data.

Note regarding compiled object files included in the file directory:  These
were generated in NOAA/CDC's current Sun Solaris environment.  They are
strictly optional, and are useful only if they suit your computing
environment.

{4. LMR6 structure}

As background, we highlight the overall structure and size of the LMR6
(versus the fixed-length LMRF6) format:

                       QC       trimming suppl.   error
   [loc.][reg.][^ctrl] [^Attm1] [^Attm2] [^Attm4] [^Attm5] = variable length
   [loc.][reg.][^]              [ Attm2]                   = 64 B (512 bits)

(where ^ indicates the first 15 bits of the control section, or the first 12
bits preceding the data in each attachment).  LMR6 records average about 50%
larger than fixed-length LMRF6 records.

The QC and trimming attachments (Attm1 and Attm2) are not created during
conversion processing; these are added during later quality control and
duplicate elimination processing.  Only the supplemental and error
attachments are created during conversion processing:

a) The supplemental attachment (Attm4) is used to store information from each
   distinct dataset that will not fit into the location (loc.) or regular
   (reg.) sections of LMR6, or whose conversion is questionable.  Fields
   stored in Attm4 may have originated from characters (converted from ebcdic
   to ascii, if applicable, via a standard conversion utility offered as part
   of the software) or from binary data, as indicated by source ID.  To allow
   automated decoding of Attm4, a different source ID (SID) must be assigned
   to each input format (deck and SID assignments need to be agreed in
   advance).

   From the user standpoint, storage by {wrlmr6} in Attm4 of both ascii and
   binary data is effectively in the form of integral 8-bit bytes (up to 255
   bytes).
   Actually, however, {wrlmr6} maps the input 8-bit bytes into the 4/8/12-bit
   "ship" character-set, which may reduce the number of 8-bit bytes that can
   actually be stored (if so, {wrlmr6} generates an error).

b) The error attachment (Attm5) is used to store fields identified by the
   user of {wrlmr} as erroneous (e.g., containing invalid characters), and
   numeric fields that {wrlmr} finds to be outside the range defined for each
   field.  As with Attm4, all characters within Attm5 are maintained either
   in ascii or binary in the form of integral 8-bit bytes (up to eight 8-bit
   bytes per LMR6 field).

It should be noted that the LMR6 format also has the flexibility to add new
attachments, if clearly needed.

{5. Computing environment issues}

As reviewed in sec. 4, LMR6 is a relatively complex, variable-length format
incorporating a mixture of packed-binary data and characters.  Therefore, in
comparison to other ICOADS products, which utilize fixed-length packed-binary
(or character-only) formats, the underlying software is less easily
machine-transportable.  Nevertheless, the programs to read and write LMR6
{rdlmr6,wrlmr6} have been successfully tested and utilized on 32- and 64-bit
Unix (native-ascii) computers, and on an IBM 32-bit (native-ebcdic) mainframe
computer (VSFORTRAN V2.6.0).  At this time, however, the programs have not
been tested on computers that use byte-swapping conventions for storage
within computer words (e.g., VAX and PCs).
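
One quick way to find out whether a given machine belongs to the untested
byte-swapped class is to inspect how it stores a multi-byte word.  The
following C sketch is offered only as an illustration: the names
is_little_endian and swap_bytes32 are hypothetical, they are not part of the
ICOADS software, and nothing here implies that byte swapping alone would make
{rdlmr6,wrlmr6} work on such a machine.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of the ICOADS software).
 * Returns 1 if the least significant byte of a 32-bit word is stored at the
 * lowest address (byte-swapped storage, e.g., VAX and PCs), otherwise 0. */
int is_little_endian(void)
{
    uint32_t word = 0x01020304u;
    unsigned char first;

    memcpy(&first, &word, 1);   /* inspect the byte at the lowest address */
    return first == 0x04;
}

/* Hypothetical helper: reverse the byte order of a 32-bit word. */
uint32_t swap_bytes32(uint32_t w)
{
    return ((w & 0x000000FFu) << 24) | ((w & 0x0000FF00u) << 8) |
           ((w & 0x00FF0000u) >> 8)  | ((w & 0xFF000000u) >> 24);
}
```

On a byte-swapped machine such as a VAX or PC, is_little_endian() returns 1;
on a big-endian machine (for example, a SPARC-based Sun or an IBM mainframe)
it returns 0.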

Specific computing environment issues are discussed under the following
headings, including a description of the role played by the different
software libraries in addressing some of these issues (also see the s-doc
accompanying the software libraries for more details about their operation,
and Appendix II for additional considerations in the IBM mainframe
environment):

a) Local system date and time: {date.f}

Logical function {wrlmr} produces a "conversion summary" which includes
information about when the program was run.  Local date and time are not
ANSI Fortran 77 standard capabilities; therefore these functions have been
isolated into library {date.f}, so that users can adapt the routines to local
system features and implement the documented functionality.

As provided, {rdlmr6} accesses local date and time only via the Unix shell
command "date".  If this command, or an equivalent capability, is not readily
available, the date and time will appear blank on the output listing.

b) Native machine character-set differences/print mode: {ebcasc.f}

The programs to read and write LMR6 {rdlmr6,wrlmr6}, and the program to read
LMRF6 {rdlmrf6}, all contain common block /ENV/ with variable CSET, which is
used to share information throughout the software about the native machine
character-set.  By default CSET=ASC for ascii, but on an IBM mainframe
computer the user should change it to EBC for ebcdic.  Character information,
including the ID fields (LMR6 fields 57-64), may otherwise be printed or
handled improperly.  In connection with CSET, the routines in {ebcasc.f} need
to be included both to read and to write LMR6; these routines provide
fully-reversible mappings from ebcdic to ascii, and vice versa.

A separate variable MODE is also utilized in {rdlmr6} and {wrlmr6} to provide
different handling and printing of binary or character data.
Note that both programs use a 'Z' format specification to produce hexadecimal
printouts; it is not part of the ANSI Fortran 77 standard, but nevertheless
appears to be widely available.

In {rdlmr6}, MODE governs the format of printouts of the supplemental and
error attachments, where CHR (character) or BIN (binary) are the available
MODE values.  MODE is passed as an argument to print routines {prnsup,prnerr}
(or to the alternative, more detailed, routines {prnsu2,prner2}).  Either
MODE may be used, but the original source data format will dictate which is
most appropriate for a given data set (characters judged unprintable are set
to blank by the print routines if MODE=CHR).  When MODE=BIN, the output is
printed in hexadecimal rather than literally in binary, thus "HEX" (rather
than BIN) is printed for labelling purposes.

In {wrlmr6}, MODE is included with CSET in the /ENV/ common block.  It
governs the sort order and printed format of a "conversion summary"
tabulation of the frequency distribution of patterns stored in the error
attachment.  Either MODE may be used, but if the input data are characters
(or primarily characters), MODE is best set to CHR (its default); if the
input data are binary (or primarily binary), MODE is probably best set to BIN
(Appendix I provides a detailed description of the conversion summary).

c) Bit-string manipulations: {gsbytes.c}

LMR6 and other ICOADS packed-binary formats utilize bit-strings that are less
than or equal to 16 bits long (in presently-available products), but
otherwise of varying length and organization across different formats
(possibly crossing actual machine byte and word boundaries).  The C language
library {gsbytes.c} provided among the available files is one implementation
of routines {gbyte,gbytes,sbyte,sbytes} used to manipulate the bit-strings
(or "bytes").
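
As an illustration of what unpacking and packing such a bit-string involves,
the following is a minimal C sketch.  The function names get_bits and
put_bits, their argument lists, and the convention of numbering bits from the
most significant bit of the first 8-bit byte are assumptions made for this
example; they are not the interface of {gbyte,sbyte} in {gsbytes.c}, which
should be taken from that library's s-doc.

```c
#include <stddef.h>

/* Hypothetical helper (not the gsbytes.c interface): return the nbits-long
 * bit-string (1 <= nbits <= 16) that starts `offset` bits into buf.  Bits
 * are counted from the most significant bit of buf[0] (assumed convention),
 * so a bit-string may cross 8-bit byte boundaries. */
unsigned int get_bits(const unsigned char *buf, size_t offset,
                      unsigned int nbits)
{
    unsigned int value = 0;
    unsigned int i;

    for (i = 0; i < nbits; i++) {
        size_t bit = offset + i;
        unsigned int b = (buf[bit / 8] >> (7 - bit % 8)) & 1u;
        value = (value << 1) | b;       /* append the next bit */
    }
    return value;
}

/* Hypothetical helper: store the low nbits of value at the same position,
 * leaving all neighbouring bits in buf unchanged. */
void put_bits(unsigned char *buf, size_t offset, unsigned int nbits,
              unsigned int value)
{
    unsigned int i;

    for (i = 0; i < nbits; i++) {
        size_t bit = offset + i;
        unsigned char mask = (unsigned char)(1u << (7 - bit % 8));

        if ((value >> (nbits - 1 - i)) & 1u)
            buf[bit / 8] |= mask;                   /* set the bit   */
        else
            buf[bit / 8] &= (unsigned char)~mask;   /* clear the bit */
    }
}
```

Working one bit at a time is slow but keeps the sketch independent of machine
word size and byte order; tuned implementations typically shift and mask
whole words instead.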
Specifically, {gbyte} ("get byte") unpacks one such byte or {gbytes} unpacks
multiple bytes, and {sbyte} ("store byte") packs one byte or {sbytes} packs
multiple bytes (multiple bytes must all be of the same length).  The first
versions of these routines were developed in the 1970s by NCAR.  Additional
Fortran and other versions, tuned for different computer environments, are
available from NCAR:

   http://dss.ucar.edu/libraries/gbytes/

d) Blocking of variable-length records: {rptin.f}

Two enveloping structures are used around the basic LMR6 records: (i) a
"RPTIN" format defined by NCAR, and (ii) a "Cray Operating System" (COS)
blocked file structure to enclose the RPTIN physical records.  More
information, including references, is provided in the s-doc included with
{rptin.f}.  {rptin.f} is available for 32-bit computers, and a 64-bit
implementation exists in the Cray environment, to handle all the details of
reading and writing these structures.

e) Unix shell scripts

As provided, {rdlmr6,wrlmr6} are each enclosed in a customized Unix Bourne
shell (sh) script.  [NOTE: More discussion may be needed of the interaction
of the sh shell with other Unix shells.]  Usage of these commands is
optional, but they provide an illustration of one method of compilation and
execution.  For example, the following commands enclose the Fortran code
included in {rdlmr6}:

   cat > p.f <<\EOR
   (Fortran code appears here)
   EOR
   a=/data/coads/software
   rm a.out
   f77 p.f $a/ebcasc.o $a/gsbytes.o $a/rptin.o
   date
   ls $* | a.out

Assuming that the file {rdlmr6} has executable status in the Sun environment
and "filename" contains LMR6 data to be read, after typing:

   rdlmr6 filename

the individual shell commands then perform the following functions:

   a) The "cat" command copies the Fortran code (down to "EOR") into p.f.
   b) Variable "a" is set to indicate the pathname of the object libraries
      [NOTE: this setting reflects the local NOAA/CDC environment, and should
      be modified as needed].
   c) File a.out (if present) is removed by the "rm" command (in the Sun
      environment, a.out is the default external filename for an executable
      output file resulting from compilation, e.g., of p.f).
   d) The "f77" command compiles p.f, and loads it together with the three
      (pre-compiled) object libraries into the executable output file a.out.
   e) The Unix "date" command is used to output the date and time.
   f) Execute a.out.  In this example, "filename" forms the input because, as
      the only typed argument, it is substituted into $*.  [NOTE: A list of
      filenames could be inserted in place of $* to read multiple files, as
      discussed in the s-doc for {rdlmr6}.]

In a non-Sun Unix environment, these commands may be eliminated or modified
to fit individual requirements.  For example, we found that the following
additional f77 arguments were needed in an HP Unix environment:

   f77 +U77 +ppu (followed by source and/or object library names)

where the +U77 option picked up the library that contained system-dependent
routines needed by {date.f}, and the +ppu option added underscores onto all
the calls to external routines (needed because of underscores in
{gsbytes.c}).

f) Fortran input/output units

System conventions are likely to differ for the default external file names
associated with Fortran unit numbers (e.g., ftn01 in the HP, versus fort.1 in
a Sun environment).  We avoid usage of units 5-7, which may be "pre-assigned"
(e.g., for standard input/output) on different systems.

g) Finite-precision issues

For simplicity, we have chosen to use floating point (real) arguments as the
means of communication to transfer most of the basic numeric data between the
user and the read/write programs.
However, this choice introduces the possibility of rounding errors, because
finite precision affects the arithmetic operations used in preparing or using
the data.  The precise nature of the finite-precision issues may vary
depending on the computing environment, the floating-point representation,
computer word size, etc.

We believe that the code within program {rdlmr6} and logical function {wrlmr}
robustly handles the floating-point data that are being read or written.
However, one specific area where problems can develop is that exact
comparisons are made by the user of {rdlmr6}, and within {wrlmr}, between
elements of the data array (e.g., FTRUE) and the user-assigned missing data
value (FMISS).  The comparisons are made to determine if a given FTRUE
element is missing.  The user should ensure that the missing value is set to
one and only one value throughout the program.  Figure 1 illustrates examples
of problems associated with finite precision.

----------------------------------------------------------------------------\
cat > p.f <<\EOR                                                            |
      program finite                                                        |
      fmiss1 = -999.9                                                       |
      fmiss2 = -9999./10.                                                   |
      fmiss3 = -999.0-0.1-0.1-0.1-0.1-0.1-0.1-0.1-0.1-0.1                   |
      write(*,100) 1,fmiss1,fmiss1                                          |
      write(*,100) 2,fmiss2,fmiss2                                          |
      write(*,100) 3,fmiss3,fmiss3                                          |
  100 format(' fmiss',i1,'=',f12.5,o13)                                     |
      if(fmiss1.ne.fmiss2) print *,'fmiss1<>fmiss2'                         |
      if(fmiss2.ne.fmiss3) print *,'fmiss2<>fmiss3'                         |
      if(fmiss1.ne.fmiss3) print *,'fmiss1<>fmiss3'                         |
      end                                                                   |
EOR                                                                         |
rm a.out                                                                    |
f77 p.f                                                                     |
a.out                                                                       |
                                                                            |
(output from program finite)                                                |
p.f:                                                                        |
 MAIN finite:                                                               |
 fmiss1=  -999.90002  30436374632                                           |
 fmiss2=  -999.90002  30436374632                                           |
 fmiss3=  -999.89978  30436374626                                           |
 fmiss2<>fmiss3                                                             |
 fmiss1<>fmiss3                                                             |
                                                                            |
----------------------------------------------------------------------------/
Figure 1.  Examples of finite-precision representation/exact comparison
problems.
The Fortran 77 program in the figure illustrates the effects of finite
precision on a 32-bit computer in the representation of an example missing
value "-999.9", and the problems that can arise when numbers that are
identical in infinite-precision arithmetic are constructed in different ways
using finite precision.

{6. Writing LMR6}

This section describes the installation, benchmarking, and use of software in
{wrlmr6.tar} that is available to help convert data from other formats into
the LMR6 format.  This software is used for production ICOADS conversion
processing.  [NOTE: As discussed in the s-doc prefacing {rdlmr6}, that
program also has options to write one output LMR6 file based on the input
LMR6 data.]

On a Unix system, the following tar command:

   tar -xvf wrlmr6.tar

will extract from the tarfile {wrlmr6.tar} two files, plus two directories
each containing four "benchmark" files:

   wrlmr6                 Fortran 77 Program+Shell
   FILENAME               {wrlmr6} input:  data (report)
   BIN                    directory for MODE=BIN benchmark outputs:
   BIN/FILENAME_LMR6      {wrlmr6} output: LMR6 file
   BIN/FILENAME_REJECT    {wrlmr6} output: reject file
   BIN/FILENAME_SUMMARY   {wrlmr6} output: conversion summary
   BIN/rdlmr6_FILENAME    {rdlmr6} output: listing of LMR6 file
   CHR                    directory for MODE=CHR benchmark outputs:
   CHR/FILENAME_LMR6      {wrlmr6} output: LMR6 file
   CHR/FILENAME_REJECT    {wrlmr6} output: reject file
   CHR/FILENAME_SUMMARY   {wrlmr6} output: conversion summary
   CHR/rdlmr6_FILENAME    {rdlmr6} output: listing of LMR6 file

Logical function {wrlmr} included within {wrlmr6} (plus underlying libraries)
handles the details of writing out LMR6 records, and also generates an
accompanying "conversion summary" (Appendix I) describing how well the data
fit into LMR6 (e.g., counts of records and individual fields rejected).
Included as part of the conversion summary is an "index record" that is used
to track the converted data and to drive sort/merge operations for ICOADS
processing.

Using {wrlmr} basically involves filling a set of parallel arrays, each
dimensioned to the number of LMR6 fields in the location and regular
sections, plus the checksum from the control section (73 fields).  Data to be
packed into the location/regular fields of LMR6 are passed as floating-point
values via argument FTRUE1, or FTRUE1 can be set to the missing value FMISS1.
Table 1 provides a list of the required or suggested settings of FTRUE1 for
each of the regular/location elements of LMR6.  Additional arrays are used to
pass copies (CSUP) and the length (LSUP) of supplemental data to be stored in
Attm4; and copies of the original input data (CTRUE) and their field lengths
(LTRUE) used to construct the individual FTRUE1 elements, in case the
original data need to be written to Attm5.  A detailed discussion of the
arguments to logical function {wrlmr} and its operation is provided in the
s-doc prefacing {wrlmr6}.

Logical function {wrlmr} automatically implements the following conventions
for handling data, which we presently follow in conversions to LMR6:

   i)   B10, YR, MO, LON, or LAT are not stored in the error attachment;
        instead, any reports with problems in these fields are written to the
        reject file.  [NOTE: The FTRUE1 element for B10 is required by
        {wrlmr} to be set to FMISS1, or a warning is issued (see Table 1).]
   ii)  A field is not permitted to be both in the location/regular LMR field
        and in the error attachment.
   iii) Indicators referring to missing (not erroneous) data are set to
        missing.
   iv)  Implementation of the "Uniform Convention" for location assignment
        along the boundaries of 10-degree boxes and along location
        "discontinuities": the Equator, Poles, and 0 and 180 longitude.

An example program {eg} and {test} routine are also provided within {wrlmr6}.
By default, this software attempts to write out a set of 73 sample LMR6
records to demonstrate the use of {wrlmr}.
Logical function {wrlmr} returns true if a report is successfully written, or
false if the report had problems in key report fields (in which case the user
is encouraged to write the defective report to a "reject file" for
examination).

As part of the tarfile, "benchmark" outputs from {wrlmr6} are provided in
digital form.  By also using {rdlmr6} to read the output LMR6 data and
convert them into characters, it is possible to make a variety of mechanical
verifications of output produced on the target machine, versus the benchmark
results, to help ensure proper example program installation and operation
(Figure 2).

Table 1.  Required or suggested settings of FTRUE1 for users of logical
function {wrlmr} (FMISS1 is the user-defined missing data value).  The field
numbers and abbreviations corresponding to the 73 elements of FTRUE1 (the
location and regular sections of LMR6, plus the checksum from the control
section) are listed, together with information about their settings.  Fields
marked:
   a) "calculated" are calculated by {wrlmr}.
   b) "X" must be filled with FMISS1, or {wrlmr} will issue a WARNING.
   c) "required" must be extant (not FMISS1), or {wrlmr} will either:
        i)  return .FALSE. (for fields marked "or reject"), in which case
            the original report is to be written out to the user-defined
            reject file,
        ii) issue a WARNING (for those marked "assigned").
   d) "fill" should be filled with originally reported data (converted to
      units as needed), or with information derived from the input format or
      external metadata (for indicators so footnoted), or with FMISS1 if no
      data or metadata are available and the field is to be set to missing.
   e) "precondition" fields are likely candidates for later preconditioning
      (prior to dupelim) to ensure more uniform results.
   f) "unused" are presently unused; however, if an input format has metadata
      to include in these fields, we can coordinate on defining/activating
      them.
   g) "dupelim" are assigned during later dupelim processing.
   h) "IMM" are specifically linked to the International Maritime
      Meteorological (IMM) formats or receipts of IMM data at NCDC (for IRD).
      OS and OP should be filled only if the original source of the data was
      IMM and the fields are directly available.
-------------------------------------------------------------------------------
Field                        Field            Field
===============================================================================
--location sec.--            --regular sec.-- --regular sec.--
 1 B10 calculated (X)        18 DI  fill      47 C1  IMM
 2 YR  required (or reject)  19 D   fill      48 C2  IMM
 3 MO  required (or reject)  20 WI  fill      49 SC  fill
 4 DY  fill                  21 W   fill      50 SS  fill
 5 HR  fill                  22 VI  fill      51 A   fill
 6 TI  fill*                 23 VV  fill      52 PPP fill
 7 LON required (or reject)  24 WW  fill      53 IS  fill
 8 LAT required (or reject)  25 W1  fill      54 ES  fill
 9 LI  fill*                 26 W2  fill      55 RS  fill
10 DCK required (assigned)   27 SLP fill      56 II  fill* (precondition)
11 SID required (assigned)   28 T1  fill      57 ID1 fill (left-justify)**
12 PT  fill* (precondition)  29 AT  fill      58 ID2 fill        "
13 QI  unused (X)            30 WBT fill***   59 ID3 fill        "
14 DS  dupelim (X)           31 DPT fill***   60 ID4 fill        "
15 DC  dupelim (X)           32 SST fill      61 ID5 fill        "
16 TC  unused (X)            33 SI  fill      62 ID6 fill        "
17 PB  unused (X)            34 N   fill      63 ID7 fill        "
--end location sec.--        35 NH  fill      64 ID8 fill        "
                             36 CL  fill      65 OS  IMM
                             37 HI  fill      66 OP  IMM
                             38 H   fill      67 T2  fill** (precondition)
                             39 CM  fill      68 IX  fill****
                             40 CH  fill      69 WX  fill
                             41 WD  fill      70 SX  fill
                             42 WP  fill      71 IRD IMM (X)
                             43 WH  fill      72 A6  missing (X)
                             44 SD  fill      --end regular sec.--
                             45 SP  fill      --control sec.--
                             46 SH  fill      73 CK  missing (X)
-------------------------------------------------------------------------------
   * Indicators typically not available in original input formats (e.g., not
     part of the WMO code), but which may still be derivable from knowledge
     about the input format or external metadata (if available).
  ** ID characters should be left-justified in the ID array.
 *** Problems have arisen from the use of different algorithms to calculate
     WBT and DPT, followed by a later inability to determine which quantity
     was originally reported.  We suggest that one or the other of these
     quantities be left missing rather than computed as part of the
     conversion to LMR6, unless both are already extant in the input format.
     In that case T2 may be useful to carry metadata (if available) about
     original reporting procedures.  NOTE: T2 is presently underconfigured in
     LMR6 (and settings 3-6 must not be used) to represent all possible
     metadata configurations available in IMMT formats.  The supplemental or
     error attachments should be used to carry additional metadata.
**** IX is presently underconfigured in LMR6 to represent IX=7 (and "/").
     The supplemental or error attachments should be used to carry additional
     metadata.
----------

----------------------------------------------------------------------------\
(i) MODE=CHR benchmark runs (1-2) and verifications (a-d):                  |
----------------------------------------------------------                  |
1)FILENAME   -> {wrlmr6} -> FILENAME_LMR6   <-a) cmp -> CHR/FILENAME_LMR6   |
                         -> FILENAME_REJECT <-b) diff-> CHR/FILENAME_REJECT |
                         -> FILENAME_SUMMARY<-c)*diff-> CHR/FILENAME_SUMMARY|
2)FILENAME_LMR6->{rdlmr6}-> rdlmr6_FILENAME <-d)*diff-> CHR/rdlmr6_FILENAME |
                                                                            |
(ii) MODE=BIN benchmark runs (3-4) and verifications (e-h):                 |
-----------------------------------------------------------                 |
3)FILENAME   -> {wrlmr6} -> FILENAME_LMR6   <-e) cmp -> BIN/FILENAME_LMR6   |
                         -> FILENAME_REJECT <-f) cmp -> BIN/FILENAME_REJECT |
                         -> FILENAME_SUMMARY<-g)*diff-> BIN/FILENAME_SUMMARY|
4)FILENAME_LMR6->{rdlmr6}-> rdlmr6_FILENAME <-h)*diff-> BIN/rdlmr6_FILENAME |
                                                                            |
Example Unix script to execute the CHR benchmark (i):                       |
-----------------------------------------------------                       |
wrlmr6                                                                      |
cmp FILENAME_LMR6 CHR/FILENAME_LMR6                                         |
diff FILENAME_REJECT CHR/FILENAME_REJECT                                    |
diff FILENAME_SUMMARY CHR/FILENAME_SUMMARY                                  |
rdlmr6 FILENAME_LMR6 > rdlmr6_FILENAME                                      |
diff rdlmr6_FILENAME CHR/rdlmr6_FILENAME                                    |
----------------------------------------------------------------------------/
Figure 2.  Benchmark verification procedure.  First, CSET in {wrlmr6,rdlmr6}
should be changed from ASC to EBC, if and only if the native machine
character-set is ebcdic instead of the default ascii.  Two overall
benchmarks, based on MODE, can then be run:
   i)  Using the {wrlmr6} default MODE=CHR.
   ii) Using MODE reset to BIN.
Each benchmark involves one run of {wrlmr6} and one run of {rdlmr6} (labelled
1-4 in the above figure), and four mechanical verifications (labelled a-h in
the figure).  Unix commands "cmp" (to compare binary data) and "diff" (to
compare character data) are suggested, if available, to check for exact
(bit-for-bit or character-for-character) agreement.  Following are additional
details about implementing the benchmarking procedure:

(1) Modifications required to {rdlmr6}:  For test i), {rdlmr6} will need to
    be modified from its default to call {getatt,getsup,geterr,prnsup,prnerr}
    to obtain and print the supplemental and error attachments with MODE=CHR.
    For test ii), {rdlmr6} will need to be further modified only to call
    {prnerr} with MODE=BIN (MODE should remain CHR for {prnsup}).

(2) Verification against results produced on an ascii computer:  Exact
    agreement is expected for all of the verifications, except for those
    marked by "*" in the figure.  These should differ from the provided
    benchmark outputs only in local system date and time fields (if
    available) appearing in the third line of FILENAME_SUMMARY and in the
    first line of rdlmr6_FILENAME.  Possible cosmetic differences also may
    appear in the final two lines of rdlmr6_FILENAME, where the number of
    reports and EOF status are printed (twice), due to variations in the
    output appearance produced by Fortran "PRINT *" statements on different
    computers.
    Note that the provided reject and summary benchmark files were
    constructed using variable-length newline-delimited records; additional
    differences will arise if these outputs are reproduced using fixed-length
    records, in which case "diff -b" (ignore trailing blanks) can be used.

(3) Verification against results produced on an ebcdic computer:  In addition
    to the differences noted in (2), line 1 of the conversion summary will
    differ since it includes the settings of CSET and MODE (see Appendix I).
    See Appendix II for a discussion of additional differences that arise
    from file formatting and disk storage details.

(4) Formats of the reject files:  The MODE=BIN reject file is written out by
    {wrlmr6} in a mixture of ascii and binary regardless of CSET, plus blank
    fill that conforms to CSET (e.g., ebcdic blanks if CSET=EBC).  In
    contrast, the MODE=CHR reject file is written out in characters that
    conform to CSET (e.g., ebcdic if CSET=EBC).  Note that complications in
    verification therefore arise for results produced on an ebcdic computer
    (see Appendix II for more information).

===============================================================================
Appendix I: Structure of the Pathname, Index Record, and Conversion Summary
===============================================================================
A. Pathname
------------
This is one element of the "index record" (item B).  It represents the target
pathname on the NCAR mass store, and provides a linkage back to the input
FILENAM (e.g., a single input data volume).  It is created automatically by
logical function {wrlmr} using FILENAM (set to "FILENAME" in the default
version of {wrlmr6}) and the programmer's initials INITLS ("?" by default).
The pathname also is temporarily written onto SCRATCH unit 4 for internal
usage by {wrlmr}.

The pathname has the following 27-character structure (the pathname example,
preceded by a 2-line position heading, is taken from the CHR/FILENAME_SUMMARY
benchmark):

            1         2
   123456789012345678901234567
   /DSS/LMR6/?  /FILENAME
   [ 1 ][ 2 ][3][     4      ]

   [1 ] data support section
   [2 ] LMR6 directory
   [3 ] programmer's initials: argument INITLS to {wrlmr6} (default "?")
   [4 ] input filename: argument FILENAM to {wrlmr6} (default FILENAME)

B. Index Record
---------------
This one-line record forms the third line (93 characters long) of the
"conversion summary" (see item C), and is used to track and process
individual output data volumes.  The record is produced automatically by
logical function {wrlmr}, using arguments supplied by the user to generate
the pathname (see item A), plus overall run statistics (e.g., B10 range and
years covered) that are tracked by {wrlmr} in the course of processing the
total number of records included in a given input data volume.

The index record has the following 93-character structure (the index record
example, preceded by a 2-line position heading and split for readability onto
separate lines between positions 61-62, is taken from the
CHR/FILENAME_SUMMARY benchmark):

            1         2         3         4         5         6
   1234567890123456789012345678901234567890123456789012345678901
   /DSS/LMR6/?  /FILENAME 33 33 2024 2024 69 1
   [ 1 ][ 2][ 3][ 4 ][ 5 ][ 6 ][ 7 ]

           7         8         9
   23456789012345678901234567890123
   929 19980320 14:23:16 1
   [ 8 ][ 9 ][ 10 ][11]

   [1 ] pathname on the NCAR mass store (see item A)
   [2 ] min B10
   [3 ] max B10
   [4 ] min year
   [5 ] max year
   [6 ] number of rptout logical records
   [7 ] number of rptout physical records
   [8 ] number of rptout words
   [9 ] creation date
   [10] creation time
   [11] 1 or file number in a multi-file file

C. Conversion summary
---------------------
The conversion summary information is gathered automatically by logical
function {wrlmr} over the course of processing an entire input data volume,
and then a standardized tabulation is output (unit 8) automatically at run
termination (when {wrlmr} is called with JEOF=2).
This summary provides a variety of useful tabular information about how many
records were input, output, and rejected; how many individual fields were
extant, missing, and erroneous; and details about why reports and fields were
rejected.  When data volumes are being converted in final production mode, we
suggest that each corresponding conversion summary be permanently retained to
preserve the index record (line 3) and the other detailed conversion
statistics.

Following is a discussion of the major elements of the conversion summary,
illustrated with selected excerpts interleaved from the conversion summary
that is provided in the CHR/FILENAME_SUMMARY benchmark:

a) Line 1:  This records the user-assigned program name and version
(arguments PNAME and PLEVEL; by default "?"), the basic software name and
version, and the settings in use of environment variables MODE (CHR) and CSET
(ASC):

?? {WRLMR6}.01A CHR ASC

b) Line 2:  This provides the total numbers of reports read, rejected, and
written (including the percentage written with respect to input):

73 REPORTS READ 4 REPORTS REJECTED 69 LMR6 WRITTEN ( 95%)

c) The index record, including the pathname, forms the third line of the
conversion summary (see items A-B).

d) The SUMMARY OF FIELDS provides output counts, and percentages with respect
to the total output report count (line 2), separately for each of the 72 LMR6
fields in the location and regular sections.  Note that erroneous fields are
counted both in the missing and erroneous columns (since they are missing in
the regular or location section, but also appear in the error attachment):

SUMMARY OF FIELDS
FIELD   # EXTANT  # MISSING  # ERRONEOUS  % EXTANT  % MISSING  % ERRONEOUS
 1 B10        69          0            0       100          0            0
 2 YR         69          0            0       100          0            0
 3 MO         69          0            0       100          0            0
 4 DY         68          1            1        99          1            1
  .
  .
  .
69 WX          68          1            1        99          1            1
70 SX          68          1            1        99          1            1
71 IRD          0         69            0         0        100            0
72 A6           0         69            0         0        100            0

e) The SUMMARY OF ERROR ATTACHMENTS provides a sorted frequency distribution of patterns of characters (or binary data) stored in the error attachment (in the example benchmark data, there is only one pattern associated with each field). When MODE=CHR, separate columns list each value in both hexadecimal and characters, except that any characters judged unprintable are set to blank (intrinsic Fortran function ICHAR is used to determine if the stored ascii characters fall outside the inclusive range 32-126, i.e., space through "~"). The stored field width (which must be an integral number of 8-bit bytes) can be determined from the width of the string of hexadecimal digits (each hexadecimal digit represents 4 bits, so two digits make up one 8-bit byte). In contrast, when MODE=BIN (not shown), the character column is omitted. The sort is executed by considering each complete line as a sort key, and using Fortran intrinsic lexical comparison functions to ensure identical sort ordering on different computer systems (otherwise collating sequences may differ among systems, e.g., ebcdic versus ascii):

SUMMARY OF ERROR ATTACHMENTS
FIELD- CHARACTER------ HEXADECIMAL------------------- --------------FREQUENCY
 4 DY  31              3331                                                 1
 5 HR  2399            32333939                                             1
 6 TI  3               33                                                   1
 9 LI  6               36                                                   1
   .
   .
   .
67 T2  6               36                                                   1
68 IX  6               36                                                   1
69 WX  1               31                                                   1
70 SX  1               31                                                   1

f) The SUMMARY OF ADDITIONAL INFORMATION provides a similar sorted frequency distribution of patterns of characters (or binary data) leading to the rejection of entire reports (as signaled by a false return from {wrlmr}).
The handling of different MODE settings is as for item e):

SUMMARY OF ADDITIONAL INFORMATION
FIELD- CHARACTER------ HEXADECIMAL------------------- --------------FREQUENCY
 2 YR  2024            32303234                                             1
 3 MO  12              3132                                                 1
 7 LON 35999           3335393939                                           1
 8 LAT  9000           2039303030                                           1

===============================================================================
Appendix II: Additional ebcdic and IBM mainframe (MVS) computing issues
===============================================================================

A. Example Job Control Language (JCL)
-------------------------------------
Example JCL sequences are provided for running {wrlmr6} and {rdlmr6}, followed by a discussion in each case of important elements of the JCL related to file naming and other characteristics. These JCL sequences were tested on an IBM MVS system at NCAR. However, it should be noted that the precise definitions of disk tracks, and possibly other file specifications or JCL elements, may be site dependent (e.g., the details of the JOB statement).
1) {wrlmr6}:

//SNOWW JOB (43310016,8191),'WRLMR6',CLASS=A,MSGCLASS=X
// EXEC VSF2CLG,FVPOLST='NOLIST,CHARLEN(512)',FVTERM='SYSOUT=*',
// GOF6DD='SYSOUT=*'
//FORT.SYSPRINT DD SYSOUT=*
//FORT.SYSLIN DD SPACE=(TRK,(50,50),RLSE)
//FORT.SYSIN DD *
(insert Fortran code from {wrlmr6} plus from required libraries here)
/*
//LKED.SYSPRINT DD SYSOUT=*
//GO.FILENAME DD DISP=SHR,DSN=SNOW.LUBKER.FILENAME,
// DCB=(RECFM=FB,LRECL=140,BLKSIZE=5880)
//GO.FT02F001 DD UNIT=SYSDA,VOL=SER=SYS001,DISP=(NEW,CATLG),
// SPACE=(TRK,(1,1)),DSN=SNOW.LUBKER.FT02.OUTPUT,
// DCB=(RECFM=FB,LRECL=140,BLKSIZE=5880)
//GO.FT03F001 DD UNIT=SYSDA,VOL=SER=SYS001,DISP=(NEW,CATLG),
// SPACE=(TRK,(1,1)),DSN=SNOW.LUBKER.FT03.OUTPUT,
// DCB=(RECFM=FB,LRECL=4096,BLKSIZE=4096)
//GO.FT08F001 DD UNIT=SYSDA,VOL=SER=SYS001,DISP=(NEW,CATLG),
// SPACE=(TRK,(1,1)),DSN=SNOW.LUBKER.FT08.OUTPUT,
// DCB=(RECFM=FB,LRECL=93,BLKSIZE=930)
//GO.SYSIN DD *
FILENAME
/*

Discussion:

a) File naming: Except for the name FILENAME (read from SYSIN), the input/output filenames we have developed in the Unix environment (e.g., "FILENAME_LMR6") present problems in IBM mainframe (MVS) Job Control Language (JCL). Underscore ("_") is not allowed, and there may be additional site-specific constraints on filenames. The above JCL associates each of output units 2, 3, and 8 with a file appropriately named in the NCAR site environment (e.g., DSN=SNOW.LUBKER.FT02.OUTPUT). In addition, the main program {eg} within {wrlmr6} must be modified to comment out (deactivate) the corresponding OPEN statements.

b) Input/output record lengths and blocksizes: All of the files are formatted as fixed-length and blocked records (RECFM=FB). In contrast, the benchmark (input) FILENAME as produced on Unix is a variable-length newline-delimited file containing two lines: the first 140, and the second 67, characters in length. In preparation for input on an IBM system, the second line may need to be padded out to create a fixed-record length of 140 characters.
As shown above, the record lengths (LRECL) of output on units 2, 3, and 8 should be set to 140, 4096, and 93 characters, respectively. The blocksizes (BLKSIZE) are set as multiples (1 in the case of the LMR6 output) of the record size.

c) Output file characteristics: The smallest disk file is 1 TRK, so every IBM-generated file will be padded out (e.g., with binary zero in the LMR6 output file) to create a final full track. In the case of the benchmark test outputs the amount of padding is excessive, but it may be relatively minor for real data processing.

d) CHARLEN(512): This important setting corresponds to the largest character-string length needed anywhere in the Fortran code.

2) {rdlmr6}:

//SNOWW JOB (43310016,8191),'RDLMR6',CLASS=A,MSGCLASS=X
// EXEC VSF2CLG,FVPOLST='NOLIST,CHARLEN(1992)',FVTERM='SYSOUT=*',
// GOF6DD='SYSOUT=*,DCB=LRECL=140'
//FORT.SYSPRINT DD SYSOUT=*
//FORT.SYSLIN DD SPACE=(TRK,(50,50),RLSE)
//FORT.SYSIN DD *
(insert Fortran code from {rdlmr6} plus from required libraries here)
//LKED.SYSPRINT DD SYSOUT=*
//GO.FT01F001 DD DISP=SHR,DSN=SNOW.LUBKER.FT03.OUTPUT,
// DCB=(RECFM=FB,LRECL=4096,BLKSIZE=4096)
//GO.SYSIN DD *
FILENAME
/*

Discussion:

a) File naming: The above JCL associates input unit 1 with the file name of the LMR6 output from {wrlmr6}. The output appears as part of standard output (SYSOUT). For reasons discussed under {wrlmr6}, "rdlmr6_FILENAME" is not allowed in JCL.

b) Input/output record lengths and blocksizes: The input LMR6 data are formatted as fixed-length and blocked records (RECFM=FB). As shown above, the record length (LRECL) of input on unit 1 should be set to 4096 characters, and the LRECL of output (on SYSOUT) to 140 characters. The blocksize (BLKSIZE) of input is identical to the record size.

c) Output file characteristics: Since the printed LMR6 output appears as part of standard output (SYSOUT), it may be intermingled with compiler and execution listings and diagnostics.
d) CHARLEN(1992): This important setting corresponds to the largest character-string length needed anywhere in the Fortran code.

B. Benchmark verification
-------------------------
Probably the most practical means of verification is to transfer (e.g., via ftp) outputs produced on an ebcdic computer back to a Unix ascii computer for verification against benchmarks stored on the ascii system. In this case, the LMR6 outputs should be ftp'd using binary (or image) "representation type," and the other outputs (except for the MODE=BIN reject file, as discussed below) using characters (typically the ftp default). The latter involves a conversion from ebcdic to ascii, which should match that produced by {ebcasc.f} for the commonly used characters present in the benchmark outputs.

Note that the presence of trailing binary zeros in the LMR6 outputs, due to padding to a full disk track, produces files that will not agree according to "cmp" with the benchmark LMR6 outputs. However, these files should still be readable by {rdlmr6}, allowing a "diff" against the corresponding (rdlmr6_FILENAME) printed benchmark output.

If the non-LMR6 outputs are formatted using fixed-length records (as in the above JCL examples), they cannot be successfully verified using a simple "diff" due to the presence of trailing blanks. However, "diff -b" will ignore trailing blanks. Alternatively, variable-length records (RECFM=VB) may be chosen for some of the output files to avoid trailing blanks, although we have not fully explored this option.

As discussed in Figure 2, the formats of the reject files are different for MODE=CHR (output in the native machine character set, as indicated by CSET) versus MODE=BIN (a mixture of ascii and binary regardless of CSET, plus blank fill in conformance with CSET). Therefore, the CHR reject file should be ftp'd using characters, and the BIN reject file using binary.
However, verification of the BIN reject file then requires preparatory hand-editing (removing ebcdic blanks and making other changes) so that the file looks like an ordinary Unix file.
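
The two comparisons described in this item (text outputs that differ from the benchmark only by trailing blanks, and LMR6 outputs that differ only by trailing binary-zero padding) can be sketched in Python as follows. This is a minimal illustration, not part of the ICOADS software: the function names and the sample data are hypothetical, and the first check ignores only trailing blanks (a narrower rule than "diff -b", which also ignores changes in the amount of interior white space).

```python
# Hypothetical helpers (not part of the ICOADS distribution) for checking
# outputs transferred from an ebcdic system against Unix benchmarks.


def same_ignoring_trailing_blanks(benchmark_text, transferred_text):
    """Compare two texts line by line, ignoring trailing blanks on each line.

    Assumes the transferred file was ftp'd in character mode, so it is
    already ascii and newline-delimited, but blank-padded to a fixed LRECL.
    """
    bench = [line.rstrip(" ") for line in benchmark_text.splitlines()]
    other = [line.rstrip(" ") for line in transferred_text.splitlines()]
    return bench == other


def matches_up_to_zero_padding(benchmark_bytes, transferred_bytes):
    """True if the transferred data equal the benchmark data followed only
    by binary zeros (e.g., padding out to a full disk track)."""
    n = len(benchmark_bytes)
    head, tail = transferred_bytes[:n], transferred_bytes[n:]
    return head == benchmark_bytes and tail == b"\x00" * len(tail)


if __name__ == "__main__":
    # Synthetic stand-ins for a benchmark and a fixed-length transferred copy.
    bench_text = "LINE ONE\nLINE TWO\n"
    fixed_text = "LINE ONE".ljust(93) + "\n" + "LINE TWO".ljust(93) + "\n"
    print(same_ignoring_trailing_blanks(bench_text, fixed_text))

    # Synthetic binary data padded with zeros, as on a full disk track.
    bench_bin = bytes([1, 2, 3, 4])
    padded_bin = bench_bin + bytes(12)
    print(matches_up_to_zero_padding(bench_bin, padded_bin))
```

The binary check tests that the benchmark is an exact prefix of the transferred file, rather than stripping trailing zeros from both files, because a genuine LMR6 file could itself end in zero bytes.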