===============================================================================
International Comprehensive Ocean-Atmosphere Data Set (ICOADS):     Release 2.4
Software to Read/Write Long Marine Reports (LMR6)             22 September 2007
=====================================================================<soft_lmr>

Document Revision Information (previous version: 27 February 2004):  Updates
for Release 2.4 and for the ICOADS URL.

-------------------------------------------------------------------------------


{1. Introduction}

The Long Marine Report version 6 (LMR6) format is more complex than other
ICOADS formats because it is variable-length and mixes packed-binary data with
characters.  This document describes generalized machine-transportable
Fortran 77 software available to help read and write LMR6.  Sec. 2 describes
documentation conventions used here and in additional "software documentation"
(s-doc) prefacing each software file in the form of program comments.  Sec. 3
lists the available files, and indicates those needed for reading versus
writing LMR6.  Sec. 4 reviews the structure of the LMR6 format, and sec. 5
discusses a variety of computing environment issues applicable to reading
and/or writing LMR6.  Further details about the LMR6 read program are given in
the s-doc prefacing the LMR6 read program, including its ability to read
multiple LMR6 input files, and, optionally, to write one output LMR6 file based
on the input LMR6 data.  Sec. 6 and Appendix I, plus the s-doc prefacing the
LMR6 write program, discuss the installation and operation of the LMR6 write
program.  Appendix II provides a discussion of additional issues specific to
the IBM mainframe (MVS) computing environment.


{2. Documentation conventions used for software}

Filename suffixes with "." (or their absence) indicate:
     ".f"  : Fortran 77 library
     ".c"  : C language library
     ".o"  : object module compiled on Sun Solaris system (optional)
     ".tar": a multi-file "tarfile" created by the Unix tar command
     none  : Fortran 77 program enclosed in Sun Unix shell commands

Detailed software documentation (s-doc) appears as comments at the top of each
software file.  In the s-doc, curly brackets ({}) are used as a notational
device to enclose the names of programs or accompanying software libraries,
and also the names of subprograms within the programs or libraries.  For
example, {date.f} refers to a Fortran 77 library and {date,time} refers to two
"user-interface" routines that make up {date.f}.  "User-interface" routines
are those routines within a library that are intended to be accessed by users;
other routines should not require direct user access or any modifications.
NOTE: The brackets are strictly notational (not part of the actual file name).


{3. Available files}

The following files located in http://icoads.noaa.gov/software/ are relevant
to the tasks of reading and/or writing LMR6:
     {date.f}
     {date.o}
     {ebcasc.f}
     {ebcasc.o}
     {gsbytes.c}
     {gsbytes.o}
     {lmrlib}
     {rdlmr6}
     {rptin.f}
     {rptin.o}
     {wrlmr6.tar}

The specific software file requirements for reading and/or writing LMR6, and
a brief functional summary in each case, follow:

Reading (and optionally writing) LMR6:
     Fortran program required: {rdlmr6}
     Libraries (or equivalent) required: {ebcasc.f,gsbytes.c,rptin.f}
     Object files (optional): {ebcasc.o,gsbytes.o,rptin.o}
     Functional summary: Program {rdlmr6} reads and prints LMR6.  Multiple
     input LMR6 files can be read, and, optionally, one LMR6 output file can
     be written based on the input LMR6 data.

Writing LMR6:
     Tarfile required: {wrlmr6.tar}
     Libraries (or equivalent) required: {date.f,ebcasc.f,gsbytes.c,rptin.f}
     Object files (optional): {date.o,ebcasc.o,gsbytes.o,rptin.o}
     Functional summary: The tarfile {wrlmr6.tar} contains program {wrlmr6}
     plus "benchmark" results.  Program {wrlmr6} includes an example main
     program {eg} and {test} routine, plus logical function {wrlmr} which
     provides the core function of writing LMR6 data.

Note regarding compiled object files included in the file directory:  These
were generated in NOAA/CDC's current Sun Solaris environment.  They are
strictly optional, and are useful only if they are compatible with your
computing environment.


{4. LMR6 structure}

As background, we highlight from <lmr> the overall structure and size of
the LMR6 (versus the fixed-length LMRF6) format:
                           QC   trimming   suppl.   error
   [loc.][reg.][^ctrl] [^Attm1] [^Attm2] [^Attm4] [^Attm5] =  variable length
   [loc.][reg.][^]              [ Attm2]                   =  64 B (512 bits) 
(where ^ indicates the first 15 bits of the control section, or the first 12
bits preceding the data in each attachment).  LMR6 averages about 50% larger
than the fixed-length LMRF6.  The QC and trimming attachments (Attm1 and Attm2)
are not created during conversion processing; these are added during later
quality control and duplicate elimination processing.  Only the supplemental
and error attachments are created during conversion processing:
     a) The supplemental attachment (Attm4) is used to store information from
     each distinct dataset that will not fit into the location (loc.)  or
     regular (reg.) sections of LMR6, or whose conversion is questionable.
     Fields stored in Attm4 may have originated from characters (converted
     from ebcdic to ascii, if applicable, via a standard conversion utility
     offered as part of the software) or binary data, as indicated by source
     ID.  To allow automated decoding of Attm4, it is important to note that a
     different source ID (SID) must be assigned to each input format (deck and
     SID assignments need to be agreed in advance).  From the user standpoint,
     storage by {wrlmr6} in Attm4 of both ascii and binary data is effectively
     in the form of integral 8-bit bytes (up to 255 bytes).  Actually, however,
     the input 8-bit bytes are mapped into the 4/8/12-bit "ship" character-set
     by {wrlmr6}, possibly resulting in a decrease in the number of 8-bit bytes
     that can actually be stored (if so, an error is generated by {wrlmr6}).

     b) The error attachment (Attm5) is used to store fields identified by the
     user of {wrlmr} as erroneous (e.g., containing invalid characters), or
     when {wrlmr} encounters a numeric field that is outside the range defined
     in <lmr> for each field.  Similarly to Attm4, all characters within Attm5
     are maintained either in ascii or binary in the form of integral 8-bit
     bytes (up to eight 8-bit bytes, per LMR6 field).
It should be noted that the LMR6 format also has the flexibility to add new
attachments, if clearly needed.


{5. Computing environment issues}

As reviewed in sec. 4, the LMR6 format is a relatively complex, variable-
length format incorporating a mixture of packed-binary data and characters.
Therefore, in comparison to other ICOADS products, which utilize fixed-length
packed-binary (or character-only) formats, the underlying software is less
easily machine-transportable.  Nevertheless, the programs to read and write
LMR6 {rdlmr6,wrlmr6} have been successfully tested and utilized on 32- and
64-bit Unix (native-ascii) computers, and on an IBM 32-bit (native-ebcdic)
mainframe computer (VSFORTRAN V2.6.0).  At this time, however, the programs
have not been tested on computers that store the bytes within a computer
word in swapped (little-endian) order (e.g., VAX and PCs).  Specific computing
environment issues are discussed under the following headings, including
a description of the role played by the different software libraries
in addressing some of these issues (also see the s-doc accompanying the
software libraries for more details about their operation, and Appendix
II for additional considerations in the IBM mainframe environment):

a) Local system date and time: {date.f}
Logical function {wrlmr} produces a "conversion summary" which includes
information about when the program was run.  Access to the local date and
time is not part of the ANSI Fortran 77 standard; these functions have
therefore been isolated into library {date.f}, which users can modify as
needed to adapt the routines to local system features while implementing the
documented functionality.

As provided, {rdlmr6} only accesses local date and time via the Unix shell
command "date".  If this command, or the equivalent capability, is not readily
available, the date and time will appear blank on the output listing.

b) Native machine character-set differences/print mode: {ebcasc.f}
The programs to read and write LMR6 ({rdlmr6,wrlmr6}), and the program to
read LMRF6 ({rdlmrf6}), all contain common block /ENV/ with variable CSET,
which is used to share information throughout the software about the native
machine character-set.  By default CSET=ASC for ascii, but on an IBM mainframe
computer it should be changed by the user to EBC for ebcdic.  Character
information, including the ID fields (LMR6 fields 57-64), may otherwise be
printed or handled improperly.  In connection with CSET, the routines in
{ebcasc.f} need to be included both to read and to write LMR6; these routines
provide fully-reversible mappings from ebcdic to ascii, and vice versa.
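
As an illustration only of what a reversible mapping involves, the following
C sketch converts blank, the digits, and the upper-case letters between ebcdic
and ascii.  It is not the {ebcasc.f} table, which handles the full character
set; the function names are hypothetical, and only the code points shown
(the usual EBCDIC assignments for these characters) are covered.  Numeric
ascii codes are used instead of character literals so that the sketch
behaves the same on ascii and ebcdic hosts.

```c
#include <assert.h>

/* Hypothetical helper: ebcdic code -> ascii code, or -1 if not covered. */
int ebcdic_to_ascii(unsigned char e)
{
    if (e == 0x40) return 0x20;                          /* blank */
    if (e >= 0xF0 && e <= 0xF9) return 0x30 + (e - 0xF0); /* 0-9   */
    if (e >= 0xC1 && e <= 0xC9) return 0x41 + (e - 0xC1); /* A-I   */
    if (e >= 0xD1 && e <= 0xD9) return 0x4A + (e - 0xD1); /* J-R   */
    if (e >= 0xE2 && e <= 0xE9) return 0x53 + (e - 0xE2); /* S-Z   */
    return -1;
}

/* Hypothetical helper: ascii code -> ebcdic code, or -1 if not covered. */
int ascii_to_ebcdic(unsigned char a)
{
    if (a == 0x20) return 0x40;                          /* blank */
    if (a >= 0x30 && a <= 0x39) return 0xF0 + (a - 0x30); /* 0-9   */
    if (a >= 0x41 && a <= 0x49) return 0xC1 + (a - 0x41); /* A-I   */
    if (a >= 0x4A && a <= 0x52) return 0xD1 + (a - 0x4A); /* J-R   */
    if (a >= 0x53 && a <= 0x5A) return 0xE2 + (a - 0x53); /* S-Z   */
    return -1;
}

/* Verify reversibility: every covered ascii code maps to ebcdic and back. */
void check_roundtrip(void)
{
    for (int a = 0; a < 256; a++) {
        int e = ascii_to_ebcdic((unsigned char)a);
        if (e >= 0)
            assert(ebcdic_to_ascii((unsigned char)e) == a);
    }
}
```

Returning -1 for uncovered codes keeps the partial table from silently
producing wrong characters; a full table would have no such gaps.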

A separate variable MODE is also utilized in {rdlmr6} and {wrlmr6} to provide
different handling and printing of binary or character data.  In both programs
it should be noted that a 'Z' format specification is used that is not part
of the Fortran 77 ANSI standard, but nevertheless appears widely available,
to produce hexadecimal printouts.  In {rdlmr6}, MODE governs the format of
printouts of the supplemental and error attachments, where CHR (character) or
BIN (binary) are the available MODE values.  MODE is passed as an argument to
print routines {prnsup,prnerr} (or to the alternative, more detailed, routines
{prnsu2,prner2}).  Either MODE may be used, but the original source data
format will dictate which is most appropriate for a given data set (characters
judged unprintable are set to blank by the print routines if MODE=CHR).  When
MODE=BIN, the output is printed in hexadecimal rather than literally in binary,
thus "HEX" (rather than BIN) is printed for labelling purposes.

In {wrlmr6}, MODE is included with CSET in the /ENV/ common block.  It governs
the sort order and printed format of a "conversion summary" tabulation of
the frequency distribution of occurrences of patterns stored in the error
attachment.  Either MODE may be used, but if the input data are characters
(or primarily characters), MODE is best set to CHR (its default); or, if the
input data are binary (or primarily binary) MODE is probably best set to BIN
(Appendix I provides a detailed description of the conversion summary).

c) Bit-string manipulations: {gsbytes.c}
LMR6 and other ICOADS packed-binary formats utilize bit-strings that are less
than or equal to 16 bits (in presently-available products), but otherwise of
varying length and organization across different formats (possibly crossing
actual machine byte and word boundaries).  The C language library {gsbytes.c}
provided among the available files is one implementation of routines
{gbyte,gbytes,sbyte,sbytes} used to manipulate the bit-strings (or "bytes").
Specifically, {gbyte} ("get byte") unpacks one such byte or {gbytes} unpacks
multiple bytes, and {sbyte} ("store byte") packs one byte or {sbytes} packs
multiple bytes (multiple bytes must all be of the same length).  The first
versions of these routines were developed in the 1970s by NCAR.  Additional
Fortran and other versions, tuned for different computer environments, are
available from NCAR:
     http://dss.ucar.edu/libraries/gbytes/
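
To make the idea of packing and unpacking bit-strings concrete, the following
is a minimal C sketch.  It is not the {gsbytes.c} code and does not reproduce
the {gbyte,sbyte} argument lists; the names get_bits and put_bits are
hypothetical.  It assumes that bits are numbered from the most significant
bit of the first 8-bit byte of the buffer, and it limits bit-strings to 16
bits, as in presently-available products.

```c
#include <assert.h>
#include <stddef.h>

/* Unpack one bit-string of nbits (1..16) bits that starts `skip` bits
 * from the beginning of buf.  The bit-string may cross byte boundaries. */
unsigned get_bits(const unsigned char *buf, size_t skip, unsigned nbits)
{
    unsigned value = 0;
    assert(nbits >= 1 && nbits <= 16);
    for (unsigned i = 0; i < nbits; i++) {
        size_t bit = skip + i;
        unsigned b = (buf[bit / 8] >> (7 - bit % 8)) & 1u;
        value = (value << 1) | b;
    }
    return value;
}

/* Pack the low nbits (1..16) bits of value into buf, starting `skip`
 * bits from the beginning of buf; neighbouring bits are left unchanged. */
void put_bits(unsigned char *buf, size_t skip, unsigned nbits, unsigned value)
{
    assert(nbits >= 1 && nbits <= 16);
    for (unsigned i = 0; i < nbits; i++) {
        size_t bit = skip + i;
        unsigned char mask = (unsigned char)(0x80u >> (bit % 8));
        if ((value >> (nbits - 1 - i)) & 1u)
            buf[bit / 8] |= mask;
        else
            buf[bit / 8] &= (unsigned char)~mask;
    }
}
```

Because the sketch addresses the buffer one 8-bit byte at a time, it does not
depend on the order in which the host stores bytes within a word.  A bit-by-
bit loop is chosen for clarity; a tuned version would move several bits per
step.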

d) Blocking of variable-length records: {rptin.f}
Two enveloping structures are used around the basic LMR6 records: (i) a "RPTIN"
format defined by NCAR, and (ii) a "Cray Operating System" (COS) blocked file
structure to enclose the RPTIN physical records.  More information, including
references, is provided in the s-doc included with {rptin.f}.  {rptin.f} is
available for 32-bit computers, and a 64-bit implementation exists in the Cray
environment, to handle all the details of reading and writing these structures.

e) Unix shell scripts
As provided, {rdlmr6,wrlmr6} are each enclosed in a customized Unix Bourne
shell (sh) script.  [NOTE: More discussion may be needed of the interaction of
the sh shell with other Unix shells.]  Usage of these commands is optional, but
they provide an illustration of one method of compilation and execution.  For
example, the following commands enclose the Fortran code included in {rdlmr6}:
     cat > p.f <<\EOR
     (Fortran code appears here)
     EOR
     a=/data/coads/software
     rm a.out
     f77 p.f $a/ebcasc.o $a/gsbytes.o $a/rptin.o
     date
     ls $* | a.out       

Assuming that the file {rdlmr6} has executable status in the Sun environment
and "filename" contains LMR6 data to be read, after typing:
     rdlmr6 filename 
the individual shell commands then perform the following functions:
     a) The "cat" command copies the Fortran code (down to "EOR") into p.f.
     b) Variable "a" is set to indicate the pathname of the object libraries
     [NOTE: this setting reflects the local NOAA/CDC environment, and should
     be modified as needed].
     c) File a.out (if present) is removed by the "rm" command (in the Sun
     environment, a.out is the default external filename for an executable
     output file resulting from compilation, e.g., of p.f).
     d) The "f77" command compiles p.f, and loads it together with the three
     (pre-compiled) object libraries into the executable output file a.out.
     e) The Unix "date" command is used to output the date and time.
     f) Execute a.out.  In this example, "filename" forms the input because
     as the only typed argument it is substituted into $*.  [NOTE: A list of
     filenames could be inserted in place of $* to read multiple files, as
     discussed in the s-doc for {rdlmr6}.]

In a non-Sun Unix environment, these commands may be eliminated or modified
to fit individual requirements.  For example, we found that the following
additional f77 arguments were needed in an HP Unix environment:
     f77 +U77 +ppu   (followed by source and/or object library names)
where the +U77 option picked up the library that contained system-dependent
routines needed by {date.f}, and the +ppu added underscores onto all the calls
to external routines (needed because of underscores in {gsbytes.c}).

f) Fortran input/output units
System conventions are likely to differ for the default external file names
associated with Fortran unit numbers (e.g., ftn01 in the HP, versus fort.1 in
a Sun environment).  We avoid usage of units 5-7, which may be "pre-assigned"
(e.g., for standard input/output) on different systems.

g) Finite-precision issues
For simplicity, we have chosen to use floating point (real) arguments as
the means of communication to transfer most of the basic numeric data between
the user and the read/write programs.  However, this choice introduces the
possibility of rounding errors due to the effects of finite precision on
arithmetic operations in usage or preparation of the data, and the precise
nature of the finite precision issues may vary depending on the computing
environment, the floating-point representation, computer word size, etc.

We believe that the code within program {rdlmr6} and logical function {wrlmr}
robustly handles the floating-point data that are being read or written.
However, one specific area where problems can develop is the exact
comparison, made by the user of {rdlmr6} and within {wrlmr}, between elements
of the data array (e.g., FTRUE) and the user-assigned missing data value
(FMISS).  The comparisons are made to determine if a given FTRUE element is
missing.  The user should ensure that the missing value is set to one and only
one value throughout the program.  Figure 1 illustrates examples of problems
associated with finite-precision.


----------------------------------------------------------------------------\
cat > p.f <<\EOR                                                            |
      program finite                                                        | 
      fmiss1 = -999.9                                                       | 
      fmiss2 = -9999./10.                                                   |
      fmiss3 = -999.0-0.1-0.1-0.1-0.1-0.1-0.1-0.1-0.1-0.1                   |
      write(*,100) 1,fmiss1,fmiss1                                          |  
      write(*,100) 2,fmiss2,fmiss2                                          |
      write(*,100) 3,fmiss3,fmiss3                                          |
 100  format(' fmiss',i1,'=',f12.5,o13)                                     |
      if(fmiss1.ne.fmiss2) print *,'fmiss1<>fmiss2'                         |
      if(fmiss2.ne.fmiss3) print *,'fmiss2<>fmiss3'                         |
      if(fmiss1.ne.fmiss3) print *,'fmiss1<>fmiss3'                         |
      end                                                                   |
EOR                                                                         |
rm a.out                                                                    |
f77 p.f                                                                     |
a.out                                                                       | 
                                                                            |
(output from program finite)                                                |
p.f:                                                                        |
 MAIN finite:                                                               |
 fmiss1=  -999.90002  30436374632                                           |
 fmiss2=  -999.90002  30436374632                                           |
 fmiss3=  -999.89978  30436374626                                           | 
 fmiss2<>fmiss3                                                             |
 fmiss1<>fmiss3                                                             |
                                                                            |
----------------------------------------------------------------------------/
Figure 1.  Examples of finite-precision representation/exact comparison
problems.  The Fortran 77 program in the figure illustrates the effects of
finite-precision on a 32-bit computer in the representation of an example
missing value "-999.9" and the problems that can arise if numbers representing
identical values using infinite precision arithmetic are constructed in
different ways using finite precision.
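
The same effect, and the recommendation to set the missing value in one and
only one way, can be sketched in C (the language of {gsbytes.c}).  This is an
illustration under stated assumptions, not code from {rdlmr6} or {wrlmr}: the
names FMISS, is_missing, fmiss_by_steps, and float_bits are hypothetical, and
32-bit IEEE single precision is assumed for type float.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The missing value is defined exactly once and reused everywhere. */
static const float FMISS = -999.9f;

/* Exact comparison against the single missing-value definition. */
int is_missing(float x)
{
    return x == FMISS;
}

/* Builds "-999.9" a second way (as fmiss3 in Figure 1): -999.0 minus
 * 0.1 nine times.  volatile forces rounding to float at every step. */
float fmiss_by_steps(void)
{
    volatile float x = -999.0f;
    for (int i = 0; i < 9; i++)
        x = x - 0.1f;
    return x;
}

/* Returns the bit pattern of a float, for inspection. */
uint32_t float_bits(float x)
{
    uint32_t u;
    assert(sizeof u == sizeof x);
    memcpy(&u, &x, sizeof u);
    return u;
}
```

Under the assumptions above, float_bits(FMISS) is hexadecimal C479F99A and
float_bits(fmiss_by_steps()) is C479F996, which are the octal patterns
30436374632 and 30436374626 printed in Figure 1; is_missing() therefore
fails to recognize the second value as missing.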


{6. Writing LMR6}

This section describes the installation, benchmarking, and use of software in
{wrlmr6.tar} that is available to help convert data from other formats into
the LMR6 format.  This software is used for production ICOADS conversion
processing.  [NOTE: As discussed in the s-doc prefacing {rdlmr6}, that program
also has options to write one output LMR6 file based on the input LMR6 data.]

On a Unix system, the following tar command:
     tar -xvf wrlmr6.tar
will extract from the tarfile {wrlmr6.tar} two files, plus two directories
each containing four "benchmark" files:
     wrlmr6                  Fortran 77 Program+Shell
     FILENAME                {wrlmr6} input: data (report)
     BIN           directory for MODE=BIN benchmark outputs:
     BIN/FILENAME_LMR6       {wrlmr6} output: LMR6 file
     BIN/FILENAME_REJECT     {wrlmr6} output: reject file
     BIN/FILENAME_SUMMARY    {wrlmr6} output: conversion summary
     BIN/rdlmr6_FILENAME     {rdlmr6} output: listing of LMR6 file
     CHR           directory for MODE=CHR benchmark outputs:
     CHR/FILENAME_LMR6       {wrlmr6} output: LMR6 file
     CHR/FILENAME_REJECT     {wrlmr6} output: reject file
     CHR/FILENAME_SUMMARY    {wrlmr6} output: conversion summary
     CHR/rdlmr6_FILENAME     {rdlmr6} output: listing of LMR6 file

Logical function {wrlmr} included within {wrlmr6} (plus underlying libraries)
handles the details of writing out LMR6 records, and also generates an
accompanying "conversion summary" (Appendix I) describing how well the data fit
into LMR6 (e.g., counts of records and individual fields rejected).  Included
as part of the conversion summary is an "index record" that is used to track
the converted data and to drive sort/merge operations for ICOADS processing.

Utilization of {wrlmr} basically involves filling a set of parallel arrays,
each dimensioned to the number of LMR6 fields in the location and regular
sections, plus the checksum from the control section (73 fields).  Data to be
packed into the location/regular fields of LMR6 are passed as floating-point
values via argument FTRUE1, or FTRUE1 can be set to the missing value FMISS1.
Table 1 provides a list of the required or suggested settings of FTRUE1 for
each of the regular/location elements of LMR6.  Additional arrays are used to
pass copies (CSUP) and the length (LSUP) of supplemental data to be stored in
Attm4; and of the original input data (CTRUE) and their field lengths (LTRUE)
used to construct the individual FTRUE1 elements, in case the original data
need to be written to Attm5.  A detailed discussion of the arguments to logical
function {wrlmr} and its operation is provided in the s-doc prefacing {wrlmr6}.

Logical function {wrlmr} automatically implements the following conventions
for handling data, which we presently follow in conversions to LMR6:
     i) B10, YR, MO, LON, or LAT are not stored in the error attachment;
     instead, any reports with problems in these fields are written to the
     reject file.  [NOTE: The FTRUE1 element for B10 is required by {wrlmr}
     to be set to FMISS1, or a warning is issued (see Table 1).]
     ii) A field is not permitted to be both in the location/regular LMR
     field and in the error attachment.
     iii) Indicators referring to missing (not erroneous) data are set to
     missing.
     iv) Implementation of the "Uniform Convention" for location
     assignment along the boundaries of 10-degree boxes and along location
     "discontinuities": the Equator, Poles, and 0 and 180 longitude (see
     <loc_disc> for details).

An example program {eg} and {test} routine are also provided within {wrlmr6}.
By default, this software attempts to write out a set of 73 sample LMR6
records to demonstrate the use of {wrlmr}.  Logical function {wrlmr} returns
true if a report is successfully written, or false if the report had problems
in key report fields (in which case the user is encouraged to write the
defective report to a "reject file" for examination).  As part of the tarfile,
"benchmark" outputs from {wrlmr6} are provided in digital form.  Together with
usage of {rdlmr6} to read and convert into characters the output LMR6 data, it
is possible to make a variety of mechanical verifications of output produced on
the target machine, versus the benchmark results, to help ensure proper example
program installation and operation (Figure 2).


Table 1.  Required or suggested settings of FTRUE1 for users of logical
function {wrlmr} (FMISS1 is the user-defined missing data value).  The field
numbers and abbreviations (from <lmr>) corresponding to the 73 elements of
FTRUE1 (the location and regular sections of LMR6, plus the checksum from the
control section) are listed, together with information about their settings.
Fields marked:
     a) "calculated" are calculated by {wrlmr}.
     b) "X" must be filled with FMISS1, or {wrlmr} will issue a WARNING.
     c) "required" must be extant (not FMISS1), or {wrlmr} will either:
          i) return .FALSE. (for fields marked "or reject"), in which case the
          original report is to be written out to the user-defined reject file;
          or ii) issue a WARNING (for those marked "assigned").
     d) "fill" should be filled with originally reported data (converted to
     <lmr> units as needed), or with information derived from the input format
     or external metadata (for indicators so footnoted), or with FMISS1 if
     no data or metadata are available and the field is to be set to missing.
     e) "precondition" fields are likely candidates for later preconditioning
     (prior to dupelim) to ensure more uniform results.
     f) "unused" are presently unused; however, if an input format has metadata
     to include in these fields, we can coordinate on defining/activating them.
     g) "dupelim" are assigned during later dupelim processing.
     h) "IMM" are specifically linked to the International Maritime Met. (IMM)
     formats or receipts of IMM data at NCDC (for IRD).  OS and OP should be
     filled only if the original source of the data was IMM and the fields are
     directly available.
-------------------------------------------------------------------------------
  Field                        Field              Field
===============================================================================
--location sec.--            --regular sec.--   --regular sec.--
  1  B10 calculated (X)        18  DI  fill       47  C1  IMM
  2  YR  required (or reject)  19  D   fill       48  C2  IMM
  3  MO  required (or reject)  20  WI  fill       49  SC  fill
  4  DY  fill                  21  W   fill       50  SS  fill
  5  HR  fill                  22  VI  fill       51  A   fill
  6  TI  fill*                 23  VV  fill       52  PPP fill
  7  LON required (or reject)  24  WW  fill       53  IS  fill
  8  LAT required (or reject)  25  W1  fill       54  ES  fill
  9  LI  fill*                 26  W2  fill       55  RS  fill
  10 DCK required (assigned)   27  SLP fill       56  II  fill* (precondition)
  11 SID required (assigned)   28  T1  fill       57  ID1 fill (left-justify)**
  12 PT  fill* (precondition)  29  AT  fill       58  ID2 fill        "
  13 QI  unused (X)            30  WBT fill***    59  ID3 fill        "
  14 DS  dupelim (X)           31  DPT fill***    60  ID4 fill        "
  15 DC  dupelim (X)           32  SST fill       61  ID5 fill        "
  16 TC  unused (X)            33  SI  fill       62  ID6 fill        "
  17 PB  unused (X)            34  N   fill       63  ID7 fill        "
--end location sec.--          35  NH  fill       64  ID8 fill        "
                               36  CL  fill       65  OS  IMM
                               37  HI  fill       66  OP  IMM
                               38  H   fill       67  T2  fill*** (precondition)
                               39  CM  fill       68  IX  fill****
                               40  CH  fill       69  WX  fill
                               41  WD  fill       70  SX  fill
                               42  WP  fill       71  IRD IMM (X)
                               43  WH  fill       72  A6  missing (X)
                               44  SD  fill     --end regular sec.--
                               45  SP  fill     --control sec.--
                               46  SH  fill       73  CK  missing (X)  
-------------------------------------------------------------------------------
* Indicators typically not available in original input formats (e.g., not part
of the WMO code), but which may still be derivable from knowledge about the
input format or external metadata (if available).
** ID characters should be left-justified in the ID array.
*** Problems have arisen from the use of different algorithms to calculate
WBT and DPT, followed by a later inability to determine which quantity was
originally reported.  We suggest that one or the other of these quantities be
left missing rather than computed as part of the conversion to LMR6, unless
both are already extant in the input format.  In that case T2 may be useful to
carry metadata (if available) about original reporting procedures.  NOTE: T2
is presently underconfigured in LMR6 (and settings 3-6 must not be used) to
represent all possible metadata configurations available in IMMT formats.  The
supplemental or error attachments should be used to carry additional metadata.
**** IX is presently underconfigured in LMR6 to represent IX=7 (and "/").  The
supplemental or error attachments should be used to carry additional metadata.
----------
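
The "required (or reject)" rule of Table 1 can be sketched in C as follows.
This is only an illustration of the rule as tabulated, not the logic of
{wrlmr} itself: the names NFIELD, fill_missing, and key_fields_present are
hypothetical, the field numbers are those of Table 1, and the missing value
is compared exactly, as discussed in sec. 5g.

```c
#include <assert.h>

#define NFIELD 73  /* location and regular fields, plus the checksum */

/* Field numbers (1-based) taken from Table 1. */
enum { F_B10 = 1, F_YR = 2, F_MO = 3, F_LON = 7, F_LAT = 8 };

/* Start every report with all fields set to the missing value. */
void fill_missing(float ftrue[NFIELD], float fmiss)
{
    assert(ftrue != 0);
    for (int i = 0; i < NFIELD; i++)
        ftrue[i] = fmiss;
}

/* Returns 1 if YR, MO, LON, and LAT are all extant (not fmiss), else 0;
 * a report failing this test belongs in the user-defined reject file. */
int key_fields_present(const float ftrue[NFIELD], float fmiss)
{
    static const int key[] = { F_YR, F_MO, F_LON, F_LAT };
    assert(ftrue != 0);
    for (int i = 0; i < 4; i++)
        if (ftrue[key[i] - 1] == fmiss)
            return 0;
    return 1;
}
```

A caller would invoke fill_missing() once per report, assign the fields that
are available in the input format, and leave B10 (field 1) at the missing
value, since Table 1 marks it as calculated.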


----------------------------------------------------------------------------\
       (i) MODE=CHR benchmark runs (1-2) and verifications (a-d):           | 
       ----------------------------------------------------------           |
1)FILENAME -> {wrlmr6}   -> FILENAME_LMR6   <-a) cmp -> CHR/FILENAME_LMR6   |
                         -> FILENAME_REJECT <-b) diff-> CHR/FILENAME_REJECT |
                         -> FILENAME_SUMMARY<-c)*diff-> CHR/FILENAME_SUMMARY|
2)FILENAME_LMR6->{rdlmr6}-> rdlmr6_FILENAME <-d)*diff-> CHR/rdlmr6_FILENAME |
                                                                            |
       (ii) MODE=BIN benchmark runs (3-4) and verifications (e-h):          |
       -----------------------------------------------------------          |
3)FILENAME -> {wrlmr6}   -> FILENAME_LMR6   <-e) cmp -> BIN/FILENAME_LMR6   |
                         -> FILENAME_REJECT <-f) cmp -> BIN/FILENAME_REJECT |
                         -> FILENAME_SUMMARY<-g)*diff-> BIN/FILENAME_SUMMARY|
4)FILENAME_LMR6->{rdlmr6}-> rdlmr6_FILENAME <-h)*diff-> BIN/rdlmr6_FILENAME |
                                                                            |
       Example Unix script to execute the CHR benchmark (i):                |
       -------------------------------------------------                    |
wrlmr6                                                                      |
cmp  FILENAME_LMR6    CHR/FILENAME_LMR6                                     |
diff FILENAME_REJECT  CHR/FILENAME_REJECT                                   |
diff FILENAME_SUMMARY CHR/FILENAME_SUMMARY                                  |
rdlmr6 FILENAME_LMR6 > rdlmr6_FILENAME                                      | 
diff rdlmr6_FILENAME  CHR/rdlmr6_FILENAME                                   |
----------------------------------------------------------------------------/
Figure 2.  Benchmark verification procedure.  First, CSET in {wrlmr6,rdlmr6}
should be changed from ASC to EBC, if and only if the native machine character-
set is ebcdic instead of the default ascii.  Two overall benchmarks, based
on MODE, can then be run: i) Using the {wrlmr6} default MODE=CHR.  ii) Using
MODE reset to BIN.  Each benchmark involves one run of {wrlmr6} and one run of
{rdlmr6} (labelled 1-4 in the above figure), and four mechanical verifications
(labelled a-h in the figure).  Unix commands "cmp" (to compare binary data)
and "diff" (to compare character data) are suggested, if available, to check
for exact (bit-for-bit or character-for-character) agreement.  Following are
additional details about implementing the benchmarking procedure:
     (1) Modifications required to {rdlmr6}:  For test i), {rdlmr6} will need
     to be modified from its default to call {getatt,getsup,geterr,prnsup,
     prnerr} to obtain and print the supplemental and error attachments with
     MODE=CHR.  For test ii), {rdlmr6} will need to be further modified only
     to call {prnerr} with MODE=BIN (MODE should remain CHR for {prnsup}).

     (2) Verification against results produced on an ascii computer:  Exact
     agreement is expected for all of the verifications, except for those
     marked by "*" in the figure.  These should differ from the provided
     benchmark outputs only in local system date and time fields (if available)
     appearing in the third line of FILENAME_SUMMARY and in the first line of
     rdlmr6_FILENAME.  Possible cosmetic differences also may appear in the
     final two lines of rdlmr6_FILENAME, where the number of reports and EOF
     status are printed (twice), due to variations in the output appearance
     produced by Fortran "PRINT *" statements on different computers.  Note
     that the provided reject and summary benchmark files were constructed
     using variable-length newline-delimited records; additional differences
     will arise if these outputs are reproduced using fixed-length records,
     in which case "diff -b" (ignore trailing blanks) can be used.

     (3) Verification against results produced on an ebcdic computer:  In
     addition to the differences noted under (2), line 1 of the conversion
     summary will differ since it includes the settings of CSET and MODE (see
     Appendix I).  See Appendix II for a discussion of additional differences
     that arise from file formatting and disk storage details.

     (4) Formats of the reject files:  The MODE=BIN reject file is written out
     by {wrlmr6} in a mixture of ascii and binary regardless of CSET, together
     with blank fill that conforms to CSET (e.g., ebcdic blanks if CSET=EBC).
     In contrast, the MODE=CHR reject file is written out in characters that
     conform to CSET (e.g., ebcdic if CSET=EBC).  Note that complications in
     verification therefore arise for results produced on an ebcdic computer
     (see Appendix II for more information).
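
The mechanical verifications can also be scripted where "cmp" and "diff" are
not available.  The following Python sketch is illustrative only and is not
part of the distributed software; the function names are hypothetical.  Note
that it ignores trailing blanks only, whereas "diff -b" also ignores other
changes in the amount of white space.

```python
def same_bytes(path_a, path_b):
    """Bit-for-bit comparison, analogous to "cmp"."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        return fa.read() == fb.read()


def same_text(path_a, path_b, ignore_trailing_blanks=False):
    """Line-by-line comparison, analogous to "diff"."""
    def load(path):
        with open(path, "r") as f:
            lines = f.read().split("\n")
        if ignore_trailing_blanks:
            # Fixed-length records carry blank fill at the end of each line.
            lines = [line.rstrip(" ") for line in lines]
        return lines
    return load(path_a) == load(path_b)
```

For example, same_text("FILENAME_SUMMARY", "CHR/FILENAME_SUMMARY") would be
expected to report a difference in the date and time fields noted in (2).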


===============================================================================
Appendix I: Structure of the Pathname, Index Record, and Conversion Summary
===============================================================================


A.  Pathname
------------
This is one element of the "index record" (item B).  It represents the target
pathname on the NCAR mass store, and provides a linkage back to the input
FILENAM (e.g., a single input data volume).  It is created automatically by
logical function {wrlmr} using FILENAM (set to "FILENAME" in the default
version of {wrlmr6}) and the programmer's initials INITLS ("?" by default).
The pathname also is temporarily written onto SCRATCH unit 4 for internal
usage by {wrlmr}.  The pathname has the following 27-character structure
(the pathname example, preceded by a 2-line position heading, is taken from
the CHR/FILENAME_SUMMARY benchmark):

         1         2
123456789012345678901234567
/DSS/LMR6/? /FILENAME
[ 1 ][ 2 ][3][     4      ]

[1 ] data support section
[2 ] LMR6 directory
[3 ] programmer's initials: argument INITLS  to {wrlmr6} (default "?")
[4 ] input filename:        argument FILENAM to {wrlmr6} (default FILENAME)
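
The layout above can be sketched in Python as follows.  This is an
illustration only, not the {wrlmr} source code: the function name is
hypothetical, and the element widths (5, 5, 3, and 14 characters) are inferred
from the bracketed position guide rather than stated elsewhere.

```python
def build_pathname(initls="?", filenam="FILENAME"):
    """Assemble the 27-character pathname from its four elements.

    Assumed widths: "/DSS/" (5), "LMR6/" (5), initials plus "/" (3),
    and the blank-filled input filename (14).
    """
    pathname = "/DSS/" + "LMR6/" + initls.ljust(2) + "/" + filenam.ljust(14)
    if len(pathname) != 27:
        raise ValueError("INITLS or FILENAM too long for the layout")
    return pathname
```

With the default arguments this reproduces the benchmark example, including
the six trailing blanks that fill element 4.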


B. Index Record
---------------
This one-line record is generated automatically by logical function {wrlmr} as
the third line (93 characters long) of the "conversion summary" (see item C).
The index record is used to track and process individual output data volumes.
It combines the pathname, which is generated from arguments supplied by the
user (see item A), with overall run statistics (e.g., B10 range and years
covered) that are tracked by {wrlmr} in the course of processing the total
number of records included in a given input data volume.  The index record has
the following 93-character structure (the index record example, preceded by a
2-line position heading and separated for readability onto two lines between
positions 61-62, is taken from the CHR/FILENAME_SUMMARY benchmark):

         1         2         3         4         5         6
1234567890123456789012345678901234567890123456789012345678901
/DSS/LMR6/? /FILENAME        33  33 2024 2024      69       1
[             1           ][ 2][ 3][ 4 ][ 5 ][   6  ][   7  ]

        7         8         9
23456789012345678901234567890123
       929 19980320 14:23:16   1
[    8   ][   9   ][   10  ][11]

[1 ] pathname on the NCAR mass store (see item A)
[2 ] min B10
[3 ] max B10
[4 ] min year
[5 ] max year
[6 ] number of rptout logical records
[7 ] number of rptout physical records
[8 ] number of rptout words
[9 ] creation date
[10] creation time
[11] 1 or file number in a multi-file file
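
Since the index record is column-aligned, it can be split by position.  The
following Python sketch assumes the element widths read off the bracketed
position guide above; the element names and the function name are hypothetical
labels chosen for illustration.

```python
# (name, width) pairs; the widths sum to 93 characters.
INDEX_FIELDS = [
    ("pathname", 27), ("min_b10", 4), ("max_b10", 4),
    ("min_year", 5), ("max_year", 5),
    ("logical_records", 8), ("physical_records", 8), ("words", 10),
    ("creation_date", 9), ("creation_time", 9), ("file_number", 4),
]


def parse_index_record(record):
    """Split a 93-character index record into a dict of stripped strings."""
    record = record.ljust(93)          # tolerate trimmed trailing blanks
    fields, start = {}, 0
    for name, width in INDEX_FIELDS:
        fields[name] = record[start:start + width].strip()
        start += width
    return fields


# The benchmark example, rebuilt element by element to make widths explicit.
EXAMPLE = ("/DSS/LMR6/? /FILENAME".ljust(27) + "33".rjust(4) + "33".rjust(4)
           + "2024".rjust(5) + "2024".rjust(5) + "69".rjust(8) + "1".rjust(8)
           + "929".rjust(10) + "19980320".rjust(9) + "14:23:16".rjust(9)
           + "1".rjust(4))
```

Calling parse_index_record(EXAMPLE) returns the eleven elements as strings,
which leaves any numeric conversion to the caller.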


C. Conversion summary
---------------------
The conversion summary information is gathered automatically by logical
function {wrlmr} over the course of processing an entire input data volume,
and then a standardized tabulation is output (unit 8) automatically at run
termination (when {wrlmr} is called with JEOF=2).  This summary provides a
variety of useful tabular information about how many records were input,
output, and rejected; how many individual fields were extant, missing, and
erroneous; and details about why reports and fields were rejected.  When
data volumes are being converted in final production mode, we suggest that
each corresponding conversion summary be permanently retained to preserve
the index record (line 3) and the other detailed conversion statistics.

Following is a discussion of the major elements of the conversion summary,
illustrated with selected excerpts interleaved from the conversion summary
that is provided in the CHR/FILENAME_SUMMARY benchmark:

a) Line 1:  This records the user-assigned program name and version (arguments
PNAME and PLEVEL; by default "?"), the basic software name and version, and
the settings in use of environment variables MODE (CHR) and CSET (ASC):
 ??      {WRLMR6}.01A CHR ASC

b) Line 2: This provides the total numbers of reports read, rejected, and
written (including the percentage written with respect to input):
      73 REPORTS READ            4 REPORTS REJECTED       69 LMR6 WRITTEN ( 95%)

c) The index record, including the pathname, forms the third line of the
conversion summary (see items A-B).

d) The SUMMARY OF FIELDS provides output counts, and percentages with respect
to the total output report count (line 2), separately for each of the 72 LMR6
fields in the location and regular sections.  Note that erroneous fields are
counted both in the missing and erroneous columns (since they are missing in
the regular or location section, but also appear in the error attachment):

 SUMMARY OF FIELDS
 FIELD    # EXTANT  # MISSING  # ERRONEOUS    % EXTANT  % MISSING  % ERRONEOUS
  1 B10         69          0            0         100          0            0
  2 YR          69          0            0         100          0            0
  3 MO          69          0            0         100          0            0
  4 DY          68          1            1          99          1            1
 . . .
 69 WX          68          1            1          99          1            1
 70 SX          68          1            1          99          1            1
 71 IRD          0         69            0           0        100            0
 72 A6           0         69            0           0        100            0
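
The counts and percentages in the excerpts above can be cross-checked with a
few lines of Python.  This sketch assumes that percentages are rounded to the
nearest whole number, which is consistent with the excerpts (truncation would
give 94% rather than 95% on line 2) but is not confirmed by the software
documentation; the function name is hypothetical.

```python
def percent(count, total):
    """Whole-number percentage, rounded to the nearest integer (assumed)."""
    return int(100.0 * count / total + 0.5)


reports_read, reports_rejected = 73, 4
lmr6_written = reports_read - reports_rejected       # 69
written_pct = percent(lmr6_written, reports_read)    # 95

# Field 4 (DY): the one erroneous value is counted as missing as well,
# so extant + missing equals the number of reports written.
dy_extant, dy_missing, dy_erroneous = 68, 1, 1
dy_pcts = [percent(n, lmr6_written)
           for n in (dy_extant, dy_missing, dy_erroneous)]
```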

e) The SUMMARY OF ERROR ATTACHMENTS provides a sorted distribution frequency
of patterns of characters (or binary data) stored in the error attachment (in
the example benchmark data, there is only one pattern associated with each
field).  When MODE=CHR, separate columns list each value in both hexadecimal
and characters, except any characters judged unprintable are set to blank
(intrinsic Fortran function ICHAR is used to determine if the stored ascii
characters fall outside the inclusive range 32-126, i.e., space through "~").
The stored field width (which must be an integral number of 8-bit bytes) can
be determined from the width of the string of hexadecimal digits (i.e., each
hexadecimal digit represents 4 bits, so two digits represent one byte).  In
contrast, when MODE=BIN (not shown), the character column is omitted.  The
sort is executed by considering each complete line as a sort key, and using
Fortran intrinsic lexical comparison functions to ensure identical sort
ordering on different computer systems (otherwise collating sequences may
differ among systems, e.g., ebcdic versus ascii):

 SUMMARY OF ERROR ATTACHMENTS
 FIELD- CHARACTER------ HEXADECIMAL------------------- --------------FREQUENCY
  4 DY  31              3331                                                 1
  5 HR  2399            32333939                                             1
  6 TI  3               33                                                   1
  9 LI  6               36                                                   1
 . . .
 67 T2  6               36                                                   1
 68 IX  6               36                                                   1
 69 WX  1               31                                                   1
 70 SX  1               31                                                   1
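
The relationship between the CHARACTER and HEXADECIMAL columns can be sketched
in Python as follows.  This is an illustration of the rules described in item
e), not a transcription of the Fortran code; the function names are
hypothetical, and the use of upper-case hexadecimal letters is an assumption
(the excerpt contains digits only).

```python
def render_attachment(raw):
    """Return (character column, hexadecimal column) for stored ascii bytes."""
    # Codes outside the inclusive range 32-126 are shown as blank.
    chars = "".join(chr(b) if 32 <= b <= 126 else " " for b in raw)
    hexa = "".join("%02X" % b for b in raw)
    return chars, hexa


def field_width_bytes(hexa):
    """Two hexadecimal digits describe one 8-bit byte."""
    return len(hexa) // 2


def sort_lines(lines):
    """Order complete lines by their ascii codes, independent of locale."""
    return sorted(lines, key=lambda line: line.encode("ascii"))
```

For the HR entry above, render_attachment(b"2399") yields the pair
("2399", "32333939"), i.e., a stored field width of four bytes.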

f) The SUMMARY OF ADDITIONAL INFORMATION provides a similar sorted distribution
frequency of patterns of characters (or binary data) leading to the rejection
of entire reports (as signaled by a false return from {wrlmr}).  The handling
of different MODE settings is as for item e):

 SUMMARY OF ADDITIONAL INFORMATION
 FIELD- CHARACTER------ HEXADECIMAL------------------- --------------FREQUENCY
  2 YR  2024            32303234                                             1
  3 MO  12              3132                                                 1
  7 LON 35999           3335393939                                           1
  8 LAT  9000           2039303030                                           1


===============================================================================
Appendix II:  Additional ebcdic and IBM mainframe (MVS) computing issues
===============================================================================


A. Example Job Control Language (JCL)
-------------------------------------
Example JCL sequences are provided for running {wrlmr6} and {rdlmr6}, followed
by a discussion in each case of important elements of the JCL related to file
naming and other characteristics.  These JCL sequences were tested on an IBM
MVS system at NCAR.  However, it should be noted that the precise definitions
of disk tracks, and possibly other file specifications or JCL elements, may
be site dependent (e.g., the details of the JOB statement).

1) {wrlmr6}:
//SNOWW JOB (43310016,8191),'WRLMR6',CLASS=A,MSGCLASS=X
//  EXEC VSF2CLG,FVPOLST='NOLIST,CHARLEN(512)',FVTERM='SYSOUT=*',
//  GOF6DD='SYSOUT=*'
//FORT.SYSPRINT  DD  SYSOUT=*
//FORT.SYSLIN  DD  SPACE=(TRK,(50,50),RLSE)
//FORT.SYSIN  DD  *

     (insert Fortran code from {wrlmr6} plus from required libraries here)

/*
//LKED.SYSPRINT  DD  SYSOUT=*
//GO.FILENAME DD  DISP=SHR,DSN=SNOW.LUBKER.FILENAME,
//  DCB=(RECFM=FB,LRECL=140,BLKSIZE=5880)
//GO.FT02F001 DD  UNIT=SYSDA,VOL=SER=SYS001,DISP=(NEW,CATLG),
//  SPACE=(TRK,(1,1)),DSN=SNOW.LUBKER.FT02.OUTPUT,
//  DCB=(RECFM=FB,LRECL=140,BLKSIZE=5880)
//GO.FT03F001 DD  UNIT=SYSDA,VOL=SER=SYS001,DISP=(NEW,CATLG),
//  SPACE=(TRK,(1,1)),DSN=SNOW.LUBKER.FT03.OUTPUT,
//  DCB=(RECFM=FB,LRECL=4096,BLKSIZE=4096)
//GO.FT08F001 DD  UNIT=SYSDA,VOL=SER=SYS001,DISP=(NEW,CATLG),
//  SPACE=(TRK,(1,1)),DSN=SNOW.LUBKER.FT08.OUTPUT,
//  DCB=(RECFM=FB,LRECL=93,BLKSIZE=930)
//GO.SYSIN DD *
FILENAME
/*

     Discussion:
     a) File naming:  Except for the name FILENAME (read from SYSIN), the
     input/output filenames we have developed in the Unix environment (e.g.,
     "FILENAME_LMR6") present problems in IBM mainframe (MVS) Job Control
     Language (JCL).  Underscore ("_") is not allowed, and there may be
     additional site-specific constraints on filenames.  The above JCL
     associates each output unit 2, 3, and 8 with a file appropriately named
     in the NCAR site environment (e.g., DSN=SNOW.LUBKER.FT02.OUTPUT).  In
     addition, the main program {eg} within {wrlmr6} must be modified to
     comment out (deactivate) the corresponding OPEN statements.

     b) Input/output record lengths and blocksizes:  All of the files are
     formatted as fixed-length and blocked records (RECFM=FB).  In contrast,
     the benchmark (input) FILENAME as produced on Unix is a variable-length
     newline-delimited file containing two lines: the first 140, and the
     second 67, characters in length.  In preparation for input on an IBM
     system, the second line may need to be padded out to create a fixed-record
     length of 140 characters.  As shown above, the record lengths (LRECL) of
     output on units 2, 3, and 8 should be set to 140, 4096, and 93 characters,
     respectively.  The blocksizes (BLKSIZE) are set as multiples (1 in the
     case of the LMR6 output) of the record size.

     c) Output file characteristics:  The smallest disk file is 1 TRK, so every
     IBM generated file will be padded out (e.g., with binary zero in the LMR6
     output file) to create a final full track.  In the case of the benchmark
     test outputs the amount of padding is excessive, but it may be relatively
     minor for real data processing.

     d) CHARLEN(512): This important setting must be at least as large as the
     longest character string used anywhere in the Fortran code.

2) {rdlmr6}:
//SNOWW JOB (43310016,8191),'RDLMR6',CLASS=A,MSGCLASS=X
//  EXEC VSF2CLG,FVPOLST='NOLIST,CHARLEN(1992)',FVTERM='SYSOUT=*',
//  GOF6DD='SYSOUT=*,DCB=LRECL=140'
//FORT.SYSPRINT  DD  SYSOUT=*
//FORT.SYSLIN  DD  SPACE=(TRK,(50,50),RLSE)
//FORT.SYSIN  DD  *

     (insert Fortran code from {rdlmr6} plus from required libraries here)

//LKED.SYSPRINT  DD  SYSOUT=*
//GO.FT01F001 DD  DISP=SHR,DSN=SNOW.LUBKER.FT03.OUTPUT,
//  DCB=(RECFM=FB,LRECL=4096,BLKSIZE=4096)
//GO.SYSIN DD *
FILENAME
/*

     Discussion:
     a) File naming:  The above JCL associates input unit 1 with the file name
     of the LMR6 output from {wrlmr6}.  The output appears as part of standard
     output (SYSOUT).  For reasons discussed under {wrlmr6}, "rdlmr6_FILENAME"
     is not allowed in JCL.

     b) Input/output record lengths and blocksizes:  The input LMR6 data are
     formatted as fixed-length and blocked records (RECFM=FB).  As shown
     above, the record length (LRECL) of input on unit 1 should be set to
     4096 characters, and the LRECL of output (on SYSOUT) to 140 characters.
     The blocksize (BLKSIZE) of input is identical to the record size.

     c) Output file characteristics: Since the printed LMR6 output appears as
     part of standard output (SYSOUT), it may be intermingled with compiler
     and execution listings and diagnostics.

     d) CHARLEN(1992): This important setting must be at least as large as the
     longest character string used anywhere in the Fortran code.
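
The padding of the Unix benchmark input to fixed-length records, and the
relation between LRECL and BLKSIZE, both discussed under item b) for {wrlmr6},
can be sketched in Python as follows.  The function names are hypothetical,
and the sketch addresses only the record layout, not the transfer of the file
to the IBM system.

```python
def pad_to_fixed(lines, lrecl=140):
    """Blank-pad newline-delimited lines to fixed-length records."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if len(line) > lrecl:
            raise ValueError("line longer than LRECL=%d" % lrecl)
        records.append(line.ljust(lrecl))
    return records


def blocking_factor(lrecl, blksize):
    """Records per block; BLKSIZE is expected to be a multiple of LRECL."""
    if blksize % lrecl != 0:
        raise ValueError("BLKSIZE is not a multiple of LRECL")
    return blksize // lrecl
```

Applied to the JCL above, the blocking factors are 42 for unit 2 (5880/140),
1 for the LMR6 output on unit 3 (4096/4096), and 10 for unit 8 (930/93).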


B. Benchmark verification
-------------------------
Probably the most practical means of verification is to transfer (e.g., via
ftp) outputs produced on an ebcdic computer back to a Unix ascii computer for
verification against benchmarks stored on the ascii system.  In this case, the
LMR6 outputs should be ftp'd using binary (or image) "representation type,"
and the other outputs (except for the MODE=BIN reject file, as discussed below)
using characters (typically the ftp default).  The latter involves a conversion
from ebcdic to ascii, which should match that produced by {ebcasc.f} for the
commonly used characters present in the benchmark outputs.

Note that the presence of trailing binary zeros in the LMR6 outputs, due to
padding to a full disk track, produces files that will not agree according
to "cmp" with the benchmark LMR6 outputs.  However, these files should
still be readable by {rdlmr6}, allowing a "diff" against the corresponding
(rdlmr6_FILENAME) printed benchmark output.
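
As an alternative to "cmp" for the padded LMR6 outputs, one could check that
the transferred file begins with the benchmark contents and is followed only
by binary zeros.  The following Python sketch shows such a check; it is not
part of the distributed software, and the function name is hypothetical.

```python
def matches_with_zero_padding(padded, benchmark):
    """True if padded equals benchmark followed only by binary zeros."""
    if padded[:len(benchmark)] != benchmark:
        return False
    tail = padded[len(benchmark):]
    return tail == bytes(len(tail))    # bytes(n) is n zero bytes
```

Both arguments are the complete file contents read in binary mode, e.g.,
open(name, "rb").read().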

If the non-LMR6 outputs are formatted using fixed-length records (as in the
above JCL examples), they cannot be successfully verified using a simple
"diff" due to the presence of trailing blanks.  However, "diff -b" will ignore
trailing blanks.  Alternatively, variable-length records (RECFM=VB) may be
chosen for some of the output files to avoid trailing blanks, although we have
not fully explored this option.

As discussed in Figure 2, the formats of the reject files are different for
MODE=CHR (output in the native machine character set, as indicated by CSET)
versus MODE=BIN (a mixture of ascii and binary regardless of CSET, plus blank
fill in conformance with CSET).  Therefore, the CHR reject file should be
ftp'd using characters, and the BIN reject file using binary.  However,
verification of the BIN reject file then requires preparatory hand-editing
(removing ebcdic blanks, and making other changes) so that the file looks like
an ordinary Unix file.