RFC 242
NIC 7672
Categories: D.4, D.7

               DATA DESCRIPTIVE LANGUAGE FOR SHARED DATA

                                L. Haibt
                               A. Mullery

                    Thomas J. Watson Research Center
                         Yorktown Heights, N.Y.

                             July 19, 1971

Introduction

    A primary consequence of the use of networks of computers is the
demand for more efficient shared use of data.

    Many of the impediments to easy sharing of data follow from the many
diverse ways of representing and making reference to the same data.
Almost all of these problems were known before data was shared
through computer networks; the network facility has simply
emphasized them.

    For convenience of discussion, representation differences will be
classified in three categories. The first category is one of very local
representation: the bit patterns for the character set and for fixed point
and floating point numbers. These differences are usually imposed by
differences in CPUs and storage devices. Translations from one
representation to another at this level can usually be made a
unit at a time (e.g., computer word by computer word), with the most
serious problems occurring when there are some values in one
representation scheme which have no corresponding meaning in the other
representation scheme, as, for example, when trying to translate
eight-bit bytes to six-bit bytes.
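
    As a minimal sketch of unit-at-a-time translation, the hypothetical
function below maps each source unit through a lookup table and reports
the units that have no counterpart in the target scheme, the case
described above as the most serious. The table is an illustrative
assumption, not a real character code.

```python
from typing import Dict, List, Optional, Tuple


def translate_units(
    units: List[int], table: Dict[int, int]
) -> Tuple[List[Optional[int]], List[int]]:
    """Translate a sequence of source units one unit at a time.

    Returns the translated units (None where the target scheme has no
    corresponding value) and the positions of those untranslatable units.
    """
    translated: List[Optional[int]] = []
    unmapped: List[int] = []
    for pos, unit in enumerate(units):
        target = table.get(unit)
        if target is None:
            unmapped.append(pos)
        translated.append(target)
    return translated, unmapped


# Hypothetical table: eight-bit values 0..63 map onto the six-bit value with
# the same number; the other 192 eight-bit values have no six-bit counterpart.
EIGHT_TO_SIX = {code: code for code in range(64)}

out, bad = translate_units([5, 63, 200, 17], EIGHT_TO_SIX)
print(out)  # [5, 63, None, 17]
print(bad)  # [2]
```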

    A second category of differences has to do with the representation
of collections of data, e.g., their size, ordering and location.

                                                                [Page 1]


    A third category of representation differences, which is a little
difficult to characterize, has to do with all the more complex structures
that data collections may have: for example, files with indexes, fields
with internal pointers and cross references, and collections of files
such as partitioned data sets and generation data sets in OS/360.

    The approach to coping with these problems within our Network/440
project has been to work on the development of a descriptive
language which would permit the specification of those aspects of data
representation which are subject to transformation when data is moved
about in a network. The network data management system would then be
able to refer to the descriptions as needed in the data management
function. For example, to a large extent, one could supply two
descriptions to the data manager, one which indicates how data is now
represented and one which indicates how a copy of it should look, and
the data management system could invoke the necessary transformations to
make the proper copy.
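
    The descriptor-to-descriptor idea can be sketched as follows,
assuming (purely for illustration) that a description is a flat list of
the type codes defined below and that the data manager holds one
conversion routine per pair of codes. The names `make_copy` and
`CONVERTERS` are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical conversion routines keyed by (source code, target code), using
# the type codes listed below (I fixed, F floating point, C character string).
CONVERTERS: Dict[Tuple[str, str], Callable[[Any], Any]] = {
    ("I", "F"): float,
    ("F", "I"): int,  # truncates toward zero
    ("I", "C"): str,
    ("C", "I"): int,
}


def make_copy(values: List[Any], src_desc: List[str], dst_desc: List[str]) -> List[Any]:
    """Build a copy of `values`, now described by `src_desc`, that matches `dst_desc`."""
    if not (len(values) == len(src_desc) == len(dst_desc)):
        raise ValueError("data and descriptions must have the same number of items")
    copy: List[Any] = []
    for value, src, dst in zip(values, src_desc, dst_desc):
        if src == dst:
            copy.append(value)  # same type code: nothing to transform
        else:
            copy.append(CONVERTERS[(src, dst)](value))  # KeyError if no routine exists
    return copy


print(make_copy([42, 3.75, "17"], ["I", "F", "C"], ["C", "I", "I"]))  # ['42', 3, 17]
```

In this sketch only the two descriptions change from one copy to the
next, while the conversion routines are written once.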

    This approach to specifying data transformation contrasts somewhat
with systems, such as the RAND Form Machine, which provide a formalism
for specifying the particular translation algorithms for changing from
one form to another. The descriptor-to-descriptor approach seems to
simplify the programming burden when creating new field formats. Neither
method of specifying translations precludes the use of a Network
Standard Representation.

Structure

    The descriptive language assumes that data may have an inherent
structure independent of other groupings, such as name groupings,
locking groupings, etc., imposed on it. A data structure description
consists of groupings of established data value type codes. The list of
established data value types should be sufficient, through appropriate
groupings, to describe any hierarchical structure of data.

    The data type identifies how the data value is to be interpreted. A
list of data type codes is given below. This list must be able to identify
each data type that may exist in a data set in any machine in the
network. However, for data sets that contain only data types of the
machine on which they are stored, it is not necessary that a different code
be defined for different forms of any single type that may exist among
different machines. The data type specified in the description, together
with the identification of the machine at which the data is stored, is
sufficient to completely describe all such forms of the data types. A
tentative list of machine dependent type codes, compiled by
G. Howe and T. Kibler, is as follows:

          F    floating point

          I    fixed

          D    double precision floating point

          C    character string

          X    complex

          P    packed decimal

          L    logical
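
    To show why a type code together with the identity of the storing
machine can be enough, the sketch below looks up a concrete layout from a
(machine, code) pair. The machines MACHINE-A and MACHINE-B and their
layouts are hypothetical, and Python `struct` format strings stand in for
real machine formats.

```python
import struct
from typing import Any, Dict, Tuple

# Hypothetical machines on which the same type code has different layouts,
# expressed as Python struct format strings.
LAYOUTS: Dict[Tuple[str, str], str] = {
    ("MACHINE-A", "I"): ">i",  # 32-bit big-endian two's complement
    ("MACHINE-A", "F"): ">f",  # 32-bit big-endian IEEE float (a modern stand-in)
    ("MACHINE-B", "I"): "<h",  # 16-bit little-endian two's complement
    ("MACHINE-B", "F"): "<d",  # 64-bit little-endian IEEE float
}


def decode(machine: str, code: str, raw: bytes) -> Any:
    """Interpret raw bytes as a value of type `code` as stored at `machine`."""
    (value,) = struct.unpack(LAYOUTS[(machine, code)], raw)
    return value


def encode(machine: str, code: str, value: Any) -> bytes:
    """Produce the bytes representing `value` as type `code` at `machine`."""
    return struct.pack(LAYOUTS[(machine, code)], value)


# Moving a fixed-point value from machine A to machine B.
raw_a = encode("MACHINE-A", "I", 1000)
raw_b = encode("MACHINE-B", "I", decode("MACHINE-A", "I", raw_a))
print(raw_a.hex(), raw_b.hex())  # 000003e8 e803
```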

    It is desirable to be able to construct data sets that contain
either data types not allowable at the machine at which the data set
is stored or, possibly, even types that may not exist at any machine
in the network. For example, one may wish to store eight bit data on a
six bit machine.  This may, in principle at least, be done by defining
a logical data set containing eight bit bytes in terms of a real data
set containing six bit characters. For this, however, data value type
descriptors have to be defined that are machine independent. The basic
machine independent data type is as follows:

          B    bit.

It is not clear at this time that any others are necessary since others
can be built from this one. For convenience, other standard machine
independent data types may later be defined.
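
    The eight-bit-on-six-bit example can be sketched by treating the data
as a string of type 'B' bits and regrouping it. This is a minimal
illustration under assumed conventions (most significant bit first, zero
padding in the last six-bit character); the function names are
hypothetical.

```python
from typing import List


def to_bits(units: List[int], width: int) -> List[int]:
    """Expand each unit into `width` bits, most significant bit first."""
    bits: List[int] = []
    for unit in units:
        if not 0 <= unit < (1 << width):
            raise ValueError(f"{unit} does not fit in {width} bits")
        bits.extend((unit >> shift) & 1 for shift in range(width - 1, -1, -1))
    return bits


def from_bits(bits: List[int], width: int) -> List[int]:
    """Regroup bits into units of `width` bits, dropping any trailing partial unit."""
    units: List[int] = []
    for start in range(0, len(bits) - width + 1, width):
        value = 0
        for bit in bits[start:start + width]:
            value = (value << 1) | bit
        units.append(value)
    return units


def store_as_six_bit(data: bytes) -> List[int]:
    """Lay eight-bit bytes over six-bit characters, zero-padding the last one."""
    bits = to_bits(list(data), 8)
    bits += [0] * (-len(bits) % 6)
    return from_bits(bits, 6)


def load_from_six_bit(chars: List[int]) -> bytes:
    """Recover the eight-bit bytes; the padding is under 8 bits, so it is dropped."""
    return bytes(from_bits(to_bits(chars, 6), 8))


stored = store_as_six_bit(b"Hi!")
print(stored)                     # [18, 6, 36, 33]
print(load_from_six_bit(stored))  # b'Hi!'
```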

    Two other machine independent types are useful in describing
structures. These are:

          Z    null

          O    omit.

The null type indicates that there is no data corresponding to this
item; however, the item should be counted as existing in the structure.
The omit type indicates just the opposite: there is data, but it should
not be counted as an item; it should be ignored.
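
    A minimal sketch of how 'Z' and 'O' affect item counting, assuming
for illustration a flat description in which every other type code
consumes exactly one unit of data (`read_items` is a hypothetical name):

```python
from typing import List, Optional, Tuple


def read_items(desc: List[str], data: List[str]) -> List[Tuple[str, Optional[str]]]:
    """Pair each counted item in `desc` with the data unit it covers.

    'Z' (null): counted as an item, consumes no data.
    'O' (omit): consumes one data unit, is not counted as an item.
    Any other code: counted, and assumed to consume exactly one data unit.
    """
    items: List[Tuple[str, Optional[str]]] = []
    pos = 0
    for code in desc:
        if code == "Z":
            items.append((code, None))  # the item exists but has no data
        elif code == "O":
            pos += 1                    # the data exists but is ignored
        else:
            items.append((code, data[pos]))
            pos += 1
    return items


print(read_items(["C", "Z", "O", "C"], ["ab", "filler", "cd"]))
# [('C', 'ab'), ('Z', None), ('C', 'cd')]
```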

    A grouping of data values is described by the list of elements of
the grouping enclosed in parentheses. An element of a grouping may be
either a data value, as described by one of the data value type codes, or
a grouping. The list consists of these elements separated by commas, and
indicates that the elements appear in the grouping in the order
indicated. For example, the description:

          ((C,C),(F,F,I))

describes a data collection consisting of two subgroupings, the first
subgrouping consisting of two data values of type 'C', and the second
subgrouping consisting of two data values of type 'F' followed by a data
value of type 'I'. The structure of this data collection is thus a three
level tree which may be shown in two dimensions as follows:

          ( )
           |
          ( )---( )
           |     |
           C-C   F-F-I
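
    A small recursive-descent parser makes the grouping notation concrete.
It is a sketch that handles only single-letter type codes, parentheses
and commas (no modifiers), and returns the tree as nested Python lists;
`parse` and `levels` are hypothetical names.

```python
from typing import List, Tuple, Union

Item = Union[str, List["Item"]]  # a type code, or a grouping of items


def parse(desc: str) -> Item:
    """Parse a description such as '((C,C),(F,F,I))' into nested lists."""
    text = desc.replace(" ", "")

    def char(pos: int) -> str:
        if pos >= len(text):
            raise ValueError("description ends unexpectedly")
        return text[pos]

    def item(pos: int) -> Tuple[Item, int]:
        if char(pos) == "(":
            elements: List[Item] = []
            pos += 1
            while True:
                element, pos = item(pos)
                elements.append(element)
                if char(pos) == ",":
                    pos += 1
                elif char(pos) == ")":
                    return elements, pos + 1
                else:
                    raise ValueError(f"expected ',' or ')' at position {pos}")
        if char(pos).isalpha():
            return char(pos), pos + 1
        raise ValueError(f"unexpected {char(pos)!r} at position {pos}")

    tree, end = item(0)
    if end != len(text):
        raise ValueError("characters left over after the description")
    return tree


def levels(tree: Item) -> int:
    """Number of levels: the root, each layer of subgroupings, and the data values."""
    if isinstance(tree, str):
        return 1
    return 1 + max(levels(element) for element in tree)


tree = parse("((C,C),(F,F,I))")
print(tree)          # [['C', 'C'], ['F', 'F', 'I']]
print(levels(tree))  # 3
```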

Properties

    Other properties of data beyond the structure and
composition of the data set must also be described. These may be
assigned to items of the data collection, where an item may be defined
as an individual data value or a grouping of these, by modifying the
item description with the specification of the properties that apply to
it. The notation used is an infix notation of the
form:

         operand operator |[extent]|

where the operator indicates the property type, the operand the property
value, and the optional extent the number of items to which the property
applies. Normally the property is assumed to apply to just the following
item in the description. If the property is to apply to more than just
the following item description, this is indicated by specifying a number
as the extent, this number indicating how many of the following item
definitions at the same level the property is to apply to.

    Type - The structure description of the data set is a constitutional
or syntactic description of the data set. In some cases it is necessary
to give a description of the use or meaning of an element. For example,
in some complex data structures, the linkages of the structures may be
represented as data values in the data set. Thus, though the more
complex data structure is represented in a hierarchical form and, as a
result, is in a form describable by the above notation, the data values
that represent the linkages, and their meaning, must somehow be
represented in the data description in order for the complex data
structure to be truly described. As another example, one may wish to
ascribe to some level of the data structure the type 'record' so that
the data set can be used by some system which uses the concept 'record'
in accessing the data.

    What an initial set of such types should be has not been decided.

    Names - Items of a data structure can be given names by modifying
the item's description with a notation of the form:

          name n |[extent]|.

Depending on the context of its use, the name can refer to the
description itself or to the data pertaining to the named part of the
description. The name is assumed to be unique only within the extent
of the next higher encompassing name, unless otherwise indicated
by giving another encompassing name as the scope. This may be the name
of the whole data set or description, for example. The scope of a name
is specified by preceding the inner name by the outer name or names,
separated by dots. The name:

          A.BETA

indicates that the scope of the name BETA is A.

    The name applies to just the following item in the description
unless otherwise indicated by including the extent parameter. For
example, the description:

          (An(C,C), (Bn[ 2 ]F,Cn[ 2 ]F,I))

indicates that name A is given to the item that contains two data values
of type 'C', the name B to two data values, both of type 'F', and the
name C to the last two data values, one of type 'F' and the other of
type 'I'. Notice that with this notation, extents can overlap. For
example, in the above description, the extent of name B overlaps that
of name C.

    In a description, the same name can be applied to more than one item
definition either by use of the extent parameter, or by actually
specifying the name at each item to be included in the extent of the
name. If a name is multiply-applied within the same scope, then the name
is assumed to apply to the aggregate of the items to which it has been
given. Thus it is possible to apply names to aggregates of items that are
not necessarily sequential.
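
    The rules for names at a single level (extents, overlap, and
aggregation of a name applied more than once) can be sketched as below.
The encoding of each item definition as a (names, type code) pair is an
assumption made for illustration, and positions are 0-based;
`name_extents` is a hypothetical name.

```python
from typing import Dict, List, Tuple

# One level of a description: each item definition carries the names given to
# it, each with its extent, plus its type code. (Bn[2]F,Cn[2]F,I) becomes
# [([("B", 2)], "F"), ([("C", 2)], "F"), ([], "I")].
Level = List[Tuple[List[Tuple[str, int]], str]]


def name_extents(level: Level) -> Dict[str, List[int]]:
    """Map each name to the 0-based positions of the items it covers at this level.

    A name with extent k covers its own item and the k-1 items after it.
    A name given more than once accumulates into one aggregate, which need
    not be contiguous.
    """
    covered: Dict[str, List[int]] = {}
    for pos, (names, _code) in enumerate(level):
        for name, extent in names:
            if pos + extent > len(level):
                raise ValueError(f"extent of {name} runs past the end of the level")
            positions = covered.setdefault(name, [])
            for p in range(pos, pos + extent):
                if p not in positions:
                    positions.append(p)
    return {name: sorted(ps) for name, ps in covered.items()}


# (Bn[2]F,Cn[2]F,I): the extents of B and C overlap on the second item.
print(name_extents([([("B", 2)], "F"), ([("C", 2)], "F"), ([], "I")]))
# {'B': [0, 1], 'C': [1, 2]}

# (AnC,F,AnC): A is applied twice, giving a non-sequential aggregate.
print(name_extents([([("A", 1)], "C"), ([], "F"), ([("A", 1)], "C")]))
# {'A': [0, 2]}
```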

    Lock - During the course of processing data, it may be necessary to
lock out use of some portion of data to other users. Segmentation of a
data set into units for locking purposes may be indicated by the
notation:

          k|[extent]|.

Whether or not the data is locked and the type of lock applied (for
example, write protect or read/write protect) is specified at the time
the data is used.

    Authorization - Authorization for a user to access data may be
governed by some access code assigned to the data.  This access code can
be specified in the description by modifying the desired elements of the
description with an indication of the code. The notation is:

          code a |[extent ]|

Control

    Two modifiers are provided which govern the existence of items in
the definition. The first is the repetition modifier:

          factor r |[extent ]|.

This causes the following item definition or item definitions (if the
extent indicates more than one) to be repeated. Thus the description

          (3rC)

is equivalent to the description

          (C,C,C).
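
    Expansion of the repetition modifier can be sketched as below. The
tuple ('r', factor, extent) is a hypothetical encoding of
'factor r [extent]', and, as a simplification, the definitions it governs
may not carry further modifiers.

```python
from typing import List, Tuple, Union

# An item definition is a type code ('C'), a grouping (a list of item
# definitions), or a repetition modifier written as ('r', factor, extent),
# which governs the next `extent` item definitions at the same level.
Definition = Union[str, list, Tuple[str, int, int]]


def expand(level: List[Definition]) -> List[Definition]:
    """Replace every repetition modifier by the repeated item definitions."""
    out: List[Definition] = []
    i = 0
    while i < len(level):
        d = level[i]
        if isinstance(d, tuple):
            _, factor, extent = d
            body = level[i + 1:i + 1 + extent]
            if len(body) < extent or any(isinstance(b, tuple) for b in body):
                raise ValueError("repetition not followed by enough plain item definitions")
            for _ in range(factor):
                out.extend(expand_item(b) for b in body)
            i += 1 + extent
        else:
            out.append(expand_item(d))
            i += 1
    return out


def expand_item(d: Definition) -> Definition:
    """Groupings are expanded recursively; type codes are returned unchanged."""
    return expand(d) if isinstance(d, list) else d


print(expand([("r", 3, 1), "C"]))                 # ['C', 'C', 'C']
print(expand([("r", 2, 2), "F", "I"]))            # ['F', 'I', 'F', 'I']
print(expand([("r", 2, 1), [("r", 3, 1), "C"]]))  # [['C', 'C', 'C'], ['C', 'C', 'C']]
```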

The other control modifier is the condition modifier:

          condition c |[extent ]|.

If the condition specified is not true, then the following item
definition is ignored. The condition is specified by a Boolean
expression.

    Since several modifiers may apply to an item definition, there is a
problem concerning the relationship among them. For example, if a
repetition modifier and a conditional modifier apply to an item, does
the condition apply to all the repeated items, or only to the first,
assuming the extent of the condition modifier is one? The effect of
multiple modifiers is dependent on the order in which they are
evaluated. Two possible conventions come to mind. One says that
repetitions are expanded first, then properties applied, and finally
conditions applied to the resulting expanded item definitions,
independent of the order in which the modifiers were specified in the
description. Thus the description

          (A=3c[ 4 ]4rF,I)

is equivalent to

          (4rA=3c[ 4 ]F,I),

and if the condition is true, is equivalent to

          (F,F,F,F,I),

or, if the condition is not true, is equivalent to

          (I).

The other convention is that the modifiers are evaluated according to the
order in which they appear in the description, specifically in reverse
order: the modifier immediately preceding an item definition is evaluated
first, then the one next preceding, etc. This gives more flexibility of
meaning to multiple modifiers. For example, the descriptions

          (A=3c3rC)

and

          (3rA=3cC)

are not equivalent. In the first, only the first of the three
repetitions is affected by the condition, whereas in the second, all three
repetitions are affected. Since this second convention is more flexible,
it shall be the one assumed. This convention allows, for example, the
repetition modifier to be applied to a named item as shown:

          (3rAnC).

The name A applies to the three items (in effect, the name A is applied
three times). This facility allows a name to be applied to a vertical
column in a two-dimensional array by, for example, the description:

          (3r[ 3 ]C,AnC,C)

which gives the name A to the second column of the 3x3 array.
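
    The reverse-order convention can be sketched for the extent-1 case.
The encoding of modifiers as ('r', factor), ('c', predicate) and
('n', name) tuples is hypothetical; the three calls correspond to
(A=3c3rC), (3rA=3cC) and (3rAnC).

```python
from typing import Any, Dict, List, Tuple

Env = Dict[str, Any]
ExpandedItem = Tuple[str, Tuple[str, ...]]  # (type code, names applied to it)


def evaluate(modifiers: List[Tuple[str, Any]], code: str, env: Env) -> List[ExpandedItem]:
    """Expand one modified item definition under the reverse-order convention.

    `modifiers` is in written (left-to-right) order; the modifier nearest the
    item is evaluated first. All modifiers here have extent 1:
      ('r', factor)     repetition: repeat everything produced so far
      ('c', predicate)  condition: drop the first item if predicate(env) is false
      ('n', name)       name: give the name to the first item
    """
    items: List[ExpandedItem] = [(code, ())]
    for kind, arg in reversed(modifiers):
        if kind == "r":
            items = items * arg
        elif kind == "c":
            if items and not arg(env):
                items = items[1:]
        elif kind == "n":
            if items:
                first_code, names = items[0]
                items = [(first_code, names + (arg,))] + items[1:]
        else:
            raise ValueError(f"unknown modifier {kind!r}")
    return items


def a_is_3(env: Env) -> bool:
    """The condition A=3 used in the examples."""
    return env["A"] == 3


# (A=3c3rC) with A = 2: the condition governs only the first repetition.
print(len(evaluate([("c", a_is_3), ("r", 3)], "C", {"A": 2})))  # 2
# (3rA=3cC) with A = 2: the condition is repeated, so every copy is dropped.
print(len(evaluate([("r", 3), ("c", a_is_3)], "C", {"A": 2})))  # 0
# (3rAnC): the name A is applied three times.
print(evaluate([("r", 3), ("n", "A")], "C", {}))
# [('C', ('A',)), ('C', ('A',)), ('C', ('A',))]
```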

Reference

    Named descriptions, or parts of descriptions, that have already been
defined may be inserted into a description using the notation:

          $ specification.

In a description, the reference is used as an item definition or a
string of item definitions. The item definitions used are those defined
by the name given. Names that apply to the named item or items as a
whole, in the description where they are defined, are ignored in the
description that makes the reference. However, names that apply to parts
of the named item are carried over to the description that makes the
reference. For example, the description

          (An(F,F),I,$A)

is equivalent to

          (An(F,F),I,(F,F)).

Notice that the name "A" was not carried over to where the description
was referenced since it applied to the referred-to item or items as a
whole.
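
    Reference expansion can be sketched as follows, under a hypothetical
encoding in which each item definition is a (names, body) pair and a
reference is the body ('$', name). It assumes each name is applied only
once and does not detect self-referencing descriptions.

```python
from typing import Dict, List, Tuple, Union

# An item definition is (names, body). The body is a type code, a grouping
# (a list of item definitions), or ("$", name) for a reference.
ItemDef = Tuple[Tuple[str, ...], Union[str, list, Tuple[str, str]]]


def collect(defs: List[ItemDef], table: Dict[str, ItemDef]) -> Dict[str, ItemDef]:
    """Index every named item definition, at any depth, by each of its names."""
    for names, body in defs:
        for name in names:
            table[name] = (names, body)
        if isinstance(body, list):
            collect(body, table)
    return table


def resolve(defs: List[ItemDef], table: Dict[str, ItemDef]) -> List[ItemDef]:
    """Replace each reference by the body of the referred-to item definition.

    Names on the referred-to item as a whole are dropped (only its body is
    used); names on its parts live inside the body and so travel with it.
    """
    out: List[ItemDef] = []
    for names, body in defs:
        while isinstance(body, tuple):  # ("$", name): follow the reference
            _whole_names, body = table[body[1]]
        if isinstance(body, list):
            body = resolve(body, table)
        out.append((names, body))
    return out


def render(defs: List[ItemDef]) -> str:
    """Write item definitions back in the description notation."""
    parts = []
    for names, body in defs:
        prefix = "".join(f"{name}n" for name in names)
        if isinstance(body, list):
            parts.append(prefix + render(body))
        elif isinstance(body, tuple):
            parts.append(prefix + "$" + body[1])
        else:
            parts.append(prefix + body)
    return "(" + ",".join(parts) + ")"


def expand_references(defs: List[ItemDef]) -> str:
    return render(resolve(defs, collect(defs, {})))


# (An(F,F),I,$A)
example = [(("A",), [((), "F"), ((), "F")]), ((), "I"), ((), ("$", "A"))]
print(expand_references(example))   # (An(F,F),I,(F,F))

# (Bn(XnF,F),$B): the inner name X is carried over, the outer name B is not.
example2 = [(("B",), [(("X",), "F"), ((), "F")]), ((), ("$", "B"))]
print(expand_references(example2))  # (Bn(XnF,F),(XnF,F))
```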

    Parts of a data set or description must be able to be specified for
use in a reference. This specification is in terms of the structure of
the data set or description. The specification has the form of a data
set name, or description name, followed by modifiers which narrow the
specification to the part desired. The four types of modifiers are
for going down a level, going up a level, going forward along a
level, and going backward along a level.

    Down - To go down a level from that previously specified, the
modifier has the following form:

         . item

or

         . (item |,extent| |,=value|).

Having gone down a level, the item indicates which particular item at
this level is the first (or only one) desired. This may be a number or a
name. If more than one is desired, then the extent indicates how many
items. (* as extent means all remaining items at that level; ! means
the first item that meets the conditions that may be set on it or its
subitems in following modifiers.) The items selected may be conditioned
by their contents. If a value is given, then only those items with the
value indicated are selected. For example,

          A.1.1

specifies the first field of the first record of data set A,

          A.(1,*).1

specifies the first field of all the records of A,

          A.1.(1,2)

specifies the first two fields of the first record of data set A,

          A.(1,*).(1,="768174")

specifies only those first fields of the records of A that
have value "768174", and

          A.(1,!).(1,="768174")

finds the first field that has value "768174".
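
    The down modifier can be sketched over a data set held as nested
Python lists (records of fields). Only numeric item positions are
handled, not names; the (item, extent, value) tuple encoding and the
function `select` are assumptions made for illustration, and the data
set A is invented.

```python
from typing import Any, List, Optional, Tuple, Union

# One down modifier .(item, extent, =value): item is a 1-based position,
# extent is a count, "*" (all remaining items) or "!" (the first item for which
# the rest of the specification finds something), and value, if not None,
# keeps only items equal to it.
Down = Tuple[int, Union[int, str], Optional[Any]]


def select(node: Any, path: List[Down]) -> List[Any]:
    """Return the data selected by applying the down modifiers in `path` to `node`."""
    if not path:
        return [node]
    (item, extent, value), rest = path[0], path[1:]
    start = item - 1
    if extent in ("*", "!"):
        candidates = node[start:]
    else:
        candidates = node[start:start + extent]
    if value is not None:
        candidates = [c for c in candidates if c == value]
    results: List[Any] = []
    for candidate in candidates:
        found = select(candidate, rest)
        if extent == "!" and found:
            return found  # stop at the first item that satisfies the rest
        results.extend(found)
    return results


# Hypothetical data set A: a list of records, each a list of fields.
A = [["768174", "SMITH"], ["123456", "JONES"], ["768174", "BROWN"]]

print(select(A, [(1, 1, None), (1, 1, None)]))        # A.1.1 -> ['768174']
print(select(A, [(1, "*", None), (1, 1, None)]))      # A.(1,*).1
print(select(A, [(1, 1, None), (1, 2, None)]))        # A.1.(1,2) -> ['768174', 'SMITH']
print(select(A, [(1, "*", None), (1, 1, "768174")]))  # A.(1,*).(1,="768174")
print(select(A, [(1, "!", None), (2, 1, "JONES")]))   # A.(1,!).(2,="JONES") -> ['JONES']
```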

    Up - To go up a level from that previously specified, the modifier
has the following form:

          ' item

or

          ' (item |,extent| |,=value|).

Going up a level specifies the item up one level that contains the item
previously specified. The item indicates which particular item at this
level is desired where the containing item is considered the first. For
example,

          A.(1,!).(1,="768174")'1

specifies the first record whose first field has value "768174".

    Forward - To go forward on the same level as that previously
specified, the modifier is as follows:

          + item

or

          + (item |,extent| |,=value|).

This modifier is useful when an item following the one which has a
certain value in a field is desired. It may also be used when the data
set name is really a pointer into the data set which has been set
previously. Pointers may or may not be described in a section elsewhere.

    Backward - To go backward on the same level as that previously
specified, the modifier has the following form:

          - item

or

          - (item |,extent| |,=value|).

An example of the use of this modifier is when an item preceding the one
which has a certain value in a field is desired. This might be
specified:

          A.(1,!).(2,="768174")'1-1
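
    The up, forward and backward modifiers can be sketched with a cursor
that records a 0-based position at each level. The helper
`first_with_field` is a hypothetical stand-in for the ! search of the
down modifier, the other function names are likewise illustrative, and
the data set is invented.

```python
from typing import Any, Tuple

Cursor = Tuple[int, ...]  # 0-based positions from the top of the data set down


def fetch(data: Any, cursor: Cursor) -> Any:
    """Return the item the cursor designates (IndexError if it runs off a level)."""
    for index in cursor:
        data = data[index]
    return data


def forward(cursor: Cursor, n: int = 1) -> Cursor:
    """+ n: the item n places after the current one, on the same level."""
    if not cursor:
        raise ValueError("the data set as a whole has no neighbours")
    return cursor[:-1] + (cursor[-1] + n,)


def backward(cursor: Cursor, n: int = 1) -> Cursor:
    """- n: the item n places before the current one, on the same level."""
    if not cursor or cursor[-1] < n:
        raise ValueError("no item that far back on this level")
    return cursor[:-1] + (cursor[-1] - n,)


def up(cursor: Cursor, item: int = 1) -> Cursor:
    """' item: the containing item counts as item 1 on the level above."""
    if not cursor:
        raise ValueError("already at the top of the data set")
    container = cursor[:-1]
    return forward(container, item - 1) if item > 1 else container


def first_with_field(data: Any, field: int, value: Any) -> Cursor:
    """Stand-in for A.(1,!).(field,=value): cursor to the first matching field."""
    for r, record in enumerate(data):
        if record[field - 1] == value:
            return (r, field - 1)
    raise LookupError(f"no record has {value!r} in field {field}")


# Hypothetical data set: records of (name, number).
A = [["ADAMS", "000001"], ["BAKER", "768174"], ["CLARK", "768174"]]

hit = first_with_field(A, 2, "768174")  # A.(1,!).(2,="768174")
print(hit)                              # (1, 1)
print(fetch(A, up(hit)))                # '1   -> ['BAKER', '768174']
print(fetch(A, backward(up(hit))))      # '1-1 -> ['ADAMS', '000001']
print(fetch(A, forward(up(hit))))       # '1+1 -> ['CLARK', '768174']
```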

       [ This RFC was put into machine readable form for entry ]
       [ into the online RFC archives Gottfried Janik   9/97   ]
