DAP Design: shared dimensions, groups and types

From OPeNDAP Documentation
Revision as of 00:40, 4 March 2009 by Jimg (talk | contribs) (→‎Groups)


Document Organization

In the DDX all items are declarations which describe things actually defined (assigned or otherwise associated with values) elsewhere. The term Type Definitions refers to the representation of something in a data set which defines a data type; the types are defined in the data set and the DDX merely holds a representation of that definition - a declaration.

DAP and the DDX will be extended to include Groups, Shared dimensions and user-defined types. Groups will be added as a kind of constructor-type with properties similar to Structure and to Java or C++ namespaces. Unlike Structure, Groups cannot be dimensioned.

A rough syntax which describes how these additions will fit into the DAP and the existing DDX Notation is:

Dataset :== Groups
Groups :== null | Group Groups
Group :== Types Dimensions Attributes Variables
Types :== null | Type Types
Dimensions :== null | Dimension Dimensions
Attributes :== null | Attribute Attributes
Variables :== null | Variable Variables

This pseudo-grammar does not capture everything that can be produced for a Group and the other elements; it only shows how these sections of the DDX must be organized. Nor does it express that a valid Dataset may contain only Types (user-defined types) and no Variables, or only Variables and no Types; it must contain at least one of the two.
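The ordering and the "Types or Variables" rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `SECTION_ORDER`, `sections_in_order` and `dataset_is_valid` are hypothetical and not part of DAP, and each Group is reduced to a flat list of section kinds.

```python
# Hypothetical sketch: each Group is modeled as a list of section kinds,
# in the order they appear in the DDX. None of these names come from DAP.
SECTION_ORDER = ["Type", "Dimension", "Attribute", "Variable"]

def sections_in_order(kinds):
    """Return True if the kinds never step backwards in SECTION_ORDER."""
    ranks = [SECTION_ORDER.index(kind) for kind in kinds]
    return all(a <= b for a, b in zip(ranks, ranks[1:]))

def dataset_is_valid(groups):
    """Every Group is ordered, and some Group holds a Type or a Variable."""
    ordered = all(sections_in_order(kinds) for kinds in groups)
    has_payload = any(
        kind in ("Type", "Variable") for kinds in groups for kind in kinds
    )
    return ordered and has_payload
```

For example, `dataset_is_valid([["Type"]])` holds (Types only), while a Dataset with only Dimensions and Attributes is rejected.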

Group

The DDX will be modified so that it contains one or more Groups. If only one Group is present (which describes the case for DAP 3.2 and earlier) then the declaration can be left out, but if there are two or more groups, the declarations must be present.

Group characteristics:

  • Any configuration of Groups other than one (anonymous) Group which holds all the variables in a data set must be declared.
  • If declared, Groups must be named.
  • A Group can contain any type, including a Group.
  • Variables and Attributes are named using / <group name> / ... / <variable name> to reflect their hierarchy.
  • Each Group declares a new lexical scope for values.
  • A Group cannot be an Array or a Grid (although the distinction between those two might become blurred or non-existent; Group is fundamentally a scalar container-type).
  • This definition does not completely subsume the HDF5 Group type but is equivalent to the netCDF 4 version of it.

Examples:

This data set contains one Group, the root group, which by convention is named '/'.

<Dataset ... >
    ...
</Dataset>

This data set contains two Groups, one after the other.

<Dataset ... >
<group name="primary">
    ...
</group>
<group name="secondary">
    ...
</group>
</Dataset>

This data set contains more Groups, and shows they can be nested.

<Dataset ... >
<group name="primary">
    ...
    <group name="in_situ">
        ...
    </group>

</group>
<group name="secondary">
    ...
</group>
</Dataset>
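The naming rule from the Group characteristics ( / <group name> / ... / <variable name> ) can be illustrated by walking a nested layout like the one above. The following Python is a minimal sketch, not a DAP client: the sample document, the variable names in it and the helper `qualified_names` are assumptions made for illustration, and the lowercase `group` element follows the examples on this page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the nested example above.
DDX = """
<Dataset name="example">
  <group name="primary">
    <Byte name="uwnd"/>
    <group name="in_situ">
      <Int32 name="depth"/>
    </group>
  </group>
  <group name="secondary">
    <Byte name="uwnd"/>
  </group>
</Dataset>
"""

def qualified_names(element, prefix=""):
    """Yield /<group name>/.../<variable name> for every non-group child."""
    for child in element:
        path = prefix + "/" + child.get("name")
        if child.tag == "group":
            # A Group opens a new scope; recurse with the longer prefix.
            yield from qualified_names(child, path)
        else:
            yield path

names = list(qualified_names(ET.fromstring(DDX)))
```

Because each Group is its own scope, the two variables named `uwnd` do not collide: their qualified names differ in the group prefix.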

Discussion

In the past we have often talked about Dataset as a kind of Structure, but implicitly it is not exactly the same, since there cannot be an Array of Datasets. The Group type captures this semantic distinction.

In HDF5, the Group type is modeled after a general graph, but here it uses a strict hierarchy, which simplifies both servers and clients while retaining most of the utility of the HDF5 data type.

Shared Dimensions

Shared dimensions will be added to DAP so that special variables used to identify the independent parameters in non-scalar variables can be clearly labeled as such. When a dimension is named, every use of that name refers to the same values. Each dimension will have both a name and a size in addition to a type. The moniker Shared Dimension is really redundant, since any Dimension in DAP 3.3 can be shared. There is no requirement that a dimension be used; it can be declared and never used.

Dimension examples

Declaring dimensions in the DDX:

<Dataset name="dimension_ex_1" ...>
    <!-- The 'dimensions' section must come first if present -->
    <dimensions>
        <!-- Dimensions are declared like an array except that the 
             'dimension' element is replaced by an 'extent' element
             and the extent has no name (since the dimension itself
             is named) -->
        <Int32 name="latitude">
            <extent size="1024"/>
        </Int32>
        <String name="color">
            <extent size="3"/>
        </String>
    </dimensions>

    <!-- remainder of the document -->

</Dataset>

Using dimensions in the DDX. In DAP, dimensions can only be used in a Grid variable:

<Dataset name="dimension_ex_1" ...>
    <dimensions>
        <Int32 name="latitude">
            <extent size="1024"/>
        </Int32>
        <Int32 name="longitude">
            <extent size="1024"/>
        </Int32>
    </dimensions>

    <!-- The two declarations that follow are effectively Grids,
         but now share the same dimensions -->
    <Byte name="uwnd">
        <!-- note that only the name is used, not the size -->
        <map name="latitude"/>
        <map name="longitude"/>
    </Byte>

    <Byte name="vwnd">
        <map name="latitude"/>
        <map name="longitude"/>
    </Byte>

    ...

</Dataset>
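Since a variable refers to a dimension by name only, a client has to look the size up in the 'dimensions' section. The Python below is a rough sketch of that lookup, assuming the 'extent' form of the declaration and a well-formed copy of the example; the dataset name and the helpers `dimension_sizes` and `variable_shapes` are hypothetical and not part of any DAP library.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed adaptation of the example above.
DDX = """
<Dataset name="dimension_sketch">
  <dimensions>
    <Int32 name="latitude"><extent size="1024"/></Int32>
    <Int32 name="longitude"><extent size="1024"/></Int32>
  </dimensions>
  <Byte name="uwnd"><map name="latitude"/><map name="longitude"/></Byte>
  <Byte name="vwnd"><map name="latitude"/><map name="longitude"/></Byte>
</Dataset>
"""

def dimension_sizes(dataset):
    """Map each declared dimension name to its size."""
    sizes = {}
    for dim in dataset.find("dimensions"):
        sizes[dim.get("name")] = int(dim.find("extent").get("size"))
    return sizes

def variable_shapes(dataset):
    """Resolve each variable's maps to sizes; unknown names raise KeyError."""
    sizes = dimension_sizes(dataset)
    shapes = {}
    for var in dataset:
        if var.tag == "dimensions":
            continue  # the declarations themselves are not variables
        shapes[var.get("name")] = tuple(
            sizes[m.get("name")] for m in var.findall("map")
        )
    return shapes

root = ET.fromstring(DDX)
shapes = variable_shapes(root)
```

Both `uwnd` and `vwnd` resolve to the same shape because they name the same two dimensions; a dimension that is declared but never referenced simply does not appear in any shape.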

Types