DAP 4.0 Design
Introduction
Overall Operation
Definitions
- Type Definition
- The representation of something in a data set that defines a data type; that is, the type is defined by the data set itself. The DAP provides a way to represent this definition and to use it as a stand-in for a definition built from the DAP-supplied types.
- Dimension
- A name bound to a size, e.g., "lon" has a size of 1024.
- Coordinate Variable
- A name bound to both a dimension and a datatype, e.g., "height" is a vector of dimension "height" of 32-bit floating point numbers, or "latitude" is an array of dimension "x" by dimension "y" of 32-bit floating point numbers.
- Grid
- One or more N-dimensional arrays of values bound to 1 to N coordinate variables.
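The definitions above can be sketched as plain data structures. This is only an illustration of how the terms relate, not part of the DAP itself: the class names, the `Array` helper, the `is_consistent` check, and the sizes other than the 1024 given for "lon" are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Dimension:
    """A name bound to a size."""
    name: str
    size: int


@dataclass(frozen=True)
class CoordinateVariable:
    """A name bound to one or more dimensions and a datatype."""
    name: str
    dtype: str
    dimensions: Tuple[Dimension, ...]

    @property
    def shape(self) -> Tuple[int, ...]:
        return tuple(d.size for d in self.dimensions)


@dataclass(frozen=True)
class Array:
    """Hypothetical helper: an N-dimensional array of values held by a Grid."""
    name: str
    dtype: str
    dimensions: Tuple[Dimension, ...]


@dataclass(frozen=True)
class Grid:
    """One or more N-dimensional arrays bound to 1 to N coordinate variables."""
    arrays: Tuple[Array, ...]
    coordinates: Tuple[CoordinateVariable, ...]

    def is_consistent(self) -> bool:
        # Assumed check: all arrays share one rank N, and the Grid
        # carries between 1 and N coordinate variables.
        if not self.arrays or not self.coordinates:
            return False
        ranks = {len(a.dimensions) for a in self.arrays}
        if len(ranks) != 1:
            return False
        n = ranks.pop()
        return 1 <= len(self.coordinates) <= n


# "lon" has a size of 1024 (from the definition); the other sizes are made up.
lon = Dimension("lon", 1024)
lat = Dimension("lat", 512)
height = Dimension("height", 10)

# "height" is a vector of dimension "height" of 32-bit floats.
height_var = CoordinateVariable("height", "Float32", (height,))
lon_var = CoordinateVariable("lon", "Float32", (lon,))
lat_var = CoordinateVariable("lat", "Float32", (lat,))

temperature = Array("temperature", "Float32", (lat, lon))
grid = Grid(arrays=(temperature,), coordinates=(lat_var, lon_var))
```

In this sketch a two-dimensional array may carry one or two coordinate variables; a Grid with three would fail the check.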
Data Types
DAP 4 will have a small increase in supported data types. All of the DAP 2 data types described in ESE RFC 004.11 will be supported with their existing definitions, with two exceptions: Grid will be expanded so that it can be used in more situations, and strings will comply with UTF-8. The additional types will support 64-bit integers; an Opaque type that can be used for data objects like JPEG images; Groups, which can be used to build logical collections as in NetCDF4 or HDF5 (with some limitations relative to HDF5's definition of Group); and Dimensions. In addition, the server side of DAP 4 will provide for Type Definitions, which will allow data systems that have them to be represented with better fidelity than in DAP 2.
Support for Existing Types
Changes in the Definition of Grid
Changes to the String Type
New Datatypes
Groups
The DDX will be modified so that it contains one or more Groups. If only one Group is present (which describes the case for DAP 3.2 and earlier) then the declaration can be left out, but if there are two or more groups, the declarations must be present.
Group characteristics:
- Any configuration of Groups other than a single anonymous Group that holds all the variables in a data set must be declared.
- If declared, Groups must be named.
- A Group can contain any object, including a Group.
- Variables and Attributes are named using / <group name> / ... / <variable name> to reflect their hierarchy.
- Each Group declares a new lexical scope for values.
- A Group cannot be an Array or a Grid (although the distinction between those two might become blurred or non-existent; Group is fundamentally a scalar container-type).
- This definition does not completely subsume the HDF5 Group type but is equivalent to the netCDF 4 version of it.
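The naming and scoping rules listed above can be illustrated with a small sketch. It assumes the anonymous root Group contributes an empty path prefix, so that a variable in it is named `/<variable name>`; the `Group` class and its methods are hypothetical and exist only for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Group:
    """Hypothetical model of a DAP 4 Group: a scalar container and a scope."""
    name: str                      # "" for the anonymous root Group
    parent: Optional["Group"] = None
    variables: List[str] = field(default_factory=list)
    groups: Dict[str, "Group"] = field(default_factory=dict)

    def add_group(self, name: str) -> "Group":
        # If declared, Groups must be named.
        if not name:
            raise ValueError("a declared Group must be named")
        if name in self.groups:
            raise ValueError(f"duplicate Group name in this scope: {name}")
        child = Group(name, parent=self)
        self.groups[name] = child
        return child

    def add_variable(self, name: str) -> None:
        # Names only need to be unique within one Group's scope.
        if name in self.variables:
            raise ValueError(f"duplicate variable name in this scope: {name}")
        self.variables.append(name)

    def path(self) -> str:
        # The root Group has an empty path (an assumption of this sketch).
        if self.parent is None:
            return ""
        return self.parent.path() + "/" + self.name

    def qualified_name(self, variable: str) -> str:
        # / <group name> / ... / <variable name>
        if variable not in self.variables:
            raise KeyError(variable)
        return self.path() + "/" + variable


root = Group("")
root.add_variable("time")

primary = root.add_group("primary")
primary.add_variable("temperature")

in_situ = primary.add_group("in_situ")
# The same variable name is legal here because each Group is a new scope.
in_situ.add_variable("temperature")

names = [
    root.qualified_name("time"),
    primary.qualified_name("temperature"),
    in_situ.qualified_name("temperature"),
]
```

Here `names` holds `/time`, `/primary/temperature` and `/primary/in_situ/temperature`, so the two `temperature` variables remain distinguishable by their full names.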
Dimensions
Type Definitions
Opaque
64-bit Integers
Suggested Types not Included
This section discusses types that are present in some other systems (e.g., ASN.1) but are not included in DAP 4, along with the rationale for not including them.
Enumeration
This type will be taxing to client builders because there is an unbounded set of potential values. At the same time, most (all?) clients will implement this type as a set with no actual knowledge of the set elements' semantics, so it is no different from a byte or integer type with an attribute that provides a binding between (integral) formal values and String, etc., actual values.
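The alternative described above, an integer variable plus an attribute that binds values to labels, might look like the following on the client side. The variable values, the `flag_meanings` attribute name and its contents are invented for illustration; the DAP does not define them.

```python
from typing import Dict, List

# Values as they might arrive in a Byte or integer array (made-up data).
quality_flags: List[int] = [0, 2, 1, 0]

# Hypothetical attribute binding integral values to String labels.
flag_meanings: Dict[int, str] = {0: "good", 1: "suspect", 2: "bad"}


def decode(values: List[int], meanings: Dict[int, str]) -> List[str]:
    """Map each integral value to its label, keeping unknown values visible."""
    return [meanings.get(v, f"unknown({v})") for v in values]


labels = decode(quality_flags, flag_meanings)
```

A client that ignores the attribute still sees ordinary integers, which is the point made in the rationale: no new type is needed in the protocol.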
Boolean
The additional semantic information provided by this type seems very limited given the cost associated with each additional data type in a protocol such as DAP.
Date/Time
Of all the types suggested but not included, this has the most potential. Unfortunately, it is very unlikely that this type would be implemented correctly by a majority of servers. It is certainly possible to include it in the DAP itself, but servers would have to provide a mapping from the encoding of date/time in each relevant data source to some sort of standard representation (e.g., ISO 8601). Because that is unlikely to happen, a date/time type would most likely not be used consistently. A better solution is to use Attributes, which provide the potential for third-party mediation or augmentation.
Attributes
Attributes in DAP 4 are largely unchanged from DAP 2. The only change is the addition of a new type of attribute that holds XML supplied for a data source by some external system.
Existing Attribute Types
New Attribute Types
XML
Names
Services
Examples
Responses
Persistent representations
DDX Document Organization
DAP and the DDX will be extended to include Groups, shared dimensions and user-defined types. Groups will be added as a kind of constructor type with properties similar to Structure and to Java or C++ namespaces. Unlike Structure, Groups cannot be dimensioned.
A rough syntax which describes how these additions will fit into the DAP and the existing DDX Notation is (Replace with XML schema):
Dataset :== Groups
Groups :== null | Group Groups
Group :== Types Dimensions Attributes Variables Groups
Types :== null | Type Types
Dimensions :== null | Dimension Dimensions
Attributes :== null | Attribute Attributes
Variables :== null | Variable Variables
This pseudo-grammar does not capture what can be produced for a Group, et cetera. Instead, it shows how these sections of the DDX must be organized. It also does not show that a valid Dataset can have only Types (user-defined types) and does not need to have variables, but it must have one or the other or both.
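The two constraints described here, section order within a Group and the requirement that a Dataset have Types, Variables, or both, can be sketched as a small checker. The tuple representation and the function names are assumptions made for this example, and the Dataset is modelled as the children of its (possibly anonymous) root Group.

```python
from typing import List, Tuple

# A node is (kind, children); children is only meaningful for "Group".
Node = Tuple[str, list]

# Position of each section in: Types Dimensions Attributes Variables Groups
ORDER = {"Type": 0, "Dimension": 1, "Attribute": 2, "Variable": 3, "Group": 4}


def group_is_ordered(children: List[Node]) -> bool:
    """True if the sections appear in grammar order, recursively."""
    ranks = [ORDER[kind] for kind, _ in children]
    if any(a > b for a, b in zip(ranks, ranks[1:])):
        return False
    return all(group_is_ordered(sub) for kind, sub in children if kind == "Group")


def has_types_or_variables(children: List[Node]) -> bool:
    """True if a Type or a Variable appears anywhere in the tree."""
    for kind, sub in children:
        if kind in ("Type", "Variable"):
            return True
        if kind == "Group" and has_types_or_variables(sub):
            return True
    return False


def dataset_is_valid(children: List[Node]) -> bool:
    return group_is_ordered(children) and has_types_or_variables(children)


well_formed = [
    ("Type", []),
    ("Dimension", []),
    ("Variable", []),
    ("Group", [("Attribute", []), ("Variable", [])]),
]
types_only = [("Type", [])]
out_of_order = [("Variable", []), ("Dimension", [])]
nothing_to_serve = [("Dimension", []), ("Attribute", [])]

results = [
    dataset_is_valid(well_formed),
    dataset_is_valid(types_only),
    dataset_is_valid(out_of_order),
    dataset_is_valid(nothing_to_serve),
]
```

The first two inputs pass; the third fails on ordering and the fourth because it declares neither Types nor Variables.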
Examples
Group Examples
This data set contains one Group, the root group, which by convention has the name '/'.
<Dataset ... > ... </Dataset>
This data set contains two Groups, one after the other.
<Dataset ... >
    <group name="primary">
        ...
    </group>
    <group name="secondary">
        ...
    </group>
</Dataset>
This data set contains more Groups, and shows they can be nested.
<Dataset ... >
    <group name="primary">
        ...
        <group name="in_situ">
            ...
        </group>
    </group>
    <group name="secondary">
        ...
    </group>
</Dataset>
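As a rough illustration of how a client might walk nested Groups like those in the last example, the sketch below parses a simplified document with Python's standard `xml.etree.ElementTree`. It assumes the elided content is empty and that the elements carry no XML namespace; a real DDX would likely need namespace-aware lookups. The `group_paths` function is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

# Simplified stand-in for the nested example; the elided parts are left out.
DDX = """<Dataset>
    <group name="primary">
        <group name="in_situ"/>
    </group>
    <group name="secondary"/>
</Dataset>"""


def group_paths(element: ET.Element, prefix: str = "") -> List[str]:
    """Return the full path of every declared Group, in document order."""
    paths: List[str] = []
    # findall("group") matches direct children only, so nesting is
    # handled by the recursive call.
    for child in element.findall("group"):
        path = prefix + "/" + child.get("name")
        paths.append(path)
        paths.extend(group_paths(child, path))
    return paths


paths = group_paths(ET.fromstring(DDX))
```

With the document above, `paths` contains `/primary`, `/primary/in_situ` and `/secondary`, in that order.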