DAP4: Specification Volume 1: Difference between revisions

From OPeNDAP Documentation
 
(117 intermediate revisions by 5 users not shown)
Line 1: Line 1:
[[Category:Development|Development]][[Category:DAP4|DAP4]]
[[Category:Development|Development]][[Category:DAP4|DAP4]]
[[OPULS_Development| << Back to OPULS Development]]
[[OPULS_Development| << Back to OPULS Development]]
<!-- Copyright 2012, UCAR/Unidata -->
<!-- Copyright 2016, UCAR/Unidata -->
<!-- See the COPYRIGHT file for more information. -->
<!-- See the COPYRIGHT file for more information. -->
<!-- When inserting text, AVOID the following situations: -->
<!-- When inserting text, AVOID the following situations: -->
Line 15: Line 15:
<table border=1 width="85%">
<table border=1 width="85%">
<tr><td width="20%">Date:</td><td>May 31, 2012</td></tr>
<tr><td width="20%">Date:</td><td>May 31, 2012</td></tr>
<tr><td width="20%">Last Revised:</td><td>22 November 2013</td></tr>
<tr><td width="20%">Last Revised:</td><td>24 February 2016</td></tr>
<tr><td width="20%">Status:</td><td>Draft</td></tr>
<tr><td width="20%">Status:</td><td>Draft</td></tr>
<tr><td width="20%">Authors:</td><td>John Caron (Unidata)</td></tr>
<tr><td width="20%">Authors:</td><td>John Caron (Unidata)</td></tr>
Line 23: Line 23:
<tr><td width="20%"></td><td>Dennis Heimbigner (Unidata)</td></tr>
<tr><td width="20%"></td><td>Dennis Heimbigner (Unidata)</td></tr>
<tr><td width="20%"></td><td>Nathan Potter (OPeNDAP)</td></tr>
<tr><td width="20%"></td><td>Nathan Potter (OPeNDAP)</td></tr>
<tr><td width="20%">Copyright:</td><td>2013 University Corporation for Atmospheric Research and Opendap.org</td></tr>
<tr><td width="20%">Copyright:</td><td>2016 University Corporation for Atmospheric Research and Opendap.org</td></tr>
</table>
</table>


Line 34: Line 34:
protocol is intended to supersede all previous versions of
protocol is intended to supersede all previous versions of
the DAP protocol. DAP4 is designed specifically for science
the DAP protocol. DAP4 is designed specifically for science
data. The protocol relies on the widely used and stable
data, but it is intended to be discipline neutral.
The protocol relies on widely used and stable
standards, and is capable of representing a wide variety of
standards, and is capable of representing a wide variety of
scientific data types.</i>
scientific data types.</i>
Line 108: Line 109:
<tr><td width="25%">2013.10.14</td>
<tr><td width="25%">2013.10.14</td>
     <td>Enforce a specific order on declarations in a Group body.</td></tr>
     <td>Enforce a specific order on declarations in a Group body.</td></tr>
<tr><td width="25%">2013.11.21</td>
    <td>Dennis made changes</td></tr>
<tr><td width="25%">2013.11.22</td>
<tr><td width="25%">2013.11.22</td>
     <td>Added sections for DSR, Async, and Error responses and their schemas</td></tr>
     <td>Added sections for DSR, Async, and Error responses and their schemas</td>
<tr><td width="25%">2013.11.22</td>
    <td>Specified the case sensitivity of XML element names and XML attribute names</td>
<tr><td width="25%">2014.07.04</td>
    <td>Make a pass to clean up and clarify (dmh)</td>
<tr><td width="25%">2016.02.14</td>
    <td>Rollback to version of 2015.12.16</td>
<tr><td width="25%">2016.02.24</td>
    <td>Add back the multiple disjoint slice subset.<br>Provide a general mechanism for arbitrary reserved names.</td>
<tr><td width="25%">2016.10.25</td>
    <td>Add _DAP4_Little_Endian attribute to the DMR to reflect the byte order used to encode the serialized data.</td>
<tr><td width="25%">2016.12.5</td>
    <td>Forgot to mention adding the special names section (5.3)</td>
<tr><td width="25%">2016.12.18</td>
    <td>Clarified the reserved names section (5.3) to say that all names beginning with "_" are reserved, but that the reverse DNS case is preferred.</td>
</table>
</table>


==Introduction==
==Introduction==
Line 137: Line 147:
The DAP is a stateless protocol that governs clients making requests from servers, and servers issuing responses to those requests. This section provides an overview of the requests and responses (i.e. the messages) that DAP-compliant software MUST support. These messages are used to request information about a server and data made accessible by that server, as well as requesting data values themselves.
The DAP is a stateless protocol that governs clients making requests from servers, and servers issuing responses to those requests. This section provides an overview of the requests and responses (i.e. the messages) that DAP-compliant software MUST support. These messages are used to request information about a server and data made accessible by that server, as well as requesting data values themselves.


For every data resource the DAP defines a number of responses that may elicited by a client. These response provide services information (i.e. capabilities), structural/semantic descriptions, data access timing  and error information.
For every data resource the DAP defines a number of responses that may be elicited by a client. These responses provide services information (i.e. capabilities), structural/semantic descriptions, data access timing, and error information.
 
The Dataset Services Response (DSR) provides a 'Services' or 'Capabilities' response for DAP4. Dereferencing an unadorned DAP4 dataset resource URL will return a document describing the DAP services available for the dataset.


The DAP utilizes two responses to represent semantic structural description and data content of a data source.  One response, the DMR returns metadata information describing the structure of a request for data. That is, it characterizes the variables, their datatypes, names and attributes. The second response, the Data Response, returns both the metadata about the request, but also the data that was requested. The DMR and the metadata part of the Data Response are represented using a specific XML [16] representation. The syntax of that representation is defined previously (Section [[#Fully Qualified Names|5.3]]).
The Dataset Services Response (DSR) provides a 'Services' or 'Capabilities' response for the DAP. Dereferencing an unadorned DAP dataset resource URL will return a document describing the DAP services available for the dataset.


The DAP Asynchronous Response is returned to a client when the requested resource (DMR, Data Response, etc.) is not immediately available but by making a specific request that it be made available the server is able to retrieve it. If the client makes the "retrieve it" request the server will inform the client through a subsequent Asynchronous Response when and where the client may access the requested resource.
The DAP utilizes two responses to represent semantic structural description and data content of a data source.  One response, called the DMR, returns metadata information describing the structure of a request for data. That is, it characterizes the variables, their datatypes, names and attributes. The second response, the Data Response, returns not only the metadata about the request but also the data that was requested. The DMR and the metadata part of the Data Response are represented using a specific XML [16] representation. The syntax of that representation is defined elsewhere in this document (Section [[#Fully Qualified Names|5.3]]).
   
   
The DAP returns error information using an Error response. If a request for any of the three basic responses cannot be completed then an Error response is returned in its place.
The DAP returns error information using an Error response. If a request for any of the three basic responses cannot be completed then an Error response is returned in its place.
Line 149: Line 157:
The two responses (DMR and Data Response) are complete in and of themselves so that, for example, a client can use the data response without ever requesting either of the two other responses. In many cases, client programs will request the DMR response first before requesting the Data Response but there is no requirement they do so and no server SHALL require that behavior on the part of clients.
The two responses (DMR and Data Response) are complete in and of themselves so that, for example, a client can use the data response without ever requesting either of the two other responses. In many cases, client programs will request the DMR response first before requesting the Data Response but there is no requirement they do so and no server SHALL require that behavior on the part of clients.


Operationally, communication between a DAP client and a DAP server uses some underlying already existing protocol. Volume 2 discusses the appropriate choices for the underlying protocol.
Operationally, communication between a DAP client and a DAP server uses some underlying already existing protocol, most typically HTTP. Volume 2 of this specification discusses how the DAP should utilize HTTP.


In addition to these data objects, a DAP server MAY provide additional "services" which clients may find useful.  For example, many DAP-compliant servers provide HTML-formatted representations or ASCII representations of a data source's structure and data. Such additional services are discussed in Volume 2 of this specification.
In addition to these data objects, a DAP server MAY provide additional "services" which clients may find useful.  For example, many DAP-compliant servers provide HTML-formatted representations or ASCII representations of a data source's structure and data. Such additional services are discussed in Volume 2 of this specification.
The DAP specification also defines extensions to the protocol that represent important, but optional, capabilities. At least the following extensions have been defined.
1. Asynchronous Response. The DAP Asynchronous Response is returned to a client when the requested resource (DMR, Data Response, etc.) is not immediately available, but the server is able to retrieve it if the client makes a specific request that it be made available. If the client makes the "retrieve it" request the server will inform the client through a subsequent Asynchronous Response when and where the client may access the requested resource.
2. CSV Data Encoding. The DAP4 CSV data encoding represents DAP4 data as structured Comma-Separated Values (CSV) in UTF-8 text. Though based on the text/csv media type described in RFC 4180[RFC 4180], the DAP4 CSV is more complex so that it can fully represent the more complex data structures of the DAP4 data model. Some structure beyond simple CSV is necessary to capture the DAP4 data structures.


==Characterization of a Data Source==
==Characterization of a Data Source==
Line 160: Line 172:


Section [[#DAP4 DMR Syntax as a RELAX NG Schema|13]] provides a formal syntax for DAP DMR characterizations. It is defined using the RelaxNG standard [13] for describing the context-free syntax of a class of XML documents, the DMR in this case. It should be noted that any syntax specification requires a specification of the lexical elements of the syntax.
Section [[#DAP4 DMR Syntax as a RELAX NG Schema|13]] provides a formal syntax for DAP DMR characterizations. It is defined using the RelaxNG standard [13] for describing the context-free syntax of a class of XML documents, the DMR in this case. It should be noted that any syntax specification requires a specification of the lexical elements of the syntax.
The XML specification [16] provides most of the lexical context for the syntax, but there are certain places where additional lexical elements must be used. Section [[#DAP4 Lexical Elements|11]] describes those additional lexical elements, and those elements are discussed at appropriate points in the following discussion.
The XML specification [16] provides most of the lexical context for the syntax, but there are certain places where additional lexical elements must be used. Section [[#DAP4 Lexical Elements|11]] describes those additional lexical elements, and those elements are discussed at appropriate points in this specification.


Since the syntax is context-free, there are semantic limitations on what is legal in a DMR. These semantic limitations are noted at appropriate places in the following documentation. It should also be noted that if there are conflicts between what is described here and the RelaxNG syntax, then the syntax takes precedence.
Since the syntax is context-free, there are semantic limitations on what is legal in a DMR. These semantic limitations are noted at appropriate places in the following documentation. It should also be noted that if there are conflicts between what is described here and the RelaxNG syntax, then the syntax takes precedence.
Line 168: Line 180:
===DMR XML Format===
===DMR XML Format===


====Element and Attribute Names====
;Element and Attribute Names:
Within the DMR XML document, it is assumed
:Within the DMR XML document, it is assumed that XML element and XML attribute names are case sensitive.
that element names are case sensitive.
XML attribute names are case insensitive.


====Character Escapes====
;Character Escapes:
Any string of characters appearing within an XML attribute in the DMR must apply the standard XML escapes.  Specifically, any attribute value containing any of the following characters must replace them with the corresponding XML escape form.
:Any string of characters appearing within an XML attribute in the DMR must apply the standard XML escapes.  Specifically, any attribute value containing any of the following characters must replace them with the corresponding XML escape form.


<table border=1 width="30%">
<table border=1 width="30%">
Line 181: Line 191:
<tr><td>&lt;<td>&amp;lt;
<tr><td>&lt;<td>&amp;lt;
<tr><td>&gt;<td>&amp;gt;
<tr><td>&gt;<td>&amp;gt;
c<tr><td>"<td>&amp;quot;
<tr><td>"<td>&amp;quot;
</table>
</table>
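As an illustration of the escaping rule above, the following is a minimal Python sketch, not part of the specification. The helper name <code>escape_attribute_value</code> is hypothetical. It relies on the standard library function <code>xml.sax.saxutils.escape</code>, which always replaces the ampersand and the two angle brackets, and is asked here to replace the double quote as well.

```python
from xml.sax.saxutils import escape


def escape_attribute_value(value: str) -> str:
    """Apply the standard XML escapes to a DMR attribute value.

    escape() handles '&', '<' and '>' on its own; the extra mapping
    adds the double quote shown in the table above.
    """
    return escape(value, {'"': "&quot;"})


if __name__ == "__main__":
    # Print the escaped form of a value containing all four characters.
    print(escape_attribute_value('a<b & "c"'))
```

A server would apply such a function to each attribute value before writing the DMR, and a conforming XML parser on the client side reverses the escapes automatically.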


Line 191: Line 201:
===Names===
===Names===


A name (aka identifier) in DAP4 consists of a sequence of any legal non-control UTF-8 characters. A control character is any UTF-8 character in the inclusive range 0x00 &mdash; 0x1F.
A name (aka identifier) in DAP4 consists of a sequence of any legal non-control UTF-8 characters. A control character is any UTF-8 character in the inclusive range 0x00 &mdash; 0x1F. Names are case sensitive.
 
===Reserved Names===
 
Any name that begins with the character sequence "_" is considered reserved. Note that if the receiver encounters such a name and has no information on how to process the name, it may at its discretion either ignore the object with that name, or it may treat the name as an ordinary name.
 
A special case is when the "_" is followed by a reverse DNS name defining both the definer of that reserved name and possible additional naming information. This form of reserved name is preferred because it provides information about the organization that defined it.
 
A (reverse) DNS name is of this syntactic form.
<pre>
DNS = &lt;name&gt; | DNS '.' &lt;name&gt;
</pre>
An example might be "edu.ucar.unidata.NAME1.NAME2...". This indicates the owner/definer of that name is "edu.ucar.unidata" and that the additional naming information ("NAME1.NAME2...") has meaning to the owner for defining the semantics of the so-named object.
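The naming rules above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and the sketch assumes that the reverse DNS part starts immediately after the leading "_" and that empty components are not allowed. It cannot tell where the owner part of a reverse DNS name ends, so it only splits the name into its components.

```python
from typing import List, Optional


def is_valid_name(name: str) -> bool:
    """A name may not contain control characters (0x00 through 0x1F)."""
    return all(ord(ch) > 0x1F for ch in name)


def is_reserved(name: str) -> bool:
    """Any name beginning with '_' is considered reserved."""
    return name.startswith("_")


def reverse_dns_components(name: str) -> Optional[List[str]]:
    """Split a reserved name of the form '_a.b.c' into ['a', 'b', 'c'].

    Returns None if the name is not reserved or if any component is
    empty (an assumption of this sketch, following DNS = <name> | DNS '.' <name>).
    """
    if not is_reserved(name):
        return None
    parts = name[1:].split(".")
    if any(part == "" for part in parts):
        return None
    return parts


if __name__ == "__main__":
    print(reverse_dns_components("_edu.ucar.unidata.NAME1.NAME2"))
```

A receiver that does not recognize the components may, as noted above, either ignore the object or treat the name as an ordinary name.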


===Fully Qualified Names===
===Fully Qualified Names===
Line 197: Line 219:
Every object in a DAP4 Dataset has a Fully Qualified Name (FQN), which provides a way to unambiguously reference declarations in a dataset and which can be used in several contexts such as in the DMR or in a constraint expression
Every object in a DAP4 Dataset has a Fully Qualified Name (FQN), which provides a way to unambiguously reference declarations in a dataset and which can be used in several contexts such as in the DMR or in a constraint expression
(see Section [[#Constraints|8]]).
(see Section [[#Constraints|8]]).
These FQNs follow the common conventions of names for lexically scoped identifiers.  In DAP4 three kinds of lexical items provide lexical scoping: Dataset, Groups, Structures, and Sequences . Just as with hierarchical file systems or variables in many programming languages, a simple grammar formally defines how the names are built using the names of the FQN's components (see Section [[#FQN Syntax|10]]). Consider the following simple dataset, which contains a Structure named "inner" within a Structure named "outer" all contained in the Dataset "D".


These FQNs follow the common conventions of names for lexically scoped identifiers.  In DAP4 several kinds of lexical items provide lexical scoping: Dataset, Groups, Structures, Sequences, Enumerations, and AttributeSets. Just as with hierarchical file systems or variables in many programming languages, a simple grammar formally defines how the names are built using the names of the FQN's components (see Section [[#FQN Syntax|10]]).
The FQN for a "top-level" variable &mdash; as opposed to e.g. a field in a structure or sequence &mdash;  is defined purely by the sequence of enclosing groups plus the variable's simple name. This also holds for Enumeration declarations.
Consider the following simple dataset, which contains a Structure named "inner" within a Structure named "outer" all contained in the Dataset "D".
<blockquote>
<blockquote>
<pre>
<pre>
Line 212: Line 238:
</pre>
</pre>
</blockquote>
</blockquote>
The FQN for the field 'temperature' is
The FQN for the field 'temperature' is
<blockquote>
<pre>'/places.weather.temperature'.</pre>
'/places.weather.temperature'.
</blockquote>
Substituting the keyword ''Sequence'' for one or more occurrences of ''Structure'' in the above example will leave the FQNs unchanged.
Substituting the keyword ''Sequence'' for one or more occurrences of ''Structure'' in the above example will leave the FQNs unchanged.
Note that the name of the dataset ("D") is not included; it is implied by the leading "/".


As is the case with Structure or Sequence variables, Groups can be nested to form hierarchies, too, and this example shows that case.
As is the case with Structure or Sequence variables, Groups can be nested to form hierarchies, too, and this example shows that case.
Line 240: Line 264:


The FQN to the field 'temperature' in the dataset shown is
The FQN to the field 'temperature' in the dataset shown is
<pre>'/environmental_data/places.weather.temperature'.</pre>
Note the use of a different separator character &mdash; "." instead of
"/" &mdash; once we enter the scope of a structure (or sequence).
Enumeration constants are treated similarly to fields. Consider this example.
<blockquote>
<blockquote>
'/environmental_data/places.weather.temperature'.
<pre>
&lt;Dataset name="DE"&gt;
    &lt;Enumeration name="e"&gt;
        &lt;EnumConst name="v1" value="5"/&gt;
    &lt;/Enumeration&gt;
&lt;/Dataset&gt;
</pre>
</blockquote>
</blockquote>
The FQN for the "v1" constant in "e" is as follows. <pre>/e.v1</pre>


Notes:
Notes:
Line 249: Line 285:
which semantically, acts like the root group.
which semantically, acts like the root group.
Whatever name that dataset has is ignored for the purposes of forming the FQN and instead is treated as if it has the empty name ("").
Whatever name that dataset has is ignored for the purposes of forming the FQN and instead is treated as if it has the empty name ("").
<li>There is no limit to the nesting of groups or the nesting of Structures or the nesting of Sequences.
<li>There is no limit to the nesting of groups or the nesting of Structures or the nesting of Sequences. Enumerations cannot be nested.
<li>Reserved names (see above) inherently contain characters ('.') that will require escaping.
</ol>
</ol>


Line 259: Line 296:
<tr><th>/<th>\/
<tr><th>/<th>\/
<tr><th>\<th>\\
<tr><th>\<th>\\
<tr><th>blank <th>\blank
</table>
</table>
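The way an FQN is assembled from its components, including the escapes in the table above, can be sketched in Python. This is a rough illustration rather than a normative algorithm. The function names are hypothetical, and the sketch assumes that the escaped characters are the backslash, "/", "." and blank, each escaped by a preceding backslash.

```python
from typing import Sequence

# Characters assumed to need a backslash escape inside an FQN component.
_ESCAPED = "\\/. "


def escape_name(name: str) -> str:
    """Prefix each special character in a single name with a backslash."""
    return "".join("\\" + ch if ch in _ESCAPED else ch for ch in name)


def build_fqn(groups: Sequence[str], names: Sequence[str]) -> str:
    """Build an FQN from enclosing group names and a variable/field path.

    Groups are joined with '/', while the variable and its nested
    fields (or an Enumeration and its constant) are joined with '.'.
    The dataset (root group) name is never included.
    """
    parts = [escape_name(g) for g in groups]
    parts.append(".".join(escape_name(n) for n in names))
    return "/" + "/".join(parts)


if __name__ == "__main__":
    print(build_fqn(["environmental_data"], ["places", "weather", "temperature"]))
```

With no enclosing groups the same function yields the FQNs of the earlier examples, for instance the field path places, weather, temperature or the enumeration constant path e, v1.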


Line 321: Line 359:
</blockquote>
</blockquote>


A group defines a name space and contains other DAP elements. Specifically, it can contain groups, variables, dimensions, and enumerations. The fact that groups can be nested means that the set of groups in a DMR form a tree data structure. For any given DMR, there exists a root group that is the root of this tree.
A group defines a name space and contains other DAP elements. Specifically, it can contain, in this order: dimensions, enumerations, variables, and (sub-)groups. The fact that groups can be nested means that the set of groups in a DMR forms a tree data structure. For any given DMR, there exists a root group that is the root of this tree.


A nested set of groups defines a variety of name spaces and access to the contents of a group is specified using a notation of the form "/g1/g2/.../gn". This is called a "path". By convention "/" refers to the root group (the Dataset declaration). Thus the path "/g1/g2/g3" indicates that one should start in the root group, move to group g1 within that root group, then to group g2 within group g1, and finally to group g3. This is more fully described in the section on Fully Qualified names
A nested set of groups defines a variety of name spaces and access to the contents of a group is specified using a notation of the form "/g1/g2/.../gn". This is called a "path". By convention "/" refers to the root group (the Dataset declaration). Thus the path "/g1/g2/g3" indicates that one should start in the root group, move to group g1 within that root group, then to group g2 within group g1, and finally to group g3. This is more fully described in the section on Fully Qualified names (Section [[#Fully Qualified Names|5.3]]).
(Section [[#Fully Qualified Names|5.3]]).


The order of declarations within a Group is fixed and must conform to
The order of declarations within a Group is fixed and must conform to
Line 345: Line 382:
<li>Each Group declares a new lexical scope for the objects it contains.  
<li>Each Group declares a new lexical scope for the objects it contains.  


<li>A Group cannot have dimensions and a Group cannot be defined within a Structure or Sequence.
<li>An array of Groups is not allowed, and a Group cannot be defined within a Structure or Sequence.
</ol>
</ol>


Line 360: Line 397:
The size is a positive integer (which means that a zero
The size is a positive integer (which means that a zero
length dimension is illegal).  As described in the
length dimension is illegal).  As described in the
<a href="#arrays>Arrays Section</a>, the maximum size of any
Arrays Section, the maximum size of any
dimension is 2<sup>61</sub> - 1.  A dimension declaration
dimension is 2<sup>61</sup> - 1.  A dimension declaration
will be referenced elsewhere in the DMR by specifying its
will be referenced elsewhere in the DMR by specifying its
name. It should also be noted that anonymous dimensions also
name. It should also be noted that anonymous dimensions also
Line 371: Line 408:
<li> Dimension declarations are not associated with a data type.
<li> Dimension declarations are not associated with a data type.


<li> Dimension sizes that are not 'anonymous' MUST be a capable of being represented as a signed 64-bit integer.
<li> Dimension sizes MUST be capable of being represented as a signed 64-bit integer.
</ol>
</ol>
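The size constraints on dimensions can be expressed as a small check. The following Python sketch is illustrative only. The names <code>MAX_DIMENSION_SIZE</code> and <code>check_dimension_size</code> are hypothetical, and the limits are taken from the text above: a size must be a positive integer no larger than 2<sup>61</sup> - 1.

```python
# Upper bound on any dimension size, as stated in the text above.
MAX_DIMENSION_SIZE = 2**61 - 1


def check_dimension_size(size: int) -> int:
    """Return size unchanged if it is a legal dimension size.

    Raises ValueError for zero, negative, or too-large sizes.
    """
    if isinstance(size, bool) or not isinstance(size, int):
        raise ValueError("dimension size must be an integer")
    if size < 1:
        raise ValueError("a zero or negative dimension size is illegal")
    if size > MAX_DIMENSION_SIZE:
        raise ValueError("dimension size exceeds 2**61 - 1")
    return size


if __name__ == "__main__":
    print(check_dimension_size(1024))
```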


===Enumeration Types===
===Enumeration Types===


An enumeration type defines a set of names with specific values: enumeration constants. As will be seen in Section [[#Variables|5.12]], enumeration types may be used as the type for variables or attributes. The values that can be assigned to such typed objects must come from the set of enumeration constants.
An enumeration type defines a set of names with specific values called enumeration constants. As will be seen in Section [[#Variables|5.12]], enumeration types may be used as the type for variables or attributes. The values that can be assigned to such typed objects must come from the set of enumeration constants.


An enumeration type specifies a set of named, integer constants. When a data source has a variable of type 'Enumeration' a DAP 4 server MUST represent that variable using a specified integer type, up to an including a 64-bit unsigned integer.  
An enumeration type specifies a set of named, integer constants. When a data source has a variable of type 'Enumeration' a DAP 4 server MUST represent that variable using a specified integer type, up to and including a 64-bit unsigned integer.  


An Enumeration type is declared using this XML form.
An Enumeration type is declared using this XML form.
Line 422: Line 459:
</table>
</table>


Note that for historical reasons, the Char type is defined to be a synonym of UInt8, this mean that technically, the Char type has no associated character set encoding. However, servers and clients are free to infer typical character semantics to this type. The inferred character set encoding is chosen purely at the discretion of the server or client using whatever conventions they agree to use.
Note that for historical reasons, the Char type is defined to be a synonym of UInt8. This means that, technically, the Char type has no associated character set encoding. However, servers and clients are free to attribute typical character semantics to this type. The inferred character set encoding is chosen purely at the discretion of the server or client using whatever conventions they agree to use, possibly specified using attributes. Note specifically that multi-byte character encodings such as UTF-8 are problematic because a single character may then span more than one Char value.


<i><u><span id="Floating Point Types">Floating Point Types</span></u></i>
<i><u><span id="Floating Point Types">Floating Point Types</span></u></i>
Line 457: Line 494:


The Opaque type is used to hold objects like JPEG images and other Binary Large Object (BLOB) data that have significant internal structure which might be understood by clients (e.g., an image display program) but that would be very cumbersome to describe using the DAP4 built-in types. Defining a variable of type "Opaque" does not communicate any information about its content, although an attribute could be used to do that.
The Opaque type is used to hold objects like JPEG images and other Binary Large Object (BLOB) data that have significant internal structure which might be understood by clients (e.g., an image display program) but that would be very cumbersome to describe using the DAP4 built-in types. Defining a variable of type "Opaque" does not communicate any information about its content, although an attribute could be used to do that.
Opaque instances are individually sized. This means that in an array of opaques, for example, each instance of that opaque MAY be of a different size.


''<ins>Semantic Notes</ins>''
''<ins>Semantic Notes</ins>''
Line 472: Line 511:
</blockquote>
</blockquote>


The Enum type is intended to be used in the definition of a variable. It should not be confused with the definition of an Enumeration; rather, it references such a definition.


''<ins>Semantic Notes</ins>''
''<ins>Semantic Notes</ins>''
Line 504: Line 544:


The ''corresponding'' Structure object
The ''corresponding'' Structure object
is obtained by substuting the ''Sequence''
is obtained by substituting the ''Sequence''
keyword with ''Structure. Our above example
keyword with ''Structure''. Our above example
then has this associated Structure.
then has this associated Structure.
<blockquote>
<blockquote>
Line 515: Line 555:
</pre>
</pre>
</blockquote>
</blockquote>


The semantics of a sequence are that it represents a sequence
The semantics of a sequence are that it represents a sequence
Line 530: Line 569:
This represents an array of six (3 times 2) sequence instances. However, the length MAY be different for each of those six instances.
This represents an array of six (3 times 2) sequence instances. However, the length MAY be different for each of those six instances.


Note that the &lt;Sequence&gt; construct was introduced to replace the concept of variable length dimensions. It turns out that trying to treat variable length dimensions as dimensions causes significant conceptual and implementation difficulties. It is hoped thatisolating such variable length objects syntactically is a better representation.
Note that the &lt;Sequence&gt; construct was introduced to replace the concept of variable length dimensions. It turns out that trying to treat variable length dimensions as dimensions causes significant conceptual and implementation difficulties. It is hoped that isolating such variable length objects syntactically is a better representation.


''<ins>Semantic Notes</ins>''
''<ins>Semantic Notes</ins>''
Line 539: Line 578:
===Variables===
===Variables===


Each variable in a data source MUST have a name, a type and one or more values. Using just this information and armed with an understanding of the definition of the DAP data types, a program can read any or all of the information from a data source.
Each variable in a data source MUST have a name, a type and one or more values. Using just this information and armed with an understanding of the definition of the DAP data types, a program can read any or all of the information from a data source.


The DAP variables come in several different types. There are several atomic types, the basic indivisible types representing integers, floating point numbers and the like, and a container type &ndash; the Structure or Sequence type &ndash; that supports aggregation of other variables into a single unit. A container type may contain both atomic typed variables and other container typed variables, thus allowing nested type definitions.
The DAP variables come in several different types. There are several atomic types, the basic indivisible types representing integers, floating point numbers and the like, and a container type &ndash; the Structure or Sequence type &ndash; that supports aggregation of other variables into a single unit. A container type may contain both atomic typed variables and other container typed variables, thus allowing nested type definitions.
Line 548: Line 587:
<i><u><span id="Arrays">Arrays</span></u></i>
<i><u><span id="Arrays">Arrays</span></u></i>


Most (but not all) types may be arrays. An Array is a
An Array is a multi-dimensional indexed data structure. An Array's member
multi-dimensional indexed data structure. An Array's member
variable MUST be of some DAP data type. Array indexes MUST
variable MUST be of some DAP data type. Array indexes MUST
start at zero. Arrays MUST be stored in row-major order (as
start at zero. Arrays MUST be stored in row-major order (as
Line 555: Line 593:
declaration of dimensions is significant. The size of each
declaration of dimensions is significant. The size of each
Array's dimensions MUST be given.
Array's dimensions MUST be given.
The number of elements in an Array is fixed as that given by product
The total number of elements in an Array is fixed as that
of the size(s) of its dimension(s).
given by the product of the size(s) of its
Note that a dimension size of zero is illegal.
dimension(s). Note that a dimension size of zero is illegal.


For practical reasons having to do with current hardware
For practical reasons having to do with current hardware
Line 580: Line 618:
<ol>
<ol>
<li> Simple variables (see below) MAY be arrays.
<li> Simple variables (see below) MAY be arrays.
<li> Structures and Sequences MAY be arrays.
<li> Structures and Sequences MAY be arrays.
</ol>
</ol>
Line 591: Line 628:
<pre>
<pre>
&lt;Int32 name="name"&gt;
&lt;Int32 name="name"&gt;
   &lt;Dim name="{fqn};"/&gt;
   &lt;Dim name="{fqn}"/&gt;
   ...
   ...
   &lt;Dim size="{integer}"/&gt;
   &lt;Dim size="{integer}"/&gt;
Line 598: Line 635:
</blockquote>
</blockquote>


Note the use of two types of dimensions.
Note the use of two types of dimensions:
<ol>
<ol>
<li> name="{fqn}" &ndash; specify the fully qualified name of a dimensions
<li> name="{fqn}" &ndash; specify the fully qualified name of a Dimension
declared previously,
that has been declared previously in the XML document order. ''[https://www.w3.org/TR/DOM-Level-3-Core/glossary.html See the W3C DOM-3 glossary for the definition of XML document order.]''
 
<li> size="{integer}" &ndash; specify an anonymous dimension of a given size.
<li> size="{integer}" &ndash; specify an anonymous dimension of a given size.
</ol>
</ol>
Line 614: Line 652:


<i><u><span id="Dimension Ordering">Dimension Ordering</span></u></i>
<i><u><span id="Dimension Ordering">Dimension Ordering</span></u></i>
Consider this example.
Consider this example.
<blockquote>
<blockquote>
<pre>
<pre>
Line 627: Line 663:
</pre>
</pre>
</blockquote>
</blockquote>
The dimensions are considered ordered from top to bottom. From this, a corresponding left-to-right order [d1][d2]...[dn] can be inferred where the top dimension is the left-most and the bottom dimension is the right-most. The assumption of row-major order means that in enumerating all possible combinations of these dimensions, the right-most is considered to vary the fastest. The terms "right(most)" or "left(most)" refer to this left-to-right ordering of dimensions.
The dimensions are considered ordered from top to bottom. From this, a corresponding left-to-right order [d1][d2]...[dn] can be inferred where the top dimension is the left-most and the bottom dimension is the right-most. The assumption of row-major order means that in enumerating all possible combinations of these dimensions, the right-most is considered to vary the fastest. The terms "right(most)" or "left(most)" refer to this left-to-right ordering of dimensions.
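As a rough illustration of this ordering, the following Python sketch computes the position of an element within the row-major enumeration. The function name <code>row_major_offset</code> and the sample dimension sizes are assumptions made for this example; they are not part of DAP4.

```python
def row_major_offset(indices, sizes):
    """Return the zero-based position of an element in row-major order.

    indices -- one index per dimension, left-most dimension first
    sizes   -- the size of each dimension, in the same order
    """
    if len(indices) != len(sizes):
        raise ValueError("rank mismatch between indices and sizes")
    offset = 0
    for index, size in zip(indices, sizes):
        if size <= 0:
            # A dimension size of zero is illegal.
            raise ValueError("dimension size must be positive")
        if not 0 <= index < size:
            # Array indexes start at zero.
            raise IndexError("index out of range")
        # Each step to the right multiplies what has been accumulated
        # so far, so the right-most index varies fastest.
        offset = offset * size + index
    return offset
```

For dimension sizes [2][3][4], the index tuple (0,0,1) comes immediately after (0,0,0), and (0,1,0) comes four positions after (0,0,0), which is what "the right-most varies the fastest" means in practice.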


Line 635: Line 670:


<i><u><span id="Structures">Structures</span></u></i>
<i><u><span id="Structures">Structures</span></u></i>
The XML scheme for a Structure typed variable is as follows.
The XML scheme for a Structure typed variable is as follows.
<blockquote>
<blockquote>
<pre>
<pre>
Line 651: Line 684:
</pre>
</pre>
</blockquote>
</blockquote>
The Structure contains within it a list of variable definitions
The Structure contains within it a list of variable definitions
(Section [[#Variables|5.12]]).
(Section [[#Variables|5.12]]).
Line 658: Line 690:
''<ins>Semantic Notes</ins>''
''<ins>Semantic Notes</ins>''
<ol>
<ol>
<li> Structures MAY be dimensioned.
<li> Structure variables MAY be dimensioned.
</ol>
</ol>


Line 689: Line 721:
''<ins>Semantic Notes</ins>''
''<ins>Semantic Notes</ins>''
<ol>
<ol>
<li> Sequences MAY be dimensioned.
<li> Sequence variables MAY be dimensioned.
</ol>
</ol>


Line 695: Line 727:


A "Discrete Coverage" is a concept commonly found in many disciplines, where the term refers to a sampled function with both its domain and range explicitly enumerated by variables. DAP2 uses the name 'Grid' to denote what the OGC calls a 'rectangular grid' [12]. DAP4 expands on this so that other types of discrete coverages (hereafter 'coverage(s)') can be explicitly represented.
A "Discrete Coverage" is a concept commonly found in many disciplines, where the term refers to a sampled function with both its domain and range explicitly enumerated by variables. DAP2 uses the name 'Grid' to denote what the OGC calls a 'rectangular grid' [12]. DAP4 expands on this so that other types of discrete coverages (hereafter 'coverage(s)') can be explicitly represented.
Note that the DAP2 ''Grid'' construct is gone, and is replaced by these coverages, which are more general than DAP2 Grids.
Consider the example coverage function
:Temp: ''lat'' X ''lon'' -> Float32
:where
:''lat'' and ''lon'' are subsets of Float32 in the range [0,360).
The range is, of course, ''Float32'' and the domain is ''lat X lon''. The Temp function as a coverage is a sampled subset of the continuous function and is defined at some finite set of pairs from lat X lon.
In DAP4, the range for a coverage is represented by a variable, Temp in this example, whose values are the range of the sampled function. Because the domain of ''Temp'' is a two-tuple (lat,lon), the DAP4 variable must have rank two. In order to complete the sampling of Temp, it is necessary to also define two 'Map' (also called 'coordinate') variables representing the sampling of lat and lon. These two variables, lat and lon, have rank one each. Taken as a whole, this collection of a variable plus maps is called a "grid" for convenience's sake.


In DAP4, the range for a coverage is the values of a (simple or container) variable that includes a specific set of 'maps' or 'coordinate variables' that define the domain for the sampled function. Taken as whole, this type of variable is called a "grid" for convenience sake.
Suppose we want to access the value of the Temp function at position (x,y), where x is a value
in the lat variable and y is a value in the lon variable. The lat variable is consulted to find ilat
such that lat[ilat] = x. Similarly, we want the ilon index such that lon[ilon] = y. We can then obtain Temp(x,y) as the value of Temp[ilat][ilon]. This is probably the simplest example of using coverages; more complex examples exist, such as satellite swaths.
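The lookup just described can be sketched in Python as follows. This is only an illustration: the function name <code>coverage_value</code> and the sample values are hypothetical, and the sketch assumes that x and y are exactly equal to sampled map values.

```python
def coverage_value(temp, lat, lon, x, y):
    """Return Temp(x, y) by locating x and y in the map variables."""
    ilat = lat.index(x)  # index such that lat[ilat] == x; ValueError if absent
    ilon = lon.index(y)  # index such that lon[ilon] == y
    return temp[ilat][ilon]


# Hypothetical sample data: two latitudes by three longitudes.
lat = [10.0, 20.0]
lon = [100.0, 110.0, 120.0]
temp = [
    [1.5, 2.5, 3.5],
    [4.5, 5.5, 6.5],
]
```

With this data, <code>coverage_value(temp, lat, lon, 20.0, 110.0)</code> selects row 1 and column 1 of the rank-two array variable.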


Using OGC coverage terminology, we have this.
Using OGC coverage terminology, we have this.
<ol>
<ol>
<li> The maps specify the "Domain"
<li> The maps (e.g. lat and lon) specify the "Domain"


<li> The array specifies the "Range"
<li> The array (e.g. Temp) specifies the "Range"


<li> The Grid itself is a "Coverage" per OGC.
<li> The Grid itself is a "Coverage" per OGC.
Line 713: Line 758:
<blockquote>
<blockquote>
<pre>
<pre>
&lt;Map name="{FQN for some variable defined in the DMR}"/&gt;
&lt;Map name="{FQN for some variable previously defined in the DMR}"/&gt;
</pre>
</pre>
</blockquote>
</blockquote>
Line 721: Line 766:
<blockquote>
<blockquote>
<pre>
<pre>
&lt;Float32 name="A"&gt;
&lt;Float32 name="Temp"&gt;
   &lt;Dim name="/lat"/&gt;
   &lt;Dim name="/lat"/&gt;
   &lt;Dim name="/lon"/&gt;
   &lt;Dim name="/lon"/&gt;
Line 742: Line 787:
</blockquote>
</blockquote>


The containing variable, A in the example, will be referred to as the "array variable".
The containing variable, Temp in the example, will be referred to as the "array variable".


''<ins>Semantic Notes</ins>''
''<ins>Semantic Notes</ins>''
Line 750: Line 795:
<li> An array variable can have as many maps as desired.
<li> An array variable can have as many maps as desired.


<!-- Why did we do this?
<li> The dimensions of the array variable may not contain duplicates so A[x,x] is disallowed.
<li> The dimensions of the array variable may not contain duplicates so A[x,x] is disallowed.
-->


<li> Any map duplicates are ignored and the order of declaration of the maps is irrelevant.
<li> Any map duplicates are ignored


<li> The order of declaration (top to bottom) MAY be significant.
<!-- why this limitations?
and the order of declaration of the maps is irrelevant.
-->


<li> The fully qualified name of a map must either be in the same lexical scope as the array variable, or the map must be in some enclosing scope.
<li> The fully qualified name of a map must either be in the same lexical scope as the array variable, or the map must be in some enclosing scope.
Line 773: Line 824:
<i><u><span id="Attributes">Attributes</span></u></i>
<i><u><span id="Attributes">Attributes</span></u></i>


Attributes are defined using the following XML scheme.
Simple attributes are defined using the following XML scheme.
 
<blockquote>
<blockquote>
<pre>
<pre>
&lt;Attribute name="name" type="{atomic type name}"&gt;
&lt;Attribute name="name" type="{atomicTypeName|EnumType fqn}"&gt;
   &lt;Namespace href="http://netcdf.ucar.edu/cf"/&gt;
   &lt;Namespace href="http://netcdf.ucar.edu/cf"/&gt; &lt;!--optional--&gt;
   &lt;Value value="value"/&gt;
   &lt;Value value="value"/&gt;
   ...
   ...
Line 784: Line 834:
&lt;/Attribute&gt;
&lt;/Attribute&gt;


&lt;Attribute name="name" type="{container name}"&gt;
or
 
&lt;Attribute name="name" type="{atomicTypeName|EnumType fqn}" value="value"/&gt;
</pre>
</blockquote>
 
Attributes may also serve as containers for other attributes (and other containers). In this case, no type is specified, only a name.
<blockquote>
<pre>
&lt;Attribute name="name"&gt;
   &lt;Namespace href="http://netcdf.ucar.edu/cf"/&gt;
   &lt;Namespace href="http://netcdf.ucar.edu/cf"/&gt;


Line 803: Line 862:
In DAP4, Attributes (not to be confused with XML attributes) are tuples with four components:  
In DAP4, Attributes (not to be confused with XML attributes) are tuples with four components:  
<ul>
<ul>
<li> Name  
<li> Name,
 
<li> Type (one of the defined atomic types such as Int16, String, Enum fqn, etc.).
<li> Type (one of the defined atomic types such as Int16, String, etc.), or a child attribute container
<li> value as an alternate form for attributes with a single value,
 
<li> Vector of one or more value declarations,
<li> Vector of values
<li> OR a set of contained attributes,
 
<li> Zero or more Namespaces
<li> One or more Namespaces (optional)
</ul>
</ul>


Line 823: Line 881:
<ol>
<ol>
<li> DAP4 explicitly treats an attribute with one value as an attribute whose value is a one-element vector.  
<li> DAP4 explicitly treats an attribute with one value as an attribute whose value is a one-element vector.  
 
<li> All of the atomic types are allowed as the type for an attribute
<li> All of the Atomic types as well as containers are allowed as the type for an attribute
<li> If the attribute has type Enum, it must also have an XML attribute, ''enum'', that references a previously defined &lt;Enumeration&gt; declaration.
<li> If the attribute has type Enum, it must also have an attribute that references a previously defined &lt;Enumeration&gt; declaration.
<li> Attribute value constants MUST conform to the appropriate constant format for the given attribute type and as defined in Section [[#DAP4 Lexical Elements|11]].
 
<li> Attribute containers may only contain attributes. Container attributes may not have values; only lowest level (leaf) attributes may have values.
<li> Attribute value constants MUST conform to the appropriate constant format for the given attribute type and as defined in
Section [[#DAP4 Lexical Elements|11]].
 
<li> Attributes may themselves have attributes: effectively leading to nested attributes. Such attributes are called container attributes. However container attributes may not have values; only lowest level (leaf) attributes may have values.
</ol>
</ol>


<i><u><span id="Arbitrary XML content ">Arbitrary XML content </span></u></i>
<i><u><span id="Arbitrary XML content ">Arbitrary XML content </span></u></i>


By supporting an explicit type to hold "arbitrary XML" markup, DAP4 provides a way for the protocol to transport information encoded in XML along with the attributes read from the dataset itself. This has proved very useful in work with semantic web software.  
DAP4 supports an explicit type to hold "arbitrary XML" markup that provides a way for the protocol to transport information encoded in XML. This is useful for "annotating" meta-data with information more complex than simple attributes. This can be used, for example, for passing semantic web information, or for passing out-of-band information, e.g., about the conversion from some other meta-data system into DAP4.


The form of an otherXML declaration is as follows.
The form of an otherXML declaration is as follows.
<blockquote>
<blockquote>
<pre>
<pre>
Line 846: Line 899:
</pre>
</pre>
</blockquote>
</blockquote>
There are no &lt;value/&gt; elements because the value of otherXML
There are no &lt;value/&gt; elements because the value of otherXML
is the xml inside the &lt;otherXML&gt;...&lt;/otherXML&gt;.
is the xml inside the &lt;otherXML&gt;...&lt;/otherXML&gt;.
Line 887: Line 939:


<i><u><span id="DMR-Only Response">DMR-Only Response</span></u></i>
<i><u><span id="DMR-Only Response">DMR-Only Response</span></u></i>
If the client requests only the DMR, then it is returned as a standard XML encoded document. If constraints were specified, then the returned DMR may differ from the full DMR in that, for example, meta-data about only variables specified in the constraint will be returned.
If the client requests only the DMR, then it is returned as a standard XML encoded document. If constraints were specified, then the returned DMR may differ from the full DMR in that, for example, meta-data about only variables specified in the constraint will be returned. The DMR-Only response MUST be ''self-contained''. This means that all declarations directly or transitively mentioned in the selected variables must be included in the returned DMR. Additionally, all attributes associated with the included declarations MUST be included as well.


<i><u><span id="Data Response">Data Response</span></u></i>
<i><u><span id="Data Response">Data Response</span></u></i>
Line 894: Line 946:
The first part holds metadata describing the names and types of the variables in the response while the second part holds the values of those variables.
The first part holds metadata describing the names and types of the variables in the response while the second part holds the values of those variables.


The metadata information, sent as part 1 of the Data Response, is the DMR limited to just those variables included in the response. DAP attributes may be included, but MAY be ignored by the receiving client.
The metadata information, sent as part 1 of the Data Response, is the DMR limited to just those variables included in the response. The response, however, MUST be self-contained (in the DMR-Only sense). DAP attributes for all included declarations MUST be included, but MAY be ignored by the receiving client.


Part 2 of the response consists of the binary data for each variable in the order they are listed in the DMR given as the response preface. DAP4 uses a "receiver makes it right" encoding, so servers MAY simply write out binary data as they store it, with the exceptions that floating-point data must be encoded according to IEEE 754[6] and integer data must use two's-complement notation for signed types. Clients are responsible for performing any byte-swapping operations needed to compute with the values retrieved.
Part 2 of the response consists of the binary data for each variable in the order they are listed in the DMR given as the response preface. DAP4 uses a "receiver makes it right" encoding, so servers MAY simply write out binary data as they store it, with the exceptions that floating-point data must be encoded according to IEEE 754[6] and integer data must use two's-complement notation for signed types. Clients are responsible for performing any byte-swapping operations needed to compute with the values retrieved.


The Data Response is encoded using a chunking scheme
The Data Response is encoded using a chunking scheme
(see Section [[#How the Chunked Encoding Affects the Data Response Format|6.1.3]]).
(see Section [[#How the Chunked Encoding Affects the Data Response Format|6.2]]).
that breaks it into N parts where each part is prefixed with a chunk type and chunk byte count header. Chunk types include data and error types, making it simple for servers to indicate to clients that an error occurred during the transmission of the Data Response and (relatively) simple for clients to detect that error.
that breaks it into N parts where each part is prefixed with a chunk type and chunk byte count header. Chunk types include data and error types, making it simple for servers to indicate to clients that an error occurred during the transmission of the Data Response and (relatively) simple for clients to detect that error.


Line 908: Line 960:
<i><u><span id="Format of the DMR Part">Format of the DMR Part</span></u></i>
<i><u><span id="Format of the DMR Part">Format of the DMR Part</span></u></i>


The first part (''part'' is not to be confused with ''chunk'') of the Data Response always contains the DMR. The Data Response, when DAP is using HTTP as a transport protocol, is the payload for an HTTP response, is separated from the last of the HTTP response's MIME headers by a single blank line, which MIME defines as a carriage return
The first part (''part'' is not to be confused with ''chunk'') of the Data Response always contains the DMR. The Data Response, when DAP is using HTTP as a transport protocol, is the payload for an HTTP response. It is separated from the last of the HTTP response's MIME headers by a single blank line, which MIME defines as a carriage return (ASCII character with byte value of 13) followed by a line feed (ASCII character with byte value of 10). This combination can be abbreviated as CRLF.
(ASCII value 13) followed by a line feed (ASCII value 10). This combination
 
can be abbreviated as CRLF.
<i>Format Related DMR Attributes</i><br>
The DMR MAY contain attributes that reflect information from the serialized data.
Specifically, the following attributes are defined.
<ol>
<li> <Attribute name="_DAP4_Checksum_CRC32" type="Int32"/> &mdash;
this attribute may be attached to each top-level variable to show
the CRC-32 checksum of the content of that data. See Section
[[#The DAP4 Serialized Representation|6.2]]
for more information.
<li> <Attribute name="_DAP4_Little_Endian" type="UInt8"/> &mdash;
this attribute exists in the root group (the dataset) to indicate if
the serialized data byte order is little-endian. The value "1" indicates
that little-endian order was used and "0" indicates that big-endian order was used.
If missing, little-endian is assumed.
</ol>


<i><u><span id="Format of the Data Part">Format of the Data Part</span></u></i>
<i><u><span id="Format of the Data Part">Format of the Data Part</span></u></i>
Line 938: Line 1,004:
<i><u><span id="How the Chunked Encoding Affects the Data Response Format">How the Chunked Encoding Affects the Data Response Format</span></u></i>
<i><u><span id="How the Chunked Encoding Affects the Data Response Format">How the Chunked Encoding Affects the Data Response Format</span></u></i>


In a sense, the chunked encoding does not affect the format of the Data Response at all. Conceptually, the entire binary Data Response is built and then passed through a 'chunking encoder' transforming it into one that is broken up into a series of chunks. That 'chunked document' is the sent as the payload of some transport protocol, e.g., HTTP. In practice, that would be a wasteful implementation because a server would need to hold the entire response in memory. A better implementation would, for HTTP, write the initial parts of the HTTP response (its response code and headers) and then use a pipeline of filters to perform the encoding operations. The intent of the chunking scheme is to make it possible for servers to build responses in small chunks, and once they know those parts have been built without error, send them to the client. Thus a server should choose the chunk size to be small enough to fit comfortably in memory but large enough to limit the amount of overhead spent by the software that encodes and decodes those chunks. When an error is detected, the normal flow of building chunks and sending the data along is broken and an error chunk should be sent
In a sense, the chunked encoding does not affect the format of the Data Response at all. Conceptually, the entire binary Data Response is built and then passed through a 'chunking encoder' transforming it into one that is broken up into a series of chunks. That 'chunked document' is then sent as the payload of some transport protocol, e.g., HTTP. In practice, that would be a wasteful implementation because a server would need to hold the entire response in memory. A better implementation would, for HTTP, write the initial parts of the HTTP response (its response code and headers) and then use a pipeline of filters to perform the encoding operations. The intent of the chunking scheme is to make it possible for servers to build responses in small chunks, and once they know those parts have been built without error, send them to the client. Thus a server should choose the chunk size to be small enough to fit comfortably in memory but large enough to limit the amount of overhead spent by the software that encodes and decodes those chunks. When an error is detected, the normal flow of building chunks and sending the data along is broken and an error chunk should be sent
(See Section [[#DAP4 Error Response Format|12]]).
(See Section [[#DAP4 Error Response Format|12]]).


===The DAP4 Serialized Representation (DSR)===
===The DAP4 Serialized Representation===


Given a DMR and the corresponding data, the serialized representation is formally described in this section.
Given a DMR and the corresponding data, the serialized representation is formally described in this section.
Line 959: Line 1,025:
</blockquote>
</blockquote>


The dimensions are considered ordered the top-to-bottom lexically. This order is linearized into a corresponding left-to-right order [d1][d2]...[dn]. The assumption of row-major order means that in enumerating all possible combinations of these dimensions, the rightmost is considered to vary the fastest. The terms "right(most)" or "left(most)" refer to this ordering of dimensions.
The dimensions are considered ordered top-to-bottom textually. This order is linearized into a corresponding left-to-right order [d1][d2]...[dn]. The assumption of row-major order means that in enumerating all possible combinations of these dimensions, the rightmost is considered to vary the fastest. The terms "right(most)" or "left(most)" refer to this ordering of dimensions.


<i><u><span id="Order of Serialization">Order of Serialization</span></u></i>
<i><u><span id="Order of Serialization">Order of Serialization</span></u></i>
Line 1,027: Line 1,093:


<i><u><span id="Variable-Length Scalar Atomic Types">Variable-Length Scalar Atomic Types</span></u></i>
<i><u><span id="Variable-Length Scalar Atomic Types">Variable-Length Scalar Atomic Types</span></u></i>
The variable-length atomic values are all represented as a signed 64-bit count followed by the data of the value.


<table border=1 width="85%">
<table border=1 width="85%">
Line 1,040: Line 1,108:


A Structure typed variable is represented as the concatenation of the representations of the variables contained in the Structure taken in textual top-to-bottom order. This representation may be nested if one of the variables itself is a Structure variable. Dimensioned structures are represented in a form analogous to dimensioned variables of atomic type. The Structure array is represented by the concatenation of the instances of the dimensioned Structure, where the instances are listed in row-major order.  
A Structure typed variable is represented as the concatenation of the representations of the variables contained in the Structure taken in textual top-to-bottom order. This representation may be nested if one of the variables itself is a Structure variable. Dimensioned structures are represented in a form analogous to dimensioned variables of atomic type. The Structure array is represented by the concatenation of the instances of the dimensioned Structure, where the instances are listed in row-major order.  
It should be noted that no padding is present in the structure representation.
One field's content is immediately followed by the next field's content.
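The absence of padding can be illustrated with a short Python sketch. It serializes a hypothetical Structure with an Int8 field followed by an Int32 field; the function name and the field layout are assumptions for this example only.

```python
import struct


def serialize_structure(int8_value, int32_value, little_endian=True):
    """Concatenate the fields in declaration order, with no padding."""
    order = "<" if little_endian else ">"
    # An explicit byte-order prefix makes struct use standard sizes and
    # no alignment, so the Int32 starts right after the single Int8 byte.
    first = struct.pack(order + "b", int8_value)
    second = struct.pack(order + "i", int32_value)
    return first + second
```

The result is five bytes long (1 + 4), whereas a compiler laying out the same fields in memory would typically insert three bytes of alignment padding.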


<i><u><span id="Sequence Variable Representation">Sequence Variable Representation</span></u></i>
<i><u><span id="Sequence Variable Representation">Sequence Variable Representation</span></u></i>
Line 1,052: Line 1,123:
all the "top-level" variables present in the DMR of a returned
all the "top-level" variables present in the DMR of a returned
response from a server. The term "top-level" means that the variable
response from a server. The term "top-level" means that the variable
is not a field of a Structure typed variable.
is not a field of a Structure (or Sequence) typed variable.


The purpose of the checksum is to detect changes in data
The purpose of the checksum is to detect changes in data
Line 1,059: Line 1,130:
infer that the data has not changed. The checksum is not
infer that the data has not changed. The checksum is not
intended for transmission error detection, although the
intended for transmission error detection, although the
client MAY use it for that purpose if it chooses.
client MAY use it for that purpose if it chooses. Note that the
value of the checksum will change depending on the byte order used
to serialize the data.


The checksum is made visible to the client by adding an attribute to each top-level variable in the DMR. This attribute is named "DAP4_Checksum_CRC32".
The checksum is made visible to the client by adding an attribute to each top-level variable in the DMR. This attribute is named "_DAP4_Checksum_CRC32".


In all cases, the checksum is computed over the serialized representation of each top-level variable. The checksum is computed before any chunking
In all cases, the checksum is computed over the serialized representation of each top-level variable. The checksum is computed before any chunking
(Section [[#DAP4 Chunked Data Representation|7]])
(Section [[#DAP4 Chunked Data Representation|7]]) is applied.
is applied.


If the request to the server is a dmr-only request, then the
If the request to the server is a dmr-only request, then the
server will compute the checksum for each variable mentioned
server will compute the checksum for each variable mentioned
in the DMR and will insert the "DAP4_Checksum_CRC32"
in the DMR and will insert the "_DAP4_Checksum_CRC32"
attribute in the DMR.
attribute in the DMR.
Note that this can have significant performance consequences
Note that this can have significant performance consequences
since the server is required to read and serialize
since the server may need to read and serialize
all of the data for all of the variables mentioned in the DMR
all of the data for all of the variables mentioned in the DMR
even though that data is not transmitted to the client.
even though that data is not transmitted to the client.
Line 1,082: Line 1,154:
serialized representation for transmission to the
serialized representation for transmission to the
client. Note that in this case, the client is expected to
client. Note that in this case, the client is expected to
add the "DAP4_Checksum_CRC32" attribute to the DMR.
add the "_DAP4_Checksum_CRC32" attribute to the DMR.
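A client-side checksum computation might look like the following Python sketch. It assumes that the CRC32 referred to here is the common CRC-32 implemented by zlib; that assumption, and the function name <code>variable_checksum</code>, are not confirmed by this document.

```python
import struct
import zlib


def variable_checksum(serialized):
    """CRC-32 over the serialized bytes of one top-level variable.

    The input is the variable's serialized representation before any
    chunking is applied (chunk headers are not part of it).
    """
    return zlib.crc32(serialized) & 0xFFFFFFFF


# The same Int32 value serialized in the two byte orders.
little = struct.pack("<i", 1)
big = struct.pack(">i", 1)
```

Because <code>little</code> and <code>big</code> are different byte strings, their checksums differ, which illustrates the dependence of the checksum on the byte order used for serialization.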


The default checksum algorithm is CRC32.  So the size of
The default checksum algorithm is CRC32.  So the size of
Line 1,235: Line 1,307:
Notes:
Notes:
<ol>
<ol>
<li> The checksum calculation includes only the values of the variable, not the prefix chunk length bytes.
<li> The checksum calculation includes only the values of the variable, not the containing chunk's length bytes.


<li> The Sequence objects are treated 'like strings' and prefixed with a length count. In the last of the three variables, the dimensioned sequence ''x-star'' has two sequence instances
<li> The Sequence objects are treated 'like strings' and prefixed with a length count. In the last of the three variables, the dimensioned sequence ''x-star'' has two sequence instances
Line 1,243: Line 1,315:
<i><u><span id="Nested Sequences">Nested Sequences</span></u></i>
<i><u><span id="Nested Sequences">Nested Sequences</span></u></i>


The sequence 'x-start' has a field that is itself a sequence. In the example, at the time of serialization 'x-star' has three elements the inner sequence (of which there are three instances) have three, six and one element, respectively.
The sequence 'x-star' has a field that is itself a sequence. In the example, at the time of serialization 'x-star' has three elements; the inner sequences (of which there are three instances) have three, six and one element, respectively.
<blockquote>
<blockquote>
<pre>
<pre>
Line 1,266: Line 1,338:
==DAP4 Chunked Data Representation==
==DAP4 Chunked Data Representation==


An important capability for DAP4 is supporting client in determining when a data transmission fails. This is especially difficult when sending binary data
An important capability for DAP4 is supporting clients in determining when a data transmission fails. This is especially difficult when sending binary data
(Section [[#Response Format|6.1]]).
(Section [[#Response Format|6.1]]).
In order to support such a capability, the DAP4 protocol uses a simplified variation on the HTTP/1.1 chunked transmission format [9] to serialize the data part of the response document so that errors are simple to detect. Furthermore, this format is independent of the form or content of that part of the response, so the same format can be used with different response forms or dropped when/if DAP is used with protocols that support out-of-band error signaling, simplifying our ongoing refinement of the protocol.
In order to support such a capability, the DAP4 protocol uses a simplified variation on the HTTP/1.1 chunked transmission format [9] to serialize the data part of the response document so that errors are simple to detect. Furthermore, this format is independent of the form or content of that part of the response, so the same format can be used with different response forms or dropped when/if DAP is used with protocols that support out-of-band error signaling, simplifying our ongoing refinement of the protocol.


The data part of a response document is "chunked" in a fashion similar to that outlined in HTTP/1.1. However, in addition to a prefix indicating the size of the chunk, DAP4 includes a chunk-type code. This provides a way for the receiver to know if the next chunk is part of the data response or if it contains an error response
The data part of a response document is "chunked" in a fashion similar to that outlined in HTTP/1.1. However, in addition to a prefix indicating the size of the chunk, DAP4 includes a chunk-type code. This provides a way for the receiver to know if the next chunk is part of the data response or if it contains an error response (Section [[#DAP4 Error Response Format|12]]).
(Section [[#DAP4 Error Response Format|12]]).
In the latter case, the client should assume that the data response has ended, even though the correct closing information was not provided.
In the latter case, the client should assume that the data response has ended, even though the correct closing information was not provided.


Line 1,279: Line 1,350:
<li> Treat the 32-bit header as a single, big-endian, unsigned integer.
<li> Treat the 32 bit header a single, big-endian, unsigned integer.


<li> Convert the integer to the local machine byte order by swapping bytes as necessary
<li> Convert the integer to the local machine byte order by swapping bytes as necessary (Section [[#Byte Swapping Rules|6.2.3.2]]).
(Section [[#Byte Swapping Rules|6.2.3.2]]).
Let the resulting integer be called H.
Let the resulting integer be called H.


Line 1,290: Line 1,360:
The chunk type is determined as a set of one or more flags.
The chunk type is determined as a set of one or more flags.
Currently, the possible flags are as follows:
Currently, the possible flags are as follows:
{| class="wikitable"
|+Chunk Type Encoding
!|Bit #
!Value of 0
!Value of 1
|-
|0
|A data containing chunk
|The last data chunk
|-
|1
| The current chunk is not an error chunk.
| The current chunk is an "error chunk" and contains an error message
|-
|2
|The data in this response is encoded using Big-Endian (i.e. network byte order)
|The data in this response is encoded using Little-Endian
|}


<ul>
It is possible for a chunk type to have more than one of the
<li> Data (= 0) &ndash; Indicates a data containing chunk.
flags. So, for example, if the data fits into a single chunk, and we assume little-endian encoding,
 
then its chunk type would be End + LittleEndian.
<li> End (= 1) &ndash; Indicates the current chunk is the last data chunk
 
<li> Error (= 2) &ndash; Indicates the current chunk contains an error message.
The Error flag also implies the End flag.
<li> Little-Endian (= 4) &ndash; Indicates that the data in this response is encoded
using Little-Endian byte order. If not specified, then Big-Endian is assumed.


</ul>
Error implies End, but if the Error flag is set,
It is possible for a chunk type to have more than one of the
then bit 0 should be treated as set even if it is not.  
flags. So, for example, if the data fits into a single chunk,
Note that in order for this to work, the chunk flags
then its chunk type would be Data+End. Error implies
values must be powers of two: e.g. 1, 2, 4.
End. Note that in order for this to work, the chunk flags
values must be powers of two.


The Little-Endian flag must be set only in the first Data
The Endian flag must be set only in the first Data
chunk. It applies to the whole response. If set in any
chunk. It applies to the whole response. If set in any
subsequent chunk type, it will be ignored.
subsequent chunk type, it will be ignored.
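
The flag rules above can be sketched in Python as follows. This is a minimal illustration rather than part of the protocol definition: the constant and function names are hypothetical, and the sketch assumes the chunk type has already been extracted from the header as a small integer.

```python
# Flag values taken from the chunk type description (bits 0, 1 and 2).
CHUNK_END = 1            # bit 0: this is the last data chunk
CHUNK_ERROR = 2          # bit 1: this chunk carries an error message
CHUNK_LITTLE_ENDIAN = 4  # bit 2: the response data is little-endian


def decode_chunk_type(chunk_type):
    """Return the flags carried by a chunk type as a dict of booleans."""
    is_error = bool(chunk_type & CHUNK_ERROR)
    return {
        "error": is_error,
        # Error implies End, even when bit 0 itself is not set.
        "end": is_error or bool(chunk_type & CHUNK_END),
        "little_endian": bool(chunk_type & CHUNK_LITTLE_ENDIAN),
    }
```

Because the flag values are powers of two, they can be combined by addition or bitwise OR; a single little-endian chunk, for instance, has the type <code>CHUNK_END + CHUNK_LITTLE_ENDIAN</code>.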
Line 1,340: Line 1,420:


==Constraints==
==Constraints==
A request to a DAP4 server for either metadata (the DMR) or data may include a constraint expression. This constraint expression specifies which variables are to be returned and what subset of the data for each variable is to be returned.
A request to a DAP4 server for either metadata (the DMR) or data may include a constraint expression. This constraint expression specifies which variables are to be returned and what subset of the data for each variable is to be returned.


It is important to define a minimal request language &ndash;
This section defines the constraint language that MUST be supported by any implementation claiming to support the DAP4 protocol. The method by which a server is provided with a constraint is specified in Volume 2. As a typical example, if such a constraint were embedded in a URL, it would be passed as a query string of the form "?dap4.ce=constraint-expression" appended to the end of the URL.
a constraint language &ndash; to select information from a dataset on a server and obtaining in response a DMR and data corresponding to that request.


This section defines the syntax and semantics of the minimal request language that MUST be supported by all implementations. The method by which a server is provided with a constraint is specified in Volume 2.
The DAP4 Constraint Expression (CE) syntax is an extension of the syntax used by DAP2 that adds some important new features for Arrays and addresses some ambiguities and structural problems in the DAP2 syntax. In this design we also introduce some new terminology to make the explanation of the CE syntax clearer. Additionally, we use a 'curly brace' notation to streamline the description of datasets, because the XML documents that DAP4 servers produce are verbose and hard for humans to read.
But as a typical example, if such a constraint were to be embedded in a URL, then it is presumed that it is prefixed with a
"?CE={constraint}"
and is appended to the end of the URL.


===Syntax===
When a client makes a request to a DAP4 server, it MAY send a CE; a missing (or empty) CE is interpreted to mean that the client wants the entire dataset sent. A CE is made up of a list of clauses, each of which names a variable in the dataset that the client would like the server to send to it. Each clause can further be broken down into two parts: the subset expression and the filter expression. There are limitations on the CE clauses depending on variable type. For scalar variables, getting the variable is the only option available, so no filter expression is supported, and if present, the only subset expression allowed is ''[0]'' or ''[]''. Structure variables can be subset by field but do not support filter expressions (although fields within a Structure may support filtering). Sequences can be subset by field and do support filters. Arrays support index subsets.


The syntax of the minimal constraint language, also referred to as the "simple constraint" language, is as follows.
Specifically, the new features added for DAP4 constraints include:
* Using a grouping operator for Structures and Sequences.
* Sequence filtering expressions explicitly bound to a specific Sequence variable.
* Multiple, disjoint index subsets.


<blockquote>
=== Terminology used by this section ===
<pre>
;selection expression: The entire expression passed to the server that is used to choose specific parts of a dataset.
simpleconstraint: /*empty*/ | constraintlist ;
;subset: The act of choosing parts of a dataset based on the ''type'' of one or more of its variables. We define several types of subsetting operations as follows:
;index subsetting: Choosing parts of an array based on the indexes of that array's dimensions. This operation always returns an array of the same rank as the original, although the size of the return array will (likely) be smaller. Index subsetting uses the bracket syntax described subsequently.
;field subsetting: Choosing specific variables (fields) from the dataset. A dataset in DAP4 is made up of a number of variables and those may be Structures or Sequences that contain fields. Field subsetting uses the brace syntax described later. One or more fields can be specified using a semicolon (''';''') as the separator.
;filter: A filter is a predicate that can be used to choose sequence rows based on the values of fields of the sequence. The vertical bar ('''|''') is used as a prefix operator for the filter predicate. Filters can be applied to fields of a Sequence. A filter predicate consists of one or more filter subexpressions, separated by commas (''','''). Multiple filter subexpressions are implicitly combined with a logical AND.
;filter subexpression: A simple expression that involves a single variable/field; the expression is composed from the traditional set of binary and unary operators: comparison operators (=, !=, <, <=, >, >=) for numbers and strings, and a string-specific regular expression comparison operator (~=). The operands of the operators must be either numeric or string constants or a field of the Sequence. Specifically, only atomic-valued, scalar fields can be used in the filter expression.
<!-- and new operators for Arrays and Coverages (<<, >>, @=).-->
;id: The name of a variable. These must be absolute, with some specific exceptions. Absolute names are fully qualified names (See Section [[#Fully Qualified Names|5.3]]).
<!--
;domain, range: A function is a mapping from a  set of ''domain'' values to a set of ''range'' values; in a ''discrete function'', these sets are finite.
;discrete coverage: A ''discrete coverage'' is a (discrete) function where the indices of the arrays that hold the ''domain'' and ''range'' values have a one-to-one and onto mapping, with the important exception that in cases where the dimension of the arrays containing the domain values can be reduced without losing information, that is done. This is purely an implementation optimization, but where applicable, it is nearly universally used. In DAP4 we call discrete coverages simply ''coverages'' or ''grids''.
;coverage: Synonymous with ''grid'' in this document.
-->


constraintlist: constraint | constraintlist ',' constraint ;
=== Subsetting Constraints ===
The simplest constraint is the null string and it means 'return everything' from the dataset. Choosing variables in a dataset is referred to as the ''subset''. To choose a subset of the variables in a dataset, enumerate them in a semicolon-separated list. To choose parts of a Structure, name those parts explicitly using the syntax ''structure_name{field name}'' or ''structure_name.field name''. Each DAP4 dataset contains one or more Groups; the top-level Group is always present and is named ''/'' (pronounced 'root').
<!--
If the root Group is the only Group in the dataset, it does not need to be named when listing variables in the CE. However, if there are other Groups in the dataset, each Group other than the root Group must be named. In any case, naming the root Group is optional.
-->


constraint: variablesubset | namedslice ;
==== Example: subsetting by variable or field ====
<source lang="xml">
<Dataset name="vol_1_ce_1"
  dapVersion="4.0"
  dmrVersion="1.0"
  xml:base="file:dap4/test_ce_1.xml"
  xmlns="http://xml.opendap.org/ns/DAP/4.0#"
  xmlns:dap="http://xml.opendap.org/ns/DAP/4.0#">


variablesubset: PATH structpath ;
  <Int32 name="u"/>
  <Int32 name="v"/>
  <Structure name="Point">
    <Int32 name="x"/>
    <Int32 name="y"/>
  </Structure>


structpath: ID dimset | structpath NAME dimset ;
</Dataset>
</source>
'''Note''': The syntax used for the examples is (hopefully) easier to read than the DAP4 DMR which uses XML; Curly braces indicate hierarchy.
<source lang="c">
Dataset {
    Int32 u;
    Int32 v;
    Structure {
        Int32 x;
        Int32 y;
    } Point;
} vol_1_ce_1;
</source>


dimset: /*empty*/ | slicelist ;
; Access just ''u'': ''/u''
; Access just ''u'' and ''v'': ''/u;/v''
; Access just ''x'' within ''Point'': ''/Point{x}''
<!--
This notation is based on the use of brackets in [[DAP4: Proposal for Structure Projection]] and [[DAP4: DAP4 Filter Constraints]] with the exception that braces ('''''{}''''') are likely easier to parse than brackets ('''''[]''''') given that arrays of both Structure and Sequence are possible and thus with arrays of these structures the grammar that defines the constraint expression syntax would become context sensitive.
-->
; Equivalent expression to access just ''x'' within ''Point'':  ''/Point.x''
<!--
; Access ''u'' and ''v'' by explicitly naming their Group: ''/u;/v''. Every dataset in DAP4 has a root Group, written ''/''. When that is the only Group in a dataset, it is implicit in the CE, but you can still use its name explicitly.
-->


slicelist: slice | slicelist slice ;
<source lang="xml">
<Dataset name="vol_1_ce_2">
  <Int32 name="u"/>
  <Int32 name="v"/>
  <Group name="inst2">
    <Int32 name="u"/>
    <Int32 name="v"/>
    <Structure name="Point">
      <Int32 name="x"/>
      <Int32 name="y"/>
    </Structure>
  </Group>
</Dataset>
</source>
<source lang="c">
Dataset {
    Int32 u;
    Int32 v;
    Group {
        Int32 u;
        Int32 v;
        Structure {
            Int32 x;
            Int32 y;
        } Point;
    } inst2;
} vol_1_ce_2;
</source>


slice:   '[' start ']'
; Access 'top-level' ''u'' and ''v'': ''/u;/v''.
        | '[' start ':' last ']'
; Access  'top-level' ''u'' and ''v'' and ''inst2'''s ''u'' and ''v'': ''/u;/v;/inst2/u;/inst2/v''.
        | '[' start ':' stride ':' last ']'
; Access ''inst2'''s ''u'' and ''v'': ''/inst2/u;/inst2/v''
        | '[' slicename ']' ;
; Access field ''x'' in ''Point'', which is inside the ''inst2'' Group: ''/inst2/Point{x}'' or ''/inst2/Point.x''.


namedslice: slicename '=' slice ;
'''Notes'''
* Using a semicolon is a change from DAP2, where clauses in the ''project part'' of the constraint were separated using a comma ('',''). We used a semicolon because the comma is used elsewhere, and using a comma here made for a convoluted grammar. We wanted the grammar to be LALR(1) so that both table-driven and recursive-descent parsers would be easy to write.
* Every name in a constraint should be a fully qualified name, except that if a simple name is referenced inside curly braces (e.g. {x}) for a variable whose type is a structure or sequence type, S say, and "x" is a top-level field in S, then that is allowed.
<!--As a notational simplification, we assume that non-qualified names are actually at the top dataset level (i.e., in the root group).-->
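
As an illustration of how the semicolon separator interacts with the brace syntax, the following Python sketch splits a CE into its top-level clauses while leaving semicolons inside braces alone. It is only a sketch under stated assumptions: the function name is hypothetical, the input is assumed to have balanced braces, and filters and shared dimension clauses are not treated specially.

```python
def split_clauses(ce):
    """Split a constraint expression into its top-level clauses.

    Semicolons inside '{...}' separate fields, not clauses, so they are
    kept as part of the enclosing clause.
    """
    clauses = []
    current = []
    depth = 0  # current brace nesting level
    for ch in ce:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if ch == ";" and depth == 0:
            clauses.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    clauses.append("".join(current).strip())
    # Drop empty clauses, so an empty CE yields an empty list.
    return [clause for clause in clauses if clause]
```

For example, <code>split_clauses("/u;/v;/inst2/Point{x}")</code> yields three clauses, while a field list such as <code>/Point{x;y}</code> stays together as one clause.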


slicename: ID ;
=== Array Subsetting in Index Space ===
Subsetting fixed-size arrays in their ''index space'' is accomplished using square brackets. The syntax closely follows that of DAP2, with some extensions. For an array with ''N'' dimensions, ''N'' sets of brackets are used, even if the array is only subset on some of the dimensions. The names of array variables are fully qualified names (FQNs) so it's possible to name arrays in structures and/or Groups. Array index values are ''zero-based'' as with a number of programming languages such as C and Java. Every array has a known starting index value of zero. Within the square brackets, several subexpressions are allowed:
; [ ] : return all of the elements for a particular dimension ''or'' apply a shared dimension slice (more on this later).
; [ ''n'' ] : return only the value at a single index, where 0 <= n < N for a dimension of size ''N''. This slicing operator does not reduce the dimensionality of an array, but does return a dimension size of one for the dimension to which this is applied.
; [ ''start'' <nowiki>:</nowiki> ''step'' <nowiki>:</nowiki> ''last'' ] : return every value whose index is in the range ''start &lt;= index &lt;= last'' and where ''(index - start) % step == 0''. This is the complete version of the syntax.
; [ ''start'' <nowiki>:</nowiki>  ''last'' ] : return the values whose index is in the range ''start &lt;= index &lt;= last''.
; [ ''start'' <nowiki>:</nowiki> ] : return the values whose index is in the range ''start &lt;= index &lt;= the dimension size - 1''.
; [ ''start'' <nowiki>:</nowiki> ''step'' <nowiki>:</nowiki> ] : return every value whose index is in the range ''start &lt;= index &lt;= dimension size - 1'' and where ''(index - start) % step == 0''.
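
The bracket forms listed above can be made concrete with a short Python sketch that expands one bracket expression into the list of indexes it selects. This is an illustration only: the function name is hypothetical, the dimension size is passed in by the caller, and the shared dimension form (a name inside the brackets) is not handled.

```python
def expand_slice(text, size):
    """Return the indexes selected by one bracket expression.

    'size' is the size of the dimension the expression is applied to.
    """
    body = text.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError("expected a bracketed slice: %r" % text)
    body = body[1:-1].strip()
    if body == "":
        # '[]' selects every element of the dimension.
        return list(range(size))
    parts = [part.strip() for part in body.split(":")]
    if len(parts) == 1:        # [n]
        start, step, last = int(parts[0]), 1, int(parts[0])
    elif len(parts) == 2:      # [start:last] or [start:]
        start, step = int(parts[0]), 1
        last = int(parts[1]) if parts[1] else size - 1
    elif len(parts) == 3:      # [start:step:last] or [start:step:]
        start, step = int(parts[0]), int(parts[1])
        last = int(parts[2]) if parts[2] else size - 1
    else:
        raise ValueError("too many ':' separators: %r" % text)
    if step < 1 or not (0 <= start <= last < size):
        raise ValueError("slice out of range: %r" % text)
    # 'last' is inclusive, unlike the stop value of Python's range().
    return list(range(start, last + 1, step))
```

Note that the result keeps one entry per selected index, so a single-index slice such as <code>[7]</code> still yields a dimension of size one.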


start: INTEGER ;
Subsetting can be applied to any array. It can also be applied
last: INTEGER ;
to a scalar, but in this case, the only legal forms are ''[0]'' or ''[]''.
stride: INTEGER ;


</pre>
==== Example: Subsetting in Index Space ====
</blockquote>
<source lang="xml">
<Dataset name="vol_1_ce_3">
 
  <Int32 name="u">
    <Dim size="256"/>
    <Dim size="256"/>
  </Int32>
  <Int32 name="v">
    <Dim size="256"/>
    <Dim size="256"/>
  </Int32>
  <Structure name="Point">
    <Int32 name="x"/>
    <Int32 name="y"/>
    <Dim size="256"/>
  </Structure>
</Dataset>
</source>
<source lang="c">
Dataset {
    Int32 u[256][256];
    Int32 v[256][256];
    Structure {
        Int32 x;
        Int32 y;
    } Point[256];
} vol_1_ce_3;
</source>


The variablesubset rule specifies a subset of values for a variable as specified by the slices. The PATH lexical element is the same as the FQN path as defined in Section [[#FQN Syntax|10]].
; Access all of ''u'': ''/u''
; Access all of ''Point'' 's ''x'' field: ''/Point{x}'' or ''/Point.x''. This returns an array of Structures with a single (Int32) element, not an array of Int32.
; Access elements 10 through 19 of array ''Point'': ''/Point[10:19]''. DAP4, like DAP2, uses zero-based indexes. This CE will return the ten elements (Structures in this case) at indexes 10 through 19 of the array.
; Access every 4th element in the ''Point'' array: ''/Point[0:4:255]'', or ''/Point[0:4:]''. This is a simple decimation operation; this CE would return 64 Structures corresponding to elements at indexes 0, 4, 8, ..., 252 of the array.
; The index-space and field subsetting may be combined in the logical way: ''/Point[0:4:]{x}'' will return an array of structures (with 64 elements) named ''Point'' that contains a single ''Int32'' field named ''x''.
<!-- Note that <del>''/Point[0:4:].x''</del> is not accepted
; Access parts of ''u'' and ''v'': ''/u[4:2:9];/v[4:2:9]''-->


The structpath is almost the same as the FQN prefix as defined in that same Section. The difference is that each component (between '.' separators) of the structpath can have an optional dimset indicating the set of dimension slices to apply.
Other possible CEs:
; ''/u[0:4:][0:4:]'': every fourth element in both dimensions; this would return 1/16<sup>th</sup> of the array's data.
; ''/u[][10:19]'': elements corresponding to every row and columns 10 through 19.
; ''/u[7][10:19]'': elements corresponding to the 8<sup>th</sup> row and columns 10 through 19.
; ''/u[10:19][10:19]'': elements corresponding to rows 10 through 19 and columns 10 through 19.
; ''/u[0:19][0:19]'': elements corresponding to rows 0 through 19 and columns 0 through 19.
; ''/u[][]'': identical to ''/u'', as are ''/u[0:][0:]'' and ''/u[0:1:][0:1:]''.


A dimset is either empty or is a slicelist.
==== More complex subsetting examples ====


A slicelist is a non-empty list of slices, where a slice indicates a subset of dimension indices. The first case of a slice (e.g. '[5]') indicates a single dimension value, 5 in this case. The second case (e.g. '[5:9]') indicates the range of dimension values 5,6,7,8,9. The third case (e.g. '[5:2:11]') indicates a range of dimension values separated by the stride (the middle value). Thus the example would be the dimension values 5,7,9,11. The fourth case (e.g. '[time]') shows the use of a named slice.
The data model for DAP4 is very similar to that of a modern structured programming language, where ''constructor types'' like ''Structure'' may contain any allowed type (including other Structures, etc.) and may themselves be arrays. The basic syntax for subsetting outlined so far can be applied to the fields of a Structure by using braces to enclose the subsetting expressions that apply to those fields. This can be applied recursively.


Note that unlike a suffix, intermediate structures in the structpath can have associated dimsets. Thus we might have something like this.
<source lang="xml">
<Dataset name="vol_1_ce_4">
  <Int32 name="u">
    <Dim size="256"/>
    <Dim size="1024"/>
  </Int32>
  <Structure name="Points">
    <Int32 name="x"/>
    <Int32 name="y">
      <Dim size="1024"/>
    </Int32>
    <Int32 name="z">
      <Dim size="256"/>
    </Int32>
    <Dim size="256"/>
  </Structure>
</Dataset>
</source>


<blockquote>
<source lang="c">
<pre>
Dataset {
/g/S1[5][5:9].v[5:2:11].
    Int32 u[256][1024];
</pre>
    Structure {
</blockquote>
        Int32 x;
        Int32 y[1024];
        Int32 z[256];
    } Points[256];
} vol_1_ce_4;
</source>


A 'namedslice' provides a way to define a slice and give it a slice name.
; ''/Points{y[7:256]}'' or ''/Points.y[7:256]'': Get all of the elements of the Array of Structure ''Points'' and, for each of those elements, get the elements 7 through 256 from the field array ''y''. Do not return the fields ''x'' or ''z''.
The slice name has lexical type ID. The name, when enclosed in "[]" can be used anywhere a slice is legal. The goal of the 'namedslice' is to ensure that the same slice is used consistently across multiple 'variablesubsets' as a way to impose shared dimension semantics.
; ''/Points[0:9]{y[0:9]}'' or ''/Points[0:9].y[0:9]'': Get the first ten elements of ''Points'' and, for each of those, only the  first ten elements of the array ''y''.
; ''/Points[0:9]{x;y[0:9]}'': Get the first ten elements of ''Points'' and, for each of those, return all of ''x'' and the first ten elements of the array ''y''.
; ''/Points[0:9]'': Get the first ten elements of ''Points'' (all fields are included).
; ''/Points'' or ''/Points[]'' or ''/Points[0:]'': Get all of ''Points'' with the subtle difference that if ''Points'' uses a shared dimension, the last of the three CEs will replace that with an anonymous dimension (see the section on shared dimensions, below).


There are certain context sensitive constraints on 'structpaths' and 'slicelists'.
<source lang="xml">
<ol>
<Dataset name="vol_1_ce_5">
<li> The terminal variable in the 'structpath' must be an atomic-typed variable.
  <Int32 name="u">
    <Dim size="256"/>
    <Dim size="1024"/>
  </Int32>
  <Structure name="Points">
    <Int32 name="x"/>
    <Int32 name="y"/>
    <Structure name="sounding">
      <Int32 name="height">
        <Dim size="1024"/>
      </Int32>
      <Int32 name="pressure">
        <Dim size="1024"/>
      </Int32>
    </Structure>
   
    <Dim size="256"/>
  </Structure>
</Dataset>
</source>


<li> The number of slices associated with a component in the 'structpath' must correspond to the arity of that structure or the last, atomic-typed variable.
<source lang="c">
Dataset {
    Int32 u[256][1024];
    Structure {
        Int32 x;
        Int32 y;
        Structure {
            Int32 height[1024];
            Int32 pressure[1024];
        } sounding;
    } Points[256];
} vol_1_ce_5;
</source>


<li> A slice name must be defined before it is used.
; ''/Points[0]{x;y;sounding{height[0:8:]}}'': Get only the first element of ''Points'' and, for that, get the fields ''x'', ''y'' and a slice of ''sounding'', where the ''sounding'' slice is every 8<sup>th</sup> element of the field ''height''; the field ''pressure'' is elided. An equivalent way of writing this expression is ''/Points[0]{x;y;sounding.height[0:8:]}''. The ''{}'' syntax provides an easy way to request ''x'', ''y'' and ''sounding.height[0:8:]'' without having to repeat ''/Points[0]'' three times. A CE like ''/Points[0].x;/Points[0].y;/Points[0].sounding.height[0:8:]'' is legal, but ''/Points[0]'' will only appear once in the result, and a CE where ''Points'' is sliced differently is not legal. That is, <del>''/Points[0].x;/Points[0:10].y;/Points[15].sounding.height[0:8:]''</del> is not legal because ''Points'' can appear only once in the result but has been sliced three different ways in the CE. In any CE, each variable can be constrained only one way.
</ol>


===Interpretation===
===Array subsetting with Disjoint Index Subsets===
As a new feature in DAP4 constraints, an index subset within square brackets can contain multiple, disjoint slices, where each slice uses any of the previously defined slice formats (most generally ''start:stride:last''). The disjoint slices are separated by commas.


Consider the following Array.
Using the preceding example (dataset ''vol_1_ce_4''), some disjoint index examples might be as follows.
; ''/u[10:12,19:23]'': Access elements 10 through 12 and 19 through 23 of array ''u''. The result will be an array of size 3+5 = 8 elements. The values returned will be, in order,
''u[10] u[11] u[12] u[19] u[20] u[21] u[22] u[23]''.
; ''/u[19:23, 10:12]'': Access elements 19 through 23 and 10 through 12 of array ''u''. The result will be an array of size 8, but the values returned will be in a different order, namely
''u[19] u[20] u[21] u[22] u[23] u[10] u[11] u[12]''.
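
The ordering rule for disjoint index subsets can be sketched in Python as follows. This is an assumption-laden illustration, not a reference implementation: the function name is hypothetical, only the closed forms ''n'', ''start:last'' and ''start:stride:last'' are handled, and overlapping slices are rejected outright because this specification leaves their result undefined.

```python
def expand_disjoint(subset):
    """Expand the inside of one bracket, e.g. '10:12,19:23', into the
    selected indexes, in the order in which the slices were written."""
    indexes = []
    for piece in subset.split(","):
        parts = [int(part) for part in piece.strip().split(":")]
        if len(parts) == 1:
            start, stride, last = parts[0], 1, parts[0]
        elif len(parts) == 2:
            start, stride, last = parts[0], 1, parts[1]
        elif len(parts) == 3:
            start, stride, last = parts
        else:
            raise ValueError("malformed slice: %r" % piece)
        # 'last' is inclusive.
        indexes.extend(range(start, last + 1, stride))
    if len(set(indexes)) != len(indexes):
        raise ValueError("slices overlap; the result is undefined")
    return indexes
```

The returned list preserves the order of the slices as written, which is why ''10:12,19:23'' and ''19:23, 10:12'' select the same eight elements but return them in a different order.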


<blockquote>
In the event that the slices are not disjoint, the result is undefined.
<pre>
&lt;Int32 name="A"&gt;
  &lt;Dim size="d1"/&gt;
  &lt;Dim size="d2"/&gt;
  ...
  &lt;Dim size="dn"/&gt;
&lt;/Int32&gt;
</pre>
</blockquote>
where all of the dimension sizes, di, are integers.  


Consider the following array subset constraint, where for the purposes of interpretation, all named slices are assumed to have been replaced with their defined slice.
=== How Sequences fit into this syntax ===


<blockquote>
The ''Sequence'' type is a more general data type in DAP4 than in DAP2, where it was significantly limited. In DAP4, Arrays of Sequences will be supported, as will Sequence fields that are themselves Arrays or Sequences. A Sequence variable is conceptually like a table of rows where each field in the Sequence is a column in the table (or like an array of Structures, where the size of the single array dimension is a secret). Note that while there is a big difference between the value held by a Structure and a Sequence, each has the same subsetting syntax in the CE (although Sequences may have filters applied while Structures may not).
<pre>
A[start1:stride1:end1]...[startn:striden:endn]
</pre>
</blockquote>
Where


<blockquote>
<source lang="xml">
     for i=1 .. n, starti &lt; di &amp; endi &lt; di &amp; starti &lt; endi &amp; starti &gt;= 0 &amp; stridei &gt;= 1 &amp; endi &gt;= 0.
<Dataset name="vol_1_ce_6">
</blockquote>
  <Sequence name="s1">
The constraint selects the elements A[i1][i2]...[in] from A where ii is in the set {starti+stridei*j} and where j=0..k such that starti+stridei*k &lt;= endi and starti+stridei*(k+1) &gt; endi.
    <Int32 name="x"/>
    <Int32 name="y"/>
  </Sequence>
 
  <Sequence name="s2">
    <Int32 name="x"/>
    <Int32 name="y"/>
    <Dim size="100"/>
  </Sequence>
 
  <Sequence name="s3">
    <Int32 name="z"/>
    <Int32 name="x">
      <Dim size="10"/>
    </Int32>
  </Sequence>
 
  <Sequence name="s4">
    <Int32 name="z"/>
    <Int32 name="x">
      <Dim size="1024"/>
    </Int32>
    <Dim size="100"/>
  </Sequence>
 
</Dataset>
</source>


Now consider the same array embedded in a dimensioned Structure.
<source lang="c">
Dataset {
    Sequence {
        Int32 x;
        Int32 y;
    } s1;


<blockquote>
    Sequence {
<pre>
        Int32 x;
&lt;Structure name="S"&gt;
        Int32 y;
  &lt;Int32 name="A"&gt;
    } s2[100];
    &lt;Dim size="d3"/&gt;
     ...
    &lt;Dim size="dn"/&gt;
  &lt;/Int32&gt;
  &lt;Dim size="d1"/&gt;
  &lt;Dim size="d2"/&gt;
&lt;/Structure&gt;
</pre>
</blockquote>
where all of the dimension sizes, di , are again integers.


Consider the following subset constraint.
    Sequence {
        Int32 z;
        Int32 x[10];
    } s3;


<blockquote>
    Sequence {
S[start1:stride1:end1][start2:stride2:end2].A[start3:stride3:end3]...[startn:striden:endn]
        Int32 z;
</blockquote>
        Int32 x[1024];
with conditions as before.
    } s4[100];
} vol_1_ce_6;
</source>


This constraint selects the Structure instances
; ''/s1'': All of Sequence ''s1''.
; ''/s1{x;y}'': Also all of Sequence ''s1''.
; ''/s1{x}'' or ''/s1.x'': every 'row' of Sequence ''s1'', but just field ''x''.
; ''/s2{x;y}'': All one hundred Sequence instances (not rows, but full sequences) of the Array ''s2''. Same as ''/s2'' and ''/s2[0:99]{x;y}'' and ''/s2[]{x;y}''.
; ''/s2[0:9]{x;y}'': The first ten Sequence instances of ''s2''. That would be 10 Sequences and for each, both the fields ''x'' and ''y''.
; ''/s3{} | z < 10'': Every instance of the Sequence ''s3'' where z is less than 10. Note that this is the first example of a ''filter'', a topic that is discussed in much more detail later on.
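
To make the filter semantics concrete, the following Python sketch applies a filter predicate to Sequence rows represented as dictionaries. It is a simplified illustration under several assumptions: the names are hypothetical, each subexpression has the form ''field operator constant'', string constants are written in double quotes, the ''~='' operator is treated as an unanchored regular expression search, and constants containing commas are not supported.

```python
import operator
import re

# Comparison operators allowed in a filter subexpression.
_OPS = {
    "=": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
    # Assumption: '~=' means "the pattern matches somewhere in the value".
    "~=": lambda value, pattern: re.search(pattern, value) is not None,
}

# Two-character operators are listed first so '<=' is not read as '<'.
_SUBEXPR = re.compile(r"^\s*(\w+)\s*(<=|>=|!=|~=|=|<|>)\s*(.+?)\s*$")


def _constant(text):
    """Convert a constant to int, float, or (unquoted) string."""
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass
    return text.strip('"')


def filter_rows(rows, predicate):
    """Keep the rows for which every comma-separated subexpression holds."""
    tests = []
    for sub in predicate.split(","):
        match = _SUBEXPR.match(sub)
        if match is None:
            raise ValueError("malformed filter subexpression: %r" % sub)
        field, op, const = match.groups()
        tests.append((field, _OPS[op], _constant(const)))
    return [
        row for row in rows
        if all(compare(row[field], const) for field, compare, const in tests)
    ]
```

With rows holding ''z'' values of 3, 10 and 12, the predicate <code>z < 10</code> keeps only the first row, while <code>z >= 3, z != 12</code> keeps the first two because the subexpressions are combined with a logical AND.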


<blockquote>
=== Subsetting and Shared Dimensions ===
S[i1][i2]
</blockquote>
where ii
is in the set {starti+stridei*j} and where j=0..k such that starti+stridei*k &lt;= endi and starti+stridei*(k+1) &gt; endi.


Then for each selected structure, the elements
''Shared Dimensions'' provide additional information indicating that a group of arrays share certain relationships; specifically, they indicate that groups of arrays form ''coverages'' by showing how the dimensions of ''Maps'' and ''Arrays'' are linked. The DAP4 CE syntax provides a way to slice a Shared Dimension so that the slice can be used by all of the arrays that use it without repeating the slicing operation for each Array. The syntax can be read 'Assign the shared dimension ''X'' this slice,' where the slice looks like, for example, ''row=[10:19]''.
A[i3]...[in] are selected from that instance of A
<!--
where ii is in the set {starti+stridei*j} and where j=0..k such that starti+stridei*k &lt;= endi and starti+stridei*(k+1) &gt; endi.
The only difference syntactically between a shared dimension slice and the slice operator applied to a one  dimensional array is the assignment operator (''='').-->
All of the variations of the slice operator possible for an array are accepted for shared dimension slicing. In any CE, all of the shared dimension slicing clauses must precede the variable subsetting clauses.


The results of all of the selections of the instances of A are concatenated as the value of the whole constraint.
'''Note''' DAP4 uses XML for its actual grammar, and because that's wordy this document includes a mock notation. I will extend the notation used so far so that it includes the concepts needed to mimic DAP4's notation for a coverage:
* The keyword ''Dimensions'' introduces a list of symbols and their sizes. (That is the definition of a Dimension in DAP4; a size bound to an identifier.)
* Arrays where every dimension uses a ''Dimension'' to supply its extent are DAP4 ''Maps''. Maps are the arrays that hold the ''domain'' values for a ''coverage''.
<!--
* Arrays that use parenthesis '''''()''''' in place of brackets to indicate the sizes of dimensions and which ''use the names of maps'' to do so ''for at least one dimension'' hold the ''range'' values for the coverage. These are the coverage's ''data array'' (aka ''array'' as distinct from ''maps'').
* As with the previous examples, both the official and the mock syntax are shown and these datasets are available from the OPeNDAP test server.-->


== Dataset Services Response ==
<font color="red">New 4/15/16</font>
The Dataset Services Response provides a 'Services' or 'Capabilities' response for DAP4. Dereferencing an unadorned DAP4 dataset resource URL will return a document describing the DAP services available for the dataset. This is a REQUIRED response and the only valid response for an 'unadorned' dataset resource URL. We will refer to this response as the 'Dataset Services Response' or 'DSR' in the 'DAP4 Web Services' specification and the remainder of this document.


At minimum the DSR document MUST provide, for each service available for the dataset:
Using Shared Dimensions for array slicing adds some complexity to the processing of constraints. Two cases are important to consider and are shown in the examples.
* When a request is made for an Array with Maps but the request names only the Array and not the Maps, the assumption is made that the requester intended to receive ''only'' the Array and not the Maps. For example, the client might have already requested/received the Maps. Note that in this case the CDMR included with the data response will still include the ''Map'' element(s) for the Array, and the receiving client must know that the associated (Map) variable is not present in the response.
* A second case involves requests for two or more Arrays that share Maps and that constrain (i.e. 'slice') those Maps differently. Because this can introduce a logical inconsistency, when a local dimension slice is applied to an Array's dimension that has a Map, using that local dimension slice will cause the Map to be removed from the data response's CDMR.
The examples make these two cases clearer.


* A human readable title of that service (e.g., The Dataset Metadata Response service for the dataset)
<font color="red">/New</font>
* One or more links that can be dereferenced to get the various representations of the response for the dataset.
==== Example of this syntax ====
* An unambiguous, unique to the service type, resource role ID that serves as a way to clearly identify what each service does. (Note: ''Resource roles for all DAP4 services are defined, as well as for several other legacy DAP2 related services in the 'DAP4 Specification, Volume 2 Web Services'. Defining a resource role for a new service will be the responsibility of the service creator.'')
<source lang="xml">
* A brief description of the service
<Dataset name="vol_1_ce_7">
  <Dimension name="nlat" size="100"/>
  <Dimension name="nlon" size="50"/>
 
  <Float32 name="lat">
    <Dim name="nlat"/>
  </Float32>
  <Float32 name="lon">
    <Dim name="nlon"/>
  </Float32>
 
  <Float32 name="temp">
    <Dim name="nlon"/>
    <Dim name="nlat"/>
    <Map name="lat"/>
    <Map name="lon"/>
  </Float32>


The response should also include:
  <Float32 name="sal">
* A link to a complete, human readable, description of the service.
    <Dim name="nlon"/>
* A reference to an XSLT that a browser would use to render the description into HTML for a presentation view.
    <Dim name="nlat"/>
* Descriptions and syntax of server side functions available for the dataset
    <Map name="lat"/>
    <Map name="lon"/>
  </Float32>
 
  <Float32 name="O2">
    <Dim name="nlat"/>
    <Dim name="nlon"/>
    <Map name="lon"/>
    <Map name="lat"/>
  </Float32>
 
  <Float32 name="CO2">
    <Dim name="nlon"/>
    <Dim name="nlat"/>
    <Dim size="10"/>
    <Map name="lat"/>
    <Map name="lon"/>
  </Float32>
 
</Dataset>
</source>


=== Service Access ===
<source lang="c">
Dataset {
    Dimensions: nlat=100, nlon=50;
    Float32 lat[nlat];
    Float32 lon[nlon];


A service is accessed by dereferencing one of the access URLs held in the <font size="2"><code>'''href'''</code></font> attribute of a <font size="2"><code>'''<dsr:link>'''</code></font> element, which is typically constructed by adding some type of suffix to the dataset's referent (aka base) URL. The way in which the query string (aka constraint expression) is used is defined by each service, and there is no requirement for inter-service query string API conformity.
    // The maps ''lat'' and ''lon'' are used here and define a coverage
    Float32 temp[lon][lat];
    Float32 sal[lon][lat];
    Float32 O2[lat][lon];
    Float32 CO2[lon][lat][10];
} shared_dimensions;
</source>


A more extensive discussion of the various services that might appear in a DatasetServices document can be found in [[DAP4:_Specification_Volume_2| DAP4 Volume 2, Web Services]].
==== Examples of subsetting using shared dimensions ====


'''Note:''' If a client understands how to modify/augment the dataset resource URL (as described in the [[DAP4:_Specification_Volume_2| DAP4 Volume 2, Web Services]] ) such that it becomes a DAP response URL then it need never obtain the Dataset Services response.
; ''nlat=[0:9];nlon=[10:19];lat[nlat];lon[nlon];temp[nlat][nlon]'': This will return Dimensions nlat=10, nlon=10, ''lat'', ''lon'' and ''temp'' such that ''lat'' and ''lon'' are 10 element vectors and ''temp'' is a 10 x 10 array.
Because the arrays are dimensioned using nlat and nlon in the original DMR, this expression can also be written as ''nlat=[0:9];nlon=[10:19];lat[];lon[];temp[][]'' or ''nlat=[0:9];nlon=[10:19];lat;lon;temp''
; ''nlat=[0:9];nlon=[10:19];lat; lon; temp; sal'': Same as above, but with both ''temp'' and ''sal'' included. This example shows how two or more array variables can be accessed along with their Maps without sending multiple copies of the Maps. Similarly, ...
; ''nlat=[0:9];nlon=[10:19];lat; lon'': This CE requests just the arrays that hold the domain values, while ...
; ''nlat=[0:9];nlon=[10:19];temp; sal'': This CE requests just the arrays that hold the range values. Taken together, the two preceding examples support clients that read the domain values first and then display a map (for example), providing a way for someone to view the data's geographical extent before accessing the values themselves. Also note that there is no restriction that the same shared dimension slices must be used for both requests; like DAP2, each request in DAP4 is ''stateless''.
; ''nlat=[0:9];nlon=[10:19];temp[][]; sal[][]'': This CE requests exactly the same data as the previous one, but uses the ''[]'' notation to indicate that the shared dimensions should be used for the subset. An example below shows how this notation can be used to mix local and shared dimension slicing.
; ''nlat=[0:4:];nlon=[0:4:];CO2'': This CE decimates ''CO2'' by returning every fourth value in the first two dimensions
; ''nlat=[0:4:];nlon=[0:4:];CO2[][][0:4:]'': This CE introduces the second meaning for ''[]''. When the empty braces are used for a dimension that corresponds to a shared dimension, it means ''use the shared dimension slice''. This is useful because some arrays contain a mixture of shared and anonymous dimensions and it's desirable to slice both, using a shared dimension slice previously defined where applicable and an anonymous slice where that's needed. This expression will decimate ''CO2'' by four in each of its three dimensions.
; ''nlat=[0:4:];nlon=[0:4:];CO2[][1][0:4:]'': To override the slicing provided by a shared dimension slice, simply replace the ''[]'' with a local dimension slice.
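
The sizes that result from the slices in these examples can be checked with a short Python sketch. This is only an illustration, not part of the specification: the helper name <code>sliced_size</code> is hypothetical, and the sketch assumes that a slice is written ''[index]'', ''[start:stop]'' or ''[start:stride:stop]'', that ''stop'' is inclusive, and that an empty ''stop'' means the last index of the dimension.

```python
def sliced_size(slice_text, full_size):
    """Return how many elements a slice such as '[0:9]' or '[0:4:]' selects.

    Assumed forms (hypothetical helper, not defined by the specification):
    '[i]', '[start:stop]' and '[start:stride:stop]', with an inclusive stop.
    """
    parts = slice_text.strip()[1:-1].split(":")
    if parts == [""]:
        # '[]' names the shared dimension slice; it has no size of its own.
        raise ValueError("'[]' refers to a shared dimension slice")
    if len(parts) == 1:            # '[i]' selects a single index
        return 1
    start = int(parts[0])
    if len(parts) == 2:            # '[start:stop]'
        stride, stop_text = 1, parts[1]
    else:                          # '[start:stride:stop]'
        stride, stop_text = int(parts[1]), parts[2]
    # An empty stop is taken to mean the last index of the dimension.
    stop = int(stop_text) if stop_text else full_size - 1
    return (stop - start) // stride + 1


nlat_size = sliced_size("[0:9]", 100)     # nlat=[0:9] on a dimension of size 100
nlon_size = sliced_size("[10:19]", 50)    # nlon=[10:19] on a dimension of size 50
decimated = sliced_size("[0:4:]", 100)    # nlat=[0:4:] keeps indices 0, 4, ..., 96
```

Under these assumptions the first two calls give 10 each, matching the 10 x 10 ''temp'' array described in the first example.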


=== Dataset Services Response Encoding  ===  
<font color="red">New 4/15/16</font>
; ''temp'': This will return only the Array ''temp''. The constraint ''lat;lon;temp'' will return three Arrays: the Map Arrays ''lat'' and ''lon'' and the 'value Array' ''temp''. In both cases the CDMR returned in the response will include mention of the Maps ''lat'' and ''lon''. In the first case, where only ''temp'' is requested, the client must be savvy (or permissive) enough to realize that the Map Arrays are not present. In summary, it is the requester's responsibility to understand that the Maps are separate variables and must be explicitly requested. Here are example CDMR responses:
<blockquote>
The CDMR for the CE ''temp'':
<source lang="xml">
<Dataset name="vol_1_ce_7">
  <Dimension name="nlat" size="100"/>
  <Dimension name="nlon" size="50"/>
   
  <Float32 name="temp">
    <Dim name="nlon"/>
    <Dim name="nlat"/>
    <Map name="lat"/>
    <Map name="lon"/>
  </Float32>


c.f. [[DAP4:_Specification_Volume_2| DAP4 Volume 2, Web Services]]
</Dataset>
</source>


''In this section the namespace prefix <font size="2"><code>'''dsr'''</code></font> is associated with the namespace '''<font size="2"><code><nowiki>http://xml.opendap.org/ns/DAP/4.0/dataset-services#</nowiki></code></font>'''.''
The CDMR for the CE ''lat;lon;temp'':
<source lang="xml">
<Dataset name="vol_1_ce_7">
  <Dimension name="nlat" size="100"/>
  <Dimension name="nlon" size="50"/>
 
  <Float32 name="lat">
    <Dim name="nlat"/>
  </Float32>
  <Float32 name="lon">
    <Dim name="nlon"/>
  </Float32>
 
  <Float32 name="temp">
    <Dim name="nlon"/>
    <Dim name="nlat"/>
    <Map name="lat"/>
    <Map name="lon"/>
  </Float32>


==== dsr:DatasetServices element ====
</Dataset>
</source>
</blockquote>


The Dataset Services Response MUST contain a top level <font size="2"><code>'''<dsr:DatasetServices>'''</code></font> element. This <font size="2"><code>'''<dsr:DatasetServices>'''</code></font> element MUST contain:
; ''nlat=[0:9];nlon=[10:19];lat; lon; temp; sal[][8:9]'': This request is almost the same as the third example, but notice that ''sal'' uses a local dimension slice for its second dimension. This means that it will not use the ''nlon=[10:19]'' slice that ''temp'' uses. To avoid a conflict with the ''nlon'' slice, which is still applied to ''temp'' (and to ''lon'' in this example), applying a local dimension slice to an Array with Maps will cause the associated Maps to be elided from the response's CDMR. For Arrays with no Maps, this has no effect.
* An <font size="2"><code>'''xml:base'''</code></font> attribute whose value is the resource URL that was dereferenced to request the DSR response.
* A list of DAP versions supported by the server
*: These appear as one or more child <font size="2"><code>'''<dsr:DapVersion>'''</code></font> elements of the <font size="2"><code>'''<dsr:DatasetServices>'''</code></font> element.
* The implementation version of the server that produced the DSR.
*: This appears as a single <font size="2"><code>'''<dsr:ServerSoftwareVersion>'''</code></font> child element of the <font size="2"><code>'''<dsr:DatasetServices>'''</code></font> element.
* A list of all available DAP4 services for the dataset
*: This is represented as 3 or more <font size="2"><code>'''[[#Service_Element | <dsr:Service>]]'''</code></font> elements. (There are 3 required services for a DAP4 server.)
* A list of supported extensions
** Resource type extensions
** Media type extensions
** Server-side function extensions


The <font size="2"><code>'''<dsr:DatasetServices>'''</code></font> element SHOULD contain:
<blockquote>
* A <font size="2"><code>'''title'''</code></font> attribute whose value is a human readable title for the dataset.
The CDMR for the CE ''nlat=[0:9];nlon=[10:19];lat; lon; temp; sal[][8:9]'':
<source lang="xml">
<Dataset name="vol_1_ce_7">
  <Dimension name="nlat" size="10"/> <!-- The effect of ''nlat=[0:9]'' -->
  <Dimension name="nlon" size="10"/> <!-- ... nlon=[10:19] -->
 
  <Float32 name="lat">              <!-- We asked for lat and lon -->
    <Dim name="nlat"/>
  </Float32>
  <Float32 name="lon">
    <Dim name="nlon"/>
  </Float32>
 
  <Float32 name="temp">              <!-- ... and temp -->
    <Dim name="nlon"/>
    <Dim name="nlat"/>
    <Map name="lat"/>
    <Map name="lon"/>
  </Float32>


The <font size="2"><code>'''<dsr:DatasetServices>'''</code></font> element MAY contain:
  <Float32 name="sal">             <!-- ... and sal, but... -->
* A <font size="2"><code>'''<dsr:Description>'''</code></font> element whose value is a human readable description of the dataset.
    <Dim name="nlon"/>
 
    <Dim size="2"/>                 <!-- for this dimension, we use a local dim slice -->
==== dsr:DapVersion Element ====
    <Map name="lat"/>               <!-- and thus only one of the two Maps is shown. -->
The <font size="2"><code>'''<dsr:DapVersion>'''</code></font> element contains a single DAP version value as a text string. The DAP version is represented as two integer values separated by a period, MM.mm, where the first value (MM) is the major version of the DAP protocol and the second value (mm) is the minor version. Multiple <font size="2"><code>'''<dsr:DapVersion>'''</code></font> elements indicate that a server supports multiple versions of the DAP protocol, with one instance for each supported version.
  </Float32>
 
   
Example:
</Dataset>
<font size="2">
<source lang="xml" >
<DapVersion>4.0</DapVersion>
<DapVersion>3.2</DapVersion>
<DapVersion>2.0</DapVersion>
</source>
</source>
</font>
</blockquote>
<font color="red">/New</font>
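
The MM.mm form of a DAP version string lends itself to a simple numeric comparison. The following Python sketch is illustrative only; the function name <code>parse_dap_version</code> is hypothetical, and the version strings are the ones from the <code>DapVersion</code> example.

```python
def parse_dap_version(text):
    """Split an 'MM.mm' DAP version string into (major, minor) integers."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


# The three values from the DapVersion example.
versions = ["4.0", "3.2", "2.0"]

# Tuples compare major first, then minor, so max() finds the newest version.
newest = max(parse_dap_version(v) for v in versions)
```

Comparing the parsed tuples rather than the raw strings avoids ordering mistakes such as treating "10.0" as older than "4.0".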


==== dsr:ServerSoftwareVersion element ====
=== Constrained DMR Objects ===


The <font size="2"><code>'''<dsr:ServerSoftwareVersion>'''</code></font> element has no attributes and may contain as little as a simple text string (e.g., "TDS 4.3.57" or "Hyrax 1.7.45"), or it may contain any well-formed XML content not in the <font size="2"><code>'''dsr'''</code></font> namespace that the server implementer sees fit to use to describe the software version of their server implementation.
When a DAP4 server receives a request for a Data response, it must build and return a Data Response Document that contains a text/xml part containing a DMR, a separator and a binary part that contains the data values. The organization of the Data Response Document is described in detail elsewhere in this document. In this section the focus is on the DMR returned in the first part of the response and how it relates to the DMR for the original unconstrained dataset. We refer to the original dataset's DMR as the ''DMR'' and the DMR associated with the data response as the ''CDMR'' (short-hand for Constrained DMR). A data response can also be generated using a null CE; we consider that a constraint, too.


==== dsr:Service element ====
The DMR contains a number of declarations for the dataset: Enumerations, Dimensions, Attributes, Groups and Variables. Each DMR and CDMR must follow the rules for the DMR described in this specification and, because DAP4 is a stateless protocol, each response from a server must stand on its own. Since a Constraint Expression alters the data returned (limiting variables, changing the size of dimensions and so on), the contents of the CDMR for any given dataset will vary with the CE. Furthermore, a goal of DAP4 is to specify that the CDMR be 'minimal', containing no unused definitions.


Each DAP4 web service is described by a <font size="2"><code>'''<dsr:Service>'''</code></font> element with:
Because filters alter the values of variables, but not whether a variable is returned, they have no effect on the CDMR. Only the subsetting operators will be discussed here.
* An optional <font size="2"><code>'''title'''</code></font> attribute whose value is a simple human readable title for the service.
* A required <font size="2"><code>'''role'''</code></font> attribute whose value is a resource role ID (a URI), unique to the service type, that serves as a way to clearly identify what each service does. This is intended to allow people and software to unambiguously identify specific services, irrespective of their human readable title. The idea is that the role attribute tells you what is going to happen when you dereference the URL held in the <font size="2"><code>'''<dsr:link>'''</code></font> element. Each service has a single role and all of the links (alternate representations) must fulfill the same role.
* An optional <font size="2"><code>'''<dsr:Description>'''</code></font> element.
* One or more <font size="2"><code>'''<dsr:link>'''</code></font> elements.


===== Regarding the <font size="3"><code>'''role'''</code></font> attribute =====
==== Enumerations ====
An enumeration is included in the CDMR if and only if some variable or attribute in the CE references it. A null CE returns the entire dataset, so it effectively references every variable.
<!--FIXME Make sure that 'variable' and 'field' are defined correctly in the terms section. jhrg 12/31/13-->


With regard to multiple representations of a service response and the <font size="2"><code>'''role'''</code></font> attribute: While there may be many alternate representations of a response, not all will fulfill the same <font size="2"><code>'''role'''</code></font>. Representations fulfilling the same <font size="2"><code>'''role'''</code></font> should be bundled together in their own '''Service''' with the appropriate <font size="2"><code>'''role'''</code></font> value. For example, the DMR can be mapped to the ISO 19115 metadata space, but ISO 19115 responses are clearly outside of the <font size="2"><code>'''role'''</code></font> of the regular DMR service, which returns a DAP4 metadata response. One could view the two roles as describing different domains.
==== Shared Dimensions ====
Shared Dimension declarations from the DMR are not included in the CDMR unless the Shared Dimension is used by a variable that has been projected and that variable does not override that shared dimension using a local slicing operation.
<!--FIXME Refer to the DMR from the previous section and show some examples. jhrg 12/31/13-->


==== dsr:Description element ====
==== Variables ====
Each <font size="2"><code>'''<dsr:Description>'''</code></font> element MUST contain:
Each clause in the constraint must specify a variable and that variable will be declared in the CDMR. The variable must be referenced by a FQN.
* An optional <font size="2"><code>'''href'''</code></font> attribute whose value is a URL string that points to a human readable document describing the service.  
<!--unless it is declared at the top level of the DMR and the short-hand notation is used where the leading slash (''/'') is elided.-->
* The required text content of the <font size="2"><code>'''dsr:Description'''</code></font> element MUST be a human readable description of the service.


==== dsr:link element ====
==== Array Variables ====
Array variables follow all the rules for ''Variables'' with the additional condition that their dimensions may appear altered depending on the CE. If local slicing operations are used, then the sliced dimensions will have the size given by the slice operator, not the size shown in the full dataset's DMR. If a shared dimension is sliced and the Array uses that slice, then its size will reflect that. Arrays may mix shared dimension slices and local slices, and the result must be correctly reflected in the specific variable's declaration.


Each <font size="2"><code>'''<dsr:link>'''</code></font> element MUST contain:
Note that slicing never affects the ''rank'' of an array.
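
The way shared and local slices combine into a variable's declared shape can be sketched in Python. This is a sketch under stated assumptions rather than a normative algorithm: the function <code>constrained_shape</code> and its argument layout are hypothetical, and slices are given as ''(start, stride, stop)'' tuples with an inclusive ''stop''. The declaration used is ''CO2'' from the shared dimension example (dimensions ''nlon'', ''nlat'' and an anonymous dimension of size 10).

```python
def constrained_shape(dims, shared_slices, local_slices):
    """Return the shape an array would have in the CDMR.

    dims:          list of (name, size); name is None for an anonymous dimension
    shared_slices: {dimension name: (start, stride, stop)}, stop inclusive
    local_slices:  one entry per dimension; None means "no local slice"
    """
    shape = []
    for (name, size), local in zip(dims, local_slices):
        # A local slice overrides any slice on the shared dimension.
        chosen = local if local is not None else shared_slices.get(name)
        if chosen is None:
            shape.append(size)           # dimension is not sliced at all
        else:
            start, stride, stop = chosen
            shape.append(len(range(start, stop + 1, stride)))
    return shape


# CO2 is declared with dimensions nlon (50), nlat (100) and an anonymous 10.
co2_dims = [("nlon", 50), ("nlat", 100), (None, 10)]
shared = {"nlat": (0, 4, 99), "nlon": (0, 4, 49)}    # nlat=[0:4:];nlon=[0:4:]

# CO2[][1][0:4:]: shared slice, then local index 1, then a local stride of 4.
shape = constrained_shape(co2_dims, shared, [None, (1, 1, 1), (0, 4, 9)])
```

The resulting list always has one entry per declared dimension, which is the sense in which slicing leaves the rank unchanged; a dimension reduced to a single index simply has size 1.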


* A required <font size="2"><code>'''type'''</code></font> attribute whose value is the media-type for the normative representation returned by that URL held in the <font size="2"><code>'''href'''</code></font> attribute.
==== Structure Variables ====
* A required attribute called <font size="2"><code>'''href'''</code></font> whose value is a URL that when dereferenced will return the service response in the media type described by the value of the <font size="2"><code>'''type'''</code></font> attribute.
If the variable is a Structure, then either the entire Structure is included or a subset of its fields will be included in the variable declaration where the fields are those specifically mentioned in a constraint projection. As with all other variables, each variable in the structure will have the same rank and type as the original declaration in the DMR.
* An optional attribute called <font size="2"><code>'''description'''</code></font> whose value is a human readable string that provides a brief description of the <font size="2"><code>'''<dsr:link>'''</code></font> element's semantics.
* Zero or more <font size="2"><code>'''<dsr:alt>'''</code></font> elements, one for each and every alternate representation of the response that can be requested using [http://www.w3.org/Protocols/rfc2616/rfc2616-sec12.html#sec12.1 HTTP server-driven content negotiation] in conjunction with the <font size="2"><code>'''href'''</code></font> URL. Each <font size="2"><code>'''<dsr:alt>'''</code></font> element MUST contain:
** A required <font size="2"><code>'''type'''</code></font> attribute whose value is the MIME type that would be used in the [http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.1 HTTP '''Accept''' header] to request that representation from the server.
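
A client might read the <code>link</code> and <code>alt</code> elements of a service along the following lines. This Python sketch uses the standard library's <code>xml.etree.ElementTree</code>; the function name <code>links_by_type</code> is hypothetical, and the XML fragment is the "DAP4 Dataset Metadata" service from the minimal DSR example, with the namespace declared on the <code>Service</code> element so that the fragment parses on its own.

```python
import xml.etree.ElementTree as ET

DSR_NS = "http://xml.opendap.org/ns/DAP/4.0/dataset-services#"

# One Service element, copied from the minimal DSR example.
SERVICE_XML = f"""
<Service xmlns="{DSR_NS}" title="DAP4 Dataset Metadata"
         role="http://services.opendap.org/dap4/dataset-metadata">
  <link type="application/vnd.org.opendap.dap4.dataset-metadata+xml"
        href="http://test.opendap.org:8090/opendap/hyrax/ECMWF_ERA-40_subset.ncml.dmr">
    <alt type="text/xml"/>
  </link>
  <link type="text/xml"
        href="http://test.opendap.org:8090/opendap/hyrax/ECMWF_ERA-40_subset.ncml.dmr.xml"/>
</Service>
"""


def links_by_type(service):
    """Map each link's normative media type to (href, [alternate types])."""
    result = {}
    for link in service.findall(f"{{{DSR_NS}}}link"):
        # Each alt names a media type reachable through content negotiation.
        alts = [alt.get("type") for alt in link.findall(f"{{{DSR_NS}}}alt")]
        result[link.get("type")] = (link.get("href"), alts)
    return result


service = ET.fromstring(SERVICE_XML)
links = links_by_type(service)
```

Keying the result on the <code>type</code> attribute reflects its role as the media type of the normative representation; the alternates listed for a link are the values a client could place in an HTTP '''Accept''' header when dereferencing the same <code>href</code>.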


=== Extensions ===
==== Sequence Variables ====
If the variable is a Sequence, then for declaration
purposes, it is treated like a Structure (as above).
Note that applying a filter to a Sequence will not change
its declaration form because the number of records in the
sequence is not specified in the DMR. Note also that mentioning
a Sequence field in the filter does not necessarily mean it will
be included in the DMR. It will only be included if it is mentioned
in the projection part of the constraint clause.


Server extensions are simple descriptions of server behaviors that fall outside of the services descriptions provided so far. There are three types of extensions: <font size="2"><code>'''function'''</code></font>, <font size="2"><code>'''functionGroup'''</code></font>, and <font size="2"><code>'''extension'''</code></font>.
==== Groups ====
Each declaration in the CDMR that corresponds to a declaration
in the DMR will cause its containing group (and that group's parents)
to be included in the CDMR. This ensures that the FQN for a declaration
in the CDMR is the same as in the DMR.


A linear regression, a re-gridding operation, and a coordinate re-projection are all examples of a <font size="2"><code>'''function'''</code></font>. A collection of functions that allows the user to perform geocentric manipulations of the data (such as re-gridding operations, reprojection operations) taken together could define a <font size="2"><code>'''functionGroup'''</code></font>.  An <font size="2"><code>'''extension'''</code></font> is a server behavior that does not operate directly on the data values, but may provide other types of services and functionality for the client.
==== Attributes ====
Attributes are unaffected by the CE and are simply included in the CDMR, with the stipulation that attributes for variables that are not included in the CDMR won't be part of the CDMR. Essentially DAP4 views those attributes as part of the variables and explicitly excluding the variable from the CDMR (by providing a CE that does not include it) excludes its attributes too. Group level attributes
will be included if and only if that group appears in the CDMR.


It is beyond the scope of this specification to attempt to describe the syntax of usage of a  <font size="2"><code>'''function'''</code></font> in some general way. Rather we have chosen to provide a simple mechanism (the <font size="2"><code>'''role'''</code></font> attribute) to alert client software implementations that the server supports specific additional behaviors they can utilize, along with more descriptive elements through which humans can be alerted to the presence of server features and where to find out more about them.
There is one situation that bears mention, however. Many datasets contain variables which include attributes that describe domain-specific values for the variable's value(s). For example, imagine an atmospheric profile that includes information about the minimum and maximum temperatures of that profile. If the values are stored in an array and the array is sliced so that only a subset of values is returned, the attributes will provide correct values for the original data ''but possibly not the data returned in the response'' because the slicing operation has removed some of the values of the array. Because DAP4 is a ''domain neutral'' protocol, it has no knowledge about how the values of a specific attribute relate to the values of the variable and cannot adjust the values of the attribute to match the CE.


==== dsr:function ====
=== Filters ===
While ''subsetting'' provides ways to choose data based on the dataset structure and the types of the variables, ''filters'' provide a way to choose data based on their values. The values to be returned are denoted using one or more simple predicates. The general syntax for a filter expression is to follow a subset (projection) expression with a pipe ('''|''') and one or more filter predicates. Multiple predicates are separated by commas, and the value of the complete predicate is the logical AND of the comma-separated subexpressions.


A function extension always describes an operation that applies to the Data Response and the Dataset Metadata Response. Because some functions might alter the syntactic structure of the data, the change can be previewed by applying the function to the DMR.
Filter expressions can only be applied to Sequence variables (or arrays of them).
In each case the result of the filter operation returns ''the same type'' variable. A Sequence variable is essentially a table of values and thus can be thought of as containing a number of rows and the filter expression is applied to ''each row'' in the order those rows are provided to the expression evaluator. Every row that satisfies the predicate will be included in the value returned; those that don't will not be included in the result. Note that no new values are computed by these operations; no interpolations, means, etc., are performed.
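
The row-by-row behavior described above can be sketched in Python. This is an illustration under assumptions, not a server implementation: rows are modeled as dictionaries, predicates as callables, and the name <code>filter_sequence</code> and the sample values are hypothetical.

```python
def filter_sequence(rows, projection, predicates):
    """Keep rows satisfying every predicate, then project the requested fields."""
    selected = []
    for row in rows:                       # rows are visited in the order given
        # Comma-separated predicates combine with a logical AND.
        if all(pred(row) for pred in predicates):
            # Only projected fields are returned; values are copied unchanged.
            selected.append({name: row[name] for name in projection})
    return selected


# A small table standing in for a Sequence with fields x and y.
points = [{"x": 18, "y": 1}, {"x": 5, "y": 7}, {"x": 30, "y": 2}]

# Project x only, keeping rows where y < 3 and x > 17.
result = filter_sequence(points, ["x"], [lambda r: r["y"] < 3, lambda r: r["x"] > 17])
```

Here ''y'' takes part in the filter but not in the projection, so it is absent from the returned rows, and the surviving rows keep their original order and values.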


Each <font size="2"><code>'''<dsr:function>'''</code></font> MUST contain:
The behavior of filtering expressions on Sequences will be covered in the following sections.
* A <font size="2"><code>'''name'''</code></font> attribute whose value is the name of the function as it would be used in a DR or DMR request.
* A <font size="2"><code>'''role'''</code></font> attribute whose value is a URI that universally defines the function.
* A <font size="2"><code>'''<dsr:Description>'''</code></font> element whose value is a human readable description of the function and a link to detailed documentation of the function and its usage.


A common example of a function would be the DAP2 (and DAP4) geogrid function that allows users to sub-sample gridded data using georeferenced values.
<!--
==== Filters on Sequences ====
<source lang="xml">
<Dataset name="vol_1_ce_8">


==== dsr:functionGroup ====
  <Sequence name="Points">
    <Int32 name="x"/>
    <Int32 name="y"/>
  </Sequence>
 
</Dataset>
</source>
<source lang="c">
Dataset {
    Sequence {
        Int32 x;
        Int32 y;
    } Points;
} arrays;
</source>
-->


A function group identifies that the server supports a group of functions. Rather than enumerating a large list of related functions, the functionGroup provides the user with the information that the server supports a collection of related functions.
=== Filters and more complex data types ===
The basic syntax for filters is that there is a subsetting expression, a pipe ('''|''') and then one or more filter predicates. This syntax can appear any place a ''selection expression'' can appear, so it can be used inside braces when an Array or Sequence is a field of a Structure or Sequence. Note that the filter expression prefix operator binds to the index subset immediately to its left at the same level (i.e. eliding braces). Some examples follow.


Each <font size="2"><code>'''<dsr:functionGroup>'''</code></font> MUST contain:
==== Example: Filters on complex types ====
* A <font size="2"><code>'''name'''</code></font> attribute whose value is the human readable name of the collection of functions.
<source lang="xml">
* A <font size="2"><code>'''role'''</code></font> attribute whose value is a URI that universally defines the function group.
<Dataset name="vol_1_ce_9">
* A <font size="2"><code>'''<dsr:Description>'''</code></font> element whose value is a human readable description of the function group and a link to detailed documentation regarding the functions in the function group and their usage.
  <Sequence name="Points1">
    <Int32 name="x">
      <Dim size="100"/>
    </Int32>
    <Int32 name="y"/>
  </Sequence>
 
  <Sequence name="Points2">
    <Int32 name="x"/>
    <Int32 name="y"/>
    <Sequence name="sounding">
      <Int32 name="depth"/>
      <Int32 name="temp"/>
    </Sequence>
  </Sequence>


An example of a function group would be a server that implemented a functional syntax for all of the functions found in the [http://ferret.wrc.noaa.gov/Ferret/ Ferret] application.
  <Sequence name="Points3">
    <Int32 name="x"/>
    <Int32 name="y"/>
    <Sequence name="sounding">
      <Int32 name="depth"/>
      <Int32 name="temp"/>
    </Sequence>
    <Dim size="20"/>
  </Sequence>
 
  <Structure name="Points4">
    <Int32 name="x"/>
    <Int32 name="y"/>
    <Sequence name="raw">
      <Int32 name="depth"/>
      <Int32 name="temps">
        <Dim size="4"/>
      </Int32>
      <Dim size="300"/>
    </Sequence>
  </Structure>
 
</Dataset>
</source>
<source lang="c">
Dataset {
    Sequence {
        Int32 x[100];
        Int32 y;
    } Points1;


==== dsr:extension ====
    Sequence {
An extension is a mechanism for indicating that the server supports a certain behavior. This behavior MUST NOT be an operation/computation on the data (otherwise it would be a <font size="2"><code>'''dsr:function'''</code></font> or a <font size="2"><code>'''dsr:functionGroup'''</code></font>), but rather some other behavior that the server can undertake.
        Int32 x;
        Int32 y;
        Sequence {
            Int32 depth;
            Int32 temp;
        } sounding;
    } Points2;


Each <font size="2"><code>'''<dsr:extension>'''</code></font> MUST contain:
    Sequence {
* A <font size="2"><code>'''name'''</code></font> attribute whose value is the human readable name of the extension.
        Int32 x;
* A <font size="2"><code>'''role'''</code></font> attribute whose value is a URI that universally defines the extension.
        Int32 y;
* A <font size="2"><code>'''<dsr:Description>'''</code></font> element whose value is a human readable description of the extension and a link to detailed documentation regarding the extension and its behavior and usage.
        Sequence {
            Int32 depth;
            Int32 temp;
        } sounding;
    } Points3[20];


An example of an extension would be the DAP4 asynchronous response support. Some servers will support this functionality, and the functionality is not defined as an operation on the data itself, but in terms of the transaction procedures for accessing the data.
    Structure {
        Int32 x;
        Int32 y;
        Sequence {
            Int32 depth;
            Int32 temps[4];
        } raw;
    } Points4[100];


=== Example:  A Minimal DSR ===
} complex_types_example;
</source>


This example contains only the required components of the DSR document from a minimal DAP4 server implementation.
; ''/Points1{x[0:9]}|y<3'': For the Sequence ''Points1'', return the rows of data where ''y'' is less than 3. In those rows, subset ''x'' so that only the first ten elements are included. Note that y is mentioned in the filter, but not in the selection so it will not appear in the resulting DMR.
; ''/Points2{x; y; sounding | depth > 20} | x > 17'': This shows, without the added complexity of an array, how filter expressions associate with Sequences. For the sequence ''sounding'' the filter expression can use only ''depth'' and ''temp'' (and constants). When filtering the values of a child sequence, the sequence name must be used and thus the names of all of the fields of the parent sequence needed in the result must be listed.
; ''/Points3[10:19] { x; y; sounding | depth > 10 } | 20 < x < 40, y <35'': This selection expression first finds the index subset of ''Points3'' and arranges to return the fields ''x'', ''y'' and ''sounding'' where ''x'' and ''y'' satisfy the predicates ''20 < x < 40'' and ''y <35''. For the field ''sounding'', which is a Sequence itself, it will return both fields where ''depth > 10''. This example points out an important aspect of the syntax and of expression evaluation: the filter predicates are evaluated after the index and variable and/or field subsetting. The complete filter predicates (i.e., ''20 < x < 40, y <35'' and ''depth > 10'') can be evaluated in any order. The order of evaluation of the filter predicate subexpressions (i.e., ''20 < x < 40'' and ''y <35'') is also unspecified.
; ''/Points4[3:2:8] {x; y; raw{temps[2] | temps > 7,ND=-1}}'': In this expression the ''temps'' field of the Sequence ''raw'' is still an Array, it's just an Array with a single element, which illustrates that neither the subsetting nor filtering operations alter the types of the variables.


<font size="2">
<!--
<source lang="xml" >
=== The Mask operator may be applied only to an Array ===
For this predicate, the syntax is ''dest *= source'' where any element<sub>'''i,j'''</sub> in the ''source'' array with the value equal to the given ''No Data'' value will result in the corresponding element of the ''dest'' array being set to that array's ''No Data'' value. Other values in the ''dest'' array are not affected. The intent of this is that filtering performed on a map array can then be applied to a data array that uses the map (although there's no actual restriction that this be only used with coverages; any two isomorphic arrays in an anonymous group can be used). To use the Mask (''*='') predicate, the array on the Right Hand Side of the sub-clause must have been previously filtered (and so must have a given ND value).


<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<source lang="xml">
<DatasetServices xmlns="http://xml.opendap.org/ns/DAP/4.0/dataset-services#"
<Dataset name="vol_1_ce_10">
     xml:base="http://test.opendap.org:8090/opendap/hyrax/ECMWF_ERA-40_subset.ncml">
  <Float32 name="lat">
    <Dim size="100"/>
    <Dim size="50"/>
  </Float32>
 
  <Float32 name="lon">
    <Dim size="100"/>
    <Dim size="50"/>
  </Float32>
 
  <Float32 name="temp">
     <Dim size="100"/>
    <Dim size="50"/>
  </Float32>
</Dataset>
</source>
<source lang="c">
Dataset {
Float32 lat[100][50];
Float32 lon[100][50];
Float32 temp[100][50];
} vol_1_ce_10;
</source>


    <DapVersion>4.0</DapVersion>
; ''lat | lat < 20, ND=-255; temp | temp *= lat, ND=-255'': Filter ''lat'' so that all values < 20 are replaced with the No Data value, then use that as a mask, making all of the corresponding elements of ''temp'' also be No Data. Note that this operator forces an order on filter evaluation.
   
; ''lat[0:20][0:20] | lat < 20, ND=-255; lon[0:20][0:20] | -100 < lon < -80, ND=-255; temp[0:20][0:20] | temp *= lon, temp *= lat, ND=-255'': This will result in ''temp'' effectively being masked by the logical AND of ''lat'' and ''lon''
    <ServerSoftwareVersion>Hyrax-2.7.9</ServerSoftwareVersion>
       
    <Service title="DAP4 Dataset Services" role="http://services.opendap.org/dap4/dataset-services">
        <link type="application/vnd.opendap.org.dataset-services+xml"
              href="http://test.opendap.org:8090/opendap/hyrax/ECMWF_ERA-40_subset.ncml">
              <alt type="text/xml"/>
        </link>
        <link type="text/xml"  href="http://test.opendap.org:8090/opendap/hyrax/ECMWF_ERA-40_subset.ncml.xml"/>
    </Service>


    <Service title="DAP4 Dataset Metadata"  role="http://services.opendap.org/dap4/dataset-metadata">
==== Filters and Coverages ====
        <link type="application/vnd.org.opendap.dap4.dataset-metadata+xml"
The same filtering operations that can be applied to simple arrays can be applied to a coverage by simply using its Maps in the Constraint expression (i.e., the subset and filter sub-expressions). The array filtering operation previously described can easily be applied to a coverage. Below we show two cases; where the maps are vectors and where they are two-dimensional arrays.
              href="http://test.opendap.org:8090/opendap/hyrax/ECMWF_ERA-40_subset.ncml.dmr">
              <alt type="text/xml"/>
        </link>
        <link type="text/xml" href="http://test.opendap.org:8090/opendap/hyrax/ECMWF_ERA-40_subset.ncml.dmr.xml"/>
    </Service>


    <Service title="DAP4 Data" role="http://services.opendap.org/dap4/data">
<source lang="xml">
        <link type="application/vnd.org.opendap.dap4.data" href="http://test.opendap.org:8090/opendap/hyrax/ECMWF_ERA-40_subset.ncml.dap" />
<Dataset name="vol_1_ce_11" dapVersion="4.0" dmrVersion="1.0" xml:base="file:dap4/vol_1_ce_11.xml"
     </Service>
  xmlns="http://xml.opendap.org/ns/DAP/4.0#" xmlns:dap="http://xml.opendap.org/ns/DAP/4.0#">
  <Dimension name="nlat" size="100"/>
  <Dimension name="nlon" size="50"/>
 
  <Float32 name="lat">
    <Dim name="nlat"/>
  </Float32>
  <Float32 name="lon">
    <Dim name="nlon"/>
  </Float32>
 
  <Float32 name="temp">
    <Dim name="nlon"/>
    <Dim name="nlat"/>
    <Map name="lon"/>
     <Map name="lat"/>
  </Float32>
 
</Dataset>
</source>
<source lang="c">
Dataset {
Dimensions: nlat=100, nlon=50;
Float32 lat[nlat];
Float32 lon[nlon];


   
Float32 temp(lon)(lat);
</DatasetServices>
} vector_maps;
</source>
</source>
</font>


== Asynchronous Service Behavior and Responses ==
; ''lat | lat < 20, ND=-255; lon | -100 < lon < -80, ND=-255; temp'': Return all of ''lat'', ''lon'' and ''temp'', where ''lat'' and ''lon'' have been filtered as per the predicates. The values of ''temp'' are not altered.
Asynchronous responses are responses that will take the server some time to build. When a client is told that a response 'is asynchronous,' it must know to come back at a later time to retrieve the response. The concept is a very simple one, and the existing network infrastructure is very good at supporting these kinds of interactions. A major factor in the success of the proposed solution will be the level of uniform support for the design. Secondly, as is often the case, the details will be more complex than the underlying concept. In particular, the request mechanism must be extended so that synchronous (regular) requests are not affected by the addition of asynchronous requests and, at the same time, clients do not inadvertently make asynchronous requests. Another detail is that the (asynchronous) responses are ''ephemeral'': they typically persist only for a period of time and are then purged.
; ''nlat=[0:20]; nlon=[0:20]; lat | lat < 20, ND=-255; lon | -100 < lon < -80, ND=-255; temp'': The same as above, but the maps ''lat'' and ''lon'' and the array ''temp'' are subset using the index subsetting expression and the resulting arrays are filtered. Note that because this is a coverage, the Maps' dimensions are tied to ''temp'''s dimensions using Shared Dimensions, so we can use Shared Dimension slicing to specify the slicing once and use it for all three of the Maps/Arrays. That is, ''lat'', ''lon'' and ''temp'' are each subset using ''[0:20][0:20]''. Contrast this with the corresponding example in the previous section, where the slicing subset had to be explicitly specified for each Array.
; ''nlat=[0:20]; nlon=[0:20]; lat | lat < 20, ND=-255; lon | -100 < lon < -80, ND=-255; temp > 7, ND=-255'': Same as above, but now ''temp'' is filtered too. Note that the filters applied to ''lat'' and ''lon'' have no effect on ''temp''.


A typical 'workflow' for an asynchronous request is:
<source lang="xml">
# A client makes a data request that indicates that it will accept either an asynchronous or synchronous response. Optionally, the client can place a time constraint on the response, indicating that if the response will not be ready in a given period of time, it does not want the response.
<Dataset name="vol_1_ce_12" dapVersion="4.0" dmrVersion="1.0" xml:base="file:dap4/vol_1_ce_12.xml"
# The server returns an initial response (without delay) that indicates the request has indeed resulted in an asynchronous response and provides the client with a URL and time estimate.  
  xmlns="http://xml.opendap.org/ns/DAP/4.0#" xmlns:dap="http://xml.opendap.org/ns/DAP/4.0#">
# The client reads the time estimate and waits...
  <Dimension name="x" size="100"/>
# The client dereferences the URL and gets the response.
  <Dimension name="y" size="50"/>
 
  <Float32 name="lat">
    <Dim name="x"/>
    <Dim name="y"/>
  </Float32>
  <Float32 name="lon">
    <Dim name="x"/>
    <Dim name="y"/>
  </Float32>
 
  <Float32 name="temp">
    <Dim name="x"/>
    <Dim name="y"/>
    <Map name="lon"/>
    <Map name="lat"/>
  </Float32>
 
</Dataset>
</source>
<source lang="c">
Dataset {
Dimensions: x=100, y=50;
Float32 lat[x][y];
Float32 lon[x][y];


The remainder of this section will expand on this basic workflow using examples that focus on the HTTP protocol but that also allow for the use of other transport protocols.
Float32 temp(lon)(lat);
} two_dim_maps;
</source>


=== Client willingness to accept asynchronous responses ===
; ''lat | lat < 20, ND=-255; lon | -100 < lon < -80, ND=-255; temp'': These examples repeat the above three, but show that the same syntax applies to the case where the maps are N-dimensional (in this case N == 2).
A client can indicate willingness to accept asynchronous responses in one of two ways:
:; ''lat=[0:20][0:20]; lon=[0:20][0:20]; lat | lat < 20, ND=-255; lon | -100 < lon < -80, ND=-255; temp'':
* By including the [[#Accept DAP Asynchronous Response|X-DAP-Async-Accept]] HTTP header.
:; ''lat=[0:20][0:20]; lon=[0:20][0:20]; lat | lat < 20, ND=-255; lon | -100 < lon < -80, ND=-255; temp > 7, ND=-255'':
* By adding the [[#DAP4_Constraint_Expression_extension_for_Async | async]] keyword to the DAP constraint expression.
-->


If the client indicates that it must have access to the asynchronous response content within a certain time (using the [[#Accept DAP Asynchronous Response|X-DAP-Async-Accept]] HTTP header, the [[#DAP4_Constraint_Expression_extension_for_Async | async]] keyword in the constraint expression, or both) and the response will not be available in that time frame, the server MUST reject the request and return an HTTP status of [[#412_Precondition_Failed | 412]] and the [[#DAP Asynchronous Request Rejected | DAP Asynchronous Request Rejected]] XML document.
==References==


If both the ''X-DAP-Async-Accept'' HTTP header and the ''async'' keyword are used, the keyword takes precedence.
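The precedence rule above can be illustrated with a minimal Python sketch. It assumes the ''async'' keyword appears as a plain query parameter, as in the ''?async=60'' examples in this section, and it assumes that a negative header value is rejected in the same way as a negative keyword value (the text only states that rule for the keyword). The function names are hypothetical and are not part of any DAP4 library.

```python
from typing import Mapping, Optional
from urllib.parse import parse_qs


def parse_async_value(raw: str) -> int:
    """Parse a wait time in seconds; 0 means 'accept any delay'."""
    value = int(raw)  # raises ValueError for non-integer input
    if value < 0:
        # The text requires an error for a negative keyword value.
        raise ValueError("async wait time must not be negative")
    return value


def resolve_async_wait(query: str, headers: Mapping[str, str]) -> Optional[int]:
    """Return the client's maximum wait in seconds, or None if the client
    did not indicate willingness to accept an asynchronous response."""
    params = parse_qs(query.lstrip("?"), keep_blank_values=True)
    if "async" in params:
        # The keyword takes precedence over the header.
        return parse_async_value(params["async"][0])
    for name, raw in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "x-dap-async-accept":
            return parse_async_value(raw)
    return None
```

A server could call ''resolve_async_wait'' once per request and treat a ''None'' result as "client has not opted in".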
<ol>
<li><div id="Ref-1"></div>
Caron, J.,
<i>Unidata's Common Data Model Version 4</i>, 2012
(http://www.unidata.ucar.edu/software/netcdf-java/CDM/).


Servers must reject requests that require an asynchronous response if the client has not indicated willingness to accept such a response. Rejection of such requests is indicated by all three of the following:
<li><div id="Ref-2"></div>
# [[#400 DAP Asynchronous Response Required| HTTP status of 400]]
Folk, M. and E. Pourmal,
# Inclusion of the [[#DAP Asynchronous Response Required|X-DAP-Async-Required]] HTTP response header
<i>HDF5 Data Model, File Format and Library &mdash; HDF5 1.6</i>,
# The response body must contain the [[#DAP_Asynchronous_Response_Required | DAP Asynchronous Response Required]] XML document.  
Category: Recommended Standard January 2007
This safety check (requiring clients to explicitly indicate their willingness to accept asynchronous responses) is required because otherwise very simple clients might inadvertently make requests that will result in an asynchronous response, and these kinds of responses are likely to use disproportionately more server resources than synchronous responses. We want to design DAP4 so that simple clients work well and don't encounter unexpected 'hiccups.'
NASA Earth Science Data Systems Recommended Standard ESDS-RFC-007, 2007
(http://earthdata.nasa.gov/sites/default/files/esdswg/spg/rfc/ese-rfc-007/ESDS-RFC-007v1.pdf).


=== Initial processing by the server ===
<li><div id="Ref-3"></div>
When a request is accepted by the server and it will result in an asynchronous response, the server MUST return a 202 (Accepted) HTTP
Gallagher J., N. Potter, T. Sgouros, S. Hankin, and G. Flierl,
status code and the [[#DAP Asynchronous Request Accepted | DAP Asynchronous Request Accepted ]] XML document. This document contains a URL to the pending result of the request.
<i>The Data Access Protocol&mdash;DAP 2.0</i>,
NASA Earth Science Data Systems Recommended Standard ESE-RFC-004.1.2
(http://opendap.org/pdf/ESE-RFC-004v1.2.pdf).


Of course, this discussion is about the mechanism that enables a client to make a request and the server to provide ''information about'' an asynchronous response to that request. It does not cover any of the nearly infinite ways a server might actually make the ''content'' of that response. It is likely that servers will write the responses to files and the URL returned to the client will be used to retrieve that file, but there's no requirement that servers do that. The only requirements on the server are that:
<li><div id="Ref-4"></div>
# The URL returned asserts, using the [[#DAP4_Constraint_Expression_extension_for_Async | constraint expression syntax for async]], that the client accepts asynchronous responses.
Gosling, J., B. Joy, G. Steele, G. Bracha, A Buckley,
# The URL returned can be dereferenced and that operation will return the response requested by the client.
<i>The Java™ Language Specification &mdash; 7th Edition</i>
Oracle Corporation, 2012,
(http://docs.oracle.com/javase/specs/jls/se7/html/).


=== Response retrieval by the client ===
<li><div id="Ref-5"></div>
When a client requests an asynchronous result that is ready, the server MUST return a 200 (OK) HTTP status code and the resulting data response. If the client attempts to access the asynchronous result prior to its availability, the server SHOULD return an HTTP response status of [[#409_Conflict_-_DAP4_Response_Not_Ready | 409 (DAP Response Not Ready)]] along with the [[#DAP4_Asynchronous_Response_Not_.28Yet.29_Available | DAP Asynchronous Response Not Available]] XML document. If the server does not return the 409 response status then it MUST return a 404 (Not Found) response along with whatever document it deems fit as the response body.
Hartnett, E.,
<i>netCDF-4/HDF5 File Format</i>,
NASA Earth Science Data Systems Recommended Standard ESDS-RFC-022, 2011
(http://earthdata.nasa.gov/sites/default/files/field/document/ESDS-RFC-022v1.pdf).


If the client attempts to access the asynchronous result after it is no longer available, the server SHOULD return an [[#410_Gone_-_DAP4_Response_No_Longer_Available | HTTP response status of 410 (Gone)]] along with the [[#DAP4_Asynchronous_Response_Gone| DAP4 Asynchronous Response Gone]] document. If the server does not return the 410 response status then the server MUST return a 404 (Not Found) response along with whatever document it deems fit as the response body.
<li><div id="Ref-6"></div>
IEEE, <i>IEEE Standard for Binary Floating-Point Arithmetic, ANSI/IEEE Std 754-1985</i>, Digital Object Identifier: 10.1109/IEEESTD.1985.82928, 1985.


In each case above where the server SHOULD return a specific error code, but may return a 404 code instead, the intent is for servers to provide the most appropriate use of HTTP/1.1's error codes while also providing servers with an 'out' when that is hard for them to do. For example, knowing that a response, which is essentially ephemeral, is gone would, in theory, require the server to keep a record of every URL ever issued for an asynchronous response, and that is not practical. At the same time, it is easy to see that a client would really like to know that the response has not yet been finished (i.e., it has not waited long enough) or that it is gone (i.e., it waited too long).
<li><div id="Ref-7"></div>
The Internet Society, <i>IETF RFC 2119:
Key words for use in RFCs to Indicate Requirement Levels
</i>, 1997
(http://tools.ietf.org/html/rfc2119).


=== Detail: Client requests ===
<li><div id="Ref-8"></div>
The Internet Society, <i>IETF RFC 2396:
Uniform Resource Identifiers (URI): Generic Syntax
</i>, 1998
(http://tools.ietf.org/html/rfc2396).


==== DAP4 Constraint Expression extension for Async ====
<li><div id="Ref-9"></div>
By adding a keyword/value pair to the DAP4 constraint expression we can allow a client to encode its willingness to accept an asynchronous response, along with the maximum amount of time the client can wait before it can access the response.
The Internet Society, <i>IETF RFC 2616:
Hypertext Transfer Protocol &mdash; HTTP/1.1
</i>, 1999
(http://tools.ietf.org/html/rfc2616).


; async
<li><div id="Ref-10"></div>
: A value of zero indicates the client is willing to unconditionally accept an asynchronous response. A positive integer value will be interpreted as the number of seconds that the client will wait for access to the response. If the value is negative the server MUST return an error.
The Internet Society, <i>IETF RFC 4506: XDR: External Data Representation Standard</i>, 2006
(http://tools.ietf.org/html/rfc4506).


; Examples
<li><div id="Ref-11"></div>
: Client is willing to unconditionally accept an asynchronous response
ISO/IEC,
:: <font size="2"><code>?async=0</code></font>
<i>Information technology &mdash; Portable Operating System Interface (POSIX) &mdash; Part 2: Shell and Utilities</i>,
: Client is willing to wait for 60 seconds for access to the asynchronous response
ISO/IEC 9945-2,1993
:: <font size="2"><code>?async=60</code></font>
(http://www.iso.org/iso/catalogue_detail.htm?csnumber=17841).


==== X-DAP-Async-Accept ====
<li><div id="Ref-12"></div>
The Open Geospatial Consortium Inc.,
<i>Abstract Specifications</i>,
(http://www.opengeospatial.org/standards/as).


A client may indicate willingness to accept asynchronous responses by including the ''X-DAP-Async-Accept'' HTTP header. Clients can make conditional requests for asynchronous responses by indicating the maximum time they are willing to wait by using the '''X-DAP-Async-Accept''' HTTP header with a value given in seconds. A value of zero indicates that the client is willing to accept whatever delay the server may encounter.
<li><div id="Ref-13"></div>
The Organization for the Advancement of Structured Information Standards,
<i>RELAX NG Specification</i>,
Committee Specification: 2001,
J. Clark, M. Makoto (eds.)
(http://relaxng.org/spec-20011203.html).


=== Detail: Server responses ===
<li><div id="Ref-14"></div>
The Unicode Consortium. <i>The Unicode Standard, Version 6.2.0</i>,  ISBN 978-1-936213-07-8, 2012.


Several 'experimental' HTTP headers are used by this design. They convey information either in the request (like the ''X-DAP-Async-Accept'' described above) or they encode information for a response. While only clients that intend to support asynchronous responses need to understand all of these, ''every'' client SHOULD understand the ''X-DAP-Async-Required'' header. Because we need to support clients like web browsers, knowledge of that header is not required, but DAP4-specific clients will provide the most information to users if they know to look for at least that response header.
<li><div id="Ref-15"></div>
Unidata,
<i>CF Metadata</i>,
(http://www.cfconventions.org/).


==== X-DAP-Async-Required ====
<li><div id="Ref-16"></div>
W3C, <i>Extensible Markup Language (XML) 1.0</i>,
T. Bray, J. Paoli, C. M. Sperberg-McQueen, E. Maler, F. Yergeau (eds.),
Fifth Edition. 2008
(http://www.w3.org/TR/2008/REC-xml-20081126/).


The ''X-DAP-Async-Required'' HTTP response header is included in the response if the request requires an asynchronous response and the client has not indicated willingness to accept such a response. Rejection of the request should also be indicated by the [[#400_DAP4_Asynchronous_Response_Required |400 DAP Asynchronous Response Required]] HTTP response code.
<li><div id="Ref-17"></div>
World Meteorological Organization,
<i>FM 92 GRIB</i>,
edition 2, version 2, 2003
(http://www.wmo.int/pages/prog/www/DPS/FM92-GRIB2-11-2003.pdf).
</ol>


==== X-DAP-Async-Accepted ====
=Appendices=


The ''X-DAP-Async-Accepted'' HTTP response header is included in the response if the server has accepted an asynchronous request. Acceptance of the request should also be indicated by the [[#202_Accepted|202 Asynchronous Request Accepted]] HTTP response code.
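The way the request and response headers described above combine into the server's initial response can be sketched as follows. This is a minimal Python illustration, not the Hyrax implementation: the function name and parameters are hypothetical, the header value "true" for ''X-DAP-Async-Accepted'' is an assumption, and treating an unknown delay estimate (zero) as acceptable is also an assumption.

```python
from typing import Dict, Optional, Tuple


def initial_response(needs_async: bool,
                     client_wait: Optional[int],
                     expected_delay: int) -> Tuple[int, Dict[str, str]]:
    """Choose the HTTP status and extra headers for the first reply.

    client_wait is None when the client did not opt in, 0 when it accepts
    any delay, or a positive maximum wait in seconds.
    expected_delay is the server's estimate in seconds (0 = unknown).
    """
    if not needs_async:
        # Ordinary synchronous request: nothing async-specific to add.
        return 200, {}
    if client_wait is None:
        # Client never indicated willingness: reject with 400 plus header.
        return 400, {"X-DAP-Async-Required": "true"}
    if client_wait > 0 and expected_delay > client_wait:
        # Response cannot be ready within the client's time limit.
        return 412, {}
    # Accepted; the body would carry the Request Accepted document.
    return 202, {"X-DAP-Async-Accepted": "true"}
```

Each non-200 status here would also carry the matching XML document in the response body, which this sketch omits.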
==Appendix 1. DAP4 DMR Syntax as a RELAX NG Schema==


==== HTTP Response Codes ====
This RELAX NG grammar is the definitive formal grammar for the DMR.


HTTP provides a number of response codes beyond the simple 200 (OK), 404 (Not Found) and 500 (Internal Server Error). In this design we describe how those standard codes SHOULD be used by DAP4 servers. We don't enumerate all of the possible codes, instead opting for a description of those that are most relevant.
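From the client's side, the codes used by this design can be mapped onto a small set of outcomes. The sketch below is one possible mapping in Python; the function name and the outcome labels are hypothetical and chosen only for illustration. Note that a 400 status is only treated as "asynchronous response required" when the ''X-DAP-Async-Required'' header is present, because 400 on its own simply means Bad Request.

```python
from typing import Mapping


def classify_async_status(status: int, headers: Mapping[str, str]) -> str:
    """Map an HTTP status (plus headers) onto an async-related outcome."""
    names = {name.lower() for name in headers}
    if status == 200:
        return "ready"            # the data response itself
    if status == 202:
        return "accepted"         # come back later using the returned URL
    if status == 400 and "x-dap-async-required" in names:
        return "async-required"   # resubmit with async willingness stated
    if status == 409:
        return "not-ready"        # asked too early; wait and retry
    if status == 410:
        return "gone"             # waited too long; resubmit the request
    if status == 412:
        return "wait-too-short"   # server cannot meet the client's limit
    if status == 404:
        return "not-found"        # fallback when 409/410 are not used
    if status >= 500:
        return "server-error"
    return "other"
```

A client would typically retry after a delay on "not-ready", and resubmit the original request on "gone" or "not-found".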
<source lang="xml">
<!-- RELAX NG Grammar -->
<!-- Date: June 15, 2012 -->
<!-- Last Revised: November 23, 2012 -->


===== 202 Accepted =====
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
        xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0"
        datatypeLibrary="http://xml.opendap.org/datatypes/dap4"
        ns="http://xml.opendap.org/ns/DAP/4.0#"
        >
<start>
  <ref name="dataset"/>
</start>


A server indicates that a request has been accepted and will be handled asynchronously by returning a '202 Accepted' HTTP response code. The response body must contain a document in one of the asynchronous information media types listed [[#Media Types|below]]. A server MUST return this response, and only do so, when a client has indicated a willingness to process an asynchronous response and the response will actually be returned using the asynchronous mechanism.
<define name="dataset">
  <element name="Dataset">
    <a:documentation>
        Semantic restriction: dapVersion, dmrVersion are required.
    </a:documentation>
   
    <attribute name="dapVersion"><data type="dap4_string"/></attribute>
    <attribute name="dmrVersion"><data type="dap4_string"/></attribute>


===== 400 DAP4 Asynchronous Response Required =====
    <ref name="groupbody"/>
  </element>
</define>


The '400 DAP Asynchronous Response Required' HTTP response code is used to indicate that the DAP4 request has been rejected because an asynchronous response is required and the client did not indicate willingness to accept an asynchronous response.
<define name="groupdef">
  <element name="Group">
    <ref name="groupbody"/>
  </element>
</define>


The response code text is used to indicate the reason for the rejection. However, since the '400' HTTP response code is not specific to asynchronous DAP (the standard text for the '400' code is "Bad Request"), the ''X-DAP-Async-Required'' HTTP response header is also included in the response (see [[#Accept DAP Asynchronous Response|above]]).
<define name="groupbody">
  <attribute name="name"><data type="dap4_id"/></attribute>
 
  <zeroOrMore>
    <ref name="dimdef"/>
  </zeroOrMore>
  <zeroOrMore>
    <ref name="enumdef"/>
  </zeroOrMore>
  <zeroOrMore>
    <ref name="variable"/>
  </zeroOrMore>
  <zeroOrMore>
    <ref name="metadata"/>
  </zeroOrMore>
  <zeroOrMore>
    <ref name="groupdef"/>
  </zeroOrMore>


'''Note''' that a standard 400 HTTP response code is returned. In this way, a client that does not understand asynchronous DAP can fail gracefully. The response code text message has been changed to be more informative of the reason for the failure. For clients that are aware of asynchronous DAP, the ''X-DAP-Async-Required'' header is set to "true". The body of the response also returns some information the client can use to decide on how it will continue.
</define>


===== 409 Conflict - DAP4 Response Not Ready =====
<define name="enumdef">
  <element name="Enumeration">
    <attribute name="name"><data type="dap4_id"/></attribute>
    <attribute name="basetype">
        <choice> <!-- Must be consistent with atomictype and variable -->
            <value>Byte</value> <!-- equivalent to UInt8 -->
            <value>Int8</value>
            <value>UInt8</value>
            <value>Int16</value>
            <value>UInt16</value>
            <value>Int32</value>
            <value>UInt32</value>
            <value>Int64</value>
            <value>UInt64</value>
        </choice>
    </attribute>
    <oneOrMore><ref name="enumconst"/></oneOrMore>
  </element>
</define>


The '409 Conflict' HTTP response code MAY be returned by a server to indicate that the DAP4 request has been rejected because a previous asynchronous request has not been completed and the result is not ready for access. If a server utilizes the '409 Conflict' HTTP response code it must also return a [[#DAP4_Asynchronous_Response_Not_.28Yet.29_Available | DAP4 Asynchronous Response Not Yet Available]] document in the response body.
<define name="enumconst">
  <element name="EnumConst">
    <attribute name="name"><data type="dap4_id"/></attribute>
    <attribute name="value"><data type="dap4_integer"/></attribute>
  </element>
</define>


===== 410 Gone - DAP4 Response No Longer Available =====
<define name="namespace">
  <zeroOrMore>
    <element name="Namespace">
      <attribute name="href"><data type="dap4_uri"/></attribute>
    </element>
  </zeroOrMore>
</define>


The '410 Gone' HTTP response code MAY be used by a server to indicate that the result of an asynchronous request is no longer available. If a server utilizes the '410 Gone' HTTP response code it must also return a [[#DAP4_Asynchronous_Response_Gone | DAP4 Asynchronous Response Gone]] document in the response body.
<define name="dimdef">
  <element name="Dimension">
    <a:documentation>
      A Dimension is a binding of a name to a size; when two or more variables
      use the same 'name' it can be inferred that they 'share' that dimension.
      The 'size' attribute must be a positive integer.
    </a:documentation>
    <attribute name="name"><data type="dap4_id"/></attribute>
    <attribute name="size"><data type="dap4_dim"/></attribute>
    <ref name="metadatalist"/>
  </element>
</define>


===== 412 Precondition Failed=====
<define name="dimref">
  <element name="Dim">
    <optional>
        <attribute name="name"><data type="dap4_fqn"/></attribute>
    </optional>
    <optional>
      <attribute name="size">
          <data type="dap4_dim"/>
      </attribute>
    </optional>
  </element>
</define>


The '412 Precondition Failed' HTTP response code is used to indicate that the DAP request has been rejected because it did not meet the '''X-DAP-Async-Accept''' condition (see [[#Accept DAP Asynchronous Response Conditionally on Estimated Time to Completion|above]]) that was specified in the request.
<!-- Atomictype define is only a way
    to list the set of atomictypes;
    it is never used in the grammar
-->
<define name="atomictype">
  <!-- This must be consistent with "variable" below -->
  <choice>
    <value>Char</value>
    <value>Byte</value>
    <value>Int8</value>
    <value>UInt8</value>
    <value>Int16</value>
    <value>UInt16</value>
    <value>Int32</value>
    <value>UInt32</value>
    <value>Int64</value>
    <value>UInt64</value>
    <value>Float32</value>
    <value>Float64</value>
    <value>String</value>
    <value>URL</value>
    <value>Opaque</value>
    <value>Enum</value>
  </choice>
</define>


===== 500 Internal Error=====
<define name="variable">
  <choice>
    <ref name="simplevariable"/>
    <ref name="structurevariable"/>
    <ref name="sequencevariable"/>
  </choice>
</define>


The '500 Internal Error' HTTP response code is used to indicate that the DAP request has caused an error on the server. The response body and other headers must be compliant with the [[DAP4_Web_Services_v3#DAP4_Error_Response | DAP4 Error Response]] and [[DAP4_Web_Services_v3#Status_Codes | Status Codes]] sections of the [[DAP4_Web_Services_v3 | web services specification]]. The request should not be repeated.
<define name="simplevariable">
  <choice>
    <!-- Following  must be consistent with "atomictype" -->
    <element name="Char"  ><ref name="variabledef"/></element>
    <element name="Byte"  ><ref name="variabledef"/></element>
    <element name="Int8"  ><ref name="variabledef"/></element>
    <element name="UInt8"  ><ref name="variabledef"/></element>
    <element name="Int16"  ><ref name="variabledef"/></element>
    <element name="UInt16" ><ref name="variabledef"/></element>
    <element name="Int32"  ><ref name="variabledef"/></element>
    <element name="UInt32" ><ref name="variabledef"/></element>
    <element name="Int64"  ><ref name="variabledef"/></element>
    <element name="UInt64" ><ref name="variabledef"/></element>
    <element name="Float32"><ref name="variabledef"/></element>
    <element name="Float64"><ref name="variabledef"/></element>
    <!-- Made 'string' capitalized. jhrg -->
    <element name="String" ><ref name="variabledef"/></element>
    <!-- Added URL type. jhrg -->
    <element name="URL" ><ref name="variabledef"/></element>
    <element name="Opaque"><ref name="variabledef"/></element>
    <element name="Enum">
      <attribute name="enum"><data type="dap4_fqn"/></attribute>
      <ref name="variabledef"/>
    </element>
  </choice>
</define>


==== Asynchronous Response Documents ====
<define name="variabledef">
  <attribute name="name"><data type="dap4_id"/></attribute>
  <zeroOrMore>
    <choice>
      <ref name="dimref"/>
      <ref name="mapref"/>
      <ref name="metadata"/>
    </choice>
  </zeroOrMore>
</define>


The uses of these documents are:
<define name="mapref">
* to inform clients that a request will result in an asynchronous response;
  <element name="Map">
* to provide clients with the status of an accepted asynchronous request; and
    <attribute name="name"><data type="dap4_fqn"/></attribute>
* to inform clients that a request for an asynchronous response has been rejected.
  </element>
</define>


These response documents are the payloads of various responses, including errors. By using the HTTP 400-series error response codes, the design ensures that generic web clients will understand that their request was in error (even if they don't really understand why). The text provided with the response code will be sufficient for a person to understand the gist of the problem, if not more. The response documents described here, along with the ''X-DAP'' headers described above, are a way of providing additional information to a savvy client so that it can take full advantage of the asynchronous response system.
<define name="structurevariable">
  <element name="Structure">
    <attribute name="name"><data type="dap4_id"/></attribute>
    <zeroOrMore>
      <choice>
        <ref name="dimref"/>
        <ref name="variable"/>
        <ref name="metadata"/>
      </choice>
    </zeroOrMore>
  </element>
</define>


These documents are XML that follows the DAP Asynchronous XML schema and are declared in the namespace '''<nowiki>http://opendap.org/ns/dap/asynchronous</nowiki>'''.
<define name="sequencevariable">
  <element name="Sequence">
    <attribute name="name"><data type="dap4_id"/></attribute>
    <zeroOrMore>
      <choice>
        <ref name="dimref"/>
        <ref name="variable"/>
        <ref name="metadata"/>
      </choice>
    </zeroOrMore>
  </element>
</define>


===== DAP4 Asynchronous Response Required =====
<define name="metadatalist">
  <zeroOrMore>
    <ref name="metadata"/>
  </zeroOrMore>
</define>


This document informs clients that a request will result in an asynchronous response, and that the client has not yet indicated its willingness to accept an asynchronous response. It might seem superfluous to include a document that only a client knowledgeable about the asynchronous response features could parse, but many such clients may not, as a matter of course, indicate that they will accept these responses. For example, a user-configurable parameter might turn off support for the feature. The ''expectedDelay'' and ''responseLifetime'' elements convey information about the conditions the client can expect if it submits an asynchronous request for the response. As noted below, these are estimates made by the server, since a number of things that the server cannot predict can affect them in the intervening time between the client's requests. Additionally, a server MAY return a value of zero for either element, indicating that it cannot make an accurate estimate.
<define name="metadata">
    <choice>
    <ref name="otherxml"/>
    <ref name="attribute"/>
    </choice>
</define>


<font size="2"><source lang="xml">
<define name="attribute">
<AsynchronousResponse status="required">
  <choice>
  <expectedDelay seconds="600" />
    <ref name="atomicattribute"/>
  <responseLifetime seconds="3600"/>
    <ref name="containerattribute"/>
</AsynchronousResponse>
  </choice>
</source></font>
</define>


This response MUST be associated with the 400 HTTP response code and the ''X-DAP-Async-Required'' response header.
<define name="atomicattribute">
  <element name="Attribute">
      <attribute name="name"><data type="dap4_id"/></attribute>
      <a:documentation>
        Semantic constraint: type must be compatible
        with the set of attribute value types
      </a:documentation>
      <attribute name="type">
        <choice>
          <value>Char</value>
          <value>Byte</value>
          <value>Int8</value>
          <value>UInt8</value>
          <value>Int16</value>
          <value>UInt16</value>
          <value>Int32</value>
          <value>UInt32</value>
          <value>Int64</value>
          <value>UInt64</value>
          <value>Float32</value>
          <value>Float64</value>
          <value>String</value>
          <value>URL</value>
          <value>Enum</value>
          <value>Opaque</value>
        </choice>
      </attribute>
      <optional>
          <ref name="namespace"/>
      </optional>
      <zeroOrMore>
<choice>
          <element name="Value">
              <attribute name="value">
                <choice> <!-- technical ambiguity -->
                    <data type="dap4_integer"/>
                    <data type="dap4_float"/>
                    <data type="dap4_opaque"/>
                    <data type="dap4_char"/>
                    <data type="dap4_string"/>
                    <data type="dap4_fqn"/> <!-- for enum types -->
                </choice>
              </attribute>
        </element>
<element name="Value"><data type="dap4_text"/></element>
</choice>
      </zeroOrMore>
  </element>
</define>


===== DAP4 Asynchronous Request Accepted =====
<define name="containerattribute">
  <element name="Attribute">
    <attribute name="name"><data type="dap4_id"/></attribute>
    <zeroOrMore>
<ref name="attribute"/>
    </zeroOrMore>
  </element>
</define>


This response informs clients that a request resulting in an asynchronous response has been accepted, and provides operational information about retrieving the asynchronous response result. Note that the ''expectedDelay'' and ''responseLifetime'' elements are estimates made by the server. A server SHOULD ensure that the response will remain available for the time period given by ''expectedDelay'' and ''responseLifetime''. We say ''SHOULD'' and not ''MUST'' because we cannot predict all possible operational situations where these kinds of responses might be used. For example, a server might be providing access for several types of users who might have different access priorities, especially to limited resources like those typically involved with asynchronous access, and thus some responses might be further delayed, or removed early, to enable processing of requests from users with higher priority. It should be kept in mind, however, that the usefulness of the asynchronous responses will depend, in part, on servers providing a facility on which clients can depend.
<define name="otherxml">
  <element name="OtherXML">
    <ref name="arbitraryxml"/>
  </element>
</define>


While the ''expectedDelay'' and ''responseLifetime'' elements are required, a server MAY set their ''seconds'' attribute to ''0'' to indicate that it cannot provide a reliable value. In this case, clients SHOULD poll every 300 seconds and servers SHOULD expect this behavior. This is the default TCP user timeout period (see http://tools.ietf.org/html/rfc5482).
<define name="arbitraryxml">
    <element>
      <anyName/>
      <zeroOrMore>
        <choice>
          <attribute>
            <anyName/>
          </attribute>
          <text/>
          <ref name="arbitraryxml"/>
        </choice>
      </zeroOrMore>
    </element>
</define>
</grammar>
</source>


<font size="2"><source lang="xml">
==Appendix 2. DAP4 RELAX NG Lexical Elements==
<AsynchronousResponse status="accepted">
Within the RELAXNG DAP4 grammar there are markers for occurrences of primitive types such as integers, floats, or strings (ignoring case). The markers typically look like this when defining an attribute that can occur in the DAP4 DMR.
  <expectedDelay seconds="600" />
  <responseLifetime seconds="3600"/>
  <link href="http://server.org/async/path/result" />
</AsynchronousResponse>
</source></font>
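
The following is a minimal client-side sketch, not part of the specification, of how the ''accepted'' document might be read. It assumes the document is not namespace-qualified, as in the example above; the function names <code>parse_accepted</code> and <code>first_poll_delay</code> are hypothetical. The 300 second fallback is the polling interval suggested for the case where a server sets ''seconds'' to ''0''.

```python
import xml.etree.ElementTree as ET

# Fallback polling interval suggested by the text when seconds="0".
DEFAULT_POLL_SECONDS = 300


def parse_accepted(xml_text):
    """Return (expected_delay, response_lifetime, href) from an 'accepted' document."""
    root = ET.fromstring(xml_text)
    if root.tag != "AsynchronousResponse" or root.get("status") != "accepted":
        raise ValueError("not an accepted AsynchronousResponse")
    delay = int(root.find("expectedDelay").get("seconds"))
    lifetime = int(root.find("responseLifetime").get("seconds"))
    href = root.find("link").get("href")
    return delay, lifetime, href


def first_poll_delay(expected_delay):
    """Seconds to wait before the first retrieval attempt."""
    # A value of 0 means the server could not give a reliable estimate.
    return expected_delay if expected_delay > 0 else DEFAULT_POLL_SECONDS
```

A client would typically wait <code>first_poll_delay(delay)</code> seconds and then issue a GET on the returned ''href''.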


This response document MUST be associated with the 202 HTTP status code and the ''X-DAP-Async-Accepted'' response header.
<blockquote>
 
<pre>
===== DAP4 Asynchronous Response Not (Yet) Available =====
&lt;attribute name="Principal_Investigator"&gt;
&lt;data type="dap4_string"/&gt;
&lt;/attribute&gt;
</pre>
</blockquote>
The "&lt;data type="dap4_string"/&gt;" specifies the lexical class for the values that this attribute can have. In this case, the "Principal_Investigator" attribute is defined to have a DAP4 string value. Similar notation is used for values occurring as text within an xml element.


This document informs clients that, while a previous request for an asynchronous response has been accepted, the result is not yet available.
The lexical specification later in this section defines the legal lexical structure for such items. Specifically, it defines the format of the following lexical items.
<ol>
<li> Constants, namely: string, float, integer, character, and opaque.
<li> Identifiers
<li> Fully qualified names (also referred to as FQNs)
(Section [[#Fully Qualified Names|5.3]]).
</ol>
The specification is written using the extended POSIX regular expression notation [11] with some additions.
<ol>
<li> Names are assigned to regular expressions using the notation "name = regular-expression"
<li> Named expressions can be used in subsequent regular expressions by using the notation "{name}". Such occurrences are equivalent to textually substituting the expression associated with name for the "{name}" occurrence.
</ol>
Notes:
<ol>
<li> The definition of {UTF8} is deferred to the next section.
<li> Comments are indicated using the "//" notation. Standard XML escape formats (&amp;#xDDD; or &amp;{name};) are assumed to be used as needed.
</ol>


<font size="2"><source lang="xml">
===Basic character set definitions===
<AsynchronousResponse status="pending"/>
<blockquote>
</source></font>
<pre>
CONTROLS  = [\x00-\x1F] // ASCII control characters


This response document MUST be associated with the [[#409_Conflict_-_DAP4_Response_Not_Ready | 409 HTTP response code]].
WHITESPACE = [ \r\n\t\f]+


Servers SHOULD return this response document and its associated HTTP status of 409, but servers MAY return any document in the response body along with either a 404 (Not Found) or a 400 (Bad Request) HTTP status.
HEXCHAR    = [0-9a-fA-F]


===== DAP4 Asynchronous Response Gone =====
// ASCII printable characters


This document informs clients that, while a previous request for an asynchronous response has been accepted, the result is ''no longer'' available.
ASCII = [0-9a-zA-Z !"#$%&amp;'()*+,-./:;&lt;=&gt;?@[\\\]\\^_`|{}~]
</pre>
</blockquote>


<font size="2"><source lang="xml">
===Ascii characters that may appear unescaped in Identifiers===
<AsynchronousResponse status="gone"/>
</source></font>


This response document MUST be associated with the [[#410_Gone_-_DAP4_Response_No_Longer_Available| 410 HTTP status code]].  
This is assumed to be basically all ASCII printable characters except these characters: '.', '/', '"', '&#39;',  and '&amp;'. Occurrences of these characters are assumed to be representable using the standard XML &amp;{name}; notation (e.g. &amp;amp;). In this expression, backslash is interpreted as an escape character.


Servers SHOULD return this response document and its associated HTTP status of 410, but servers MAY return any document in the response body along with either a 404 (Not Found) or a 400 (Bad Request) HTTP status.
<blockquote>
<pre>
IDASCII=[0-9a-zA-Z!#$%()*+:;<=>?@\[\]\\^_`|{}~]
</pre>
</blockquote>


===== DAP4 Asynchronous Request Rejected =====
===The Numeric Constant Classes: integer and float===
<blockquote>
<pre>
INTEGER    = {INT}|{UINT}|{HEXINT}


This document informs clients that a request for an asynchronous response has been rejected, even though the client said it is willing to process an asynchronous response. There are at least as many reasons a server might reject the request for an asynchronous response as there are systems that might return such responses. However, this design provides suggested response codes for cases that seem likely so that clients can make educated decisions about the reason for the rejection. The reason codes supported are:
INT        = [+-][0-9]+{INTTYPE}?
;time: The client indicated that it was only willing to wait ''X'' seconds and the server thought it would take more time to build the result.
;unavailable: A needed resource is not available. This might indicate that hardware, like a robot tape system, cannot be currently accessed.
;privileges: The client is not allowed to make the request.
;other: Self evident...
In addition to the reason codes, this response will contain a text description of the reason for rejection.


Servers SHOULD make every effort to use the correct reason codes and provide cogent descriptions.
UINT      = [0-9]+{INTTYPE}?


<font size="2"><source lang="xml">
HEXINT     = {HEXSTRING}{INTTYPE}?
<AsynchronousResponse status="rejected">
     <reason code="time"/>
    <description>Acceptable access delay was less than estimated delay.</description>
</AsynchronousResponse>
</source></font>


This response document MUST be associated with the 412 HTTP status code.
INTTYPE    = ([BbSsLl]|"ll"|"LL")


Servers SHOULD return this response document along with an HTTP status of 412, but servers MAY return any document in the response body along with an HTTP status of 404 (Not Found) or of 400 (Bad Request) in its place.
HEXSTRING  = (0[xX]{HEXCHAR}+)


===== DAP4 Error =====
FLOAT      = ({MANTISSA}{EXPONENT}?)|{NANINF}


If the server encounters an error it MUST (MAY?) return an HTTP status of 500 (Internal Error) along with a response body and other headers compliant with the [[DAP4_Web_Services_v3#DAP4_Error_Response | DAP4 Error Response]] and [[DAP4_Web_Services_v3#Status_Codes | Status Codes]] sections of the [[DAP4_Web_Services_v3 | web services specification]]. The request should not be repeated.
EXPONENT  = ([eE][+-]?[0-9]+)


=== Examples ===
MANTISSA  = [+-]?[0-9]*\.[0-9]*


==== Constrained Data Request-Response using GET ====
NANINF    = (-?inf|nan|NaN)


; Simple Request:
STRING    = ([^"&amp;&lt;&gt;]|{XMLESCAPE})*
<font size="2"><pre>
GET /dap/path/data.nc?projection=x,y,temp HTTP/1.1
Host: server.org
</pre></font>


If the server decides that it needs to handle this request asynchronously, it will refuse the request because the client did not indicate that it would accept an asynchronous response.
CHAR      = ([^'&amp;&lt;&gt;]|{XMLESCAPE})


; Response:
URL        = (http|https)[:][/][/][a-zA-Z0-9\-]+
<font size="2"><source lang="xml">
            ([.][a-zA-Z\-]+)+([:][0-9]+)?
400 DAP Asynchronous Response Required
            ([/][a-zA-Z0-9\-._,'\\+%]*)*
X-DAP-Async-Required: true
            ([?].+)?([#].+)?
Content-Type: text/xml;charset=UTF-8
</pre>
</blockquote>
<AsynchronousResponse status="required">
  <expectedDelay seconds="600" />
  <responseLifetime seconds="3600"/>
</AsynchronousResponse>
</source></font>
 
==== Constrained Data Request-Response with DAP-Async-Accept Request Header ====


Request:
===The String/URL Constant Class===
<font size="2"><pre>
<blockquote>
GET /dap/path/data.nc?projection=x,y,temp HTTP/1.1
<pre>
Host: server.org
STRING = "\({SIMPLESTRING}{ESCAPEDQUOTE}?\)*"
X-DAP-Async-Accept: 0
SIMPLESTRING = [^"\\]
ESCAPEDQUOTE = \\"
</pre>
</blockquote>


</pre></font>
===The Opaque Constant Class===
<blockquote>
<pre>
OPAQUE = 0x([0-9A-Fa-f] [0-9A-Fa-f])+
</pre>
</blockquote>


Alternately, this request would produce the same result using only the URL:
There is a semantic constraint that if there is an odd
<font size="2"><pre>
number of hex digits in the opaque constant, a zero hex digit
GET /dap/path/data.nc.dap?accept=0&projection=x,y,temp HTTP/1.1
will be added to the end to ensure that the constant represents
Host: server.org
a set of 8-bit bytes.
</pre></font>


Response:
===The Identifier Class===
<font size="2"><source lang="xml">
<blockquote>
202 Accepted
<pre>
Content-Type: text/xml;charset=UTF-8
ID        = {IDCHAR}+


<AsynchronousResponse status="accepted">
IDCHAR    = ({IDASCII}|{XMLESCAPE}|{UTF8})
  <expectedDelay seconds="600" />
  <responseLifetime seconds="3600"/>
  <link href="http://server.org/async/path/result" />
</AsynchronousResponse>
</source></font>


'''NB''': This example originally included an ''Accept'' header with the value of ''multipart/mixed''. However, that is not a good example. The HTTP/1.1 specification says that when a specific media type is indicated as the only one acceptable, a server must return a 406 response code if it cannot return that media type. The meaning of ''Accept: */*'' is the same as not including the header, so I have removed the header from these examples. We need to be careful about the ways in which we suggest that clients use that header.
XMLESCAPE  = [&amp;][#][0-9]+;
 
</pre>
==== Constrained Data Request-Response with conditional DAP-Async-Accept Request Headers ====
</blockquote>


Request:
===The Atomic Type Class===
<font size="2"><pre>
<blockquote>
GET /dap/path/data.nc?projection=x,y,temp HTTP/1.1
<pre>
Host: server.org
ATOMICTYPE =   Char | Byte
X-DAP-Async-Accept: 60
            | Int8 | UInt8 | Int16 | UInt16
</pre></font>
            | Int32 | UInt32 | Int64 | UInt64
            | Float32 | Float64
            | String | URL
            | Enum
            | Opaque ;
</pre>
</blockquote>
This list should be consistent with the atomic types in the grammar.


Alternately, this request would produce the same result using only the URL:
===The Fully Qualified Name Class===
<font size="2"><pre>
<blockquote>
GET /dap/path/data.nc.dap?acceptAsync=60&projection=x,y,temp HTTP/1.1
<pre>
Host: server.org
FQN      = ([/]{EID})+([.]{EID})*
</pre></font>
EID      = {EIDCHAR}+
EIDCHAR  = ({EIDASCII}|{XMLESCAPE}|{UTF8})
EIDASCII = [0-9a-zA-Z!#$%()*+:;<=>?@\[\]\\^_`|{}~]
</pre>
</blockquote>
This should be consistent with the definition in Section [[#Fully Qualified Names|5.3]].


Response:
==Appendix 2. DAP4 Type Definitions==
<font size="2"><source lang="xml">
412 Precondition Failed
Content-Type: text/xml;charset=UTF-8
<AsynchronousResponse status="rejected">
    <reason code="time"/>
    <description>Acceptable access delay was less than estimated delay.</description>
</AsynchronousResponse>
</source></font>


==== Premature Request For Asynchronous Result ====
The RELAXNG [13] grammar references the following specific types. For each type, the following table gives the lexical format as defined by the patterns previously given or by specific patterns as listed.


Request:
<table border=1 width="50%">
<font size="2"><pre>
<tr><th>RELAXNG Data Type Name<th>Lexical Pattern
GET /async/path/data.nc?projection=x,y,temp HTTP/1.1
<tr><td>dap4_integer<td>{INTEGER}
Host: server.org
<tr><td>dap4_float<td>{FLOAT}
</pre></font>
<tr><td>dap4_char<td>{CHAR}
 
<tr><td>dap4_string<td>{STRING}
Alternately, this request would produce the same result using only the URL:
<tr><td>dap4_opaque<td>{OPAQUE}
<font size="2"><pre>
<tr><td>dap4_id<td>{ID}
GET /async/path/data.nc?projection=x,y,temp HTTP/1.1
<tr><td>dap4_fqn<td>{FQN}
Host: server.org
<tr><td>dap4_uri<td>{URL}
</pre></font>
<tr><td>dap4_dim<td>[1-9][0-9]*
</table>


Note that the above lexical element classes are not disjoint.  The type element "&lt;data type=.../&gt;" should be sufficient to interpret the type within the DMR.


Response:
==Appendix 3. UTF-8==
<font size="2"><source lang="xml">
The UTF-8 specification [14] defines several ways to validate a UTF-8 string of characters.
409 Conflict
Content-Type: text/xml;charset=UTF-8


<AsynchronousResponse status="pending"/>
The full (most correct) validating version of UTF8 character set is as follows.
</source></font>
<blockquote>
<pre>
UTF8 =   ([\xC2-\xDF][\x80-\xBF])
      | (\xE0[\xA0-\xBF][\x80-\xBF])
      | ([\xE1-\xEC][\x80-\xBF][\x80-\xBF])
      | (\xED[\x80-\x9F][\x80-\xBF])
      | ([\xEE-\xEF][\x80-\xBF][\x80-\xBF])
      | (\xF0[\x90-\xBF][\x80-\xBF][\x80-\xBF])
      | ([\xF1-\xF3][\x80-\xBF][\x80-\xBF][\x80-\xBF])
      | (\xF4[\x80-\x8F][\x80-\xBF][\x80-\xBF])
</pre>
</blockquote>
The lines of the above expression cover the UTF-8 characters as follows:
1. non-overlong 2-byte
2.  excluding overlongs
3. straight 3-byte
4. excluding surrogates
5. straight 3-byte
6. planes 1-3
7. planes 4-15
8. plane 16


== Error Responses ==
Note that values from 0 through 127 (ASCII and control characters)
are not included in any of these definitions.
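
As an illustration only, the full validating expression can be transcribed into a byte-oriented regular expression. The sketch below uses Python's <code>re</code> module; the names <code>UTF8_FULL</code> and <code>is_multibyte_char</code> are hypothetical and not part of the specification. It checks a single encoded character, and, like the expression above, it does not accept byte values 0 through 127.

```python
import re

# One alternative per line of the full validating expression, in the same order.
UTF8_FULL = re.compile(
    rb"(?:[\xC2-\xDF][\x80-\xBF]"                        # non-overlong 2-byte
    rb"|\xE0[\xA0-\xBF][\x80-\xBF]"                      # excluding overlongs
    rb"|[\xE1-\xEC][\x80-\xBF][\x80-\xBF]"               # straight 3-byte
    rb"|\xED[\x80-\x9F][\x80-\xBF]"                      # excluding surrogates
    rb"|[\xEE-\xEF][\x80-\xBF][\x80-\xBF]"               # straight 3-byte
    rb"|\xF0[\x90-\xBF][\x80-\xBF][\x80-\xBF]"           # planes 1-3
    rb"|[\xF1-\xF3][\x80-\xBF][\x80-\xBF][\x80-\xBF]"    # planes 4-15
    rb"|\xF4[\x80-\x8F][\x80-\xBF][\x80-\xBF])"          # plane 16
)


def is_multibyte_char(data):
    """True if data is exactly one valid multi-byte UTF-8 character."""
    return UTF8_FULL.fullmatch(data) is not None
```

For example, the two-byte sequence C3 A9 is accepted, while the overlong sequence C0 80 and the surrogate encoding ED A0 80 are rejected.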


DAP4 Error Response documents are XML documents that have a '''&lt;dap4:Error&gt;''' as their root element and that may contain any or all of the following
The above reference also defines some alternative regular expressions. First, there is what is termed the partially relaxed version of UTF8 defined by this regular expression.
inner elements:
<blockquote>
 
<pre>
'''&lt;dap4:code&gt;'''  A numerically valued error code. This code must be associated with a protocol, such as HTTP, via the ''protocol'' attribute. It is not a requirement that the protocol over which the originating request and subsequent Error response were transmitted be the same as the protocol identified by the value of the ''protocol'' attribute.
UTF8 =    ([\xC0-\xD6][\x80-\xBF])
        | ([\xE0-\xEF][\x80-\xBF][\x80-\xBF])
        | ([\xF0-\xF7][\x80-\xBF][\x80-\xBF][\x80-\xBF])
</pre>
</blockquote>
Second, there is what is termed the most-relaxed version of UTF8 defined by this regular expression.
<blockquote>
<pre>
UTF8 = ([\xC0-\xD6]...)|([\xE0-\xEF]...)|([\xF0-\xF7]...)
</pre>
</blockquote>
Any conforming DAP4 implementation MUST use at least the most-relaxed expression for validating UTF-8 character strings, but MAY use either the partially-relaxed or the full validation expression.  


'''&lt;dap4:Message&gt;''' A short informative text message describing the error.
==Appendix 4. LALR(1) Grammar for DMR using Bison Notation==
It is convenient to have a Bison grammar that corresponds to the above RELAX NG grammar. If there is a conflict, then the RELAX NG grammar is considered correct.


'''&lt;dap4:Context&gt;''' Textual information describing the context in which the error occurred: position of a parse error in a constraint expression, for example.
<source lang="bnf">
%start dataset
%%
dataset:
DATASET_
xml_attribute_list
groupbody
_DATASET
;


'''&lt;dap4:OtherInformation&gt;''' Arbitrary additional text information: a Java stack trace, for example.
group:
GROUP_
ATTR_NAME
groupbody
_GROUP
;


=== Error Response Resource Role ===
groupbody:
  %empty
| groupbody dimdef
| groupbody enumdef
| groupbody variable
| groupbody metadata
| groupbody group
;


DAP4 Error Responses are identified by the resource role:
enumdef:
ENUMERATION_
xml_attribute_list
enumconst_list
_ENUMERATION
;


: '''<font size="2"><code><nowiki>http://services.opendap.org/dap4/error</nowiki></code></font>'''
enumconst_list:
  enumconst
| enumconst_list enumconst
;


=== Normative Encoding of the Error Response ===
enumconst:
  ENUMCONST_ ATTR_NAME ATTR_VALUE _ENUMCONST
| ENUMCONST_ ATTR_VALUE ATTR_NAME _ENUMCONST
;


The normative XML representation for the Error Response is defined in Appendix x "Normative XML Encoding of the Error Response". The media type for the normative XML representation is:
dimdef:
DIMENSION_
xml_attribute_list
metadatalist
_DIMENSION
;


: <font size="2"><code>'''application/vnd.opendap.dap4.error.xml'''</code></font>
dimref:
  DIM_ ATTR_NAME _DIM
| DIM_ ATTR_SIZE _DIM
;


=== Examples ===
variable:
; Not Found
  atomicvariable
<font size="2">
| enumvariable
<source lang="xml">
| structurevariable
<?xml version="1.0" encoding="UTF-8"?>
| sequencevariable
<dap4:Error xmlns:dap4="http://xml.opendap.org/ns/DAP/4.0#">
;
    <dap4:code protocol="http">404</dap4:code>
    <dap4:Message>Unable to locate requested resource</dap4:Message>
</dap4:Error>
</source>
</font>
; Parse Error
<font size="2">
<source lang="xml">
<?xml version="1.0" encoding="UTF-8"?>
<dap4:Error xmlns:dap4="http://xml.opendap.org/ns/DAP/4.0#">
    <dap4:code protocol="http">400</dap4:code>
    <dap4:Message>Bad DAP4 Request Syntax</dap4:Message>
    <dap4:Context>
        The constraint expression "?u[3][lat&lt;66]" failed to parse at the sub expression "lat&lt;66".
        Relational syntax expressions are not supported for array subsetting operations.
    </dap4:Context>
</dap4:Error>
</source>
</font>
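
A client might read such a document as sketched below. This is an illustrative sketch rather than a normative procedure: the function name <code>parse_dap4_error</code> and the shape of the returned dictionary are assumptions, while the namespace URI and element names are those used in the examples above.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the example documents.
DAP4_NS = "http://xml.opendap.org/ns/DAP/4.0#"
NS = {"dap4": DAP4_NS}


def parse_dap4_error(xml_data):
    """Extract the optional inner elements of a dap4:Error document."""
    root = ET.fromstring(xml_data)
    if root.tag != f"{{{DAP4_NS}}}Error":
        raise ValueError("root element is not dap4:Error")

    def child_text(name):
        # Every inner element is optional, so a missing one yields None.
        el = root.find(f"dap4:{name}", NS)
        return el.text.strip() if el is not None and el.text else None

    code_el = root.find("dap4:code", NS)
    return {
        "code": int(code_el.text) if code_el is not None else None,
        "protocol": code_el.get("protocol") if code_el is not None else None,
        "message": child_text("Message"),
        "context": child_text("Context"),
        "other": child_text("OtherInformation"),
    }
```

Because every inner element may be absent, callers should be prepared for <code>None</code> in any field.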


==References==
atomicvariable:
atomictype_
ATTR_NAME
varbody
_atomictype
;


<ol>
enumvariable:
<li><div id="Ref-1"></div>
ENUM_
Caron, J.,
xml_attribute_list
<i>Unidata's Common Data Model Version 4</i>, 2012
varbody
(http://www.unidata.ucar.edu/software/netcdf-java/CDM/).
_ENUM
;


<li><div id="Ref-2"></div>
atomictype_:
Folk, M. and E. Pourmal,
  CHAR
<i>HDF5 Data Model, File Format and Library &mdash; HDF5 1.6</i>,
| BYTE
Category: Recommended Standard January 2007
| INT8
NASA Earth Science Data Systems Recommended Standard ESDS-RFC-007, 2007
| UINT8
(http://earthdata.nasa.gov/sites/default/files/esdswg/spg/rfc/ese-rfc-007/ESDS-RFC-007v1.pdf).
| INT16
| UINT16
| INT32
| UINT32
| INT64
| UINT64
| FLOAT32
| FLOAT64
| STRING
| URL
| OPAQUE
;


<li><div id="Ref-3"></div>
_atomictype:
Gallagher J., N. Potter, T. Sgouros, S. Hankin, and G. Flierl,
  _CHAR
<i>The Data Access Protocol&mdash;DAP 2.0</i>,
| _BYTE
NASA Earth Science Data Systems Recommended Standard ESE-RFC-004.1.2
| _INT8
(http://opendap.org/pdf/ESE-RFC-004v1.2.pdf).
| _UINT8
| _INT16
| _UINT16
| _INT32
| _UINT32
| _INT64
| _UINT64
| _FLOAT32
| _FLOAT64
| _STRING
| _URL
| _OPAQUE
| _ENUM
;
 
varbody:
  %empty
| varbody dimref
| varbody mapref
| varbody metadata
;


<li><div id="Ref-4"></div>
mapref:
Gosling, J., B. Joy, G. Steele, G. Bracha, A Buckley,
MAP_
<i>The Java™ Language Specification &mdash; 7th Edition</i>
ATTR_NAME
Oracle Corporation, 2012,
metadatalist
(http://docs.oracle.com/javase/specs/jls/se7/html/).
_MAP
;


<li><div id="Ref-5"></div>
structurevariable:
Hartnett, E.,
STRUCTURE_
<i>netCDF-4/HDF5 File Format</i>,
ATTR_NAME
NASA Earth Science Data Systems Recommended Standard ESDS-RFC-022, 2011
structbody
(http://earthdata.nasa.gov/sites/default/files/field/document/ESDS-RFC-022v1.pdf).
_STRUCTURE
;


<li><div id="Ref-6"></div>
structbody:
IEEE, <i>IEEE Standard for Binary Floating-Point Arithmetic, ANSI/IEEE Std 754-1985</i>, Digital Object Identifier: 10.1109/IEEESTD.1985.82928, 1985.
  %empty
| structbody variable
| structbody dimref
| structbody mapref
| structbody metadata
;


<li><div id="Ref-7"></div>
sequencevariable:
The Internet Society, <i>IETF RFC 2119:
SEQUENCE_
Key words for use in RFCs to Indicate Requirement Levels
ATTR_NAME
</i>, 1997
sequencebody
(http://tools.ietf.org/html/rfc2119).
_SEQUENCE
;


<li><div id="Ref-8"></div>
sequencebody:
The Internet Society, <i>IETF RFC 2396:
  %empty
Uniform Resource Identifiers (URI): Generic Syntax
| sequencebody dimref
</i>, 1998
| sequencebody variable
(http://tools.ietf.org/html/rfc2396).
| sequencebody mapref
| sequencebody metadata
;


<li><div id="Ref-9"></div>
metadatalist:
The Internet Society, <i>IETF RFC 2616:
  %empty
Hypertext Transfer Protocol &mdash; HTTP/1.1
| metadatalist metadata
</i>, 1999
;
(http://tools.ietf.org/html/rfc2616).
 
metadata:
  attribute
;


<li><div id="Ref-10"></div>
attribute:
The Internet Society, <i>IETF RFC 4506: XDR: External Data Representation Standard</i>, 2006
  atomicattribute
(http://tools.ietf.org/html/rfc4506).
| containerattribute
| otherxml
;


<li><div id="Ref-11"></div>
ISO/IEC,
<i>Information technology &mdash; Portable Operating System Interface (POSIX) &mdash; Part 2: Shell and Utilities</i>,
ISO/IEC 9945-2,1993
(http://www.iso.org/iso/catalogue_detail.htm?csnumber=17841).


<li><div id="Ref-12"></div>
atomicattribute:
The Open Geospatial Consortium Inc.,
  ATTRIBUTE_
<i>Abstract Specifications</i>,
  xml_attribute_list
(http://www.opengeospatial.org/standards/as).
  namespace_list
  valuelist
  _ATTRIBUTE
|
  ATTRIBUTE_
  xml_attribute_list
  namespace_list
  _ATTRIBUTE
;


<li><div id="Ref-13"></div>
namespace_list:
The Organization for the Advancement of Structured Information Standards,
  %empty
<i>RELAX NG Specification</i>,
| namespace_list namespace
Committee Specification: 2001,
;
J. Clark, M. Makoto (eds.)
(http://relaxng.org/spec-20011203.html).


<li><div id="Ref-14"></div>
namespace:
The Unicode Consortium. <i>The Unicode Standard, Version 6.2.0</i>,  ISBN 978-1-936213-07-8, 2012.
NAMESPACE_
ATTR_HREF
_NAMESPACE
;


<li><div id="Ref-15"></div>
containerattribute:
Unidata,
  ATTRIBUTE_
<i>CF Metadata</i>,
  xml_attribute_list
(http://www.cfconventions.org/).
  namespace_list
  attributelist
  _ATTRIBUTE
;


<li><div id="Ref-16"></div>
attributelist:
W3C, <i>Extensible Markup Language (XML) 1.0</i>,
  attribute
T. Bray, J. Paoli, C. M. Sperberg-McQueen, E. Maler, F. Yergeau (eds.),
| attributelist attribute
Fifth Edition. 2008
;
(http://www.w3.org/TR/2008/REC-xml-20081126/).
 
valuelist:
  value
| valuelist value
;
 
value:
  VALUE_ TEXT _VALUE
| VALUE_ ATTR_VALUE _VALUE
;
 
otherxml:
OTHERXML_
xml_attribute_list
xml_body
_OTHERXML
;
 
xml_body:
  element_or_text
| xml_body element_or_text
;


<li><div id="Ref-17"></div>
element_or_text:
World Meteorological Organization,
  xml_open
<i>FM 92 GRIB</i>,
  xml_attribute_list
edition 2, version 2, 2003
  xml_body
(http://www.wmo.int/pages/prog/www/DPS/FM92-GRIB2-11-2003.pdf).
  xml_close
</ol>
| TEXT
;


=Appendices=
xml_attribute_list:
  %empty
| xml_attribute_list xml_attribute
;


==FQN Syntax==
xml_attribute:
  ATTR_BASE
| ATTR_BASETYPE
| ATTR_DAPVERSION
| ATTR_DMRVERSION
| ATTR_ENUM
| ATTR_HREF
| ATTR_NAME
| ATTR_NAMESPACE
| ATTR_NS
| ATTR_SIZE
| ATTR_TYPE
| ATTR_VALUE
;


An FQN has two parts. First, there is the path, which refers to Group traversal, and second is the suffix, which refers to the traversal of Structures. An FQN is the concatenation of the path with the suffix and separated by the '/' Character. The suffix may not exist if O is a group or is not a Structure typed variable.
xml_open:
  DATASET_
| GROUP_
| ENUMERATION_
| ENUMCONST_
| NAMESPACE_
| DIMENSION_
| DIM_
| ENUM_
| MAP_
| STRUCTURE_
| SEQUENCE_
| VALUE_
| ATTRIBUTE_
| OTHERXML_
| CHAR_
| BYTE_
| INT8_
| UINT8_
| INT16_
| UINT16_
| INT32_
| UINT32_
| INT64_
| UINT64_
| FLOAT32_
| FLOAT64_
| STRING_
| URL_
| OPAQUE_
;


Fully qualified names conform to the following syntax.
xml_close:
  _DATASET
| _GROUP
| _ENUMERATION
| _ENUMCONST
| _NAMESPACE
| _DIMENSION
| _DIM
| _ENUM
| _MAP
| _STRUCTURE
| _SEQUENCE
| _VALUE
| _ATTRIBUTE
| _OTHERXML
| _CHAR
| _BYTE
| _INT8
| _UINT8
| _INT16
| _UINT16
| _INT32
| _UINT32
| _INT64
| _UINT64
| _FLOAT32
| _FLOAT64
| _STRING
| _URL
| _OPAQUE
;
</source>


<blockquote>
===Lexical Tokens for Bison Grammar===
<pre>
The above Bison grammar assumes a corresponding lexer
FQN:  grouppath
that will return a set of token types (listed below).
    | grouppath '/' name
The token with a trailing underscore represents an opening XML element
    | grouppath '/' structurepath
and a token with a leading underscore represents a closing XML element.
    | grouppath '/' structurepath '.' name
So, for example, token ''DATASET_'' is ''&lt;Dataset&gt;''
and token ''_DATASET'' is ''&lt;/Dataset&gt;''.


grouppath: /*empty*/ | grouppath '/' groupname
<source lang="bnf">
/* XML Element Names */
%token DATASET_ _DATASET
%token GROUP_ _GROUP
%token ENUMERATION_ _ENUMERATION
%token ENUMCONST_ _ENUMCONST
%token NAMESPACE_ _NAMESPACE
%token DIMENSION_ _DIMENSION
%token DIM_ _DIM
%token MAP_ _MAP
%token STRUCTURE_ _STRUCTURE
%token SEQUENCE_ _SEQUENCE
%token VALUE_ _VALUE
%token ATTRIBUTE_ _ATTRIBUTE
%token OTHERXML_ _OTHERXML
%token ERROR_ _ERROR
%token MESSAGE_ _MESSAGE
%token CONTEXT_ _CONTEXT
%token OTHERINFO_ _OTHERINFO


structurepath: /*empty*/ | structurepath '.' structname
/* XML Element Names for Atomic Types*/
</pre>
%token CHAR_ _CHAR
</blockquote>
%token BYTE_ _BYTE
%token INT8_ _INT8
%token UINT8_ _UINT8
%token INT16_ _INT16
%token UINT16_ _UINT16
%token INT32_ _INT32
%token UINT32_ _UINT32
%token INT64_ _INT64
%token UINT64_ _UINT64
%token FLOAT32_ _FLOAT32
%token FLOAT64_ _FLOAT64
%token STRING_ _STRING
%token URL_ _URL
%token OPAQUE_ _OPAQUE
%token ENUM_ _ENUM


To write a path for an object O, follow these steps.
/* XML Attribute Names */
<ol>
%token ATTR_BASE ATTR_BASETYPE ATTR_DAPVERSION ATTR_DMRVERSION
<li><!--0-->
%token ATTR_ENUM ATTR_HREF ATTR_NAME ATTR_NAMESPACE
Locate the closest enclosing group G for O.  If O is a group, then O and G will be the same.
%token ATTR_NS ATTR_SIZE ATTR_TYPE ATTR_VALUE
%token ATTR_HTTPCODE


<li> Create the scope prefix for O by traversing a path through the Group tree,
/* Arbitrary XML Text */
starting with the Dataset and continuing down to and including G. Concatenate the group names on that path, separating them with '/'. The name for Dataset is ignored, hence the FQN will begin with "/".
%token TEXT
</ol>
</source>


If O is not a Structure typed variable, then we are done and the FQN for O is just the path. Otherwise, the suffix must be computed as follows.
==Appendix 5. LALR(1) Grammar for Constraints using Bison Notation==
<source lang="bnf">
%start constraint
%%
constraint:
dimredeflist
clauselist
;


<ol>
dimredeflist:
<li> Traverse the nested Structure declarations from G to O, including O, but not including G in the path. Traversal here means following the enclosing Structure typed variables until O is reached.
          %empty
        | dimredeflist ';' dimredef
        ;


<li> Concatenate the names on that suffix path, separating them with '.', to create a suffix.
clauselist:
          clause
        | clauselist ';' clause
        ;


<li> Create the final FQN as the concatenation of the path, the character '/', and the suffix.
clause:
</ol>
          projection
| selection
        ;


==DAP4 Lexical Elements==
projection:
This section describes the lexical elements that occur in the DAP4 DMR.
segmenttree
        ;


Within the RELAXNG DAP4 grammar
segmenttree:
(Section [[#DAP4 DMR Syntax as a RELAX NG Schema|13]])
          segment
there are markers for occurrences of primitive types such as integers, floats, or strings (ignoring case). The markers typically look like this when defining an attribute that can occur in the DAP4 DMR.
        | segmenttree '.' segment
        | segmenttree '.' '{' segmentforest '}'
        | segmenttree '{' segmentforest '}'
        ;


<blockquote>
segmentforest:
<pre>
  segmenttree
&lt;attribute name="Principal_Investigator"&gt;
| segmentforest ',' segmenttree
&lt;data type="dap4_string"/&gt;
;
&lt;/attribute&gt;
</pre>
</blockquote>
The "&lt;data type="dap4_string"/&gt;" specifies the lexical class for the values that this attribute can have. In this case, the "Principal_Investigator" attribute is defined to have a DAP4 string value. Similar notation is used for values occurring as text within an xml element.


The lexical specification later in this section defines the legal lexical structure for such items. Specifically, it defines the format of the following lexical items.
segment:
<ol>
          NAME
<li> Constants, namely: string, float, integer, character, and opaque.
        | NAME slicelist
        ;


<li> Identifiers
slicelist:
          slice
        | slicelist slice
        ;


<li> Fully qualified names (also referred to as FQNs)
slice:
(Section [[#Fully Qualified Names|5.3]]).
          '[' ']'
</ol>
        | '[' subsetlist ']'
The specification is written using the extended POSIX regular expression notation [11] with some additions.
;
<ol>
<li> Names are assigned to regular expressions using the notation "name = {regular expression}"


<li> Named expressions can be used in subsequent regular expressions by using the notation "{name}". Such occurrences are equivalent to textually substituting the expression associated with name for the "{name}" occurrence.
subsetlist:
</ol>
  subset
Notes:
| subsetlist ',' subset
<ol>
;
<li> The definition of {UTF8} is deferred to the next section.
 
subset:
<li> Comments are indicated using the "//" notation. Standard XML escape formats (&amp;#xDDD; or &amp;{name};) are assumed to be used as needed.
          index
</ol>
        | index ':' index
 
         |  index ':' index ':' index  
===Basic character set definitions===
         | index ':'
<blockquote>
         | index ':' index ':'
<pre>
         ;
CONTROLS  = [\x00-\x1F] // ASCII control characters
 
WHITESPACE = [ \r\n\t\f]+
 
HEXCHAR    = [0-9a-fA-F]
 
// ASCII printable characters
 
ASCII = [0-9a-zA-Z !"#$%&amp;'()*+,-./:;&lt;=&gt;?@[\\\]\\^_`|{}~]
</pre>
</blockquote>
 
===Ascii characters that may appear unescaped in Identifiers===
 
This is assumed to be basically all ASCII printable characters except these characters: '.', '/', '"', '&#39;', and '&amp;'. Occurrences of these characters are assumed to be representable using the standard xml &amp;{name}; notation (e.g. &amp;amp;). In this expression, backslash is interpreted as an escape character.
 
<blockquote>
<pre>
IDASCII=[0-9a-zA-Z!#$%()*+:;<=>?@\[\]\\^_`|{}~]
</pre>
</blockquote>
 
===The Numeric Constant Classes: integer and float===
<blockquote>
<pre>
INTEGER    = {INT}|{UINT}|{HEXINT}
 
INT        = [+-][0-9]+{INTTYPE}?
 
UINT      = [0-9]+{INTTYPE}?
 
HEXINT    = {HEXSTRING}{INTTYPE}?
 
INTTYPE    = ([BbSsLl]|"ll"|"LL")
 
HEXSTRING = (0[xX]{HEXCHAR}+)
 
FLOAT      = ({MANTISSA}{EXPONENT}?)|{NANINF}
 
EXPONENT  = ([eE][+-]?[0-9]+)
 
MANTISSA  = [+-]?[0-9]*\.[0-9]*
 
NANINF    = (-?inf|nan|NaN)
 
STRING    = ([^"&amp;&lt;&gt;]|{XMLESCAPE})*
 
CHAR      = ([^'&amp;&lt;&gt;]|{XMLESCAPE})
 
URL        = (http|https)[:][/][/][a-zA-Z0-9\-]+
            ([.][a-zA-Z\-]+)+([:][0-9]+)?
            ([/][a-zA-Z0-9\-._,'\\+%]*)*
            ([?].+)?([#].+)?
</pre>
</blockquote>
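
To make the numeric classes concrete, the sketch below transcribes the INTEGER and FLOAT patterns into Python's <code>re</code> syntax. It is an illustration under stated assumptions, not a reference lexer: HEXCHAR is taken here to mean the hexadecimal digits 0-9, a-f and A-F, the quoted alternatives in INTTYPE are written without quotes, and the helper name <code>classify</code> is hypothetical.

```python
import re

INTTYPE = r"(?:[BbSsLl]|ll|LL)"
HEXCHAR = r"[0-9a-fA-F]"  # assumption: hexadecimal digits only
INT = rf"[+-][0-9]+{INTTYPE}?"
UINT = rf"[0-9]+{INTTYPE}?"
HEXINT = rf"0[xX]{HEXCHAR}+{INTTYPE}?"
INTEGER = re.compile(rf"(?:{INT}|{UINT}|{HEXINT})")

EXPONENT = r"(?:[eE][+-]?[0-9]+)"
MANTISSA = r"[+-]?[0-9]*\.[0-9]*"
NANINF = r"(?:-?inf|nan|NaN)"
FLOAT = re.compile(rf"(?:{MANTISSA}{EXPONENT}?|{NANINF})")


def classify(text):
    """Return 'integer', 'float', or None for a candidate constant."""
    if INTEGER.fullmatch(text):
        return "integer"
    if FLOAT.fullmatch(text):
        return "float"
    return None
```

Note that MANTISSA requires a decimal point, so under these patterns a constant such as <code>1e5</code> is neither an integer nor a float, while <code>1.e5</code> is a float.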
 
===The String/URL Constant Class===
<blockquote>
<pre>
STRING = "\({SIMPLESTRING}{ESCAPEDQUOTE}?\)*"
SIMPLESTRING = [^"\\]
ESCAPEDQUOTE = \\"
</pre>
</blockquote>
 
===The Opaque Constant Class===
<blockquote>
<pre>
OPAQUE = 0x([0-9A-Fa-f] [0-9A-Fa-f])+
</pre>
</blockquote>
 
There is a semantic constraint that if there is an odd
number of hex digits in the opaque constant, a zero hex digit
will be added to the end to ensure that the constant represents
a set of 8-bit bytes.
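
The padding rule can be sketched as follows. This is only an illustration: the function name <code>decode_opaque</code> is hypothetical, and the sketch accepts any non-empty run of hex digits after the <code>0x</code> prefix, which is an assumption made so that the odd-length case described above can occur.

```python
import re

# Assumption: "0x" followed by one or more hex digits, with no embedded spaces.
_OPAQUE = re.compile(r"0x([0-9A-Fa-f]+)")


def decode_opaque(text):
    """Convert an opaque constant to bytes, zero-padding an odd digit count."""
    match = _OPAQUE.fullmatch(text)
    if match is None:
        raise ValueError("not an opaque constant: %r" % text)
    digits = match.group(1)
    if len(digits) % 2:
        digits += "0"  # the zero digit is appended at the end, per the rule above
    return bytes.fromhex(digits)
```

Under this reading, <code>0xabc</code> denotes the two bytes AB and C0.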
 
===The Identifier Class===
<blockquote>
<pre>
ID        = {IDCHAR}+
 
IDCHAR    = ({IDASCII}|{XMLESCAPE}|{UTF8})
 
XMLESCAPE  = [&amp;][#][0-9]+;
</pre>
</blockquote>
 
===The Atomic Type Class===
<blockquote>
<pre>
ATOMICTYPE =  Char | Byte
            | Int8 | UInt8 | Int16 | UInt16
            | Int32 | UInt32 | Int64 | UInt64
            | Float32 | Float64
            | String | URL
            | Enum
            | Opaque ;
</pre>
</blockquote>
This list should be consistent with the atomic types in the grammar.
 
===The Fully Qualified Name Class===
<blockquote>
<pre>
FQN      = ([/]{EID})+([.]{EID})*
EID      = {EIDCHAR}+
EIDCHAR  =  ({EIDASCII}|{XMLESCAPE}|{UTF8})
EIDASCII = [0-9a-zA-Z!#$%()*+:;<=>?@\[\]\\^_`|{}~]
</pre>
</blockquote>
This should be consistent with the definition in Section [[#Fully Qualified Names|5.3]].
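As an illustration, the FQN pattern can be used to split a name into its group path and its field path. The sketch below is in Python and makes two assumptions: the input is an already-decoded string, so the UTF8 class is approximated by "any non-ASCII character", and the backslash escaping rules of Section 5.3 are not handled. The function name <code>split_fqn</code> is hypothetical.

```python
import re

EIDASCII = r"[0-9a-zA-Z!#$%()*+:;<=>?@\[\]\\^_`|{}~]"
XMLESCAPE = r"&#[0-9]+;"
# Assumption: on a decoded string, UTF8 is approximated by any non-ASCII character.
EIDCHAR = "(?:" + EIDASCII + "|" + XMLESCAPE + r"|[^\x00-\x7F])"
EID = EIDCHAR + "+"
FQN = re.compile("(?:/" + EID + r")+(?:\." + EID + ")*")


def split_fqn(fqn: str):
    """Return (path_segments, field_segments) for a string matching FQN."""
    if FQN.fullmatch(fqn) is None:
        raise ValueError(f"not a fully qualified name: {fqn!r}")
    path, *fields = fqn.split(".")   # '.' never occurs inside an EID
    segments = path.split("/")[1:]   # drop the empty string before the leading '/'
    return segments, fields
```

Because neither '.' nor '/' is a member of EIDASCII, plain string splitting is sufficient once the whole name has been validated.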
 
===DAP4 Type Definitions===
 
The RELAXNG [13] grammar references the following specific types. For each type, the following table gives the lexical format as defined by the patterns previously given or by specific patterns as listed.
 
<table border=1 width="50%">
<tr><th>RELAXNG Data Type Name<th>Lexical Pattern
<tr><td>dap4_integer<td>{INTEGER}
<tr><td>dap4_float<td>{FLOAT}
<tr><td>dap4_char<td>{CHAR}
<tr><td>dap4_string<td>{STRING}
<tr><td>dap4_opaque<td>{OPAQUE}
<tr><td>dap4_vdim<td>[*]
<tr><td>dap4_id<td>{ID}
<tr><td>dap4_fqn<td>{FQN}
<tr><td>dap4_uri<td>{URL}
<tr><td>dap4_dim<td>[0-9]+
</table>
 
Note that the above lexical element classes are not disjoint.  The type element "&lt;datatype=.../&gt;" should be sufficient to interpret the type within the DMR.
 
===UTF-8===
 
The UTF-8 specification [14] defines several ways to validate a UTF-8 string of characters.
 
The full (most correct) validating version of the UTF8 character set is as follows.
 
<blockquote>
<pre>
UTF8 =  ([\xC2-\xDF][\x80-\xBF])
      | (\xE0[\xA0-\xBF][\x80-\xBF])
      | ([\xE1-\xEC][\x80-\xBF][\x80-\xBF])
      | (\xED[\x80-\x9F][\x80-\xBF])
      | ([\xEE-\xEF][\x80-\xBF][\x80-\xBF])
      | (\xF0[\x90-\xBF][\x80-\xBF][\x80-\xBF])
      | ([\xF1-\xF3][\x80-\xBF][\x80-\xBF][\x80-\xBF])
      | (\xF4[\x80-\x8F][\x80-\xBF][\x80-\xBF])
</pre>
</blockquote>
The lines of the above expression cover the UTF-8 characters as follows:
 
<ol>
<li> non-overlong 2-byte
<li>  excluding overlongs
<li> straight 3-byte
<li> excluding surrogates
<li> straight 3-byte
<li> planes 1-3
<li> planes 4-15
<li> plane 16
</ol>
Note that values from 0 through 127 (ASCII and control characters)
are not included in any of these definitions.
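The full validating expression can be applied to raw bytes more or less verbatim. The following Python sketch is one possible rendering, not a reference implementation; it adds an explicit ASCII alternative (an assumption, since values 0 through 127 are outside the UTF8 definition itself) so that complete strings can be checked. The name <code>is_valid_utf8</code> is illustrative.

```python
import re

# The eight alternatives of the full validating UTF8 expression, in order.
UTF8_FULL = (
    rb"[\xC2-\xDF][\x80-\xBF]"
    rb"|\xE0[\xA0-\xBF][\x80-\xBF]"
    rb"|[\xE1-\xEC][\x80-\xBF][\x80-\xBF]"
    rb"|\xED[\x80-\x9F][\x80-\xBF]"
    rb"|[\xEE-\xEF][\x80-\xBF][\x80-\xBF]"
    rb"|\xF0[\x90-\xBF][\x80-\xBF][\x80-\xBF]"
    rb"|[\xF1-\xF3][\x80-\xBF][\x80-\xBF][\x80-\xBF]"
    rb"|\xF4[\x80-\x8F][\x80-\xBF][\x80-\xBF]"
)
# Assumption: single bytes 0x00-0x7F are accepted alongside the multi-byte forms.
VALID = re.compile(rb"(?:[\x00-\x7F]|" + UTF8_FULL + rb")*")


def is_valid_utf8(data: bytes) -> bool:
    """True if every byte sequence in data is ASCII or matches the full UTF8 expression."""
    return VALID.fullmatch(data) is not None
```

An overlong encoding such as the bytes C0 AF, or an encoded surrogate such as ED A0 80, is rejected because no alternative admits it.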
 
The above reference also defines some alternative regular expressions.
 
First, there is what is termed the partially relaxed version of UTF8 defined by this regular expression.
 
<blockquote>
<pre>
UTF8 =    ([\xC0-\xD6][\x80-\xBF])
        | ([\xE0-\xEF][\x80-\xBF][\x80-\xBF])
         | ([\xF0-\xF7][\x80-\xBF][\x80-\xBF][\x80-\xBF])
</pre>
</blockquote>
Second, there is what is termed the most-relaxed version of UTF8 defined by this regular expression.
 
<blockquote>
<pre>
UTF8 = ([\xC0-\xD6]...)|([\xE0-\xEF]...)|([\xF0-\xF7]...)
</pre>
</blockquote>
Any conforming DAP4 implementation MUST use at least the most-relaxed expression for validating UTF-8 character strings, but MAY use either the partially-relaxed or the full validation expression.
 
==DAP4 Error Response Format==
The Error Response is defined to be an XML document
with media type <i>application/vnd.org.opendap.dap4.error.xml</i>.
The specific format of the error response is defined in this
document:
http://docs.opendap.org/index.php/DAP4_Web_Services_v3#DAP4_Error_Response
 
==DAP4 DMR Syntax as a RELAX NG Schema==
The RELAX NG grammar for the DMR currently resides at this URL.
https://scm.opendap.org/svn/trunk/dap4/dap4.rng
 
== DAP4 Dataset Services Response: Examples and XML Schema ==
 
=== Example: A fully featured and annotated DSR document ===
 
This example contains all of the optional components for a hypothetical DAP4 server that also supports DAP2 requests, server-side functions, and asynchronous transactions.
 
<font size="2">
<source lang="xml" >
 
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<?xml-stylesheet type="text/xsl" href="/opendap/xsl/serviceDescription.xsl"?>
<DatasetServices xmlns="http://xml.opendap.org/ns/DAP/4.0/dataset-services#"
                xml:base="http://test.opendap.org:8090/opendap/hyrax/data/fnoc1.nc">
 
    <Description>Our friend fnoc1.nc</Description>
 
    <!-- ##################################################################################### -->
    <!--      DAP versions this server supports                                              -->
    <!-- ..................................................................................... -->
    <DapVersion>4.0</DapVersion>
    <DapVersion>3.2</DapVersion>
    <DapVersion>2.0</DapVersion>
   
    <!-- ##################################################################################### -->
    <!--  The software version of the server that is providing the DAP4 Dataset Services      -->
    <!--  response.                                                                          -->
    <!-- The ServerSoftwareVersion element may contain text, or any XML element content as    -->
    <!-- long as the XML is not in the document namespace                                      -->
    <!-- ..................................................................................... -->
    <ServerSoftwareVersion>Hyrax-2.7.9</ServerSoftwareVersion>
   
   
    <!-- ##################################################################################### -->
    <!--      Required DAP4 Services                                                          -->
    <!-- ..................................................................................... -->
 
    <Service title="DAP4 Dataset Services" role="http://services.opendap.org/dap4/dataset-services">
        <Description href="http://docs.opendap.org/index.php/DAP4_Web_Services#DAP4:_Dataset_Services_Description_Service">An index of the Services available for this data resource.</Description>
 
         <link description="Normative form of the DSR"
              type="application/vnd.opendap.org.dataset-services+xml"
              href="http://test.opendap.org:8090/opendap/hyrax/data/fnoc1.nc">
              <alt type="text/html"/>
              <alt type="text/xml"/>
        </link>
       
        <link description="HTML representation of the DSR" 
              type="text/html"
              href="http://test.opendap.org:8090/opendap/hyrax/data/fnoc1.nc.html"/>
             
        <link description="Normative DSR with generic Content-Type"
              type="text/xml"
              href="http://test.opendap.org:8090/opendap/hyrax/data/fnoc1.nc.xml"/>
             
    </Service>
 
    <Service title="DAP4 Dataset Metadata" role="http://services.opendap.org/dap4/dataset-metadata">
        <Description href="http://docs.opendap.org/index.php/DAP4_Web_Services#DAP4:_Dataset_Service_-_The_metadata">The DAP4 metadata content for this data resource.</Description>
 
         <link description="Normative form of the DMR"
              type="application/vnd.org.opendap.dap4.dataset-metadata+xml"
              href="http://test.opendap.org:8090/opendap/hyrax/data/fnoc1.nc.dmr">
              <alt type="text/html"/>
              <alt type="text/xml"/>
              <alt type="application/rdf+xml"/>
        </link>
       
        <link description="Data Request Form"
              type="text/html"
              href="http://test.opendap.org:8090/opendap/hyrax/data/fnoc1.nc.dmr.html"/>
             
        <link description="Normative DMR with generic Content-Type"
              type="text/xml"
              href="http://test.opendap.org:8090/opendap/hyrax/data/fnoc1.nc.dmr.xml"/>
             
        <link description="RDF representation of DMR"
              type="application/rdf+xml"
              href="http://test.opendap.org:8090/opendap/hyrax/data/fnoc1.nc.dmr.rdf"/>
    </Service>
 
    <Service title="DAP4 Data" role="http://services.opendap.org/dap4/data">
        <Description href="http://docs.opendap.org/index.php/DAP4_Web_Services#DAP4:_Data_Service">DAP4 Data object for this data resource.</Description>
        <link description="The normative form of the DAP4 Data Response"
              type="application/vnd.org.opendap.dap4.data" 
              href="http://test.opendap.org:8090/opendap/hyrax/data/fnoc1.nc.dap" >
              <alt type="text/plain"/>
              <alt type="text/xml"/>
              <alt type="application/x-netcdf"/>
        </link>
       
        <link description="A comma separated values (CSV) representation of the DAP4 Data Response object."
              type="text/plain"  
              href="http://test.opendap.org:8090/opendap/hyrax/data/fnoc1.nc.dap.ascii" />
        <link description="XML representation of the DAP4 Data Response object."
              type="text/xml" 
              href="http://test.opendap.org:8090/opendap/hyrax/data/fnoc1.nc.dap.xml" />
        <link description="NetCDF-3 representation of the DAP4 Data Response object."
              type="application/x-netcdf" 
              href="http://test.opendap.org:8090/opendap/hyrax/data/fnoc1.nc.dap.nc" />
    </Service>
 
    <!-- ##################################################################################### -->
    <!--  Optional DAP4 Related Services                                                      -->
    <!-- ..................................................................................... -->
 
    <Service title="ISO-19115 Metadata Service" role="http://services.opendap.org/dap4/iso-19115">
        <link description="Dataset metadata as ISO-19115"
              type="text/xml"
              href="http://test.opendap.org:8090/opendap/hyrax/data/fnoc1.nc.dmr.iso">
              <alt type="text/html"/>
        </link>
        <link description="ISO-19115 conformance score"
              type="text/html"
              href="http://test.opendap.org:8090/opendap/hyrax/data/fnoc1.nc.dmr.rubric"/>
        <Description href="http://docs.opendap.org/index.php/DAP4_Web_Services#DAP4:_ISO_Conformance_Score_Service">ISO-19115 conformance score for the DMR.</Description>
    </Service>
 
    <Service title="File Access" role="http://services.opendap.org/dap4/file">
        <link type="text/xml ## This value depends on the file being accessed." 
              href="http://test.opendap.org:8090/opendap/hyrax/data/fnoc1.nc.file"/>
        <Description href="http://docs.opendap.org/index.php/DAP4_Web_Services#DAP4:_Native_File_Access_Service">Access to dataset file.</Description>
    </Service>
 
 
 
    <!-- ##################################################################################### -->
    <!--      DAP2 Services                                                                  -->
    <!-- ..................................................................................... -->
    <Service title="DAP2 Data" role="http://services.opendap.org/dap2/dods">
         <link type="application/octet-stream"
              href="http://test.opendap.org:8090/opendap/hyrax/data/fnoc1.nc.dods"/>
        <Description href="http://docs.opendap.org/index.php/DAP4_Web_Services#DAP2:_Data_Service">DAP2 Data Object.</Description>
    </Service>
 
    <Service title="DDX" role="http://services.opendap.org/dap2/ddx">
        <link type="text/xml"
              href="http://test.opendap.org:8090/opendap/hyrax/data/fnoc1.nc.ddx"/>
        <Description href="http://docs.opendap.org/index.php/DAP4_Web_Services#DAP2:_DDX_Service">OPeNDAP Data Description and Attribute XML Document.</Description>
    </Service>
 
    <Service title="DDS" role="http://services.opendap.org/dap2/dds">
        <link type="text/plain"
              href="http://test.opendap.org:8090/opendap/hyrax/data/fnoc1.nc.dds"/>
        <Description href="http://docs.opendap.org/index.php/DAP4_Web_Services#DAP2:_DDS_Service">OPeNDAP Data Description Structure.</Description>
    </Service>
 
    <Service title="DAS" role="http://services.opendap.org/dap2/das" >
        <link type="text/plain"
              href="http://test.opendap.org:8090/opendap/hyrax/data/fnoc1.nc.das"/>
        <Description href="http://docs.opendap.org/index.php/DAP4_Web_Services#DAP2:_DAS_Service">OPeNDAP Dataset Attribute Structure.</Description>
    </Service>


    <Service title="INFO" role="http://services.opendap.org/dap2/info">
         <link type="text/html"
              href="http://test.opendap.org:8090/opendap/hyrax/data/fnoc1.nc.info"/>
         <Description href="http://docs.opendap.org/index.php/DAP4_Web_Services#DAP2:_Info_Service">OPeNDAP Dataset Information Page.</Description>
    </Service>

    <Service title="Server Version" role="http://services.opendap.org/dap4/version">
         <link type="text/xml"
              href="http://test.opendap.org:8090/opendap/hyrax/data/fnoc1.nc.ver"/>
         <Description href="http://docs.opendap.org/index.php/DAP4_Web_Services#DAP4:_Server_Version_Service">An XML document containing information about the software version of the server.</Description>
    </Service>

    <!-- ##################################################################################### -->
    <!--      Server Side Functions                                                          -->
    <!-- ..................................................................................... -->

    <function name="geogrid" role="http://services.opendap.org/dap4/server-side-function/geogrid">
         <Description  href="http://docs.opendap.org/index.php/Server_Side_Processing_Functions#geogrid">Allows a DAP Grid variable to be sub-sampled using georeferenced values.</Description>
    </function>

    <function name="grid" role="http://services.opendap.org/dap4/server-side-function/grid">
        <Description  href="http://docs.opendap.org/index.php/Server_Side_Processing_Functions#grid">Allows a DAP Grid variable to be sub-sampled using the values of the coordinate axes.</Description>
    </function>

    <function name="linear_scale" role="http://services.opendap.org/dap4/server-side-function/linear-scale">
        <Description  href="http://docs.opendap.org/index.php/Server_Side_Processing_Functions#linear_scale">Applies a linear scale transform to the named variable.</Description>
    </function>

    <function name="version" role="http://services.opendap.org/dap4/server-side-function/version">
        <Description  href="http://docs.opendap.org/index.php/Server_Side_Processing_Functions#version">Returns version information for each server side function.</Description>
    </function>
 
    <!-- ##################################################################################### -->
    <!--      Server Side Function Group                                                      -->
    <!-- ..................................................................................... -->
 
    <functionGroup name="ferret" role="http://pmel.noaa.gov/dap4extension/ferret" >
        <Description href="http://ferret.pmel.noaa.gov/Ferret/documentation">This server supports ferret functions.</Description>
    </functionGroup>
 
    <!-- ##################################################################################### -->
    <!--      Extensions                                                                      -->
    <!-- ..................................................................................... -->
 
    <extension name="async" role="http://opendap.org/extension/async">
        <Description href="http://docs.opendap.org/index.php/DAP4:_Asynchronous_Request-Response_Proposal_v3">The server supports asynchronous transactions.</Description>
    </extension>
   
   
</DatasetServices>
</source>
</font>

<blockquote>
<pre>
index:  INTEGER ;

selection:
          segmenttree '|' filter
        ;

filter:
          predicate
        | predicate ',' predicate  /* ',' == AND */
        | '!' predicate %prec NOT
        ;

predicate:
          primary relop primary
        | primary relop primary relop primary
        | primary eqop primary
        ;

relop:
          '&lt;' '='
        | '&gt;' '='
        | '&lt;'
        | '&gt;'
        ;

eqop:
          '=' '='
        | '!' '='
        | '~' '='
        ;

primary:
          fieldname
        | constant
        | '(' predicate ')'
        ;

dimredef: NAME '=' slice ;

fieldname: NAME

constant: STRING | INTEGER | DOUBLE | BOOLEAN ;
</pre>
</blockquote>

===Lexical Tokens for Bison Grammar for Constraints===

The primary lexical tokens for constraints are:
NAME, STRING, INTEGER, DOUBLE, BOOLEAN.

These lexemes are intended to match the patterns defined for the RELAX NG grammar.

=== DSR XML Schema ===

''I believe that this schema is correct, but it rigidly enforces the order of the content. We may want to rewrite to be more lenient. [[User:Ndp|ndp]] 16:02, 22 November 2013 (PST)''

<font size="2">
<source lang="xml" >
 
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema
    targetNamespace="http://xml.opendap.org/ns/DAP/4.0/dataset-services#"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:xml="http://www.w3.org/XML/1998/namespace"
    xmlns:dsr="http://xml.opendap.org/ns/DAP/4.0/dataset-services#"
    xmlns="http://xml.opendap.org/ns/DAP/4.0/dataset-services#"
    elementFormDefault="qualified"
    attributeFormDefault="unqualified">
   
    <xs:import  namespace="http://www.w3.org/XML/1998/namespace" schemaLocation="http://www.w3.org/2001/03/xml.xsd"/>
    <!--
    -->
    <xs:element name="DatasetServices" type="DatasetServicesType"/>
    <xs:element name="DapVersion" type="DapVersionType"/>
    <xs:element name="ServerSoftwareVersion" type="AnyContentType"/>
    <xs:element name="Service" type="ServiceType"/>
    <xs:element name="Description" type="DescriptionType"/>
    <xs:element name="link" type="linkType"/>
    <xs:element name="alt" type="altType"/>
    <xs:element name="function" type="ExtensionType"/>
    <xs:element name="functionGroup" type="ExtensionType"/>
    <xs:element name="extension" type="ExtensionType"/>
    <!--
    -->
    <xs:complexType name="DatasetServicesType">
        <xs:annotation>
            <xs:documentation>DatasetServices root element type</xs:documentation>
        </xs:annotation>
        <xs:sequence>
            <xs:element ref="DapVersion" minOccurs="1" maxOccurs="unbounded"/>
            <xs:element ref="ServerSoftwareVersion" minOccurs="1" maxOccurs="1"/>
            <xs:element ref="Description" minOccurs="0" maxOccurs="1"/>
            <xs:element ref="Service" minOccurs="3" maxOccurs="unbounded"/>
            <xs:element ref="function" minOccurs="0" maxOccurs="unbounded"/>
            <xs:element ref="functionGroup" minOccurs="0" maxOccurs="unbounded"/>
            <xs:element ref="extension" minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
        <xs:attribute ref="xml:base"  use="required"/>
    </xs:complexType>
    <!--
    -->
    <xs:complexType name="ServiceType">
        <xs:annotation>
            <xs:documentation>DatasetServices Service type</xs:documentation>
        </xs:annotation>
        <xs:sequence>
            <xs:element ref="Description" minOccurs="0" maxOccurs="1"/>
            <xs:element ref="link" minOccurs="1" maxOccurs="unbounded"/>
        </xs:sequence>
        <xs:attribute name="title" type="xs:string" use="optional"/>
        <xs:attribute name="role" type="xs:anyURI" use="required"/>
    </xs:complexType>
    <!--
       
    -->
    <xs:simpleType name="DapVersionType">
        <xs:restriction base="xs:string">
            <xs:pattern value="\d{1,}\.\d{1,}"></xs:pattern>
        </xs:restriction>
    </xs:simpleType>
    <!--
       
    -->
    <xs:complexType name="DescriptionType">
        <xs:simpleContent>
            <xs:extension base="xs:string">
                <xs:attribute name="href" type="xs:anyURI"/>
            </xs:extension>
        </xs:simpleContent>
    </xs:complexType>
    <!--
       
    -->
    <xs:complexType name="altType">
        <xs:attribute name="type" type="xs:string"/>
    </xs:complexType>
    <!--
       
    -->
    <xs:complexType name="linkType">
        <xs:annotation>
            <xs:documentation>DatasetServices link type</xs:documentation>
        </xs:annotation>
        <xs:sequence>
            <xs:element ref="alt" minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
        <xs:attribute name="href" type="xs:string" use="required"/>
        <xs:attribute name="type" type="xs:string" use="required"/>
        <xs:attribute name="description" type="xs:string" use="optional"/>       
    </xs:complexType>
    <!--
    -->
    <xs:complexType  name="AnyContentType" mixed="true">
        <xs:annotation>
            <xs:documentation>
                An element of this type may contain arbitrary XML or text content.
                The resulting content may be ignored by DAP software. Other software
                might find the information useful. The XML elements must
                satisfy the requirements for 'lax' processing under schema 1.0.
                In practice, that means just about anything.
                ( <xs:any/>+ )
            </xs:documentation>
        </xs:annotation>
        <xs:sequence>
            <xs:any namespace="##any" minOccurs="0" processContents="lax"/>
        </xs:sequence>
        <xs:anyAttribute processContents="lax" namespace="##any"/>
    </xs:complexType> 
    <!--
    -->
    <xs:complexType name="ExtensionType">
        <xs:annotation>
            <xs:documentation>Server Extension Type</xs:documentation>
        </xs:annotation>
        <xs:sequence>
            <xs:element ref="Description" minOccurs="1" maxOccurs="1"/>
        </xs:sequence>
        <xs:attribute name="name" type="xs:string" use="required"/>
        <xs:attribute name="role" type="xs:anyURI" use="required"/>
    </xs:complexType>
    <!--
    -->
   
</xs:schema>
 
</source>
</font>
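A client can read a DSR document with any namespace-aware XML parser. The following Python sketch lists each Service role together with its links, using the namespace declared in the schema above. The function name <code>list_services</code> and the sample document (including its <code>example.org</code> URLs and version strings) are hypothetical and serve only to illustrate the structure.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the targetNamespace of the DSR schema.
DSR_NS = {"dsr": "http://xml.opendap.org/ns/DAP/4.0/dataset-services#"}


def list_services(dsr_xml: str):
    """Return a list of (role, [(media_type, href), ...]) for each Service element."""
    root = ET.fromstring(dsr_xml)
    services = []
    for service in root.findall("dsr:Service", DSR_NS):
        links = [
            (link.get("type"), link.get("href"))
            for link in service.findall("dsr:link", DSR_NS)
        ]
        services.append((service.get("role"), links))
    return services


# Hypothetical, deliberately abbreviated document; a conforming DSR carries more services.
SAMPLE = """<DatasetServices
    xmlns="http://xml.opendap.org/ns/DAP/4.0/dataset-services#"
    xml:base="http://example.org/data/sample.nc">
  <DapVersion>4.0</DapVersion>
  <ServerSoftwareVersion>example-1.0</ServerSoftwareVersion>
  <Service title="DAP4 Dataset Metadata"
           role="http://services.opendap.org/dap4/dataset-metadata">
    <link type="text/xml" href="http://example.org/data/sample.nc.dmr.xml"/>
  </Service>
</DatasetServices>"""

for role, links in list_services(SAMPLE):
    print(role, links)
```

Selecting services by their <code>role</code> attribute rather than by <code>title</code> is a reasonable design choice here, since the schema marks <code>role</code> as required and <code>title</code> as optional.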
 
== DAP4 Asynchronous Response Schema ==
 
 
 
== DAP4 Error Response Schema ==
The normative XML representation for the Error Response is defined by the following RELAX-NG schema.
<font size="2">
<source lang="xml">
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0" xmlns:doc="http://www.example.com/annotation" datatypeLibrary="http://xml.opendap.org/datatypes/dap4" ns="http://xml.opendap.org/ns/DAP/4.0#">
    <start>
        <ref name="errorresponse"/>
    </start>
    <define name="errorresponse">
        <element name="Error">
            <optional>
                <interleave>
                    <element name="code">
                        <attribute name="protocol">
                            <data type="text"/>
                        </attribute>
                        <data type="dap4_integer"/>
                    </element>
                    <element name="Message">
                        <text/>
                    </element>
                    <element name="Context">
                        <text/>
                    </element>
                    <element name="OtherInformation">
                        <text/>
                    </element>
                </interleave>
            </optional>
        </element>
    </define>
</grammar></source>
</font>
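A client might extract the fields of an Error response along the following lines. This Python sketch is illustrative only: the function name <code>parse_error</code> is hypothetical, and the sample document with its code, protocol, and message values is invented for the example. Element names and the namespace follow the RELAX-NG schema above.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the ns attribute of the schema's grammar element.
DAP4_NS = "http://xml.opendap.org/ns/DAP/4.0#"
FIELDS = ("code", "Message", "Context", "OtherInformation")


def parse_error(xml_text: str) -> dict:
    """Return a dict mapping each known child element name to its text (or None if absent)."""
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{DAP4_NS}}}Error":
        raise ValueError(f"unexpected root element: {root.tag}")
    result = {}
    for name in FIELDS:
        child = root.find(f"{{{DAP4_NS}}}{name}")
        result[name] = None if child is None else (child.text or "").strip()
    return result


# Hypothetical sample; the values are invented for illustration.
SAMPLE = """<Error xmlns="http://xml.opendap.org/ns/DAP/4.0#">
  <code protocol="http">404</code>
  <Message>Dataset not found</Message>
  <Context>/data/missing.nc</Context>
  <OtherInformation>none</OtherInformation>
</Error>"""

print(parse_error(SAMPLE))
```

Because the schema wraps the children in an interleave, the sketch looks each element up by name instead of relying on document order.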

Latest revision as of 22:53, 5 November 2021



The Data Access Protocol: DAP Version 4.0


Volume 1: Data Model and Serialized Representation


Date: May 31, 2012
Last Revised: 24 February 2016
Status: Draft
Authors: John Caron (Unidata)
Ethan Davis (Unidata)
David Fulker (OPeNDAP)
James Gallagher (OPeNDAP)
Dennis Heimbigner (Unidata)
Nathan Potter (OPeNDAP)
Copyright: 2016 University Corporation for Atmospheric Research and Opendap.org


Abstract

This document defines the Data Access Protocol (DAP) version 4.0 (referred to also as DAP4). This data transmission protocol is intended to supersede all previous versions of the DAP protocol. DAP4 is designed specifically for science data, but it is intended to be discipline neutral. The protocol relies on widely used and stable standards, and is capable of representing a wide variety of scientific data types.

Distribution of this document is unlimited.

This document takes material from the DAP2 specification and the OPULS Wiki page.


Change List

2012.05.24: Initial Draft
2012.05.27 Added specification of chunk order
2012.05.28 Added specification and interpretation of simple queries
2012.05.28 Added discussion about nested sequences.
2012.05.29 Formatting changes
2012.6.05 Removed serialized representation sections and constraint sections until James provides direction.
2012.6.24 Merge all changes from Gallagher, Potter, and Caron, except as noted.
2012.6.24 Removed all references to Sequences.
2012.6.24 Inserted James' version of serialized representation.
2012.6.25 Added DMR RELAX-NG Grammar.
2012.6.24 Added (semi-)formal description of the DAP4 serialization scheme.
2012.6.26 Added: (1) Revised Char type (2) Revised unlimited dimension rules (3) revised MAP rules. (4) Removed HTTP references
2012.7.09 Added discussion of identifier
2012.7.10 Added discussion of XML escaping
2012.7.10 Fix discrepancies between the formal definition of the on-the-wire format and the examples.
2012.7.12 Removed UByte and made Byte == UInt8
2012.8.21 Added draft constraints section
2012.8.25 Improved the discussion of named slices in constraints.
2012.9.4 Minor change to the grammar for simple constraints.
2012.9.6 Updated the Data Response section so that it no longer mentions Multipart MIME; edited the sections on FQNs and Attributes. I've added ‘nested attributes' back into the text. I also added ‘Sequence' in several places where we will need it once we've worked out how those are to be handled.
2012.11.1 Integrate James' changes with recent changes
2012.11.9 Rebuild the .docx because of repeated Word crashes; minor formatting info changed/lost.
2012.11.23 Add a Dataset construct to make the root group concept clear syntactically.
2013.3.8 Made unlimited into a boolean attribute because it does have a size.
2013.4.7 Inserted the new checksum description.
2013.4.15 Removed all mention of unlimited wrt Dimensions
2013.4.15 Remove the base and ns attributes from <Dataset>
2013.4.15 Introduce <Sequence> as a replacement for variable length dimensions; The term Sequence is subject to future change.
2013.10.14 Clarify the maximum number of elements as a function of the maximum number of bytes.
2013.10.14 Enforce a specific order on declarations in a Group body.
2013.11.22 Added sections for DSR, Async, and Error responses and their schemas
2013.11.22 Specified the case sensitivity of XML element names and XML attribute names
2014.07.04 Make a pass to clean up and clarify (dmh)
2016.02.14 Rollback to version of 2015.12.16
2016.02.24 Add back the multiple disjoint slice subset.
Provide a general mechanism for arbitrary reserved names.
2016.10.25 Add _DAP4_Little_Endian attribute to the DMR to reflect the byte order used to encode the serialized data.
2016.12.5 Forgot to mention adding the special names section (5.3)
2016.12.18 Clarified the reserved names section (5.3) to say that all names beginning with "_" are reserved, but that the reverse DNS case is preferred.

Introduction

This specification defines the protocol referred to as the Data Access Protocol, version 4.0 ("DAP4"). In this document 'DAP' refers to DAP4 unless otherwise noted.

DAP is intended to be the successor to all previous versions of the DAP (specifically DAP version 2.0). The goal is to provide a very general data model capable of representing a wide variety of existing data sets.

The DAP builds upon a number of existing data representation schemes. Specifically, it is influenced by CDM[1], HDF5 [2], DAP version 2.0[3], and netCDF-4[5].

The DAP is a protocol for access to data organized as variables. It is particularly suited to accesses by a client computer to data stored on remote (server) computers that are networked to the client computer. DAP was designed to hide the implementation of different collections of data. The assumption is that a wide variety of data sets using a wide variety of data schemas can be translated into the DAP protocol for transmission from the server holding that dataset to a client computer for processing.

It is important to stress the discipline neutrality of the DAP and the relationship between this and adoption of the DAP in disciplines other than the Earth sciences. Because the DAP is agnostic as relates to discipline, it can be used across the very broad range of data types encountered in oceanography - biological, chemical, physical and geological. There is nothing that constrains the use of the DAP to the Earth sciences.

Requirements

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY" and "OPTIONAL" in this document are to be interpreted as described in RFC 2119. [7]

Overall Operation

The DAP is a stateless protocol that governs clients making requests from servers, and servers issuing responses to those requests. This section provides an overview of the requests and responses (i.e. the messages) that DAP-compliant software MUST support. These messages are used to request information about a server and data made accessible by that server, as well as requesting data values themselves.

For every data resource the DAP defines a number of responses that may be elicited by a client. These responses provide services information (i.e. capabilities), structural/semantic descriptions, data access timing and error information.

The Dataset Services Response (DSR) provides a 'Services' or 'Capabilities' response for the DAP. Dereferencing an unadorned DAP dataset resource URL will return a document describing the DAP services available for the dataset.

The DAP utilizes two responses to represent the semantic structural description and the data content of a data source. One response, called the DMR, returns metadata information describing the structure of a request for data. That is, it characterizes the variables, their datatypes, names and attributes. The second response, the Data Response, returns not only the metadata about the request but also the data that was requested. The DMR and the metadata part of the Data Response are represented using a specific XML [16] representation. The syntax of that representation is defined elsewhere in this document (Section 5.3).

The DAP returns error information using an Error response. If a request for any of the three basic responses cannot be completed then an Error response is returned in its place.

The two responses (DMR and Data Response) are complete in and of themselves so that, for example, a client can use the data response without ever requesting either of the two other responses. In many cases, client programs will request the DMR response first before requesting the Data Response but there is no requirement they do so and no server SHALL require that behavior on the part of clients.

Operationally, communication between a DAP client and a DAP server uses some underlying already existing protocol, most typically HTTP. Volume 2 of this specification discusses how the DAP should utilize HTTP.

In addition to these data objects, a DAP server MAY provide additional "services" which clients may find useful. For example, many DAP-compliant servers provide HTML-formatted representations or ASCII representations of a data source's structure and data. Such additional services are discussed in Volume 2 of this specification.

The DAP specification also defines extensions to the protocol that represent important, but optional, capabilities. At least the following extensions have been defined.

1. Asynchronous Response. The DAP Asynchronous Response is returned to a client when the requested resource (DMR, Data Response, etc.) is not immediately available, but the server is able to retrieve it if the client makes a specific request that it be made available. If the client makes the "retrieve it" request, the server will inform the client through a subsequent Asynchronous Response when and where the client may access the requested resource.

2. CSV Data Encoding. The DAP4 CSV data encoding represents DAP4 data as structured Comma-Separated Values (CSV) in UTF-8 text. Though based on the text/csv media type described in RFC 4180[RFC 4180], the DAP4 CSV encoding is more complex so that it can fully represent the data structures of the DAP4 data model. Some structure beyond simple CSV is necessary to capture the DAP4 data structures.

Characterization of a Data Source

The DAP characterizes a data source as a collection of variables, dimensions, and enumeration types. Each variable consists of a name, a type, a value, and a collection of Attributes. Dimensions have a name and a size. Enumerations list names and values of the enumeration constants. These elements may be grouped into collections using the concept of a "group" that has an identifier and defines a naming scope for the elements within it. Groups may contain other groups.

The distinction between information in a variable and in an Attribute is somewhat arbitrary. However, the intention is that Attributes hold information that aids in the interpretation of data held in a variable. Variables, on the other hand, hold the primary content of a data source.

Section 13 provides a formal syntax for DAP DMR characterizations. It is defined using the RelaxNG standard [13] for describing the context-free syntax of a class of XML documents, the DMR in this case. It should be noted that any syntax specification requires a specification of the lexical elements of the syntax. The XML specification [16] provides most of the lexical context for the syntax, but there are certain places where additional lexical elements must be used. Section 11 describes those additional lexical elements, and those elements are discussed at appropriate points in this specification.

Since the syntax is context-free, there are semantic limitations on what is legal in a DMR. These semantic limitations are noted at appropriate places in the following documentation. It should also be noted that if there are conflicts between what is described here and the RelaxNG syntax, then the syntax takes precedence.

DMR Declarations

DMR XML Format

Element and Attribute Names

Within the DMR XML document, it is assumed that XML element and XML attribute names are case sensitive.

Character Escapes

Any string of characters appearing within an XML attribute in the DMR must apply the standard XML escapes. Specifically, any attribute value containing any of the following characters must replace them with the corresponding XML escape form.

Character    Escaped Form
&            &amp;
<            &lt;
>            &gt;
"            &quot;

So for example, given the occurrence of the attribute 'name="&<>"' it must be re-written to this form 'name="&amp;&lt;&gt;"'.
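
As an illustration only, the four escapes above can be applied with the Python standard library. The helper name escape_attr_value is hypothetical and is not part of DAP4; the sketch assumes the value will be written inside double quotes.

```python
from xml.sax.saxutils import escape

def escape_attr_value(value: str) -> str:
    """Apply the four XML escapes from the table to an attribute value."""
    # escape() replaces &, < and > by default; the extra mapping adds
    # the double-quote escape.
    return escape(value, {'"': "&quot;"})

print('name="' + escape_attr_value("&<>") + '"')
```

Because the ampersand is replaced first, the escape forms themselves are not escaped a second time.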

Names

A name (aka identifier) in DAP4 consists of a sequence of any legal non-control UTF-8 characters. A control character is any UTF-8 character in the inclusive range 0x00 — 0x1F. Names are case sensitive.

Reserved Names

Any name that begins with the character sequence "_" is considered reserved. Note that if the receiver encounters such a name and has no information on how to process the name, it may at its discretion either ignore the object with that name, or it may treat the name as an ordinary name.

A special case is when the "_" is followed by a reverse DNS name defining both the definer of that reserved name and possible additional naming information. This form of reserved name is preferred because it provides information about the organization that defined it.

A (reverse) DNS name is of this syntactic form.

DNS = <name> | DNS '.' <name>

An example might be "edu.ucar.unidata.NAME1.NAME2...". This indicates the owner/definer of that name is "edu.ucar.unidata" and that the additional naming information ("NAME1.NAME2...") has meaning to the owner for defining the semantics of the so-named object.
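
The naming rules above (no control characters in the range 0x00 to 0x1F, and a leading "_" marking a reserved name) can be sketched as two small checks. The function names are hypothetical, and the sketch deliberately does not decide whether an empty name is legal, since the text does not say.

```python
def has_control_chars(name: str) -> bool:
    """True if the name contains a character in the range 0x00-0x1F."""
    return any(ord(ch) <= 0x1F for ch in name)

def is_reserved(name: str) -> bool:
    """True if the name begins with the reserved prefix '_'."""
    return name.startswith("_")

print(has_control_chars("temperature"))
print(has_control_chars("bad\tname"))
print(is_reserved("_edu.ucar.unidata.NAME1"))
```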

Fully Qualified Names

Every object in a DAP4 Dataset has a Fully Qualified Name (FQN), which provides a way to unambiguously reference declarations in a dataset and which can be used in several contexts, such as in the DMR or in a constraint expression (see Section 8).

These FQNs follow the common conventions of names for lexically scoped identifiers. In DAP4 several kinds of lexical items provide lexical scoping: Dataset, Groups, Structures, Sequences, Enumerations, and AttributeSets. Just as with hierarchical file systems or variables in many programming languages, a simple grammar formally defines how the names are built using the names of the FQN's components (see Section 10).

The FQN for a "top-level" variable — as opposed to e.g. a field in a structure or sequence — is defined purely by the sequence of enclosing groups plus the variable's simple name. This also holds for Enumeration declarations.

Consider the following simple dataset, which contains a Structure named "weather" within a Structure named "places", all contained in the Dataset "D".

<Dataset name="D">
    <Structure name="places">
        <String name="name"/>
        <Structure name="weather">
            <Float64 name="temperature"/>
            <Float64 name="dew_point"/>
        </Structure>
    </Structure>
</Dataset>

The FQN for the field 'temperature' is

'/places.weather.temperature'.

Substituting the keyword Sequence for one or more occurrences of Structure in the above example will leave the FQNs unchanged. Note that the name of the dataset ("D") is not included; it is implied by the leading "/".

As is the case with Structure or Sequence variables, Groups can be nested to form hierarchies, too, and this example shows that case.

<Dataset name="D">
    <Group name="environmental_data">
        <Structure name="places">
            <String name="name"/>
            <Sequence name="weather">
                <Float64 name="temperature"/>
                <Float64 name="dew_point"/>
            </Sequence>
        </Structure>
     </Group>
     <Group name="demographic_data">
         ...
     </Group>
</Dataset>

The FQN to the field 'temperature' in the dataset shown is

'/environmental_data/places.weather.temperature'.

Note the use of a different separator character — "." instead of "/" — once we enter the scope of a structure (or sequence).

Enumeration constants are treated similarly to fields. Consider this example.

<Dataset name="DE">
    <Enumeration name="e">
        <EnumConst name="v1" value="5"/>
    </Enumeration>
</Dataset>

The FQN for the "v1" constant in "e" is as follows.

/e.v1

Notes:

  1. Every dataset has a single outermost <Dataset> declaration, which, semantically, acts like the root group. Whatever name that dataset has is ignored for the purposes of forming the FQN; it is instead treated as if it has the empty name ("").
  2. There is no limit to the nesting of groups or the nesting of Structures or the nesting of Sequences. Enumerations cannot be nested.
  3. Reserved names (see above) inherently contain characters ('.') that will require escaping.

The characters "/" and "." have special meaning in the context of a fully qualified name. This means that if a name is added to the FQN and that name contains either of those two characters, then those characters must be specially escaped so that they will not be misinterpreted. The defined escapes are as follows.

Character    Escaped Form
.            \.
/            \/
\            \\
blank        \blank

Note that the escape character itself must be escaped. Also note that this form of escape using '\' is independent of any required XML escape (Section 5.1).
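
A minimal sketch of how an FQN might be assembled from its components using the escapes above. The function names and the two-list calling convention are assumptions made for illustration, and "blank" is assumed to mean the space character.

```python
def escape_name(name: str) -> str:
    """Backslash-escape the characters that are special inside an FQN."""
    out = []
    for ch in name:
        if ch in "\\./ ":   # backslash, dot, slash, blank (assumed: space)
            out.append("\\")
        out.append(ch)
    return "".join(out)

def build_fqn(groups, variable_path):
    """groups: names of the enclosing groups below the root.
    variable_path: the top-level variable name followed by any field names."""
    parts = [escape_name(g) for g in groups]
    parts.append(".".join(escape_name(v) for v in variable_path))
    return "/" + "/".join(parts)

print(build_fqn(["environmental_data"], ["places", "weather", "temperature"]))
print(build_fqn([], ["e", "v1"]))
print(build_fqn([], ["a.b"]))
```

The first two calls reproduce the FQNs given in the examples of this section; the third shows a name containing "." being escaped so that it is not read as a field separator.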

FQN References

DAP4 imposes the rule that the definition of any object (e.g. dimension, group, or enumeration) must occur before any reference to that object. This rule also applies within a group, which in turn implies that, for example, all dimensions must be declared before all variables that reference them.

Definitional Declarations versus Data-Bearing Declarations

The declarations in a DMR can be grouped into two classes. One class is definitional. That is, it defines metadata that is used in the rest of the DMR. These definitional declarations are Groups (including the outer Dataset), Dimensions, and Enumerations. Such declarations do not contain data values themselves, although they may define constants such as the dimension size. The other class is data-bearing: Variables and Attributes. These elements of the data model are used to house data values or semantic metadata read from the dataset or, in the latter case, synthesized from the values and the standards/conventions that the dataset is known to follow.

Dataset

Every DMR contains exactly one Dataset declaration. It is the outermost XML element of the DMR.

A dataset is specified using this XML form:

<Dataset name="..." dapVersion="..." dmrVersion="...">
...
</Dataset>

The name, dapVersion, and dmrVersion attributes are required. The attributes have the following semantics:

  • name – an identifier specifying the name of the dataset. Its content is determined solely by the Server and is completely uninterpreted with respect to DAP4.
  • dapVersion – the string "4.0" currently.
  • dmrVersion – the string "1.0" currently.

The body of the Dataset is the same as the body of a Group (Section 5.7), and semantically the Dataset acts like the outermost, root, group.

Groups

A group is specified using this XML form:

<Group name="name">
...
</Group>

A group defines a name space and contains other DAP elements. Specifically, it can contain, in this order: dimensions, enumerations, variables, and (sub-)groups. The fact that groups can be nested means that the set of groups in a DMR forms a tree data structure. For any given DMR, there exists a root group that is the root of this tree.

A nested set of groups defines a variety of name spaces and access to the contents of a group is specified using a notation of the form "/g1/g2/.../gn". This is called a "path". By convention "/" refers to the root group (the Dataset declaration). Thus the path "/g1/g2/g3" indicates that one should start in the root group, move to group g1 within that root group, then to group g2 within group g1, and finally to group g3. This is more fully described in the section on Fully Qualified names (Section 5.3).

The order of declarations within a Group is fixed and must conform to this order.

  1. Dimension declarations,
  2. Enumeration declarations,
  3. Variable declarations,
  4. and, finally, nested Group declarations.

For comparison purposes, DAP groups correspond to netCDF-4 groups and not to the more complex HDF5 Group type: i.e. the set of groups must form a tree.

Semantic Notes

  1. If declared, Groups must be named.
  2. A Group can contain any number of objects, including other Groups.
  3. Each Group declares a new lexical scope for the objects it contains.
  4. An array of Group is not allowed, and a Group cannot be defined within a Structure or Sequence.
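
The fixed declaration order can be checked mechanically. The following is a sketch under stated assumptions: the DMR is parsed without an XML namespace on its elements, any element that is not a Dimension, Enumeration, or Group is treated as a variable, and Attribute, otherXML, and Namespace elements are skipped. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Rank of each declaration kind; anything else is treated as a variable (rank 2).
ORDER = {"Dimension": 0, "Enumeration": 1, "Group": 3}
# Elements ignored by this sketch (an assumption, not a rule from the text).
SKIPPED = {"Attribute", "otherXML", "Namespace"}

def declarations_in_order(group: ET.Element) -> bool:
    """Check Dimension < Enumeration < Variable < Group ordering, recursively."""
    last_rank = 0
    for child in group:
        if child.tag in SKIPPED:
            continue
        rank = ORDER.get(child.tag, 2)
        if rank < last_rank:
            return False
        last_rank = rank
    return all(declarations_in_order(g) for g in group.findall("Group"))

good = ET.fromstring(
    '<Dataset name="D">'
    '<Dimension name="d" size="2"/>'
    '<Int32 name="v"><Dim name="/d"/></Int32>'
    '<Group name="g"><Int32 name="w"/></Group>'
    '</Dataset>'
)
bad = ET.fromstring(
    '<Dataset name="D">'
    '<Int32 name="v"/>'
    '<Dimension name="d" size="2"/>'
    '</Dataset>'
)
print(declarations_in_order(good), declarations_in_order(bad))
```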

Dimensions

A dimension declaration is specified using this XML form.

<Dimension name="name" size="size"/>

The size is a positive integer (which means that a zero length dimension is illegal). As described in the Arrays Section, the maximum size of any dimension is 2^61 - 1. A dimension declaration will be referenced elsewhere in the DMR by specifying its name. Note that anonymous dimensions also exist. They have a size but no name. Anonymous dimensions SHOULD NOT be declared.

Semantic Notes

  1. Dimension declarations are not associated with a data type.
  2. Dimension sizes MUST be capable of being represented as a signed 64-bit integer.
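
As a small illustration, the size constraints stated in this section (positive, and no larger than 2^61 - 1) reduce to a single range check. The names below are hypothetical.

```python
# Upper bound on a dimension size as stated in this section.
MAX_DIMENSION_SIZE = 2**61 - 1

def is_valid_dimension_size(size: int) -> bool:
    """A dimension size must be positive and no larger than 2^61 - 1."""
    return 1 <= size <= MAX_DIMENSION_SIZE

print(is_valid_dimension_size(0))
print(is_valid_dimension_size(1024))
print(is_valid_dimension_size(2**61))
```

Any size that passes this check also fits in a signed 64-bit integer, so the second semantic note is satisfied as a consequence.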

Enumeration Types

An enumeration type defines a set of names with specific values called enumeration constants. As will be seen in Section 5.12, enumeration types may be used as the type for variables or attributes. The values that can be assigned to such typed objects must come from the set of enumeration constants.

An enumeration type specifies a set of named, integer constants. When a data source has a variable of type 'Enumeration' a DAP 4 server MUST represent that variable using a specified integer type, up to and including a 64-bit unsigned integer.

An Enumeration type is declared using this XML form.

<Enumeration name="name"
             basetype="Byte|Int8|UInt8|Int16|UInt16
                      |Int32|UInt32|Int64|UInt64">
    <EnumConst name="name" value="integer"/>
    ...
</Enumeration>

Semantic Notes

  1. The optional "basetype" XML attribute defines the type for the value XML attribute of each enumeration constant. This basetype must be one of the integer types (see Section 5.10.1). If unspecified, then it defaults to the Atomic type "Int32".
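
A minimal sketch of reading an Enumeration declaration, showing the default basetype. It assumes the element carries no XML namespace and that constant values are written as plain decimal integers; the full constant syntax is defined in Section 11.3 and is not handled here. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

def read_enumeration(elem: ET.Element):
    """Return (basetype, {constant name: integer value}) for an Enumeration."""
    # The basetype XML attribute is optional and defaults to Int32.
    basetype = elem.get("basetype", "Int32")
    constants = {
        const.get("name"): int(const.get("value"))  # assumes decimal values
        for const in elem.findall("EnumConst")
    }
    return basetype, constants

decl = ET.fromstring(
    '<Enumeration name="e"><EnumConst name="v1" value="5"/></Enumeration>'
)
print(read_enumeration(decl))
```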

Atomic Types

The DAP4 specification assumes the existence of certain pre-defined, declared types called atomic types. As their name suggests, atomic data types are conceptually indivisible. Atomic variables are used to store integers, real numbers, strings and URLs. There are five classes of atomic types, with each class containing one or more variations: integer, floating-point, string, enumerations, and opaque.

Integer Types

The integer types are summarized in the following table. The syntax for integer constants is defined in Section 11.3.

Type Name    Description                Range of Legal Values
Int8         Signed 8-bit integer       [-(2^7), (2^7) - 1]
UInt8        Unsigned 8-bit integer     [0, (2^8) - 1]
Byte         Synonym for UInt8          [0, (2^8) - 1]
Char         Synonym for UInt8          [0, (2^8) - 1]
Int16        Signed 16-bit integer      [-(2^15), (2^15) - 1]
UInt16       Unsigned 16-bit integer    [0, (2^16) - 1]
Int32        Signed 32-bit integer      [-(2^31), (2^31) - 1]
UInt32       Unsigned 32-bit integer    [0, (2^32) - 1]
Int64        Signed 64-bit integer      [-(2^63), (2^63) - 1]
UInt64       Unsigned 64-bit integer    [0, (2^64) - 1]

Note that for historical reasons, the Char type is defined to be a synonym of UInt8. This means that technically, the Char type has no associated character set encoding. However, servers and clients are free to infer typical character semantics for this type. The inferred character set encoding is chosen purely at the discretion of the server or client using whatever conventions they agree to use, possibly specified using attributes. Note specifically that multi-byte character encodings such as UTF-8 are problematic precisely because they can be multi-byte.
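
The ranges in the integer type table follow directly from each type's width and signedness. A short sketch that computes them (the table and function names are hypothetical):

```python
# Bit width of each integer type; Byte and Char are synonyms for UInt8.
BITS = {
    "Int8": 8, "UInt8": 8, "Byte": 8, "Char": 8,
    "Int16": 16, "UInt16": 16,
    "Int32": 32, "UInt32": 32,
    "Int64": 64, "UInt64": 64,
}
UNSIGNED = {"UInt8", "Byte", "Char", "UInt16", "UInt32", "UInt64"}

def integer_range(type_name: str):
    """Return the inclusive (low, high) range of legal values for a type."""
    bits = BITS[type_name]
    if type_name in UNSIGNED:
        return 0, 2**bits - 1
    return -(2**(bits - 1)), 2**(bits - 1) - 1

def fits(type_name: str, value: int) -> bool:
    """True if value is representable in the named integer type."""
    low, high = integer_range(type_name)
    return low <= value <= high

print(integer_range("Int8"))
print(integer_range("Char"))
print(fits("UInt16", 70000))
```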

Floating Point Types

The floating-point data types are summarized in Table 2. The two floating-point data types use IEEE 754 [6] to represent values. The two types correspond to ANSI C's float and double data types. The syntax for floating point constants is defined in Section 11.3.

Type Name    Description                      Range of Legal Values
Float32      32-bit Floating-point number     Refer to the IEEE Floating Point Standard [6]
Float64      64-bit Floating-point number     Refer to the IEEE Floating Point Standard [6]

String Types

The string data types are summarized in Table 3. Again, the syntax for these is defined in Section 11.4.

Strings are individually sized. This means that in an array of strings, for example, each instance of that string MAY be of a different size.

Type Name    Description                                    Range of Legal Values
String       A variable length string of UTF-8 characters   As defined in [14]
URI          A Uniform Resource Identifier                  As defined in IETF RFC 2396 [8]

The Opaque Type

The XML scheme for declaring an Opaque type is as follows.

<Opaque>

The Opaque type is used to hold objects like JPEG images and other Binary Large Object (BLOB) data that have significant internal structure which might be understood by clients (e.g., an image display program) but that would be very cumbersome to describe using the DAP4 built-in types. Defining a variable of type "Opaque" does not communicate any information about its content, although an attribute could be used to do that.

Opaque instances are individually sized. This means that in an array of opaques, for example, each instance of that opaque MAY be of a different size.

Semantic Notes

  1. The content of an opaque object is completely un-interpreted by the DAP4 implementation. The Opaque type is an Atomic Type, which might seem odd because instances of Opaque can be of different sizes. However, by thinking of Opaque as equivalent to a byte-string type, the analogy with strings makes it clear that it should be an Atomic type.

The Enum Type

The XML scheme for declaring an Enum type is as follows.

<Enum enum="FQN">

The Enum type is intended to be used in the definition of a variable. It should not be confused with the definition of an Enumeration, but rather references such a definition.

Semantic Notes

  1. The Enum type requires an XML attribute, enum, that references a previously defined <Enumeration> declaration.

A Note Regarding Implementation of the Atomic Types

When implementing the DAP, it is important to match information in a data source or read from a DAP response to the local data type which best fits those data. In some cases an exact match may not be possible. For example Java lacks unsigned integer types [4]. Implementations faced with such limitations MUST ensure that clients will be able to retrieve the full range of values from the data source. If this is impractical, then the server or client may implement this rule by hiding the variable in question or returning an error.

Container Types

There are currently two container types: <Structure> and <Sequence>.

The Structure Type

A Structure groups a list of variables so that the collection can be manipulated as a single item. The variables in a Structure may also be referred to as "fields" to conform to conventional use of that term, but there is otherwise no distinction between fields and variables. The Structure's fields MAY be of any type, including Structure or Sequence. The order of items in the Structure is significant only in relation to the serialized representation of that Structure.

The Sequence Type

A Sequence is intended to represent a sequence of instances of objects. Suppose that we have a sequence of this form.

<Sequence name="s">
    <Float64 name="field1"/>
    <Float64 name="field2"/>
</Sequence>

The corresponding Structure object is obtained by substituting the Sequence keyword with Structure. Our above example then has this associated Structure.

<Structure name="s">
    <Float64 name="field1"/>
    <Float64 name="field2"/>
</Structure>

The semantics of a sequence are that it represents a sequence of instances of the corresponding Structure. The length of the Sequence MAY be different for every instance of a Sequence. Consider this array of Sequence.

<Sequence name="s">
    ...
    <Dim size="3"/>
    <Dim size="2"/>
</Sequence>

This represents an array of six (3 times 2) sequence instances. However, the length MAY be different for each of those six instances.

Note that the <Sequence> construct was introduced to replace the concept of variable length dimensions. It turns out that trying to treat variable length dimensions as dimensions causes significant conceptual and implementation difficulties. It is hoped that isolating such variable length objects syntactically is a better representation.

Semantic Notes

  1. Structures and Sequences MAY be freely nested.

Variables

Each variable in a data source MUST have a name, a type and one or more values. Using just this information and armed with an understanding of the definition of the DAP data types, a program can read any or all of the information from a data source.

The DAP variables come in several different types. There are several atomic types, the basic indivisible types representing integers, floating point numbers and the like, and a container type – the Structure or Sequence type – that supports aggregation of other variables into a single unit. A container type may contain both atomic typed variables and other container typed variables, thus allowing nested type definitions.

The DAP variables describe the data when it is being transferred from the server to the client. They do not necessarily describe the format of the data inside the server or client. The DAP defines, for each data type described in this document, a serialized representation, which is the information actually communicated between DAP servers and DAP clients. The serialized representation consists of two parts: the declaration of the type and the serialized encoding of its value(s). The data representation is presented in Section 6.1.

Arrays

An Array is a multi-dimensional indexed data structure. An Array's member variable MUST be of some DAP data type. Array indexes MUST start at zero. Arrays MUST be stored in row-major order (as is the case with ANSI C), which means that the order of declaration of dimensions is significant. The size of each Array's dimensions MUST be given. The total number of elements in an Array is fixed as that given by the product of the size(s) of its dimension(s). Note that a dimension size of zero is illegal.

For practical reasons having to do with current hardware limitations, the total number of bytes allocated to an array must fit in an unsigned 64-bit integer. The largest atomic types currently defined in this document are the floating point double and the (U)Int64 integer types, each of which occupies 8 bytes. This means that the practical limit on the total number of elements is 2^64 / 8 = 2^61. Thus the dimension indices will run from 0 to a maximum of 2^61 - 1. Of course this limit on the maximum number of elements also applies to the maximum dimension size since the total number of elements is the product of all the dimension sizes of the array.

There is a prescribed limit of 64 on the number of dimensions for a variable (i.e. its arity). This is actually larger than will occur in practice. Assuming a dimension must be at least 1 bit in size, this effectively limits the number of dimensions to 61.

Semantic Notes

  1. Simple variables (see below) MAY be arrays.
  2. Structures and Sequences MAY be arrays.

Simple Variables

A simple, dimensioned variable is declared using this XML form.

<Int32 name="name">
  <Dim name="{fqn}"/>
  ...
  <Dim size="{integer}"/>
</Int32>

Note the use of two types of dimensions:

  1. name="{fqn}" – specify the fully qualified name of a Dimension that has been declared previously in the XML document order. See the W3C DOM-3 glossary for the definition of XML document order.
  2. size="{integer}" – specify an anonymous dimension of a given size.

A simple variable is one whose type is one of the Atomic Types (see Section 5.10). The name of the Atomic Type (Int32 in this example) is used as the XML element name. Within the body of that element, it is possible to specify zero or more dimension references. A dimension reference (<Dim.../>) MAY refer to a previously defined dimension declaration. It MAY also define an anonymous dimension with no name, but with a size specified as an integer constant.

Semantic Notes

  1. N.A.

Dimension Ordering

Consider this example.

<Int32  name="i">
    <Dim name="/d1"/>
    <Dim name="/d2"/>
    ...
    <Dim name="/dn"/>
</Int32>

The dimensions are considered ordered from top to bottom. From this, a corresponding left-to-right order [d1][d2]...[dn] can be inferred where the top dimension is the left-most and the bottom dimension is the right-most. The assumption of row-major order means that in enumerating all possible combinations of these dimensions, the right-most is considered to vary the fastest. The terms "right(most)" or "left(most)" refer to this left-to-right ordering of dimensions.
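
To make "the right-most dimension varies the fastest" concrete, the following sketch computes the position of an element in the flattened, row-major ordering. The function name is hypothetical; indices and sizes are listed left-most dimension first.

```python
def row_major_offset(indices, sizes):
    """Position of the element at `indices` in row-major order.

    indices and sizes are given left-most dimension first, i.e. in the
    top-to-bottom order of the <Dim> references.
    """
    offset = 0
    for index, size in zip(indices, sizes):
        if not 0 <= index < size:
            raise IndexError("index out of range for dimension")
        offset = offset * size + index
    return offset

sizes = (2, 3)  # declared as [d1][d2] with d1 = 2 and d2 = 3
for i in range(2):
    for j in range(3):
        print((i, j), row_major_offset((i, j), sizes))
```

Stepping the right-most index by one moves one position in the flattened order, while stepping the left-most index moves by a whole row of three.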

Structure Variables

As with simple variables, a structure variable specifies a type as well as any dimension for that variable. The type, however, is a Structure.

Structures

The XML scheme for a Structure typed variable is as follows.

<Structure name="name">
  {variable definition}
  {variable definition}
  ...
  {variable definition}
  <Dim ... />
  ...
  <Dim ... />
</Structure>

The Structure contains within it a list of variable definitions (Section 5.12). For discussion convenience, each such variable may be referred to as a "field" of the Structure. The list of fields may optionally be followed with a list of dimension references indicating the dimensions of the Structure typed variable.

Semantic Notes

  1. Structure variables MAY be dimensioned.

Sequence Variables

As with simple variables, a sequence variable specifies a type as well as any dimension for that variable. The type, however, is a Sequence.

Sequences

The XML scheme for a Sequence typed variable is as follows.

<Sequence name="name">
  {variable definition}
  {variable definition}
  ...
  {variable definition}
  <Dim ... />
  ...
  <Dim ... />
</Sequence>

The Sequence contains within it a list of variable definitions (Section 5.12). For discussion convenience, each such variable may be referred to as a "field" of the Sequence. The list of fields may optionally be followed with a list of dimension references indicating the dimensions of the Sequence typed variable.

Semantic Notes

  1. Sequence variables MAY be dimensioned.

Coverage Variables and Maps

A "Discrete Coverage" is a concept commonly found in many disciplines, where the term refers to a sampled function with both its domain and range explicitly enumerated by variables. DAP2 uses the name 'Grid' to denote what the OGC calls a 'rectangular grid' [12]. DAP4 expands on this so that other types of discrete coverages (hereafter 'coverage(s)') can be explicitly represented. Note that the DAP2 Grid construct is gone, and is replaced by these coverages, which are more general than DAP2 Grids.

Consider the example coverage function

Temp: lat X lon -> Float32
where
lat and lon are subsets of Float32 in the range [0,360).

The range is, of course, Float32 and the domain is lat X lon. The Temp function as a coverage is a sampled subset of the continuous function and is defined at some finite set of pairs from lat X lon.

In DAP4, the range for a coverage is represented by a variable, Temp in this example, whose values are the range of the sampled function. Because the domain of Temp is a two-tuple (lat,lon), the DAP4 variable must have rank two. In order to complete the sampling of Temp, it is necessary to also define two 'Map' (also called 'coordinate') variables representing the sampling of lat and lon. These two variables, lat and lon, have rank one each. Taken as a whole, this collection of a variable plus maps is called a "grid" for convenience.

Suppose we want to access the value of the Temp function at position (x,y), where x is a value in the lat variable and y is a value in the lon variable. The lat variable is consulted to find ilat such that lat[ilat] = x. Similarly, we want the ilon index such that lon[ilon] = y. We can then obtain Temp(x,y) as the value of Temp[ilat][ilon]. This is probably the simplest example of using coverages; more complex examples exist, such as satellite swaths.
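
The lookup just described can be sketched with ordinary lists. All sample values below are invented for illustration, and the sketch assumes that x and y match map values exactly, as in the description above.

```python
# Hypothetical map (coordinate) variables and range variable.
lat = [10.0, 20.0, 30.0]
lon = [100.0, 110.0]
temp = [
    [1.0, 2.0],   # lat = 10.0
    [3.0, 4.0],   # lat = 20.0
    [5.0, 6.0],   # lat = 30.0
]

def temp_at(x: float, y: float) -> float:
    """Return Temp(x, y) by locating x in lat and y in lon."""
    ilat = lat.index(x)   # index such that lat[ilat] == x
    ilon = lon.index(y)   # index such that lon[ilon] == y
    return temp[ilat][ilon]

print(temp_at(20.0, 110.0))
```

A position that is not present in the maps raises ValueError here; how a real client handles values between samples is outside the scope of this sketch.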

Using OGC coverage terminology, we have this.

  1. The maps (e.g. lat and lon) specify the "Domain"
  2. The array (e.g. Temp) specifies the "Range"
  3. The Grid itself is a "Coverage" per OGC.
  4. The Domain and Range are sampled functions

A map is defined using the following XML scheme.

<Map name="{FQN for some variable previously defined in the DMR}"/>

An example might look like this.

<Float32 name="Temp">
  <Dim name="/lat"/>
  <Dim name="/lon"/>
  <Map name="/lat"/>
  <Map name="/lon"/>
</Float32>

Where the map variables are defined elsewhere like this.

<Float32 name="lat">
  <Dim name="/lat"/>
</Float32>

<Float32 name="lon">
  <Dim name="/lon"/>
</Float32>

The containing variable, Temp in the example, will be referred to as the "array variable".

Semantic Notes

  1. Each map variable MUST have a rank no more than that of the array.
  2. An array variable can have as many maps as desired.
  3. Any map duplicates are ignored.
  4. The order of declaration (top to bottom) MAY be significant.
  5. The fully qualified name of a map must either be in the same lexical scope as the array variable, or the map must be in some enclosing scope.
  6. The set of named "associated dimensions" for a map must be a subset of the set of named "associated dimensions" for the array variable.

The term "associated dimensions" is computed as follows.

  1. The set of associated dimensions is initialized to empty.
  2. For each element mentioned in the fully qualified name (FQN) of the map or the array variable, add any named dimensions associated with that FQN element to the set of associated dimensions (removing duplicates, of course).

In practice, this means that an array variable or map variable must take into account any dimensions associated with any enclosing dimensioned Structure or Sequence.
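
The computation of associated dimensions and the subset rule can be sketched as follows. The representation is an assumption made for illustration: each element of an FQN is identified by its FQN prefix, and a dictionary maps that prefix to the named dimensions declared on it. All names and sample data are hypothetical.

```python
def associated_dimensions(fqn_elements, dims_of):
    """Union of the named dimensions of every element along an FQN."""
    associated = set()   # step 1: start empty
    for element in fqn_elements:
        # step 2: add the element's named dimensions; the set removes duplicates
        associated.update(dims_of.get(element, ()))
    return associated

def map_allowed(map_elements, array_elements, dims_of):
    """True if the map's associated dimensions are a subset of the array's."""
    return (associated_dimensions(map_elements, dims_of)
            <= associated_dimensions(array_elements, dims_of))

# A dimensioned Structure S containing the array variable Temp.
dims_of = {
    "/S": ["/time"],
    "/S.Temp": ["/lat", "/lon"],
    "/lat": ["/lat"],
    "/profile": ["/time", "/height"],
}
array_elements = ["/S", "/S.Temp"]
print(map_allowed(["/lat"], array_elements, dims_of))
print(map_allowed(["/profile"], array_elements, dims_of))
```

The second map is rejected because "/height" is not among the dimensions associated with the array variable, even though "/time" is contributed by the enclosing Structure.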

Attributes and Arbitrary XML

Attributes

Simple attributes are defined using the following XML scheme.

<Attribute name="name" type="{atomicTypeName|EnumType fqn}">
  <Namespace href="http://netcdf.ucar.edu/cf"/> <!--optional-->
  <Value value="value"/>
  ...
  <Value value="value"/>
</Attribute>

or

<Attribute name="name" type="{atomicTypeName|EnumType fqn}" value="value"/>

Attributes may also serve as containers for other attributes (and other containers). In this case, no type is specified, only a name.

<Attribute name="name">
  <Namespace href="http://netcdf.ucar.edu/cf"/>

  <Attribute name="name" type="...">
    ...
  </Attribute>

  ...

  <Attribute name="name" type="...">
    ...
  </Attribute>

</Attribute>

In DAP4, Attributes (not to be confused with XML attributes) are tuples with four components:

  • Name,
  • Type (one of the defined atomic types such as Int16, String, Enum fqn, etc.),
  • Content: either a vector of one or more value declarations (a single value may alternatively be given using the value XML attribute), or a set of contained attributes,
  • Zero or more Namespaces.

This differs slightly from DAP2 Attributes because the namespace feature has been added, although clients can choose to ignore it. For more about namespaces, refer to Section 5.14. The intent of including the namespace information is to simplify interactions with semantic web applications where certain schemas or standards have formal definitions of attributes.

Attributes are typically used to associate semantic metadata with the variables in a data source. Attributes are similar to variables in their range of types and values, except that they are somewhat more limited: they cannot use Structure or Sequence types.

Attributes defined at the top-level within a group are also referred to as "group attributes". Attributes defined at the root group (i.e. Dataset) are "global attributes," which many file formats such as HDF4 or netCDF formally recognize.

While the DAP does not require any particular Attributes, some may be required by various metadata conventions. The semantic metadata for a data source comprises the Attributes associated with that data source and its variables. Thus, Attributes provide a mechanism by which semantic metadata may be represented without prescribing that a data source use a particular semantic metadata convention or standard.

Semantic Notes

  1. DAP4 explicitly treats an attribute with one value as an attribute whose value is a one-element vector.
  2. All of the atomic types are allowed as the type for an attribute.
  3. If the attribute has type Enum, it must also have an XML attribute, enum, that references a previously defined <Enumeration> declaration.
  4. Attribute value constants MUST conform to the appropriate constant format for the given attribute type and as defined in Section 11.
  5. Attribute containers may only contain attributes. Container attributes may not have values; only lowest level (leaf) attributes may have values.
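
The first semantic note implies that a reader can treat both attribute forms uniformly. A minimal sketch, assuming the Attribute element is parsed without an XML namespace and that values are kept as strings; the function name and sample attributes are hypothetical.

```python
import xml.etree.ElementTree as ET

def attribute_values(attr: ET.Element):
    """Return the values of a leaf Attribute as a list of strings.

    The single-value shorthand (a value XML attribute) is returned as a
    one-element list, matching the one-element-vector rule.
    """
    if "value" in attr.attrib:
        return [attr.get("value")]
    return [v.get("value") for v in attr.findall("Value")]

short_form = ET.fromstring('<Attribute name="units" type="String" value="K"/>')
long_form = ET.fromstring(
    '<Attribute name="valid_range" type="Int16">'
    '<Value value="0"/><Value value="100"/>'
    '</Attribute>'
)
print(attribute_values(short_form))
print(attribute_values(long_form))
```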

Arbitrary XML content

DAP4 supports an explicit type to hold "arbitrary XML" markup that provides a way for the protocol to transport information encoded in XML. This is useful for "annotating" meta-data with information more complex than simple attributes. This can be used, for example, for passing semantic web information, or for passing out-of-band information, e.g., about the conversion from some other meta-data system into DAP4.

The form of an otherXML declaration is as follows.

<otherXML name="name">
{arbitrary xml}
</otherXML>

There are no <Value/> elements because the value of otherXML is the XML inside the <otherXML>...</otherXML>. The text content of the otherXML element must be valid XML and must be distinct from the XML markup used to encode elements of the DAP4 data model (i.e., in a practical sense, the content of an <otherXML> attribute will be in a namespace other than DAP4). XML content may appear anywhere that an attribute may appear.

Attribute and OtherXML Specification and Placement

Attribute and OtherXML declarations MAY occur within the body of the following XML elements: Group, Dataset, Dimension, Variable, Structure, Sequence, and Attribute.

Namespaces

All elements of the DMR – Dataset, Groups, Dimensions, Variables, and Attributes – can contain an associated Namespace element. The namespace's value is defined in the form of an XML style URI string defining the context for interpreting the element containing the namespace. Suppose, hypothetically, that we wanted to specify that an Attribute is to be interpreted as a CF convention [15]. One might specify this as follows.

<Attribute name="latitude">
  <Namespace href="http://cf.netcdf.unidata.ucar.edu"/>
  ...
</Attribute>

Note that this is not to claim that this is how to specify a CF convention [15]; this is purely illustrative.

Data Representation

Data can be an elusive concept. Data may exist in some storage format on some disk somewhere, on paper somewhere else, in active memory on some server, or transmitted along some wire between two computers. All these can still represent the same data. That is, there is an important distinction to be made between the data and its representation. The data can consist of numbers: abstract entities that usually represent measurements of something, somewhere. Data also consist of the relationships between those numbers, as when one number defines a time at which some quantity was measured.

The abstract existence of data is in contrast to its concrete representation, which is how we manipulate and store it. Data can be stored as ASCII strings in a file on a disk, or as twos-complement integers in the memory of some computer, or as numbers printed on a page. It can be stored in HDF5 [2], netCDF [5], GRIB[17], a relational database, or any number of other digital storage forms.

The DAP specifies a particular representation of data, to be used in transmitting that data from one computer to another. This representation of some data is sometimes referred to as the serialized representation of that data, as distinguished from the representations used in some computer's memory. The DAP standard outlined in this document has nothing at all to say about how data is stored or represented on either the sending or the receiving computer. The DAP transmission format is completely independent of these details.

Response Format

There are two response formats that a server MUST provide to the client.

  1. DMR-only response
  2. (DMR +) Data response

DMR-Only Response If the client requests only the DMR, then it is returned as a standard XML encoded document. If constraints were specified, then the returned DMR may differ from the full DMR in that, for example, meta-data about only variables specified in the constraint will be returned. The DMR-Only response MUST be self-contained. This means that all declarations directly or transitively mentioned in the selected variables must be included in the returned DMR. Additionally, all attributes associated with the included declarations MUST be included as well.

Data Response The DAP4 data response uses a format very similar to that used for DAP2; the data payload is broken into two pieces. The first part holds metadata describing the names and types of the variables in the response while the second part holds the values of those variables.

The metadata information, sent as part 1 of the Data Response, is the DMR limited to just those variables included in the response. The response, however, MUST be self-contained (in the DMR-Only sense). DAP attributes for all included declarations MUST be included, but MAY be ignored by the receiving client.

Part 2 of the response consists of the binary data for each variable in the order they are listed in the DMR given as the response preface. DAP4 uses a "receiver makes it right" encoding, so servers MAY simply write out binary data as they store it, with the exceptions that floating-point data must be encoded according to IEEE 754 [6] and integer data must use twos-complement notation for signed types. Clients are responsible for performing any byte-swapping operations needed to compute using the values retrieved.

The Data Response is encoded using a chunking scheme (see Section 6.2) that breaks it into N parts where each part is prefixed with a chunk type and chunk byte count header. Chunk types include data and error types, making it simple for servers to indicate to clients that an error occurred during the transmission of the Data Response and (relatively) simple for clients to detect that error.

As with DAP2, the response described here is a document that can be stored on disk or sent as the payload using a number of network transport protocols, HTTP being the primary transport in practice. However, any protocol that can transmit a document can be used to transmit these responses. As such, all critical information needed to decode the response is completely self-contained.

In the rest of this section we will describe the Data Response in the context of DAP4 using HTTP as its transport protocol.

Format of the DMR Part

The first part (part is not to be confused with chunk) of the Data Response always contains the DMR. The Data Response, when DAP is using HTTP as a transport protocol, is the payload for an HTTP response. It is separated from the last of the HTTP response's MIME headers by a single blank line, which MIME defines as a carriage return (ASCII character with byte value of 13) followed by a line feed (ASCII character with byte value of 10). This combination can be abbreviated as CRLF.

Format Related DMR Attributes
The DMR MAY contain attributes that reflect information from the serialized data. Specifically, the following attributes are defined.

  1. <Attribute name="_DAP4_Checksum_CRC32" type="Int32"/> — this attribute may be attached to each top-level variable to show the CRC-32 checksum of the content of that data. See Section 6.2 for more information.
  2. <Attribute name="_DAP4_Little_Endian" type="UInt8"/> — this attribute exists in the root group (the dataset) to indicate if the serialized data byte order is little-endian. The value "1" indicates that little-endian order was used and "0" indicates that big-endian order was used. If missing, little-endian is assumed.

Format of the Data Part The second part of the Data response consists of the serialized variables as specified by the data DMR. The variable serializations are concatenated to form a single binary dataset. If requested, each variable's serialization is followed by a CRC32 checksum.

Relationship to the Chunking format The data response format is technically independent of the chunking format (see 6.1.3).

The assumption is that the DMR will be in a chunk of its own, the first chunk, and the serialized binary data will be in one or more additional chunks. This produces a format like this:

CRLF
{DMR Length in binary form}
{DMR}
CRLF
{Chunk 1 containing some portion of the serialized data}
...
{Chunk n containing the last portion of the serialized data}

In the above and in the following, the form '{xxx}' is intended to represent any instance of the xxx.

How the Chunked Encoding Affects the Data Response Format

In a sense, the chunked encoding does not affect the format of the Data Response at all. Conceptually, the entire binary Data Response is built and then passed through a 'chunking encoder' transforming it into one that is broken up into a series of chunks. That 'chunked document' is then sent as the payload of some transport protocol, e.g., HTTP. In practice, that would be a wasteful implementation because a server would need to hold the entire response in memory. A better implementation would, for HTTP, write the initial parts of the HTTP response (its response code and headers) and then use a pipeline of filters to perform the encoding operations. The intent of the chunking scheme is to make it possible for servers to build responses in small chunks, and once they know those parts have been built without error, send them to the client. Thus a server should choose the chunk size to be small enough to fit comfortably in memory but large enough to limit the amount of overhead spent by the software that encodes and decodes those chunks. When an error is detected, the normal flow of building chunks and sending the data along is broken and an error chunk should be sent (See Section 12).

The DAP4 Serialized Representation

Given a DMR and the corresponding data, the serialized representation is formally described in this section.

A Note on Dimension Ordering

Consider this example.

<Int32  name="i">
  <Dimension name="d1"/>
  <Dimension name="d2"/>
  ...
  <Dimension name="dn"/>
</Int32>

The dimensions are considered ordered top-to-bottom textually. This order is linearized into a corresponding left-to-right order [d1][d2]...[dn]. The assumption of row-major order means that in enumerating all possible combinations of these dimensions, the rightmost is considered to vary the fastest. The terms "right(most)" or "left(most)" refer to this ordering of dimensions.
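
The row-major rule can be illustrated with a short Python sketch. It is not part of the specification; the function name row_major_offset is hypothetical and chosen only for this example. It computes the position, counted in elements, at which a given element appears in the serialized array.

```python
def row_major_offset(indices, sizes):
    """Return the element position of `indices` in a row-major layout.

    `sizes` lists the dimension sizes left to right, [d1][d2]...[dn].
    (Hypothetical helper for illustration only.)
    """
    if len(indices) != len(sizes):
        raise ValueError("one index is required per dimension")
    offset = 0
    for index, size in zip(indices, sizes):
        if not 0 <= index < size:
            raise IndexError("index out of range for its dimension")
        # Each step to the right multiplies the running offset by the
        # size of that dimension, so the rightmost index varies fastest.
        offset = offset * size + index
    return offset


# For an array declared with dimensions [2][4], element [1][2]
# is preceded by the four elements of row 0 and two elements of row 1.
print(row_major_offset((1, 2), (2, 4)))
```

Under this rule the elements of a [2][4] array are serialized in the order [0][0], [0][1], [0][2], [0][3], [1][0], and so on.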

Order of Serialization

The data appearing in a serialized representation is the concatenation of the variables specified in the tree of Groups within a DMR, where the variables in a group are taken in depth-first, top-to-bottom order. The term "top-to-bottom" refers to the textual ordering of the variables in an XML document specifying a given DMR.

If a variable is a Structure variable, then its data representation will be the concatenation of the variables it contains, which will appear in top-to-bottom order.

If a variable is a Sequence variable, then its data representation will have two parts.

  1. A 64-bit signed count of the number of elements in the sequence
  2. Count instances of the corresponding Structure representation (see 5.11.2) for the Sequence.

If a variable has dimensions, then the contents of each dimensioned data item will appear concatenated and taken in row-major order.

Variable Representation

Given a dimensioned variable, it is represented as the N scalar values concatenated in row-major order.

If the variable is scalar, then it is represented as a single scalar value.

Numeric Scalar Atomic Types

For the numeric atomic types, scalar instances are represented as follows. In all cases a consistent byte ordering is assumed, but the choice of byte order is at the discretion of the program that generates the serial representation, typically a server program.

Type Name   Description                   Representation
Int8        Signed 8-bit integer          8 bits
UInt8       Unsigned 8-bit integer        8 bits
Byte        Unsigned 8-bit integer        Same as UInt8
Char        Unsigned 8-bit integer        Same as UInt8
Int16       Signed 16-bit integer         16 bits
UInt16      Unsigned 16-bit integer       16 bits
Int32       Signed 32-bit integer         32 bits
UInt32      Unsigned 32-bit integer       32 bits
Int64       Signed 64-bit integer         64 bits
UInt64      Unsigned 64-bit integer       64 bits
Float32     32-bit IEEE floating point    32 bits
Float64     64-bit IEEE floating point    64 bits

In narrative form: all numeric quantities are used as a raw, unsigned vector of N bytes, where N is 1 for Char, Byte, Int8, and UInt8; it is 2 for Int16 and UInt16; it is 4 for Int32, UInt32, and Float32; and it is 8 for Int64, UInt64, and Float64.

Byte Swapping Rules

If the server chooses to byte swap transmitted values, then the following swapping rules are used.

Size (bytes)   Byte Swapping Rules
1              Not applicable.
2              Byte 0 -> Byte 1
               Byte 1 -> Byte 0
4              Byte 0 -> Byte 3
               Byte 1 -> Byte 2
               Byte 2 -> Byte 1
               Byte 3 -> Byte 0
8              Byte 0 -> Byte 7
               Byte 1 -> Byte 6
               Byte 2 -> Byte 5
               Byte 3 -> Byte 4
               Byte 4 -> Byte 3
               Byte 5 -> Byte 2
               Byte 6 -> Byte 1
               Byte 7 -> Byte 0
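
In every row of the table, byte i of an N-byte value moves to position N - 1 - i, so the rules amount to reversing the bytes of each value. The following Python sketch illustrates this; swap_bytes is a hypothetical helper name, not something defined by DAP4.

```python
import struct


def swap_bytes(raw: bytes) -> bytes:
    """Swap a single 1-, 2-, 4-, or 8-byte value per the table above."""
    if len(raw) not in (1, 2, 4, 8):
        raise ValueError("DAP4 numeric values are 1, 2, 4, or 8 bytes long")
    # Byte i moves to position (N - 1 - i), which is a plain reversal.
    return raw[::-1]


big_endian = struct.pack(">i", 1)        # Int32 value 1, big-endian
little_endian = swap_bytes(big_endian)   # same value, little-endian
print(big_endian.hex(), little_endian.hex())
```

Swapping is applied to each scalar value separately, never to an array or structure as a whole.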

Variable-Length Scalar Atomic Types

The variable length atomic values are all represented as a signed 64-bit count followed by the data of the value.

Type Name   Description                                         Representation
String      Vector of 8-bit bytes representing a UTF-8 String   The number of bytes in the string (in Int64 format) followed by the bytes.
URL         Vector of 8-bit bytes representing a URL            Same as String
Opaque      Vector of un-interpreted 8-bit bytes                The number of bytes in the vector (in Int64 format) followed by the bytes.

In narrative form, instances of String, Opaque, and URL types are represented as a 64 bit length (treated as Int64) of the instance followed by the vector of bytes comprising the value.
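
As an illustration of the length-prefixed form, the following Python sketch encodes and decodes a String value. The function names and the little_endian parameter are assumptions made for this example; the byte order of the Int64 count follows the byte order chosen for the rest of the response.

```python
import struct


def encode_string(text: str, little_endian: bool = True) -> bytes:
    """Serialize a String as an Int64 byte count followed by UTF-8 bytes."""
    payload = text.encode("utf-8")
    fmt = "<q" if little_endian else ">q"
    return struct.pack(fmt, len(payload)) + payload


def decode_string(buf: bytes, offset: int = 0, little_endian: bool = True):
    """Return (text, next_offset) for the String starting at `offset`."""
    fmt = "<q" if little_endian else ">q"
    (length,) = struct.unpack_from(fmt, buf, offset)
    start = offset + 8
    end = start + length
    if length < 0 or end > len(buf):
        raise ValueError("declared length does not fit in the buffer")
    return buf[start:end].decode("utf-8"), end


wire = encode_string("This is a string")
print(len(wire), decode_string(wire))
```

Note that the count is a number of bytes, not characters, so a String containing non-ASCII characters has a count larger than its character length.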

Structure Variable Representation

A Structure typed variable is represented as the concatenation of the representations of the variables contained in the Structure taken in textual top-to-bottom order. This representation may be nested if one of the variables itself is a Structure variable. Dimensioned structures are represented in a form analogous to dimensioned variables of atomic type. The Structure array is represented by the concatenation of the instances of the dimensioned Structure, where the instances are listed in row-major order.

It should be noted that no padding is present in the structure representation. One field's content is immediately followed by the next field's content.
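
The absence of padding can be shown with a small Python sketch. The Structure used here, a UInt8 field followed by a Float64 field, is hypothetical and chosen only because many in-memory layouts would insert alignment bytes between such fields.

```python
import struct


def serialize_flag_and_value(flag: int, value: float,
                             little_endian: bool = True) -> bytes:
    """Serialize a hypothetical Structure { UInt8 flag; Float64 value; }."""
    # The "<" and ">" prefixes select standard sizes with no alignment,
    # so the Float64 immediately follows the single UInt8 byte.
    prefix = "<" if little_endian else ">"
    return struct.pack(prefix + "B", flag) + struct.pack(prefix + "d", value)


wire = serialize_flag_and_value(7, 1.5)
print(len(wire))  # 1 byte for flag plus 8 bytes for value
```

A client that maps the serialized bytes onto a native record type therefore has to account for any alignment its own platform requires.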

Sequence Variable Representation

A Sequence typed variable is represented as a count specifying the number of objects (not bytes) of the sequence followed by count instances of the corresponding Structure using the Structure representation rules. This representation may be nested if one of the variables itself is a Sequence variable. Dimensioned sequences are represented in a form analogous to dimensioned variables of atomic type. The Sequence array is represented by the concatenation of the instances of the dimensioned Sequence, where the instances are listed in row-major order.

Each Sequence variable, then, consists of a length, L say, in Int64 form and giving the number of elements for a specific occurrence of the variable-length dimension. The count, L, is then followed by L instances of the serialized form of the sequence's corresponding structure.
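
A minimal Python sketch of this layout follows. It handles only Sequences whose fields are fixed-size numeric scalars; the function names and the use of a struct format string (row_format) to describe one row are assumptions of the example, not part of DAP4.

```python
import struct


def serialize_sequence(rows, row_format, little_endian=True):
    """Write the Int64 row count, then each row's fields back to back."""
    prefix = "<" if little_endian else ">"
    parts = [struct.pack(prefix + "q", len(rows))]
    for row in rows:
        parts.append(struct.pack(prefix + row_format, *row))
    return b"".join(parts)


def deserialize_sequence(buf, row_format, offset=0, little_endian=True):
    """Return (rows, next_offset) for the Sequence starting at `offset`."""
    prefix = "<" if little_endian else ">"
    (count,) = struct.unpack_from(prefix + "q", buf, offset)
    offset += 8
    row_size = struct.calcsize(prefix + row_format)
    rows = []
    for _ in range(count):
        rows.append(struct.unpack_from(prefix + row_format, buf, offset))
        offset += row_size
    return rows, offset


# A Sequence with one Int32 field ("i") and five rows.
wire = serialize_sequence([(0,), (1,), (2,), (3,), (4,)], "i")
print(len(wire), deserialize_sequence(wire, "i"))
```

Because the count gives the number of rows rather than bytes, a reader must know the row layout from the DMR before it can locate the end of the Sequence.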

Checksums

As an option, checksums will be computed for the values of all the "top-level" variables present in the DMR of a returned response from a server. The term "top-level" means that the variable is not a field of a Structure (or Sequence) typed variable.

The purpose of the checksum is to detect changes in data over time. That is, if a client requests the same variable and the returned checksums are the same, then the client may infer that the data has not changed. The checksum is not intended for transmission error detection, although the client MAY use it for that purpose if it chooses. Note that the value of the checksum will change depending on the byte order used to serialize the data.

The checksum is made visible to the client by adding an attribute to each top-level variable in the DMR. This attribute is named "_DAP4_Checksum_CRC32".

In all cases, the checksum is computed over the serialized representation of each top-level variable. The checksum is computed before any chunking (Section 7) is applied.

If the request to the server is a dmr-only request, then the server will compute the checksum for each variable mentioned in the DMR and will insert the "_DAP4_Checksum_CRC32" attribute in the DMR. Note that this can have significant performance consequences since the server may need to read and serialize all of the data for all of the variables mentioned in the DMR even though that data is not transmitted to the client.

If the request to the server is a data request, then the checksum value will follow the value of the variable in the data part of the response. The computed checksum is appended to the serialized representation for transmission to the client. Note that in this case, the client is expected to add the "_DAP4_Checksum_CRC32" attribute to the DMR.

The default checksum algorithm is CRC32, so each checksum inserted in the serialization will be a 32-bit integer. The checksum integer will use the same endian representation as all other data. Note that CRC32 is not a cryptographically strong checksum, so it is not suitable for detecting man-in-the-middle attacks.
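
The following Python sketch shows one way the checksum of a top-level Int32 array might be computed and appended. It assumes that "CRC32" means the common CRC-32 implemented by zlib, which this document does not state explicitly; the function names are hypothetical.

```python
import struct
import zlib


def serialize_with_checksum(values, little_endian=True):
    """Serialize Int32 values and append the CRC32 of those bytes."""
    prefix = "<" if little_endian else ">"
    body = struct.pack(f"{prefix}{len(values)}i", *values)
    # The checksum covers only the serialized values of the variable.
    crc = zlib.crc32(body) & 0xFFFFFFFF
    # The checksum uses the same byte order as the data itself.
    return body + struct.pack(prefix + "I", crc)


def checksum_matches(buf, little_endian=True):
    """Recompute the CRC32 of the body and compare it with the trailer."""
    prefix = "<" if little_endian else ">"
    body, trailer = buf[:-4], buf[-4:]
    (crc,) = struct.unpack(prefix + "I", trailer)
    return (zlib.crc32(body) & 0xFFFFFFFF) == crc


wire = serialize_with_checksum([1, 2, 3])
print(checksum_matches(wire))
```

Serializing the same values in the other byte order yields different bytes and hence a different checksum, which is why checksums from responses in different byte orders cannot be compared directly.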

Historical Note

The encoding described in Section 6.1 is similar to the serialization form of the DAP2 protocol [3], but has been extended to support arrays with a varying dimension and stripped of redundant information added by various XDR implementations.

The DAP4 Serialization rules are derived from, but not the same as, XDR [10]. The differences are as follows.

  1. Values are encoded using the byte order of the server. This is the so-called "receiver makes it right" rule.
  2. No padding is used.
  3. Floating point values always use the IEEE 754 standard.
  4. One and two-byte values are not converted to four byte values.

Example responses

In these examples, spaces and newlines have been added to make them easier to read. The real responses are more compact. Since this proposal is just about the form of the response, and it really focuses on the BLOB part, there is no mention of 'chunking.' For information on how this BLOB will/could be chunked, see Section 7. NB: Some poetic license is used in the following, and the checksums for single integer values seem silly, but these are really simple examples.

A single scalar

...
Content-Type: application/vnd.opendap.org.dap4.data
CRLF
{chunk count+tag}
<Dataset name="foo">
<Int32 name="x"/>
</Dataset>
CRLF
{chunk count+tag}
x
{checksum}

A single array

...
Content-Type: application/vnd.opendap.org.dap4.data
CRLF
{chunk count+tag}
<Dataset name="foo">
<Int32 name="x">
<Dim size="2"/>
<Dim size="4"/>
</Int32>
</Dataset>
CRLF
{chunk count+tag}
x00 x01 x02 x03 x10 x11 x12 x13
{checksum}

A single structure

...
Content-Type: application/vnd.opendap.org.dap4.data
CRLF
{chunk count+tag}
<Dataset name="foo">
  <Structure name="S">
    <Int32 name="x">
      <Dim size="2"/>
      <Dim size="4"/>
    </Int32>
    <Float64 name="y"/>
  </Structure>
</Dataset>
CRLF
{chunk count+tag}
x00 x01 x02 x03 x10 x11 x12 x13
y
{checksum}

Note that in this example, there is a single variable at the top-level of the root Group, and that is S; so it is S for which we compute the checksum.

An array of structures

...
Content-Type: application/vnd.opendap.org.dap4.data
CRLF
{chunk count+tag}
<Dataset name="foo">
  <Structure name="s">
    <Int32 name="x">
      <Dim size="2"/>
      <Dim size="4"/>
    </Int32>
    <Float64 name="y"/>
    <Dim size="3"/>
  </Structure>
</Dataset>
CRLF
{chunk count+tag}
x00 x01 x02 x03 x10 x11 x12 x13 y x00 x01 x02 x03 x10 x11 x12 x13 y x00 x01 x02 x03 x10 x11 x12 x13 y
{checksum}

Single array with sequence

...
Content-Type: application/vnd.opendap.org.dap4.data
CRLF
{chunk count+tag}
<Dataset name="foo">
  <String name="s"/>
  <Sequence name="a-star">
      <Int32 name="a"/>
  </Sequence>
  <Sequence name="x-star">
      <Int32 name="x"/>
      <Dim size="2"/>
  </Sequence>
</Dataset>
CRLF
{chunk count+tag}
16 This is a string
{checksum}
5 a0 a1 a2 a3 a4
{checksum}
3 x00 x01 x02 6 x00 x01 x02 x03 x04 x05
{checksum}

Notes:

  1. The checksum calculation includes only the values of the variable, not the containing chunk's length bytes.
  2. The Sequence objects are treated 'like strings' and prefixed with a length count. In the last of the three variables, the dimensioned sequence x-star has two sequence instances where the first sequence has 3 elements and the second has 6.

Nested Sequences

The sequence 'x-star' has a field that is itself a sequence. In the example, at the time of serialization 'x-star' has three elements, and the inner sequences (of which there are three instances) have three, six, and one element, respectively.

...
Content-Type: application/vnd.opendap.org.dap4.data
CRLF
{chunk count+tag}
<Dataset name="foo">
  <Sequence name="x-star">
      <Sequence name="y-star">
          <Int32 name="z"/>
      </Sequence>
  </Sequence>
</Dataset>
CRLF
{chunk count+tag}
3 3 x00 x01 x02 6 x10 x11 x12 x13 x14 x15 1 x20
{checksum}

DAP4 Chunked Data Representation

An important capability for DAP4 is supporting clients in determining when a data transmission fails. This is especially difficult when sending binary data (Section 6.1). In order to support such a capability, the DAP4 protocol uses a simplified variation on the HTTP/1.1 chunked transmission format [9] to serialize the data part of the response document so that errors are simple to detect. Furthermore, this format is independent of the form or content of that part of the response, so the same format can be used with different response forms or dropped when/if DAP is used with protocols that support out-of-band error signaling, simplifying our ongoing refinement of the protocol.

The data part of a response document is "chunked" in a fashion similar to that outlined in HTTP/1.1. However, in addition to a prefix indicating the size of the chunk, DAP4 includes a chunk-type code. This provides a way for the receiver to know if the next chunk is part of the data response or if it contains an error response (Section 12). In the latter case, the client should assume that the data response has ended, even though the correct closing information was not provided.

Each chunk is prefixed by a chunk header consisting of a chunk type and byte count, all contained in a single four-byte word. The encoding of this word is always network byte order (i.e., big-endian). The chunk type is encoded in the high-order byte of the four-byte word, and the chunk size is given by the three remaining bytes of that word. The maximum chunk size possible is 2^24 bytes. Immediately following the four-byte chunk header will be chunk-count bytes followed by another chunk header. More precisely, the initial four bytes of the chunk are decoded using the following steps.

  1. Treat the 32-bit header as a single, big-endian, unsigned integer.
  2. Convert the integer to the local machine byte order by swapping bytes as necessary (Section 6.2.3.2). Let the resulting integer be called H.
  3. Compute the chunk type by the following expression: type = (H >> 24) & 0xff (Using C-language operators).
  4. Compute the chunk length by the following expression: length = (H & 0x00ffffff) (Using C-language operators).
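
The four steps above can be sketched in Python as follows. The function name decode_chunk_header is hypothetical; struct.unpack with the ">I" format performs steps 1 and 2 by reading the header as a big-endian unsigned integer regardless of the local byte order.

```python
import struct


def decode_chunk_header(header: bytes):
    """Return (chunk_type, length) for a four-byte chunk header."""
    if len(header) != 4:
        raise ValueError("a chunk header is exactly four bytes")
    # Steps 1 and 2: read the header as a big-endian unsigned integer H.
    (h,) = struct.unpack(">I", header)
    # Step 3: the chunk type is the high-order byte.
    chunk_type = (h >> 24) & 0xFF
    # Step 4: the chunk length is the low-order three bytes.
    length = h & 0x00FFFFFF
    return chunk_type, length


print(decode_chunk_header(bytes([0x05, 0x00, 0x01, 0x00])))
```

For the header bytes 05 00 01 00 this yields a chunk type of 5 and a length of 256 bytes.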

The chunk type is determined as a set of one or more flags. Currently, the possible flags are as follows:

Chunk Type Encoding

Bit #   Value of 0                                   Value of 1
0       A data-containing chunk                      The last data chunk
1       The current chunk is not an error chunk.     The current chunk is an "error chunk" and contains an error message.
2       The data in this response is encoded         The data in this response is encoded
        using Big-Endian (i.e. network byte order).  using Little-Endian.

It is possible for a chunk type to have more than one of the flags. So, for example, if the data fits into a single chunk, and we assume little-endian encoding, then its chunk type would be End + LittleEndian.

Error implies End: if the Error flag is set, then bit 0 should be treated as set even if it is not. Note that in order for this to work, the chunk flag values must be powers of two: e.g. 1, 2, 4.

The Endian flag must be set only in the first Data chunk. It applies to the whole response. If set in any subsequent chunk type, it will be ignored.

Chunked Format Grammar

chunked_response: chunklist ;
chunklist: chunk | chunklist chunk ;
chunk: CHUNKTYPE SIZE CHUNKDATA ;

Note that there is a semantic limitation in the definition of 'chunk': the number of bytes in the CHUNKDATA must be equal to SIZE.

Lexical Structure

/* A single 8-bit byte,
   with the encoding 0 = data, 1 = end, 2 = error, 4 = Little-Endian */
CHUNKTYPE = '\x00'|'\x01'|'\x02'|'\x04'|'\x06'
/* A sequence of three 8-bit bytes,
  interpreted as an integer in network byte order */
SIZE = [\0x00-\0xFF][\0x00-\0xFF][\0x00-\0xFF]
CHUNKDATA = [\0x00-\0xFF]*
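
As a rough illustration of the grammar, the following Python sketch reads chunks from a byte stream until an End or Error chunk is seen. The names read_chunks and make_chunk are hypothetical, the flag constants use the values 1, 2, and 4 given above, and the stream is assumed to return exactly the number of bytes requested unless it is exhausted (true of io.BytesIO, not necessarily of a raw socket).

```python
import io
import struct

END, ERROR, LITTLE_ENDIAN = 0x01, 0x02, 0x04


def make_chunk(flags: int, data: bytes) -> bytes:
    """Build one chunk: type byte, three-byte size, then the data."""
    if len(data) > 0x00FFFFFF:
        raise ValueError("data does not fit in a three-byte size field")
    return struct.pack(">I", (flags << 24) | len(data)) + data


def read_chunks(stream):
    """Yield (flags, data) pairs, stopping after an End or Error chunk."""
    while True:
        header = stream.read(4)
        if len(header) != 4:
            raise EOFError("truncated chunk header")
        (h,) = struct.unpack(">I", header)
        flags, size = (h >> 24) & 0xFF, h & 0x00FFFFFF
        data = stream.read(size)
        if len(data) != size:
            raise EOFError("CHUNKDATA is shorter than SIZE")
        yield flags, data
        # Error implies End, so either flag terminates the response.
        if flags & (END | ERROR):
            return


stream = io.BytesIO(make_chunk(LITTLE_ENDIAN, b"abc") + make_chunk(END, b"de"))
for flags, data in read_chunks(stream):
    print(flags, data)
```

A real client would also remember the Little-Endian flag from the first data chunk, since it applies to the whole response.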

Constraints

A request to a DAP4 server for either metadata (the DMR) or data may include a constraint expression. This constraint expression specifies which variables are to be returned and what subset of the data for each variable is to be returned.

This section defines a constraint language that MUST be supported by any implementation claiming to support the DAP4 protocol. The method by which a server is provided with a constraint is specified in Volume 2. But as a typical example, if such a constraint were to be embedded in a URL, then it is presumed that it is prefixed with a "?dap4.ce=constraint-expression" that is appended to the end of the URL.

The DAP4 Constraint Expression (CE) syntax is an extension of the syntax used by DAP2 that adds some important new features for Arrays as well as addressing some ambiguities and structural problems in the DAP2 syntax. In this design we also introduce some new terminology to make the explanation of the CE syntax clearer. Additionally, we use a 'curly brace' notation for datasets to streamline the description of datasets because the XML documents that DAP4 servers produce are verbose and hard for humans to read.

When a client makes a request to a DAP4 server, it MAY send a CE; a missing (or empty) CE is interpreted to mean that the client wants the entire dataset sent. A CE is made up of a list of clauses, each of which names a variable in the dataset that the client would like the server to send to it. Each clause can further be broken down into two parts: the subset expression and the filter expression. There are limitations on the CE clauses depending on variable type. For scalar variables, getting the variable is the only option available, so no filter expression is supported, and if present, the only subset expression allowed is [0] or []. Structure variables can be subset by field but do not support filter expressions (although fields within a Structure may support filtering). Sequences can be subset by field and do support filters. Arrays support index subsets.

Specifically, the new features added for DAP4 constraints include:

  • Using a grouping operator for Structures and Sequences.
  • Sequence filtering expressions explicitly bound to a specific Sequence variable.
  • Multiple, disjoint index subsets.

Terminology used by this section

selection expression
The entire expression passed to the server that is used to choose specific parts of a dataset.
subset
The act of choosing parts of a dataset based on the type of one or more of its variables. We define several types of subsetting operations as follows:
index subsetting
Choosing parts of an array based on the indexes of that array's dimensions. This operation always returns an array of the same rank as the original, although the size of the return array will (likely) be smaller. Index subsetting uses the bracket syntax described subsequently.
field subsetting
Choosing specific variables (fields) from the dataset. A dataset in DAP4 is made up of a number of variables and those may be Structures or Sequences that contain fields. Field subsetting uses the brace syntax described later. One or more fields can be specified using a semicolon (;) as the separator.
filter
A filter is a predicate that can be used to choose sequence rows based on the values of fields of the sequence. The vertical bar (|) is used as a prefix operator for the filter predicate. Filters can be applied to fields of a Sequence. A filter predicate consists of one or more filter subexpressions, separated by commas (,). Implicitly, multiple filter subexpressions are logically ANDed together.
filter subexpression
A simple expression that consists of a single variable/field; the expression is composed from the traditional set of binary and unary operators: comparison operators (=, !=, <, <=, >, >=) for numbers and strings, and a string-specific regular expression comparison operator (~=). The operands of the operators must be either numeric or string constants or a field of the Sequence. Specifically, only atomic-valued, scalar fields can be used in the filter expression.
id
The name of a variable. These must be absolute, with some specific exceptions. Absolute names are fully qualified names (See Section 5.3).

Subsetting Constraints

The simplest constraint is the null string and it means 'return everything' from the dataset. Choosing variables in a dataset is referred to as the subset. To choose a subset of the variables in a dataset, enumerate them in a semicolon-separated list. To choose parts of a Structure, name those parts explicitly using the syntax structure_name{field name} or structure_name.field name. Each DAP4 dataset contains one or more Groups; the top-level Group is always present and is named / (pronounced 'root').

Example: subsetting by variable or field

<Dataset name="vol_1_ce_1" 
  dapVersion="4.0" 
  dmrVersion="1.0" 
  xml:base="file:dap4/test_ce_1.xml"
  xmlns="http://xml.opendap.org/ns/DAP/4.0#"
  xmlns:dap="http://xml.opendap.org/ns/DAP/4.0#">

  <Int32 name="u"/>
  <Int32 name="v"/>
  <Structure name="Point">
    <Int32 name="x"/>
    <Int32 name="y"/>
  </Structure>

</Dataset>

Note: The syntax used for the examples is (hopefully) easier to read than the DAP4 DMR which uses XML; Curly braces indicate hierarchy.

Dataset {
    Int32 u;
    Int32 v;
    Structure {
        Int32 x;
        Int32 y;
    } Point;
} vol_1_ce_1;
Access just u
/u
Access just u and v
/u;/v
Access just x within Point
/Point{x}
Equivalent expression to access just x within Point
/Point.x
<Dataset name="vol_1_ce_2">
  <Int32 name="u"/>
  <Int32 name="v"/>
  <Group name="inst2">
    <Int32 name="u"/>
    <Int32 name="v"/>
    <Structure name="Point">
      <Int32 name="x"/>
      <Int32 name="y"/>
    </Structure>
  </Group>
</Dataset>
Dataset {
    Int32 u;
    Int32 v;
    Group {
        Int32 u;
        Int32 v;
	Structure {
	    Int32 x;
	    Int32 y;
	} Point;
   } inst2;
} vol_1_ce_2;
Access 'top-level' u and v
/u;/v
Access 'top-level' u and v and inst2's u and v
/u;/v;/inst2/u;/inst2/v
Access inst2's u and v
/inst2/u;/inst2/v
Access field x in Point, which is inside the inst2 Group
/inst2/Point{x} or /inst2/Point.x.

Notes

  • Using a semicolon is a change from DAP2, where clauses in the project part of the constraint were separated using a comma (,). We used semicolon because the comma is used elsewhere and using comma here made for a convoluted grammar. We wanted the grammar to be LALR(1) so that both table-driven and recursive-descent parsers would be easy to write.
  • Every name in a constraint should be a fully qualified name, except that if a simple name is referenced inside curly braces (e.g. {x}) for a variable whose type is a structure or sequence type, S say, and "x" is a top-level field in S, then that is allowed.

Array Subsetting in Index Space

Subsetting fixed-size arrays in their index space is accomplished using square brackets. The syntax closely follows that of DAP2, with some extensions. For an array with N dimensions, N sets of brackets are used, even if the array is only subset on some of the dimensions. The names of array variables are fully qualified names (FQNs) so it's possible to name arrays in structures and/or Groups. Array index values are zero-based as with a number of programming languages such as C and Java. Every array has a known starting index value of zero. Within the square brackets, several subexpressions are allowed:

[ ]
return all of the elements for a particular dimension or apply a shared dimension slice (more on this later).
[ n ]
return only the value at a single index, where 0 <= n < N for a dimension of size N. This slicing operator does not reduce the dimensionality of an array, but does return a dimension size of one for the dimension to which this is applied.
[ start : step : last ]
return every value whose index is in the range start <= index <= last and where (index - start) % step == 0. This is the complete version of the syntax.
[ start : last ]
return the values whose index is in the range start <= index <= last.
[ start : ]
return the values whose index is in the range start <= index <= the dimension size - 1.
[ start : step : ]
return every value whose index is in the range start <= index <= dimension size - 1 and where (index - start) % step == 0.

Subsetting can be applied to any array. It can also be applied to a scalar, but in this case, the only legal forms are [0] or [].
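
The bracket forms listed above can be expanded mechanically into the indexes they select. The following is a minimal Python sketch of that expansion, not part of the DAP4 grammar; the function name slice_indexes and its error handling are assumptions made for illustration.

```python
def slice_indexes(spec, size):
    """Expand the text between one pair of brackets into the indexes it selects.

    spec is the bracket content, e.g. "", "7", "10:19", "10:", "0:4:255", "0:4:".
    size is the declared size of the dimension.
    """
    spec = spec.replace(" ", "")
    if spec == "":                        # [ ] selects every index
        return list(range(size))
    parts = spec.split(":")
    if len(parts) == 1:                   # [ n ]
        start, step, last = int(parts[0]), 1, int(parts[0])
    elif len(parts) == 2:                 # [ start : last ] or [ start : ]
        start, step = int(parts[0]), 1
        last = int(parts[1]) if parts[1] else size - 1
    elif len(parts) == 3:                 # [ start : step : last ] or [ start : step : ]
        start, step = int(parts[0]), int(parts[1])
        last = int(parts[2]) if parts[2] else size - 1
    else:
        raise ValueError("malformed slice: %r" % spec)
    if step < 1 or not 0 <= start <= last < size:
        raise ValueError("slice %r is out of range for size %d" % (spec, size))
    # DAP4 ranges include 'last', so the Python range must stop one past it.
    return list(range(start, last + 1, step))
```

Note that the result always keeps the dimension: a single index such as [ 7 ] yields a list of length one rather than a scalar.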

Example: Subsetting in Index Space

<Dataset name="vol_1_ce_3">
  
  <Int32 name="u">
    <Dim size="256"/>
    <Dim size="256"/>
  </Int32>
  <Int32 name="v">
    <Dim size="256"/>
    <Dim size="256"/>
  </Int32>
  <Structure name="Point">
    <Int32 name="x"/>
    <Int32 name="y"/>
    <Dim size="256"/>
  </Structure>
</Dataset>
Dataset {
    Int32 u[256][256];
    Int32 v[256][256];
    Structure {
        Int32 x;
        Int32 y;
    } Point[256];
} vol_1_ce_3;
Access all of u
/u
Access all of Point's x field
/Point{x} or /Point.x. This returns an array of Structures with a single (Int32) element, not an array of Int32.
Access elements 10 through 19 of array Point
/Point[10:19]. DAP4, like DAP2, uses zero-based indexes. This CE will return the ten elements (Structures in this case) at indexes 10 through 19 of the array, that is, its 11th through 20th elements.
Access every 4th element in the Point array
/Point[0:4:255], or /Point[0:4:]. This is a simple decimation operation; this CE would return 64 Structures corresponding to elements at indexes 0, 4, 8, ..., 252 of the array.
The index-space and field subsetting may be combined in the logical way
/Point[0:4:]{x} will return an array of structures (with 64 elements) named Point that contains a single Int32 field named x.

Other possible CEs:

/u[0:4:][0:4:]
every fourth element in both dimensions; this would return 1/16th of the array's data.
/u[][10:19]
elements corresponding to every row and columns 10 through 19.
/u[7][10:19]
elements corresponding to the 8th row and columns 10 through 19.
/u[10:19][10:19]
elements corresponding to rows 10 through 19 and columns 10 through 19.
/u[0:19][0:19]
elements corresponding to rows 0 through 19 and columns 0 through 19.
/u[][]
identical to /u, as are /u[0:][0:] and /u[0:1:][0:1:].

More complex subsetting examples

The data model for DAP4 is very similar to that of a modern structured programming language, where constructor types like Structure may contain any allowed type (including other Structures, etc.) as well as being arrays themselves. The basic syntax for subsetting outlined so far can be applied to the fields of a Structure using braces to enclose the subsetting expressions that apply to the fields of the Structure. This can be applied recursively.

<Dataset name="vol_1_ce_4">
  <Int32 name="u">
    <Dim size="256"/>
    <Dim size="1024"/>
  </Int32>
  <Structure name="Points">
    <Int32 name="x"/>
    <Int32 name="y">
      <Dim size="1024"/>
    </Int32>
    <Int32 name="z">
      <Dim size="256"/>
    </Int32>
    <Dim size="256"/>
  </Structure>
</Dataset>
Dataset {
    Int32 u[256][1024];
    Structure {
        Int32 x;
        Int32 y[1024];
        Int32 z[256];
    } Points[256];
} vol_1_ce_4;
/Points{y[7:256]} or /Points.y[7:256]
Get all of the elements of the Array of Structure Points and for each of those elements get the elements 7 through 256 from the field array y. Do not return the fields x and z.
/Points[0:9]{y[0:9]} or /Points[0:9].y[0:9]
Get the first ten elements of Points and, for each of those, only the first ten elements of the array y.
/Points[0:9]{x;y[0:9]}
Get the first ten elements of Points and, for each of those, return only x and the first ten elements of the array y.
/Points[0:9]
Get the first ten elements of Points (all fields are included)
/Points or /Points[] or /Points[0:]
Get all of Points with the subtle difference that if Points uses a shared dimension, the last of the three CEs will replace that with an anonymous dimension (see the section on shared dimensions, below).
<Dataset name="vol_1_ce_5">
  <Int32 name="u">
    <Dim size="256"/>
    <Dim size="1024"/>
  </Int32>
  <Structure name="Points">
    <Int32 name="x"/>
    <Int32 name="y"/>
    <Structure name="sounding">
      <Int32 name="height">
        <Dim size="1024"/>
      </Int32>
      <Int32 name="pressure">
        <Dim size="1024"/>
      </Int32>
    </Structure>
    
    <Dim size="256"/>
  </Structure>
</Dataset>
Dataset {
    Int32 u[256][1024];
    Structure {
        Int32 x;
        Int32 y;
        Structure {
            Int32 height[1024];
            Int32 pressure[1024];
        } sounding;
    } Points[256];
} vol_1_ce_5;
/Points[0]{x;y;sounding{height[0:8:]}}
Get only the first element of Points and, for that, get the fields x, y and a slice of sounding, where the sounding slice is every 8th element of the field height; the field pressure is elided. An equivalent way of writing this expression is /Points[0]{x;y;sounding.height[0:8:]}. The {} syntax provides an easy way to request x, y and sounding.height[0:8:] without having to repeat /Points[0] three times. A CE like /Points[0].x;/Points[0].y;/Points[0].sounding.height[0:8:] is legal, but /Points[0] will appear only once in the result. A CE where Points is sliced differently in different clauses is not legal. That is, /Points[0].x;/Points[0:10].y;/Points[15].sounding.height[0:8:] is not legal because Points can appear only once in the result but has been sliced three different ways in the CE. In any CE, each variable can be constrained only one way.
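
The rule that each variable can be constrained only one way can be checked before a CE is evaluated. The sketch below assumes the CE has already been split into (variable FQN, index subset) pairs; the function name check_single_constraint is hypothetical, and comparing the subsets as text is a simplification (it would treat [] and [0:] as different).

```python
def check_single_constraint(clauses):
    """clauses: iterable of (variable_fqn, index_subset) pairs, e.g. ("/Points", "[0]").

    Returns a dict mapping each variable to its single index subset, or raises
    ValueError if the same variable is sliced in two different ways.
    """
    seen = {}
    for name, subset in clauses:
        subset = subset.replace(" ", "")
        if name in seen and seen[name] != subset:
            raise ValueError(
                "%s is sliced as %s and as %s in one CE" % (name, seen[name], subset))
        seen[name] = subset
    return seen


# Three clauses that all slice /Points as [0]: accepted.
legal = check_single_constraint(
    [("/Points", "[0]"), ("/Points", "[0]"), ("/Points", "[0]")])
```

Passing the pairs ("/Points", "[0]"), ("/Points", "[0:10]") and ("/Points", "[15]") to the same function raises ValueError at the second clause.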

Array subsetting with Disjoint Index Subsets

As a new feature in DAP4 constraints, an index subset within square brackets can contain multiple, disjoint slices, where each slice uses any of the previously defined slice formats (most generally start:step:last). The disjoint slices are separated by commas.

Using the preceding example (dataset vol_1_ce_4), some disjoint index examples might be as follows.

/u[10:12,19:23]
Access elements 10 through 12 and 19 through 23 of array u. The result will be an array of size 3+5 = 8 elements. The values returned will be, in order,

u[10] u[11] u[12] u[19] u[20] u[21] u[22] u[23].

/u[19:23, 10:12]
Access elements 19 through 23 and 10 through 12 of array u. The result will be an array of size 8, but the values returned will be in a different order, namely

u[19] u[20] u[21] u[22] u[23] u[10] u[11] u[12].

In the event that the slices are not disjoint, the result is undefined.
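
The ordering behavior of disjoint slices can be illustrated with a short Python sketch. It assumes each comma-separated piece uses one of the slice forms defined earlier; the function name disjoint_indexes is hypothetical, and raising an error for overlapping slices is one possible server policy, since the specification leaves that result undefined.

```python
def disjoint_indexes(spec, size):
    """Expand comma-separated slices into indexes, in the order they are written."""
    result = []
    for piece in spec.replace(" ", "").split(","):
        parts = piece.split(":")
        start = int(parts[0])
        if len(parts) == 1:               # n
            step, last = 1, start
        elif len(parts) in (2, 3):        # start:last, start:, start:step:last, start:step:
            step = int(parts[1]) if len(parts) == 3 else 1
            last = int(parts[-1]) if parts[-1] else size - 1
        else:
            raise ValueError("malformed slice: %r" % piece)
        if step < 1 or not 0 <= start <= last < size:
            raise ValueError("slice %r is out of range for size %d" % (piece, size))
        result.extend(range(start, last + 1, step))   # DAP4 ranges include 'last'
    if len(set(result)) != len(result):
        # Assumed policy: reject overlapping slices rather than guess a result.
        raise ValueError("slices in %r are not disjoint" % spec)
    return result
```

With this sketch, "10:12,19:23" and "19:23, 10:12" select the same eight indexes but return them in different orders.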

How Sequences fit into this syntax

The Sequence type is a more general data type in DAP4 than in DAP2, where it was significantly limited. In DAP4, Arrays of Sequences will be supported, as will Sequence fields that are themselves Arrays or Sequences. A Sequence variable is conceptually like a table of rows where each field in the Sequence is a column in the table (or like an array of Structures, where the size of the single array dimension is a secret). Note that while there is a big difference between the value held by a Structure and a Sequence, each has the same subsetting syntax in the CE (although Sequences may have filters applied while Structures may not).

<Dataset name="vol_1_ce_6">
  <Sequence name="s1">
    <Int32 name="x"/>
    <Int32 name="y"/>
  </Sequence>
  
  <Sequence name="s2">
    <Int32 name="x"/>
    <Int32 name="y"/>
    <Dim size="100"/>
  </Sequence>
  
  <Sequence name="s3">
    <Int32 name="z"/>
    <Int32 name="x">
      <Dim size="10"/>
    </Int32>
  </Sequence>
  
  <Sequence name="s4">
    <Int32 name="z"/>
    <Int32 name="x">
      <Dim size="1024"/>
    </Int32>
    <Dim size="100"/>
  </Sequence>
  
</Dataset>
Dataset {
    Sequence {
        Int32 x;
        Int32 y;
    } s1;

    Sequence {
        Int32 x;
        Int32 y;
    } s2[100];

    Sequence {
        Int32 z;
        Int32 x[10];
    } s3;

    Sequence {
        Int32 z;
        Int32 x[1024];
    } s4[100];
} example;
/s1
All of Sequence s1.
/s1{x;y}
Also all of Sequence s1.
/s1{x} or /s1.x
every 'row' of Sequence s1, but just field x.
/s2{x;y}
All one hundred Sequence instances (not rows, but full Sequences) of the Array s2. Same as /s2 and /s2[0:99]{x;y} and /s2[]{x;y}.
/s2[0:9]{x;y}
The first ten Sequence instances of s2. That would be 10 Sequences and for each, both the fields x and y.
/s3{} | z < 10
Every instance of the Sequence s3 where z is less than 10. Note that this is the first example of a filter, a topic that is discussed in much more detail later on.
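
The table-of-rows view of a Sequence can be made concrete with a small Python sketch. It models a Sequence as a list of row dictionaries and an Array of Sequences as a list of such tables; this in-memory representation, the sample values and the helper name project are assumptions for illustration only.

```python
def project(rows, fields):
    """Keep only the named fields of every row; the number of rows is unchanged."""
    return [{name: row[name] for name in fields} for row in rows]


# A Sequence like s1: the number of rows is not part of the declaration.
s1 = [{"x": 1, "y": 10}, {"x": 2, "y": 20}, {"x": 3, "y": 30}]

# /s1{x}: every row of s1, but only field x.
only_x = project(s1, ["x"])

# An Array of Sequences like s2: 100 instances, each with its own row count.
s2 = [[{"x": i, "y": i * i}] * (i % 3) for i in range(100)]

# /s2[0:9]{x;y}: the first ten Sequence instances, both fields of each.
# DAP4's [0:9] includes index 9, so the Python slice is [0:10].
first_ten = [project(instance, ["x", "y"]) for instance in s2[0:10]]
```

Index subsetting selects whole Sequence instances, while the braces select columns within each instance.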

Subsetting and Shared Dimensions

Shared Dimensions provide additional information to indicate that a group of arrays share certain relationships; that specific groups of the arrays form coverages by indicating how dimensions of Maps and Arrays are linked. The DAP4 CE syntax provides a way to slice a Shared Dimension so that slice can be used by all of the arrays that use it without repeating the slicing operation for each Array. The syntax can be read 'Assign the shared dimension X this slice,' where the slice looks like, for example, row=[10:19]. All of the variations of the slice operator possible for an array are accepted for shared dimension slicing. In any CE, all of the shared dimension slicing clauses must precede the variable subsetting clauses.

Note that DAP4 uses XML for its actual grammar and, because that is wordy, this document includes a mock notation. The notation used so far is extended here so that it includes the concepts needed to mimic DAP4's notation for a coverage:

  • The keyword Dimensions introduces a list of symbols and their sizes. (That is the definition of a Dimension in DAP4; a size bound to an identifier.)
  • Arrays where every dimension uses a Dimension to supply its extent are DAP4 Maps. Maps are the arrays that hold the domain values for a coverage.

New 4/15/16

Using Shared Dimensions for array slicing adds some complexity to the processing of constraints. Two cases are important to consider and are shown in the examples.

  • When a request is made for an Array with Maps but the request names only the Array and not the Maps, the assumption is made that the requester intended to receive only the Array and not the Maps. For example, the client might have already requested/received the Maps. Note that in this case the CDMR included with the data response will still include the Map element(s) for the Array, and the receiving client must know that the associated (Map) variable is not present in the response.
  • A second case involves requests for two or more Arrays that share Maps and that constrain (i.e. 'slice') those Maps differently. Because this can introduce a logical inconsistency, when a local dimension slice is applied to an Array's dimension that has a Map, using that local dimension slice will cause the Map to be removed from the data response's CDMR.

The examples make these two cases clearer.

/New

Example of this syntax

<Dataset name="vol_1_ce_7">
  <Dimension name="nlat" size="100"/>
  <Dimension name="nlon" size="50"/>
  
  <Float32 name="lat">
    <Dim name="nlat"/>
  </Float32>
  <Float32 name="lon">
    <Dim name="nlon"/>
  </Float32>
  
  <Float32 name="temp">
    <Dim name="nlon"/>
    <Dim name="nlat"/>
    <Map name="lat"/>
    <Map name="lon"/>
  </Float32>

  <Float32 name="sal">
    <Dim name="nlon"/>
    <Dim name="nlat"/>
    <Map name="lat"/>
    <Map name="lon"/>
  </Float32>
  
  <Float32 name="O2">
    <Dim name="nlat"/>
    <Dim name="nlon"/>
    <Map name="lon"/>
    <Map name="lat"/>
  </Float32>
  
  <Float32 name="CO2">
    <Dim name="nlon"/>
    <Dim name="nlat"/>
    <Dim size="10"/>
    <Map name="lat"/>
    <Map name="lon"/>
  </Float32>
  
</Dataset>
Dataset {
    Dimensions: nlat=100, nlon=50; 
    Float32 lat[nlat];
    Float32 lon[nlon];

    // The Maps lat and lon are used here and define a coverage
    Float32 temp[nlon][nlat];
    Float32 sal[nlon][nlat];
    Float32 O2[nlat][nlon];
    Float32 CO2[nlon][nlat][10];
} shared_dimensions;

Examples of subsetting using shared dimensions

nlat=[0:9];nlon=[10:19];lat[nlat];lon[nlon];temp[nlat][nlon]
This will return Dimensions nlat=10, nlon=10, lat, lon and temp such that lat and lon are 10 element vectors and temp is a 10 x 10 array.

Because the arrays are dimensioned using nlat and nlon in the original DMR, this expression can also be written as nlat=[0:9];nlon=[10:19];lat[];lon[];temp[][] or nlat=[0:9];nlon=[10:19];lat;lon;temp

nlat=[0:9];nlon=[10:19];lat; lon; temp; sal
Same as above, but with both temp and sal included. This example shows how two or more array variables can be accessed along with their Maps without sending multiple copies of the Maps. Similarly, ...
nlat=[0:9];nlon=[10:19];lat; lon
This CE requests just the arrays that hold the domain values, while ...
nlat=[0:9];nlon=[10:19];temp; sal
This CE requests just the arrays that hold the range values. Taken together, the two preceding examples support clients that read the domain values first and then display a map (for example), providing a way for someone to view the data's geographical extent before accessing the values themselves. Also note that there is no restriction that the same shared dimension slices must be used for both requests; like DAP2, each request in DAP4 is stateless.
nlat=[0:9];nlon=[10:19];temp[][]; sal[][]
This CE requests exactly the same data as the previous one, but uses the [] notation to indicate that the shared dimensions should be used for the subset. An example below shows how this notation can be used to mix local and shared dimension slicing.
nlat=[0:4:];nlon=[0:4:];CO2
This CE decimates CO2 by returning every fourth value in the first two dimensions
nlat=[0:4:];nlon=[0:4:];CO2[][][0:4:]
This CE introduces the second meaning for []. When the empty brackets are used for a dimension that corresponds to a shared dimension, it means use the shared dimension slice. This is useful because some arrays contain a mixture of shared and anonymous dimensions and it's desirable to slice both, using a shared dimension slice previously defined where applicable and an anonymous slice where that's needed. This expression will decimate CO2 by four in each of its three dimensions.
nlat=[0:4:];nlon=[0:4:];CO2[][1][0:4:]
To override the slicing provided by a shared dimension slice, simply replace the [] with a local dimension slice.
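
How shared and local dimension slices combine can be sketched by computing the shape of the returned array. The Python below is an illustration under stated assumptions: the function names slice_length and result_shape and their argument layout are hypothetical, and only the slice forms defined earlier are handled.

```python
def slice_length(spec, size):
    """Number of indexes selected by "n", "start:last", "start:", or "start:step:[last]"."""
    parts = spec.split(":")
    start = int(parts[0])
    if len(parts) == 1:
        return 1
    step = int(parts[1]) if len(parts) == 3 else 1
    last = int(parts[-1]) if parts[-1] else size - 1
    return len(range(start, last + 1, step))   # DAP4 ranges include 'last'


def result_shape(dims, local, shared, sizes):
    """dims:   per axis, a shared dimension name (str) or an anonymous size (int).
    local:  per axis, the text inside the brackets in the CE; "" stands for [].
    shared: shared dimension slices from the CE, e.g. {"nlat": "0:4:"}.
    sizes:  declared sizes of the shared dimensions.
    """
    shape = []
    for dim, spec in zip(dims, local):
        size = sizes[dim] if isinstance(dim, str) else dim
        if spec == "" and dim in shared:
            spec = shared[dim]        # [] picks up the shared dimension slice
        shape.append(size if spec == "" else slice_length(spec, size))
    return tuple(shape)


sizes = {"nlat": 100, "nlon": 50}
shared = {"nlat": "0:4:", "nlon": "0:4:"}
co2 = ["nlon", "nlat", 10]            # CO2 as declared in the XML above

all_shared = result_shape(co2, ["", "", ""], shared, sizes)        # CO2
mixed = result_shape(co2, ["", "", "0:4:"], shared, sizes)         # CO2[][][0:4:]
override = result_shape(co2, ["", "1", "0:4:"], shared, sizes)     # CO2[][1][0:4:]
```

A non-empty local slice always wins over the shared dimension slice for that axis, which is the override behavior described above.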

New 4/15/16

temp
This will return only the Array temp. The constraint lat;lon;temp will return three Arrays: the Map Arrays lat and lon and the 'value Array' temp. In both cases the CDMR returned in the response will include mention of the Maps lat and lon. In the first case, where only temp is requested, the client must be savvy (or permissive) enough to realize that the Map Arrays are not present. In summary, it is the requester's responsibility to understand that the Maps are separate variables and must be explicitly requested. Here are example CDMR responses:

The CDMR for the CE temp:

<Dataset name="vol_1_ce_7">
  <Dimension name="nlat" size="100"/>
  <Dimension name="nlon" size="50"/>
    
  <Float32 name="temp">
    <Dim name="nlon"/>
    <Dim name="nlat"/>
    <Map name="lat"/>
    <Map name="lon"/>
  </Float32>

</Dataset>

The CDMR for the CE lat;lon;temp:

<Dataset name="vol_1_ce_7">
  <Dimension name="nlat" size="100"/>
  <Dimension name="nlon" size="50"/>
  
  <Float32 name="lat">
    <Dim name="nlat"/>
  </Float32>
  <Float32 name="lon">
    <Dim name="nlon"/>
  </Float32>
  
  <Float32 name="temp">
    <Dim name="nlon"/>
    <Dim name="nlat"/>
    <Map name="lat"/>
    <Map name="lon"/>
  </Float32>

</Dataset>
nlat=[0:9];nlon=[10:19];lat; lon; temp; sal[][8:9]
This request is almost the same as the third example, but notice that sal uses a local dimension slice for its second dimension. Because sal is declared with the dimensions nlon and nlat, in that order, its second dimension is nlat, so sal will not use the nlat=[0:9] slice that temp uses. To avoid a conflict with the nlat slice and the fact that it is being applied to temp (and lat in this example), applying a local dimension slice to an Array with Maps will cause the associated Maps to be elided from the response's CDMR. For Arrays with no Maps, this has no effect.

The CDMR for the CE nlat=[0:9];nlon=[10:19];lat; lon; temp; sal[][8:9]:

<Dataset name="vol_1_ce_7">
  <Dimension name="nlat" size="10"/> <!-- The effect of nlat=[0:9] -->
  <Dimension name="nlon" size="10"/> <!-- ... nlon=[10:19] -->
  
  <Float32 name="lat">               <!-- We asked for lat and lon -->
    <Dim name="nlat"/>
  </Float32>
  <Float32 name="lon">
    <Dim name="nlon"/>
  </Float32>
  
  <Float32 name="temp">              <!-- ... and temp -->
    <Dim name="nlon"/>
    <Dim name="nlat"/>
    <Map name="lat"/>
    <Map name="lon"/>
  </Float32>

  <Float32 name="sal">              <!-- ... and sal, but... -->
    <Dim name="nlon"/>
    <Dim size="2"/>                 <!-- for this dimension, we use a local dim slice -->
    <Map name="lon"/>               <!-- and thus only one of the two Maps is shown. -->
  </Float32>
    
</Dataset>

/New

Constrained DMR Objects

When a DAP4 server receives a request for a Data response, it must build and return a Data Response Document that contains a text/xml part containing a DMR, a separator and a binary part that contains the data values. The organization of the Data Response Document is described in detail elsewhere in this document. In this section the focus is on the DMR returned in the first part of the response and how it relates to the DMR for the original unconstrained dataset. We refer to the original dataset's DMR as the DMR and the DMR associated with the data response as the CDMR (shorthand for Constrained DMR). A data response can also be generated using a null CE; we consider that a constraint, too.

The DMR contains a number of declarations for the dataset: Enumerations, Dimensions, Attributes, Groups and Variables. Each DMR and CDMR must follow the rules for the DMR described in this specification and, because DAP4 is a stateless protocol, each response from a server must stand on its own. Since a Constraint Expression alters the data returned (limiting variables, changing the size of dimensions and so on), it stands to reason that the contents of the CDMR will vary for any given dataset based on the CE. Furthermore, a goal of DAP4 is to specify that the CDMR be 'minimal' containing no unused definitions.

Because filters alter the values of variables, but not whether a variable is returned, they have no effect on the CDMR. Only the subsetting operators will be discussed here.

Enumerations

An enumeration is included in the CDMR if and only if some variable or attribute in the CE references it. A null CE returns the entire dataset, so it effectively references every variable.

Shared Dimensions

Shared Dimension declarations from the DMR are not included in the CDMR unless the Shared Dimension is used by a variable that has been projected and that variable does not override that shared dimension using a local slicing operation.
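
The inclusion rule for Shared Dimension declarations can be written as a small set computation. In this Python sketch the input layout (one tuple per array axis) and the function name cdmr_dimensions are assumptions chosen for illustration.

```python
def cdmr_dimensions(projected):
    """Return the names of the Shared Dimensions to declare in the CDMR.

    projected maps each projected variable to a list with one entry per axis:
    (shared_dimension_name_or_None, uses_local_slice).
    """
    keep = set()
    for axes in projected.values():
        for name, uses_local_slice in axes:
            # Anonymous axes have no declaration; locally sliced axes no longer
            # refer to the shared dimension.
            if name is not None and not uses_local_slice:
                keep.add(name)
    return keep


# temp uses both shared dimensions; sal overrides its nlat axis with a local slice.
both = cdmr_dimensions({"/temp": [("nlon", False), ("nlat", False)]})
one = cdmr_dimensions({"/sal": [("nlon", False), ("nlat", True)]})
```

A Shared Dimension that no projected variable still uses drops out of the result, which keeps the CDMR free of unused definitions.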

Variables

Each clause in the constraint must specify a variable and that variable will be declared in the CDMR. The variable must be referenced by a FQN.

Array Variables

Array variables follow all the rules for Variables, with the additional condition that their dimensions may appear altered depending on the CE. If local slicing operations are used, then the sliced dimensions will have the size given by the slice operator, not the size shown in the full dataset's DMR. If a shared dimension is sliced and the Array uses that slice, then its size will reflect that. Arrays may mix shared dimension slices and local slices and the result must be correctly reflected in the specific variable's declaration.

Note that slicing never affects the rank of an array.

Structure Variables

If the variable is a Structure, then either the entire Structure is included or a subset of its fields will be included in the variable declaration where the fields are those specifically mentioned in a constraint projection. As with all other variables, each variable in the structure will have the same rank and type as the original declaration in the DMR.

Sequence Variables

If the variable is a Sequence, then for declaration purposes, it is treated like a Structure (as above). Note that applying a filter to a Sequence will not change its declaration form because the number of records in the sequence is not specified in the DMR. Note also that mentioning a Sequence field in the filter does not necessarily mean it will be included in the CDMR. It will only be included if it is mentioned in the projection part of the constraint clause.

Groups

Each declaration in the CDMR that corresponds to a declaration in the DMR will cause its containing group (and that group's parents) to be included in the CDMR. This ensures that the FQN for a declaration in the CDMR is the same as in the DMR.
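
The set of Groups that must appear in the CDMR follows directly from the FQNs of the included declarations. A minimal Python sketch, assuming FQNs contain no escaped "/" characters; the function name groups_for is hypothetical, and the root group is omitted because the Dataset element is always present.

```python
def groups_for(fqns):
    """Return the FQNs of every Group that must appear in the CDMR."""
    groups = set()
    for fqn in fqns:
        parts = fqn.strip("/").split("/")[:-1]    # drop the declaration's own name
        for depth in range(1, len(parts) + 1):
            groups.add("/" + "/".join(parts[:depth]))
    return groups


# /inst2/Point lives in group /inst2; /u lives in the root group.
needed = groups_for(["/inst2/Point", "/u"])
```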

Attributes

Attributes are unaffected by the CE and are simply included in the CDMR, with the stipulation that attributes for variables that are not included in the CDMR won't be part of the CDMR. Essentially, DAP4 views those attributes as part of the variables, and explicitly excluding the variable from the CDMR (by providing a CE that does not include it) excludes its attributes too. Group level attributes will be included if and only if that group appears in the CDMR.

There is one situation that bears mention, however. Many datasets contain variables which include attributes that describe domain-specific values for the variable's value(s). For example, imagine an atmospheric profile that includes information about the minimum and maximum temperatures of that profile. If the values are stored in an array and the array is sliced so that only a subset of values is returned, the attributes will provide correct values for the original data but possibly not for the data returned in the response, because the slicing operation has removed some of the values of the array. Because DAP4 is a domain-neutral protocol, it has no knowledge about how the values of a specific attribute relate to the values of the variable and cannot adjust the values of the attribute to match the CE.
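
The following Python fragment illustrates the situation with made-up numbers. The attribute names min_temp and max_temp and the temperature values are hypothetical; they only stand in for the kind of domain-specific attribute described above.

```python
temps = [271, 265, 280, 290, 301, 288]

# Attributes as stored with the full variable; the server passes them through unchanged.
attributes = {"min_temp": min(temps), "max_temp": max(temps)}

# The CE [1:3] keeps indexes 1, 2 and 3 (DAP4 ranges include the last index).
subset = temps[1:4]

# Compare the unchanged attributes with the values actually returned.
stale_min = attributes["min_temp"] != min(subset)
stale_max = attributes["max_temp"] != max(subset)
```

Here the minimum happens to survive the slice but the maximum does not, so a client cannot assume that either attribute describes the subset it received.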

Filters

While subsetting provides ways to choose data based on the dataset structure and the types of the variables, filters provide a way to choose data based on their values. The values to be returned are denoted using one or more simple predicates. The general syntax for a filter expression is to follow a subset (projection) expression with a pipe (|) and one or more filter predicates. Multiple predicates are separated by commas and the value of the complete predicate is the logical AND of the comma-separated subexpressions.

Filter expressions can only be applied to Sequence variables (or arrays of them). In each case the result of the filter operation returns the same type variable. A Sequence variable is essentially a table of values and thus can be thought of as containing a number of rows and the filter expression is applied to each row in the order those rows are provided to the expression evaluator. Every row that satisfies the predicate will be included in the value returned; those that don't will not be included in the result. Note that no new values are computed by these operations; no interpolations, means, etc., are performed.
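
Row-by-row filtering with ANDed predicates can be sketched in a few lines of Python. The list-of-dictionaries representation of a Sequence, the sample rows and the function name apply_filter are assumptions for illustration; a real server would evaluate parsed predicates rather than Python lambdas.

```python
def apply_filter(rows, predicates):
    """Return, in their original order, the rows that satisfy every predicate."""
    return [row for row in rows if all(p(row) for p in predicates)]


# A Sequence with an Int32 field z and a ten-element array field x.
s3 = [
    {"z": 4, "x": [0] * 10},
    {"z": 12, "x": [1] * 10},
    {"z": 9, "x": [2] * 10},
]

# | z < 10
low_z = apply_filter(s3, [lambda row: row["z"] < 10])

# | z < 10, z > 4    (comma-separated predicates are ANDed)
mid_z = apply_filter(s3, [lambda row: row["z"] < 10, lambda row: row["z"] > 4])
```

Rows are only kept or dropped; no field value is modified or computed.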

The behavior of filtering expressions on Sequences will be covered in the following sections.


Filters and more complex data types

The basic syntax for filters is that there is a subsetting expression, a pipe (|) and then one or more filter predicates. This syntax can appear any place a selection expression can appear, so it can be used inside braces when an Array or Sequence is a field of a Structure or Sequence. Note that the filter expression prefix operator binds to the index subset immediately to its left at the same level (i.e. eliding braces). Some examples follow.

Example: Filters on complex types

<Dataset name="vol_1_ce_9">
  <Sequence name="Points1">
    <Int32 name="x">
      <Dim size="100"/>
    </Int32>
    <Int32 name="y"/>
  </Sequence>
  
  <Sequence name="Points2">
    <Int32 name="x"/>
    <Int32 name="y"/>
    <Sequence name="sounding">
      <Int32 name="depth"/>
      <Int32 name="temp"/>
    </Sequence>
  </Sequence>

  <Sequence name="Points3">
    <Int32 name="x"/>
    <Int32 name="y"/>
    <Sequence name="sounding">
      <Int32 name="depth"/>
      <Int32 name="temp"/>
    </Sequence>
    <Dim size="20"/>
  </Sequence>
  
  <Structure name="Points4">
    <Int32 name="x"/>
    <Int32 name="y"/>
    <Sequence name="raw">
      <Int32 name="depth"/>
      <Int32 name="temps">
        <Dim size="4"/>
      </Int32>
    </Sequence>
    <Dim size="100"/>
  </Structure>
  
</Dataset>
Dataset {
    Sequence {
        Int32 x[100];
        Int32 y;
    } Points1;

    Sequence {
        Int32 x;
        Int32 y;
        Sequence {
            Int32 depth;
            Int32 temp;
        } sounding;
    } Points2;

    Sequence {
        Int32 x;
        Int32 y;
        Sequence {
            Int32 depth;
            Int32 temp;
        } sounding;
    } Points3[20];

    Structure {
        Int32 x;
        Int32 y;
        Sequence {
            Int32 depth;
            Int32 temps[4];
        } raw;
    } Points4[100];

} complex_types_example;
/Points1{x[0:9]}|y<3
For the Sequence Points1, return the rows of data where y is less than 3. In those rows, subset x so that only the first ten elements are included. Note that y is mentioned in the filter, but not in the selection so it will not appear in the resulting DMR.
/Points2{x; y; sounding | depth > 20} | x > 17
This shows, without the added complexity of an array, how filter expressions associate with Sequences. For the sequence sounding the filter expression can use only depth and temp (and constants). When filtering the values of a child sequence, the sequence name must be used and thus the names of all of the fields of the parent sequence needed in the result must be listed.
/Points3[10:19] { x; y; sounding | depth > 10 } | 20 < x < 40, y <35
This selection expression first finds the index subset of Points3 and arranges to return the fields x, y and sounding where x and y satisfy the predicates 20 < x < 40 and y < 35; for the field sounding, which is itself a Sequence, it will return both fields where depth > 10. This example points out an important aspect of the syntax and of expression evaluation: the filter predicates are evaluated after the index and variable and/or field subsetting. The complete filter predicates can be evaluated in any order (i.e., the 20 < x < 40, y < 35 and depth > 10 predicates can be evaluated in any order). The order of evaluation of the filter predicate subexpressions (i.e., 20 < x < 40 and y < 35) is also unspecified.
/Points4[3:2:8] {x; y; raw{temps[2] | temps > 7,ND=-1}}
In this expression the temps field of the Sequence raw is still an Array, it's just an Array with a single element, which illustrates that neither the subsetting nor filtering operations alter the types of the variables.
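
How a filter on a parent Sequence and a filter on a child Sequence associate can be sketched in Python for the Points2 example above. The nested list-of-dictionaries representation, the sample rows and the function name subset_points2 are assumptions; the sketch also assumes that a parent row is kept even when none of its child rows pass the inner filter.

```python
def subset_points2(points2, min_x, min_depth):
    """Sketch of /Points2{x; y; sounding | depth > min_depth} | x > min_x."""
    result = []
    for row in points2:
        if not row["x"] > min_x:          # outer filter, applies to rows of Points2
            continue
        result.append({
            "x": row["x"],
            "y": row["y"],
            # inner filter, applies to the rows of this row's child Sequence
            "sounding": [s for s in row["sounding"] if s["depth"] > min_depth],
        })
    return result


points2 = [
    {"x": 5, "y": 1, "sounding": [{"depth": 30, "temp": 4}]},
    {"x": 20, "y": 2, "sounding": [{"depth": 10, "temp": 9}, {"depth": 25, "temp": 6}]},
]

# /Points2{x; y; sounding | depth > 20} | x > 17
rows = subset_points2(points2, 17, 20)
```

The inner predicate only sees the fields of sounding, and the outer predicate only sees the fields of Points2, matching the scoping described for this example.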


References

  1. Caron, J., Unidata's Common Data Model Version 4, 2012 (http://www.unidata.ucar.edu/software/netcdf-java/CDM/).

  2. Folk, M. and E. Pourmal, HDF5 Data Model, File Format and Library — HDF5 1.6, Category: Recommended Standard January 2007 NASA Earth Science Data Systems Recommended Standard ESDS-RFC-007, 2007 (http://earthdata.nasa.gov/sites/default/files/esdswg/spg/rfc/ese-rfc-007/ESDS-RFC-007v1.pdf).

  3. Gallagher J., N. Potter, T. Sgouros, S. Hankin, and G. Flierl, The Data Access Protocol—DAP 2.0, NASA Earth Science Data Systems Recommended Standard ESE-RFC-004.1.2 (http://opendap.org/pdf/ESE-RFC-004v1.2.pdf).

  4. Gosling, J., B. Joy, G. Steele, G. Bracha, A. Buckley, The Java™ Language Specification — 7th Edition, Oracle Corporation, 2012 (http://docs.oracle.com/javase/specs/jls/se7/html/).

  5. Hartnett, E., netCDF-4/HDF5 File Format, NASA Earth Science Data Systems Recommended Standard ESDS-RFC-022, 2011 (http://earthdata.nasa.gov/sites/default/files/field/document/ESDS-RFC-022v1.pdf).

  6. IEEE, IEEE Standard for Binary Floating-Point Arithmetic, ANSI/IEEE Std 754-1985, Digital Object Identifier: 10.1109/IEEESTD.1985.82928, 1985.

  7. The Internet Society, IETF RFC 2119: Key words for use in RFCs to Indicate Requirement Levels , 1997 (http://tools.ietf.org/html/rfc2119).

  8. The Internet Society, IETF RFC 2396: Uniform Resource Identifiers (URI): Generic Syntax , 1998 (http://tools.ietf.org/html/rfc2396).

  9. The Internet Society, IETF RFC 2616: Hypertext Transfer Protocol — HTTP/1.1 , 1999 (http://tools.ietf.org/html/rfc2616).

  10. The Internet Society, IETF RFC 4506: XDR: External Data Representation Standard, 2006 (http://tools.ietf.org/html/rfc4506).

  11. ISO/IEC, Information technology — Portable Operating System Interface (POSIX) — Part 2: Shell and Utilities, ISO/IEC 9945-2,1993 (http://www.iso.org/iso/catalogue_detail.htm?csnumber=17841).

  12. The Open Geospatial Consortium Inc., Abstract Specifications, (http://www.opengeospatial.org/standards/as).

  13. The Organization for the Advancement of Structured Information Standards, RELAX NG Specification, Committee Specification: 2001, J. Clark, M. Makoto (eds.) (http://relaxng.org/spec-20011203.html).

  14. The Unicode Consortium. The Unicode Standard, Version 6.2.0, ISBN 978-1-936213-07-8, 2012.

  15. Unidata, CF Metadata, (http://www.cfconventions.org/).

  16. W3C, Extensible Markup Language (XML) 1.0, T. Bray, J. Paoli, C. M. Sperberg-McQueen, E. Maler, F. Yergeau (eds.), Fifth Edition. 2008 (http://www.w3.org/TR/2008/REC-xml-20081126/).

  17. World Meteorological Organization, FM 92 GRIB, edition 2, version 2, 2003 (http://www.wmo.int/pages/prog/www/DPS/FM92-GRIB2-11-2003.pdf).

Appendices

Appendix 1. DAP4 DMR Syntax as a RELAX NG Schema

This RELAX NG grammar is the definitive formal grammar for the DMR.

<!-- RELAX NG Grammar -->
<!-- Date: June 15, 2012 -->
<!-- Last Revised: November 23, 2012 -->

<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0"
         datatypeLibrary="http://xml.opendap.org/datatypes/dap4"
         ns="http://xml.opendap.org/ns/DAP/4.0#"
         >
<start>
  <ref name="dataset"/>
</start>

<define name="dataset">
  <element name="Dataset">
    <a:documentation>
        Semantic restriction: dapVersion, dmrVersion are required.
    </a:documentation>
    
    <attribute name="dapVersion"><data type="dap4_string"/></attribute>
    <attribute name="dmrVersion"><data type="dap4_string"/></attribute>

    <ref name="groupbody"/>
  </element>
</define>

<define name="groupdef">
  <element name="Group">
    <ref name="groupbody"/>
  </element>
</define>

<define name="groupbody">
  <attribute name="name"><data type="dap4_id"/></attribute>
  
  <zeroOrMore>
    <ref name="dimdef"/>
  </zeroOrMore>
  <zeroOrMore>
    <ref name="enumdef"/>
  </zeroOrMore>
  <zeroOrMore>
    <ref name="variable"/>
  </zeroOrMore>
  <zeroOrMore>
    <ref name="metadata"/>
  </zeroOrMore>
  <zeroOrMore>
    <ref name="groupdef"/>
  </zeroOrMore>

</define>

<define name="enumdef">
  <element name="Enumeration">
    <attribute name="name"><data type="dap4_id"/></attribute>
    <attribute name="basetype">
        <choice> <!-- Must be consistent with atomictype and variable -->
            <value>Byte</value> <!-- equivalent to UInt8 -->
            <value>Int8</value>
            <value>UInt8</value>
            <value>Int16</value>
            <value>UInt16</value>
            <value>Int32</value>
            <value>UInt32</value>
            <value>Int64</value>
            <value>UInt64</value>
        </choice>
    </attribute>
    <oneOrMore><ref name="enumconst"/></oneOrMore>
  </element>
</define>

<define name="enumconst">
  <element name="EnumConst">
    <attribute name="name"><data type="dap4_id"/></attribute>
    <attribute name="value"><data type="dap4_integer"/></attribute>
  </element>
</define>

<define name="namespace">
  <zeroOrMore>
    <element name="Namespace">
      <attribute name="href"><data type="dap4_uri"/></attribute>
    </element>
  </zeroOrMore>
</define>

<define name="dimdef">
  <element name="Dimension">
    <a:documentation>
      A Dimension is a binding of a name to a size; when two or more variables
      use the same 'name' it can be inferred that they 'share' that dimension.
      The 'size' attribute must be a positive integer.
    </a:documentation>
    <attribute name="name"><data type="dap4_id"/></attribute>
    <attribute name="size"><data type="dap4_dim"/></attribute>
    <ref name="metadatalist"/>
  </element>
</define>

<define name="dimref">
  <element name="Dim">
    <optional>
        <attribute name="name"><data type="dap4_fqn"/></attribute>
    </optional>
    <optional>
      <attribute name="size">
          <data type="dap4_dim"/>
      </attribute>
    </optional>
  </element>
</define>

<!-- Atomictype define is only a way
     to list the set of atomictypes;
     it is never used in the grammar
-->
<define name="atomictype">
  <!-- This must be consistent with "variable" below -->
  <choice>
    <value>Char</value>
    <value>Byte</value>
    <value>Int8</value>
    <value>UInt8</value>
    <value>Int16</value>
    <value>UInt16</value>
    <value>Int32</value>
    <value>UInt32</value>
    <value>Int64</value>
    <value>UInt64</value>
    <value>Float32</value>
    <value>Float64</value>
    <value>String</value>
    <value>URL</value>
    <value>Opaque</value>
    <value>Enum</value>
  </choice>
</define>

<define name="variable">
  <choice>
    <ref name="simplevariable"/>
    <ref name="structurevariable"/>
    <ref name="sequencevariable"/>
  </choice>
</define>

<define name="simplevariable">
  <choice>
    <!-- Following  must be consistent with "atomictype" -->
    <element name="Char"   ><ref name="variabledef"/></element>
    <element name="Byte"   ><ref name="variabledef"/></element>
    <element name="Int8"   ><ref name="variabledef"/></element>
    <element name="UInt8"  ><ref name="variabledef"/></element>
    <element name="Int16"  ><ref name="variabledef"/></element>
    <element name="UInt16" ><ref name="variabledef"/></element>
    <element name="Int32"  ><ref name="variabledef"/></element>
    <element name="UInt32" ><ref name="variabledef"/></element>
    <element name="Int64"  ><ref name="variabledef"/></element>
    <element name="UInt64" ><ref name="variabledef"/></element>
    <element name="Float32"><ref name="variabledef"/></element>
    <element name="Float64"><ref name="variabledef"/></element>
    <!-- Made 'string' capitalized. jhrg -->
    <element name="String" ><ref name="variabledef"/></element>
    <!-- Added URL type. jhrg -->
    <element name="URL" ><ref name="variabledef"/></element>
    <element name="Opaque"><ref name="variabledef"/></element>
    <element name="Enum">
      <attribute name="enum"><data type="dap4_fqn"/></attribute>
      <ref name="variabledef"/>
    </element>
  </choice>
</define>

<define name="variabledef">
  <attribute name="name"><data type="dap4_id"/></attribute>
  <zeroOrMore>
    <choice>
      <ref name="dimref"/>
      <ref name="mapref"/>
      <ref name="metadata"/>
    </choice>
  </zeroOrMore>
</define>

<define name="mapref">
  <element name="Map">
    <attribute name="name"><data type="dap4_fqn"/></attribute>
  </element>
</define>

<define name="structurevariable">
  <element name="Structure">
    <attribute name="name"><data type="dap4_id"/></attribute>
    <zeroOrMore>
      <choice>
        <ref name="dimref"/>
        <ref name="variable"/>
        <ref name="metadata"/>
      </choice>
    </zeroOrMore>
  </element>
</define>

<define name="sequencevariable">
  <element name="Sequence">
    <attribute name="name"><data type="dap4_id"/></attribute>
    <zeroOrMore>
      <choice>
        <ref name="dimref"/>
        <ref name="variable"/>
        <ref name="metadata"/>
      </choice>
    </zeroOrMore>
  </element>
</define>

<define name="metadatalist">
  <zeroOrMore>
    <ref name="metadata"/>
  </zeroOrMore>
</define>

<define name="metadata">
    <choice>
    <ref name="otherxml"/>
    <ref name="attribute"/>
    </choice>
</define>

<define name="attribute">
  <choice>
    <ref name="atomicattribute"/>
    <ref name="containerattribute"/>
  </choice>
</define>

<define name="atomicattribute">
  <element name="Attribute">
      <attribute name="name"><data type="dap4_id"/></attribute>
      <a:documentation>
        Semantic constraint: type must be compatible
        with the set of attribute value types
      </a:documentation>
      <attribute name="type">
        <choice>
          <value>Char</value>
          <value>Byte</value>
          <value>Int8</value>
          <value>UInt8</value>
          <value>Int16</value>
          <value>UInt16</value>
          <value>Int32</value>
          <value>UInt32</value>
          <value>Int64</value>
          <value>UInt64</value>
          <value>Float32</value>
          <value>Float64</value>
          <value>String</value>
          <value>URL</value>
          <value>Enum</value>
          <value>Opaque</value>
        </choice>
      </attribute>
      <optional>
          <ref name="namespace"/>
      </optional>
      <zeroOrMore>
	<choice>
          <element name="Value">
              <attribute name="value">
                <choice> <!-- technical ambiguity -->
                    <data type="dap4_integer"/>
                    <data type="dap4_float"/>
                    <data type="dap4_opaque"/>
                    <data type="dap4_char"/>
                    <data type="dap4_string"/>
                    <data type="dap4_fqn"/> <!-- for enum types -->
                </choice>
              </attribute>
         </element>
	 <element name="Value"><data type="dap4_text"/></element>
	</choice>
      </zeroOrMore>
  </element>
</define>

<define name="containerattribute">
  <element name="Attribute">
    <attribute name="name"><data type="dap4_id"/></attribute>
    <zeroOrMore>
	<ref name="attribute"/>
    </zeroOrMore>
  </element>
</define>

<define name="otherxml">
  <element name="OtherXML">
    <ref name="arbitraryxml"/>
  </element>
</define>

<define name="arbitraryxml">
    <element>
      <anyName/>
      <zeroOrMore>
        <choice>
          <attribute>
            <anyName/>
          </attribute>
          <text/>
          <ref name="arbitraryxml"/>
        </choice>
      </zeroOrMore>
    </element>
</define>
</grammar>

Appendix 2. DAP4 RELAX NG Lexical Elements

Within the RELAXNG DAP4 grammar there are markers for occurrences of primitive types such as integers, floats, or strings (ignoring case). These markers typically look like the following when defining an attribute that can occur in the DAP4 DMR.

<attribute name="Principal_Investigator">
  <data type="dap4_string"/>
</attribute>

The "<data type="dap4_string"/>" element specifies the lexical class for the values that this attribute can have. In this case, the "Principal_Investigator" attribute is defined to have a DAP4 string value. Similar notation is used for values occurring as text within an XML element.

The lexical specification later in this section defines the legal lexical structure for such items. Specifically, it defines the format of the following lexical items.

  1. Constants, namely: string, float, integer, character, and opaque.
  2. Identifiers
  3. Fully qualified names (also referred to as FQNs) (Section 5.3).

The specification is written using the extended POSIX regular expression notation [11] with some additions.

  1. Names are assigned to regular expressions using the notation "name = regular-expression"
  2. Named expressions can be used in subsequent regular expressions by using the notation "{name}". Such occurrences are equivalent to textually substituting the expression associated with name for the "{name}" occurrence.

Notes:

  1. The definition of {UTF8} is deferred to the next section.
  2. Comments are indicated using the "//" notation. Standard XML escape formats (&#xDDD; or &{name};) are assumed to be used as needed.
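
The "{name}" substitution convention described above can be sketched in Python as follows. This is only an illustration: the function name expand and the dictionary layout are assumptions made for this sketch, and each substituted expression is wrapped in a non-capturing group (a Python idiom, not part of the notation) so that a following quantifier applies to the whole expression.

```python
import re

# Matches a reference such as {MANTISSA}; names are assumed to be upper-case.
REFERENCE = re.compile(r"\{([A-Z][A-Z0-9]*)\}")

def expand(pattern, definitions):
    """Replace every {name} with its definition until none remain."""
    previous = None
    while previous != pattern:
        previous = pattern
        # Wrap each substitution in a non-capturing group so that a following
        # quantifier such as '?' applies to the whole substituted expression.
        pattern = REFERENCE.sub(
            lambda m: "(?:" + definitions[m.group(1)] + ")", pattern)
    return pattern

# Two named expressions copied from the numeric constant classes.
DEFINITIONS = {
    "EXPONENT": r"[eE][+-]?[0-9]+",
    "MANTISSA": r"[+-]?[0-9]*\.[0-9]*",
}

FLOAT_BODY = expand(r"{MANTISSA}{EXPONENT}?", DEFINITIONS)
print(FLOAT_BODY)
```

An undefined name raises KeyError, and a definition that refers to itself would loop forever; the notation above rules out the latter because names can only be used in subsequent expressions.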

Basic character set definitions

CONTROLS   = [\x00-\x1F] // ASCII control characters

WHITESPACE = [ \r\n\t\f]+

HEXCHAR    = [0-9a-fA-F]

// ASCII printable characters

ASCII = [0-9a-zA-Z !"#$%&'()*+,-./:;<=>?@[\\\]\\^_`|{}~]

Ascii characters that may appear unescaped in Identifiers

This is assumed to be basically all ASCII printable characters except these characters: '.', '/', '"', ''', and '&'. Occurrences of these characters are assumed to be representable using the standard XML &{name}; notation (e.g. &amp;). In this expression, backslash is interpreted as an escape character.

IDASCII=[0-9a-zA-Z!#$%()*+:;<=>?@\[\]\\^_`|{}~]

The Numeric Constant Classes: integer and float

INTEGER    = {INT}|{UINT}|{HEXINT}

INT        = [+-][0-9]+{INTTYPE}?

UINT       = [0-9]+{INTTYPE}?

HEXINT     = {HEXSTRING}{INTTYPE}?

INTTYPE    = ([BbSsLl]|"ll"|"LL")

HEXSTRING  = (0[xX]{HEXCHAR}+)
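
The integer patterns above can be exercised with a short Python sketch. It assumes that HEXCHAR covers only the hexadecimal digits 0-9, a-f, and A-F, and the function name is_dap4_integer is hypothetical. The float patterns are not covered here.

```python
import re

HEXCHAR = r"[0-9a-fA-F]"          # assumption: hexadecimal digits only
INTTYPE = r"(?:ll|LL|[BbSsLl])"   # optional size suffix
INT = rf"[+-][0-9]+{INTTYPE}?"    # signed decimal
UINT = rf"[0-9]+{INTTYPE}?"       # unsigned decimal
HEXINT = rf"0[xX]{HEXCHAR}+{INTTYPE}?"
INTEGER = re.compile(rf"(?:{INT}|{UINT}|{HEXINT})")

def is_dap4_integer(text):
    """Return True when the whole string matches the INTEGER pattern."""
    return INTEGER.fullmatch(text) is not None

for sample in ["42", "-17", "+7L", "0x1F", "123ll", "1.5", "0xZZ"]:
    print(sample, is_dap4_integer(sample))
```

Note that a trailing "b" or "B" after a hexadecimal constant is ambiguous between a hex digit and a size suffix; the pattern accepts the string either way.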

FLOAT      = ({MANTISSA}{EXPONENT}?)|{NANINF}

EXPONENT   = ([eE][+-]?[0-9]+)

MANTISSA   = [+-]?[0-9]*\.[0-9]*

NANINF     = (-?inf|nan|NaN)

The String Constant Class

STRING     = ([^"&<>]|{XMLESCAPE})*

CHAR       = ([^'&<>]|{XMLESCAPE})

URL        = (http|https)[:][/][/][a-zA-Z0-9\-]+
             ([.][a-zA-Z\-]+)+([:][0-9]+)?
             ([/]([a-zA-Z0-9\-._,'\\+%])*)*
             ([?].+)?([#].+)?

The String/URL Constant Class

STRING = "\({SIMPLESTRING}{ESCAPEDQUOTE}?\)*"
SIMPLESTRING = [^"\\]
ESCAPEDQUOTE = \\"

The Opaque Constant Class

OPAQUE = 0x([0-9A-Fa-f] [0-9A-Fa-f])+

There is a semantic constraint that if there is an odd number of hex digits in the opaque constant, a zero hex digit will be added to the end to ensure that the constant represents a set of 8-bit bytes.
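
The padding rule can be illustrated with a minimal Python sketch. It assumes the constant is written as "0x" followed directly by hex digits with no embedded spaces; the function name opaque_to_bytes is hypothetical.

```python
import string

def opaque_to_bytes(constant):
    """Convert an opaque constant such as '0xABC' to a bytes object."""
    if not constant.startswith("0x"):
        raise ValueError("opaque constant must start with 0x")
    digits = constant[2:]
    if not digits or any(c not in string.hexdigits for c in digits):
        raise ValueError("opaque constant must contain hex digits only")
    # Semantic constraint: an odd number of hex digits is padded with a
    # trailing zero digit so the value is a whole number of 8-bit bytes.
    if len(digits) % 2 == 1:
        digits += "0"
    return bytes.fromhex(digits)

print(opaque_to_bytes("0x0102"))
print(opaque_to_bytes("0xABC"))
```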

The Identifier Class

ID         = {IDCHAR}+

IDCHAR     = ({IDASCII}|{XMLESCAPE}|{UTF8})

XMLESCAPE  = [&][#][0-9]+;

The Atomic Type Class

ATOMICTYPE =   Char | Byte
             | Int8 | UInt8 | Int16 | UInt16
             | Int32 | UInt32 | Int64 | UInt64
             | Float32 | Float64
             | String | URL
             | Enum
             | Opaque ;

This list should be consistent with the atomic types in the grammar.

The Fully Qualified Name Class

FQN      = ([/]{EID})+([.]{EID})*
EID      = {EIDCHAR}+
EIDCHAR  =  ({EIDASCII}|{XMLESCAPE}|{UTF8})
EIDASCII = [0-9a-zA-Z!#$%()*+:;<=>?@\[\]\\^_`|{}~]

This should be consistent with the definition in Section 5.3.
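
As an illustration, the FQN pattern can be checked and split with the Python sketch below. It operates on decoded text, so any non-ASCII character stands in for {UTF8}; that substitution, the function name split_fqn, and the returned (path, fields) layout are assumptions of this sketch rather than part of the specification.

```python
import re

EIDASCII = r"[0-9a-zA-Z!#$%()*+:;<=>?@\[\]\\^_`|{}~]"
XMLESCAPE = r"&#[0-9]+;"
NONASCII = r"[^\x00-\x7F]"   # stand-in for {UTF8} on decoded text (assumption)
EID = rf"(?:{EIDASCII}|{XMLESCAPE}|{NONASCII})+"
FQN = re.compile(rf"(?P<path>(?:/{EID})+)(?P<fields>(?:\.{EID})*)")

def split_fqn(fqn):
    """Return (path segments, field segments) for a well-formed FQN."""
    match = FQN.fullmatch(fqn)
    if match is None:
        raise ValueError("not a fully qualified name: " + repr(fqn))
    # '/' and '.' cannot occur inside an EID, so splitting on them is safe.
    path = match.group("path").split("/")[1:]
    fields = match.group("fields").split(".")[1:]
    return path, fields

print(split_fqn("/g1/g2/v.f1.f2"))
```

A name without a leading "/" (for example "v.f1") is rejected, because the pattern requires at least one "/"-prefixed segment.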

Appendix 2. DAP4 Type Definitions

The RELAXNG [13] grammar references the following specific types. For each type, the following table gives the lexical format as defined by the patterns previously given or by specific patterns as listed.

RELAXNG Data Type Name    Lexical Pattern
dap4_integer              {INTEGER}
dap4_float                {FLOAT}
dap4_char                 {CHAR}
dap4_string               {STRING}
dap4_opaque               {OPAQUE}
dap4_id                   {ID}
dap4_fqn                  {FQN}
dap4_uri                  {URL}
dap4_dim                  [1-9][0-9]*

Note that the above lexical element classes are not disjoint. The type element "<data type=.../>" should be sufficient to interpret the type within the DMR.

Appendix 3. UTF-8

The UTF-8 specification [14] defines several ways to validate a UTF-8 string of characters.

The full (most correct) validating version of the UTF8 character set is as follows.

UTF8 =   ([\xC2-\xDF][\x80-\xBF])
       | (\xE0[\xA0-\xBF][\x80-\xBF])
       | ([\xE1-\xEC][\x80-\xBF][\x80-\xBF])
       | (\xED[\x80-\x9F][\x80-\xBF])
       | ([\xEE-\xEF][\x80-\xBF][\x80-\xBF])
       | (\xF0[\x90-\xBF][\x80-\xBF][\x80-\xBF])
       | ([\xF1-\xF3][\x80-\xBF][\x80-\xBF][\x80-\xBF])
       | (\xF4[\x80-\x8F][\x80-\xBF][\x80-\xBF])

The lines of the above expression cover the UTF-8 characters as follows:

  1. non-overlong 2-byte
  2. excluding overlongs
  3. straight 3-byte
  4. excluding surrogates
  5. straight 3-byte
  6. planes 1-3
  7. planes 4-15
  8. plane 16

Note that values from 0 through 127 (ASCII and control characters) are not included in any of these definitions.

The above reference also defines some alternative regular expressions. First, there is what is termed the partially relaxed version of UTF8 defined by this regular expression.

UTF8 =    ([\xC0-\xD6][\x80-\xBF])
        | ([\xE0-\xEF][\x80-\xBF][\x80-\xBF])
        | ([\xF0-\xF7][\x80-\xBF][\x80-\xBF][\x80-\xBF])

Second, there is what is termed the most-relaxed version of UTF8 defined by this regular expression.

UTF8 = ([\xC0-\xD6]...)|([\xE0-\xEF]...)|([\xF0-\xF7]...)

Any conforming DAP4 implementation MUST use at least the most-relaxed expression for validating UTF-8 character strings, but MAY use either the partially-relaxed or the full validation expression.
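
The difference between the full and the partially-relaxed expressions can be seen in a short Python sketch over byte strings. Because the UTF8 definitions exclude values 0 through 127, the sketch adds the single-byte range as an extra alternative in order to validate whole strings; that addition and the names FULL, PARTIAL, and is_valid are assumptions made for illustration.

```python
import re

# Full validating expression, one alternative per line as listed above.
FULL = (rb"[\xC2-\xDF][\x80-\xBF]"
        rb"|\xE0[\xA0-\xBF][\x80-\xBF]"
        rb"|[\xE1-\xEC][\x80-\xBF][\x80-\xBF]"
        rb"|\xED[\x80-\x9F][\x80-\xBF]"
        rb"|[\xEE-\xEF][\x80-\xBF][\x80-\xBF]"
        rb"|\xF0[\x90-\xBF][\x80-\xBF][\x80-\xBF]"
        rb"|[\xF1-\xF3][\x80-\xBF][\x80-\xBF][\x80-\xBF]"
        rb"|\xF4[\x80-\x8F][\x80-\xBF][\x80-\xBF]")

# Partially-relaxed expression.
PARTIAL = (rb"[\xC0-\xD6][\x80-\xBF]"
           rb"|[\xE0-\xEF][\x80-\xBF][\x80-\xBF]"
           rb"|[\xF0-\xF7][\x80-\xBF][\x80-\xBF][\x80-\xBF]")

def is_valid(data, multibyte):
    """Check a byte string against single bytes 0-127 plus `multibyte`."""
    pattern = rb"(?:[\x00-\x7F]|" + multibyte + rb")*"
    return re.fullmatch(pattern, data) is not None

samples = {
    "two-byte character": "\u00e9".encode("utf-8"),
    "overlong encoding": b"\xc0\x80",
    "surrogate": b"\xed\xa0\x80",
}
for label, data in samples.items():
    print(label, is_valid(data, FULL), is_valid(data, PARTIAL))
```

The overlong and surrogate samples are rejected by the full expression but accepted by the partially-relaxed one, which is the trade-off an implementation makes when it chooses the weaker check.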

Appendix 4. LALR(1) Grammar for DMR using Bison Notation

It is convenient to have a Bison grammar that corresponds to the above RELAX NG grammar. If there is a conflict, then the RELAX NG grammar is considered correct.

%start dataset
%%
dataset:
	DATASET_
	xml_attribute_list
	groupbody
	_DATASET
	;

group:
	GROUP_
	ATTR_NAME
	groupbody
	_GROUP
	;

groupbody:
	  %empty
	| groupbody dimdef
	| groupbody enumdef
	| groupbody variable
	| groupbody metadata
	| groupbody group
	;

enumdef:
	ENUMERATION_
	xml_attribute_list
	enumconst_list
	_ENUMERATION
	;

enumconst_list:
	  enumconst
	| enumconst_list enumconst
	;

enumconst:
	  ENUMCONST_ ATTR_NAME ATTR_VALUE _ENUMCONST
	| ENUMCONST_ ATTR_VALUE ATTR_NAME _ENUMCONST
	;

dimdef:
	DIMENSION_
	xml_attribute_list
	metadatalist
	_DIMENSION
	;

dimref:
	  DIM_ ATTR_NAME _DIM
	| DIM_ ATTR_SIZE _DIM
	;

variable:
	  atomicvariable
	| enumvariable
	| structurevariable
	| sequencevariable
	;

atomicvariable:
	atomictype_
	ATTR_NAME
	varbody
	_atomictype
	;

enumvariable:
	ENUM_
	xml_attribute_list
	varbody
	_ENUM
	;

atomictype_:
	  CHAR_
	| BYTE_
	| INT8_
	| UINT8_
	| INT16_
	| UINT16_
	| INT32_
	| UINT32_
	| INT64_
	| UINT64_
	| FLOAT32_
	| FLOAT64_
	| STRING_
	| URL_
	| OPAQUE_
	;

_atomictype:
	  _CHAR
	| _BYTE
	| _INT8
	| _UINT8
	| _INT16
	| _UINT16
	| _INT32
	| _UINT32
	| _INT64
	| _UINT64
	| _FLOAT32
	| _FLOAT64
	| _STRING
	| _URL
	| _OPAQUE
	| _ENUM
	;

varbody:
	  %empty
	| varbody dimref
	| varbody mapref
	| varbody metadata
	;

mapref:
	MAP_
	ATTR_NAME
	metadatalist
	_MAP
	;

structurevariable:
	STRUCTURE_
	ATTR_NAME
	structbody
	_STRUCTURE
	;

structbody:
	  %empty
	| structbody variable
	| structbody dimref
	| structbody mapref
	| structbody metadata
	;

sequencevariable:
	SEQUENCE_
	ATTR_NAME
	sequencebody
	_SEQUENCE
	;

sequencebody:
	  %empty
	| sequencebody dimref
	| sequencebody variable
	| sequencebody mapref
	| sequencebody metadata
	;

metadatalist:
	  %empty
	| metadatalist metadata
	;

metadata:
	  attribute
	;

attribute:
	  atomicattribute
	| containerattribute
	| otherxml
	;


atomicattribute:
	  ATTRIBUTE_
	  xml_attribute_list
	  namespace_list
	  valuelist
	  _ATTRIBUTE
	|
	  ATTRIBUTE_
	  xml_attribute_list
	  namespace_list
	  _ATTRIBUTE
	;

namespace_list:
	  %empty
	| namespace_list namespace
	;

namespace:
	NAMESPACE_
	ATTR_HREF
	_NAMESPACE
	;

containerattribute:
	  ATTRIBUTE_
	  xml_attribute_list
	  namespace_list
	  attributelist
	  _ATTRIBUTE
	;

attributelist:
	  attribute
	| attributelist attribute
	;

valuelist:
	  value
	| valuelist value
	;

value:
	  VALUE_ TEXT _VALUE
	| VALUE_ ATTR_VALUE _VALUE
	;

otherxml:
	OTHERXML_
	xml_attribute_list
	xml_body
	_OTHERXML
	;

xml_body:
	  element_or_text
	| xml_body element_or_text
	;

element_or_text:
	  xml_open
	  xml_attribute_list
	  xml_body
	  xml_close
	| TEXT
	;

xml_attribute_list:
	  %empty
	| xml_attribute_list xml_attribute
	;

xml_attribute:
	  ATTR_BASE
	| ATTR_BASETYPE
	| ATTR_DAPVERSION
	| ATTR_DMRVERSION
	| ATTR_ENUM
	| ATTR_HREF
	| ATTR_NAME
	| ATTR_NAMESPACE
	| ATTR_NS
	| ATTR_SIZE
	| ATTR_TYPE
	| ATTR_VALUE
	;

xml_open:
	  DATASET_
	| GROUP_
	| ENUMERATION_
	| ENUMCONST_
	| NAMESPACE_
	| DIMENSION_
	| DIM_
	| ENUM_
	| MAP_
	| STRUCTURE_
	| SEQUENCE_
	| VALUE_
	| ATTRIBUTE_
	| OTHERXML_
	| CHAR_
	| BYTE_
	| INT8_
	| UINT8_
	| INT16_
	| UINT16_
	| INT32_
	| UINT32_
	| INT64_
	| UINT64_
	| FLOAT32_
	| FLOAT64_
	| STRING_
	| URL_
	| OPAQUE_
	;

xml_close:
	  _DATASET
	| _GROUP
	| _ENUMERATION
	| _ENUMCONST
	| _NAMESPACE
	| _DIMENSION
	| _DIM
	| _ENUM
	| _MAP
	| _STRUCTURE
	| _SEQUENCE
	| _VALUE
	| _ATTRIBUTE
	| _OTHERXML
	| _CHAR
	| _BYTE
	| _INT8
	| _UINT8
	| _INT16
	| _UINT16
	| _INT32
	| _UINT32
	| _INT64
	| _UINT64
	| _FLOAT32
	| _FLOAT64
	| _STRING
	| _URL
	| _OPAQUE
	;

Lexical Tokens for Bison Grammar

The above Bison grammar assumes a corresponding lexer that returns a set of token types (listed below). A token with a trailing underscore represents an opening XML element, and a token with a leading underscore represents a closing XML element. For example, token DATASET_ is <Dataset> and token _DATASET is </Dataset>.

/* XML Element Names */
%token DATASET_ _DATASET
%token GROUP_ _GROUP
%token ENUMERATION_ _ENUMERATION
%token ENUMCONST_ _ENUMCONST
%token NAMESPACE_ _NAMESPACE
%token DIMENSION_ _DIMENSION
%token DIM_ _DIM
%token MAP_ _MAP
%token STRUCTURE_ _STRUCTURE
%token SEQUENCE_ _SEQUENCE
%token VALUE_ _VALUE
%token ATTRIBUTE_ _ATTRIBUTE
%token OTHERXML_ _OTHERXML
%token ERROR_ _ERROR
%token MESSAGE_ _MESSAGE
%token CONTEXT_ _CONTEXT
%token OTHERINFO_ _OTHERINFO

/* XML Element Names for Atomic Types*/
%token CHAR_ _CHAR
%token BYTE_ _BYTE
%token INT8_ _INT8
%token UINT8_ _UINT8
%token INT16_ _INT16
%token UINT16_ _UINT16
%token INT32_ _INT32
%token UINT32_ _UINT32
%token INT64_ _INT64
%token UINT64_ _UINT64
%token FLOAT32_ _FLOAT32
%token FLOAT64_ _FLOAT64
%token STRING_ _STRING
%token URL_ _URL
%token OPAQUE_ _OPAQUE
%token ENUM_ _ENUM

/* XML Attribute Names */
%token ATTR_BASE ATTR_BASETYPE ATTR_DAPVERSION ATTR_DMRVERSION
%token ATTR_ENUM ATTR_HREF ATTR_NAME ATTR_NAMESPACE
%token ATTR_NS ATTR_SIZE ATTR_TYPE ATTR_VALUE 
%token ATTR_HTTPCODE

/* Arbitrary XML Text */
%token TEXT
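
The naming convention for element tokens can be sketched in Python as follows. The function name tag_token and the simplified tag pattern are assumptions of this sketch; a real lexer would also emit the attribute tokens and would treat a self-closing tag as an opening token followed by a closing token.

```python
import re

# Simplified pattern for an opening or closing tag; attributes are skipped.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>")

def tag_token(tag):
    """Map '<Dataset ...>' to DATASET_ and '</Dataset>' to _DATASET."""
    match = TAG.fullmatch(tag)
    if match is None:
        raise ValueError("not an XML tag: " + repr(tag))
    closing, name = match.groups()
    base = name.upper()
    # Leading underscore marks a closing element, trailing an opening one.
    return "_" + base if closing else base + "_"

print(tag_token("<Dataset>"), tag_token("</Dataset>"))
```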

Appendix 5. LALR(1) Grammar for Constraints using Bison Notation

%start constraint
%%
constraint:
	dimredeflist
	clauselist
	;

dimredeflist:
          %empty
        | dimredeflist ';' dimredef
        ;

clauselist:
          clause
        | clauselist ';' clause
        ;

clause:
          projection
	| selection
        ;

projection:
	segmenttree
        ;

segmenttree:
          segment
        | segmenttree '.' segment
        | segmenttree '.' '{' segmentforest '}'
        | segmenttree '{' segmentforest '}'
        ;

segmentforest:
	  segmenttree
	| segmentforest ',' segmenttree
	;

segment:
          NAME
        | NAME slicelist
        ;

slicelist: 
          slice
        | slicelist slice
        ;

slice:
          '[' ']'
        | '[' subsetlist ']'
	;

subsetlist:
	  subset
	| subsetlist ',' subset
	;
	
subset:
           index 
        |  index ':' index 
        |  index ':' index ':' index 
        |  index ':' 
        |  index ':' index ':' 
        ;

index:  INTEGER ;

selection:
        segmenttree '|' filter
        ;

filter:
          predicate
        | predicate ',' predicate  /* ',' == AND */
        | '!' predicate %prec NOT
        ;

predicate:
          primary relop primary
        | primary relop primary relop primary
        | primary eqop primary
        ;

relop:
	  '<' '='
	| '>' '='
	| '<'
	| '>'
	;

eqop:
	  '=' '='
	| '!' '='
	| '~' '='
	;

primary:
          fieldname
        | constant
        | '(' predicate ')'
	;

dimredef: NAME '=' slice ;

fieldname: NAME ;

constant: STRING | INTEGER | DOUBLE | BOOLEAN ;

Lexical Tokens for Bison Grammar for Constraints

The primary lexical tokens for constraints are: NAME, STRING, INTEGER, DOUBLE, BOOLEAN.

These lexemes are intended to match the patterns defined for the RELAX NG grammar.
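
The five forms accepted by the subset rule of the constraint grammar can be sketched in Python as follows. The sketch restricts each index to an unsigned decimal integer and does not handle whitespace; both restrictions, and the function name parse_subset, are assumptions. It returns the indices in the order written, without assigning them a meaning.

```python
import re

# Shapes allowed by the subset rule: True marks a present index, False an
# omitted one, e.g. "3:" has the shape (True, False).
ALLOWED = {
    (True,),
    (True, True),
    (True, True, True),
    (True, False),
    (True, True, False),
}

def parse_subset(text):
    """Parse one subset and return its indices (None where omitted)."""
    parts = text.split(":")
    shape = tuple(part != "" for part in parts)
    if shape not in ALLOWED:
        raise ValueError("not a valid subset: " + repr(text))
    for part in parts:
        if part and re.fullmatch(r"[0-9]+", part) is None:
            raise ValueError("index is not an integer: " + repr(part))
    return tuple(int(part) if part else None for part in parts)

for sample in ["5", "0:9", "0:2:9", "3:", "0:2:"]:
    print(sample, parse_subset(sample))
```

Forms such as ":5" or "1::" are rejected, since the grammar always requires the first index and allows only the final index to be omitted.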