DAP4: Overview

From OPeNDAP Documentation

DAP4 is the result of a joint, multiyear development effort by OPeNDAP and Unidata, funded by a generous grant from NOAA and guided by an advisory committee comprising Mike Folk (THG), Jim Frew (UCSB), Steve Hankin (NOAA), Eric Kihn (NOAA), Chris Lynnes (NASA) and Rich Signell (USGS).

Revision as of 06:37, 28 March 2014

Following two decades of stability and increasing use, DAP2 is being superseded by DAP4, the first substantive revision in the history of the Data Access Protocol (DAP), an open-source endeavor led by OPeNDAP, Inc. The primary and continuing purpose of DAP is to realize remote, selective, data-retrieval as a widely-accepted and well-crafted Web service. This document outlines the fundamental concepts of DAP4, and (targeting those who have already programmed DAP-compatible clients and servers) it highlights how DAP4 differs from DAP2. In the following, DAP refers to DAP4 unless indicated otherwise.

Data Retrieval as a Web Service

The premise underlying DAP4 remains, as in DAP2, that values from data sources—or, notably, from proper subsets—along with pertinent metadata may be acquired remotely and effectively through an appropriately defined Web service, operated near the source data. To a surprising degree, DAP services shield users from idiosyncrasies in source-data formats and storage, so DAP functions as middleware with a further advantage: source-data and users may reside anyplace that has Internet connectivity. OPeNDAP's commitment to open source has fostered several DAP-compatible servers and an even larger number of DAP-compatible client environments, several of which (i.e., servers, clients and client-server libraries) are available at no cost.

DAP is designed for selectively retrieving (but not for storing) data organized as variables or groups of variables. It is well suited to cases where client computers retrieve data stored on remote computers (i.e., servers) networked to the client, especially where data sources are huge (comprising large arrays, for example) but clients typically need only small subsets of them. The protocol is fundamentally stateless (some might say “RESTful”), and it governs how clients pose requests and how servers issue corresponding responses.

DAP’s effectiveness is keyed on the underlying data model. This embraces a rich variety of data types (including multidimensional arrays) and spells out the (type-specific) retrieval operations that clients may request. The simplicity, flexibility and domain-neutrality of the DAP data model (which bears much similarity to that of DAP2) make it effective—as middleware, per the above—across a wide variety of data types and domains. More specifically, a wide variety of data sources, with a wide variety of data schemas, can be mapped onto the DAP model for retrieval and use by client computers and software.

Understanding the DAP Data Model

Summary

[to be written]

A Rough Glossary of Data-Model Entities

Much about DAP may be discerned from a glossary-like list of key entities in its data model. Such a list follows, sequenced for ease of understanding (rather than alphabetically). The descriptions here are not definitive; the formal specification documents take precedence over anything stated here.

Data Source - Though formally outside the DAP Data Model, the term Data Source generally refers to all the datasets (see below) that may be retrieved via DAP from a single server, identified by its domain name. Servers sometimes offer (catalogs and/or inventories of) collections and sub-collections, but the DAP Data Model focuses on granular and sub-granular retrievals.

Dataset - Sometimes called a “granule” in catalog/inventory parlance, a DAP Dataset is represented by a unique (unadorned) URL, and is the highest-level entity (metadata as well as content) described by the DAP Data Model. Clients invoke retrieval operations by adorning the Dataset URL with suffixes and query strings interpreted by the server.

Declarations and the DMR - For a specific Dataset, all aspects of the DAP Data Model (name assignments, structural definitions, etc.) are governed by a formal declarations document. Created as part of making a Dataset DAP-retrievable, this document is dubbed the DMR (roughly: Dataset Metadata Response), and clients may retrieve DMRs alongside or independently of Dataset contents.

Name - Most entities in the DAP Data Model may or must be named. With some constraints on the use of special characters (such as “.”), a Name can be any character string. To avoid conflicts, DAP has scoping rules. For example, a Variable and a Dimension may have the same Name without ambiguity, but two Variables can have the same Name only if they are declared in different Groups or structures.

Variable - The building blocks for the DAP Data Model are Variables, which are strictly typed. Three classes of them (scalar, structure and sequence) are described separately below, but in actuality these are best distinguished by inspecting their Type declarations. Each Variable must be assigned a Type and a Name, and it may optionally have a number of Dimensions and Attributes, elaborated below.

Type - Underpinning DAP’s “container” Types (Structure and Sequence, implied above) are “atomic” Types akin to those of computer languages: bytes, integers, floating-point values, strings, and URLs, plus an opaque type permitting arbitrary blobs of bits. More detailed Type descriptions are provided in Volume I of the DAP specification.

(Scalar) Variable - A Variable whose Type is atomic (see above) comprises a single value or an array of values, and all its values are of the designated Type. A Variable is an array only if its declaration includes Dimensions, which determine the array’s shape and its element-ordering (see below).

(Structure) Variable - A Variable of Type “Structure” is a container for other variables, often implying relationships among them. For example, a structure Variable named “Velocity” might contain a pair of scalar Variables (or fields) named “x” and “y,” representing components of a velocity vector. These components would be retrieved via their “qualified” Names, “Velocity.x” and “Velocity.y”.

Notes on Structures:

  • Structures may contain variables of any type, including other structures.
  • A contained variable can be declared and used only within the context of a single structure variable, whether or not its (unqualified) Name appears in other contexts. For example, declarations for the scalars “Velocity.x” and “Displacement.x” would exist only within the declarations for the “Velocity” and “Displacement” structures respectively, despite reuse of the name “x”.
  • Though a dimensioned structure resembles a structure containing dimensioned variables (with the same shapes), these are not equivalent, and the means for referencing them differ. For example, array element i,j would be referenced as:
    • Velocity[i,j].x if two dimensions are assigned to the Velocity structure.
    • Velocity.x[i,j] if two dimensions are assigned to its x-component variable.
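
The distinction in the last note can be sketched in Python as an array of structures versus a structure of arrays. This is only an analogy built from plain lists and dataclasses, not DAP syntax or any library's API; the class names, shapes, and values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VelocityPoint:
    """One Velocity instance with scalar x and y fields."""
    x: float
    y: float


@dataclass
class VelocityField:
    """One Velocity instance whose x and y fields are themselves 2-D arrays."""
    x: List[List[float]]
    y: List[List[float]]


ROWS, COLS = 2, 3  # hypothetical dimension sizes

# Case 1: the dimensions belong to the structure (an array of structures).
velocity_a = [[VelocityPoint(x=10.0 * i + j, y=0.0) for j in range(COLS)]
              for i in range(ROWS)]

# Case 2: the dimensions belong to the contained variables (a structure of arrays).
velocity_b = VelocityField(
    x=[[10.0 * i + j for j in range(COLS)] for i in range(ROWS)],
    y=[[0.0] * COLS for _ in range(ROWS)],
)

i, j = 1, 2
a_value = velocity_a[i][j].x   # analogous to Velocity[i,j].x
b_value = velocity_b.x[i][j]   # analogous to Velocity.x[i,j]
```

Both expressions reach the same number here, but the containers differ: in the first case each array element is a complete structure, while in the second a single structure holds whole arrays.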

(Sequence) Variable - A Variable of Type “Sequence” is a container holding multiple (unordered) instances of other DAP Variables. For example, a sequence Variable named “TracerParticle” might contain a pair of structures named “Velocity” and “Displacement”, each declared—as in an earlier example—to have x and y components. The instances of TracerParticle would be like a set of tabular records whose four fields, Displacement.x, Displacement.y, Velocity.x, and Velocity.y are retrieved via filter-style (rather than indexed) retrievals, as discussed in a later section on Constraints.

Notes on Sequences:

  • Sequences may contain variables of any type, including other sequences.
  • Though a sequence is similar in some respects to a structure with a single (indexing) dimension, the differences are significant. For example, if a DAP server offers retrieval of records from a relational database:
    • The most useful client retrievals may entail filtering based on the values in the fields, and this yields indexing gaps. In other words, indexing may have little or no utility.
    • The number of records may be hidden or dynamic, so a dimension length cannot be calculated, and the order in which records are returned may be volatile.
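
A rough Python analogy for filter-style retrieval from a sequence follows. It assumes a generator standing in for a record source whose length is not known in advance; the record contents and function names are hypothetical and do not reflect any DAP library.

```python
from typing import Callable, Dict, Iterable, Iterator

Record = Dict[str, float]


def tracer_particles() -> Iterator[Record]:
    """Yield hypothetical TracerParticle records; the total count is not known up front."""
    yield {"Displacement.x": 0.0, "Displacement.y": 1.0, "Velocity.x": 2.5, "Velocity.y": 0.1}
    yield {"Displacement.x": 4.0, "Displacement.y": 2.0, "Velocity.x": 0.3, "Velocity.y": 0.2}
    yield {"Displacement.x": 9.0, "Displacement.y": 3.0, "Velocity.x": 3.1, "Velocity.y": 0.4}


def filter_records(records: Iterable[Record],
                   predicate: Callable[[Record], bool]) -> Iterator[Record]:
    """Keep only the records whose field values satisfy the predicate."""
    return (record for record in records if predicate(record))


# Select by value rather than by position; no record index is ever used.
fast = list(filter_records(tracer_particles(), lambda r: r["Velocity.x"] > 1.0))
```

The records that survive the filter are not contiguous in the source, which is why positional indexing would be of little use here.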

Group - The DAP Data Model has a hierarchical mechanism for grouping Variables and carving out independent namespaces. Groups may be nested, and all but one must have Names, the exception being the root of the hierarchy, where the Dataset itself is a Group (needing no name). Retrieving a Variable whose declaration falls within a Named Group requires use of its fully qualified name (FQN), such as GroupA.Group2.Velocity. Any Group (including the Dataset) may be assigned Attributes but not Dimensions.
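
As an illustration of how Groups carve out namespaces, the following Python sketch models a Dataset as nested dictionaries and resolves a dot-separated FQN of the kind shown above. The dataset contents and the `resolve` helper are hypothetical; this is not how any particular server or client implements name resolution.

```python
from typing import Any, Dict

# Hypothetical dataset: the root Group is unnamed and holds nested Groups.
dataset: Dict[str, Any] = {
    "T": "root-level variable T",
    "GroupA": {
        "T": "a different variable, also named T",
        "Group2": {"Velocity": "velocity variable"},
    },
}


def resolve(root: Dict[str, Any], fqn: str) -> Any:
    """Walk a dot-separated FQN from the root Group down to the named entity."""
    node: Any = root
    for part in fqn.split("."):
        if not isinstance(node, dict) or part not in node:
            raise KeyError(f"no entity named {part!r} while resolving {fqn!r}")
        node = node[part]
    return node


velocity = resolve(dataset, "GroupA.Group2.Velocity")
```

Because each Group is its own namespace, `resolve(dataset, "T")` and `resolve(dataset, "GroupA.T")` return two unrelated entities even though both are named T.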

Attribute - Otherwise nearly indistinguishable from a Variable, an Attribute must always be assigned to a specific Variable or Group. The purpose of Attributes is to provide context or add meaning to the assigned entities, whereas the purpose of Variables is to convey primary content. Retrieving an Attribute always requires prepending the name of the Variable or Group to which it is assigned, which implies that Attribute Names (such as “Units”) enjoy unlimited reusability.

Dimension - A Dimension must have a size and may have a Name. A Variable of any type may optionally be assigned a number of Dimensions, in which case its (compound) values are organized and retrieved as an indexable array of rank n, where n is the number of assigned Dimensions.

Notes on Dimensions:

  • Named Dimensions resemble named constants. Indeed, assigning a named dimension to multiple variables (within the scope of a single group) has the same effect on each, giving definition to that variable’s array shape and array-element ordering.
  • Unlike attributes, dimensions often are declared outside the variables to which they are assigned. Groups may not accept dimension assignments, but groups limit the scope of the dimension names and sizes declared within them.
  • Dimension names may be reused, with differing sizes, across multiple groups.
  • The order of the dimension assignments in a variable declaration is significant, as this determines the variable’s array-element ordering as well as its shape.
  • Retrieving a dimension may require prepending the name of the group in which it was declared but never the name of a variable to which it has been assigned.
  • A Dimension’s size must be a positive integer less than 2^61.
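
To make the notes on ordering and size concrete, here is a small Python sketch that validates dimension sizes against the bound given above and computes the position of an element in a flattened array. It assumes row-major ordering (the last dimension varies fastest), which is an assumption made for illustration rather than a statement taken from this overview; the function names and the example dimensions are hypothetical.

```python
from typing import Sequence, Tuple

MAX_DIMENSION_SIZE = 2 ** 61  # exclusive upper bound stated in the notes above


def check_shape(shape: Sequence[int]) -> Tuple[int, ...]:
    """Validate dimension sizes and return the shape as a tuple."""
    for size in shape:
        if not 0 < size < MAX_DIMENSION_SIZE:
            raise ValueError(f"invalid dimension size: {size}")
    return tuple(shape)


def flat_offset(shape: Sequence[int], index: Sequence[int]) -> int:
    """Offset of an element in a flattened array, assuming row-major ordering."""
    if len(shape) != len(index):
        raise ValueError("index rank must match the number of dimensions")
    offset = 0
    for size, i in zip(shape, index):
        if not 0 <= i < size:
            raise IndexError(f"index {i} out of range for dimension of size {size}")
        offset = offset * size + i
    return offset


# Hypothetical dimensions: time = 4, lat = 3, assigned in two different orders.
shape_time_lat = check_shape([4, 3])
shape_lat_time = check_shape([3, 4])
```

With these definitions, index (1, 2) falls at offset 5 when the dimensions are assigned as (time, lat) but at offset 6 when assigned as (lat, time), which is one way to see why the order of dimension assignments matters.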

Higher-Level DAP Objects and Extensions

Shared Dimensions that serve to indicate relations between different arrays which can be used to build/represent Coverages...

Note: Though adoption to date has been most pronounced in Earth sciences, DAP’s data types and structures (with the possible exception of coverages, discussed in this section) are not at all specific to these disciplines, so we think DAP is positioned for effective use in many domains, scientific and otherwise.

Client Use of a DAP Data Source

High-Level Info about DAP Datasets: the DMR

A client's first step in selectively retrieving a data source often is to discern its character (i.e., its schema) by requesting what DAP calls the DMR (Dataset Metadata Response). A DMR provides a complete characterization of the associated data source sans content, spelling out its groups, variables, types, dimensions, and attributes as described in the glossary above. For ease of use in client software, the DMR adheres to a formal syntax and most often is delivered as an XML document, though other forms are anticipated as DAP4 extensions.

Though it is common to retrieve its DMR prior to requesting content from a data source, this is not the only option. Indeed, a "Data Request" under DAP returns both the DMR and the content (i.e., the values of variables) for the designated data source, because the former is critical for interpreting the latter.
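
The following Python sketch suggests how client software might read variable names, types, shapes, and attributes out of an XML metadata document. The sample document is a hypothetical, simplified stand-in: its element and attribute names are illustrative and are not taken from the DAP4 schema, so the formal specification should be consulted for the actual DMR syntax.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified DMR-like document; element and attribute names are
# illustrative and are not taken from the DAP4 schema.
SAMPLE_DMR = """
<Dataset name="example">
  <Dimension name="time" size="4"/>
  <Dimension name="lat" size="3"/>
  <Variable name="T" type="Float32">
    <Dim name="time"/>
    <Dim name="lat"/>
    <Attribute name="Units" value="K"/>
  </Variable>
  <Variable name="V" type="Float32">
    <Dim name="time"/>
    <Attribute name="Units" value="m/s"/>
  </Variable>
</Dataset>
"""


def summarize(dmr_text: str) -> dict:
    """Return {variable name: (type, shape, attributes)} from the sample document."""
    root = ET.fromstring(dmr_text)
    # Named dimensions are declared once and then referenced by each variable.
    sizes = {d.get("name"): int(d.get("size")) for d in root.findall("Dimension")}
    summary = {}
    for var in root.findall("Variable"):
        shape = tuple(sizes[d.get("name")] for d in var.findall("Dim"))
        attrs = {a.get("name"): a.get("value") for a in var.findall("Attribute")}
        summary[var.get("name")] = (var.get("type"), shape, attrs)
    return summary


schema = summarize(SAMPLE_DMR)
```

Note that the two variables carry attributes with the same name ("Units") but different values, and that both refer to the shared "time" dimension by name.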

Retrieving Content from DAP Datasets: Posing DAP Requests

Under DAP, the requests clients make of servers, and the resulting server responses, are all governed by the protocol specification. As stated previously, the formal specification takes precedence over anything stated here.

For each data source, a number of responses may be elicited by a client, determined by adding a suffix and/or a query string to the basic URL for the desired data source. Passing the server a completely unadorned URL yields a Dataset Services Response (DSR). This XML document describes the various DAP services available for that source, and these always include provision of a DMR and provision of content from the source. Unlike the DMR, which is always textual, content (delivered in response to a Data Request, as discussed above) may be conveyed in textual or binary form, the latter minimizing data-transfer volumes.
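
A minimal Python sketch of adorning a Dataset URL is given below. The dataset URL, the two suffixes, and the query string are all assumptions made for illustration; this overview does not name the actual suffixes, so the specification or a server's DSR should be treated as the authority.

```python
from urllib.parse import quote

# Hypothetical unadorned Dataset URL.
DATASET_URL = "http://example.org/data/ocean_temps"

# Assumed suffixes; consult the specification or the server's DSR for the real ones.
DMR_SUFFIX = ".dmr"
DATA_SUFFIX = ".dap"


def request_url(dataset_url: str, suffix: str = "", query: str = "") -> str:
    """Adorn a Dataset URL with an optional suffix and an optional query string."""
    url = dataset_url + suffix
    if query:
        # Percent-encode characters that are unsafe in a URL query string.
        url += "?" + quote(query, safe="=&;,.")
    return url


dsr_url = request_url(DATASET_URL)                              # unadorned: services response
dmr_url = request_url(DATASET_URL, DMR_SUFFIX)                  # metadata only
data_url = request_url(DATASET_URL, DATA_SUFFIX, "key=value")   # placeholder query string
```

Keeping the unadorned URL as the single starting point mirrors the description above: every request is the same Dataset URL plus a suffix, a query string, or both.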

If the URL for a Data Request includes a query string, the server parses this string to determine what data processing the server should perform before constructing its Data Response. Though other classes of pre-retrieval processing are anticipated to be defined via DAP extensions, two forms are mandated by DAP4 for all servers, Index Subsetting and Field Subsetting, and a third form, Filtering, is defined in the core DAP specification, though its implementation by servers is optional.

Index Subsetting - Choosing parts of an array based on the indexes of that array's dimensions. This operation always returns an array of the same rank as the original, although the size of the return array will (likely) be smaller. Index subsetting uses the bracket syntax described later.

Field Subsetting - Choosing specific variables or fields from the dataset. A dataset in DAP4 is made up of a number of variables, and those may be Structures or Sequences that contain fields (in effect, the Dataset is itself a Structure and all of its variables are fields; the distinction is more convenience than formal). Field subsetting uses the brace syntax described later. One or more fields can be specified using a semicolon (;) as the separator.

Filtering - A filter is a predicate that can be used to choose data elements based on their values. The vertical bar (|) is used as a prefix operator for the filter predicate. Filters can be applied to elements of an Array or fields of a Sequence. A filter predicate consists of one or more filter subexpressions, separated by commas (,).
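The three forms of subsetting above can be sketched as small string-building helpers. This is a minimal illustration, not the specification's grammar: the variable names are hypothetical, and the "start:stride:stop" slice form with an inclusive stop is an assumption.

```python
def index_subset(var, *slices):
    """Bracket syntax: one [start:stride:stop] per dimension of the array."""
    return var + "".join(f"[{a}:{s}:{b}]" for a, s, b in slices)


def field_subset(var, fields):
    """Brace syntax: selected fields separated by semicolons."""
    return var + "{" + ";".join(fields) + "}"


def with_filter(var, subexpressions):
    """Vertical bar prefixes the predicate; commas separate subexpressions."""
    return var + "|" + ",".join(subexpressions)


def subset_shape(slices):
    """Shape of the returned array, assuming an inclusive stop index."""
    return tuple((b - a) // s + 1 for a, s, b in slices)


slices = [(0, 1, 3), (5, 1, 5)]
print(index_subset("/T", *slices))              # /T[0:1:3][5:1:5]
print(subset_shape(slices))                     # (4, 1)
print(field_subset("/S", ["a", "b"]))           # /S{a;b}
print(with_filter("/Seq", ["a>5", "b<10"]))     # /Seq|a>5,b<10
```

Note that the second dimension is reduced to a single index yet still appears in the result shape, so the returned array keeps the rank of the original.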

Other services listed in the DSR might (at the server's option) include the DAP Asynchronous Response. Where implemented (such as for near-line data sources), this response is sent to the client when the requested resource (DMR, Data Response, etc.) is not immediately available. If, in turn, the client makes a "retrieve it" request, the server will respond with a second Asynchronous Response informing the client about when and where the requested resource may be retrieved.

In addition to the most common data objects, a DAP server may provide additional "services," such as HTML-formatted representations of a data source's structure and content. Such additional services are discussed in Volume 2 of the specification.

The Formal DAP Specification

The DAP4 specification spans two volumes: one describes the Data Model and DAP’s Request/Response objects; the other volume describes how DAP clients and servers communicate via HTTP and the modern Web. New volumes about DAP Extensions will be added as they emerge.

Partitioning the specification into two primary documents reflects the independence of DAP’s data-retrieval functionality from the underlying network transfer protocol. Indeed, DAP could (via extensions) be used with other transports. However, utilizing HTTP eases the building of DAP servers because they can take full advantage of widely used Web-server frameworks such as Apache. Use of Extensions documents will enable evolution of the protocol without the expense and complexity of another major protocol-development project. Anticipated extensions include a JSON encoding for DAP data/metadata and the provision of server functions (beyond DAP’s core subsetting and filtering operations).

[?should we insert here a partial table of contents (with active links) for volume I?]

[?should we insert here a partial table of contents (with active links) for volume II?]

How DAP4 Differs from DAP2

Though the protocol, per se, is maintained primarily by OPeNDAP, many others have engaged in DAP2 realization. One implementation—by Unidata, in the University Corp. for Atmospheric Research—includes the popular THREDDS Data Server (TDS). A key motivation for DAP4, developed jointly by OPeNDAP and Unidata (see "Acknowledgments," below), was to reduce differences that have arisen, and impede interoperability, among DAP2 realizations. Our hope is that a modernized, clearer and more comprehensive specification will facilitate building clients and servers with greater interoperability, making such ventures more rewarding and less risky.

This section covers changes to the data model, response formats, and serialization, giving developers a roadmap to migration from DAP2 to DAP4. E.g., the “Grid” type now supports a notion of discrete functions similar to an OGC (or ISO) Coverage and to the Scientific Data Type found in Unidata’s Common Data Model (CDM). Also from this section, users may learn of functionalities to seek in clients. E.g., DAP4 servers return checksums with each data response, but clients may utilize these in varying degrees.

DAP4 is largely an extension of DAP2 concepts, including ideas that emerged as DAP gained prominence across the Earth sciences. Therefore DAP2-compatible software, in clients or servers, should be easy to adapt to DAP4, and this has been affirmed in the OPeNDAP-Unidata realization and testing work. Furthermore, DAP4 exhibits backward compatibility sufficient to enable gradual transitioning. Substantive changes include support for Groups, yielding greater compatibility with HDF and NetCDF4.

Data-Model Changes

Summary: DAP4 now supports groups, generalized coverages and a few new atomic types.

The DAP4 data model is fundamentally similar to that for DAP2. New atomic types include: enumeration, 64-bit integer, and opaque, and the container types now include groups. Groups provide a way to organize collections of variables and to encode these organizational relationships when they are present in the underlying source data.

Dimensions may now be named, and the presence of shared dimensions (i.e., several variables employ a dimension with a given name) serves to indicate relationships among arrays that can, in turn, be used to build/represent OGC "coverages." Coverages subsume the role of DAP2 grids, so these have been removed from DAP4.

Migrating from DAP2 to DAP4

For servers: A DAP2 DDS/DAS (or DDX) is very close to a DAP4 DMR. The set of datatypes supported by DAP4 is almost a proper superset of those in DAP2, the exception being that DAP2's Grid type has been removed and in its place is a Coverage. A Coverage is not a type per se; instead it is a binding of two or more arrays using Shared Dimensions. Thus, to transform a DAP2 Grid into a Coverage for DAP4, the dimensions from the Grid's Maps will have to be extracted and used to make Shared Dimensions in the DMR. However, the DAP4 Coverage model completely subsumes DAP2 Grids, so it will be easy to represent Grids in DAP4.
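The Grid-to-Coverage transformation described above can be sketched with plain dictionaries. The representation of a Grid used here (an array's named dimensions plus a list of Map names) is a simplification invented for this example, not a structure from either protocol.

```python
# Hypothetical, simplified description of a DAP2 Grid.
grid = {
    "name": "sst",
    "dims": [("time", 4), ("lat", 3)],  # (dimension name, size) for the Grid's array
    "maps": ["time", "lat"],            # one Map vector per dimension
}


def grid_to_coverage(grid):
    """Extract the Grid's dimensions as shared dimensions and bind arrays to them."""
    # Each dimension of the Grid becomes a named, shared dimension.
    shared = {name: size for name, size in grid["dims"]}
    # Each Map becomes a one-dimensional array over its own shared dimension.
    variables = {m: [m] for m in grid["maps"]}
    # The Grid's array refers to the same shared dimensions by name.
    variables[grid["name"]] = [name for name, _ in grid["dims"]]
    return shared, variables


shared_dims, variables = grid_to_coverage(grid)
print(shared_dims)  # {'time': 4, 'lat': 3}
print(variables)    # {'time': ['time'], 'lat': ['lat'], 'sst': ['time', 'lat']}
```

The binding between the array and its coordinate vectors is carried entirely by the shared dimension names, which is why no dedicated Coverage type is needed.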

For clients: Some of the new data types are more challenging to implement than the types included with DAP2. Of particular note are Enumerations and Coverages.

Changed Responses

Summary:

  • DAP4 includes only one dataset metadata response, not two;
  • Several Sequences may be individually constrained in one access;
  • Predictable behavior for URLs
  • Asynchronous responses

In DAP4 there is a single XML document that encodes the metadata for a data source. This response is conceptually similar to, and in some ways identical to, the DDX response that is supported by many DAP2 servers, so its organization will be familiar to many people already. As with DAP2, there is one data response that can be modified (constrained) using an expression to limit the information it includes. The basic concepts of slicing an array are present using the same essential notation. We've taken care to allow servers to extend this, something that is covered in a bit more detail below under web services. We have replaced the selection part of the DAP2 constraint expression with a filter sub-expression that is applied to a specific variable. This enables two or more Sequences to have different filtering operations applied (previously that was not possible). Our expanded constraint language also provides a way to subset coverages, and a proposed extension to the filtering sub-expression provides a way to subset arrays/coverages by value.

We wanted DAP4 to fully embrace REST. DAP2, even though it predates the term, included many, but not all, of the REST architecture's features. One change from DAP2 was to explicitly define what happens when a client dereferences a 'bare URL' (one without an extension used to ask for a specific DAP4 response). When a DAP4 server is asked to return information at a bare URL, the result is a Dataset Services Response (DSR) which contains links to all of the other responses for that dataset. In addition, the DSR may contain other information such as server operations that can be used with the dataset (and maybe only with the particular dataset). The DSR is an XML document but can contain a stylesheet that transforms it to HTML for a web browser.

DAP4 servers also support asynchronous access to data, which enables access to data in near-line devices and can be used for some server processing operations (e.g., operations that take a long time to perform). Asynchronous access is accomplished by combining two things: a switch in the request that informs the server that the client knows the request may not have an immediate response, and a response that contains a URL to a response that will be ready in the future instead of the response itself.
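The shape of this exchange can be sketched with a toy, in-memory stand-in for a server. Everything here is hypothetical (the class, the "accept_async" switch, the reply dictionaries, and the staged URL form); the specification defines the real request switch and response documents.

```python
class FakeServer:
    """Toy stand-in for a server whose data are not immediately available."""

    def __init__(self):
        self.staged = {}  # future URL -> response body

    def get(self, url, accept_async=False):
        if url in self.staged:
            # The resource was staged by an earlier asynchronous request.
            return {"status": "ok", "body": self.staged[url]}
        if not accept_async:
            # The client did not signal that it can handle a deferred response.
            return {"status": "rejected", "reason": "asynchronous access required"}
        # Stage the resource and tell the client where to find it later.
        future_url = url + "/staged"
        self.staged[future_url] = b"data for " + url.encode()
        return {"status": "async", "url": future_url}


def fetch(server, url):
    """Client side: send the switch, then follow the returned URL if needed."""
    reply = server.get(url, accept_async=True)
    if reply["status"] == "async":
        reply = server.get(reply["url"])
    return reply["body"]


print(fetch(FakeServer(), "http://example.org/archive/x.dap"))
```

A real client would also need to honor any timing information the server supplies before following the returned URL; this sketch retrieves the staged response immediately.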

Migrating from DAP2 to DAP4

  • If your server or client already reads DAP2 DDX responses (which were never part of the official protocol but are widely used) then adapting to the DMR will be very easy since they are very close in structure.
  • Support for the new constraints may take a bit more work since the Constraint Expression and Server Functions have now been separated.
  • Clients will benefit from asynchronous response support, but this is a new behavior and may take some serious thought, particularly for clients that relied on the simpler semantics borrowed from file system accesses.

Response-Encoding Changes

Summary:

  • Checksums for data values;
  • Reliable delivery of error messages to clients;
  • Encode data using the server's native word order.

We have made three changes to the encoding of returned data values. All top-level variables in a data response now include a CRC32 checksum of their values. This enables people to see if the same request is returning the same data values (maybe the data have been changed?). The checksum values are encoded in Attributes bound to the returned variables. We have added an encoding scheme for data values that preserves compactness yet allows clients to easily detect when a server has encountered an error while sending a response. Finally, we have adopted a Reader Makes Right encoding scheme instead of the network byte order scheme used by DAP2. This last change has become more and more important as the predominance of little-endian processors has increased.

Migrating from DAP2 to DAP4

In many ways the encoding scheme is simpler for servers because the data response uses the server's native byte order. Clients must detect the byte order and swap bytes as needed. However, the server must correctly implement the chunking protocol used by the data response and must correctly compute CRC32 checksums for each of the top-level variables.
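The division of labor between server and client can be sketched with Python's standard struct and zlib modules. This is only an illustration under stated assumptions: how the byte order is signaled to the client, and exactly which bytes the checksum covers, are defined by the specification. Here the order is passed as a plain parameter and the checksum is taken over the bytes as sent.

```python
import struct
import sys
import zlib


def encode_native(values):
    """Server side: pack 32-bit integers in this machine's native byte order."""
    payload = struct.pack(f"={len(values)}i", *values)
    return payload, sys.byteorder  # sys.byteorder is 'little' or 'big'


def decode(payload, sender_order):
    """Client side ('reader makes right'): interpret bytes using the sender's order."""
    prefix = "<" if sender_order == "little" else ">"
    return list(struct.unpack(f"{prefix}{len(payload) // 4}i", payload))


def checksum(payload):
    """CRC32 of the serialized values, as an unsigned 32-bit integer."""
    return zlib.crc32(payload) & 0xFFFFFFFF


payload, order = encode_native([1, 2, 3])
print(order, decode(payload, order), hex(checksum(payload)))
```

Because the checksum depends only on the bytes, a client that repeats a request and sees a different value knows that the returned values have changed.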

Changes in the Use of HTTP

Summary: DAP4 is closer than DAP2 to the REST (Representational State Transfer) architecture, and it uses HATEOAS (Hypermedia as the Engine of Application State), making all of the server's responses explicit via links in a document.

While DAP2 interwove the DAP and HTTP, using, for example, some of the HTTP headers as the only source of information that was critical to the DAP itself, DAP4 does not. Instead, DAP4 is completely isolated from HTTP, enabling it to work with other protocols without change. This does not mean that DAP4 does not use HTTP, only that it does not rely on it, making it simple to implement DAP4 servers that use a different protocol for transport (AMQP, etc.). However, inasmuch as HTTP is a ubiquitous network transport protocol, the DAP4 specification includes a volume devoted solely to how a server should implement DAP4 using HTTP.

The REST interface for the protocol is described in Volume 2, Web Services, of the specification. DAP4 requires that a server implement at least three responses for each dataset: the DSR, the DMR, and the Data response. The DSR is an XML document that provides a capabilities response for the dataset. This document provides links to all of the other responses available for the dataset, along with other information. The DSR provides information about alternative encodings for the different responses in addition to enumerating the basic responses themselves. The DSR may also list server functions that may be used with/on the dataset.

DAP4 servers are encouraged to support HTTP content negotiation, providing the standard DSR, DMR and Data responses in a variety of forms.

Migrating from DAP2 to DAP4

The web service for DAP4 will likely need to be written from scratch, but the good news is that those are easy to write. For clients, the behavioral differences between DAP2 and DAP4 servers are small, with two exceptions. Since DAP4 supports asynchronous responses, clients will need to be modified to access data available only using this new feature. DAP4 also supports content negotiation and that means a larger number of ways to get the different responses (even though each protocol has three basic responses).

Acknowledgments

DAP4 is the result of a joint, multiyear development effort by OPeNDAP and Unidata, funded by a generous grant from NOAA and guided by an advisory committee comprising Mike Folk (THG), Jim Frew (UCSB), Steve Hankin (NOAA), Eric Kihn (NOAA), Chris Lynnes (NASA) and Rich Signell (USGS).


Old Material, probably to be Discarded

 The DAP Data Model

A DAP server typically makes accessible a collection of data sources, each identified by a unique (unadorned) URL. As discussed below, clients pose requests by modifying this URL with DAP-specific suffixes and query strings. The following subsections only summarize the formal specification, which takes precedence over anything stated here.

  Elements of a DAP Data Source

A DAP data source is fundamentally a collection of typed variables that have names, dimensions, attributes, and values. A variable's attributes and dimensions often are named as well. The types of variables in DAP are numerous, as outlined in the ensuing subsection. Furthermore, a variable having several dimensions is a natural and intuitive way to represent multidimensional arrays, and the DAP repertoire of client requests (see the subsection below on that topic) includes methods to retrieve subarrays per user specifications.

Attributes are much like variables except they are intended to facilitate interpretation of the entities to which they are assigned. In contrast, variables contain the primary content of a data source. The scope of an attribute is limited by the entity to which it is assigned (i.e., a variable, a group, or an entire data source). Thus, for example, variables T and V may both have attributes with the same name (say "Units"), but these attributes can have distinct values, such as "K" when applied to T and "m/s" when applied to V. In contrast, dimensions are essentially named constants, so their (integer) values are completely independent of the variables with which they are associated.

Variables and their attributes may be collected into named groups (which can be nested to yield hierarchies), and variable names may be reused in multiple groups without generating conflicts. For example, a variable named T appearing in a group named G1 is understood to be distinct from and formally unrelated to a variable named T appearing in a second group named G2. Dimensions may not be assigned to groups, as their scope is always global, as indicated above. The numbers of groups, variables, attributes and dimensions in a data source are unlimited.
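The scoping rules described in this subsection can be sketched with a few Python dataclasses. The class and field names are invented for this illustration and do not correspond to any DAP library; dimensions are modeled as global named constants, following the text above.

```python
from dataclasses import dataclass, field

# Dimensions as global named constants, independent of any variable or group.
DIMENSIONS = {"time": 4, "lat": 3}


@dataclass
class Variable:
    name: str
    dims: tuple = ()
    attributes: dict = field(default_factory=dict)  # scoped to this variable


@dataclass
class Group:
    name: str
    variables: dict = field(default_factory=dict)
    groups: dict = field(default_factory=dict)

    def add_variable(self, var):
        self.variables[var.name] = var
        return var

    def add_group(self, group):
        self.groups[group.name] = group
        return group


root = Group("/")
g1 = root.add_group(Group("G1"))
g2 = root.add_group(Group("G2"))

# The name T is reused in two groups without conflict.
t1 = g1.add_variable(Variable("T", ("time",), {"Units": "K"}))
t2 = g2.add_variable(Variable("T", ("time", "lat"), {"Units": "degC"}))
# Attributes named "Units" carry distinct values on different variables.
v = g1.add_variable(Variable("V", ("time",), {"Units": "m/s"}))

shape_t2 = tuple(DIMENSIONS[d] for d in t2.dims)
print(t1.attributes["Units"], v.attributes["Units"], shape_t2)
```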

[?insert a table or tables showing how the above elements (i.e., groups, variables, types, dimensions, shapes relate to one another?]

  Atomic Types, Container Types and Enumeration Types

The specified type of any DAP variable, whether or not it is an array, must be an atomic type, a container type, or an enumerated type as described in the following paragraphs.

Atomic types - As with many programming languages, the DAP atomic types include bytes, integers (including 64-bit integers), floating-point values, and strings. DAP additionally includes the types URL and opaque, the latter to allow otherwise unspecified blobs of information.

Container types - structures, sequences, groups...

Enumeration types -