DAP4: Overview: Difference between revisions

From OPeNDAP Documentation

Revision as of 03:14, 11 March 2014

DAP4 is the result of a three-year effort by OPeNDAP and Unidata. When we started the DAP4 project there were a number of goals we had in mind, but none was more pressing than extending the utility of DAP2 by making it easier to build servers and clients that would be interoperable. This would make the effort spent on developing those tools more rewarding and less risky.

In many ways DAP4 is an extension of ideas already present in DAP2, or of ideas that were introduced into DAP2 in the 10+ years that the protocol evolved as its use spread within the Earth Sciences community. Most developers will find that adapting existing software that implements DAP2 to support DAP4 is also easy.

In this overview of DAP4, we highlight the important differences between DAP2 and DAP4. The information here should be useful to developers and to users. For developers, this will provide a roadmap to changes they will have to make to support DAP4, including information about changes to the data model, response format, and content. For example, the Grid data type has been extended to support a more general notion of discrete functions, one that is very similar to the OGC's idea of a coverage or a scientific data type in the Common Data Model (CDM) developed by Unidata. For users, it will provide information about new capabilities to look for in DAP4 clients. For example, all DAP4 servers will return checksums with every data access operation, but different clients will provide those in different ways. Neither of these differences affects the underlying nature of DAP (2 or 4), which is that data values are accessed using subsetting operations in a way that shields the user from the idiosyncrasies of particular data formats or storage devices.

How the DAP4 Protocol is Specified

The DAP4 specification is provided in two volumes: one that describes the Data Model and Request/Response objects, and one that describes how DAP4 servers use HTTP and the existing web infrastructure. Further volumes that describe extensions to DAP4 will also be supported. Planned extensions will cover a JSON encoding for DAP4 metadata and data, and the operation of server processing operations. The division of the DAP4 specification into these separate documents makes it simpler to see how the data access operations that are central to DAP4 are separate from the network transfer protocol. By formalizing this separation, we have paved the way for DAP4 extension documents to describe how the protocol might be used with other transport protocols. The extension documents provide a way for developers to continue the evolution of the protocol without the expense and complexity of yet another protocol development project.

DAP4 and Data Access

Data Model

Summary: DAP4 supports generalized coverages and Groups

The DAP4 data model is fundamentally the same as that of DAP2. Data are characterized as a collection of variables, each of which has a type, a name and one or more values. As with many programming languages and with DAP2, the types include Bytes, Integers (now including 64-bit integers), Floating point values, Strings, URLs, Structures and Sequences. We have added some new types in DAP4: Enumeration, Opaque, and Group. In addition, we have added Shared Dimensions, which indicate relations between different arrays and can be used to build and represent Coverages. In DAP4, Coverages provide a more comprehensive replacement for Grids, and Grids have been removed.

In addition to variables, each data set can contain an arbitrary number of attributes and an arbitrary number of Groups. Attributes are a binding of name, type and value like a variable but are intended to hold metadata about the dataset and about each variable it contains. Groups provide a way to organize collections of variables and to encode these kinds of relationships when they are present in the underlying data store.

Migrating from DAP2 to DAP4

A DAP2 DDS/DAS (or DDX) is very close to a DAP4 DMR. The set of datatypes supported by DAP4 is almost a proper superset of those in DAP2, the exception being that DAP2's Grid type has been removed and in its place is a Coverage. A Coverage is not a type per se, instead it is a binding of two or more arrays using Shared Dimensions. Thus, to transform a DAP2 Grid into a Coverage for DAP4, the dimensions from the Grid's Maps will have to be extracted and used to make Shared Dimensions in the DMR. However, the DAP4 Coverage model completely subsumes DAP2 Grids, so it will be easy to represent Grids in DAP4.
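
The Grid-to-Coverage step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Grid` and `Dataset` classes and the `grid_to_coverage` function are hypothetical stand-ins invented for this example, not part of any DAP library, and a real DMR carries much more information than is modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class Grid:
    """Hypothetical stand-in for a DAP2 Grid: one array plus one Map per dimension."""
    name: str
    shape: tuple
    maps: dict  # Map name -> list of coordinate values, in dimension order


@dataclass
class Dataset:
    """Hypothetical stand-in for the parts of a DMR used in this sketch."""
    dimensions: dict = field(default_factory=dict)  # Shared Dimension name -> size
    arrays: dict = field(default_factory=dict)      # array name -> tuple of dimension names


def grid_to_coverage(grid: Grid, dataset: Dataset) -> None:
    """Extract the Grid's Map dimensions and declare them as Shared Dimensions."""
    dim_names = tuple(grid.maps)
    if len(dim_names) != len(grid.shape):
        raise ValueError("a Grid needs exactly one Map per dimension")
    for dim_name, size in zip(dim_names, grid.shape):
        if len(grid.maps[dim_name]) != size:
            raise ValueError(f"Map {dim_name!r} does not match the array shape")
        existing = dataset.dimensions.setdefault(dim_name, size)
        if existing != size:
            raise ValueError(f"conflicting sizes for shared dimension {dim_name!r}")
        # Each Map becomes an ordinary one-dimensional array over its own dimension.
        dataset.arrays[dim_name] = (dim_name,)
    # The data array refers to the same Shared Dimensions; that shared reference
    # is what binds it to the Map arrays and forms the coverage.
    dataset.arrays[grid.name] = dim_names


sst = Grid("sst", (2, 3), {"lat": [10.0, 20.0], "lon": [0.0, 1.0, 2.0]})
dmr = Dataset()
grid_to_coverage(sst, dmr)
print(dmr.dimensions)
print(dmr.arrays)
```

Note that no new type is created for the coverage itself: the result is just three arrays that name the same dimensions, which matches the point that a Coverage is a binding rather than a type.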

Responses

Summary:

  • DAP4 includes only one dataset metadata response, not two;
  • Several Sequences may be individually constrained in one access;
  • Predictable behavior for URLs;
  • Asynchronous responses.

In DAP4 there is a single XML document that encodes the metadata for a data source. This response is conceptually similar to, and in some ways identical to, the DDX response that is supported by many DAP2 servers, so its organization will be familiar to many people already. As with DAP2, there is one data response that can be modified (constrained) using an expression to limit the information it includes. The basic concepts of slicing an array are present, using the same essential notation. We have taken care to allow servers to extend this, something that is covered in a bit more detail below under web services. We have replaced the selection part of the DAP2 constraint expression with a filter sub-expression that is applied to a specific variable. This enables two or more Sequences to have different filtering operations applied, which was not possible before. Our expanded constraint language also provides a way to subset coverages, and a proposed extension to the filtering sub-expression provides a way to subset arrays and coverages by value.
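
To make the idea of per-variable filters concrete, here is a minimal Python sketch. It does not use DAP4 constraint expression syntax at all; the dataset, the Sequence names and the `constrain` function are assumptions made up for illustration. It only shows the behavior described above: each filter is attached to one Sequence, so two Sequences can be filtered differently in a single access.

```python
# Hypothetical data: two Sequences in one dataset, each a list of rows.
dataset = {
    "stations": [
        {"id": 1, "elevation": 12.0},
        {"id": 2, "elevation": 840.0},
        {"id": 3, "elevation": 1520.0},
    ],
    "casts": [
        {"depth": 5.0, "temp": 18.2},
        {"depth": 50.0, "temp": 11.4},
        {"depth": 500.0, "temp": 4.1},
    ],
}


def constrain(data, filters):
    """Apply each filter only to the Sequence it is attached to."""
    result = {}
    for name, rows in data.items():
        keep = filters.get(name, lambda row: True)  # no filter: keep every row
        result[name] = [row for row in rows if keep(row)]
    return result


# One access, two independent filters, each bound to its own Sequence.
subset = constrain(
    dataset,
    {
        "stations": lambda row: row["elevation"] > 500.0,
        "casts": lambda row: row["depth"] <= 50.0,
    },
)
print([row["id"] for row in subset["stations"]])
print([row["depth"] for row in subset["casts"]])
```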

We wanted DAP4 to fully embrace REST. DAP2, even though it predates the term, included many, but not all, of the REST architecture's features. One change from DAP2 was to explicitly define what happens when a client dereferences a 'bare URL' (one without an extension used to ask for a specific DAP4 response). When a DAP4 server is asked to return information at a bare URL, the result is a Dataset Services Response (DSR), which contains links to all of the other responses for that dataset. In addition, the DSR may contain other information, such as server operations that can be used with the dataset (and maybe only with that particular dataset). The DSR is an XML document but can contain a stylesheet that transforms it to HTML for a web browser.
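
The bare-URL rule can be sketched as a small dispatch function. The suffixes `.dmr` and `.dap`, the example URL and both function names below are assumptions chosen for illustration; the actual extensions and the actual DSR layout are whatever the specification and a given server define.

```python
# Assumed suffixes, for illustration only.
RESPONSE_SUFFIXES = {"DMR": ".dmr", "Data": ".dap"}


def classify(url: str) -> str:
    """Return which response a URL asks for; a bare URL maps to the DSR."""
    for response, suffix in RESPONSE_SUFFIXES.items():
        if url.endswith(suffix):
            return response
    return "DSR"


def dataset_services(bare_url: str) -> dict:
    """Links that a DSR-like document would carry for one dataset."""
    return {name: bare_url + suffix for name, suffix in RESPONSE_SUFFIXES.items()}


bare = "http://example.org/data/sst.nc"  # hypothetical dataset URL
links = dataset_services(bare)
print(classify(bare))
print(links)
```

The file's own `.nc` ending does not count as a response extension here, so the bare URL still resolves to the DSR, and every link the DSR hands out resolves to a specific response.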

DAP4 servers also support asynchronous access to data, which enables access to data on near-line devices and can be used for some server processing operations (e.g., operations that take a long time to perform). Asynchronous access is accomplished by combining two things: a switch in the request that tells the server the client knows the request may not have an immediate response, and a response that, instead of the data itself, contains a URL to a response that will be ready in the future.
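
The exchange described above can be simulated without any network code. Everything in this sketch is hypothetical: `FakeServer`, the `accept_async` flag, the `/pending` URL prefix, the "refused" reply and the polling loop are assumptions used to show the shape of the interaction, not the switch or status values that the specification defines.

```python
class FakeServer:
    """In-memory stand-in for a DAP4 server; not a real API."""

    def __init__(self, polls_until_ready: int):
        self._remaining = polls_until_ready

    def request(self, url: str, accept_async: bool) -> dict:
        if url.startswith("/pending/"):
            # Follow-up access to the URL handed out earlier.
            if self._remaining > 0:
                self._remaining -= 1
                return {"status": "pending", "url": url}
            body = b"data for " + url[len("/pending"):].encode()
            return {"status": "ok", "body": body}
        if not accept_async:
            # Assumption: without the switch, the server declines to defer.
            return {"status": "refused"}
        # Instead of the data, return a URL where the data will appear later.
        return {"status": "pending", "url": "/pending" + url}


def fetch(server, url, max_polls=10):
    """Ask with the async switch set, then follow the returned URL until ready."""
    reply = server.request(url, accept_async=True)
    polls = 0
    while reply["status"] == "pending":
        if polls == max_polls:
            raise TimeoutError("response never became ready")
        polls += 1
        reply = server.request(reply["url"], accept_async=True)
    return reply["body"], polls


body, polls = fetch(FakeServer(polls_until_ready=2), "/sst.nc.dap")
print(body, polls)
```

A real client would wait between attempts rather than loop immediately; the delay is left out here to keep the sketch short.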

Migrating from DAP2 to DAP4

  • If your server or client already reads DAP2 DDX responses (which were never part of the official protocol but are widely used) then adapting to the DMR will be very easy since they are very close in structure.
  • Support for the new constraints may take a bit more work, since the Constraint Expression and Server Functions have now been separated.
  • Clients will benefit from asynchronous response support, but this is a new behavior and may take some serious thought, particularly for clients that relied on the simpler semantics borrowed from file system accesses.

Response Encoding

Summary:

  • Checksums for data values;
  • Reliable delivery of error messages to clients;
  • Encode data using the server's native word order.

We have made three changes to the encoding of returned data values. First, all top-level variables in a data response now include a CRC32 checksum of their values. This enables people to see whether the same request is returning the same data values (maybe the data have been changed?). The checksum values are encoded in Attributes bound to the returned variables. Second, we have added an encoding scheme for data values that preserves compactness yet allows clients to easily detect when a server has encountered an error while sending a response. Third, we have adopted a Reader Makes Right encoding scheme instead of the network byte order scheme used by DAP2. This last change has become more and more important as the predominance of little-endian processors has increased.
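
A short Python sketch can illustrate two of these changes, the checksum and Reader Makes Right. The message layout (a dictionary with `byte_order`, `payload` and `crc32` keys) and the choice to compute the checksum over the bytes as sent are assumptions made for this example; the actual wire format is defined by the specification. Only the standard-library calls `zlib.crc32`, `struct.pack` and `struct.unpack` are real.

```python
import struct
import sys
import zlib


def encode(values, byte_order=sys.byteorder):
    """Sender: pack 32-bit floats in its own word order, with no byte swapping."""
    prefix = "<" if byte_order == "little" else ">"
    payload = struct.pack(f"{prefix}{len(values)}f", *values)
    return {"byte_order": byte_order, "payload": payload, "crc32": zlib.crc32(payload)}


def decode(message):
    """Reader: verify the checksum, then interpret bytes in the sender's order."""
    if zlib.crc32(message["payload"]) != message["crc32"]:
        raise ValueError("checksum mismatch")
    prefix = "<" if message["byte_order"] == "little" else ">"
    count = len(message["payload"]) // 4
    return list(struct.unpack(f"{prefix}{count}f", message["payload"]))


values = [1.0, 2.5, -3.0]
for order in ("little", "big"):
    message = encode(values, byte_order=order)
    print(order, hex(message["crc32"]), decode(message))
```

Under the network byte order scheme, a little-endian server and a little-endian client would each swap bytes once; here neither does any extra work unless their word orders actually differ. Repeating the same request and comparing the two `crc32` values is the "has the data changed?" check mentioned above.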

How DAP4 Works with HTTP

While DAP2 interwove DAP and HTTP, using, for example, some of the HTTP headers as the only source of information that was critical to DAP itself, DAP4 does not. Instead, DAP4 is completely isolated from HTTP, enabling it to work with other protocols without change. This does not mean that DAP4 does not use HTTP, only that it does not rely on it, making it simple to implement DAP4 servers that use a different protocol for transport (AMQP, etc.). However, inasmuch as HTTP is a ubiquitous network transport protocol, the DAP4 specification includes a volume devoted solely to how a server should implement DAP4 using HTTP.

The REST interface for the protocol is described in Volume 2, Web Services, of the specification. DAP4 requires that a server implement at least three responses for each dataset: the DSR, the DMR, and the Data response. The DSR is an XML document that provides a capabilities response for the dataset. It provides links to all of the other responses available for the dataset, and it describes alternative encodings for those responses in addition to enumerating the basic responses themselves. The DSR may also list server functions that may be used with the dataset.

DAP4 servers are encouraged to support HTTP content negotiation, providing the standard DSR, DMR and Data responses in a variety of forms.