DAP4: Overview: Difference between revisions

From OPeNDAP Documentation

Revision as of 04:00, 10 March 2014

DAP4 is the result of a three-year effort by OPeNDAP and Unidata. When we started the DAP4 project we had a number of goals in mind, but none was more pressing than extending the utility of DAP2 by making it easier to build servers and clients that are interoperable. This would make the effort spent on developing those tools more rewarding and less risky.

In many ways DAP4 is an extension of ideas already present in DAP2, or of ideas that were introduced into DAP2 during the 10+ years in which the protocol evolved as its use spread within the Earth Sciences community. Most developers will find that adapting existing software that implements DAP2 to support DAP4 is also easy.

In this overview of DAP4, we highlight the important differences between DAP2 and DAP4. The information here should be useful to developers and to users. For developers it provides a roadmap to the changes they will have to make to support DAP4, including information about changes to the data model, response format, and content. For example, the Grid data type has been extended to support a more general notion of discrete functions, one that is very similar to the OGC's idea of a coverage or to a scientific data type in the Common Data Model (CDM) developed by Unidata. For users it provides information about new capabilities to look for in DAP4 clients. For example, all DAP4 servers will return checksums with every data access operation, but different clients will expose those checksums in different ways. Neither of these differences affects the underlying nature of DAP (2 or 4), which is that data values are accessed using subsetting operations in a way that shields the user from the idiosyncrasies of particular data formats or storage devices.

How the DAP4 Protocol is Specified

The DAP4 specification is provided in two volumes: one that describes the Data Model and Request/Response objects, and one that describes how DAP4 servers use HTTP and the existing web infrastructure. Further volumes that describe extensions to DAP4 will also be supported. Planned extensions will cover a JSON encoding for DAP4 metadata and data, and the behavior of server processing operations. The division of the DAP4 specification into these separate documents makes it simpler to see how the data access operations that are central to DAP4 are separate from the network transfer protocol. By formalizing this separation, we have paved the way for DAP4 extension documents to describe how the protocol might be used with other transport protocols. The extension documents provide a way for developers to continue the evolution of the protocol without the expense and complexity of yet another protocol development project.

DAP4 and Data Access

Data Model

Summary: DAP4 supports generalized coverages and Groups

The DAP4 data model is fundamentally the same as that of DAP2. Data are characterized as a collection of variables, each of which has a type, a name and one or more values. As with many programming languages and with DAP2, the types include Bytes, Integers (now including 64-bit integers), Floating point values, Strings, URLs, Structures and Sequences. We have added some new types in DAP4: Enumeration, Opaque, and Group. In addition, we have added Shared Dimensions, which indicate relations between different arrays and can be used to build or represent Coverages. In DAP4, Coverages provide a more comprehensive replacement for Grids, and Grids have been removed.

In addition to variables, each dataset can contain an arbitrary number of attributes and an arbitrary number of Groups. An attribute is a binding of name, type and value, like a variable, but is intended to hold metadata about the dataset and about each variable it contains. Groups provide a way to organize collections of variables and to encode such groupings when they are present in the underlying data store.
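The relationships described above can be sketched as a handful of plain data structures. This is only an illustration of the concepts, not the representation used by any DAP4 library; every class and field name here is hypothetical, and the type names are written as simple strings.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Attribute:
    """Metadata: a binding of name, type and value."""
    name: str
    type: str
    value: Any


@dataclass
class Dimension:
    """A named, sized dimension that several arrays may share."""
    name: str
    size: int


@dataclass
class Variable:
    """Data: a name and a type, plus optional dimensions and attributes."""
    name: str
    type: str
    dims: List[Dimension] = field(default_factory=list)
    attributes: List[Attribute] = field(default_factory=list)


@dataclass
class Group:
    """A container for variables, attributes and nested Groups."""
    name: str
    variables: Dict[str, Variable] = field(default_factory=dict)
    attributes: List[Attribute] = field(default_factory=list)
    groups: Dict[str, "Group"] = field(default_factory=dict)


def shares_dimension(a: Variable, b: Variable) -> bool:
    """True when both variables refer to at least one identical Dimension object."""
    return any(da is db for da in a.dims for db in b.dims)


# A tiny dataset: temp is defined over (time, lat), and the coordinate arrays
# use the very same Dimension objects, which is what relates them to temp.
time = Dimension("time", 4)
lat = Dimension("lat", 3)
root = Group("/")
root.variables["time"] = Variable("time", "Int64", [time])
root.variables["lat"] = Variable("lat", "Float32", [lat])
root.variables["temp"] = Variable(
    "temp", "Float32", [time, lat],
    [Attribute("units", "String", "K")],
)
```

In this sketch a coverage is nothing more than an array plus the coordinate arrays that share its dimensions, which is why `shares_dimension` compares Dimension objects by identity rather than by name.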

Responses

Summary:

  • DAP4 includes only one dataset metadata response, not two;
  • Several Sequences may be individually constrained in one access;
  • Predictable behavior for URLs;
  • Asynchronous responses.

In DAP4 there is a single XML document that encodes the metadata for a data source. This response is conceptually similar to, and in some ways identical to, the DDX response that is supported by many DAP2 servers, so its organization will already be familiar to many people. As with DAP2, there is one data response that can be modified (constrained) using an expression to limit the information it includes. The basic concepts of slicing an array are present using the same essential notation. We've taken care to allow servers to extend this, something that is covered in a bit more detail below under web services. We have replaced the selection part of the DAP2 constraint expression with a filter sub-expression that is applied to a specific variable. This enables two or more Sequences to have different filtering operations applied in a single request, which was not possible before. Our expanded constraint language also provides a way to subset coverages, and a proposed extension to the filtering sub-expression provides a way to subset arrays and coverages by value.
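To make the subsetting ideas concrete, here is a minimal sketch of what array slicing and per-Sequence filtering mean, written against plain Python lists. It assumes the DAP2-style bracket notation `[start:stride:stop]` with an inclusive stop index; the function names are hypothetical, and the sketch does not attempt to reproduce the actual DAP4 constraint expression grammar.

```python
import re
from typing import Callable, Dict, List, Tuple

# One bracket group per dimension: [i], [start:stop] or [start:stride:stop].
_SLICE = re.compile(r"\[(\d+)(?::(\d+))?(?::(\d+))?\]")


def parse_slices(text: str) -> List[Tuple[int, int, int]]:
    """Turn '[1:2:3][0:1]' into a list of (start, stride, stop) tuples."""
    slices = []
    for match in _SLICE.finditer(text):
        a, b, c = match.group(1), match.group(2), match.group(3)
        if b is None:        # [i] selects a single index
            slices.append((int(a), 1, int(a)))
        elif c is None:      # [start:stop] implies a stride of 1
            slices.append((int(a), 1, int(b)))
        else:                # [start:stride:stop]
            slices.append((int(a), int(b), int(c)))
    return slices


def subset(values, slices):
    """Apply one slice per dimension to a nested list; stop is inclusive."""
    if not slices:
        return values
    start, stride, stop = slices[0]
    picked = [values[i] for i in range(start, stop + 1, stride)]
    return [subset(v, slices[1:]) for v in picked]


def filter_sequence(rows: List[Dict], keep: Callable[[Dict], bool]) -> List[Dict]:
    """Keep only the rows of one Sequence that satisfy that Sequence's own filter."""
    return [row for row in rows if keep(row)]


# A 4 x 5 array whose values encode their own row and column.
grid = [[row * 10 + col for col in range(5)] for row in range(4)]
corner = subset(grid, parse_slices("[1:2:3][0:1]"))  # rows 1 and 3, columns 0 and 1

# Two Sequences, each with a different filter, handled in one pass.
casts = [{"depth": 5}, {"depth": 50}]
stations = [{"lat": 10.0}, {"lat": 60.0}]
shallow = filter_sequence(casts, lambda r: r["depth"] < 10)
northern = filter_sequence(stations, lambda r: r["lat"] > 45)
```

The point of the last four lines is that each filter is attached to one variable, so two Sequences in the same request can be constrained independently.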

We wanted DAP4 to fully embrace REST. DAP2, even though it predates the term, includes many, but not all, of the REST architecture's features. One change from DAP2 was to explicitly define what happens when a client dereferences a 'bare URL' (one without an extension used to ask for a specific DAP4 response). When a DAP4 server is asked to return information at a bare URL, the result is a Dataset Services Response (DSR), which contains links to all of the other responses for that dataset. In addition, the DSR may contain other information, such as server operations that can be used with the dataset (and perhaps only with that particular dataset). The DSR is an XML document but can contain a stylesheet that transforms it to HTML for a web browser.
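The bare-URL rule can be sketched as a small dispatch function. The suffixes `.dmr` and `.dap` and the example URLs below are assumptions made for illustration; the authoritative list of extensions belongs to the HTTP volume of the specification and to each server. A real DSR is an XML document, whereas this sketch returns a plain dictionary of links.

```python
from typing import Dict
from urllib.parse import urlsplit

# Hypothetical suffix table used only by this sketch.
RESPONSES = {".dmr": "metadata", ".dap": "data"}


def requested_response(url: str) -> str:
    """Decide which response a URL asks for; a bare URL yields the DSR."""
    path = urlsplit(url).path  # the query string is ignored here
    for suffix, kind in RESPONSES.items():
        if path.endswith(suffix):
            return kind
    return "dsr"


def build_dsr_links(dataset_url: str) -> Dict[str, str]:
    """Collect the links a DSR would carry to the dataset's other responses."""
    return {kind: dataset_url + suffix for suffix, kind in RESPONSES.items()}


links = build_dsr_links("http://example.org/data/sst.nc")
```

Because the DSR lists every other response, a client that knows only the bare URL can discover the rest by following links instead of constructing URLs itself.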

DAP4 servers also support asynchronous access to data, which enables access to data on near-line devices and can be used for some server processing operations (e.g., operations that take a long time to perform). Asynchronous access is accomplished by combining two things: a switch in the request that tells the server the client knows the request may not have an immediate response, and a response that contains, instead of the data itself, a URL to a response that will be ready in the future.
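The exchange can be sketched with an in-memory stand-in for the server. Everything in this example is hypothetical: the status strings, the `accept_async` switch, the `/result` URL and the class names are invented for illustration and are not taken from the specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reply:
    status: str                        # "ok", "accepted", "pending" or "async-required"
    body: Optional[bytes] = None
    result_url: Optional[str] = None


class SimulatedServer:
    """Stand-in for a server whose data need a few polls before they are ready."""

    def __init__(self, polls_until_ready: int) -> None:
        self._remaining = polls_until_ready

    def request(self, url: str, accept_async: bool) -> Reply:
        # Without the switch, the server only reports that async access is needed.
        if not accept_async:
            return Reply("async-required")
        return Reply("accepted", result_url=url + "/result")

    def poll(self, result_url: str) -> Reply:
        if self._remaining > 0:
            self._remaining -= 1
            return Reply("pending")
        return Reply("ok", body=b"data values")


def fetch(server: SimulatedServer, url: str, max_polls: int = 10) -> Optional[bytes]:
    """Ask with the async switch set, then poll the returned URL until ready."""
    first = server.request(url, accept_async=True)
    if first.status == "ok":
        return first.body
    if first.status != "accepted" or first.result_url is None:
        raise RuntimeError("unexpected reply: " + first.status)
    for _ in range(max_polls):
        reply = server.poll(first.result_url)
        if reply.status == "ok":
            return reply.body
        # A real client would wait here before asking again.
    raise TimeoutError("response not ready after %d polls" % max_polls)


data = fetch(SimulatedServer(polls_until_ready=3), "http://example.org/data/sst.nc")
```

The client never blocks on the original request; it holds only a URL and decides for itself how often and how long to keep asking.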

Response Encoding

Summary:

  • Checksums for data values;
  • Reliable delivery of error messages to clients;
  • Encode data using the server's native word order.

We have made three changes to the encoding of returned data values. First, all top-level variables in a data response now include a CRC32 checksum of their values. This enables people to see whether the same request is returning the same data values (maybe the data have been changed?). The checksum values are encoded in Attributes bound to the returned variables. Second, we have added an encoding scheme for data values that preserves compactness yet allows clients to easily detect when a server has encountered an error while sending a response. Third, we have adopted a Reader Make Right encoding scheme instead of the network byte order scheme used by DAP2. This last change has become more and more important as the predominance of little-endian processors has increased.
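The checksum and byte-order ideas can be sketched with the Python standard library. This is an illustration under stated assumptions, not the DAP4 wire format: the values are 32-bit integers, the sender's byte order travels alongside the payload as a simple string, and the checksum is computed over the payload bytes exactly as sent. The function names are hypothetical.

```python
import struct
import sys
import zlib
from typing import List, Tuple


def _format(byte_order: str, count: int) -> str:
    """Build a struct format for `count` 32-bit ints in the given byte order."""
    return ("<" if byte_order == "little" else ">") + "%di" % count


def encode(values: List[int], byte_order: str = sys.byteorder) -> Tuple[str, bytes]:
    """Sender: write values in its own word order and declare which order that is."""
    return byte_order, struct.pack(_format(byte_order, len(values)), *values)


def decode(byte_order: str, payload: bytes) -> List[int]:
    """Reader: interpret the payload using the order the sender declared."""
    return list(struct.unpack(_format(byte_order, len(payload) // 4), payload))


def checksum(payload: bytes) -> int:
    """CRC32 of the payload as an unsigned 32-bit integer."""
    return zlib.crc32(payload) & 0xFFFFFFFF


values = [1, -2, 300000]
order, payload = encode(values, "little")
same_values = decode(order, payload) == values
unchanged = checksum(payload) == checksum(encode(values, "little")[1])
```

With network byte order, a little-endian server and a little-endian client would each swap every value once; when the reader makes it right, a swap happens only if the two machines actually differ. Repeating a request and comparing checksums is a cheap way to notice that the underlying data have changed.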

How DAP4 Works with HTTP

  • Web services
    • Defines how the protocol is used with HTTP --> implies that we've thought explicitly about using it with other protocols
    • Alternate media types - content negotiation
    • DSR --> HATEOAS

While DAP2 interwove the DAP and HTTP, using, for example, some of the HTTP headers as the only source of information that was critical to the DAP itself, DAP4 does not. Instead, DAP4 is completely separated from HTTP. This does not mean that DAP4 does not use HTTP, only that it does not rely on it, which makes it simple to implement DAP4 servers that use a different protocol for transport (AMQP, etc.). However, inasmuch as HTTP is a critical and almost ubiquitous network transport protocol, the DAP4 specification includes a volume devoted solely to how a server should implement DAP4 using HTTP.