DAP4: Overview

From OPeNDAP Documentation

Revision as of 19:25, 25 March 2014

Following two decades of stability and increasing use, DAP2 is being superseded by DAP4, the first substantive revision in the history of the Data Access Protocol (DAP), an open-source endeavor led by OPeNDAP, Inc. The primary and continuing purpose of DAP is to provide remote, selective data acquisition through a widely accepted, well-crafted set of Web services. This document outlines the fundamental concepts of DAP4 and, for those who have previously programmed DAP-compatible clients and servers, highlights how DAP4 differs from DAP2.

Data Acquisition via Web Services

The premise underlying DAP4 remains, as in DAP2, that values from datasets (or, notably, from proper subsets of them), along with pertinent metadata, may be acquired remotely via Web services. To a pleasantly surprising degree, DAP services shield users from idiosyncrasies in source-data formats and storage, so DAP functions as middleware with a further advantage: source data and users may reside anywhere that Internet connectivity permits Web use. OPeNDAP's commitment to open source has fostered several DAP-compatible servers and a larger number of DAP-compatible client environments, and many of these servers, clients and client-server libraries are available at no cost.

[more to come here, laying out the DAP4 concept in a manner that's accessible to those completely unfamiliar...]

How DAP4 is Specified

The DAP4 specification spans two volumes: one describes the Data Model and DAP4’s Request/Response objects; the other volume describes how DAP4 clients and servers communicate via HTTP and the modern Web. New volumes about DAP4 Extensions will be added as they emerge.

Partitioning the specification into two primary documents reflects the independence of DAP4’s data-acquisition functionality from the underlying network transfer protocol. Indeed, DAP4 could (via extensions) be used with other transports. However, utilizing HTTP eases the building of DAP servers because they can take full advantage of widely used Web-server frameworks such as Apache. Use of Extensions documents will enable evolution of the protocol without the expense and complexity of another major protocol-development project. Anticipated extensions include a JSON encoding for DAP4 data/metadata and the provision of server functions (beyond DAP4’s core subsetting and filtering operations).

[?should we insert here the table of contents (with active links) for volume I?]

[?should we insert here the table of contents (with active links) for volume II?]

How DAP4 Differs from DAP2

Though the protocol, per se, is maintained primarily by OPeNDAP, many others have implemented DAP2. One implementation, by Unidata at the University Corporation for Atmospheric Research (UCAR), includes the popular THREDDS Data Server (TDS). A key motivation for DAP4, developed jointly by OPeNDAP and Unidata (see "Acknowledgments," below), was to reduce the differences that have arisen among DAP2 implementations and that impede interoperability. Our hope is that a modernized, clearer and more comprehensive specification will make it easier to build interoperable clients and servers, making such ventures more rewarding and less risky.

This section covers changes to the data model, response formats, and serialization, giving developers a roadmap for migrating from DAP2 to DAP4. For example, DAP2's “Grid” type gives way to a more general notion of discrete functions (Coverages), similar to an OGC (or ISO) Coverage and to the Scientific Data Type found in Unidata’s Common Data Model (CDM). Users, too, can learn from this section which capabilities to look for in their clients. For example, DAP4 servers return checksums with each data response, but clients may make use of these to varying degrees.

DAP4 is largely an extension of DAP2 concepts, including ideas that emerged as DAP gained prominence across the Earth sciences. Therefore, DAP2-compatible software, in clients or servers, should be easy to adapt to DAP4, and this has been confirmed by the OPeNDAP-Unidata implementation and testing work. Furthermore, DAP4 exhibits backward compatibility sufficient to enable a gradual transition. Substantive changes include support for Groups, yielding greater compatibility with HDF and NetCDF4.

[most or all of the (as yet unedited) material below will be folded into subsections here, probably including:]

Data Model

Responses

Response Encoding

Acknowledgments

DAP4 is the result of a joint, multiyear development effort by OPeNDAP and Unidata, funded by a generous grant from NOAA and guided by an advisory committee comprising Mike Folk (THG), Jim Frew (UCSB), Steve Hankin (NOAA), Eric Kihn (NOAA), Chris Lynnes (NASA) and Rich Signell (USGS).


____unedited material____

DAP4 and Data Access

Data Model

Summary: DAP4 supports generalized coverages and Groups

The DAP4 data model is fundamentally the same as that of DAP2. Data are characterized as a collection of variables, each of which has a type, a name and one or more values. As in many programming languages and in DAP2, the types include Bytes, Integers (now including 64-bit integers), floating-point values, Strings, URLs, Structures and Sequences. DAP4 adds three new types: Enumeration, Opaque and Group. It also adds Shared Dimensions, which indicate relationships between different arrays and can be used to build and represent Coverages. Coverages are a more comprehensive replacement for DAP2's Grids, which have been removed from DAP4.
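The following Python sketch makes the list of fixed-size types concrete. It maps DAP4's numeric type names to `struct` format codes, which lets a client compute how many bytes a value or array occupies. The table `FIXED_SIZE_TYPES` and the helpers `value_size` and `array_size` are hypothetical. Treating Byte as a synonym for UInt8 is our assumption. Variable-length types (String, URL, Opaque) are left out because their encoded size depends on the data.

```python
import struct

# Hypothetical lookup table: fixed-size DAP4 numeric types -> struct format codes.
# Byte is treated here as a synonym for UInt8 (an assumption); Int64/UInt64 are
# the 64-bit integers that DAP4 adds relative to DAP2.
FIXED_SIZE_TYPES = {
    "Int8": "b", "UInt8": "B", "Byte": "B",
    "Int16": "h", "UInt16": "H",
    "Int32": "i", "UInt32": "I",
    "Int64": "q", "UInt64": "Q",
    "Float32": "f", "Float64": "d",
}


def value_size(dap4_type: str) -> int:
    """Size in bytes of one value of a fixed-size DAP4 type."""
    # '<' selects standard sizes with no alignment padding, whatever the host.
    return struct.calcsize("<" + FIXED_SIZE_TYPES[dap4_type])


def array_size(dap4_type: str, shape: tuple) -> int:
    """Size in bytes of an array with the given shape; () means a scalar."""
    count = 1
    for extent in shape:
        count *= extent
    return count * value_size(dap4_type)


print(value_size("Int64"))                # 8
print(array_size("Float32", (180, 360)))  # 259200
```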

In addition to variables, each dataset can contain an arbitrary number of attributes and an arbitrary number of Groups. An attribute, like a variable, binds a name, a type and a value, but attributes are intended to hold metadata about the dataset and about each variable it contains. Groups provide a way to organize collections of variables hierarchically and to preserve such organization when it is present in the underlying data store.
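This Python sketch shows how a client might hold this part of the model in memory. Variables carry a type, references to shared dimensions, and attributes. Groups nest and carry their own attributes. The class names and the `fully_qualified_names` helper are hypothetical. The `/group/variable` path form is an assumption about how names inside nested Groups are written.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    name: str
    dap4_type: str
    dims: list = field(default_factory=list)        # names of shared dimensions
    attributes: dict = field(default_factory=dict)  # metadata: name -> value


@dataclass
class Group:
    name: str
    variables: list = field(default_factory=list)
    groups: list = field(default_factory=list)      # nested Groups
    attributes: dict = field(default_factory=dict)


def fully_qualified_names(group: Group, prefix: str = "") -> list:
    """Walk a Group tree and return a '/'-separated path for every variable."""
    # The root group ("/") contributes no path component of its own.
    path = prefix if group.name == "/" else f"{prefix}/{group.name}"
    names = [f"{path}/{v.name}" for v in group.variables]
    for child in group.groups:
        names.extend(fully_qualified_names(child, path))
    return names


root = Group(
    name="/",
    variables=[Variable("time", "Float64", ["/time"])],
    groups=[Group("ocean", variables=[Variable("sst", "Float32", ["/time"], {"units": "degC"})])],
    attributes={"title": "Example dataset"},
)
print(fully_qualified_names(root))  # ['/time', '/ocean/sst']
```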

Migrating from DAP2 to DAP4

For servers: A DAP2 DDS/DAS (or DDX) is very close to a DAP4 DMR (Dataset Metadata Response). The set of datatypes supported by DAP4 is almost a proper superset of those in DAP2. The exception is that DAP2's Grid type has been removed and replaced by the Coverage. A Coverage is not a type per se; instead, it is a binding of two or more arrays using Shared Dimensions. Thus, to transform a DAP2 Grid into a DAP4 Coverage, a server must extract the dimensions of the Grid's Maps and use them to declare Shared Dimensions in the DMR. Because the DAP4 Coverage model completely subsumes DAP2 Grids, every Grid can be represented in DAP4 this way.
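The Grid-to-Coverage step can be sketched in Python with the standard `xml.etree.ElementTree` module. This is an illustration under stated assumptions, not a reference converter. The input is a hypothetical dict describing one DAP2 Grid, with its array and each Map's name, type and size. The output uses DMR-style `Dimension`, `Dim` and `Map` elements as we understand them, and it omits the namespace and version attributes a real DMR carries.

```python
import xml.etree.ElementTree as ET


def grid_to_coverage(grid: dict) -> ET.Element:
    """Turn a DAP2 Grid description into DMR-style shared dimensions plus arrays."""
    dataset = ET.Element("Dataset", name=grid["dataset"])
    # 1. Each Grid Map becomes a shared dimension declared at the top level.
    for m in grid["maps"]:
        ET.SubElement(dataset, "Dimension", name=m["name"], size=str(m["size"]))
    # 2. Each Map also becomes an ordinary 1-D array over its own shared dimension.
    for m in grid["maps"]:
        map_var = ET.SubElement(dataset, m["type"], name=m["name"])
        ET.SubElement(map_var, "Dim", name="/" + m["name"])
    # 3. The Grid's array uses the same shared dimensions and names its maps;
    #    this binding of arrays is what forms the Coverage.
    array = ET.SubElement(dataset, grid["type"], name=grid["name"])
    for m in grid["maps"]:
        ET.SubElement(array, "Dim", name="/" + m["name"])
    for m in grid["maps"]:
        ET.SubElement(array, "Map", name="/" + m["name"])
    return dataset


sst_grid = {  # hypothetical DAP2 Grid description
    "dataset": "example",
    "name": "sst",
    "type": "Float32",
    "maps": [
        {"name": "lat", "type": "Float32", "size": 180},
        {"name": "lon", "type": "Float32", "size": 360},
    ],
}
dmr = grid_to_coverage(sst_grid)
print(ET.tostring(dmr, encoding="unicode"))
```

Declaring each dimension once and referring to it by name is what lets a client recognize that `sst`, `lat` and `lon` belong together. In DAP2 that relationship was implied by nesting the arrays inside a Grid.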

For clients: Some of the new data types are more challenging to implement than the types included with DAP2. Of particular note are Enumerations and Coverages.
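For Enumerations, the client-side work comes down to carrying a table from integer values to labels alongside the data. The following minimal Python sketch illustrates this. The `Enumeration` class, its `decode` method and the example labels are all hypothetical. The sketch does not show how the declaration arrives in the DMR.

```python
from dataclasses import dataclass


@dataclass
class Enumeration:
    name: str
    base_type: str  # integer type used on the wire, e.g. "UInt8"
    labels: dict    # label -> integer value, as declared in the metadata

    def decode(self, values):
        """Translate raw integers into labels, rejecting undeclared values."""
        by_value = {v: label for label, v in self.labels.items()}
        decoded = []
        for raw in values:
            if raw not in by_value:
                raise ValueError(f"{raw} is not a declared value of {self.name}")
            decoded.append(by_value[raw])
        return decoded


cloud_cover = Enumeration("cloud_cover", "UInt8", {"clear": 0, "partly": 1, "overcast": 2})
print(cloud_cover.decode([0, 2, 1]))  # ['clear', 'overcast', 'partly']
```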

Responses

Summary:

  • DAP4 includes only one dataset metadata response, not two;
  • Several Sequences may be individually constrained in one access;
  • Predictable behavior for URLs;
  • Asynchronous responses.

In DAP4 there is a single XML document that encodes the metadata for a data source. This response is conceptually similar to, and in some ways identical to, the DDX response that many DAP2 servers support, so its organization will already be familiar to many people. As with DAP2, there is one data response, which can be modified (constrained) using an expression to limit the information it includes. The basic concepts of slicing an array are present, using the same essential notation. We have taken care to allow servers to extend this. The extension mechanism is covered in more detail below, in the section on how DAP4 works with HTTP.

We have replaced the selection part of the DAP2 constraint expression with a filter sub-expression that is applied to a specific variable. This enables two or more Sequences to have different filtering operations applied, which was not possible before. Our expanded constraint language also provides a way to subset Coverages. A proposed extension to the filter sub-expression provides a way to subset arrays and Coverages by value.
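The following Python sketch composes such an expression. It includes one slicing clause for an array and separate filters on two different Sequences. Several syntax details reflect our reading of the draft specification and should be treated as assumptions:

- the `dap4.ce` query-parameter name;
- the `;` clause separator;
- the `[start:stride:stop]` slice form;
- the `|` filter separator with comma-joined predicates.

The helper functions and variable names are hypothetical.

```python
from urllib.parse import quote


def slice_clause(var: str, slices) -> str:
    """'/var[start:stride:stop]...' with one bracket pair per dimension."""
    return var + "".join(f"[{start}:{stride}:{stop}]" for start, stride, stop in slices)


def filter_clause(var: str, predicates) -> str:
    """Attach a filter to one variable (e.g. a Sequence); predicates are ANDed."""
    return var + "|" + ",".join(predicates)


def build_ce(*clauses: str) -> str:
    """Join independent clauses; each clause constrains its own variable."""
    return ";".join(clauses)


def as_query(ce: str) -> str:
    """Percent-encode the expression for use as a URL query parameter."""
    return "dap4.ce=" + quote(ce, safe="/")


ce = build_ce(
    slice_clause("/sst", [(0, 1, 9), (0, 2, 359)]),
    filter_clause("/ship_obs", ["temp>20"]),
    filter_clause("/buoy_obs", ["depth<10", "salinity>30"]),
)
print(ce)  # /sst[0:1:9][0:2:359];/ship_obs|temp>20;/buoy_obs|depth<10,salinity>30
```

Each filter is attached to its own Sequence, so `/ship_obs` and `/buoy_obs` are filtered independently within one request. A single DAP2 selection clause could not express this.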

We wanted DAP4 to fully embrace REST. DAP2, even though it predates the term, included many, but not all, of the REST architecture's features. One change from DAP2 was to define explicitly what happens when a client dereferences a 'bare URL' (one without the extension used to ask for a specific DAP4 response). When a DAP4 server is asked to return information at a bare URL, the result is a Dataset Services Response (DSR), which contains links to all of the other responses for that dataset. In addition, the DSR may contain other information, such as server operations that can be used with the dataset (some of which may apply only to that particular dataset). The DSR is an XML document but can contain a stylesheet that transforms it to HTML for a Web browser.
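The following hypothetical server-side routing helper in Python illustrates the bare-URL rule. The suffixes `.dmr`, `.dap` and `.dsr` are assumptions about how the specific responses are requested, and `example.org` is a placeholder host. The point is only that a URL with no recognized suffix falls through to the DSR. The DSR can then advertise links to the other responses.

```python
from urllib.parse import urlsplit

# Assumed suffix -> response mapping; check against Volume 2 before relying on it.
RESPONSE_SUFFIXES = {".dmr": "DMR", ".dap": "Data", ".dsr": "DSR"}


def route(url: str):
    """Return (dataset_path, response_kind) for a request URL.

    A bare URL (no recognized suffix) yields the Dataset Services Response.
    """
    path = urlsplit(url).path  # the query string (e.g. a constraint) is ignored here
    for suffix, kind in RESPONSE_SUFFIXES.items():
        if path.endswith(suffix):
            return path[: -len(suffix)], kind
    return path, "DSR"


def dsr_links(base_url: str) -> dict:
    """Links a DSR might advertise for a dataset, keyed by response kind."""
    return {kind: base_url + suffix for suffix, kind in RESPONSE_SUFFIXES.items()}


print(route("http://example.org/data/sst.nc"))       # ('/data/sst.nc', 'DSR')
print(route("http://example.org/data/sst.nc.dmr"))   # ('/data/sst.nc', 'DMR')
```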

DAP4 servers also support asynchronous access to data. This enables access to data held on near-line devices and can be used for some server processing operations (e.g., operations that take a long time to perform). Asynchronous access works in two parts. First, the client includes a switch in its request informing the server that it knows the request may not have an immediate response. Second, the server replies not with the response itself but with a URL at which the response will be ready in the future.
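The client side of this exchange can be sketched in Python against a fake in-process server, so nothing goes over the network. Everything here is hypothetical, including `Reply`, `FakeNearLineServer`, its `request` and `fetch` methods, the status strings and the `?result` URL. The sketch shows only the pattern the text describes. The client opts in with a switch, receives a URL instead of data, and then polls that URL.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Reply:
    status: str                       # "ok", "accepted" or "pending"
    body: Optional[bytes] = None      # set when status == "ok"
    result_url: Optional[str] = None  # set when status == "accepted"


class FakeNearLineServer:
    """Hypothetical stand-in for a server that must stage data from near-line storage."""

    def __init__(self, polls_until_ready: int):
        self.remaining = polls_until_ready

    def request(self, url: str, accepts_async: bool) -> Reply:
        if not accepts_async:
            # This fake server cannot answer immediately, so it refuses.
            raise RuntimeError("client must accept an asynchronous response")
        return Reply("accepted", result_url=url + "?result")

    def fetch(self, result_url: str) -> Reply:
        if self.remaining > 0:
            self.remaining -= 1
            return Reply("pending")
        return Reply("ok", body=b"staged data")


def get_async(server, url: str, max_polls: int = 10,
              sleep: Callable[[float], None] = time.sleep) -> bytes:
    """Opt in to asynchronous delivery, then poll the returned URL until ready."""
    reply = server.request(url, accepts_async=True)
    if reply.status == "ok":          # the server answered immediately after all
        return reply.body
    for _ in range(max_polls):
        result = server.fetch(reply.result_url)
        if result.status == "ok":
            return result.body
        sleep(1.0)                    # back off before asking again
    raise TimeoutError(f"{url} not ready after {max_polls} polls")


data = get_async(FakeNearLineServer(polls_until_ready=3),
                 "http://example.org/tape/sst.nc.dap", sleep=lambda s: None)
print(data)  # b'staged data'
```

Injecting `sleep` keeps the example instant. A real client would wait between polls and would likely honor any retry hint the server provides.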

Migrating from DAP2 to DAP4

  • If your server or client already reads DAP2 DDX responses (which were never part of the official protocol but are widely used) then adapting to the DMR will be very easy since they are very close in structure.
  • Support for the new constraints may take a bit more work, since the Constraint Expression and Server Functions have now been separated.
  • Clients will benefit from asynchronous response support, but this is a new behavior and may take some serious thought, particularly for clients that relied on the simpler semantics borrowed from file system accesses.

Response Encoding

Summary:

  • Checksums for data values;
  • Reliable delivery of error messages to clients;
  • Encode data using the server's native word order.

We have made three changes to the encoding of returned data values.

First, all top-level variables in a data response now include a CRC32 checksum of their values. This lets people check whether repeating a request returns the same data values; if it does not, perhaps the data have changed. The checksum values are encoded in Attributes bound to the returned variables.

Second, we have added an encoding scheme for data values that preserves compactness yet allows clients to easily detect when a server has encountered an error while sending a response.

Finally, we have adopted a Reader Makes Right encoding scheme instead of the network byte order scheme used by DAP2. This last change has become more important as little-endian processors have come to predominate. Network byte order is big-endian, so it forced most servers and clients to swap bytes.
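The following minimal Python sketch computes and compares such a checksum with the standard `zlib.crc32`. For illustration, we assume that the checksum covers a variable's values exactly as serialized, which is why the byte order is a parameter. `checksum_variable` is a hypothetical helper. The sketch does not show how the value is attached as an Attribute.

```python
import struct
import zlib


def checksum_variable(values, fmt: str, little_endian: bool) -> int:
    """CRC32 over the serialized bytes of one top-level variable (assumed coverage)."""
    order = "<" if little_endian else ">"
    payload = struct.pack(f"{order}{len(values)}{fmt}", *values)
    # zlib.crc32 already returns an unsigned value in Python 3; the mask makes
    # the 32-bit range explicit.
    return zlib.crc32(payload) & 0xFFFFFFFF


first = checksum_variable([1, 2, 3], "i", little_endian=True)
again = checksum_variable([1, 2, 3], "i", little_endian=True)
changed = checksum_variable([1, 2, 4], "i", little_endian=True)
print(first == again, first == changed)  # True False
```

A client that stores the checksum from an earlier request can repeat the request and compare values. A mismatch indicates that the returned data differ.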

Migrating from DAP2 to DAP4

In many ways the encoding scheme is simpler for servers because the data response uses the server's native byte order. Clients must detect that byte order and swap bytes as needed. However, the server must correctly implement the chunking protocol used by the data response and must correctly compute CRC32 checksums for each of the top-level variables.
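The client's share of this work can be sketched in Python. The chunk-header layout used here is our assumption about the draft serialization and should be checked against Volume 1. We assume a 4-byte header in network order, with flags in the top 8 bits and the payload length in the low 24 bits. The flag bits mark the last chunk, an error chunk and little-endian data. `read_chunks`, `decode_int32` and `make_chunk` are hypothetical helpers.

```python
import struct

# Assumed chunk-header layout (verify against the specification):
# 4 bytes, big-endian; top 8 bits are flags, low 24 bits are the payload length.
FLAG_END = 0x01            # last chunk of the response
FLAG_ERROR = 0x02          # payload is an error message, not data
FLAG_LITTLE_ENDIAN = 0x04  # data were written in little-endian order


def read_chunks(stream: bytes):
    """Reassemble a chunked response; return (payload, little_endian)."""
    offset, parts, little_endian = 0, [], False
    while offset < len(stream):
        (header,) = struct.unpack_from(">I", stream, offset)
        flags, length = header >> 24, header & 0xFFFFFF
        offset += 4
        body = stream[offset:offset + length]
        offset += length
        if flags & FLAG_ERROR:
            # The server failed mid-response; report it instead of returning bad data.
            raise RuntimeError("server error: " + body.decode("utf-8", "replace"))
        little_endian = bool(flags & FLAG_LITTLE_ENDIAN)
        parts.append(body)
        if flags & FLAG_END:
            break
    return b"".join(parts), little_endian


def decode_int32(payload: bytes, little_endian: bool):
    """Reader makes right: read in the writer's order; struct swaps only if the host differs."""
    order = "<" if little_endian else ">"
    return list(struct.unpack(f"{order}{len(payload) // 4}i", payload))


def make_chunk(flags: int, body: bytes) -> bytes:
    """Build one chunk with the assumed header (used only for the example)."""
    return struct.pack(">I", (flags << 24) | len(body)) + body


data = struct.pack("<3i", 10, -2, 7)  # a little-endian server's native output
stream = (make_chunk(FLAG_LITTLE_ENDIAN, data[:8])
          + make_chunk(FLAG_LITTLE_ENDIAN | FLAG_END, data[8:]))
payload, le = read_chunks(stream)
print(decode_int32(payload, le))  # [10, -2, 7]
```

Because every chunk carries its own flags, the server can abandon a half-sent response with an error chunk. The client then sees a clear error rather than a truncated stream of values.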

How DAP4 Works with HTTP

Summary: DAP4 comes closer to the REST (Representational State Transfer) architecture and uses HATEOAS (Hypermedia as the Engine of Application State), making all of the server's responses explicit via links in a document.

DAP2 interwove DAP and HTTP, for example by using some HTTP headers as the only source of information that was critical to the DAP itself; DAP4 does not. Instead, DAP4 is completely isolated from HTTP, enabling it to work with other protocols without change. This does not mean that DAP4 does not use HTTP, only that it does not rely on it. As a result, it is simple to implement DAP4 servers that use a different transport protocol (AMQP, etc.). However, because HTTP is a ubiquitous network transport protocol, the DAP4 specification includes a volume devoted solely to how a server should implement DAP4 using HTTP.
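The separation can be pictured as a transport-neutral core plus a thin adapter for each transport. In this hypothetical Python sketch, `Dap4Core` knows nothing about HTTP, while `HttpAdapter` maps a GET path onto it. An AMQP adapter would wrap the same core. The `.dmr` suffix and the generic `application/xml` content type are assumptions, not values taken from the specification.

```python
class Dap4Core:
    """Transport-neutral core: knows datasets and responses, nothing about HTTP."""

    def __init__(self, datasets: dict):
        self.datasets = datasets  # dataset path -> DMR text (placeholder content)

    def dmr(self, dataset: str) -> str:
        return self.datasets[dataset]  # KeyError if the dataset is unknown


class HttpAdapter:
    """Maps HTTP-style GET requests onto the core; other transports would do the same."""

    def __init__(self, core: Dap4Core):
        self.core = core

    def handle_get(self, path: str):
        """Return (status_code, headers, body) for a GET request."""
        if not path.endswith(".dmr"):
            return 404, {}, b""
        dataset = path[: -len(".dmr")]
        try:
            body = self.core.dmr(dataset).encode("utf-8")
        except KeyError:
            return 404, {}, b""
        # Status codes and headers exist only here, never in the core.
        return 200, {"Content-Type": "application/xml"}, body


core = Dap4Core({"/data/sst.nc": '<Dataset name="sst.nc"/>'})
http = HttpAdapter(core)
print(http.handle_get("/data/sst.nc.dmr")[0])  # 200
```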

The REST interface for the protocol is described in Volume 2, Web Services, of the specification. DAP4 requires that a server implement at least three responses for each dataset: the DSR, the DMR and the Data response. The DSR is an XML document that provides a capabilities response for the dataset. This document provides links to all of the other responses available for the dataset, along with other information. In addition to enumerating the basic responses themselves, the DSR provides information about alternative encodings for them. The DSR may also list server functions that may be used with the dataset.

DAP4 servers are encouraged to support HTTP content negotiation, providing the standard DSR, DMR and Data responses in a variety of forms.
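Content negotiation of this kind is ordinary HTTP and is usually handled by the Web-server framework. For illustration only, here is a simplified Python matcher for the `Accept` header. It honors q-values, `*/*`, `type/*` and q=0 exclusions, but it ignores media-type parameters. The list of forms in which a DMR is offered is a hypothetical placeholder.

```python
def parse_accept(header: str):
    """Split an Accept header into (media_type, q) pairs, highest q first."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type, q = parts[0].lower(), 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        if media_type:
            prefs.append((media_type, q))
    # sorted() is stable (also with reverse=True), so ties keep the client's order.
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def negotiate(header: str, available: list):
    """Return the best available media type, or None (an HTTP 406 case)."""
    prefs = parse_accept(header)
    excluded = {mt for mt, q in prefs if q <= 0}  # q=0 means "not acceptable"
    candidates = [a for a in available if a not in excluded]
    for media_type, q in prefs:
        if q <= 0:
            break  # prefs are sorted, so nothing acceptable remains
        for candidate in candidates:
            if media_type in ("*/*", candidate):
                return candidate
            if media_type.endswith("/*") and candidate.startswith(media_type[:-1]):
                return candidate
    return None


# Hypothetical forms in which a server might offer the DMR.
DMR_FORMS = ["application/xml", "text/html"]
print(negotiate("text/html,application/xml;q=0.9,*/*;q=0.8", DMR_FORMS))  # text/html
```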

Migrating from DAP2 to DAP4

The Web service for DAP4 will likely need to be written from scratch, but the good news is that such services are easy to write. For clients, the behavioral differences between DAP2 and DAP4 servers are small, with two exceptions. First, because DAP4 supports asynchronous responses, clients will need to be modified to access data that are available only through this new feature. Second, DAP4 supports content negotiation, which means there are more ways to get the different responses (even though each protocol has three basic responses).