DAP4: Responses

From OPeNDAP Documentation

This is an old document that captures the starting point of the OPULS design work. It's out of date and should be referenced only as a baseline for the work.


Author: Jimg, NDP, ?

1 Overview

In response to DAP4 requests, a DAP4 system returns a chunked, multipart MIME document containing the appropriate DAP4 response. This document describes the DAP4 responses, the manner in which they are bundled into MIME documents, and the chunked response structure.

2 Response Chunking

All DAP4 responses from a DAP4 server will be chunked, independently of any chunking used by protocols such as HTTP. This chunking is essentially a message-based communications scheme: messages are sent in chunks and handled by the recipient. Out-of-band information can be passed via extension chunks. The information contained in the extension chunks MAY be used to change the meaning or context of subsequent data chunks. For now, only Errors will be sent in extension chunks.

The details of DAP4 Chunking are here.

This chunking schema provides the following desired outcomes:

  1. It provides a way for the server to send errors to the client in an out-of-band manner.
  2. It allows clients to know exactly how many bytes they can expect to read without a server (as opposed to a connection) error.
  3. It allows the client software to be in a position to deal with error messages as they arise, and easily locate the error content in the input stream.
  4. Because all messages are chunked, even errors generated by, say, the parsing of the constraint expression can be returned to the client in an error chunk (as the only chunk in the stream).
  5. It does not preclude a client from reading data from a partially completed response.
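
As a minimal sketch of how a client might consume such a stream, the Python below reads chunks one at a time. The wire format of the chunk header is not defined in this section, so everything about the layout here is an assumption made for illustration: a 4-byte big-endian header whose top byte is a chunk type and whose low 24 bits are the payload length, plus the hypothetical type codes CHUNK_DATA, CHUNK_END and CHUNK_ERROR. The helper make_chunk exists only to build test input.

```python
import io
import struct

# Hypothetical type codes; the real values are defined by the chunking document.
CHUNK_DATA, CHUNK_END, CHUNK_ERROR = 0, 1, 2


def make_chunk(chunk_type, payload):
    """Build one chunk using the assumed 1-byte type + 24-bit length header."""
    return struct.pack(">I", (chunk_type << 24) | len(payload)) + payload


def read_chunks(stream):
    """Yield (chunk_type, payload) pairs until an end or error chunk is seen.

    A short read raises EOFError, which marks a connection problem rather
    than an error reported by the server in an error chunk.
    """
    while True:
        header = stream.read(4)
        if len(header) < 4:
            raise EOFError("connection closed inside a chunk header")
        (word,) = struct.unpack(">I", header)
        chunk_type, size = word >> 24, word & 0xFFFFFF
        payload = stream.read(size)
        if len(payload) < size:
            raise EOFError("connection closed inside a chunk body")
        yield chunk_type, payload
        if chunk_type in (CHUNK_END, CHUNK_ERROR):
            return


stream = io.BytesIO(make_chunk(CHUNK_DATA, b"abcd") + make_chunk(CHUNK_ERROR, b"<Error/>"))
chunks = list(read_chunks(stream))
```

Because the header carries the payload length, the reader always knows how many bytes to expect, and it learns that a chunk holds an error before reading the error content.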

3 Persistent representations

DAP4 defines only two core responses that represent all of the information in a dataset: The Dataset and Data. (See the DAP4 Web Services document for a complete list of response objects - both required and suggested.)

Dataset
The Dataset response is requested by appending the suffix .xml to the file part of the dataset's referent (aka base) URL. The Dataset response is an XML document that contains all of the metadata included in the original dataset.
Data
The Data response is requested by appending the suffix .dap to the file part of the dataset's referent (aka base) URL. The Data response is a multipart MIME document that contains N+1 parts for a response with N variables.
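
The suffix rule can be sketched in a few lines of Python. The function name response_url and the keys "dataset" and "data" are hypothetical names chosen for this example; only the .xml and .dap suffixes come from the text above. The sketch assumes that any query string should be left in place after the path.

```python
from urllib.parse import urlsplit, urlunsplit

# Suffixes named in the text; the dictionary keys are illustrative only.
SUFFIXES = {"dataset": ".xml", "data": ".dap"}


def response_url(base_url, kind):
    """Append the response suffix to the path of a dataset's base URL."""
    parts = urlsplit(base_url)
    return urlunsplit(parts._replace(path=parts.path + SUFFIXES[kind]))


base = "http://test.opendap.org/opendap/data/nc/fnoc1.nc"
dataset_url = response_url(base, "dataset")
data_url = response_url(base, "data")
```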

3.1 Dataset Metadata Response

In DAP2, some important information was present only in the HTTP headers. In DAP4, all of the information specified by the protocol will be present in the Dataset Metadata Response (DMR) document. Some of that information may also be present in HTTP headers where appropriate, because doing so simplifies processing the response.

3.1.1 Document Organization

In DAP4 the DAP2 data model has been extended to include many new concepts and components. Groups, shared dimensions and user-defined types are just a few of the additions. For a more complete discussion see the new data model.

A rough syntax which describes how these additions will fit into the DAP and the existing Dataset notation is:

Dataset :== Groups
Groups :== null | Group Groups
Group :== SharedDimensions Attributes Groups Variables 
SharedDimensions :== null | SharedDimension SharedDimensions
Attributes :== null | Attribute Attributes
Variables :== null | Variable Variables

This pseudo-grammar does not capture what can be produced for a Group, et cetera. Instead it shows how these sections of the <Dataset/> document must be organized.

An XML schema for the Dataset response object may be found here: http://scm.opendap.org/trac/browser/trunk/xml/dap/dap4.xsd

NB: If a <Dataset/> document describes a dataset that has been constrained, attributes will not be included. It is not possible to know if attributes correctly describe the data once it has been constrained.

3.1.2 The Dataset Element

The Dataset element is the root element of the Dataset response.

The Dataset element has the following attributes:

name
The name of the dataset. This can be any name the server chooses. This should probably be the name of the file or database table/token.
version
The version of DAP used by the server to form this Dataset. This must be in int dot int form (e.g., "3.2", "4.11").
xml:base
The value of the xml:base attribute is the URL which was dereferenced to get this Dataset. The xml namespace should also be declared in the Dataset element.

NB: Because the <Dataset/> element, as defined by the schema, uses the Dublin Core, XLink and XML namespaces, those namespaces must be declared in the element or elsewhere in the document. You are not required to use the prefixes dc, xlink and xml, but please do use them, and please do define the namespaces in the <Dataset/> element. As with any XML document, you can define other namespaces anywhere they are needed.

Here's an example of the Dataset element declaration:

<Dataset name="fnoc1.nc"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://xml.opendap.org/ns/DAP/4.0#  http://xml.opendap.org/dap/dap4.xsd"
    xmlns="http://xml.opendap.org/ns/DAP/4.0#"
    xmlns:dap="http://xml.opendap.org/ns/DAP/4.0#"
    dapVersion="4.0"
    xmlns:xml="http://www.w3.org/XML/1998/namespace"
    xml:base="http://test.opendap.org/opendap/data/nc/fnoc1.nc.ddx"
    xmlns:xlink="..."
    xmlns:dc="..."
>
.
.
.
</Dataset>
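
A client might read these attributes as in the following Python sketch, which uses the standard library's ElementTree. It is an illustration under stated assumptions, not a reference implementation: the function name dataset_info is hypothetical, the version attribute is read as dapVersion because that is the spelling used in the example declaration, and the sample document is a cut-down version of that example.

```python
import re
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"
VERSION_RE = re.compile(r"\d+\.\d+")  # "int dot int", e.g. "3.2" or "4.11"


def dataset_info(xml_text):
    """Return the name, version and xml:base of a Dataset root element."""
    root = ET.fromstring(xml_text)
    version = root.get("dapVersion")  # attribute name taken from the example
    if version is None or VERSION_RE.fullmatch(version) is None:
        raise ValueError(f"version is not in int.int form: {version!r}")
    return {
        "name": root.get("name"),
        "version": version,
        # ElementTree exposes xml:base under the XML namespace URI.
        "base": root.get(f"{{{XML_NS}}}base"),
    }


sample = (
    '<Dataset name="fnoc1.nc" dapVersion="4.0" '
    'xmlns="http://xml.opendap.org/ns/DAP/4.0#" '
    'xml:base="http://test.opendap.org/opendap/data/nc/fnoc1.nc.ddx"/>'
)
info = dataset_info(sample)
```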

3.2 Data Response

A Data response is the way DAP4 returns data to a client. Each Data response is returned over the wire as a multipart MIME document where the first MIME part contains the constrained Dataset Metadata response describing the data requested and the following MIME part contains the binary encoded data values for each variable in the dataset. The MIME headers of the binary part identify the endianness of the binary content.

Some aspects of this design have been borrowed from the W3C's "SOAP Messages with Attachments" and the OGC's "WCS Version 1.1 Corrigendum 2" specifications. See also The MIME Multipart/Related Content-type (RFC 2387) and MIME part one.

In DAP2 the 'data' or 'DataDDS' response is a MIME document with Content-Type 'application/octet-stream', which means essentially that the contents of the MIME document are binary and application specific, in this case specific to applications that understand DAP2. Within that document, the DDS is used to provide the syntax needed to decode the binary information. Following the DDS is a separator, and following that are data values written to the document using XDR.
The use of XDR is solely to ensure that the data values can be read on both little- and big-endian machines and that floating-point values do not suffer from the many different representations commonly found. In addition, XDR is used to include information about the size of arrays, strings and URLs, the latter two of which are really special-case array types. Thus XDR provides a common encoding for the bits and bytes to be transferred. It does not, however, represent any of the more complex structural information such as the organization of relational data.
The DDS sent with the DataDDS response is used to describe the organization of the data not covered by XDR. For example, if the response calls for values from three variables to be returned, the DDS in the DataDDS response will list those three variables and, furthermore, do so in the order that their values appear in the response. The variables described in the DDS response match exactly in number, type, shape and order with the data in the 'data part' of the response.

The Data response follows the basic design of DAP2's DataDDS response closely. The Dataset document included describes the number, type, shape and order of each variable with values in the binary part of the response. However, while the DAP2 response used a simple application/octet-stream document, DAP4 uses a multipart MIME document. The design of this document/response can accommodate including several different data requests in one document, a feature useful for implementations of DAP that do not use HTTP for transport.

3.2.1 Response Chunking Of Binary Data Part

The binary part of the DAP4 Data Response will be chunked, independently of any chunking used by underlying protocols such as HTTP. This chunking is essentially a message-based communications scheme: messages are sent in chunks and handled by the recipient. Out-of-band information can be passed via extension chunks. The information contained in the extension chunks MAY be used to change the meaning or context of subsequent data chunks. For now, only Errors will be sent in extension chunks.

The details of DAP4 Chunking are here.

This chunking schema provides the following desired outcomes:

  1. It provides a way for the server to send errors to the client in an out-of-band manner.
  2. It allows clients to know exactly how many bytes they can expect to read without a server (as opposed to a connection) error.
  3. It allows the client software to be in a position to deal with error messages as they arise, and easily locate the error content in the input stream.
  4. Because all messages are chunked, even errors generated by, say, the parsing of the constraint expression can be returned to the client in an error chunk (as the only chunk in the stream).
  5. It does not preclude a client from reading data from a partially completed response.

3.2.2 Transmitting Attributes in constrained Dataset documents

The Dataset document contained in a constrained Dataset or in the Data response will not contain any Attribute nodes. (The Dataset document in the Data response is always 'constrained'.)

Since the contents of the Data response are the result of access to the data subject to a constraint, various aspects of any of the variables in the response may have been changed. To make these changes the DAP must take into account the semantics of each of the variables' data types. It can do this because the semantics for the types are well defined and known a priori. However, this is not the case for attributes, where the semantics are intentionally not part of the DAP. The DAP is merely an 'envelope' for the name-type-value tuples of the attributes.

To understand why this restriction is placed on the Dataset document returned in the Data response, let's examine a common example. Suppose an image has some extent and has attributes that name that extent. A geographical image might have attributes that provide the latitude and longitude of two opposite corners, and a medical image might have attributes that provide the height and width in millimeters. Now suppose the image is constrained in one or more dimensions. How should the attribute values be treated? If they are left alone they are likely no longer correct, but modifying them requires detailed information about how they map to the image. While this information might be known to a client that understands a particular subject area, expecting the server to handle the attributes correctly would require it to know about every subject area for all of the data to be served.

An alternative to 'universal knowledge' is to allow servers to return attributes that have 'well known' semantics and drop other attributes. While this is appealing at first, it presents a complex situation to clients: to make use of the attributes in the returned DataDDX response, they must know to test for them and, if they are not present, fall back to some default behavior. In our opinion, it is easier to present clients with fewer 'optional behaviors', especially when the fallback is likely to compute the needed value anyway.

3.2.3 Organization of the multipart MIME document

Here's what the shell of the document looks like:

   Content-Type: multipart/related; type="application/vnd.org.opendap.dap4.data"; start="<<start id>>";  boundary="<<boundary>>"
 
   --<<boundary>>
   Content-Type: application/vnd.org.opendap.dap4.dataset-metadata+xml; charset=UTF-8
   Content-Transfer-Encoding: binary
   Content-Id: <<start id>>
   Content-Description: ddx

   <<Dataset document here. This includes a reference to <<data id>> >>

   --<<boundary>>
   Content-Type: application/vnd.org.opendap.dap4.data.big-endian
   Content-Transfer-Encoding: binary
   Content-Id: <<data id>>
   Content-Description: data
   
   <<Binary data>>
      
   --<<boundary>>

The example shows three sets of MIME headers separated by three --<<boundary>> lines; the third boundary line terminates the document. The first group of headers (in a real response, there would be other headers here like Date, XDAP, and others) provides the information needed to recognize the boundary separators and to find the first part of the document by matching the value of start to the Content-Id of one of the parts. The payload of that first part contains references to the related parts using the values of their Content-Id headers.
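
The matching of start to a Content-Id can be sketched in Python as follows. This is a simplified illustration, not a complete MIME parser: it assumes CRLF line endings, that the boundary and start values have already been taken from the top-level Content-Type header, and that the boundary string never occurs inside the binary payload. The names split_parts and find_start_part, the boundary "b1" and the Content-Id values are all hypothetical. The closing boundary is accepted both in the bare form shown in the shell above and with a trailing "--".

```python
def split_parts(body, boundary):
    """Split a multipart body into a list of (headers, payload) pairs."""
    delimiter = b"--" + boundary.encode("ascii")
    parts = []
    # The first segment is whatever precedes the first boundary; skip it.
    for segment in body.split(delimiter)[1:]:
        if segment.strip() in (b"", b"--"):
            continue  # closing boundary line
        head, _, payload = segment.partition(b"\r\n\r\n")
        headers = {}
        for line in head.decode("ascii").splitlines():
            if ":" in line:
                name, _, value = line.partition(":")
                headers[name.strip().lower()] = value.strip()
        if payload.endswith(b"\r\n"):
            payload = payload[:-2]  # the CRLF before the next boundary
        parts.append((headers, payload))
    return parts


def find_start_part(parts, start):
    """Return the part whose Content-Id equals the start parameter."""
    for headers, payload in parts:
        if headers.get("content-id") == start:
            return headers, payload
    raise LookupError(f"no part with Content-Id {start}")


body = (
    b"--b1\r\n"
    b"Content-Type: application/vnd.org.opendap.dap4.dataset-metadata+xml; charset=UTF-8\r\n"
    b"Content-Id: <root-id>\r\n"
    b"\r\n"
    b"<Dataset/>\r\n"
    b"--b1\r\n"
    b"Content-Type: application/vnd.org.opendap.dap4.data.big-endian\r\n"
    b"Content-Id: <data-id>\r\n"
    b"\r\n"
    b"\x00\x00\x00\x2a\r\n"
    b"--b1\r\n"
)
parts = split_parts(body, "b1")
root_headers, root_payload = find_start_part(parts, "<root-id>")
```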

The Dataset document in the first part differs from the one sent as its own response in two ways:

  1. It contains no Attribute objects.
  2. Each DAP4 variable declaration contains an xlink:href whose value is the value of the Content-Id of the MIME part containing the XDR-encoded binary data for the variable.

3.2.4 Choosing values for the Data document Content-Ids and Boundaries

We would like the software that builds these Data responses to be compatible with as many different transport protocols as possible, so long as the cost to the implementations we know we must support is low. One thing that some transport protocols may do is combine several Data responses into a single document. While the specifics of that will vary between protocols, one choice we can make now to facilitate it is to ensure that the values of the Content-Ids and <<boundary>>s are unique within and across systems. This will free software that combines Data responses from having to process the Dataset document and Content-Id headers to ensure that no name collisions are present. While using UUIDs, for example, makes the resulting values 'ugly', it adds virtually nothing to the time needed to build or process the responses. Other schemes that combine a URI with some system-generated token could also be employed. The important point is to ensure that these symbols are unique not only within a system, but across systems.
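
One way to generate such values is with random UUIDs, as in this Python sketch. The function names, the "dap4-" boundary prefix and the default domain are illustrative assumptions, not part of the design.

```python
import uuid


def new_content_id(domain="example.invalid"):
    """Return a Content-Id value built from a random UUID (domain is a placeholder)."""
    return f"<{uuid.uuid4()}@{domain}>"


def new_boundary():
    """Return a multipart boundary built from a random UUID."""
    return f"dap4-{uuid.uuid4().hex}"


start_id = new_content_id()
data_id = new_content_id()
boundary = new_boundary()
```

Because the values are random rather than derived from the dataset, software that merges several Data responses can concatenate their parts without first checking for collisions.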

3.2.5 Changes to the encoding of data

There are some issues with the way data values are encoded in DAP2 that we can address now.

  1. Arrays are prefixed with their sizes, the total number of elements, twice in DAP2 because of an initial misuse of the XDR library. Now is the time to fix that and have just one copy of the Array size in DAP4.
  2. Sequences are encoded in a way that's optimal but which requires fairly complex Constraint expression evaluation. We can reduce the likelihood that servers fail to implement the Selection sub-expression evaluation by simplifying it a bit.
  3. We can embed tags in the binary data to make it easier to read.
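
The first item can be illustrated with a short Python sketch that encodes an array of 32-bit integers both ways. It assumes XDR's 4-byte big-endian integers and models only the length prefix; the function names are hypothetical, and the single-prefix form is an illustration of the proposed change rather than a final DAP4 encoding.

```python
import struct


def encode_int32_array_dap2(values):
    """DAP2 style: the element count appears twice before the values."""
    n = len(values)
    return struct.pack(f">II{n}i", n, n, *values)


def encode_int32_array_single_prefix(values):
    """Proposed style: the element count appears once before the values."""
    n = len(values)
    return struct.pack(f">I{n}i", n, *values)


old = encode_int32_array_dap2([1, 2, 3])
new = encode_int32_array_single_prefix([1, 2, 3])
```

For any array the two encodings differ only by the extra 4-byte count at the front of the DAP2 form.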

3.3 Error Response

An unsuccessful DAP4 request will cause the server to return a DAP4 error response. The error response may be returned in lieu of the Dataset response, or as part of the Data response. The XML used in the Error response is detailed in the DAP4 schema.

DAP4 Data responses are chunked and DAP4 errors always appear in an error chunk. As the client processes a DAP4 response it reads the (fixed length) chunk header prior to reading the chunk. The chunk header will signal to the client that the following chunk contains a DAP4 error. This enables the client to transition to an error processing state prior to ingesting the error. This is true even if the response contains only an error chunk.

3.3.1 Internal Error

The error is internal to the Server, most likely a programming bug/issue.

Example
<Error type="Internal">
    <Message>The server encountered a null pointer. Ouch.</Message>
    <Administrator>admin.email.address@your.domain.name</Administrator>
</Error>

3.3.2 User Syntax Error

The request contains a syntax error in the selection or the projection clause.

Example
<Error type="Syntax">
    <Message>Relational constraints may not be applied to DAP Structures.</Message>
    <Administrator>admin.email.address@your.domain.name</Administrator>
</Error>

3.3.3 Forbidden Error

The requestor is not allowed to access the resource.

Example
<Error type="Forbidden">
    <Message>The requested resource may not be accessed.</Message>
    <Administrator>admin.email.address@your.domain.name</Administrator>
</Error>

3.3.4 Not Found Error

The requested resource cannot be found.

Example
<Error type="NotFound">
    <Message>Unable to locate resource /data/nc/fnoc10.nc</Message>
    <Administrator>admin.email.address@your.domain.name</Administrator>
</Error>
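
A client that has read the content of an error chunk could extract the fields of the Error documents in Section 3.3 with a sketch like the following. It assumes the element and attribute names used in the examples (Error, type, Message, Administrator) and no XML namespace; the function name parse_error is hypothetical, and the DAP4 schema remains the authority on the actual structure.

```python
import xml.etree.ElementTree as ET


def parse_error(xml_text):
    """Return the type, message and administrator of a DAP4 Error document."""
    root = ET.fromstring(xml_text)
    if root.tag != "Error":
        raise ValueError(f"not an Error document: <{root.tag}>")
    return {
        "type": root.get("type"),
        "message": root.findtext("Message"),
        "administrator": root.findtext("Administrator"),
    }


error = parse_error(
    '<Error type="NotFound">'
    "<Message>Unable to locate resource /data/nc/fnoc10.nc</Message>"
    "<Administrator>admin.email.address@your.domain.name</Administrator>"
    "</Error>"
)
```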

4 Asynchronous Responses

Rather than duplicating content (and maintaining multiple copies) I have simply moved the content of this section to its own DAP4 proposal page. When it's sorted out and adopted I'll move it back. ndp 13:26, 5 April 2012 (PDT)