DAP 4.0 Essential Features

From OPeNDAP Documentation
Version
We need to be able to extend the protocol so that different versions of servers and clients can still interoperate. There are, of course, going to be conditions where data that cannot be encoded using an older version of the protocol cannot be served to an older client (because there's no way for it to support the newer protocol and still be an 'older client'). But newer clients can understand the older protocol and should behave reasonably.
Error
We need a way to reliably report errors when they are detected in midstream of data transmission. This is the primary 'new feature' of the transmission part of the protocol.
NGrid
We need a way to represent more sophisticated kinds of data organization than the simple vector maps supported by Grid. This is a change to the data model of the protocol.
Shared Dimensions
Shared Dimensions may not be necessary if NGrid is defined correctly. Alternatively, they may provide a way to redefine how the concepts represented by Grid and NGrid are encoded in the data model. I think these two things are different approaches to the same basic concepts.

These three/four features are the critical changes to the DAP to move it from version 2 to version 4. Other types like 64-bit integers, Opaque, and Enum seem critical, but once version number processing is handled correctly, they become much simpler enhancements. The NGrid/Shared-Dimension data model extension is fundamentally one that will affect the flow of control in clients, so it has more profound consequences.

Left off this list are two lurking issues:

  • Groups, as defined in HDF5
  • Asynchronous responses

Version

There are three techniques we are looking at and/or have implemented so far:

  • Using HTTP MIME headers (which can be generalized to be Request MIME document headers; not limited to HTTP)
  • Using a function or keyword in the query string part of the URL
  • Making new URLs by adding to the path part of the URL

None are perfect...

HTTP/MIME headers

The existing code uses the XDAP HTTP/MIME header. This could be adapted for use with other protocols but that can be cumbersome. This does not work for clients like web browsers without fairly complex setup. Compounding this is that client libraries are not using it.
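
As a minimal sketch of the header approach, a client might attach the protocol version it accepts to each request. The header name `XDAP-Accept` and the helper `build_request` are assumptions made for illustration; the text above only refers to "the XDAP header". The request is built but never sent.

```python
from urllib.request import Request

# Assumed header name; the text above only refers to "the XDAP header".
VERSION_HEADER = "XDAP-Accept"


def build_request(url: str, dap_version: str) -> Request:
    """Build (but do not send) a request that advertises a protocol version."""
    return Request(url, headers={VERSION_HEADER: dap_version})


req = build_request(
    "http://test.opendap.org/opendap/data/nc/fnoc1.nc.ascii", "3.2"
)
```

A web browser offers no simple way to add such a header, which is the setup problem mentioned above.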

Here's a compatibility matrix:

Compatibility Matrix
Server kind | 2.0 Client | Generic client | 3.x Client
Old Server  | Works      | Works          | Works*
New Server  | Partial**  | Works          | Works

*Client falls back to DAP 2.0 (required by protocol)
**New servers have to support the old protocol, but there may be some features/plugins that decide they cannot render all data sets. In this case 'supporting the protocol' might mean returning an error message, which is why this gets 'Partial'.

Function/keyword

Another idea: Put something in the query part of the URL. This might look like this:

http://test.opendap.org/opendap/data/nc/fnoc1.nc.ascii?dap(3.2),u[0:1:15][0:1:16][0:1:20]

Because parens are not technically allowed in a URL, they have to be escaped, and the result can be fairly obscure:

http://test.opendap.org/opendap/data/nc/fnoc1.nc.ascii?dap%283.2%29,u[0:1:15][0:1:16][0:1:20]
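
The escaped form can be produced mechanically. The sketch below uses Python's standard `urllib.parse.quote`; the helper name `with_version_function` is hypothetical, and `dap(...)` is the function name proposed in this section, not an existing server function.

```python
from urllib.parse import quote

BASE = "http://test.opendap.org/opendap/data/nc/fnoc1.nc.ascii"


def with_version_function(version: str, projection: str) -> str:
    """Prefix a projection with a percent-encoded dap(<version>) call."""
    # safe="" forces "(" and ")" to be escaped; quote() never escapes ".".
    call = quote(f"dap({version})", safe="")
    return f"{BASE}?{call},{projection}"


url = with_version_function("3.2", "u[0:1:15][0:1:16][0:1:20]")
```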

However, the function form would not require changing the CE syntax. Alternatively, we could avoid the parens (or their escape characters) altogether by adding keywords to the set of allowable things passed in a CE. The resulting URL might look like:

http://test.opendap.org/opendap/data/nc/fnoc1.nc.ascii?dap3.2,u[0:1:15][0:1:16][0:1:20]

This would change the syntax of a CE, but not in a way that's too hard to deal with, since any particular server will know which keywords it can accept and can easily test for those.
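
A rough sketch of how a server might test for the keywords it accepts and strip them before normal CE parsing. The keyword set and the function names are hypothetical, and only the projection part of the CE is handled; this is not how any existing parser does it.

```python
KNOWN_KEYWORDS = {"dap3.2", "dap4"}  # hypothetical set a server might accept


def split_clauses(projection: str) -> list[str]:
    """Split a projection on commas that are not inside parens or brackets."""
    clauses, current, depth = [], [], 0
    for ch in projection:
        if ch in "([":
            depth += 1
        elif ch in ")]":
            depth -= 1
        if ch == "," and depth == 0:
            clauses.append("".join(current))
            current = []
        else:
            current.append(ch)
    clauses.append("".join(current))
    return [c for c in clauses if c]


def extract_keywords(projection: str) -> tuple[list[str], str]:
    """Return (keywords found, projection with those keywords removed)."""
    keywords, rest = [], []
    for clause in split_clauses(projection):
        (keywords if clause in KNOWN_KEYWORDS else rest).append(clause)
    return keywords, ",".join(rest)


found, remaining = extract_keywords("dap3.2,u[0:1:15][0:1:16][0:1:20]")
```

Because the match is an exact comparison against a known set, a dataset variable named like a keyword would be misread as one, which is the name-collision risk of this approach.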

Both of these URLs have the same basic problem: with a server that does not recognize the function or keyword, the response is an error (a 400 response). So this syntax breaks existing servers, which means that to use this syntax in an automated sense a client needs to check for a 400 response and, if one comes back, guess it might be the function/keyword and try the request without it.
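
The retry logic described above could be sketched as follows. The `fetch` callable, the `old_server` stand-in, and the `dap3.2` keyword are all assumptions for illustration; no network access is involved.

```python
from typing import Callable, Tuple

Fetch = Callable[[str], Tuple[int, str]]  # url -> (HTTP status, body)


def fetch_with_fallback(
    base: str, ce: str, keyword: str, fetch: Fetch
) -> Tuple[int, str, bool]:
    """Try the keyword first; on a 400, retry once without it.

    Returns (status, body, keyword_was_used).
    """
    status, body = fetch(f"{base}?{keyword},{ce}")
    if status != 400:
        return status, body, True
    # A 400 may mean the server does not know the keyword, or that the CE
    # itself is bad; retrying without the keyword separates the two cases.
    status, body = fetch(f"{base}?{ce}")
    return status, body, False


def old_server(url: str) -> Tuple[int, str]:
    """Stand-in for a server that rejects the hypothetical dap3.2 keyword."""
    return (400, "parse error") if "dap3.2" in url else (200, "data")


result = fetch_with_fallback(
    "http://test.opendap.org/opendap/data/nc/fnoc1.nc.ascii",
    "u[0:1:15][0:1:16][0:1:20]",
    "dap3.2",
    old_server,
)
```

The cost is one extra round trip per request to an old server unless the client remembers the outcome.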

The function syntax has the advantage that it is allowed by the current dap2 spec and it will never trigger a 'name collision' should a dataset have a variable that is the same name as a keyword. Finally, the function notation opens up the possibility of passing these keywords into handlers, because both libdap and handlers can register their own functions. Thus libdap can register a function that defines a behavior for 'version("dap3.2")' (or maybe 'dap3.2()') and the hdf4 handler can define a different behavior for it. The hdf4 handler could also define a function like 'short_names()' that would trigger that option. Given the libdap/hyrax structure, this would be easy to implement.
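
To illustrate the registration idea, here is a toy registry in Python. It is not libdap's API; the class, the option names, and the registered behaviors are invented for this sketch, which only shows how a library default and a handler-specific function could share one lookup table.

```python
from typing import Callable, Dict


class FunctionRegistry:
    """Toy lookup table mapping CE function names to callables."""

    def __init__(self) -> None:
        self._functions: Dict[str, Callable[..., None]] = {}

    def register(self, name: str, func: Callable[..., None]) -> None:
        # A later registration replaces an earlier one, so a handler can
        # override the behavior the library registered for the same name.
        self._functions[name] = func

    def call(self, name: str, *args: str) -> None:
        if name not in self._functions:
            raise KeyError(f"unknown function: {name}")
        self._functions[name](*args)


options: Dict[str, object] = {}
registry = FunctionRegistry()

# Library-level default for version("...").
registry.register("version", lambda v: options.update(protocol=v))
# Handler-specific option, as in the short_names() example above.
registry.register("short_names", lambda: options.update(short_names=True))

registry.call("version", "dap3.2")
registry.call("short_names")
```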

Here's a compatibility matrix:

Compatibility Matrix
Server kind | 2.0 Client | Generic client | 3.x Client
Old Server  | Works      | Works          | Partial*
New Server  | Partial**  | Works          | Works

*Client will get an obscure error message when it uses the keyword and will have to either guess that this may be an old server or look at the version response.
**As with the above option using a MIME header, the 2.0 client may get an error response from the server. In this case, it can be verbose and explain the problem/situation, which is an improvement over the 3.x client --> Old server condition.

URL Path

We could embed the protocol version in the path part of the URL. Doing this will likely complicate issues we already have with the dispatch software. It also does not fix the issue, from the client perspective, that a request from a 3.x client to an old server results in a somewhat inscrutable error message. It does not require a change to the CE parser or a hack to the code that does initial processing of the request (DODSFilter in libdap).

Compatibility Matrix
Server kind | 2.0 Client | Generic client | 3.x Client
Old Server  | Works      | Works          | Partial*
New Server  | Partial**  | Works          | Works

Discussion

If we implement both the first and second option, we get as much functionality as we possibly can (because choosing the third option doesn't get any new behavior) without having to overload the URL path, which is already overloaded.

Clients we write, or that are written with toolkits that follow the spec, can be savvy about using the MIME header, and having a server check in two places is not that onerous a burden to support protocol extensibility. That technique can work for other protocols, too, if they agree that the payload is a MIME document and the (non-HTTP-based) DAP server implementation contains a pre-processor that can pull apart the MIME document. At the same time, the keyword in the projection part of the URL will be pretty easy to extract from the CE, and is also independent of the underlying transport protocol (HTTP, AMQP, ...).
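
Checking in two places might look like the sketch below. The header name, the keyword-to-version table, and the precedence (keyword before header) are all assumptions; the text does not say which source should win when both are present.

```python
from typing import Mapping

VERSION_HEADER = "XDAP-Accept"  # assumed header name
VERSION_KEYWORDS = {"dap3.2": "3.2", "dap4": "4.0"}  # hypothetical keywords
DEFAULT_VERSION = "2.0"  # no hint at all means a DAP 2.0 response


def requested_version(headers: Mapping[str, str], projection: str) -> str:
    """Pick the protocol version from the CE keyword, else the header."""
    # The keyword is checked first because it is transport independent.
    for clause in projection.split(","):
        if clause in VERSION_KEYWORDS:
            return VERSION_KEYWORDS[clause]
    # Header names are compared case-insensitively.
    for name, value in headers.items():
        if name.lower() == VERSION_HEADER.lower():
            return value
    return DEFAULT_VERSION
```

For a non-HTTP transport, `headers` would come from whatever pre-processor pulls apart the MIME document.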

Rationale for new clients using the keyword: While the 400 error they will get back from an old server is a real drag, these clients can be modified to be smart(er) and could even hide a call to the version service.

Note: We will need to define the behavior of keywords. I think they should be limited to changing behavior of responses and not specifying the responses themselves. For example, the dap4 keyword requests a DAP4 response. We would still ask for the DDX using the .ddx suffix on the path name part of the URL and would not include a ddx keyword. It also means no version keyword...

Choosing the design where a keyword is required to specify any version other than DAP2 means that all DAP4 clients will forever have to specify that they want DAP4 behavior. Is this really what we want? We could make DDX and DataDDX DAP4 responses and use the protocol version keyword (dap3.2, ...) as a transitional tool for the DDX only. The DataDDX will be only DAP4 and we will keep it as an Alpha feature throughout the development. This means that the curly-brace syntax will not be available from DAP4 servers or using DAP4, which would be a setback since XML is really a machine encoding. It also means that we'll likely have to introduce new response types for the next change to the protocol.