DAP4: Checksum

From OPeNDAP Documentation
Revision as of 21:35, 1 February 2012 by Jimg (talk | contribs) (Created page with "== Version == See Version in the DAP 4.0 Design for the low down on this feature. There are three techniques we are looking at and/or have impl...")
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
⧼opendap2-jumptonavigation⧽

Version

See Version in the DAP 4.0 Design for the low down on this feature.

There are three techniques we are looking at and/or have implemented so far:

  • Using HTTP MIME headers (which can be generalized to be Request MIME document headers; not limited to HTTP)
  • Using a function or keyword in the query string part of the URL
  • Making new URLs by adding to the path part of the URL
  • Define a new set of response types that are strictly DAP4 using a new set of extensions. This could make it easier to move more toward a REST protocol, although that really has more to do with the server specification and not the transport protocol.

None are perfect... Discussion of the options follows.

HTTP/MIME headers

The existing code uses the XDAP HTTP/MIME header. This could be adapted for use with other protocols but that can be cumbersome. This does not work for clients like web browsers without fairly complex setup. Compounding this is that client libraries are not using it.

Here's a compatibility matrix:

Compatibility Matrix
Server kind 2.0 Client Generic client 3.x Client
Old Server Works Works Works*
New Server Partial** Works Works

*Client Falls back to DAP 2.0 (required by protocol)
**New servers have to support the old protocol, but there may be some features/plugins that decide they cannot render all data sets. In this case 'supporting the protocol' might mean returning an error message, which is why this gets 'Partial'.

Function/keyword

Another idea: Put a function call in the query part of the URL. This might look like this:

http://test.opendap.org/opendap/data/nc/fnoc1.nc.ascii?dap(3.2),u[0:1:15][0:1:16][0:1:20]

However, because parens are not technically allowed in a URL so the result can be fairly obtuse:

http://test.opendap.org/opendap/data/nc/fnoc1.nc.ascii?dap%283.2%29,u[0:1:15][0:1:16][0:1:20]

On the plus side, this would not require changing the CE syntax.

We could address the parens (or the cumbersome escape character) issue by adding keywords to the set of allowable things passed in a CE. The resulting URL might look like:

http://test.opendap.org/opendap/data/nc/fnoc1.nc.ascii?dap3.2,u[0:1:15][0:1:16][0:1:20]

This would change the syntax of a CE, but not in a way that's too hard to deal with, especially if keywords are restricted to appearing before any of the data set's variables.

Both of these URLs have the same basic problem: with a server that does not recognize the function or keyword, the response is an error (a 400 response). So this syntax breaks existing servers, which means that to use this syntax in an automated sense a client needs to check for a 400 response and, if one comes back, guess it might be the function/keyword and try the request without it.

Here's a compatibility matrix:

Compatibility Matrix
Server kind 2.0 Client Generic client 3.x Client
Old Server Works Works Partial*
New Server Partial** Works Works

*Client will get an obtuse error message when it uses the keyword and will have to either guess that maybe this is an old server or look at the version response.
**As with the above option using a MIME header, the 2.0 client may get an error response from the server. In this case, it can be verbose and explain the problem/situation, which is an improvement over the 3.x client --> Old server condition.

URL Path

We could embed the protocol version in the path part of the URL. Doing this will likely complicate issues we already have with the dispatch software. It also does not fix the issue, from the client perspective, that a request from a 3.x client to an old server results in a somewhat inscrutable error message. It does not require a change to the CE parser or hack to the code that does initial processing of the request (DODSFilter in libdap) as with the keyword option (but note that the function option also does not require that hack). This would require extra processing on the path, however.

Compatibility Matrix
Server kind 2.0 Client Generic client 3.x Client
Old Server Works Works Partial*
New Server Partial** Works Works

Discussion

If we implement both the first and second option, we get as much functionality as we possibly can (because choosing the third option doesn't get any new behavior over the second) without having to overload the URL path, which is already overloaded, or do processing on the path, which we currently do not do.

Clients we write, or that are written with toolkits that follow the spec, can be savvy about using the MIME header. Having a server check in two places for this kind of information is not that onerous of a burden to support protocol extensibility. That technique can work for other transport protocols (e.g., using DAP over AMQP or GridFTP instead of HTTP), too, if they agree that the payload is a MIME document and the (non-HTTP-based) DAP server implementation contains a pre-processor that can pull apart the MIME document. At the same time, the keyword in the projection part of the URL will be pretty easy to extract from the CE, and is also independent of the underlying transport protocol (HTTP, AMQP, ...)

Rationale for new clients using the keyword: While the 400 error they will get back from an old server is a real drag, these clients can be modified to be smart(er) and could even hide a call to the version service.

Note: We will need to define the behavior of keywords. I think they should be limited to changing behavior of responses and not specifying the responses themselves. For example, the dap4 keyword requests a DAP4 response. We would still ask for the DDX using the .ddx suffix on the path name part of the URL and would not include a ddx keyword. It also means no version keyword...

Choosing the design where a keyword is required to specify any version other than DAP2 means that all DAP4 clients will forever have to specify that they want DAP4 behavior. Is this really what we want? We could make DDX and DataDDX DAP4 responses and use the protocol version keyword (dap3.2, ...) as a transitional tool for the DDX only. The DataDDX will be only DAP4 and we will keep it as an Alpha feature throughout the development. This means that the curly-brace syntax will not be available from DAP4 servers or using DAP4, which would be a setback since XML is really a machine encoding. It also means that we'll likely have to introduce new response types for the next change to the protocol.

Arguments for the function syntax over the keyword syntax

The function syntax has the advantage that it is allowed by the current DAP2 specification. In addition. it will never trigger a 'name collision' should a dataset have a variable that is the same name as a keyword (it might be that this mechanism is needed/used for things beyond just protocol negotiation). Finally, for Hyrax/libdap, the function notation opens up the possibility of passing these keywords into handlers, because both libdap and handlers can register their own functions. Thus, libdap can register a function that defines a behavior for 'version("dap3.2")' (or maybe 'dap3.2()') and the hdf4 handler can define a different behavior for it. The hdf4 handler could also define a function like 'short_names()' that would trigger that option. Given the libdap/hyrax structure, this would be easy to implement. This would provide a way for users to choose options that are currently set in the configuration files of the BES (in addition to HDF4's short_names, the netcdf handler's treatment of shared dimensions comes to mind).

Here's the type definition for one of these functions, from libdap:

// Projection function: A function that appears in the projection part of the
// CE and is executed for its side-effect. Usually adds a new variable to
// the DDS. These are run _during the parse_ so their side-effects can be used
// by subsequent parts of the CE.

typedef void(*proj_func)(int argc, BaseType *argv[], DDS &dds, ConstraintEvaluator &ce);

The most important points are that a projection function takes four arguments:

argc
The number of variables referenced by the second argument
argv
The arguments given in the Constraint expression
dds
The data set's DDS object. This is the lexically scoped environment in which the function will be evaluated. The function can modify this enviroment, by adding to it, calling methods that the DDS class defines, ...
ce
The ConstraintEvaluator object being used to evaluate this function. Because the function is run during the parse and because the ConstraintEvaluator object holds the parse tree for the constraint expression, a projection function can use this argument to alter the expression its part of (by adding clauses, for example.

Arguments for the Keyword syntax

In our server, and this might be a more general issue, the DDS is built (populated with the data set variables) before the CE is evaluated. This is the case because the constraint expression 'language' uses variable names whose bindings are the data set, so the DDS - the lexical environment in which the CE is evaluated - must exist first. However, we want the DAP protocol version to control how that DDS is built (not using DAP4-only types for a DAP2 client, e.g.). This is a catch-22 and argues for the keyword approach because those are parsed before the CE. They can simply be removed from the front of the HTTP Query String.

However, using this approach with the simple syntax means that keywords and dataset variable names could collide. While this seems unlikely (how many datasets will have a variable named dap3.2?) if more keywords are added it could become an issue.

Implementation

I've implemented the Keywords approach in libdap, but used the syntax of functions. I did this because:

  1. The processing of this information must come before evaluation of the constraint;
  2. it's impossible to be sure that a token used as a keyword won't also appear in a data set;
  3. the function notation provides an easy way to pass in values (2.0, true) for the keyword/function; and
  4. using the functional notation is not much more work than a plain keyword.

Originally I added a new class to libdap named ResponseBuiler that replaces DODSFilter and removes the baggage that class had for processing command line arguments (DODSFilter is still in the library). This was a first attempt and it makes the code in DODSFilter unnecessary so we can eventually remove it, dumping a lot of stuff that's no longer used by our server. However, setting the value in a class that builds responses in the handler didn't work well with the BES design, so...

I added a Keywords class that can be used to parse the keywords and store them for later use. As a side effect the parse method returns the CE without the keywords, so that it can be processed by the ConstraintExpression evaluator. The BES has to make use of this code to see if a keyword was used to pass in a value for the DAP version that the client is requesting.

That code can be found in BESDDSResponseHandler.cc and related classes, although there are still issues with the BES and getting the constraint into that code so that the Keywords class can read the information, remove it from the CE and set the correct fields in the DDS.

Given a constraint like 'dap(4.0),u,v' the Keywords class will:

  1. parse the 'dap(4.0)' and separate the word and value given the syntax word '(' value ')'
  2. ensure that word is registered as a known keyword and that value is registered as one of the allowed values
  3. record the keyword and its value
  4. provide access to value, using the keyword as an index.

I added an instance of Keywords to DDS and then a method that returns a reference to the Keywords instance in the DDS.