DAP3/4: Difference between revisions

From OPeNDAP Documentation

Revision as of 21:07, 16 September 2008

Structure and Function of this Page

This is both a place to document the implemented features of the various 3.x versions of DAP and a place to describe and discuss the designs that might be implemented. For example, the XDAP page contains information on how 3.1 conforming clients and servers should treat the XDAP header based on its current implementation in libdap. It also contains the design for that header as it should behave in DAP 3.2.

Note: See the discussion pages to talk/debate a feature or describe how it should be implemented.

DAP2 and DAP4 Specification efforts

The original (2004) DAP4 specification work morphed into separate DAP2 and DAP4 specification efforts. The DAP 2.0 specification (official version with notes, etc.) was approved by NASA ESE/SPG as a 'Community Standard' on 10/8/2007.

This page holds the DAP4 design and specification information. Note that ESE RFC 003 is the reference for NASA ESE/SPG standards documents.

Here's the original (2003!) DAP 4.0 draft document (and todo list), which we shelved in favor of writing the DAP 2.0 document because the latter described existing software, a requirement of the ESE/SPG.

Implementations OPeNDAP Maintains

OPeNDAP has three implementations of the DAP that it maintains: libdap (C++), Java-OPeNDAP (Java) and Ocapi (C; client-side only). There are other implementations which have been developed by other groups.

Versions

Library         Implements
libdap (C++)    3.1
Java-OPeNDAP    2.0
Ocapi (C)       2.0 (client-side)

Completed Changes From DAP2

Features of DAP3/4 implemented in libdap

This title is a bit misleading since the bullets here are both features as implemented at version X.Y and designs to be implemented at X.Y++. The libdap library is the reference implementation for DAP. Note: On some of these pages, especially ones with designs, the discussion section/tab at the top of the page has some important information.

Here are the DAP3 features implemented so far and designs for those features as we learn about their use:

  • DDX (although this implementation needs significant review and update)
  • DataDDX (using the DDX to return data in a fashion similar to the way data are returned using the DDS now)
  • XDAP header for the protocol version (Where should the XDODS-Server and XOPeNDAP-Server headers go?)
  • Version response which provides information about software versions and the DAP version for a server.
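
As a rough illustration of how a client might use the XDAP header, the sketch below parses a protocol version and compares it with the version the client needs. The 'major.minor' value format, the function names, and the choice to treat a missing header as DAP 2.0 are all assumptions made for the example; the XDAP page holds the actual design.

```python
def parse_xdap(value):
    """Parse a hypothetical 'major.minor' XDAP header value into a tuple."""
    major, minor = value.strip().split(".", 1)
    return int(major), int(minor)


def server_supports(headers, required):
    """Return True if the server's XDAP version is at least `required`.

    `headers` is a dict of response headers. A missing XDAP header is
    treated here as DAP 2.0, which is an assumption of this sketch.
    """
    # Header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    version = parse_xdap(lowered.get("xdap", "2.0"))
    return version >= parse_xdap(required)
```

Comparing the versions as integer tuples rather than as strings keeps 3.10 ordered after 3.2.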

Other changes that affect the behavior of libdap, Hyrax, etc., and DAP 2.0/3.x

  1. In the distant past it was decided that handlers should wrap String attribute values in double quotes; the reason is no longer known. As a result, the attribute values served were not exactly the values in the files, because of the added quote characters. This has been changed: libdap 3.8.3 adds the quotes itself, if they are not already present, when writing out the DAS (but not the DDX). This ensures that the DAS only contains String attributes with quoted values, which means we know it will parse. The test for existing quotes was added so that the new libdap works with old handlers, which still add the quotes. Handlers of the same vintage as libdap 3.8.3 no longer add the quotes. For those handlers the compile-time constant ATTR_STRING_QUOTE_FIX should be defined; leaving that symbol undefined gets the old, broken behavior from the handlers.
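
The quoting rule in item 1 can be sketched as follows. This is written in Python for brevity and is not libdap's C++ code; the function name is hypothetical, and escaping of quotes embedded inside a value is left out.

```python
def quote_for_das(value):
    """Return `value` wrapped in double quotes unless it already is.

    A value counts as already quoted only if it is at least two
    characters long and both starts and ends with a double quote.
    """
    already_quoted = (
        len(value) >= 2 and value.startswith('"') and value.endswith('"')
    )
    return value if already_quoted else '"' + value + '"'
```

Because the check comes before the quotes are added, a value from an old handler that is already quoted passes through unchanged, while a value from a newer handler gets its quotes added on output.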

Proposed Changes To DAP

Most desirable features

This is a list of the most desirable features proposed for DAP3 (which, when final, will become DAP4). Only the most important items are listed here; other documents list everything that has ever been suggested.

These ideas came directly from our experience with users of DAP 2 and from writing the DAP 2.0 specification:

  • Reliable error delivery when accessing data (known defect);
  • Eliminating the repetition of the array size for an array of Atomic types in the Data Response;
  • Support for 'any character in a name' using double quotes or some other scheme (use the DDX);
  • Support for the discovery of server-side functions (See CE Discovery for a draft design);
  • Add an XML representation for the error response. Use a Content-Description of "dap4-errorx" for this.
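
To make the second item concrete: as we understand the DAP 2.0 encoding, an array of atomic values is preceded by its element count written twice. The sketch below contrasts that layout with a single-count layout for 32-bit integers. The function name and the `repeat_length` flag are hypothetical, and the byte layout is a simplified, XDR-like approximation rather than a full encoder.

```python
import struct


def encode_int32_array(values, repeat_length=True):
    """Encode a list of 32-bit integers in a big-endian, XDR-like layout.

    With repeat_length=True the element count is written twice, mirroring
    the repetition the proposal wants to remove; with False it is written
    once.
    """
    count = struct.pack(">I", len(values))
    prefix = count * 2 if repeat_length else count
    body = struct.pack(">%di" % len(values), *values)
    return prefix + body
```

In this layout, dropping the repeated count saves four bytes per array.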

These ideas are very general and are paired with their 'justification':

  • Use of Multi-part MIME for DataDDX (feature --> adherence to standard where applicable);
  • Develop the DataDDX (feature --> adherence to standard ...). Use a Content-Description of "dap4-datax" for this;
  • Ability to return checksums (or digital watermarks) from data requests (feature with several use cases);
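
One way to picture a Multi-part MIME DataDDX is a two-part message with the DDX first and the binary data second. The sketch below builds such a message with Python's standard email package. Only the "dap4-datax" Content-Description comes from the list above; the multipart/related subtype, the per-part labels, and the function name are assumptions, not the agreed design.

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_data_ddx(ddx_xml, data_bytes):
    """Build a two-part MIME message: DDX metadata first, binary data second."""
    message = MIMEMultipart("related")  # subtype is an assumption
    message["Content-Description"] = "dap4-datax"

    ddx_part = MIMEText(ddx_xml, "xml", "utf-8")
    ddx_part["Content-Description"] = "ddx"  # hypothetical label
    message.attach(ddx_part)

    data_part = MIMEApplication(data_bytes, "octet-stream")
    data_part["Content-Description"] = "data"  # hypothetical label
    message.attach(data_part)
    return message
```

Calling `build_data_ddx(...).as_string()` shows the boundary markers and per-part headers that a client would need to parse.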

From the IOOS work:

  • Add the source URL of the DDX to the DDX as the value of the XML attribute Dataset@xml:base

Project-driven proposed changes

Changes from the NC-DAP project

The NC-DAP project's goal is to integrate support for DAP into Unidata's netCDF code base. This affects DAP because concurrent with this effort is the evolution of netCDF from the Version 3 to the Version 4 API, which included some substantial data model changes. Here are ideas from the netCDF 4 data model (aka the Common Data Model). In addition to netCDF, HDF5 also shares many aspects of the CDM, so these features are likely to help with compatibility with HDF5 as well. Information about the NC-DAP project is maintained in Trac under the active/pending projects heading.

Here's text from the June NC-DAP project meeting which we can edit here (without altering the record of the meeting):

We decided to lump a number of new types and features into DAP version 3.3 since this will follow nicely from the DAP 3.1 and nascent 3.2 versions. However, as part of this OPeNDAP will make a set of on-line documents that will be used to describe the changes that are being made to the DAP protocol so that those pages can be combined with the DAP 2.0, blessed by NASA, to get a picture of just how DAP 3.x should behave. For now we are not going to put much effort into trying to develop a finalized follow-on to DAP 2.0 -- the specification/protocol we have been referring to as DAP 4 for some time now -- and instead focus on adding features needed for netCDF 4 in an organized and documented way.

There are several new features to be added to DAP in DAP 3.3:

  1. Shared dimensions (which will indicate a common grid among variables that share dimensions).
  2. User defined types.
  3. An analogue to HDF5/netCDF-4 Groups


Shared dimensions will require DAP to include a way to specify that dimensions are to be shared among variables. We discussed that sharing means more than just 'the sizes of the dimensions are the same in two places' because sharing means that they will always be that way and closes the possibility that, for any given example, it is just chance. Using hrefs or xpath to tie subsequent uses of a dimension to an initial definition is one way to implement this, but it may make the notion of sharing depend too much on context.
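
The difference between sharing a dimension and merely having the same size can be shown with a small data-structure sketch. The class and function names are hypothetical and the model is far simpler than a DDX; the point is only that sharing is a statement about identity, not about equal sizes.

```python
class Dimension:
    """A named dimension that variables can share by reference."""

    def __init__(self, name, size):
        self.name = name
        self.size = size


class Variable:
    """A variable that refers to Dimension objects instead of bare sizes."""

    def __init__(self, name, dimensions):
        self.name = name
        self.dimensions = list(dimensions)


def share_dimension(a, b):
    """Return True only if the variables refer to the same Dimension object."""
    return any(da is db for da in a.dimensions for db in b.dimensions)


lat = Dimension("lat", 180)
lon = Dimension("lon", 360)
rows = Dimension("rows", 180)  # same size as lat, but unrelated

sst = Variable("sst", [lat, lon])
wind = Variable("wind", [lat, lon])
image = Variable("image", [rows])
```

Here `sst` and `wind` share `lat` and `lon`, while `image` has a dimension of the same size as `lat` but does not share it.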

User defined types are a way to simplify the description of a data source when the same structure is used many times over. They are part of HDF5 and netCDF-4 and while not used heavily now, are likely to be used as these formats become used more and more for archival storage. Supporting user defined types does not seem like a hard thing to do and it will provide a way to serve netCDF-4 files (and HDF5 files) that contain only type definitions.

Groups in HDF5 and netCDF-4 are similar to namespaces in modern programming languages. netCDF-4 does not support Groups in the full sense of HDF5's Groups because it does not allow a single variable to be a member of several groups. The netCDF-4 rules for Groups include:

  • All variables in the Group are visible
  • All dimensions in the Group plus all in enclosing parent Groups are visible (i.e., dimensions can be inherited)
  • All user-defined types are globally visible
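
The visibility rules above amount to three different lookup scopes, sketched below. The class is hypothetical, and reading the first rule as "variables are visible only in the Group that holds them" is an interpretation made for this example.

```python
class Group:
    """A minimal Group: a namespace with an optional parent."""

    # User-defined types are globally visible, so every Group shares
    # this one registry.
    user_types = {}

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.variables = {}
        self.dimensions = {}

    def find_dimension(self, name):
        """Look in this Group first, then in each enclosing parent Group."""
        group = self
        while group is not None:
            if name in group.dimensions:
                return group.dimensions[name]
            group = group.parent
        raise KeyError(name)

    def find_variable(self, name):
        """Look only in this Group; variables are not inherited here."""
        return self.variables[name]
```

A child Group can therefore resolve a dimension declared in its parent, but a parent cannot see dimensions declared in its children.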

Groups are different than Structures in the following ways:

  • Groups hold definitions and relations about parts of a file/granule (they are a logical entity)
  • Structures hold data
  • Structures imply 'locality' (that the data in the structure are all easily/quickly accessed in one operation) while Groups provide no such guarantee.
  • There are no arrays of Groups.


There are several new types to be added to DAP:

  • Opaque.
  • Strings encoded using UTF-8
  • Enumerations

One additional 'type' issue to be resolved is how best to handle the vlen data type. The (old) List type is the obvious choice but it was of (otherwise) no use and was removed. That leaves Sequence, which carries with it some extra baggage in that servers are required to support relational operations on Sequences while vlens (as coded in the HDF5 and netCDF-4 APIs) don't. We talked about this and I (James) would like to propose that, since vlens are generally short and are intended to be used for fairly small chunks of data, and since the CE evaluator is part of libdap++, adding support for applying a relational constraint expression to data read from a vlen should not be that hard. The alternatives are to make DAP more complicated by adding List back in (a bad idea given our previous experiences) or to provide several different behaviors for Sequences (also bad because it makes clients harder to write). Adding a little complexity to the handful of servers that have to support the vlen type is a reasonable trade-off for a more robust system with simpler clients.
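
To give a feel for how little work the proposal asks of a server, the sketch below treats a vlen as a one-column Sequence and applies a single relational constraint to it in memory. It is not the libdap CE evaluator; the operator table and function name are illustrative assumptions.

```python
import operator

# Relational operators a constraint may use; this set is illustrative.
RELOPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "=": operator.eq,
    "!=": operator.ne,
}


def filter_vlen(values, relop, operand):
    """Keep the vlen elements that satisfy `value relop operand`."""
    test = RELOPS[relop]
    return [value for value in values if test(value, operand)]
```

Since vlens are expected to be short, filtering the whole list after reading it is a reasonable starting point.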


Changes From IOOS project

Part of the IOOS work is to write a WCS extension to Hyrax that allows Hyrax to serve DAP data as Coverages through a WCS interface. Part of that work involves using RDF and OWL to generate semantic interpretations of DAP data in order to identify the geospatial variables. To that end some changes to the DDX need to be made to facilitate the RDF work.

  • Add the source URL of the DDX to the DDX as the value of the XML attribute Dataset@xml:base. See trac ticket 1169 for more info on this.
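
A minimal sketch of the change, using Python's standard ElementTree module: it sets xml:base on the root element of a DDX. The function name, the stripped-down Dataset element in the usage note, and the example URL are placeholders; a real DDX carries its own namespace and content.

```python
import xml.etree.ElementTree as ET

# The namespace URI bound to the reserved "xml" prefix.
XML_NS = "http://www.w3.org/XML/1998/namespace"


def add_xml_base(ddx_text, source_url):
    """Return the DDX text with xml:base set on its root element."""
    root = ET.fromstring(ddx_text)
    root.set("{%s}base" % XML_NS, source_url)
    return ET.tostring(root, encoding="unicode")
```

For example, `add_xml_base('<Dataset name="sst"/>', "http://example.org/data/sst.nc.ddx")` yields a Dataset element that carries the URL in its xml:base attribute, which an RDF tool can then use to resolve relative references.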



All of the proposed changes to date

Items in italics have been moved up to the 'most desirable features' section which means they are slated for addition to DAP.

  • Data Access Features:
    • Checksum: Provide a checksum with each (data only?) response and also a way to access just the checksum for a given response.
    • Cyclic access to array indexes
    • Relational array constraints
  • Organization of attributes: The ideas from this email (2005) are relevant to recent (2008) work on the DDX and semantic web reasoners.
  • Use of MIME: Basic ideas here have been developed in much more detail as a result of work on SOAP interfaces for DAP (which themselves have been sort of a flop).
  • Length Bytes: How to encode length information that is not limited by word sizes of different architectures.
  • Aliases: The idea that we will need to alias DAP attributes has been kicking around for a while although its practical use has been nearly nil. Aliases might be better handled using XPath in the DDX/DataDDX. Nevertheless, here for your enjoyment is the old discussion on this topic...
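
For the Length Bytes item, one familiar way to encode a length without tying it to a machine word size is a variable-length integer: seven bits of the value per byte, with the high bit marking that more bytes follow. The sketch below shows that technique only as an example of the idea. It is not a scheme proposed in the linked discussion, and both function names are hypothetical.

```python
def encode_length(n):
    """Encode a non-negative integer as 7-bit groups, low-order group first.

    The high bit of each byte signals that another byte follows, so the
    encoding has no fixed upper bound.
    """
    if n < 0:
        raise ValueError("length must be non-negative")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # more bytes follow
        else:
            out.append(low7)  # final byte
            return bytes(out)


def decode_length(data):
    """Decode a value produced by encode_length; return (value, bytes_used)."""
    value = 0
    for used, byte in enumerate(data, start=1):
        value |= (byte & 0x7F) << (7 * (used - 1))
        if not byte & 0x80:
            return value, used
    raise ValueError("truncated length")
```

Small lengths cost a single byte, and lengths beyond 64 bits still round-trip, at the price of a decode loop instead of a fixed-size read.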