AIS Using NcML

From OPeNDAP Documentation

This page and the BES Aggregation using NcML page go hand in hand. The essential idea is to use NcML as a syntax to describe both aggregations of data sets (e.g., HDF4 files) and ancillary information that should be added to a data set. The motivation for using NcML is to build on an accepted syntax rather than invent a new one, adding new features only where we need them.

Use Cases

  1. Add the NcML handler to the BES
  2. Add attributes to a single data set
  3. Adding one or more attributes to a group of data sets. (This use case is not complete, since the scan element is not defined outside of an aggregation element.)
  4. Using the NcML Handler to get information

Definitions

Aggregation
A single data set (i.e., something referenced by a single DAP URL) that is actually made up of two or more discrete things, each of which (potentially, at least) has its own DAP URL.
Data set
Anything that can be referenced by a DAP URL and that will return the DAP responses when requested.
NcML
Syntax for ancillary data (attributes and variables) and aggregations used by the THREDDS Data Server (TDS).

Background

This new handler will be used to introduce new attributes into data sets for the IOOS/WCS project and for the REAP project. In the first case, the augmented DDX response generated by the handler will be filtered through XSLT to produce a WCS response of one form or another. In the second case, the DDX will be filtered to produce an EML document. So, this handler and the collection(s) of XML/NcML/? documents will be an important part of several projects we're working on.

NcML Information

Here are links that describe NcML 2.2:

Notes:

  1. NcML 2.2 is based on the CDM and thus includes Groups and shared dimensions, which DAP 3.2 does not support. We will want to elide that feature until DAP 4 is done and well supported.

Design

We will build a new handler that uses DDS/DAS/DDX and/or DataDDS/DataDDX objects from other handlers, along with information in NcML files, to return a response that describes the contents of a virtual data set. This design covers only using the NcML handler to build an AIS for Hyrax, but the same handler software can be used as part of other designs, such as an aggregation handler.

Simple example: Suppose you wanted to add the string attribute "color" with value "red" to a variable "temperature" in some data set. In NcML, this would look like:

<netcdf xmlns:nc="..."
        location="file:/.../data.nc">
    <variable name="temperature" type="Int16">
        <attribute name="color" type="string" value="red"/>
    </variable>
</netcdf>

Let's say this is stored in new_data.ncml.

When the BES is asked to return the DAS for /.../new_data.ncml:

  1. The NcML handler will open this file, parse it, and see that it contains new information for the data file file:/.../data.nc.
  2. It will use the BES to get the DAS for file:/.../data.nc. A useful variant of this design will use http://.. type URLs as well, even though that feature will need to be disabled for distribution.
  3. It will then apply the changes in the new_data.ncml file:
    1. First, parse the variable element and find the named variable in the DAS initially returned by the BES.
    2. Then see that the attribute color is to be added (or overwritten, using NcML's rules for applying these changes).
  4. Return the resulting DAS.
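
The steps above can be sketched in a few lines of Python. This is only an illustration of the flow, not the handler itself (which will run inside the BES): fetch_das is a hypothetical stand-in for asking the BES for the DAS, the DAS is modeled as a plain dictionary, and a missing variable is assumed to be an error.

```python
import xml.etree.ElementTree as ET

# NcML from the simple example; the location is a placeholder, as in the text.
NCML = """<netcdf location="file:/.../data.nc">
    <variable name="temperature">
        <attribute name="color" type="string" value="red"/>
    </variable>
</netcdf>"""


def local_name(element):
    """Return the tag without any '{namespace}' prefix."""
    return element.tag.rsplit("}", 1)[-1]


def fetch_das(location):
    """Hypothetical stand-in for asking the BES for the DAS at `location`."""
    return {"temperature": {"units": {"type": "string", "value": "degC"}}}


def apply_ncml(ncml_text, fetch=fetch_das):
    """Return the wrapped data set's DAS with the NcML attribute changes applied."""
    root = ET.fromstring(ncml_text)
    das = fetch(root.get("location"))
    for variable in root:
        if local_name(variable) != "variable":
            continue
        # Assumption: an unknown variable is an error (raises KeyError).
        attributes = das[variable.get("name")]
        for attribute in variable:
            if local_name(attribute) == "attribute":
                # Adds the attribute, or overwrites one with the same name.
                attributes[attribute.get("name")] = {
                    "type": attribute.get("type"),
                    "value": attribute.get("value"),
                }
    return das


augmented = apply_ncml(NCML)
print(augmented["temperature"]["color"])
```

The original attributes of temperature (here, the made-up units entry) are left in place; only color is added.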

This means that the NcML file effectively defines a new data set. The data.nc data set is still available as before.

N.B.: We will scrap the existing AIS code in libdap - it's just too far from anything we want.

Notes on adopting NcML

There are some aspects to NcML, which is based on the Common Data Model, that don't exactly match one-to-one with DAP syntax, even though the two models do match up semantically in most respects.

Syntax/Semantics issues

Since NcML is based on the CDM, it's pretty close to DAP 3.2, but there are some issues to be finessed.

  1. NcML considers everything to have rank and thus does not name a special constructor type Array. Instead scalars have rank zero, arrays have rank greater than zero. A rank of one or more is denoted using variable@shape which lists the dimension names. For this to work, all dimensions must be named. I think that something not captured in the schema is that @shape can be a list of sizes when the dimensions are not named.
  2. NcML does not have a variable@type for unsigned types or for 'Grid' or 'Sequence'.
  3. Although not captured by the schema, it appears that NcML that modifies the attributes of an existing variable does not have to specify either the variable@type or variable@shape attributes. This might make the above moot. In that case, the variable@type and @shape attributes might only come into play when/if we use NcML/AIS to add new variables to the data set.
  4. The NcML dimension element is in line with CDM's notion of a dimension and more closely related to DAP's Grid Maps. A future DAP (e.g., 3.5?) might have support for shared dimensions.
  5. NcML also has a Group data type, something we don't yet have in DAP 3.x (but might in DAP 3.5).
  6. The NcML 2.2 schema uses one DataType (<xsd:simpleType name="DataType">) for both variables and attributes; we can use the Structure data type value for attribute containers even though the names are not the same.
  7. NcML does not have an otherXML attribute@type so we'll have to add that. Maybe we can overload the attribute@shape attribute so that it has the special dimension name otherXML? This idea will make purists gag, and rightly so, but it might be a good way to try NcML out without changing the design at all.
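
To make item 1 concrete, here is a small Python sketch of how a variable@shape value could be mapped onto DAP's scalar/Array distinction. The function names are hypothetical, and accepting integer sizes in @shape is the assumption discussed in item 1, not something the schema states.

```python
def parse_shape(shape):
    """Split a variable@shape string into dimension names and/or integer sizes."""
    entries = []
    for token in (shape or "").split():
        # Assumption: a purely numeric token is an anonymous dimension size.
        entries.append(int(token) if token.isdigit() else token)
    return entries


def dap_kind(shape):
    """Rank zero means a scalar; any higher rank means a DAP Array."""
    rank = len(parse_shape(shape))
    return "scalar" if rank == 0 else f"Array of rank {rank}"


print(dap_kind(None))
print(dap_kind("time lat lon"))
```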

Longer-term, I think we want to see the following changes to NcML:

  1. Separate the attribute and variable element types so that there's a different type (xsd:simpleType) for each.
  2. Add Grid and Sequence to the set of types for a variable element
  3. Add otherXML to the set of types for attribute.
  4. Expand variable@shape so that it can contain integer dimension sizes in addition to names.

Detailed Design

This design is broken down into several steps.

  1. Build a BES handler that can be loaded into the BES and configured but which actually does nothing;
  2. Parse NcML, read the XXX tag and return its value (the data source) as text;
  3. Use other handlers in the BES to get the requested object/response from the data source; and
  4. Use information in the NcML file (as parsed) to augment the information in the object/response from the data source.
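
As a rough illustration of step 2, the following Python sketch returns the data source as text. The design above leaves the tag unspecified; this sketch assumes, following the simple example earlier on this page, that the data source is the location attribute of the root netcdf element. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def data_source(ncml_text):
    """Return the wrapped data set's location as text.

    Assumption: the data source is the root element's location attribute.
    """
    root = ET.fromstring(ncml_text)
    location = root.get("location")
    if location is None:
        raise ValueError("NcML document names no data source")
    return location


print(data_source('<netcdf location="file:/.../data.nc"/>'))
```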

Build a BES Handler

Parse NcML

Get the Response

Augment the Response

Deliverables

  1. The NcML handler. It will run in the BES.
  2. Instructions on how to use said handler.

Period of use

This will be used for the remainder of the IOOS and REAP projects and hopefully for a long time thereafter.