AIS Using NcML

From OPeNDAP Documentation


This page and the BES Aggregation using NcML page go hand in hand. The essential idea is to use NcML as a syntax to describe both aggregations of data sets (e.g., HDF4 files) and ancillary information that should be added to a data set. The motivation for using NcML is to build on an accepted syntax rather than invent a new one, adding new features only where we need them.

Use Cases

  1. Add the NcML handler to the BES
  2. Add attributes to a single data set
  3. Add one or more attributes to a group of data sets. This use case is not complete since the scan element is not defined outside of an aggregation element.
  4. Using the NcML Handler to get information

Definitions

Aggregation
A single data set (i.e., something referenced by a single DAP URL) that is actually made up of two or more discrete things, each of which (potentially at least) has its own DAP URL.
Data set
Anything that can be referenced by a DAP URL and that will return the DAP responses when requested.
NcML
Syntax for ancillary data (attributes and variables) and aggregations used by the THREDDS Data Server (TDS).

Background

This new BES handler will be used to introduce new attributes into data sets for the IOOS/WCS project and for the REAP project. In the first case, the augmented DDX response generated by the handler will be filtered through XSLT to produce a WCS response of one form or another. In the second case, the DDX will be filtered to produce an EML document. So, this handler and the collection(s) of XML/NcML/? documents will be an important part of several projects we're working on. Beyond these two projects, this handler will provide important features to Hyrax.

Hyrax & BES Documentation

NcML Information

Here are links that describe NcML 2.2:

Notes:

  1. NcML 2.2 is based on the CDM and thus includes Groups and shared dimensions, which DAP 3.2 does not support. We will want to elide that feature until DAP 4 is done and well supported.

Design

Consider a Hyrax server with a single URL, http://test.opendap.org/dap/data/nc/fnoc1.nc, from which you can get the usual set of DAP things (DAS, DDS, DataDDS, ASCII, HTML form and Info). Adding the NcML Handler to that server's BES and writing a suitable NcML file (e.g., /data/ncml/fnoc_improved.ncml) would cause that server to have a second URL that would look like http://test.opendap.org/dap/data/ncml/fnoc_improved.ncml to a DAP client. A DAP client could get the usual cast of suspects for this URL, too. Let's assume that /data/ncml/fnoc_improved.ncml adds some attributes to the /data/nc/fnoc1.nc data file. The file /data/ncml/fnoc_improved.ncml would then look something like:

<netcdf location="/data/nc/fnoc1.nc">

   <variable name="u">
       <attribute type="int32" name="max" value="2000"/>
       <attribute type="int32" name="min" value="0"/>
   </variable>

</netcdf>

The BES would trigger the NcML handler to read this file. The handler would see that the value of location is '/data/nc/fnoc1.nc', so it would invoke the BES *within which it's running* to get the needed DAP object. The BES would sort out how to do that and just go do it, returning the right thing to the NcML handler, which would then parse the rest of the NcML file, stuff the additional info into the DAS/DDS/DDX and return the end result.

The NcML handler will use the NcML document to find a 'source' data file and read a DAP object from it and then augment that DAP object using information in the NcML file. Because the NcML handler will use the BES to get the DAP objects, it will be able to add information to any file served by the BES, including those that are served by custom or 'one-off' handlers. This will make the NcML handler very flexible.
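The first step described above, reading the NcML file to find the 'source' data set and the information to add, can be sketched as follows. This is an illustrative Python sketch, not the handler itself (which runs inside the BES). The helper name read_ncml is hypothetical, and the sketch assumes the NcML document declares no XML namespace, as in the example above.

```python
import xml.etree.ElementTree as ET

# The example NcML file from the Design section.
NCML = """\
<netcdf location="/data/nc/fnoc1.nc">
   <variable name="u">
       <attribute type="int32" name="max" value="2000"/>
       <attribute type="int32" name="min" value="0"/>
   </variable>
</netcdf>
"""

def read_ncml(text):
    """Return (location, additions).

    additions maps each variable name to a list of
    (attribute name, attribute type, attribute value) tuples.
    Hypothetical helper; only variable/attribute elements are handled.
    """
    root = ET.fromstring(text)
    location = root.get("location")
    additions = {}
    for var in root.findall("variable"):
        additions[var.get("name")] = [
            (a.get("name"), a.get("type"), a.get("value"))
            for a in var.findall("attribute")
        ]
    return location, additions

location, additions = read_ncml(NCML)
print(location)
print(additions)
```

The location value is what the handler would hand back to the BES to obtain the source DAP object; the additions are what it would merge into that object afterwards.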

Example responses

Suppose the fnoc1.nc data set returns the following DDX:

<?xml version="1.0" encoding="UTF-8"?>
<Dataset name="fnoc1.nc"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="http://xml.opendap.org/ns/DAP2"
xsi:schemaLocation="http://xml.opendap.org/ns/DAP2  http://xml.opendap.org/dap/dap2.xsd">

    <Array name="u">
        <Attribute name="units" type="String">
            <value>meter per second</value>
        </Attribute>
        <Attribute name="long_name" type="String">
            <value>Vector wind eastward component</value>
        </Attribute>
        <Attribute name="missing_value" type="String">
            <value>-32767</value>
        </Attribute>
        <Attribute name="scale_factor" type="String">
            <value>0.005</value>
        </Attribute>
        <Int16/>
        <dimension name="time_a" size="16"/>
        <dimension name="lat" size="17"/>
        <dimension name="lon" size="21"/>
    </Array>

    <!-- any remaining variables omitted -->
</Dataset>

Here's the DDX that would be returned when accessing the fnoc_improved.ncml data set. I emphasize 'data set' because the NcML file essentially defines a new data set; the 'old' data set (i.e., fnoc1.nc) is still available using its own URL.

<?xml version="1.0" encoding="UTF-8"?>
<Dataset name="fnoc1.nc"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="http://xml.opendap.org/ns/DAP2"
xsi:schemaLocation="http://xml.opendap.org/ns/DAP2  http://xml.opendap.org/dap/dap2.xsd">

    <Array name="u">
        <Attribute name="units" type="String">
            <value>meter per second</value>
        </Attribute>
        <Attribute name="long_name" type="String">
            <value>Vector wind eastward component</value>
        </Attribute>
        <Attribute name="missing_value" type="String">
            <value>-32767</value>
        </Attribute>
        <Attribute name="scale_factor" type="String">
            <value>0.005</value>
        </Attribute>

        <Attribute name="max" type="Int32">
            <value>2000</value>
        </Attribute>
        <Attribute name="min" type="Int32">
            <value>0</value>
        </Attribute>

        <Int16/>
        <dimension name="time_a" size="16"/>
        <dimension name="lat" size="17"/>
        <dimension name="lon" size="21"/>
    </Array>

    <!-- any remaining variables omitted -->
</Dataset>

Notes on adopting NcML

There are some aspects to NcML, which is based on the Common Data Model, that don't exactly match one-to-one with DAP syntax, even though the two models do match up semantically in most respects.

Syntax/Semantics issues

Since NcML is based on the CDM, it's pretty close to DAP 3.2, but there are some issues to be finessed.

  1. NcML considers everything to have rank and thus does not name a special constructor type Array. Instead scalars have rank zero, arrays have rank greater than zero. A rank of one or more is denoted using variable@shape which lists the dimension names.
  2. Expand variable@shape so that it can contain integer dimension sizes in addition to names. This is the case in the code at Unidata, but it has not made it into the schema yet (April 2009).
  3. NcML does not have a variable@type for unsigned types or for 'Grid' or 'Sequence'.
  4. Although not captured by the schema, it appears that NcML that modifies the attributes of an existing variable does not have to specify either the variable@type or variable@shape attributes. This might make the above moot. In that case, the variable@type and @shape attributes might only come into play when/if we use NcML/AIS to add new variables to the data set.
  5. The NcML dimension element is in line with CDM's notion of a dimension and more closely related to DAP's Grid Maps. A future DAP (e.g., 3.5?) might have support for shared dimensions.
  6. NcML also has a Group data type, something we don't yet have in DAP 3.x (but might in DAP 3.5).
  7. The NcML 2.2 schema uses one DataType (<xsd:simpleType name="DataType">) for both variables and attributes; we can use the Structure data type value for attribute containers even though the names are not the same.
  8. NcML does not have an otherXML attribute@type so we'll have to add that. Maybe we can overload the attribute@shape attribute so that it has the special dimension name otherXML? This idea will make purists gag, and rightly so, but it might be a good way to try NcML out without changing the design at all.
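Items 1 and 2 in the list above can be made concrete with a small sketch of how variable@shape might be interpreted on the DAP side. This is only an illustration in Python under stated assumptions: the function names are hypothetical, the shape is taken to be a whitespace-separated list, and integer tokens are treated as anonymous dimension sizes as proposed in item 2.

```python
def parse_shape(shape):
    """Split a variable@shape value into its dimensions.

    Tokens made only of digits become integer sizes (item 2);
    all other tokens are kept as dimension names.
    A missing or empty shape yields an empty list (rank zero).
    """
    dims = []
    for token in (shape or "").split():
        dims.append(int(token) if token.isdigit() else token)
    return dims

def rank(shape):
    """Number of dimensions named or sized in the shape."""
    return len(parse_shape(shape))

def is_array(shape):
    """Rank zero maps to a DAP scalar, rank one or more to a DAP Array."""
    return rank(shape) > 0

print(parse_shape("time_a lat lon"))
print(parse_shape("16 lat lon"))
print(is_array(None))
```

Under this reading, the handler would never need an explicit Array type in NcML: the presence of a non-empty shape is enough to decide which DAP constructor to use.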

Longer-term, I think we want to see the following changes to NcML:

  1. Separate the attribute and variable element types so that there's a different type (xsd:simpleType) for each.
  2. Add Grid and Sequence to the set of types for a variable element.
  3. Add otherXML to the set of types for attribute.

Detailed Design

Control Flow in the NcML AIS Handler

The overall design of the NcML AIS handler is shown to the right in a UML Activity diagram. First the handler receives a request for a certain response given a specific container. In general a handler can get a request that involves several containers, but not this handler, at least not in the initial versions. Then the request is classified as one for metadata (a DDS, DAS or DDX) or for data (DataDDS). In the latter case the NcML is parsed only to determine the netcdf@location attribute's value; that data source's DataDDS is accessed using the BES, and that response is returned by this handler. In the case of a metadata request, the DDX response is sought for the data source named in the @location attribute and then augmented with information in the NcML file. The result is used to return one of the three DAP2/3/4 metadata responses.

Here is a high resolution version of the activity diagram shown to the right.
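The control flow described above can also be sketched in code. The following Python outline is a hedged illustration, not the handler's implementation: fetch, augment and render are hypothetical callables standing in for "ask the BES for a response", "merge the NcML information into the DDX" and "build the requested metadata response", and the response-type strings are simply the familiar DAP2 URL suffixes.

```python
import xml.etree.ElementTree as ET

METADATA_RESPONSES = ("das", "dds", "ddx")
DATA_RESPONSE = "dods"  # the DataDDS

def handle_request(response_type, ncml_text, fetch, augment, render):
    """Dispatch one request for a single NcML container.

    fetch(location, kind)   -- hypothetical: get a response from the BES
    augment(ddx, ncml_root) -- hypothetical: add NcML information to the DDX
    render(ddx, kind)       -- hypothetical: build a DAS, DDS or DDX from it
    """
    ncml = ET.fromstring(ncml_text)
    location = ncml.get("location")
    if response_type == DATA_RESPONSE:
        # Data request: the NcML is used only to find the source.
        return fetch(location, DATA_RESPONSE)
    if response_type in METADATA_RESPONSES:
        # Metadata request: always start from the source's DDX.
        ddx = fetch(location, "ddx")
        return render(augment(ddx, ncml), response_type)
    raise ValueError("unsupported response type: %r" % response_type)

# Stand-in callables so the flow can be exercised without a BES.
calls = []

def fake_fetch(location, kind):
    calls.append((location, kind))
    return "%s:%s" % (kind, location)

def fake_augment(ddx, ncml):
    return ddx + "+ncml"

def fake_render(ddx, kind):
    return "%s<-%s" % (kind, ddx)

NCML = '<netcdf location="/data/nc/fnoc1.nc"/>'
data = handle_request("dods", NCML, fake_fetch, fake_augment, fake_render)
meta = handle_request("das", NCML, fake_fetch, fake_augment, fake_render)
print(data)
print(meta)
```

Note that every metadata request fetches the DDX, whatever response was asked for; the DAS and DDS are derived from the augmented DDX rather than fetched separately.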

Important points for this design:

NcML AIS: Build a BES Handler

This describes how to build a basic DAP handler for the BES.

Parse NcML

Since NcML was designed for the netCDF data model, which does not exactly match the DAP data model, how should various parts of NcML be used by this handler?

Get the Response from the BES

How do you get a response object 'within the BES?' In other words, one BES typically has a number of handlers installed for a variety of data formats (netcdf, hdf4, etc.) and it also has handlers that modify standard DAP responses like the ASCII handler (which takes a DataDDS response and builds an ASCII text/plain type MIME document from it). The ASCII handler does this by asking another handler within the BES in which it's running to get the DataDDS. In the case of the NcML AIS, however, we won't be using the same sort of mechanism as the ASCII handler. Instead, the mechanism used by the NcML AIS handler will be similar to that used in the WCS gateway.

Augment the Response

Once the handler has the DDX response, what does it need to do to insert new information in the C++ object(s)?
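As one way to picture the answer, here is a sketch that works on the DDX as XML text instead of on the C++ objects. It is written in Python purely for illustration; add_attributes and qname are hypothetical helpers, the DDX is a shortened copy of the example response, and only top-level variables are considered. New attributes are placed after the variable's existing Attribute elements, matching the augmented DDX shown in the example responses.

```python
import xml.etree.ElementTree as ET

DAP2_NS = "http://xml.opendap.org/ns/DAP2"

# Shortened version of the example DDX for fnoc1.nc.
DDX = """\
<Dataset name="fnoc1.nc" xmlns="http://xml.opendap.org/ns/DAP2">
    <Array name="u">
        <Attribute name="units" type="String">
            <value>meter per second</value>
        </Attribute>
        <Int16/>
        <dimension name="time_a" size="16"/>
        <dimension name="lat" size="17"/>
        <dimension name="lon" size="21"/>
    </Array>
</Dataset>
"""

def qname(tag):
    """Qualify a tag name with the DAP2 namespace."""
    return "{%s}%s" % (DAP2_NS, tag)

def add_attributes(ddx_text, var_name, new_attrs):
    """Insert (name, DAP type, value) attributes into one top-level variable."""
    root = ET.fromstring(ddx_text)
    for var in root:
        if var.tag == qname("Attribute") or var.get("name") != var_name:
            continue
        # Find the slot just after the last existing Attribute child.
        pos = 0
        for i, child in enumerate(var):
            if child.tag == qname("Attribute"):
                pos = i + 1
        for name, dap_type, value in new_attrs:
            attr = ET.Element(qname("Attribute"), {"name": name, "type": dap_type})
            ET.SubElement(attr, qname("value")).text = value
            var.insert(pos, attr)
            pos += 1
    return root

augmented = add_attributes(DDX, "u", [("max", "Int32", "2000"), ("min", "Int32", "0")])
u = augmented[0]
print([child.get("name") for child in u.findall(qname("Attribute"))])
```

The sketch takes DAP type names (Int32) directly; translating NcML attribute@type values into DAP type names is a separate step that this example does not attempt.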

Deliverables

  1. The NcML handler. It will run in the BES.
  2. Instructions on how to use said handler.

Period of use

This will be used for the remainder of the IOOS and REAP projects and hopefully for a long time thereafter.