AIS Using NcML
This page and the BES Aggregation using NcML page go hand in hand. The essential idea is to use NcML as a syntax to describe both aggregations of data sets (e.g., HDF4 files) and ancillary information that should be added to a data set. The motivation for using NcML is to build on an accepted syntax rather than invent a new one, adding new features only where we need them.
- Add the NcML handler to the BES
- Add attributes to a single data set
- Adding one or more attributes to a group of data sets. This use case is not complete, since the scan element is not defined outside of an aggregation element.
- Using the NcML Handler to get information
- Ancillary Information Service
- Hyrax is the next-generation server from OPeNDAP. It uses a modular design that employs a lightweight Java servlet (aka OLFS) to provide the publicly accessible client interface, and a back-end daemon, the BES, to handle the heavy lifting.
- OPeNDAP Back-End Server (BES) is a high-performance back-end server software framework that allows data providers more flexibility in providing end users views of their data. The current OPeNDAP data objects (DAS, DDS, and DataDDS) are still supported, but now data providers can add new data views, provide new functionality, and new features to their end users through the BES modular design. Providers can add new data handlers, new data objects/views, the ability to define views with constraints and aggregation, the ability to add reporting mechanisms, initialization hooks, and more.
- The OPeNDAP Lightweight Frontend Servlet (OLFS) provides the public-accessible client interface for Hyrax. The OLFS communicates with the Back End Server (BES) to provide data and catalog services to clients. The OLFS implements the DAP2 protocol and supports some of the new DAP4 features.
- A single data set (i.e., something referenced by a single DAP URL) that is actually made up of two or more discrete things, each of which (potentially at least) has its own DAP URL.
- Data set
- Anything that can be referenced by a DAP URL and that will return the DAP responses when requested.
- Syntax for ancillary data (attributes and variables) and aggregations used by the TDS
- NetCDF (network Common Data Form) is a set of software libraries and machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data.
- Hierarchical Data Format (HDF) is provided by The HDF Group. The HDF Group provides a unique suite of technologies and supporting services that make possible the management of large and complex data collections. Its mission is to advance and support HDF technologies and ensure long-term access to HDF data.
- The OpenGIS® Web Coverage Service Interface Standard (WCS) defines a standard interface and operations that enable interoperable access to geospatial "coverages". The term "grid coverages" typically refers to content such as satellite images, digital aerial photos, digital elevation data, and other phenomena represented by values at each measurement point.
This new BES handler will be used to introduce new attributes into data sets for the IOOS/WCS project and for the REAP project. In the first case, the augmented DDX response generated by the handler will be filtered through XSLT to produce a WCS response of one form or another. In the second case, the DDX will be filtered to produce an EML document. So, this handler and the collection(s) of XML/NcML/? documents will be an important part of several projects we're working on. Beyond these two projects, this handler will provide important features to Hyrax.
Hyrax & BES Documentation
- Hyrax: All about the Hyrax data server
- Hyrax - Create BES Module: Documentation for a script that will make a skeleton module/handler for you.
- Hyrax - Example BES Modules: An example module.
- Hyrax - Extending BES Module: How to extend the BES
- Extending the OLFS: Writing Custom Dispatch Handlers (PowerPoint)
Here are links that describe NcML 2.2:
- NcML 2.2 Tutorial: This starts with a page on NcML and then goes on to a page about aggregation. The latter page shows the parallels with our work here.
- Annotated Schema for NcML-2.2
- NcML 2.2 is based on the CDM and thus includes Groups and shared dimensions, which DAP 3.2 does not support. We will want to elide that feature until DAP 4 is done and well supported.
Given a Hyrax server with a single URL that looks like http://test.opendap.org/dap/data/nc/fnoc1.nc, from which you can get the usual set of DAP data products (DAS, DDS, DataDDS, ASCII, HTML form and Info), adding the NcML Handler to that server's BES and writing a suitable NcML file (e.g., /data/ncml/fnoc_improved.ncml) would cause that server to have a second URL that would look like http://test.opendap.org/dap/data/ncml/fnoc_improved.ncml to a DAP client. A DAP client could get the usual set of responses for this URL, too. Let's assume that the /data/ncml/fnoc_improved.ncml file adds some attributes to the /data/nc/fnoc1.nc data file. The file /data/ncml/fnoc_improved.ncml would then look something like:
<netcdf location="/data/nc/fnoc1.nc">
    <variable name="u">
        <attribute type="int32" name="max" value="2000"/>
        <attribute type="int32" name="min" value="0"/>
    </variable>
</netcdf>
The BES would trigger the NcML handler to read this file. The handler would see that the value of location is '/data/nc/fnoc1.nc', so it would invoke the BES *within which it's running* to get the needed DAP object. The BES would sort out how to do that and return the right thing to the NcML handler, which would then parse the rest of the NcML file, add the additional information to the DAS/DDS/DDX, and return the end result.
The NcML handler will use the NcML document to find a 'source' data file and read a DAP object from it and then augment that DAP object using information in the NcML file. Because the NcML handler will use the BES to get the DAP objects, it will be able to add information to any file served by the BES, including those that are served by custom or 'one-off' handlers. This will make the NcML handler very flexible.
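As a rough illustration of the first step described above (finding the 'source' data file and the additions to make), the sketch below uses Python's standard xml.etree.ElementTree to read the example NcML. It is not the handler's code, which will run inside the BES; the function name parse_ncml and the shape of its return value are hypothetical choices made for this example.

```python
import xml.etree.ElementTree as ET

# The example NcML from this page (no namespace declared, as in the example).
NCML_TEXT = """
<netcdf location="/data/nc/fnoc1.nc">
    <variable name="u">
        <attribute type="int32" name="max" value="2000"/>
        <attribute type="int32" name="min" value="0"/>
    </variable>
</netcdf>
"""


def parse_ncml(text):
    """Return (location, additions) for an NcML document.

    additions maps each variable name to a list of
    (attribute name, type, value) tuples, all kept as strings.
    Hypothetical helper; only handles the elements used in the example.
    """
    root = ET.fromstring(text)
    location = root.get("location")
    additions = {}
    for var in root.findall("variable"):
        additions[var.get("name")] = [
            (attr.get("name"), attr.get("type"), attr.get("value"))
            for attr in var.findall("attribute")
        ]
    return location, additions


location, additions = parse_ncml(NCML_TEXT)
print(location)
print(additions)
```

The location value is what the handler would hand back to the BES to obtain the source DAP object; the additions are applied afterwards.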
Suppose the fnoc1.nc data set returns the following DDX:
<?xml version="1.0" encoding="UTF-8"?>
<Dataset name="fnoc1.nc"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xmlns="http://xml.opendap.org/ns/DAP2"
         xsi:schemaLocation="http://xml.opendap.org/ns/DAP2 http://xml.opendap.org/dap/dap2.xsd">
    <Array name="u">
        <Attribute name="units" type="String">
            <value>meter per second</value>
        </Attribute>
        <Attribute name="long_name" type="String">
            <value>Vector wind eastward component</value>
        </Attribute>
        <Attribute name="missing_value" type="String">
            <value>-32767</value>
        </Attribute>
        <Attribute name="scale_factor" type="String">
            <value>0.005</value>
        </Attribute>
        <Int16/>
        <dimension name="time_a" size="16"/>
        <dimension name="lat" size="17"/>
        <dimension name="lon" size="21"/>
    </Array>
Here's the DDX that would be returned when accessing the fnoc_improved.ncml data set. The NcML file essentially defines a new data set; the 'old' data set (i.e., fnoc1.nc) is still available using its URL.
<?xml version="1.0" encoding="UTF-8"?>
<Dataset name="fnoc1.nc"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xmlns="http://xml.opendap.org/ns/DAP2"
         xsi:schemaLocation="http://xml.opendap.org/ns/DAP2 http://xml.opendap.org/dap/dap2.xsd">
    <Array name="u">
        <Attribute name="units" type="String">
            <value>meter per second</value>
        </Attribute>
        <Attribute name="long_name" type="String">
            <value>Vector wind eastward component</value>
        </Attribute>
        <Attribute name="missing_value" type="String">
            <value>-32767</value>
        </Attribute>
        <Attribute name="scale_factor" type="String">
            <value>0.005</value>
        </Attribute>
        <!-- Here is the added stuff -->
        <Attribute name="max" type="Int32">
            <value>2000</value>
        </Attribute>
        <Attribute name="min" type="Int32">
            <value>0</value>
        </Attribute>
        <!-- End of the added stuff -->
        <Int16/>
        <dimension name="time_a" size="16"/>
        <dimension name="lat" size="17"/>
        <dimension name="lon" size="21"/>
    </Array>
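The difference between the two DDX listings can be sketched in a few lines of Python using the standard xml.etree.ElementTree module. This is only an illustration of the augmentation step, not the handler itself, which will work on C++ objects inside the BES. The helper names (q, dap_type, add_attributes), the shortened DDX text, and the rule that an NcML type such as "int32" maps to the DDX type "Int32" by capitalizing the first letter are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

DAP2_NS = "http://xml.opendap.org/ns/DAP2"

# A shortened, well-formed stand-in for the fnoc1.nc DDX shown above.
DDX_TEXT = """<Dataset name="fnoc1.nc" xmlns="http://xml.opendap.org/ns/DAP2">
    <Array name="u">
        <Attribute name="units" type="String">
            <value>meter per second</value>
        </Attribute>
        <Int16/>
        <dimension name="time_a" size="16"/>
    </Array>
</Dataset>"""


def q(tag):
    """Qualify a tag name with the DAP2 namespace (ElementTree notation)."""
    return "{%s}%s" % (DAP2_NS, tag)


def dap_type(ncml_type):
    """Assumed mapping: 'int32' in the NcML becomes 'Int32' in the DDX."""
    return ncml_type[:1].upper() + ncml_type[1:]


def add_attributes(ddx_root, var_name, new_attrs):
    """Insert new Attribute elements after the variable's existing ones.

    new_attrs is a list of (name, ncml type, value) string tuples.
    Returns True if a top-level variable called var_name was found.
    """
    for var in ddx_root:
        # Skip global attributes and variables with a different name.
        if var.tag == q("Attribute") or var.get("name") != var_name:
            continue
        pos = 0
        for i, child in enumerate(list(var)):
            if child.tag == q("Attribute"):
                pos = i + 1
        for name, ncml_type, value in new_attrs:
            attr = ET.Element(q("Attribute"),
                              {"name": name, "type": dap_type(ncml_type)})
            ET.SubElement(attr, q("value")).text = value
            var.insert(pos, attr)
            pos += 1
        return True
    return False


root = ET.fromstring(DDX_TEXT)
found = add_attributes(root, "u", [("max", "int32", "2000"),
                                   ("min", "int32", "0")])
array_u = root.find(q("Array"))
summary = [(a.get("name"), a.get("type"), a.find(q("value")).text)
           for a in array_u.findall(q("Attribute"))]
print(summary)
```

Inserting after the last existing Attribute keeps the new attributes ahead of the type and dimension elements, matching the layout of the augmented DDX above.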
The overall design of the NcML AIS handler is shown to the right in a UML Activity diagram. First, the handler receives a request for a certain response given a specific container. In general a handler can get a request that involves several containers, but not this handler, at least not in the initial versions. The request is then classified as either a metadata request (a DDS, DAS or DDX) or a data request (DataDDS). In the latter case the NcML is parsed only to determine the netcdf@location attribute's value; that data source's DataDDS is accessed using the BES and returned by this handler. In the case of a metadata request, the DDX response is sought for the data source named in the @location attribute and then augmented with information in the NcML file. The result is used to return one of the three DAP2/3/4 metadata responses.
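The control flow just described can be summarized in a short Python sketch. It is a simplification under stated assumptions, not the BES API: the function handle and the three callables passed to it (fetch, augment, render) are hypothetical stand-ins for "ask the BES for a response", "apply the NcML additions", and "build the requested metadata response from the augmented DDX".

```python
import xml.etree.ElementTree as ET

METADATA_RESPONSES = {"DDS", "DAS", "DDX"}
DATA_RESPONSES = {"DataDDS"}


def handle(response, ncml_text, fetch, augment, render):
    """Dispatch one request for one container (one NcML file).

    fetch(location, kind)   -> response of the given kind for the source
    augment(ddx, ncml_text) -> DDX with the NcML additions applied
    render(ddx, response)   -> the requested metadata response
    All three callables are hypothetical placeholders.
    """
    # Both branches need the netcdf@location attribute.
    location = ET.fromstring(ncml_text).get("location")
    if response in DATA_RESPONSES:
        # Data request: pass the source's DataDDS through unchanged.
        return fetch(location, "DataDDS")
    if response in METADATA_RESPONSES:
        # Metadata request: always start from the DDX, then augment it.
        ddx = fetch(location, "DDX")
        return render(augment(ddx, ncml_text), response)
    raise ValueError("unsupported response: %r" % (response,))


# Stub callables that only record and label what happened.
calls = []


def fake_fetch(location, kind):
    calls.append((location, kind))
    return "%s of %s" % (kind, location)


def fake_augment(ddx, ncml_text):
    return ddx + " + NcML attributes"


def fake_render(ddx, response):
    return "%s built from (%s)" % (response, ddx)


ncml = '<netcdf location="/data/nc/fnoc1.nc"/>'
data_result = handle("DataDDS", ncml, fake_fetch, fake_augment, fake_render)
das_result = handle("DAS", ncml, fake_fetch, fake_augment, fake_render)
print(data_result)
print(das_result)
```

Routing every metadata request through the DDX means the augmentation logic only has to be written once, for one response type.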
Here is a high resolution version of the activity diagram shown to the right.
Important points for this design:
This describes how to build a basic DAP handler for the BES
Since NcML was designed for the netCDF data model and that does not match exactly the DAP data model, how should various parts of NcML be used by this handler?
Get the Response from the BES
How do you get a response object 'within the BES'? In other words, one BES typically has a number of handlers installed for a variety of data formats (netcdf, hdf4, etc.), and it also has handlers that modify standard DAP responses, like the ASCII handler (which takes a DataDDS response and builds an ASCII text/plain type MIME document from it). The ASCII handler does this by asking another handler within the BES in which it's running to get the DataDDS. In the case of the NcML AIS, however, we won't be using the same sort of mechanism as the ASCII handler. Instead, the mechanism used by the NcML AIS handler will be similar to that used in the WCS gateway.
Note: I'm a little murky on the details here and I'm waiting on information from Patrick.
Once the handler has the DDX response, what does it need to do to insert new information in the C++ object(s)?
- The NcML handler. It will run in the BES.
- Instructions on how to use said handler.
Period of use
This will be used for the remainder of the IOOS and REAP projects and hopefully for a long time thereafter.