BES - Modules - NcML Module: Difference between revisions

From OPeNDAP Documentation
⧼opendap2-jumptonavigation⧽
Line 559: Line 559:
It also can be used to remove variables from existing datasets:
It also can be used to remove variables from existing datasets:


</pre>
<pre>
   <remove name="SomeExistingVariable" type="variable"/>
   <remove name="SomeExistingVariable" type="variable"/>
</pre>
</pre>

Revision as of 17:07, 4 November 2010

Introduction

Welcome to the OPeNDAP NcML Data Handler Module v1.1.0 for Hyrax 1.6.2!

[Updated for NcML Module v1.1.0 for Hyrax 1.6.2 Release -- 15 September 2010]

This module may be added to a Hyrax 1.6.2 server to extend its data serving capability to NcML 2.2 files (see http://www.unidata.ucar.edu/software/netcdf/ncml/). NcML provides support for modifying other datasets in various ways, such as adding metadata and data and aggregating multiple datasets in several ways.

We refer the reader to the Unidata NcML tutorial: http://www.unidata.ucar.edu/software/netcdf/ncml/v2.2/Tutorial.html which will give the basics of using NcML. We then give a reference manual for the various elements and their attributes. Additionally we have provided a more extensive tutorial on NcML aggregation. Please see #Aggregation Tutorials.

Since the DAP Grid is a common case, we will also give a simple example for adding metadata to the various parts of a Grid dataset using NcML. Please see Grid_Metadata_Tutorial.

Current Release

The most recent release is 1.1.0, which is bundled with (and requires) Hyrax server 1.6.2. To download, visit the NcML Module Page.

New features in this release:


Features

This version currently implements a subset of NcML 2.2 functionality, along with some OPeNDAP extensions:

  • Metadata Manipulation
    • Addition, Removal, and Modification of attributes to other datasets (NetCDF, HDF4, HD5, etc.) served by the same Hyrax 1.6 server
    • Extends NcML 2.2 to allow for common nested "attribute containers"
    • Attributes can be DAP2 types as well as the NcML types
    • Attributes can be of the special "OtherXML" type for injecting arbitrary XML into a DDX response
  • Data Manipulation
    • Addition of new data variables (scalars or arrays of basic types as well as structures)
    • Variables may be removed from the wrapped dataset
    • Allows the creation of "pure virtual" datasets which do not wrap another dataset
  • Aggregations: JoinNew, JoinExisting Union #Aggregation Tutorials
    • JoinNew Aggregation (NCML_Module_Aggregation_JoinNew)
      • Allows multiple datasets to be "joined" by creating a new outer dimension for the aggregated variable
      • Aggregation member datasets can be listed explicitly with explicit coordinates for the new dimension for each member
      • Scan: Aggregations can be specified "automatically" by scanning a directory for files matching certain criteria, such as a suffix or regular expression.
      • Metadata may be added to the new coordinate variable for the new dimension
    • JoinExisting Aggregation (NCML_Module_Aggregation_JoinExisting)
      • Currently requires the author to specify the ncoords attribute.
      • Scan may also be used with ncoords attribute for uniform sized granules
      • Only allows join dimension to be aggregated from granules and not overridden in NcML
    • Union Aggregation (NCML_Module_Aggregation_Union)
      • Merges all member datasets into one by taking the first named instance of variables and metadata from the members
      • Useful for combining two or more datasets with different variables into a single set

Installation from Source

For information on how to build and install the NcML Data Module, please see the INSTALL file that came with the source distribution.

Installation Overview

The NcML Module requires a working Hyrax 1.6 installation. It is a module that is dynamically loaded into the Hyrax BES (Back End Server) to allow it to handle NcML files.

Please see the file INSTALL for full build and install instructions as well as requirements.

NOTE: After installation, you MUST restart Hyrax by restarting the BES and OLFS so the NcML Module is loaded!

Requirement: International Components for Unicode (ICU) Library

The most important external requirement is an installation of the International Components for Unicode (ICU) version 3.6 or higher (tested up to 4.2.1). The source distributions (as well as some binaries) may be found at the site: http://site.icu-project.org/download

If you are using Linux RPM's to run Hyrax, you can get an RPM for ICU as well. Search for the RPM named "libicu" using a package manager or yum, e.g. If you are compiling the module from source, you will also need the RPM "libicu-devel" to get the headers installed.

If you install in the default locations, the ncml_module should find the libraries and headers. Otherwise, please consult the INSTALL file for more information about installing ICU to a non-standard location.

Testing Installation

Test data is provided to see if the installation was successful. The file sample_virtual_dataset.ncml is a dataset purely created in NcML and doesn't contain an underlying dataset. You may also view fnoc1_improved.ncml to test adding attributes to an existing netCDF dataset (fnoc1.nc), but this requires the netCDF data handler to be installed first! Several other examples installed also use the HDF4 and HDF5 handlers.

Functionality

This version of the NcML Module implements a subset of NcML 2.2 functionality. The reader is directed to http://www.unidata.ucar.edu/software/netcdf/ncml/v2.2/ for more information on NcML.

Our module can currently:

  • Refer only to files being served locally (not remotely)
  • Add, modify, and remove attribute metadata to a dataset
  • Create a purely virtual dataset using just NcML and no underlying dataset
  • Create new scalar variables of any simple NcML type or simple DAP type
  • Create new Structure variables (which can contain new child variables)
  • Create new N-dimensional arrays of simple types (NcML or DAP)
  • Remove existing variables from a wrapped dataset
  • Rename existing variables in a wrapped dataset
  • Name dimensions as a mnemonic for specifying Array shapes
  • Perform union aggregations on multiple datasets, virtual or wrapped or both
  • Perform joinNew aggregations to merge a variable across multiple datasets by creating a new outer dimension
  • Specify aggregation member datasets by scanning directories for files matching certain criteria

We describe each supported NcML element in detail below.

<netcdf> Element

The <netcdf> element is used to define a dataset, either a wrapped dataset that is to be modified, a pure virtual dataset, or a member dataset of an aggregation. The <netcdf> element is assumed to be the topmost node, or as a child of an aggregation element.

Local vs. Remote Datasets

We assume that the location attribute (netcdf@location) refers to the full path (with respect to the BES data root directory) of a local dataset (served by the same Hyrax server). The current version of the module cannot be used to modify remote datasets.

If netcdf@location is the empty string (or unspecified, as empty is the default), the dataset is a pure virtual dataset, fully specified within the NcML file itself. Attributes and variables may be fully described and accessed with constraints just as normal datasets in this manner. The installed sample datafile "sample_virtual_dataset.ncml" is an example test case for this functionality.

Unsupported Attributes

The current version does not support the following attributes of <netcdf>:

  • enhance
  • addRecords
  • fmrcDefinition (will be supported when FMRC aggregation is added)
  • ncoords (will be supported when joinExisting is added)

<readMetadata> Element

The <readMetadata/> element is the default, so is effectively not needed.

<explicit> element

The <explicit/> element simply clears all attribute tables in the referred to netcdf@location before applying the rest of the NcML transformations to the metadata.

<dimension> Element

The <dimension> element has limited functionality in this release since the DAP2 doesn't support dimensions as more than mnemonics at this time. The limitations are:

  • We only parse the dimension@name and dimension@length attributes.
  • Dimensions can only be specified as a direct child of a <netcdf> element prior to any reference to them

For example:

<netcdf> 
  <dimension name="station" length="2"/>
  <dimension name="samples" length="5"/>
  <!-- Some variable elements refer to the dimensions here -->
</netcdf>

The dimension element sets up a mapping from the name to the unsigned integer length and can be used in a variable@shape to specify a length for an array dimension (see the section on <variable> below). The dimension map is cleared when </netcdf> is encountered (though this doesn't matter currently since we allow only one right now, but it will matter for aggregation, potentially). We also do not support <group>, which is the only other legal place in NcML 2.2 for a dimension element.

Parse Errors:

  • If the name and length are not both specified.
  • If the dimension name already exists in the current scope
  • If the length is not an unsigned integer
  • If any of the other attributes specified in NcML 2.2 are used. We do not handle them, so we consider them errors now.

<variable> Element

The <variable> element is used to:

  • Provide lexical scope for a contained <attribute> or <variable> element
  • Rename existing variables
  • Add new scalar variables of simple types
  • Add new Structure variables
  • Add new N-dimensional Array's of simple types
  • Specify the coordinate variable for the new dimension in a joinNew aggregation

We describe each in turn in more detail.

Specifying Lexical Scope with <variable type="">

Consider the following example:

  <variable name="u">
    <attribute name="Metadata" type="string">This is metadata!</attribute>
  </variable>

This code assumes that a variable named "u" exists (of any type since we do not specify) and provides the lexical scope for the attribute "Metadata" which will be added or modified within the attribute table for the variable "u" (it's qualified name would be "u.Metadata").

Nested DAP Structure and Grid Scopes

Scoping variable elements may be nested if the containing variable is a Structure (this includes the special case of Grid)

 <variable name="DATA_GRANULE" type="Structure">
    <variable name="PlanetaryGrid" type="Structure">
      <variable name="percipitate">
	<attribute name="units" type="String" value="inches"/>
      </variable>
    </variable>
  </variable>

This adds a "unit" attribute to the variable "percipitate" within the nested Structure's ("DATA_GRANULE.PlanetaryGrid.percipitate" as fully qualified name). Note that we must refer to the type explicitly as a "Structure" so the parser knows to traverse the tree.

Note the variable might be of type Grid, but the type "Structure" must be used in the NcML to traverse it.

Adding Multiple Attributes to the Same Variable

Once the variable's scope is set by the opening <variable> element, more than one attribute can be specified within it. This will make the NcML more readable and also will make the parsing more efficient since the variable will only need to be looked up once.

For example,

<variable name="Foo">
   <attribute name="Attr_1" type="string" value="Hello"/>
   <attribute name="Attr_2" type="string" value="World!"/>
</variable>

should be preferred over:

<variable name="Foo">
   <attribute name="Attr_1" type="string" value="Hello"/>
</variable>

<variable name="Foo">
   <attribute name="Attr_2" type="string" value="World!"/>
</variable>

although they produce the same result. Any number of attributes can be specified before the variable is closed.

Renaming Existing Variables

The attribute variable@orgName is used to rename an existing variable.

For example:

<variable name="NewName" orgName="OldName"/>

will rename an existing variable at the current scope named "OldName" to "NewName". After this point in the NcML file (such as in constraints specified for the DAP request), the variable is known by "NewName".

Note that the type is not required here --- the variable is assumed to exist and its existing type is used. It is not possible to change the type of an existing variable at this time!

Parse Errors:

  • If a variable with variable@orgName doesn't exist in the current scope
  • If the new name variable@name is already taken in the current scope
  • If a new variable is created but does not have exactly one values element

Adding a New Scalar Variable

The <variable> element can be used to create a new scalar variable of a simple type (i.e. an atomic NcML type such as "int" or "float", or any DAP atomic type, such as "UInt32" or "URL") by specifying an empty variable@shape (which is the default), a simple type for variable@type, and a contained <values> element with the one value of correct type.

For example:

<variable name="TheAnswerToLifeTheUniverseAndEverything" type="double">
    <attribute name="SolvedBy" type="String" value="Deep Thought"/>
    <values>42.000</values>
  </variable>

will create a new variable named "TheAnswerToLifeTheUniverseAndEverything" at the current scope. It has no shape so will be a scalar of type "double" and will have the value 42.0.

Parse Errors:

  • It is a parse error to not specify a <values> element with exactly one proper value of the variable type.
  • It is a parse error to specify a malformed or out of bounds value for the data type

Adding a New Structure Variable

A new Structure variable can be specified at the global scope or within another Structure. It is illegal for an array to have type structure, so the shape must be empty.

For example:

<variable name="MyNewStructure" type="Structure">
    <attribute name="MetaData" type="String" value="This is metadata!"/>
    <variable name="ContainedScalar1" type="String"><values>I live in a new structure!</values></variable>
    <variable name="ContainedInt1" type="int"><values>42</values></variable>
  </variable>

specifies a new structure called "MyNewStructure" which contains two scalar variable fields "ContainedScalar1" and "ContainedInt1".

Nested structures are allowed as well.

Parse Error:

  • If another variable or attribute exists at the current scope with the new name.
  • If a <values> element is specified as a direct child of a new Structure --- structures cannot contain values, only attributes and other variables.

Adding a New N-dimensional Array

An N-dimensional array of a simple type may be created virtually as well by specifying a non-empty variable@shape. The shape contains the array dimensions in left-to-right order of slowest varying dimension first. For example:

 <variable name="FloatArray" type="float" shape="2 5">
      <!-- values specified in row major order (leftmost dimension in shape varies slowest) 
	Any whitespace is a valid separator by default, so we can use newlines to pretty print 2D matrices.
	-->
      <values>
	0.1 0.2 0.3 0.4 0.5
	1.1 1.1 1.3 1.4 1.5
      </values>
    </variable>

will specify a 2x5 dimension array of float values called "FloatArray". The <values> element must contain 2x5=10 values in row major order (slowest varying dimension first). Since whitespace is the default separator, we use a newline to show the dimension boundary for the values, which is easy to see for a 2D matrix such as this.

A dimension name may also be used to refer mnemonically to a length. The DAP response will use this mnemonic in its output, but it is not currently used for shared dimensions, only as a mnemonic. See the section on the <dimension> element for more information. For example:

<netcdf>
 <dimension name="station" length="2"/>
 <dimension name="sample" length="5"/>
 <variable name="FloatArray" type="float" shape="station sample">
      <values>
	0.1 0.2 0.3 0.4 0.5
	1.1 1.1 1.3 1.4 1.5
      </values>
    </variable>

will produce the same 2x5 array, but will incorporate the dimension mnemonics into the response. For example, here's the DDS response:

Dataset {
     Float32 FloatArray[station = 2][samples = 5];
} sample_virtual_dataset.ncml;

Note that the <values> element respects the values@separator attribute if whitespace isn't correct. This is very useful for arrays of strings with whitespace, for example.

<variable name="StringArray" type="string" shape="3">
  <values separator="*">String 1*String 2*String 3</values>
</variable>

creates a length 3 array of string StringArray = {"String 1", "String 2", "String 3"}.


Parse Errors:

  • It is an error to specify the incorrect number of values
  • It is an error if any value is malformed or out of range for the data type.
  • It is an error to specify a named dimension which does not exist in the current <netcdf> scope.
  • It is an error to specify an Array whose flattened size (product of dimensions) is > 2^31-1.

Specifying the new coordinate variable for a joinNew aggregation

In the special case of a joinNew aggregation, the new coordinate variable may be specified with the <variable> element. The new coordinate variable is defined to have the same name as the new dimension. This allows for several things:

  • Explicit specification of the variable type and coordinates for the new dimension
  • Specification of the metadata for the new coordinate variable

In the first case, the author can specify explicitly the type of the new coordinate variable and the actual values for each dataset. In this case, the variable must be specified after the aggregation element in the file so the new dimension's size (number of member datasets) may be known and error checking performed. Metadata can also be added to the variable here.

In the second case, the author may just specify the variable name, which allows one to specify the metadata for a coordinate variable that is automatically generated by the aggregation itself. This is the only allowable case for a variable element to not contain a values element! Coordinate variables are generated automatically in two cases:

  • The author has specified an explicit list of member datasets, with or without explicit coordVal attributes.
  • The author has used a <scan> element to specify the member datasets via a directory scan

In this case, the <variable> element may come before or after the <aggregation>.

Parse Errors:

  • If an explicit variable is declared for the new coordinate variable:
    • And it contains explicit values, the number of values must be equal to the number of member datasets in the aggregation.
    • It must be specifed after the <aggregation> element
  • If a numeric coordVal is used to specify the first member dataset's coordinate, then all datasets must contain a numerical coordinate.
  • An error is thrown if the specified aggregation variable (variableAgg) is not found in all member datasets.
  • An error is thrown if the specified aggregation variable is not of the same type in all member datasets. Coercion is not performed!
  • An error is thrown if the specified aggregation variables in all member datasets do not have the same shape
  • An error is thrown if an explicit coordinate variable is specified with a shape that is not the same as the new dimension name (and the variable name itself).

<values> Element

The <values> element can only be used in the context of a new variable of scalar or array type. We cannot change the values for existing variables in this version of the handler. The characters content of a <values> element is considered to be a separated list of value tokens valid for the type of the variable of the parent element. The number of specified tokens in the content must equal the product of the dimensions of the enclosing variable@shape, or be one value for a scalar. It is an error to not specify a <values> element for a declared new variable as well.

Changing the Separator Tokens

The author may specify values@separator to change the default value token separator from the default whitespace. This is very useful for specifying arrays of strings with whitespace in them, or if data in CSV form is being pasted in.

Autogeneration of Uniform Arrays

We also can parse values@start and values@increment INSTEAD OF tokens in the content. This will "autogenerate" a uniform array of values of the given product of dimensions length for the containing variable. For example:

<variable name="Evens" type="int" shape="100">
  <values start="0" increment="2"/>
</variable>

will specify an array of the first 100 even numbers (including 0).

Parse Errors:

  • If the incorrect number of tokens are specified for the containing variable's shape
  • If any value token cannot be parsed as a valid value for the containing variable's type
  • If content is specified in addition to start and increment
  • If only one of start or increment is specified
  • If the values element is placed anywhere except within a NEW variable.

<attribute> Element

As an overview, whenever the parser encounters an <attribute> with a non-existing name (at the current scope), it creates a new one, whether a container or atomic attribute (see below). If the attribute exists, its value and/or type is modified to those specified in the <attribute> element. If an attribute structure (container) exists, it is used to define a nested lexical scope for child attributes.

Attributes may be scalar (one value) or one dimensional arrays. Arrays are specified by using whitespace (default) to separate the different values. The attribute@separator may also be set in order to specify a different separator, such as CSV format or to specify a non-whitespace separator so strings with whitespace are not tokenized. We will give examples of creating array attributes below.

Adding New Attributes or Modifying an Existing Attribute

If a specified attribute with the attribute@name does not exist at the current lexical scope, a new one is created with the given type and value. For example, assume "new_metadata" doesn't exist at the current parse scope. Then:

<attribute name="new_metadata" type="string" value="This is a new entry!"/>

will create the attribute at that scope. Note that value can be specified in the content of the element as well. This is identical to the above:

<attribute name="new_metadata" type="string">This is a new entry!</attribute>

If the attribute@name already exists at the scope, it is modified to contain the specified type and value.


Arrays

As in NcML, for numerical types an array can be specified by separating the tokens by whitespace (default) or be specifying the token separator with attribute@separator. For example,

<attribute name="myArray" type="int">1 2 3</attribute>

and

<attribute name="myArray" type="int" separator=",">1,2,3</attribute>

both specify the same array of three integers named "myArray".

TODO Add more information on splitting with a separator!


Structures (Containers)

We use attribute@type="Structure" to define a new (or existing) attribute container. So if we wanted to add a new attribute structure, we'd use something like this:

  <attribute name="MySamples" type="Structure">
    <attribute name="Location" type="string" value="Station 1"/>
    <attribute name="Samples" type="int">1 4 6</attribute>
  </attribute>

Assuming "MySamples" doesn't already exist, an attribute container will be created at the current scope and the "Location" and "Samples" attributes will be added to it.

Note that we can create nested attribute structures to arbitrary depth this way as well.

If the attribute container with the given name already exists at the current scope, then the attribute@type="Structure" form is used to define the lexical scope for the container. In other words, child <attribute> elements will be processed within the scope of the container. For example, in the above example, if "MySamples" already exists, then the "Location" and "Samples" will be processed within the existing container (they may or may not already exist as well).

Renaming an Existing Attribute or Attribute Container

We also support the attribute@orgName attribute for renaming attributes.

For example,

<attribute name="NewName" orgName="OldName" type="string"/>

will rename an existing attribute "OldName" to "NewName" while leaving its value alone. If attribute@value is also specified, then the attribute is renamed and has its value modified.

This works for renaming attribute containers as well:

<attribute name="MyNewContainer" orgName="MyOldContainer" type="Structure"/>

will rename an existing "MyOldContainer" to "MyNewContainer". Note that any children of this container will remain in it.

DAP OtherXML Extension

The module now allows specification of attributes of the new DAP type "OtherXML". This allows the NCML file author to inject arbitrary well-formed XML into an attribute for clients that want XML metadata rather than just string or url. Internally, the attribute is still a string (and in a DAP DAS response will be quoted inside one string). However, since it is XML, the NCMLParser still parses it and checks it for well-formedness (but NOT against schemas). This extension allows the NCMLParser to parse the arbitrary XML within the given attribute without causing errors, since it can be any XML.

The injected XML is most useful in the DDX response, where it shows up directly in the response as XML. XSLT and other clients can then parse it.

Errors

  • The XML must be in the content of the <attribute type="OtherXML"> element. It is a parser error for attribute@value to be set if attribute@type is "OtherXML".
  • The XML must also be well-formed since it is parsed. A parse error will be thrown if the OtherXML is malformed.

Example

Here's an example of the use of this special case.

<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2" location="/coverage/200803061600_HFRadar_USEGC_6km_rtv_SIO.nc">

    <attribute name="someName" type="OtherXML">
        <Domain xmlns="http://www.opengis.net/wcs/1.1" 
                xmlns:ows="http://www.opengis.net/ows/1.1"
                xmlns:gml="http://www.opengis.net/gml/3.2"
                >
            <SpatialDomain>
                <ows:BoundingBox crs="urn:ogc:def:crs:EPSG::4326">
                    <ows:LowerCorner>-97.8839 21.736</ows:LowerCorner>
                    <ows:UpperCorner>-57.2312 46.4944</ows:UpperCorner>
                </ows:BoundingBox>
            </SpatialDomain>
            <TemporalDomain>
                <gml:timePosition>2008-03-27T16:00:00.000Z</gml:timePosition>
            </TemporalDomain>
        </Domain>
        <SupportedCRS xmlns="http://www.opengis.net/wcs/1.1">urn:ogc:def:crs:EPSG::4326</SupportedCRS>
        <SupportedFormat xmlns="http://www.opengis.net/wcs/1.1">netcdf-cf1.0</SupportedFormat>
        <SupportedFormat xmlns="http://www.opengis.net/wcs/1.1">dap2.0</SupportedFormat>
    </attribute>

</netcdf>

TODO Put the DDX response for the above in here!

Namespace Closure

Furthermore, the parser will make the chunk of OtherXML "namespace closed". This means any namespaces specified in parent NCML elements of the OtherXML tree will be "brought down" and added to the root OtherXML elements so that the subtree may be pulled out and added to the DDX and still have its namespaces. The algorithm doesn't just bring used prefixes, but brings all of the lexically scoped closest namespaces in all ancestors. In other words, it adds unique namespaces (as determined by prefix) in order from the root of the OtherXML tree as it traverses to the root of the NCML document.

Namespace closure is a syntactic sugar that simplifies the author's task since they can specify the namespaces just once at the top of the NCML file and expect that when the subtree of XML is added to the DDX that these namespaces will come along with that subtree of XML. Otherwise they have to explicitly add the namespaces to each attributes.

TODO Add an example!

<remove> Element

The <remove> element can remove attributes and variables. For example:

  <attribute name="NC_GLOBAL" type="Structure">
    <remove name="base_time" type="attribute"/>
  </attribute>

will remove the attribute named "base_time" in the attribute structure named "NC_GLOBAL".

Note that this works for attribute containers as well! We could recursively remove the entire attribute container (i.e. it and all its children) with:

 <remove name="NC_GLOBAL" type="attribute"/>

It also can be used to remove variables from existing datasets:

  <remove name="SomeExistingVariable" type="variable"/>

This also recurses on variables of type Structure --- the entire structure including all of its children are removed from the dataset's response.

Parse Errors:

  • It is a parse error if the given attribute or variable doesn't exist in the current scope

<aggregation> Element

Aggregation involves combining multiple datasets (<netcdf>) into a virtual "single" dataset in various ways. For a tutorial on aggregation in NcML 2.2, the reader is referred to the Unidata page: http://www.unidata.ucar.edu/software/netcdf/ncml/v2.2/Aggregation.html

NcML 2.2 supports multiple types of aggregation: union, joinNew, joinExisting, and fmrc (forecast model run collection).

The current version of the NcML module supports two of these aggregations:

A union aggregation specifies that the first instance of a variable or attribute (by name) that is found in the ordered list of datasets will be the one in the output aggregation. This is useful for combining two dataset files, each which may contain a single variable, into a composite dataset with both variables.

A JoinNew aggregation joins a variable which exists in multiple datasets (usually samples of a datum over time) into a new variable containing the data from all member datasets by creating a new outer dimension. The ith component in the new outer dimension is the variable's data from the ith member dataset. It also adds a new coordinate variable of whose name is the new dimension's name and whose shape (length) is the new dimension as well. This new coordinate variable may be explicitly given by the author or may be autogenerated in one of several ways.

<scan> Element

The scan element can be used within an aggregation context to allow a directory to be searched in various ways in order to specify the members of an aggregation. This allows a static NcML file to refer to an aggregation which may change over time, such as where a new data file is generated each day.

We describe usage of the <scan> element in detail in the joinNew aggregation tutorial at NCML_Module_Aggregation_JoinNew.

Errors

There are three types of error messages that may be returned:

  • Internal Error
  • Resource Not Found Error
  • Parse Error

Internal Errors

Internal errors should be reported to support@opendap.org as they are likely bugs.

Resource Not Found Errors

If the netcdf@location specifies a non-existent local dataset (one that is not being served by the same Hyrax server), it will specify the resource was not found. This may also be returned if a handler for the specified dataset is not currently loaded in the BES. Users should test that the dataset to be wrapped already exists and can be viewed on the running server before writing NcML to add metadata. It's also an error to refer to remote datasets (at this time).

Parse Errors

Parse errors are user errors in the NcML file. These could be malformed XML, malformed NcML, unimplemented features of NcML, or could be errors in referring to the wrapped dataset.

The error message should specify the error condition as well as the "current scope" as a fully qualified DAP name within the loaded dataset. This should be enough information to correct the parse error as new NcML files are created.

The parser will generate parse errors in various situations where it expects to find certain structure in the underlying dataset. Some examples:

  • A variable of the given name was not found at the current scope
  • attribute@orgName was specified, but the attribute cannot be found at current scope.
  • attribute@orgName was specified, but the new name is already used at current scope.
  • remove specified a non-existing attribute name

Grid Metadata Tutorial

Please see the page Grid_Metadata_Tutorial for an example of adding metadata to the various parts of a DAP Grid variable.


Aggregation Tutorials

The NcML module may also be used to aggregate multiple datasets into one virtual dataset. We currently support two of the NcML aggregtions:

  • Union NCML_Module_Aggregation_Union: combine multiple datasets into one by merging variables together, selecting the first of each unique name.
  • JoinNew NCML_Module_Aggregation_JoinNew: combine variables across multiple datasets by creating a new outer dimension and coordinate variable for each of the sample datasets.
  • JoinExisting NCML_Module_Aggregation_JoinExisting: combine variables with a common named outer dimension along that dimension by concatenating data for that dimension

Please see the sections for a tutorial on the various uses of these aggregations.

Additions/Changes to NcML 2.2

This section will keep track of changes to the NcML 2.2 schema. Eventually these will be rolled into a new schema.

Attribute Structures (Containers)

This module also adds functionality beyond the current NcML 2.2 schema --- it can handle nested <attribute> elements in order to make attribute structures. This is done by using the <attribute type="Structure"> form, for example:

  <attribute name="MySamples" type="Structure">
    <attribute name="Location" type="string" value="Station 1"/>
    <attribute name="Samples" type="int">1 4 6</attribute>
  </attribute>

"MyContainer" describes an attribute structure with two attribute fields, a string "Location" and an array of int's called "Samples". Note that an attribute structure of this form can only contain other <attribute> elements and NOT a value.

If the container does not already exist, it will be created at the scope it is declared, which could be:

  • Global (top of dataset)
  • Within a variable's attribute table
  • Within another attribute container

If an attribute container of the given name already exists at the lexical scope, it is traversed in order to define the scope for the nested (children) attributes it contains.

Unspecified Variable Type Matching for Lexical Scope

We also allow the type attribute of a variable element (variable@type) to be the empty string (or unspecified) when using existing variables to define the lexical scope of an <attribute> transformation. In the schema, variable@type is (normally) required.


DAP 2 Types

Additionally, we allow DAP2 atomic types (such as UInt32, URL) in addition to the NcML types. The NcML types are mapped onto the closest DAP2 type internally.

DAP OtherXML Attribute Type

We also allow attributes to be of the new DAP type "OtherXML" for injecting arbitrary XML into an attribute as content rather than trying to form a string. This allows the parser to check well-formedness.

Forward Declaration of Dimensions

Since we use a SAX parser for efficiency, we require the <dimension> elements to come before their use in a variable@shape. One way to change the schema to allow this is to force the dimension elements to be specified in a sequence after explicit and metadata choice and before all other elements.

Aggregation Element Location and Processing Order Differences

NcML specifies that if a dataset (<netcdf> element) specifies an aggregation element, the aggregation element is always processed first, regardless of its ordering within the <netcdf> element. Our parser, since it is SAX and not DOM, modifies this behavior in that order matters in some cases:

  • Metadata (<attribute>) elements specified prior to an aggregation "shadow" the aggregation versions. This is be useful for "overriding" an attribute or variable in a union aggregation, where the first found will take precedence.
  • JoinNew: If the new coordinate variable's data is to be set explicitly by specifying the new dimension's shape (either with explicit data or the autogenerated data using values@start and values@increment attributes), the <variable> must come after the aggregation since the size of the dimension is unknown until the aggregation element is processed.

Backward Compatibility Issues

Due to the way shared dimensions were implemented in the NetCDF, HDF4, and HDF5 handlers, the DAS responses did not follow the DAP2 specification. The NcML module, on the other hand, generates DAP2 compliant DAS for these datasets, which means that wrapping some datasets in NcML will generate a DAS with a different structure. This is important for the NcML author since it changes the names of attributes and variables. In order for the module to find the correct scope for adding metadata, for example, the DAP2 DAS must be used.

In general, what this means is that an empty "passthrough" NcML file should be the starting point for authoring an NcML file. This file would just specify a dataset and nothing else:

<netcdf location="/data/ncml/myNetcdf.nc"/>

The author would then request the DAS response for the NCML file and use that as the starting point for modifications to the original dataset.

More explicit examples are given below.

NetCDF

The NetCDF handler represents some NC datasets as a DAP 2 Grid, but the returned DAS is not consistent with the DAP 2 spec for the attribute hierarchy for such a Grid. The map vector attributes are placed as siblings of the grid attributes rather than within the grid lexical scope. For example, here's the NetCDF Handler DDS for a given file:

Dataset {
    Grid {
      Array:
        Int16 cldc[time = 456][lat = 21][lon = 360];
      Maps:
        Float64 time[time = 456];
        Float32 lat[lat = 21];
        Float32 lon[lon = 360];
    } cldc;
} cldc.mean.nc;

showing the Grid. Here's the DAS the NetCDF handler generates:

Attributes {
    lat {
        String long_name "Latitude";
        String units "degrees_north";
        Float32 actual_range 10.00000000, -10.00000000;
    }
    lon {
        String long_name "Longitude";
        String units "degrees_east";
        Float32 actual_range 0.5000000000, 359.5000000;
    }
    time {
        String units "days since 1-1-1 00:00:0.0";
        String long_name "Time";
        String delta_t "0000-01-00 00:00:00";
        String avg_period "0000-01-00 00:00:00";
        Float64 actual_range 715511.00000000000, 729360.00000000000;
    }
    cldc {
        Float32 valid_range 0.000000000, 8.000000000;
        Float32 actual_range 0.000000000, 8.000000000;
        String units "okta";
        Int16 precision 1;
        Int16 missing_value 32766;
        Int16 _FillValue 32766;
        String long_name "Cloudiness Monthly Mean at Surface";
        String dataset "COADS 1-degree Equatorial Enhanced\\012AI";
        String var_desc "Cloudiness\\012C";
        String level_desc "Surface\\0120";
        String statistic "Mean\\012M";
        String parent_stat "Individual Obs\\012I";
        Float32 add_offset 3276.500000;
        Float32 scale_factor 0.1000000015;
    }
    NC_GLOBAL {
        String title "COADS 1-degree Equatorial Enhanced";
        String history "";
        String Conventions "COARDS";
    }
    DODS_EXTRA {
        String Unlimited_Dimension "time";
    }
}

Note the map vector attributes are in the "dataset" scope.

Here's the DAS that the NcML Module produces from the correctly formed DDX:

Attributes {
    NC_GLOBAL {
        String title "COADS 1-degree Equatorial Enhanced";
        String history "";
        String Conventions "COARDS";
    }
    DODS_EXTRA {
        String Unlimited_Dimension "time";
    }
    cldc {
        Float32 valid_range 0.000000000, 8.000000000;
        Float32 actual_range 0.000000000, 8.000000000;
        String units "okta";
        Int16 precision 1;
        Int16 missing_value 32766;
        Int16 _FillValue 32766;
        String long_name "Cloudiness Monthly Mean at Surface";
        String dataset "COADS 1-degree Equatorial Enhanced\\012AI";
        String var_desc "Cloudiness\\012C";
        String level_desc "Surface\\0120";
        String statistic "Mean\\012M";
        String parent_stat "Individual Obs\\012I";
        Float32 add_offset 3276.500000;
        Float32 scale_factor 0.1000000015;
        cldc {
        }
        time {
            String units "days since 1-1-1 00:00:0.0";
            String long_name "Time";
            String delta_t "0000-01-00 00:00:00";
            String avg_period "0000-01-00 00:00:00";
            Float64 actual_range 715511.00000000000, 729360.00000000000;
        }
        lat {
            String long_name "Latitude";
            String units "degrees_north";
            Float32 actual_range 10.00000000, -10.00000000;
        }
        lon {
            String long_name "Longitude";
            String units "degrees_east";
            Float32 actual_range 0.5000000000, 359.5000000;
        }
    }
}

Here the Grid Structure "cldc" and its contained data array (of the same name "cldc") and map vectors have their own attribute containers as DAP 2 specifies.

What this means for the author of an NcML file adding metadata to a NetCDF dataset that returns a Grid is that they should generate a "passthrough" file and get the DAS and then specify modifications based on that structure.

Here's an example passthrough:

<netcdf location="data/ncml/agg/cldc.mean.nc" title="This file results in a Grid">
</netcdf>


For example, to add an attribute to the map vector "lat" in the above, we'd need the following NcML:

<netcdf location="data/ncml/agg/cldc.mean.nc" title="This file results in a Grid">
  <!-- Traverse into the Grid as a Structure -->
  <variable name="cldc" type="Structure">
    <!-- Traverse into the "lat" map vector (Array) -->
    <variable name="lat"> 
      <attribute name="Description" type="string">I am a new attribute in the Grid map vector named lat!</attribute>
    </variable>
    <variable name="lon"> 
      <attribute name="Description" type="string">I am a new attribute in the Grid map vector named lon!</attribute>
    </variable>
  </variable>
</netcdf>

This clearly shows that the structure of the Grid must be used in the NcML: the attribute being added is technically "cldc.lat.Description" in a fully qualified name. The parser would return an error if it was attempted as "lat.Description" as the NetCDF DAS for the original file would have led one to believe.


HDF4/HDF5

Similarly to the NetCDF case, the Hyrax HDF4 Module produces DAS responses that do not respect the DAP2 specification. If an NcML file is used to "wrap" an HDF4 dataset, the correct DAP2 DAS response will be generated, however.

This is important for those writing NcML for HDF4 data since the lexical scope for attributes relies on the correct DAS form --- to handle this, the user should start with a "passthrough" NcML file (see the above NetCDF example) and use the DAS from that as the starting point for knowing the structure the NcML handler expects to see in the NcML file. Alternatively, the DDX has the proper attribute structure as well (the DAS is generated from it).


Known Bugs

There are no known bugs currently.

Planned Future Enhancements

Planned enhancements for future versions of the module include:

  • New NcML Aggregations

Copyright

This software is copyrighted under the GNU Lesser GPL. Please see the files COPYING and COPYRIGHT that came with this distribution.