BES - Modules - NcML Module: Difference between revisions

From OPeNDAP Documentation
 
(42 intermediate revisions by 4 users not shown)
Line 1: Line 1:
= Introduction =
= Introduction =
[[Category:NCML]]
[[Category:BES Modules|NCML Module]]


'''Welcome to the OPeNDAP NcML Data Handler Module v1.0.0 for Hyrax 1.6!'''
'''Welcome to the OPeNDAP NcML Data Handler Module v1.4.2 for [http://www.opendap.org/download/hyrax.html Hyrax]'''


[Updated for NcML Module v1.0.0 for Hyrax 1.6.0 beta --  22 Feb 2010]
'''Note''': In the past Hyrax was distributed as a collection of separate binary packages which data providers would choose to install to build up a server with certain features. As the number of modules grew, this became more and more complex and time-consuming. As of Hyrax 1.12 we started distributing the server in three discrete packages: the DAP library; the BES daemon together with all of the most important handlers (including the NcML handler described here); and the Hyrax web services front end. In some places in this documentation you may read about 'installing the handler' or other similar text; you can safely ignore that. If you have a modern version of the server, it includes this handler.


This module may be added to a Hyrax 1.6 server to extend its data
<!--
[Updated for NcML Module v1.4.2 for Hyrax 1.6.2 Release --  15 September 2010]
 
This module may be added to a [http://www.opendap.org/download/hyrax.html Hyrax] server to extend its data
serving capability to NcML 2.2 files (see
serving capability to NcML 2.2 files (see
http://www.unidata.ucar.edu/software/netcdf/ncml/).  NcML provides support for modifying other datasets in various ways, such as adding metadata and data and aggregating multiple datasets in several ways
http://www.unidata.ucar.edu/software/netcdf/ncml/).  NcML provides support for modifying other datasets in various ways, such as adding metadata and data and aggregating multiple datasets in several ways.
 
We refer the reader to the Unidata NcML tutorial: http://www.unidata.ucar.edu/software/netcdf/ncml/v2.2/Tutorial.html which will give the basics of using NcML.  We then give a reference manual for the various elements and their attributes.  Additionally we have provided a more extensive tutorial on NcML aggregation.  Please see [[#Aggregation Tutorials]].
 
Since the DAP Grid is a common case, we will also give a simple example for adding metadata to the various parts of a Grid dataset using NcML.  Please see [[Grid_Metadata_Tutorial]].
 
== Current Release ==
 
The most recent release is 1.1.0, which is bundled with (and requires) Hyrax server 1.6.2.  To download, visit the [http://www.opendap.org/download/ncml_handler.html NcML Module Page].
 
New features in this release:
 
* joinExisting aggregation initial implementation ([[NCML_Module_Aggregation_JoinExisting]])
-->


==Features==         
==Features==         
    
    
This current version (v1.0.0) currently implements a subset of NcML 2.2
This version currently implements a subset of NcML 2.2
functionality, along with some OPeNDAP extensions:
functionality, along with some OPeNDAP extensions:


* Metadata Manipulation
* Metadata Manipulation
** Addition, Removal, and Modification of attributes to other datasets (NetCDF, HDF4, HD5, etc.) served by the same Hyrax 1.6 server
** Addition, Removal, and Modification of attributes to other datasets (NetCDF, HDF4, HDF5, etc.) served by the same Hyrax 1.6 server
** Extends NcML 2.2 to allow for common nested "attribute containers"  
** Extends NcML 2.2 to allow for common nested "attribute containers"  
** Attributes can be DAP2 types as well as the NcML types  
** Attributes can be DAP2 types as well as the NcML types  
Line 23: Line 41:
** Variables may be removed from the wrapped dataset
** Variables may be removed from the wrapped dataset
** Allows the creation of "pure virtual" datasets which do not wrap another dataset
** Allows the creation of "pure virtual" datasets which do not wrap another dataset
* Aggregations: JoinNew and Union
* Aggregations: JoinNew, JoinExisting, and Union (see [[#Aggregation Tutorials]])
** JoinNew Aggregation  
** JoinNew Aggregation ([[NCML_Module_Aggregation_JoinNew]])
*** Allows multiple datasets to be "joined" by creating a new outer dimension for the aggregated variable
*** Allows multiple datasets to be "joined" by creating a new outer dimension for the aggregated variable
*** Aggregation member datasets can be listed explicitly with explicit coordinates for the new dimension for each member
*** Aggregation member datasets can be listed explicitly with explicit coordinates for the new dimension for each member
*** Scan: Aggregations can be specified "automatically" by scanning a directory for files matching certain criteria, such as a suffix or regular expression.
*** Scan: Aggregations can be specified "automatically" by scanning a directory for files matching certain criteria, such as a suffix or regular expression.
*** Metadata may be added to the new coordinate variable for the new dimension
*** Metadata may be added to the new coordinate variable for the new dimension
** Union Aggregation
** JoinExisting Aggregation ([[NCML_Module_Aggregation_JoinExisting]])
*** The ''ncoords'' attribute can be left out of the ''joinExisting'' granules. However, this may be a slow operation, depending on the number of granules in the aggregation.
*** Scan may also be used with ''ncoords'' attribute for uniform sized granules
*** Only allows join dimension to be aggregated from granules and not overridden in NcML
** Union Aggregation ([[NCML_Module_Aggregation_Union]])
*** Merges all member datasets into one by taking the first named instance of variables and metadata from the members
*** Merges all member datasets into one by taking the first named instance of variables and metadata from the members
*** Useful for combining two or more datasets with different variables into a single set
*** Useful for combining two or more datasets with different variables into a single set
Line 38: Line 60:
= Installation Overview =
= Installation Overview =


The NcML Module requires a working Hyrax installation.  It is a module
The NcML Module requires a working Hyrax 1.6 installation.  It is a module
that is dynamically loaded into the Hyrax BES (Back End Server) to
that is dynamically loaded into the Hyrax BES (Back End Server) to
allow it to handle NcML files.   
allow it to handle NcML files.   


Please see the file INSTALL for full build and install instructions as
Please see the file INSTALL for full build and install instructions as
well as requirements.
well as requirements.
 
'''NOTE:''' After installation, you MUST restart Hyrax by restarting the BES and
OLFS so the NcML Module is loaded!
 
== Requirement: International Components for Unicode (ICU) Library ==
 
The most important external requirement is an installation of the International Components for Unicode (ICU) version 3.6 or higher (tested up to 4.2.1).  The source distributions (as well as some binaries) may be found at the site: http://site.icu-project.org/download


Once installed, the BES configuration file (bes.conf) needs to be
If you are using Linux RPMs to run Hyrax, you can get an RPM for ICU as well. Search for the RPM named "libicu" using a package manager such as yum. If you are compiling the module from source, you will also need the RPM "libicu-devel" to get the headers installed.
modified to tell it to load the module and where it was installed. A
configuration helper script (bes-ncml-data.sh) is installed with the
module to automate this process.  The script is used on the command
line (> prompt) as follows:


  > bes-ncml-data.sh [<bes.conf file to modify> [<bes modules dir>]]
If you install in the default locations, the ncml_module should find the libraries and headers. Otherwise, please consult the INSTALL file for more information about installing ICU to a non-standard location.


If you are doing a source installation, the `bes-conf' make target
== Configuration Parameters ==
runs the script while trying to select paths cleverly, and should be
called using:


  > make bes-conf
==== TempDirectory ====


After installation, you MUST restart Hyrax by restarting the BES and
The directory in which the NcML handler stores temporary data on the server's file system.
OLFS so the NcML Module is loaded.
 
Default value is '/tmp'.
 
NCML.TempDirectory=/tmp
 
==== GlobalAttributesContainerName ====
 
In DAP2, all global attributes must be held in containers. The handler's default behavior follows DAP4, where this requirement is relaxed so that any kind of attribute can be a global attribute. To support older clients that only understand DAP2, however, the handler will bundle top-level non-container attributes into a container. Use this option to set the name of that container. By default, the container is named ''NC_GLOBAL'' (because many clients look for that name), but it can be anything you choose.
 
NCML.GlobalAttributesContainerName=NC_GLOBAL
 
== Testing Installation ==


Test data is provided to see if the installation was successful.  The file sample_virtual_dataset.ncml is a dataset purely created in NcML and doesn't contain an underlying dataset.   
Test data is provided to see if the installation was successful.  The file sample_virtual_dataset.ncml is a dataset purely created in NcML and doesn't contain an underlying dataset.   
You may also view fnoc1_improved.ncml to test adding attributes to an existing netCDF dataset (fnoc1.nc), but this requires the netCDF data handler to be
You may also view fnoc1_improved.ncml to test adding attributes to an existing netCDF dataset (fnoc1.nc), but this requires the netCDF data handler to be
installed first!
installed first! Several other examples installed also use the HDF4 and HDF5 handlers.


= Functionality =
= Functionality =


This version of the NcML Module implements a subset of NcML 2.2
This version of the NcML Module implements a subset of NcML 2.2
functionality.  It currently can:
functionality.  The reader is directed to http://www.unidata.ucar.edu/software/netcdf/ncml/v2.2/ for more information on NcML.
 
Our module can currently:


*Refer only to files being served locally (not remotely)
*Refer only to files being served locally (not remotely)
Line 80: Line 116:
*Rename existing variables in a wrapped dataset
*Rename existing variables in a wrapped dataset
*Name dimensions as a mnemonic for specifying Array shapes
*Name dimensions as a mnemonic for specifying Array shapes
*[In Development] Perform union aggregations on multiple datasets, virtual or referenced
*Perform union aggregations on multiple datasets, virtual or wrapped or both
*Perform joinNew aggregations to merge a variable across multiple datasets by creating a new outer dimension
*Specify aggregation member datasets by scanning directories for files matching certain criteria


This version of the NcML module can only handle a
We describe each supported NcML element in detail below.
subset of the NcML schema --- it is restricted to a subset of the
elements: <netcdf>, <explicit>, <readMetadata>, <attribute>,
<variable>, <remove>, <dimension>, and <values>.  In addition, there are constraints on which attributes are handled for some of these elements.


== <netcdf> Element ==
== <netcdf> Element ==


For this version we assume that:
The <netcdf> element is used to define a dataset: a wrapped dataset that is to be modified, a pure virtual dataset, or a member dataset of an aggregation.  The <netcdf> element is assumed to be either the topmost node or a child of an aggregation element.


* The location attribute (''netcdf@location'') refers to a '''''local''''' dataset (served by the same Hyrax server) or is unspecified.
=== Local vs. Remote Datasets ===
* No other attributes are specified


The <netcdf> element is assumed to be the topmost node, or as a child of an aggregation element.
We assume that the location attribute (''netcdf@location'') refers to the full path (with respect to the BES data root directory) of a '''''local''''' dataset (served by the same Hyrax server).  The current version of the module cannot be used to modify remote datasets.


If ''netcdf@location'' is the empty string (or unspecified, as empty is the default), the dataset is a pure virtual dataset, fully specified within the NcML file itself.  Attributes and variables may be fully described and accessed with constraints just as normal datasets in this manner.  The installed sample datafile "sample_virtual_dataset.ncml" is an example test case for this functionality.
If ''netcdf@location'' is the empty string (or unspecified, as empty is the default), the dataset is a pure virtual dataset, fully specified within the NcML file itself.  Attributes and variables may be fully described and accessed with constraints just as normal datasets in this manner.  The installed sample datafile "sample_virtual_dataset.ncml" is an example test case for this functionality.
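
As a minimal sketch of the two cases, the first file below wraps a local dataset and adds one attribute, and the second is a pure virtual dataset with no ''netcdf@location''.  The path, attribute names, variable name, and values are hypothetical and chosen only for illustration; the location is assumed to be relative to the BES data root directory.

<pre>
<!-- Wraps a local dataset (hypothetical path) and adds a global attribute -->
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2" location="/data/ncml/fnoc1.nc">
  <attribute name="title" type="string" value="Hypothetical title added by NcML"/>
</netcdf>
</pre>

<pre>
<!-- Pure virtual dataset: no location, everything is declared in the NcML itself -->
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <attribute name="title" type="string" value="Hypothetical virtual dataset"/>
  <variable name="station_id" type="int">
    <values>42</values>
  </variable>
</netcdf>
</pre>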


We only parse the ''netcdf@location'' attribute at this point and ignore all others.  Others may be added in the future if needed, but for now they do not map to DAP in any obvious way and seem specific to netCDF data.
=== Unsupported Attributes ===
 
The current version does not support the following attributes of <netcdf>:
 
*enhance
*addRecords
*fmrcDefinition (will be supported when FMRC aggregation is added)
<!-- *ncoords (will be supported when joinExisting is added)  Commented 11/11/10 jhrg -->


== <readMetadata> Element ==
== <readMetadata> Element ==
Line 143: Line 184:
* Add new Structure variables
* Add new Structure variables
* Add new N-dimensional Arrays of simple types
* Add new N-dimensional Arrays of simple types
* Specify the coordinate variable for the new dimension in a joinNew aggregation


We describe each in turn in more detail.
We describe each in turn in more detail.
'''NB:''' ''When working with an existing variable (array or otherwise), it is not required that the variable type be specified in its NcML declaration. All that is needed is the correct name (in lexical scope). When specifying the type for an existing variable, care must be taken to ensure that the type specified in the NcML document matches the type of the existing variable. In particular, variables that are arrays must be called array, and not the type of the template primitive.''


=== Specifying Lexical Scope with <variable type=""> ===
=== Specifying Lexical Scope with <variable type=""> ===
Line 322: Line 367:
* It is an error to specify a named dimension which does not exist in the current <netcdf> scope.
* It is an error to specify a named dimension which does not exist in the current <netcdf> scope.
* It is an error to specify an Array whose flattened size (product of dimensions) is > 2^31-1.
* It is an error to specify an Array whose flattened size (product of dimensions) is > 2^31-1.
=== Specifying the new coordinate variable for a joinNew aggregation ===
In the special case of a joinNew aggregation, the new coordinate variable may be specified with the <variable> element.  The new coordinate variable is ''defined'' to have the same name as the new dimension.  This allows for several things:
*Explicit specification of the variable type and coordinates for the new dimension
*Specification of the metadata for the new coordinate variable
In the first case, the author can specify explicitly the type of the new coordinate variable and the actual values for each dataset.  In this case, the variable ''must'' be specified ''after'' the aggregation element in the file so the new dimension's size (number of member datasets) may be known and error checking performed.  Metadata can also be added to the variable here.
In the second case, the author may just specify the variable name, which allows one to specify the metadata for a coordinate variable that is automatically generated by the aggregation itself.  This is the only allowable case for a variable element to ''not'' contain a values element!  Coordinate variables are generated automatically in two cases:
*The author has specified an explicit list of member datasets, with or without explicit coordVal attributes. 
*The author has used a <scan> element to specify the member datasets via a directory scan
In this case, the <variable> element may come before or after the <aggregation>.
'''Parse Errors:'''
*If an explicit variable is declared for the new coordinate variable:
** And it contains explicit values, the number of values must be equal to the number of member datasets in the aggregation.
** It must be specified ''after'' the <aggregation> element
* If a numeric coordVal is used to specify the first member dataset's coordinate, then ''all'' datasets must contain a numerical coordinate.
* An error is thrown if the specified aggregation variable (variableAgg) is not found in ''all'' member datasets.
* An error is thrown if the specified aggregation variable is not of the same type in ''all'' member datasets.  Coercion is ''not'' performed!
* An error is thrown if the specified aggregation variables in all member datasets do not have the same shape
* An error is thrown if an explicit coordinate variable is specified with a shape that is ''not'' the same as the new dimension name (and the variable name itself).
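
The explicit case can be sketched as follows, assuming two hypothetical member files that each contain a variable named ''temperature''.  The dataset paths, the dimension name ''day'', and the ''units'' value are illustrative assumptions, not taken from the installed test data.

<pre>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <aggregation type="joinNew" dimName="day">
    <variableAgg name="temperature"/>
    <!-- Two member datasets, so the new "day" dimension has size 2 -->
    <netcdf location="/data/ncml/agg/day1.nc"/>
    <netcdf location="/data/ncml/agg/day2.nc"/>
  </aggregation>
  <!-- Explicit coordinate variable: same name as the new dimension, declared
       AFTER the aggregation, with exactly one value per member dataset -->
  <variable name="day" type="int" shape="day">
    <attribute name="units" type="string" value="days since 2000-01-01"/>
    <values>1 2</values>
  </variable>
</netcdf>
</pre>

If only metadata is to be added to an autogenerated coordinate variable, the same <variable name="day"> element would be written without ''type'', ''shape'', or <values>, keeping just the <attribute> child.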


== <values> Element ==
== <values> Element ==
Line 434: Line 511:
=== DAP ''OtherXML'' Extension ===
=== DAP ''OtherXML'' Extension ===


The trunk version of the module now handles attributes of the new DAP type "OtherXML".  This allows the NCML file author to inject arbitrary well-formed XML into an attribute for clients that want XML metadata rather than just string or url.  Internally, the attribute is still a string (and in a DAP DAS response will be quoted inside one string).  However, since it is XML, the NCMLParser still parses it and checks it for well-formedness (but NOT against schemas).  This extension allows the NCMLParser to parse the arbitrary XML within the given attribute without causing errors, since it can be any XML.   
The module now allows specification of attributes of the new DAP type "OtherXML".  This allows the NCML file author to inject arbitrary well-formed XML into an attribute for clients that want XML metadata rather than just string or url.  Internally, the attribute is still a string (and in a DAP DAS response will be quoted inside one string).  However, since it is XML, the NCMLParser still parses it and checks it for well-formedness (but NOT against schemas).  This extension allows the NCMLParser to parse the arbitrary XML within the given attribute without causing errors, since it can be any XML.   
 
The injected XML is most useful in the DDX response, where it shows up directly in the response as XML.  XSLT and other clients can then parse it.


==== Errors ====
==== Errors ====


The XML '''must''' be in the content of the <attribute type="OtherXML"> element.  It is a parser error for ''attribute@value'' to be set if ''attribute@type'' is "OtherXML".   
*The XML '''must''' be in the content of the <attribute type="OtherXML"> element.  It is a parser error for ''attribute@value'' to be set if ''attribute@type'' is "OtherXML".   
 
*The XML must also be well-formed since it is parsed.  A parse error will be thrown if the OtherXML is malformed.
The XML must also be well-formed since it is parsed.  A parse error will be thrown if the OtherXML is malformed.


==== Example ====
==== Example ====
Line 472: Line 550:
</pre>
</pre>


This creates an attribute which is essentially a string, but it is checked for XML well-formedness in addition.
'''TODO''' Put the DDX response for the above in here!


Furthermore, the version as of changeset [21648] will make the chunk of OtherXML namespace-valid.  This means any namespaces specified in parent NCML elements of the OtherXML tree will be "brought down" and added to the ROOT OtherXML elements.  The algorithm doesn't just bring used prefixes, but brings ALL of the lexically scoped closest namespaces in all ancestors.  In other words, it adds unique namespaces (as determined by prefix) in order from the root of the OtherXML tree as it traverses to the root of the NCML document.
====Namespace Closure====
 
Furthermore, the parser will make the chunk of OtherXML "namespace closed".  This means any namespaces specified in parent NCML elements of the OtherXML tree will be "brought down" and added to the ''root'' OtherXML elements so that the subtree may be pulled out and added to the DDX and still have its namespaces.  The algorithm doesn't just bring used prefixes, but brings ''all'' of the lexically scoped closest namespaces in all ancestors.  In other words, it adds unique namespaces (as determined by prefix) in order from the root of the OtherXML tree as it traverses to the root of the NCML document.
 
Namespace closure is syntactic sugar that simplifies the author's task: they can specify the namespaces just once at the top of the NCML file and expect that when the subtree of XML is added to the DDX, these namespaces will come along with it.  Otherwise they would have to add the namespaces explicitly to each attribute.
 
'''TODO''' Add an example!
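
As a minimal sketch of namespace closure, assume a hypothetical ''gml'' prefix declared once on the top-level <netcdf> element and used inside an OtherXML attribute.  The attribute name and the XML payload are illustrative only.

<pre>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
        xmlns:gml="http://www.opengis.net/gml">
  <!-- The gml prefix is declared on the ancestor, not inside the attribute -->
  <attribute name="SomeLocation" type="OtherXML">
    <gml:Point>
      <gml:pos>45.0 -120.0</gml:pos>
    </gml:Point>
  </attribute>
</netcdf>
</pre>

Under the behavior described above, the ''xmlns:gml'' declaration from the enclosing <netcdf> element would be carried down onto the root ''gml:Point'' element when the subtree is placed in the DDX, so the author does not need to repeat it inside the attribute.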


== <remove> Element ==
== <remove> Element ==
Line 497: Line 581:
It also can be used to remove variables from existing datasets:
It also can be used to remove variables from existing datasets:


</pre>
<pre>
   <remove name="SomeExistingVariable" type="variable"/>
   <remove name="SomeExistingVariable" type="variable"/>
</pre>
</pre>
Line 508: Line 592:
== <aggregation> Element ==
== <aggregation> Element ==


['''NOTE: Aggregation support is currently in development. This documentation is only here to serve as a reference until release.''']
Aggregation involves combining multiple datasets (<netcdf>) into a virtual "single" dataset in various ways.  For a tutorial on aggregation in NcML 2.2, the reader is referred to the Unidata page: http://www.unidata.ucar.edu/software/netcdf/ncml/v2.2/Aggregation.html
 
NcML 2.2 supports multiple types of aggregation: union, joinNew, joinExisting, and fmrc (forecast model run collection). 
 
The current version of the NcML module supports three of these aggregations:
 
*Union [[NCML_Module_Aggregation_Union]]
*JoinNew [[NCML_Module_Aggregation_JoinNew]]
*JoinExisting [[NCML_Module_Aggregation_JoinExisting]]
 
A ''union'' aggregation specifies that the first instance of a variable or attribute (by name) that is found in the ordered list of datasets will be the one in the output aggregation.  This is useful for combining two dataset files, each of which may contain a single variable, into a composite dataset with both variables.


Any netcdf element may have one aggregation element containing some number of child netcdf elements.  We will call the data in a netcdf element a ''dataset''.  For a tutorial on NcML 2.2 Aggregation, please see http://www.unidata.ucar.edu/software/netcdf/ncml/v2.2/Aggregation.html.
A JoinNew aggregation joins a variable which exists in multiple datasets (usually samples of a datum over time) into a new variable containing the data from ''all'' member datasets by creating a new outer dimension.  The ''i''th component in the new outer dimension is the variable's data from the ''i''th member dataset.  It also adds a new coordinate variable whose name is the new dimension's name and whose shape (length) is the new dimension as well.  This new coordinate variable may be explicitly given by the author or may be autogenerated in one of several ways.
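
A union aggregation of two explicitly listed datasets might look like the following sketch.  The file paths are hypothetical; each file is assumed to hold a different variable.

<pre>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <aggregation type="union">
    <!-- Datasets are merged in order; the first instance of any name wins -->
    <netcdf location="/data/ncml/agg/temperature.nc"/>
    <netcdf location="/data/ncml/agg/salinity.nc"/>
  </aggregation>
</netcdf>
</pre>

If both files define an attribute or variable with the same name, the one from the first listed file is the one that appears in the output.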


The trunk version of the NcML Module now partially supports the "union" type aggregation for explicitly listed datasets (netcdf elements).  The scan element will be provided in a future release.
== <scan> Element ==


For the reference documentation on union, please see [[NCML_Module_Aggregation_Union]].
The scan element can be used within an aggregation context to allow a directory to be searched in various ways in order to specify the members of an aggregation.  This allows a static NcML file to refer to an aggregation which may change over time, such as where a new data file is generated each day.
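
As a small sketch, the following joinNew aggregation uses a <scan> element in place of an explicit list of member datasets.  The directory, suffix, dimension name, and variable name are hypothetical.

<pre>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <aggregation type="joinNew" dimName="day">
    <variableAgg name="temperature"/>
    <!-- Members are every file in the (hypothetical) directory ending in .nc -->
    <scan location="/data/ncml/agg/" suffix=".nc"/>
  </aggregation>
</netcdf>
</pre>

Because the directory is read when the NcML file is processed, files added to the directory later become members of the aggregation without editing the NcML.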


== Future Additions ==
<font size="3">'''[[NCML_Module_Aggregation_JoinNew | We describe usage of the <scan> element in detail in the joinNew aggregation tutorial here.]]'''</font>.
We plan to add more features and NcML functionality in the future.  The next feature will be the ability to aggregate data. We are also considering allowing remote datasets to be referenced in a ''netcdf@location''.


= Errors =
= Errors =
Line 557: Line 648:
* remove specified a non-existing attribute name
* remove specified a non-existing attribute name


= Grid Metadata Tutorial =
Please see the page [[Grid_Metadata_Tutorial]] for an example of adding metadata to the various parts of a DAP Grid variable.
= Aggregation Tutorials =
The NcML module may also be used to aggregate multiple datasets into one virtual dataset. 
We currently support three of the NcML aggregations:
* union
* joinNew
* joinExisting
Please see the individual pages for each aggregation type for tutorials on their respective application and use.
=== [[NCML_Module_Aggregation_Union|Union]] ===
[[NCML_Module_Aggregation_Union|Union Aggregation]] - Combine multiple datasets into one by merging variables together, selecting the first of each unique name.
=== [[NCML_Module_Aggregation_JoinNew|JoinNew]] ===
[[NCML_Module_Aggregation_JoinNew|JoinNew Aggregation]] - Combine variables across multiple datasets by creating a new outer dimension and coordinate variable for each of the sample datasets.
=== [[NCML_Module_Aggregation_JoinExisting|JoinExisting]] ===
[[NCML_Module_Aggregation_JoinExisting|JoinExisting Aggregation]] - Combine variables with a common named outer dimension along that dimension by concatenating data for that dimension.


= Additions/Changes to NcML 2.2 =
= Additions/Changes to NcML 2.2 =
Line 611: Line 724:
== Aggregation Element Location and Processing Order Differences ==
== Aggregation Element Location and Processing Order Differences ==


Since we're using a SAX parser (and for reasons of clarity and power), we don't enforce that the aggregation element come last in a <netcdf> element, as the NcML 2.2 schema specifies in terms of the  xsd:sequence.  The notes about aggregation at the Unidata site specify that the aggregation must be processed FIRST, even though it is specified last in the <netcdf> element.  As discussed in the aggregation section of this document, we allow the other elements before or after the aggregation, and this context changes which items end up in the output.  This allows an author to specify attributes prior to an aggregation that will "override" the aggregation versions (in the union sense).  It also means that '''any transformation elements desired to be applied to members of the aggregation must come after the aggregation'''.  This reflects our in-order processing of the aggregation.  We believe that this extension doesn't restrict parsers to a DOM model and also adds more specificity to how metadata transformations are applied to aggregated data.
NcML specifies that if a dataset (<netcdf> element) specifies an aggregation element, the aggregation element is always processed first, regardless of its ordering within the <netcdf> element.  Our parser, since it is SAX and not DOM, modifies this behavior in that order matters in some cases:


= NcML Examples =
* Metadata (<attribute>) elements specified ''prior'' to an aggregation "shadow" the aggregation versions.  This is useful for "overriding" an attribute or variable in a union aggregation, where the first found will take precedence.
* JoinNew: If the new coordinate variable's data is to be set explicitly by specifying the new dimension's shape (either with explicit data or the autogenerated data using values@start and values@increment attributes), the <variable> ''must'' come after the aggregation since the size of the dimension is unknown until the aggregation element is processed.
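
The first case can be sketched as follows, assuming both hypothetical member files carry their own global ''title'' attribute.  The paths and attribute values are illustrative only.

<pre>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <!-- Declared BEFORE the aggregation, so it shadows any "title" found in the members -->
  <attribute name="title" type="string" value="Combined temperature and salinity"/>
  <aggregation type="union">
    <netcdf location="/data/ncml/agg/temperature.nc"/>
    <netcdf location="/data/ncml/agg/salinity.nc"/>
  </aggregation>
</netcdf>
</pre>

Because the parser processes elements in document order, the ''title'' declared here is the first one found and therefore takes precedence in the union.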


Example NcML files used by the test-suite can be found in the data
= Backward Compatibility Issues =
directory of a src distribution.


''TODO Put a few here to give an idea of functionality...''
Due to the way shared dimensions were implemented in the NetCDF, HDF4, and HDF5 handlers, the DAS responses did not follow the DAP2 specification.  The NcML module, on the other hand, generates DAP2 compliant DAS for these datasets, which means that wrapping some datasets in NcML will generate a DAS with a different structure.  This is important for the NcML author since it changes the names of attributes and variables. In order for the module to find the correct scope for adding metadata, for example, the DAP2 DAS must be used.


In general, what this means is that an empty "passthrough" NcML file should be the starting point for authoring an NcML file.  This file would just specify a dataset and nothing else:


= HDF4 DAS Compatibility Issue =
<pre>
<netcdf location="/data/ncml/myNetcdf.nc"/>
</pre>


There is a bug in the Hyrax HDF4 Module such that the DAS produced is
The author would then request the DAS response for the NCML file and use that as the starting point for modifications to the original dataset.  
incorrect.  If an NcML file is used to "wrap" an HDF4 dataset, the
correct DAP2 DAS response will be generated, however.  


This is important for those writing NcML for HDF4 data since the
More explicit examples are given below.
lexical scope for attributes relies on the correct DAS form --- to
handle this, the user should start with a "passthrough" NcML file and
use the DAS from that as the starting point for knowing the structure
the NcML handler expects to see in the NcML file.  Alternatively, the DDX has the
proper attribute structure as well (the DAS is generated from it).


= NetCDF Compatibility Issue =
== NetCDF ==


The NetCDF handler represents some NC datasets as a DAP 2 Grid, but the returned DAS is not consistent with the DAP 2 spec for the attribute hierarchy for such a Grid.  The map vector attributes are placed as siblings of the grid attributes rather than within the grid lexical scope.  For example, here's the NetCDF Handler DDS for a given file:
The NetCDF handler represents some NC datasets as a DAP 2 Grid, but the returned DAS is not consistent with the DAP 2 spec for the attribute hierarchy for such a Grid.  The map vector attributes are placed as siblings of the grid attributes rather than within the grid lexical scope.  For example, here's the NetCDF Handler DDS for a given file:
Line 781: Line 890:


This clearly shows that the structure of the Grid must be used in the NcML:  the attribute being added is technically "cldc.lat.Description" in a fully qualified name.  The parser would return an error if it was attempted as "lat.Description" as the NetCDF DAS for the original file would have led one to believe.
This clearly shows that the structure of the Grid must be used in the NcML:  the attribute being added is technically "cldc.lat.Description" in a fully qualified name.  The parser would return an error if it was attempted as "lat.Description" as the NetCDF DAS for the original file would have led one to believe.
== HDF4/HDF5 ==
Similarly to the NetCDF case, the Hyrax HDF4 Module produces DAS responses that do not respect the DAP2 specification.  If an NcML file is used to "wrap" an HDF4 dataset, the
correct DAP2 DAS response will be generated, however.
This is important for those writing NcML for HDF4 data since the
lexical scope for attributes relies on the correct DAS form --- to
handle this, the user should start with a "passthrough" NcML file (see the above NetCDF example) and
use the DAS from that as the starting point for knowing the structure
the NcML handler expects to see in the NcML file.  Alternatively, the DDX has the
proper attribute structure as well (the DAS is generated from it).






* New NcML Aggregations
** JoinExisting
*** Joins a variable across multiple datasets by appending the data for a given dimension from each dataset
*** Will also allow directory scans for specifying the aggregation
** Forecast Model Run Collection (FMRC)
*** Special case of JoinNew for forecast data with two time variables

Latest revision as of 18:06, 23 July 2016

Introduction

Welcome to the OPeNDAP NcML Data Handler Module v1.4.2 for Hyrax

Note: In the past Hyrax was distributed as a collection of separate binary packages which data providers would choose to install to build up a server with certain features. As the number of modules grew, this became more and more complex and time consuming. As of Hyrax 1.12 we started distributing the server in three discrete packages: the DAP library; the BES daemon together with all of the most important handlers (including the NcML handler described here); and the Hyrax web services front end. In some places in this documentation you may read about 'installing the handler' or other similar text, and can safely ignore that. If you have a modern version of the server, it includes this handler.


Features

This version currently implements a subset of NcML 2.2 functionality, along with some OPeNDAP extensions:

  • Metadata Manipulation
    • Addition, Removal, and Modification of attributes to other datasets (NetCDF, HDF4, HDF5, etc.) served by the same Hyrax 1.6 server
    • Extends NcML 2.2 to allow for common nested "attribute containers"
    • Attributes can be DAP2 types as well as the NcML types
    • Attributes can be of the special "OtherXML" type for injecting arbitrary XML into a DDX response
  • Data Manipulation
    • Addition of new data variables (scalars or arrays of basic types as well as structures)
    • Variables may be removed from the wrapped dataset
    • Allows the creation of "pure virtual" datasets which do not wrap another dataset
  • Aggregations: JoinNew, JoinExisting, and Union (see Aggregation Tutorials)
    • JoinNew Aggregation (NCML_Module_Aggregation_JoinNew)
      • Allows multiple datasets to be "joined" by creating a new outer dimension for the aggregated variable
      • Aggregation member datasets can be listed explicitly with explicit coordinates for the new dimension for each member
      • Scan: Aggregations can be specified "automatically" by scanning a directory for files matching certain criteria, such as a suffix or regular expression.
      • Metadata may be added to the new coordinate variable for the new dimension
    • JoinExisting Aggregation (NCML_Module_Aggregation_JoinExisting)
      • The ncoords attribute can be left out of the joinExisting granules. However, this may be a slow operation, depending on the number of granules in the aggregation.
      • Scan may also be used with ncoords attribute for uniform sized granules
      • Only allows join dimension to be aggregated from granules and not overridden in NcML
    • Union Aggregation (NCML_Module_Aggregation_Union)
      • Merges all member datasets into one by taking the first named instance of variables and metadata from the members
      • Useful for combining two or more datasets with different variables into a single set

Installation from Source

For information on how to build and install the NcML Data Module, please see the INSTALL file that came with the source distribution.

Installation Overview

The NcML Module requires a working Hyrax 1.6 installation. It is a module that is dynamically loaded into the Hyrax BES (Back End Server) to allow it to handle NcML files.

Please see the file INSTALL for full build and install instructions as well as requirements.

NOTE: After installation, you MUST restart Hyrax by restarting the BES and OLFS so the NcML Module is loaded!

Requirement: International Components for Unicode (ICU) Library

The most important external requirement is an installation of the International Components for Unicode (ICU) version 3.6 or higher (tested up to 4.2.1). The source distributions (as well as some binaries) may be found at the site: http://site.icu-project.org/download

If you are using Linux RPMs to run Hyrax, you can get an RPM for ICU as well. Search for the RPM named "libicu" using a package manager such as yum. If you are compiling the module from source, you will also need the RPM "libicu-devel" to get the headers installed.

If you install in the default locations, the ncml_module should find the libraries and headers. Otherwise, please consult the INSTALL file for more information about installing ICU to a non-standard location.

Configuration Parameters

TempDirectory

The directory on the server's file system where the NcML handler stores temporary data.

Default value is '/tmp'.

NCML.TempDirectory=/tmp

GlobalAttributesContainerName

In DAP2 all global attributes must be held in containers. The default behavior of the handler is set for DAP4, where this requirement is relaxed so that any kind of attribute can be a global attribute. To support older clients that only understand DAP2, however, the handler will bundle top-level non-container attributes into a container. Use this option to set the name of that container. By default, the container is named NC_GLOBAL (because lots of clients look for that name), but it can be anything you choose.

NCML.GlobalAttributesContainerName=NC_GLOBAL

Testing Installation

Test data is provided to see if the installation was successful. The file sample_virtual_dataset.ncml is a dataset purely created in NcML and doesn't contain an underlying dataset. You may also view fnoc1_improved.ncml to test adding attributes to an existing netCDF dataset (fnoc1.nc), but this requires the netCDF data handler to be installed first! Several other examples installed also use the HDF4 and HDF5 handlers.

Functionality

This version of the NcML Module implements a subset of NcML 2.2 functionality. The reader is directed to http://www.unidata.ucar.edu/software/netcdf/ncml/v2.2/ for more information on NcML.

Our module can currently:

  • Refer only to files being served locally (not remotely)
  • Add, modify, and remove attribute metadata to a dataset
  • Create a purely virtual dataset using just NcML and no underlying dataset
  • Create new scalar variables of any simple NcML type or simple DAP type
  • Create new Structure variables (which can contain new child variables)
  • Create new N-dimensional arrays of simple types (NcML or DAP)
  • Remove existing variables from a wrapped dataset
  • Rename existing variables in a wrapped dataset
  • Name dimensions as a mnemonic for specifying Array shapes
  • Perform union aggregations on multiple datasets, virtual or wrapped or both
  • Perform joinNew aggregations to merge a variable across multiple datasets by creating a new outer dimension
  • Specify aggregation member datasets by scanning directories for files matching certain criteria

We describe each supported NcML element in detail below.

<netcdf> Element

The <netcdf> element is used to define a dataset, either a wrapped dataset that is to be modified, a pure virtual dataset, or a member dataset of an aggregation. The <netcdf> element is assumed to be either the topmost node or a child of an aggregation element.

Local vs. Remote Datasets

We assume that the location attribute (netcdf@location) refers to the full path (with respect to the BES data root directory) of a local dataset (served by the same Hyrax server). The current version of the module cannot be used to modify remote datasets.

If netcdf@location is the empty string (or unspecified, as empty is the default), the dataset is a pure virtual dataset, fully specified within the NcML file itself. Attributes and variables may be fully described and accessed with constraints just as normal datasets in this manner. The installed sample datafile "sample_virtual_dataset.ncml" is an example test case for this functionality.
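As a minimal sketch of the wrapped case, the following NcML file refers to a local dataset and adds one attribute to it. The path /data/nc/fnoc1.nc and the attribute name and value are hypothetical; substitute a dataset that your server actually serves under its BES data root. Leaving out the location attribute and declaring variables directly in the file would make it a pure virtual dataset instead.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical path, interpreted relative to the BES data root. -->
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
        location="/data/nc/fnoc1.nc">
  <!-- Illustrative attribute added to the wrapped dataset's metadata. -->
  <attribute name="Provider" type="string" value="Example Data Center"/>
</netcdf>
```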

Unsupported Attributes

The current version does not support the following attributes of <netcdf>:

  • enhance
  • addRecords
  • fmrcDefinition (will be supported when FMRC aggregation is added)

<readMetadata> Element

The <readMetadata/> element is the default, so is effectively not needed.

<explicit> Element

The <explicit/> element simply clears all attribute tables in the dataset referred to by netcdf@location before applying the rest of the NcML transformations to the metadata.
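A minimal sketch of its use, assuming a hypothetical dataset path and attribute. The <explicit/> element is placed first inside <netcdf>, as in the NcML 2.2 schema, so that the existing metadata is cleared before the new attribute is added.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical path; the attribute name and value are illustrative only. -->
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
        location="/data/nc/fnoc1.nc">
  <!-- Clear every attribute table read from the wrapped dataset. -->
  <explicit/>
  <!-- Only metadata declared from here on appears in the response. -->
  <attribute name="title" type="string" value="Metadata defined entirely in NcML"/>
</netcdf>
```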

<dimension> Element

The <dimension> element has limited functionality in this release since DAP2 doesn't support dimensions as more than mnemonics at this time. The limitations are:

  • We only parse the dimension@name and dimension@length attributes.
  • Dimensions can only be specified as a direct child of a <netcdf> element prior to any reference to them

For example:

<netcdf> 
  <dimension name="station" length="2"/>
  <dimension name="samples" length="5"/>
  <!-- Some variable elements refer to the dimensions here -->
</netcdf>

The dimension element sets up a mapping from the name to the unsigned integer length and can be used in a variable@shape to specify a length for an array dimension (see the section on <variable> below). The dimension map is cleared when </netcdf> is encountered (this has no practical effect when the file contains a single <netcdf> element, but it may matter for aggregations). We also do not support <group>, which is the only other legal place in NcML 2.2 for a dimension element.

Parse Errors:

  • If the name and length are not both specified.
  • If the dimension name already exists in the current scope
  • If the length is not an unsigned integer
  • If any of the other attributes specified in NcML 2.2 are used. We do not handle them, so we consider them errors now.

<variable> Element

The <variable> element is used to:

  • Provide lexical scope for a contained <attribute> or <variable> element
  • Rename existing variables
  • Add new scalar variables of simple types
  • Add new Structure variables
  • Add new N-dimensional Arrays of simple types
  • Specify the coordinate variable for the new dimension in a joinNew aggregation

We describe each in turn in more detail.

NB: When working with an existing variable (array or otherwise), it is not required that the variable type be specified in its NcML declaration. All that is needed is the correct name (in lexical scope). When specifying the type for an existing variable, care must be taken to ensure that the type specified in the NcML document matches the type of the existing variable. In particular, variables that are arrays must be given the type array, not the type of the array's template primitive.


Specifying Lexical Scope with <variable type="">

Consider the following example:

  <variable name="u">
    <attribute name="Metadata" type="string">This is metadata!</attribute>
  </variable>

This code assumes that a variable named "u" exists (of any type since we do not specify) and provides the lexical scope for the attribute "Metadata", which will be added or modified within the attribute table for the variable "u" (its qualified name would be "u.Metadata").

Nested DAP Structure and Grid Scopes

Scoping variable elements may be nested if the containing variable is a Structure (this includes the special case of Grid).

  <variable name="DATA_GRANULE" type="Structure">
    <variable name="PlanetaryGrid" type="Structure">
      <variable name="percipitate">
        <attribute name="units" type="String" value="inches"/>
      </variable>
    </variable>
  </variable>

This adds a "units" attribute to the variable "percipitate" within the nested Structures ("DATA_GRANULE.PlanetaryGrid.percipitate" as its fully qualified name). Note that we must refer to the type explicitly as a "Structure" so the parser knows to traverse the tree.

Note the variable might be of type Grid, but the type "Structure" must be used in the NcML to traverse it.
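As a sketch of the Grid case, suppose the wrapped dataset contains a Grid named "cldc" with a map vector named "lat" (both names, and the attribute value, are used here purely for illustration). The attribute's fully qualified name is then "cldc.lat.Description", so the NcML must descend through the Grid using type "Structure":

```xml
<!-- The Grid is traversed as a Structure, even though its DAP type is Grid. -->
<variable name="cldc" type="Structure">
  <!-- The map vector inside the Grid; no type is needed for an existing variable. -->
  <variable name="lat">
    <attribute name="Description" type="string" value="Latitude of the grid cell center"/>
  </variable>
</variable>
```

Declaring the attribute under a top-level "lat" variable instead would refer to a different scope and would not reach the Grid's map vector.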

Adding Multiple Attributes to the Same Variable

Once the variable's scope is set by the opening <variable> element, more than one attribute can be specified within it. This will make the NcML more readable and also will make the parsing more efficient since the variable will only need to be looked up once.

For example,

<variable name="Foo">
   <attribute name="Attr_1" type="string" value="Hello"/>
   <attribute name="Attr_2" type="string" value="World!"/>
</variable>

should be preferred over:

<variable name="Foo">
   <attribute name="Attr_1" type="string" value="Hello"/>
</variable>

<variable name="Foo">
   <attribute name="Attr_2" type="string" value="World!"/>
</variable>

although they produce the same result. Any number of attributes can be specified before the variable is closed.

Renaming Existing Variables

The attribute variable@orgName is used to rename an existing variable.

For example:

<variable name="NewName" orgName="OldName"/>

will rename an existing variable at the current scope named "OldName" to "NewName". After this point in the NcML file (such as in constraints specified for the DAP request), the variable is known by "NewName".

Note that the type is not required here --- the variable is assumed to exist and its existing type is used. It is not possible to change the type of an existing variable at this time!

Parse Errors:

  • If a variable with variable@orgName doesn't exist in the current scope
  • If the new name variable@name is already taken in the current scope
  • If a new variable is created but does not have exactly one values element

Adding a New Scalar Variable

The <variable> element can be used to create a new scalar variable of a simple type (i.e. an atomic NcML type such as "int" or "float", or any DAP atomic type, such as "UInt32" or "URL") by specifying an empty variable@shape (which is the default), a simple type for variable@type, and a contained <values> element with the one value of correct type.

For example:

<variable name="TheAnswerToLifeTheUniverseAndEverything" type="double">
  <attribute name="SolvedBy" type="String" value="Deep Thought"/>
  <values>42.000</values>
</variable>

will create a new variable named "TheAnswerToLifeTheUniverseAndEverything" at the current scope. It has no shape so will be a scalar of type "double" and will have the value 42.0.

Parse Errors:

  • It is a parse error to not specify a <values> element with exactly one proper value of the variable type.
  • It is a parse error to specify a malformed or out of bounds value for the data type

Adding a New Structure Variable

A new Structure variable can be specified at the global scope or within another Structure. It is illegal for an array to have type structure, so the shape must be empty.

For example:

<variable name="MyNewStructure" type="Structure">
  <attribute name="MetaData" type="String" value="This is metadata!"/>
  <variable name="ContainedScalar1" type="String"><values>I live in a new structure!</values></variable>
  <variable name="ContainedInt1" type="int"><values>42</values></variable>
</variable>

specifies a new structure called "MyNewStructure" which contains two scalar variable fields "ContainedScalar1" and "ContainedInt1".

Nested structures are allowed as well.

Parse Errors:

  • If another variable or attribute exists at the current scope with the new name.
  • If a <values> element is specified as a direct child of a new Structure --- structures cannot contain values, only attributes and other variables.
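A nested structure can be sketched as follows. All names and values are hypothetical. Each Structure holds only attributes and child variables, while the <values> elements appear only on the scalar leaves, in line with the rules above.

```xml
<variable name="Station" type="Structure">
  <attribute name="Comment" type="String" value="Illustrative nested structure"/>
  <!-- A Structure nested inside another new Structure. -->
  <variable name="Location" type="Structure">
    <variable name="Latitude" type="double"><values>41.49</values></variable>
    <variable name="Longitude" type="double"><values>-71.31</values></variable>
  </variable>
  <variable name="Name" type="String"><values>Station 1</values></variable>
</variable>
```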

Adding a New N-dimensional Array

An N-dimensional array of a simple type may be created virtually as well by specifying a non-empty variable@shape. The shape contains the array dimensions in left-to-right order of slowest varying dimension first. For example:

<variable name="FloatArray" type="float" shape="2 5">
  <!-- Values are specified in row-major order (the leftmost dimension in shape varies slowest).
       Any whitespace is a valid separator by default, so we can use newlines to pretty-print 2D matrices. -->
  <values>
    0.1 0.2 0.3 0.4 0.5
    1.1 1.1 1.3 1.4 1.5
  </values>
</variable>

will specify a 2x5 array of float values called "FloatArray". The <values> element must contain 2x5=10 values in row-major order (slowest varying dimension first). Since whitespace is the default separator, we use a newline to show the dimension boundary for the values, which is easy to see for a 2D matrix such as this.

A dimension name may also be used to refer mnemonically to a length. The DAP response will use this mnemonic in its output, but it is not currently used for shared dimensions, only as a mnemonic. See the section on the <dimension> element for more information. For example:

<netcdf>
  <dimension name="station" length="2"/>
  <dimension name="samples" length="5"/>
  <variable name="FloatArray" type="float" shape="station samples">
    <values>
      0.1 0.2 0.3 0.4 0.5
      1.1 1.1 1.3 1.4 1.5
    </values>
  </variable>
</netcdf>

will produce the same 2x5 array, but will incorporate the dimension mnemonics into the response. For example, here's the DDS response:

Dataset {
     Float32 FloatArray[station = 2][samples = 5];
} sample_virtual_dataset.ncml;

Note that the <values> element respects the values@separator attribute if whitespace isn't a suitable separator. This is very useful for arrays of strings that contain whitespace, for example.

<variable name="StringArray" type="string" shape="3">
  <values separator="*">String 1*String 2*String 3</values>
</variable>

creates a length 3 array of string StringArray = {"String 1", "String 2", "String 3"}.


Parse Errors:

  • It is an error to specify the incorrect number of values
  • It is an error if any value is malformed or out of range for the data type.
  • It is an error to specify a named dimension which does not exist in the current <netcdf> scope.
  • It is an error to specify an Array whose flattened size (product of dimensions) is > 2^31-1.

Specifying the new coordinate variable for a joinNew aggregation

In the special case of a joinNew aggregation, the new coordinate variable may be specified with the <variable> element. The new coordinate variable is defined to have the same name as the new dimension. This allows for several things:

  • Explicit specification of the variable type and coordinates for the new dimension
  • Specification of the metadata for the new coordinate variable

In the first case, the author can specify explicitly the type of the new coordinate variable and the actual values for each dataset. In this case, the variable must be specified after the aggregation element in the file so the new dimension's size (number of member datasets) may be known and error checking performed. Metadata can also be added to the variable here.

In the second case, the author may just specify the variable name, which allows one to specify the metadata for a coordinate variable that is automatically generated by the aggregation itself. This is the only allowable case for a variable element to not contain a values element! Coordinate variables are generated automatically in two cases:

  • The author has specified an explicit list of member datasets, with or without explicit coordValue attributes.
  • The author has used a <scan> element to specify the member datasets via a directory scan

In this case, the <variable> element may come before or after the <aggregation>.

Parse Errors:

  • If an explicit variable is declared for the new coordinate variable:
    • And it contains explicit values, the number of values must be equal to the number of member datasets in the aggregation.
    • It must be specified after the <aggregation> element
  • If a numeric coordValue is used to specify the first member dataset's coordinate, then all datasets must contain a numerical coordinate.
  • An error is thrown if the specified aggregation variable (variableAgg) is not found in all member datasets.
  • An error is thrown if the specified aggregation variable is not of the same type in all member datasets. Coercion is not performed!
  • An error is thrown if the specified aggregation variables in all member datasets do not have the same shape
  • An error is thrown if an explicit coordinate variable is specified with a shape that is not the same as the new dimension name (and the variable name itself).
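The explicit case can be sketched as follows. The dataset paths, the aggregated variable "temperature", the dimension name "day", and the units string are all hypothetical. The coordinate variable is declared after the <aggregation> element, shares its name with the new dimension, uses that dimension as its shape, and supplies one value per member dataset.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <aggregation type="joinNew" dimName="day">
    <!-- The variable to join; it must exist with the same type and shape in every member. -->
    <variableAgg name="temperature"/>
    <netcdf location="/data/daily/day1.nc"/>
    <netcdf location="/data/daily/day2.nc"/>
    <netcdf location="/data/daily/day3.nc"/>
  </aggregation>
  <!-- Explicit coordinate variable: three members, so exactly three values. -->
  <variable name="day" type="int" shape="day">
    <attribute name="units" type="string" value="days since 2000-01-01"/>
    <values>1 2 3</values>
  </variable>
</netcdf>
```

To add metadata to an automatically generated coordinate variable instead, the same <variable name="day"> element would be written without type, shape, or <values>, containing only the <attribute> elements.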

<values> Element

The <values> element can only be used in the context of a new variable of scalar or array type. We cannot change the values for existing variables in this version of the handler. The character content of a <values> element is considered to be a separated list of value tokens valid for the type of the variable of the parent element. The number of specified tokens in the content must equal the product of the dimensions of the enclosing variable@shape, or be one value for a scalar. It is also an error to not specify a <values> element for a declared new variable.

Changing the Separator Tokens

The author may specify values@separator to change the default value token separator from the default whitespace. This is very useful for specifying arrays of strings with whitespace in them, or if data in CSV form is being pasted in.
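For instance, comma-separated numbers pasted from a CSV row could be used as in the sketch below; the variable name and the values are hypothetical. With the separator set to a comma, the content is split on commas, so it should be written without stray whitespace between tokens.

```xml
<variable name="Depths" type="float" shape="4">
  <!-- Four tokens, matching the declared shape of 4. -->
  <values separator=",">0.5,1.0,2.5,5.0</values>
</variable>
```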

Autogeneration of Uniform Arrays

We can also parse values@start and values@increment instead of tokens in the content. This "autogenerates" a uniform array of values whose length is the product of the dimensions of the containing variable. For example:

<variable name="Evens" type="int" shape="100">
  <values start="0" increment="2"/>
</variable>

will specify an array of the first 100 even numbers (including 0).

Parse Errors:

  • If the incorrect number of tokens are specified for the containing variable's shape
  • If any value token cannot be parsed as a valid value for the containing variable's type
  • If content is specified in addition to start and increment
  • If only one of start or increment is specified
  • If the values element is placed anywhere except within a NEW variable.

<attribute> Element

As an overview, whenever the parser encounters an <attribute> with a non-existing name (at the current scope), it creates a new one, whether a container or atomic attribute (see below). If the attribute exists, its value and/or type is modified to those specified in the <attribute> element. If an attribute structure (container) exists, it is used to define a nested lexical scope for child attributes.

Attributes may be scalar (one value) or one dimensional arrays. Arrays are specified by using whitespace (default) to separate the different values. The attribute@separator may also be set in order to specify a different separator, such as CSV format or to specify a non-whitespace separator so strings with whitespace are not tokenized. We will give examples of creating array attributes below.

Adding New Attributes or Modifying an Existing Attribute

If a specified attribute with the attribute@name does not exist at the current lexical scope, a new one is created with the given type and value. For example, assume "new_metadata" doesn't exist at the current parse scope. Then:

<attribute name="new_metadata" type="string" value="This is a new entry!"/>

will create the attribute at that scope. Note that value can be specified in the content of the element as well. This is identical to the above:

<attribute name="new_metadata" type="string">This is a new entry!</attribute>

If the attribute@name already exists at the scope, it is modified to contain the specified type and value.


Arrays

As in NcML, for numerical types an array can be specified by separating the tokens by whitespace (default) or by specifying the token separator with attribute@separator. For example,

<attribute name="myArray" type="int">1 2 3</attribute>

and

<attribute name="myArray" type="int" separator=",">1,2,3</attribute>

both specify the same array of three integers named "myArray".

TODO Add more information on splitting with a separator!


Structures (Containers)

We use attribute@type="Structure" to define a new (or existing) attribute container. So if we wanted to add a new attribute structure, we'd use something like this:

  <attribute name="MySamples" type="Structure">
    <attribute name="Location" type="string" value="Station 1"/>
    <attribute name="Samples" type="int">1 4 6</attribute>
  </attribute>

Assuming "MySamples" doesn't already exist, an attribute container will be created at the current scope and the "Location" and "Samples" attributes will be added to it.

Note that we can create nested attribute structures to arbitrary depth this way as well.

If the attribute container with the given name already exists at the current scope, then the attribute@type="Structure" form is used to define the lexical scope for the container. In other words, child <attribute> elements will be processed within the scope of the container. For example, in the above example, if "MySamples" already exists, then the "Location" and "Samples" will be processed within the existing container (they may or may not already exist as well).

Renaming an Existing Attribute or Attribute Container

We also support the attribute@orgName attribute for renaming attributes.

For example,

<attribute name="NewName" orgName="OldName" type="string"/>

will rename an existing attribute "OldName" to "NewName" while leaving its value alone. If attribute@value is also specified, then the attribute is renamed and has its value modified.

This works for renaming attribute containers as well:

<attribute name="MyNewContainer" orgName="MyOldContainer" type="Structure"/>

will rename an existing "MyOldContainer" to "MyNewContainer". Note that any children of this container will remain in it.
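As a small sketch of renaming and modifying in a single step, the element below renames "OldName" and also replaces its value, since attribute@value is given alongside attribute@orgName. The names and the value are hypothetical.

```xml
<!-- Rename the existing attribute and overwrite its value at the same time. -->
<attribute name="NewName" orgName="OldName" type="string" value="Updated value"/>
```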

DAP OtherXML Extension

The module now allows specification of attributes of the new DAP type "OtherXML". This allows the NCML file author to inject arbitrary well-formed XML into an attribute for clients that want XML metadata rather than just string or url. Internally, the attribute is still a string (and in a DAP DAS response will be quoted inside one string). However, since it is XML, the NCMLParser still parses it and checks it for well-formedness (but NOT against schemas). This extension allows the NCMLParser to parse the arbitrary XML within the given attribute without causing errors, since it can be any XML.

The injected XML is most useful in the DDX response, where it shows up directly in the response as XML. XSLT and other clients can then parse it.

Errors

  • The XML must be in the content of the <attribute type="OtherXML"> element. It is a parser error for attribute@value to be set if attribute@type is "OtherXML".
  • The XML must also be well-formed since it is parsed. A parse error will be thrown if the OtherXML is malformed.

Example

Here's an example of the use of this special case.

<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2" location="/coverage/200803061600_HFRadar_USEGC_6km_rtv_SIO.nc">

    <attribute name="someName" type="OtherXML">
        <Domain xmlns="http://www.opengis.net/wcs/1.1" 
                xmlns:ows="http://www.opengis.net/ows/1.1"
                xmlns:gml="http://www.opengis.net/gml/3.2"
                >
            <SpatialDomain>
                <ows:BoundingBox crs="urn:ogc:def:crs:EPSG::4326">
                    <ows:LowerCorner>-97.8839 21.736</ows:LowerCorner>
                    <ows:UpperCorner>-57.2312 46.4944</ows:UpperCorner>
                </ows:BoundingBox>
            </SpatialDomain>
            <TemporalDomain>
                <gml:timePosition>2008-03-27T16:00:00.000Z</gml:timePosition>
            </TemporalDomain>
        </Domain>
        <SupportedCRS xmlns="http://www.opengis.net/wcs/1.1">urn:ogc:def:crs:EPSG::4326</SupportedCRS>
        <SupportedFormat xmlns="http://www.opengis.net/wcs/1.1">netcdf-cf1.0</SupportedFormat>
        <SupportedFormat xmlns="http://www.opengis.net/wcs/1.1">dap2.0</SupportedFormat>
    </attribute>

</netcdf>

TODO Put the DDX response for the above in here!

Namespace Closure

Furthermore, the parser will make the chunk of OtherXML "namespace closed". This means any namespaces specified in parent NCML elements of the OtherXML tree will be "brought down" and added to the root OtherXML elements so that the subtree may be pulled out and added to the DDX and still have its namespaces. The algorithm doesn't just bring used prefixes, but brings all of the lexically scoped closest namespaces in all ancestors. In other words, it adds unique namespaces (as determined by prefix) in order from the root of the OtherXML tree as it traverses to the root of the NCML document.

Namespace closure is syntactic sugar that simplifies the author's task, since they can specify the namespaces just once at the top of the NCML file and expect that when the subtree of XML is added to the DDX these namespaces will come along with it. Otherwise they would have to explicitly add the namespaces to each attribute.
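A minimal sketch of an input that relies on this behavior, using a hypothetical dataset path and attribute name. The gml prefix is declared once on the top-level <netcdf> element and then used inside the OtherXML content. Per the description above, the parser would copy that declaration down onto the root element of the OtherXML subtree (<gml:timePosition> here), so the prefix still resolves once the subtree is placed in the DDX. The serialized DDX output is not shown, since its exact form depends on the handler.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The gml namespace is declared only here, on an ancestor of the OtherXML content. -->
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
        xmlns:gml="http://www.opengis.net/gml/3.2"
        location="/data/nc/example.nc">
  <attribute name="TemporalInfo" type="OtherXML">
    <!-- Uses the gml prefix without redeclaring it. -->
    <gml:timePosition>2008-03-27T16:00:00.000Z</gml:timePosition>
  </attribute>
</netcdf>
```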

TODO Add an example!

<remove> Element

The <remove> element can remove attributes and variables. For example:

  <attribute name="NC_GLOBAL" type="Structure">
    <remove name="base_time" type="attribute"/>
  </attribute>

will remove the attribute named "base_time" in the attribute structure named "NC_GLOBAL".

Note that this works for attribute containers as well! We could recursively remove the entire attribute container (i.e. it and all its children) with:

 <remove name="NC_GLOBAL" type="attribute"/>

It also can be used to remove variables from existing datasets:

  <remove name="SomeExistingVariable" type="variable"/>

This also recurses on variables of type Structure: the entire structure, including all of its children, is removed from the dataset's response.

Parse Errors:

  • It is a parse error if the given attribute or variable doesn't exist in the current scope

<aggregation> Element

Aggregation involves combining multiple datasets (<netcdf>) into a virtual "single" dataset in various ways. For a tutorial on aggregation in NcML 2.2, the reader is referred to the Unidata page: http://www.unidata.ucar.edu/software/netcdf/ncml/v2.2/Aggregation.html

NcML 2.2 supports multiple types of aggregation: union, joinNew, joinExisting, and fmrc (forecast model run collection).

The current version of the NcML module supports three of these aggregations: union, joinNew, and joinExisting (see Aggregation Tutorials). The first two work as follows:

A union aggregation specifies that the first instance of a variable or attribute (by name) that is found in the ordered list of datasets will be the one in the output aggregation. This is useful for combining two dataset files, each which may contain a single variable, into a composite dataset with both variables.
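A union of two single-variable files can be sketched as below. The paths are hypothetical, as is the assumption that one file holds only "u" and the other only "v". If both files defined a variable or attribute with the same name, the one from the first listed member would be used.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <aggregation type="union">
    <!-- Members are consulted in document order; the first instance of a name wins. -->
    <netcdf location="/data/wind/u_component.nc"/>
    <netcdf location="/data/wind/v_component.nc"/>
  </aggregation>
</netcdf>
```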

A JoinNew aggregation joins a variable which exists in multiple datasets (usually samples of a datum over time) into a new variable containing the data from all member datasets by creating a new outer dimension. The ith component in the new outer dimension is the variable's data from the ith member dataset. It also adds a new coordinate variable whose name is the new dimension's name and whose shape (length) is the new dimension as well. This new coordinate variable may be explicitly given by the author or may be autogenerated in one of several ways.
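A sketch of a joinNew aggregation with explicit coordinates for each member follows. The paths, the variable name "sst", and the dimension name "sample_time" are hypothetical. If "sst" has shape [lat][lon] in each member, the aggregated variable would have shape [sample_time = 2][lat][lon], and a coordinate variable named "sample_time" holding the values 1 and 2 would be added.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <aggregation type="joinNew" dimName="sample_time">
    <!-- The variable joined along the new outer dimension. -->
    <variableAgg name="sst"/>
    <!-- coordValue supplies each member's coordinate along sample_time. -->
    <netcdf location="/data/sst/sample1.nc" coordValue="1"/>
    <netcdf location="/data/sst/sample2.nc" coordValue="2"/>
  </aggregation>
</netcdf>
```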

<scan> Element

The scan element can be used within an aggregation context to allow a directory to be searched in various ways in order to specify the members of an aggregation. This allows a static NcML file to refer to an aggregation which may change over time, such as where a new data file is generated each day.

We describe usage of the <scan> element in detail in the joinNew aggregation tutorial (NCML_Module_Aggregation_JoinNew).
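As a brief sketch, the aggregation below replaces the explicit member list with a scan of one directory. The directory path, the variable name, and the dimension name are hypothetical. The location, suffix, and subdirs attributes are taken from the NcML 2.2 definition of <scan>; consult the tutorial for the attributes and ordering rules this module applies.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <aggregation type="joinNew" dimName="day">
    <variableAgg name="sst"/>
    <!-- Every file ending in .nc directly inside the directory becomes a member. -->
    <scan location="/data/sst/daily" suffix=".nc" subdirs="false"/>
  </aggregation>
</netcdf>
```

Because the member list is resolved when the file is served, new files that appear in the directory are picked up without editing the NcML.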

Errors

There are three types of error messages that may be returned:

  • Internal Error
  • Resource Not Found Error
  • Parse Error

Internal Errors

Internal errors should be reported to support@opendap.org as they are likely bugs.

Resource Not Found Errors

If netcdf@location specifies a non-existent local dataset (one that is not being served by the same Hyrax server), the module reports that the resource was not found. This error may also be returned if a handler for the specified dataset is not currently loaded in the BES. Users should check that the dataset to be wrapped exists and can be viewed on the running server before writing NcML to add metadata. At this time it is also an error to refer to remote datasets.

Parse Errors

Parse errors are user errors in the NcML file. These could be malformed XML, malformed NcML, unimplemented features of NcML, or could be errors in referring to the wrapped dataset.

The error message should specify the error condition as well as the "current scope" as a fully qualified DAP name within the loaded dataset. This should be enough information to correct the parse error as new NcML files are created.

The parser will generate parse errors in various situations where it expects to find certain structure in the underlying dataset. Some examples:

  • A variable of the given name was not found at the current scope.
  • attribute@orgName was specified, but the attribute cannot be found at the current scope.
  • attribute@orgName was specified, but the new name is already used at the current scope.
  • <remove> specified a non-existent attribute name.
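To make these conditions concrete, here is a hedged sketch that renames one attribute and removes another in the "cldc" Grid of the cldc.mean.nc dataset used later on this page. The new attribute name "description" is hypothetical, and the comments restate the conditions listed above rather than exact error text.

```xml
<netcdf location="data/ncml/agg/cldc.mean.nc">
  <!-- A parse error is expected if no variable "cldc" exists at this scope -->
  <variable name="cldc" type="Structure">
    <!-- Rename "long_name" to "description". A parse error is expected if
         "long_name" is missing here, or if "description" is already in use -->
    <attribute name="description" orgName="long_name" type="string"/>
    <!-- A parse error is expected if "precision" does not exist at this scope -->
    <remove name="precision" type="attribute"/>
  </variable>
</netcdf>
```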

Grid Metadata Tutorial

Please see the page Grid_Metadata_Tutorial for an example of adding metadata to the various parts of a DAP Grid variable.


Aggregation Tutorials

The NcML module may also be used to aggregate multiple datasets into one virtual dataset.

We currently support three of the NcML aggregations:

  • union
  • joinNew
  • joinExisting

Please see the individual pages for each aggregation type for tutorials on their respective application and use.

Union

Union Aggregation - Combine multiple datasets into one by merging variables together, selecting the first of each unique name.

JoinNew

JoinNew Aggregation - Combine variables across multiple datasets by creating a new outer dimension and coordinate variable for each of the sample datasets.

JoinExisting

JoinExisting Aggregation - Combine variables with a common named outer dimension along that dimension by concatenating data for that dimension.
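As a rough illustration of the joinExisting case, the sketch below concatenates two files along an existing "time" dimension. The file locations and coordinate counts are hypothetical. ncoords is the NcML 2.2 attribute stating how many coordinates a member contributes along the join dimension; whether this module requires it is an assumption to check against the joinExisting tutorial.

```xml
<netcdf title="joinExisting along time (hypothetical files)">
  <aggregation type="joinExisting" dimName="time">
    <!-- Each member is assumed to already have "time" as its outer dimension -->
    <netcdf location="data/ncml/monthly/jan.nc" ncoords="31"/>
    <netcdf location="data/ncml/monthly/feb.nc" ncoords="28"/>
  </aggregation>
</netcdf>
```

Under these assumptions, variables with "time" as their outer dimension would have a "time" length of 59 in the aggregated dataset.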

Additions/Changes to NcML 2.2

This section will keep track of changes to the NcML 2.2 schema. Eventually these will be rolled into a new schema.

Attribute Structures (Containers)

This module also adds functionality beyond the current NcML 2.2 schema: it can handle nested <attribute> elements in order to make attribute structures. This is done by using the <attribute type="Structure"> form, for example:

  <attribute name="MySamples" type="Structure">
    <attribute name="Location" type="string" value="Station 1"/>
    <attribute name="Samples" type="int">1 4 6</attribute>
  </attribute>

"MySamples" describes an attribute structure with two attribute fields, a string "Location" and an array of ints called "Samples". Note that an attribute structure of this form can only contain other <attribute> elements and NOT a value.

If the container does not already exist, it will be created at the scope it is declared, which could be:

  • Global (top of dataset)
  • Within a variable's attribute table
  • Within another attribute container

If an attribute container of the given name already exists at the lexical scope, it is traversed in order to define the scope for the nested (children) attributes it contains.
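The traversal case can be sketched with the cldc.mean.nc dataset used later on this page, whose DAS contains a global container named NC_GLOBAL. The attribute name "institution" and its value are hypothetical.

```xml
<netcdf location="data/ncml/agg/cldc.mean.nc">
  <!-- NC_GLOBAL already exists, so this element only sets the scope -->
  <attribute name="NC_GLOBAL" type="Structure">
    <!-- Hypothetical new attribute added inside the existing container -->
    <attribute name="institution" type="string" value="Example Institute"/>
  </attribute>
</netcdf>
```

Had the container not existed, the same markup would have created it at the global scope instead.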

Unspecified Variable Type Matching for Lexical Scope

We also allow the type attribute of a variable element (variable@type) to be the empty string (or unspecified) when using existing variables to define the lexical scope of an <attribute> transformation. In the schema, variable@type is (normally) required.


DAP 2 Types

Additionally, we allow DAP2 atomic types (such as UInt32, URL) in addition to the NcML types. The NcML types are mapped onto the closest DAP2 type internally.
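A short sketch of attributes declared with DAP2 atomic types follows. The attribute names and values are hypothetical; UInt32 and URL are the type names mentioned above.

```xml
<netcdf location="data/ncml/agg/cldc.mean.nc">
  <!-- DAP2 unsigned 32-bit integer; hypothetical attribute name and value -->
  <attribute name="station_count" type="UInt32" value="42"/>
  <!-- DAP2 URL type; hypothetical attribute name -->
  <attribute name="more_info" type="URL" value="http://www.opendap.org"/>
</netcdf>
```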

DAP OtherXML Attribute Type

We also allow attributes to be of the new DAP type "OtherXML" for injecting arbitrary XML into an attribute as content rather than trying to form a string. This allows the parser to check well-formedness.
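A minimal sketch of an OtherXML attribute is given below, assuming the XML is supplied as the element's content as described above. The attribute name, the namespace, and the nested elements are all hypothetical.

```xml
<netcdf location="data/ncml/agg/cldc.mean.nc">
  <attribute name="provenance" type="OtherXML">
    <!-- Arbitrary XML from a hypothetical namespace; it must be well-formed -->
    <note xmlns="http://example.org/hypothetical" author="editor">
      <step order="1">first processing step</step>
    </note>
  </attribute>
</netcdf>
```

Because the content is parsed as XML rather than treated as a string, an unclosed tag inside the attribute would be caught when the NcML file is parsed.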

Forward Declaration of Dimensions

Since we use a SAX parser for efficiency, we require the <dimension> elements to come before their use in a variable@shape. One way to change the schema to enforce this ordering would be to require the dimension elements to be specified in a sequence after the explicit and metadata choice and before all other elements.
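The ordering requirement can be illustrated with the following sketch of a purely virtual dataset. The dimension name, variable name, and values are hypothetical.

```xml
<netcdf title="Dimension declared before use (hypothetical)">
  <!-- The dimension must appear before any variable whose shape names it -->
  <dimension name="station" length="3"/>
  <variable name="station_id" type="int" shape="station">
    <values>1 2 3</values>
  </variable>
</netcdf>
```

Moving the <dimension> element below the <variable> element would be expected to fail, since the parser would reach shape="station" before the dimension is known.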

Aggregation Element Location and Processing Order Differences

NcML specifies that if a dataset (<netcdf> element) specifies an aggregation element, the aggregation element is always processed first, regardless of its ordering within the <netcdf> element. Our parser, since it is SAX and not DOM, modifies this behavior in that order matters in some cases:

  • Metadata (<attribute>) elements specified prior to an aggregation "shadow" the aggregation versions. This is useful for "overriding" an attribute or variable in a union aggregation, where the first found will take precedence.
  • JoinNew: If the new coordinate variable's data is to be set explicitly by specifying the new dimension's shape (either with explicit data or the autogenerated data using values@start and values@increment attributes), the <variable> must come after the aggregation since the size of the dimension is unknown until the aggregation element is processed.
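The second point can be sketched as follows. The file locations, the variable name "sst", and the dimension name "day" are hypothetical. The coordinate variable is placed after the aggregation so that the size of "day" is already known when its values are generated.

```xml
<netcdf title="joinNew with an explicit coordinate variable (hypothetical)">
  <aggregation type="joinNew" dimName="day">
    <variableAgg name="sst"/>
    <netcdf location="data/ncml/daily/day1.nc"/>
    <netcdf location="data/ncml/daily/day2.nc"/>
    <netcdf location="data/ncml/daily/day3.nc"/>
  </aggregation>
  <!-- Placed after the aggregation: the length of "day" (3 here) is only
       known once the aggregation element has been processed -->
  <variable name="day" type="int" shape="day">
    <values start="1" increment="1"/>
  </variable>
</netcdf>
```

With three members, the autogenerated coordinate values would be 1, 2, and 3.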

Backward Compatibility Issues

Due to the way shared dimensions were implemented in the NetCDF, HDF4, and HDF5 handlers, the DAS responses did not follow the DAP2 specification. The NcML module, on the other hand, generates DAP2 compliant DAS for these datasets, which means that wrapping some datasets in NcML will generate a DAS with a different structure. This is important for the NcML author since it changes the names of attributes and variables. In order for the module to find the correct scope for adding metadata, for example, the DAP2 DAS must be used.

In general, what this means is that an empty "passthrough" NcML file should be the starting point for authoring an NcML file. This file would just specify a dataset and nothing else:

<netcdf location="/data/ncml/myNetcdf.nc"/>

The author would then request the DAS response for the NcML file and use that as the starting point for modifications to the original dataset.

More explicit examples are given below.

NetCDF

The NetCDF handler represents some NC datasets as a DAP 2 Grid, but the returned DAS is not consistent with the DAP 2 spec for the attribute hierarchy for such a Grid. The map vector attributes are placed as siblings of the grid attributes rather than within the grid lexical scope. For example, here's the NetCDF Handler DDS for a given file:

Dataset {
    Grid {
      Array:
        Int16 cldc[time = 456][lat = 21][lon = 360];
      Maps:
        Float64 time[time = 456];
        Float32 lat[lat = 21];
        Float32 lon[lon = 360];
    } cldc;
} cldc.mean.nc;

showing the Grid. Here's the DAS the NetCDF handler generates:

Attributes {
    lat {
        String long_name "Latitude";
        String units "degrees_north";
        Float32 actual_range 10.00000000, -10.00000000;
    }
    lon {
        String long_name "Longitude";
        String units "degrees_east";
        Float32 actual_range 0.5000000000, 359.5000000;
    }
    time {
        String units "days since 1-1-1 00:00:0.0";
        String long_name "Time";
        String delta_t "0000-01-00 00:00:00";
        String avg_period "0000-01-00 00:00:00";
        Float64 actual_range 715511.00000000000, 729360.00000000000;
    }
    cldc {
        Float32 valid_range 0.000000000, 8.000000000;
        Float32 actual_range 0.000000000, 8.000000000;
        String units "okta";
        Int16 precision 1;
        Int16 missing_value 32766;
        Int16 _FillValue 32766;
        String long_name "Cloudiness Monthly Mean at Surface";
        String dataset "COADS 1-degree Equatorial Enhanced\\012AI";
        String var_desc "Cloudiness\\012C";
        String level_desc "Surface\\0120";
        String statistic "Mean\\012M";
        String parent_stat "Individual Obs\\012I";
        Float32 add_offset 3276.500000;
        Float32 scale_factor 0.1000000015;
    }
    NC_GLOBAL {
        String title "COADS 1-degree Equatorial Enhanced";
        String history "";
        String Conventions "COARDS";
    }
    DODS_EXTRA {
        String Unlimited_Dimension "time";
    }
}

Note the map vector attributes are in the "dataset" scope.

Here's the DAS that the NcML Module produces from the correctly formed DDX:

Attributes {
    NC_GLOBAL {
        String title "COADS 1-degree Equatorial Enhanced";
        String history "";
        String Conventions "COARDS";
    }
    DODS_EXTRA {
        String Unlimited_Dimension "time";
    }
    cldc {
        Float32 valid_range 0.000000000, 8.000000000;
        Float32 actual_range 0.000000000, 8.000000000;
        String units "okta";
        Int16 precision 1;
        Int16 missing_value 32766;
        Int16 _FillValue 32766;
        String long_name "Cloudiness Monthly Mean at Surface";
        String dataset "COADS 1-degree Equatorial Enhanced\\012AI";
        String var_desc "Cloudiness\\012C";
        String level_desc "Surface\\0120";
        String statistic "Mean\\012M";
        String parent_stat "Individual Obs\\012I";
        Float32 add_offset 3276.500000;
        Float32 scale_factor 0.1000000015;
        cldc {
        }
        time {
            String units "days since 1-1-1 00:00:0.0";
            String long_name "Time";
            String delta_t "0000-01-00 00:00:00";
            String avg_period "0000-01-00 00:00:00";
            Float64 actual_range 715511.00000000000, 729360.00000000000;
        }
        lat {
            String long_name "Latitude";
            String units "degrees_north";
            Float32 actual_range 10.00000000, -10.00000000;
        }
        lon {
            String long_name "Longitude";
            String units "degrees_east";
            Float32 actual_range 0.5000000000, 359.5000000;
        }
    }
}

Here the Grid Structure "cldc" and its contained data array (of the same name "cldc") and map vectors have their own attribute containers as DAP 2 specifies.

What this means for the author of an NcML file adding metadata to a NetCDF dataset that returns a Grid is that they should generate a "passthrough" file and get the DAS and then specify modifications based on that structure.

Here's an example passthrough:

<netcdf location="data/ncml/agg/cldc.mean.nc" title="This file results in a Grid">
</netcdf>


For example, to add an attribute to the map vector "lat" in the above, we'd need the following NcML:

<netcdf location="data/ncml/agg/cldc.mean.nc" title="This file results in a Grid">
  <!-- Traverse into the Grid as a Structure -->
  <variable name="cldc" type="Structure">
    <!-- Traverse into the "lat" map vector (Array) -->
    <variable name="lat"> 
      <attribute name="Description" type="string">I am a new attribute in the Grid map vector named lat!</attribute>
    </variable>
    <variable name="lon"> 
      <attribute name="Description" type="string">I am a new attribute in the Grid map vector named lon!</attribute>
    </variable>
  </variable>
</netcdf>

This clearly shows that the structure of the Grid must be used in the NcML: the attribute being added is technically "cldc.lat.Description" in a fully qualified name. The parser would return an error if it was attempted as "lat.Description" as the NetCDF DAS for the original file would have led one to believe.


HDF4/HDF5

Similarly to the NetCDF case, the Hyrax HDF4 Module produces DAS responses that do not respect the DAP2 specification. If an NcML file is used to "wrap" an HDF4 dataset, the correct DAP2 DAS response will be generated, however.

This is important for those writing NcML for HDF4 data since the lexical scope for attributes relies on the correct DAS form. To handle this, the user should start with a "passthrough" NcML file (see the above NetCDF example) and use the DAS from that as the starting point for knowing the structure the NcML handler expects to see in the NcML file. Alternatively, the DDX has the proper attribute structure as well (the DAS is generated from it).


Known Bugs

There are no known bugs currently.

Planned Future Enhancements

Planned enhancements for future versions of the module include:

Copyright

This software is copyrighted under the GNU Lesser GPL. Please see the files COPYING and COPYRIGHT that came with this distribution.