BES - Modules - NcML Module: Difference between revisions

(Added more examples of functionality supported, errors, and cleaned up formatting.)

Revision as of 15:00, 29 July 2009

Introduction

[Updated for version 0.9.0 (27 July 2009)]

Welcome to the OPeNDAP NcML Data Handler Module for Hyrax!

This module can be added to a Hyrax installation to extend its data serving capability to NcML 2.2 files (see http://www.unidata.ucar.edu/software/netcdf/ncml/) which provide a way to add metadata to existing datasets.

For information on how to build and install the NcML Data Module, please see the INSTALL file.

This initial version (0.9.0) implements only a subset of NcML 2.2 functionality (specifically, attribute metadata manipulation). It also adds new functionality to NcML 2.2 -- please see the sections "Functionality" and "Additions to NcML 2.2" below.

If you are planning to use the module to wrap HDF4 datasets, please see the section "HDF4 DAS Compatibility Issue" below, as there is a bug in the HDF4 handler which could cause unexpected parse errors or behavior when wrapping HDF4 datasets.


Installation Overview

The NcML Module requires a working Hyrax installation. It is a module that is dynamically loaded into the Hyrax BES (Back End Server) to allow it to handle NcML files.

Please see the file INSTALL for full build and install instructions as well as requirements.

Once installed, the BES configuration file (bes.conf) needs to be modified to tell it to load the module and where it was installed. A configuration helper script (bes-ncml-data.sh) is installed with the module to automate this process. The script is used on the command line (> prompt) as follows:

  > bes-ncml-data.sh [<bes.conf file to modify> [<bes modules dir>]]

If you are doing a source installation, the `bes-conf' make target runs the script while trying to select paths cleverly, and should be called using:

  > make bes-conf

After installation, you MUST restart Hyrax by restarting the BES and OLFS so the NcML Module is loaded.

Test data is provided to see if the installation was successful (fnoc1_improved.ncml), but it requires the netCDF data handler to be installed since it adds metadata to a netCDF dataset (fnoc1.nc).

Functionality

This version of the NcML Module implements a subset of NcML functionality. It can:

  • Add metadata only to files being served locally (not remotely)
  • Add metadata to only one dataset (one <netcdf> node).
  • Add, modify, and remove metadata (attributes) and not data (variables).

In particular, this version of the NcML module can only handle a subset of the NcML schema. First, it is restricted to a subset of the elements: <netcdf>, <explicit>, <readMetadata>, <attribute>, <variable>, and <remove>.

<netcdf> Element

For this version we assume that:

  • There is only one <netcdf> element in the file.
  • The location attribute (netcdf@location) refers to a local dataset (served by the same Hyrax server)

The <netcdf> element is assumed to be the topmost node.
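
As a minimal sketch of a complete NcML file that meets these assumptions, the following wraps a single local dataset and adds one attribute at the global scope. The xmlns value is the standard NcML 2.2 namespace; the location path and the attribute name and value are hypothetical and should be replaced with ones that match a dataset actually served by your Hyrax installation.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Exactly one <netcdf> element, and it is the topmost node. -->
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
        location="/data/nc/fnoc1.nc"> <!-- hypothetical path to a local dataset -->
  <!-- Hypothetical attribute added at the global scope. -->
  <attribute name="ncml_note" type="string" value="Added by the NcML wrapper"/>
</netcdf>
```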

<readMetadata> Element

The <readMetadata/> element is the default, so is effectively not needed.

<explicit> Element

The <explicit/> element simply clears all attribute tables in the dataset referred to by netcdf@location before applying the rest of the NcML transformations to the metadata.
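
A sketch of how <explicit/> might be used is shown below. The location path, the variable name "u", and the attribute names and values are hypothetical. Under the behavior described above, only the attributes declared in the NcML file itself would be expected to remain in the result.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
        location="/data/nc/fnoc1.nc"> <!-- hypothetical path -->
  <!-- Clear every attribute table read from the wrapped dataset. -->
  <explicit/>
  <!-- Attributes declared from here on are applied to the cleared tables. -->
  <attribute name="title" type="string" value="Metadata defined entirely in NcML"/>
  <variable name="u">
    <attribute name="units" type="string" value="m/s"/>
  </variable>
</netcdf>
```

Replacing <explicit/> with <readMetadata/>, or omitting both, would keep the existing attributes and apply the same additions on top of them.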

<variable> Element

The <variable> element cannot currently be used to add data; it can only be used to provide the lexical scope for a contained <attribute> element. For example:

  <variable name="u">
    <attribute name="Metadata" type="string">This is metadata!</attribute>
  </variable>

This assumes that a variable named "u" exists (of any type since we do not specify) and the attribute "Metadata" will be added/modified in the attribute table for that variable.

These can be nested if, for example, the variable is a Structure:

  <variable name="DATA_GRANULE" type="Structure">
    <variable name="PlanetaryGrid" type="Structure">
      <variable name="percipitate">
        <attribute name="units" type="String" value="inches"/>
      </variable>
    </variable>
  </variable>

This adds a "units" attribute to the variable "percipitate" within the nested Structures (fully qualified name "DATA_GRANULE.PlanetaryGrid.percipitate"). Note that we give the type explicitly as "Structure" so the parser knows to traverse the tree.

<attribute> Element

As an overview, whenever the parser encounters an <attribute> with a non-existing name (at the current scope), it creates a new one, whether a container or atomic attribute (see below). If the attribute exists, its value and/or type is modified to those specified in the <attribute> element. If an attribute structure (container) exists, it is used to define a nested lexical scope for child attributes.

Adding New Attributes or Modifying an Existing Attribute

If a specified attribute with the attribute@name does not exist at the current lexical scope, a new one is created with the given type and value. For example, assume "new_metadata" doesn't exist at the current parse scope. Then:

<attribute name="new_metadata" type="string" value="This is a new entry!"/>

will create the attribute at that scope. Note that value can be specified in the content of the element as well. This is identical to the above:

<attribute name="new_metadata" type="string">This is a new entry!</attribute>

If the attribute@name already exists at the scope, it is modified to contain the specified type and value.


Arrays

As in NcML, for numerical types an array can be specified by separating the tokens by whitespace (the default) or by specifying the token separator with attribute@separator. For example,

<attribute name="myArray" type="int">1 2 3</attribute>

and

<attribute name="myArray" type="int" separator=",">1,2,3</attribute>

both specify the same array of three integers named "myArray".

Structures (Containers)

We use attribute@type="Structure" to define a new (or existing) attribute container. So if we wanted to add a new attribute structure, we'd use something like this:

  <attribute name="MySamples" type="Structure">
    <attribute name="Location" type="string" value="Station 1"/>
    <attribute name="Samples" type="int">1 4 6</attribute>
  </attribute>

Assuming "MySamples" doesn't already exist, an attribute container will be created at the current scope and the "Location" and "Samples" attributes will be added to it.

Note that we can create nested attribute structures to arbitrary depth this way as well.

If the attribute container with the given name already exists at the current scope, then the attribute@type="Structure" form is used to define the lexical scope for the container. In other words, child <attribute> elements will be processed within the scope of the container. In the above example, if "MySamples" already exists, then "Location" and "Samples" will be processed within the existing container (they may or may not already exist as well).

Renaming an Existing Attribute or Attribute Container

We also support the attribute@orgName attribute for renaming attributes.

For example,

<attribute name="NewName" orgName="OldName" type="string"/>

will rename an existing attribute "OldName" to "NewName" while leaving its value alone. If attribute@value is also specified, then the attribute is renamed and has its value modified.
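
As a sketch of the combined form, the fragment below renames the attribute and replaces its value in one step. It is meant to sit inside the <netcdf> element at a scope where "OldName" already exists; the value shown is hypothetical.

```xml
<!-- Renames "OldName" to "NewName" and sets a new value at the same time. -->
<attribute name="NewName" orgName="OldName" type="string" value="Updated value"/>
```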

This works for renaming attribute containers as well:

<attribute name="MyNewContainer" orgName="MyOldContainer" type="Structure"/>

will rename an existing "MyOldContainer" to "MyNewContainer". Note that any children of this container will remain in it.


<remove> Element

Currently, the <remove> element can only remove attributes and not variables. In other words, its type must be "attribute". For example:

 <attribute name="NC_GLOBAL" type="Structure">
   <remove name="base_time" type="attribute"/>
 </attribute>

will remove the attribute named "base_time" in the attribute structure named "NC_GLOBAL".

Note that this works for attribute containers as well! We could recursively remove the entire attribute container (i.e. it and all its children) with:

<remove name="NC_GLOBAL" type="attribute"/>


It is a parse error if the given attribute doesn't exist.
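
The fragment below sketches removing an attribute from a variable's attribute table rather than from a global container. It assumes that <remove> follows the same lexical scoping inside <variable> that <attribute> does, and that a variable "u" with a "units" attribute exists in the wrapped dataset; both names are hypothetical.

```xml
<variable name="u">
  <!-- The current scope is now the attribute table of "u". -->
  <remove name="units" type="attribute"/>
</variable>
```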

Future Additions

We plan to add more features and NcML functionality, such as the ability to add and remove variables and to aggregate data in future versions.

Errors

There are two types of error messages that may be returned:

  • Internal Error
  • Parse Error

Internal Errors

Internal errors should be reported to support@opendap.org as they are likely bugs.

Parse Errors

Parse errors are user errors in the NcML file. These could be malformed XML, malformed NcML, unimplemented features of NcML, or could be errors in referring to the wrapped dataset.

The error message should specify the error condition as well as the "current scope" as a fully qualified DAP name within the loaded dataset. This should be enough information to correct the parse error as new NcML files are created.

The parser will generate parse errors in various situations where it expects to find certain structure in the underlying dataset. Some examples:

  • A variable of the given name was not found at the current scope
  • attribute@orgName was specified, but the attribute cannot be found at current scope.
  • attribute@orgName was specified, but the new name is already used at current scope.
  • remove specified a non-existing attribute name
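
To illustrate, each fragment below would be expected to produce one of the parse errors listed above. The names are hypothetical, and each comment states the assumption about the wrapped dataset that would make the fragment fail. The exact wording of the error messages is not shown here.

```xml
<!-- Assumes the dataset has no variable named "no_such_variable". -->
<variable name="no_such_variable">
  <attribute name="units" type="string" value="m"/>
</variable>

<!-- Assumes no attribute named "missing_attr" exists at this scope. -->
<attribute name="renamed_attr" orgName="missing_attr" type="string"/>

<!-- Assumes both "old_attr" and "taken_name" already exist at this scope. -->
<attribute name="taken_name" orgName="old_attr" type="string"/>

<!-- Assumes no attribute named "missing_attr" exists at this scope. -->
<remove name="missing_attr" type="attribute"/>
```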


Additions to NcML 2.2

This section will keep track of changes to the NcML 2.2 schema. Eventually these will be rolled into a new schema.

Attribute Structures (Containers)

This module also adds functionality beyond the current NcML 2.2 schema --- it can handle nested <attribute> elements in order to make attribute structures. This is done by using the <attribute type="Structure"> form, for example:

  <attribute name="MySamples" type="Structure">
    <attribute name="Location" type="string" value="Station 1"/>
    <attribute name="Samples" type="int">1 4 6</attribute>
  </attribute>

"MySamples" describes an attribute structure with two attribute fields, a string "Location" and an array of ints called "Samples". Note that an attribute structure of this form can only contain other <attribute> elements and NOT a value.

If the container does not already exist, it will be created at the scope it is declared, which could be:

  • Global (top of dataset)
  • Within a variable's attribute table
  • Within another attribute container

If an attribute container of the given name already exists at the lexical scope, it is traversed in order to define the scope for the nested (children) attributes it contains.
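
As a sketch of the non-global cases, the fragment below declares a container within a variable's attribute table and nests a second container inside it. The variable name "u" and the "Calibration" and "Offset" names and values are hypothetical.

```xml
<variable name="u">
  <!-- Container created (or traversed, if it exists) in the attribute table of "u". -->
  <attribute name="MySamples" type="Structure">
    <attribute name="Location" type="string" value="Station 1"/>
    <!-- A container within another attribute container. -->
    <attribute name="Calibration" type="Structure">
      <attribute name="Offset" type="float" value="0.5"/>
    </attribute>
  </attribute>
</variable>
```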

Unspecified Variable Type Matching for Lexical Scope

We also allow the type attribute of a variable element (variable@type) to be the empty string (or unspecified) when using existing variables to define the lexical scope of an <attribute> transformation. In the schema, variable@type is (normally) required.


DAP2 Types

We also allow DAP2 atomic types (such as UInt32 and URL) alongside the NcML types. The NcML types are mapped onto the closest DAP2 type internally.
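
A sketch of attributes declared with the two DAP2 type names mentioned above follows. The attribute names and values are hypothetical; the UInt32 value is chosen to be larger than a signed 32-bit integer can hold.

```xml
<!-- Unsigned 32-bit integer, written with its DAP2 type name. -->
<attribute name="sample_count" type="UInt32" value="4000000000"/>
<!-- DAP2 URL type, spelled as in the text above. -->
<attribute name="source" type="URL" value="http://www.opendap.org/"/>
```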


NcML Examples

Example NcML files used by the test-suite can be found in the data directory of a src distribution.

TODO Put a few here to give an idea of functionality...
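
As one illustrative sketch (not taken from the test suite), the file below combines several of the transformations described in the earlier sections. The location path is hypothetical, and the file assumes that the wrapped dataset has an "NC_GLOBAL" container holding a "base_time" attribute and a variable named "u".

```xml
<?xml version="1.0" encoding="UTF-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
        location="/data/nc/fnoc1.nc"> <!-- hypothetical path -->

  <!-- Work inside the existing global attribute container. -->
  <attribute name="NC_GLOBAL" type="Structure">
    <!-- Remove an existing attribute (a parse error if it is absent). -->
    <remove name="base_time" type="attribute"/>
    <!-- Add a new atomic attribute. -->
    <attribute name="new_metadata" type="string" value="This is a new entry!"/>
  </attribute>

  <!-- Add a new attribute container at the global scope. -->
  <attribute name="MySamples" type="Structure">
    <attribute name="Location" type="string" value="Station 1"/>
    <attribute name="Samples" type="int">1 4 6</attribute>
  </attribute>

  <!-- Use an existing variable as the scope for a new attribute. -->
  <variable name="u">
    <attribute name="Metadata" type="string">This is metadata!</attribute>
  </variable>
</netcdf>
```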


HDF4 DAS Compatibility Issue

There is a bug in the Hyrax HDF4 Module such that the DAS it produces is incorrect. If an NcML file is used to "wrap" an HDF4 dataset, however, the correct DAP2 DAS response will be generated.

This is important for those writing NcML for HDF4 data since the lexical scope for attributes relies on the correct DAS form --- to handle this, the user should start with a "passthrough" NcML file and use the DAS from that as the starting point for knowing the structure the NcML handler expects to see in the NcML file. Alternatively, the DDX has the proper attribute structure as well (the DAS is generated from it).
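
A "passthrough" file is taken here to mean an NcML file whose <netcdf> element has no child elements, so that no transformations are applied. A minimal sketch, with a hypothetical path to the HDF4 dataset:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- No child elements: the wrapped dataset's metadata is served unchanged. -->
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
        location="/data/hdf4/example.hdf"/> <!-- hypothetical path -->
```

Requesting the DAS (or DDX) for this file through Hyrax should then show the container and attribute names to use as scopes when writing the real NcML file.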


Copyright

This software is copyrighted under the GNU Lesser GPL. Please see the files COPYING and COPYRIGHT that came with this distribution.