Parse NcML

From OPeNDAP Documentation
Revision as of 22:22, 22 April 2009


Since we are interested in only the part of NcML which adds attributes and variables to a data set, build a parser for those parts of NcML and ignore the parts for aggregation.

The NcML schema.

How to treat NcML elements relative to the DAP

NcML can be used to add new attributes to existing variables in a data set, add new variables to a data set, create new data sets, and define aggregations of any of these three things (existing variables, new variables or new data sets). For the purposes of the NcML AIS for Hyrax, the initial version of this handler will be limited to adding attributes to existing variables in existing data sets. Subsequent versions will add support for defining new variables and new data sets (those two capabilities are very similar), and a separate project will add support for NcML's <aggregation> element.

In the following subsections, each of the NcML elements is described with respect to how it should be interpreted by this handler. The elements are: <netcdf>, <readMetadata>, <explicit>, <group>, <variable>, <dimension>, <attribute>, <remove>. Of these, <netcdf>, <readMetadata>, <explicit>, <variable> and <attribute> must be handled by the first version of the handler. Note that either <readMetadata> or <explicit> must be given, but never both.

netcdf

The design of NcML is such that using it as an AIS (augmenting an existing data set) is really exploiting one of its optional features. When the <netcdf> element has a netcdf@location attribute, the NcML file provides information that augments the contents of the data set named by location. The original NcML design intended location to name a netCDF file, but we are going to generalize that to any file served by the BES running this handler (when location is a file:/ URL) or any remote Hyrax server (when location is a DAP URL).

In most uses foreseen for this handler, the netcdf@location attribute will be present.

In the initial version of the handler, we might require this attribute be present and we might not allow new variables to be defined, and instead only allow attributes to be specified for existing variables.
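The two kinds of location described above can be told apart by URL scheme. The following is a minimal sketch, not the handler's implementation: the function name classify_location is hypothetical, the example paths and host are made up, and it assumes that an http or https URL is a DAP URL and that the NcML 2.2 namespace is declared on the root element.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace used by NcML 2.2 documents.
NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"


def classify_location(ncml_text):
    """Return (kind, location) for the root <netcdf> element.

    kind is 'local' for a file: URL or a plain path, 'remote' for an
    http(s) URL (assumed here to be a DAP URL), and 'none' when the
    location attribute is missing.
    """
    root = ET.fromstring(ncml_text)
    if root.tag != f"{{{NCML_NS}}}netcdf":
        raise ValueError("root element must be <netcdf>")
    location = root.get("location")
    if location is None:
        return ("none", None)
    scheme = urlparse(location).scheme
    if scheme in ("http", "https"):
        return ("remote", location)
    if scheme in ("", "file"):
        return ("local", location)
    raise ValueError(f"unsupported location scheme: {scheme}")


# Hypothetical example document; the path is an illustration only.
example = (
    f'<netcdf xmlns="{NCML_NS}" location="file:/data/sst.nc">'
    "<readMetadata/></netcdf>"
)
print(classify_location(example))
```

A handler that requires the attribute in its initial version could simply reject the 'none' case.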

readMetadata

If present, read the metadata from the data source and augment it with the information supplied in this file. This element will normally be present.

explicit

If present, retain only the variables from the data set named by netcdf@location and then add the information included here.
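The rule that exactly one of <readMetadata> or <explicit> must appear can be checked before any other processing. This is a sketch under the assumption that both elements are direct children of <netcdf>; the function name metadata_mode is hypothetical.

```python
import xml.etree.ElementTree as ET

NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"


def metadata_mode(ncml_text):
    """Return 'readMetadata' or 'explicit', whichever the document declares.

    Raises ValueError when neither or both elements are present.
    """
    root = ET.fromstring(ncml_text)
    has_read = root.find(f"{{{NCML_NS}}}readMetadata") is not None
    has_explicit = root.find(f"{{{NCML_NS}}}explicit") is not None
    if has_read == has_explicit:
        raise ValueError(
            "exactly one of <readMetadata> or <explicit> must be given"
        )
    return "readMetadata" if has_read else "explicit"


# Hypothetical example document.
doc = (
    f'<netcdf xmlns="{NCML_NS}" location="file:/data/sst.nc">'
    "<readMetadata/></netcdf>"
)
print(metadata_mode(doc))
```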

group

A group will be ignored until we start supporting Groups in DAP (DAP 3.4?).

variable

The <variable> element will be used to represent any of the scalar and array types of Bytes, integers, floats and strings/URLs. Note that the NcML <variable> element has a variable@type attribute and that attribute can have structure as its value. For this handler we will use <variable type="structure"> elements to represent DAP Structures, Grids and Sequences.
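Because <variable type="structure"> elements nest, a parser has to map each nested <variable> to the member of the corresponding DAP constructor type. The sketch below walks the nesting and builds dotted names; the dotted naming, the function name qualified_names, and the example variables are assumptions made for illustration, and the type comparison is case-insensitive because the exact spelling of the value is not fixed by this page.

```python
import xml.etree.ElementTree as ET

NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"


def qualified_names(element, prefix=""):
    """List dotted names for every <variable> nested under element."""
    names = []
    for var in element.findall(f"{{{NCML_NS}}}variable"):
        full = prefix + var.get("name")
        names.append(full)
        # Only structure-typed variables may contain child variables.
        if (var.get("type") or "").lower() == "structure":
            names.extend(qualified_names(var, full + "."))
    return names


# Hypothetical example: a Grid-like container with two members.
doc = f"""
<netcdf xmlns="{NCML_NS}" location="file:/data/wind.nc">
  <readMetadata/>
  <variable name="u_grid" type="structure">
    <variable name="u"/>
    <variable name="lat"/>
  </variable>
  <variable name="time"/>
</netcdf>
"""
print(qualified_names(ET.fromstring(doc)))
```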

dimension

Dimensions are part of the netCDF common data model and bind a name to a size (but not a type). These will be part of DAP 3.3(?) but are not currently part of our software. When a dimension element is processed, it will be used to create a binding between the name and the size (an integer), and any use of that name in a 'dimension context' will be treated as equivalent to the integer size. In practice, dimensions are used to declare the sizes of an Array, with the additional notion that all variables that use a particular dimension are related (as in a DAP Grid). However, in a data set served by DAP, an existing Grid will already be explicit, so the dimension element is not needed to establish this relation.

Because the <dimension> element encodes information that is redundant for a DAP2 through DAP 3.2 data source, we might not support it in the initial version of this handler.
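The name-to-size binding described above can be sketched as a dictionary lookup applied to variable@shape. This is an illustration only: resolve_shape is a hypothetical name, the example data set is made up, and accepting literal integers in shape follows the extension proposed in the notes below rather than the published schema.

```python
import xml.etree.ElementTree as ET

NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"


def resolve_shape(ncml_text, variable_name):
    """Return the list of integer sizes for a top-level variable's shape.

    An empty list means rank zero (a scalar).
    """
    root = ET.fromstring(ncml_text)
    # Bind each dimension name to its integer size.
    sizes = {
        dim.get("name"): int(dim.get("length"))
        for dim in root.findall(f"{{{NCML_NS}}}dimension")
    }
    for var in root.findall(f"{{{NCML_NS}}}variable"):
        if var.get("name") == variable_name:
            tokens = var.get("shape", "").split()
            # A token is either a literal size or a dimension name.
            return [int(t) if t.isdigit() else sizes[t] for t in tokens]
    raise KeyError(variable_name)


# Hypothetical example document.
doc = f"""
<netcdf xmlns="{NCML_NS}">
  <explicit/>
  <dimension name="time" length="12"/>
  <dimension name="lat" length="180"/>
  <variable name="sst" type="float" shape="time lat 4"/>
  <variable name="offset" type="float"/>
</netcdf>
"""
print(resolve_shape(doc, "sst"), resolve_shape(doc, "offset"))
```

The length of the returned list is the rank of the variable in NcML's terms.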

attribute

The <attribute> element is used to add new attributes, or replace existing attributes, of a variable.

remove

The <remove> element is used to remove attributes from a variable.

This element might not be supported in the initial versions of the handler.
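Taken together, <attribute> and <remove> amount to edits on a per-variable attribute table. The sketch below models that table as a plain dictionary, which is an assumption made for illustration and not how the BES stores attributes; the function name apply_attribute_edits is hypothetical, attribute values are kept as strings, and a variable that does not already exist is rejected, in line with the initial version's scope.

```python
import xml.etree.ElementTree as ET

NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"


def apply_attribute_edits(ncml_text, existing):
    """Return a copy of existing with the NcML attribute edits applied.

    existing maps variable name -> {attribute name: value}.
    """
    root = ET.fromstring(ncml_text)
    result = {name: dict(attrs) for name, attrs in existing.items()}
    for var in root.findall(f"{{{NCML_NS}}}variable"):
        name = var.get("name")
        if name not in result:
            raise KeyError(f"variable {name!r} is not in the data set")
        attrs = result[name]
        for child in var:
            if child.tag == f"{{{NCML_NS}}}attribute":
                # Adds a new attribute or replaces an existing one.
                attrs[child.get("name")] = child.get("value")
            elif (child.tag == f"{{{NCML_NS}}}remove"
                  and child.get("type") == "attribute"):
                attrs.pop(child.get("name"), None)
    return result


# Hypothetical example data set and NcML document.
existing = {"sst": {"units": "K", "history": "raw"}}
doc = f"""
<netcdf xmlns="{NCML_NS}" location="file:/data/sst.nc">
  <readMetadata/>
  <variable name="sst">
    <attribute name="units" value="degC"/>
    <attribute name="long_name" value="sea surface temperature"/>
    <remove name="history" type="attribute"/>
  </variable>
</netcdf>
"""
print(apply_attribute_edits(doc, existing))
```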

Notes on NcML

  1. NcML considers everything to have rank and thus does not name a special constructor type Array. Instead, scalars have rank zero and arrays have rank greater than zero. A rank of one or more is denoted using variable@shape, which lists the dimension names. This difference is syntactic only, but worth keeping in mind.
  2. Expand variable@shape so that it can contain integer dimension sizes in addition to names. This is the case in the code at Unidata, but it has not made it into the schema yet (April 2009).
  3. Although not captured by the schema, it appears that an NcML file that modifies the attributes of an existing variable does not have to specify either the variable@type or variable@shape attributes. The variable@type and variable@shape attributes might only come into play when/if we use NcML/AIS to add new variables to the data set or to define a complete data set (without referencing an existing data set as a base to build onto).
  4. The NcML 2.2 schema uses one DataType (<xsd:simpleType name="DataType">) for both variables and attributes; we can use the Structure data type value for attribute containers.
  5. NcML does not have an otherXML attribute@type so we'll have to add that. Maybe we can overload the attribute@shape attribute so that it has the special dimension name otherXML? This idea will make purists gag, and rightly so, but it might be a good way to try NcML out without changing the design at all.

Longer-term:

  1. Separate the attribute and variable element types so that there's a different type (xsd:simpleType) for each.
  2. Add Grid and Sequence to the set of types for a variable element.
  3. Add otherXML to the set of types for attribute.