NCML Module Aggregation JoinNew: Difference between revisions

From OPeNDAP Documentation

Revision as of 19:17, 15 April 2010


Join New Aggregation

A joinNew aggregation joins existing datasets along a new outer Array dimension. Essentially, it adds a new index to the existing variable, and each value of that index selects the data from one member dataset. One useful application is joining multiple samples of data from different times into one virtual dataset containing all the times. We first give a basic introduction to the joinNew aggregation, then demonstrate the various ways to specify the member datasets of an aggregation, the values for the new dimension's coordinate variable (map vector), and the metadata for the aggregation.

The reader is also directed to a basic tutorial of this NcML aggregation which may be found at http://www.unidata.ucar.edu/software/netcdf/ncml/v2.2/Aggregation.html#joinNew.

PLEASE NOTE that our syntax differs slightly from that of the THREDDS Data Server (TDS), so please follow the tutorial on this page, rather than the Unidata one, when using the Hyrax NcML Module!

Introduction

A joinNew aggregation combines a variable's data across n datasets by creating a new outer dimension and placing the data from aggregation member i into element i of the new outer dimension. All of the data samples must have the same structure; specifically, the DDS of the variables must match. For example, if the aggregation variable is named sample and is a 10x10 Array of float32, then every member dataset in the aggregation must include a variable named sample that is also a 10x10 Array of float32. If 100 datasets were specified in the aggregation, the resulting DDS would contain a variable named sample with data shape 100x10x10, since the new dimension is the outermost one.
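The stacking rule described above can be illustrated with a minimal Python sketch. It is not the Hyrax implementation: the function names join_new and shape_of are hypothetical, nested lists stand in for DAP Arrays, and the new dimension is assumed to come first in the shape because it is the outermost one.

```python
def shape_of(array):
    """Return the shape of a rectangular nested list, e.g. (10, 10)."""
    dims = []
    while isinstance(array, list):
        dims.append(len(array))
        array = array[0] if array else None
    return tuple(dims)


def join_new(members):
    """Join equally shaped member arrays along a new outer dimension.

    Element i of the result is the data of member i. Members whose
    shape differs from the first member are rejected, mirroring the
    requirement that the variables' DDS must match.
    """
    if not members:
        raise ValueError("an aggregation needs at least one member")
    expected = shape_of(members[0])
    for i, member in enumerate(members):
        if shape_of(member) != expected:
            raise ValueError(
                f"member {i} has shape {shape_of(member)}, expected {expected}"
            )
    # The new dimension has one entry per member and is placed first.
    return list(members), (len(members),) + expected


# 100 hypothetical member datasets, each holding a 10x10 "sample" array.
members = [[[float(i)] * 10 for _ in range(10)] for i in range(100)]
aggregated, shape = join_new(members)
print(shape)  # (100, 10, 10)
```

Indexing the result with aggregated[i] returns the 10x10 data of member i unchanged, which is the sense in which the new index points into each member dataset.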

In addition, a new variable holding the data values for the new dimension is created at the same scope as (that is, as a sibling of) the specified aggregation variable. For example, if the new dimension is called "filename" and its values are left unspecified (the default), then an Array of type String is created with one element for each member dataset, namely the filename of that dataset. Additionally, if the aggregation variable is a DAP Grid, this new dimension data variable is also added as a new Map vector inside the Grid so that the Grid still conforms to the Grid specification.

There are multiple ways to specify the member datasets of a joinNew aggregation:

  • Explicit: using <netcdf> elements
  • Scan: scan a directory tree for files matching a conjunction of certain criteria:
    • Specific suffix
    • Older than a specific duration
    • Matching a specific regular expression
    • Either in a specific directory or recursively searching subdirectories
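As a rough illustration of what "a conjunction of criteria" means for a scan, the following Python sketch keeps a file only if it passes every criterion that was supplied. The function scan and its parameter names are hypothetical and are not the module's attribute names. Matching the regular expression against the full path, interpreting "older than" through the file modification time, and returning the matches in sorted order are all assumptions made for the sketch.

```python
import os
import re
import time


def scan(location, suffix=None, older_than_secs=None, reg_exp=None,
         subdirs=True, now=None):
    """Return the files under location that satisfy ALL given criteria."""
    now = time.time() if now is None else now
    pattern = re.compile(reg_exp) if reg_exp else None
    matches = []
    for root, dirs, files in os.walk(location):
        if not subdirs:
            dirs[:] = []  # stay in the top directory, do not recurse
        for name in files:
            path = os.path.join(root, name)
            if suffix and not name.endswith(suffix):
                continue
            if pattern and not pattern.search(path):
                continue
            if older_than_secs is not None:
                age = now - os.path.getmtime(path)
                if age < older_than_secs:
                    continue  # modified too recently
            matches.append(path)
    return sorted(matches)
```

A call such as scan("data", suffix=".hdf", older_than_secs=600, subdirs=False) would return only the .hdf files directly inside the hypothetical directory data that were last modified at least ten minutes ago.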

Additionally, there are multiple ways to specify the data values of the new coordinate variable (the data variable associated with the new outer dimension):

  • Default: An Array of type String containing the filenames of the member datasets
  • Explicit Value Array: Explicit list of values of a specific data type, exactly one per dataset
  • Dynamic Array: a numeric Array variable specified using start and increment values -- one value is generated automatically per dataset
  • Timestamp from Filename: An Array of type String whose values are ISO 8601 timestamps extracted from the dataset filenames using a specified Java SimpleDateFormat string
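The four options above each produce exactly one coordinate value per member dataset. The Python sketch below shows one way to picture them; all function names and the sample filenames are hypothetical. The timestamp variant uses a regular expression plus a Python strptime format as a stand-in for a Java SimpleDateFormat string (the two pattern syntaxes differ), and it assumes the extracted times are UTC.

```python
import os
import re
from datetime import datetime


def default_coords(locations):
    """Default: one String per member, taken from the dataset filename."""
    return list(locations)


def explicit_coords(values, member_count):
    """Explicit value array: must supply exactly one value per dataset."""
    if len(values) != member_count:
        raise ValueError(
            f"got {len(values)} values for {member_count} datasets"
        )
    return list(values)


def dynamic_coords(start, increment, member_count):
    """Dynamic array: start, start + increment, ... one value per dataset."""
    return [start + i * increment for i in range(member_count)]


def timestamp_coords(locations, capture, time_format):
    """Timestamp from filename, rendered as ISO 8601 strings (UTC assumed)."""
    coords = []
    for location in locations:
        found = re.search(capture, os.path.basename(location))
        if found is None:
            raise ValueError(f"no timestamp found in {location}")
        parsed = datetime.strptime(found.group(1), time_format)
        coords.append(parsed.strftime("%Y-%m-%dT%H:%M:%SZ"))
    return coords


files = ["data/sample_20100415_191700.nc", "data/sample_20100416_191700.nc"]
print(dynamic_coords(10, 5, len(files)))  # [10, 15]
print(timestamp_coords(files, r"sample_(\d{8}_\d{6})", "%Y%m%d_%H%M%S"))
# ['2010-04-15T19:17:00Z', '2010-04-16T19:17:00Z']
```

In every variant the length of the returned list equals the number of member datasets, which is what lets the values serve as the coordinate variable for the new dimension.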