JoinNew Explicit Dataset Tutorial

From OPeNDAP Documentation
⧼opendap2-jumptonavigation⧽

Default Values for the New Coordinate Variable (on a Grid)

The default for the new coordinate variable is to be of type String with the location of the dataset as the value. For example, the following NcML file:

<?xml version="1.0" encoding="UTF-8"?>
<netcdf title="Simple test of joinNew Grid aggregation">
  
  <aggregation type="joinNew" dimName="filename">
    <variableAgg name="dsp_band_1"/> 
    <netcdf location="data/ncml/agg/grids/f97182070958.hdf"/> 
    <netcdf location="data/ncml/agg/grids/f97182183448.hdf"/> 
    <netcdf location="data/ncml/agg/grids/f97183065853.hdf"/>  
    <netcdf location="data/ncml/agg/grids/f97183182355.hdf"/> 
  </aggregation> 
  
</netcdf>

specifies an aggregation on a Grid variable dsp_band_1 sampled in four HDF4 datasets listed explicitly.

First, the data structure (DDS) is:

Dataset {
    Grid {
      Array:
        UInt32 dsp_band_1[filename = 4][lat = 1024][lon = 1024];
      Maps:
        String filename[filename = 4];
        Float64 lat[1024];
        Float64 lon[1024];
    } dsp_band_1;
    String filename[filename = 4];
} joinNew_grid.ncml;

We see the aggregated variable dsp_band_1 has the new outer dimension filename. A coordinate variable filename[filename]' was created as a sibling of the aggregated variable (the top level Grid we specified) and was also copied into the aggregated Grid as a new map vector.

The ASCII data response for just the new coordinate variable filename[filename] is:

String filename[filename = 4] = {"data/ncml/agg/grids/f97182070958.hdf", 
"data/ncml/agg/grids/f97182183448.hdf", 
"data/ncml/agg/grids/f97183065853.hdf", 
"data/ncml/agg/grids/f97183182355.hdf"};

We see that the absolute location we specified for the dataset as a String is the value for each element of the new coordinate variable.

The newly added map dsp_band_1.filename contains a copy of this data.

Explicitly Specifying the New Coordinate Variable

If the author wishes to have the new coordinate variable be of a specific data type with non-uniform values, then they must specify the new coordinate variable explicitly.

Array Virtual Dataset

Here's an example using a contrived pure virtual dataset:

<?xml version="1.0" encoding="UTF-8"?>
<netcdf title="JoinNew on Array with Explicit Map">

  <!-- joinNew and form new outer dimension "day" -->
  <aggregation type="joinNew" dimName="day">
    <variableAgg name="V"/>

    <netcdf title="Slice 1">
      <dimension name="sensors" length="3"/>
      <variable name="V" type="int" shape="sensors">
	<values>1 2 3</values>
      </variable>
    </netcdf>

    <netcdf title="Slice 2">
      <dimension name="sensors" length="3"/>
      <variable name="V" type="int" shape="sensors">
	<values>4 5 6</values>
      </variable>
    </netcdf>

  </aggregation>

  <!-- This is recognized as the definition of the new coordinate variable, 
       since it has the form day[day] where day is the dimName for the aggregation. 
       It MUST be specified after the aggregation, so that the dimension size of day
      has been calculated.
  -->
  <variable name="day" type="int" shape="day">
    <!-- Note: metadata may be added here as normal! -->
    <attribute name="units" type="string">Days since 01/01/2010</attribute>
    <values>1 30</values>
  </variable>
	     
</netcdf>

The resulting DDS:

Dataset {
    Int32 V[day = 2][sensors = 3];
    Int32 day[day = 2];
} joinNew_with_explicit_map.ncml;

and the ASCII data:

Int32 V[day = 2][sensors = 3] = {{1, 2, 3},{4, 5, 6}};
Int32 day[day = 2] = {1, 30};

Note that the values we have explicitly given are used here as well as the specified NcML type, int which is mapped to a DAP Int32.

If metadata is desired on the new coordinate variable, it may be added just as in a normal new variable declaration. We'll give more examples of this later.

Grid with Explicit Map

Let's give one more example using a Grid to demonstrate the recognition of the coordinate variable as it is added to the Grid as the map vector for the new dimension:

<?xml version="1.0" encoding="UTF-8"?>
<netcdf title="joinNew Grid aggregation with explicit map">
  
  <aggregation type="joinNew" dimName="sample_time">
    <variableAgg name="dsp_band_1"/> 
    <netcdf location="data/ncml/agg/grids/f97182070958.hdf"/> 
    <netcdf location="data/ncml/agg/grids/f97182183448.hdf"/> 
    <netcdf location="data/ncml/agg/grids/f97183065853.hdf"/>  
    <netcdf location="data/ncml/agg/grids/f97183182355.hdf"/> 
  </aggregation> 
  
  <!-- Note: values are contrived -->
  <variable name="sample_time" shape="sample_time" type="float">
    <!-- Metadata here will also show up in the Grid map -->
    <attribute name="units" type="string">Days since 01/01/2010</attribute>
    <values>100 200 400 1000</values>
  </variable>

</netcdf>

This produces the DDS:

Dataset {
    Grid {
      Array:
        UInt32 dsp_band_1[sample_time = 4][lat = 1024][lon = 1024];
      Maps:
        Float32 sample_time[sample_time = 4];
        Float64 lat[1024];
        Float64 lon[1024];
    } dsp_band_1;
    Float32 sample_time[sample_time = 4];
} joinNew_grid_explicit_map.ncml;

You can see the explicit coordinate variable sample_time was found as the sibling of the aggregated Grid as was added as the new map vector for the Grid.

The values for the projected coordinate variables are as expected:

Float32 sample_time[sample_time = 4] = {100, 200, 400, 1000};

Errors

It is a Parse Error to:

  • Give a different number of values for the explicit coordinate variable than their are specified datasets
  • Specify the new coordinate variable prior to the <aggregation> element since the dimension size is not yet known


Autogenerated Uniform Numeric Values

If the number of datasets might vary (for example, if a <scan> element, described later, is used), but the values are uniform, the start/increment version of the <values> element may be used to generate the values for the new coordinate variable. For example:

<?xml version="1.0" encoding="UTF-8"?>
<netcdf title="JoinNew on Array with Explicit Autogenerated Map">

  <aggregation type="joinNew" dimName="day">
    <variableAgg name="V"/>

    <netcdf title="Slice 1">
      <dimension name="sensors" length="3"/>
      <variable name="V" type="int" shape="sensors">
	<values>1 2 3</values>
      </variable>
    </netcdf>

    <netcdf title="Slice 2">
      <dimension name="sensors" length="3"/>
      <variable name="V" type="int" shape="sensors">
	<values>4 5 6</values>
      </variable>
    </netcdf>

  </aggregation>

  <!-- Explicit coordinate variable definition -->
  <variable name="day" type="int" shape="day">
    <attribute name="units" type="string" value="days since 2000-01-01 00:00"/>
    <!-- We sample once a week... -->
    <values start="1" increment="7"/>
  </variable>
	     
</netcdf>

The DDS is the same as before and the coordinate variable is generated as expected:

Int32 sample_time[sample_time = 4] = {1, 8, 15, 22};

Note that this form is useful for uniform sampled datasets (or if only a numeric index is desired) where the variable need not be changed as datasets are added. It is especially useful for a <scan> element that refers to a dynamic number of files that can be described with a uniformly varying index.

Explicitly Using coordValue Attribute of <netcdf>

The netcdf@coordValue may be used to specify the value for the given dataset right where the dataset is declared. This attribute will cause a coordinate variable to be automatically generated with the given values for each dataset filled in. The new coordinate variable will be of type double if the coordValue's can all be parsed as a number, otherwise they will be of type String.

String coordValue Example

<?xml version="1.0" encoding="UTF-8"?>
<netcdf title="joinNew Aggregation with explicit string coordValue">
  
  <aggregation type="joinNew" dimName="source">
    <variableAgg name="u"/>
    <variableAgg name="v"/>

    <!-- Same dataset a few times, but with different coordVal -->
    <netcdf title="Dataset 1" location="data/ncml/fnoc1.nc" coordValue="Station_1"/>
    <netcdf title="Dataset 2" location="data/ncml/fnoc1.nc" coordValue="Station_2"/>
    <netcdf title="Dataset 3" location="data/ncml/fnoc1.nc" coordValue="Station_3"/>
  </aggregation>

</netcdf>

This results in the following DDS:

Dataset {
    Int16 u[source = 3][time_a = 16][lat = 17][lon = 21];
    Int16 v[source = 3][time_a = 16][lat = 17][lon = 21];
    Float32 lat[lat = 17];
    Float32 lon[lon = 21];
    Float32 time[time = 16];
    String source[source = 3];
} joinNew_string_coordVal.ncml;

and ASCII data response of the projected coordinate variable is:

String source[source = 3] = {"Station_1", "Station_2", "Station_3"};

as we specified.

Numeric (double) Use of coordValue

If the first coordValue can be successfully parsed as a double numeric type, then a coordinate variable of type double (Float64) is created and all remaining coordValue specifications must be parsable as a double or a Parse Error is thrown.

Using the same example but with numbers instead:

<?xml version="1.0" encoding="UTF-8"?>
<netcdf title="joinNew Aggregation with numeric coordValue">
  
  <aggregation type="joinNew" dimName="source">
    <variableAgg name="u"/>
    <variableAgg name="v"/>

    <!-- Same dataset a few times, but with different coordVal -->
    <netcdf title="Dataset 1" location="data/ncml/fnoc1.nc" coordValue="1.2"/>
    <netcdf title="Dataset 2" location="data/ncml/fnoc1.nc" coordValue="3.4"/>
    <netcdf title="Dataset 3" location="data/ncml/fnoc1.nc" coordValue="5.6"/>

  </aggregation>
</netcdf>

This time we see that a Float64 array is created:

Dataset {
    Int16 u[source = 3][time_a = 16][lat = 17][lon = 21];
    Int16 v[source = 3][time_a = 16][lat = 17][lon = 21];
    Float32 lat[lat = 17];
    Float32 lon[lon = 21];
    Float32 time[time = 16];
    Float64 source[source = 3];
} joinNew_numeric_coordValue.ncml;

The values we specified are in the coordinate variable ASCII data:

Float64 source[source = 3] = {1.2, 3.4, 5.6};