Metadata on Aggregations Tutorial

From OPeNDAP Documentation

Metadata Specification on the New Coordinate Variable

We can add metadata to the new coordinate variable in two ways:

  • Adding it directly to the <variable> element, in the case where the new coordinate variable and its values are defined explicitly
  • Adding it to an automatically created coordinate variable by leaving the <values> element out

The first case we have already seen, but we will show it again explicitly. The second case is a little different and we'll cover it separately.

Adding Metadata to the Explicit New Coordinate Variable

We have already seen examples of explicitly defining the new coordinate variable and giving its values. In these cases, the metadata is added to the new coordinate variable exactly like any other variable. Let's see the example again:

<?xml version="1.0" encoding="UTF-8"?>
<netcdf title="joinNew Grid aggregation with explicit map">
  
  <aggregation type="joinNew" dimName="sample_time">
    <variableAgg name="dsp_band_1"/> 
    <netcdf location="data/ncml/agg/grids/f97182070958.hdf"/> 
    <netcdf location="data/ncml/agg/grids/f97182183448.hdf"/> 
    <netcdf location="data/ncml/agg/grids/f97183065853.hdf"/>  
    <netcdf location="data/ncml/agg/grids/f97183182355.hdf"/> 
  </aggregation> 
  
  <variable name="sample_time" shape="sample_time" type="float">
    <!-- Metadata here will also show up in the Grid map -->
    <attribute name="units" type="string">Days since 01/01/2010</attribute>
    <values>100 200 400 1000</values>
  </variable>

</netcdf>

We see that the units attribute for the new coordinate variable has been specified. This subset of the DAS (we don't show the extensive global metadata) shows this:

   dsp_band_1 {
        Byte dsp_PixelType 1;
        Byte dsp_PixelSize 2;
        UInt16 dsp_Flag 0;
        UInt16 dsp_nBits 16;
        Int32 dsp_LineSize 0;
        String dsp_cal_name "Temperature";
        String units "Temp";
        UInt16 dsp_cal_eqnNumber 2;
        UInt16 dsp_cal_CoeffsLength 8;
        Float32 dsp_cal_coeffs 0.125, -4;
        Float32 scale_factor 0.125;
        Float32 add_off -4;
        sample_time {
 --->           String units "Days since 01/01/2010";
        }
        dsp_band_1 {
        }
        lat {
            String name "lat";
            String long_name "latitude";
        }
        lon {
            String name "lon";
            String long_name "longitude";
        }
    }
    sample_time {
--->        String units "Days since 01/01/2010";
    }

We show the new metadata with the "--->" marker. Note that the metadata for the coordinate variable is also copied into the new map vector of the aggregated Grid.

Metadata can be specified in this way for any case where the new coordinate variable is listed explicitly.
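
As a minimal sketch of the same pattern with more than one attribute, the explicit coordinate variable could carry several <attribute> elements alongside its <values>. The long_name attribute and its text are illustrative assumptions, not part of the original example:

```xml
<variable name="sample_time" shape="sample_time" type="float">
  <!-- From the example above -->
  <attribute name="units" type="string">Days since 01/01/2010</attribute>
  <!-- Assumption: an extra descriptive attribute added only for illustration -->
  <attribute name="long_name" type="string">Sample time</attribute>
  <values>100 200 400 1000</values>
</variable>
```

Given the behavior described above, each such attribute would be expected to appear both on the coordinate variable and on the matching map vector of the aggregated Grid.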

Adding Metadata to An Autogenerated Coordinate Variable

If we expect the coordinate variable to be added automatically, we can still specify its metadata by referring to the variable without setting its values. This is useful when netcdf@coordValue is used, and we will also see that it is very useful when a <scan> element is used for dynamic aggregations.

Here's a trivial example using the default case of the filename:

<?xml version="1.0" encoding="UTF-8"?>
<netcdf title="Test of adding metadata to the new map vector in a joinNew Grid aggregation">
 
  <aggregation type="joinNew" dimName="filename">
    <variableAgg name="dsp_band_1"/> 
    <netcdf location="data/ncml/agg/grids/f97182070958.hdf"/> 
  </aggregation> 

  <!-- 
       Add metadata to the created new outer dimension variable after
       the aggregation is defined by using a placeholder variable
       whose values will be defined automatically by the aggregation.
  -->  
  <variable type="string" name="filename">
    <attribute name="units" type="string">Filename of the dataset</attribute>
  </variable>

</netcdf>

Note that we simply omitted the <values> element, since we want the values to be generated automatically by the aggregation. Note also that this is almost the same way we would modify an existing variable's metadata. The only difference is that we need to "declare" the type of the variable here, since technically the variable specified here is a placeholder for the generated coordinate variable. So after the aggregation is specified, we are simply modifying the created variable's metadata, in this case that of the newly generated map vector.

Here is the DAS portion with just the aggregated Grid and the new coordinate variable:

   dsp_band_1 {
        Byte dsp_PixelType 1;
        Byte dsp_PixelSize 2;
        UInt16 dsp_Flag 0;
        UInt16 dsp_nBits 16;
        Int32 dsp_LineSize 0;
        String dsp_cal_name "Temperature";
        String units "Temp";
        UInt16 dsp_cal_eqnNumber 2;
        UInt16 dsp_cal_CoeffsLength 8;
        Float32 dsp_cal_coeffs 0.125, -4;
        Float32 scale_factor 0.125;
        Float32 add_off -4;
        filename {
            String units "Filename of the dataset";
        }
        dsp_band_1 {
        }
        lat {
            String name "lat";
            String long_name "latitude";
        }
        lon {
            String name "lon";
            String long_name "longitude";
        }
    }
    filename {
        String units "Filename of the dataset";
    }

Here also the map vector gets a copy of the coordinate variable's metadata.

We can also use this syntax in the case that netcdf@coordValue was used to autogenerate the coordinate variable:

<?xml version="1.0" encoding="UTF-8"?>
<netcdf title="joinNew Grid aggregation with coordValue and metadata">
  
  <aggregation type="joinNew" dimName="sample_time">
    <variableAgg name="dsp_band_1"/> 
    <netcdf location="data/ncml/agg/grids/f97182070958.hdf" coordValue="1"/> 
    <netcdf location="data/ncml/agg/grids/f97182183448.hdf" coordValue="10"/> 
    <netcdf location="data/ncml/agg/grids/f97183065853.hdf" coordValue="15"/>  
    <netcdf location="data/ncml/agg/grids/f97183182355.hdf" coordValue="25"/> 
  </aggregation> 
  
  <!-- Note: values are contrived -->
  <variable name="sample_time" type="double">
    <attribute name="units" type="string">Days since 01/01/2010</attribute>
  </variable>

</netcdf>

Here we see the metadata added to the new coordinate variable and associated map vector:

Attributes {
   dsp_band_1 {
        Byte dsp_PixelType 1;
        Byte dsp_PixelSize 2;
        UInt16 dsp_Flag 0;
        UInt16 dsp_nBits 16;
        Int32 dsp_LineSize 0;
        String dsp_cal_name "Temperature";
        String units "Temp";
        UInt16 dsp_cal_eqnNumber 2;
        UInt16 dsp_cal_CoeffsLength 8;
        Float32 dsp_cal_coeffs 0.125, -4;
        Float32 scale_factor 0.125;
        Float32 add_off -4;
        sample_time {
 --->           String units "Days since 01/01/2010";
        }
        dsp_band_1 {
        }
        lat {
            String name "lat";
            String long_name "latitude";
        }
        lon {
            String name "lon";
            String long_name "longitude";
        }
    }
    sample_time {
--->        String units "Days since 01/01/2010";
    }
}
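
The same placeholder syntax can be sketched for an aggregation whose members are discovered with a <scan> element. This is only an illustration under assumptions: it assumes that every file ending in .hdf under the scanned directory belongs in the aggregation, and that the autogenerated coordinate variable defaults to the filename as a string, as in the explicit single-file example above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<netcdf title="Sketch: metadata on the coordinate variable of a scanned joinNew aggregation">

  <aggregation type="joinNew" dimName="filename">
    <variableAgg name="dsp_band_1"/>
    <!-- Assumption: all .hdf files in this directory are members of the aggregation -->
    <scan location="data/ncml/agg/grids/" suffix=".hdf"/>
  </aggregation>

  <!-- Placeholder for the autogenerated coordinate variable:
       name and type only, no shape and no values element -->
  <variable name="filename" type="string">
    <attribute name="units" type="string">Filename of the dataset</attribute>
  </variable>

</netcdf>
```

Because the member list is not written out in the NcML, the values of the coordinate variable cannot be listed by hand here, which is why the placeholder form is the natural way to attach metadata in this case.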

Parse Errors

Since the processing of the aggregation takes a few steps, care must be taken in specifying the coordinate variable in the cases of autogenerated variables.

In particular, it is a Parse Error:

  • To specify the shape of the autogenerated coordinate variable if <values> are not set
  • To leave out the type or to use a type that does not match the autogenerated type

The second can be somewhat tricky to remember, since for existing variables the type can be safely left out and the variable will be "found". Since aggregations are fully processed only when the <netcdf> element containing them is closed, the coordinate variables specified in these cases are placeholders for the automatically generated variables. They must therefore match the name and type, but must not specify a shape, since the shape (the size of the new aggregation dimension) is not known until the aggregation is processed.
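
The two rules can be illustrated with a small sketch based on the filename example above. The rejected forms are shown inside XML comments so that the listing as a whole remains a well-formed NcML document; the type double in the third rejected form is just an arbitrary example of a mismatched type.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<netcdf title="Sketch: placeholder declarations for an autogenerated coordinate variable">

  <aggregation type="joinNew" dimName="filename">
    <variableAgg name="dsp_band_1"/>
    <netcdf location="data/ncml/agg/grids/f97182070958.hdf"/>
    <netcdf location="data/ncml/agg/grids/f97182183448.hdf"/>
  </aggregation>

  <!-- Parse Error (first rule): shape is given but there is no values element.
       <variable name="filename" shape="filename" type="string"> ... </variable>
  -->

  <!-- Parse Error (second rule): the type is left out.
       <variable name="filename"> ... </variable>
  -->

  <!-- Parse Error (second rule): the type does not match the autogenerated string type.
       <variable name="filename" type="double"> ... </variable>
  -->

  <!-- Accepted: name and type match the generated variable, with no shape and no values -->
  <variable name="filename" type="string">
    <attribute name="units" type="string">Filename of the dataset</attribute>
  </variable>

</netcdf>
```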

Metadata Specification on the Aggregation Variable Itself

It is also possible to add or modify the attributes on the aggregation variable itself. If it is a Grid, metadata can be modified on the contained array or maps as well. Note that the aggregated variable begins with the metadata from the first dataset specified in the aggregation just like in a union aggregation.

We will use a Grid as our primary example since other datatypes are similar and simpler and this case will cover those as well.
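
For comparison, the simpler non-Grid case might look like the following sketch. The variable name temperature, the file locations, and the units value are all hypothetical and chosen only for illustration. The point is that the aggregated Array already exists once the aggregation is declared, so it is referenced by name and its type can be left out.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<netcdf title="Sketch: metadata on an aggregated Array (hypothetical dataset)">

  <aggregation type="joinNew" dimName="filename">
    <!-- Hypothetical Array variable and hypothetical file locations -->
    <variableAgg name="temperature"/>
    <netcdf location="data/example/first.nc"/>
    <netcdf location="data/example/second.nc"/>
  </aggregation>

  <!-- Existing (aggregated) variable: referenced by name, type omitted -->
  <variable name="temperature">
    <attribute name="units" type="String">degrees_C</attribute>
  </variable>

</netcdf>
```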

An Aggregated Grid example

Let's start from this example aggregation:

<?xml version="1.0" encoding="UTF-8"?>
<netcdf> 
  <aggregation type="joinNew" dimName="filename">
    <variableAgg name="dsp_band_1"/> 
    <netcdf location="data/ncml/agg/grids/f97182070958.hdf"/> 
    <netcdf location="data/ncml/agg/grids/f97182183448.hdf"/> 
    <netcdf location="data/ncml/agg/grids/f97183065853.hdf"/>  
    <netcdf location="data/ncml/agg/grids/f97183182355.hdf"/> 
  </aggregation> 
</netcdf>

Here is the DAS for this unmodified aggregated Grid (with the global dataset metadata removed):

Attributes {
   dsp_band_1 {
        Byte dsp_PixelType 1;
        Byte dsp_PixelSize 2;
        UInt16 dsp_Flag 0;
        UInt16 dsp_nBits 16;
        Int32 dsp_LineSize 0;
        String dsp_cal_name "Temperature";
        String units "Temp";
        UInt16 dsp_cal_eqnNumber 2;
        UInt16 dsp_cal_CoeffsLength 8;
        Float32 dsp_cal_coeffs 0.125, -4;
        Float32 scale_factor 0.125;
        Float32 add_off -4;
        filename {
        }
        dsp_band_1 {
        }
        lat {
            String name "lat";
            String long_name "latitude";
        }
        lon {
            String name "lon";
            String long_name "longitude";
        }
    }
    filename {
    }
}

We will now add attributes to all the existing parts of the Grid:

  • The Grid Structure itself
  • The Array of data within the Grid
  • Both existing map vectors (lat and lon)

We have already seen how to add metadata to the new coordinate variable as well.

Here's the NcML we will use. Note that we have added units metadata to the subparts of the Grid, and also some metadata to the Grid itself.

<?xml version="1.0" encoding="UTF-8"?>
<netcdf title="Showing how to add metadata to all parts of an aggregated grid">
  
  <aggregation type="joinNew" dimName="filename">
    <variableAgg name="dsp_band_1"/> 
    <netcdf location="data/ncml/agg/grids/f97182070958.hdf"/> 
    <netcdf location="data/ncml/agg/grids/f97182183448.hdf"/> 
    <netcdf location="data/ncml/agg/grids/f97183065853.hdf"/>  
    <netcdf location="data/ncml/agg/grids/f97183182355.hdf"/> 
  </aggregation> 

  <variable name="dsp_band_1" type="Structure"> <!-- Enter the Grid level scope -->
    
    <attribute name="Info" type="String">This is metadata on the Grid itself.</attribute> <!-- 1) -->
    
    <variable name="dsp_band_1"> <!-- Enter the scope of the Array dsp_band_1 -->
      <attribute name="units" type="String">Temp (packed)</attribute> <!-- 2) Units of the array -->
    </variable> <!-- dsp_band_1.dsp_band_1 -->
    
    <variable name="lat"> <!-- dsp_band_1.lat map -->
      <attribute name="units" type="String">degrees_north</attribute> <!-- 3) -->
    </variable> 
    
    <variable name="lon"> <!-- dsp_band_1.lon map -->
      <attribute name="units" type="String">degrees_east</attribute> <!-- 4) -->
    </variable> <!-- dsp_band_1.lon map -->    
  </variable> <!-- dsp_band_1 Grid -->

  <!-- Note well: this is a new coordinate variable, so it requires the correct type.
  Also note that it falls outside of the actual Grid: we must specify it as a sibling
  coordinate variable, and it will be made part of the Grid when the netcdf is closed.
  -->
  <variable name="filename" type="String">
    <attribute name="Info" type="String">Filename with timestamp</attribute> <!-- 5) -->
  </variable> <!-- filename -->
 
</netcdf>

Here we show metadata being injected in several ways, denoted by the markers 1) through 5).

  1) We are inside the scope of the top-level Grid variable, so this metadata will show up in the attribute table inside the Grid Structure.
  2) This is the actual data Array of the Grid, dsp_band_1.dsp_band_1. We specify that the units are a packed temperature.
  3) Here we are in the scope of a map variable, dsp_band_1.lat. We add the units specification to this map.
  4) Likewise, we add units to the lon map vector.
  5) Finally, we must close the actual Grid and specify the metadata for the NEW coordinate variable as a sibling of the Grid, since it will be used as the canonical prototype added to all Grids that are aggregated on the new dimension. Note that in this case, unlike the existing variables above, the type of the new coordinate variable is required, since we are specifying a "placeholder" variable for the new map until the Grid is actually processed once its containing <netcdf> is closed (i.e. all data is available to it).

The resulting DAS (with global dataset metadata removed for clarity):

Attributes {
... global data clipped ...
  dsp_band_1 {
        Byte dsp_PixelType 1;
        Byte dsp_PixelSize 2;
        UInt16 dsp_Flag 0;
        UInt16 dsp_nBits 16;
        Int32 dsp_LineSize 0;
        String dsp_cal_name "Temperature";
        String units "Temp";
        UInt16 dsp_cal_eqnNumber 2;
        UInt16 dsp_cal_CoeffsLength 8;
        Float32 dsp_cal_coeffs 0.125, -4;
        Float32 scale_factor 0.125;
        Float32 add_off -4;
 1)   String Info "This is metadata on the Grid itself.";
        filename {
 5)       String Info "Filename with timestamp";
        }
        dsp_band_1 {
2)        String units "Temp (packed)";
        }
        lat {
            String name "lat";
            String long_name "latitude";
3)        String units "degrees_north";
        }
        lon {
            String name "lon";
            String long_name "longitude";
4)        String units "degrees_east";
        }
    }
    filename {
5)    String Info "Filename with timestamp";
    }
}

We have annotated the DAS with numbers representing which lines in the NcML above correspond to the injected metadata.