THREDDS Catalog Metadata

From OPeNDAP Documentation

Overview

Goal
Improved THREDDS catalog responses from dynamically generated (BES) catalogs.


THREDDS catalogs typically contain metadata beyond the minimum required to list the catalog's holdings. This additional metadata often consists of Digital Library Metadata Elements. The Hyrax server cannot currently produce THREDDS catalogs with this kind of metadata. Because these metadata elements provide crucial catalog content for dataset discovery, Hyrax needs to add this functionality.

The Problem

This section of the THREDDS catalog at http://blackburn.whoi.edu:8081/thredds/bathy_catalog.xml shows the Digital Library Metadata Elements:

   <dataset name="USGS Vineyard Sound Relief Model (1 arc sec)" ID="bathy/vs_1sec_20070725.nc"
            urlPath="bathy/vs_1sec_20070725.nc">
       <serviceName>Compound</serviceName>

       <authority>gov.usgs.er.whsc</authority>
       <dataType>GRID</dataType>
       <dataFormat>NetCDF</dataFormat>
       <documentation xlink:href="http://stellwagen.er.usgs.gov/models/grids/CGSherwo.doc"
                      xlink:title="USGS Vineyard Sound Coastal Relief Model (1 arc second)"/>
       <creator>
           <name vocabulary="DIF">WHSC/USGS</name>
           <contact url="http://www.usgs.gov/" email="rsignell@usgs.gov"/>
       </creator>
       <publisher>
           <name vocabulary="DIF">WHSC/USGS</name>
           <contact url="http://www.usgs.gov/" email="rsignell@usgs.gov"/>
       </publisher>
       <geospatialCoverage>
           <northsouth>
               <start>41.0</start>
               <size>1.4</size>
               <units>degrees_north</units>
           </northsouth>
           <eastwest>
               <start>-71.2</start>
               <size>0.5</size>
               <units>degrees_east</units>
           </eastwest>
           <updown>
               <start>-277.25</start>
               <size>459.6</size>
               <units>meters</units>
           </updown>
       </geospatialCoverage>

   </dataset>


A dataset element in a THREDDS catalog from Hyrax, generated from a BES showCatalog request, contains none of the Digital Library Metadata Elements:

   <dataset name="200803061600_HFRadar_USEGC_6km_rtv_SIO.ncml" ID="netcdf/examples/200803061600_HFRadar_USEGC_6km_rtv_SIO.ncml">
     <dataSize units="bytes">1162</dataSize>
     <date type="modified">2010-02-22T23:19:51</date>
     <access serviceName="dap" urlPath="netcdf/examples/200803061600_HFRadar_USEGC_6km_rtv_SIO.ncml" />
   </dataset>


However, the DDX for the dataset in question contains a significant amount of metadata. This includes DAP attributes (associated with the Unidata Dataset Discovery convention) that relate to geospatial extents:

       <Attribute name="geospatial_lat_min" type="Float32">
           <value>21.73596001</value>
       </Attribute>
       <Attribute name="geospatial_lat_max" type="Float32">
           <value>46.49441910</value>
       </Attribute>
       <Attribute name="geospatial_lon_min" type="Float32">
           <value>-97.88385010</value>
       </Attribute>
       <Attribute name="geospatial_lon_max" type="Float32">
           <value>-57.23120880</value>
       </Attribute>

This raises two questions:

  • How can we add the Digital Library Metadata Elements content to Hyrax THREDDS catalogs?
  • Can we capitalize on existing metadata to build Digital Library Metadata Element content?
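
As one possible answer to the second question, the geospatial DAP attributes shown above could be turned into a THREDDS geospatialCoverage element (start, size, units) like the one in the first catalog excerpt. The following is a minimal sketch in Python, not the Hyrax implementation. The function name coverage_from_ddx is hypothetical, and the DDX fragment is simplified: a real DDX is namespaced and the attributes may be nested more deeply.

```python
import xml.etree.ElementTree as ET

# Simplified DDX fragment (assumption: the XML namespace is omitted for brevity).
DDX = """
<Dataset name="example">
  <Attribute name="geospatial_lat_min" type="Float32"><value>21.73596001</value></Attribute>
  <Attribute name="geospatial_lat_max" type="Float32"><value>46.49441910</value></Attribute>
  <Attribute name="geospatial_lon_min" type="Float32"><value>-97.88385010</value></Attribute>
  <Attribute name="geospatial_lon_max" type="Float32"><value>-57.23120880</value></Attribute>
</Dataset>
"""

NEEDED = ("geospatial_lat_min", "geospatial_lat_max",
          "geospatial_lon_min", "geospatial_lon_max")


def coverage_from_ddx(ddx_xml):
    """Build a geospatialCoverage element from DDX attributes (hypothetical helper).

    Returns None if any of the four geospatial attributes is missing.
    """
    root = ET.fromstring(ddx_xml)

    # Collect every named attribute that carries a <value> child.
    values = {}
    for attr in root.iter("Attribute"):
        value = attr.find("value")
        if value is not None and value.text:
            values[attr.get("name")] = value.text.strip()

    if any(name not in values for name in NEEDED):
        return None

    coverage = ET.Element("geospatialCoverage")
    axes = (("northsouth", "lat", "degrees_north"),
            ("eastwest", "lon", "degrees_east"))
    for tag, key, units in axes:
        start = float(values["geospatial_%s_min" % key])
        end = float(values["geospatial_%s_max" % key])
        axis = ET.SubElement(coverage, tag)
        ET.SubElement(axis, "start").text = "{:.4f}".format(start)
        # THREDDS expresses the extent as start plus size, not min and max.
        ET.SubElement(axis, "size").text = "{:.4f}".format(end - start)
        ET.SubElement(axis, "units").text = units
    return coverage


coverage = coverage_from_ddx(DDX)
print(ET.tostring(coverage, encoding="unicode"))
```

The same mapping could equally be written as an XSLT transform; Python is used here only to keep the example self-contained.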

THREDDS Background Information

Existing THREDDS/NCML Implementations

How are people using THREDDS?

How does the TDS use THREDDS and NcML?

The TDS uses THREDDS catalog files stored on the local disk as a source of both catalog content and configuration information. The files can be linked together through thredds:catalogRef elements. These catalog documents may contain explicit information about individual datasets. They also support two important features: thredds:datasetScan and embedded NcML.


Content

Configuration

thredds:datasetScan

Directs the TDS to serve a directory tree of data files as a hierarchical catalog of data.
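
To illustrate the idea of serving a directory tree as a hierarchical catalog, here is a rough Python sketch that walks a directory and emits nested dataset elements. It is not how the TDS implements datasetScan; the function name scan_to_catalog, the url_prefix parameter, and the ".nc" suffix filter are assumptions made for this example.

```python
import os
import xml.etree.ElementTree as ET


def scan_to_catalog(root_dir, url_prefix, suffix=".nc"):
    """Return a nested <dataset> element mirroring the tree under root_dir.

    Hypothetical helper: directories become container datasets, and files
    ending in `suffix` become leaf datasets with a urlPath.
    """
    def build(directory, rel_path):
        name = os.path.basename(os.path.normpath(directory))
        node = ET.Element("dataset", name=name)
        for entry in sorted(os.listdir(directory)):
            full = os.path.join(directory, entry)
            rel = entry if not rel_path else rel_path + "/" + entry
            if os.path.isdir(full):
                # Recurse so the catalog hierarchy follows the directory hierarchy.
                node.append(build(full, rel))
            elif entry.endswith(suffix):
                ET.SubElement(node, "dataset", name=entry,
                              urlPath=url_prefix + "/" + rel)
        return node

    return build(root_dir, "")
```

Calling scan_to_catalog("/data/bathy", "bathy") on a hypothetical directory would yield one container dataset per subdirectory and one leaf dataset per matching file.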

Embedded NcML

The TDS can use embedded NcML elements to add new, modify existing, or aggregate collections of data sets.

In this excerpt from a THREDDS catalog XML file that is read by the TDS there is an ncml:netcdf element:

   <dataset name="1-day" ID="satellite/PH/ssta/1day" urlPath="satellite/PH/ssta/1day">
     <serviceName>all</serviceName>
     <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"> 
       <aggregation dimName="time" type="joinExisting" recheckEvery="720 min"> 
         <variableAgg name="PHssta" /> 
         <scan location="/u00/satellite/PH/ssta/1day/" suffix=".nc" /> 
       </aggregation>
     </netcdf>
   </dataset>

The ncml:netcdf element tells the TDS that this is a logical dataset built from an aggregation, and contains the information needed by the TDS to create the logical dataset from files stored on disk.

Possible Designs

Explicit Injection and XSLT

Pre-conditions:

  • Write NcML for each dataset to place the desired Digital Library Metadata Elements XML directly into the DDX.

Runtime:

  • Use XSLT transforms to extract the THREDDS namespace metadata from the DDX and add it to the THREDDS catalog response.
  • Cache all DDXs along with the time of caching. Update only as required.
  • Use cached DDXs to formulate subsequent responses.
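
The caching steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Hyrax design: the class name DDXCache and the two callables fetch_ddx and get_last_modified are hypothetical hooks that the caller would supply (for example, backed by BES requests). A cached DDX is reused until the dataset's modification time is later than the time of caching.

```python
import time


class DDXCache:
    """Cache DDX documents together with the time they were cached."""

    def __init__(self, fetch_ddx, get_last_modified):
        # Hypothetical hooks supplied by the caller:
        #   fetch_ddx(dataset_id)         -> DDX text
        #   get_last_modified(dataset_id) -> modification time (seconds since epoch)
        self._fetch_ddx = fetch_ddx
        self._get_last_modified = get_last_modified
        self._entries = {}  # dataset_id -> (cached_at, ddx_text)

    def get(self, dataset_id):
        """Return the DDX, fetching it again only if the dataset changed."""
        entry = self._entries.get(dataset_id)
        if entry is not None:
            cached_at, ddx = entry
            if self._get_last_modified(dataset_id) <= cached_at:
                return ddx  # still fresh: reuse the cached copy
        ddx = self._fetch_ddx(dataset_id)
        self._entries[dataset_id] = (time.time(), ddx)
        return ddx
```

Comparing against the dataset's modification time is one way to read "update only as required"; a fixed expiry interval would be an alternative.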


XSLT semantic mappings

Runtime:

  • Use XSLT transforms to convert metadata that follows other conventions (such as CF-1.1 and UDDv1) in the DDX into THREDDS namespace metadata and add it to the THREDDS catalog response.
  • Cache all DDXs along with the time of caching. Update only as required.
  • Use cached DDXs to formulate subsequent responses.

Semantic Inferencing

Pre-conditions:

  • Write inferencing rules that map metadata conventions (CF-1.1 and UDDv1 to begin with) to the THREDDS Digital Library Metadata Elements.

Operational Startup:

  • Ingest all of the DDXs in Hyrax into a semantic repository.
  • Run the repository rules to completion.
  • Trigger asynchronous updates.


Runtime:

  • Query the repository for catalog node contents.
  • Return the catalog.