THREDDS Catalog Metadata

From OPeNDAP Documentation

Overview

Goal
Improved THREDDS catalog responses from dynamically generated (BES) catalogs.


THREDDS catalogs typically contain metadata beyond the minimum required to list the catalog's holdings. This additional metadata often consists of Digital Library Metadata Elements. The Hyrax server cannot currently produce THREDDS catalogs with this kind of metadata. Because these metadata elements provide crucial catalog content for dataset discovery, Hyrax needs to be improved to add this functionality.

The Problem

A section of this THREDDS catalog, http://blackburn.whoi.edu:8081/thredds/bathy_catalog.xml, shows the Digital Library Metadata Elements:

   <dataset name="USGS Vineyard Sound Relief Model (1 arc sec)" ID="bathy/vs_1sec_20070725.nc"
            urlPath="bathy/vs_1sec_20070725.nc">
       <serviceName>Compound</serviceName>

       <authority>gov.usgs.er.whsc</authority>
       <dataType>GRID</dataType>
       <dataFormat>NetCDF</dataFormat>
       <documentation xlink:href="http://stellwagen.er.usgs.gov/models/grids/CGSherwo.doc"
                      xlink:title="USGS Vineyard Sound Coastal Relief Model (1 arc second)"/>
       <creator>
           <name vocabulary="DIF">WHSC/USGS</name>
           <contact url="http://www.usgs.gov/" email="rsignell@usgs.gov"/>
       </creator>
       <publisher>
           <name vocabulary="DIF">WHSC/USGS</name>
           <contact url="http://www.usgs.gov/" email="rsignell@usgs.gov"/>
       </publisher>
       <geospatialCoverage>
           <northsouth>
               <start>41.0</start>
               <size>1.4</size>
               <units>degrees_north</units>
           </northsouth>
           <eastwest>
               <start>-71.2</start>
               <size>0.5</size>
               <units>degrees_east</units>
           </eastwest>
           <updown>
               <start>-277.25</start>
               <size>459.6</size>
               <units>meters</units>
           </updown>
       </geospatialCoverage>

   </dataset>


A dataset element in a THREDDS catalog from Hyrax, generated from a BES showCatalog request, contains none of the Digital Library Metadata Elements:

   <dataset name="200803061600_HFRadar_USEGC_6km_rtv_SIO.ncml" ID="netcdf/examples/200803061600_HFRadar_USEGC_6km_rtv_SIO.ncml">
     <dataSize units="bytes">1162</dataSize>
     <date type="modified">2010-02-22T23:19:51</date>
     <access serviceName="dap" urlPath="netcdf/examples/200803061600_HFRadar_USEGC_6km_rtv_SIO.ncml" />
   </dataset>


However, the DDX for the dataset in question contains a significant amount of metadata. This includes DAP attributes (associated with the Unidata Dataset Discovery convention) that relate to geospatial extents:

       <Attribute name="geospatial_lat_min" type="Float32">
           <value>21.73596001</value>
       </Attribute>
       <Attribute name="geospatial_lat_max" type="Float32">
           <value>46.49441910</value>
       </Attribute>
       <Attribute name="geospatial_lon_min" type="Float32">
           <value>-97.88385010</value>
       </Attribute>
       <Attribute name="geospatial_lon_max" type="Float32">
           <value>-57.23120880</value>
       </Attribute>

How can we add the Digital Library Metadata Elements content to Hyrax THREDDS catalogs?

Can we capitalize on existing metadata to build Digital Library Metadata Element content?
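
As a rough illustration of the second question, the four geospatial attributes shown above carry enough information to fill the northsouth and eastwest parts of a geospatialCoverage element: start is the minimum and size is the maximum minus the minimum. The sketch below assumes Python's standard library only; the function names are hypothetical, the DDX fragment is reduced to the attributes from the excerpt, and the units strings are copied from the THREDDS example earlier on this page.

```python
import xml.etree.ElementTree as ET

# Reduced DDX fragment holding only the attributes quoted in the excerpt above.
DDX_FRAGMENT = """
<Dataset>
  <Attribute name="geospatial_lat_min" type="Float32"><value>21.73596001</value></Attribute>
  <Attribute name="geospatial_lat_max" type="Float32"><value>46.49441910</value></Attribute>
  <Attribute name="geospatial_lon_min" type="Float32"><value>-97.88385010</value></Attribute>
  <Attribute name="geospatial_lon_max" type="Float32"><value>-57.23120880</value></Attribute>
</Dataset>
"""


def local_name(tag):
    """Strip any '{namespace}' prefix so the code works with or without one."""
    return tag.rsplit("}", 1)[-1]


def read_attributes(ddx_root):
    """Collect name -> value text for every Attribute element in the DDX."""
    attrs = {}
    for el in ddx_root.iter():
        if local_name(el.tag) != "Attribute":
            continue
        for child in el:
            if local_name(child.tag) == "value" and child.text:
                attrs[el.get("name")] = child.text.strip()
    return attrs


def build_geospatial_coverage(attrs):
    """Build a geospatialCoverage element from min/max attribute pairs."""
    coverage = ET.Element("geospatialCoverage")
    axes = (
        ("northsouth", "geospatial_lat", "degrees_north"),
        ("eastwest", "geospatial_lon", "degrees_east"),
    )
    for tag, prefix, units in axes:
        lo = attrs.get(prefix + "_min")
        hi = attrs.get(prefix + "_max")
        if lo is None or hi is None:
            continue  # skip an axis whose extents are not both present
        axis = ET.SubElement(coverage, tag)
        ET.SubElement(axis, "start").text = lo
        ET.SubElement(axis, "size").text = format(float(hi) - float(lo), ".8f")
        ET.SubElement(axis, "units").text = units
    return coverage


coverage = build_geospatial_coverage(read_attributes(ET.fromstring(DDX_FRAGMENT.strip())))
print(ET.tostring(coverage, encoding="unicode"))
```

The updown axis is left out because the excerpt shows no vertical extent attributes; a dataset that supplies them could be handled by adding a third entry to the axes tuple.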

THREDDS Background Information

Existing THREDDS/NCML Implementations

How are people using THREDDS?

How does the TDS use THREDDS and NcML?

Possible Designs

Explicit Injection and XSLT

Pre-conditions:

  • Write NcML for each dataset to place the desired Digital Library Metadata Elements XML directly into the DDX.

Runtime:

  • Use XSLT transforms to extract the THREDDS namespace metadata from the DDX and add it to the THREDDS catalog response.
  • Cache all DDXs along with the time of caching. Update only as required.
  • Use cached DDXs to formulate subsequent responses.
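
The extraction step in this design can be sketched as follows. The design calls for XSLT; Python's standard library is used here only to keep the example runnable, and the logic (copy every element in the THREDDS namespace out of the DDX and append it to the catalog's dataset element) is the same. The function names, the attribute name thredds_metadata, and the way the injected XML is wrapped inside the DDX are all assumptions made for illustration.

```python
import copy
import xml.etree.ElementTree as ET

# Namespace of THREDDS InvCatalog 1.0 elements.
THREDDS_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"

# Hypothetical DDX after NcML injection; the wrapper attribute is an assumption.
SAMPLE_DDX = """
<Dataset xmlns:thredds="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0">
  <Attribute name="thredds_metadata" type="OtherXML">
    <thredds:authority>gov.usgs.er.whsc</thredds:authority>
    <thredds:dataType>GRID</thredds:dataType>
  </Attribute>
</Dataset>
"""


def thredds_elements(ddx_root):
    """Return copies of the outermost THREDDS-namespace elements in a DDX."""
    found = []

    def walk(el):
        for child in el:
            if child.tag.startswith("{" + THREDDS_NS + "}"):
                found.append(copy.deepcopy(child))  # keep the whole subtree
            else:
                walk(child)  # keep looking inside non-THREDDS containers

    walk(ddx_root)
    return found


def inject(dataset_el, ddx_root):
    """Append the DDX's THREDDS metadata to a catalog dataset element."""
    for el in thredds_elements(ddx_root):
        dataset_el.append(el)
    return dataset_el


dataset = ET.Element("{%s}dataset" % THREDDS_NS, name="example.nc")
inject(dataset, ET.fromstring(SAMPLE_DDX.strip()))
print(ET.tostring(dataset, encoding="unicode"))
```

Copying only the outermost matching elements keeps nested structures such as creator or geospatialCoverage intact instead of flattening their children into the dataset element.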


XSLT semantic mappings

Runtime:

  • Use XSLT transforms to convert metadata expressed in other conventions (such as CF-1.1 and UDDv1) in the DDX into THREDDS namespace metadata and add it to the THREDDS catalog response.
  • Cache all DDXs along with the time of caching. Update only as required.
  • Use cached DDXs to formulate subsequent responses.
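
The caching bullets, which both XSLT designs share, might look like the sketch below. It assumes that "as required" means the dataset's modification time (as reported in the catalog's date element) is not older than the time the DDX was cached. The class name, the fetch callable, and the use of numeric timestamps are hypothetical.

```python
import time


class DDXCache:
    """Keep each DDX with the time it was cached; refetch only when stale."""

    def __init__(self, fetch, clock=time.time):
        self._fetch = fetch    # hypothetical callable: dataset id -> DDX text
        self._clock = clock    # injectable so the cache can be tested
        self._entries = {}     # dataset id -> (cached_at, ddx_text)

    def get(self, dataset_id, last_modified):
        """Return the DDX, refetching if the dataset changed since caching."""
        entry = self._entries.get(dataset_id)
        if entry is None or entry[0] <= last_modified:
            entry = (self._clock(), self._fetch(dataset_id))
            self._entries[dataset_id] = entry
        return entry[1]


# Usage with a stand-in fetch function that records each call.
calls = []
cache = DDXCache(lambda ds_id: calls.append(ds_id) or "<Dataset/>")
cache.get("netcdf/examples/example.ncml", last_modified=0)
cache.get("netcdf/examples/example.ncml", last_modified=0)
print(len(calls))  # the second get is served from the cache
```

Comparing against the modification time, rather than expiring entries after a fixed interval, means an unchanged dataset never triggers a second DDX request.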

Semantic Inferencing

Pre-conditions:

  • Write inferencing rules that map metadata conventions (CF-1.1 and UDDv1 to begin with) to the THREDDS Digital Library Metadata Elements.

Operational Startup:

  • Ingest all of the DDXs in Hyrax into a semantic repository.
  • Run the repository rules to completion.
  • Trigger asynchronous updates.


Runtime:

  • Query the repository for catalog node contents.
  • Return the catalog.
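
To make "run the repository rules to completion" concrete, here is a toy forward-chaining loop over (subject, predicate, object) facts. It is a sketch of the idea only and does not stand in for any particular semantic repository; the predicate names and both rules are hypothetical.

```python
def run_to_completion(facts, rules):
    """Apply every rule repeatedly until no rule derives a new fact."""
    facts = set(facts)
    while True:
        new = set()
        for rule in rules:
            new |= rule(facts) - facts
        if not new:
            return facts
        facts |= new


def northsouth_rule(facts):
    """Derive start and size from a latitude min/max pair (hypothetical rule)."""
    values = {(s, p): o for s, p, o in facts}
    derived = set()
    for (s, p), lo in values.items():
        hi = values.get((s, "geospatial_lat_max"))
        if p == "geospatial_lat_min" and hi is not None:
            derived.add((s, "northsouth_start", lo))
            derived.add((s, "northsouth_size", round(hi - lo, 8)))
    return derived


def coverage_rule(facts):
    """Mark datasets that now have a derived north-south size (hypothetical rule)."""
    return {(s, "has", "geospatialCoverage") for s, p, _ in facts if p == "northsouth_size"}


def catalog_node(facts, dataset_id):
    """Runtime query: everything known about one dataset."""
    return {p: o for s, p, o in facts if s == dataset_id}


ingested = {
    ("example.ncml", "geospatial_lat_min", 21.73596001),
    ("example.ncml", "geospatial_lat_max", 46.49441910),
}
# coverage_rule is listed first, so it only fires on the second pass.
repository = run_to_completion(ingested, [coverage_rule, northsouth_rule])
print(catalog_node(repository, "example.ncml"))
```

Because the loop stops only when a full pass adds nothing, the order in which rules are listed does not affect the final set of facts.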