Hyrax - THREDDS Configuration

From OPeNDAP Documentation
Revision as of 00:18, 28 June 2008 by Ndp


This release of Hyrax supports the complete THREDDS catalog service stack. THREDDS catalogs are controlled by a catalog.xml file located in the (persistent) content directory for the OLFS. Rather than provide an exhaustive explanation of THREDDS catalog functionality and configuration, I will defer to the existing documents provided by our fine colleagues at UNIDATA.

Did you read all that? Excellent!



1 Configuration Instructions

In order to get THREDDS catalogs working on Hyrax YOU MUST do at least the following:


  1. For each collection that appears in the top level of the OPeNDAP directory response (http://localhost:8080/opendap/contents.html if Hyrax is running out of the box on your machine) you MUST create a <datasetScan> in the catalog.xml file located in the persistent content directory ($CATALINA_HOME/content/opendap).

    The THREDDS catalog views will NOT include a collection for which this is not done!

  2. In each <datasetScan> element that you create you MUST use the following element: <crawlableDatasetImpl className="opendap.bes.BESCrawlableDataset" />. This is the only CrawlableDataset implementation available!

  3. No matter what your BES collection looks like, the location attribute of the <datasetScan> element must begin with "/bes". This is a THREDDS requirement.

  4. The serviceName attribute in the <datasetScan> element must be set to "OPeNDAP-Hyrax".

  5. The path attribute in the <datasetScan> element appears in the URL after the servlet name, and MUST be the same as the value of the location attribute with the leading "/bes/" removed. In other words, it MUST NOT start with a "/" character.

  6. You should apply a filter to the data that coincides with the value of "BES.Catalog.catalog.TypeMatch" for the data types being served. I suggest that you make the filter expose ALL of the data types served by the BES. See the THREDDS pages on the DatasetScan Element for filter details. The point of this is to remove files from the catalog view that are NOT OPeNDAP data, for example README files.

  7. Add metadata as you see fit. If you REALLY did read the THREDDS documentation above, you will already have a clue about this. If you haven't, DO SO NOW.
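
As a rough sketch of how the rules above fit together, the fragment below annotates a single <datasetScan> with the rule each part is meant to satisfy. It is intended to sit inside the <catalog> element of catalog.xml. The collection name "data" and the three file extensions in the filter are assumptions made for illustration; substitute the collections and data types that your own BES actually serves.

```xml
<!-- Rule 4: the service that the datasetScan refers to by name -->
<service name="OPeNDAP-Hyrax" serviceType="OPeNDAP" base="/opendap/"/>

<!-- Rule 1: one datasetScan per top-level BES collection ("data" is assumed here).
     Rule 3: location begins with "/bes".
     Rule 4: serviceName is "OPeNDAP-Hyrax".
     Rule 5: path is the location without the leading "/bes/". -->
<datasetScan location="/bes/data" path="data" name="AllMyData" serviceName="OPeNDAP-Hyrax">

    <!-- Rule 2: the CrawlableDataset implementation that works with the BES -->
    <crawlableDatasetImpl className="opendap.bes.BESCrawlableDataset" />

    <!-- Rule 6: hide dot-files, then include only the data types the BES serves.
         The extensions below are placeholders; match them to your TypeMatch setting. -->
    <filter>
        <exclude wildcard=".*" atomic="true" collection="true" />
        <include wildcard="*.nc" />
        <include wildcard="*.hdf" />
        <include wildcard="*.dat" />
    </filter>

    <!-- Rule 7: metadata is up to you -->
    <metadata inherited="true">
        <serviceName>OPeNDAP-Hyrax</serviceName>
    </metadata>
</datasetScan>
```

With a filter along these lines, a README file in the collection matches none of the include patterns and so should be left out of the catalog view.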



2 Reinitializing THREDDS

The THREDDS catalog is read when Tomcat is started. Hyrax will check the last modified date of the catalog.xml file prior to responding to a THREDDS catalog request. If the last modified date has changed since Tomcat started, then Hyrax will reload all of the THREDDS catalog information.

So if you make changes to ANY of the THREDDS catalog files in the $CATALINA_HOME/content/opendap directory tree, then there are two ways for you to get Hyrax to update:

  1. Change the last modified date of the file: $CATALINA_HOME/content/opendap/catalog.xml

    This can be accomplished with the Unix "touch" command:

        touch $CATALINA_HOME/content/opendap/catalog.xml

    This will cause Hyrax to reload all of the THREDDS catalogs the next time that a THREDDS catalog request is made. (You might want to make this request yourself if you have a big THREDDS catalog configuration, so that an unsuspecting user doesn't have to wait for a response while Hyrax is working.)

    OR you could:

  2. Restart Tomcat.



3 THREDDS Catalog Examples

3.1 Example 1

Here is an example catalog.xml file for a Hyrax installation in which the top level of the BES shows only ONE collection called "data":

<?xml version="1.0" encoding="UTF-8"?>
<catalog name="Hyrax Test Catalog"
         xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
         xmlns:xlink="http://www.w3.org/1999/xlink">

     <!-- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -->

     <service name="OPeNDAP-Hyrax" serviceType="OPeNDAP" base="/opendap/"/>

     <datasetScan location="/bes/data" path="data" name="AllMyData" serviceName="OPeNDAP-Hyrax">

           <crawlableDatasetImpl className="opendap.bes.BESCrawlableDataset" />

           <filter>
               <exclude wildcard=".*" atomic="true" collection="true" />
               <include wildcard="*" />
           </filter>
           <addDatasetSize />

           <metadata inherited="true">
               <serviceName>OPeNDAP-Hyrax</serviceName>
               <authority>opendap.org</authority>
               <dataType>Random</dataType>
           </metadata>
     </datasetScan>

     <!-- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -->
</catalog>

3.2 Example 2

Here is an example catalog.xml file for a Hyrax installation in which the top level of the BES contains three collections called "nc", "hdf", and "ff":

<?xml version="1.0" encoding="UTF-8"?>
<catalog name="Hyrax Test Catalog"
         xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
         xmlns:xlink="http://www.w3.org/1999/xlink">

     <service name="OPeNDAP-Hyrax" serviceType="OPeNDAP" base="/opendap/"/>

     <!-- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -->

     <datasetScan location="/bes/nc" path="nc" name="NetCDF Archive" serviceName="OPeNDAP-Hyrax">

           <crawlableDatasetImpl className="opendap.bes.BESCrawlableDataset" />

           <filter>
               <exclude wildcard=".*" atomic="true" collection="true" />
               <include wildcard="*.nc" />
           </filter>
           <addDatasetSize />

           <metadata inherited="true">
               <serviceName>OPeNDAP-Hyrax</serviceName>
               <authority>opendap.org</authority>
               <dataType>Random</dataType>
           </metadata>
     </datasetScan>

     <!-- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -->

     <datasetScan location="/bes/hdf" path="hdf" name="HDF Archive" serviceName="OPeNDAP-Hyrax">

           <crawlableDatasetImpl className="opendap.bes.BESCrawlableDataset" />

           <filter>
               <exclude wildcard=".*" atomic="true" collection="true" />
               <include wildcard="*.hdf" />
           </filter>
           <addDatasetSize />

           <metadata inherited="true">
               <serviceName>OPeNDAP-Hyrax</serviceName>
               <authority>opendap.org</authority>
               <dataType>Random</dataType>
           </metadata>
     </datasetScan>

     <!-- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -->

     <datasetScan location="/bes/ff" path="ff" name="FreeForm Archive" serviceName="OPeNDAP-Hyrax">

           <crawlableDatasetImpl className="opendap.bes.BESCrawlableDataset" />

           <filter>
               <exclude wildcard=".*" atomic="true" collection="true" />
               <include wildcard="*.dat" />
           </filter>
           <addDatasetSize />

           <metadata inherited="true">
               <serviceName>OPeNDAP-Hyrax</serviceName>
               <authority>opendap.org</authority>
               <dataType>Random</dataType>
           </metadata>
     </datasetScan>

     <!-- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -->
</catalog>



4 In Particular Note The Following

1. The line in which the CrawlableDataset implementation is defined:

    <crawlableDatasetImpl className="opendap.bes.BESCrawlableDataset" />

This identifies the correct CrawlableDataset class for Hyrax: the one that works with the BES to automatically generate catalogs.


2. In the <datasetScan> element the location attribute's value MUST begin with /bes. So, if the top level collection in the BES contains 4 sub collections they may each be identified using a separate <datasetScan> element like so:

	<datasetScan location="/bes/nc"  path="nc"  name="NetCDF Archive"   serviceName="OPeNDAP-Hyrax"> . . . </datasetScan>
	<datasetScan location="/bes/hdf" path="hdf" name="HDF Archive"      serviceName="OPeNDAP-Hyrax"> . . . </datasetScan>
	<datasetScan location="/bes/jg"  path="jg"  name="JGOFFS Archive"   serviceName="OPeNDAP-Hyrax"> . . . </datasetScan>
	<datasetScan location="/bes/ff"  path="ff"  name="FreeForm Archive" serviceName="OPeNDAP-Hyrax"> . . . </datasetScan>

Each <datasetScan> element may have its own filter and inheritance rules. You MUST NOT lump them all into one <datasetScan> element with one set of filter rules like so:

	<datasetScan location="/bes"     path="DATA"  name="AllYourDataAreUs"   serviceName="OPeNDAP-Hyrax"> . . . </datasetScan>

This does not work. If you want them all to be in a single collection, then configure the BES so that it has one top-level collection (see Example 1).


3. The path attribute in the <datasetScan> element appears in the URL after the servlet name, and MUST be the same as the value of the location attribute with the leading "/bes/" removed. In other words, it MUST NOT start with a "/" character.
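
To make the relationship between location and path concrete, here is a small sketch that contrasts a pairing which follows the rule with one that breaks it. The "nc" collection is borrowed from Example 2; the broken variant is kept inside a comment so that the fragment stays usable, and it is only an illustration of what to avoid.

```xml
<!-- Follows the rule: location "/bes/nc" with the leading "/bes/" removed gives path "nc" -->
<datasetScan location="/bes/nc" path="nc" name="NetCDF Archive" serviceName="OPeNDAP-Hyrax">
    <crawlableDatasetImpl className="opendap.bes.BESCrawlableDataset" />
</datasetScan>

<!-- Breaks the rule (do not use): the path starts with a "/" character
<datasetScan location="/bes/nc" path="/nc" name="NetCDF Archive" serviceName="OPeNDAP-Hyrax">
    <crawlableDatasetImpl className="opendap.bes.BESCrawlableDataset" />
</datasetScan>
-->
```

A quick check when editing catalog.xml by hand is to prepend "/bes/" to the path value; the result should match the location value exactly.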