THREDDS using XSLT

From OPeNDAP Documentation

1 Background

Prior to Hyrax 1.5, THREDDS catalog functionality in Hyrax was provided by an imported implementation of THREDDS. This was a large and complex dependency for Hyrax, and the implementation had significant scalability problems with large catalogs: catalogs with 20,000 or more entries would consume all available memory.

In response, we wrote new code for Hyrax that replaces the imported implementation with two OLFS handlers: the BES THREDDS Handler and the THREDDS Dispatch Handler.

2 BES THREDDS Handler

The opendap.bes.BESThreddsDispatchHandler provides THREDDS catalogs for all data served from a BES. It requires no configuration: simply adding it to the OLFS configuration file ($CATALINA_HOME/content/opendap/olfs.xml) provides THREDDS catalogs for the data served from the BES.

This handler uses XSL transforms to convert the BES <showCatalog> response into a THREDDS catalog.

2.1 Default Configuration

           <Handler className="opendap.bes.BESThreddsDispatchHandler" />

3 THREDDS Dispatch Handler

The opendap.threddsHandler.Dispatch handler provides THREDDS catalog functionality for static THREDDS catalogs located on the same system as the OLFS. The handler uses XSL transforms to provide HTML presentation views of both the catalogs and the individual datasets within them. Much like the THREDDS Data Server (TDS), it shows data access links on the dataset pages (if the catalog contains the information needed for those links).

3.1 Memory Caching

The implementation can be configured to use memory caching of THREDDS catalogs to improve speed and reduce disk thrashing.

When memory caching is enabled, the handler traverses the local THREDDS catalogs at startup. Each catalog file is read into a memory buffer and cached. The buffer is parsed to verify that the catalog is valid XML, but the resulting document is not saved. When a thredds:catalogRef element is encountered during the traversal, its href is evaluated:

  • If the href is a relative URL (it does not begin with "/" or "http://"), the referenced catalog is traversed and cached.
  • If the href begins with "/", it is assumed that the catalog is provided by another service on the same system, and it is not traversed or cached.
  • If the href begins with "http://", it is assumed to be a remotely hosted catalog provided by another service on a different system, and it is not traversed or cached.
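
The startup traversal and the href rules above can be sketched in Python as follows. This is a minimal illustration, not the OLFS code (which is Java): the names should_traverse and crawl, the read_bytes loader, and the use of catalog paths as cache keys are hypothetical. It assumes the catalogs use the THREDDS InvCatalog 1.0 namespace with an XLink href attribute on each catalogRef.

```python
import posixpath
import xml.etree.ElementTree as ET

# Namespaces assumed for the catalogs (THREDDS InvCatalog 1.0 and XLink).
THREDDS_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
XLINK_NS = "http://www.w3.org/1999/xlink"


def should_traverse(href):
    """Hypothetical helper applying the catalogRef href rules described above."""
    if href.startswith("/"):
        return False  # provided by another service on the same system
    if href.startswith("http://"):
        return False  # remote catalog hosted on a different system
    return True  # relative URL: traverse and cache


def crawl(path, read_bytes, cache=None):
    """Read, validate and cache `path` plus every relatively referenced catalog.

    `read_bytes` (hypothetical) maps a catalog path to the file's raw bytes.
    Returns a dict of catalog path -> raw bytes.
    """
    if cache is None:
        cache = {}
    if path in cache:
        return cache  # already visited; also stops reference cycles
    raw = read_bytes(path)
    root = ET.fromstring(raw)  # raises ET.ParseError if the file is not valid XML
    cache[path] = raw  # keep the raw buffer; the parsed tree is discarded
    for ref in root.iter(f"{{{THREDDS_NS}}}catalogRef"):
        href = ref.get(f"{{{XLINK_NS}}}href", "")
        if href and should_traverse(href):
            # Resolve the relative href against the referring catalog's directory.
            child = posixpath.normpath(posixpath.join(posixpath.dirname(path), href))
            crawl(child, read_bytes, cache)
    return cache
```

To read real files, read_bytes could be something like `lambda p: pathlib.Path(catalog_root, p).read_bytes()`, where catalog_root is whatever directory holds the static catalogs.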

When a client asks for an XML catalog response, the entire cached buffer for the catalog is sent to the client in a single write. This should be very fast, since the only work required is writing an existing byte buffer to the response stream.

If the client asks for an HTML view of the catalog, the buffer is parsed and passed through an XSL transform to generate the HTML page. The rationale is that machines traversing the XML catalogs need very fast response times, while for the humans browsing the HTML views, the latency added by parsing and transforming should be acceptable to most users.

If memory caching is disabled, startup proceeds the same way except that no data is cached. Subsequent client requests for THREDDS products are handled as before, but the catalog content is read from disk each time. XML responses will therefore be much slower, but this mode scales to much larger static catalog collections.
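
A rough sketch of the two response paths, with the memory cache switched on or off. The class CatalogServer, the read_bytes loader, and the to_html callable (standing in for the XSL transform) are hypothetical. Unlike the handler described above, this sketch fills its cache lazily on the first request rather than during the startup traversal.

```python
import xml.etree.ElementTree as ET


class CatalogServer:
    """Hypothetical sketch of the XML and HTML response paths."""

    def __init__(self, read_bytes, to_html, use_memory_cache=True):
        self.read_bytes = read_bytes  # catalog path -> raw bytes (e.g. from disk)
        self.to_html = to_html  # stands in for the XSL transform
        self.use_memory_cache = use_memory_cache
        self.cache = {}  # catalog path -> raw bytes

    def _catalog_bytes(self, path):
        if not self.use_memory_cache:
            return self.read_bytes(path)  # cache disabled: read from disk every time
        if path not in self.cache:
            self.cache[path] = self.read_bytes(path)  # filled lazily in this sketch
        return self.cache[path]

    def xml_response(self, path, out):
        # XML clients: the existing byte buffer is written in a single call.
        out.write(self._catalog_bytes(path))

    def html_response(self, path):
        # HTML clients: parse the buffer, then transform the parsed document.
        document = ET.fromstring(self._catalog_bytes(path))
        return self.to_html(document)
```

Here out can be any binary stream with a write method, such as io.BytesIO or a file opened in "wb" mode.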

3.1.1 Cache Updates

Each time a catalog request is processed, the source file's last-modified date is checked. If the catalog in memory was cached before that date, it and all of its descendants in the catalog hierarchy are purged from the cache and reloaded.
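
The update rule can be illustrated with a small, hypothetical cache that records when each catalog was cached and which catalogs it references. The CatalogCache name, the load callable (returning a catalog's bytes and its child catalog paths), and the injectable mtime and clock functions are assumptions made for this sketch.

```python
import os
import time


class CatalogCache:
    """Hypothetical cache applying the last-modified rule described above."""

    def __init__(self, load, mtime=os.path.getmtime, clock=time.time):
        self.load = load  # path -> (raw bytes, list of child catalog paths)
        self.mtime = mtime  # path -> last-modified time of the source file
        self.clock = clock  # current time, recorded when an entry is cached
        self.entries = {}  # path -> (cached_at, raw bytes)
        self.children = {}  # path -> set of child catalog paths

    def _cache_tree(self, path):
        # Load a catalog and, recursively, the catalogs it references.
        raw, kids = self.load(path)
        self.entries[path] = (self.clock(), raw)
        self.children[path] = set(kids)
        for kid in kids:
            if kid not in self.entries:
                self._cache_tree(kid)

    def _purge(self, path):
        # Drop a catalog and all of its descendants from the cache.
        self.entries.pop(path, None)
        for kid in self.children.pop(path, set()):
            self._purge(kid)

    def get(self, path):
        entry = self.entries.get(path)
        if entry is None or entry[0] < self.mtime(path):
            # Not cached yet, or cached before the file was last modified.
            self._purge(path)
            self._cache_tree(path)
            entry = self.entries[path]
        return entry[1]
```

Popping each entry's child set before recursing keeps both the purge and the reload from looping forever if two catalogs reference each other.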

3.2 prefix

This handler requires a prefix element in its configuration, for example <prefix>thredds</prefix>. The handler uses the value of the prefix element to identify requests intended for it: it claims any request whose path begins with the prefix.

For example, if the prefix is set to "thredds", then this request:

http://localhost:8080/opendap/thredds/catalog.xml

will be claimed by the handler, while this request:

http://localhost:8080/opendap/catalog.xml

will not (although it would be claimed by the BES THREDDS Handler).
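
The claim test can be sketched as a string-prefix check on the request path. This assumes, based on the example URLs, that the path is compared after the /opendap context path; the function name claims and its parameters are hypothetical.

```python
from urllib.parse import urlsplit


def claims(url, prefix="thredds", context_path="/opendap"):
    """Hypothetical claim test: does the request path begin with `prefix`?"""
    path = urlsplit(url).path  # e.g. "/opendap/thredds/catalog.xml"
    if not path.startswith(context_path + "/"):
        return False  # outside the assumed OLFS context path
    relative = path[len(context_path) + 1:]  # e.g. "thredds/catalog.xml"
    return relative.startswith(prefix)
```

Because this sketch uses a plain prefix test, it would also claim a path such as /opendap/thredds2/catalog.xml.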

3.3 Presentation View (HTML)

Replacing the .xml suffix of a catalog's name with .html causes opendap.threddsHandler.Dispatch to return an HTML presentation view of the catalog. This is accomplished by parsing the catalog XML document (from memory if cached, otherwise from disk) and running the resulting document through an XSL transform. The metadata for each thredds:dataset element can be inspected on a separate HTML page that details that dataset; this page is also generated by an XSL transform applied to the catalog XML document.
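
A minimal sketch of how a request name could be mapped to the catalog document to load and the view to produce. The function name resolve_view and its return convention are hypothetical, and the sketch covers only the catalog view, not the per-dataset pages.

```python
def resolve_view(request_path):
    """Hypothetical mapping of a request to (catalog XML path, view)."""
    if request_path.endswith(".html"):
        # Presentation view: load the matching .xml catalog, then transform it.
        return request_path[: -len(".html")] + ".xml", "html"
    if request_path.endswith(".xml"):
        return request_path, "xml"  # raw catalog response
    return None  # not a catalog request in this sketch
```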

3.4 Default Configuration

           <Handler className="opendap.threddsHandler.Dispatch">
               <prefix>thredds</prefix>
               <useMemoryCache>true</useMemoryCache>
           </Handler>