Hyrax - OLFS Configuration

From OPeNDAP Documentation
Revision as of 20:28, 20 April 2007 by Ndp (talk | contribs) (→‎olfs.xml)

Attention! Major changes have been made to the format and content of the olfs.xml file since OLFS version 1.1.1. Read this document carefully so that you may fully understand how to configure this new version of the OLFS.


This document should help you get started configuring the OLFS component of Hyrax. This servlet was developed, compiled, and tested using the Java 1.5.0 compiler, the 1.5.0 Java Virtual Machine, and Jakarta Tomcat 5.5 (which also provided the javax.servlet packages).

Note: All examples of web.xml configurations in this document were checked against Jakarta Tomcat 5.5.0.

The OLFS web application is composed of two servlets, the OLFS servlet and the Docs servlet.

  • The OLFS servlet does the majority of the work in the OLFS web application. It does this by providing a flexible "dispatch" mechanism through which incoming requests are evaluated by a series of DispatchHandlers, each of which can choose to handle the request or ignore it. The OLFS ships with a standard set of DispatchHandlers which handle requests for OPeNDAP data products, THREDDS catalogs, and OPeNDAP directories. These default DispatchHandlers can be augmented by adding custom handlers without the need to recompile the software. All of the DispatchHandlers used by the OLFS are identified in the olfs.xml configuration file.
  • The Docs servlet provides clients access to a tree of static documents. By default a minimal set of documents is provided (containing information about Hyrax); these can be replaced by user-supplied documents and images. By changing the images and documents available through the Docs servlet, the data provider can further customize their Hyrax installation.



OLFS Servlet Configuration

The OLFS servlet is the front end (public interface) for Hyrax. It provides THREDDS catalogs, directory views, logging, and authentication services. The OLFS relies on one or more instances of the BES to provide it with data access and basic catalog metadata.

Dispatch Handlers

Files

The OLFS servlet gets its configuration from four files. In general, all of your configuration needs will be met by making changes to the first two, olfs.xml and catalog.xml, which are located in the 'persistent content directory': $CATALINA_HOME/content/opendap

olfs.xml
role: Contains the localized OLFS configuration - location of the BES(s), directory view instructions, etc.
location: In the persistent content directory which by default is located at $CATALINA_HOME/content/opendap/
catalog.xml
role: THREDDS catalog configuration.
location: In the persistent content directory which by default is located at $CATALINA_HOME/content/opendap/
web.xml
role: Core servlet configuration.
location: The servlet's web.xml file located in the WEB-INF directory of the web application "opendap". Typically that means $CATALINA_HOME/webapps/opendap/WEB-INF/web.xml
log4j.xml
role: Contains the logging configuration for Hyrax.
location: The default location for the log4j.xml is in the WEB-INF directory of the web application "opendap". Typically that means $CATALINA_HOME/webapps/opendap/WEB-INF/log4j.xml. However, Hyrax can be configured to look in additional places for the log4j.xml file. Read More About It Here.

olfs.xml


The olfs.xml file contains the core configuration of the OLFS. It identifies all of the DispatchHandlers to be used by the OLFS, it must identify at least one BES to be paired with the OLFS, and it controls both the view and access behaviours of the OLFS servlet.

OLFSConfig element

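The <OLFSConfig> element is the document root. In the default olfs.xml file shown later in this section, its only child is the <DispatchHandlers> element; the BES connection, directory view, and direct data source access settings are supplied as child elements of individual <Handler> elements.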

<DispatchHandlers> element

The <DispatchHandlers> element has two child elements: <HttpGetHandlers> and <HttpPostHandlers>. Each contains an ordered list of the DispatchHandler classes used by the OLFS to handle incoming HTTP GET and HTTP POST requests, respectively.


<HttpGetHandlers> element

The <HttpGetHandlers> element contains an ordered list of the DispatchHandler classes used by the OLFS to handle incoming HTTP GET requests. The list order is significant, and changing the order will change (probably for the worse) the behavior of the OLFS. Each DispatchHandler on the list will be asked, in list order, to handle the request. The first DispatchHandler on the list to claim the request will be asked to build the response.


<HttpPostHandlers> element

The <HttpPostHandlers> element contains an ordered list of the DispatchHandler classes used by the OLFS to handle incoming HTTP POST requests. The list order is significant, and changing the order will change (probably for the worse) the behavior of the OLFS. Each DispatchHandler on the list will be asked, in list order, to handle the request. The first DispatchHandler on the list to claim the request will be asked to build the response.


<Handler> elements

Both the <HttpGetHandlers> and <HttpPostHandlers> elements contain an ordered list of <Handler> elements. Each <Handler> must have an attribute called className whose value is set to the fully qualified Java class name of the DispatchHandler implementation to be used. For example:

           <Handler className="opendap.bes.VersionDispatchHandler" />

This names the class opendap.bes.VersionDispatchHandler as the handler implementation.

Each <Handler> element may contain a collection of child elements that provide configuration information to the DispatchHandler implementation. In this example:

           <Handler className="opendap.bes.DirectoryDispatchHandler">
               <DefaultDirectoryView>OPeNDAP</DefaultDirectoryView>
           </Handler>

The <Handler> element contains a child element, <DefaultDirectoryView>, that tells the DirectoryDispatchHandler class which directory view Hyrax should use by default.



<BES> element

The <BES> element provides the OLFS with connection and control information for a BES. There are 4 child elements in a <BES> element: <prefix>, <host>, <port>, and <ClientPool>

<prefix> element

This element contains the path prefix that the OLFS will associate with this BES. This provides a mapping from each BES connected to the OLFS to the URI space serviced by the OLFS.

  1. Exactly one <BES> must have a prefix whose value is "/" (see example 1). There may be more than one <BES>, but there must be at least that one.
  2. For a single BES (the one with "/" as its prefix) no additional effort is required. However, when using multiple BESs it is necessary that each BES have a mount point exposed as a directory (aka collection) in the URI space where it is going to appear. See Configuring With Multiple BES's for more information.

example 1:

<prefix>/</prefix>

example 2:

<prefix>/data/nc</prefix>

<host> element

This element contains the host name or IP address of the BES.

example:

<host>test.opendap.org</host>

<port> element

The port number on which the BES is listening.

example:

<port>10002</port>


<ClientPool> element

Configures the behavior of the pool of client connections that the OLFS maintains with the BES. These connections are pooled for efficiency and speed. Currently the only configuration item available is the maximum number of concurrent BES client connections that the OLFS may make, which is set with the maximum attribute. The default is 10, but the size should be tuned for your site by empirical testing.

example:

<ClientPool maximum="17" />

If the <ClientPool> element is missing the pool size defaults to 10.
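
As an illustration of the prefix rules above, a two-BES configuration might look like the following sketch. It assumes that additional <BES> elements are simply listed side by side inside the opendap.bes.BESManager <Handler> used in the default olfs.xml file; the second host name is hypothetical. See Configuring With Multiple BES's for the authoritative procedure.

    <Handler className="opendap.bes.BESManager">

        <!-- Required: the one BES whose prefix is "/" -->
        <BES>
            <prefix>/</prefix>
            <host>localhost</host>
            <port>10002</port>
            <ClientPool maximum="10" />
        </BES>

        <!-- Hypothetical second BES, mounted at /data/nc -->
        <BES>
            <prefix>/data/nc</prefix>
            <host>bes2.example.org</host>
            <port>10002</port>
            <ClientPool maximum="10" />
        </BES>

    </Handler>

In this sketch the /data/nc mount point would also have to be exposed as a directory in the URI space, as described in rule 2 of the <prefix> section.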

<DirectoryView> element

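The <DirectoryView> element is used to indicate which type of directory view clients see by default. In the default olfs.xml file below, this setting is written as the <DefaultDirectoryView> child of the opendap.bes.DirectoryDispatchHandler <Handler>. This DOES NOT affect the THREDDS catalogs, only the HTML views of them. A value of "THREDDS" will provide the THREDDS directory view and a value of "OPeNDAP" will produce the OPeNDAP directory view.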
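
As a minimal sketch, switching the default HTML directory view to THREDDS might look like the following. It assumes the setting is written as the <DefaultDirectoryView> child of the DirectoryDispatchHandler <Handler>, as it is in the default olfs.xml file below.

    <Handler className="opendap.bes.DirectoryDispatchHandler">
        <!-- "THREDDS" or "OPeNDAP"; the shipped default is "OPeNDAP" -->
        <DefaultDirectoryView>THREDDS</DefaultDirectoryView>
    </Handler>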


<AllowDirectDataSourceAccess> element

The <AllowDirectDataSourceAccess /> element controls the user's ability to directly access data sources via the web interface. If this element is present in the olfs.xml file (and not commented out, as it is in the default file below), a client can get an entire data source (such as an HDF file) by simply requesting it through the HTTP URL interface. This is NOT a good practice and is not recommended. By default Hyrax ships with this option turned off, and we recommend that you leave it that way unless you really want users to be able to circumvent the OPeNDAP request interface and have direct access to the data products stored on your server.
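
For comparison with the commented-out form in the default file, a sketch of the FileDispatchHandler entry with direct access turned on would be the following. It is the same <Handler> as in the default olfs.xml file with the comment markers around the element removed.

    <Handler className="opendap.bes.FileDispatchHandler">
        <!-- Present and uncommented: clients may download whole data source files -->
        <AllowDirectDataSourceAccess />
    </Handler>

Removing the element, or commenting it out again, restores the shipped behavior.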


Default olfs.xml file

<?xml version="1.0" encoding="UTF-8"?>
<OLFSConfig>

    <DispatchHandlers>

        <HttpGetHandlers>

            <Handler className="opendap.bes.BESManager">

                <BES>
                    <!-- The path prefix for this BES -->
                    <prefix>/</prefix>

                    <!-- The hostname (or IP address) for this BES -->
                    <host>localhost</host>

                    <!-- The port number for this BES -->
                    <port>10002</port>

                    <!-- The ClientPool maximum number of concurrent
                      -  BES client connections allowed.
                      -->
                    <ClientPool maximum="10" />
                </BES>

            </Handler>

            <Handler className="opendap.coreServlet.SpecialRequestDispatchHandler" />
            
            <Handler className="opendap.bes.VersionDispatchHandler" />

            <Handler className="opendap.bes.DirectoryDispatchHandler">
                <!-- DirectoryView:
                  - Used to indicate the default directory view clients see. This DOES NOT
                  - affect the THREDDS catalogs! Only the HTML views of them. A value
                  - of "THREDDS" will provide the THREDDS directory view and a value
                  - of "OPeNDAP" will produce the OPeNDAP directory view for URL's
                  - that end with a "/".
                  -->
                <DefaultDirectoryView>OPeNDAP</DefaultDirectoryView>
            </Handler>

            <Handler className="opendap.bes.DapDispatchHandler" />

            <Handler className="opendap.bes.FileDispatchHandler" >
                <!-- AllowDirectDataSourceAccess
                  - If this element is present then the server will allow users to request
                  - the data source (file) directly. For example a user could just get the
                  - underlying NetCDF files located on the server without using the OPeNDAP
                  - request interface.
                  -
                  - THINK TWICE before allowing this, as data sources can be quite large
                  - and allowing their transmission without subsetting can put heavy loads
                  - on the network and the server.
                  -->
                <!-- <AllowDirectDataSourceAccess /> -->
            </Handler>

            <Handler className="opendap.bes.ThreddsGetDispatchHandler" />

        </HttpGetHandlers>

        <HttpPostHandlers>
            <Handler className="opendap.coreServlet.SOAPRequestDispatcher" >
                <OpendapSoapDispatchHandler>opendap.bes.SoapDispatchHandler</OpendapSoapDispatchHandler>
            </Handler>
        </HttpPostHandlers>

    </DispatchHandlers>

</OLFSConfig>

Note that the file is much easier to read with the comments removed:

<?xml version="1.0" encoding="UTF-8"?>
<OLFSConfig>

    <DispatchHandlers>

        <HttpGetHandlers>

            <Handler className="opendap.bes.BESManager">

                <BES>
                    <prefix>/</prefix>
                    <host>localhost</host>
                    <port>10002</port>
                    <ClientPool maximum="10" />
                </BES>

            </Handler>

            <Handler className="opendap.coreServlet.SpecialRequestDispatchHandler" />
            
            <Handler className="opendap.bes.VersionDispatchHandler" />

            <Handler className="opendap.bes.DirectoryDispatchHandler">
                <DefaultDirectoryView>OPeNDAP</DefaultDirectoryView>
            </Handler>

            <Handler className="opendap.bes.DapDispatchHandler" />

            <Handler className="opendap.bes.FileDispatchHandler" >
                <!-- <AllowDirectDataSourceAccess /> -->
            </Handler>

            <Handler className="opendap.bes.ThreddsGetDispatchHandler" />

        </HttpGetHandlers>

        <HttpPostHandlers>
            <Handler className="opendap.coreServlet.SOAPRequestDispatcher" >
                <OpendapSoapDispatchHandler>opendap.bes.SoapDispatchHandler</OpendapSoapDispatchHandler>
            </Handler>
        </HttpPostHandlers>

    </DispatchHandlers>

</OLFSConfig>

catalog.xml

The catalog.xml file contains the THREDDS catalog configuration for Hyrax. It's complex. Read About It Here.

log4j.xml

The log4j.xml file contains the logging configuration for Hyrax. It too is complex. Read About It Here.

web.xml

We strongly recommend that you do NOT mess with the web.xml file. At least for now. Future versions of Server and the OLFS may have "user configurable" stuff in the web.xml file, but this version does not. SO JUST DON'T DO IT. OK? Having said that, here are the details regarding the web.xml file:


Servlet Definition

The OLFS running in the opendap context area needs an entry in the web.xml file. Multiple instances of a servlet and/or several different servlets can be configured in the one web.xml file. For instance, you could have a DTS and a Hyrax running from the same web.xml and thus under the same servlet context. Running multiple instances of the OLFS in a single web.xml file (aka context) will NOT work.

Each servlet needs a unique name, which is specified inside a <servlet> element in the web.xml file using the <servlet-name> tag. This is a name of convenience; for example, if you were serving data from an ARGOS satellite you might call that servlet argos.

Additionally, each instance of a <servlet> must specify which Java class contains the actual servlet to run. This is done in the <servlet-class> element. For example, the OLFS servlet class name is opendap.coreServlet.DispatchServlet.

Here is a syntax example combining the two previous example values:

<servlet>
	<servlet-name>argos</servlet-name>
	<servlet-class>opendap.coreServlet.DispatchServlet</servlet-class>
	.
	.
	.
</servlet>

This servlet could then be accessed as: http://hostname/opendap/servlet/argos

You may also add to the end of the web.xml file a set of <servlet-mapping> elements. These allow you to abbreviate the URL of the servlet. By placing the servlet mappings:

<servlet-mapping>
    <servlet-name>argos</servlet-name>
    <url-pattern>/argos</url-pattern>
</servlet-mapping>

<servlet-mapping>
    <servlet-name>argos</servlet-name>
    <url-pattern>/argos/*</url-pattern>
</servlet-mapping>

At the end of the web.xml file our previous example changes its URL to: http://hostname/opendap/argos

Eliminating the need for the word servlet in the URL. For more on the <servlet-mapping> element see the Jakarta-Tomcat documentation.

<init-param> Elements

The OLFS uses <init-param> elements inside of each <servlet> element to get specific configuration information.

<init-param>'s common to all OPeNDAP servlets are:


DebugOn

This controls output to the terminal from which the servlet engine was launched. The value is a list of flags that turn on debugging instrumentation in different parts of the code. Supported values are:

  • probeRequest: Prints a lengthy inspection of the HttpServletRequest object to stdout. Don't leave this on for long, it will clog your Catalina logs.
  • DebugInterface: Enables the server's debug interface. This interactive interface allows a user to look at (and change) the server state via a web browser. Enable this only for analysis purposes, and disable it when finished!

Example:

    <init-param>
	<param-name>DebugOn</param-name>
	<param-value>probeRequest</param-value>
    </init-param>

Default: If this parameter is not set, or the value field is empty then these features will be disabled - which is what you want unless there is a problem to analyze.



OpendapHttpDispatchHandlerImplementation

This parameter specifies the handler implementation that provides the responses for the HTTP GET commands to the servlet framework. Don't be messin' with this! It's what makes Hyrax, well, Hyrax.

Example:

    <init-param>
        <param-name>OpendapHttpDispatchHandlerImplementation</param-name>
        <param-value>opendap.bes.HttpDispatchHandler</param-value>
    </init-param>

Default: This parameter must be set to the value opendap.bes.HttpDispatchHandler for the server to actually be an instance of Hyrax.



OpendapSoapDispatchHandlerImplementation

This parameter specifies the handler implementation that provides the responses for the HTTP POST commands (via a SOAP interface) to the servlet framework. Don't be messin' with this! It's what makes Hyrax, well, Hyrax.

Example:

    <init-param>
        <param-name>OpendapSoapDispatchHandlerImplementation</param-name>
        <param-value>opendap.bes.SoapDispatchHandler</param-value>
    </init-param>

Default: This parameter must be set to the value opendap.bes.SoapDispatchHandler for the server to actually be an instance of Hyrax.

Example of web.xml content

<servlet>

    <servlet-name>hyrax</servlet-name>

    <servlet-class>opendap.coreServlet.DispatchServlet</servlet-class>

    <init-param>
        <param-name>OpendapHttpDispatchHandlerImplementation</param-name>
        <param-value>opendap.bes.HttpDispatchHandler</param-value>
    </init-param>

    <init-param>
        <param-name>OpendapSoapDispatchHandlerImplementation</param-name>
        <param-value>opendap.bes.SoapDispatchHandler</param-value>
    </init-param>

    <init-param>
        <param-name>DebugOn</param-name>
        <param-value></param-value>
    </init-param>

    <load-on-startup>1</load-on-startup>

</servlet>

<servlet-mapping>
    <servlet-name>hyrax</servlet-name>
    <url-pattern>*</url-pattern>
</servlet-mapping>

<servlet-mapping>
    <servlet-name>hyrax</servlet-name>
    <url-pattern>/hyrax</url-pattern>
</servlet-mapping>

<servlet-mapping>
    <servlet-name>hyrax</servlet-name>
    <url-pattern>/hyrax/*</url-pattern>
</servlet-mapping>



Docs Servlet

The Docs (or documentation) servlet provides the OLFS web application with the ability to serve a tree of static documentation files. By default it will serve the files in the documentation tree provided with the OLFS in the Hyrax distribution. This tree is rooted at $CATALINA_HOME/webapps/opendap/docs/ and contains documentation pertaining to the software in the Hyrax distribution: installation and configuration instructions, release notes, Javadocs, etc.

If one wished to supplant this information with their own set of web pages, one could remove/replace the files in the default directory. However, installing a new version of Hyrax will cause these files to be overwritten, forcing them to be replaced after the install (and hopefully AFTER the new release documentation had been read and understood by the user).

The Docs servlet provides an alternative to this. If a docs directory is created in the persistent content directory for Hyrax the Docs servlet will detect it (when Tomcat is launched) and it will serve files from there instead of from the default location.

This scheme provides 2 beneficial effects:

  1. It allows localizations of the web documents associated with Hyrax to persist through Hyrax upgrades with no user intervention.
  2. It preserves important release documents that ship with the Hyrax software.

In summary, to provide persistent web pages as part of a Hyrax localization simply create the directory: $CATALINA_HOME/content/opendap/docs

Place your content in there and away you go. If later you wish to view the web based documentation bundled with Hyrax simply change the name of the directory from docs to something else and restart Tomcat. (or, you could just look in the $CATALINA_HOME/webapps/opendap/docs directory)

In the Docs servlet, if a URL ends in a directory name or a "/" then the servlet will attempt to serve the index.html in that directory. In other words, index.html is the default document.



Logging

Logging is a big enough subject that we gave it its own page.



Security

Request For Input

So far Hyrax relies on the Tomcat security implementation. If after reading this section and the Tomcat documentation on security you are left wanting something more, then please get in touch with us at OPeNDAP. We would like to talk with you about your security needs for OPeNDAP Hyrax and possibly develop some use cases that describe your needs. In the future we hope to provide a richer security configuration environment than the one currently offered by Tomcat.

Synopsis

Hyrax currently relies on the security implemented by Tomcat. It is recommended that you read carefully and understand the Tomcat 5.x security documentation.

Tomcat security requires fairly extensive additions to the web.xml file. (It is important to keep in mind that altering the <servlet> definitions may render your Hyrax inoperable - please see the previous sections that discuss this.)

Examples of security content for the web.xml file can be found in the persistent content directory which by default is located at $CATALINA_HOME/content/opendap/

Limitations

Officially, Tomcat security supports context-level authentication. This means that you can restrict access to the collection of servlets running in a single web application, in other words everything that is defined in a single web.xml file. You can call out different authentication rules for different <url-pattern> elements within the web application, but only clients that do not cache ANY security information will be able to easily access the different areas.

For example in your web.xml file you might have:

    <security-constraint>
        <web-resource-collection>
            <web-resource-name>fnoc1</web-resource-name>
            <url-pattern>/hyrax/nc/fnoc1.txt</url-pattern>
        </web-resource-collection>
        <auth-constraint>
            <role-name>fn1</role-name>
        </auth-constraint>
    </security-constraint>

    <security-constraint>
        <web-resource-collection>
            <web-resource-name>fnoc2</web-resource-name>
            <url-pattern>/hyrax/nc/fnoc2.txt</url-pattern>
        </web-resource-collection>
        <auth-constraint>
            <role-name>fn2</role-name>
        </auth-constraint>
    </security-constraint>

    <login-config>
        <auth-method>BASIC</auth-method>
        <realm-name>MyApplicationRealm</realm-name>
    </login-config>

Where the security roles fn1 and fn2 (defined in the tomcat-users.xml file) have no common members.
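
To make the role setup concrete, a tomcat-users.xml file along the following lines would define the two roles with no shared members. This is only a sketch: the user names and passwords are hypothetical placeholders, and it assumes Tomcat's default file-based user database is the realm in use.

    <?xml version='1.0' encoding='utf-8'?>
    <tomcat-users>
        <role rolename="fn1"/>
        <role rolename="fn2"/>
        <!-- Hypothetical users: each belongs to exactly one of the two roles -->
        <user username="alice" password="replace-me-1" roles="fn1"/>
        <user username="bob" password="replace-me-2" roles="fn2"/>
    </tomcat-users>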

The complete URIs would be:

http://localhost:8080/mycontext/hyrax/nc/fnoc1.txt
http://localhost:8080/mycontext/hyrax/nc/fnoc2.txt

Now, this works for clients that aren't too smart, i.e. ones that don't cache anything. However, if you try it with a browser, once you authenticate for one URI you are locked out of the other one until you successfully "reset" the browser (purge all caches).

I think the reason is as follows: In the exchange between Tomcat and the client, Tomcat is sending the header:

WWW-Authenticate: Basic realm="MyApplicationRealm"

And the client authenticates. When the second URI is accessed, Tomcat sends the same authentication challenge, with the same WWW-Authenticate header. The client, having recently authenticated to this realm-name (defined in the <login-config> element in the web.xml file - see above), resends the authentication information and, since it is not valid for that URL pattern, the request is denied.

Persistence

You should be careful to back up your modified web.xml file to a location outside of the $CATALINA_HOME/webapps/opendap directory, as new versions of Hyrax will overwrite it when installed. You could use an XML ENTITY and an entity reference in the web.xml to cause a local file containing the security configuration to be included in the web.xml. For example, adding the ENTITY:

[<!ENTITY securityConfig SYSTEM "file:/fully/qualified/path/to/your/security/config.xml">]

To the <!DOCTYPE> declaration at the top of the web.xml in conjunction with adding an entity reference:

&securityConfig;

To the content of the <web-app> element would cause your external security configuration to be included in the web.xml file.

Here is an example of an ENTITY configuration:

    <?xml version="1.0" encoding="ISO-8859-1"?>

    <!DOCTYPE web-app
        PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.2//EN"
        "http://java.sun.com/j2ee/dtds/web-app_2_2.dtd"
        [<!ENTITY securityConfig      SYSTEM "file:/fully/qualified/path/to/your/security/config.xml">]
    >
    <web-app>

        <!--
            Loads a persistent security configuration from the content directory.
            This configuration may be empty, in which case no security constraints will be
            applied by Tomcat.
        -->
        &securityConfig;

        .
        .
        .

    </web-app>
 

This will not prevent you from losing your web.xml file when a new version of Hyrax is installed, but adding the ENTITY stuff to the new web.xml file would be easier than remembering an extensive security configuration. Of course, Y.M.M.V.
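
For illustration, the external file named in the ENTITY declaration might contain something like the following sketch, which reuses the fnoc1 constraint and realm name from the Limitations example. Because the file is spliced into <web-app> at the point of the entity reference, it is written here as a bare sequence of elements, with no DOCTYPE and no root element of its own. The file's exact contents are an assumption; adapt them to your own constraints.

    <!-- Contents of the external security configuration file (sketch) -->
    <security-constraint>
        <web-resource-collection>
            <web-resource-name>fnoc1</web-resource-name>
            <url-pattern>/hyrax/nc/fnoc1.txt</url-pattern>
        </web-resource-collection>
        <auth-constraint>
            <role-name>fn1</role-name>
        </auth-constraint>
    </security-constraint>

    <login-config>
        <auth-method>BASIC</auth-method>
        <realm-name>MyApplicationRealm</realm-name>
    </login-config>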



Compressed Responses and Tomcat

Many OPeNDAP clients accept compressed responses. This can greatly increase the efficiency of the client/server interaction by diminishing the number of bytes actually transmitted over "the wire". Tomcat provides native compression support for the GZIP compression mechanism, however it is NOT turned on by default.

The following example is based on Tomcat 5.15. We recommend that you read carefully the Tomcat documentation related to this topic before proceeding.


Details

To enable compression you will need to edit the $CATALINA_HOME/conf/server.xml file. Locate the <Connector> element associated with your server; typically this will be the only <Connector> element whose port attribute is set to 8080. You will need to add or change several attributes of this element to enable compression.

With my Tomcat 5.5 distribution I found this default <Connector> element definition in my server.xml file:

    <Connector port="8080" maxHttpHeaderSize="8192"
        maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
        enableLookups="false" redirectPort="8443" acceptCount="100"
        connectionTimeout="20000" disableUploadTimeout="true"
        compression="no"
     />

You will need to set these four attributes (changing compression and adding the other three):

        compression="force"
        compressionMinSize="2048"
        noCompressionUserAgents="gozilla, traviata"
        compressableMimeType="text/html,text/xml"

Notice that there is a list of compressible MIME types. Basically:

  • compression="off" (the value the Tomcat connector documentation uses; the listing above shows "no") means nothing gets compressed.
  • compression="on" means only the compressible MIME types get compressed.
  • compression="force" means everything gets compressed (assuming the client accepts gzip and the response is bigger than compressionMinSize)

You MUST set compression="force" for compression to work with the OPeNDAP data transport.

The final result being:

    <Connector port="8080" maxHttpHeaderSize="8192"
        maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
        enableLookups="false" redirectPort="8443" acceptCount="100"
        connectionTimeout="20000" disableUploadTimeout="true"
        compression="force"
        compressionMinSize="2048"
        noCompressionUserAgents="gozilla, traviata"
        compressableMimeType="text/html,text/xml"
    />


Restart Tomcat for these changes to take effect.



Gotchas

1. Because Hyrax supports two different "directory" level views (THREDDS catalog.html and OPeNDAP contents.html), it is possible for the views to be incompatible.

For example:

  • Your BES configuration allows you to see 3 collections: nc, hdf, and freeform.
  • Your THREDDS catalog configuration only allows you to see nc and freeform.

If you are in the OPeNDAP view for the hdf collection and you click the link to the THREDDS HTML or XML view, you will get garbage.

The moral - make sure that your BES and THREDDS configurations fundamentally agree about what is visible to the user!