Hyrax - OLFS Configuration

From OPeNDAP Documentation
Revision as of 00:21, 22 March 2007 by Ndp

This document should help you get started configuring the OLFS component of Hyrax. This servlet was developed, compiled, and tested using the Java 1.5.0 compiler, the 1.5.0 Java Virtual Machine, and Jakarta Tomcat 5.5 (which also provided the javax.servlet packages).

Note: All examples of web.xml configurations in this document were checked against Jakarta Tomcat 5.5.0.

The OLFS web application is composed of two servlets, the OLFS servlet and the Docs servlet.

  • The OLFS servlet does the majority of the work in the OLFS web application. It handles client requests for the various OPeNDAP data products, THREDDS catalogs, and OPeNDAP directories.
  • The Docs servlet provides clients access to a tree of static documents.



OLFS Servlet Configuration

The OLFS servlet gets its configuration from 5 files: olfs.xml, catalog.xml, security.xml, log4j.xml, and web.xml.

Files

Locations

  • olfs.xml - Located in the persistent content directory, which by default is $CATALINA_HOME/content/opendap/
  • catalog.xml - Located in the persistent content directory, which by default is $CATALINA_HOME/content/opendap/
  • security.xml - Located in the persistent content directory, which by default is $CATALINA_HOME/content/opendap/
  • web.xml - The servlet's web.xml file, located in the WEB-INF directory of the "opendap" web application. Typically that means $CATALINA_HOME/webapps/opendap/WEB-INF/web.xml
  • log4j.xml - The default location for log4j.xml is the WEB-INF directory of the "opendap" web application. Typically that means $CATALINA_HOME/webapps/opendap/WEB-INF/log4j.xml. However, Hyrax can be configured to look in additional places for the log4j.xml file. Read More About It Here.

Configuration Roles

  • olfs.xml - Contains the OLFS localization configuration - location of the BES, directory view instructions, etc.
  • catalog.xml - THREDDS catalogs configuration.
  • web.xml - Core servlet configuration.
  • log4j.xml - Contains the logging configuration for Hyrax.

olfs.xml

Currently the OLFS has 3 normally configurable items:

The <OLFSConfig> element is the document root and it contains the following configuration information:

  1. The <BES> element contains the BES configuration information.
    • <host> element: The host/ip location of the BES.
    • <port> element: The port number on which the BES is listening.
    • <MaxClients> element: Provides the maximum number of concurrent BES client connections that the OLFS should maintain. These connections are pooled for efficiency and speed, and the limit should be determined empirically. If the <MaxClients> element is missing the pool size defaults to 10.
  2. The <DirectoryView> element indicates which type of directory view clients see. This DOES NOT affect the THREDDS catalogs themselves, only their HTML views. A value of "THREDDS" provides the THREDDS directory view and a value of "OPeNDAP" produces the OPeNDAP directory view.
  3. The <AllowDirectDataSourceAccess /> element controls the user's ability to access data sources directly via the web interface. If this element is present in the olfs.xml file (and not commented out as in the example below), a client can get an entire data source (such as an HDF file) simply by requesting it through the HTTP URL interface. This is NOT a good practice and is not recommended. By default Hyrax ships with this option turned off, and I recommend that you leave it that way unless you really want users to be able to circumvent the OPeNDAP request interface and have direct access to the data products stored on your server.


An example olfs.xml file:


  <?xml version="1.0" encoding="UTF-8"?>
  <OLFSConfig>

      <!-- The hostname (or IP address) and port location for the BES -->
      <BES>

          <!-- The hostname (or IP address) for the BES -->
          <host>localhost</host>

          <!-- The port number for the BES -->
          <port>10002</port>

          <!-- The Maximum number of concurrent BES client connections allowed. -->
          <MaxClients>10</MaxClients>

      </BES>

      <!-- Used to indicate which type of directory view clients see. This DOES NOT
           affect the THREDDS catalogs! Only the HTML views of them. A value
           of "THREDDS" will provide the THREDDS directory view and a value
           of "OPeNDAP" will produce the OPeNDAP directory view.-->

      <DirectoryView>THREDDS</DirectoryView>

      <!-- AllowDirectDataSourceAccess
         - If this element is present then the server will allow users to request
         - the data source (file) directly. For example a user could just get the
         - underlying NetCDF files located on the server without using the OPeNDAP
         - request interface.
         -
         - THINK TWICE before allowing this, as data sources can be quite large
         - and allowing their transmission without subsetting can put heavy loads
         - on the network and the server.
         -->
      <!-- <AllowDirectDataSourceAccess /> -->

  </OLFSConfig>
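For comparison, here is a sketch of an olfs.xml for a setup where the BES runs on a separate machine and clients get the OPeNDAP directory view instead. The host name bes.example.org and the MaxClients value of 20 are hypothetical placeholders; substitute your own BES host and a pool size you have determined empirically. The port 10002 is simply carried over from the example above and must match the port your BES actually listens on.

    <?xml version="1.0" encoding="UTF-8"?>
    <OLFSConfig>

        <BES>
            <!-- Hypothetical host name: replace with your BES host or IP address -->
            <host>bes.example.org</host>

            <!-- Must match the port the BES is listening on -->
            <port>10002</port>

            <!-- Hypothetical pool size; tune empirically (defaults to 10 if omitted) -->
            <MaxClients>20</MaxClients>
        </BES>

        <!-- HTML directory pages use the OPeNDAP view; THREDDS catalogs are unaffected -->
        <DirectoryView>OPeNDAP</DirectoryView>

        <!-- Direct data source access stays disabled (element left commented out) -->
        <!-- <AllowDirectDataSourceAccess /> -->

    </OLFSConfig>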

catalog.xml

The catalog.xml file contains the THREDDS catalog configuration for Hyrax. It's complex. Read About It Here.

log4j.xml

The log4j.xml file contains the logging configuration for Hyrax. It too is complex. Read About It Here.

web.xml

We strongly recommend that you do NOT mess with the web.xml file, at least for now. Future versions of the server and the OLFS may have "user configurable" settings in the web.xml file, but this version (0.1.5) does not. SO JUST DON'T DO IT. OK? Having said that, here are the details regarding the web.xml file:


Servlet Definition

The OLFS running in the opendap context needs an entry in the web.xml file. Multiple instances of a servlet and/or several different servlets can be configured in a single web.xml file. For instance, you could have a DTS and a Hyrax running from the same web.xml and thus under the same servlet context. However, running multiple instances of the OLFS in a single web.xml file (aka context) will NOT work.

Each servlet needs a unique name, which is specified inside a <servlet> element in the web.xml file using the <servlet-name> tag. This is a name of convenience; for example, if you were serving data from an ARGOS satellite you might call that servlet argos.

Additionally, each <servlet> element must specify which Java class contains the actual servlet to run. This is done in the <servlet-class> element. For example, the OLFS servlet class name is opendap.coreServlet.DispatchServlet.

Here is a syntax example combining the two previous example values:

<servlet>
    <servlet-name>argos</servlet-name>
    <servlet-class>opendap.coreServlet.DispatchServlet</servlet-class>
    <!-- <init-param> and other elements go here -->
</servlet>

This servlet could then be accessed as: http://hostname/opendap/servlet/argos

You may also add a set of <servlet-mapping> elements to the end of the web.xml file. These allow you to abbreviate the URL of the servlet. By placing the servlet mappings:

<servlet-mapping>
    <servlet-name>argos</servlet-name>
    <url-pattern>/argos</url-pattern>
</servlet-mapping>

<servlet-mapping>
    <servlet-name>argos</servlet-name>
    <url-pattern>/argos/*</url-pattern>
</servlet-mapping>

At the end of the web.xml file, the URL of our previous example changes to: http://hostname/opendap/argos

This eliminates the need for the word servlet in the URL. For more on the <servlet-mapping> element, see the Jakarta Tomcat documentation.
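Putting the pieces together, a minimal sketch of a web.xml that defines the argos servlet and maps it to both /argos and /argos/* might look like the following. The DOCTYPE is the Servlet 2.2 one used in the security example later on this page; argos is only the illustrative name from above, and the <init-param> elements described in the next section are omitted for brevity. This is meant to help you read the file, not as an invitation to edit your Hyrax web.xml.

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <!DOCTYPE web-app
        PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.2//EN"
        "http://java.sun.com/j2ee/dtds/web-app_2_2.dtd">
    <web-app>

        <!-- "argos" is a name of convenience; the class is the OLFS dispatch servlet -->
        <servlet>
            <servlet-name>argos</servlet-name>
            <servlet-class>opendap.coreServlet.DispatchServlet</servlet-class>
            <!-- <init-param> elements omitted for brevity -->
        </servlet>

        <!-- Servlet mappings come after all <servlet> definitions -->
        <servlet-mapping>
            <servlet-name>argos</servlet-name>
            <url-pattern>/argos</url-pattern>
        </servlet-mapping>

        <servlet-mapping>
            <servlet-name>argos</servlet-name>
            <url-pattern>/argos/*</url-pattern>
        </servlet-mapping>

    </web-app>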

<init-param> Elements

The OLFS uses <init-param> elements inside of each <servlet> element to get specific configuration information.

<init-param>'s common to all OPeNDAP servlets are:


DebugOn

This controls output to the terminal from which the servlet engine was launched. The value is a list of flags that turn on debugging instrumentation in different parts of the code. Supported values are:

  • probeRequest: Prints a lengthy inspection of the HttpServletRequest object to stdout. Don't leave this on for long; it will clog your Catalina logs.
  • DebugInterface: Enables the server's debug interface. This interactive interface allows a user to look at (and change) the server state via a web browser. Enable this only for analysis purposes, and disable it when finished!

Example:

    <init-param>
        <param-name>DebugOn</param-name>
        <param-value>probeRequest</param-value>
    </init-param>

Default: If this parameter is not set, or the value field is empty then these features will be disabled - which is what you want unless there is a problem to analyze.
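Similarly, turning on the server's debug interface instead of probeRequest uses the same element with the other supported value. Only a single flag is shown in this sketch because this page does not specify how multiple flags in the list are separated; as noted above, disable it again when you are finished.

    <init-param>
        <param-name>DebugOn</param-name>
        <!-- Enables the interactive debug interface; for analysis only -->
        <param-value>DebugInterface</param-value>
    </init-param>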



OpendapHttpDispatchHandlerImplementation

This parameter specifies the handler implementation that provides the responses for the HTTP GET commands to the servlet framework. Don't be messin' with this! It's what makes Hyrax, well, Hyrax.

Example:

    <init-param>
        <param-name>OpendapHttpDispatchHandlerImplementation</param-name>
        <param-value>opendap.bes.HttpDispatchHandler</param-value>
    </init-param>

Default: This parameter must be set to the value opendap.bes.HttpDispatchHandler for the server to actually be an instance of Hyrax.



OpendapSoapDispatchHandlerImplementation

This parameter specifies the handler implementation that provides the responses for the HTTP POST commands (via a SOAP interface) to the servlet framework. Don't be messin' with this! It's what makes Hyrax, well, Hyrax.

Example:

    <init-param>
        <param-name>OpendapSoapDispatchHandlerImplementation</param-name>
        <param-value>opendap.bes.SoapDispatchHandler</param-value>
    </init-param>

Default: This parameter must be set to the value opendap.bes.SoapDispatchHandler for the server to actually be an instance of Hyrax.

Example of web.xml content

<servlet>

    <servlet-name>hyrax</servlet-name>

    <servlet-class>opendap.coreServlet.DispatchServlet</servlet-class>

    <init-param>
        <param-name>OpendapHttpDispatchHandlerImplementation</param-name>
        <param-value>opendap.bes.HttpDispatchHandler</param-value>
    </init-param>

    <init-param>
        <param-name>OpendapSoapDispatchHandlerImplementation</param-name>
        <param-value>opendap.bes.SoapDispatchHandler</param-value>
    </init-param>

    <init-param>
        <param-name>DebugOn</param-name>
        <param-value></param-value>
    </init-param>

    <load-on-startup>1</load-on-startup>

</servlet>

<servlet-mapping>
    <servlet-name>hyrax</servlet-name>
    <url-pattern>/*</url-pattern>
</servlet-mapping>

<servlet-mapping>
    <servlet-name>hyrax</servlet-name>
    <url-pattern>/hyrax</url-pattern>
</servlet-mapping>

<servlet-mapping>
    <servlet-name>hyrax</servlet-name>
    <url-pattern>/hyrax/*</url-pattern>
</servlet-mapping>



Docs Servlet

The Docs (or documentation) servlet provides the OLFS web application with the ability to serve a tree of static documentation files. By default it will serve the files in the documentation tree provided with the OLFS in the Hyrax distribution. This tree is rooted at $CATALINA_HOME/webapps/opendap/docs/ and contains documentation pertaining to the software in the Hyrax distribution: installation and configuration instructions, release notes, Javadocs, etc.

If you wish to replace this information with your own set of web pages, you could remove or replace the files in the default directory. However, installing a new version of Hyrax will overwrite these files, forcing you to replace them again after the install (and hopefully AFTER you have read and understood the new release documentation).

The Docs servlet provides an alternative. If a docs directory is created in the persistent content directory for Hyrax, the Docs servlet will detect it (when Tomcat is launched) and serve files from there instead of from the default location.

This scheme provides 2 beneficial effects:

  1. It allows localizations of the web documents associated with Hyrax to persist through Hyrax upgrades with no user intervention.
  2. It preserves important release documents that ship with the Hyrax software.

In summary, to provide persistent web pages as part of a Hyrax localization, simply create the directory: $CATALINA_HOME/content/opendap/docs

Place your content in there and away you go. If you later wish to view the web-based documentation bundled with Hyrax, simply rename the directory from docs to something else and restart Tomcat (or just look in the $CATALINA_HOME/webapps/opendap/docs directory).

In the Docs servlet, if a URL ends in a directory name or a "/", the servlet will attempt to serve the index.html file in that directory. In other words, index.html is the default document.



Logging

Logging is a big enough subject that we gave it its own page.



Security

Request For Input

So far Hyrax relies on the Tomcat security implementation. If after reading this section and the Tomcat documentation on security you are left wanting something more, then please get in touch with us at OPeNDAP. We would like to talk with you about your security needs for OPeNDAP Hyrax and possibly develop some use cases that describe your needs. In the future we hope to provide a richer security configuration environment than the one currently offered by Tomcat.

Synopsis

Hyrax currently relies on the security implemented by Tomcat. We recommend that you carefully read and understand the Tomcat 5.x security documentation.

Tomcat security requires fairly extensive additions to the web.xml file. (It is important to keep in mind that altering the <servlet> definitions may render your Hyrax inoperable; please see the previous sections that discuss this.)

Limitations

Officially, Tomcat security supports context-level authentication. This means that you can restrict access to the collection of servlets running in a single web application; in other words, everything that is defined in a single web.xml file. You can call out different authentication rules for different <url-pattern>s within the web application, but only clients that do not cache ANY security information will be able to easily access the different areas.

For example in your web.xml file you might have:

    <security-constraint>
        <web-resource-collection>
            <web-resource-name>fnoc1</web-resource-name>
            <url-pattern>/hyrax/nc/fnoc1.txt</url-pattern>
        </web-resource-collection>
        <auth-constraint>
            <role-name>fn1</role-name>
        </auth-constraint>
    </security-constraint>

    <security-constraint>
        <web-resource-collection>
             <web-resource-name>fnoc2</web-resource-name>
             <url-pattern>/hyrax/nc/fnoc2.txt</url-pattern>
         </web-resource-collection>
         <auth-constraint>
             <role-name>fn2</role-name>
          </auth-constraint>
    </security-constraint>

    <login-config>
        <auth-method>BASIC</auth-method>
        <realm-name>MyApplicationRealm</realm-name>
    </login-config>

Where the security roles fn1 and fn2 (defined in the tomcat-users.xml file) have no common members.
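A sketch of what the matching entries in $CATALINA_HOME/conf/tomcat-users.xml might look like follows. The user names and passwords are hypothetical placeholders; the only point being illustrated is that no user holds both fn1 and fn2.

    <?xml version="1.0" encoding="UTF-8"?>
    <tomcat-users>
        <role rolename="fn1"/>
        <role rolename="fn2"/>

        <!-- Hypothetical users: each belongs to exactly one of the two roles -->
        <user username="alice" password="changeme1" roles="fn1"/>
        <user username="bob"   password="changeme2" roles="fn2"/>
    </tomcat-users>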

The complete URIs would be:

http://localhost:8080/mycontext/hyrax/nc/fnoc1.txt
http://localhost:8080/mycontext/hyrax/nc/fnoc2.txt

This works for clients that aren't too smart, i.e. those that don't cache anything. However, if you try with a browser, once you authenticate for one URI you are locked out of the other one until you successfully "reset" the browser (purge all caches).

I think the reason is as follows: In the exchange between Tomcat and the client, Tomcat is sending the header:

WWW-Authenticate: Basic realm="MyApplicationRealm"

The client then authenticates. When the second URI is accessed, Tomcat sends the same authentication challenge, with the same WWW-Authenticate header. The client, having recently authenticated to this realm-name (defined in the <login-config> element in the web.xml file; see above), resends the same authentication information, and since it is not valid for that URL pattern, the request is denied.

Persistence

Be careful to back up your modified web.xml file to a location outside of the $CATALINA_HOME/webapps/opendap directory, as new versions of Hyrax will overwrite it when installed. You could use an XML ENTITY and an entity reference in the web.xml to cause a local file containing the security configuration to be included in the web.xml. For example, adding the ENTITY:

[<!ENTITY securityConfig SYSTEM "file:/fully/qualified/path/to/your/security/config.xml">]

To the <!DOCTYPE> declaration at the top of the web.xml in conjunction with adding an entity reference:

&securityConfig;

To the content of the <web-app> element would cause your external security configuration to be included in the web.xml file.

Here is an example of an ENTITY configuration:

    <?xml version="1.0" encoding="ISO-8859-1"?>

    <!DOCTYPE web-app
        PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.2//EN"
        "http://java.sun.com/j2ee/dtds/web-app_2_2.dtd"
        [<!ENTITY securityConfig      SYSTEM "file:/fully/qualified/path/to/your/security/config.xml">]
    >
    <web-app>

        <!--
            Loads a persistent security configuration from the content directory.
            This configuration may be empty, in which case no security constraints will be
            applied by Tomcat.
        -->
        &securityConfig;

        .
        .
        .

    </web-app>
 

This will not prevent you from losing your web.xml file when a new version of Hyrax is installed, but adding the ENTITY stuff to the new web.xml file would be easier than remembering an extensive security configuration. Of course, Y.M.M.V.
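For completeness, here is a sketch of what the external file referenced by the securityConfig entity (the placeholder path /fully/qualified/path/to/your/security/config.xml above) might contain. It holds the same kind of elements that would otherwise appear directly inside <web-app>; this example simply reuses the fnoc1 constraint and BASIC login configuration from the Limitations section. Because it is an external parsed entity rather than a standalone document, it has no <web-app> wrapper and no DOCTYPE, and it may contain several top-level elements. An empty file also works and, as the comment in the web.xml example notes, means Tomcat applies no security constraints.

    <!-- Hypothetical contents of config.xml: included verbatim into <web-app> -->
    <security-constraint>
        <web-resource-collection>
            <web-resource-name>fnoc1</web-resource-name>
            <url-pattern>/hyrax/nc/fnoc1.txt</url-pattern>
        </web-resource-collection>
        <auth-constraint>
            <role-name>fn1</role-name>
        </auth-constraint>
    </security-constraint>

    <login-config>
        <auth-method>BASIC</auth-method>
        <realm-name>MyApplicationRealm</realm-name>
    </login-config>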



Compressed Responses and Tomcat

Many OPeNDAP clients accept compressed responses. This can greatly increase the efficiency of the client/server interaction by reducing the number of bytes actually transmitted over "the wire". Tomcat provides native support for GZIP compression; however, it is NOT turned on by default.

The following example is based on Tomcat 5.15. We recommend that you carefully read the Tomcat documentation related to this topic before proceeding.


Details

To enable compression, you will need to edit the $CATALINA_HOME/conf/server.xml file. Locate the <Connector> element associated with your server; typically this is the only <Connector> element whose port attribute is set to 8080. You will then need to add or change several of its attributes to enable compression.

With my Tomcat 5.5 distribution I found this default <Connector> element definition in my server.xml file:

    <Connector port="8080" maxHttpHeaderSize="8192"
        maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
        enableLookups="false" redirectPort="8443" acceptCount="100"
        connectionTimeout="20000" disableUploadTimeout="true"
        compression="no"
     />

You will need to add (or, in the case of compression, change) these four attributes:

        compression="force"
        compressionMinSize="2048"
        noCompressionUserAgents="gozilla, traviata"
        compressableMimeType="text/html,text/xml"

Notice that there is a list of compressible MIME types. Basically:

  • compression="off" means nothing gets compressed (the "no" shown in the default element above has the same effect).
  • compression="on" means only the compressible MIME types (those listed in compressableMimeType) get compressed.
  • compression="force" means everything gets compressed (assuming the client accepts gzip and the response is bigger than compressionMinSize).

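You MUST set compression="force" for compression to work with the OPeNDAP data transport, because the binary OPeNDAP data responses are not sent with any of the MIME types in the compressableMimeType list.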

The final result is:

    <Connector port="8080" maxHttpHeaderSize="8192"
        maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
        enableLookups="false" redirectPort="8443" acceptCount="100"
        connectionTimeout="20000" disableUploadTimeout="true"
        compression="force"
        compressionMinSize="2048"
        noCompressionUserAgents="gozilla, traviata"
        compressableMimeType="text/html,text/xml"
    />


Restart Tomcat for these changes to take effect.



Gotchas

1. Because Hyrax supports two different "directory" level views (THREDDS catalog.html and OPeNDAP contents.html), it is possible for the views to be incompatible.

For example:

  • Your BES configuration allows you to see 3 collections: nc, hdf, and freeform.
  • Your THREDDS catalog configuration only allows you to see nc and freeform.

If you are in the OPeNDAP view for the hdf collection and you click the link to the THREDDS HTML or XML view, you will get garbage.

The moral - Make Sure That your BES and THREDDS configurations fundamentally agree about what is visible to the user!