Hyrax Catalogs: Difference between revisions

From OPeNDAP Documentation
No edit summary
(No difference)

Revision as of 23:06, 20 April 2011

Hyrax/BES catalog integration issues

Up until now Hyrax has been relying on only the default catalog in the BES. The default catalog is named catalog (Unfortunately for those of us reading and writing about it the word is both a proper noun and a regular noun in this context - confusing to say the least) The BES supports any number of other catalogs, all of which have default names that can be remapped if needed through the BES configuration.(Patrick: Is that true? If not it should be. - yes, this is true. Could be called something different in the configuration file)

Unfortunately when Hyrax was developed it was written to rely on only the default BES catalog. This manifests in the Hyrax URLs as follows:

The catalog URL http://localhost:8080/opendap/ refers to the top level in the default BES catalog. Accessing this URL issues the BES command:

<bes:request xmlns:bes="http://xml.opendap.org/ns/bes/1.0#" reqID="[http-8080-1:16:bes_request]">
  <bes:setContext name="errors">xml</bes:setContext>

Notice that:

  • The bes:showCatalog element does not specify a particular catalog, thus it implicitly specifies the default catalog catalog.
  • The catalog URL has been reduced to a BES resource ID of /

Which is equivalent to the BES command:

<bes:request xmlns:bes="http://xml.opendap.org/ns/bes/1.0#" reqID="[http-8080-1:16:bes_request]">
  <bes:setContext name="errors">xml</bes:setContext>
  <bes:showCatalog node="catalog:/" />
Patrick: Is that true? How do you "do" a bes:showCatalog for a particular catalog? It's not documented here: BES XML Commands - User:ndp
I changed the documentation at the BES XML Commands page and modified the request document above for showing the root node of the catalog named "catalog". This needs to change. Should be able to say <bes:showCatalog catalog="catname" node="/" /> - User:pwest

If Hyrax had been written to support the implicit catalog organization in the BES then the correct catalog URL would have been: http://localhost:8080/opendap/catalog/data/nc

Changing this presents backward compatibility issues for Hyrax in that it is important that Hyrax not change the access URL's to existing data sets, catalogs, or inventories.

Possible Solutions

In order to acheive this compatibility the following schemes have been proposed:

OLFS DispatchHandler for each catalog

For each catalog that gets added to the BES a new OLFS DispatchHandler is written. This really would entail making a copy of all the classes in the java package opendap.bes and modifying them slightly to allow for the different catalog names. It's a crummy solution because it grows the code base in a crazy way - lot's of replicate functionality that becomes a real problem to maintain as changes to the BES API will have to be implemented in multiple places in the OLFS code base. This, I predict won't happen and things will constantly broken.

It also means that every time someone writes a BES module that provides a complete DAP services stack and a new catalog that Hyrax will be unable to take advantage of it without having a java program written to support it. That's a bloody waste of resources.

BES interprets resource ID's

In lieu of specifying the catalog name explicitly in the various BES requests, the BES will look at the first part of the resource ID and compare it to its list of catalogs. If the first part of the resource ID matches a catalog name then the BES will pass the request to that catalog. Otherwise it will pass the request to the default catalog.

Keep existing URLs working

For example, in the catalog URL http://localhost:8080/opendap/data/nc Would produce a resource ID of /data/nc Unless the BES actually contains a catalog named data then a catalog request for the resource ID /data/nc will be routed to the default catalog.

Make the top level catalog include all of the catalogs

When a request is made for the default top-level catalog:

<bes:request xmlns:bes="http://xml.opendap.org/ns/bes/1.0#" reqID="[http-8080-1:16:bes_request]">
  <bes:setContext name="errors">xml</bes:setContext>
  <bes:showCatalog node="/" />

The BES will return a catalog containing an merging of the catalogs by name (as catalog nodes) and a listing of the top-level of the default catalog. For example if the BES has the rdh, cedar, and jgofs catalogs installed, and the default catalog contains data sets, regular files, and nodes the response might look like this:

<response xmlns="http://xml.opendap.org/ns/bes/1.0#" reqID="some_unique_value">
       <dataset catalog="catalog" (What would this be in this example?)  count="13" lastModified="2009-03-24T19:02:46" name="/" node="true" size="578">
           <dataset catalog="rdh" lastModified="2003-09-11T16:27:59" name="rdh" node="true" size="-1" />
           <dataset catalog="jgofs" lastModified="2001-05-29T23:32:04" name="jgofs" node="true" size="-1" />
           <dataset catalog="cedar" lastModified="2017 -12-30T21:47:58" name="cedar" node="true" size="-1" />

           <dataset catalog="catalog" lastModified="2008-11-19T20:56:15" name="Test" node="true" size="918" />
           <dataset catalog="catalog" lastModified="2008-05-29T23:32:34" name="Trunk Monkey" node="true" size="238" />
           <dataset catalog="catalog" lastModified="2008-05-29T23:31:56" name="bears.nc" node="false" size="852">
           <dataset catalog="catalog" lastModified="2008-09-11T16:27:59" name="coverage" node="true" size="102" />
           <dataset catalog="catalog" lastModified="2009-01-30T20:10:16" name="data" node="true" size="510" />
           <dataset catalog="catalog" lastModified="2009-03-25T17:58:54" name="fanfang" node="true" size="306" />
           <dataset catalog="catalog" lastModified="2009-03-24T19:01:22" name="fanfang.tar.gz" node="false" size="9055684" />
           <dataset catalog="catalog" lastModified="2008-12-05T19:23:30" name="namespaceAttributesTest.nc" node="false" size="3396">
           <dataset catalog="catalog" lastModified="2009-02-12T21:25:11" name="netcdf" node="true" size="136" />
           <dataset catalog="catalog" lastModified="2008-09-17T21:45:19" name="robots.txt" node="false" size="25" />
           <dataset catalog="catalog" lastModified="2008-05-29T23:32:04" name="sst" node="true" size="68" />
           <dataset catalog="catalog" lastModified="2008-12-30T21:47:58" name="thredds" node="true" size="68" />
           <dataset catalog="catalog" lastModified="2008-05-29T23:32:04" name="wcs" node="true" size="68" />

If there is a conflict between catalog name(s) and holdings in the default catalog catalog the offending catalog name(s) can be remapped using the BES configuration.


I still think that this is totally busted and I don't see how to fix it with out simply rewriting the OLFS to correctly work with the different catalogs in the BES which WILL change all of the URL's. The fact is that we (well ok, I) didn't get it right. And this is a pretty intractable problem otherwise. Fixing it in one of these half-baked ways just begs for the whole thing to come crashing down around us, not to mention the burden of creating an even MORE COMPLEX install and configuration procedure for our users. Maybe one of you can see a way out of this box... ndp 13:35, 7 May 2009 (PDT)

Patrick seez

In the BES

instead of looking for the catalog as part of the node name, look for an attribute of the showCatalog element called catalog. If not present and ( the node attribute is not present or is empty or is slash ) then I return the list of catalogs.

If not present and the node is present and the node is not empty and the node is not slash then I default the catalog name to the default catalog specified in the BES configuration file.

If present then I use that catalog name.

In the response for <showCatalog /> I give you the list of the catalogs where the default catalog is marked with default="true".

Or perhaps there's a new command <showCatalogs /> that gives the list of catalogs?

In the OLFS

When the OLFS starts up make a <showCatalog /> request to get the list of catalogs.

When someone browses to www.example.com/opendap/ then you show them the list of catalogs.

When someone browses to www.example.com/opendap/catalogname then you show them the root node of that catalog

When someone browses to www.example.com/opendap/data/nc/ (no catalog name is present) then this is an old URL, you pass this off to the BES without a catalog name, and the BES will use the default catalog.

When someone browses to www.example.com/opendap/data/nc/fnoc1.nc.das (no catalog name is present) then, again this is an old URL. Given the list of catalogs in the beginning you use the default catalog name.

How can you tell if the catalog name is not present ... when the OLFS started up you asked for the list of catalogs. If data is not the name of a catalog then you know we are using the default catalog.

Or ... if possible ... redirect the old URLs to one with the catalog name. Not sure if you can do this. So, if someone browses to www.example.com/opendap/data/nc then you redirect that to www.example.com/opendap/catalog/data/nc, and the same for www.example.com/opendap/data/nc/fnoc1.nc.das you redirect to www.example.com/opendap/catalog/data/nc/fnoc1.nc

What do you think?

I don't like the idea of when asked for the root node, if no catalog is specified then I give you the root for the default catalog and the names of the other catalogs. I think that if you ask for no catalog and no node then you should get the list of catalogs, including the default.

<showCatalog />
 <response xmlns="http://xml.opendap.org/ns/bes/1.0#" reqID="some_unique_value">
       <dataset catalog="root"  count="4" lastModified="2009-03-24T19:02:46" name="/" node="true" size="-1">
           <dataset catalog="catalog" lastModified="2003-09-11T16:27:59" name="catalog" node="true" size="-1" default="true" />
           <dataset catalog="rdh" lastModified="2003-09-11T16:27:59" name="rdh" node="true" size="-1" />
           <dataset catalog="jgofs" lastModified="2001-05-29T23:32:04" name="jgofs" node="true" size="-1" />
           <dataset catalog="cedar" lastModified="2017 -12-30T21:47:58" name="cedar" node="true" size="-1" />


Why does this matter? Do we need to use multiple catalogs for something? jimg 10:43, 12 August 2009 (PDT)

Yes. As we grow Hyrax we will need to support additional catalogs and services. Because both the BES and the OLFS can independently (and as part of their regular usage pattern) modify the catalog by adding either services which are connected to a URL or catalog content that is mapped into the URL space.
I guess one way to think of this is that the catalog for a particular instance of Hyrax is the colection of all URLs that are valid. (Where valid means that they return an HTTP status of 200.) Can't we consider the collection of URL stings to represent a tree? Doesn't it define a collection of named nodes? Some which are leaves and some which contain more stuff?
--ndp 09:21, 7 March 2010 (PST)