RDH Catalog Organization

From OPeNDAP Documentation

Revision as of 20:35, 7 May 2009

Definitions

Catalog
A collection of data holdings on a DAP server. This collection may be flat (a simple list) or it may be hierarchically organized.
Navigation URL
A URL that returns an HTML document that when rendered allows a human to navigate the catalog of data holdings on a DAP server.
Catalog URL
Inventory URL
A URL that returns a machine readable catalog (aka inventory) of data holdings on a DAP server. The most common of these is a THREDDS catalog URL.
Resource URL
The globally (like internet globally) unique address of a resource.
For example:
Resource ID
The locally unique ID of a resource. Locally here means within a particular server.

In the BES that means the catalog name followed by the path in that catalog to the resource.
For example:
  • /rdh/dataSourceOne - is the resource ID of the resource called dataSourceOne in the catalog called rdh
  • /catalog/data/nc/fnoc1.nc - is the resource ID of the resource called data/nc/fnoc1.nc in the catalog called catalog
  • /data/nc/fnoc1.nc - is the resource ID of the resource called data/nc/fnoc1.nc in the default catalog (which is confusingly named catalog)

In Hyrax that means all of the stuff after the protocol, server name, and context name.
In the URL http://test.opendap.org:8080/opendap/data/nc/fnoc1.nc.dds the resource ID is: /data/nc/fnoc1.nc.dds
Data access URL
A URL that returns a DAP data product (DDS, DDX, DAS, DataDDS, etc.) from a holding on a DAP server.
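
As an illustration of the Hyrax definition of a resource ID given above, here is a minimal Python sketch. The function name hyrax_resource_id and the context argument are made up for this example and are not part of Hyrax; the sketch only assumes that the context name is the first path segment of the URL, as in the example URL.

```python
from urllib.parse import urlsplit


def hyrax_resource_id(url, context="opendap"):
    """Return everything after the protocol, server name, and context name.

    Hypothetical helper for illustration only; not a Hyrax API.
    """
    path = urlsplit(url).path
    prefix = "/" + context
    if path != prefix and not path.startswith(prefix + "/"):
        raise ValueError("URL path %r is not under context %r" % (path, context))
    # Strip the context name; an empty remainder means the top of the server.
    return path[len(prefix):] or "/"


print(hyrax_resource_id("http://test.opendap.org:8080/opendap/data/nc/fnoc1.nc.dds"))
# prints /data/nc/fnoc1.nc.dds
```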

RDH Catalog

The RDH will need to have its own catalog in the BES catalog system; by default it should be called rdh. It can be remapped to a different name via the BES configuration file if needed (see below).

In order to populate its catalog the RDH needs to be able to provide a representation of its data holdings. Since the RDH relies on a simple list of ODBC Data Sources in its configuration file (see the RDH Configuration Use Case and the RDH Catalog Use Case), its catalog is flat: just the simple list of Data Source names.

In BES speak that means that the RDH catalog contains no sub collections (aka containers, aka child catalogs). The RDH holdings can and should be represented as a simple list of DAP data sets. Each of these data sets has a DDS/DDX/DAS representation that can be accessed in the typical DAP manner. When responding to a bes:showCatalog request the RDH should return a catalog composed of a top level dataset that contains a list of dataset elements, each of which has a node attribute whose value is false and (at minimum) a child bes:serviceRef element with a value of "dap":

<response xmlns="http://xml.opendap.org/ns/bes/1.0#" reqID="some_unique_value">
   <showCatalog>
       <dataset catalog="rdh" count="3" lastModified="2009-03-24T19:02:46" name="/" node="true" size="578">
           <dataset catalog="rdh" lastModified="-1" name="dataSourceOne" node="false" size="-1">
               <serviceRef>dap</serviceRef>
           </dataset>
           <dataset catalog="rdh" lastModified="-1" name="dataSourceTwo" node="false" size="-1">
               <serviceRef>dap</serviceRef>
           </dataset>
           <dataset catalog="rdh" lastModified="-1" name="dataSourceThree" node="false" size="-1">
               <serviceRef>dap</serviceRef>
           </dataset>
       </dataset>
   </showCatalog>
</response>
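
A client (or a test) could check that a response really is flat in the sense described above. The following Python sketch is only an illustration under that reading: the function name flat_catalog_names and the trimmed SAMPLE document are invented for the example, and the only rules it enforces are the two stated in the text (node is false, and a serviceRef of "dap" is present).

```python
import xml.etree.ElementTree as ET

# Namespace used by the BES response documents shown on this page.
BES = "{http://xml.opendap.org/ns/bes/1.0#}"

# Trimmed, hypothetical response used only to exercise the sketch.
SAMPLE = """<response xmlns="http://xml.opendap.org/ns/bes/1.0#" reqID="some_unique_value">
  <showCatalog>
    <dataset name="/" node="true" size="-1" lastModified="-1">
      <dataset name="dataSourceOne" node="false" size="-1" lastModified="-1">
        <serviceRef>dap</serviceRef>
      </dataset>
      <dataset name="dataSourceTwo" node="false" size="-1" lastModified="-1">
        <serviceRef>dap</serviceRef>
      </dataset>
    </dataset>
  </showCatalog>
</response>"""


def flat_catalog_names(response_xml):
    """Return the data set names of a flat catalog, or raise ValueError."""
    root = ET.fromstring(response_xml)
    top = root.find(BES + "showCatalog/" + BES + "dataset")
    if top is None:
        raise ValueError("no top level dataset in the showCatalog response")
    names = []
    for ds in top.findall(BES + "dataset"):
        services = [s.text for s in ds.findall(BES + "serviceRef")]
        # A flat catalog has only leaves, and each leaf offers the dap service.
        if ds.get("node") != "false" or "dap" not in services:
            raise ValueError("%r is not a DAP data set leaf" % ds.get("name"))
        names.append(ds.get("name"))
    return names


print(flat_catalog_names(SAMPLE))
```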

size Attribute

What does the size of the dataset mean in this context?

  • Is it the aggregate size of all of the holdings in the Data Source?
  • Can that be easily determined through the ODBC API?

If determining the size of the holding is a sensible activity (there is a decent API, it's time efficient, etc) then we should do it. Otherwise return a "-1" to indicate that the value is not known.

lastModified Attribute

What does the last modified date of the dataset mean in this context?

  • Is it the last time that data was added to the Data Source?
  • Is it the last time the Data Source definition was changed?
  • What happens if the Data Source defines a subset of the available holdings in the underlying RDBMS?
    • Is last modified then the last time that one of the tables or views in the RDBMS made available through the Data Source was changed?
  • Can that be easily determined through the ODBC API?

If determining the last modified time of the holding is a sensible activity (there is a decent API, it's time efficient, etc) then we should do it. Otherwise we should return a "-1" to indicate that the value is not known. (Is that right? What's the missing value for this in the BES XML API?)
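
If "-1" does end up being the missing value for both attributes (which is still an open question above), a client reading the catalog might treat it as follows. This is a sketch under that assumption; the helper names parse_size and parse_last_modified are invented here, and the timestamp format is inferred from the examples on this page.

```python
from datetime import datetime

UNKNOWN = "-1"  # assumed missing value marker, per the discussion above


def parse_size(value):
    """Return the size as an int, or None when it is not known."""
    return None if value == UNKNOWN else int(value)


def parse_last_modified(value):
    """Return the timestamp as a datetime, or None when it is not known."""
    if value == UNKNOWN:
        return None
    # Format inferred from values such as 2009-03-24T19:02:46.
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S")


print(parse_size("-1"), parse_last_modified("2009-03-24T19:02:46"))
```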

Hyrax/BES catalog integration issues

Up until now Hyrax has been relying on only the default catalog in the BES. The default catalog is named catalog. (Unfortunately for those of us reading and writing about it, the word is both a proper noun and a regular noun in this context, which is confusing to say the least.) The BES supports any number of other catalogs, all of which have default names that can be remapped if needed through the BES configuration. (Patrick: Is that true? If not it should be.)

Unfortunately when Hyrax was developed it was written to rely on only the default BES catalog. This manifests in the Hyrax URLs as follows:

The catalog URL http://localhost:8080/opendap/data/nc refers to the top level catalog in the default BES catalog. Accessing this URL issues the BES command:

<bes:request xmlns:bes="http://xml.opendap.org/ns/bes/1.0#" reqID="[http-8080-1:16:bes_request]">
  <bes:setContext name="errors">xml</bes:setContext>
  <bes:showCatalog node="/data/nc" />
</bes:request>

Notice that:

  • The bes:showCatalog element does not specify a particular catalog, thus it implicitly specifies the default catalog catalog.
  • The catalog URL has been reduced to a BES resource ID of /data/nc

Which is equivalent to the BES command:

<bes:request xmlns:bes="http://xml.opendap.org/ns/bes/1.0#" reqID="[http-8080-1:16:bes_request]">
  <bes:setContext name="errors">xml</bes:setContext>
  <bes:showCatalog node="catalog:/data/nc" />
</bes:request>
Patrick: Is that true? How do you "do" a bes:showCatalog for a particular catalog? It's not documented here: BES XML Commands - User:ndp
I changed the documentation at the BES XML Commands page and modified the request document above for showing the root node of the catalog named "catalog". This needs to change. Should be able to say <bes:showCatalog catalog="catname" node="/" /> - User:pwest
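
To make the URL to request mapping described above concrete, here is a rough Python sketch that builds the same kind of bes:request document from a catalog URL. It is not how the OLFS does it (the OLFS is Java); the function name show_catalog_request and its arguments are invented for this illustration, and the node value is simply the URL path with the context name removed.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

BES_NS = "http://xml.opendap.org/ns/bes/1.0#"


def show_catalog_request(catalog_url, req_id, context="opendap"):
    """Build a bes:showCatalog request for a Hyrax catalog URL (sketch only)."""
    path = urlsplit(catalog_url).path
    prefix = "/" + context
    if path != prefix and not path.startswith(prefix + "/"):
        raise ValueError("URL path %r is not under context %r" % (path, context))
    # No catalog name is given, so the default catalog is implied.
    node = path[len(prefix):] or "/"

    ET.register_namespace("bes", BES_NS)
    request = ET.Element("{%s}request" % BES_NS, {"reqID": req_id})
    errors = ET.SubElement(request, "{%s}setContext" % BES_NS, {"name": "errors"})
    errors.text = "xml"
    ET.SubElement(request, "{%s}showCatalog" % BES_NS, {"node": node})
    return ET.tostring(request, encoding="unicode")


print(show_catalog_request("http://localhost:8080/opendap/data/nc", "example-request"))
```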

If Hyrax had been written to support the implicit catalog organization in the BES then the correct catalog URL would have been: http://localhost:8080/opendap/catalog/data/nc

Changing this presents backward compatibility issues for Hyrax in that it is important that Hyrax not change the access URLs to existing data sets, catalogs, or inventories.

Possible Solutions

In order to achieve this compatibility the following schemes have been proposed:

OLFS DispatchHandler for each catalog

For each catalog that gets added to the BES a new OLFS DispatchHandler is written. This really would entail making a copy of all the classes in the java package opendap.bes and modifying them slightly to allow for the different catalog names. It's a crummy solution because it grows the code base in a crazy way: lots of replicated functionality that becomes a real problem to maintain, as changes to the BES API will have to be implemented in multiple places in the OLFS code base. This, I predict, won't happen and things will constantly be broken.

It also means that every time someone writes a BES module that provides a complete DAP services stack and a new catalog, Hyrax will be unable to take advantage of it without having a java program written to support it. That's a bloody waste of resources.

BES interprets resource ID's

In lieu of specifying the catalog name explicitly in the various BES requests, the BES will look at the first part of the resource ID and compare it to its list of catalogs. If the first part of the resource ID matches a catalog name then the BES will pass the request to that catalog. Otherwise it will pass the request to the default catalog.

Keep existing URLs working

For example, the catalog URL http://localhost:8080/opendap/data/nc would produce a resource ID of /data/nc. Unless the BES actually contains a catalog named data, a catalog request for the resource ID /data/nc will be routed to the default catalog.
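
The routing rule proposed in this section can be sketched in a few lines of Python. This is only an illustration of the rule as described (the BES itself is not written in Python); the function name route_resource_id and the example catalog list are assumptions made for the sketch.

```python
def route_resource_id(resource_id, catalog_names, default="catalog"):
    """Return (catalog name, path within that catalog) for a resource ID.

    The first path segment selects a catalog when it matches a catalog
    name; otherwise the whole resource ID goes to the default catalog.
    """
    trimmed = resource_id.lstrip("/")
    first, _, rest = trimmed.partition("/")
    if first in catalog_names:
        return first, "/" + rest
    return default, "/" + trimmed


catalogs = ["catalog", "rdh", "cedar", "jgofs"]  # hypothetical installation
print(route_resource_id("/rdh/dataSourceOne", catalogs))  # ('rdh', '/dataSourceOne')
print(route_resource_id("/data/nc", catalogs))            # ('catalog', '/data/nc')
```

Note that under this rule a top level node in the default catalog that happens to share a name with another catalog can no longer be reached through its old resource ID, since the catalog name wins the match.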

Make the top level catalog include all of the catalogs

When a request is made for the default top-level catalog:

<bes:request xmlns:bes="http://xml.opendap.org/ns/bes/1.0#" reqID="[http-8080-1:16:bes_request]">
  <bes:setContext name="errors">xml</bes:setContext>
  <bes:showCatalog node="/" />
</bes:request>

The BES will return a catalog containing a merging of the catalogs by name (as catalog nodes) and a listing of the top level of the default catalog. For example, if the BES has the rdh, cedar, and jgofs catalogs installed, and the default catalog contains data sets, regular files, and nodes, the response might look like this:

<response xmlns="http://xml.opendap.org/ns/bes/1.0#" reqID="some_unique_value">
   <showCatalog>
       <!-- (What would the catalog attribute be in this example?) -->
       <dataset catalog="catalog" count="13" lastModified="2009-03-24T19:02:46" name="/" node="true" size="578">
           <dataset catalog="rdh" lastModified="2003-09-11T16:27:59" name="rdh" node="true" size="-1" />
           <dataset catalog="jgofs" lastModified="2001-05-29T23:32:04" name="jgofs" node="true" size="-1" />
           <dataset catalog="cedar" lastModified="2017-12-30T21:47:58" name="cedar" node="true" size="-1" />

           <dataset catalog="catalog" lastModified="2008-11-19T20:56:15" name="Test" node="true" size="918" />
           <dataset catalog="catalog" lastModified="2008-05-29T23:32:34" name="Trunk Monkey" node="true" size="238" />
           <dataset catalog="catalog" lastModified="2008-05-29T23:31:56" name="bears.nc" node="false" size="852">
               <serviceRef>dap</serviceRef>
           </dataset>
           <dataset catalog="catalog" lastModified="2008-09-11T16:27:59" name="coverage" node="true" size="102" />
           <dataset catalog="catalog" lastModified="2009-01-30T20:10:16" name="data" node="true" size="510" />
           <dataset catalog="catalog" lastModified="2009-03-25T17:58:54" name="fanfang" node="true" size="306" />
           <dataset catalog="catalog" lastModified="2009-03-24T19:01:22" name="fanfang.tar.gz" node="false" size="9055684" />
           <dataset catalog="catalog" lastModified="2008-12-05T19:23:30" name="namespaceAttributesTest.nc" node="false" size="3396">
               <serviceRef>dap</serviceRef>
           </dataset>
           <dataset catalog="catalog" lastModified="2009-02-12T21:25:11" name="netcdf" node="true" size="136" />
           <dataset catalog="catalog" lastModified="2008-09-17T21:45:19" name="robots.txt" node="false" size="25" />
           <dataset catalog="catalog" lastModified="2008-05-29T23:32:04" name="sst" node="true" size="68" />
           <dataset catalog="catalog" lastModified="2008-12-30T21:47:58" name="thredds" node="true" size="68" />
           <dataset catalog="catalog" lastModified="2008-05-29T23:32:04" name="wcs" node="true" size="68" />
       </dataset>
   </showCatalog>
</response>

If there is a conflict between catalog name(s) and holdings in the default catalog (catalog), the offending catalog name(s) can be remapped using the BES configuration.
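
A sketch of the merge, including the name conflict check, might look like the following. It is written in Python purely for illustration; the function name merge_top_level is invented, and raising an error on a conflict is just one possible policy (the text above only says the catalog would be remapped in the BES configuration).

```python
def merge_top_level(catalog_names, default_entries, default="catalog"):
    """Return the merged top level listing as (name, catalog) pairs.

    The non-default catalogs come first, each listed as a node in its own
    catalog, followed by the top level entries of the default catalog.
    """
    others = [name for name in catalog_names if name != default]
    conflicts = sorted(set(others) & set(default_entries))
    if conflicts:
        # These catalogs would need to be remapped to different names.
        raise ValueError("catalog names clash with default catalog holdings: "
                         + ", ".join(conflicts))
    return [(name, name) for name in others] + [(entry, default) for entry in default_entries]


listing = merge_top_level(["catalog", "rdh", "jgofs", "cedar"], ["Test", "bears.nc", "data"])
for name, catalog in listing:
    print(name, catalog)
```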

GAH!

I still think that this is totally busted and I don't see how to fix it without simply rewriting the OLFS to work correctly with the different catalogs in the BES, which WILL change all of the URLs. The fact is that we (well ok, I) didn't get it right. And this is a pretty intractable problem otherwise. Fixing it in one of these half-assed ways just begs for the whole thing to come crashing down around us, not to mention the burden of creating an even MORE COMPLEX install and configuration procedure for our users. Maybe one of you can see a way out of this box... ndp 13:35, 7 May 2009 (PDT)