Hyrax Metadata Management

From OPeNDAP Documentation

Revision as of 16:14, 30 March 2010

Overview

Two current projects are deeply involved in the metadata aspects of our services.

  • The REAP project is crawling the DDX holdings of our servers and building an EML-based catalog of those holdings for Metacat.
  • The WCS project is using semantic web tools to build a catalog of WCS Coverages from existing metadata in the DDXs held on a server.

New projects in the pipeline have similar needs for metadata:


All of these projects require that the DDX content of some part of a Hyrax server be collected. Both the EML work and the THREDDS Catalog Metadata will likely need to ingest the entire DDX holdings of a server.

I posit that building THREDDS catalogs dynamically for each request will not be effective. Doing so would mean opening every data file in a collection (potentially thousands), reading enough data to build each DDX, and returning the results to the THREDDS catalog response builder. On a busy service this overhead would be unacceptable, and much of the work would go into rebuilding identical responses.

I think similar arguments exist for the EML project, WCS, and possibly the image services.

Which brings me to my point:

Do we need to build a metadata cache within Hyrax?
I think the answer is YES.
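
To make the caching argument concrete, here is a minimal sketch in Python of a cache that rebuilds a metadata document only when the underlying data file changes. It is an illustration under stated assumptions, not a Hyrax design: `MetadataCache` and the `build_ddx` callable are hypothetical names, and the file's modification time is assumed to be an adequate change signal.

```python
import os
from typing import Callable, Dict, Tuple


class MetadataCache:
    """Keeps one metadata document per data file, rebuilding it only on change."""

    def __init__(self, build_ddx: Callable[[str], str]) -> None:
        # build_ddx is a hypothetical stand-in for the expensive step of
        # opening a data file and producing its DDX.
        self._build_ddx = build_ddx
        # Maps file path -> (modification time when cached, cached document).
        self._entries: Dict[str, Tuple[float, str]] = {}
        self.builds = 0  # counts how often the expensive builder actually ran

    def get(self, path: str) -> str:
        mtime = os.path.getmtime(path)
        cached = self._entries.get(path)
        if cached is not None and cached[0] == mtime:
            return cached[1]  # file unchanged: reuse the cached document
        self.builds += 1
        ddx = self._build_ddx(path)
        self._entries[path] = (mtime, ddx)
        return ddx
```

With a cache like this, repeated catalog requests for an unchanged collection would cost one file-status check per file rather than one full DDX build per file.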

The community of data providers is maturing. Major projects such as IOOS and OOI are driving forward with the cyber infrastructure for the sharing of data between nearly disjoint fields of study. The regional OOS systems in NOAA are building web applications that utilize catalog and dataset metadata to drive their applications.

Other software, such as the TDS, clearly uses the metadata content to provide higher-level services.


Products

THREDDS Catalog Metadata

NcML

Possible Technologies

We could build our own.

James has been working with a crawler that caches content. This could potentially be used to build a metadata cache. The idea here would be to allow the C++ API to load DDXs (as documents? as memory objects?) from a cache and then use them in the C++ programming environment.

Semantic Web & RDF Triple Store

We could extend the semantic repository tools that have been developed for WCS. Each DDX document would be converted to its RDF representation and placed in a Triple Store (Semantic Repository). New inferencing rules, queries, and (Java) software could be written to create new products.

  • Do they scale?
  • How hard is it to add inferencing and queries to the system?
  • As of now this appears to be a Java-only implementation.
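
As a rough illustration of what converting a DDX to triples involves, the Python sketch below walks a simplified, namespace-free DDX-like fragment and emits (subject, predicate, object) tuples. Everything here is an assumption made for illustration: the function name `ddx_to_triples`, the predicate names `hasVariable` and `hasType`, and the simplified XML are hypothetical, and this is not the mapping used by the WCS tools. A real implementation would load the triples into an actual triple store.

```python
import xml.etree.ElementTree as ET


def ddx_to_triples(ddx_xml, dataset_url):
    """Turn a simplified DDX-like document into a list of triples."""
    root = ET.fromstring(ddx_xml)
    triples = []
    for var in root:
        if var.tag == "Attribute":
            continue  # skip dataset-level attributes in this sketch
        # Each variable becomes a subject identified by dataset URL + name.
        subject = dataset_url + "#" + var.get("name")
        triples.append((dataset_url, "hasVariable", subject))
        triples.append((subject, "hasType", var.tag))
        # Each attribute of the variable becomes one triple.
        for attr in var.findall("Attribute"):
            triples.append((subject, attr.get("name"), attr.findtext("value")))
    return triples


# Simplified example input; real DDX documents carry XML namespaces
# and richer structure.
SAMPLE_DDX = """
<Dataset name="sst.nc">
  <Float32 name="sst">
    <Attribute name="units" type="String"><value>degC</value></Attribute>
  </Float32>
</Dataset>
"""

triples = ddx_to_triples(SAMPLE_DDX, "http://example.org/sst.nc")
```

Even this toy mapping yields three triples for a single variable with one attribute, which gives a feel for why the scaling question above matters for servers with thousands of datasets.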

Relational Database

We could use a relational database to hold the metadata content. At this point I don't even have an idea for a database schema...

  • Is installing PostgreSQL or the equivalent an undue burden for users?
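
No schema has been settled on. Purely as a starting point for discussion, the Python sketch below stores each dataset's DDX alongside a flat table of its attributes. The table names, column names, and helper functions are all hypothetical, and SQLite (from the Python standard library) stands in for PostgreSQL only so that the example runs without a separate database installation.

```python
import sqlite3

# Hypothetical two-table layout: one row per dataset, one row per attribute.
SCHEMA = """
CREATE TABLE dataset (
    id  INTEGER PRIMARY KEY,
    url TEXT NOT NULL UNIQUE,
    ddx TEXT NOT NULL
);
CREATE TABLE attribute (
    dataset_id INTEGER NOT NULL REFERENCES dataset(id),
    variable   TEXT NOT NULL,
    name       TEXT NOT NULL,
    value      TEXT NOT NULL
);
"""


def open_store(path=":memory:"):
    """Open a database and create the tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_dataset(conn, url, ddx, attributes):
    """Store a DDX and its (variable, name, value) attribute tuples."""
    cur = conn.execute("INSERT INTO dataset (url, ddx) VALUES (?, ?)", (url, ddx))
    dataset_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO attribute (dataset_id, variable, name, value) "
        "VALUES (?, ?, ?, ?)",
        [(dataset_id, var, name, value) for var, name, value in attributes],
    )
    conn.commit()
    return dataset_id


def find_datasets(conn, name, value):
    """Return the URLs of datasets with any variable carrying name = value."""
    rows = conn.execute(
        "SELECT DISTINCT d.url FROM dataset d "
        "JOIN attribute a ON a.dataset_id = d.id "
        "WHERE a.name = ? AND a.value = ? ORDER BY d.url",
        (name, value),
    )
    return [row[0] for row in rows]
```

Keeping the full DDX text next to the extracted attributes would let a catalog builder answer simple queries from the attribute table while still being able to return the original document unchanged.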