Semantic Generation Of WCS Catalogs

From OPeNDAP Documentation

Introduction

A WCS service must maintain a catalog of Coverages that will be used to generate the wcs:Capabilities and wcs:Coverages documents. In the Hyrax WCS service, the implementation of this catalog is identified at server start-up from the DispatchHandler's configuration element in the olfs.xml file. This page documents the development and use of a catalog implementation that uses semantic web technologies to generate the WCS catalog content from metadata already present in the OPeNDAP data framework (probably supplemented with additional metadata).

Technologies

The semantic activities rely on a number of software technologies that may or may not be familiar to the reader. Here is a brief overview of the core concepts.

RDF

The Resource Description Framework (RDF) is a W3C standard for describing Web resources; for a Web page, for example, it can record the title, author, modification date, content, and copyright information. RDF documents can be written in XML (RDF/XML). RDF is intended to be read by machines, not by people, which makes it tedious to read by eye.

RDF is built on a simple statement form called a triple, which could be expressed in English as: The property of the resource is the property's value. (Example: The color of the flower is red.)

All of the working bits of the WCS catalog are stored as RDF statements.
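
As a minimal sketch of the triple model (not of how Sesame stores statements internally), each statement can be held as a (subject, property, value) tuple and retrieved by pattern matching. The example.org URIs and the match helper below are hypothetical and exist only for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # the resource being described
    predicate: str  # the property of that resource
    obj: str        # the property's value


# Hypothetical example data, starting with "The color of the flower is red."
STATEMENTS = [
    Triple("http://example.org/flower", "http://example.org/color", "red"),
    Triple("http://example.org/flower", "http://example.org/height", "30cm"),
    Triple("http://example.org/leaf", "http://example.org/color", "green"),
]


def match(statements, subject=None, predicate=None, obj=None):
    """Return the statements that agree with every argument that is not None."""
    return [
        t for t in statements
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


if __name__ == "__main__":
    # List every resource that has a color, together with that color.
    for t in match(STATEMENTS, predicate="http://example.org/color"):
        print(t.subject, "has color", t.obj)
```

Leaving an argument as None makes that position a wildcard, which is the same idea that RDF query languages build on.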

There is much more to know about RDF. These pages can provide a starting point:


OWL

The OWL Web Ontology Language is designed for use by applications that need to process the content of information instead of just presenting information to humans. OWL facilitates greater machine interpretability of Web content than that supported by XML, RDF, and RDF Schema (RDF-S) by providing additional vocabulary along with a formal semantics. OWL has three increasingly-expressive sublanguages: OWL Lite, OWL DL, and OWL Full. [1] OWL adds vocabulary for describing properties and classes: among others, relations between classes (e.g. disjointness), cardinality (e.g. "exactly one"), equality, richer typing of properties, characteristics of properties (e.g. symmetry), and enumerated classes.

For our work OWL is used to express ontologies of various metadata conventions/standards found within the DAP community, and to define crosswalks between these ontologies that allow us to "migrate" metadata from one convention to another.

As with RDF, there is much more to know about OWL. These pages can provide a starting point:


SWRL, SeRQL, & SPARQL

These languages express the rules and queries that encode the crosswalk information used in the semantic processing to migrate metadata from one convention/standard to another.


SWRL: A Semantic Web Rule Language

SWRL is based on a combination of the OWL DL and OWL Lite sublanguages of the OWL Web Ontology Language with the Unary/Binary Datalog RuleML sublanguages of the Rule Markup Language. SWRL includes a high-level abstract syntax for Horn-like rules in both the OWL DL and OWL Lite sublanguages of OWL.


Let's fill in more information about these technologies please.


More information regarding SWRL can be found here:

SeRQL: Sesame RDF Query Language

SeRQL is an SQL-like query language that works with RDF data and ontologies in RDF Schema (RDFS).




More information regarding SeRQL can be found here:

SPARQL Query Language for RDF

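The SPARQL Protocol and RDF Query Language (SPARQL) is a query language and protocol for RDF. SPARQL queries are written in their own text syntax, which resembles SQL combined with triple patterns; query results can be returned in an XML format.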
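
To give a feel for how a SPARQL query selects data, here is a toy matcher for triple patterns in which terms beginning with "?" are variables. It is a sketch of the general idea only, not a SPARQL implementation; the solve function, the ex: names, and the sample statements are all hypothetical.

```python
# Hypothetical statements, written with prefixed names for brevity.
TRIPLES = [
    ("ex:sst", "rdf:type", "ex:Coverage"),
    ("ex:sst", "ex:title", "Sea Surface Temperature"),
    ("ex:wind", "rdf:type", "ex:Coverage"),
    ("ex:wind", "ex:title", "Wind Speed"),
    ("ex:readme", "ex:title", "Read Me"),
]

# Roughly the patterns of this SPARQL query (prefix declarations omitted):
#   SELECT ?c ?title WHERE { ?c rdf:type ex:Coverage . ?c ex:title ?title }
QUERY = [
    ("?c", "rdf:type", "ex:Coverage"),
    ("?c", "ex:title", "?title"),
]


def solve(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all of the patterns."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            # Solve the remaining patterns under the extended binding.
            yield from solve(rest, triples, candidate)


if __name__ == "__main__":
    for row in solve(QUERY, TRIPLES):
        print(row["?c"], row["?title"])
```

Because ?c appears in both patterns, only resources that are typed as ex:Coverage and also have a title are returned; ex:readme is skipped.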




More information regarding SPARQL can be found here:

Implementations

Each of the technologies listed above requires an implementation: an actual piece of software that provides the behaviors the technology specifies.

Sesame RDF Repository

Sesame is an open source framework (written in Java) for storage, inferencing and querying of RDF data. Sesame provides a common API for RDF parsers, writers, and RDF stores.

Since OWL is an extension of RDF, all things OWL may be stored in a Sesame repository.

Swift OWLIM

OWLIM is a high-performance semantic repository developed in Java. It is packaged as a Storage and Inference Layer (SAIL) for the Sesame RDF database. OWLIM is based on TRREE – a native RDF rule-entailment engine.

The supported semantics can be configured through rule-set definition and selection. The most expressive pre-defined rule-set combines unconstrained RDFS with most of OWL Lite (as indicated on the OWL fragments map).

What does that mean? In more accessible language, OWLIM is a forward-chaining inferencing engine that can be coupled to a Sesame repository. The resulting software (a Sesame RDF repository with SwiftOWLIM installed) can hold a collection of ontologies (OWL) and rules (SWRL) such that, as new instance statements (RDF) are added to the repository, all of the rules are applied (inferencing on ingest). The end result should be a repository in which all of the crosswalk functions have been completed, so that the newly inferred metadata can be extracted with query languages such as SeRQL and SPARQL.
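
The "inferencing on ingest" behavior can be illustrated with a toy forward-chaining store. This is a sketch of the general technique, not of OWLIM or TRREE. The Store class, the single rule form (one property implying another, standing in for a crosswalk), and the conventionA/conventionB/conventionC names are hypothetical.

```python
class Store:
    """Toy triple store that applies its rules every time a statement is added."""

    def __init__(self, implied_properties):
        # Maps a source property to the target property it implies.
        self.implied = dict(implied_properties)
        self.triples = set()

    def add(self, subject, predicate, obj):
        # Forward chaining: keep deriving statements until nothing new appears.
        pending = [(subject, predicate, obj)]
        while pending:
            triple = pending.pop()
            if triple in self.triples:
                continue  # already known, so nothing further to derive
            self.triples.add(triple)
            s, p, o = triple
            if p in self.implied:
                pending.append((s, self.implied[p], o))

    def values(self, subject, predicate):
        """Return the sorted values of one property of one resource."""
        return sorted(o for s, p, o in self.triples
                      if s == subject and p == predicate)


if __name__ == "__main__":
    # Hypothetical crosswalk: convention A's long_name becomes convention B's
    # title, which in turn becomes convention C's label.
    store = Store({
        "conventionA:long_name": "conventionB:title",
        "conventionB:title": "conventionC:label",
    })
    store.add("ex:sst", "conventionA:long_name", "Sea Surface Temperature")
    print(store.values("ex:sst", "conventionC:label"))
```

Only one statement is added by hand; the other two are derived at ingest time, so later queries never need to know about the crosswalk.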

Ontologies

My understanding is that, once tuned, the ontologies will be fairly static. They should be checked in to the subversion repository http://scm.opendap.org/svn in the (as yet to be determined) appropriate spot, and then discussed and linked to from here. ndp

Processing Scheme

Currently the processing steps are described on Benno's test page: http://iri.columbia.edu/~benno/opendaptest/
As this project moves forward and the RDF processing activities are integrated into the OLFS/Hyrax application they should be documented here.

Catalog Implementation

The specifics of the catalog implementation should be discussed here. Design/architecture diagrams, components etc.

Information needed for the WCS server is specified by the API WcsCatalog.java, which includes methods that can be grouped into a number of categories:


   * No-argument functions that return XML elements: getSupportedCrsElements, getCoverageDescriptionElements, getCoverageOfferingBriefElements
   * Functions of coverageID that return XML elements: getCoverageDescriptionElement, getCoverageOfferingBriefElement
   * Functions of coverageID that return simple datatypes: hasCoverage, getLongitudeCoordinateDapId, getLatitudeCoordinateDapId, getElevationCoordinateDapId, getTimeCoordinateDapId, getlast_modified
   * Other function of coverageID: getCoverageDescription (which returns an OPeNDAP-defined class)

The corresponding svn service is at http://scm.opendap.org/svn/branch/ioos/

Java-based framework for extracting information from RDF store as if it were XML.

This framework will probably be implemented by using the inference network to map the RDF store into a simple XML data model: XML elements with getChild, getChildren, getContent, getAttributes, getAttribute, and getAttributeValue methods, along with namespace declaration methods. The returned elements would be taken from the RDF store, either restricted to the declared namespaces or selected by some more efficient restriction that finds the relevant subset of available properties to present as XML.
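
One way to picture the namespace restriction is the sketch below, which builds an XML element from only those statements whose property falls inside a declared namespace. It illustrates the idea in Python and is not the planned Java framework; the to_element function, the example.org namespaces, and the sample statements are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespaces and statements, for illustration only.
WCS_NS = "http://example.org/wcs#"
OTHER_NS = "http://example.org/other#"
RDF_STATEMENTS = [
    ("ex:sst", WCS_NS + "Title", "Sea Surface Temperature"),
    ("ex:sst", WCS_NS + "Identifier", "sst"),
    ("ex:sst", OTHER_NS + "internalNote", "not for publication"),
    ("ex:wind", WCS_NS + "Title", "Wind Speed"),
]


def to_element(subject, statements, namespaces, tag):
    """Build an XML element for `subject` from the properties whose URI
    starts with one of the declared namespaces."""
    element = ET.Element(tag)
    for s, p, o in statements:
        if s != subject:
            continue
        for ns in namespaces:
            if p.startswith(ns):
                # ElementTree writes qualified names as {namespace}localname.
                child = ET.SubElement(element, "{%s}%s" % (ns, p[len(ns):]))
                child.text = o
                break
    return element


if __name__ == "__main__":
    coverage = to_element("ex:sst", RDF_STATEMENTS, [WCS_NS], "Coverage")
    print(ET.tostring(coverage, encoding="unicode"))
```

Here ElementTree's find and findall play roughly the role that getChild and getChildren would play in the model described above.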

WCS DispatchHandler Configuration

The Hyrax WCS service is made up of four implementations of the opendap.coreServlet.DispatchHandler interface.

  • HTTP GET: opendap.wcs.v1_1_2.DispatchHandler
  • HTTP POST (For posting unadorned XML WCS requests): opendap.wcs.v1_1_2.PostHandler
  • HTTP POST (For posting SOAP envelopes containing WCS requests): opendap.wcs.v1_1_2.SoapHandler
  • HTTP POST (For handling WCS requests posted from an HTML form): opendap.wcs.v1_1_2.FormHandler

The service is enabled by creating an entry for each service component in the olfs.xml file.

The primary DispatchHandler is opendap.wcs.v1_1_2.DispatchHandler. It is required for the WCS service, and its Handler declaration contains the configuration information for the service. That declaration identifies the locations of the ServiceIdentification, ServiceProvider, and OperationsMetadata documents, along with the WCS catalog implementation to be used by the service.


To "turn on" the service, the Handler declarations must appear in the olfs.xml file.

opendap.wcs.v1_1_2.DispatchHandler Configuration


Example Configuration

 <?xml version="1.0" encoding="UTF-8"?>
 <OLFSConfig>
   <DispatchHandlers>
       <HttpGetHandlers>
           <Handler className="opendap.bes.BESManager">
               <BES>
                   <prefix>/</prefix>
                   <host>localhost</host>
                   <port>10002</port>
                   <ClientPool maximum="10" />
               </BES>
           </Handler>

           <Handler className="opendap.coreServlet.BotBlocker">
               <IpMatch>65\.55\.[012]?\d?\d\.[012]?\d?\d</IpMatch>
           </Handler>

           <Handler className="opendap.wcs.v1_1_2.DispatchHandler">
               <prefix>WCS</prefix>
               <ServiceIdentification>/absolute/path/to/the/document/ServiceIdentification.xml</ServiceIdentification>
               <ServiceProvider>/absolute/path/to/the/document/ServiceProvider.xml</ServiceProvider>
               <OperationsMetadata>/absolute/path/to/the/document/OperationsMetadata.xml</OperationsMetadata>

               <WcsCatalog className="opendap.wcs.v1_1_2.RdfOwlCatalog">
                   <RDFServer>/absolute/path/to/the/directory/containing/the/wcs:CoverageDescription/documents/coverages/dir</RDFServer>
               </WcsCatalog>

           </Handler>
 
           <Handler className="opendap.threddsHandler.StaticCatalogDispatch">
               <prefix>thredds</prefix>
               <useMemoryCache>true</useMemoryCache>
           </Handler>
           <Handler className="opendap.bes.DapDispatchHandler" />
           <Handler className="opendap.bes.DirectoryDispatchHandler" />
           <Handler className="opendap.coreServlet.SpecialRequestDispatchHandler" />
           <Handler className="opendap.bes.VersionDispatchHandler" />
           <Handler className="opendap.bes.FileDispatchHandler" >
               <AllowDirectDataSourceAccess />
           </Handler>
           <Handler className="opendap.bes.BESThreddsDispatchHandler" />
       </HttpGetHandlers>


       <HttpPostHandlers>

           <Handler className="opendap.wcs.v1_1_2.PostHandler" >
               <prefix>WCS/post</prefix>
           </Handler>
           <Handler className="opendap.wcs.v1_1_2.SoapHandler" >
               <prefix>WCS/soap</prefix>
           </Handler>
           <Handler className="opendap.wcs.v1_1_2.FormHandler" >
               <prefix>WCS/form</prefix>
           </Handler>
 

           <Handler className="opendap.coreServlet.SOAPRequestDispatcher" >
               <OpendapSoapDispatchHandler>opendap.bes.SoapDispatchHandler</OpendapSoapDispatchHandler>
           </Handler>
       </HttpPostHandlers>
   </DispatchHandlers>
 </OLFSConfig>
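
A configuration like the one above can be checked with a short script before the server is restarted. The sketch below is only an illustration: the function names and the trimmed SAMPLE document are hypothetical, and the OLFS itself does not ship such a checker as far as this page documents. The four handler class names are the ones listed earlier on this page.

```python
import xml.etree.ElementTree as ET

# The four WCS handler classes named on this page.
WCS_HANDLERS = (
    "opendap.wcs.v1_1_2.DispatchHandler",
    "opendap.wcs.v1_1_2.PostHandler",
    "opendap.wcs.v1_1_2.SoapHandler",
    "opendap.wcs.v1_1_2.FormHandler",
)

# Trimmed, hypothetical configuration that omits the SOAP and form handlers.
SAMPLE = """
<OLFSConfig>
  <DispatchHandlers>
    <HttpGetHandlers>
      <Handler className="opendap.wcs.v1_1_2.DispatchHandler">
        <prefix>WCS</prefix>
        <WcsCatalog className="opendap.wcs.v1_1_2.RdfOwlCatalog" />
      </Handler>
    </HttpGetHandlers>
    <HttpPostHandlers>
      <Handler className="opendap.wcs.v1_1_2.PostHandler">
        <prefix>WCS/post</prefix>
      </Handler>
    </HttpPostHandlers>
  </DispatchHandlers>
</OLFSConfig>
"""


def missing_wcs_handlers(xml_text):
    """Return the WCS handler class names that the configuration does not declare."""
    root = ET.fromstring(xml_text)
    declared = {h.get("className") for h in root.iter("Handler")}
    return [name for name in WCS_HANDLERS if name not in declared]


def catalog_class(xml_text):
    """Return the className of the WcsCatalog element, or None if it is absent."""
    catalog = ET.fromstring(xml_text).find(".//WcsCatalog")
    return None if catalog is None else catalog.get("className")


if __name__ == "__main__":
    print("Missing handlers:", missing_wcs_handlers(SAMPLE))
    print("Catalog implementation:", catalog_class(SAMPLE))
```

To check a real installation, read the contents of olfs.xml into a string and pass it to the same two functions.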