Relational Database Handler: Difference between revisions

From OPeNDAP Documentation
⧼opendap2-jumptonavigation⧽
Line 54: Line 54:
===Building the DDS object in memory.===
===Building the DDS object in memory.===


[[Image:RDH ActivityDiagram.png|thumb|400px|right| Activity diagram for RDH]]
[[Image:RDH ActivityDiagram.png|thumb|400px|right| Activity diagram for the RDH]]


The RDH will load (and cache) from it's specific configuration section of the BES configuration file a list of ODBC Data Sources that it will serve as DAP data sets (See Use Case  [[Adding the RDH to the BES]]).
The RDH will load (and cache) from it's specific configuration section of the BES configuration file a list of ODBC Data Sources that it will serve as DAP data sets (See Use Case  [[Adding the RDH to the BES]]).

Revision as of 17:16, 6 May 2009

In order to provide support for a Sensor Observation Service (SOS) implementation Hyrax will need a handler that allows it to access data in a Relational Database Management System (RDBMS). (This requirement stems from the observation that many of our stake holders store their in situ measurement data in RDBMSs)

It is anticipated RDBMS handler will have a much broader application than just SOS related data sets.


Use Cases

  1. Adding the RDH to the BES
  2. RDH handles bes:showCatalog request
  3. RDH handles a DDX request
  4. RDH handles a DDS request
  5. RDH handles a DAS request
  6. RDH handles a DAP2 data request

Definitions

row set
A row set is an object which encapsulates a set of rows. Database tables are row sets. A database view is a row set. SQL queries return row sets. SQL JOIN operations take row sets as input and produce row sets.
Constraint Expression (CE)
The DAP constraint expression string as described in the DAP2 Specification.

Background

In the past a Java servlet called the DODS Relational Database Server (DRDS) was used to provide DAP access to RDBMS holdings. However this older implementation has a number of shortcomings that preclude it's direct use in our current server architecture:

  • No longer supported.
  • Uses the Java DAP implementation
  • Not a BES module
  • Significant memory limitations
  • Difficult to configure/localize

Although there has been continuing interest in a DRDS replacement within the OPeNDAP community, no funding has been available to develop a soution until recently. The IOOS project has a need to provide a Sensor Observation Service (SOS) interface for Hyrax. Since much of the sensor data is already held in RDBMSs it is a natural and necessary time to develop a DRDS replacement.

Design

There are a number of client/server interactions that can take place in the DAP protocol. However, from a programatic viewpoint, pretty much all of the activities involved in these interactions take place when a client requests DAP2 data from the server. In order for the server to respond to a DAP2 data request it must:

  • Build a DDS instance in memory.
  • Populate it with instances of DAP variables (typically from a custom class factory)
  • Parse the DAP2 Constraint Expression.
  • Read data from the underlying data source and apply the CE to the data.
  • Send the constrained data back to the client.

In order to illuminate the design of the RDH we will examine each component of this interaction in some detail.

The RDH will be a BES module/plug-in. It will use an implementation of the ODBC (most likely unixODBC) to access the RDMS(s). ODBC Data Sources will be defined at the system level as usual. [1][2] [3] [4]

As with other DAP service implementations, the abstract data type classes in the [libdap library] get sub-classed and data source specific read methods are implemented. Because of the nature of the RDBMS data sources some data types (such as Sequence) will need to have other methods (such as serialize()) overridden.

Data type classes for RDH



Building the DDS object in memory.

Activity diagram for the RDH

The RDH will load (and cache) from it's specific configuration section of the BES configuration file a list of ODBC Data Sources that it will serve as DAP data sets (See Use Case Adding the RDH to the BES). When the RDH receives a request for content (such as a DDS, DDX, DAP2 data etc.) it will identify the DAP dataset from the request, and determine which ODBC Data Source it must interact with to fulfill the request. Then, using ODBC (or SQL) introspection methods in the ODBC API, the RDH will identify the collection of tables and views available in the ODBC Data Source and use them to construct a specialized DDS instance in memory.


The RDH will traverse the available tables and views in the ODBC Data Source and it will build a representation of each one as a DAP Sequence object. The columns in each table and view will represented by a DAP variable in the corresponding Sequence instance. Each of these Sequence instances will be added to the DDS instance. When building the Sequences and their variables the RDH will use a custom class factory to build instances of DAP objects that can utilize the ODBC API to extract data from the Result Set of an ODBC brokered database query. Each of the data types generated by the custom factory will have implemented read methods able to read data from the row set returned through the ODBC API. The Sequence instances will each hold the information about their associated ODBC Data Source for use during the data serialization activity.


Building a DDS instance from an ODBC Data Source

Handling the DAP Constraint Expression

Once a an instance of SqlDDS has been created the Constraint Expression can be parsed and applied to it. This process is normally handled in 2 steps: First the CE is parsed and all of the projected variables in the DDS are marked as such. Second, during the reading of the data (which takes place during the serialization process), each data value is checked against the collection of Clauses in the CE. If it meets the CE criteria it is serialized back to the client. In the RDH the process is somewhat more complex because the RDBMS system can (and should) be used to apply most f the relational constraints. Thus in the RDH applying the CE works more like this:

  1. Create the DDX in memory
  2. Parse the CE
  3. Mark all of the projected variables in the DDS.
  4. Use the CE to create an SQL query for each Sequence (which represents a table in the RDBMS) in the DDS.
    1. Projected variables should be "SELECT"ed in the SQL query.
    2. Where possible convert all the clauses in the CE to SQL WHERE clauses.
    3. More details here.
  5. When each Sequence variable in the DDS is serialized, the Sequence.serialize() method should first connect to the ODBC Data Source and issue the (previoulsy constructed) SQL query.
  6. The row set returned by the query is then used as the data source for the Sequence's child variables.
  7. Any CE clauses that were not converted into SQL query clauses should be applied to the data as it is read from the row set.
  8. Any array sub-setting operations must be applied to the data returned in the row set.
Processing a DAP Constraint Expression.

Sending DAP2 Data

In the DDS each Sequence containing projected variables will be serialized:

Sequence.serialize()
- Make database query using SQL query string generated by constraint parsing activity..
- retrieve row set response
- set row set cursor to first row, first column.
- for each row in the row set:
- serialize() each projected child variable
- pass row set to each variable
- Move row set cursor to next row.
BaseTypeVariable.serialize(){
- read value from row set
- move cursor to next column
- Apply remaining constraints and array sub-setting
- transmit data


Details

  • Each implementation of a simple DAP type will need to be able to:
    • Read from the row set returned by an SQL query via ODBC.
    • Move the row set "cursor" to the next column in the current row.
(The order of these two operations will depend on how the implementation of the Sequence type positions the row set cursor.)
  • The implementation of Sequence type will have to manage moving the row set "cursor" from one row to the next.
  • The DAP constraint expressions will have to be converted into SQL queries. (By a child class of ConstraintEvaluator perhaps? )
  • Each Sequence requested will be associated with a separate SQL query. As the DDS is serialized, each Sequence will use the ODBC API to send the query to the Data Source and then use the returned row set as the data content for the serialization of it's (the Sequence's) variables.

Mapping the ODBC data model to the DAP2 data model

Creating an SQL query from a DAP2 Constraint Expression

RDH Catalog Organization

Deliverables

Period of use