ProgrammerGuideChapter4: Difference between revisions

From OPeNDAP Documentation

Revision as of 12:48, 26 January 2008

Using the Toolkit

This chapter describes how to use the toolkit software to build new client libraries and data servers. Before beginning to build either part of a new OPeNDAP application, it is very important to be thoroughly familiar with the details of the API to be replaced.


To create a client library that can replace the original API implementation at link time, the client library must present exactly the same interface as the original library. This includes, to the extent that they are widely used, any undocumented features of the original implementation that manifest themselves as symbols that require link-time resolution. Building a client library requires a thorough understanding of the existing implementation as well as of how the target API is currently used.


To build a good data server for files or data sets encoded using an API, it is important to understand the data model(s) the API supports and how they relate to the OPeNDAP data models. Each of the data types that the API supports must be translated into an OPeNDAP data type (i.e., one of the OPeNDAP classes that descend from BaseType). However, there is often no one-to-one match between the API's types and the OPeNDAP types. Thus, the data server author must decide how best to translate the API's types into OPeNDAP types so as to preserve as much of the data set author's intent as possible. This is complicated by the use of various conventions that (implicitly) bind several variables together within a data set. When this pattern shows up (as it does with NetCDF) you must decide whether to lump together all variables that appear to use the convention (and thus falsely group some variables) or to group only those that are explicitly grouped using whatever the API provides. If you choose the latter, then any data sets that follow the convention will lose information. When building the data server it is important to keep such tradeoffs in mind.

The following sections discuss the specifics of building a data server and a client library. The existing NetCDF server and client library are used as examples. Many APIs are very similar in their overall organization. The source code used for these examples can be found in $(OPeNDAP_ROOT)/src/nc-dods/. Much of the NetCDF example will be relevant to your task, even if your target API is significantly different. The $(OPeNDAP_ROOT)/src/jg-dods/ directory contains both a data server and client library for the Joint Global Ocean Flux Study relational data system.

Data Servers

The OPeNDAP data server consists of a dispatch program and a set of filter programs. The dispatch program reads the incoming URL and decides which of the filter programs to run based on the URL suffix.


A typical OPeNDAP data request uses three filters: one to return the DAS (.das), one for the DDS (.dds), and the third for the data (.dods). A client can also request ASCII data (.asc or .ascii), usage information about the server (.info), or version information about the server and the data (.ver).
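The suffix-based dispatch described above can be sketched in a few lines of C++. This is only an illustration: the FilterKind enumeration and both function names are hypothetical, and a real dispatch program would also have to locate the dataset and validate the request.

```cpp
#include <string>

// Hypothetical filter kinds; these names are illustrative, not OPeNDAP identifiers.
enum class FilterKind { Das, Dds, Data, Ascii, Info, Version, Unknown };

// Return the text after the final '.' of the URL path, ignoring any query string.
// Returns an empty string when the last path component has no suffix.
std::string url_suffix(const std::string &url)
{
    const std::string path = url.substr(0, url.find('?'));
    const std::string::size_type dot = path.rfind('.');
    const std::string::size_type slash = path.rfind('/');
    if (dot == std::string::npos || (slash != std::string::npos && dot < slash))
        return "";
    return path.substr(dot + 1);
}

// Map a URL suffix onto the kind of filter that should handle the request.
FilterKind filter_for_suffix(const std::string &suffix)
{
    if (suffix == "das") return FilterKind::Das;
    if (suffix == "dds") return FilterKind::Dds;
    if (suffix == "dods") return FilterKind::Data;
    if (suffix == "asc" || suffix == "ascii") return FilterKind::Ascii;
    if (suffix == "info") return FilterKind::Info;
    if (suffix == "ver") return FilterKind::Version;
    return FilterKind::Unknown;
}
```

An Unknown result is the natural place to return an error or help message to the client.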


The task of building an OPeNDAP server can then be separated into the following steps:


  1. Create concrete classes of the entire BaseType hierarchy, with read functions for each data type. Certain APIs cannot handle certain OPeNDAP types. For these types, there must still be a concrete class, but it can have a read method with a null body.
  2. Write functions that use the native API to extract from the dataset the information needed to build the OPeNDAP DAS and DDS objects, and then build them with the methods those classes provide.

    NOTE: This step has nothing at all to do with OPeNDAP. This is between you and your data. OPeNDAP makes no demands on how these structures are created. That is, for example, if all the data to be served has the same DDS, feel free to cheat. The only thing that is important is that the structures accurately reflect the relationships of the data.

  3. Create filter programs to return the DAS, DDS, data, and server usage and version information.
  4. Create a dispatch program to parse an incoming URL and invoke the correct filter program.
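The first step in the list, a concrete class for every type even when the native API cannot supply it, might look roughly like the sketch below. SketchBaseType is a simplified stand-in invented for this example; the toolkit's real BaseType has a richer interface and its read method may take different arguments.

```cpp
#include <string>

// Simplified stand-in for the toolkit's BaseType (an assumption for illustration).
class SketchBaseType {
public:
    explicit SketchBaseType(const std::string &name) : d_name(name) {}
    virtual ~SketchBaseType() {}
    const std::string &name() const { return d_name; }
    // Load this variable's value from the named dataset; true if a value was read.
    virtual bool read(const std::string &dataset) = 0;
private:
    std::string d_name;
};

// A type the native API supports: read() fetches a value.
class SketchInt32 : public SketchBaseType {
public:
    explicit SketchInt32(const std::string &name) : SketchBaseType(name), d_value(0) {}
    bool read(const std::string &dataset) override
    {
        // Placeholder for a native API call; the "value" here is just the
        // length of the dataset name so the example stays self-contained.
        d_value = static_cast<int>(dataset.size());
        return true;
    }
    int value() const { return d_value; }
private:
    int d_value;
};

// A type the native API cannot represent: the class must still exist,
// but its read() does no work and reports that nothing was read.
class SketchSequence : public SketchBaseType {
public:
    explicit SketchSequence(const std::string &name) : SketchBaseType(name) {}
    bool read(const std::string &) override { return false; }
};
```

Keeping the unsupported type concrete means code that instantiates the whole hierarchy still links, even though that type never carries data.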

To install the finished server, put the filter programs into a web server's CGI directory, and put the datasets to be served somewhere they can be seen by those filter programs. Refer to The OPeNDAP User Guide for more details about installing a server.


The Dispatch CGI

The OPeNDAP dispatch CGI program receives a data request from the OPeNDAP client, and dispatches the request to one of several filter programs. The dispatch CGI is stored in a CGI directory on the host machine. Its name is an important detail of its operation. The name should begin with nph-, and end with the letters that distinguish data files containing data formatted with that API from other files. (The nph- is a relic, dating from the misty dawn of the World Wide Web and the first http standards. It stands for "Non-Parsing Header" and is the only way to pass data through many httpd servers unparsed. See the CGI 1.1 Standard for more information.) For example, NetCDF data files are called foo.nc, so the NetCDF dispatch CGI is called nph-nc.


The dispatch CGI's job is to parse the incoming URL and execute the appropriate filter programs with the arguments enclosed in the URL. The dispatch CGI is also responsible for the first level of error information that must be returned to the user. These tasks are easily accomplished in any scripting language. On the off chance you wish to use Perl, OPeNDAP provides a Perl class designed to make writing the CGI a simple task.


The file OPeNDAP_Dispatch.pm contains the definition of the OPeNDAP_Dispatch class. This class provides several methods used to parse the incoming URL, and one method for delivering error messages to the client. The OPeNDAP_Dispatch class provides the following methods:


command()
Returns the command string implied by the input URL. The command string looks like:

command filename -e query-string

Here command is the OPeNDAP filter program to be run, filename is the absolute filename of the dataset on which to run it, and query-string is the constraint expression that was enclosed in the URL. Of the OPeNDAP_Dispatch methods, many dispatch CGI scripts may only need to use this one and print_error_msg. See File:Fig,cgi.


query()
Returns the query string from the URL. This is the OPeNDAP constraint expression.


filename()
Returns the absolute filename corresponding to the requested dataset.


extension()
Returns the extension on the end of the URL. For OPeNDAP, this will be das, dds, dods, info, or ver.


cgi_dir()
Returns the absolute pathname of the directory in which the dispatch CGI is stored. This is generally the same as the directory in which the OPeNDAP filter programs are stored.


script()
Returns the name of the dispatch CGI, minus the nph- and any suffixes used for a secure server.


print_error_message(ver)
Returns an error message to the client, explaining how to use the server. The ver argument should be a string containing the version of the server software. The error message returned is encoded in the OPeNDAP_Dispatch.pm file.


print_help_message()
Returns a help message to the client. This can be issued in response to a confusing or inadequate URL. The help message returned is encoded in the OPeNDAP_Dispatch.pm file.
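To make the relationship between these methods concrete, the following C++ sketch derives the same pieces of information from a request and assembles a command string of the form shown for command(). It is not the Perl class's implementation: the structure and function names are hypothetical, and the rule that the filter program is named script_extension (for example nc_das) is an assumption made for the example.

```cpp
#include <string>

// Hypothetical container for the values the dispatch methods return.
struct DispatchRequest {
    std::string script;     // dispatch CGI name without the "nph-" prefix
    std::string filename;   // absolute dataset path
    std::string extension;  // das, dds, dods, info, or ver
    std::string query;      // constraint expression from the URL
};

// Split the request into its parts. doc_root is the directory under which
// the datasets are assumed to live; path_info is the path after the CGI name.
DispatchRequest parse_request(const std::string &cgi_name, const std::string &path_info,
                              const std::string &query, const std::string &doc_root)
{
    DispatchRequest r;
    r.script = cgi_name.compare(0, 4, "nph-") == 0 ? cgi_name.substr(4) : cgi_name;
    const std::string::size_type dot = path_info.rfind('.');
    if (dot == std::string::npos) {
        r.filename = doc_root + path_info;      // no extension: nothing to dispatch
    } else {
        r.filename = doc_root + path_info.substr(0, dot);
        r.extension = path_info.substr(dot + 1);
    }
    r.query = query;
    return r;
}

// Build "command filename -e query-string"; an empty result signals an error.
// A real dispatcher must also validate or quote these values before running them.
std::string command_for(const DispatchRequest &r)
{
    if (r.extension.empty())
        return "";
    return r.script + "_" + r.extension + " " + r.filename + " -e " + r.query;
}
```

Returning an empty command on failure mirrors the way the sample script decides between running a filter and printing an error message.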


A sample (simple) OPeNDAP dispatch CGI is shown in File:Fig,cgi. This is a Perl script using the OPeNDAP_Dispatch methods. This script assumes that all data is rooted in the http document directory subtree. (You can use this even if you want to access files outside that subtree. Simply use a symbolic link and make sure that your server is set to follow symbolic links.)


#!/usr/local/bin/perl

use Env;
use OPeNDAP_Dispatch;

$dispatch = new OPeNDAP_Dispatch;

# Build the filter command line from the incoming URL.
$command = $dispatch->command();

if ($command ne "") {           # if no error...
    exec($command);
} else {
    # Reduce the revision-control keyword to the bare revision number.
    my $script_rev = '$Revision: 11906 $ ';
    $script_rev =~ s@\$([A-z]*): (.*) \$@$2@;
    $dispatch->print_error_msg($script_rev);
}

Figure: A simple OPeNDAP data server dispatch CGI.


The DAS and DDS filter programs

The simplest way to learn about creating a new filter program to return a dataset's DAS or DDS is to examine the existing filter programs. In this section, we will examine the NetCDF servers.

The source code for the DAS filter program distributed with the NetCDF server software is shown in File:Fig,das-filter. The DAS and DDS filters are very similar, so only the DAS filter will be discussed here. The important differences between the two will be pointed out.

The CGI dispatch program makes heavy use of commonly used functions collected in the OPeNDAP_Dispatch class. In the same way, the OPeNDAPFilter class collects several commonly used functions for the construction of filter programs. The example program uses several methods of that class. Other useful utility functions are in the cgi-util collection.


The filter program in File:Fig,das-filter can be separated into the following steps:

line 16
Step 1: The OPeNDAPFilter class provides a constructor that parses the argument list to create the data. You can use the OK method to check that the list was parsed properly. Any errors here indicate a mistake in the dispatch CGI itself. This is why the print_usage function prints its message to the WWW server log file when it returns an error object to the client.

line 21
Step 2: If the user has only requested version information from the server, it is provided here.

line 26
Step 3: The read_variables function performs the real work of this program. This involves scanning the dataset itself for data variable attributes and using the DAS method functions to assemble the corresponding DAS. This operation is specific to the data access API in use, so it does not make a good example.

line 29
Step 4: Each of the filter programs must create a MIME document to hold its return value. The DAS and DDS filters return a text MIME document; they set up the MIME headers using the utility function set_mime_text.

line 34
Step 5: Once the data set has been read and the attribute table built, the DAS ancillary file is loaded. The example filter looks for a file with the same root name as the data set and an extension of .das. If such a file exists, it is read in using the DAS member function DAS::parse and the information it contains is merged with the DAS built from the dataset.

line 37
Step 6: Finally, the DAS member function print is used to send the textual representation of the DAS to the client. When it is invoked by the httpd daemon, the dispatch CGI's standard input and output are a socket connected to the remote client process. Because the filter is invoked by the dispatch script, its output goes directly to the client. The OPeNDAPFilter send_das method looks something like this:


bool
OPeNDAPFilter::send_das(DAS &das)
{
    set_mime_text(dods_das);    // write the text MIME headers first
    das.print();                // then the textual DAS itself
    return true;
}

#include <iostream.h>

#include "DAS.h"
#include "cgi_util.h"
#include "OPeNDAPFilter.h"

extern bool read_variables(DAS &das,
                           const char *filename, String *error);

int
main(int argc, char *argv[])
{
    DAS das;
    OPeNDAPFilter df(argc, argv);

    if (!df.OK()) {
        df.print_usage();
        return 1;
    }

    if (df.version()) {
        df.send_version_info();
        return 0;
    }

    String errMsg;
    if (!read_variables(das, df.get_dataset_name(), &errMsg)) {
        Error e(no_such_file, errMsg);
        set_mime_text(dods_error);
        e.print();
        return 1;
    }

    if (!df.read_ancillary_das(das))
        return 1;

    if (!df.send_das(das))
        return 1;

    return 0;
}

Figure: The DAS filter program.
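Step 5 above says the example filter looks for an ancillary file with the same root name as the data set and a .das extension. A minimal sketch of that name derivation follows; the function name is hypothetical and the code is not taken from OPeNDAPFilter, which may resolve the file differently.

```cpp
#include <string>

// Replace the dataset's final extension with ".das", or append ".das"
// when the file name has no extension. Dots in directory names are ignored.
std::string ancillary_das_path(const std::string &dataset)
{
    const std::string::size_type slash = dataset.rfind('/');
    const std::string::size_type dot = dataset.rfind('.');
    const bool has_ext = dot != std::string::npos
        && (slash == std::string::npos || dot > slash);
    return (has_ext ? dataset.substr(0, dot) : dataset) + ".das";
}
```

With this rule a dataset stored as /data/foo.nc would be paired with /data/foo.das.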


Note that the example filter in File:Fig,das-filter does not use any caching. It is possible to build a more sophisticated filter program that saves the generated DAS to a text file and then uses that file without first interrogating the data set, thus saving on access. It is also possible to write a DAS by hand and always use that if the data set itself contains none of the kind of information that the DAS holds.


Caching DAS and DDS Objects

Because the construction of the DAS and DDS objects requires that an entire data set be scanned, it can become very inefficient to continually rebuild these objects. Because the DAS and DDS filter programs use a text representation for transmission from the server to the client, it is simple to store both the DAS and DDS objects once they have been created. Subsequent accesses to these objects can be accomplished by reading and transmitting the textual representation without actually building the binary data object.

When taking advantage of this optimization, it is important that the server check the date stamp of the DAS / DDS text objects and compare it to the latest modification date of the data set. For any dataset to which new data is periodically added, the DAS / DDS text object must clearly be updated so that the cached text object matches exactly the object that would be created if the object were built by querying the data set.
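The date-stamp comparison can be sketched as follows. This assumes a POSIX system (it uses stat), and all three function names are hypothetical. The sketch treats a cache with the same timestamp as the dataset as stale, on the assumption that one-second timestamps cannot rule out a later write within the same second.

```cpp
#include <ctime>
#include <string>
#include <sys/stat.h>

// Fetch a file's last-modification time; false if the file cannot be examined.
bool file_mtime(const std::string &path, std::time_t &mtime)
{
    struct stat sb;
    if (stat(path.c_str(), &sb) != 0)
        return false;
    mtime = sb.st_mtime;
    return true;
}

// The cached text object is usable only if it is strictly newer than the dataset.
bool cache_is_fresh(std::time_t cache_mtime, std::time_t dataset_mtime)
{
    return cache_mtime > dataset_mtime;
}

// Decide whether a cached DAS or DDS text file may be sent as is.
// A missing cache file or a missing dataset both count as "do not use the cache".
bool can_use_cache(const std::string &cache_path, const std::string &dataset_path)
{
    std::time_t cache_time = 0;
    std::time_t data_time = 0;
    if (!file_mtime(cache_path, cache_time) || !file_mtime(dataset_path, data_time))
        return false;
    return cache_is_fresh(cache_time, data_time);
}
```

When can_use_cache returns false, the filter would rebuild the object from the data set and rewrite the cached text.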

The update of the DAS / DDS text object can itself be optimized significantly. It is not actually necessary to re-read the entire data set. Because the software used to build both the DAS and the DDS binary objects works incrementally, it is possible to read the text version of the DAS / DDS object, and then read only the new parts of the data set. The binary object will be added to as needed.

NOTE: The DAS / DDS software may not properly update changed data (data that was present in a previous version of the data set, but is now different), nor is it straightforward to remove data that is no longer present in the data set. In these cases it is usually better to regenerate the DAS / DDS from scratch.


The Data filter

The data filter program is structured similarly to both the DAS and DDS filters, except that it returns a binary MIME document rather than text and takes two arguments instead of just one. In addition to the data set or file name (argument 1), it also takes the OPeNDAP constraint expression (argument 2, which was enclosed in the URL's query).

The NetCDF data filter is all but identical to the DDS filter. The only difference is that it calls the send_data method of OPeNDAPFilter to send the binary data over the network. This function calls the DDS send method.

If for some reason you cannot use the send member function of DDS, then you must ensure that the read, CE evaluation, and serialize operations are all carried out in the correct order. Furthermore, you must ensure that the return value of the data filter is a binary MIME document with a text prefix (currently, OPeNDAP does not use the multi-part MIME standard); that is, a regular binary MIME document with a section at the start that is text. This text is the DDS generated after evaluating the projection clauses of the constraint expression. The text part is separated from the data by the keyword "Data:" at the start of the line. (The "Data:" keyword is not in the scope of the text DDS, so it is possible to have the text Data: in the DDS.)
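The layout of the data filter's return value, a text DDS followed by a "Data:" line and then the binary values, can be illustrated from the receiving side. The sketch below splits a response body at the first line that begins with "Data:". The names are hypothetical, and the assumption that the separator line ends with a single newline is mine rather than something stated in this chapter.

```cpp
#include <string>

// Hypothetical holder for the two halves of a data response body.
struct DataResponse {
    std::string dds_text;  // text DDS that precedes the separator line
    std::string payload;   // bytes that follow the separator line
    bool ok;               // false if no separator line was found
};

// Split the body at the first "Data:" that starts a line.
DataResponse split_data_response(const std::string &body)
{
    DataResponse r;
    r.ok = false;
    const std::string marker = "Data:\n";   // assumed line terminator
    std::string::size_type pos = 0;
    while ((pos = body.find(marker, pos)) != std::string::npos) {
        if (pos == 0 || body[pos - 1] == '\n') {
            r.dds_text = body.substr(0, pos);
            r.payload = body.substr(pos + marker.size());
            r.ok = true;
            return r;
        }
        ++pos;   // "Data:" appeared mid-line; keep searching
    }
    return r;
}
```

A filter that writes its own response would emit the same three parts in that order: the DDS text, the separator line, then the serialized data.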


The ASCII Data Filter

OPeNDAP is packaged with a filter to translate an OPeNDAP data stream into an ASCII data file. Clients can request ASCII data by appending .asc or .ascii to their URL instead of .dods. The asciival program is useful as a standalone client (see The OPeNDAP User Guide), but may also be used by a server to provide ASCII data.

A request for ASCII data is processed as any other request for data, but the final output of the data filter is piped into the asciival program and the result returned to the client:

nc_dods Data.nc | asciival -m -- -

The OPeNDAP_Dispatch class takes care of this step automatically when it encounters a request using .asc or .ascii.


The Usage Filter

Client requests containing a .info suffix should return to the client HTML text containing documentation of both the server usage and the dataset named in the query. OPeNDAP provides a usage filter that can be used for this purpose. The OPeNDAP_Dispatch class invokes this filter.

The OPeNDAP-provided usage filter accepts two arguments, the data file name requested and the name of the CGI script (the dispatch CGI) in use:

usage filename CGI-name


The usage filter looks in the dataset directory for a file called filename.html, and in the directory specified in the CGI-name argument for a file called CGI-name.html. These two files must contain HTML, but without the <html>, <head>, or <body> tags.

For example, suppose a dispatch CGI using the OPeNDAP_Dispatch class receives a URL like this:

http://dods/cgi-bin/nph-nc/data.info


In this case, the usage filter looks for two files: cgi-bin/nph-nc.html and data.html (the htdocs directory is assumed in the second case). The contents of these two files are concatenated with an HTML representation of the DAS and DDS for the data.nc file, and the whole thing is returned to the client. If the HTML files are not found, the returned document contains only the DAS and DDS.
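The assembly of the usage document can be sketched as below. This is not the usage filter's source: both function names are hypothetical, and the order in which the three parts are concatenated is an assumption. It does show why the two HTML files can omit the html, head, and body tags, since a wrapper of this kind would supply them once for the whole document.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Read a whole file; returns an empty string if it cannot be opened,
// which is how a missing .html file is treated in this sketch.
std::string read_fragment(const std::string &path)
{
    std::ifstream in(path.c_str());
    if (!in)
        return "";
    std::ostringstream contents;
    contents << in.rdbuf();
    return contents.str();
}

// Wrap the server fragment, the dataset fragment, and the HTML form of the
// DAS and DDS in a single document.
std::string build_usage_page(const std::string &server_html,
                             const std::string &dataset_html,
                             const std::string &das_dds_html)
{
    return "<html>\n<body>\n" + server_html + dataset_html + das_dds_html
        + "</body>\n</html>\n";
}
```

If neither fragment file exists, both strings are empty and the page contains only the DAS and DDS, matching the behavior described above.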