UserGuideChapter1: Difference between revisions

From OPeNDAP Documentation
(and the motivation behind many of the design decisions) of the OPeNDAP
software.
=Using OPeNDAP=
A user works with OPeNDAP through an OPeNDAP client program.  This client program may
have been acquired by the user (for example, the OPeNDAP Matlab and IDL
graphical user interfaces, and Ferret, a freeware data analysis package,
each use OPeNDAP for data access), or may be a program converted to
use the OPeNDAP library for data access (see ([http://www <cite> opd-client</cite>])).
In either case, there is a set of issues that must be addressed in
order to use a program to access data through OPeNDAP.  The issues can be
classed into two groups.  One set of issues involves configuring the
system to provide OPeNDAP with the helper applications and environment
variables it requires.  The other set concerns the manner in which a
user communicates with an OPeNDAP server.  We cover the second set first.
==How OPeNDAP Finds Data==
Once linked to the
OPeNDAP libraries, an OPeNDAP client created from an existing program will
work exactly as before when run using local files.  However, a user
can also specify an OPeNDAP Uniform Resource Locator (URL) to indicate
some data file on a remote host machine.  When the program receives
this URL, the OPeNDAP libraries will recognize it as remote data, and
issue a network request for the data.  If a user has also installed
an OPeNDAP server on the local machine, then local data may be accessed
either through their local filenames or their OPeNDAP URL.
A URL is simply a unique name for some Internet resource.
[[Image:opd-client,fig,url-parts]] shows the parts of a
typical OPeNDAP URL.
\begin{figure}[h]
\texorhtml
{\small
${}\overbrace{>dncview}^{Program}
\overbrace{http}^{Protocol}://
\overbrace{dods.gso.uri.edu}^{Machine Name}/
\overbrace{cgi-bin/nph-nc}^{Server}/
\overbrace{data}^{Directory}/
\overbrace{fnoc1.nc}^{Filename}
\overbrace{.das}^{URL Suffix}$}
{\begin{vcode}{cb}
>dncview http://dods.gso.uri.edu/cgi-bin/nph-nc/data/fnoc1.nc.das
^        ^      ^                ^              ^    ^       ^
|        |      |                |              |    |       |
Program  |      |                |              |    |       |
Protocol--      |                |              |    |       |
Machine Name-----                |              |    |       |
Server----------------------------              |    |       |
Directory----------------------------------------    |       |
Filename----------------------------------------------       |
URL Suffix----------------------------------------------------
\end{vcode}}
\caption{Parts of an OPeNDAP URL (without a constraint expression)}
\end{figure}
The parts of the URL are:
<blockquote>
; protocol :
The protocol of an Internet request may be thought of as the kind
of conversation the client expects to have with the target machine.
For example, a web browser like Netscape Navigator wants to find
a server that can return hypertext documents, while an ftp client
wants to find a server that can understand file transfer requests. A
web browser equipped to display hypertext documents will specify
<font color='green'>http</font> as the protocol for its conversation, and hope that the target
machine has an <font color='green'>httpd</font> daemon listening.
; host : The host name in a URL is simply the
Internet address of the host machine running whatever server can
reply to the specified protocol.
; server : A special feature of the <font color='green'>httpd</font> server process is
that it may be configured to execute Common Gateway Interface (CGI)
programs  upon receipt of a properly specified URL. This is used, for example, by
Internet search engines that ask a user to fill out a form. The CGI
specification will be specific to the server in question, and the
part of the URL that follows the CGI name is passed to the CGI upon
invocation. This data may include a file name, but it may as easily
be some arbitrary string of instructions. The OPeNDAP server is simply
a set of CGI scripts executed on demand by the <font color='green'>httpd</font> server.
Here, the OPeNDAP server is represented by a CGI script called
<font color='green'>nph-nc</font>.
; filename :  If a CGI is not
specified, the part of the URL after the host name is simply the
name of a file that is to be returned to the inquiring browser.  If
a CGI is specified, the file is given to the program as its
argument.
; URL suffix :  If you are issuing an OPeNDAP request
from a non-OPeNDAP client, such as a web browser, you can specify the
type of request by appending a suffix to the URL.  Different
suffixes demand different services from the server.  The different
services are listed in ([http://www <cite> opd-client,services</cite>]).  If you
are using OPeNDAP from an OPeNDAP client, or a client program adapted to
use the OPeNDAP DAP library, you do not need to use a URL suffix.  For
example, to use OPeNDAP from Matlab, with the Matlab GUI or
command-line clients, you do not need to use a suffix.  To use OPeNDAP
from a simple web browser like Netscape Navigator, you will need to
use a suffix.
</blockquote>
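The breakdown above can be sketched in a few lines of Python. This is only an illustration, not part of the OPeNDAP software: the function name <font color='green'>split_opendap_url</font> and the suffix table are hypothetical, and the caller has to say which CGI script is the server, because the URL alone does not mark where the CGI name ends and the dataset path begins.

```python
import posixpath
from urllib.parse import urlsplit

# Suffixes mentioned in this guide; collected here only for illustration.
KNOWN_SUFFIXES = (".das", ".dds", ".dods", ".ascii", ".asc",
                  ".html", ".info", ".ver")


def split_opendap_url(url, server_name="nph-nc"):
    """Split an OPeNDAP URL into protocol, host, server, directory,
    filename, and URL suffix.

    server_name is the CGI script that acts as the OPeNDAP server
    (an assumption supplied by the caller).
    """
    parts = urlsplit(url)
    path = parts.path

    # Peel off a recognized service suffix, if there is one.
    suffix = ""
    for candidate in KNOWN_SUFFIXES:
        if path.endswith(candidate):
            suffix = candidate
            path = path[:-len(candidate)]
            break

    # Everything up to and including the CGI name is the "server";
    # the rest is handed to the CGI as its argument.
    marker = "/" + server_name + "/"
    index = path.find(marker)
    if index < 0:
        raise ValueError("server script %r not found in %r" % (server_name, url))
    server = path[1:index + len(marker) - 1]
    directory, filename = posixpath.split(path[index + len(marker):])

    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        "server": server,
        "directory": directory,
        "filename": filename,
        "suffix": suffix,
    }
```

Applied to the URL in the figure, the sketch yields <font color='green'>cgi-bin/nph-nc</font> as the server, <font color='green'>data</font> as the directory, <font color='green'>fnoc1.nc</font> as the filename, and <font color='green'>.das</font> as the suffix.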
The URL in [[Image:opd-client,fig,url-parts]] shows a client
request to the <font color='green'>httpd</font> server on the machine
<font color='green'>dods.gso.uri.edu</font>, for a netCDF dataset (specified by the
<font color='green'>nph-nc</font> in the <font color='green'>cgi-bin</font> directory) contained in a file
called <font color='green'>fnoc1.nc</font>.  Upon receiving this URL, the <font color='green'>httpd</font>
server executes the specified OPeNDAP server module (<font color='green'>nph-nc</font>), which
retrieves the file.  The file is in a directory called <font color='green'>data</font> relative to
wherever the <font color='green'>httpd</font> server looks for its data\footnote{The only
parts of the URL whose spelling is not at the discretion of the
administrator of the host machine are the <font color='green'>http</font> and the
<font color='green'>nph-</font> at the beginning of the CGI script name. Even the
<font color='green'>nc</font>, indicating netCDF, can be changed, although for clarity's
sake, we hope people won't do so.  Incidentally, the <font color='green'>nph-</font> is a
relic, dating from the early days of the World Wide Web and the
first hypertext protocol standards.  It stands for "Non-Parsing
Header" (see the CGI 1.1 Standard for more information), and is
the only way to pass data through many httpd servers unparsed.}.
OPeNDAP URLs can get somewhat more complicated than this simple
description.  In particular, they can contain "constraint
expressions" that limit a request to data satisfying a set of
conditions, and they can contain requests to specific OPeNDAP services,
besides the data delivery service suggested here.  Constraint
expressions are described in more detail in
([http://www <cite> opd-client,constraint</cite>]), while the array of services
provided by OPeNDAP servers is described in
([http://www <cite> opd-client,services</cite>]).
===Security===
Some OPeNDAP data providers will choose to control access to some or all
of their data.  When you request data from one of these servers, the
OPeNDAP client  will prompt you for a username and password.  If you want
to avoid the prompt, you can make the OPeNDAP URL even more baroque by
embedding a username and password in it, like this:
\begin{vcode}{sib}
http://user:password@www.dods.org/nph-dods/etc...
\end{vcode}
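A username or password that contains characters such as <font color='green'>@</font> or <font color='green'>:</font> has to be percent-encoded before it is embedded this way, or the URL becomes ambiguous. The following Python sketch shows one way to build such a URL. The helper name <font color='green'>embed_credentials</font> and the example dataset path are hypothetical, and the sketch assumes the input URL does not already carry credentials.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def embed_credentials(url, username, password):
    """Return url with username:password@ placed in front of the host.

    Assumes url has no credentials yet. Reserved characters in the
    username and password are percent-encoded.
    """
    parts = urlsplit(url)
    userinfo = quote(username, safe="") + ":" + quote(password, safe="")
    netloc = userinfo + "@" + parts.netloc
    return urlunsplit((parts.scheme, netloc, parts.path,
                       parts.query, parts.fragment))
```

For example, <font color='green'>embed_credentials("http://www.dods.org/nph-dods/data/example.nc", "user", "p@ss")</font> gives a URL whose password part reads <font color='green'>p%40ss</font>.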
==The OPeNDAP Services==
Up to now, we have treated the OPeNDAP server as if it has only one
service: providing data to clients who ask for it.  It is true that
this is the most important service a server provides.  However, it is
also true that the server provides several other services besides
that.  In fact, fulfilling a request for data actually requires three
separate requests from the client, using three different services of
the OPeNDAP server.
The services requested from an OPeNDAP server are specified in a suffix
appended to the URL described in
[[Image:opd-client,fig,url-parts]].  Depending on the suffix
supplied, the server will provide one of these services:
<blockquote>
; Data Attribute : This service returns the entire data
attribute structure for the given dataset. This is a text file
describing the attributes of each data quantity in that dataset.
(See ([http://www <cite> data,das</cite>]) for more information about data
attributes.)  This service is activated when the
server receives a URL ending with <font color='green'>.das</font>.
; Data Descriptor : This service returns the entire data descriptor
structure for the given dataset. This is a text file describing the
structure of the variables in the dataset. (See
([http://www <cite> data,dds</cite>]) for more information about data descriptors.)
This service is activated when the server receives a URL ending with
<font color='green'>.dds</font>.
; OPeNDAP Data : This service returns the actual data requested by
a given URL. This is not a text file, but is encoded as a
Multipurpose Internet Mail Extensions (MIME) document.  This service
is activated when the server receives a URL ending with <font color='green'>.dods</font>.
; ASCII Data : This service returns an ASCII representation of
the requested data.  This can make the data available to a wide
variety of browser programs.  This service is activated when the
server receives a URL ending with <font color='green'>.asc</font> or <font color='green'>.ascii</font>.
; \ifh : When the server receives a URL ending in
<font color='green'>.html</font>, it produces an HTML form containing information from
the dataset that you can use to construct a sensible URL with which
to request OPeNDAP data.  The \ifh is also triggered when the OPeNDAP
server receives a URL that references a directory instead of a file.
; Information :  This service returns information about
the server and dataset, in human-readable HTML form.  The returned
document may include information about both the data server itself
(e.g. server functions implemented), and the dataset referenced in
the URL.  The server administrator determines what information is
returned in response to such a request.  This service is activated
when the server receives a URL ending with <font color='green'>.info</font>. See
([http://www <cite> sec,document-data</cite>]) for more information about how to
configure the information service.
; Version : This service returns the version information for the
OPeNDAP server software running on the server.  This service is
triggered by a URL ending with <font color='green'>.ver</font>.
; Help : This service returns some help text in response to an
improperly specified URL.  This service is triggered by a URL ending
in any suffix that is not recognized by the OPeNDAP server.
</blockquote>
<blockquote>A request for data from an OPeNDAP client will generally make three
different service requests, for data attributes, data descriptors, and
for data.  The prepackaged OPeNDAP clients do this for you, so you may
not be aware that three requests are made for each URL.  That is, an OPeNDAP client may accept an OPeNDAP URL specifying some data, such as the
one shown in [[Image:opd-client,fig,url-parts]].  In this case, the
OPeNDAP client library (such as nc-dods) will accept the input URL, and
append the different suffixes to that URL, making three distinct
requests to the OPeNDAP server.</blockquote>
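The three requests described above amount to appending three suffixes to one dataset URL. Here is a minimal Python sketch of that step. The function name <font color='green'>service_urls</font> is hypothetical, the sketch only builds the URLs (it does not contact a server), and it assumes the URL carries no constraint expression.

```python
def service_urls(dataset_url, services=("das", "dds", "dods")):
    """Return the URLs a client would request for one dataset:
    data attributes, data descriptor, and the data itself."""
    return [dataset_url + "." + service for service in services]


for url in service_urls("http://dods.gso.uri.edu/cgi-bin/nph-nc/data/fnoc1.nc"):
    print(url)
```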
===\ifh===
Each OPeNDAP server implements a service called the \ifh .  This is a way
to use a standard Web client, such as Netscape, to get information
about the data served by a specific server.\footnote{The \ifh is only
available for servers later than version 3.1.} The \ifh has two
modes of operation: the directory level and the file level.
If an OPeNDAP URL references a directory instead of a file on the server
machine, the server produces a listing similar to that shown in
[[Image:opd-client,fig,ifh-dir]].
\figureplace{\ifh - Directory Level}{htbp}
{opd-client,fig,ifh-dir}{ifh-dir.ps}{ifh-dir.gif}{}
Clicking on a dataset shown in the directory-level listing
will produce an HTML form similar to the one in
[[Image:opd-client,fig,ifh]].  The top line in the window ("Data
URL") shows a URL that makes a request for an OPeNDAP dataset.  The
windows below it show the variables that make up the dataset.  You can
edit the form to select the data you'd like to see from this dataset,
and the \ifh will edit the Data URL so that it only requests the data
you are interested in.  When done, you can push the "ASCII" button,
to see an ASCII representation of the data you've requested.  Netscape
cannot handle binary data, so if you
want to use the binary data, you should copy the URL in the Data URL
window to the OPeNDAP client you'd like to use.
\figureplace{\ifh}{htbp}
{opd-client,fig,ifh}{ifh.ps}{ifh.gif}{}
==Using an OPeNDAP Program==
There are some
configuration issues a user must consider in order to use an OPeNDAP
client application program. There is a short list of software that is
required for some of the advanced features of OPeNDAP, and some
environment variables that control the execution of the OPeNDAP software.
For a piece of software that has been converted to use OPeNDAP, after
these conditions are satisfied, the program will run in the same
manner it ran before. Aside from network delays, the user should not
be able to tell that they are accessing data from the Internet.
Finally, though it may seem unnecessary to mention, in order for an OPeNDAP client application to communicate with an OPeNDAP server, the
computer running the OPeNDAP client must be connected to the Internet.
===Requirements===
In order to use some of the features of the OPeNDAP core software, a
user's computer must have some additional software installed, and
available on the user's <font color='green'>PATH</font>, in
<font color='green'>&#36;DODS_ROOT/bin</font> or <font color='green'>&#36;DODS_ROOT/etc</font>.
*The <font color='green'>wish</font> Tcl/Tk interpreter (or whatever
program is indicated by the <font color='green'>DODS_GUI</font> environment variable) is
used by the "GUI manager" to provide a progress indicator
that displays the status of a pending data request as it is being
processed. It is also used by the error reporting system to display
error messages received from the server.  \tbd{and by the data
locator, to display information and query the user}
*The <font color='green'>gzip</font> program, the GNU compression
software, is used to decompress data messages received from an OPeNDAP
server. If this program is not installed, the OPeNDAP core software
tells the server not to send compressed messages, so data may still
be received.  However, having the compression software installed and
available will increase the data transfer rate.
The required software, like OPeNDAP itself, is free software. Refer to
\appref{install} for information about acquiring that software.
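A quick way to check whether the helper programs can be found is sketched below in Python. This is not how the OPeNDAP core software itself performs the search. The function name <font color='green'>find_helper</font> is hypothetical, and the search order (the <font color='green'>PATH</font> first, then <font color='green'>&#36;DODS_ROOT/bin</font> and <font color='green'>&#36;DODS_ROOT/etc</font>) is an assumption made for the example.

```python
import os
import shutil


def find_helper(program, environ=None):
    """Return the full path of a helper program, or None if not found.

    Looks in the directories on PATH, then in $DODS_ROOT/bin and
    $DODS_ROOT/etc when DODS_ROOT is set (assumed search order).
    """
    environ = os.environ if environ is None else environ
    directories = [d for d in environ.get("PATH", "").split(os.pathsep) if d]
    root = environ.get("DODS_ROOT")
    if root:
        directories.append(os.path.join(root, "bin"))
        directories.append(os.path.join(root, "etc"))
    return shutil.which(program, path=os.pathsep.join(directories))


for helper in ("wish", "gzip"):
    print(helper, "->", find_helper(helper) or "not found")
```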
===Environment Variables===
After successfully relinking an application program with the OPeNDAP
libraries, there is a short list of environment variables that
may be defined.  Only <font color='green'>DODS_ROOT</font> is required. The other three
variables are only used to override default values controlling the GUI
manager process. Most users may safely ignore them.
<blockquote>
; <font color='green'>DODS_ROOT</font> :  indicates the root directory of the OPeNDAP
software. The OPeNDAP core software must be able to locate utilities
that are located in this directory tree.
; <font color='green'>DODS_GUI</font> : can contain the name of the program used by the
"GUI manager".  A user might wish to change this variable to
point to a "safe" Tcl/Tk interpreter; whatever program is used
here must be able to process Tcl and Tk commands.  The default value
is the <font color='green'>wish</font> program.
; <font color='green'>DODS_GUI_INIT</font> : indicates the name of any initialization
command required by the "GUI manager". The default
initialization string executes the Tcl program in
<font color='green'>&#36;DODS_ROOT/etc/dods_gui.tcl</font>.
; <font color='green'>DODS_USE_GUI</font> : may be used to turn off the GUI manager. Set
the value of this variable to <font color='green'>no</font>, and the progress indicator
and the error message windows will not be displayed.
</blockquote>
<blockquote>The user has substantial control over the GUI manager. You can
change the program that listens for GUI commands from <font color='green'>wish</font> to
anything else, and you can actually change the action of the GUI
commands by editing the Tcl code in the files <font color='green'>dods_gui.tcl</font>,
<font color='green'>error.tcl</font>, and <font color='green'>progress.tcl</font>. (These are in the
<font color='green'>&#36;DODS_ROOT/etc</font> directory.)  However, editing these files and
variables will not change the form of the messages that the OPeNDAP
server and the core software send to invoke these
programs. In other words, the user may modify these, but must be
careful to leave the GUI manager in a form that will be able to
process the messages it receives.</blockquote>
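The way these four variables combine can be sketched in Python as follows. The function name <font color='green'>gui_settings</font> and the keys of the returned dictionary are hypothetical, and the sketch only mirrors the defaults described in this section; it is not the OPeNDAP core software's own logic.

```python
import os


def gui_settings(environ=None):
    """Resolve the GUI manager settings from the environment.

    DODS_ROOT is required; the other variables fall back to the
    defaults described in this section.
    """
    environ = os.environ if environ is None else environ
    root = environ.get("DODS_ROOT")
    if not root:
        raise RuntimeError("DODS_ROOT must be set")
    return {
        "root": root,
        # Program that processes Tcl/Tk commands; wish by default.
        "gui_program": environ.get("DODS_GUI", "wish"),
        # None here means "use the default initialization string".
        "gui_init": environ.get("DODS_GUI_INIT"),
        # Script run by the default initialization string.
        "default_init_script": os.path.join(root, "etc", "dods_gui.tcl"),
        # Setting DODS_USE_GUI to "no" turns the GUI manager off.
        "use_gui": environ.get("DODS_USE_GUI") != "no",
    }
```

Passing a plain dictionary instead of the real environment makes the behavior easy to try out, for example <font color='green'>gui_settings({"DODS_ROOT": "/usr/local/dods", "DODS_USE_GUI": "no"})</font>, where the directory name is only an example.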
===The Error System===
The GUI manager is used to display error messages
to the user. The messages themselves will vary with the server
implementation. Refer to the documentation of the particular server,
or consult the server's <font color='green'>info</font> Service (See
([http://www <cite> opd-server,service</cite>]).), for a list of the error messages
that might be issued by a particular server.  \tbd{As error codes are
finalized, they should be included in an Appendix of this document,
and a pointer to them included here.}
===Temporary Files===
Using an OPeNDAP client application will
create a number of temporary files. They are created with the
<font color='green'>tmpnam()</font> function, so their names will correspond to the rules
for that function on your system. (See the manual page for
<font color='green'>tmpnam(3)</font>, or type <font color='green'>man tmpnam</font> for more information.)
During normal operation, OPeNDAP will delete the temporary files it
creates as it goes. However, if execution of the OPeNDAP client is
somehow interrupted, these files may remain, and will have to be
deleted by hand.
{{UserGuide1}}

Latest revision as of 02:36, 25 September 2007

What is OPeNDAP?

OPeNDAP provides a way for ocean researchers to access oceanographic data anywhere on the Internet from a wide variety of new and existing programs. By developing network versions of commonly used data access Application Program Interface (API) libraries, such as NetCDF, HDF, JGOFS, and others, the OPeNDAP project can capitalize on years of development of data analysis and display packages that use those APIs, allowing users to continue to use programs with which they are already familiar.

The OPeNDAP architecture uses a client/server model, with a "client" that sends requests for data out onto the network to some "server" that answers with the requested data. This is exactly the model used by the World Wide Web, where client programs called browsers submit requests to web servers for the data that make up web pages. Of course, OPeNDAP clients can do much more than browse this data. Using flexible data types suitable for many uses, including scientific data, the OPeNDAP servers deliver real data directly to the client program in the format needed by that client.

In fact, the network communication model used by OPeNDAP uses URL addresses and web servers ("httpd") to deliver data to the researcher. This is done by using the OPeNDAP software to convert a researcher's data analysis software into a sophisticated (though specialized) web browser. In addition to providing network-compatible versions of popular data access APIs, the OPeNDAP project also provides a software client and server toolkit to help other developers create network-compatible OPeNDAP versions of other APIs.

To expand the universe of data available to a user, OPeNDAP incorporates a powerful data translation facility, so that data may be stored in data structures and formats defined by the data provider, but may be accessed by the user in a manner identical to the access of local data files on the user's own system. Though there are limitations on the types of data that may be translated (See ( data,trans)), the facility is flexible and general enough to handle many of the possible translations. There are two important results:

  • A user may not need to know that data from one set are stored in a format different from data in another set. Further, it may be possible that "neither" data set is stored in a format readable by the original (i.e. without OPeNDAP) version of the data analysis and display program he or she uses.
  • No segment of OPeNDAP users will be effectively cut off from accessing data because of its storage format. A scientist who wishes to make his or her data available to other OPeNDAP users may do so while keeping that data in what may actually be a highly idiosyncratic storage format. Of course, it doesn't have to be in a highly idiosyncratic format. The point is that OPeNDAP can handle a wide variety of possible cases.

The combination of the OPeNDAP network communication model and the data translation facility makes OPeNDAP a powerful tool for the retrieval, sampling, and display of large distributed datasets. Though OPeNDAP was developed by oceanographers, its application is not constrained to oceanographic data. The organizing principles and algorithms may be applied to many other fields where data can be stored on computers.

The population of people who may be interested in a system such as OPeNDAP may be divided into data consumers and data providers. Though it was an important observation to the development of OPeNDAP that the two roles are often assumed by the same scientists, the division is a useful one for the introduction of the system. The following two sections provide a broad introduction to the roles of data consumer and data provider. The remainder of this guide is organized around this distinction between classes of users.

Why Use OPeNDAP to Read Data?

A scientist wishing to examine and sample some dataset will typically be comfortable using a relatively small number of data analysis and display programs or packages. Some of these packages will use one of the popular data access APIs currently available. However, few data access APIs provide direct access to distributed data\footnote{"Distributed data" refers to datasets that reside on different computers which are linked by a network such as the Internet. The computers may or may not be physically remote from each other. The main point is that the computers manage their data resources independently. In this guide the terms "remote" and "distributed" are used to imply independently managed resources.}, so this access must be made with network tools, such as web browsers or "ftp". While relatively straightforward in principle, this process can nonetheless become time-consuming and somewhat challenging in practice.

The following example illustrates some of the differences between accessing distributed data with the tools currently in widespread use, and the same operation using OPeNDAP.

An Example: Using ftp

The advent of the WWW has made possible simple data browsers that allow sophisticated interactive sampling of on-line datasets. Using a web browser and "ftp", a user can sample any of several large oceanographic datasets available on the Internet. However, there are several problems with these data search engines that may only become apparent when a user actually tries to use the data.

Among the problems that can arise are those that appear when a user tries to use the results of one dataset to search a second dataset. Suppose that a user wishes to choose a sea-surface temperature image from the NOAA/NASA Pathfinder AVHRR archive at:

http://podaac-www.jpl.nasa.gov/mcsst/mcsst_subset.html

using the results of a time-series generated from the COADS Climatology archive at:

http://ferret.wrc.noaa.gov/fbin/climate_server

The steps are theoretically straightforward:


  1. Create the time series from the COADS Climatology archive. This is done by answering the menu of options on the COADS web page.
  2. Import the time series from step 1 to the user's local data analysis system. Note that this step may itself require several steps:
    1. The data must be down-loaded, using "ftp" or a similar program.
    2. Once down-loaded, the data may have to be converted into a format that can be read by the data analysis program.
  3. Examine the data and formulate a request to the AVHRR archive. This is again done by answering the menu of options on the AVHRR Web page. Note that the COADS and AVHRR pages are not completely compatible in this respect. For example, the date formats of the two pages are different.
  4. Import the result of step 3 to the user's local data display system. This may also require several steps:
    1. The data must be down-loaded again.
    2. And again, once down-loaded, the data may have to be converted into a format that can be read by the data analysis program. Note that the set of available formats on the COADS page is distinct from the options available from the AVHRR archive.
  5. Think about the results.

Though the procedure is straightforward and the web servers are designed to make sampling the datasets a simple task, upon close examination, the combination of the steps may create unforeseen difficulties. For example, a request to the COADS server will return either a spreadsheet suitable for use on a PC, a netCDF format file, or a file in one of a selection of simple ASCII formats. If the user is fortunate, the returned file will already be in a format compatible with the desired analysis package. But not all users will be so fortunate. Often this file must be converted to some other file format before it can be imported to the user's analysis program. This may or may not be a simple task.

Even a file format for which a user is properly equipped may be used in an unfamiliar manner. For example, the independent and dependent variables might be in a different order or an ASCII data file may use tabs instead of spaces.

Assuming the import of the COADS data has been accomplished and boundaries for the AVHRR search identified, the task of selecting from the second archive may begin. Unfortunately, the request to the AVHRR archive will return either a GIF picture, an HDF format file, or a raw (binary) data file. Again, importing this output into the user's analysis program may or may not be simple, but it will not be the same procedure as the one used for the first data request.

Other problems are also apparent. The COADS Climatology sampling program requests the user supply dates (month and day), whereas the AVHRR archive asks for the "Julian day" (an integer between 1 and 365 or 366). One server will accept "S" and "W" to indicate South latitudes and West longitudes, while the other requires that these be indicated with negative coordinate values. The sampling of the COADS dataset, while flexible, may not allow sampling in the manner the user needs. It cannot, for example, provide a section except along a line of constant latitude or longitude. If a user wanted to see a section along a NE-SW line, it would be a challenging and time-consuming task to assemble one from many small data requests.
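The two mismatches just described (month and day versus day of the year, and hemisphere letters versus signed coordinates) are the kind of small conversion a user ends up writing by hand. A minimal Python sketch follows. The function names are hypothetical, and a year must be supplied because the day number after February depends on whether the year is a leap year.

```python
from datetime import date


def day_of_year(year, month, day):
    """Convert a calendar date to the day number within its year
    (1 to 365, or 366 in a leap year)."""
    return date(year, month, day).timetuple().tm_yday


def signed_coordinate(value, hemisphere):
    """Convert a coordinate with a hemisphere letter to a signed value:
    South latitudes and West longitudes become negative."""
    hemisphere = hemisphere.upper()
    if hemisphere in ("S", "W"):
        return -abs(value)
    if hemisphere in ("N", "E"):
        return abs(value)
    raise ValueError("unknown hemisphere: %r" % hemisphere)
```

For instance, 1 March falls on day 61 of a leap year such as 1996 but on day 60 of 1997, and a longitude given as 71.5 W becomes -71.5.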

Further, it might be desirable to use the results of sampling these two databases to construct a time series. This could conceivably mean repeating the entire procedure many times.

An Example: Using OPeNDAP

To produce the same data selection using OPeNDAP, a user would follow essentially the same steps. However, the steps themselves would be performed differently. Once the user's data analysis package has been converted to an OPeNDAP client (( opd-client,link)), \tbd{add xref to install GUI clients} the accesses to the remote datasets are made through the analysis package itself. Instead of specifying a data file by a pathname reference to some local disk file, the user specifies a URL, which may point to either a local or a remote dataset. Here is a recap of the same steps, outlined as they would be performed by an OPeNDAP application program:


  1. Create the time series from the COADS Climatology archive. This is done by using the sampling facilities of whatever data analysis program a scientist is familiar with. If desired, OPeNDAP constraint expressions may be used to reduce the network load, or to provide a sampling scheme not supported by the data analysis program.
  2. The data need not be imported to the user's data analysis program, since it was down-loaded and converted automatically in step 1.
  3. Examine the data and formulate a request to the AVHRR archive. This is again done through the sampling facilities of whatever data analysis program the user is using, and OPeNDAP constraint expressions. Note that, whatever their actual format, both COADS and AVHRR archives appear to the OPeNDAP client to be stored in identical formats.
  4. The data need not be imported to the user's data analysis program, since it was down-loaded and converted automatically in step 3.
  5. Think about the results.

It is important to note that "any" data analysis package that can handle one of the DODS-supported data access APIs can be converted into an OPeNDAP client program capable of reading data stored by "all" of the DODS-supported data access APIs. (There are some limitations on translation. See ( intro,opd-client) and ( data,trans) for more information.) Therefore, assuming the user has some analysis package capable of doing the required sampling and analysis on local data, all the steps would be performed from within that package, just as if the user were operating on local files. The result is a simpler procedure, even though the same essential steps are followed.

The OPeNDAP scenario has, among others, the following advantages:


  • The user need not learn about any of the archival formats, since the OPeNDAP server and client cooperate to deliver the data in the format in which the analysis package expects to see it. Whereas the user of the ftp server has to worry about importing the data into the analysis program, the OPeNDAP client program imports it transparently.
  • The user can sample the distant datasets in any fashion supported by his or her own (local) analysis package. Unnecessary data need not be sent over the Internet.
  • By appending a "constraint expression" to the URLs given to the analysis program, the user can sample data using techniques that their analysis program does not support.\footnote{For example, suppose a user wishes to access the NODC XBT database using a program that uses the netCDF API. A program that can process the arrays that netCDF manipulates is largely unsuitable for XBT station data. However, a user can define constraint expressions in the URL to sample the data and deliver it in a form the netCDF API can use. For more information about constraint expressions, see Section~(opd-client,constraint). For more information about data models and translation, see Chapter~(data).}\tbd{Use a different example in the footnote}
  • A substantial amount of the searching and sampling is performed on the server machines. This reduces Internet traffic, as well as decreasing the load on the local machine.

The OPeNDAP Client

OPeNDAP uses a client/server model. As mentioned, the OPeNDAP servers are simply "httpd" web servers, equipped to interpret an OPeNDAP URL sent to them. (See \chapterref{opd-server}.) The OPeNDAP client program can be any program that uses one of the supported APIs, such as JGOFS or netCDF.\footnote{Or a program specially developed to read data from OPeNDAP servers.}

Without OPeNDAP, an application program that uses one of the common data access APIs such as netCDF will operate as shown in File:Intro,fig,unlinked. The user makes a request for data from the application program. The program in turn uses procedures defined by the data access API to access the data, which is stored locally on the host machine. Some APIs are somewhat more sophisticated than this, of course, but their general operation is similar to this outline.

\figureplace{The Architecture of a Data Analysis Package.}{htbp} {intro,fig,unlinked}{unlinked.ps}{unlinked.gif}{}

The operation of an OPeNDAP client is illustrated in File:Intro,fig,linked. Here, the same application program that was used in File:Intro,fig,unlinked has been linked with an OPeNDAP version of the data access API. Now, in addition to being able to use local data as before, the application program can access data from OPeNDAP servers anywhere on the Internet in the same manner as the local data.

To make a program into an OPeNDAP client, it need only be re-linked with the OPeNDAP implementation of the supported API library. This is a simple process, generally requiring only a few minutes. The result is a program that accepts URLs, which specify a location for the data somewhere on the Internet, in addition to file pathnames, which specify a location only on the local platform's file system. (See ( opd-client,link).)
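
The distinction between a URL and a local pathname can be sketched in a few lines of Python. This is only an illustration of the idea, not the logic the OPeNDAP libraries actually use; the function names and the example host are hypothetical.

```python
from urllib.parse import urlparse


def is_remote(name):
    """Return True if the dataset name looks like an HTTP(S) URL."""
    # A local pathname such as /data/sst.nc has no http or https scheme.
    return urlparse(name).scheme in ("http", "https")


def describe_source(name):
    """Say how a re-linked program might treat a given dataset name."""
    if is_remote(name):
        return "remote: issue a network request to the server"
    return "local: open the file through the ordinary API"


# Hypothetical dataset names, used only for illustration.
print(describe_source("/data/sst.nc"))
print(describe_source("http://data.example.org/opendap/sst.nc"))
```

The application code itself does not change: it passes a name to the API, and the choice between file access and network access happens beneath it.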

\figureplace{The Architecture of a Data Analysis Package Using OPeNDAP.}{htbp} {intro,fig,linked}{linked.ps}{linked.gif}{}

OPeNDAP also provides a data translation facility. Data from the original data file is translated by the OPeNDAP server into an OPeNDAP data model for transmission to the client. Upon receiving the data, the client translates the data into the data model it understands. (See ( data) for more information about the OPeNDAP data model.) Because the data transmitted from an OPeNDAP server to the client travel in the OPeNDAP format, the data set's original storage format is completely irrelevant to the user of an OPeNDAP client. If the client was originally designed to read netCDF format files, the data returned by the OPeNDAP-netCDF library will appear to have been read from a netCDF file, whatever the actual format of the files from which the data were read\footnote{Note that there is a limit to what can be translated. An API meant to support two-dimensional arrays may be able to handle one-dimensional vector data, but a program designed to process one-dimensional vector data will not know what to do with a two-dimensional array. The set of data access APIs supported by OPeNDAP contains several such mismatches. See Section~(data,trans) for more information.}. If the program expects JGOFS data, the DODS-JGOFS library will return data that seem to have come from a JGOFS dataset, again, no matter what the actual input file format.

OPeNDAP does not pretend to remove all the overhead of data searches. A user will still have to keep track of the URLs of interesting data sets in the same way a user must now keep track of the names of files containing interesting data. An OPeNDAP \new{catalog service} is being built that will help users scan the available datasets.

Providing Data with OPeNDAP

The OPeNDAP data provider is the person or organization willing to make their digital datasets available to the community with an OPeNDAP server.


The designers of OPeNDAP recognized that many data users are also data providers, and OPeNDAP was built so that providing data would be as simple and straightforward as possible. In many cases, once a local web server is equipped to become an OPeNDAP server, a scientist need do very little beyond what must be done simply to make the data available locally (i.e., put the data into a file format that can be read by the locally used data analysis and display programs). The tasks of a data provider can be separated into the following parts:


  • Install and configure the OPeNDAP server. (See ( opd-server,install).)
  • Create whatever ancillary data files are needed by the data set (if any). (See ( intro,ancillary).)
  • Register the data set with the master directory (optional).
  • Create the data catalog.

The OPeNDAP Server

The OPeNDAP data server is simply made up of a regular httpd server equipped with CGI programs (or filters) that will respond to requests for dataset structure, data attributes, and data itself. (See ( data,dap) for a description of the data returned by these requests and see ( opd-client,url) for a description of the OPeNDAP URL syntax used to send these requests.) Most of the task of a data provider consists of configuring this server. While perhaps not a trivial task, it potentially represents far less effort than packaging a dataset for submission to some central data archive. Furthermore, modifying a server's configuration to accommodate new data will be an almost trivial task, involving the simple editing of a configuration file.
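
The three kinds of request can be sketched in Python as follows. The sketch assumes the DAP2 convention in which a suffix on the dataset URL selects the response: `.dds` for dataset structure, `.das` for data attributes, and `.dods` for the data itself. The section on URL syntax referenced above is the authority on the exact form. The host name and the helper `service_urls` are hypothetical.

```python
def service_urls(dataset_url):
    """Map each kind of request to the URL a client would send."""
    return {
        "structure": dataset_url + ".dds",   # dataset structure
        "attributes": dataset_url + ".das",  # data attributes
        "data": dataset_url + ".dods",       # the data itself
    }


# Hypothetical server and dataset, used only for illustration.
for kind, request in service_urls("http://data.example.org/opendap/sst.nc").items():
    print(kind, "->", request)
```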

Ancillary Data

In order for an OPeNDAP client to accept data from an OPeNDAP server, it must be able to allocate the data structures and arrange internal labels to organize the incoming data. The information the client library needs to do this organizing is called the ancillary data\footnote{It is also referred to as the Data Descriptor Structure and the Data Attribute Structure. See Chapter~(data) for more details about these structures.}. For many APIs, the ancillary data is inherent in the data files themselves, and the OPeNDAP server can glean that information by scanning the data files. For large data archives, where scanning the data files is impractical, and that might not change often, OPeNDAP can cache the ancillary data to speed access times. When a client requests the ancillary data, the OPeNDAP server can check this data cache first before scanning the data files.

This feature is useful in other cases because not all data file formats are self-describing. For example, a data set might contain several files of time vs. temperature data; the header information describing which numbers are temperature and which time may be in a different file or may simply be understood by the user of the local data analysis program equipped to look at this data. As an example, data accessed by OPeNDAP servers using the FreeForm data access API require provider-created ancillary data files.
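
The cache-first lookup described above can be sketched in Python. This is a minimal illustration of the idea rather than the server's actual implementation. The class `AncillaryCache`, the `scan` callable, and the dataset name are all hypothetical.

```python
class AncillaryCache:
    """Serve ancillary data from a cache, scanning files only on a miss."""

    def __init__(self, scan):
        # scan: a callable that derives ancillary data from a dataset,
        # for example by reading the data files (assumed to be slow).
        self._scan = scan
        self._cache = {}
        self.scans = 0  # counts how often the slow path was taken

    def get(self, dataset):
        # Check the cache first; scan the data files only if needed.
        if dataset not in self._cache:
            self.scans += 1
            self._cache[dataset] = self._scan(dataset)
        return self._cache[dataset]


def fake_scan(dataset):
    # Stand-in for scanning data files; returns a made-up description.
    return {"dataset": dataset, "variables": ["time", "temperature"]}


cache = AncillaryCache(fake_scan)
first = cache.get("buoy_42")
second = cache.get("buoy_42")
print(cache.scans)
```

For a format that is not self-describing, the same lookup could be seeded with provider-written ancillary data in place of a scan.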

Administration and Centralization of Data

Under OPeNDAP, there is no central archive of data. Data under OPeNDAP is organized in a manner similar to the World Wide Web itself. That is, all one need do to make one's data available is to start up a properly configured "httpd" server on an Internet node that has access to the data to be served. Each data provider is free to join and to leave the system when it is convenient, just as any proprietor of a web page is free to delete it or add to it as whimsy demands.

Of course, as can also be seen on the World Wide Web, there are some disadvantages to the lack of central authority. If no one knows about a web site, no one will visit it. Similarly, listing a dataset in a central data catalog, such as the Global Change Master Directory (http://gcmd.gsfc.nasa.gov/), can make data available to other researchers in a way that simply configuring an OPeNDAP server does not. OPeNDAP provides a facility for registering a data set with the GCMD catalog, which makes the data set known to the OPeNDAP data location service.


The remainder of this book will be divided into three major sections: instructions on the building and operating of OPeNDAP clients; a tutorial and reference on running OPeNDAP servers and making data available to OPeNDAP clients; and technical documentation describing the implementation details (and the motivation behind many of the design decisions) of the OPeNDAP software.