UserGuideChapter1

From OPeNDAP Documentation
(and the motivation behind many of the design decisions) of the OPeNDAP
software.
=Using OPeNDAP=
A user accesses OPeNDAP data with an OPeNDAP client program. This client program may
have been acquired by the user (for example, the OPeNDAP Matlab and IDL
graphical user interfaces and Ferret, a freeware data analysis package,
all use OPeNDAP for data access), or may be a program converted to
use the OPeNDAP library for data access (see [[#The OPeNDAP Client|The OPeNDAP Client]]).
In either case, there is a set of issues that must be addressed in
order to use a program to access data through OPeNDAP. The issues can be
classed into two groups. One set of issues involves configuring the
system to provide OPeNDAP with the helper applications and environment
variables it requires. The other set concerns the manner in which a
user communicates with an OPeNDAP server. We cover this second set first.
==How OPeNDAP Finds Data==
Once linked to the
OPeNDAP libraries, an OPeNDAP client created from an existing program will
work exactly as before when run using local files.  However, a user
can also specify an OPeNDAP Uniform Resource Locator (URL) to indicate
some data file on a remote host machine.  When the program receives
this URL, the OPeNDAP libraries will recognize it as remote data, and
issue a network request for the data.  If a user has also installed
an OPeNDAP server on the local machine, then local data may be accessed
either through their local filenames or their OPeNDAP URL.
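The dispatch described above (a URL triggers a network request, anything else is read as a local file) can be pictured with a short Python sketch. It is illustrative only: <font color='green'>is_remote</font> and <font color='green'>open_dataset</font> are hypothetical helpers, not OPeNDAP library functions, and the sketch assumes that a remote name is recognizable by an <font color='green'>http</font> scheme and a host name.

```python
from urllib.parse import urlsplit


def is_remote(name: str) -> bool:
    """Hypothetical check: treat http(s) URLs that name a host as remote data."""
    parts = urlsplit(name)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def open_dataset(name: str) -> str:
    """Hypothetical dispatcher mirroring the two paths a relinked client takes.

    It only reports which path would be used; it performs no I/O.
    """
    if is_remote(name):
        return "network request: " + name
    return "local file: " + name


if __name__ == "__main__":
    print(open_dataset("fnoc1.nc"))
    print(open_dataset("http://dods.gso.uri.edu/cgi-bin/nph-nc/data/fnoc1.nc"))
```

A plain file name such as <font color='green'>fnoc1.nc</font> has no scheme, so it stays on the local path.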
A URL is simply a unique name for some Internet resource.
The figure below shows the parts of a
typical OPeNDAP URL.
<pre>
>dncview http://dods.gso.uri.edu/cgi-bin/nph-nc/data/fnoc1.nc.das
 |       |      |                |              |    |       |
 |       |      |                |              |    |       +--- URL Suffix
 |       |      |                |              |    +----------- Filename
 |       |      |                |              +---------------- Directory
 |       |      |                +------------------------------- Server
 |       |      +------------------------------------------------ Machine Name
 |       +------------------------------------------------------- Protocol
 +--------------------------------------------------------------- Program
</pre>
''Parts of an OPeNDAP URL (without a constraint expression)''
The parts of the URL are:
<blockquote>
; protocol :
The protocol of an Internet request may be thought of as the kind
of conversation the client expects to have with the target machine.
For example, a web browser like Netscape Navigator wants to find
a server that can return hypertext documents, while an ftp client
wants to find a server that can understand file transfer requests. A
web browser equipped to display hypertext documents will specify
<font color='green'>http</font> as the protocol for its conversation, and hope that the target
machine has an <font color='green'>httpd</font> daemon listening.
; host : The host name in a URL is simply the
Internet address of the host machine running whatever server can
reply to the specified protocol.
; server : A special feature of the <font color='green'>httpd</font> server process is
that it may be configured to execute Common Gateway Interface (CGI)
programs upon receipt of a properly specified URL. This is used, for example, by
Internet search engines that ask a user to fill out a form. The CGI
programs available, and the URLs that invoke them, are specific to the
server in question, and the part of the URL that follows the CGI name is
passed to the CGI program upon invocation. This data may include a file name,
but it may as easily be some arbitrary string of instructions. The OPeNDAP
server is simply a set of CGI scripts executed on demand by the
<font color='green'>httpd</font> server. Here, the OPeNDAP server is represented
by a CGI script called <font color='green'>nph-nc</font>.
; filename :  If a CGI is not
specified, the part of the URL after the host name is simply the
name of a file that is to be returned to the inquiring browser.  If
a CGI is specified, the file is given to the program as its
argument.
; URL suffix :  If you are issuing an OPeNDAP request
from a non-OPeNDAP client, such as a web browser, you can specify the
type of request by appending a suffix to the URL.  Different
suffixes demand different services from the server.  The different
services are listed in [[#The OPeNDAP Services|The OPeNDAP Services]]. If you
are using OPeNDAP from an OPeNDAP client, or a client program adapted to
use the OPeNDAP DAP library, you do not need to use a URL suffix.  For
example, to use OPeNDAP from Matlab, with the Matlab GUI or
command-line clients, you do not need to use a suffix.  To use OPeNDAP
from a simple web browser like Netscape Navigator, you will need to
use a suffix.
</blockquote>
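To make the decomposition concrete, here is a minimal Python sketch that splits the example URL into the parts listed above. It assumes, as in the example, that the server is a CGI script whose name begins with <font color='green'>nph-</font>, and it recognizes only the service suffixes listed in [[#The OPeNDAP Services|The OPeNDAP Services]]; <font color='green'>split_opendap_url</font> is a hypothetical helper, not part of the OPeNDAP software.

```python
import posixpath
from urllib.parse import urlsplit

# Suffixes of the services listed in "The OPeNDAP Services" section.
SERVICE_SUFFIXES = {".das", ".dds", ".dods", ".asc", ".ascii", ".html", ".info", ".ver"}


def split_opendap_url(url: str) -> dict:
    """Split an OPeNDAP URL into protocol, host, server, directory, filename, suffix.

    Assumes the server is a CGI script whose name starts with "nph-";
    other installations may be laid out differently.
    """
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    # Find the CGI script; the path up to and including it names the server.
    for i, segment in enumerate(segments):
        if segment.startswith("nph-"):
            break
    else:
        raise ValueError("no nph-* CGI script found in " + url)
    rest = segments[i + 1:]
    filename = rest[-1] if rest else ""
    stem, ext = posixpath.splitext(filename)
    # Only a recognised service suffix is split off; ".nc" stays in the filename.
    suffix = ext if ext in SERVICE_SUFFIXES else ""
    return {
        "protocol": parts.scheme,
        "host": parts.netloc,
        "server": "/".join(segments[: i + 1]),
        "directory": "/".join(rest[:-1]),
        "filename": stem if suffix else filename,
        "suffix": suffix,
    }
```

For the example URL this yields protocol <font color='green'>http</font>, host <font color='green'>dods.gso.uri.edu</font>, server <font color='green'>cgi-bin/nph-nc</font>, directory <font color='green'>data</font>, filename <font color='green'>fnoc1.nc</font>, and suffix <font color='green'>.das</font>.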
The URL in the figure above shows a client
request to the <font color='green'>httpd</font> server on the machine
<font color='green'>dods.gso.uri.edu</font>, for a netCDF dataset (specified by the
<font color='green'>nph-nc</font> script in the <font color='green'>cgi-bin</font> directory) contained in a file
called <font color='green'>fnoc1.nc</font>. Upon receiving this URL, the <font color='green'>httpd</font>
server executes the specified OPeNDAP server module (<font color='green'>nph-nc</font>), which
retrieves the file from a directory called <font color='green'>data</font> relative to
wherever the <font color='green'>httpd</font> server looks for its data.

(The only parts of the URL whose spelling is not at the discretion of the
administrator of the host machine are the <font color='green'>http</font> and the
<font color='green'>nph-</font> at the beginning of the CGI script name. Even the
<font color='green'>nc</font>, indicating netCDF, can be changed, although for clarity's
sake, we hope people won't do so. Incidentally, the <font color='green'>nph-</font> is a
relic, dating from the early days of the World Wide Web and the
first hypertext protocol standards. It stands for "Non-Parsed
Header" (see the CGI 1.1 standard for more information), and is
the only way to pass data through many httpd servers unparsed.)
OPeNDAP URLs can get somewhat more complicated than this simple
description. In particular, they can contain "constraint
expressions" that limit a request to data satisfying a set of
conditions, and they can contain requests to specific OPeNDAP services
besides the data delivery service suggested here. Constraint
expressions are described in more detail in
([http://www <cite> opd-client,constraint</cite>]), while the array of services
provided by OPeNDAP servers is described in
[[#The OPeNDAP Services|The OPeNDAP Services]].
===Security===
Some OPeNDAP data providers will choose to control access to some or all
of their data.  When you request data from one of these servers, the
OPeNDAP client  will prompt you for a username and password.  If you want
to avoid the prompt, you can make the OPeNDAP URL even more baroque by
embedding a username and password in it, like this:
<pre>
http://user:password@www.dods.org/nph-dods/etc...
</pre>
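If you build such URLs in a program, the user name and password should be percent-encoded, since characters such as @ or : would otherwise be read as URL delimiters. The Python sketch below shows one way to do this with the standard <font color='green'>urllib.parse</font> module; <font color='green'>with_credentials</font> and the path <font color='green'>nph-dods/data/x.nc</font> are hypothetical stand-ins for the elided part of the example URL.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def with_credentials(url: str, user: str, password: str) -> str:
    """Return url with "user:password@" placed in front of the host name.

    Characters such as "@" or ":" in the user name or password are
    percent-encoded so they are not mistaken for URL delimiters.
    """
    parts = urlsplit(url)
    userinfo = quote(user, safe="") + ":" + quote(password, safe="")
    return urlunsplit(parts._replace(netloc=userinfo + "@" + parts.netloc))


if __name__ == "__main__":
    print(with_credentials("http://www.dods.org/nph-dods/data/x.nc", "user", "password"))
```

Keep in mind that a password embedded this way is stored in plain text wherever the URL is recorded, such as in a shell history file.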
==The OPeNDAP Services==
Up to now, we have treated the OPeNDAP server as if it has only one
service: providing data to clients who ask for it.  It is true that
this is the most important service a server provides.  However, it is
also true that the server provides several other services besides
that.  In fact, fulfilling a request for data actually requires three
separate requests from the client, using three different services of
the OPeNDAP server.
The services requested from an OPeNDAP server are specified in a suffix
appended to the URL described in the figure in
[[#How OPeNDAP Finds Data|How OPeNDAP Finds Data]]. Depending on the suffix
supplied, the server will provide one of these services:
<blockquote>
; Data Attribute : This service returns the entire data
attribute structure for the given dataset. This is a text file
describing the attributes of each data quantity in that dataset.
(See ([http://www <cite> data,das</cite>]) for more information about data
attributes.)  This service is activated when the
server receives a URL ending with <font color='green'>.das</font>.
; Data Descriptor : This service returns the entire data descriptor
structure for the given dataset. This is a text file describing the
structure of the variables in the dataset. (See
([http://www <cite> data,dds</cite>]) for more information about data descriptors.)
This service is activated when the server receives a URL ending with
<font color='green'>.dds</font>.
; OPeNDAP Data : This service returns the actual data requested by
a given URL. This is not a text file, but is encoded as a
Multipurpose Internet Mail Extensions (MIME) document.  This service
is activated when the server receives a URL ending with <font color='green'>.dods</font>.
; ASCII Data : This service returns an ASCII representation of
the requested data.  This can make the data available to a wide
variety of browser programs.  This service is activated when the
server receives a URL ending with <font color='green'>.asc</font> or <font color='green'>.ascii</font>.
; WWW Interface : When the server receives a URL ending in
<font color='green'>.html</font>, it produces an HTML form containing information from
the dataset that you can use to construct a sensible URL with which
to request OPeNDAP data. The WWW Interface is also triggered when the OPeNDAP
server receives a URL that references a directory instead of a file.
; Information :  This service returns information about
the server and dataset, in human-readable HTML form.  The returned
document may include information about both the data server itself
(e.g. server functions implemented), and the dataset referenced in
the URL.  The server administrator determines what information is
returned in response to such a request.  This service is activated
when the server receives a URL ending with <font color='green'>.info</font>. See
([http://www <cite> sec,document-data</cite>]) for more information about how to
configure the information service.
; Version : This service returns the version information for the
OPeNDAP server software running on the server.  This service is
triggered by a URL ending with <font color='green'>.ver</font>.
; Help : This service returns some help text in response to an
improperly specified URL.  This service is triggered by a URL ending
in any suffix that is not recognized by the OPeNDAP server.
</blockquote>
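The suffix-to-service mapping in the list above can be sketched as a lookup table. This Python sketch is illustrative, not server code: <font color='green'>select_service</font> is a hypothetical name, the service labels are informal, and it assumes that a directory reference is recognizable by a trailing slash (a real server would consult its file system).

```python
# Suffix -> service, following the list above. Both ".asc" and ".ascii"
# select the ASCII service.
SERVICES = {
    ".das": "Data Attribute",
    ".dds": "Data Descriptor",
    ".dods": "OPeNDAP Data",
    ".asc": "ASCII Data",
    ".ascii": "ASCII Data",
    ".html": "HTML Form",
    ".info": "Information",
    ".ver": "Version",
}


def select_service(url: str) -> str:
    """Name the service a server would run for url (illustrative only)."""
    # Assumption: a trailing slash marks a directory reference, which
    # also produces the HTML form.
    if url.endswith("/"):
        return "HTML Form"
    last = url.rsplit("/", 1)[-1]
    dot = last.rfind(".")
    # No suffix, or one not in the table: fall back to the Help service.
    if dot == -1:
        return "Help"
    return SERVICES.get(last[dot:], "Help")
```

Note that a bare dataset name such as <font color='green'>fnoc1.nc</font> ends in an unrecognized suffix and so, in this sketch, falls through to Help.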
<blockquote>A request for data from an OPeNDAP client will generally make three
different service requests: for data attributes, for data descriptors, and
for the data itself. The prepackaged OPeNDAP clients do this for you, so you may
not be aware that three requests are made for each URL. That is, an OPeNDAP client may accept an OPeNDAP URL specifying some data, such as the
one shown in the figure in [[#How OPeNDAP Finds Data|How OPeNDAP Finds Data]]. In this case, the
OPeNDAP client library (such as nc-dods) will accept the input URL,
append the different suffixes to that URL, and make three distinct
requests to the OPeNDAP server.</blockquote>
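A minimal sketch of that fan-out in Python: one user-supplied URL becomes three request URLs. <font color='green'>service_requests</font> is a hypothetical name, the sketch deliberately ignores constraint expressions, and it is not how nc-dods is actually written.

```python
# Attribute, descriptor, and data services, in the order described above.
DATA_SUFFIXES = (".das", ".dds", ".dods")


def service_requests(data_url: str) -> list:
    """Return the three request URLs a client derives from one dataset URL."""
    return [data_url + suffix for suffix in DATA_SUFFIXES]


if __name__ == "__main__":
    for request in service_requests("http://dods.gso.uri.edu/cgi-bin/nph-nc/data/fnoc1.nc"):
        print(request)
```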
===WWW Interface===
Each OPeNDAP server implements a service called the WWW Interface. This is a way
to use a standard Web client, such as Netscape, to get information
about the data served by a specific server. (The WWW Interface is only
available for servers later than version 3.1.) The WWW Interface has two
modes of operation: the directory level and the file level.
If an OPeNDAP URL references a directory instead of a file on the server
machine, the server produces a listing similar to that shown in
the figure below.
''[Figure: WWW Interface, directory level (ifh-dir.gif)]''
Clicking on a dataset shown in the directory-level listing
will produce an HTML form similar to the one in the figure below.
The top line in the window ("Data URL") shows a URL that makes a request
for an OPeNDAP dataset. The windows below it show the variables that make
up the dataset. You can edit the form to select the data you'd like to see
from this dataset, and the WWW Interface will edit the Data URL so that it
only requests the data you are interested in. When done, you can push the
"ASCII" button to see an ASCII representation of the data you've requested.
Netscape cannot display binary data, so if you want to use the binary data,
copy the URL in the Data URL window to the OPeNDAP client you'd like to use.
''[Figure: WWW Interface (ifh.gif)]''
==Using an OPeNDAP Program==
There are some
configuration issues a user must consider in order to use an OPeNDAP
client application program. There is a short list of software that is
required for some of the advanced features of OPeNDAP, and some
environment variables that control the execution of the OPeNDAP software.
For a piece of software that has been converted to use OPeNDAP, after
these conditions are satisfied, the program will run in the same
manner it ran before. Aside from network delays, the user should not
be able to tell that they are accessing data from the Internet.
Finally, though it may seem unnecessary to mention, in order for an OPeNDAP client application to communicate with an OPeNDAP server, the
computer running the OPeNDAP client must be connected to the Internet.
===Requirements===
In order to use some of the features of the OPeNDAP core software, a
user's computer must have some additional software installed and
available either on the user's <font color='green'>PATH</font> or in
<font color='green'>&#36;DODS_ROOT/bin</font> or <font color='green'>&#36;DODS_ROOT/etc</font>.
*The <font color='green'>wish</font> Tcl/Tk interpreter (or whatever program is indicated by the <font color='green'>DODS_GUI</font> environment variable) is used by the "GUI manager" to provide a progress indicator that displays the status of a pending data request as it is being processed. It is also used by the error reporting system to display error messages received from the server.
*The <font color='green'>gzip</font> program, the GNU compression software, is used to decompress data messages received from an OPeNDAP server. If this program is not installed, the OPeNDAP core software tells the server not to send compressed messages, so data may still be received. However, having the compression software installed and available will increase the data transfer rate.
The required software, like OPeNDAP itself, is free software. Refer to
the installation appendix for information about acquiring that software.
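A client can check for these helpers before relying on them. The Python sketch below looks for a program on the <font color='green'>PATH</font> and then in <font color='green'>&#36;DODS_ROOT/bin</font> and <font color='green'>&#36;DODS_ROOT/etc</font>; the search order and the name <font color='green'>find_helper</font> are assumptions made for illustration, not a description of how the OPeNDAP core software actually searches.

```python
import os
import shutil
from typing import Optional


def find_helper(program: str) -> Optional[str]:
    """Locate a helper such as wish or gzip.

    Searches PATH first, then $DODS_ROOT/bin and $DODS_ROOT/etc (an assumed
    order). Returns the full path, or None if the program is not found.
    """
    found = shutil.which(program)
    if found:
        return found
    root = os.environ.get("DODS_ROOT")
    if not root:
        return None
    dods_dirs = os.pathsep.join(os.path.join(root, sub) for sub in ("bin", "etc"))
    return shutil.which(program, path=dods_dirs)


if __name__ == "__main__":
    for helper in ("wish", "gzip"):
        print(helper, "->", find_helper(helper) or "not found")
```

A result of None for <font color='green'>gzip</font> would correspond to the case above in which the client asks the server not to compress its messages.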
===Environment Variables===
After successfully relinking an application program with the OPeNDAP
libraries, there is a short list of environment variables that
may be defined.  Only <font color='green'>DODS_ROOT</font> is required. The other three
variables are only used to override default values controlling the GUI
manager process. Most users may safely ignore them.
<blockquote>
; <font color='green'>DODS_ROOT</font> : indicates the root directory of the OPeNDAP
software. The OPeNDAP core software must be able to locate utilities
that are located in this directory tree.
; <font color='green'>DODS_GUI</font> : can contain the name of the program used by the
''GUI manager''. A user might wish to change this variable to
point to a "safe" Tcl/Tk interpreter; whatever program is used
here must be able to process Tcl and Tk commands. The default value
is the <font color='green'>wish</font> program.
; <font color='green'>DODS_GUI_INIT</font> : indicates the name of any initialization
command required by the "GUI manager". The default
initialization string executes the Tcl program in
<font color='green'>&#36;DODS_ROOT/etc/dods_gui.tcl</font>.
; <font color='green'>DODS_USE_GUI</font> : may be used to turn off the GUI manager. Set
the value of this variable to <font color='green'>no</font>, and the progress indicator
and the error message windows will not be displayed.
</blockquote>
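The rules above (one required variable, three optional overrides) can be summarized in a small Python sketch. The function name <font color='green'>gui_settings</font> and the dictionary keys are hypothetical, and the <font color='green'>gui_init</font> value is a file path standing in for the default initialization command, which runs <font color='green'>dods_gui.tcl</font> from <font color='green'>&#36;DODS_ROOT/etc</font>.

```python
import os
from typing import Mapping


def gui_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Resolve the GUI-manager settings from the environment variables above.

    DODS_ROOT is required; the other three fall back to the documented
    defaults. The returned keys are illustrative names chosen for this sketch.
    """
    root = env.get("DODS_ROOT")
    if not root:
        raise RuntimeError("DODS_ROOT must name the OPeNDAP installation root")
    return {
        "gui_program": env.get("DODS_GUI", "wish"),
        # Stand-in for the default initialization, which runs this Tcl file.
        "gui_init": env.get("DODS_GUI_INIT", os.path.join(root, "etc", "dods_gui.tcl")),
        # Only the value "no" turns the GUI manager off.
        "use_gui": env.get("DODS_USE_GUI") != "no",
    }
```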
<blockquote>The user has substantial control over the GUI manager. You can
change the program that listens for GUI commands from <font color='green'>wish</font> to
anything else, and you can actually change the action of the GUI
commands by editing the Tcl code in the files <font color='green'>dods_gui.tcl</font>,
<font color='green'>error.tcl</font>, and <font color='green'>progress.tcl</font>. (These are in the
<font color='green'>&#36;DODS_ROOT/etc</font> directory.)  However, editing these files and
variables will not change the form of the messages from the OPeNDAP
server, and from the core software that are meant to invoke these
programs. In other words, the user may mess with these, but must be
careful to leave the GUI manager in a form that will be able to
process the messages it receives.</blockquote>
===The Error System===
The GUI manager is used to display error messages
to the user. The messages themselves will vary with the server
implementation. Refer to the documentation of the particular server,
or consult the server's <font color='green'>info</font> service (see
([http://www <cite> opd-server,service</cite>])), for a list of the error messages
that might be issued by a particular server.
===Temporary Files===
Using an OPeNDAP client application will
create a number of temporary files. They are created with the
<font color='green'>tmpnam()</font> function, so their names will correspond to the rules
for that function on your system (See the manual page for
<font color='green'>tmpnam(3)</font>, or type <font color='green'>man tmpnam</font> for more information.)
During normal operation, OPeNDAP will delete the temporary files it
creates as it goes. However, if execution of the OPeNDAP client is
somehow interrupted, these files may remain, and will have to be
deleted by hand.
=The OPeNDAP Client=
There are many different data analysis packages in use. Some packages, such
as MATLAB and IDL, are commercially available, but many more are written for
a specialized need or application. Many of these use one of the widely
available sets of scientific data access functions (called an ''Application
Program Interface'', or API) such as netCDF, JGOFS, or HDF. There is great variety
among all these programs, but one feature they share is that they all access
data through files containing that data. (This is not true of some
APIs, such as JGOFS. That API, however, uses a data dictionary to allow
the user to think that the data access is through files.) That is to say,
each program begins by identifying a file containing the data the user
wishes to examine or analyze.
An OPeNDAP client is simply a data
analysis application linked with the OPeNDAP libraries instead of the
standard data access API. Using this program, a user can look at files
containing data in the same way as was possible without the OPeNDAP
libraries. However, by using these libraries, a user can also use a
Uniform Resource Locator (URL), instead
of a simple file name, to specify data located anywhere on the
Internet. Two figures in the introduction illustrate the operation of an
application program linked with a standard data access API, and the
same program linked with the OPeNDAP version of that API.
An OPeNDAP client is, then, a data analysis application program
modified to become a web browser, somewhat like any other web
browser (such as NCSA Mosaic) with
which you may be familiar. A web browser can only display the data it
receives, however. What makes an OPeNDAP client different from
a browser such as Netscape is that, once the data has been
received from an OPeNDAP server, the OPeNDAP client application can
compute with it.
Like a web browser, an OPeNDAP client accepts a URL from a user, and
parses it to come up with a protocol, an address, and a message. (See
[[#How OPeNDAP Finds Data|How OPeNDAP Finds Data]] for more information about URLs.) The
browser then sends a message to the address, directed to the server
that can service the desired protocol, asking for the information
specified in the remainder of the URL. Unlike a typical web browser, an OPeNDAP client will not know what to do with data returned for a web page
containing text and pictures, but an OPeNDAP server will return scientific
data that an OPeNDAP client can understand and process.
Here is a simple example, using the <font color='green'>ncview</font> program. This program
simply prints out the contents of a netCDF formatted data file,
specified on the command line, like this:
<pre>
> ncview fnoc1.nc
</pre>
Using OPeNDAP, this same function may be executed from any computer connected to
the Internet by substituting a URL for the
filename above:
<pre>
> dncview http://dods.gso.uri.edu/cgi-bin/nph-nc/data/fnoc1.nc
</pre>
(See the figure in [[#How OPeNDAP Finds Data|How OPeNDAP Finds Data]].) Aside from the fact that
the data is remote, and must be specified with a URL, the program will
seem to function in the same way it had with the simple netCDF library
(albeit somewhat more slowly due to having to make network connections
instead of local file operations). You can find <font color='green'>dncview</font> (the
<font color='green'>ncview</font> program linked with the OPeNDAP library) in the
<pre>
$DODS_ROOT/src/nc-dods/ncview
</pre>
directory. Running the above command will produce the following output:
<pre>
netcdf fnoc1 {
dimensions:
    time_a = 16 ;
    lat = 17 ;
    lon = 21 ;
    time = 16 ;
variables:
    long u(time_a, lat, lon) ;
        u:units = "meter per second" ;
        u:long_name = "Vector wind eastward component" ;
        u:missing_value = "-32767" ;
        u:scale_factor = "0.005" ;
    long v(time_a, lat, lon) ;
        v:units = "meter per second" ;
        v:long_name = "Vector wind northward component" ;
        v:missing_value = "-32767" ;
        v:scale_factor = "0.005" ;
    double lat(lat) ;
        lat:units = "degree North" ;
    double lon(lon) ;
        lon:units = "degree East" ;
    double time(time) ;
        time:units = "hours from base_time" ;

// global attributes:
        :base_time = "88- 10-00:00:00" ;
        :title = "FNOC UV wind components from 1988- 10 to 1988- 13." ;

data:

 u =
  -1728, -2449, -3099, -3585, -3254, -2406, -1252,
  662, 2483, 2910, 2819, 2946, 2745, 2734,
  2931, 2601, 2139, 1845, 1754, 1897, 1854, -1686,
  ...
</pre>
Although there are packaged OPeNDAP browsing programs that a user can use
to look at data, the user can also construct his or her own.  Linking
an OPeNDAP API with an already existing program allows a user to create a
customized web browser that can access data available from any OPeNDAP
server connected to the Internet.
The OPeNDAP APIs are designed to accurately mimic the behavior of several
different commonly used scientific data APIs. As of this writing,
the OPeNDAP API set includes:
{| border="1"
|+ Supported APIs
! API !! Description !! Components
|-
| netCDF
| Support for gridded data, such as satellite data, interpolated ship station data, or current meter data.
| Server and client.
|-
| JGOFS
| Support for relational data, such as ''Sequences''. Created by the Joint Global Ocean Flux Study (JGOFS) project for use with oceanographic station data.
| Server and client.
|-
| HDF
| Support for gridded data. Commonly used for astronomical data and model data.
| Server only.
|-
| DSP
| Oceanographic and geophysical satellite data. Provides support for image processing. Developed at the University of Miami/RSMAS. Primarily used for AVHRR and CZCS data.
| Server only.
|-
| GRIB
| Support for gridded binary data. GRIB is the World Meteorological Organization (WMO) format for the storage of weather information and the exchange of weather product messages.
| Server only, due in early 1999.
|-
| BUFR
| The WMO's standard set of codes for the transmission and storage of meteorological data, using a compressed format with each data value occupying the least number of bits necessary to contain its range of values. Suitable for meteorological observations made from a single point or set of points.
| Server only, due in early 1999.
|-
| FreeForm
| On-the-fly conversion of arbitrarily formatted data, including relational data and gridded data. May be used for sequence data, satellite data, model data, or any other data format that can be described in the flexible FreeForm format definition language. This server can be used to serve data stored in almost all home-grown data formats.
| Server only; no client required.
|-
| native OPeNDAP
| The OPeNDAP class library may be used directly by a client program. It supports relational data, array data, gridded data, and a flexible assortment of data types that can be combined to accommodate most data models.
| Client.
|}
 
The API set is extensible, meaning that developers can use the OPeNDAP
software toolkit to write OPeNDAP-compliant versions of new APIs.  See
[http://www.opendap.org/support/docs.html/api/pguide-html/ <cite>The OPeNDAP Programmer's Guide</cite>] for more information.
The most important result of this architecture is that, just as the
use of the <font color='green'>dncview</font> program above is identical to the original
<font color='green'>ncview</font>, a user can use remote OPeNDAP data ''and'' continue to
use the same data analysis and display programs with which he or she
is familiar. Any program that uses one of the OPeNDAP-supported APIs may
be re-linked to use the OPeNDAP version of that API. This creates an OPeNDAP
client. That client and a connection to the Internet are all that a
researcher requires to gain access to the available OPeNDAP data.
==Configuring Programs to Use OPeNDAP==
Relinking an existing program with the OPeNDAP implementation of some
data API is a simple procedure.  Find the directory that contains the
source/object code of the program you want to re-link and modify the
makefile (typically called <font color='green'>Makefile</font>) for the program so that the
OPeNDAP-compliant API library is used in place of the standard API
library. (If you can't find the libraries on your system, see
the installation appendix, or ask the system administrator.) These
libraries are:
<blockquote>
; <font color='green'>libdap++.a</font> : Software common to all of the OPeNDAP-supported
APIs.
</blockquote>
OPeNDAP also uses facilities from some standard libraries, and these must
also be included in the link to resolve all the symbols.
<blockquote>
; <font color='green'>libwww.a</font> : The World Wide Web library. This contains the functions used to communicate
between the OPeNDAP client and server.
; <font color='green'>libexpect.a</font> : Functions from the <font color='green'>expect</font>
library are used to communicate between
OPeNDAP client processes.
; <font color='green'>libtcl.a</font> : Contains definitions necessary for the
<font color='green'>expect</font> library.  The use of this library in the link is not
related to the use of Tcl by OPeNDAP clients.
; <font color='green'>libstdc++.a</font> :
The GNU C++ class library (This is not necessary if using <font color='green'>g++</font>
to re-link.)
</blockquote>
You will also need to include the library containing the
OPeNDAP-compliant version of the API. The name of this library of course
depends on the API, but it is generally in the form
<pre>
libAPI-dods.a
</pre>
where ''API'' is an abbreviation indicating the API emulated by the
specified library.  For example, the OPeNDAP-compliant netCDF library is
called <font color='green'>libnc-dods.a</font> and the JGOFS version is <font color='green'>libjg-dods.a</font>.
===An Example Using netCDF===
The <font color='green'>ncview</font> program is a simple utility that prints the contents
of a netCDF-format file to standard output.  This section outlines the
process used to modify the <font color='green'>ncview</font> makefile to link that program
with the OPeNDAP netCDF API, thereby turning <font color='green'>ncview</font> into a
network-ready OPeNDAP client. The process of linking any other program
with the corresponding OPeNDAP library is entirely analogous to this one
and only requires the substitution of the program name and the
appropriate library.
First the link flags were modified so that the library search path
would include the likely places to find the OPeNDAP libraries:
<pre>
LDFLAGS = -g -L$(DODS_ROOT)/lib
</pre>
<font color='green'>DODS_ROOT</font> is an environment variable that indicates the root
directory of the OPeNDAP installation, and in this manual is used as
shorthand for this directory.  It is typically called something like
<font color='green'>/usr/local/DODS</font>. If you cannot find these directories on your
system, consult your system administrator, or refer to
the installation appendix for information about acquiring and installing
the OPeNDAP software.
After the link flags were modified, the OPeNDAP libraries were added to the list
of libraries used. The order in which the libraries are listed is important.
<pre>
LIBS = -lnc-dods -ldap++ -lnc-dods -ldap++ -lwww -lexpect -ltcl \
       -lz -lrx
</pre>
<blockquote>Because OPeNDAP is implemented as a core set of classes contained in one
library (<font color='green'>libdap++.a</font>) and a set of specializations of those classes in a
second library (<font color='green'>libnc-dods.a</font>), and because there is a circular
dependence between those two libraries, they must be included twice in the
linker command.</blockquote>
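The naming pattern and the link order can be captured in a short Python sketch that builds the <font color='green'>LIBS</font> value for any API abbreviation. <font color='green'>link_libs</font> is a hypothetical helper. It repeats the API library and <font color='green'>libdap++</font> pair for the circular dependence described above, and it places <font color='green'>-lexpect</font> ahead of <font color='green'>-ltcl</font> on the assumption that, with a single-pass linker, a library must precede the library that supplies its definitions (here, Tcl for expect).

```python
# Supporting libraries. libexpect is listed before libtcl because expect
# uses Tcl definitions, and a single-pass linker resolves symbols left to right.
SUPPORT_LIBS = ["-lwww", "-lexpect", "-ltcl", "-lz", "-lrx"]


def link_libs(api: str) -> str:
    """Build a LIBS value for relinking against lib<api>-dods.a.

    lib<api>-dods.a and libdap++.a depend on each other, so the pair is
    repeated to let references in either direction be resolved.
    """
    pair = ["-l" + api + "-dods", "-ldap++"]
    return " ".join(pair * 2 + SUPPORT_LIBS)


if __name__ == "__main__":
    print("LIBS =", link_libs("nc"))
```

Passing <font color='green'>jg</font> instead of <font color='green'>nc</font> would produce the corresponding line for the JGOFS library, <font color='green'>libjg-dods.a</font>.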
Finally, <font color='green'>g++</font> was substituted for the link command. (It
is possible to use <font color='green'>gcc</font> instead of <font color='green'>g++</font>, but in that
case, <font color='green'>-lg++</font> must be added to the end of the library list.)
===Potential Problems===
When a user links an existing program to the OPeNDAP libraries, there are
several possible conditions that may cause problems.
*Some programs use more than one API.
*Some programs access data using both API and UNIX system calls.
*Some programs use undocumented features of the APIs.
If this is the case for a given program, there is generally no good solution
besides rewriting the software to conform to a strict usage of the
data-reading parts of the given API. Of course, if the problem is that the
program uses more than one API, you can try linking the program with an OPeNDAP-compliant version of the second API as well.
*Re-linked programs can be very large.
The OPeNDAP libraries are large, and the <font color='green'>g++</font>, <font color='green'>www</font>,
<font color='green'>expect</font>, and <font color='green'>tcl</font> libraries on which they are built are even
larger. This means that the executable version of a re-linked OPeNDAP
client can seem unreasonably obese. Much of the disk space is occupied
by symbol tables, which can be removed from the executable file with
the <font color='green'>strip</font> utility.  In many cases, a user can recover a
substantial amount of disk space this way.
<blockquote>'''Caution:''' Without familiarity with the OPeNDAP software, it is best
only to strip the executable files. Stripping object files or
libraries might leave them in a useless condition for the linker.
Furthermore, stripping an executable file removes symbol names,
which may make diagnosing problems more difficult.</blockquote>
The OPeNDAP libraries affect only the data ''reading'' functionality
of the specified API. There are no OPeNDAP replacements for functions,
such as netCDF's <font color='green'>ncputrec()</font>, that ''write'' data to a disk file.
These functions are included in the OPeNDAP-compliant API library, but
they operate in a manner identical to the original (non-OPeNDAP)
versions: they work on local files only, and attempting to write
"over the network" will result in an error.
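The read/write split described above can be pictured with a small C++ sketch. It is only an illustration under stated assumptions: the helper names `is_remote_url` and `check_write_target` are hypothetical (they are not part of netCDF or the OPeNDAP libraries), and a name is assumed to be remote when it begins with `http://` or `https://`.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: a name that starts with "http://" or "https://" is
// treated as an OPeNDAP URL; anything else is a local pathname.
bool is_remote_url(const std::string& name) {
    return name.rfind("http://", 0) == 0 || name.rfind("https://", 0) == 0;
}

// Hypothetical guard for write-type calls (such as ncputrec()): they keep
// their original local-only behavior, so a URL target is an error.
void check_write_target(const std::string& name) {
    if (is_remote_url(name)) {
        throw std::invalid_argument("cannot write over the network: " + name);
    }
}
```

A local pathname passes the check unchanged, while any URL raises an error, which mirrors the behavior the text describes for the unmodified write functions.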
==Writing New OPeNDAP Programs==
The OPeNDAP software may also be used to write new programs. This may be
done either through one of the OPeNDAP-supported API libraries, such as
netCDF or JGOFS, or by using the OPeNDAP data access protocol directly.
There are advantages and disadvantages to each approach.
The biggest advantage of writing new code using an OPeNDAP-supported API
such as netCDF or JGOFS is that the programmer is probably
already familiar with that API. Writing an OPeNDAP program using
an adapted API is not significantly different from writing the same
program with the original API. The programmer should remember that the
data the program uses will often be remote, so data retrieval may not be
instantaneous, and implementing a local cache for requested data might be
a good idea. Other than that, the process is the same as writing a program
using the regular API.
It is also possible to use the OPeNDAP data access protocol directly.
This is somewhat more involved than using one of the OPeNDAP-compliant
API libraries, and C++ is the only language supported for this.
However, this approach can provide substantially more efficient
programs. For further information about this approach, refer to the
technical information about the DAP in [http://www.opendap.org/support/docs.html/api/pguide-html/ <cite>The OPeNDAP Programmer's Guide</cite>].

Latest revision as of 02:36, 25 September 2007

What is OPeNDAP?

OPeNDAP provides a way for ocean researchers to access oceanographic data anywhere on the Internet from a wide variety of new and existing programs. By developing network versions of commonly used data access Application Program Interface (API) libraries, such as NetCDF, HDF, JGOFS, and others, the OPeNDAP project can capitalize on years of development of data analysis and display packages that use those APIs, allowing users to continue to use programs with which they are already familiar.

The OPeNDAP architecture uses a client/server model, with a "client" that sends requests for data out onto the network to some "server" that answers with the requested data. This is exactly the model used by the World Wide Web, where client programs called browsers submit requests to web servers for the data that make up web pages. Of course, OPeNDAP clients can do much more than browse this data. Using flexible data types suitable for many uses, including scientific data, OPeNDAP servers deliver real data directly to the client program in the format needed by that client.

In fact, the network communication model used by OPeNDAP uses URL addresses and web servers ("httpd") to deliver data to the researcher. This is done by using the OPeNDAP software to convert a researcher's data analysis software into a sophisticated (though specialized) web browser. In addition to providing network-compatible versions of popular data access APIs, the OPeNDAP project also provides a software client and server toolkit to help other developers create network-compatible OPeNDAP versions of other APIs.

To expand the universe of data available to a user, OPeNDAP incorporates a powerful data translation facility, so that data may be stored in data structures and formats defined by the data provider but accessed by the user in a manner identical to the access of local data files on the user's own system. Though there are limitations on the types of data that may be translated (see the section on data translation), the facility is flexible and general enough to handle many of the possible translations. There are two important results:

  • A user may not need to know that data from one set are stored in a format different from data in another set. It is even possible that "neither" data set is stored in a format readable by the original (i.e., non-OPeNDAP) version of the data analysis and display program he or she uses.
  • No segment of OPeNDAP users will be effectively cut off from accessing data because of its storage format. A scientist who wishes to make his or her data available to other OPeNDAP users may do so while keeping that data in what may actually be a highly idiosyncratic storage format. Of course, it doesn't have to be in a highly idiosyncratic format. The point is that OPeNDAP can handle a wide variety of possible cases.

The combination of the OPeNDAP network communication model and the data translation facility makes OPeNDAP a powerful tool for the retrieval, sampling, and display of large distributed datasets. Though OPeNDAP was developed by oceanographers, its application is not constrained to oceanographic data. The organizing principles and algorithms may be applied to many other fields where data can be stored on computers.

The population of people who may be interested in a system such as OPeNDAP may be divided into data consumers and data providers. Although a key observation in the development of OPeNDAP was that the two roles are often assumed by the same scientists, the division is useful for introducing the system. The following two sections provide a broad introduction to the roles of data consumer and data provider. The remainder of this guide is organized around this distinction between classes of users.

Why Use OPeNDAP to Read Data?

A scientist wishing to examine and sample some dataset will typically be comfortable using a relatively small number of data analysis and display programs or packages. Some of these packages will use one of the popular data access APIs currently available. However, few data access APIs provide direct access to distributed data, so this access must be made with network tools, such as web browsers or "ftp". While relatively straightforward in principle, this process can nonetheless become time-consuming and somewhat challenging in practice.

(Here, "distributed data" refers to datasets that reside on different computers which are linked by a network such as the Internet. The computers may or may not be physically remote from each other. The main point is that the computers manage their data resources independently. In this guide the terms "remote" and "distributed" are used to imply independently managed resources.)

The following example illustrates some of the differences between accessing distributed data with the tools currently in widespread use, and the same operation using OPeNDAP.

An Example: Using ftp

The advent of the WWW has made possible simple data browsers that allow sophisticated interactive sampling of on-line datasets. Using a web browser and "ftp", a user can sample any of several large oceanographic datasets available on the Internet. However, there are several problems with these data search engines that may only become apparent when a user actually tries to use the data.

Among the problems that can arise are those that appear when a user tries to use the results of one dataset to search a second dataset. Suppose that a user wishes to choose a sea-surface temperature image from the NOAA/NASA Pathfinder AVHRR archive at:

http://podaac-www.jpl.nasa.gov/mcsst/mcsst_subset.html

using the results of a time-series generated from the COADS Climatology archive at:

http://ferret.wrc.noaa.gov/fbin/climate_server

The steps are theoretically straightforward:


  1. Create the time series from the COADS Climatology archive. This is done by filling in the menu of options on the COADS web page.
  2. Import the time series from step 1 to the user's local data analysis system. Note that this step may itself require several steps:
    1. The data must be down-loaded, using "ftp" or a similar program.
    2. Once down-loaded, the data may have to be converted into a format that can be read by the data analysis program.
  3. Examine the data and formulate a request to the AVHRR archive. This is again done by filling in the menu of options on the AVHRR web page. Note that the COADS and AVHRR pages are not completely compatible in this respect; for example, the two pages use different date formats.
  4. Import the result of step 3 to the user's local data display system. This may also require several steps:
    1. The data must be down-loaded again.
    2. And again, once down-loaded, the data may have to be converted into a format that can be read by the data analysis program. Note that the set of formats available on the COADS page is distinct from the options available from the AVHRR archive.
  5. Think about the results.

Though the procedure is straightforward and the web servers are designed to make sampling the datasets a simple task, upon close examination, the combination of the steps may create unforeseen difficulties. For example, a request to the COADS server will return either a spreadsheet suitable for use on a PC, a netCDF format file, or a file in one of a selection of simple ASCII formats. If the user is fortunate, the returned file will already be in a format compatible with the desired analysis package. But not all users will be so fortunate. Often this file must be converted to some other file format before it can be imported to the user's analysis program. This may or may not be a simple task.

Even a file format for which a user is properly equipped may be used in an unfamiliar manner. For example, the independent and dependent variables might be in a different order or an ASCII data file may use tabs instead of spaces.

Assuming the import of the COADS data has been accomplished and boundaries for the AVHRR search identified, the task of selecting from the second archive may begin. Unfortunately, the request to the AVHRR archive will return either a GIF picture, an HDF format file, or a raw (binary) data file. Again, importing this output into the user's analysis program may or may not be simple, but it will not be the same procedure as the one used for the first data request.

Other problems are also apparent. The COADS Climatology sampling program requests that the user supply dates (month and day), whereas the AVHRR archive asks for the "Julian day" (an integer between 1 and 365 or 366). One server will accept "S" and "W" to indicate South latitudes and West longitudes, while the other requires that these be indicated with negative coordinate values. The sampling of the COADS dataset, while flexible, may not allow sampling in the manner the user needs. It cannot, for example, provide a section except along a line of constant latitude or longitude. If a user wanted to see a section along a NE-SW line, it would be a challenging and time-consuming task to assemble one from many small data requests.
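The date and coordinate mismatches described above are the kind of conversion a user ends up writing by hand. A minimal C++ sketch, assuming the Gregorian calendar and reading "Julian day" as the day-of-year number the text describes; the function names are hypothetical:

```cpp
#include <stdexcept>

// True for Gregorian leap years.
bool is_leap_year(int year) {
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

// Convert a calendar date (month 1-12, day of month) to the day-of-year
// number (1-365, or 1-366 in a leap year) the text calls a "Julian day".
int day_of_year(int year, int month, int day) {
    // Days elapsed before the first of each month in a common year.
    static const int days_before[12] = {0, 31, 59, 90, 120, 151,
                                        181, 212, 243, 273, 304, 334};
    if (month < 1 || month > 12 || day < 1 || day > 31) {
        throw std::out_of_range("bad date");
    }
    int doy = days_before[month - 1] + day;
    if (month > 2 && is_leap_year(year)) {
        doy += 1;  // account for 29 February
    }
    return doy;
}

// Convert a magnitude plus hemisphere letter ('N', 'S', 'E', 'W') to the
// signed convention: South latitudes and West longitudes become negative.
double signed_coordinate(double magnitude, char hemisphere) {
    switch (hemisphere) {
        case 'N': case 'E': return magnitude;
        case 'S': case 'W': return -magnitude;
        default: throw std::invalid_argument("hemisphere must be N, S, E or W");
    }
}
```

For example, 1 March is day 60 in a common year and day 61 in a leap year. Under OPeNDAP, conversions like these are no longer the user's problem, because both archives are read through the same client API.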

Further, it might be desirable to use the results of sampling these two databases to construct a time series. This could conceivably mean repeating the entire procedure many times.

An Example: Using OPeNDAP

To produce the same data selection using OPeNDAP, a user would follow essentially the same steps. However, the steps themselves would be performed differently. Once the user's data analysis package has been converted to an OPeNDAP client (see the section on linking OPeNDAP clients), accesses to the remote datasets are made through the analysis package itself. Instead of specifying a data file by a pathname reference to some local disk file, the user specifies a URL, which may point to either a local or a remote dataset. Here is a recap of the same operation, as it would be performed by an OPeNDAP application program:


  1. Create the time series from the COADS Climatology archive. This is done by using the sampling facilities of whatever data analysis program a scientist is familiar with. If desired, OPeNDAP constraint expressions may be used to reduce the network load, or to provide a sampling scheme not supported by the data analysis program.
  2. The data need not be imported to the user's data analysis program, since it was down-loaded and converted automatically in step 1.
  3. Examine the data and formulate a request to the AVHRR archive. This is again done through the sampling facilities of whatever data analysis program the user is using, and OPeNDAP constraint expressions. Note that, whatever their actual format, both COADS and AVHRR archives appear to the OPeNDAP client to be stored in identical formats.
  4. The data need not be imported to the user's data analysis program, since it was down-loaded and converted automatically in step 3.
  5. Think about the results.

It is important to note that "any" data analysis package that can handle one of the OPeNDAP-supported data access APIs can be converted into an OPeNDAP client program capable of reading data stored by "all" of the OPeNDAP-supported data access APIs. (There are some limitations on translation; see the sections on the OPeNDAP client and on data translation for more information.) Therefore, assuming the user has some analysis package capable of doing the required sampling and analysis on local data, all the steps would be performed from within that package, just as if the user were operating on local files. The result is a simpler procedure, even though the same essential steps are followed.

The OPeNDAP scenario has, among others, the following advantages:


  • The user need not learn about any of the archival formats, since the OPeNDAP server and client cooperate to deliver the data in the format in which the analysis package expects to see it. Whereas the user of the ftp server has to worry about importing the data into the analysis program, the OPeNDAP client program imports it transparently.
  • The user can sample the distant datasets in any fashion supported by his or her own (local) analysis package. Unnecessary data need not be sent over the Internet.
  • By appending a "constraint expression" to the URLs given to the analysis program, the user can sample data using techniques that the analysis program itself cannot perform. For example, suppose a user wishes to access the NODC XBT database using a program that uses the netCDF API. The arrays that netCDF manipulates are largely unsuitable for XBT station data; however, a user can define constraint expressions in the URL to sample the data and deliver it in a form the netCDF API can use. (For more information about constraint expressions, see the section on constraint expressions; for more information about data models and translation, see the chapter on data.)
  • A substantial amount of the searching and sampling is performed on the server machines. This reduces Internet traffic, as well as decreasing the load on the local machine.
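As an illustration of appending a constraint expression to a URL, here is a hedged C++ sketch. It assumes the DAP2 form `url?projection&selection`, in which projections are comma-separated variable names, optionally with `[start:stride:stop]` hyperslabs, and each selection clause is prefixed with `&`. The helper names, the dataset URLs, and the variable names `sst`, `temp`, `depth`, and `lat` are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: a hyperslab term of the form name[start:stride:stop].
std::string hyperslab(const std::string& var, int start, int stride, int stop) {
    return var + "[" + std::to_string(start) + ":" + std::to_string(stride) +
           ":" + std::to_string(stop) + "]";
}

// Append a constraint expression to a dataset URL:
//   url?proj1,proj2&sel1&sel2
// Projections choose which variables (or sub-arrays) come back; selections
// are conditions the server evaluates before sending anything.
std::string constrain(const std::string& url,
                      const std::vector<std::string>& projections,
                      const std::vector<std::string>& selections) {
    std::string ce;
    for (std::size_t i = 0; i < projections.size(); ++i) {
        if (i > 0) ce += ",";
        ce += projections[i];
    }
    for (const auto& s : selections) ce += "&" + s;
    return ce.empty() ? url : url + "?" + ce;
}
```

For instance, `constrain(url, {"temp", "depth"}, {"lat>10", "lat<20"})` asks the server for only two variables from stations in a latitude band, so the filtering happens on the server rather than in the analysis package. A real client would normally also percent-encode characters such as `>` or `[`; that step is omitted here for readability.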

The OPeNDAP Client

OPeNDAP uses a client/server model. As mentioned, the OPeNDAP servers are simply "httpd" web servers equipped to interpret an OPeNDAP URL sent to them (see the chapter on OPeNDAP servers). The OPeNDAP client program can be any program that uses one of the supported APIs, such as JGOFS or netCDF, or a program specially developed to read data from OPeNDAP servers.

Without OPeNDAP, an application program that uses one of the common data access APIs such as netCDF will operate as shown in the figure "The Architecture of a Data Analysis Package." The user makes a request for data from the application program. The program in turn uses procedures defined by the data access API to access the data, which is stored locally on the host machine. Some APIs are more sophisticated than this, of course, but their general operation is similar to this outline.

[Figure: The Architecture of a Data Analysis Package.]

The operation of an OPeNDAP client is illustrated in the figure "The Architecture of a Data Analysis Package Using OPeNDAP." Here, the same application program shown in the previous figure has been linked with an OPeNDAP version of the data access API. Now, in addition to being able to use local data as before, the application program is able to access data from OPeNDAP servers anywhere on the Internet in the same manner as local data.

To make a program into an OPeNDAP client, one need only re-link it with the OPeNDAP implementation of the supported API library. This is a simple process, generally requiring only a few minutes. The resulting program accepts URLs, which specify a location for the data somewhere on the Internet, in addition to file pathnames, which specify a location only on the local platform's file system. (See the section on linking OPeNDAP clients.)
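The effect of re-linking described above can be modeled as a C++ interface with two implementations: one for local pathnames and one for URLs. This is a hypothetical model, not the OPeNDAP library's actual design; `DataSource`, `LocalFile`, `RemoteDataset`, and `open_source` are invented names, and the URL test simply checks for an `http://` or `https://` prefix.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical interface for "where the API gets its data".
struct DataSource {
    virtual ~DataSource() = default;
    virtual std::string describe() const = 0;
};

// A local pathname: read from the host's file system, as before re-linking.
struct LocalFile : DataSource {
    explicit LocalFile(std::string p) : path(std::move(p)) {}
    std::string describe() const override { return "local file " + path; }
    std::string path;
};

// A URL: in a real client, reads here would become network requests.
struct RemoteDataset : DataSource {
    explicit RemoteDataset(std::string u) : url(std::move(u)) {}
    std::string describe() const override { return "OPeNDAP URL " + url; }
    std::string url;
};

// Choose a data source from whatever name the user supplied.
std::unique_ptr<DataSource> open_source(const std::string& name) {
    const bool remote =
        name.rfind("http://", 0) == 0 || name.rfind("https://", 0) == 0;
    if (remote) {
        return std::make_unique<RemoteDataset>(name);
    }
    return std::make_unique<LocalFile>(name);
}
```

The application code above the interface does not change; only the choice of implementation does, which is why a re-linked program can keep its existing user interface.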

[Figure: The Architecture of a Data Analysis Package Using OPeNDAP.]

OPeNDAP also provides a data translation facility. Data from the original data file is translated by the OPeNDAP server into an OPeNDAP data model for transmission to the client. Upon receiving the data, the client translates the data into the data model it understands. (See the chapter on data for more information about the OPeNDAP data model.) Because the data transmitted from an OPeNDAP server to the client travel in the OPeNDAP format, the data set's original storage format is completely irrelevant to the user of an OPeNDAP client. If the client was originally designed to read netCDF format files, the data returned by the OPeNDAP-netCDF library will appear to have been read from a netCDF file, whatever the actual format of the files from which the data were read. If the program expects JGOFS data, the DODS-JGOFS library will return data that seem to have come from a JGOFS dataset, again, no matter what the actual input file format. (Note that there is a limit to what can be translated. An API meant to support two-dimensional arrays may be able to handle one-dimensional vector data, but a program designed to process one-dimensional vector data will not know what to do with a two-dimensional array. The set of data access APIs supported by OPeNDAP contains several such mismatches; see the section on data translation for more information.)

OPeNDAP does not pretend to remove all the overhead of data searches. A user will still have to keep track of the URLs of interesting data sets in the same way a user must now keep track of the names of files containing interesting data. An OPeNDAP catalog service that will help users scan the available datasets is in the process of being constructed.

Providing Data with OPeNDAP

The OPeNDAP data provider is the person or organization willing to make their digital datasets available to the community with an OPeNDAP server.


The designers of OPeNDAP recognized that many of the data users are also data providers, and built OPeNDAP so that providing data is as simple and straightforward as possible. In many cases, once a local web server is equipped to become an OPeNDAP server, a scientist need do very little beyond what must be done simply to make the data available locally (that is, put the data into a file format that can be read by the locally used data analysis and display programs). The tasks of a data provider can be separated into the following parts:


  • Install and configure the OPeNDAP server. (See the chapter on installing the OPeNDAP server.)
  • Create whatever ancillary data files the data set needs, if any. (See "Ancillary Data," below.)
  • Register the data set with the master directory (optional).
  • Create the data catalog.

The OPeNDAP Server

The OPeNDAP data server is simply a regular httpd server equipped with CGI programs (or filters) that respond to requests for dataset structure, data attributes, and the data itself. (See the chapter on data for a description of the data returned by these requests, and the section on OPeNDAP URLs for a description of the URL syntax used to send them.) Most of a data provider's work consists of configuring this server. While perhaps not trivial, this potentially represents far less effort than packaging a dataset for submission to some central data archive. Furthermore, modifying a server's configuration to accommodate new data is an almost trivial task, involving simple edits to a configuration file.
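The three kinds of request listed above can be illustrated by how a client might form the request URLs. This sketch assumes the DAP2 convention of appending `.dds` (structure), `.das` (attributes), or `.dods` (data) to the dataset URL, optionally followed by `?` and a constraint expression; the enum and function names and the example URL are hypothetical.

```cpp
#include <string>

// The three kinds of request the server answers.
enum class Request { Structure, Attributes, Data };

// Build a request URL by appending the assumed DAP2 suffix for each request
// kind, followed by an optional constraint expression.
std::string request_url(const std::string& dataset_url, Request kind,
                        const std::string& constraint = "") {
    std::string suffix;
    switch (kind) {
        case Request::Structure:  suffix = ".dds";  break;  // dataset structure
        case Request::Attributes: suffix = ".das";  break;  // data attributes
        case Request::Data:       suffix = ".dods"; break;  // the data itself
    }
    std::string url = dataset_url + suffix;
    if (!constraint.empty()) {
        url += "?" + constraint;
    }
    return url;
}
```

Because every request is an ordinary URL, the httpd server can route each one to the appropriate CGI program or filter without any OPeNDAP-specific network protocol.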

Ancillary Data

In order for an OPeNDAP client to accept data from an OPeNDAP server, it must be able to allocate the data structures and arrange internal labels to organize the incoming data. The information the client library needs to do this organizing is called the ancillary data (it is also referred to as the Data Descriptor Structure and the Data Attribute Structure; see the chapter on data for more details about these structures). For many APIs, the ancillary data is inherent in the data files themselves, and the OPeNDAP server can glean that information by scanning the data files. For large data archives that change infrequently and whose files are impractical to scan, OPeNDAP can cache the ancillary data to speed access times. When a client requests the ancillary data, the OPeNDAP server can check this cache first before scanning the data files.
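The cache-first lookup described above might look like the following C++ sketch. The class name, the `Scanner` callback standing in for the expensive scan of the data files, and the explicit `invalidate` call for archives that do change are all assumptions for illustration, not the OPeNDAP server's actual mechanism.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical cache of ancillary data (structure/attribute descriptions),
// keyed by dataset name. The server consults it before scanning files.
class AncillaryCache {
public:
    // Stand-in for scanning a dataset's files to derive its ancillary data.
    using Scanner = std::function<std::string(const std::string&)>;

    explicit AncillaryCache(Scanner scan) : scan_(std::move(scan)) {}

    // Return cached ancillary data if present; otherwise scan once and store.
    const std::string& get(const std::string& dataset) {
        auto it = cache_.find(dataset);
        if (it == cache_.end()) {
            it = cache_.emplace(dataset, scan_(dataset)).first;
            ++scans_;
        }
        return it->second;
    }

    // Drop a stale entry, e.g. after the archive's files have changed.
    void invalidate(const std::string& dataset) { cache_.erase(dataset); }

    // Number of full scans performed so far.
    int scans() const { return scans_; }

private:
    Scanner scan_;
    std::map<std::string, std::string> cache_;
    int scans_ = 0;
};
```

Repeated requests for the same dataset cost one scan in total; a real server would also need a policy for deciding when an entry has gone stale.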

This feature is useful in other cases because not all data file formats are self-describing. For example, a data set might contain several files of time vs. temperature data; the header information describing which numbers are temperature and which time may be in a different file or may simply be understood by the user of the local data analysis program equipped to look at this data. As an example, data accessed by OPeNDAP servers using the FreeForm data access API require provider-created ancillary data files.

Administration and Centralization of Data

Under OPeNDAP, there is no central archive of data. Data under OPeNDAP is organized in a manner similar to the World Wide Web itself. That is, all one need do to make one's data available is to start up a properly configured "httpd" server on an Internet node that has access to the data to be served. Each data provider is free to join and to leave the system when it is convenient, just as any proprietor of a web page is free to delete it or add to it as whimsy demands.

Of course, as can also be seen on the World Wide Web, there are some disadvantages to the lack of central authority. If no one knows about a web site, no one will visit it. Similarly, listing a dataset in a central data catalog, such as the Global Change Master Directory (http://gcmd.gsfc.nasa.gov/), can make data available to other researchers in a way that simply configuring an OPeNDAP server does not. OPeNDAP provides a facility for registering a data set with the GCMD catalog, which makes the data set known to the OPeNDAP data location service.


The remainder of this book will be divided into three major sections: instructions on the building and operating of OPeNDAP clients; a tutorial and reference on running OPeNDAP servers and making data available to OPeNDAP clients; and technical documentation describing the implementation details (and the motivation behind many of the design decisions) of the OPeNDAP software.