Wiki Testing/OPeNDAPUserGuide1: Difference between revisions

From OPeNDAP Documentation
 
(29 intermediate revisions by 2 users not shown)
Line 2: Line 2:




The OPeNDAP provides a way for ocean researchers to
OPeNDAP provides a way for researchers to
access oceanographic data anywhere on the Internet from a wide variety of new
access scientific data anywhere on the Internet from a wide variety of new
''and existing''  programs. By developing network versions of commonly used
''and existing''  programs. By developing network versions of commonly used
data access Application Program Interface (API) libraries, such as
data access libraries, such as
[http://www.unidata.ucar.edu/packages/netcdf/guide.txn_toc.html <cite>NetCDF</cite>] ,
[http://www.unidata.ucar.edu/packages/netcdf/guide.txn_toc.html <cite>NetCDF</cite>] ,
[http://www.ncsa.uiuc.edu/SDG/Software/HDF/HDFIntro.html <cite>HDF</cite>] ,
[http://www.ncsa.uiuc.edu/SDG/Software/HDF/HDFIntro.html <cite>HDF</cite>] ,
[http://www1.whoi.edu/jgofs.html <cite>JGOFS</cite>] , and others,
[http://www1.whoi.edu/jgofs.html <cite>JGOFS</cite>] , and others,
the OPeNDAP project can capitalize on years of development of data analysis and
the OPeNDAP project can capitalize on years of development of data analysis and
display packages that use those APIs, allowing users to continue to use
display packages that use those file formats, allowing users to continue to use
programs with which they are already familiar.
programs with which they are already familiar.


The OPeNDAP architecture uses a client/server model, with a {\em
The OPeNDAP architecture uses a client/server model, with a ''client''
 
that sends requests for data out onto the network to some ''server'',
{client}} that sends requests for data out onto the network to some
that answers with the requested data. This is exactly the model used
"server", that answers with the requested data. This is exactly
by the [http://www.w3.org/hypertext/WWW/TheProject.html <nowiki>World
the model used by the [http://www.w3.org/hypertext/WWW/TheProject.html <cite>World Wide Web</cite>] where client programs
Wide Web</nowiki>] where client programs called browsers submit requests
called browsers submit requests to web servers for the data that make up web
to web servers for the data that make up web pages. Of course, OPeNDAP
pages. Of course, OPeNDAP clients can do much more than browse this data.  Using
clients can do much more than browse this data.  Using flexible data
flexible data types suitable for many uses, including scientific data, the
types suitable for many uses, including scientific data, the OPeNDAP
OPeNDAP servers deliver real data directly to the client program in the format
servers deliver real data directly to the client program in the format
needed by that client.
needed by that client.


In fact, the network communication model used by OPeNDAP uses URL
The network communication model used by OPeNDAP uses URL
addresses and web servers ("httpd") to deliver data to the
addresses and web servers to deliver data to the
researcher.  This is done by using the OPeNDAP software to convert a
researcher.  This is done by using the OPeNDAP software to convert a
researcher's data analysis software into a sophisticated (though
researcher's data analysis software into a sophisticated (though
Line 35: Line 35:
To expand the universe of data available to a user, OPeNDAP incorporates
To expand the universe of data available to a user, OPeNDAP incorporates
a powerful data translation facility, so that data may be stored in
a powerful data translation facility, so that data may be stored in
data structures and formats defined by the data provider, but may be
data structures and formats defined by the data provider, but  
accessed by the user in a manner identical to the access of local data
accessed by the user in a manner identical to the access of local data
files on the user's own system. Though there are limitations on the
files on the user's own system. Though there are limitations on the
types of data that may be translated (See ([[Wiki_Testing/OPeNDAPUserGuide6| Section 6.1.2]])),
types of data that may be translated (See [[Wiki_Testing/OPeNDAPUserGuide6| Section 6.1.2]]),
the facility is flexible and general enough to handle many of the
the facility is flexible and general enough to handle many of the
possible translation.  There are two important results:
possible translations.  There are two important results:


*A user may not need to know that data from one set are stored in a format different from data in another set. Further, it may be possible that "neither" data set is stored in a format readable by the original (i.e. without OPeNDAP) version of the data analysis and display program he or she uses.
*A user may not need to know that data from one set are stored in a format different from data in another set. Further, it may be possible that ''neither'' data set is stored in a format readable by the original version of the data analysis and display program he or she uses.  
*No segment of OPeNDAP users will be effectively cut off from accessing data because of its storage format. A scientist who wishes to make his or her data available to other OPeNDAP users may do so while keeping that data in what may actually be a highly idiosyncratic storage format. Of course, it doesn't have to be in a highly idiosyncratic format. The point is that OPeNDAP can handle a wide variety of possible cases.
 
*No segment of OPeNDAP users will be effectively cut off from accessing data because of its storage format. A scientist who wishes to make his or her data available to other OPeNDAP users may do so while keeping that data in what may actually be a highly idiosyncratic storage format. (Of course, it doesn't ''have'' to be in a highly idiosyncratic format.)


The combination of the OPeNDAP network communication model and the data
The combination of the OPeNDAP network communication model and the data
Line 51: Line 52:
oceanographic data. The organizing principles and algorithms may be
oceanographic data. The organizing principles and algorithms may be
applied to many other fields where data can be stored on computers.
applied to many other fields where data can be stored on computers.
The uniformity with which data appears makes the system very useful
both for easing data analysis for a researcher and for
automating data transport and manipulation tasks.  OPeNDAP libraries
make data seem uniform, and by making the data analysis programs
network-aware, simplify scripting and automation.  For example, NOAA's
[http://ferret.pmel.noaa.gov/Ferret/LAS/home/ Live Access Server (LAS)] (see, for example, [http://mynasadata.larc.nasa.gov/data.html My NASA Data]) uses OPeNDAP, as do many of the real-time observing systems that make up the [http://www.ioos.gov Integrated Ocean Observing System (IOOS)], like [http://gomoos.org Gulf of Maine Ocean Observing System]. 


The population of people who may be interested in a system such as
The population of people who may be interested in a system such as
Line 61: Line 70:
this distinction between classes of users.
this distinction between classes of users.


==Why Use OPeNDAP to Read Data?==
A scientist wishing to examine and sample some dataset will typically
be comfortable using a relatively small number of data analysis and
display programs or packages. Some of these packages will use one of
the popular data access APIs currently available. However, few data
access APIs provide direct access to distributed data [[Wiki_Testing/OPeNDAPUserGuideFootNotes|1]], so this access must be made with
network tools, such as web browsers or "ftp". While
relatively straightforward in principle, this process can nonetheless
become time-consuming and somewhat challenging in practice.
The following example illustrates some of the differences between
accessing distributed data with the tools currently in widespread use,
and the same operation using OPeNDAP.
===An Example: Using ftp===
The advent of the WWW has made possible simple data browsers that
allow sophisticated interactive sampling of on-line datasets. Using a
web browser and "ftp", a user can sample any of several large
oceanographic datasets available on the Internet. However, there are
several problems with these data search engines that may only become
apparent when a user actually tries to use the data.
Among the problems that can arise are those that appear when a user
tries to use the results of one dataset to search a second
dataset. Suppose that a user wishes to choose a sea-surface
temperature image from the NOAA/NASA Pathfinder AVHRR archive at:
<pre>
http://podaac-www.jpl.nasa.gov/mcsst/mcsst_subset.html
</pre>
using the results of a
time-series generated from the COADS Climatology archive at:
<pre>
http://ferret.wrc.noaa.gov/fbin/climate_server
</pre>
The steps are theoretically straightforward:
#Create the time series from the COADS Climatology archive. This is done by answering the menu of options on the COADS web page.
#Import the time series from step 1 to the user's local data analysis system.  Note that this step may itself require several steps:
##The data must be down-loaded, using "ftp" or a similar program.
##Once down-loaded, the data may have to be converted into a format that can be read by the data analysis program.
#:
#Examine the data and formulate a request to the AVHRR archive. This is again done by answering the menu of options on the AVHRR Web page.  Note that the COADS and AVHRR pages are not completely compatible in this respect. For example, the date formats of the two pages are different.
#Import the result of step 3 to the user's local data display system. This may also require several steps:
##The data must be down-loaded again.
##And again, once down-loaded, the data may have to be converted into a format that can be read by the data analysis program.  Note that the set of formats available on the COADS page is distinct from the set available from the AVHRR archive.
#:
#Think about the results.
Though the procedure is straightforward and the web servers designed
to make sampling the datasets a simple task, upon close examination,
the combination of the steps may create unforeseen difficulties. For
example, a request to the COADS server will return either a spreadsheet
suitable for use on a PC, a netCDF format file, or a file in one
of a selection of simple ASCII formats.
If the user is fortunate, the returned file will already be in a
format compatible with the desired analysis package. But not all users
will be so fortunate.  Often this file must be converted to some
other file format before it can be imported to the user's analysis
program. This may or may not be a simple task.
Even a file format for which a user is properly equipped may be used
in an unfamiliar manner. For example, the independent and dependent
variables might be in a different order or an ASCII data file may use
tabs instead of spaces.
Assuming the import of the COADS data has been accomplished and
boundaries for the AVHRR search identified, the task of selecting from
the second archive may begin. Unfortunately, the request to the AVHRR
archive will return either a GIF picture, an HDF format file, or a raw
(binary) data file. Again, importing this output into the user's
analysis program may or may not be simple, but it will not be the same
procedure as the one used for the first data request.
Other problems are also apparent. The COADS Climatology sampling
program requests the user supply dates (month and day), whereas the
AVHRR archive asks for the "Julian day" (an integer between 1 and
365 or 366). One server will accept "S" and "W" to indicate South
latitudes and West longitudes, while the other requires that these be
indicated with negative coordinate values. The sampling of the COADS
dataset, while flexible, may not allow sampling in the manner the user
needs. It cannot, for example, provide a section except along a line
of constant latitude or longitude. If a user wanted to see a section
along a NE-SW line, it would be a challenging and time-consuming
task to assemble one from many small data requests.
Further, it might be desirable to use the results of sampling these
two databases to construct a time series. This could conceivably mean
repeating the entire procedure many times.
===An Example: Using OPeNDAP===
To produce the same data selection using OPeNDAP, a user would follow
essentially the same steps. However, the steps themselves would be
performed differently. Once the user's data analysis package has been
converted to an OPeNDAP client
(see [[Wiki_Testing/OPeNDAPUserGuide3|Section 3.1]]),
accesses to the remote datasets are made through the analysis package
itself. Instead of specifying a data file by a pathname reference to
some local disk file, the user specifies a URL, which may point to
either a local or a remote dataset.  Here is a recap of the same steps,
outlined as they would be performed by an OPeNDAP application program:
#Create the time series from the COADS Climatology archive. This is done by using the sampling facilities of whatever data analysis program a scientist is familiar with.  If desired, OPeNDAP constraint expressions may be used to reduce the network load, or to provide a sampling scheme not supported by the data analysis program.
#The data need not be imported to the user's data analysis program, since it was down-loaded and converted automatically in step 1.
#Examine the data and formulate a request to the AVHRR archive. This is again done through the sampling facilities of whatever data analysis program the user is using, and OPeNDAP constraint expressions.  Note that, whatever their actual format, both COADS and AVHRR archives appear to the OPeNDAP client to be stored in identical formats.
#The data need not be imported to the user's data analysis program, since it was down-loaded and converted automatically in step 3.
#Think about the results.
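
The requests behind steps 1 and 3 can be sketched as URLs. The following is a minimal illustration, not the OPeNDAP client library: it only builds the strings a client would send. The server address, dataset name, and variable name are hypothetical. The <code>.dds</code>, <code>.das</code>, and <code>.dods</code> suffixes and the bracketed <code>[start:stride:stop]</code> index form are assumed to follow the DAP2 URL convention.

```python
# Hypothetical dataset address; a real server would publish its own URLs.
DATASET = "http://example.org/opendap/coads_climatology.nc"


def structure_url(dataset):
    """URL that asks the server for the dataset structure."""
    return dataset + ".dds"


def attributes_url(dataset):
    """URL that asks the server for the dataset attributes."""
    return dataset + ".das"


def data_url(dataset, variable, slices):
    """URL that asks for a subset of one array variable.

    slices holds one (start, stride, stop) index triple per dimension;
    together they form the constraint expression appended after '?'.
    """
    hyperslab = "".join(
        "[{}:{}:{}]".format(start, stride, stop) for start, stride, stop in slices
    )
    return "{}.dods?{}{}".format(dataset, variable, hyperslab)


if __name__ == "__main__":
    # Twelve time steps at a single (hypothetical) grid point.
    print(data_url(DATASET, "SST", [(0, 1, 11), (40, 1, 40), (100, 1, 100)]))
```

The same three functions would serve for the AVHRR request in step 3, which is the point of the procedure: only the dataset address and the variable name change, not the way the request is formed.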
It is important to note that ''any'' data analysis package that can
handle one of the OPeNDAP-supported data access APIs can be converted
into an OPeNDAP client program capable of reading data stored by ''all''
of the OPeNDAP-supported data access APIs. (There are some limitations on
translation. See [[#The OPeNDAP Client|The OPeNDAP Client]] and
[[Wiki_Testing/OPeNDAPUserGuide6| Section 6.1.2]] for more information.) Therefore, assuming
the user has some analysis package capable of doing the required
sampling and analysis on local data, all the steps would be performed
from within that package, just as if the user were operating on local
files. The result is a simpler procedure, even though the same
essential steps are followed.
The OPeNDAP scenario has, among others, the following advantages:
 
*The user need not learn about any of the archival formats, since the OPeNDAP server and client cooperate to deliver the data in the format in which the analysis package expects to see it. Whereas the user of the ftp server has to worry about importing the data into the analysis program, the OPeNDAP client program imports it transparently.
*The user can sample the distant datasets in any fashion supported by his or her own (local) analysis package. Unnecessary data need not be sent over the Internet.
*By appending a "constraint expression" to the URLs given to the analysis program, the user can sample data using techniques that their analysis program ''cannot''  do. (For example, suppose a user wishes to access the NODC XBT database using a program that uses the netCDF API. Programs that process the arrays that netCDF manipulates are largely unsuitable for XBT station data. However, a user can define constraint expressions in the URL to sample the data and deliver it in a form the netCDF API can use. For more information about constraint expressions, see the discussion of constraint expressions elsewhere in this guide. For more information about data models and translation, see [[Wiki_Testing/OPeNDAPUserGuide6| Chapter 6]].)
*A substantial amount of the searching and sampling is performed on the server machines. This reduces Internet traffic, as well as decreasing the load on the local machine.
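
The saving from server-side sampling can be illustrated with a small sketch. It assumes the index form <code>[start:stride:stop]</code> with an inclusive stop index, and applies it to an in-memory list standing in for an array held by the server. The function name and the sample values are hypothetical.

```python
def apply_hyperslab(values, start, stride, stop):
    """Select values[start], values[start + stride], ... up to and including stop."""
    if stride < 1 or not (0 <= start <= stop < len(values)):
        raise ValueError("hyperslab indices out of range")
    # The stop index is inclusive, so the Python slice must end one past it.
    return values[start:stop + 1:stride]


if __name__ == "__main__":
    # Stand-in for an array stored on the server.
    temperatures = [10.0, 10.5, 11.0, 11.5, 12.0, 12.5, 13.0, 13.5]
    subset = apply_hyperslab(temperatures, 1, 3, 7)
    print(len(subset), "of", len(temperatures), "values would cross the network")
```

Because the selection runs where the data lives, only the selected values are sent to the client.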


===The OPeNDAP Client===
===The OPeNDAP Client===




OPeNDAP uses a client/server model. As mentioned, the OPeNDAP
OPeNDAP uses a client/server model. The OPeNDAP
servers are simply "httpd} web servers, equipped to interpret an OPeNDAP URL sent to them. (See \chapterref{opd-server".) The OPeNDAP client
servers are web servers equipped to interpret an OPeNDAP URL sent to them. (See [[Wiki_Testing/OPeNDAPUserGuide5|Chapter 5]].) The OPeNDAP client
program can be any program that uses one of the supported APIs, such
program can be any program that uses one of the supported APIs, such
as JGOFS or netCDF.\footnote{Or a program specially developed to
as JGOFS or netCDF.([[Wiki_Testing/OPeNDAPUserGuideFootNotes|3]])
read data from OPeNDAP servers.}


Without OPeNDAP, an application program that uses one of the common data
Without OPeNDAP, an application program that uses one of the common data
access APIs such as netCDF will operate as shown in [[Image:intro,fig,unlinked]].
access APIs such as netCDF will operate as shown in
[[:Image:unlinked.gif | the figure below]].
The user
The user
makes a request for data from the application program.  The program in turn
makes a request for data from the application program.  The program in turn
Line 217: Line 87:
which is stored locally on the host machine.  Some APIs are somewhat more
which is stored locally on the host machine.  Some APIs are somewhat more
sophisticated than this, of course, but their general operation is
sophisticated than this, of course, but their general operation is
similar to this outline.
as simple, and the whole process happens on a single machine.
 
<center>
[[Image:unlinked.png|actual size]]
 
The Architecture of a Data Analysis Package


\figureplace{The Architecture of a Data Analysis Package.}{htbp}
</center>
{intro,fig,unlinked}{unlinked.ps}{unlinked.gif}{}


The operation of an OPeNDAP client is illustrated in [[Image:intro,fig,linked]].
The operation of an OPeNDAP client is illustrated in the [[:Image:unlinked.gif | figure below]].
Here, the
Here, the
''same application program''  that was used in [[Image:intro,fig,unlinked]]
''same application program''  that was used in [[:Image:unlinked.gif |
the figure above]]
has been linked
has been linked
with an OPeNDAP version of the data access API.  Now, in addition to being
with an OPeNDAP version of the data access API library.  Now, in addition to being
able to use local data as before, the application program is able to access
able to use local data as before, the application program is able to access
data from OPeNDAP server anywhere on the Internet in the same manner as the
data from OPeNDAP servers anywhere on the Internet in exactly the same manner as the
local data.
local data.


To make some program into an OPeNDAP client, it must only be re-linked with
To make some analysis program into an OPeNDAP client, just re-link it with
the OPeNDAP implementation of the supported API library. This is a simple
the OPeNDAP implementation of the supported API library. This is a simple
process, generally requiring only a few minutes. The process will
process, generally requiring only a few minutes. This will
create a program that accepts URLs, specifying a location for the data
create a program that accepts URLs as well as file pathnames to
somewhere on the Internet, in addition to file pathnames which only
identify data to be read. (See
specify a location on the local platform's file system. (See
[[Wiki_Testing/OPeNDAPUserGuide3|Section 3.1]]).
([http://www <cite> opd-client,link</cite>]).)
 
<center>
[[Image:linked.png|actual size]]


\figureplace{The Architecture of a Data Analysis Package Using OPeNDAP.}{htbp}
The Architecture of a Data Analysis Package Using OPeNDAP
{intro,fig,linked}{linked.ps}{linked.gif}{}
</center>


OPeNDAP also provides a data translation facility. Data from the original
OPeNDAP also provides a data translation facility. Data from the original
data file is translated by the OPeNDAP server into an OPeNDAP data model for
data file is translated by the OPeNDAP server into the OPeNDAP data model for
transmission to the client. Upon receiving the data, the client
transmission to the client. Upon receiving the data, the client
translates the data into the data model it understands. (See
translates the data into the data model it understands. (See
([http://www <cite> data</cite>]) for more information about the OPeNDAP data model.)
[[Wiki_Testing/OPeNDAPUserGuide6| Chapter 6]] for more information about the OPeNDAP data model.)
Because the data transmitted from an OPeNDAP server to the client travel
Because the data transmitted from an OPeNDAP server to the client travel
in the OPeNDAP format, the data set's original storage format is completely
in the OPeNDAP format, the dataset's original storage format is completely
irrelevant to the user of an OPeNDAP client. If the client was originally
irrelevant to the client. If the client was originally
designed to read netCDF format files, the data returned by the
designed to read netCDF format files, the data returned by the
OPeNDAP-netCDF library will appear to have been read from a netCDF file,
OPeNDAP-netCDF library will appear to have been read from a netCDF file,
whatever the actual format of the files from which the data were
whatever the actual format of the files from which the data were
read\footnote{Note that there is a limit to what can be translated. An
read([[Wiki_Testing/OPeNDAPUserGuideFootNotes|4]]). If the
API meant to support two-dimensional arrays may be able to handle
program expects JGOFS data, the OPeNDAP-JGOFS library will return data
one-dimensional vector data, but a program designed to process
that seem to have come from a JGOFS dataset, and so on.
one-dimensional vector data will not know what to do with a
two-dimensional array. The set of data access APIs supported by OPeNDAP
contain several such mismatches. See
Section~(data,trans) for more information.}. If the
program expects JGOFS data, the DODS-JGOFS library will return data
that seem to have come from a JGOFS dataset, again, no matter what the
actual input file format.


OPeNDAP does not pretend to remove all the overhead of data searches. A
OPeNDAP does not pretend to remove all the overhead of data
user will still have to keep track of the URLs of interesting data
searches. A user will still have to keep track of the URLs of
sets in the same way a user must now keep track of the names of files
interesting data sets in the same way a user must now keep track of
containing interesting data.  an OPeNDAP \new{catalog service} is in the
the names of files containing interesting data.  (You may run across
process of being constructed that will help users scan the available
datasets where the data consists of OPeNDAP URLs.  These are the
datasets.
OPeNDAP file servers, and have been developed by OPeNDAP users to
organize datasets consisting of large numbers of individual files.)


==Providing Data with OPeNDAP==
==Providing Data with OPeNDAP==
Line 277: Line 149:




The designers of OPeNDAP recognized that many of the data users are also
The OPeNDAP designers recognize that many data users are also
the data providers, and OPeNDAP was built with a recognition that
data providers, and the software was built with a recognition that
providing the data should be as simple and as straightforward as
providing the data should be as simple and as straightforward as
possible. In many cases, once a local web server is equipped to become
possible. In many cases, once a local web server is equipped to become
Line 288: Line 160:




*Install and configure the OPeNDAP server.
*Install and configure the OPeNDAP server. (See [[Wiki_Testing/OPeNDAPUserGuide5|Section 5.2]].)
(([http://www <cite> opd-server,install</cite>]).)
*Store the data in the appropriate file format and store it where the server can find it.
*Create whatever ancillary data files are needed by the data set (if any). (([http://www <cite> intro,ancillary</cite>]).)  %
*Create whatever ancillary data files are needed by the data set (if any). <!-- ([[Wiki_Testing/OPeNDAPUserGuide1|Section 1.2.2]])-->   
*Register the data set with the master directory (optional).  %
 
*Create the data catalog.


===The OPeNDAP Server===
===The OPeNDAP Server===


The OPeNDAP data server is made up of two pieces.  You can think of
them as a front-end and a back-end.  Generally speaking, they will run
on the same machine, and from a user's perspective, appear to be a
single server.


The OPeNDAP data server is simply made up of a regular <font color='green'>httpd</font> server
The front-end server is a Tomcat servlet, and is also called the
equipped with CGI programs (or filters) that will respond to requests
'''OPeNDAP Lightweight Front-End Servlet''' (OLFS).  Its job is to
for dataset structure, data attributes, and data itself. (See
receive your request for data and manage all the different forms such
([http://www <cite> data,dap</cite>]) for a description of the data returned by these
a request might take. For example, you might be asking for the data,
requests and see ([http://www <cite> opd-client,url</cite>]) for a description of the
an ASCII version of the data, or a reply to a SOAP message. The front-end server can also reply to THREDDS catalog
OPeNDAP URL syntax used to send these requests.)  Most of the task of a
requests, for information about the data.
data provider consists of configuring this server.  While perhaps not
a trivial task, it potentially represents far less effort than
packaging a dataset for submission to some central data archive.
Furthermore, modifying a server's configuration to accommodate new
data will be an almost trivial task, involving the simple editing of a
configuration file.
 
===Ancillary Data===


The '''Back-End Server''' (BES) is more directly about performance,
and is designed to respond quickly and efficiently to requests from
the OLFS.  It is a pure data server, and has only one format of
request and response, relying on the OLFS to convert formats to
accommodate the user.  Most users won't make requests directly to the
BES.
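
The division of labor between the two pieces can be sketched as follows. This is an illustration only, not Hyrax code: the function names, the in-memory data, and the two response forms shown are hypothetical. It assumes only what is described above, namely that the back end has a single request and response form and the front end converts that response into whatever form the user asked for.

```python
def back_end(variable):
    """Stand-in for the back-end server: one request form, one response form."""
    store = {"temperature": [10.5, 11.0, 11.5]}  # hypothetical stored data
    return {"name": variable, "values": store[variable]}


def front_end(variable, response_form):
    """Stand-in for the front end: convert the back-end response on request."""
    result = back_end(variable)
    if response_form == "data":
        return result
    if response_form == "ascii":
        return result["name"] + "\n" + ", ".join(str(v) for v in result["values"])
    raise ValueError("unsupported response form: " + response_form)


if __name__ == "__main__":
    print(front_end("temperature", "ascii"))
```

Adding another response form in this sketch means changing only the front end, which is the benefit of keeping the back end to a single format.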


In order for an OPeNDAP client to accept data from an OPeNDAP server, it must be able
[[Image:HyraxArchitecture.jpg|center|600px]]
to allocate the data structures and arrange internal labels to organize the
incoming data. The information the client library needs to do this
organizing is called the ancillary data\footnote{It is also referred to as


the Data Descriptor Structure and the Data Attribute Structure. See
''Hyrax'' is an alternative name for the OPeNDAP 4 Data Server.


Chapter~(data) for more details about these structures.}.  For many
See
APIs, the ancillary data is inherent in the data files themselves, and the
[[Wiki_Testing/OPeNDAPUserGuide6|Section 6.2]] for a description of the data returned by these
OPeNDAP server can glean that information by scanning the data files. For large
requests and see [[Wiki_Testing/OPeNDAPUserGuide2|Section 2.1]] for a description of the
data archives, where scanning the data files is impractical, and that might
OPeNDAP URL syntax used to send these requests.
not change often, OPeNDAP can cache the ancillary data to speed access times.
When a client requests the ancillary data, the OPeNDAP server can check this
data cache first before scanning the data files.


This feature is useful in other cases because not all data file formats
See [[Hyrax|the OPeNDAP 4 Data Server documentation]] for a
are self-describing.  For example, a data set might contain several files of
description of how to install and configure an OPeNDAP data server.
time vs. temperature data; the header information describing which numbers
are temperature and which time may be in a different file or may simply
be understood by the user of the local data analysis program equipped
to look at this data.  As an example, data accessed by OPeNDAP servers using
the FreeForm data access API require provider-created ancillary data files.


===Administration and Centralization of Data===
===Administration and Centralization of Data===
Line 342: Line 204:
is organized in a manner similar to the World Wide Web itself.  That
is organized in a manner similar to the World Wide Web itself.  That
is, all one need do to make one's data available is to start up a
is, all one need do to make one's data available is to start up a
properly configured "httpd" server on an Internet node that has
properly configured server on an Internet node that has
access to the data to be served.  Each data provider is free to join
access to the data to be served.  Each data provider is free to join
and to leave the system when it is convenient, just as any proprietor
and to leave the system when it is convenient, just as any proprietor
Line 354: Line 216:
configuring an OPeNDAP server does not.  OPeNDAP provided a facility for
configuring an OPeNDAP server does not.  OPeNDAP provided a facility for
registering a data set with the GCMD catalog, which makes the data set
registering a data set with the GCMD catalog, which makes the data set
known to the OPeNDAP data location service.
known to the OPeNDAP data location service.  The THREDDS catalog service
is another way to make information about your data widely available.





Latest revision as of 15:45, 21 September 2009

What is OPeNDAP?

OPeNDAP provides a way for researchers to access scientific data anywhere on the Internet from a wide variety of new and existing programs. By developing network versions of commonly used data access libraries, such as NetCDF , HDF , JGOFS , and others, the OPeNDAP project can capitalize on years of development of data analysis and display packages that use those file formats, allowing users to continue to use programs with which they are already familiar.

The OPeNDAP architecture uses a client/server model, with a client that sends requests for data out onto the network to some server that answers with the requested data. This is exactly the model used by the World Wide Web, where client programs called browsers submit requests to web servers for the data that make up web pages. Of course, OPeNDAP clients can do much more than browse this data. Using flexible data types suitable for many uses, including scientific data, the OPeNDAP servers deliver real data directly to the client program in the format needed by that client.

The network communication model used by OPeNDAP uses URL addresses and web servers to deliver data to the researcher. This is done by using the OPeNDAP software to convert a researcher's data analysis software into a sophisticated (though specialized) web browser. In addition to providing network-compatible versions of popular data access APIs, the OPeNDAP project also provides a software client and server toolkit to help other developers create network-compatible OPeNDAP versions of other APIs.

To expand the universe of data available to a user, OPeNDAP incorporates a powerful data translation facility, so that data may be stored in data structures and formats defined by the data provider, but accessed by the user in a manner identical to the access of local data files on the user's own system. Though there are limitations on the types of data that may be translated (see Section 6.1.2), the facility is flexible and general enough to handle many of the possible translations. There are two important results:

  • A user may not need to know that data from one set are stored in a format different from data in another set. Further, it may be possible that neither data set is stored in a format readable by the original version of the data analysis and display program he or she uses.
  • No segment of OPeNDAP users will be effectively cut off from accessing data because of its storage format. A scientist who wishes to make his or her data available to other OPeNDAP users may do so while keeping that data in what may actually be a highly idiosyncratic storage format. (Of course, it doesn't have to be in a highly idiosyncratic format.)

The combination of the OPeNDAP network communication model and the data translation facility makes OPeNDAP a powerful tool for the retrieval, sampling, and display of large distributed datasets. Though OPeNDAP was developed by oceanographers, its application is not constrained to oceanographic data. The organizing principles and algorithms may be applied to many other fields where data can be stored on computers.

The uniformity with which data appears makes the system useful both for easing a researcher's data analysis and for automating data transport and manipulation tasks. OPeNDAP libraries make data seem uniform and, by making data analysis programs network-aware, simplify scripting and automation. For example, NOAA's Live Access Server (LAS) (see, for example, My NASA Data) uses OPeNDAP, as do many of the real-time observing systems that make up the Integrated Ocean Observing System (IOOS), such as the Gulf of Maine Ocean Observing System.


The people who may be interested in a system such as OPeNDAP can be divided into data consumers and data providers. Though an important observation in the development of OPeNDAP was that the two roles are often assumed by the same scientists, the division is a useful one for introducing the system. The following two sections provide a broad introduction to the roles of data consumer and data provider. The remainder of this guide is organized around this distinction between classes of users.


The OPeNDAP Client

OPeNDAP uses a client/server model. The OPeNDAP servers are web servers equipped to interpret an OPeNDAP URL sent to them (see Chapter 5). The OPeNDAP client program can be any program that uses one of the supported APIs, such as JGOFS or netCDF.(3)

Without OPeNDAP, an application program that uses one of the common data access APIs such as netCDF will operate as shown in the figure below. The user makes a request for data from the application program. The program in turn uses procedures defined by the data access API to access the data, which is stored locally on the host machine. Some APIs are somewhat more sophisticated than this, of course, but their general operation is as simple, and the whole process happens on a single machine.


The Architecture of a Data Analysis Package

The operation of an OPeNDAP client is illustrated in the figure below. Here, the same application program that was used in the figure above has been linked with an OPeNDAP version of the data access API library. Now, in addition to being able to use local data as before, the application program is able to access data from OPeNDAP servers anywhere on the Internet in exactly the same manner as the local data.

To turn an analysis program into an OPeNDAP client, re-link it with the OPeNDAP implementation of the supported API library. This is a simple process, generally requiring only a few minutes, and it produces a program that accepts URLs as well as file pathnames to identify data to be read (see Section 3.1).


The Architecture of a Data Analysis Package Using OPeNDAP
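
As a rough illustration of what it means for a re-linked program to accept URLs as well as file pathnames, the sketch below shows one way a data access layer could tell the two apart. This is not the OPeNDAP library's actual logic. The function name `classify_dataset_name` and the example host are hypothetical, and only the Python standard library is used.

```python
from urllib.parse import urlparse


def classify_dataset_name(name: str) -> str:
    """Return 'remote' for http(s) URLs and 'local' for anything else.

    Hypothetical helper: a real OPeNDAP-enabled library makes this
    decision internally, so the calling program does not have to.
    """
    parsed = urlparse(name)
    # A remote dataset needs both a web scheme and a host name.
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "remote"
    return "local"


if __name__ == "__main__":
    # The host below is a placeholder, not a real OPeNDAP server.
    for name in ("http://example.org/opendap/sst.nc", "/data/sst.nc"):
        print(name, "->", classify_dataset_name(name))
```

The point of the design is that the application passes the same kind of string in both cases, so no change to the application's own source is needed.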


OPeNDAP also provides a data translation facility. Data from the original data file is translated by the OPeNDAP server into the OPeNDAP data model for transmission to the client. Upon receiving the data, the client translates the data into the data model it understands. (See Chapter 6 for more information about the OPeNDAP data model.) Because the data transmitted from an OPeNDAP server to the client travel in the OPeNDAP format, the dataset's original storage format is completely irrelevant to the client. If the client was originally designed to read netCDF format files, the data returned by the OPeNDAP-netCDF library will appear to have been read from a netCDF file, whatever the actual format of the files from which the data were read(4). If the program expects JGOFS data, the OPeNDAP-JGOFS library will return data that seem to have come from a JGOFS dataset, and so on.
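
The two-step translation described above can be sketched in a few lines. This is a toy model under stated assumptions, not the OPeNDAP data model itself: the `Variable` class, both "storage formats", and all function names are hypothetical. It shows only the idea that differently stored datasets look identical to the client once they pass through a shared intermediate form.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Variable:
    """Hypothetical stand-in for one variable in an intermediate data model."""
    name: str
    values: List[float]


def from_delimited_text(text: str) -> List[Variable]:
    """Server side: translate a made-up comma-delimited format."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    names = [n.strip() for n in lines[0].split(",")]
    columns: List[List[float]] = [[] for _ in names]
    for line in lines[1:]:
        for column, cell in zip(columns, line.split(",")):
            column.append(float(cell))
    return [Variable(n, c) for n, c in zip(names, columns)]


def from_record_list(records: List[Dict[str, float]]) -> List[Variable]:
    """Server side: translate a made-up list-of-records format."""
    names = list(records[0])
    return [Variable(n, [float(r[n]) for r in records]) for n in names]


def to_client_arrays(variables: List[Variable]) -> Dict[str, List[float]]:
    """Client side: translate the intermediate form into what the client expects."""
    return {v.name: list(v.values) for v in variables}


if __name__ == "__main__":
    a = to_client_arrays(from_delimited_text("time,temp\n0,10.5\n1,11.0"))
    b = to_client_arrays(from_record_list(
        [{"time": 0, "temp": 10.5}, {"time": 1, "temp": 11.0}]))
    # Both datasets look the same to the client despite different storage.
    print(a == b)
```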

OPeNDAP does not pretend to remove all the overhead of data searches. A user will still have to keep track of the URLs of interesting data sets in the same way a user must now keep track of the names of files containing interesting data. (You may run across datasets where the data consists of OPeNDAP URLs. These are the OPeNDAP file servers, and have been developed by OPeNDAP users to organize datasets consisting of large numbers of individual files.)

Providing Data with OPeNDAP

The OPeNDAP data provider is the person or organization willing to make their digital datasets available to the community with an OPeNDAP server.


The OPeNDAP designers recognize that many data users are also data providers, and the software was built so that providing data is as simple and straightforward as possible. In many cases, once a local web server is equipped to become an OPeNDAP server, a scientist need do very little beyond what must be done simply to make the data available locally (that is, put the data into a file format that can be read by the locally used data analysis and display programs). The tasks of a data provider can be separated into three parts:


  • Install and configure the OPeNDAP server (Section 5.2).
  • Store the data in the appropriate file format and store it where the server can find it.
  • Create whatever ancillary data files are needed by the data set (if any).


The OPeNDAP Server

The OPeNDAP data server is made up of two pieces. You can think of them as a front-end and a back-end. Generally speaking, they will run on the same machine, and from a user's perspective, appear to be a single server.

The front-end server is a Tomcat servlet, and is also called the OPeNDAP Lightweight Front-End Servlet (OLFS). Its job is to receive your request for data and manage all the different forms such a request might take. For example, you might be asking for the data, an ASCII version of the data, or a reply to a SOAP message. The front-end server can also reply to THREDDS catalog requests, for information about the data.
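
To make the different request forms concrete, the sketch below builds request URLs by appending a response suffix to a dataset URL. The suffixes follow the common DAP2 convention (`.dds`, `.das`, `.dods`, `.ascii`), which is assumed here rather than taken from this section. The helper name `request_url`, the host, and the dataset path are hypothetical.

```python
# Response suffixes assumed from the DAP2 convention; the host and
# dataset path used in the example are placeholders.
RESPONSE_SUFFIXES = {
    "structure": ".dds",    # description of the dataset's variables
    "attributes": ".das",   # attributes attached to those variables
    "binary": ".dods",      # the data in binary form
    "ascii": ".ascii",      # an ASCII version of the data
}


def request_url(dataset_url: str, response: str, constraint: str = "") -> str:
    """Build a request URL for one response form (hypothetical helper)."""
    if response not in RESPONSE_SUFFIXES:
        raise ValueError(f"unknown response type: {response!r}")
    url = dataset_url + RESPONSE_SUFFIXES[response]
    if constraint:
        # An optional constraint expression follows a question mark.
        url += "?" + constraint
    return url


if __name__ == "__main__":
    base = "http://example.org/opendap/data/sst.nc"
    print(request_url(base, "structure"))
    print(request_url(base, "ascii", "sst[0:1:10]"))
```

Section 2.1 describes the URL syntax actually accepted by the server; treat the suffix table here as an illustration only.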

The Back-End Server (BES) is more directly about performance, and is designed to respond quickly and efficiently to requests from the OLFS. It is a pure data server, and has only one format of request and response, relying on the OLFS to convert formats to accommodate the user. Most users won't make requests directly to the BES.

Figure: Hyrax architecture (HyraxArchitecture.jpg)

Hyrax is an alternative name for the OPeNDAP 4 Data Server.

See Section 6.2 for a description of the data returned by these requests and see Section 2.1 for a description of the OPeNDAP URL syntax used to send these requests.

See the OPeNDAP 4 Data Server documentation for a description of how to install and configure an OPeNDAP data server.

Administration and Centralization of Data

Under OPeNDAP, there is no central archive of data. Data under OPeNDAP is organized in a manner similar to the World Wide Web itself. That is, all one need do to make one's data available is to start up a properly configured server on an Internet node that has access to the data to be served. Each data provider is free to join and to leave the system when it is convenient, just as any proprietor of a web page is free to delete it or add to it as whimsy demands.

Of course, as can also be seen on the World Wide Web, there are some disadvantages to the lack of central authority. If no one knows about a web site, no one will visit it. Similarly, listing a dataset in a central data catalog, such as the Global Change Master Directory (http://gcmd.gsfc.nasa.gov/), can make data available to other researchers in a way that simply configuring an OPeNDAP server does not. OPeNDAP provided a facility for registering a data set with the GCMD catalog, which makes the data set known to the OPeNDAP data location service. The THREDDS catalog service is another way to make information about your data widely available.


The remainder of this book will be divided into three major sections: instructions on the building and operating of OPeNDAP clients; a tutorial and reference on running OPeNDAP servers and making data available to OPeNDAP clients; and technical documentation describing the implementation details (and the motivation behind many of the design decisions) of the OPeNDAP software.