UserGuideChapter1

From OPeNDAP Documentation
Revision as of 01:36, 22 September 2007 by Yuan (talk | contribs)

What is OPeNDAP?

OPeNDAP provides a way for ocean researchers to access oceanographic data anywhere on the Internet from a wide variety of new and existing programs. By developing network versions of commonly used data access Application Program Interface (API) libraries, such as netCDF, HDF, JGOFS, and others, the OPeNDAP project can capitalize on years of development of data analysis and display packages that use those APIs, allowing users to continue to use programs with which they are already familiar.

The OPeNDAP architecture uses a client/server model, with a "client" that sends requests for data out onto the network to some "server", which answers with the requested data. This is exactly the model used by the World Wide Web where client programs called browsers submit requests to web servers for the data that make up web pages. Of course, OPeNDAP clients can do much more than browse this data. Using flexible data types suitable for many uses, including scientific data, the OPeNDAP servers deliver real data directly to the client program in the format needed by that client.

In fact, the network communication model used by OPeNDAP uses URL addresses and web servers ("httpd") to deliver data to the researcher. This is done by using the OPeNDAP software to convert a researcher's data analysis software into a sophisticated (though specialized) web browser. In addition to providing network-compatible versions of popular data access APIs, the OPeNDAP project also provides a software client and server toolkit to help other developers create network-compatible OPeNDAP versions of other APIs.

To expand the universe of data available to a user, OPeNDAP incorporates a powerful data translation facility, so that data may be stored in data structures and formats defined by the data provider, but may be accessed by the user in a manner identical to the access of local data files on the user's own system. Though there are limitations on the types of data that may be translated (see ( data,trans)), the facility is flexible and general enough to handle many of the possible translations. There are two important results:

  • A user may not need to know that data from one set are stored in a format different from data in another set. Further, it may be possible that "neither" data set is stored in a format readable by the original (i.e. without OPeNDAP) version of the data analysis and display program he or she uses.
  • No segment of OPeNDAP users will be effectively cut off from accessing data because of its storage format. A scientist who wishes to make his or her data available to other OPeNDAP users may do so while keeping that data in what may actually be a highly idiosyncratic storage format. Of course, it doesn't have to be in a highly idiosyncratic format. The point is that OPeNDAP can handle a wide variety of possible cases.

The combination of the OPeNDAP network communication model and the data translation facility makes OPeNDAP a powerful tool for the retrieval, sampling, and display of large distributed datasets. Though OPeNDAP was developed by oceanographers, its application is not constrained to oceanographic data. The organizing principles and algorithms may be applied to many other fields where data can be stored on computers.

The population of people who may be interested in a system such as OPeNDAP may be divided into data consumers and data providers. Though it was an important observation to the development of OPeNDAP that the two roles are often assumed by the same scientists, the division is a useful one for the introduction of the system. The following two sections provide a broad introduction to the roles of data consumer and data provider. The remainder of this guide is organized around this distinction between classes of users.

Why Use OPeNDAP to Read Data?

A scientist wishing to examine and sample some dataset will typically be comfortable using a relatively small number of data analysis and display programs or packages. Some of these packages will use one of the popular data access APIs currently available. However, few data access APIs provide direct access to distributed data\footnote{"Distributed data" refers to datasets that reside on different computers which are linked by a network such as the Internet. The computers may or may not be physically remote from each other. The main point is that the computers manage their data resources independently. In this guide the terms "remote" and "distributed" are used to imply independently managed resources.}, so this access must be made with network tools, such as web browsers or "ftp". While relatively straightforward in principle, this process can nonetheless become time-consuming and somewhat challenging in practice.

The following example illustrates some of the differences between accessing distributed data with the tools currently in widespread use, and the same operation using OPeNDAP.

An Example: Using ftp

The advent of the WWW has made possible simple data browsers that allow sophisticated interactive sampling of on-line datasets. Using a web browser and "ftp", a user can sample any of several large oceanographic datasets available on the Internet. However, there are several problems with these data search engines that may only become apparent when a user actually tries to use the data.

Among the problems that can arise are those that appear when a user tries to use the results of one dataset to search a second dataset. Suppose that a user wishes to choose a sea-surface temperature image from the NOAA/NASA Pathfinder AVHRR archive at:

http://podaac-www.jpl.nasa.gov/mcsst/mcsst_subset.html

using the results of a time-series generated from the COADS Climatology archive at:

http://ferret.wrc.noaa.gov/fbin/climate_server

The steps are theoretically straightforward:


  1. Create the time series from the COADS Climatology archive. This is done by answering the menu of options on the COADS web page.
  2. Import the time series from step 1 to the user's local data analysis system. Note that this step may itself require several steps:
    1. The data must be down-loaded, using "ftp" or a similar program.
    2. Once down-loaded, the data may have to be converted into a format that can be read by the data analysis program.
  3. Examine the data and formulate a request to the AVHRR archive. This is again done by answering the menu of options on the AVHRR Web page. Note that the COADS and AVHRR pages are not completely compatible in this respect. For example, the date formats of the two pages are different.
  4. Import the result of step 3 to the user's local data display system. This may also require several steps:
    1. The data must be down-loaded again.
    2. And again, once down-loaded, the data may have to be converted into a format that can be read by the data analysis program. Note that the set of formats available on the COADS page is distinct from the set available from the AVHRR archive.
  5. Think about the results.

Though the procedure is straightforward and the web servers are designed to make sampling the datasets a simple task, upon close examination, the combination of the steps may create unforeseen difficulties. For example, a request to the COADS server will return either a spreadsheet suitable for use on a PC, a netCDF format file, or a file in one of a selection of simple ASCII formats. If the user is fortunate, the returned file will already be in a format compatible with the desired analysis package. But not all users will be so fortunate. Often this file must be converted to some other file format before it can be imported to the user's analysis program. This may or may not be a simple task.

Even a file format for which a user is properly equipped may be used in an unfamiliar manner. For example, the independent and dependent variables might be in a different order or an ASCII data file may use tabs instead of spaces.

Assuming the import of the COADS data has been accomplished and boundaries for the AVHRR search identified, the task of selecting from the second archive may begin. Unfortunately, the request to the AVHRR archive will return either a GIF picture, an HDF format file, or a raw (binary) data file. Again, importing this output into the user's analysis program may or may not be simple, but it will not be the same procedure as the one used for the first data request.

Other problems are also apparent. The COADS Climatology sampling program asks the user to supply dates (month and day), whereas the AVHRR archive asks for the "Julian day" (an integer between 1 and 365 or 366). One server will accept "S" and "W" to indicate South latitudes and West longitudes, while the other requires that these be indicated with negative coordinate values. The sampling of the COADS dataset, while flexible, may not allow sampling in the manner the user needs. It cannot, for example, provide a section except along a line of constant latitude or longitude. If a user wanted to see a section along a NE-SW line, it would be a challenging and time-consuming task to assemble one from many small data requests.
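The two convention mismatches just described (calendar date versus day of year, and hemisphere letters versus signed coordinates) are the kind of glue a user ends up writing by hand. A minimal Python sketch follows. The function names are illustrative and not part of either archive's interface, and the year argument is an assumption added here because the day of year depends on whether the year is a leap year.

```python
from datetime import date


def day_of_year(year: int, month: int, day: int) -> int:
    """Convert a calendar date to the 1-based day of the year (1..365 or 366)."""
    return date(year, month, day).timetuple().tm_yday


def signed_coordinate(value: float, hemisphere: str) -> float:
    """Convert a magnitude plus an N/S/E/W letter to a signed coordinate.

    South latitudes and West longitudes become negative values.
    """
    letter = hemisphere.strip().upper()
    if letter not in ("N", "S", "E", "W"):
        raise ValueError(f"unknown hemisphere: {hemisphere!r}")
    magnitude = abs(value)
    return -magnitude if letter in ("S", "W") else magnitude
```

With helpers like these, a date chosen on one page can be restated in the form the other page expects, but the user still has to know which convention each server wants.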

Further, it might be desirable to use the results of sampling these two databases to construct a time series. This could conceivably mean repeating the entire procedure many times.

An Example: Using OPeNDAP

To produce the same data selection using OPeNDAP, a user would follow essentially the same steps. However, the steps themselves would be performed differently. Once the user's data analysis package has been converted to an OPeNDAP client (( opd-client,link)), accesses to the remote datasets are made through the analysis package itself. Instead of specifying a data file by a pathname reference to some local disk file, the user specifies a URL, which may point to either a local or a remote dataset. Here is a recap of the same operations, outlined as they would be performed by an OPeNDAP application program:


  1. Create the time series from the COADS Climatology archive. This is done by using the sampling facilities of whatever data analysis program a scientist is familiar with. If desired, OPeNDAP constraint expressions may be used to reduce the network load, or to provide a sampling scheme not supported by the data analysis program.
  2. The data need not be imported to the user's data analysis program, since it was down-loaded and converted automatically in step 1.
  3. Examine the data and formulate a request to the AVHRR archive. This is again done through the sampling facilities of whatever data analysis program the user is using, and OPeNDAP constraint expressions. Note that, whatever their actual format, both COADS and AVHRR archives appear to the OPeNDAP client to be stored in identical formats.
  4. The data need not be imported to the user's data analysis program, since it was down-loaded and converted automatically in step 3.
  5. Think about the results.

It is important to note that "any" data analysis package that can handle one of the DODS-supported data access APIs can be converted into an OPeNDAP client program capable of reading data stored by "all" of the DODS-supported data access APIs. (There are some limitations on translation. See ( intro,opd-client) and ( data,trans) for more information.) Therefore, assuming the user has some analysis package capable of doing the required sampling and analysis on local data, all the steps would be performed from within that package, just as if the user were operating on local files. The result is a simpler procedure, even though the same essential steps are followed.

The OPeNDAP scenario has, among others, the following advantages:


  • The user need not learn about any of the archival formats, since the OPeNDAP server and client cooperate to deliver the data in the format in which the analysis package expects to see it. Whereas the user of the ftp server has to worry about importing the data into the analysis program, the OPeNDAP client program imports it transparently.
  • The user can sample the distant datasets in any fashion supported by his or her own (local) analysis package. Unnecessary data need not be sent over the Internet.
  • By appending a "constraint expression" to the URLs given to the analysis program, the user can sample data in ways that the analysis program itself does not support.\footnote{For example, suppose a user wishes to access the NODC XBT database using a program that uses the netCDF API. Programs that process the arrays that netCDF manipulates are largely unsuitable for XBT station data. However, a user can define constraint expressions in the URL to sample the data and deliver it in a form the netCDF API can use. For more information about constraint expressions, see Section~(opd-client,constraint). For more information about data models and translation, see Chapter~(data).}
  • A substantial amount of the searching and sampling is performed on the server machines. This reduces Internet traffic, as well as decreasing the load on the local machine.

The OPeNDAP Client

OPeNDAP uses a client/server model. As mentioned, the OPeNDAP servers are simply "httpd" web servers, equipped to interpret an OPeNDAP URL sent to them. (See ( opd-server).) The OPeNDAP client program can be any program that uses one of the supported APIs, such as JGOFS or netCDF.\footnote{Or a program specially developed to read data from OPeNDAP servers.}

Without OPeNDAP, an application program that uses one of the common data access APIs such as netCDF will operate as shown in File:Intro,fig,unlinked. The user makes a request for data from the application program. The program in turn uses procedures defined by the data access API to access the data, which is stored locally on the host machine. Some APIs are somewhat more sophisticated than this, of course, but their general operation is similar to this outline.

[Figure intro,fig,unlinked: The Architecture of a Data Analysis Package.]

The operation of an OPeNDAP client is illustrated in File:Intro,fig,linked. Here, the same application program that was used in File:Intro,fig,unlinked has been linked with an OPeNDAP version of the data access API. Now, in addition to being able to use local data as before, the application program is able to access data from OPeNDAP servers anywhere on the Internet in the same manner as the local data.

To make some program into an OPeNDAP client, it need only be re-linked with the OPeNDAP implementation of the supported API library. This is a simple process, generally requiring only a few minutes. The process will create a program that accepts URLs, which specify a location for the data somewhere on the Internet, in addition to file pathnames, which only specify a location on the local platform's file system. (See ( opd-client,link).)

[Figure intro,fig,linked: The Architecture of a Data Analysis Package Using OPeNDAP.]

OPeNDAP also provides a data translation facility. Data from the original data file is translated by the OPeNDAP server into an OPeNDAP data model for transmission to the client. Upon receiving the data, the client translates the data into the data model it understands. (See ( data) for more information about the OPeNDAP data model.) Because the data transmitted from an OPeNDAP server to the client travel in the OPeNDAP format, the data set's original storage format is completely irrelevant to the user of an OPeNDAP client. If the client was originally designed to read netCDF format files, the data returned by the OPeNDAP-netCDF library will appear to have been read from a netCDF file, whatever the actual format of the files from which the data were read\footnote{Note that there is a limit to what can be translated. An API meant to support two-dimensional arrays may be able to handle one-dimensional vector data, but a program designed to process one-dimensional vector data will not know what to do with a two-dimensional array. The set of data access APIs supported by OPeNDAP contains several such mismatches. See Section~(data,trans) for more information.}. If the program expects JGOFS data, the DODS-JGOFS library will return data that seem to have come from a JGOFS dataset, again, no matter what the actual input file format.

OPeNDAP does not pretend to remove all the overhead of data searches. A user will still have to keep track of the URLs of interesting data sets in the same way a user must now keep track of the names of files containing interesting data. An OPeNDAP catalog service is being constructed that will help users scan the available datasets.

Providing Data with OPeNDAP

The OPeNDAP data provider is the person or organization willing to make their digital datasets available to the community with an OPeNDAP server.


The designers of OPeNDAP recognized that many of the data users are also the data providers, and OPeNDAP was built with a recognition that providing the data should be as simple and as straightforward as possible. In many cases, once a local web server is equipped to become an OPeNDAP server, a scientist need do very little beyond what must be done simply to make the data available locally (that is, put the data into a file format that can be read by the locally used data analysis and display programs). The tasks of a data provider can be separated into the following parts:


  • Install and configure the OPeNDAP server. (( opd-server,install).)
  • Create whatever ancillary data files are needed by the data set (if any). (( intro,ancillary).)
  • Register the data set with the master directory (optional).
  • Create the data catalog.

The OPeNDAP Server

The OPeNDAP data server is simply made up of a regular httpd server equipped with CGI programs (or filters) that will respond to requests for dataset structure, data attributes, and data itself. (See ( data,dap) for a description of the data returned by these requests and see ( opd-client,url) for a description of the OPeNDAP URL syntax used to send these requests.) Most of the task of a data provider consists of configuring this server. While perhaps not a trivial task, it potentially represents far less effort than packaging a dataset for submission to some central data archive. Furthermore, modifying a server's configuration to accommodate new data will be an almost trivial task, involving the simple editing of a configuration file.

Ancillary Data

In order for an OPeNDAP client to accept data from an OPeNDAP server, it must be able to allocate the data structures and arrange internal labels to organize the incoming data. The information the client library needs to do this organizing is called the ancillary data\footnote{It is also referred to as the Data Descriptor Structure and the Data Attribute Structure. See Chapter~(data) for more details about these structures.}. For many APIs, the ancillary data is inherent in the data files themselves, and the OPeNDAP server can glean that information by scanning the data files. For large data archives that change infrequently, where scanning the data files on every request is impractical, OPeNDAP can cache the ancillary data to speed access times. When a client requests the ancillary data, the OPeNDAP server can check this data cache first before scanning the data files.

This feature is useful in other cases because not all data file formats are self-describing. For example, a data set might contain several files of time vs. temperature data; the header information describing which numbers are temperature and which time may be in a different file or may simply be understood by the user of the local data analysis program equipped to look at this data. As an example, data accessed by OPeNDAP servers using the FreeForm data access API require provider-created ancillary data files.
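The cache-first lookup of ancillary data described in this section can be pictured with a small Python sketch. It illustrates the idea only and is not the server's actual implementation: the class name, the scan callback, and the placeholder text returned by fake_scan are all hypothetical.

```python
from typing import Callable, Dict


class AncillaryCache:
    """Cache-first lookup of ancillary data, keyed by dataset path."""

    def __init__(self, scan: Callable[[str], str]) -> None:
        self._scan = scan                 # expensive fallback: scan the data file
        self._cache: Dict[str, str] = {}  # dataset path -> ancillary data text
        self.scans = 0                    # counts how often the fallback ran

    def get(self, dataset: str) -> str:
        # Check the cache first; scan the data file only on a miss.
        if dataset not in self._cache:
            self.scans += 1
            self._cache[dataset] = self._scan(dataset)
        return self._cache[dataset]


def fake_scan(dataset: str) -> str:
    # Stand-in for reading a real file; returns placeholder text.
    return f"attributes for {dataset}"


cache = AncillaryCache(fake_scan)
first = cache.get("data/fnoc1.nc")
second = cache.get("data/fnoc1.nc")  # served from the cache, no second scan
```

For a format that is not self-describing, the same structure applies if the scan callback is replaced by one that reads a provider-created ancillary file.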

Administration and Centralization of Data

Under OPeNDAP, there is no central archive of data. Data under OPeNDAP is organized in a manner similar to the World Wide Web itself. That is, all one need do to make one's data available is to start up a properly configured "httpd" server on an Internet node that has access to the data to be served. Each data provider is free to join and to leave the system when it is convenient, just as any proprietor of a web page is free to delete it or add to it as whimsy demands.

Of course, as can also be seen on the World Wide Web, there are some disadvantages to the lack of central authority. If no one knows about a web site, no one will visit it. Similarly, listing a dataset in a central data catalog, such as the Global Change Master Directory (http://gcmd.gsfc.nasa.gov/), can make data available to other researchers in a way that simply configuring an OPeNDAP server does not. OPeNDAP provides a facility for registering a data set with the GCMD catalog, which makes the data set known to the OPeNDAP data location service.


The remainder of this book will be divided into three major sections: instructions on the building and operating of OPeNDAP clients; a tutorial and reference on running OPeNDAP servers and making data available to OPeNDAP clients; and technical documentation describing the implementation details (and the motivation behind many of the design decisions) of the OPeNDAP software.

Using OPeNDAP

A user uses OPeNDAP with an OPeNDAP client program. This client program may have been acquired by the user (for example, the OPeNDAP Matlab and IDL graphical user interfaces, or Ferret, a freeware data analysis package, each of which uses OPeNDAP for data access), or may be a program converted to use the OPeNDAP library for data access (see ( opd-client)).

In either case, there is a set of issues that must be addressed in order to use a program to access data through OPeNDAP. The issues fall into two groups. One group involves configuring the system to provide OPeNDAP with the helper applications and environment variables it requires. The other concerns the manner in which a user communicates with an OPeNDAP server. We cover the second group first.

How OPeNDAP Finds Data

Once linked to the OPeNDAP libraries, an OPeNDAP client created from an existing program will work exactly as before when run using local files. However, a user can also specify an OPeNDAP Uniform Resource Locator (URL) to indicate some data file on a remote host machine. When the program receives this URL, the OPeNDAP libraries will recognize it as remote data, and issue a network request for the data. If a user has also installed an OPeNDAP server on the local machine, then local data may be accessed either through their local filenames or their OPeNDAP URL.
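The decision described above, treating a name as remote data when it is a URL and as a local file otherwise, can be sketched in a few lines of Python. This is an illustration of the idea, not the OPeNDAP library's code. The function names are hypothetical, and checking for an http or https scheme is an assumption about how a URL is recognized.

```python
from urllib.parse import urlparse


def is_remote(dataset: str) -> bool:
    """Return True when the dataset name is an http(s) URL rather than a local path."""
    return urlparse(dataset).scheme in ("http", "https")


def access_method(dataset: str) -> str:
    # Hypothetical dispatcher: a real client would issue the network
    # request or call the local file API at this point.
    return "network request" if is_remote(dataset) else "local file read"
```

For example, access_method("/data/fnoc1.nc") selects the local path, while the same call with an http URL selects the network path, which is why the calling program does not need to change.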

A URL is simply a unique name for some Internet resource. File:Opd-client,fig,url-parts shows the parts of a typical OPeNDAP URL.

    >dncview http://dods.gso.uri.edu/cgi-bin/nph-nc/data/fnoc1.nc.das

    Program:       dncview
    Protocol:      http
    Machine Name:  dods.gso.uri.edu
    Server:        cgi-bin/nph-nc
    Directory:     data
    Filename:      fnoc1.nc
    URL Suffix:    .das

[Figure opd-client,fig,url-parts: Parts of an OPeNDAP URL (without a constraint expression)]

The parts of the URL are:

protocol


The protocol of an Internet request may be thought of as the kind of conversation the client expects to have with the target machine. For example, a web browser like Netscape Navigator wants to find a server that can return hypertext documents, while an ftp client wants to find a server that can understand file transfer requests. A web browser equipped to display hypertext documents will specify http as the protocol for its conversation, and hope that the target machine has an httpd daemon listening.

host
The host name in a URL is simply the Internet address of the host machine running whatever server can reply to the specified protocol.

server
A special feature of the httpd server process is that it may be configured to execute Common Gateway Interface (CGI) programs upon receipt of a properly specified URL. This is used, for example, by Internet search engines that ask a user to fill out a form. The CGI specification will be specific to the server in question, and the part of the URL that follows the CGI name is passed to the CGI upon invocation. This data may include a file name, but it may as easily be some arbitrary string of instructions. The OPeNDAP server is simply a set of CGI scripts executed on demand by the httpd server. Here, the OPeNDAP server is represented by a CGI script called nph-nc.

filename
If a CGI is not specified, the part of the URL after the host name is simply the name of a file that is to be returned to the inquiring browser. If a CGI is specified, the file is given to the program as its argument.

URL suffix
If you are issuing an OPeNDAP request from a non-OPeNDAP client, such as a web browser, you can specify the type of request by appending a suffix to the URL. Different suffixes demand different services from the server. The different services are listed in ( opd-client,services). If you are using OPeNDAP from an OPeNDAP client, or a client program adapted to use the OPeNDAP DAP library, you do not need to use a URL suffix. For example, to use OPeNDAP from Matlab, with the Matlab GUI or command-line clients, you do not need to use a suffix. To use OPeNDAP from a simple web browser like Netscape Navigator, you will need to use a suffix.

The URL in File:Opd-client,fig,url-parts shows a client request to the httpd server on the machine dods.gso.uri.edu, for a netCDF dataset (specified by the nph-nc in the cgi-bin directory) contained in a file called fnoc1.nc. Upon receiving this URL, the httpd server executes the specified OPeNDAP server module (nph-nc), which retrieves the file. The file is in a directory called data relative to wherever the httpd server looks for its data\footnote{The only parts of the URL whose spelling is not at the discretion of the administrator of the host machine are the http and the nph- at the beginning of the CGI script name. Even the nc, indicating netCDF, can be changed, although for clarity's sake, we hope people won't do so. Incidentally, the nph- is a relic, dating from the early days of the World Wide Web and the first hypertext protocol standards. It stands for "Non-Parsing Header" (see the CGI 1.1 Standard for more information), and is the only way to pass data through many httpd servers unparsed.}.
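To make the anatomy of the example URL concrete, the Python sketch below splits it into the parts named above. It is a hedged illustration, not a parser shipped with OPeNDAP. The class and function names are hypothetical, the list of suffixes is taken from the services described in this chapter, and the caller has to supply the server path (here cgi-bin/nph-nc) because the URL alone does not say where the CGI name ends and the directory begins.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class DapUrlParts(NamedTuple):
    protocol: str
    host: str
    server: str
    directory: str
    filename: str
    suffix: str


# Suffixes named in this chapter; anything else is left on the filename.
KNOWN_SUFFIXES = (".das", ".dds", ".dods", ".asc", ".ascii", ".html", ".info", ".ver")


def split_dap_url(url: str, server: str = "cgi-bin/nph-nc") -> DapUrlParts:
    """Split an OPeNDAP URL (without a constraint expression) into its parts."""
    parsed = urlparse(url)
    path = parsed.path.lstrip("/")
    if not path.startswith(server + "/"):
        raise ValueError(f"URL does not address server {server!r}")
    rest = path[len(server) + 1:]  # what the httpd server hands to the CGI
    suffix = ""
    for candidate in KNOWN_SUFFIXES:
        if rest.endswith(candidate):
            suffix = candidate
            rest = rest[: -len(candidate)]
            break
    directory, _, filename = rest.rpartition("/")
    return DapUrlParts(parsed.scheme, parsed.netloc, server, directory, filename, suffix)
```

Applied to the URL in the figure, the function yields http, dods.gso.uri.edu, cgi-bin/nph-nc, data, fnoc1.nc, and .das, matching the labels in the figure.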

OPeNDAP URLs can get somewhat more complicated than this simple description. In particular, they can contain "constraint expressions" that limit a request to data satisfying a set of conditions, and they can contain requests to specific OPeNDAP services, besides the data delivery service suggested here. Constraint expressions are described in more detail in ( opd-client,constraint), while the array of services provided by OPeNDAP servers is described in ( opd-client,services).

Security

Some OPeNDAP data providers will choose to control access to some or all of their data. When you request data from one of these servers, the OPeNDAP client will prompt you for a username and password. If you want to avoid the prompt, you can make the OPeNDAP URL even more baroque by embedding a username and password in it, like this:

    http://user:password@www.dods.org/nph-dods/etc...
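If the username or password contains characters such as "@" or ":", they have to be percent-encoded before being embedded in the URL. The Python sketch below shows one way to build such a URL with the standard library. The function name is hypothetical and the dataset path used in the example is made up for illustration.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def with_credentials(url: str, user: str, password: str) -> str:
    """Return the URL with user:password@ inserted before the host name."""
    parts = urlsplit(url)
    # Percent-encode every reserved character in the user name and password.
    userinfo = f"{quote(user, safe='')}:{quote(password, safe='')}"
    netloc = f"{userinfo}@{parts.hostname}"
    if parts.port is not None:
        netloc += f":{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

Keep in mind that a URL built this way carries the password in plain text wherever the URL is stored or displayed.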


The OPeNDAP Services

Up to now, we have treated the OPeNDAP server as if it has only one service: providing data to clients who ask for it. It is true that this is the most important service a server provides. However, it is also true that the server provides several other services besides that. In fact, fulfilling a request for data actually requires three separate requests from the client, using three different services of the OPeNDAP server.

The services requested from an OPeNDAP server are specified in a suffix appended to the URL described in File:Opd-client,fig,url-parts. Depending on the suffix supplied, the server will provide one of these services:


Data Attribute
This service returns the entire data attribute structure for the given dataset. This is a text file describing the attributes of each data quantity in that dataset. (See ( data,das) for more information about data attributes.) This service is activated when the server receives a URL ending with .das.


Data Descriptor
This service returns the entire data descriptor structure for the given dataset. This is a text file describing the structure of the variables in the dataset. (See ( data,dds) for more information about data descriptors.) This service is activated when the server receives a URL ending with .dds.

OPeNDAP Data
This service returns the actual data requested by a given URL. This is not a text file, but is encoded as a Multipurpose Internet Mail Extensions (MIME) document. This service is activated when the server receives a URL ending with .dods.

ASCII Data
This service returns an ASCII representation of the requested data. This can make the data available to a wide variety of browser programs. This service is activated when the server receives a URL ending with .asc or .ascii.

WWW Interface
When the server receives a URL ending in .html, it produces an HTML form containing information from the dataset that you can use to construct a sensible URL with which to request OPeNDAP data. The WWW Interface is also triggered when the OPeNDAP server receives a URL that references a directory instead of a file.

Information
This service returns information about the server and dataset, in human-readable HTML form. The returned document may include information about both the data server itself (e.g. server functions implemented), and the dataset referenced in the URL. The server administrator determines what information is returned in response to such a request. This service is activated when the server receives a URL ending with .info. See ( sec,document-data) for more information about how to configure the information service.

Version
This service returns the version information for the OPeNDAP server software running on the server. This service is triggered by a URL ending with .ver.

Help
This service returns some help text in response to an improperly specified URL. This service is triggered by a URL ending in any suffix that is not recognized by the OPeNDAP server.
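The suffix-to-service mapping in the list above can be summarized as a lookup table. The Python sketch below is illustrative only and is not how the server's CGI programs are written. Two simplifications are assumptions made here: a URL ending in "/" stands in for "references a directory", whereas a real server would consult its file system, and anything after a "?" is ignored on the assumption that a constraint expression follows the suffix. The .html service is labelled by what it returns, an HTML form.

```python
SERVICES = {
    ".das": "Data Attribute",
    ".dds": "Data Descriptor",
    ".dods": "OPeNDAP Data",
    ".asc": "ASCII Data",
    ".ascii": "ASCII Data",
    ".html": "HTML form",
    ".info": "Information",
    ".ver": "Version",
}


def service_for(url: str) -> str:
    """Name the service selected by the URL's suffix; unknown suffixes get Help."""
    path = url.split("?", 1)[0]          # drop any constraint expression
    if path.endswith("/"):
        return "HTML form"               # directory reference (simplified check)
    last_segment = path.rsplit("/", 1)[-1]
    dot = last_segment.rfind(".")
    suffix = last_segment[dot:] if dot != -1 else ""
    return SERVICES.get(suffix, "Help")
```

Falling back to Help for any unrecognized suffix mirrors the last entry in the list.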


A request for data from an OPeNDAP client will generally make three different service requests, for data attributes, data descriptors, and for data. The prepackaged OPeNDAP clients do this for you, so you may not be aware that three requests are made for each URL. That is, an OPeNDAP client may accept an OPeNDAP URL specifying some data, such as the one shown in File:Opd-client,fig,url-parts. In this case, the OPeNDAP client library (such as nc-dods) will accept the input URL, and append the different suffixes to that URL, making three distinct requests to the OPeNDAP server.
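As a small illustration of the three requests a client library makes on the user's behalf, the Python sketch below derives the three URLs from one suffix-free dataset URL. The function name is hypothetical, and the sketch only builds the URLs; it does not contact a server.

```python
from typing import List


def client_requests(dataset_url: str) -> List[str]:
    """Build the attribute, descriptor, and data request URLs for one dataset."""
    return [dataset_url + suffix for suffix in (".das", ".dds", ".dods")]
```

A user working from a web browser would instead type one of these suffixes by hand, as described under URL suffix above.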

WWW Interface

Each OPeNDAP server implements a service called the WWW Interface. This is a way to use a standard Web client, such as Netscape, to get information about the data served by a specific server.\footnote{The WWW Interface is only available for servers later than version 3.1.} The WWW Interface has two modes of operation: the directory level and the file level.

If an OPeNDAP URL references a directory instead of a file on the server machine, the server produces a listing similar to that shown in File:Opd-client,fig,ifh-dir.

Figure (File:Opd-client,fig,ifh-dir): \ifh - Directory Level

Clicking on a dataset shown in the directory-level listing will produce an HTML form similar to the one in File:Opd-client,fig,ifh. The top line in the window ("Data URL") shows a URL that makes a request for an OPeNDAP dataset. The windows below it show the variables that make up the dataset. You can edit the form to select the data you'd like to see from this dataset, and the \ifh will edit the Data URL so that it only requests the data you are interested in. When done, you can push the "ASCII" button, to see an ASCII representation of the data you've requested. Netscape cannot handle binary data, so if you want to use the binary data, you should copy the URL in the Data URL window to the OPeNDAP client you'd like to use.

Figure (File:Opd-client,fig,ifh): \ifh

Using an OPeNDAP Program

There are some configuration issues a user must consider in order to use an OPeNDAP client application program. There is a short list of software that is required for some of the advanced features of OPeNDAP, and some environment variables that control the execution of the OPeNDAP software. For a piece of software that has been converted to use OPeNDAP, after these conditions are satisfied, the program will run in the same manner it ran before. Aside from network delays, the user should not be able to tell that they are accessing data from the Internet.


Finally, though it may seem unnecessary to mention, in order for an OPeNDAP client application to communicate with an OPeNDAP server, the computer running the OPeNDAP client must be connected to the Internet.

Requirements

In order to use some of the features of the OPeNDAP core software, a user's computer must have some additional software installed and available on the user's PATH, in $DODS_ROOT/bin, or in $DODS_ROOT/etc.




  • The wish Tcl/Tk interpreter (or whatever program is indicated by the DODS_GUI environment variable) is used by the "GUI manager" to provide a progress indicator that displays the status of a pending data request as it is being processed. It is also used by the error reporting system to display error messages received from the server.

  • The gzip program, the GNU compression software, is used to decompress data messages received from an OPeNDAP server. If this program is not installed, the OPeNDAP core software tells the server not to send compressed messages, so data may still be received. However, having the compression software installed and available will increase the data transfer rate.

The required software, like OPeNDAP itself, is free software. Refer to the installation appendix for information about acquiring that software.

Environment Variables

After successfully relinking an application program with the OPeNDAP libraries, there is a short list of environment variables that may be defined. Only DODS_ROOT is required. The other three variables are only used to override default values controlling the GUI manager process. Most users may safely ignore them.

DODS_ROOT
indicates the root directory of the OPeNDAP software. The OPeNDAP core software must be able to locate utilities that are located in this directory tree.

DODS_GUI
can contain the name of the program used by the GUI manager. A user might wish to change this variable to point to a "safe" Tcl/Tk interpreter; whatever program is used here must be able to process Tcl and Tk commands. The default value is the wish program.

DODS_GUI_INIT
indicates the name of any initialization command required by the "GUI manager". The default initialization string executes the Tcl program in $DODS_ROOT/etc/dods_gui.tcl.

DODS_USE_GUI
may be used to turn off the GUI manager. Set the value of this variable to no, and the progress indicator and the error message windows will not be displayed.
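How these four variables combine can be sketched as follows. This is an illustration under stated assumptions, not the logic of the OPeNDAP core software: the function name gui_settings is hypothetical, the comparison of DODS_USE_GUI against the exact string "no" is an assumption, and the actual default initialization string is not given in this guide, so the sketch only records the path of the Tcl program it is said to execute.

```python
import os


def gui_settings(env=None):
    """Collect GUI manager settings from environment variables.

    `gui_settings` is a hypothetical helper for illustration. Only DODS_ROOT
    is required; the other variables fall back to the defaults described in
    the text.
    """
    env = os.environ if env is None else env
    root = env.get("DODS_ROOT")
    if not root:
        raise KeyError("DODS_ROOT must be set")
    return {
        "root": root,
        # Default GUI program is the wish interpreter.
        "gui_program": env.get("DODS_GUI", "wish"),
        # Assumption: only the exact value "no" turns the GUI manager off.
        "use_gui": env.get("DODS_USE_GUI", "yes") != "no",
        # Tcl program that the default initialization string executes.
        "gui_script": root.rstrip("/") + "/etc/dods_gui.tcl",
        # None here means "use the built-in default initialization string".
        "gui_init": env.get("DODS_GUI_INIT"),
    }


if __name__ == "__main__":
    print(gui_settings({"DODS_ROOT": "/usr/local/DODS", "DODS_USE_GUI": "no"}))
```

Passing a plain dictionary instead of the real environment, as in the last lines, makes it easy to check what a given combination of settings would mean before exporting the variables in a shell.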


The user has substantial control over the GUI manager. You can change the program that listens for GUI commands from wish to anything else, and you can change the action of the GUI commands by editing the Tcl code in the files dods_gui.tcl, error.tcl, and progress.tcl. (These are in the $DODS_ROOT/etc directory.) However, editing these files and variables will not change the form of the messages from the OPeNDAP server and from the core software that are meant to invoke these programs. In other words, the user may modify these files, but must be careful to leave the GUI manager in a form that can still process the messages it receives.

The Error System

The GUI manager is used to display error messages to the user. The messages themselves will vary with the server implementation. Refer to the documentation of the particular server, or consult the server's Information service (see ( opd-server,service)), for a list of the error messages that might be issued by a particular server.

Temporary Files

Using an OPeNDAP client application will create a number of temporary files. They are created with the tmpnam() function, so their names will correspond to the rules for that function on your system. (See the manual page for tmpnam(3), or type man tmpnam for more information.) During normal operation, OPeNDAP deletes the temporary files it creates as it goes. However, if execution of the OPeNDAP client is somehow interrupted, these files may remain and will have to be deleted by hand.

The OPeNDAP Client

There are many different data analysis packages in use. Some packages, such as MATLAB and IDL, are commercially available, but many more are written for a specialized need or application. Many of these use one of the widely available sets of scientific data access functions (called an Application Program Interface, or API) such as NetCDF, JGOFS, or HDF. There is great variety among all these programs, but one feature they share is that they all access data through files containing that data. (This is not true of some APIs, such as JGOFS. That API, however, uses a data dictionary to allow the user to think that the data access is through files.) That is to say that each program begins by identifying a file containing the data the user wishes to examine or analyze.

An OPeNDAP client is simply a data analysis application linked with the OPeNDAP libraries instead of the standard data access API. Using this program, a user can look at files containing data in the same way as was possible without the OPeNDAP libraries. However, by using these libraries, a user can also use a Uniform Resource Locator (URL), instead of a simple file name, to specify data located anywhere on the Internet. File:Intro,fig,unlinked and File:Intro,fig,linked illustrate the operation of an application program linked with a standard data access API, and the same program linked with the OPeNDAP version of that API.

An OPeNDAP client is then a data analysis application program modified to become a web browser, somewhat like any other web browser (such as NCSA Mosaic) with which you may be familiar. A web browser can only display the data it receives, however. What makes an OPeNDAP client different from another web browser is that, unlike Netscape, once the data has been received from an OPeNDAP server, the OPeNDAP client application can compute with it.

Like a web browser, an OPeNDAP client accepts a URL from a user, and parses it to come up with a protocol, an address, and a message. (See ( opd-client,url) for more information about URLs.) The client then sends a message to the address, directed to the server that can service the desired protocol, asking for the information specified in the remainder of the URL. Unlike a typical web browser, an OPeNDAP client will not know what to do with data returned for a web page containing text and pictures, but an OPeNDAP server will return scientific data that an OPeNDAP client can understand and process.
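The split of a URL into protocol, address, and message can be illustrated with Python's standard urllib.parse module. This is a sketch of the idea only, not how the OPeNDAP libraries parse URLs; the function name split_url is hypothetical, and treating the path plus any query string as the "message" is an assumption made for the example.

```python
from urllib.parse import urlsplit


def split_url(url):
    """Split a URL into (protocol, address, message).

    `split_url` is a hypothetical helper. The "message" is taken to be
    everything after the host: the path plus any "?query" part.
    """
    parts = urlsplit(url)
    message = parts.path
    if parts.query:
        message += "?" + parts.query
    return parts.scheme, parts.netloc, message


if __name__ == "__main__":
    protocol, address, message = split_url(
        "http://dods.gso.uri.edu/cgi-bin/nc/data/fnocl.nc"
    )
    print("protocol:", protocol)
    print("address: ", address)
    print("message: ", message)
```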

Here is a simple example, using the ncview program. This program simply prints out the contents of a netCDF formatted data file, specified on the command line, like this:

> ncview fnocl.nc

Using OPeNDAP, this same function may be executed from any computer connected to the Internet by substituting a URL for the filename above:

> dncview http://dods.gso.uri.edu/cgi-bin/nc/data/fnocl.nc


(See File:Opd-client,fig,url-parts.) Aside from the fact that the data is remote, and must be specified with a URL, the program will seem to function in the same way it did with the simple netCDF library (albeit somewhat more slowly, because it must make network connections instead of local file operations). You can find dncview (the ncview program linked with the OPeNDAP library) in the

$DODS_ROOT/src/nc-dods/ncview


directory. Running the above command will produce the following output:

netcdf fnocl {
dimensions:
    time_a = 16 ;
    lat = 17 ;
    lon = 21 ;
    time = 16 ;
variables:
    long u(time_a, lat, lon) ;
        u:units = "meter per second" ;
        u:long_name = "Vector wind eastward component" ;
        u:missing_value = "-32767" ;
        u:scale_factor = "0.005" ;
    long v(time_a, lat, lon) ;
        v:units = "meter per second" ;
        v:long_name = "Vector wind northward component" ;
        v:missing_value = "-32767" ;
        v:scale_factor = "0.005" ;
    double lat(lat) ;
        lat:units = "degree North" ;
    double lon(lon) ;
        lon:units = "degree East" ;
    double time(time) ;
        time:units = "hours from base_time" ;

// global attributes:
        :base_time = "88- 10-00:00:00" ;
        :title = "FNOC UV wind components from 1988- 10 to 1988- 13." ;
data:

 u =
    -1728, -2449, -3099, -3585, -3254, -2406, -1252,
    662, 2483, 2910, 2819, 2946, 2745, 2734,
    2931, 2601, 2139, 1845, 1754, 1897, 1854, -1686,
...
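The u and v values in the listing above are stored as integers together with scale_factor and missing_value attributes. The following sketch shows how such values are commonly turned into physical units. It assumes the usual netCDF packing convention (physical value = stored value × scale_factor, with values equal to missing_value treated as absent); the guide itself does not state this, and the function name unpack is hypothetical.

```python
def unpack(raw, scale_factor="0.005", missing_value="-32767"):
    """Convert packed integers to physical values.

    Assumes the common netCDF convention: value = raw * scale_factor.
    Entries equal to missing_value become None. The attributes arrive as
    strings in the listing, so they are converted to numbers first.
    """
    scale = float(scale_factor)
    missing = int(missing_value)
    return [None if v == missing else v * scale for v in raw]


if __name__ == "__main__":
    # First few u values from the listing, plus one missing value.
    sample = [-1728, -2449, 662, -32767]
    for stored, physical in zip(sample, unpack(sample)):
        shown = "missing" if physical is None else f"{physical:.3f} meter per second"
        print(stored, "->", shown)
```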

Although there are packaged OPeNDAP browsing programs that a user can use to look at data, the user can also construct his or her own. Linking an OPeNDAP API with an already existing program allows a user to create a customized web browser that can access data available from any OPeNDAP server connected to the Internet.

The OPeNDAP APIs are designed to accurately mimic the behavior of several different commonly used scientific data APIs. As of this writing, the OPeNDAP API set includes:


Supported APIs
API || Description || Components
netCDF || Support for gridded data, such as satellite data, interpolated ship station data, or current meter data. || Server and client.
JGOFS || Support for relational data, such as Sequences. Created by the Joint Global Ocean Flux Study (JGOFS) project for use with oceanographic station data. || Server and client.
HDF || Support for gridded data. Commonly used for astronomical data and model data. || Server only.
DSP || Oceanographic and geophysical satellite data. Provides support for image processing. Developed at the University of Miami/RSMAS. Primarily used for AVHRR and CZCS data. || Server only.
GRIB || Support for gridded binary data. GRIB is the World Meteorological Organization (WMO) format for the storage of weather information and the exchange of weather product messages. || Server only, due in early 1999.
BUFR || The WMO's standard set of codes for the transmission and storage of meteorological data, using a compressed format with each data value occupying the least number of bits necessary to contain its range of values. Suitable for meteorological observations made from a single point or set of points. || Server only, due in early 1999.
FreeForm || On-the-fly conversion of arbitrarily formatted data, including relational data and gridded data. May be used for sequence data, satellite data, model data, or any other data format that can be described in the flexible FreeForm format definition language. This server can be used to serve data stored in almost all home-grown data formats. || Server only; no client required.
native OPeNDAP || The OPeNDAP class library may be used directly by a client program. It supports relational data, array data, gridded data, and a flexible assortment of data types that can be combined to accommodate most data models. || Client.


The API set is extensible, meaning that developers can use the OPeNDAP software toolkit to write OPeNDAP-compliant versions of new APIs. See The OPeNDAP Programmer's Guide for more information.

The most important result of this architecture is that, just as the use of the dncview program above is identical to the original ncview, a user can use remote OPeNDAP data and continue to use the same data analysis and display programs with which he or she is familiar. Any program that uses one of the OPeNDAP-supported APIs may be re-linked to use the OPeNDAP version of that API. This creates an OPeNDAP client. That, and a connection to the Internet, is all that a researcher requires to gain access to the available OPeNDAP data.

Configuring Programs to Use OPeNDAP

Relinking an existing program with the OPeNDAP implementation of some data API is a simple procedure. Find the directory that contains the source/object code of the program you want to re-link and modify the makefile (typically called Makefile) for the program so that the OPeNDAP-compliant API library is used in place of the standard API library. (If you can't find the libraries on your system, see the installation appendix, or ask the system administrator.) These libraries are:


libdap++.a
Software common to all of the OPeNDAP-supported APIs.

OPeNDAP also uses facilities from some standard libraries, and these must also be included in the link to resolve all the symbols.

libwww.a
The World Wide Web library. This contains the functions used to communicate between the OPeNDAP client and server.

libexpect.a
Functions from the expect library are used to communicate between OPeNDAP client processes.

libtcl.a
Contains definitions necessary for the expect library. The use of this library in the link is not related to the use of Tcl by OPeNDAP clients.

libstdc++.a
The GNU C++ class library. (This is not necessary if using g++ to re-link.)

You will also need to include the library containing the OPeNDAP-compliant version of the API. The name of this library depends on the API, but it is generally of the form

libAPI-dods.a

where API is an abbreviation indicating the API emulated by the specified library. For example, the OPeNDAP-compliant netCDF library is called libnc-dods.a and the JGOFS version is libjg-dods.a.

An Example Using netCDF

The ncview program is a simple utility that prints the contents of a netCDF-format file to standard output. This section outlines the process used to modify the ncview makefile to link that program with the OPeNDAP netCDF API, thereby turning ncview into a network-ready OPeNDAP client. The process of linking any other program with the corresponding OPeNDAP library is entirely analogous to this one and only requires the substitution of the program name and the appropriate library.

First the link flags were modified so that the library search path would include the likely places to find the OPeNDAP libraries:

LDFLAGS = -g -L$(DODS_ROOT)/lib




DODS_ROOT is an environment variable that indicates the root directory of the OPeNDAP installation, and in this manual is used as shorthand for this directory. It is typically called something like /usr/local/DODS. If you cannot find these directories on your system, consult your system administrator, or refer to the installation appendix for information about acquiring and installing the OPeNDAP software.

After the link flags were modified, the OPeNDAP libraries were added to the list of libraries used. The order in which the libraries are listed is important.

LIBS = -lnc-dods -ldap++ -lnc-dods -ldap++ -lwww -ltcl -lexpect -lz -lrx


Because OPeNDAP is implemented as a core set of classes contained in one library (libdap++.a) and a set of specializations of those classes in a second library (libnc-dods.a), and because there is a circular dependence between those two libraries, they must be included twice in the linker command.

Finally, g++ was substituted for the link command. (It is possible to use gcc instead of g++, but in that case, -lg++ must be added to the end of the library list.)


Potential Problems

When a user links an existing program to the OPeNDAP libraries, there are several possible conditions that may cause problems.


  • Some programs use more than one API.
  • Some programs access data using both API and UNIX system calls.
  • Some programs use undocumented features of the APIs.

If this is the case for a given program, there is generally no good solution besides rewriting the software to conform to a strict usage of the data reading parts of the given API. Of course, if the problem is that the program uses more than one API, you can try linking the program with an OPeNDAP-compliant version of the second API as well.


  • Re-linked programs can be very large.


The OPeNDAP libraries are large, and the g++, www, expect, and tcl libraries on which they are built are even larger. This means that the executable version of a re-linked OPeNDAP client can seem unreasonably large. Much of the disk space is occupied by symbol tables, which can be removed from the executable file with the strip utility. In many cases, a user can recover a substantial amount of disk space this way.


CAUTION: Without familiarity with the OPeNDAP software, it is best to strip only the executable files. Stripping object files or libraries might leave them in a useless condition for the linker. Furthermore, stripping an executable file removes symbol names, which may make diagnosing problems more difficult.

The OPeNDAP libraries only affect the data reading functionality of the specified API. There are no OPeNDAP replacements for functions like netCDF's ncputrec() that write data to a disk file. These functions are included in the OPeNDAP-compliant API library, but they operate in a manner identical to the original (non-OPeNDAP) versions. That is, they work on local files only; attempting to write "over the network" will result in an error.

Writing New OPeNDAP Programs

The OPeNDAP software may also be used to write new programs. This may be done either through one of the OPeNDAP-supported API libraries, such as netCDF or JGOFS, or by using the OPeNDAP data access protocol directly. There are advantages and disadvantages to each approach.


The biggest advantage of writing new code using an OPeNDAP-supported API such as netCDF or JGOFS is that the programmer in question is probably already familiar with the use of that API. Writing an OPeNDAP program using an adapted API is not significantly different from writing the same program with the original API. While writing the new program, it is useful to remember that the data the program uses will often be remote. Data retrieval may therefore not be instantaneous, and implementing a local cache to store requested data might be a good idea. Other than that, the process is the same as writing a program using the regular API.
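The local caching idea mentioned above can be sketched as a thin wrapper that remembers what has already been retrieved. This is a minimal in-memory illustration, not part of any OPeNDAP API: the class name CachingReader is hypothetical, and the fetch argument stands in for whatever call your chosen API uses to read a remote dataset.

```python
class CachingReader:
    """Remember the result of each remote read so it is fetched only once.

    `CachingReader` is a hypothetical class for illustration. `fetch` is any
    callable that takes a URL and returns the data for it.
    """

    def __init__(self, fetch):
        self._fetch = fetch
        self._cache = {}

    def read(self, url):
        # Go to the network only the first time a URL is requested.
        if url not in self._cache:
            self._cache[url] = self._fetch(url)
        return self._cache[url]


if __name__ == "__main__":
    def slow_fetch(url):
        # Stand-in for a real remote read.
        print("fetching", url)
        return "data for " + url

    reader = CachingReader(slow_fetch)
    reader.read("http://dods.gso.uri.edu/cgi-bin/nc/data/fnocl.nc")
    reader.read("http://dods.gso.uri.edu/cgi-bin/nc/data/fnocl.nc")  # served from cache
```

A cache like this never expires its entries, so it only suits datasets that do not change while the program runs.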


It is also possible to use the OPeNDAP data access protocol directly. This is somewhat more involved than using one of the OPeNDAP-compliant API libraries, and C++ is the only language supported for this. However, this approach can provide substantially more efficient programs. For further information about this approach, refer to the technical information about the DAP in The OPeNDAP Programmer's Guide.