|
|
| the data provider. These issues are discussed in detail in
| ([http://www <cite> data,trans</cite>])
|
| |
|
| |
| =The OPeNDAP Server=
| |
|
| |
|
| |
| See the OPeNDAP Server installation guide for most of this information.
| |
|
| |
|
| |
|
| |
|
| |
|
| |
| There are two separate pieces to the task of installing an OPeNDAP data
| |
| server: installing and configuring the server itself, and telling the
| |
| universe of possible users about it. (The second step is, of
| |
| course, optional.) Only the first will be considered in this
| |
| chapter. OPeNDAP provides avenues for doing the second, including a
| |
| Catalog Service indexing OPeNDAP datasets, and cooperation with the
| |
| Global Change Master Directory, but these are still under
| |
| construction.
| |
|
| |
| An OPeNDAP server is nothing more than a World Wide Web server
| |
| (<font color='green'>httpd</font>) equipped with Common Gateway Interface (CGI) programs
| |
| that enable it to respond to requests for data from OPeNDAP client
| |
| programs. Web servers and CGI programs are standard parts of the Web,
| |
| and the details of their operation and installation are beyond the
| |
| scope of this guide. For further information about these, consult one
| |
| of the many World Wide Web references now available. For the purposes
| |
| of understanding the OPeNDAP architecture, a user need only understand
| |
| the following:
| |
|
| |
|
| |
| *A Web server is a process that runs on a computer (the host machine) connected to the Internet. When it receives a URL from some Web client, such as a user somewhere operating Netscape or Mosaic, it packages and returns the data specified by the URL to that client. The data can be text, as in a web page, but it may also be images, sounds, a program to be executed on the client machine, or some other data.
| |
| *A properly specified URL can cause a Web server to invoke a CGI program on its host machine, accepting the input that would have gone to the httpd, and returning the output of that program to the client who sent the URL in the first place. The CGI is executed on the server. The OPeNDAP server relies on this facility.
| |
|
| |
| ==Server Architecture==
| |
|
| |
|
| |
|
| |
| A request for
| |
| data made to the client OPeNDAP library will result in three different
| |
| requests for data to an OPeNDAP server. The user simply enters a single
| |
| URL, as described in ([http://www <cite> opd-client,url</cite>]). The core OPeNDAP
| |
| software then modifies the URL into three slightly different forms,
| |
| and makes three requests for data to the server. The first request is
| |
| for the "shape" of the data, and consists of the dataset descriptor
| |
| structure, described on [http://www.opendap.org/<cite>data,dds</cite>]. The second request is for
| |
| the attributes of the data types described in the DDS. This structure
| |
| is described on [http://www.opendap.org/<cite>data,das</cite>]. The last request is actually for
| |
| the data.
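The three derived requests can be pictured with a short sketch. The suffixes `dds`, `das`, and `dods` are the ones listed for the services later in this chapter; the function name `derive_request_urls` is hypothetical and only illustrates the idea, not the client library's actual code.

```python
def derive_request_urls(dataset_url):
    """Return the three request URLs a client derives from one dataset URL.

    Hypothetical helper: the real client library does this internally.
    """
    suffixes = ("dds", "das", "dods")  # shape, attributes, data
    return {suffix: f"{dataset_url}.{suffix}" for suffix in suffixes}


urls = derive_request_urls("http://dods.gso.uri.edu/cgi-bin/nph-nc/data/fnoc43.nc")
for name, url in urls.items():
    print(name, url)
```

The user only ever supplies the first argument; the suffixed forms never need to be typed by hand.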
| |
|
| |
| The response to the DDS and DAS request URLs is text formatted using
| |
| the grammars in \tableref{data,tab,DAS} and \tableref{data,tab,DDS}.
| |
| This text can then be parsed by the caller to determine the structure
| |
| of the dataset, types and sizes of each of its components and their
| |
| attributes. Depending on the data access API used to access the data,
| |
| these structures may be derived either from information contained in
| |
| the dataset or from ancillary information supplied by the dataset
| |
| maintainers in separate text files, or both. The data in these
| |
| structures (which can be thought of as data about the real data) may
| |
| be cached by the client system.
| |
|
| |
| The OPeNDAP DAP is a stateless protocol. The protocol ''entry points''
| |
| may be thought of as the different messages to which an OPeNDAP server
| |
| will respond. (A message is just a URL specifying a request.) Each of
| |
| the protocol entry points does a single isolated job and they can be
| |
| issued in any order. Of course, it may not make sense to the user to
| |
| ask for the data before asking for the data description structure, but
| |
| that is not the server's problem. This separability allows the user to
| |
| cache the DDS and DAS locally if need be, so that future accesses to the same
| |
| dataset can skip the retrieval of these structures.
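Because each request stands alone, a client is free to remember earlier replies. The following is a minimal sketch of such a cache, assuming a caller-supplied `fetch` function that maps a URL to text; both `MetadataCache` and `fake_fetch` are hypothetical names used only for illustration.

```python
class MetadataCache:
    """Cache DAS/DDS text per URL so repeat accesses skip the network."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable: url -> text (an assumption, injected)
        self._store = {}

    def get(self, url):
        # Only contact the server the first time a URL is requested.
        if url not in self._store:
            self._store[url] = self._fetch(url)
        return self._store[url]


calls = []


def fake_fetch(url):
    # Stand-in for a real HTTP request; records each call it receives.
    calls.append(url)
    return f"response for {url}"


cache = MetadataCache(fake_fetch)
first = cache.get("http://dods.gso.uri.edu/cgi-bin/nph-nc/data/fnoc43.nc.dds")
second = cache.get("http://dods.gso.uri.edu/cgi-bin/nph-nc/data/fnoc43.nc.dds")
print(len(calls))  # the second access was served from the cache
```

A real client would also need some policy for deciding when a cached structure is stale; that is left out here.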
| |
|
| |
| To understand the operation of the OPeNDAP server, it is useful to follow
| |
| the actions taken to reply to a data request. The diagram in
| |
| [[Image:opd-server,fig,server-design]] lays out the relationship
| |
| between the various entities. Consider an OPeNDAP URL such as the
| |
| following:
| |
|
| |
| <pre>
| |
| http://dods.gso.uri.edu/cgi-bin/nph-nc/data/fnoc43.nc
| |
| </pre>
| |
|
| |
| When this URL is submitted to an OPeNDAP client, it will contact the Web
| |
| server (httpd) running on the platform, <font color='green'>dods.gso.uri.edu</font>. When
| |
| the connection has been established, the client will forward to the
| |
| server the remaining parts of the URL:
| |
| <font color='green'>/cgi-bin/nph-nc/data/fnoc43.nc</font>. As the server parses this
| |
| string, it will notice that <font color='green'>cgi-bin</font> corresponds to the name of
| |
| the directory where it keeps its CGI programs. (The actual directory
| |
| name is specific to the particular web server used, and the details of
| |
| its installation. Typically, the web server documentation might call
| |
| it the <font color='green'>ScriptAlias</font> directory, and it might refer to something
| |
| like <font color='green'>/usr/local/etc/httpd/cgi-bin</font>.) It looks in that directory
| |
| to see whether there exists a CGI program called <font color='green'>nph-nc</font>, which
| |
| is the name of the netCDF OPeNDAP server packaged with OPeNDAP. Finally,
| |
| the server executes that program, specifying the rest of the URL
| |
| (<font color='green'>data/fnoc43.nc</font> in this case) as an argument. The standard
| |
| output of the CGI program is redirected to the output of the
| |
| <font color='green'>httpd</font>, so the client will receive the program output as the
| |
| reply to its request.
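The way the URL is taken apart can be sketched as follows. This is an illustration only: the function name `split_opendap_url` is hypothetical, and the `script_alias` default of `cgi-bin` is an assumption, since the real name comes from the web server's configuration.

```python
from urllib.parse import urlsplit


def split_opendap_url(url, script_alias="cgi-bin"):
    """Split a URL into host, CGI program name, and the program's argument.

    `script_alias` is an assumption; a real server reads it from its
    configuration.
    """
    parts = urlsplit(url)
    segments = parts.path.lstrip("/").split("/")
    if len(segments) < 2 or segments[0] != script_alias:
        raise ValueError("URL does not address a CGI program")
    program = segments[1]
    argument = "/".join(segments[2:])
    return parts.netloc, program, argument


result = split_opendap_url("http://dods.gso.uri.edu/cgi-bin/nph-nc/data/fnoc43.nc")
print(result)  # ('dods.gso.uri.edu', 'nph-nc', 'data/fnoc43.nc')
```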
| |
|
| |
| ''Figure (opd-server,fig,server-design; arch.gif): The Architecture of an OPeNDAP Data Server.''
| |
|
| |
| For APIs that are designed to read and write files, such as netCDF,
| |
| the CGI program will be executed with the working directory specified
| |
| by the <font color='green'>httpd</font> configuration. On the <font color='green'>dods.gso.uri.edu</font>
| |
| server, for example, all CGI programs are executed relative to the
| |
| directory <font color='green'>/usr/local/spool/http</font>. The last section of the URL,
| |
| then, specifies the file <font color='green'>fnoc43.nc</font> in the directory:
| |
|
| |
| <pre>
| |
| /usr/local/spool/http/data
| |
| </pre>
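As a small illustration of that resolution, the working directory and the remainder of the URL are simply joined. The helper name `resolve_dataset_path` is hypothetical; the directory and file names are the ones used in the example above.

```python
import posixpath


def resolve_dataset_path(working_dir, url_remainder):
    """Join the CGI working directory with the path taken from the URL."""
    return posixpath.normpath(posixpath.join(working_dir, url_remainder))


path = resolve_dataset_path("/usr/local/spool/http", "data/fnoc43.nc")
print(path)  # /usr/local/spool/http/data/fnoc43.nc
```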
| |
|
| |
| Several existing data APIs, such as JGOFS, are not designed with file
| |
| access as their fundamental paradigm. The JGOFS system, for example,
| |
| uses an arrangement of "dictionaries" that define the location and
| |
| method of access for specified data "objects." A URL addressing a
| |
| JGOFS object may appear to represent a file, like the netCDF URL
| |
| above.
| |
|
| |
| <pre>
| |
| http://dods.gso.uri.edu/cgi-bin/nph-jg/station43
| |
| </pre>
| |
|
| |
|
| |
| However, the identifier (<font color='green'>station43</font>) after the CGI program name
| |
| (<font color='green'>nph-jg</font>) represents, not a file, but an entry in the JGOFS data
| |
| dictionary. The entry will, in turn, identify a file or a database
| |
| index entry (possibly on yet another system) and a method to access
| |
| the data indicated. (The <font color='green'>httpd</font> server must be a valid JGOFS user to
| |
| have access to the dictionary.)
| |
|
| |
| Note that the name and location of the <font color='green'>cgi-bin</font> directory, as
| |
| well as the name and location of the working directory used by the CGI
| |
| programs, are local configuration details of the particular web server
| |
| in use. The location of the JGOFS data dictionary is a configuration
| |
| issue of the JGOFS installation. That is to say, these details will
| |
| probably be different on different machines.
| |
|
| |
| ===Service Programs===
| |
|
| |
|
| |
|
| |
|
| |
| At this point, the request for data, encoded in a URL, has caused the
| |
| <font color='green'>httpd</font> server to execute the CGI program that represents the OPeNDAP
| |
| server. The OPeNDAP server, in turn, executes one of several different
| |
| service programs, and returns the result of that execution to
| |
| the client. Though there may be others available on a given machine,
| |
| five of the services constitute the core functionality of the OPeNDAP
| |
| server:
| |
|
| |
|
| |
| *Data Attribute
| |
| *Data Description
| |
| *Data
| |
| *ASCII Data
| |
| *Information
| |
|
| |
| '''Note:''' There are other important OPeNDAP services. For a description of
| |
| all the OPeNDAP services, see ([http://www <cite> opd-client,services</cite>]).
| |
|
| |
| The OPeNDAP server is structured as a dispatch function, invoking
| |
| ancillary helper programs to provide its services. Installing an OPeNDAP
| |
| server involves making sure that each of the required helper programs
| |
| is available to the server software. Here is a table of the helper
| |
| programs required for each of the OPeNDAP services for the netCDF
| |
| server. For another OPeNDAP server, the names of some of the helper
| |
| programs would have a different root (e.g. <font color='green'>ff_</font> for the FreeForm
| |
| server, <font color='green'>jg_</font> for JGOFS, etc.).
| |
|
| |
|
| |
| {| border="1"
| |+ OPeNDAP Services, with their suffixes and helper programs.
| |-
| ! Service !! Suffix !! Helper Program
| |-
| | Data Attribute || <font color='green'>.das</font> || <font color='green'>nc_das</font>
| |-
| | Data Descriptor || <font color='green'>.dds</font> || <font color='green'>nc_dds</font>
| |-
| | OPeNDAP Data || <font color='green'>.dods</font> || <font color='green'>nc_dods</font>
| |-
| | ASCII Data || <font color='green'>.asc</font> or <font color='green'>.ascii</font> || <font color='green'>asciival</font>
| |-
| | Information || <font color='green'>.info</font> || <font color='green'>usage</font>; see ([http://www <cite> sec,document-data</cite>]) for configuration information.
| |-
| | \ifh || <font color='green'>.html</font> || None
| |-
| | Version || <font color='green'>.ver</font> || None
| |-
| | Help || Anything else || None
| |}
| |
|
| |
|
| |
| The service programs are started by the CGI depending on the extension
| |
| given with the URL. If the URL ends with `.das' then the DAS service
| |
| program is started. Similarly, the extension `.dds' will cause the DDS
| |
| service to run and so on. The CGI program (the "dispatch" script),
| |
| which serves to dispatch the request to one of the service
| |
| programs, can be very simple. In the servers distributed with OPeNDAP,
| |
| the CGI is simply a shell script that takes its own name and catenates
| |
| the enclosed URL suffix. The services, being more complex programs,
| |
| will generally be written in C or C++.
| |
|
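The dispatch step can be sketched in a few lines. This is not the shell script shipped with the servers; it is a hypothetical Python equivalent whose mapping is copied from the table of netCDF helper programs above. The built-in services (`.html`, `.ver`, and the help fallback) need no helper program and are represented here by returning `None`.

```python
# Suffix-to-helper mapping for the netCDF server, taken from the table above.
HELPERS = {
    "das": "nc_das",
    "dds": "nc_dds",
    "dods": "nc_dods",
    "asc": "asciival",
    "ascii": "asciival",
    "info": "usage",
}


def choose_helper(path_info):
    """Return (helper_program, dataset_path) for a request path.

    Hypothetical function: a helper of None means the request is not
    handled by a helper program (for example, the help message).
    """
    dataset, dot, suffix = path_info.rpartition(".")
    helper = HELPERS.get(suffix) if dot else None
    if helper is None:
        return None, path_info
    return helper, dataset


print(choose_helper("data/fnoc43.nc.das"))  # ('nc_das', 'data/fnoc43.nc')
print(choose_helper("data/fnoc43.nc"))      # (None, 'data/fnoc43.nc')
```

For another server the same structure would apply with a different root on the helper names.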
| |
| On the client side, the user may never see the `.das,'
| |
| `.dds,' or `.dods' URL extensions. Nor will the user necessarily be
| |
| aware that each URL given to the OPeNDAP client produces three different
| |
| requests for information. These manipulations happen within the client
| |
| library, and the user need never be aware of them.
| |
|
| |
| There may be more than five service programs for a given server
| |
| implementation. (A couple of services, such as the version and
| |
| help services, are built into the server software, and need no
| |
| configuration.) A server may provide other "services," such as the
| |
| catalog service, or a service specific to a particular data
| |
| implementation. The three data services, however, constitute the
| |
| minimum configuration for a functional server. All three services are
| |
| involved in data requests, as the client program will use the output
| |
| from the <font color='green'>_das</font> and <font color='green'>_dds</font> services to allocate memory and
| |
| define parameters for the output of the <font color='green'>_dods</font> service, which is
| |
| the actual data requested. The remaining two services, the ASCII and
| |
| information services, are primarily intended for interactive use, as
| |
| they make dataset and service information directly available to a
| |
| browser client, such as Netscape.
| |
|
| |
| ==Installing an OPeNDAP Server==
| |
|
| |
|
| |
|
| |
| Most of the task of installing an OPeNDAP server consists of getting the
| |
| required Web server installed and running. The intricacies of that
| |
| job, and the variety of available Web servers, put it beyond
| |
| the scope of this guide. Proceed with the following steps only after
| |
| the Web server itself is operational.
| |
|
| |
|
| |
|
| |
|
| |
| Installing the OPeNDAP CGI programs and the data to be served is a
| |
| relatively simple operation. After
| |
| installing the OPeNDAP source tree and building the software (see
| |
| \appref{install}), the user need only copy the CGI program from the
| |
| <font color='green'>etc</font> directory in the OPeNDAP source tree (<font color='green'>$(DODS_ROOT)/etc</font>)
| |
| to one of the directories where the Web server expects to find its CGI
| |
| programs. The exact name of this directory is an implementation detail
| |
| of the Web server itself.
| |
|
| |
| The service programs used by the CGI are generally kept in the same
| |
| directory as the CGI itself, although this can be changed by modifying
| |
| the OPeNDAP CGI dispatch script.
| |
|
| |
| '''Note:''' The server programs come with release notes and installation
| |
| notes, in files <font color='green'>README</font> and <font color='green'>INSTALL</font>, among others. These
| |
| will be found in the distribution directories for the particular
| |
| server. For example, the documentation for the JGOFS server will be
| |
| found in <font color='green'>$DODS_ROOT/src/http/jg-dods</font>. See [http://www.opendap.org/support/docs.html/api/pguide-html/<cite>The OPeNDAP Programmer's Guide</cite>] for
| |
| additional information about server documentation.
| |
|
| |
| After installing the CGI program and the services, the data to be
| |
| provided must be put in some location where it may be served to
| |
| clients. Again, the location of the data depends on the configuration
| |
| of the Web server and the API used by the CGI services. Most often,
| |
| data that is served by a Web server is kept in the <font color='green'>htdocs</font>
| |
| directory, the exact pathname of which is specified in the
| |
| <font color='green'>httpd</font> configuration file. A server may also be enabled to search
| |
| a user's home directory tree or may follow links from the <font color='green'>htdocs</font>
| |
| directory (if the server is enabled to follow symbolic links). There
| |
| may be yet other options provided by the specific server used in a
| |
| particular installation, so there is really no way to avoid consulting
| |
| the configuration instructions of the Web server.
| |
|
| |
| As noted, the location of the data depends not only on the
| |
| configuration of the Web server, but also on the API used to access
| |
| the data requested. For example, the netCDF server simply
| |
| stores data in a path relative to the working directory of the CGI
| |
| program, <font color='green'>htdocs</font>, while the JGOFS server uses its data
| |
| dictionary to specify the location of its data. Refer to the specific
| |
| installation notes for each API for more information about the
| |
| location of the data.
| |
|
| |
| ===Configuring the Server===
| |
|
| |
|
| |
| The issues of server configuration depend to a large extent on the
| |
| particular server in question. The OPeNDAP server for JGOFS data is
| |
| configured differently than the OPeNDAP server for netCDF data. Each
| |
| server comes with its own installation and configuration instructions.
| |
| These can be found in a file called <font color='green'>INSTALL</font> in the distribution
| |
| directory for the server. The server distribution directories are in
| |
| <font color='green'>$DODS_ROOT/src</font>. Here is a checklist of items that need to be
| |
| attended to in order to install any OPeNDAP server:
| |
|
| |
|
| |
|
| |
| *Is the <font color='green'>httpd</font> server configured to execute CGI programs?
| |
| *Are the main CGI and subsidiary CGI programs installed in the
| |
| server's CGI directory? For the netCDF API, these will be called
| |
| <font color='green'>nph-nc</font>, and <font color='green'>nc_das</font>, <font color='green'>nc_dds</font>, and so on. The
| |
| server CGIs for other APIs will have comparable names.
| |
| *Is the <font color='green'>gzip</font> program installed in the <font color='green'>PATH</font> of the
| |
| <font color='green'>httpd</font> server? This is used to compress data messages returned
| |
| to the client.
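Part of this checklist can be automated. The sketch below is an assumption-laden illustration, not a tool shipped with OPeNDAP: the function name `check_installation` is hypothetical, the default program names are the netCDF ones listed above, and the `gzip` lookup uses the `PATH` of whoever runs the script, which may differ from the `PATH` seen by the <font color='green'>httpd</font> process.

```python
import os
import shutil


def check_installation(cgi_dir, programs=("nph-nc", "nc_das", "nc_dds", "nc_dods")):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    for name in programs:
        path = os.path.join(cgi_dir, name)
        if not os.path.isfile(path):
            problems.append(f"missing CGI program: {path}")
        elif not os.access(path, os.X_OK):
            problems.append(f"not executable: {path}")
    # Checks the current environment's PATH, not necessarily httpd's.
    if shutil.which("gzip") is None:
        problems.append("gzip not found on PATH")
    return problems


for problem in check_installation("/usr/local/etc/httpd/cgi-bin"):
    print(problem)
```

Whether the web server is configured to execute CGI programs at all still has to be confirmed from its own configuration files.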
| |
|
| |
| ===Constructing the URL===
| |
|
| |
| After a dataset has been installed, and the server programs installed,
| |
| you need to know what its address is. ([http://www <cite> opd-client,url</cite>])
| |
| contains an explanation of the various parts of the OPeNDAP URL,
| |
| including a diagram in [[Image:opd-client,fig,url-parts]]. Refer
| |
| to this section, with a copy of the Web server configuration data
| |
| readily available. Using the configuration data, you should be able
| |
| to determine the appropriate URL for the data you are serving.
| |
|
| |
| Remember that the web server will have its own definition of the root
| |
| directory for data, and another definition for CGI programs, depending
| |
| on the configuration.
| |
|
| |
|
| |
| ===Documenting Your Data===
| |
|
| |
|
| |
| OPeNDAP contains provisions for supplying documentation to users about a
| |
| server, and also about the data that server provides. When a server
| |
| receives an information request (through the <font color='green'>info</font> service that
| |
| invokes the <font color='green'>usage</font> program), it
| |
| returns to the client an HTML document created from the DAS and DDS of
| |
| the referenced data. It may also return information about the server,
| |
| and more detail about the dataset.
| |
|
| |
| If you would like to provide more information about a dataset than is
| |
| contained in the DAS and DDS, simply create an HTML document (without
| |
| the <font color='green'><html></font> and <font color='green'><body></font> tags, which are supplied by the
| |
| <font color='green'>info</font> service), and store it in the same directory as the
| |
| dataset, with a name corresponding to the dataset filename. For
| |
| example, the datasets <font color='green'>fnoc1.nc</font>, <font color='green'>fnoc2.nc</font>, and
| |
| <font color='green'>fnoc3.nc</font> might be documented with a file called <font color='green'>fnoc.html</font>.
| |
|
| |
| You may prefer to override this method of creating documentation and
| |
| simply provide a single, complete HTML document that contains general
| |
| information for the server or for a group of datasets. For example,
| |
| to force the info server to return a particular HTML document for all
| |
| its datasets, you would create a complete HTML document and give it
| |
| the name ''dataset''<font color='green'>.ovr</font>, where ''dataset'' is the dataset
| |
| name.
| |
|
| |
| More information about providing user information, including sample
| |
| HTML files, and a complete description of the search procedure for
| |
| finding the dataset documentation, is to be found in [http://www.opendap.org/support/docs.html/api/pguide-html/<cite>The OPeNDAP Programmer's Guide</cite>] .
| |
|
| |
| ===Testing the Installation===
| |
|
| |
|
| |
| It is possible to test the OPeNDAP server to see whether an installation
| |
| has been properly done. The easiest way to test the installation is
| |
| with a simple Web client like Netscape or Mosaic. (A simple Web client
| |
| called <font color='green'>geturl</font> is provided in the OPeNDAP core software which
| |
| can retrieve text from Web servers. Look for it in the
| |
| <font color='green'>$(DODS_ROOT)/etc</font> directory.)
| |
|
| |
| The simplest test is to ask for the version of the server, or
| |
| the help message. The OPeNDAP server uses helper programs to return the
| |
| DAS, DDS, and data. If you want to test the server itself, and not
| |
| the configuration of the helper programs, the version, help, or info
| |
| services will suffice. Issuing a URL with <font color='green'>.ver</font> on the end will
| |
| return the version information for this server, appending <font color='green'>.info</font>
| |
| will return the info message, and issuing a URL with a nonsense suffix
| |
| or <font color='green'>.help</font> will return a help message:
| |
|
| |
|
| |
| <pre>
| |
| > geturl http://dods.gso.uri.edu/cgi-bin/nph-nc/data/test.nc.ver
| |
| > geturl http://dods.gso.uri.edu/cgi-bin/nph-nc/data/test.nc.info
| |
| > geturl http://dods.gso.uri.edu/cgi-bin/nph-nc/data/test.nc.help
| |
| </pre>
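The same checks can be scripted instead of typed one at a time. The following is a sketch only: `service_url` and `fetch_text` are hypothetical helpers built on Python's standard `urllib`, not part of the OPeNDAP software, and the fetch call is left commented out because it needs a live server.

```python
from urllib.request import urlopen


def service_url(dataset_url, suffix):
    """Append a service suffix such as 'ver', 'info', or 'help'."""
    return f"{dataset_url}.{suffix}"


def fetch_text(url, opener=urlopen, timeout=30):
    """Fetch a URL and return the body as text (network access required)."""
    with opener(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


base = "http://dods.gso.uri.edu/cgi-bin/nph-nc/data/test.nc"
test_urls = [service_url(base, suffix) for suffix in ("ver", "info", "help")]
for url in test_urls:
    print(url)
    # print(fetch_text(url))  # uncomment to contact a live server
```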
| |
|
| |
| To return the data attribute structure of a dataset, use a URL such as
| |
| the following:
| |
|
| |
| <pre>
| |
| > geturl http://dods.gso.uri.edu/cgi-bin/nph-nc/data/test.nc.das
| |
| </pre>
| |
|
| |
| The <font color='green'>geturl</font> program knows about the OPeNDAP protocols, so you can also omit the <font color='green'>.das</font> suffix and use the <font color='green'>-a</font> option to the <font color='green'>geturl</font> command. This tells <font color='green'>geturl</font> to append <font color='green'>.das</font> for you.
| |
|
| |
| Refer to ([http://www <cite> data,das</cite>]) for a description of a data attribute
| |
| structure. You can compare the description against what is returned by
| |
| the above URL to test the operation of the OPeNDAP server.
| |
|
| |
| You can use your web client to test the OPeNDAP server by using it to
| |
| submit URLs that address specific services of the server. See
| |
| ([http://www <cite> opd-client,services</cite>]) for information about how to request
| |
| individual services. If any of the services fail, you can check the
| |
| list of helper programs in ([http://www <cite> opd-server,service</cite>]) to find
| |
| out which is missing. From the web browser, you can access all the
| |
| OPeNDAP services, except the (binary) data service. However, if all the
| |
| others work, you can be relatively assured that the data service will, too.
| |
|
| |
| Using the <font color='green'>.html</font> suffix produces the \ifh, providing a
| |
| forms-based interface with which a user can query the dataset using a
| |
| simple web browser. There's more about the \ifh in
| |
| ([http://www <cite> opd-client</cite>]).
| |
|
| |
| ==Displaying Information to the OPeNDAP User==
| |
|
| |
|
| |
|
| |
| OPeNDAP contains a system that
| |
| allows an OPeNDAP server a degree of control over the user's graphical user
| |
| interface (GUI). This system runs the
| |
| progress indicator, which displays to the user the status of a
| |
| pending data request. However, a server may also use the GUI
| |
| to display messages to the user, such as error messages, and even to
| |
| query the user for information.
| |
|
| |
| ===GUI Architecture===
| |
|
| |
|
| |
| Since OPeNDAP is built inside a data access API, and since the
| |
| application program that has become the OPeNDAP client was presumably not
| |
| built with network I/O in mind, an OPeNDAP client will typically not do
| |
| any processing at all while it awaits a return message from a data
| |
| request. Any communication that must happen between the OPeNDAP software
| |
| and the user must occur without the involvement of the application
| |
| program that has invoked the OPeNDAP software. To avoid this limitation,
| |
| OPeNDAP starts up a ''GUI manager'' sub-process. This sub-process can
| |
| receive data from the OPeNDAP core software, and can operate the user's
| |
| graphical user interface.
| |
|
| |
| The operation of the GUI manager is illustrated in
| |
| [[Image:opd-server,fig,gui]]. As seen in the figure, the client
| |
| application can usually control the user's screen, but during a data
| |
| request, this communication is suspended. Until the request returns
| |
| control to the client application, messages returned from the OPeNDAP
| |
| server can be displayed to the user by passing them to the GUI manager
| |
| sub-process, which can display them in a window to the user.
| |
|
| |
| ''Figure (opd-server,fig,gui; wish.gif): The Architecture of an OPeNDAP Client GUI.''
| |
|
| |
| The GUI manager in \OPDversion uses a Tcl/Tk interpreter (the
| |
| <font color='green'>wish</font> program is the default) to interpret messages from the
| |
| server. These messages usually contain Tcl programs to display
| |
| information to the user. However, the <font color='green'>wish</font> interpreter can also
| |
| be sent programs to query the user for more information, or draw
| |
| little rabbits on the screen or any other graphic function the server
| |
| needs to have displayed to the user. See Tcl and the Tk
| |
| Toolkit~\citel{osterhout:tcl} for more information about Tcl.
| |
|
| |
| By default, the GUI manager initializes by running the Tcl programs in
| |
| the files <font color='green'>dods_gui.tcl</font>, <font color='green'>error.tcl</font> and <font color='green'>progress.tcl</font>.
| |
| (These are stored in <font color='green'>$DODS_ROOT/etc</font>.) Server commands to the
| |
| GUI manager can use the functions defined in these files. Note also
| |
| that the user may be using a "safe" Tcl interpreter, with a
| |
| restricted subset of the usual array of Tcl commands available to it.
| |
| The user can control these features of the operation of the GUI by
| |
| changing several environment variables. These are described in
| |
| ([http://www <cite> opd-client,environment</cite>]).
| |
|
| |
|
| |
| A server will use the features of the GUI manager to display error
| |
| messages to the user. A server may also use the GUI to query a user to
| |
| correct whatever condition caused the error. For example, if a user has
| |
| misspelled some part of a constraint expression in a URL submitted to a
| |
| server, the server can send the constraint expression back to the user in
| |
| an edit window, with instructions to fix it. The user can edit the
| |
| expression, and send it back, allowing the server to proceed without
| |
| submitting a new request. Consult the client and server toolkit manual
| |
| for more information about the <font color='green'>Error</font> object.
| |
|
| |
| ==Building OPeNDAP Data Servers==
| |
|
| |
|
| |
| Though servers are included in the OPeNDAP core software, some
| |
| users may wish to write their own OPeNDAP data servers. The architecture
| |
| of the <font color='green'>httpd</font> server and the OPeNDAP core software makes this a
| |
| relatively simple task.
| |
|
| |
| A user may wish to write his or her own OPeNDAP server for any or all of
| |
| the following reasons:
| |
|
| |
|
| |
| *The data to be served may be stored in a format not compatible
| |
| with one of the existing OPeNDAP servers.
| |
| *The data may be arranged in a fashion that allows a user to
| |
| optimize the access of those data by rewriting the service programs.
| |
| *The user may wish to provide ancillary data to OPeNDAP clients not
| |
| anticipated by the writers of the servers available.
| |
|
| |
| The design of the OPeNDAP library makes the task a relatively simple one
| |
| for a programmer already familiar with the data access API to be used.
| |
| Also, though the servers provided with the OPeNDAP core software are
| |
| written in C++, a new server may be written in any language from which the
| |
| OPeNDAP libraries may be called.
| |
|
| |
| Once it is invoked, a CGI program scoops up whatever input is going to
| |
| the standard input stream of the Web server (<font color='green'>httpd</font>) that invoked it.
| |
| Further, the standard output of the CGI is piped directly to the WWW
| |
| library, which sends it directly back to the requesting client. This means
| |
| that the CGI program itself need only read its input from standard input
| |
| and write its output to standard output.
| |
|
| |
| Most of the task of writing a server, then, consists of reading the data
| |
| with the data access API and loading it into the OPeNDAP classes. Method
| |
| functions defined for each class make it simple to output the data so that
| |
| it may be sent back to the requesting client.
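To make the "read, then write to standard output" shape concrete, here is a deliberately simplified sketch. It does not use the OPeNDAP classes or emit any HTTP headers; the function `write_dds` is hypothetical and only prints a flat description in the C-like syntax used for data descriptions in this guide.

```python
import io
import sys


def write_dds(variables, dataset_name, out=sys.stdout):
    """Write a flat DDS-style description to `out`.

    `variables` is a list of (type_name, variable_name) pairs. This is an
    illustration only; a real server would use the OPeNDAP classes.
    """
    out.write("Dataset {\n")
    for type_name, var_name in variables:
        out.write(f"    {type_name} {var_name};\n")
    out.write(f"}} {dataset_name};\n")


# A CGI service would write to standard output; a buffer is used here
# so the result can be inspected.
buffer = io.StringIO()
write_dds([("Float64", "lat"), ("Float64", "lon")], "xbt-station", out=buffer)
print(buffer.getvalue(), end="")
```

Leaving `out` at its default sends the same text to standard output, which is where the web server collects a CGI program's reply.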
| |
|
| |
| Refer to [http://www.opendap.org/support/docs.html/api/pguide-html/<cite>The OPeNDAP Programmer's Guide</cite>] for specific information about the classes and the
| |
| facilities of the OPeNDAP core software, and instructions about how to
| |
| write a new server.
| |
|
| |
|
| |
| =Data and Data Models=
| |
|
| |
|
| |
| Basic to the operation of OPeNDAP is the translation of data from one
| |
| format to another. An OPeNDAP server must read data on some disk and
| |
| translate it into an intermediate format for transmission to the
| |
| client. It is to the question of these formats that we shall turn
| |
| first.
| |
|
| |
| ==Data models==
| |
|
| |
|
| |
|
| |
| Any data set is made up of data and a ''data model''. The data model
| |
| defines the size and arrangement of data values, and may be thought of
| |
| as an abstract representation of the relationship between one data
| |
| value and another. Though it may seem paradoxical, it is precisely
| |
| this relationship that defines the meaning of some number. Without the
| |
| context provided by a data model, a number does not represent
| |
| anything. For example, within some data set, it may be apparent that a
| |
| number represents the value of temperature at some point in space and
| |
| time. Without its neighboring temperature measurements, and without
| |
| the latitude, longitude, depth, and time, the same number means
| |
| nothing.
| |
|
| |
| As the model only defines an abstract set of relationships, two data
| |
| sets containing different data may share the same data model. For
| |
| example, the data produced by two different measurements with the same
| |
| instrument will use the same data model, though the values of the data
| |
| are different. Sometimes two models may be equivalent. For example,
| |
| an XBT measures a time series of temperature, but is usually stored as
| |
| a series of temperature and depth measurements. The temperature vs.
| |
| time model of the original data is equivalent to the temperature vs.
| |
| depth model of the stored data.
| |
|
| |
| In a computational sense, a data model may be considered to be the
| |
| data type or collection of data types used to represent that data. A
| |
| temperature measurement might occur as half an entry in a sequence of
| |
| temperature and depth pairs. However, the data model also includes the
| |
| scalar latitude, longitude and date that identify the time and place
| |
| where the temperature measurements were taken. Thus the data set might
| |
| be represented in a C-like syntax like this ([[Image:fig,data,XBT-DDS]]):
| |
|
| |
| <pre>
| Dataset {
|     Float64 lat;
|     Float64 lon;
|     Int32 minutes;
|     Int32 day;
|     Int32 year;
|     Sequence {
|         Float64 depth;
|         Float64 temperature;
|     } cast;
| } xbt-station;
| </pre>
| ''Figure: Example Data Description of XBT Station''
| |
|
| |
| In the above example, a data set is described that contains all the
| |
| data from a single XBT. The data set is called <font color='green'>xbt-station</font>, and
| |
| contains floating-point representations of the latitude and longitude
| |
| of the station, and three integers that specify when the XBT was
| |
| released. The <font color='green'>xbt-station</font> contains a single sequence (called
| |
| <font color='green'>cast</font>) of measurements, which are here represented as values for
| |
| depth and temperature. (In the remainder of this document, the
| |
| phrase "sequence data", or just "sequence", will mean an
| |
| ordered set of elements, each of which contains one or more
| |
| sub-elements, where all of the sub-elements of an element are
| |
| somehow related to each other.)
| |
|
| |
| A different data model representing the same data might look like
| |
| this ([[Image:fig,data,XBT-DDS-struct]]):
| |
|
| |
| <pre>
| Dataset {
|     Structure {
|         Float64 lat;
|         Float64 lon;
|     } location;
|     Structure {
|         Int32 minutes;
|         Int32 day;
|         Int32 year;
|     } time;
|     Sequence {
|         Float64 depth;
|         Float64 temperature;
|     } cast;
| } xbt-station;
| </pre>
| ''Figure: Example Data Description of XBT Station Using Structures''
| |
|
| |
| In this example, several of the data have been grouped, implying a
| |
| relation between them. The nature of the relationship is not defined,
| |
| but it is clear that <font color='green'>lat</font> and <font color='green'>lon</font> are both components of
| |
| <font color='green'>location</font>, and that each measurement in the <font color='green'>cast</font>
| |
| sequence is made up of depth and temperature values.
| |
|
| |
|
| |
| In these two examples, meaning was added to the data set only by
| |
| providing a more refined context for the data values. No other data
| |
| was added, but still the second example can be said to contain more
| |
| information than the first one.
| |
|
| |
| These two examples are refinements of the same basic arrangement of
| |
| data. However, there is nothing that says that a completely different
| |
| data model can't be just as useful or just as accurate. For example,
| |
| the depth and temperature data, instead of being represented by a
| |
| sequence of pairs, as in [[Image:fig,data,XBT-DDS]] and [[Image:fig,data,XBT-DDS-struct]], could be represented by a pair of
| |
| sequences or arrays, as in [[Image:fig,data,XBT-DDS-array]]:
| |
|
| |
| <pre>
| Dataset {
|     Structure {
|         Float64 lat;
|         Float64 lon;
|     } location;
|     Structure {
|         Int32 minutes;
|         Int32 day;
|         Int32 year;
|     } time;
|     Float64 depth[500];
|     Float64 temperature[500];
| } xbt-station;
| </pre>
| ''Figure: Example Data Description of XBT Station Using Arrays''
| |
|
| |
| The relationship between the depth and temperature variables
| |
| is no longer clear, but, depending on what sort of processing is
| |
| intended, this may not be that important a loss.
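
The difference between the sequence-of-pairs model and the pair-of-arrays model can be illustrated with a short Python sketch. Nothing here is OPeNDAP code: the function names and the sample values are hypothetical. The sketch only shows that the same numbers survive the conversion, while the explicit pairing of each depth with its temperature becomes a convention (matching positions in two lists) that the reader of the data has to know about.

```python
# Illustrative only: two in-memory layouts for the same XBT cast.
# The sample values are made up for this example.


def to_arrays(cast):
    """Convert a sequence of (depth, temperature) pairs to two parallel lists."""
    depth = [d for d, _ in cast]
    temperature = [t for _, t in cast]
    return depth, temperature


def to_sequence(depth, temperature):
    """Rebuild the pairs, assuming index i of each list belongs together."""
    if len(depth) != len(temperature):
        raise ValueError("depth and temperature must have the same length")
    return list(zip(depth, temperature))


cast = [(0.0, 70.0), (10.0, 46.0), (20.0, 34.0)]
depth, temperature = to_arrays(cast)
```

The round trip through `to_sequence` only works because the caller supplies the assumption that the two lists line up, which is exactly the relationship the array model no longer states.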
| |
|
| |
| The choice of a computational data model to contain some data set
| |
| depends in many cases on the whims and preferences of the user, as
| |
| well as on the data analysis software to be used. Several different
| |
| data models may be equally useful for a given task. Of course, some
| |
| data models will contain more information about the data than others,
| |
| but this information can also be carried in a scientist's head.
| |
|
| |
| Note that with a carefully chosen set of data type constructors,
| |
| such as those we've used in the preceding examples, a user can
| |
| implement an infinite number of data models. The examples above use
| |
| the OPeNDAP Dataset Descriptor Structure (DDS) format, which will
| |
| become important in later discussions of the details of the OPeNDAP Data
| |
| Access Protocol. The precise details of the DDS syntax are described
| |
| in ([http://www <cite> data,dds</cite>]).
| |
|
| |
| ===Data Models and APIs===
| |
|
| |
|
| |
|
| |
| A data access Application Program Interface (API) is a library
| |
| of functions designed to be used by a computer program to read, write,
| |
| and sample data. Any given data access API can be said to define
| |
| implicitly some data model. That is, the functions that make up the
| |
| API accept and return data using a certain collection of computational
| |
| data types: multi-dimensional arrays might be required for some data,
| |
| scalars for others, lists for others. This collection of data types,
| |
| and their use constitute the data model represented by that API. (Or
| |
| data models---there is no reason an API cannot accommodate several
| |
| different models.)
| |
|
| |
| Among others, OPeNDAP currently supports two very different data access
| |
| APIs: netCDF and JGOFS. The netCDF API is designed for access to
| |
| gridded data, but has some limited capacity to access sequence data.
| |
| The JGOFS API provides access to relational or sequence data. Both
| |
| APIs support access in several programming languages (at least C and
| |
| Fortran) and both provide extensive support for limiting the amount of
| |
| data retrieved. For example a program accessing a gridded dataset
| |
| using netCDF can extract a subsampled portion or ''hyperslab'' of
| |
| that data. Likewise, the JGOFS API provides a powerful set of
| |
| operators which can be used to specify which sequence elements to
| |
| extract (for example, a user could request only those values
| |
| corresponding to data captured between 12:01am and 11:59am) as well as
| |
| masking certain parameters from the returned elements so that only
| |
| those parameters needed by the program are returned.
| |
|
| |
|
| |
| ===Translating Data Models===
| |
|
| |
|
| |
|
| |
| The problem of data model translation is central to the implementation
| |
| of OPeNDAP. With an effective data translator, an OPeNDAP program originally
| |
| designed to read netCDF data can have some access to data sets that
| |
| use an incompatible data model, such as JGOFS.
| |
|
| |
| In general, it is not possible to define an algorithm that will
| |
| translate data from any model to any other, without losing information
| |
| defined by the position of data values or the relations between them.
| |
| Some of these incompatibilities are obvious; a data model designed for
| |
| time series data may not be able to accommodate multi-dimensional
| |
| arrays. Others are more subtle. For example, a sequence looks very
| |
| similar to a collection of lists in many respects. However, a
| |
| sequence is an ''ordered'' collection of data types, whereas a list
| |
| implies no order. Nevertheless, there are many useful translations that
| |
| can be done, and there are many others that are still useful despite
| |
| their inherent information loss.
| |
|
| |
| For example, consider a relational structure like the one in
| |
| [[Image:fig,data,XBT-DDS-ex]]. This is similar to the examples in
| |
| ([http://www <cite> data,model</cite>]), rearranged to accommodate an entire cruise
| |
| worth of temperature-depth measurements. This is the sort of data type
| |
| that the JGOFS API is designed to use.
| |
|
| |
| <pre>
| Dataset {
|     Sequence {
|         Int32 id;
|         Float64 latitude;
|         Float64 longitude;
|         Sequence {
|             Float64 depth;
|             Float64 temperature;
|         } xbt_drop;
|     } station;
| } cruise;
| </pre>
| ''Figure: Example Data Description of XBT Cruise''
| |
|
| |
| Note that each entry in the <font color='green'>cruise</font> sequence is composed of a
| |
| tuple of data values (one of which is itself a sequence). Were we to
| |
| arrange these data values as a table, they might look like this:
| |
|
| |
| <pre>
| id   lat    lon    depth  temp
| 1    10.8   60.8   0      70
|                    10     46
|                    20     34
| 2    11.2   61.0   0      71
|                    10     45
|                    20     34
| 3    11.6   61.2   0      69
|                    10     47
|                    20     34
| </pre>
| |
|
| |
| This can be made into an array, although that introduces redundancy.
| |
|
| |
| <pre>
| id   lat    lon    depth  temp
| 1    10.8   60.8   0      70
| 1    10.8   60.8   10     46
| 1    10.8   60.8   20     34
| 2    11.2   61.0   0      71
| 2    11.2   61.0   10     45
| 2    11.2   61.0   20     34
| 3    11.6   61.2   0      69
| 3    11.6   61.2   10     47
| 3    11.6   61.2   20     34
| </pre>
| |
|
| |
| The data is now in a form that may be read by an API such as
| |
| netCDF. But consider the analysis stage. Suppose a user wants to see
| |
| graphs of station data. It is not obvious simply from the arrangement
| |
| of the array where a station stops and the next one begins. Analyzing
| |
| data in this format is not a function likely to be accommodated by a
| |
| program that uses the netCDF API.
| |
| ''(This section will be finished when the form of the translation specification is determined.)''
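
The flattening step described above can be sketched in Python. This is an illustration under stated assumptions, not the OPeNDAP translator: the nested cruise is held as ordinary Python lists and dictionaries, and the helper names `flatten` and `station_boundaries` are hypothetical. The values are the ones shown in the tables above.

```python
# Illustrative only: the nested "cruise" from the tables above as plain Python data.
cruise = [
    {"id": 1, "latitude": 10.8, "longitude": 60.8,
     "xbt_drop": [(0, 70), (10, 46), (20, 34)]},
    {"id": 2, "latitude": 11.2, "longitude": 61.0,
     "xbt_drop": [(0, 71), (10, 45), (20, 34)]},
    {"id": 3, "latitude": 11.6, "longitude": 61.2,
     "xbt_drop": [(0, 69), (10, 47), (20, 34)]},
]


def flatten(cruise):
    """Repeat the station fields on every (depth, temperature) row."""
    rows = []
    for station in cruise:
        for depth, temp in station["xbt_drop"]:
            rows.append((station["id"], station["latitude"],
                         station["longitude"], depth, temp))
    return rows


def station_boundaries(rows):
    """Recover where each station starts by watching the id column change."""
    starts = []
    previous_id = None
    for index, row in enumerate(rows):
        if row[0] != previous_id:
            starts.append(index)
            previous_id = row[0]
    return starts


rows = flatten(cruise)
```

The second helper shows the cost of the translation: the station grouping that the nested sequence stated directly now has to be rediscovered by scanning a column.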
| |
|
| |
| ==Data Access Protocol==
| |
|
| |
|
| |
|
| |
|
| |
|
| |
| The OPeNDAP Data Access Protocol (DAP) defines how an OPeNDAP client
| |
| and an OPeNDAP server communicate with one another to pass data from the
| |
| server to the client. The job of the functions in the OPeNDAP client
| |
| library is to translate data from the DAP into the form expected by
| |
| the data access API for which the OPeNDAP library is substituting. The
| |
| job of an OPeNDAP server is to translate data stored on a disk in whatever
| |
| format they happen to be stored in to the DAP for transmission to the
| |
| client.
| |
|
| |
| The DAP consists of four components:
| |
|
| |
|
| |
| #An "intermediate data representation" for data sets. This is
| |
| used to transport data from the remote source to the client. The
| |
| data types that make up this representation may be thought of as the
| |
| OPeNDAP data model.
| |
| #A format for the "ancillary data" needed to translate
| |
| a data set into the intermediate representation, and to translate
| |
| the intermediate representation into the target data model. The
| |
| ancillary data in turn consists of two pieces:
| |
|
| |
|
| |
| #*A description of the shape and size of the various data types
| |
| stored in some given data set. This is called the
| |
| ''Dataset Descriptor Structure'' (DDS).
| |
|
| |
| #*Capsule descriptions of some of the properties of the data
| |
| stored in some given data set. This is the
| |
| ''Data Attribute Structure'' (DAS).
| |
|
| |
| #:
| |
| #A "procedure" for retrieving data and ancillary data from
| |
| remote platforms.
| |
| #An "API" consisting of OPeNDAP classes and data access
| |
| calls designed to implement the protocol.
| |
|
| |
| The intermediate data representation and the ancillary data formats
| |
| are introduced in ([http://www <cite> data,types</cite>]) and
| |
| ([http://www <cite> data,ancillary</cite>]), below. The steps of the procedure are
| |
| outlined in ([http://www <cite> opd-server,arch</cite>]), and the OPeNDAP core software
| |
| is described in [http://www.opendap.org/support/docs.html/api/pguide-html/<cite>The OPeNDAP Programmer's Guide</cite>].
| |
|
| |
| ==Data representation==
| |
|
| |
|
| |
| There are many popular data storage formats, and many more than that in
| |
| use. These formats are optimized (if they are optimized at all) for data
| |
| storage, and are not generally suitable for data transmission. In order to
| |
| transmit data over the Internet, OPeNDAP must translate the data model used
| |
| by a particular storage format into the data model used for transmission.
| |
|
| |
| If the data model for transmission is defined to be general enough to encompass
| |
| the abstractions of several data models for storage, then this intermediate
| |
| representation--the transmission format--can be used to translate between one data
| |
| model and another.
| |
|
| |
| The OPeNDAP data model consists of a fairly elementary set of base types, combined
| |
| with an advanced set of constructs and operators that allows it to
| |
| define data types of arbitrary complexity. This way, the OPeNDAP data access
| |
| protocol can be used to transmit data from virtually any data storage format.
| |
|
| |
| The elements of the OPeNDAP data access protocol are:
| |
|
| |
|
| |
| *'''Base Types''' These are the simple data types, like integers, floating point numbers, strings, and character data.
| |
| *'''Constructor Types''' These are the more complex data types that can be constructed from the simple base types. Examples are structures, sequences, arrays, and grids.
| |
| *'''Operators''' Access to data can be operationally defined with operators defined on the various data types.
| |
| *'''External Data Representation''' In order to transmit the data across the Internet, there needs to be a machine-independent definition of what the various data types look like. For example, the client and server need to agree on which digit of a particular byte in the message is the most significant.
| |
|
| |
| These elements are defined in greater detail in the sections that follow.
| |
|
| |
| ===Base Types===
| |
|
| |
|
| |
| The OPeNDAP data model uses the concepts of variables and operators. Each
| |
| data set is defined by a set of one or more variables, and each
| |
| variable is defined by a set of attributes. A variable's
| |
| ''attributes'' (such as units, name, and type) must not be
| |
| confused with the data ''value'' (or values) that may be
| |
| represented by that variable. A variable called <font color='green'>time</font> may contain
| |
| an integer number of minutes, but it does not contain a particular
| |
| number of minutes until a context, such as a specific event recorded
| |
| in a data set, is provided. Each variable may further be the object of
| |
| an operator that defines a subset of the available data set. This is
| |
| detailed in ([http://www <cite> data,operators</cite>]).
| |
|
| |
| Variables in the OPeNDAP DAP have two forms. They are either base types
| |
| or type constructors. Base type variables are similar to predefined
| |
| variables in procedural programming languages like C or Fortran (such
| |
| as <font color='green'>int</font> or <font color='green'>integer*4</font>). While these certainly have an
| |
| internal structure, it is not possible to access parts of that
| |
| structure using the DAP. Base type variables in the DAP have two
| |
| predefined attributes (or characteristics): name and type.
| |
| ''(To do: "units" should also be a predefined attribute.)'' They are defined as follows:
| |
|
| |
|
| |
|
| |
| <blockquote>
| |
|
| |
|
| |
| ; Name : A unique identifier that can be used to reference the part of
| |
|
| |
| the dataset associated with this variable.
| |
|
| |
|
| |
| ; Type : The data type contained by the variable. This can be one
| |
| of <font color='green'>Byte</font>, <font color='green'>Int32</font>, <font color='green'>UInt32</font>, <font color='green'>Float64</font>,
| |
|
| |
| <font color='green'>String</font>, and
| |
| <font color='green'>URL</font>, where:
| |
|
| |
| <blockquote>
| |
|
| |
|
| |
| ; <font color='green'>Byte</font> : is a single byte of data. This is the same as
| |
| <font color='green'>unsigned char</font> in ANSI C.
| |
|
| |
|
| |
| ; <font color='green'>Int32</font> : is a 32-bit two's complement integer. It
| |
| is synonymous with <font color='green'>long</font> in ANSI C when that type is implemented as 32
| |
| bits.
| |
|
| |
|
| |
| ; <font color='green'>UInt32</font> : is a 32 bit unsigned integer.
| |
|
| |
|
| |
| ; <font color='green'>Float64</font> : is the IEEE 64 bit floating point data type.
| |
|
| |
|
| |
| ; <font color='green'>String</font> : is a sequence of bytes terminated by a null
| |
| character.
| |
|
| |
| ; <font color='green'>Url</font> : is a string containing an OPeNDAP URL. Please refer to
| |
| ([http://www <cite> opd-client,url</cite>]) for more information about these
| |
| strings. A special <font color='green'>*</font> operator is defined for a URL. If the
| |
| variable <font color='green'>my-url</font> is defined as a URL data type, then
| |
| <font color='green'>my-url</font> indicates the string spelling out the URL, and <font color='green'>*my-url</font> indicates the data specified by the URL.
| |
|
| |
| </blockquote>
| |
|
| |
| </blockquote>
| |
|
| |
| The declaration in a DDS of a variable of any of the base types is simply the type
| |
| of the variable, followed by its name, and a semicolon. For example, to declare
| |
| a <font color='green'>month</font> variable to be a 32-bit integer, one would type:
| |
|
| |
| <pre>
| |
| Int32 month;
| |
| </pre>
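
The "type, name, semicolon" rule is simple enough to check mechanically. The following Python sketch is a toy reader for a single base-type declaration, written only to make the rule concrete. It is not the OPeNDAP DDS parser; the function name `parse_base_declaration` and the regular expression are assumptions of this example, and the type list is copied from the text above (which spells the last type both `URL` and `Url`, so both are accepted here).

```python
import re

# Base types named in the text above; both spellings of the URL type are accepted.
BASE_TYPES = ("Byte", "Int32", "UInt32", "Float64", "String", "URL", "Url")

# A type word, whitespace, a name (letters, digits, "_" or "-"), then a semicolon.
_DECLARATION = re.compile(r"^\s*(\w+)\s+([\w-]+)\s*;\s*$")


def parse_base_declaration(line):
    """Return (type, name) for a line such as 'Int32 month;'."""
    match = _DECLARATION.match(line)
    if match is None:
        raise ValueError(f"not a base-type declaration: {line!r}")
    type_name, variable_name = match.groups()
    if type_name not in BASE_TYPES:
        raise ValueError(f"unknown base type: {type_name}")
    return type_name, variable_name
```

For example, `parse_base_declaration("Int32 month;")` yields the pair `("Int32", "month")`, while a line that names a type outside the list raises `ValueError`.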
| |
|
| |
|
| |
| ===Constructor Types===
| |
|
| |
|
| |
|
| |
|
| |
| Constructor types, such as arrays, structures, and lists, describe the
| |
| grouping of one or more variables within a dataset. These classes are
| |
| used to describe different types of relations between the variables
| |
| that comprise the dataset. For example, an array might indicate that
| |
| the variables grouped are all measurements of the same quantity with
| |
| some spatial relation to one another, whereas a structure might
| |
| indicate a grouping of measurements of disparate quantities that
| |
| happened at the same place and time.
| |
|
| |
| There are six classes of type constructor variables defined by the
| |
| OPeNDAP DAP: lists, arrays, structures, sequences, functions, and grids.
| |
| The types are defined as:
| |
|
| |
| <blockquote>
| |
|
| |
|
| |
|
| |
| ; <font color='green'>List</font> : The list type constructor is used to hold
| |
| lists of 0 or more items of one type. Lists are specified using the
| |
| keyword <font color='green'>list</font> before the variable's class, for example,
| |
| <font color='green'>list int32</font> or <font color='green'>list grid</font>. Access to the elements of a
| |
| list is possible using one of the three operators shown in
| |
| the table "Classes and operators in the DAP":
| |
|
| |
|
| |
| <blockquote>
| |
|
| |
|
| |
| ; ''list''.<font color='green'>length</font> : Returns the integer length of the ''list''.
| |
| ; ''list''.<font color='green'>nth(''n'')</font> : Returns the ''n''th member of the ''list''.
| |
| ; ''list''.<font color='green'>member(''value'')</font> : Returns <font color='green'>true</font> if the ''value'' is a member of the ''list''.
| |
|
| |
|
| |
| </blockquote>
| |
|
| |
|
| |
| '''Note:''' The syntax of these operators differs between their use in a
| |
| C++ program and a constraint expression. The length of some list,
| |
| given by <font color='green'>list.length()</font> in a program, would be
| |
| <font color='green'>length(list)</font> in a constraint expression. Similarly, in a
| |
| constraint expression, the ''n''th member of a list is given
| |
| by <font color='green'>nth(list, n)</font>, and the presence of a value is
| |
| indicated by <font color='green'>member(list, value)</font>. See
| |
| ([http://www <cite> opd-client,constraint</cite>]) for more information about
| |
| constraint expressions.
| |
|
| |
|
| |
| A list declaration to create a list of integers would look like the
| |
|
| |
| following:
| |
|
| |
| <pre>
| |
| List Int32 months;
| |
| </pre>
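
The three list operators behave as their names suggest. The Python sketch below mirrors the constraint-expression spellings `length(list)`, `nth(list, n)` and `member(list, value)`. It is an illustration only, not OPeNDAP code, and it assumes that `nth` counts from zero, as Common Lisp's `nth` does.

```python
# Illustrative stand-ins for the three list operators.


def length(items):
    """Number of members in the list."""
    return len(items)


def nth(items, n):
    """The n-th member; counting from zero is an assumption of this sketch."""
    return items[n]


def member(items, value):
    """True if value occurs in the list."""
    return value in items


months = [1, 4, 7, 10]  # hypothetical contents of "List Int32 months;"
```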
| |
|
| |
|
| |
| ; <font color='green'>Array</font> : An array is a one dimensional indexed data
| |
| structure as defined by ANSI C. Multidimensional arrays are
| |
| defined as arrays of arrays. An array may be subsampled using
| |
| subscripts or ranges of subscripts enclosed in brackets (<font color='green'>[]</font>).
| |
| For example, <font color='green'>temp[3][4]</font> would indicate the value in the fourth
| |
| row and fifth column of the <font color='green'>temp</font> array. (As in C, OPeNDAP
| |
| array indices start at zero.) A chunk of an array may be specified
| |
| with subscript ranges; the array <font color='green'>temp[2:10][3:4]</font> indicates an
| |
| array of nine rows and two columns whose values have been lifted
| |
| intact from the larger <font color='green'>temp</font> array.
| |
|
| |
|
| |
|
| |
| A ''hyperslab'' may be selected from an
| |
| array with a ''stride'' value. The array
| |
| represented by <font color='green'>temp[2:2:10][3:4]</font> would have only five rows;
| |
| the middle value in the first subscript range indicates that the
| |
| output array values are to be selected from alternate input array
| |
| rows. The array <font color='green'>temp[2:3:10][3:4]</font> would select from every
| |
| third row, and so on. The table "Classes and operators in the DAP" shows the syntax
| |
| for array accesses including hyperslabs.
| |
|
| |
|
| |
| To declare a <math>5 \times 6</math> array of floating point numbers, the declaration
| |
| would look like the following:
| |
|
| |
| <pre>
| |
| Float64 data[5][6];
| |
| </pre>
| |
|
| |
| In addition to its magnitude, every dimension of an array may also
| |
| have a name. The previous declaration could be written:
| |
|
| |
| <pre>
| |
| Float64 data[height = 5][width = 6];
| |
| </pre>
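
The subscript rules for arrays (zero-based indices, an inclusive stop value, and an optional stride written between start and stop) can be checked with a small Python sketch. The helper names `hyperslab_indices` and `subsample` are hypothetical, and the array is built so that each value encodes its own row and column. Note that the helpers take their arguments in the order (start, stop, stride) for convenience, whereas the DAP notation is `[start:stride:stop]`.

```python
# Illustrative only: which indices a DAP-style subscript range selects.


def hyperslab_indices(start, stop, stride=1):
    """Indices chosen by [start:stride:stop]; stop is inclusive, counting from zero."""
    if stride < 1 or start < 0 or stop < start:
        raise ValueError("need stride >= 1 and 0 <= start <= stop")
    return list(range(start, stop + 1, stride))


def subsample(array, rows, cols):
    """Apply one (start, stop, stride) triple per dimension to a list of lists."""
    row_idx = hyperslab_indices(*rows)
    col_idx = hyperslab_indices(*cols)
    return [[array[r][c] for c in col_idx] for r in row_idx]


# A hypothetical 11 x 5 array whose value at [r][c] is 10 * r + c.
temp = [[10 * r + c for c in range(5)] for r in range(11)]

# Equivalent of temp[2:2:10][3:4]: rows 2, 4, 6, 8, 10 and columns 3, 4.
every_other_row = subsample(temp, (2, 10, 2), (3, 4, 1))
```

Unlike a Python slice, the stop index is part of the selection, which is why the helper uses `stop + 1` when building the range.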
| |
|
| |
|
| |
| ; <font color='green'>Structure</font> : A Structure is a class that may contain
| |
| several variables of different classes. However, though it implies
| |
| that its member variables are related somehow, it conveys no
| |
| relational information about them. The structure type can also be
| |
| used to group a set of unrelated variables together into a single
| |
| dataset. The <font color='green'>dataset</font> class name is a synonym for
| |
| <font color='green'>structure</font>.
| |
|
| |
| A Structure declaration containing some data and the month in which
| |
| the data was taken might look like this:
| |
|
| |
| <pre>
| |
| Structure {
| |
| Int32 month;
| |
| Float64 data[5][6];
| |
| } measurement;
| |
| </pre>
| |
|
| |
| Use the <font color='green'>.</font> operator to refer to members of a Structure. For
| |
| example, <font color='green'>measurement.month</font> would identify the integer member of
| |
| the Structure defined in the above declaration.
| |
|
| |
|
| |
| ; <font color='green'>Sequence</font> : A Sequence is an ordered set of
| |
| variables each of which may have several values. The variables may
| |
| be of different classes. Each element of a Sequence consists
| |
| of a value for each member variable. Thus a Sequence can be
| |
| represented as:
| |
|
| |
| <math>\begin{matrix} s_{0 0} & \cdots & s_{0 n} \\ \vdots & \ddots & \vdots \\ s_{i 0} & \cdots & s_{i n} \end{matrix}</math>
| |
|
| |
| Every instance of sequence <math>S</math> has the same number, order,
| |
| and class of member variables. A Sequence implies that its
| |
| variables are related to one another in some logical way. For
| |
| example, a sequence containing position and temperature measurements
| |
| might imply that the temperature measurements were taken at the
| |
| corresponding position. A sequence is different from a structure
| |
| because its constituent variables have several instances while a
| |
| structure's variables have only one instance (or value). Because a
| |
| sequence has several values for each of its variables it has an
| |
| implied ''state'', in addition to those values. The state
| |
| corresponds to a single element in the sequence.
| |
|
| |
|
| |
| A Sequence declaration is similar to a Structure's.
| |
| For example, the following would define a Sequence that would
| |
| contain many members like the Structure defined above:
| |
|
| |
| <pre>
| |
| Sequence {
| |
| Int32 month;
| |
| Float64 data[5][6];
| |
| } measurement;
| |
| </pre>
| |
|
| |
| Note that, unlike an Array, a Sequence has
| |
| no index. This means that a Sequence's values are not
| |
| simultaneously accessible. As in a Structure, the variable
| |
| <font color='green'>measurement.month</font> has a single value. The distinction is that
| |
| this variable's value changes depending on the state of the
| |
| Sequence.
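
The idea of an implied state can be made concrete with a small Python sketch. The class below is hypothetical and is not the OPeNDAP Sequence class. It only models the behavior described above: a member variable has one value at a time, and that value changes as the state moves from one element to the next.

```python
# Illustrative only: a toy cursor that models the "state" of a sequence.


class SequenceCursor:
    """Holds the elements of a sequence and the position of the current one."""

    def __init__(self, field_names, elements):
        self._field_names = tuple(field_names)
        self._elements = list(elements)
        self._state = 0  # index of the element currently selected

    def get(self, field):
        """Value of one member variable in the current state."""
        position = self._field_names.index(field)
        return self._elements[self._state][position]

    def advance(self):
        """Move to the next element; return False if already at the last one."""
        if self._state + 1 >= len(self._elements):
            return False
        self._state += 1
        return True


# Hypothetical contents for the "measurement" sequence declared above.
measurement = SequenceCursor(("month", "data"), [(1, [0.5]), (2, [0.7]), (3, [0.9])])
```

There is deliberately no way to ask for "the second month" by index; the only route to it is to advance the state, which is the distinction from an Array drawn above.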
| |
|
| |
| ; <font color='green'>Function</font> : Functions are a subclass of sequences and are
| |
| ; <font color='green'>Grid</font> : A Grid is an association of
| |
| an <math>N</math> dimensional array with <math>N</math> named vectors (one-dimensional
| |
|
| |
| arrays), each of which has the same number of elements as the
| |
|
| |
| corresponding dimension of the array. Each data value in the grid
| |
|
| |
| is associated with the data values in the vectors associated with
| |
|
| |
| its dimensions.
| |
|
| |
|
| |
| As an example, consider an array of temperature values that is six
| |
|
| |
| columns wide by five rows long. Suppose that this array represents
| |
|
| |
| measurements of temperature at five different depths in six
| |
|
| |
| different locations. The problem is to indicate the precise
| |
| location of each temperature measurement relative to one
| |
| another. (The absolute location and orientation of the
| |
| entire array is specified by another set of scalar values; we are
| |
| here considering the relationship between data type members.)
| |
|
| |
|
| |
| If the six locations are evenly spaced, and the five depths are also
| |
|
| |
| evenly spaced, then the data set can be completely described using
| |
|
| |
| the array and two scalar values indicating the distance between
| |
|
| |
| adjacent vertices of the array. However, if the spacing of the
| |
|
| |
| measurements is ''not'' regular, as in [[Image:data,fig,grid]]
| |
|
| |
| then an array will be inadequate. To adequately describe the
| |
|
| |
| positions of each of the points in the grid, the precise location of
| |
|
| |
| each column and row must be described.
| |
|
| |
| ''Figure: An Irregular Grid of Data.''
| |
|
| |
| The secondary
| |
| vectors in the Grid data type provide a solution to this
| |
| problem. Each member of these vectors defines a value for all the data
| |
| values in the corresponding rank of the array. The value can represent
| |
| location or time or some other quantity, and can even be a constructor
| |
| data type. The following declaration would define a data type that
| |
| could accommodate a structure like this:
| |
|
| |
| <pre>
| Grid {
|     Float64 data[distance = 6][depth = 5];
|     Float64 distance[6];
|     Float64 depth[5];
| } measurement;
| </pre>
| |
|
| |
| In the above example, a vector called <font color='green'>depth</font> would contain five
| |
| values corresponding to the depths of each row of the array, while
| |
| another vector called <font color='green'>distance</font> might contain the scalar distance
| |
| between the location of the corresponding column, and some reference
| |
| point. The <font color='green'>distance</font> array could also contain six (latitude,
| |
| longitude) pairs indicating the absolute location of each column of
| |
| the grid.
| |
|
| |
| <pre>
| Grid {
|     Float64 data[distance = 6][depth = 5];
|     Float64 depth[5];
|     Array Structure {
|         Float64 latitude;
|         Float64 longitude;
|     } distance[6];
| } measurement;
| </pre>
| |
|
| |
| </blockquote>
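
The role of the secondary vectors in a Grid can be sketched in Python. The values below are fabricated for illustration, and the helper names `check_grid` and `grid_value` are hypothetical. The sketch assumes the first array index runs over `distance` and the second over `depth`, following the order of the dimensions in the first Grid declaration above.

```python
# Illustrative only: an array plus one map vector per dimension.

distance = [0.0, 1.5, 2.0, 4.5, 7.0, 7.5]   # six irregularly spaced locations
depth = [0.0, 10.0, 25.0, 50.0, 100.0]      # five irregularly spaced depths

# data[i][j] belongs to distance[i] and depth[j]; the value 100 * i + j
# simply records its own position so the lookups are easy to check.
data = [[100 * i + j for j in range(len(depth))] for i in range(len(distance))]


def check_grid(data, distance, depth):
    """Each map vector must be as long as the matching array dimension."""
    return len(data) == len(distance) and all(len(row) == len(depth) for row in data)


def grid_value(data, distance, depth, i, j):
    """Return a data value together with the map values that locate it."""
    return {"distance": distance[i], "depth": depth[j], "value": data[i][j]}
```

Because the map vectors carry the actual coordinates, the spacing between rows or columns does not need to be regular, which is the case a bare array cannot describe.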
| |
| ===Operators===
| |
|
| |
|
| |
| Access to variables can be modified using operators. Each type of
| |
| variable has its own set of selection and projection operators which
| |
| can be used to modify the result of accessing a variable of that type.
| |
| The table "Classes and operators in the DAP" lists the types and the operators
| |
| applicable to them. In the table, operators have the meaning defined
| |
| by ANSI C except as follows: the array hyperslab operators are as
| |
| defined by netCDF [netcdf], the string operators are as defined
| |
| by AWK [kern:upe], and the list operators are as defined by
| |
| Common Lisp [steele:clisp].
| |
|
| |
| ''Table: Classes and operators in the DAP.''
| <pre>
| Class                           Operators
| ------------------------------  ----------------------------------------------
| Simple Types
|   Byte, Int32, UInt32, Float64  < > = != <= >=
|   String                        = != ~=
|   URL                           *
| Compound Types
|   Array                         [start:stop] [start:stride:stop]
|   List                          length(list), nth(list,n), member(list,elem)
|   Structure                     .
|   Sequence                      .
|   Grid                          [start:stop] [start:stride:stop] .
| </pre>
| |
|
| |
| Two of the operators deserve special note. Individual fields of type
| |
| constructors may be accessed using the dot (<font color='green'>.</font>) operator or the
| |
| virtual file system syntax. If a structure <font color='green'>s</font> has two fields
| |
| <font color='green'>time</font> and <font color='green'>temperature</font>, then those fields may be accessed
| |
| using <font color='green'>s.time</font> and <font color='green'>s.temperature</font> or as <font color='green'>s/time</font> and
| |
| <font color='green'>s/temperature</font>. Also, a special dereferencing <font color='green'>*</font>
| |
| operator is defined for a URL. This is roughly analogous to the
| |
| pointer-dereference operator of ANSI C. That is, if the variable
| |
| <font color='green'>my-url</font> is defined as a URL data type, then <font color='green'>my-url</font>
| |
| indicates the string spelling out the URL, and <font color='green'>*my-url</font> indicates
| |
| the actual data indicated by the URL.
| |
|
| |
| More information about variables and operators can be found in the
| |
| discussion of constraint expressions in
| |
| ([http://www <cite> opd-client,constraint</cite>]).
| |
|
| |
| ===External Data Representation===
| |
|
| |
|
| |
| Each of the base-type and type constructor variables has an
| |
| external representation defined by the OPeNDAP data access
| |
| protocol. This representation is used when an object of the given
| |
| type is transferred from one computer to another. Defining a single
| |
| external representation simplifies the translation of variables from
| |
| one computer to another when those computers use different internal
| |
| representations for those variable types.
| |
|
| |
| \begin{table}[htbp]
| |
| \caption{The XDR data types corresponding to OPeNDAP base-type variables}
| |
|
| |
| \begin{center}
| |
| \begin{tabular}{|l|l|} \hline
| |
| \tblhd{Base Type} & \tblhd{XDR Type}
| |
| \hline
| |
| \class{Byte} & <font color='green'>xdr byte</font>
| |
| \hline
| |
| \class{Int32} & <font color='green'>xdr long</font>
| |
| \hline
| |
| \class{UInt32} & <font color='green'>xdr unsigned long</font>
| |
| \hline
| |
| \class{Float64} & <font color='green'>xdr double</font>
| |
| \hline
| |
| \class{String} & <font color='green'>xdr string</font>
| |
| \hline
| |
| \class{URL} & <font color='green'>xdr string</font>
| |
| \hline
| |
| \end{tabular}
| |
| \end{center}
| |
| \end{table}
| |
|
| |
| Constraint expressions do not affect ''how'' a base-type variable
| |
| is transmitted from a server to a client; they determine ''if'' a
| |
| variable is to be transmitted. For constructor type variables,
| |
| however, constraint expressions may be used to exclude portions of the
| |
| variable. For example, if a constraint expression is used to select
| |
| the first three of six fields in a structure, the last three fields of
| |
| that structure are not transmitted by the server.
| |
|
| |
| The data access protocol uses Sun Microsystems' XDR
| |
| protocol\citel{xdr} for the external representation of all of the base
| |
| type variables. \tableref{data,tab,base-xdr} shows the XDR types used
| |
| to represent the various base type variables.
| |
|
| |
| In order to transmit constructor type variables, the data access
| |
| protocol defines how the various base type variables, which comprise
| |
| the constructor type variables, are transmitted. Any constructor type
| |
| variable may be subject to a constraint expression which changes the
| |
| amount of data transmitted for the variable (see
| |
| ([http://www <cite> opd-client,constraint</cite>]) for more information about
| |
| constraint expressions). For each of the constructor types these
| |
| definitions are:
| |
|
| |
| <blockquote>
| |
|
| |
|
| |
| [\class{Array}] An \class{Array} is sent using the
| |
|
| |
| <font color='green'>xdr_array</font> function. This means that an \class{Array} of 100
| |
|
| |
| <font color='green'>Int32</font>s is sent as a single block of 100 <font color='green'>xdr long</font>s, not
| |
|
| |
| 100 separate "xdr long"s.
| |
|
| |
|
| |
| [\class{List}] A \class{List} is sent as if it were an
| |
|
| |
| \class{Array}.
| |
|
| |
|
| |
| [\class{Structure}] A \class{Structure} is sent by encoding each
| |
|
| |
| field in the order those fields are declared in the DDS and
| |
|
| |
| transmitting the resulting block of bytes.
| |
|
| |
|
| |
| [\class{Sequence}] A \class{Sequence} is transmitted by encoding
| |
|
| |
| each item in the sequence as if it were a \class{Structure}, and
| |
| sending each such structure after the other, in the order of their
| |
|
| |
| occurrence in the sequence. The entire sequence is sent, subject to
| |
|
| |
| the constraint expression. In other words, if no constraint
| |
|
| |
| expression is supplied then the entire sequence is sent. However, if
| |
|
| |
| a constraint expression is given all the records in the sequence
| |
|
| |
| that satisfy the expression are sent\footnote{The client process can
| |
|
| |
| limit the information received by either using a constraint
| |
|
| |
| expression or prematurely closing the I/O stream. In the latter
| |
|
| |
| case the server will exit without sending the entire sequence.}.
| |
|
| |
|
| |
| |
|
| |
|
| |
|
| |
|
| |
| [\class{Grid}] A \class{Grid} is encoded as if it were a
| |
|
| |
| \class{Structure} (one component after the other, in the order of
| |
|
| |
| their declaration).
| |
|
| |
| </blockquote>
| |
|
| |
| The external data representation used by an OPeNDAP server and client may
| |
| be compressed, depending on the configuration of the respective
| |
| machines. The compression is done using the <font color='green'>gzip</font> program.
| |
| Only the data transmission itself will be affected by this; the
| |
| transmission of the ancillary data is not compressed.
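The encoding rules listed earlier in this section can be sketched in a few lines of Python. This is only an illustration under the standard XDR rules (big-endian values, 4-byte alignment, counted arrays and strings). The helper names are hypothetical, and the exact bytes an OPeNDAP server emits may differ in detail.

```python
import struct

def xdr_long(value):
    """Encode an Int32 as a 4-byte big-endian XDR long."""
    return struct.pack(">i", value)

def xdr_double(value):
    """Encode a Float64 as an 8-byte big-endian IEEE double."""
    return struct.pack(">d", value)

def xdr_string(text):
    """Encode a String or URL: 4-byte length, the bytes, zero padding to a multiple of 4."""
    raw = text.encode("ascii")
    padding = (-len(raw)) % 4
    return struct.pack(">I", len(raw)) + raw + b"\x00" * padding

def xdr_array_of_longs(values):
    """Encode an Array of Int32 as one counted block: element count, then the elements."""
    values = list(values)
    return struct.pack(">I", len(values)) + b"".join(xdr_long(v) for v in values)

def encode_structure(*encoded_fields):
    """A Structure is its fields, already encoded, concatenated in declaration order."""
    return b"".join(encoded_fields)

# An Array of 100 Int32 values travels as a single block.
block = xdr_array_of_longs(range(100))

# A Structure with two Float64 fields and one String field (values are made up).
record = encode_structure(xdr_double(41.5), xdr_double(-71.4), xdr_string("buoy"))
```

A Sequence would, in this sketch, be a run of such `record` blocks, one per row that satisfies the constraint expression.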
| |
|
| |
|
| |
| ==Ancillary data==
| |
|
| |
|
| |
| In order to use some data set, a user must have some information at
| |
| his or her disposal that is not strictly included in the data set
| |
| itself. This information, called \new{ancillary data}\footnote{We
| |
|
| |
| have learned to shy away from the term `metadata' since we have found that
| |
|
| |
| `metadata' to one person is `data' to another; the categorization
| |
|
| |
| often limits the usefulness of the underlying information.},
| |
| describes the shape and size of the data types that make up the data
| |
| set, and provides information about many of the data set's attributes,
| |
| as well. OPeNDAP uses two different structures to supply this ancillary
| |
| information about an OPeNDAP data set. The Dataset Descriptor Structure
| |
| (DDS) describes the data set's structure and the relationships between
| |
| its variables, and the Dataset Attribute Structure (DAS) provides
| |
| information about the variables themselves. Both structures are
| |
| described in the following sections.
| |
|
| |
| ===Dataset Descriptor Structure===
| |
|
| |
|
| |
| In order to translate data from one data model into another, OPeNDAP must
| |
| have some knowledge about the types of the variables, and their
| |
| semantics, that comprise a given data set. It must also know something
| |
| about the relations of those variables---even those relations which
| |
| are only implicit in the dataset's own API. This knowledge about the
| |
| dataset's structure is contained in a text description of the dataset
| |
| called the \new{Dataset Descriptor Structure} (DDS).
| |
|
| |
|
| |
|
| |
| The DDS does not describe how the information in the dataset is
| |
| physically stored, nor does it describe how the data set API is used
| |
| to access that data. Those pieces of information are contained in the
| |
| API itself and in the OPeNDAP server, respectively. The server uses the
| |
| DDS to describe the structure of a particular dataset to a
| |
| translator---the DDS contains knowledge about the dataset variables
| |
| and the interrelations of those variables. In addition, the DDS can
| |
| be used to satisfy some of the OPeNDAP-supported API data set description
| |
| calls. For example, netCDF has a function which returns the names of
| |
| all the variables in a netCDF data file. The DDS can be used to get
| |
| that information.
| |
|
| |
| The DDS is a textual description of the variables and their classes
| |
| that make up some data set. The DDS syntax is based on the variable
| |
| declaration and definition syntax of C and C++. A variable that is a
| |
| member of one of the base type classes is declared by writing the
| |
| class name followed by the variable name. The type constructor classes
| |
| are declared using C's brace notation. A grammar for the syntax is
| |
| given in \tableref{data,tab,DDS}. Each of the keywords for the type
| |
| constructor and base type classes has already been described in
| |
| ([http://www <cite> data,types</cite>]). The <font color='green'>Dataset</font> keyword has the same
| |
| syntactic function as \class{Structure} but is used for the specific job
| |
| of enclosing the entire data set even when it does not technically
| |
| need an enclosing element.
| |
|
| |
| {Dataset Descriptor Structure Syntax}
| |
|
| |
| {| border="1"
| |
| |+
| |
| ! ''data set'' !! <font color='green'>Dataset {</font> ''declarations'' <font color='green'>}</font> ''name''<font color='green'>;</font>
| |
| |-
| |
| | ''declaration''
| |
| || <font color='green'>List</font> ''declaration''
| |
| |-
| |
| |
| |
| || ''base-type'' ''var''<font color='green'>;</font>
| |
| |-
| |
| |
| |
| || <font color='green'>Structure {</font> ''declarations'' <font color='green'>}</font> ''var''<font color='green'>;</font>
| |
| |-
| |
| |
| |
| || <font color='green'>Sequence {</font> ''declarations'' <font color='green'>}</font> ''var''<font color='green'>;</font>
| |
| |-
| |
| |
| |
| || <font color='green'>Grid { ARRAY:</font> ''declaration'' <font color='green'>MAPS:</font> ''declarations'' <font color='green'>}</font> ''var''<font color='green'>;</font>
| |
| |-
| |
| | ''base-type''
| |
| || <font color='green'>Byte</font>
| |
| |-
| |
| |
| |
| || <font color='green'>Int32</font>
| |
| |-
| |
| |
| |
| || <font color='green'>UInt32</font>
| |
| |-
| |
| |
| |
| || <font color='green'>Float64</font>
| |
| |-
| |
| |
| |
| || <font color='green'>String</font>
| |
| |-
| |
| |
| |
| || <font color='green'>Url</font>
| |
| |-
| |
| | ''var''
| |
| || ''name''
| |
| |-
| |
| |
| |
| || ''name'' ''array-decl''
| |
| |-
| |
| | ''array-decl''
| |
| || <font color='green'>[</font> integer <font color='green'>]</font>
| |
| |-
| |
| |
| |
| || <font color='green'>[</font> ''name'' <font color='green'>=</font> integer <font color='green'>]</font>
| |
| |-
| |
| | ''name''
| |
| || User-chosen name of data set, variable, or array dimension.
| |
|
| |
| |}
| |
|
| |
| Different data access APIs will store the information in the DDS in
| |
| different places. Some APIs are self-documenting in the sense that the
| |
| data files themselves will contain all the information about the
| |
| structure of their data types. Other APIs need secondary files
| |
| containing what is called ancillary data, describing the data
| |
| structure. For some APIs, such as netCDF, gathering the ancillary
| |
| information from the data archive may be a time-consuming process. The
| |
| OPeNDAP server for these APIs may cache ancillary data files to save
| |
| time. An example DDS entry is shown in [[Image:data,fig,dds]]. (See
| |
| ([http://www <cite> data,model</cite>]) for an explanation of the information implied
| |
| by the data model, and for several other DDS examples).
| |
|
| |
| \begin{figure}[htbp]
| |
| \W
| |
| <pre>
| |
| Dataset {
| |
|
| |
| Int32 catalog_number;
| |
|
| |
| Sequence {
| |
|
| |
| String experimenter;
| |
|
| |
| Int32 time;
| |
|
| |
| Structure {
| |
|
| |
| Float64 latitude;
| |
|
| |
| Float64 longitude;
| |
|
| |
| } location;
| |
|
| |
| Sequence {
| |
|
| |
| Float64 depth;
| |
|
| |
| Float64 salinity;
| |
|
| |
| Float64 oxygen;
| |
|
| |
| Float64 temperature;
| |
|
| |
| } cast;
| |
|
| |
| } station;
| |
| } data;
| |
| </pre>
| |
|
| |
|
| |
| \caption{Example Dataset Descriptor Entry.}
| |
| \T
| |
| \end{figure}
| |
|
| |
| When creating a DDS to be kept in an ancillary file, you can use the
| |
| <font color='green'>#</font> character as a comment indicator. All characters after the
| |
| <font color='green'>#</font> on a line are ignored.
| |
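To make the DDS syntax more concrete, here is a minimal Python sketch of a reader for a subset of it. It is not the parser used by OPeNDAP: the function names are hypothetical, it handles only base types plus `Dataset`, `Structure`, and `Sequence` (no `Grid` or `List`), and it assumes array dimensions are written without spaces, as in `O2cal[20]`.

```python
import re

BASE_TYPES = {"Byte", "Int32", "UInt32", "Float64", "String", "Url"}
CONSTRUCTORS = {"Dataset", "Structure", "Sequence"}

def tokenize(dds_text):
    """Drop '#' comments, then split the text into braces, semicolons, and words."""
    without_comments = re.sub(r"#.*", "", dds_text)
    return re.findall(r"[{};]|[^\s{};]+", without_comments)

def parse_declarations(tokens, pos):
    """Parse declarations up to a closing brace; return (declarations, position of '}')."""
    declarations = []
    while tokens[pos] != "}":
        kind = tokens[pos]
        if kind in BASE_TYPES:
            # base-type var ;
            if tokens[pos + 2] != ";":
                raise ValueError("expected ';' after " + tokens[pos + 1])
            declarations.append((kind, tokens[pos + 1]))
            pos += 3
        elif kind in CONSTRUCTORS:
            # Constructor { declarations } var ;
            if tokens[pos + 1] != "{":
                raise ValueError("expected '{' after " + kind)
            members, pos = parse_declarations(tokens, pos + 2)
            if tokens[pos + 2] != ";":
                raise ValueError("expected ';' after " + tokens[pos + 1])
            declarations.append((kind, tokens[pos + 1], members))
            pos += 3
        else:
            raise ValueError("unsupported token: " + kind)
    return declarations, pos

def parse_dds(dds_text):
    """Return the single top-level (kind, name, members) tuple of a DDS."""
    tokens = tokenize(dds_text) + ["}"]  # sentinel that ends the outermost loop
    declarations, _ = parse_declarations(tokens, 0)
    return declarations[0]

example = """
Dataset {
    Int32 catalog_number;   # comments are ignored
    Sequence {
        String experimenter;
        Structure {
            Float64 latitude;
            Float64 longitude;
        } location;
    } station;
} data;
"""
tree = parse_dds(example)
```

The nested tuples in `tree` mirror the brace nesting of the DDS, which is the information a client needs in order to list the variable names in a dataset.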
|
| |
| ===Dataset Attribute Structure===
| |
|
| |
|
| |
| The \new{Dataset Attribute Structure} (DAS) is used to store
| |
| attributes for variables in the dataset. An attribute is any piece of
| |
| information about a variable that the creator wants to bind with that
| |
| variable ''excluding'' the type and shape, which are
| |
| part of the DDS. Attributes can be as simple as error measurements or
| |
| as elaborate as text describing how the data was collected or
| |
| processed\footnote{To define attributes for the entire dataset, create
| |
|
| |
| an entry for a variable with the same name as the dataset.}. In
| |
| principle, attributes are not processed by software, other than to be
| |
| displayed. However, many systems rely on attributes to store extra
| |
| information that is necessary to perform certain manipulations of
| |
| data. In effect, attributes are used to store information that is used
| |
| `by convention' rather than `by design'. OPeNDAP can effectively support
| |
| these conventions by passing the attributes from data set to user
| |
| program via the DAS. Of course, OPeNDAP cannot enforce conventions in
| |
| datasets where they were not followed in the first place.
| |
|
| |
| As with the DDS, the actual location of the DAS storage will vary
| |
| from one API to another. Data files created with some APIs will
| |
| contain within themselves attribute information that can be contained
| |
| in the DAS. For
| |
| these APIs, the DAS will be constructed dynamically by the OPeNDAP server
| |
| from data within the files.
| |
|
| |
| Other data access APIs must have attribute information specified in an
| |
| ancillary data file. APIs that contain attribute information
| |
| can have that information enriched by the addition of these ancillary
| |
| attribute files. These files are typically stored in the same
| |
| directory as the data files, and given the same name as the data
| |
| files, appended with <font color='green'>.das</font>.
| |
|
| |
| The syntax for attributes in a DAS is given in
| |
| \tableref{data,tab,DAS}. Every attribute of a variable is a triple:
| |
| attribute name, type and value. Note that the attributes specified
| |
| using the DAS are different from the information contained in the
| |
| DDS. Each attribute is completely distinct from the name, type,
| |
| and value of its associated variable. The name of an
| |
| attribute is an identifier, following the normal rules for an
| |
| identifier in a programming language with the addition that the `/'
| |
| character may be used. The type of an attribute may be one of:
| |
| \class{Byte}, \class{Int32}, \class{UInt32}, \class{Float64},
| |
| \class{String} or \class{Url}. An attribute may be scalar or vector.
| |
| In the latter case the values of the vector are separated by commas
| |
| (,) in the textual representation of the DAS.
| |
|
| |
|
| |
| {Dataset Attribute Structure Syntax}
| |
|
| |
|
| |
| {| border="1"
| |
| |+
| |
| ! ''DAS'' !! <font color='green'>Attributes </font>(var-attr-list)
| |
| |-
| |
| | ''var-attr-list''
| |
| || ''var-attr''
| |
| |-
| |
| |
| |
| || ''var-attr-list'' ''var-attr''
| |
| |-
| |
| |
| |
| || (empty list)
| |
| |-
| |
| | ''var-attr''
| |
| || ''variable'' (attr-list)
| |
| |-
| |
| |
| |
| || ''container'' (''var-attr-list'')
| |
| |-
| |
| |
| |
| || ''global-attr''
| |
| |-
| |
| |
| |
| || ''alias''
| |
| |-
| |
| | ''global-attr''
| |
| || <font color='green'>Global</font> ''variable'' (''attr-list'')
| |
| |-
| |
| |''attr-list''
| |
| || ''attr-triple;''
| |
| |-
| |
| |
| |
| || ''attr-list'' ''attr-triple''
| |
| |-
| |
| |
| |
| || ''(empty list)''
| |
| |-
| |
| |''attr-triple''
| |
| || ''attr-type'' ''attribute'' ''attr-val-vec''
| |
| |-
| |
| |''attr-val-vec''
| |
| ||''attr-val''
| |
| |-
| |
| |
| |
| || ''attr-val-vec'' ''attr-val''
| |
| |-
| |
| |''attr-val''
| |
| ||''numeric value''
| |
| |-
| |
| |
| |
| ||''variable''
| |
| |-
| |
| |
| |
| ||''string''
| |
| |-
| |
| | ''attr-type''
| |
| || <font color='green'>Byte</font>
| |
| |-
| |
| |
| |
| || <font color='green'>Int32</font>
| |
| |-
| |
| |
| |
| || <font color='green'>UInt32</font>
| |
| |-
| |
| |
| |
| || <font color='green'>Float64</font>
| |
| |-
| |
| |
| |
| || <font color='green'>String</font>
| |
| |-
| |
| |
| |
| || <font color='green'>Url</font>
| |
| |-
| |
| |''alias''
| |
| || <font color='green'>Alias</font> ''alias-name'' ''variable;''
| |
| |-
| |
| | ''variable''
| |
| || user-chosen variable name
| |
| |-
| |
| |''attribute''
| |
| || user-chosen attribute name
| |
| |-
| |
| | ''container''
| |
| || user-chosen container name
| |
| |-
| |
| |''alias-name''
| |
| || user-chosen alias name
| |
|
| |
| |}
| |
|
| |
| When creating a DAS to be kept in an ancillary file, you can use the
| |
| <font color='green'>#</font> character as a comment indicator. All characters after the
| |
| <font color='green'>#</font> on a line are ignored.
| |
|
| |
| ====Containers====
| |
|
| |
| An attribute can contain another attribute, or set of attributes.
| |
| This is roughly comparable to the way compound variables can contain
| |
| other variables in the DDS. The container defines a new lexical scope
| |
| for the attributes it contains\footnote{Containers, aliases, and
| |
|
| |
| global attributes were introduced into OPeNDAP at version 2.16. In
| |
|
| |
| early OPeNDAP releases, the DAS was ''not'' a hierarchical
| |
|
| |
| structure; it was similar to a flat-file database. Although using
| |
|
| |
| the new structure is strongly recommended for new code, old code
| |
|
| |
| will still work with the old DAS. See [http://www.opendap.org/support/docs.html/api/pguide-html/<cite>The OPeNDAP Programmer's Guide</cite>] for a description
| |
|
| |
| of the changes made to the \class{AttrTable} class.}.
| |
|
| |
| Consider the following example:
| |
|
| |
| \begin{figure}[h]
| |
| <pre>
| |
| Attributes {
| |
|
| |
| Bill {
| |
|
| |
| String LastName "Evans";
| |
|
| |
| Byte Age 53;
| |
|
| |
| String DaughterName "Matilda";
| |
|
| |
| Matilda {
| |
|
| |
| String LastName "Fink";
| |
|
| |
| Byte Age 26;
| |
|
| |
| }
| |
|
| |
| }
| |
| }
| |
| </pre>
| |
| \caption{An Example of Attribute Containers}
| |
|
| |
|
| |
| \end{figure}
| |
|
| |
| \noindent
| |
| Here, the attribute <font color='green'>Bill.LastName</font> would be associated with the
| |
| string "Evans", and <font color='green'>Bill.Age</font> with the number 53. However, the
| |
| attribute <font color='green'>Bill.Matilda.LastName</font> would be associated with the
| |
| string "Fink" and <font color='green'>Bill.Matilda.Age</font> with the number 26.
| |
|
| |
| Using container attributes as above, you can construct a DAS that
| |
| exactly mirrors the construction of a DDS that uses compound data
| |
| types, like \class{Structure} and \class{Sequence}. Note that though
| |
| the <font color='green'>Bill</font> attribute is a container, it has attributes of its own,
| |
| as well. This exactly corresponds to the situation where, for
| |
| example, a \class{Sequence} would have attributes belonging to it, as
| |
| well as attributes for each of its member variables. Suppose the
| |
| sequence represented a single time series of measurements, where
| |
| several different data types are measured at each time. The sequence
| |
| attributes might be the time and location of the measurements, and the
| |
| individual variables might have attributes describing the method or
| |
| accuracy of that measurement.
| |
|
| |
| ====Aliases====
| |
|
| |
| Building on the previous example, it might be
| |
| convenient to refer to Matilda without prefixing every reference with
| |
| <font color='green'>Bill</font>. In this case, we can define an \new{alias} attribute
| |
|
| |
| as follows:
| |
|
| |
| \begin{figure}[h]
| |
| <pre>
| |
| Attributes {
| |
|
| |
| Bill {
| |
|
| |
| String LastName "Evans";
| |
|
| |
| Byte Age 53;
| |
|
| |
| String DaughterName "Matilda";
| |
|
| |
| Matilda {
| |
|
| |
| String LastName "Fink";
| |
|
| |
| Byte Age 26;
| |
|
| |
| }
| |
|
| |
| }
| |
|
| |
| Alias Matilda Bill.Matilda;
| |
| }
| |
| </pre>
| |
|
| |
| \caption{An Example of Attribute Alias}
| |
|
| |
|
| |
| \end{figure}
| |
|
| |
| \noindent
| |
| By defining an equivalence between the alias <font color='green'>Matilda</font> and the
| |
| original attribute <font color='green'>Bill.Matilda</font>, the string <font color='green'>Matilda.Age</font>
| |
| can be used with or without the prefix <font color='green'>Bill</font>. In either case,
| |
| the attribute value will be 26.
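The container and alias behavior described above can be modeled with nested Python dictionaries. This is a sketch for illustration only. The `lookup` function and the `aliases` table are hypothetical names and do not correspond to the \class{AttrTable} interface.

```python
# Containers become nested dictionaries; plain attributes become values.
das = {
    "Bill": {
        "LastName": "Evans",
        "Age": 53,
        "DaughterName": "Matilda",
        "Matilda": {"LastName": "Fink", "Age": 26},
    },
}

# "Alias Matilda Bill.Matilda;" maps the short name to the full path.
aliases = {"Matilda": "Bill.Matilda"}

def lookup(das, aliases, dotted_name):
    """Resolve a dotted attribute name, expanding a leading alias if there is one."""
    parts = dotted_name.split(".")
    if parts[0] in aliases:
        parts = aliases[parts[0]].split(".") + parts[1:]
    node = das
    for part in parts:
        node = node[part]  # each step enters one container (lexical scope)
    return node

age_via_alias = lookup(das, aliases, "Matilda.Age")
age_via_path = lookup(das, aliases, "Bill.Matilda.Age")
```

Both lookups reach the same attribute, which is the equivalence the alias is meant to provide.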
| |
|
| |
| ====Global Attributes====
| |
|
| |
| A \new{global attribute} is not bound to a
| |
| particular identifier in a dataset; these attributes are stored in one
| |
| or more containers with the name <font color='green'>Global</font> or ending with
| |
| <font color='green'>_Global</font>. Global attributes are used to describe attributes of
| |
| an entire dataset. For example, a global attribute might contain the
| |
| name of the satellite or ship from which the data was collected.
| |
| Here's an example:
| |
|
| |
| \begin{figure}[h]
| |
| <pre>
| |
| Attributes {
| |
|
| |
| Bill {
| |
|
| |
| String LastName "Evans";
| |
|
| |
| Byte Age 53;
| |
|
| |
| String DaughterName "Matilda";
| |
|
| |
| Matilda {
| |
|
| |
| String LastName "Fink";
| |
|
| |
| Byte Age 26;
| |
|
| |
| }
| |
|
| |
| }
| |
|
| |
| Alias Matilda Bill.Matilda;
| |
|
| |
| Global {
| |
|
| |
| String Name "FamilyData";
| |
|
| |
| String DateCompiled "11/17/98";
| |
|
| |
| }
| |
| }
| |
| </pre>
| |
|
| |
| \caption{An Example of Global Attributes}
| |
|
| |
|
| |
| \end{figure}
| |
|
| |
| Global attributes can be used to define a certain view of a dataset.
| |
| For example, consider the following DAS:
| |
|
| |
| \begin{figure}[h]
| |
| <pre>
| |
| Attributes {
| |
|
| |
| CTD {
| |
|
| |
| String Ship "Oceanus";
| |
|
| |
| Temp {
| |
|
| |
| String Name "Temperature";
| |
|
| |
| }
| |
|
| |
| Salt {
| |
|
| |
| String Name "Salinity";
| |
|
| |
| }
| |
|
| |
| }
| |
|
| |
| Global {
| |
|
| |
| String Names "OPeNDAP";
| |
|
| |
| }
| |
|
| |
| FNO_Global {
| |
|
| |
| String Names "FNO";
| |
|
| |
| CTD {
| |
|
| |
| Temp {
| |
|
| |
| String FNOName "TEMPERATURE";
| |
|
| |
| }
| |
|
| |
| Salt {
| |
|
| |
| String FNOName "SALINITY";
| |
|
| |
| }
| |
|
| |
| }
| |
|
| |
| Alias T CTD.Temp;
| |
|
| |
| Alias S CTD.Salt;
| |
|
| |
| }
| |
| }
| |
| </pre>
| |
|
| |
| \caption{An Example of Global Attributes In Use}
| |
|
| |
|
| |
| \end{figure}
| |
|
| |
| Here, a dataset contains temperature and salinity measurements. To
| |
| aid processing of this dataset by some OPeNDAP client, long names are
| |
| supplied for the <font color='green'>Temp</font> and <font color='green'>Salt</font> variables. However, a
| |
| different client (FNO) spells variable names differently. Since it is
| |
| seldom practical to come up with general-purpose translation
| |
| tables\footnote{"Temperature" can be spelled "T", "Temp",
| |
|
| |
| "TEMPERATURE", "TEMP", and so on. Worse, "T" is also commonly
| |
|
| |
| used for "Time."}, the dataset administrator has chosen to include
| |
| these synonyms under the <font color='green'>FNO_Global</font> attributes, as a convenience
| |
| to those users.
| |
|
| |
| Similar conveniences can be provided using the Alias feature. In the
| |
| example in [[Image:fig,das,global-use]], the temperature variable
| |
| can be referred to as <font color='green'>FNO_Global.T</font> if desired. That is, a
| |
| global alias can provide a client with a known attribute name to query
| |
| for some property, even if that attribute name is not an integral part
| |
| of the dataset.
| |
|
| |
| Using global attributes, a dataset or catalog administrator can create
| |
| a layer of aliases and attributes to make OPeNDAP datasets conform to
| |
| several different dataset naming standards. This becomes significant
| |
| when trying to compile an OPeNDAP dataset database.
| |
Data Analysis with OPeNDAP
The OPeNDAP software is not only a data transport mechanism. Using OPeNDAP,
you can subsample the data you are looking at. That is, you can
request an entire data file, or just a small piece of it.
Selecting Data: Using Constraint Expressions
A URL such as this one:
http://dods.gso.uri.edu/cgi-bin/nph-nc/data/buoys.nc
refers to the entire
dataset contained in the buoys.nc file. A user may, however, choose
to sample the dataset simply by modifying the submitted URL. The
\new{constraint expression} attached to the URL directs that the data
set specified by the first part of the URL be sampled to select only
the data of interest. This works even for programs that do not
have a built-in way to accomplish such selections. It can vastly
reduce the amount of data a program needs to process, and reduce the
network load of transmitting that data to the client.
Constraint Expression Syntax
A constraint expression is appended to the target URL following a
question mark, as in the following examples:
http://oceans.univ.edu/cgi-bin/nc/expl/buoys.nc?temp
http://oceans.univ.edu/cgi-bin/nc/expl/buoys.nc?temp[1,100,5]
http://oceans.univ.edu/cgi-bin/nc/expl/buoys.nc?u&lat>15.0
http://oceans.univ.edu/cgi-bin/nc/expl/buoys.nc?cast.O2<15.0
http://oceans.univ.edu/cgi-bin/nc/expl/buoys.nc?station&station.temp<15.0
A constraint expression consists of two parts: a \new{projection}
and a \new{selection}, separated by an ampersand
(\&). Either part may contain several sub-expressions. Either
part may be present, or both.
A projection is simply a comma-separated list of the variables
that are to be returned to the client. If an array is to be
subsampled, the projection specifies the manner in which the sampling
is to be done. If the selection is omitted, all the variables in the
projection list are returned. If the projection is omitted, the entire
dataset is returned, subject to the evaluation of the selection
expression. The projection can also include functional expressions,
that is, calls to server functions whose arguments are variables from
the dataset, scalar values, or other functions.
A simple selection expression is a boolean expression of the form
\begin{center}
"variable operator variable"
or
"variable operator value"
or
"function()"
\end{center}
where
- \var{operator}
- can be one of the relational operators listed in
\tableref{opd-client,tab,cons-ops};
- \var{variable}
- can be any variable recorded in the dataset;
- \var{value}
- can be any scalar, string, function, or list of
numbers (lists are denoted by comma-separated items enclosed in
curly braces, for example, \{3,11,4.5\}); and
- \var{function}
- is a function defined by the server to operate
on variables or values, and to return a boolean value (see
( opd-client,function)).
Each selection clause begins with an ampersand (\&) representing
the "AND" boolean operation\footnote{The "OR" function may be
implemented with a list. For example, to say that \var{i} must
equal 3 OR 11 you would write \var{i} = \{3,11\}. The clause evaluates
to true when \var{i} equals any one of the elements.}.
The \& is actually a prefix operator, not an infix
operator. That is, it must appear at the beginning of each
selection clause, no matter what. This means that a constraint
expression that contains no projection clause must still have an
\& in front of the first selection clause.
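As a rough illustration of this layout, the following Python sketch splits a request URL into the dataset URL, the projection list, and the selection clauses. The function name `split_constraint` is hypothetical, and the plain comma split is a simplifying assumption: it does not handle commas inside function calls in the projection.

```python
def split_constraint(url):
    """Split a request URL into (dataset URL, projection names, selection clauses)."""
    base, _, constraint = url.partition("?")
    # Spaces, tabs, and newlines are removed before the expression is parsed.
    constraint = "".join(constraint.split())
    # Text before the first '&' is the projection. Every later piece is one
    # selection clause, because '&' prefixes each clause.
    projection_text, *selection = constraint.split("&")
    projection = [name for name in projection_text.split(",") if name]
    return base, projection, selection

both_parts = split_constraint(
    "http://oceans.univ.edu/cgi-bin/nc/expl/buoys.nc?station&station.temp<15.0")
selection_only = split_constraint(
    "http://oceans.univ.edu/cgi-bin/nc/expl/buoys.nc?&lat>15.0")
```

In `selection_only` the projection list is empty, which corresponds to a constraint expression that starts directly with \& and therefore returns every variable, subject to the selection.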
There is no limit on the number of selection clauses that can be
combined to create a compound constraint expression. Data that
produces a true (non-zero) value for the entire selection expression
will be included in the data returned to the client by the server. If
only a part of some data structure, such as a \class{Sequence},
satisfies the selection criteria, then only that part will be
returned.
Due to the differences in data model paradigms, selection is not
implemented for the OPeNDAP array data types, such as \class{Grid} or
\class{Array}. However, many OPeNDAP servers implement selection
functions you can use for the same effect. You can query the server
for the functions it implements with the usage service outlined in
( opd-client,function).
Simple Constraint Expression Examples
Consider the data descriptor in File:Opd-client,fig,dds. The
figure is an example of the Dataset Descriptor Structure (DDS), one of the
messages returned by an OPeNDAP server in response to a query about some
dataset. The full syntax description for this structure is given in
( data,ancillary). For the moment, it is only important that it is the
description of a dataset containing station data including temperature,
oxygen, and salinity. Each station also contains 20 oxygen data points,
taken at 20 fixed depths, used for calibration of the data.
The following URL will return only the pressure and temperature pairs
of this dataset. (Note that the constraint expression parser removes
all spaces, tabs, and newline characters before the expression is
parsed.) There is only a projection clause, without a selection, in
this constraint expression\footnote{For the sake of clarity, this and
several of the following constraint expression examples span
multiple lines. While the constraint expression evaluator ignores
newline characters, program limitations of the OPeNDAP client will
likely prevent a user from typing a newline in a constraint
expression.}.
\begin{figure}[htbp]
Dataset {
    Sequence {
        Int32 day;
        Int32 month;
        Int32 year;
        Float64 lat;
        Float64 lon;
        Float64 O2cal[20];
        Sequence {
            Float64 press;
            Float64 temp;
            Float64 O2;
            Float64 salt;
        } cast;
        String comments;
    } station;
} arabian-sea;
\caption{Sample Data Descriptor}
\end{figure}
http://oceans.edu/cgi/nph-jg/explO2/cruise?station.cast.press,
station.cast.temp
Incidentally, we have assumed that the dataset was stored in the
JGOFS format\footnote{Because it contains an array, the dataset
pictured in Figure~(opd-client,fig,dds) is technically not a
valid JGOFS dataset. We have included the array for pedagogical
purposes, and hope that the JGOFS purists will forgive us.} on the
remote host oceans.edu, in a file called explO2/cruise.
For the sake of brevity, from here on we will omit the first part of
the URL, to concentrate on the constraint expression alone.
If we only want to see pressure and temperature pairs below 500 meters
deep, we can modify the constraint expression by adding a selection
clause.
?station.cast.press,station.cast.temp&station.cast.press>500.0
In order to retrieve all of each cast that has any temperature reading
greater than 22 degrees, use the following:
?station.cast&station.cast.temp>22.0
Simple constraint expressions may be combined into compound
expressions with logical AND (\&). To retrieve all
stations west of 60 degrees West and north of the equator:
?station&station.lat>0.0&station.lon<-60.0
As was mentioned, the logical OR can be implemented using a list
of scalars. The following expression will select only stations taken
north of the equator in April, May, June, or July.
?station&station.lat>0.0&station.month={4,5,6,7}
If our dataset contained a field called monsoon-month,
indicating the month in which monsoons happened that year, we could
modify the last example search to include those months as follows:
?station&station.lat>0.0
&station.month={4,5,6,7,station.monsoon-month}
In other words, a list can contain both values and other variables. If
monsoon-month was itself a list of months, a search could be written
as:
?station&station.lat>0.0&station.month=station.monsoon-month
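The way clauses combine (AND between clauses, OR within a list) can be sketched in Python as follows. This is an illustration, not server code: `clause_holds` and `select` are hypothetical names, records are plain dictionaries with made-up values, only numeric fields are handled, and a string on the right-hand side is assumed to name another field of the same record.

```python
import operator

# Relational operators for the numeric base types.
OPS = {"<": operator.lt, ">": operator.gt, "=": operator.eq,
       "!=": operator.ne, "<=": operator.le, ">=": operator.ge}

def as_list(item):
    """Treat a scalar as a one-element list."""
    return list(item) if isinstance(item, (list, tuple, set)) else [item]

def clause_holds(record, field, op, right):
    """True if the comparison holds for any element on the right-hand side."""
    candidates = []
    for item in as_list(right):
        # Assumption: a string names another field, which may itself hold a list.
        value = record[item] if isinstance(item, str) else item
        candidates.extend(as_list(value))
    return any(OPS[op](record[field], value) for value in candidates)

def select(records, clauses):
    """Keep the records for which every clause holds (clauses are ANDed)."""
    return [r for r in records if all(clause_holds(r, *c) for c in clauses)]

stations = [
    {"lat": 12.0, "month": 5, "monsoon-month": 9},
    {"lat": -3.0, "month": 6, "monsoon-month": 9},
    {"lat": 20.0, "month": 9, "monsoon-month": 9},
]

# ?station&station.lat>0.0&station.month={4,5,6,7}
north_spring = select(stations, [("lat", ">", 0.0), ("month", "=", [4, 5, 6, 7])])

# ?station&station.lat>0.0&station.month={4,5,6,7,station.monsoon-month}
with_monsoon = select(
    stations, [("lat", ">", 0.0), ("month", "=", [4, 5, 6, 7, "monsoon-month"])])
```

With these sample records, only the first station passes the first query, while the second query also admits the station whose month equals its own monsoon month.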
For arrays
and grids, there is a special way to select data within the projection
clause. Suppose we want to see only the first five oxygen calibration
points for each station. The constraint expression for this would be:
?station.O2cal[0:4]
By specifying a \new{stride} value, we can also select a
\new{hyperslab} of the oxygen calibration array:
?station.O2cal[0:5:19]
This expression will return every fifth member of the O2cal
array. In other words, the result will be a four-element array
containing only the first, sixth, eleventh, and sixteenth members of
the O2cal array. Each dimension of a multi-dimensional array
may be subsampled in an analogous way. The return value is an array of
the same number of dimensions as the sampled array, with each
dimension size equal to the number of elements selected from it.
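The index arithmetic behind these array projections can be sketched in Python. The function names are hypothetical. The one assumption carried over from the examples above is that indices start at zero and that the stop index is inclusive.

```python
def hyperslab_indices(start, stop, stride=1):
    """Indices selected by [start:stop] or [start:stride:stop]; stop is inclusive."""
    return list(range(start, stop + 1, stride))

def subsample(array, *slabs):
    """Apply one (start, stride, stop) triple per dimension of a nested list."""
    if not slabs:
        return array
    (start, stride, stop), rest = slabs[0], slabs[1:]
    return [subsample(array[i], *rest)
            for i in hyperslab_indices(start, stop, stride)]

o2cal = list(range(20))                    # stand-in for the 20 calibration points
first_five = subsample(o2cal, (0, 1, 4))   # ?station.O2cal[0:4]
every_fifth = subsample(o2cal, (0, 5, 19)) # ?station.O2cal[0:5:19]

# A 3 x 4 array sampled in both dimensions keeps its number of dimensions.
grid = [[10 * row + col for col in range(4)] for row in range(3)]
corner = subsample(grid, (0, 2, 2), (1, 1, 2))
```

Here `every_fifth` holds the elements at indices 0, 5, 10, and 15, which are the first, sixth, eleventh, and sixteenth members described above.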
Operators, Special Functions, and Data Types
The data types accessible through the OPeNDAP software are listed and
described in ( data,types). It is advisable to be familiar
with these types before trying to construct complex constraint
expressions.
The constraint expression syntax defines a number of operators for each
data type. These operators are listed in the table of constraint
expression operators below. Except for the operation defined on the URL data
type, all the operators defined for the scalar base types are boolean
operators whose results depend on the specified comparison between their
arguments. Refer to Using URLs in a Constraint Expression for a description
of the URL data type and its operator.
The ~= operator returns true when the character string
on the left of the operator matches the regular expression on the
right. See Pattern Matching with Constraint Expressions for a discussion of
regular expressions.
The Structure, Sequence, and Grid data types
are each composed of a collection of simpler data types. The .
operator allows a user to refer to the subsidiary variables within
these compound types. For example, station.year indicates the
value of the year member of the station sequence.
The array operator [] is used to subsample the given array.
See the discussion of arrays and grids above for an explanation and example of
its use.
{| border="1"
|+ Constraint Expression Operators
! Class !! Operators
|-
! colspan="2" | Simple Types
|-
| Byte, Int32, UInt32, Float64 || <nowiki>< > = != <= >=</nowiki>
|-
| String || <nowiki>= != ~=</nowiki>
|-
| URL || <nowiki>*</nowiki>
|-
! colspan="2" | Compound Types
|-
| Array || <nowiki>[start:stop] [start:stride:stop]</nowiki>
|-
| List || length(list), nth(list,n), member(list,elem)
|-
| Structure || .
|-
| Sequence || .
|-
| Grid || <nowiki>[start:stop] [start:stride:stop] .</nowiki>
|}
There are three special functions defined to operate on the
List data type. The length() function returns the
number of elements in the given list, the nth() function
returns the list element indicated by the input index, and the
member() function returns true if the given value
equals any member of the list. Note that the behavior of the
nth() function is undefined for indices beyond the range of the
list.
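As a rough sketch, the three List functions behave like the Python helpers below. Two points are assumptions rather than facts from this guide: nth() is taken to be zero-based, like the array indices, and an out-of-range index raises an error here only because the sketch has to do something where the real behavior is undefined.

```python
def length(values):
    """Number of elements in the list."""
    return len(values)


def nth(values, n):
    """Element at index n (assumed zero-based)."""
    if not 0 <= n < len(values):
        # The guide leaves this case undefined; raising is a choice of this sketch.
        raise IndexError("nth() is undefined outside the range of the list")
    return values[n]


def member(values, elem):
    """True if elem equals any member of the list."""
    return elem in values


months = [4, 5, 6, 7]
```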
Using Functions in a Constraint Expression
An OPeNDAP data server may define its own set of
functions that may be used in a constraint expression. For example,
the data server containing the example data from
File:Opd-client,fig,dds might define a sigma1() function
to return the density of the water at the given temperature, salinity
and pressure. A query like the following would return all the stations
containing water samples whose density exceeded 1.0275 g/cm³.
?station.cast&sigma1(station.cast.temp,
station.cast.salt,
station.cast.press)>27.5
Functions like this one are not a standard part of the OPeNDAP
architecture, and may vary from one server to another. A user may
query a server for a list of such functions by sending a URL ending with
".info". For example, you can query the data server installed on the
OPeNDAP home site with the following URL:
http://dods.gso.uri.edu/cgi-bin/nph-nc/fnoc1.nc.info
The data returned will be an HTML message, readable with a standard
web browser, containing documentation of the server running on the
given site, and the data named in the URL. In this case, you will
learn that the specified server defines two functions that can be used
in a constraint expression:
- geolocate(variable, lat1, lat2, lon1, lon2)
Returns the elements of variable that fall
within the box created by (lat1,lon1) and
(lat2,lon2).
- time(variable, start_time, stop_time)
Returns the elements of variable that fall within the time
interval between start_time and stop_time.
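Assembling these request URLs is plain string concatenation, as the following Python sketch shows. The helper names and the example.org dataset URL are hypothetical, and the sketch does not percent-encode the expression, which a real client may need to do before sending it.

```python
def info_url(dataset_url):
    """URL that asks the server for its documentation of the dataset."""
    return dataset_url + ".info"


def constrained_url(dataset_url, projection, selections=()):
    """Append a constraint expression: projection list, then ANDed selection clauses."""
    clauses = [",".join(projection)] + list(selections)
    return dataset_url + "?" + "&".join(clauses)


# The server named in the text above.
fnoc_info = info_url("http://dods.gso.uri.edu/cgi-bin/nph-nc/fnoc1.nc")

# Hypothetical dataset URL; sigma1() is the server-defined function
# used as an example earlier in this section.
density_query = constrained_url(
    "http://example.org/cgi-bin/nph-nc/cruise.nc",
    ["station.cast"],
    ["sigma1(station.cast.temp,station.cast.salt,station.cast.press)>27.5"],
)
```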
Using URLs in a Constraint Expression
The OPeNDAP data access protocol defines a special data type to handle
distributed data: URL. This is a scalar data type, much like
the String type, intended to hold one OPeNDAP URL. It generally
points at some remote dataset or data value. Using this data type, a
constraint expression may make the data returned from one OPeNDAP data
server dependent on data held at an entirely different site.
In order to accommodate this data type, OPeNDAP defines a special
"dereference" operator, *. Similar to
its function with pointers in C, applying this operator to a URL
returns the data specified by that URL. The URL data type
itself contains only a character string. It must be dereferenced to
produce a reference to the data named by the URL.
Examples
The following example will return all the stations containing oxygen
values greater than fifteen:
?station&station.cast.O2>15.0
Similarly, the following constraint expression will yield all the
stations in the dataset whose oxygen value is greater than the
oxygen value indicated by the URL:
?station&station.cast.O2>*"http://ocean.edu/etc/nc/data?O2MAX"
Finally, suppose that the dataset itself contained a variable of type
URL, and that this URL contained the address of oxygen data
stored at some other site. The data descriptor for the dataset might
look like the following:
Dataset {
Sequence{
.
.
.
URL O2cal;
.
.
.
} station;
} arabian-sea;
We can now write the previous constraint as:
?station&station.cast.O2>*O2cal
URLs stored in remote datasets may also be used in the projection
clause of the constraint expression. Imagine a dataset that consists
only of a list of URLs for each square degree of latitude and
longitude. A user could query this dataset for the actual list of
URLs, or, by using the * operator, could construct a constraint
expression that would return the actual data indicated by the URLs in
the target dataset.
Pattern Matching with Constraint Expressions
There are three operators defined to compare one String data
type to another. The = operator returns TRUE if its two input
character strings are identical, and the != operator returns
TRUE if the strings do not match. A third operator,
~=, is provided that returns TRUE if the String
to the left of the operator matches the regular expression in
the String on the right.
A regular expression is simply a
character string containing wildcard characters that allow it to match
patterns within a longer string. For example, the following constraint
expression might return all the stations on the sample cruise at which
a shark was sighted:
?station&station.comment~=".*shark.*"
Most characters in a
regular expression match themselves. That is, an "f" in a regular
expression matches an "f" in the target string. There are several
special characters, however, that provide more sophisticated
pattern-matching capabilities.
- .
The period matches any single character except a newline.
- * + ?
These are postfix operators that match the
preceding regular expression repetitively (as many times as
possible). Thus, o* matches any number of o's. The operators differ in that o* also matches zero o's,
o+ matches only a series of one or more o's, and
o? matches only zero or one o.
- [ ... ]
Defines a "character set," which begins with [ and is
terminated by ]. In the simplest case, the characters between
the two brackets are what this set can match. The expression
[Ss] matches either an upper or lower case s. Brackets can also contain character ranges, so [0-9] matches all the
numerals. If the first character within the brackets is a caret
(^), the expression will only match characters that do not
appear in the brackets. For example, [^0-9]* only matches
character strings that contain no numerals.
- ^ $
These are special characters that match the empty string at the beginning or end of a line, respectively.
- \|
These two characters define a logical OR between the largest
possible expression on either side of the operator. So, for
example, the string Endeavor\|Oceanus matches
either Endeavor or Oceanus. The scope of the OR can be contained with the grouping operators, \( and
\).
- \( \)
These are used to group a series of characters into an expression,
or for the OR function. So, for example,
\(abc\)* matches zero or more
repetitions of the string abc.
There are several more special characters and several other features
of the characters described here, but they are beyond the scope of
this guide. The OPeNDAP regular expression syntax is the same as that
used in the Emacs editor. See the documentation for
Emacs for a complete description of all the pattern-matching
capabilities of regular expressions.
Examples
In the above example, a user might wonder whether the shark comments
had been spelled with upper or lower case letters. The following
constraint expression will return any station that mentions a shark in
upper or lower case.
?station&station.comment~=".*\(SHARK\|shark\).*"
Of course, this would miss Shark and sHark and so on. The
constraint could be written this way to catch all odd permutations of
upper and lower case:
?station&station.comment~=".*[Ss][Hh][Aa][Rr][Kk].*"
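The two patterns can be tried out locally with Python's re module. Note that this is a stand-in, not the OPeNDAP matcher: Python writes grouping and alternation as ( ) and | without the backslashes that the Emacs-style syntax requires. The comment strings are invented, and the sketch assumes that the pattern must match the whole string, which is what the leading and trailing .* in the examples suggest.

```python
import re

# Hypothetical station comments.
comments = [
    "Saw a shark at dawn",
    "SHARK off the stern",
    "Large Shark near the CTD",
    "calm seas",
]

# Emacs-style ".*\(SHARK\|shark\).*" written in Python syntax.
either_case = re.compile(r".*(SHARK|shark).*")

# ".*[Ss][Hh][Aa][Rr][Kk].*" is the same in both syntaxes.
any_case = re.compile(r".*[Ss][Hh][Aa][Rr][Kk].*")

matches_either = [c for c in comments if either_case.fullmatch(c)]
matches_any = [c for c in comments if any_case.fullmatch(c)]
```

The first pattern misses the mixed-case "Shark" comment, while the character-set version catches all three sightings.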
Optimizing the Query
Using the tools provided by OPeNDAP, a user can build quite elaborate and
sophisticated constraint expressions that will return precisely the
data he or she wishes to examine. However, as the complexity of the
constraint expression increases, so does the time necessary to process
that expression. There are some techniques a user may use to optimize
the evaluation of a constraint that will ease the load on the server,
and provide faster replies to OPeNDAP dataset queries.
The OPeNDAP constraint expression evaluator uses a "lazy
evaluation" algorithm. This means that the
sub-clauses of the selection clause are evaluated in order, and
parsing halts when any sub-clause returns FALSE. Consider a constraint
expression that looks like this:
?station&station.cast.O2>15.0&station.cast.temp>22.0
If the server encounters a station with no oxygen values over 15.0, it
does not bother to look at the temperature records at all. The first
sub-clause evaluates to FALSE, so the second clause is never even parsed.
A careful user may use this feature to his or her advantage. In the
above example, the order of the clauses does not really matter; there
are the same number of temperature and oxygen measurements at each
station. However, consider the following expression:
?station&station.cast.O2>15.0&station.month={3,4,5}
For each station there is only one month value, while there are many
oxygen values. Passing a constraint expression like this one will
force the server to sort through all the oxygen data for each station
(which could be in the thousands of points), only to throw the data
away when it finds that the month requested does not match the month
value stored in the station data. This would be far better done with
the clauses reversed:
?station&station.month={3,4,5}&station.cast.O2>15.0
This expression will evaluate much more quickly because unwanted
stations may be quickly discarded by the first sub-clause of the
selection. The server will only examine each oxygen value in the
station if it already knows that the station might be worth keeping.
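The effect of clause order can be illustrated with a small Python simulation that counts comparisons. It is a toy model, not the server's evaluator: the station data are invented, a month test is charged one comparison, and the oxygen test is charged one comparison per oxygen value.

```python
def month_clause(station, counter):
    """station.month={3,4,5}: one comparison per station."""
    counter["comparisons"] += 1
    return station["month"] in {3, 4, 5}


def oxygen_clause(station, counter):
    """station.cast.O2>15.0: looks at every oxygen value in the station."""
    found = False
    for value in station["o2"]:
        counter["comparisons"] += 1
        if value > 15.0:
            found = True
    return found


def evaluate(stations, clauses):
    """Evaluate clauses in order for each station, stopping at the first false one."""
    counter = {"comparisons": 0}
    kept = []
    for station in stations:
        # all() over a generator stops at the first clause that returns False.
        if all(clause(station, counter) for clause in clauses):
            kept.append(station["id"])
    return kept, counter["comparisons"]


# Three hypothetical stations with 1000 oxygen values each.
stations = [
    {"id": "A", "month": 1, "o2": [10.0] * 1000},
    {"id": "B", "month": 4, "o2": [10.0] * 999 + [16.0]},
    {"id": "C", "month": 11, "o2": [16.0] * 1000},
]

oxygen_first = evaluate(stations, [oxygen_clause, month_clause])
month_first = evaluate(stations, [month_clause, oxygen_clause])
```

Both orders keep the same station, but in this toy data the month-first order needs roughly a third of the comparisons, and the gap grows with the number of stations that the cheap clause rejects.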
This sort of optimization becomes even more important when one of the
clauses contains a URL. In general, any selection sub-clause
containing a URL should be left to the end of the selection. This way,
the OPeNDAP server will only be forced to go to the network for data if
absolutely necessary to evaluate the constraint expression.
A Word About Data Translation
Once a researcher is freed from the
confines of using only local data, he or she will soon discover that
there is a wealth of data available on the Internet, and nearly all of
it is stored in formats incompatible with his or her own. Worse, the data
formats are often mutually incompatible, rendering the confusion
complete. OPeNDAP provides a solution applicable to a great many such
problems.
When an OPeNDAP server
retrieves data from some distant machine, that data may be in any of
several file formats supported by OPeNDAP. The server translates the
data, however, into an intermediate format for transmission. Upon
receipt of the messages containing data, the OPeNDAP client software
unpacks the data into the form expected by the calling client program
and returns it to that program. Because all data must be translated
into the same intermediate format, OPeNDAP becomes a powerful format
translator for datasets. In effect, this means that a program
designed to read and display JGOFS data can look at the OPeNDAP data
catalog and see everything as JGOFS datasets. A netCDF program can
look at those same datasets, from that same catalog, and think they
are all in netCDF format. This system of translation allows a
researcher to ignore the question of formats and concentrate on the
data alone.
Of course, there are some translations that cannot be done
transparently, if they can be done at all. Consider a two-dimensional
array of satellite sea-surface temperature measurements. Assume the
data is stored in netCDF format on some machine called
satt.uri.edu. The data might be uniquely specified by some URL,
say http://satt.uri.edu/sst/010694.nc. However, were a user to
feed that URL to a JGOFS-originated OPeNDAP client designed to draw
property vs. depth graphs of station data, no translation facility
would be able to map the original data into a form accommodated by the
client program.
The issues of data models and data translation are important ones to
the data provider. These issues are discussed in detail in
( data,trans)