Wiki Testing/OPeNDAPUserGuide4

From OPeNDAP Documentation
Revision as of 10:55, 5 January 2008 by Yuan

Data Analysis with OPeNDAP

The OPeNDAP software is not only a data transport mechanism. Using OPeNDAP, you can subsample the data you are looking at. That is, you can request an entire data file, or just a small piece of it.

Selecting Data: Using Constraint Expressions

A URL such as this one:

http://dods.gso.uri.edu/cgi-bin/nph-nc/data/buoys.nc

refers to the entire dataset contained in the buoys.nc file. A user may, however, sample the dataset simply by modifying the submitted URL. A constraint expression attached to the URL directs the server to return only the data of interest from the dataset named by the first part of the URL. This works even for programs that have no built-in way to make such selections. It can greatly reduce the amount of data a program needs to process, and the network load of transmitting that data to the client.

Constraint Expression Syntax

A constraint expression is appended to the target URL following a question mark, as in the following examples:

 http://oceans.univ.edu/cgi-bin/nc/expl/buoys.nc?temp

 http://oceans.univ.edu/cgi-bin/nc/expl/buoys.nc?temp[1:5:100]

 http://oceans.univ.edu/cgi-bin/nc/expl/buoys.nc?u&lat>15.0

 http://oceans.univ.edu/cgi-bin/nc/expl/buoys.nc?cast.02<15.0
 
 http://oceans.univ.edu/cgi-bin/nc/expl/buoys.nc
                                   ?station&station.temp<15.0


A constraint expression consists of two parts: a projection and a selection, separated by an ampersand (&). Either part may contain several sub-expressions. Either part may be present, or both.

               proj_1,proj_2,...,proj_n&sel_1&sel_2&...&sel_m

A projection is simply a comma-separated list of the variables that are to be returned to the client. If an array is to be subsampled, the projection specifies the manner in which the sampling is to be done. If the selection is omitted, all the variables in the projection list are returned. If the projection is omitted, the entire dataset is returned, subject to the evaluation of the selection expression. The projection can also include functional expressions of the form:

         function(arg_1,arg_2,...,arg_n)

where the arguments are variables from the dataset, scalar values, or other functions.
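The two-part layout described above can be illustrated with a small client-side helper. This is only a sketch, not part of OPeNDAP: the names split_top_level and split_constraint are hypothetical, and the code assumes that no ampersand appears inside a quoted string value. Commas inside parentheses, brackets, or braces are not treated as separators, so function arguments and lists stay intact.

```python
def split_top_level(text, sep):
    """Split text on sep, ignoring separators nested inside (), [] or {}."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def split_constraint(ce):
    """Return (projections, selections) for a constraint expression string.

    Assumption: "&" never occurs inside a quoted string in the expression.
    """
    projection_part, *selections = ce.split("&")
    # An empty projection part means "no projection", i.e. the whole dataset
    projections = [p for p in split_top_level(projection_part, ",") if p]
    return projections, selections


print(split_constraint("function(arg_1,arg_2),temp&lat>15.0"))
```

An expression with an empty projection list corresponds to the case where the entire dataset is returned, subject to the selection.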

A simple selection expression is a boolean expression of the form

         variable operator variable

or

         variable operator value

or

         function(arg_1,arg_2,...,arg_n)

where

operator
   can be one of the relational operators listed in table 4.1.2;

variable
   can be any variable recorded in the dataset;

value
   can be any scalar, string, function, or list of numbers (lists are denoted by comma-separated items enclosed in curly braces, for example, {3,11,4.5}); and

function
   is a function defined by the server to operate on variables or values, and to return a boolean value (see Section 4.1.3).

Each selection clause begins with an ampersand (&) representing the "AND" boolean operation.

NOTE: The & is actually a prefix operator, not an infix operator. That is, it must appear at the beginning of each selection clause, no matter what. This means that a constraint expression that contains no projection clause must still have an & in front of the first selection clause.
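The prefix rule can be made concrete with a short sketch that assembles a constraint expression from its clauses. The function name build_constraint is hypothetical and exists only for illustration. Note that every selection clause, including the first, receives its own leading ampersand.

```python
def build_constraint(projections=(), selections=()):
    """Join projection and selection clauses into one constraint expression."""
    projection_part = ",".join(projections)
    # Every selection clause gets its own leading "&", including the first one
    selection_part = "".join("&" + clause for clause in selections)
    return projection_part + selection_part


# With a projection and two selection clauses
print(build_constraint(["station"], ["station.lat>0.0", "station.lon<-60.0"]))

# With no projection, the expression still starts with "&"
print(build_constraint(selections=["station.lat>0.0"]))
```

Treating the ampersand as part of each clause, rather than as a separator, is what keeps the no-projection case correct without special handling.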

There is no limit on the number of selection clauses that can be combined to create a compound constraint expression. Data that produces a true (non-zero) value for the entire selection expression will be included in the data returned to the client by the server. If only a part of some data structure, such as a Sequence, satisfies the selection criteria, then only that part will be returned.

NOTE: Due to the differences in data model paradigms, selection is not implemented for the OPeNDAP array data types, such as Grid or Array. However, many OPeNDAP servers implement selection functions you can use for the same effect. You can query the server for the functions it implements with the usage service outlined in Section 4.1.3.

Simple Constraint Expression Examples

Consider the data descriptor in figure 4.1.1. The figure is an example of the Dataset Descriptor Structure (DDS), one of the messages returned by an OPeNDAP server in response to a query about a dataset. The full syntax description for this structure is given in Section 6.4. For the moment, it is only important that it describes a dataset containing station data, including temperature, oxygen, and salinity. Each station also contains 20 oxygen data points, taken at 20 fixed depths, used for calibration of the data.

The following URL will return only the pressure and temperature pairs of this dataset. (Note that the constraint expression parser removes all spaces, tabs, and newline characters before the expression is parsed.) There is only a projection clause, without a selection, in this constraint expression.

Dataset {
   Sequence{
      Int32 day;
      Int32 month;
      Int32 year;
      Float64 lat;
      Float64 lon;
      Float64 O2cal[20];
      Sequence{
         Float64 press;
         Float64 temp;
         Float64 O2;
         Float64 salt;
      } cast;
      String comments;
   } station;
} arabian-sea;

Figure 4.1.1: Sample Data Descriptor

http://oceans.edu/cgi/nph-jg/explO2/cruise?station.cast.press,
                                           station.cast.temp

Incidentally, we have assumed that the dataset was stored in the JGOFS format on the remote host oceans.edu, in a file called explO2/cruise. For the sake of brevity, from here on we will omit the first part of the URL, to concentrate on the constraint expression alone.
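Since the remaining examples show the constraint expression alone, the following sketch shows one way a client might re-attach an expression to the dataset URL. The helper name constrained_url is hypothetical. It strips whitespace in the same way the text says the parser does, so an expression written across several lines collapses to a single URL. Percent-encoding of characters such as < and >, which some HTTP clients may require, is not handled here.

```python
def constrained_url(base_url, constraint):
    """Append a constraint expression to a dataset URL after a question mark."""
    # Mirror the parser's behaviour: drop spaces, tabs and newlines
    compact = "".join(constraint.split())
    return base_url + "?" + compact if compact else base_url


base = "http://oceans.edu/cgi/nph-jg/explO2/cruise"
expression = """station.cast.press,
                station.cast.temp"""
print(constrained_url(base, expression))
```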

If we only want to see pressure and temperature pairs below 500 meters deep, we can modify the constraint expression by adding a selection clause.

?station.cast.press,station.cast.temp&station.cast.press>500.0

In order to retrieve all of each cast that has any temperature reading greater than 22 degrees, use the following:

?station.cast&station.cast.temp>22.0

Simple constraint expressions may be combined into compound expressions with logical AND (&). To retrieve all stations west of 60 degrees West and north of the equator:

?station&station.lat>0.0&station.lon<-60.0

A logical OR can be implemented using a list of scalars: a clause that compares a variable against a list is true if the comparison holds for any member of the list. The following expression will select only stations taken north of the equator in April, May, June, or July.

?station&station.lat>0.0&station.month={4,5,6,7}
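To make the AND and OR semantics concrete, here is a small local emulation of how such clauses could be evaluated against records. It is a sketch only and does not reflect how any OPeNDAP server is implemented. The record layout, the names clause_matches and select, and the operator subset shown are all assumptions; table 4.1.2 of the guide remains the authority on the available operators.

```python
import operator

# Assumed subset of relational operators, for illustration only
OPERATORS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "=": operator.eq,
}


def clause_matches(record, field, op, value):
    """Evaluate one selection clause; a list on the right means 'any of'."""
    candidates = value if isinstance(value, (list, tuple)) else [value]
    return any(OPERATORS[op](record[field], v) for v in candidates)


def select(records, clauses):
    """Keep the records for which every clause is true (logical AND)."""
    return [r for r in records
            if all(clause_matches(r, *clause) for clause in clauses)]


# Hypothetical station records, invented for this example
stations = [
    {"lat": 12.5, "month": 5},
    {"lat": 8.0, "month": 11},
    {"lat": -3.0, "month": 6},
]

# Emulates: ?station&station.lat>0.0&station.month={4,5,6,7}
northern_spring = select(stations, [("lat", ">", 0.0),
                                    ("month", "=", [4, 5, 6, 7])])
print(northern_spring)
```

Only the first record passes both clauses: the second fails the month list, and the third fails the latitude test.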

If our dataset contained a field called monsoon-month, indicating the month in which monsoons happened that year, we could modify the last example search to include those months as follows:

?station&station.lat>0.0
        &station.month={4,5,6,7,station.monsoon-month}

In other words, a list can contain both values and other variables. If monsoon-month was itself a list of months, a search could be written as:

?station&station.lat>0.0&station.month=station.monsoon-month

For arrays and grids, there is a special way to select data within the projection clause. Suppose we want to see only the first five oxygen calibration points for each station. The constraint expression for this would be:

?station.O2cal[0:4]


By specifying a stride value, we can also select a hyperslab of the oxygen calibration array:

?station.O2cal[0:5:19]

This expression will return every fifth member of the O2cal array. In other words, the result will be a four-element array containing only the first, sixth, eleventh, and sixteenth members of the O2cal array. Each dimension of a multi-dimensional array may be subsampled in an analogous way. The return value is an array with the same number of dimensions as the sampled array, with each dimension size equal to the number of elements selected from it.
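The index arithmetic can be checked with a short sketch. It assumes, as the examples above imply, that the bracket notation reads [start:stride:stop] with zero-based indices and an inclusive stop. The function names are hypothetical.

```python
def hyperslab_indices(start, stride, stop):
    """Indices selected by [start:stride:stop]; the stop index is inclusive."""
    return list(range(start, stop + 1, stride))


def hyperslab_shape(slabs):
    """Shape of the returned array, given one (start, stride, stop) per dimension."""
    return tuple(len(hyperslab_indices(*slab)) for slab in slabs)


# [0:5:19] on the 20-element calibration array
print(hyperslab_indices(0, 5, 19))

# [0:4] is the same as [0:1:4], a stride of one
print(hyperslab_indices(0, 1, 4))

# A two-dimensional request such as [0:5:19][0:4]
print(hyperslab_shape([(0, 5, 19), (0, 1, 4)]))
```

Because the stop index is inclusive, the equivalent Python slice for [0:5:19] is 0:20:5 rather than 0:19:5, although both happen to select the same four elements here.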


Operators, Special Functions, and Data Types