Wiki Testing/OPeNDAPUserGuide4: Difference between revisions

From OPeNDAP Documentation

Revision as of 11:03, 5 January 2008

Data Analysis with OPeNDAP

The OPeNDAP software is not only a data transport mechanism. Using OPeNDAP, you can subsample the data you are looking at. That is, you can request an entire data file, or just a small piece of it.

Selecting Data: Using Constraint Expressions

A URL such as this one:

http://dods.gso.uri.edu/cgi-bin/nph-nc/data/buoys.nc

refers to the entire dataset contained in the buoys.nc file. A user may, however, choose to sample the dataset simply by modifying the submitted URL. A constraint expression attached to the URL directs the server to return only the data of interest from the dataset named by the first part of the URL. This works even for programs that have no built-in way to make such selections. It can vastly reduce the amount of data a program needs to process, and reduce the network load of transmitting that data to the client.

Constraint Expression Syntax

A constraint expression is appended to the target URL following a question mark, as in the following examples:

 http://oceans.univ.edu/cgi-bin/nc/expl/buoys.nc?temp

 http://oceans.univ.edu/cgi-bin/nc/expl/buoys.nc?temp[1:5:100]

 http://oceans.univ.edu/cgi-bin/nc/expl/buoys.nc?u&lat>15.0

 http://oceans.univ.edu/cgi-bin/nc/expl/buoys.nc?cast.O2<15.0
 
 http://oceans.univ.edu/cgi-bin/nc/expl/buoys.nc
                                   ?station&station.temp<15.0


A constraint expression consists of two parts: a projection and a selection, separated by an ampersand (&). Either part, or both, may be present, and each part may contain several sub-expressions.

               proj_1,proj_2,...,proj_n&sel_1&sel_2&...&sel_m

A projection is simply a comma-separated list of the variables that are to be returned to the client. If an array is to be subsampled, the projection specifies the manner in which the sampling is to be done. If the selection is omitted, all the variables in the projection list are returned. If the projection is omitted, the entire dataset is returned, subject to the evaluation of the selection expression. The projection can also include functional expressions of the form:

         function(arg_1,arg_2,...,arg_n)

where the arguments are variables from the dataset, scalar values, or other functions.
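The structure described above can be illustrated with a small client-side sketch that splits a constrained URL into its dataset URL, projection list, and selection clauses. This is not part of any OPeNDAP library: the function names `split_top_level` and `parse_constraint` are hypothetical, and the only rule assumed beyond the text is that commas inside parentheses, brackets, or braces belong to a function call, array subsample, or list rather than separating projections.

```python
def split_top_level(text, sep):
    """Split text on sep, ignoring separators nested in (), [] or {}."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    # Drop empty pieces, such as the one before a leading '&'.
    return [p for p in parts if p]


def parse_constraint(url):
    """Return (dataset_url, projections, selections) for a constrained URL."""
    dataset_url, _, expression = url.partition("?")
    clauses = split_top_level(expression, "&")
    if expression.startswith("&"):
        # No projection part: every clause is a selection.
        projections, selections = [], clauses
    else:
        projections = split_top_level(clauses[0], ",") if clauses else []
        selections = clauses[1:]
    return dataset_url, projections, selections


print(parse_constraint("http://oceans.univ.edu/cgi-bin/nc/expl/buoys.nc?u&lat>15.0"))
```

A projection such as `function(arg_1,arg_2),temp` is kept as two entries, because the comma inside the parentheses is not treated as a separator.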

A simple selection expression is a boolean expression of one of the following forms:

 variable operator variable
 variable operator value
 function(arg_1,arg_2,...,arg_n)

where

operator
   can be one of the relational operators listed in table 4.1.2;
variable
   can be any variable recorded in the dataset;
value
   can be any scalar, string, function, or list of numbers (lists are denoted by comma-separated items enclosed in curly braces, for example, {3,11,4.5}); and
function
   is a function defined by the server to operate on variables or values, and to return a boolean value (see Section 4.1.3).

Each selection clause begins with an ampersand (&) representing the "AND" boolean operation.

NOTE: The & is actually a prefix operator, not an infix operator. That is, it must appear at the beginning of each selection clause, no matter what. This means that a constraint expression that contains no projection clause must still have an & in front of the first selection clause.
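As a minimal sketch of the prefix rule in the note above, the following helper assembles a constraint expression from a list of projections and a list of selection clauses. The name `build_constraint` is hypothetical, and the sketch does no URL escaping or validation; it only shows where the `&` characters go.

```python
def build_constraint(projections=(), selections=()):
    """Assemble a constraint expression string, including the leading '?'."""
    projection_part = ",".join(projections)
    # Every selection clause carries its own '&' prefix, even the first one
    # and even when the projection part is empty.
    selection_part = "".join("&" + clause for clause in selections)
    return "?" + projection_part + selection_part


print(build_constraint(["station"], ["station.lat>0.0", "station.lon<-60.0"]))
print(build_constraint(selections=["lat>15.0"]))
```

The second call has no projection, so the result still begins with `?&`, as the note requires.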

There is no limit on the number of selection clauses that can be combined to create a compound constraint expression. Data that produces a true (non-zero) value for the entire selection expression will be included in the data returned to the client by the server. If only a part of some data structure, such as a Sequence, satisfies the selection criteria, then only that part will be returned.

NOTE: Due to the differences in data model paradigms, selection is not implemented for the OPeNDAP array data types, such as Grid or Array. However, many OPeNDAP servers implement selection functions you can use for the same effect. You can query the server for the functions it implements with the usage service outlined in Section 4.1.3.

Simple Constraint Expression Examples

Consider the data descriptor in figure 4.1.1. The figure is an example of the Dataset Descriptor Structure, one of the messages returned by an OPeNDAP server in response to a query about some dataset. The full syntax description for this structure is given in Section 6.4. For the moment, it is only important that it is the description of a dataset containing station data including temperature, oxygen, and salinity. Each station also contains 20 oxygen data points, taken at 20 fixed depths, used for calibration of the data.

The URL shown after the sample descriptor returns only the pressure and temperature pairs of this dataset. (Note that the constraint expression parser removes all spaces, tabs, and newline characters before the expression is parsed.) This constraint expression contains only a projection clause, with no selection.

Dataset {
   Sequence{
      Int32 day;
      Int32 month;
      Int32 year;
      Float64 lat;
      Float64 lon;
      Float64 O2cal[20];
      Sequence{
         Float64 press;
         Float64 temp;
         Float64 O2;
         Float64 salt;
      } cast;
      String comments;
   } station;
} arabian-sea;

Figure 4.1.1: Sample Data Descriptor

http://oceans.edu/cgi/nph-jg/exp1O2/cruise?station.cast.press,
                                           station.cast.temp
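Because the parser discards whitespace, an expression laid out over several lines, like the one above, is equivalent to its single-line form. A minimal sketch of that normalization follows; the function name `normalize_expression` is hypothetical, and the sketch simply mirrors the stated behavior rather than reproducing the server's parser.

```python
def normalize_expression(expression):
    """Remove all whitespace (spaces, tabs, newlines) from an expression."""
    # str.split() with no arguments splits on any run of whitespace.
    return "".join(expression.split())


multi_line = """?station.cast.press,
                 station.cast.temp"""
print(normalize_expression(multi_line))
```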

Incidentally, we have assumed that the dataset was stored in the JGOFS format on the remote host oceans.edu, in a file called exp1O2/cruise. For the sake of brevity, from here on we will omit the first part of the URL, to concentrate on the constraint expression alone.

If we only want to see pressure and temperature pairs below 500 meters deep, we can modify the constraint expression by adding a selection clause.

?station.cast.press,station.cast.temp&station.cast.press>500.0

In order to retrieve all of each cast that has any temperature reading greater than 22 degrees, use the following:

?station.cast&station.cast.temp>22.0

Simple constraint expressions may be combined into compound expressions with logical AND (&). To retrieve all stations west of 60 degrees West and north of the equator:

?station&station.lat>0.0&station.lon<-60.0

As was mentioned, the logical OR can be implemented using a list of scalars. The following expression will select only stations taken north of the equator in April, May, June, or July.

?station&station.lat>0.0&station.month={4,5,6,7}
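To make the meaning of this expression concrete, the sketch below applies the same test to a few local records: a station is kept when its latitude is positive AND its month equals any value in the list. The sample records and the function name `select_north_in_months` are invented for illustration; this emulates the semantics on the client and is not how a server evaluates the expression.

```python
# Made-up sample records using field names from the sample descriptor.
stations = [
    {"lat": 12.5, "lon": 64.0, "month": 5},
    {"lat": -3.0, "lon": 58.2, "month": 6},
    {"lat": 8.1, "lon": 61.7, "month": 11},
]


def select_north_in_months(records, months):
    """Keep records with lat > 0.0 whose month equals any listed month."""
    return [r for r in records if r["lat"] > 0.0 and r["month"] in months]


selected = select_north_in_months(stations, {4, 5, 6, 7})
print(selected)
```

Only the first record passes: the second is south of the equator, and the third falls outside the listed months.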

If our dataset contained a field called monsoon-month, indicating the month in which monsoons happened that year, we could modify the last example search to include those months as follows:

?station&station.lat>0.0
        &station.month={4,5,6,7,station.monsoon-month}

In other words, a list can contain both values and other variables. If monsoon-month was itself a list of months, a search could be written as:

?station&station.lat>0.0&station.month=station.monsoon-month

For arrays and grids, there is a special way to select data within the projection clause. Suppose we want to see only the first five oxygen calibration points for each station. The constraint expression for this would be:

?station.O2cal[0:4]


By specifying a stride value, we can also select a hyperslab of the oxygen calibration array:

?station.O2cal[0:5:19]

This expression will return every fifth member of the O2cal array. In other words, the result will be a four-element array containing only the first, sixth, eleventh, and sixteenth members of the O2cal array. Each dimension of a multi-dimensional array may be subsampled in an analogous way. The return value is an array with the same number of dimensions as the sampled array, with each dimension size equal to the number of elements selected from it.
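The subsampling rule can be checked with a short sketch that emulates `[start:stride:stop]` on a local list. The function name `hyperslab` is hypothetical. The one point to watch is that the stop index is inclusive here (`[0:4]` yields five elements), whereas Python slices exclude their stop index and place the stride last.

```python
def hyperslab(values, start, stride, stop):
    """Emulate [start:stride:stop]; unlike Python slices, stop is inclusive."""
    return values[start:stop + 1:stride]


o2cal = list(range(20))            # stand-in for the 20 calibration points
print(hyperslab(o2cal, 0, 5, 19))  # elements at indices 0, 5, 10, 15
print(hyperslab(o2cal, 0, 1, 4))   # [0:4] selects the first five elements
```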


Operators, Special Functions, and Data Types

The data types accessible through the OPeNDAP software are listed and described in Section 6.3. It is advisable to be familiar with these types before trying to construct complex constraint expressions.

The constraint expression syntax defines a number of operators for each data type. These operators are listed in table 4.1.2.

Except for the * operation defined on the URL data type, all the operators defined for the scalar base types are boolean operators whose result depends on the specified comparison between its arguments. Refer to Section 4.1.4 for a description of the URL data type and its operator.

The ~= operator returns true when the character string on the left of the operator matches the regular expression on the right. See Section 4.1.5 for a discussion of regular expressions.
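As a rough client-side analogy, the `~=` test resembles a regular-expression search over a string value. The sketch below uses Python's `re` module; the function name `regex_match` is hypothetical, and both the regular-expression dialect and whether the pattern must match the whole string are assumptions here, since those details are covered in Section 4.1.5 rather than in this section.

```python
import re


def regex_match(value, pattern):
    """Return True when pattern is found anywhere in value (assumed semantics)."""
    return re.search(pattern, value) is not None


print(regex_match("calm seas, light rain", "rain"))
print(regex_match("calm seas", "^storm"))
```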

The Structure, Sequence, and Grid data types are each composed of a collection of simpler data types. The . and operators allow a user to refer to the subsidiary variables within these compound types. For example, station.year indicates the value of the year member of the station sequence.

The array operator [] is used to subsample the given array. See the array examples under Simple Constraint Expression Examples for an explanation and examples of its use.

Table 4.1.2: Constraint Expression Operators

Class                            Operators
Simple Types
  Byte, Int32, UInt32, Float64   < > = != <= >=
  String                         = != ~=
  URL                            *
Compound Types
  Array                          [start:stop] [start:stride:stop]
  List                           length(list), nth(list,n), member(list,elem)
  Structure                      *
  Sequence                       *
  Grid                           [start:stop] [start:stride:stop] .



There are three special functions defined to operate on the List data type. The length() function returns the number of elements in the given list, the nth() function returns the list element indicated by the input index, and the member() function returns true if the given value equals any member of the list. Note that the behavior of the nth() function is undefined for indices beyond the range of the list.
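The three List functions can be emulated locally as follows. This is only a sketch of their described behavior: zero-based indexing for nth() is an assumption (the text does not say where indexing starts), and because the text leaves out-of-range indices undefined, this sketch chooses to raise an error rather than guess at a result.

```python
def length(lst):
    """Number of elements in the list."""
    return len(lst)


def nth(lst, n):
    """Element at index n (zero-based by assumption)."""
    if not 0 <= n < len(lst):
        # Behavior is undefined in the text; failing loudly is this sketch's choice.
        raise IndexError("nth() index out of range")
    return lst[n]


def member(lst, elem):
    """True if elem equals any member of the list."""
    return elem in lst


months = [4, 5, 6, 7]
print(length(months), nth(months, 2), member(months, 6), member(months, 9))
```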


Using Functions in a Constraint Expression

An OPeNDAP data server may define its own set of functions that may be used in a constraint expression. For example, the data server containing the example data from figure 4.1.1 might define a sigma1() function to return the density of the water at the given temperature, salinity, and pressure. A query like the following would return all the stations containing water samples whose density exceeded 1.0275 g/cm³.

?station.cast&sigma1(station.cast.temp,
                     station.cast.salt,
                     station.cast.press)>27.5
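The threshold 27.5 in this query corresponds to the density 1.0275 g/cm³ quoted in the text if the function reports density in the usual oceanographic sigma convention, that is, density in kg/m³ minus 1000. That convention is an assumption about the hypothetical sigma1() function, and the helper name below is likewise invented; the sketch only shows the unit arithmetic, not how density is computed from temperature, salinity, and pressure.

```python
def sigma_from_density(density_g_per_cm3):
    """Convert density in g/cm^3 to a sigma value: (density in kg/m^3) - 1000."""
    density_kg_per_m3 = density_g_per_cm3 * 1000.0
    return density_kg_per_m3 - 1000.0


# Approximately 27.5, up to floating-point rounding.
print(round(sigma_from_density(1.0275), 6))
```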

Functions like this one are not a standard part of the OPeNDAP architecture, and may vary from one server to another. A user may query a server for a list of such functions by sending a URL ending with ".info". For example, you can query the data server installed on the OPeNDAP home site with the following URL:

  http://dods.gso.uri.edu/cgi-bin/nph-nc/fnoc1.nc.info


The data returned will be an HTML message, readable with a standard web browser, containing documentation of the server running on the given site, and the data named in the URL. In this case, you will learn that the specified server defines two functions that can be used in a constraint expression:

geolocate(variable, lat1, lat2, lon1, lon2)
   Returns the elements of variable that fall within the box created by (lat1,lon1) and (lat2,lon2). 
time(variable, start_time, stop_time)
   Returns the elements of variable that fall within the time interval start_time and stop_time.
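The two steps described in this section, asking a server for its documentation and then calling one of its functions, can be sketched as simple string construction. The helper names `info_url` and `function_call` are hypothetical, the variable name `temp` and the numeric bounds are made up for illustration, and nothing here contacts a server.

```python
def info_url(data_url):
    """Build the '.info' URL for a dataset, dropping any constraint expression."""
    dataset, _, _ = data_url.partition("?")
    return dataset + ".info"


def function_call(name, *args):
    """Format a server function call such as geolocate(variable,lat1,...)."""
    return name + "(" + ",".join(str(a) for a in args) + ")"


print(info_url("http://dods.gso.uri.edu/cgi-bin/nph-nc/fnoc1.nc"))
# Argument order follows the listing above: variable, lat1, lat2, lon1, lon2.
print("?" + function_call("geolocate", "temp", 10, 20, 50, 70))
```

Since such functions vary from one server to another, the `.info` response should be checked before relying on any particular function name or argument order.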