Server Side Processing Functions: Difference between revisions

From OPeNDAP Documentation
⧼opendap2-jumptonavigation⧽
Line 223: Line 223:


=== bbox() ===
=== bbox() ===
 
Brief: Return the bounding box for an array
Return the bounding box for an array


Given an N-dimensional Array of simple numeric types and two
Given an N-dimensional Array of simple numeric types and two

Revision as of 22:04, 27 April 2015

Server functions and composition

Server-side functions provide a way to access the processing power of the data server and perform operations that fall outside the scope of the DAP constraint mechanism of projection and selection. Each server can load functions at run-time, so the set of functions supported may be different than those documented here. Use the version() function to get a list of functions supported by a particular server. To get information about a particular function, call that function with no arguments. The 'help' response from both version() and a function such as linear_scale() is a simple XML document listing the function's name, version and URL to more complete documentation.

All the functions here are included in the Hyrax server, version 1.6 and later. Other servers may also support these.

All of these functions can be composed. Thus, the values from the geogrid() function can be used by the linear_scale() function. Here's an example:

 linear_scale(geogrid(SST, 45, -82, 40, -78))  // spaces added for clarity; in reality spaces in server function arg lists are a syntax error

This first subsets the variable SST so only those values in latitude 45 to 40 and longitude -82 to -78 are returned and then passes those values to the linear_scale() function, which will scale them and return those new values to the caller.

Server Functions in the BES functions Module

For Hyrax 1.9, the server functions listed here were moved from libdap, where they were 'hard coded' into the constraint evaluator, to a module that is loaded like the other BES modules. Currently this 'functions' module is part of the BES source code while we decide where it should reside. Also note that make_array(), the #Type special form, bind_name() and bind_shape() are new functions designed to pass large arrays filled with constant values into custom server functions. We will expand on these as part of an NSF-sponsored project in the coming two years.

geogrid()

Version documented: 1.2

The geogrid() function applies a constraint given in latitude and longitude to a DAP Grid variable. The arguments to the function are:

geogrid(grid variable, top, left, bottom, right[, expression ...])
geogrid(grid variable, latitude map, longitude map, top, left, bottom, right[, expression ...])

The grid variable is the data to be sub-sampled and must be a Grid. The optional latitude and longitude maps must be Maps in the named Grid and specifying these overrides the geogrid heuristics for choosing the lat/lon maps. The Top, left, bottom, right are the latitude and longitude coordinates of the northwesterm and southeastern corners of the selection box. The expressions consist of one or more quoted relational expressions. See grid() for more information about the expressions.

The function will always return a single Grid variable whose values completely cover the given region, although there may be cases when some additional data are also returned. If the longitude values 'wrap around' the right edge of the data, then the function will make two requests and return those joined together as a single Grid. If the data are stored with the southern latitudes at the top of the array, the return result will be flipped so that the northern latitudes are at the top. If the Longitude values are offset, the function will correct for that, as well.


Version documented: 1.1

The geogrid() function applies a constraint given in latitude and longitude to a DAP Grid variable. The arguments to the function are:

 geogrid(variable, top, left, bottom, right[, expression ...])

The variable is the data to be sub-sampled. The Top, left, bottom, right are the latitude and longitude coordinates of the northwesterm and southeastern corners of the selection box. The expressions consist of one or more quoted relational expressions. See grid() for more information about the expressions.

The function will always return a single Grid variable whose values completely cover the given region, although there may be cases when some additional data are also returned. If the longitude values 'wrap around' the right edge of the data, then the function will make two requests and return those joined together as a single Grid. If the data are stored with the southern latitudes at the top of the array, the return result will be flipped so that the northern latitudes are at the top.

grid()

Version documented: 1.0

The grid() function takes a DAP Grid variable and zero or more relational expressions. Each relational expression is applied to the grid using the server's constraint evaluator and the resulting grid is returned. The expressions may use constants and the grid's map vectors but may not use any other variables. In particular, you cannot use the grid values themselves

Two forms of expression are provided:

  1. "var relop const"
  2. "const relop var relop const"

Where relop stands for one of the relational operators, like = and >

For example: grid(sst,"20>TIME>=10") and grid(sst,"20>TIME","TIME>=10") are both legal and, in this case, also equivalent.

linear_scale

Version documented: 1.0b1

The linear_scale() function applies the familiar y = mx + b equation to data. It has three forms:

  1. linear_scale(var)
  2. linear_scale(var,scale_factor,add_offset)
  3. linear_scale(var,scale_factor,add_offset,missing_value)

If only the name of a variable is given, the function looks for the COARDS/CF-1.0 scale_factor, add_offset and missing_value attributes. In the equation, 'm' is scale_factor, 'b' is add_offset and data values that match missing_value are not scaled.

If add_offset cannot be found, it defaults to zero; if missing_value cannot be found, the test for it is not performed.

In the second and third form, if the given values conflict with the dataset's attributes, the given values override.

The make_array() function

The make_array() function takes three or more arguments and returns a DAP2 Array with the values passed to the function.

make_array(<type>, <shape>, <values>, ...)
<type> is any of the DAP2 numeric types (Byte, Int16, UInt16, Int32, UInt32, Float32, Float64); <shape> is a string that indicates the size and number of the array's dimensions. Following those two arguments are N arguments that are the values of the array. The number of values must equal the product of the dimension sizes.

Example: make_array(Byte,"[4][4]",2,3,4,5,2,3,4,5,2,3,4,5,2,3,4,5) will return a DAP2 four by four Array of Bytes with the values 2, 3, ... . The Array will be named g<int> where <int> is 1, 2, ..., such that the name does not conflict with any existing variable in the dataset. Use bind_name() to change the name.

This function can build an array with 1024 X 1024 Int32 elements in about 4 seconds.

The 'make array' special forms

These special forms can build vectors with specific values and return them as DAP2 Arrays. The Array variables can be named using the bind_name() function and have their shape set using bind_shape().

$<type>(size hint,: values, ...)
The $<type> ($Byte, $Int32, ...) literal starts the special form. The first argument size hint provides a way to preallocate the memory needed to hold the vector of values. Following that, the values are listed. Unlike make_array(), it is not necessary to provide the exact size of the vector; the size hint is just that, a hint. If a size hint of zero is supplied, it will be ignored. Any of the DAP2 numeric types can be used with this special form. This is called a 'special form' because it invokes a custom parser that can process values very efficiently.

Example: $Byte(16:2,3,4,5,2,3,4,5,2,3,4,5,2,3,4,5) will return a one dimensional (i.e., a vector) Array of Bytes with values 2, 3, ... . The vector is named g<int> just like the array returned by make_array(). The vector can be turned in to a N-dimensional Array using bind_shape() using bind_shape("[4][4]",$Byte(16:2,3,4,5,2,3,4,5,2,3,4,5,2,3,4,5)).

The special forms can make a 1,047,572 element vector on Int32 in 0.4 seconds, including the time required to parse the million plus values.

Performance measurements

Time to make 1,000,000 (actually 1,048,576) element Int32 array using the special form, where the argument vector<int> was preset to 1,048,576 elements. Times are for 50 repeats.

Summary: Using the special for $Int32(size_hint, values...) is about 10 times faster for a 1 million element vector than make_array(Int32,[1048576],values...). As part of the performance testing, the scanner and parser were run under a sampling runtime analyzer ('Instruments' on OS/X) and the code was optimized so that long sequences of numbers would scan and parse more efficiently. This benefited both the make_array() function and #type() special form.

Raw timing data

In all cases, a 1,048,576 element vector of Int32 was built 50 times. The values were serialized and written to /dev/null using the command time besstandalone -c bes.conf -i bescmd/fast_array_test_3.dods.bescmd -r 50 > /dev/null where the .bescmd file lists a massive constraint expression (a million values). The same values were used.

NB: The DAP2 consraint expression scanner was improved based on info from 'instruments', an OS/X profiling tool. Copying values and applying www2id escaping was moved from the scanner, where it was applied it to every token that matched SCAN_WORD, to the parser, where it was used only for non-numeric tokens. This performance tweak makes a big difference in this case since there are a million SCAN_WORD tokens that are not symbols.

Runtimes for make_array() and $type, scanner/parser optimized, two trials
Time in seconds
What Real (s) User System
$type, with hint 19.844 19.355 0.437
$type, with hint 19.817 19.369 0.427
$type, no hint 19.912 19.444 0.430
$type, no hint 19.988 19.444 0.428
make_array() 195.332 189.271 6.058
make_array() 197.900 191.628 6.254

bind_name() and bind_shape()

These functions take a BaseType* object and bind a name or shape to it (in the latter case the BaseType* must be an Array*). They are intended to be used with make_array() and the $type special forms, but they can be used with any variable in a dataset.

bind_name(name,variable)
The name must not exist in the dataset; variable may be the name of a variable in the dataset (so this function can rename an existing variable) or it can be a variable returned by another function or special form.
bind_shape(shape expression,variable)
The shape expression is a string that gives the number and size of the array's dimensions; the variable may be the name of a variable in the dataset (so this function can rename an existing variable) or it can be a variable returned by another function or special form.

Here's an example showing how to combine bind_name, bind_shape and $Byte to build an array of constants: bind_shape("[4][4]",bind_name("bob",$Byte(0:2,3,4,5,2,3,4,5,2,3,4,5,2,3,4,5))). The result, in a browser, is:

Dataset: function_result_coads_climatology.nc
bob[0], 2, 3, 4, 5
bob[1], 2, 3, 4, 5
bob[2], 2, 3, 4, 5
bob[3], 2, 3, 4, 5

Unstructured Grid subsetting

The ugr5() function subsets an Unstructured Grid (aka flexible mesh) if it conforms to the Ugrid Conventions built around netCDf and CF. More information on subsetting files that conform to this convention can be found here.

See ugr5 documentation for more information.

This function is optional with Hyrax and is provided by the ugrid_functions module.

version()

The version function provides a list of the server-side processing functions available on a given server along with their versions. For information on a specific function, call it with no arguments or look at this page.

tabular()

roi()

Brief: Subset N arrays using index slicing information

This function should be called with a series of array variables, each of which are N-dimensions or greater, where the N common dimensions should all be the same size. The intent of this function is that a N-dimensional bounding box, provided in indicial space, will be used to subset each of the arrays. There are other functions that can be used to build these bounding boxes using values of dataset variables - see bbox() and bbox_union(). Taken together, the roi(), bbox() and bbox_union() functions can be used to subset a collection of Arrays where some arrays are taken to be dependent variables and others independent variables. The result is a subset of 'discrete coverage' the collection of independent and dependent variables define.

roi(array1, array2, ..., arrayN, bbox(...))
roi(array1, array2, ..., arrayN, bbox_union(bbox(...), bbox(...), ..., "union"))
Subset array1, ..., using the bound box given as the last argument. Teh assumption is that the arrays will be the range variables of a coverage and that the bounding boxes will be computed using the range variables. See the bbox() and bbox_union() function descriptions.

bbox()

Brief: Return the bounding box for an array

Given an N-dimensional Array of simple numeric types and two minimum and maximum values, return the indices of a N-dimensional bounding box. The indices are returned using an Array of Structure, where each element of the array holds the name, start index and stop index in fields with those names.

It is up to the caller to make use of the returned values; the array is not modified in any way other than to read in it's values (and set the variable's read_p property).

The returned Structure Array has the same name as the variable it applies to, so that error messages can reference the source variable.

bbox(array, min-value, max-value)
Given that array is an N-dimensional array, return a DAP Array with N elements. Each element is a DAP Structure with two fields, the indices corresponding to the first and last occurrence of the values min-value and max-value.

bbox_union()

SummaryL Combine several bounding boxes, forming their union.

This combines N BBox variables (Array of Structure) forming either their union or intersection, depending on the last parameter's value ("union" or "inter[section]").

If the function is passed bboxes that have no intersection, an exception is thrown. This is so that callers will know why no data were returned. Otherwise, an empty response, while correct, could be baffling to the client.

bbox_union(bbox(a1, min-value-1, max-value-1), bbox(a2, min-2, max-2), ..., "union"|"intersection")
Given 1 or more bounding box Array of Structures (as returned by the bbox() function) form their union or intersection and return that bounding box (using the same Array of Structures representation).

Functions Included FreeForm Module

There are a number of date and time functions supported by the FreeForm server.

@TODO Add documentation for the functions

Projection functions

Selection functions