Aggregation enhancements

From OPeNDAP Documentation
⧼opendap2-jumptonavigation⧽

In response to requests from NASA, and with their support, we have added two new kinds of aggregation to Hyrax. Both of these aggregation operations provide a way for client software to specify the granules that will be used to build the aggregate result. While our existing aggregation interface, based on NcML, works well for NASA's level 3 data products, it is all but useless for level 2 'swath data'. These aggregation functions are specifically designed to work with satellite swath data without being limited to just swath data and are explicitly intended for use with search interfaces that have knowledge of the individual files that make up typical satellite data sets (often called a dataset inventory).

Overview of the new capability

Providing search results that include explicit references to hundreds or thousands of discrete files has been the only option for many search interfaces up to this point. This is especially when the datasets holds satellite swath data because swath data are not easily aggregated. For this interface to Hyrax's aggregation software, we provide two kinds of responses: Data in multiple files that are bundled together using an zip archive and data in tabular form. For clients that request the aggregate result in a zip file, given a request for values from N files, there will be N entries in the resulting zip archive. Some of these entries may simply indicate that no data matching the spatial or other constraints were found. While the source data files can be in any format that the Hyrax server can read, the response will be either netCDF3, netCDF4 or ASCII. The netCDF3/4 files returned will conform to CF 1.6 to the extent possible (the underlying data files may lack information CF 1.6 requires). For clients that request data in tabular form, the data from N files will be returned in one ASCII CSV response. These values can be easily assimilated by database systems, Excel and other tools.

Intended audience

This service was originally intended for software developers working data search tools who need to be able to return results that encompass hundreds or thousands of granules. It works best from a programmatic interface, but it's certainly open to end users, see the examples using curl for one way to access the service.

Accessing the Aggregation Services

This 'service' is accessed using HTTP's GET or POST methods. In this documentation I will describe how to use POST to send information, but the same key-value parameters can be sent using the GET method, albeit within the character limits of a URL (which vary depending on implementation).

The service is accessed using the following set of key-value parameters:

operation
Use operation to select from various kinds of responses. The form of the response also determines how the aggregation is built. The current values for this parameter are: version which returns information about the service's version; file returns a collection of files; netcdf3, netcdf4, ascii all translate the underlying granule format to netcdf3, etc., and return that collection of translated files; csv returns data from many granules as a single table of data, using Comma Separated Values (csv) format. More information about this is given below.
var
A comma-separated list of variables to include in the files returned when using operation equal to netcdf3, netcdf4, ascii, or csv
bbox
Limit the values returned to those that fall within a bounding box for a given variable. Like var, this applies only to netcdf3, netcdf4, ascii, or csv
file
The URL path component to a granule served by Hyrax. This parameter will appear one for each file.

Performance

Performance is linear in terms of the number of granules. The response is streamed as it is built, so even very large responses use only a little memory on the server. Of course, that won't be the case on the client...

Requesting collections of files

This returns a Zip file containing a number of resources read/produced by the Hyrax BES. The ZIP64(tm) format extensions are used to overcome the size limitations of the original ZIP format.It can handle a list of resources (typically files) and simply return them, unaltered or translated into netCDF files. In the later case, a constraint expression can be applied to each resource before the transformation takes place, limiting the variables and/or parts of variables in the resulting netCDF file. Note that for the netCDF format response to work, the BES must be able to read the format of the original resource (e.g., HDF4).

Requesting tabular results

Server functions used