Point Of Contact: A Human

Description

Satellite_Swath_1

Goal

The user wants to get data from several granules of a many-granule data set using a single request. The user does not want to iterate over several response objects/files; all the data should be contained in a single entity.

Summary

Principal actor: A person who wants to access level 2 satellite swath data from a number of granules using a single URL. The person may be an end user who issues a data request or they may be the developer of a system that will build a URL to return to an end user. In either case, an end-user dereferences the URL to get data.

Goal: Eliminate the need for users to dereference many URLs to get these data.

The data are Satellite Swath data (e.g., MODIS, level 2), where each granule contains a number of dependent variables that are matched to lat and lon arrays, all of which are 2D (this is the general case for a 2D discrete coverage). The data set is made up of a number of these granules, organized in a hierarchy of directories by date. Each of the granules holds one 'pass' of the satellite from one pole to the other. When a user searches for data, they use a spatial and time box to choose values from the whole data set. Currently the response to that search is a list of granules, each of which contains some of the data in the user's query. The complete granules, or subset versions, may be downloaded. In either case, the user must access/download a number of files and then combine the values somehow - the exact mechanism, including reading the files that contain the data, is left up to the end user.

In this use case, the user instead uses a single URL to access all of the data that match the selection criteria. Those values will be returned using the CSV format, so there is virtually no file decoding burden put on the user. The URL will be one that a system developer can easily build in software as well as one that a person could write by hand, at least in many cases.

Actors

A developer building a 'response URL' from a user's data search request
This person must write software that will translate the search request into the Hyrax server's (to-be-developed) aggregation request syntax.
A person making a data request to the Hyrax server
This actor will need a similar understanding of the new aggregation mechanism to the 'developer's', but will be writing the URLs 'by hand'.
The Hyrax data server
This is an important actor because the server will have to be updated to realize the new software's benefits.
Level 2 satellite swath data, stored in multiple files (file == granule == 'pass')
This use case is limited to a particular kind of data, although the resulting software might be useful for Time Series data too.

Preconditions

The data to be accessed are served using Hyrax
Yup
The Hyrax server has been updated to include the aggregation function
Ditto, but this will require that system admins install the code.
The user's software understands the structure of the returned data values
Again, Yup.
The data are indexed by a search system
This is the main use-case: the scenario where a search system that would normally return a list of URLs instead returns a single URL that calls the aggregation function, which returns all of the granules' data in one shot.
The user invoking the aggregation function understands the organization of the data
This is actually a variant of the main use-case: in this scenario the user, not a search system, builds the URL that calls the aggregation function.

Triggers

  • A user searches for data and the result indicates that the data they want are spread over a number of granules.
  • A user knows they want data that are (or may be) spread over a number of granules in a dataset.

Basic Flow

A user (the actor that initiates the use case) performs a search (using EDSC?) and the result set contains two or more granules. The search client would normally return a list of URLs to the discrete granules that make up the result set. However, in this use case, the client has been programmed to recognize this situation and will respond by forming a URL that will run the aggregation server-side function and request that the aggregated data be returned as a list of CSV data points.

The server function interface

The aggregation server function needs to know which granules to aggregate over, the variables that are to be returned (nominally the returned variables are a subset of all the variables in the granules), and the space-time constraints that the data must satisfy. The return format (DAP2 or DAP4 binary, CSV, or netCDF file) is determined using the extension of the data access URL.
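
Since the function's request syntax is still to be designed, the following Python sketch merely illustrates how a client might assemble such a URL from those three pieces of information. The function name aggregate, the argument layout, and the server path are assumptions made for this example, not an existing Hyrax interface.

 from urllib.parse import quote

 # Characters left unescaped so the resulting URL stays human-readable.
 SAFE_CHARS = '(),"<=.*$/\\'

 def build_aggregation_url(server_root, granule_regex, variables, constraints):
     """Assemble a hypothetical aggregation request URL. The '.csv'
     extension asks the server to return the response as CSV."""
     args = ",".join([f'"{granule_regex}"']
                     + list(variables)
                     + [f'"{c}"' for c in constraints])
     return f"{server_root}.csv?aggregate({quote(args, safe=SAFE_CHARS)})"

 # Example: two dependent variables, one month of granules, a lat/lon box.
 url = build_aggregation_url(
     "http://example.org/opendap/MODIS",       # hypothetical dataset root
     r"2015/01/.*\.hdf$",                      # which granules
     ["sst", "wind_speed"],                    # variables to return
     ["20<=lat<=30", "-80<=lon<=-70"])         # space-time constraints
 print(url)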

Specification of the granules

The specification of granules will use a regular expression. This will provide a way for callers of the function to limit the granules using various information encoded in the filenames, as well as to specify all of the files in (or under) a given location in the server's file system. For example, a user might want only ascending passes or only passes made during daylight. Often L2 data files encode this kind of information in their names.

One issue with this is that there's no standard way to make the more fine-grained distinctions (e.g., passes that are on the ascending part of the satellite's orbit), so how a user or search client would apply this algorithmically is hard to say.
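
As an illustration of the filename-based approach, the sketch below selects only the daytime granules from one month. The naming convention here is invented for this example, though many L2 products do encode date, time, and a day/night flag in their filenames.

 import re

 # Select daytime ('D') granules from January 2015. The path layout and
 # filename fields are invented for illustration.
 pattern = re.compile(r"^2015/01/.*\.A2015\d{3}\.\d{4}\.D\.hdf$")

 granules = [
     "2015/01/MYD021KM.A2015026.1355.D.hdf",   # daytime pass: matches
     "2015/01/MYD021KM.A2015026.0215.N.hdf",   # nighttime pass: rejected
 ]
 print([g for g in granules if pattern.match(g)])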

Variables to include in the response

The caller will list a number of variable names. The function will assume that every granule that matches the regex contains all of the listed variables. Each variable is assumed to hold 'dependent values'. For any given granule (maybe all granules?) the listed variables may not have any values included in the response, because no values may have been sampled within the space-time constraint.

Space and Time constraints

Two pieces of information will be provided to specify the space-time constraint: the list of variables that contain the latitude, longitude and time values, along with the constraints on their values. To make it easy to unambiguously associate each variable with its constraint, the limitations will be expressed using 'mini expressions' of the form value relop var or value relop var relop value. If one variable appears in more than one of these expressions, the result will be the intersection of the values specified by the expressions.
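
The following sketch shows one way the mini-expression semantics could work. The grammar (limited here to < and <= for brevity) and the evaluation rules are assumptions drawn from the description above, not existing server code.

 import re

 # Grammar for 'value relop var' / 'value relop var relop value'.
 _EXPR = re.compile(
     r"^(?:(?P<lo>-?\d+(?:\.\d+)?)(?P<lop><=?))?"   # optional 'value relop'
     r"(?P<var>[A-Za-z_]\w*)"                       # the variable name
     r"(?:(?P<rop><=?)(?P<hi>-?\d+(?:\.\d+)?))?$")  # optional 'relop value'

 def matches(expr, var, value):
     """True when `value` of variable `var` satisfies the mini expression."""
     m = _EXPR.match(expr.replace(" ", ""))
     if not m or m.group("var") != var:
         return True  # this expression does not constrain `var`
     ok = True
     if m.group("lo") is not None:
         lo = float(m.group("lo"))
         ok = ok and ((lo < value) if m.group("lop") == "<" else (lo <= value))
     if m.group("hi") is not None:
         hi = float(m.group("hi"))
         ok = ok and ((value < hi) if m.group("rop") == "<" else (value <= hi))
     return ok

 # Two expressions on the same variable intersect: both must hold.
 exprs = ["20<=lat<=30", "25<=lat"]
 print(all(matches(e, "lat", 27.0) for e in exprs))  # True
 print(all(matches(e, "lat", 22.0) for e in exprs))  # False: fails 25<=lat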

The parameter specification is designed to be flexible enough to specify the constraints without having to configure the function for each dataset. The downside is that it will not take into account the specifics of latitude, longitude or time. For example, geospatial subsetting often takes into account that longitude values 'wrap' at either the dateline or prime meridian. The scheme used here will not do that, which means it can be applied to any independent variables' values. For these data (level 2) that will not be a problem because the values are returned in a table.

Response Structure

The response will be in the form of a table of values. The table will have columns that list the independent variables first and then the dependent variables. Only rows for which all of the variables have values will be included.

The function will return the table encoded as a DAP Sequence; the server will transform that into CSV if that is the format named in the request URL. Other formats might work, although netCDF is not supported at this time (NB: Check on that).
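
For illustration, a CSV response for the request sketched earlier might look like the following. The column names and values are invented; only the shape - independent variables (lat, lon, time) first, then the dependent variables, one row per sampled point - follows from the description above.

 lat, lon, time, sst, wind_speed
 25.1, -74.9, 2015-01-26T13:55:00Z, 292.4, 7.1
 25.4, -74.8, 2015-01-26T13:55:12Z, 292.6, 6.8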

Alternate Flow

Here we give any alternate flows that might occur. These may include flows that involve error conditions, or flows that fall outside of the basic flow.

Post Conditions

Here we give any conditions that will be true of the state of the system after the use case has been completed.

Activity Diagram

Here a diagram is given to show the flow of events that surrounds the use case.

Notes

There is always some piece of information that is required that has no other place to go. This is the place for that information.

Resources

In order to support the capabilities described in this Use Case, a set of resources must be available and/or configured. These resources include data and services, and the systems that offer them. This section will call out examples of these resources.


Resource | Owner | Description | Availability | Source System
name | Organization that owns/manages resource | Short description of the resource | How often the resource is available | Name of system which provides resource