Template:Use Case


Point Of Contact: A Human

Description

Satellite_Swath_1

Goal

The user wants to get data from several granules of a many-granule data set using a single request. The user does not want to iterate over several response objects/files; all the data should be contained in a single entity.

Summary

Principal actor: A person who wants to access level 2 satellite swath data from a number of granules using a single URL. The person may be an end user who issues a data request, or they may be the developer of a system that will build a URL to return to an end user. In either case, an end user dereferences the URL to get the data.

Goal: Eliminate users having to use many URLs to get these data.

The data are satellite swath data (e.g., MODIS level 2), where each granule contains a number of dependent variables matched to lat and lon arrays, all of which are 2D (this is the general case for a 2D discrete coverage). The data set is made up of a number of these granules, organized in a hierarchy of directories by date. Each granule holds one 'pass' of the satellite from one pole to the other. When a user searches for data, they use a spatial and temporal bounding box to choose values from the whole data set. Currently the response to that search is a list of granules, each of which contains some of the data matching the user's query. The complete granules, or subset versions of them, may be downloaded. In either case, the user must access/download a number of files and then combine the values somehow - the exact mechanism, including reading the files that contain the data, is left up to the end user.
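
For illustration only, such a data set might be laid out like the following (the directory scheme and file names are invented for this example, not prescribed by the use case):

MODIS_L2/
    2010/10/01/
        pass_0001.hdf   (one pole-to-pole pass: 2D lat, lon, time, SST, ...)
        pass_0002.hdf
    2010/10/02/
        ...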

In this use case, the user instead uses a single URL to access all of the data that match the selection criteria. The values will be returned in CSV format, so virtually no file-decoding burden is placed on the user. The URL will be one that a system developer can easily build in software and, at least in many cases, one that a person could write by hand.

Actors

A developer building a 'response URL' from a user's data search request
This person must write software that will translate the search request into the Hyrax server's (to-be-developed) aggregation request syntax.
A person making a data request to the Hyrax server
This actor will need a similar understanding of the new aggregation mechanism as the 'developer', but will be writing the URLs 'by hand'.
The Hyrax data server
This is an important actor because the server will have to be updated to realize the new software's benefits.
Level 2 satellite swath data, stored in multiple files (file == granule == 'pass')
This use case is limited to a particular kind of data, although the resulting software might be useful for Time Series data too.

Preconditions

The data to be accessed are served using Hyrax
Yup
The Hyrax server has been updated to include the aggregation function
Ditto, but this will require that system admins install the code.
The user's software understands the structure of the returned data values
Again, Yup.
The data are indexed by a search system
This is the main use case: a search system that would normally return a list of URLs instead returns a single URL that calls the aggregation function and returns all of the granules' data in one shot.
The user invoking the aggregation function understands the organization of the data
This is a variant of the main use case: here the user, not a search system, builds the URL that calls the aggregation function.

Triggers

  • A user searches for data and the result indicates that the data they want are spread over a number of granules.
  • A user knows they want data that are (or may be) spread over a number of granules in a dataset.

Basic Flow

A user (the actor that initiates the use case) performs a search (using EDSC?) and the result set contains two or more granules. The search client would normally return a list of URLs to the discrete granules that make up the result set. In this use case, however, the client has been programmed to recognize this situation and responds by forming a URL that runs the server-side aggregation function and requests that the aggregated data be returned as a list of CSV data points.
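
A minimal sketch of that client-side logic, written in Python; the endpoint name, query parameters, and the shape of the search result are all assumptions, since the aggregation request syntax has yet to be designed:

def urls_for_result(search_result):
    """Return one aggregation URL for a multi-granule result set,
    or the ordinary per-granule DAP URLs otherwise."""
    granules = search_result["granules"]  # assumed: a list of dicts
    if len(granules) < 2:
        return [g["dap_url"] for g in granules]
    # Two or more granules: build a single URL that invokes the
    # (to-be-developed) aggregation function and asks for CSV.
    names = ",".join(g["name"] for g in granules)
    return ["http://host/server/aggregator?granules=" + names +
            "&return_as=csv"]

The user's space and time constraints would also have to be folded into that URL; the sketch following the parameter list below shows one way they might be encoded.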

The response will be the table encoded as a DAP Sequence; the server will transform that into CSV if the request URL asks for it. (Other formats might work, although netCDF is not supported at this time.)

To formulate a request, the client will need to provide four pieces of information:

  • The granules to consider when building the aggregation
  • The (dependent) variables within those granules to include in the aggregation
  • The space and time bounds (the independent variables) that will be used to constrain the values of the dependent variables
  • Where to find those independent variables (in the granules' variables, attributes or filenames)

These four parameters will be passed to the server using some sort of a constraint. If the granules to aggregate can be specified using a regex, then it should be possible to use HTTP's GET verb. If each granule must be listed separately, then POST will likely be needed to make the request. Because we know that the aggregation operation will be working with level 2 satellite swath (geospatial and temporal) data, some optimizations can be made. We know that latitude, longitude and time are encoded in these granules for each sample point. Thus the cases where time is encoded in an attribute or the granule name don't need to be addressed. (They might be addressed by a future version of the code, however.)
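
As a sketch of how a client might pass those parameters, again in Python (using the requests library); the endpoint and every parameter name are assumptions, not a settled interface. The fourth parameter is omitted because, as noted above, for swath data the independent variables always come from the granules' variables:

import requests  # third-party HTTP client library

ENDPOINT = "http://host/server/aggregator"  # hypothetical endpoint

# Case 1: the granules can be named with a regex, so the whole
# request fits in a GET query string.
params = {
    "granules": r"2010/10/0[12]/.*\.hdf",  # regex over granule paths
    "vars": "SST,U_WND,V_WND",             # dependent variables to include
    "bbox": "40,50,-130,-110",             # lat/lon bounds (invented syntax)
    "time": "2010-10-01T00:00:00Z/2010-10-02T00:00:00Z",  # ISO 8601 interval
    "return_as": "csv",
}
response = requests.get(ENDPOINT, params=params)

# Case 2: each granule must be listed separately; a long list can
# exceed practical URL-length limits, so send it in a POST body.
params["granules"] = "\n".join(["pass_0001.hdf", "pass_0002.hdf"])
response = requests.post(ENDPOINT, data=params)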

Because of variations in time representation, we may adopt ISO 8601 (e.g., 2010-10-01T00:00:00Z) as the only way to specify time.

Here are two possible ways to do this:

Using a server function
http://host/server/path.asc?function(regex, vars, constraint sub-expressions)
Using a special end point
http://host/server/aggregator?d4_func=function(vars, constraint sub-expressions)&granules=regex&return_as=csv
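
Filled in with hypothetical values (the function name 'aggregate', the variable names, and the constraint syntax are placeholders, not a settled design), those two forms might look like:

http://host/server/path.asc?aggregate("2010/10/0[12]/.*\.hdf","SST,U_WND","lat>=40,lat<=50")
http://host/server/aggregator?d4_func=aggregate(SST,U_WND,lat>=40,lat<=50)&granules=2010/10/0[12]/.*\.hdf&return_as=csv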

The response to the request will be a CSV table: the first row names the independent and dependent variables, and each subsequent row holds their values for one sample point. For example, the response would typically look like:

d_lat, d_lon, d_time, SST, U_WND, V_WND
45, -120, 2010/10/01, 14, 127, 126
.
.
.
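
Reading such a response asks almost nothing of the client. A minimal Python sketch, assuming the column names shown above:

import csv
import io

# 'body' stands in for the text of the HTTP response shown above.
body = ("d_lat, d_lon, d_time, SST, U_WND, V_WND\n"
        "45, -120, 2010/10/01, 14, 127, 126\n")

reader = csv.DictReader(io.StringIO(body), skipinitialspace=True)
for row in reader:
    # Each row maps a column name to its (string) value.
    print(float(row["d_lat"]), float(row["SST"]))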

Alternate Flow

There are a number of alternate flows, all of which involve errors: invalid parameters, or granules that fail in some way.

Post Conditions

Here we give any conditions that will be true of the state of the system after the use case has been completed.

Activity Diagram

Here a diagram is given to show the flow of events that surrounds the use case.

Notes

There is always some piece of information that is required that has no other place to go. This is the place for that information.

Resources

In order to support the capabilities described in this Use Case, a set of resources must be available and/or configured. These resources include data and services, and the systems that offer them. This section will call out examples of these resources.


Resource | Owner                                    | Description                        | Availability                         | Source System
name     | Organization that owns/manages resource  | Short description of the resource  | How often the resource is available  | Name of system which provides resource