Hyrax: Asynchronous Response Implementation

From OPeNDAP Documentation

Overview

The proposed use cases so far are:

  1. Near-line tape drive
  2. Server side processing

I think it's important to identify the various issues in each of these use cases because much of what has to happen isn't just the asynchronous aspects of the response.


URLs
I think that while the Async definition for DAP4 allows the Async response to essentially forward the request to a new URL in order to access the requested item at a later date, we might want to use our existing architecture to keep the URL the same as the original request URL. We do exactly this with uncompressed files that are cached, and we could use a similar mechanism for the async use cases...
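
One way to keep the URL unchanged is to derive the cache entry directly from the original request, so that a repeated request on the same URL finds the finished result. The following is only a sketch of that idea: the function name, the cache directory, and the use of SHA-256 are assumptions made for illustration, not a description of how Hyrax names its cached files.

```python
import hashlib
import os


def cache_path_for(request_path, constraint="", cache_dir="/tmp/hyrax_async_cache"):
    """Map a request (path plus constraint expression) to one cache file.

    Hypothetical helper: the same request always yields the same path,
    so the client can simply repeat the original URL later.
    """
    key = request_path + "?" + constraint
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return os.path.join(cache_dir, digest)
```

Because the constraint expression is part of the key, two requests for the same dataset with different constraints map to different cache entries.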

Near-line tape drive

In this use case primary data are held in some type of near-line tape storage system. This means that the data can be accessed, but access may take longer than a typical TCP time-out period.

Issues

Catalog
The near-line system must be cataloged: Some expressive list of contents must be made available to the BES so that the BES can include them in either the default catalog (called "catalog") or in some specific catalog implementation for the near-line system.
Caching
The BES will be required to cache datasets held in the near-line system.
Cache Purging
The BES will need to be able to purge, or otherwise make room in the cache for new datasets being brought in from the near-line system.
Concurrency
It will be crucial that the BES be able to identify when a near-line dataset has been requested but has not yet been cached in order to prevent multiple concurrent calls to unpack the same dataset.
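
The concurrency issue could be handled with an "in progress" marker that is created atomically. The sketch below assumes that requests may be served by separate processes, so it uses a marker file created with `O_CREAT | O_EXCL` rather than an in-memory lock. The function names and the marker directory are hypothetical, and this is not the BES implementation.

```python
import hashlib
import os


def _marker_path(marker_dir, dataset_id):
    # Hash the dataset id so that any path characters are safe in a file name.
    name = hashlib.sha256(dataset_id.encode("utf-8")).hexdigest() + ".inprogress"
    return os.path.join(marker_dir, name)


def try_begin_retrieval(marker_dir, dataset_id):
    """Return True if the caller should start the retrieval, False if one is running."""
    os.makedirs(marker_dir, exist_ok=True)
    try:
        # O_EXCL makes creation fail if the marker already exists, so only
        # one caller can succeed for a given dataset.
        fd = os.open(_marker_path(marker_dir, dataset_id),
                     os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def finish_retrieval(marker_dir, dataset_id):
    """Remove the marker once the dataset is in the cache (or retrieval failed)."""
    try:
        os.remove(_marker_path(marker_dir, dataset_id))
    except FileNotFoundError:
        pass
```

A retrieval that crashes would leave a stale marker behind, so a real design would also need some form of expiry or cleanup.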

Basic Flow Stage 1

  1. A client asks for a catalog of holdings from a Hyrax server.
  2. The client makes a request for one of the datasets in the catalog.
  3. OLFS receives the request for a dataset and routes that request to the BES.
  4. The BES identifies that this is a request for a near-line dataset.
  5. The BES checks to see if the requested dataset is in the near-line cache and finds the requested dataset is NOT represented in the cache.
  6. The BES returns an error/message to the OLFS indicating that the response will be delayed by a specified amount of time.
  7. The OLFS responds to the client request with the "async" response and HTTP 202 (Accepted).
  8. The BES checks to see if the requested dataset is in the process of being extracted from the near-line system.
  9. If the requested dataset is not already in the process of being extracted, the BES initiates the retrieval of the requested dataset from the near-line system.

Basic Flow Stage 2

  1. The client makes a request for one of the datasets in the catalog.
  2. OLFS receives the request for a dataset and routes that request to the BES.
  3. The BES identifies that this is a request for a near-line dataset.
  4. The BES checks to see if the requested dataset is in the near-line cache.
  5. The requested dataset is in the cache, so the BES services the request immediately.
  6. The OLFS forwards the BES response and returns HTTP 200.
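
The two stages above can be condensed into a single decision function. This is a minimal sketch under stated assumptions: the cache is modeled as a dictionary, the set of in-progress retrievals as a set, and `start_retrieval` and `expected_delay_s` are hypothetical names. It collapses the BES and OLFS roles into one function and uses HTTP 202, the status code for "Accepted".

```python
def handle_nearline_request(dataset_id, cache, in_flight, start_retrieval,
                            expected_delay_s=300):
    """Return (http_status, body) for a near-line dataset request.

    cache:           dict of dataset_id -> cached data (stand-in for the near-line cache)
    in_flight:       set of dataset ids currently being retrieved
    start_retrieval: callable that kicks off retrieval from the near-line system
    """
    # Stage 2: the dataset is cached, so it is served immediately.
    if dataset_id in cache:
        return 200, cache[dataset_id]

    # Stage 1: start a retrieval only if one is not already running.
    if dataset_id not in in_flight:
        in_flight.add(dataset_id)
        start_retrieval(dataset_id)

    # Tell the client the response is delayed; it should ask again later.
    return 202, {"expected_delay_seconds": expected_delay_s}
```

Repeated requests made while the retrieval is running all receive the delayed response, but the retrieval itself is started only once.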

Server Side Processing

In this use case the server is asked to perform a computationally intensive processing operation on existing data, for example re-projection and re-gridding of a high-resolution global dataset.

Issues

Caching
The BES will be required to cache datasets generated by asynchronous server side processing operations.
Cache Purging
The BES will need to be able to purge, or otherwise make room in the cache for new datasets being generated by server side processing operations.
Concurrency
It will be crucial that the BES be able to identify when a server side processing dataset has been requested but has not yet been cached in order to prevent multiple concurrent calls to generate the same dataset.
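
For cache purging, one simple policy is to evict the least recently used results until a new one fits. The sketch below tracks entry sizes only. The class name, the byte limit, and the choice of LRU are assumptions for illustration; the text above does not specify a purge policy.

```python
from collections import OrderedDict


class LruSizeCache:
    """Tracks cached results by size and evicts least recently used entries."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.total_bytes = 0
        self._sizes = OrderedDict()  # key -> size, oldest entry first

    def touch(self, key):
        """Mark an entry as recently used; return False if it is not cached."""
        if key in self._sizes:
            self._sizes.move_to_end(key)
            return True
        return False

    def add(self, key, size):
        """Record a new entry, purging old ones as needed; return evicted keys."""
        if size > self.max_bytes:
            raise ValueError("item is larger than the whole cache")
        if key in self._sizes:
            self.total_bytes -= self._sizes.pop(key)
        evicted = []
        while self.total_bytes + size > self.max_bytes:
            old_key, old_size = self._sizes.popitem(last=False)
            self.total_bytes -= old_size
            evicted.append(old_key)
        self._sizes[key] = size
        self.total_bytes += size
        return evicted
```

A real purge would also delete the evicted files and would need to avoid removing a result while it is being read or generated.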

Basic Flow Stage 1

  1. A client asks for a catalog of holdings from a Hyrax server.
  2. The client makes a request for one of the datasets in the catalog; in the constraint expression, the client requests that the server perform a computationally intensive operation.
  3. OLFS receives the request for a dataset and routes that request to the BES.
  4. The BES identifies that this is a request that will require some time to service.
  5. The BES checks to see if the requested result is already in the server side processing cache, and finds that it is not there.
  6. The BES returns an error/message to the OLFS indicating that the response will be delayed by a specified amount of time.
  7. The OLFS responds to the client request with the "async" response and HTTP 202 (Accepted).
  8. The BES initiates the generation of the requested dataset/processing combination.

Basic Flow Stage 2

  1. The client makes a request for one of the datasets in the catalog; in the constraint expression, the client requests that the server perform a computationally intensive operation.
  2. OLFS receives the request for a dataset and routes that request to the BES.
  3. The BES identifies that this is a request that will require some time to service.
  4. The BES checks to see if the requested result is already in the server side processing cache.
  5. The requested dataset is in the cache and the BES services the request immediately.
  6. The OLFS forwards the BES response and returns HTTP 200.
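
From the client's point of view, the two stages amount to repeating the same request until the result is ready. The sketch below shows such a polling loop. `send_request` is a hypothetical callable that returns a `(status, body, delay_seconds)` tuple, standing in for a real HTTP call and for whatever form the "async" response uses to report the expected delay.

```python
import time


def fetch_when_ready(send_request, url, max_attempts=5, sleep=time.sleep):
    """Repeat the same request until the server returns the finished result."""
    for _ in range(max_attempts):
        status, body, delay_seconds = send_request(url)
        if status == 200:
            # Stage 2: the result was in the cache and is returned directly.
            return body
        if status != 202:
            raise RuntimeError("unexpected HTTP status %d" % status)
        # Stage 1: the response is delayed, so wait as instructed and retry.
        sleep(delay_seconds)
    raise TimeoutError("result not ready after %d attempts" % max_attempts)
```

Passing `sleep` as a parameter keeps the loop easy to exercise without real delays.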