DAP4: Asynchronous Request-Response Proposal v3

From OPeNDAP Documentation
Revision as of 16:56, 1 August 2012 by Jimg (Client willingness to accept asynchronous responses)

-- EthanDavis ndp 15:55, 15 June 2012 (PDT)

1 Attribution

Ethan, Nathan and James

2 Background

Over the years, a number of users have requested support for an asynchronous request/response in OPeNDAP. Most of the requests have been from large data centers dealing with near-line data archives (e.g., tape archives) where access to the data may take tens of minutes. Another use case that has been discussed is server-side processing which, depending on the processing, may take some time to calculate.

At least one ad hoc attempt to implement this functionality has been made by an external group.

3 Problem addressed

This proposal aims to support requests for data which are not readily available, and it builds on some of the ideas in the DAP4: Capabilities and Versioning document.

The intent is to support the following cases:

  • When a client that does not understand DAP4 asynchronous responses makes a request that would result in an asynchronous response, the request should fail gracefully (or at least as gracefully as returning an error message allows).
  • When a client does not know that a specific data request will result in an asynchronous response, it should be able to recover and use information in the response to decide either to
    • not continue with the request; or
    • make the request explicitly stating that it is willing to accept an asynchronous response.

In general, asynchronous response schemes all rely on some form of notification. For the sake of simplicity, DAP4 asynchronous responses will use the following notification scheme:

  • Client polling, via URL. The initial response from the server indicates that the desired response will be provided asynchronously and provides a URL that can be used at a later time to retrieve the data.

The following notification schemes are used by other systems, but they will not be supported by DAP4:

  • Maintain connection and wait for response (see COMET, long polling, keep-alive connections, etc.).
  • Client-provided URL on which it can receive notification (see the Facebook Graph API's notification support for one approach).
  • Email notification. This leverages two basic things about the network: mail and people, and that makes it very robust. However, I don't think it fits well into DAP4. It adds too many options to the request, which has already been muddied up quite a bit.

4 Proposed solution

Asynchronous responses are responses that will take the server some time to build. When a client is told that a response 'is asynchronous,' it must know to come back at a later time to retrieve the response. The concept is a very simple one, and the existing network infrastructure is very good at supporting these kinds of interactions. A major factor in the success of the proposed solution will be the level of uniform support for the design. In addition, as is often the case, the details will be more complex than the underlying concept. In particular, the request mechanism must be extended so that synchronous (regular) requests are not affected by the addition of asynchronous requests and, at the same time, clients do not inadvertently make asynchronous requests. Another detail is that asynchronous responses are ephemeral: they typically persist only for a limited period of time and are then purged.

A typical 'workflow' for an asynchronous request is:

  1. A client makes a data request that indicates that it will accept either an asynchronous or synchronous response. Optionally, the client can place a time constraint on the response, indicating that if the response will not be ready in a given period of time, it does not want the response.
  2. The server returns an initial response (without delay) that indicates the request has indeed resulted in an asynchronous response and provides the client with a URL and time estimate.
  3. The client reads the time estimate and waits...
  4. The client dereferences the URL and gets the response.
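
The four steps above can be sketched from the client's side as follows. This is only an illustrative sketch, not part of the proposal: the function name fetch_with_async, the transport callable (which takes a URL and request headers and returns a status, response headers, and body), the parse_accepted callable (which extracts the result URL and the delay estimate in seconds from the initial response body), and the max_polls limit are all assumptions made for the example. The status codes and the X-DAP-Async-Accept header are the ones described later in this document.

```python
import time
from typing import Callable, Dict, Tuple

# Hypothetical transport signature: (url, headers) -> (status, headers, body).
Transport = Callable[[str, Dict[str, str]], Tuple[int, Dict[str, str], bytes]]


def fetch_with_async(url, transport, parse_accepted, max_wait=0,
                     sleep=time.sleep, max_polls=10):
    """Follow the four workflow steps and return the final response body."""
    # Step 1: declare willingness to accept an asynchronous response.
    status, _, body = transport(url, {"X-DAP-Async-Accept": str(max_wait)})
    if status == 200:
        return body  # the server answered synchronously
    if status != 202:
        raise RuntimeError(f"request failed with HTTP {status}")

    # Step 2: the initial response carries a result URL and a time estimate.
    result_url, delay_seconds = parse_accepted(body)

    for _ in range(max_polls):
        # Step 3: wait for the estimated delay.
        sleep(delay_seconds)
        # Step 4: dereference the URL.
        status, _, body = transport(result_url, {})
        if status == 200:
            return body
        if status != 409:  # 409 means "not ready yet"; anything else is final
            raise RuntimeError(f"retrieval failed with HTTP {status}")
    raise TimeoutError("result was still pending after max_polls attempts")
```

Passing the transport and the sleep function as arguments keeps the sketch independent of any particular HTTP library and lets the loop be exercised without a live server.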

The remainder of this document will expand on this basic workflow.

4.1 Client willingness to accept asynchronous responses

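A client can indicate willingness to accept asynchronous responses in one of two ways:

  • by including the X-DAP-Async-Accept HTTP request header; or
  • by including the async keyword in the DAP4 constraint expression.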

If the client indicates that it must have access to the asynchronous response content within a certain time (using the X-DAP-Async-Accept HTTP header, the async keyword in the constraint expression, or both) and the response will not be available in that time frame, the server MUST reject the request and return an HTTP status of 412 and the DAP Asynchronous Request Rejected XML document.

If both the X-DAP-Async-Accept HTTP header and the async keyword are used, the keyword takes precedence.

Servers must reject requests that require an asynchronous response if the client has not indicated willingness to accept such a response. Rejection of such requests is indicated by all three of the following:

  1. HTTP status of 400
  2. Inclusion of the X-DAP-Async-Required HTTP response header
  3. A response body containing the DAP Asynchronous Response Required XML document
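
The rules in this section reduce to a small decision on the server. The sketch below is one possible reading, under stated assumptions: the function name initial_status and its parameters are hypothetical, the 200 for a synchronous response is assumed, and treating an expected delay exactly equal to the client's limit as acceptable is an interpretation that the text does not spell out.

```python
from typing import Optional


def initial_status(needs_async: bool, accept: Optional[int],
                   expected_delay: int) -> int:
    """Return the HTTP status for the server's initial response.

    accept is None when the client gave no indication, 0 for unconditional
    acceptance, or a positive maximum wait in seconds.
    """
    if not needs_async:
        return 200  # ordinary synchronous response (assumed)
    if accept is None:
        # Also requires the X-DAP-Async-Required header and the
        # DAP Asynchronous Response Required document.
        return 400
    if accept > 0 and expected_delay > accept:
        # Also requires the DAP Asynchronous Request Rejected document.
        return 412
    return 202  # accepted; the body describes where to find the result
```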

This safety check (requiring clients to explicitly indicate their willingness to accept asynchronous responses) is required because otherwise very simple clients might inadvertently make requests that result in asynchronous responses, and these kinds of responses are likely to use disproportionately more server resources than synchronous responses. We want to design DAP4 so that simple clients work well and don't encounter unexpected 'hiccups.'

4.2 Initial processing by the server

When a request is accepted by the server and it will result in an asynchronous response, the server MUST return the DAP Asynchronous Request Accepted XML document. This document contains a URL to the pending result of the request.

Of course, this discussion is about the mechanism that enables a client to make a request and the server to provide information about an asynchronous response to that request. It does not cover any of the nearly infinite ways a server might actually make the content of that response. It is likely that servers will write the responses to files and the URL returned to the client will be used to retrieve that file, but there's no requirement that servers do that. The only requirement on servers is that the URL returned can be dereferenced and that operation will return the response requested by the client.

4.3 Response retrieval by the client

If the client attempts to access the asynchronous result prior to its availability, the server SHOULD return an HTTP response status of 409 (DAP Response Not Ready) along with the DAP Asynchronous Response Not Available XML document. If the server does not return the 409 response, it MUST return a 404 (Not Found) response.

If the client attempts to access the asynchronous result after it is no longer available, the server SHOULD return an HTTP response status of 410 (Gone). If the server does not return a 410 response in this case it MUST return a 404 (Not Found) response.

In each case above where the server SHOULD return a specific error code, but may return a 404 code instead, the intent is for servers to make the most appropriate use of HTTP/1.1's error codes while also providing servers with an 'out' when that is hard for them to do. For example, knowing that a response, which is essentially ephemeral, is gone would, in theory, require the server to keep a record of every URL ever issued for an asynchronous response, and that is not practical. At the same time, it is easy to see that a client would really like to know that the response has not yet been finished (i.e., it has not waited long enough) or that it is gone (i.e., it waited too long).
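
As an illustration of the SHOULD/MUST fallback described above, the following sketch maps the state of an asynchronous result to a retrieval status code. The ResultState enumeration, the function name retrieval_status, and the can_report_state flag (standing in for a server that does not track pending or purged results) are assumptions made for the example.

```python
from enum import Enum


class ResultState(Enum):
    PENDING = "pending"   # accepted, but not finished yet
    READY = "ready"       # available for download
    EXPIRED = "expired"   # was available, has since been purged
    UNKNOWN = "unknown"   # the server has no record of this URL


def retrieval_status(state: ResultState, can_report_state: bool = True) -> int:
    """Pick the HTTP status for a request to an asynchronous result URL."""
    if state is ResultState.READY:
        return 200
    if can_report_state and state is ResultState.PENDING:
        return 409  # SHOULD: DAP Response Not Ready
    if can_report_state and state is ResultState.EXPIRED:
        return 410  # SHOULD: Gone
    return 404      # MUST, whenever the more specific code is not returned
```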

4.4 DAP4 Constraint Expression extension for Async

By adding a keyword and value pair to the DAP4 constraint expression, we can allow a client to encode its willingness to accept an asynchronous response, along with the maximum amount of time the client can wait before it can access the response.

async
A value of zero indicates the client is willing to unconditionally accept an asynchronous response. A positive integer value will be interpreted as the number of seconds that the client will wait for access to the response. If the value is negative, the server MUST return an error.
Examples
Client is willing to unconditionally accept an asynchronous response
?async=0&projection=x,y,temp
Client is willing to wait for 60 seconds (1 minute) for access to the asynchronous response
?async=60&projection=x,y,temp
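
A server might read the keyword along the following lines. This is a minimal sketch that assumes the constraint expression arrives as an ordinary URL query string; the function name parse_async_keyword is hypothetical, and raising ValueError stands in for whatever DAP4 error response the server would actually send.

```python
from typing import Optional
from urllib.parse import parse_qs


def parse_async_keyword(query: str) -> Optional[int]:
    """Return the async value in seconds, or None if the keyword is absent."""
    values = parse_qs(query.lstrip("?")).get("async")
    if not values:
        return None  # the client did not use the keyword
    seconds = int(values[0])  # raises ValueError for non-integer text
    if seconds < 0:
        raise ValueError("async must be zero or a positive integer")
    return seconds
```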

4.5 HTTP

In many (most?) cases the DAP4 protocol is going to be transported using HTTP. It is important for any DAP server/client interaction over HTTP to correctly utilize the various components of the HTTP protocol when possible. In order to support asynchronous responses DAP4 will utilize two dedicated DAP HTTP headers in addition to various HTTP response/status codes described below.

4.5.1 HTTP Response Headers

4.5.1.1 X-DAP-Async-Required

The X-DAP-Async-Required HTTP response header is included in the response if the request requires an asynchronous response and the client has not indicated willingness to accept such a response. Rejection of the request should also be indicated by the 400 DAP Asynchronous Response Required HTTP response code.

4.5.2 HTTP Request Headers

4.5.2.1 X-DAP-Async-Accept

A client indicates willingness to accept asynchronous responses by including the X-DAP-Async-Accept HTTP header. Clients can make conditional requests for asynchronous responses by indicating the maximum time they are willing to wait by using the X-DAP-Async-Accept HTTP header with a value given in seconds. A value of zero indicates that the client is willing to accept whatever delay the server may encounter.
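
Because the same willingness can also be expressed with the async keyword, and the keyword takes precedence when both are present (see 4.1), a server needs to combine the two sources. The sketch below shows one way to do that; the function name effective_async_accept and the plain mapping used for the request headers are assumptions made for the example.

```python
from typing import Mapping, Optional


def effective_async_accept(headers: Mapping[str, str],
                           keyword: Optional[int]) -> Optional[int]:
    """Return the client's maximum wait in seconds (0 = unconditional).

    Returns None when the client has not indicated willingness at all.
    """
    if keyword is not None:
        return keyword  # the async keyword takes precedence over the header
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "x-dap-async-accept":
            return int(value)
    return None
```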

4.5.3 HTTP Response Codes

4.5.3.1 202 Accepted

A server indicates that a request has been accepted and will be handled asynchronously by returning a '202 Accepted' HTTP response code. The response body must contain a document in one of the asynchronous information media types listed below.

4.5.3.2 400 DAP Asynchronous Response Required

The '400 DAP Asynchronous Response Required' HTTP response code is used to indicate that the DAP request has been rejected because an asynchronous response is required and the client did not indicate willingness to accept an asynchronous response.

The response code text is used to indicate the reason for the rejection. However, since the '400' HTTP response code is not specific to asynchronous DAP (the standard text for the '400' code is "Bad Request"), the X-DAP-Async-Required HTTP response header is also included in the response (see above).

Note that a standard 400 HTTP response code is returned. In this way, a client that does not understand asynchronous DAP can fail gracefully. The response code text message has been changed to be more informative of the reason for the failure. For clients that are aware of asynchronous DAP, the X-DAP-Async-Required header is set to "true". The body of the response also returns some information the client can use to decide on how it will continue.

An alternative would be to use a non-standard 4xx HTTP response code (e.g., we could choose 473). Clients should interpret any 4xx code that they do not recognize as a 400. However, this may not be handled well by all clients.

4.5.3.3 409 Conflict - DAP Response Not Ready

The '409 Conflict' HTTP response code is used to indicate that the DAP request has been rejected because a previous asynchronous request has not been completed and the result is not ready for access.

4.5.3.4 412 Precondition Failed

The '412 Precondition Failed' HTTP response code is used to indicate that the DAP request has been rejected because it did not meet the X-DAP-Async-Accept condition (see above) that was specified in the request.

4.6 Asynchronous Response Documents

The three uses of these documents are:

  • to inform clients that a request will result in an asynchronous response;
  • to provide clients with the status of an accepted asynchronous request; and
  • to inform clients that a request for an asynchronous response has been rejected.

4.6.1 DAP Asynchronous Response Required

This document informs clients that a request will result in an asynchronous response, and that the client has not yet indicated its willingness to accept an asynchronous response. This document must be a plain DAP4 Error response (since the client has not told the server it knows about asynchronous responses, there's a real chance it doesn't and won't know how to process the AsynchronousResponse XML documents).

<Error/>
<!-- Syntax TBD -->

These documents are XML that follows the DAP Asynchronous XML schema and are declared in the namespace http://opendap.org/ns/dap/asynchronous.

4.6.2 DAP Asynchronous Request Accepted

This response informs clients that a request resulting in an asynchronous response has been accepted, and provides operational information about retrieving the asynchronous response result. Note that the values of the expectedDelay and responseLifetime elements are estimates made by the server. A server SHOULD ensure that the response will remain available for the time period given by expectedDelay and responseLifetime. We say SHOULD and not MUST because we cannot predict all possible operational situations where these kinds of responses might be used. For example, a server might be providing access for several types of users who might have different access priorities, especially to limited resources like those typically involved with asynchronous access, and thus some responses might be further delayed, or removed early, to enable processing of requests from users with higher priority. It should be kept in mind, however, that the usefulness of the asynchronous responses will depend, in part, on servers' providing a facility on which clients can depend.

While the expectedDelay and responseLifetime elements are required, a server MAY set their seconds attribute to 0 to indicate that it cannot provide a reliable value. In this case, clients SHOULD poll every 300 seconds and servers SHOULD expect this behavior. This is the default TCP user timeout period (see http://tools.ietf.org/html/rfc5482).

<AsynchronousResponse status="accepted">
  <expectedDelay seconds="600" />
  <responseLifetime seconds="3600"/>
  <link href="http://server.org/async/path/result" />
</AsynchronousResponse>
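
A client could read this document roughly as follows. This is a sketch under assumptions: the function name parse_accepted is hypothetical, the element and attribute names are taken from the example above, and namespace prefixes are stripped so that the code works whether or not the document declares the asynchronous namespace. The 300-second fallback applies the polling rule stated above for a seconds value of 0.

```python
import xml.etree.ElementTree as ET

DEFAULT_POLL_SECONDS = 300  # used when the server cannot estimate the delay


def _local(tag: str) -> str:
    """Drop a leading '{namespace}' from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def parse_accepted(document: str):
    """Return (result_url, seconds_before_first_poll, lifetime_seconds)."""
    root = ET.fromstring(document)
    if root.get("status") != "accepted":
        raise ValueError("not a DAP Asynchronous Request Accepted document")
    children = {_local(child.tag): child for child in root}
    delay = int(children["expectedDelay"].get("seconds"))
    # A lifetime of 0 means the server could not give a reliable value.
    lifetime = int(children["responseLifetime"].get("seconds"))
    href = children["link"].get("href")
    poll_after = delay if delay > 0 else DEFAULT_POLL_SECONDS
    return href, poll_after, lifetime
```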

4.6.3 DAP Asynchronous Response Not (Yet) Available

This document informs clients that, while a previous request for an asynchronous response has been accepted, the result is not yet available.

<AsynchronousResponse status="pending"/>

4.6.4 DAP Asynchronous Request Rejected

This document informs clients that a request for an asynchronous response has been rejected, even though the client said it is willing to process an asynchronous response.

<AsynchronousResponse status="requestRejected">
    <expectedDelay seconds="600" />
    <description>Acceptable access delay was less than estimated delay.</description>
</AsynchronousResponse>

4.7 Examples

4.7.1 Constrained Data Request-Response using GET

Simple Request
GET /dap/path/data.nc?projection=x,y,temp HTTP/1.1
Host: server.org
Accept: */*

If the server decides it needs to handle this request in an asynchronous manner, it will refuse the request because the client did not indicate that it would accept an asynchronous response.

Response
400 DAP Asynchronous Response Required
X-DAP-Async-Required: true
Content-Type: application/vnd.opendap.org.dap.asynchronous+xml;charset=UTF-8
 
<AsynchronousResponse status="required">
  <expectedDelay millisec="600000" />
  <notificationSupport>
    <polling frequencyLimitInMillisecs="60000" />
    <email />
    <longConnect />
    <http />
  </notificationSupport>
  ...
</AsynchronousResponse>

4.7.2 Constrained Data Request-Response with X-DAP-Async-Accept Request Header

Request:

GET /dap/path/data.nc?projection=x,y,temp HTTP/1.1
Host: server.org
X-DAP-Async-Accept: 0
Accept: multipart/mixed

Alternately, this request would produce the same result using only the URL:

GET /dap/path/data.nc.dap?async=0&projection=x,y,temp HTTP/1.1
Host: server.org


Response:

202 Accepted
Content-Type: text/xml;charset=UTF-8

<AsynchronousResponse status="accepted">
  <expectedDelay millisec="600000" /> <!-- Estimated delay of 10 minutes. -->
  <notificationSupport>
    <polling frequencyLimitInMillisecs="60000" /> <!-- Don't poll more often than every minute. -->
  </notificationSupport>
  <access
          role="http://services.opendap.org/dap4/data#"
          type="multipart/mixed"
          href="http://server.org/async/path/data.nc?projection=x,y,temp" />
  ...
</AsynchronousResponse>


4.7.3 Constrained Data Request-Response with conditional X-DAP-Async-Accept Request Header

Request:

GET /dap/path/data.nc?projection=x,y,temp HTTP/1.1
Host: server.org
X-DAP-Async-Accept: 60
Accept: application/vnd.opendap.org.dap4.data

Alternately, this request would produce the same result using only the URL:

GET /dap/path/data.nc.dap?async=60&projection=x,y,temp HTTP/1.1
Host: server.org

Response:

412 Precondition Failed
Content-Type: application/vnd.opendap.org.dap.asynchronous+xml;charset=UTF-8
 
<AsynchronousResponse status="requestRejected">
    <expectedDelay millisec="600000" /> <!-- Greater than the limit given in the X-DAP-Async-Accept header. -->
    <description>Acceptable access delay was less than estimated delay.</description>
</AsynchronousResponse>


4.7.4 Premature Request For Asynchronous Result

Request:

GET /async/path/data.nc?projection=x,y,temp HTTP/1.1
Host: server.org
Accept: multipart/mixed

Alternately, this request would produce the same result using only the URL:

GET /async/path/data.nc?projection=x,y,temp HTTP/1.1
Host: server.org


Response:

409 Conflict
Content-Type: text/xml;charset=UTF-8

<AsynchronousResponse status="pending">
  <expectedDelay millisec="599979" /> <!-- Estimated delay. -->
  <notificationSupport>
    <polling frequencyLimitInMillisecs="60000" /> <!-- Don't poll more often than every minute. -->
  </notificationSupport>
  <access
          role="http://services.opendap.org/dap4/data#"
          type="multipart/mixed"
          href="http://server.org/async/path/data.nc?projection=x,y,temp" />
  ...
</AsynchronousResponse>

4.7.5 Constrained Data Request-Response using POST

Request:

POST /dap/path/data.nc HTTP/1.1
Host: server.org
X-DAP-Async-Accept: 0
Content-Type: application/vnd.opendap.org.dap4.ce+xml;charset=UTF-8
  
<constraintExp>...</constraintExp>

Alternately, this request would produce the same result using only the URL:

POST /dap/path/data.nc.dap HTTP/1.1
Host: server.org

<constraintExp>async=0&amp;projection=x,y,temp</constraintExp>

Response:

202 Accepted
Content-Type: application/vnd.opendap.org.dap.asynchronous+xml;charset=UTF-8
 
<AsynchronousResponse status="accepted">
  <expectedDelay millisec="600000" /> <!-- Estimated delay of 10 minutes. -->
  <notificationSupport>
    <polling frequencyLimitInMillisecs="60000" /> <!-- Don't poll more often than every minute. -->
  </notificationSupport>
  <access
          role="http://services.opendap.org/dap4/data#"
          type="multipart/mixed"
          href="http://server.org/async/path/data.nc?projection=x,y,temp" />
  ...
</AsynchronousResponse>

4.7.6 Support notification of completion

Request:

thing

Response:

thing

5 Rationale for the solution

Um... Cause it's gonna work!


6 Discussion

I think this is better than sliced bread ndp