DAP4: Chunked encoding: Difference between revisions

From OPeNDAP Documentation

Revision as of 16:55, 12 June 2012


Background

One persistent (ahem) problem with DAP2 was the inability of client applications to recognize when a data transmission failed. If an error happened before the initial set of response headers was sent, DAP2's error reporting scheme worked just fine. However, if the server encountered an error after serialization of the data had started, it was essentially impossible for a client to detect it ('essentially' being the operative word; the Ocapi library did detect these errors, but it is the only client I know of that did so).

This proposal suggests that DAP4 use a simple variation on the HTTP/1.1 chunked transmission scheme to serialize the data part of the response document so that errors are simple to detect. Furthermore, this scheme is independent of the form and content of that part of the response, so the same scheme can be used with different response forms, or dropped when/if DAP is used with protocols that support out-of-band error signaling. That independence simplifies our ongoing refinement of the protocol.

References

  1. HTTP/1.1

Problem Addressed

The DAP needs to format its data responses so that a server that interleaves transmission with data reads and serialization can signal errors to a receiver reliably.

Proposed Solution

Overview

The data part of a response document (from now on I will just say 'response document') will be 'chunked' in a fashion similar to that outlined in HTTP/1.1. However, in addition to a prefix indicating the size of the chunk, DAP4 will include a chunk-type code. This will provide a way for the receiver to know if the next chunk is part of the data response or if it contains an error response. In the latter case, the client should assume that the data response ended, even though the correct closing information was not provided.

More detail

Each chunk will be prefixed by a chunk header consisting of a chunk type and a byte count, both contained in a single four-byte word encoded in network byte order. The chunk type will be encoded in the high-order byte of the word and the chunk size will be given by the three remaining bytes. The largest size three bytes can express is 2^24 - 1 (16 777 215) bytes, so a chunk is limited to just under 16 MB. Immediately following the four-byte chunk header will be chunk-size bytes of chunk data, followed by the next chunk header, if any.

Three chunk types are defined in this proposal:

data
This chunk header prefixes the next chunk in the current data response
error
This chunk header prefixes an error message; the current data response has ended
end
This chunk header is the last one for the current data response
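
As an illustration of the header layout described above, here is a minimal Python sketch that packs and unpacks the four-byte word. The numeric codes (0 = data, 1 = error, 2 = end) come from the Grammar section of this proposal; the function and constant names are hypothetical and not part of any DAP4 library.

```python
import struct

# Chunk-type codes as given in the Grammar section (0 = data, 1 = error, 2 = end).
CHUNK_DATA, CHUNK_ERROR, CHUNK_END = 0, 1, 2

# Largest value that fits in the three size bytes.
MAX_CHUNK_SIZE = (1 << 24) - 1


def pack_chunk_header(chunk_type, size):
    """Return the four-byte header: type in the high byte, size in the low three."""
    if not 0 <= chunk_type <= 0xFF:
        raise ValueError("chunk type must fit in one byte")
    if not 0 <= size <= MAX_CHUNK_SIZE:
        raise ValueError("chunk size must fit in three bytes")
    # ">I" is an unsigned 32-bit integer in network (big-endian) byte order.
    return struct.pack(">I", (chunk_type << 24) | size)


def unpack_chunk_header(header):
    """Split a four-byte header into (chunk_type, chunk_size)."""
    (word,) = struct.unpack(">I", header)
    return word >> 24, word & 0x00FFFFFF
```

For example, `pack_chunk_header(CHUNK_DATA, 5)` yields the bytes `00 00 00 05`, and a receiver can recover both fields with a shift and a mask.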

JohnCaron

1) Perhaps "message" is a better name than "chunk"?

2) I would not limit the size of the chunk to 2^24. In fact, I would use a base-128 variable length encoding to let the size be as large as needed, without wasting space.
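
To make the base-128 suggestion concrete, here is a rough Python sketch of one common variable-length scheme. The comment above does not say which variant is meant; this sketch assumes the unsigned, least-significant-group-first form in which the high bit of each byte marks "more bytes follow". The function names are hypothetical.

```python
def encode_varint(n):
    """Encode a non-negative integer as base-128 bytes, low-order group first."""
    if n < 0:
        raise ValueError("only non-negative sizes can be encoded")
    out = bytearray()
    while True:
        group = n & 0x7F          # next seven bits of the value
        n >>= 7
        if n:
            out.append(group | 0x80)  # high bit set: more bytes follow
        else:
            out.append(group)         # high bit clear: last byte
            return bytes(out)


def decode_varint(buf, pos=0):
    """Decode one varint from buf starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7
```

Under this assumed scheme, sizes below 128 take one byte and a size of 2^24 takes four bytes, so small chunks cost less than the fixed header while the upper bound disappears. The trade-off is that the header is no longer a fixed width.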

Grammar
response = chunk *chunk

chunk = chunk-header chunk-data

chunk-header = b0 b1 b2 b3 

chunk-type = b0 (OCTET)
                      ; 0 = data, 1 = error, 2 = end

chunk-size = b1 b2 b3 3(OCTET)
                      ; the three lower-order bytes, interpreted as an integer in network byte order

chunk-data = chunk-size(OCTET)
                      ; exactly chunk-size bytes
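
The grammar can be exercised with a short receiver loop. The following Python sketch is illustrative only and rests on several assumptions not fixed by this proposal: that the stream is a blocking file-like object whose `read(n)` returns fewer than `n` bytes only at end of stream, that an end chunk may carry the final data bytes (possibly zero), and that an error chunk's body is UTF-8 text. All names are hypothetical.

```python
import io
import struct

CHUNK_DATA, CHUNK_ERROR, CHUNK_END = 0, 1, 2  # codes from the grammar above


class ChunkedResponseError(Exception):
    """Raised for an error chunk, an unknown chunk type, or a truncated stream."""


def _read_exact(stream, n):
    # Assumes read(n) only comes up short at end of stream.
    buf = stream.read(n)
    if len(buf) != n:
        raise ChunkedResponseError("stream ended before the chunk was complete")
    return buf


def read_response(stream):
    """Reassemble the data response, or raise if the server signalled an error."""
    payload = bytearray()
    while True:
        (word,) = struct.unpack(">I", _read_exact(stream, 4))
        chunk_type, size = word >> 24, word & 0x00FFFFFF
        body = _read_exact(stream, size)
        if chunk_type == CHUNK_DATA:
            payload += body
        elif chunk_type == CHUNK_END:
            payload += body  # assumption: the end chunk may hold the last bytes
            return bytes(payload)
        elif chunk_type == CHUNK_ERROR:
            # The data response has ended; surface the message to the caller.
            raise ChunkedResponseError(body.decode("utf-8", errors="replace"))
        else:
            raise ChunkedResponseError(f"unknown chunk type {chunk_type}")
```

A stream that simply stops without an end chunk is reported the same way as an explicit error chunk, which is the failure DAP2 clients could not detect.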

Rationale

The chunk headers use a simple prefix with a binary count because it will be fast and easy to build without cumbersome binary-to-ASCII conversions (although HTTP uses ASCII...). It will be simple to encode the chunk type and count in a single four-byte word and equally simple to decode it on the client. Using network byte order makes it straightforward to read. Having space for 256 chunk types is overkill, but it lets clients use byte masks, which might be easier in some programming environments. The maximum data block size of 16 MB seems large enough.

Discussion

Jimg 17:10, 8 June 2012 (PDT) Question: Why apply this to just the BLOB part of the data response? If we chunk the whole response, then only DAP4 clients will be able to read it. If we chunk only the BLOB part, then a generic web client can do something with the first part of the response.