DAP4: DAP4 Replacing Chunking

From OPeNDAP Documentation
Latest revision as of 18:11, 24 August 2012

Background

If one accepts that the multipart-mime boundary is indeed a unique string in the response, then the existing specification for chunking the data part of the Data DMR is redundant.

Proposal

I propose that we get rid of the chunking and instead modify the multipart-mime representation to be three parts instead of the current two. The additional, third part would indicate the success or failure of the preceding parts. Because the multipart-mime boundary is (by assumption) unique, it is always possible to unambiguously locate the final success/failure part. This satisfies the original reason for using chunking, which was to allow for the insertion of an error message into the output stream at any point.
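As an illustration only, the three-part layout could be assembled as in the following Python sketch. The function name, the CRLF separators, and the omission of per-part MIME headers are assumptions made for brevity and are not part of the proposal; the final delimiter is written as a plain --<boundary> to match the trailer layout described in this page. The check for the delimiter inside a part simply enforces the uniqueness assumption stated above.

```python
CRLF = b"\r\n"


def build_response(boundary: bytes, dmr: bytes, data: bytes, status: bytes) -> bytes:
    """Assemble a three-part body: DMR, data, then success/error info.

    Hypothetical helper: per-part MIME headers are omitted for brevity.
    """
    delim = b"--" + boundary
    # The proposal assumes the boundary is unique, so refuse any part
    # that happens to contain the delimiter.
    for part in (dmr, data, status):
        if delim in part:
            raise ValueError("boundary occurs inside a part; choose another boundary")
    pieces = []
    for part in (dmr, data, status):
        pieces += [delim, CRLF, part, CRLF]
    pieces += [delim, CRLF]  # closing boundary after the status part
    return b"".join(pieces)
```

With this layout a server that fails midway through the data part can still end the data part early and report the failure in the third part.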

This proposal has a number of advantages.

  1. It simplifies the client and server processing by eliminating the extra processing required by chunking.
  2. It removes the need to test the chunking layer.
  3. It no longer duplicates the existing HTTP chunked transfer encoding.
  4. It works for any data format: binary, json, protobuf, utf. They all are treated the same.

The cost is in searching the incoming stream of bytes for the boundary. Using, for example, the Boyer-Moore [1] fast string search I believe that cost is low, especially since the boundary string is long.

Note that every response will end with the following.

(1) --<boundary>
(2) success info or error info
(3) --<boundary>

So this trailer can be located easily by searching backward from the end of the response for the boundary that precedes the success or error info (line 1). This requires searching at most a few hundred bytes.
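A minimal Python sketch of that backward search follows. The function name, the 512-byte window, and the assumption that the status info is separated from the boundaries by line breaks are illustrative choices, not part of the proposal.

```python
from typing import Optional


def find_trailer(response: bytes, boundary: bytes, window: int = 512) -> Optional[bytes]:
    """Return the success/error info from the end of a response, or None.

    Hypothetical helper: only the last `window` bytes are examined.
    """
    delim = b"--" + boundary
    tail = response[-window:]
    # Closing boundary (line 3): the last occurrence of the delimiter.
    end = tail.rfind(delim)
    if end == -1:
        return None
    # Boundary before the status part (line 1): the last occurrence
    # that lies entirely before the closing boundary.
    start = tail.rfind(delim, 0, end)
    if start == -1:
        return None
    return tail[start + len(delim):end].strip()
```

If the status info could be longer than the window, the caller would need to retry with a larger window; the sketch returns None in that case rather than guessing.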

Dennis Heimbigner

[1] Boyer-Moore Search (http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm). Note that this page also contains both C and Java implementation code.

Possible Extension

It is possible to extend this proposal to do proper semantic chunking by extending the multipart-mime format from 2+1 parts to 2+N parts. Each of the N middle parts would contain the data for a single variable.
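As a rough sketch of how a client might consume such a 2+N response, the Python below splits the body on the boundary and separates the parts. It assumes one reading of the layout (first part is the DMR, last part is the success/error info, each middle part is one variable), CRLF separators, and no per-part headers; the function name is hypothetical. It requires Python 3.9 or later.

```python
def split_parts(response: bytes, boundary: bytes) -> tuple[bytes, list[bytes], bytes]:
    """Split a 2+N response into (dmr, per-variable data parts, status).

    Hypothetical helper: relies on the boundary being unique in the response.
    """
    delim = b"--" + boundary
    pieces = response.split(delim)
    # pieces[0] precedes the first boundary and pieces[-1] follows the
    # closing boundary; neither is a part.
    parts = [
        # Remove exactly one CRLF on each side so that binary data
        # beginning or ending with CR/LF bytes is left intact.
        p.removeprefix(b"\r\n").removesuffix(b"\r\n")
        for p in pieces[1:-1]
    ]
    if len(parts) < 2:
        raise ValueError("expected at least a DMR part and a status part")
    return parts[0], parts[1:-1], parts[-1]
```

A streaming client would not buffer the whole response like this; the sketch only shows how the boundary alone is enough to recover the individual parts.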