DAP4: Replacing Chunking
Background
If one accepts that the multipart-mime boundary is indeed a unique string in the response, then the existing specification for chunking the data part of the Data DMR response is redundant.
Proposal
I propose that we get rid of the chunking and instead modify the multipart-mime representation to be three parts instead of the current two. The additional, third part would indicate the success or failure of the preceding parts. Because the multipart-mime boundary is (by assumption) unique, it is always possible to unambiguously locate the final success/failure part. This satisfies the original reason for using chunking, which was to allow for the insertion of an error message into the output stream at any point.
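As a rough illustration of the client side, the sketch below splits a response into its three parts and reads the final one. It is only a sketch under assumptions: the boundary value, the content types, and the status text `success` are hypothetical placeholders, since this proposal does not define the format of the status part.

```python
def split_parts(body: bytes, boundary: bytes) -> list[tuple[bytes, bytes]]:
    """Split a multipart body into (headers, content) pairs.

    Relies on the proposal's assumption that the boundary never occurs
    inside any part's content.
    """
    delimiter = b"--" + boundary
    parts = []
    # pieces[0] is the (normally empty) preamble before the first delimiter.
    for piece in body.split(delimiter)[1:]:
        if piece.startswith(b"--"):
            break  # closing delimiter "--boundary--" reached
        # Remove exactly one CRLF at each end; the content may be binary
        # and may itself contain CR or LF bytes.
        if piece.startswith(b"\r\n"):
            piece = piece[2:]
        if piece.endswith(b"\r\n"):
            piece = piece[:-2]
        headers, _, content = piece.partition(b"\r\n\r\n")
        parts.append((headers, content))
    return parts


def read_response(body: bytes, boundary: bytes) -> tuple[bytes, bytes, bytes]:
    """Return (dmr, data, status) from a three-part response."""
    parts = split_parts(body, boundary)
    if len(parts) != 3:
        raise ValueError(f"expected 3 parts, found {len(parts)}")
    dmr, data, status = (content for _headers, content in parts)
    return dmr, data, status


# Hypothetical boundary and payloads, for illustration only.
BOUNDARY = b"dap4-boundary-7f3a9c"
DELIM = b"--" + BOUNDARY
example = b"".join([
    DELIM, b"\r\nContent-Type: text/xml\r\n\r\n<Dataset/>\r\n",
    DELIM, b"\r\nContent-Type: application/octet-stream\r\n\r\n\x00\x01\r\n\x02\r\n",
    DELIM, b"\r\nContent-Type: text/plain\r\n\r\nsuccess\r\n",
    DELIM, b"--\r\n",
])

dmr, data, status = read_response(example, BOUNDARY)
if status != b"success":
    raise RuntimeError("server reported failure: " + status.decode("ascii", "replace"))
```

A server that fails partway through the data part would simply end that part, emit the boundary, and write the error into the third part; the client discovers the failure without any per-chunk bookkeeping.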
This proposal has a number of advantages.
- It simplifies the client and server processing by eliminating the extra processing required by chunking.
- It no longer duplicates the existing HTTP chunked transfer encoding.
- It works for any data format (binary, JSON, protobuf, UTF text); all are treated the same.
The cost is in searching the incoming stream of bytes for the boundary. With a fast string search such as Boyer-Moore [1], I believe that cost is low, especially since the boundary string is long: the longer the pattern, the further the search can skip ahead after a mismatch.
[1] Boyer-Moore Search. Note that this page also contains both C and Java implementation code.
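To make the cost argument concrete, here is a minimal sketch of the Boyer-Moore-Horspool variant, a simplification of Boyer-Moore that keeps only the bad-character shift table. The function name and the sample boundary are hypothetical, and this is not meant as the implementation a DAP4 client would have to use.

```python
def horspool_find(haystack: bytes, needle: bytes) -> int:
    """Return the index of the first occurrence of needle, or -1."""
    m, n = len(needle), len(haystack)
    if m == 0:
        return 0
    # For every byte of the needle except the last, record its distance
    # from its rightmost occurrence to the end of the needle.
    shift = {byte: m - 1 - i for i, byte in enumerate(needle[:-1])}
    pos = 0
    while pos + m <= n:
        if haystack[pos:pos + m] == needle:
            return pos
        # Shift by the table entry for the byte aligned with the needle's
        # last position; bytes absent from the needle allow a full skip of m.
        pos += shift.get(haystack[pos + m - 1], m)
    return -1


# Hypothetical boundary; a long boundary lets the search skip up to
# len(boundary) bytes per step through data that does not resemble it.
boundary = b"--dap4-boundary-7f3a9c"
stream = b"\x00" * 1000 + boundary + b"--\r\n"
index = horspool_find(stream, boundary)
```

In Python specifically, the built-in `bytes.find` would probably be the practical choice; the sketch is only meant to show why a long boundary keeps the number of inspected positions small.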
Possible Extension
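It is possible to extend this proposal to do proper semantic chunking by extending the multipart-mime format from 2+1 parts to 2+N parts: the first part and the final success/failure part stay as they are, and the single data part becomes N middle parts. Each of the N middle parts would contain the data for a single variable.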
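A server-side sketch of the 2+N layout might look like the following. Everything specific here is an assumption made for illustration: the `X-Variable-Name` header, the content types, the status text, and the function name are hypothetical and not part of any DAP4 specification.

```python
def build_response(boundary: bytes, dmr: bytes,
                   variables: dict[str, bytes], status: bytes) -> bytes:
    """Assemble a 2+N part body: DMR, one part per variable, then status."""
    delimiter = b"--" + boundary
    # The whole scheme rests on the boundary being unique, so check it.
    if any(delimiter in payload for payload in (dmr, status, *variables.values())):
        raise ValueError("boundary occurs inside a payload; choose another boundary")

    chunks = []

    def add_part(headers: bytes, content: bytes) -> None:
        chunks.append(delimiter + b"\r\n" + headers + b"\r\n\r\n" + content + b"\r\n")

    add_part(b"Content-Type: text/xml", dmr)
    for name, data in variables.items():
        # Hypothetical header naming the variable carried by this part.
        add_part(b"Content-Type: application/octet-stream\r\n"
                 b"X-Variable-Name: " + name.encode("ascii"), data)
    add_part(b"Content-Type: text/plain", status)
    chunks.append(delimiter + b"--\r\n")  # closing delimiter
    return b"".join(chunks)


# Hypothetical boundary and variables.
BOUNDARY = b"dap4-boundary-7f3a9c"
body = build_response(
    BOUNDARY,
    dmr=b"<Dataset/>",
    variables={"temp": b"\x01\x02", "salt": b"\x03"},
    status=b"success",
)
```

With per-variable parts, a client could start decoding one variable as soon as its closing boundary arrives, rather than waiting for the entire data section.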