DAP4: Encoding for the Data Response

Background

There are two different approaches to deserializing the data received by a DAP client: the client may process the data as it is received (i.e., eager evaluation) or it may write those data to a store and process them after the fact (i.e., lazy evaluation). A variant of these techniques is to process the data and also write it to a store, presumably because the initial processing steps are useful while having the data stored for later processing enables still other uses. However, in this document I'm not going to look at that variant because experience so far with DAP2 has given no indication that it would provide any performance benefit. We do have example clients that use both eager and lazy evaluation.

HTTP/1.1 defines a chunked transfer encoding. In the past we have spent a fair amount of time on the notion of chunking as a way to achieve reliable transmission of errors when those errors are encountered during response formulation; that topic won't be addressed in this document. Instead, this document will assume that the entire response described here is chunked in a way that enables reliable transmission of errors. The details of that transfer encoding will be described elsewhere.

References

I found these useful and thought they would be better placed here than lost at the end of a long document.

  1. HTTP/1.1
  2. WikiPedia on Endianness
  3. Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies
  4. WebSockets
  5. W3C WebSockets
  6. Javascript Worker Threads (I wish there were a better reference than a blog post)

Problem addressed

There is a need to move information from the server to a client. The way this is done should facilitate many different designs for both server and client components.

Assumptions:

  1. Since DAP is so closely tied to the web and HTTP, its design is dominated by that protocol's characteristics.
  2. Processing on either the client or the server is an order of magnitude faster than network transmission.
  3. Server memory should be conserved with favor given to a design that does not require storage of large parts of a response before it is transmitted (but large is a relative term).
  4. Clients are hard to write and the existence of a plentiful supply of high-quality clients is important (of course, servers are hard to write, too, but there are between 5 and 10 times the number of DAP2 clients as servers).
  5. The response does not explicitly support a real-time stream of data (e.g., from a temperature sensor, whose stream of readings is a data structure of essentially infinite size). It may, however, be the case that the response can encode that kind of information.

Broad issues:

  1. It should be fast
  2. It should be simple
  3. It should be part of the web - meaning that the XML part(s) should be identifiable/useable by generic web software even though the binary data part will be completely opaque.

Proposed solution

The response document will use the multipart-mime standard. The response is the server's answer to a request for data from a client. Each such request must include either a Constraint Expression enumerating the variables requested or a null CE, which is taken to mean 'return the entire dataset.' A response will consist of two parts:

  1. A DDX that has no attribute information and contains (only) the variables requested; and
  2. A binary part that contains the data for those variables

Although the response uses the multipart-mime standard, there are always exactly two parts: the DDX containing variable names and types, and the binary BLOB containing the data.

Structure of the metadata (DDX) Part

The DataDDX document starts with the initial Content-Type header, which indicates that the response is a multipart MIME document, followed by the first part. The first part always contains the DDX. Note that the Content-Type of this part is text/xml and that its charset parameter is UTF-8. Note also that the transfer encoding is binary.

Content-Type: multipart/related; type="text/xml"; start="<<start-id>>"; boundary="<<boundary>>"
 
--<<boundary>>
Content-Type: text/xml; charset=UTF-8
Content-Transfer-Encoding: binary
Content-Description: ddx
Content-Id: <<start-id>>

    <<DDX here>>
--<<boundary>>
...
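
To illustrate how a client might separate the two parts, here is a minimal sketch in Python that splits a response already held in memory using the boundary string from the outer Content-Type header. The function name and the dictionary-of-headers representation are purely illustrative; this is not a general MIME parser.

def split_response(body: bytes, boundary: str):
    # Parts lie between successive "--<<boundary>>" delimiters; the first
    # piece (the preamble) and whatever follows the final boundary are dropped.
    delim = b"--" + boundary.encode("ascii")
    pieces = body.split(delim)[1:-1]
    parts = []
    for piece in pieces:
        head, _, payload = piece.partition(b"\r\n\r\n")
        headers = {}
        for line in head.split(b"\r\n"):
            if b":" in line:
                name, _, value = line.partition(b":")
                headers[name.strip().lower()] = value.strip()
        parts.append((headers, payload))
    ddx_headers, ddx_xml = parts[0]     # first part: the DDX (text/xml)
    data_headers, blob = parts[1]       # second part: the binary data
    return ddx_xml, data_headers, blob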

Structure of the binary part

The binary part starts with the MIME headers for a Part in a multipart-related document [Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies]. This header will include the byte-order (big-endian or little-endian) used to encode values.

Data in the 'binary part' will be serialized in the order of the variables listed in the DDX part. Essentially this is the serialization format of DAP2, extended to support arrays with varying dimensions and stripped of the redundant information added by various XDR implementations.

The entire binary content of the response is contained in a second part. Note that the Content-Type of this part is application/x-dap-big-endian or application/x-dap-little-endian. The client will use this header to correctly decode data values. The Content-Length header is present here to help internet tools (such as caches) when the server can realistically know the size of the data to be serialized before the serialization takes place. A value of -1 indicates an unknown size.

...
--<<boundary>>
Content-Type: <<application/x-dap-big-endian or application/x-dap-little-endian>>
Content-Transfer-Encoding: binary
Content-Description: data
Content-Id: <<next-id>>
Content-Length: <<-1 or the size in bytes of the binary data>>

...
--<<boundary>>
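
As a sketch of how a client could act on that header, the following picks a struct byte-order prefix from the binary part's Content-Type, assuming the application/x-dap-big-endian and application/x-dap-little-endian subtypes described above; the function name is illustrative only.

import struct

def byte_order_prefix(content_type: str) -> str:
    # Map the Content-Type of the data part onto a struct byte-order prefix.
    ct = content_type.lower()
    if "x-dap-little-endian" in ct:
        return "<"   # little-endian
    if "x-dap-big-endian" in ct:
        return ">"   # big-endian
    raise ValueError("unrecognized DAP4 data Content-Type: " + content_type)

# e.g. struct.unpack(byte_order_prefix(ct) + "i", blob[:4]) decodes one Int32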

Encoding of values

Unlike DAP2, DAP4 will not use XDR. Values will be encoded using the byte order of the server. We will pad all values to four-byte words using an algorithm similar to the one employed by XDR (single- and two-byte scalars will be padded; arrays will be padded to four-byte 'boundaries', but the elements of an array will not be individually padded). This ensures that an Int16 x[1024][1024] takes up 2MB, and not 4MB as would be the case if XDR were used to encode it.
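
The arithmetic behind that claim can be sketched as follows; padded_size is a hypothetical helper, not part of the specification.

def padded_size(element_size: int, num_elements: int = 1, is_array: bool = False) -> int:
    # Scalars occupy a full four-byte word; an array's elements are packed
    # and only the array as a whole is padded to the next four-byte boundary.
    if not is_array:
        return max(4, element_size)          # e.g. a scalar Int16 -> 4 bytes
    raw = element_size * num_elements        # elements are not individually padded
    return (raw + 3) & ~3                    # round the whole array up to 4 bytes

# Int16 x[1024][1024]: 2 * 1024 * 1024 bytes = 2MB, not 4MB as under XDR
assert padded_size(2, 1024 * 1024, is_array=True) == 2 * 1024 * 1024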

About serialization of varying-sized variables

There are several kinds of varying data:

  1. Strings
    • String s;
  2. Array variables that vary in size
    • Int32 i[*];
    • Float64 j[10][*];
  3. Structure variables with varying dimensions and Sequence variables
    • Structure { Int32 i; Int32 j[10]; } thing[*];
    • Sequence { Int32 i; Int32 j[10]; } thing2;
  4. Structure variables that have a varying dimension and one or more fields that vary
    • Structure { Int32 i[*]; Int32 j[10][*]; } thing[*];

Note that there is no practical difference between a (character) String and an integer or floating point array of varying size, except that the types of the elements differ. Thus, the issues associated with encoding Int32 i[*] are really no different from those of encoding the String type. The same logic extends to a varying array of Structures, which can be seen as a 'string' of Structures.
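
The analogy can be made concrete with a short sketch: both a String and a varying Int32 array are written as a length count followed by their elements. A four-byte, big-endian length prefix and end-of-value padding are assumed here purely for illustration; the actual prefix width is left as 'N-bit' in the rules below.

import struct

def encode_varying_int32(values, order=">"):
    # Length count, then the elements; Int32 elements already end on a
    # four-byte boundary, so no trailing padding is needed.
    return struct.pack(order + "i", len(values)) + struct.pack(
        order + "%di" % len(values), *values)

def encode_string(s, order=">"):
    # Same shape as the varying array: length count, then the characters,
    # with the value padded out to a four-byte boundary (an assumption here).
    data = s.encode("utf-8")
    return (struct.pack(order + "i", len(data))
            + data + b"\x00" * (-len(data) % 4))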

General serialization rules

Narrative form:

  1. Fixed size types: Serialized by writing their (encoded) data.
  2. Strings: Serialized by writing their size as an N-bit integer, then their encoded value.
  3. Scalar Structures (which may have String/varying fields): Each field is iteratively serialized.
  4. Arrays (possibly with varying dimensions): An array is serialized by serializing the vector denoted by the leftmost dimension. For a fixed size dimension, each element is serialized. For a varying dimension, the length of the vector is written and then each element is serialized.
  5. Sequences are serialized row by row: first a Start of Instance marker is written, then each of the fields of the row is serialized; this repeats until the last row of the Sequence has been serialized, after which an End of Sequence marker is written.
  6. Opaque types will be treated like Byte [*] variables.
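
A minimal sketch of these rules follows, using a hypothetical in-memory representation of the variables described by the DDX. The byte order, the 32-bit length prefix, and the single-byte SOI/EOS marker values are placeholders chosen for the sketch, and padding to four-byte words is omitted for brevity.

import struct

ORDER = ">"                  # stand-in for the server's native byte order
SOI, EOS = b"\x5a", b"\xa5"  # placeholder marker bytes; actual values are not fixed here

def serialize(var, out):
    # Walk a hypothetical in-memory variable following rules 1-6 above.
    # Padding to four-byte words is omitted to keep the sketch short.
    kind = var["kind"]
    if kind == "scalar":                      # rule 1: fixed-size types
        out.write(struct.pack(ORDER + var["fmt"], var["value"]))
    elif kind == "string":                    # rule 2: length, then the encoded value
        data = var["value"].encode("utf-8")
        out.write(struct.pack(ORDER + "i", len(data)) + data)
    elif kind == "structure":                 # rule 3: each field in turn
        for field in var["fields"]:
            serialize(field, out)
    elif kind == "array":                     # rule 4: leftmost dimension first
        if var.get("varying"):
            out.write(struct.pack(ORDER + "i", len(var["elements"])))
        for element in var["elements"]:       # elements are scalars, or arrays for inner dimensions
            serialize(element, out)
    elif kind == "sequence":                  # rule 5: SOI before each row, EOS after the last
        for row in var["rows"]:
            out.write(SOI)
            for field in row:
                serialize(field, out)
        out.write(EOS)
    # rule 6: Opaque is handled as Byte[*], i.e. a varying array of bytes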

TBD:

  1. How to encode the checksum information? Suggestion: Provide a checksum for each top-level variable in the response. This prevents 'checksum madness' where, because each field in a structure is checksummed, an array of 10^6 two-byte structures results in 2 x 10^6 checksum values; it makes much more sense to checksum the whole structure. Of course, this raises the question: why not checksum the whole dataset/response? The initial use-case for this feature stated 'each variable' but was vague about (or unaware of) exactly what that means.
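
If the per-top-level-variable suggestion were adopted, a server could accumulate one digest over the serialized bytes of each top-level variable. The sketch below uses CRC32 purely as a stand-in, since no checksum algorithm is chosen here.

import zlib

def checksum_top_level(serialized_parts):
    # serialized_parts: iterable of (variable_name, serialized_bytes) pairs,
    # one entry per top-level variable in the DDX.
    return {name: zlib.crc32(data) & 0xFFFFFFFF
            for name, data in serialized_parts}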

Assumptions:

  1. String: it is assumed that the server will know (or can determine without undue cost) the length of the String at the time serialization begins.
  2. It is assumed that the size of any varying dimension will be known at the time that dimension is serialized.
  3. For a Sequence, it is assumed that the total size may be considerable and is not known at the time serialization begins.

Notes:

  1. This will use receiver-makes-right and thus needs a header to convey the server's byte order; I suggest using the Content-Type header's subtype value, which RFC 2045 allows to be an x- value of our choosing (see the sketch after this list).
  2. Sequences cannot contain child Sequences (i.e., we are not allowing 'nested sequences') in DAP4
  3. This set of serialization rules can be modified slightly to support the case where fixed-, string- and varying-size data are separated into different parts of a multipart-mime document. In that case there would be more than two parts to the response.
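
A sketch of receiver-makes-right for item 1 above: the client decodes values in the server's byte order, swapping bytes only when that order differs from its own. The function and its arguments are illustrative; server_order would come from the Content-Type header discussed above.

import array
import sys

def decode_int32s(blob: bytes, server_order: str):
    # server_order is 'big' or 'little', taken from the data part's Content-Type.
    values = array.array("i")        # 4-byte signed integers on common platforms
    values.frombytes(blob)
    if server_order != sys.byteorder:
        values.byteswap()            # the receiver makes the byte order right
    return values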

Rationale

There are two main differences between this proposed design and Proposed DAP4 On-The-Wire Format: The data corresponding to varying-size variables is mixed in with the fixed-size variables, and this design depends on the DDX (i.e., the metadata in the first of the two Parts of the response) to provide critical information regarding the organization of the binary information.

Combining the fixed- and varying-size data

The effort required by a server to build a response that supports random access does not seem justified. To build a response that separates the fixed- and varying-size data values, the server must either make two passes at serializing the response or store the varying-size data after serialization until all of the fixed-size data have been serialized and transmitted. If DAP were operating over transports that used parallel I/O, this would not be the case; for HTTP, however, that issue is moot. A two-pass serialization process is complex, and storing the serialized varying-size data is not acceptable (either because it demands run-time memory or because it uses slow secondary storage).

On the other hand, clients can easily read the response described here and reformat it for random access use. In addition, a client may be able to take advantage of information unavailable to the server, such as the intended use of the data, to optimize the storage in ways that the server cannot. What performance penalties will this place on a client? If the client uses only a single thread/process, it cannot begin to use the data until all of the response has been read from the socket. However, it can reformat the data while they are being read (either by writing those data to two or more files or to different parts of memory) and it will have to do that anyway. That is, a single-threaded client is stuck reading all of the bytes of the response from a socket and storing them somewhere before it can do anything else. A savvy client would certainly look at the DDX and note that if it contains only fixed-size variables, it is already in a form amenable to random access. Either way, the client can have the data in a form that facilitates random access just as soon as the response is completely received (based on the assumption that network I/O is far slower than even spinning disk I/O). A multi-threaded (or double-buffered) client could do significant processing while waiting for successive read operations to complete.

Removing (most of) the tags from the binary content

In the initial Proposed DAP4 On-The-Wire Format, tags for the sizes and types were included in the binary stream so that it was not necessary to also 'walk the DDX' to deserialize the data. This has considerable appeal: it makes error detection easier and makes the response document less bound to a particular deserialization scheme. So why leave those tags out? Compactness and deserialization speed. This response carries the minimal amount of extra information, which makes it compact but also faster to deserialize for single-threaded clients (assumed to be most clients). It is assumed that most clients will read a block from a socket, store it somewhere, read the next block, and so on, effectively mixing the parsing of the response with the network I/O. The use of prefixed length information, kept to a minimum, minimizes the number of reads for a typical client implementation. Of course, a client could read fixed-size blocks from the stream and parse them in memory, but those that do will hardly suffer from this response format.

Suitability for other protocols

Two candidate protocols for DAP appear to be AMQP and WebSockets. I know of no real draw to implement DAP over AMQP; WebSockets seems like it could be very useful for building interactive web-based UIs, but it is also still very much a draft. I think we should focus our efforts on a response that works well with HTTP.

Discussion

Example responses

In these examples, spaces and newlines have been added to make them easier to read. The real responses are as compact as they can be.

A single scalar

Dataset {
    Int32 x;
} foo;

(NB: Some poetic license used in the following)

Content-Type: multipart/related; type="text/xml"; start="<<start-id>>"; boundary="<<boundary>>"
 
--<<boundary>>
Content-Type: text/xml; charset=UTF-8
Content-Transfer-Encoding: binary
Content-Description: ddx
Content-Id: <<start-id>>

    <<DDX here>>
--<<boundary>>
Content-Type: <<application/x-dap-big-endian or application/x-dap-little-endian>>
Content-Transfer-Encoding: binary
Content-Description: data
Content-Id: <<next-id>>
Content-Length: <<-1 or the size in bytes of the binary data>>

x 
--<<boundary>>

A single array

Dataset {
    Int32 x[2][4];
} foo;
...
Content-Length: <<-1 or the size in bytes of the binary data>>

x00 x01 x02 x03 x10 x11 x12 x13 
--<<boundary>>

A single structure

Dataset {
    Structure {
        Int32 x[2][4];
        Float64 y;
    } s;
} foo;
...
Content-Length: <<-1 or the size in bytes of the binary data>>

x00 x01 x02 x03 x10 x11 x12 x13 
y 

--<<boundary>>


An array of structures

Dataset {
    Structure {
        Int32 x[2][4];
        Float64 y;
    } s[3];
} foo;
...
Content-Length: <<-1 or the size in bytes of the binary data>>

x00 x01 x02 x03 x10 x11 x12 x13 
y 
x00 x01 x02 x03 x10 x11 x12 x13 
y 
x00 x01 x02 x03 x10 x11 x12 x13 
y 

--<<boundary>>

A single varying array (one varying dimension)

Dataset {
    String s;
    Int32 a[*];
    Int32 x[2][*];
} foo;
...
Content-Length: <<-1 or the size in bytes of the binary data>>

16 This is a string 

5 a0 a1 a2 a3 a4

3 x00 x01 x02 6 x10 x11 x12 x13 x14 x15 

--<<boundary>>

NB: varying dimensions are treated 'like strings' and prefixed with a length count. In the last of the three variables, the array x is a 2 by varying array with the example's first 'row' containing 3 elements and the second 6.
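
A sketch of how a client walking the DDX would read Int32 x[2][*] from this example: the outer dimension is fixed (2), so no count precedes it, while each inner, varying row is prefixed by its length. A four-byte, big-endian length prefix is assumed, as in the earlier sketches.

import struct

def read_varying_rows(stream, outer_size, order=">"):
    rows = []
    for _ in range(outer_size):
        (n,) = struct.unpack(order + "i", stream.read(4))       # row length
        rows.append(list(struct.unpack(order + "%di" % n, stream.read(4 * n))))
    return rows

# For the example above, the first row decodes to 3 values and the second to 6.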

A single varying array (two varying dimensions)

Dataset {
    Int32 x[*][*];
} foo;
...
Content-Length: <<-1 or the size in bytes of the binary data>>

3

3 x00 x01 x02 

6 x10 x11 x12 x13 x14 x15

1  x20 

--<<boundary>>

A varying array of structures

Dataset {
    Structure {
        Int32 x[4][4];
        Float64 y;
    } s[*];
} foo;
...
Content-Length: <<-1 or the size in bytes of the binary data>>

2

x00 x01 x02 x03 x10 x11 x12 x13
y 

x00 x01 x02 x03 x10 x11 x12 x13 
y 

--<<boundary>>

NB: two rows...

A varying array of structures with fields that have varying dimensions

Dataset {
    Structure {
        Int32 x[2][*];
        Float64 y;
    } s[*];
} foo;
...
Content-Length: <<-1 or the size in bytes of the binary data>>

3

1 x00 4 x10 x11 x12 x13 
y 

3 x00 x01 x02 2 x10 x11
y 

2 x00 x01 2 x10 x11
y 
--<<boundary>>


A Sequence

Dataset {
    Sequence {
        Int32 x[2][*];
        Float64 y;
        Float64 z;
        Structure {
            Int32 p;
            Int32 q;
        } ps_and_qs;
    } s;
} foo;
...
Content-Length: <<-1 or the size in bytes of the binary data>>

SOI

1 x00 4 x10 x11 x12 x13 
y 
z
p
q

SOI
3 x00 x01 x02 2 x10 x11
y 
z
p
q

SOI
2 x00 x01 2 x10 x11
y
z
p
q

EOS
--<<boundary>>
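
A client deserializes a Sequence like this one by reading rows until it sees the End of Sequence marker. The sketch below uses the same placeholder single-byte marker values as the serialization sketch earlier; read_row stands for a hypothetical callable that deserializes one row's fields by walking the DDX.

import struct

SOI, EOS = 0x5A, 0xA5       # placeholder marker values; not fixed by this document

def read_sequence(stream, read_row):
    rows = []
    while True:
        (marker,) = struct.unpack(">B", stream.read(1))
        if marker == EOS:
            break                        # End of Sequence: no more rows
        assert marker == SOI, "malformed Sequence stream"
        rows.append(read_row(stream))    # deserialize one row's fields
    return rows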