DAP4: DAP4 Checksum Changes

From OPeNDAP Documentation
Revision as of 21:00, 1 March 2013 by DennisHeimbigner (talk | contribs) (Created page with "DevelopmentDAP4 << Back to OPULS Development ==Background== Currently, the specification says that checksums ar...")
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
⧼opendap2-jumptonavigation⧽

<< Back to OPULS Development

Background

Currently, the specification says that checksums are computed on the contents of the top-level variables in the serialized response. I propose that we change when and where the checksum is computed so that it is computed at the grain of the data chunks instead of at the grain of variable.

Problem Addressed

Checksumming currently requires knowledge of the DMR (and potentially any constraints) so that the contents of "top-level" variables can be identified in the data serialization. Further, checksum errors are not detected until the whole variable has been read.

Proposed Solution

General Format

Each chunk, including the DMR meta-data chunk, has a 16 byte MD5 checksum preceding the contents of the chunk, but following the length count for the chunk. Thus, the general form of a chunk would be as follows:

--------------------------
| length + tags [4bytes] |
--------------------------
| checksum [16 bytes]    |
--------------------------
| data [length-16 bytes] |
--------------------------

The checksum is computed over the contents of the chunk, where the content is treated as uninterpreted bytes. This would also apply to the DMR chunk. The length field for each chunk includes the checksum as part of its length.

DMR Format

In keeping with our current format for the DMR chunk, all parts of this chunk are encoded as UTF-8 characters So the general DMR chunk format would be as follows

------------------------------------------
| 0xHHHHHHHH CRLF                         |
------------------------------------------
| 0xHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH CRLF |
------------------------------------------
| DMR in XML format CRLF                  |
------------------------------------------

Each 'H' stands for a hex digit. As in the general case, the length includes the checksum and the chunk content (i.e. the DMR).

Discussion

Computing the checksum at the chunk level simplifies processing because no knowledge of the DMR is required. The computation can be carried out completely in the chunk reading and writing code.

Additionally, computing the checksum at the chunk level can potentially detect errors more quickly than at the variable level. To see this, suppose that a variable crosses, say, three chunks, and there is an error in the first chunk. If variable level grain is used, three chunks will be read an processed before the checksum error is detected. If chunk grain is used, then the error will be detected as soon as the first chunk is read; the following two chunks need not be processed.

This proposal also introduces no new error processing complexity because a checksum error at the chunk level will generate an IOException error to the layers above. These layers must already be prepared to receive IOException errors such as when, for example, the client-server link is unexpectedly closed.

Dennis