DAP4: VLens (and Sequences)

From OPeNDAP Documentation

Revision as of 19:58, 25 February 2012

[ Unfinished Page ]

"Oh what a tangled web we weave when first we practice to build a type system."

Recently, I made the claim to James that the Sequence construct could serve to represent CDM/HDF5 vlen constructs.

His reply was as follows:

In my opinion we will want vlens to be in the data model, if not as a type, then as a feature of arrays - see the schema and my text on the Data Model page. The reason for that [is that] I have already tried using Sequence for vlen (in hdf4) and it was a failure. Not a failure from the technical POV but because neither server writers nor client writers nor client users ever 'got it.' I think one part of the problem for those people was that vlens in HDF4 do not support relational operators via the API while Sequences are supposed to. So the conceptual mismatch doomed the idea.

Further, Nathan Potter notes:

At one time we considered it as a representation of an SQL database, where essentially the keys that link the tables in the DB define the structure of some nested sequence thing. I think it's a useful idea, but there is no implementation of it to play with.

These are very important observations, and they have caused me to rethink how sequences (and vlens) should be handled in DAP4.

I realize that I was treating sequence and vlen as the same thing and giving short shrift to the fact that we can query (using selection) over sequences. Further, in the few examples I could find, it seemed to me that nested sequences and arrays of sequences were being used, in effect, as vlens.
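As a rough illustration of that distinction, the Python sketch below treats a sequence as a relation whose rows can be filtered, and a vlen as a plain variable-length list that is read whole. The names `select`, `casts`, and `profile_temps` are hypothetical and are not part of any DAP API; the data values are made up for the example.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, float]

def select(rows: Iterable[Row], predicate: Callable[[Row], bool]) -> Iterator[Row]:
    """Yield only the rows that satisfy the predicate (a relational selection)."""
    return (row for row in rows if predicate(row))

# A sequence behaves like a relation: rows with named fields that can be filtered.
casts: List[Row] = [
    {"depth": 5.0, "temp": 18.2},
    {"depth": 50.0, "temp": 11.7},
    {"depth": 200.0, "temp": 6.4},
]
deep = list(select(casts, lambda row: row["depth"] > 20.0))

# A vlen is only a variable-length run of values attached to one element.
# It carries no field names and no selection semantics.
profile_temps: List[float] = [18.2, 11.7, 6.4]
```

Here `deep` holds the two rows deeper than 20.0, while `profile_temps` can only be indexed or sliced like any other list.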

[Aside: The term "vlen" is kind of odd; the term "list" would actually make more sense.]

My current belief is that we should keep sequences but with the following restrictions:

  1. Sequences can only occur as top-level scalar variables within a group.
  2. Sequences may not be nested in any other container (i.e. other sequences or structures).

This keeps sequences for their original purpose of acting as "relations". Every other place where we might previously have used a sequence will now use a vlen.
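The two restrictions can be sketched as a small validity check. This is only an illustration under assumed, simplified node types (`Atomic`, `Structure`, `Sequence`, `Group`); these classes and the `check_group` function are hypothetical and do not reflect the DAP4 schema or any existing library.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

# Hypothetical, simplified node types; an empty `dims` tuple means scalar.
@dataclass
class Atomic:
    name: str
    dims: Tuple[int, ...] = ()

@dataclass
class Structure:
    name: str
    fields: List["Node"] = field(default_factory=list)
    dims: Tuple[int, ...] = ()

@dataclass
class Sequence:
    name: str
    fields: List["Node"] = field(default_factory=list)
    dims: Tuple[int, ...] = ()

@dataclass
class Group:
    name: str
    variables: List["Node"] = field(default_factory=list)
    groups: List["Group"] = field(default_factory=list)

Node = Union[Atomic, Structure, Sequence]

def _check_members(container: Union[Structure, Sequence], path: str) -> List[str]:
    """Restriction 2: no sequence may appear inside a structure or sequence."""
    errors: List[str] = []
    for member in container.fields:
        if isinstance(member, Sequence):
            errors.append(f"{path}.{member.name}: sequence nested in a container")
        if isinstance(member, (Structure, Sequence)):
            errors.extend(_check_members(member, f"{path}.{member.name}"))
    return errors

def check_group(group: Group, path: str = "") -> List[str]:
    """Return the violations of both restrictions found under a group."""
    here = f"{path}/{group.name}"
    errors: List[str] = []
    for var in group.variables:
        # Restriction 1: a top-level sequence must be scalar (no dimensions).
        if isinstance(var, Sequence) and var.dims:
            errors.append(f"{here}/{var.name}: sequence must be scalar")
        if isinstance(var, (Structure, Sequence)):
            errors.extend(_check_members(var, f"{here}/{var.name}"))
    for sub in group.groups:
        errors.extend(check_group(sub, here))
    return errors

root = Group("root", variables=[
    Sequence("obs", fields=[Atomic("time"), Atomic("temp")]),
    Structure("station", fields=[Sequence("casts", fields=[Atomic("depth")])]),
    Sequence("tracks", fields=[Atomic("lat")], dims=(3,)),
])
violations = check_group(root)
```

In this example `obs` is accepted, while `station.casts` (a nested sequence) and `tracks` (an array of sequences) are reported; under the proposal both would be rewritten to use vlens.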

Issues to be addressed:

  1. Translating to/from CDM
  2. Representing vlens syntactically

James has already proposed the following for representing vlens:

One idea is to allow the rightmost dimension of an array to be either fixed in size or marked as varying. I think that captures the vlen semantics. One dimensional arrays would be the simple case; n-dimensional arrays would be the 'array of vlen' case. Since we can have arrays of all types except Sequence...
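To make the "varying rightmost dimension" idea concrete, here is a minimal Python sketch. It assumes one possible in-memory layout (all values concatenated, plus one length per position in the fixed dimensions); the `VlenArray` class and its fields are hypothetical and say nothing about how DAP4 would declare or serialize such arrays.

```python
from dataclasses import dataclass
from itertools import accumulate
from math import prod
from typing import List, Tuple

@dataclass
class VlenArray:
    """Array whose leading dimensions are fixed and whose rightmost one varies."""
    fixed_dims: Tuple[int, ...]   # () for a single vlen, (2, 2) for a 2 x 2 array of vlens
    lengths: List[int]            # one length per fixed-index position, row-major order
    values: List[float]           # contents of all vlens, concatenated

    def __post_init__(self) -> None:
        if len(self.lengths) != prod(self.fixed_dims):
            raise ValueError("need one length per fixed-index position")
        if sum(self.lengths) != len(self.values):
            raise ValueError("lengths do not add up to the number of values")
        # offsets[i] is where the i-th vlen starts in `values`.
        self.offsets = [0] + list(accumulate(self.lengths))

    def get(self, *index: int) -> List[float]:
        """Return the variable-length run stored at a fixed-dimension index."""
        if len(index) != len(self.fixed_dims):
            raise IndexError("wrong number of indices")
        flat = 0
        for i, n in zip(index, self.fixed_dims):
            if not 0 <= i < n:
                raise IndexError("index out of range")
            flat = flat * n + i   # row-major flattening of the fixed dimensions
        return self.values[self.offsets[flat]:self.offsets[flat + 1]]

# The simple case: a one-dimensional array whose only dimension varies.
simple = VlenArray((), [3], [1.0, 2.0, 3.0])

# The "array of vlen" case: a 2 x 2 grid where each cell has its own length.
grid = VlenArray((2, 2), [1, 0, 2, 3], [10.0, 20.0, 21.0, 30.0, 31.0, 32.0])
```

With this layout `simple.get()` returns the whole run, `grid.get(0, 1)` returns an empty list, and `grid.get(1, 1)` returns the three values of the last cell. The fixed dimensions are indexed as usual; only the rightmost extent differs from cell to cell.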



-Dennis Heimbigner