DAP4: VLens (and Sequences): Difference between revisions

From OPeNDAP Documentation

Revision as of 21:27, 7 April 2012

"Oh what a tangled web we weave when first we practice to build a type system."

Background

Recently, I made the claim to James that the Sequence construct could serve to represent CDM/HDF5 vlen constructs.

His reply was as follows:

In my opinion we will want vlens to be in the data model, if not as a type, then as a feature of arrays - see the schema and my text on the Data Model page. The reason for that [is that] I have already tried using Sequence for vlen (in hdf4) and it was a failure. Not a failure from the technical POV but because neither server writers nor client writers nor client users ever 'got it.' I think one part of the problem for those people was that vlens in HDF4 do not support relational operators via the API while Sequences are supposed to. So the conceptual mismatch doomed the idea.

This is a very important observation, and has caused me to re-think how sequences and vlens should be handled in DAP4.

Vlens

Let us start by addressing the addition of vlens to the DAP4 data model.

Possible ways to add vlens to the DAP4 data model include the following.

  1. In CDM, a vlen is marked by using "*" as a dimension name in the list of dimensions associated with a variable; "*" may occur any number of times in that list. [Aside: "unlimited" is also an option in CDM, but it should not be needed in DAP4 because the size of an unlimited dimension is known at the time data is requested.]
  2. James has proposed something similar except that the "*" is restricted to occurring as the last dimension.
  3. Another possibility is to create a new container object, call it Vlen, that (like sequence) is inherently of variable length and is not dimensionable. [Aside: The term "vlen" is kind of odd; the term "list" would actually make more sense]

If we were to choose option 2, then we must address the question of translation between CDM and DAP4. Going from DAP4 to CDM is straightforward because having the last dimension be "*" is legal in CDM. Going from CDM to DAP4 requires the introduction of a number of Structure elements.

Consider the following CDM example (using a pseudo-syntax):

 Int32 v[d1][*][d2][*][d3];

This would have to be represented in DAP4 something like this.

 Structure v_1 {
   Structure v_2 {
     Int32 v[d3];
   }[d2][*];
 }[d1][*];

In the event that the last dimension is a "*":

 Int32 v[d1][*][d2][*];

This would have to be represented in DAP4 something like this, which needs one fewer Structure.

 Structure v_1 {
   Int32 v[d2][*];
 }[d1][*];
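
The CDM-to-DAP4 translation under option 2 can be sketched as follows. This is only an illustration of the rule implied by the two examples above: the dimension list is cut after each "*", the last group stays on the variable, and each earlier group becomes an enclosing Structure. The function names and the v_1, v_2 naming scheme are assumptions made for the sketch (the naming simply follows the examples); nothing here is part of any DAP4 or CDM library.

```python
def split_at_vlens(dims):
    """Split a dimension list into groups, closing a group after each "*"."""
    groups, current = [], []
    for dim in dims:
        current.append(dim)
        if dim == "*":
            groups.append(current)
            current = []
    if current:  # trailing fixed-size dimensions, e.g. [d3]
        groups.append(current)
    return groups


def format_dims(group):
    """Render a list of dimension names as [d1][*]..."""
    return "".join(f"[{dim}]" for dim in group)


def cdm_vlens_to_dap4(base_type, name, dims):
    """Emit option-2 style pseudo-syntax: "*" only ever appears last."""
    groups = split_at_vlens(dims)
    if not groups:  # scalar variable, nothing to translate
        return f"{base_type} {name};"
    *outer, inner = groups
    lines = []
    # One enclosing Structure per group that precedes the last one.
    for depth in range(len(outer)):
        lines.append("  " * depth + f"Structure {name}_{depth + 1} {{")
    lines.append("  " * len(outer) + f"{base_type} {name}{format_dims(inner)};")
    # Close the Structures innermost first, attaching each group's dimensions.
    for depth in range(len(outer) - 1, -1, -1):
        lines.append("  " * depth + f"}}{format_dims(outer[depth])};")
    return "\n".join(lines)


print(cdm_vlens_to_dap4("Int32", "v", ["d1", "*", "d2", "*", "d3"]))
```

Going the other way (DAP4 to CDM) needs no such rewriting, since a trailing "*" is already legal in CDM.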

Alternate proposal

[Added 4/7/2012] After our most recent telecon, it seems that James is of the strong opinion that we need to keep Sequences, including nested Sequences. In light of this, I propose that we therefore drop the vlen construct and just stick with sequences. This would be essentially equivalent to option 2 above, but using the keyword "sequence" instead of "vlen".

Note that this partly contradicts the inference from James' experience above. It differs, however, in that we are saying that selection CANNOT be applied to some sequences, which is different, I think, from saying that selection CAN be applied to some vlens.

Discussion

  • As a personal matter, I would prefer to use the CDM representation. Adding a new container type (Vlen), while appealing semantically, only complicates the model further. James' approach requires the use of additional Structure definitions which, in my opinion, obscure the underlying semantics.
  • One thing that I need to check is how this affects the proposed on-the-wire format for DAP4; see DAP4 On the Wire Format.

Sequences

Nathan Potter noted the following.

At one time we considered using nested sequences as a representation of an SQL database, where essentially the keys that link the tables in the DB define the structure of some nested sequence thing. I think it's a useful idea, but there is no implementation of it to play with. [Aside: I could swear that someone told me that they actually used a relational database as the back end to a sequence]

[ ndp - I mentioned (in the same email that contained the nested sequence comment quoted above) that we wrote and released a server that represents a single database table or view as a single sequence. This is quite different from the point that I was making in the section quoted above that there may be a use case for nested sequences. (which was in response to your question "Are there any other legitimate examples for using NESTED sequences.") - ndp 12:08, 26 February 2012 (PST)]

It is this ability to have selection constraints applied that separates sequences from vlens. It is clear that in the absence of selection constraints, there is no essential difference between sequences and vlens. Further, in the few examples that I could find or that were sent to me, it seemed that nested sequences were being used as, in effect, vlens.

So, we seem to have two very similar concepts (vlen and sequence), which complicates the DAP4 model. The question for me is:

Do we get rid of the Sequence concept, or at least define it as equivalent to the following?
Structure {...} [*]

My current belief is that we should keep sequences but with the following restrictions:

  1. Sequences can only occur as top-level scalar variables within a group.
  2. Sequences may not be nested in any other container (i.e. other sequences or structures).

This keeps sequences for their original purpose of acting as "relations". Every other place where we might previously have used a sequence will now use a vlen.
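
The two restrictions can be read as a simple check over a declaration tree. The sketch below is only an illustration: the Node class, its field names, and the kind strings are hypothetical and do not correspond to any existing DAP4 implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical declaration node used only for this sketch."""
    kind: str  # "Group", "Structure", "Sequence", or an atomic type name
    name: str
    dims: list = field(default_factory=list)
    children: list = field(default_factory=list)


def sequence_violations(node, parent_kind=None):
    """Return a list of messages for sequences that break either restriction."""
    problems = []
    if node.kind == "Sequence":
        # A sequence must sit directly inside a group, never inside
        # another sequence or a structure.
        if parent_kind != "Group":
            problems.append(f"{node.name}: not a top-level variable of a group")
        # A sequence must be scalar, i.e. carry no dimensions.
        if node.dims:
            problems.append(f"{node.name}: sequences must be scalar")
    for child in node.children:
        problems.extend(sequence_violations(child, node.kind))
    return problems


root = Node("Group", "/", children=[
    Node("Sequence", "ok", children=[Node("Int32", "x")]),
    Node("Structure", "s", children=[Node("Sequence", "bad")]),
    Node("Sequence", "arr", dims=["d1"]),
])
for message in sequence_violations(root):
    print(message)
```

Under these rules, the declarations named bad and arr would be rewritten with a vlen instead of a sequence.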

As with vlens, the translation between DAP4 and CDM needs to be addressed.

  1. The conversion from DAP4 to CDM can be addressed using the rule above, namely that a sequence is, in CDM, represented as follows.
    Structure {...} [*]
  2. Translation from CDM to DAP4 allows for the option of never using sequences, but always using vlens. An alternate translation might be to say that if you have a top-level CDM structure whose only dimension is a vlen, then translate that to a sequence (in effect inverting the DAP4->CDM translation).
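
A minimal sketch of the two translation rules above, assuming variables are described by plain dictionaries with "kind", "name", and "dims" keys. That representation and both function names are assumptions made for illustration, not an existing API.

```python
def sequence_to_cdm(var):
    """DAP4 -> CDM: a sequence becomes Structure {...} [*]."""
    if var["kind"] == "Sequence":
        return {**var, "kind": "Structure", "dims": ["*"]}
    return dict(var)


def cdm_to_sequence_or_vlen(var, top_level):
    """CDM -> DAP4, using the alternate rule: a top-level structure whose
    only dimension is a vlen becomes a sequence; everything else is kept
    as a vlen-dimensioned variable."""
    if top_level and var["kind"] == "Structure" and var["dims"] == ["*"]:
        return {**var, "kind": "Sequence", "dims": []}
    return dict(var)


station = {"kind": "Sequence", "name": "station", "dims": []}
as_cdm = sequence_to_cdm(station)
print(as_cdm)
print(cdm_to_sequence_or_vlen(as_cdm, top_level=True))
```

With the alternate rule, a top-level sequence survives a round trip through CDM unchanged. With the "never use sequences" rule, cdm_to_sequence_or_vlen would simply return its input.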

-Dennis Heimbigner