DAP4: Proposal for Keys in Sequences

Revision as of 21:05, 24 May 2012


Background

John Caron has argued that supporting nested sequences is desirable because it provides a natural representation for certain datasets such as trajectories of trajectories.

Proposal

I propose adding the concepts of keys and foreign keys to Sequences. This addition will support, among other things, nested sequences. Consider this example.

<Sequence name="SQ1">
  <Key name="f1" type="Int32"/>
  <Float32 name="fx"/>
  ...
</Sequence>
<Sequence name="SQ2">
  <ForeignKey key="/SQ1.f1"/>
  <Float32 name="fy"/>
  ...
</Sequence>

This approach is a direct representation of the idea of a foreign key as defined in traditional relational database theory. It specifically indicates how two Sequences can be combined (effectively using join) based on the ForeignKey element in one Sequence pointing to a Key element in another Sequence.
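
The join described above can be sketched in a few lines of Python. This is only an illustration and not part of the proposal: the row values are invented, and `join_on_key` is a hypothetical helper name. It assumes each Sequence is available as a list of rows keyed by field name.

```python
# Hypothetical rows for SQ1 and SQ2; the values are invented for illustration.
sq1_rows = [{"f1": 1, "fx": 0.5}, {"f1": 2, "fx": 1.5}]
sq2_rows = [{"f1": 1, "fy": 10.0}, {"f1": 1, "fy": 11.0}, {"f1": 2, "fy": 20.0}]


def join_on_key(parent_rows, child_rows, key):
    """Inner join: pair each child row with the parent row that has the same key value."""
    # Index the parent rows by key value (a key is assumed to be unique in the parent).
    parents_by_key = {row[key]: row for row in parent_rows}
    joined = []
    for child in child_rows:
        parent = parents_by_key.get(child[key])
        if parent is not None:
            # Merge the two rows; the shared key field appears once.
            joined.append({**parent, **child})
    return joined


joined_rows = join_on_key(sq1_rows, sq2_rows, "f1")
for row in joined_rows:
    print(row)
```

Each SQ2 row is matched with the single SQ1 row that carries the same f1 value, which is the relational reading of the ForeignKey element.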

As far as its role as a field in the Sequence is concerned, the <Key> element in SQ1 is equivalent to the field <Int32 name="f1"/>.

We might also consider this alternative form.

<Int32 name="f1" key="true"/>

Here the "keyness" is indicated by an XML attribute rather than by a <Key> element.

In either case, other Sequences can refer to that key using a <ForeignKey> element.

As far as fields go, defining a foreign key implicitly includes the key being referenced (f1 in our example) as a field in the Sequence.

So, the above example is equivalent to the following.

<Sequence name="SQ1">
  <Int32 name="f1"/>
  <Float32 name="fx"/>
  ...
</Sequence>
<Sequence name="SQ2">
  <Int32 name="f1"/>
  <Float32 name="fy"/>
  ...
</Sequence>
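
One way a client might perform this expansion is sketched below in Python. It is a rough illustration under several assumptions: the enclosing `<Dataset>` wrapper is added only so the fragment parses as one document, `expand_keys` is a hypothetical name, and the key path `/SQ1.f1` is assumed to mean "field f1 of the top-level Sequence SQ1".

```python
import xml.etree.ElementTree as ET

# The example from the proposal, wrapped in a hypothetical <Dataset> root.
DDX = """
<Dataset>
  <Sequence name="SQ1">
    <Key name="f1" type="Int32"/>
    <Float32 name="fx"/>
  </Sequence>
  <Sequence name="SQ2">
    <ForeignKey key="/SQ1.f1"/>
    <Float32 name="fy"/>
  </Sequence>
</Dataset>
"""


def expand_keys(root):
    """Rewrite <Key> and <ForeignKey> elements as ordinary typed fields."""
    # Pass 1: remember each key's type, then turn the <Key> into a plain field.
    key_types = {}
    for seq in root.iter("Sequence"):
        for child in seq:
            if child.tag == "Key":
                key_types[(seq.get("name"), child.get("name"))] = child.get("type")
                child.tag = child.attrib.pop("type")
    # Pass 2: replace each <ForeignKey> with a field of the referenced key's name and type.
    for seq in root.iter("Sequence"):
        for child in seq:
            if child.tag == "ForeignKey":
                # Assumed path syntax: "/<sequence name>.<key name>".
                seq_name, field_name = child.attrib.pop("key").lstrip("/").split(".")
                child.tag = key_types[(seq_name, field_name)]
                child.set("name", field_name)
    return root


root = expand_keys(ET.fromstring(DDX))
fields = {
    seq.get("name"): [(child.tag, child.get("name")) for child in seq]
    for seq in root.iter("Sequence")
}
print(fields)
```

The sketch does two passes so that a ForeignKey can refer to a Key regardless of the order in which the Sequences are declared.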

Once we have keys and foreign keys, it is easy to represent two Sequences as if they were a nested Sequence. So, our example above could be presented to a user as the following equivalent nested Sequence.

<Sequence name="SQ1">
  <Int32 name="f1"/>
  <Float32 name="fx"/>
  <Sequence name="SQ2">
    <Float32 name="fy"/>
  </Sequence>
</Sequence>

Note that the common key field (f1) only appears in the outer Sequence because its existence with the same value is implicit in the nesting.
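
A client could build this nested presentation from the two flat Sequences roughly as follows. This is a sketch with invented row values, and `nest` is a hypothetical helper, not something defined by the proposal.

```python
# Hypothetical rows for the flat Sequences; the values are invented for illustration.
sq1_rows = [{"f1": 1, "fx": 0.5}, {"f1": 2, "fx": 1.5}]
sq2_rows = [{"f1": 1, "fy": 10.0}, {"f1": 1, "fy": 11.0}, {"f1": 2, "fy": 20.0}]


def nest(parent_rows, child_rows, key, child_name):
    """Present child rows as an inner sequence of the parent row sharing their key."""
    children_by_key = {}
    for child in child_rows:
        # Drop the key from the inner rows: its value is implied by the nesting.
        inner = {name: value for name, value in child.items() if name != key}
        children_by_key.setdefault(child[key], []).append(inner)
    return [
        {**parent, child_name: children_by_key.get(parent[key], [])}
        for parent in parent_rows
    ]


nested = nest(sq1_rows, sq2_rows, "f1", "SQ2")
for row in nested:
    print(row)
```

The underlying data stays flat; only the view handed to the user is nested, and the key appears once per outer row.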

Rationale

This proposal keeps the data model simple, while still allowing client code to present the appearance of a nested sequence to the user.

Dennis