DAP4: Proposal for Keys in Sequences

From OPeNDAP Documentation

Revision as of 16:52, 28 May 2012


Background

John Caron has argued that supporting nested sequences is desirable because it provides a natural representation for certain datasets such as trajectories of trajectories.

Proposal

I propose adding the concepts of keys and foreign keys to Sequences. Among other things, this addition makes it possible to support nested sequences. Consider this example.

<Sequence name="SQ1">
  <Key name="f1" type="Int32"/>
  <Float32 name="fx"/>
  ...
</Sequence>
<Sequence name="SQ2">
  <ForeignKey key="/SQ1.f1"/>
  <Float32 name="fy"/>
  ...
</Sequence>

This approach is a direct representation of the idea of a foreign key as defined in traditional relational database theory. It specifically indicates how two Sequences can be combined (effectively using join) based on the ForeignKey element in one Sequence pointing to a Key element in another Sequence.
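The join described above can be sketched in a few lines of Python. This is only an illustration, not part of the proposal: the row values are made up, rows are modeled as plain dictionaries, the helper name join_on_key is hypothetical, and the sketch assumes each key value occurs at most once in SQ1.

```python
# Hypothetical rows for the two Sequences; the values are invented for illustration.
sq1_rows = [
    {"f1": 1, "fx": 0.5},
    {"f1": 2, "fx": 1.5},
]
sq2_rows = [
    {"f1": 1, "fy": 10.0},
    {"f1": 1, "fy": 11.0},
    {"f1": 2, "fy": 20.0},
]


def join_on_key(parent_rows, child_rows, key):
    """Inner join: pair each child row with the parent row that has the same key value."""
    # Assumes the key is unique among the parent rows.
    parents_by_key = {row[key]: row for row in parent_rows}
    joined = []
    for child in child_rows:
        parent = parents_by_key.get(child[key])
        if parent is not None:
            # Merge the parent's fields with the child's fields.
            joined.append({**parent, **child})
    return joined


joined_rows = join_on_key(sq1_rows, sq2_rows, "f1")
```

Each SQ2 row picks up the fx value of the SQ1 row it refers to, which is what the ForeignKey declaration says is possible.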

As far as its role as a field in the Sequence is concerned, the <Key> element in SQ1 is equivalent to the field <Int32 name="f1"/>.

We might also consider this alternative form.

<Int32 name="f1" key="true"/>

Here the "keyness" is indicated by an XML attribute rather than a <Key> element.

In either case, other Sequences can refer to that key using a <ForeignKey> element.

As far as fields go, defining a foreign key implicitly includes the key being referenced (f1 in our example) as a field in the Sequence.

So, the above example is equivalent to the following.

<Sequence name="SQ1">
  <Int32 name="f1"/>
  <Float32 name="fx"/>
  ...
</Sequence>
<Sequence name="SQ2">
  <Int32 name="f1"/>
  <Float32 name="fy"/>
  ...
</Sequence>
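
The equivalence between the keyed form and the plain-field form can be sketched as a small rewrite over the XML, using Python's standard xml.etree.ElementTree module. Several things here are assumptions made for illustration. The <Dataset> wrapper exists only so the fragment parses as one document. The elided "..." fields are left out. The reference "/SQ1.f1" is read as "field f1 of the top-level Sequence SQ1". The function name expand_keys is hypothetical.

```python
import xml.etree.ElementTree as ET

# The <Dataset> wrapper is added only so the two Sequences form one parseable document.
DDX = """
<Dataset>
  <Sequence name="SQ1">
    <Key name="f1" type="Int32"/>
    <Float32 name="fx"/>
  </Sequence>
  <Sequence name="SQ2">
    <ForeignKey key="/SQ1.f1"/>
    <Float32 name="fy"/>
  </Sequence>
</Dataset>
"""


def expand_keys(root):
    """Rewrite <Key> and <ForeignKey> elements in place as ordinary typed fields."""
    sequences = list(root.iter("Sequence"))
    # First pass: record the declared type of every key under "/<sequence>.<field>".
    key_types = {}
    for seq in sequences:
        for key in seq.findall("Key"):
            key_types["/%s.%s" % (seq.get("name"), key.get("name"))] = key.get("type")
    # Second pass: replace each key or foreign key with a plain field of that type.
    for seq in sequences:
        for i, child in enumerate(list(seq)):
            if child.tag == "Key":
                field = ET.Element(child.get("type"), {"name": child.get("name")})
            elif child.tag == "ForeignKey":
                ref = child.get("key")
                field = ET.Element(key_types[ref], {"name": ref.rsplit(".", 1)[1]})
            else:
                continue
            field.tail = child.tail
            seq.remove(child)
            seq.insert(i, field)
    return root


root = expand_keys(ET.fromstring(DDX))
fields = {
    seq.get("name"): [(f.tag, f.get("name")) for f in seq]
    for seq in root.iter("Sequence")
}
```

After the rewrite, fields lists an Int32 field named f1 in both Sequences, matching the expanded declarations above.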

Once we have keys and foreign keys, it is easy to represent two Sequences as if they were a nested Sequence. So, our example above could be presented to a user as the following equivalent nested Sequence.

<Sequence name="SQ1">
  <Int32 name="f1"/>
  <Sequence name="SQ2">
    <Float32 name="fy"/>
  </Sequence>
</Sequence>

Note that the common key field (f1) only appears in the outer Sequence because its existence with the same value is implicit in the nesting.
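
A client could build this nested presentation from the two flat Sequences along the following lines. This is a minimal sketch under stated assumptions: rows are modeled as Python dictionaries with invented values, and the helper name nest is hypothetical rather than part of any DAP4 API.

```python
from collections import defaultdict

# Hypothetical rows for the two flat Sequences; the values are invented.
sq1_rows = [
    {"f1": 1, "fx": 0.5},
    {"f1": 2, "fx": 1.5},
]
sq2_rows = [
    {"f1": 1, "fy": 10.0},
    {"f1": 1, "fy": 11.0},
    {"f1": 2, "fy": 20.0},
]


def nest(parent_rows, child_rows, key, child_name):
    """Attach to each parent row the list of child rows that share its key value."""
    children_by_key = defaultdict(list)
    for child in child_rows:
        # Drop the key from the inner rows: its value is implied by the nesting.
        inner = {name: value for name, value in child.items() if name != key}
        children_by_key[child[key]].append(inner)
    return [
        {**parent, child_name: children_by_key.get(parent[key], [])}
        for parent in parent_rows
    ]


nested_rows = nest(sq1_rows, sq2_rows, "f1", "SQ2")
```

The key f1 appears only on the outer rows, while each inner SQ2 list carries just the remaining fields.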

Rationale

This proposal keeps the data model simple, while still allowing client code to present the appearance of a nested sequence to the user.

Discussion

Dennis: I am having second thoughts about how to support nested relations. The problem is not with the key idea, but rather with how to generate queries that simulate a single query against nested relations. If the model actually supported nested relations, such a request could be expressed as a single query. With the relations flattened, it requires two queries, one against each relation. This is both confusing and more complex than is desirable.
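
The two-step pattern can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the row values are invented, the functions query_sq1 and query_sq2 are hypothetical stand-ins for separate requests to a server, and the example request is "the SQ2 rows belonging to SQ1 rows with fx > 1.0".

```python
# Hypothetical contents of the two flattened relations; the values are invented.
sq1_rows = [
    {"f1": 1, "fx": 0.5},
    {"f1": 2, "fx": 1.5},
]
sq2_rows = [
    {"f1": 1, "fy": 10.0},
    {"f1": 1, "fy": 11.0},
    {"f1": 2, "fy": 20.0},
]


def query_sq1(predicate):
    """First query: collect the keys of the SQ1 rows that satisfy the predicate."""
    return {row["f1"] for row in sq1_rows if predicate(row)}


def query_sq2(keys):
    """Second query: select the SQ2 rows whose foreign key is in the given set."""
    return [row for row in sq2_rows if row["f1"] in keys]


# Step 1 runs against SQ1, step 2 against SQ2; the client has to connect them.
matching_keys = query_sq1(lambda row: row["fx"] > 1.0)
result = query_sq2(matching_keys)
```

With a truly nested Sequence, the same selection would be a single constraint on the outer Sequence, with the inner rows coming along automatically.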

Dennis