New Data Model

From OPeNDAP Documentation
Revision as of 23:30, 24 June 2008 by Jimg

I'd like to jot down some ideas for a change to our current data model. The idea is partly new and partly based on the idea that Nathan has talked about that we expand the set of types for attributes.

Suppose we have only variables. Each variable has a name, type, value and a tag. The tag is like an enumeration which is used to include something akin to C.J. Date's definition of 'metadata.' It is used to describe the variable's use in the data source. Some values of this tag would be 'variable', 'attribute' and maybe 'synthetic.'
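As a rough sketch of the variables-only idea, the record below carries a name, type, value and tag. The class name `Variable`, the enum `Tag`, and the sample names and values are illustrative assumptions, not part of any existing OPeNDAP API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any


class Tag(Enum):
    """Role a variable played in the original data source (hypothetical enum)."""
    VARIABLE = "variable"
    ATTRIBUTE = "attribute"
    SYNTHETIC = "synthetic"


@dataclass
class Variable:
    """Hypothetical record: everything in the data source is one of these."""
    name: str
    type: str   # a type name such as "Float32" or "String"
    value: Any
    tag: Tag


# Sample data is made up for illustration.
temperature = Variable("temperature", "Float32", [271.3, 272.0], Tag.VARIABLE)
units = Variable("units", "String", "kelvin", Tag.ATTRIBUTE)

# Both are the same kind of object; only the tag records the original role.
for v in (temperature, units):
    print(v.name, v.type, v.tag.value)
```

The point of the sketch is that a client handles both objects the same way and consults the tag only when it needs to know how the source organized them.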

In this model, the 'attributes' of a netCDF or HDF4/5 file would be variables. The 'variables' from those files would be variables that have child variables; those child variables would hold the netCDF file's 'attributes'. The tag would be used to record information so that the original structure would not be lost.

A DDX without data would list the names, types and tags (sorry for using that name, since it has an XML meaning too) of everything. Clients would need to ask for the items they wanted, but we'd extend the syntax of a constraint expression (CE) to make it easy to ask for everything that's part of 'Array X.'

In this model, 'attributes' would be sent as binary values.

I think this would make some operations, like searches across data sources and combined use of data sources, easier because the arbitrary distinctions between variables and attributes would go away (but the original data source's organizational structure would be preserved). It would also provide a way to pose queries that ask for 'variables' based on the values of 'attributes'.
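One way such a query might look, assuming each variable keeps its former attributes as child variables: the `select` helper, the `children` field and the sample data are all hypothetical, and this is a client-side illustration rather than a proposed CE syntax.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Variable:
    """Hypothetical record; 'children' holds what used to be attributes."""
    name: str
    type: str
    value: Any
    tag: str   # 'variable', 'attribute', ...
    children: List["Variable"] = field(default_factory=list)


def select(variables, attr_name, attr_value):
    """Return the 'variable'-tagged entries that have a child tagged
    'attribute' with the given name and value."""
    return [
        v for v in variables
        if v.tag == "variable"
        and any(
            c.tag == "attribute" and c.name == attr_name and c.value == attr_value
            for c in v.children
        )
    ]


# Made-up sample data.
sst = Variable("sst", "Float32", [290.1], "variable",
               [Variable("units", "String", "kelvin", "attribute")])
depth = Variable("depth", "Float32", [10.0], "variable",
                 [Variable("units", "String", "meters", "attribute")])

matches = select([sst, depth], "units", "kelvin")
print([v.name for v in matches])
```

Because the children are ordinary variables, the same comparison logic that works on data values works on former attributes, which is what removes the special case.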

This data model might make it harder to implement something like the netCDF CL, although I'm not sure that's true.

Comments? James Gallagher - 30 Sep 2003

It's hard to know whether it's better to edit the above, or put changes in as addenda...

It might be better to say that each variable has a name, type, value, tag and a list of properties, which are themselves variables. The list of properties could be null. The idea is to shift from the hard division between variables and attributes to variables only, while preserving the idea that a variable may have multiple values and those values may have different types (which is what attributes accomplish, but they do it in a way that creates an artificial separation). See Winston and Horn, "Lisp," 2nd ed., pp. 96-97.

The idea is not that every variable is a structure! It's that an array of bytes like: Byte x[1024][1024]; has not only the 1M values but can also have a string named Units with a specific value, and some other names, types and values. So, in a sense it's like a structure, but there's an explicit hierarchy which says that x is the variable and Units is part of x, not that both x and Units occupy equal position in the hierarchy (which would be the case if they were all part of a structure). James Gallagher - 01 Oct 2003
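The hierarchy described here could be sketched as follows. The `properties` field, the `prop` lookup method and the Units value "counts" are placeholders chosen for illustration; only the array shape and the name Units come from the example above.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class Variable:
    """Hypothetical record: a value plus an optional list of properties."""
    name: str
    type: str
    value: Any
    tag: str
    properties: List["Variable"] = field(default_factory=list)  # may be empty

    def prop(self, name: str) -> Optional["Variable"]:
        """Look up a property by name; return None if it is absent."""
        for p in self.properties:
            if p.name == name:
                return p
        return None


# x owns Units; Units is not a sibling of x inside some enclosing structure.
x = Variable(
    name="x",
    type="Byte[1024][1024]",
    value=bytes(1024 * 1024),   # the 1M byte values, zero-filled here
    tag="variable",
    properties=[Variable("Units", "String", "counts", "attribute")],
)

print(len(x.value))
print(x.prop("Units").value)
```

In a structure, x and Units would both be fields of a common parent; here Units is reachable only through x, which is the distinction the paragraph draws.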


I like the idea of being able to select variables based on the values of their attributes (obviously using old terminology here). That could be a very powerful feature. OTOH, it looks like these tags would have the same semantics as attributes already have (i.e., as metadata for variables). I think it's important to distinguish the user-visible "Abstract Data Model" from the implementation model, in which you might use the same class to implement different things. So I'm not sure which we are talking about here. I would suggest that we have a thread where we just talk about the user-visible Abstract Data Model (how about "Abstract Model" for short?), and make some UML diagrams etc. I have been working on similar things for both netCDF and HDF5, and it's pretty useful for seeing where there are semantic mismatches. John Caron 02 Oct 2003

OK. Let's do that; let's start a thread on the Abstract Model. I tend to mix up the implementation with the abstraction; I'll try to keep that from happening. I'll crank that up in Abstract Model. James Gallagher - 03 Oct 2003