DAP4: DAP4 Grids Proposal

From OPeNDAP Documentation

Background: Grids Delenda Est

(with apologies to Cato the Elder)

The grid construct as originally established in the DAP2 protocol has been a source of problems from its inception. The evolution of the notion of coordinate variables makes its use in its current form (or even closely similar forms) untenable.

Problems Addressed

Grid as scoping/lexical container

Because a grid acts as a scoping (lexical) container for its coordinate variables, properly sharing coordinate variables is not possible without duplication, which is highly undesirable.

Consider the following situation.

    Arrays: D1(x,y), D2(y,z), D3(x,z).
    coord vars: x(x), y(y), z(z)

No grid as currently defined can represent this, because the three coordinate variables x(x), y(y), and z(z) cannot be properly distributed across the three needed grids without duplication. The only way this can work is if all the arrays and all the coordinate variables reside in a single grid, which I maintain is not a useful solution. Further, the grid must change whenever new arrays are defined that use any of the coordinate variables, D4(x,w) for example.
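The duplication can be counted with a minimal Python sketch. The dictionary layout and function names below are hypothetical illustrations, not part of DAP2 or DAP4; the sketch only assumes that each container-style grid must carry its own copy of every coordinate variable its array uses.

```python
# Hypothetical illustration; none of these names come from DAP2 or DAP4.
arrays = {"D1": ("x", "y"), "D2": ("y", "z"), "D3": ("x", "z")}

def container_grids(arrays):
    """Build one container-style grid per array, each holding its own maps."""
    return {name: {"array": name, "maps": list(dims)}
            for name, dims in arrays.items()}

def count_map_copies(grids):
    """Count how many grids carry a copy of each coordinate variable."""
    copies = {}
    for grid in grids.values():
        for coord in grid["maps"]:
            copies[coord] = copies.get(coord, 0) + 1
    return copies

grids = container_grids(arrays)
copies = count_map_copies(grids)
print(copies)  # {'x': 2, 'y': 2, 'z': 2}
```

Each of the three coordinate variables ends up stored twice, so six map copies are needed where three variables would do.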

Grid projections

When a projection is applied to a grid, the result cannot be a grid. This has been an ongoing source of problems in DAP2 where projecting the array component of a grid results in a structure. From the point of view of semantics, this is a really bad idea.

Multi-dimensional coordinate variables

When representing point data, it is desirable to have coordinate variables that are defined over more than a single dimension. Consider the following:

    array: temp(x,y,z)
    coordinate vars: lat(x,y,z), lon(x,y,z), and depth(x,y,z).

Here we are trying to represent point data where each point is defined by three dimensions: lat, lon, and depth. Grids are not capable of properly representing this case. I should note that neither is, for example, netcdf-3 or netcdf-4. CDM can do it, but only by encoding the proper relationships as attributes with complex internal structure.

Coordinate Variable Duplication

In examining a large number of DAP2 DDSs, I note that coordinate variables inside grids are almost always duplicated outside the grid. My hypothesis is that this is a result of problem (1) above (grid as scoping/lexical container). In any case, the proposal below would obviate the need for duplication.

Proposal: Grid as mapping

NB: The current Data Model page in the straw man design already does this. Unfortunately, I used XML for the examples (which muddies the idea of an abstract information model with one particular representation of that model), but I think what is presented there is the same as this proposal. Jimg

Rather than making grids be scope containers, grids need to be simple relationship instances between an array and its coordinate variables. This would be done by associating the coordinate variables with an array variable. For example, the first case above (D1,D2,D3) might be represented as:

<variable name="D1"...>
   <map coordinate="x"/>
   <map coordinate="y"/>
</variable>
...

The case of point data would be represented as follows:

<variable name="temp"...>
   <map coordinate="lat"/>
   <map coordinate="lon"/>
   <map coordinate="depth"/>
</variable>

Note that the dimensions can be inferred from the specified coordinate variables.
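As a rough sketch of how a client might resolve the map references and infer dimensions, the Python below parses a small document with the standard library. The `<dataset>` wrapper, the `dims` attribute, the use of `coordinate` as the attribute name, and the function name `infer_dims` are all assumptions made for illustration; the proposal does not define them. The discussion further down this page questions whether this inference always holds.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the <dataset> wrapper and the "dims" attribute
# are assumptions, not part of the proposal.
DOC = """
<dataset>
  <variable name="x" dims="x"/>
  <variable name="y" dims="y"/>
  <variable name="lat" dims="x y z"/>
  <variable name="lon" dims="x y z"/>
  <variable name="depth" dims="x y z"/>
  <variable name="D1"><map coordinate="x"/><map coordinate="y"/></variable>
  <variable name="temp">
    <map coordinate="lat"/><map coordinate="lon"/><map coordinate="depth"/>
  </variable>
</dataset>
"""

def infer_dims(xml_text):
    """Return {array name: dimension names inferred from its maps}."""
    root = ET.fromstring(xml_text)
    declared = {v.get("name"): v for v in root.findall("variable")}
    inferred = {}
    for name, var in declared.items():
        coords = [m.get("coordinate") for m in var.findall("map")]
        if not coords:
            continue  # a plain variable with no maps
        dims = []
        for coord in coords:
            if coord not in declared:
                raise KeyError(f"{name}: unknown coordinate variable {coord!r}")
            # Collect each dimension once, in first-seen order.
            for dim in declared[coord].get("dims", "").split():
                if dim not in dims:
                    dims.append(dim)
        inferred[name] = dims
    return inferred

print(infer_dims(DOC))  # {'D1': ['x', 'y'], 'temp': ['x', 'y', 'z']}
```

The coordinate variables are declared once at the top level and only referenced by name, so D1 and any later array can share them without copies.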


-Dennis Heimbigner

Discussion

The four points listed above were/are addressed on the Data Model page, but in doing so we created a new set of issues with constraints (and there were already some issues left over from DAP2). John's comment #2 immediately below seems to address one of these new problems. I think that restricting the way grids can be subset, along with differentiating between subsetting Grids and subsetting parts of Grids, addresses the problems. Jimg 14:59, 6 March 2012 (PST)

John's comments ...

1) The CDM uses this object model for coordinate systems:

 CDM CoordSys

When translating things like GRIB into CDM, we usually also add the CF attributes, which simplifies things since now the coordsys info is encoded at the data access layer. This is very simple, in CDL:

float Temp(z,y,x);
 :coordinates = "lat, lon, depth";

2) It appears that a Variable that contains map elements is a "grid", and that when you make a data request for a grid, you get back the corresponding values of the maps. Correct?

One problem with that is when you have 2D coordinates, as in:

float lon(y,x);
float lat(y,x);
float Temp(z,y,x);
 :coordinates = "lat, lon, depth";

then you get back 3X more data, which you may not want.

--JohnCaron 15:29, 2 March 2012 (PST)
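One reading of the 3X figure can be checked with simple arithmetic. The sketch below assumes a request for a single z level of Temp that also returns the full 2D lat and lon maps; the sizes and the function name are hypothetical, and depth is left out because the example does not declare it.

```python
def response_values(ny, nx, z_levels=1):
    """Count values returned for Temp alone and for Temp plus 2D maps."""
    temp = z_levels * ny * nx   # requested slice of Temp(z,y,x)
    lat = ny * nx               # lat(y,x) returned alongside
    lon = ny * nx               # lon(y,x) returned alongside
    return temp, temp + lat + lon

temp_only, with_maps = response_values(ny=100, nx=200)
print(with_maps / temp_only)  # 3.0
```

The overhead shrinks as more z levels are requested, since the 2D maps are sent once regardless of how many levels of Temp come back.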

From above: "...when you make a data request for a grid, you get back the corresponding values of the maps. Correct?" Ans: Maybe. Slightly longer answer: If a client requests that a Grid be subset, then it receives all of the information that a Grid requires in the response. If a client asks for a subset of one of a Grid's components, then it will receive just that component (which will no longer be a Grid). I'm defining a Grid to be a set of three things: the Grid (a specialization of an Array), one or more Maps (each also a specialization of an Array), and one or more Dimensions. In your example, the set of lon, lat and Temp is a Grid. A client can ask for the Grid to be subset and it will receive parts of all three of those pieces, which would also be a Grid. If it asks for just part of Temp, it will receive just that, which is not a Grid. Jimg 13:57, 6 March 2012 (PST)
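The two kinds of request described in this reply might be sketched as follows. The class and function names are hypothetical, only shapes are tracked rather than data, and constraints are assumed to be simple start/stop ranges per dimension; none of this is taken from the DAP4 design.

```python
from dataclasses import dataclass

@dataclass
class Array:
    name: str
    dims: tuple    # dimension names, e.g. ("z", "y", "x")
    shape: tuple   # size along each dimension

@dataclass
class Grid:
    array: Array
    maps: list     # Arrays acting as maps

def subset_array(arr, ranges):
    """Subset one component; dimensions missing from ranges are kept whole."""
    shape = tuple(
        ranges[d][1] - ranges[d][0] if d in ranges else n
        for d, n in zip(arr.dims, arr.shape)
    )
    return Array(arr.name, arr.dims, shape)

def subset_grid(grid, ranges):
    """Subset the whole Grid; the same ranges apply to the array and its maps."""
    return Grid(subset_array(grid.array, ranges),
                [subset_array(m, ranges) for m in grid.maps])

temp = Array("Temp", ("z", "y", "x"), (10, 100, 200))
lat = Array("lat", ("y", "x"), (100, 200))
lon = Array("lon", ("y", "x"), (100, 200))
grid = Grid(temp, [lat, lon])

ranges = {"y": (0, 10), "x": (0, 20)}
whole = subset_grid(grid, ranges)    # still a Grid
part = subset_array(temp, ranges)    # just an Array, no maps attached
print(whole.array.shape, [m.shape for m in whole.maps], part.shape)
```

Subsetting the Grid keeps the array and maps consistent with each other, while subsetting Temp alone returns a plain array and leaves the maps behind.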

Basic features of Grids in DAP4

We're still discussing just how constraining "grids" works. I think that the model we choose needs to support:

  • N-dimensional coordinate variables (aka maps)
  • Shared dimensions
  • Subsetting that returns a valid grid
  • Subsetting that returns parts that make up the grid

I think optimizing transfers should be secondary to proper semantics.

NB: This is already present in the DAP4: Data Model

Jimg 17:11, 2 March 2012 (PST); Updated: Jimg

Counter Proposal

I think I misunderstood the original proposal... I thought it was about subsetting. I've removed my comments about that topic from this page.

I think the gist of this proposal is correct: "...grids need to be simple relationship instances between an array and its coordinate variables." However, the proposal also includes: "Note that the dimensions can be inferred from the specified coordinate variables." which is not correct. In the past we have talked this over and it's come to light that some grids do not have coordinate variables for all of their axes (this was news to me, but an example, I believe, is that a model run might have maps/coordinates for lat and lon and a third dimension for run number where various parameters are varied for different runs).

I propose that a grid be an array with explicit maps/coordinate variables for some or all of its dimensions. I also propose that an array must have explicit dimensions, which I know is obvious, but I want to draw attention to the idea that a grid has both dimensions and maps/coordinates.

Jimg 11:30, 7 March 2012 (PST)
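A minimal sketch of the counter proposal, assuming a grid records its dimensions explicitly and lists maps for only some of them. The class name `GridArray`, its fields, and the sizes are hypothetical and chosen to mirror the model-run example above.

```python
from dataclasses import dataclass, field

@dataclass
class GridArray:
    name: str
    dims: dict                                # dimension name -> size, always explicit
    maps: dict = field(default_factory=dict)  # map name -> dimension names it spans

    def __post_init__(self):
        # Every map must be defined over dimensions the array declares.
        for map_name, map_dims in self.maps.items():
            unknown = [d for d in map_dims if d not in self.dims]
            if unknown:
                raise ValueError(
                    f"map {map_name!r} uses undeclared dimensions {unknown}")

    def unmapped_dims(self):
        """Dimensions that no map covers."""
        covered = {d for map_dims in self.maps.values() for d in map_dims}
        return [d for d in self.dims if d not in covered]

model = GridArray(
    "Temp",
    dims={"run": 5, "lat": 90, "lon": 180},
    maps={"lat": ("lat",), "lon": ("lon",)},
)
print(model.unmapped_dims())  # ['run']
```

Because `run` has no map, its size could not be recovered from the coordinate variables alone, which is why the dimensions are kept as explicit data here.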