DAP4: Subsetting Arrays and Grids By Value

From OPeNDAP Documentation


ndp

1 Background

DAP2 did not support subsetting of Arrays and Grids using relational operators. Many of our support questions over the years have come from frustrated users who tried to apply relational constraints to these objects and could not understand why it did not work.


2 Problem addressed

Allow users to subset Arrays (and "Grids") using relational operators.

3 Proposed solution

Allow users to apply relational constraints to the values of Arrays (and "Grids"). The server should return a Sequence which holds the matching Array values, along with the values of all of the associated Maps.

For example, let's consider this data object:

<Dimension name="x" size="1024"/>
<Dimension name="y" size="1024"/>
<Dimension name="z" size="1000"/>

<!-- The dimensions of a Coordinate MUST be SharedDimensions -->
<Float32 name="lon">
    <dimension ref="x"/>
    <dimension ref="y"/>
</Float32>

<Float32 name="lat">
    <dimension ref="x"/>
    <dimension ref="y"/>
</Float32>

<Float32 name="depth">
    <Attribute name="unit" type="String"><value>meters</value></Attribute>
    <dimension ref="x"/>
    <dimension ref="y"/>
    <dimension ref="z"/>
</Float32>

<Float64 name="sal">
    <dimension ref="x"/>
    <dimension ref="y"/>
    <dimension ref="z"/>
    <map name="lat" />
    <map name="lon" />
    <map name="depth" />
</Float64>

Applying the constraint "?sal<3.2" would return this:

<Sequence name="sal">
    <Float64 name="sal" />
    <Float32 name="lat"/>
    <Float32 name="lon"/>
    <Float32 name="depth"/>
</Sequence>

And the evaluation might look something like this (pseudo-code) for a returned sequence:

for(int x=0; x<1024 ; x++){
    for(int y=0; y<1024 ; y++){
        for(int z=0; z<1000 ; z++){
            if(sal[x][y][z] < 3.2){
                match_sal   = sal[x][y][z];
                match_lat   = lat[x][y];
                match_lon   = lon[x][y];
                match_depth = depth[x][y][z];
                send_sequence_row(match_sal, match_lat, match_lon, match_depth);
            }
        }
    }
}
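
The loop above can be made concrete with a small runnable sketch. The Python version below is an illustration only, not the server implementation: the function name `subset_by_value`, the toy 2 x 2 x 2 arrays, and the tuple row layout are all assumptions made for the example. It shows one plausible rule for filling the Map columns: each Map is indexed by only the dimensions it is declared over (lat and lon by x and y, depth by x, y and z).

```python
from typing import Iterator, Tuple

Row = Tuple[float, float, float, float]  # (sal, lat, lon, depth)


def subset_by_value(sal, lat, lon, depth, threshold: float) -> Iterator[Row]:
    """Yield one Sequence row for every sal value below the threshold."""
    for x, plane in enumerate(sal):
        for y, column in enumerate(plane):
            for z, value in enumerate(column):
                if value < threshold:
                    # 2-D Maps are indexed by (x, y); the 3-D Map by (x, y, z).
                    yield (value, lat[x][y], lon[x][y], depth[x][y][z])


# Hypothetical toy data standing in for the 1024 x 1024 x 1000 example.
sal = [[[3.0, 3.5], [3.1, 3.3]],
       [[3.4, 2.9], [3.6, 3.7]]]
lat = [[10.0, 10.0], [11.0, 11.0]]
lon = [[20.0, 21.0], [20.0, 21.0]]
depth = [[[5.0, 50.0], [5.0, 50.0]],
         [[5.0, 50.0], [5.0, 50.0]]]

rows = list(subset_by_value(sal, lat, lon, depth, 3.2))
for row in rows:
    print(row)
```

Using a generator mirrors the streaming nature of a Sequence: rows can be sent as they are found, without knowing the number of matches in advance.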

Or the evaluation might look something like this (pseudo-code) for a returned mask:

int xsize = 1024;
int ysize = 1024;
int zsize = 1000;

Mask sal_mask = new Mask(xsize, ysize, zsize);
sal_mask.setAll(false);

for(int x=0; x<xsize ; x++){
    for(int y=0; y<ysize ; y++){
        for(int z=0; z<zsize ; z++){
            if(sal[x][y][z] < 3.2){
                // Only the position is recorded; the mask carries no values.
                sal_mask.set(x,y,z,true);
            }
        }
    }
}

sal_mask.serialize();
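
The mask variant can be sketched the same way. This is a minimal Python illustration under stated assumptions: `build_mask` is a hypothetical helper, the mask is modelled as nested lists of booleans rather than a real `Mask` type, and serialization is left out.

```python
def build_mask(sal, threshold):
    """Return nested lists of booleans with the same shape as sal."""
    return [[[value < threshold for value in column]
             for column in plane]
            for plane in sal]


# Hypothetical toy 2 x 2 x 2 array standing in for the full sal variable.
sal = [[[3.0, 3.5], [3.1, 3.3]],
       [[3.4, 2.9], [3.6, 3.7]]]

mask = build_mask(sal, 3.2)

# Count the cells that satisfied the constraint.
match_count = sum(flag for plane in mask for column in plane for flag in column)
print(mask)
print(match_count)
```

Note the trade-off between the two forms: the mask always has as many cells as the original array, however few values match, while the sequence grows with the number of matches.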

4 Rationale for the solution

Subsetting Arrays and "Grids" using relational expressions is unlikely to yield a set of matching items that can still be viewed as the same kind of object in the data model, because the matching values generally do not form a rectangular sub-array. By returning the result as a Sequence we can give a reasonable representation of the applied constraint using a type that already exists in the data model.
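
A short sketch may help show why the result stops looking like an array. The Python below is illustrative only; `matches_per_column` and the toy data are assumptions. It counts, for each (x, y) position, how many z values satisfy the constraint.

```python
def matches_per_column(sal, threshold):
    """Count matching z values at each (x, y) position."""
    return [[sum(1 for value in column if value < threshold)
             for column in plane]
            for plane in sal]


# Hypothetical toy 2 x 2 x 2 array.
sal = [[[3.0, 3.1], [3.5, 3.6]],
       [[2.9, 3.4], [3.3, 3.0]]]

counts = matches_per_column(sal, 3.2)
print(counts)
```

With this data the counts differ from one (x, y) position to the next, so the matches cannot be packed into a fixed-size z dimension without padding or variable-length dimensions. A flat list of rows avoids that problem.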

5 Discussion

Dennis(4/28/2012) I like this proposal. It avoids the problem of using variable length dimensions and provides an additional use for sequences.

Dennis(4/29/2012) I might suggest that the produced sequence should contain columns for the dimensions of the selected variable. Using the above example, I would suggest the new sequence should look like this.

<Sequence name="sal">
    <Float64 name="sal" />
    <Int32 name="x" />
    <Int32 name="y" />
    <Int32 name="z" />
    <Float32 name="lat"/>
    <Float32 name="lon"/>
    <Float32 name="depth"/>
</Sequence>
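
As a rough illustration of the index-column variant suggested above, the following Python sketch adds x, y and z to each row. The names `SalRow` and `subset_with_indices` and the toy data are hypothetical; the column order follows the Sequence declaration in this comment.

```python
from collections import namedtuple

# Column order follows the suggested Sequence: sal, x, y, z, lat, lon, depth.
SalRow = namedtuple("SalRow", ["sal", "x", "y", "z", "lat", "lon", "depth"])


def subset_with_indices(sal, lat, lon, depth, threshold):
    """Return matching rows that also carry their array indices."""
    rows = []
    for x, plane in enumerate(sal):
        for y, column in enumerate(plane):
            for z, value in enumerate(column):
                if value < threshold:
                    rows.append(SalRow(value, x, y, z,
                                       lat[x][y], lon[x][y], depth[x][y][z]))
    return rows


# Hypothetical toy 2 x 2 x 2 data.
sal = [[[3.0, 3.5], [3.1, 3.3]],
       [[3.4, 2.9], [3.6, 3.7]]]
lat = [[10.0, 10.0], [11.0, 11.0]]
lon = [[20.0, 21.0], [20.0, 21.0]]
depth = [[[5.0, 50.0], [5.0, 50.0]],
         [[5.0, 50.0], [5.0, 50.0]]]

rows = subset_with_indices(sal, lat, lon, depth, 3.2)

# With the indices present, a client can place values back in index space.
sparse = {(r.x, r.y, r.z): r.sal for r in rows}
print(sparse)
```

The index columns let a client relate each row back to its position in the original array, which the Map values alone may not identify uniquely.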

Dennis(4/28/2012) I am not sure, however, that I see the algorithm for choosing the values for the map columns. Nathan, can you elaborate on the rule for doing that? Never mind; this issue requires some detailed discussion.

Dennis(4/28/2012) Slightly off topic: I thought we had agreed that map variables are just declared using ordinary variable declarations. The example above is using the element keyword to define coordinate variables. E.g.

<Map name="lon" type="Float32">
    <dimension ref="x"/>
    <dimension ref="y"/>
</Map>

should instead be:

<Float32 name="lon">
    <dimension ref="x"/>
    <dimension ref="y"/>
</Float32>

ndp That's probably the case - I was just doing a copy-pasta from somewhere else on the wiki for the example. And I'm sure that somewhere hasn't been brought into line with our current discussion. I suspect that once we iron out an agreement on the various bits of the data model we'll need to go back and rework the wiki content to reflect our thinking.