OPULS: UGrid Subsetting

From OPeNDAP Documentation

Overview

This project adds a server-side function to Hyrax that lets the user subset unstructured grid (triangular mesh) data objects based on values in their spatial range.


03/15/2013 Update

I have a new version of the server up and running with the ugrid subsetting function. Please check it out; details are below. It's a "test" server that I am using to work with NOAA to explore the use of AWS storage services, so it's not going to be rock solid. If you want to use it for a demo, tell me and I'll work with you to make sure it is running when you need it.


The Function

Name
ugr3 - Restrict the domain of an unstructured 2D Mesh Topology grid
Syntax
ugr3(0, rangeVariable:string, [rangeVariable:string, ...,] condition:string)
  • The first parameter is currently required to have the value zero, to indicate that you are subsetting the nodes of the mesh and not the edges and faces. No other value is currently supported.
  • The second parameter is a list of one or more data variables that you wish to subset. They may be associated with nodes, edges, or faces of the mesh.
  • The third parameter is a string whose value is an expression that defines the conditions that the domain variables must meet in order to be included in the result.
Here is an example:
ugr3(0,depth,"28.0<lat & lat<29.0 & -89.0<lon & lon<-88.0")
Used in a URL:
http://ec2-54-242-224-73.compute-1.amazonaws.com:8080/opendap/hyrax/ebs/Ike/2D_varied_manning_windstress/test_dir-norename.ncml.dds?ugr3(0,depth,"28.0<lat & lat<29.0 & -89.0<lon & lon<-88.0")
And then, once your browser has percent-encoded it:
http://ec2-54-242-224-73.compute-1.amazonaws.com:8080/opendap/hyrax/ebs/Ike/2D_varied_manning_windstress/test_dir-norename.ncml.dds?ugr3(0,depth,%2228.0%3Clat%20&%20lat%3C29.0%20&%20-89.0%3Clon%20&%20lon%3C-88.0%22)
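The step from the readable call to the encoded URL can be scripted. The following is a minimal sketch in Python, assuming only the URL layout shown in the examples above (dataset URL, a response suffix such as dds, then ? and the function call). The helper names build_ugr3_call, encode_call, and build_url are hypothetical and are not part of Hyrax.

```python
from urllib.parse import quote


def build_ugr3_call(range_vars, condition, rank=0):
    """Return the unencoded call, e.g. ugr3(0,depth,"28.0<lat & lat<29.0").

    Hypothetical helper: it only assembles the string described in the
    Syntax section and does not talk to a server.
    """
    if rank != 0:
        # The page states that zero (nodes) is the only supported value.
        raise ValueError("only rank 0 (nodes) is currently supported")
    if not range_vars:
        raise ValueError("at least one range variable is required")
    args = [str(rank), *range_vars, '"' + condition + '"']
    return "ugr3(" + ",".join(args) + ")"


def encode_call(call):
    """Percent-encode the call the way the browser example does.

    '&', '(', ')' and ',' are left literal to match that example;
    quotes, '<' and spaces become %22, %3C and %20.
    """
    return quote(call, safe="&(),")


def build_url(dataset_url, suffix, range_vars, condition):
    """Join dataset URL, response suffix (assumed 'dds' here) and the call."""
    call = build_ugr3_call(range_vars, condition)
    return dataset_url + "." + suffix + "?" + encode_call(call)


if __name__ == "__main__":
    box = "28.0<lat & lat<29.0 & -89.0<lon & lon<-88.0"
    print(build_ugr3_call(["depth"], box))
    print(encode_call(build_ugr3_call(["depth"], box)))
```

Passing several range variables, for example ["ua", "h"], produces a call of the form ugr3(0,ua,h,"...").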

The Data

I have a new server up with the ugrid data and subsetting function on board here: http://ec2-54-242-224-73.compute-1.amazonaws.com:8080/opendap/

In this directory, http://ec2-54-242-224-73.compute-1.amazonaws.com:8080/opendap/ebs/, you can find the datasets (hex.nc, test4.nc) that were used by Bill & Scott at UW to develop the first ugrid restrict using the gridfields library. However, their code was based on an older version of the ugrid spec, and it seems the specification shifted out from under their code. I worked on the code to get it to respond to datasets organized as described here: https://publicwiki.deltares.nl/display/NETCDF/Deltares+CF+proposal+for+Unstructured+Grid+data+model

I adapted their original test data using ncml files. I wrote an ncml file for test4.nc (called test4-ugrid.ncml, in the same directory) that makes it compliant with the Deltares specification, so it can be subset with the ugrid restrict function: http://ec2-54-242-224-73.compute-1.amazonaws.com:8080/opendap/ebs/test4-ugrid.ncml Also in that directory is the time-sliced dataset (fvcom_1step.nc) that you provided, http://ec2-54-242-224-73.compute-1.amazonaws.com:8080/opendap/ebs/fvcom_1step.nc, which can also be subset using the ugrid restrict function.

Also, a while back I tracked down and downloaded one of the larger, multi-time-step fvcom ugrid data files. It's now set up here: http://ec2-54-242-224-73.compute-1.amazonaws.com:8080/opendap/ebs/Ike/ Inside you'll find that the 2D models have useful stuff. In http://ec2-54-242-224-73.compute-1.amazonaws.com:8080/opendap/ebs/Ike/2D_varied_manning_windstress/ you should see the original ncml files from the source server. These weren't compatible with Hyrax, mostly because Hyrax is not as "elastic" as the TDS with its inputs.

The file test_dir-norename.ncml can be subset using the ugrid restrict function, http://ec2-54-242-224-73.compute-1.amazonaws.com:8080/opendap/ebs/Ike/2D_varied_manning_windstress/test_dir-norename.ncml, as long as you subset a node variable that has only one dimension. The file test_dir.ncml is closer to the original ncml file 00_dir.ncml, but unfortunately it only works superficially. You can get the metadata responses for this dataset, but there is a bug in the ncml code that prevents the ugrid restrict function from returning data values for renamed variables. The test_dir-norename.ncml file is simply test_dir.ncml without the renamed variables.



Places for improvement

  • Make a version in which only the lat and lon boundaries are specified that then proceeds to subset all of the nodes, edges, and faces.
    See the "super restrict" function discussion below
  • Allow the user to specify both face and grid variables
    It appears that this should be straightforward, and this is where I'll proceed.
    UPDATE: This works now. Test URL: http://ec2-54-242-224-73.compute-1.amazonaws.com:8080/opendap/ebs/fvcom_1step.nc.dds?ugr3(0,ua,h,%2228.0%3Clat%20&%20lat%3C29.0%20&%20-89.0%3Clon%20&%20lon%3C-88.0%22)
  • Figure out how to handle node (or edge, or face) variables that have additional dimensions.
    Need to think about the various designs for this. There is an example below (using dap2 syntax??) but I don't think it would work currently.
  • Can we have the CE allow us to get a single time slice using something like this:
ugr3(0,zeta[18][*],"28.0<lat & lat<29.0 & -89.0<lon & lon<-88.0")
where zeta is defined as "Float32 zeta[time = 1081][node = 417642];"??
  • Even so, it looks like we'll need something to load each extra dimensional slice into the GF lib and subset it.
    And if I understood Bill correctly, we may be able to reorganize the code so that we can restrict the coordinate topology and then load one variable at a time, subset the variable, and then transmit it. I think we can make that work for both one and N dimensional variables...
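The idea of restricting the nodes once and then pushing each extra-dimensional slice through that restriction can be sketched in plain Python. This is only an illustration of the indexing, assuming variables laid out as [time][node] like zeta above. It does not use the gridfields library, and the function names and the tiny data values are made up for the example.

```python
def nodes_in_box(lat, lon, lat_min, lat_max, lon_min, lon_max):
    """Return the indices of nodes strictly inside the box, in original order.

    Mirrors a condition such as "28.0<lat & lat<29.0 & -89.0<lon & lon<-88.0".
    """
    return [
        i
        for i, (la, lo) in enumerate(zip(lat, lon))
        if lat_min < la < lat_max and lon_min < lo < lon_max
    ]


def subset_time_slices(var, keep, time_indices):
    """Yield (t, values) for each requested time step.

    var is indexed as var[time][node]. Only one slice is processed at a
    time, so a caller could transmit each slice before loading the next.
    """
    for t in time_indices:
        one_slice = var[t]
        yield t, [one_slice[i] for i in keep]


if __name__ == "__main__":
    # Four made-up nodes; only nodes 1 and 2 fall inside the box.
    lat = [27.5, 28.2, 28.8, 29.5]
    lon = [-88.5, -88.5, -88.2, -88.5]
    # Two made-up time steps of a node variable.
    zeta = [
        [0.0, 0.1, 0.2, 0.3],
        [1.0, 1.1, 1.2, 1.3],
    ]
    keep = nodes_in_box(lat, lon, 28.0, 29.0, -89.0, -88.0)
    for t, values in subset_time_slices(zeta, keep, [1]):
        print(t, values)
```

Computing the kept node indices once and reusing them for every slice is what would let the same code path serve both one and N dimensional variables.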


  1. Make individual functions that can be 'nested'
    regrid(ugr(....),....)
    ugr(regrid(....))
    Do we let the user's expression dictate the efficiency of the operation, or do we try to optimize (subset before regrid etc.), and how would we do that?
  2. Can we restrict time series like this (using the current syntax)??
    We need a way to do it, either in the ugrid code or elsewhere. This example (potentially) uses DAP, but it may not be supported and there may be issues making it happen. Maybe DAP4 syntax would make this easier? James?
    ?zeta[0:15:1000][*],ugr3(0,zeta,"28.0<lat & lat<29.0 & -89.0<lon & lon<-88.0")
  3. What about nesting using some kind of wild card for the current (ugrid) dataset? This keeps the type signature constant. ($ is the entire dataset, which is a ugrid dataset, and is the input to the interior ugr3 function call.)
    datasetName?ugr3(2,ugr3(0,$,"28.0<lat & lat<29.0 & -89.0<lon & lon<-88.0"),"temp>12")
  4. Create a super restrict function that can operate at any rank and that always operates on the entire ugrid dataset
    datasetName?superrestrict("28.0<lat & lat<29.0 & -89.0<lon & lon<-88.0","","temp>12")
  5. If I understood Bill correctly we may be able to reorganize the code and make a couple more calls into gridfields which would allow us to restrict the coordinate topology (but preserve the coordinate index mapping) so that then we can load one variable at a time, subset the variable and then transmit it. Which I think we can make work for both one and N dimensional variables.
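To make "restrict the coordinate topology but preserve the coordinate index mapping" concrete, here is a small sketch in Python. It is not the gridfields API. It assumes a triangular mesh stored as a list of faces, each a list of three node indices, and it assumes a face is kept only when all of its nodes are kept; how gridfields treats partially covered faces is not stated here. All names and data are hypothetical.

```python
def restrict_topology(face_nodes, keep_nodes):
    """Restrict a mesh to keep_nodes and renumber the surviving faces.

    face_nodes: list of faces, each a list of original node indices.
    keep_nodes: original node indices to keep, in ascending order.
    Returns (new_faces, kept_faces, old_to_new) where old_to_new maps an
    original node index to its index in the restricted mesh.
    """
    old_to_new = {old: new for new, old in enumerate(keep_nodes)}
    new_faces = []
    kept_faces = []
    for f, nodes in enumerate(face_nodes):
        # Assumption: drop a face unless every one of its nodes survives.
        if all(n in old_to_new for n in nodes):
            new_faces.append([old_to_new[n] for n in nodes])
            kept_faces.append(f)
    return new_faces, kept_faces, old_to_new


def subset_variable(values, kept_indices):
    """Apply a saved index list to one variable, loaded on its own."""
    return [values[i] for i in kept_indices]


if __name__ == "__main__":
    faces = [[0, 1, 2], [1, 2, 3], [2, 3, 4]]  # made-up mesh
    keep_nodes = [1, 2, 3]                      # e.g. result of a lat/lon test
    new_faces, kept_faces, old_to_new = restrict_topology(faces, keep_nodes)
    depth = [10.0, 11.0, 12.0, 13.0, 14.0]      # made-up node variable
    face_area = [100.0, 200.0, 300.0]           # made-up face variable
    print(new_faces, kept_faces)
    print(subset_variable(depth, keep_nodes))
    print(subset_variable(face_area, kept_faces))
```

Because the kept node and face index lists are saved after the topology is restricted, each variable can be loaded, subset, and transmitted independently of the others.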