Talk:Server-side Functions

From OPeNDAP Documentation

Tentative specification

This is a first tentative specification, just to see how far we can go with the ideas already in mind. Feel free to change, edit, adapt, modify or leave comments. --RobDeAlmeida

Capabilities introspection

The capabilities response is requested by accessing the URL /functions.xml. Perhaps it would also be a good idea to embed the response in the THREDDS catalog, or at least put a pointer to it there?

Different representations of the response could be accessed using different extensions: /functions.html for a readable representation, /functions.json for JSON, etc.
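
As a rough illustration of the extension-based negotiation, a server could map the URL extension to a media type. This is only a sketch: the media types and the media_type_for helper are assumptions, not part of the proposal.

```python
import posixpath

# Hypothetical mapping from URL extension to the media type of the response.
REPRESENTATIONS = {
    ".xml": "application/xml",
    ".html": "text/html",
    ".json": "application/json",
}


def media_type_for(path):
    """Return the media type for a capabilities URL such as /functions.json."""
    stem, extension = posixpath.splitext(path)
    if stem != "/functions" or extension not in REPRESENTATIONS:
        raise LookupError(f"no capabilities representation at {path!r}")
    return REPRESENTATIONS[extension]


print(media_type_for("/functions.json"))
```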

Capabilities response

This is a plain old XML response, inspired by XINS. Parameters are positional and always required:

<functions xmlns="http://xml.opendap.org/ns/SSF">
  <function name="mean">
    <description>Calculates mean over axis between two points.</description>
    <example>mean(sst,"lat",10.0,40.0)</example>
    <input type="Grid">
      <description>The name of the variable to be averaged.</description>
    </input>
    <input type="String">
      <description>The name of the axis.</description>
    </input>
    <input type="Float64">
      <description>The initial value of the averaging.</description>
    </input>
    <input type="Float64">
      <description>The last value of the averaging.</description>
    </input>
    <output type="Grid">
      <description>A new Grid with a degenerated axis.</description>
    </output>
  </function>
</functions>
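
To show how a client might consume this response, here is a minimal Python sketch that reads the document above into positional signatures. The parse_capabilities helper and the shape of the returned dictionary are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the example response above.
NS = {"ssf": "http://xml.opendap.org/ns/SSF"}

CAPABILITIES = """<functions xmlns="http://xml.opendap.org/ns/SSF">
  <function name="mean">
    <description>Calculates mean over axis between two points.</description>
    <example>mean(sst,"lat",10.0,40.0)</example>
    <input type="Grid"><description>The name of the variable to be averaged.</description></input>
    <input type="String"><description>The name of the axis.</description></input>
    <input type="Float64"><description>The initial value of the averaging.</description></input>
    <input type="Float64"><description>The last value of the averaging.</description></input>
    <output type="Grid"><description>A new Grid with a degenerated axis.</description></output>
  </function>
</functions>"""


def parse_capabilities(xml_text):
    """Return {function name: {"inputs": [types in order], "output": type}}."""
    root = ET.fromstring(xml_text)
    functions = {}
    for fn in root.findall("ssf:function", NS):
        functions[fn.get("name")] = {
            # Parameters are positional, so document order is the argument order.
            "inputs": [inp.get("type") for inp in fn.findall("ssf:input", NS)],
            "output": fn.find("ssf:output", NS).get("type"),
        }
    return functions


signatures = parse_capabilities(CAPABILITIES)
print(signatures["mean"]["inputs"])
```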

Problem: how do we specify that more than one input type is allowed? I can think of:

<input type="Grid Array">

But the list could get big quickly if we want to list Int32, UInt16, etc. It would be nice to have a shortcut to the base types:

<input type="Array Base">

With Base meaning all the base types.
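
A client could expand such an attribute as sketched below. The membership of BASE_TYPES (here the DAP2 atomic types, including String and Url) and the allowed_types helper are assumptions; the proposal does not say exactly which types Base would cover.

```python
# Assumed expansion of the "Base" shortcut: the DAP2 atomic types.
BASE_TYPES = frozenset({
    "Byte", "Int16", "UInt16", "Int32", "UInt32",
    "Float32", "Float64", "String", "Url",
})


def allowed_types(type_attr):
    """Expand a space-separated type attribute such as "Array Base"."""
    allowed = set()
    for token in type_attr.split():
        if token == "Base":
            allowed |= BASE_TYPES  # the shortcut stands for every base type
        else:
            allowed.add(token)
    return allowed


print(sorted(allowed_types("Grid Array")))
```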

Another problem is that it would be useful to allow for a variable number of arguments, like in the geogrid function, which allows an optional relational expression.

Comment: I think that we should keep this limited to explicitly named types and not provide for polymorphic arguments. My approach with grid(), geogrid() and the rest was to make the numeric arguments Float64 and convert other numeric values to that type, throwing exceptions for stuff that doesn't fit (e.g., String). James
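
The conversion rule described in this comment could look roughly like the following Python sketch. It is not the actual grid()/geogrid() code; to_float64 is a hypothetical helper, and a Python float stands in for Float64.

```python
import numbers


def to_float64(value):
    """Convert any numeric argument to a 64-bit float, rejecting the rest."""
    # bool is a numeric subclass in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, numbers.Real):
        raise TypeError(f"expected a numeric argument, got {type(value).__name__}")
    return float(value)


print(to_float64(10))
```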

Comment: I like the idea of having a capabilities document that makes the server self-describing which runs counter, in a way, to my earlier suggestion that we define groups of functions. Steve's correct that using the groups to define what's present will lead to the DLL mess. The Groups would/should/could be more like a set of logical features which, defined as such, help implementors know what to build to cover certain common use cases. James

Syntax specification

RobDeAlmeida proposed two approaches: first, for simple requests, use the standard DAP syntax for calling functions; second, for complex requests, allow a user-submitted script to be associated with a newly created function. This second suggestion is a bit controversial because it requires some client software (Ferret, e.g.) to be rewritten, or at least relinked with an HTTP library that allows POSTs. It also breaks a request into two separate steps. Following Roland Schweitzer's suggestion, the syntax should be tested against the Server Side Functions Use Cases proposed by Steve Hankin.

Difference time series of the same area mean from two different datasets

Requires:

  1. area-average each of the variables over the same (or different) areas
  2. specify the vertical coordinate point on each in a geo-aware manner. (i.e. not by vertical index)
  3. specify the time range; regrid the time series of one variable to match the other
  4. take a difference between the two time series

Using nested functions:

/dataset.dods?sub(grid(mean(mean(A,"lon",120,280),"lat",0,60),"0<z<100"),
                  grid(mean(mean(B,"lon",120,280),"lat",0,60),"0<z<100"))

Here I use the grid function to specify the vertical coordinate. The mean function comes from the definition above. I also assume the existence of a hypothetical sub function which takes the difference between two variables, regridding the second to the grid of the first if necessary.
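
For illustration, a client could assemble this kind of nested expression programmatically instead of writing it by hand. The call, quoted and area_mean helpers below are hypothetical; only the resulting expression comes from the example above.

```python
def call(name, *args):
    """Render a function call in constraint-expression syntax."""
    return f"{name}({','.join(str(a) for a in args)})"


def quoted(text):
    """Wrap a string argument in double quotes."""
    return f'"{text}"'


def area_mean(var):
    """Average over lon 120..280 and lat 0..60, then select 0<z<100."""
    over_lon = call("mean", var, quoted("lon"), 120, 280)
    over_lat = call("mean", over_lon, quoted("lat"), 0, 60)
    return call("grid", over_lat, quoted("0<z<100"))


url = "/dataset.dods?" + call("sub", area_mean("A"), area_mean("B"))
print(url)
```

Apart from the line break and indentation, the printed URL is the same as the one shown above.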


I (RobDeAlmeida) think that even with the use of "smart" functions -- like the grid-aware sub I mentioned above -- the URL grows too quickly. This is what it would look like with the F-TDS/GDS+Ingrid syntax:

/dataset_expr_{}{A/X/(120E)(280E)RANGE/Y/(0N)(60N)RANGE[X/Y]average/Z/(0)(100)RANGE/
                 B/X/(120E)(280E)RANGE/Y/(0N)(60N)RANGE[X/Y]average/Z/(0)(100)RANGE/SUB}.dods

That's why I suggested POSTing scripts. In this case it would become:

POST /functions
Content-type: text/x-ferret

let var1 = $1[x=120:280@ave,y=0:60@ave,z=0:100]
let var2 = $2[x=120:280@ave,y=0:60@ave,z=0:100]
let output = var1 - var2[G=var1]

Giving the response:

201 Created
Location: http://example.com/function1.xml

function1

And we'd proceed with the call:

/dataset.dods?function1(A,B)
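
From the client side, the two steps might be sketched like this in Python. Nothing is sent over the network here; the example.com host and both helper functions are assumptions, and a real client would pass the request to urllib.request.urlopen and read the function name from the response.

```python
import urllib.request

FERRET_SCRIPT = """let var1 = $1[x=120:280@ave,y=0:60@ave,z=0:100]
let var2 = $2[x=120:280@ave,y=0:60@ave,z=0:100]
let output = var1 - var2[G=var1]
"""


def build_registration_request(base_url, script):
    """Step 1: build the POST that registers the script as a new function."""
    return urllib.request.Request(
        base_url + "/functions",
        data=script.encode("utf-8"),
        headers={"Content-Type": "text/x-ferret"},
        method="POST",
    )


def build_call_url(base_url, function_name, *variables):
    """Step 2: build the URL that calls the newly created function."""
    return f"{base_url}/dataset.dods?{function_name}({','.join(variables)})"


request = build_registration_request("http://example.com", FERRET_SCRIPT)
print(request.get_method(), request.full_url)
print(build_call_url("http://example.com", "function1", "A", "B"))
```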

Comment: What about having the script return a result (instead of the function reference)? If we do that, then the server can choose whether it maintains state. Also, what about using an XML document in the body of the request (POST) to define the function/operations? James

Comment: I hate to harp on this topic ;-) , but we really need to be thinking about a sand box to make sure this is secure! James

Difference time series of the same area mean from two different datasets, one remote

Same example as above, but now one of the datasets is remote (DAP-accessible).

Using nested functions. We specify the dataset with a URL pointing to the object, omitting any extensions (.dods, .dds):

/dataset.dods?sub(grid(mean(mean(A,"lon",120,280),"lat",0,60),"0<z<100"),
                  grid(mean(mean("http://server/dataset?B[0]","lon",120,280),"lat",0,60),"0<z<100"))

The function should be smart enough to parse any string input where a variable is expected, and retrieve the variable remotely.
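
One way a function could recognise such an argument is sketched below. The split_remote_reference helper is hypothetical, and restricting the check to http and https is an assumption; the actual retrieval is left out.

```python
from urllib.parse import urlsplit


def split_remote_reference(argument):
    """Return (dataset_url, constraint) for a remote reference, else None."""
    parts = urlsplit(argument)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return None  # an ordinary string argument such as "lon"
    dataset_url = f"{parts.scheme}://{parts.netloc}{parts.path}"
    # The query part carries the variable selection, e.g. B[0].
    return dataset_url, parts.query


print(split_remote_reference("http://server/dataset?B[0]"))
```

The function would then append .dods (or .dds) to the dataset URL itself, since the reference omits the extension.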

Comment: One issue about remote access is that it means the server is also acting as a client under command of the 'initiating' client. That will enlarge the amount of code which must undergo a security review considerably. I'm not saying let's not do this because it's going to be (more) costly, but we should keep this in mind. It also makes implementation of most servers much more complex. James

Difference XY fields from time-averaged results

Requires:

  1. time-average each of the variables over the same (or different) time ranges
  2. specify the vertical coordinate point on each in a geo-aware manner. (i.e. not by vertical index)
  3. specify the lat-lon range; regrid the lat-lon coordinates of one field to match the other
  4. take a difference between the lat-lon fields

Using nested functions:

/dataset.dods?sub(grid(mean(A,"time","1990-01-01T12:00:00Z","1990-12-31T12:00:00Z"),
                       "0<z<100","120<lon<280","0<lat<60"),
                  grid(mean(B,"time","1990-01-01T12:00:00Z","1990-12-31T12:00:00Z"),
                       "0<z<100","120<lon<280","0<lat<60"))

A few comments:

  • the grid function could be modified to accept a single value, instead of a range: grid(var,"z=100").
  • time selection here is done using ISO dates.
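
Both comments can be illustrated with a small Python sketch that parses the selection strings (range or single value) and the ISO time arguments. The parse_selection and parse_time helpers are hypothetical, and treating a single value as a zero-width range is an assumption.

```python
import re
from datetime import datetime, timezone

# "0<z<100" style range and "z=100" style single value.
RANGE = re.compile(r"^\s*([-+.\w]+)\s*<\s*(\w+)\s*<\s*([-+.\w]+)\s*$")
POINT = re.compile(r"^\s*(\w+)\s*=\s*([-+.\w]+)\s*$")


def parse_selection(expr):
    """Return (axis, low, high); a single value gives low == high."""
    match = RANGE.match(expr)
    if match:
        low, axis, high = match.groups()
        return axis, float(low), float(high)
    match = POINT.match(expr)
    if match:
        axis, value = match.groups()
        return axis, float(value), float(value)
    raise ValueError(f"unrecognised selection: {expr!r}")


def parse_time(text):
    """Parse an ISO timestamp such as 1990-01-01T12:00:00Z as UTC."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


print(parse_selection("0<z<100"))
print(parse_selection("z=100"))
print(parse_time("1990-12-31T12:00:00Z").isoformat())
```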

Standard list of functions