DAP4: Type 1 Server Side Functions

From OPeNDAP Documentation

Proposal for DAP4 Server-Side Functions

Author: Dennis Heimbigner
Organization: Unidata/UCAR
Initial Draft: 1/1/31

Introduction

The server-side function (SSF) problem divides into two parts.

  • Type 1: There is a limited set of functions that do "simple" computations on the server with respect to one or more datasets. These computations include (and sometimes extend) the constraint computations associated with (e.g. DAP4) queries.
  • Type 2: There are much larger, longer-running computations that do significant computing over one or more datasets.

This proposal addresses Type 1 functions for use within an extended DAP4 constraint language. I will address my approach to Type 2 in a subsequent document.

General Syntax

The key element of this proposal is to use single assignment syntax and semantics.

  • A query consists of a sequence of assignment statements optionally separated by blanks.
  • The basic assignment statement is of the form:
name=(<DAP4constraint>)
or
name=f(a1...an)

Note that by using parentheses, we can jam statements together without (I think) any parsing ambiguity. I considered using a semicolon separator, but we already overload that character.
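As a rough illustration of why the parentheses are enough to delimit statements, the following sketch splits a query into its assignments by matching balanced parentheses. The function name split_statements and the (name, function, body) tuple layout are hypothetical and not part of the proposal; the sketch handles only the two assignment forms above, not the return statement.

```python
def split_statements(query):
    """Split a query into (name, function_name, body) triples.

    function_name is "" for the name=(<DAP4constraint>) form.
    """
    statements = []
    i, n = 0, len(query)
    while i < n:
        if query[i].isspace():            # blanks between statements are optional
            i += 1
            continue
        eq = query.find("=", i)
        if eq < 0:
            raise ValueError("expected '=' in statement at offset %d" % i)
        name = query[i:eq].strip()
        open_paren = query.find("(", eq)
        if open_paren < 0:
            raise ValueError("expected '(' after '%s='" % name)
        func = query[eq + 1:open_paren].strip()
        # Scan forward to the parenthesis that balances open_paren.
        depth, j = 0, open_paren
        while j < n:
            c = query[j]
            if c == "\\":                 # skip a backslash-escaped character
                j += 2
                continue
            if c == "(":
                depth += 1
            elif c == ")":
                depth -= 1
                if depth == 0:
                    break
            j += 1
        if depth != 0 or j >= n:
            raise ValueError("unbalanced parentheses in statement '%s'" % name)
        statements.append((name, func, query[open_paren + 1:j]))
        i = j + 1
    return statements
```

Because each statement ends at the parenthesis that closes it, two statements written with no separator at all are still split unambiguously.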

Interesting. This idea is much more like a series of statements than the DAP2 (original) notion of a single function per query/access. This is effectively what we've started doing - allowing for a sequence of 'statements.' This idea takes that notion further by binding names to the results, which is interesting but also means that the lookup table (environment) must be read/write. Not a huge deal, but something to consider.

I considered the possibility of allowing a function to return multiple values, like this

name,...name=f(...)

but I think that this makes the construction of meta-data (see below) harder. The price one pays is that one might have to convert the above to something like this.

name1=f(1,...)
...
namen=f(n,...)

This implies that f() must be called repeatedly. This may be mitigated by having f() cache its results or by having f() return a structure.
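The caching option might look something like the sketch below. The function min_max and the CALLS counter are made up for illustration; the only library feature used is functools.lru_cache from the Python standard library.

```python
from functools import lru_cache

CALLS = {"count": 0}   # instrumentation for this example only


@lru_cache(maxsize=None)
def _min_max(values):
    """Do the real work once per distinct input (values must be hashable)."""
    CALLS["count"] += 1
    return (min(values), max(values))


def min_max(index, values):
    """Return the index-th result (1-based): 1 is the minimum, 2 the maximum."""
    return _min_max(tuple(values))[index - 1]
```

With this arrangement, name1=min_max(1,...) and name2=min_max(2,...) over the same input trigger only one underlying computation.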

I need to think on this some more.

We are using the 'structure return' approach now and it works fairly well. There are, of course, some wrinkles, but on the whole it's a simple solution that works well with the existing software. jhrg 1/5/16

Single Assignment Rule

Any "assignment" is unique in that no other statement may assign to this same variable; this is why it is called "single assignment".

It is convenient for a number of reasons:

  1. It is (IMO) easier to read and write
  2. It unwinds nested expressions, hence simplifying the syntax.
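A minimal sketch of an environment that enforces the rule, assuming a simple in-memory table; the class name SingleAssignmentEnv and its methods are hypothetical.

```python
class SingleAssignmentEnv:
    """Name table in which each variable may be bound exactly once."""

    def __init__(self):
        self._bindings = {}

    def bind(self, name, value):
        # Reject any second assignment to the same name.
        if name in self._bindings:
            raise ValueError(f"variable '{name}' is already assigned")
        self._bindings[name] = value

    def lookup(self, name):
        return self._bindings[name]

    def __contains__(self, name):
        return name in self._bindings
```

The table is written to as statements are processed, but an existing entry is never overwritten.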

Function Syntax

A function call is of the usual form

f(a1,a2,...,an)

It is intended that function namespaces be supported, so the function name, f, may actually be of the form x.y.z.f. The assumption is that the server maintains a namespace tree with functions as leaves.
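One way to picture the namespace tree is as nested dictionaries with callables at the leaves. This is only a sketch under that assumption; register and resolve are hypothetical names, and a real server may organize its functions quite differently.

```python
def register(tree, dotted_name, func):
    """Insert func at the leaf named by a dotted path such as 'x.y.z.f'."""
    *path, leaf = dotted_name.split(".")
    node = tree
    for part in path:
        node = node.setdefault(part, {})
        if not isinstance(node, dict):
            raise ValueError(f"'{part}' is a function, not a namespace")
    node[leaf] = func


def resolve(tree, dotted_name):
    """Walk the tree and return the function at the leaf."""
    node = tree
    for part in dotted_name.split("."):
        if not isinstance(node, dict) or part not in node:
            raise KeyError(dotted_name)
        node = node[part]
    if isinstance(node, dict):
        raise KeyError(f"{dotted_name} names a namespace, not a function")
    return node
```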

The arguments (a1,...,an) present a problem. It is unlikely that we can syntactically prescribe the form of arguments because they are specific to the function. For those familiar with Lisp, they need to act like FEXPRs.

Because of the need to construct meta-data (see below), it must be possible to detect which arguments refer to previously defined variables. The approach taken here is to assume that the arguments to an expression are separated by commas. Thus, the top-level query processor can parse the arguments to a function and detect which are the names of previously assigned variables.

In addition, there must be some way to pass other kinds of arbitrary arguments as unevaluated strings to the function. For other arguments, including simple names that might be mistaken for variables, we need some kind of escape process. I propose that we use parentheses plus, where necessary, the backslash ('\') as the escaping mechanism. This means that any argument that is surrounded by parentheses is passed unchanged as a string argument to the function. Note that if the argument itself contains balanced parentheses, these will be parsed as matching pairs and passed along. The gotcha is that an unbalanced parenthesis must be escaped with a backslash.
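The comma splitting and parenthesis escaping described above can be sketched as follows. The names split_arguments and classify are hypothetical. Two choices here are assumptions the proposal does not settle: a bare name that is not a known variable is passed through as a string, and backslashes are removed from a parenthesized argument before it reaches the function.

```python
import re


def split_arguments(text):
    """Split on top-level commas, honoring parentheses and backslash escapes."""
    if not text.strip():
        return []
    args, current, depth, i = [], [], 0, 0
    while i < len(text):
        c = text[i]
        if c == "\\" and i + 1 < len(text):
            current.append(text[i:i + 2])   # keep the escape pair intact for now
            i += 2
            continue
        if c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced ')' must be escaped with a backslash")
        if c == "," and depth == 0:
            args.append("".join(current).strip())
            current = []
        else:
            current.append(c)
        i += 1
    if depth != 0:
        raise ValueError("unbalanced '(' must be escaped with a backslash")
    args.append("".join(current).strip())
    return args


def classify(arg, known_variables):
    """Return ("var", name) for a variable reference, else ("str", text)."""
    if arg.startswith("(") and arg.endswith(")"):
        # Parenthesized: never a variable; strip the wrapper and the escapes.
        return ("str", re.sub(r"\\(.)", r"\1", arg[1:-1]))
    if arg in known_variables:
        return ("var", arg)
    return ("str", arg)                     # assumption: unknown bare names are strings
```

For example, with a variable a already assigned, the argument a is a variable reference while (a) is the literal string "a".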

We're doing this. Again, it's not so bad. (sorry about the red, but it makes these comments easier to see)

Returning Variables

At the end of our query, it will be necessary to specify the variables that will be returned to the client that made the original query. The simplest approach is to have a "return" statement of this form.

return(d1,d2,...dn)

where the di are the names to which expressions were previously assigned. One potential complication is with respect to DAP4 groups. It may be desirable to insert any of the resulting variables in a specific DAP4 group. To this end, I propose to allow the di to actually take this form: g1.g2...gn.di to indicate the group position of each variable.
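A small sketch of how a return statement with group-qualified names might be taken apart. The function parse_return and the (group path, variable name) result shape are hypothetical, and the sketch assumes that neither group nor variable names contain dots or commas.

```python
def parse_return(statement):
    """Parse 'return(g1.g2.d1,d2,...)' into (group_path, variable_name) pairs."""
    text = statement.strip()
    if not (text.startswith("return(") and text.endswith(")")):
        raise ValueError("not a return statement: %r" % statement)
    inner = text[len("return("):-1]
    results = []
    for item in inner.split(","):
        # Everything before the last dot is the group path; the rest is the name.
        *groups, name = item.strip().split(".")
        if not name or any(not g for g in groups):
            raise ValueError("empty name in return statement: %r" % statement)
        results.append((tuple(groups), name))
    return results
```

A variable with an empty group path would be placed in the root group.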

Meta-data for Queries

For DAP4, every query is associated with a meta-data description that describes the structure of the returned data.

This is the really hard part of this proposal, and everything else must be designed to support meta-data construction.

Note that meta-data construction will be needed even if the expression is not evaluated. This is, of course, because the client can ask for the .dmr and that will return only the meta-data.

As a minor point, note that the result(s) of a query are defined by the return statement. This means that we only need to obtain the meta-data for the expressions that defined the variables in the return statement.

For DAP4 basic constraints, we already know how to construct the meta-data. For a function evaluation, we have no idea because it could be largely arbitrary.

The approach taken here is to require every function to be able to describe the meta-data that will result from its execution with specific arguments. This means that when the client requests the dmr, the query must be evaluated at the meta-data level to obtain the final dmr. If, later, the client asks for the .dap (the data), then the evaluation must produce data that conforms to the dmr.

The dmr construction process is likely to look like this.

  1. Meta-evaluate each statement in the query from left to right.
  2. If the statement is "name=(<DAP4 expression>)" then use the existing rules to meta-assign the DAP4 expression meta-data to the left side variable.
  3. Consider a function "f(a1,...,an)", where some of the ai are references to previously defined variables. In this case, we invoke the function's meta-evaluator with its variable references replaced with the corresponding meta-data for that variable.
  4. The final return statement returns the concatenation of the meta-data of the specified variables.
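The four steps above can be sketched roughly as follows. Everything here is an assumption made for illustration: meta-data is modeled as plain dictionaries rather than a real dmr, statements arrive already parsed as (name, function, payload) triples, and build_dmr, meta_functions, and constraint_meta are hypothetical names.

```python
def build_dmr(statements, return_names, meta_functions, constraint_meta):
    """Meta-evaluate a query and return the meta-data of the returned variables.

    statements:      list of (name, func, payload); func is None for the
                     name=(<DAP4 expression>) form, whose payload is the
                     constraint text. Otherwise payload is a list of arguments.
    meta_functions:  maps a function name to its meta-evaluator.
    constraint_meta: callable giving the meta-data for a DAP4 constraint.
    """
    env = {}                                    # name -> meta-data (plain dicts here)
    for name, func, payload in statements:      # step 1: left to right
        if name in env:
            raise ValueError(f"'{name}' is assigned more than once")
        if func is None:                        # step 2: existing DAP4 rules
            env[name] = constraint_meta(payload)
        else:                                   # step 3: function meta-evaluator
            # Replace variable references with their meta-data; pass the rest through.
            args = [env.get(a, a) for a in payload]
            env[name] = meta_functions[func](args)
    # Step 4: concatenate the meta-data of the returned variables.
    return [dict(env[n], name=n) for n in return_names]
```

Note that no data is read at any point; only the meta-evaluators run, which is what allows a .dmr request to be answered without evaluating the expression.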

There are some issues that need addressing.

  1. Structures: A function might return a structure, which is ok, but it might return a structure with different numbers of fields depending on its actual inputs. This is not ok and I propose to prohibit it.
  2. Selections over dimensioned arrays: We discussed this a long time ago and decided to leave it out of the basic DAP4 constraint language. However here it is possible for a function to, for example, take an array as input and return another array as output, where the output is defined by the values of the input array.
     The problem is that the actual dimensions might differ depending on the content of the input array. So, if the client asks for the dmr only, what dimension is returned? I am considering the following alternatives:
       a. Encapsulate the output array as a sequence or (sometimes) a set of nested sequences [N.B. James, this is one reason I am considering the VLEN concept to replace sequences].
       b. Introduce the netCDF concept of UNLIMITED dimensions to indicate that the actual size is unknown, but it is some fixed value.
     I have not decided which way to go on this. I lean towards the Sequence solution.

Attributes

I propose to allow functions to annotate their meta-data with whatever attributes they desire.

GrADS Examples

Mostly out of curiosity, I attempted to convert some GrADS '_expr_' examples to the form proposed here. I failed because GrADS uses a full-blown scripting language, so it was not possible to emulate the given examples.

Additional Open Issues

  1. Should this proposal be folded into dap.ce (i.e. the existing basic constraint language), or should it be a separate constraint system (dap.fcn, say)?