DAP4: Type 1 Server Side Functions: Difference between revisions

From OPeNDAP Documentation
(Created page with "== Proposal for DAP4 Server-Side Functions == '''Author''': Dennis Heimbigner '''Organization''': Unidata/UCAR '''Initial Draft''': 1/1/31 The server-side function (SSF) p...")
 
No edit summary
Revision as of 18:47, 4 January 2016

Proposal for DAP4 Server-Side Functions

Author: Dennis Heimbigner
Organization: Unidata/UCAR
Initial Draft: 1/1/31

Introduction

The server-side function (SSF) problem divides into two parts.

  • Type 1: There is a limited set of functions that do "simple" computations on the server with respect to one or more datasets. These computations include (and sometimes extend) the constraint computations associated with (e.g. DAP4) queries.
  • Type 2: There are much larger, longer-duration computations that do significant computing over one or more datasets.

This proposal addresses Type 1 functions for use within an extended DAP4 constraint language. I will address my approach to Type 2 in a subsequent document.

General Syntax

The key element of this proposal is to use single assignment syntax and semantics.

  • A query consists of a sequence of assignment statements optionally separated by blanks.
  • The basic assignment statement is of the form:
name=(<DAP4constraint>)
or
name=f(a1...an)

Note that by using parentheses, we can jam statements together without (I think) any parsing ambiguity. I considered using a semicolon separator, but we already overload that character.
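As a rough illustration of why the parentheses are enough to delimit statements, here is a minimal Python sketch of a splitter that tracks parenthesis depth. The function name split_statements and the sample constraint /x[0:9] are assumptions made for the example, not part of the proposal, and the backslash handling anticipates the escaping rule described under Function Syntax.

```python
def split_statements(query: str) -> list[str]:
    """Split a query into statements by tracking parenthesis depth.

    A statement ends at the ')' that returns the depth to zero, so no
    explicit separator is needed between statements.
    """
    statements = []
    start = None   # index where the current statement began, if any
    depth = 0      # current parenthesis nesting depth
    i = 0
    while i < len(query):
        ch = query[i]
        if ch == "\\":
            # A backslash escapes the next character, so skip both.
            if start is None:
                start = i
            i += 2
            continue
        if start is None:
            if ch.isspace():
                i += 1          # blanks between statements are optional
                continue
            start = i
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError(f"unbalanced ')' at offset {i}")
            if depth == 0:
                statements.append(query[start:i + 1])
                start = None
        i += 1
    if start is not None:
        raise ValueError("query ends inside an unfinished statement")
    return statements
```

With this sketch, the query a=(/x[0:9])b=f(a,(k=1)) return(b) splits into three statements whether or not blanks appear between them.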

I considered the possibility of allowing a function to return multiple values, like this

name,...name=f(...)

but I think that this makes the construction of meta-data (see below) harder. The price one pays is that one might have to convert the above to something like this.

name1=f(1,...)
...
namen=f(n,...)

This implies that f() must be called repeatedly. This may be mitigated by having f() cache its results or by having f() return a structure.

I need to think on this some more.

Single Assignment Rule

Any "assignment" is unique in that no other statement may assign to this same variable; this is why it is called "single assignment".

It is convenient for a number of reasons:

  1. It is (IMO) easier to read and write.
  2. It unwinds nested expressions, hence simplifying the syntax.
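Checking the rule is cheap. The following Python sketch assumes the query has already been split into statement strings; the function name check_single_assignment and the sample statements are illustrative only.

```python
def check_single_assignment(statements: list[str]) -> list[str]:
    """Return the assigned names in order, rejecting any repeated assignment."""
    assigned = []
    for stmt in statements:
        name, sep, _ = stmt.partition("=")
        name = name.strip()
        # Statements such as return(...) have no '=' before their first '('.
        if not sep or "(" in name:
            continue
        if name in assigned:
            raise ValueError(f"variable '{name}' is assigned more than once")
        assigned.append(name)
    return assigned
```

For example, the statements a=(/x), b=f(a), return(b) pass the check, while a second assignment to a would be rejected.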

Function Syntax

A function call is of the usual form

f(a1,a2,...,an)

It is intended that function namespaces be supported, so the function name, f, may actually be of the form x.y.z.f. The assumption is that the server maintains a namespace tree with functions as leaves.
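One plausible way to picture the namespace tree is as nested dictionaries with callables at the leaves. This Python sketch is only an assumption about how a server might store it; the name resolve and the sample tree are hypothetical.

```python
def resolve(tree: dict, dotted_name: str):
    """Walk a namespace tree and return the function stored at the leaf."""
    node = tree
    for part in dotted_name.split("."):
        if not isinstance(node, dict) or part not in node:
            raise KeyError(f"unknown function '{dotted_name}'")
        node = node[part]
    if not callable(node):
        raise KeyError(f"'{dotted_name}' names a namespace, not a function")
    return node
```

Given a tree such as {"x": {"y": {"z": {"f": len}}}}, resolving x.y.z.f yields the function at that leaf, and resolving x.y is an error because it stops at a namespace.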

The arguments (a1,...,an) present a problem. It is unlikely that we can syntactically prescribe the form of arguments because they are specific to the function. For those familiar with Lisp, they need to act like FEXPRs.

Because of the need to construct meta-data (see below), it must be possible to detect which arguments refer to previously defined variables. The approach taken here is to assume that the arguments to an expression are separated by commas. Thus, the top-level query processor can parse the arguments to a function and detect which are the names of previously assigned variables.

In addition, there must be some way to pass other kinds of arbitrary arguments as unevaluated strings to the function. For other arguments, including simple names that might be mistaken for variables, we need some kind of escape process. I propose that we use parentheses plus, where necessary, the backslash ('\') as the escaping mechanism. This means that any argument that is surrounded by parentheses is passed unchanged as a string argument to the function. Note that if the argument itself contains balanced parentheses, these will be parsed as matching pairs and passed along. The gotcha is that an unbalanced parenthesis must be escaped with a backslash.
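A minimal Python sketch of this argument handling follows. It rests on several assumptions that the proposal does not settle: the outer parentheses of an escaped argument are stripped before it is handed to the function, backslashes are left in place, and a bare token that is not a previously assigned variable is also passed as a string. The names split_arguments and classify are hypothetical.

```python
def split_arguments(arglist: str) -> list[str]:
    """Split the text between a call's outer parentheses at top-level commas."""
    args, current, depth, i = [], [], 0, 0
    while i < len(arglist):
        ch = arglist[i]
        if ch == "\\" and i + 1 < len(arglist):
            # Escaped character: copy both characters and do not count parens.
            current.append(arglist[i:i + 2])
            i += 2
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced ')' in argument list")
        if ch == "," and depth == 0:
            args.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
        i += 1
    if depth != 0:
        raise ValueError("unbalanced '(' in argument list")
    if current or args:
        args.append("".join(current).strip())
    return args


def classify(arg: str, defined: set[str]) -> tuple[str, str]:
    """Tag an argument as a variable reference or an unevaluated string."""
    if arg.startswith("(") and arg.endswith(")"):
        # Assumption: the outer parentheses are stripped, contents kept verbatim.
        return ("string", arg[1:-1])
    if arg in defined:
        return ("variable", arg)
    # Assumption: unknown bare tokens are passed through as strings.
    return ("string", arg)
```

With a and b previously assigned, the argument list a,(b),(lat>10,(x\)y)),depth yields one variable reference (a) and three strings; (b) is passed as the string b precisely because it is parenthesized.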

Returning Variables

At the end of our query, it will be necessary to specify the variables that will be returned to the client that made the original query. The simplest approach is to have a "return" statement of this form.

return(d1,d2,...dn)

where the di are the names to which expressions were previously assigned. One potential complication is with respect to DAP4 groups. It may be desirable to insert any of the resulting variables in a specific DAP4 group. To this end, I propose to allow each di to take this form: g1.g2...gn.di, indicating the group position of that variable.
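To make the group-qualified form concrete, here is a small Python sketch that separates each returned name into its group path and its variable name. The function name parse_return and the tuple representation are assumptions for illustration.

```python
def parse_return(stmt: str) -> list[tuple[tuple[str, ...], str]]:
    """Parse return(d1,...,dn) into (group path, variable name) pairs."""
    stmt = stmt.strip()
    if not (stmt.startswith("return(") and stmt.endswith(")")):
        raise ValueError("not a return statement")
    body = stmt[len("return("):-1]
    targets = []
    for item in body.split(","):
        # Everything before the last dot is the group path g1.g2...gn.
        *groups, name = item.strip().split(".")
        targets.append((tuple(groups), name))
    return targets
```

For return(d1,g1.g2.d2) this gives d1 with an empty group path (the root group) and d2 placed in group g1.g2.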

Meta-data for Queries

For DAP4, every query is associated with a meta-data description that describes the structure of the returned data.

This is the really hard part of this proposal, and everything else must be designed to support meta-data construction.

Note that meta-data construction will be needed even if the expression is not evaluated. This is, of course, because the client can ask for the .dmr and that will return only the meta-data.

As a minor point, note that the result(s) of a query are defined by the return statement. This means that we only need to obtain the meta-data for the expressions that defined the variables in the return statement.

For DAP4 basic constraints, we already know how to construct the meta-data. For a function evaluation, we have no idea because it could be largely arbitrary.

The approach taken here is to require every function to be able to describe the meta-data that will result from its execution with specific arguments. This means that when the client requests the dmr, the query must be evaluated at the meta-data level to obtain the final dmr. If, later, the client asks for the .dap (the data), then the evaluation must produce data that conforms to the dmr.

The dmr construction process is likely to look like this.

  1. Meta-evaluate each statement in the query from left to right.
  2. If the statement is "name=(<DAP4 expression>)" then use the existing rules to meta-assign the DAP4 expression meta-data to the left side variable.
  3. Consider a function "f(a1,...,an)", where some of the ai are references to previously defined variables. In this case, we invoke the function's meta-evaluator with its variable references replaced with the corresponding meta-data for that variable.
  4. The final return statement returns the concatenation of the meta-data of the specified variables.
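The four steps above can be sketched in Python roughly as follows. Everything concrete here is an assumption made for the example: statements are taken to be already parsed into tuples, meta-data is represented as plain dictionaries rather than real DMR fragments, constraint_meta stands in for the existing DAP4 rules, and mean is a hypothetical server-side function with its own meta-evaluator.

```python
def constraint_meta(constraint: str) -> dict:
    """Stand-in for the existing DAP4 rules that derive meta-data."""
    return {"kind": "constraint", "source": constraint}


def mean_meta(meta_args: list) -> dict:
    """Meta-evaluator for a hypothetical 'mean' function: array in, scalar out."""
    return {"kind": "scalar", "derived_from": meta_args[0]["source"]}


def build_dmr(statements: list[tuple], meta_evaluators: dict) -> list[dict]:
    """Meta-evaluate parsed statements left to right and return the result."""
    env = {}  # variable name -> meta-data
    for stmt in statements:                      # step 1
        kind = stmt[0]
        if kind == "return":                     # step 4
            return [env[name] for name in stmt[1]]
        name = stmt[1]
        if name in env:
            raise ValueError(f"variable '{name}' is assigned more than once")
        if kind == "constraint":                 # step 2
            env[name] = constraint_meta(stmt[2])
        elif kind == "call":                     # step 3
            fname, args = stmt[2], stmt[3]
            # Replace variable references with their meta-data; pass strings as-is.
            meta_args = [env[value] if tag == "variable" else value
                         for tag, value in args]
            env[name] = meta_evaluators[fname](meta_args)
        else:
            raise ValueError(f"unknown statement kind '{kind}'")
    raise ValueError("query has no return statement")
```

Note that no data is touched at any point: the meta-evaluator for mean only inspects the meta-data of its argument, which is what allows the server to answer a dmr request without running the function itself.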

There are some issues that need addressing.

  1. Structures: A function might return a structure, which is ok, but it might return a structure with different numbers of fields depending on its actual inputs. This is not ok and I propose to prohibit it.
  2. Selections over dimensioned arrays: We discussed this a long time ago and decided to leave it out of the basic DAP4 constraint language. However, here it is possible for a function to, for example, take an array as input and return another array as output, where the output is defined by the values of the input array.
     The problem is that the actual dimensions might differ depending on the content of the input array. So, if the client asks for the dmr only, what dimension is returned? I am considering the following alternatives:
       1. Encapsulate the output array as a sequence or (sometimes) a set of nested sequences [N.B. James, this is one reason I am considering the VLEN concept to replace sequences].
       2. Introduce the netCDF concept of UNLIMITED dimensions to indicate that the actual size is unknown, but it is some fixed value.
     I have not decided which way to go on this. I lean towards the Sequence solution.

Attributes

I propose to allow functions to annotate their meta-data with whatever attributes they desire.

GrADS Examples

Mostly out of curiosity, I attempted to convert some GrADS '_expr_' examples to the form proposed here. I failed because GrADS uses a full-blown scripting language, so it was not possible to emulate the given examples.

Additional Open Issues

  1. Should this proposal be folded into dap.ce (i.e. the existing basic constraint language) or kept as a separate constraint system, say dap.fcn?