DAP4: URL Annotations
Characterization of URL Annotations
Requests for data using the DAP4 protocol will require a significant number of annotations specifying what is to be retrieved, commands to the server, and commands to the client.
This document describes, based on past experience, the information with which URLs need to be annotated. It also enumerates the possible URL components that can be used to encode the annotations. I will consider a specific encoding in a separate document.
Looking at the DAP2 URLs, we see an indication of the protocol plus three classes of annotations: server commands, client commands, and queries (aka constraints).
For DAP2, the protocol being used is inferred from context; in netcdf-C, for example, the fact that the dataset name is a URL is sufficient to indicate the use of the DAP2 protocol. For some servers, such as TDS, the protocol can also be inferred from elements of the URL path. For example, in http://.../thredds/dodsC/... the "dodsC" indicates the use of the DAP2 protocol. TDS also supports a URL scheme called "dods:" that likewise indicates the use of DAP2.
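As a rough illustration of the two TDS cues just described, the following sketch checks a URL for either the "dods:" scheme or a "dodsC" path element. The function name and the host names in the usage are hypothetical, and the check is only a heuristic; it does not cover the netcdf-C rule that any URL-valued dataset name implies DAP2.

```python
from urllib.parse import urlsplit

def looks_like_tds_dap2(url):
    """Heuristic sketch: does this URL carry one of the TDS cues for DAP2?"""
    parts = urlsplit(url)
    # Cue 1: the TDS-specific "dods:" scheme.
    if parts.scheme == "dods":
        return True
    # Cue 2: a "dodsC" element somewhere in the URL path.
    return "dodsC" in parts.path.split("/")
```

For instance, looks_like_tds_dap2("http://host.example/thredds/dodsC/data/file.nc") would be true under these assumptions, while a URL with neither cue would not.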
Server commands in DAP2 are appended to the dataset URL to indicate attributes of the request. For example, appending ".dds" to the dataset URL requests the dataset's DDS.
The defined kinds of server commands for DAP2 are as follows:
- Component requests: ".dds", ".das"
- Data requests with format: ".dods", ".asc"
- Miscellaneous: ".html", ".ver"
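The extension list above can be turned into a small lookup. The sketch below separates a trailing server command from the dataset URL; the function name, the kind labels, and the example host are assumptions made for illustration, not part of any DAP2 library.

```python
import posixpath
from urllib.parse import urlsplit

# The DAP2 extensions listed above, grouped by kind (labels are illustrative).
DAP2_SERVER_COMMANDS = {
    ".dds": "component", ".das": "component",
    ".dods": "data", ".asc": "data",
    ".html": "misc", ".ver": "misc",
}

def split_server_command(url):
    """Return (dataset_url, command, kind); command and kind are None if absent."""
    parts = urlsplit(url)
    base, ext = posixpath.splitext(parts.path)
    if ext not in DAP2_SERVER_COMMANDS:
        return url, None, None
    # Rebuild the URL without the extension; query and fragment are dropped
    # so that only the bare dataset URL remains.
    dataset_url = parts._replace(path=base, query="", fragment="").geturl()
    return dataset_url, ext, DAP2_SERVER_COMMANDS[ext]
```

Note that an ordinary filename extension such as ".nc" is left alone, which is exactly the ambiguity between server commands and filename extensions that any extension-based encoding has to live with.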
Client commands are interpreted by the client-side library to specify actions to be performed by the library. The existence of client commands is important because we want to communicate from the user to the library without requiring any knowledge by intermediate code layers. For example, netcdf-C tools such as ncdump send URLs to the underlying netcdf DAP2 library without having to be cognizant of their structure.
Currently, there are two primary uses for client commands.
- Debugging - cause the DAP library to output information about, for example, the specific requests it sends to the server.
- Caching - control the degree of caching and prefetch to be used with a given request to the server.
Currently, client commands are represented as "name=value" pairs or just "name", enclosed in square brackets: "[nocache]", for example. These commands are prefixed to the URL, as in "[nocache]http://...".
The legal set of commands is library specific.
One notable problem with this form of client command is that it prevents generic URL parsers from parsing the URL because, of course, the square bracket notation is non-standard.
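To make the parsing problem concrete, here is a minimal sketch of what a client library has to do today: strip the bracketed prefix by hand before a generic parser can be used. The function name is hypothetical, "[debug=1]" in the usage is an invented command (only "[nocache]" comes from the text), and real libraries may accept a different syntax.

```python
import re
from urllib.parse import urlsplit

# One bracketed command: "[name]" or "[name=value]".
_COMMAND = re.compile(r"\[([^\[\]]*)\]")

def split_client_commands(url):
    """Strip leading [..] commands; return (commands, parsed remainder)."""
    commands = {}
    pos = 0
    while True:
        match = _COMMAND.match(url, pos)
        if match is None:
            break
        name, _, value = match.group(1).partition("=")
        commands[name] = value or None   # bare "[name]" maps to None
        pos = match.end()
    # Only the remainder is handed to the generic parser.
    return commands, urlsplit(url[pos:])

commands, parts = split_client_commands(
    "[nocache][debug=1]http://host.example/data/file.nc")
```

Handing the original prefixed string directly to the generic parser does not recover the "http" scheme, which is the motivation for moving client commands to a standard URL component.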
An alternative to using client commands in the URL is to use a configuration file (often referred to as the .rc file such as .dodsrc). This configuration file is assumed to be either in the caller's home directory or in the current working directory. It contains the necessary client commands to be applied. It is mildly less convenient for the user to use a .rc file than to embed a client command in the URL.
The third class of URL annotations specifies some form of query to control the information to be extracted from a dataset on the server. This information is then passed back to the client.
In DAP2, queries consisted of projections and selections specifying a subset of the data in a dataset.
A projection represents a path through the DDS parse tree annotated with constraints on the dimensions. This, for example.
A selection represented a boolean expression to be applied to the records of a sequence. Syntactically, a selection could cross sequences, thus implying a join of the sequences, but in practice this was not allowed.
DAP2 queries also allowed the use of functions in the projections and selections to compute, for example, sums or averages, but the semantics were never very well defined, and the set of allowable functions is server dependent.
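The projection/selection split can be sketched as follows, assuming the usual DAP2 convention that projections are comma-separated and each selection is introduced by "&". The function name and the variable names in the usage are hypothetical, and the sketch deliberately ignores function calls, whose comma-separated arguments would need a real parser.

```python
def split_constraint(query):
    """Split a DAP2-style constraint into (projections, selections).

    Simplified: assumes no function calls, so commas and ampersands
    only ever act as separators.
    """
    projection_part, *selections = query.split("&")
    projections = [p for p in projection_part.split(",") if p]
    return projections, selections

projections, selections = split_constraint(
    "station.temp[0:2:10],station.time&station.temp>20")
```

Here the first projection is a path ("station.temp") annotated with a dimension constraint, and the single selection is a boolean expression over the records of the hypothetical "station" sequence.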
DAP4 will need to support at least the three classes of annotations described above. Whatever annotation mechanisms are chosen, the following properties seem desirable.
- The resulting URL should be parseable by generic URL parsers
- It follows that client commands should be embedded at the end of URLs, not the beginning.
- Whatever annotation encoding is used should be as uniform as possible.
As mechanisms, we have the following available to us:
- The URL scheme -- "http:" for example, or the TDS "dods:" scheme. Using this is somewhat undesirable because the scheme would also need to encode the underlying transport protocol, such as the encrypted https: (versus http:).
- URL path elements such as the current use of e.g. http://host/../dodsC/... by TDS.
- URL query -- everything after the first '?' in the URL, up to the start of any fragment. URL queries conventionally take the form of name=value pairs, but in practice are pretty much free form.
- URL fragment -- everything after the first '#'. Again these are pretty much free form.
- Filename extensions -- everything between the data set name in the path and the start of the query. The DAP2 ".dds" and ".dods" are examples of this.
- Alternate extension formats. Ethan Davis has proposed the use of a "+" notation instead of filename extensions: +ddx+ascii, for example. This has the advantage of clearly not being confused with filename extensions while also making clear the additive nature of such annotation.
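To see how a generic parser exposes most of the mechanisms listed above, the following sketch decomposes a URL into the slots where annotations could live. The function name, the host, and in particular the placement of "nocache" in the fragment are purely illustrative assumptions; no specific encoding is being proposed here, and the "+" notation is not handled.

```python
import posixpath
from urllib.parse import urlsplit

def annotation_slots(url):
    """Break a URL into the components that could carry annotations."""
    parts = urlsplit(url)
    stem, extension = posixpath.splitext(parts.path)
    return {
        "scheme": parts.scheme,
        "path_elements": [p for p in stem.split("/") if p],
        "extension": extension,
        "query": parts.query,
        "fragment": parts.fragment,
    }

slots = annotation_slots(
    "http://host.example/thredds/dodsC/data/file.dds?temp#nocache")
```

Because every piece sits in a standard URL component, no custom pre-processing is needed before parsing, which is the first of the desirable properties stated earlier.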
I should note that the Ferret server has taken to seriously abusing the URL format with URLs like this.
so we have much to aspire to :-)