Organization of attributes

From OPeNDAP Documentation
  From: benno@iri.columbia.edu
  Subject: DAP 4.0 Schema
  Date: May 12, 2005 8:50:02 AM MDT
  To: dods-tech@unidata.ucar.edu

My apologies for making these comments so late: I was trying to monitor the DAP 4.0 progress, but the document I was looking at did not mention the schema, and somehow I missed it when the document got improved.

Essentially, DAP 4.0 as it currently stands uses XML to carry an improved dataset-with-attributes structure not unlike DAP 2.0. In particular, it defines an attribute cell which has a name, a type, and contents. This is awfully close to defining a structure within XML to carry the exact same information that XML is designed to carry (shades of GMT and netCDF ...).

I was thinking that it would be awfully nice if a set of attributes that belong to a convention (say CF) got translated to a set of XML tags with a CF schema, i.e. the tags would be defined in a CF namespace. So we would see:

<verbatim>
<dataset name="test" xmlns:cf="http://unidata.ucar.edu/2005/CF">
<cf:title>A wonderful test dataset</cf:title>
<cf:institution>the great beyond</cf:institution>
<cf:references>dods-tech@unidata.ucar.edu</cf:references>
....
</dataset>
</verbatim>

This way we would truly be using XML to transmit the information. The implication of this is that we would split the DAP schema into two parts: an organizational part (datasets and variables), and a concrete data-typing part (most of the rest of the current DAP 4.0 schema). The DAP core would only use the concrete data-typing part (except maybe for the nesting of the organizational part to create dot-separated names in the constraints, though in this example that is explicit in the Blob URL, so perhaps it is not even an issue).

<verbatim>
<variable name="sst">
<cf:long_name>sea surface temperature</cf:long_name>
<cf:units>Celsius_scale</cf:units>
<dap:Array>
<dap:Float32/>
<dap:dimension size="16" name="latitude"/>
<dap:dimension size="17" name="longitude"/>
<dap:dimension size="21" name="time"/>
</dap:Array>
<dap:Blob URL="http://dcz.opendap.org/dap/data/nc/fnoc1.nc?u"/>
</variable>
</verbatim>

This is only a rough idea; others out there could make better syntactic choices (like a single dap container that resets the default namespace for its contents). I am just trying to make a general point. The reason for doing this is two-fold: 1) the core could be simpler and more stable, because almost everything outside the hopefully small core namespace would not affect the OpenDAP core, and 2) OpenDAP XML could simply be inserted in an XML document following whatever specific metadata conventions and data structures to create a document with accessible data (GML, perhaps?). It seems pretty clear that the future is metadata transmission in XML (with many different standards for metadata), and OpenDAP has always been a transmission mechanism that avoided constraining the metadata, so it would be really nice to turn over the metadata responsibility to XML.
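
For illustration only, the single-container alternative mentioned above might look something like the following. This is a sketch under assumptions, not part of any DAP 4.0 schema: the element name "container" and the namespace URI "http://example.org/dap-placeholder" are hypothetical, and only the CF namespace URI is taken from the earlier example.

<verbatim>
<!-- Sketch only: "container" and its namespace URI are hypothetical placeholders -->
<variable name="sst" xmlns:cf="http://unidata.ucar.edu/2005/CF">
<cf:long_name>sea surface temperature</cf:long_name>
<cf:units>Celsius_scale</cf:units>
<!-- xmlns here sets the default namespace for this element and its descendants -->
<container xmlns="http://example.org/dap-placeholder">
<Array>
<Float32/>
<dimension size="16" name="latitude"/>
<dimension size="17" name="longitude"/>
<dimension size="21" name="time"/>
</Array>
<Blob URL="http://dcz.opendap.org/dap/data/nc/fnoc1.nc?u"/>
</container>
</variable>
</verbatim>

With a default namespace declared on the container, the data-typing elements need no dap: prefix, while the metadata elements outside it stay in their own convention's namespace.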

Benno


Added to this wiki: jimg 14:06, 24 June 2008 (PDT)