Organization of attributes

From OPeNDAP Documentation

Revision as of 23:19, 13 January 2009

From: benno@iri.columbia.edu

  Subject: DAP 4.0 Schema
  Date: May 12, 2005 8:50:02 AM MDT
  To: dods-tech@unidata.ucar.edu

My apologies for making these comments so late: I was trying to monitor the DAP 4.0 progress, but the document I was looking at did not mention the schema, and somehow I missed it when the document got improved.

Essentially, DAP 4.0 as it currently stands uses XML to carry an improved dataset-with-attributes structure not unlike that of DAP 2.0. In particular, it defines an attribute cell which has a name, a type, and contents. This is awfully close to defining a structure within XML to carry the exact same information that XML is designed to carry (shades of GMT and netcdf ...).

I was thinking that it would be awfully nice if a set of attributes that belong to a convention (say CF) got translated to a set of XML tags with a CF schema, i.e. the tags would be defined in a namespace CF. So we would see

 <dataset name="test" xmlns:cf="http://unidata.ucar.edu/2005/CF">
     <cf:title>A wonderful test dataset</cf:title>
     <cf:institution>the great beyond</cf:institution>
     <cf:references>dods-tech@unidata.ucar.edu</cf:references>
      ...
 </dataset>

This way we would truly be using XML to transmit the information. The implication of this is that we would split the DAP schema into two parts: an organizational part (datasets and variables), and a concrete data-typing part (most of the rest of the current DAP 4.0 schema). The DAP core would only use the concrete data-typing part (except maybe for the nesting of the organizational part to create dot-separated names in the constraints, though in this example that is explicit in the Blob URL, so perhaps it is not even an issue).

 <variable name="sst">
     <cf:long_name>sea surface temperature</cf:long_name>
     <cf:units>Celsius_scale</cf:units>
     <dap:Array>
         <dap:Float32/>
         <dap:dimension size="16" name="latitude"/>
         <dap:dimension size="17" name="longitude"/>
         <dap:dimension size="21" name="time"/>
     </dap:Array>
     <dap:Blob URL="http://dcz.opendap.org/dap/data/nc/fnoc1.nc?u"/>
 </variable>

This is only a rough idea -- there are others out there who could make better syntactic choices (like a single dap container that resets the default namespace for its contents); I am just trying to make a general point. The reason for doing this is two-fold: 1) the core could be simpler and more stable, because almost everything outside the hopefully small core namespace would not affect the OpenDAP core, and 2) OpenDAP XML could simply be inserted in an XML document following whatever specific metadata conventions and data structures to create a document with accessible data. (GML perhaps?) It seems pretty clear that the future is metadata transmission in XML (with many different standards for metadata), and OpenDAP has always been a transmission mechanism that avoided constraining the metadata; it would be really nice to turn over the metadata responsibility to XML.

Benno


Added to this wiki: jimg 14:06, 24 June 2008 (PDT)

More ideas about this syntax

Some other ideas; a mixture of syntaxes:

 <variable name="sst">
     <cf:long_name>sea surface temperature</cf:long_name>
     <cf:units>Celsius_scale</cf:units>

     <dap:attribute name="long_name" type="string">sea surface temperature</dap:attribute>

     <dap:attribute name="coefs" type="float64"><value>0.01</value> <value>-1.5</value></dap:attribute>

     <att:units type="string">Celsius_scale</att:units>
     <att:foo type="string">bar</att:foo>

     <dap:Array>
         <dap:Float32/>
         <dap:dimension size="16" name="latitude"/>
         <dap:dimension size="17" name="longitude"/>
         <dap:dimension size="21" name="time"/>
     </dap:Array>
 </variable>

But the question I would like to understand better is: how does having an XML document with a potentially infinite number of element names affect systems? That is, when we take an attribute from a netcdf file (or any other file type...) and make an element name from the attribute's name, since there's no limit to the names of attributes, there's a potentially infinite number of element names in the DDX.

Other issues:

  1. It's possible to have attributes in netcdf (and other?) formats that have colons in their names. That means that when a handler sees these names, it must remove the colon in order to make that attribute name an element name that's legal XML.
  2. There are other characters that are not allowed in QNames, and these would all have to be removed for this attribute-name --> element-name scheme to work in general.
  3. Is it realistic to think that a handler (netcdf or otherwise) is going to be rewritten in such a way that it can distinguish which dap:Attribute@name values that contain colons need to be interpreted as namespace-prefixed names versus which ones must have their colons removed/changed so that they become a valid QName? (If you do have to modify the name, how do you accurately store the real name of the element? Add an attribute called "name" to store the name? At that point we're back to where we were with dap:Attribute.)
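
To make issue 3 concrete, here is a hypothetical source attribute named "valid range:min" written both ways. The attribute name, its value, and the rewritten element name valid_range_min are all invented for illustration; the only point is that the element form needs a name attribute to preserve the original name, which is the information dap:Attribute@name already carries.

```xml
<Grid name="temp">
    <!-- Existing DDX form: the original name is kept verbatim in @name. -->
    <Attribute name="valid range:min" type="Float64">
        <value>-2.0</value>
    </Attribute>

    <!-- Element-name form: a space is not allowed in an element name and a
         colon would be read as a namespace prefix separator, so the name is
         rewritten (hypothetically) and the original is stored in @name. -->
    <valid_range_min name="valid range:min" type="Float64">-2.0</valid_range_min>
</Grid>
```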

An alternative is to use the existing DDX syntax but add an optional namespace attribute which provides the handler with an opportunity to say it knows this attribute belongs to a particular namespace. The handler would have to take care of assigning a prefix for the namespace in many situations - work out the details...

Proposed alternative syntax

The idea behind all of this is to have the handlers build responses that can include information about the conventions/standards followed by the dataset.

The current (DAP 3.2) DDX syntax for attributes looks like:

    <Grid name="temp">
        <Attribute name="long_name" type="String">
            <value>temperature</value>
        </Attribute>
        <Attribute name="units" type="String">
            <value>K</value>
        </Attribute>
        <Attribute name="grid_mapping" type="String">
            <value>crs</value>
        </Attribute>
        <Attribute name="wcs:gridCRS" type="String">
            <value>crs</value>
        </Attribute>
        <Array name="temp">
        ...

And I'd suggest that we adopt the following addition to that syntax: that the Attribute element support a namespace XML attribute (dap:attribute@namespace), and that this XML attribute be used to indicate that a particular dap:attribute belongs to a given convention/standard. Like this:

    <Attribute name="long_name" type="string" namespace="cf">

where we would declare the string "cf" to be a namespace prefix in the usual way.
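
A fuller sketch of the proposed syntax, with the prefix declared on an enclosing element. The namespace URI is the placeholder from the 2005 email above, not a registered CF namespace, and the history attribute is a made-up example of an attribute the handler does not assign to any convention.

```xml
<!-- Sketch only: the xmlns:cf URI is a placeholder, not an official CF namespace. -->
<Grid name="temp" xmlns:cf="http://unidata.ucar.edu/2005/CF">
    <!-- The handler asserts that these two attributes follow the CF convention. -->
    <Attribute name="long_name" type="String" namespace="cf">
        <value>temperature</value>
    </Attribute>
    <Attribute name="units" type="String" namespace="cf">
        <value>K</value>
    </Attribute>
    <!-- No namespace attribute: the handler makes no claim about this one. -->
    <Attribute name="history" type="String">
        <value>created by a hypothetical tool</value>
    </Attribute>
</Grid>
```

Note that an XML parser does not resolve prefixes that appear inside attribute values, so code reading this DDX would have to look up "cf" among the in-scope namespace declarations itself.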

Pro:

  1. This will be possible to implement in our handlers. We will not have to divine new names for data set attributes that contain characters that are not allowed in QNames.
  2. The dap:attribute@namespace is a bit of a hack, but it's a recognizable one that should provide the needed information to XSLT and other code.
  3. Encoding the namespace and its prefix (i.e., "cf" in this example) using the XML notation for such a beast is something that the handler can do and that XSLT and other code should also be able to access.

Con:

  1. In order to get the attributes represented in the form they take on in their namespaces (when attributes are in a namespace; e.g. CF-1.0), we will have to transform the DDX using XSLT.
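
The transform mentioned in the Con item could look roughly like the XSLT 1.0 sketch below. It is not an existing OPeNDAP stylesheet. It assumes the DDX elements are in no namespace (as in the fragments on this page), that "cf" is the only prefix in use, and that the cf URI is the placeholder from the 2005 email.

```xml
<!-- Sketch only: handles a single known prefix, "cf", bound to a placeholder URI. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:cf="http://unidata.ucar.edu/2005/CF">

    <!-- Identity template: copy everything no other template matches. -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Rewrite <Attribute name="X" namespace="cf"> as <cf:X>.
         Only the first <value> child is used, so multi-valued
         attributes would need additional handling. -->
    <xsl:template match="Attribute[@namespace='cf']">
        <xsl:element name="cf:{@name}">
            <xsl:value-of select="value"/>
        </xsl:element>
    </xsl:template>
</xsl:stylesheet>
```

Applied to an Attribute named long_name with namespace="cf", this would produce a cf:long_name element holding the value. It fails for names that are not legal element names, which is the same QName problem discussed under "Other issues". If the DDX declares a default namespace, the match patterns would need a prefix bound to that namespace.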

Development Plan

How to implement this change:

  1. Adopt this syntax for DAP 3.3
  2. A local attribute namespace must be created
    1. Cast all attributes into this namespace
    2. Then figure out how to allow other namespaces
    3. Must be clear about how to organize the stuff in the DDX - make a schema or schema snippet
    4. Also adjust schema to allow foreign content
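
One possible shape for the schema snippet and the "allow foreign content" adjustment is sketched below in W3C XML Schema. The type name AttributeType and the content model are assumptions made for illustration, not taken from a published DAP schema. The parts that matter are the optional namespace attribute and the xs:any wildcard.

```xml
<!-- Sketch only: type and element declarations are hypothetical. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:complexType name="AttributeType">
        <xs:sequence>
            <xs:element name="value" type="xs:string"
                        minOccurs="0" maxOccurs="unbounded"/>
            <!-- Foreign content: elements from any other namespace are
                 admitted and validated only if a schema for them is known. -->
            <xs:any namespace="##other" processContents="lax"
                    minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
        <xs:attribute name="name" type="xs:string" use="required"/>
        <xs:attribute name="type" type="xs:string" use="required"/>
        <!-- The proposed addition: a namespace prefix such as "cf". -->
        <xs:attribute name="namespace" type="xs:NCName" use="optional"/>
    </xs:complexType>
    <xs:element name="Attribute" type="AttributeType"/>
</xs:schema>
```

Typing the namespace attribute as xs:NCName restricts it to strings that are legal prefixes, but it does not check that the prefix is actually declared in the document.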

How this will be used by the WCS interface:

  1. Use an AIS to inject new attributes into the DDX
  2. These attributes will not be in the DAP attribute namespace

jimg 13:41, 13 January 2009 (PST)