DAP4: DDX Grammar

From OPeNDAP Documentation

Latest revision as of 16:40, 4 September 2012


Version: 1.0

At the end of this document are instructions for accessing and testing a formal grammar for the DAP4 DDX using the Relax-NG schema language. I constructed it without reference to any other explicit or implicit grammars so that I could record my own ideas. I have since modified it based on the implied grammar in the page DAP4: Data Model, on comments from others, and on a comparison with the xsd grammar.

NOTE: Jimg 14:19, 28 August 2012 (PDT) There is a copy of the dap4.rng file in subversion at https://scm.opendap.org/svn/trunk/xml/dap/. I think that version is more recent than the dropbox files referenced by this document.

Differences with DAP4 xsd Grammar

I converted the xsd-based grammar (https://scm.opendap.org/trac/browser/trunk/xml/dap/dap4.xsd) to an equivalent Relax-NG grammar: http://dl.dropbox.com/u/53929684/xsd.rng (now also at https://scm.opendap.org/svn/trunk/xml/dap/dap4.rng)


One major difference I see is in dimension handling.

  1. I just used the name "dimension" rather than "shareddimension"; for me, all dimensions (except anonymous and variable-length ones) are shared.
  2. The xsd separates out scalars from arrays. I always allowed the dimensions for a variable to be optional to handle the scalar case.
  3. I attempted to be as consistent as possible, so I allowed any type including sequences and structures to be dimensioned.
  4. The dimensions of a variable are currently specified in the rng grammar as an attribute named "dimensions" associated with "variables": e.g. dimensions="dr d1".
    Previously I used this:
<dimensions>
<dimension name="dr"/>
<dimension name="d1"/>
</dimensions>

But this seemed kind of verbose.
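
As an illustration only, the attribute-based form described in item 4 could be written in Relax-NG roughly as follows. This is a minimal sketch, not an excerpt from dap4.rng: the element name "Variable" and the pattern name "variabledef" are assumptions, and the dimension names are treated as plain whitespace-separated tokens. Making the attribute optional also covers the scalar case from item 2.

```xml
<!-- Sketch only: element and pattern names are assumed, not taken from dap4.rng -->
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <element name="Variable"><ref name="variabledef"/></element>
  </start>

  <define name="variabledef">
    <attribute name="name"/>
    <attribute name="type"/>
    <!-- Optional: a variable with no "dimensions" attribute is a scalar -->
    <optional>
      <attribute name="dimensions">
        <!-- Whitespace-separated list of dimension names, e.g. "dr d1" -->
        <list>
          <oneOrMore><data type="token"/></oneOrMore>
        </list>
      </attribute>
    </optional>
  </define>
</grammar>
```

Under this sketch, both <Variable name="v1" type="int32"/> and <Variable name="v2" type="int32" dimensions="dr d1"/> would be accepted.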

Other differences:

  1. The Dataset element in the xsd has a couple of extra attributes. I added these.
  2. The xsd appears to allow attributes to themselves have attributes. This needs discussion.
  3. I forgot enumerations and opaque. I added them.
  4. The URL basetype is in the xsd. What is the justification for keeping it?
  5. It appears that the Dataset contains a top level <group> declaration; I chose to treat the Dataset itself as the top-level group.
  6. Attribute declarations appear to have their own "namespace" attribute. Not sure why this is needed.
  7. I do not understand the purpose of the "NewAttribute" attribute.
  8. The Grid issue, of course. There are still some minor differences in representing coordinate variables.
  9. The xsd represents attribute values thus:
<Attribute name="a">
   <value>...</value>
   <value>...</value>
</Attribute>
I provided an alternate form when there is only one value:
<Attribute name="a" value="..."/>
and I chose to use attributes in the multi-valued case because I prefer not to use elements with content unless really necessary. So I represented the above as follows:
<Attribute name="a">
  <Value value="..."/>
  <Value value="..."/>
</Attribute>
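
The two attribute-value forms in item 9 might be expressed in Relax-NG along the following lines. This is a hedged sketch rather than the actual content of dap4.rng; the pattern name "attributedef" is illustrative, and other attributes (such as a type) are left out for brevity.

```xml
<!-- Sketch only: the pattern name "attributedef" is illustrative -->
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <element name="Attribute"><ref name="attributedef"/></element>
  </start>

  <define name="attributedef">
    <attribute name="name"/>
    <choice>
      <!-- Single-valued form: <Attribute name="a" value="..."/> -->
      <attribute name="value"/>
      <!-- Multi-valued form: one or more <Value value="..."/> children -->
      <oneOrMore>
        <element name="Value">
          <attribute name="value"/>
        </element>
      </oneOrMore>
    </choice>
  </define>
</grammar>
```

The choice pattern keeps the two forms mutually exclusive, so an Attribute carries either a value attribute or Value children, but not both.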

There are also some minor differences.

  1. Element names (e.g. <structure>) are capitalized in the xsd grammar; I modified the rng grammar to capitalize them as well.
  2. There is an issue of interleaving of definitions, or equivalently, what elements must occur in a fixed order.
  3. Where should attributes be legal? I think the rng grammar and the xsd grammar agree on this: putting them almost everywhere, but it needs discussion.
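
On the ordering question in item 2, Relax-NG provides an interleave pattern that lets child elements appear in any order. The fragment below is only a sketch of that option, assuming hypothetical "Group", "Dimension", and "Variable" elements with simplified attributes; it does not claim to reflect what dap4.rng or the xsd grammar actually does.

```xml
<!-- Sketch only: element names and attributes are simplified assumptions -->
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <element name="Group">
      <attribute name="name"/>
      <!-- interleave: Dimension and Variable children may be mixed in any order -->
      <interleave>
        <zeroOrMore>
          <element name="Dimension">
            <attribute name="name"/>
            <attribute name="size"/>
          </element>
        </zeroOrMore>
        <zeroOrMore>
          <element name="Variable">
            <attribute name="name"/>
            <attribute name="type"/>
          </element>
        </zeroOrMore>
      </interleave>
    </element>
  </start>
</grammar>
```

Removing the interleave wrapper, so that the two zeroOrMore patterns are simply listed in sequence, would instead require all Dimension elements to precede all Variable elements.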

Other differences:

  1. I temporarily suppressed OtherXML because it did not translate correctly; this is now fixed.
  2. I dropped Blobtype; I fail to see the need for this.

Testing the Relax-NG Grammar

NOTE: See subversion as described at the top of this page for more recent versions of the grammar.

You will need to copy three files:

  1. dap4.rng - this is the grammar file; it uses the Relax-NG schema language (http://relaxng.org/).
    This can be obtained from http://dl.dropbox.com/u/53929684/dap4.rng
  2. test.xml - this is a test file that I am extending to cover the whole grammar.
    This can be obtained from http://dl.dropbox.com/u/53929684/test.xml
  3. jing.jar - Jing is a validator that takes the grammar and a test file and checks that the test file conforms to the grammar.
    This can be obtained from http://dl.dropbox.com/u/53929684/jing.jar.

To run the validation, use the command:

java -jar jing.jar dap4.rng test.xml

No output is produced if the validation succeeds; otherwise, error messages are produced.

-Dennis Heimbigner