DAP4: DDX Lexical Elements

From OPeNDAP Documentation
Revision as of 18:10, 27 March 2012 by Ndp (talk | contribs)

At the end of this page is the code for a flex program describing the lexical elements of the DDX. Specifically, it defines

  • Constants: string, float, integer, char
  • Identifiers: ID
  • Reference identifiers: IDREF
  • Whitespace-separated lists of IDREF: IDREFS


I don't understand this proposal. The following sentence and XML snippet imply to me that the flex grammar is specifically designed to parse the values of XML attributes found in the DDX. If that's not the intention then could someone please reword this to more clearly illustrate the intention of this page? Thanks ndp 11:10, 27 March 2012 (PDT)

Remember that in the DDX, these lexical items will be enclosed in double quotes, e.g.

<Value value="..."/>


-Dennis Heimbigner
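
If the lexical items are indeed meant to be the contents of XML attributes, as the snippet above suggests, then one possible reading is that an XML parser first extracts the attribute text and the lexer only ever sees what lies between the double quotes. The following is a minimal Python sketch of that step, not part of the proposal: the helper name attribute_lexeme and the sample values are hypothetical, and only the <Value value="..."/> shape comes from this page.

```python
import xml.etree.ElementTree as ET


def attribute_lexeme(fragment, name="value"):
    """Return the text of one attribute from a one-element XML fragment.

    Hypothetical helper: it assumes the lexer is applied to attribute
    values after an XML parser has already removed the double quotes.
    """
    element = ET.fromstring(fragment)
    return element.get(name)


# The enclosing double quotes are consumed by the XML parser.
print(attribute_lexeme('<Value value="0x1F"/>'))

# Standard XML entities are decoded by the parser as well.
print(attribute_lexeme('<Value value="a &amp; b"/>'))
```

Because a conforming XML parser decodes &amp; and &quot; on its own, whether the lexer sees raw or decoded text depends on where it is applied; this page does not say which is intended.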

/* lex specification for tokens for DAP4 DDX */

/* The most correct (validating) version of UTF8 character set
   (Taken from: http://www.w3.org/2005/03/23-lex-U)

Note that ASCII and control are not included.

The lines of the expression cover the UTF8 characters as follows:
1. non-overlong 2-byte
2. excluding overlongs
3. straight 3-byte
4. excluding surrogates
5. straight 3-byte
6. planes 1-3
7. planes 4-15
8. plane 16

UTF8   ([\xC2-\xDF][\x80-\xBF])                       \
     | (\xE0[\xA0-\xBF][\x80-\xBF])                   \
     | ([\xE1-\xEC][\x80-\xBF][\x80-\xBF])            \
     | (\xED[\x80-\x9F][\x80-\xBF])                   \
     | ([\xEE-\xEF][\x80-\xBF][\x80-\xBF])            \
     | (\xF0[\x90-\xBF][\x80-\xBF][\x80-\xBF])        \
     | ([\xF1-\xF3][\x80-\xBF][\x80-\xBF][\x80-\xBF]) \
     | (\xF4[\x80-\x8F][\x80-\xBF][\x80-\xBF])        \

*/


/* The most relaxed version of UTF8 (not used)
UTF8 ([\xC0-\xDF].)|([\xE0-\xEF]..)|([\xF0-\xF7]...)
*/

/* The partially relaxed version of UTF8, and the one used here.
   2-byte lead bytes run from \xC0 through \xDF. */
UTF8 ([\xC0-\xDF][\x80-\xBF])|([\xE0-\xEF][\x80-\xBF][\x80-\xBF])|([\xF0-\xF7][\x80-\xBF][\x80-\xBF][\x80-\xBF])

/* ASCII control characters */
CONTROLS  [\x00-\x1F]

WHITESPACE [ \r\t\f]+

HEXCHAR   [0-9a-fA-F]

/* Generic Escapes.
   {HEXCHAR} must sit outside the quotes: flex does not expand
   names inside a quoted string. */
XMLESCAPE  "&x"{HEXCHAR}{HEXCHAR}";"

/* ASCII printable characters */
ASCII     [0-9a-zA-Z !"#$%&'()*+,-./:;<=>?@[\\\]\\^_`|{}~]

/* ASCII Printable Characters minus
   ' ','.','/', '"', '&'
   The '-' is escaped so that ",-:" is not read as a range,
   which would readmit '.' and '/'.
*/
IDASCII   [0-9a-zA-Z!#$%'()*+,\-:;<=>?@[\\\]\\^_`|{}~]

/* Escapes for ' ','.','/', '&', and '"' */
IDESCAPES ("&amp;"|"&quot;"|"&x20;"|"&x2E;"|"&x2F;"|"&x26;"|"&x22;")

/* Escapes for '"', '&', and '\\' */
STRINGESCAPES ("&amp;"|"&quot;"|"&x26;"|"&x22;"|"&x5C;")

/* Escapes for '\\', '\'' */
CHARESCAPES ("&x27;"|"&x5C;")

HEXSTRING       (0[xX]{HEXCHAR}{HEXCHAR}*)

EXPONENT ([eE][+-]?[0-9]+)

MANTISSA [+-]?[0-9]*\.[0-9]*

NANINF   (-?inf|nan|NaN)

INTTYPE  ([BbSsLl]|"ll"|"LL")

INT      [+-][0-9][0-9]*{INTTYPE}?
UINT     [0-9][0-9]*{INTTYPE}?
HEXINT   {HEXSTRING}{INTTYPE}?

GROUPPATH  [/]?({ID}[/])*{ID}
STRUCTPATH ({ID}[.])*{ID}

string   ([^"\\&]|{XMLESCAPE})*

char     ([^'\\&]|{XMLESCAPE})

integer   {INT}|{UINT}|{HEXINT}

float    ({MANTISSA}{EXPONENT}?)|{NANINF}

IDCHAR   ({IDASCII}|{XMLESCAPE}|{UTF8})
ID       {IDCHAR}{IDCHAR}*

/* IDREF == path to an object; leads with group path
            separated by '/' and then struct path using '.'
*/
IDREF    {GROUPPATH}{STRUCTPATH}

/* IDREFS is a whitespace separated list of IDREF */
IDREFS   {WHITESPACE}?{IDREF}({WHITESPACE}{IDREF})*

%%  /* Order is important */
{integer} {}
{float}   {}
{IDREF}   {}
{IDREFS}  {}
{ID}      {}
{string}  {}
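
The "Order is important" remark reflects how flex resolves ambiguity: it prefers the longest match, and when several rules match the same text it picks the rule listed first. A value such as nan satisfies both the float and ID patterns, so the listing order decides its token type. The Python sketch below illustrates this under stated assumptions and is not a translation of the full specification: it matches a whole value at once instead of scanning a stream, it takes hex digits to be 0-9, a-f, A-F, it replaces ID with a simplified ASCII-only stand-in, and the names RULES and classify are hypothetical.

```python
import re

# Python transcriptions of several definitions from the flex specification.
INTTYPE = r"(?:[BbSsLl]|ll|LL)"
HEXSTRING = r"0[xX][0-9a-fA-F]+"  # assumption: hex digits are 0-9, a-f, A-F
INT = rf"[+-][0-9]+{INTTYPE}?"
UINT = rf"[0-9]+{INTTYPE}?"
HEXINT = rf"{HEXSTRING}{INTTYPE}?"
EXPONENT = r"(?:[eE][+-]?[0-9]+)"
MANTISSA = r"[+-]?[0-9]*\.[0-9]*"
NANINF = r"(?:-?inf|nan|NaN)"

# Assumption: a simplified stand-in for ID (ASCII letters, digits, underscore).
ID = r"[0-9A-Za-z_]+"

# Listed in the same relative order as the rules section: earlier rules win.
RULES = [
    ("integer", re.compile(rf"(?:{INT}|{UINT}|{HEXINT})")),
    ("float", re.compile(rf"(?:{MANTISSA}{EXPONENT}?|{NANINF})")),
    ("ID", re.compile(ID)),
]


def classify(text):
    """Return the name of the first rule that matches the whole text."""
    for name, pattern in RULES:
        if pattern.fullmatch(text):
            return name
    return "string"  # fallback, mirroring the last rule in the specification


for sample in ["123", "-5L", "0x1F", "1.5e3", "nan", "temp_1", "hello world"]:
    print(sample, "->", classify(sample))
```

With the rules in this order, 123 and 12b are reported as integers and nan as a float, even though each of them would also match the stand-in ID pattern; moving ID to the front of RULES would classify all three as identifiers.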