DAP4: DDX Lexical Elements

From OPeNDAP Documentation

Revision as of 22:22, 23 February 2012

The flex specification at the end of this page describes the lexical elements of the DDX. Specifically, it defines:

  • Constants: string, float, integer, char
  • Identifiers: ID
  • Reference identifiers: IDREF
  • Whitespace-separated lists of IDREF: IDREFS

Remember that in the DDX, these lexical items will be enclosed in double quotes, e.g.

<Value value="..."/>

-Dennis Heimbigner

/* lex specification for tokens for DAP4 DDX */

/* The most correct (validating) version of UTF8 character set
   (Taken from: http://www.w3.org/2005/03/23-lex-U)

Note that ASCII and control characters are not included.

The lines of the expression cover the UTF8 characters as follows:
1. non-overlong 2-byte
2. excluding overlongs
3. straight 3-byte
4. excluding surrogates
5. straight 3-byte
6. planes 1-3
7. planes 4-15
8. plane 16

UTF8   ([\xC2-\xDF][\x80-\xBF])                       \
     | (\xE0[\xA0-\xBF][\x80-\xBF])                   \
     | ([\xE1-\xEC][\x80-\xBF][\x80-\xBF])            \
     | (\xED[\x80-\x9F][\x80-\xBF])                   \
     | ([\xEE-\xEF][\x80-\xBF][\x80-\xBF])            \
     | (\xF0[\x90-\xBF][\x80-\xBF][\x80-\xBF])        \
     | ([\xF1-\xF3][\x80-\xBF][\x80-\xBF][\x80-\xBF]) \
     | (\xF4[\x80-\x8F][\x80-\xBF][\x80-\xBF])
*/

/* The most relaxed version of UTF8 (not used)
UTF8 ([\xC0-\xD6].)|([\xE0-\xEF]..)|([\xF0-\xF7]...)
*/

/*The partially relaxed version of UTF8, and the one used here */
UTF8 ([\xC0-\xD6][\x80-\xBF])|([\xE0-\xEF][\x80-\xBF][\x80-\xBF])|([\xF0-\xF7][\x80-\xBF][\x80-\xBF][\x80-\xBF])

/* ASCII control characters */
CONTROLS  [\x00-\x1F]

WHITESPACE [ \r\t\f]+

HEXCHAR   [0-9a-fA-F]

/* Generic Escapes */

/* ASCII printable characters */
ASCII     [0-9a-zA-Z !"#$%&'()*+,-./:;<=>?@[\\\]\\^_`|{}~]

/* ASCII printable characters minus
   ' ', '.', '/', '"', '&' */
IDASCII   [0-9a-zA-Z!#$%'()*+,\-:;<=>?@[\\\]\\^_`|{}~]

/* Escapes for ' ','.','/', '&', and '"' */
IDESCAPES ("&amp;" | "&quot;" | "&x20;" | "&x2E;" | "&x2F;" | "&x26;" | "&x22;")

/* Escapes for '"', '&', and '\\' */
STRINGESCAPES ("&amp;" | "&quot;" | "&x26;" | "&x22;" | "&x5C;")

/* Escapes for '\\', '\'' */
CHARESCAPES ("&x27;" | "&x5C;")


EXPONENT ([eE][+-]?[0-9]+)

MANTISSA [+-]?[0-9]*\.[0-9]*

NANINF   (-?inf|nan|NaN)

INTTYPE  ([BbSsLl]|"ll"|"LL")

INT      [+-][0-9][0-9]*{INTTYPE}?
UINT     [0-9][0-9]*{INTTYPE}?

GROUPPATH  [/]?({ID}[/])*{ID}

string   ([^"\\&]|{XMLESCAPE})*

char     ([^'\\&]|{XMLESCAPE})

integer   {INT}|{UINT}|{HEXINT}



/* IDREF == path to an object: a leading group path
            separated by '/', then a struct path separated by '.' */

/* IDREFS is a whitespace separated list of IDREF */

%%  /* Order is important */
{integer} {}
{float}   {}
{IDREF}   {}
{IDREFS}  {}
{ID}      {}
{string}  {}
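
As a rough illustration of how the validating and the partially relaxed UTF8 definitions differ, the following Python sketch transcribes both patterns into byte-oriented regular expressions. The names STRICT_UTF8, RELAXED_UTF8, and accepts are hypothetical and exist only for this example. Python's re module stands in for flex, so this approximates the behavior of the patterns and is not the DAP4 implementation.

```python
import re

# Validating pattern: the eight alternatives of the commented-out
# definition, in the same order as the numbered list in the comment.
STRICT_UTF8 = re.compile(
    rb"[\xC2-\xDF][\x80-\xBF]"                         # non-overlong 2-byte
    rb"|\xE0[\xA0-\xBF][\x80-\xBF]"                    # excluding overlongs
    rb"|[\xE1-\xEC][\x80-\xBF][\x80-\xBF]"             # straight 3-byte
    rb"|\xED[\x80-\x9F][\x80-\xBF]"                    # excluding surrogates
    rb"|[\xEE-\xEF][\x80-\xBF][\x80-\xBF]"             # straight 3-byte
    rb"|\xF0[\x90-\xBF][\x80-\xBF][\x80-\xBF]"         # planes 1-3
    rb"|[\xF1-\xF3][\x80-\xBF][\x80-\xBF][\x80-\xBF]"  # planes 4-15
    rb"|\xF4[\x80-\x8F][\x80-\xBF][\x80-\xBF]"         # plane 16
)

# Partially relaxed pattern: a lead byte followed by the right number
# of continuation bytes, with no overlong or surrogate checks.
RELAXED_UTF8 = re.compile(
    rb"[\xC0-\xD6][\x80-\xBF]"
    rb"|[\xE0-\xEF][\x80-\xBF][\x80-\xBF]"
    rb"|[\xF0-\xF7][\x80-\xBF][\x80-\xBF][\x80-\xBF]"
)


def accepts(pattern, data):
    """Return True if data is exactly one character under pattern."""
    return pattern.fullmatch(data) is not None


samples = {
    "e-acute (U+00E9)": "\u00e9".encode("utf-8"),
    "overlong '/'": b"\xc0\xaf",
    "surrogate (U+D800)": b"\xed\xa0\x80",
    "Hebrew alef (U+05D0)": "\u05d0".encode("utf-8"),
    "emoji (U+1F600)": "\U0001F600".encode("utf-8"),
}

for label, data in samples.items():
    print(label, accepts(STRICT_UTF8, data), accepts(RELAXED_UTF8, data))
```

Under these assumptions, the relaxed pattern accepts the overlong and surrogate sequences that the validating pattern rejects. Because its two-byte lead range stops at \xD6, it also rejects some valid two-byte characters, such as U+05D0 (bytes D7 90), that the validating pattern accepts.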