DAP4: DDX Lexical Elements

From OPeNDAP Documentation
Revision as of 22:22, 23 February 2012 by DennisHeimbigner

The flex specification at the end of this page describes the lexical elements of the DDX. Specifically, it defines:

  • Constants: string, float, integer, char
  • Identifiers: ID
  • Reference identifiers: IDREF
  • Whitespace-separated lists of IDREF: IDREFS
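
As a rough illustration of how IDREFS and IDREF decompose, the following Python sketch splits a list on the characters admitted by the WHITESPACE definition in the specification, then splits a single reference on '/' and '.'. The helper names split_idrefs and split_idref are hypothetical. The sketch assumes that the last '/'-separated segment carries the structure path and that any '/' or '.' inside a name has already been written as an escape, so literal occurrences are always separators.

```python
import re
from typing import List, Tuple

# Same characters as the flex WHITESPACE definition: space, CR, tab, form feed.
WHITESPACE = re.compile(r"[ \r\t\f]+")


def split_idrefs(text: str) -> List[str]:
    """Split an IDREFS attribute value into its IDREF items."""
    return [item for item in WHITESPACE.split(text) if item]


def split_idref(idref: str) -> Tuple[bool, List[str], List[str]]:
    """Split one IDREF into (is_absolute, group names, structure path).

    Assumption: the final '/'-separated segment holds the '.'-separated
    structure path; everything before it names groups.
    """
    absolute = idref.startswith("/")
    body = idref[1:] if absolute else idref
    *groups, last = body.split("/")
    return absolute, groups, last.split(".")


if __name__ == "__main__":
    for ref in split_idrefs("/g1/g2/s.f1.f2 temperature"):
        print(ref, split_idref(ref))
```

This does no validation of the individual names; it only shows where the separators fall.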

Remember that in the DDX, these lexical items will be enclosed in double quotes, e.g.

<Value value="..."/>
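
To make the point concrete, here is a minimal Python sketch that pulls the text between the quotes out of such an element; that text is what the lexical rules would be applied to. The function name attribute_token is hypothetical, and the sample value contains no escapes.

```python
import xml.etree.ElementTree as ET


def attribute_token(xml_text: str, name: str = "value") -> str:
    """Return the text of the named attribute of a single XML element."""
    element = ET.fromstring(xml_text)
    return element.attrib[name]


if __name__ == "__main__":
    # The lexical item is -17, not the surrounding element or quotes.
    print(attribute_token('<Value value="-17"/>'))
```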


-Dennis Heimbigner

/* lex specification for tokens for DAP4 DDX */

/* The most correct (validating) version of UTF8 character set
   (Taken from: http://www.w3.org/2005/03/23-lex-U)

Note that ASCII and control are not included.

The lines of the expression cover the UTF8 characters as follows:
1. non-overlong 2-byte
2. excluding overlongs
3. straight 3-byte
4. excluding surrogates
5. straight 3-byte
6. planes 1-3
7. planes 4-15
8. plane 16

UTF8   ([\xC2-\xDF][\x80-\xBF])                       \
     | (\xE0[\xA0-\xBF][\x80-\xBF])                   \
     | ([\xE1-\xEC][\x80-\xBF][\x80-\xBF])            \
     | (\xED[\x80-\x9F][\x80-\xBF])                   \
     | ([\xEE-\xEF][\x80-\xBF][\x80-\xBF])            \
     | (\xF0[\x90-\xBF][\x80-\xBF][\x80-\xBF])        \
     | ([\xF1-\xF3][\x80-\xBF][\x80-\xBF][\x80-\xBF]) \
     | (\xF4[\x80-\x8F][\x80-\xBF][\x80-\xBF])        \

*/


/* The most relaxed version of UTF8 (not used)
UTF8 ([\xC0-\xDF].)|([\xE0-\xEF]..)|([\xF0-\xF7]...)
*/

/* The partially relaxed version of UTF8, and the one used here.
   Lead bytes for 2-byte sequences run from \xC0 through \xDF. */
UTF8 ([\xC0-\xDF][\x80-\xBF])|([\xE0-\xEF][\x80-\xBF][\x80-\xBF])|([\xF0-\xF7][\x80-\xBF][\x80-\xBF][\x80-\xBF])

/* ASCII control characters */
CONTROLS  [\x00-\x1F]

WHITESPACE [ \r\t\f]+

HEXCHAR   [0-9a-fA-F]

/* Generic Escapes */
XMLESCAPE  "&x"{HEXCHAR}{HEXCHAR}";"

/* ASCII printable characters */
ASCII     [0-9a-zA-Z !"#$%&'()*+,-./:;<=>?@[\\\]\\^_`|{}~]

/* ASCII Printable Characters minus
   ' ','.','/', '"', '&'
*/
IDASCII   [0-9a-zA-Z!#$%'()*+,\-:;<=>?@[\\\]\\^_`|{}~]

/* Escapes for ' ','.','/', '&', and '"' */
IDESCAPES ("&amp;"|"&quot;"|"&x20;"|"&x2E;"|"&x2F;"|"&x26;"|"&x22;")

/* Escapes for '"', '&', and '\\' */
STRINGESCAPES ("&amp;"|"&quot;"|"&x26;"|"&x22;"|"&x5C;")

/* Escapes for '\'', '\\' */
CHARESCAPES ("&x27;"|"&x5C;")

HEXSTRING       (0[xX]{HEXCHAR}{HEXCHAR}*)

EXPONENT ([eE][+-]?[0-9]+)

MANTISSA [+-]?[0-9]*\.[0-9]*

NANINF   (-?inf|nan|NaN)

INTTYPE  ([BbSsLl]|"ll"|"LL")

INT      [+-][0-9][0-9]*{INTTYPE}?
UINT     [0-9][0-9]*{INTTYPE}?
HEXINT   {HEXSTRING}{INTTYPE}?

GROUPPATH  [/]?({ID}[/])*{ID}
STRUCTPATH ({ID}[.])*{ID}

string   ([^"\\&]|{XMLESCAPE})*

char     ([^'\\&]|{XMLESCAPE})

integer   {INT}|{UINT}|{HEXINT}

float    ({MANTISSA}{EXPONENT}?)|{NANINF}

IDCHAR   ({IDASCII}|{XMLESCAPE}|{UTF8})
ID       {IDCHAR}{IDCHAR}*

/* IDREF == path to an object; leads with group path
            separated by '/' and then struct path using '.'
*/
IDREF    {GROUPPATH}{STRUCTPATH}

/* IDREFS is a whitespace-separated list of IDREF */
IDREFS   {WHITESPACE}?{IDREF}({WHITESPACE}{IDREF})*

%%  /* Order is important */
{integer} {}
{float}   {}
{IDREF}   {}
{IDREFS}  {}
{ID}      {}
{string}  {}
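
The effect of the rule order can be sketched in Python by translating a few of the definitions into re patterns and trying them in the same sequence. This is an approximation under stated assumptions, not the flex scanner itself: it treats the whole attribute value as one token, it takes HEXCHAR to mean hexadecimal digits only, it replaces ID with a simplified pattern (no UTF8 branch), and it omits IDREF and IDREFS. The names RULES and classify are hypothetical.

```python
import re
from typing import Optional

HEXCHAR = r"[0-9a-fA-F]"  # assumption: hexadecimal digits only
XMLESCAPE = rf"&x{HEXCHAR}{HEXCHAR};"
INTTYPE = r"(?:[BbSsLl]|ll|LL)"

INT = rf"[+-][0-9]+{INTTYPE}?"
UINT = rf"[0-9]+{INTTYPE}?"
HEXINT = rf"0[xX]{HEXCHAR}+{INTTYPE}?"
INTEGER = rf"(?:{INT}|{UINT}|{HEXINT})"

EXPONENT = r"(?:[eE][+-]?[0-9]+)"
MANTISSA = r"[+-]?[0-9]*\.[0-9]*"
NANINF = r"(?:-?inf|nan|NaN)"
FLOAT = rf"(?:{MANTISSA}{EXPONENT}?|{NANINF})"

# Simplified stand-in for ID: no controls, space, '.', '/', '"' or bare '&'.
ID = rf'(?:[^\x00-\x20./"&]|{XMLESCAPE})+'
STRING = rf'(?:[^"\\&]|{XMLESCAPE})*'

# Same relative order as the rules section: earlier entries win ties.
RULES = [
    ("integer", re.compile(INTEGER)),
    ("float", re.compile(FLOAT)),
    ("ID", re.compile(ID)),
    ("string", re.compile(STRING)),
]


def classify(token: str) -> Optional[str]:
    """Return the name of the first rule that matches the whole token."""
    for name, pattern in RULES:
        if pattern.fullmatch(token):
            return name
    return None


if __name__ == "__main__":
    for sample in ["42", "-1.5e3", "0x1F", "NaN", "temp", "hello world"]:
        print(repr(sample), "->", classify(sample))
```

For example, 42 and NaN each also satisfy the simplified ID pattern, so they are reported as integer and float only because those rules come first.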