DAP4: DDX Lexical Elements

From OPeNDAP Documentation
Revision as of 22:22, 23 February 2012 by DennisHeimbigner

The flex specification at the end of this page describes the lexical elements of the DDX. Specifically, it defines:

  • Constants: string, float, integer, char
  • Identifiers: ID
  • Reference identifiers: IDREF
  • Whitespace-separated lists of IDREF: IDREFS
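
To make the last two items concrete: the specification further down this page describes an IDREF as a group path separated by '/' followed by a struct path separated by '.', and IDREFS as a whitespace-separated list of them. The Python sketch below splits such a value into its parts. The function names `split_idref` and `split_idrefs` are hypothetical, and the sketch assumes that any '/' or '.' inside a name has been escaped, so the raw characters act only as separators.

```python
def split_idref(idref):
    """Split one IDREF into (is_absolute, group_names, struct_field_names).

    Assumption: '/' and '.' inside names are escaped, so every raw '/' or '.'
    is a separator.
    """
    absolute = idref.startswith("/")
    # Everything before the last '/' is the group path; the rest is the struct path.
    group_part, _, struct_part = idref.rpartition("/")
    groups = [g for g in group_part.split("/") if g]
    fields = struct_part.split(".")
    return absolute, groups, fields


def split_idrefs(text):
    """Split a whitespace-separated IDREFS value into parsed IDREFs."""
    return [split_idref(ref) for ref in text.split()]


if __name__ == "__main__":
    for parsed in split_idrefs("/g1/g2/s.f.x  v"):
        print(parsed)
```

Note that `str.split()` with no argument splits on any run of whitespace, which is slightly more permissive than the WHITESPACE definition in the flex specification.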

Remember that in the DDX these lexical items appear as XML attribute values and are therefore enclosed in double quotes, e.g.

<Value value="..."/>
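
The quotes belong to the XML attribute syntax, not to the token itself. As an illustration only, this Python sketch reads the `value` attribute with the standard `xml.etree.ElementTree` module; the helper name `attribute_value` is hypothetical. It also shows that a conforming XML parser decodes predefined entities and character references before the application sees the value.

```python
import xml.etree.ElementTree as ET


def attribute_value(xml_text, name="value"):
    """Return the named attribute of the root element, unquoted and unescaped."""
    return ET.fromstring(xml_text).attrib[name]


if __name__ == "__main__":
    samples = [
        '<Value value="17"/>',
        '<Value value="3.5e2"/>',
        '<Value value="a &amp; b"/>',
        '<Value value="say &quot;hi&quot;"/>',
        '<Value value="&#x22;"/>',
    ]
    for sample in samples:
        print(repr(attribute_value(sample)))
```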


-Dennis Heimbigner

/* lex specification for tokens for DAP4 DDX */

/* The most correct (validating) version of UTF8 character set
   (Taken from: http://www.w3.org/2005/03/23-lex-U)

Note that ASCII and control are not included.

The lines of the expression cover the UTF8 characters as follows:
1. non-overlong 2-byte
2. excluding overlongs
3. straight 3-byte
4. excluding surrogates
5. straight 3-byte
6. planes 1-3
7. planes 4-15
8. plane 16

UTF8   ([\xC2-\xDF][\x80-\xBF])                       \
     | (\xE0[\xA0-\xBF][\x80-\xBF])                   \
     | ([\xE1-\xEC][\x80-\xBF][\x80-\xBF])            \
     | (\xED[\x80-\x9F][\x80-\xBF])                   \
     | ([\xEE-\xEF][\x80-\xBF][\x80-\xBF])            \
     | (\xF0[\x90-\xBF][\x80-\xBF][\x80-\xBF])        \
     | ([\xF1-\xF3][\x80-\xBF][\x80-\xBF][\x80-\xBF]) \
     | (\xF4[\x80-\x8F][\x80-\xBF][\x80-\xBF])        \

*/


/* The most relaxed version of UTF8 (not used)
UTF8 ([\xC0-\xDF].)|([\xE0-\xEF]..)|([\xF0-\xF7]...)
*/

/* The partially relaxed version of UTF8, and the one used here.
   Lead bytes of 2-byte sequences span 0xC0-0xDF. */
UTF8 ([\xC0-\xDF][\x80-\xBF])|([\xE0-\xEF][\x80-\xBF][\x80-\xBF])|([\xF0-\xF7][\x80-\xBF][\x80-\xBF][\x80-\xBF])

/* ASCII control characters */
CONTROLS  [\x00-\x1F]

WHITESPACE [ \r\t\f]+

HEXCHAR   [0-9a-fA-F]

/* Generic escapes: XML hexadecimal character references.
   {HEXCHAR} must stay outside the quotes to be expanded by flex. */
XMLESCAPE  "&#x"{HEXCHAR}{HEXCHAR}";"

/* ASCII printable characters ('-' is placed last so it is literal) */
ASCII     [0-9a-zA-Z !"#$%&'()*+,./:;<=>?@\[\\\]^_`|{}~-]

/* ASCII printable characters minus
   ' ', '.', '/', '"', '&'
*/
IDASCII   [0-9a-zA-Z!#$%'()*+,:;<=>?@\[\\\]^_`|{}~-]

/* Escapes for ' ', '.', '/', '&', and '"' */
IDESCAPES ("&amp;"|"&quot;"|"&#x20;"|"&#x2E;"|"&#x2F;"|"&#x26;"|"&#x22;")

/* Escapes for '"', '&', and '\\' */
STRINGESCAPES ("&amp;"|"&quot;"|"&#x26;"|"&#x22;"|"&#x5C;")

/* Escapes for '\\', '\'' */
CHARESCAPES ("&#x27;"|"&#x5C;")

HEXSTRING       (0[xX]{HEXCHAR}{HEXCHAR}*)

EXPONENT ([eE][+-]?[0-9]+)

MANTISSA [+-]?([0-9]+\.[0-9]*|\.[0-9]+)

NANINF   (-?inf|nan|NaN)

INTTYPE  ([BbSsLl]|"ll"|"LL")

INT      [+-][0-9][0-9]*{INTTYPE}?
UINT     [0-9][0-9]*{INTTYPE}?
HEXINT   {HEXSTRING}{INTTYPE}?

GROUPPATH  [/]?({ID}[/])*{ID}
STRUCTPATH ({ID}[.])*{ID}

string   ([^"\\&]|{XMLESCAPE})*

char     ([^'\\&]|{XMLESCAPE})

integer   {INT}|{UINT}|{HEXINT}

float    ({MANTISSA}{EXPONENT}?)|{NANINF}

IDCHAR   ({IDASCII}|{XMLESCAPE}|{UTF8})
ID       {IDCHAR}{IDCHAR}*

/* IDREF == path to an object; leads with an optional group path
            separated by '/', followed by a struct path separated by '.'
*/
IDREF    [/]?({ID}[/])*{STRUCTPATH}

/* IDREFS is a whitespace-separated list of IDREF */
IDREFS   {WHITESPACE}?{IDREF}({WHITESPACE}{IDREF})*

%%  /* Order is important */
{integer} {}
{float}   {}
{IDREF}   {}
{IDREFS}  {}
{ID}      {}
{string}  {}
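
The remark that rule order is important reflects how flex resolves ambiguity: it prefers the longest match, and when several rules match the same amount of text, the rule listed first wins. The Python sketch below imitates that behaviour for a few of the token classes. It is an approximation under stated assumptions, not a port of the specification: it uses Python's `re` module instead of flex, the `classify` helper is hypothetical, the ID pattern is reduced to ASCII letters, digits and underscore, hexadecimal digits are taken to be 0-9 and A-F, the mantissa is required to contain at least one digit, and the IDREF and IDREFS rules are omitted.

```python
import re

HEXCHAR = r"[0-9a-fA-F]"          # assumption: hexadecimal digits only
INTTYPE = r"(?:ll|LL|[BbSsLl])"   # two-letter suffixes are tried first

# Rules in the same relative order as the flex rules section.
# Python alternation is leftmost-first rather than longest-match, so the
# alternatives inside each pattern are ordered to give the longest match.
RULES = [
    ("integer", rf"0[xX]{HEXCHAR}+{INTTYPE}?|[+-]?[0-9]+{INTTYPE}?"),
    ("float", r"[+-]?(?:[0-9]+\.[0-9]*|\.[0-9]+)(?:[eE][+-]?[0-9]+)?|-?inf|nan|NaN"),
    ("ID", r"[0-9a-zA-Z_]+"),     # simplified stand-in for the full ID definition
    ("string", r'[^"\\&]*'),
]
COMPILED = [(name, re.compile(pattern)) for name, pattern in RULES]


def classify(text):
    """Return (rule_name, matched_prefix) using longest match, first rule on ties."""
    best_name, best_len = None, -1
    for name, regex in COMPILED:
        match = regex.match(text)
        # The strict '>' keeps the earlier rule when two matches have equal length.
        if match and len(match.group(0)) > best_len:
            best_name, best_len = name, len(match.group(0))
    return best_name, text[:best_len]


if __name__ == "__main__":
    for sample in ["123", "12ab", "1.5e3", "0x1F", "-inf", "hello world"]:
        print(sample, "->", classify(sample))
```

In this sketch "123" matches the integer, ID and string patterns with equal length, so the earliest rule (integer) is chosen, while "12ab" is classified as an ID because that match is longer than the integer prefix "12".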