DAP4: DDX Lexical Elements
From OPeNDAP Documentation
Revision as of 22:22, 23 February 2012 by DennisHeimbigner
At the end of this page is the code for a flex program describing the lexical elements of the DDX. Specifically, it defines:
- Constants: string, float, integer, char
- Identifiers: ID
- Reference identifiers: IDREF
- Whitespace-separated lists of IDREF: IDREFS
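The following minimal Python sketch shows how an IDREF decomposes. The helper names `split_idref` and `split_idrefs` are hypothetical and are not part of any DAP4 library. The sketch assumes that an IDREF is an optional '/'-separated group path followed by a '.'-separated structure path, as described in the comments of the flex specification. It also assumes that escaped separators such as "&x2E;" and "&x2F;" have not yet been decoded, so every literal '.' or '/' acts as a separator.

```python
import re
from typing import List, Tuple

# Whitespace per the WHITESPACE definition: space, CR, tab, form feed.
_WS = re.compile(r"[ \r\t\f]+")

def split_idref(idref: str) -> Tuple[bool, List[str], List[str]]:
    """Split an IDREF such as "/g1/g2/var.field" (hypothetical helper).

    Returns (absolute, group_names, struct_names): group_names are the
    '/'-separated components before the last '/', struct_names are the
    '.'-separated components after it.
    """
    absolute = idref.startswith("/")
    body = idref[1:] if absolute else idref
    # Everything before the last '/' is the group path; the rest is the struct path.
    *groups, tail = body.split("/")
    struct_names = tail.split(".")
    if any(name == "" for name in groups + struct_names):
        raise ValueError(f"empty path component in {idref!r}")
    return absolute, groups, struct_names

def split_idrefs(idrefs: str) -> List[Tuple[bool, List[str], List[str]]]:
    """Parse a whitespace-separated IDREFS list (hypothetical helper)."""
    return [split_idref(ref) for ref in _WS.split(idrefs.strip(" \r\t\f"))]

print(split_idref("/g1/g2/var.field"))  # (True, ['g1', 'g2'], ['var', 'field'])
print(split_idrefs("a.b /g/c"))         # [(False, [], ['a', 'b']), (True, ['g'], ['c'])]
```

A bare identifier such as "x" is also a valid IDREF here: it has an empty group path and a one-element structure path.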
Remember that in the DDX, these lexical items will be enclosed in double quotes, e.g.
<Value value="..."/>
-Dennis Heimbigner
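Each item sits inside a quoted attribute value, so an XML parser removes the quotes before the lexer ever sees the text. The sketch below assumes Python's standard `xml.etree.ElementTree` as that parser. It classifies the value with regular expressions that approximate the integer and float definitions of the flex specification (`INT`, `UINT`, `HEXINT`, `MANTISSA`, `EXPONENT`, `NANINF`). The function `classify_value` is a hypothetical name, and the regexes are a hand translation rather than the generated scanner.

```python
import re
import xml.etree.ElementTree as ET

# Hand-translated approximations of the flex definitions (assumption: not the
# generated scanner). INTTYPE is the optional B/S/L/ll/LL size suffix.
INTTYPE = r"(?:[BbSsLl]|ll|LL)"
INTEGER = re.compile(
    rf"[+-][0-9]+{INTTYPE}?"              # INT: signed decimal
    rf"|[0-9]+{INTTYPE}?"                 # UINT: unsigned decimal
    rf"|0[xX][0-9a-fA-F]+{INTTYPE}?"      # HEXINT: hexadecimal
)
FLOAT = re.compile(
    r"[+-]?(?:[0-9]+\.[0-9]*|\.[0-9]+)(?:[eE][+-]?[0-9]+)?"  # MANTISSA EXPONENT?
    r"|-?inf|nan|NaN"                                         # NANINF
)

def classify_value(element_xml: str) -> str:
    """Classify the value attribute of one DDX element (hypothetical helper)."""
    # ElementTree returns the attribute text without the surrounding quotes.
    text = ET.fromstring(element_xml).get("value", "")
    # Check integer before float, mirroring the rule order in the flex file.
    if INTEGER.fullmatch(text):
        return "integer"
    if FLOAT.fullmatch(text):
        return "float"
    return "other"

print(classify_value('<Value value="0x1F"/>'))    # integer
print(classify_value('<Value value="2.5e-3"/>'))  # float
```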
```lex
/* lex specification for tokens for DAP4 DDX */

/* The most correct (validating) version of the UTF8 character set
   (taken from: http://www.w3.org/2005/03/23-lex-U).
   Note that ASCII and control characters are not included.
   The lines of the expression cover the UTF8 characters as follows:
     1. non-overlong 2-byte
     2. excluding overlongs
     3. straight 3-byte
     4. excluding surrogates
     5. straight 3-byte
     6. planes 1-3
     7. planes 4-15
     8. plane 16

UTF8 ([\xC2-\xDF][\x80-\xBF])                        \
   | (\xE0[\xA0-\xBF][\x80-\xBF])                     \
   | ([\xE1-\xEC][\x80-\xBF][\x80-\xBF])              \
   | (\xED[\x80-\x9F][\x80-\xBF])                     \
   | ([\xEE-\xEF][\x80-\xBF][\x80-\xBF])              \
   | (\xF0[\x90-\xBF][\x80-\xBF][\x80-\xBF])          \
   | ([\xF1-\xF3][\x80-\xBF][\x80-\xBF][\x80-\xBF])   \
   | (\xF4[\x80-\x8F][\x80-\xBF][\x80-\xBF])
*/

/* The most relaxed version of UTF8 (not used)
UTF8 ([\xC0-\xDF].)|([\xE0-\xEF]..)|([\xF0-\xF7]...)
*/

/* The partially relaxed version of UTF8, and the one used here */
UTF8          ([\xC0-\xDF][\x80-\xBF])|([\xE0-\xEF][\x80-\xBF][\x80-\xBF])|([\xF0-\xF7][\x80-\xBF][\x80-\xBF][\x80-\xBF])

/* ASCII control characters */
CONTROLS      [\x00-\x1F]

WHITESPACE    [ \r\t\f]+

HEXCHAR       [0-9a-fA-F]

/* Generic escapes */
XMLESCAPE     "&x"{HEXCHAR}{HEXCHAR}";"

/* ASCII printable characters ('-' is escaped so it is not read as a range) */
ASCII         [0-9a-zA-Z !"#$%&'()*+,\-./:;<=>?@[\\\]^_`|{}~]

/* ASCII printable characters minus ' ', '.', '/', '"', '&' */
IDASCII       [0-9a-zA-Z!#$%'()*+,\-:;<=>?@[\\\]^_`|{}~]

/* Escapes for ' ', '.', '/', '&', and '"' */
IDESCAPES     ("&amp;"|"&quot;"|"&x20;"|"&x2E;"|"&x2F;"|"&x26;"|"&x22;")

/* Escapes for '"', '&', and '\\' */
STRINGESCAPES ("&amp;"|"&quot;"|"&x26;"|"&x22;"|"&x5C;")

/* Escapes for '\\' and '\'' */
CHARESCAPES   ("&x27;"|"&x5C;")

HEXSTRING     (0[xX]{HEXCHAR}{HEXCHAR}*)
EXPONENT      ([eE][+-]?[0-9]+)
MANTISSA      [+-]?([0-9]+\.[0-9]*|\.[0-9]+)
NANINF        (-?inf|nan|NaN)
INTTYPE       ([BbSsLl]|"ll"|"LL")
INT           [+-][0-9][0-9]*{INTTYPE}?
UINT          [0-9][0-9]*{INTTYPE}?
HEXINT        {HEXSTRING}{INTTYPE}?

string        ([^"\\&]|{STRINGESCAPES}|{XMLESCAPE})*
char          ([^'\\&]|{CHARESCAPES}|{XMLESCAPE})
integer       {INT}|{UINT}|{HEXINT}
float         ({MANTISSA}{EXPONENT}?)|{NANINF}

IDCHAR        ({IDASCII}|{IDESCAPES}|{XMLESCAPE}|{UTF8})
ID            {IDCHAR}{IDCHAR}*

/* Optional leading '/', then zero or more group names each followed by '/' */
GROUPPATH     ([/]?({ID}[/])*)
/* One or more field names separated by '.' */
STRUCTPATH    (({ID}[.])*{ID})

/* IDREF == path to an object; leads with a group path separated by '/'
   and then a struct path using '.' */
IDREF         {GROUPPATH}{STRUCTPATH}

/* IDREFS is a whitespace-separated list of IDREF */
IDREFS        {WHITESPACE}?{IDREF}({WHITESPACE}{IDREF})*

%%
    /* Order is important: on equal-length matches the earlier rule wins */
{integer}     {}
{float}       {}
{ID}          {}
{IDREF}       {}
{IDREFS}      {}
{string}      {}
```
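The difference between the validating and the partially relaxed UTF8 definitions is easiest to see on concrete byte sequences. This Python sketch transcribes both into byte-oriented regular expressions. It assumes the relaxed 2-byte lead range is 0xC0-0xDF. The helper `is_utf8_char` is a hypothetical name. An overlong encoding of NUL and an encoded surrogate both pass the relaxed form but fail the validating one.

```python
import re

# Validating form: the eight alternatives of the W3C table, one per line.
VALIDATING = re.compile(
    rb"[\xC2-\xDF][\x80-\xBF]"            # 1. non-overlong 2-byte
    rb"|\xE0[\xA0-\xBF][\x80-\xBF]"       # 2. excluding overlongs
    rb"|[\xE1-\xEC][\x80-\xBF]{2}"        # 3. straight 3-byte
    rb"|\xED[\x80-\x9F][\x80-\xBF]"       # 4. excluding surrogates
    rb"|[\xEE-\xEF][\x80-\xBF]{2}"        # 5. straight 3-byte
    rb"|\xF0[\x90-\xBF][\x80-\xBF]{2}"    # 6. planes 1-3
    rb"|[\xF1-\xF3][\x80-\xBF]{3}"        # 7. planes 4-15
    rb"|\xF4[\x80-\x8F][\x80-\xBF]{2}"    # 8. plane 16
)

# Partially relaxed form: a lead byte of the right class plus continuation bytes.
RELAXED = re.compile(
    rb"[\xC0-\xDF][\x80-\xBF]"
    rb"|[\xE0-\xEF][\x80-\xBF]{2}"
    rb"|[\xF0-\xF7][\x80-\xBF]{3}"
)

def is_utf8_char(pattern: "re.Pattern[bytes]", seq: bytes) -> bool:
    """True if seq is exactly one multi-byte character under pattern."""
    return pattern.fullmatch(seq) is not None

samples = [
    ("U+00E9", "\u00e9".encode("utf-8")),  # b'\xc3\xa9'
    ("overlong NUL", b"\xC0\x80"),
    ("surrogate U+D800", b"\xED\xA0\x80"),
]
for label, seq in samples:
    print(label, is_utf8_char(VALIDATING, seq), is_utf8_char(RELAXED, seq))
# U+00E9 True True
# overlong NUL False True
# surrogate U+D800 False True
```

The relaxed form trades strictness for a simpler pattern. A lexer built on it would accept some byte sequences that a strict UTF-8 decoder, such as Python's `bytes.decode("utf-8")`, rejects.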