DAP4: DDX Lexical Elements
From OPeNDAP Documentation
The flex program at the end of this page describes the lexical elements of the DDX. Specifically, it defines:
- Constants: string, float, integer, char
- Identifiers: ID
- Reference identifiers: IDREF
- Whitespace-separated lists of IDREF: IDREFS
I don't understand this proposal. The following sentence and XML snippet imply to me that the flex grammar is specifically designed to parse the values of XML attributes found in the DDX. If that's not the intention then could someone please reword this to be less confusing?
Remember that in the DDX, these lexical items will be enclosed in double quotes, e.g.
<Value value="..."/>
-Dennis Heimbigner
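To make the quoting concrete, here is a minimal Python sketch, assuming the lexical item arrives as the `value` attribute of a `<Value>` element as in the snippet above. The helper name `attribute_lexemes` and the sample fragment are hypothetical and not part of the proposal; only the standard-library `xml.etree.ElementTree` module is used.

```python
import xml.etree.ElementTree as ET

def attribute_lexemes(xml_text, attr="value"):
    """Return the text of every `attr` attribute, in document order."""
    root = ET.fromstring(xml_text)
    return [elem.attrib[attr] for elem in root.iter() if attr in elem.attrib]

# Hypothetical fragment; a real DDX carries far more structure.
sample = '<Attribute name="a"><Value value="17"/><Value value="-3.5e2"/></Attribute>'
lexemes = attribute_lexemes(sample)
print(lexemes)
```

An XML parser returns attribute text without the surrounding double quotes and with the standard entities (such as `&amp;` and `&quot;`) already decoded, so what the lexer sees depends on whether it runs on the raw document or on parsed attribute values.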
/* lex specification for tokens for DAP4 DDX */

/* The most correct (validating) version of UTF8 character set
   (Taken from: http://www.w3.org/2005/03/23-lex-U)
   Note that ASCII and control are not included.

   The lines of the expression cover the UTF8 characters as follows:
   1. non-overlong 2-byte
   2. excluding overlongs
   3. straight 3-byte
   4. excluding surrogates
   5. straight 3-byte
   6. planes 1-3
   7. planes 4-15
   8. plane 16

UTF8   ([\xC2-\xDF][\x80-\xBF])                       \
     | (\xE0[\xA0-\xBF][\x80-\xBF])                   \
     | ([\xE1-\xEC][\x80-\xBF][\x80-\xBF])            \
     | (\xED[\x80-\x9F][\x80-\xBF])                   \
     | ([\xEE-\xEF][\x80-\xBF][\x80-\xBF])            \
     | (\xF0[\x90-\xBF][\x80-\xBF][\x80-\xBF])        \
     | ([\xF1-\xF3][\x80-\xBF][\x80-\xBF][\x80-\xBF]) \
     | (\xF4[\x80-\x8F][\x80-\xBF][\x80-\xBF])        \
*/

/* The most relaxed version of UTF8 (not used)
UTF8 ([\xC0-\xD6].)|([\xE0-\xEF]..)|([\xF0-\xF7]...)
*/

/* The partially relaxed version of UTF8, and the one used here */
UTF8 ([\xC0-\xD6][\x80-\xBF])|([\xE0-\xEF][\x80-\xBF][\x80-\xBF])|([\xF0-\xF7][\x80-\xBF][\x80-\xBF][\x80-\xBF])

/* ASCII control characters */
CONTROLS [\x00-\x1F]

WHITESPACE [ \r\t\f]+

HEXCHAR [0-9a-zA-Z]

/* Generic Escapes */
XMLESCAPE "&x{HEXCHAR}{HEXCHAR};"

/* ASCII printable characters */
ASCII [0-9a-zA-Z !"#$%&'()*+,-./:;<=>?@[\\\]\\^_`|{}~]

/* ASCII Printable Characters minus ' ','.','/', '"', '&' */
IDASCII [0-9a-zA-Z!#$%&'()*+,-:;<=>?@[\\\]\\^_`|{}~]

/* Escapes for ' ','.','/', '&', and '"' */
IDESCAPES ("&amp;" | "&quot;" | "&x20;" | "&x2E;" | "&x2F;" | "&x26;" | "&x22;")

/* Escapes for '"', '&', and '\\' */
STRINGESCAPES ("&amp;" | "&quot;" | "&x26;" | "&x22;" | "&x5C;")

/* Escapes for '\\', '\'' */
CHARESCAPES ("&x27;" | "&x5C;")

HEXSTRING (0[xX]{HEXCHAR}{HEXCHAR}*)
EXPONENT ([eE][+-]?[0-9]+)
MANTISSA [+-]?[0-9]*\.[0-9]*
NANINF (-?inf|nan|NaN)
INTTYPE ([BbSsLl]|"ll"|"LL")
INT [+-][0-9][0-9]*{INTTYPE}?
UINT [0-9][0-9]*{INTTYPE}?
HEXINT {HEXSTRING}{INTTYPE}?

GROUPPATH [/]?({ID}[/])*{ID}
STRUCTPATH ({ID}[.])*{ID}

string ([^"\\&]|{XMLESCAPE})*
char ([^'\\&]|{XMLESCAPE})
integer {INT}|{UINT}|{HEXINT}
float ({MANTISSA}{EXPONENT}?)|{NANINF}

IDCHAR ({IDASCII}|{XMLESCAPE}|{UTF8})
ID {IDCHAR}{IDCHAR}*

/* IDREF == path to an object; leads with group path separated by '/' and then struct path using '.' */
IDREF {GROUPPATH}{STRUCTPATH}

/* IDREFS is a whitespace separated list of IDREF */
IDREFS {WHITESPACE}?{IDREF}({WHITESPACE}{IDREF})*

%%

    /* Order is important */

{integer} {}
{float} {}
{IDREF} {}
{IDREFS} {}
{ID} {}
{string} {}
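
The numeric definitions above can be tried out without building a scanner. What follows is a rough Python transliteration, offered as a sketch rather than as part of the proposal: the constant names mirror the flex definitions, while `RULES` and `classify` are hypothetical helpers. flex picks the longest match and breaks ties by rule order; because this sketch always matches the whole lexeme, only the ordering step is modelled, and only for the `integer` and `float` rules.

```python
import re

# Transliterated from the flex definitions (numeric tokens only).
INTTYPE = r"(?:[BbSsLl]|ll|LL)"
HEXCHAR = r"[0-9a-zA-Z]"  # as written in the spec, not restricted to a-f
HEXSTRING = rf"0[xX]{HEXCHAR}{HEXCHAR}*"
EXPONENT = r"(?:[eE][+-]?[0-9]+)"
MANTISSA = r"[+-]?[0-9]*\.[0-9]*"
NANINF = r"(?:-?inf|nan|NaN)"
INT = rf"[+-][0-9][0-9]*{INTTYPE}?"
UINT = rf"[0-9][0-9]*{INTTYPE}?"
HEXINT = rf"{HEXSTRING}{INTTYPE}?"

# Same order as the rules section: the first rule that matches wins.
RULES = [
    ("integer", re.compile(rf"(?:{INT})|(?:{UINT})|(?:{HEXINT})")),
    ("float", re.compile(rf"(?:{MANTISSA}{EXPONENT}?)|(?:{NANINF})")),
]

def classify(lexeme):
    """Return the name of the first rule matching the whole lexeme, else None."""
    for name, pattern in RULES:
        if pattern.fullmatch(lexeme):
            return name
    return None

for text in ["17", "+17L", "0x1F", "-3.5e2", "NaN", "3e5"]:
    print(text, classify(text))
```

Under this transliteration `3e5` matches neither numeric rule, because MANTISSA requires a decimal point; in the full scanner such input would fall through to the later rules.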