DAP4: DDX Lexical Elements
From OPeNDAP Documentation
Revision as of 22:22, 23 February 2012 by DennisHeimbigner
The flex specification on this page describes the lexical elements of the DDX. Specifically, it defines:
- Constants: string, float, integer, char
- Identifiers: ID
- Identifier references: IDREF
- Whitespace-separated lists of IDREF: IDREFS
Remember that in the DDX these lexical items appear as attribute values enclosed in double quotes, for example:
<Value value="..."/>
-Dennis Heimbigner
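As a rough illustration of where these lexical items come from, the following Python sketch pulls an attribute value out of a DDX-style element so that it could be handed to a lexer. The helper name attribute_text and the sample values are hypothetical; only the <Value value="..."/> shape comes from this page. Note that a standard XML parser resolves predefined entities such as &amp; before it returns the attribute text, so whether a lexer ever sees such escapes depends on how the text is obtained.

```python
import xml.etree.ElementTree as ET


def attribute_text(xml_fragment, name="value"):
    """Return the text of one attribute as delivered by the XML parser."""
    element = ET.fromstring(xml_fragment)
    return element.attrib[name]


# Hypothetical sample values, not taken from a real DDX.
print(attribute_text('<Value value="0x1FL"/>'))      # 0x1FL
print(attribute_text('<Value value="a &amp; b"/>'))  # a & b
```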
/* lex specification for tokens for DAP4 DDX */
/* The strictest (validating) pattern for the UTF-8 character set
(Taken from: http://www.w3.org/2005/03/23-lex-U)
Note that ASCII and control characters are not included.
The lines of the expression cover the UTF-8 characters as follows:
1. non-overlong 2-byte
2. excluding overlongs
3. straight 3-byte
4. excluding surrogates
5. straight 3-byte
6. planes 1-3
7. planes 4-15
8. plane 16
UTF8 ([\xC2-\xDF][\x80-\xBF]) \
| (\xE0[\xA0-\xBF][\x80-\xBF]) \
| ([\xE1-\xEC][\x80-\xBF][\x80-\xBF]) \
| (\xED[\x80-\x9F][\x80-\xBF]) \
| ([\xEE-\xEF][\x80-\xBF][\x80-\xBF]) \
| (\xF0[\x90-\xBF][\x80-\xBF][\x80-\xBF]) \
| ([\xF1-\xF3][\x80-\xBF][\x80-\xBF][\x80-\xBF]) \
| (\xF4[\x80-\x8F][\x80-\xBF][\x80-\xBF]) \
*/
/* The most relaxed version of UTF8 (not used)
UTF8 ([\xC0-\xDF].)|([\xE0-\xEF]..)|([\xF0-\xF7]...)
*/
/* The partially relaxed version of UTF8, and the one used here */
UTF8 ([\xC0-\xDF][\x80-\xBF])|([\xE0-\xEF][\x80-\xBF][\x80-\xBF])|([\xF0-\xF7][\x80-\xBF][\x80-\xBF][\x80-\xBF])
/* ASCII control characters */
CONTROLS [\x00-\x1F]
WHITESPACE [ \r\t\f]+
HEXCHAR [0-9a-fA-F]
/* Generic Escapes */
XMLESCAPE "&x"{HEXCHAR}{HEXCHAR}";"
/* ASCII printable characters */
ASCII [0-9a-zA-Z !"#$%&'()*+,-./:;<=>?@[\\\]\\^_`|{}~]
/* ASCII printable characters minus
' ', '.', '/', '"', '&'
*/
IDASCII [0-9a-zA-Z!#$%'()*+,\-:;<=>?@[\\\]^_`|{}~]
/* Escapes for ' ','.','/', '&', and '"' */
IDESCAPES ("&amp;"|"&quot;"|"&x20;"|"&x2E;"|"&x2F;"|"&x26;"|"&x22;")
/* Escapes for '"', '&', and '\\' */
STRINGESCAPES ("&amp;"|"&quot;"|"&x26;"|"&x22;"|"&x5C;")
/* Escapes for '\\', '\'' */
CHARESCAPES ("&x27;"|"&x5C;")
HEXSTRING (0[xX]{HEXCHAR}{HEXCHAR}*)
EXPONENT ([eE][+-]?[0-9]+)
MANTISSA [+-]?[0-9]*\.[0-9]*
NANINF (-?inf|nan|NaN)
INTTYPE ([BbSsLl]|"ll"|"LL")
INT [+-][0-9][0-9]*{INTTYPE}?
UINT [0-9][0-9]*{INTTYPE}?
HEXINT {HEXSTRING}{INTTYPE}?
GROUPPATH [/]?({ID}[/])*{ID}
STRUCTPATH ({ID}[.])*{ID}
string ([^"\\&]|{XMLESCAPE})*
char ([^'\\&]|{XMLESCAPE})
integer {INT}|{UINT}|{HEXINT}
float ({MANTISSA}{EXPONENT}?)|{NANINF}
IDCHAR ({IDASCII}|{XMLESCAPE}|{UTF8})
ID {IDCHAR}{IDCHAR}*
/* IDREF == path to an object; leads with group path
separated by '/' and then struct path using '.'
*/
IDREF {GROUPPATH}{STRUCTPATH}
/* IDREFS is a whitespace separated list of IDREF */
IDREFS {WHITESPACE}?{IDREF}({WHITESPACE}{IDREF})*
/* Rule order is important: when two rules match text of the same length, flex chooses the one listed first. */
%%
{integer} {}
{float} {}
{IDREF} {}
{IDREFS} {}
{ID} {}
{string} {}
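The effect of rule order can be sketched outside flex. The Python snippet below mirrors a few of the definitions with the re module and tries them, in the same order as the rules, against a complete attribute value. It is a simplified approximation rather than a translation of the scanner: IDREF and IDREFS are omitted, any non-ASCII character stands in for {UTF8}, hexadecimal digits are assumed to be 0-9, a-f, A-F, and IDASCII is written to follow its comment (printable ASCII minus space, '.', '/', '"', '&'). The names classify and RULES are hypothetical.

```python
import re

# Assumption: hexadecimal digits are 0-9, a-f, A-F.
HEXCHAR = r"[0-9a-fA-F]"
XMLESCAPE = r"&x" + HEXCHAR + HEXCHAR + r";"
INTTYPE = r"(?:ll|LL|[BbSsLl])"

INT = r"[+-][0-9]+" + INTTYPE + r"?"
UINT = r"[0-9]+" + INTTYPE + r"?"
HEXINT = r"0[xX]" + HEXCHAR + r"+" + INTTYPE + r"?"
INTEGER = r"(?:" + INT + r"|" + UINT + r"|" + HEXINT + r")"

EXPONENT = r"(?:[eE][+-]?[0-9]+)"
MANTISSA = r"[+-]?[0-9]*\.[0-9]*"
NANINF = r"(?:-?inf|nan|NaN)"
FLOAT = r"(?:" + MANTISSA + EXPONENT + r"?|" + NANINF + r")"

# Assumption: printable ASCII minus space, '.', '/', '"' and '&'.
IDASCII = r"[0-9a-zA-Z!#$%'()*+,\-:;<=>?@\[\\\]^_`|{}~]"
NONASCII = r"[^\x00-\x7F]"  # stand-in for the {UTF8} byte patterns
ID = r"(?:" + IDASCII + r"|" + XMLESCAPE + r"|" + NONASCII + r")+"

STRING = r'(?:[^"\\&]|' + XMLESCAPE + r")*"

# Same relative order as the rules section (IDREF and IDREFS left out).
RULES = [("integer", INTEGER), ("float", FLOAT), ("ID", ID), ("string", STRING)]


def classify(text):
    """Return the name of the first rule that matches the whole text."""
    for name, pattern in RULES:
        if re.fullmatch(pattern, text):
            return name
    return None


print(classify("12"))           # integer (ID also matches, but comes later)
print(classify("3.14e-2"))      # float
print(classify("temp_K"))       # ID
print(classify("hello world"))  # string
```

Because every pattern is matched against the whole value, all successful matches have the same length, so the first rule in the list decides the token type; this is the tie-breaking behavior that makes the ordering matter.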