DAP4: DAP4 Escapes

From OPeNDAP Documentation
Revision as of 18:50, 1 April 2012 by DennisHeimbigner

Background

The character escaping mechanisms in DAP2 have, in retrospect, caused significant confusion and led to conflicting implementations.

Character escaping (aka escapes) occurs in several places.

  1. Identifiers: some characters in identifiers (blanks, for example) require escaping in certain syntactic situations in order to be interpreted properly.
  2. String and character constants: at least the surrounding quote character (typically " or ') requires some form of escape so that it can occur inside string or character constants.
  3. Queries: a number of characters ('&', '.', etc.) have special meaning when they occur as part of a DAP2 query, and so they require escaping if they are, for example, part of an identifier or constant.

The DAP2 escape mechanism was chosen to be the same as the standard URL escape mechanism, in which a character is converted to two hex digits and represented as %HH, where each H is a hex digit. In retrospect, and especially for escapes in queries, this led to confusion about when one was doing DAP2 escaping and when one was doing URL escaping.
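The ambiguity can be sketched in a few lines of Python. The helper `dap2_escape` and its set of special characters are hypothetical, chosen only for illustration and not taken from the DAP2 specification; `quote` and `unquote` are the standard-library URL escaping functions.

```python
from urllib.parse import quote, unquote


def dap2_escape(identifier: str, special: str = " .&%") -> str:
    """Replace each special character with %HH (hypothetical DAP2-level escape).

    Assumes ASCII input; the `special` set is an illustrative assumption.
    """
    return "".join(
        f"%{ord(ch):02X}" if ch in special else ch for ch in identifier
    )


# Hypothetical identifier containing a blank and a dot.
identifier = "sea surface.temp"

# First layer: DAP2-level escaping of the identifier itself.
dap_level = dap2_escape(identifier)      # "sea%20surface%2Etemp"

# Second layer: URL-level escaping applied when the query is put in a URL.
url_level = quote(dap_level, safe="")    # "sea%2520surface%252Etemp"

# Decoding once yields a string that still looks URL-escaped. Nothing in
# the text itself says whether a second decode is expected.
decoded_once = unquote(url_level)        # "sea%20surface%2Etemp"
decoded_twice = unquote(decoded_once)    # "sea surface.temp"
```

Because both layers use the same %HH notation, a client or server that decodes one time too many or one time too few produces a different identifier, which is the kind of conflicting behavior described above.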

Proposal

The primary proposal is to ensure that we use escaping mechanisms that are clearly not the same as the standard URL escaping mechanism.

For DAP4, the problem simplifies significantly because the DDX uses XML, so we can directly use the standard XML entity escape mechanism, which in its most general form is &#DDD;, that is, an ampersand followed by a sharp (#), followed by some number of decimal digits, followed by a semicolon.

In practice, only four escape characters are needed.

  1. &amp; (&)
  2. &gt; (>)
  3. &lt; (<)
  4. &quot; (")

Note that XML entities may also occur in attribute values, so they can be used as the general escape mechanism in XML, and they are quite distinct from the URL encoding format.
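As an illustration, the four escapes listed above can be applied and reversed with Python's standard `xml.sax.saxutils` module. The wrapper names `xml_escape` and `xml_unescape` are hypothetical; this is a sketch of the idea, not a proposed DAP4 implementation.

```python
from xml.sax.saxutils import escape, unescape


def xml_escape(text: str) -> str:
    """Escape &, <, > (library defaults) plus the double quote."""
    return escape(text, {'"': "&quot;"})


def xml_unescape(text: str) -> str:
    """Reverse xml_escape; the library replaces &amp; last."""
    return unescape(text, {"&quot;": '"'})


# Hypothetical attribute value containing all four characters.
raw = 'a<b & "c">d'
escaped = xml_escape(raw)        # 'a&lt;b &amp; &quot;c&quot;&gt;d'
restored = xml_unescape(escaped)
```

The escaped form contains no '%' sequences, so it cannot be mistaken for URL encoding.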

The issue that needs to be addressed is how to do escapes in queries (i.e., anything after the left-most '?' in the URL). I would propose we choose one of the following two options.

  1. Standard C/Java-style '\' escaping, where encountering \c changes the interpretation of character c.
  2. Use the XML escaping mechanism.
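A minimal sketch of the first option follows, assuming that \c simply means "take c literally" and that '.', '&' and '/' are the characters needing escapes. Both assumptions, and the function names, are illustrative and not part of the proposal.

```python
def backslash_escape(text: str, special: str = ".&/") -> str:
    """Prefix each special character, and the backslash itself, with a backslash."""
    out = []
    for ch in text:
        if ch == "\\" or ch in special:
            out.append("\\")
        out.append(ch)
    return "".join(out)


def backslash_unescape(text: str) -> str:
    """Drop each escaping backslash and keep the character after it literally."""
    out = []
    chars = iter(text)
    for ch in chars:
        if ch == "\\":
            try:
                ch = next(chars)  # the escaped character
            except StopIteration:
                raise ValueError("dangling backslash at end of input") from None
        out.append(ch)
    return "".join(out)


# Hypothetical identifier containing query-special characters.
name = "a.b&c"
escaped = backslash_escape(name)   # a\.b\&c
```

The decoder is a single left-to-right pass with one character of lookahead.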

Discussion

Using the backslash escaping mechanism has the advantage that it is well known and is easy to parse.

Using the XML escape mechanism has the advantage of consistency. It is, however, slightly more complicated to parse.

My personal preference is to use the backslash escaping notation. With the XML escape mechanism, characters like '.' and '/' would need escaping, and because these have no predefined mnemonic entities, the &#ddddd; notation would have to be used, which is significantly less clear than the backslash notation.
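The readability difference can be seen by escaping the same name both ways. The sketch below is illustrative only: the function names and the set of special characters are assumptions, and `html.unescape` (which follows HTML5 rules) is used merely as a convenient stand-in for an XML character-reference decoder.

```python
import html


def numeric_entity_escape(text: str, special: str = "./&") -> str:
    """Replace each special character with a decimal reference of the form &#DDD;."""
    return "".join(f"&#{ord(ch)};" if ch in special else ch for ch in text)


def backslash_escape(text: str, special: str = "./") -> str:
    """Prefix each special character, and the backslash itself, with a backslash."""
    return "".join(
        "\\" + ch if ch in special or ch == "\\" else ch for ch in text
    )


# Hypothetical name containing both '/' and '.'.
name = "data/v1.0"

as_entities = numeric_entity_escape(name)   # data&#47;v1&#46;0
as_backslash = backslash_escape(name)       # data\/v1\.0

# Decoding the numeric references recovers the original name.
decoded = html.unescape(as_entities)
```

In the backslash form the escaped character is still visible, while the numeric form requires the reader to know that 47 is '/' and 46 is '.'.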