DAP4: Inclusion of response metadata in the DMR

Revision as of 17:12, 24 July 2013

James Gallagher

Background

At the recent ESIP meeting, and at the DataOne meeting that immediately preceded it, many people mentioned that responses from services would be more useful if they contained some meta-information about the response itself. This information would help users understand how the response was made; it is essentially provenance information, without the heavyweight requirement that it contain everything known about the data.

Problem addressed

When users are confronted with web services (or feel confronted), they often wonder about the origin of the data they are accessing. Local data does not typically present this problem because there are often people close by who can answer questions about the data's origin and subsequent processing. Binding this kind of information to the data response is one way to address a fundamental issue with remote data access systems. NB: These same issues arise (or will arise) with file transfer systems, but users seem more comfortable (or less uncomfortable...) using data from them. Also, DataOne in particular addresses a user population that is less likely to have used remote data archives in the past and so, as a group, is less accepting of the validity of remote data. These new users are raising valid questions about how they can tell whether data are suitable for their needs.

Proposed solution

Bundle information in the DMR. The information should include:

  1. where it came from;
  2. the source data set(s);
  3. processing that happened;
  4. the software version;
  5. how to cite the dataset in a publication;
  6. licensing information/restrictions.

Based on a cursory reading of the PROV-O specification (http://www.w3.org/TR/prov-o/), I think that notation could be used for all of this.

The information will be encoded in a new XML element to be added to the DMR called DAPResponseProvenance.

While this could get very detailed, I propose the following for the six kinds of information listed:

where it came from
The URL used to access the resource.
the source data set(s)
The filename (but not the pathname). For an aggregation, the name of the NcML file. Extension: for aggregations, list all files touched. Because this might be complicated, it might not make it into an initial version.
processing that happened
This would be the name of the kind of access requested (DMR, Data, ...). Extension: any server functions that were used.
the software version
This will vary between servers (it might be one number or a list of numbers), but in the end it is one or more of the tried-and-true x.y.z version numbers, each paired with a name.
how to cite the dataset in a publication
Nominally a DOI or instructions. Ideally this would be boilerplate for a given server. In practice, users will want it to vary by dataset: there is a move to have dataset citations list the people who did the work, and an institutional server will provide access to datasets with different authors.
licensing information/restrictions
URL reference to a license.
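
To make the six items above concrete, here is a minimal sketch of how a DAPResponseProvenance element might be assembled. Only the element name DAPResponseProvenance comes from this proposal. The child element names (Origin, Source, Processing, Software, Citation, License), their attributes, and all of the example values are assumptions made for illustration; they are not part of the DMR schema, and the sketch does not use PROV-O notation.

```python
import xml.etree.ElementTree as ET


def build_response_provenance(url, sources, access, software, citation, license_url):
    """Build a hypothetical DAPResponseProvenance element.

    Child element and attribute names are illustrative assumptions,
    not part of any published DAP4 schema.
    """
    root = ET.Element("DAPResponseProvenance")
    # 1. where it came from: the URL used to access the resource
    ET.SubElement(root, "Origin", href=url)
    # 2. the source data set(s): filenames only, no pathnames
    for filename in sources:
        ET.SubElement(root, "Source", name=filename)
    # 3. processing that happened: the kind of access (DMR, Data, ...)
    ET.SubElement(root, "Processing", access=access)
    # 4. the software version: one or more (name, x.y.z) pairs
    for name, version in software:
        ET.SubElement(root, "Software", name=name, version=version)
    # 5. how to cite the dataset: nominally a DOI or instructions
    ET.SubElement(root, "Citation").text = citation
    # 6. licensing information: a URL reference to a license
    ET.SubElement(root, "License", href=license_url)
    return root


# All values below are placeholders, not real resources.
element = build_response_provenance(
    url="http://example.org/opendap/data/sst.nc.dmr",
    sources=["sst.nc"],
    access="DMR",
    software=[("ExampleServer", "1.2.3")],
    citation="doi:10.0000/example",
    license_url="http://example.org/license",
)
xml_text = ET.tostring(element, encoding="unicode")
print(xml_text)
```

Repeating the Source and Software elements keeps the aggregation extension (listing every file touched) and multi-component version lists possible without changing the overall shape of the element.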

Rationale for the solution

Discussion