BES File Out NetCDF: Difference between revisions

From OPeNDAP Documentation
Revision as of 19:39, 21 December 2008

General Questions and Assumptions

  • What version of netCDF will this support?

jimg 11:39, 21 December 2008 (PST) Initially we should support netCDF version 3

  • Should I traverse the data structure to see if there are any sequences?

jimg 17:53, 18 December 2008 (PST) Yes. An initial version should note their presence and add an attribute noting that they have been elided.
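
As a rough illustration of the traversal described above, here is a minimal Python sketch. It assumes variables are plain dictionaries with hypothetical "name", "type" and "children" keys (not libdap classes), and the attribute name elided_sequences is made up for the example.

```python
def split_sequences(variables, path=""):
    """Walk the variable tree; return (kept leaf names, elided Sequence names)."""
    kept, elided = [], []
    for var in variables:
        name = f"{path}.{var['name']}" if path else var["name"]
        if var["type"] == "Sequence":
            # Do not descend: the whole Sequence is left out of the file.
            elided.append(name)
        elif var["type"] == "Structure":
            k, e = split_sequences(var["children"], name)
            kept.extend(k)
            elided.extend(e)
        else:
            kept.append(name)
    return kept, elided


# Hypothetical dataset: one plain variable, one top-level Sequence,
# and one Structure that hides a Sequence inside it.
dataset = [
    {"name": "time", "type": "Float64"},
    {"name": "profiles", "type": "Sequence", "children": []},
    {"name": "station", "type": "Structure", "children": [
        {"name": "id", "type": "Int32"},
        {"name": "log", "type": "Sequence", "children": []},
    ]},
]

kept, elided = split_sequences(dataset)
# Hypothetical global attribute recording what was left out.
global_attrs = {"elided_sequences": ", ".join(elided)} if elided else {}
```

The point of the sketch is only that a Sequence nested inside a Structure is found as well, and that the output carries a note naming everything that was dropped.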

How to flatten hierarchical types

For a structure such as

Structure {
    Int x;
    Int y;
} Point;

represent that as

Point.x
Point.y

Explicitly including the dot seems ugly and like a kludge, but it means that the new variable name can be fed back into the server to get the data.

Because this is hardly a lossless transformation, we should also add an attribute that records the original name of the variable, along with information that this is the result of a flattening operation, that the parent variable was a Structure, Sequence or Grid, and that its name was xyz. Given that, it should be easy to sort out how to make a future request for the data in the translated variable.

This in some way obviates the need for the dot, but I think we should use that regardless.
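
The flattening scheme above could be sketched roughly as follows. This is a Python illustration only: the Var class is a hypothetical stand-in for a DAP variable, and the attribute names original_name, parent_name and parent_type are assumptions, not names used by any existing module.

```python
from dataclasses import dataclass, field


@dataclass
class Var:
    """Hypothetical stand-in for a DAP variable; not a libdap class."""
    name: str
    kind: str                                  # "Int", "Float64", "Structure", ...
    children: list = field(default_factory=list)


def flatten(var, prefix="", parent_kind=None):
    """Return (flat_name, attributes) pairs for every leaf under var."""
    full_name = f"{prefix}.{var.name}" if prefix else var.name
    if var.kind == "Structure":
        out = []
        for child in var.children:
            out.extend(flatten(child, full_name, var.kind))
        return out
    attrs = {}
    if prefix:
        # Record what was lost by flattening (attribute names are assumptions).
        attrs = {
            "original_name": var.name,
            "parent_name": prefix,
            "parent_type": parent_kind,
        }
    return [(full_name, attrs)]


point = Var("Point", "Structure", [Var("x", "Int"), Var("y", "Int")])
flat = flatten(point)
names = [name for name, _ in flat]
```

For the Point example, names holds Point.x and Point.y, and each entry carries attributes saying that its parent was a Structure named Point.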

Extra data to be included

For a file format like netCDF it is possible to include data about the source data using its original data model as expressed using DAP. We could then describe where each variable in the file came from. This would be a good thing if we can do it in a light-weight way. I think it would also be a good thing to add an attribute to each variable that names where in the original data it came from, so that client apps and users don't have to work too hard to sort out what has been changed to make the file.

Information About Specific Types

Structures

  • Flatten
  • Prepend the name of the structure and an underscore to the variable name. Keep track of nesting, as there might be embedded structures, grids, etc.

jimg 17:53, 18 December 2008 (PST) Use the procedure described above in How to flatten hierarchical types.

jimg 17:53, 18 December 2008 (PST) I would use a dot even though I know that dots in variable names are, in general, a bad idea. If we use underscores then it may be hard for clients to form a name that can be used to access values from a server based on the information in the file.
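
To make the dot versus underscore argument concrete, here is a small Python sketch. The helper names and the example.org URL are hypothetical; the request form (dataset URL, then .dods, then a projection after the question mark) follows the usual DAP2 convention. It assumes the individual component names contain no dots themselves.

```python
def flat_name(path, sep="."):
    """Join the components of a nested variable's path into one name."""
    return sep.join(path)


def projection_url(dataset_url, name):
    """Build a DAP2 data request that projects a single variable."""
    return f"{dataset_url}.dods?{name}"


path_a = ["a_b", "c"]   # variable c inside structure a_b
path_b = ["a", "b_c"]   # variable b_c inside structure a

# With underscores both paths collapse to the same flat name ...
underscore_clash = flat_name(path_a, "_") == flat_name(path_b, "_")
# ... while with dots they stay distinct.
dot_clash = flat_name(path_a, ".") == flat_name(path_b, ".")

# A dotted name can be used directly as a projection in a new request.
url = projection_url("http://example.org/opendap/data.nc",
                     flat_name(["Point", "x"]))
```

With underscores a client cannot tell a_b_c apart without extra information, whereas the dotted name doubles as the projection clause of a follow-up request.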

Grid

  • Flatten.
  • Use the name of the grid for the array of values
  • prepend the name of the grid plus an underscore to the names of each of the map vectors.

jimg 11:31, 21 December 2008 (PST) A more sophisticated version might look at the values of two or more grids that use the same names and have the same type (e.g., Float64 lon[360]) and if they are the same, make them shared dimensions.
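
One way the shared-dimension idea might look, as a hedged Python sketch: grids are hypothetical dictionaries, and the rule used here (share a map only when every grid that has a map of that name agrees on both type and values, otherwise fall back to the grid-name prefix) is an assumption about how to avoid name clashes, not a settled design.

```python
from collections import defaultdict


def plan_grid_dimensions(grids):
    """Map each grid name to the output names of its map vectors."""
    # Collect (type, values) for every map, grouped by map name.
    seen = defaultdict(list)
    for grid in grids:
        for m in grid["maps"]:
            seen[m["name"]].append((m["type"], tuple(m["values"])))

    plan = {}
    for grid in grids:
        dims = []
        for m in grid["maps"]:
            entries = seen[m["name"]]
            # Shareable only if used by 2+ grids and identical everywhere.
            shareable = len(entries) > 1 and len(set(entries)) == 1
            dims.append(m["name"] if shareable
                        else f"{grid['name']}_{m['name']}")
        plan[grid["name"]] = dims
    return plan


# Hypothetical grids: same lon vector, different lat vectors.
grids = [
    {"name": "sst", "maps": [
        {"name": "lat", "type": "Float64", "values": [-90.0, 0.0, 90.0]},
        {"name": "lon", "type": "Float64", "values": [0.0, 120.0, 240.0]},
    ]},
    {"name": "wind", "maps": [
        {"name": "lat", "type": "Float64", "values": [-45.0, 0.0, 45.0]},
        {"name": "lon", "type": "Float64", "values": [0.0, 120.0, 240.0]},
    ]},
]

plan = plan_grid_dimensions(grids)
```

Here lon becomes one shared dimension, while the two lat vectors differ and so keep their grid-name prefixes.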

Array

  • write_array appears to be working just fine.
  • What if it is an array of complex types?

Sequences

  • For now throw an exception.

jimg 11:31, 21 December 2008 (PST) I think the initial version should elide these, because there are important cases where they appear as part of a dataset but are not the main part. We can represent these as arrays easily in the future.

jimg 11:39, 21 December 2008 (PST) To translate a Sequence, there are several cases to consider:

  1. A Sequence of simple types only (which means a one-level sequence): translate to a set of arrays using a name-prefix flattening scheme.
  2. A nested Sequence (otherwise with only simple types) should first be flattened to a one-level Sequence, which is then flattened as in case 1.
  3. A Sequence with a Structure or Grid should be flattened by recursively applying the flattening logic to the components.
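
Cases 1 and 2 above might be sketched like this in Python. A Sequence is modelled as a list of row dictionaries and a nested Sequence as a field whose value is itself such a list; both are assumptions made for the example. Repeating the outer values once per inner row is also an assumption, since the notes do not say how the one-level form should be built, and an outer row with an empty inner Sequence simply produces no output rows here. Case 3 (a Structure or Grid inside a Sequence) is not covered.

```python
def flatten_rows(rows, prefix=""):
    """Turn a (possibly nested) sequence into one-level rows with dotted keys."""
    flat = []
    for row in rows:
        partial = [{}]
        for key, value in row.items():
            name = f"{prefix}.{key}" if prefix else key
            if isinstance(value, list):
                # Nested sequence: repeat the outer fields once per inner row.
                inner = flatten_rows(value, name)
                partial = [{**p, **i} for p in partial for i in inner]
            else:
                partial = [{**p, name: value} for p in partial]
        flat.extend(partial)
    return flat


def to_arrays(seq_name, rows):
    """Flatten a sequence into one array per field, named with a prefix."""
    arrays = {}
    for row in flatten_rows(rows, seq_name):
        for name, value in row.items():
            arrays.setdefault(name, []).append(value)
    return arrays


# Hypothetical Sequence "casts" with a nested Sequence "profile".
casts = [
    {"station": "A", "profile": [{"depth": 0, "temp": 20.5},
                                 {"depth": 10, "temp": 18.0}]},
    {"station": "B", "profile": [{"depth": 0, "temp": 21.0}]},
]

arrays = to_arrays("casts", casts)
```

Each field ends up as an equal-length array whose name carries the Sequence name as a prefix, using the dot convention from the flattening section.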

Attributes

  • Global Attributes?
    • Include a global attribute holding the URL that generated the new file
  • Variable Attributes
    • What about attributes for structures? Should these attributes be created for each of the variables in the structure? So, if there is a structure a with variables v1 and v2 then the attributes for a will be attributes for a_v1 and a_v2? Or are there attributes for each of the variables in the structure? Or both.
    • For multi-dimensional datasets there will be a structure for each container, and each of these containers will have global attributes.