BES - Modules - The HDF4 Handler

From OPeNDAP Documentation
⧼opendap2-jumptonavigation⧽

Kinds of files the handler will serve

This release of the server supports HDF4.2 and can read any file readable using that version of the API. It also supports reading/parsing HDF-EOS attribute information and provides some special mappings for HDF-EOS files depeding on the handler's build options.

Mappings between the HDF4 data model and DAP2 data types

SDS
This is mapped to a DAP2 Grid (if it has a dimension scale) or Array (if it lacks a dim scale).
Raster image
This is read via the HDF 4.0 General Raster interface and is mapped to Array. Each component of a raster is mapped to a new dimension labeled accordingly. For example, a 2-dimensional, 3-component raster is mapped to an m x n x 3 Array.
Vdata
This is mapped to a Sequence, each element of which is a Structure. Each subfield of the Vdata is mapped to an element of the Structure. Thus a Vdata with one field of order 3 would be mapped to a Sequence of 1 Structure containing 3 base types. Note: Even though these appear as Sequences, the data handler does not support applying relational constraints to them. You can use the array notation to request a range of elements.
Attributes
HDF attributes on SDS, rasters are straight-forwardly mapped to DAP attributes (HDF doesn't yet support Vdata attributes). File attributes (both SDS, raster) are mapped as attributes of a DAP variable called "HDF_GLOBAL" (by analogy to the way DAP handles netCDF global attributes, i.e., attaching them to "NC_GLOBAL").
Annotations
HDF file annotations mapped in the DAP to attribute values of type "String" attached to the fake DAP variable named "HDF_ANNOT". HDF annotations on objects are currently not read by the server.
Vgroups
Vgroups are straight-forwardly mapped to Structures.

Mappings for the HDF-EOS data model

This needs to be documented.

Special characters in HDF identifiers

A number of non-alphanumeric characters (e.g., space, #, +, -) used in HDF identifiers are not allowed in the names of DAP objects, object components or in URLs. The HDF4 data handler therefore deals internally with translated versions of these identifiers. To translate the WWW convention of escaping such characters by replacing them with "%" followed by the hexadecimal value of their ASCII code is used. For example, "Raster Image #1" becomes "Raster%20Image%20%231". These translations should be transparent to users of the server (but they will be visible in the DDS, DAS and in any applications which use a client that does not translate the identifiers back to their original form).

Known problems

Handling of floating point attributes

Because the DAP software encodes attribute values as ASCII strings there will be a loss of accuracy for floating point attributes. This loss of accuracy is dependent on the version of the C++ I/O library used in compiling/linking the software (i.e., the amount of floating point precision preserved when outputting to ASCII is dependent on the library). Typically it is very small (e.g., at least six decimal places are preserved).

Handling of global attributes

  • The server will merge the separate global attributes for the SD, GR interfaces with any file annotations into one set of global attributes. These will then be available through any of the global attribute access functions.
  • If the client opens a constrained dataset (e.g., in SDstart), any global attributes of the unconstrained dataset will not be accessible because the constraint creates a "virtual dataset" which is a subset of the original unconstrained dataset.

How to install CF-enabled HDF4 Handler correctly

The first step of using the HDF4 handler with CF option is to install the handler correctly because it has three different options. We'll call them default, generic, and hdfeos2 for convenience.

  • default: This option gives the same output as the legacy handler.
  • generic: This option gives the output that meets the basic CF conventions regardless of HDF4 and HDF-EOS2 products. Some HDF4 products can meet the extra CF conventions while most HDF-EOS2 products will fail to meet the extra CF conventions.
  • hdfeos2: This option treats HDF-EOS2 products differently so that their output follows not only the basic CF conventions but also the extra CF conventions. For HDF4 products, the output is same as the generic option.

Pick the right RPM instead of building from source

If you use Linux system that supports RPM package manager and have a super user privilege, the easiest way to install the HDF4 handler is using RPMs provided by OPeNDAP, Inc. website.

The OPeNDAP's download website provides two RPMs --- one with HDF-EOS and one without HDF-EOS. You should pick the one with HDF-EOS if you want to take advantage of the extra CF support provided by the handler. If you pick one without HDF-EOS, please make sure that the H4.EnableCF key is set "true" in h4.conf file. See section 3.1 below for the full usage.

Here are two basic commands for deleting and adding RPMs:

  • Remove any existing RPM package using 'rpm -e <package_name>'.
  • Install a new RPM package using 'rpm -i <package_name.rpm>'.

1) Download and install the latest "libdap", "BES", and "General purpose handlers (aka dap-server)" RPMs first from

  http://opendap.org/download/hyrax

3) Download and install the latest "hdf4_handler" RPM from

  http://opendap.org/download/hyrax

4) (Optional) Configure the handler after reading the section 3 below.

5) (Re)start the BES server.

  %/usr/bin/besctl (re)start

Build with the HDF-EOS2 library if you plant to support HDF-EOS2 products.

If you plan to build one instead of using RPMs and to support HDF-EOS2 products, please consider installing the HDF-EOS2 library first. Then, build the handler by specifying --with-hdfeos2=/path/to/hdfeos2-install-prefix during the configuration stage like below:

  ./configure --with-hdf4=/usr/local --with-hdfeos2=/usr/local/

Although the HDF-EOS2 library is not required to clean dataset names and attributes that CF conventions require, visualization will fail for most HDF-EOS2 products without the use of HDF-EOS2 library. Therefore, it is strongly recommended to use --with-hdfeos2 configuration option if you plan to serve NASA HDF-EOS2 data products. The --with-hdfeos2 configuration option will affect only the outputs of the HDF-EOS2 files including hybrid files, not pure HDF4 files.

As long as the H4.EnableCF key is set to be true as described in section 3.1 below, the HDF4 handler will generate outputs that conform to the basic CF conventions even though the HDF-EOS2 library is not specified with the --with-hdfeos2 configuration option. All HDF-EOS2 objects will be treated as pure HDF4 objects.

Please see the INSTALL document on step-by-step instruction on building the handler.

Configuration parameters

What are the CF conventions and how is it related to the new HDF4 handler?

Before we discuss the usage further, it's very important to know what the CF conventions are. The CF conventions precisely define metadata that provide a description of physical, spatial, and temporal properties of the data. This enables users of data from different sources to decide which quantities are comparable, and facilitates building easy-to-use visualization tools with maps in different projections.

Here, we define the two levels of meeting the CF conventions: basic and extra.

  • Basic: CF conventions have basic (syntactic) rules in describing the metadata itself correctly. For example, dimensions should have names; certain characters are not allowed; no duplicate variable dimension names are allowed.
  • Extra: All physical, spatial, and temporal properties of the data are correctly described so that visualization tools (e.g., IDV and Panoply) can pick them up and display datasets correctly with the right physical units. A good example is the use of "units" and "coordinates" attributes.

If you look at NASA HDF4 and HDF-EOS2 products, they are very diverse in self-describing data and fail to meet CF conventions in many ways. Thus, the HDF4 handler aims to meet the conventions by correcting OPeNDAP attribute(DAS)/description(DDS)/data outputs on the fly. Although we tried our best effort to implement the "extra" level of meeting the CF conventions, some products are inherently difficult to meet such level. In those cases, we ended up meeting the basic level of meeting the CF conventions.

BES Keys in h4.conf

You can control HDF4 handler's output behavior significantly by changing key values in a configuration file called "h4.conf".

If you used RPMs, you can find the h4.conf file in /etc/bes/modules/. If you built one, you can find the h4.conf file in {prefix}/etc/bes/modules.

The following 6 BES keys are newly added in the h4.conf file. The default configuration values are specified in the parentheses.

H4.EnableCF (true)

If this key's value is false, the handler will behave same as the default handler. The output will not follow basic CF conventions. Object and attribute names will not be corrected to follow the CF conventions. Most NASA products cannot be visualized by visualization tools that follow the CF conventions. Such tools include IDV and Panoply.

The rest of keys below relies on this option. This key must be set to be "true" to ensure other keys to be valid. Thus, this is the most important key to be turned on.

H4.EnableMODAPSFile(false)

By turning EnableMODAPSFile to be true, when HDF-EOS2 library is used, an extra HDF file handle(by calling SDstart) will be generated at the beginning of DAS,DDS and Data build. This may be useful for a server that mounts its data over the network. If you are not sure about your server settings, always leave it as false or comment out this key. By default this key is turned off.

H4.EnableSpecialEOS (true)

When turning on this key, the handler will handle AIRS level 3 version 6 products and MOD08_M3-like products in a speedy way by taking advantage of the special data structures in these two products. Using this key requires the use of HDF-EOS2 library now although HDF-EOS2 library will not be called. By turning on this key, potentially HDF-EOS2 files that provide dimension scales for all dimensions may also be handled quickly. By default, this key should be set to true.

H4.DisableScaleOffsetComp (true)

Some NASA HDF4(MODIS etc.) products don't follow the CF rule to pack the data. To avoid the confusion for OPeNDAP's clients , the handler may adopt the following two approaches:

  1. Apply the scale and offset computation to the individual data point if the scale and offset rule doesn't follow CF in the handler.
  2. If possible, transform the scale and offset rule to CF rule.

Since approach 1) may degrade the performance of fetching large size data by heavy computation, we recommend approach 2), which is indicated by setting this key to be true. By default, this key should always be true.

H4.EnableCheckScaleOffsetType (true)

By turning on this key, the handler will check if the datatype of scale_factorand offset is the same. This is required by CF. If they don't share the same datatype, the handler will make the data type of offset be the same as that of scale_factor.

Since we haven't found the data type inconsistencies of scale_factor and offset, in order not affect the performance,this key will be set to false by default.

H4.EnableHybridVdata (true)

If this key's value is false, additional Vdata such as "Level 1B Swath Metadta" in LAADS MYD021KM product will not be processed and visible in the DAS/DDS output. Those additional Vdatas are added directly using HDF4 APIs and HDF-EOS2 APIs cannot access them.

H4.EnableCERESVdata (false)

Some CERES products(CER_AVG,CER_ES4,CER_SRB and CER_ZAVG, see description in the HDFSP.h) have many SDS fields and some Vdata fields. Correspondingly, the DDS and DAS page may be very long. The performance of accessing such products with visualization clients may be greatly affected. It may potentially even choke netCDF java clients.

To avoid such cases, we will not map vdata to DAP in such products by default. Users can turn on this key to check vdata information of some CERES products. This key will not affect the access of other products.

H4.EnableVdata_to_Attr (true)

If this key's value is false, small vdata datasets will be mapped to arrays in DDS output instead of attributes in DAS.

If this key's value is true, vdata is mapped to attribute if there are less than or equal to 10 records.

For example, the DAS output of TRMM data 1B21 will show vdata as an attribute:

  DATA_GRANULE_PR_CAL_COEF {
       String hdf4_vd_desc "This is an HDF4 Vdata.";
       Float32 Vdata_field_transCoef -0.5199999809;
       Float32 Vdata_field_receptCoef 0.9900000095;
       Float32 Vdata_field_fcifIOchar 0.000000000, 0.3790999949, 0.000000000, 
       -102.7460022, 0.000000000, 24.00000000, 0.000000000, 226.0000000, 0.000000000, 
       0.3790999949, 0.000000000, -102.7460022, 0.000000000, 24.00000000, 0.000000000, 
       226.0000000;
   }

H4.EnableCERESMERRAShortName (true)

If this key's value is false, the object name will be prefixed by the vgroup path and the fullpath attribute will not be printed in DAS output. This key only affects NASA CERES and MERRA products we support.

For example, the DAS output for Region_Number dataset

    Region_Number {
        String coordinates "Colatitude Longitude";
        String fullpath "/Monthly Hourly Averages/Time And Position/Region Number";
   }

becomes

   Monthly_Hourly_Averages_Time_And_Position_Region_Number {
        String coordinates "Monthly_Hourly_Averages_Time_And_Position_Colatitude Monthly_Hourly_Averages_Time_And_Position_Longitude";
   }

in CER_AVG_Aqua-FM3-MODIS_Edition2B_007005.200510.hdf.

H4.DisableVdataNameclashingCheck (true)

If this key's value is false, the handler will check if there's any vdata that has the same name as SDS. We haven't found such a case in NASA products so it's safe to disable this to improve performance.

H4.EnableVdataDescAttr (false)

If this key's value is true, the handler will generate vdata's attributes. By default, it's turned off because most NASA hybrid products do not seem to store important information in vdata attributes. If you serve pure HDF4 files, it's recommended to turn this value to true so that users can see all data. This key will not affect the behavior of the handler triggered by the H4.EnableVdata_to_Attr key in section 3.3 except the vdata attributes of small vdatas that are mapped to attributes in DAS instead of arrays in DDS. That is, only attributes of small vdatas will be also turned off from the DAS output if this key is turned off, not the values of vdatas. If a vdata doesn't have any attribute or field attribute, the description

       String hdf4_vd_desc "This is an HDF4 Vdata.";

will not appear in the attribute for that vdata although the key is true. The attribute container of the vdata will always appear regardless of this key's value.

H4.EnableCheckMODISGeoFile (false)

For MODIS swath data products that use the dimension map, if this key's value is true and a MODIS geo-location product such as MOD03 is present and under the same directory as the swath product, the geolocation values will be retrieved using the geolocation fields in MOD03/MYD03 file instead of using the interpolation according to the dimension map formula.

We feel this is a more accurate approach since additional corrections may be done for geo-location values stored in those files [1] although we've done a case study that shows the differences between the interpolated values and the values stored in the geo-location file are very small.

For example, when the handler serves

       "MOD05_L2.A2010001.0000.005.2010005211557.hdf" 

file, it will first look for a geo-location file

       "MOD03.A2010001.0000.005.2010003235220.hdf" 

first from the SAME DIRECTORY where MOD05_L2 file exists.

Please note that the "A2010001.0000" in the middle of the name is the "Acquisition Date" of the data so the geo-location file name should have exactly the same string. Handler uses this string to identify if a MODIS geo-location file exists or not.

This feature works only with HDF-EOS2 MODIS products. It will not work on the pure HDF4 MODIS product like MOD14 that requires the MOD03 geo-location product. That is, putting the MOD03 file with MOD14 in the same directory will not affect the handler's DAS/DDS/DDX output of the MOD14 product.

[1] http://modis.gsfc.nasa.gov/data/dataprod/nontech/MOD0203.php

H4.CacheDir (no longer supported)

The HDF4 handler used to support caching its response objects, but that feature has been removed do to problems with it and datasets where multiple SDS objects had arrays with the same names. This parameter is now ignored. Note that no error message is generated if your h4.conf file includes this, but it's ignored by hyrax 1.7 and later.