OPeNDAP Developer's Workshop 2010

From OPeNDAP Documentation

Where & When

  • Location: Troy, NY
  • Dates: 6-8 Oct

Meeting modus operandi

  • mockups
  • whiteboard designs
  • use cases
  • best to exploit face-to-face for what it affords.

Post Meeting Task Prioritization

NcML - priorities - ordered

  1. Caching - looking up the dimension sizes of granules automatically offline and saving them
  2. Autogenerated metadata - this is the source file, this is the *, and adding a container
  3. Caching - support for HTTP/1.1 caching - NcML/BES changes to return LMT, size, etc. (Requires new BES release)
  4. FMRC aggregations (Prereq: support for nesting aggregations?)

Other NCML Handler things we need to keep in mind

  1. Aggregations - Add ability to use HTTP/URLs in NcML
  2. For full support for THREDDS(?) for existing THREDDS catalogs we will need to push NcML from the OLFS (which reads the catalogs) to the BES (which performs the aggregations via the NcML handler).

BES - priorities

  1. Support for registering delegates for file info requests (LMT, size, etc) [Prereq for HTTP level caching of NcML]
  2. Module versioning
  3. Server-side functions as loadable modules for the BES

Admin Interface

  1. Work with Tim - Write the requirements.

Other priorities

  1. Easier Install - binary builds, meta packages
  2. SQL Handler
  3. Web Site (Drupal??)

Other things to keep in mind

  1. Rewrite BES in Java



Agenda



Wednesday

0830 - Coffee

Server Side Functions

  • Registration
    • modularized
    • each function -> class
    • version
      • can you handle this
      • service
      • how do we package this
  • Discovery
    • Ask the OLFS/BES what functions for dataset
      • what functions
      • capabilities
  • Format
    • function(p1,p2,p3,p4...)
  • Usage
    • Run by handler during loading - selection
    • Run during constraint parsing
    • Run during transmission/serialization (constraint evaluation)
    • What about functions like "version"??


void
function_version(int, BaseType *[], DDS &, BaseType **btpp);

typedef void(*bool_func)(int argc, BaseType *argv[], DDS &dds, bool *result);
typedef void(*btp_func)(int argc, BaseType *argv[], DDS &dds, BaseType **btpp);
typedef void(*proj_func)(int argc, BaseType *argv[], DDS &dds, ConstraintEvaluator &ce);


Use Cases
  • Scientist has a dataset and wants to discover what functions are available to apply to that dataset
  • Data provider has a set of server-side functions that they wish to provide for data users. They want to add these functions to the OPeNDAP server via dynamically loaded modules
  • Scientist has a dataset and wants to apply geogrid function to that dataset
  • Scientist wants to discover the shape and size (metadata) of a particular function against a dataset
Plan: Define a class/hierarchy
  • Define a set of classes for server-side functions
  • Have the BES instantiate those from .so files
  • Those instances are then passed into a ConstraintEvaluator instance which is then passed to a handler
  • The BES conf file(s) can include information so that some functions are always loaded and some are loaded only for specific handlers (e.g., FreeForm).
  • Take the server side functions out of libdap and add to BES modules
  • Add the server side functions to server side class
  • These classes will be able to handle things like get version, given a ddx(?) can this function be run against it (return true/false)
  • At load time the classes are created within the BES and the functions are registered with libdap. This maintains binary compatibility.
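
The class plan above can be sketched roughly as follows. This is only an illustration under stated assumptions: `ServerFunction`, `FunctionRegistry`, `GeoGridFunction` and the stand-in `Dataset` struct are hypothetical names, not libdap or BES classes, and the rule that geogrid needs a variable called "lat" is invented purely to make the "can this function be run against it" check concrete.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Stand-in for a dataset description (a DDS/DDX in the real server).
struct Dataset {
    std::vector<std::string> variable_names;
};

// Hypothetical base class: one subclass per server-side function.
class ServerFunction {
public:
    virtual ~ServerFunction() = default;
    virtual std::string name() const = 0;
    virtual std::string version() const = 0;
    // "Given a dataset, can this function be run against it?"
    virtual bool can_operate_on(const Dataset &ds) const = 0;
};

// Illustrative function; the "lat" requirement is an assumption.
class GeoGridFunction : public ServerFunction {
public:
    std::string name() const override { return "geogrid"; }
    std::string version() const override { return "1.0"; }
    bool can_operate_on(const Dataset &ds) const override {
        for (const auto &v : ds.variable_names)
            if (v == "lat") return true;
        return false;
    }
};

// Hypothetical registry that a loadable module would fill at load time.
class FunctionRegistry {
public:
    void add(std::unique_ptr<ServerFunction> f) {
        assert(f != nullptr);
        std::string key = f->name();
        functions_[key] = std::move(f);
    }
    // Returns nullptr when no function with that name is registered.
    const ServerFunction *find(const std::string &name) const {
        auto it = functions_.find(name);
        return it == functions_.end() ? nullptr : it->second.get();
    }
    // Discovery: which registered functions apply to this dataset?
    std::vector<std::string> functions_for(const Dataset &ds) const {
        std::vector<std::string> names;
        for (const auto &entry : functions_)
            if (entry.second->can_operate_on(ds)) names.push_back(entry.first);
        return names;
    }
private:
    std::map<std::string, std::unique_ptr<ServerFunction>> functions_;
};
```

A module's init code would call `add()` once per function; the discovery use case then reduces to `functions_for(dataset)`.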


Server Administration

Admin interface
  • admin connects to a listener
  • can restart a listener
  • can restart the BES with ADMIN
  • turn on/off debugging through admin interface
  • keep track of how many connections are allowed/active
  • re-write besdaemon/beslistener (2-3 hrs)
  • allowed size of response from BES (1 hr)
http://docs.opendap.org/index.php/Hyrax_Admin_Interface
BES design changes
  • drop the besdaemon from the architecture
  • listener is started instead of daemon
  • listener keeps track of connections made and socket connected to
  • listener keeps track of connections dropped, removes from list
  • limit the number of open connections
  • besdaemon will be able to have both hard and graceful restart
  • Out of band comm will enable access to logs and conf files in addition to the restarts
  • capability to send back configuration information
  • capability to receive new configuration information and write to disk and reload
  • we don't sweat the fact that the logs are shared - we can build a fancy interface out at the servlet level to filter by specific client using the PID info in the log.
  • Changes in the conf are only used when a process is started.
  • Soft shutdown, listener sends certain signal to child, when done handling current request, go down
  • Hard shutdown, listener sends certain signal to child, go down now
  • capability to stream back the log file
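
As a rough sketch of the connection bookkeeping described above (track connections made, remove dropped ones, limit the number open), assuming a hypothetical `ConnectionTable` class keyed by socket descriptor. Nothing here is existing BES code, and the soft/hard shutdown signals are left out because the notes do not say which signals are used.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical table a listener could keep for its open connections.
class ConnectionTable {
public:
    explicit ConnectionTable(std::size_t max_open) : max_open_(max_open) {}

    // Record a new connection; refuse it when the limit is reached
    // or when the descriptor is already tracked.
    bool add(int socket_fd) {
        assert(socket_fd >= 0);
        if (open_.size() >= max_open_) return false;
        return open_.insert(socket_fd).second;
    }

    // Remove a dropped connection; false if it was not tracked.
    bool drop(int socket_fd) { return open_.erase(socket_fd) == 1; }

    std::size_t active() const { return open_.size(); }
    std::size_t allowed() const { return max_open_; }

private:
    std::size_t max_open_;
    std::set<int> open_;
};
```

The `active()`/`allowed()` pair is what an admin interface would read to report how many connections are allowed and active.
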
Authentication and Authorization
  • Will people be authenticating through ESG to get to Hyrax? Or will Hyrax need to have an authentication piece? Authorization to ESG AuthZ.
  • encrypting the data? Is it through the BES or through the middle tier? If middle tier then no need to encrypt the information through the BES.
  • Will client applications need to allow login to get data? Scientist gets a URL back to access data through OPeNDAP. To get that data, do they have to be logged in? Authorization needs to happen.
  • Certificate is on the client disk; when a request is made within a MATLAB client, it grabs the certificate and sends it along with the data URL
Throttling the response
  • Function added to DDS and base types to be able to pre-compute the response size based on projections and selections
  • Should also work for sequences, at least be able to say that you can have x number of rows returned.
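
A minimal sketch of pre-computing a response size from a projection, assuming DAP2-style `[start:stride:stop]` hyperslabs with inclusive bounds. The names `Slice`, `projected_bytes`, `max_rows` and `fits` are hypothetical, and the estimate covers only the raw values, ignoring any encoding overhead.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One dimension of a projection: [start:stride:stop], stop inclusive.
struct Slice {
    std::uint64_t start;
    std::uint64_t stride;
    std::uint64_t stop;
};

// Number of elements selected along one dimension.
std::uint64_t slice_length(const Slice &s) {
    assert(s.stride > 0 && s.stop >= s.start);
    return (s.stop - s.start) / s.stride + 1;
}

// Bytes of raw values for a projected array.
std::uint64_t projected_bytes(const std::vector<Slice> &dims,
                              std::uint64_t element_bytes) {
    std::uint64_t count = 1;
    for (const Slice &d : dims) count *= slice_length(d);
    return count * element_bytes;
}

// Sequences have no fixed length, so cap the number of rows instead.
std::uint64_t max_rows(std::uint64_t row_bytes, std::uint64_t limit_bytes) {
    assert(row_bytes > 0);
    return limit_bytes / row_bytes;
}

// True when a response of this size is within the configured limit.
bool fits(std::uint64_t response_bytes, std::uint64_t limit_bytes) {
    return response_bytes <= limit_bytes;
}
```

For example, a 4-byte variable projected as `[0:1:9][0:2:9]` selects 10 x 5 elements, so the estimate is 200 bytes.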

Lunch

James : Pizza Margarita
Dan : Corned Beef
Patrick : Veggy Pizza
Michael : Chikin Sandwich
Nathan : Turkey Club Sandwich

Easier install for Hyrax

  1. Modify nightly build to use shrew
  2. Build RPMs from the NB shrew
    1. Add dependencies to all RPM spec files
  3. Modify all Makefiles to support pkg/dmg binaries
  4. Integrate into NB on OS/X Server VMs
  5. build meta package for OS/X using above
  6. Run RPM/Linux NB on VMs too
  7. Test both RPMs and pkg/dmgs on completely clean VMs

SQL Handler

  1. Add to README and INSTALL; fill these in and make them standard WRT the other README/INSTALL files.
  2. Add to the README so that there is a simple 'How To' for the server's configuration and then write stuff up for the docs wiki.
  3. Look at SQL Handler requirements in docs wiki and match up to current developed functionality
  4. Getting attributes

JGOFS

  • Still a lot of organizations using jgofs and wanting opendap access
  • Current library is very difficult to work with and still uses fork/exec for the methods
  • Could take the library and convert to use the autotools
    • Also change from fork and exec to dynamically loaded module
  • Or be a pass-thru/gateway/proxy module, like WCS, and just pass the request back to the proper JGOFS server that is already running.
    • Figure out how to return catalog information from the jgofs servers
    • How do we get attributes/metadata...

Active file system

A filesystem that creates a "signal" whenever something changes.

In many cases this boils down to caching binary objects in/around the BES




1800 - Dinner

Mmmmmmm Irish/Mexican




Thursday

0830 - Coffee

Mmmmmm that was tasty...

NcML Ingestion

  • Sending in NcML from a client
  • Use a proxy server to shield clients from the mechanics
    • The aggregations need to persist and at least some of this must be at the origin server
    • The NcML is held at a proxy server and sent to the origin server where it's validated, etc. and the origin server (must be a Hyrax) returns an id that can be used to reference the aggregation
    • The proxy retains that id and takes requests against it.
  • Can we simplify this and remove the proxy given that the aggregation has to persist?
  • How long does the aggregation persist? That is, what criteria are used to 'flush' an 'uploaded' aggregation.
Why we want this?
  • EML --> Building aggregations at a client
  • TDS migration
Design mechanics for this feature
  • Errors: How to get information about bad ncml back to a client
  • Assume that the server (Hyrax) is modified to use POST for this feature
  • Handle errors by returning text as built by the NCML handler
  • OLFS currently has no POST termini; we could make an explicit terminus for this, or we could make a special servlet for this feature
Alternatives
  • Two things: We add support for HTTP access to the ncml handler - this is how we handle aggregations of remote entities
  • For the NcML migration we modify the BES/NcML commands to support uploading NCML from the OLFS to a specific BES/NCML-handler instance
  • Advantages: This solves the first problem w/o any changes to existing origin servers at the cost of poor access to real-world aggregations. Actually useful aggregations would be sent to the 'origin server' by hand between people.
  • There seem to be no real increased costs for NCML migration.
To support external access
  • NcML handler becomes a HTTP client
  • Might require remote URLs to be listed explicitly? Performance implemented

BES Internal Caching

Caching compressed data
  • The BES caching issue is only regarding the compressed files, their decompression and maintaining a restricted 'cache' size.
  • This caching scheme needs to include a two stage cache where a single space with no size limit is used for at most one item. This is used for the actual decompression phase. Only one writer may access this at any time.
  • A second space is used to store N-bytes of decompressed files. This space must lock all accesses during a write (and must block on both reads and writes). Call this the 'main cache.'
  • The main cache must store spin locks for read access to all of the cache items.
  • Modify access to items in the cache so that they are accessed using an object such that its dtor updates the spin lock (so that the spin lock is managed even when exceptions are thrown).
  • The object (above) also provides access to a path to the cached item; this path is accessed and passed to the handlers.
  • See libdap's HTTPCache and Response classes for an example implementation
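
The accessor-object idea above could look something like the sketch below. It is an assumption-laden illustration: `CacheEntry`, `CachedItem` and `can_purge` are hypothetical names, and an atomic reader count stands in for the per-item spin lock mentioned in the notes. It is not modeled on the actual libdap HTTPCache code.

```cpp
#include <atomic>
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical main-cache entry: a decompressed file plus a count of
// active readers (used here in place of a spin lock).
struct CacheEntry {
    std::string path;
    std::atomic<int> readers{0};
};

// RAII accessor: holding one marks the entry as in use. The destructor
// releases it, even when an exception unwinds the stack.
class CachedItem {
public:
    explicit CachedItem(CacheEntry &e) : entry_(e) { ++entry_.readers; }
    ~CachedItem() {
        int before = entry_.readers.fetch_sub(1);
        assert(before > 0);
        (void)before;  // silence unused warning when NDEBUG is defined
    }
    CachedItem(const CachedItem &) = delete;
    CachedItem &operator=(const CachedItem &) = delete;

    // Path handed on to the data handlers.
    const std::string &path() const { return entry_.path; }

private:
    CacheEntry &entry_;
};

// An entry may be purged only when nobody is reading it.
bool can_purge(const CacheEntry &e) { return e.readers.load() == 0; }

// Stand-in for a handler that fails part-way through a read.
void failing_handler(CacheEntry &e) {
    CachedItem item(e);
    throw std::runtime_error("read failed: " + item.path());
}
```

After `failing_handler` throws, the reader count is back at zero, which is the property the destructor-based design is meant to guarantee.
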
Caching binary objects
  • While it's desirable, it's also really hard to retrofit
  • To serialize the DDS we would have to serialize BaseType and all of the concrete classes that are derived from it (NCByte, ...)
  • It's not clear that caching these kinds of objects will provide a huge benefit for users/clients.
The NcML handler will have to do its own caching and add information to HTTP caching GETs
  • It will need to support the Last Modified Time of aggregations if we're going to get caching systems like Squid working correctly. Actually, for ANY ncml file, not just aggregations, we need to give the BES a proxy for the modification time. The wrapped dataset could change, but not the NcML, which implies the cache is invalidated but not reflected by the ncml file time.
  • This will need to be a feature added to the BES for the showInfo command. Basically, it will need to allow the modules to register a delegate function to fill in the info response for a given file. This includes the last modified time as well as the size of the file, since the NcML does not reflect the size of the actual wrapped dataset. The default response (for no delegate registered) will be the current BES behaviour of filling in the file mod time and size.
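
One possible shape for the delegate registration described above, as a hedged sketch only. `FileInfo`, `InfoDelegate` and `InfoDelegateRegistry` are hypothetical names, and keying delegates on the file extension is an assumption made for brevity; the real BES may well select the module differently.

```cpp
#include <cassert>
#include <ctime>
#include <functional>
#include <map>
#include <string>
#include <utility>

// The two pieces of file info the notes call out.
struct FileInfo {
    std::time_t last_modified;
    unsigned long long size_bytes;
};

using InfoDelegate = std::function<FileInfo(const std::string &path)>;

// Hypothetical registry: modules register a delegate; files without
// one fall back to the default (plain file mod time and size).
class InfoDelegateRegistry {
public:
    explicit InfoDelegateRegistry(InfoDelegate fallback)
        : fallback_(std::move(fallback)) {
        assert(static_cast<bool>(fallback_));
    }

    // Assumption: delegates are keyed by extension, e.g. ".ncml".
    void register_delegate(const std::string &extension, InfoDelegate d) {
        delegates_[extension] = std::move(d);
    }

    FileInfo info_for(const std::string &path) const {
        std::string::size_type dot = path.rfind('.');
        if (dot != std::string::npos) {
            auto it = delegates_.find(path.substr(dot));
            if (it != delegates_.end()) return it->second(path);
        }
        return fallback_(path);
    }

private:
    InfoDelegate fallback_;
    std::map<std::string, InfoDelegate> delegates_;
};
```

An NcML delegate would report the newest modification time among the NcML file and the datasets it wraps, plus the size of the wrapped data, so that HTTP caches see a change when the underlying data changes.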

Lunch

James : Veggie Burger
Dan : Something tasty
Patrick : Veggy Burger
Michael : Corned Beef on Rye
Nathan : Chikin Salad

BES module architecture

  • Show Version: This is a built-in BES command
  • <showVersion .../>
  • show version --> BESVersionResponseHandler (if this mapping exists, then we can build a response for this command)
  • a ResponseHandler knows how to make the response object and how to get it filled in (and in this case it does fill it in a little bit)
  • So you do get a response from this command with no modules loaded.
  • Sequence:
    • BESVersionResponseHandler
      • Makes response object
      • knows how to get it filled in
        • Go to each request handler and see if they know how to fill in version
  • Now... load a module such as the Dap Module or NCRequestHandler, which knows how to fill in a version response.
  • It (NCRequestHandler) registers this capability
  • Response handlers are associated with commands (like 'show version' or 'get')
  • Request handlers are associated with modules (like things that read files or process DAP object)
  • <get type="dds" definition="d" /> (BES command)
  • The XML command that reads the above turns this into: getDDS
  • This calls the DapDDSResponseHandler.
    • This knows how to make the response object (which is a DDS)
    • It goes to each container in the definition (d in this ex) and goes to the matching Request Handler and asks for the dds
    • The Response handler knows how to send the response
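
The response-handler/request-handler split above can be illustrated with a toy dispatcher. This is a sketch under assumptions, not the BES API: `Container`, `RequestHandlerList`, `show_version` and `get_dds` are simplified hypothetical stand-ins, and responses are plain strings rather than DDS objects.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// A container pairs a data source with the module type that reads it.
struct Container {
    std::string path;
    std::string type;  // e.g. "nc"
};

// A module registers one fill function per response kind it supports.
using FillFunc = std::function<std::string(const std::string &path)>;

class RequestHandlerList {
public:
    void add(const std::string &module, const std::string &kind, FillFunc f) {
        assert(static_cast<bool>(f));
        handlers_[module][kind] = std::move(f);
    }
    // nullptr when the module is unknown or lacks that capability.
    const FillFunc *find(const std::string &module,
                         const std::string &kind) const {
        auto m = handlers_.find(module);
        if (m == handlers_.end()) return nullptr;
        auto k = m->second.find(kind);
        return k == m->second.end() ? nullptr : &k->second;
    }
    // Every registered function of one kind, in module-name order.
    std::vector<const FillFunc *> all(const std::string &kind) const {
        std::vector<const FillFunc *> out;
        for (const auto &m : handlers_) {
            auto k = m.second.find(kind);
            if (k != m.second.end()) out.push_back(&k->second);
        }
        return out;
    }
private:
    std::map<std::string, std::map<std::string, FillFunc>> handlers_;
};

// 'show version': built-in part first, then every module that can add to it.
std::string show_version(const RequestHandlerList &list) {
    std::string response = "bes-core\n";
    for (const FillFunc *fill : list.all("version"))
        response += (*fill)("") + "\n";
    return response;
}

// 'get dds': visit each container and ask the matching request handler.
std::string get_dds(const RequestHandlerList &list,
                    const std::vector<Container> &definition) {
    std::string response;
    for (const Container &c : definition) {
        const FillFunc *fill = list.find(c.type, "dds");
        if (!fill) throw std::runtime_error("no handler for type " + c.type);
        response += (*fill)(c.path);
    }
    return response;
}
```

With no modules registered, `show_version` still returns the built-in part, which mirrors the point above that the command answers even with no modules loaded.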

Now, for the caching....

  • < UpdateCache/>
  • NcML_Module
    • Init
      • updateCache --> XML___
      • updateCache --> UCResponseHandler




1800 - Dinner




Friday

0830 - Coffee

Module check

Making sure a loaded module matches the version of BES and any required modules


SSL authentication/authorization issues

There is currently a problem with SSL authentication and with keeping the SSL channel open for secure communication


Multiple catalogs

BES and OLFS integration


1230 - Lunch

Strategy Breakout (cont)

1800 - Dinner and Departure