AMQP Support in Hyrax

From OPeNDAP Documentation

Revision as of 22:59, 8 December 2009

Figure 1. Hyrax Architecture at a High-level

Support for AMQP in Hyrax is best handled by adding a new front-end to the server that can act as an AMQP client, reading information from an AMQP queue. Hyrax has an overall architecture that already supports this. Figure 1 shows a high-level view of the Hyrax architecture. The BES is the part of Hyrax that builds the bodies of a DAP response. The front-end (the OLFS) contains a set of handlers which respond to requests made using HTTP. Based on the request, the OLFS sends commands over a stateful connection to the BES asking it to build the correct response. Generally, the OLFS will have to parse the request URL and pass information from that URL to the BES. Even though the OLFS is designed to support several different 'protocols' like DAP or THREDDS, it can respond to HTTP only (it is a Java Servlet; see Server Dispatch Operations for information about the OLFS design, implementation and extension capabilities). Thus, it makes the most sense to build a new front-end dedicated to AMQP.

While the architecture chosen to add support for AMQP is very important, other considerations are also critical to the success of the overall effort to run DAP over AMQP. One is the mapping of the different DAP versions onto AMQP. Since DAP was designed with HTTP in mind, how DAP and AMQP can best be matched merits serious consideration. This will entail looking at the current DAP implementation along with the evolving DAP version 4 specification and its implementation.

Proposed Architecture for AMQP Support in Hyrax

Hyrax with AMQP

Figure 2 shows how an AMQP module could be added to work with Hyrax. The diagram implies that an actual installation could support both DAP/HTTP and DAP/AMQP interactions using one BES. That might be true, or it might not, depending on how the connection pooling is handled by the OLFS and the new AMQP front-end. The DAP/SOAP interface of the OLFS is quite different from the other three interfaces shown in the diagram because the SOAP messaging software uses the request document body instead of the URL, while the other three interfaces/protocols all use the URL and ignore the request document body. However, it's still part of the OLFS because Hyrax uses SOAP messaging over HTTP.

One option we should consider is supporting DAP over AMQP using SOAP.

Sharing one BES between two front-ends

Because there is no limit within the BES on the number of beslistener processes created, any number of front ends can make connections to the BES and hold those connections in a pool without concern that the 'pool will fill up'. This is a result of the Unix fork/exec model and the architectural design of Hyrax, which places the limit on the number of outstanding BES connections within the OLFS. A second front end could establish its own set of connections to completely separate processes. In the OLFS, the software that manages the connections to the BES can be found in BES.java.

Where's the OLFS' 'main'?

To see where the OLFS starts running, look at ServletDispatch. This class' init() method is the first code run by the servlet engine when it starts.

How the OLFS connects to the BES

On start-up the OLFS makes a connection to the BES. When the OLFS is started, the BES Daemon (besdaemon) has already started and bound a well-known port (10002 by default; set in both the BES and OLFS configuration files). The OLFS starts when Tomcat starts or restarts the servlet and initially creates a pool for connections. These are really TCP socket connections to specific instances of the BES listener (beslistener). When the OLFS gets a request it needs to process using the BES, it checks this pool of connections and picks the next available one. If no connections are available, then a new connection is made unless the maximum number of allowed connections has already been reached. In the latter case the request for the next available connection blocks until there is an available connection. The maximum number of connections to the BES, which is really the maximum number of BES listeners (i.e., processes) to make, is set in the OLFS configuration file.

To see how the OLFS does this, look at BES.java, OPeNDAPClient.java, and NewPPTClient.java.

Important points:

  1. Because the OLFS would block indefinitely if all of the beslisteners get 'stuck', the OLFS uses an inactivity timeout to kill beslisteners. (This is not the case in Hyrax 1.5 but will be in Hyrax 1.6; the timeout is 300 seconds, hard coded, but it could be a configuration parameter.)
  2. The OLFS will dump connections from the pool after 2000 commands have been sent to a particular beslistener. This parameter is hard coded into the OLFS, but it could be read from the configuration file.
  3. The OLFS initially makes zero connections and it makes new connections only when a request for a connection indicates that no available connections are in the pool. Thus, even though the maximum number of connections is set at N, there will only be N instances of the beslistener and N entries in the pool of connections if there's a need for N simultaneous connections.
  4. The besdaemon and master beslistener do not know about the child processes that have been created.
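
The pooling behavior described above (connections made only on demand, a fixed maximum, blocking at the limit, and retirement after a fixed number of commands) can be sketched roughly as follows. This is a minimal illustration, not the code in BES.java: the class names BesConnection and BesConnectionPool are hypothetical, the sketch assumes one command is sent per check-out, and it does not model the inactivity timeout from point 1.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical stand-in for one socket connection to a beslistener. */
class BesConnection {
    int commandsSent = 0;
    boolean closed = false;

    void close() {
        closed = true;
    }
}

/** Sketch of a lazily filled, bounded pool; this is not the code in BES.java. */
class BesConnectionPool {
    private final int maxConnections; // N, read from the OLFS configuration file
    private final int maxCommands;    // 2000 in the text above
    private final Deque<BesConnection> idle = new ArrayDeque<>();
    private int open = 0;             // connections that currently exist

    BesConnectionPool(int maxConnections, int maxCommands) {
        this.maxConnections = maxConnections;
        this.maxCommands = maxCommands;
    }

    /** Return an idle connection, open a new one, or block at the limit. */
    synchronized BesConnection checkOut() {
        while (idle.isEmpty() && open >= maxConnections) {
            try {
                wait(); // released by notifyAll() in checkIn()
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        if (!idle.isEmpty()) {
            return idle.pollFirst();
        }
        open++; // point 3: connections are only made on demand
        return new BesConnection();
    }

    /** Record one completed command, then return or retire the connection. */
    synchronized void checkIn(BesConnection c) {
        c.commandsSent++;
        if (c.commandsSent >= maxCommands) {
            c.close(); // point 2: drop the connection after maxCommands commands
            open--;
        } else {
            idle.addLast(c);
        }
        notifyAll();
    }

    synchronized int openCount() {
        return open;
    }
}
```

A second front end could hold its own instance of such a pool, since each pool only limits the connections made by the front end that owns it.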

Abstracting the OLFS/BES connection logic

To see how to abstract the connection logic so that the OLFS' connection pooling and configuration logic can be reused with a different transport protocol, look at the NewPPTClient.java class. This class effectively implements a simple interface with the methods:

init
Make the object that holds state for the request
open
Connect to a new BES listener
send request
Given that a connection to a server exists, send a request
process response
Wait for, and then process, a response to a request
close
Deallocate resources associated with this connection

In the explanation above, I used the word connection but it is really a virtual connection. In Java the InetAddress and Socket classes abstract the operations of socket-based IPC. The actual transport can be TCP, UDP, ...

Todo: Extract an interface from this class, make this class implement that interface, and then write a second class that provides for TCP tunneling over AMQP (for example) as a second implementation of that interface. It may also be that RabbitMQ provides a tight enough integration with Java's IPC classes that a more straightforward implementation is possible.
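
As a rough sketch of what that interface might look like, the five operations listed above can be written as a Java interface. The name BesTransport, the method signatures, and the LoopbackTransport class are all assumptions made for illustration; they are not taken from NewPPTClient.java. The loopback class only echoes commands in memory and stands in for a real PPT-over-TCP or AMQP-based implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical interface covering the five operations listed above. */
interface BesTransport {
    void init();                      // make the object that holds request state
    void open();                      // connect to a new BES listener
    void sendRequest(String command); // send a request over an open connection
    String processResponse();         // wait for, then process, a response
    void close();                     // release resources for this connection
}

/** In-memory stand-in; a real class would wrap TCP or a tunnel over AMQP. */
class LoopbackTransport implements BesTransport {
    private Deque<String> pending; // null until init() is called
    private boolean connected = false;

    public void init() {
        pending = new ArrayDeque<>();
    }

    public void open() {
        if (pending == null) {
            throw new IllegalStateException("init() not called");
        }
        connected = true;
    }

    public void sendRequest(String command) {
        requireOpen();
        pending.addLast(command);
    }

    public String processResponse() {
        requireOpen();
        String command = pending.pollFirst();
        if (command == null) {
            throw new IllegalStateException("no outstanding request");
        }
        return "response to: " + command;
    }

    public void close() {
        connected = false;
        if (pending != null) {
            pending.clear();
        }
    }

    private void requireOpen() {
        if (!connected) {
            throw new IllegalStateException("not open");
        }
    }
}
```

With an interface like this, the pooling and configuration logic would depend only on BesTransport, so a second implementation could be swapped in without touching that logic.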

How the current OLFS dispatching works

The current OLFS implementation is tightly coupled with the Servlet classes, especially the HttpServletRequest and HttpServletResponse classes.

The OLFS is a dispatch handler, a giant switch statement that looks at each incoming request and shunts it to the correct software for processing. In many cases the BES is not actually involved in the processing or is involved in only a tangential way. In fact, however, the OLFS' dispatch code is more sophisticated than a switch statement. Instead it consists of two layers of processing where the outer layer is made up of a set of DispatchHandler classes. Each of these classes implements the DispatchHandler interface. This interface has five methods:

init
Initialize the handler; called when the OLFS starts
requestCanBeHandled
Called when a new request is presented to the OLFS and the dispatch logic is looking for a handler to process it. Returns true or false.
handleRequest
Perform whatever is required to build a response and process it
getLastModified
Ask the BES for the last modified date of some resource that's central to processing the request.
destroy
Remove the dispatch handlers

Three of these methods take the request (i.e., HttpServletRequest) object defined by the Servlet class (requestCanBeHandled, handleRequest and getLastModified). One of the methods also takes the response (i.e., HttpServletResponse) object from Servlet (handleRequest).

The second level of dispatch takes place within each of the classes that implement the DispatchHandler interface. Here are three examples:

DirectoryDispatchHandler
Responds to requests for directory information
BESThreddsDispatchHandler
Responds to requests for THREDDS catalogs
DapDispatchHandler
Responds to requests for DAP responses

Take a look at the DapDispatchHandler to see how that class handles requests for several different responses, including some, like the ASCII response, that are not strictly DAP responses.
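
The outer dispatch layer described above can be sketched as a loop that offers each request to the handlers in turn and hands it to the first one that claims it. The five method names come from the list above, but the signatures here are assumptions: a plain String stands in for the request object and a StringBuilder for the response object, so that the sketch does not depend on the Servlet classes. The SuffixHandler class and its suffix-matching rule are hypothetical and only illustrate one way a handler might decide.

```java
import java.util.List;

/** The five methods named in the text; parameter types are stand-ins. */
interface DispatchHandler {
    void init();
    boolean requestCanBeHandled(String request);
    void handleRequest(String request, StringBuilder response);
    long getLastModified(String request);
    void destroy();
}

/** Hypothetical handler that claims requests ending in a given suffix. */
class SuffixHandler implements DispatchHandler {
    private final String suffix;
    private final String label;

    SuffixHandler(String suffix, String label) {
        this.suffix = suffix;
        this.label = label;
    }

    public void init() { }

    public boolean requestCanBeHandled(String request) {
        return request.endsWith(suffix);
    }

    public void handleRequest(String request, StringBuilder response) {
        // A real handler would run its own second level of dispatch here.
        response.append(label).append(" handled ").append(request);
    }

    public long getLastModified(String request) {
        return -1L; // unknown in this sketch
    }

    public void destroy() { }
}

/** Outer layer: the first handler that claims the request processes it. */
class Dispatcher {
    private final List<DispatchHandler> handlers;

    Dispatcher(List<DispatchHandler> handlers) {
        this.handlers = handlers;
        for (DispatchHandler h : handlers) {
            h.init();
        }
    }

    boolean dispatch(String request, StringBuilder response) {
        for (DispatchHandler h : handlers) {
            if (h.requestCanBeHandled(request)) {
                h.handleRequest(request, response);
                return true;
            }
        }
        return false; // no handler claimed the request
    }
}
```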

Abstracting the request-response logic of the OLFS

The main obstacle we see to building a second front end for Hyrax is that the DispatchHandler interface takes instances of the HttpServletRequest and ...Response classes and then passes those along to the lowest levels of the software. This creates close coupling between the ServletRequest interfaces and the OLFS code.

However, the coupling between ServletRequest and the OLFS is not actually needed. The software could be refactored so that DispatchHandler accepts two objects defined by the OLFS (not the generic Servlet software) using interfaces more suitable to its needs. At the outermost levels, in the software that calls the requestCanBeHandled and handleRequest methods, those new OLFS-defined interfaces would be the formal parameters. Instances of the proposed request and response implementations could be built from constructors that accept instances of ServletRequest and ServletResponse as well as other things such as a set of hypothetical AMQPRequest and ...Response objects. Instances of the request and response implementations would be passed into the DispatchHandler implementations, allowing them to be used by both the OLFS and the proposed AMQP front end.

The OLFS implemented using HyraxRequest and HyraxResponse objects. Using these factors out the coupling between Servlet and the OLFS at all but the highest levels of implementation.
The AMQP front end using HyraxRequest and HyraxResponse objects. This uses the same code as the OLFS with the HyraxRequest/Response objects, limiting issues with long-term maintenance.
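
A minimal sketch of this refactoring follows. The names HyraxRequest and HyraxResponse come from the figures; everything else is an assumption made for illustration, including the methods on those interfaces, the HttpHyraxRequest, AmqpHyraxRequest and BufferedHyraxResponse classes, and the "resource" and "constraint" field names. To stay self-contained, the HTTP adapter takes plain strings where a real one would take a ServletRequest, and the AMQP adapter takes a map where a real one would take a message object.

```java
import java.util.HashMap;
import java.util.Map;

/** OLFS-defined request interface; the methods are hypothetical. */
interface HyraxRequest {
    String getResourcePath();
    String getConstraint();
}

/** OLFS-defined response interface; the methods are hypothetical. */
interface HyraxResponse {
    void setMetadata(String name, String value);
    void write(String body);
}

/** Built from the parts of an HTTP request (stand-in for ServletRequest). */
class HttpHyraxRequest implements HyraxRequest {
    private final String path;
    private final String query;

    HttpHyraxRequest(String pathInfo, String queryString) {
        this.path = pathInfo;
        this.query = queryString == null ? "" : queryString;
    }

    public String getResourcePath() { return path; }
    public String getConstraint() { return query; }
}

/** Built from a hypothetical AMQP message represented as a field map. */
class AmqpHyraxRequest implements HyraxRequest {
    private final Map<String, String> fields;

    AmqpHyraxRequest(Map<String, String> fields) {
        this.fields = new HashMap<>(fields);
    }

    public String getResourcePath() { return fields.getOrDefault("resource", ""); }
    public String getConstraint() { return fields.getOrDefault("constraint", ""); }
}

/** Collects the response in memory; each front end would then ship it. */
class BufferedHyraxResponse implements HyraxResponse {
    final Map<String, String> metadata = new HashMap<>();
    final StringBuilder body = new StringBuilder();

    public void setMetadata(String name, String value) { metadata.put(name, value); }
    public void write(String text) { body.append(text); }
}

/** Handler code that sees only the OLFS-defined interfaces. */
class EchoHandler {
    static void handleRequest(HyraxRequest request, HyraxResponse response) {
        response.setMetadata("resource", request.getResourcePath());
        response.write(request.getResourcePath() + "?" + request.getConstraint());
    }
}
```

Because EchoHandler never sees which transport produced the request, the same handler code can serve both front ends, which is the maintenance benefit the figures point to.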

How Best to Combine AMQP and HTTP

One approach to adapting DAP to a messaging architecture has already been implemented in our interface to Hyrax for SOAP and this might be the best starting point for an adaptation of DAP/HTTP to DAP and AMQP. One difference, however, that is likely to play a role in DAP over AMQP that doesn't show up in the SOAP interface is that DAP4 is now much farther along than when that software was written. The feature of DAP4 most important to this project is that DAP4 over HTTP no longer relies on HTTP headers as the sole way to return certain information. Instead, all information about a response is contained in the body of the response and some information is also contained in HTTP response headers to simplify writing HTTP clients and/or working with DAP2 clients. So, for example, the information about the version of DAP used to build a particular response is now part of the response body (in the <Dataset> element) and in the HTTP response header XDAP. This means that HTTP clients can figure out the version before the response document is parsed and other protocols (e.g., AMQP) can get it from the response itself.

I'll chime in here and say that as far as I can see the primary obstacle in moving the protocol to AMQP is the use of HTTP headers as the mechanism for version negotiation between the client and the server. The client tells the server what it wants and the server hands back a response that is the requested version or lower. If this were moved into the request URL via a mandatory server side function, say something like "version(x.y)" where "x" is the DAP major and "y" the DAP minor version, then I think using AMQP would be simplified.--ndp 12:05, 3 December 2009 (PST)
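
To make the suggestion above concrete, here is a small sketch of how a front end might read a "version(x.y)" call out of a request and pick the requested version or the nearest lower one. The DapVersion class, its method names, and the assumption that the call appears somewhere in the constraint string are all hypothetical; the suggestion above does not fix where in the request the function would go.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for the "version(x.y)" idea; not part of Hyrax. */
class DapVersion implements Comparable<DapVersion> {
    // Matches version(x.y) with x = major and y = minor version.
    private static final Pattern VERSION_CALL =
            Pattern.compile("version\\((\\d{1,4})\\.(\\d{1,4})\\)");

    final int major;
    final int minor;

    DapVersion(int major, int minor) {
        this.major = major;
        this.minor = minor;
    }

    /** Return the version named in the request, or null if none is given. */
    static DapVersion fromRequest(String constraint) {
        Matcher m = VERSION_CALL.matcher(constraint);
        if (!m.find()) {
            return null;
        }
        return new DapVersion(Integer.parseInt(m.group(1)),
                              Integer.parseInt(m.group(2)));
    }

    /** Highest supported version that does not exceed the requested one. */
    static DapVersion negotiate(DapVersion requested, DapVersion... supported) {
        DapVersion best = null;
        for (DapVersion s : supported) {
            if (s.compareTo(requested) <= 0
                    && (best == null || s.compareTo(best) > 0)) {
                best = s;
            }
        }
        return best; // null when every supported version is too new
    }

    @Override
    public int compareTo(DapVersion other) {
        return major != other.major
                ? Integer.compare(major, other.major)
                : Integer.compare(minor, other.minor);
    }

    @Override
    public String toString() {
        return major + "." + minor;
    }
}
```

Since nothing here depends on HTTP headers, the same negotiation would work for a request that arrives as an AMQP message.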

See Also

  1. BES XML Commands and Hyrax - BES Client commands
  2. How to build the DataDDX response in/with Hyrax
  3. Hyrax SOAP API

Use Cases

In order to move forward and define the most useful way to use DAP2 and/or DAP4 over AMQP, we need to make sure there's a clear understanding of how the server is supposed to interact with the AMQP broker and how Hyrax will be used within the OOI system.

http://www.oceanobservatories.org/spaces/display/CIDev/Data+Exchange