Caching that's multi-process safe

From OPeNDAP Documentation
Revision as of 21:49, 21 March 2012 by Jimg (talk | contribs) (Proposed solution)

1 Problem

Develop a general procedure (or software) for implementing a cache that can be shared by several processes. There seems to be no readily available library that implements caching. It certainly needs to work for Unix; Windows suport would be good, but i of less importance because the OC library may take over or client role on that platform.

2 Background

We have several cases where files need to be cached, both in our server software and in the client code. These include at least: files that have been decompressed and responses from the web that can be cached using the HTTP/1.1 rules. We need for the caching to be both thread safe and multi-process safe. That is, the cache needs to be shared between processes that do not, otherwise, synchronize their operation.

3 Proposed solution

3.1 Assumptions

  1. The size of the cache can only change by the number of active processes using the cache (at any given instance).
  2. The size of the cache will likely change only by the average size of a cached object times some constant that's less than or equal to the number of processes.
  3. We accept as tolerable that the cache might grow beyond its bounds by a 'little bit' but that's an OK tradeoff if we don't have to lock the cache to count its size.
  4. We will count the size of the cache before every attempt to write a new item.
  5. We can store the size of the cache in a file and access that using an exclusive lock (but that might make item #3 moot, because then we always know the exact size of the cache).

3.2 Definitions

shared lock
a synonym for a read-only lock; a program cannot acquire an exclusive lock when a shared lock exists, but it can acuire another shared lock. Thus, many programs can use a shared lock for reading and prevent a program from writing/deleting the file.
exclusive lock
synonym for a write lock; a program cannot acquire an exclusive lock when either a shared lock or another exclusive lock already exists.
semaphore file
a file that is used to serve as a proxy lock for another file. These can be use to indicate to cooperating processes that a file should not be touched - it is effectively locked - without actually locking the file. This can be used to establish overlapping locks, particularly so that files can be first exclusively locked and, without removing that lock, placing a shared lock on it too. Then the 'exclusive lock' can be removed, thus ensuring that at no time in the sequence is the file left in the unprotected state.

3.3 Releasing locks

The fcntl command makes locks that are removed whenever any of the file descriptors to an open file are closed. So long as we use advisory locking and program logic follows the rules, we can use these locks and be sure that files will be unlocked whenever a handler closes the file, without any need to actually close the lock in the using the same file descriptor used to obtain the lock. This is apparently not the case for flock or lockf.

3.4 There are three basic operations that must be implemented

Is a file in the cache
Try to acquire a shared lock. If successful, the file can be read (because it's locked for read). There's no point in not locking the file for read, because locking it is the only way to ensure it won't be deleted by the time the caller reads & processes the response.
Write a new file
Acquire an exclusive lock on a semaphore file for the file. Then open/create the file and obtain an exclusive lock and write the contents. Then release the exclusive lock and obtain a shared lock. The delete the semaphore file. Not sure about the use of a semaphore file in this case.
Delete a file
Get an exclusive lock on the file. Delete it.

3.5 Cache design

In addition to the files themselves, maintain information about the time each file was last used and their size. The cache should also probably have information about its total size.