Using Virtual Machines to Serve Data

From OPeNDAP Documentation

Introduction

We have used the VMware platform for our virtual machines (VMs), so our VMs will run under any of VMware's hypervisors. We chose VMware because it offers both hypervisors for Linux and Windows XP that can be used to serve data and hypervisors for Windows, Mac OS/X and Linux that suit a workshop environment. In all but the Mac OS/X case, a hypervisor is available for free, and the OS/X 'Fusion' hypervisor can be used for 30 days without charge.

The following describes using the Hyrax/VM both as a way to serve data and as a pedagogical tool for use in a workshop. To use the Hyrax/VM to serve data you will need to have the data online and be able to install and run either the VMware Server or ESX hypervisor. In all likelihood, you will need to be conversant with the host OS running on the machine where the data reside and also have moderate knowledge of computer networking, firewall configuration and sharing disks on a LAN. In other words you will need basic system administration skills. A familiarity with Linux/Unix, and Ubuntu Linux in particular, will also be very useful since that is the operating system used by the VMs.

To run the Hyrax/VM as part of a workshop, the general requirements are much more easily met. You need to be able to install software, follow the directions that come with the VMware Player, Workstation or Fusion hypervisor, and then proceed with the workshop agenda.

Technical Background

A Virtual Machine (VM) is “an efficient, isolated duplicate of a real machine” (Popek, Gerald J. and Robert P. Goldberg. “Formal Requirements for Virtualizable Third Generation Architectures.” Communications of the ACM. Vol.17 No.7. July 1974).

  •  The ‘isolated duplicate’ is typically a data file or collection of files which contain the information needed to emulate the RAM, spinning disks and other hardware elements of the VM. If you look inside the .zip files we use to hold our Hyrax/VM distributions, you will see they contain a number of files whose names indicate which ones hold the RAM, processor state and disk contents of the VM.
  •  The information in the files is used by a virtual machine monitor or hypervisor to instantiate the VM.
  •  Once running, the VM is indistinguishable from a ‘real’ computer in many ways.
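
As a rough illustration of how the file names hint at their contents, the sketch below maps common VMware file extensions to the part of the VM they usually hold. The function name vm_file_role and the archive name in the comment are hypothetical, the descriptions are informal summaries, and a given distribution may not contain every file type listed.

```shell
#!/bin/sh
# Map a VMware file name to the part of the VM it most likely holds.
# The descriptions are informal summaries, not VMware's official wording.
vm_file_role() {
    case "$1" in
        *.vmx)   echo "VM configuration" ;;
        *.vmdk)  echo "virtual disk" ;;
        *.vmem)  echo "guest RAM contents" ;;
        *.nvram) echo "BIOS settings" ;;
        *.vmss)  echo "suspended machine state" ;;
        *.vmsn)  echo "snapshot state" ;;
        *)       echo "other" ;;
    esac
}

# Example (hypothetical archive name): label every file in a distribution.
#   unzip -Z1 hyrax-vm.zip | while read -r f; do
#       printf '%s: %s\n' "$f" "$(vm_file_role "$f")"
#   done
```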

There are two broad classes of hypervisors: Native and Hosted. A native (or type one) hypervisor runs on ‘bare metal’ while a hosted (or type two) hypervisor runs within an operating system. Later in this Guide we discuss using both type one and two hypervisors to serve data using the Hyrax/VM. For a workshop environment, only the type two (or hosted) hypervisor makes sense since the attendees will all be running the Hyrax/VM on their laptop computers.

There are a large number of VM systems. Note that virtual machine language interpreters (e.g., Java) are not considered part of this set of software.

Why use virtualization?

Benefits:

  • Isolation. It is easy to ensure that software in one VM will be unaffected by problems in another, even when they run on the same hardware and/or hypervisor.
  •  Legacy software. Because each VM behaves as a separate computer, it is possible to configure one to conform to the specific requirements of legacy software while running newer software in different VMs.
  •  Scalability. VMs provide some interesting possibilities in situations where it is important to be able to add and subtract new hosts from a network. Examples include Internet Service Providers and Cloud Computing environments.

We are using VMs primarily for the first two reasons. We want to provide a way to run our servers within a hypervisor because we feel that is an important option for larger groups who need to meet strict security requirements, and because we want to support data providers who are using Windows operating systems. In the latter case we can develop our software for the Linux/Unix operating systems (which are not really legacy systems...) and then use a VM and hypervisor to run it on the Windows platforms.

Downsides:

  •  Efficiency. Software running in a VM is less efficient than the same software running directly on the hardware, although how much so varies widely. If many aspects of the underlying hardware and the VM hardware are the same, the hypervisor can be very efficient.
  •  Complexity. With many computers (virtual or not) comes a larger system administration burden. This can manifest itself in several ways including increased network complexity and the need to take into account the idiosyncrasies of different VMs.

Using the Hyrax and Virtual Machine to serve data

Using the Hyrax and Virtual Machine (VM) software combination in a production environment is hard to describe in a completely general way since each such environment will have its own unique characteristics. However, there are several common cases that we will cover here along with some basic choices about additional infrastructure you'll need to add to your computer(s). You'll need to select a hypervisor within which the VM will run and you'll need to determine how that VM will gain access to your data.

Choosing a hypervisor

In order to serve data you will need to use either the VMware Server or ESX hypervisor since these allow incoming network connections to be made with servers running in VMs hosted by the hypervisor. The Server hypervisor is available for free while the ESX hypervisor costs money. For many uses Server is likely adequate in terms of performance and is almost certainly simpler to configure. You should be able to install it on a typical Linux or Windows computer in less than an hour. Choose Server if you need to get a server up on a Windows XP host or if ease of configuration and maintenance are more important than maximum performance. However, the ESX hypervisor is the faster option. If you are using the Hyrax/VM combination as part of a security policy that requires the server to be isolated from the data and other network services, then you should consider this option. Lastly, the ESX hypervisor is part of VMware's Virtual Infrastructure and if performance is a major concern then you should probably research that as well.

Comparison of the Server and ESX Hypervisors

Hypervisor    | Cost | Environment         | Complexity                         | Performance
VMware Server | Free | Windows XP or Linux | Low (< 1 hour to install)          | Modest (some overhead w/host OS)
VMware ESX    | $    | Bare Intel hardware | High (requires new Intel hardware) | High (no host OS overhead)

Data access

The next choice you must make is how the server (Hyrax) will access the data it serves. One very limited option is to store the data in the virtual machine. This is limited for two reasons. First, the data will need to be copied to the VM, and copying data is often very undesirable because data volumes are high and because two copies of the data must then be maintained. Second, the size of the VM has been kept small so that running it imposes the minimum load on the host, which leaves little room for data.

A second option is to store the data on a different computer and access the files using a networked file system. This is really the only viable option for most cases - there's usually just too much data to copy it onto the VM disks. If you are using Linux to store the data, then use NFS to export the file system as read-only and use the Ubuntu package system (apt-get) to load the necessary modules onto the Hyrax/VM to enable it to use NFS. Once you have added the NFS modules to the VM, configure them by editing /etc/fstab as described in the NFS manual pages, and edit the /etc/bes/bes.conf file on the VM so that the root directory of the data tree is the mount point for the data.

Similarly, if you are using a Windows computer to host the VMware Server hypervisor, use SMB to export the data and Samba running on the Hyrax/VM to access that share. The steps involved are outlined below.

We don't include both NFS and Samba on the Hyrax/VM because we wanted to load only those packages needed and clearly, only one of these is necessary for the vast majority of cases.

Steps to set up a Hyrax/VM running within VMware Server on a Linux host which also stores the data:

  1. Configure the VM and the hypervisor so the VM has an IP number and can accept HTTP connections from the Internet
  2. Export the directory holding the data (typically in read-only mode) to the VM's IP number
  3. Install the packages needed for client NFS in the VM:
    apt-get install nfs-common portmap
  4. Mount the data directory (see the NFS manual page for all of the options):
    /etc/fstab
  5. Edit the BES configuration file:
    /etc/bes/bes.conf
  6. Restart the server
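
The steps above could be sketched roughly as follows. This is a minimal illustration, not a tested recipe: the function names, the addresses 192.168.1.10 (data host) and 192.168.1.20 (VM), the export path /data and the mount point /mnt/data are all hypothetical, and the BES.Catalog.catalog.RootDirectory key is an assumption that should be checked against the bes.conf shipped on your VM. The functions only print configuration lines; nothing on the system is changed until you apply them yourself.

```shell
#!/bin/sh
# Print an /etc/exports entry for the data host (step 2).
# $1 = directory to export, $2 = IP address of the VM
nfs_exports_line() {
    printf '%s %s(ro,sync)\n' "$1" "$2"
}

# Print an /etc/fstab entry for the VM that mounts the export
# read-only over NFS (step 4).
# $1 = data host, $2 = exported directory, $3 = local mount point
nfs_fstab_line() {
    printf '%s:%s %s nfs ro 0 0\n' "$1" "$2" "$3"
}

# Print the bes.conf line that points the BES at the mount point (step 5).
# The key name is an assumption; confirm it in your own bes.conf.
# $1 = local mount point
bes_root_line() {
    printf 'BES.Catalog.catalog.RootDirectory=%s\n' "$1"
}

# One possible way to apply the output, run as root (hypothetical values):
#   on the data host:
#     nfs_exports_line /data 192.168.1.20 >> /etc/exports
#   on the VM:
#     apt-get install nfs-common portmap
#     mkdir -p /mnt/data
#     nfs_fstab_line 192.168.1.10 /data /mnt/data >> /etc/fstab
#     mount /mnt/data
#     bes_root_line /mnt/data    # copy this value into /etc/bes/bes.conf
```

Mounting read-only on the VM side as well as exporting read-only on the host side is a deliberate choice here: either setting alone would protect the data, and using both guards against a mistake in one of them.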

If instead you're using a Windows XP/Vista host to run VMware Server and store the data, you would modify the above to:

  1. Export the data as an SMB share (in place of the Linux/NFS export)
  2. Load Samba on the Hyrax/VM in place of NFS:
    sudo apt-get install samba
  3. Configure Samba to mount the data by editing the samba configuration file:
    /etc/samba/smb.conf
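
For comparison with the NFS case, one common way to attach an SMB share to a Linux file system is a cifs entry in /etc/fstab. This is a swapped-in technique rather than the smb.conf procedure described above, and it assumes the CIFS client tools are installed on the VM (the package that provides them has gone by different names across Ubuntu releases). The function name, host name, share name, mount point and credentials file below are all hypothetical.

```shell
#!/bin/sh
# Print an /etc/fstab entry that mounts a Windows share read-only via CIFS.
# $1 = Windows host, $2 = share name, $3 = local mount point,
# $4 = file holding the username= and password= lines for the share
cifs_fstab_line() {
    printf '//%s/%s %s cifs ro,credentials=%s 0 0\n' "$1" "$2" "$3" "$4"
}

# One possible way to apply the output, run as root on the VM
# (hypothetical values):
#   mkdir -p /mnt/data
#   cifs_fstab_line winhost data /mnt/data /etc/samba/hyrax.cred >> /etc/fstab
#   mount /mnt/data
```

Keeping the account details in a separate credentials file, readable only by root, avoids putting a password in /etc/fstab, which is normally world-readable.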

Workshops and the Hyrax and virtual machine software

  • How do I use the Hyrax VM in a workshop?
    • Using the VMware Workstation, Fusion or Player hypervisors
    • Using a web browser to look at data
    • Getting sample clients
    • Powerpoint presentations for use with the VM