Using Virtual Machines to Serve Data

From OPeNDAP Documentation
⧼opendap2-jumptonavigation⧽


Introduction

We have used the VMware platform for our virtual machines (VM) and thus our VMs will run under any of their hypervisors. We choose VMware because they provide a good balance of support for Linux and Windows XP hypervisors that can be used to serve data as well as hypervisors for Windows, Mac OS/X and Linux for use in a workshop environment. In all but the Mac OS/X case, there is a hypervisor available for free and the OS/X 'Fusion' hypervisor can be used for 30 days without charge.

The following describes using the Hyrax/VM both as a way to serve data and as a pedagogical tool in a workshop. To use the Hyrax/VM to serve data you will need to have the data online and be able to install and run either the VMware Server or ESX hypervisor. In all likelihood, you will need to be conversant with the host OS running on the machine where the data reside and also have moderate knowledge of computer networking, firewall configuration and sharing disks on a LAN. In other words you will need basic system administration skills. A familiarity with Linux/Unix, and Ubuntu Linux in particular, will also be very useful since that is the operating system used by the VMs.

To run the Hyrax/VM as part of a workshop the general requirements are much more easily met. You need to be able to install software, follow the directions that come with the VMware Player, Workstation or Fusion hypervisor and then proceed on with the workshop agenda.

This work was sponsored by a grant from NASA and a description of how the VMs were built can be found at CheapStix.

jimg 10:10, 12 February 2009 (PST) The CheapStix article has some important information for new users of the VM:

  1. Sometimes, for an unknown reason, the network interfaces come up hosed. The symptom is that networking is not working and that ifconfig -a does not show eth0 in an UP state. When you login to the VM and run ifconfig -a you should see, in addition to some other stuff, the word UP next to eth0. If not, follow the advice for the udev hack on the CheapStix page.

Technical Background

A Virtual Machine (VM) is “an efficient, isolated duplicate of a real machine” (Popek, Gerald J. and Robert P. Goldberg. “Formal Requirements for Virtualizable Third Generation Architectures.” Communications of the ACM. Vol.17 No.7. July 1974).

  •  The ‘isolated duplicate’ is typically a data file or collection of files which contain the information needed to emulate the RAM, spinning disks and other hardware elements of the VM. If you look inside the .zip files we use to hold our Hyrax/VM distributions, you will see they contain a number of files and those have names that indicate which hold the RAM, Processor state and disk memory of the VM.
  •  The information in the files is used by a virtual machine monitor or hypervisor to instantiate the VM.
  •  Once running, the VM is indistinguishable from a ‘real’ computer in many ways.

There are two broad classes of hypervisors: Native and Hosted. A native (or type one) hypervisor runs on ‘bare metal’ while a hosted (or type two) hypervisor runs within an operating system. Later in this Guide we discuss using both type one and two hypervisors to serve data using the Hyrax/VM. For a workshop environment, only the type two (or hosted) hypervisor makes sense since the attendees will all be running the Hyrax/VM on their laptop computers.

There are a large number of VM systems. Note that virtual machine language interpreters (e.g., Java) are not considered part of this set of software.

Why use virtualization?

Benefits:

  • Isolation. It is easy to ensure that software in one VM will be unaffected by problems in another, even when they run on the same hardware and/or hypervisor.
  •  Legacy software. One real computer running a hypervisor can runs a suite of virtual machines, each tailored to the needs of various legacy software packages.
  •  Scalability. VMs provide some interesting possibilities in situations where it is important to be able to add and subtract new hosts from a network. Examples include Internet Service Providers and Cloud Computing environments.

We are using VMs primarily for the first two reasons: We want to provide a way to run our servers within a hypervisor because we fell that is an important option to provide larger groups who need to meet strict security requirements and because we want to support data providers who are using Windows operating systems. In the later case we can develop our software for the Linux/Unix operating systems (which are not really legacy systems...) and then us a VM and hypervisor to run them on the Windows platforms.

Downsides:

  •  Efficiency. It is less efficient, although how much so varies widely among different implementations. If many aspects of the underlying hardware and the VM hardware are the same, the hypervisor can be very efficient.
  •  Complexity. With many computers (virtual or not) comes a larger system administration burden. This can manifest itself in several ways including increased network complexity and the need to take into account the idiosyncrasies of different VMs.

Using the Hyrax and Virtual Machine to server data

Using the Hyrax and Virtual Machine (VM) software combination in a production environment is hard to describe in a completely general way since each such environment will have its own unique characteristics. However, there are several common cases that we will cover here along with some basic choices about additional infrastructure you'll need to add to your computer(s). You'll need to select a hypervisor within which the VM will run and you'll need to determine how that VM will gain access to your data.

To understand more about the construction of the VM, see CheapStix where the process used to build the Ubuntu JeOS VM is described in detail.

Choosing a hypervisor

In order to serve data you will need to use either the VMware Server or ESX hypervisor since these allow incoming network connections to be made with servers running in VMs hosted by the hypervisor. The Server hypervisor is available for free while the ESX hypervisor costs money. For many uses Server is likely adequate in terms of performance and is almost certainly simpler to configure. You should be able to install it on a typical Linux or Windows computer in less than an hour. Choose Server if you need to get a server up on a Windows XP host or if ease of configuration and maintenance are more important than maximum performance. However, the ESX hypervisor is the faster option. If you are using the Hyrax/VM combination as part of a security policy that requires the server to be isolated from the data and other network services, then you should consider this option. Lastly, the ESX hypervisor is part of VMware's Virtual Infrastructure and if performance is a major concern then you should probably research that as well.

Comaprison of the Server and ESX Hypervisors
Hypervisor Cost Environment Complexity Performance
VMware Server Free Windows XP or Linux Low (< 1 hour to install) Modest (Some overhead w/host OS)
VMware ESX $ Bare Intel Hardware High (Requires new Intel hardware) High (No host OS overhead)

Data access

The next choice you must make is how the server (Hyrax) will access the data it serves. One very limited option is to store the data in the virtual machine. This is limited for two reasons. First the data will need to be copied to the VM and copying data is often very undesirable because data volumes are high and because now two copies of the data must be maintained. In addition, the size of the VM has been kept small so that running it imposes the minimum load on the host.

A second option is to store that data on a different computer and access the files using a networked file system. This is really the only viable option for most cases - there's usually just too much data to copy it on the VM disks. If you are using Linux to store the data, then use NFS to export the file system as read only and use NFS to mount the file system on the VM. Similarly, if your host OS is Windows XP, use the XP Shared Folder to make the data/file system available and then use Samba on the VM to mount it.

We don't include both NFS and Samba on the Hyrax/VM because we wanted to load only those packages needed and clearly, only one of these is necessary for the vast majority of cases.

General steps for serving data using a VM

  1. Load the hypervisor on the Host computer
  2. Copy the Virtual machine to the host and store it where the hypervisor can access it.
  3. Start the hypervisor and the VM and verify that both are working.
  4. Export the data to be served from the Host computer (or any other computer)
  5. Mount the data within the VMs file system
  6. Make any configuration changes needed to Hyrax to serve the new data
  7. Start Hyrax and verify correct operation.

Detailed example: Serving data from an XP host using the Ubuntu-based VM

This example details the steps I used to serve data from a virtual machine using the Ubuntu 8.04 JeOS, Hyrax 1.4.2, Tomcat 6. The host computer was running Windows XP Professional, Service Pack 2, with the VMware Server 2.01 hypervisor. The Virtual machine is available for download on the OPeNDAP web site. Mostly this is the basic process outlined above, but there were somethings that took some sleuthing to discover - I've included those in block-quoted sections in the example.

  • I started with a fresh installation of Windows XP, SP2 and added VMware Server 2.01.
  • Copy the Virtual machine to the the Windows XP Host system. You can use IE to do this and save the result on the desktop. Then move the file to the Virtual Machines folder made when VMware Server was installed (mine was on the C drive).
  • Start VMware Server. To do this, follow the link on the desktop, which will open a browser window and access the VMware Server using a URL, or open a browser window and type the URL http://<machine name>:8333/.

With Internet Explorer 8 there's a warning about the SSL certificate supplied by VMware Server. Tell IE8 to load the page anyway - it recommends that you don't, but that's because it thinks you're trying to access a remote site. To keep this annoying message from appearing every time you start VMware Server, click on the right end of the URL display area and configure the browser to accept this particular certificate all the time without question.

  • Add the VM you copied to the Virtual Machines folder to the set of VM's that VMware Server controls. To do this, click on Virtual Machines and then Add Virtual Machine to Inventory.
  • Make sure the VMware Server 'console' plugin is installed. First, click on the VM's name in the window pane on the left. Then click on the Console tab at the top of the center pane. If the plugin is not installed, you see a message in yellow about installing it. Do so.

Later on I would upgrade from Internet Explorer 7 to 8, which 'broke' the VMware Server Console. In fact this was really my lack of Win XP experience showing - the VMWare Server software uses a plugin (ActiveX control?) to provide a console to operate the VM. When I upgraded Internet Explorer, I needed to go to the Tools menu and Manage Addons. To make the VMware Server console plugin active, choose Toolbars and Extensions and then Show All in the menu on the left. Set the VMware plugin so that its available for all sites. If the plugin is installed but not made available, even if it was available under IE 7, then the message you'll get says the plugin is not available and asks if you want to install it.

  • At this point you should be able to start the VM and login. With the VM selected in the left pane, click on the 'play' button and the VM should boot. It may take a few seconds. Once started you should be able to click on the Console tab and see a 'terminal' window popup. Type a return if you're not prompted for a username and password. The VMs all come preconfigured with the username opendap and the password opendap. Login. You can change the password if you'd like. You have full access to the VM using the sudo command. To change the password, use the command passwd.
  • Now export the data from the Windows XP host. Open up a file browsing window and navigate to the folder that holds the data you want to serve. Select the folder and then choose Share this Folder from the pane on the left (I'm working from memory here - it might not be exactly like that, but it's close). I believe Shared Folders are read-only by default and that's probably what you want. If not, there's an option to allow people to change the folder contents (make it read-write).
  • Now mount the Win XP Shared Folder in the VM. Lets break this into two steps. First collect information about the Host and VM and ensure that the networking parameters are correctly set. Write down the IP number of the Host computer by going to the Control Panel and examining the attached network devices. Next write down the IP number assigned to the VM by going to the console and using the command sudo /sbin/ifconfig -a and noting the value for eth0. If eth0 is not listed and/or is not marked 'UP' then follow the advice below regarding the udev hack. With the two IP numbers in hand, check to make sure that the VM is configured for NAT using the VMware Server's Virtual Network Editor (found under the 'Start' menu's vmware entry). Make sure NAT is enabled an the other two modes (host only and bridged) are disabled. At this time you can use the Virtual Network Editor to set up port forwarding so requests to access a particular port on the Host computer will be forwarded to the VM. To do this, select the NAT tab and then click on the Port Forwarding button. In the dialog box that comes up, use the IP numbers for the Host and VM and forward either port 8080 or 80 from the host to port 8080 on the VM. You can choose any port on the host, although 80 is the default HTTP port and most people will that's easiest. If you're already running a web server on the host, choose port 8080 to forward to avoid a conflict with the existing web server. At his point an access to port 80 (or 8080) on you Host will be forwarded to the VM on Port 8080, where you'll be running Hyrax. Connecting computers will not know this forwarding operation is taking place - it will seem as though they are using Hyrax on the Host computer.

There can be a odd wrinkle in getting the VM connected to the network.


The udev hack
I found that sometimes the VM would start with networking broken. I don't see a pattern, but looking at the network devices using sudo /sbin/ifconfig -a, eth0 is hosed (it does not say 'UP'). To fix this problem, change directory to /etc/udev/rules.d/ and in the file 70-persistent-net-rules remove the line about eth0 and edit the line for eth1 replacing 'eth1' with 'eth0'. Restart udev and networking using the eponymous scripts in /etc/init.d. Now ifconfig should show eth0 as 'UP'
Network Access
It may be that editing your /etc/network/interfaces so it looks like the one below will remove the need modify the persistent-net-rules file above. Thanks to Marty Brewer at RRS, Inc for this information.

# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

# The loopback network interface
auto lo
iface lo inet loopback

# This is a list of hotpluggable network interfaces.
# They will be activated automatically by the hotplug subsystem.
mapping hotplug
        script grep
        map eth0

# The primary network interface. Be cautious that the addresses you use here match 
# the values in the vmnet* device.
auto eth0
iface eth0 inet static
        address 192.168.0.100
        netmask 255.255.255.0
        network 192.168.0.0
        broadcast 192.168.0.255
        gateway 192.168.0.1
  • Go back to the VM's console window. Since you're using Win XP to export a folder, use the Linux Samba system to mount it within the Linux directory tree. Since we don't bundle Samba with the VM, you'll have to get it. Use the apt-get system to install Samba: sudo apt-get install smbfs. I've found that the apt-get system may need to be updated in order for this to work, so you will likely have to do sudo apt-get update first. Now make a mount point in the VM for the shared folder. I used sudo mkdir /opt/Hyrax-1.4.2/share/hyrax/data/<my data> because Hyrax is already configured to read from /opt/Hyrax-1.4.2/share/hyrax/data/. Finally, mount the Shared Folder onto the new directory: mount -t smbfs -o username=<user>,password=<pw> //<host IP>/data /opt/Hyrax-1.4.2/share/hyrax/data/satellite.
  • Next to last step. Make any changes to Hyrax need to serve the data. For NetCDF, HDF4, HDF5 or FreeForm data sets, no changes should be needed if you mounted the Shared Folder under /opt/Hyrax-1.4.2/share/hyrax/data. If you chose another mount point, edit the bes.conf file in /opt/Hyrax-1.4.2/etc/bes/ so that the DirectoryRoot parameter points to that mount point. Next, if your filenames are unusual (e.g., your NetCDF files don't use a .nc suffix but instead use something else) edit the bes.conf file so that those files will be recognized and associated with the NetCDF handler. Of course, the same goes with HDF4 files and the HDF4 handler, et cetera. This step is really a generic Configure Hyrax step and you can find instructions on that here.
  • Start Hyrax. Goto the /opt/Hyrax-1.4.2/ directory and run the two commands: sudo bin/besctl start and sudo ../tomcat/bin/startup.sh. There's more information about starting and stopping Hyrax in the main Hyrax documentation section.
  • Test the server. Goto the Host computer and in a new browser window type the URL http://localhost/opendap/ and you should see the server's top-level page for the data you are serving. If you forwarded port 8080 instead of 80, us http://localhost:8080/opendap/ instead. Other computers would use either your host's name or IP number, depending on how your network is configured.

Workshops and the Hyrax and virtual machine software

TDB

  • How do I use the Hyrax VM in a workshop?
    • Using the VMware Workstation, Fusion or Player hypervisors
    • Using a web browser to look at data
    • Getting sample clients
    • Powerpoint presentations for use with the VM