Using Virtual Machines to Serve Data

From OPeNDAP Documentation

Revision as of 04:25, 7 June 2009

Introduction

We have used the VMware platform for our virtual machines (VMs), so our VMs will run under any of the VMware hypervisors. We chose VMware because it provides a good balance: hypervisors for Linux and Windows XP that can be used to serve data, as well as hypervisors for Windows, Mac OS X and Linux for use in a workshop environment. In all but the Mac OS X case a hypervisor is available for free, and the OS X 'Fusion' hypervisor can be used for 30 days without charge.

The following describes using the Hyrax/VM both as a way to serve data and as a pedagogical tool in a workshop. To use the Hyrax/VM to serve data you will need to have the data online and be able to install and run either the VMware Server or ESX hypervisor. In all likelihood, you will need to be conversant with the host OS running on the machine where the data reside and also have moderate knowledge of computer networking, firewall configuration and sharing disks on a LAN. In other words you will need basic system administration skills. A familiarity with Linux/Unix, and Ubuntu Linux in particular, will also be very useful since that is the operating system used by the VMs.

To run the Hyrax/VM as part of a workshop, the general requirements are much more easily met. You need to be able to install software, follow the directions that come with the VMware Player, Workstation or Fusion hypervisor, and then proceed with the workshop agenda.

This work was sponsored by a grant from NASA and a description of how the VMs were built can be found at CheapStix.

jimg 10:10, 12 February 2009 (PST) The CheapStix article has some important information for new users of the VM:

  1. Sometimes, for an unknown reason, the network interfaces come up in a broken state. The symptom is that networking does not work and ifconfig -a does not show eth0 in an UP state. When you log in to the VM and run ifconfig -a you should see, among other output, the word UP next to eth0. If not, follow the advice for the udev hack on the CheapStix page.
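
The check described above can be scripted. The following is a minimal sketch, assuming the classic net-tools ifconfig output layout in which each interface is a stanza separated by blank lines, starting with the interface name and listing UP as a separate word. The helper name iface_is_up is hypothetical and is not part of the VM.

```shell
# iface_is_up NAME: read `ifconfig -a` output on stdin and succeed only
# if the stanza for interface NAME contains the flag word UP.
# Assumes the classic net-tools layout (stanzas separated by blank lines).
iface_is_up() {
    awk -v RS= -v ifc="$1" '
        # In paragraph mode each stanza is one record; $1 is the interface name.
        $1 == ifc { for (i = 2; i <= NF; i++) if ($i == "UP") up = 1 }
        END { exit(up ? 0 : 1) }
    '
}

# Example use inside the VM (not run here):
#   sudo /sbin/ifconfig -a | iface_is_up eth0 && echo "eth0 is UP"
```

Reading from stdin rather than calling ifconfig directly keeps the helper easy to try against saved output before relying on it.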

Technical Background

A Virtual Machine (VM) is “an efficient, isolated duplicate of a real machine” (Popek, Gerald J. and Robert P. Goldberg. “Formal Requirements for Virtualizable Third Generation Architectures.” Communications of the ACM. Vol.17 No.7. July 1974).

  •  The ‘isolated duplicate’ is typically a data file or collection of files which contain the information needed to emulate the RAM, spinning disks and other hardware elements of the VM. If you look inside the .zip files we use to hold our Hyrax/VM distributions, you will see they contain a number of files and those have names that indicate which hold the RAM, Processor state and disk memory of the VM.
  •  The information in the files is used by a virtual machine monitor or hypervisor to instantiate the VM.
  •  Once running, the VM is indistinguishable from a ‘real’ computer in many ways.

There are two broad classes of hypervisors: Native and Hosted. A native (or type one) hypervisor runs on ‘bare metal’ while a hosted (or type two) hypervisor runs within an operating system. Later in this Guide we discuss using both type one and two hypervisors to serve data using the Hyrax/VM. For a workshop environment, only the type two (or hosted) hypervisor makes sense since the attendees will all be running the Hyrax/VM on their laptop computers.

There are a large number of VM systems. Note that virtual machine language interpreters (e.g., Java) are not considered part of this set of software.

Why use virtualization?

Benefits:

  • Isolation. It is easy to ensure that software in one VM will be unaffected by problems in another, even when they run on the same hardware and/or hypervisor.
  •  Legacy software. One real computer running a hypervisor can run a suite of virtual machines, each tailored to the needs of various legacy software packages.
  •  Scalability. VMs provide some interesting possibilities in situations where it is important to be able to add and subtract new hosts from a network. Examples include Internet Service Providers and Cloud Computing environments.

We are using VMs primarily for the first two reasons: we want to provide a way to run our servers within a hypervisor because we feel that is an important option for larger groups who need to meet strict security requirements, and because we want to support data providers who are using Windows operating systems. In the latter case we can develop our software for the Linux/Unix operating systems (which are not really legacy systems...) and then use a VM and hypervisor to run it on the Windows platforms.

Downsides:

  •  Efficiency. Running software in a VM is less efficient than running it directly on the hardware, although how much so varies widely among different implementations. If many aspects of the underlying hardware and the VM hardware are the same, the hypervisor can be very efficient.
  •  Complexity. With many computers (virtual or not) comes a larger system administration burden. This can manifest itself in several ways including increased network complexity and the need to take into account the idiosyncrasies of different VMs.

Using the Hyrax and Virtual Machine to serve data

Using the Hyrax and Virtual Machine (VM) software combination in a production environment is hard to describe in a completely general way since each such environment will have its own unique characteristics. However, there are several common cases that we will cover here along with some basic choices about additional infrastructure you'll need to add to your computer(s). You'll need to select a hypervisor within which the VM will run and you'll need to determine how that VM will gain access to your data.

To understand more about the construction of the VM, see CheapStix where the process used to build the Ubuntu JeOS VM is described in detail.

Choosing a hypervisor

In order to serve data you will need to use either the VMware Server or ESX hypervisor since these allow incoming network connections to be made with servers running in VMs hosted by the hypervisor. The Server hypervisor is available for free while the ESX hypervisor costs money. For many uses Server is likely adequate in terms of performance and is almost certainly simpler to configure. You should be able to install it on a typical Linux or Windows computer in less than an hour. Choose Server if you need to get a server up on a Windows XP host or if ease of configuration and maintenance are more important than maximum performance. However, the ESX hypervisor is the faster option. If you are using the Hyrax/VM combination as part of a security policy that requires the server to be isolated from the data and other network services, then you should consider this option. Lastly, the ESX hypervisor is part of VMware's Virtual Infrastructure and if performance is a major concern then you should probably research that as well.

Comparison of the Server and ESX Hypervisors

Hypervisor    | Cost | Environment         | Complexity                         | Performance
VMware Server | Free | Windows XP or Linux | Low (< 1 hour to install)          | Modest (some overhead w/host OS)
VMware ESX    | $    | Bare Intel hardware | High (requires new Intel hardware) | High (no host OS overhead)

Data access

The next choice you must make is how the server (Hyrax) will access the data it serves. One very limited option is to store the data in the virtual machine. This is limited for two reasons. First the data will need to be copied to the VM and copying data is often very undesirable because data volumes are high and because now two copies of the data must be maintained. In addition, the size of the VM has been kept small so that running it imposes the minimum load on the host.

A second option is to store the data on a different computer and access the files using a networked file system. This is really the only viable option for most cases - there's usually just too much data to copy onto the VM disks. If you are using Linux to store the data, then use NFS to export the file system as read-only and use NFS to mount the file system on the VM. Similarly, if your host OS is Windows XP, use an XP Shared Folder to make the data/file system available and then use Samba on the VM to mount it.

We don't include both NFS and Samba on the Hyrax/VM because we wanted to load only those packages needed and clearly, only one of these is necessary for the vast majority of cases.

General steps for serving data using a VM

  1. Load the hypervisor on the Host computer
  2. Copy the Virtual machine to the host and store it where the hypervisor can access it.
  3. Start the hypervisor and the VM and verify that both are working.
  4. Export the data to be served from the Host computer (or any other computer)
  5. Mount the data within the VM's file system
  6. Make any configuration changes needed to Hyrax to serve the new data
  7. Start Hyrax and verify correct operation.

Detailed example: Serving data from an XP host using the Ubuntu-based VM

This example details the steps I used to serve data from a virtual machine using the Ubuntu 8.04 JeOS, Hyrax 1.4.2 and Tomcat 6. The host computer was running Windows XP Professional, Service Pack 2, with the VMware Server 2.01 hypervisor. The Virtual machine is available for download on the OPeNDAP web site. Mostly this is the basic process outlined above, but there were some things that took some sleuthing to discover - I've included those in blockquoted sections in the example.

  • I started with a fresh installation of Windows XP, SP2 and added VMware Server 2.01.
  • Copy the Virtual machine to the Windows XP Host system. You can use IE to do this and save the result on the desktop. Then move the file to the Virtual Machines folder made when VMware Server was installed (mine was on the C drive).
  • Start VMware Server. To do this, follow the link on the desktop, which will open a browser window and access the VMware Server using a URL, or open a browser window and type the URL http://<machine name>:8333/.

With IE8 there's a warning about the SSL certificate supplied by VMware Server. Tell IE8 to load the page anyway - it recommends that you don't, but that's because it thinks you're trying to access a remote site. To keep this annoying message from appearing every time you start VMware Server, click on the right end of the URL display area and configure the browser to accept this particular certificate all the time without question.

  • Add the VM you copied to the Virtual Machines folder to the set of VMs that VMware Server controls. To do this, click on Virtual Machines and then Add Virtual Machine to Inventory.
  • Make sure the VMware Server 'console' plugin is installed. First, click on the VM's name in the window pane on the left. Then click on the Console tab at the top of the center pane. If the plugin is not installed, you see a message in yellow about installing it. Do so.

Later on I would upgrade from Internet Explorer 7 to 8, which 'broke' my VMware Server hypervisor. In fact this was really my lack of Win XP experience showing - the VMware Server software uses a plugin (ActiveX control?) to provide a console to operate the VM. When I upgraded Internet Explorer, I needed to go to the Tools menu and choose Manage Add-ons. To make the VMware Server console plugin active, choose Toolbars and Extensions and then Show All in the menu on the left. Set the VMware plugin so that it's available for all sites. If the plugin is installed but not made available, even if it was available under IE 7, then the message you'll get says the plugin is not available and asks if you want to install it.

  • At this point you should be able to start the VM and log in. With the VM selected in the left pane, click on the 'play' button and the VM should boot. It may take a few seconds. Once started you should be able to click on the Console tab and see a 'terminal' window pop up. Type a return if you're not prompted for a username and password. The VMs all use the username opendap and the password opendap. Log in. You can change the password if you'd like. You have full access to the VM using the sudo command.
  • Now export the data from the Windows XP host. Open up a file browsing window and navigate to the folder that holds the data you want to serve. Select the folder and then choose Share this Folder from the pane on the left (I'm working from memory here - it might not be exactly like that, but it's close). I believe Shared Folders are read-only by default and that's probably what you want. If not, there's an option to allow people to change the folder contents (make it read-write).
  • Now mount the Win XP Shared Folder in the VM. Go back to the VM's console window. Since you're using Win XP to export a folder, use the Linux Samba system to mount it within the Linux directory tree. Since we don't bundle Samba with the VM, you'll have to get it. Use the apt-get system to install the Samba file system support: sudo apt-get install smbfs. Now make a mount point in the VM for the shared folder. I used sudo mkdir /opt/Hyrax-1.4.2/share/hyrax/data/<my data> because Hyrax is already configured to read from /opt/Hyrax-1.4.2/share/hyrax/data/. Finally, mount the Shared Folder onto the new directory (mine was named satellite): sudo mount -t smbfs -o username=<user>,password=<pw> //<host IP>/data /opt/Hyrax-1.4.2/share/hyrax/data/satellite.
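
The Samba steps in the last bullet can be collected into one place. This is a minimal sketch, not a tested installer: smb_mount_commands is a hypothetical helper that only prints the commands so they can be reviewed before being typed into the VM console, and every argument is a placeholder for your own host IP, share name, user and mount point. The password option is left out on the assumption that mount will then prompt for it.

```shell
# smb_mount_commands HOST_IP SHARE USER MOUNT_POINT
# Print (do not run) the commands for mounting a Windows share in the VM.
# Every argument is a placeholder supplied by the caller.
smb_mount_commands() {
    host_ip=$1
    share=$2
    user=$3
    mount_point=$4
    cat <<EOF
sudo apt-get install smbfs
sudo mkdir -p $mount_point
sudo mount -t smbfs -o username=$user //$host_ip/$share $mount_point
EOF
}
```

For example, smb_mount_commands 192.168.0.10 data opendap /opt/Hyrax-1.4.2/share/hyrax/data/satellite prints the three commands for a share named data; the IP address here is only an illustration.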

There can be an odd wrinkle in getting the VM connected to the network.

The udev hack

I found that sometimes the VM would start with networking broken. I don't see a pattern, but when I look at the network devices using sudo /sbin/ifconfig -a, eth0 is not working (it does not say 'UP'). To fix this problem, edit the file 70-persistent-net.rules under /etc/udev (in the rules.d directory): remove the line about eth0 and edit the line for eth1, replacing 'eth1' with 'eth0'. Restart udev and networking using the eponymous scripts in /etc/init.d. Now ifconfig should show eth0 as 'UP'.
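
The edit to the persistent net rules file can be sketched with sed. This assumes each rule assigns its interface with a NAME="ethN" key, as the rules generated on Ubuntu 8.04 normally do; fix_persistent_net_rules is a hypothetical helper name. It writes the corrected rules to standard output and leaves the original file untouched, so the result can be inspected first.

```shell
# fix_persistent_net_rules FILE: print a corrected copy of FILE on stdout.
# Drops the rule that names eth0 and renames the eth1 rule to eth0.
# Assumes each rule assigns its interface with a NAME="ethN" key.
fix_persistent_net_rules() {
    sed -e '/NAME="eth0"/d' -e 's/NAME="eth1"/NAME="eth0"/' "$1"
}

# Example use inside the VM (not run here); review the output before
# copying it over the original, then restart udev and networking:
#   fix_persistent_net_rules /etc/udev/rules.d/70-persistent-net.rules > /tmp/net.rules
#   sudo /etc/init.d/udev restart
#   sudo /etc/init.d/networking restart
```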

Network Access

It may be that editing your /etc/network/interfaces so it looks like the one below will remove the need to modify the persistent net rules file above. Thanks to Marty Brewer at RRS, Inc for this information.

# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

# The loopback network interface
auto lo
iface lo inet loopback

# This is a list of hotpluggable network interfaces.
# They will be activated automatically by the hotplug subsystem.
mapping hotplug
        script grep
        map eth0

# The primary network interface
auto eth0
iface eth0 inet static
        address 192.168.0.100
        netmask 255.255.255.0
        network 192.168.0.0
        broadcast 192.168.0.255
        gateway 192.168.0.1

Steps to set up a Hyrax/VM running within VMware Server on a Linux host which also stores the data:

  1. Configure the VM and the hypervisor so the VM has an IP number and can accept HTTP connections from the Internet
    1. Use the vmware-config.pl script, run as root, to install the hypervisor
    2. You'll need a serial number - even though the software is free
    3. If you want to use the web interface to control the VMs, you need Apache or a compatible HTTP server
    4. Accept the defaults:
      1. Port for the VMware authorization service: 902
      2. HTTP connections: 8222
      3. HTTPs connections: 8333
      4. Use NAT networking (this is not the default)...
    5. If it's not already running, start Apache: sudo /usr/sbin/apachectl start
    6. Start a browser and go to https://localhost:8333
    7. Log in as root on your computer
  2. Export the directory holding the data (typically in read-only mode) to the VM's IP number
  3. Install the packages needed for client NFS in the VM: sudo apt-get install nfs-common portmap
  4. Mount the data directory by adding an entry to /etc/fstab (see the NFS manual page for all of the options)
  5. Edit the BES configuration file:
    /etc/bes/bes.conf
  6. Restart the server
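
The export and mount steps in the list above come down to one line in /etc/exports on the Linux host and one line in /etc/fstab in the VM. The following is a minimal sketch that only prints candidate lines; exports_line and fstab_line are hypothetical helpers, the arguments are placeholders for your own paths and addresses, and the option sets (ro,sync and ro) are one reasonable read-only choice rather than the only one.

```shell
# exports_line DIR VM_IP: print an /etc/exports entry (for the Linux host)
# that exports DIR read-only to the VM at VM_IP.
exports_line() {
    printf '%s %s(ro,sync)\n' "$1" "$2"
}

# fstab_line HOST DIR MOUNT_POINT: print an /etc/fstab entry (for the VM)
# that mounts HOST:DIR on MOUNT_POINT read-only over NFS.
fstab_line() {
    printf '%s:%s %s nfs ro 0 0\n' "$1" "$2" "$3"
}
```

After adding the printed line to /etc/exports, running sudo exportfs -ra on the host re-reads the export table; after adding the fstab line in the VM, sudo mount -a mounts it.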

If instead you're using a Windows XP/Vista host to run VMware Server and store the data, you would modify the above to:

  1. Export the data as a SMB share (in place of the Linux/NFS export)
  2. Load Samba on the Hyrax/VM in place of NFS:
    sudo apt-get install samba
  3. Configure Samba to mount the data by editing the samba configuration file:
    /etc/samba/smb.conf

Workshops and the Hyrax and virtual machine software

  • How do I use the Hyrax VM in a workshop?
    • Using the VMware Workstation, Fusion or Player hypervisors
    • Using a web browser to look at data
    • Getting sample clients
    • Powerpoint presentations for use with the VM