Tuesday 30 March 2010

Getting Started: Setting Up CentOS on VirtualBox

The first step of using R with LAMP is to get the standard LAMP part of the system up and running.  There are probably hundreds of references on installing LAMP and getting to the Hello World stage.  I'll provide yet another one, describing the steps I used in my standard configuration.

At a high level, the components needed are:

  • A Virtual Machine
  • Linux
  • Apache HTTP, MySQL, PHP, Python

Here's some background on options available in these areas.

The Virtual Machine: VirtualBox

The L in LAMP stands for "Linux", and the tools for connecting R with Apache are Linux-focused.  They take advantage of features in Linux that aren't available in Windows to efficiently spawn R threads from Apache.

I tend to work on Windows, so in order to use Linux I can either configure my laptop to have separate Linux and Windows partitions, or I can run Linux in a Virtual Machine.  The advantage of using a Virtual Machine is it makes it easy to set up many different LAMP environments, and the environment is well isolated from the rest of the laptop.

I use Sun VirtualBox as the virtual machine as it is freely available and includes the tools for creating virtual machines.

Another reasonable choice is VMware Player.  The catch with this is that while it can run a Virtual Machine it can't create one.  However, this isn't that big a deal as you can use EasyVMX to create the basic Virtual Machine configuration.  A post on my other blog describes how to do this.


The Operating System: CentOS

Upon deciding to jump into Linux, one quickly discovers that there are bunches of Linux variations out there.  DistroWatch is a good place to look if you want to be overwhelmed by the richness of choices.

The different Linux variations have much in common, but also have substantive differences.  The main difference at the beginning is they will use different installation utilities based on the family of the distribution.

My opinion is that most people select either a Red Hat or Debian variation.  Red Hat uses the yum package manager and Debian uses the apt package manager.  This difference means the installation instructions differ between these distributions.
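
To make the difference concrete, here is a minimal Python sketch that builds the command for installing Apache HTTP on each family.  The package names ("httpd" and "apache2") are the commonly used ones and are an assumption here, as is the helper name apache_install_command, so check your distribution's repositories before relying on them.

```python
# Assumed, commonly used installer invocations and package names.
# These are illustrative; verify them against your distribution.
PACKAGE_TOOLS = {
    "redhat": {"installer": ["yum", "install", "-y"], "apache": "httpd"},
    "debian": {"installer": ["apt-get", "install", "-y"], "apache": "apache2"},
}


def apache_install_command(family):
    """Return the command line that installs Apache HTTP for a family."""
    tools = PACKAGE_TOOLS[family]
    return tools["installer"] + [tools["apache"]]


# Print the command for each family without running anything.
for family in ("redhat", "debian"):
    print(family, "->", " ".join(apache_install_command(family)))
```

The sketch only prints the commands; actually running them would need root privileges on the Linux machine.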

Debian has widespread support for installation of R packages via apt due largely to the efforts of Dirk Eddelbuettel.  As most R packages can be installed from R itself, this isn't a super-important distinction.  However, it does ease the installation of R packages that rely on additional system components such as rJava or Cairo.

Red Hat seems to have more traction with corporate IT groups.  In particular, some of our clients use Red Hat so that's the operating system family that I discuss here.

Red Hat is a commercial company that sells the Red Hat Enterprise Linux (RHEL) operating system along with support and services.  Red Hat the company also sponsors the free Fedora operating system.  Fedora is used partly as an avenue for integrating new technologies into the Red Hat family before they are incorporated into the more conservative RHEL product.

Because RHEL is based on GPL code, Red Hat must release the source code back to the community.  The Community Enterprise Operating System (CentOS) project takes the RHEL source, changes the branding, and releases an operating system called CentOS that is binary-compatible with RHEL.  Thus CentOS is a good choice for prototyping applications that may be deployed to RHEL sometime in the future.  In particular, the steps involved when installing on CentOS will match those for RHEL.

I'll be using CentOS here.


The Rest of LAMP: Apache HTTP, MySQL, Perl/PHP/Python

There aren't any decisions to be made regarding the "A" and "M" in LAMP.  The "A" is Apache HTTP.  The "M" is MySQL.

The "P" can stand for Perl, PHP, or Python depending on which scripting language you prefer.  My view is you might as well just install them all.  They are all standard components that are easy to install on Linux, and easy to configure with Red Hat.

Setting It Up

To install LAMP in a virtual machine on Windows, the steps are:

  • Install Sun VirtualBox on Windows
  • Install CentOS on a virtual machine
  • Install Apache HTTP, MySQL, PHP, and Python on CentOS

Initially I was going to cover all of those steps here, but found that just installing VirtualBox and CentOS took a lot of steps.  So this post will cover the first two steps, with the AMP installation discussed in another post.

VirtualBox

Installing Sun VirtualBox on Windows is trivial.  Just download the installer and run it.  I'm using Sun VirtualBox 3.1.4.

Getting CentOS

To install CentOS, you'll first need to get a DVD image.  You can download this from the CentOS download site, purchase it from a provider of Linux CD images, or often find it in a Linux magazine.  I downloaded the CentOS 5.4 32-bit version (CentOS-5.4-i386-bin-DVD.iso).  This is 3.7GB, so it'll take a while to download.
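
With a download this large, it can be worth checking that the file arrived intact before booting from it.  Here is a minimal Python sketch, assuming the mirror you used publishes a SHA-1 checksum alongside the image.  The function names sha1_of_file and matches_published_checksum are hypothetical helpers for illustration, not part of any CentOS tooling.

```python
import hashlib


def sha1_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read in 1MB chunks so a multi-gigabyte ISO never sits in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path, published_hex):
    """Compare a file's digest with a published value, ignoring case and whitespace."""
    return sha1_of_file(path) == published_hex.strip().lower()
```

To use it, call matches_published_checksum with the path to the ISO and the checksum string copied from the mirror; a result of False suggests downloading the image again.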


Creating the Virtual Machine


Next, create a virtual machine on which to install the operating system.  A virtual machine is essentially some configuration information together with a file that represents a hard drive.


To create the virtual machine:

  1. Start VirtualBox.
  2. Select the "Machine > New..." menu item to launch the "New Virtual Machine" wizard.
  3. Press the "Next" button.
  4. Enter a "Name" such as "CentOS 5.4 for R-LAMP".
  5. Select "Linux" as the "Operating System" and "Red Hat" as the "Version".  
  6. Press "Next".
  7. On the "Memory" tab, accept the default of 384MB memory.  (I haven't looked into this yet; a larger choice might be good depending upon the amount of physical RAM in your machine.)
  8. Press "Next".
  9. On the "Virtual Hard Disk" tab accept the default of 8192MB.  Leave "Boot Hard Disk" checked and the "Create new hard disk" item selected.
  10. Press "Next".
  11. The "New Virtual Disk" wizard opens.  Press "Next".
  12. Leave "Dynamically expanding storage" selected.  Press "Next".
  13. Leave the "Location" and "Size" unchanged.  Press "Next".
  14. On the "Summary" page press "Finish" to complete the "New Virtual Disk" wizard.
  15. Now press "Finish" to complete the "New Virtual Machine" wizard.

You've now created a virtual machine representing a computer with 384MB of RAM and an 8GB hard drive that's configured to work well with members of the Red Hat operating system family such as CentOS.

It looks like a lot of steps, but really you'll just be pressing "Next" a lot.
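
If you'd rather script the wizard than click through it, VirtualBox also ships a command line tool called VBoxManage.  The Python sketch below builds command lines that roughly mirror the choices made in the wizard.  Treat it as an assumption-laden illustration: the subcommands and options are written from memory of VBoxManage and can differ between VirtualBox versions, the controller name "IDE Controller" is an arbitrary label, and the helper names are hypothetical.  Check the output of VBoxManage's built-in help for your version before running anything.

```python
import subprocess


def vboxmanage_commands(name, disk_path, memory_mb=384, disk_mb=8192):
    """Build VBoxManage command lines mirroring the wizard defaults."""
    controller = "IDE Controller"  # assumed label; any name should work
    return [
        # Create and register a VM tuned for the Red Hat family.
        ["VBoxManage", "createvm", "--name", name,
         "--ostype", "RedHat", "--register"],
        # Set the amount of RAM in megabytes.
        ["VBoxManage", "modifyvm", name, "--memory", str(memory_mb)],
        # Create the virtual hard disk file (size in megabytes).
        ["VBoxManage", "createhd", "--filename", disk_path,
         "--size", str(disk_mb)],
        # Add an IDE controller and attach the disk to it.
        ["VBoxManage", "storagectl", name, "--name", controller,
         "--add", "ide"],
        ["VBoxManage", "storageattach", name, "--storagectl", controller,
         "--port", "0", "--device", "0", "--type", "hdd",
         "--medium", disk_path],
    ]


def create_vm(name, disk_path, dry_run=True):
    """Print the commands, or run them when dry_run is False."""
    for command in vboxmanage_commands(name, disk_path):
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.check_call(command)
```

Calling create_vm("CentOS 5.4 for R-LAMP", "centos54.vdi") with the default dry_run=True only prints the commands, which makes it easy to compare them against your VirtualBox version first.  The disk file name here is a made-up example.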

Install CentOS

Now we essentially fire up the machine, load the DVD, and install the OS.  

To start the virtual machine:
  1. Select "CentOS 5.4 for R-LAMP" on the left side of the VirtualBox window and press the "Start" button.
  2. This opens a new window titled "CentOS 5.4 for R-LAMP" representing the monitor for the virtual machine along with the "VirtualBox - Information" dialog.  This notes that your cursor and keyboard input are now going to the virtual machine.  When you want to get them back, press the right "Ctrl" key.  (Don't do that right now.)
  3. Press "OK" to dismiss the information box and the "First Run Wizard" will appear.  Press "Next".
  4. You now need to load up the DVD into the virtual DVD drive.  Here I assume you downloaded the ISO file.  If you have a physical disk, adjust the directions accordingly.
  5. Press the file browser button next to the "Media Source" field.  Browse for the ISO file and press "Select".  Back in the wizard, press "Next".
  6. Press "Finish" to close the wizard and boot the virtual machine with the CentOS DVD loaded.

At this point you will get the "CentOS" splash screen.  This means you've successfully created the virtual machine and booted it with the CentOS DVD.  Now we let the CentOS DVD do some initial checking of the disk image.

  1. Press the "Enter" key to start up the graphical installer.  It's graphical in the old-school Ultima I sense.  Remember that right Ctrl will take focus away from the virtual machine and clicking on it will move focus back.
  2. The "CD Found" dialog appears.  Use the "Tab" key to highlight the "OK" button and press "Enter".
  3. The "Media Check" dialog appears.  Highlight "Test" and press "Enter".  It'll spend a bit of time checking the DVD for errors.
  4. The "Media Check Result" dialog appears saying "The media check PASSED".  Press "Enter".
  5. The "Media Check" dialog appears again.  We are using a single DVD rather than multiple CDs so there's nothing else to test.  Press "Tab" to highlight "Continue" and press "Enter".
  6. Now here's a tricky bit.  If you followed the directions you get an "Error" dialog.  We checked the DVD but that step didn't leave it loaded or something.  On the top menu of the VirtualBox window select Devices....
  7. On the "Error" dialog press "Enter" again.  It'll now find the DVD and boot up.  It'll show some command line startup information, then launch the CentOS graphical user interface.  This is a bona fide twenty-first century GUI.
  8. You will probably get a "VirtualBox - Information" box saying the VM is optimized for 32-bit color but the virtual display is set to 24-bit.  I haven't figured out how to make this go away, so if anyone knows please tell me.  Press "OK".

At this point we have successfully booted CentOS within the virtual machine.

  1. The screen is now a standard GUI with clickable buttons.  Select "Next".
  2. Leave the language as "English (English)" and press "Next". 
  3. Select the type of keyboard that you are using.  Press "Next".
  4. A "Warning" dialog pops up with a scary message saying the partition table on device hda was unreadable.  Don't panic.  It's just saying that our virtual hard drive hasn't been formatted.  Press "Yes" to format the drive.  (Note we are in a virtual machine so we know reformatting isn't going to hurt our normal hard drive.)
  5. On the next page, keep the partitioning settings unchanged and press "Next".  Another Warning dialog appears.  Press "Yes" again.
  6. The "Network Devices" page appears.  Leave this unchanged and press "Next".
  7. A map of the world appears.  Select the city nearest to you that's in your time zone.  This screen is really all about setting the time zone.  Press "Next".
  8. Enter a Root Password.  In this example we will use "r-lamp" for the root password.  Enter the same password in the Confirm field and press "Next".

Whew, lots of dialogs... But we are getting there.  We now have the hardware all set up and need to make some choices regarding the software to install.

The primary purpose of this machine will be as a server so we don't need to install things like Open Office.  However, I do like to work in a GUI rather than just at the console.  The best default configuration for this scenario is "Server - GUI".
  1. Unselect "Desktop - Gnome" and select "Server - GUI".  It's also reasonable to further customize which items are installed, but for this example we'll just take the defaults.  Press "Next".
  2. The next screen says it's ready to do the installation of the requested components to the hard disk.  Press "Finish".  (Or whatever it's called, I pressed it before writing down the button name.)

At this point there's a new screen with a progress bar and information on each component that's getting installed.  When everything is installed, a "Congratulations" screen appears saying it's time to restart.

  1. First, unmount the DVD by selecting it in the "Devices > CD/DVD Devices" menu at the top of the VirtualBox window.  We are unmounting the DVD so it doesn't boot from it when we restart.
  2. Press "Reboot".
  3. CentOS will shut down and then go through its startup sequence.  You will then get the "32 bit color" Information dialog again.  Press "OK" to dismiss it.

We have now started CentOS from the hard drive for the first time.  It will walk us through some configuration options.

  1. The "Welcome" screen appears. Press "Forward".
  2. The "Firewall" screen appears.  Since this is to be a web server that I won't log onto remotely, I'll uncheck "SSH" and check "FTP", "HTTPS", and "HTTP".  Press "Forward".  A confirmation dialog appears.  Press "Yes".
  3. The "SELinux" screen appears.  Press "Forward".
  4. The "Kdump" screen appears.  Press "Forward".
  5. The "Date and Time" screen appears.  Set the correct date and time.  If you want it to automatically get the date and time from the network, check the "Enable Network Time Protocol" box on the "Network Time Protocol" tab.  Press "Forward".
  6. The "Create User" screen appears.  I'm setting this up to use R with LAMP, so I'll create a user named "r-user" with Full Name "R User" and password "r-passwd".  Press "Forward".
  7. The "Sound Card" screen appears.  Test the sound card if you'd like.  Press "Forward".
  8. The "Additional CDs" screen appears.  Press "Finish".
  9. At this point CentOS will set some things up, and a login screen will appear.
  10. Enter the username "r-user" and password "r-passwd".

The user "r-user" will now be logged in and their Desktop will appear.  If you've made it this far the installation has been a success.

If that's enough fun for now, select "System > Shutdown" from the CentOS menu.  Otherwise proceed to the next post to install the rest of the LAMP stack.

1 comment:

  1. Great article on setting up LAMP prior to integrating R.

    However, I can recommend an easier way of creating a LAMP environment in VirtualBox: a program called Vagrant, which cuts out the middleman.  I have always created my own VMs, but Vagrant makes the whole process less painful.

    Maybe you could write about how to integrate R into LAMP using Vagrant.
    http://www.vagrantup.com/
