Tuesday, 6 April 2010

Getting Started: Configuring Rapache

In the last post we installed Rapache, but didn't get to configuring it. Let's now create the "rapache.conf" file and some "Hello World" examples to check that Rapache works.

Configuration

As was the case for PHP and Python, we create a file in the "/etc/httpd/conf.d" directory with configuration information. We will name this "rapache.conf" and start out with the following text:



LoadModule R_module /etc/httpd/modules/mod_R.so

ROutputErrors

<Location /RApacheInfo>
  SetHandler r-info
</Location>

<Directory /var/www/html/rscripts>
  SetHandler r-script
  RHandler sys.source
</Directory>

<Directory /var/www/html/brew>
  SetHandler r-script
  RHandler brew::brew
</Directory>

The "LoadModule" statement tells Apache to load the "mod_R.so" shared library and associate it with the "R_module" set of directives.

The "ROutputErrors" statement indicates that R errors should be displayed in the browser.

The "Location" statement creates a location "RApacheInfo" that displays information about the running rapache module. We can test that rapache has loaded and is running correctly by browsing to the link:



http://localhost/RApacheInfo

The first "Directory" statement indicates that all files in the "rscripts" subdirectory will be processed by the "sys.source()" function. This will execute the file as an R script.

The second "Directory" statement indicates that all files in the "brew" subdirectory will be processed by the "brew" function that is in the "brew" package. This function takes a file containing a mix of HTML and R code, executes the R code, and places the results within the HTML that is returned. This is analogous to the mixture of HTML and code in PHP and PSP.
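To double-check which handler each section assigns, the configuration can be summarised with a few lines of Python. This is only a sketch under some assumptions: it is written for Python 3, the function name "summarize_sections" is made up for illustration, and it understands just the simple "Directory" and "Location" layout used above. It is not a general Apache configuration parser.

```python
import re

RAPACHE_CONF = """\
LoadModule R_module /etc/httpd/modules/mod_R.so

ROutputErrors

<Location /RApacheInfo>
  SetHandler r-info
</Location>

<Directory /var/www/html/rscripts>
  SetHandler r-script
  RHandler sys.source
</Directory>

<Directory /var/www/html/brew>
  SetHandler r-script
  RHandler brew::brew
</Directory>
"""

# Matches one <Directory ...> or <Location ...> section and captures its body.
SECTION = re.compile(r"<(Directory|Location)\s+([^>]+)>(.*?)</\1>", re.DOTALL)


def summarize_sections(conf_text):
    """Return {path: {directive: value}} for each section in conf_text."""
    summary = {}
    for _kind, path, body in SECTION.findall(conf_text):
        directives = {}
        for line in body.splitlines():
            parts = line.split(None, 1)  # directive name, then its value
            if len(parts) == 2:
                directives[parts[0]] = parts[1].strip()
        summary[path.strip()] = directives
    return summary


for path, directives in summarize_sections(RAPACHE_CONF).items():
    print(path, directives)
```

The output lists each path next to its "SetHandler" and "RHandler" values, which makes it easy to spot a section that is missing one of them.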

Hello World: R Script

We can test the "R Script" handling with some simple code that generates HTML. This code also uses the "setContentType" function to indicate that the result should be treated as HTML, and finishes with the "DONE" statement indicating the script has finished without error.



setContentType("text/html")
cat("<HTML><BODY><H1>")
cat("Hello from R!")
cat("</H1></BODY></HTML>")
DONE

If we save this to the file "test.R" in "/var/www/html/rscripts" we will see "Hello from R!" displayed in the Header 1 font when we browse to:



http://localhost/rscripts/test.R
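If you would rather check the page from a script than from the browser, something like the following Python sketch could be used. It assumes Python 3 is available, and the helper names "fetch_body" and "page_contains" are invented for this example. The fetching step is kept separate so the check itself can be tried without a running server.

```python
import urllib.request


def fetch_body(url):
    """Return the raw response body for url (needs a running web server)."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def page_contains(url, expected, fetch=fetch_body):
    """Report whether the page at url contains the expected text."""
    body = fetch(url).decode("utf-8", errors="replace")
    return expected in body
```

With Apache running and "test.R" in place, calling page_contains("http://localhost/rscripts/test.R", "Hello from R!") should return True. If the script is being served as plain text rather than executed, the text will still be found, so it is worth looking at the page in the browser at least once.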

Hello World: Brew

Instead of writing out all of the HTML directly with "cat()" commands, we can create an "rhtml" file containing a mix of R and HTML. This then gets processed by the "brew" function to create the HTML response.

As an example, we create the file "test.rhtml" in "/var/www/html/brew" containing:



<HTML>
<BODY>
<H1>
<% cat("Hello from Brew!") %>
</H1>
</BODY>
</HTML>

Browsing to "http://localhost/brew/test.rhtml" will display "Hello from Brew!" in the Header 1 font.
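To make the idea of "execute the embedded code and splice the output into the HTML" concrete, here is a toy Python 3 analogue. It is not how brew is implemented and it handles only the plain "<% ... %>" form used above; the names "render" and "CHUNK" are made up for this sketch. Because it runs whatever code it finds, it should only ever be given templates you wrote yourself.

```python
import contextlib
import io
import re

# Matches a <% ... %> code chunk, as in the "test.rhtml" example.
CHUNK = re.compile(r"<%(.*?)%>", re.DOTALL)


def render(template):
    """Replace each <% code %> chunk with whatever that code prints."""
    namespace = {}  # shared, so later chunks can see earlier variables

    def run_chunk(match):
        buffer = io.StringIO()
        # Capture anything the chunk prints instead of sending it to the console.
        with contextlib.redirect_stdout(buffer):
            exec(match.group(1).strip(), namespace)
        return buffer.getvalue()

    return CHUNK.sub(run_chunk, template)


page = '<H1><% print("Hello from a template!", end="") %></H1>'
print(render(page))  # prints <H1>Hello from a template!</H1>
```

The text outside the delimiters passes through untouched, which is why the surrounding HTML in "test.rhtml" reaches the browser as written.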

Beyond Hello

Rapache provides rich capabilities for accessing all of the information in the HTTP request from within R, and for setting information as part of the HTTP response.

It also provides ways to pass information other than HTML back as the response, and can even support the uploading of files to the server as part of the client's request.

These features are discussed in the Rapache manual and illustrated by the examples available from the Rapache web site.

Wednesday, 31 March 2010

Getting Started: Setting Up Rapache

In previous posts we've described how to set up CentOS running on a virtual machine in VirtualBox, and how to set up a standard LAMP configuration on that box.

The final pieces needed to run R with LAMP are:

R built for use as a shared library
rapache to connect R with Apache
The "brew" package for processing HTML files with embedded R code
Other packages we want to be available from R

Installing Dependencies

In order for Apache to load R, the R application needs to be built as a shared library. As we are building R from the source, it's convenient to make sure we have installed other C libraries commonly needed by R first.

Recall that in the CentOS machine we configured there is a user "r-user" with password "r-passwd". The system password is "r-lamp". Start up CentOS and log in as "r-user".

Open a terminal and use the "su" command to gain administrative rights. Then use "yum" to install the following packages:

yum install gcc-gfortran
yum install gcc-c++
yum install readline-devel
yum install libpng-devel libX11-devel libXt-devel
yum install texinfo-tex
yum install tetex-dvips
yum install docbook-utils-pdf
yum install cairo-devel
yum install java-1.6.0-openjdk-devel
yum install libxml2-devel

It isn't strictly required to install all of these. For example, the "openjdk" libraries are only needed if the "rJava" package is going to be used. However, it's much easier to install them all now than to rebuild R later when the desire to use "rJava" occurs.

Installing R

The first step is to retrieve the source "tar.gz" file from CRAN. The source for the latest release is available from the main page of the CRAN web site, such as:

http://cran.r-project.org/src/base/R-2/R-2.10.1.tar.gz

Use Firefox to download this file, and move it to the home directory for "r-user".

At this point we need to follow the R installation instructions carefully. Two important points:

In order for the files to be readable by others, we need to set the appropriate default permissions for newly created files. For example, set "umask 022".
When configuring the R build, include the flag "--enable-R-shlib".
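For anyone unsure what "umask 022" actually does, the arithmetic is simple: the umask lists the permission bits to clear from whatever a program asks for. The Python 3 sketch below shows the effect; "default_mode" is a made-up helper for illustration and not part of any tool used here.

```python
UMASK = 0o022  # clear the write bit for "group" and "other"


def default_mode(requested, umask=UMASK):
    """Return the permission bits left after the umask clears its bits."""
    return requested & ~umask


print(oct(default_mode(0o666)))  # 0o644: new files are readable by everyone
print(oct(default_mode(0o777)))  # 0o755: new directories can be listed by everyone
```

That is why files created during the build and install end up readable by the account Apache runs under, even though a different user created them.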

The following commands will unpack the files and build R 2.10.1, which is the most recent version when this was written:

umask 022
tar xf R-2.10.1.tar.gz
cd R-2.10.1
./configure --enable-R-shlib
make

For a quick check on whether R built correctly:

make check

When you are happy with the build, install R with:

make install

Installing R Packages

We will go ahead and install various packages that are likely to be of use with web applications. It is likely that you will want to disallow installation of packages by R scripts run from Apache for security reasons, so the required packages would be installed in advance.

To install the packages, first use "cd" to change back to the home directory of "r-user", then start R with "R". Package installation may fail if you start "R" from within the directory where we were running the "make" command.

Use the "install.packages()" function to install the packages, such as:

install.packages(c("brew", "XML", "rjson", "RMySQL", "RJDBC", "rJava", "Cairo", "Hmisc"))

The only package that's required for use with "rapache" is "brew".

Installing Rapache

The procedure for downloading, building, and installing Rapache is similar to that for R. The main detail is you need to include "--with-apache2-apxs=/usr/sbin/apxs" when doing the configuration.

Download the "tar.gz" from the rapache web site:

http://rapache.net/files/rapache-1.1.9.tar.gz

Move the file to the home directory for "r-user" and run the following commands:

tar xf rapache-1.1.9.tar.gz
cd rapache-1.1.9
./configure --with-apache2-apxs=/usr/sbin/apxs
make install

Rapache is now installed.

Configuring Apache

The next step is configuring Apache HTTPD to load the R module.

Create a file "rapache.conf" in "/etc/httpd/conf.d" with this basic configuration information:

LoadModule R_module /etc/httpd/modules/mod_R.so


ROutputErrors


<Location /RApacheInfo>
  SetHandler r-info
</Location>

LoadModule tells Apache to load rapache. ROutputErrors tells it to direct error messages from the R engine to the browser rather than just putting them in a log file. SetHandler maps the "RApacheInfo" location to an action returning information about the rapache configuration.

To test this out, browse to "http://localhost/RApacheInfo". Information about rapache should be displayed.

At this point we have rapache working, but Apache is not yet configured to process "R" or "RHTML" files. That configuration and example test scripts will be covered in a later post. If you are anxious to get that configuration in place, see the rapache manual for details.

Tuesday, 30 March 2010

Getting Started: The AMP in LAMP

In the previous post we set up CentOS running under VirtualBox.

So far we've just taken the default "Server - GUI" configuration. Now we'll install the rest of the AMP stack, along with some tools to make our work easier.

We'll install the base AMP components:

Apache HTTP
MySQL
Perl
PHP
Python

and the additional tools:

Firefox
emacs

Installing Software on CentOS

CentOS uses the yum package manager, which we will run from the command line to install packages. The primary commands we will use are "yum install pkgname" and "yum search pkgname".

If you prefer to use a GUI, the "Applications > Add/Remove Software" provides dialogs for installing and uninstalling software.

To use "yum", first open a terminal from the "Applications > Accessories > Terminal" menu item. We will be logged into the terminal as "r-user". In order to install system software we need to be a "superuser". That is, a user with administrative rights.

To become a superuser, use the "su" command and enter the system password. In our example installation the password is "r-lamp".

Here are the commands to install the components we want.



yum install emacs

yum install firefox

yum install httpd httpd-devel

yum install mysql mysql-server mysql-devel

yum install php php-mysql php-common php-gd php-mbstring php-mcrypt php-devel php-xml

yum install perl

yum install python mod_python

If the component is already installed, it will say so and return immediately. If not, you'll be shown a list of components to be installed. Press "y" to continue with the installation.

The list of MySQL and PHP components is taken from the very useful "Quick 'n' Easy LAMP Server" article. That article is the basis for the configuration steps described below, along with the article "Embedding Python in Apache 2".

Apache HTTPD

The most-used web server in the world is Apache HTTPD. This is the Apache HTTP daemon. We'll refer to this as "Apache".

In a LAMP application, the job of HTTPD is to take requests from a web browser, hand them over to something like PHP to prepare a response such as an HTML page, and deliver that response back to the web browser.

Some commands we will use to manage HTTPD are:



/etc/init.d/httpd status

/etc/init.d/httpd start

/etc/init.d/httpd stop

/etc/init.d/httpd restart

Apache has a standard system for installing add-on "modules". These are C applications expanding the capabilities of Apache. We will configure the modules needed for Apache to call PHP, Perl, or Python when the user requests a page requiring processing.

Apache has a wide range of capabilities for directing requests based on the URL requested. Here we will configure Apache to call PHP, Perl, or Python when files with specific file extensions are requested.

By default, Apache maps URLs to locations under the "/var/www/html" directory. We will place our example files there.
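As a rough illustration of that default mapping, here is a small Python 3 sketch that turns a URL into the file Apache would look for. The names "url_to_path" and "DOCUMENT_ROOT" are invented for this example, and the sketch deliberately ignores everything else Apache can do with a URL, such as aliases or rewrite rules.

```python
import posixpath
from urllib.parse import urlparse

DOCUMENT_ROOT = "/var/www/html"  # the default directory mentioned above


def url_to_path(url, document_root=DOCUMENT_ROOT):
    """Map a URL to the file Apache would look for under document_root."""
    url_path = urlparse(url).path
    # Normalise first so that ".." cannot climb above the document root.
    relative = posixpath.normpath("/" + url_path).lstrip("/")
    return posixpath.join(document_root, relative)


print(url_to_path("http://localhost/test.php"))         # /var/www/html/test.php
print(url_to_path("http://localhost/rscripts/test.R"))  # /var/www/html/rscripts/test.R
```

So a file saved as "/var/www/html/test.php" is the one requested by browsing to "http://localhost/test.php".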

PHP

When the PHP components are installed by "yum", Apache is also configured to use PHP when a requested file name ends in "php", such as "test.php".

Create a file "/var/www/html/test.php" with the following text:

<?php
   phpinfo();
?>

If you are familiar with emacs, you can launch an emacs editor window from the Terminal using the "emacs" command. Otherwise, use some other method to get a text editor with permission to write to "/var/www/html".

I'm being vague here as I haven't explored how to get the right permissions using the "gedit" text editor that is available from the menu system.

To test PHP, make sure Apache is started, open Firefox, and browse to "http://localhost/test.php". If PHP is working, the browser will display PHP version information.

Python

The Apache configuration file for Python is in "/etc/httpd/conf.d/python.conf". Before modifying this file, you may want to save a copy under a name such as "python.conf.orig".

The line in this file that enables the Python support in general is:



LoadModule python_module modules/mod_python.so

The Python module supports two types of file processing. The "Python Publisher" handler will execute a Python script file. The "Python PSP" handler takes a file with Python code embedded in HTML and processes it in a manner similar to PHP.

We can specify that "py" files will be handled by the "Python Publisher" and "psp" files by the "Python PSP" handler as shown below.



<Directory /var/www/html>

  AddHandler mod_python .psp .py

  <Files *.psp>

    PythonHandler mod_python.psp

  </Files>

  <Files *.py>

    PythonHandler mod_python.publisher

  </Files>

</Directory>
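The effect of the two "Files" sections can be pictured as a small lookup from file name pattern to handler. The Python sketch below mirrors that lookup; "PYTHON_HANDLERS" and "python_handler_for" are names made up for illustration, and Apache's real matching has more rules than this.

```python
from fnmatch import fnmatch

# (pattern, PythonHandler) pairs mirroring the <Files> sections above.
PYTHON_HANDLERS = [
    ("*.psp", "mod_python.psp"),
    ("*.py", "mod_python.publisher"),
]


def python_handler_for(filename):
    """Return the PythonHandler name for filename, or None if none applies."""
    for pattern, handler in PYTHON_HANDLERS:
        if fnmatch(filename, pattern):
            return handler
    return None


print(python_handler_for("test.py"))   # mod_python.publisher
print(python_handler_for("test.psp"))  # mod_python.psp
print(python_handler_for("test.php"))  # None
```

Files that match neither pattern, such as "test.php", are left to whatever else Apache has configured for them.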

The following code creates an "mpinfo" location that provides information about mod_python similar to the information we saw from the "phpinfo()" function.



<Location /mpinfo>

    SetHandler mod_python

    PythonHandler mod_python.testhandler

</Location>

With these settings, we can create a "test.py" and "test.psp" file in "/var/www/html" to try things out.

An example "test.py" file is:



def index(req):

    return "Hello from Python!"

An example "test.psp" file is:



<HTML>

<BODY>

<H1><% req.write("Hello from PSP!") %></H1>

</BODY>

</HTML>

We can browse to "http://localhost/test.py" and "http://localhost/test.psp" to try these out. Browse to "http://localhost/mpinfo" to see the information about mod_python.
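The Publisher handler can do a little more than call "index". As I understand the mod_python documentation, other functions in the file can be requested by adding their name to the URL, and form fields are passed in as named arguments. The sketch below adds a hypothetical "greet" function to "test.py" to show the idea; the function name and its "name" parameter are my own inventions for this example.

```python
def index(req):
    # Served for http://localhost/test.py (the default function).
    return "Hello from Python!"


def greet(req, name="world"):
    # Hypothetical extra function, requested as
    # http://localhost/test.py/greet?name=R
    return "Hello, %s!" % name
```

With this file in place, browsing to "http://localhost/test.py/greet?name=R" should display "Hello, R!", and leaving off the query string should fall back to "Hello, world!".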

Perl

Configuring Perl is similar in principle to configuring Python. Having said that, I don't have a lot of interest in Perl at the moment and the "Hello World" examples aren't concise enough to be trivial at this time of the evening. Hence we will leave Perl unexplored for the time being.

MySQL

Before departing, let's not forget the "M" in LAMP, which is "MySQL". MySQL is a popular open source database. The MySQL daemon will start up MySQL and leave it waiting for requests from PHP, Python, or Perl. The daemon is started with:



/etc/init.d/mysqld start

The console interface to MySQL can be started with:



mysql

The "Quick 'n' Easy LAMP" article referenced above provides information on how to set the administrative password and create a new user in MySQL.

Automatic Startup

On a server, we will want Apache and MySQL to start up automatically when the machine is started. The following commands will set this:



/sbin/chkconfig httpd on

/sbin/chkconfig mysqld on

Summary

In this posting we've discussed how to install and configure Apache, MySQL, PHP, and Python. We skipped Perl for the time being.

This gives us everything we need to start creating LAMP applications. The next post in this series will discuss how to add Rapache in order to use R rather than PHP or Python for processing requests.

Getting Started: Setting Up CentOS on VirtualBox

The first step of using R with LAMP is to get the standard LAMP part of the system up and going. There are probably hundreds of references on installing LAMP and getting to the Hello World stage. I'll provide yet another one describing the steps I used in my standard configuration.

At a high level, the components needed are:

A Virtual Machine
Linux
Apache HTTP, MySQL, PHP, Python

Here's some background on options available in these areas.

The Virtual Machine: VirtualBox

The L in LAMP stands for "Linux", and the tools for connecting R with Apache are Linux-focused. They take advantage of features in Linux that aren't available in Windows to efficiently spawn R threads from Apache.

I tend to work on Windows, so in order to use Linux I can either configure my laptop to have separate Linux and Windows partitions, or I can run Linux in a Virtual Machine. The advantage of using a Virtual Machine is it makes it easy to set up many different LAMP environments, and the environment is well isolated from the rest of the laptop.

I use Sun VirtualBox as the virtual machine as it is freely available and includes the tools for creating virtual machines.

Another reasonable choice is VMWare Player. The catch with this is that while it can run a Virtual Machine it can't create one. However, this isn't that big a deal as you can use EasyVMX to create the basic Virtual Machine configuration. A post on my other blog describes how to do this.

The Operating System: CentOS

Upon deciding to jump into Linux, one quickly discovers that there are bunches of Linux variations out there. DistroWatch is a good place to look if you want to be overwhelmed by the richness of choices.

The different Linux variations have much in common, but also have substantive differences. The main difference at the beginning is they will use different installation utilities based on the family of the distribution.

My opinion is that most people select either a Red Hat or Debian variation. Red Hat uses the yum package manager and Debian uses the apt package manager. This difference means the installation instructions differ between these distributions.

Debian has widespread support for installation of R packages via apt due largely to the efforts of Dirk Eddelbuettel. As most R packages can be installed from R itself, this isn't a super-important distinction. However, it does ease the installation of R packages that rely on additional system components, such as rJava or Cairo.

Red Hat seems to have more traction with corporate IT groups. In particular, some of our clients use Red Hat so that's the operating system family that I discuss here.

Red Hat is a commercial company that sells the Red Hat Enterprise Linux (RHEL) operating system along with support and services. Red Hat the company also sponsors the free Fedora operating system. Fedora is used partly as an avenue for integrating new technologies into the Red Hat family before they are incorporated into the more conservative RHEL product.

Because RHEL is based on GPL code, Red Hat must release the source code back to the community. The Community Enterprise Operating System (CentOS) project takes the RHEL source, changes the branding, and releases an operating system called CentOS that is binary-compatible with RHEL. Thus CentOS is a good choice for prototyping applications that may be deployed to RHEL sometime in the future. In particular, the steps involved when installing on CentOS will match those for RHEL.

I'll be using CentOS here.

The Rest of LAMP: Apache HTTP, MySQL, Perl/PHP/Python

There aren't any decisions to be made regarding the "A" and "M" in LAMP. The "A" is Apache HTTP. The "M" is MySQL.

The "P" can stand for Perl, PHP, or Python depending on which scripting language you prefer. My view is you might as well just install them all. They are all standard components that are easy to install on Linux, and easy to configure with Red Hat.

Setting It Up

To install LAMP in a virtual machine on Windows, the steps are:

Install Sun VirtualBox on Windows
Install CentOS on a virtual machine
Install Apache HTTP, MySQL, PHP, and Python on CentOS

Initially I was going to cover all of those steps here, but found that just installing VirtualBox and CentOS took a lot of steps. So this post will cover the first two steps, with the AMP installation discussed in another post.

VirtualBox

Installing Sun VirtualBox on Windows is trivial. Just download the installer and run it. I'm using Sun VirtualBox 3.1.4.

Getting CentOS

To install CentOS, you'll first need to get a DVD image. You can download this from the CentOS download site, purchase it from a provider of Linux CD images, or often find it in a Linux magazine. I downloaded the CentOS 5.4 32-bit version (CentOS-5.4-i386-bin-DVD.iso). This is 3.7GB, so it'll take a while to download.

Creating the Virtual Machine

Next, create a virtual machine on which to install the operating system. A virtual machine is essentially some configuration information together with a file that represents a hard drive.

To create the virtual machine:

Start VirtualBox.
Select the "Machine > New..." menu item to launch the "New Virtual Machine" wizard.
Press the "Next" button.
Enter a "Name" such as "CentOS 5.4 for R-LAMP".
Select "Linux" as the "Operating System" and "Red Hat" as the "Version".
Press "Next".
On the "Memory" tab, accept the default of 384MB memory. (I haven't looked into this yet; a larger choice might be good depending upon the amount of physical RAM in your machine.)
Press "Next".
On the "Virtual Hard Disk" tab accept the default of 8192MB. Leave "Boot Hard Disk" checked and the "Create new hard disk" item selected.
Press "Next".
The "New Virtual Disk" wizard opens. Press "Next".
Leave "Dynamically expanding storage" selected. Press "Next".
Leave the "Location" and "Size" unchanged. Press "Next".
On the "Summary" page press "Finish" to complete the "New Virtual Disk" wizard.
Now press "Finish" to complete the "New Virtual Machine" wizard.

You've now created a virtual machine representing a computer with 384MB of RAM and an 8GB hard drive that's configured to work well with members of the Red Hat operating system family such as CentOS.

It looks like a lot of steps, but really you'll just be pressing "Next" a lot.

Install CentOS

Now we essentially fire up the machine, load the DVD, and install the OS.

To start the virtual machine:

Select "CentOS 5.4 for R-LAMP" on the left side of the VirtualBox window and press the "Start" button.
This opens a new window titled "CentOS 5.4 for R-LAMP" representing the monitor for the virtual machine along with the "VirtualBox - Information" dialog. This notes that your cursor and keyboard input are now going to the virtual machine. When you want to get them back, press the right "Ctrl" key. (Don't do that right now.)
Press "OK" to dismiss the information box and the "First Run Wizard" will appear. Press "Next".
You now need to load up the DVD into the virtual DVD drive. Here I assume you downloaded the ISO file. If you have a physical disk, adjust the directions accordingly.
Press the file browser next to the "Media Source" field. Browse for the ISO file and press "Select". Back in the wizard, press "Next".
Press "Finish" to close the wizard and boot the virtual machine with the CentOS DVD loaded.

At this point you will get the "CentOS" splash screen. This means you've successfully created the virtual machine and booted it with the CentOS DVD. Now we let the CentOS DVD do some initial checking of the disk image.

Press the "Enter" key to start up the graphical installer. It's graphical in the old-school Ultima I sense. Remember that right Ctrl will take focus away from the virtual machine and clicking on it will move focus back.
The "CD Found" dialog appears. Use the "Tab" key to highlight the "OK" button and press "Enter".
The "Media Check" dialog appears. Highlight "Test" and press "Enter". It'll spend a bit of time checking the DVD for errors.
The "Media Check Result" dialog appears saying "The media check PASSED". Press "Enter".
The "Media Check" dialog appears again. We are using a single DVD rather than multiple CDs so there's nothing else to test. Press "Tab" to highlight "Continue" and press "Enter".
Now here's a tricky bit. If you followed the directions you get an "Error" dialog. We checked the DVD but that step didn't leave it loaded or something. On the top menu of the VirtualBox window select Devices....
On the "Error" dialog press "Enter" again. It'll now find the DVD and boot up. It'll show some command line startup information, then launch the CentOS graphical user interface. This is a bona fide twenty-first century GUI.
You will probably get a "VirtualBox - Information" box saying the VM is optimized for 32 bit color but the virtual display is set to 24 bit. I haven't figured out how to make this go away, so if anyone knows please tell me. Press "OK".

At this point we have successfully booted CentOS within the virtual machine.

The screen is now a standard GUI with clickable buttons. Select "Next".
Leave the language as "English (English)" and press "Next".
Select the type of keyboard that you are using. Press "Next".
A "Warning" dialog pops up with a scary message saying the partition table on device hda was unreadable. Don't panic. It's just saying that our virtual hard drive hasn't been formatted. Press "Yes" to format the drive. (Note we are in a virtual machine so we know reformatting isn't going to hurt our normal hard drive.)
On the next page, keep the partitioning settings unchanged and press "Next". Another Warning dialog appears. Press "Yes" again.
The "Network Devices" page appears. Leave this unchanged and press "Next".
A map of the world appears. Select the city nearest to you that's in your time zone. This screen is really all about setting the time zone. Press "Next".
Enter a Root Password. In this example we will use "r-lamp" for the root password. Enter the same password in the Confirm field and press "Next".

Whew, lots of dialogs... But we are getting there. We now have the hardware all set up and need to make some choices regarding the software to install.

The primary purpose of this machine will be as a server so we don't need to install things like Open Office. However, I do like to work in a GUI rather than just at the console. The best default configuration for this scenario is "Server - GUI".

Unselect "Desktop - Gnome" and select "Server - GUI". It's also reasonable to further customize which items are installed, but for this example we'll just take the defaults. Press "Next".
The next screen says it's ready to do the installation of the requested components to the hard disk. Press "Finish". (Or whatever it's called, I pressed it before writing down the button name.)

At this point there's a new screen with a progress bar and information on each component that's getting installed. When everything is installed, a "Congratulations" screen appears saying it's time to restart.

First, unmount the DVD by selecting it in the "Devices > CD/DVD Devices" menu at the top of the VirtualBox window. We are unmounting the DVD so it doesn't boot from it when we restart.
Press "Reboot".
CentOS will shut down and then go through its startup sequence. You will then get the "32 bit color" Information dialog again. Press "OK" to dismiss it.

We have now started CentOS from the hard drive for the first time. It will walk us through some configuration options.

The "Welcome" screen appears. Press "Forward".
The "Firewall" screen appears. Since this is to be a web server that I won't log onto remotely, I'll uncheck "SSH" and check "FTP", "HTTPS", and "HTTP". Press "Forward". A confirmation dialog appears. Press "Yes".
The "SELinux" screen appears. Press "Forward".
The "Kdump" screen appears. Press "Forward".
The "Date and Time" screen appears. Set the correct date and time. If you want it to automatically get the date and time from the network, check the "Enable Network Time Protocol" box on the "Network Time Protocol" tab. Press "Forward".
The "Create User" screen appears. I'm setting this up to use R with LAMP, so I'll create a user named "r-user" with Full Name "R User" and password "r-passwd". Press "Forward".
The "Sound Card" screen appears. Test the sound card if you'd like. Press "Forward".
The "Additional CDs" screen appears. Press "Finish".
At this point CentOS will set some things, and a login screen will appear.
Enter the username "r-user" and password "r-passwd".

The user "r-user" will now be logged in and their Desktop will appear. If you've made it this far the installation has been a success.

If that's enough fun for now, select "System > Shutdown" from the CentOS menu. Otherwise proceed to the next post to install the rest of the LAMP stack.

About This Blog

Welcome to the R-LAMP blog! This is a blog focused on using R as part of a LAMP application.

At the very least, this will serve as a place for people to find references on getting up and going with R and LAMP. Potentially it will grow to also contain information on other web components useful for statistics and plots, such as Google Charts.

If anyone would like to add relevant posts, drop me a note and I can add you as an author.

R-LAMP: Using R for LAMP Development