This document outlines what you need to do to run parallel computations
within R, using statcluster.ucsf.edu as a twelve-CPU cluster. This procedure
uses the LAM/MPI message passing interface" to communicate between computers ("nodes") comprising the cluster,
and the "snow" library within R to distribute computations. LAM/MPI
works by using rsh/ssh to distribute computations between cluster nodes and return the results to the calling process.
After the initial set-up in step 1, you need only follow step 2 below to do a parallel computation in R.
Note that this approach is designed for "embarrassingly parallel" computations like simulations and the bootstrap. Also, the examples require a good working knowledge of R or S-PLUS."
Remember that is important to properly shut down LAM/MPI processes after you're finished (steps 2-d and 2-e below). Leaving things running will tie up resources that will be unavailable for others to use.
See the article by Rossini et al and Luke Tierney's pages on the snow() library for more details about parallel computations using R. Finally, note that custom R functions using clusters can be written using primitives in the Rmpi library. Further, LAM/MPI can be used to implement parallel programs in FORTRAN and C that run outside of the R environment. (See www.lam-mpi.org for information on how to do this.)
statcluster.ucsf.edu from Mark Garey. Note that this machine can only be
accessed from Sun and Apple hosts in the Division offices, and that user accounts on the
master node (statcluster) are mirrored to the daughter nodes via NFS, so direct login to
nodes other than the master is unecessary. However, you'll need to set-up password-less login
to the daughter nodes for the LAM/MPI software to work. This is described below:
1.  Use ssh to login in to statcluster from an account on a trusted host among the Division workstations, and generate a file called ".rhosts" in your home directory using a text editor (e.g. vi).
The contents of the file should look like the following:
10.0.0.1 10.0.0.2 10.0.0.3 10.0.0.4 10.0.0.5 10.0.0.6These are the IP addresses of all hosts on the internal ethernet network for the cluster. Set the permissions of this file as follows:
chmod 600 .rhostsNow, verify that the file permissions are set to allow reading and writing only by you:
statcluster:~ steve$ ls -lrt .rhosts -rw------- 1 steve staff 54 28 Oct 12:16 .rhostsDo not make a .rhosts file on your desktop machine under any circumstances, and never attempt to set-up password-less login on that machine.
2.   Follow the steps below to run a distributed job using R on statcluster as a 12-unit cluster:
statcluster statcluster node1 node1 node2 node2 node3 node3 node4 node4 node5 node5
statcluster, use the following
commands to initiate the LAM/MPI process on the specified cluster:
LAMRSH=rsh export LAMRSH lamboot -v lamhosts
statcluster:~/work/test steve$ lamboot -v lamhosts LAM 7.0.4/MPI 2 C++/ROMIO - Indiana University n0<11018> ssi:boot:base:linear: booting n0 (statcluster) n0<11018> ssi:boot:base:linear: booting n1 (node1) n0<11018> ssi:boot:base:linear: booting n2 (node2) n0<11018> ssi:boot:base:linear: booting n3 (node3) n0<11018> ssi:boot:base:linear: booting n4 (node4) n0<11018> ssi:boot:base:linear: booting n5 (node5) n0<11018> ssi:boot:base:linear: finished
makeMPIcluster()
should match the number of hosts in your lamhosts file.)
library(snow) cl <- makeCluster(12, type = "MPI")
Loading required package: Rmpi
Rmpi version: 0.4-9
Rmpi is an interface (wrapper) to MPI APIs
with interactive R slave functionalities.
See `library (help=Rmpi)' for details.
12 slaves are spawned successfully. 0 failed.
snow library has versions of standard functions designed to distribute work over the nodes in the cluster and collect the results locally. Below is an example of a simple bootstrap prediction computation with and without using the cluster:
############### single CPU #####################
> source("boot-example.R")
> system.time(boot(nuke.data, nuke.fun, R = 1000, m = 1, fit.pred = new.fit, x.pred = new.data))
[1] 11.86 0.53 12.40 0.00 0.00
############### twelve CPUs #####################
system.time() gives timing information, and the 3rd number returned gives the elapsed time (in seconds) for the process to run. The second call is faster. There is currently only one online help file for snow. You can view it via the help command called with any valid snow function. e.g. "help(parMM)". See the linked page for more details about this example.
stopCluster(cl)
lamclean -v
bootstrap and boot as dcescribed above) is that the results returned from the included cluster nodes may need to be "massaged" a bit to be useable for summarization and/or further analyses. For example, if a bootstrap confidence interval based on 1000 samples boot is desired, the cluster call applying boot() to 10 nodes will return a list consisting of 10 lists, each summarizing the bootstrap results for 100 samples. This structure is not directly usable in the boot.ci() function to compute BCa confidence intervals. It is not usually too difficult to extract the desired results. However, until cluster-friendly versions of standard routines are developed, an alternative approach which provides more control (but relies less on "canned" functions) is described below.
The basic idea is to generate a list of all component datasets (or, in the case of bootstraping, a list of indicators of all desired resamples) first, and then using a cluster-aware version of the function lapply() to apply a selected user-supplied function for estimation across cluster nodes. The results are returned as a list that can be easily manipulated.
.RData object in a directory in your account, and that you have started up a cluster and are in R as described in step 1 above.
The ingredients needed are
The list of sample indicators is generated in the two statements below:
# generate bootstrap sample indicators bdat1 <- sample(seq(1,length(icd2[,1])),1000*length(icd2[,1]),replace=TRUE) # coerce into a list of 1000 components bdat1 <- split(bdat1.0,rep(seq(1,1000),rep(length(jnk[,1]),1000)))The first statement uses the R internal function
sample() to generate a random sample of 1000*n individuals (with replacement) from the original data. The second splits these into a list of 1000 separate sample selection vectors, each of length n. The resulting list of selection vectors bdat1 will be used again below.
In this example, I chose to pass only the sample selection vectors to the nodes (rather than the all data for each bootstrap sample), and use a simple fitting function which calls the actual estimation routine and returns only the desired estimates. Here is a listing of the R code for this function:
> dcft2
function (idx, sp = 1, tol = 1e-04, maxit = 100)
{
n <- length(icd2[idx, 1])
fit <- dcpl(y = icd2[idx, 1], A = rep(1, n), B = rep(1, n), C = icd2[idx, 2], D = icd2[idx, 3], E = rep(0, n), pri = matrix(1, nrow = n, ncol = 1), thet = c(0.01), phi = 0, lambda = sp, maxit = 100, jfail = 3)
if (fit$niter == maxit) {
fit$theta <- NULL
}
fit$theta
}
Notice that this just sets up a call to the actual fitting function, and extracts the desired parameter estimate. (I've also included a statement to set to NULL any parameter vectors resulting in models that don't fully converge.
The idx argument expects row indicators of indiv. selected for a given bootstrap sample, and is used to subset the original dataset (icd2) in the function body. This dataset and the estimation function must be present in the .RData from a previous session. The contents of this are made available to all participating nodes via a call to clusterEvalQ() prior to running the job (see below).
Here are the cluster start-up statements and timing results from running this problem on both a 12-unit cluster and a single CPU on the master node:
# Startup
> cl <- makeCluster(12, type = "MPI")
12 slaves are spawned successfully. 0 failed.
> clusterEvalQ(cl, load(".RData"))
# Cluster results
> system.time(bout1 <- parLapply(cl, bdat1, dcft2))
[1] 0.47 0.16 15.84 0.00 0.00
# Single CPU results
> system.time(bout2 <- lapply(bdat1, dcft2))
[1] 150.72 6.94 155.90 0.00 0.00
# Shutdown
stopCluster(cl)
The cluster job runs approximately ten times faster in this case.
The output of the cluster call is a list of 1000 components, each containing a vector of 12 estimates. I can put these into a 1000 X 12 array and get individual standard errors across all bootstrap samples as follows:
bout1 <- matrix(unlist(bout1), ncol = 12, byrow = T) apply(bout1,2,sd)Now that the boostrap results are available it's possible to perform further computations (e.g. confidence intervals, etc). NOTE: batch jobs and simulation example
gcc) and FORTRAN (g77) compilers available. Below are the statements required to use these compile and make a shared object for dynamic loading in R:
############### # C ############### # Compile (produces the object file subr.o) cc -g -O2 -mcpu=970 -mtune=970 -force_cpusubtype_ALL -mpowerpc-gpopt -c subr.c # Link (produces the shared object file subr.so) cc -bundle -flat_namespace -undefined suppress -L/usr/local/lib/ -o subr.so subr.o -lcc_dynamic -Wl,-framework -Wl,vecLib ############### # FORTRAN ############### # Compile (produces the object file subr.o) g77 -g -O2 -mcpu=970 -mtune=970 -force_cpusubtype_ALL -mpowerpc-gpopt -c subr.f # Link (produces the shared object file subr.so) cc -bundle -flat_namespace -undefined suppress -L/usr/local/lib/ -o subr.so subr.o -lcc_dynamic -Wl,-framework -Wl,vecLibNote that the 3rd through 6th compile options for gcc/g77 are specific to the G5 processor. The "veclib" link option is useful if your subroutine invokes linear algebra routines from BLAS/LAPACK. R is compiled using this framework and it is optimized for the G5 chip. Extending this procedure to several routines is straightforward. To make a single shared object, simply list all the compiled object files together in the link statement (after the ".so" file). The simplest way to handle a large number of routines is to write a makefile and use the
make utility. The resulting shared object can now be loaded using a call to dynload within R. (e.g. dynload("subr.so").)