Pondermatic IV

Home Supercomputing with Linux


My efforts to build a supercomputer at home using Linux and Beowulf technology have evolved considerably over the past year. My early efforts centered on building a cluster I called the Pondermatic. This page outlines my current work with budget multiprocessor systems, Redhat Linux 6.0 and the parallel version of the freeware raytracer, POV-Ray. The results have been impressive.

For the original version of this file detailing the use of Redhat 5.2 and Povray 3.01 see http://www.cris.com/~rjbono/html/oldpondermatic.html.


Starting Point

My initial experiments indicated that raytracing with a cluster of processors of nearly equal speed was more efficient than with a cluster of mixed CPU speeds. In many cases, the slower machines would bog down the rest of the cluster. The result was poor performance as the faster machines ran out of things to process.

At this point I decided that I wanted to try the SMP features in the new Linux kernel. At the same time, I read a review of the new Abit BP6 motherboard, which allows two inexpensive Intel Celeron processors to run in dual-processor SMP mode. This is a relatively inexpensive (~$130) board that is chock full of very impressive features, including precise control of the CPU and bus speeds and hardware monitoring functions.

The Intel Celeron CPU is an inexpensive Pentium II-class processor that was initially designed to compete against the low-cost AMD K6 and Cyrix CPUs. The Celeron is known to overclock quite well…

Before I go on I must state the obligatory disclaimer regarding overclocking. Overclocking is bad: it can ruin your system, cause unexplained hair loss, will void any warranty you may have with Intel, and is generally not recommended. I do not advocate it (although it works for me!), and any attempt you make to overclock your machine is at your own risk!

…with a 95% success rate in overclocking a 366 MHz CPU to 458 MHz simply by increasing the bus speed to 83 MHz.

With this in mind, I decided to see what two dual-Celeron systems overclocked to 458 MHz would provide in terms of price and performance in a simple Beowulf cluster using 10 Mbit/sec Ethernet.

Sourcing Parts

All the components for the two machines were purchased via the Internet and sourced through two very helpful price search engines: Pricewatch & Killerapp. For about $550 you can get an ATX case, Abit BP6 motherboard, two Celeron 366 MHz CPUs with fans, 64 MB of PC100 SDRAM, an 8 MB AGP video card, a floppy drive, a sound card and a 4.3 GB hard drive. I reused monitors, keyboards, CD-ROMs, Zip drives, etc., to round out one system as a multiple-boot (Win98, WinNT4 & Redhat 6.0) machine and the other as a machine dedicated to running Linux.

I sourced Redhat Linux 6.0 from Linux System Labs, which in my humble opinion is the best source for Linux distributions on the Internet.

Linux supports SMP nicely, and PVMPOV allows you to take advantage of the multiple CPUs.

The base system configuration for each of the two machines in Pondermatic IV is as follows:

    ATX case with Abit BP6 motherboard
    Two Celeron 366 MHz CPUs (with fans) overclocked to 458 MHz
    64 MB of PC100 SDRAM
    8 MB AGP video card
    4.3 GB hard drive, floppy drive and sound card
    10 Mbit/sec Ethernet

The QED machine has an external modem, IDE zip drive, and Mitsumi CD-RW drive as well.

Installing Redhat Linux 6.0

Redhat has a slight advantage over other distributions due to the RPM method of installing applications. RPMs for key Beowulf software such as MPICH, LAM & PVM are available at the Redhat FTP site.

Install notes for QED (First machine):

1.    Insert CD-ROM in Drive and get to a DOS prompt.

2.    Change to the CD-ROM drive (for this example, say it's d:)

3.    Change to the images subdirectory. The CD-ROM install needs only one boot disk.

4.    Create the boot disk using rawrite to transfer the boot.img file to a floppy. The command is d:\dosutils\rawrite.

5.    Enter the name of the boot image: boot.img

6.    Enter the floppy drive destination: a

7.    Rawrite then writes the boot image to the floppy.
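The whole exchange looks roughly like this, assuming the CD-ROM is d: and the floppy is a: (the exact prompt wording may differ slightly between rawrite versions):

    C:\> d:
    D:\> cd \images
    D:\images> \dosutils\rawrite
    Enter disk image source file name: boot.img
    Enter target diskette drive: a:
    Please insert a formatted diskette into drive A: and press -ENTER- :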

NOTE: If you choose not to install LILO on a multiple-CPU system, the boot disk will default to a uniprocessor kernel. If you instead boot Linux from DOS using loadlin, you will need to copy the SMP-enabled kernel from the /boot directory to your Windows C:\ drive.
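For example, a hypothetical loadlin invocation from the DOS prompt might look like this (the kernel file name and root partition here are assumptions; substitute your own):

    loadlin vmlinuz-2.2.5-15smp root=/dev/hda2 ro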

The second machine, which I call Pondermatic, has a considerably different installation: it is a Linux-only box with no CD drive, so I used the NFS install. An NFS install requires a different boot disk; follow the boot-disk creation instructions outlined above, but use bootnet.img instead of boot.img. Next, I had to configure QED to be the NFS server for Pondermatic. Once logged in to QED as root, do the following to set up NFS.
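In outline, that means mounting the Redhat CD and exporting it read-only to Pondermatic. A minimal sketch, assuming the stock Redhat 6.0 mount point and init scripts:

    mount /dev/cdrom /mnt/cdrom      # mount the Redhat CD
    # grant pondermatic read-only access by adding a line to /etc/exports:
    echo "/mnt/cdrom pondermatic.synergetics.org(ro)" >> /etc/exports
    /etc/rc.d/init.d/nfs restart     # restart NFS so the new export takes effect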

QED is now ready to be a server for an NFS install to Pondermatic. To proceed, boot Pondermatic from the bootnet.img floppy, choose the NFS install method, and point the installer at QED and its exported CD-ROM directory.

Configuring the Cluster

A Beowulf cluster basically works with one of two message-passing libraries: MPI (Message Passing Interface) or PVM (Parallel Virtual Machine). When compiled into an application, these libraries pass intermediate data between machines. Both MPI & PVM use the TCP/IP protocol to communicate with the other machines, and both use the rsh command to initiate sessions on them. This handy Unix command allows you to issue command lines to remote machines. Handy, but not very secure. This isn't a huge problem for the home supercomputer, but it is something to keep in mind if you are building something larger that is connected to the Internet full-time.

The following items must be performed on each machine (I'll use Pondermatic as an example):

  1. Login as root, the superuser.
  2. Create a new user and assign a password; working as root all the time is dangerous. The new user will have a home directory created. For example, say we have created a user named rjbono: the user's home directory is /home/rjbono.
  3. Edit the /etc/hosts file. The hosts file maps the IP address of each machine in your network to a name. Here's what mine looks like:

     192.168.0.1    qed.synergetics.org            qed
     192.168.0.2    pondermatic.synergetics.org    pondermatic

  4. Create and edit the /etc/hosts.equiv file. This file contains the names of the machines allowed to access this machine with rsh commands. Here's mine:

     qed.synergetics.org
     pondermatic.synergetics.org

  5. Create these files on each of the machines on the network.
  6. Log out and then log in as your new user.
  7. Test communications. Say you're sitting at the qed console. First test that TCP/IP and your hosts file are working by pinging the other machines, e.g. ping pondermatic. Press Ctrl-C to stop the pings and display a summary. If pinging doesn't work, you'll have to start digging through the Linux HOWTOs to troubleshoot your setup.
  8. Now test that rsh works. Issue a command like rsh pondermatic "ls -l" from QED. This should return a listing of pondermatic's user home directory (a sample session follows this list).
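A successful test session from qed looks something like this (the timings are from my network, and a brand-new user's home directory is empty, hence the bare listing):

    $ ping pondermatic
    PING pondermatic.synergetics.org (192.168.0.2): 56 data bytes
    64 bytes from 192.168.0.2: icmp_seq=0 ttl=255 time=0.7 ms
    ^C
    --- pondermatic.synergetics.org ping statistics ---
    2 packets transmitted, 2 packets received, 0% packet loss
    $ rsh pondermatic "ls -l"
    total 0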

If all of that works, your networking is now configured to work with PVM.

Installing and configuring PVM

After experimenting with different versions of PVM, I found that the best thing to do was download the latest version (pvm3.4.0) and compile it myself. The RPM of PVM 3.3 on the Beowulf CD works but does not include any of the files needed to compile PVM-enabled programs, and the 3.3 release version does not compile under Redhat 5.X for some reason. The installation is pretty straightforward:

  1. Download pvm3.4.0.tgz into your user directory (mine is /home/rjbono).
  2. Add the following to your .bash_profile file:

     PVM_ROOT=$HOME/pvm3
     PVM_DPATH=$PVM_ROOT/lib/pvmd
     PVM_ARCH=LINUX
     export PVM_ROOT PVM_DPATH PVM_ARCH

  3. Log out and then log in again as your username to pick up the new environment.
  4. Untar the PVM files with tar -zxvf pvm3.4.0.tgz
  5. Change into the pvm3 directory the tar file created (cd ~/pvm3).
  6. Run make and let the compile proceed.
  7. When complete, move back to your home directory and issue the pvm command. You should see the pvm> prompt; if so, all is well. Type halt to exit PVM.
  8. Repeat this on each of the machines in the cluster.
  9. Now test that you can add machines to the cluster. At the console of qed, for example, I enter pvm to get the PVM console prompt. Typing conf will show the machines currently in your cluster (only qed for now). Type add followed by the name of one of the other machines (e.g. add pondermatic.synergetics.org), and repeat for each machine in the cluster. Type conf again and you will see a list of all the machines in the cluster (see the sample session below).

If you made it this far, you have a working cluster! Type halt to exit PVM, and then on to the first parallel program.
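For reference, a console session on qed looks something like this (the DTID values and exact layout vary with the PVM version; treat this as a sketch, not verbatim output):

    $ pvm
    pvm> conf
    1 host, 1 data format
                        HOST     DTID     ARCH   SPEED
                         qed    40000    LINUX    1000
    pvm> add pondermatic.synergetics.org
    1 successful
                        HOST     DTID
     pondermatic.synergetics.org    80000
    pvm> conf
    2 hosts, 1 data format
                        HOST     DTID     ARCH   SPEED
                         qed    40000    LINUX    1000
     pondermatic.synergetics.org    80000    LINUX    1000
    pvm> halt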

Installing PVMPOV

POV-Ray is a multiplatform, freeware raytracer. Many people have modified its source code to produce special "unofficial" versions. One of these unofficial versions is PVMPOV, which enables POV-Ray to run on a Beowulf cluster. PVMPOV has evolved quite a bit since it was first written. Many thanks to Andreas Dilger, Harald Deischinger & Jakob Flierl for writing and maintaining the patches that make PVMPOV work. The Beowulf CD has RPMs for this program; however, I found that they were much slower than the normal program for some reason, so I decided to compile POV-Ray from source after applying the PVM patches. The instructions for the 3.1e version of POV-Ray follow:

  1. Download the Unix sources for POV-Ray version 3.1e from http://www.povray.org. These consist of two files: povuni_s.tgz and povuni_d.tgz.
  2. Download the PVMPOV patch file and store it in your home directory.
  3. Create a directory called pvmpov3_1e_1 in your home directory using the mkdir command.
  4. Copy the POV-Ray files into the pvmpov3_1e_1 directory and untar them. This will create a directory called povray31.
  5. From your home directory (cd ~ to get back to it), untar the PVMPOV patch file. Change into the pvmpov3_1e_1 directory.
  6. Apply the patch by executing the inst-pvm script.
  7. Change into the pvmpov3_1e_1/povray31/source/zlib directory to compile the compression library. First run ./configure followed by make test. Now su to become the root user temporarily and type make install to install the library. Type exit to become the normal user again.
  8. Change into the pvmpov3_1e_1/povray31/source/libpng directory to compile the PNG library. Create the makefile by issuing the command cp scripts/makefile.std makefile, then type make test. When complete, su to root again, type make install, and exit to become the normal user again.
  9. Change into the pvmpov3_1e_1/povray31/source/pvm directory and modify the pvm.h file, replacing the line "#if defined(SUN4SOL2) || defined(your_pvm_arch)" with "#if defined(SUN4SOL2) || defined(LINUX)".
  10. Make sure that the PVM_ARCH variable is defined by typing export PVM_ARCH=LINUX.
  11. Temporarily add the PVM library directory to your search path by entering export PATH=$PATH:$PVM_ROOT/lib
  12. Now start the compile by entering aimk newunix from the pvmpov3_1e_1/povray31/source/pvm directory. When complete, the PVMPOV executable will be in the pvmpov3_1e_1/povray31/source/pvm/LINUX directory; copy it to your home directory. Run aimk newsvga and aimk newxwin to generate the SVGA and X Window versions respectively. Now repeat all of this on the remaining machines (a condensed recap of the build commands follows this list).
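For reference, here are steps 10-12 condensed into a single shell session. This assumes the directory layout above, that PVM_ROOT is set as in the PVM installation section, and that the resulting binary is named pvmpov as used in the next section:

    export PVM_ARCH=LINUX              # PVM architecture name for Linux
    export PATH=$PATH:$PVM_ROOT/lib    # make the aimk wrapper visible
    cd ~/pvmpov3_1e_1/povray31/source/pvm
    aimk newunix                       # build the PVM-enabled console binary
    aimk newsvga                       # build the SVGA display version
    aimk newxwin                       # build the X Window display version
    cp LINUX/pvmpov ~                  # copy the new binary to your home directory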

Running PVMPOV and benchmarking

I suggest going to the POV-Ray benchmarking site and downloading the skyvase.pov file for your first rendering. Using this file lets you compare the rendering time of your cluster against other computers and clusters. Copy the skyvase.pov file into the home directory of each of the computers running PVM.

Now the fun part:

  1. Start pvm from one of your machines.
  2. Add each of the other machines as before when we tested pvm. Type conf to confirm that they are all running.
  3. Now type quit at the pvm prompt to drop back to the command-line. Note that the pvm daemon is still running. Type pvm if you want to get back into pvm. Always type the halt command to stop the daemon before logging out.
  4. With the cluster configured, type the following to begin the raytracing:

./pvmpov +iskyvase.pov +h480 +w640 +FT +v1 -x -d +a0.300 -q9 -mv2.0 -b1000 -nw32 -nh32 -nt4 -L/home/rjbono/pvmpov3_1e_1/povray31/include

This is the benchmark command line, with two exceptions: the -nw and -nh switches, which are specific to PVMPOV and define the size of the image blocks each slave will work on, and the -nt4 switch, which is specific to the Pondermatic IV configuration and starts four tasks, one for each CPU.
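For reference, here is my reading of the individual switches; the -nw, -nh and -nt meanings come from the PVMPOV patch, while the rest are standard POV-Ray 3.x options (treat the annotations as a guide, not gospel):

    ./pvmpov \
      +iskyvase.pov \    # input scene file
      +h480 +w640 \      # render at 640x480 pixels
      +FT \              # write uncompressed Targa output
      +v1 \              # verbose progress messages
      -x -d \            # no keypress abort, no preview display
      +a0.300 \          # anti-alias with a 0.3 threshold
      -q9 \              # full rendering quality
      -mv2.0 \           # POV-Ray 2.0 compatibility for this scene
      -b1000 \           # 1000 KB output buffer
      -nw32 -nh32 \      # hand each slave 32x32-pixel blocks
      -nt4 \             # start four PVM tasks, one per CPU
      -L/home/rjbono/pvmpov3_1e_1/povray31/include    # include-file search path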

The messages on the screen should show that the slaves were successfully started. The cluster is now rendering the image. When complete, PVMPOV will display the slave statistics as well as the total render time.

You're Supercomputing, baby!

Pondermatic IV Cluster Benchmark Results

My first cluster, the Pondermatic, consisting of five machines (mostly 486s), rendered the Povbench test image in 1 minute, 45 seconds. To put this in perspective, a single 486-66 running the same job takes ~20 minutes to complete. A 266 MHz MMX processor came in at 3 minutes, 5 seconds.

The overclocked, dual-processor machines scored considerably better. Using single-processor mode, the render time was 1 minute, 4 seconds. Using both CPUs on a single machine dropped the render time to 39 seconds. Adding the second machine's dual CPUs dropped the time to 22 seconds.
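To summarize the skyvase benchmark times:

    Configuration                                         Render time
    Single 486-66                                         ~20 min
    266 MHz MMX processor                                 3 min 5 sec
    Original Pondermatic (five machines, mostly 486s)     1 min 45 sec
    One Celeron @ 458 MHz (single-processor mode)         1 min 4 sec
    One dual-Celeron machine (two CPUs)                   39 sec
    Pondermatic IV (two machines, four CPUs)              22 sec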

The original Pondermatic cluster compares well with 300 & 400 MHz Pentium IIs as well as a 500 MHz DEC Alpha machine. The Pondermatic IV cluster performed quite well in PVMPOV compared to other parallel machines: the SMILE cluster was only 1 second faster, and it consists of $27,000 worth of 350 MHz Pentium II machines!

Summary, Conclusions & Future Work

True parallel supercomputing is now easily within the reach of the home user. Applications in raytracing are readily available, as are applications in molecular modeling, electromagnetics and weather forecasting.

Two modest (200-266 MHz) machines can perform nearly as well as a 400 MHz Pentium II machine. Older, slower 486-class machines can help further reduce processing times, but the real benefit lies in having machines that are nearly equal in speed and power. A modest cluster can allow a raytracer like PVMPOV to produce quality animations quickly.

Dual-processor SMP performance can be reliably obtained by pairing Intel Celeron CPUs with the Abit BP6 motherboard for under $600. Those daring enough to void their Intel warranties can tweak the Celeron 366 to 458 MHz with little extra effort.

PVMPOV performance is image-dependent. The main parameters to tweak are the -nw & -nh switch values; there is an optimum that depends on the image being rendered and the cluster configuration.
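One way to hunt for that optimum is to render the same scene at several block sizes and compare the render times PVMPOV reports. A quick shell sketch (the size list is arbitrary; the best value will depend on your scene and cluster):

    # try square blocks of several sizes and compare the reported render times
    for n in 8 16 32 64; do
      echo "block size ${n}x${n}:"
      ./pvmpov +iskyvase.pov +h480 +w640 +FT +v1 -x -d +a0.300 -q9 -mv2.0 \
        -b1000 -nw$n -nh$n -nt4 -L/home/rjbono/pvmpov3_1e_1/povray31/include
    done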

I'd like to follow this initial work up in the following areas:

  1. Learn to program and use PVM and MPI in my own applications.
  2. Try using the Pentium II-optimized compiler pgcc to rebuild the Linux kernel, PVM and PVMPOV.
  3. Upgrade the Pondermatic network to switched 100 Mbit/sec Fast Ethernet, or at least channel-bonded 10 Mbit/sec Ethernet.
  4. Determine optimum PVMPOV -nw & -nh values for different configurations and images.
  5. Test PVMPOV animation frame rendering.
  6. Configure the machines for diskless boot and NFS. This will lower the cost of adding systems, as no hard disk will be needed. For my progress in this area, see my diskless boot page.

The bottom line is, if you feel the need for speed and you're on a budget, a Beowulf cluster may well be the answer.


Contact Rick Bono at: rjbono@hiline.net


Applied Synergetics Main Page