Submitted under Task Agreement GSFC-CT-2
Cooperative Agreement Notice (CAN) CAN-00OES-01
Increasing Interoperability and Performance of Grand Challenge Applications in the Earth, Space, Life, and Microgravity Sciences
Version 1.0
(Initial release: 8/30/2002)
(Latest revision: 10/8/2002)
Installation of a PC cluster at an Investigator's home site
Install a Linux cluster at NASA/GSFC, consisting of two "queen" nodes and at least 192 "compute" nodes connected by at least fast ethernet. The queen nodes will support control and I/O functions and consist of dual 1.5 GHz processors, 4 GB RAM, and 2 TB storage or better. The compute nodes will support computations and consist of single 1.2 GHZ processors, 512 MB RAM, and 80 GB storage or better. We anticipate gigabit connections between the compute node switches and the queen nodes as well as between the queen nodes and the GSFC network.
1. Background
The Land Information System is designed to perform land surface simulation and data analysis in real-time, at a spatial resolution as high as 1 km. This ambitious effort, with limited budget meanwhile, requires us to seek a powerful computing infrastructure with a high performance/price ratio. Linux Beowulf clusters have established themselves as a cost-effective way to provide high-performance computing power for scientific applications, and thus a cluster solution with commodity parts becomes our natural selection.
The construction of the cluster started with intensive research and analysis since March, 2002. Prototyping and small scale testing were carried out in the following months. Meanwhile, to contain the cost of the cluster, we decided to start from commodity parts, instead of assembled computers. Various parts, such as CPUs, fans, mainboards, hard drives, etc., were being ordered from various vendors, and the shipments started to arrive as early as June.
A small cluster with 27 nodes was up and running by late July, 2002.
As more computer parts arrived, intensive assembling of the computers
took place in the basement office H040B of Building 33, during the first
two weeks of August. By the end of the third week, all the 192 compute
nodes
had been
assembled, burned in, mounted and connected, and started running as a
cluster in the basement space H040A of Building 33.
Meanwhile, within the limit of our budget,
we added 8, instead of two, IO nodes
("queen" nodes as named in the milestone text) in the cluster for
much more IO capacity and fault tolerance. Thus the LIS cluster
now has 200 nodes in total (192 compute nodes and 8 IO nodes).
Figure 1 below is a recent picture of the
cluster.

2. Physical Architecture of the Cluster
Figure 2 shows the physical and network structure of the LIS Beowulf cluster.The cluster consists of 192 computing nodes and 8 IO nodes, interconnected with 8 Ethernet switches. Each switch has 24 fast Ethernet ports and 2 gigabit ports.
The 192 computing nodes are divided into 8 sub-clusters, with 24 nodes in each sub-cluster, interconnected via the fast Ethernet ports on one of the switches. The 8 switches communicate to each other via 2Gbps interconnects. The 8 IO nodes are interconnected and connected to the computing nodes via the gigabit ports on the switches.
The use of 8 sub-clusters and 8 IO nodes is mainly for the segregation of network traffic resulting from non-local file IO operations, and for the spreading of data storage so each IO node does not have to deal with single big files. So on average each IO node will only need to serve the IO requests of 24 computing nodes, and only store 1/8 of the output information, which makes the output volume managable.
3. Hardware Specs
The cluster has 200 computers in total. Eight of them are the IO nodes, or "queen" nodes. Each IO node has dual AMD XP 2000 CPUs, 2 GB DDR memory, 220 GB internal disk storage, 2 built-in Ultra-SCSI channels, 2 built-in fast Ethernet adapters, and 1 gigabit Ethernet adapter (two of the IO nodes have two gigabit adapters each). Meanwhile, the 8 IO nodes share 2 (eventually 4) external Promise RAID systems for file storage, with a capacity of 1 TB in each RAID system. The remaining 192 computers are compute nodes. Each of them has an AMD XP 1800 CPU, 512 MB memory, an internal IDE hard drive of 81 GB, and a built-in fast Ethernet adapter.
Overall, the LIS cluster has 208 AMD XP processors of 1.53 GHz and above, 112 GB of memory, 21 TB of disk space, 192 fast Ethernet connections, and 10 gigabit Ethernet connections. Obviously most of the configurations exceeded the requirements specified in the milestone agreement.
Table 1 below shows more detailed specification of the
cluster hardware.
[an error occurred while processing this directive]
4. Supporting Software Installed
IO nodes:
- Operating system: Red Hat Linux 7.3, Linux Kernel 2.4.18-3smp, with updated Intel gigabit NIC driver e1000-4.3.2.
- Web services: Apache 1.3.26 web server.
- Network file services: NFS v2.
- Name services: bind 9.1.3-4.
- Remote access: openssh 3.4p1
- Network boot services: dhcp-2.0pl5-8, tftp-server-0.17-14, and PXElinux (syslinux-1.75, http://syslinux.zytor.com/).
- System time synchronization: ntp 4.1.1-1 (http://www.eecis.udel.edu/~ntp/).
- Hardware monitoring: c-rpm, a software tool developed in-house, serving as a web-based frontend of lm-sensors to monitor the nodes temperature and fan speed.
- System monitoring: Big Brother 1.9c (http://bb4.com/), a generic system monitoring system with web-based interface. It monitors CPU load, server status, disk usage, processes, etc.
- Network management services: snmpd from ucd-snmp-4.2.1-7 (http://net-snmp.sourceforge.net/)
- Network traffic monitoring and analysis: Multi-Router Traffic Grapher 2.9.22 (MRTG: http://people.ee.ethz.ch/~oetiker/webtools/mrtg/), a web-based tool to monitor the traffic load on network links, via SNMP polling.
- Compilers: gcc-2.96 and GNU Fortran 0.5.26 20000731 (Red Hat Linux 7.1 2.96-98).
- Parellel computing environment: MPICH-1.2.4 (http://www-unix.mcs.anl.gov/mpi/mpich/).
- Performance analysis tools: Multi-Processing Environment (MPE) 1.2.2 (http://www-unix.mcs.anl.gov/ mpi/mpich/).
Computing nodes:
- Operating system: Red Hat Linux 7.2, Linux Kernel 2.4.7-10.
- Hardware monitoring: lm_sensors 2.5.5-6 (http://secure.netroedge.com/~lm78/)
- System time synchronization: ntp 4.1.1-1 (http://www.eecis.udel.edu/~ntp/).
- Network management services: snmpd from ucd-snmp-4.2.1-7 (http://net-snmp.sourceforge.net/)
- Remote access: openssh 2.9p2-7 (internal use only)
5. LIS Cluster in action
- Real-time monitoring of node status and services
Figure 3 below shows a screenshot of the system we use to monitor, in real-time, the nodes status and one of the representative network services, ssh.
- Real-time network traffic monitoring and analysis
Figure 4 below is a screenshot of the syste we use to monitor, in real-time, the network traffic of each node.
- Parallel computing demo 1: Mandelbrot fractals
This demo uses 8 processors from the cluster, using the MPE libraries for the graphics. Figure 5 below shows a screenshot of the running animation with 8 processors running in parallel.

- Parallel computing demo 2: Performance profiling
This demo uses 32 processors from the cluster to run the test program "srtest". Then its performance is analyzed using the parallel performance analysis and profiling tools installed on the cluster. Figure 6A below shows the event count vs. time plot.

Figure 6B below shows the timing of each of the 32 processes on different states during the same test.

- Parallel computing demo 3: MPI performance benchmarking of the cluster
Running 188 nodes, we performed standard MPI communication benchmarking tests. The two images below show the point-to-point (Figure 7A) and bisection (Figure 7B) bandwidth of the cluster. A few nodes in the cluster were not used for the test due to hardware problems they have been having. A snapshot of the network bandwidth usage during the time of the tests can be viewd by clicking here.


6. Summary and future work
The LIS Beowulf cluster has matured from the 27-node prototype implemented in July, 2002 to the full implementation of 192 compute nodes plus 8 IO nodes. The hardware configuration of the cluster has met, and in some areas, exceeded the requirments of the project milestone. A whole array of supporting software, including system monitoring, system management, network traffic anaysis, parallel computing environment, and performance testing suites, has been intalled, configured and running. A series of preliminary performance testing and benchmarking have been carried out, and the results showed that the cluster is fully functional, and is ready for production runs.
There are a few areas in the LIS cluster environment which require further efforts:
- More systematic benchmarking tests will be run, and based on the results, fine tuning of the cluster will be performed.
- Trouble-shooting of a few problematic nodes and installation of a few backup nodes will be completed. The extensive use of commodity parts for the LIS cluster, on one hand, increases the performance/price ratio, and on the other hand, forces us to design highly fault-tolerant applications which take the node failures into account. Meanwhile, keep some backup nodes handy is highly desirable.
- An array of LIS cluster-specific software will be developed in house, to support the unique applications to be running on the cluster, and to enhance the functionality of the cluster in the areas where no third party packages can be found or used. The software package will be generic enough to port to other Beowulf clusters, however.
Acknowledgement
We greatly appreciate the timely assistance from Uttam Majumder, Nikkia Anderson, Joe Eastman and
Steve Lidard for the construction of the cluster, and constant help
from Aedan Jenkins and Meg Larko from 974, as well as everybody from the LIS team.
Special thanks to Code 900 for providing temporary office space
during the intensive construction of the cluster. The cluster, as well as
the whole project, is funded by NASA's Computational Technologies(CT)
under Task Agreement Number GSFC-CT-2.