Linux Cluster Documentation for the Land Information System

Submitted under Task Agreement GSFC-CT-2

Cooperative Agreement Notice (CAN) CAN-00OES-01

Increasing Interoperability and Performance of Grand Challenge Applications in the Earth, Space, Life, and Microgravity Sciences

Version 1.0
(Initial release: 8/30/2002)
(Latest revision: 10/8/2002)

Milestone f (optional), due August, 2002

Installation of a PC cluster at an Investigator's home site

Install a Linux cluster at NASA/GSFC, consisting of two "queen" nodes and at least 192 "compute" nodes connected by at least fast ethernet. The queen nodes will support control and I/O functions and consist of dual 1.5 GHz processors, 4 GB RAM, and 2 TB storage or better. The compute nodes will support computations and consist of single 1.2 GHZ processors, 512 MB RAM, and 80 GB storage or better. We anticipate gigabit connections between the compute node switches and the queen nodes as well as between the queen nodes and the GSFC network.


1. Background

The Land Information System is designed to perform land surface simulation and data analysis in real-time, at a spatial resolution as high as 1 km. This ambitious effort, with limited budget meanwhile, requires us to seek a powerful computing infrastructure with a high performance/price ratio. Linux Beowulf clusters have established themselves as a cost-effective way to provide high-performance computing power for scientific applications, and thus a cluster solution with commodity parts becomes our natural selection.

The construction of the cluster started with intensive research and analysis since March, 2002. Prototyping and small scale testing were carried out in the following months. Meanwhile, to contain the cost of the cluster, we decided to start from commodity parts, instead of assembled computers. Various parts, such as CPUs, fans, mainboards, hard drives, etc., were being ordered from various vendors, and the shipments started to arrive as early as June.

A small cluster with 27 nodes was up and running by late July, 2002. As more computer parts arrived, intensive assembling of the computers took place in the basement office H040B of Building 33, during the first two weeks of August. By the end of the third week, all the 192 compute nodes had been assembled, burned in, mounted and connected, and started running as a cluster in the basement space H040A of Building 33. Meanwhile, within the limit of our budget, we added 8, instead of two, IO nodes ("queen" nodes as named in the milestone text) in the cluster for much more IO capacity and fault tolerance. Thus the LIS cluster now has 200 nodes in total (192 compute nodes and 8 IO nodes). Figure 1 below is a recent picture of the cluster.



Figure 1: A picture of the LIS Beowulf cluster. The cluster is located in Room H040A, Building 33. Picture taken on Aug. 29, 2002.






2. Physical Architecture of the Cluster


Figure 2: Physical architecture and network layout of the LIS Beowulf cluster. The 8 IO nodes' hostnames are X1, X2, ..., X8, respectively, while each compute node is named "A1, A2, ..., A24, B1, B2, ..., B24, ..., H1, H2, ..., H24", depending on which of the 8 sub-clusters it belongs. The blue lines indicate gigabit Ethernet connections, and the black lines fast Ethernet ones.









Figure 2 shows the physical and network structure of the LIS Beowulf cluster.The cluster consists of 192 computing nodes and 8 IO nodes, interconnected with 8 Ethernet switches. Each switch has 24 fast Ethernet ports and 2 gigabit ports.

The 192 computing nodes are divided into 8 sub-clusters, with 24 nodes in each sub-cluster, interconnected via the fast Ethernet ports on one of the switches. The 8 switches communicate to each other via 2Gbps interconnects. The 8 IO nodes are interconnected and connected to the computing nodes via the gigabit ports on the switches.

The use of 8 sub-clusters and 8 IO nodes is mainly for the segregation of network traffic resulting from non-local file IO operations, and for the spreading of data storage so each IO node does not have to deal with single big files. So on average each IO node will only need to serve the IO requests of 24 computing nodes, and only store 1/8 of the output information, which makes the output volume managable.

3. Hardware Specs


The cluster has 200 computers in total. Eight of them are the IO nodes, or "queen" nodes. Each IO node has dual AMD XP 2000 CPUs, 2 GB DDR memory, 220 GB internal disk storage, 2 built-in Ultra-SCSI channels, 2 built-in fast Ethernet adapters, and 1 gigabit Ethernet adapter (two of the IO nodes have two gigabit adapters each). Meanwhile, the 8 IO nodes share 2 (eventually 4) external Promise RAID systems for file storage, with a capacity of 1 TB in each RAID system. The remaining 192 computers are compute nodes. Each of them has an AMD XP 1800 CPU, 512 MB memory, an internal IDE hard drive of 81 GB, and a built-in fast Ethernet adapter.

Overall, the LIS cluster has 208 AMD XP processors of 1.53 GHz and above, 112 GB of memory, 21 TB of disk space, 192 fast Ethernet connections, and 10 gigabit Ethernet connections. Obviously most of the configurations exceeded the requirements specified in the milestone agreement.

Table 1 below shows more detailed specification of the cluster hardware.


[an error occurred while processing this directive]

4. Supporting Software Installed

IO nodes:


Computing nodes:



5. LIS Cluster in action





Figure 3: A partial screenshot of the cluster node status monitoring system using Big Brother. Green square icons indicate working status, while red cross icons indicate failures.











Figure 4: A partial screenshot of the real-time cluster network traffic monitoring system. Green lines indicate incoming traffic to a node, and blue ones outgoing traffic from the node.











Figure 5: Testing Mandelbrot. Eight nodes were used to run this parallelized version of the Mandelbrot fractals. MPE libraries were used to display the fractal graphics on X windows. This is a screenshot of the animation.











Figure 6A: Performance analysis of the "srtest" program distributed with MPICH: event count vs. time plot. The test program used 32 nodes in this case.






Figure 6B below shows the timing of each of the 32 processes on different states during the same test.


Figure 6B: Performance analysis of the "srtest" program distributed with MPICH: connected states of all the processes during the test. The test program used 32 nodes in this case.








Figure 7A: Cluster MPI communication performance analysis: point-to-point bandwidth test. There were 188 nodes in this test.







Figure 7B: Cluster MPI communication performance analysis: bisection bandwidth test. There were 188 nodes in this test.





6. Summary and future work

The LIS Beowulf cluster has matured from the 27-node prototype implemented in July, 2002 to the full implementation of 192 compute nodes plus 8 IO nodes. The hardware configuration of the cluster has met, and in some areas, exceeded the requirments of the project milestone. A whole array of supporting software, including system monitoring, system management, network traffic anaysis, parallel computing environment, and performance testing suites, has been intalled, configured and running. A series of preliminary performance testing and benchmarking have been carried out, and the results showed that the cluster is fully functional, and is ready for production runs.

There are a few areas in the LIS cluster environment which require further efforts:



Acknowledgement

We greatly appreciate the timely assistance from Uttam Majumder, Nikkia Anderson, Joe Eastman and Steve Lidard for the construction of the cluster, and constant help from Aedan Jenkins and Meg Larko from 974, as well as everybody from the LIS team. Special thanks to Code 900 for providing temporary office space during the intensive construction of the cluster. The cluster, as well as the whole project, is funded by NASA's Computational Technologies(CT) under Task Agreement Number GSFC-CT-2.