2006-1733: DESIGNING AND IMPLEMENTING A CURRICULUM BASED ON BEOWULF CLUSTERING

Fitra Khan, University of Texas-Brownsville; Mahmoud Quweider, University of Texas-Brownsville; Juan Iglesias, University of Texas-Brownsville; Amjad Zaim, University of Texas-Brownsville

© American Society for Engineering Education, 2006

Designing and Implementing a Parallel Computing Curriculum Based on Beowulf Clustering1

Introduction

The Computer Science/Computer Information Systems (CS/CIS) Department at The University of Texas at Brownsville (UTB) has improved its curriculum by including parallel computing topics based on a computing and networking laboratory (CNL)1. CNL is built around a 24-node distributed Beowulf cluster2,3. Its main goal is to enhance the understanding of parallel computing principles in key courses of the Bachelor of Science in Computer Science (BS-CS) degree, the two-year Associate in Applied Science in Computer Information Systems (AAS-CIS), and the four-year Bachelor of Applied Technology in Computer Information Systems Technology (BAT-CIST).

The strategy has been to use this supercomputer as the main instrument to infuse parallel computing concepts and principles into targeted courses by creating a set of laboratory modules and capstone projects. Such a project framework in CS education is strongly emphasized in the ACM/IEEE-CS curricula model4. CNL has helped motivate students by engaging them in integrating parallel computing and networking concepts into their course work through these laboratory modules and capstone projects.

There are benefits in joining the practice and theory of different computer science areas via an integrated laboratory environment such as the one provided by CNL. First, it is easier to develop laboratory modules that help students to put different theoretical concepts together5,6. Second, an integrated laboratory is a low-cost solution compared to developing separate physical laboratories to serve different areas of computer science.

The laboratory has proved to be a dynamic educational tool for providing in-depth understanding of essential concepts by incorporating state-of-the-art technologies into the curricula. This has allowed educators to keep developing new laboratory modules to enrich their courses. In addition to the currently implemented modules in areas like networking, databases and operating systems, new modules are planned in areas such as encryption, autonomous intelligent systems, and web design and programming.

After being supported originally by NSF, the CNL project has reached maturity and is now institutionalized. This paper details the rationale, scope and achievements of the project. The methodology used is also discussed, with emphasis on considerations and feasibility for implementing a similar computing and networking environment at peer institutions.

1 This material is based upon work supported by the National Science Foundation under Grant No. 0101648.

Laboratory Design

The CNL project is built around the concept of a laboratory which offers laboratory projects in key courses of computer science. The equipment consists mainly of 24 computers, three Alpha workstations, and network hardware used to build a 24-node rack-mounted Beowulf cluster. The cluster currently runs an open source operating system. The clustering software is based on a Message Passing Interface (MPI) package called MPICH7, which is available free of cost. MPI-based software packages and toolkits are used to familiarize students with real-world tools for developing and implementing algorithms with a short development cycle.

The Beowulf cluster is complemented with devices for developing real-world laboratory projects that enhance student understanding of important concepts of computer science. For example, network devices that simulate leased lines of a Public Switched Network (PSN)8 are installed to provide a realistic network environment. As another example, image capturing devices were acquired to capture an image of an object for recognition by a neural network based pattern recognition algorithm.

The hardware is interconnected using auxiliary network devices to provide a locally simulated PSN that models the real-world connectivity environment. The network hardware includes a 400Mbps network switching matrix with a 100Mbps Fast Ethernet uplink to the building's Gigabit backbone. The backplane is provided by 10Mbps switches (VN900EE and VN900EA), which supply a total of one ATM port and 36 switched 10Mbps ports. The building's LAN is connected to the Internet via a GigaMAN circuit leased from the local phone provider.

One of the three Alpha workstations is used for distributing tasks to the Beowulf and also for accepting tasks from connected users. This Alpha workstation is the management station for the Beowulf. The second Alpha workstation is used to compile and analyze results produced by the Beowulf. It is a dedicated user workstation because of the graphics required to analyze data. The third Alpha workstation is placed on the far side of the simulated PSN. Among its tasks, it is used to simulate congestion on the PSN by transferring large amounts of data back and forth across the PSN.

Figure 1 shows the general schematic of CNL. The laboratory houses the 24 computers that constitute the 24-node rack-mounted Beowulf as a central component of B-CEIL. Network devices are required to simulate a real-world PSN. These consist of a pair of T1-to-V.35 devices to simulate a leased line8, a pair of DACs to aggregate or cross-connect different channels of T1s, a pair of routers to provide WAN-to-LAN connectivity at each end of the leased line, and VoIP units on each end to simulate real-world voice grade channels. The Beowulf nodes and other LAN equipment are connected by a hub backplane. A LAN switch provides connectivity to the LAN devices on the other side of the simulated PSN.

Figure 1: Overall CNL configuration.

Hardware is also available to introduce image processing algorithms9. Transducers are used to convert video signals to a binary frame format fit for image processing10. A high resolution Charge-Coupled Device (CCD) camera and a comparable image capturing card are used to capture high resolution real-world images. A pair of video codecs is used for benchmarking student algorithms.

Implementation

The authors participated in the implementation of the project; each one was scheduled to teach two different courses per semester, for which the corresponding laboratory modules (LM) were developed. A total of eight courses were selected to use B-CEIL in the first year of this project: COSC 3330 Networking and Database Management Systems, COSC 3310 Systems Programming and Concurrent Processes, COSC 3325 Digital Logic and Computer Organization, COSC 4310 Operating Systems, COSC 3355 Principles of Programming Languages, COSC 4342 Database Management Systems, COSC 4360 Numerical Methods, and COSC 4380 Image Processing.

Two levels of student laboratory projects were developed for curriculum enrichment. Appendix A presents a finer breakdown of the LM’s, including the subject areas in which they were used to enhance understanding of essential concepts.

The first level of student laboratory projects was related directly to the Beowulf cluster itself, specifically, its hardware architecture, connectivity, and existence as a logical cluster. This entails the development of laboratory projects on topics such as computer interfacing, Local Area Networking (LAN), clustering, task scheduling and optimization, and benchmarking.

The second level of student laboratory projects was focused on setting up a PSN and associated data/voice channels to model real-world connectivity, and on building applications for the Beowulf in the simulated PSN environment. This includes the development of laboratory projects in the area of Wide Area Networking (WAN), in order to enhance understanding of real-world PSN based connectivity, and in computationally intensive fields such as artificial neural networks, image compression, image analysis, numerical analysis, and distributed databases, where parallel processing concepts may be used to speed up computations.

LM’s were developed so that students could complete them in one to two weeks during regular semester instruction. In addition to LM’s, course projects (CP) were also proposed to students. These CP’s were long-term in nature: they were intended to be developed during the entire semester and carried on across different offerings of the same course from one semester to another.

Topics of CP’s were not restricted to the scope of a single course. Instead, CP’s were developed with a cross-disciplinary emphasis that could integrate different areas of computer science. Appendix B gives a more detailed description of the CP’s.

As the reader can appreciate from Appendix B, the topics of CP’s range widely, from an “Integrated Monitoring System” for public networks to the “Parallel Simulation of Electromagnetic Wave Propagation” and “Optimization Based on Genetic Algorithms”. This variety reflects the versatility and generality of the CNL.

Results

During the three years of its implementation, the project has proven to be successful. The laboratory was opened in fall 2002 and it has remained operational ever since. During this time students and faculty have received the benefit of this lab.

LM’s created so far cover topics such as Beowulf cluster design and implementation, benchmarking computational machines, public switched networks, voice over IP, monitoring network traffic bandwidth, image processing, Taylor series, number representation, LAM and MPI for Beowulf clusters, concurrent and parallel processing, database paging management, and several others.

Besides the application of the modules in the classroom, the laboratory has been used as a recruitment and retention tool. Additionally, students and faculty have used the laboratory in projects that have motivated students to advance in research and to continue their education by pursuing graduate studies. Students have already presented results from their scholarly work11,12.

Some of the research projects motivated by the laboratory include topics like hybrid software/hardware approaches for teaching digital logic, implementation of multithreaded web servers using Java, implementation of integrated monitoring systems, studying the effects of congestion control on multimedia applications, and software/hardware simulation of multi-functioned calculators, among others.

Each of the laboratory modules and course projects can be freely accessed at http://blue.utb.edu/bceil. Further information about the research projects is located on this web site as well.

Conclusions

In this paper the authors have presented the development, current state and future work of a parallel computing and networking laboratory built around a 24-node Beowulf cluster that is intended to raise the educational levels at The University of Texas at Brownsville. The project has proven to be a successful tool for stimulating students enrolled in participating courses.

The program is currently being institutionalized at UTB. Faculty and students now benefit from having this tool for research purposes and the development of laboratory modules.

Since our programs in computer science and computer information systems at UTB did not previously have a hardware lab, CNL has had a great impact on our ability to provide opportunities for our students to understand the contents of a wide variety of computer courses. Furthermore, CNL has proved to be a powerful tool in terms of enrollment and retention.

Acknowledgments

The authors would like to acknowledge all the students who have made CNL a successful project. We especially thank Francisco Arteaga, Mario Guajardo, Ariel Martinez, Brian W. Matthews, David Ortiz, Julie Pedraza, and Jose D. Zamora.

Bibliographic Information

1. Khan, F. and Quweider, M., “Beowulf based Curriculum Enrichment Integrated Laboratory,” National Science Foundation ATE Grant, 2001.
2. Sterling, T. et al., “How to Build a Beowulf: A Guide to the Implementation and Application of PC Clusters,” The MIT Press, 1999.
3. Spector, D., “Building Linux Clusters: Scaling Linux for Scientific and Enterprise Applications,” O’Reilly & Associates, Inc., 2000.
4. The Joint Task Force on Computing Curricula IEEE-CS ACM, “Computing Curricula 2001 Computer Science.”
5. Khan, F., “Lessons Learned from an NSF Pilot Project on Minority Student Retention,” Proc. of the Frontiers In Education (FIE) Conference '97, Pittsburgh, Pennsylvania, Nov. 5-8, 1997.
6. Khan, F. and Siddique, B., “An NSF Pilot Project on Minority Student Retention,” Proc. of the Frontiers In Education Conference '96, Salt Lake City, Utah, November 6-9, 1996.
7. Gropp, W., Lusk, E., and Skjellum, A., “Using MPI: Portable Parallel Programming with the Message-Passing Interface,” The MIT Press, 1999.
8. Bates, R. and Gregory, D., “Voice and Data Communications Handbook,” McGraw Hill, 1998.
9. Efford, N., “Digital image processing: a practical introduction using Java,” Addison-Wesley publishing, 2000.
10. Sonka, M., Hlavac, V. and Boyle, R., “Image processing, analysis and machine vision,” PWS publishing, 1999.
11. Guajardo, M., “Integrated Monitoring System,” 30th Annual National Conference, Society for Advancement of Chicanos and Native Americans in Science, Albuquerque, NM, October 2003.
12. Pedraza, J., “Beowulf-based Curriculum Enrichment Laboratory,” Annual UT-System AMP-NSF Research Conference, University of Texas at El Paso, El Paso, Texas, October 2002.

Appendix A. Laboratory Modules

A.1 First level of projects

LM1 — Physical aspects of constructing a cluster: The students will become familiarized with different components of a Beowulf node, its architecture, and the required switching matrix.

LM2 — Benchmarking computational machines: The students will learn to benchmark clusters and standalone machines in order to rate the computational speedup factor provided by a parallel machine architecture.
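As an illustration of the kind of rating LM2 describes, the following minimal Python sketch computes the usual speedup factor (serial time divided by parallel time) and the parallel efficiency (speedup divided by node count). The function names and the timings are hypothetical and chosen only for illustration; they are not measurements from the CNL cluster.

```python
def speedup(t_serial, t_parallel):
    """Speedup factor S = T_serial / T_parallel."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("timings must be positive")
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, nodes):
    """Parallel efficiency E = S / p, where p is the number of nodes."""
    if nodes < 1:
        raise ValueError("at least one node is required")
    return speedup(t_serial, t_parallel) / nodes


# Hypothetical wall-clock times in seconds for the same job (assumed values).
timings = {1: 120.0, 8: 20.0, 24: 8.0}
t_serial = timings[1]
for nodes in sorted(timings):
    s = speedup(t_serial, timings[nodes])
    e = efficiency(t_serial, timings[nodes], nodes)
    print(f"{nodes:2d} nodes: speedup {s:5.2f}, efficiency {e:4.2f}")
```

An efficiency well below 1 on the larger node counts would suggest that communication or load imbalance, rather than computation, dominates the run.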

LM3 — Scheduling & optimization: Proper scheduling algorithms will be programmed and utilized by students to optimize the utility of a parallel processing machine.
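The module does not name a particular scheduling algorithm. As one possible example, the sketch below uses the longest-processing-time-first (LPT) heuristic, an assumption made here for illustration: tasks are sorted by estimated cost and each is given to the currently least loaded node. The task names and costs are hypothetical.

```python
import heapq


def lpt_schedule(task_times, nodes):
    """Assign tasks to nodes with the longest-processing-time-first heuristic.

    Returns (assignment, makespan): a dict mapping node id to its task list,
    and the load of the busiest node.
    """
    if nodes < 1:
        raise ValueError("at least one node is required")
    heap = [(0.0, n) for n in range(nodes)]  # (current load, node id)
    heapq.heapify(heap)
    assignment = {n: [] for n in range(nodes)}
    # Place the most expensive tasks first, always on the least loaded node.
    for task, cost in sorted(task_times.items(), key=lambda kv: kv[1], reverse=True):
        load, node = heapq.heappop(heap)
        assignment[node].append(task)
        heapq.heappush(heap, (load + cost, node))
    makespan = max(load for load, _ in heap)
    return assignment, makespan


# Hypothetical task costs in seconds (assumed values).
tasks = {"a": 7, "b": 5, "c": 4, "d": 3, "e": 1}
plan, makespan = lpt_schedule(tasks, nodes=2)
print(plan, makespan)
```

LPT is a heuristic and does not guarantee an optimal schedule in general; it is shown because it is short and easy to compare against a naive round-robin assignment.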

A.2 Second level of projects

LM4 — Public Switched Network (PSN): The students will learn the realities of real-world network traffic over simulated leased lines, for example, T1s. By setting up a local PSN and associated data/voice channels over simulated leased lines, they will learn how the real-world PSN and its components invariably affect network speed and integrity between two distant locations, regardless of whether a Beowulf cluster is connected to it. Students will program Direct Access Cross-connects (DACs) to aggregate leased lines (a trunking or bonding process) in order to provide different connection bandwidths. They will learn to program routers for end-to-end connectivity of LANs over a PSN. They will understand how a supercomputer's processing load needs to be balanced given the slow connection between two distant locations through a PSN.
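To give a sense of the bandwidth gap this module addresses, the sketch below compares idealized transfer times over bonded T1 lines with the time over a 100Mbps Fast Ethernet link. It assumes the nominal T1 line rate of 1.544 Mbps and ignores framing and protocol overhead; the payload size and function names are hypothetical.

```python
T1_MBPS = 1.544            # nominal T1 line rate in megabits per second
FAST_ETHERNET_MBPS = 100   # LAN rate used for comparison


def bonded_rate_mbps(lines):
    """Ideal aggregate rate of `lines` bonded T1 circuits (overhead ignored)."""
    if lines < 1:
        raise ValueError("at least one line is required")
    return lines * T1_MBPS


def transfer_seconds(megabytes, rate_mbps):
    """Ideal time to move `megabytes` (10^6 bytes each) at `rate_mbps`."""
    return megabytes * 8 / rate_mbps


payload_mb = 193  # hypothetical data set size in megabytes (assumed value)
for lines in (1, 2, 4):
    rate = bonded_rate_mbps(lines)
    print(f"{lines} x T1: {rate:.3f} Mbps, {transfer_seconds(payload_mb, rate):.0f} s")
print(f"Fast Ethernet: {transfer_seconds(payload_mb, FAST_ETHERNET_MBPS):.2f} s")
```

Even with several lines bonded, the leased-line path remains far slower than the LAN, which is why work sent across the PSN has to be balanced against the cost of moving its data.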

LM5 — Voice over IP (VoIP): Merging of telephone traffic into normal data network traffic poses new challenges. VoIP channels will be created by students to provide voice grade channels across the simulated PSN. These voice grade channels will be used in Beowulf applications in order to realize that VoIP channels share network traffic and are susceptible to network congestion. Students will also learn bonding of these VoIP channels to provide greater bandwidth.

LM6 — Pattern recognition: The students will parallelize neural network pattern recognition algorithms to speed up the pattern recognition process.

LM7 — Coding: Compression is concerned with reducing the prohibitively large amounts of data generated by image acquisition and capture devices. Such reductions are essential in environments of limited bandwidth or limited storage. Parallelizing such techniques will contribute to reducing the transmission time and the delay associated with it. Videoconferencing and telebrowsing are examples of such applications. Audio and image compression techniques will be used to demonstrate the application of parallel processing in coding images in order to compress them for delivery over a slower medium such as a voice channel. To simulate a real-world voice grade channel, Voice over IP (VoIP) equipment will be used to provide a voice channel between the Beowulf cluster end and the other side across the simulated PSN.

LM8 — Image Processing Operations: Image processing applications include noise reduction, image sharpening, feature extraction, pattern deduction, and image analysis. Due to their discrete multidimensional nature they are good candidates for parallel processing. The need for and benefit of parallelizing algorithms will be demonstrated with real-world image processing techniques. Given an average image size (say 5000 × 5000), a simple smoothing operation which removes noise contamination from the image requires over 250 million multiplications and additions. This would take several minutes to complete on a common desktop workstation, whereas it is expected to be much faster on a Beowulf cluster.
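A 5000 × 5000 image has 25 million pixels, so roughly ten operations per pixel already account for the 250 million figure. The sketch below illustrates how such a smoothing operation might be divided among cluster nodes by rows. It assumes a 3 × 3 mean filter, which the module does not specify, and it runs the strips one after another on a single machine; on a cluster each strip would go to a different node, which would need only its own rows plus one neighbouring row on each side. All function names are hypothetical.

```python
def smooth_rows(image, row_start, row_end):
    """Apply a 3x3 mean filter to rows [row_start, row_end) of `image`.

    Border pixels are averaged over the neighbours that exist.
    """
    height, width = len(image), len(image[0])
    out = []
    for r in range(row_start, row_end):
        row = []
        for c in range(width):
            total, count = 0, 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < height and 0 <= cc < width:
                        total += image[rr][cc]
                        count += 1
            row.append(total / count)
        out.append(row)
    return out


def split_rows(height, workers):
    """Split `height` rows into `workers` contiguous, nearly equal strips."""
    base, extra = divmod(height, workers)
    bounds, start = [], 0
    for w in range(workers):
        end = start + base + (1 if w < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def smooth_in_strips(image, workers):
    """Smooth the image strip by strip; each strip is an independent task."""
    result = []
    for start, end in split_rows(len(image), workers):
        result.extend(smooth_rows(image, start, end))
    return result


# Small synthetic image (assumed data) to check that the strip-wise result
# matches a single pass over the whole image.
image = [[(r * 7 + c * 3) % 11 for c in range(6)] for r in range(5)]
print(smooth_in_strips(image, 3) == smooth_rows(image, 0, len(image)))
```

Because every output pixel depends only on its immediate neighbours, the strips can be computed independently, which is what makes this class of operation a good fit for the cluster.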

LM9 — Simulation: A model of a real-world system will be created and simulated on the available cluster. Then random events will be generated in order to study the effect of different kinds of events on the behavior of the system.

LM10 — Matrix operations: The cluster will be used to teach students how to program matrix operations in order to speed up the computations based on parallel processing.
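One common way to parallelize a matrix product is to give each node a block of rows of the first matrix together with a full copy of the second. The sketch below shows that decomposition in plain Python as an assumed example; the module does not prescribe a particular scheme. The blocks are processed sequentially here, whereas on the cluster each block would be sent to a separate node. Function names are hypothetical.

```python
def multiply_block(a_rows, b):
    """Multiply a block of rows of A by the full matrix B."""
    inner, cols = len(b), len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a_rows
    ]


def blocked_matmul(a, b, workers):
    """Compute A x B by splitting A into row blocks, one per worker."""
    if workers < 1:
        raise ValueError("at least one worker is required")
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions do not match")
    size = -(-len(a) // workers)  # ceiling division: rows per block
    blocks = [a[i:i + size] for i in range(0, len(a), size)]
    # Each block product is independent of the others.
    partial = [multiply_block(block, b) for block in blocks]
    return [row for block in partial for row in block]


a = [[1, 2], [3, 4], [5, 6]]
b = [[7, 8], [9, 10]]
print(blocked_matmul(a, b, workers=2))
```

The only communication this scheme needs is distributing the blocks and gathering the partial results, so the achievable speedup depends on how that cost compares with the arithmetic in each block.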

LM11 — Concurrent & Parallel Processing: The students will be taught how to parallelize a computational problem. The difference between concurrent and parallel processing will be made clear by executing the developed parallel algorithms on one machine and then on a cluster.

LM14 — Distributed Databases: The parallelization of search algorithms on distributed databases will be demonstrated. This will include record-locking and read-sharing of a distributed database.

Appendix B. Course Projects

CP1 — Implementing a Multithreaded Web Server Using Java: This project is aimed at understanding the basic nature of the Internet's five protocol layers (the Application Layer, the Transport Layer, the Network Layer, the Link Layer and the Physical Layer) and at allowing the student to create a complete Web server for the application layer using Java as the basic software development tool. The CNL facility is being used to provide an integrated computing and network testing environment. The important concepts of multithreading and distributed computing are reinforced through the creation of a realistic web server which is able to handle multiple requests at a time.

CP2 — Studying the Effects of Congestion Control on Multimedia Applications: This project is aimed at integrating knowledge from Image Processing, Computer Networking and Software Engineering to study the effects of congestion control created by limited bandwidth and the use of unreliable transport protocols such as UDP (User Datagram Protocol). The student will be responsible for setting up the video conferencing equipment purchased by CNL and for conducting a quantitative study in which congestion control, image quality, delay and jitter are examined in a systematic way. The effect of image compression techniques on reducing congestion will also be investigated.

CP3 — Software/Hardware Simulation of a Multi-functioned BCD based Calculator: This project is aimed at integrating knowledge from Digital Logic Design and software tools by creating, both in software and hardware, a multi-functioned BCD based calculator. The student will specify the functionality of the calculator and verify this functionality in software, after which a hardware implementation using a digital logic kit will be attempted.

CP4 — Creating an Interactive Digital Image Viewer: The goal of this project is to create an interactive image viewer which will allow the student to capture an image, choose an image processing operation and then apply that operation. The 2048x2048 8-bit Cohu camera and the Integral high-resolution video capture card will be used for this project.

CP5 — Creating an HTTP based Digital Image Processing Server: The goal of this project is to create an image processing server. This dedicated image processing server will have the ability to accept an image from a client side and perform the operation specified by the client and send the result wrapped in a web page.

CP6 — Integrated Monitoring System: Critical service equipment, such as network equipment, needs to be monitored for continued successful operation. However, because it is spread over several sites, the best way is to monitor it remotely. This project recommends devices and their mode of connectivity which will send signals back and forth between the network center and the closets in order to monitor and control different events, such as entry into a communication closet, high temperature or humidity, and power failure. This information is comprehensively displayed at the designated console in the network center. The operator in the network center has to decide whether the problem can be taken care of remotely or whether a visit to the problematic communication closet is necessary.

CP7 — Parallel Simulation of Electromagnetic Wave Propagation: The current project intends to implement, based on a consolidated frequency domain technique, a novel electromagnetic structure simulator with a highly interactive user interface. The software developed is based on the current Beowulf technology provided by CNL. Our goal is to develop a tool that can be used for the simulation and analysis of virtually any electromagnetic transmission system.

CP8 — Optimization Based on Genetic Algorithms: The goal of this research project is to develop a general library for connective methods that can be used as the underlying platform to test diverse approaches for optimization using genetic algorithms. The main role of students in the project will be to code, under specification, optimization routines that follow traditional techniques, while researching new techniques on the parallel Beowulf architecture that may be useful as alternatives to previously proposed work.
