Teaching Parallel Thinking to the Next Generation of Programmers
Total Page:16
File Type:pdf, Size:1020Kb
Teaching Parallel Thinking to the Next Generation of Programmers Brian Rague Computer Science, Weber State University Ogden, UT 84408, USA fronting multi-core development; programming ABSTRACT methodology was listed as the number one issue. The role of the application developer in the context Most collegiate programs in Computer Science of- of the multi-core environment is far from settled, fer an introductory course in programming, but two points can be stated with some certainty: primarily devoted to communicating the founda- tional principles of software design and develop- (1)Parallel and High Performance Computing (HPC) ment. Methodologies for solving problems within a as implemented on modern supercomputing plat- discrete computational context are presented. Lo- forms is not new, going back to vector computer gical thinking is highlighted, guided primarily by a systems in the latter half of the 1970s. [9] sequential approach to algorithm development and made manifest by typically using the latest, com- (2)Many computer science programs have tradition- mercially successful programming language. In re- ally viewed parallel programming as an advanced sponse to the most recent developments in ac- topic best suited for either graduate students or cessible multi-core computers, instructors of these Senior-level undergraduate students. introductory classes may wish to include training For some perspective on point (2) above, Pacheco on how to design workable parallel code. Novel is- [17] noted in his book published in 1997 that sues arise when programming concurrent applica- “most colleges and universities introduce parallel tions which can make teaching these concepts to programming in upper-division classes on com- beginning programmers a seemingly formidable puter architectures or algorithms.” Yet Wilkinson task. This paper presents a classroom strategy for and Allen [24] asserted that selected topics from introducing the core skills that reinforce parallel the first part of their 2005 textbook helped to in- thinking. A computerized visualization tool is de- troduce their “first-year students to parallel pro- scribed that immediately connects the student gramming.” with the unique challenges associated with parallel program conceptualization. This instructional ap- Indeed, concerted efforts have been made the last proach has the potential to make parallel code ten years to introduce parallel computing issues design principles available to students at the very earlier in the undergraduate CS curriculum beginning of their Computer Science education. [1][6][12][16]. CS1 as defined by the ACM is an introduction to computer programming course Keywords: parallel, programming, computer sci- offered to first-year CS students. The primary ob- ence, computational, instruction. jective of the class is to sharpen the problem solv- ing skills of prospective developers using software engineering strategies and programming tools. 1. INTRODUCTION The class is centered mostly on refining and shap- ing the thinking of these new students rather than Integrated circuit manufacturers like Intel, AMD, stressing detailed computational mechanisms. and Sun are currently producing multi-core chips During this class, important foundational comput- for commodity computational devices. Current ad- ing issues such as modularization and basic code vertising campaigns for desktops and laptops em- structure can be addressed from a high conceptual phasize their multiprocessing features. Smaller viewpoint. hand-held and embedded machines also benefit from this technology. This distinctive learning context of the CS1 class gives the instructor an opportunity to present a Research in multi-core computer architectures variety of ideas before any particular one is adop- continues to move forward, though many challen- ted by the student. Specifically, the student can ging issues still remain [5][8][11]. Davis [7] con- more readily absorb the perspectives related to structed a top ten list of the key problems con- parallel thinking before getting “locked in” to a Note that both PQ and the speedup factor are both consistently sequential mode of software analysis. functions of p and n. The execution time of a par- t This paper presents a CS1 classroom strategy for allel program is the sum of comp and tcomm , introducing the core skills that reinforce parallel with the first factor heavily dependent on al- thinking, primarily based on a computerized visu- gorithmic overhead and the second factor heavily alization tool that immediately connects the stu- dependent on interaction overhead. dent with the unique challenges associated with The PQ number is a measure of the quality of the parallel program conceptualization. parallelization effort. A high PQ would suggest that parallelization is generally effective and essentially 2. CS1 APPROPRIATE PERFORMANCE ANA- the right course of action. A low PQ suggests pos- LYSIS sible diminishing returns through parallelization. Various measures can be used to evaluate parallel A PQ number could be determined not only for an programs[18][24]. When comparing sequential entire application, but also for select sections of and parallel performance for a single application or code. One of the challenges confronting the stu- kernel, the number of primary interest is often the dent programmer is identifying which parts of the speedup factor S(p,n), defined as the ratio of se- code are good candidates for parallelization. By quential execution time to parallel execution time. monitoring PQ values for code sections that pos- The variable p is the number of processors, n is a sess inherent parallelization such as loops, the measure of the “size” of the program, and it is as- student will gain familiarity and confidence with sumed that the “best” sequential algorithm is used the paradigm of parallel thinking. for this relative analysis. If required, there will be ample time to introduce When scalability is factored in, the speedup com- the student to threading concepts and detailed putation assumes different forms depending on synchronization mechanisms such as semaphores constraints such as constant problem size, con- and mutexes later in the CS program. The reduc- stant parallel execution time, or memory. Although tionist PQ analysis described here seeks to expose speedup is a useful measure which encourages the the CS1 student to important high level parallel student programmer to increase the percentage of design concepts at a crucial time in their program- parallelizable code wherever possible, it abstracts ming careers, possibly circumventing adoption of a away critical program design issues that are valid rigid approach in which only sequential thinking is topics of study for CS1 students. applied to computing problems. Developing parallel programs requires a working knowledge of both architecture and algorithms. 3. PARALLEL ANALYSIS TOOL (PAT) CS1 students can develop an immediate appreci- Several effective visualization tools geared to- ation for this interplay between hardware and soft- wards streamlining the development of parallel ware by carefully considering why a sequential programs have been designed, such as those de- program does not simply run p times faster on a scribed by Stasko[22], Kurtz[13] and Carr[4]. Carr machine with p processors. states, “There are few pedagogical tools for teach- Sivasubramaniam[20] aptly describes the differ- ing threaded programming.” Indeed, threads cur- ence between linear speedup and real execution rently represent the de facto primitive for pro- time as the sum of algorithmic overhead and inter- grammers delving into parallel application code. action overhead. CS1 students would be well However, the real utility of the thread concept can served by a performance metric which opens dis- be traced to OS design and improved hardware cussion about both algorithmic overhead issues implementations. Threads motivated the develop- (inherently serial code, balanced task manage- ment of a more flexible Unix kernel [21] and more ment) and interaction overhead (contention, efficient execution techniques in processor design, latency, synchronization, resource management, most notably extending superscalar performance and cache effects.) into fine-grained temporal multithreading and sim- A measure more suitable to the student and ped- ultaneous multithreading[23]. agogical needs of the CS1 classroom is the com- Threads, with their associated contention and syn- putation/communication ratio which will be re- chronization concerns, present a fairly complex ferred to as the parallel quotient (PQ): picture of parallel coding to the beginning CS1 programmer. Although tools which support visual- tcomp PQ( p,n) = (1) ization of multi-threaded program execution and tcomm the various synchronization objects and monitors may be of great benefit to the Senior-level CS stu- dent, there is a finite likelihood that first year CS student programmers would become overwhelmed and possibly discouraged by this level of detail. Lee[14] has stated outright that the thread model may not be best for parallel application design. Currently, there is also an active investigation into transactional memory (TM), which aims to make the programming model simpler, freeing the de- veloper from lock management tasks[2][19]. The recent Rock processor from Sun offers TM support. The challenge facing the CS1 student is one of re- cognition: given a programming model and under- lying parallel architecture, which sections of code are