
The following paper was originally published in the Proceedings of the 2nd USENIX Windows NT Symposium, Seattle, Washington, August 3–4, 1998.

A System for Structured High-Performance Multithreaded Programming in Windows NT

John Thornley, K. Mani Chandy, and Hiroshi Ishii California Institute of Technology

For more information about the USENIX Association, contact: Phone: 510-528-8649; FAX: 510-548-5738; Email: [email protected]; WWW: http://www.usenix.org/

John Thornley, K. Mani Chandy, and Hiroshi Ishii
Computer Science Department, California Institute of Technology
Pasadena, CA 91125, U.S.A.
{john-t, mani, hishii}@cs.caltech.edu

Abstract

With the advent of inexpensive multiprocessor PCs, multithreading is poised to play an important role in computationally intensive business and personal computing applications, as well as in science and engineering. However, the difficulty of multithreaded programming remains a major obstacle. Windows NT support for threads is well suited to systems programming, but is too unstructured when multithreading is used for the purpose of speeding up program execution. In this paper, we describe a system for structured multithreaded programming. Thread creation operations are multithreaded variants of blocks and loops, and synchronization objects are based on Boolean flags and integer counters. With this system, most multithreaded program development can be performed using traditional sequential methods and tools. The system is integrated with Windows NT and Microsoft Developer Studio Visual C++. We are developing a variety of applications in collaboration with other researchers, to demonstrate the power of structured multithreaded programming on commodity multiprocessors running Windows NT. In one benchmark application (aircraft route optimization), we achieved better performance on a quad-processor Pentium Pro system than the best results reported on expensive supercomputers.

1. Introduction

In the past, high-performance multithreading has been synonymous with either scientific supercomputing, real-time control, or achieving high throughput on multi-user servers. The idea of dividing a computationally intensive program into multiple concurrent threads to speed up execution on multiprocessor computers is well established. However, this kind of high-performance multithreading has made very little impact in mainstream business and personal computing, or even in most areas of science and engineering. Part of the reason has been the rarity and high cost of multiprocessor computer systems. Another part of the reason is the difficulty of developing multithreaded programs, as compared to equivalent sequential programs.

The recent advent of inexpensive multiprocessor PCs and commodity OS support for lightweight threads opens the door to an important role for high-performance multithreading in all areas of computing. Dual- and quad-processor PCs are now available for a few thousand dollars. Mass-produced multiprocessors with 8, 16, and more processors will soon follow. Windows NT [2] can already support hundreds of fine-grained threads with low overhead, even on single-processor machines, and future releases will be even more efficient. Examples of applications that could use multithreading to improve performance include spreadsheets, CAD/CAM, three-dimensional rendering, photo/video editing, voice recognition, games, simulation, and resource management.

The biggest obstacle that remains is the difficulty of developing efficient multithreaded programs. General-purpose thread libraries, including the Win32 API [1][8] supported by Windows NT, are well suited to systems programming applications of threads, e.g., control systems, database systems, and distributed systems. However, the interface provided by general-purpose thread libraries is less well suited to applications where threads are used for the purpose of speeding up program execution. General-purpose thread management is unstructured, and synchronization operations are complex and error-prone. The unpredictable interactions of multiple threads introduce many problems (e.g., race conditions and deadlock) that do not occur in sequential programming. In many regards, general-purpose thread libraries are the assembly language of high-performance multithreaded programming.

In this paper, we describe our ongoing research to develop a system for structured high-performance multithreaded programming on top of the general-purpose thread support provided by operating systems such as Windows NT. The key attributes of our system are as follows:

• Structured thread creation constructs are based on sequential blocks and for loops.
• Structured synchronization constructs are based on Boolean flags and integer counters.
• Subject to a few simple rules, multithreaded execution is deterministic and produces the same results as sequential execution.
• Lock synchronization is provided for nondeterministic algorithms.
• Barrier synchronization is provided for efficiency.

Our system is supported at two levels: (i) Sthreads, a structured thread library, and (ii) Multithreaded C, a set of pragmas transformed into Sthreads calls by a preprocessor. Sthreads is easily implemented as a thin layer on top of general-purpose thread libraries. The Multithreaded C preprocessor is a simple source-to-source transformation tool that involves no complex program analysis. Therefore, the entire system is highly portable.

One of the major strengths of our system is that much of the development of a multithreaded application can be performed using ordinary sequential methods and tools. The Multithreaded C preprocessor is integrated with Microsoft Developer Studio Visual C++ [7]. Applications can be built either as multithreaded applications (as indicated by the pragmas) or as sequential applications (by ignoring the pragmas). For deterministic applications, most development, testing, and debugging can be performed using the sequential version of the application. Absence of race conditions and deadlock can be verified in the context of sequential execution. Nondeterministic applications can usually be developed with large deterministic components.
The focus of our work is on commodity multiprocessors and operating systems, in particular multiprocessor PCs and Windows NT. However, our programming system is portable across platforms ranging from low-end PCs to high-end workstations and supercomputers. For this reason, the value of our work is not restricted to developers of applications for commodity systems. Our portable system for high-performance multithreaded programming also allows high-end applications to be developed, tested, and debugged on accessible low-end platforms.

The remainder of this paper is organized as follows: in Section 2, we discuss the interface and performance of Windows NT support for multithreading; in Section 3, we describe our structured multithreaded programming system; in Section 4, we report in some detail on one particular application (aircraft route optimization) that we have developed using our system; in Section 5, we give a brief outline of several other applications that we are developing; in Section 6, we compare our system with related work; and in Section 7, we summarize and conclude.

2. Windows NT Multithreading

In this section, we describe the interface and performance of standard Windows NT thread support. Since our emphasis is on commodity systems and applications, Windows NT is the ideal platform on top of which to build our system for structured multithreaded programming. The following are particularly important to us: (i) the Windows NT thread interface provides the functionality that we require for our system, and (ii) the Windows NT thread implementation efficiently supports large numbers of lightweight threads.

2.1. Windows NT Thread Interface

Windows NT implements the Win32 thread API. This interface provides all the functionality that is needed for high-performance multithreaded programming. However, because the scope of the Win32 thread API is general-purpose, the interface is more complicated and less structured than is desirable when threads are used for the purpose of speeding up program execution. For this reason, we have built a less general, less complicated, and more structured layer on top of the functionality provided by the Win32 thread API.

The Win32 thread API is typical of other general-purpose thread libraries, e.g., Pthreads [6] and Solaris threads [4]. It provides a set of function calls for creating and terminating threads, suspending and resuming threads, synchronizing threads, and controlling the scheduling of threads using priorities. The interface contains a large number of constants and types, a large number of functions, and many optional arguments. Although all these operations have important uses in general-purpose multithreading, only a structured subset of this functionality is required for our purpose.

As with other general-purpose thread libraries, Win32 thread creation is unstructured. A thread is created by passing a function pointer and an argument pointer to a CreateThread call. The new thread executes the given function with the given argument. The thread can be created either runnable or suspended. After creation, there is no special relationship or synchronization between the created thread and the creating thread. For example, the created thread may outlive the creating thread, causing problems if the created thread references variables in the creating thread. Many unstructured operations are permitted on threads. For example, one thread can arbitrarily suspend, resume, or terminate the execution of another thread.

The Win32 thread API provides a large range of synchronization operations. A thread can synchronize on the termination of another thread, or on the termination of one or all of a group of other threads. Semaphore, mutex, critical-section, and event objects, and interlocked operations allow many other forms of synchronization between threads. There are many options associated with these synchronization operations, particularly with operations on event objects. All of these synchronization operations almost inevitably introduce nondeterminacy to a program. Nondeterminacy is an implicit part of most systems programming applications of threads, but is best avoided if possible in other applications, because of the difficulty it adds to testing, debugging, and performance prediction.

To summarize, Windows NT supports a rich but complex set of general-purpose thread management and synchronization operations. We implement a simpler layer for structured multithreaded programming on top of the Windows NT thread interface.
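As a concrete illustration of this unstructured style, the following minimal sketch shows a Win32 thread created with CreateThread and joined by waiting on its handle. The sketch is ours, not taken from the paper; the Worker function and WorkArgs record are illustrative.

    #include <windows.h>

    /* Argument record passed to the thread through a single untyped pointer. */
    typedef struct {
      int first, last;              /* range of work assigned to the thread */
    } WorkArgs;

    /* Win32 thread functions have this fixed signature. */
    DWORD WINAPI Worker(LPVOID param)
    {
      WorkArgs *args = (WorkArgs *) param;
      int i;
      for (i = args->first; i < args->last; i++) {
        /* ... perform one unit of work ... */
      }
      return 0;
    }

    void RunWorker(void)
    {
      WorkArgs args = { 0, 100 };
      DWORD id;
      /* Unstructured creation: the child thread has no special
         relationship to its creator and must be joined explicitly. */
      HANDLE thread = CreateThread(NULL, 0, Worker, &args, 0, &id);
      if (thread != NULL) {
        WaitForSingleObject(thread, INFINITE);  /* synchronize on termination */
        CloseHandle(thread);
      }
    }

Note that nothing in the interface ties the lifetime of the child to the scope of the variables it references; the explicit wait is what prevents args from disappearing under the running thread.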
2.2. Windows NT Thread Performance

Lightweight multithreading is an integral part of our programming model. If multithreaded applications are to make an impact in commodity software, they must be able to execute efficiently on systems with differing numbers of processors, and dynamically adapt to varying background conditions. The best way to achieve this is to build applications with large numbers of dynamically created lightweight threads that can take advantage of whatever processing resources become available during execution. The traditional scientific supercomputing model of one static thread per processor on an otherwise unloaded system is not sufficient in the commodity software domain.

We have performed experiments that demonstrate that Windows NT supports large numbers of lightweight threads efficiently. Specifically, on single-processor, dual-processor, and quad-processor Pentium Pro systems running Windows NT, we have demonstrated the following:

• Thread creation and termination overheads are low.
• Synchronization overheads are low.
• Hundreds of threads can be supported without significant performance degradation.
• On single-processor systems, multithreaded programs with many threads can run as fast as equivalent sequential programs.
• On multiprocessor systems, multithreaded programs with many threads can achieve good speedups over equivalent sequential programs.

We have developed many small and large programs which demonstrate that multithreaded Windows NT applications can execute efficiently on both single-processor and multiprocessor systems, without any kind of reconfiguration. We have found that good speedups are maintained with much larger numbers of threads than processors.

As a simple example, Figure 1 shows the speedup of multithreaded matrix multiplication over sequential matrix multiplication for 1000-by-1000 element matrices on a 200 MHz quad-processor Pentium Pro system running Windows NT Server 4.0. A straightforward three-nested-loops algorithm is used. Sequential matrix multiplication takes 50 seconds. In the multithreaded version, the iterations in the outer loop are evenly partitioned among the threads. (The intention of this example is to demonstrate Windows NT multithreaded performance characteristics, not optimal matrix multiplication algorithms.) Near-perfect speedups are achieved and are maintained with large numbers of threads.

[Figure 1: Speedup of multithreaded matrix multiplication over sequential matrix multiplication on one to four processors of a quad-processor Pentium Pro system. The plot shows speedup over sequential execution (0.0 to 4.0) against the number of threads (0 to 100), with one curve each for 1, 2, 3, and 4 processors.]

To summarize, we have found that Windows NT efficiently supports large numbers of lightweight threads. Our structured multithreaded programming system is implemented as a very thin layer on top of Windows NT support for threads.
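The measured program itself is not listed in the paper. As an indication of its structure, the following sketch of ours shows how the outer loop of a straightforward three-nested-loops matrix multiplication can be partitioned among threads using the Multithreaded C notation introduced in Section 3; the function name and argument layout are illustrative.

    /* C = A x B for N-by-N matrices; each thread computes a block of rows. */
    void MatrixMultiply(
        float A[N][N], float B[N][N], float C[N][N], int numThreads)
    {
      int i;
      #pragma multithreadable \
          mapping(blocked(numThreads))
      for (i = 0; i < N; i++) {     /* outer-loop iterations are independent */
        int j, k;
        for (j = 0; j < N; j++) {
          float sum = 0.0;
          for (k = 0; k < N; k++)
            sum = sum + A[i][k] * B[k][j];
          C[i][j] = sum;
        }
      }
    }

With the pragma ignored this is an ordinary sequential program; with the pragma active, the N outer iterations are divided into numThreads contiguous blocks, one per thread.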
3. A System for Structured Multithreaded Programming

In this section, we describe the features of our structured multithreaded programming system. Since it is beyond the scope of this short paper to describe all the details of our system, we aim to give an overview of the fundamental constructs and development methods.

3.1. Overview

We are developing a two-level approach to structured multithreaded programming in ANSI C [3]. The higher level is a pragma-based notation (Multithreaded C), and the lower level is a structured thread library (Sthreads). In both Multithreaded C and Sthreads, thread creation constructs are multithreaded variants of sequential block and for loop constructs. With Multithreaded C, the constructs are supported as pragma annotations to a sequential program. With Sthreads, exactly the same constructs are supported as library calls. At both levels, synchronization objects and operations are supported as Sthreads library calls.

The Sthreads library is implemented as a very thin layer on top of the Windows NT thread interface. Multithreaded C is implemented as a portable source-to-source preprocessor that directly transforms annotated blocks and for loops into equivalent calls to the Sthreads library. The programmer has the option of either using the pragmas and preprocessor or making Sthreads calls directly. The Sthreads library and Multithreaded C preprocessor are integrated with Microsoft Developer Studio Visual C++. Building a project automatically invokes the preprocessor where necessary and links with the Sthreads library.

3.2. Multithreadable Blocks and for Loops

Multithreadable blocks and for loops are ordinary blocks and for loops for which multithreaded execution is equivalent to sequential execution. In Multithreaded C, multithreadable blocks and for loops are expressed using the multithreadable pragma. It is obvious that the pragma can be applied to blocks and for loops in which the statements or iterations are independent of each other. As a simple example, consider the following program to sum the elements of a two-dimensional array:

    void SumElements(
        float A[N][N], float *sum, int numThreads)
    {
      int i;
      float rowSum[N];

      #pragma multithreadable \
          mapping(blocked(numThreads))
      for (i = 0; i < N; i++) {
        int j;
        rowSum[i] = 0.0;
        for (j = 0; j < N; j++)
          rowSum[i] = rowSum[i] + A[i][j];
      }
      *sum = 0.0;
      for (i = 0; i < N; i++)
        *sum = *sum + rowSum[i];
    }

Multithreaded execution of the for loop is equivalent to sequential execution because the iterations all modify different rowSum[i] and j variables. The arguments following the pragma indicate that multithreaded execution should assign iterations to numThreads different threads using a blocked mapping. There is a rich set of options that control the mapping of iterations to threads.

3.3. Synchronization Using Flags and Counters

Flags and counters are provided to express deterministic synchronization within multithreadable blocks and for loops. Flags support Set and Check operations. Initially, a flag is not set. A Set operation on a flag atomically sets the flag. A Check operation on a flag suspends until the flag is set. Once a flag is set, it remains set. Counters support Increment and Check operations. A counter has an initial value of zero. An Increment operation on a counter atomically increments the value of the counter. A Check operation on a counter suspends until the value of the counter reaches a given value. The value of a counter cannot be decremented. The only way to test the value of a flag or counter is with a Check operation. As a simple example, consider the following program to sum the elements of a two-dimensional array:

    void SumElements(
        float A[N][N], float *sum, int numThreads)
    {
      int i;
      SthreadCounter counter;

      SthreadCounterInitialize(&counter);
      #pragma multithreadable \
          mapping(blocked(numThreads))
      for (i = 0; i < N; i++) {
        int j;
        float rowSum;
        rowSum = 0.0;
        for (j = 0; j < N; j++)
          rowSum = rowSum + A[i][j];
        SthreadCounterCheck(&counter, i);
        *sum = *sum + rowSum;
        SthreadCounterIncrement(&counter, 1);
      }
      SthreadCounterFinalize(&counter);
    }

Without the counter operations, multithreaded execution of the for loop would not be equivalent to sequential execution, because the iterations all modify the same *sum variable. However, the counter operations ensure that multithreaded execution is equivalent to sequential execution. In sequential execution, the iterations are executed in increasing order and the SthreadCounterCheck operations succeed without suspending. In multithreaded execution, the counter operations ensure that the operations on *sum occur atomically and in the same order as in sequential execution. Iteration i suspends at the SthreadCounterCheck operation until iteration i - 1 has executed the SthreadCounterIncrement operation.
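Flags play the analogous role when one statement of a multithreadable block produces a value that another statement consumes. The following sketch is ours: the flag function names (SthreadFlagInitialize, SthreadFlagSet, SthreadFlagCheck, SthreadFlagFinalize) are assumed by analogy with the counter operations above, and Produce and Consume are hypothetical.

    void ProduceAndConsume(float *result)
    {
      float x;
      SthreadFlag ready;

      SthreadFlagInitialize(&ready);
      #pragma multithreadable
      {
        {                            /* first statement: producer  */
          x = Produce();
          SthreadFlagSet(&ready);    /* x is now safe to read      */
        }
        {                            /* second statement: consumer */
          SthreadFlagCheck(&ready);  /* suspends until x is ready  */
          *result = Consume(x);
        }
      }
      SthreadFlagFinalize(&ready);
    }

In sequential execution the producer statement runs first, so the Check succeeds without suspending, which is exactly the usage rule stated in Section 3.4 below.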
3.4. Sequential Development Methods

The equivalence of multithreaded and sequential execution of multithreadable blocks and for loops allows multithreaded programs to be developed using ordinary sequential methods and tools. The multithreadable pragma is an assertion by the programmer that the block or for loop can be executed in a multithreaded manner without changing the results of the program. It is not a directive that the block or for loop must be executed in a multithreaded manner. The Multithreaded C preprocessor has two modes: sequential mode, in which the multithreadable pragma is ignored, and multithreaded mode, in which the multithreadable pragma is transformed into Sthreads calls. Programs can be developed, tested, and debugged in sequential mode, then executed in multithreaded mode for performance. In addition, performance analysis and tuning can often be performed in sequential mode.

The advantages of this approach to multithreaded programming are clear. However, the programmer is responsible for correct use of the multithreadable pragma. Fortunately, the rules for correct use of the pragma are straightforward and can be stated entirely in terms of sequential execution. In sequential execution, accesses to shared variables must be separated by Set and Check operations or Increment and Check operations, and the Check operations must not suspend. In multithreaded execution, the synchronization operations will then ensure that accesses to shared variables occur atomically and in the same order as in sequential execution. Therefore, multithreaded execution will be equivalent to sequential execution.

3.5. Determinacy

Determinacy of results is an important consequence of the equivalence of multithreaded and sequential execution. If sequential execution is deterministic (which is usually the case), multithreaded execution will also be deterministic. Determinacy is usually desirable, since program development and debugging can be difficult when different runs produce different results. In many other multithreaded programming systems, determinacy is difficult to ensure. For example, locks, semaphores, and many-to-one message passing almost always introduce race conditions and hence nondeterminacy. However, nondeterminacy is important for efficiency in some algorithms, e.g., branch-and-bound algorithms.

3.6. Multithreaded Blocks and for Loops

Multithreaded blocks and for loops are blocks and for loops that must be executed in a multithreaded manner. Multithreaded execution is not necessarily equivalent to sequential execution. In Multithreaded C, multithreaded blocks and for loops are expressed using the multithreaded pragma. Unlike the multithreadable pragma, the multithreaded pragma is transformed into Sthreads calls by the Multithreaded C preprocessor in both sequential and multithreaded mode.
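A typical candidate for the multithreaded pragma is a block whose statements must run concurrently to make progress at all, such as a producer and consumer communicating through a bounded buffer; running such a block sequentially could suspend forever, which is why the pragma is expanded in both modes. The following sketch is ours; BoundedQueue, ProduceItems, and ConsumeItems are hypothetical.

    typedef struct BoundedQueue BoundedQueue;  /* hypothetical bounded buffer       */
    void ProduceItems(BoundedQueue *q);        /* suspends when the buffer is full  */
    void ConsumeItems(BoundedQueue *q);        /* suspends when the buffer is empty */

    void RunPipeline(BoundedQueue *q)
    {
      /* The two statements must execute concurrently: the producer can
         fill the buffer and suspend until the consumer drains it, so the
         preprocessor creates threads here in both of its modes. */
      #pragma multithreaded
      {
        ProduceItems(q);
        ConsumeItems(q);
      }
    }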
3.7. Synchronization Using Locks

Locks are provided to express nondeterministic synchronization, usually mutual exclusion, within multithreaded blocks and for loops. Our locks support the usual Acquire and Release operations. The order in which concurrent Acquire operations succeed is nondeterministic. Therefore, there is very little use for locks within multithreadable blocks and for loops. As a simple example, consider the following program to sum the elements of a two-dimensional array:

    void SumElements(
        float A[N][N], float *sum, int numThreads)
    {
      int i;
      SthreadLock lock;

      SthreadLockInitialize(&lock);
      #pragma multithreaded \
          mapping(blocked(numThreads))
      for (i = 0; i < N; i++) {
        int j;
        float rowSum;
        rowSum = 0.0;
        for (j = 0; j < N; j++)
          rowSum = rowSum + A[i][j];
        SthreadLockAcquire(&lock);
        *sum = *sum + rowSum;
        SthreadLockRelease(&lock);
      }
      SthreadLockFinalize(&lock);
    }

Like the counter operations in the program in Section 3.3, the lock operations in this program ensure that the operations on *sum occur atomically. However, unlike the counter operations, the lock operations do not ensure that the operations on *sum occur in the same order as in sequential execution, or even in the same order each time the program is executed. Therefore, since floating-point addition is not associative, the program may produce different results each time it is executed. However, because execution order is less restricted, this program allows more concurrency than the program in Section 3.3. This is an example of the commonly occurring tradeoff between determinacy and efficiency.

3.8. Synchronization Using Barriers

Barriers are provided to express collective synchronization of a group of threads in cases when thread termination and recreation is too expensive. Our barriers support the usual Pass operation. All the threads in a group must enter the Pass operation before any of the threads in the group are allowed to leave the Pass operation. In current systems, the cost of N threads executing a Pass operation is less than the cost of creating and terminating N threads. Therefore, a typical use of barriers is to replace a sequence of multithreadable loops with a single multithreaded loop containing a sequence of barrier Pass operations. However, with modern lightweight thread systems such as Windows NT, we are discovering that barriers are required for efficiency in very few circumstances.
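For the cases where barriers do pay off, the pattern looks like the following sketch of ours: an iterative computation creates one thread per block once, and separates the steps with Pass operations instead of terminating and recreating the threads each step. The barrier function names (SthreadBarrierInitialize taking a group size, SthreadBarrierPass, SthreadBarrierFinalize) and the NewValue update rule are assumed for illustration.

    /* numSteps relaxation sweeps over u[]: one multithreaded loop with
       barrier Pass operations replaces numSteps multithreadable loops. */
    void Relax(float u[N], float v[N], int numSteps, int numThreads)
    {
      int t;
      SthreadBarrier barrier;

      SthreadBarrierInitialize(&barrier, numThreads);
      #pragma multithreaded
      for (t = 0; t < numThreads; t++) {
        int step, i;
        int first = t * N / numThreads;       /* block owned by thread t */
        int last = (t + 1) * N / numThreads;
        for (step = 0; step < numSteps; step++) {
          for (i = first; i < last; i++)
            v[i] = NewValue(u, i);            /* hypothetical update rule */
          SthreadBarrierPass(&barrier);       /* wait: all blocks updated */
          for (i = first; i < last; i++)
            u[i] = v[i];
          SthreadBarrierPass(&barrier);       /* wait: all blocks copied  */
        }
      }
      SthreadBarrierFinalize(&barrier);
    }

The two Pass operations per step implement the usual double-buffer discipline: no thread reads u[] for the next step until every thread has finished writing it.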
3.9. The Sthreads Interface

The examples that we have given so far are expressed using the Multithreaded C pragma notation. As we described previously, there is a direct correspondence between the pragma notation for thread creation and the Sthreads library functions that support thread creation. As a simple example, the following is the program from Section 3.3 implemented using Sthreads:

    typedef struct {
      float (*A)[N];
      float *sum;
      SthreadCounter *counter;
    } LoopArgs;

    void LoopBody(
        int i, int notused1, int notused2, LoopArgs *args)
    {
      int j;
      float rowSum;
      rowSum = 0.0;
      for (j = 0; j < N; j++)
        rowSum = rowSum + (args->A)[i][j];
      SthreadCounterCheck(args->counter, i);
      *(args->sum) = *(args->sum) + rowSum;
      SthreadCounterIncrement(args->counter, 1);
    }

    void SumElements(
        float A[N][N], float *sum, int numThreads)
    {
      SthreadCounter counter;
      LoopArgs args;

      SthreadCounterInitialize(&counter);
      args.A = A;
      args.sum = sum;
      args.counter = &counter;
      SthreadRegularForLoop(
          (void (*)(int, int, int, void *)) LoopBody,
          (void *) &args,
          0, STHREAD_CONDITION_LT, N, 1, 1,
          STHREAD_MAPPING_BLOCKED, numThreads,
          STHREAD_PRIORITY_PARENT,
          STHREAD_STACK_SIZE_DEFAULT);
      SthreadCounterFinalize(&counter);
    }

Although this program is syntactically more complicated than the Multithreaded C version, it is considerably less complicated than the same program expressed using Windows NT threads. The mechanics of creating threads, assigning iterations to threads, and waiting for thread termination is handled within the Sthreads library call.

3.10. Performance Issues

In our multithreaded programming system, obtaining good performance is under the control of the programmer. The following issues must be taken into account:

• Multithreading overheads: Threading and synchronization operations are time consuming.
• Load balancing: Enough concurrency must be expressed to keep the processors busy.
• Memory contention: Locality in memory access patterns prevents memory contention.

The key tradeoff is granularity. If multithreading is too fine-grained, the overheads will swamp the useful computation. If multithreading is too coarse-grained, the processors will spend too much time idle. Fortunately, Windows NT supports lightweight threads with low overheads.

3.11. Implementation and Performance on Windows NT

Sthreads for Windows NT is implemented as a very thin layer on top of the Win32 thread API. As a consequence, there is essentially no performance overhead associated with using Sthreads or Multithreaded C, as compared to using Win32 threads directly.

Multithreadable blocks and for loops are implemented as a sequence of CreateThread calls followed by a WaitForSingleObject call on an event. Terminating threads perform an InterlockedDecrement call on a counter, and the last thread to terminate performs a SetEvent call on the event. Flags are implemented directly as Win32 events. Counters are implemented as linked lists of Win32 events, with one event for every value on which some thread is waiting. Locks are implemented directly as Win32 critical sections. Barriers are implemented as a pair of Win32 events and a Win32 critical section.

On a single-processor 200 MHz Pentium Pro system running Windows NT Server 4.0, the time to create and terminate each thread in a multithreadable block or for loop is approximately 500 microseconds. The time to initialize and finalize a flag, counter, lock, or barrier is between 5 and 20 microseconds. The time to perform a synchronization operation on a flag, counter, lock, or barrier is less than 10 microseconds.
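The fork-join mechanism just described can be pictured with the following sketch, which is ours rather than the actual Sthreads source; the Join and WorkerArgs records and the fixed thread bound are illustrative.

    #include <windows.h>

    #define MAX_THREADS 64           /* illustrative fixed bound */

    /* Shared fork-join state: a live-thread count and a completion event. */
    typedef struct {
      LONG remaining;
      HANDLE done;                   /* manual-reset Win32 event */
    } Join;

    typedef struct {
      Join *join;
      int index;                     /* which block of iterations to run */
    } WorkerArgs;

    DWORD WINAPI Worker(LPVOID param)
    {
      WorkerArgs *args = (WorkerArgs *) param;
      /* ... execute the iterations assigned to args->index ... */
      /* The last thread to terminate signals the waiting parent. */
      if (InterlockedDecrement(&args->join->remaining) == 0)
        SetEvent(args->join->done);
      return 0;
    }

    void RunBlock(int numThreads)
    {
      WorkerArgs args[MAX_THREADS];
      HANDLE handles[MAX_THREADS];
      Join join;
      int t;
      DWORD id;

      join.remaining = numThreads;
      join.done = CreateEvent(NULL, TRUE, FALSE, NULL);
      for (t = 0; t < numThreads; t++) {
        args[t].join = &join;
        args[t].index = t;
        handles[t] = CreateThread(NULL, 0, Worker, &args[t], 0, &id);
      }
      WaitForSingleObject(join.done, INFINITE);  /* join on the whole group */
      for (t = 0; t < numThreads; t++)
        CloseHandle(handles[t]);
      CloseHandle(join.done);
    }

Waiting on a single event rather than on numThreads handles keeps the parent's cost independent of the group size.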
3.12. Current Status

The Sthreads implementation for Windows NT is complete and robust. Over the last year, we have developed many different multithreaded applications using Sthreads on quad-processor Pentium Pro systems running Windows NT. This academic year, we taught an undergraduate class at Caltech on multithreaded programming, with Sthreads used for all the homework assignments. We are in the process of implementing Sthreads on top of Pthreads for a variety of platforms, including Sun UltraSPARC, SGI Origin, and HP Exemplar. We have developed a comprehensive, platform-independent test suite for Sthreads and a timing suite to compare the performance of different Sthreads implementations.

At the time of writing, the Multithreaded C preprocessor is still under development. We hope to make a prototype preprocessor available by the fourth quarter of 1998. In the meanwhile, we are developing multithreaded applications by making Sthreads calls directly. Because of the direct correspondence between the pragma annotations and Sthreads calls, the design and development of algorithms and programs is the same in Sthreads and Multithreaded C. The only difference is in the syntax.

4. An Example Application: Aircraft Route Optimization

In this section, we report on one example application that we have developed. The Aircraft Route Optimization Problem is part of the U.S. Air Force Rome Laboratory C3I Parallel Benchmark Suite [5]. For this application, we achieved better performance using Sthreads on a quad-processor Pentium Pro system running Windows NT than the best reported results for message-passing programs running on expensive Cray and SGI supercomputers with up to 64 processors. The flexibility of shared-memory, lightweight multithreading, and sequential development methods allowed us to develop a much more sophisticated and efficient algorithm than would be possible on a message-passing supercomputer.

4.1. The C3I Parallel Benchmark Suite

The U.S. Air Force Rome Laboratory C3I Parallel Benchmark Suite consists of eight problems chosen to represent the essential elements of real C3I (Command, Control, Communication, and Intelligence) applications. Each problem consists of the following:

• A problem description giving the inputs and required outputs.
• An efficient sequential program (written in C) to solve the problem.
• The benchmark input data.
• A correctness test for the benchmark output data.

For some of the problems, a parallel message-passing program is also provided. Rome Laboratory maintains a publicly accessible database of reported performance results.

The C3I Parallel Benchmark Suite provides a good framework for evaluating our structured multithreaded programming system. The problems are computationally intensive and involve a variety of complex algorithms and data structures. The sequential program provides us with a good starting point and a fair basis for performance comparison. The performance database allows us to compare our results with those of other researchers. For these reasons, we are developing multithreaded solutions to several of the C3I Parallel Benchmark Suite problems.

4.2. The Aircraft Route Optimization Problem

The objective in the Aircraft Route Optimization Problem is to find the lowest-risk path for an aircraft from an origin point to a set of destination points in the airspace over an uneven terrain. The risk associated with each transition in the airspace is determined by its proximity to a set of threats. The problem involves realistic constraints on aircraft speed and maneuverability. The aircraft is also constrained to fly above the underlying terrain and beneath a given ceiling altitude.

The problem is essentially the single-source, multiple-destination shortest path problem with a large, sparsely connected graph. The airspace for the benchmark is 100 km by 100 km in area and 10 km in altitude, discretized at 1 km intervals. The 100,000 positions in space correspond to 2,600,000 nodes in the graph, since each position can be reached from 26 different directions. Because of aircraft speed and maneuverability constraints, each node is connected to only nine or ten geographically adjacent nodes. Therefore, the graph consists of approximately 2.6 million nodes and 26 million edges.

4.3. Sequential Algorithm

The sequential algorithm to solve the Aircraft Route Optimization Problem is based on a queue of nodes. Initially the queue is empty except for the origin node. At each step, one node is removed from the queue. Valid transitions from this source node to all adjacent destination nodes are considered. For each destination node, if the path to the node via the source node is shorter than the current shortest path to the node, the path to the node is updated and the node is added to the queue. The algorithm continues until the queue is empty, at which stage the shortest paths to all reachable nodes have been computed.

The queue is ordered on path length so that shorter paths are expanded before longer paths. This has a significant effect on performance. Without ordering, longer paths are expanded, then discarded when shorter paths to the same points are expanded later in the computation. However, whether the queue is ordered, partially ordered, or unordered does not affect the results of the algorithm.
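In outline, the core loop of this sequential algorithm looks as follows. The sketch is ours: the graph-access and priority-queue functions and the pathLength array are hypothetical stand-ins for the benchmark's data structures, and pathLength is assumed initialized to infinity for every node except the origin.

    typedef int Node;

    /* Hypothetical graph access and path-length-ordered queue. */
    extern int    EdgeCount(Node n);
    extern Node   Adjacent(Node n, int k);
    extern double EdgeCost(Node from, Node to);
    extern int    QueueEmpty(void);
    extern Node   QueueRemoveShortest(void);
    extern void   QueueInsert(Node n, double length);
    extern double pathLength[];    /* best known path length per node */

    void ShortestPaths(Node origin)
    {
      pathLength[origin] = 0.0;
      QueueInsert(origin, 0.0);
      while (!QueueEmpty()) {
        Node source = QueueRemoveShortest();
        int k;
        for (k = 0; k < EdgeCount(source); k++) {
          Node dest = Adjacent(source, k);
          double viaSource = pathLength[source] + EdgeCost(source, dest);
          if (viaSource < pathLength[dest]) {
            pathLength[dest] = viaSource;  /* shorter path found       */
            QueueInsert(dest, viaSource);  /* re-expand the node later */
          }
        }
      }
    }

Because a node is reinserted whenever its path improves, the result is the same whether QueueRemoveShortest returns the true minimum or only an approximation, which is the property the multithreaded algorithm below exploits.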
4.4. Multithreaded Algorithm

The most straightforward approach to obtaining parallelism in the Aircraft Route Optimization Problem is to geographically partition the airspace into blocks, with one thread (or process) responsible for each block. Each thread runs the sequential algorithm on its own block using its own local queue and periodically exchanges boundary values with neighboring blocks. This approach is particularly appealing on distributed-memory, message-passing platforms, because memory can be permanently distributed according to the partitioning. If the threads execute a reasonably large number of iterations between boundary exchanges, good load balance can be achieved.

The problem with this algorithm is that, as the number of blocks/threads is increased, the total amount of computation also increases. Therefore, any speedup is based on an increasingly inefficient underlying algorithm. At any time, the local queues in most blocks contain paths that are too long and are irrelevant to the actual shortest paths. The processors are kept busy performing computation that is later discarded. At any given time, it is only productive to work on an irregular and unpredictable subset of the graph. However, irregular and adaptive blocking schemes do not solve the problem, since there is usually equal work available in all blocks. The issue is the distinction between productive and unproductive work.

Our solution is to statically partition the airspace into a large number of blocks and to use a much smaller number of threads. A measure of the average path length is maintained with each local queue. At each step, the blocks with local queues containing the shortest paths are assigned to the threads. Therefore, the subset of blocks that are active and the assignment of blocks to threads change dynamically throughout program execution. This algorithm takes advantage of the symmetric multiprocessor model, in which all threads can access the entire memory space with uniform cost. It also takes advantage of the lightweight multithreading model to achieve good load balance, since the workload within each thread at each step is highly variable.

The ability to develop, test, and debug using sequential methods was crucial in the development of this sophisticated multithreaded algorithm. The entire program was tested and debugged in sequential mode before multithreaded execution was attempted. In particular, development of the complex boundary exchange and queue update algorithms would have been considerably more difficult in multithreaded mode.

The ability to analyze and tune performance using sequential methods was also very important. Good performance depended on exposing enough parallelism without significantly increasing the total amount of computation. We determined efficient values for the number of blocks, the number of threads, and the number of iterations between boundary exchanges by measuring computation times and operation counts of the multithreaded program running in sequential mode. This detailed analysis would have been very difficult to perform in multithreaded mode. We avoided memory contention in multithreaded mode by avoiding cache misses in sequential mode. The analysis of memory access patterns in sequential mode is much simpler than in multithreaded mode.

4.5. Performance

Other researchers have developed and reported results for message-passing solutions to the Aircraft Route Optimization Problem running on Cray T3D and SGI Power Challenge supercomputers with 16 and 64 processors. We developed our program using Sthreads on a quad-processor 200 MHz Pentium Pro system running Windows NT Server 4.0. (One Pentium Pro processor is approximately the same speed as one SGI Power Challenge processor and twice the speed of one Cray T3D processor.) As shown in Figure 2, our results are better than the results reported on the expensive supercomputers. The reason is the combination of the shared-memory model supported by the Pentium Pro, the lightweight multithreading model supported by Windows NT, and the structured multithreaded programming system with sequential development methods supported by Sthreads.
[Figure 2: Aircraft Route Optimization execution times. Sthreads on a quad-processor Pentium Pro outperforms message-passing on supercomputers for the Aircraft Route Optimization Problem. The bar chart plots execution time in seconds (0 to 35) for four configurations: Cray T3D, 64 processors, MPI; SGI Power Challenge, 16 processors, MPI; Cray T3D, 64 processors, PVM; and Pentium Pro, 4 processors, Sthreads.]

These results should not be misinterpreted as meaning that commodity multiprocessors are now as powerful as supercomputers, or that structured multithreading can obtain super-linear speedups from a small number of processors. For the Aircraft Route Optimization Problem, we obtain approximately three-fold speedup over sequential execution on four processors. However, unlike the message-passing approach, the speedups are obtained from a multithreaded algorithm that adds no significant overheads to the sequential algorithm.

This example is intended as an interesting case study that highlights the strengths of shared-memory, lightweight threads, and structured multithreaded programming. Clearly, there are many other applications for which message-passing supercomputers would outperform a quad-processor Pentium Pro system. Nonetheless, it is significant that inexpensive commodity multiprocessors are now a practical consideration in many areas that were previously the sole domain of supercomputers.

5. Additional Applications under Development

We are in the process of developing a number of other multithreaded applications to demonstrate the power of structured multithreaded programming on commodity multiprocessors running Windows NT. These applications include the following:

• Volume Rendering: Volume rendering computes an image of a three-dimensional data volume. The image may or may not be realistic, depending on the nature of the data. We obtain better than 3.5-fold speedups on quad-processor Pentium Pro systems.
• Terrain Masking: Terrain masking computes the altitudes at which an aircraft will be visible to ground-based radar systems in an uneven terrain. This problem has obvious applications in military and civil aviation. Terrain masking is part of the C3I Parallel Benchmark Suite. We obtain 3-fold speedups on quad-processor Pentium Pro systems.
• Threat Analysis: Threat analysis is a military application that performs a time-stepped simulation of the trajectories of incoming threats and analyzes options for intercepting the threats. Threat analysis is part of the C3I Parallel Benchmark Suite. We obtain almost 4-fold speedups on quad-processor Pentium Pro systems.
• Protein Folding: Protein folding takes a known protein chain and computes the most likely three-dimensional structures for the molecule. This problem is of vital interest to biochemists involved in drug design. We obtain 3-fold speedups on quad-processor Pentium Pro systems.
• Molecular Dynamics Simulation: We are developing a multithreaded implementation of an existing molecular dynamics simulation program (MPSim) that uses the multipole method. MPSim consists of more than 100 source files and over 100,000 lines of code. We obtain better than 3.5-fold speedups on quad-processor Pentium Pro systems for simulations involving up to half a million atoms.

All of these applications are extremely computationally intensive. Depending on the input data, the problems may require many hours or days to solve on fast single-processor machines. For this reason, much of the previous work on these problems has been with expensive supercomputers. The computational challenge associated with these problems is not about to disappear with the next generation of fast processors.

6. Comparison with Related Work

Over the years, hundreds of different systems for multithreaded programming have been proposed and implemented. These systems include thread libraries, concurrent languages, sequential languages with pragmas, data parallel languages, automatic parallelizing compilers, and parallel dataflow languages. We have attempted to combine the best attributes of these other approaches into one simple, timely, and highly accessible multithreaded programming system.

The contribution of our system is not in the individual constructs, but in their combination and context. Multithreaded block and for loop constructs date back to the 1960s and have been incorporated in many forms in concurrent languages, pragma-based systems, and data parallel languages. Synchronization flags and counters are derived from concepts originally explored in the context of parallel dataflow languages. Locks and barriers are standard synchronization constructs in many thread libraries and concurrent languages.

An important difference between our system and others is the combination of the following: (i) the emphasis on practical development of sophisticated multithreaded programs using sequential methods, (ii) the ability to ensure deterministic execution, but express nondeterminacy where necessary, and (iii) the minimalist integration of these ideas with popular languages, operating systems, and tools. We believe that this is the right approach to making multithreading practical for a wide range of applications running on modern multiprocessor platforms.

7. Conclusion

We are developing a system for structured high-performance multithreaded programming on top of the support that Windows NT provides for lightweight multithreading. Our system consists of a structured thread library and a preprocessor integrated with Microsoft Developer Studio Visual C++. An important attribute of our system is the ability to develop multithreaded programs using traditional sequential methods and tools.

We have developed multithreaded applications in a wide range of areas such as optimization, graphics, simulation, and applied science. The performance of these applications on multiprocessor Pentium Pro systems has been excellent. Our experience developing these applications has convinced us that high-performance multithreading is ready to enter the mainstream of computing.

Availability

Information on obtaining the multithreaded programming system described in this paper can be found at http://threads.cs.caltech.edu/ or can be requested from [email protected].

Acknowledgments

The applications described in this paper are being developed by the following Caltech students: Eric Bogs, Marrq Ellenbecker, Yuanshan Guo, Maria Hui, Lei Jin, Autumn Looijen, Scott Mandelsohn, Bradley Marker, Jason Roth, Sean Suchter, and Louis Thomas. We are developing the applications in collaboration with the following Caltech faculty, staff, and students: Prof. Peter Schröder in the Computer Science Department, Dr. Jerry Solomon and David Liney in the Computational Biology Department, and Prof. William Goddard III, Dr. Tahir Cagin, and Hao Li in the Materials and Process Simulation Center.

Thanks to Intel Corporation for donating multiprocessor Pentium Pro computer systems. Thanks also to Microsoft Corporation for donating software and documentation. Special thanks to David Ladd of Microsoft University Research Programs for being so responsive to our requests. The design of our multithreaded programming system greatly benefited from discussions with Tim Mattson of Intel Corporation. This research was funded in part by the NSF Center for Research on Parallel Computation under Cooperative Agreement No. CCR-9120008 and by the Army Research Laboratory under Contract No. DAHC94-96-C-0010.

References

[1] Jim Beveridge and Robert Wiener. Multithreading Applications in Win32. Addison-Wesley, Reading, Massachusetts, 1997.
[2] Helen Custer. Inside Windows NT. Microsoft Press, Redmond, Washington, 1993.
[3] Brian W. Kernighan and Dennis M. Ritchie. The C Programming Language. Prentice Hall, Englewood Cliffs, New Jersey, second edition, 1988.
[4] Steve Kleiman, Devang Shah, and Bart Smaalders. Programming with Threads. Prentice Hall, Upper Saddle River, New Jersey, 1996.
[5] Richard C. Metzger, Brian VanVoorst, Luiz S. Pires, Rakesh Jha, Wing Au, Minesh Amin, David A. Castanon, and Vipin Kumar. The C3I Parallel Benchmark Suite - Introduction and Preliminary Results. Supercomputing '96, Pittsburgh, Pennsylvania, November 17-22, 1996.
[6] Bradford Nichols, Dick Buttlar, and Jacqueline Proulx Farrell. Pthreads Programming. O'Reilly, Sebastopol, California, 1996.
[7] David J. Kruglinski. Inside Visual C++. Microsoft Press, Redmond, Washington, fourth edition, 1997.
[8] Thuan Q. Pham and Pankaj K. Garg. Multithreaded Programming with Windows NT. Prentice Hall, Upper Saddle River, New Jersey, 1996.