
feature: programming

How Pair Programming Really Works

Stuart Wray, Royal School of Signals

Pair programming isn’t always successful, and recent studies cast doubt on the “driver-navigator” metaphor. Four mechanisms can improve pair programming performance.

Pair programming has generated considerable controversy: some developers are enthusiastic about it, almost evangelical; others are dubious, even hostile. However, a large factor in this controversy is that programmers label a wide variety of practices under the “pair programming” umbrella. Thus, before our community can sensibly discuss how pair programming works, we first need to establish exactly what it is.

As a dictionary definition, I’d say that pair programming is a technique in which two people sit down, literally side by side, and write a program at the same computer. When Kent Beck originally coined the term, he described two programmers working at different levels of abstraction [1]. Laurie Williams and Robert Kessler made this idea concrete, using the metaphor of one programmer being the “driver” and the other the “navigator” [2]. In this metaphor, the driver controls the keyboard and focuses on the immediate task of coding, and the navigator acts as a reviewer, observing and thinking about more strategic architectural issues.

My own experience as a developer using pair programming is that it isn’t just a technique where one person programs and the other person watches. Both programmers work closely together, chatting the whole time, jotting down reminders of things to do, and pointing out pieces of code on the screen. (One of the clichés of pair programming is that if you’re doing it right, your screen should be covered with greasy finger-marks by the end of the day.) Programmers take turns at the keyboard, usually swapping over with a phrase like, “No, let me show you what I mean.”

Jan Chong and Tom Hurlbutt confirmed this view of successful pair programming after spending several months on an ethnographic study of professional developers who use pair programming in their daily work [3]. They found that programmers tended to work together on the same facet of a problem almost the whole time and swap between tactical and architectural levels as a pair. Similar ethnographic studies by Sallyann Bryant and her colleagues [4] and Stephan Salinger and his colleagues [5] further confirmed this.

Of course, not all attempts at pair programming have been successful: Matt Stephens and Doug Rosenberg, for example, reported unfavorably on their experiences [6]. However, what they described is a caricature of the driver-navigator metaphor, with one programmer firmly in control and the other sitting quietly, doing little. Such misunderstanding shows that we can’t take a claim that developers are pair programming at face value; they might not be doing what experienced and effective pair programmers actually do.

IEEE Software, January/February 2010. Published by the IEEE Computer Society. 0740-7459/10/$26.00 © 2010 IEEE

This kind of misunderstanding also casts doubt on the many attempts to assess pair programming’s effectiveness. (Tore Dybå and his colleagues provide a very nice summary of this experimental work [7].) If the subjects of these experiments did different things, can we really compare their results? And if they weren’t doing what successful pair programmers do in commercial practice, can we apply their findings to commercial development?

In this article, I advance four mechanisms prompted by my own experience of pairing in both agile and non-agile development. These mechanisms explain a large part of what successful pair programmers do. Of course, this is only the beginning: you might have experiences that confirm or contradict my suggestions. What have I missed? I hope you’ll contribute to the discussion of these issues on the Web site (http://computingnow.computer.org/wray).

Mechanism 1: Pair Programming Chat

Around 1980, as undergraduate students at the University of Cambridge, my friends and I noticed a strange phenomenon that we called expert programmer theory. When one of us had trouble getting our programs to work, we’d describe the nonfunctioning state of our code to each other over coffee. Quite often, we’d realize in a flash what was wrong and how to solve it. These epiphanies were quite independent of the other person having any real understanding of our problems; the listener often seemed little wiser about the subject.

Since then, I’ve found this phenomenon is well known to professional developers, and sometimes described in textbooks and research papers. For example, Brian Kernighan and Rob Pike recommended explaining problems aloud, even to a stuffed toy [8], a practice that John Sturdy called the rubber-plant effect [9]. Part of pair programming’s effectiveness is presumably due to this effect being continually triggered: as one programmer gets stuck, the back-and-forth chat serves to unstick them in the same way as solo programmers talking about their problems out loud. However, this raises the question of whether any type of speaking will help or whether something specific is needed.

Research on “self-explanation” by Michelene Chi and others throws some light on this question. Chi and her colleagues described a study that tested a control group of students before and after they received a textbook explanation to read [10]. They tested another group in the same way, but encouraged the students to explain the textbook out loud and “fill in the gaps” for themselves. The self-explainers learned significantly more than the control group, and those who explained the most improved the most. The researchers also prompted the students for their explanations; they weren’t just left to their own devices. In particular, they were “prompted for further clarification by the experimenter if what they stated was vague” [10].

This brings me back to an often-neglected aspect of the expert programmer theory. When we coined that term, we noticed that although real understanding wasn’t necessary on the listener’s part, a belief that the listener really was an expert seemed to significantly improve the outcome (hence our choice of name).

But why would believing that you were talking to experts make any difference when they didn’t need to understand your explanation? Recent work by Rod Roscoe and Chi showed that prompting questions seems to be the key [11]. In their study, one student (the tutor) explained material to another student (the tutee). As expected, the tutor actually learned more than the tutee, but the questions the tutee asked made a dramatic difference in the quality of the tutor’s explanations. Most questions were shallow, and could be satisfied by mere repetition of facts, but some questions were deep and often prompted deep answers that included novel inferences or self-monitoring statements.

So perhaps this is how expert programmer theory really works: an expert is more likely to ask a deep question, which prompts the novel inference from the stuck programmer. It also seems possible that merely thinking that you’re talking to an expert, or pretending to, will help the stuck programmer produce the sort of deep questions that experts have asked them in the past.

As an explanation for expert programmer theory, this is almost satisfactory, but is student learning a good analogy for what happens to stuck programmers? After all, the students in these experiments had to master basic science, and their explanations helped them work out what they didn’t understand. Stuck programmers must already have all the information somehow hidden in their heads and then realize the answer in a moment of epiphany. How’s that possible?

It’s widely accepted that cognitive abilities are divided into a variety of largely separate mental modules, each dealing with a different ability such as intuitive grasp of small numbers, predicting other people’s actions, facial recognition, and so on. Less well known is the role of the language module in integrating other modules’ knowledge. Experiments by Linda Hermer-Vazquez and her colleagues on integrating knowledge about geometry and color [12] and by Ashley Newton and Jill de Villiers on false-belief reasoning [13] showed that adults perform as poorly as young children when their linguistic abilities are occupied with a verbal shadowing task. The language module seems crucial to combining knowledge from other modules.

This isn’t to say that we integrate the outputs of several mental modules by talking to ourselves. Rather, Peter Carruthers suggested that because speech is uniquely both an input and output brain medium, the language module is the only one with a strong connection to all the other modules [14]. The mechanisms underlying the logical form of language might thus be redeployed at a level beneath conscious awareness to integrate information from other modules. The logical form must be able to represent objects with properties derived from several modules because this is the basis for noun-phrases in speech.

As programmers, we clearly use visual imagination to help design and debug our programs (although the diagrams we use bear little relation to our programs’ texts). This visual information can be spread across several mental modules, and the other information we require to understand our programs can be in yet other modules. For example, it seems that our understanding of object-oriented (OO) programs is supported by the folk psychology module that supplies intuitions about other people’s actions. (We think of objects as having intentions, wanting to do things, and sending each other messages.) We therefore need to integrate information from separate modules when thinking about our programs. Why can’t we always integrate it straightaway?

Carruthers suggested that we must rely on the language module posing the right question and that the other modules don’t usually present information spontaneously. However, when we hear the right question, our brains make the necessary information available, and the language module can then perform rudimentary inference and draw the obvious conclusions. Carruthers suggested that the key is posing a question that’s “both relevant and fruitful” [14]. The right question draws forth the crucial knowledge, and in a moment of epiphany, the answer becomes obvious.

This first mechanism would therefore lead us to predict that programmers who chat about their programs more should be more productive and that those who pose occasional deep questions for each other should be most productive of all.

Mechanism 2: Pair Programmers Notice More Details

Research on change blindness and inattentional blindness illustrates something that stage magicians have known for a long time: if we don’t know what to look for, we can stare right at it and still miss it. What we notice depends on what we expect to see and what we unconsciously consider salient. So, although successful pair programmers will concentrate mostly on the same things, they might notice different things.

Research on change blindness shows that people are remarkably poor at detecting changes, not only in 2D images under laboratory conditions but in real-life situations such as noticing the substitution of one person with another [15]. It appears that people remember something they saw as belonging to a particular mental category, then fail to notice substitution by another member of that category. A large portion of experts’ proficiency is probably in their more detailed and extensive array of mental categories in their particular fields [16]. Research on inattentional blindness has similarly shown that when our attention is focused on a particular task, we can miss something that would otherwise be so obvious that it would just pop out. For example, it might seem unlikely that people would miss a woman in a gorilla suit walking into the shot in a video, but that’s what half the subjects did in a study by Daniel Simons and Christopher Chabris [17]. (They’d been instructed to pay close attention to another aspect of the video.)

So, two people programming together won’t have the same prior knowledge or categorization: one will presumably spot some things faster and the other different things faster. Where their rate of working is limited by the rate they can find things by just looking, two heads must be better than one. And in fact, one of the earliest observations that people make when they start to pair program is that the person who isn’t typing code always picks up typos quicker: “Oh, you’ve left out the comma here.”

Of course, the compiler would pick up such small slips easily, so in this case the early catch isn’t very important. However, it’s crucial to catch problems early when the slip is more subtle: for example, if the code is syntactically correct but semantically wrong, or where there’s a fault in the design itself. Such slips can easily cause hours of problems at a later date. The ability to catch mistakes early in an online code review is only one benefit of two pairs of eyes: perhaps even more important is looking at old code with a fresh eye and a different set of expectations, reading what it really says, not what it ought to say.

This second mechanism also partially explains the phenomenon of pair fatigue, which I’ve noticed in myself and others. When two programmers pair together, the things they notice and fail to notice become more similar. Eventually, the benefit from two pairs of eyes becomes negligible. Beck suggested that pairs should rotate at frequent intervals, perhaps once or twice a day [1]. Arlo Belshee found that in a jelled team, rotating after two hours was optimal [18]. Some pair programmers regard rotation as an optional part of the practice, and on a small team, or with few programmers willing to pair, there might be little alternative. However, pair fatigue means they’ll ultimately be much less productive.

On the other hand, because a great deal of expert knowledge is probably in the form of categories in long-term memory, a novice might be unable to distinguish between events experienced at different times. Experts really can see things that novices can’t. We could therefore predict that this second mechanism will bring the maximum benefit to novice pairs; indeed, the most extensive experiment with novice pairs and experts found that novice pairs benefited the most [19].

Mechanism 3: Fighting Poor Practices

As programmers, we don’t always use the best practices. An advantage of pair programming is said to be pair pressure, the feeling of not wanting to let your partner down [20]. But why is this necessary? Why do we persist in poor programming practices when we know they’re poor? Is there something special about programming that makes it more difficult to do the right thing? It appears that there is.

Let’s look at a particular example of worst practice: the code-and-fix style of programming most often used by novices (and sadly, often used by more experienced programmers). Programmers write some code that they hope will do a particular thing and then run it to see what happens. If it appears to work, they press on with other code, without systematically searching for flaws. When it fails, which is often the case, they tinker with the code haphazardly until it appears to work. Why is this style of programming so compelling and so easy to discover independently?

Traditional behavioral psychology offers a very plausible explanation, although more modern work on the neuroscience of learning and addiction also points in the same direction. One form of learning explored by behavioral psychologists, called operant conditioning, involves learning to perform some action spontaneously. This is the way that animals learn to perform tricks in circus acts or domestic dogs are taught obedience. An animal has a variety of behaviors that it engages in occasionally, and with operant conditioning, we can supply the animal with a reward after we observe it doing what we want (which reinforces the behavior). As this reinforcement process continues, the desired behavior becomes more likely to happen spontaneously, even when no reward is given.

Of course, if the rewards stop entirely, the behavior diminishes and finally ceases, a process known as extinction (which happens quite slowly). If we supply a further reinforcement before the behavior has entirely ceased, we can easily restore it to full strength. In fact, learning happens quickest if the reward pattern is unpredictable, with a so-called variable ratio (VR) schedule of reinforcement. Henry Gleitman and his colleagues explain:

“In a VR schedule, there is no way for the animal to know which of its responses will bring the next reward. Perhaps one response will do the trick, or perhaps it will take a hundred more. This uncertainty helps explain why VR schedules produce such high levels of responding in humans and other creatures. Although this is easily demonstrated in the laboratory, more persuasive evidence comes from any gambling casino. There, slot machines pay off on a VR schedule, with the ‘reinforcement schedule’ adjusted so that the ‘responses’ occur at a very high rate, ensuring that the casino will be lucrative for its owners and not for its patrons.” [21]

This learning is unconscious: we need not realize that it’s happening to us, and in the case of the casino, a machine instead of a real person is conditioning the slot-machine patrons. In our habitual patterns of […], we too can be conditioned by our machines. This is the special property of interactive programming that makes it difficult to do the right thing. With code and fix, we tinker haphazardly with our programs, effectively putting a coin into the slot machine each time we run our code. Slot machines are known as the most addictive form of gambling, and the similarly unpredictable rewards from code-and-fix programming mean that it could be equally addictive.

How can we resist this addiction? Perhaps we can try to “just say no” and choose a different development pattern. Some development processes attempt to remove temptation by being less interactive. Edsger Dijkstra suggested that students shouldn’t be allowed near a computer until they’d learned to write programs away from one [22]. Such ideas might have once had merit, but it seems foolish to turn our backs on the orders-of-magnitude increase in computer power available to us.

Pair programmers might be less susceptible to poor practices because they can promise to write code in a particular way and ensure that each other’s promises are kept. The prevalence of two-person working in jobs where human fallibility is a serious problem should lead us to seriously consider that pair pressure might be the solution for us, too. However, you can only keep a promise if you made one in the first place. We should therefore expect that to benefit from the third mechanism, programmers must agree in advance how they’re going to write and test their code.

Mechanism 4: Sharing and Judging Expertise

Even within a single development team, conventional wisdom says that some programmers are up to 10 times more productive than others [23]. Certainly we see a wide range of expertise, but how confident can we be in saying who contributes most to overall productivity? Assigning credit for success is difficult in team activities because there are so many variables.

In some fields, it’s easy to recognize experts because individual contributions are simple to measure. Chess players have numerical rankings; golfers have handicaps. These are good predictors of their likely success against other players. But in team activities, so many factors contribute to success or failure that we simply can’t understand the causal relationships without a detailed scientific investigation, so we usually select one or two arbitrary factors to simplify the analysis [24]. In software development, “lines of code written per day” often gets elevated above all others, simply because it’s easy to measure. But selecting such arbitrary factors tends to promote “star players” who demonstrate those qualities but don’t significantly contribute to the team’s success.

Unfortunately, more detailed scientific analysis is seldom practical. So how can we assign individual credit (or blame) for team performance? Paul Graham said that when an expert programmer works alongside another programmer on the same problem, the expert can judge the other programmer’s skill. But that’s the only way: he or she can’t tell just by meeting them. “I can’t tell, even now,” he wrote. “You also can’t tell from their résumés.” [25]

This is my experience, too: it isn’t enough to talk with someone about programming; you have to work on a problem with them to gauge their expertise. A weak version of this technique is standard practice in programming interviews. After the preliminary discussion centered on the applicant’s résumé, the interview proceeds to a series of successively more difficult programming exercises that the applicant has to talk through at a whiteboard. I’m frequently surprised by how a very plausible-sounding candidate, when challenged in this way, completely fails to produce even the most basic evidence of the knowledge that he or she earlier claimed.

Sadly, these poor candidates seem blissfully unaware of their own lack of expertise. They’re so bad that they don’t realize how bad they are, probably because, in the words of Justin Kruger and David Dunning, “the same knowledge that underlies the ability to produce correct judgments is also the knowledge that underlies the ability to recognize correct judgment” [26]. In a field where expertise is hard to measure, this is a serious problem, because as Kruger and Dunning observed, the less competent are often more confident of their own ability than their more expert peers.

The most competent, on the other hand, suffer from the opposite problem, the false consensus effect, in which they believe that their own abilities are typical. This happens for the same reason: it’s hard to accurately assess others’ competence, so the most competent have no reason to believe that they’re extraordinary, unless they work closely with another programmer on the same problem.

Most programmers work on problems on their own, so no one knows how good (or bad) they really are. But with pair programming, people continually work together. Because they keep swapping pairs, everyone on the team learns who’s the most expert at particular things. From this comparison, they also realize their own level of expertise. We should therefore expect more accurate estimates of time and difficulty by a pair programming team than from a solo programming team. From my experience, this does appear to be the case.

We’re no longer in the first flush of pair programming, yet the gulf between enthusiasts and critics seems as wide as ever. Experimental evidence has been equivocal. How can we advance our understanding? I believe the mechanisms I describe here are among the fundamental properties shared by all instances of successful pair programming, but other mechanisms are important, too: for example, team jelling appears to have a significant effect. What other mechanisms are significant? Although I believe that pairing works the same in agile and non-agile settings, this has yet to be established. In addition, there might be anti-mechanisms: poor practices that lead to unsuccessful pair programming and that aren’t merely the absence of the beneficial mechanisms. To investigate all of these, we could solicit suggestions from working developers; ethnographic researchers could reexamine their records for evidence for or against the mechanisms.

We also need some form of objective checklist to compare results across experiments, so that experimenters can agree how much a particular programming team uses a particular mechanism. With such a checklist, we could then reexamine the experiments that Dybå described [7] and attempt to establish the extent to which the teams used certain mechanisms. However, such post hoc analysis could still give equivocal results. To clearly establish the mechanisms’ impact, we must design new experiments that properly control for them.

Perhaps you have some other questions (or even some answers) in mind right now. If so, I invite you to share your comments on the Web site (http://computingnow.computer.org/wray). In any case, I hope that thinking about these mechanisms will help you apply pair programming more effectively.

About the Author

Stuart Wray is a senior lecturer at the Royal School of Signals. His research interests include the psychology of programming, […], and functional programming. Wray has a PhD in computer science from the University of Cambridge. Since then, he’s worked in research at the Olivetti Research Laboratory and the University of Cambridge Computer Laboratory, and in product development at Virata, Marconi, and BAE Systems. Contact him at [email protected].

References

1. K. Beck, Extreme Programming Explained: Embrace Change, 1st ed., Addison-Wesley, 2000.
2. L. Williams and R. Kessler, Pair Programming Illuminated, Addison-Wesley, 2003.
3. J. Chong and T. Hurlbutt, “The Social Dynamics of Pair Programming,” Proc. 29th Int’l Conf. Software Eng. (ICSE 07), IEEE CS Press, 2007, pp. 354–363.
4. S. Bryant, P. Romero, and B. du Boulay, “Pair Programming and the Mysterious Role of the Navigator,” Int’l J. Human-Computer Studies, vol. 66, no. 7, 2008, pp. 519–529.
5. S. Salinger, L. Plonka, and L. Prechelt, “A Coding Scheme Development Methodology Using Grounded Theory for Qualitative Analysis of Pair Programming,” Proc. 19th Ann. Workshop Psychology of Programming Interest Group (PPIG 07), Psychology of Programming Interest Group, 2007, pp. 144–157; www.ppig.org/papers/19th-Salinger.pdf.
6. M. Stephens and D. Rosenberg, Extreme Programming Refactored: The Case against XP, Apress, 2003.
7. T. Dybå et al., “Are Two Heads Better than One? On the Effectiveness of Pair Programming,” IEEE Software, vol. 24, no. 6, 2007, pp. 12–15.
8. B. Kernighan and R. Pike, The Practice of Programming, Addison-Wesley, 1999.
9. J. Sturdy, “Sidebrain: A Sidekick for the Programmer’s Brain,” Proc. 17th Ann. Workshop Psychology of Programming Interest Group (PPIG 05), Psychology of Programming Interest Group, 2005, pp. 215–226; www.ppig.org/papers/17th-sturdy.pdf.
10. M. Chi et al., “Eliciting Self-Explanations Improves Understanding,” Cognitive Science, vol. 18, no. 3, 1994, pp. 439–477.
11. R. Roscoe and M. Chi, “The Influence of the Tutee in Learning by Peer Tutoring,” presented at 26th Ann. Conf. Cognitive Science Soc., 2004; www.cogsci.northwestern.edu/cogsci2004/papers/paper278.pdf.
12. L. Hermer-Vazquez, E.S. Spelke, and A.S. Katsnelson, “Sources of Flexibility in Human Cognition: Dual-Task Studies of Space and Language,” Cognitive Psychology, vol. 39, no. 1, 1999, pp. 3–36.
13. A. Newton and J. de Villiers, “Thinking While Talking: Adults Fail False-Belief Reasoning,” Psychological Science, vol. 18, no. 7, 2007, pp. 574–579.
14. P. Carruthers, “The Cognitive Functions of Language,” Behavioral and Brain Sciences, vol. 25, no. 6, 2002, pp. 657–726.
15. D. Simons and D. Levin, “Failure to Detect Changes to People During Real-World Interaction,” Psychonomic Bull. and Rev., vol. 5, no. 4, 1998, pp. 644–649.
16. N. Charness et al., The Cambridge Handbook of Expertise and Expert Performance, Cambridge Univ. Press, 2006.
17. D.J. Simons and C.F. Chabris, “Gorillas in Our Midst: Sustained Inattentional Blindness for Dynamic Events,” Perception, vol. 28, no. 9, 1999, pp. 1059–1074.
18. A. Belshee, “Promiscuous Pairing and Beginner’s Mind: Embrace Inexperience,” Proc. Agile Development Conf. (AGILE 05), IEEE CS Press, 2005, pp. 125–131.
19. E. Arisholm et al., “Evaluating Pair Programming with Respect to System Complexity and Programmer Expertise,” IEEE Trans. Software Eng., vol. 33, no. 2, 2007, pp. 65–86.
20. L. Williams and R. Kessler, “The Effects of ‘Pair-Pressure’ and ‘Pair-Learning’ on Software Engineering Education,” Proc. 13th Ann. Conf. Software Eng. Education and Training (CSEE&T 00), IEEE CS Press, 2000, pp. 59–65.
21. H. Gleitman, A.J. Fridlund, and D. Reisberg, Psychology, 6th ed., W.W. Norton & Co., 2004.
22. E. Dijkstra, “On the Cruelty of Really Teaching Computing Science,” 1988; www.cs.utexas.edu/users/EWD/ewd10xx/EWD1036.PDF.
23. R. Glass, Facts and Fallacies of Software Engineering, Addison-Wesley, 2003.
24. M. Gladwell, “Game Theory,” The New Yorker, 29 May 2006; www.newyorker.com/archive/2006/05/29/060529crbo_books1.
25. P. Graham, “Great Hackers,” 2004; www.paulgraham.com/gh.html.
26. J. Kruger and D. Dunning, “Unskilled and Unaware of It: How Difficulties in Recognizing One’s Own Incompetence Lead to Inflated Self-Assessments,” J. Personality and Social Psychology, vol. 77, no. 6, 1999, pp. 1121–1134.
