Stimulus Structures and Mental Representations in Expert Comprehension of Computer Programs

By: Nancy Pennington

Presented by: Michael John Decker

Author & Journal

• Nancy Pennington - No internet presence

• Journal: Cognitive Psychology, vol. 19, pp. 295-341 (1987)

• 2013 JIF: 3.571

• Paper Citations: 525

Research Question

• What mental model best describes how an expert programmer builds up knowledge of a program?

• When studying the code?

• When performing maintenance activities?

Motivation

• Studying program comprehension means studying the role of particular kinds of knowledge in cognitive skill domains

• An estimated 50% of professional programmers' time is spent on program maintenance

• Comprehension is a significant part of maintenance

• A better understanding of how knowledge is obtained and applied can then be used to raise productivity and lower maintenance costs

Text Abstractions

• Goal hierarchy - functional achievements of program

• Data flow - transformations applied to data objects

• Control flow - sequences of program actions and the passage of control between them

• Conditionalized actions - representation specifying states of the program and the actions they invoke

Programming Knowledge
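Before turning to the two kinds of programming knowledge, the four text abstractions listed above can be made concrete on a tiny program. The sketch below is illustrative only: the function `mean_of_valid` and the `ABSTRACTIONS` annotations are hypothetical and hand-written for this summary, not taken from the paper's materials.

```python
def mean_of_valid(readings, limit):
    """Return the mean of the readings that do not exceed `limit` (0.0 if none)."""
    total = 0.0                 # data flow: `total` is created here
    count = 0
    for value in readings:      # control flow: one pass over the readings
        if value <= limit:      # conditionalized action: state triggers an action
            total += value      # data flow: value feeds total
            count += 1          # data flow: count tracks accepted readings
    if count == 0:              # conditionalized action: no valid data, return 0.0
        return 0.0
    return total / count        # goal: report the mean of the valid readings


# Hand-written annotations of the same function under each abstraction
# (an assumption made for illustration, not the paper's coding scheme).
ABSTRACTIONS = {
    "goal hierarchy": ["report mean of valid readings",
                       ["filter readings by limit", "accumulate sum and count", "divide"]],
    "data flow": ["readings -> value -> total",
                  "accepted readings -> count",
                  "total, count -> result"],
    "control flow": ["initialize", "loop over readings", "test value",
                     "test count", "return"],
    "conditionalized actions": [("value <= limit", "add value to total, increment count"),
                                ("count == 0", "return 0.0")],
}

if __name__ == "__main__":
    print(mean_of_valid([1.0, 2.0, 100.0], limit=10.0))
    for name, view in ABSTRACTIONS.items():
        print(name, "->", view)
```

Each entry describes the same code from a different angle, which is why a comprehension question can target one abstraction without testing the others.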

• Text Structure Knowledge

• Plan Knowledge

Text Structure Knowledge

• Decomposition of a program’s text into text structure units

• Segmentation of statements into phrase-like groupings that are combined into higher-order groupings

• Syntactic markings provide clues to the boundaries of segments

• Segmentation reflects the control structure of the program

• Relation to abstractions:

• control flow information - easy to obtain

• data flow and goal hierarchy - difficult to obtain when the information spans units

• conditionalized actions - difficult to obtain

Plan Knowledge

• Patterns of program instructions that combine to accomplish some function

• e.g. bubble-sort, swap

• Recognition of patterns implementing known plans

• Plans are activated by partial

• The resulting representation reflects the data flow structure of the program, indexed by program functions

• Relation to abstractions:

• data flow and goal-hierarchy - easy to obtain

• control flow and conditionalized actions - difficult to obtain

Study One: Details
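As background for the study materials, the two knowledge types described above can be contrasted on a small example. The sketch below is a rough illustration under stated assumptions: `text_structure_units` uses indentation as a crude stand-in for syntactic markings, and the `PLANS` labels are assigned by hand. Neither is the paper's procedure, and the bubble sort is not one of its program segments.

```python
# A bubble sort with an early-exit flag, stored line by line.
LINES = [
    "n = len(items)",                       # 0
    "for i in range(n - 1):",               # 1
    "    swapped = False",                  # 2
    "    for j in range(n - 1 - i):",       # 3
    "        if items[j] > items[j + 1]:",  # 4
    "            tmp = items[j]",           # 5
    "            items[j] = items[j + 1]",  # 6
    "            items[j + 1] = tmp",       # 7
    "            swapped = True",           # 8
    "    if not swapped:",                  # 9
    "        break",                        # 10
]


def text_structure_units(lines):
    """Group consecutive lines with equal indentation (a crude syntactic cue)."""
    units, current, prev_indent = [], [], None
    for idx, line in enumerate(lines):
        indent = len(line) - len(line.lstrip(" "))
        if current and indent != prev_indent:
            units.append(current)
            current = []
        current.append(idx)
        prev_indent = indent
    if current:
        units.append(current)
    return units


# Hand-labeled plan units (hypothetical): lines that jointly implement one plan.
PLANS = {
    "swap": [5, 6, 7],
    "early-exit flag": [2, 8, 9, 10],
}


def shares_unit(units, a, b):
    """True if line numbers a and b fall inside the same unit."""
    return any(a in unit and b in unit for unit in units)


if __name__ == "__main__":
    ts_units = text_structure_units(LINES)
    print("text structure units:", ts_units)
    # Lines 7 and 8 are textual neighbors but belong to different plans.
    print(shares_unit(ts_units, 7, 8), shares_unit(PLANS.values(), 7, 8))
    # Lines 2 and 8 are textually apart but belong to the same plan.
    print(shares_unit(ts_units, 2, 8), shares_unit(PLANS.values(), 2, 8))
```

The point of the contrast is that a plan can cut across segment boundaries: the flag is set, updated, and tested in three different text structure units, while the last swap statement and the flag update sit side by side in one unit.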

• 80 Expert Programmers (40 COBOL, 40 FORTRAN)

• 8 program segments: 15 lines each, comprehensible, with differing Text Structure and Plan units

• 6 comprehension questions per segment that tested knowledge related to specific abstractions

• 22 item recognition test list (4 targets per segment separated into two sets)

• constructed from triples, each consisting of a target item, a Text Structure prime, and a Plan prime

• Responses and response times were collected

Study One: Design

• 2 (language) x 4 (orders) x 2 (subject groups within language) x 2 (prime types) x 2 (sets of target items)

• Language, order of presentation, and subject group are between-participant factors; subject group, prime type, and target set form a 2 x 2 x 2 Latin square

Study One Results: Recognition

[Figure: TS vs. Plan primes for COBOL and FORTRAN, by target set (1, 2) and subject group (1, 2); axis ticks 2.1 to 3]

• Response times were consistently quicker for Text Structure units than for Plan units

• Accuracy was not significantly different

Study One Results: Comprehension

[Figure: question categories Operations, Control Flow, Data Flow, State, Function for FORTRAN and COBOL; axis ticks 10% to 50%]

• Error rates were lowest for program operations and control flow (Text Structure Knowledge)

• FORTRAN programmers did better on control flow and COBOL programmers did better on data flow

Study Two: Details

• 40 Expert Programmers (20 COBOL, 20 FORTRAN), the best and worst performers from the previous study

• A 200-line program that computes specifications for industrial plant designs

• Part I: 45 min. spent studying program followed by summary and comprehension questions

• Part II: 30 min. modification activity followed by summary and comprehension questions

• 40 Questions (10 Control Flow, 10 Data Flow, 10 goal hierarchy, 10 conditionalized actions)

• Divided into two matching sets of 20 (questions correspond between sets)

Study Two: Design

• 2 (language) x 2 (previous performance) x 2 (talk vs. no talk) x 2 (comprehension tests) x 4 (question categories)

• Language, previous performance, and talk/no talk are between-participant factors; comprehension test and question category are within-participant factors

Study Two Part I Results: Comprehension

[Figure: "Information after Study Session"; question categories Control Flow, Data Flow, State, Function; axis ticks 0% to 60%]

• Results comparable to the first study

• Control flow has the lowest error rate, indicating a Text Structure model

Study Two Part I Results: Summaries

• Classified by type: procedural, data flow, function statements

• 57% procedural, 30% data flow, 13% function

• Indicates Text Structure Knowledge

Study Two Part II Results: Comprehension

[Figure: two panels, "Information after Modification Session, No Talk" and "Information after Modification Session, Talk"; question categories Control Flow, Data Flow, State, Function; axis ticks 0% to 60%]

• Data flow and function have the lowest error rates, indicating Plan Knowledge

• Participants who talked aloud showed a larger disparity and a stronger trend toward Plan Knowledge

Conclusions

• Text Structure Knowledge: plays the initial organizing role in memory for programs (Study One, Study Two Part I)

• Plan Knowledge: comes into play in later stages of program comprehension under appropriate task conditions

Future Work

• Find evidence for why/how shift between Text Structure Knowledge and Plan Knowledge occurs

• The author speculates about a situation model

• Evidence in these studies comes from post-task questions and analysis; more recent technology (eye-tracking, EEG) could possibly be used to measure and identify how the mental model is built

• Look further at the effect of language on the mental model

Ending Remarks

• Overall, well-thought-out studies that controlled for a number of variables (e.g. language)

• The finding that the initial model differs from the later model indicates a task-related shift, which sets the stage for later research

• The second study lacked data about summaries after the modification task and a comparison

• A 200-line program is no longer of moderate length (if it ever was) and is still easy to understand, especially in the time allotted. How would this apply to a moderate-length project now?

References

• [Pennington'87] Pennington, N., (1987), "Stimulus Structures and Mental Representations in Expert Comprehension of Computer Programs", Cognitive Psychology, vol. 19, pp. 295-341.