Meaningful Method Names
Doctoral dissertation by Einar W. Høst

Submitted to the Faculty of Mathematics and Natural Sciences at the University of Oslo in partial fulfillment of the requirements for the degree Philosophiae Doctor in Computer Science

November 2010

© Einar W. Høst, 2011

Series of dissertations submitted to the Faculty of Mathematics and Natural Sciences, University of Oslo, No. 1044
ISSN 1501-7710

All rights reserved. No part of this publication may be reproduced or transmitted, in any form or by any means, without permission.

Cover: Inger Sandved Anfinsen. Printed in Norway: AIT Oslo AS. Produced in co-operation with Unipub. The thesis is produced by Unipub merely in connection with the thesis defence. Kindly direct all inquiries regarding the thesis to the copyright holder or the unit which grants the doctorate.

Abstract

We build computer programs by creating named abstractions, aggregations of behaviour that can be invoked by referring to the name alone. Abstractions can be nested, meaning we can construct new, more powerful abstractions that use more primitive abstractions. Thus we can start from tiny blocks of behaviour and build arbitrarily complex systems. For this to work, however, the abstractions must be sound; in other words, the names must suit the behaviour they represent. Otherwise our tower of abstractions will collapse. Hence we see the crucial importance of naming in programming.

Despite this importance, programmers almost completely lack tools to assist them. The computer treats names as arbitrary, allowing for sloppy and inconsistent naming. Tool support for good naming would be beneficial for many reasons. Most obviously, it would help create programs that are easier to understand, and hence easier to maintain. A secondary, but equally important, effect is that good naming and good design go together. In other words, good naming strengthens the tower of abstractions.
In this thesis, we show that the method names used in Java programs are far from arbitrary. They are meaningful in a sense that relates to the behaviour they represent. By analysing the implementation of methods in real-world Java programs, we can approximate the meaning of names and gain a deeper understanding of key aspects of naming in Java. For instance, we show that it is feasible to create a tool to discover naming bugs in Java programs, that is, methods that have been improperly named. Our analyses are completely mechanical, meaning that they require no human supervision.

Acknowledgements

First of all, I would like to thank my main supervisor, Bjarte M. Østvold, for providing motivation, support, inspiring discussions and never-faltering faith in the research. You are an excellent supervisor; it has been invaluable to me that you always kept your door open, always found time and energy to listen or contribute ideas. Working with you has been both educational and great fun.

I would also like to thank my co-supervisor Gerardo Schneider for kind assistance and cooperation in all practical matters, as well as valuable proofreading and comments.

The main part of the work presented in this thesis was done while I was employed as a PhD fellow at Norsk Regnesentral. I would like to thank the head of the DART department, Åsmund Skomedal, for having enough faith in me to hire me. I also appreciate the kind faces of the rest of the DART employees.

Thank you to Professor Barbara G. Ryder for inviting me to Rutgers during my PhD fellowship, a trip that greatly expanded my horizon and taught me some valuable lessons. I would also like to thank Jan Wloka for many interesting discussions, both professional and personal, over coffee ranging from the excellent to the abysmal. I learned much from you.

My work at Norsk Regnesentral was supported by a grant from the Research Council of Norway through the RSE-SIP project.
I am grateful to the staff at the Department of Informatics at the University of Oslo for extending my PhD contract so that I have been able to complete my work. I would also like to thank my current employer, Computas, for flexibility and support during the final phase of my work.

Thank you mum and dad for your endless support and understanding. You have taught me the value of knowledge and learning, as well as the joy in working to accomplish something. I am proud and grateful to have been raised in that tradition.

Finally, my deepest thanks to my wonderful family (my ever-optimistic and positive wife Line and my two amazing children Astrid and Sigurd) for filling my life with light, laughter and love. You make every day meaningful and valuable. Thank you.

Contents

I Overview

1 Introduction
  1.1 Research Goals
  1.2 Summary of contributions
2 Research method
  2.1 Research on programming
  2.2 Narrative and relevance: Influencing programmers
  2.3 Method: Empirical studies
  2.4 The research method of this thesis
    2.4.1 Informational phase: Informal meaning
    2.4.2 Propositional phase: Abstract semantics
    2.4.3 Analytical phase: Answering questions
    2.4.4 Evaluational phase: Hypothesis testing
3 Problem analysis
  3.1 A pragmatic theory of meaning
  3.2 Informal meaning in programs
  3.3 Interpretation of meaning
  3.4 Ambitions
    3.4.1 Goal G1: Name patterns
    3.4.2 Goal G2: Usage semantics
    3.4.3 Goal G3: Understanding naming
    3.4.4 Prerequisite: Representative corpus
4 State of the art
  4.1 Exploring programmer language
  4.2 Finding meaningful artefacts in programs
    4.2.1 Finding patterns
    4.2.2 Finding clones
    4.2.3 Finding examples
  4.3 Relating names to meaningful artefacts
5 Contribution
  5.1 Research goals
    5.1.1 Goal G1: Name patterns
    5.1.2 Goal G2: Usage semantics
    5.1.3 Goal G3: Understanding naming
    5.1.4 Prerequisite: Representative corpus
  5.2 Critique
    5.2.1 Limitations of the usage semantics model
    5.2.2 Limitations of the corpus
  5.3 Conclusion
Bibliography

II Research papers

6 Overview of Research Papers
7 Paper 1: The Programmer’s Lexicon
  7.1 Introduction
  7.2 Definitions
    7.2.1 Preliminaries
    7.2.2 Distribution and entropy
    7.2.3 The Usage Semantics of Names
  7.3 Approach to Name Analysis
    7.3.1 Restricting the Set of Names
    7.3.2 Describing Names
    7.3.3 Measuring the Precision of Names
    7.3.4 Comparing and Relating Names
  7.4 The Attribute Catalogue
    7.4.1 Critique of the Catalogue
  7.5 The Corpus of Java Programs
  7.6 Experimental Results
    7.6.1 Exploring Nuances with a Larger Lexicon
  7.7 Related Work
  7.8 Conclusion
  7.A The Lexicon
8 Paper 2: The Java Programmer’s Phrase Book
  8.1 Introduction
  8.2 Conceptual Overview
    8.2.1 Programmer English
    8.2.2 Requirements for The Phrase Book
    8.2.3 Approach
    8.2.4 Definitions
  8.3 Method Analysis
    8.3.1 Syntactic Analysis of Method Names
    8.3.2 Semantic Analysis of Method Implementations
    8.3.3 Phrase Semantics
    8.3.4 Method Delegation
  8.4 Engineering the phrase book
    8.4.1 Meeting the Requirements
    8.4.2 Generation Algorithm
  8.5 Results
  8.6 Related Work
  8.7 Conclusion
9 Paper 3: Debugging Method Names
  9.1 Introduction
  9.2 Motivation
    9.2.1 The Java Language Game
  9.3 Analysis of Methods
    9.3.1 Definitions
    9.3.2 Analysing Method Names
    9.3.3 Analysing Method Semantics
    9.3.4 Deriving Phrase-Specific Implementation Rules
    9.3.5 Finding Naming Bugs
    9.3.6 Fixing Naming Bugs
  9.4 The Corpus
  9.5 Results
    9.5.1 Name Debugging in Practice
    9.5.2 Notable Naming Bugs
    9.5.3 Naming Bug Statistics
    9.5.4 Threats to Validity
  9.6 Related Work