Guidelines for Statistical Projects: Coding and Typography

0001 0002 Guidelines for Statistical Projects: Coding and 0003 0004 Typography 0005 0006 1 2 0007 Marius Hofert , Ulf Schepsmeier 0008 0009 0010 2014-10-01 0011 0012 0013 0014 0015 Abstract 0016 A 0017 Guidelines for conducting, implementing (in LTEX and R) and documenting statistical (re- 0018 search) projects are provided in order to improve readability and reduce the error rates of 0019 theses, scientific papers, reports and especially code. This is meant to save supervisors, package 0020 maintainers, students and practitioners a lot of time. It is clear, however, that such guidelines 0021 cannot be exhaustive. The given recommendations should therefore rather serve as a starting 0022 0023 point for improving your workflow and to avoid common pitfalls in statistical projects of larger 0024 scale. 0025 0026 0027 Keywords 0028 Coding, typography, LATEX, R. 0029 0030 MSC2010 0031 0032 68U15, 68U20, 68U05, 97R60. 0033 0034 0035 Contents 0036 0037 0038 0039 1 Introduction 2 0040 0041 2 General suggestions 3 0042 2.1 Forget about the Pareto principle (80–20 rule) . .3 0043 0044 2.2 When solving a particular problem for the first time, spend time on it . .4 0045 2.3 English in mathematics . .4 0046 2.4 Be consistent . .5 0047 0048 2.5 Be concise . .5 0049 2.6 Be structured . .6 0050 0051 2.7 Be self-contained . .7 0052 2.8 Be reproducible . .7 0053 2.9 Optimize communication, meetings and preparation . .8 0054 0055 0056 3 Editors and integrated development environments 9 0057 0058 0059 0060 0061 0062 1Department of Statistics and Actuarial Science, University of Waterloo, 200 University Avenue West, Waterloo, 0063 ON, Canada N2L 3G1, [email protected] 0064 2Department of Mathematics, Technische Universität München, 85748 Garching, Germany, ulf.schepsmeier@ 0065 tum.de 0066 1 © Marius Hofert, Ulf Schepsmeier 1 Introduction 0067 4 LAT X 10 0068 E 0069 4.1 Getting started . 10 0070 4.2 Typographic recommendations for mathematical documents . 10 0071 4.3 Technical tricks to improve typography . 12 0072 0073 4.3.1 Citations . 12 0074 4.3.2 Spaces and alignment . 12 0075 0076 4.3.3 Figures . 13 0077 4.3.4 Miscellaneous . 14 0078 0079 5 R 14 0080 0081 5.1 Getting started . 14 0082 5.2 Documentation . 15 0083 5.2.1 Citing R and R packages . 15 0084 0085 5.2.2 Run time information . 16 0086 5.2.3 Code documentation . 17 0087 0088 5.3 Programming style . 18 0089 5.3.1 Writing correct code . 18 0090 5.3.2 Writing readable code . 18 0091 0092 5.3.3 Writing safe, fast, flexible and sophisticated functions . 20 0093 5.3.4 Learn from others, learn from the masters . 22 0094 5.3.5 Test your code . 23 0095 0096 5.3.6 Specific hints . 23 0097 5.4 Tables and graphics . 23 0098 0099 0100 6 Version control 25 0101 6.1 Dropbox . 25 0102 6.2 SVN . 25 0103 0104 6.2.1 Checkout . 26 0105 6.2.2 Add and (re)move . 26 0106 6.2.3 Update, commit . 27 0107 0108 6.2.4 Log, status, list, diff . 27 0109 6.2.5 Conflicts . 28 0110 6.3 Git . 28 0111 0112 0113 7 Submitting a paper 28 0114 7.1 Purpose of journals: How to find the best fitting journal for my research . 29 0115 0116 7.2 Preparations before submission . 30 0117 7.3 Submitting a paper . 31 0118 0119 References 32 0120 0121 0122 0123 1 Introduction 0124 0125 These guidelines are meant for students, professors and practitioners who would like to write, 0126 0127 participate in, or supervise a project such as a bachelor, master, Ph.D. thesis, or scientific 0128 paper in the intersection of mathematics, statistics and computer science. Besides some general 0129 recommendations, we focus on the software tools LAT X and R for conducting, implementing and 0130 E 0131 documenting the project. 0132 2 © Marius Hofert, Ulf Schepsmeier 2 General suggestions 0133 Before going into detail, some remarks are in order: 0134 0135 The science of coding Coding (in LATEX, R etc.) is like a handwriting. From the corresponding 0136 files and style of coding one can read a lot. Writing correct, readable, well-documented and 0137 0138 easy-to-maintain code is a science on its own and one that is not taught explicitly at university 0139 level unless one specifically studies computer science (but this course of studies rarely addresses 0140 LAT X and R). However, with the ever increasing complexity of statistical simulations and 0141 E 0142 projects, it becomes important to have coding guidelines – otherwise it might be difficult for 0143 others to understand what you actually want to “say” or do with your code. Besides being 0144 0145 correct, your code should be easily readable and extendable by others. Larger projects involve 0146 more and more contributors, each of which should be able to easily follow your code and adjust 0147 it if required, hence some guidelines are in order. 0148 0149 Motivation These guidelines are motivated from our own work with students and practitioners. 0150 After pointing out improvements, making code more readable, correcting common mistakes and 0151 0152 improving documents again and again, we hope that these guidelines help all parties involved 0153 in a project to avoid (what we believe are) common pitfalls and to save time. 0154 0155 Focus The guidelines reflect our personal recommendations and experience using LATEX and 0156 R (and related tools mentioned below) in our personal areas of research (which lies in the 0157 0158 intersection of mathematics, statistics and computer sciences). It is clear that such a guide 0159 cannot be exhaustive. In particular, this is not an introduction to the topics presented! If you 0160 feel that we missed an important aspect not easily found or addressed in other guidelines or 0161 0162 tutorials, or if you can improve this document, please let us know. 0163 Goal Our goal with these guidelines is not to make a document or code snippet 100% perfect. 0164 0165 There are exceptions to almost any rule and describing all of them would extend the page 0166 count of this document well beyond what you would be willing to read; the left-out exceptions 0167 0168 are the 10–20% not discussed here. Furthermore, some aspects are discussed in more detail 0169 than others (which is motivated by our work/judgement). Aspects addressing more advanced 0170 users are marked with an A . 0171 0172 Disclaimer This document does not exist to torture you(r workflow) to use a specific kind of 0173 operating system, editor, software, etc. It should rather point out how we tackle certain 0174 0175 problems (and partly, but not always (!), why we solve them like this). You may or may not 0176 find this helpful, the principle do not like it? do not use it! applies. 0177 0178 We will constantly update this document. Therefore, there will never be a version which can 0179 be considered final. 0180 0181 The guidelines are organized as follows. In Section 2, we give general suggestions for written or 0182 coding intensive projects. Section 3 briefly addresses the importance and choice of text editors. 0183 A 0184 Section 4 and Section 5 point out recommendations when working with LTEX and R, respectively. 0185 0186 0187 2 General suggestions 0188 0189 2.1 Forget about the Pareto principle (80–20 rule) 0190 0191 Definition The Pareto principle (or 80–20 rule) says that for many events, 80% of the final 0192 0193 outcome/result/effect is achieved by 20% of the input/causes. 0194 0195 Meaning Essentially, this means that one should stop after having spent 20% of the time/effort 0196 one could spend on the project, since all additional effort would just improve the outcome by 0197 the remaining 20%. 0198 3 © Marius Hofert, Ulf Schepsmeier 2 General suggestions 0199 Why not? The Pareto principle is frequently used in many areas. However, it does not apply 0200 0201 to scientific work. If you write a research paper, for example, it will come back for revision 0202 at some time. You certainly do not want to realize a year later, that you now actually have 0203 to start over with the whole work (instead of doing just a revision). Furthermore, if a referee 0204 0205 feels that you only spent 20% of the effort on the submitted work, this most likely results in a 0206 rejection. Do your homework, work hard and exclusively on the topic and you will get a result 0207 0208 you can be happy with. Also, your supervisor is happy to learn about what comes (far) after 0209 the 80% (in contrast to hearing about well-known results). Keep that in mind at any stage. 0210 0211 0212 2.2 When solving a particular problem for the first time, spend time on it 0213 0214 In programming, there is this basic (unwritten) law (maybe applying as well to research in 0215 general): 0216 0217 1) If you have a problem, search for it. 0218 The chance that you are the first one working on this problem is small. Others may have 0219 0220 already solved the problem (in an elegant, optimal, fast and readable way). 0221 2) If you cannot find a solution, search more, search differently, search longer – but search for it! 0222 0223 3) If you still cannot find it, go back to 2). 0224 0225 4) If you are sure there exists no (good) solution, write your own.

Load more