How do you know your spreadsheet is right?
Principles, Techniques and Practice of Spreadsheet Style
Philip L. Bewig
July 28, 2005

Copyright © 2005 by Philip L. Bewig of Saint Louis, Missouri, USA. All rights reserved. This work is available under the Creative Commons Attribution-NonCommercial-NoDerivs License. For information regarding this license, visit http://creativecommons.org/licenses/by-nc-nd/2.0/ or write to Creative Commons, 559 Nathan Abbott Way, Stanford, California 94305, USA. Excel is a registered trademark of Microsoft Corporation.

You know it’s true: Spreadsheets have errors like dogs have fleas.[1] It is generally accepted[2] that nine out of every ten spreadsheets suffer some error, and consequences can be severe:[3]

• A cut-and-paste error cost TransAlta $24 million when it underbid an electricity-supply contract.[4]
• A missing minus sign caused Fidelity’s Magellan Fund to overstate projected earnings by $2.6 billion (yes, billion) and miss a promised dividend.[5]
• Falsely linked spreadsheets permitted fraud totaling $700 million at the Allied Irish Bank.[6]
• Voting officials reported spreadsheet irregularities in New Mexico[7] and South Africa.[8]
• A new drug introduction was delayed several months by an untested macro, costing the pharmaceutical company profits and its patients misery.[9]

You can’t eliminate errors from the spreadsheets you develop, but you can reduce their number. The principles and techniques described below, applied consistently, will improve the quality of your spreadsheets. The discussion assumes Excel, but the principles and techniques apply everywhere. The spreadsheet shown below will be used as a practical example:

[Figure: sample spreadsheet]

Think before you write. Resist the urge to jump right into actual development. Start with a clear understanding of the requirements of your task. Sketch your design on a whiteboard, and look for flaws. Consider alternative software tools such as databases,[10] statistics packages,[11] financial modeling systems,[12,13] business intelligence systems,[14,15] mathematical programming languages,[16,17,18] and traditional computer programming languages. This is the most fundamental level of your work, and the most creative moment in the entire existence of your spreadsheet. An error here can be hard to fix, requiring massive rearrangements of the spreadsheet structure or new inputs from new sources.

Know the players. The reader sees the printed output, and uses it to make a decision; he relies on you to organize and present the data he needs, as he needs it. The user inputs data, operates macros, and prints output, but doesn’t modify anything; he relies on you to provide adequate instructions. You, the developer, design and implement the structure and all the formulas in the spreadsheet. The auditor checks the work of the developer; he relies on you to produce a clean design and good documentation. The sponsor assigns the task, provides resources, and has overall responsibility for the spreadsheet; he relies on you to meet his specifications. In many cases some of these roles overlap; keep them all in mind as you develop your spreadsheet.

Make your spreadsheet as simple as possible, but no simpler.[19] Most spreadsheets work well enough with a few SUMs and IFs, and using functions like SUMPRODUCT or features like array formulas, or writing your own macros and functions, can make a spreadsheet harder to read and understand than it should be. On the other hand, don’t “dumb down” your spreadsheet; feel free to hide complex logic in user-defined functions, and if some advanced feature will simplify your task, use it. Be wary of features just added to or changed in the newest version of Excel, where bugs are likeliest to lurk; for instance, the RAND function has changed with each version of Excel, and statisticians claim[20,21] it’s still not right.

Plan to throw one away; you will, anyway.[22] Prototypes are useful for spreadsheet developers for the same reason that scale models are useful for architects; they help you visualize what you are building. They can help you add meaning to ill-defined specifications, demonstrate a partial solution, or work out tricky formulas. Frequently, prototypes grow into a solution; sometimes, the prototype is the solution, and the whole problem need never be solved.

Design for change. Few spreadsheets are stillborn; most evolve and grow through countless versions, then may be copied with a new name the next month when the process starts anew. Brooks says: “All successful software gets changed.”[23] The best place to plan for change is during the initial design of the spreadsheet. Reflection, described below, is a useful tool for implementing that plan. Building a change-tolerant spreadsheet isn’t much harder than building the other kind, but it is so much better for the poor fellow who has to modify your work; you’ll especially appreciate the initial effort if you are, yourself, that poor fellow.

Keep input, logic, and reports in separate sections of a spreadsheet, preferably on different tabs.[24] That way you can always see the assumptions neatly in a single place, formulas are less likely to be overwritten, you know where to go to make changes, and output can be formatted for the reader while logic can be laid out for the developer. If you can’t put the three sections on separate tabs, arrange them on a single tab in a stepped diagonal so that rows and columns can be inserted or deleted in each area without affecting the other areas. But beware the false modularity of separate worksheets; since all cells in a worksheet are globally readable, and globally writeable with a macro, using separate worksheets hides no information.[25] Worksheets can’t be “dropped in” and reused, nor can they be checked individually without reference to other worksheets.

Keep your entire spreadsheet on a single tab, intermixing input, logic and reports.[26] You can’t see your whole spreadsheet at a glance if it occupies multiple worksheets. Multiple sheets make formulas longer and harder to read because the sheet name must be included. They breed spurious cells (cells that simply copy other cells without calculation, like =R12C4) because the spreadsheet developer wants to see the precedent cell. The auditing toolbar fails with multiple sheets because arrows don’t go to off-sheet cells, and searches are confined to the selected worksheet. With input and logic intermixed, arcs of precedence are shortened.

Lay out your spreadsheet as determined by the needs of your problem. Obviously, the two previous suggestions conflict, and in fact no single design is always right; the size of the spreadsheet, frequency of off-sheet references, complexity of formatting, and many other factors must all be considered. The sample spreadsheet has four sections: growth rates (input) at the top of the first column, tax rate (input) in an unlabelled cell at the bottom of the first column, base values (input) and growth amounts (calculated) in the top rows, and income statement (calculated) in the bottom rows. An alternate layout would have base values in a section next to growth rates, with formulas that copy base values into the calculation area of the spreadsheet; this design works well if there are many input values or the calculation section is large. Some people will object to the inclusion of fixed costs twice on the spreadsheet, saying that one or the other is spurious and should be removed, but that’s a consequence of separating logic from output; in some cases it will be sufficient to have only a single appearance of the number, but if logic and output are both large, it may make sense to have the number appear twice.

Make your spreadsheet read top-to-bottom and left-to-right. All dependent arrows should point down, right, or somewhere in between. One exception is when the beginning balance at the top of one column depends on the ending balance at the bottom of the previous column.

Build a complicated spreadsheet in stages. Let it grow, but always with a working partial solution at hand.[27] Test as you go, so you have confidence in the pieces as well as the whole. Fix problems immediately; don’t leave them for the next version.

Draw the dependency graph. “A picture is worth a thousand words.”[28] The dependency graph of the sample spreadsheet looks like this:

[Figure: dependency graph of the sample spreadsheet. Recoverable node labels: Base Value, Growth Rate, Price Per Unit, Unit Sales, Sales, Cost; the remaining labels are lost.]

[...] correct in succession (a “cascade” of cells) in order for the whole spreadsheet to be correct. Mathematically,[29] this function is $1-(1-e)^n$, which grows asymptotically to 100%, as shown in the graph below; with a 5% error rate, even a cascade of only six cells gives more than a 25% chance of overall error. You can reduce the cell error rate by careful checking, but it’s generally easier to restructure the computation to reduce the length of the cascade. The sample spreadsheet has seventeen cascades: six of six cells, eight of five cells, two of four cells, and one of three cells, as shown in the diagram above (every cascade through Pretax Earnings counts twice, since it has two out-arrows); it also has a one-cell cascade for the year captions that is not included in the diagram.

[Figure: cascade error rate (0% to 100%) plotted against cascade length (0 to 100) and error rate (0% to 5%).]

Document the design of your spreadsheet on a separate HOWTO tab.
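Before turning to documentation, the cascade arithmetic from the preceding discussion can be checked with a few lines of code. The sketch below is in Python rather than Excel, purely for illustration. It assumes, as the formula 1-(1-e)^n appears to, that every cell in a cascade goes wrong independently and with the same error rate e. The names cascade_error and SAMPLE_CASCADES are hypothetical and are not part of the sample spreadsheet.

```python
def cascade_error(cell_error_rate: float, cascade_length: int) -> float:
    """Chance that at least one cell in a cascade is wrong: 1 - (1 - e)^n.

    Assumption: each cell errs independently with the same rate e.
    """
    if not 0.0 <= cell_error_rate <= 1.0:
        raise ValueError("cell_error_rate must lie between 0 and 1")
    if cascade_length < 0:
        raise ValueError("cascade_length must not be negative")
    return 1.0 - (1.0 - cell_error_rate) ** cascade_length


# Cascade lengths of the sample spreadsheet as counted in the text:
# six of six cells, eight of five, two of four, one of three.
SAMPLE_CASCADES = {6: 6, 5: 8, 4: 2, 3: 1}  # length -> number of cascades

if __name__ == "__main__":
    rate = 0.05  # the 5% cell error rate used in the text
    for length, count in sorted(SAMPLE_CASCADES.items()):
        chance = cascade_error(rate, length)
        print(f"{count} cascade(s) of {length} cells: {chance:.1%} chance of error")
```

With a 5% cell error rate and a six-cell cascade the function returns about 26.5%, in line with the "more than 25%" figure quoted above. Shortening that cascade to three cells brings the figure down to roughly 14%, which is the reason for preferring a restructured computation over more careful checking.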