The Programming Historian
The Programming Historian is an open-access introduction to programming in Python, aimed at working historians (and other humanists) with little previous experience. There are two editions available here; the second is currently under development.

We are constantly adding new material, much of it driven by reader request. We welcome questions, corrections and suggestions for improvement. At this point we are still figuring out how best to allow community participation while maintaining the coherence and direction of a more monographic work. If you e-mail us at [email protected], [email protected] and/or [email protected], we are happy to respond to you personally and try to incorporate your comments. In the future we may come up with something more elegant... but, hey, it's a work in progress.

• William J. Turkel, Adam Crymble and Alan MacEachern, The Programming Historian, 2nd ed. NiCHE: Network in Canadian History & Environment (2009-).
• William J. Turkel and Alan MacEachern, The Programming Historian, 1st ed. NiCHE: Network in Canadian History & Environment (2007-08).

Introductory lessons teach you how to
• install Zotero, the Python programming language and other useful tools
• read and write data files
• save web pages and automatically extract information from them
• count word frequencies
• remove stop words
• automatically refine searches
• make n-gram dictionaries
• create keyword-in-context (KWIC) displays
• make tag clouds, and
• harvest sets of hyperlinks

Table of Contents

0. About this book ... 3
1. Do you need to learn how to program? ... 4
    Techniques that don't involve programming ... 4
    Why you might want to learn to program ... 4
    What kind of techniques you will learn ... 5
2. Getting started ... 5
    Install and set up software ... 5
        Linux instructions ... 6
        Mac instructions ... 7
        Windows instructions ... 8
    "Hello world" in Python ... 9
    Interacting with a Python shell ... 9
        Linux instructions ... 9
        Mac instructions ... 9
        Windows instructions ... 10
    "Hello world" in JavaScript ... 11
    Viewing HTML files ... 11
    "Hello World" in HTML ... 12
    "Hello World" in embedded JavaScript ... 13
    Back up your work ... 13
    Keep in touch with us ... 13
    Other resources ... 14
    Suggested readings ... 14
3. Working with files and web pages ... 14
    Making use of your ability to do close reading ... 14
    Sending information to text files ... 15
    Getting information from text files ... 15
    Splitting code into modules and functions ... 16
    About URLs ... 17
    Opening URLs with Python ... 18
    Saving a local copy of a web page ... 19
    Suggested Readings ... 20
4. From HTML to a list of words ... 20
    Getting rid of HTML formatting ... 20
    More about Python strings ... 20
    Looping ... 22
    Branching ... 22
    The stripTags routine ... 23
    Python lists ... 23
    Suggested Readings ... 25
5. Computing frequencies ... 25
    Useful measures of a text ... 25
    Cleaning up the list ... 25
    Our first use of regular expressions ... 26
    Python dictionaries ... 27
    Counting word frequencies ... 28
    From HTML to a dictionary of word-frequency pairs ... 29
    Removing stop words ... 30
    Putting it all together ... 31
    Suggested Readings ... 32
6. Wrapping output in HTML ... 32
    Putting new information where you can use it ... 32
    Python string formatting ... 33
    Creating HTML output ... 33
    Sending HTML output to Firefox ... 34