The Fivethirtyeight R Package: "Tame Data" Principles for Introductory Statistics and Data Science Courses
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
The Long Red Thread How Democratic Dominance Gave Way to Republican Advantage in Us House of Representatives Elections, 1964
THE LONG RED THREAD HOW DEMOCRATIC DOMINANCE GAVE WAY TO REPUBLICAN ADVANTAGE IN U.S. HOUSE OF REPRESENTATIVES ELECTIONS, 1964-2018 by Kyle Kondik A thesis submitted to Johns Hopkins University in conformity with the requirements for the degree of Master of Arts Baltimore, Maryland September 2019 © 2019 Kyle Kondik All Rights Reserved Abstract This history of U.S. House elections from 1964-2018 examines how Democratic dominance in the House prior to 1994 gave way to a Republican advantage in the years following the GOP takeover. Nationalization, partisan realignment, and the reapportionment and redistricting of House seats all contributed to a House where Republicans do not necessarily always dominate, but in which they have had an edge more often than not. This work explores each House election cycle in the time period covered and also surveys academic and journalistic literature to identify key trends and takeaways from more than a half-century of U.S. House election results in the one person, one vote era. Advisor: Dorothea Wolfson Readers: Douglas Harris, Matt Laslo ii Table of Contents Abstract…………………………………………………………………………………....ii List of Tables……………………………………………………………………………..iv List of Figures……………………………………………………………………………..v Introduction: From Dark Blue to Light Red………………………………………………1 Data, Definitions, and Methodology………………………………………………………9 Chapter One: The Partisan Consequences of the Reapportionment Revolution in the United States House of Representatives, 1964-1974…………………………...…12 Chapter 2: The Roots of the Republican Revolution: -
Downloaded from Ensembl V41 (6), All
IN SILICO APPROACHES TO INVESTIGATING MECHANISMS OF GENE REGULATION by SHANNAN JANELLE HO SUI B.Sc., The University of British Columbia, 2000 A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES (Genetics) THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver) March 2008 © Shannan Janelle Ho Sui, 2008 Abstract Identification and characterization of regions influencing the precise spatial and temporal expression of genes is critical to our understanding of gene regulatory networks. Connecting transcription factors to the cis-regulatory elements that they bind and regulate remains a challenging problem in computational biology. The rapid accumulation of whole genome sequences and genome-wide expression data, and advances in alignment algorithms and motif-finding methods, provide opportunities to tackle the important task of dissecting how genes are regulated. Genes exhibiting similar expression profiles are often regulated by common transcription factors. We developed a method for identifying statistically over- represented regulatory motifs in the promoters of co-expressed genes using weight matrix models representing the specificity of known factors. Application of our methods to yeast fermenting in grape must revealed elements that play important roles in utilizing carbon sources. Extension of the method to metazoan genomes via incorporation of comparative sequence analysis facilitated identification of functionally relevant binding sites for sets of tissue-specific genes, and for genes showing similar expression in large-scale expression profiling studies. Further extensions address alternative promoters for human genes and coordinated binding of multiple transcription factors to cis-regulatory modules. Sequence conservation reveals segments of genes of potential interest, but the degree of sequence divergence among human genes and their orthologous sequences varies widely. -
Online Media and the 2016 US Presidential Election
Partisanship, Propaganda, and Disinformation: Online Media and the 2016 U.S. Presidential Election The Harvard community has made this article openly available. Please share how this access benefits you. Your story matters Citation Faris, Robert M., Hal Roberts, Bruce Etling, Nikki Bourassa, Ethan Zuckerman, and Yochai Benkler. 2017. Partisanship, Propaganda, and Disinformation: Online Media and the 2016 U.S. Presidential Election. Berkman Klein Center for Internet & Society Research Paper. Citable link http://nrs.harvard.edu/urn-3:HUL.InstRepos:33759251 Terms of Use This article was downloaded from Harvard University’s DASH repository, and is made available under the terms and conditions applicable to Other Posted Material, as set forth at http:// nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of- use#LAA AUGUST 2017 PARTISANSHIP, Robert Faris Hal Roberts PROPAGANDA, & Bruce Etling Nikki Bourassa DISINFORMATION Ethan Zuckerman Yochai Benkler Online Media & the 2016 U.S. Presidential Election ACKNOWLEDGMENTS This paper is the result of months of effort and has only come to be as a result of the generous input of many people from the Berkman Klein Center and beyond. Jonas Kaiser and Paola Villarreal expanded our thinking around methods and interpretation. Brendan Roach provided excellent research assistance. Rebekah Heacock Jones helped get this research off the ground, and Justin Clark helped bring it home. We are grateful to Gretchen Weber, David Talbot, and Daniel Dennis Jones for their assistance in the production and publication of this study. This paper has also benefited from contributions of many outside the Berkman Klein community. The entire Media Cloud team at the Center for Civic Media at MIT’s Media Lab has been essential to this research. -
Politics in Science 1 Running Head: Politics And
Politics in Science 1 Running Head: Politics and Science Is research in social psychology politically biased? Systematic empirical tests and a forecasting survey to address the controversy Orly Eitan† INSEAD Domenico Viganola† Stockholm School of Economics Yoel Inbar† University of Toronto Anna Dreber Stockholm School of Economics and University of Innsbruck Magnus Johannesson Stockholm School of Economics Thomas Pfeiffer Massey University Stefan Thau & Eric Luis Uhlmann†* INSEAD †First three and last author contributed equally *Corresponding author: Eric Luis Uhlmann ([email protected]) Politics in Science 2 Abstract The present investigation provides the first systematic empirical tests for the role of politics in academic research. In a large sample of scientific abstracts from the field of social psychology, we find both evaluative differences, such that conservatives are described more negatively than liberals, and explanatory differences, such that conservatism is more likely to be the focus of explanation than liberalism. In light of the ongoing debate about politicized science, a forecasting survey permitted scientists to state a priori empirical predictions about the results, and then change their beliefs in light of the evidence. Participating scientists accurately predicted the direction of both the evaluative and explanatory differences, but at the same time significantly overestimated both effect sizes. Scientists also updated their broader beliefs about political bias in response to the empirical results, providing -
Disinformation, and Influence Campaigns on Twitter 'Fake News'
Disinformation, ‘Fake News’ and Influence Campaigns on Twitter OCTOBER 2018 Matthew Hindman Vlad Barash George Washington University Graphika Contents Executive Summary . 3 Introduction . 7 A Problem Both Old and New . 9 Defining Fake News Outlets . 13 Bots, Trolls and ‘Cyborgs’ on Twitter . 16 Map Methodology . 19 Election Data and Maps . 22 Election Core Map Election Periphery Map Postelection Map Fake Accounts From Russia’s Most Prominent Troll Farm . 33 Disinformation Campaigns on Twitter: Chronotopes . 34 #NoDAPL #WikiLeaks #SpiritCooking #SyriaHoax #SethRich Conclusion . 43 Bibliography . 45 Notes . 55 2 EXECUTIVE SUMMARY This study is one of the largest analyses to date on how fake news spread on Twitter both during and after the 2016 election campaign. Using tools and mapping methods from Graphika, a social media intelligence firm, we study more than 10 million tweets from 700,000 Twitter accounts that linked to more than 600 fake and conspiracy news outlets. Crucially, we study fake and con- spiracy news both before and after the election, allowing us to measure how the fake news ecosystem has evolved since November 2016. Much fake news and disinformation is still being spread on Twitter. Consistent with other research, we find more than 6.6 million tweets linking to fake and conspiracy news publishers in the month before the 2016 election. Yet disinformation continues to be a substantial problem postelection, with 4.0 million tweets linking to fake and conspiracy news publishers found in a 30-day period from mid-March to mid-April 2017. Contrary to claims that fake news is a game of “whack-a-mole,” more than 80 percent of the disinformation accounts in our election maps are still active as this report goes to press. -
The Pennsylvania State University the Graduate School College of Communications the RISE and FALL of GRANTLAND a Thesis in Medi
The Pennsylvania State University The Graduate School College of Communications THE RISE AND FALL OF GRANTLAND A Thesis in Media Studies by Roger Van Scyoc © 2018 Roger Van Scyoc Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Arts May 2018 The thesis of Roger Van Scyoc was reviewed and approved* by the following: Russell Frank Associate Professor of Journalism Thesis Adviser Ford Risley Professor of Journalism Associate Dean for Undergraduate and Graduate Education Kevin Hagopian Senior Lecturer of Media Studies John Affleck Knight Chair in Sports Journalism and Society Matthew McAllister Professor of Media Studies Chair of Graduate Programs *Signatures are on file in the Graduate School ii ABSTRACT The day before Halloween 2015, ESPN pulled the plug on Grantland. Spooked by slumping revenues and the ghost of its ousted leader Bill Simmons, the multimedia giant axed the sports and pop culture website that helped usher in a new era of digital media. The website, named for sports writing godfather Grantland Rice, channeled the prestige of a bygone era while crystallizing the nature of its own time. Grantland’s writers infused their pieces with spry commentary, unabashed passion and droll humor. Most importantly, they knew what they were writing about. From its birth in June 2011, Grantland quickly became a hub for educated sports consumption. Grantland’s pieces entertained and edified. Often vaulting over 1,000 words, they also skewed toward a more affluent and more educated audience. The internet promoted shifts and schisms by its very nature. Popular with millennials, Grantland filled a certain niche. -
Survey of Registered Voters in Texas
Texas Registered Voter Sample Field Dates: April 18-27, 2020 N= 1183 Adults (Registered Voters) Margin of error: +/- 2.85% Democratic Primary Run-off Sample, 447 Voters Margin of error: +/- 4.64% Survey of Registered Voters in Texas Do you consider yourself to be a Republican, Democrat, or neither? Code Weighted 1 Republican 38% 2 Democrat 39 3 Neither 24 Total = 1183 [If Republican or Democrat] Do you consider yourself to be a strong [Republican/ Democrat] or not strong [Republican/ Democrat]? OR [If independent, no preference, or other party] Do you think of yourself as closer to the Republican Party or to the Democratic Party? Code Weighted 1 Strong Republican 28% 2 Not strong Republican 11 3 Lean Republican, Independent 7 4 Lean to no Party, Independent 8 5 Lean Democratic, Independent 9 6 Not strong Democrat 13 7 Strong Democrat 24 Total = 1183 Using a 7-point scale where 1 is extremely liberal and 7 is extremely conservative, how would you rate your political views. Code Weighted Dem. Ind. Rep. 1 Extremely Liberal 7% 13% 3% 5% 2 Liberal 13 25 9 3 3 Slightly Liberal 7 11 8 1 4 Moderate, Middle of the Road 28 34 41 15 5 Slightly Conservative 11 7 13 13 6 Conservative 18 5 13 34 7 Extremely Conservative 12 2 4 26 8 Don’t Know 4 3 9 2 Total = 1183 417 277 489 1 How much do you agree with the statement? The coronavirus is a major health threat. Code Weighted Dem. Ind. Rep. 1 Strongly agree 68% 84% 66% 52% 2 Agree 20 10 16 32 3 Neither agree not disagree 6 4 8 8 4 Disagree 4 1 7 5 5 Strongly disagree 2 1 2 2 Total = 1183 417 277 489 How much do you agree with the statement? COVID-19 is as severe than the common flu. -
DAVID WASSERMAN Senior Election Analyst, the Cook Political Report and NBC Contributor
DAVID WASSERMAN Senior Election Analyst, The Cook Political Report and NBC Contributor David Wasserman is the U.S. House editor and senior election analyst for the non-partisan newsletter, The Cook Political Report, and a contributor to NBC News. Founded in 1984, The Cook Political Report provides analyses of U.S. presidential, Senate, House, and gubernatorial races. The New York Times called The Cook Political Report “a newsletter that both parties regard as authoritative.” Wasserman analyzes the current political environment in lively and entertaining presentations that he can tailor to his audiences’ specific interests or locales. His data-driven forecasting looks at both national and local trends (if requested, he can even do a district-by-district outlook), the relationship between consumer brand loyalty and voting, and what the future holds for American elections. He is exclusively represented by Leading Authorities speakers bureau. Highly-Praised Expertise. Nate Silver of FiveThirtyEight.com has written: “Wasserman’s knowledge of the nooks and crannies of political geography can make him seem like a local,” and the Los Angeles Times called Wasserman “whip smart” and a “scrupulously nonpartisan” analyst whose “numbers nerd-dom was foretold at a young age.” Chuck Todd, host of NBC's Meet the Press, recently called David "pretty much the only person you need to follow on Election Night." In 2016, Wasserman drew wide praise for his accurate pre-election analysis, including his uncanny September piece entitled, “How Trump Could Win the White House While Losing the Popular Vote.” Leading Republican pollster Kristen Soltis Anderson concluded, “Nobody of my generation knows American politics as well as Dave Wasserman.” Political Intelligence. -
UPI/Cvoter Poll State Tracker 2016
UPI/CVoter Poll State Tracker 2016 Current state projections are based on UPI/CVoter’s state tracking poll conducted online during the last two weeks among 18+ adults nationwide, including likely voters, details of which are mentioned beside the projections as of today. The CVoter State Poll is different from the National Poll. In our National Poll, we are tracking opinions of more than 200 people each day, leading to a national representative sample size of at least 1,400 people during any seven-day span. In our State Poll, we are tracking opinions of about 250 likely voters in each state every week, leading to a state representative sample size of about 500 likely voters during any two-week span. In total, we are covering about 25,000 samples in this time frame across all the states for our state projections. The data is weighted to the known demographic profile of the United States, including the Census. Sometimes the table figures do not sum to 100 due to the effects of rounding. pg. 1 UPI/CVoter Tracking Poll State Tracker2016 If the U.S. presidential election was held today, which candidate would you vote for? *LV Only* STATE CLINTON TRUMP OTHERS MARGIN Alabama 36.1 58.4 5.4 -22.3 Alaska 40.1 54.9 5.0 -14.9 Arizona 41.7 51.8 6.6 -10.1 Arkansas 36.0 57.5 6.5 -21.5 California 57.2 38.1 4.6 19.1 Colorado 48.9 45.4 5.8 3.5 Connecticut 53.7 41.9 4.4 11.8 Delaware 54.2 42.4 3.4 11.9 Florida 46.4 48.2 5.4 -1.8 Georgia 42.8 50.6 6.6 -7.8 Hawaii 64.0 32.5 3.5 31.5 Idaho 33.0 60.6 6.4 -27.6 Illinois 54.2 40.7 5.1 13.5 Indiana 40.9 54.4 4.7 -
The Grifters, Chapter 3 – Election Prediction November 9, 2020
The Grifters, Chapter 3 – Election Prediction November 9, 2020 That’s Nate Silver, founder and face of election “modeler” FiveThirtyEight, performing his traditional “Awkshually, we weren’t wrong” dance after mangling yet another national election. Haha. No, that’s a falsehood, as the fact checkers would say. That claim was made with no evidence, as an ABC News reporter would say. In truth, this is a picture of Nate Silver speaking at the “ABC Leadership Breakfast” during Advertising Week XII. Of course Advertising Week uses the same numbering system as the SuperBowl ™. That would be 2015 in normie text, about a year prior to FiveThirtyEight’s mangling of the prior national election. You will only see Nate Silver on ABC News and other ABC media properties and events, because FiveThirtyEight is a wholly-owned subsidiary of ABC News. ABC News, of course, is a wholly-owned subsidiary of The Walt Disney Corporation. Hold that thought. ©2020 Ben Hunt 1 All rights reserved. That’s Fivey Fox, the FiveThirtyEight cartoon mascot, who is happy to guide you through the genius-level mathematics and super-science that “powers” FiveThirtyEight’s election models. You may have also seen Fivey Fox on ABC News programming, as part of a weekly animated cartoon segment broadcast over the past nine months to “inform” viewers about “how the election actually works”. For all you FiveThirtyEight and ABC News viewers, I’d guess that most of you find Fivey Fox and the cartoon infographics pretty cringey. I’d guess that most of you believe, however, that these animated cartoons are not aimed at you, but at “low-information” viewers who are not easily capable of understanding how the election actually works, and certainly not capable of understanding the genius-level mathematics and super-science behind FiveThirtyEight’s election models. -
Tools for Reproducible Research Organizing Projects; Exploratory Data Analysis
Tools for Reproducible Research Organizing projects; exploratory data analysis Karl Broman Biostatistics & Medical Informatics, UW–Madison kbroman.org github.com/kbroman @kwbroman Course web: kbroman.org/Tools4RR I’m trying to cover two things here: how to organize data analysis projects, so in the end the results will be reproducible and clear, and how to capture the results of exploratory data analysis. The hardest part, regarding organizing projects, concerns how to coordinate collaborative projects: to keep data, code, and results synchronized among collaborators. Regarding exploratory data analysis, we want to capture the whole process: what you’re trying to do, what you’re thinking about, what you’re seeing, and what you’re concluding and why. And we want to do so without getting in the way of the creative process. I’ll sketch what I try to do, and the difficulties I’ve had. But I don’t have all of the answers. File organization and naming are powerful weapons against chaos. – Jenny Bryan 2 You don’t need to be organized, but it sure will help others (or yourself, later), when you try to figure out what it was that you did. Segregate all the materials for a project in one directory/folder on your harddrive. I prefer to separate raw data from processed data, and I put code in a separate directory. Write ReadMe files to explain what’s what. Organizing your stuff Code/d3examples/ /Others/ /PyBroman/ /Rbroman/ /Rqtl/ /Rqtlcharts/ Docs/Talks/ /Meetings/ /Others/ /Papers/ /Resume/ /Reviews/ /Travel/ Play/ Projects/AlanAttie/ /BruceTempel/ /Hassold_QTL/ /Hassold_Age/ /Payseur_Gough/ /PhyloQTL/ /Tar/ 3 This is basically how I organize my hard drive. -
A Guide to Teaching Data Science
A Guide to Teaching Data Science 1,2 1,2 Stephanie C. Hicks , Rafael A. Irizarry 1 Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA 2 Department of Biostatistics, Harvard School of Public Health, Boston, MA Emails: Stephanie C. Hicks, [email protected] Rafael A. Irizarry, [email protected] Abstract Demand for data science education is surging and traditional courses offered by statistics departments are not meeting the needs of those seeking this training. This has led to a number of opinion pieces advocating for an update to the Statistics curriculum. The unifying recommendation is that computing should play a more prominent role. We strongly agree with this recommendation, but advocate that the main priority is to bring applications to the forefront as proposed by Nolan and Speed (1999). We also argue that the individuals tasked with developing data science courses should not only have statistical training, but also have experience analyzing data with the main objective of solving real-world problems. Here, we share a set of general principles and offer a detailed guide derived from our successful experience developing and teaching data science courses centered entirely on case studies. We argue for the importance of statistical thinking, as defined by Wild and Pfannkuck (1999) and describe how our approach teaches students three key skills needed to succeed in data science, which we refer to as creating, connecting, and computing. This guide can also be used for statisticians wanting to gain more practical knowledge about data science before embarking on teaching a course.