Econometric Data Science

Total Page:16

File Type:pdf, Size:1020Kb

Econometric Data Science Econometric Data Science Francis X. Diebold University of Pennsylvania October 22, 2019 1 / 280 Copyright c 2013-2019, by Francis X. Diebold. All rights reserved. All materials are freely available for your use, but be warned: they are highly preliminary, significantly incomplete, and rapidly evolving. All are licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. (Briefly: I retain copyright, but you can use, copy and distribute non-commercially, so long as you give me attribution and do not modify. To view a copy of the license, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.) In return I ask that you please cite the books whenever appropriate, as: "Diebold, F.X. (year here), Book Title Here, Department of Economics, University of Pennsylvania, http://www.ssc.upenn.edu/ fdiebold/Textbooks.html." The painting is Enigma, by Glen Josselsohn, from Wikimedia Commons. 2 / 280 Introduction 3 / 280 Numerous Communities Use Econometrics Economists, statisticians, analysts, "data scientists" in: I Finance (Commercial banking, retail banking, investment banking, insurance, asset management, real estate, ...) I Traditional Industry (manufacturing, services, advertising, brick-and-mortar retailing, ...) I e-Industry (Google, Amazon, eBay, Uber, Microsoft, ...) I Consulting (financial services, litigation support, ...) I Government (treasury, agriculture, environment, commerce, ...) I Central Banks and International Organizations (FED, IMF, World Bank, OECD, BIS, ECB, ...) 4 / 280 Econometrics is Special Econometrics is not just \statistics using economic data". Many properties and nuances of economic data require knowledge of economics for sucessful analysis. I Emphasis on predictions, guiding decisions I Observational data I Structural change I Volatility fluctuations ("heteroskedasticity") I Even trickier in time series: Trend, Seasonality, Cycles ("serial correlation") 5 / 280 Let's Elaborate on the \Emphasis on Predictions Guiding Decisions"... Q: What is econometrics about, broadly? A: Helping people to make better decisions I Consumers I Firms I Investors I Policy makers I Courts Forecasts guide decisions. Good forecasts promote good decisions. Hence prediction holds a distinguished place in econometrics, and it will hold a distinguished place in this course. 6 / 280 Types/Arrangements of Economic Data { Cross section Standard cross-section notation: i = 1; :::; N { Time series Standard time-series notation: t = 1; :::; T Much of our discussion will apply to both cross-section and time-series environments, but still we have to pick a notation. 7 / 280 A Few Leading Econometrics Web Data Resources (Clickable) Indispensible: I Resources for Economists (AEA) I FRED (Federal Reserve Economic Data) More specialized: I National Bureau of Economic Research I FRB Phila Real-Time Data Research Center I Many more 8 / 280 A Few Leading Econometrics Software Environments (Clickable) I High-Level: EViews, Stata I Mid-Level: R (also CRAN; RStudio; R-bloggers), Python (also Anaconda), Julia I Low-Level: C, C++, Fortran \High-level" does not mean \best", and \low-level" does not mean worst. There are many issues. 9 / 280 Graphics Review 10 / 280 Graphics Help us to: I Summarize and reveal patterns in univariate cross-section data. Histograms and density estimates are helpful for learning about distributional shape. Symmetric, skewed, fat-tailed, ... I Summarize and reveal patterns in univariate time-series data. Time Series plots are useful for learning about dynamics. Trend, seasonal, cycle, outliers, ... I Summarize and reveal patterns in multivariate data (cross-section or time-series). Scatterplots are useful for learning about relationships. Does a relationship exist? Is it linear or nonlinear? Are there outliers? 11 / 280 Histogram Revealing Distributional Shape: 1-Year Government Bond Yield 12 / 280 Time Series Plot Revealing Dynamics: 1-Year Goverment Bond Yield 13 / 280 Scatterplot Revealing Relationship: 1-Year and 10-Year Government Bond Yields 14 / 280 Some Principles of Graphical Style I Know your audience, and know your goals. I Appeal to the viewer. I Show the data, and only the data, withing the bounds of reason. I Avoid distortion. The sizes of effects in graphics should match their size in the data. Use common scales in multiple comparisons. I Minimize, within reason, non-data ink. Avoid chartjunk. I Third, choose aspect ratios to maximize pattern revelation. Bank to 45 degrees. I Maximize graphical data density. I Revise and edit, again and again (and again). Graphics produced using software defaults are almost never satisfactory. 15 / 280 Probability and Statistics Review 16 / 280 Moments, Sample Moments and Their Sampling Distributions I Discrete random variable, y I Discrete probability distribution p(y) I Continuous random variable y I Probability density function f (y) 17 / 280 Population Moments: Expectations of Powers of R.V.'s Mean measures location: X µ = E(y) = pi yi (discrete case) i Z µ = E(y) = y f (y) dy (continuous case) Variance, or standard deviation, measures dispersion, or scale: σ2 = var(y) = E(y − µ)2: { σ easier to interpret than σ2. Why? 18 / 280 More Population Moments Skewness measures skewness (!) E(y − µ)3 S = : σ3 Kurtosis measures tail fatness relative to a Gaussian distribution. E(y − µ)4 K = : σ4 19 / 280 Covariance and Correlation Multivariate case: Joint, marginal and conditional distributions f (x; y), f (x), f (y), f (xjy), f (yjx) Covariance measures linear dependence: cov(y; x) = E[(y − µy )(x − µx )]: So does correlation: cov(y; x) corr(y; x) = : σy σx Correlation is often more convenient. Why? 20 / 280 Sampling and Estimation N Sample : fyi gi=1 ∼ iid f (y) Sample mean: N 1 X y¯ = y N i i=1 Sample variance: PN 2 (yi − y¯) σ^2 = i=1 N Unbiased sample variance: PN 2 (yi − y¯) s2 = i=1 N − 1 21 / 280 More Sample Moments Sample skewness: 1 PN 3 (yi − y¯) S^ = N i=1 σ^3 Sample kurtosis: 1 PN 4 (yi − y¯) K^ = N i=1 σ^4 22 / 280 Still More Sample Moments Sample covariance: N 1 X cov(y; x) = [(y − y¯)(x − x¯)] d N i i i=1 Sample correlation: covd(y; x) corrd (y; x) = σ^y σ^x 23 / 280 Exact Finite-Sample Distribution of the Sample Mean (Requires iid Normality) 2 Simple random sampling : yi ∼ iid N(µ, σ ); i = 1; :::; N y¯ is unbiased and normally distributed with variance σ2=N. σ2 y¯ ∼ N µ, ; N and we estimate σ2 using s2, where PN (y − y¯)2 s2 = i=1 i : N − 1 s µ 2 y¯ ± t1− α (N − 1) p w:p: 1 − α 2 N y¯ − µ0 µ = µ0 =) ∼ t1− α (N − 1) ps 2 N where \t − α (N − 1)" denotes the appropriate critical value of the Student's t 1 2 density with N − 1 degrees of freedom 24 / 280 Large-Sample Distribution of the Sample Mean (Requires iid, but not Normality) 2 Simple random sampling : yi ∼ iid (µ, σ ); i = 1; :::; N y¯ is consistent and asymptotically normally distributed with variance v. a y¯ ∼ N(µ, v); and we estimate v usingv ^ = s2=N, where PN (y − y¯)2 s2 = i=1 i : N − 1 This is an approximate (large-sample) result, due to the central limit theorem. The \a" is for \asymptotically", which means \as N ! 1". s As N ! 1; µ 2 y¯ ± z1− α p w:p: 1 − α 2 N y¯ − µ0 As N ! 1; µ = µ0 =) ∼ N(0; 1) ps N 25 / 280 Wages: Distributions 26 / 280 Wages: Sample Statistics WAGE log WAGE Sample Mean 12:19 2.34 Sample Median 10:00 2.30 Sample Maximum 65:00 4.17 Sample Minimum 1:43 0.36 Sample Std. Dev. 7:38 0.56 Sample Skewness 1:76 0.06 Sample Kurtosis 7:93 2.90 Jarque-Bera 2027:86 1.26 (p = 0:00) (p = 0:53) t(H0 : µ = 12) 0.93 -625.70 (p = 0:36) (p = 0:00) 27 / 280 Regression 28 / 280 Regression A. As curve fitting. \Tell a computer how to draw a line through a scatterplot". (Well, sure, but there must be more...) B. As a probabilistic framework for optimal prediction. 29 / 280 Regression as Curve Fitting 30 / 280 Distributions of Log Wage, Education and Experience 31 / 280 Scatterplot: Log Wage vs. Education 32 / 280 Curve Fitting Fit a line: yi = β1 + β2xi Solve: N X 2 min (yi − β1 − β2xi ) β1,β2 i=1 \least squares" (LS, or OLS) \quadratic loss" 33 / 280 Actual Values, Fitted Values and Residuals Actual values: yi ; i = 1; :::; N Least-squares fitted parameters: β^1 and β^2 Fitted values:y ^i = β^1 + β^2xi ; i = 1; :::; N, (\hats" denote fitted things...) Residuals: ei = yi − y^i ; i = 1; :::; N. 34 / 280 Log Wage vs. Education with Superimposed Regression Line LWAGE\ = 1:273 + :081EDUC 35 / 280 Multiple Linear Regression (K RHS Variables) Solve: N X 2 min (yi − β1 − β2xi2 − ::: − βK xiK ) β1,...,βK i=1 Fitted hyperplane: y^i = β^1 + β^2xi2 + ::: + β^K xiK More compactly: K X y^i = β^k xik ; k=1 where xi1 = 1 for all i. Wage dataset: LWAGE\ = :867 + :093EDUC + :013EXPER 36 / 280 Regression as a Probability Model 37 / 280 An Ideal Situation (\The Ideal Conditions", or IC) 1. The data-generating process (DGP) is: yi = β1 + β2xi2 + ::: + βK xiK + "i 2 "i ∼ iidN(0; σ ) i = 1; :::; N; and the fitted model matches it exactly. 1.1 The fitted model is correctly specified 1.2 The disturbances are Gaussian 1.3 The coefficients (βk 's) are fixed 1.4 The relationship is linear 2 1.5 The "i 's have constant unconditional variance σ 1.6 The "i 's are uncorrelated 2. "i is independent of (xi1; :::; xiK ), for all i 2.1 E("i j xi1; :::; xiK ) = 0, for all i 2 2.2 var("i j xi1; :::; xiK ) = σ , for all i (Written here for cross sections.
Recommended publications
  • Instructions for Creating Your Own R Package∗
    Instructions for Creating Your Own R Package∗ In Song Kimy Phil Martinz Nina McMurryx Andy Halterman{ March 18, 2018 1 Introduction The following is a step-by-step guide to creating your own R package. Even beyond this course, you may find this useful for storing functions you create for your own research or for editing existing R packages to suit your needs. This guide contains three different sets of instructions. If you use RStudio, you can follow the \Ba- sic Instructions" in Section 2 which involve using RStudio's interface. If you do not use RStudio or you do use RStudio but want a little bit more of control, follow the instructions in Section 3. Section 4 illustrates how to create a R package with functions written in C++ via Rcpp helper functions. NOTE: Write all of your functions first (in R or RStudio) and make sure they work properly before you start compiling your package. You may also want to try compiling with a very simple function first (e.g. myfun <- function(x)fx + 7g). 2 Basic Instructions (for RStudio Users Only) All of the following should be done in RStudio, unless otherwise noted. Even if you build your package in RStudio using the \Basic Instructions," we strongly recommend that you carefully review the \Advanced Instructions" as well. RStudio has built-in tools that will do many of these steps for you, but knowing how to do them manually will make it easier for you to build and distribute your own packages in the future and/or adapt existing packages.
    [Show full text]
  • Rstudio Connect: Admin Guide Version 1.5.12-7
    RStudio Connect: Admin Guide Version 1.5.12-7 Abstract This guide will help an administrator install and configure RStudio Connect on a managed server. You will learn how to install the product on different operating systems, configure authentication, and monitor system resources. Contents 1 Introduction 4 1.1 System Requirements . .4 2 Getting Started 5 2.1 Installation . .5 2.2 Initial Configuration . .7 3 License Management 9 3.1 Capabilities . .9 3.2 Notification of Expiration . .9 3.3 Product Activation . .9 3.4 Connectivity Requirements . 10 3.5 Evaluations . 11 3.6 Licensing Errors . 12 3.7 Floating Licensing . 12 4 Files & Directories 15 4.1 Program Files . 15 4.2 Configuration . 15 4.3 Server Log . 15 4.4 Access Logs . 16 4.5 Application Logs . 16 4.6 Variable Data . 16 4.7 Backups . 18 4.8 Server Migrations . 18 5 Server Management 19 5.1 Stopping and Starting . 19 5.2 System Messages . 21 5.3 Health-Check . 21 5.4 Upgrading . 21 5.5 Purging RStudio Connect . 22 6 High Availability and Load Balancing 22 6.1 HA Checklist . 22 6.2 HA Limitations . 23 6.3 Updating HA Nodes . 24 6.4 Downgrading . 24 6.5 HA Details . 24 1 7 Running with a Proxy 25 7.1 Nginx Configuration . 26 7.2 Apache Configuration . 27 8 Security & Auditing 28 8.1 API Security . 28 8.2 Browser Security . 28 8.3 Audit Logs . 30 8.4 Audit Logs Command-Line Interface . 31 9 Database 31 9.1 SQLite . 31 9.2 PostgreSQL .
    [Show full text]
  • Fira Code: Monospaced Font with Programming Ligatures
    Personal Open source Business Explore Pricing Blog Support This repository Sign in Sign up tonsky / FiraCode Watch 282 Star 9,014 Fork 255 Code Issues 74 Pull requests 1 Projects 0 Wiki Pulse Graphs Monospaced font with programming ligatures 145 commits 1 branch 15 releases 32 contributors OFL-1.1 master New pull request Find file Clone or download lf- committed with tonsky Add mintty to the ligatures-unsupported list (#284) Latest commit d7dbc2d 16 days ago distr Version 1.203 (added `__`, closes #120) a month ago showcases Version 1.203 (added `__`, closes #120) a month ago .gitignore - Removed `!!!` `???` `;;;` `&&&` `|||` `=~` (closes #167) `~~~` `%%%` 3 months ago FiraCode.glyphs Version 1.203 (added `__`, closes #120) a month ago LICENSE version 0.6 a year ago README.md Add mintty to the ligatures-unsupported list (#284) 16 days ago gen_calt.clj Removed `/**` `**/` and disabled ligatures for `/*/` `*/*` sequences … 2 months ago release.sh removed Retina weight from webfonts 3 months ago README.md Fira Code: monospaced font with programming ligatures Problem Programmers use a lot of symbols, often encoded with several characters. For the human brain, sequences like -> , <= or := are single logical tokens, even if they take two or three characters on the screen. Your eye spends a non-zero amount of energy to scan, parse and join multiple characters into a single logical one. Ideally, all programming languages should be designed with full-fledged Unicode symbols for operators, but that’s not the case yet. Solution Download v1.203 · How to install · News & updates Fira Code is an extension of the Fira Mono font containing a set of ligatures for common programming multi-character combinations.
    [Show full text]
  • David O. Neville, Phd, MS Rev
    David O. Neville, PhD, MS Rev. 05 March 2021 The Center for Teaching, Learning, and Assessment Email: [email protected] 1119 6th Avenue Website: https://doktorfrag.com Grinnell College Twitter: https://twitter.com/doktorfrag Grinnell, IA 50112 Education MS Utah State University (Logan, Utah, USA, 2007) Instructional Technology and Learning Sciences Concentration in Computer Science; Business Information Systems PhD Washington University in St. Louis (Missouri, USA, 2002) German Language and Literature Concentration in Latin Language and Literature; Medieval Studies Ludwig-Maximilians-Universität (Munich, Germany, 1999-2000) DAAD Annual Scholarship (Jahresstipendium) AM Washington University in St. Louis (Missouri, USA, 1997) German Language and Literature BA, Brigham Young University (Provo, Utah, USA, 1994) Honors German Language and Literature Minor in Russian Language and Literature Employment Grinnell College 2015- Digital Liberal Arts Specialist Present The Digital Liberal Arts Collaborative Elon University 2014-15 Associate Professor of German Department of World Languages and Cultures 2008-14 Assistant Professor of German and Director of Language Learning Technologies Department of World Languages and Cultures Utah State 2006-08 Instructional Designer and Blackboard Administrator University Faculty Assistance Center for Teaching (FACT) 2004-06 Visiting Assistant Professor Department of Languages, Philosophy, and Speech Communication Washington 2002-03 Lecturer and Instructional Technology Specialist University Department of Germanic Languages and Literatures in St. Louis Curriculum Vitæ: David O. Neville, PhD, MS Pg. 1 Fellowships and Awards 2017 Top Three Print Poster in the 2017 Humanities, Arts, Science and Technology Alliance and Collaboratory (HASTAC) Conference Poster Competition: “Visualizing Difficult Historical Realities: The Uncle Sam Plantation Project.” With Sarah Purcell (Co-Presenter).
    [Show full text]
  • B.Tech(2018 Onwards)
    HERITAGE INSTITUTE OF TECHNOLOGY (An Autonomous Institute Under MAKAUT) DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING B.Tech Course Structure June 2021 Dept. of CSE, HIT-K B. Tech in CSE, Course Structure Revised: June 2021 PART I: COURSE STRUCTURE Page 1 of 129 Dept. of CSE, HIT-K B. Tech in CSE, Course Structure Revised: June 2021 FIRST YEAR FIRST SEMESTER Contacts Sl. Code Subject Credit Periods/ Week Points L T P Total A. Theory 1 CHEM1001 Chemistry-I 3 1 0 4 4 2 MATH1101 Mathematics-I 3 1 0 4 4 3 ELEC1001 Basic Electrical Engineering 3 1 0 4 4 Total Theory 9 3 0 12 12 B. Practical 1 CHEM1051 Chemistry I Lab 0 0 3 3 1.5 2 ELEC1051 Basic Electrical Engineering Lab 0 0 2 2 1 3 MECH1052 Engineering Graphics & Design 1 0 4 5 3 Total Practical 1 0 9 10 5.5 Total of Semester without Honors 10 3 9 22 17.5 C. Honors 1 HMTS1011 Communication for Professionals 3 0 0 3 3 2. HMTS1061 Professional Communication Lab 0 0 2 2 1 Total Honors 3 0 2 5 4 Total of Semester with Honors 13 3 11 27 21.5 Page 2 of 129 Dept. of CSE, HIT-K B. Tech in CSE, Course Structure Revised: June 2021 FIRST YEAR SECOND SEMESTER Contacts Sl. Code Subject Credit Periods/ Week Points L T P Total A. Theory 1 PHYS1001 Physics I 3 1 0 4 4 2 MATH1201 Mathematics II 3 1 0 4 4 3 CSEN1001 Programming for Problem Solving 3 0 0 3 3 4 HMTS1202 Business English 2 0 0 2 2 Total Theory 11 2 0 13 13 B.
    [Show full text]
  • R Markdown Cheat Sheet I
    1. Workflow R Markdown is a format for writing reproducible, dynamic reports with R. Use it to embed R code and results into slideshows, pdfs, html documents, Word files and more. To make a report: R Markdown Cheat Sheet i. Open - Open a file that ii. Write - Write content with the iii. Embed - Embed R code that iv. Render - Replace R code with its output and transform learn more at rmarkdown.rstudio.com uses the .Rmd extension. easy to use R Markdown syntax creates output to include in the report the report into a slideshow, pdf, html or ms Word file. rmarkdown 0.2.50 Updated: 8/14 A report. A report. A report. A report. A plot: A plot: A plot: A plot: Microsoft .Rmd Word ```{r} ```{r} ```{r} = = hist(co2) hist(co2) hist(co2) ``` ``` Reveal.js ``` ioslides, Beamer 2. Open File Start by saving a text file with the extension .Rmd, or open 3. Markdown Next, write your report in plain text. Use markdown syntax to an RStudio Rmd template describe how to format text in the final report. syntax becomes • In the menu bar, click Plain text File ▶ New File ▶ R Markdown… End a line with two spaces to start a new paragraph. *italics* and _italics_ • A window will open. Select the class of output **bold** and __bold__ you would like to make with your .Rmd file superscript^2^ ~~strikethrough~~ • Select the specific type of output to make [link](www.rstudio.com) with the radio buttons (you can change this later) # Header 1 • Click OK ## Header 2 ### Header 3 #### Header 4 ##### Header 5 ###### Header 6 4.
    [Show full text]
  • Anika Anzum Prima Email : [email protected] Linkedin: Mobile : +1-406-580-9126
    Anika Anzum Prima Email : [email protected] Linkedin: https://www.linkedin.com/in/anika-anzum-prima-70198ab3/ Mobile : +1-406-580-9126 Education Montana State University Bozeman, MT • Master's in Computer Science; CGPA: 3.77/4.00 Fall 2017 { Fall 2019 Research Topic: ◦ Analysis of US census data and network measurement data to figure out whether everybody is accessing same quality of Internet at same price. Evaluating the fairness and variation of network performance metrics among all the counties of US using different data analysis techniques. ◦ Google Speech Recognition for figuring out which Google servers are converting voice into text and developing an architecture for a distributed voice recognition server to speed up the conversion and to ensure faster performance. Completed Coursework: Advanced Computational Biology, Research and Methodology in Engineering, Data Mining, Distributed System, Algorithm, Advanced Algorithm Topics, Networks, Computability, Advanced Networks University of Dhaka Dhaka, Bangladesh • Bachelor of Computer Science and Engineering; CGPA: 3.68/4.00 January 2012 { June. 2016 Thesis Topic: Internet Data Budget Allocation Policies for Diverse Smartphone Applications Experience Montana State University Bozeman, MT • Digital Technologies Developer (Non Tenure-Track Faculty) May 2019 - Present Responsibilities: Working as an active member of a service-oriented team that advances digital library initiatives via software development and data analysis. Assisting with development and maintenance of a diverse portfolio
    [Show full text]
  • Feature Selection and Improving Classification
    FEATURE SELECTION AND IMPROVING CLASSIFICATION PERFORMANCE FOR MALWARE DETECTION AThesis Presented to Department of Computer Science By Carlos Andres Cepeda Mora In Partial Fulfillment of Requirements for the Degree Master of Science, Computer Science KENNESAW STATE UNIVERSITY May, 2017 II FEATURE SELECTION AND IMPROVING CLASSIFICATION PERFORMANCE FOR MALWARE DETECTION Approved: _______________________________ Dr. Dan Chia-Tien Lo - Advisor _______________________________ Dr. Dan Chia-Tien Lo– Department Chair _______________________________ Dr. Jon Preston - Dean III In presenting this thesis as a partial fulfillment of the requirements for an advanced degree from Kennesaw State University, I agree that the university library shall make it available for inspection and circulation in accordance with its regulations governing materials of this type. I agree that permission to copy from, or to publish, this thesis may be granted by the professor under whose direction it was written, or, in his absence, by the dean of the appropriate school when such copying or publication is solely for scholarly purposes and does not involve potential financial gain. It is understood that any copying from or publication of, this thesis which involves potential financial gain will not be allowed without written permission. __________________________ Carlos Andres Cepeda Mora IV Unpublished thesis deposited in the Library of Kennesaw State University must be used only in accordance with the stipulations prescribed by the author in the preceding statement. The author of this thesis is: CARLOS ANDRES CEPEDA MORA 1876 HEDGE BROOKE WAY NW, ACWORTH – GA. 30101 The director of this thesis is: DAN CHIA-TIEN LO Users of this thesis not regularly enrolled as students at Kennesaw State University are required to attest acceptance of the preceding stipulations by signing below.
    [Show full text]
  • Open and Reproducible Research: Tools 3
    Open and Reproducible Research: Tools 3 January, 2015 Topics • Git • Github • Reproducible research check list What is git? • the stupid content tracker • a fast, scalable, distributed revision control system with an unusually rich command set that provides both high-level operations and full access to internals. • “I’m an egotistical bastard, and I name all my projects after myself. First ‘Linux’, now ‘git’ ” — Linus Torvalds according to “Why the ‘Git’ name?” item in GitFaq. Accessed at https://git.wiki.kernel.org/index.php/GitFaq on Jan, 2015. Tracking changes of working file(s) over time 1 • versioning 1 • egsr03.dat: Egyptian WFS Standard Record Ver 3 Tracking changes over time 2 • timestamping • 504_ideas_2014-05-19T19_04_52Z_public_cleaned.csv 2 Tracking changes over time 3 • snapshot: state of a system (here a file) at a particular point in time • keep only the latest (current) files in the working directory • 20141201 directory has the analysis.Rmd (and any other files) as was on Dec., 1, 2014 • 20141223 directory has the anaysis.Rmd (ditto) as was on Dec., 23, 2014 More elaborate snapshots strategy • put all the snapshots under a subdirectory (repo?) • snapshot names don’t have to be date • “branches” idea 3 Git as snapshot management system • each snapshot is called a ‘commit’ • Git stores file contents as well as meta-data (file names, who made the snapshot (commit), when, how this snapshot is related to other snapshot, ...) 4 • Git stores everything under the (hidden) .git directory • everything in Git is check-summed before it
    [Show full text]
  • Anaconda, Spider and Rstudio Installation for Windows
    Anaconda & Spider Installation for windows: 1. Please click on the link below https://www.anaconda.com/download/#windows 2. Click on Download, and then you have to check for compatibility of your Pc, after that it will start downloading. 3. Double click the installer to launch. 4. Click Next. 5. Read the licensing terms and click “I Agree”. 6. Select an install for “Just Me” unless you’re installing for all users (which require Windows Administrator privileges) and click Next. 7. Select a destination folder to install Anaconda and click the Next button. 8. Choose whether to add Anaconda to your PATH environment variable. We recommend not adding Anaconda to the PATH environment variable, since this can interfere with other software. Instead, use Anaconda software by opening Anaconda Navigator or the Anaconda Prompt from the Start Menu NOTE: Choose whether to register Anaconda as your default Python. Unless you plan on installing and running multiple versions of Anaconda or multiple versions of Python, accept the default and leave this box checked. 9. Click the Install button. If you want to watch the packages Anaconda is installing, click Show Details 10. Click the Next button. 11. And then click the Finish button. 12. After a successful installation you will see the “Thanks for installing Anaconda” dialog box: Spyder: Spyder, the Scientific Python Development Environment, which is a free integrated development environment (IDE) that is included with Anaconda. It includes: • Editing, • Interactive testing, • Debugging, • Introspection features. Steps for Spyder setup and run a test code: 1. In Window search box, type Spyder and press Enter.
    [Show full text]
  • Tips for Working Reproducibly
    Tips for Working Reproducibly Thomas J. Leeper Senior Visiting Fellow in Methodology London School of Economics and Political Science 9 May 2019 Tools We’ll See Today R, RStudio https://cran.r-project.org/ https://www.rstudio.com/ make (and other command line tools) https: //cran.r-project.org/bin/windows/Rtools/ git git (https://git-scm.com/) github (https://github.com/) gitkraken (https://www.gitkraken.com/) Collaboration software overleaf (https://www.overleaf.com/) dropbox (https://www.dropbox.com/) google drive (https://drive.google.com/) Introductions Me: Thomas Political Scientist, Methodology Department Experimental and computational methods R You: Name Field/Department Methods Tools/Software Learning Objectives 1 Define reproducibility and replicability 2 Understand how to organize a reproducible research project 3 Recognize different approaches to reproducibility and tools for implementing various reproducible workflows 4 Understand how to collaborate reproducibly Some Semantics Reproduction: recreating output from a shared input Replication: creating new output from new input https://thomasleeper.com/2015/05/ open-science-language/ Beyond Scope for Today Verification Data Transparency Replication Non-computational reproducibility Software and hardware versioning Deterministic computing Interpretations Jefferson’s instructions: The object of your mission is to explore the Missouri river, & such principal stream of it, as, by its course & communication with the water of the Pacific ocean may offer the most direct & practicable water
    [Show full text]
  • Scclustviz – Single-Cell Rnaseq Cluster Assessment and Visualization [Version 1; Peer Review: 2 Approved with Reservations]
    F1000Research 2018, 7:1522 Last updated: 26 JUL 2021 SOFTWARE TOOL ARTICLE scClustViz – Single-cell RNAseq cluster assessment and visualization [version 1; peer review: 2 approved with reservations] Brendan T. Innes 1,2, Gary D. Bader 1,2 1Molecular Genetics, University of Toronto, Toronto, Ontario, M5S3E1, Canada 2The Donnelly Centre, University of Toronto, Toronto, Ontario, M5S3E1, Canada v1 First published: 21 Sep 2018, 7:1522 Open Peer Review https://doi.org/10.12688/f1000research.16198.1 Latest published: 12 Mar 2019, 7:1522 https://doi.org/10.12688/f1000research.16198.2 Reviewer Status Invited Reviewers Abstract Single-cell RNA sequencing (scRNAseq) represents a new kind of 1 2 microscope that can measure the transcriptome profiles of thousands of individual cells from complex cellular mixtures, such as in a tissue, version 2 in a single experiment. This technology is particularly valuable for (revision) report report characterization of tissue heterogeneity because it can be used to 12 Mar 2019 identify and classify all cell types in a tissue. This is generally done by clustering the data, based on the assumption that cells of a particular version 1 type share similar transcriptomes, distinct from other cell types in the 21 Sep 2018 report report tissue. However, nearly all clustering algorithms have tunable parameters which affect the number of clusters they will identify in data. 1. Martin Hemberg , Wellcome Sanger The R Shiny software tool described here, scClustViz, provides a simple Institute, Hinxton, UK interactive graphical user interface for exploring scRNAseq data and assessing the biological relevance of clustering results. Given that cell Tallulah Andrews , Wellcome Sanger types are expected to have distinct gene expression patterns, Institute, Hinxton, UK scClustViz uses differential gene expression between clusters as a metric for assessing the fit of a clustering result to the data at multiple 2.
    [Show full text]