Mathematics & Statistics (MATH)


MATHEMATICS & STATISTICS (MATH)
University of New Hampshire

# Course numbers with the # symbol included (e.g. #400) have not been taught in the last 3 years.

MATH 801 - Exploring Mathematics for Teachers I
Credits: 3
Provides prospective elementary teachers with the opportunity to explore and master concepts involving number systems and operations, data analysis and probability. Additional topics may include geometry, measurement, and algebraic thinking. Mathematical reasoning, problem solving, and the use of appropriate manipulatives and technology are integrated throughout the course. Readings, class discussions, and assignments focus on mathematics content as well as applicable theories of learning, curriculum resources, and state and national recommendations. The course models instructional techniques that can be adapted to the elementary curricula. Credit offered only to M.Ed. and M.A.T., certificate students, and in-service teachers. (Not offered for credit if credit is received for MATH #821 or MATH 823.)
Prerequisite(s): (EDUC 500 with a minimum grade of D- or EDUC 935 with a minimum grade of B-).
Equivalent(s): MATH 601, MATH #821, MATH 823
Grade Mode: Letter Grade

MATH #821 - Number Systems for Teachers
Credits: 3
Ways of representing numbers, relationships between numbers, number systems, the meanings of operations and how they relate to one another, and computation with number systems as a foundation for algebra; episodes in the history and development of the number system; and examination of the developmental sequence and learning trajectory as children learn number concepts. Credit offered only to M.Ed., M.A.T., Elementary Math Specialist certificate only students, and in-service teachers. Not offered for credit if credit received for MATH 621.
Equivalent(s): MATH 621
Grade Mode: Letter Grade

MATH 823 - Statistics and Probability for Teachers
Credits: 3
An introduction to probability, descriptive statistics and data analysis; exploration of randomness, data representation and modeling. Descriptive statistics will include measures of central tendency, dispersion, distributions and regression. Analysis of experiments requiring hypothesizing, experimental design and data gathering. Credit offered only to M.Ed., M.A.T., Elementary Math Specialist certificate only students, and in-service teachers. Not offered for credit if credit received for MATH 623.
Prerequisite(s): (MATH 621 with a minimum grade of D- or MATH #821 with a minimum grade of B-).
Equivalent(s): MATH 623
Grade Mode: Letter Grade

MATH 831 - Mathematics for Geodesy
Credits: 3
A survey of topics from undergraduate mathematics designed for graduate students in engineering and science interested in applications to geodesy and Earth sciences. Topics include essential elements from analytic geometry, geometry of surfaces, linear algebra and statistics, Fourier analysis, discrete Fourier transforms and software, and filtering applications to tidal data.
Prerequisite(s): (MATH 645 with a minimum grade of D- or MATH 645H with a minimum grade of D- or MATH 762 with a minimum grade of D- or MATH 862 with a minimum grade of B-).
Grade Mode: Letter Grade

MATH 832 - Introduction to the R Software
Credits: 1
This course provides a basic introduction to the open-source statistical software R for students who have never used this software or have never formally learned the basics of it. Topics include: numeric calculations, simple and advanced graphics, object management and workflow, RStudio, user-contributed packages, basic programming, writing of functions, statistical modeling and related graphs, distributed computing, reproducible research and document production via markup language. Cr/F.
Equivalent(s): MATH 859

MATH 835 - Statistical Methods for Research
Credits: 3
This course provides a solid grounding in modern applications of statistics to a wide range of disciplines by providing an overview of the fundamental concepts of statistical inference and analysis, including t-tests and confidence intervals. Additional topics include: ANOVA, multiple linear regression, analysis of cross-classified categorical data, logistic regression, nonparametric statistics, and data mining using CART. The use of statistical software, such as JMP, S-PLUS, or R, is fully integrated into the course.
Grade Mode: Letter Grade

MATH 836 - Advanced Statistical Methods for Research
Credits: 3
An introduction to multivariate statistical methods, including principal components, discriminant analysis, cluster analysis, factor analysis, multidimensional scaling, and MANOVA. Additional topics include generalized linear models and generalized additive models, depending on the interests of class participants. This course completes a solid grounding in modern applications of statistics used in most research applications. The use of statistical software, such as JMP, S-PLUS, or R, is fully integrated into the course.
Prerequisite(s): (MATH 835 with a minimum grade of B- or MATH 839 with a minimum grade of B-).
Grade Mode: Letter Grade

MATH 837 - Statistical Methods for Quality Improvement and Design
Credits: 3
Six Sigma is a popular, data-focused methodology used worldwide by organizations to achieve continuous improvement of their existing processes, products and services or to design new ones. This course provides a thorough introduction to the Six Sigma principles, methods, and applications for continuous improvement (DMAIC process) and an overview of Design for Six Sigma (DFSS). Both manufacturing and non-manufacturing (transactional Six Sigma) applications will be included. Emphasis is placed on the use of case studies to motivate the use of, as well as the proper application of, the Six Sigma methodology. Formal Six Sigma Green Belt certification from UNH may be attained by successfully completing TECH 696. Students must have completed a calculus-based introductory statistics course.
Grade Mode: Letter Grade

MATH 838 - Data Mining and Predictive Analytics
Credits: 3
An introduction to supervised and unsupervised methods for exploring large data sets and developing predictive models. Unsupervised methods include: market basket analysis, principal components, clustering, and variable clustering. Important statistical and machine learning methods (supervised learning) include: Classification and Regression Trees (CART), Random Forests, Neural Nets, Support Vector Machines, Logistic Regression, and Penalized Regression. Additional topics focus on metamodeling, validation strategies, bagging and boosting to improve prediction or classification, and ensemble prediction from a set of diverse models. Required case studies and projects provide students with experience in applying these techniques and strategies. The course necessarily involves the use of statistical software and programming languages. Students must have completed a calculus-based introductory statistics course.
Grade Mode: Letter Grade

MATH 839 - Applied Regression Analysis
Credits: 3
Statistical methods for the analysis of relationships between response and input variables: simple linear regression, multiple regression analysis, residual analysis, model selection, multi-collinearity, nonlinear curve fitting, categorical predictors, introduction to analysis of variance, analysis of covariance, and examination of the validity of underlying assumptions.

MATH 841 - Survival Analysis
Credits: 3
Explorations of models and data-analytic methods used in medical, biological, and reliability studies. Event-time data, censored data, reliability models and methods, Kaplan-Meier estimator, proportional hazards, Poisson models, loglinear models. The use of statistical software, such as SAS, JMP, or R, is fully integrated into the course. (Offered in alternate years.)
Prerequisite(s): MATH 839.
Grade Mode: Letter Grade

MATH 843 - Time Series Analysis
Credits: 3
An introduction to univariate time series models and associated methods of data analysis and inference in the time domain and frequency domain. Topics include: autoregressive (AR), moving average (MA), ARMA and ARIMA processes, stationary and non-stationary processes, seasonal ARIMA processes, autocorrelation and partial autocorrelation functions, identification of models, estimation of parameters, diagnostic checking of fitted models, forecasting, spectral density function, periodogram and discrete Fourier transform, linear filters, parametric spectral estimation, and dynamic Fourier analysis. Additional topics may include wavelets, long-memory processes (FARIMA), and GARCH models. The use of statistical software, such as JMP or R, is fully integrated into the course. Offered in alternate years in the spring.
Prerequisite(s): (MATH 835 with a minimum grade of B- or MATH 839 with a minimum grade of B-).
Grade Mode: Letter Grade

MATH 844 - Design of Experiments II
Credits: 3
Second course in design of experiments, with applications in quality improvement and industrial manufacturing, engineering research and development, and research in the physical and biological sciences. Covers experimental design strategies and issues that are often encountered in practice: complete and incomplete blocking, partially balanced incomplete blocking (PBIB), partial confounding, intra- and inter-block information, split plotting and strip plotting, repeated measures, crossover designs, Latin squares and rectangles, Youden squares, crossed and nested treatment structures, variance components, mixed effects models, analysis of covariance, optimizations, space-filling designs, and modern screening designs.
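The regression sequence above (MATH 839) centers on least-squares fitting. As a minimal illustration, the closed-form simple linear regression estimates can be computed in a few lines of plain Python; the data below are invented for the example:

```python
# Ordinary least squares for simple linear regression, y ~ b0 + b1*x.
# b1 = cov(x, y) / var(x), b0 = mean(y) - b1 * mean(x).

def fit_line(x, y):
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx          # slope
    b0 = my - b1 * mx       # intercept
    return b0, b1

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 9.0, 10.8]   # roughly y = 1 + 2x plus noise
b0, b1 = fit_line(x, y)
print(b0, b1)
```

The same estimates are what `lm()` in R or JMP's Fit Model platform would return for this toy data set.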
Recommended publications
  • Bayesian Inference for Two-Parameter Gamma Distribution Assuming Different Noninformative Priors
    Revista Colombiana de Estadística, December 2013, volume 36, no. 2, pp. 319-336. Bayesian Inference for Two-Parameter Gamma Distribution Assuming Different Noninformative Priors. Fernando Antonio Moala (1), Pedro Luiz Ramos (1), Jorge Alberto Achcar (2). (1) Departamento de Estadística, Facultad de Ciencia y Tecnología, Universidade Estadual Paulista, Presidente Prudente, Brasil; (2) Departamento de Medicina Social, Facultad de Medicina de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, Brasil. Abstract: In this paper distinct prior distributions are derived for Bayesian inference of the two-parameter Gamma distribution. Noninformative priors, such as Jeffreys, reference, MDIP, Tibshirani, and an innovative prior based on the copula approach, are investigated. We show that the maximal data information prior results in an improper posterior density and that different choices of the parameter of interest lead to different reference priors in this case. Based on simulated data sets, the Bayesian estimates and credible intervals for the unknown parameters are computed and the performance of the prior distributions is evaluated. The Bayesian analysis is conducted using Markov Chain Monte Carlo (MCMC) methods to generate samples from the posterior distributions under the above priors. Key words: Gamma distribution, noninformative prior, copula, conjugate, Jeffreys prior, reference, MDIP, orthogonal, MCMC.
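The MCMC machinery this abstract relies on can be sketched compactly. The following is a minimal stand-in, not the paper's method: it assumes a flat prior on (log shape, log rate) rather than any of the Jeffreys/reference/MDIP priors studied there, uses synthetic data, and runs a plain random-walk Metropolis sampler:

```python
# Random-walk Metropolis for the two-parameter Gamma(shape a, rate b) likelihood.
# Assumption (illustrative only): flat prior on (log a, log b).
import math, random

random.seed(1)
data = [random.gammavariate(2.0, 1.0 / 1.5) for _ in range(200)]  # true a=2, rate b=1.5
n, s, sl = len(data), sum(data), sum(math.log(x) for x in data)

def loglik(a, b):
    # log-likelihood of n iid Gamma(a, rate b) observations
    return n * (a * math.log(b) - math.lgamma(a)) + (a - 1) * sl - b * s

la, lb = 0.0, 0.0                      # start at a = b = 1
cur = loglik(math.exp(la), math.exp(lb))
samples = []
for it in range(20000):
    pa, pb = la + random.gauss(0, 0.15), lb + random.gauss(0, 0.15)
    prop = loglik(math.exp(pa), math.exp(pb))
    if math.log(random.random()) < prop - cur:   # accept/reject step
        la, lb, cur = pa, pb, prop
    if it >= 5000:                               # discard burn-in
        samples.append((math.exp(la), math.exp(lb)))

a_hat = sum(a for a, _ in samples) / len(samples)
b_hat = sum(b for _, b in samples) / len(samples)
print(round(a_hat, 2), round(b_hat, 2))
```

With 200 observations the posterior means land near the generating values; the paper's point is precisely that the choice among noninformative priors matters most when data are scarcer than this.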
  • Bayes Estimation and Prediction of the Two-Parameter Gamma Distribution
    Bayes Estimation and Prediction of the Two-Parameter Gamma Distribution. Biswabrata Pradhan* & Debasis Kundu**. Abstract: In this article the Bayes estimates of the two-parameter gamma distribution are considered. It is well known that the Bayes estimators of the two-parameter gamma distribution do not have compact form. In this paper, it is assumed that the scale parameter has a gamma prior and the shape parameter has any log-concave prior, and they are independently distributed. Under the above priors, we use the Gibbs sampling technique to generate samples from the posterior density function. Based on the generated samples, we can compute the Bayes estimates of the unknown parameters and also construct highest posterior density credible intervals. We also compute the approximate Bayes estimates using Lindley's approximation under the assumption of gamma priors on the shape parameter. Monte Carlo simulations are performed to compare the performances of the Bayes estimators with the classical estimators. One data analysis is performed for illustrative purposes. We further discuss the Bayesian prediction of future observations based on the observed sample, and it is observed that the Gibbs sampling technique can be used quite effectively for estimating the posterior predictive density and for constructing predictive intervals of the order statistics from the future sample. Keywords: Maximum likelihood estimators; Conjugate priors; Lindley's approximation; Gibbs sampling; Predictive density; Predictive distribution. Subject Classifications: 62F15; 65C05. Corresponding Address: Debasis Kundu, Phone: 91-512-2597141; Fax: 91-512-2597500; e-mail: [email protected]. *SQC & OR Unit, Indian Statistical Institute, 203 B.T. Road, Kolkata, Pin 700108, India. **Department of Mathematics and Statistics, Indian Institute of Technology Kanpur, Pin 208016, India.

    1 Introduction
    The two-parameter gamma distribution has been used quite extensively in reliability and survival analysis, particularly when the data are not censored.
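The structural fact behind the Gibbs sampler described in this abstract is that, conditional on the shape, a gamma prior on the rate parameter is conjugate. A hedged sketch follows, with invented priors and data rather than the paper's exact setup: it alternates that exact conjugate draw with a simple Metropolis step for the shape (a stand-in for sampling a generic log-concave prior):

```python
# Gibbs-within-Metropolis sketch for x_i ~ Gamma(shape a, rate lam).
# Assumptions (illustrative): lam ~ Gamma(c, d), vague flat prior on a.
import math, random

random.seed(2)
data = [random.gammavariate(3.0, 1.0 / 2.0) for _ in range(300)]  # true a=3, lam=2
n, s, sl = len(data), sum(data), sum(math.log(x) for x in data)
c, d = 0.01, 0.01                       # vague gamma prior on lam

def cond_log_a(a, lam):
    # log full conditional of the shape a, up to an additive constant
    return n * (a * math.log(lam) - math.lgamma(a)) + (a - 1) * sl

a, lam, draws = 1.0, 1.0, []
for it in range(10000):
    # exact conjugate draw: lam | a, x ~ Gamma(c + n*a, rate d + sum(x))
    lam = random.gammavariate(c + n * a, 1.0 / (d + s))
    # multiplicative (log-scale) random-walk proposal for a, with Jacobian term
    prop = a * math.exp(random.gauss(0, 0.1))
    accept = cond_log_a(prop, lam) - cond_log_a(a, lam) + math.log(prop / a)
    if math.log(random.random()) < accept:
        a = prop
    if it >= 2000:
        draws.append((a, lam))

a_hat = sum(v for v, _ in draws) / len(draws)
lam_hat = sum(v for _, v in draws) / len(draws)
print(round(a_hat, 2), round(lam_hat, 2))
```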
  • Big Data: Effective Processing and Analysis of Very Large and Unstructured Data for Official Statistics
    Big Data: Effective Processing and Analysis of Very Large and Unstructured Data for Official Statistics. Big Data and Inference in Official Statistics, Part 2. Diego Zardetto, Istat ([email protected]). Eurostat.

    Big Data: Methodological Pitfalls - Outline
    • Representativeness (w.r.t. the desired target population): selection bias; actual target population unknown; often sample units' identity unclear/fuzzy
    • Pre-processing errors (acting like measurement errors in surveys): social media & sentiment analysis - pointless babble & social bots
    • Ubiquitous correlations: causation fallacy; spurious correlations
    • Structural break in nowcasting algorithms

    Social Networks: representativeness (1/3)
    The "Pew Research Center" carries out periodic sample surveys on the way US people use the Internet. According to its Social Media Update 2013 research, among US Internet users aged 18+ (as of Sept 2013): 73% of online adults use some social network, 71% use Facebook, 18% use Twitter, 17% use Instagram, 21% use Pinterest, and 22% use LinkedIn. The subpopulations using these social platforms turn out to be quite different in terms of gender, ethnicity, age, education, income, and urbanity.

    Social Networks: representativeness (2/3)
    According to the Social Media Update 2013 research, social platform users belong to typical social and demographic profiles: Pinterest is skewed towards females; LinkedIn over-represents higher education and higher income.
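The representativeness pitfall in these slides is easy to demonstrate with a toy simulation (all numbers invented): a very large but selective "big data" sample can be farther from the population mean than a small random sample.

```python
# Toy illustration of selection bias: size does not cure selectivity.
import random

random.seed(3)
population = [random.gauss(50, 10) for _ in range(100_000)]   # true mean ~ 50

# Selective inclusion: high-value units are far more likely to appear
# (think: platform users skewed by age, income, education).
big_selective = [x for x in population if random.random() < (0.9 if x > 55 else 0.05)]
small_random = random.sample(population, 500)

true_mean = sum(population) / len(population)
mean_big = sum(big_selective) / len(big_selective)
mean_small = sum(small_random) / len(small_random)
print(round(true_mean, 1), round(mean_big, 1), round(mean_small, 1))
```

Despite holding tens of thousands of records, the selective sample's mean is biased upward by several points, while the 500-unit random sample stays close to the truth.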
  • A Philosophical Framework for Explainable Artificial Intelligence
    Reasons, Values, Stakeholders: A Philosophical Framework for Explainable Artificial Intelligence. Atoosa Kasirzadeh ([email protected]), University of Toronto (Toronto, Canada) & Australian National University (Canberra, Australia). arXiv:2103.00752v1 [cs.CY] 1 Mar 2021. Abstract: The societal and ethical implications of the use of opaque artificial intelligence systems for consequential decisions, such as welfare allocation and criminal justice, have generated a lively debate among multiple stakeholder groups, including computer scientists, ethicists, social scientists, policy makers, and end users. However, the lack of a common language or a multi-dimensional framework to appropriately bridge the technical, epistemic, and normative aspects of this debate prevents the discussion from being as productive as it could be. Drawing on the philosophical literature on the nature and value of explanations, this paper offers a multi-faceted framework that brings more conceptual precision to the present debate by (1) identifying the types of explanations that are most pertinent to artificial intelligence predictions, (2) recognizing the relevance and importance of social and ethical values for the evaluation of these explanations, and (3) demonstrating the importance of these explanations for incorporating a diversified approach to improving the design of truthful algorithmic ecosystems. The proposed philosophical framework thus lays the groundwork for establishing a pertinent connection between the technical and ethical aspects of artificial intelligence systems. Keywords: Ethical Algorithm, Explainable AI, Philosophy of Explanation, Ethics of Artificial Intelligence. Preprint submitted to Elsevier.

    1. Preliminaries
    Governments and private actors are seeking to leverage recent developments in artificial intelligence (AI) and machine learning (ML) algorithms for use in high-stakes decision making.
  • An Algorithmic Inference Approach to Learn Copulas
    An Algorithmic Inference Approach to Learn Copulas. Bruno Apolloni, Department of Computer Science, University of Milano, Via Celoria 18, 20133, Milano, Italy. [email protected]. October 8, 2019. arXiv:1910.02678v1 [stat.ML] 7 Oct 2019. Abstract: We introduce a new method for estimating the parameter α of the bivariate Clayton copulas within the framework of Algorithmic Inference. The method consists of a variant of the standard bootstrapping procedure for inferring random parameters, which we expressly devise to bypass the two pitfalls of this specific instance: the non-independence of the Kendall statistics, customarily at the basis of this inference task, and the absence of a sufficient statistic w.r.t. α. The variant is rooted in a numerical procedure that finds the α estimate at a fixed point of an iterative routine. Although paired with the customary complexity of the program which computes them, numerical results show an outperforming accuracy of the estimates. Keywords: Copulas' inference, Clayton copulas, Algorithmic Inference, Bootstrap methods.

    1 Introduction
    Copulas are the atoms of the stochastic dependency between variables. In short, the Sklar Theorem (Sklar 1973) states that we may split the joint cumulative distribution function (CDF) F_{X1,...,Xn} of variables X1, ..., Xn into the composition of their marginal CDFs F_{Xi}, i = 1, ..., n, through their copula C_{X1,...,Xn}. Namely:

    F_{X1,...,Xn}(x1, ..., xn) = C_{X1,...,Xn}(F_{X1}(x1), ..., F_{Xn}(xn))    (1)

    While the random variable F_{Xi}(Xi) is a uniform variable U on [0, 1] for whatever continuous F_{Xi}, thanks to the Probability Integral Transform Theorem (Rohatgi 1976) (with an obvious extension to the discrete case), C_{X1,...,Xn}(F_{X1}(X1), ..., F_{Xn}(Xn)), hence C_{X1,...,Xn}(U1, ..., Un), has a specific distribution law which characterizes the dependence between the variables.
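For intuition, here is a conventional moment-style estimate of the Clayton parameter, not the Algorithmic Inference bootstrap the paper proposes: sample from the copula by the conditional method and invert Kendall's tau, using the Clayton relation tau = theta / (theta + 2). The data and parameter value are invented for the sketch.

```python
# Sample a bivariate Clayton copula and recover theta by inverting Kendall's tau.
import random

random.seed(4)
theta = 2.0
pairs = []
for _ in range(2000):
    u, w = random.random(), random.random()
    # conditional quantile of V given U = u for the Clayton copula
    v = ((w ** (-theta / (1.0 + theta)) - 1.0) * u ** (-theta) + 1.0) ** (-1.0 / theta)
    pairs.append((u, v))

# Kendall's tau by counting concordant vs discordant pairs (O(n^2), fine here)
conc = disc = 0
for i in range(len(pairs)):
    for j in range(i + 1, len(pairs)):
        s = (pairs[i][0] - pairs[j][0]) * (pairs[i][1] - pairs[j][1])
        if s > 0:
            conc += 1
        elif s < 0:
            disc += 1
tau = (conc - disc) / (conc + disc)
theta_hat = 2.0 * tau / (1.0 - tau)    # invert tau = theta / (theta + 2)
print(round(theta_hat, 2))
```

The non-independence of the Kendall statistics that this naive estimator quietly ignores is exactly the first pitfall the abstract's bootstrap variant is designed to bypass.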
  • Study On: Tools for Causal Inference from Cross-Sectional Innovation Surveys with Continuous Or Discrete Variables
    Study on: Tools for causal inference from cross-sectional innovation surveys with continuous or discrete variables. Dominik Janzing. June 28, 2016. Abstract: The design of effective public policy, in innovation support and in other areas, has been constrained by the difficulties involved in understanding the effects of interventions. The limited ability of researchers to unpick causal relationships means that our understanding of the likely effects of public policy remains relatively poor. If improved methods could be found to unpick causal connections, our understanding of important phenomena and the impact of policy interventions could be significantly improved, with important implications for the future social prosperity of the EU. Recently, a series of new techniques have been developed in the natural and computing sciences that are enabling researchers to unpick causal links that were largely impossible to explore using traditional methods. Such techniques are currently at the cutting edge of research, and remain technical for the non-specialist, but are slowly diffusing to the broader academic community. Economic analysis of innovation data, in particular, stands to gain a lot from applying techniques developed by the machine learning community. Writing recently in the Journal of Economic Perspectives, Varian (2014, p. 3) explains: "my standard advice to graduate students these days is go to the computer science department and take a class in machine learning. There have been very fruitful collaborations between computer scientists and statisticians in the last decade or so, and I expect collaborations between computer scientists and econometricians will also be productive in the future." However, machine learning techniques have not yet been applied to innovation policy.
  • Predictive Inference for Non-Probability Samples
    Discussion Paper: Predictive inference for non-probability samples: a simulation study. Bart Buelens, Joep Burger, Jan van den Brakel. CBS | Discussion Paper, 2015 | 13. The views expressed in this paper are those of the author(s) and do not necessarily reflect the policies of Statistics Netherlands. Abstract: Non-probability samples provide a challenging source of information for official statistics, because the data generating mechanism is unknown. Making inference from such samples therefore requires a novel approach compared with the classic approach of survey sampling. Design-based inference is a powerful technique for random samples obtained via a known survey design, but cannot legitimately be applied to non-probability samples such as big data and voluntary opt-in panels. We propose a framework for such non-probability samples based on predictive inference. Three classes of methods are discussed. Pseudo-design-based methods are the simplest and apply traditional design-based estimation despite the absence of a survey design; model-based methods specify an explicit model and use that for prediction; algorithmic methods from the field of machine learning produce predictions in a non-linear fashion through computational techniques. We conduct a simulation study with a real-world data set containing annual mileages driven by cars for which a number of auxiliary characteristics are known. A number of data generating mechanisms are simulated, and, in the absence of a survey design, a range of methods for inference are applied and compared to the known population values. The first main conclusion from the simulation study is that unbiased inference from a selective non-probability sample is possible, but access to the variables explaining the selection mechanism underlying the data generating process is crucial.
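A toy version of the paper's comparison, with an invented population and selection mechanism, shows a pseudo-design-based (unweighted) estimate failing while a model-based predictive estimate recovers the population mean, because the auxiliary variable driving selection is in the model:

```python
# Selective (non-probability) sample + auxiliary x known for the whole population.
import random

random.seed(5)
N = 20_000
x = [random.uniform(0, 10) for _ in range(N)]
y = [2.0 * xi + random.gauss(0, 1) for xi in x]          # y depends on x
true_mean = sum(y) / N

# Selective sample: high-x units are much more likely to be observed.
sample = [i for i in range(N) if random.random() < (0.20 if x[i] > 7 else 0.01)]

naive = sum(y[i] for i in sample) / len(sample)          # pseudo-design-based, no weights

# Model-based: fit y ~ b0 + b1*x on the sample, predict y for every population unit.
xs = [x[i] for i in sample]
ys = [y[i] for i in sample]
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
b1 = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sum((a - mx) ** 2 for a in xs)
b0 = my - b1 * mx
model_based = sum(b0 + b1 * xi for xi in x) / N
print(round(true_mean, 2), round(naive, 2), round(model_based, 2))
```

This mirrors the paper's first main conclusion: the model-based estimate is unbiased here only because x, the variable explaining selection, is available for every unit.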
  • Engines of Order: a Mechanology of Algorithmic Techniques
    Engines of Order: A Mechanology of Algorithmic Techniques. Bernhard Rieder. Amsterdam University Press. The book series RECURSIONS: THEORIES OF MEDIA, MATERIALITY, AND CULTURAL TECHNIQUES provides a platform for cutting-edge research in the field of media culture studies with a particular focus on the cultural impact of media technology and the materialities of communication. The series aims to be an internationally significant and exciting opening into emerging ideas in media theory, ranging from media materialism and hardware-oriented studies to ecology, the post-human, the study of cultural techniques, and recent contributions to media archaeology. The series revolves around key themes: the material underpinning of media theory; new advances in media archaeology and media philosophy; studies in cultural techniques. These themes resonate with some of the most interesting debates in international media studies, where non-representational thought, the technicity of knowledge formations, and new materialities expressed through biological and technological developments are changing the vocabularies of cultural theory. The series is also interested in the mediatic conditions of such theoretical ideas and in developing them as media theory. Editorial Board: Jussi Parikka (University of Southampton); Anna Tuschling (Ruhr-Universität Bochum); Geoffrey Winthrop-Young (University of British Columbia). This publication is funded by the Dutch Research Council (NWO). Chapter 1 contains passages from Rieder, B. (2016). Big Data and the Paradox of Diversity. Digital Culture & Society 2(2), 1-16, and Rieder, B. (2017). Beyond Surveillance: How Do Markets and Algorithms 'Think'? Le Foucaldien 3(1), n.p.
  • Springer Engineering News 6/2008 (catalog excerpt)
    58 Engineering, Springer News 6/2008, springer.com/booksellers.

    S. Akella, N. Amato, W. Huang, B. Mishra (Eds.). Algorithmic Foundation of Robotics VII: Selected Contributions of the Seventh International Workshop on the Algorithmic Foundations of Robotics. Algorithms are a fundamental component of robotic systems: they control or reason about motion and perception in the physical world. They receive input from noisy sensors, consider geometric and physical constraints, and operate on the world through imprecise actuators. The design and analysis of robot algorithms therefore raises a unique combination of questions in control theory, computational and differential geometry, and computer science.

    B. Apolloni, University of Milano, Italy; W. Pedrycz, University of Alberta, Edmonton, Canada; S. Bassis, D. Malchiodi, University of Milano, Italy. The Puzzle of Granular Computing. "Rem tene, verba sequentur", Gaius J. Victor, Rome, VI century b.c. The ultimate goal of this book is to bring the fundamental issues of information granularity, inference tools and problem solving procedures into a coherent, unified, and fully operational framework. The objective is to offer the reader a comprehensive, self-contained, and uniform exposure to the subject. The strategy is to isolate some fundamental bricks of Computational Intelligence in terms of key problems and methods, and discuss
  • An Algorithmic Information Calculus for Causal Discovery and Reprogramming Systems
    An Algorithmic Information Calculus for Causal Discovery and Reprogramming Systems. Hector Zenil (a,b,c,d,e; *^), Narsis A. Kiani (a,b,d,e; *), Francesco Marabita (b,d), Yue Deng (b), Szabolcs Elias (b,d), Angelika Schmidt (b,d), Gordon Ball (b,d), & Jesper Tegnér (b,d,f; ^). a) Algorithmic Dynamics Lab, Center for Molecular Medicine, Karolinska Institutet, Stockholm, 171 76, Sweden; b) Unit of Computational Medicine, Center for Molecular Medicine, Department of Medicine, Solna, Karolinska Institutet, Stockholm, 171 76, Sweden; c) Department of Computer Science, University of Oxford, Oxford, OX1 3QD, UK; d) Science for Life Laboratory, Solna, 171 65, Sweden; e) Algorithmic Nature Group, LABORES for the Natural and Digital Sciences, Paris, 75006, France; f) Biological and Environmental Sciences and Engineering Division, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia. * Shared first authors. ^ Corresponding authors. Author Contributions: HZ, NK and JT are responsible for the general design and conception. HZ and NK are responsible for data acquisition. HZ, NK, YD, FM, SE and AS contributed to data analysis. HZ developed the methodology, with key contributions from NK and JT. HZ undertook most of the numerical experiments, with YD and FM contributing. HZ, AS, GB and SE contributed the literature-based Th17 enrichment analysis. HZ and JT wrote the article, with key contributions from NK. Correspondence should be addressed to HZ: [email protected]
  • Large Scale Multinomial Inferences and Its Applications in Genome Wide Association Studies
    Large Scale Multinomial Inferences and Its Applications in Genome Wide Association Studies. Chuanhai Liu and Jun Xie. Abstract: Statistical analysis of multinomial counts with a large number K of categories and a small number n of sample size is challenging to both frequentist and Bayesian methods and requires thinking about statistical inference at a very fundamental level. Following the framework of the Dempster-Shafer theory of belief functions, a probabilistic inferential model is proposed for this "large K and small n" problem. The inferential model produces a probability triplet (p, q, r) for an assertion conditional on observed data. The probabilities p and q are for and against the truth of the assertion, whereas r = 1 - p - q is the remaining probability, called the probability of "don't know". The new inference method is applied in a genome-wide association study with very high dimensional count data, to identify associations between genetic variants and the disease Rheumatoid Arthritis.

    1 Introduction
    We consider statistical inference for the comparison of two large-scale multinomial distributions. This problem is motivated by genome-wide association studies (GWAS) with very high dimensional count data, i.e., single nucleotide polymorphism (SNP) data. SNPs are major genetic variants that may be associated with common diseases such as cancer and heart disease. A SNP has three possible genotypes: wild type homozygous, heterozygous, and mutation (rare) homozygous. In genome-wide association studies, genotypes of over 500,000 SNPs are typically measured for disease (case) and normal (control) subjects, resulting in a large amount of count data.
  • Tight Bounds for SVM Classification Error
    Tight bounds for SVM classification error. B. Apolloni, S. Bassis, S. Gaito, D. Malchiodi. Dipartimento di Scienze dell'Informazione, Università degli Studi di Milano, Via Comelico 39/41, 20135 Milano, Italy. E-mail: {apolloni, bassis, gaito, malchiodi}@dsi.unimi.it. Abstract: We find very tight bounds on the accuracy of a Support Vector Machine classification error within the Algorithmic Inference framework. The framework is especially suitable for this kind of classifier since (i) we know the number of support vectors really employed, as an ancillary output of the learning procedure, and (ii) we can appreciate confidence intervals of the misclassification probability exactly as a function of the cardinality of these vectors. As a result we obtain confidence intervals that are up to an order narrower than those supplied in the literature, having slightly different meaning due to the different approach they come from, but the same operational function. We numerically check the covering of these intervals.

    I. INTRODUCTION
    Support Vector Machines (SVM for short) [1] represent an operational tool widely used by the Machine Learning community. The classification problem lies in finding a separating hyperplane, i.e. an h in H such that all the points with a given label belong to one of the two half-spaces determined by h. In order to obtain such an h, an SVM first computes the solution {α*_1, ..., α*_m} of a dual constrained optimization problem:

    max_α  Σ_{i=1}^{m} α_i - (1/2) Σ_{i,j=1}^{m} α_i α_j y_i y_j x_i · x_j    (1)
    subject to  Σ_{i=1}^{m} α_i y_i = 0    (2)
                α_i ≥ 0,  i = 1, ..., m    (3)

    where · denotes the standard dot product in R^n, and then ...
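For a concrete feel for the dual problem (1)-(3) this paper quotes, the two-point case can be solved by brute force: the equality constraint forces the two multipliers to be equal, so the dual collapses to a one-dimensional concave maximization. Toy data and a grid search are used purely for clarity:

```python
# Two-point hard-margin SVM solved through its dual.
x1, y1 = (0.0, 0.0), -1
x2, y2 = (2.0, 0.0), +1

def dual(alpha):
    # W(alpha) = sum_i alpha_i - 1/2 sum_ij alpha_i alpha_j y_i y_j (x_i . x_j),
    # with alpha_1 = alpha_2 = alpha enforced by sum_i alpha_i y_i = 0.
    dot = lambda a, b: a[0] * b[0] + a[1] * b[1]
    pts = [(x1, y1), (x2, y2)]
    lin = 2 * alpha
    quad = sum(alpha * alpha * yi * yj * dot(xi, xj) for xi, yi in pts for xj, yj in pts)
    return lin - 0.5 * quad

alpha = max((i / 1000.0 for i in range(2001)), key=dual)     # grid over [0, 2]
w = (alpha * y1 * x1[0] + alpha * y2 * x2[0],
     alpha * y1 * x1[1] + alpha * y2 * x2[1])                # w = sum_i alpha_i y_i x_i
b = y2 - (w[0] * x2[0] + w[1] * x2[1])                       # from y_2 (w.x_2 + b) = 1
print(alpha, w, b)
```

Both points end up as support vectors (alpha > 0 for each), matching the paper's point (i): the number of support vectors actually employed falls out of the learning procedure itself.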