The Statistical-Model Toolbox: A SAS* Macro System

David S. Frankel, Exxon Company, U.S.A.

Abstract

The statistical-model toolbox (SMT) is a SAS macro system written in the Production Department of Exxon Company, U.S.A., that provides two powerful capabilities: a systematic way to model scattered data and to model calculated results that are based on the scattered data; and a way to create and manipulate synthetic probability distributions in the absence of measured data. The Production Department uses these capabilities to address problems in petroleum reservoir description, where rock properties are stochastic by nature. However, the tools are completely general and can be applied to any continuous, numeric, random variables. The most frequently used tool calculates expected values of arbitrary functions of one or two random variables. Other tools display distributions, calculate statistics, and generate random samples.

Because it is organized as a set of tools, the SMT provides sufficient flexibility to handle a wide range of applications. A minimum of programming expertise and effort is required to use the tools, and the SMT verifies all input specifications to provide a high level of fault tolerance.

Background

Geologists and petroleum engineers who attempt to quantitatively describe petroleum reservoirs must deal with problems similar to those in other disciplines: predicting values of one variable based on the measured value of a second variable, and evaluating functions of the predicted variable.

The prediction problem is usually handled using regression techniques, most often ordinary least squares, which predicts the most likely value. Once regression has been completed, the raw data are discarded, leaving the regression curve as the sole model of the data. Functions of the predicted variable are evaluated by substituting unique values as calculated from the equation of the regression curve.

Figure 1 depicts this conventional approach, which has advantages and drawbacks.

The advantages are that it is easy to understand, easy to automate, and easy to display. Most people feel comfortable using this approach.

Its main drawback is that the regression equation by itself is an incomplete model of the population; it only predicts the central tendency of the data. This is not a serious drawback if the objective is to predict the expected value of the variable or of a linear function of the variable. However, if the objective is to estimate the expected value of a nonlinear function of the predicted variable and if the scatter in the sample is significant, the conventional approach can lead to significant errors. In this case, it is preferable to model the population in terms of conditional probability-density functions (PDF's) that are determined by central tendency ("location") and by variance ("scale"). PDF's are also referred to as distributions or statistical models.

Figure 2 depicts this statistical-model approach. In a procedure analogous to regression, the location and scale parameters are estimated for normal (Gaussian) PDF's. The expected value of any function of the variable is calculated by integrating the product of the function and the PDF over the entire range of the variable. Computationally, this is simply the probability-weighted average value of the function.

The statistical-model approach is especially valuable in petrophysics because key physical variables are lognormally distributed. That is, in the conventional approach, the most likely value of the logarithm of the variable is predicted. It is not adequate to directly substitute the antilog of this prediction into expressions for functions that depend on the variable.

The statistical-model method also has advantages and drawbacks. Its main advantages are that it provides a more complete description and more accurate expected values of arbitrary functions of the variable compared with the conventional approach. This is to be expected because it retains more information about the sample.

Its drawbacks are that it assumes the PDF's are normal, that these normal PDF's are unbounded in the range of values accessible to the variable, and that the method is difficult to understand and implement.

The assumption that samples of random-variable populations approximately conform to normal PDF's is generally a good one. In most cases where the assumption is not adequate, a simple transformation can be found to render the PDF's of the transformed variable approximately normal. In the rare case where even this is not possible, the statistical-model approach could be handled in terms of a specific non-normal type of probability-density function.

The SAS macro system called the Statistical-Model Toolbox (SMT) successfully overcomes the other two drawbacks. The SMT does provide for truncation of the range of the random variable, and it makes the method simple to implement. The user need not be a statistician, although he must reorient his mental image of the process from predicting unique values to integrating over PDF's.

The Statistical-Model Toolbox (SMT)

The SMT is a system for managing univariate PDF's for continuous, numeric, random variables. The SMT is a set of SAS version 5.16 name-style macros that uses only keyword parameters. The macros were written for IBM/OS and are normally invoked in the background environment. No user interface has yet been provided to construct the macro invocations, but a high degree of fault tolerance is provided by input verification utilities within the macro system.

Figure 3 is a conceptual flowchart for the SMT. The main elements of the system are: macros that make PDF's; statistical-model datasets (SMDS's), which serve as databases from which PDF's can be extracted and applied; and application macros that use extracted PDF's to perform specific tasks.

The SMT also includes auxiliary macros that are used at either end of this logical flowchart, to aid in the description of samples, or to look up answers in output datasets created by the application macros.

Figure 4 is a detailed flowchart that shows the actual macros. The following discussion summarizes key macros and concepts.

Making Models

Models (PDF's) can be made by performing parameter estimation on samples or by creating hypothetical models in the absence of data. Both classes of models accommodate double truncation, single truncation, or no truncation in the permissible range of the random variable. This is an important consideration when dealing with physical variables or with constructed variables (e.g., ratios and percentages) whose values must be confined to known intervals.

The PARMEST macro estimates parameters for normal PDF's by applying maximum-likelihood estimation to samples. Both truncated and censored samples can be handled. A truncated sample has no values in a certain range (usually due to measurement or reporting problems) although these values are known to exist in the population. Censored samples contain inexact observations, which are effectively error bars rather than exact data. PARMEST can derive a marginal PDF from a univariate sample, or it can derive a class of conditional PDF's from a bivariate sample. For the latter case, either a linear or quadratic relation between the regressor and the location of the conditional PDF may be chosen, and the location may optionally be restricted to match a specified value.

PARMEST embodies proprietary parameter-estimation techniques; it is not just a shell that calls SAS statistics procedures.

[...] grid-cells into trapezoidal PDF's that approximate the distribution of elevations above a datum in each cell. These PDF's are useful in integrating functions that depend on depth.

GENNORM and GENLNORM build hypothetical PDF's (marginal or conditional) using specified statistics for normal and lognormal populations, respectively. The statistics may be percentiles, the arithmetic mean, or (for GENLNORM) any arbitrary power mean (e.g., harmonic mean, geometric mean, root-mean-square, etc.).

The GEN macro makes PDF's using primitive model parameters. Whereas PARMEST, GENNORM, and GENLNORM make normal (Gaussian) PDF's, GEN can also make uniform (rectangular), generalized triangular (i.e., symmetric or asymmetric), trapezoidal, and spike (delta-function) PDF's. These non-normal PDF's may be used on an equal basis with normal PDF's that are derived by parameter estimation.

Statistical-Model Datasets (SMDS's)

The core of the SMT is one or more statistical-model datasets (SMDS's). An SMDS is a SAS dataset (normally permanent) that contains in canonical form all of the essential details about one or more statistical models. The user need not be concerned with the detailed contents of an SMDS. It is sufficient to know that PDF's can (and must) be extracted from an SMDS to be used by an application macro.

Each observation in an SMDS represents a marginal PDF or a class of conditional PDF's. Stored details include: the type (i.e., one of the five available standard forms); the location parameter or an equation to generate it; the scale parameter; the shape parameter (triangular and trapezoidal types only); lower and upper truncation limits; and a parameter covariance matrix if the PDF resulted from parameter estimation.

One detail not stored by an SMDS is what transformation, if any, was applied to the sample (or implied for the population) before capturing its description in the SMDS. The user must keep track of this transformation and be able to invert it.

An SMDS can be the target of two macros: the QUERY macro, which lists relevant details to allow the user to identify the contents of the SMDS; and the EXTRACT macro, which the user invokes to prepare a specific univariate model for subsequent use by an application macro.

Application Macros

The application macros do calculations or make plots for either functions of one or two random variables or for the PDF itself.
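The core computation described above, the probability-weighted average of an arbitrary function of a lognormally distributed variable, is easy to sketch outside SAS. The following Python sketch (illustrative only; the function names, integration limits, and parameter values are assumptions, not part of the SMT) contrasts the conventional shortcut of substituting the antilog of the predicted logarithm into a nonlinear function with the statistical-model estimate obtained by integrating the product of the function and the PDF:

```python
import math

def expected_value(f, pdf, lo, hi, n=200_000):
    """Probability-weighted average of f over [lo, hi]:
    numerically integrates f(x) * pdf(x) dx by the midpoint rule."""
    dx = (hi - lo) / n
    total = 0.0
    norm = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * dx
        w = pdf(x) * dx
        total += f(x) * w
        norm += w  # renormalize in case [lo, hi] truncates the PDF
    return total / norm

# Example: X is lognormal, i.e. log10(X) ~ Normal(mu, sigma).
mu, sigma = 1.0, 0.5

def pdf(x):
    # Density of X itself (change of variables from log10(X)).
    z = (math.log10(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * math.log(10) * sigma * math.sqrt(2 * math.pi))

f = lambda x: x ** 2      # a nonlinear function of the variable

naive = (10 ** mu) ** 2   # antilog of the predicted log, then squared -> 100
proper = expected_value(f, pdf, 1e-3, 1e5)
# Analytically E[X^2] = 100 * exp(2 * ln(10)^2 * sigma^2),
# about 14x larger than the naive estimate.
```

With significant scatter (here sigma = 0.5 in log10 units), the naive antilog substitution understates the expected value of x^2 by roughly a factor of 14, which is exactly the kind of error the text attributes to the conventional approach for nonlinear functions.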