HST 190: Introduction to Biostatistics


Lecture 7: Logistic regression

Logistic regression
• We've previously discussed linear regression methods for predicting continuous outcomes
  § Functionally, predicting the mean at particular covariate levels
• What if we want to predict values for a dichotomous categorical variable, instead of a continuous one?
  § This corresponds to predicting the probability of the outcome variable being a "1" versus "0"
• Can we just use linear regression for a 0-1 outcome variable?

• Consider modeling the probability that a person receives a physical exam in a given year as a function of income.
  § A sample of individuals is collected. Each individual reports income and whether he/she went to the doctor last year.

      patient #   y = checkup   x = income
          1            1          32,000
          2            0          28,000
          3            0          41,000
          4            1          38,000
        etc.

• Plotting these data and fitting a linear regression line, we see that the linear model is not tailored to this type of outcome
  § For example, an income of $500,000 yields a predicted probability of visiting the doctor greater than 1!

Logit transformation
• To overcome this problem, we define the logit transformation: if $0 < p < 1$, then $\operatorname{logit}(p) = \ln\frac{p}{1-p}$
  § Notice that as $p \uparrow 1$, $\operatorname{logit}(p) \uparrow \infty$, and as $p \downarrow 0$, $\operatorname{logit}(p) \downarrow -\infty$
• Thus, $\operatorname{logit}(p)$ can take any continuous value, so we will fit a linear model on this transformed outcome instead
• Write this type of model generally as $g\big(E(y)\big) = \alpha + \beta_1 x_1 + \cdots + \beta_k x_k$
  § where $E(y) = 1 \cdot P(y=1) + 0 \cdot P(y=0) = P(y=1) = p$ and $g\big(E(y)\big) = \operatorname{logit}\big(E(y)\big)$
  § This model is called a logistic regression model or a logit model
  § By comparison, the linear regression model takes $g\big(E(y)\big) = E(y)$

• A key benefit of fitting a logit model rather than using contingency table methods is the ability to adjust for multiple covariates (including continuous covariates) simultaneously, as in the sketch below

      patient #   y = checkup   income   age   gender
          1            1        32,000    60     F
          2            0        28,000    53     M
          3            0        41,000    45     M
          4            1        38,000    40     F
        etc.
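To make this concrete, here is a minimal sketch of fitting such a multi-covariate model in Python with statsmodels. This is not from the original slides: the data are simulated, and the "true" coefficients in the data-generating step are assumptions chosen only for illustration.

```python
# Sketch: fit logit(p) = alpha + b1*income + b2*age + b3*woman by maximum likelihood.
# All data and generating coefficients are simulated/assumed for illustration.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "income": rng.normal(40_000, 10_000, n),
    "age":    rng.integers(25, 75, n),
    "woman":  rng.integers(0, 2, n),   # dummy variable: 1 = female, 0 = male
})

# Assumed true model, used only to generate the 0/1 outcomes
lin_pred = -3 + 5e-5 * df["income"] + 0.02 * df["age"] + 0.5 * df["woman"]
df["checkup"] = rng.binomial(1, 1 / (1 + np.exp(-lin_pred)))

fit = smf.logit("checkup ~ income + age + woman", data=df).fit()
print(fit.summary())   # each beta-hat with its standard error, z statistic, and p-value
```

The fitted coefficients are on the log-odds scale, matching $\operatorname{logit}(p) = \alpha + \beta_1 x_1 + \cdots + \beta_k x_k$ above.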
• To interpret the parameters, compare the fit for a man and a woman:
  § $\operatorname{logit}(p_{\mathrm{woman}}) = \alpha + \beta_{\mathrm{age}} x_{\mathrm{age}} + \beta_{\mathrm{income}} x_{\mathrm{income}} + \beta_{\mathrm{woman}}$
  § $\operatorname{logit}(p_{\mathrm{man}}) = \alpha + \beta_{\mathrm{age}} x_{\mathrm{age}} + \beta_{\mathrm{income}} x_{\mathrm{income}}$

$$\Rightarrow \operatorname{logit}(p_{\mathrm{man}}) = \operatorname{logit}(p_{\mathrm{woman}}) - \beta_{\mathrm{woman}}$$
$$\Leftrightarrow \ln\frac{p_{\mathrm{woman}}}{1-p_{\mathrm{woman}}} - \ln\frac{p_{\mathrm{man}}}{1-p_{\mathrm{man}}} = \beta_{\mathrm{woman}}$$
$$\Leftrightarrow \ln\frac{p_{\mathrm{woman}}(1-p_{\mathrm{man}})}{p_{\mathrm{man}}(1-p_{\mathrm{woman}})} = \beta_{\mathrm{woman}}$$
$$\Leftrightarrow \frac{p_{\mathrm{woman}}(1-p_{\mathrm{man}})}{p_{\mathrm{man}}(1-p_{\mathrm{woman}})} = \frac{\mathrm{odds}_{\mathrm{woman}}}{\mathrm{odds}_{\mathrm{man}}} = e^{\beta_{\mathrm{woman}}}$$

• So, $\beta_{\mathrm{woman}}$ is the log of the odds ratio for getting a checkup comparing women to men, adjusting for age and income
• This result holds for any dichotomous variable in the model
  § This allows us to estimate the odds ratio relating a given exposure to disease in a regression, accounting for the effects of other variables

• In a logistic regression $\operatorname{logit}(p) = \alpha + \beta_1 x_1 + \cdots + \beta_k x_k$, denote the fitted parameter estimates as $\hat\alpha, \hat\beta_1, \ldots, \hat\beta_k$
• If $x_j$ is a dichotomous exposure, then the estimated odds ratio relating this exposure to the outcome is $\widehat{OR} = e^{\hat\beta_j}$
• If instead $x_j$ is a continuous exposure, then the above odds ratio and CI describe the outcome's association with a one-unit increase in the exposure, adjusting for other covariates
  § e.g., "a one-unit increase in age is associated on average with an $e^{\hat\beta_{\mathrm{age}}}$-fold change in the odds of getting a checkup, holding gender constant."

Hypothesis testing and confidence intervals
• For an estimated coefficient $\hat\beta_j$ in a logistic model, the corresponding $100(1-\alpha)\%$ CI for the odds ratio is
$$\left( e^{\hat\beta_j - z_{1-\alpha/2}\,\mathrm{se}(\hat\beta_j)},\; e^{\hat\beta_j + z_{1-\alpha/2}\,\mathrm{se}(\hat\beta_j)} \right)$$
  § Matlab or other software will provide both $\hat\beta_j$ and $\mathrm{se}(\hat\beta_j)$
  § Take note of whether you are given $\hat\beta_j$ or $\widehat{OR} = e^{\hat\beta_j}$ in software output! This differs between programs
• Testing the hypothesis $H_0\colon \beta_j = 0$ versus $H_1\colon \beta_j \neq 0$ is a z-test that is typically provided as part of software output
  § If the null is true, $z = \hat\beta_j / \mathrm{se}(\hat\beta_j)$ is approximately $N(0,1)$
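Reusing the `fit` object from the earlier simulated-checkup sketch, the odds ratios, their CIs, and the z-tests can be recovered from the fitted coefficients along these lines:

```python
# Sketch, reusing `fit` from the previous block: exponentiate each beta-hat
# and its CI endpoints, per OR-hat = exp(beta-hat).
import numpy as np
import pandas as pd

or_table = np.exp(pd.concat([fit.params, fit.conf_int(alpha=0.05)], axis=1))
or_table.columns = ["OR", "2.5%", "97.5%"]
print(or_table)

# z-test of H0: beta_j = 0, using z = beta-hat / se(beta-hat)
print(fit.params / fit.bse)   # z statistics (also shown in fit.summary())
print(fit.pvalues)            # two-sided p-values
```

Note that statsmodels reports $\hat\beta_j$ rather than $e^{\hat\beta_j}$ — exactly the software-specific convention the slide warns about.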
Interaction terms
• As in linear regression, we can also incorporate interaction terms in a logistic regression model:
$$\operatorname{logit}(p) = \alpha + \beta_{\mathrm{age}} x_{\mathrm{age}} + \beta_{\mathrm{income}} x_{\mathrm{income}} + \beta_{\mathrm{woman}} x_{\mathrm{woman}} + \beta_{\mathrm{age:woman}} x_{\mathrm{age}} x_{\mathrm{woman}}$$
• $\beta_{\mathrm{age:woman}}$ captures the presence of an interaction effect, or effect modification of gender by age
  § e.g., the gender effect on the probability of getting an annual checkup is greater among younger people

Model building for inference
• The techniques for variable selection in logistic regression are similar to those for linear regression
  § The biggest challenge is the lack of comparable visual fit diagnostics like residual plots
• When model building for studies of association between exposure and outcome, the focus is on including sources of confounding (i.e., external variables associated with both exposure and outcome)
• One strategy is to fit and report the following three models:
  1) an unadjusted or minimally adjusted model
  2) a model that includes 'core' confounders ('primary' model)
     o clear indication from scientific knowledge and/or the literature
     o consensus among investigators
  3) a model that includes 'core' confounders plus any 'potential' confounders
     o indication is less certain

Logistic regression in retrospective setting
• How do we interpret the intercept in $\operatorname{logit}(p) = \alpha + \beta_1 x_1 + \cdots + \beta_k x_k$?
  § $\alpha = \ln\frac{p}{1-p}$ is the log odds of experiencing the outcome in the population among subjects with $x_1 = \cdots = x_k = 0$
  § It links the model to the absolute prevalence of the outcome in the population
• What happens to the logit model if our sampling is case-control (or retrospective)?
  § That is, what if we sample based on outcome status?
  § e.g., sample 100 patients with a disease and 100 patients without
• Typically this setting artificially selects more cases than would arise naturally under cross-sectional or prospective sampling
  § so we cannot readily use the sample to describe the true probability of disease in the population

• Thus, we see that the intercept $\alpha$ is no longer meaningful in a logistic regression using case-control sampled data
  § What about the other estimates?
• Recall we showed that using contingency tables to compute odds ratios was valid in both prospective and retrospective sampling designs
• It turns out that the same is true for the estimated coefficients in logistic regression! Just as before, $\widehat{OR} = e^{\hat\beta_j}$
  § all other inference (tests, CIs) is also the same
• The estimated odds ratio of the outcome between exposed and unexposed groups is the same even if the 'absolute' proportion of cases sampled is higher; the small simulation below illustrates this
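The following simulation sketch (not from the slides; all parameters are assumed) checks this empirically: under case-control sampling the intercept estimate shifts, but the exposure coefficient, and hence the odds ratio, stays near its full-cohort value.

```python
# Sketch: the slope (log OR) survives case-control sampling; the intercept does not.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 200_000
x = rng.binomial(1, 0.3, n)                    # binary exposure
p = 1 / (1 + np.exp(-(-4 + 1.0 * x)))          # assumed true log OR = 1, rare outcome
y = rng.binomial(1, p)

def intercept_and_slope(y, x):
    # returns [alpha-hat, beta-hat] from a logistic fit
    return sm.Logit(y, sm.add_constant(x.astype(float))).fit(disp=0).params

print("full cohort: ", intercept_and_slope(y, x))

# Case-control sample: keep every case, draw an equal number of controls
cases = np.flatnonzero(y == 1)
controls = rng.choice(np.flatnonzero(y == 0), size=cases.size, replace=False)
idx = np.concatenate([cases, controls])
print("case-control:", intercept_and_slope(y[idx], x[idx]))
# The intercepts differ (the case fraction is inflated by design),
# but both slope estimates should be near the true log OR of 1.
```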
Matched case-control designs
• To further increase the statistical efficiency of a study, researchers may create a matched case-control design
  § For every case sampled, one or more controls is selected based on similarity to the case
     o Matching each case with $m$ controls is called $1\!:\!m$ matching
  § The goal is to correct for potential confounding in the study design
  § e.g., match each case with a noncase of the same age and gender, resulting in two groups having the same distributions of age and gender
• As with a standard case-control design, the analysis then measures association between an exposure of interest and the outcome
  § The exposure of interest is not a factor used for matching
• Matched designs balance the increased cost of matching each subject against higher power and greater potential for causal inference

Analyzing matched case-control designs
• Suppose the sample includes $n$ matched sets; how should we approach analysis?
• Naïve approach: choose one matched set to be the 'baseline,' and include $n-1$ indicator variables for the other sets
  § Essentially, treat each matched set as a level of a categorical variable
• Such a model forces us to estimate the effect of exposure within groups that may only have a few people in them
  § Unstable estimation
  § Cannot generalize estimated comparisons of specific pairs of people
• Instead, we want an analysis that estimates the exposure effect by aggregating across matched sets

Conditional logistic regression
• Instead, researchers use conditional logistic regression to estimate the effect of an exposure of interest, conditioning out the factors used to create the matched sets
• To illustrate, assume a matched pairs design. Let
  § $y_{i1} = 1$, $y_{i0} = 0$ be the disease indicators of the $i$th case-control pair
  § $(x_{i11}, \ldots, x_{i1k})$ and $(x_{i01}, \ldots, x_{i0k})$ be the covariates of the $i$th pair
     o These do not include the 'matched on' factors, which are accounted for in the design
• Then for each pair, define the conditional likelihood contribution
$$L_i(\beta_1, \ldots, \beta_k) = \frac{P(y_{i1} = 1 \cap y_{i0} = 0)}{P(y_{i1} = 1 \cap y_{i0} = 0) + P(y_{i1} = 0 \cap y_{i0} = 1)} = \frac{e^{\sum_{g=1}^{k} \beta_g x_{i1g}}}{e^{\sum_{g=1}^{k} \beta_g x_{i1g}} + e^{\sum_{g=1}^{k} \beta_g x_{i0g}}}$$

• Thus, we compute estimates that maximize the conditional likelihood, $\hat\beta = \arg\max_{\beta} \sum_{i=1}^{n} \ln L_i(\beta_1, \ldots, \beta_k)$ (see the sketch below)
• If $\beta_j$ is the coefficient of the exposure of interest, then as before $\widehat{OR} = e^{\hat\beta_j}$
  § Standard methods for testing and CIs are all the same as before
• Note that because we already adjusted for the factors used for matching, we do not get estimated effects for these factors
  § It would be inappropriate to include matching factors as covariates
• We also do not get an estimated intercept, which makes sense because the intercept is not interpretable in the case-control setting anyway
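As a sketch of how this estimation might be carried out, statsmodels provides a conditional logistic model that takes the matched-set identifier as a grouping factor. The matched-pairs data below are simulated, with exposure probabilities chosen arbitrarily for illustration; treat the exact API usage as an assumption to verify against your statsmodels version.

```python
# Sketch: conditional logistic regression for 1:1 matched pairs.
# Pair membership is passed via `groups`; data are simulated for illustration.
import numpy as np
import pandas as pd
from statsmodels.discrete.conditional_models import ConditionalLogit

rng = np.random.default_rng(2)
n_pairs = 300
pair_id = np.repeat(np.arange(n_pairs), 2)   # i = 1..n, two subjects per pair
y = np.tile([1, 0], n_pairs)                 # case, then its matched control

# Exposure of interest: cases exposed w.p. 0.5, controls w.p. 0.3, so the
# assumed exposure odds ratio is (0.5/0.5)/(0.3/0.7) ≈ 2.33
x = np.where(y == 1,
             rng.binomial(1, 0.5, 2 * n_pairs),
             rng.binomial(1, 0.3, 2 * n_pairs))

# No intercept column: it is conditioned out along with the matching factors
fit = ConditionalLogit(y, pd.DataFrame({"exposure": x}), groups=pair_id).fit()
print(fit.summary())
print("estimated OR:", np.exp(fit.params["exposure"]))   # OR-hat = exp(beta-hat)
```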
Logistic regression modeling for prediction
• Using a fitted logistic regression model, we have so far focused on estimation and inference of associations