Huu Minh PHAM

UNIVERSITY OF EASTERN FINLAND
SCIENCE AND FOREST FACULTY

LEARNING DIARY OF RESEARCH METHODS IN FOREST SCIENCES

Student's name: Pham Huu Minh
Student number: 291366

1. Research process

Science is a systematic and logical approach to discovering how things in the universe work. It is also the body of knowledge accumulated through those discoveries. When conducting research, scientists use the scientific method to collect measurable, empirical evidence in an experiment related to a hypothesis (often in the form of an if/then statement); the results aim to support or contradict a theory. The steps of the scientific method are:

1. Make an observation or observations.
2. Ask questions about the observations and gather information.
3. Form a hypothesis, that is, a tentative description of what has been observed, and make predictions based on that hypothesis.
4. Test the hypothesis and predictions in an experiment that can be reproduced.
5. Analyze the data and draw conclusions; accept or reject the hypothesis, or modify the hypothesis if necessary.
6. Reproduce the experiment until there are no discrepancies between observations and theory.

Statistical analysis is fundamental to any experiment that uses statistics as its research methodology. Most experiments in the social sciences and many important experiments in natural science and engineering need statistical analysis. Statistical analysis is also a very useful tool for obtaining approximate solutions when the actual process is highly complex or unknown in its true form.

Example: The study of turbulence relies heavily on statistical analysis derived from experiments. Turbulence is highly complex and almost impossible to study at a purely theoretical level. Scientists therefore need to rely on a statistical analysis of turbulence through experiments to confirm the theories they propose.

In the social sciences, statistical analysis is at the heart of most experiments.
It is very hard to obtain general theories in these areas that are universally valid. In addition, it is through experiments and surveys that a social scientist is able to confirm theory.

2. Basic concepts in statistics

Mean

The most commonly used measure of center for a quantitative variable is the (arithmetic) sample mean. When people speak of taking an average, it is the mean that they are most often referring to. The sample mean of a variable is the sum of the observed values in a data set divided by the number of observations:

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

Variance

There are several different measures of variation; three of the most frequently used are the sample range, the sample interquartile range and the sample standard deviation. Measures of variation are mostly used for quantitative variables.

The sample range of a variable is the difference between its maximum and minimum values in a data set: Range = Max − Min. The range is quite easy to compute, but it ignores a great deal of information: only the largest and smallest values of the variable are considered, and the other observed values are disregarded. It should also be remarked that the range can never decrease, but can increase, when additional observations are included in the data set; in this sense the range is overly sensitive to the sample size.

Std Deviation

The sum

$$\sum_{i=1}^{n} (x_i - \bar{x})^2$$

is called the sum of squared deviations and provides a measure of total deviation from the mean for all the observed values of the variable. Once the sum of squared deviations is divided by n − 1, we get

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

which is called the sample variance. The sample standard deviation is its square root and has the following alternative formulas:

$$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2} \tag{1}$$

$$s = \sqrt{\frac{1}{n-1} \left( \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 \right)} \tag{2}$$

$$s = \sqrt{\frac{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2}{n(n-1)}} \tag{3}$$

Formulas (2) and (3) are useful from the computational point of view. In hand calculation, use of these alternative formulas often reduces the arithmetic work, especially when $\bar{x}$ turns out to be a number with many decimal places.
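As a small illustration of the measures in this section, the sketch below computes the sample mean, range, variance and standard deviation in Python. The function name `describe` and the diameter values are made up for the example; the variance is computed twice, once from the squared deviations and once from a computational shortcut based on the raw sums, to show that both give the same number.

```python
import math


def describe(values):
    """Return the mean, range, sample variance and sample standard deviation."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two observations")
    total = sum(values)
    mean = total / n
    value_range = max(values) - min(values)
    # Definition: sum of squared deviations from the mean, divided by n - 1.
    squared_deviations = sum((x - mean) ** 2 for x in values)
    variance = squared_deviations / (n - 1)
    # Computational shortcut that works from the raw sums and avoids the mean.
    sum_of_squares = sum(x * x for x in values)
    variance_shortcut = (n * sum_of_squares - total ** 2) / (n * (n - 1))
    return {
        "mean": mean,
        "range": value_range,
        "variance": variance,
        "variance_shortcut": variance_shortcut,
        "std_dev": math.sqrt(variance),
    }


# Hypothetical tree diameters in centimetres (illustrative data only).
diameters = [12.0, 15.0, 11.0, 18.0, 14.0]
stats = describe(diameters)
print(stats)
```

For these five values the mean is 14, the range is 7 and the sample variance is 30 / 4 = 7.5, whichever of the two variance expressions is used.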
Error

In statistics, sampling error is incurred when the statistical characteristics of a population are estimated from a subset, or sample, of that population. Since the sample does not include all members of the population, statistics on the sample, such as means and quantiles, generally differ from the characteristics of the entire population, which are known as parameters. For example, if one measures the height of a thousand individuals from a country of one million, the average height of the thousand is typically not the same as the average height of all one million people in the country. Since sampling is typically done to determine the characteristics of a whole population, the difference between the sample and population values is considered a sampling error.

Distributions

Frequency distributions for a variable apply both to a population and to samples from that population. The first type is called the population distribution of the variable, and the second type is called a sample distribution. In a sense, the sample distribution is a blurry photograph of the population distribution.

3. t-test and ANOVA

T-test

The independent t-test, also called the two-sample t-test, independent-samples t-test or Student's t-test, is an inferential statistical test that determines whether there is a statistically significant difference between the means of two unrelated groups.

Null Hypothesis

The null hypothesis for the independent t-test is that the population means of the two unrelated groups are equal:

$$H_0: \mu_1 = \mu_2$$

In most cases, we want to show that we can reject the null hypothesis in favor of the alternative hypothesis, which is that the population means are not equal:

$$H_A: \mu_1 \neq \mu_2$$

To do this, we need to set a significance level (also called alpha) that determines whether we reject or fail to reject the null hypothesis. Most commonly, this value is set at 0.05.
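The decision between the two hypotheses can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes the two groups have equal variances (a pooled estimate is used), the ages are invented, the function name `pooled_t` is hypothetical, and the critical value 2.306 is the two-sided 0.05 value read from a t table for 8 degrees of freedom.

```python
import math
from statistics import mean, variance


def pooled_t(sample_a, sample_b):
    """Return the equal-variance two-sample t statistic and its degrees of freedom."""
    n1, n2 = len(sample_a), len(sample_b)
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * variance(sample_a) + (n2 - 1) * variance(sample_b)) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(sample_a) - mean(sample_b)) / standard_error
    return t, df


# Invented ages for two unrelated groups (illustrative data only).
group_1 = [30.0, 32.0, 34.0, 36.0, 38.0]
group_2 = [31.0, 33.0, 35.0, 37.0, 39.0]

t_value, df = pooled_t(group_1, group_2)

# Assumption: critical value taken from a t table (two-sided, alpha = 0.05, df = 8).
T_CRITICAL = 2.306
reject_null = abs(t_value) > T_CRITICAL
print(t_value, df, reject_null)
```

Here the group means are 34 and 35, the pooled variance is 10 and the standard error is 2, so t = -0.5. Because |t| is smaller than the critical value, the null hypothesis of equal means is not rejected for this made-up data.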
Requirements:
• Two independent samples
• Data should be normally distributed
• The two samples should have the same variance

Equation: under the equal-variance requirement, with sample means $\bar{x}_1, \bar{x}_2$, sample variances $s_1^2, s_2^2$ and sample sizes $n_1, n_2$, the test statistic is

$$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \qquad s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

Significance Level: α.

Critical Region: reject the null hypothesis if $|t| > t_{1-\alpha/2,\; n_1+n_2-2}$.

Example:

[Table: ages of men and women participating in insurance in Vietnam]

The table shows the results of an age survey of men and women participating in insurance in Vietnam. A two-sample t-test for equal means gave p = 0.811 > 0.05, so we fail to reject the null hypothesis: the data give no evidence that the two group means differ at the 0.05 significance level.

ANOVA

Definition: An ANOVA test is a way to find out whether survey or experiment results are significant. In other words, it helps you decide whether to reject the null hypothesis in favor of the alternative hypothesis. Basically, you are testing groups to see if there is a difference between them.

Example of when you might want to test different groups:
• A group of psychiatric patients are trying three different therapies: counseling, medication and biofeedback. You want to see if one therapy is better than the others.

Types of test: There are two main types: one-way and two-way. Two-way tests can be with or without replication.
• One-way ANOVA between groups: used when you want to test two or more groups to see if there is a difference between them.
• Two-way ANOVA without replication: used when you have one group and you are double-testing that same group. For example, you are testing one set of individuals before and after they take a medication to see if it works or not.
• Two-way ANOVA with replication: two groups, and the members of those groups are doing more than one thing. For example, two groups of patients from different hospitals trying two different therapies.

A one-way ANOVA compares the means of two or more independent (unrelated) groups using the F-distribution. The null hypothesis for the test is that all group means are equal. Therefore, a significant result means that at least two of the means are unequal.
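To make the one-way case concrete, here is a minimal Python sketch of how the F statistic is formed from the between-group and within-group variation. The scores for the three groups are invented, the function name `one_way_anova` is hypothetical, and the critical value 5.14 is the 0.05 value read from an F table for (2, 6) degrees of freedom.

```python
def one_way_anova(groups):
    """Return the F statistic and its (between, within) degrees of freedom."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: how far each group mean is from the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within-group sum of squares: spread of the observations around their own group mean.
    ss_within = sum(
        sum((x - m) ** 2 for x in g) for g, m in zip(groups, group_means)
    )
    df_between = k - 1
    df_within = n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Invented outcome scores for three independent groups (illustrative data only).
groups = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0],
]

f_value, df_between, df_within = one_way_anova(groups)

# Assumption: critical value taken from an F table (alpha = 0.05, df = 2 and 6).
F_CRITICAL = 5.14
reject_null = f_value > F_CRITICAL
print(f_value, df_between, df_within, reject_null)
```

For this data the between-group sum of squares is 26 and the within-group sum of squares is 6, so F = (26 / 2) / (6 / 6) = 13. Since 13 exceeds the critical value, the null hypothesis of equal group means would be rejected for these made-up scores.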
A two-way ANOVA is an extension of the one-way ANOVA. With a one-way ANOVA, you have one independent variable affecting a dependent variable. With a two-way ANOVA, there are two independent variables. Use a two-way ANOVA when you have one measurement variable (i.e. a quantitative variable) and two nominal variables.

Assumptions for two-way ANOVA:
• The population must be close to a normal distribution.
• Samples must be independent.
• Population variances must be equal.
• Groups must have equal sample sizes.

Example for one-way ANOVA: The sales revenue (in euros) of 3 items in a supermarket:

[Table: sales revenue of the 3 items]

After running a one-way ANOVA, I got the following result:

[Table: one-way ANOVA output]

Conclusion: if F > F crit, we reject the null hypothesis. That is the case here: 6.12 > 3.35. Therefore, we reject the null hypothesis.

4. Basics of modeling: simple regression

A linear regression model attempts to explain the relationship between two or more variables using a straight line. Consider data obtained from a chemical process where the yield of the process is thought to be related to the reaction temperature. The data can be displayed in a scatter plot.

[Figure: scatter plot of yield against temperature]

In the scatter plot, the yield $y_i$ is plotted for different temperature values $x_i$. It is clear that no line can be found that passes through all points of the plot. Thus, no functional relation exists between the two variables x and Y. However, the scatter plot does give an indication that a straight line may exist such that all the points on the plot are scattered randomly around this line. A statistical relation is said to exist in this case. The statistical relation between x and Y may be expressed as follows:

$$Y = \beta_0 + \beta_1 x$$

where $\beta_0$ is the intercept and $\beta_1$ is the slope of the line.

A regression line can show a positive linear relationship, a negative linear relationship, or no relationship. If the graphed line in a simple linear regression is flat (not sloped), there is no linear relationship between the two variables.
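One common way to choose the straight line is ordinary least squares, which picks the intercept and slope that minimize the sum of squared vertical distances between the points and the line. The text above does not say which fitting method was used, so the sketch below is an assumption; the function name `fit_line` and the temperature and yield values are invented for illustration.

```python
def fit_line(xs, ys):
    """Return the least-squares intercept and slope for y = b0 + b1 * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-products and sum of squares of x around their means.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all x values are identical, the slope is undefined")
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Invented reaction temperatures and process yields (illustrative data only).
temperatures = [50.0, 60.0, 70.0, 80.0, 90.0]
yields = [122.0, 139.0, 161.0, 178.0, 200.0]

b0, b1 = fit_line(temperatures, yields)
print(b0, b1)
```

For these values the fitted slope is 1950 / 1000 = 1.95 and the intercept is 160 - 1.95 × 70 = 23.5. The positive slope corresponds to a positive linear relationship; a slope near zero would correspond to the flat line described above.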