UNIVERSITY OF EASTERN FINLAND
FACULTY OF SCIENCE AND FORESTRY

LEARNING DIARY OF RESEARCH METHODS IN FOREST SCIENCES

Student's name: Pham Huu Minh
Student number: 291366

1. Research process

Science is a systematic and logical approach to discovering how things in the universe work. It is also the body of knowledge accumulated through those discoveries. When conducting research, scientists use the scientific method to collect measurable, empirical evidence in an experiment related to a hypothesis (often in the form of an if/then statement), with the results aiming to support or contradict a theory.

1. Make an observation or observations.
2. Ask questions about the observations and gather information.
3. Form a hypothesis (a tentative description of what has been observed) and make predictions based on that hypothesis.
4. Test the hypothesis and predictions in an experiment that can be reproduced.
5. Analyze the data and draw conclusions; accept or reject the hypothesis, or modify it if necessary.
6. Reproduce the experiment until there are no discrepancies between observations and theory.

Statistical analysis is fundamental to all experiments that use statistics as a research methodology. Most experiments in the social sciences and many important experiments in natural science and engineering need statistical analysis. Statistical analysis is also a very useful tool for obtaining approximate solutions when the actual process is highly complex or unknown in its true form.

Example: The study of turbulence relies heavily on statistical analysis derived from experiments. Turbulence is highly complex and almost impossible to study at a purely theoretical level. Scientists therefore need to rely on statistical analysis of turbulence through experiments to confirm the theories they propound. In the social sciences, statistical analysis is at the heart of most experiments; it is very hard to obtain general theories in these areas that are universally valid, and it is through experiments and surveys that a social scientist is able to confirm theory.

2. Basic concepts in statistics

Mean

The most commonly used measure of center for a quantitative variable is the (arithmetic) sample mean. When people speak of taking an average, it is the mean that they are most often referring to. The sample mean of a variable is the sum of the observed values in a data set divided by the number of observations.

Variance

There are several different measures of variation; three of the most frequently used are the sample range, the sample interquartile range and the sample standard deviation. Measures of variation are used mostly for quantitative variables.

The sample range of a variable is the difference between its maximum and minimum values in a data set: Range = Max − Min. The sample range is quite easy to compute. However, in using the range a great deal of information is ignored: only the largest and smallest values of the variable are considered, and the other observed values are disregarded. It should also be remarked that the range can never decrease, but can increase, when additional observations are included in the data set, and in this sense the range is overly sensitive to the sample size.
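As a quick illustration of these measures of center and variation, here is a minimal Python sketch. The diary does not name any software, so the use of numpy and the data values themselves are my own assumptions for illustration only.

```python
import numpy as np

# Hypothetical sample data (not from the diary), e.g. tree heights in metres
x = np.array([12.1, 14.3, 13.8, 15.0, 12.9, 16.2, 14.7])

# Sample mean: sum of the observed values divided by the number of observations
mean = x.sum() / len(x)              # equivalent to np.mean(x)

# Sample range: difference between the maximum and minimum observed values
sample_range = x.max() - x.min()

# Sample interquartile range: difference between the 75th and 25th percentiles
q1, q3 = np.percentile(x, [25, 75])
iqr = q3 - q1

print(f"mean = {mean:.2f}, range = {sample_range:.2f}, IQR = {iqr:.2f}")
```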
Std Deviation

The quantity

SS = Σ (xi − x̄)²,  where the sum runs over i = 1, …, n,

is called the sum of squared deviations and provides a measure of the total deviation from the mean for all the observed values of the variable. Once the sum of squared deviations is divided by n − 1, we get

s² = Σ (xi − x̄)² / (n − 1),

which is called the sample variance. The sample standard deviation is its square root,

s = sqrt( Σ (xi − x̄)² / (n − 1) ),   (1)

and it has the following alternative formulas:

s = sqrt( (Σ xi² − n·x̄²) / (n − 1) )   (2)
s = sqrt( (Σ xi² − (Σ xi)² / n) / (n − 1) )   (3)

Formulas (2) and (3) are useful from the computational point of view. In hand calculation, use of these alternative formulas often reduces the arithmetic work, especially when x̄ turns out to be a number with many decimal places.

Error

In statistics, sampling error is incurred when the statistical characteristics of a population are estimated from a subset, or sample, of that population. Since the sample does not include all members of the population, statistics computed on the sample, such as means and quantiles, generally differ from the characteristics of the entire population, which are known as parameters. For example, if one measures the height of a thousand individuals from a country of one million, the average height of the thousand is typically not the same as the average height of all one million people in the country. Since sampling is typically done to determine the characteristics of a whole population, the difference between the sample and population values is considered a sampling error.

Distributions

Frequency distributions for a variable apply both to a population and to samples from that population. The first type is called the population distribution of the variable, and the second type is called a sample distribution. In a sense, the sample distribution is a blurry photograph of the population distribution.

3. t-test and ANOVA

T-test

The independent t-test, also called the two-sample t-test, independent-samples t-test or Student's t-test, is an inferential statistical test that determines whether there is a statistically significant difference between the means of two unrelated groups.

Null and alternative hypotheses for the independent t-test: the null hypothesis is that the population means of the two unrelated groups are equal:

H0: μ1 = μ2

In most cases, we are looking to see whether we can reject the null hypothesis and accept the alternative hypothesis, which is that the population means are not equal:

HA: μ1 ≠ μ2

To do this, we need to set a significance level (also called alpha) that allows us to decide whether the null hypothesis can be rejected. Most commonly, this value is set at 0.05.

Requirements:
• Two independent samples
• Data should be normally distributed
• The two samples should have the same variance

Test statistic (assuming equal variances):

t = (x̄1 − x̄2) / ( sp · sqrt(1/n1 + 1/n2) ),  where  sp² = ( (n1 − 1)·s1² + (n2 − 1)·s2² ) / (n1 + n2 − 2)

Significance level: α. Critical region: reject H0 if |t| > t(α/2, n1 + n2 − 2), the critical value of the t-distribution with n1 + n2 − 2 degrees of freedom.

Example: The data are the results of an age survey of men and women participating in insurance in Vietnam. After running the two-sample t-test for equal means, I got the following result: p = 0.811 > 0.05, so we fail to reject the null hypothesis and cannot conclude that the two group means are different at the 0.05 significance level.
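To make the two-sample t-test concrete, here is a minimal Python sketch. The diary does not say which program produced the p-value of 0.811, so the use of scipy.stats.ttest_ind and the age values below are my own assumptions for illustration; they are not the actual survey data.

```python
import numpy as np
from scipy import stats

# Hypothetical ages of insured men and women (not the actual survey data)
men   = np.array([34, 41, 29, 38, 45, 52, 31, 27, 36, 40])
women = np.array([33, 44, 30, 39, 47, 50, 28, 26, 37, 42])

# Independent two-sample t-test assuming equal variances
t_stat, p_value = stats.ttest_ind(men, women, equal_var=True)

alpha = 0.05
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
if p_value > alpha:
    print("Fail to reject H0: no evidence that the mean ages differ.")
else:
    print("Reject H0: the mean ages differ at the 0.05 significance level.")
```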
ANOVA

Definition: An ANOVA test is a way to find out whether survey or experiment results are significant. In other words, it helps you figure out whether you need to reject the null hypothesis or accept the alternative hypothesis. Basically, you are testing groups to see if there is a difference between them.

An example of when you might want to test different groups:
• A group of psychiatric patients are trying three different therapies: counseling, medication and biofeedback. You want to see if one therapy is better than the others.

Types of test: there are two main types, one-way and two-way. Two-way tests can be with or without replication.

• One-way ANOVA between groups: used when you want to test two or more groups to see if there is a difference between them.
• Two-way ANOVA without replication: used when you have one group and you are double-testing that same group. For example, you are testing one set of individuals before and after they take a medication to see if it works or not.
• Two-way ANOVA with replication: two groups, and the members of those groups are doing more than one thing. For example, two groups of patients from different hospitals trying two different therapies.

A one-way ANOVA is used to compare the means of two or more independent (unrelated) groups using the F-distribution. The null hypothesis for the test is that all the group means are equal. Therefore, a significant result means that at least two of the means are unequal.

A two-way ANOVA is an extension of the one-way ANOVA. With a one-way ANOVA, you have one independent variable affecting a dependent variable. With a two-way ANOVA, there are two independent variables. Use a two-way ANOVA when you have one measurement variable (i.e. a quantitative variable) and two nominal variables.

Assumptions for two-way ANOVA:
• The population must be close to a normal distribution.
• Samples must be independent.
• Population variances must be equal.
• Groups must have equal sample sizes.

Example for one-way ANOVA: The data are the sales revenue (in euros) of 3 items in a supermarket. After running a one-way ANOVA, I got the following result: F = 6.12 and F crit = 3.35. If F > F crit, we reject the null hypothesis; this is the case here, since 6.12 > 3.35, so we reject the null hypothesis (see the Python sketch at the end of this diary).

4. Basics of modeling: simple regression

A linear regression model attempts to explain the relationship between two or more variables using a straight line. Consider data obtained from a chemical process where the yield of the process is thought to be related to the reaction temperature, and suppose a scatter plot of yield against temperature is drawn. In the scatter plot, the yield yi is plotted for the different temperature values xi. It is clear that no line can be found to pass through all points of the plot; thus, no functional relation exists between the two variables x and Y. However, the scatter plot does give an indication that a straight line may exist such that all the points on the plot are scattered randomly around this line. A statistical relation is said to exist in this case. The statistical relation between x and Y may be expressed as follows:

Y = β0 + β1·x

A regression line can show a positive linear relationship, a negative linear relationship, or no relationship. If the graphed line in a simple linear regression is flat (not sloped), there is no relationship between the two variables.
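As a closing sketch, here is a minimal Python example of a one-way ANOVA comparison and a simple linear regression fit like the ones discussed above. The use of scipy (f_oneway and linregress) and all the data values are my own assumptions for illustration; they are not the supermarket revenues or the chemical-process data referred to in the diary.

```python
import numpy as np
from scipy import stats

# --- One-way ANOVA: hypothetical weekly revenues (EUR) of three items ---
item_a = np.array([120, 135, 128, 140, 132])
item_b = np.array([150, 162, 158, 149, 155])
item_c = np.array([110, 118, 125, 121, 116])

f_stat, p_anova = stats.f_oneway(item_a, item_b, item_c)
print(f"ANOVA: F = {f_stat:.2f}, p = {p_anova:.4f}")
# If p < 0.05 (equivalently, F > F crit), reject H0 that all group means are equal.

# --- Simple linear regression: hypothetical reaction temperature (x) vs. yield (Y) ---
temperature = np.array([50, 60, 70, 80, 90, 100])
process_yield = np.array([122, 125, 133, 138, 149, 155])

fit = stats.linregress(temperature, process_yield)
print(f"Regression: Y = {fit.intercept:.2f} + {fit.slope:.2f} * x")
print(f"R^2 = {fit.rvalue**2:.3f}, p-value for the slope = {fit.pvalue:.4f}")
# A positive slope indicates a positive linear relationship; a slope close to zero
# (a flat line) indicates no linear relationship between x and Y.
```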