Surface Fitting and Some Associated Confidence Procedures
SURFACE FITTING AND SOME ASSOCIATED CONFIDENCE PROCEDURES

by

ARNOLD LEON SCHROEDER, JR.

A THESIS submitted to OREGON STATE UNIVERSITY in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE

June 1962

APPROVED:

Redacted for Privacy
Associate Professor of Statistics, In Charge of Major

Redacted for Privacy
Chairman of the Department of Statistics

Redacted for Privacy
Chairman, Graduate Committee

Redacted for Privacy
of Graduate

Date thesis is presented: May 15, 1962

Typed by Jolene Wuest

TABLE OF CONTENTS

CHAPTER
1 INTRODUCTION
2 REGRESSION ANALYSIS APPLIED TO ORE DISTRIBUTION
3 FITTING DATA TO REGRESSION MODELS
4 CONFIDENCE LIMITS OF REGRESSION COEFFICIENTS
5 CONFIDENCE PROCEDURES FOR THE DISCRIMINANT
6 CONFIDENCE LIMITS FOR LOCATION OF MAXIMUM
7 EXAMPLE ANALYSES
BIBLIOGRAPHY
APPENDIX

CHAPTER 1 INTRODUCTION

The use of statistics in geology has expanded rapidly, until today statistical analysis is applied in many branches of geology. One of the main reasons for this expansion is the enormous increase in the amount of available numerical data, especially in the field of mining, where thousands of sample assays are often taken in the course of mining operations.

This thesis is one approach to the problem of how to analyze these vast amounts of sample data and is specifically a description of the analysis of the distribution of ore in the veins of two large mixed-metal ore deposits. However, the statistical techniques presented are applicable to geological data in general, as well as to the specific problem of analyzing ore distribution in metalliferous veins.

Extensive mine workings developed over many years in the ore deposits under study have afforded quantitative data in the form of hundreds of thousands of mine assays.
A preliminary investigation resulted in the development of efficient methods for processing and averaging these assay results by means of punched cards and IBM machines. Probably the most desirable form of presentation of such data for geological evaluation is a contour map, but preliminary efforts in this direction showed that it was impossible to draw satisfactory contour maps from such averaged data. Thus the need for a method of statistical analysis, results of which could readily be translated into detailed contour maps, was apparent (7, p. 1-4).

Regression analysis, which is one of the branches of statistics based on the method of least squares and the analysis of variance, filled this need and consequently was adopted as the basis of analysis. Both Krumbein (4) and Whitten (9) have discussed the use of regression analysis in the study of geological surfaces, but neither mentions the utility of confidence limits or gives any comprehensive methods of assessing the results of such an analysis. The statistical contribution of this thesis is the derivation of certain new confidence limit procedures associated with regression analyses, especially those having to do with the fitting of quadratic surfaces.

The geological objective of this thesis is to apply the mathematical and statistical methods of regression analysis to such geological questions concerning ore deposits as: which metals are zoned in a statistically significant sense and how, and in what directions and at what rates do percentages and contents of various metals change.

This thesis is written in conjunction with a National Science Foundation Research Grant, NSF-G 14189. This project, "The Distribution of Ore in Metalliferous Veins," is under the direction of Dr. George S. Koch, Jr., Department of Geology, Oregon State University, and Dr. Richard F. Link, Department of Statistics, Oregon State University.
The author is a research assistant on the project, with contributions in programming, digital computing and data processing on the IBM 1620 electronic computer and supporting equipment.

CHAPTER 2 REGRESSION ANALYSIS APPLIED TO ORE DISTRIBUTION

Ore distribution within a particular vein may be studied through the analysis of assays of samples collected in drifts, raises and stopes. Each sample is located on a co-ordinate system drawn on a vertical longitudinal section, with x representing horizontal distance along the vein and y representing elevation. Locations, vein widths, and assays for the five ore metals (gold, silver, lead, copper and zinc) are punched on IBM cards, each card representing a different sample.

By analyzing thousands of such sample assays, we determine how the amounts of a given metal, say silver, vary as the sample locations vary. By letting z be the amount of silver at a given location (x, y), represented by a vector parallel to the z axis with length proportional to the amount, we can restate the problem as that of finding how z varies with x and y. In other words, we want to study the function

(2.1)  $z = f(x, y)$.

The geometric solution of z = f(x, y) is a surface, and we are able to describe the zoning of a metal by defining the specific geometric solution of equation (2.1). Therefore, if we can find the relationship between z, x and y, our problem is solved. We cannot directly determine this relationship, since all we have are sample values of z for various (x, y) locations. However, by means of regression analysis we can make use of all our sample data and thereby estimate how z is related to x and y or, statistically speaking, estimate the regression equation. We actually make an indirect estimation, in that we try various equations until a close fit is found.
The regression equations, or regression models as they are more generally called, which are of most general use in ore distribution are the multiple, curvilinear, and quadratic response surface models. Their general forms are

(2.2)  $E(z) = \beta_0 + \beta_1 x + \beta_2 y$,

(2.3)  $E(z) = \beta_0 + \beta_1 x + \beta_2 x^2$

and

(2.4)  $E(z) = \beta_0 + \beta_1 x + \beta_2 y + \beta_3 x^2 + \beta_4 y^2 + \beta_5 xy$

respectively. E(z) is the expected or average value of z. We note that equations (2.2) and (2.3) are merely special cases of equation (2.4), where several coefficients of (2.4) are zero. We can therefore reduce the problem of analyzing the three models to an analysis of the response surface model, without loss of generality. Consequently, we define surface fitting to include the fitting of data to any of the three models. Models with terms of higher power may also be fitted, but the calculations become more tedious and the regression coefficient estimates become very unreliable.

The geologic significance of using regression analysis methods to fit data to the multiple regression model, equation (2.2), is to determine if the zoning of a particular metal can be described by a plane. If it can, then methods introduced in Chapter 4 facilitate finding the horizontal direction (strike) of the plane. The curvilinear model, equation (2.3), is utilized when we consider how the amount of a metal varies with distance alone. Replacing $x$ and $x^2$ by $y$ and $y^2$ respectively in equation (2.3) corresponds to considering the variation of the amount or content with that of the elevation. Finally, in fitting the response surface model, we determine what type of quadratic surface best describes the distribution of a given metal.

The E(z) component of a fitted response surface model is easily calculated at each of the n sample locations. Locations yielding equal E(z) values can then be plotted and graphically connected to form contour maps.
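As an illustration only, the fitting of the three models (2.2), (2.3) and (2.4) by ordinary least squares can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the procedure run on the IBM 1620: the data are synthetic, and the names `design_matrix`, `fit`, and `true_coef` are hypothetical and introduced here for the example.

```python
import numpy as np

def design_matrix(x, y, model):
    """Columns of the regression design matrix for one of the three models."""
    one = np.ones_like(x)
    if model == "plane":          # (2.2): b0 + b1*x + b2*y
        cols = [one, x, y]
    elif model == "curvilinear":  # (2.3): b0 + b1*x + b2*x^2
        cols = [one, x, x**2]
    elif model == "quadratic":    # (2.4): full quadratic response surface
        cols = [one, x, y, x**2, y**2, x * y]
    else:
        raise ValueError(f"unknown model: {model}")
    return np.column_stack(cols)

def fit(x, y, z, model):
    """Least-squares coefficient estimates and residual sum of squares."""
    A = design_matrix(x, y, model)
    coef, *_ = np.linalg.lstsq(A, z, rcond=None)
    resid = z - A @ coef
    return coef, float(resid @ resid)

# Synthetic, hypothetical data (assumption: these are not the thesis assays).
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, 200)   # horizontal distance along the vein
y = rng.uniform(0.0, 5.0, 200)    # elevation
true_coef = np.array([2.0, 0.5, -0.3, -0.04, 0.1, 0.02])
z = design_matrix(x, y, "quadratic") @ true_coef + rng.normal(0.0, 0.05, 200)

# Try each model in turn and record how closely it fits.
results = {m: fit(x, y, z, m) for m in ("plane", "curvilinear", "quadratic")}

# Fitted E(z) on a regular grid; points of equal value can then be contoured.
gx, gy = np.meshgrid(np.linspace(0.0, 10.0, 21), np.linspace(0.0, 5.0, 11))
coef_q = results["quadratic"][0]
ez_grid = (design_matrix(gx.ravel(), gy.ravel(), "quadratic") @ coef_q).reshape(gx.shape)
```

Because models (2.2) and (2.3) use a subset of the columns of model (2.4), the residual sum of squares of the quadratic fit cannot exceed that of either special case on the same data. Whether the additional terms are statistically warranted is a separate question.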
CHAPTER 3 FITTING DATA TO REGRESSION MODELS

In general, an experimenter who is to interpret rigorously the outcome of an experiment will need first to formulate the problem in mathematical terms (the mathematical model), then to test the concordance of the mathematical model with all relevant aspects of the data, and finally, if the model proves to be acceptable, to estimate, or set limits on, any constants left unspecified in the model. Regression analysis is a means of making such an interpretation when the expected or average value of one variable is defined as a function of the observed values of other variables (12, p. 1).

In the particular application of regression analysis at hand, we are interested in estimating the constants of a regression when the form of the relationship is given, and in testing the concordance of some preassigned regression relation with the data. This is because our mathematical regression models are predetermined.

Surface fitting has its statistical significance based on the similarity between the general form of a particular regression model and its corresponding graphical solution equation. This similarity between regression models and equations of geometric representations is apparent in even the simplest of all cases, a linear relationship between two variables. For example, in regression analysis, which refers to the fitting of models and to the analysis of the fitted models as well, the study of the linear relationship between two variables is given the name linear regression, and the general form of a linear regression model is

(3.1)  $E(y) = \beta_0 + \beta_1 x$.

The algebraic equation representing a straight line is

(3.2)  $y = mx + b$.

The similarity between the two forms (3.1) and (3.2) is immediately obvious and in fact can always be noted regardless of the model we are dealing with. The general forms of the multiple regression, curvilinear regression, and quadratic response surface models are listed in equations (2.2), (2.3) and (2.4) of Chapter 2.
We only want to note here that the multiple regression model bears a close similarity to the algebraic equation of a plane and that the curvilinear and response surface models are likewise very similar to the algebraic equations of a curve and a surface respectively.
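The correspondence between a fitted linear regression and the straight line of equation (3.2) can be seen in a small Python sketch. It assumes the usual least-squares estimates of intercept and slope. The function name `fit_line` and the data points are hypothetical, chosen to lie exactly on the line y = 2x + 1 so that the estimated intercept and slope play the roles of b and m.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for a straight-line regression."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    # Slope: sum of cross-products of deviations over sum of squared x deviations.
    b1 = float(dx @ (y - y.mean()) / (dx @ dx))
    # Intercept: the fitted line passes through the point of means.
    b0 = float(y.mean() - b1 * x.mean())
    return b0, b1

# Hypothetical points lying exactly on y = 2x + 1 (m = 2, b = 1).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
b0, b1 = fit_line(xs, ys)
```

With noise-free points the estimates coincide with the line's own constants. With real data they describe the line of closest fit in the least-squares sense.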