Mathematical Modelling 2017


Instructor: Prof. Ganser, Armstrong Hall 408K, [email protected]

PR: A background in differential equations, linear algebra and statistics/probability is necessary for this course. The course in statistics should be at the calculus level, such as Stat 461 here at WVU. Students must have knowledge of a spreadsheet program like JMP or Excel as well as a program for doing math such as Mathematica or Matlab. The first set of assignments should help you decide if you have the proper background.

Grading: 50% (assignments, projects, semester tests), 20% (Midterm), 30% (Final)

Guidelines:
Graduate: A: 80-90, B: 70-79, C: 60-69, D: F
Undergraduate: A: 70-90, B: 60-69, C: 50-59, D: 30-49, F: 0-29

Assignments are to be done individually and typed. The project write-up should be clear and concise with page numbers and include:
i) Statement of the problem
ii) Summary of the solution with reference (page numbers) to calculations and data analysis in the back
iii) Computer code and output in the back

Assignments will have a due date. Sometimes the date is relaxed for all students because of unforeseen circumstances. However, once a date is set the assignments are due on that date. Any projects that are turned in late will either not be accepted or the grade will be reduced.

Goals: This course covers many models. A summary is discussed the first day of class. It is more of a survey of mathematical models than a specialized course in particular models. Topics will not be studied in complete detail in order to see more examples. There is a danger that a student will feel that they have not learned the topic well enough to use it in the future. However, it is hoped that the student will have sufficient understanding of the models discussed so that he or she may know “where to start” when faced with a new problem. Also, the course should help students better understand the models developed by others.

Outline

Overview of Linear Models through examples
    Bowling problem
    Shelf life of a drink
    End aisle drink display choice
    Discussion of basic statistics
Dimensional Analysis
    What it is
    Fundamental scales
    How trigonometry is a good example: Formula for area
    Drag on a sphere
    Period of a pendulum
Spectral Analysis: Analyzing cyclic phenomena
    Known periods vs. unknown periods
    Discrete Fourier basis compared to the Fourier series of a continuous function
    Nyquist Frequency
    Examples
    Spectral Analysis of white noise
Probability Models
    Probability trees
    Bayesian Networks
    Examples: medicine, genetics
Review of statistics
    The big four distributions
Deeper into linear models applied to the linear model examples
    Prediction
Design of Experiments
    Deeper into DOE linear models
    Optimal design of experiment
Probability Models
    Geometric distribution
    Exponential distribution
    Poisson distribution
    Generalized Gamma distribution
    Goodness of fit: Graphical Methods
    Goodness of fit: Chi-Square Test

Assignments

1. Enter the data for the tape problem into a spreadsheet. The physical problem will be explained in class. Estimate the value of the angle for c = 10 and w = 11 (the entry marked --- in the table) in any way.

Circum c/cm   Tape width w/cm   Angle of Pitch A/deg
10            3                 17
10            5                 30
10            7                 44
10            9                 64
10            11                ---
20            3                 9
20            5                 14
20            7                 20
20            9                 27
20            11                33
30            3                 6
30            5                 10
30            7                 14
30            9                 17
30            11                21
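One way to explore the tape data before estimating the missing angle is to compare each measured angle with a simple candidate relation. The sketch below checks how closely sin(A) tracks the ratio w/c. This relation is only an assumption suggested by the numbers in the table, not the physical explanation given in class, and the names `rows` and `predicted_angle` are illustrative.

```python
import math

# (c in cm, w in cm, measured A in degrees); the c = 10, w = 11 entry is missing.
rows = [
    (10, 3, 17), (10, 5, 30), (10, 7, 44), (10, 9, 64),
    (20, 3, 9), (20, 5, 14), (20, 7, 20), (20, 9, 27), (20, 11, 33),
    (30, 3, 6), (30, 5, 10), (30, 7, 14), (30, 9, 17), (30, 11, 21),
]

def predicted_angle(c, w):
    """Candidate relation (an assumption): sin(A) = w / c, with A in degrees."""
    ratio = w / c
    if ratio > 1:
        return None  # asin is undefined, so this relation gives no angle
    return math.degrees(math.asin(ratio))

# Compare the candidate relation with every measured row.
errors = [predicted_angle(c, w) - a for c, w, a in rows]
worst = max(abs(e) for e in errors)
print(f"largest |predicted - measured| = {worst:.2f} deg")
print("c = 10, w = 11 ->", predicted_angle(10, 11))
```

If a candidate relation fits the measured rows but cannot be evaluated at the missing entry, that is itself worth discussing in the write-up.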

2. Suppose you experiment with two different shoes and two different balls to determine the best combination for scoring the highest in bowling. Below is the data. Can you come to some conclusion as to what may be the best combination? One possible answer is that there is no conclusion. Note: Each combination was done at random and then put in the order below.

Score   Shoes   Ball
176     A1      B1
176     A1      B2
186     A2      B1
184     A2      B2
180     A1      B1
182     A1      B2
182     A2      B1
188     A2      B2
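A two-factor, two-level experiment like this one can be summarized with a linear model that has a mean, one coefficient per factor, and an interaction. The sketch below is one possible setup, not necessarily the one used in class: the coding A1/B1 = -1 and A2/B2 = +1 and all variable names are assumptions.

```python
from math import sqrt

# (score, shoe, ball) from the table; A1/B1 -> -1 and A2/B2 -> +1 (assumed coding).
runs = [
    (176, -1, -1), (176, -1, +1), (186, +1, -1), (184, +1, +1),
    (180, -1, -1), (182, -1, +1), (182, +1, -1), (188, +1, +1),
]
n = len(runs)

# For a balanced +/-1 design the columns are orthogonal, so each
# least-squares coefficient is sum(column * score) / n.
b0 = sum(y for y, s, b in runs) / n
b_shoe = sum(y * s for y, s, b in runs) / n
b_ball = sum(y * b for y, s, b in runs) / n
b_int = sum(y * s * b for y, s, b in runs) / n

# Residual sum of squares and mean square error (n observations, 4 parameters).
fitted = [b0 + b_shoe * s + b_ball * b + b_int * s * b for y, s, b in runs]
sse = sum((y - f) ** 2 for (y, s, b), f in zip(runs, fitted))
mse = sse / (n - 4)

# Every coefficient shares the same standard error in this design.
se = sqrt(mse / n)

print(f"mean={b0}, shoe={b_shoe}, ball={b_ball}, interaction={b_int}")
print(f"MSE={mse}, standard error={se:.3f}")
```

Comparing each coefficient with the common standard error gives a first impression of which effects stand out from the run-to-run variation.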

3. A baseball player wants to use analytics to improve their hitting and so it is decided to use a linear model. First it is decided, based on knowledge of the ingredients that go into hitting, to reduce the analysis to four factors with two levels as shown in the table.

Factor   Name                  Unit     Type          Level 1   Level 2
s        Foot Angle Position            Categorical   Square
c        Choke on bat          Inches   Numeric       .000      2 in.
p        Position in box                Categorical   Forward   Back
t        Speed of pitch        mph      Numeric       60mph     80mph

The player tried to hit the ball 100 times at each of the 16 combinations in the data set below. The results show how many hits he was able to get out of the 100 tries.

Run   Stance s   Choke c   Position   Speed    Hits
1                2 in.     Back       60 mph   13
2     Square     0 in.     Forward    60 mph   28
3     Square     2 in.     Forward    80 mph   14
4                2 in.     Forward    60 mph   38
5     Square     2 in.     Back       80 mph   27
6                0 in.     Back       60 mph   11
7     Square     0 in.     Back       60 mph   13
8     Square     2 in.     Forward    60 mph   40
9     Square     0 in.     Back       80 mph   19
10               0 in.     Back       80 mph   23
11               0 in.     Forward    60 mph   34
12               0 in.     Forward    80 mph   5
13    Square     0 in.     Forward    80 mph   2
14    Square     2 in.     Back       60 mph   23
15               2 in.     Forward    80 mph   9
16               2 in.     Back       80 mph   31

(a)Use the model . As with the Bowling Ball problem, ,etc. To be uniform use the table to determine the meaning of the variables.

Square 2 in 0 in Forward Back 80 mph 60mph

(b) Calculate the Mean Square Error of the model and the common Standard Error for one of the parameters.

(c) Use the rule of thumb to eliminate parameters. Note: if an interaction term is not eliminated, the individual parameters making up the interaction must also be included, even if they would be eliminated on their own.

(d) Redo the calculation for the reduced model, find the new Mean Square Error, and compare it to the previous value. Does the reduced model seem superior?

(e) Based on this analysis, what is the advice you would give to the baseball player?

4. The following data set gives the mass of a chunk of cement and the mass of the four components used in the mix. The problem is to use these masses to predict the heat that was produced given in the y column. What variables would you use in the model? This is very basic, and the question has to do with the initial pencil and paper analysis.

Mass(gm)   y(calories)   x1(gm)    x2(gm)    x3(gm)    x4(gm)
1005       78892.5       70.35     261.3     60.3      603
990        73557         9.9       287.1     148.5     514.8
985        102735.5      108.35    551.6     78.8      197
1025       89790         112.75    317.75    82        481.75
900        85950         63        468       54        297
905        98826         99.55     497.75    81.4      199.1
1005       103213.5      30.15     713.55    170.85    60.3
890        64525         8.9       275.9     195.8     391.6
955        88910.5       19.1      515.7     171.9     210.1
1000       115900        210       470       40        260
870        72906         8.7       348       200.1     295.8
980        111034        107.8     646.8     88.2      117.6
960        105024        96        652.8     76.8      115.2

5. Read the Dimensional Analysis Notes that are on the e-campus page for our class. Do problems 3 and 4 on page 9.

6. Read the first few pages of the paper A Non Dimensional Analysis of Hemodialysis. On page 140 under the Results section the authors produce three nondimensional quantities given in Equations 5, 6 and 7 from the variables. Use the method of products to derive the nondimensional quantities.

7. Crater Ejecta Scaling Laws Article.
(a) Do the dimensional analysis of Eq. (1) (the authors never do this).
(b) In your own words derive Eq. (6) from Eq. (1). Pay attention to how the authors do it and why they do it.

8. The problem is to predict the energy consumption for the winter of 1971 (the first quarter of 1971).

(a) Graph the data and determine a linear model. Model the slight increasing trend with a straight line and the cyclic portion with an appropriate set of predictor variables. Write out the linear model. Do not solve.

9. Plot Use this data to calculate the Sample Spectrum.

10. Let $x_t$ be a sequence of independent and normally distributed random variables with mean zero and variance one for every $t$. Use any program to graph a realization of $x_t$ as a function of $t$ for $t = 1, \dots, n$. Use this data to calculate the Sample Spectrum.
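A realization of white noise and its sample spectrum can be generated with a few lines of code. The sketch below is one possible version under stated assumptions: the series length `n = 200`, the random seed, and the normalization |DFT|^2 / n at the Fourier frequencies k/n are all choices made here, and the class notes may define the Sample Spectrum with a different constant.

```python
import cmath
import math
import random

def sample_spectrum(x):
    """Return (frequency, power) pairs at the Fourier frequencies k/n, k = 1..n/2.

    Power is |sum_t x[t] * exp(-2*pi*i*k*t/n)|^2 / n (assumed normalization).
    """
    n = len(x)
    spec = []
    for k in range(1, n // 2 + 1):
        dft = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spec.append((k / n, abs(dft) ** 2 / n))
    return spec

random.seed(0)  # fixed seed so the realization is reproducible
n = 200         # assumed length; the assignment's value is not shown here
noise = [random.gauss(0.0, 1.0) for _ in range(n)]

spectrum = sample_spectrum(noise)
mean_level = sum(p for _, p in spectrum) / len(spectrum)
print(f"{len(spectrum)} frequencies, average power = {mean_level:.2f}")
```

With this normalization the ordinates of unit-variance white noise scatter around 1 with no dominant frequency, which is the flat pattern to compare against when a series does contain a cycle.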

11. A huge multinational company, Gans Dynamics, manufactures spacecraft as well as breakfast cereal. A major problem in the production process of cereal is contamination of the cereal by metal pieces that entered the process before the start or pieces that have broken off of the many metal rollers involved in the processing. Also rollers with an imperfection can cause problems.

To monitor the process, sensors that record vibrations are positioned throughout the line to detect abnormal vibrations. The many rollers rotate at various speeds, with the highest being a little less than 5 revolutions per second. Data from the sensors is collected at the rate of 10 Hz. Two files with n=1000 data points each (100 seconds of data) are given on e-campus. One file corresponds to normal operations and the other to contaminated operations. Graph the data and use spectral analysis to determine the sample spectra. Discuss the results, in particular the “contaminated spectrum” and what it implies. Of importance are the frequencies that are highlighted in the contaminated spectrum, since these would help locate where the problem might be.

12. Download the Wolfer sunspot data and graph the data. Also perform a spectral analysis of the data and determine the dominant frequencies.

12. An experiment is done to see if a coin is fair (the probability of heads is .5). In one experiment the coin is tossed 100 times and heads show up 57 times. Use the likelihood ratio test and the result that $-2\ln\Lambda$ is approximately $\chi^2_1$ (when $n$ is large) to test $H_0: p = .5$. Do the same calculation for $n = 1000$ and heads appeared 570 times. Why do you think (use common sense) the conclusions are different?
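For the coin problem, the likelihood ratio statistic for a binomial proportion has a closed form. The sketch below assumes a 5% significance level (critical value 3.841 for a chi-square with 1 degree of freedom) and takes 1000 tosses for the second experiment. Both are assumptions made here, and the function name is illustrative.

```python
import math

def lrt_statistic(heads, n, p0=0.5):
    """-2 ln(Lambda) for H0: p = p0 against the unrestricted binomial MLE."""
    tails = n - heads
    p_hat = heads / n
    # The binomial coefficient cancels in the ratio, so only these terms remain.
    loglik_hat = heads * math.log(p_hat) + tails * math.log(1 - p_hat)
    loglik_0 = heads * math.log(p0) + tails * math.log(1 - p0)
    return 2 * (loglik_hat - loglik_0)

CRITICAL_5_PERCENT = 3.841  # chi-square, 1 degree of freedom (assumed level)

results = {}
for heads, n in [(57, 100), (570, 1000)]:
    stat = lrt_statistic(heads, n)
    results[n] = stat
    verdict = "reject H0" if stat > CRITICAL_5_PERCENT else "do not reject H0"
    print(f"n={n}, heads={heads}: -2 ln(Lambda) = {stat:.3f} -> {verdict}")
```

The observed proportion is the same in both experiments, but the statistic scales with the number of tosses, which is the point of the common-sense question.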

13. Suppose a great basketball player is a terrible free throw shooter. The coach knows that on average, “Mounty” makes about 40% of his free throws, but the coach wants to know what percent of free throws Mounty will make given that he either missed the one before or made the one before. Here is data from one game from start to finish: 1-0-0-1-0-1-1-0-0-0-1. Can you help the coach answer her question?

14. Norman is sometimes late for work, usually because of a train strike. Given the data

Train strike
T    .1
F    .9

               Train strike
Norman late    T     F
T              .8    .1
F              .2    .9

a) Find the probability Norman is late.
b) Find the probability the train is on strike given that Norman is late.
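Both questions about Norman follow from the law of total probability and Bayes' rule on a two-node network. A minimal sketch, assuming the tables are read as P(strike) = .1, P(late | strike) = .8 and P(late | no strike) = .1; the variable names are illustrative.

```python
# Prior for the parent node (train strike) and the conditional table for the child.
p_strike = 0.1
p_late_given_strike = 0.8
p_late_given_no_strike = 0.1

# a) Law of total probability: sum over both states of the parent.
p_late = p_strike * p_late_given_strike + (1 - p_strike) * p_late_given_no_strike

# b) Bayes' rule: joint probability of (strike, late) divided by P(late).
p_strike_given_late = p_strike * p_late_given_strike / p_late

print(f"P(late) = {p_late:.2f}")
print(f"P(strike | late) = {p_strike_given_late:.3f}")
```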

15. Company X manufactures memory chips in lots of 10 at stage 1. In stage 2 each lot is further processed. From experience, it is known at stage 1 that 80% of the lots are “good” and 20% are “bad”. Good lots have exactly 1 defective chip and bad lots have exactly 5 defective chips. In the standard procedure at stage 2, the good lots require $1000 of processing to yield 10 acceptable chips while bad lots require $4000 to yield 10 acceptable chips.
a) What is the expected value or average cost at stage 2 per lot?
b) Alternatively, the company can rework each batch (both good and bad) at a cost of $1000 that always produces a good batch that makes it ready for the $1000 processing. What is the expected cost at the end of stage 2 for this procedure?
c) Alternatively, for $100 a technician can sample 1 chip from each batch in an attempt to determine if it is a good or a bad batch. If the sample chip is bad, the whole lot is reworked at a cost of $1000 (as in b) to guarantee a good lot for stage 2 with the usual $1000 processing. If the sample is good, the lot is put into the standard procedure as in part a). What is the average cost of a lot in this case?
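The three procedures can be compared by walking the probability tree and weighting each branch cost by its probability. The sketch below is one reading of the problem, with the assumption that in part c) the sampled chip is drawn at random from the lot, so it is defective with probability 1/10 for a good lot and 5/10 for a bad lot. All names are illustrative.

```python
p_lot = {"good": 0.8, "bad": 0.2}            # stage 1 lot quality
defect_rate = {"good": 1 / 10, "bad": 5 / 10}  # chance a sampled chip is defective
stage2_cost = {"good": 1000, "bad": 4000}    # standard processing cost
REWORK_COST = 1000
SAMPLE_COST = 100

# a) Standard procedure: average the stage 2 cost over lot quality.
cost_a = sum(p_lot[k] * stage2_cost[k] for k in p_lot)

# b) Rework every lot, then process it as a good lot.
cost_b = REWORK_COST + stage2_cost["good"]

# c) Sample one chip; rework on a defective sample, otherwise standard procedure.
cost_c = SAMPLE_COST
for kind, p in p_lot.items():
    p_defective = defect_rate[kind]
    cost_c += p * p_defective * (REWORK_COST + stage2_cost["good"])
    cost_c += p * (1 - p_defective) * stage2_cost[kind]

print(f"a) {cost_a:.0f}  b) {cost_b:.0f}  c) {cost_c:.0f}")
```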

16. Turkey data. The following data corresponds to the weight of turkeys (y) at age (x) raised in Georgia, Virginia, or Wisconsin.

y      x    Origin
13.3   28   G
8.9    20   G
15.1   32   G
10.4   22   G
13.1   29   V
12.4   27   V
13.2   28   V
11.8   26   V
11.5   21   W
14.2   27   W
15.4   29   W
13.1   23   W
13.8   25   W

(a) Write out the linear model for this problem using our procedure to model a variable such as the state of origin and assuming the weight grows linearly with age (straight line). Use (G,V,W) with for the coefficients corresponding to the states.

17. The data set is an investigation into the relationship between cholinergic nicotine receptors (nAChR) in the brains of schizophrenics compared to the brains of the controls (non-schizophrenics). Complicating the study is some evidence that the number of nicotine receptors decreases with age for everyone; smoking also has an effect because it is associated with early deaths and could cloud the results. Consequently, the ages of the subjects are included and also cotinine levels. Cotinine in the brain is a metabolite of nicotine, has a longer half-life than nicotine, and is a way to capture whether the subject smoked.

Analyze several relationships using linear models.
1. First consider a simple model between nAChR levels and whether the subject is schizophrenic or not. (This is really the basic hypothesis.)
2. Also consider if controlling by age (i.e., including age in the model as a predictor) and/or cotinine levels is necessary to determine a relationship between nAChR levels and schizophrenia.

Give parameter estimates and use the rule of thumb to determine if a coefficient is significant. Also use the MSE for each model to compare with other models. Ultimately give what you think is the best model. Note: In reality a neuroscientist would be part of the project to put results into context. The results here are just what the numbers say.

Schizophrenia   Age   Cotinine   Smoke   nAChR
No              55    2.00       No      18.53
No              83    9.03       No      11.73
No              52    5.6        ?       19.01
No              74    2.00       No      25.93
No              61    2.00       No      21.66
No              56    103.11     ?       25.54
No              80    5.27       ?       11.28
No              84    4.85       ?       16.22
No              49    85.19      Yes     30.69
No              87    78.54      Yes     21.03
No              74    72.33      ?       23.65
No              44    4.40       ?       17.27
No              94    2.00       No      17.34
Yes             91    4.69       ?       11.41
Yes             70    100.70     ?       10.90
Yes             58    65.50      ?       21.38
Yes             61    78.89      ?       12.45
Yes             42    84.64      Yes     27.20
Yes             70    66.74      ?       17.08
Yes             69    108.62     ?       26.77
Yes             30    74.00      ?       19.56
Yes             70    90.08      Yes     17.73
Yes             40    113.77     Yes     26.30
Yes             91    2.00       No      10.30

18. A beverage company has the opportunity to use end aisle displays to sell 6-packs of soda. The marketing department has created three candidate types of displays to use. To determine the best display, the marketing department identifies 15 similar stores and sets up each of the candidate displays at 5 stores. The data collected is the typical number of 6-packs sold in a week in each of the stores with the old displays and in a week with the new display. Set up the problem you would solve to answer the question on what is the best display. DO NOT SOLVE. Give reasons as if you were presenting this approach to your boss in the marketing department.

Store Display   Initial   New
1               110       116
1               316       334
1               225       239
1               206       218
1               172       181
2               88        93
2               215       229
2               246       261
2               313       331
2               151       161
3               262       285
3               134       146
3               425       459
3               156       169
3               198       213
