Brandeis University
Division of Graduate Professional Studies
Rabb School of Continuing Studies

Course Syllabus

I. Course Information

1. Introduction to Probability and Statistics
2. RBIF-0103-G1
3. 01/21/2015 - 04/25/2015
4. Distance Learning Course Week: Wednesday through Tuesday
5. Instructor, contact info: Michael B. Partensky, PhD
   Please contact me via email: [email protected], [email protected]
To avoid delays, please send your mail to both addresses if you want to contact me before 01/21/15. Later, please use the Brandeis address.

6. Virtual office hours: Sunday, 11 am to 1 pm (EST) [occasional changes are possible]

7. Document Overview

This syllabus contains all relevant information about the course: its objectives and outcomes, the grading criteria, the texts and other materials of instruction, weekly topics, outcomes, descriptions of assignments, and due dates. Consider this your roadmap for the course. Please read through the syllabus carefully and feel free to share any questions that you may have. Please print a copy of this syllabus for reference.

8. Course Description

Purpose and content. The course builds a foundation for the “probabilistic thinking” method, with applications to real-life problems in bioinformatics, bio- and medical statistics, computational biology and biophysics, and data analysis. The topics cover random numbers, discrete and continuous random variables, elements of Combinatorics, conditional probability, Bayes' formula, Markov chains, the Binomial, Poisson and normal distributions, entropy and information, the Monte Carlo method, the central limit theorem, confidence intervals and hypothesis testing, correlations, nonlinear regression and maximum likelihood. We will also learn some basics of the Mathematica programming language and will use it for computational probabilistic experiments.

Prerequisites. Solid knowledge of basic algebra, geometry and trigonometry would be very helpful for your success. If you are not fluent in basic math, please reserve more time for your weekly studies. Some familiarity with introductory calculus (functions, derivatives, integrals) is preferable, but not required. The lectures will provide you with the necessary background in calculus as needed.

Catching up with Math. In the first week of the class an introductory math quiz will be offered, to help you refresh your math background and allocate adequate time and effort for your weekly studies. The test will cover the areas of basic and more advanced Math directly related to the class. The test is not graded, but it is required (the grade is 100 if you took it or 0 otherwise). Based on the outcome, you will be advised to refresh some of the materials if necessary. Mathematica (see section 9.3), an excellent educational and research software package used intensively throughout the course, will also help in refreshing your Math skills. It is strongly advisable to start practicing Mathematica without delay.
9. Instruction Materials

9.1. Semi-Required Texts (mostly for the individual studies)

1. M.R. Spiegel, J.J. Schiller and R.A. Srinivasan, Schaum's Outline of Probability and Statistics, Schaum’s Outline Series, McGraw-Hill, 3rd ed. (2009), ISBN: 9780071544252
2. E. Don, Schaum's Outline of Mathematica, Schaum’s Outline Series, McGraw-Hill, 2nd ed. (2009), ISBN: 9780071608282
3. C.M. Grinstead and J.L. Snell, Introduction to Probability, American Mathematical Society, 2nd ed. (1997), ISBN: 9780821894149 (this book can also be downloaded from the web for free; please send a thank-you note to the authors)
9.2 Recommended Text(s)
4. D.J. Bennett, Randomness, Harvard University Press, Cambridge (1999), ISBN: 978-0674107465. Enjoyable supplementary reading. A lot of insights, paradoxes, peculiarities.
5. S. Wolfram, Mathematica (9th edition): the reference source. (It is included in e-format in the standard Mathematica distribution.)
6. W.J. Ewens and G.R. Grant, Statistical Methods in Bioinformatics (An Introduction), Springer, 2nd ed. (2005), ISBN-13: 978-0387400822 (will be used only occasionally, but could also be handy in your future study of bioinformatics)
7. R. Durrett, Probability: Theory and Examples (Cambridge Series in Statistical and Probabilistic Mathematics), CUP (2010), ISBN-13: 978-0521765398
8. N.N. Taleb, Fooled by Randomness, Random House, 2nd ed. (2008), ISBN-13: 978-1400067930 [contains a lot of insights and cute examples]
9. W.W. Hines et al., Probability and Statistics in Engineering, Wiley, 4th ed. (2009), ISBN: 978-0471240877
10. R. Durbin, S.R. Eddy, A. Krogh, G. Mitchison, Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, Cambridge University Press, reprint edition (1999), ISBN: 978-0521629713 (the comment from #6 is also applicable here)
9.3 Required Software
Mathematica 10. We will be using Mathematica for the experiments with randomness. In addition, Mathematica will help you to refresh some of the math required for the course. You will be able to purchase a student version of Mathematica 10 (which is fully functional) at a significant discount. To get an additional 15% discount, please enter the promotion code PD1637 at checkout from the Wolfram Web Store at store.wolfram.com. (If asked, please enter my name. This feature is provided to the members of the Wolfram Faculty Program.) Mathematica is an extremely powerful and elegant tool, and I am sure that some of you will find it very useful in your future work.
9.4 On-line Course Content

This course will be conducted completely online using Brandeis’ LATTE site, available at http://latte.brandeis.edu. The site contains the course syllabus, assignments, our discussion forums, links/resources to course-related professional organizations and sites, and weekly checklists, objectives, outcomes, topic notes, self-tests, and discussion questions. Access information is emailed to enrolled participants before the start of the course.

10. Overall Course Outcomes
The course is designed to teach the probabilistic way of thinking. It provides a thorough background in the basics of probability theory and statistics (P&S), the major pillars of bioinformatics and biostatistics. We will take a multidisciplinary approach, drawing examples and ideas from various fields, from statistical physics and computer modeling of proteins to the probabilistic aspects of evolution and biological data analysis. The class will benefit strongly from using Mathematica, the most advanced “computer aided thinking tool”, which helps in understanding the major concepts of P&S, developing algorithms and running random experiments.
| Course Outcome | Assignment / Assessment |
|---|---|
| 1. Apply the elements of set theory to the analysis of complex events and biological sequences. | Lect. 2, 3; HW 2, 3 |
| 2. Use Combinatorics for the analysis of various random selection problems, derivation of major probability distributions and grasping some major combinatorial problems of sequence analysis. | Lect. 3, 4; HW 3, 4; Lect. 10; HW 10. In addition, various Combinatorial concepts are quite evenly distributed over the course, as one of the foundations of Probabilistic Thinking. |
| 3. Apply Binomial, Poisson, geometric, hypergeometric, negative binomial, Normal, exponential and other probability distributions to the analysis of probabilities, sampling errors, sequence similarity. | Lect. 4, 5, 10, 12; HW 4-6, 10-12 |
| 4. Recognize and analyze phenomena described by conditional probability. Use the Bayes formula to analyze prior probabilities given the outcomes. | Lect. 6, 7; HW 6, 7 |
| 5. Apply non-linear regression (NLR) to data modeling; develop Mathematica-based applications of NLR for solving some real-life problems. | Lect. 11; HW 11 |
| 6. Apply the concept of Maximum likelihood to the experimental data analysis. | Lect. 11; HW 11 |
| 7. Analyze some archetypical paradoxes of probability (‘Monty Hall’, ‘prisoner’s dilemma’, second daughter) for the guidance in solving complex real-life statistical problems. | Lect. 2, 7; multiple Q&A forum discussions |
| 8. Apply the measures of central tendency (mean, variance, etc.) for the statistical estimates. | Lect. 10; HW 10 |
| 9. Analyze and simulate with Mathematica various Markov models as a foundation of the major algorithms of sequence analysis (HMM, Blast, etc.). | Lect. 7, 8; HW 8 |
| 10. Use the central limit theorem for the analysis of sampling errors and confidence interval. | Lect. 12; HW 12; test preparation problems |
| 11. Apply the hypothesis testing technique to the analysis of statistical data. | Lect. 12; HW 12; test preparation problems |
| 12. Use relation between entropy and probability, and Boltzmann statistics as fundamental concepts behind the protein dynamics and energetics. Elucidate relation between entropy, disorder, and information. | Lect. 13; Q&A forum discussions |
| 13. Formulate basic principles underlying the Monte Carlo and Molecular dynamics modeling of molecular biological systems. | Lect. 5, 13 (+ videos of MC simulations) |
| 14. Analyze and describe some statistical problems of genetics (Hardy-Weinberg law, probabilities of genetically inherited diseases, applications of Bayesian statistics). | Lect. 7; HW 7 |
| 15. Actively participate in the team work: problem solving in groups. | Weeks 2-13 |
| 16. Use Mathematica as the programming, visualization and presentation environment. | Weeks 1-5: intense introduction to Mathematica; practical applications of Mathematica are evenly distributed between the classes |
Upon completion of the course students will be able to:

Use general principles of P&S in preparation for future work in bioinformatics
- Use the operational definition of probability to estimate the empirical probabilities for random events and biological sequences
- Apply the elements of set theory to the analysis of complex events
- Use Combinatorics for the analysis of various random selection problems, derivation of major probability distributions and grasping some major combinatorial problems of sequence analysis
- Apply Binomial, Poisson, Normal, geometric, hypergeometric and negative binomial distributions to the analysis of probabilities, sampling errors, sequence similarity
- Recognize and analyze phenomena described by conditional probability
- Use the Bayes’ formula to analyze prior probabilities given the outcomes
- Apply non-linear regression (NLR) to data modeling; develop Mathematica-based NLR applications for some practical examples
- Apply the concept of Maximum likelihood to the experimental data analysis
- Analyze some archetypical paradoxes of probability (prisoner’s dilemma, Buffon's needle, etc.) for the guidance in the analysis of complex real-life statistical problems
- Apply the measures of central tendency (mean, variance, etc.) for the statistical estimates
- Analyze and simulate with Mathematica various Markov and random walk models for better understanding of the major algorithms of sequence analysis (HMM, Blast, etc.)
- Use the central limit theorem for the analysis of sampling errors and confidence interval
- Apply the hypothesis testing technique to the analysis of statistical data
- Use the ROC curves approach to the test design

Apply probabilistic methods and concepts to the analysis of biological systems on different levels:
- Use relation between entropy and probability, and Boltzmann statistics as fundamental concepts behind the protein dynamics and energetics
- Formulate basic principles underlying the Monte Carlo and Molecular dynamics modeling of molecular biological systems
- Analyze the probabilistic basis of Mendelian genetics, distribution of alleles, Hardy-Weinberg (HW) theorem

Participate in team research work involving numerical statistical analysis and modeling, and communicate its results to colleagues; make presentations on various statistical topics
- Team work in the class
- Use Mathematica as the programming, visualization and presentation environment

11. General Grading Criteria

The course grade will be based on homework (50%), tests (20%), and student’s activity in class (30%). In addition, students can earn extra credit for various extra activities. This can be done, for instance, by completing the optional assignments offered in most of the lectures, making short presentations (papers + computer experiments), etc.

12. Assignments and Tests: Description, Structure and Grading
13.1 Participation/Attendance

All students are expected to participate regularly. The activities (forum discussions, group activities, reading and Home Work assignments) should be spread evenly over the week.

13.2 Communication, correspondence.

All the emails related to this class will be sent to your Brandeis email account. However, almost everyone has and uses a primary personal account. For this reason it is extremely important to set up forwarding from your Brandeis account to the primary account. It is your responsibility to make sure that all the messages from the instructor and from the school are received on time. At the beginning of the class I will ask you to send me confirmations to make sure that everyone is tuned in.

13.3 Home assignments (content, early submission options, and grading).
General. Every week, a homework assignment will be offered. It typically includes a required part and an extra-credit part. The deadline for the submission is Tuesday, 11:30 pm. Late assignments are not accepted (graded F). In such cases, a make-up can be offered. However, it is highly recommended to submit on time because the class is quite intense and working on additional assignments can jeopardize your progress.

All submissions should be made via LATTE.
Submission options. Usually, you will be offered to choose one of two options: a) Submitting once (single file submission) for final grading. The only deadline in this case is Tuesday 11:30 pm. b) Submitting more than once (multiple file submission). As explained further, this option is also named the “Early submission” (ES), and involves two deadlines [for the first submission (see weekly assignments), and for the final submission, Tuesday, 11:30 pm].
The “Early Submission” (ES) elaborated
ES implies “multiple file submission”, where the originally submitted assignment can be improved and resubmitted. Anyone who chooses this option must submit early, usually by 14:00 on the Sunday preceding the class (unless stated otherwise for a particular week). If the original submission is not perfect, it will be returned to you with the initial grade (we designate it G(1)), with the score assigned to each of the problems, and with some questions and hints helping you to find and fix the errors. Then you are given an opportunity to resubmit and improve your grade.
The initial submission must be complete: you should provide solutions to all the required problems. The first grade G(1) is the starting point, and all the further grades depend on it. At the end, after the resubmission(s), your grade cannot be less than G(1), but you also (except for some rare occasions) cannot get 100% (assuming G(1) was less than 100%). Each submission numbered n (n = 1, 2, 3…) is initially graded based purely on its quality. We name this the “unbiased” grade G(n). The “real” grade for each submission is defined as

$$G_{\text{real}}(n) = \frac{G(1) + G(n)}{2}, \quad n \ge 2, \qquad G_{\text{real}}(1) = G(1). \tag{1}$$

For instance, if the first percentile grade is G(1) = 60 and the second grade (first resubmission) is G(2) = 100, then the final grade is 80%. This approach should motivate you (in addition to submitting early) to receive the starting grade as high as possible.
The individual problems are graded on the scale 0 to 1. In each submission the total percentile grade G(n) is obtained as the total of the scores for the individual problems divided by the total number of the problems, times 100. The real grades for the individual problems are calculated for each submission in the spirit of rule (1). For example, if the (unbiased) grade for a particular problem changes as {0.6, 0.7, 1.0} in the course of three consecutive submissions, then the “real” grade for this problem is {0.6, 0.65, 0.8}, and the final grade is 0.8.

Usually, there is only one resubmission, n = 2. In some cases (especially if the work was submitted earlier during the week, say before Sunday), the second, and, occasionally, even the third resubmission (n = 3, 4) will be allowed.

Please use the same file for all the (re)submissions of a current week! For each resubmission, please create a separate subsection after each solution being fixed, named “Solution n” with n = 2 (for the first resubmission) or n = 3 (for the second). We will learn how to format Mathematica files (including the sectioning) during the first two weeks. All my comments and your solutions (the current one and all the previous) must stay in the file unchanged. Do not delete your previous solutions and my comments!
The name of the file should contain your name and the submission number (n). It is usually derived from the original name of the file (posted by the instructor) by adding “_YourName_n”. For example, suppose the assignment of the 5th week is named HW5.nb. Then HW5_SamClemens_2.nb is the second (2) submission of this assignment by Samuel Clemens. Similarly, HW7_JaneWang_1.nb is Jane’s first submission of HW7.
The discussions that follow the early submission lead to a better and deeper understanding of the course material and improve your overall performance. Naturally, submissions made after the ES deadline (but before the final deadline), are graded only once. There is neither a penalty for not using the ES option, nor a reward for submitting early (except for the opportunity to resubmit and fix the errors).
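The early-submission grading arithmetic described above can be sketched in a few lines. This is only an illustration in Python (not Mathematica), under one stated assumption: rule (1) averages the first unbiased grade G(1) with the unbiased grade G(n) of the current submission, which is the reading that matches both worked examples in this section. The function names `percent_grade` and `real_grades` are hypothetical.

```python
def percent_grade(problem_scores):
    """Total percentile grade G(n): mean of the per-problem scores (0 to 1) times 100."""
    if not problem_scores:
        raise ValueError("at least one problem score is required")
    return 100 * sum(problem_scores) / len(problem_scores)


def real_grades(unbiased):
    """Return the "real" grade after each submission.

    Assumption: rule (1) is (G(1) + G(n)) / 2 for n >= 2, inferred from the
    worked examples. The result is never allowed to drop below G(1),
    as the syllabus states.
    """
    if not unbiased:
        raise ValueError("at least one submission is required")
    first = unbiased[0]
    real = [first]  # the first submission keeps its own grade
    for grade in unbiased[1:]:
        real.append(max(first, (first + grade) / 2))
    return real


# The two examples from the text: a whole assignment and a single problem.
print(real_grades([60, 100]))
print(real_grades([0.6, 0.7, 1.0]))
```

Up to floating-point rounding, the two calls reproduce the examples in the text: a final grade of 80 for the assignment, and 0.6, 0.65, 0.8 for the single problem.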
13.4 Self-tests. Some assignments will be accompanied by self-tests containing problems similar to those from the Home Assignment. These are offered solely for your practice and benefit, and do not have to be submitted. All the self-test problems can be discussed on the open forum.
13.5 Class presentations

Concepts reviewed in the class, or related to those, can be enriched through (optional) student presentations (typically, short papers including examples with Mathematica). This activity is entirely voluntary. It is graded as a participation assignment (see the grading policy). I will suggest a few topics, but quite often students contribute their own ideas and topics, and share P&S-related experiences from their work (I remember remarkable presentations about Bayesian networks, and on Stock Market analysis) or even their hobbies (the Mathematica model of “Texas Hold'em” was one such example). Please indicate your interest in making a presentation as early as possible. The “presentations” will be added to the reading materials, and everyone will be encouraged to read and discuss them on the forums. The starting threads for the discussions can be created by the presenters.
13.6 Groups and related activities.

The class will be divided into several groups, usually three students per group. Certain assignments will be offered for group work, and the answers will be graded as “class activity” or as HW, depending on the type of assignment. Short (30 min) tests will also be offered during some weeks, either for the whole class or separately for the group work. An important component of your class activity is participation in the weekly discussion forums. Your posts (responses, questions, etc.) will be evaluated based on the substantiality of their content. Instead of elaborating our understanding of “substantiality”, it is easier to give examples of non-substantial posts:
“Hi, John. It’s a wonderful idea! I was thinking along the same line! Cheers. Mike”. “Ann, I liked your solution but could not understand the last part. Could you please explain it”.
They both are valid and useful responses. The first one is a kind encouragement, while the second contains a question and invites for further discussion. In other words, they do not contribute to the grade, but they both are valuable and important. All kinds of responses are welcome and important, even if they are not graded as substantive. Besides, there is no sharp boundary between the substantive and other responses. For example a simple inquiry can be considered substantive if it triggers a valuable discussion, and it should be graded accordingly.
Quite often the HW assignments will be offered on a group basis. In such cases detailed instructions will be provided.

Note: In general, the early submission policy is not applied to the group assignments. Instead, the group members are allowed to discuss the solutions on the group forum, in the process of jointly composing the submitted document. The instructor can participate in these discussions and provide hints and advice if necessary.
13.7 Mid-term and Final tests will be offered in the 8th and 13th weeks, respectively. They include 5-6 problems each. Specific instructions will be provided.
13.8 Online Participation
There are four major types of forum activity
(1) Responses to the original questions posted by the instructor (Q&A forum(s)). This includes questions related to the HW assignment and Lecture materials. Answering a certain number of these questions will be required. Each student gains complete access to such forums only after having responded to the first question posted by the instructor. Usually, there will be up to three required questions. The specific instructions will be provided weekly.
(2) Participation in the discussion at Q&A forum(s). After answering the required questions, you will be able to access these forums and participate in the discussion. This is also a valuable component of your participation.
(3) Participation in the open discussion forum(s) (ODF), where a student can ask and respond to any class-related questions (except for the HW assignment).
Comment: The exception is the Home Work problems: they must be solved individually. The only HW-related questions allowed for discussion are the questions posted by the instructor (see A.1). Otherwise, the discussion of HW assignments is prohibited. However, you can ask me HW-related questions (mostly related to the understanding of the problems rather than their solutions) at private forums.
(4) Group discussion forums
These forums will be created for various group activities.
Detailed participation assignments will be posted weekly. Here is an example (we presume that the class week starts on Wednesday):
“ (a) By Friday Night (22:00 EST) post two original (required) responses on Q&A forum(s). (b) Not later than 12:00 (EST) on Monday post two (at least) replies to the posts of other participants (and/or submit your own substantive questions or comments) on Q&A forum, and at least one post on ODF. The posts must be submitted on at least three different days of the online course week. For example, you can post your answers to Q&A on Thursday and Friday, reply to Q&A on Saturday, and participate in ODF during the week".
Online participation is very important. It contributes 30% to the total grade, and it is a very effective learning tool. You will soon realize that the aforementioned requirements are not “abusive”, and discussing your questions with others is rewarding and enjoyable. Most of you will easily surpass the required level of participation.
13.9 Participation Evaluation
First, we introduce two types of students’ responses:
Type 1 (T1): responses to the original questions posted by the instructor at the Q&A forum;
Type 2 (T2): participation in Q&A (after answering the required questions) and ODF discussions.
Points may be earned for original responses and substantive replies based on the following criteria:
Type 1

90-100 pts (Very Thoughtful)
- Discussion is substantive and relates to key principles.
- The answers are complete and well explained.
- The Math and coding part (if present) is correct.
- Provides examples demonstrating application of principles.
- Is submitted according to the deadlines in the course schedule.
- Language is clear, concise, and easy to understand. Uses terminology appropriately and is logically organized.

79-89 pts (Thoughtful)
- Makes reference to key principles, but is not well developed or integrated in the response.
- The answers are not complete and not well explained.
- The Math/coding part (if present) is on the right track, with some errors.
- Offers some examples, but they are not sufficiently illustrative and not well integrated in the response.
- Submitted according to the deadlines in the course schedule.
- Is adequately written, but may use some terms incorrectly; may need to be read two or more times to be understood.

68-78 pts (Somewhat Thoughtful)
- Contains no reference to key principles; if key principles are present, there is no evidence the learner understood principles, or key principles are not integrated into the response.
- The Math/coding part (if present) contains errors.
- Does not offer examples, or the examples are too trivial.
- Response is not submitted by the due date.
- Poorly written; terms are used incorrectly; cannot comprehend learner’s ideas after repeated readings.
Type 2

90-100 pts (Very Thoughtful)
- Is substantially related to and reinforces the unit overview, text, and/or supplementary readings.
- Responds to the ideas and concerns of other learners.
- Math/coding (if present) is correct and clearly explained.
- Is characterized by three to four of the following criteria:
  o Thought-provoking
  o Supportive
  o Challenging
  o Reflective
- Is submitted according to deadlines in the course schedule.
- Language is clear, concise, and easy to understand; uses terminology appropriately and is well organized.

79-89 pts (Thoughtful)
- Contains references to unit overview, text, and/or supplemental readings, but references are not well integrated in the response.
- Response is peripherally related to the ideas and concerns of other learners.
- Math/coding (if present) contains some minor errors or is not explained clearly.
- Is characterized by one or two of the following criteria:
  o Thought-provoking
  o Supportive
  o Challenging
  o Reflective
- Submitted according to deadlines in the course schedule.
- Adequately written, but may use some terms incorrectly; may need to be read two or more times to be understood.

68-78 pts (Somewhat Thoughtful)
- Contains no reference to key principles; if key principles are present, there is no evidence learner understood principles, or key principles are not integrated into the response.
- Math/coding (if present) contains errors.
- Response is unrelated to the ideas and concerns of other learners.
- Response is not thought-provoking, supportive, challenging, or reflective.
- Response is not submitted by the due date.
- Is poorly written; terms are used incorrectly; instructor cannot comprehend learner’s ideas after repeated readings.
The total participation grade is calculated as a weighted average. The responses belonging to T1 directly test your understanding of the lecture material and of the HW assignment. For this reason they are sometimes assigned a higher weight.
For example, the grade X1 for the type 1 responses can be assigned the weight p = 0.6. Then the grade X2 for the contributions of the second type has the weight q = 0.4. The grade X1 itself is the average of the percentile grades for all the required responses to the instructor’s questions at the Q&A forum. The grade X2 for T2 is the average of the grades for the corresponding contributions.
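The participation arithmetic in this section can be sketched as follows. This is a hedged illustration in Python, assuming the final grade is the weighted sum p·X1 + q·X2 of the two category averages and that each category keeps only its best Nreq responses; the function names and the sample grades are hypothetical, and the case of fewer responses than required is not covered by the syllabus, so the sketch simply rejects it.

```python
def category_grade(grades, n_required):
    """Average of the best n_required percentile grades in one category (T1 or T2)."""
    if n_required <= 0:
        raise ValueError("n_required must be positive")
    if len(grades) < n_required:
        # Assumption: the syllabus does not say how missing responses are scored.
        raise ValueError("fewer responses than required")
    best = sorted(grades, reverse=True)[:n_required]
    return sum(best) / n_required


def participation_grade(t1_grades, t2_grades, n_req_t1, n_req_t2, p=0.6, q=0.4):
    """Weighted participation grade, assumed to be p * X1 + q * X2."""
    x1 = category_grade(t1_grades, n_req_t1)  # responses to instructor's questions
    x2 = category_grade(t2_grades, n_req_t2)  # Q&A and ODF discussion posts
    return p * x1 + q * x2


# Hypothetical grades: five T1 responses (best three count) and three T2 posts.
t1 = [70, 60, 80, 50, 55]
t2 = [90, 85, 95]
print(participation_grade(t1, t2, n_req_t1=3, n_req_t2=3))
```

With these sample grades the category averages are 70 and 90, so the weighted result lies closer to the T1 average than a plain mean of the two would.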
In all cases, if the number of responses exceeds the required number of responses, Nreq, the best Nreq responses will be chosen. For example, if the required number for T1 is Nreq = 3 and the actual number of responses is 5, then only the three best responses will be counted towards the grade. Calling their grades x1, x2 and x3, the total grade for T1 becomes X1 = (x1 + x2 + x3)/3. The same holds for T2 responses. The final grade is calculated as

$$X = p\,X_1 + q\,X_2, \qquad p + q = 1.$$

Consider the example: X1 = 70, X2 = 90, p = 0.6, and q = 0.4. Then

$$X = 0.6 \cdot 70 + 0.4 \cdot 90 = 78.$$

The final grade is shifted towards X1, demonstrating the role of the weights.

Note: This “weighting” approach is not strict. For example, if your contribution belonging to the T2 category is original, thought-provoking and demonstrates your deep understanding of the subject, its contribution to the grade will be enhanced.

II. Weekly Information
1. Course schedule and class topics

Some minor changes in the topics, and in their distribution, are possible.
| Week | Starting Date | Topic | Comments (for the week of the class) |
|---|---|---|---|
| 1 | 01/22/14 | 1st Introduction to Mathematica. Mathematical Background for P&S. | You are encouraged to watch the suggested videos even before the class starts. |
| 2 | 01/29/14 | 2nd Introduction to Mathematica. First random simulation with Mathematica. First introduction to Probability (P). Early history of P: on the shoulders of the giants. Laws of chance: are they possible? Conundrums and Paradoxes of probability. | Please volunteer and select topics for the 1st presentations (for weeks 4 or 5).* |
| 3 | 02/05/14 | Random experiments, sample space and random events. Introduction to the set theory. Axioms of Probability. Frequency definition of Probability. Random variables. Probability function and Cumulative Distribution Function. Counting Probabilities: Multiplication Rule. 3rd Introduction to Mathematica. | |
| 4 | 02/12/14 | Counting probabilities (continued). Elements of Combinatorics. Permutations, combinations, binomial coefficients. Bernoulli trials and related Probability Distributions: Binomial, Geometric, Negative binomial distribution. Some applications in Sequence Analysis. 4th Introduction to Mathematica: random numbers, chance experiments with Mathematica; matrices in Mathematica. | |
| 5 | 02/19/14 | Multinomial, Hypergeometric and Poisson distributions. Applications, problem solving. Chance experiments with Mathematica: Monte Carlo Integration. | Please select a topic for the presentation, weeks 9-11. You can use my suggestions or pick your own. The presentation is graded as class participation. If you decide not to present, do not worry. This is a completely voluntary activity. |
| 6 | 02/26/14 | Conditional probability. Independence, Global independence. Total probability Rule. Simulations with Mathematica. | |
| 7 | 03/05/14 | Bayes formula and related “paradoxes”. Two-stage experiments. Hardy-Weinberg theorem. Markov Chain: recursive treatment. Practice for the test. | |
| 8 | 03/12/14 | Mid-term test (it may be distributed between weeks 8 and 9). Markov chain: matrix-based treatment. Applications to Bioinformatics: CpG islands. Random walks. | |
| 9 | 03/19/14 | Integrals with Mathematica. Continuous random variables. Distribution function, CDF. Important distributions and densities: Uniform, Exponential, Gamma, Normal, Chi-Square. Relations between Binomial, Poisson and Normal distributions. Practice. | |
| 10 | 03/26/14 | Practice with the continuous distributions. Mean, Variance and other estimators (moments) for discrete and continuous random variables. Sums of random variables. Some applications of Probability Distributions in bioinformatics. | |
| 11 | 04/02/14 | Joint distributions, marginal distribution. Introduction to data modeling: (1) Maximum Likelihood; (2) Linear and Non-linear regression. Real-life examples with Mathematica. | |
| 12 | 04/09/14 | Distributions of sums of random variables. Laws of large numbers. Central limit theorem. Confidence Interval. Hypothesis testing (1). Random samples and sampling distributions. | |
| 13 | 04/16/14 | Probability and Entropy. Boltzmann distribution. Monte Carlo and Molecular dynamics simulation of biomolecules. Final Test. Review. | |

*The link to the suggested topics can be found on the “Lecture Materials” page (LATTE), but you are also encouraged to suggest your own topics.
2. Weekly assignments
Every week we offer a HW assignment typically including 5-9 problems (one of them is usually an extra-credit problem). The assignments are in Mathematica notebook format, and Mathematica is used both for solving the problems and for formatting the submitted document. Some assignments include random experiments with Mathematica. The early submission policy is described in section 13.3. The latest submission time is Tuesday, 11:30 pm. The assignments will also include the “participation tasks”, especially the forum activities.
3. Weekly outcomes
1. At the end of week 1, students will:
Refresh the main mathematical concepts and tools used in the class, including some elements of algebra, sums, products, and integrals.
Write their first Mathematica-based programs using functions, tables, the random number generator, and plots.
2. At the end of week 2, students will be able to:
Describe the major sources of Probability Theory.
Describe some archetypical paradoxes of probability.
Apply some basic analytical and visualization tools of Mathematica.
Run simple random simulations with Mathematica.
3. At the end of week 3, students will be able to:
Describe the sample spaces of various random experiments.
Analyze simple and complex events in terms of set theory.
Apply set theory to the classification of amino acids.
Use the frequency-based definition of probability to analyze the probabilities of different random events.
Describe random phenomena in terms of random variables, the probability function, and the cumulative distribution function.
Apply the Multiplication Rule to counting the outcomes of sequential experiments.
4. At the end of week 4, students will be able to:
Use combinatorics to analyze various random selection problems, derive major probability distributions, and grasp some major combinatorial problems of sequence analysis.
Run simple statistical simulations in Mathematica.
Recognize the Bernoulli trials process and the related discrete probability distributions (DPDs): Binomial, Geometric, and Negative Binomial.
Describe some sequence analysis problems in terms of discrete probability distributions.
Perform basic operations on matrices using Mathematica.
5. At the end of week 5, students will be able to:
Perform numeric computations using the Monte Carlo approach.
Formulate the basic principles underlying the Monte Carlo approach to computer modeling.
Apply the Poisson, Hypergeometric, and Multinomial DPDs to the analysis of random events.
Recognize the differences between sampling with and without replacement.
6. At the end of week 6, students will be able to:
Apply various statistical distributions to the analysis of random events.
Investigate properties of the statistical distributions using Mathematica-based algorithms.
Analyze the properties of related events in terms of conditional probability.
Investigate the pairwise and global independence of events.
7. At the end of week 7, students will be able to:
Apply Bayes' formula to the analysis of posterior probabilities.
Using Bayes' approach, analyze the reliability of tests based on the two types of errors and prevalence.
Apply the concepts of specificity and sensitivity to the analysis of medical tests.
Analyze multi-step random experiments.
Derive the Hardy-Weinberg theorem.
Describe the general properties of Markov chains.
Apply the recursive approach to the analysis of Markov chains.
8. At the end of week 8, students will be able to:
Apply matrices to the analysis of Markov chains (MC).
Apply MC to bioinformatics (detecting CpG islands).
Simulate random walks with Mathematica.
9. At the end of week 9, students will be able to:
Describe continuous probability distributions in terms of the pdf and cdf.
Apply Mathematica to analytical and numerical computations of continuous probabilities.
Recognize and use some important continuous distributions: Uniform, Exponential, Gamma, and Normal.
Recognize and apply joint and marginal distributions.
10. At the end of week 10, students will be able to:
Compute the measures of central tendency and dispersion (expectation, variance, etc.) and apply them to the analysis of statistical properties.
Analyze the statistical properties of sums of random variables using Mathematica-based simulations.
Apply the law of large numbers to the analysis of the asymptotic behavior of sums.
11. At the end of week 11, students will be able to:
Apply the Maximum Likelihood approach to the modeling of statistical data.
Use linear and nonlinear regression for data modeling and analysis of correlations.
Use Mathematica-based statistical algorithms (NonlinearFit, Anova, LinearRegression) for data analysis.
Develop Mathematica-based nonlinear regression (NLR) tools for some practical applications.
12. At the end of week 12, students will be able to:
Use the central limit theorem, and explain the special role played in statistics by the normal distribution.
Explain the concepts of "confidence interval", "statistical significance" and "p-values", and their role in statistics.
Apply these concepts to hypothesis testing.
13. At the end of week 13, students will be able to:
Describe the relation between probability and entropy (Boltzmann's formula).
Describe the Boltzmann distribution and explain its use for the derivation of equilibrium statistical properties (examples of gases).
Explain the physical principles behind Monte Carlo simulation of biological systems.
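As an illustration of the week 7 outcome on test reliability, here is a minimal sketch in Python (the course itself uses Mathematica; the function name and the numerical values below are hypothetical and are not taken from the course materials). It applies Bayes' formula to compute the probability that a person with a positive test result actually has the condition, given the test's sensitivity, specificity, and the prevalence of the condition.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Return P(condition | positive test) using Bayes' formula.

    sensitivity = P(positive | condition)
    specificity = P(negative | no condition)
    prevalence  = P(condition)
    """
    # Joint probability of having the condition and testing positive.
    true_positive = sensitivity * prevalence
    # Joint probability of not having the condition and testing positive.
    false_positive = (1.0 - specificity) * (1.0 - prevalence)
    # Total probability of a positive test is the sum of the two cases.
    return true_positive / (true_positive + false_positive)


# Hypothetical values: a fairly accurate test for a rare condition.
ppv = positive_predictive_value(sensitivity=0.99, specificity=0.95, prevalence=0.01)
print(f"P(condition | positive) = {ppv:.3f}")  # prints 0.167
```

With these assumed numbers, only about one positive result in six corresponds to a true case, because false positives from the large healthy group outnumber true positives from the small affected group.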
4. Weekly Reading assignments.
The lecture materials are mostly self-contained. Additional reading assignments will be posted.
III. Course Policies and Procedures
Late Policies
The homework assignments must be submitted prior to the class, no later than 11:30 pm on Tuesday. Those who do not submit their assignments on time will have to take a make-up test. The class is quite intense, so it is in your best interest to complete your assignments on time.
Grading Standards
Work expectations
Students are responsible for exploring each week's materials and submitting the required work by the due dates. On average, a student can expect to spend approximately 9-12 hours per week reading and completing assignments (more specific recommendations will be made individually during week 2). This presumes that a student's educational background satisfies the prerequisites; otherwise, more effort will be required. The assignments will be posted at the beginning of each week (Wednesday morning). Grades are not given but are earned. Students are graded on demonstration of knowledge or competence, rather than on effort alone. Each student is expected to maintain high standards of honesty and ethical behavior.
How points and percentages equate to grades
%        Letter grade    %        Letter grade
98-100   A+              70-74    C+
94-97    A               65-69    C
90-93    A-              60-64    C-
85-89    B+              50-59    D
80-84    B               0-49     F
75-79    B-
Extra credit: adds up to 10% of the base grade.
Attention: Sage converts both A+ and A into 4.0. I will still use A+ as a token of my appreciation for a job done far above the required level. In a practical sense, the extra “+” can be used to improve your other grades if needed.
Feedback
Feedback will be provided on assignments and exams within 2-3 days of receipt. Responses to the forum posts will be provided at least 4 times per week.
Confidentiality
We can draw on the wealth of examples from our organizations in class discussions and in our written work. However, it is imperative that we not share information that is confidential, privileged, or proprietary in nature. We must be mindful of any contracts we have agreed to with our companies. In addition, we should respect our fellow classmates and work under the assumption that what is discussed here (as it pertains to the workings of particular organizations) stays within the confines of the classroom.
For your awareness, members of the University's technical staff have access to all course sites to aid in course setup and technical troubleshooting. Program Chairs and a small number of Graduate Professional Studies (GPS) staff have access to all GPS courses for oversight purposes. Students enrolled in GPS courses can expect that individuals other than their fellow classmates and the course instructor(s) may visit their course for various purposes. Their intentions are to aid in technical troubleshooting and to ensure that quality course delivery standards are met. Strict confidentiality of student information is maintained.
Class Schedule
Week 1   01/22 - 01/28    Week 7   03/05 - 03/11
Week 2   01/29 - 02/04    Week 8   03/12 - 03/18
Week 3   02/05 - 02/11    Week 9   03/19 - 03/25
Week 4   02/12 - 02/18    Week 10  03/26 - 04/01
Week 5   02/19 - 02/25    Week 11  04/02 - 04/08
Week 6   02/26 - 03/04    Week 12  04/09 - 04/15
                          Week 13  04/16 - 04/22
IV. University and Division of Continuing Studies Standards
Please review the policies and procedures of Continuing Studies, found at http://www.brandeis.edu/gps/students/studentresources/policiesprocedures/index.html. Among them, we would like to highlight the following.
Learning Disabilities
If you are a student with a documented disability on record at Brandeis University and wish to have a reasonable accommodation made for you in this course, please contact me immediately.
Academic Honesty and Student Integrity
Academic honesty and student integrity are of fundamental importance at Brandeis University, and we want students to understand this clearly at the start of the term. As stated in the Brandeis Rights and Responsibilities handbook, "Every member of the University Community is expected to maintain the highest standards of academic honesty. A student shall not receive credit for work that is not the product of the student's own effort. A student's name on any written exercise constitutes a statement that the work is the result of the student's own thought and study, stated in the student's own words, and produced without the assistance of others, except in quotes, footnotes or references with appropriate acknowledgement of the source." In particular, students must be aware that material (including ideas, phrases, sentences, etc.) taken from the Internet and other sources MUST be appropriately cited if quoted, and footnoted in any written work turned in for this, or any, Brandeis class. Also, students will not be allowed to collaborate on work except by the specific permission of the instructor. Failure to cite resources properly may result in a referral being made to the Office of Student Development and Judicial Education.
The outcome of this action may involve academic and disciplinary sanctions, which could include (but are not limited to) such penalties as receiving no credit for the assignment in question, receiving no credit for the related course, or suspension or dismissal from the University.
Further information regarding academic integrity may be found in the following publications: "In Pursuit of Excellence - A Guide to Academic Integrity for the Brandeis Community", "(Students') Rights and Responsibilities Handbook" and "Continuing Studies Student Handbook". You should read these publications, all of which can be accessed from the Continuing Studies Web site. A student who is in doubt about standards of academic honesty (regarding plagiarism, multiple submissions of written work, unacknowledged or unauthorized collaborative effort, false citation or false data) should consult either the course instructor or other staff of the Rabb School of Continuing Studies.
University Caveat
The above schedule, content, and procedures in this course are subject to change in the event of extenuating circumstances.