CSC 411: Lecture 13: Mixtures of Gaussians and EM


Richard Zemel, Raquel Urtasun and Sanja Fidler
University of Toronto

Today

- Mixture of Gaussians
- EM algorithm
- Latent Variables

A Generative View of Clustering

- Last time: the hard and soft k-means algorithms
- Today: a statistical formulation of clustering → a principled justification for the updates
- We need a sensible measure of what it means to cluster the data well
  - This makes it possible to judge different methods
  - It may help us decide on the number of clusters
- An obvious approach is to imagine that the data was produced by a generative model
  - Then we adjust the model parameters to maximize the probability that it would produce exactly the data we observed
Gaussian Mixture Model (GMM)

A Gaussian mixture model represents a distribution as

$$p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)$$

with $\pi_k$ the mixing coefficients, where

$$\sum_{k=1}^{K} \pi_k = 1 \quad \text{and} \quad \pi_k \geq 0 \;\; \forall k$$

- GMM is a density estimator
- Where have we already used a density estimator?
- We know that neural nets are universal approximators of functions
- GMMs are universal approximators of densities (if you have enough Gaussians). Even diagonal GMMs are universal approximators.
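As a concrete illustration, here is a minimal sketch of evaluating this mixture density in Python (assuming NumPy and SciPy are available; the parameter values and names below are made-up examples, not from the lecture):

```python
# Minimal sketch: evaluating p(x) = sum_k pi_k N(x | mu_k, Sigma_k).
# The example parameters below are illustrative only.
import numpy as np
from scipy.stats import multivariate_normal

def gmm_density(x, pis, mus, Sigmas):
    """Mixture density: a weighted sum of Gaussian component densities."""
    return sum(pi * multivariate_normal.pdf(x, mean=mu, cov=Sigma)
               for pi, mu, Sigma in zip(pis, mus, Sigmas))

# A K = 2 mixture in 1D, in the spirit of the visualization below.
pis = [0.4, 0.6]          # mixing coefficients: non-negative, sum to 1
mus = [0.0, 3.0]
Sigmas = [1.0, 0.5]
print(gmm_density(1.0, pis, mus, Sigmas))
```

Because the $\pi_k$ are non-negative and sum to one, $p(x)$ is a convex combination of the component densities and integrates to 1 whenever each component does.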
Visualizing a Mixture of Gaussians – 1D Gaussians

- In the beginning of class, we tried to fit a Gaussian to data
- Now, we are trying to fit a GMM (with K = 2 in this example)

[Slide credit: K. Kutulakos]

Visualizing a Mixture of Gaussians – 2D Gaussians

Fitting GMMs: Maximum Likelihood

Maximum likelihood maximizes

$$\ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \left( \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x^{(n)} \mid \mu_k, \Sigma_k) \right)$$

w.r.t. $\Theta = \{\pi_k, \mu_k, \Sigma_k\}$

- Problems:
  - Singularities: arbitrarily large likelihood when a Gaussian explains a single point
  - Identifiability: the solution is determined only up to permutations of the components
- How would you optimize this? Can we have a closed-form update?
- Don't forget to satisfy the constraints on $\pi_k$

Latent Variables

We could introduce a latent variable $z$ indicating which component generated each point, so that

$$p(x) = \sum_{k=1}^{K} \underbrace{p(z = k)}_{\pi_k} \, \underbrace{p(x \mid z = k)}_{\mathcal{N}(x \mid \mu_k, \Sigma_k)}$$
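To make the objective and the latent-variable view concrete, here is a hedged sketch (function and variable names are my own, not the course's): a numerically stable evaluation of the log-likelihood above via log-sum-exp, and ancestral sampling that generates data exactly the way the model assumes — first draw $z \sim \text{Categorical}(\pi)$, then $x \mid z = k \sim \mathcal{N}(\mu_k, \Sigma_k)$:

```python
# Sketch under stated assumptions: the MLE objective and the generative
# (latent-variable) view of a GMM. Names here are illustrative, not official.
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal

def log_likelihood(X, pis, mus, Sigmas):
    """ln p(X | pi, mu, Sigma) = sum_n ln sum_k pi_k N(x^(n) | mu_k, Sigma_k),
    computed with log-sum-exp for numerical stability."""
    log_joint = np.stack(
        [np.log(pi) + multivariate_normal.logpdf(X, mean=mu, cov=Sigma)
         for pi, mu, Sigma in zip(pis, mus, Sigmas)], axis=1)  # shape (N, K)
    return logsumexp(log_joint, axis=1).sum()

def ancestral_sample(n, pis, mus, Sigmas, seed=0):
    """Sample x by first drawing the latent z ~ Categorical(pi),
    then x | z = k ~ N(mu_k, Sigma_k)."""
    rng = np.random.default_rng(seed)
    z = rng.choice(len(pis), size=n, p=pis)  # latent component labels
    X = np.stack([rng.multivariate_normal(mus[k], Sigmas[k]) for k in z])
    return X, z

# Example with K = 2 components in 2D (made-up parameters):
pis = [0.4, 0.6]
mus = [np.zeros(2), np.array([3.0, 3.0])]
Sigmas = [np.eye(2), 0.5 * np.eye(2)]
X, z = ancestral_sample(500, pis, mus, Sigmas)
print(log_likelihood(X, pis, mus, Sigmas))
```

Note how the singularity problem from the slide shows up here: if one component's mean sits exactly on a data point and its covariance shrinks toward zero, this log-likelihood grows without bound, which is why direct maximization needs care.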