Mixture Models & EM Algorithm Lecture 21

Total Page:16

File Type:pdf, Size:1020Kb

Mixture Models & EM Algorithm Lecture 21 Mixture Models & EM algorithm Lecture 21 David Sontag New York University Slides adapted from Carlos Guestrin, Dan Klein, Luke Ze@lemoyer, Dan Weld, Vibhav Gogate, and Andrew Moore The Evils of “Hard Assignments”? • Clusters may overlap • Some clusters may be “wider” than others • Distances can be deceiving! ProbabilisOc Clustering • Try a probabilisOc model! Y X X • allows overlaps, clusters of different 1 2 size, etc. ?? 0.1 2.1 • Can tell a generave story for ?? 0.5 -1.1 data ?? 0.0 3.0 – P(X|Y) P(Y) ?? -0.1 -2.0 • Challenge: we need to esOmate model parameters without ?? 0.2 1.5 labeled Ys … … … The General GMM assumpOon • P(Y): There are k components • P(X|Y): Each component generates data from a mulvariate Gaussian with mean μi and covariance matrix Σi Each data point is sampled from a genera&ve process: 1. Choose component i with probability P(y=i) 2. Generate datapoint ~ N(mi, Σi ) µ2 Gaussian mixture model µ (GMM) 1 µ3 (x µ )2 i− i0 2σ2 1 − i √ e = ln σi 2π (x µ )2 i− i1 1 2σ2 e− i σi√2π 2 2 (xi µi0) (xi µi1) = − 2 + − 2 − 2σi 2σi 2 2 µi0 + µi1 µi0 + µi1 = 2 xi + 2 σi 2σi 2 2 1 θ µi0 + µi1 w0 = ln − + 2 θ 2σi µi0 + µi1 wi = 2 σi w = w + η [y∗ p(y∗ x ,w)]f(x ) j − j | j j j w = w +[y∗ y(x;w)]f(x) − w = w + y∗f(x) h(x) = sign αihi(x) i n Count(Y = y)= δ(Y j = y) What Model Should We Use? =1 j n • Depends on X! j Count(Y = y)= D(j)δ(Y = y) Y X1 X2 • Here, maybe Gaussian Naïve Bayes? j=1 ?? 0.1 2.1 – MulOnomial over clusters Y ?? 0.5 -1.1 f(x)=– (Independent) Gaussian for each Xw0 + wihi(x) i given Y ?? 0.0 3.0 i ?? -0.1 -2.0 p(Yi = yk)=θk ?? 0.2 1.5 … … … 4 Could we make fewer assumpOons? • What if the Xi co-vary? • What if there are mulOple peaks? • Gaussian Mixture Models! – P(Y) sOll mulOnomial – P(X|Y) is a mulOvariate Gaussian distribuOon: 1 # 1 T & P(X x |Y i) exp x "1 x = j = = m/2 1/2 %" ( j " µi ) !i ( j " µi )( (2! ) || !i || $ 2 ' MulOvariate Gaussians 1 # 1 T & P(X x |Y i) exp x "1 x =P(X=j xj)== = m/2 1/2 %" ( j " µi ) !i ( j " µi )( (2! ) || !i || $ 2 ' Σ ∝ idenOty matrix MulOvariate Gaussians 1 # 1 T & P(X x |Y i) exp x "1 x =P(X=j xj)== = m/2 1/2 %" ( j " µi ) !i ( j " µi )( (2! ) || !i || $ 2 ' Σ = diagonal matrix Xi are independent ala Gaussian NB MulOvariate Gaussians 1 # 1 T & P(X x |Y i) exp x "1 x =P(X=j xj)== = m/2 1/2 %" ( j " µi ) !i ( j " µi )( (2! ) || !i || $ 2 ' Σ = arbitrary (semidefinite) matrix: - specifies rotaon (change of basis) - eigenvalues specify relave elongaon MulOvariate Gaussians Eigenvalue, λ, of Σ Covariance matrix, Σ, = degree to which xi vary together 1 # 1 T & P(X x |Y i) exp x "1 x =P(X=j xj)== = m/2 1/2 %" ( j " µi ) !i ( j " µi )( (2! ) || !i || $ 2 ' Mixtures of Gaussians (1) Old Faithful Data Set Time to Erupon Duraon of Last ErupOon Mixtures of Gaussians (1) Old Faithful Data Set Single Gaussian Mixture of two Gaussians Mixtures of Gaussians (2) Combine simple models into a complex model: Component Mixing coefficient K=3 Mixtures of Gaussians (3) Eliminang Hard Assignments to Clusters Model data as mixture of mulOvariate Gaussians Eliminang Hard Assignments to Clusters Model data as mixture of mulOvariate Gaussians Eliminang Hard Assignments to Clusters Model data as mixture of mulOvariate Gaussians Shown is the posterior probability that a point was generated from ith Gaussian: Pr(Y = i x) | ML esOmaon in supervised seng • Univariate Gaussian • Mixture of Mul&variate Gaussians ML esOmate for each of the MulOvariate Gaussians is given by: n n 1 T µ k = x k 1 k k ML ! n !ML = #(x j " µML )(x j " µML ) n j=1 n j=1 Just sums over x generated from the k’th Gaussian That was easy! But what if unobserved data? • MLE: – argmaxθ ∏j P(yj,xj) – θ: all model parameters • eg, class probs, means, and variances • But we don’t know yj’s!!! • Maximize marginal likelihood: K – argmaxθ ∏j P(xj) = argmax ∏j ∑k=1 P(Yj=k, xj) How do we opOmize? Closed Form? • Maximize marginal likelihood: K – argmaxθ ∏j P(xj) = argmax ∏j ∑k=1 P(Yj=k, xj) • Almost always a hard problem! – Usually no closed form soluOon – Even when lgP(X,Y) is convex, lgP(X) generally isn’t… – For all but the simplest P(X), we will have to do gradient ascent, in a big messy space with lots of local opOmum… Learning general mixtures of Gaussian 1 ⎡ 1 T ⎤ P(y k | x ) exp x −1 x P(y k) = j ∝ m / 2 1/ 2 ⎢ − ( j − µk ) Σk ( j − µk )⎥ = (2π) || Σk || ⎣ 2 ⎦ • Marginal likelihood: € m m K ∏P(x j ) = ∏∑P(x j ,y = k) j =1 j =1 k=1 m K 1 ⎡ 1 T ⎤ exp x −1 x P(y k) = ∏∑ m / 2 1/ 2 ⎢ − ( j − µk ) Σk ( j − µk )⎥ = j =1 k=1 (2π) || Σk || ⎣ 2 ⎦ • Need to differenOate and solve for μk, Σk, and P(Y=k) for k=1..K € • There will be no closed form soluOon, gradient is complex, lots of local opOmum • Wouldn’t it be nice if there was a be<er way!?! Expectaon Maximizaon The EM Algorithm • A clever method for maximizing marginal likelihood: K – argmaxθ ∏j P(xj) = argmaxθ ∏j ∑k=1 P(Yj=k, xj) – A type of gradient ascent that can be easy to implement (eg, no line search, learning rates, etc.) • Alternate between two steps: – Compute an expectaon – Compute a maximizaon • Not magic: sll op&mizing a non-convex func&on with lots of local op&ma – The computaons are just easier (oqen, significantly so!) EM: Two Easy Steps K K Objecve: argmaxθ lg∏j ∑k=1 P(Yj=k, xj |θ) = ∑j lg ∑k=1 P(Yj=k, xj|θ) Data: {x | j=1 .. n} Notaon a bit inconsistent j Parameters = θ=λ • E-step: Compute expectaons to “fill in” missing y values according to current parameters, θ – For all examples j and values k for Yj, compute: P(Yj=k | xj, θ) • M-step: re-esOmate the parameters with “weighted” MLE esOmates – Set θ = argmaxθ ∑j ∑k P(Yj=k | xj, θ) log P(Yj=k, xj |θ) Especially useful when the E and M steps have closed form soluOons!!! EM algorithm: Pictorial View Given a set of Parameters and training data Class assignment is probabilisOc or weighted (soq EM) Class assignment is Supervised learning hard (hard EM) problem EsOmate the class of each relearn the parameters training example using the based on the new training parameters yielding new data (weighted) training data Simple example: learn means only! Consider: • 1D data • Mixture of k=2 Gaussians • Variances fixed to σ=1 • DistribuOon over classes is uniform .01 .03 .05 .07 .09 • Just need to esOmate μ1 and μ2 m K m K =2 ⎡ 1 2 ⎤ ∏∑P(x,Y j = k) ∝ ∏∑exp⎢ − 2 x − µk ⎥ P(Y j = k) j =1 k=1 j =1 k=1 ⎣ 2σ ⎦ € EM for GMMs: only learning means Iterate: On the t’th iteraon let our esOmates be (t) (t) (t) λt = { μ1 , μ2 … μK } E-step Compute “expected” classes of all datapoints ⎛ 1 2⎞ P Y = k x ,µ ...µ ∝ exp⎜ − x − µ ⎟P Y = k ( j j 1 K ) ⎝ 2σ 2 j k ⎠ ( j ) M-step Compute most likely new μs given class expectaons € m P Y k x x ∑ ( j = j ) j j =1 µk = m P Y k x ∑ ( j = j ) j =1 € (t) E.M. for General GMMs pk is shorthand for esOmate of P(y=k) on Iterate: On the t’th iteraon let our esOmates be t’th iteraon (t) (t) (t) (t) (t) (t) (t) (t) (t) λt = { μ1 , μ2 … μK , Σ1 , Σ2 … ΣK , p1 , p2 … pK } E-step Compute “expected” classes of all datapoints for each class (t ) (t ) (t ) P Y k x , p p x , Just evaluate a ( j = j λt ) ∝ k ( j µk Σk ) Gaussian at xj M-step Compute weighted MLE for μ given expected classes above € T P Y = k x ,λ x P Y k x , x (t +1) x (t +1) ∑ ( j j t ) j ∑ ( j = j λt ) [ j − µk ][ j − µk ] (t +1) j (t +1) j µk = Σk = P Y = k x ,λ P Y k x , ∑ ( j j t ) ∑ ( j = j λt ) j j P Y k x , ∑ ( j = j λt ) p (t +1) = j k m € € m = #training examples € Gaussian Mixture Example: Start Aqer first iteraon Aqer 2nd iteraon Aqer 3rd iteraon Aqer 4th iteraon Aqer 5th iteraon Aqer 6th iteraon Aqer 20th iteraon What if we do hard assignments? Iterate: On the t’th iteraon let our esOmates be (t) (t) (t) λt = { μ1 , μ2 … μK } E-step Compute “expected” classes of all datapoints ⎛ 1 2⎞ P Y = k x ,µ ...µ ∝ exp⎜ − x − µ ⎟P Y = k ( j j 1 K ) ⎝ 2σ 2 j k ⎠ ( j ) M-step δ represents hard Compute most likely new μs given class expectaons assignment to “most likely” or nearest € m cluster P Y = k x x Y k,x x ∑ ( j j ) j δ( j = j ) j j =1 µ = µ = k m k m δ Y = k,x P Y = k x ∑ ( j j ) ∑ ( j j ) j =1 j =1 Equivalent to k-means clustering algorithm!!! € € Lets look at the math behind the magic! Next lecture, we will argue that EM: • OpOmizes a bound on the likelihood • Is a type of coordinate ascent • Is guaranteed to converge to a (oqen local) opOma .
Recommended publications
  • 5892 Cisco Category: Standards Track August 2010 ISSN: 2070-1721
    Internet Engineering Task Force (IETF) P. Faltstrom, Ed. Request for Comments: 5892 Cisco Category: Standards Track August 2010 ISSN: 2070-1721 The Unicode Code Points and Internationalized Domain Names for Applications (IDNA) Abstract This document specifies rules for deciding whether a code point, considered in isolation or in context, is a candidate for inclusion in an Internationalized Domain Name (IDN). It is part of the specification of Internationalizing Domain Names in Applications 2008 (IDNA2008). Status of This Memo This is an Internet Standards Track document. This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Further information on Internet Standards is available in Section 2 of RFC 5741. Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc5892. Copyright Notice Copyright (c) 2010 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
    [Show full text]
  • University Microfilms, a XEROX Company, Ann Arbor, Michigan DIFFRACTION by DOUBLY CURVED
    70-26,382 VOLTMER, David Russell, 1939- DIFFRACTION BY DOUBLY CURVED CONVEX SURFACES. The Ohio State University, Ph.D., 1970 Engineering, electrical University Microfilms, A XEROX Company, Ann Arbor, Michigan DIFFRACTION BY DOUBLY CURVED CONVEX SURFACES DISSERTATION Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University by David Russell Voltmer, B.S., M.S.E.E. ****** The Ohio State University 1970 Approved by 6 & k v t Adviser ^Adviser Department of Electrical Engineering t ACKNOWLEDGMENTS This work, though written in my name, reflects the efforts of many others; I take this opportunity to credit them and to offer my gratitude to them. The continued guidance, encourage­ ment, and assistance of Professor Robert G. Kouyoumjian, my adviser, has been instrumental in the successful completion of this study. Professors Leon Peters, Jr. and Roger C. Rudduck, as members of the reading committee, aided in the editing of the manuscript with their suggestions. I especially thank my wife, Joan, who provided understanding, moral support, and love through a difficult time. The majority of this work was accomplished while I was under the sponsorship of Hughes Aircraft Company. Additional assistance was provided by the stimulation and encouragement of the Plasma Physics Laboratory of the USAF Aerospace Research Laboratories. ii VITA June 4, 1939 Born - Ottumwa, Iowa 1961 Be Sc, Iowa State University, Ames, Iowa 1961-1963 Hughes Work-Study Masters Fellow, Hughes Aircraft Company, Fullerton, California 1963 MoS.EcEc, University of Southern California, Los Angeles, California 1963-1966 Hughes Staff Doctoral Fellow, The Ohio State University, Columbus, Ohio 1966-1969 Research Engineer, Plasma Physics Laboratory, USAF Aerospace Research Laboratories, Wright-Patterson AFB, Ohio 1969-1970 Assistant Professor, Department of Electrical Engineering, Pennsylvania State University, University Park, Pennsylvania FIELDS OF STUDY Major Field: Electrical Engineering Studies in Electromagnetism.
    [Show full text]
  • Kyrillische Schrift Für Den Computer
    Hanna-Chris Gast Kyrillische Schrift für den Computer Benennung der Buchstaben, Vergleich der Transkriptionen in Bibliotheken und Standesämtern, Auflistung der Unicodes sowie Tastaturbelegung für Windows XP Inhalt Seite Vorwort ................................................................................................................................................ 2 1 Kyrillische Schriftzeichen mit Benennung................................................................................... 3 1.1 Die Buchstaben im Russischen mit Schreibschrift und Aussprache.................................. 3 1.2 Kyrillische Schriftzeichen anderer slawischer Sprachen.................................................... 9 1.3 Veraltete kyrillische Schriftzeichen .................................................................................... 10 1.4 Die gebräuchlichen Sonderzeichen ..................................................................................... 11 2 Transliterationen und Transkriptionen (Umschriften) .......................................................... 13 2.1 Begriffe zum Thema Transkription/Transliteration/Umschrift ...................................... 13 2.2 Normen und Vorschriften für Bibliotheken und Standesämter....................................... 15 2.3 Tabellarische Übersicht der Umschriften aus dem Russischen ....................................... 21 2.4 Transliterationen veralteter kyrillischer Buchstaben ....................................................... 25 2.5 Transliterationen bei anderen slawischen
    [Show full text]
  • Fee Trairu S~Ifitufa1 Inumtci Tleoltim~A Hudm
    S~ifItufa1 inUMtci tLEOLtIM~a hudm fee trairu is. pas scu IL.OPfSCT0 WCNWNCA0~ AUMaMAn It. TV"c opbc"f uLmmsm A Caum Mwsul for Eaxtin Trani Prollmmi Wift"MVGLmn v BlrATEE Al IVAN. W/W A It p.rinted: April. 1970 ZIformation Rlesources Division/PDIM TAMJ OF Comus IN.R EDUCTW N .. ....I $ ... ............. .................. A. EXTUMM TRADWC- -FHM I ............................. mAdmmwmofm u m a ............................. Ww, Ij . ..................... ... ,1 u lotmAVaud A .. ......... ....... l so n (%Bch lu t .......... ...... Fo Vm u ....... ...... 0 Rw M &Plouru .. ... ... S. T1Dg OFg ilft1 OF AGURLATURAL LXTDGCRE FlG RAM C EXT)OM TUMIAWG- MAK II . .......................... , m Y A d . ........... ...... Hm Fer O AM IkM 160 ..................... 1u Lm " " Pms . .. ............................. Ag.mml lm g . .lm. .. ... .. .. .. .. Cjm,mn m om%m .................................... '. ,.mLAW La mm' .. .. ............................ lulusm 49 .. .. .. .. .. la a llw " .... .. p wm v Ch,am ca ra ...... ............................ ..... we . ° . .° -°• AP IM ........ .... o . .. .. .... .. .. °.o. N tu am a %P nodC to" sow smm snm a" No 0 b oln* a"n 6IN WML kl ia tim o U a aw m &W aer ao OwMeL t pom mu biw bre boe bes p fro a r of in w.l - ow Dowbop .a,UeAsid SIM tmew of Air uir e. MIinJ PretIla As ­ C'UNIVral COm au&Inm NA 1I COOTRal Psgbm P110119 4 but7 1118bw ts a d foair ia ist rare ctent or miarl tm cba directly WMV, dweo Wo a Loral peugram aIs the w ad tow a s of tho biy to s Ue tow anmei olLowd in tM hadboo mre adpt for &M tried i the traWL aI Kos lys skn gtaff Tlwy bave bee sucs in that stato A sbrvej of codkm for 0220"bM trals" wa the first stop MANY '.
    [Show full text]
  • Transdermal Chronopharmaceutical Drug Delivery: Microneedles, Intradermal Infusion Experiments and a Delivery Device
    Michael Vosseler Transdermal chronopharmaceutical drug delivery: microneedles, intradermal infusion experiments and a delivery device Dissertation zur Erlangung des Doktorgrades der Technischen Fakultät der Albert-Ludwigs-Universität Freiburg im Breisgau April 3rd 2013 I Dekan Prof. Dr. Yiannos Manoli Gutachter Prof. Dr. Roland Zengerle Prof. Dr. Holger Reinecke Tag der Prüfung 05.02.2014 Michael Vosseler Hahn-Schickard-Gesellschaft für Angewandte Forschung e.V. Institut für Mikro- und Informationstechnik (HSG-IMIT) Villingen-Schwenningen II Erklärung Ich erkläre, dass ich die vorliegende Arbeit ohne unzulässige Hilfe Dritter und ohne Benutzung anderer als der angegebenen Hilfsmittel angefertigt habe. Die aus ande- ren Quellen direkt oder indirekt übernommenen Daten und Konzepte sind unter An- gabe der Quelle gekennzeichnet. Insbesondere habe ich hierfür nicht die entgeltliche Hilfe von Vermittlungs- oder Beratungsdiensten (Promotionsberaterinnen oder Pro- motionsberater oder anderer Personen) in Anspruch genommen. Niemand hat von mir unmittelbar oder mittelbar geldwerte Leistungen für Arbeiten erhalten, die im Zu- sammenhang mit dem Inhalt der vorgelegten Dissertation stehen. Die Arbeit wurde bisher weder im In- noch im Ausland in gleicher oder ähnlicher Form einer anderen Prüfungsbehörde vorgelegt. Ich erkläre hiermit, dass ich mich noch nie an einer in- oder ausländischen wissen- schaftlichen Hochschule um die Promotion beworben habe oder gleichzeitig be- werbe. Datum/date: Unterschrift/signature: III Abstract The beginning of modern chronotherapy was the delivery of cancer medication according to circadian (about 24 h) rhythms in the 70s and 80s of the last century. Nowadays, there is a lot of knowledge on rhythms of medical conditions and especially on the exacerbation of symptoms with time of day. Chronic conditions ranging from asthma to stroke are studied intensively.
    [Show full text]
  • The Unicode Standard 5.1 Code Charts
    Cyrillic Extended-B Range: A640–A69F The Unicode Standard, Version 5.1 This file contains an excerpt from the character code tables and list of character names for The Unicode Standard, Version 5.1. Characters in this chart that are new for The Unicode Standard, Version 5.1 are shown in conjunction with any existing characters. For ease of reference, the new characters have been highlighted in the chart grid and in the names list. This file will not be updated with errata, or when additional characters are assigned to the Unicode Standard. See http://www.unicode.org/errata/ for an up-to-date list of errata. See http://www.unicode.org/charts/ for access to a complete list of the latest character code charts. See http://www.unicode.org/charts/PDF/Unicode-5.1/ for charts showing only the characters added in Unicode 5.1. See http://www.unicode.org/Public/5.1.0/charts/ for a complete archived file of character code charts for Unicode 5.1. Disclaimer These charts are provided as the online reference to the character contents of the Unicode Standard, Version 5.1 but do not provide all the information needed to fully support individual scripts using the Unicode Standard. For a complete understanding of the use of the characters contained in this file, please consult the appropriate sections of The Unicode Standard, Version 5.0 (ISBN 0-321-48091-0), online at http://www.unicode.org/versions/Unicode5.0.0/, as well as Unicode Standard Annexes #9, #11, #14, #15, #24, #29, #31, #34, #38, #41, #42, and #44, the other Unicode Technical Reports and Standards, and the Unicode Character Database, which are available online.
    [Show full text]
  • ISO/IEC International Standard 10646-1
    JTC1/SC2/WG2 N3381 ISO/IEC 10646:2003/Amd.4:2008 (E) Information technology — Universal Multiple-Octet Coded Character Set (UCS) — AMENDMENT 4: Cham, Game Tiles, and other characters such as ISO/IEC 8824 and ISO/IEC 8825, the concept of Page 1, Clause 1 Scope implementation level may still be referenced as „Implementa- tion level 3‟. See annex N. In the note, update the Unicode Standard version from 5.0 to 5.1. Page 12, Sub-clause 16.1 Purpose and con- text of identification Page 1, Sub-clause 2.2 Conformance of in- formation interchange In first paragraph, remove „, the implementation level,‟. In second paragraph, remove „, and to an identified In second paragraph, remove „with an implementation implementation level chosen from clause 14‟. level‟. In fifth paragraph, remove „, the adopted implementa- Page 12, Sub-clause 16.2 Identification of tion level‟. UCS coded representation form with imple- mentation level Page 1, Sub-clause 2.3 Conformance of de- vices Rename sub-clause „Identification of UCS coded repre- sentation form‟. In second paragraph (after the note), remove „the adopted implementation level,‟. In first paragraph, remove „and an implementation level (see clause 14)‟. In fourth and fifth paragraph (b and c statements), re- move „and implementation level‟. Replace the 6-item list by the following 2-item list and note: Page 2, Clause 3 Normative references ESC 02/05 02/15 04/05 Update the reference to the Unicode Bidirectional Algo- UCS-2 rithm and the Unicode Normalization Forms as follows: ESC 02/05 02/15 04/06 Unicode Standard Annex, UAX#9, The Unicode Bidi- rectional Algorithm, Version 5.1.0, March 2008.
    [Show full text]
  • Semi-Supervised Learning Overview 1 Semi-Supervised Learning
    CS 446 Machine Learning Fall 2016 Nov 10, 2016 Semi-Supervised Learning Professor: Dan Roth Scribe: Ben Zhou, C. Cervantes Overview • Semi-Supervised Learning • Expectation Maximization • K-Means Algorithm 1 Semi-Supervised Learning Consider the problem of prepositional phrase attachment; we want to predict to which preceding phrase a prepositional phrase attaches, as in Buy car with money. Buy car with wheels. In this setting, each sentence has four features (one for each word) and our goal is to predict whether to attach the preposition ("with") to the verb ("buy") or the noun ("car"). Let us further expand the feature space to include all conjunctions, resulting in 15 total features. In the Naive Bayes setting, we want to predict noun or verb, given our features ([x1:::x15]). Therefore, we must estimate P (n) P (v) P (x1jn) P (x1jv) ... P (xnjn) P (xnjv) Note that there are two values for each cell in the column (ex. P (x1jn = 1) and P (x1jn = 0)). Now, given an example (x1; x2:::xn; ?), we compare P (njx) = P (n)P (x1jn):::P (xnjn) P (vjx) = P (v)P (x1jv):::P (xnjv). and predict the label with the highest probability. Consider that after seeing 10 examples, we have Semi-Supervised Learning-1 P (n) = 0:5 P (v) = 0:5 P (x1jn) = 0:75 P (x1jv) = 0:25 P (x2jn) = 0:5 P (x2jv) = 0:25 P (x3jn) = 0:5 P (x3jv) = 0:75 P (x4jn) = 0:5 P (xnjv) = 0:5 If now we have an example of x = [1000], we have 3 P (njx) = 0:5 ∗ 0:75 ∗ 0:5 ∗ 0:5 ∗ 0:5 = 64 3 P (vjx) = 0:5 ∗ 0:25 ∗ 0:75 ∗ 0:25 ∗ 0:5 = 256 According to Naive Bayes, we then compare these two numbers and predict that the label for this example is n.
    [Show full text]
  • Cyrillic # Version Number
    ############################################################### # # TLD: xn--j1aef # Script: Cyrillic # Version Number: 1.0 # Effective Date: July 1st, 2011 # Registry: Verisign, Inc. # Address: 12061 Bluemont Way, Reston VA 20190, USA # Telephone: +1 (703) 925-6999 # Email: [email protected] # URL: http://www.verisigninc.com # ############################################################### ############################################################### # # Codepoints allowed from the Cyrillic script. # ############################################################### U+0430 # CYRILLIC SMALL LETTER A U+0431 # CYRILLIC SMALL LETTER BE U+0432 # CYRILLIC SMALL LETTER VE U+0433 # CYRILLIC SMALL LETTER GE U+0434 # CYRILLIC SMALL LETTER DE U+0435 # CYRILLIC SMALL LETTER IE U+0436 # CYRILLIC SMALL LETTER ZHE U+0437 # CYRILLIC SMALL LETTER ZE U+0438 # CYRILLIC SMALL LETTER II U+0439 # CYRILLIC SMALL LETTER SHORT II U+043A # CYRILLIC SMALL LETTER KA U+043B # CYRILLIC SMALL LETTER EL U+043C # CYRILLIC SMALL LETTER EM U+043D # CYRILLIC SMALL LETTER EN U+043E # CYRILLIC SMALL LETTER O U+043F # CYRILLIC SMALL LETTER PE U+0440 # CYRILLIC SMALL LETTER ER U+0441 # CYRILLIC SMALL LETTER ES U+0442 # CYRILLIC SMALL LETTER TE U+0443 # CYRILLIC SMALL LETTER U U+0444 # CYRILLIC SMALL LETTER EF U+0445 # CYRILLIC SMALL LETTER KHA U+0446 # CYRILLIC SMALL LETTER TSE U+0447 # CYRILLIC SMALL LETTER CHE U+0448 # CYRILLIC SMALL LETTER SHA U+0449 # CYRILLIC SMALL LETTER SHCHA U+044A # CYRILLIC SMALL LETTER HARD SIGN U+044B # CYRILLIC SMALL LETTER YERI U+044C # CYRILLIC
    [Show full text]
  • Clustering on the Unit Hypersphere Using Von Mises-Fisher Distributions
    JournalofMachineLearningResearch6(2005)1345–1382 Submitted 9/04; Revised 4/05; Published 9/05 Clustering on the Unit Hypersphere using von Mises-Fisher Distributions Arindam Banerjee [email protected] Department of Electrical and Computer Engineering University of Texas at Austin Austin, TX 78712, USA Inderjit S. Dhillon [email protected] Department of Computer Sciences University of Texas at Austin Austin, TX 78712, USA Joydeep Ghosh [email protected] Department of Electrical and Computer Engineering University of Texas at Austin Austin, TX 78712, USA Suvrit Sra [email protected] Department of Computer Sciences University of Texas at Austin Austin, TX 78712, USA Editor: Greg Ridgeway Abstract Several large scale data mining applications, such as text categorization and gene expression anal- ysis, involve high-dimensional data that is also inherently directional in nature. Often such data is L2 normalized so that it lies on the surface of a unit hypersphere. Popular models such as (mixtures of) multi-variate Gaussians are inadequate for characterizing such data. This paper proposes a gen- erative mixture-model approach to clustering directional data based on the von Mises-Fisher (vMF) distribution, which arises naturally for data distributed on the unit hypersphere. In particular, we derive and analyze two variants of the Expectation Maximization (EM) framework for estimating the mean and concentration parameters of this mixture. Numerical estimation of the concentra- tion parameters is non-trivial in high dimensions since it involves functional inversion of ratios of Bessel functions. We also formulate two clustering algorithms corresponding to the variants of EM that we derive. Our approach provides a theoretical basis for the use of cosine similarity that has been widely employed by the information retrieval community, and obtains the spherical kmeans algorithm (kmeans with cosine similarity) as a special case of both variants.
    [Show full text]
  • Hard and Soft EM in Bayesian Network Learning from Incomplete Data
    algorithms Article Hard and Soft EM in Bayesian Network Learning from Incomplete Data Andrea Ruggieri 1,†, Francesco Stranieri 1,† , Fabio Stella 1 and Marco Scutari 2,* 1 Department of Informatics, Systems and Communication, Università degli Studi di Milano-Bicocca, 20126 Milano, Italy; [email protected] (A.R.); [email protected] (F.S.); [email protected] (F.S.) 2 Istituto Dalle Molle di Studi sull’Intelligenza Artificiale (IDSIA), 6962 Viganello, Switzerland * Correspondence: [email protected] † These authors contributed equally to this work. Received: 18 November 2020; Accepted: 7 December 2020; Published: 9 December 2020 Abstract: Incomplete data are a common feature in many domains, from clinical trials to industrial applications. Bayesian networks (BNs) are often used in these domains because of their graphical and causal interpretations. BN parameter learning from incomplete data is usually implemented with the Expectation-Maximisation algorithm (EM), which computes the relevant sufficient statistics (“soft EM”) using belief propagation. Similarly, the Structural Expectation-Maximisation algorithm (Structural EM) learns the network structure of the BN from those sufficient statistics using algorithms designed for complete data. However, practical implementations of parameter and structure learning often impute missing data (“hard EM”) to compute sufficient statistics instead of using belief propagation, for both ease of implementation and computational speed. In this paper, we investigate the question: what is the impact of using imputation instead of belief propagation on the quality of the resulting BNs? From a simulation study using synthetic data and reference BNs, we find that it is possible to recommend one approach over the other in several scenarios based on the characteristics of the data.
    [Show full text]
  • The Meaning and Use of Ideophones in Siwu
    PDF hosted at the Radboud Repository of the Radboud University Nijmegen The following full text is a publisher's version. For additional information about this publication click this link. http://hdl.handle.net/2066/91241 Please be advised that this information was generated on 2014-12-22 and may be subject to change. The Meaning and Use of Ideophones in Siwu Mark Dingemanse The Meaning and Use of Ideophones in Siwu Copyright © Mark Dingemanse, 2011 Cover photo: Making palm oil in Akpafu-Mempeasem, Ghana, March 2008 The Meaning and Use of Ideophones in Siwu Een wetenschappelijke proeve op het gebied van Letteren Proefschrift ter verkrijging van de graad van doctor aan de Radboud Universiteit Nijmegen op gezag van de rector magnificus prof. mr. S.C.J.J. Kortmann volgens besluit van college van decanen in het openbaar te verdedigen op maandag 24 oktober 2011 om 15.30 precies door Mark Dingemanse geboren op 5 mei 1983 te Middelburg Promotoren Prof. Dr. Stephen C. Levinson Prof. Dr. Gunter Senft (Universität zu Köln) Copromotor Dr. Felix K. Ameka (Universiteit Leiden) Manuscriptcommissie Prof. Dr. Pieter Muysken Prof. Dr. Tom Güldemann (Humboldt-Universität zu Berlin) Dr. Sotaro Kita (University of Birmingham) This thesis comes with a website containing online supplementary materials: http://thesis.ideophone.org/. Materials available online are marked (S). The research reported in this thesis was supported by a grant from the Max- Planck-Gesellschaft zur Förderung der Wissenschaften, München, Germany. Outline Acknowledgements.................................................................................................................i
    [Show full text]