Algorithms for Generalized Topic Modeling


Avrim Blum
Toyota Technological Institute at Chicago
[email protected]

Nika Haghtalab
Computer Science Department
Carnegie Mellon University
[email protected]

Abstract

Recently there has been significant activity in developing algorithms with provable guarantees for topic modeling. In this work we consider a broad generalization of the traditional topic modeling framework, where we no longer assume that words are drawn i.i.d. and instead view a topic as a complex distribution over sequences of paragraphs. Since one could not hope to even represent such a distribution in general (even if paragraphs are given using some natural feature representation), we aim instead to directly learn a predictor that, given a new document, accurately predicts its topic mixture, without learning the distributions explicitly. We present several natural conditions under which one can do this from unlabeled data only, and give efficient algorithms to do so, also discussing issues such as noise tolerance and sample complexity. More generally, our model can be viewed as a generalization of the multi-view or co-training setting in machine learning.

1 Introduction

Topic modeling is an area with significant recent work in the intersection of algorithms and machine learning [4, 5, 3, 1, 2, 8]. In topic modeling, a topic (such as sports, business, or politics) is modeled as a probability distribution over words, expressed as a vector a_i. A document is generated by first selecting a mixture w over topics, such as 80% sports and 20% business, and then choosing words i.i.d. from the associated mixture distribution, which in this case would be 0.8 a_sports + 0.2 a_business. Given a large collection of such documents (and some assumptions about the distributions a_i as well as the distribution over mixture vectors w), the goal is to recover the topic vectors a_i and then to use the a_i to correctly classify new documents according to their topic mixtures. Algorithms for this problem have been developed with strong provable guarantees even when documents consist of only two or three words each [5, 1, 18]. In addition, algorithms based on this problem formulation perform well empirically on standard datasets [9, 15].
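To make this generative process concrete, here is a minimal sketch of the standard framework in Python (the two topic vectors, the 5-word vocabulary, and the document length are illustrative assumptions, not values from the paper):

    import numpy as np

    rng = np.random.default_rng(0)

    # Two topics over a toy 5-word vocabulary; each a_i is a probability
    # distribution over words.
    a_sports   = np.array([0.4, 0.3, 0.2, 0.1, 0.0])
    a_business = np.array([0.0, 0.1, 0.2, 0.3, 0.4])

    # Step 1: select a mixture w over topics (here 80% sports, 20% business).
    w = np.array([0.8, 0.2])
    mixture = w[0] * a_sports + w[1] * a_business  # 0.8 a_sports + 0.2 a_business

    # Step 2: choose words i.i.d. from the mixture distribution.
    num_words = 1000
    words = rng.choice(len(mixture), size=num_words, p=mixture)

    # Normalized word counts approach the mixture vector as the document grows.
    x_hat = np.bincount(words, minlength=len(mixture)) / num_words
    print(x_hat)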
As a theoretical model for document generation, however, an obvious problem with the standard topic modeling framework is that documents are not actually created by independently drawing words from some distribution. Moreover, important words within a topic often have meaningful correlations, like shooting a free throw or kicking a field goal. Better would be a model in which sentences are drawn i.i.d. from a distribution over sentences. Even better would be paragraphs drawn i.i.d. from a distribution over paragraphs (this would account for the word correlations that exist within a coherent paragraph). Or, even better, how about a model in which paragraphs are drawn non-independently, so that the second paragraph in a document can depend on what the first paragraph was saying, though presumably with some amount of additional entropy as well? This is the type of model we study here.

Note that an immediate problem with considering such a model is that now the task of learning an explicit distribution (over sentences or paragraphs) is hopeless. While a distribution over words can be reasonably viewed as a probability vector, one could not hope to learn or even represent an explicit distribution over sentences or paragraphs. Indeed, except in cases of plagiarism, one would not expect to see the same paragraph twice in the entire corpus. Moreover, this is likely to be true even if we assume paragraphs have some natural feature-vector representation. Instead, we bypass this issue by aiming to directly learn a predictor for documents—that is, a function that, given a document, predicts its mixture over topics—without explicitly learning topic distributions. Another way to think of this is that our goal is not to learn a model that could be used to write a new document, but instead just a model that could be used to classify a document written by others. This is much as in standard supervised learning, where algorithms such as SVMs learn a decision boundary (such as a linear separator) for making predictions on the labels of examples without explicitly learning the distributions D+ and D− over positive and negative examples respectively. However, our setting is unsupervised (we are not given labeled data containing the correct classifications of the documents in the training set) and furthermore, rather than each data item belonging to one of the k classes (topics), each data item belongs to a mixture of the k topics. Our goal is, given a new data item, to output what that mixture is.

We begin by describing our high-level theoretical formulation. This formulation can be viewed as a generalization both of standard topic modeling and of multi-view learning or co-training [10, 12, 11, 7, 21]. We then describe several natural assumptions under which we can indeed efficiently solve the problem, learning accurate topic mixture predictors.

Copyright © 2018, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

2 Preliminaries

We assume that paragraphs are described by n real-valued features and so can be viewed as points x in an instance space X ⊆ R^n. We assume that each document consists of at least two paragraphs and denote it by (x^1, x^2). Furthermore, we consider k topics and partial membership functions f_1, ..., f_k : X → [0, 1], such that f_i(x) determines the degree to which paragraph x belongs to topic i, and ∑_{i=1}^k f_i(x) = 1. For any vector of probabilities w ∈ R^k — which we sometimes refer to as mixture weights — we define X^w = {x ∈ R^n | ∀i, f_i(x) = w_i} to be the set of all paragraphs with partial membership values w. We assume that both paragraphs of a document have the same partial membership values, that is, (x^1, x^2) ∈ ∪_w X^w × X^w, although we also allow some noise later on. To better relate to the literature on multi-view learning, we also refer to topics as "classes" and refer to paragraphs as "views" of the document.

Much like the standard topic models, we consider an unlabeled sample set that is generated by a two-step process. First, we consider a distribution P over vectors of mixture weights and draw w according to P. Then we consider a distribution D_w over the set X^w × X^w and draw a document (x^1, x^2) according to D_w. We consider two settings. In the first setting, which is addressed in Section 3, the learner receives the instance (x^1, x^2). In the second setting, the learner receives samples (x̂^1, x̂^2) that have been perturbed by some noise. We discuss two noise models in Sections 4 and F.2. In both cases, the goal of the learner is to recover the partial membership functions f_i.

More specifically, in this work we consider partial membership functions of the form f_i(x) = f(v_i · x), where v_1, ..., v_k ∈ R^n are linearly independent and f : R → [0, 1] is a monotonic function.¹ For the majority of this work, we consider f to be the identity function, so that f_i(x) = v_i · x. Define a_i ∈ span{v_1, ..., v_k} such that v_i · a_i = 1 and v_j · a_i = 0 for all j ≠ i. In other words, the matrix containing the v_i as rows is the pseudo-inverse of the matrix A = (a_1, ..., a_k) containing the a_i as columns.

In the standard topic modeling framework, each word of a document with mixture weights w is drawn from the mixture distribution ∑_{i=1}^k w_i a_i. The document vector x̂ is then the vector of word counts, normalized by dividing by the number of words in the document so that ||x̂||_1 = 1.

As a thought experiment, consider infinitely long documents. In the standard framework, all infinitely long documents of a mixture weight w have the same representation x = ∑_{i=1}^k w_i a_i. This representation implies x · v_i = w_i for all i ∈ [k], where V = (v_1, ..., v_k) is the pseudo-inverse of A = (a_1, ..., a_k). Thus, by partitioning the document into two halves (views) x^1 and x^2, our noise-free model with f_i(x) = v_i · x generalizes the standard topic model for long documents. However, our model is substantially more general: features within a view can be arbitrarily correlated, the views themselves can also be correlated, and even in the zero-noise case, documents of the same mixture can look very different so long as they have the same projection to the span of a_1, ..., a_k.
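As a quick numerical check of this thought experiment (the topic vectors below are small synthetic stand-ins), the rows v_i of the pseudo-inverse of A recover the mixture weights of a noise-free long document:

    import numpy as np

    # Linearly independent topic vectors a_i as the columns of A (n = 4, k = 2).
    A = np.array([[0.5, 0.0],
                  [0.3, 0.1],
                  [0.2, 0.3],
                  [0.0, 0.6]])

    # Rows of V are the v_i: V = pinv(A), so v_i . a_i = 1 and v_j . a_i = 0 for j != i.
    V = np.linalg.pinv(A)

    w = np.array([0.8, 0.2])  # mixture weights
    x = A @ w                 # representation of an infinitely long document

    print(V @ x)              # recovers w, since x . v_i = w_i for all i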
For a shorter document x̂, each feature x̂_i is drawn according to a distribution with mean x_i, where x = ∑_{i=1}^k w_i a_i. Therefore, x̂ can be thought of as a noisy measurement of x. The fewer the words in a document, the larger the noise in x̂. Existing work in topic modeling, such as [5, 2], provides elegant procedures for handling the large noise caused by drawing only 2 or 3 words according to the distribution induced by x. As we show in Section 4, our method can also tolerate large amounts of noise under some conditions. While our method cannot deal with documents that are only 2 or 3 words long, the benefit is a model that is much more general in many other respects.

Generalization of Co-training Framework

Here, we briefly discuss how our model is a generalization of the co-training framework. The standard co-training framework of [10] considers learning a binary classifier from primarily unlabeled instances, where each instance (x^1, x^2) is a pair of views that have the same classification.
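The sketch below (again with synthetic numbers, and the view-generating construction is our own illustration, not the paper's algorithm) shows this two-view structure in the noise-free linear case: both views carry the same partial membership values w because they share the same projection onto span{a_1, ..., a_k}, while differing arbitrarily in directions that every v_i ignores:

    import numpy as np

    rng = np.random.default_rng(1)

    A = np.array([[0.5, 0.0],
                  [0.3, 0.1],
                  [0.2, 0.3],
                  [0.0, 0.6]])   # synthetic topic vectors a_i as columns
    V = np.linalg.pinv(A)        # rows v_i, so f_i(x) = v_i . x

    w = np.array([0.8, 0.2])     # partial membership values shared by both views

    def random_view(w):
        # On-span component A @ w, plus an arbitrary component in the
        # null space of V (invisible to every v_i, since V @ off_span = 0).
        z = rng.normal(size=A.shape[0])
        off_span = z - A @ (V @ z)
        return A @ w + off_span

    x1, x2 = random_view(w), random_view(w)  # two very different-looking views
    print(V @ x1, V @ x2)                    # both equal w: f_i(x^1) = f_i(x^2) = w_i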