REZA ZADEH [email protected] Reza-Zadeh.Com 650.898.3193
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Halifax, Nova Scotia Canada August 13 17, 2017
Halifax, Nova Scotia Canada August 13 17, 2017 23rd ACM SIGKDD Conference on Knowledge Discovery and Data Mining Contents KDD 2017 Agenda at a Glance KDD 2017 Chairs’ Welcome Message Program Highlights Keynote Talks Research and Applied Data Science Tracks Applied Data Science Track Invited Talks Applied Data Science Panels Tutorials Hands‐On Tutorials Workshops KDD 2017 Tutorial Program KDD 2017 Workshop Program Full Day Workshops ‐ Monday August 14, 8:00am ‐5:00pm Half Day Workshops ‐ Monday August 14, 8:00am ‐ 12:00pm Half Day Workshops ‐ Monday August 14, 1:00am ‐ 5:00pm KDD Cup Workshop ‐ Wednesday August 16, 1:30pm ‐ 5:00pm KDD 2017 Hands‐On Tutorial Program Tuesday August 15, 2017 Wednesday August 16, 2017 Thursday August 17, 2017 KDD 2017 Conference Program Monday August 14 2017 Detailed Program Monday August 14, 2017 5:15pm – 7:00pm, KDD 2017 Opening Session ‐ Scoabank Centre Tuesday August 15, 2017 Detailed Program Wednesday August 16, 2017 Detailed Program Thursday August 17, 2017 Detailed Program KDD 2017 Conference Organizaon KDD 2017 Organizing Commiee Research Track Senior Program Commiee Applied Data Science Track Senior Program Commiee Research Track Program Commiee Applied Data Science Track Program Commiee KDD 2017 Sponsors & Supporters Halifax, Points of Interest Useful Links and Emergency Contacts KDD 2017 Agenda at a Glance Saturday, August 12th Level 8 Summit 8:00AM 5:00PM Workshop: Broadening Participation in Data Mining (BPDM) Day 1 Suite/Meeting Room 5 4:00PM 6:00PM KDD 2017 Registration Level 1 Atrium -
Face Recognition of Pedestrians from Live Video Stream Using Apache Spark Streaming and Kafka
International Journal of Innovative Technology and Exploring Engineering (IJITEE) ISSN: 2278-3075, Volume-7 Issue-5, February 2018 Face Recognition of Pedestrians from Live Video Stream using Apache Spark Streaming and Kafka Abhinav Pandey, Harendra Singh Abstract: Face recognition of pedestrians by analyzing live video It also can augment data warehouse solutions by serving as a streaming, aims to identify movements and faces by performing buffer to process new data for inclusion in the data image matching with existing images using Apache Spark warehouse or to remove infrequently accessed or aged data. Streaming, Kafka and OpenCV, on distributed platform and derive decisions. Since video processing and analysis from BigData can lead to improvements in overall operations by multiple resources become slow when using Cloud or even any giving organizations greater visibility into operational issue. single highly configured machine, hence for making quick Operational insights might depends on machine data, which decisions and actions, Apache Spark Streaming and Kafka have can include anything from computers to sensors or meters to been used as real time analysis frameworks, which deliver event GPS devices, BigData provides unprecedented insight on based decisions making on Hadoop distributed environment. If customers’ decision-making processes by allowing continuous live events analysis is possible then the decision can make there-after or at the same time. And large amount videos in companies to track and analyze shopping patterns, parallel processing are also not a bottleneck after getting the recommendations, purchasing behaviours and other drivers involvement of Hadoop because base of all real time analysis that knows to influence the sales. -
Big Graph Analytics Platforms
Full text available at: http://dx.doi.org/10.1561/1900000056 Big Graph Analytics Platforms Da Yan The University of Alabama at Birmingham [email protected] Yingyi Bu Couchbase, Inc. [email protected] Yuanyuan Tian IBM Almaden Research Center, USA [email protected] Amol Deshpande University of Maryland [email protected] Boston — Delft Full text available at: http://dx.doi.org/10.1561/1900000056 Foundations and Trends R in Databases Published, sold and distributed by: now Publishers Inc. PO Box 1024 Hanover, MA 02339 United States Tel. +1-781-985-4510 www.nowpublishers.com [email protected] Outside North America: now Publishers Inc. PO Box 179 2600 AD Delft The Netherlands Tel. +31-6-51115274 The preferred citation for this publication is D. Yan, Y. Bu, Y. Tian, and A. Deshpande. Big Graph Analytics Platforms. Foundations and Trends R in Databases, vol. 7, no. 1-2, pp. 1–195, 2015. R This Foundations and Trends issue was typeset in LATEX using a class file designed by Neal Parikh. Printed on acid-free paper. ISBN: 978-1-68083-242-6 c 2017 D. Yan, Y. Bu, Y. Tian, and A. Deshpande All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, mechanical, photocopying, recording or otherwise, without prior written permission of the publishers. Photocopying. In the USA: This journal is registered at the Copyright Clearance Cen- ter, Inc., 222 Rosewood Drive, Danvers, MA 01923. Authorization to photocopy items for internal or personal use, or the internal or personal use of specific clients, is granted by now Publishers Inc for users registered with the Copyright Clearance Center (CCC). -
An Introduction to Deep Reinforcement Learning Full Text Available At
Full text available at: http://dx.doi.org/10.1561/2200000071 An Introduction to Deep Reinforcement Learning Full text available at: http://dx.doi.org/10.1561/2200000071 Other titles in Foundations and Trends® in Machine Learning Non-convex Optimization for Machine Learningy Prateek Jain and Purushottam Ka ISBN: 978-1-68083-368-3 Kernel Mean Embedding of Distributions: A Review and Beyond Krikamol Muandet, Kenji Fukumizu, Bharath Sriperumbudur and Bernhard Scholkopf ISBN: 978-1-68083-288-4 Tensor Networks for Dimensionality Reduction and Large-scale Optimization: Part 1 Low-Rank Tensor Decompositions Andrzej Cichocki, Anh-Huy Phan, Qibin Zhao, Namgil Lee, Ivan Oseledets, Masashi Sugiyama and Danilo P. Mandic ISBN: 978-1-68083-222-8 Tensor Networks for Dimensionality Reduction and Large-scale Optimization: Part 2 Applications and Future Perspectives Andrzej Cichocki, Anh-Huy Phan, Qibin Zhao, Namgil Lee, Ivan Oseledets, Masashi Sugiyama and Danilo P. Mandic ISBN: 978-1-68083-276-1 Patterns of Scalable Bayesian Inference Elaine Angelino, Matthew James Johnson and Ryan P. Adams ISBN: 978-1-68083-218-1 Generalized Low Rank Models Madeleine Udell, Corinne Horn, Reza Zadeh and Stephen Boyd ISBN: 978-1-68083-140-5 Full text available at: http://dx.doi.org/10.1561/2200000071 An Introduction to Deep Reinforcement Learning Vincent François-Lavet Peter Henderson McGill University McGill University [email protected] [email protected] Riashat Islam Marc G. Bellemare McGill University Google Brain [email protected] [email protected] Joelle Pineau Facebook, McGill University [email protected] Boston — Delft Full text available at: http://dx.doi.org/10.1561/2200000071 Foundations and Trends® in Machine Learning Published, sold and distributed by: now Publishers Inc. -
MMDS 2014: Workshop on Algorithms for Modern Massive Data Sets
MMDS 2014: Workshop on Algorithms for Modern Massive Data Sets Stanley Hall University of California at Berkeley June 17{20, 2014 The 2014 Workshop on Algorithms for Modern Massive Data Sets (MMDS 2014) will address algorithmic and statistical challenges in modern large-scale data analysis. The goals of MMDS 2014 are to explore novel techniques for modeling and analyzing massive, high-dimensional, and nonlinearly-structured scientific and internet data sets; and to bring together computer scientists, statisticians, mathematicians, and data analysis practitioners to promote the cross-fertilization of ideas. Organizers: Michael Mahoney, Alexander Shkolnik Petros Drineas, Reza Zadeh, and Fernando Perez Workshop Schedule Tuesday, June 17, 2014: Data Analysis and Statistical Data Analysis Time Event Location/Page Registration & Opening Lobby outside Stanley Hall 8:00{9:45am Breakfast and registration 9:45{10:00am Organizers Welcome and opening remarks First Session Stanley Hall 10:00{11:00am Ashok Srivastava, Verizon pp. 13 Large Scale Machine Learning at Verizon (Tutorial) 11:00{11:30am Dan Suciu, University of Washington pp. 14 Communication Cost in Big Data Processing 11:30{12:00pm Gerald Friedland, ICSI pp. 9 Content-based search in 50TB of consumer-produced videos 12:00{2:00pm Lunch (on your own) Second Session Stanley Hall 2:00{2:30pm Bill Howe, University of Washington pp. 10 Myria: Scalable Analytics as a Service 2:30{3:00pm Devavrat Shah, MIT pp. 13 Computing stationary distribution, locally 3:00{3:30pm Yiannis Koutis, University of Puerto Rico pp. 12 Spectral algorithms for graph mining and analysis 3:30{4:00pm Jiashun Jin, Carnegie Mellon University pp. -
Towards Effective and Efficient Learning at Scale
Effective and Efficient Learning at Scale Adams Wei Yu AUGUST 2019 CMU-ML-19-111 Machine Learning Department School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 Thesis Committee: Jaime Carbonell, Co-Chair Alexander Smola, Co-Chair (Amazon) Ruslan Salakhutdinov Quoc Le (Google) Christopher Manning (Stanford) Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy. Copyright c 2019 Adams Wei Yu This research was supported by a grant from the Boeing Company, a fellowship from the Nvidia Corporation, a fellowship from the Snap Inc, a fellowship from the Thomas and Stacey Siebel Foundation and a Carnegie Mellon University presidential fellowship. Keywords: Large Scale Optmization, Deep Neural Networks, Machine Reading Compre- hension, Natural Language Processing To my beloved wife, Yanyu Long and son, Ziheng Yu. iv Abstract How to enable efficient and effective machine learning at scale has been a long- standing problem in modern artificial intelligence, which also motivates this thesis research. In particular, we aim at solving the following problems: 1. How to efficiently train a machine learning model? 2. How to speed up inference after the model is trained? 3. How to make the model generalize better? We approach those problems from two perspectives: models and algorithms. On one hand, we design novel models that are intrinsically fast to train and/or test. On the other, we develop new algorithms with rapid convergence guarantee. Not surprisingly, the above three problem are not mutually exclusive and thus solving one of them might also benefit others. For example, 1) a new model that can enable parallel computation helps accelerate both training and inference; 2) a fast algorithm can save time for hyper-parameter tuning and/or make it affordable for training with more data, which in return boosts the generalization performance. -
J:\19-UTK~1\166-Ramkrushna C. M
I J C T A, 10(9), 2017, pp. 639-645 ISSN: 0974-5572 © International Science Press Survey on Use Cases of Apache Spark Ramkrushna C. Maheshwar1, Haritha Donavalli2 and Bailappa Bhovi3 ABSTRACT This paper mainly focuses on the advantages of Apache Spark and different use cases of Apache Spark. In Big Data Analytics, data can be processed in two ways, one is Disk Based Computation Model, second is In Memory Data Processing Model. Disadvantage with Disk based model is in each iteration of data processing temporary results are kept in disk. If want process next iteration we have to read data from disk, it consumes too much time for reading and writing data from the disk. Advantages with In Memory data processing model, the iterative results are stored in cache memory and for further iteration process data can be directly fetched for execution from cache memory. We could speed up the data processing as the data is directly fetched from cache memory. As Apache Spark is InMemory Data Processing Model. It is used when the system need high throughput with low latency. Apache Spark is a freeware basically used for iterative computations and real time processing on similar data. Index Terms: Big Data Analytics, Apache Spark, Use cases of Apache Spark, Time Series Data Analytics, Hadoop Map Reduce. 1. INTRODUCTION The Bigdata is collection of large amount of data, Processing on that been done in distributed manner. The multiple computers were used to process the data, which we normally call as commodity hardware. Big Data components are Hadoop, HDFS, Name Node, Data Node, Map Function and Reduce Function. -
KDD'15 Chairs and Organizing Committee
KDD’15 Chairs and Organizing Committee Honorary Chair: Usama Fayyad (ChoozOn Corporation) General Chairs: Longbing Cao (University of Technology, Sydney) Chengqi Zhang (University of Technology, Sydney) Program Chairs: Thorsten Joachims (Cornell University) Geoff Webb (Monash University) Industry and Government Track Dragos Margineantu (Boeing Research) Program Chairs: Graham Williams (Australian Taxation Office) Industry and Government Track Rajesh Parekh (Groupon) Invited Talks Chairs: Usama Fayyad (ChoozOn Corporation) Workshop Chairs: Johannes Fuernkranz (Technische Universität Darmstadt) Tina Eliassi-Rad (Rutgers University) Tutorial Chairs: Jian Pei (Simon Fraser University) Zhihua Zhou (Nanjing University) KDD Cup Chairs: Jie Tang (Tsinghua University) Ron Bekkerman (University of Haifa) Panel Chairs: Hugh Durrant-Whyte (Nicta) Katharina Morik (Technische Universität Dortmund) Poster Chairs: Dacheng Tao (University of Technology, Sydney) Hui Xiong (Rutgers University) Best Paper Chairs: Jure Lescovec (Stanford University) Kristian Kersting (Technische Universität Dortmund) Doctoral Dissertation Award Chair: Kyuseok Shim (Seoul National University) Innovation and Service Award Chair: Ted Senator (Leidos) Test-of-Time Paper Award Chair: Martin Ester (Simon Fraser University) Local Arrangements Chairs: Jinjiu Li (IDA) Shannon Cunningham (Executivevents) Guandong Xu (University of Technology, Sydney) Student Travel Award Chairs: Wei Wang (UCLA) Xingquan Zhu (Florida Atlantic University) Jeffrey Yu (Chinese University of Hong Kong) -
Generalized Low Rank Models
Full text available at: http://dx.doi.org/10.1561/2200000055 Generalized Low Rank Models Madeleine Udell Operations Research and Information Engineering Cornell University [email protected] Corinne Horn Electrical Engineering Stanford University [email protected] Reza Zadeh Computational and Mathematical Engineering Stanford University [email protected] Stephen Boyd Electrical Engineering Stanford University [email protected] Boston — Delft Full text available at: http://dx.doi.org/10.1561/2200000055 R Foundations and Trends in Machine Learning Published, sold and distributed by: now Publishers Inc. PO Box 1024 Hanover, MA 02339 United States Tel. +1-781-985-4510 www.nowpublishers.com [email protected] Outside North America: now Publishers Inc. PO Box 179 2600 AD Delft The Netherlands Tel. +31-6-51115274 The preferred citation for this publication is M. Udell, C. Horn, R. Zadeh and S. Boyd. Generalized Low Rank Models. Foundations and Trends R in Machine Learning, vol. 9, no. 1, pp. 1–118, 2016. R This Foundations and Trends issue was typeset in LATEX using a class file designed by Neal Parikh. Printed on acid-free paper. ISBN: 978-1-68083-141-2 c 2016 M. Udell, C. Horn, R. Zadeh and S. Boyd All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, mechanical, photocopying, recording or otherwise, without prior written permission of the publishers. Photocopying. In the USA: This journal is registered at the Copyright Clearance Cen- ter, Inc., 222 Rosewood Drive, Danvers, MA 01923. Authorization to photocopy items for internal or personal use, or the internal or personal use of specific clients, is granted by now Publishers Inc for users registered with the Copyright Clearance Center (CCC). -
System Design for Large Scale Machine Learning
UC Berkeley UC Berkeley Electronic Theses and Dissertations Title System Design for Large Scale Machine Learning Permalink https://escholarship.org/uc/item/1140s4st Author Venkataraman, Shivaram Publication Date 2017 Peer reviewed|Thesis/dissertation eScholarship.org Powered by the California Digital Library University of California System Design for Large Scale Machine Learning By Shivaram Venkataraman A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer Science in the Graduate Division of the University of California, Berkeley Committee in charge: Professor Michael J. Franklin, Co-chair Professor Ion Stoica, Co-chair Professor Benjamin Recht Professor Ming Gu Fall 2017 System Design for Large Scale Machine Learning Copyright 2017 by Shivaram Venkataraman 1 Abstract System Design for Large Scale Machine Learning by Shivaram Venkataraman Doctor of Philosophy in Computer Science University of California, Berkeley Professor Michael J. Franklin, Co-chair Professor Ion Stoica, Co-chair The last decade has seen two main trends in the large scale computing: on the one hand we have seen the growth of cloud computing where a number of big data applications are deployed on shared cluster of machines. On the other hand there is a deluge of machine learning algorithms used for applications ranging from image classification, machine translation to graph processing, and scientific analysis on large datasets. In light of these trends, a number of challenges arise in terms of how we program, deploy and achieve high performance for large scale machine learning applications. In this dissertation we study the execution properties of machine learning applications and based on these properties we present the design and implementation of systems that can address the above challenges. -
Intro to Apache Spark
Intro to Apache Spark ! http://databricks.com/ download slides: http://cdn.liber118.com/workshop/itas_workshop.pdf 00: Getting Started Introduction installs + intros, while people arrive: 20 min Intro: Online Course Materials Best to download the slides to your laptop: cdn.liber118.com/workshop/itas_workshop.pdf Be sure to complete the course survey: http://goo.gl/QpBSnR In addition to these slides, all of the code samples are available on GitHub gists: • gist.github.com/ceteri/f2c3486062c9610eac1d • gist.github.com/ceteri/8ae5b9509a08c08a1132 • gist.github.com/ceteri/11381941 Intro: Success Criteria By end of day, participants will be comfortable with the following: • open a Spark Shell • use of some ML algorithms • explore data sets loaded from HDFS, etc. • review Spark SQL, Spark Streaming, Shark • review advanced topics and BDAS projects • follow-up courses and certification • developer community resources, events, etc. • return to workplace and demo use of Spark! Intro: Preliminaries • intros – what is your background? • who needs to use AWS instead of laptops? • PEM key, if needed? See tutorial: Connect to Your Amazon EC2 Instance from Windows Using PuTTY 01: Getting Started Installation hands-on lab: 20 min Installation: Let’s get started using Apache Spark, in just four easy steps… spark.apache.org/docs/latest/ (for class, please copy from the USB sticks) Step 1: Install Java JDK 6/7 on MacOSX or Windows oracle.com/technetwork/java/javase/downloads/ jdk7-downloads-1880260.html • follow the license agreement instructions • then click the download for your OS • need JDK instead of JRE (for Maven, etc.) (for class, please copy from the USB sticks) Step 1: Install Java JDK 6/7 on Linux this is much simpler on Linux…! sudo apt-get -y install openjdk-7-jdk Step 2: Download Spark we’ll be using Spark 1.0.0 see spark.apache.org/downloads.html 1. -
A Microarchitectural Analysis of Machine Learning Algorithms on Spark
A Microarchitectural Analysis of Machine Learning Algorithms on Spark by José Antonio Muñoz Cepillo A thesis submitted in conformity with the requirements for the degree of Master of Applied Science Graduate Department of Electrical and Computer Engineering University of Toronto c Copyright 2017 by José Antonio Muñoz Cepillo Abstract A Microarchitectural Analysis of Machine Learning Algorithms on Spark José Antonio Muñoz Cepillo Master of Applied Science Graduate Department of Electrical and Computer Engineering University of Toronto 2017 Analysis of large data sets utilizing Machine Learning Algorithms (MLAs) is an important research and application area. This thesis presents a comprehensive performance analysis of MLAs on the ApacheTM Spark platform, identifying bottlenecks at the microarchitectural level, and associating them to higher-level code constructs. In addition, we propose solutions to address the discovered performance issues. Our results indicate that: 1) The majority of MLAs are backend-bound, in particular memory-bound. 2) The dot product is the main bottleneck in a considerable number of MLAs. 3) Garbage collection and high-level abstractions are a major source of overhead in Spark. 4) Enabling Hyper-Threading is a recommended feature that improves performance up to 52%. 5) The improved data representation of the Dataset API results in performance improvements of up to 64% over the RDD API. 6) Enabling software caching in the right place improves the performance between 6%-23%. ii Acknowledgements First and foremost, I would like to acknowledge the guidance and contributions of my supervisor, Prof. Andreas Moshovos. Special thanks go to Prof. Nick Koudas, for all his help and support.