ICML 2019 Workshop book Generated Sat Jul 06, 2019

Workshop organizers make last-minute changes to their schedule. Download this document again to get the latest changes, or use the ICML mobile application.

Schedule Highlights

June 14, 2019

101, ICML 2019 Workshop on Computational Biology - Pe'er, Prabhakaran, Azizi, Diallo, Kundaje, Engelhardt, Dhifli, MEPHU NGUIFO, Tansey, Vogt, Listgarten, Burdziak, CompBio

102, ICML 2019 Time Series Workshop - Kuznetsov, Yang, Yu, Tang, Wang

103, Human In the Loop Learning (HILL) - Wang, Wang, Yu, Zhang, Gonzalez, Jia, Bird, Varshney, Kim, Weller

104 A, Climate Change: How Can AI Help? - Rolnick, Lacoste, Maharaj, Chayes, Bengio

104 B, Workshop on the Security and Privacy of Machine Learning - Papernot, Tramer, Li, Boneh, Evans, Jha, Liang, McDaniel, Steinhardt, Song

104 C, Theoretical Physics for Deep Learning - Lee, Pennington, Bahri, Welling, Ganguli, Bruna

201, AI in Finance: Applications and Infrastructure for Multi-Agent Learning - Reddy, Balch, Wellman, Kumar, Stoica, Elkind

202, The Third Workshop On Tractable Probabilistic Modeling (TPM) - Lowd, Vergari, Molina, Rahman, Domingos, Vergari

203, Joint Workshop on On-Device Machine Learning & Compact Deep Neural Network Representations (ODML-CDNNR) - Ravi, Kozareva, Fan, Welling, Chen, Bailer, Kulis, Hu, Dekhtiar, Lin, Marculescu

204, Negative Dependence: Theory and Applications in Machine Learning - Gartrell, Gillenwater, Kulesza, Mariet

Grand Ballroom A, Understanding and Improving Generalization in Deep Learning - Krishnan, Mobahi, Neyshabur, Bartlett, Song, Srebro

Grand Ballroom B, 6th ICML Workshop on Automated Machine Learning (AutoML 2019) - Hutter, Vanschoren, Eggensperger, Feurer

Hall A, Generative Modeling and Model-Based Reasoning for Robotics and AI - Rajeswaran, Todorov, Mordatch, Agnew, Zhang, Pineau, Chang, Erhan, Levine, Stachenfeld, Zhang

Hall B, Uncertainty and Robustness in Deep Learning - Li, Lakshminarayanan, Hendrycks, Dietterich, Gilmer

Seaside Ballroom, Reinforcement Learning for Real Life - Li, Geramifard, Li, Szepesvari, Wang

Seaside Ballroom, Real-world Sequential Decision Making: Reinforcement Learning and Beyond - Le, Yue, Swaminathan, Boots, Cheng

June 15, 2019

101, Workshop on AI for autonomous driving - Choromanska, Jackel, Li, Niebles, Gaidon, Posner, Chao

102, Workshop on Multi-Task and Lifelong Reinforcement Learning - Chandar, Sodhani, Khetarpal, Zahavy, Mankowitz, Mannor, Ravindran, Precup, Finn, Gupta, Zhang, Cho, Rusu, Rob Fergus

103, Invertible Neural Networks and Normalizing Flows - Huang, Krueger, Van den Berg, Papamakarios, Gomez, Cremer, Chen, Courville, J. Rezende


104 A, Stein's Method for Machine Learning and Statistics - Briol, Mackey, Oates, Liu, Goldstein

104 B, AI For Social Good (AISG) - Luck, Sankaran, Sylvain, McGregor, Penn, Tadesse, Sylvain, Côté, Mackey, Ghani, Bengio

Seaside Ballroom, Adaptive and Multitask Learning: Algorithms & Systems - Al-Shedivat, Platanios, Stretcu, Andreas, Talwalkar, Caruana, Mitchell, Xing

104 C, Synthetic Realities: Deep Learning for Detecting AudioVisual Fakes Biggio, Korshunov, Mensink, Patrini, Sadhu, Rao

201, ICML Workshop on Imitation, Intent, and Interaction (I3) Rhinehart, Levine, Finn, He, Kostrikov, Fu, Reddy

202, Coding Theory For Large-scale Machine Learning Cadambe, Grover, Papailiopoulos, Joshi

203, The How2 Challenge: New Tasks for Vision & Language Metze, Specia, Elliot, Barrault, Sanabria, Palaskar

204, Machine Learning for Music Discovery Schmidt, Nieto, Gouyon, Kinnaird, Lanckriet

Grand Ballroom A, Workshop on Self-Supervised Learning van den Oord, Aytar, Doersch, Vondrick, Radford, Sermanet, Zamir, Abbeel

Grand Ballroom B, Learning and Reasoning with Graph-Structured Representations Fetaya, Hu, Kipf, Li, Liang, Liao, Urtasun, Wang, Welling, Xing, Zemel

Hall A, Exploration in Reinforcement Learning Workshop Bhupatiraju, Eysenbach, Gu, Edwards, White, Oudeyer, Stanley, Brunskill

Hall B, Identifying and Understanding Deep Learning Phenomena Sedghi, Bengio, Hata, Madry, Morcos, Neyshabur, Raghu, Rahimi, Schmidt, Xiao


June 14, 2019

ICML 2019 Workshop on Computational Biology

Donna Pe'er, Sandhya Prabhakaran, Elham Azizi, Abdoulaye Baniré Diallo, Anshul Kundaje, Barbara Engelhardt, Wajdi Dhifli, Engelbert MEPHU NGUIFO, Wesley Tansey, Julia Vogt, Jennifer Listgarten, Cassandra Burdziak, Workshop CompBio

101, Fri Jun 14, 08:30 AM

The workshop will showcase recent research in the field of Computational Biology. There has been significant development in genomic sequencing techniques and imaging technologies. These approaches not only generate huge amounts of data but provide unprecedented resolution of single cells and even subcellular structures. The availability of high-dimensional data, at multiple spatial and temporal resolutions, has made machine learning and deep learning methods increasingly critical for computational analysis and interpretation of the data. Conversely, biological data has also exposed unique challenges and problems that call for the development of new machine learning methods. This workshop aims to bring together researchers working at the intersection of Machine Learning and Biology to present recent advances and open questions in Computational Biology to the ML community.

The workshop is a sequel to the WCB workshops we organized in the last three years (Joint ICML and IJCAI 2018, Stockholm; ICML 2017, Sydney; and ICML 2016, New York), as well as the Workshop on Bioinformatics and AI at IJCAI 2015 Buenos Aires, IJCAI 2016 New York, and IJCAI 2017 Melbourne, which had excellent line-ups of talks and were well received by the community. Every year, we received 60+ submissions. After multiple rounds of rigorous reviewing, around 50 submissions were selected, from which the best set of papers were chosen for contributed talks and spotlights and the rest were invited as posters. We have a steadfast and growing base of reviewers making up the Program Committee. For the past edition, a special issue of the Journal of Computational Biology will be released in the following weeks with extended versions of 14 accepted papers.

We have two confirmed invited speakers and we will invite at least one more leading researcher in the field. Similar to previous years, we plan to request partial funding from Microsoft Research, Google, Python, and the Swiss Institute of Bioinformatics, which we intend to use for student travel awards. In past years, we have also been able to provide awards for best poster/paper and partially contribute to travel expenses for at least 8 students per year.

The workshop proceedings will be available through CEUR proceedings. We would also have an extended version included in a special issue of the Journal of Computational Biology (JCB), for which we already have an agreement with JCB.

Schedule

08:30 AM - WCB Organisers
08:40 AM - Caroline Uhler
09:20 AM - Jacopo Cirrone, Marttinen Pekka, Elior Rahmani, Elior Rahmani, Harri Lähdesmäki


09:50 AM - Ali Oskooei
11:00 AM - Francisco LePort
11:40 AM - Ali Oskooei, Zhenqin Wu, Karren Dai Yang, Mijung Kim
12:00 PM - Poster Session & Lunch break - Wiese, Carter, DeBlasio, Hashir, Chan, Manica, Oskooei, Wu, Yang, FAGES, Liu, Beebe-Wang, He, Cirrone, Marttinen, Rahmani, Lähdesmäki, Yadala, Deac, Soleimany, Mane, Ernst, Cohen, Mathew, Agarwal, Zheng
02:00 PM - Ngo Trong Trung
02:20 PM - Achille Nazaret, François Fages, Brandon Carter, Ruishan Liu, Nicasia Beebe-Wang
02:50 PM - Poster Session & Coffee Break
04:00 PM - Daphne Koller
04:40 PM - Bryan He
05:00 PM - Award Ceremony & Closing Remarks

ICML 2019 Time Series Workshop

Vitaly Kuznetsov, Scott Yang, Rose Yu, Cheng Tang, Yuyang Wang

102, Fri Jun 14, 08:30 AM

Time series data is both quickly growing and already ubiquitous. In domains spanning as broad a range as climate, robotics, entertainment, finance, healthcare, and transportation, there has been a significant shift away from parsimonious, infrequent measurements to nearly continuous monitoring and recording. Rapid advances in sensing technologies, ranging from remote sensors to wearables and social sensing, are generating rapid growth in the size and complexity of time series data streams. Thus, the importance and impact of time series analysis and modelling techniques only continues to grow.

At the same time, while time series analysis has been extensively studied by econometricians and statisticians, modern time series data often pose significant challenges for the existing techniques both in terms of their structure (e.g., irregular sampling in hospital records and spatiotemporal structure in climate data) and size. Moreover, the focus on time series in the machine learning community has been comparatively much smaller. In fact, the predominant methods in machine learning often assume i.i.d. data streams, which is generally not appropriate for time series data. Thus, there is both a great need and an exciting opportunity for the machine learning community to develop theory, models and algorithms specifically for the purpose of processing and analyzing time series data.

We see ICML as a great opportunity to bring together theoretical and applied researchers from around the world and with different backgrounds who are interested in the development and usage of time series analysis and algorithms. This includes methods for time series prediction, classification, clustering, anomaly and change point detection, causal discovery, and dimensionality reduction as well as general theory for learning and analyzing stochastic processes. Since time series have been studied in a variety of different fields and have many broad applications, we plan to host leading academic researchers and industry experts with a range of perspectives and interests as invited speakers. Moreover, we also invite researchers from the related areas of batch and online learning, deep learning, reinforcement learning, data analysis and statistics, and many others to both contribute and participate in this workshop.

Human In the Loop Learning (HILL)

Xin Wang, Xin Wang, Fisher Yu, Shanghang Zhang, Joseph Gonzalez, Yangqing Jia, Sarah Bird, Kush Varshney, Been Kim, Adrian Weller

103, Fri Jun 14, 08:30 AM
https://sites.google.com/view/hill2019/home

This workshop is a joint effort between the 4th ICML Workshop on Human Interpretability in Machine Learning (WHI) and the ICML 2019 Workshop on Interactive Data Analysis System (IDAS). We have combined our forces this year to run Human in the Loop Learning (HILL) in conjunction with ICML 2019!

The workshop will bring together researchers and practitioners who study interpretable and interactive learning systems with applications in large scale data processing, data annotations, data visualization, human-assisted data integration, systems and tools to interpret machine learning models as well as algorithm designs for active learning, online learning, and interpretable machine learning algorithms. The target audience for the workshop includes people who are interested in using machines to solve problems by having a human be an integral part of the process. This workshop serves as a platform where researchers can discuss approaches that bridge the gap between humans and machines and get the best of both worlds.

We welcome high-quality submissions in the broad area of human in the loop learning. A few (non-exhaustive) topics of interest include:

Systems for online and interactive learning algorithms,
Active/Interactive machine learning algorithm design,
Systems for collecting, preparing, and managing machine learning data,
Model understanding tools (verifying, diagnosing, debugging, visualization, introspection, etc),
Design, testing and assessment of interactive systems for data analytics,
Psychology of human concept learning,
Generalized additive models, sparsity and rule learning,
Interpretable unsupervised models (clustering, topic models, etc.),
Interpretation of black-box models (including deep neural networks),
Interpretability in reinforcement learning.

Schedule

08:25 AM - Opening Remarks
08:30 AM - Invited Talk: James Philbin
09:00 AM - Invited Talk: Sanja Fidler
09:30 AM - Invited Talk: Bryan Catanzaro


10:00 AM - IDAS Poster Session & Coffee break
11:30 AM - Invited Talk: Yisong Yue
12:00 PM - Invited Talk: Vittorio Ferrari
12:30 PM - Lunch Break
02:00 PM - Interpretability Contributed Talks
03:00 PM - Coffee Break
03:30 PM - Interpretability Invited Discussion: California's Senate Bill 10 (SB 10) on Pretrial Release and Detention with Solon Barocas and Peter Eckersley
04:45 PM - Human in the Loop Learning Panel Discussion

Abstracts (1):

Abstract 12: Human in the Loop Learning Panel Discussion in Human In the Loop Learning (HILL), 04:45 PM

Panelists: Vittorio Ferrari, Marcin Detyniecki, and James Philbin

Climate Change: How Can AI Help?

David Rolnick, Alexandre Lacoste, Tegan Maharaj, Jennifer Chayes, Yoshua Bengio

104 A, Fri Jun 14, 08:30 AM

Many in the machine learning community wish to take action on climate change, yet feel their skills are inapplicable. This workshop aims to show that in fact the opposite is true: while no silver bullet, ML can be an invaluable tool both in reducing greenhouse gas emissions and in helping society adapt to the effects of climate change. Climate change is a complex problem, for which action takes many forms - from designing smart electrical grids to tracking deforestation in satellite imagery. Many of these actions represent high-impact opportunities for real-world change, as well as being interesting problems for ML research.

Schedule

08:30 AM - Opening Remarks
08:45 AM - AI for Climate Change: the context - Platt
09:20 AM - Why it's hard to mitigate climate change, and how to do better - Kelly
09:45 AM - Tackling climate change challenges with AI through collaboration - Ng
10:10 AM - Towards a Sustainable Food Supply Chain Powered by Artificial Intelligence - Kuleshov
10:20 AM - Deep Learning for Wildlife Conservation and Restoration Efforts - DUHART


10:30 AM - Morning Coffee Break + Poster Session
11:00 AM - Achieving Drawdown - Chad Frischmann
12:00 PM - Networking Lunch (provided) + Poster Session - Stanway, Robson, Rangnekar, Chattopadhyay, Pilipiszyn, LeRoy, Cheng, Zhang, Shen, Schroeder, Clough, DUHART, Fung, Ududec, Wang, Dao, wu, Giannakis, Sejdinovic, Precup, Watson-Parris, Wen, Chen, Erinjippurath, Li, Zou, van Hoof, Scannell, Mamitsuka, Zhang, Choo, Wang, Requeima, Hwang, Xu, Mathe, Binas, Lee, Ramea, Duffy, McCloskey, Sankaran, Mackey, Mones, Benabbou, Kaack, Hoffman, Mudigonda, Mahdavi, McCourt, Jiang, Kamani, Guha, Dalmasso, Pawlowski, Milojevic-Dupont, Orenstein, Hassanzadeh, Marttinen, Nair, Farhang, Kaski, Manjanna, Luccioni, Deshpande, Kim, Mouatadid, Park, Lin, Felgueira, Hornigold, Yuan, Beucler, Cui, Kuleshov, Yu, song, Wexler, Bengio, Wang, Yi, Malki


01:30 PM - Personalized Visualization of the Impact of Climate Change - Bengio
01:55 PM - Advances in Climate Informatics: Machine Learning for the Study of Climate Change - Monteleoni
02:30 PM - Detecting anthropogenic cloud perturbations with deep learning - Watson-Parris
02:40 PM - Evaluating aleatoric and epistemic uncertainties of time series deep learning models for soil moisture predictions - Shen
02:50 PM - Targeted Meta-Learning for Critical Incident Detection in Weather Data - Kamani, Farhang, Mahdavi, Wang
03:00 PM - Afternoon Coffee Break + Poster Session
03:30 PM - Geoscience data and models for the Climate Change AI community - Mukkavilli
03:45 PM - ML vs. Climate Change, Applications in Energy at DeepMind - Witherspoon
04:20 PM - Truck Traffic Monitoring with Satellite Images - Kaack, Chen
04:30 PM - Machine Learning for AC Optimal Power Flow - Guha, Wang
04:40 PM - Planetary Scale Monitoring of Urban Growth in High Flood Risk Areas - Clough, Nair, Erinjippurath
04:50 PM - "Ideas" mini-spotlights - McCloskey, Milojevic-Dupont, Binas, Schroeder, Luccioni
05:15 PM - Panel Discussion - Bengio, Ng, Hadsell, Platt, Monteleoni, Chayes

Abstracts (14):

Abstract 2: AI for Climate Change: the context in Climate Change: How Can AI Help?, Platt 08:45 AM

This talk will set the context around "AI for climate change". This context will include the difference between long-term energy research and shorter-term research required to mitigate or adapt to climate change. I'll illustrate the urgency of the latter research by discussing the carbon budget of the atmosphere. The talk will also highlight some examples of how AI can be applied to climate change mitigation and energy research, including ML for fusion and for flood prediction.

Abstract 3: Why it's hard to mitigate climate change, and how to do better in Climate Change: How Can AI Help?, Kelly 09:20 AM

It's hard to have climate impact! Lots of projects look great from a distance but fail in practice. The energy system is enormously complex, and there are many non-technical bottlenecks to having impact. In this talk, I'll describe some of these challenges, so you can try to avoid them and hence

reduce emissions more rapidly! Let's say you've built a great ML algorithm and written a paper. Now what? Your paper is completely invisible to the climate. How do you get your research used by the energy system? I don't claim to have all the answers; but I'd like to discuss some of the challenges, and some ideas for how to get round them.

Abstract 4: Tackling climate change challenges with AI through collaboration in Climate Change: How Can AI Help?, Ng 09:45 AM

The time is now for the AI community to collaborate with the climate community to help understand, mitigate, and adapt to climate change. In this talk, I will present two projects as part of interdisciplinary collaborations, one in the earth system sciences and one in the energy space, to illustrate specific use cases where AI is making an impact on climate change. I hope this talk will motivate you to contribute to tackling one of the greatest challenges of our time.

Abstract 5: Towards a Sustainable Food Supply Chain Powered by Artificial Intelligence in Climate Change: How Can AI Help?, Kuleshov 10:10 AM

About 30-40% of food produced worldwide is wasted. This puts a severe strain on the environment and represents a $165B loss to the US economy. This paper explores how artificial intelligence can be used to automate decisions across the food supply chain in order to reduce waste and increase the quality and affordability of food. We focus our attention on supermarkets — combined with downstream consumer waste, these contribute to 40% of total US food losses — and we describe an intelligent decision support system for supermarket operators that optimizes purchasing decisions and minimizes losses. The core of our system is a model-based reinforcement learning engine for perishable inventory management; in a real-world pilot with a US supermarket chain, our system reduced waste by up to 50%. We hope that this paper will bring the food waste problem to the attention of the broader machine learning research community.

Abstract 6: Deep Learning for Wildlife Conservation and Restoration Efforts in Climate Change: How Can AI Help?, DUHART 10:20 AM

Climate change and environmental degradation are causing species extinction worldwide. Automatic wildlife sensing is an urgent requirement to track biodiversity losses on Earth. Recent improvements in machine learning can accelerate the development of large-scale monitoring systems that would help track conservation outcomes and target efforts. In this paper, we present one such system we developed. 'Tidzam' is a Deep Learning framework for wildlife detection, identification, and geolocalization, designed for the Tidmarsh Wildlife Sanctuary, the site of the largest freshwater wetland restoration in Massachusetts.

Abstract 9: Networking Lunch (provided) + Poster Session in Climate Change: How Can AI Help?, Stanway, Robson, Rangnekar, Chattopadhyay, Pilipiszyn, LeRoy, Cheng, Zhang, Shen, Schroeder, Clough, DUHART, Fung, Ududec, Wang, Dao, wu, Giannakis, Sejdinovic, Precup, Watson-Parris, Wen, Chen, Erinjippurath, Li, Zou, van Hoof, Scannell, Mamitsuka, Zhang, Choo, Wang, Requeima, Hwang, Xu, Mathe, Binas, Lee, Ramea, Duffy, McCloskey, Sankaran, Mackey, Mones, Benabbou, Kaack, Hoffman, Mudigonda, Mahdavi, McCourt, Jiang, Kamani, Guha, Dalmasso, Pawlowski, Milojevic-Dupont, Orenstein, Hassanzadeh, Marttinen, Nair, Farhang, Kaski, Manjanna, Luccioni, Deshpande, Kim, Mouatadid, Park, Lin, Felgueira, Hornigold, Yuan, Beucler, Cui, Kuleshov, Yu, song, Wexler, Bengio, Wang, Yi, Malki 12:00 PM


Catered sandwiches and snacks will be provided (including vegetarian/vegan and gluten-free options). Sponsored by Element AI.

Abstract 12: Detecting anthropogenic cloud perturbations with deep learning in Climate Change: How Can AI Help?, Watson-Parris 02:30 PM

One of the most pressing questions in climate science is that of the effect of anthropogenic aerosol on the Earth's energy balance. Aerosols provide the `seeds' on which cloud droplets form, and changes in the amount of aerosol available to a cloud can change its brightness and other physical properties such as optical thickness and spatial extent. Clouds play a critical role in moderating global temperatures and small perturbations can lead to significant amounts of cooling or warming. Uncertainty in this effect is so large it is not currently known if it is negligible, or provides a large enough cooling to largely negate present-day warming by CO2. This work uses deep convolutional neural networks to look for two particular perturbations in clouds due to anthropogenic aerosol and assess their properties and prevalence, providing valuable insights into their climatic effects.

Abstract 13: Evaluating aleatoric and epistemic uncertainties of time series deep learning models for soil moisture predictions in Climate Change: How Can AI Help?, Shen 02:40 PM

Soil moisture is an important variable that determines floods, vegetation health, agriculture productivity, and land surface feedbacks to the atmosphere, etc. Accurately modeling soil moisture has important implications in both weather and climate models. The recently available satellite-based observations give us a unique opportunity to build data-driven models to predict soil moisture instead of using land surface models, but previously there was no uncertainty estimate. We tested Monte Carlo dropout (MCD) with an aleatoric term for our long short-term memory models for this problem, and asked if the uncertainty terms behave as they were argued to. We show that the method successfully captures the predictive error after tuning a hyperparameter on a representative training dataset. We show the MCD uncertainty estimate, as previously argued, does detect dissimilarity. In this talk, several important challenges with climate modeling where machine learning may help are also introduced to open up a discussion.

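The MCD-plus-aleatoric-term recipe mentioned in this abstract is not spelled out above; the following is a minimal PyTorch-style sketch of the general idea (dropout kept active at prediction time, plus a learned variance head). All module and variable names are invented for illustration and are not taken from the authors' code.

# Minimal sketch (assumed details, not the authors' implementation) of Monte Carlo
# dropout with a learned aleatoric variance term for a sequence regression model.
import torch
import torch.nn as nn

class DropoutLSTMRegressor(nn.Module):
    def __init__(self, n_features, hidden=64, p_drop=0.3):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.drop = nn.Dropout(p_drop)          # kept active at prediction time
        self.mean_head = nn.Linear(hidden, 1)   # predicted soil moisture
        self.logvar_head = nn.Linear(hidden, 1) # aleatoric (data) uncertainty

    def forward(self, x):
        h, _ = self.lstm(x)
        h = self.drop(h[:, -1])                 # last time step
        return self.mean_head(h), self.logvar_head(h)

def gaussian_nll(mean, logvar, target):
    # Heteroscedastic loss: the model is rewarded for predicting larger variance
    # where its squared error is large.
    return (0.5 * torch.exp(-logvar) * (target - mean) ** 2 + 0.5 * logvar).mean()

@torch.no_grad()
def mc_dropout_predict(model, x, n_samples=50):
    model.train()                               # keep dropout stochastic
    means, alea = [], []
    for _ in range(n_samples):
        m, lv = model(x)
        means.append(m)
        alea.append(torch.exp(lv))
    means = torch.stack(means)                  # (n_samples, batch, 1)
    epistemic_var = means.var(dim=0)            # spread across dropout masks
    aleatoric_var = torch.stack(alea).mean(dim=0)
    return means.mean(dim=0), epistemic_var, aleatoric_var

# Toy usage on random data, just to show the shapes involved.
model = DropoutLSTMRegressor(n_features=5)
x = torch.randn(8, 30, 5)                       # batch of 30-step sequences
y = torch.randn(8, 1)
mean, logvar = model(x)
loss = gaussian_nll(mean, logvar, y)
loss.backward()
pred, epi_var, alea_var = mc_dropout_predict(model, x)
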
Abstract 14: Targeted Meta-Learning for Critical Incident Detection in Weather Data in Climate Change: How Can AI Help?, Kamani, Farhang, Mahdavi, Wang 02:50 PM

Due to imbalanced or heavy-tailed nature of weather- and climate-related datasets, the performance of standard deep learning models significantly deviates from their expected behavior on test data. Classical methods to address these issues are mostly data or application dependent, hence burdensome to tune. Meta-learning approaches, on the other hand, aim to learn hyperparameters in the learning process using different objective functions on training and validation data. However, these methods suffer from high computational complexity and are not scalable to large datasets. In this paper, we aim to apply a novel framework named as targeted meta-learning to rectify this issue, and show its efficacy in dealing with the aforementioned biases in datasets. This framework employs a small, well-crafted target dataset that resembles the desired nature of test data in order to guide the learning process in a coupled manner. We empirically show that this framework can overcome the bias issue, common to weather-related datasets, in a bow echo detection case study.

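One common way such a "small target set guides training" coupling can be realized is to reweight training examples by how well their gradients align with the gradient on the trusted target batch. The sketch below (plain numpy, logistic regression, invented names and data) illustrates that idea only; it is an assumption-laden stand-in, not the authors' exact algorithm.

# Rough sketch of target-guided example reweighting under stated assumptions.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def per_example_grads(w, X, y):
    # Gradient of the logistic loss for each example separately: shape (n, d).
    p = sigmoid(X @ w)
    return (p - y)[:, None] * X

# Heavily imbalanced training data vs. a tiny balanced "target" set.
X_train = rng.normal(size=(1000, 5)); y_train = (rng.random(1000) < 0.05).astype(float)
X_target = rng.normal(size=(20, 5));  y_target = np.repeat([0.0, 1.0], 10)

w = np.zeros(5)
lr = 0.1
for step in range(200):
    g_train = per_example_grads(w, X_train, y_train)                  # (n, d)
    g_target = per_example_grads(w, X_target, y_target).mean(axis=0)  # (d,)
    # Upweight training examples whose gradients point the same way as the
    # target-set gradient; clip at zero and renormalize.
    align = g_train @ g_target
    weights = np.clip(align, 0.0, None)
    if weights.sum() > 0:
        weights /= weights.sum()
    else:
        weights = np.full(len(X_train), 1.0 / len(X_train))
    w -= lr * (weights[:, None] * g_train).sum(axis=0)
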


Abstract 16: Geoscience data and models for the Climate Change AI community in Climate Change: How Can AI Help?, Mukkavilli 03:30 PM

This talk will outline how to make climate science datasets and models accessible for machine learning. The focus will be on climate science challenges and opportunities associated with two distinct projects: 1) EnviroNet, a project focused on bridging gaps between geoscience and machine learning research through a global data repository of ImageNet analogs and AI challenges, and 2) a Mila project on changing people's minds and behavior through visualization of future extreme climate events. The discussion related to EnviroNet will be on how datasets and climate science problems can be framed for the machine learning research community at large. The discussion related to the Mila project will include climate science forecast model prototype developments in progress for accurately visualizing future extreme climate impacts of events such as floods, that particularly impact individuals' neighborhoods and households.

Abstract 17: ML vs. Climate Change, Applications in Energy at DeepMind in Climate Change: How Can AI Help?, Witherspoon 03:45 PM

DeepMind has proved that machine learning can help us solve challenges in the Energy sector that contribute to climate change. DeepMind Program Manager Sims Witherspoon will share how they have applied ML to reduce energy consumption in data centers as well as to increase the value of wind power by 20% (compared to a baseline of no realtime commitments to the grid). Sims will also highlight insights the team has learned in their application of ML to the real world as well as the potential for these kinds of techniques to be applied in other areas, to help tackle climate change on an even grander scale.

Abstract 18: Truck Traffic Monitoring with Satellite Images in Climate Change: How Can AI Help?, Kaack, Chen 04:20 PM

The road freight sector is responsible for a large and growing share of greenhouse gas emissions, but reliable data on the amount of freight that is moved on roads in many parts of the world are scarce. Many low- and middle-income countries have limited ground-based traffic monitoring and freight surveying activities. In this proof of concept, we show that we can use an object detection network to count trucks in satellite images and predict average annual daily truck traffic from those counts. We describe a complete model, test the uncertainty of the estimation, and discuss the transfer to developing countries.

Abstract 19: Machine Learning for AC Optimal Power Flow in Climate Change: How Can AI Help?, Guha, Wang 04:30 PM

We explore machine learning methods for AC Optimal Powerflow (ACOPF) - the task of optimizing power generation in a transmission network while respecting physical and engineering constraints. We present two formulations of ACOPF as a machine learning problem: 1) an end-to-end prediction task where we directly predict the optimal generator settings, and 2) a constraint prediction task where we predict the set of active constraints in the optimal solution. We validate these approaches on two benchmark grids.

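The two formulations described in this abstract map naturally onto a regression head and a multi-label classification head. The sketch below is a hedged illustration of that mapping in plain PyTorch; network sizes, feature choices, and all names are assumptions made for this example, not the authors' setup.

# Sketch of the two ACOPF learning formulations: (1) regress the optimal generator
# set-points directly, (2) predict which constraints are active (multi-label).
import torch
import torch.nn as nn

N_LOADS, N_GENERATORS, N_CONSTRAINTS = 30, 6, 50

def mlp(d_in, d_out):
    return nn.Sequential(nn.Linear(d_in, 128), nn.ReLU(),
                         nn.Linear(128, 128), nn.ReLU(),
                         nn.Linear(128, d_out))

# Formulation 1: end-to-end prediction of optimal generator settings.
setpoint_net = mlp(N_LOADS, N_GENERATORS)
regression_loss = nn.MSELoss()

# Formulation 2: predict the set of active constraints; a conventional solver can
# then solve the much smaller reduced problem.
active_set_net = mlp(N_LOADS, N_CONSTRAINTS)
classification_loss = nn.BCEWithLogitsLoss()

# One illustrative training step on random stand-in data.
loads = torch.randn(64, N_LOADS)                 # demand profile per sample
opt_setpoints = torch.randn(64, N_GENERATORS)    # labels from an offline ACOPF solver
active_mask = (torch.rand(64, N_CONSTRAINTS) < 0.1).float()

loss1 = regression_loss(setpoint_net(loads), opt_setpoints)
loss2 = classification_loss(active_set_net(loads), active_mask)
(loss1 + loss2).backward()
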
Abstract 20: Planetary Scale Monitoring of Urban Growth in High Flood Risk Areas in Climate Change: How Can AI Help?, Clough, Nair, Erinjippurath 04:40 PM

Climate change is increasing the incidence of flooding. Many areas in the developing world are experiencing strong population growth but lack adequate urban planning. This represents a significant humanitarian risk. We explore the use of high-cadence satellite imagery provided by Planet, whose flock of over one hundred 'Dove' satellites image the entire earth's landmass everyday at 3-5m resolution. We use a deep learning-based computer

vision approach to measure flood-related humanitarian risk in 5 cities in Africa.

Workshop on the Security and Privacy of Machine Learning

Nicolas Papernot, Florian Tramer, Bo Li, Dan Boneh, David Evans, Somesh Jha, Percy Liang, Patrick McDaniel, Jacob Steinhardt, Dawn Song

104 B, Fri Jun 14, 08:30 AM

As machine learning has increasingly been deployed in critical real-world applications, the dangers of manipulation and misuse of these models have become of paramount importance to public safety and user privacy. Applications ranging from online content recognition to financial analytics to autonomous vehicles have all been shown to be vulnerable to adversaries wishing to manipulate the models or mislead models to their malicious ends. This workshop will focus on recent research and future directions about the security and privacy problems in real-world machine learning systems. We aim to bring together experts from the machine learning, security, and privacy communities in an attempt to highlight recent work in these areas as well as to clarify the foundations of secure and private machine learning strategies. We seek to come to a consensus on a rigorous framework to formulate adversarial attacks targeting machine learning models, and to characterize the properties that ensure the security and privacy of machine learning systems. Finally, we hope to chart out important directions for future work and cross-community collaborations.

Schedule

09:00 AM - Patrick McDaniel
09:30 AM - Una-May O'Reilly
10:00 AM - Enhancing Gradient-based Attacks with Symbolic Intervals
10:20 AM - Adversarial Policies: Attacking Deep Reinforcement Learning
10:45 AM - Le Song
11:15 AM - Allen Qi
11:45 AM - Private vqSGD: Vector-Quantized Stochastic Gradient Descent
01:15 PM - Ziko Kolter
01:45 PM - Provable Certificates for Adversarial Examples: Fitting a Ball in the Union of Polytopes
02:05 PM - Poster Session #1
02:45 PM - Alexander Madry
03:15 PM - Been Kim
03:45 PM - Theoretically Principled Trade-off between Robustness and Accuracy


04:05 PM - Model weight theft with just noise inputs: The curious case of the petulant attacker
04:15 PM - Panel
05:15 PM - Poster Session #2

Theoretical Physics for Deep Learning

Jaehoon Lee, Jeffrey Pennington, Yasaman Bahri, Max Welling, Surya Ganguli, Joan Bruna

104 C, Fri Jun 14, 08:30 AM

Though the purview of physics is broad and includes many loosely connected subdisciplines, a unifying theme is the endeavor to provide concise, quantitative, and predictive descriptions of the often large and complex systems governing phenomena that occur in the natural world. While one could debate how closely deep learning is connected to the natural world, it is undeniably the case that deep learning systems are large and complex; as such, it is reasonable to consider whether the rich body of ideas and powerful tools from theoretical physicists could be harnessed to improve our understanding of deep learning. The goal of this workshop is to investigate this question by bringing together experts in theoretical physics and deep learning in order to stimulate interaction and to begin exploring how theoretical physics can shed light on the theory of deep learning.

We believe ICML is an appropriate venue for this gathering as members from both communities are frequently in attendance and because deep learning theory has emerged as a focus at the conference, both as an independent track in the main conference and in numerous workshops over the last few years. Moreover, the conference has enjoyed an increasing number of papers using physics tools and ideas to draw insights into deep learning.

Schedule

08:30 AM - Opening Remarks - Lee, Pennington, Bahri, Welling, Ganguli, Bruna
08:40 AM - Linearized two-layers neural networks in high dimension - Montanari
09:10 AM - Loss landscape and behaviour of algorithms in the spiked matrix-tensor model - Zdeborova
09:40 AM - Poster spotlights - Novak, Dreyer, Golkar, Higgins, Antognini, Karakida, Ghosh
10:20 AM - Break and poster discussion
11:00 AM - On the Interplay between Physics and Deep Learning - Cranmer
11:30 AM - Why Deep Learning Works: Traditional and Heavy-Tailed Implicit Self-Regularization in Deep Neural Networks - Mahoney


12:00 PM - Analyzing the dynamics of online learning in over-parameterized two-layer neural networks - Goldt
12:15 PM - Convergence Properties of Neural Networks on Separable Data - Tachet des Combes
12:30 PM - Lunch
02:00 PM - Is Optimization a sufficient language to understand Deep Learning? - Arora
02:30 PM - Towards Understanding Regularization in Batch Normalization
02:45 PM - How Noise during Training Affects the Hessian Spectrum
03:00 PM - Break and poster discussion
03:30 PM - Understanding overparameterized neural networks - Sohl-Dickstein
04:00 PM - Asymptotics of Wide Networks from Feynman Diagrams - Gur-Ari
04:15 PM - A Mean Field Theory of Quantized Deep Networks: The Quantization-Depth Trade-Off - Gilboa
04:30 PM - Deep Learning on the 2-Dimensional Ising Model to Extract the Crossover Region - Walker
04:45 PM - Learning the Arrow of Time - Rahaman
05:00 PM - Poster discussion - Novak, Gabella, Dreyer, Golkar, Tong, Higgins, Milletari, Antognini, Goldt, Ramírez Rivera, Bondesan, Karakida, Tachet des Combes, Mahoney, Walker, Fort, Smith, Ghosh, Baratin, Granziol, Roberts, Vetrov, Wilson, Laurent, Thomas, Lacoste-Julien, Gilboa, Soudry, Gupta, Goyal, Bengio, Elsen, De, Jastrzebski, Martin, Shabanian, Courville, Akaho, Zdeborova, Dyer, Weiler, de Haan, Cohen, Welling, Luo, peng, Rahaman, Matthey, J. Rezende, Choi, Cranmer, Xiao, Lee, Bahri, Pennington, Yang, Hron, Sohl-Dickstein, Gur-Ari

Abstracts (9):


Abstract 2: Linearized two-layers neural networks in high dimension in Theoretical Physics for Deep Learning, Montanari 08:40 AM

Speaker: Andrea Montanari (Stanford)

Abstract: We consider the problem of learning an unknown function f on the d-dimensional sphere with respect to the square loss, given i.i.d. samples (y_i, x_i) where x_i is a feature vector uniformly distributed on the sphere and y_i = f(x_i). We study two popular classes of models that can be regarded as linearizations of two-layers neural networks around a random initialization: (RF) the random feature model of Rahimi-Recht; (NT) the neural tangent kernel model of Jacot-Gabriel-Hongler. Both these approaches can also be regarded as randomized approximations of kernel ridge regression (with respect to different kernels), and hence enjoy universal approximation properties when the number of neurons N diverges, for a fixed dimension d.

We prove that, if both d and N are large, the behavior of these models is instead remarkably simpler. If N is of smaller order than d^2, then RF performs no better than linear regression with respect to the raw features x_i, and NT performs no better than linear regression with respect to degree-one and two monomials in the x_i's. More generally, if N is of smaller order than d^{k+1} then RF fits at most a degree-k polynomial in the raw features, and NT fits at most a degree-(k+1) polynomial. We then focus on the case of quadratic functions, and N = O(d). We show that the gap in generalization error between fully trained neural networks and the linearized models is potentially unbounded.

[based on joint work with Behrooz Ghorbani, Song Mei, Theodor Misiakiewicz]

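For readers unfamiliar with the two model classes named in this abstract, a standard way to write them down is the following; the notation is assumed here rather than quoted from the talk. Given random first-layer weights w_1, ..., w_N and activation \sigma, the two linearizations of the two-layer network are

f_{RF}(x; a) = \sum_{i=1}^{N} a_i \, \sigma(\langle w_i, x \rangle)            (random features: only the second-layer weights a are trained)

f_{NT}(x; s) = \sum_{i=1}^{N} \langle s_i, x \rangle \, \sigma'(\langle w_i, x \rangle)   (neural tangent: first-order Taylor expansion in the first-layer weights)

Both are linear in their trainable parameters (a or s), which is why fitting them amounts to kernel ridge regression with the corresponding RF or NT kernel.
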


Abstract 3: Loss landscape and behaviour of algorithms in the spiked matrix-tensor model in Theoretical Physics for Deep Learning, Zdeborova 09:10 AM

Speaker: Lenka Zdeborova (CEA/SACLAY)

Abstract: A key question of current interest is: How are properties of optimization and sampling algorithms influenced by the properties of the loss function in noisy high-dimensional non-convex settings? Answering this question for deep neural networks is a landmark goal of many ongoing works. In this talk I will answer this question in unprecedented detail for the spiked matrix-tensor model. Information theoretic limits, and Kac-Rice analysis of the loss landscapes, will be compared to the analytically studied performance of message passing algorithms, of the Langevin dynamics and of the gradient flow. Several rather non-intuitive results will be unveiled and explained.

Abstract 4: Poster spotlights in Theoretical Physics for Deep Learning, Novak, Dreyer, Golkar, Higgins, Antognini, Karakida, Ghosh 09:40 AM

A Quantum Field Theory of Representation Learning Robert Bamler (University of California at Irvine)*; Stephan Mandt (University of California, Irvine)

Covariance in Physics and Convolutional Neural Networks Miranda Cheng (University of Amsterdam)*; Vassilis Anagiannis (University of Amsterdam); Maurice Weiler (University of Amsterdam); Pim de Haan (University of Amsterdam); Taco S. Cohen (Qualcomm AI Research); Max Welling (University of Amsterdam)

Scale Steerable Filters for Locally Scale-Invariant Convolutional Neural Networks Rohan Ghosh (National University of Singapore)*; Anupam Gupta (National University of Singapore)

Towards a Definition of Disentangled Representations Irina Higgins (DeepMind)*; David Amos (DeepMind); Sebastien Racaniere (DeepMind); David Pfau (DeepMind); Loic Matthey (DeepMind); Danilo Jimenez Rezende (Google DeepMind)

Bayesian Deep Convolutional Networks with Many Channels are Gaussian Processes Roman Novak (Google Brain)*; Lechao Xiao (Google Brain); Jaehoon Lee (Google Brain); Yasaman Bahri (Google Brain); Greg Yang (Microsoft Research AI); Jiri Hron (University of Cambridge); Daniel Abolafia (Google Brain); Jeffrey Pennington (Google Brain); Jascha Sohl-Dickstein (Google Brain)

Finite size corrections for neural network Gaussian processes Joseph M Antognini (Whisper AI)*

Pathological Spectrum of the Fisher Information Matrix in Deep Neural Networks Ryo Karakida (National Institute of Advanced Industrial Science and Technology)*; Shotaro Akaho (AIST); Shun-ichi Amari (RIKEN)

Inferring the quantum density matrix with machine learning Kyle Cranmer (New York University); Siavash Golkar (NYU)*; Duccio Pappadopulo (Bloomberg)

Jet grooming through reinforcement learning Frederic Dreyer (University of Oxford)*; Stefano Carrazza (University of Milan)

Abstract 5: Break and poster discussion in Theoretical Physics for Deep Learning, 10:20 AM

Bayesian Deep Convolutional Networks with Many Channels are Gaussian Processes Roman Novak (Google Brain)*; Lechao Xiao (Google Brain); Jaehoon Lee (Google Brain); Yasaman Bahri (Google Brain); Greg Yang (Microsoft Research AI); Jiri Hron (University of Cambridge); Daniel Abolafia (Google Brain); Jeffrey Pennington (Google Brain); Jascha Sohl-Dickstein (Google Brain)

Topology of Learning in Artificial Neural Networks Maxime Gabella (Magma Learning)*

Jet grooming through reinforcement learning Frederic Dreyer (University of Oxford)*; Stefano Carrazza (University of Milan)

Inferring the quantum density matrix with machine learning Kyle Cranmer (New York University); Siavash Golkar (NYU)*; Duccio Pappadopulo (Bloomberg)

Backdrop: Stochastic Backpropagation Siavash Golkar (NYU)*; Kyle Cranmer (New York University)

Explain pathology in Deep Gaussian Process using Chaos Theory Anh Tong (UNIST)*; Jaesik Choi (Ulsan National Institute of Science and Technology)

Towards a Definition of Disentangled Representations Irina Higgins (DeepMind)*; David Amos (DeepMind); Sebastien Racaniere (DeepMind); David Pfau (DeepMind); Loic Matthey (DeepMind); Danilo Jimenez Rezende (DeepMind)

Towards Understanding Regularization in Batch Normalization Ping Luo (The Chinese University of Hong Kong); Xinjiang Wang (); Wenqi Shao (The Chinese University of HongKong)*; Zhanglin Peng (SenseTime)

Covariance in Physics and Convolutional Neural Networks Miranda Cheng (University of Amsterdam)*; Vassilis Anagiannis (University of Amsterdam); Maurice Weiler (University of Amsterdam); Pim de Haan (University of Amsterdam); Taco S. Cohen (Qualcomm AI Research); Max Welling (University of Amsterdam)

Meanfield theory of activation functions in Deep Neural Networks Mirco Milletari (Microsoft)*; Thiparat Chotibut (SUTD); Paolo E. Trevisanutto (National University of Singapore)

Finite size corrections for neural network Gaussian processes Joseph M Antognini (Whisper AI)*

SWANN: Small-World Neural Networks and Rapid Convergence Mojan Javaheripi (UC San Diego)*; Bita Darvish Rouhani (UC San Diego); Farinaz Koushanfar (UC San Diego)

Analysing the dynamics of online learning in over-parameterised two-layer neural networks Sebastian Goldt (Institut de Physique théorique, Paris)*; Madhu Advani (Harvard University); Andrew Saxe (University of Oxford); Florent Krzakala (École Normale Supérieure); Lenka Zdeborova (CEA Saclay)

A Halo Merger Tree Generation and Evaluation Framework Sandra Robles (Universidad Autónoma de Madrid); Jonathan Gómez (Pontificia Universidad Católica de Chile); Adín Ramírez Rivera (University of Campinas)*; Jenny Gonzáles (Pontificia Universidad Católica de Chile); Nelson Padilla (Pontificia Universidad Católica de Chile); Diego Dujovne (Universidad Diego Portales)

Learning Symmetries of Classical Integrable Systems Roberto Bondesan (Qualcomm AI Research)*, Austen Lamacraft (Cavendish Laboratory, University of Cambridge, UK)

Cosmology inspired generative models Uros Seljak (UC Berkeley)*; Francois Lanusse (UC Berkeley)

Pathological Spectrum of the Fisher Information Matrix in Deep Neural Networks Ryo Karakida (National Institute of Advanced Industrial Science and Technology)*; Shotaro Akaho (AIST); Shun-ichi Amari (RIKEN)

How Noise during Training Affects the Hessian Spectrum Mingwei Wei (Northwestern University); David Schwab (Facebook AI Research)*

A Quantum Field Theory of Representation Learning Robert Bamler (University of California at Irvine)*; Stephan Mandt (University of California, Irvine)

Convergence Properties of Neural Networks on Separable Data Remi Tachet des Combes (Microsoft Research Montreal)*; Mohammad Pezeshki (Mila & University of Montreal); Samira Shabanian (Microsoft, Canada); Aaron Courville (MILA, Université de Montréal); Yoshua Bengio (Mila)

Universality and Capacity Metrics in Deep Neural Networks Michael Mahoney (University of California, Berkeley)*; Charles Martin (Calculation Consulting)

Feynman Diagrams for Large Width Networks Guy Gur-Ari (Google)*; Ethan Dyer (Google)

Deep Learning on the 2-Dimensional Ising Model to Extract the Crossover Region Nicholas Walker (Louisiana State Univ - Baton Rouge)*

Large Scale Structure of the Loss Landscape of Neural Networks Stanislav Fort (Stanford University)*; Stanislaw Jastrzebski (New York University)

Momentum Enables Large Batch Training Samuel L Smith (DeepMind)*; Erich Elsen (Google); Soham De (DeepMind)

Learning the Arrow of Time Nasim Rahaman (University of Heidelberg)*; Steffen Wolf (Heidelberg University); Anirudh Goyal (University of Montreal); Roman Remme (Heidelberg University); Yoshua Bengio (Mila)

Scale Steerable Filters for Locally Scale-Invariant Convolutional Neural Networks Rohan Ghosh (National University of Singapore)*; Anupam Gupta (National University of Singapore)

A Mean Field Theory of Quantized Deep Networks: The Quantization-Depth Trade-Off Yaniv Blumenfeld (Technion)*; Dar Gilboa (Columbia University); Daniel Soudry (Technion)

Rethinking Complexity in Deep Learning: A View from Function Space Aristide Baratin (Mila, Université de Montréal)*; Thomas George (MILA, Université de Montréal); César Laurent (Mila, Université de Montréal); Valentin Thomas (MILA); Guillaume Lajoie (Université de Montréal, Mila); Simon Lacoste-Julien (Mila, Université de Montréal)

The Deep Learning Limit: Negative Neural Network eigenvalues just noise? Diego Granziol (Oxford)*; Stefan Zohren (University of Oxford); Stephen Roberts (Oxford); Dmitry P Vetrov (Higher School of Economics); Andrew Gordon Wilson (Cornell University); Timur Garipov (Samsung AI Center in Moscow)

Gradient descent in Gaussian random fields as a toy model for high-dimensional optimisation Mariano Chouza (Tower Research Capital); Stephen Roberts (Oxford); Stefan Zohren (University of Oxford)*


Deep Learning for Inverse Problems Abhejit Rajagopal (University of California, Santa Barbara)*; Vincent R Radzicki (University of California, Santa Barbara)

Abstract 6: On the Interplay between Physics and Deep Learning in Theoretical Physics for Deep Learning, Cranmer 11:00 AM

Speaker: Kyle Cranmer (NYU)

Abstract: The interplay between physics and deep learning is typically divided into two themes. The first is "physics for deep learning", where techniques from physics are brought to bear on understanding dynamics of learning. The second is "deep learning for physics," which focuses on application of deep learning techniques to physics problems. I will present a more nuanced view of this interplay with many examples of how the structure of physics problems have inspired advances in deep learning and how it yields insights on topics such as inductive bias, interpretability, and causality.

Abstract 7: Why Deep Learning Works: Traditional and Heavy-Tailed Implicit Self-Regularization in Deep Neural Networks in Theoretical Physics for Deep Learning, Mahoney 11:30 AM

Speaker: Michael Mahoney (ICSI and Department of Statistics, University of California at Berkeley)

Abstract: Random Matrix Theory (RMT) is applied to analyze the weight matrices of Deep Neural Networks (DNNs), including both production quality, pre-trained models and smaller models trained from scratch. Empirical and theoretical results clearly indicate that the DNN training process itself implicitly implements a form of self-regularization, implicitly sculpting a more regularized energy or penalty landscape. In particular, the empirical spectral density (ESD) of DNN layer matrices displays signatures of traditionally-regularized statistical models, even in the absence of exogenously specifying traditional forms of explicit regularization. Building on relatively recent results in RMT, most notably its extension to Universality classes of Heavy-Tailed matrices, and applying them to these empirical results, we develop a theory to identify 5+1 Phases of Training, corresponding to increasing amounts of implicit self-regularization. For smaller and/or older DNNs, this implicit self-regularization is like traditional Tikhonov regularization, in that there appears to be a ``size scale'' separating signal from noise. For state-of-the-art DNNs, however, we identify a novel form of heavy-tailed self-regularization, similar to the self-organization seen in the statistical physics of disordered systems. This implicit self-regularization can depend strongly on the knobs of the training process. In particular, by exploiting the generalization gap phenomena, we demonstrate that we can cause a small model to exhibit all 5+1 phases of training simply by changing the batch size. This demonstrates that---all else being equal---DNN optimization with larger batch sizes leads to less-well implicitly-regularized models, and it provides an explanation for the generalization gap phenomena. Coupled with work on energy landscapes and heavy-tailed spin glasses, it also suggests an explanation of why deep learning works. Joint work with Charles Martin of Calculation Consulting, Inc.

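The empirical spectral density mentioned in this abstract is straightforward to compute for any trained layer; the sketch below (numpy, invented variable names, not the speaker's own tooling) shows the quantity being discussed.

# Minimal sketch of the empirical spectral density (ESD): the histogram of
# eigenvalues of X = (1/n) W^T W for a layer weight matrix W.
import numpy as np

def empirical_spectral_density(W, bins=100):
    n, m = W.shape                       # e.g. (fan_in, fan_out) of a dense layer
    X = (W.T @ W) / n                    # layer correlation matrix
    eigs = np.linalg.eigvalsh(X)         # real, non-negative eigenvalues
    hist, edges = np.histogram(eigs, bins=bins, density=True)
    return eigs, hist, edges

# For an i.i.d. Gaussian (untrained) layer the ESD follows the Marchenko-Pastur
# bulk; eigenvalues escaping far beyond the bulk edge, or a power-law tail, are
# the kinds of self-regularization signatures the abstract describes in trained
# networks. The random matrix here is only a stand-in for a trained weight matrix.
n, m = 1000, 300
W = np.random.randn(n, m)
eigs, hist, edges = empirical_spectral_density(W)
mp_bulk_edge = (1 + np.sqrt(m / n)) ** 2
print(f"largest eigenvalue: {eigs.max():.2f}  (MP bulk edge approx {mp_bulk_edge:.2f})")
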

Abstract 11: Is Optimization a sufficient language to understand Deep Learning? in Theoretical Physics for Deep Learning, Arora 02:00 PM

Speaker: Sanjeev Arora (Princeton/IAS)

Abstract: There is an old debate in neuroscience about whether or not learning has to boil down to optimizing a single cost function. This talk will suggest that even to understand mathematical

properties of deep learning, we have to go beyond the conventional view of "optimizing a single cost function". The reason is that phenomena occur along the gradient descent trajectory that are not fully captured in the value of the cost function. I will illustrate briefly with three new results that involve such phenomena:

(i) (joint work with Cohen, Hu, and Luo) How deep matrix factorization solves matrix completion better than classical algorithms https://arxiv.org/abs/1905.13655

(ii) (joint with Du, Hu, Li, Salakhutdinov, and Wang) How to compute (exactly) with an infinitely wide net ("mean field limit", in physics terms) https://arxiv.org/abs/1904.11955

(iii) (joint with Kuditipudi, Wang, Hu, Lee, Zhang, Li, Ge) Explaining mode-connectivity for real-life deep nets (the phenomenon that low-cost solutions found by gradient descent are interconnected in the parameter space via low-cost paths; see Garipov et al'18 and Draxler et al'18)

Abstract 15: Understanding overparameterized neural networks in Theoretical Physics for Deep Learning, Sohl-Dickstein 03:30 PM

Speaker: Jascha Sohl-Dickstein (Google Brain)

Abstract: As neural networks become highly overparameterized, their accuracy improves, and their behavior becomes easier to analyze theoretically. I will give an introduction to a rapidly growing body of work which examines the learning dynamics and prior over functions induced by infinitely wide, randomly initialized, neural networks. Core results that I will discuss include: that the distribution over functions computed by a wide neural network often corresponds to a Gaussian process with a particular compositional kernel, both before and after training; that the predictions of wide neural networks are linear in their parameters throughout training; and that this perspective enables analytic predictions for how trainability depends on hyperparameters and architecture. These results provide for surprising capabilities -- for instance, the evaluation of test set predictions which would come from an infinitely wide trained neural network without ever instantiating a neural network, or the rapid training of 10,000+ layer convolutional networks. I will argue that this growing understanding of neural networks in the limit of infinite width is foundational for future theoretical and practical understanding of deep learning.

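The first claim in this abstract (a wide random network behaves like a Gaussian process with a compositional kernel) can be checked numerically in a few lines. The sketch below compares a Monte Carlo estimate over a wide random ReLU layer against the analytic arc-cosine kernel (Cho & Saul, 2009); the parameterization and all names are assumptions made for this illustration, not material from the talk.

# Numerical check: wide random ReLU features reproduce the arc-cosine kernel.
import numpy as np

rng = np.random.default_rng(1)
d = 10
x1 = rng.normal(size=d)
x2 = rng.normal(size=d)

def arccos_kernel(x, y):
    # E_w[ReLU(w.x) ReLU(w.y)] for w ~ N(0, I)
    nx, ny = np.linalg.norm(x), np.linalg.norm(y)
    cos_t = np.clip(x @ y / (nx * ny), -1.0, 1.0)
    theta = np.arccos(cos_t)
    return nx * ny * (np.sin(theta) + (np.pi - theta) * cos_t) / (2 * np.pi)

# Monte Carlo estimate using one very wide layer of random ReLU features.
n_hidden = 200_000
W = rng.normal(size=(n_hidden, d))
h1 = np.maximum(W @ x1, 0.0)
h2 = np.maximum(W @ x2, 0.0)
mc_estimate = np.mean(h1 * h2)

print(f"Monte Carlo: {mc_estimate:.4f}   analytic arc-cosine kernel: {arccos_kernel(x1, x2):.4f}")
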

Abstract 20: Poster discussion in Theoretical Physics for Deep Learning, Novak, Gabella, Dreyer, Golkar, Tong, Higgins, Milletari, Antognini, Goldt, Ramírez Rivera, Bondesan, Karakida, Tachet des Combes, Mahoney, Walker, Fort, Smith, Ghosh, Baratin, Granziol, Roberts, Vetrov, Wilson, Laurent, Thomas, Lacoste-Julien, Gilboa, Soudry, Gupta, Goyal, Bengio, Elsen, De, Jastrzebski, Martin, Shabanian, Courville, Akaho, Zdeborova, Dyer, Weiler, de Haan, Cohen, Welling, Luo, peng, Rahaman, Matthey, J. Rezende, Choi, Cranmer, Xiao, Lee, Bahri, Pennington, Yang, Hron, Sohl-Dickstein, Gur-Ari 05:00 PM

Bayesian Deep Convolutional Networks with Many Channels are Gaussian Processes Roman Novak (Google Brain)*; Lechao Xiao (Google Brain); Jaehoon Lee (Google Brain); Yasaman Bahri (Google Brain); Greg Yang (Microsoft Research AI); Jiri Hron (University of Cambridge); Daniel Abolafia (Google Brain); Jeffrey Pennington (Google Brain); Jascha Sohl-Dickstein (Google Brain)

Topology of Learning in Artificial Neural Networks Maxime Gabella (Magma Learning)*

Jet grooming through reinforcement learning Frederic Dreyer (University of Oxford)*; Stefano Carrazza (University of Milan)

Inferring the quantum density matrix with machine learning Kyle Cranmer (New York University); Siavash Golkar (NYU)*; Duccio Pappadopulo (Bloomberg)

Backdrop: Stochastic Backpropagation Siavash Golkar (NYU)*; Kyle Cranmer (New York University)

Explain pathology in Deep Gaussian Process using Chaos Theory Anh Tong (UNIST)*; Jaesik Choi (Ulsan National Institute of Science and Technology)

Towards a Definition of Disentangled Representations Irina Higgins (DeepMind)*; David Amos (DeepMind); Sebastien Racaniere (DeepMind); David Pfau (DeepMind); Loic Matthey (DeepMind); Danilo Jimenez Rezende (DeepMind)

Towards Understanding Regularization in Batch Normalization Ping Luo (The Chinese University of Hong Kong); Xinjiang Wang (); Wenqi Shao (The Chinese University of HongKong)*; Zhanglin Peng (SenseTime)

Covariance in Physics and Convolutional Neural Networks Miranda Cheng (University of Amsterdam)*; Vassilis Anagiannis (University of Amsterdam); Maurice Weiler (University of Amsterdam); Pim de Haan (University of Amsterdam); Taco S. Cohen (Qualcomm AI Research); Max Welling (University of Amsterdam)

Meanfield theory of activation functions in Deep Neural Networks Mirco Milletari (Microsoft)*; Thiparat Chotibut (SUTD); Paolo E. Trevisanutto (National University of Singapore)

Finite size corrections for neural network Gaussian processes Joseph M Antognini (Whisper AI)*

Analysing the dynamics of online learning in over-parameterised two-layer neural networks Sebastian Goldt (Institut de Physique théorique, Paris)*; Madhu Advani (Harvard University); Andrew Saxe (University of Oxford); Florent Krzakala (École Normale Supérieure); Lenka Zdeborova (CEA Saclay)

A Halo Merger Tree Generation and Evaluation Framework Sandra Robles (Universidad Autónoma de Madrid); Jonathan Gómez (Pontificia Universidad Católica de Chile); Adín Ramírez Rivera (University of Campinas)*; Jenny Gonzáles (Pontificia Universidad Católica de Chile); Nelson Padilla (Pontificia Universidad Católica de Chile); Diego Dujovne (Universidad Diego Portales)

Learning Symmetries of Classical Integrable Systems Roberto Bondesan (Qualcomm AI Research)*, Austen Lamacraft (Cavendish Laboratory, University of Cambridge, UK)

Pathological Spectrum of the Fisher Information Matrix in Deep Neural Networks Ryo Karakida (National Institute of Advanced Industrial Science and Technology)*; Shotaro Akaho (AIST); Shun-ichi Amari (RIKEN)

How Noise during Training Affects the Hessian Spectrum Mingwei Wei (Northwestern University); David Schwab (Facebook AI Research)*

A Quantum Field Theory of Representation Learning Robert Bamler (University of California at Irvine)*; Stephan Mandt (University of California, Irvine)

Convergence Properties of Neural Networks on Separable Data Remi Tachet des Combes (Microsoft Research Montreal)*; Mohammad Pezeshki (Mila & University of Montreal); Samira Shabanian (Microsoft, Canada); Aaron Courville (MILA, Université de Montréal); Yoshua Bengio (Mila)

Universality and Capacity Metrics in Deep Neural Networks Michael Mahoney (University of California, Berkeley)*; Charles Martin (Calculation Consulting)


Asymptotics of Wide Networks from Feynman Diagrams Guy Gur-Ari (Google)*; Ethan Dyer (Google)

Deep Learning on the 2-Dimensional Ising Model to Extract the Crossover Region Nicholas Walker (Louisiana State Univ - Baton Rouge)*

Large Scale Structure of the Loss Landscape of Neural Networks Stanislav Fort (Stanford University)*; Stanislaw Jastrzebski (New York University)

Momentum Enables Large Batch Training Samuel L Smith (DeepMind)*; Erich Elsen (Google); Soham De (DeepMind)

Learning the Arrow of Time Nasim Rahaman (University of Heidelberg)*; Steffen Wolf (Heidelberg University); Anirudh Goyal (University of Montreal); Roman Remme (Heidelberg University); Yoshua Bengio (Mila)

Scale Steerable Filters for Locally Scale-Invariant Convolutional Neural Networks Rohan Ghosh (National University of Singapore)*; Anupam Gupta (National University of Singapore)

A Mean Field Theory of Quantized Deep Networks: The Quantization-Depth Trade-Off Yaniv Blumenfeld (Technion)*; Dar Gilboa (Columbia University); Daniel Soudry (Technion)

Rethinking Complexity in Deep Learning: A View from Function Space Aristide Baratin (Mila, Université de Montréal)*; Thomas George (MILA, Université de Montréal); César Laurent (Mila, Université de Montréal); Valentin Thomas (MILA); Guillaume Lajoie (Université de Montréal, Mila); Simon Lacoste-Julien (Mila, Université de Montréal)

The Deep Learning Limit: Negative Neural Network eigenvalues just noise? Diego Granziol (Oxford)*; Stefan Zohren (University of Oxford); Stephen Roberts (Oxford); Dmitry P Vetrov (Higher School of Economics); Andrew Gordon Wilson (Cornell University); Timur Garipov (Samsung AI Center in Moscow)

Gradient descent in Gaussian random fields as a toy model for high-dimensional optimisation Mariano Chouza (Tower Research Capital); Stephen Roberts (Oxford); Stefan Zohren (University of Oxford)*

Deep Learning for Inverse Problems Abhejit Rajagopal (University of California, Santa Barbara)*; Vincent R Radzicki (University of California, Santa Barbara)

AI in Finance: Applications and Infrastructure for Multi-Agent Learning

Prashant Reddy, Tucker Balch, Michael Wellman, Senthil Kumar, Ion Stoica, Edith Elkind

201, Fri Jun 14, 08:30 AM

Finance is a rich domain for AI and ML research. Model-driven strategies for stock trading and risk assessment models for loan approvals are quintessential financial applications that are reasonably well-understood. However, there are a number of other applications that call for attention as well.

In particular, many finance domains involve ecosystems of interacting and competing agents. Consider for instance the detection of financial fraud and money-laundering. This is a challenging multi-agent learning problem, especially because the real world agents involved evolve their

Page 21 of 107 ICML 2019 Workshop book Generated Sat Jul 06, 2019 strategies constantly. Similarly, in algorithmic Invited Talk 2: The trading of stocks, commodities, etc., the actions of 09:30 Strategic Perils of Morgenstern any given trading agent affects, and is affected by, AM Learning from other trading agents -- many of these agents are Historical Data constantly learning in order to adapt to evolving market scenarios. Further, such trading agents 09:50 Oral Paper operate at such a speed and scale that they must AM Presentations 1 be fully autonomous. They have grown in 10:30 Coffee Break and sophistication to employ advanced ML strategies AM Socialization including deep learning, reinforcement learning, and Invited Talk 3: transfer learning. Trend-Following 11:00 Trading Strategies Wellman Financial institutions have a long history of investing AM and Financial in technology as a differentiator and have been key Market Stability drivers in advancing computing infrastructure (e.g., low-latency networking). As more financial 11:20 Poster Highlights - applications employ deep learning and AM Lightning Round reinforcement learning, there is consensus now on 12:00 Lunch the need for more advanced computing PM architectures--for training large machine learning Invited Talk 4: models and simulating large multi-agent learning 02:00 Towards AI systems--that balance scale with the stringent Veloso PM Innovation in the privacy requirements of finance. Financial Domain

Historically, financial firms have been highly 02:20 Oral Paper secretive about their proprietary technology PM Presentations 2 developments. But now, there is also emerging 03:00 Coffee Break and consensus on the need for (1) deeper engagement PM Socialization with academia to advance a shared knowledge of the unique challenges faced in FinTech, and (2) Invited Talk 5: Intra-day Stock more open collaboration with academic and 03:30 Price Prediction as Balch technology partners through intellectually PM sophisticated fora such as this proposed workshop. a Measure of Market Efficiency

Schedule Invited Talk 6: 03:50 RLlib: A Platform Stoica 09:00 PM for Finance Opening Remarks Reddy, Kumar AM Research

Invited Talk 1: 04:10 Poster Session - All 09:10 Adaptive Tolling for PM Accepted Papers Stone AM Multiagent Traffic 06:00 Optimization ICML Reception PM

Page 22 of 107 ICML 2019 Workshop book Generated Sat Jul 06, 2019

Research); Manuela Veloso (JPMorgan AI Research) Abstracts (3): Multi-Agent Reinforcement Learning for Liquidation Abstract 4: Oral Paper Presentations 1 in AI in Strategy Analysis, Wenhang Bao (Columbia Finance: Applications and Infrastructure for University); Xiao-Yang Liu (Columbia University) Multi-Agent Learning, 09:50 AM

Some people aren't worth listening to: periodically 09:50-10:03 Risk-Sensitive Compact Decision retraining classifiers with feedback from a team of Trees for Autonomous Execution in presence of end users, Joshua Lockhart (JPMorgan AI Simulated Market Response, Svitlana Vyetrenko Research); Mahmoud Mahfouz (JPMorgan AI (JP Morgan Chase); Kyle Xu (Georgia Institute of Research); Tucker Balch (JPMorgan AI Research); Technology) Manuela Veloso (JPMorgan AI Research)

10:04-10:16 Robust Trading via Adversarial Optimistic Bull or Pessimistic Bear: Adaptive Deep Reinforcement Learning Thomas Spooner Reinforcement Learning for Stock Portfolio (University of Liverpool); Rahul Savani (Univ. of Allocation, Xinyi Li (Columbia University); Yinchuan Liverpool) Li ( Beijing Institute of Technology); Yuancheng Zhan (University of Science and Technology of 10:17-10:30 Generating Realistic Stock Market China); Xiao-Yang Liu (Columbia University) Order Streams, Junyi Li (University of Michigan);

Xintong Wang (University of Michigan); Yaoyang Lin How to Evaluate Trading Strategies: Backtesting or (University of Michigan); Arunesh Sinha (University Agent-based Simulation?, Tucker Balch (JPMorgan of Michigan); Michael Wellman (University of AI Research); David Byrd (Georgia Tech); Michigan) Mahmoud Mahfouz (JPMorgan AI Research) Abstract 7: Poster Highlights - Lightning Round in AI in Finance: Applications and Infrastructure Deep Reinforcement Learning for Optimal Trade for Multi-Agent Learning, 11:20 AM Execution, Siyu Lin (University of Virginia)

Self Organizing Supply Chains for Micro-Prediction; Abstract 10: Oral Paper Presentations 2 in AI in Present and Future Uses of the ROAR Protocol, Finance: Applications and Infrastructure for Peter D Cotton (JP Morgan Chase) Multi-Agent Learning, 02:20 PM

02:20-02:33 Towards Inverse Reinforcement Learning-Based Trading Strategies in the Face of Learning for Limit Order Book Dynamics, Jacobo Market Manipulation, Xintong Wang (University of Roa Vicens (University College London); Cyrine Michigan); Chris Hoang (University of Michigan); Chtourou (JPMorgan Chase); Angelos Filos Michael Wellman (University of Michigan) (University of Oxford); Francisco Rullan (University

College of London); Yarin Gal (University of Multi-Agent Simulation for Pricing and Hedging in a Oxford); Dealer Market, Sumitra Ganesh (JPMorgan AI Ricardo Silva (University College London) Research); Nelson Vadori (JPMorgan AI Research);

Mengda Xu (JPMorgan AI Research); Hua Zheng 02:34-02:47 An Agent-Based Model of Financial (JPMorgan Chase); Prashant Reddy (JPMorgan AI Benchmark Manipulation, Megan J Shearer

Page 23 of 107 ICML 2019 Workshop book Generated Sat Jul 06, 2019

(University of Michigan); Gabriel Rauterberg exact (or tractably approximate) inference and/or (University of Michigan); Michael Wellman learning. To achieve this, the following approaches (University of Michigan) have been proposed: i) low or bounded-treewidth probabilistic graphical models and determinantal 02:47-03:00 The sharp, the flat and the shallow: point processes, that exchange expressiveness for Can weakly interacting agents learn to escape bad efficiency; ii) graphical models with high girth or minima?, Panos Parpas (Imperial College London); weak potentials, that provide bounds on the Nikolas Kantas (Imperial College London); Grigorios performance of approximate inference methods; Pavliotis (Imperial College London) and iii) exchangeable probabilistic models that exploit symmetries to reduce inference complexity. More recently, models compiling inference routines The Third Workshop On Tractable into efficient computational graphs such as Probabilistic Modeling (TPM) arithmetic circuits, sum-product networks, cutset networks and probabilistic sentential decision Daniel Lowd, Antonio Vergari, Alejandro Molina, diagrams have advanced the state-of-the-art Tahrima Rahman, Pedro Domingos, Antonio inference performance by exploiting context-specific Vergari independence, determinism or by exploiting latent variables. TPMs have been successfully used in 202, Fri Jun 14, 08:30 AM numerous real-world applications: image classification, completion and generation, scene Probabilistic modeling has become the de facto understanding, activity recognition, language and framework to reason about uncertainty in Machine speech modeling, bioinformatics, collaborative Learning and AI. One of the main challenges in filtering, verification and diagnosis of physical probabilistic modeling is the trade-off between the systems. expressivity of the models and the complexity of performing various types of inference, as well as The aim of this workshop is to bring together learning them from data. researchers working on the different fronts of

tractable probabilistic modeling, highlighting recent This inherent trade-off is clearly visible in powerful -- trends and open challenges. At the same time, we but intractable -- models like Markov random fields, want to foster the discussion across similar or (restricted) Boltzmann machines, (hierarchical) complementary sub-fields in the broader Dirichlet processes and Variational Autoencoders. probabilistic modeling community. In particular, the Despite these models’ recent successes, rising field of neural probabilistic models, such as performing inference on them resorts to normalizing flows and autoregressive models that approximate routines. Moreover, learning such achieve impressive results in generative modeling. models from data is generally harder as inference is It is an interesting open challenge for the TPM a sub-routine of learning, requiring simplifying community to keep a broad range of inference assumptions or further approximations. Having routines tractable while leveraging these models’ guarantees on tractability at inference and learning expressiveness. Furthermore, the rising field of time is then a highly desired property in many probabilistic programming promises to be the new real-world scenarios. lingua franca of model-based learning. This offers

the TPM community opportunities to push the Tractable probabilistic modeling (TPM) concerns expressiveness of the models used for methods guaranteeing exactly this: performing

Page 24 of 107 ICML 2019 Workshop book Generated Sat Jul 06, 2019 general-purpose universal probabilistic languages, 04:10 Poster session such as Pyro, while maintaining efficiency. PM

We want to promote discussions and advance the field both by having high quality contributed works, Abstracts (5): as well as high level invited speakers coming from the aforementioned tangent sub-fields of Abstract 2: Testing Arithmetic Circuits in The probabilistic modeling. Third Workshop On Tractable Probabilistic Modeling (TPM), Darwiche 09:10 AM Schedule I will discuss Testing Arithmetic Circuits (TACs), 09:00 which are new tractable probabilistic models that Welcome AM are universal function approximators like neural networks. A TAC represents a piecewise multilinear 09:10 Testing Arithmetic Darwiche function and computes a marginal query on the AM Circuits newly introduced Testing Bayesian Network (TBN). 09:50 The structure of a TAC is automatically compiled Poster spotlights AM from a Bayesian network and its parameters are learned from labeled data using gradient descent. 10:30 Coffee Break TACs can incorporate background knowledge that AM is encoded in the Bayesian network, whether 11:00 Tractable Islands conditional independence or domain constraints. Dechter AM Revisited Hence, the behavior of a TAC comes with some 11:40 guarantees that are invariant to how it is trained Poster spotlights AM from data. Moreover, a TAC is amenable to being interpretable since its nodes and parameters have Sum-Product precise meanings by virtue of being compiled from a 12:00 Networks and Deep Peharz Bayesian network. This recent work aims to fuse PM Learning: A Love models (Bayesian networks) and functions (DNNs) Marriage with the goal of realizing their collective benefits. 12:40 Lunch PM Abstract 5: Tractable Islands Revisited in The Third Workshop On Tractable Probabilistic 02:20 Tensor Variable Bingham Modeling (TPM), Dechter 11:00 AM PM Elimination in Pyro

03:00 "An important component of human problem-solving Coffee Break PM expertise is the ability to use knowledge about solving easy problems to guide the solution of Invertible Residual difficult ones.” - Minsky Networks and a 03:30 A longstanding intuition in AI is that intelligent Novel Perspective Jacobsen PM agents should be able to use solutions to easy on Adversarial problems to solve hard problems. This has often Examples been termed the “tractable island paradigm.” How do we act on this intuition in the domain of

Page 25 of 107 ICML 2019 Workshop book Generated Sat Jul 06, 2019 probabilistic reasoning? This talk will describe the undirected factor graphs to plated factor graphs, status of probabilistic reasoning algorithms that are and a corresponding generalization of the variable driven by the tractable islands paradigm when elimination algorithm that exploits efficient tensor solving optimization, likelihood and mixed algebra in graphs with plates of variables. This (max-sum-product, e.g. marginal map) queries. I will tensor variable elimination algorithm has been show how heuristics generated via variational integrated into the Pyro probabilistic programming relaxation into tractable structures, can guide language, enabling scalable, automated exact heuristic search and Monte-Carlo sampling, yielding inference in a wide variety of deep generative anytime solvers that produce approximations with models with repeated discrete latent structure. I will confidence bounds that improve with time, and discuss applications of such models to polyphonic become exact if enough time is allowed. music modeling, animal movement modeling, and unsupervised word-level sentiment analysis, as well Abstract 7: Sum-Product Networks and Deep as algorithmic applications to exact Learning: A Love Marriage in The Third subcomputations in approximate inference and Workshop On Tractable Probabilistic Modeling ongoing work on extensions to continuous latent (TPM), Peharz 12:00 PM variables.

Sum-product networks (SPNs) are a prominent Abstract 11: Invertible Residual Networks and a class of tractable probabilistic model, facilitating Novel Perspective on Adversarial Examples in efficient marginalization, conditioning, and other The Third Workshop On Tractable Probabilistic inference routines. However, despite these Modeling (TPM), Jacobsen 03:30 PM attractive properties, SPNs have received rather little attention in the (probabilistic) deep learning In this talk, I will discuss how state-of-the-art community, which rather focuses on intractable discriminative deep networks can be turned into models such as generative adversarial networks, likelihood-based density models. Further, I will variational autoencoders, normalizing flows, and discuss how such models give rise to an alternative autoregressive density estimators. In this talk, I viewpoint on adversarial examples. Under this discuss several recent endeavors which viewpoint adversarial examples are a consequence demonstrate that i) SPNs can be effectively used as of excessive invariances learned by the classifier, deep learning models, and ii) that hybrid learning manifesting themselves in striking failures when approaches utilizing SPNs and other deep learning evaluating the model on out of distribution inputs. I models are in fact sensible and beneficial. will discuss how the commonly used cross-entropy objective encourages such overly invariant Abstract 9: Tensor Variable Elimination in Pyro in representations. Finally, I will present an extension The Third Workshop On Tractable Probabilistic to cross-entropy that, by exploiting properties of Modeling (TPM), Bingham 02:20 PM invertible deep networks, enables control of erroneous invariances in theory and practice. A wide class of machine learning algorithms can be reduced to variable elimination on factor graphs. While factor graphs provide a unifying notation for Joint Workshop on On-Device Machine these algorithms, they do not provide a compact Learning & Compact Deep Neural Network way to express repeated structure when compared Representations (ODML-CDNNR) to plate diagrams for directed graphical models. In this talk, I will describe a generalization of

Page 26 of 107 ICML 2019 Workshop book Generated Sat Jul 06, 2019

Sujith Ravi, Zornitsa Kozareva, Lixin Fan, Max Hardware Efficiency Welling, Yurong Chen, Werner Bailer, Brian 08:40 Aware Neural Han Kulis, Haoji Hu, Jonathan Dekhtiar, Yingyan Lin, AM Architecture Search Diana Marculescu and Compression

203, Fri Jun 14, 08:30 AM Structured matrices 09:10 for efficient deep Kumar AM This joint workshop aims to bring together learning researchers, educators, practitioners who are DeepCABAC: interested in techniques as well as applications of Context-adaptive on-device machine learning and compact, efficient 09:40 binary arithmetic neural network representations. One aim of the Wiedemann AM coding for deep workshop discussion is to establish close neural network connection between researchers in the machine compression learning community and engineers in industry, and to benefit both academic researchers as well as 10:00 Poster spotlight industrial practitioners. The other aim is the AM presentations evaluation and comparability of resource-efficient 10:30 Coffee Break AM machine learning methods and compact and AM efficient network representations, and their relation Understanding the to particular target platforms (some of which may be Challenges of highly optimized for neural network inference). The 11:00 Algorithm and research community has still to develop established Sze AM Hardware Co-design evaluation procedures and metrics. for Deep Neural Networks The workshop also aims at reproducibility and comparability of methods for compact and efficient Dream Distillation: neural network representations, and on-device 11:30 A Data-Independent Bhardwaj machine learning. Contributors are thus encouraged AM Model Compression to make their code available. The workshop Framework organizers plan to make some example tasks and The State of datasets available, and invite contributors to use 11:50 Sparsity in Deep Gale them for testing their work. In order to provide AM Neural Networks comparable performance evaluation conditions, the 12:10 use of a common platform (such as Google Colab) Lunch break is intended. PM 12:40 Hao, Lin, Li, Ruthotto, Schedule Poster session PM Yang, Karkada

DNN Training and 08:30 Welcome and 02:00 Inference with AM Introduction Gopalakrishnan PM Hyper-Scaled Precision

Page 27 of 107 ICML 2019 Workshop book Generated Sat Jul 06, 2019

02:30 Mixed Precision Abstract 5: Poster spotlight presentations in Dekhtiar PM Training & Inference Joint Workshop on On-Device Machine Learning & Compact Deep Neural Network 03:00 Coffee Break PM Representations (ODML-CDNNR), 10:00 AM PM

Learning Compact 2min presentations of the posters presentations Neural Networks during lunch break Using Ordinary 03:30 Abstract 7: Understanding the Challenges of Differential PM Algorithm and Hardware Co-design for Deep Equations as Neural Networks in Joint Workshop on Activation On-Device Machine Learning & Compact Deep Functions Neural Network Representations Triplet Distillation (ODML-CDNNR), Sze 11:00 AM 03:50 for Deep Face PM Recognition The co-design of algorithm and hardware has become an increasingly important approach for Single-Path NAS: addressing the computational complexity of Deep 04:10 Device-Aware Stamoulis Neural Networks (DNNs). There are several open PM Efficient ConvNet problems and challenges in the co-design process Design and application; for instance, what metrics should 04:30 be used to drive the algorithm design, how to Panel discussion PM automate the process in a simple way, how to 05:30 Wrap-up and extend these approaches to tasks beyond image PM Closing classification, and how to design flexible hardware to support these different approaches. In this talk, we highlight recent and ongoing work that aim to Abstracts (10): address these challenges, namely energy-aware pruning and NetAdapt that automatically incorporate Abstract 4: DeepCABAC: Context-adaptive binary direct metrics such as latency and energy into the arithmetic coding for deep neural network training and design of the DNN; FastDepth that compression in Joint Workshop on On-Device extends the co-design approaches to a depth Machine Learning & Compact Deep Neural estimation task; and a flexible hardware accelerator Network Representations (ODML-CDNNR), called Eyeriss v2 that is computationally efficient Wiedemann 09:40 AM across a wide range of diverse DNNs. BIO: Vivienne Sze is an Associate Professor at MIT Simon Wiedemann, Heiner Kirchhoffer, Stefan in the Electrical Engineering and Computer Science Matlage, Paul Haase, Arturo Marban Gonzalez, Department. Her research interests include Talmaj Marinc, Heiko Schwarz, Detlev Marpe, energy-aware signal processing algorithms, and Thomas Wiegand, Ahmed Osman and Wojciech low-power circuit and system design for portable Samek multimedia applications, including computer vision, deep learning, autonomous navigation, and video http://arxiv.org/abs/1905.08318 process/coding. Prior to joining MIT, she was a Member of Technical Staff in the R&D Center at TI,

Page 28 of 107 ICML 2019 Workshop book Generated Sat Jul 06, 2019 where she designed low-power algorithms and Machine Learning & Compact Deep Neural architectures for video coding. She also represented Network Representations (ODML-CDNNR), Gale TI in the JCT-VC committee of ITU-T and ISO/IEC 11:50 AM standards body during the development of High Efficiency Video Coding (HEVC), which received a Trevor Gale, Erich Elsen and Sara Hooker Primetime Engineering Emmy Award. She is a co-editor of the book entitled “High Efficiency Video https://arxiv.org/abs/1902.09574 (to be updated) Coding (HEVC): Algorithms and Architectures” Abstract 11: Poster session in Joint Workshop (Springer, 2014). on On-Device Machine Learning & Compact Prof. Sze received the B.A.Sc. degree from the Deep Neural Network Representations University of Toronto in 2004, and the S.M. and (ODML-CDNNR), Hao, Lin, Li, Ruthotto, Yang, Ph.D. degree from MIT in 2006 and 2010, Karkada 12:40 PM respectively. In 2011, she received the Jin-Au Kong Outstanding Doctoral Prize in Electrical Xiaofan Zhang, Hao Cong, Yuhong Li, Yao Chen, Engineering at MIT. She is a recipient of the 2019 Jinjun Xiong, Wen-Mei Hwu and Deming Chen. A Edgerton Faculty Award, the 2018 Facebook Bi-Directional Co-Design Approach to Enable Deep Faculty Award, the 2018 & 2017 Qualcomm Faculty Learning on IoT Devices Award, the 2018 & 2016 Google Faculty Research https://arxiv.org/abs/1905.08369 Award, the 2016 AFOSR Young Investigator

Research Program (YIP) Award, the 2016 3M Kushal Datta, Aishwarya Bhandare, Deepthi Non-Tenured Faculty Award, the 2014 DARPA Karkada, Vamsi Sripathi, Sun Choi, Vikram Saletore Young Faculty Award, the 2007 DAC/ISSCC and Vivek Menon. Efficient 8-Bit Quantization of Student Design Contest Award, and a co-recipient Transformer Neural Machine Language Translation of the 2017 CICC Outstanding Invited Paper Award, Model the 2016 IEEE Micro Top Picks Award and the 2008

A-SSCC Outstanding Design Award. Bin Yang, Lin Yang, Xiaochun Li, Wenhan Zhang, For more information about research in the Hua Zhou, Yequn Zhang, Yongxiong Ren and Yinbo Energy-Efficient Multimedia Systems Group at MIT Shi. 2-bit Model Compression of Deep visit: http://www.rle.mit.edu/eems/ Convolutional Neural Network on ASIC Engine for image retrieval Abstract 8: Dream Distillation: A https://arxiv.org/abs/1905.03362 Data-Independent Model Compression

Framework in Joint Workshop on On-Device Zhong Qiu Lin, Brendan Chwyl and Alexander Machine Learning & Compact Deep Neural Wong. EdgeSegNet: A Compact Network for Network Representations (ODML-CDNNR), Semantic Segmentation Bhardwaj 11:30 AM https://arxiv.org/abs/1905.04222 Kartikeya Bhardwaj, Naveen Suda and Radu Marculescu Sheng Lin, Xiaolong Ma, Shaokai Ye, Geng Yuan, Kaisheng Ma and Yanzhi Wang. Toward Extremely http://arxiv.org/abs/1905.07072 Low Bit and Lossless Accuracy in DNNs with Progressive ADMM Abstract 9: The State of Sparsity in Deep Neural https://arxiv.org/abs/1905.00789 Networks in Joint Workshop on On-Device

Page 29 of 107 ICML 2019 Workshop book Generated Sat Jul 06, 2019

Chengcheng Li, Zi Wang, Dali Wang, Xiangyang Machine Learning & Compact Deep Neural Wang and Hairong Qi. Investigating Channel Network Representations (ODML-CDNNR), 03:50 Pruning through Structural Redundancy Reduction - PM A Statistical Study https://arxiv.org/abs/1905.06498 Yushu Feng, Huan Wang and Haoji Hu

Wei Niu, Yanzhi Wang and Bin Ren. CADNN: Ultra https://arxiv.org/abs/1905.04457 Fast Execution of DNNs on Mobile Devices with Abstract 17: Single-Path NAS: Device-Aware Advanced Model Compression and Efficient ConvNet Design in Joint Workshop on Architecture-Aware Optimization On-Device Machine Learning & Compact Deep https://arxiv.org/abs/1905.00571 Neural Network Representations

(ODML-CDNNR), Stamoulis 04:10 PM Jonathan Ephrath, Lars Ruthotto, Eldad Haber and Eran Treister. LeanResNet: A Low-cost yet Effective Dimitrios Stamoulis, Ruizhou Ding, Di Wang, Convolutional Residual Networks Dimitrios Lymberopoulos, Bodhi Priyantha, Jie Liu https://arxiv.org/abs/1904.06952 and Diana Marculescu

Dushyant Mehta, Kwang In Kim and Christian https://arxiv.org/abs/1905.04159 Theobalt. Implicit Filter Sparsification In

Convolutional Neural Networks https://arxiv.org/abs/1905.04967 Negative Dependence: Theory and

Abstract 12: DNN Training and Inference with Applications in Machine Learning Hyper-Scaled Precision in Joint Workshop on Mike Gartrell, Jennifer Gillenwater, Alex On-Device Machine Learning & Compact Deep Kulesza, Zelda Mariet Neural Network Representations (ODML-CDNNR), Gopalakrishnan 02:00 PM 204, Fri Jun 14, 08:30 AM

Kailash Gopalakrishnan Models of negative dependence are increasingly important in machine learning. Whether selecting Abstract 15: Learning Compact Neural Networks training data, finding an optimal experimental Using Ordinary Differential Equations as design, exploring in reinforcement learning, or Activation Functions in Joint Workshop on making suggestions with recommender systems, On-Device Machine Learning & Compact Deep selecting high-quality but diverse items has become Neural Network Representations a core challenge. This workshop aims to bring (ODML-CDNNR), 03:30 PM together researchers who, using theoretical or Mohamadali Torkamani, Phillip Wallis, Shiv Shankar applied techniques, leverage negative dependence and Amirmohammad Rooshenas in their work. We will delve into the rich underlying mathematical theory, understand key applications, https://arxiv.org/abs/1905.07685 and discuss the most promising directions for future research. Abstract 16: Triplet Distillation for Deep Face Recognition in Joint Workshop on On-Device Schedule

Page 30 of 107 ICML 2019 Workshop book Generated Sat Jul 06, 2019

08:45 03:00 Opening Remarks [Coffee Break] AM PM

Victor-Emmanuel Sergei Levine: Brunel: Negative Distribution 08:50 Association and 03:30 Matching and Brunel Levine AM Discrete PM Mutual Information Determinantal Point in Reinforcement Processes Learning

Aarti Singh: Seq2Slate: 09:30 Experimental Singh 04:10 Re-ranking and AM Meshi Design PM Slate Optimization with RNNs On Two Ways to use Determinantal 04:30 10:10 [Mini Break] Point Processes for Gautier PM AM Monte Carlo Cheng Zhang: Integration Active Mini-Batch 04:40 10:30 Sampling using Zhang [Coffee Break] PM AM Repulsive Point Processes Jeff Bilmes: Deep 11:00 Submodular Bilmes Gaussian Process AM Synergies Optimization with 05:20 Adaptive Sketching:Valko Submodular Batch PM Scalable and No 11:40 Selection for N Balasubramanian Regret AM Training Deep Neural Networks Towards Efficient 05:40 Evaluation of Risk Xu 12:00 PM [Lunch Break] via Herding PM 06:00 Michal Valko: How Closing Remarks PM Negative Dependence Broke 02:00 the Quadratic Valko PM Abstracts (12): Barrier for Learning with Graphs and Abstract 2: Victor-Emmanuel Brunel: Negative Kernels Association and Discrete Determinantal Point Exact Sampling of Processes in Negative Dependence: Theory and Determinantal Point Applications in Machine Learning, Brunel 08:50 02:40 Processes with Derezinski AM PM Sublinear Time Preprocessing Discrete Determinantal Point Processes (DPPs) form a class of probability distributions that can

Page 31 of 107 ICML 2019 Workshop book Generated Sat Jul 06, 2019 describe the random selection of items from a finite Applications in Machine Learning, Bilmes 11:00 or countable collection. They naturally arise in many AM problems in probability theory, and they have gained a lot of attention in machine learning, due to Submodularity is an attractive framework in both their modeling flexibility and their tractability. In machine learning to model concepts such as the finite case, a DPP is parametrized by a matrix, diversity, dispersion, and cooperative costs, and is whose principal minors are the weights given by the having an ever increasing impact on the field of DPP to each possible subset of items. When the machine learning. Deep learning is having a bit of matrix is symmetric, the DPP has a very special success as well. In this talk, we will discuss property, called Negative Association. Thanks to synergies, where submodular functions and deep this property, symmetric DPPs enforce diversity neural networks can be used together to their within the randomly selected items, which is a mutual benefit. First, we'll discuss deep submodular feature that is sought for in many applications of functions (DSFs), an expressive class of functions Machine Learning, such as recommendation that include many widely used submodular functions systems. and that are defined analogously to deep neural networks (DNN). We'll show that the class of DSFs Abstract 3: Aarti Singh: Experimental Design in strictly increases with depth and discuss Negative Dependence: Theory and Applications applications. Second, we'll see how a modification in Machine Learning, Singh 09:30 AM to DNN autoencoders can produce features that can be used in DSFs. These DSF/DNN hybrids address TBD an open problem which is how best to produce a submodular function for your application. Third, we'll Abstract 4: On Two Ways to use Determinantal see how submodular functions can speed up the Point Processes for Monte Carlo Integration in training of models. In one case, submodularity can Negative Dependence: Theory and Applications be used to produce a sequence of mini-batches that in Machine Learning, Gautier 10:10 AM speeds up training of DNN systems. In another case, submodularity can produce a training data This paper focuses on Monte Carlo integration with subset for which we can show faster convergence determinantal point processes (DPPs) which to the optimal solution in the convex case. enforce negative dependence between quadrature Empirically, this method speeds up gradient nodes. We survey the properties of two unbiased methods by up to 10x for convex and 3x for Monte Carlo estimators of the integral of inter- est: a non-convex (i.e., deep) functions. direct one proposed by Bardenet & Hardy (2016) and a less obvious 60-year-old estimator by The above discusses various projects that were Ermakov & Zolotukhin (1960) that actually also performed jointly with Wenruo Bai, Shengjie Wang, relies on DPPs. We provide an efficient Chandrashekhar Lavania, Baharan Mirzasoleiman, implementation to sample exactly a particular and Jure Leskovec. multidimensional DPP called multivariate Jacobi ensemble. This let us investigate the behavior of Abstract 7: Submodular Batch Selection for both estimators on toy problems in yet unexplored Training Deep Neural Networks in Negative regimes. 
Dependence: Theory and Applications in Machine Learning, N Balasubramanian 11:40 AM Abstract 6: Jeff Bilmes: Deep Submodular Synergies in Negative Dependence: Theory and

Page 32 of 107 ICML 2019 Workshop book Generated Sat Jul 06, 2019

Mini-batch gradient descent based methods are the huge portions of dictionary. With the small de facto algorithms for training neural network dictionary, SQUEAK never constructs the whole architectures today. We introduce a mini-batch matrix K_n, runs in linear time O(n*d_eff(gamma)^3) selection strategy based on submodular function w.r.t. n, and requires only a single pass over the maximization. Our novel submodular formulation dataset. A distributed version of SQUEAK runs in as captures the informativeness of each sample and little as O(log(n)*d_eff(gamma)^3) time. This online diversity of the whole subset. We design an tool opens out a range of possibilities to finally have efficient, greedy algorithm which can give scalable, adaptive, and provably accurate kernel high-quality solutions to this NP-hard combinatorial methods: semi-supervised learning or Laplacian optimization problem. Our extensive experiments on smoothing on large graphs, scalable GP-UCB, standard datasets show that the deep models efficient second-order kernel online learning, and trained using the proposed batch selection strategy even fast DPP sampling, some of these being provide better generalization than Stochastic featured in this workshop. Gradient Descent as well as a popular baseline sampling strategy across different learning rates, Abstract 10: Exact Sampling of Determinantal batch sizes, and distance metrics. Point Processes with Sublinear Time Preprocessing in Negative Dependence: Theory Abstract 9: Michal Valko: How Negative and Applications in Machine Learning, Dependence Broke the Quadratic Barrier for Derezinski 02:40 PM Learning with Graphs and Kernels in Negative Dependence: Theory and Applications in We study the complexity of sampling from a Machine Learning, Valko 02:00 PM distribution over all index subsets of the set {1,...,n} with the probability of a subset S proportional to the As we advance with resources, we move from determinant of the submatrix L_S of some n x n reasoning on entities to reasoning on pairs and p.s.d. matrix L, where L_S corresponds to the groups. We have beautiful frameworks: graphs, entries of L indexed by S. Known as a determinantal kernels, DPPs. However, the methods that work point process, this distribution is widely used in with pairs, relationships, and similarity are just slow. machine learning to induce diversity in subset Kernel regression, or second-order gradient selection. In practice, we often wish to sample methods, or sampling from DPPs do not scale to multiple subsets S with small expected size k = large data, because of the costly construction and E[|S|] << n from a very large matrix L, so it is storing of matrix K_n. Prior work showed that important to minimize the preprocessing cost of the sampling points according to their ridge leverage procedure (performed once) as well as the sampling scores (RLS) generates small dictionaries with cost (performed repeatedly). To that end, we strong spectral approximation guarantees for K_n. propose a new algorithm which, given access to L, However, computing exact RLS requires building samples exactly from a determinantal point process and storing the whole kernel matrix. 
In this talk, we while satisfying the following two properties: (1) its start with SQUEAK, a new online approach for preprocessing cost is n x poly(k) (sublinear in the kernel approximations based on RLS sampling that size of L) and (2) its sampling cost is poly(k) sequentially processes the data, storing a dictionary (independent of the size of L). Prior to this work, with a number of points that only depends on the state-of-the-art exact samplers required O(n^3) effective dimension d_eff(gamma) of the dataset. preprocessing time and sampling time linear in n or The beauty of negative dependence, that we dependent on the spectral properties of L. estimate on the fly, makes it possible to exclude

Page 33 of 107 ICML 2019 Workshop book Generated Sat Jul 06, 2019

Abstract 12: Sergei Levine: Distribution Matching seq2slate. At each step, the model predicts the next and Mutual Information in Reinforcement "best" item to place on the slate given the items Learning in Negative Dependence: Theory and already selected. The sequential nature of the Applications in Machine Learning, Levine 03:30 model allows complex dependencies between the PM items to be captured directly in a flexible and scalable way. We show how to learn the model Conventionally, reinforcement learning is end-to-end from weak supervision in the form of considered to be a framework for optimization: the easily obtained click-through data. We further aim for standard reinforcement learning algorithms demonstrate the usefulness of our approach in is to recover an optimal or near-optimal policy that experiments on standard ranking benchmarks as maximizes the reward over time. However, when well as in a real-world recommendation system. considering more advanced reinforcement learning problems, from inverse reinforcement learning to Abstract 15: Cheng Zhang: Active Mini-Batch unsupervised and hierarchical reinforcement Sampling using Repulsive Point Processes in learning, we often encounter settings where it is Negative Dependence: Theory and Applications desirable to learn policies that match target in Machine Learning, Zhang 04:40 PM distributions over trajectories or states, covering all modes, or else to simply learn collections of We explore active mini-batch selection using behaviors that are as broad and varied as possible. repulsive point processes for stochastic gradient Information theory and probabilistic inference offer descent (SGD). Our approach simultaneously is a powerful set of tools for developing algorithms introduces active bias and leads to stochastic for these kinds of distribution matching problems. In gradients with lower variance. We show this talk, I will outline methods that combine theoretically and empirically that our approach reinforcement learning, inference, and information improves over standard SGD both in terms of theory to learn policies that match target convergence speed as well as final model distributions and acquire diverse behaviors, and performance. discuss the applications of such methods for a Abstract 16: Gaussian Process Optimization with variety of problems in artificial intelligence and Adaptive Sketching: Scalable and No Regret in robotics. Negative Dependence: Theory and Applications Abstract 13: Seq2Slate: Re-ranking and Slate in Machine Learning, Valko 05:20 PM Optimization with RNNs in Negative Gaussian processes (GP) are a popular Bayesian Dependence: Theory and Applications in approach for the optimization of black-box functions. Machine Learning, Meshi 04:10 PM Despite their effectiveness in simple problems, Ranking is a central task in machine learning and GP-based algorithms hardly scale to complex information retrieval. In this task, it is especially high-dimensional functions, as their per-iteration important to present the user with a slate of items time and space cost is at least quadratic in the that is appealing as a whole. This in turn requires number of dimensions $d$ and iterations~$t$. taking into account interactions between items, Given a set of $A$ alternative to choose from, the since intuitively, placing an item on the slate affects overall runtime $O(t^3A)$ quickly becomes the decision of which other items should be placed prohibitive. In this paper, we introduce BKB alongside it. 
In this work, we propose a (budgeted kernelized bandit), an approximate GP sequence-to-sequence model for ranking called algorithm for optimization under bandit feedback

Page 34 of 107 ICML 2019 Workshop book Generated Sat Jul 06, 2019 that achieves near-optimal regret (and hence cases of high-dimensional data. near-optimal convergence rate) with near-constant per-iteration complexity and no assumption on the input space or the GP's covariance. Understanding and Improving Generalization in Deep Learning Combining a kernelized linear bandit algorithm (GP-UCB) with randomized matrix sketching Dilip Krishnan, Hossein Mobahi, Behnam technique (i.e., leverage score sampling), we prove Neyshabur, Peter Bartlett, Dawn Song, Nati that selecting inducing points based on their Srebro posterior variance gives an accurate low-rank approximation of the GP, preserving variance Grand Ballroom A, Fri Jun 14, 08:30 AM estimates and confidence intervals. As a The 1st workshop on Generalization in Deep consequence, BKB does not suffer from variance Networks: Theory and Practice will be held as part starvation, an important problem faced by many of ICML 2019. Generalization is one of the previous sparse GP approximations. Moreover, we fundamental problems of machine learning, and show that our procedure selects at most increasingly important as deep networks make their $\widetilde{O}(d_{eff})$ points, where $d_{eff}$ is presence felt in domains with big, small, noisy or the \emph{effective} dimension of the explored skewed data. This workshop will consider space, which is typically much smaller than both generalization from both theoretical and practical $d$ and $t$. This greatly reduces the dimensionality perspectives. We welcome contributions from of the paradigms such as representation learning, transfer problem, thus leading to a learning and reinforcement learning. The workshop $\widetilde{O}(TAd_{eff}^2)$ runtime and invites researchers to submit working papers in the $\widetilde{O}(A d_{eff})$ space complexity. following research areas: Abstract 17: Towards Efficient Evaluation of Risk via Herding in Negative Dependence: Theory Implicit regularization: the role of optimization and Applications in Machine Learning, Xu 05:40 algorithms in generalization PM Explicit regularization methods Network architecture choices that improve We introduce a novel use of herding to address the generalization problem of selecting samples from a large Empirical approaches to understanding unlabeled dataset to efficiently evaluate the risk of a generalization given model. Herding is an algorithm which Generalization bounds; empirical evaluation criteria elaborately draws samples to approximate the to evaluate bounds underlying distribution. We use herding to select the Robustness: generalizing to distributional shift a.k.a most informative samples and show that the loss dataset shift evaluated on $k$ samples produced by herding Generalization in the context of representation converges to the expected loss at a rate learning, transfer learning and deep reinforcement $\mathcal{O}(1/k)$, which is much faster than learning: definitions and empirical approaches $\mathcal{O}(1/\sqrt{k})$ for iid random sampling. We validate our analysis on both synthetic data and Schedule real data, and further explore the empirical performance of herding-based sampling in different

Page 35 of 107 ICML 2019 Workshop book Generated Sat Jul 06, 2019

08:30 12:10 Lunch and Poster Opening Remarks AM PM Session

Keynote by Dan Keynote by Roy: Progress on 01:30 Aleksander M■dry: 08:40 Madry Nonvacuous Roy PM Are All Features AM Generalization Created Equal? Bounds Keynote by Jason Keynote by Chelsea Lee: On the 09:20 Finn: Training for Finn Foundations of AM 02:00 Generalization Deep Learning: Lee PM SGD, A Meta-Analysis of 09:50 Overparametrization, Overfitting in AM and Generalization Machine Learning Towards Large Uniform 02:30 Scale Structure of convergence may 10:05 PM the Loss Landscape be unable to explain AM of Neural Networks generalization in deep learning Zero-Shot Learning from scratch: 10:20 Break and Poster 02:45 leveraging local AM Session PM compositional Keynote by Sham representations 10:40 Kakade: Prediction, Kakade 03:00 Break and Poster AM Learning, and PM Session Memory Panel Discussion Keynote by Mikhail (Nati Srebro, Dan 11:10 Belkin: A Hard Look Belkin 03:30 Roy, Chelsea Finn, Srebro, Roy, Finn, AM at Generalization PM Mikhail Belkin, Belkin, Madry, Lee and its Theories Aleksander M■dry, Towards Task and Jason Lee) 11:40 Architecture-Independent Overparameterization AM Generalization Gap without Overfitting: Predictors 04:30 Jacobian-based Data-Dependent PM Generalization Sample Complexity Guarantees for 11:55 of Deep Neural Neural Networks AM Networks via Lipschitz Augmentation

Page 36 of 107 ICML 2019 Workshop book Generated Sat Jul 06, 2019

How Learning Rate Technology: his dissertation on probabilistic and Delay Affect programming won an MIT EECS Sprowls Minima Selection in Dissertation Award. Daniel's group works on 04:45 AsynchronousTraining foundations of machine learning and statistics. PM of Neural Networks: Abstract 3: Keynote by Chelsea Finn: Training for Toward Closing the Generalization in Understanding and Improving Generalization Gap Generalization in Deep Learning, Finn 09:20 AM 05:00 Poster Session PM TBA.

Bio: Chelsea Finn is a research scientist at Google Abstracts (18): Brain, a post-doc at Berkeley AI Research Lab (BAIR), and will join the Stanford Computer Science Abstract 2: Keynote by Dan Roy: Progress on faculty in Fall 2019. Finn’s research studies how Nonvacuous Generalization Bounds in new algorithms can enable machines to acquire Understanding and Improving Generalization in intelligent behavior through learning and interaction, Deep Learning, Roy 08:40 AM allowing them to perform a variety of complex sensorimotor skills in real-world settings. She has Generalization bounds are one of the main tools developed deep learning algorithms for concurrently available for explaining the performance of learning learning visual perception and control in robotic algorithms. At the same time, most bounds in the manipulation skills, inverse reinforcement methods literature are loose to an extent that raises the for scalable acquisition of nonlinear reward question as to whether these bounds actually have functions, and meta-learning algorithms that can any explanatory power in the nonasymptotic regime enable fast, few-shot adaptation in both visual of actual machine learning practice. I'll report on perception and deep reinforcement learning. Finn’s progress towards developing bounds and research has been recognized through an NSF techniques---both statistical and graduate fellowship, the C.V. Ramamoorthy computational---aimed at closing the gap between Distinguished Research Award, and the Technology empirical performance and theoretical Review 35 Under 35 Award, and her work has been understanding. covered by various media outlets, including the New

York Times, Wired, and Bloomberg. With Sergey Bio: Daniel Roy is an Assistant Professor in the Levine and John Schulman, she also designed and Department of Statistical Sciences and, by courtesy, taught a course on deep reinforcement learning, Computer Science at the University of Toronto, and with thousands of followers online. a founding faculty member of the Vector Institute for Artificial Intelligence. Daniel is a recent recipient of Finn received a PhD in Computer Science from UC an Ontario Early Researcher Award and Google Berkeley and a S.B. in Electrical Engineering and Faculty Research Award. Before joining U of T, Computer Science from MIT. Daniel held a Newton International Fellowship from the Royal Academy of Engineering and a Research Abstract 4: A Meta-Analysis of Overfitting in Fellowship at Emmanuel College, University of Machine Learning in Understanding and Cambridge. Daniel earned his S.B., M.Eng., and Improving Generalization in Deep Learning, Ph.D. from the Massachusetts Institute of 09:50 AM

Page 37 of 107 ICML 2019 Workshop book Generated Sat Jul 06, 2019

Authors: Sara Fridovich-Keil, Moritz Hardt, John provably cannot ``explain generalization,'' even if we Miller, Ben Recht, Rebecca Roelofs, Ludwig take into account implicit regularization {\em to the Schmidt and Vaishaal Shankar fullest extent possible}. More precisely, even if we consider only the set of classifiers output by SGD Abstract: We conduct the first large meta-analysis of that have test errors less than some small overfitting due to test set reuse in the machine $\epsilon$, applying (two-sided) uniform learning community. Our analysis is based on over convergence on this set of classifiers yields a one hundred machine learning competitions hosted generalization guarantee that is larger than on the Kaggle platform over the course of several $1-\epsilon$ and is therefore nearly vacuous. years. In each competition, numerous practitioners repeatedly evaluated their progress against a Abstract 6: Break and Poster Session in holdout set that forms the basis of a public ranking Understanding and Improving Generalization in available throughout the competition. Performance Deep Learning, 10:20 AM on a separate test set used only once determined 1- Uniform convergence may be unable to explain the final ranking. By systematically comparing the generalization in deep learning. Vaishnavh public ranking with the final ranking, we assess how Nagarajan and J. Zico Kolter much participants adapted to the holdout set over 2- The effects of optimization on generalization in the course of a competition. Our longitudinal study infinitely wide neural networks. Anastasia Borovykh shows, somewhat surprisingly, little evidence of 3- Generalized Capsule Networks with Trainable substantial overfitting. These findings speak to the Routing Procedure. Zhenhua Chen, Chuhua Wang, robustness of the holdout method across different David Crandall and Tiancong Zhao data domains, loss functions, model classes, and 4- Implicit Regularization of Discrete Gradient human analysts. Dynamics in Deep Linear Neural Networks. Abstract 5: Uniform convergence may be unable Gauthier Gidel, Francis Bach and Simon to explain generalization in deep learning in Lacoste-Julien Understanding and Improving Generalization in 5- Stable Rank Normalization for Improved Deep Learning, 10:05 AM Generalization in Neural Networks. Amartya Sanyal, Philip H Torr and Puneet K Dokania Authors: Vaishnavh Nagarajan and J. Zico Kolter 6- On improving deep learning generalization with adaptive sparse connectivity. Shiwei Liu, Decebal Abstract: We cast doubt on the power of uniform Constantin Mocanu and Mykola Pechenizkiy convergence-based generalization bounds to 7- Identity Connections in Residual Nets Improve provide a complete picture of why Noise Stability. Shuzhi Yu and Carlo Tomasi overparameterized deep networks generalize well. 8- Factors for the Generalisation of Identity While it is well-known that many existing bounds are Relations by Neural Networks. Radha Manisha numerically large, through a variety of experiments, Kopparti and Tillman Weyde we first bring to light another crucial and more 9- Output-Constrained Bayesian Neural Networks. concerning aspect of these bounds: in practice, Wanqian Yang, Lars Lorch, Moritz A. Graule, these bounds can {\em increase} with the dataset Srivatsan Srinivasan, Anirudh Suresh, Jiayu Yao, size. Guided by our observations, we then present Melanie F. 
Pradier and Finale Doshi-Velez examples of overparameterized linear classifiers 10- An Empirical Study on Hyperparameters and and neural networks trained by stochastic gradient their Interdependence for RL Generalization. descent (SGD) where uniform convergence Xingyou Song, Yilun Du and Jacob Jackson

Page 38 of 107 ICML 2019 Workshop book Generated Sat Jul 06, 2019

11- Towards Large Scale Structure of the Loss Neural Networks via Lipschitz Augmentation. Colin Landscape of Neural Networks. Stanislav Fort and Wei and Tengyu Ma Stanislaw Jastrzebski 25- Few-Shot Transfer Learning From Multiple 12- Detecting Extrapolation with Influence Pre-Trained Networks. Joshua Ka-Wing Lee, Functions. David Madras, James Atwood and Alex Prasanna Sattigeri and Gregory Wornell D'Amour 26- Uniform Stability and High Order Approximation 13- Towards Task and Architecture-Independent of SGLD in Non-Convex Learning. Mufan Li and Generalization Gap Predictors. Scott Yak, Hanna Maxime Gazeau Mazzawi and Javier Gonzalvo 27- Better Generalization with Adaptive Adversarial 14- SGD Picks a Stable Enough Trajectory. Training. Amit Despande, Sandesh Kamath and K V Stanis■aw Jastrz■bski and Stanislav Fort Subrahmanyam 15- MazeNavigator: A Customisable 3D Benchmark 28- Adversarial Training Generalizes Spectral Norm for Assessing Generalisation in Reinforcement Regularization. Kevin Roth, Yannic Kilcher and Learning. Luke Harries, Sebastian Lee, Jaroslaw Thomas Hofmann Rzepecki, Katja Hofmann and Sam Devlin 29- A Causal View on Robustness of Neural 16- Utilizing Eye Gaze to Enhance the Networks. Cheng Zhang and Yingzhen Li Generalization of Imitation Network to Unseen 30- Improving PAC-Bayes bounds for neural Environments. Congcong Liu, Yuying Chen, Lei Tai, networks using geometric properties of the training Ming Liu and Bertram Shi method. Anirbit Mukherjee, Dan Roy, Pushpendre 17- Investigating Under and Overfitting in Rastogi and Jun Yang Wasserstein Generative Adversarial Networks. Ben 31- An Analysis of the Effect of Invariance on Adlam, Charles Weill and Amol Kapoor Generalization in Neural Networks. Clare Lyle, 18- An Empirical Evaluation of Adversarial and Yarin Gal Robustness under Transfer Learning. Todor 32- Data-Dependent Mututal Information Bounds for Davchev, Timos Korres, Stathi Fotiadis, Nick SGLD. Jeffrey Negrea, Daniel Roy, Gintare Karolina Antonopoulos and Subramanian Ramamoorthy Dziugaite, Mahdi Haghifam and Ashish Khisti 19- On Adversarial Robustness of Small vs Large 33- Comparing normalization in conditional Batch Training. Sandesh Kamath, Amit Deshpande computation tasks. Vincent Michalski, Vikram Voleti, and K V Subrahmanyam Samira Ebrahimi Kahou, Anthony Ortiz, Pascal 20- The Principle of Unchanged Optimality in Vincent, Chris Pal and Doina Precup Reinforcement Learning Generalization. Xingyou 34- Weight and Batch Normalization implement Song and Alex Irpan Classical Generalization Bounds. Andrzej 21- On the Generalization Capability of Memory Banburski, Qianli Liao, Brando Miranda, Lorenzo Networks for Reasoning. Monireh Ebrahimi, Md Rosasco, Jack Hidary and Tomaso Poggio Kamruzzaman Sarker, Federico Bianchi, Ning Xie, 35- Increasing batch size through instance Aaron Eberhart, Derek Doran and Pascal Hitzler repetition improves generalization. Elad Hoffer, Tal 22- Visualizing How Embeddings Generalize. Ben-Nun, Itay Hubara, Niv Giladi, Torsten Hoefler Xiaotong Liu, Hong Xuan, Zeyu Zhang, Abby and Daniel Soudry Stylianou and Robert Pless 36- Zero-Shot Learning from scratch: leveraging 23- Theoretical Analysis of the Fixup Initialization for local compositional representations. Tristan Sylvain, Fast Convergence and High Generalization Ability. Linda Petrini and Devon Hjelm Yasutaka Furusho and Kazushi Ikeda 37- Circuit-Based Intrinsic Methods to Detect 24- Data-Dependent Sample Complexity of Deep Overfitting. Satrajit Chatterjee and Alan Mishchenko

Page 39 of 107 ICML 2019 Workshop book Generated Sat Jul 06, 2019

38- Dimension Reduction Approach for Interpretability of Sequence to Sequence Recurrent Neural Networks. Kun Su and Eli Shlizerman
39- Tight PAC-Bayesian generalization error bounds for deep learning. Guillermo Valle Perez, Chico Q. Camargo and Ard A. Louis
40- How Learning Rate and Delay Affect Minima Selection in Asynchronous Training of Neural Networks: Toward Closing the Generalization Gap. Niv Giladi, Mor Shpigel Nacson, Elad Hoffer and Daniel Soudry
41- Making Convolutional Networks Shift-Invariant Again. Richard Zhang
42- A Meta-Analysis of Overfitting in Machine Learning. Sara Fridovich-Keil, Moritz Hardt, John Miller, Ben Recht, Rebecca Roelofs, Ludwig Schmidt and Vaishaal Shankar
43- Kernelized Capsule Networks. Taylor Killian, Justin Goodwin, Olivia Brown and Sung Son
44- Model similarity mitigates test set overuse. Moritz Hardt, Horia Mania, John Miller, Ben Recht and Ludwig Schmidt
45- Understanding Generalization of Deep Neural Networks Trained with Noisy Labels. Wei Hu, Zhiyuan Li and Dingli Yu
46- Domainwise Classification Network for Unsupervised Domain Adaptation. Seonguk Seo, Yumin Suh, Bohyung Han, Taeho Lee, Tackgeun You, Woong-Gi Chang and Suha Kwak
47- The Generalization-Stability Tradeoff in Neural Network Pruning. Brian Bartoldson, Ari Morcos, Adrian Barbu and Gordon Erlebacher
48- Information matrices and generalization. Valentin Thomas, Fabian Pedregosa, Bart van Merriënboer, Pierre-Antoine Manzagol, Yoshua Bengio and Nicolas Le Roux
49- Adaptively Preconditioned Stochastic Gradient Langevin Dynamics. Chandrasekaran Anirudh Bhardwaj
50- Additive or Concatenating Skip-connections Improve Data Separability. Yasutaka Furusho and Kazushi Ikeda
51- On the Inductive Bias of Neural Tangent Kernels. Alberto Bietti and Julien Mairal
52- PAC Bayes Bound Minimization via Kronecker Normalizing Flows. Chin-Wei Huang, Ahmed Touati, Pascal Vincent, Gintare Karolina Dziugaite, Alexandre Lacoste and Aaron Courville
53- SGD on Neural Networks Learns Functions of Increasing Complexity. Preetum Nakkiran, Gal Kaplun, Dimitris Kalimeris, Tristan Yang, Ben Edelman, Fred Zhang and Boaz Barak
54- Overparameterization without Overfitting: Jacobian-based Generalization Guarantees for Neural Networks. Samet Oymak, Mingchen Li, Zalan Fabian and Mahdi Soltanolkotabi
55- Incorrect gradients and regularization: a perspective of loss landscapes. Mehrdad Yazdani
56- Divide and Conquer: Leveraging Intermediate Feature Representations for Quantized Training of Neural Networks. Ahmed Youssef, Prannoy Pilligundla and Hadi Esmaeilzadeh
57- SinReQ: Generalized Sinusoidal Regularization for Low-Bitwidth Deep Quantized Training. Ahmed Youssef, Prannoy Pilligundla and Hadi Esmaeilzadeh
58- Natural Adversarial Examples. Dan Hendrycks, Kevin Zhao, Steven Basart, Jacob Steinhardt and Dawn Song
59- On the Properties of the Objective Landscapes and Generalization of Gradient-Based Meta-Learning. Simon Guiroy, Vikas Verma and Christopher Pal
60- Angular Visual Hardness. Beidi Chen, Weiyang Liu, Animesh Garg, Zhiding Yu, Anshumali Shrivastava and Animashree Anandkumar
61- Luck Matters: Understanding Training Dynamics of Deep ReLU Networks. Yuandong Tian, Tina Jiang, Qucheng Gong and Ari Morcos
62- Understanding of Generalization in Deep Learning via Tensor Methods. Jingling Li, Yanchao Sun, Ziyin Liu, Taiji Suzuki and Furong Huang
63- Learning from Rules Performs as Implicit Regularization. Hossein Hosseini, Ramin Moslemi, Ali Hooshmand and Ratnesh Sharma
64- Stochastic Mirror Descent on Overparameterized Nonlinear Models: Convergence, Implicit Regularization, and Generalization. Navid Azizan, Sahin Lale and Babak Hassibi
65- Scaling Characteristics of Sequential Multitask Learning: Networks Naturally Learn to Learn. Guy Davidson and Michael Mozer
66- Size-free generalization bounds for convolutional neural networks. Phillip Long and Hanie Sedghi

Abstract 7: Keynote by Sham Kakade: Prediction, Learning, and Memory in Understanding and Improving Generalization in Deep Learning, Kakade 10:40 AM

Building accurate language models that capture meaningful long-term dependencies is a core challenge in language processing. We consider the problem of predicting the next observation given a sequence of past observations, specifically focusing on the question of how to make accurate predictions that explicitly leverage long-range dependencies. Empirically, and perhaps surprisingly, we show that state-of-the-art language models, including LSTMs and Transformers, do not capture even basic properties of natural language: the entropy rates of their generations drift dramatically upward over time. We also provide provable methods to mitigate this phenomenon: specifically, we provide a calibration-based approach to improve an estimated model based on any measurable long-term mismatch between the estimated model and the true underlying generative distribution. More generally, we will also present fundamental information theoretic and computational limits of sequential prediction with a memory.
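The entropy-rate drift Kakade describes can be checked directly on any autoregressive model. The sketch below is only an illustration of that measurement, not code from the talk; `next_token_probs` is an assumed interface returning the model's next-token distribution for a given context.

```python
import numpy as np

def entropy_rate_drift(next_token_probs, prompt, n_steps=512, window=64, rng=None):
    """Sample a continuation from the model and report the mean per-token entropy
    of the model's own predictive distribution in successive windows.
    A pronounced upward trend across windows is the drift described in the talk."""
    rng = np.random.default_rng() if rng is None else rng
    context = list(prompt)
    entropies = []
    for _ in range(n_steps):
        p = np.asarray(next_token_probs(context), dtype=float)
        p = p / p.sum()                                      # guard against rounding
        entropies.append(-(p * np.log(p + 1e-12)).sum())     # entropy in nats
        context.append(int(rng.choice(len(p), p=p)))         # sample the next token
    # Average entropy per window of generated positions.
    return [np.mean(entropies[i:i + window]) for i in range(0, n_steps, window)]
```

Comparing the first and last window averages gives a crude measure of the mismatch that the calibration-based correction in the talk is meant to remove.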

Bio: Sham Kakade is a Washington Research Foundation Data Science Chair, with a joint appointment in the Department of Computer Science and the Department of Statistics at the University of Washington. He works on the theoretical foundations of machine learning, focusing on designing provable and practical statistically and computationally efficient algorithms. Amongst his contributions, with a diverse set of collaborators, are: establishing principled approaches in reinforcement learning (including the natural policy gradient, conservative policy iteration, and the PAC-MDP framework); optimal algorithms in the stochastic and non-stochastic multi-armed bandit problems (including the widely used linear bandit and the Gaussian process bandit models); computationally and statistically efficient tensor decomposition methods for estimation of latent variable models (including estimation of mixture of Gaussians, latent Dirichlet allocation, hidden Markov models, and overlapping communities in social networks); and faster algorithms for large scale convex and nonconvex optimization (including how to escape from saddle points efficiently). He is the recipient of the IBM Goldberg best paper award (in 2007) for contributions to fast nearest neighbor search and the best paper, INFORMS Revenue Management and Pricing Section Prize (2014). He has been program chair for COLT 2011.

Sham completed his Ph.D. at the Gatsby Computational Neuroscience Unit at University College London, under the supervision of Peter Dayan, and he was a postdoc at the Dept. of Computer Science, University of Pennsylvania, under the supervision of Michael Kearns. Sham was an undergraduate at Caltech, studying physics under the supervision of John Preskill. Sham has been a Principal Research Scientist at Microsoft Research, New England, an associate professor at the Department of Statistics, Wharton, UPenn, and an assistant professor at the Toyota Technological Institute at Chicago.

Abstract 8: Keynote by Mikhail Belkin: A Hard Look at Generalization and its Theories in Understanding and Improving Generalization in Deep Learning, Belkin 11:10 AM


"A model with zero training error is overfit to the number of best paper and other awards and has training data and will typically generalize poorly" served on the editorial boards of the Journal of goes statistical textbook wisdom. Yet in modern Machine Learning Research and IEEE PAMI. practice over-parametrized deep networks with near perfect (interpolating) fit on training data still show Abstract 9: Towards Task and excellent test performance. This fact is difficult to Architecture-Independent Generalization Gap reconcile with most modern theories of Predictors in Understanding and Improving generalization that rely on bounding the difference Generalization in Deep Learning, 11:40 AM between the empirical and expected error. Indeed, Authors: Scott Yak, Hanna Mazzawi and Javier as we will discuss, bounds of that type cannot be Gonzalvo expected to explain generalization of interpolating models. I will proceed to show how classical and Abstract: Can we use deep learning to predict when modern models can be unified within a new "double deep learning works? Our results suggest the descent" risk curve that extends the usual U-shaped affirmative. We created a dataset by training 13,500 bias-variance trade-off curve beyond the point of neural networks with different architectures, on interpolation. This curve delimits the regime of different variations of spiral datasets, and using applicability of classical bounds and the regime different optimization parameters. We used this where new analyses are required. I will give dataset to train task-independent and examples of first theoretical analyses in that modern architecture-independent generalization gap regime and discuss the (considerable) gaps in our predictors for those neural networks. We extend knowledge. Finally I will briefly discuss some Jiang et al.(2018) to also use DNNs and RNNs and implications for optimization. show that they outperform the linear model,

Bio: Mikhail Belkin is a Professor in the departments of Computer Science and Engineering and Statistics at the Ohio State University. He received a PhD in mathematics from the University of Chicago in 2003. His research focuses on understanding the fundamental structure in data, the principles of recovering these structures and their computational, mathematical and statistical properties. This understanding, in turn, leads to algorithms for dealing with real-world data. His work includes algorithms such as Laplacian Eigenmaps and Manifold Regularization based on ideas of classical differential geometry, which have been widely used for analyzing non-linear high-dimensional data. He has done work on spectral methods, Gaussian mixture models, kernel methods and applications. Recently his work has been focused on understanding generalization and optimization in modern over-parametrized machine learning. Prof. Belkin is a recipient of an NSF Career Award and a number of best paper and other awards and has served on the editorial boards of the Journal of Machine Learning Research and IEEE PAMI.

Abstract 9: Towards Task and Architecture-Independent Generalization Gap Predictors in Understanding and Improving Generalization in Deep Learning, 11:40 AM

Authors: Scott Yak, Hanna Mazzawi and Javier Gonzalvo

Abstract: Can we use deep learning to predict when deep learning works? Our results suggest the affirmative. We created a dataset by training 13,500 neural networks with different architectures, on different variations of spiral datasets, and using different optimization parameters. We used this dataset to train task-independent and architecture-independent generalization gap predictors for those neural networks. We extend Jiang et al. (2018) to also use DNNs and RNNs and show that they outperform the linear model, obtaining R^2=0.965. We also show results for architecture-independent, task-independent, and out-of-distribution generalization gap prediction tasks. Both DNNs and RNNs consistently and significantly outperform linear models, with RNNs obtaining R^2=0.584.

Abstract 10: Data-Dependent Sample Complexity of Deep Neural Networks via Lipschitz Augmentation in Understanding and Improving Generalization in Deep Learning, 11:55 AM

Authors: Colin Wei and Tengyu Ma

Abstract: Existing Rademacher complexity bounds for neural networks rely only on norm control of the weight matrices and depend exponentially on depth via a product of the matrix norms. Lower bounds show that this exponential dependence on depth is unavoidable when no additional properties of the training data are considered.

We suspect that this conundrum comes from the fact that these bounds depend on the training data only through the margin. In practice, many data-dependent techniques such as Batchnorm improve the generalization performance. For feedforward neural nets as well as RNNs, we obtain tighter Rademacher complexity bounds by considering additional data-dependent properties of the network: the norms of the hidden layers of the network, and the norms of the Jacobians of each layer with respect to the previous layers. Our bounds scale polynomially in depth when these empirical quantities are small, as is usually the case in practice. To obtain these bounds, we develop general tools for augmenting a sequence of functions to make their composition Lipschitz and then covering the augmented functions. Inspired by our theory, we directly regularize the network's Jacobians during training and empirically demonstrate that this improves test performance.
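A rough PyTorch-style sketch of the kind of Jacobian regularization the last sentence alludes to is shown below. The two-block network and the Hutchinson-style penalty on the Jacobian of the logits with respect to the hidden layer are illustrative assumptions, not the authors' actual regularizer.

```python
import torch
import torch.nn as nn

class TwoBlockNet(nn.Module):
    """Hypothetical two-block net; `h` is the intermediate layer whose Jacobian we penalize."""
    def __init__(self, d_in=32, d_hid=64, n_classes=10):
        super().__init__()
        self.block1 = nn.Sequential(nn.Linear(d_in, d_hid), nn.ReLU())
        self.block2 = nn.Linear(d_hid, n_classes)
    def forward(self, x):
        h = self.block1(x)
        return h, self.block2(h)

def jacobian_penalty(logits, h, n_probes=1):
    """Hutchinson-style estimate of E||J v||^2 with J = d logits / d h.
    A generic surrogate for 'regularize the network's Jacobians', not the paper's exact term."""
    pen = 0.0
    for _ in range(n_probes):
        v = torch.randn_like(logits)
        (g,) = torch.autograd.grad((logits * v).sum(), h, create_graph=True)
        pen = pen + (g ** 2).sum(dim=1).mean()
    return pen / n_probes

model = TwoBlockNet()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(128, 32), torch.randint(0, 10, (128,))
for step in range(100):
    h, logits = model(x)
    loss = nn.functional.cross_entropy(logits, y) + 1e-3 * jacobian_penalty(logits, h)
    opt.zero_grad(); loss.backward(); opt.step()
```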


Abstract 11: Lunch and Poster Session in Understanding and Improving Generalization in Deep Learning, 12:10 PM

The poster papers (1-66) are the same as at the workshop's other poster sessions.


Abstract 12: Keynote by Aleksander Mądry: Are All Features Created Equal? in Understanding and Improving Generalization in Deep Learning, Madry 01:30 PM

TBA.

Bio: Aleksander Mądry is an Associate Professor of Computer Science in the MIT EECS Department and a principal investigator in the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL). He received his PhD from MIT in 2011 and, prior to joining the MIT faculty, he spent some time at Microsoft Research New England and on the faculty of EPFL.

Aleksander's research interests span algorithms, continuous optimization, science of deep learning and understanding machine learning from a robustness perspective.


Abstract 13: Keynote by Jason Lee: On the Foundations of Deep Learning: SGD, Overparametrization, and Generalization in Understanding and Improving Generalization in Deep Learning, Lee 02:00 PM

Deep Learning has had phenomenal empirical successes in many domains including computer vision, natural language processing, and speech recognition. To consolidate and boost the empirical success, we need to develop a more systematic and deeper understanding of the elusive principles of deep learning. In this talk, I will provide analysis of several elements of deep learning including non-convex optimization, overparametrization, and generalization error. First, we show that gradient descent and many other algorithms are guaranteed to converge to a local minimizer of the loss. For several interesting problems including the matrix completion problem, this guarantees that we converge to a global minimum. Then we will show that gradient descent converges to a global minimizer for deep overparametrized networks. Finally, we analyze the generalization error by showing that a subtle combination of SGD, logistic loss, and architecture combine to promote large margin classifiers, which are guaranteed to have low generalization error. Together, these results show that on overparametrized deep networks SGD finds solutions of both low train and test error.

Bio: Jason Lee is an assistant professor in Data Sciences and Operations at the University of Southern California. Prior to that, he was a postdoctoral researcher at UC Berkeley working with Michael Jordan. Jason received his PhD at Stanford University advised by Trevor Hastie and Jonathan Taylor. His research interests are in statistics, machine learning, and optimization. Lately, he has worked on high dimensional statistical inference, analysis of non-convex optimization algorithms, and theory for deep learning.

Abstract 14: Towards Large Scale Structure of the Loss Landscape of Neural Networks in Understanding and Improving Generalization in Deep Learning, 02:30 PM

Authors: Stanislav Fort and Stanislaw Jastrzebski

Abstract: There are many surprising and perhaps counter-intuitive properties of optimization of deep neural networks. We propose and experimentally verify a unified phenomenological model of the loss landscape that incorporates many of them. Our core idea is to model the loss landscape as a set of high dimensional sheets that together form a distributed, large-scale, inter-connected structure. For instance, we predict an existence of low loss subspaces connecting a set (not only a pair) of solutions, and verify it experimentally. We conclude by showing that hyperparameter choices such as learning rate, batch size, dropout and $L_2$ regularization affect the path the optimizer takes through the landscape in a similar way.

Abstract 15: Zero-Shot Learning from scratch: leveraging local compositional representations in Understanding and Improving Generalization in Deep Learning, 02:45 PM

Authors: Tristan Sylvain, Linda Petrini and Devon Hjelm

Abstract: Zero-shot classification is a task focused on generalization where no instance from the target classes is seen during training. To allow for test-time transfer, each class is annotated with semantic information, commonly in the form of attributes or text descriptions. While classical zero-shot learning does not specify how this problem should be solved, the most successful approaches rely on features extracted from encoders pre-trained on large datasets, most commonly Imagenet.

This approach raises important questions that might otherwise distract researchers from answering fundamental questions about representation learning and generalization. For instance, one should wonder to what extent these methods actually learn representations robust with respect to the task, rather than simply exploiting information stored in the encoder. To remove these distractors, we propose a more challenging setting: Zero-Shot Learning from scratch, which effectively forbids the use of encoders fine-tuned on other datasets. Our analysis on this setting highlights the importance of local information, and compositional representations.

Abstract 16: Break and Poster Session in Understanding and Improving Generalization in Deep Learning, 03:00 PM

The poster papers (1-66) are the same as at the workshop's other poster sessions.




Abstract 18: Overparameterization without Overfitting: Jacobian-based Generalization Guarantees for Neural Networks in Understanding and Improving Generalization in Deep Learning, 04:30 PM

Authors: Samet Oymak, Mingchen Li, Zalan Fabian and Mahdi Soltanolkotabi

Abstract: Many modern neural network architectures contain many more parameters than the size of the training data. Such networks can easily overfit to training data, hence it is crucial to understand the fundamental principles that facilitate good test accuracy. This paper explores the generalization capabilities of neural networks trained via gradient descent. We show that the Jacobian matrix associated with the network dictates the directions where learning is generalizable and fast versus directions where overfitting occurs and learning is slow. We develop a bias-variance theory which provides a control knob to split the Jacobian spectrum into "information" and "nuisance" spaces associated with the large and small singular values of the Jacobian.

We show that (i) over the information space learning is fast and we can quickly train a model with zero training loss that can also generalize well, and (ii) over the nuisance subspace overfitting might result in higher variance, hence early stopping can help with generalization at the expense of some bias. We conduct numerical experiments on deep networks that corroborate our theory and demonstrate that: (i) the Jacobian of typical networks exhibits a bimodal structure with a few large singular values and many small ones, leading to a low-dimensional information space; (ii) most of the useful information lies on the information space, where learning happens quickly.
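The information/nuisance split can be pictured with a small experiment: form the Jacobian of the predictions with respect to all parameters and inspect its singular values. The network, data, and the 10% cutoff below are arbitrary choices for illustration; this is a sketch of the quantity the abstract analyzes, not the authors' code.

```python
import torch
import torch.nn as nn

# A small illustrative network (hypothetical, not from the paper).
net = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))
X = torch.randn(200, 16)                       # stand-in training inputs

def prediction_jacobian(net, X):
    """Jacobian of the n scalar predictions with respect to all parameters, as an (n x p) matrix."""
    params = list(net.parameters())
    rows = []
    for i in range(X.shape[0]):
        grads = torch.autograd.grad(net(X[i:i + 1]).squeeze(), params)
        rows.append(torch.cat([g.reshape(-1) for g in grads]))
    return torch.stack(rows)

J = prediction_jacobian(net, X)
U, S, Vt = torch.linalg.svd(J, full_matrices=False)
cutoff = 0.1 * S.max()                          # arbitrary threshold separating the two spaces
info_dim = int((S > cutoff).sum())
print(f"{info_dim} of {S.numel()} singular directions lie above the cutoff (information space)")
```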
Abstract 19: How Learning Rate and Delay Affect Minima Selection in Asynchronous Training of Neural Networks: Toward Closing the Generalization Gap in Understanding and Improving Generalization in Deep Learning, 04:45 PM

Authors: Niv Giladi, Mor Shpigel Nacson, Elad Hoffer and Daniel Soudry

Abstract: Background: Recent developments have made it possible to accelerate neural network training significantly using large batch sizes and data parallelism. Training in an asynchronous fashion, where delay occurs, can make training even more scalable. However, asynchronous training has its pitfalls, mainly a degradation in generalization, even after convergence of the algorithm. This gap remains not well understood, as theoretical analysis so far mainly focused on the convergence rate of asynchronous methods. Contributions: We examine asynchronous training from the perspective of dynamical stability. We find that the degree of delay interacts with the learning rate to change the set of minima accessible by an asynchronous stochastic gradient descent algorithm. We derive closed-form rules on how the hyperparameters could be changed while keeping the accessible set the same. Specifically, for high delay values, we find that the learning rate should be decreased inversely with the delay, and discuss the effect of momentum. We provide empirical experiments to validate our theoretical findings.
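The qualitative rule from the abstract, decrease the learning rate inversely with the delay, can be wrapped in a small helper. The numbers below are made up, and the full closed-form rules (including the momentum correction the authors discuss) are in the paper; this is only a sketch of the scaling.

```python
def delay_adjusted_lr(base_lr, delay, base_delay=1):
    """Scale the learning rate inversely with gradient staleness: high delay -> proportionally
    smaller learning rate, following the qualitative rule stated in the abstract."""
    return base_lr * base_delay / max(delay, base_delay)

# Example: a worker whose gradients arrive 16 steps late.
print(delay_adjusted_lr(base_lr=0.1, delay=16))   # 0.00625
```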


Abstract 20: Poster Session in Understanding and Improving Generalization in Deep Learning, 05:00 PM

The poster papers (1-66) are the same as at the workshop's other poster sessions.




6th ICML Workshop on Automated Machine Learning (AutoML 2019)

Frank Hutter, Joaquin Vanschoren, Katharina Eggensperger, Matthias Feurer

Grand Ballroom B, Fri Jun 14, 08:30 AM

Machine learning has achieved considerable successes in recent years, but this success often relies on human experts, who construct appropriate features, design learning architectures, set their hyperparameters, and develop new learning algorithms. Driven by the demand for off-the-shelf machine learning methods from an ever-growing community, the research area of AutoML targets the progressive automation of machine learning, aiming to make effective methods available to everyone. The workshop targets a broad audience ranging from core machine learning researchers in different fields of ML connected to AutoML, such as neural architecture search, hyperparameter optimization, meta-learning, and learning to learn, to domain experts aiming to apply machine learning to new types of problems.

Schedule

09:00 AM  Welcome (Hutter)
09:05 AM  Keynote by Peter Frazier: Grey-box Bayesian Optimization for AutoML (Frazier)
09:40 AM  Poster Session 1 (all papers) (Gargiani, Zur, Baskin, Zheltonozhskii, Li, Talwalkar, Shang, Behl, Baydin, Couckuyt, Dhaene, Lin, Wei, Sun, Majumder, Donini, Ozaki, Adams, Geißler, Luo, peng, Zhang, Langford, Caruana, Dey, Weill, Gonzalvo, Yang, Yak, Hotaj, Macko, Mohri, Cortes, Webb, Chen, Jankowiak, Goodman, Klein, Hutter, Javaheripi, Samragh, Lim, Kim, KIM, Volpp, Drori, Krishnamurthy, Cho, Jastrzebski, de Laroussilhe, Tan, Ma, Houlsby, Gesmundo, Borsos, Maziarz, Petroski Such, Lehman, Stanley, Clune, Gijsbers, Vanschoren, Mohr, Hüllermeier, Xiong, Zhang, zhu, Shao, Faust, Valko, Li, Escalante, Wever, Khorlin, Javidi, Francis, Mukherjee, Kim, McCourt, Kim, You, Choi, Knudde, Tornede, Jerfel)
11:00 AM  Keynote by Rachel Thomas: Lessons Learned from Helping 200,000 non-ML experts use ML (Thomas)


11:35 AM  Contributed Talk 1: A Boosting Tree Based AutoML System for Lifelong Machine Learning (Xiong)
12:00 PM  Poster Session 2 (all papers)
12:50 PM  Lunch Break
02:00 PM  Keynote by Jeff Dean: An Overview of Google's Work on AutoML and Future Directions (Dean)
02:35 PM  Contributed Talk 2: Transfer NAS: Knowledge Transfer between Search Spaces with Transformer Agents (Borsos)
03:00 PM  Poster Session 3 (all papers)

04:00 PM  Contributed Talk 3: Random Search and Reproducibility for Neural Architecture Search (Li)
04:25 PM  Keynote by Charles Sutton: Towards Semi-Automated Machine Learning (Sutton)

05:00 PM  Panel Discussion (Zhang, Sutton, Li, Thomas, LeDell)
06:00 PM  Closing Remarks (Hutter)

Abstracts (4):

Abstract 2: Keynote by Peter Frazier: Grey-box Bayesian Optimization for AutoML in 6th ICML Workshop on Automated Machine Learning (AutoML 2019), Frazier 09:05 AM

Bayesian optimization is a powerful and flexible tool for AutoML. While BayesOpt was first deployed for AutoML simply as a black-box optimizer, recent approaches perform grey-box optimization: they leverage capabilities and problem structure specific to AutoML such as freezing and thawing training, early stopping, treating cross-validation error minimization as multi-task learning, and warm starting from previously tuned models. We provide an overview of this area and describe recent advances for optimizing sampling-based acquisition functions that make grey-box BayesOpt significantly more efficient.

Abstract 4: Keynote by Rachel Thomas: Lessons Learned from Helping 200,000 non-ML experts use ML in 6th ICML Workshop on Automated Machine Learning (AutoML 2019), Thomas 11:00 AM

The mission of AutoML is to make ML available for non-ML experts and to accelerate research on ML. We have a very similar mission at fast.ai and have helped over 200,000 non-ML experts use state-of-the-art ML (via our research, software, & teaching), yet we do not use methods from the AutoML literature. I will share several insights we've learned through this work, with the hope that they may be helpful to AutoML researchers.

Abstract 8: Keynote by Jeff Dean: An Overview of Google's Work on AutoML and Future Directions in 6th ICML Workshop on Automated Machine Learning (AutoML 2019), Dean 02:00 PM

In this talk I'll survey work by Google researchers over the past several years on the topic of AutoML, or learning-to-learn.


The talk will touch on basic approaches, some successful applications of AutoML to a variety of domains, and sketch out some directions for future AutoML systems that can leverage massively multi-task learning systems for automatically solving new problems.

Abstract 12: Keynote by Charles Sutton: Towards Semi-Automated Machine Learning in 6th ICML Workshop on Automated Machine Learning (AutoML 2019), Sutton 04:25 PM

The practical work of deploying a machine learning system is dominated by issues outside of training a model: data preparation, data cleaning, understanding the data set, debugging models, and so on. What does it mean to apply ML to this "grunt work" of machine learning and data science? I will describe first steps towards tools in these directions, based on the idea of semi-automating ML: using unsupervised learning to find patterns in the data that can be used to guide data analysts. I will also describe a new notebook system for pulling these tools together: if we augment Jupyter-style notebooks with data-flow and provenance information, this enables a new class of data-aware notebooks which are much more natural for data manipulation.

Generative Modeling and Model-Based Reasoning for Robotics and AI

Aravind Rajeswaran, Emanuel Todorov, Igor Mordatch, William Agnew, Amy Zhang, Joelle Pineau, Michael Chang, Dumitru Erhan, Sergey Levine, Kimberly Stachenfeld, Marvin Zhang

Hall A, Fri Jun 14, 08:30 AM

Workshop website: https://sites.google.com/view/mbrl-icml2019

In the recent explosion of interest in deep RL, "model-free" approaches based on Q-learning and actor-critic architectures have received the most attention due to their flexibility and ease of use. However, this generality often comes at the expense of efficiency (statistical as well as computational) and robustness. The large number of required samples and safety concerns often limit direct use of model-free RL for real-world settings.

Model-based methods are expected to be more efficient. Given accurate models, trajectory optimization and Monte-Carlo planning methods can efficiently compute near-optimal actions in varied contexts. Advances in generative modeling, unsupervised, and self-supervised learning provide methods for learning models and representations that support subsequent planning and reasoning. Against this backdrop, our workshop aims to bring together researchers in generative modeling and model-based control to discuss research questions at their intersection, and to advance the state of the art in model-based RL for robotics and AI. In particular, this workshop aims to make progress on questions related to:

1. How can we learn generative models efficiently? Role of data, structures, priors, and uncertainty.
2. How to use generative models efficiently for planning and reasoning? Role of derivatives, sampling, hierarchies, uncertainty, counterfactual reasoning etc.
3. How to harmoniously integrate model-learning and model-based decision making?
4. How can we learn compositional structure and environmental constraints? Can this be leveraged for better generalization and reasoning?

Schedule

08:45 AM  Welcome and Introduction (Rajeswaran)
09:00 AM  Yann LeCun


09:30 AM  Jessica Hamrick
10:00 AM  Spotlight Session 1
11:00 AM  Stefan Schaal
02:00 PM  David Silver
03:30 PM  Byron Boots
04:20 PM  Chelsea Finn
04:50 PM  Abhinav Gupta
05:20 PM  Discussion Panel

Uncertainty and Robustness in Deep Learning

Yixuan (Sharon) Li, Balaji Lakshminarayanan, Dan Hendrycks, Tom Dietterich, Justin Gilmer

Hall B, Fri Jun 14, 08:30 AM

There has been growing interest in rectifying deep neural network vulnerabilities. Challenges arise when models receive samples drawn from outside the training distribution. For example, a neural network tasked with classifying handwritten digits may assign high confidence predictions to cat images. Anomalies are frequently encountered when deploying ML models in the real world. Well-calibrated predictive uncertainty estimates are indispensable for many machine learning applications, such as self-driving cars and medical diagnosis systems. Generalization to unseen and worst-case inputs is also essential for robustness to distributional shift. In order to have ML models reliably predict in open environments, we must deepen technical understanding in the following areas: (1) learning algorithms that are robust to changes in input data distribution (e.g., detect out-of-distribution examples); (2) mechanisms to estimate and calibrate confidence produced by neural networks; (3) methods to improve robustness to adversarial and common corruptions; and (4) key applications for uncertainty in artificial intelligence (e.g., computer vision, robotics, self-driving cars, medical imaging) as well as broader machine learning tasks.

This workshop will bring together researchers and practitioners from the machine learning communities, and highlight recent work that contributes to addressing these challenges. Our agenda will feature contributed papers with invited speakers. Through the workshop we hope to help identify fundamentally important directions on robust and reliable deep learning, and foster future collaborations.

Schedule


08:30 AM  Welcome (Li)
08:40 AM  Spotlight (Scott, Koshy, Aigrain, Bidart, Panda, Yap, Yacoby, Gontijo Lopes, Marchisio, Englesson, Yang, Graule, Sun, Kang, Dusenberry, Du, Maennel, Menda, Edupuganti, Metz, Stutz, Srinivasan, Sämann, N Balasubramanian, Mohseni, Cornish, Butepage, Wang, Li, Han, Li, Andriushchenko, Ruff, Vadera, Ovadia, Thulasidasan, Ji, Niu, Mahloujifar, Kumar, CHUN, Yin, Xu, Gomes, Rohekar)
09:30 AM  Keynote by Max Welling: A Nonparametric Bayesian Approach to Deep Learning (without GPs) (Welling)
10:00 AM  Poster Session 1 (all papers)
11:00 AM  Keynote by Kilian Weinberger: On Calibration and Fairness (Weinberger)
11:30 AM  Why ReLU networks yield high-confidence predictions far away from the training data and how to mitigate the problem (Andriushchenko)
11:40 AM  Detecting Extrapolation with Influence Functions (Madras)
11:50 AM  How Can We Be So Dense? The Robustness of Highly Sparse Representations (Ahmad)
12:00 PM  Keynote by Suchi Saria: Safety Challenges with Black-Box Predictors and Novel Learning Approaches for Failure Proofing (Saria)
02:00 PM  Subspace Inference for Bayesian Deep Learning (Kirichenko, Izmailov, Wilson)
02:10 PM  Quality of Uncertainty Quantification for Bayesian Neural Network Inference (Yao)
02:20 PM  'In-Between' Uncertainty in Bayesian Neural Networks (Foong)
02:30 PM  Keynote by Dawn Song: Adversarial Machine Learning: Challenges, Lessons, and Future Directions (Song)


03:30 PM  Keynote by Terrance Boult: The Deep Unknown: on Open-set and Adversarial Examples in Deep Learning (Boult)
04:00 PM  Panel Discussion (moderator: Tom Dietterich) (Welling, Weinberger, Boult, Song, Dietterich)
05:00 PM  Poster Session 2 (all papers)

Abstracts (9):

Abstract 3: Keynote by Max Welling: A Nonparametric Bayesian Approach to Deep Learning (without GPs) in Uncertainty and Robustness in Deep Learning, Welling 09:30 AM

We present a new family of exchangeable stochastic processes suitable for deep learning. Our nonparametric Bayesian method models distributions over functions by learning a graph of dependencies on top of latent representations of the points in the given dataset. In doing so, they define a Bayesian model without explicitly positing a prior distribution over latent global parameters; they instead adopt priors over the relational structure of the given dataset, a task that is much simpler. We show how we can learn such models from data, demonstrate that they are scalable to large datasets through mini-batch optimization and describe how we can make predictions for new points via their posterior predictive distribution. We experimentally evaluate FNPs on the tasks of toy regression and image classification and show that, when compared to baselines that employ global latent parameters, they offer both competitive predictions as well as more robust uncertainty estimates.

Abstract 5: Keynote by Kilian Weinberger: On Calibration and Fairness in Uncertainty and Robustness in Deep Learning, Weinberger 11:00 AM

We investigate calibration for deep learning algorithms in classification and regression settings. Although we show that typically deep networks tend to be highly mis-calibrated, we demonstrate that this is easy to fix - either to obtain more trustworthy confidence estimates or to detect outliers in the data. Finally, we relate calibration with the recently raised tension between minimizing error disparity across different population groups while maintaining calibrated probability estimates. We show that calibration is compatible only with a single error constraint (i.e. equal false-negative rates across groups), and show that any algorithm that satisfies this relaxation is no better than randomizing a percentage of predictions for an existing classifier. These unsettling findings, which extend and generalize existing results, are empirically confirmed on several datasets.
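The abstract does not say which fix is used; one standard post-hoc recipe for the calibration problem it describes is temperature scaling, sketched below purely as an illustration with placeholder validation logits and labels, not as the method from the talk. It learns a single scalar that rescales logits on held-out data without changing the predicted classes.

```python
import torch
import torch.nn.functional as F

def fit_temperature(logits, labels, lr=0.01, steps=200):
    """Post-hoc temperature scaling: learn a scalar T > 0 that rescales held-out
    logits to minimize NLL, typically improving calibration."""
    log_t = torch.zeros(1, requires_grad=True)          # T = exp(log_t) keeps T positive
    opt = torch.optim.Adam([log_t], lr=lr)
    for _ in range(steps):
        loss = F.cross_entropy(logits / log_t.exp(), labels)
        opt.zero_grad(); loss.backward(); opt.step()
    return log_t.exp().item()

# Placeholder validation logits/labels; replace with a real held-out set.
val_logits, val_labels = torch.randn(1000, 10), torch.randint(0, 10, (1000,))
T = fit_temperature(val_logits, val_labels)
calibrated_probs = F.softmax(val_logits / T, dim=1)
```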

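As an illustration of the phenomenon described in Abstract 6 (not taken from the talk), the snippet below is a minimal numpy construction: a purely linear softmax classifier, which is the far-field behaviour of a piecewise-affine ReLU network, becomes arbitrarily confident as an input is scaled away from the data. The weight matrix and input direction are random stand-ins.

```python
import numpy as np

# Minimal illustration: for a linear softmax classifier (the asymptotic,
# far-field behaviour of a ReLU network), scaling an input x by alpha scales
# the logit gaps by alpha, so the maximum softmax probability approaches 1.
rng = np.random.default_rng(0)
W = rng.normal(size=(10, 50))        # hypothetical 10-class linear classifier
x = rng.normal(size=50)              # a random input direction

def max_softmax_prob(z):
    z = z - z.max()                  # numerically stable softmax
    p = np.exp(z) / np.exp(z).sum()
    return p.max()

for alpha in [1, 10, 100, 1000]:
    print(alpha, max_softmax_prob(W @ (alpha * x)))
# The printed confidence climbs towards 1.0 as alpha grows.
```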
Abstract 7: Detecting Extrapolation with Influence Functions in Uncertainty and Robustness in Deep Learning, Madras 11:40 AM

In this work, we explore principled methods for extrapolation detection. We define extrapolation as occurring when a model's conclusion at a test point is underdetermined by the training data. Our metrics for detecting extrapolation are based on influence functions, inspired by the intuition that a point requires extrapolation if its inclusion in the training set would significantly change the model's learned parameters. We provide interpretations of our methods in terms of the eigendecomposition of the Hessian. We present experimental evidence that our method is capable of identifying extrapolation to out-of-distribution points.

Abstract 8: How Can We Be So Dense? The Robustness of Highly Sparse Representations in Uncertainty and Robustness in Deep Learning, Ahmad 11:50 AM

Neural networks can be highly sensitive to noise and perturbations. In this paper we suggest that high-dimensional sparse representations can lead to increased robustness to noise and interference. A key intuition we develop is that the ratio of the match volume around a sparse vector divided by the total representational space decreases exponentially with dimensionality, leading to highly robust matching with low interference from other patterns. We analyze efficient sparse networks containing both sparse weights and sparse activations. Simulations on MNIST, the Google Speech Command Dataset, and CIFAR-10 show that such networks demonstrate improved robustness to random noise compared to dense networks, while maintaining competitive accuracy. We propose that sparsity should be a core design constraint for creating highly robust networks.

Abstract 10: Subspace Inference for Bayesian Deep Learning in Uncertainty and Robustness in Deep Learning, Kirichenko, Izmailov, Wilson 02:00 PM

Bayesian inference was once a gold standard for learning with neural networks, providing accurate full predictive distributions and well-calibrated uncertainty. However, scaling Bayesian inference techniques to deep neural networks is challenging due to the high dimensionality of the parameter space. In this paper, we construct low-dimensional subspaces of parameter space that contain diverse sets of models, such as the first principal components of the stochastic gradient descent (SGD) trajectory. In these subspaces, we are able to apply elliptical slice sampling and variational inference, which struggle in the full parameter space. We show that Bayesian model averaging over the induced posterior in these subspaces produces highly accurate predictions and well-calibrated predictive uncertainty for both regression and image classification.

Abstract 11: Quality of Uncertainty Quantification for Bayesian Neural Network Inference in Uncertainty and Robustness in Deep Learning, Yao 02:10 PM

Bayesian Neural Networks (BNNs) place priors over the parameters in a neural network. Inference in BNNs, however, is difficult; all inference methods for BNNs are approximate. In this work, we empirically compare the quality of predictive uncertainty estimates for 10 common inference methods on both regression and classification tasks. Our experiments demonstrate that commonly used metrics (e.g. test log-likelihood) can be misleading. Our experiments also indicate that inference innovations designed to capture structure in the posterior do not necessarily produce high quality posterior approximations.


Abstract 12: 'In-Between' Uncertainty in Bayesian Neural Networks in Uncertainty and Robustness in Deep Learning, Foong 02:20 PM

We describe a limitation in the expressiveness of the predictive uncertainty estimate given by mean-field variational inference (MFVI), a popular approximate inference method for Bayesian neural networks. In particular, MFVI fails to give calibrated uncertainty estimates in between separated regions of observations. This can lead to catastrophically overconfident predictions when testing on out-of-distribution data. Avoiding such over-confidence is critical for active learning, Bayesian optimisation and out-of-distribution robustness. We instead find that a classical technique, the linearised Laplace approximation, can handle 'in-between' uncertainty much better for small network architectures.

Abstract 14: Keynote by Terrance Boult: The Deep Unknown: on Open-set and Adversarial Examples in Deep Learning in Uncertainty and Robustness in Deep Learning, Boult 03:30 PM

The first part of the talk will explore issues with deep networks dealing with "unknown" inputs, and the general problems of open-set recognition in deep networks. We review the core of open-set recognition theory and its application in our first attempt at open-set deep networks, "OpenMax". We discuss its successes and limitations and why classic "open-set" approaches don't really solve the problem of deep unknowns. We then present our recent work from NIPS 2018 on a new model we call the ObjectoSphere. Using the ObjectoSphere loss begins to address the learning of deep features that can handle unknown inputs. We present examples of its use first on simple datasets (MNIST/CIFAR) and then on unpublished work applying it to the real-world problem of open-set face recognition. We discuss the relationship between open-set recognition theory and adversarial image generation, showing how our deep-feature adversarial approach, called LOTS, can attack the first OpenMax solution, as well as successfully attack even open-set face recognition systems. We end with a discussion of how open-set theory can be applied to improve network robustness.


Reinforcement Learning for Real Life

Yuxi Li, Alborz Geramifard, Lihong Li, Csaba Szepesvari, Tao Wang

Seaside Ballroom, Fri Jun 14, 08:30 AM

Reinforcement learning (RL) is a general learning, predicting, and decision making paradigm. RL provides solution methods for sequential decision making problems as well as those that can be transformed into sequential ones. RL connects deeply with optimization, statistics, game theory, causal inference, sequential experimentation, etc., overlaps largely with approximate dynamic programming and optimal control, and applies broadly in science, engineering and arts.

RL has been making steady progress in academia recently, e.g., Atari games, AlphaGo, visuomotor policies for robots. RL has also been applied to real world scenarios like recommender systems and neural architecture search. See a recent collection about RL applications at https://medium.com/@yuxili/rl-applications-73ef685c07eb. It is desirable to have RL systems that work in the real world with real benefits. However, there are many issues for RL, e.g. generalization, sample efficiency, and the exploration vs. exploitation dilemma. Consequently, RL is far from being widely deployed. Common, critical and pressing questions for the RL community are then: Will RL have wide deployments? What are the issues? How to solve them?

The goal of this workshop is to bring together researchers and practitioners from industry and academia interested in addressing practical and/or theoretical issues in applying RL to real life scenarios, review the state of the art, clarify impactful research problems, brainstorm open challenges, share first-hand lessons and experiences from real life deployments, summarize what has worked and what has not, collect tips for people from industry looking to apply RL and for RL experts interested in applying their methods to real domains, identify potential opportunities, generate new ideas for future lines of research and development, and promote awareness and collaboration. This is not "yet another RL workshop": it is about how to successfully apply RL to real life applications. This is a less addressed issue in the RL/ML/AI community, and calls for immediate attention for sustainable prosperity of RL research and development.

Schedule

08:30 AM  optional early-bird posters
08:50 AM  opening remarks by organizers
09:00 AM  invited talk by David Silver (Deepmind): AlphaStar: Mastering the Game of StarCraft II (Silver)
09:20 AM  invited talk by John Langford (Microsoft Research): How do we make Real World Reinforcement Learning revolution? (Langford)
09:40 AM  invited talk by Craig Boutilier (Google Research): Reinforcement Learning in Recommender Systems: Some Challenges (Boutilier)
10:00 AM  posters (Chen, Garau Luis, Albert Smet, Modi, Tomkins, Simmons-Edler, Mao, Irpan, Lu, Wang, Mukherjee, Raghu, Shihab, Ahn, Fakoor, Chaudhari, Smirnova, Oh, Tang, Qin, Li, Brittain, Fox, Paul, Gao, Chow, Dulac-Arnold, Nachum, Karampatziakis, Balaji, Paul, Davody, Bouneffouf, Sahni, Kim, Kolobov, Amini, Liu, Chen, Boutilier)
10:30 AM  coffee break


11:00 AM  panel discussion with Craig Boutilier (Google Research), Emma Brunskill (Stanford), Chelsea Finn (Google Brain, Stanford, UC Berkeley), Mohammad Ghavamzadeh (Facebook AI), John Langford (Microsoft Research) and David Silver (Deepmind) (Stone, Boutilier, Brunskill, Finn, Langford, Silver, Ghavamzadeh)
12:00 PM  optional posters

Abstracts (3):

Abstract 3: invited talk by David Silver (Deepmind): AlphaStar: Mastering the Game of StarCraft II in Reinforcement Learning for Real Life, Silver 09:00 AM

In recent years, the real-time strategy game of StarCraft has emerged by consensus as an important challenge for AI research. It combines several major difficulties that are intractable for many existing algorithms: a large, structured action space; imperfect information about the opponent; a partially observed map; and cycles in the strategy space. Each of these challenges represents a major difficulty faced by real-world applications, for example those based on internet-scale action spaces, game theory in e.g. security, point-and-click interfaces, or robust AI in the presence of diverse and potentially exploitative user strategies. Here, we introduce AlphaStar: a novel combination of deep learning and reinforcement learning that mastered this challenging domain and defeated human professional players for the first time.

Abstract 4: invited talk by John Langford (Microsoft Research): How do we make Real World Reinforcement Learning revolution? in Reinforcement Learning for Real Life, Langford 09:20 AM

Abstract: Doing Real World Reinforcement Learning implies living with steep constraints on the sample complexity of solutions. Where is this viable? Where might it be viable in the near future? In the far future? How can we design a research program around identifying and building such solutions? In short, what are the missing elements we need to really make reinforcement learning more mundane and commonly applied than Supervised Learning? The potential is certainly there given the naturalness of RL compared to supervised learning, but the present is manifestly different.
https://en.wikipedia.org/wiki/John_Langford_(computer_scientist)

Abstract 5: invited talk by Craig Boutilier (Google Research): Reinforcement Learning in Recommender Systems: Some Challenges in Reinforcement Learning for Real Life, Boutilier 09:40 AM

Abstract: I'll present a brief overview of some recent work on reinforcement learning motivated by practical issues that arise in the application of RL to online, user-facing applications like recommender systems. These include stochastic action sets, long-term cumulative effects, and combinatorial action spaces. I'll provide some detail on the last of these, describing SlateQ, a novel decomposition technique that allows value-based RL (e.g., Q-learning) in slate-based recommenders to scale to commercial production systems, and briefly describe both small-scale simulation and a large-scale experiment with YouTube.

Bio: Craig is Principal Scientist at Google, working on various aspects of decision making under uncertainty (e.g., reinforcement learning, Markov decision processes, user modeling, preference modeling and elicitation) and recommender systems. He received his Ph.D. from the University of Toronto in 1992, and has held positions at the University of British Columbia, University of Toronto, CombineNet, and co-founded Granata Decision Systems.

Craig was Editor-in-Chief of JAIR; Associate Editor with ACM TEAC, JAIR, JMLR, and JAAMAS; and Program Chair for IJCAI-09 and UAI-2000. Boutilier is a Fellow of the Royal Society of Canada (RSC), the Association for Computing Machinery (ACM) and the Association for the Advancement of Artificial Intelligence (AAAI). He was recipient of the 2018 ACM/SIGAI Autonomous Agents Research Award and a Tier I Canada Research Chair; and has received (with great co-authors) a number of Best Paper awards including: the 2009 IJCAI-JAIR Best Paper Prize; the 2014 AIJ Prominent Paper Award; and the 2018 NeurIPS Best Paper Award.

Real-world Sequential Decision Making: Reinforcement Learning and Beyond

Hoang Le, Yisong Yue, Adith Swaminathan, Byron Boots, Ching-An Cheng

Seaside Ballroom, Fri Jun 14, 14:00 PM

Workshop website:

This workshop aims to bring together researchers from industry and academia in order to describe recent advances and discuss future research directions pertaining to real-world sequential decision making, broadly construed. We aim to highlight new and emerging research opportunities for the machine learning community that arise from the evolving needs for making decision making theoretically and practically relevant for realistic applications.

Research interest in reinforcement and imitation learning has surged significantly over the past several years, with the empirical successes of self-play in games and the availability of increasingly realistic simulation environments. We believe the time is ripe for the research community to push beyond simulated domains and start exploring research directions that directly address the real-world need for optimal decision making. We are particularly interested in understanding the current theoretical and practical challenges that prevent broader adoption of current policy learning and evaluation algorithms in high-impact applications, across a broad range of domains.

This workshop welcomes both theory and application contributions.

Schedule

02:00 PM  Emma Brunskill (Stanford) - Minimizing & Understanding the Data Needed to Learn to Make Good Sequences of Decisions (Brunskill)
02:30 PM  Miro Dudík (Microsoft Research) - Doubly Robust Off-policy Evaluation with Shrinkage (Dudik)
03:00 PM  Poster Session Part 1 and Coffee Break


04:00 PM  Suchi Saria (Johns Hopkins) - Link between Causal Inference and Reinforcement Learning and Applications to Learning from Offline/Observational Data (Saria)
04:30 PM  Dawn Woodard (Uber) - Dynamic Pricing and Matching for Ride-Hailing (Woodard)
05:00 PM  Panel Discussion with Emma Brunskill, Miro Dudík, Suchi Saria, Dawn Woodard
05:30 PM  Poster Session Part 2

Abstracts (2):

Abstract 2: Miro Dudík (Microsoft Research) - Doubly Robust Off-policy Evaluation with Shrinkage in Real-world Sequential Decision Making: Reinforcement Learning and Beyond, Dudik 02:30 PM

Contextual bandits are a learning protocol that encompasses applications such as news recommendation, advertising, and mobile health, where an algorithm repeatedly observes some information about a user, makes a decision what content to present, and accrues a reward if the presented content is successful. In this talk, I will focus on the fundamental task of evaluating a new policy given historic data. I will describe the asymptotically optimal approach of doubly robust (DR) estimation, the reasons for its shortcomings in finite samples, and how to overcome these shortcomings by directly optimizing the bound on finite-sample error. Optimization yields a new family of estimators that, similarly to DR, leverage any direct model of rewards, but shrink importance weights to obtain a better bias-variance tradeoff than DR. Error bounds can also be used to select the best among multiple reward predictors. Somewhat surprisingly, reward predictors that work best with standard DR are not the same as those that work best with our modified DR. Our new estimator and model selection procedure perform extremely well across a wide variety of settings, so we expect they will enjoy broad practical use.

Based on joint work with Yi Su, Maria Dimakopoulou, and Akshay Krishnamurthy.
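For readers unfamiliar with the baseline the talk starts from, the snippet below is a minimal numpy sketch of the standard doubly robust estimator for contextual-bandit off-policy evaluation with a deterministic target policy. It does not implement the shrinkage estimator described in the abstract; the reward model, logging propensities, and target policy are illustrative stand-ins.

```python
import numpy as np

def doubly_robust_value(contexts, actions, rewards, propensities,
                        reward_model, target_policy):
    """Standard DR estimate of a target policy's value from logged bandit data.

    contexts:      array (n, d) of observed contexts
    actions:       array (n,) of logged actions (ints)
    rewards:       array (n,) of observed rewards
    propensities:  array (n,) of logging probabilities of the logged actions
    reward_model:  function (contexts, actions) -> predicted rewards
    target_policy: function (context) -> action chosen by the new policy
    """
    pi_actions = np.array([target_policy(x) for x in contexts])
    # Direct-model term: predicted reward of the target policy's action.
    dm = reward_model(contexts, pi_actions)
    # Importance-weighted correction, applied only where the policies agree.
    match = (actions == pi_actions).astype(float)
    correction = match / propensities * (rewards - reward_model(contexts, actions))
    return np.mean(dm + correction)

# Tiny usage example with synthetic logged data and a crude reward model.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
A = rng.integers(0, 2, size=1000)
R = (A == (X[:, 0] > 0)).astype(float) + 0.1 * rng.normal(size=1000)
P = np.full(1000, 0.5)                             # uniform logging policy
model = lambda x, a: np.full(len(x), R.mean())     # constant reward model
policy = lambda x: int(x[0] > 0)
print(doubly_robust_value(X, A, R, P, model, policy))
```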

Abstract 5: Dawn Woodard (Uber) - Dynamic Pricing and Matching for Ride-Hailing in Real-world Sequential Decision Making: Reinforcement Learning and Beyond, Woodard 04:30 PM

Ride-hailing platforms like Uber, Lyft, Didi Chuxing, and Ola have achieved explosive growth, in part by improving the efficiency of matching between riders and drivers, and by calibrating the balance of supply and demand through dynamic pricing. We survey methods for dynamic pricing and matching in ride-hailing, and show that these are critical for providing an experience with low waiting time for both riders and drivers. We also discuss approaches used to predict key inputs into those algorithms: demand, supply, and travel time in the road network. Then we link the two levers together by studying a pool-matching mechanism called dynamic waiting that varies rider waiting and walking before dispatch, inspired by a recent carpooling product from Uber, Express Pool. We show using data from Uber that by jointly optimizing dynamic pricing and dynamic waiting, price variability can be mitigated while increasing capacity utilization, trip throughput, and welfare. We also highlight several key practical challenges and directions of future research from a practitioner's perspective.


June 15, 2019


Workshop on AI for autonomous driving

Anna Choromanska, Larry Jackel, Li Erran Li, Juan Carlos Niebles, Adrien Gaidon, Ingmar Posner, Wei-Lun (Harry) Chao

101, Sat Jun 15, 08:30 AM

A diverse set of methods has been devised to develop autonomous driving platforms. They range from modular systems, in which the problem is decomposed manually, the components are optimized independently, and a large number of rules are programmed by hand, to end-to-end deep-learning frameworks. Today's systems rely on a subset of the following: camera images, HD maps, inertial measurement units, wheel encoders, and active 3D sensors (LIDAR, radar). There is general agreement that much of the self-driving software stack will continue to incorporate some form of machine learning in any of the above mentioned systems in the future.

Self-driving cars present one of today's greatest challenges and opportunities for Artificial Intelligence (AI). Despite substantial investments, existing methods for building autonomous vehicles have not yet succeeded, i.e., there are no driverless cars on public roads today without human safety drivers. Nevertheless, a few groups have started working on extending the idea of learned tasks to larger functions of autonomous driving. Initial results on learned road following are very promising.

The goal of this workshop is to explore ways to create a framework that is capable of learning autonomous driving capabilities beyond road following, towards fully driverless cars. The workshop will consider the current state of learning applied to autonomous vehicles and will explore how learning may be used in future systems. The workshop will span both theoretical frameworks and practical issues, especially in the area of deep learning.

Schedule

09:00 AM  Opening Remarks
09:15 AM  Sven Kreiss: "Compositionality, Confidence and Crowd Modeling for Self-Driving Cars" (Kreiss, Alahi)
09:40 AM  Mayank Bansal: "ChauffeurNet: Learning to Drive by Imitating the Best and Synthesizing the Worst" (Bansal)
10:05 AM  Chelsea Finn: "A Practical View on Generalization and Autonomy in the Real World" (Finn)
10:50 AM  Sergey Levine: "Imitation, Prediction, and Model-Based Reinforcement Learning for Autonomous Driving" (Levine)
11:15 AM  Wolfram Burgard (Burgard)


11:40 AM  Dorsa Sadigh: "Influencing Interactive Mixed-Autonomy Systems" (Sadigh)
01:30 PM  Poster Session (Jeong, Philion)
02:30 PM  Alexander Amini: "Learning to Drive with Purpose" (Amini)
02:55 PM  Fisher Yu: "Motion and Prediction for Autonomous Driving" (Yu, Darrell)
03:20 PM  Alfredo Canziani: "Model-Predictive Policy Learning with Uncertainty Regularization for Driving in Dense Traffic" (Canziani)
04:05 PM  Jianxiong Xiao: "Self-driving Car: What we can achieve today?" (Xiao)
04:30 PM  German Ros: "Fostering Autonomous Driving Research with CARLA" (Ros)
04:55 PM  Venkatraman Narayanan: "The Promise and Challenge of ML in Self-Driving" (Narayanan, Bagnell)
05:20 PM  Best Paper Award and Panel Discussion

Abstracts (7):

Abstract 2: Sven Kreiss: "Compositionality, Confidence and Crowd Modeling for Self-Driving Cars" in Workshop on AI for autonomous driving, Kreiss, Alahi 09:15 AM

I will present our recent works related to the three AI pillars of a self-driving car: perception, prediction, and planning. For the perception pillar, I will present new human pose estimation and monocular distance estimation methods that use a loss that learns its own confidence, the Laplace loss. For prediction, I will show our investigations on interpretable models where we apply deep learning techniques within structured and hand-crafted classical models for path prediction in social contexts. For the third pillar, planning, I will show our crowd-robot interaction module that uses attention-based representation learning suitable for planning in an RL environment with multiple people.

Abstract 3: Mayank Bansal: "ChauffeurNet: Learning to Drive by Imitating the Best and Synthesizing the Worst" in Workshop on AI for autonomous driving, Bansal 09:40 AM

Our goal is to train a policy for autonomous driving via imitation learning that is robust enough to drive a real vehicle. We find that standard behavior cloning is insufficient for handling complex driving scenarios, even when we leverage a perception system for preprocessing the input and a controller for executing the output on the car: 30 million examples are still not enough. We propose exposing the learner to synthesized data in the form of perturbations to the expert's driving, which creates interesting situations such as collisions and/or going off the road. Rather than purely imitating all data, we augment the imitation loss with additional losses that penalize undesirable events and encourage progress -- the perturbations then provide an important signal for these losses and lead to robustness of the learned model. We show that the ChauffeurNet model can handle complex situations in simulation, and present ablation experiments that emphasize the importance of each of our proposed changes and show that the model is responding to the appropriate causal factors. Finally, we demonstrate the model driving a real car at our test facility.

Abstract 5: Sergey Levine: "Imitation, Prediction, and Model-Based Reinforcement Learning for Autonomous Driving" in Workshop on AI for autonomous driving, Levine 10:50 AM

While machine learning has transformed passive perception -- computer vision, speech recognition, NLP -- its impact on autonomous control in real-world robotic systems has been limited due to reservations about safety and reliability. In this talk, I will discuss how end-to-end learning for control can be framed in a way that is data-driven, reliable and, crucially, easy to merge with existing model-based control pipelines based on planning and state estimation. The basic building blocks of this approach to control are generative models that estimate which states are safe and familiar, and model-based reinforcement learning, which can utilize these generative models within a planning and control framework to make decisions. By framing the end-to-end control problem as one of prediction and generation, we can make it possible to use large datasets collected by previous behavioral policies, as well as human operators, estimate confidence or familiarity of new observations to detect "unknown unknowns," and analyze the performance of our end-to-end models on offline data prior to live deployment. I will discuss how model-based RL can enable navigation and obstacle avoidance, how generative models can detect uncertain and unsafe situations, and then discuss how these pieces can be put together into the framework of deep imitative models: generative models trained via imitation of human drivers that can be incorporated into model-based control for autonomous driving, and can reason about future behavior and intentions of other drivers on the road. Finally, I will conclude with a discussion of current research that is likely to make an impact on autonomous driving and safety-critical AI systems in the near future, including meta-learning, off-policy reinforcement learning, and pixel-level video prediction models.

Abstract 9: Alexander Amini: "Learning to Drive with Purpose" in Workshop on AI for autonomous driving, Amini 02:30 PM

Deep learning has revolutionized the ability to learn "end-to-end" autonomous vehicle control directly from raw sensory data. In recent years, there have been advances to handle more complex forms of navigational instruction. However, these networks are still trained on biased human driving data (yielding biased models), and are unable to capture the full distribution of possible actions that could be taken. By learning a set of unsupervised latent variables that characterize the training data, we present an online debiasing algorithm for autonomous driving. Additionally, we extend end-to-end driving networks with the ability to drive with purpose and perform point-to-point navigation. We formulate how our model can also be used to localize the robot according to correspondences between the map and the observed visual road topology, inspired by the rough localization that human drivers can perform, even in cases where GPS is noisy or removed altogether. Our results highlight the importance of bridging the benefits of end-to-end learning with classical probabilistic reasoning and Bayesian inference to push the boundaries of autonomous driving.

Abstract 11: Alfredo Canziani: "Model-Predictive Policy Learning with Uncertainty Regularization for Driving in Dense Traffic" in Workshop on AI for autonomous driving, Canziani 03:20 PM

Learning a policy using only observational data is challenging because the distribution of states it induces at execution time may differ from the distribution observed during training. In this work, we propose to train a policy while explicitly penalizing the mismatch between these two distributions over a fixed time horizon. We do this by using a learned model of the environment dynamics which is unrolled for multiple time steps, and training a policy network to minimize a differentiable cost over this rolled-out trajectory. This cost contains two terms: a policy cost which represents the objective the policy seeks to optimize, and an uncertainty cost which represents its divergence from the states it is trained on. We propose to measure this second cost by using the uncertainty of the dynamics model about its own predictions, using recent ideas from uncertainty estimation for deep networks. We evaluate our approach using a large-scale observational dataset of driving behavior recorded from traffic cameras, and show that we are able to learn effective driving policies from purely observational data, with no environment interaction.
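To make the two-term objective in Abstract 11 concrete, here is a minimal numpy sketch of an uncertainty-regularized model-predictive rollout cost. Everything in it is illustrative rather than the talk's exact method: a small ensemble of linear models stands in for the learned dynamics model, ensemble disagreement stands in for the model's uncertainty about its own predictions (the paper may estimate uncertainty differently, e.g. with dropout), and the policy cost is a placeholder quadratic.

```python
import numpy as np

# Illustrative stand-in for a learned dynamics model: an ensemble of
# linear models s' = A s + B a with slightly perturbed fitted parameters.
rng = np.random.default_rng(0)
ensemble = [(np.eye(4) + 0.01 * rng.normal(size=(4, 4)),
             0.1 * rng.normal(size=(4, 2))) for _ in range(5)]

def policy(state, theta):
    # Toy linear policy; theta has shape (2, 4).
    return theta @ state

def rollout_cost(theta, s0, horizon=10, lam=1.0):
    """Policy cost + lam * uncertainty cost over an unrolled trajectory.

    The uncertainty cost is the ensemble's disagreement about the next
    state, a common proxy for the dynamics model's predictive uncertainty.
    """
    states = [np.copy(s0) for _ in ensemble]   # one rollout per ensemble member
    policy_cost, uncertainty_cost = 0.0, 0.0
    for _ in range(horizon):
        actions = [policy(s, theta) for s in states]
        next_states = [A @ s + B @ a for (A, B), s, a in zip(ensemble, states, actions)]
        mean_next = np.mean(next_states, axis=0)
        policy_cost += float(mean_next @ mean_next)          # placeholder objective
        uncertainty_cost += float(np.mean([np.sum((n - mean_next) ** 2)
                                           for n in next_states]))
        states = next_states
    return policy_cost + lam * uncertainty_cost

theta0 = np.zeros((2, 4))
print(rollout_cost(theta0, s0=np.ones(4)))
```

In the abstract's setting this scalar cost would be differentiable end to end, so the policy parameters could be trained by gradient descent through the unrolled dynamics.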

Abstract 13: German Ros: "Fostering Autonomous Driving Research with CARLA" in Workshop on AI for autonomous driving, Ros 04:30 PM

This talk focuses on the relevance of open source solutions to foster autonomous driving research and development. To this end, we present how CARLA has been used within the research community in the last year and what results it has enabled. We will also cover the CARLA Autonomous Driving Challenge and its relevance as an open benchmark for the driving community. We will share with the community new soon-to-be-released features and the future direction of the CARLA simulation platform.

Abstract 14: Venkatraman Narayanan: "The Promise and Challenge of ML in Self-Driving" in Workshop on AI for autonomous driving, Narayanan, Bagnell 04:55 PM

To deliver the benefits of autonomous driving safely, quickly, and broadly, learnability has to be a key element of the solution. In this talk, I will describe Aurora's philosophy towards building learnability into the self-driving architecture, avoiding the pitfalls of applying vanilla ML to problems involving feedback, and leveraging expert demonstrations for learning decision-making models. I will conclude with our approach to testing and validation.


Workshop on Multi-Task and Lifelong Reinforcement Learning

Sarath Chandar, Shagun Sodhani, Khimya Khetarpal, Tom Zahavy, Daniel J. Mankowitz, Shie Mannor, Balaraman Ravindran, Doina Precup, Chelsea Finn, Abhishek Gupta, Amy Zhang, Kyunghyun Cho, Andrei Rusu, Rob Fergus (Facebook)

102, Sat Jun 15, 08:30 AM

Website link: https://sites.google.com/view/mtlrl/

Significant progress has been made in reinforcement learning, enabling agents to accomplish complex tasks such as Atari games, robotic manipulation, simulated locomotion, and Go. These successes have stemmed from the core reinforcement learning formulation of learning a single policy or value function from scratch. However, reinforcement learning has proven challenging to scale to many practical real world problems due to problems in learning efficiency and objective specification, among many others. Recently, there has been emerging interest and research in leveraging structure and information across multiple reinforcement learning tasks to more efficiently and effectively learn complex behaviors. This includes:

1. curriculum and lifelong learning, where the problem requires learning a sequence of tasks, leveraging their shared structure to enable knowledge transfer
2. goal-conditioned reinforcement learning techniques that leverage the structure of the provided goal space to learn many tasks significantly faster
3. meta-learning methods that aim to learn efficient learning algorithms that can learn new tasks quickly
4. hierarchical reinforcement learning, where the reinforcement learning problem might entail a composition of subgoals or subtasks with shared structure

Multi-task and lifelong reinforcement learning has the potential to alter the paradigm of traditional reinforcement learning, to provide more practical and diverse sources of supervision, while helping overcome many challenges associated with reinforcement learning, such as exploration, sample efficiency and credit assignment. However, the field of multi-task and lifelong reinforcement learning is still young, with many more developments needed in terms of problem formulation, algorithmic and theoretical advances as well as better benchmarking and evaluation.

The focus of this workshop will be on both the algorithmic and theoretical foundations of multi-task and lifelong reinforcement learning as well as the practical challenges associated with building multi-tasking agents and lifelong learning benchmarks. Our goal is to bring together researchers that study different problem domains (such as games, robotics, language, and so forth), different optimization approaches (deep learning, evolutionary algorithms, model-based control, etc.), and different formalisms (as mentioned above) to discuss the frontiers, open problems and meaningful next steps in multi-task and lifelong reinforcement learning.

Schedule

08:45 AM  Opening Remarks
09:00 AM  Sergey Levine: Unsupervised Reinforcement Learning and Meta-Learning (Levine)
09:25 AM  Spotlight Presentations
09:50 AM  Peter Stone: Learning Curricula for Transfer Learning in RL (Stone)
10:15 AM  Contributed Talks
10:30 AM  Posters and Break
11:00 AM  Jacob Andreas: Linguistic Scaffolds for Policy Learning (Andreas)
11:25 AM  Karol Hausman: Skill Representation and Supervision in Multi-Task Reinforcement Learning (Hausman)
11:50 AM  Contributed Talks
12:20 PM  Posters and Lunch Break
02:00 PM  Martha White: Learning Representations for Continual Learning (White)
02:25 PM  Natalia Diaz-Rodriguez: Continual Learning and Robotics: an overview (Diaz Rodriguez)


02:50 PM  Posters and Break
03:30 PM  Jeff Clune: Towards Solving Catastrophic Forgetting with Neuromodulation & Learning Curricula by Generating Environments (Clune)
03:55 PM  Contributed Talks
04:15 PM  Nicolas Heess: TBD (Heess)
04:40 PM  Benjamin Rosman: Exploiting Structure For Accelerating Reinforcement Learning (Rosman)
05:05 PM  Panel Discussion

Abstracts (3):

Abstract 5: Contributed Talks in Workshop on Multi-Task and Lifelong Reinforcement Learning, 10:15 AM

10:15 Meta-Learning via Learned Loss. Yevgen Chebotar*, Artem Molchanov*, Sarah Bechtle*, Ludovic Righetti, Franziska Meier, Gaurav Sukhatme
10:23 MCP: Learning Composable Hierarchical Control with Multiplicative Compositional Policies. Xue Bin Peng, Michael Chang, Grace Zhang, Pieter Abbeel, Sergey Levine

Abstract 9: Contributed Talks in Workshop on Multi-Task and Lifelong Reinforcement Learning, 11:50 AM

11:50 Which Tasks Should Be Learned Together in Multi-task Learning? Trevor Standley, Amir R. Zamir, Dawn Chen, Leonidas Guibas, Jitendra Malik, Silvio Savarese
11:58 Finetuning Subpolicies for Hierarchical Reinforcement Learning. Carlos Florensa, Alexander Li and Pieter Abbeel
12:06 Online Learning for Auxiliary Task Weighting for Reinforcement Learning. Xingyu Lin*, Harjatin Singh Baweja*, George Kantor, David Held
12:14 Guided Meta-Policy Search. Russell Mendonca, Abhishek Gupta, Rosen Kralev, Pieter Abbeel, Sergey Levine, Chelsea Finn

Abstract 15: Contributed Talks in Workshop on Multi-Task and Lifelong Reinforcement Learning, 03:55 PM

03:55 Online Continual Learning with Maximally Inferred Retrieval. Rahaf Alundji*, Lucas Caccia*, Eugene Belilovsky*, Massimo Caccia*, Min Lin, Laurent Charlin, Tinne Tuytelaars
04:05 Skew-Fit: State-Covering Self-Supervised Reinforcement Learning. Vitchyr H. Pong*, Murtaza Dalal*, Steven Lin, Ashvin Nair, Shikhar Bahl, Sergey Levine


Invertible Neural Networks and Normalizing Flows

Chin-Wei Huang, David Krueger, Rianne Van den Berg, George Papamakarios, Aidan Gomez, Chris Cremer, Ricky T. Q. Chen, Aaron Courville, Danilo J. Rezende

103, Sat Jun 15, 08:30 AM

Invertible neural networks have been a significant thread of research in the ICML community for several years. Such transformations can offer a range of unique benefits:

(1) They preserve information, allowing perfect reconstruction (up to numerical limits) and obviating the need to store hidden activations in memory for backpropagation.
(2) They are often designed to track the changes in probability density that applying the transformation induces (as in normalizing flows; see the change-of-variables sketch below).
(3) Like autoregressive models, normalizing flows can be powerful generative models which allow exact likelihood computations; with the right architecture, they can also allow for much cheaper sampling than autoregressive models.

While many researchers are aware of these topics and intrigued by several high-profile papers, few are familiar enough with the technical details to easily follow new developments and contribute. Many may also be unaware of the wide range of applications of invertible neural networks, beyond generative modelling and variational inference.
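For reference, the density tracking in benefit (2) is the standard change-of-variables identity (a generic formula, not tied to any particular talk at the workshop): for an invertible, differentiable map f with base density p_Z and x = f(z),

```latex
% Change of variables for an invertible, differentiable map x = f(z):
\log p_X(x) = \log p_Z\!\big(f^{-1}(x)\big)
            + \log \left| \det \frac{\partial f^{-1}(x)}{\partial x} \right|
% Normalizing flows compose many such maps and choose architectures whose
% Jacobian log-determinants are cheap to evaluate.
```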

Schedule

09:30 AM  Tutorial on normalizing flows (Jang)
10:30 AM  Poster Spotlights (Rhinehart, Tang, Prabhu, Yap, Wang, Finzi, Kumar, Lu, Kumar, Lei, Przystupa, De Cao, Kirichenko, Izmailov, Wilson, Kruse, Mesquita, Lezcano Casado, Müller, Simmons, Atanov)
10:50 AM  poster session I
11:30 AM  Building a tractable generator network
11:50 AM  Glow: Generative Flow with Invertible 1x1 Convolutions (Dhariwal)
12:10 PM  Contributed talk
02:00 PM  Householder meets Sylvester: Normalizing flows for variational inference
02:20 PM  Neural Ordinary Differential Equations for Continuous Normalizing Flows
02:40 PM  Contributed talk
03:00 PM  poster session II
04:00 PM  The Bijector API: An Invertible Function Library for TensorFlow
04:20 PM  Invertible Neural Networks for Understanding and Controlling Learned Representations
04:40 PM  Contributed talk
05:00 PM  Panel Session


Stein's Method for Machine Learning and Statistics

Francois-Xavier Briol, Lester Mackey, Chris Oates, Qiang Liu, Larry Goldstein

104 A, Sat Jun 15, 08:30 AM


Stein's method is a technique from probability theory for bounding the distance between probability measures using differential and difference operators. Although the method was initially designed as a technique for proving central limit theorems, it has recently caught the attention of the machine learning (ML) community and has been used for a variety of practical tasks. Recent applications include generative modeling, global non-convex optimisation, variational inference, de novo sampling, constructing powerful control variates for Monte Carlo variance reduction, and measuring the quality of (approximate) Markov chain Monte Carlo algorithms. Stein's method has also been used to develop goodness-of-fit tests and was the foundational tool in one of the NeurIPS 2017 Best Paper awards.

Although Stein's method has already had significant impact in ML, most of the applications only scratch the surface of this rich area of research in probability theory. There would be significant gains to be made by encouraging both communities to interact directly, and this inaugural workshop would be an important step in this direction. More precisely, the aims are: (i) to introduce this emerging topic to the wider ML community, (ii) to highlight the wide range of existing applications in ML, and (iii) to bring together experts in Stein's method and ML researchers to discuss and explore potential further uses of Stein's method.

Schedule

08:30 AM  Overview of the day (Briol)
08:45 AM  Tutorial - Larry Goldstein: The Many Faces of a Simple Identity (Goldstein)
09:45 AM  Invited Talk - Anima Anandkumar: Stein's method for understanding optimization in neural networks. (Anandkumar)
10:30 AM  Break
11:00 AM  Invited Talk - Arthur Gretton: Relative goodness-of-fit tests for models with latent variables. (Gretton)
11:45 AM  Invited Talk - Andrew Duncan (Duncan)
12:30 PM  Lunch and Poster Session
01:45 PM  Invited Talk - Yingzhen Li: Gradient estimation for implicit models with Stein's method. (Li)
02:30 PM  Invited Talk - Ruiyi Zhang: On Wasserstein Gradient Flows and Particle-Based Variational Inference (ZHANG)
03:15 PM  Break
03:45 PM  Invited Talk - Paul Valiant: How the Ornstein-Uhlenbeck process drives generalization for deep learning.


04:30 PM  Invited Talk - Louis Chen: Palm theory, random measures and Stein couplings. (Chen)
05:15 PM  Panel Discussion - All speakers

Abstracts (7):

Abstract 2: Tutorial - Larry Goldstein: The Many Faces of a Simple Identity in Stein's Method for Machine Learning and Statistics, Goldstein 08:45 AM

Stein's identity states that a mean zero, variance one random variable W has the standard normal distribution if and only if E[Wf(W)] = E[f'(W)] for all f in F, where F is the class of functions such that both sides of the equality exist. Stein (1972) used this identity as the key element in his novel method for proving the Central Limit Theorem. The method generalizes to many distributions beyond the normal, allows one to operate on random variables directly rather than through their transforms, provides non-asymptotic bounds to their limit, and handles a variety of dependence. In addition to distributional approximation, Stein's identity has connections to concentration of measure, Malliavin calculus, and statistical procedures including shrinkage and unbiased risk estimation.
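As a quick numerical companion to the identity quoted in Abstract 2 (not part of the tutorial itself), the snippet below checks E[W f(W)] against E[f'(W)] by Monte Carlo for a standard normal W; the test function f(w) = tanh(w) is my own arbitrary choice.

```python
import numpy as np

# Monte Carlo check of Stein's identity E[W f(W)] = E[f'(W)] for W ~ N(0, 1),
# using f(w) = tanh(w), whose derivative is 1 - tanh(w)^2.
rng = np.random.default_rng(0)
w = rng.standard_normal(1_000_000)

lhs = np.mean(w * np.tanh(w))          # E[W f(W)]
rhs = np.mean(1.0 - np.tanh(w) ** 2)   # E[f'(W)]
print(lhs, rhs)                        # the two estimates agree closely

# For a non-normal W with the same mean and variance, the two sides
# generally differ, which is what makes the identity useful as a way of
# measuring distance to normality.
```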

Abstract 3: Invited Talk - Anima Anandkumar: Stein's method for understanding optimization in neural networks. in Stein's Method for Machine Learning and Statistics, Anandkumar 09:45 AM

Training neural networks is a challenging non-convex optimization problem. Stein's method provides a novel way to change the optimization problem into a tensor decomposition problem for guaranteed training of two-layer neural networks. We provide risk bounds for our proposed method, with a polynomial sample complexity in the relevant parameters, such as input dimension and number of neurons. Our training method is based on tensor decomposition, which provably converges to the global optimum, under a set of mild non-degeneracy conditions. This provides insights into the role of the generative process for the tractability of supervised learning.

Abstract 5: Invited Talk - Arthur Gretton: Relative goodness-of-fit tests for models with latent variables. in Stein's Method for Machine Learning and Statistics, Gretton 11:00 AM

I will describe a nonparametric, kernel-based test to assess the relative goodness of fit of latent variable models with intractable unnormalized densities. The test generalises the kernel Stein discrepancy (KSD) tests of (Liu et al., 2016, Chwialkowski et al., 2016, Yang et al., 2018, Jitkrittum et al., 2018), which require exact access to unnormalized densities. We will rely on the simple idea of using an approximate observed-variable marginal in place of the exact, intractable one. As the main theoretical contribution, the new test has a well-controlled type-I error, once we have properly corrected the threshold. In the case of models with low-dimensional latent structure and high-dimensional observations, our test significantly outperforms the relative maximum mean discrepancy test, which cannot exploit the latent structure.

Abstract 8: Invited Talk - Yingzhen Li: Gradient estimation for implicit models with Stein's method. in Stein's Method for Machine Learning and Statistics, Li 01:45 PM

Implicit models, which allow for the generation of samples but not for point-wise evaluation of probabilities, are omnipresent in real-world problems tackled by machine learning and a hot topic of current research. Some examples include data simulators that are widely used in engineering and scientific research, generative adversarial networks (GANs) for image synthesis, and hot-off-the-press approximate inference techniques relying on implicit distributions. Gradient-based optimization/sampling methods are often applied to train these models; however, without tractable densities, the objective functions often need to be approximated. In this talk I will motivate gradient estimation as another approximation approach for training implicit models and performing Monte Carlo based approximate inference. Based on this view, I will then present the Stein gradient estimator, which estimates the score function of an implicit model density. I will discuss connections of this approach to score matching, kernel methods, denoising auto-encoders, etc., and show application cases including entropy regularization for GANs, and meta-learning for stochastic gradient MCMC algorithms.

Abstract 9: Invited Talk - Ruiyi Zhang: On Wasserstein Gradient Flows and Particle-Based Variational Inference in Stein's Method for Machine Learning and Statistics, ZHANG 02:30 PM

Particle-based variational inference methods (ParVIs), such as models associated with Stein variational gradient descent, have gained attention in the Bayesian inference literature, for their capacity to yield flexible and accurate approximations. We explore ParVIs from the perspective of Wasserstein gradient flows, and make both theoretical and practical contributions. We unify various finite-particle approximations that existing ParVIs use, and recognize that the approximation is essentially a compulsory smoothing treatment, in either of two equivalent forms. This novel understanding reveals the assumptions and relations of existing ParVIs, and also inspires new ParVIs. We propose an acceleration framework and a principled bandwidth-selection method for general ParVIs; these are based on the developed theory and leverage the geometry of the Wasserstein space. Experimental results show the improved convergence by the acceleration framework and enhanced sample accuracy by the bandwidth-selection method.

Abstract 11: Invited Talk - Paul Valiant: How the Ornstein-Uhlenbeck process drives generalization for deep learning. in Stein's Method for Machine Learning and Statistics, 03:45 PM

After discussing the Ornstein-Uhlenbeck process and how it arises in the context of Stein's method, we turn to an analysis of the stochastic gradient descent method that drives deep learning. We show that certain noise that often arises in the training process induces an Ornstein-Uhlenbeck process on the learned parameters. This process is responsible for a weak regularization effect on the training that, once it reaches stable points, will provably have found the "simplest possible" hypothesis consistent with the training data. At a higher level, we argue how some of the big mysteries of the success of deep learning may be revealed by analyses of the subtle stochastic processes which govern deep learning training, a natural focus for this community.
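For readers unfamiliar with the process named in Abstract 11, the snippet below simulates a one-dimensional Ornstein-Uhlenbeck process dX_t = -theta X_t dt + sigma dW_t with a simple Euler-Maruyama scheme. It is a generic illustration with made-up parameter values, not the talk's analysis of SGD.

```python
import numpy as np

# Euler-Maruyama simulation of the Ornstein-Uhlenbeck process
#   dX_t = -theta * X_t dt + sigma dW_t,
# which mean-reverts towards 0 with stationary variance sigma^2 / (2 * theta).
rng = np.random.default_rng(0)
theta, sigma, dt, n_steps = 1.0, 0.5, 1e-3, 100_000

x = np.empty(n_steps)
x[0] = 3.0                                   # start far from the mean
for t in range(1, n_steps):
    x[t] = x[t - 1] - theta * x[t - 1] * dt \
           + sigma * np.sqrt(dt) * rng.standard_normal()

# Empirical variance over the second half vs. the theoretical stationary value.
print(x[n_steps // 2:].var(), sigma**2 / (2 * theta))
```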
Abstract 12: Invited Talk - Louis Chen: Palm theory, random measures and Stein couplings. in Stein's Method for Machine Learning and Statistics, Chen 04:30 PM

In this talk, we present a general Kolmogorov bound for normal approximation, proved using Stein's method. We will apply this bound to random measures using Palm theory and coupling, with applications to stochastic geometry. We will also apply this bound to Stein couplings, which include local dependence, exchangeable pairs, size-bias couplings and more as special cases. This talk is based on joint work with Adrian Roellin and Aihua Xia.


AI For Social Good (AISG)

Margaux Luck, Kris Sankaran, Tristan Sylvain, Sean McGregor, Jonnie Penn, Girmaw Abebe Tadesse, Virgile Sylvain, Myriam Côté, Lester Mackey, Rayid Ghani, Yoshua Bengio

104 B, Sat Jun 15, 08:30 AM

#AI for Social Good

##Important information

**Contact information:** [[email protected]](mailto:[email protected])

**Submission deadline:** ***EXTENDED to April 26th 2019 11:59PM ET***

**[Workshop website](https://aiforsocialgood.github.io/icml2019/index.htm)**

**[Submission website](https://cmt3.research.microsoft.com/AISGW2019/)**

**Poster Information:**

* Poster Size - **36W x 48H inches or 90 x 122 cm**
* Poster Paper - **lightweight paper - not laminated**

##Abstract
This workshop builds on our AI for Social Good workshop at [NeurIPS 2018](https://aiforsocialgood.github.io/2018/) and [ICLR 2019](https://aiforsocialgood.github.io/iclr2019/).

**Introduction:** The rapid expansion of AI research presents two clear conundrums:

- the comparative lack of incentives for researchers to address social impact issues and
- the dearth of conferences and journals centered around the topic. Researchers motivated to help often find themselves without a clear idea of which fields to delve into.

**Goals:** Our workshop addresses both these issues by bringing together machine learning researchers, social impact leaders, stakeholders, policy leaders, and philanthropists to discuss their ideas and applications for social good. To broaden the impact beyond the convening of our workshop, we are partnering with [AI Commons](http://www.aicommons.com) to expose accepted projects and papers to the broader community of machine learning researchers and engineers. The projects/research may be at varying degrees of development, from formulation as a data problem to detailed requirements for effective deployment. We hope that this gathering of talent and information will inspire the creation of new approaches and tools by the community, help scientists access the data they need, involve social and policy stakeholders in the framing of machine learning applications, and attract interest from philanthropists invited to the event to make a dent in our shared goals.

**Topics:** The UN Sustainable Development Goals (SDGs) are a set of seventeen objectives whose completion is set to lead to a more equitable, prosperous, and sustainable world. In this light, our main areas of focus are the following: health, education, the protection of democracy, urban planning, assistive technology, agriculture, environmental protection and sustainability, social welfare and justice, and the developing world. Each of these themes presents unique opportunities for AI to reduce human suffering and allow citizens and democratic institutions to thrive.

Across these topics, we have dual goals: recognizing high-quality work in machine learning motivated by or applied to social applications, and creating meaningful connections between communities dedicated to solving technical and social problems. To this extent, we propose two research tracks:

- **Short Papers Track (Up to four page papers + unlimited pages for citations)** for oral and/or poster presentation. The short papers should focus on past and current research work, showcasing actual results and demonstrating beneficial effects on society. We also accept short papers of recently published or submitted journal contributions to give authors the opportunity to present their work and obtain feedback from conference attendees.
- **Problem Introduction Track (Application form, up to five page responses + unlimited pages for citations)** which will present a specific solution that will be shared with stakeholders, scientists, and funders. The workshop will provide a suite of questions designed to: (1) estimate the feasibility and impact of the proposed solutions, and (2) estimate the importance of data in their implementation. The application responses should highlight ideas that have not yet been implemented in practice but can lead to real impact. The projects may be at varying degrees of development, from formulation as a data problem to structure for effective deployment. The workshop provides a supportive platform for developing these early-stage or hobby proposals into real projects. This process is designed to foster sharing different points of view, ranging from the scientific assessment of feasibility and discussion of practical constraints that may be encountered, to attracting interest from philanthropists invited to the event. Accepted submissions may be promoted to the wider AI solutions community following the workshop via the [AI Commons](http://www.aicommons.com), with whom we are partnering to promote the longer-term development of projects.

Schedule

08:45 AM  Welcoming and Poster set-up
09:00 AM  Opening remarks (Bengio)
09:05 AM  Solving societal challenges with AI through partnerships (Anandan)
09:45 AM  AI Commons (Bengio)
09:50 AM  Detecting Waterborne Debris with Sim2Real and Randomization (Sankaran)
10:00 AM  Conversational agents to address abusive online behaviors (Beauxis-Aussalet)
10:10 AM  Computer Vision For Food Quality: The Case of Injera
10:20 AM  AI for Mitigating Effects of Climate and Weather Changes in Agriculture
10:30 AM  Break / Poster Session 1
11:00 AM  AI for Ecology and Conservation (Sheldon)
11:40 AM  Using AI for Economic Upliftment of Handicraft Industry (Damani)


11:50 AM  Deep Neural Networks Improve Radiologists' Performance in Breast Cancer Screening (Geras, Wu)
12:00 PM  Lunch - on your own
02:00 PM  AI for Whose Social Good?
02:30 PM  How to Volunteer as an AI Researcher
03:00 PM  Break / Poster Session 2
03:30 PM  Poster Session (Fang, Balashankar, Damani, Beauxis-Aussalet, Wu, Bondi, Rußwurm, Ruhe, Saxena, Spoon)
03:50 PM  Creating constructive change and avoiding unintended consequences from machine learning
04:20 PM  Learning Global Variations in Outdoor PM_2.5 Concentrations with Satellite Images
04:30 PM  Pareto Efficient Fairness for Skewed Subgroup Data (Balashankar)
04:40 PM  Crisis Sub-Events on Social Media: A Case Study of Wildfires
04:50 PM  Towards Detecting Dyslexia in Children's Handwriting Using Neural Networks
05:00 PM  Assisting Vulnerable Communities through AI and OR: from Data to Deployed Decisions
05:30 PM  High Open announcement and Best Paper Award

Abstracts (18):

Abstract 2: Opening remarks in AI For Social Good (AISG), Bengio 09:00 AM

Speaker bio:
Yoshua Bengio is Full Professor of the Department of Computer Science and Operations Research, scientific director of Mila, CIFAR Program co-director of the CIFAR Learning in Machines and Brains program (formerly Neural Computation and Adaptive Perception), scientific director of IVADO and Canada Research Chair in Statistical Learning Algorithms. His main research ambition is to understand principles of learning that yield intelligence. He supervises a large group of graduate students and post-docs. His research is widely cited (over 130000 citations found by Google Scholar in August 2018, with an H-index over 120, and rising fast).


Abstract 3: Solving societal challenges with AI through partnerships in AI For Social Good (AISG), Anandan 09:05 AM

Wadhwani AI was inaugurated a little more than a year ago with the mission of bringing the power of AI to address societal challenges, especially among underserved communities throughout the world. We aim to address problems in all major domains including health, agriculture, education, infrastructure, and financial inclusion. We are currently working on three solutions (two in health and one in agriculture) and are exploring more areas where we can apply AI for social good. The most important lesson that we have learned during our short stint is the importance of working in close partnership with other stakeholders and players in the social sectors, especially NGOs and Government organizations. In this talk, I will use one case, namely that of developing an AI based approach for Integrated Pest Management (IPM) in Cotton Farming, to describe how this partnership based approach has evolved and been critical to our solution development and implementation.

Speaker bio:
Dr. P. Anandan is the CEO of Wadhwani Institute of Artificial Intelligence. His prior experience includes Adobe Research Lab India (2016-2017) as a VP for Research, and a Distinguished Scientist and Managing Director at Microsoft Research (1997-2014). He was also the founding director of Microsoft Research India, which he ran from 2005-2014. An earlier stint was at Sarnoff Corporation (1991-1997) as a researcher, and he was an Assistant Professor of Computer Science at Yale University (1987-1991). His primary research area is computer vision, where he is well known for his fundamental and lasting contributions to the problem of visual motion analysis. He received his PhD in Computer Science from University of Massachusetts, Amherst in 1987, a Masters in Computer Science from University of Nebraska, Lincoln in 1979 and his B.Tech in Electrical Engineering from IIT Madras, India in 1977. He is a distinguished alumnus of IIT Madras and UMass, Amherst and is on the Nebraska Hall of Computing. His hobbies include playing African drums, writing poems (in Tamil) and travel, which makes his work-related travel interesting.

Abstract 4: AI Commons in AI For Social Good (AISG), Bengio 09:45 AM

AI Commons is a collective project whose goal is to make the benefits of AI available to all. Since AI research can benefit from the input of a large range of talents across the world, the project seeks to develop ways for developers and organizations to collaborate more easily and effectively. As a community operating in an environment of trust and problem-solving, AI Commons can empower researchers to tackle the world's important problems using all the possibilities of cutting-edge AI.

Speaker bio:
Yoshua Bengio is Full Professor of the Department of Computer Science and Operations Research, scientific director of Mila, CIFAR Program co-director of the CIFAR Learning in Machines and Brains program (formerly Neural Computation and Adaptive Perception), scientific director of IVADO and Canada Research Chair in Statistical Learning Algorithms. His main research ambition is to understand principles of learning that yield intelligence. He supervises a large group of graduate students and post-docs. His research is widely cited (over 130000 citations found by Google Scholar in August 2018, with an H-index over 120, and rising fast).

Speaker bio:
Kris is a postdoc at Mila working with Yoshua Bengio on problems related to Humanitarian AI. He is generally interested in ways to broaden the scope of problems studied by the machine learning community and is curious about the ways to bridge statistical and computational thinking.

Abstract 6: Conversational agents to address abusive online behaviors in AI For Social Good (AISG), Beauxis-Aussalet 10:00 AM

Technologies to address cyber bullying are limited to detecting and hiding abusive messages. We propose to investigate the potential of conversational technologies for addressing abusers. We will outline directions for studying the effectiveness of dialog strategies (e.g., to educate or deter abusers, or keep them busy with chatbots rather than their victims) and for initiating new research on chatbot-mediated mitigation of online abuse.

Speaker bio:
Emma Beauxis-Aussalet is a Senior Track Associate at the Digital Society School of Amsterdam University of Applied Science, where she investigates how data-driven technologies can be applied for the best interests of society. She holds a PhD on classification errors and biases from Utrecht University. Her interests include ethical and explainable AI, data literacy in the general public, and the synergy between human & artificial intelligence to tackle job automation.

Abstract 7: Computer Vision For Food Quality: The Case of Injera in AI For Social Good (AISG), 10:10 AM

The use of Teff as an exclusive crop for making Injera, the Ethiopian national staple, has changed over time. Driven by the ever-increasing price of Teff, producers have added other ingredients, of which some are good (maize and rice), while others are not. Hence, households opting for the industrial solution of Injera are disturbed by the fact that they cannot figure out what exactly is contained in their Injera. Thousands of local producers and local shopkeepers work together to make fresh Injera available to millions around the country. However, consumers are finding it more and more difficult to find a safe Injera for purchase. This Injera is usually sold unpacked, unlabeled and in an unsafe way through local shops. This being so, consumers face more and more health risks, all the more as it is impossible to evaluate the ingredients contained in the Injera they are buying. There are two kinds of risks: (a) the local producers might try to reduce the cost by using cheap ingredients, including risky additives, and (b) the shops might sell expired Injera warmed up. We discuss here the growing food safety problem faced by millions of Injera consumers in Ethiopia, and the possibility of using AI to solve this problem.

Speaker bio:
Wondimagegnehu is a master's student in Information Science at Addis Ababa University. He is working on a master's thesis on learning an optimal representation of word structure for morphologically complex languages under constrained settings: limited training data and human supervision. He is interested in exploring research challenges in using AI in a social setting.

Abstract 8: AI for Mitigating Effects of Climate and Weather Changes in Agriculture in AI For Social Good (AISG), 10:20 AM

In recent years, floods, landslides and droughts have become an annual occurrence in Sri Lanka. Despite the efforts made by the government and other entities, these natural disasters remain challenging, mainly for the people who live in high-risk areas. It is also crucial to predict such disasters early on to facilitate evacuation of people living in these areas. Furthermore, the Sri Lankan economy largely depends on agriculture, yet this sector still remains untouched by recent advancements in AI and other predictive analytics techniques. The solution is to develop an AI-based platform that generates insights from emerging data sources. It will be modular, extensible and open source. Similar to any other real-world AI system, the end solution will consist of multiple data pipelines to extract data, analyze it and present results through APIs. The presentation layer will be a public API that can be consumed through a portal such as the Disaster Management Centre of Sri Lanka.

Speaker bio:
Narmada is a research engineer at ConscientAI Labs based in Sri Lanka. She is also a visiting research student at the Memorial University of Newfoundland, Canada. She is interested in research on climate change and its effects on human lifestyle, and in Deep Learning for Computer Vision.

Abstract 10: AI for Ecology and Conservation in AI For Social Good (AISG), Sheldon 11:00 AM

AI can help solve big data and decision-making problems to understand and protect the environment. I'll survey several projects in the area and discuss how to approach environmental problems using AI. The Dark Ecology project uses weather radar and machine learning to unravel mysteries of bird migration. A surprising probabilistic inference problem arises when analyzing animal survey data to monitor populations. Novel optimization algorithms can help reason about dams, hydropower, and the ecology of river networks.

Speaker bio:
Daniel Sheldon is an Assistant Professor of Computer Science at the University of Massachusetts Amherst and Mount Holyoke College. His research investigates fundamental problems in machine learning and AI motivated by large-scale environmental data, dynamic ecological processes, and real-world network phenomena.

Abstract 11: Using AI for Economic Upliftment of Handicraft Industry in AI For Social Good (AISG), Damani 11:40 AM

The handicraft industry is a strong pillar of the Indian economy which provides large-scale employment opportunities to artisans in rural and underprivileged communities. However, in this era of globalization, diverse modern designs have rendered traditional designs old and monotonous, causing an alarming decline of handicraft sales. In this talk, we will discuss our approach leveraging techniques like GANs, Color Transfer, Pattern Generation etc. to generate contemporary designs for two popular Indian handicrafts - Ikat and Block Print. The resultant designs are evaluated to be significantly more likeable and marketable than the current designs used by artisans.

Speaker bio:
Sonam Damani is an Applied Scientist at Microsoft, India, where she has worked on several projects in the field of AI and Deep Learning, including Microsoft's human-like chatbot Ruuh, Cortana's personality, novel art generation using AI, and Bing search relevance, among others. In the past year, she has co-authored a number of publications in the fields of conversational AI and AI creativity that were presented at NeurIPS, WWW and CODS-COMAD.

Abstract 12: Deep Neural Networks Improve Radiologists' Performance in Breast Cancer Screening in AI For Social Good (AISG), Geras, Wu 11:50 AM

We present a deep CNN for breast cancer screening exam classification, trained and evaluated on over 200,000 exams (over 1,000,000 images). Our network achieves an AUC of 0.895 in predicting whether there is a cancer in the breast, when tested on the screening population. We attribute the high accuracy of our model to a two-stage training procedure, which allows us to use a very high-capacity patch-level network to learn from pixel-level labels alongside a network learning from macroscopic breast-level labels. To validate our model, we conducted a reader study with 14 readers, each reading 720 screening mammogram exams, and find our model to be as accurate as experienced radiologists when presented with the same data. Finally, we show that a hybrid model, averaging the probability of malignancy predicted by a radiologist with a prediction of our neural network, is more accurate than either of the two separately.
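To make the last point concrete, here is a minimal sketch of what such radiologist-model averaging could look like; the equal weighting and the toy scores are assumptions for illustration, not values from the paper.

```python
import numpy as np

def hybrid_malignancy_score(p_radiologist, p_network, weight=0.5):
    # Convex combination of the two probability estimates; the abstract
    # describes a plain average, which corresponds to weight=0.5.
    return weight * np.asarray(p_radiologist) + (1 - weight) * np.asarray(p_network)

# Toy example: three exams scored by a reader and by the network.
print(hybrid_malignancy_score([0.10, 0.65, 0.30], [0.05, 0.80, 0.55]))
```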
Speaker bio:
- Krzysztof Geras is an assistant professor at NYU School of Medicine and an affiliated faculty at NYU Center for Data Science. His main interests are in unsupervised learning with neural networks, model compression, transfer learning, evaluation of machine learning models and applications of these techniques to medical imaging. He previously did a postdoc at NYU with Kyunghyun Cho, a PhD at the University of Edinburgh with Charles Sutton, and an MSc as a visiting student at the University of Edinburgh with Amos Storkey. His BSc is from the University of Warsaw.
- Nan Wu is a PhD student at NYU Center for Data Science. She is interested in data science with application to healthcare and is currently working on medical image analysis. Before joining NYU, she graduated from the School for Gifted Young, University of Science and Technology of China, receiving a B.S. in Statistics and a B.A. in Business Administration.

Abstract 14: AI for Whose Social Good? in AI For Social Good (AISG), 02:00 PM

Luke Stark will discuss two recent papers (Greene, Hoffmann & Stark 2019; Stark & Hoffmann 2019) that use discursive analysis to examine a) recent high-profile value statements endorsing ethical design for artificial intelligence and machine learning and b) professional ethics codes in computer science, statistics, and other fields. Guided by insights from Science and Technology Studies, values in design, and the sociology of business ethics, he will discuss the grounding assumptions and terms of debate that shape current conversations about ethical design in data science and AI. He will also advocate for an expanded view of expertise in understanding what ethical AI/ML/AI for Social Good should mean.

Speaker bio:
Luke Stark is a Postdoctoral Researcher in the Fairness, Accountability, Transparency and Ethics (FATE) Group at Microsoft Research Montreal, and an Affiliate of the Berkman Klein Center for Internet & Society at Harvard University. Luke holds a PhD from the Department of Media, Culture, and Communication at New York University, and an Honours BA and MA in History from the University of Toronto. Trained as a media historian, his scholarship centers on the interconnected histories of artificial intelligence (AI) and behavioral science, and on the ways the social and ethical contexts of AI are changing how we work, communicate, and participate in civic life.


Abstract 15: How to Volunteer as an AI Researcher in AI For Social Good (AISG), High 02:30 PM

Over the past six years, Will High has volunteered his expertise as a data scientist to various nonprofits and civic causes. He's contributed to work on homelessness, improving charter schools and optimizing water distribution. Will will talk about his experience doing pro bono work with DataKind, a global nonprofit based in New York that connects leading social change organizations with data science talent to collaborate on cutting-edge analytics and advanced algorithms developed to maximize social impact. He'll comment on DataKind's mission, how to structure effective pro bono engagements, and broader principles of the pro bono model applied to machine learning, analytics and engineering.

Speaker bio:
Will is a data science executive at Joymode in Los Angeles and works with DataKind as a Data Ambassador, consultant and facilitator. Will was previously a Senior Data Scientist at Netflix. He holds a PhD in physics from Harvard.

Abstract 18: Creating constructive change and avoiding unintended consequences from machine learning in AI For Social Good (AISG), 03:50 PM

This talk will give an overview of some of the known failure modes that are leading to unintended consequences in AI development, as well as research agendas and initiatives to mitigate them, including a number that are underway at the Partnership on AI (PAI). Important case studies include the use of algorithmic risk assessment tools in the US criminal justice system, and the side-effects that are caused by using deliberate or unintended optimization processes to design high-stakes technical and bureaucratic systems. These are important in their own right, but they are also important contributors to conversations about social good applications of AI, which are also subject to significant potential for unintended consequences.
Speaker bio:
Peter Eckersley is Director of Research at the Partnership on AI, a collaboration between the major technology companies, civil society and academia to ensure that AI is designed and used to benefit humanity. He leads PAI's research on machine learning policy and ethics, including projects within PAI itself and projects in collaboration with the Partnership's extensive membership. Peter's AI research interests are broad, including measuring progress in the field, figuring out how to translate ethical and safety concerns into mathematical constraints, finding the right metaphors and ways of thinking about AI development, and setting sound policies around high-stakes applications such as self-driving vehicles, recidivism prediction, cybersecurity, and military applications of AI. Prior to joining PAI, Peter was Chief Computer Scientist for the Electronic Frontier Foundation. At EFF he led a team of technologists that launched numerous computer security and privacy projects including Let's Encrypt and Certbot, Panopticlick, HTTPS Everywhere, the SSL Observatory and Privacy Badger; they also worked on diverse Internet policy issues including campaigning to preserve open wireless networks; fighting to keep modern computing platforms open; helping to start the campaign against the SOPA/PIPA Internet blacklist legislation; and running the first controlled tests to confirm that Comcast was using forged reset packets to interfere with P2P protocols. Peter holds a PhD in computer science and law from the University of Melbourne; his research focused on the practicality and desirability of using alternative compensation systems to legalize P2P file sharing and similar distribution tools while still paying authors and artists for their work. He currently serves on the board of the Internet Security Research Group and the Advisory Council of the Open Technology Fund; he is an Affiliate of the Center for International Security and Cooperation at Stanford University and a Distinguished Technology Fellow at EFF.

Abstract 19: Learning Global Variations in Outdoor PM_2.5 Concentrations with Satellite Images in AI For Social Good (AISG), 04:20 PM

The World Health Organization identifies outdoor fine particulate air pollution (PM2.5) as a leading risk factor for premature mortality globally. As such, understanding the global distribution of PM2.5 is an essential precursor towards implementing pollution mitigation strategies and modelling global public health. Here, we present a convolutional neural network based approach for estimating annual average outdoor PM2.5 concentrations using only satellite images. The resulting model achieves comparable performance to current state-of-the-art statistical models.

Speaker bio:
- Kris Y Hong is a research assistant and prospective PhD student in the Weichenthal Lab at McGill University, in Montreal, Canada. His interests lie in applying current statistical and machine learning techniques towards solving humanitarian and environmental challenges. Prior to joining McGill, he was a data analyst at the British Columbia Centre for Disease Control while receiving his B.Sc. in Statistics from the University of British Columbia.

Abstract 20: Pareto Efficient Fairness for Skewed Subgroup Data in AI For Social Good (AISG), Balashankar 04:30 PM

As awareness of the potential for learned models to amplify existing societal biases increases, the field of ML fairness has developed mitigation techniques. A prevalent method applies constraints, including equality of performance, with respect to subgroups defined over the intersection of sensitive attributes such as race and gender. Enforcing such constraints when the subgroup populations are considerably skewed with respect to a target can lead to unintentional degradation in performance, without benefiting any individual subgroup, counter to the United Nations Sustainable Development goals of reducing inequalities and promoting growth. In order to avoid such performance degradation while ensuring equitable treatment to all groups, we propose Pareto-Efficient Fairness (PEF), which identifies the operating point on the Pareto curve of subgroup performances closest to the fairness hyperplane. Specifically, PEF finds a Pareto Optimal point which maximizes multiple subgroup accuracy measures. The algorithm scalarizes using the adaptive weighted metric norm by iteratively searching the Pareto region of all models enforcing the fairness constraint. PEF is backed by strong theoretical results on discoverability and provides domain practitioners finer control in navigating both convex and non-convex accuracy-fairness trade-offs. Empirically, we show that PEF increases performance of all subgroups in skewed synthetic data and UCI datasets.
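As a rough illustration of the kind of scalarization the abstract refers to, the sketch below scores candidate models by a weighted Chebyshev distance of their per-subgroup accuracies from an ideal point and picks the closest one; the weights, candidate models and the specific norm are illustrative assumptions, not the authors' exact algorithm.

```python
import numpy as np

def weighted_chebyshev_score(subgroup_accuracies, weights, ideal=1.0):
    # Distance of a model's per-subgroup accuracies from the ideal point
    # under a weighted Chebyshev (L-infinity) metric.
    acc = np.asarray(subgroup_accuracies, dtype=float)
    w = np.asarray(weights, dtype=float)
    return np.max(w * (ideal - acc))

def select_closest_to_ideal(models, weights):
    # Among candidate models (each summarised by its subgroup accuracies),
    # pick the one closest to the ideal point under the weighted metric.
    scores = [weighted_chebyshev_score(m, weights) for m in models]
    return int(np.argmin(scores))

# Toy example: three candidate models evaluated on two subgroups.
models = [(0.92, 0.70), (0.88, 0.84), (0.80, 0.86)]
print(select_closest_to_ideal(models, weights=(0.5, 0.5)))  # -> 1
```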

Speaker bio:
Ananth Balashankar is a 2nd year Ph.D student in Computer Science advised by Prof. Lakshminarayanan Subramanian at NYU's Courant Institute of Mathematical Sciences. He is currently interested in Interpretable Machine Learning and the challenges involved in applying machine perception to the domains of policy, privacy, economics and healthcare.

Abstract 21: Crisis Sub-Events on Social Media: A Case Study of Wildfires in AI For Social Good (AISG), 04:40 PM


Social media has been extensively used for crisis management. Recent work examines possible sub-events as a major crisis unfolds. In this project, we first propose a framework to identify sub-events from tweets. Then, leveraging 4 California wildfires in 2018-2019 as a case study, we investigate how sub-events cascade based on existing hypotheses drawn from the disaster management literature, and find that most hypotheses are supported on social media, e.g., fire induces smoke, which causes air pollution, which later harms health and eventually affects the healthcare system. In addition, we discuss other unexpected sub-events that emerge from social media.

Speaker bio:
Alejandro (Alex) Jaimes is Chief Scientist and SVP of AI at Dataminr. Alex has 15+ years of intl. experience in research and product impact at scale. He has published 100+ technical papers in top-tier conferences and journals on diverse topics in AI and has been featured widely in the press (MIT Tech Review, CNBC, Vice, TechCrunch, Yahoo! Finance, etc.). He has given 80+ invited talks (AI for Good Global Summit (UN, Geneva), the Future of Technology Summit, O'Reilly (AI, Strata, Velocity), Deep Learning Summit, etc.). Alex is also an Endeavor Network mentor (which leads the high-impact entrepreneurship movement around the world), and was an early voice in Human-Centered AI (Computing). He holds a Ph.D. from Columbia U.

Abstract 22: Towards Detecting Dyslexia in Children's Handwriting Using Neural Networks in AI For Social Good (AISG), 04:50 PM

Dyslexia is a learning disability that hinders a person's ability to read. Dyslexia needs to be caught early; however, teachers are not trained to detect dyslexia and screening tests are used inconsistently. We propose (1) two new data sets of handwriting collected from children with and without dyslexia, amounting to close to 500 handwriting samples, and (2) an automated early screening technique to be used in conjunction with current approaches, to accelerate the detection process. Preliminary results suggest our system out-performs teachers.

Speaker bio:
Katie Spoon recently completed her B.S./M.S. in computer science from Indiana University with minors in math and statistics, and with research interests in anomaly detection, computer vision, data visualization, and applications of computer vision to health and education, like her senior thesis detecting dyslexia with neural networks. She worked at IBM Research in the summer of 2018 on neuromorphic computing, and will be returning there full-time. She hopes to potentially get a PhD and become a corporate research scientist.

Abstract 23: Assisting Vulnerable Communities through AI and OR: from Data to Deployed Decisions in AI For Social Good (AISG), 05:00 PM

N/A

Speaker bio:
Phebe Vayanos is Assistant Professor of Industrial & Systems Engineering and Computer Science at the University of Southern California, and Associate Director of the CAIS Center for Artificial Intelligence in Society. Her research aims to address fundamental questions in data-driven optimization (aka prescriptive analytics) with the aim to tackle real-world decision- and policy-making problems in uncertain and adversarial environments.


Synthetic Realities: Deep Learning for Detecting AudioVisual Fakes

Battista Biggio, Pavel Korshunov, Thomas Mensink, Giorgio Patrini, Arka Sadhu, Delip Rao

104 C, Sat Jun 15, 08:30 AM


With the latest advances of deep generative models, synthesis of images and videos as well as of human voices has achieved impressive realism. In many domains, synthetic media are already difficult to distinguish from real by the human eye and ear. The potential for misuse of these technologies is seldom discussed in academic papers; instead, vocal concerns are rising from media and security organizations as well as from governments. Researchers are starting to experiment with new ways to integrate deep learning with traditional media forensics and security techniques as part of a technological solution. This workshop will bring together experts from the communities of machine learning, computer security and forensics in an attempt to highlight recent work and discuss future efforts to address these challenges. Our agenda will alternate contributed papers with invited speakers. The latter will emphasize connections among the interested scientific communities and the standpoint of institutions and media organizations.

Schedule

09:00 AM  Welcome Remarks (Patrini)
09:10 AM  Invited Talk by Professor Alexei Efros (UC Berkeley)
09:40 AM  Invited Talk by Dr. Matt Turek (DARPA)
10:10 AM  Contributed Talk: Limits of Deepfake Detection: A Robust Estimation Viewpoint
10:30 AM  Poster session 1 and Coffee break
11:30 AM  Invited Talk by Professor Pawel Korus (NYU): Neural Imaging Pipelines - the Scourge or Hope of Forensics?
12:00 PM  Contributed Talk: We Need No Pixels: Video Manipulation Detection Using Stream Descriptors (Güera)
12:15 PM  Contributed Talk: A Utility-Preserving GAN for Face Obscuration (Hao)
12:30 PM  Lunch Break
02:00 PM  Invited Talk by Professor Luisa Verdoliva (University Federico II Naples)
02:30 PM  Contributed Talk: Tampered Speaker Inconsistency Detection with Phonetically Aware Audio-visual Features (Korshunov)
02:45 PM  Contributed Talk: Measuring the Effectiveness of Voice Conversion on Speaker Identification and Automatic Speech Recognition Systems (Keskin)
03:00 PM  Poster session 2 and Coffee break
03:45 PM  CVPR19 Media Forensics workshop: a Preview
04:15 PM  Invited Talk by Tom Van de Weghe (Stanford & VRT)
04:45 PM  Invited Talk by Aviv Ovadya (Thoughtful Technology Project)
05:00 PM  Panel Discussion moderated by Delip Rao

Abstracts (5):

Abstract 4: Contributed Talk: Limits of Deepfake Detection: A Robust Estimation Viewpoint in Synthetic Realities: Deep Learning for Detecting AudioVisual Fakes, 10:10 AM

Deepfake detection is formulated as a hypothesis testing problem to classify an image as genuine or GAN-generated. A robust statistics view of GANs is considered to bound the error probability for various GAN implementations in terms of their performance. The bounds are further simplified using a Euclidean approximation for the low error regime. Lastly, relationships between error probability and epidemic thresholds for spreading processes in networks are established.
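For readers unfamiliar with the framing, the generic binary hypothesis test behind this kind of formulation can be written as follows; this is standard textbook notation, not the paper's specific bound.

```latex
% Generic binary hypothesis test for deepfake detection (textbook form,
% not the paper's bound). \delta is the detector, \pi_0, \pi_1 are priors.
H_0:\ x \sim p_{\mathrm{real}}, \qquad H_1:\ x \sim p_{\mathrm{GAN}}
\\[4pt]
P_e = \pi_0 \Pr[\delta(x)=1 \mid H_0] + \pi_1 \Pr[\delta(x)=0 \mid H_1]
\\[4pt]
\delta^{\star}(x) = \mathbf{1}\!\left\{ \frac{p_{\mathrm{GAN}}(x)}{p_{\mathrm{real}}(x)} > \frac{\pi_0}{\pi_1} \right\}
\quad \text{(likelihood-ratio test minimizing } P_e\text{)}
```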

Abstract 7: Contributed Talk: We Need No Pixels: Video Manipulation Detection Using Stream Descriptors in Synthetic Realities: Deep Learning for Detecting AudioVisual Fakes, Güera 12:00 PM

Manipulating video content is easier than ever. Due to the misuse potential of manipulated content, multiple detection techniques that analyze the pixel data from the videos have been proposed. However, clever manipulators should also carefully forge the metadata and auxiliary header information, which is harder to do for videos than images. In this paper, we propose to identify forged videos by analyzing their multimedia stream descriptors with simple binary classifiers, completely avoiding the pixel space. Using well-known datasets, our results show that this scalable approach can achieve a high manipulation detection score if the manipulators have not done a careful data sanitization of the multimedia stream descriptors.
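A minimal sketch of the general idea (classifying videos from container metadata rather than pixels) might look like the following; the feature names, toy labels and choice of scikit-learn classifier are illustrative assumptions, not the descriptor set or model used in the paper.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical stream/container descriptors for a handful of videos.
videos = [
    {"codec": "h264", "bitrate_kbps": 3500, "fps": 29.97, "num_streams": 2},
    {"codec": "h264", "bitrate_kbps": 1200, "fps": 30.0,  "num_streams": 2},
    {"codec": "hevc", "bitrate_kbps": 800,  "fps": 25.0,  "num_streams": 1},
    {"codec": "h264", "bitrate_kbps": 950,  "fps": 30.0,  "num_streams": 1},
]
labels = [0, 1, 1, 0]  # 0 = pristine, 1 = manipulated (toy labels)

# One-hot encode categorical fields, keep numeric ones, then fit a linear classifier.
clf = make_pipeline(DictVectorizer(sparse=False), LogisticRegression(max_iter=1000))
clf.fit(videos, labels)
print(clf.predict([{"codec": "h264", "bitrate_kbps": 1100, "fps": 30.0, "num_streams": 2}]))
```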

Abstract 8: Contributed Talk: A Utility-Preserving GAN for Face Obscuration in Synthetic Realities: Deep Learning for Detecting AudioVisual Fakes, Hao 12:15 PM

From TV news to Google StreetView, face obscuration has been used for privacy protection. Due to recent advances in the field of deep learning, obscuration methods such as Gaussian blurring and pixelation are not guaranteed to conceal identity. In this paper, we propose a utility-preserving generative model, UP-GAN, that is able to provide an effective face obscuration, while preserving facial utility. By utility-preserving we mean preserving facial features that do not reveal identity, such as age, gender, skin tone, pose, and expression. We show that the proposed method achieves a better performance than the common obscuration methods in terms of obscuration and utility preservation.

Abstract 11: Contributed Talk: Tampered Speaker Inconsistency Detection with Phonetically Aware Audio-visual Features in Synthetic Realities: Deep Learning for Detecting AudioVisual Fakes, Korshunov 02:30 PM

The recent increase in social media based propaganda, i.e., 'fake news', calls for automated methods to detect tampered content. In this paper, we focus on detecting tampering in a video with a person speaking to a camera. This form of manipulation is easy to perform, since one can just replace a part of the audio, dramatically changing the meaning of the video. We consider several detection approaches based on phonetic features and recurrent networks. We demonstrate that by replacing standard MFCC features with embeddings from a DNN trained for automatic speech recognition, combined with mouth landmarks (visual features), we can achieve a significant performance improvement on several challenging publicly available databases of speakers (VidTIMIT, AMI, and GRID), for which we generated sets of tampered data. The evaluations demonstrate a relative equal error rate reduction of 55% (to 4.5% from 10.0%) on the large GRID corpus based dataset and a satisfying generalization of the model on other datasets.

Abstract 12: Contributed Talk: Measuring the Effectiveness of Voice Conversion on Speaker Identification and Automatic Speech Recognition Systems in Synthetic Realities: Deep Learning for Detecting AudioVisual Fakes, Keskin 02:45 PM

This paper evaluates the effectiveness of a CycleGAN based voice converter (VC) on four speaker identification (SID) systems and an automated speech recognition (ASR) system for various purposes. Audio samples converted by the VC model are classified by the SID systems as the intended target at up to 46% top-1 accuracy among more than 250 speakers. This encouraging result in imitating the target styles led us to investigate if converted (synthetic) samples can be used to improve ASR training. Unfortunately, adding synthetic data to the ASR training set only marginally improves word and character error rates. Our results indicate that even though VC models can successfully mimic the style of target speakers as measured by SID systems, improving ASR training with synthetic data from VC systems needs further research to establish its efficacy.


ICML Workshop on Imitation, Intent, and Interaction (I3)

Nicholas Rhinehart, Sergey Levine, Chelsea Finn, He He, Ilya Kostrikov, Justin Fu, Siddharth Reddy

201, Sat Jun 15, 08:30 AM

**Website**: [https://sites.google.com/view/icml-i3](https://sites.google.com/view/icml-i3)

**Abstract:** A key challenge for deploying interactive machine learning systems in the real world is the ability for machines to understand human intent. Techniques such as imitation learning and inverse reinforcement learning are popular data-driven paradigms for modeling agent intentions and controlling agent behaviors, and have been applied to domains ranging from robotics and autonomous driving to dialogue systems. Such techniques provide a practical solution to specifying objectives to machine learning systems when they are difficult to program by hand.

While significant progress has been made in these areas, most research effort has concentrated on modeling and controlling single agents from dense demonstrations or feedback. However, the real world has multiple agents, and dense expert data collection can be prohibitively expensive. Surmounting these obstacles requires progress in frontiers such as:
1) the ability to infer intent from multiple modes of data, such as language or observation, in addition to traditional demonstrations.
2) the ability to model multiple agents and their intentions, both in cooperative and adversarial settings.
3) handling partial or incomplete information from the expert, such as demonstrations that lack dense action annotations, raw videos, etc.


02:05 Natasha Jaques The workshop on Imitation, Intention, and PM Interaction (I3) seeks contributions at the interface 02:25 of these frontiers, and will bring together Pierre Sermanet researchers from multiple disciplines such as PM robotics, imitation and reinforcement learning, 02:45 Nicholas R cognitive science, AI safety, and natural language PM Waytowich understanding. Our aim will be to reexamine the 03:05 Poster session and assumptions in standard imitation learning problem PM coffee statements (e.g., inverse reinforcement learning) and connect distinct application disciplines, such as 04:30 Kris Kitani robotics and NLP, with researchers developing core PM imitation learning algorithms. In this way, we hope 04:50 Abhishek Das to arrive at new problem formulations, new research PM directions, and the development of new connections across distinct disciplines that interact with imitation learning methods. Abstracts (7):

Schedule Abstract 2: Hal Daumé III in ICML Workshop on Imitation, Intent, and Interaction (I3), 09:00 AM 08:45 Welcoming Title: Beyond demonstrations: Learning behavior AM Remarks from higher-level supervision 09:00 Hal Daumé III AM Abstract 3: Joyce Chai in ICML Workshop on Imitation, Intent, and Interaction (I3), 09:20 AM 09:20 Joyce Chai AM Title: Collaboration in Situated Language 09:40 Communication Stefano Ermon AM Abstract 4: Stefano Ermon in ICML Workshop on 10:00 Iris R. Seaman Imitation, Intent, and Interaction (I3), 09:40 AM AM

10:20 Poster session and Multi-agent Imitation and Inverse Reinforcement AM coffee Learning

11:30 Changyou Chen Abstract 5: Iris R. Seaman in ICML Workshop on AM Imitation, Intent, and Interaction (I3), 10:00 AM 11:50 Faraz Torabi AM Title: Nested Reasoning About Autonomous Agents Using Probabilistic Programs 12:10 Seyed Kamyar PM Seyed Ghasemipour Abstract 11: Natasha Jaques in ICML Workshop 12:30 on Imitation, Intent, and Interaction (I3), 02:05 Lunch break PM PM


Title: Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning

We propose a unified mechanism for achieving coordination and communication in Multi-Agent Reinforcement Learning (MARL), through rewarding agents for having causal influence over other agents' actions. Causal influence is assessed using counterfactual reasoning. At each timestep, an agent simulates alternate actions that it could have taken, and computes their effect on the behavior of other agents. Actions that lead to bigger changes in other agents' behavior are considered influential and are rewarded. We show that this is equivalent to rewarding agents for having high mutual information between their actions. Empirical results demonstrate that influence leads to enhanced coordination and communication in challenging social dilemma environments, dramatically increasing the learning curves of the deep RL agents, and leading to more meaningful learned communication protocols. The influence rewards for all agents can be computed in a decentralized way by enabling agents to learn a model of other agents using deep neural networks. In contrast, key previous works on emergent communication in the MARL setting were unable to learn diverse policies in a decentralized manner and had to resort to centralized training. Consequently, the influence reward opens up a window of new opportunities for research in this area.
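As a rough sketch of the counterfactual computation described above, the toy function below scores each of agent A's possible actions by how much agent B's (modelled) policy shifts relative to B's marginal behavior; the tabular policies and the KL-based influence measure are illustrative assumptions rather than the authors' exact implementation.

```python
import numpy as np

def influence_rewards(p_b_given_a, p_a):
    """Counterfactual influence of agent A on agent B in one state.

    p_b_given_a: (n_actions_A, n_actions_B) array; row i is B's (modelled)
        policy given that A takes action i.
    p_a: (n_actions_A,) array; A's own policy in this state.

    Returns one reward per action A could take: the KL divergence between
    B's action distribution conditioned on that action and B's marginal
    distribution with A's action counterfactually averaged out.
    """
    marginal_b = p_a @ p_b_given_a                     # sum_a' p(b|a') p(a')
    return np.sum(p_b_given_a * np.log(p_b_given_a / marginal_b), axis=1)

# Toy example with two actions per agent.
p_b_given_a = np.array([[0.9, 0.1],    # B's response if A picks action 0
                        [0.2, 0.8]])   # B's response if A picks action 1
p_a = np.array([0.5, 0.5])
print(influence_rewards(p_b_given_a, p_a))
```

Averaging these per-action rewards under A's own policy recovers the mutual information between the two agents' actions, which is the equivalence the abstract refers to.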

Abstract 12: Pierre Sermanet in ICML Workshop on Imitation, Intent, and Interaction (I3), 02:25 PM

Title: Self-Supervision and Play

Abstract: Real-world robotics is too complex to supervise with labels or through reward functions. While some amount of supervision is necessary, a more scalable approach instead is to bootstrap learning through self-supervision by first learning general task-agnostic representations. Specifically, we argue that we should learn from large amounts of unlabeled play data. Play serves as a way to explore and learn the breadth of what is possible in an undirected way. This strategy is widely used in nature to prepare oneself to achieve future tasks without knowing in advance which ones. In this talk, we present methods for learning vision and control representations entirely from unlabeled sequences. We demonstrate these representations self-arrange semantically and functionally and can be used for downstream tasks, without ever using labels or rewards.

Abstract 15: Kris Kitani in ICML Workshop on Imitation, Intent, and Interaction (I3), 04:30 PM

Title: Multi-modal trajectory forecasting


Coding Theory For Large-scale Machine Learning

Viveck Cadambe, Pulkit Grover, Dimitris Papailiopoulos, Gauri Joshi

202, Sat Jun 15, 08:30 AM

# Coding Theory For Large-scale Machine Learning

Coding theory involves the art and science of how to add redundancy to data to ensure that a desirable output is obtained despite deviations from ideal behavior from the system components that interact with the data. Through a rich, mathematically elegant set of techniques, coding theory has come to significantly influence the design of modern data communications, compression and storage systems. The last few years have seen a rapidly growing interest in coding theory based approaches for the development of efficient machine learning algorithms towards robust, large-scale, distributed computational pipelines.

The CodML workshop brings together researchers developing coding techniques for machine learning,

as well as researchers working on systems implementations for computing, with cutting-edge presentations from both sides. The goal is to learn about non-idealities in system components as well as approaches to obtain reliable and robust learning despite these non-idealities, and identify problems of future interest.

The workshop is co-located with ICML 2019, and will be held in Long Beach, California, USA on June 14th or 15th, 2019.

Please see the [website](https://sites.google.com/view/codml2019) for more details.

## Call for Posters

### Scope of the Workshop

In this workshop we solicit research papers focused on the application of coding and information-theoretic techniques for distributed machine learning. More broadly, we seek papers that address the problem of making machine learning more scalable, efficient, and robust. Both theoretical as well as experimental contributions are welcome. We invite authors to submit papers on topics including but not limited to:

* Asynchronous Distributed Training Methods
* Communication-Efficient Training
* Model Compression and Quantization
* Gradient Coding, Compression and Quantization
* Erasure Coding Techniques for Straggler Mitigation
* Data Compression in Large-scale Machine Learning
* Erasure Coding Techniques for ML Hardware Acceleration
* Fast, Efficient and Scalable Inference
* Secure and Private Machine Learning
* Data Storage/Access for Machine Learning Jobs
* Performance evaluation of coding techniques

### Submission Format and Instructions

The authors should prepare extended abstracts in the ICML paper format and submit via [CMT](https://cmt3.research.microsoft.com/CODMLW2019/). Submitted papers may not exceed three (3) single-spaced double-column pages excluding references. All results, proofs, figures, tables must be included in the 3 pages. The submitted manuscripts should include author names and affiliations, and an abstract that does not exceed 250 words. The authors may include a link to an extended version of the paper that includes supplementary material (proofs, experimental details, etc.) but the reviewers are not required to read the extended version.

### Dual Submission Policy

Accepted submissions will be considered non-archival and can be submitted elsewhere without modification, as long as the other conference allows it. Moreover, submissions to CodML based on work recently accepted to other venues are also acceptable (though authors should explicitly make note of this in their submissions).

### Key Dates

**Paper Submission:** May 3rd, 2019, 11:59 PM anywhere on earth

**Decision Notification:** May 12th, 2019.

**Workshop date:** June 14 or 15, 2019

Schedule

08:45 AM  Opening Remarks


Salman Avestimehr: Draper, Aktas, Guler, Lagrange Coded Wang, Gandikota, Computing: Optimal Park, So, Tauz, 09:00 Design for Resilient,Avestimehr Narra, Lin, AM Secure, and Private Maddahali, Yang, Distributed Dutta, Reisizadeh, 10:30 Learning Poster Session I Wang, Balevi, Jain, AM McVay, Rudow, Soto, Vivienne Sze: Li, Subramaniam, Exploiting Demirhan, Gupta, 09:30 redundancy for Sze Oktay, Barnes, Ballé, AM efficient processing Haddadpour, Jeong, of DNNs and Chen, Fahim beyond Rashmi Vinayak: "Locality Driven Resilient ML Coded 11:30 inference via coded 10:00 Computation" Vinayak Rudow AM computation: A AM Michael Rudow, learning-based Rashmi Vinayak and approach Venkat Guruswami 12:00 "CodeNet: Training Lunch Break PM Large-Scale Neural Networks in Markus Weimer: A Presence of 01:30 case for coded 10:10 Weimer Soft-Errors," Dutta PM computing on AM Sanghamitra Dutta, elastic compute Ziqian Bai, Tze Alex Dimakis: Meng Low and 02:00 Coding Theory for Pulkit Grover Dimakis PM Distributed "Reliable Clustering Learning with Redundant Wei Zhang: 10:20 Data Assignment" Gandikota Distributed deep AM Venkat Gandikota, learning system Arya Mazumdar and 02:30 building at IBM: Zhang Ankit Singh Rawat PM Scale-up and Scale-out case studies


"OverSketched 203, Sat Jun 15, 08:30 AM Newton: Fast Research at the intersection of vision and language Convex has been attracting a lot of attention in recent years. Optimization for Topics include the study of multi-modal Serverless representations, translation between modalities, 03:30 Systems," Vipul Gupta bootstrapping of labels from one modality into PM Gupta, Swanand another, visually-grounded question answering, Kadhe, Thomas segmentation and storytelling, and grounding the Courtade, Michael meaning of language in visual data. An Mahoney and ever-increasing number of tasks and datasets are Kannan appearing around this recently-established field. Ramchandran

"Cooperative SGD: At NeurIPS 2018, we released the How2 data-set, A Unified containing more than 85,000 (2000h) videos, with Framework for the audio, transcriptions, translations, and textual Design and summaries. We believe it presents an ideal 03:40 Analysis of Wang resource to bring together researchers working on PM Communication-Efficient the previously mentioned separate tasks around a SGD Algorithms", single, large dataset. This rich dataset will facilitate Jianyu Wang and the comparison of tools and algorithms, and Gauri Joshi hopefully foster the creation of additional "Secure Coded annotations and tasks. We want to foster discussion Multi-Party about useful tasks, metrics, and labeling Computation for techniques, in order to develop a better Massive Matrices understanding of the role and value of 03:50 with Adversarial multi-modality in vision and language. We seek to Maddahali PM Nodes," Seyed create a venue to encourage collaboration between Reza, Mohammad different sub-fields, and help establish new research Ali Maddah-Ali and directions and collaborations that we believe will Mohammad Reza sustain machine learning research for years to Aref come.

04:00 Poster Session II [Workshop PM Homepage](https://srvk.github.io/how2-challenge/)

Schedule

The How2 Challenge: New Tasks for Vision 08:45 Welcome & Language AM

09:00 The How2 Database Florian Metze, Lucia Specia, Desmond Elliot, Specia, Sanabria Loic Barrault, Ramon Sanabria, Shruti Palaskar AM and Challenge


10:15 Coffee Break AM Abstracts (2): Forcing Vision + 10:30 Language Models Abstract 7: Multi-agent communication from raw Parikh AM To Actually See, Not perceptual input: what works, what doesn't and Just Talk what's next in The How2 Challenge: New Tasks for Vision & Language, Lazaridou 01:30 PM Topics in Vision and Language: 11:00 Multi-agent communication has been traditionally Grounding, AM used as a computational tool to study language Segmentation and evolution. Recently, it has attracted attention also as Author Anonymity a means to achieve better coordination among Learning to Reason: multiple interacting agents in complex Modular and environments. However, is it easy to scale previous Relational research in the new deep learning era? In this talk, I 11:30 Representations for Saenko will first give a brief overview of some of the AM Visual Questions previous approaches that study emergent and Referring communication in cases where agents are given as Expressions input symbolic data. I will then move on to presenting some of the challenges that agents face Multi-agent when are placed in grounded environments where communication they receive raw perceptual information and how 01:30 from raw perceptual Lazaridou environmental or pre-linguistic conditions affect the PM input: what works, nature of the communication protocols that they what doesn't and learn. Finally, I will discuss some potential remedies what's next that are inspired from human language and 02:00 Overcoming Bias in communication. Hendricks PM Captioning Models Abstract 8: Overcoming Bias in Captioning 02:30 Embodied language Fragkiadaki Models in The How2 Challenge: New Tasks for PM grounding Vision & Language, Hendricks 02:00 PM Sanabria, Srinivasan, 03:00 Poster Session and Raunak, Zhou, Most machine learning models are known to PM Coffee Kundu, Patel, Specia, capture and exploit bias. While this can be Choe, Belova beneficial for many classification tasks (e.g., it might be easier to recognize a computer mouse given the Unsupervised context of a computer and a desk), exploiting bias Bilingual Lexicon 04:30 can also lead to incorrect predictions. In this talk, I Induction from Jin PM will first consider how over-reliance on bias might mono-lingual lead to incorrect predictions in a scenario where is multimodal data inappropriate to rely on bias: gender prediction in 05:00 New Directions for Metze, Palaskar image captioning. I will present the Equalizer model PM Vision & Language which more accurately describes people and their

Page 94 of 107 ICML 2019 Workshop book Generated Sat Jul 06, 2019 gender by considering appropriate gender evidence. algorithms capable of composing and performing Next, I will consider how bias is related to completely original (and compelling) works of music. hallucination, an interesting error mode in image These algorithms require a similar deep captioning. I will present a metric designed to understanding of music and present challenging measure hallucination and consider questions like new problems for the machine learning and AI what causes hallucination, which models are prone community at large. to hallucination, and do current metrics accurately capture hallucination? This workshop proposal is timely in that it will bridge these separate pockets of otherwise very related research. And in addition to making progress on the Machine Learning for Music Discovery challenges above, we hope to engage the wide AI and machine learning community with our nebulous Erik Schmidt, Oriol Nieto, Fabien Gouyon, problem space, and connect them with the many Katherine Kinnaird, Gert Lanckriet available datasets the MIR community has to offer (e.g., Audio Set, AcousticBrainz, Million Song 204, Sat Jun 15, 08:30 AM Dataset), which offer near commercial scale to the academic research community. The ever-increasing size and accessibility of vast music libraries has created a demand more than Schedule ever for artificial systems that are capable of understanding, organizing, or even generating such From Listening to complex data. While this topic has received Watching, A relatively marginal attention within the machine 09:00 Recommender Raimond learning community, it has been an area of intense AM Systems focus within the community of Music Information Perspective Retrieval (MIR). While significant progress has been made, these problems remain far from solved. Agrawal, Choi, Poster Gururani, Hamidi, 10:00 Furthermore, the recommender systems community Presentations (Part Jhamtani, Kopparti, AM has made great advances in terms of collaborative 1) Krause, Lee, Pati, feedback recommenders, but these approaches Zhdanov suffer strongly from the cold-start problem. As such, Making Efficient use recommendation techniques often fall back on 11:00 of Musical McFee content-based machine learning systems, but AM Annotations defining musical similarity is extremely challenging as myriad features all play some role (e.g., cultural, Two-level 11:20 Explanations in emotional, timbral, rhythmic). Thus, for machines Haunschmid must actually understand music to achieve an AM Music Emotion expert level of music recommendation. Recognition Characterizing On the other side of this problem sits the recent 11:40 Musical Correlates Kaneshiro explosion of work in the area of machine creativity. AM of Large-Scale Relevant examples are both Google Magenta and Discovery Behavior the startup Jukedeck, who seek to develop


NPR: Neural Abstract 1: From Listening to Watching, A 12:00 Personalised Recommender Systems Perspective in Machine Levy PM Ranking for Song Learning for Music Discovery, Raimond 09:00 AM Selection In this talk, I'll be discussing a few key differences 02:00 Personalization at between recommending music and recommending Ellis PM Amazon Music movies or TV shows, and how these differences can A Model-Driven lead to vastly different designs, approaches, and Exploration of algorithms to find the best possible recommendation 02:20 Accent Within the Noufi for a user. On the other hand, I'll also discuss some PM Amateur Singing common challenges and some of our recent Voice research on these topics, such as better understanding the impact of a recommendation, What’s Broken in enable better offline metrics, or optimizing for Music Informatics 02:40 longer-term outcomes. Most importantly, I'll try to Research? Three Salamon PM leave a lot of time for questions and discussions. Uncomfortable Statements Abstract 3: Making Efficient use of Musical Agrawal, Choi, Annotations in Machine Learning for Music Poster Gururani, Hamidi, Discovery, McFee 11:00 AM 03:00 Presentations (Part Jhamtani, Kopparti, PM Many tasks in audio-based music analysis require 2) Krause, Lee, Pati, building mappings between complex Zhdanov representational spaces, such as the input audio User-curated signal (or spectral representation), and structured, 04:30 shaping of time-varying output such as pitch, harmony, Shi PM expressive instrumentation, rhythm, or structure. These performances mappings encode musical domain knowledge, and 04:50 Interactive Neural involve processing and integrating knowledge at Hantrakul PM Audio Synthesis multiple scales simultaneously. It typically takes humans years of training and practice to master Visualizing and these concepts, and as a result, data collection for 05:10 Understanding Won sophisticated musical analysis tasks is often costly PM Self-attention based and time-consuming. With limited available data Music Tagging with reliable annotations, it can be difficult to build A CycleGAN for robust models to automate music annotation by 05:30 style transfer computational means. However, musical problems Vande Veire PM between drum & often exhibit a great deal of structure, either in the bass subgenres input or output representations, or even between related tasks, which can be effectively leveraged to reduce data requirements. In this talk, I will survey Abstracts (6): several recent manifestations of this phenomenon across different music and audio analysis problems, drawing on recent work from the NYU Music and


Audio Research Lab. resorting to definition-by-annotation. How well do past/present/future algorithms perform? Despite Abstract 5: Characterizing Musical Correlates of known limitations with existing datasets and metrics, Large-Scale Discovery Behavior in Machine we still (mostly) stick to the same ones. And last but Learning for Music Discovery, Kaneshiro 11:40 not least, why do melody extraction at all? Despite AM great promise (e.g. query-by-humming, large-scale musicological analyses, etc.), melody extraction has We seek to identify musical correlates of real-world seen limited application outside of MIR research. In discovery behavior by analyzing users' audio this talk I will present three problems that are identification queries from the Shazam service. common to several of research area in music Recent research has shown that such queries are informatics: the challenge of trying to model not uniformly distributed over the course of a song, ambiguous musical concepts by training models but rather form clusters that may implicate musically with somewhat arbitrary reference annotations, the salient events. Using a publicly available dataset of lack of model generalization in the face of small, Shazam queries, we extend this research and low-variance training sets, and the possible examine candidate musical features driving disconnect between parts of the music informatics increases in query likelihood. Our findings suggest a research community and the potential users of the relationship between musical novelty -- including but technologies it produces. not limited to structural segmentation boundaries -- and ensuing peaks in discovery-based musical Abstract 11: User-curated shaping of expressive engagement. performances in Machine Learning for Music Discovery, Shi 04:30 PM Abstract 7: Personalization at Amazon Music in Machine Learning for Music Discovery, Ellis Musical statements can be interpreted in 02:00 PM performance with a wide variety of stylistic and expressive inflections. We explore how different I’ll give an overview of some of the projects that we musical characters are performed based on an are working on to make Amazon Music more extension of the basis function models, a personalized for our customers. Projects include data-driven framework for expressive performance. personalized speech and language understanding In this framework, expressive dimensions such as for voice search, personalizing “Play Music” tempo, dynamics and articulation are modeled as a requests on Alexa, and work with traditional function of score features, i.e. numerical encodings recommender models as building blocks for many of specific aspects of a musical score, using neural customer experiences. networks. By allowing the user to weight the contribution of the input score features, we show Abstract 9: What’s Broken in Music Informatics that predictions of expressive dimensions can be Research? Three Uncomfortable Statements in used to express different musical characters Machine Learning for Music Discovery, Salamon 02:40 PM

Melody extraction has been an active topic of Workshop on Self-Supervised Learning research in Music Information Retrieval for decades now. And yet - what *is* a melody? As a community Aaron van den Oord, Yusuf Aytar, Carl Doersch, we still (mostly) shy away from this question, Carl Vondrick, Alec Radford, Pierre Sermanet, Amir Zamir, Pieter Abbeel


Grand Ballroom A, Sat Jun 15, 08:30 AM 10:00 Learning Latent AM Plans from Play Self-supervised learning is a promising alternative where proxy tasks are developed that allow models Using and agents to learn without explicit supervision in a Self-Supervised way that helps with downstream performance on 10:15 Learning Can tasks of interest. One of the major benefits of AM Improve Model self-supervised learning is increasing data Robustness and efficiency: achieving comparable or better Uncertainty performance with less labeled data or fewer 10:30 Poster Session + environment steps (in Reinforcement learning / AM Coffee Break Robotics). 11:30 The field of self-supervised learning (SSL) is rapidly Chelsea Finn evolving, and the performance of these methods is AM creeping closer to the fully supervised approaches. 12:00 Lunch However, many of these methods are still PM developed in domain-specific sub-communities, 02:00 such as Vision, RL and NLP, even though many Yann Lecun PM similarities exist between them. While SSL is an emerging topic and there is great interest in these Revisiting Self-Supervised techniques, there are currently few workshops, 02:30 Visual tutorials or other scientific events dedicated to this PM topic. Representation This workshop aims to bring together experts with Learning different backgrounds and applications areas to Data-Efficient Image share inter-domain ideas and increase 02:45 Recognition with cross-pollination, tackle current shortcomings and PM Contrastive explore new directions. The focus will be on the Predictive Coding machine learning point of view rather than the 03:00 Poster session + domain side. PM Coffee Break https://sites.google.com/corp/view/self-supervised-icml2019 04:00 PM Schedule 04:30 Abhinav Gupta PM 08:50 Opening remarks 05:00 AM Alexei Efros PM 09:00 Jacob Devlin AM

09:30 Alison Gopnik AM Learning and Reasoning with Graph-Structured Representations


Ethan Fetaya, Zhiting Hu Hu, Thomas Kipf, Yujia meta-learning, planning, and reinforcement Li, Xiaodan Liang, Renjie Liao, Raquel Urtasun, learning, covering approaches which include (but Hao Wang, Max Welling, Eric Xing, Richard are not limited to): Zemel -Deep learning methods on Grand Ballroom B, Sat Jun 15, 08:30 AM graphs/manifolds/relational data (e.g., graph neural networks) Graph-structured representations are widely used -Deep generative models of graphs (e.g., for drug as a natural and powerful way to encode design) information such as relations between objects or -Unsupervised graph/manifold/relational embedding entities, interactions between online users (e.g., in methods (e.g., hyperbolic embeddings) social networks), 3D meshes in computer graphics, -Optimization methods for multi-agent environments, as well as molecular graphs/manifolds/relational data structures, to name a few. Learning and reasoning -Relational or object-level reasoning in machine with graph-structured representations is gaining perception increasing interest in both academia and industry, -Relational/structured inductive biases for due to its fundamental advantages over more reinforcement learning, modeling multi-agent traditional unstructured methods in supporting behavior and communication interpretability, causality, transferability, etc. -Neural-symbolic integration Recently, there is a surge of new techniques in the -Theoretical analysis of capacity/generalization of context of deep learning, such as graph neural deep learning models for graphs/manifolds/ networks, for learning graph representations and relational data performing reasoning and prediction, which have -Benchmark datasets and evaluation metrics achieved impressive progress. However, it can still be a long way to go to obtain satisfactory results in Schedule long-range multi-step reasoning, scalable learning with very large graphs, flexible modeling of graphs 08:45 in combination with other dimensions such as Opening remarks AM temporal variation and other modalities such as language and vision. New advances in theoretical 09:00 William L. Hamilton, Hamilton foundations, models and algorithms, as well as AM McGill University empirical discoveries and applications are therefore Evolutionary all highly desirable. Representation 09:30 Learning for taheri The aims of this workshop are to bring together AM Dynamic Graphs; researchers to dive deeply into some of the most Aynaz Taheri and promising methods which are under active Tanya Berger-Wolf exploration today, discuss how we can design new Chen, and better benchmarks, identify impactful 09:45 Poster spotlights #1 Hadziosmanovic, application domains, encourage discussion and AM Ramírez Rivera foster collaboration. The workshop will feature speakers, panelists, and poster presenters from Morning poster 10:00 machine perception, natural language processing, session and coffee AM multi-agent behavior and communication, break


11:00 AM  Marwin Segler, Benevolent AI
11:30 AM  Yaron Lipman, Weizmann Institute of Science
12:00 PM  PAN: Path Integral Based Convolution for Deep Graph Neural Networks; Zheng Ma, Ming Li and Yu Guang Wang
12:15 PM  Poster spotlights #2 (Teixeira, Baldassarre)
12:30 PM  Lunch break
02:00 PM  Alex Polozov, Microsoft Research
02:30 PM  Sanja Fidler, University of Toronto
03:00 PM  On Graph Classification Networks, Datasets and Baselines; Enxhell Luzhnica, Ben Day and Pietro Lió
03:15 PM  Poster spotlights #3 (Kumar, De Cao, Chen)
03:30 PM  Afternoon poster session and coffee break (Tu, Zhang, Chen)
04:30 PM  Caroline Uhler, MIT
05:00 PM  Alexander Schwing, University of Illinois at Urbana-Champaign

Exploration in Reinforcement Learning Workshop

Surya Bhupatiraju, Benjamin Eysenbach, Shixiang Gu, Harrison Edwards, Martha White, Pierre-Yves Oudeyer, Kenneth Stanley, Emma Brunskill

Hall A, Sat Jun 15, 08:30 AM

Exploration is a key component of reinforcement learning (RL). While RL has begun to solve relatively simple tasks, current algorithms cannot complete complex tasks. Our existing algorithms often endlessly dither, failing to meaningfully explore their environments in search of high-reward states. If we hope to have agents autonomously learn increasingly complex tasks, these machines must be equipped with machinery for efficient exploration.

The goal of this workshop is to present and discuss exploration in RL, including deep RL, evolutionary algorithms, real-world applications, and developmental robotics. Invited speakers will share their perspectives on efficient exploration, and researchers will share recent work in spotlight presentations and poster sessions.

Schedule

09:00 AM  Doina Precup
09:30 AM  Spotlight Talks


10:00 AM  Poster Session #1 (Ali Taiga, Deshmukh, Rashid, Binas, Yasui, Pong, Imagawa, Clifton, Mysore, Tsai, Chuck, Vezzani, Eriksson)
11:00 AM  Emo Todorov
11:30 AM  Best Paper Talks
12:00 PM  Pieter Abbeel
12:30 PM  Lunch
02:00 PM  Raia Hadsell
02:30 PM  Lightning Talks
03:00 PM  Poster Session #2
04:00 PM  Martha White - Adapting Behaviour via Intrinsic Rewards to Learn Predictions
04:30 PM  Panel Discussion

Abstracts (3):

Abstract 3: Poster Session #1 in Exploration in Reinforcement Learning Workshop, Ali Taiga, Deshmukh, Rashid, Binas, Yasui, Pong, Imagawa, Clifton, Mysore, Tsai, Chuck, Vezzani, Eriksson, 10:00 AM

This is the first poster session and coffee break. All the papers will be presented at both poster sessions.

Abstract 10: Poster Session #2 in Exploration in Reinforcement Learning Workshop, 03:00 PM

This is the second poster session and coffee break. All the papers will be presented at both poster sessions.

Abstract 12: Panel Discussion in Exploration in Reinforcement Learning Workshop, 04:30 PM

We will have a panel on exploration with panelists Martha White, Jeff Clune, Pulkit Agrawal, and Pieter Abbeel, moderated by Doina Precup.

Identifying and Understanding Deep Learning Phenomena

Hanie Sedghi, Samy Bengio, Kenji Hata, Aleksander Madry, Ari Morcos, Behnam Neyshabur, Maithra Raghu, Ali Rahimi, Ludwig Schmidt, Ying Xiao

Hall B, Sat Jun 15, 08:30 AM

Our understanding of modern neural networks lags behind their practical successes. As this understanding gap grows, it poses a serious challenge to the future pace of progress because fewer pillars of knowledge will be available to designers of models and algorithms. This workshop aims to close this understanding gap in deep learning. It solicits contributions that view the behavior of deep nets as a natural phenomenon to investigate with methods inspired from the natural sciences, like physics, astronomy, and biology. We solicit empirical work that isolates phenomena in deep nets, describes them quantitatively, and then replicates or falsifies them.

As a starting point for this effort, we focus on the interplay between data, network architecture, and training algorithms. We are looking for contributions that identify precise, reproducible phenomena, as well as systematic studies and evaluations of current beliefs such as “sharp local minima do not generalize well” or “SGD navigates out of local minima”. Through the workshop, we hope to catalogue quantifiable versions of such statements, as well as demonstrate whether or not they occur reproducibly.

Schedule

08:45 AM  Opening Remarks
09:00 AM  Nati Srebro: Optimization’s Untold Gift to Learning: Implicit Regularization
09:30 AM  Bad Global Minima Exist and SGD Can Reach Them
09:45 AM  Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask
10:00 AM  Chiyuan Zhang: Are all layers created equal? -- Studies on how neural networks represent functions
10:30 AM  Break and Posters
11:00 AM  Line attractor dynamics in recurrent networks for sentiment classification
11:15 AM  Do deep neural networks learn shallow learnable examples first?
11:30 AM  Crowdsourcing Deep Learning Phenomena
12:00 PM  Lunch and Posters
01:30 PM  Aude Oliva: Reverse engineering neuroscience and cognitive science principles
02:00 PM  On Understanding the Hardness of Samples in Neural Networks
02:15 PM  On the Convex Behavior of Deep Neural Networks in Relation to the Layers' Width
02:30 PM  Andrew Saxe: Intriguing phenomena in training and generalization dynamics of deep networks
03:00 PM  Break and Posters
04:00 PM  Olga Russakovsky
04:30 PM  Panel Discussion: Kevin Murphy, Nati Srebro, Aude Oliva, Andrew Saxe, Olga Russakovsky; Moderator: Ali Rahimi

Abstracts (4):

Abstract 1: Opening Remarks in Identifying and Understanding Deep Learning Phenomena, 08:45 AM

Hanie Sedghi

Abstract 14: Andrew Saxe: Intriguing phenomena in training and generalization dynamics of deep networks in Identifying and Understanding Deep Learning Phenomena, Saxe 02:30 PM

In this talk I will describe several phenomena related to learning dynamics in deep networks. Among these are (a) large transient training error spikes during full batch gradient descent, with implications for the training error surface; (b) surprisingly strong generalization performance of large networks with modest label noise even with infinite training time; (c) a training speed/test accuracy trade off in vanilla deep networks; (d) the inability of deep networks to learn known efficient representations of certain functions; and finally (e) a trade off between training speed and multitasking ability.

Abstract 16: Olga Russakovsky in Identifying and Understanding Deep Learning Phenomena, 04:00 PM

Olga Russakovsky

Abstract 17: Panel Discussion: Kevin Murphy, Nati Srebro, Aude Oliva, Andrew Saxe, Olga Russakovsky, Moderator: Ali Rahimi in Identifying and Understanding Deep Learning Phenomena, 04:30 PM

Panelists: Kevin Murphy, Nati Srebro, Aude Oliva, Andrew Saxe, Olga Russakovsky
Moderator: Ali Rahimi

Adaptive and Multitask Learning: Algorithms & Systems

Maruan Al-Shedivat, Anthony Platanios, Otilia Stretcu, Jacob Andreas, Ameet Talwalkar, Rich Caruana, Tom Mitchell, Eric Xing

Seaside Ballroom, Sat Jun 15, 08:30 AM

Driven by progress in deep learning, the machine learning community is now able to tackle increasingly more complex problems—ranging from multi-modal reasoning to dexterous robotic manipulation—all of which typically involve solving nontrivial combinations of tasks. Thus, designing adaptive models and algorithms that can efficiently learn, master, and combine multiple tasks is the next frontier. The AMTL workshop aims to bring together machine learning researchers from areas ranging from theory to applications and systems, to explore and discuss:

* advantages, disadvantages, and applicability of different approaches to learning in multitask settings,
* formal or intuitive connections between methods developed for different problems that help better understand the landscape of multitask learning techniques and inspire technique transfer between research lines,
* fundamental challenges and open questions that the community needs to tackle for the field to move forward.


Webpage: https://www.amtl-workshop.org/

Schedule

08:30 AM  Opening Remarks
08:40 AM  Efficient Lifelong Learning Algorithms: Regret Bounds and Statistical Guarantees (Massimiliano Pontil)
09:10 AM  Meta-Learning: Challenges and Frontiers (Chelsea Finn)
09:40 AM  Contributed Talk: Learning Exploration Policies for Model-Agnostic Meta-Reinforcement Learning
09:55 AM  Contributed Talk: Lifelong Learning via Online Leverage Score Sampling
10:10 AM  Tricks of the Trade 1 (Rich Caruana)
10:25 AM  Coffee Break
11:00 AM  Poster Session (Balazevic, Kwon, Lengerich, Asiaee, Lambert, Chen, Ding, Florensa, Gaudio, Jaafra, Fang, Wang, Li, Gurumurthy, Yan, Cilingir, Thangarasa, Li, Lowe)
12:00 PM  Lunch Break
01:45 PM  ARUBA: Efficient and Adaptive Meta-Learning with Provable Guarantees (Ameet Talwalkar)
02:15 PM  Building and Structuring Training Sets for Multi-Task Learning (Alex Ratner)
02:45 PM  Tricks of the Trade 2 (Rich Caruana)
03:00 PM  Coffee Break
03:30 PM  Multi-Task Learning in the Wilderness (Andrej Karpathy)
04:00 PM  Recent Trends in Personalization: A Netflix Perspective (Justin Basilico)
04:30 PM  Contributed Talk: Improving Relevance Prediction with Transfer Learning in Large-scale Retrieval Systems
04:45 PM  Contributed Talk: Continual Adaptation for Efficient Machine Communication


05:00 PM  Toward Robust AI Systems for Understanding and Reasoning Over Multimodal Data (Hannaneh Hajishirzi)
05:30 PM  Closing Remarks

Abstracts (5):

Abstract 4: Contributed Talk: Learning Exploration Policies for Model-Agnostic Meta-Reinforcement Learning in Adaptive and Multitask Learning: Algorithms & Systems, 09:40 AM

Meta-Reinforcement learning approaches aim to develop learning procedures that can adapt quickly to a distribution of tasks with the help of a few examples. Developing efficient exploration strategies capable of finding the most useful samples becomes critical in such settings. Existing approaches to finding efficient exploration strategies add auxiliary objectives to promote exploration by the pre-update policy; however, this makes adaptation using a few gradient steps difficult, as the pre-update (exploration) and post-update (exploitation) policies are quite different. Instead, we propose to explicitly model a separate exploration policy for the task distribution. Having two different policies gives more flexibility in training the exploration policy and also makes adaptation to any specific task easier. We show that using self-supervised or supervised learning objectives for adaptation stabilizes the training process and also demonstrate the superior performance of our model compared to prior works in this domain.

Abstract 5: Contributed Talk: Lifelong Learning via Online Leverage Score Sampling in Adaptive and Multitask Learning: Algorithms & Systems, 09:55 AM

In order to mimic the human ability of continual acquisition and transfer of knowledge across various tasks, a learning system needs the capability for life-long learning, effectively utilizing the previously acquired skills. As such, the key challenge is to transfer and generalize the knowledge learned from one task to other tasks, avoiding interference from previous knowledge and improving the overall performance. In this paper, within the continual learning paradigm, we introduce a method that effectively forgets the less useful data samples continuously across different tasks. The method uses statistical leverage score information to measure the importance of the data samples in every task and adopts the frequent directions approach to enable a life-long learning property. This effectively maintains a constant training size across all tasks. We first provide some mathematical intuition for the method and then demonstrate its effectiveness with experiments on variants of MNIST and CIFAR100 datasets.
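As a concrete illustration of the two ingredients named in the abstract above (leverage scores for sample importance, frequent directions for a fixed-size summary of past data), the following minimal NumPy sketch keeps a constant-size training set across a stream of tasks. The ridge-leverage scoring against the sketch, the budget and sketch sizes, and all function names are our own illustrative assumptions, not the authors' implementation.

    import numpy as np

    def frequent_directions(sketch, X_new, ell):
        # Maintain an ell-row sketch B with B^T B approximating A^T A over all rows seen so far.
        B = X_new if sketch is None else np.vstack([sketch, X_new])
        U, s, Vt = np.linalg.svd(B, full_matrices=False)
        if len(s) > ell:
            delta = s[ell - 1] ** 2
            s = np.sqrt(np.maximum(s[:ell] ** 2 - delta, 0.0))
            Vt = Vt[:ell]
        return s[:, None] * Vt

    def ridge_leverage_scores(X, sketch, lam=1e-3):
        # Approximate importance of each row of X against the covariance summarized by the sketch:
        # tau_i = x_i^T (B^T B + lam * I)^{-1} x_i.
        d = X.shape[1]
        cov = sketch.T @ sketch + lam * np.eye(d)
        sol = np.linalg.solve(cov, X.T)        # shape (d, n)
        return np.einsum('ij,ji->i', X, sol)   # row-wise quadratic form

    # Toy usage: keep a constant-size training set across a stream of tasks.
    rng = np.random.default_rng(0)
    memory_X, memory_y, sketch = None, None, None
    budget, ell = 64, 16
    for task in range(3):
        X_t = rng.normal(size=(200, 32))
        y_t = rng.integers(0, 10, size=200)
        sketch = frequent_directions(sketch, X_t, ell)
        X_all = X_t if memory_X is None else np.vstack([memory_X, X_t])
        y_all = y_t if memory_y is None else np.concatenate([memory_y, y_t])
        keep = np.argsort(ridge_leverage_scores(X_all, sketch))[-budget:]
        memory_X, memory_y = X_all[keep], y_all[keep]
        print(task, memory_X.shape)  # the retained training set stays at `budget` samples

Swapping the synthetic per-task matrices for real feature matrices (and training a model on the retained memory after each task) would be the natural way to use such a selector inside a continual-learning loop.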


Abstract 8: Poster Session in Adaptive and Multitask Learning: Algorithms & Systems, Balazevic, Kwon, Lengerich, Asiaee, Lambert, Chen, Ding, Florensa, Gaudio, Jaafra, Fang, Wang, Li, Gurumurthy, Yan, Cilingir, Thangarasa, Li, Lowe, 11:00 AM

Accepted papers:
https://www.amtl-workshop.org/accepted-papers

---

TuckER: Tensor Factorization for Knowledge Graph Completion
Authors: Ivana Balazevic, Carl Allen, Timothy Hospedales

Learning Cancer Outcomes from Heterogeneous Genomic Data Sources: An Adversarial Multi-task Learning Approach
Authors: Safoora Yousefi, Amirreza Shaban, Mohamed Amgad, Lee Cooper

The Role of Embedding-complexity in Domain-invariant Representations
Authors: Ching-Yao Chuang, Antonio Torralba, Stefanie Jegelka

Lifelong Learning via Online Leverage Score Sampling
Authors: Dan Teng, Sakyasingha Dasgupta

Data Enrichment: Multi-task Learning in High Dimension with Theoretical Guarantees
Authors: Amir Asiaee, Samet Oymak, Kevin R. Coombes, Arindam Banerjee

A Functional Extension of Multi-Output Learning
Authors: Alex Lambert, Romain Brault, Zoltan Szabo, Florence d'Alche-Buc

Interpretable Robust Recommender Systems with Side Information
Authors: Wenyu Chen, Zhechao Huang, Jason Cheuk Nam Liang, Zihao Xu

Personalized Student Stress Prediction with Deep Multi-Task Network
Authors: Abhinav Shaw, Natcha Simsiri, Iman Dezbani, Madelina Fiterau, Tauhidur Rahaman

SuperTML: Domain Transfer from Computer Vision to Structured Tabular Data through Two-Dimensional Word Embedding
Authors: Baohua Sun, Lin Yang, Wenhan Zhang, Michael Lin, Patrick Dong, Charles Young, Jason Dong

Goal-conditioned Imitation Learning
Authors: Yiming Ding, Carlos Florensa, Mariano Phielipp, Pieter Abbeel

Tasks Without Borders: A New Approach to Online Multi-Task Learning
Authors: Alexander Zimin, Christoph H. Lampert

Continual adaptation for efficient machine communication
Authors: Robert Hawkins, Minae Kwon, Dorsa Sadigh, Noah Goodman

Every Sample a Task: Pushing the Limits of Heterogeneous Models with Personalized Regression
Authors: Ben Lengerich, Bryon Aragam, Eric Xing

Connections Between Optimization in Machine Learning and Adaptive Control
Authors: Joseph E. Gaudio, Travis E. Gibson, Anuradha M. Annaswamy, Michael A. Bolender, Eugene Lavretsky

Meta-Reinforcement Learning for Adaptive Autonomous Driving
Authors: Yesmina Jaafra, Jean Luc Laurent, Aline Deruyver, Mohamed Saber Naceur

PAGANDA: An Adaptive Task-Independent Automatic Data Augmentation
Authors: Boli Fang, Miao Jiang, Jerry Shen

Improving Relevance Prediction with Transfer Learning in Large-scale Retrieval Systems
Authors: Ruoxi Wang, Zhe Zhao, Xinyang Yi, Ji Yang, Derek Zhiyuan Cheng, Lichan Hong, Steve Tjoa, Jieqi Kang, Evan Ettinger, Ed Chi

Federated Optimization for Heterogeneous Networks
Authors: Tian Li*, Anit Kumar Sahu*, Manzil Zaheer, Maziar Sanjabi, Ameet Talwalkar, Virginia Smith

Learning Exploration Policies for Model-Agnostic Meta-Reinforcement Learning
Authors: Swaminathan Gurumurthy, Sumit Kumar, Katia Sycara

A Meta Understanding of Meta-Learning
Authors: Wei-Lun Chao, Han-Jia Ye, De-Chuan Zhan, Mark Campbell, Kilian Q. Weinberger

Multi-Task Learning via Task Multi-Clustering
Authors: Andy Yan, Xin Wang, Ion Stoica, Joseph Gonzalez, Roy Fox

Prototypical Bregman Networks
Authors: Kubra Cilingir, Brian Kulis

Differentiable Hebbian Plasticity for Continual Learning
Authors: Vithursan Thangarasa, Thomas Miconi, Graham W. Taylor

Active Multitask Learning with Committees
Authors: Jingxi Xu, Da Tang, Tony Jebara

Progressive Memory Banks for Incremental Domain Adaptation
Authors: Nabiha Asghar, Lili Mou, Kira A. Selby, Kevin D. Pantasdo, Pascal Poupart, Xin Jiang

Sub-policy Adaptation for Hierarchical Reinforcement Learning
Authors: Alexander Li, Carlos Florensa, Pieter Abbeel

Learning to learn to communicate
Authors: Ryan Lowe*, Abhinav Gupta*, Jakob Foerster, Douwe Kiela, Joelle Pineau

Abstract 16: Contributed Talk: Improving Relevance Prediction with Transfer Learning in Large-scale Retrieval Systems in Adaptive and Multitask Learning: Algorithms & Systems, Wang 04:30 PM

Machine learned large-scale retrieval systems require a large amount of training data representing query-item relevance. However, collecting users' explicit feedback is costly. In this paper, we propose to leverage user logs and implicit feedback as auxiliary objectives to improve relevance modeling in retrieval systems. Specifically, we adopt a two-tower neural net architecture to model query-item relevance given both collaborative and content information. By introducing auxiliary tasks trained with much richer implicit user feedback data, we improve the quality and resolution for the learned representations of queries and items. Applying these learned representations to an industrial retrieval system has delivered significant improvements.

Abstract 17: Contributed Talk: Continual Adaptation for Efficient Machine Communication in Adaptive and Multitask Learning: Algorithms & Systems, Kwon 04:45 PM

To communicate with new partners in new contexts, humans rapidly form new linguistic conventions. Recent language models trained with deep neural networks are able to comprehend and produce the existing conventions present in their training data, but are not able to flexibly and interactively adapt those conventions on the fly as humans do. We introduce a repeated reference task as a benchmark for models of adaptation in communication and propose a regularized continual learning framework that allows an artificial agent initialized with a generic language model to more accurately and efficiently understand their partner over time. We evaluate this framework through simulations on COCO and in real-time reference game experiments with human partners.

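For readers unfamiliar with the two-tower setup mentioned in Abstract 16 above, a minimal sketch is given below: one tower embeds the query, one embeds the item, relevance is scored by a dot product, and an auxiliary loss on implicit feedback (here, clicks) shares the same towers. The layer sizes, the auxiliary-loss weight, and the choice of clicks as the auxiliary signal are illustrative assumptions on our part, not details taken from the paper.

    import torch
    import torch.nn as nn

    class Tower(nn.Module):
        # Encodes one side (query or item) from its content/collaborative features.
        def __init__(self, in_dim, emb_dim=64):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, emb_dim))

        def forward(self, x):
            return self.net(x)

    class TwoTowerRetrieval(nn.Module):
        def __init__(self, query_dim, item_dim):
            super().__init__()
            self.query_tower = Tower(query_dim)
            self.item_tower = Tower(item_dim)

        def score(self, queries, items):
            # Query-item relevance is the dot product of the two tower embeddings.
            return (self.query_tower(queries) * self.item_tower(items)).sum(-1)

    model = TwoTowerRetrieval(query_dim=32, item_dim=48)
    bce = nn.BCEWithLogitsLoss()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    # One training step: a small batch with explicit relevance labels plus a larger
    # batch of implicit feedback (clicks) used as an auxiliary objective.
    q_lab, i_lab = torch.randn(16, 32), torch.randn(16, 48)
    relevance = torch.randint(0, 2, (16,)).float()
    q_imp, i_imp = torch.randn(256, 32), torch.randn(256, 48)
    clicked = torch.randint(0, 2, (256,)).float()

    loss = bce(model.score(q_lab, i_lab), relevance) + 0.3 * bce(model.score(q_imp, i_imp), clicked)
    loss.backward()
    opt.step()

Because both losses flow through the same towers, the plentiful implicit-feedback data shapes the query and item representations that the scarcer explicit relevance labels then refine, which is the transfer effect the abstract describes.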
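Abstract 17 names a "regularized continual learning framework" without spelling it out in this book. One common way to realize such a scheme, shown as a rough sketch below, is to fine-tune the agent on each partner's utterances while penalizing drift from the generic pretrained model; the L2-to-initialization penalty, the toy utterance classifier, and the hyperparameters here are stand-ins, not the authors' method.

    import copy
    import torch
    import torch.nn as nn

    def adapt_to_partner(model, partner_batches, lam=1e-2, lr=1e-4):
        # Continually fine-tune `model` on a partner's referring expressions while
        # staying close to the generic initialization (simple L2 drift penalty).
        anchor = copy.deepcopy(model)  # frozen copy of the generic model
        for p in anchor.parameters():
            p.requires_grad_(False)
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        ce = nn.CrossEntropyLoss()
        for inputs, targets in partner_batches:  # one round of the reference game
            loss = ce(model(inputs), targets)
            drift = sum(((p - q) ** 2).sum() for p, q in zip(model.parameters(), anchor.parameters()))
            (loss + lam * drift).backward()
            opt.step()
            opt.zero_grad()
        return model

    # Toy usage with a stand-in utterance classifier over 100 candidate referents.
    model = nn.Sequential(nn.Linear(300, 256), nn.ReLU(), nn.Linear(256, 100))
    batches = [(torch.randn(8, 300), torch.randint(0, 100, (8,))) for _ in range(5)]
    model = adapt_to_partner(model, batches)

The regularizer is what keeps partner-specific adaptation from overwriting the generic conventions the agent started with, which is the trade-off the abstract highlights.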