<<

NeurIPS 2019 Workshop book

Schedule Highlights Generated Fri Dec 18, 2020

Workshop organizers make last-minute changes to their schedule. Download this document again to get the lastest changes, or use the NeurIPS mobile application.

Dec. 13, 2019

East Safety and Robustness in Decision-making Ballroom A Ghavamzadeh, Mannor, Yue, Petrik, Chow East Learning Meaningful Representations of Life Ballroom B Wood, Reshef, Bloom, Snoek, Engelhardt, Linderman, Saria, Wiltschko, Greene, Liu, Lindorf- Larsen, Marks East Optimal Transport for Ballroom C Cuturi, Peyré, Flamary, Suvorikova East Information Theory and Machine Learning Exhibition Zhao, Song, Han, Choi, Kalluri, Poole, Dimakis, Jiao, Weissman, Ermon Hall A East MLSys: Workshop on Systems for ML Meeting Lakshmiratan, Sen, Gonzalez, Crankshaw, Bird Rooms 11 + 12 East Perception as generative reasoning: structure, causality, probability Meeting Rosenbaum, Garnelo, Battaglia, Allen, Yildirim Rooms 1 - 3 East Minding the Gap: Between Fairness and Ethics Meeting Rubinov, Kondor, Poulson, Warmuth, Moss, Hagerty Rooms 8 + 15 West 109 KR2ML - Knowledge Representation and Reasoning Meets Machine Learning + 110 Thost, Muise, Talamadupula, Singh, Ré West 114 Retrospectives: A Venue for Self-Refection in ML Research + 115 Lowe, Bengio, Pineau, Paganini, Forde, Sodhani, Gupta, Lehman, Henderson, Madan, Sinha, Bouthillier West 116 Competition Track Day 1 + 117 Escalante West 118 - Workshop on Federated Learning for Data Privacy and Confdentiality 120 Fan, Konečný, Liu, McMahan, Smith, Yu West 121 Machine Learning for the Developing World (ML4D): Challenges and Risks + 122 De-Arteaga, Coston, Afonja West 202 - Visually Grounded Interaction and Language 204 Strub, Das, Wijmans, de Vries, Lee, Suhr, Arad Hudson West 205 - Robust AI in Financial Services: Data, Fairness, Explainability, Trustworthiness, and 207 Privacy Oprea, Gal, Moulinier, Chen, Veloso, Kumar, Faruquie West 208 Learning with Rich Experience: Integration of Learning Paradigms + 209 Hu, Wilson, Finn, Lee, Berg-Kirkpatrick, Salakhutdinov, Xing West 211 - Beyond frst order methods in machine learning systems 214 Kyrillidis, Berahas, Roosta, Mahoney West 215 CiML 2019: Machine Learning Competitions for All + 216 2

Mendrik, Tu, Guyon, Viegas, LI West 217 - AI for Humanitarian Assistance and Disaster Response 219 Gupta, Murphy, Darrell, Heim, Wang, Goodman, Biliński West 220 - Shared Visual Representations in Human and Machine Intelligence 222 Deza, Peterson, Murty, Grifths West 223 Workshop on Human-Centric Machine Learning + 224 Angelov, Oliver, Weller, Rodriguez, Valera, Chiappa, Heidari, Kilbertus West 301 - Solving inverse problems with deep networks: New architectures, theoretical 305 foundations, and applications Heckel, Hand, Baraniuk, Bruna, Dimakis, Needell West 306 EMC2: Energy Efcient Machine Learning and Cognitive Computing (5th edition) Parihar, Goldfarb, Srivastava, SHENG, Pal West Machine Learning for Health (ML4H): What makes machine learning in medicine Ballroom A diferent? Beam, Naumann, Beaulieu-Jones, Chen, Finlayson, Alsentzer, Dalca, McDermott West Meta-Learning Ballroom B Calandra, Clavera Gilaberte, Hutter, Vanschoren, Wang West Biological and Artifcial Reinforcement Learning Ballroom C Chua, Zannone, Behbahani, Ponte Costa, Clopath, Richards, Precup West Graph Representation Learning Exhibition Hamilton, van den Berg, Bronstein, Jegelka, Kipf, Leskovec, Liao, Sun, Veličković Hall A West Bayesian Exhibition Gal, Hernández-Lobato, Louizos, Nalisnick, Ghahramani, Murphy, Welling Hall C

Dec. 14, 2019

East Real Neurons & Hidden Units: future directions at the intersection of Ballroom A neuroscience and AI Lajoie, Shlizerman, Puelma Touzel, Thompson, Kording East Fair ML in Healthcare Ballroom B Joshi, Chen, Saleh East Tackling Climate Change with ML Ballroom C Rolnick, Donti, Kaack, Lacoste, Maharaj, Ng, Platt, Chayes, Bengio East Joint Workshop on AI for Social Good Meeting Fang, Bullock, Dilhac, Green, saltiel, Adjodah, Clark, McGregor, Luck, Penn, Sylvain, Rooms 11 Boucher, Swaine-Simon, Tadesse, Côté, Bethke, Bengio + 12 East Machine Learning for Autonomous Driving Meeting McAllister, Rhinehart, Yu, Li, Dragan Rooms 1 - 3 East Privacy in Machine Learning (PriML) Meeting Balle, Chaudhuri, Honkela, Koskela, Meehan, Park, Smart, Weller Rooms 8 + 15 West 109 Machine Learning and the Physical Sciences + 110 3

Baydin, Carrasquilla, Ho, Kashinath, Paganini, Thais, Anandkumar, Cranmer, Melko, Prabhat, Wood West 114 Program Transformations for ML + 115 Lamblin, Baydin, Wiltschko, van Merriënboer, Fertig, Pearlmutter, Duvenaud, Hascoet West 116 Competition Track Day 2 + 117 Escalante West 118 - Emergent Communication: Towards Natural Language 120 Gupta, Noukhovitch, Resnick, Jaques, Filos, Ossenkopf, Lazaridou, Foerster, Lowe, Kiela, Cho West 121 Science meets Engineering of Deep Learning + 122 Sagun, Gulcehre, Romero, Rostamzadeh, de Freitas West 202 - ML For Systems 204 Hashemi, Mirhoseini, Goldie, Swersky, XU, Raiman West 205 - The third Conversational AI workshop – today's practice and tomorrow's potential 207 Geramifard, Williams, Byrne, Celikyilmaz, Gasic, Hakkani-Tur, Henderson, Lastras, Ostendorf West 208 Document Intelligence + 209 Dufy, Akkiraju, Bedrax Weiss, Bennett, Motahari-Nezhad West 211 - Learning Transferable Skills 214 Mattar, Juliani, Lange, Crosby, Beyret West 215 Sets and Partitions + 216 Monath, Zaheer, McCallum, Kobren, Oliva, Poczos, Salakhutdinov West 217 - Context and Compositionality in Biological and Artifcial Neural Systems 219 Turek, Jain, Huth, Wehbe, Strubell, Yuille, Linzen, Honey, Cho West 220 - Robot Learning: Control and Interaction in the Real World 222 Calandra, Rakelly, Kamthe, Kragic, Schaal, Wulfmeier West 223 NeurIPS Workshop on Machine Learning for Creativity and Design 3.0 + 224 Elliott, Dieleman, Roberts, Engel, White, Fiebrink, Mital, Payne, Tokui West 301 - Medical Imaging meets NeurIPS 305 Lombaert, Glocker, Konukoglu, de Bruijne, Feragen, Oguz, Teuwen West 306 Learning with Temporal Point Processes Rodriguez, Song, Valera, Liu, De, Zha West The Optimization Foundations of Reinforcement Learning Ballroom A Dai, He, Le Roux, Li, Schuurmans, White West Machine Learning with Guarantees Ballroom B London, Dziugaite, Roy, Joachims, Madry, Shawe-Taylor West “Do the right thing”: machine learning and causal inference for improved Ballroom C decision making Santacatterina, Joachims, Kallus, Swaminathan, Sontag, Zhou West Bridging and Deep Learning Exhibition Mitliagkas, Gidel, He, Askari Hemmat, Haghtalab, Lacoste-Julien Hall A West Deep Reinforcement Learning Exhibition Abbeel, Finn, Pineau, Silver, Singh, Achiam, Florensa, Grimm, Tang, Veeriah Hall C 4

Dec. 9, 2019

East New In Machine Learning Meeting Rooms 11 + 12 5 Dec. 13, 2019

Dec. 13, 2019 09:35 Poster Session Ghosh, Shafee, AM Boopathy, Tamkin, Vasiloudis, Nanda, Baheri, Fieguth, Bennett, Shi, Liu, Jain, Tyo, Wang, Chen, Wainwright, Shama Safety and Robustness in Decision-making Sastry, Tang, Brown, Inouye, Venuto, Ramani, Diochnos, Madaan, Krashenikov, Mohammad Ghavamzadeh, Shie Mannor, Yisong Yue, Oren, Lee, Quint, amirloo, Pirotta, Marek Petrik, Yinlam Chow Hartnett, Dubourg-Felonneau, Swamy,

East Ballroom A, Fri Dec 13, 08:00 AM Chen, Bogunovic, Carter, Garcia-Barcos, Mohapatra, Zhang, Qian, Martin, Richter, Interacting with increasingly sophisticated Zaiter, Weng, , Polymenakos, Hoang, decision-making systems is becoming more and abbasi, Gallieri, Seurin, Papini, Turchetta, more a part of our daily life. This creates an Sotoudeh, Hosseinzadeh, Fulton, Uehara, immense responsibility for designers of these Prasad, Camburu, Kolaric, Renz, Jaiswal, systems to build them in a way to guarantee safe Russel, Islam, Agarwal, Aldrick, Vernekar, interaction with their users and good Lale, Narayanaswami, Daulton, Garg, performance, in the presence of noise and East, Zhang, Dsidbari, Goodwin, changes in the environment, and/or of model Krakovna, Luo, Chung, Shi, Wang, Jin, Xu misspecifcation and uncertainty. Any progress in 10:30 Finale Doshi-Velez: Combining this area will be a huge step forward in using AM Statistical methods with Human decision-making algorithms in emerging high Input for Evaluation and stakes applications, such as autonomous driving, Optimization in Batch Settings Doshi- robotics, power systems, health care, Velez recommendation systems, and fnance. 11:10 Marco Pavone: On Safe and Efcient AM Human-robot Interactions via Multi- This workshop aims to bring together researchers modal Intent Modeling and from academia and industry in order to discuss Reachability-based Safety Assurance main challenges, describe recent advances, and Pavone highlight future research directions pertaining to 11:50 Dimitar Filev: Practical Approaches develop safe and robust decision-making AM to Driving Policy Design for systems. We aim to highlight new and emerging Autonomous Vehicles theoretical and applied research opportunities for 12:30 Lunch Break the community that arise from the evolving PM needs for decision-making systems and 02:00 Nathan Kallus: Efciently Breaking algorithms that guarantee safe interaction and PM the Curse of Horizon with Double good performance under a wide range of Reinforcement Learning Kallus uncertainties in the environment. 02:40 Scott Niekum: Scaling PM Probabilistically Safe Learning to Schedule Robotics Niekum 03:20 Poster Session and Cofee Break PM 08:00 Opening Remarks 04:30 Andy Sun: Recent Advances in AM PM Multistage Decision-making under 08:15 Aviv Tamar: Visual Plan Imagination Uncertainty: New Algorithms and AM - An Interpretable Robot Learning Complexity Analysis Sun Framework Tamar 05:10 Thorsten Joachim: Fair Ranking with 08:55 Daniel Kuhn: From Data to PM Biased Data Joachims AM Decisions: Distributionally Robust 05:50 Concluding Remarks Optimization is Optimal Kuhn PM

Abstracts (9): 6 Dec. 13, 2019

Abstract 2: Aviv Tamar: Visual Plan Abstract 3: Daniel Kuhn: From Data to Imagination - An Interpretable Robot Decisions: Distributionally Robust Learning Framework in Safety and Optimization is Optimal in Safety and Robustness in Decision-making, Tamar 08:15 Robustness in Decision-making, Kuhn 08:55 AM AM

How can we build autonomous robots that We study stochastic optimization problems where operate in unstructured and dynamic the decision-maker cannot observe the environments such as homes or hospitals? This distribution of the exogenous uncertainties but problem has been investigated under several has access to a fnite set of independent training disciplines, including planning (motion planning, samples. In this setting, the goal is to fnd a task planning, etc.), and reinforcement learning. procedure that transforms the data to an While both of these felds have witnessed estimate of the expected cost function under the tremendous progress, each have fundamental unknown data-generating distribution, i.e., a drawbacks: planning approaches require predictor, and an optimizer of the estimated cost substantial manual engineering in mapping function that serves as a near-optimal candidate perception to a formal planning problem, while decision, i.e., a prescriptor. As functions of the RL, which can operate directly on raw percepts, is data, predictors and prescriptors constitute data hungry, cannot generalize to new tasks, and statistical estimators. We propose a meta- is ‘black box’ in nature. optimization problem to fnd the least conservative predictors and prescriptors subject Motivated by humans’ remarkable capability to to constraints on their out-of-sample imagine and plan complex manipulations of disappointment. The out-of-sample objects, and recent advances in imagining disappointment quantifes the probability that the images such as GANs, we present Visual Plan actual expected cost of the candidate decision Imagination (VPI) — a new computational under the unknown true distribution exceeds its problem that combines image imagination and predicted cost. Leveraging tools from large planning. In VPI, given of-policy image data from deviations theory, we prove that this meta- a dynamical system, the task is to ‘imagine’ optimization problem admits a unique solution: image sequences that transition the system from The best predictor-prescriptor pair is obtained by start to goal. Thus, VPI focuses on the essence of solving a distributionally robust optimization planning with high-dim perception, and abstracts problem over all distributions within a given away low level control and reward engineering. relative entropy distance from the empirical More importantly, VPI provides a safe and distribution of the data. interpretable basis for robotic control — before the robot acts, a human inspects the imagined plan the robot will act upon, and can intervene if Abstract 5: Finale Doshi-Velez: Combining necessary. Statistical methods with Human Input for Evaluation and Optimization in Batch I will describe our approach to VPI based on Settings in Safety and Robustness in Causal InfoGAN, a deep generative model that Decision-making, Doshi-Velez 10:30 AM learns features that are compatible with strong planning algorithms. We show that Causal Statistical methods for of-policy evaluation and InfoGAN can generate convincing visual plans, counterfactual reasoning will have fundamental and we demonstrate learning to imagine and limitations based on what assumptions can be execute real robot rope manipulation from image made and what kind of exploration is present in data. I will also discuss our VPI simulation the data (some of which is being presented here benchmarks, and recent eforts in novelty by other speakers!). 
In this talk, I'll discuss some detection, an important component in VPI, and in recent directions in our lab regarding ways to safe decision making in general. integrate human experts into the process of policy evaluation and selection in batch settings. The frst deals with statistical limitations by seeking a diverse collection of statistically- 7 Dec. 13, 2019

indistinguishable (with respect to outcome) Autonomous Vehicles in Safety and policies for humans to eventually decide from. Robustness in Decision-making, 11:50 AM The involves directly integrating human The presentation deals with some practical facets feedback to eliminate or validate specifc sources of sensitivity in an of-policy evaluation to get of application of AI methods to designing driving more robust estimates (or at least better policy for autonomous vehicles. Relationship understand the source of their non-robustness). between the reinforcement learning (RL) based solutions and the use of rule-based and model- More broadly, I will discuss open directions for based techniques for improving their robustness moving from purely-statistical (e.g. of-policy evaluation) or purely-human (e.g. interpretability- and safety are discussed. One approach to based) approaches for robust/safe decision- obtaining explainable RL models by learning making toward combining the advantages of alternative rule-based representations is proposed. The presentation also elaborates on both. the opportunities for extending the AI driving policy approaches by applying game theory inspired methodology to addressing diverse and Abstract 6: Marco Pavone: On Safe and unforeseen scenarios, and representing the Efcient Human-robot Interactions via negotiation aspects of decision making in Multi-modal Intent Modeling and autonomous driving. Reachability-based Safety Assurance in Safety and Robustness in Decision-making, Pavone 11:10 AM Abstract 9: Nathan Kallus: Efciently In this talk I will present a decision-making and Breaking the Curse of Horizon with Double control stack for human-robot interactions by Reinforcement Learning in Safety and using autonomous driving as a motivating Robustness in Decision-making, Kallus 02:00 example. Specifcally, I will frst discuss a data- PM driven approach for learning multimodal Of-policy evaluation (OPE) is crucial for interaction dynamics between robot-driven and reinforcement learning in domains like medicine human-driven vehicles based on recent advances in deep generative modeling. Then, I will discuss with limited exploration, but OPE is also how to incorporate such a learned interaction notoriously difcult because the similarity model into a real-time, interaction-aware between trajectories generated by any proposed policy and the observed data diminishes decision-making framework. The framework is exponentially as horizons grow, known as the designed to be minimally interventional; in particular, by leveraging backward reachability curse of horizon. To understand precisely when analysis, it ensures safety even when other cars this curse bites, we consider for the frst time the defy the robot's expectations without unduly semi-parametric efciency limits of OPE in Markov decision processes (MDP), establishing sacrifcing performance. I will present recent the best-possible estimation errors and results from experiments on a full-scale steer-by- wire platform, validating the framework and characterizing the curse as a problem-dependent providing practical insights. I will conclude the phenomenon rather than method-dependent. talk by providing an overview of related eforts Efciency in OPE is crucial because, without exploration, we must use the available data to its from my group on infusing safety assurances in fullest. 
In fnite horizons, this shows standard robot autonomy stacks equipped with learning- based components, with an emphasis on adding doubly-robust (DR) estimators are in fact structure within robot learning via control- inefcient for MDPs. In infnite horizons, while the theoretical and formal methods curse renders certain problems fundamentally intractable, OPE may be feasible in ergodic time- invariant MDPs. We develop the frst OPE estimator that achieves the efciency limits in Abstract 7: Dimitar Filev: Practical both setting, termed Double Reinforcement Approaches to Driving Policy Design for Learning (DRL). In both fnite and infnite horizons, DRL improves upon existing estimators, 8 Dec. 13, 2019

which we show are inefcient, and leverages the iteration complexity of the proposed problem structure to its fullest in the face of the algorithms. This settles some open questions in curse of horizon. We establish many favorable regards of the complexity of SDDP. characteristics for DRL including efciency even when nuisances are estimated slowly by blackbox models, fnite-sample guarantees, and model Abstract 13: Thorsten Joachim: Fair Ranking double robustness. with Biased Data in Safety and Robustness in Decision-making, Joachims 05:10 PM

Abstract 10: Scott Niekum: Scaling Search engines and recommender systems have become the dominant matchmaker for a wide Probabilistically Safe Learning to Robotics range of human endeavors -- from online retail to in Safety and Robustness in Decision- making, Niekum 02:40 PM fnding romantic partners. Consequently, they carry immense power in shaping markets and In recent years, high-confdence reinforcement allocating opportunity to the participants. In this learning algorithms have enjoyed success in talk, I will discuss how the machine learning application areas with high-quality models and algorithms underlying these system can produce plentiful data, but robotics remains a challenging unfair ranking policies for both exogenous and domain for scaling up such approaches. endogenous reasons. Exogenous reasons often Furthermore, very little work has been done on manifest themselves as biases in the training the even more difcult problem of safe imitation data, which then get refected in the learned learning, in which the demonstrator's reward ranking policy and lead to rich-get-richer function is not known. This talk focuses on three dynamics. But even when trained with unbiased recent developments in this emerging area of data, reasons endogenous to the algorithms can research: (1) a theory of safe imitation learning; lead to unfair allocation of opportunity. To (2) scalable reward inference in the absence of overcome these challenges, I will present new models; (3) efcient of-policy policy evaluation. machine learning algorithms that directly address The proposed algorithms ofer a blend of safety both endogenous and exogenous unfairness. and practicality, making a signifcant step towards safe robot learning with modest amounts of real-world data. Learning Meaningful Representations of Life Abstract 12: Andy Sun: Recent Advances in Multistage Decision-making under Elizabeth Wood, Yakir Reshef, Jon Bloom, Jasper Uncertainty: New Algorithms and Snoek, Barbara Engelhardt, Scott Linderman, , Alexander Wiltschko, Casey Greene, Chang Complexity Analysis in Safety and Liu, Kresten Lindorf-Larsen, Debora Marks Robustness in Decision-making, Sun 04:30

PM East Ballroom B, Fri Dec 13, 08:00 AM

In this talk, we will review some recent advances The last decade has seen both machine learning in the area of multistage decision making under and biology transformed: the former by the uncertainty, especially in the domain of ability to train complex predictors on massive stochastic and robust optimization. We will labelled data sets; the latter by the ability to present some new algorithmic development that perturb and measure biological systems with allows for exactly solving huge-scale stochastic staggering throughput, breadth, and resolution. programs with integer recourse decisions, and However, fundamentally new ideas in machine algorithms in a dual perspective that can deal learning are needed to translate biomedical data with infeasibility in problems. This signifcantly at scale into a mechanistic understanding of extends the scope of stochastic dual dynamic biology and disease at a level of abstraction programming (SDDP) algorithms from convex or beyond single genes. This challenge has the binary state variable cases to general nonconvex potential to drive the next decade of creativity in problems. We will also present a new analysis of 9 Dec. 13, 2019

machine learning as the feld grapples with how 01:15 Phenotype HaCohen, Reshef, Johnson, to move beyond prediction to a regime that PM Morris, Nagy, Eraslan, Singer, Van Allen, broadly catalyzes and accelerates scientifc Krishnaswamy, Greene, Linderman, discovery. Wiltschko, Kotliar, Zou, Bulik-Sullivan 03:00 Cofee Break To seize this opportunity, we will bring together PM current and future leaders within each feld to 03:15 Cell Carpenter, Zhou, Chikina, Tong, introduce the next generation of machine PM Lengerich, Abdelkareem, Eraslan, Ra, learning specialists to the next generation of Burkhardt, Matsen IV, Moses, Chen, biological problems. Our full-day workshop will Haghighi, Lu, Schau, Nivala, Shifman, start a deeper dialogue with the goal of Learning Harbrecht, Masengo Wa Umba, Weinstein Meaningful Representations of Life (LMRL), 05:00 Closing Remarks Sander, Fiete, Peer emphasizing interpretable representation PM learning of structure and principles. The 05:45 Last Look at Posters (Drinks workshop will address this challenge at fve PM Provided) layers of biological abstraction (genome, molecule, cell, system, phenome) through interactive breakout sessions led by a diverse Abstracts (11): team of experimentalists and computational scientists to facilitate substantive discussion. Abstract 1: Opening Remarks in Learning Meaningful Representations of Life, 08:30 We are calling for short abstracts from computer AM scientists and biological scientists. Submission Opening Remarks by Francis Collins, Director, NIH deadline is Friday, September 20. Signifcant (by video) travel support is also available. Details here: https://lmrl-bio.github.io/call https://lmrl-bio.github.io/travel Abstract 2: Opening Remarks in Learning Meaningful Representations of Life, Yeshwant 08:45 AM Schedule Opening remarks by Francis Collins (Director, National Institutes of Health) via video and 08:30 Opening Remarks Krishna Yeshwant, General Partner at AM Ventures. 08:45 Opening Remarks Yeshwant AM 09:00 Keynote - Bio/ML Regev Abstract 3: Keynote - Bio/ML in Learning AM Meaningful Representations of Life, Regev 09:30 Keynote - ML Welling 09:00 AM AM 10:00 In conversations: Daphne Koller and . Professor of Biology; Core Member, AM Barbara Englehardt Koller, Engelhardt Broad Institute; Investigator, Howard Hughes 10:30 Cofee Break Medical Institute. Aviv Regev pioneers the use of AM single-cell genomics and other techniques to dissect the molecular networks that regulate 10:45 Molecules and Genomes Haussler, genes, defne cells and tissues, and infuence AM Clevert, Keiser, Aspuru-Guzik, Duvenaud, health and disease. Jones, Wei, D'Amour 12:00 Synthetic Systems Silver, Marks, Liu, PM Huang 12:30 Poster Session I (Lunch Provided) Abstract 4: Keynote - ML in Learning PM Meaningful Representations of Life, Welling 09:30 AM 10 Dec. 13, 2019

Max Welling is a research chair in Machine David Kelley, Dylan Kotliar, Eli van Allen, Gokcen Learning at the University of Amsterdam and a VP Eraslan, James Zou, Matt Johnson, Meromit Technologies at Qualcomm. Singer, Nir Hacohen, Samantha Morris, Scott Linderman, Smita Krishnaswamy

Abstract 5: In conversations: Daphne Koller and Barbara Englehardt in Learning Abstract 12: Cell in Learning Meaningful Meaningful Representations of Life, Koller, Representations of Life, Carpenter, Zhou, Engelhardt 10:00 AM Chikina, Tong, Lengerich, Abdelkareem, Eraslan, Ra, Burkhardt, Matsen IV, Moses, Chen, Haghighi, Daphne Koller is the Rajeev Motwani Professor in Lu, Schau, Nivala, Shifman, Harbrecht, Masengo the Computer Science Department at Stanford Wa Umba, Weinstein 03:15 PM University and founder of insitro. Anne Carpenter, Hui Ting Grace Yeo, Jian Zhou, Maria Chikina, Alexander Tong, Benjamin Lengerich, Aly O. Abdelkareem, Gokcen Eraslan, Abstract 7: Molecules and Genomes in Andrew Blumberg, Stephen Ra, Daniel Burkhardt, Learning Meaningful Representations of Life, Haussler, Clevert, Keiser, Aspuru-Guzik, Emanuel Flores Bautista, Frederick Matsen, Alan Duvenaud, Jones, Wei, D'Amour 10:45 AM Moses, Zhenghao Chen, Marzieh Haghighi, Alex Lu, Geofrey Schau, Jef Nivala, Luke O'Connor, David Duvenaud & Alan Asparu-Guzik; Michael Miriam Shifman, Hannes Harbrecht and Shimbi Keiser & Jennifer Wei; David Jones & John Jumper; Masengo Wa Umba Papa Levi present in a & Alex D'Amour speak on jointly lightning round. identifed challenges.

Abstract 13: Closing Remarks in Learning Abstract 8: Synthetic Systems in Learning Meaningful Representations of Life, Sander, Meaningful Representations of Life, Silver, Fiete, Peer 05:00 PM Marks, Liu, Huang 12:00 PM , Ila Fiete, and Dana Pe'er present. Pamela Silver, Debora Marks, and Chang Liu in conversation.

Optimal Transport for Machine Learning Abstract 9: Poster Session I (Lunch Provided) in Learning Meaningful Representations of Marco Cuturi, Gabriel Peyré, Rémi Flamary, Alexandra Suvorikova Life, 12:30 PM

East Ballroom C, Fri Dec 13, 08:00 AM Yixin Wang and Alex D'Amour in conversation.

Optimal transport(OT) provides a powerful and fexible way to compare, interpolate and morph Abstract 10: Phenotype in Learning probability measures. Originally proposed in the Meaningful Representations of Life, eighteenth century, this theory later led to Nobel HaCohen, Reshef, Johnson, Morris, Nagy, Eraslan, Prizes for Koopmans and Kantorovich as well as Singer, Van Allen, Krishnaswamy, Greene, C. Villani and A. Figalli Fields’ Medals in 2010 and Linderman, Wiltschko, Kotliar, Zou, Bulik-Sullivan 2018. OT is now used in challenging learning 01:15 PM problems that involve high-dimensional data such as the inference of individual trajectories by Challenge Presenters: Casey Greene, Dylan looking at population snapshots in biology, the Kotliar, Smita Kirshnaswamy estimation of generative models for images, or more generally transport maps to transform Conversation Facilitators: Alex Wiltschko, Aurel samples in one space into another as in domain Nagy, Brendan Bulik-Sullivan, Casey Greene, 11 Dec. 13, 2019

adaptation. With more than a hundred papers Information theory is deeply connected to two mentioning Wasserstein or transport in their title key tasks in machine learning: prediction and submitted at NeurIPS this year, and several representation learning. Because of these dozens appearing every month acrossML/stats/ connections, information theory has found wide imaging and data sciences, this workshop’s aim applications in machine learning tasks, such as will be to federate and advancecurrent proving generalization bounds, certifying fairness knowledge in this rapidly growing feld. and privacy, optimizing information content of unsupervised/supervised representations, and proving limitations to prediction performance. Schedule Conversely, progress in machine learning have been successfully applied to classical information 08:00 Facundo Memoli Mémoli theory tasks such as compression and AM transmission. 08:40 Karren Dai These recent progress have lead to new open AM questions and opportunities: to marry the 09:00 Jon Weed Niles-Weed simplicity and elegance of information theoretic AM analysis with the complexity of modern high 10:30 Stefanie Jegelka Jegelka dimensional machine learning setups. However, AM because of the diversity of information theoretic 11:10 SPOTLIGHTS 5 x 10 research, diferent communities often progress AM independently despite shared questions and 12:00 Poster Session tools. For example, variational bounds to mutual PM information are concurrently developed in 02:00 Geofrey Schiebinger Schiebinger information theory, generative model, and PM learning theory communities. 02:40 Charlie Frogner Frogner PM This workshop hopes to bring together 03:00 Aude Genevay Genevay researchers from diferent disciplines, identify PM common grounds, and spur discussion on how 04:20 Daniel Kuhn Kuhn information theory can apply to and beneft from PM modern machine learning setups. 05:00 Alexei Kroshnin Kroshnin PM Schedule 05:20 Poster Session Yu, Kroshnin, Delalande, PM Carr, Tompkins, Pooladian, Robert, Makkuva, Genevay, Liu, Zeng, Frogner, 09:30 Invited Talk: Ayfer Ozgur Aydin Cazelles, Tabak, Ramos, PATY, Balikas, AM Trigila, Wang, Mahler, Nielsen, Lounici, 10:00 Invited Talk: Stefano Soatto and Swanson, Bhutani, Bréchet, Indyk, cohen, AM Alessandro Achille Soatto, Achille Jegelka, Wu, Sejourne, Manole, Zhao, 11:00 Contributed Talk Wang, Wang, Dukler, Wang, Dong AM 02:00 Invited Talk: Varun Jog Jog PM Information Theory and Machine Learning 02:30 Invited Talk: Po-Ling Loh PM

Shengjia Zhao, Jiaming Song, Yanjun Han, Kristy 03:20 Invited Talk: Aaron van den Oord Choi, Pratyusha Kalluri, Ben Poole, Alex Dimakis, PM Jiantao Jiao, Tsachy Weissman, Stefano Ermon 03:50 Invited Talk: Alexander A Alemi Alemi PM East Exhibition Hall A, Fri Dec 13, 08:00 AM 04:20 Poster Spotlight PM 12 Dec. 13, 2019

05:00 Poster Session Flamich, Ubaru, Zheng, PM Djolonga, Wickstrøm, Granziol, Pitas, Li, Our plan is to run this workshop annually co- Williamson, Yoon, Lee, Zilly, Petrini, located with one ML venue and one Systems Fischer, Dong, Alemi, Nguyen, venue, to help build a strong community which Brekelmans, Wu, Mahajan, Li, Shiragur, we think will complement newer conferences like Carmon, Adilova, LIU, An, Dash, Gunluk, SysML targeting research at the intersection of Mazumdar, Motani, Rosenzweig, Kamp, systems and machine learning. We believe this Havasi, Barnes, Zhou, Hao, Foster, dual approach will help to create a low barrier to Benjamini, Srebro, Tschannen, participation for both communities. Rubenstein, Gelly, Duchi, Sidford, Ru, Zohren, Dalal, Osborne, Roberts, This workshop is part two of a two-part series Charikar, Subramanian, Fan, Schwarzer, with one day focusing on ML for Systems and the Roberts, Lacoste-Julien, Prabhu, other on Systems for ML. Although the two Galstyan, Ver Steeg, Sankar, Noh, workshops are being led by diferent organizers, Dasarathy, Park, Cheung, Tran, Yang, we are coordinating our call for papers to ensure Poole, Censi, Sylvain, Hjelm, Liu, Gallego- that the workshops complement each other and Posada, Sypherd, Yang, Morshuis that submitted papers are routed to the appropriate venue.

MLSys: Workshop on Systems for ML Schedule Aparna Lakshmiratan, Siddhartha Sen, Joseph Gonzalez, Dan Crankshaw, Sarah Bird 08:30 Welcome East Meeting Rooms 11 + 12, Fri Dec 13, 08:00 AM AM 08:40 Keynote 1: Machine Learning A new area is emerging at the intersection of AM Reproducibility: An update from the artifcial intelligence, machine learning, and NeurIPS 2019 Reproducibility Co- systems design. This has been accelerated by the Chairs, Joelle Pineau, McGill explosive growth of diverse applications of ML in University and Facebook production, the continued growth in data volume, 09:10 Contributed Talk: SLIDE : Training and the complexity of large-scale learning AM Deep Neural Networks with Large systems. The goal of this workshop is to bring Outputs on a CPU faster than a together experts working at the crossroads of V100-GPU machine learning, system design and software 09:30 Contributed Talk: NeMo: A Toolkit for engineering to explore the challenges faced when AM Building AI Applications Using building large-scale ML systems. In particular, we Neural Modules aim to elicit new connections among these diverse felds, identifying theory, tools and design 09:50 Poster Overview principles tailored to practical machine learning AM workfows. We also want to think about best 10:00 Posters and Cofee Kumar, Kornuta, practices for research in this area and how to AM Bakhteev, Guan, Dong, Cho, Laue, evaluate it. The workshop will cover state of the Vasiloudis, Anghel, Wijmans, Shang, art ML and AI platforms and algorithm toolkits Kuchaiev, Lin, Zhang, Zhu, Chen, Joseph, (e.g. TensorFlow, PyTorch1.0, MXNet etc.), as well Ding, Raiman, Shin, Thangarasa, as dive into machine learning-focused Sankaran, Mathur, Dazzi, Löning, Ho, developments in distributed learning platforms, Zgraggen, Nakandala, Kornuta, programming languages, data structures, Kuznetsova hardware accelerators, benchmarking systems 11:10 Keynote 2: Vivienne Sze, MIT and other topics. AM

This workshop will follow the successful model we have previously run at ICML, NeurIPS and SOSP. 13 Dec. 13, 2019

11:40 Contributed Talk: 5 Parallel Prism: A learning-based analysis-by-synthesis methods for AM Topology for Pipelined perception and inference. In addition, we pose Implementations of Convolutional the question of how ideas from these research Neural Networks Using areas can be combined with and complement Computational Memory modern deep learning practices. 12:00 Lunch PM Schedule 01:30 Systems Bonanza (10 minutes each) PM PyTorch TensorFlow Keras TVM Ray ONNX Runtime CoreML Flux MLFlow 08:50 Opening Remarks Rosenbaum, MLPerf RL Systems MXNet AM Garnelo, Battaglia, Allen, Yildirim 03:30 Break and Poster Session 09:00 Tatiana Lopez-Guevara López-Guevara PM AM 03:30 Posters and Cofee 09:35 Spotlights 1 Chang, Chorowski, Dirks PM AM 04:30 Keynote 3 10:30 Josh Tenenbaum Tenenbaum PM AM 05:00 Contributed Talk: LISA: Towards 11:05 Sanja Fidler Fidler PM Learned DNA Sequence Search AM 05:20 Closing 11:40 Spotlights 2 Deng, Crawford PM AM 01:30 Niloy Mitra Mitra PM Perception as generative reasoning: 02:05 Danilo Rezende Jimenez Rezende PM structure, causality, probability 02:40 Posters Graber, Hu, Fang, Hamrick,

Dan Rosenbaum, Marta Garnelo, Peter Battaglia, PM Giannone, Co-Reyes, Deng, Crawford, Kelsey Allen, Ilker Yildirim Dittadi, Karkus, Dirks, Trivedi, Raj, Felip Leon, Chan, Chorowski, Orchard, Stanić, East Meeting Rooms 1 - 3, Fri Dec 13, 08:00 AM Kortylewski, Zinberg, Zhou, Sun, Mansinghka, Li, Cusumano-Towner Many perception tasks can be cast as ‘inverse 04:15 Panel Fidler, Tenenbaum, López- problems’ where the input signal is the outcome PM Guevara, Jimenez Rezende, Mitra of a causal process and perception is to invert 05:15 Closing remarks Rosenbaum, Garnelo, that process. For example in visual object PM Battaglia, Allen, Yildirim perception, the image is caused by an object and perception is to infer which object gave rise to that image. Following an analysis-by-synthesis approach, modelling the forward and causal Minding the Gap: Between Fairness and direction of the data generation process is a Ethics natural way to capture the underlying scene structure, which typically leads to broader Igor Rubinov, Risi Kondor, Jack Poulson, Manfred K. generalisation and better sample efciency. Such Warmuth, Emanuel Moss, Alexa Hagerty a forward model can be applied to solve the East Meeting Rooms 8 + 15, Fri Dec 13, 08:00 AM inverse problem (inferring the scene structure from an input image) using Bayes rule, for When researchers and practitioners, as well as example. This workfow stands in contrast to policy makers and the public, discuss the impacts common approaches in deep learning, where of deep learning systems, they draw upon typically one frst defnes a task, and then multiple conceptual frames that do not sit easily optimises a deep model end-to-end to solve it. In beside each other. Questions of algorithmic this workshop we propose to revisit ideas from fairness arise from a set of concerns that are the generative approach and advocate for 14 Dec. 13, 2019

similar, but not identical, to those that circulate 02:00 A Conversation with Meredith around AI safety, which in turn overlap with, but PM Whittaker Sloane, Whittaker are distinct from, the questions that motivate 02:45 Global implications Malliaraki, Poulson, work on AI ethics, and so on. Robust bodies of PM Prabhakaran, Sloane, Hagerty research on privacy, security, transparency, 03:45 Cofee Break accountability, interpretability, explainability, and PM opacity are also incorporated into each of these 04:30 Solutions Christian, Hu, Kondor, frames and conversations in variable ways. These PM Marshall, Rogers, Schuur, Moss frames reveal gaps that persist across both highly technical and socially embedded approaches, and yet collaboration across these Abstracts (5): gaps has proven challenging. Abstract 3: Approaches to Understanding AI Fairness, Ethics, and Safety in AI each draw upon in Minding the Gap: Between Fairness and diferent disciplinary prerogatives, variously Ethics, Bengio, Dobbe, Elish, Kroll, Metcalf, centering applied mathematics, analytic Poulson 08:45 AM philosophy, behavioral sciences, legal studies, The stakes of AI certainly alter how we relate to and the social sciences in ways that make each other as humans - how we know what we conversation between these frames fraught with know about reality, how we communicate, how misunderstandings. These misunderstandings we work and earn money, and about how we arise from a high degree of linguistic slippage think of ourselves as human. But in grappling between diferent frames, and reveal the with these changing relations, three fairly epistemic fractures that undermine valuable concrete approaches have dominated the synergy and productive collaboration. This conversation: ethics, fairness, and safety. These workshop focuses on ways to translate between approaches come from very diferent academic these ongoing eforts and bring them into backgrounds, draw attention to very diferent necessary conversation in order to understand aspects of AI, and imagine very diferent the profound impacts of algorithmic systems in problems and solutions as relevant, leading us to society. ask: • What are the commonalities and diferences Schedule between ethics, fairness, and safety as approaches to addressing the challenges of AI? • How do these approaches imagine diferent 08:00 Opening Remarks Poulson, Warmuth problems and solutions for the challenges posed AM by AI? 08:15 Invited Talk Bengio • How can these approaches work together, or AM are there some areas where they are mutually 08:45 Approaches to Understanding AI incompatible? AM Bengio, Dobbe, Elish, Kroll, Metcalf, Poulson 09:45 Spectrogram Moss Abstract 6: Detecting and Documenting AI AM Impacts in Minding the Gap: Between 10:00 Cofee Break Fairness and Ethics, Christian, Hagerty, Rogers, AM Schuur, Snow, Elish 10:30 AM 10:30 Detecting and Documenting AI Algorithmic systems are being widely used in key AM Impacts Christian, Hagerty, Rogers, social institutions and while they promise radical Schuur, Snow, Elish improvements in felds from public health to 11:30 Responsibilities Kim, O'Sullivan, energy allocation, they also raises troubling AM Schuur, Smart, Metcalf issues of bias, discrimination, and “automated 12:30 Lunch inequality.” They also present irresolvable PM challenges related to the dual-use nature of these 15 Dec. 13, 2019

technologies, secondary efects that are difcult environment, public health and agriculture in to anticipate, and alter power relations between diverse settings? individuals, companies, and governments. • How should we delimit the scope of AI impacts? What can properly be considered an AI impact, as Abstract 12: Solutions in Minding the Gap: opposed to an impact arising from some other Between Fairness and Ethics, Christian, Hu, cause? Kondor, Marshall, Rogers, Schuur, Moss 04:30 PM • How do we detect and document the social impacts of AI? With the recognition that there are no fully • What tools, processes, and institutions ought to sufcient steps that can be taken to addressing be involved in addressing these questions? all AI impacts, there are concrete things that ought to be done, ranging across technical, socio- technical, and legal or regulatory possibilities. Abstract 7: Responsibilities in Minding the • What are the technical, social, and/or Gap: Between Fairness and Ethics, Kim, regulatory solutions that are necessary to address the riskiest aspects of AI? O'Sullivan, Schuur, Smart, Metcalf 11:30 AM • What are key approaches to minimize the risks While there is a great deal of AI research of AI technologies? happening in academic settings, much of that work is operationalized within corporate contexts. Some companies serve as vendors, selling AI systems to government entities, some sell to KR2ML - Knowledge Representation and other companies, some sell directly to end-users, Reasoning Meets Machine Learning and yet others sell to any combination of the above. Veronika Thost, Christian Muise, Kartik • What set of responsibilities does the AI industry Talamadupula, Sameer Singh, Chris Ré have w.r.t. AI impacts? • How do those responsibilities shift depending West 109 + 110, Fri Dec 13, 08:00 AM on a B2B, B2G, B2C business model? • What responsibilities does government have to Machine learning (ML) has seen a tremendous society, with respect to AI impacts arising from amount of recent success and has been applied industry? in a variety of applications. However, it comes • What role does civil society organizations have with several drawbacks, such as the need for to play in this conversation? large amounts of training data and the lack of explainability and verifability of the results. In many domains, there is structured knowledge (e.g., from electronic health records, laws, clinical Abstract 10: Global implications in Minding guidelines, or common sense knowledge) which the Gap: Between Fairness and Ethics, can be leveraged for reasoning in an informed Malliaraki, Poulson, Prabhakaran, Sloane, Hagerty way (i.e., including the information encoded in 02:45 PM the knowledge representation itself) in order to The risks and benefts of AI are unevenly obtain high quality answers. Symbolic distributed within societies and across the globe. approaches for knowledge representation and Governance regimes are drastically diferent in reasoning (KRR) are less prominent today - various regions of the world, as are the political mainly due to their lack of scalability - but their and ethical implications of AI technologies. strength lies in the verifable and interpretable • How do we better understand how AI reasoning that can be accomplished. The KR2ML technologies operate around the world and the workshop aims at the intersection of these two range of risks they carry for diferent societies? subfelds of AI. 
It will shine a light on the • Are there global claims about the implications synergies that (could/should) exist between KRR of AI that can apply everywhere around the and ML, and will initiate a discussion about the globe? If so, what are they? key challenges in the feld. • What can we learn from AI’s impacts on labor, 16 Dec. 13, 2019

Schedule 03:00 Contributed Talk: LeDeepChef: Deep PM Reinforcement Learning Agent for Families of Text-Based Games 08:00 Opening Remarks Adolphs AM 03:15 Poster Spotlights B (13 posters) 08:05 Invited Talk (William W. Cohen) PM Camacho, Percy, Belle, Gunel, Klassen, AM Cohen Weyde, Ghalwash, Arora, Illanes, Raiman, 08:35 Contributed Talk: Neural-Guided Wang, Lew, Min AM Symbolic Regression with 03:30 Cofee Break + Poster Session Asymptotic Constraints Singh PM 08:50 Contributed Talk: Towards Finding 04:15 Invited Talk (Yejin Choi) Choi AM Longer Proofs Zombori PM 09:05 Contributed Talk: Neural Markov 04:45 Invited Talk (Guy Van den Broeck) AM Logic Networks Kuzelka PM Van den Broeck 09:20 Poster Spotlights A (23 posters) 05:15 Discussion Panel AM Bahn, Xu, Su, Cunnington, Hwang, Dash, PM Camacho, Salonidis, Li, Zhang, Naderi, 05:55 Closing Remarks Zeng, Khosravi, Colon-Hernandez, PM Diochnos, Windridge, Manhaeve, Belle, Juba, Govindarajulu, Bockhorst 09:45 Cofee Break + Poster Session AM Retrospectives: A Venue for Self-Refection 10:30 Invited Talk (Xin Luna Dong) Dong in ML Research AM Ryan Lowe, Yoshua Bengio, Joelle Pineau, Michela 11:00 Contributed Talk: Layerwise Paganini, Jessica Forde, Shagun Sodhani, Abhishek AM Knowledge Extraction from Deep Gupta, Joel Lehman, Peter Henderson, Kanika Convolutional Networks Odense Madan, Koustuv Sinha, Xavier Bouthillier 11:15 Contributed Talk: Ontology-based AM Interpretable Machine Learning with West 114 + 115, Fri Dec 13, 08:00 AM Learnable Anchors Lai The NeurIPS Workshop on Retrospectives in 11:30 Contributed Talk: Learning multi- Machine Learning will kick-start the exploration of AM step spatio-temporal reasoning with a new kind of scientifc publication, called Selective Attention Memory Network retrospectives. The purpose of a retrospective is Jayram to answer the question: 11:45 Contributed Talk: MARLeME: A Multi- AM Agent Reinforcement Learning “What should readers of this paper know now, Model Extraction Library Kazhdan that is not in the original publication?” 12:00 Invited Talk (Vivek Srikumar) PM Srikumar Retrospectives provide a venue for authors to 12:30 Lunch Break refect on their previous publications, to talk PM about how their intuitions have changed, to 02:00 Invited Talk (Francesca Rossi) Rossi identify shortcomings in their analysis or results, PM and to discuss resulting extensions that may not 02:30 Contributed Talk: TP-N2F: Tensor be sufcient for a full follow-up paper. A PM Product Representation for Natural retrospective is written about a single paper, by To Formal Language Generation Chen that paper's author, and takes the form of an 02:45 Contributed Talk: TabFact: A Large- informal paper. The overarching goal of PM scale Dataset for Table-based Fact retrospectives is to improve the science, Verifcation Chen openness, and accessibility of the machine learning feld, by widening what is publishable and helping to identifying opportunities for 17 Dec. 13, 2019

improvement. Retrospectives will also give 04:20 Invited talk: Michael Littman, researchers and practitioners who are unable to PM "Refecting on 'Markov games that attend top conferences access to the author’s people play'" updated understanding of their work, which 04:40 Retrospectives brainstorming would otherwise only be accessible to their PM session: how do we produce immediate circle. impact?

Schedule Abstracts (4):

Abstract 3: Invited talk: Melanie Mitchell, 09:00 Opening Remarks "Active Symbols and Analogy-Making: AM Refections on Hofstadter & Mitchell's 09:10 Invited talk: Leon Bottou Copycat project" in Retrospectives: A Venue AM for Self-Refection in ML Research, 09:30 AM 09:30 Invited talk: Melanie Mitchell, AM "Active Symbols and Analogy- In our 1995 paper “The Copycat Project: A Model Making: Refections on Hofstadter & of Mental Fluidity and Analogy-Making”, Douglas Mitchell's Copycat project" Hofstadter and I described Copycat, a computer program that makes analogies in an idealized 09:50 Invited talk: Zach Lipton, "Fairness domain of letter strings. The goal of the project AM & Interpretability in Machine was to model the general-purpose ability of Learning and the Dangers of humans to fuidly perceive abstract similarities Solutionism" between situations. Copycat's active symbol 10:10 Cofee break + poster set-up architecture, inspired by human perception, was AM a unique combination of symbolic and 10:25 Contributed talk: Juergen subsymbolic components. Now, 25 years later, AI AM Schmidhuber, "Unsupervised is refocusing on abstraction and analogy as core minimax" aspects of robust intelligence, and the ideas 10:35 Contributed talk: Prabhu Pradhan, underlying Copycat have new relevance. In this AM "Smarter prototyping for neural talk I will refect on these ideas, on the limitations learning" of Copycat and its idealized domain, and on 10:45 Contributed talk: Andre Pacheco, possible novel contributions of this decades-old AM "Recent advances in deep learning work to current open problems in AI. applied for skin cancer detection" 10:55 Invited talk: Veronika Cheplygina, AM "How I Fail in Writing Papers" Abstract 4: Invited talk: Zach Lipton, 11:15 Panel: Yoshua Bengio, Melanie "Fairness & Interpretability in Machine AM Mitchell, Joelle Pineau, Jonathan Learning and the Dangers of Solutionism" in Frankle Retrospectives: A Venue for Self-Refection 12:15 Lunch break in ML Research, 09:50 AM PM 01:45 Invited talk: Emily Denton Supervised learning algorithms are increasingly PM operationalized in real-world decision-making systems. Unfortunately, the nature and 02:05 Invited talk: Percy Liang desiderata of real-world tasks rarely ft neatly into PM the supervised learning contract. Real data 02:25 Retrospectives lightning talks deviates from the training distribution, training PM targets are often weak surrogates for real-world 03:00 Posters + Cofee Break desiderata, error is seldom the right utility PM function, and while the framework ignores 04:00 Invited talk: David Duvenaud, interventions, predictions typically drive PM "Refecting on Neural ODEs" decisions. While the deep questions concerning 18 Dec. 13, 2019

the ethics of AI necessarily address the processes https://nips.cc/Conferences/2019/ that generate our data and the impacts that CallForCompetitions automated decisions will have, neither ML tools, nor proposed ML-based solutions tackle these problems head on. This talk explores the Schedule consequences and limitations of employing ML- based technology in the real world, the 08:10 Disentanglement Challenge - limitations of recent solutions (so-called fair and AM Disentanglement and Results of the interpretable algorithms) for mitigating societal Challenge Stages 1 & 2 Miladinovic, harms, and contemplates the meta-question: Bauer, Keysers when should (today's) ML systems be of the 08:40 Robot open-Ended Autonomous table altogether? AM Learning: a challenge Cartoni, Baldassarre 09:00 The Pommerman competition Abstract 10: Panel: Yoshua Bengio, Melanie AM Resnick, Bouhenguel, Görög, Zhang, Mitchell, Joelle Pineau, Jonathan Frankle in Jasek Retrospectives: A Venue for Self-Refection 09:30 The CellSignal challenge Mabey, in ML Research, 11:15 AM AM Sypetkowski, Haque, Earnshaw, Shen, Goldbloom Some of the questions that will be discussed: (1) how can we encourage researchers to share their 10:00 Learn to Move: Walk Around Song, real thoughts and feelings about their work? and AM Kidziński, Peng, Zhou, Kolesnikov, Wang, (2) how can we improve the dissemination of 'soft Akimov, Zubkov, Zeng knowledge' in the feld? 10:30 Cofee break AM 11:00 AI Driving Olympics 3 Dicle, Paull, Tani, AM Mallya, Genc, Bowser, Sun, Tao, Abstract 14: Retrospectives lightning talks in Marcotte, Chiu, Wolf Retrospectives: A Venue for Self-Refection in ML Research, 02:25 PM 01:00 Lunch break PM Lightning talks: 02:15 MicroNet Challenge Wang, Leng, An Intriguing Failing of Convolutional Neural PM Cheng, Yan, Wang, Gale, Elsen Networks and the CoordConv Solution (Rosanne 03:15 Reconnaissance Blind Chess Liu) PM competition Llorens, Gardner, Perrotta, Learning the structure of deep sparse graphical Highley, Clark, Perrotta, Bernardoni, models (Zoubin Ghahramani) Jordan, Wang, Peng Lessons Learned from The Lottery Ticket 04:15 The 3D Object Detection over HD Hypothesis (Jonathan Frankle) PM Maps for Autonomous Cars FiLM: Visual Reasoning with a General Challenge Vincent, Jain, Zhang, Conditioning Layer (Ethan Perez) Addicam, Iglovikov, Ondruska, DLPaper2Code: Auto-Generation of Code from MURAMATSU Deep Learning Research Papers (Anush Sankaran)

Conditional computation in neural networks for Abstracts (4): faster models (Emmanuel Bengio) Abstract 1: Disentanglement Challenge - Disentanglement and Results of the Challenge Stages 1 & 2 in Competition Track Competition Track Day 1 Day 1, Miladinovic, Bauer, Keysers 08:10 AM Stefan Bauer: Learning Disentangled Hugo Jair Escalante Representations Djordje Miladinovic: Disentanglement in the Real- West 116 + 117, Fri Dec 13, 08:00 AM 19 Dec. 13, 2019

World Llorens) Daniel Keysers: Disentanglement_lib * Challenges of the Game (Ryan Gardner) Bernhard Schölkopf: Hand-out-of-certficates * Competition Results (Casey Richardson) * Overview of the StrangeFish Bot (Gino Perrotta and Robert Perrotta) * Overview of the LaSalle Bot (T.J. Highley) Abstract 4: The CellSignal challenge in * Overview of the penumbra Bot (Gregory Clark) Competition Track Day 1, Mabey, Sypetkowski, Haque, Earnshaw, Shen, Goldbloom 09:30 AM * Overview of the wbernar5 Bot (William Bernardoni) * Opening remarks, description of competition, * Overview of the MBot Bot (Mark Jordan) summary of results. * Description of frst prize solution. * Description of second prize solution. * Mention of third prize solution Workshop on Federated Learning for Data * congratulations to winners and description of Privacy and Confdentiality AutoML solution.

* Prize ceremony. Lixin Fan, Jakub Konečný, Yang Liu, Brendan McMahan, Virginia Smith, Han Yu

West 118 - 120, Fri Dec 13, 08:00 AM Abstract 9: MicroNet Challenge in Competition Track Day 1, Wang, Leng, Cheng, Overview Yan, Wang, Gale, Elsen 02:15 PM

Trevor Gale and Erich Elsen. Introduction to the Privacy and security have become critical competition and overview of results. concerns in recent years, particularly as companies and organizations increasingly collect detailed information about their products and Peisong Wang, Cong Leng, and Jian Cheng. An users. This information can enable machine Empirical Study of Network Compression for learning methods that produce better products. Image Classifcation. However, it also has the potential to allow for misuse, especially when private data about individuals is involved. Recent research shows Trevor Gale and Erich Elsen. Highlights of other that privacy and utility do not necessarily need to notable entries. be at odds, but can be addressed by careful design and analysis. The need for such research is reinforced by the recent introduction of new Zhongxia Yan and Hanrui Wang. Efcient Memory- legal constraints, led by the European Union’s Augmented Language Models with Network General Data Protection Regulation (GDPR), Compression which is already inspiring novel legislative approaches around the world such as Cyber- security Law of the People’s Republic of China Trevor Gale and Erich Elsen. Updates and and The California Consumer Privacy Act of 2018. improvements for the 2020 MicroNet Challenge. An approach that has the potential to address a number of problems in this space is federated learning (FL). FL is an ML setting where many Abstract 10: Reconnaissance Blind Chess clients (e.g., mobile devices or whole competition in Competition Track Day 1, organizations) collaboratively train a model under Llorens, Gardner, Perrotta, Highley, Clark, the orchestration of a central server (e.g., service Perrotta, Bernardoni, Jordan, Wang, Peng 03:15 provider), while keeping the training data PM decentralized. Organizations and mobile devices * Chair: I-Jeng Wang have access to increasing amounts of sensitive * Competition and Game Overview (Ashley data, with scrutiny of ML privacy and data 20 Dec. 13, 2019

Call for Contributions

We welcome high quality submissions in the broad area of federated learning (FL). A few (non-exhaustive) topics of interest include:

. Optimization algorithms for FL, particularly communication-efficient algorithms tolerant of non-IID data
. Approaches that scale FL to larger models, including model and gradient compression techniques (a toy sketch of this idea follows the list)
. Novel applications of FL
. Theory for FL
. Approaches to enhancing the security and privacy of FL, including cryptographic techniques and differential privacy
. Bias and fairness in the FL setting
. Attacks on FL including model poisoning, and corresponding defenses
. Incentive mechanisms for FL
. Software and systems for FL
. Novel applications of techniques from other fields to the FL setting: information theory, multi-task learning, model-agnostic meta-learning, etc.
. Work on fully-decentralized (peer-to-peer) learning, as there is significant overlap in both interest and techniques with FL.
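As a hypothetical illustration of the gradient-compression topic above (not drawn from any submitted paper), a top-k sparsifier keeps only the largest-magnitude coordinates of a client update, so index/value pairs rather than the full dense vector are transmitted:

import numpy as np

def top_k_sparsify(update, k):
    # Keep the k largest-magnitude entries of an update and zero the rest;
    # only the k (index, value) pairs need to be communicated.
    flat = update.ravel()
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    compressed = np.zeros_like(flat)
    compressed[idx] = flat[idx]
    return compressed.reshape(update.shape), idx, flat[idx]

update = np.random.default_rng(1).normal(size=(4, 4))
sparse, idx, vals = top_k_sparsify(update, k=3)
# 3 index/value pairs replace 16 dense floats here; practical schemes usually
# add error feedback, accumulating the truncated residual locally so that it
# is not lost across rounds.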

Submissions in the form of extended abstracts must be at most 4 pages long (not including references), be anonymized, and adhere to the NeurIPS 2019 format. Submissions will be accepted as contributed talks or poster presentations. The workshop will not have formal proceedings, but accepted papers will be posted on the workshop website.

We support reproducible research and will sponsor a prize to be given to the best contribution that provides code to reproduce their results.

Submission link: https://easychair.org/conferences/?conf=flneurips2019

Important Dates (2019)
Submission deadline: Sep 9
Author notification: Sep 30
Camera-Ready Papers Due: TBD
Workshop: Dec 13

Organizers:
Lixin Fan, WeBank
Jakub Konečný, Google
Yang Liu, WeBank
Brendan McMahan, Google
Virginia Smith, CMU
Han Yu, NTU

Invited Speakers:
Francoise Beaufays, Principal Researcher, Google
Shahrokh Daijavad, Distinguished Researcher, IBM
Dawn Song, Professor, University of California, Berkeley
Ameet Talwalkar, Assistant Professor, CMU; Chief Scientist, Determined AI
Max Welling, Professor, University of Amsterdam; VP Technologies, Qualcomm
Qiang Yang, Hong Kong University of Science and Technology, Hong Kong; Chief AI Officer, WeBank

FAQ

Can supplementary material be added beyond the 4-page limit and are there any restrictions on it?
Yes, you may include additional supplementary material, but you should ensure that the main paper is self-contained, since looking at supplementary material is at the discretion of the reviewers. The supplementary material should also follow the same NeurIPS format as the paper and be limited to a reasonable amount (max 10 pages in addition to the main submission).

Can a submission to this workshop be submitted to another NeurIPS workshop in parallel?
We discourage this, as it leads to more work for reviewers across multiple workshops. Our suggestion is to pick one workshop to submit to.

Can a paper be submitted to the workshop that has already appeared at a previous conference with published proceedings?
We won't be accepting such submissions unless they have been adapted to contain significantly new results (where novelty is one of the qualities reviewers will be asked to evaluate).

Can a paper be submitted to the workshop that is currently under review or will be under review at a conference during the review phase?
It is fine to submit a condensed version (i.e., 4 pages) of a parallel conference submission, if it is also fine for the conference in question. Our workshop does not have archival proceedings, and therefore parallel submissions of extended versions to other conferences are acceptable.

======
Accepted papers:

1. Paul Pu Liang, Terrance Liu, Liu Ziyin, Russ Salakhutdinov and Louis-Philippe Morency. Think Locally, Act Globally: Federated Learning with Local and Global Representations
2. Xin Yao, Tianchi Huang, Rui-Xiao Zhang, Ruiyu Li and Lifeng Sun. Federated Learning with Unbiased Gradient Aggregation and Controllable Meta Updating
3. Daniel Peterson, Pallika Kanani and Virendra Marathe. Private Federated Learning with Domain Adaptation
4. Daliang Li and Junpu Wang. FedMD: Heterogenous Federated Learning via Model Distillation
5. Sebastian Caldas, Jakub Konečný, H. Brendan McMahan and Ameet Talwalkar. Mitigating the Impact of Federated Learning on Client Resources
6. Jianyu Wang, Anit Sahu, Zhouyi Yang, Gauri Joshi and Soummya Kar. MATCHA: Speeding Up Decentralized SGD via Matching Decomposition Sampling
7. Sebastian Caldas, Sai Meher Karthik Duddu, Peter Wu, Tian Li, Jakub Konečný, H. Brendan McMahan, Virginia Smith and Ameet Talwalkar. Leaf: A Benchmark for Federated Settings
8. Yihan Jiang, Jakub Konečný, Keith Rush and Sreeram Kannan. Improving Federated Learning Personalization via Model Agnostic Meta Learning
9. Zhicong Liang, Bao Wang, Stanley Osher and Yuan Yao. Exploring Private Federated Learning with Laplacian Smoothing
10. Tribhuvanesh Orekondy, Seong Joon Oh, Yang Zhang, Bernt Schiele and Mario Fritz. Gradient-Leaks: Understanding Deanonymization in Federated Learning
11. Yang Liu, Yan Kang, Xinwei Zhang, Liping Li and Mingyi Hong. A Communication Efficient Vertical Federated Learning Framework
12. Ahmed Khaled, Konstantin Mishchenko and Peter Richtárik. Better Communication Complexity for Local SGD
13. Yang Liu, Xiong Zhang, Shuqi Qin and Xiaoping Lei. Differentially Private Linear Regression over Fully Decentralized Datasets
14. Florian Hartmann, Sunah Suh, Arkadiusz Komarzewski, Tim D. Smith and Ilana Segall. Federated Learning for Ranking Browser History Suggestions
15. Aleksei Triastcyn and Boi Faltings. Federated Learning with Bayesian Differential Privacy

16. Jack Goetz, Kshitiz Malik, Duc Bui, Seungwhan Moon, Honglei Liu and Anuj Kumar. Active Federated Learning
17. Kartikeya Bhardwaj, Wei Chen and Radu Marculescu. FedMAX: Activation Entropy Maximization Targeting Effective Non-IID Federated Learning
18. Mingshu Cong, Zhongming Ou, Yanxin Zhang, Han Yu, Xi Weng, Jiabao Qu, Siu Ming Yiu, Yang Liu and Qiang Yang. Neural Network Optimization for a VCG-based Federated Learning Incentive Mechanism
19. Kai Yang, Tao Fan, Tianjian Chen, Yuanming Shi and Qiang Yang. A Quasi-Newton Method Based Vertical Federated Learning Framework for Logistic Regression
20. Suyi Li, Yong Cheng, Yang Liu and Wei Wang. Abnormal Client Behavior Detection in Federated Learning
21. Songtao Lu, Yawen Zhang, Yunlong Wang and Christina Mack. Learn Electronic Health Records by Fully Decentralized Federated Learning
22. Shicong Cen, Huishuai Zhang, Yuejie Chi, Wei Chen and Tie-Yan Liu. Convergence and Regularization of Distributed Stochastic Variance Reduced Methods
23. Zhaorui Li, Zhicong Huang, Chaochao Chen and Cheng Hong. Quantification of the Leakage in Federated Learning
24. Tzu-Ming Harry Hsu, Hang Qi and Matthew Brown. Measuring the Effects of Non-Identical Data Distribution for Federated Visual Classification
25. Boyue Li, Shicong Cen, Yuxin Chen and Yuejie Chi. Communication-Efficient Distributed Optimization in Networks with Gradient Tracking
26. Khaoula El Mekkaoui, Paul Blomstedt, Diego Mesquita and Samuel Kaski. Towards federated stochastic gradient Langevin dynamics
27. Felix Sattler, Klaus-Robert Müller and Wojciech Samek. Clustered Federated Learning
28. Ziteng Sun, Peter Kairouz, Ananda Theertha Suresh and Brendan McMahan. Backdoor Attacks on Federated Learning and Corresponding Defenses
29. Neta Shoham, Tomer Avidor, Aviv Keren, Nadav, Daniel Benditkis, Liron Mor-Yosef and Itai Zeitak. Overcoming Forgetting in Federated Learning on Non-IID Data
30. Ahmed Khaled and Peter Richtárik. Gradient Descent with Compressed Iterates
31. Jiahuan Luo, Xueyang Wu, Yun Luo, Anbu Huang, Yunfeng Huang, Yang Liu and Qiang Yang. Real-World Image Datasets for Federated Learning
32. Ahmed Khaled, Konstantin Mishchenko and Peter Richtárik. First Analysis of Local GD on Heterogeneous Data
33. Dashan Gao, Ce Ju, Xiguang Wei, Yang Liu, Tianjian Chen and Qiang Yang. HHHFL: Hierarchical Heterogeneous Horizontal Federated Learning for Electroencephalography

Schedule

08:55 AM Opening remarks Fan
09:00 AM Federated Learning for Recommendation Systems Yang
09:30 AM TBD Talwalkar
10:00 AM Coffee break and posters
10:30 AM TBD Welling
11:00 AM TBD Song
11:30 AM Think Locally, Act Globally: Federated Learning with Local and Global Representations
11:40 AM FedMD: Heterogenous Federated Learning via Model Distillation
11:50 AM Private Federated Learning with Domain Adaptation
12:00 PM Improving Federated Learning Personalization via Model Agnostic Meta Learning
12:10 PM Lunch break and posters Sattler, El Mekkaoui, Shoham, Hong, Hartmann, Li, Li, Caldas Rivera, Wang, Bhardwaj, Orekondy, KANG, Gao, Cong, Yao, Lu, LUO, Cen, Kairouz, Jiang, Hsu, Triastcyn, Liu, Khaled Ragab Bayoumi, Liang, Faltings, Moon, Li, Fan, Huang, Miao, Qi, Brown, Glass, Wang, Chen, Marculescu, Avidor, Wu, Hong, Ju, Rush, Zhang, ZHOU, Beaufays, Zhu, Xia
01:30 PM TBD Ramage
02:20 PM TBD Beaufays
02:30 PM MATCHA: Speeding Up Decentralized SGD via Matching Decomposition Sampling
02:40 PM Mitigating the Impact of Federated Learning on Client Resources
02:50 PM A Communication Efficient Vertical Federated Learning Framework
03:00 PM Better Communication Complexity for Local SGD
03:10 PM Coffee break and posters
03:30 PM TBD Popa
04:30 PM FOCUS: Federated Opportunistic Computing for Ubiquitous Systems Chen
05:00 PM Federated Learning with Unbiased Gradient Aggregation and Controllable Meta Updating
05:10 PM Exploring Private Federated Learning with Laplacian Smoothing
05:20 PM Gradient-Leaks: Understanding Deanonymization in Federated Learning
05:30 PM Federated Learning with Bayesian Differential Privacy
05:40 PM Panel discussion
06:10 PM Closing Remarks

Machine Learning for the Developing World (ML4D): Challenges and Risks

Maria De-Arteaga, Amanda Coston, Tejumade Afonja

West 121 + 122, Fri Dec 13, 08:00 AM

As the use of machine learning becomes ubiquitous, there is growing interest in understanding how machine learning can be used to tackle global development challenges. The possibilities are vast, and it is important that we explore the potential benefits of such technologies, which has driven the agenda of the ML4D workshop in the past. However, there is a risk that technology optimism and a categorization of ML4D research as inherently "social good" may result in initiatives failing to account for unintended harms or deviating scarce funds towards initiatives that appear exciting but have no demonstrated effect. Machine learning technologies deployed in developing regions have often been created for different contexts and are trained with data that is not representative of the new deployment setting. Most concerning of all, companies sometimes make the deliberate choice to deploy new technologies in countries with little regulation in order to experiment.

This year's program will focus on the challenges and risks that arise when deploying machine learning in developing regions. This one-day workshop will bring together a diverse set of participants from across the globe to discuss essential elements for ensuring ML4D research moves forward in a responsible and ethical manner. Attendees will learn about potential unintended harms that may result from ML4D solutions, technical challenges that currently prevent the effective use of machine learning in vast regions of the world, and lessons that may be learned from other fields.

The workshop will include invited talks, a poster session of accepted papers and panel discussions. We welcome paper submissions featuring novel machine learning research that characterizes or tackles challenges of ML4D, empirical papers that reveal unintended harms of machine learning technology in developing regions, and discussion papers that examine the current state of the art of ML4D and propose paths forward.

Schedule

08:30 AM Opening Remarks
08:45 AM AI's Blindspots and Where to Find Them Raji
09:15 AM Algorithmic Colonization Birhane
09:45 AM Coffee Break
10:30 AM Lessons from ICTD -- Information & Communication Tech Toyama
11:00 AM Poster session Woldeyohannis, Duvenhage, Waigama, Senay, Babirye, Ayalew, Ogueji, Prabhu, Ravindran, Wahab, Nwokoye, Duckworth, Abera, Mideksa, Benabbou, Sinha, Kiskin, Soden, Isagah, Mwawado, Hussien, Wilder, Omeiza, Rane, Mgaya, Knight, Gonzalez Villarreal, Beyene, Obrocka Tulinska, Cantu Diaz de Leon, Aro, Smith, Famoroti, Vepakomma, Raskar, Bhowmick, Nwokoye, Noriega Campero, Mbelwa, Trivedi
12:00 PM Lunch
02:00 PM Data sharing in and for Africa Remy
02:15 PM Risks of Using Non-verified Open Data: A case study on using Machine Learning techniques for predicting Pregnancy Outcomes in India Trivedi
02:30 PM A Noxious Market for Personal Data Abdulrahim
02:45 PM HumBug Zooniverse: a crowdsourced acoustic mosquito dataset Kiskin
03:00 PM Mathematics of identity at trial: Digital ID at the constitutional court in Kenya Mutung'u
03:30 PM Coffee and Posters
04:15 PM Rockefeller Foundation and ML4D Gjekmarkaj
04:20 PM Partnership on AI and ML4D Xiang
04:25 PM Wadhwani AI and ML4D Mahale
04:30 PM Panel Discussion: Risks and Challenges in ML4D
05:30 PM Closing Remarks and Town Hall

Abstracts (7):

Abstract 2: AI's Blindspots and Where to Find Them in Machine Learning for the Developing World (ML4D): Challenges and Risks, Raji 08:45 AM

When we deploy machine learning models, what are the known scenarios in which the technology does not work? In this talk, we will go over the many potential blindspots in ML deployments, and how, as a fundamentally narrow and limited technology, we need to be careful to communicate, evaluate for and directly address these risks in a way that protects users and reinforces developer accountability.

Abstract 3: Algorithmic Colonization in Machine Learning for the Developing World (ML4D): Challenges and Risks, Birhane 09:15 AM

As algorithmic systems become an ever more integral aspect of much of the social sphere, their application for "social good" has also increased. Western-made AI is deployed throughout the African continent with great enthusiasm and little regulation or critical engagement, in a manner that resembles past colonial conquests. This talk explores some of the critical and ethical questions that need to be raised with the "digitization" of the African continent and the use of AI for "social good".

Abstract 5: Lessons from ICTD -- Information & Communication Tech in Machine Learning for the Developing World (ML4D): Challenges and Risks, Toyama 10:30 AM

Since the turn of the millennium, the interdisciplinary field of information & communication technologies and development (ICTD) has explored how digital technologies could contribute to international socio-economic development. The associated research community includes both techno-utopians who imagine that just about any problem can be solved with the right application of technology, as well as extreme skeptics wary of any attempts at intervention. Debates continue, but in this talk, I will attempt to summarize some of the expressed consensus in ICTD -- things that not everyone necessarily believes, but will at least pay lip service to. I will also discuss what I call technology's "Law of Amplification," which reconciles some of the differing opinions in ICTD and also offers guidance for how machine learning can have real-world impact.

Abstract 8: Data sharing in and for Africa in Machine Learning for the Developing World (ML4D): Challenges and Risks, Remy 02:00 PM

Data accessibility, including the ability to make data shareable, is important for knowledge creation, scientific learning, and progress. In many locations, including Africa, data is a central engine for economic growth, and there have been recent calls to increase data access and sharability by organizations including the UN, AU, and many others. Discussions around data inaccessibility, however, revolve around lack of resources, human capital, and advanced technologies, hiding the complex and dynamic data ecosystem and the intricate challenges faced in sharing data across the continent. In this piece, we shed light on these overlooked issues around data inaccessibility and sharing within the continent using a storytelling perspective. Using the perspectives of multiple stakeholders, we surface tensions and trade-offs that are inadequately captured in current discussions around data accessibility in the continent.

Abstract 9: Risks of Using Non-verified Open Data: A case study on using Machine Learning techniques for predicting Pregnancy Outcomes in India in Machine Learning for the Developing World (ML4D): Challenges and Risks, Trivedi 02:15 PM

While applications of AI are now becoming common in fields like retail and marketing, the application of AI to problems of developing countries is still an emerging topic. In particular, AI applications in resource-poor settings remain relatively nascent, and there is huge scope for AI to be used in such settings. For example, researchers have started exploring AI applications to reduce poverty and deliver a broad range of critical public services. However, despite many promising use cases, there are many dataset-related challenges that one has to overcome in such projects. These challenges often take the form of missing data, incorrectly collected data and improperly labeled variables, among other factors. As a result, we can often end up using data that is not representative of the problem we are trying to solve. In this case study, we explore the challenges of using such an open dataset from India to predict an important health outcome. We highlight how the use of AI without proper understanding of reporting metrics can lead to erroneous conclusions.

Abstract 10: A Noxious Market for Personal Data in Machine Learning for the Developing World (ML4D): Challenges and Risks, Abdulrahim 02:30 PM

Many policymakers, academics and governments have advocated for exchangeable property rights over information, as it presents a market solution to what could be considered a market failure. This is particularly so in jurisdictions across Africa, Asia or South America, where weaker legal protections and fleeting regulatory enforcement leave data subjects vulnerable or exploited. We argue that the question of whether we could achieve a personal data economy, in which individuals have ownership rights akin to property rights over their data, should be approached with caution as a solution to ensuring individuals have agency over their data across different legal landscapes.

We present an objection to the use of property rights, a market solution, due to the noxious nature of personal data, founded on Satz and Sandel's objections to markets. Ultimately, our rights over personal data and privacy are borne out of our basic human rights and are a precondition for self-development, personal fulfilment and the free enjoyment of other fundamental human rights, and putting them up for sale risks corrupting their essence and value.

Abstract 12: Mathematics of identity at trial: Digital ID at the constitutional court in Kenya in Machine Learning for the Developing World (ML4D): Challenges and Risks, Mutung'u 03:00 PM

In September and October 2019, a three-judge bench in Kenya heard a case protesting digital ID. The case arose after the state rolled out a mandatory digital ID system, known as Huduma Namba, that would be a prerequisite for access to government services. Being a constitutional case, arguments on human rights such as privacy were expected. However, the state framed its case of digital ID as inevitable technological development, and those opposed to the advancement as being fearful of technology. To understand whether there were human rights implications with the digital ID, parties had to explain the technology choices taken, the data practices, and the parties involved in the project. Hence the court (and the public at large) spent a significant amount of time getting to understand multi-modal deduplication. While judgement on the case is yet to be delivered, the case brings to the fore the societal implications of technology. It also calls for dialogue between technologists and sociologists in the design and execution of projects aimed at low and middle income countries.

Visually Grounded Interaction and Language

Florian Strub, Abhishek Das, Erik Wijmans, Harm de Vries, Stefan Lee, Alane Suhr, Drew Arad Hudson

West 202 - 204, Fri Dec 13, 08:00 AM

The dominant paradigm in modern natural language understanding is learning statistical language models from text-only corpora. This approach is founded on a distributional notion of semantics, i.e. that the ''meaning'' of a word is based only on its relationship to other words. While effective for many applications, this approach suffers from limited semantic understanding -- symbols learned this way lack any concrete groundings into the multimodal, interactive environment in which communication takes place. The symbol grounding problem first highlighted this limitation, that ``meaningless symbols (i.e. words) cannot be grounded in anything but other meaningless symbols''.

On the other hand, humans acquire language by communicating about and interacting within a rich, perceptual environment -- providing concrete groundings, e.g. to objects or concepts either physical or psychological. Thus, recent works have aimed to bridge computer vision, interactive learning, and natural language understanding through language learning tasks based on natural images or through embodied agents performing interactive tasks in physically simulated environments, often drawing on the recent successes of deep learning and reinforcement learning. We believe these lines of research pose a promising approach for building models that do grasp the world's underlying complexity.

The goal of this third ViGIL workshop is to bring together scientists from various backgrounds - machine learning, computer vision, natural language processing, neuroscience, cognitive science, psychology, and philosophy - to share their perspectives on grounding, embodiment, and interaction. By providing this opportunity for cross-discipline discussion, we hope to foster new ideas about how to learn and leverage grounding in machines as well as build new bridges between the science of human cognition and machine learning.

Schedule

08:20 AM Opening Remarks Strub, de Vries, Das, Lee, Wijmans, Arad Hudson, Suhr
08:30 AM Grasping Language Baldridge
09:10 AM From Human Language to Agent Action Thomason
09:50 AM Coffee Break
10:30 AM Spotlight
10:50 AM Why language understanding is not a solved problem McClelland
11:30 AM Louis-Philippe Morency Morency
12:10 PM Poster session Ross, Mrabet, Subramanian, Cideron, Mu, Bhooshan, Okur Kavil, Delbrouck, Kuo, Lair, Ilharco, Jayram, Herrera Palacio, Fujiyama, Tieleman, Potapenko, Chao, Sutter, Kovaleva, Lai, Wang, Sharma, Cangea, Krishnaswamy, Tsuboi, Kuhnle, Nguyen, Yu, Saha, Xiang, Venkataraman, Kalra, Xie, Doran, Goodwin, Kadav, Daghaghi, Baldridge, Wu, Lin, Jain
01:50 PM Lisa Anne Hendricks Hendricks
02:30 PM Linda Smith Smith
03:10 PM Poster Session
04:00 PM Timothy Lillicrap Lillicrap
04:40 PM Josh Tenenbaum Tenenbaum
05:20 PM Panel Discussion Smith, Tenenbaum, Hendricks, McClelland, Lillicrap, Thomason, Baldridge, Morency
06:00 PM Closing Remarks

Abstracts (7):

Abstract 2: Grasping Language in Visually Grounded Interaction and Language, Baldridge 08:30 AM

Abstract 3: From Human Language to Agent Action in Visually Grounded Interaction and Language, Thomason 09:10 AM

There is a usability gap between manipulation-capable robots and helpful in-home digital agents. Dialog-enabled smart assistants have recently seen widespread adoption, but these cannot move or manipulate objects. By contrast, manipulation-capable and mobile robots are still largely deployed in industrial settings and do not interact with human users. Language-enabled robots can bridge this gap---natural language interfaces help robots and non-experts collaborate to achieve their goals. Navigation in unexplored environments to high-level targets like "Go to the room with a plant" can be facilitated by enabling agents to ask questions and react to human clarifications on-the-fly. Further, high-level instructions like "Put a plate of toast on the table" require inferring many steps, from finding a knife to operating a toaster. Low-level instructions can serve to clarify these individual steps. Through two new datasets and accompanying models, we study human-human dialog for cooperative navigation, and high- and low-level language instructions for cooking, cleaning, and tidying in interactive home environments. These datasets are a first step towards collaborative, dialog-enabled robots helpful in human spaces.

Abstract 6: Why language understanding is not a solved problem in Visually Grounded Interaction and Language, McClelland 10:50 AM

Over the years, periods of intense excitement about the prospects of machine intelligence and language understanding have alternated with periods of skepticism, to say the least. It is possible to look back over the ~70 year history of this effort and see great progress, and I for one am pleased to see how far we have come. Yet from where I sit we still have a long way to go, and language understanding may be one of those parts of intelligence that will be the hardest to solve. In spite of recent breakthroughs, humans create and comprehend more structured discourse than our current machines. At the same time, psycholinguistic research suggests that humans suffer from some of the same limitations as these machines. How can humans create and comprehend structured arguments given these limitations? Will it be possible for machines to emulate these aspects of human achievement as well?

Abstract 7: Louis-Philippe Morency in Visually Grounded Interaction and Language, Morency 11:30 AM

Note that the schedule is not final, and may change.

Abstract 9: Lisa Anne Hendricks in Visually Grounded Interaction and Language, Hendricks 01:50 PM

Note that the schedule is not final, and may change.

Abstract 10: Linda Smith in Visually Grounded Interaction and Language, Smith 02:30 PM

Note that the schedule is not final, and may change.

Abstract 13: Josh Tenenbaum in Visually Grounded Interaction and Language, Tenenbaum 04:40 PM

Note that the schedule is not final, and may change.

Robust AI in Financial Services: Data, Fairness, Explainability, Trustworthiness, and Privacy

Alina Oprea, Avigdor Gal, Isabelle Moulinier, Jiahao Chen, Manuela Veloso, Senthil Kumar, Tanveer Faruquie

West 205 - 207, Fri Dec 13, 08:00 AM

The financial services industry has unique needs for robustness when adopting artificial intelligence and machine learning (AI/ML). Many challenges can be described as intricate relationships between algorithmic fairness, explainability, privacy, data management, and trustworthiness. For example, there are ethical and regulatory needs to prove that models used for activities such as credit decisioning and lending are fair and unbiased, or that machine reliance does not cause humans to miss critical pieces of data. The use and protection of customer data necessitates secure and privacy-aware computation, as well as explainability around the use of sensitive data. Some challenges, like entity resolution, are exacerbated because of scale, highly nuanced data points and missing information.

On top of these fundamental requirements, the financial industry is rife with adversaries who perpetrate fraud, resulting in large-scale data breaches and loss of confidential information. The need to counteract malicious actors therefore calls for robust methods that can tolerate noise and adversarial corruption of data. However, recent advances in adversarial attacks on AI/ML systems demonstrate how often generic solutions for robustness and security fail, thus highlighting the need for further advances. The challenge of robust AI/ML is further complicated by constraints on data privacy and fairness, as imposed by ethical and regulatory concerns like GDPR.

This workshop aims to bring together researchers and practitioners to discuss challenges for AI/ML in financial services, and the opportunities such challenges represent to research communities. The workshop will consist of invited talks, panel discussions and short paper presentations, which will showcase ongoing research and novel algorithms resulting from collaboration of the AI/ML and cybersecurity communities, as well as the challenges that arise from applying these ideas in domain-specific contexts.

Schedule

08:00 AM Opening Remarks Chen, Veloso, Kumar, Moulinier, Gal, Oprea, Faruquie, K.
08:15 AM In search of predictability Perlich
08:45 AM Oral highlight presentations for selected contributed papers (10 min x 3)
10:30 AM Invited Talk by Louiqa Raschid (University of Maryland) Raschid
11:00 AM Oral highlight presentations for selected contributed papers (10 min x 6)
01:30 PM Understanding equilibrium properties of multi-agent systems Wooldridge
02:00 PM Oral highlight presentations for selected contributed papers (10 min x 6)
02:30 PM Discussion Panel Veloso
03:00 PM Putting Ethical AI to the Vote Procaccia
03:30 PM Poster Session Baracaldo Angel, Neel, Le, Philps, Tao, Chatzis, Suzumura, Wang, BAO, Barocas, Raghavan, Maina, Bryant, Varshney, Speakman, Gill, Schmidt, Compher, Govindarajulu, Sharma, Vepakomma, Swedish, Kalpathy-Cramer, Raskar, Zheng, Pechenizkiy, Schreyer, Ling, Nagpal, Tillman, Veloso, Chen, Wang, Wellman, van Adelsberg, Wood, Buehler, Mahfouz, Alexos, Shearer, Polychroniadou, Stavarache, Efimov, Hall, Zhang, Diana, Ganesh, Ravi, panda, Renard, Jagielski, Shavit, Williams, Wei, Zhai, Li, Shen, Matsunaga, Choi, Laignelet, Guler, Roa Vicens, Desai, Aigrain, Samoilescu
05:15 PM Invited Talk by Yuan (Alan) Qi (Ant Financial) Qi
05:45 PM Closing remarks Kumar

Learning with Rich Experience: Integration of Learning Paradigms

Zhiting Hu, Andrew Wilson, Chelsea Finn, Lisa Lee, Taylor Berg-Kirkpatrick, Ruslan Salakhutdinov, Eric Xing

West 208 + 209, Fri Dec 13, 08:00 AM

Machine learning is about computational methods that enable machines to learn concepts and improve performance from experience. Here, experience can take diverse forms, including data examples, abstract knowledge, interactions and feedback from the environment, other models, and so forth. Depending on different assumptions on the types and amount of experience available, there are different learning paradigms, such as supervised learning, active learning, reinforcement learning, knowledge distillation, adversarial learning, and combinations thereof. On the other hand, a hallmark of human intelligence is the ability to learn from all sources of information. In this workshop, we aim to explore various aspects of learning paradigms, particularly theoretical properties and formal connections between them, and new algorithms combining multiple modes of supervision.

Schedule

08:40 AM Opening Remarks
08:50 AM Lightning Talks - I
09:10 AM Raia Hadsell Hadsell
09:45 AM Coffee Break
10:30 AM Tom Mitchell Mitchell
11:05 AM Jeffrey Bilmes Bilmes
11:40 AM Lightning Talks - II
11:55 AM Poster Session Chourasia, Xu, Cortes, Chang, Nagano, Min, Boecking, Tran, Seyed Ghasemipour, Ding, Mani, Voleti, Fakoor, Xu, Marino, Lee, Tresp, Kagy, Zhang, Poczos, Khandelwal, Bardes, Shelhamer, Zhu, Li, Li, Krasheninnikov, Wang, Jaiswal, Barsoum, Sanjeev, Wattanavekin, Xie, Wu, Yoshida, Kanaa, Khoshfetrat Pakazad, Maasoumy
12:30 PM Lunch
02:10 PM Contributed Oral
02:20 PM Pieter Abbeel Abbeel
02:55 PM Yejin Choi Choi
03:30 PM Coffee Break
04:20 PM Tom Griffiths Griffiths
04:55 PM Contributed Oral
05:05 PM Panel Discussion
05:50 PM Closing Remarks

Beyond first order methods in machine learning systems

Tasos Kyrillidis, Albert Berahas, Fred Roosta, Michael W Mahoney

West 211 - 214, Fri Dec 13, 08:00 AM

Optimization lies at the heart of many exciting developments in machine learning, statistics and signal processing. As models become more complex and datasets get larger, finding efficient, reliable and provable methods is one of the primary goals in these fields.

In the last few decades, much effort has been devoted to the development of first-order methods. These methods enjoy a low per-iteration cost, have optimal complexity, are easy to implement, and have proven to be effective for most machine learning applications. First-order methods, however, have significant limitations: (1) they require fine hyper-parameter tuning, (2) they do not incorporate curvature information, and thus are sensitive to ill-conditioning, and (3) they are often unable to fully exploit the power of distributed computing architectures.

Higher-order methods, such as Newton, quasi-Newton and adaptive gradient descent methods, are extensively used in many scientific and engineering domains. At least in theory, these methods possess several nice features: they exploit local curvature information to mitigate the effects of ill-conditioning, they avoid or diminish the need for hyper-parameter tuning, and they have enough concurrency to take advantage of distributed computing environments. Researchers have even developed stochastic versions of higher-order methods that gain speed and scalability by incorporating curvature information in an economical and judicious manner. However, higher-order methods are often "undervalued."

This workshop will attempt to shed light on this statement. Topics of interest include -- but are not limited to -- second-order methods, adaptive gradient descent methods, regularization techniques, as well as techniques based on higher-order derivatives.

Schedule

08:00 AM Opening Remarks Kyrillidis, Berahas, Roosta, Mahoney
08:30 AM Economical use of second-order information in training machine learning models Goldfarb
09:00 AM Spotlight talks Granziol, Pedregosa, Asi
09:45 AM Poster Session Gorbunov, d'Aspremont, Wang, Wang, Ginsburg, Quaglino, Castera, Adya, Granziol, Das, Bollapragada, Pedregosa, Takac, Jahani, Karimireddy, Asi, Daroczy, Adolphs, Rawal, Brandt, Li, Ughi, Romero, Skorokhodov, Scieur, Bae, Mishchenko, Anil, Sharan, Balu, Chen, Yao, Ergen, Grigas, Li, Ba, Roberts, Vaswani, Eftekhari, Sharma
10:30 AM Adaptive gradient methods: efficient implementation and generalization
11:15 AM Spotlight talks Scieur, Mishchenko, Anil
12:00 PM Lunch break
02:00 PM K-FAC: Extensions, improvements, and applications Martens
02:45 PM Spotlight talks Grigas, Yao, Lucchi, Meng
03:30 PM Poster Session (same as above)
04:15 PM Analysis of line search methods for various gradient approximation schemes for noisy derivative free optimization Scheinberg
05:00 PM Second-order methods for nonconvex optimization with complexity guarantees Wright
05:45 PM Final remarks Kyrillidis, Berahas, Roosta, Mahoney

Abstracts (12):

Abstract 1: Opening Remarks in Beyond first order methods in machine learning systems, Kyrillidis, Berahas, Roosta, Mahoney 08:00 AM

Opening remarks for the workshop by the organizers.

Abstract 2: Economical use of second-order information in training machine learning models in Beyond first order methods in machine learning systems, Goldfarb 08:30 AM

Stochastic gradient descent (SGD) and variants such as Adagrad and Adam are extensively used today to train modern machine learning models. In this talk we will discuss ways to economically use second-order information to modify both the step size (learning rate) used in SGD and the direction taken by SGD. Our methods adaptively control the batch sizes used to compute gradient and Hessian approximations and ensure that the steps that are taken decrease the loss function with high probability, assuming that the latter is self-concordant, as is true for many problems in empirical risk minimization. For such cases we prove that our basic algorithm is globally linearly convergent. A slightly modified version of our method is presented for training deep learning models. Numerical results will be presented that show that it exhibits excellent performance without the need for learning rate tuning. If there is time, additional ways to efficiently make use of second-order information will be presented.
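In the spirit of the talk above, the snippet below is our own toy sketch, not Goldfarb's algorithm: it forms a sub-sampled gradient and Hessian for regularized logistic regression and takes a damped Newton step, where the batch size and damping are exactly the kinds of knobs such methods adapt.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def subsampled_newton_step(w, X, y, batch, damping=1e-3, seed=None):
    # One damped Newton step using a random mini-batch for both the
    # gradient and the Hessian (labels y are in {0, 1}).
    rng = np.random.default_rng(seed)
    i = rng.choice(len(y), size=batch, replace=False)
    Xb, yb = X[i], y[i]
    p = sigmoid(Xb @ w)
    grad = Xb.T @ (p - yb) / batch
    # Logistic-loss Hessian: X^T diag(p(1-p)) X / n, damped for stability.
    H = (Xb * (p * (1 - p))[:, None]).T @ Xb / batch + damping * np.eye(len(w))
    return w - np.linalg.solve(H, grad)

# toy usage
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
w_true = rng.normal(size=5)
y = (sigmoid(X @ w_true) > rng.uniform(size=1000)).astype(float)
w = np.zeros(5)
for t in range(10):
    w = subsampled_newton_step(w, X, y, batch=256, seed=t)

Note that no learning rate appears: the curvature in H sets the step scale, which is the property the talk's methods exploit.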

Abstract 3: Spotlight talks in Beyond first order methods in machine learning systems, Granziol, Pedregosa, Asi 09:00 AM

How does mini-batching affect curvature information for second order deep learning optimization? Diego Granziol (Oxford); Stephen Roberts (Oxford); Xingchen Wan (Oxford University); Stefan Zohren (University of Oxford); Binxin Ru (University of Oxford); Michael A. Osborne (University of Oxford); Andrew Wilson (NYU); Sebastien Ehrhardt (Oxford); Dmitry P Vetrov (Higher School of Economics); Timur Garipov (Samsung AI Center in Moscow)

Acceleration through Spectral Modeling. Fabian Pedregosa (Google); Damien Scieur (Princeton University)

Using better models in stochastic optimization. Hilal Asi (Stanford University); John Duchi (Stanford University)

Abstract 4: Poster Session in Beyond first order methods in machine learning systems, Gorbunov, d'Aspremont, Wang, Wang, Ginsburg, Quaglino, Castera, Adya, Granziol, Das, Bollapragada, Pedregosa, Takac, Jahani, Karimireddy, Asi, Daroczy, Adolphs, Rawal, Brandt, Li, Ughi, Romero, Skorokhodov, Scieur, Bae, Mishchenko, Anil, Sharan, Balu, Chen, Yao, Ergen, Grigas, Li, Ba, Roberts, Vaswani, Eftekhari, Sharma 09:45 AM

Poster Session

Abstract 5: Adaptive gradient methods: efficient implementation and generalization in Beyond first order methods in machine learning systems, 10:30 AM

Adaptive gradient methods have had a transformative impact in deep learning. We will describe recent theoretical and experimental advances in their understanding, including low-memory adaptive preconditioning, and insights into their generalization ability.
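A minimal version of the diagonal, low-memory preconditioning mentioned here is the classic Adagrad update, written below as a generic sketch (our illustration, not the speaker's implementation); the accumulator costs only one extra vector of memory.

import numpy as np

def adagrad_step(w, grad, accum, lr=0.1, eps=1e-8):
    # Adagrad: scale each coordinate by the root of its accumulated
    # squared gradients; `accum` is the only extra state kept.
    accum = accum + grad ** 2
    return w - lr * grad / (np.sqrt(accum) + eps), accum

w = np.array([5.0, -3.0])
accum = np.zeros_like(w)
for _ in range(200):
    grad = 2 * w  # gradient of the toy quadratic ||w||^2
    w, accum = adagrad_step(w, grad, accum)
# w shrinks toward 0, with per-coordinate step sizes adapting automatically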

Abstract 6: Spotlight talks in Beyond first order methods in machine learning systems, Scieur, Mishchenko, Anil 11:15 AM

Symmetric Multisecant quasi-Newton methods. Damien Scieur (Samsung AI Research Montreal); Thomas Pumir (Princeton University); Nicolas Boumal (Princeton University)

Stochastic Newton Method and its Cubic Regularization via Majorization-Minimization. Konstantin Mishchenko (King Abdullah University of Science & Technology (KAUST)); Peter Richtarik (KAUST); Dmitry Koralev (KAUST)

Full Matrix Preconditioning Made Practical. Rohan Anil (Google); Vineet Gupta (Google); Tomer Koren (Google); Kevin Regan (Google); Yoram Singer (Princeton)

Abstract 8: K-FAC: Extensions, improvements, Ellipsoidal Trust Region Methods for Neural Nets. and applications in Beyond frst order Leonard Adolphs (ETHZ); Jonas Kohler (ETHZ) methods in machine learning systems, Sub-sampled Newton Methods Under Martens 02:00 PM Interpolation. Si Yi Meng (University of British Second order optimization methods have the Columbia); Sharan Vaswani (Mila, Université de potential to be much faster than frst order Montréal); Issam Laradji (University of British methods in the deterministic case, or pre- Columbia); Mark Schmidt (University of British asymptotically in the stochastic case. However, Columbia); Simon Lacoste-Julien (Mila, Université traditional second order methods have proven de Montréal) inefective or impractical for neural network training, due in part to the extremely high dimension of the parameter space. Kronecker- Abstract 10: Poster Session (same as above) factored Approximate Curvature (K-FAC) is in Beyond frst order methods in machine second-order optimization method based on a learning systems, 03:30 PM tractable approximation to the -Newton/ Fisher matrix that exploits the special structure An Accelerated Method for Derivative-Free present in neural network training objectives. Smooth Stochastic Convex Optimization. Eduard This approximation is neither low-rank nor Gorbunov (Moscow Institute of Physics and diagonal, but instead involves Kronecker- Technology); Pavel Dvurechenskii (WIAS products, which allows for efcient estimation, Germany); Alexander Gasnikov (Moscow Institute storage and inversion of the curvature matrix. In of Physics and Technology) this talk I will introduce the basic K-FAC method for standard MLPs and then present some more Fast Bregman Gradient Methods for Low-Rank recent work in this direction, including extensions Minimization Problems. Radu-Alexandru Dragomir to CNNs and RNNs, both of which requires new (Université Toulouse 1); Jérôme Bolte (Université 33 Dec. 13, 2019

Abstract 9: Spotlight talks in Beyond first order methods in machine learning systems, Grigas, Yao, Lucchi, Meng 02:45 PM

Hessian-Aware trace-Weighted Quantization. Zhen Dong (UC Berkeley); Zhewei Yao (University of California, Berkeley); Amir Gholami (UC Berkeley); Yaohui Cai (Peking University); Daiyaan Arfeen (UC Berkeley); Michael Mahoney (University of California, Berkeley); Kurt Keutzer (UC Berkeley)

New Methods for Regularization Path Optimization via Differential Equations. Paul Grigas (UC Berkeley); Heyuan Liu (University of California, Berkeley)

Ellipsoidal Trust Region Methods for Neural Nets. Leonard Adolphs (ETHZ); Jonas Kohler (ETHZ)

Sub-sampled Newton Methods Under Interpolation. Si Yi Meng (University of British Columbia); Sharan Vaswani (Mila, Université de Montréal); Issam Laradji (University of British Columbia); Mark Schmidt (University of British Columbia); Simon Lacoste-Julien (Mila, Université de Montréal)

Abstract 10: Poster Session (same as above) in Beyond first order methods in machine learning systems, 03:30 PM

(Entries marked * also appear in the spotlight sessions.)

An Accelerated Method for Derivative-Free Smooth Stochastic Convex Optimization. Eduard Gorbunov (Moscow Institute of Physics and Technology); Pavel Dvurechenskii (WIAS Germany); Alexander Gasnikov (Moscow Institute of Physics and Technology)

Fast Bregman Gradient Methods for Low-Rank Minimization Problems. Radu-Alexandru Dragomir (Université Toulouse 1); Jérôme Bolte (Université Toulouse 1); Alexandre d'Aspremont (Ecole Normale Superieure)

Gluster: Variance Reduced Mini-Batch SGD with Gradient Clustering. Fartash Faghri (University of Toronto); David Duvenaud (University of Toronto); David Fleet (University of Toronto); Jimmy Ba (University of Toronto)

Neural Policy Gradient Methods: Global Optimality and Rates of Convergence. Lingxiao Wang (Northwestern University); Qi Cai (Northwestern University); Zhuoran Yang (Princeton University); Zhaoran Wang (Northwestern University)

* How does mini-batching affect curvature information for second order deep learning optimization? Diego Granziol (Oxford); Stephen Roberts (Oxford); Xingchen Wan (Oxford University); Stefan Zohren (University of Oxford); Binxin Ru (University of Oxford); Michael A. Osborne (University of Oxford); Andrew Wilson (NYU); Sebastien Ehrhardt (Oxford); Dmitry P Vetrov (Higher School of Economics); Timur Garipov (Samsung AI Center in Moscow)

On the Convergence of a Biased Version of Stochastic Gradient Descent. Rudrajit Das (University of Texas at Austin); Jiong Zhang (UT-Austin); Inderjit S. Dhillon (UT Austin & Amazon)

Adaptive Sampling Quasi-Newton Methods for Derivative-Free Stochastic Optimization. Raghu Bollapragada (Argonne National Laboratory); Stefan Wild (Argonne National Laboratory)

A Gram-Gauss-Newton Method Learning Overparameterized Deep Neural Networks for Regression Problems. Tianle Cai (Peking University); Ruiqi Gao (Peking University); Jikai Hou (Peking University); Siyu Chen (Peking University); Dong Wang (Peking University); Di He (Peking University); Zhihua Zhang (Peking University); Liwei Wang (Peking University)

* Acceleration through Spectral Modeling. Fabian Pedregosa (Google); Damien Scieur (Princeton University)

Accelerating Distributed Stochastic L-BFGS by sampled 2nd-Order Information. Jie Liu (Lehigh University); Yu Rong (Tencent AI Lab); Martin Takac (Lehigh University); Junzhou Huang (Tencent AI Lab)

Stochastic Gradient Methods with Layerwise Adaptive Moments for Training of Deep Networks. Boris Ginsburg (NVIDIA); Oleksii Hrinchuk (NVIDIA); Jason Li (NVIDIA); Vitaly Lavrukhin (NVIDIA); Ryan Leary (NVIDIA); Oleksii Kuchaiev (NVIDIA); Jonathan Cohen (NVIDIA); Huyen Nguyen (NVIDIA); Yang Zhang (NVIDIA)

Grow Your Samples and Optimize Better via Distributed Newton CG and Accumulating Strategy. Majid Jahani (Lehigh University); Xi He (Lehigh University); Chenxin Ma (Lehigh University); Aryan Mokhtari (UT Austin); Dheevatsa Mudigere ( Labs); Alejandro Ribeiro (University of Pennsylvania); Martin Takac (Lehigh University)

Accelerating Neural ODEs with Spectral Elements. Alessio Quaglino (NNAISENSE SA); Marco Gallieri (NNAISENSE); Jonathan Masci (NNAISENSE); Jan Koutnik (NNAISENSE)

Global linear convergence of trust-region Newton's method without strong-convexity or smoothness. Sai Praneeth Karimireddy (EPFL); Sebastian Stich (EPFL); Martin Jaggi (EPFL)

An Inertial Newton Algorithm for Deep Learning. Camille Castera (CNRS, IRIT); Jérôme Bolte (Université Toulouse 1); Cédric Févotte (CNRS, IRIT); Edouard Pauwels (Toulouse 3 University)

FD-Net with Auxiliary Time Steps: Fast Prediction of PDEs using Hessian-Free Trust-Region Methods. Nur Sila Gulgec (Lehigh University); Zheng Shi (Lehigh University); Neil Deshmukh (MIT BeaverWorks - Medlytics); Shamim Pakzad (Lehigh University); Martin Takac (Lehigh University)

Nonlinear Conjugate Gradients for Scaling Synchronous Distributed DNN Training. Saurabh Adya (Apple); Vinay Palakkode (Apple Inc.); Oncel Tuzel (Apple Inc.)

* Using better models in stochastic optimization. Hilal Asi (Stanford University); John Duchi (Stanford University)

Tangent space separability in feedforward neural networks. Bálint Daróczy (Institute for Computer Science and Control, Hungarian Academy of Sciences); Rita Aleksziev (Institute for Computer Science and Control, Hungarian Academy of Sciences); Andras Benczur (Hungarian Academy of Sciences)

* Ellipsoidal Trust Region Methods for Neural Nets. Leonard Adolphs (ETHZ); Jonas Kohler (ETHZ)

Closing the K-FAC Generalisation Gap Using Stochastic Weight Averaging. Xingchen Wan (University of Oxford); Diego Granziol (Oxford); Stefan Zohren (University of Oxford); Stephen Roberts (Oxford)

* Sub-sampled Newton Methods Under Interpolation. Si Yi Meng (University of British Columbia); Sharan Vaswani (Mila, Université de Montréal); Issam Laradji (University of British Columbia); Mark Schmidt (University of British Columbia); Simon Lacoste-Julien (Mila, Université de Montréal)

* Symmetric Multisecant quasi-Newton methods. Damien Scieur (Samsung AI Research Montreal); Thomas Pumir (Princeton University); Nicolas Boumal (Princeton University)

Does Adam optimizer keep close to the optimal point? Kiwook Bae (KAIST)*; Heechang Ryu (KAIST); Hayong Shin (KAIST)

* Stochastic Newton Method and its Cubic Regularization via Majorization-Minimization. Konstantin Mishchenko (King Abdullah University of Science & Technology (KAUST)); Peter Richtarik (KAUST); Dmitry Koralev (KAUST)

Learned First-Order Preconditioning. Aditya Rawal (Uber AI Labs); Rui Wang (Uber AI); Theodore Moskovitz (Gatsby Computational Neuroscience Unit); Sanyam Kapoor (Uber); Janice Lan (Uber AI); Jason Yosinski (Uber AI Labs); Thomas Miconi (Uber AI Labs)

* Full Matrix Preconditioning Made Practical. Rohan Anil (Google); Vineet Gupta (Google); Tomer Koren (Google); Kevin Regan (Google); Yoram Singer (Princeton)

Iterative Hessian Sketch in Input Sparsity Time. Charlie Dickens (University of Warwick); Graham Cormode (University of Warwick)

Nonlinear matrix recovery. Florentin Goyens (University of Oxford); Coralia Cartis (Oxford University); Armin Eftekhari (EPFL)

Memory-Sample Tradeoffs for Linear Regression with Small Error. Vatsal Sharan (Stanford University); Aaron Sidford (Stanford); Gregory Valiant (Stanford University)

Making Variance Reduction more Effective for Deep Networks. Nicolas Brandt (EPFL); Farnood Salehi (EPFL); Patrick Thiran (EPFL)

On the Higher-order Moments in Adam. Zhanhong Jiang (Johnson Controls International); Aditya Balu (Iowa State University); Sin Yong Tan (Iowa State University); Young M Lee (Johnson Controls International); Chinmay Hegde (Iowa State University); Soumik Sarkar (Iowa State University)

Novel and Efficient Approximations for Zero-One Loss of Linear Classifiers. Hiva Ghanbari (Lehigh University); Minhan Li (Lehigh University); Katya Scheinberg (Lehigh)

A Model-Based Derivative-Free Approach to Black-Box Adversarial Examples: BOBYQA. Giuseppe Ughi (University of Oxford)

Distributed Accelerated Inexact Proximal Gradient Method via System of Coupled Ordinary Differential Equations. Chhavi Sharma (IIT Bombay); Vishnu Narayanan (IIT Bombay); Balamurugan Palaniappan (IIT Bombay)

Finite-Time Convergence of Continuous-Time Optimization Algorithms via Differential Inclusions. Orlando Romero (Rensselaer Polytechnic Institute); Mouhacine Benosman (MERL)

Loss Landscape Sightseeing by Multi-Point Optimization. Ivan Skorokhodov (MIPT); Mikhail Burtsev (NI)

h-matrix approximation for Gauss-Newton Hessian. Chao Chen (UT Austin)

* Hessian-Aware trace-Weighted Quantization. Zhen Dong (UC Berkeley); Zhewei Yao (University of California, Berkeley); Amir Gholami (UC Berkeley); Yaohui Cai (Peking University); Daiyaan Arfeen (UC Berkeley); Michael Mahoney (University of California, Berkeley); Kurt Keutzer (UC Berkeley)

Random Projections for Learning Non-convex Models. Tolga Ergen (Stanford University); Emmanuel Candes (Stanford University); Mert Pilanci (Stanford)

* New Methods for Regularization Path Optimization via Differential Equations. Paul Grigas (UC Berkeley); Heyuan Liu (University of California, Berkeley)

Hessian-Aware Zeroth-Order Optimization. Haishan Ye (HKUST); Zhichao Huang (HKUST); Cong Fang (Peking University); Chris Junchi Li (Tencent); Tong Zhang (HKUST)

Higher-Order Accelerated Methods for Faster Non-Smooth Optimization. Brian Bullins (TTIC)

Abstract 11: Analysis of line search methods for various gradient approximation schemes for noisy derivative free optimization in Beyond first order methods in machine learning systems, Scheinberg 04:15 PM

We develop convergence analysis of a modified line search method for objective functions whose value is computed with noise and whose gradient estimates are not directly available. The noise is assumed to be bounded in absolute value without any additional assumptions. In this case, gradient approximations can be constructed via interpolation or sample average approximation of smoothing gradients, and thus they are always inexact and possibly random. We extend the framework based on stochastic methods, which was developed to provide analysis of a standard line-search method with exact function values and random gradients, to the case of noisy functions. We introduce a condition on the gradient which, when satisfied with some sufficiently large probability at each iteration, guarantees convergence properties of the line search method. We derive expected complexity bounds for convex, strongly convex and nonconvex functions. We motivate these results with several recent papers related to policy optimization.

Abstract 12: Second-order methods for nonconvex optimization with complexity guarantees in Beyond first order methods in machine learning systems, Wright 05:00 PM

We consider problems of smooth nonconvex optimization: unconstrained, bound-constrained, and with general equality constraints. We show that algorithms for these problems that are widely used in practice can be modified slightly in ways that guarantee convergence to approximate first- and second-order optimal points, with complexity guarantees that depend on the desired accuracy. The methods we discuss are constructed from Newton's method, the conjugate gradient method, the log-barrier method, and augmented Lagrangians. (In some cases, special structure of the objective function makes for only a weak dependence on the accuracy parameter.) Our methods require Hessian information only in the form of Hessian-vector products, so do not require the Hessian to be evaluated and stored explicitly. This talk describes joint work with Clement Royer, Yue Xie, and Michael O'Neill.
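Hessian-vector products of the kind these methods rely on never require forming the Hessian. A standard central-difference approximation, sketched here on a toy quadratic of our own choosing, costs only two extra gradient evaluations:

import numpy as np

def hessian_vector_product(grad_fn, w, v, eps=1e-5):
    # Approximate H(w) @ v as a directional derivative of the gradient:
    # (grad(w + eps*v) - grad(w - eps*v)) / (2*eps). No Hessian is formed.
    return (grad_fn(w + eps * v) - grad_fn(w - eps * v)) / (2 * eps)

# check on f(w) = 0.5 * w^T Q w, whose Hessian is exactly Q
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
grad_fn = lambda w: Q @ w
v = np.array([1.0, -1.0])
print(hessian_vector_product(grad_fn, np.zeros(2), v))  # ~ Q @ v = [2., -1.]

Inside a Newton-CG loop, this operator is all the conjugate gradient method needs to approximately solve the Newton system.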
Abstract 13: Final remarks in Beyond first order methods in machine learning systems, Kyrillidis, Berahas, Roosta, Mahoney 05:45 PM

Final remarks for the workshop.

CiML 2019: Machine Learning Competitions for All

Adrienne Mendrik, Wei-Wei Tu, Isabelle Guyon, Evelyne Viegas, Ming LI

West 215 + 216, Fri Dec 13, 08:00 AM

Challenges in machine learning and data science are open online competitions that address problems by providing datasets or simulated environments. They measure the performance of machine learning algorithms with respect to a given problem. The playful nature of challenges naturally attracts students, making challenges a great teaching resource. However, in addition to the use of challenges as educational tools, challenges have a role to play towards a better democratization of AI and machine learning. They function as cost-effective problem-solving tools and a means of encouraging the development of re-usable problem templates and open-sourced solutions. However, at present, the geographic and sociological distribution of challenge participants and organizers is very biased. While recent successes in machine learning have raised high hopes, there is a growing concern that the societal and economic benefits might increasingly be in the power and under control of a few.

CiML (Challenges in Machine Learning) is a forum that brings together workshop organizers, platform providers, and participants to discuss best practices in challenge organization and new methods and application opportunities to design high impact challenges. Following the success of previous years' workshops, we will reconvene and discuss new opportunities for broadening our community.

For this sixth edition of the CiML workshop at NeurIPS our objective is twofold: (1) we aim to enlarge the community, fostering diversity in the community of participants and organizers; (2) we aim to promote the organization of challenges for the benefit of more diverse communities.

The workshop provides room for discussion on these topics and aims to bring together potential partners to organize such challenges and stimulate "machine learning for good", i.e. the organization of challenges for the benefit of society. We have invited prominent speakers that have experience in this domain.

Schedule

08:00 AM Welcome and Opening Remarks Mendrik, Tu, Guyon, Viegas, LI
08:15 AM Amir Banifatemi (XPrize) "AI for Good via Machine Learning Challenges" Banifatemi
09:00 AM Emily Bender (University of Washington) "Making Stakeholder Impacts Visible in the Evaluation Cycle: Towards Fairness-Integrated Shared Tasks and Evaluation Metrics" Bender
09:45 AM Coffee Break
10:30 AM Dina Machuve (Nelson Mandela African Institution of Science and Technology) "Machine Learning Competitions: The Outlook from Africa" Machuve
11:15 AM Dog Image Generation Competition on Kaggle Kan, Culliton
11:30 AM Learning To Run a Power Network Competition Donnot
11:45 AM The AI Driving Olympics: An Accessible Robot Learning Benchmark Walter
12:00 PM Conclusion on TrackML, a Particle Physics Tracking Machine Learning Challenge Combining Accuracy and Inference Speed Rousseau, vlimant
12:15 PM Catered Lunch and Poster Viewing (in Workshop Room) Stolovitzky, Pradhan, Duboue, Tang, Natekin, Bondi, Bouthillier, Milani, Müller, Holzinger, Harrer, Day, Ustyuzhanin, Guss, Mirmomeni
02:00 PM Frank Hutter (University of Freiburg) "A Proposal for a New Competition Design Emphasizing Scientific Insights" Hutter
02:45 PM Design and Analysis of Experiments: A Challenge Approach in Teaching Pavao
03:00 PM The model-to-data paradigm: overcoming data access barriers in biomedical competitions Guinney
03:15 PM The Deep Learning Epilepsy Detection Challenge: Design, Implementation, and Test of a New Crowd-Sourced AI Challenge Ecosystem Kiral
03:30 PM Coffee Break
04:15 PM Open Space Topic "The Organization of Challenges for the Benefit of More Diverse Communities" Mendrik, Guyon, Tu, Viegas, LI

Abstracts (11):

Abstract 2: Amir Banifatemi (XPrize) "AI for Good via Machine Learning Challenges" in CiML 2019: Machine Learning Competitions for All, Banifatemi 08:15 AM

"AI for Good" efforts (e.g., applications in sustainability, education, health, financial inclusion, etc.) have demonstrated the capacity to simultaneously advance intelligent-system research and the greater good. Unfortunately, the majority of research that could find motivation in real-world "good" problems still centers on problems with industrial or toy-problem performance baselines.

Competitions can serve as an important shaping reward for steering academia towards research that is simultaneously impactful on our state of knowledge and the state of the world. This talk covers three aspects of AI for Good competitions. First, we survey current efforts within the AI for Good application space as a means of identifying current and future opportunities. Next, we discuss how more qualitative notions of "Good" can be used as benchmarks in addition to more quantitative competition objective functions. Finally, we provide notes on building coalitions of domain experts to develop and guide socially impactful competitions in machine learning.

Abstract 3: Emily Bender (University of Washington) "Making Stakeholder Impacts Visible in the Evaluation Cycle: Towards Fairness-Integrated Shared Tasks and Evaluation Metrics" in CiML 2019: Machine Learning Competitions for All, Bender 09:00 AM

In a typical machine learning competition or shared task, success is measured in terms of systems' ability to reproduce gold-standard labels. The potential impact of the systems being developed on stakeholder populations, if considered at all, is studied separately from system `performance'. Given the tight train-eval cycle of both shared tasks and system development in general, we argue that making disparate impact on vulnerable populations visible in dataset and metric design will be key to making the potential for such impact present and salient to developers. We see this as an effective way to promote the development of machine learning technology that is helpful for people, especially those who have been subject to marginalization. This talk will explore how to develop such shared tasks, considering task choice, stakeholder community input, and annotation and metric design desiderata.

Joint work with Hal Daumé III, University of Maryland; Bernease Herman, University of Washington; and Brandeis Marshall, Spelman College.

Abstract 5: Dina Machuve (Nelson Mandela African Institution of Science and Technology) "Machine Learning Competitions: The Outlook from Africa" in CiML 2019: Machine Learning Competitions for All, Machuve 10:30 AM

The current AI landscape in Africa mainly focuses on capacity building. The ongoing efforts to strengthen AI capacity in Africa are organized in summer schools, workshops, meetups, competitions and one long-term program at the Masters level. The main initiatives driving the AI capacity-building agenda in Africa include a) Deep Learning Indaba, b) Data Science Africa, c) Data Science Nigeria, d) Nairobi Women in Machine Learning and Data Science, e) Zindi and f) the African Master's in Machine Intelligence (AMMI) at AIMS. The talk will summarize our experience of the low participation of African AI developers in machine learning competitions and our recommendations for addressing the current challenges.

Abstract 6: Dog Image Generation Competition on Kaggle in CiML 2019: Machine Learning Competitions for All, Kan, Culliton 11:15 AM

We present a novel format of machine learning competition in which a user submits code that generates images trained on training samples; the code then runs on Kaggle and produces dog images, and the user receives scores for their generated content based on 1. quality of images, 2. diversity of images, and 3. a memorization penalty. This style of competition targets the use of Generative Adversarial Networks (GANs) [4], but is open to all generative models. Our implementation addresses overfitting by incorporating two different pre-trained neural networks, as well as two separate "ground truth" image datasets for the public and private leaderboards. We also have an enclosed compute environment to prevent submissions of non-generated images. In this paper, we describe both the algorithmic and system design of our competition, and share lessons learned from running the competition [6] in July 2019, with more than 900 teams participating and over 37,000 submissions received along with their code.
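To make the three-part scoring concrete, here is a toy composite score of our own construction, not the contest's actual metric: it assumes generated and training images have already been embedded by some pretrained feature extractor, rewards diversity, and penalizes near-copies of training images. The quality_fn placeholder and all thresholds are purely illustrative.

```python
import numpy as np

def cross_dists(A, B):
    """Euclidean distances between rows of A and rows of B."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.sqrt(np.maximum(d2, 0.0))

def competition_score(gen, train, quality_fn, w=(1.0, 1.0, 1.0)):
    """Toy composite: quality + diversity - memorization penalty.
    `gen` and `train` are feature vectors (e.g., from a pretrained
    network); `quality_fn` stands in for whatever image-quality
    metric the organizers choose."""
    quality = quality_fn(gen)
    gd = cross_dists(gen, gen)
    diversity = gd[np.triu_indices_from(gd, k=1)].mean()
    # penalize generated samples sitting too close to a training image
    memorization = np.mean(cross_dists(gen, train).min(axis=1) < 0.1)
    return w[0] * quality + w[1] * diversity - w[2] * memorization

rng = np.random.default_rng(0)
train, gen = rng.normal(size=(100, 16)), rng.normal(size=(50, 16))
print(competition_score(gen, train, quality_fn=lambda g: 0.0))
```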

Abstract 7: Learning To Run a Power Network Competition in CiML 2019: Machine Learning Competitions for All, Donnot 11:30 AM

We present the results of the first edition, as well as some perspectives for a potential next edition, of the "Learning To Run a Power Network" (L2RPN) competition, which tests the potential of reinforcement learning to solve a real-world problem of great practical importance: controlling power transportation in power grids while keeping people and equipment safe.

Abstract 8: The AI Driving Olympics: An Accessible Robot Learning Benchmark in CiML 2019: Machine Learning Competitions for All, Walter 11:45 AM

Despite recent breakthroughs, the ability of deep learning and reinforcement learning to outperform traditional approaches to control physically embodied robotic agents remains largely unproven. To help bridge this gap, we have developed the "AI Driving Olympics" (AI-DO), a competition with the objective of evaluating the state of the art in machine learning and artificial intelligence for mobile robotics. Based on the simple and well-specified autonomous driving and navigation environment called "Duckietown," AI-DO includes a series of tasks of increasing complexity, from simple lane-following to fleet management. For each task, we provide tools for competitors to use in the form of simulators, data logs, code templates, baseline implementations, and low-cost access to robotic hardware. We evaluate submissions in simulation online, on standardized hardware environments, and finally at the competition events. We have held successful AI-DO competitions at NeurIPS 2018 and ICRA 2019, and will be holding AI-DO 3 at NeurIPS 2020. Together, these competitions highlight the need for better benchmarks, which are lacking in robotics, as well as improved mechanisms to bridge the gap between simulation and reality.

Abstract 10: Catered Lunch and Poster Viewing (in Workshop Room) in CiML 2019: Machine Learning Competitions for All, Stolovitzky, Pradhan, Duboue, Tang, Natekin, Bondi, Bouthillier, Milani, Müller, Holzinger, Harrer, Day, Ustyuzhanin, Guss, Mirmomeni 12:15 PM

Accepted Posters

Kandinsky Patterns: An open toolbox for creating explainable machine learning challenges
Heimo Muller · Andreas Holzinger

MOCA: An Unsupervised Algorithm for Optimal Aggregation of Challenge Submissions
Robert Vogel · Mehmet Eren Ahsen · Gustavo A. Stolovitzky

FDL: Mission Support Challenge
Luís F. Simões · Ben Day · Vinutha M. Shreenath · Callum Wilson

From data challenges to collaborative gig science. Coopetitive research process and platform
Andrey Ustyuzhanin · Mikhail Belous · Leyla Khatbullina · Giles Strong

Smart(er) Machine Learning for Practitioners
Prabhu Pradhan

Improving Reproducibility of Benchmarks
Xavier Bouthillier

Guaranteeing Reproducibility in Deep Learning Competitions
Brandon Houghton

Organizing crowd-sourced AI challenges in enterprise environments: opportunities and challenges
Mahtab Mirmomeni · Isabell Kiral · Subhrajit Roy · Todd Mummert · Alan Braz · Jason Tsay · Jianbin Tang · Umar Asif · Thomas Schafter · Eren Mehmet · Bruno De Assis Marques · Stefan Maetschke · Rania Khalaf · Michal Rosen-Zvi · John Cohn · Gustavo Stolovitzky · Stefan Harrer

WikiCities: a Feature Engineering Educational Resource
Pablo Duboue

Reinforcement Learning Meets Information Seeking: Dynamic Search Challenge
Zhiwen Tang · Grace Hui Yang

AI Journey 2019: School Tests Solving Competition
Alexey Natekin · Peter Romov · Valentin Malykh

A BIRDSAI View for Conservation
Elizabeth Bondi · Milind Tambe · Raghav Jain · Palash Aggrawal · Saket Anand · Robert Hannaford · Ashish Kapoor · Jim Piavis · Shital Shah · Lucas Joppa · Bistra Dilkina

Abstract 11: Frank Hutter (University of Freiburg) "A Proposal for a New Competition Design Emphasizing Scientific Insights" in CiML 2019: Machine Learning Competitions for All, Hutter 02:00 PM

The typical setup in machine learning competitions is to provide one or more datasets and a performance metric, leaving it entirely up to participants which approach to use, how to engineer better features, whether and how to pretrain models on related data, how to tune hyperparameters, how to combine multiple models in an ensemble, etc. The fact that work on each of these components often leads to substantial improvements has several consequences: (1) amongst several skilled teams, the one with the most manpower and engineering drive often wins; (2) it is often unclear *why* one entry performs better than another; and (3) scientific insights remain limited.

Based on my experience in both participating in several challenges and organizing some, I will propose a new competition design that instead emphasizes scientific insight by dividing the various ways in which teams could improve performance into (largely orthogonal) modular components, each of which defines its own competition. E.g., one could run a competition focussing only on effective hyperparameter tuning of a given pipeline (across private datasets). With the same code base and datasets, one could likewise run a competition focussing only on finding better neural architectures, or only better preprocessing methods, or only a better training pipeline, or only better pre-training methods, etc. One could also run multiple of these competitions in parallel, hot-swapping better components found in one competition into the other competitions. I will argue that the result would likely be substantially more valuable in terms of scientific insights than traditional competitions, and may even lead to better final performance.

Abstract 12: Design and Analysis of Experiments: A Challenge Approach in Teaching in CiML 2019: Machine Learning Competitions for All, Pavao 02:45 PM

Over the past few years, we have explored the benefits of involving students both in organizing and in participating in challenges as a pedagogical tool, as part of an international collaboration. Engaging in the design and resolution of a competition can be seen as a hands-on means of learning proper design and analysis of experiments and gaining a deeper understanding of other aspects of machine learning. Graduate students of University Paris-Sud (Paris, France) are involved in class projects in creating a challenge end-to-end: from defining the research problem, collecting or formatting data, and creating a starting kit, to implementing and testing the website. The application domains and types of data are extremely diverse: medicine, ecology, marketing, computer vision, recommendation, text processing, etc. The challenges thus created are then used as class projects for undergraduate students who have to solve them, both at University Paris-Sud and at Rensselaer Polytechnic Institute (RPI, New York, USA), to provide rich learning experiences at scale. New this year, students are involved in creating challenges motivated by "AI for good" and will create re-usable templates to inspire others to create challenges for the benefit of humanity.

Abstract 13: The model-to-data paradigm: overcoming data access barriers in biomedical competitions in CiML 2019: Machine Learning Competitions for All, Guinney 03:00 PM

Data competitions often rely on the physical distribution of data to challenge participants, a significant limitation given that much data is proprietary, sensitive, and often non-shareable. To address this, the DREAM Challenges have advanced a challenge framework called model-to-data (MTD), requiring participants to submit re-runnable algorithms instead of model predictions. The DREAM organization has successfully completed multiple MTD-based challenges, and is expanding this approach to unlock highly sensitive and non-distributable human data for use in biomedical data challenges.
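In an MTD setup the submission is code rather than predictions. A hypothetical minimal entry point could look like the following; the class name and method signatures are our invention, not the DREAM API. The host calls fit and predict next to the private data, which never leaves the organizers' environment.

```python
import numpy as np

class Submission:
    """Hypothetical re-runnable entry point for a model-to-data
    challenge: the host trains and scores it beside the private data."""

    def fit(self, features, labels):
        # toy nearest-centroid model trained on host-side data
        self.classes = np.unique(labels)
        self.means = np.stack([features[labels == c].mean(axis=0)
                               for c in self.classes])
        return self

    def predict(self, features):
        d = np.linalg.norm(features[:, None, :] - self.means[None, :, :],
                           axis=2)
        return self.classes[d.argmin(axis=1)]

# what a (hypothetical) host harness would do with it:
rng = np.random.default_rng(0)
X, y = rng.normal(size=(60, 5)), rng.integers(0, 3, size=60)
print(Submission().fit(X, y).predict(X[:5]))
```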

Abstract 16: Open Space Topic "The Organization of Challenges for the Benefit of More Diverse Communities" in CiML 2019: Machine Learning Competitions for All, Mendrik, Guyon, Tu, Viegas, LI 04:15 PM

"Open Space" is a technique for running meetings in which the participants create and manage the agenda themselves. Participants can propose ideas that address the open space topic; these will be divided into various sessions that all other participants can join and brainstorm about. After the open space, we will collect all the ideas and post them on the CiML website.

AI for Humanitarian Assistance and Disaster Response

Ritwik Gupta, Robin Murphy, Trevor Darrell, Eric Heim, Zhangyang Wang, Bryce Goodman, Piotr Biliński

West 217 - 219, Fri Dec 13, 08:00 AM

Natural disasters are one of the oldest threats not only to individuals but to the societies they co-exist in. As a result, humanity has ceaselessly sought ways to provide assistance to people in need after disasters have struck. Further, natural disasters are but a single, extreme example of the many possible humanitarian crises. Disease outbreaks, famine, and oppression against disadvantaged groups can pose even greater dangers to people, with less obvious solutions.

In this workshop, we seek to bring together the Artificial Intelligence (AI) and Humanitarian Assistance and Disaster Response (HADR) communities in order to bring AI to bear on real-world humanitarian crises. Through this workshop, we intend to establish meaningful dialogue between the communities.

By the end of the workshop, the NeurIPS research community can come to understand the practical challenges of aiding those in crisis, while the HADR community can come to understand the current state of the art and practice in AI. Through this, we seek to begin establishing a pipeline for transitioning research created by the NeurIPS community to real-world humanitarian issues.

Schedule

08:00 AM Introduction and Welcome Gupta, Sajeev
08:15 AM Invited Talks (x4) Matias, Adole, Brown, Jaimes
10:15 AM Break
10:30 AM Spotlight Talks 1 Wang, Dalmasso, Doshi, Seo, Kapadia, Kruspe
12:00 PM Lunch
01:30 PM Invited Talks (x2) Rasmussen, Stromberg
02:30 PM Spotlight Talks 2 Veitch-Michaelis, Sidrane, Nevo, Oh
03:30 PM Break
03:45 PM Spotlight Talks 3 Schrempf, Dubey, Lu
04:30 PM Convergence: Two-Way Limitations in Taking Theory to Applications Dzombak, Yang
05:15 PM Poster Session

Abstracts (6):

Abstract 2: Invited Talks (x4) in AI for Humanitarian Assistance and Disaster Response, Matias, Adole, Brown, Jaimes 08:15 AM

* Yossi Matias (Google)
* Tracy Adole (WorldPop)
* Col Jason Brown (US Air Force)
* Alex Jaimes (Dataminr)

Abstract 4: Spotlight Talks 1 in AI for Humanitarian Assistance and Disaster Response, Wang, Dalmasso, Doshi, Seo, Kapadia, Kruspe 10:30 AM

Paper IDs
* 8 - "Two Case Studies of Building Modeling using Machine Learning"
* 9 - "Feature Engineering for Entity Resolution with Arabic Names: Improving Estimates of Observed Casualties in the Syrian Civil War"
* 12 - "Revisiting Classical Bagging with Modern Transfer Learning for On-the-fly Disaster Damage Detector"
* 13 - "FireNet: Real-time Segmentation of Fire Perimeter from Aerial Video"
* 14 - "Deep Crowd-Flow Prediction in Built Environments"
* 15 - "Few-shot Tweet Detection in Emerging Disaster Events"

Abstract 6: Invited Talks (x2) in AI for Humanitarian Assistance and Disaster Response, Rasmussen, Stromberg 01:30 PM

* Maj Megan Stromberg (US Air Force)
* Eric Rasmussen (Infinitum Humanitarian Systems)

Abstract 7: Spotlight Talks 2 in AI for Humanitarian Assistance and Disaster Response, Veitch-Michaelis, Sidrane, Nevo, Oh 02:30 PM

Paper IDs
* 18 - "Flood Detection On Low Cost Orbital Hardware"
* 19 - "Machine Learning for Generalizable Prediction of Flood Susceptibility"
* 20 - "Inundation Modeling in Data Scarce Regions"
* 24 - "Explainable Semantic Mapping for First Responders"

Abstract 9: Spotlight Talks 3 in AI for Humanitarian Assistance and Disaster Response, Schrempf, Dubey, Lu 03:45 PM

Paper IDs
* 7 - "Language Transfer for Early Warning of Epidemics from Social Media"
* 26 - "Cognitive Agent Based Simulation Model For Improving Disaster Response Procedures"
* 27 - "Building Damage Detection in Satellite Imagery Using Convolutional Neural Networks"

Abstract 10: Convergence: Two-Way Limitations in Taking Theory to Applications in AI for Humanitarian Assistance and Disaster Response, Dzombak, Yang 04:30 PM

Speakers from Berkeley, Oak Ridge National Lab, Red Cross, and more.

Shared Visual Representations in Human and Machine Intelligence

Arturo Deza, Joshua Peterson, Apurva Ratan Murty, Tom Griffiths

West 220 - 222, Fri Dec 13, 08:00 AM

The goal of the Shared Visual Representations in Human and Machine Intelligence (SVRHM) workshop is to disseminate relevant, parallel findings in the fields of computational neuroscience, psychology, and cognitive science that may inform modern machine learning methods.

In the past few years, machine learning methods, especially deep neural networks, have widely permeated the vision science, cognitive science, and neuroscience communities. As a result, scientific modeling in these fields has greatly benefited, producing a swath of potentially critical new insights into human learning and intelligence, which remains the gold standard for many tasks. However, the machine learning community has been largely unaware of these cross-disciplinary insights and analytical tools, which may help to solve many of the current problems that ML theorists and engineers face today (e.g., adversarial attacks, compression, continual learning, and unsupervised learning).

Thus we propose to invite leading cognitive scientists with strong computational backgrounds to disseminate their findings to the machine learning community, with the hope of closing the loop by nourishing new ideas and creating cross-disciplinary collaborations.

See more information at the official conference website: https://www.svrhm2019.com/ Follow us on twitter for announcements: https://twitter.com/svrhm2019

Schedule

08:50 AM Opening Remarks Deza, Peterson, Murty, Griffiths
09:00 AM Predictable representations in humans and machines Henaff
09:25 AM What is disentangling and does intelligence need it? Higgins
09:50 AM Coffee Break
10:10 AM A "distribution mismatch" dataset for comparing representational similarity in ANNs and the brain Xiao
10:35 AM Feathers, wings and the future of computer vision research Freeman
11:00 AM Taxonomic structure in learning from few positive examples Grant
11:25 AM CIFAR-10H: using human-derived soft-label distributions to support more robust and generalizable classification Battleday
11:50 AM Making the next generation of machine learning datasets: ObjectNet, a new object recognition benchmark Barbu
12:15 PM The building blocks of vision Tarr
02:00 PM Poster Session Harris, White, Choung, Shinozaki, Pal, Hermann, Borowski, Fosco, Firestone, Veerabadran, Lahner, Ryali, Doshi, Singh, Zhou, Besserve, Chang, Newman, Niranjan, Hare, Mihai, Savvides, Kornblith, Funke, Oliva, de Sa, Krotov, Conwell, Alvarez, Kolchinski, Zhao, Gordon, Bernstein, Ermon, Mehrjou, Schölkopf, Co-Reyes, Janner, Wu, Tenenbaum, Levine, Mohsenzadeh, Zhou
03:00 PM Q&A from the Audience. Ask the Grad Students Grant, Battleday, Sanborn, Chang, Parthasarathy
03:30 PM Object representation in the human visual system Konkle
03:55 PM Cognitive computational neuroscience of vision Kriegeskorte
04:20 PM Perturbation-based remodeling of visual neural network representations Bethge
04:45 PM Local gain control and perceptual invariances Simoncelli
05:10 PM Panel Discussion: What sorts of cognitive or biological (architectural) inductive biases will be crucial for developing effective artificial intelligence? Higgins, Konkle, Bethge, Kriegeskorte
06:00 PM Concluding Remarks & Prizes Ceremony Deza, Peterson, Murty, Griffiths
06:10 PM Evening Reception

Abstracts (8):

Abstract 2: Predictable representations in humans and machines in Shared Visual Representations in Human and Machine Intelligence, Henaff 09:00 AM

Despite recent progress in artificial intelligence, humans and animals vastly surpass machine agents in their ability to quickly learn about their environment. While humans generalize to new concepts from small numbers of examples, state-of-the-art artificial neural networks still require huge amounts of supervision. We hypothesize that humans benefit from such data-efficiency because their internal representations support a much wider set of tasks (such as planning and decision-making), which often require making predictions about future events. Using the curvature of natural videos as a measure of predictability, we find that human perceptual representations are indeed more predictable than their inputs, whereas those of current deep neural networks are not. Conversely, by optimizing neural networks for an information-theoretic measure of predictability, we arrive at artificial classifiers whose data-efficiency greatly surpasses that of purely supervised ones. Learning predictable representations may therefore enable artificial systems to perceive the world in a manner that is closer to biological ones.
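The curvature measure referenced here is straightforward to compute for a pixel-domain trajectory: it is the angle between successive frame-difference vectors, so straighter (lower-curvature) sequences are easier to extrapolate. A minimal sketch, with frames treated as flattened vectors:

```python
import numpy as np

def mean_curvature(frames):
    """Mean discrete curvature of a trajectory of (flattened) frames:
    the angle between successive difference vectors. Straighter
    trajectories (smaller angles) are more predictable by linear
    extrapolation."""
    X = frames.reshape(len(frames), -1).astype(float)
    v = np.diff(X, axis=0)                          # displacement vectors
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    cos = np.clip(np.sum(v[:-1] * v[1:], axis=1), -1.0, 1.0)
    return np.degrees(np.arccos(cos)).mean()

# a straight trajectory has ~0 curvature; a random one is near 90 degrees
t = np.linspace(0, 1, 20)[:, None]
print(mean_curvature(t * np.ones((1, 64))))                             # ~0
print(mean_curvature(np.random.default_rng(0).normal(size=(20, 64))))  # ~90
```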
Abstract 3: What is disentangling and does intelligence need it? in Shared Visual Representations in Human and Machine Intelligence, Higgins 09:25 AM

Despite the advances in modern deep learning approaches, we are still quite far from the generality, robustness and data efficiency of biological intelligence. In this talk I will suggest that this gap may be narrowed by re-focusing from the implicit representation learning prevalent in end-to-end deep learning approaches to explicit unsupervised representation learning. In particular, I will discuss the value of disentangled visual representations acquired in an unsupervised manner loosely inspired by biological intelligence. This talk will connect disentangling with the idea of symmetry transformations from physics to make the claim that disentangled representations reflect important world structure. I will then go over a few first demonstrations of how such representations can be useful in practice: for continual learning, for acquiring reinforcement learning (RL) policies that are more robust to transfer scenarios than standard RL approaches, and for building abstract compositional visual concepts that make possible the imagination of meaningful and diverse samples beyond the training data distribution.

Abstract 5: A "distribution mismatch" dataset for comparing representational similarity in ANNs and the brain in Shared Visual Representations in Human and Machine Intelligence, Xiao 10:10 AM

A "distribution mismatch" dataset for comparing representational similarity in ANNs and the brain.

Abstract 8: CIFAR-10H: using human-derived soft-label distributions to support more robust and generalizable classification in Shared Visual Representations in Human and Machine Intelligence, Battleday 11:25 AM

The classification performance of deep neural networks has begun to asymptote at near-perfect levels on natural image benchmarks. However, their ability to generalize outside the training set and their robustness to adversarial attacks have not. Humans, by contrast, exhibit robust and graceful generalization far outside their set of training samples. In this talk, I will discuss one strategy for translating these properties to machine-learning classifiers: training them to be uncertain in the same way as humans, rather than always right. When we integrate human uncertainty into training paradigms by using human guess distributions as labels, we find the resulting classifiers generalize better and are more robust to adversarial attacks. Rather than expect all image datasets to come with such labels, we instead intend our CIFAR-10H dataset to be used as a gold standard, against which algorithmic means of capturing the same information can be evaluated. To illustrate this, I present one automated method that does so: deep prototype models inspired by the cognitive science literature.
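The core training change is small: replace one-hot targets with the full human guess distribution inside the cross-entropy. A minimal sketch (the three-class example is made up):

```python
import numpy as np

def soft_label_xent(logits, human_dist):
    """Cross-entropy against a full human guess distribution rather
    than a one-hot label; minimized when the model's predictive
    distribution matches human uncertainty."""
    z = logits - logits.max(axis=1, keepdims=True)       # stable softmax
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -(human_dist * log_p).sum(axis=1).mean()

# one image: humans guessed cat 60%, dog 30%, deer 10%
human = np.array([[0.6, 0.3, 0.1]])
print(soft_label_xent(np.array([[2.0, 1.0, 0.1]]), human))
```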

"Cross-disciplinary research experiences and tips same time, maximize their societal benefts. In for Graduate School Admissions Panelists" this workshop, we address a wide range of challenges from diverse, multi-disciplinary Panelists: viewpoints. We bring together experts from a Erin Grant (UC Berkeley) diverse set of backgrounds. Our speakers are Nadine Chang (CMU) leading experts in ML, human-computer Ruairidh Battleday (Princeton) interaction, ethics, and law. Each of our speakers Sophia Sanborn (UC Berkeley) will focus on one core human-centred challenge Nikhil Parthasarathy (NYU) (namely, fairness, accountability, interpretability, transparency, security, and privacy) in specifc application domains (such as medicine, welfare programs, governance, and regulation). One of Abstract 17: Panel Discussion: What sorts of cognitive or biological (architectural) the main goals of this workshop is to help the inductive biases will be crucial for community understand where it stands after a developing efective artifcial intelligence? few years of rapid technical development and identify promising research directions to pursue in Shared Visual Representations in Human in the years to come. Our speakers identify in and Machine Intelligence, Higgins, Konkle, Bethge, Kriegeskorte 05:10 PM their presentations 3-5 research directions that they consider to be of crucial importance. These Panelists: Irina Higgins (DeepMind), Talia Konkle directions are further debated in one of our panel (Harvard), Nikolaus Kriegeskorte (Columbia), discussions. Matthias Bethge (Universität Tübingen) Schedule

Abstract 18: Concluding Remarks & Prizes Ceremony in Shared Visual Representations in Human and Machine Intelligence, Deza, Peterson, Murty, Griffiths 06:00 PM

Best Paper Award Prize (NVIDIA Titan RTX) and Best Poster Award Prize (Oculus Quest).

Abstract 19: Evening Reception in Shared Visual Representations in Human and Machine Intelligence, 06:10 PM

Sponsored by MIT Quest for Intelligence.

Workshop on Human-Centric Machine Learning

Plamen P Angelov, Nuria Oliver, Adrian Weller, Manuel Rodriguez, Isabel Valera, Silvia Chiappa, Hoda Heidari, Niki Kilbertus

West 223 + 224, Fri Dec 13, 08:00 AM

The growing field of human-centric ML seeks to minimize the potential harms, risks, and burdens of technologies on the public and, at the same time, maximize their societal benefits. In this workshop, we address a wide range of challenges from diverse, multi-disciplinary viewpoints. We bring together experts from a diverse set of backgrounds. Our speakers are leading experts in ML, human-computer interaction, ethics, and law. Each of our speakers will focus on one core human-centred challenge (namely, fairness, accountability, interpretability, transparency, security, or privacy) in specific application domains (such as medicine, welfare programs, governance, and regulation). One of the main goals of this workshop is to help the community understand where it stands after a few years of rapid technical development and identify promising research directions to pursue in the years to come. Our speakers identify in their presentations 3-5 research directions that they consider to be of crucial importance. These directions are further debated in one of our panel discussions.

Schedule

08:30 AM Welcome and introduction
08:45 AM Invited talk #1 Gummadi
09:15 AM Contributed talks (3)
10:00 AM Panel #1: On the role of industry, academia, and government in developing HCML
10:30 AM Coffee break
11:00 AM Invited talk #2 Mulligan
11:30 AM Contributed talks (2)
12:00 PM Lunch and poster session
01:30 PM Invited talk #3 Roth
02:00 PM Contributed talks (4)
03:00 PM Coffee break
03:30 PM Invited talk #4 Doshi-Velez

04:00 PM Invited talk #5 Kim
04:30 PM Panel #2: Future research directions and interdisciplinary collaborations in HCML
05:00 PM Poster session Gu, Xiang, Kasirzadeh, Han, Florez, Harder, Nguyen, Akhavan Rahnama, Donini, Slack, Ali, Koley, Bakker, Hilgard, James-Sorenson, Ramos, Lu, Yang, Boyarskaya, Pawelczyk, Sokol, Jaiswal, Bhatt, Alvarez-Melis, Grover, Marx, Yang, Wang, Çapan, Wang, Grünewälder, Khajehnejad, Patro, Kunes, Deng, Liu, Oneto, Li, Matthes, Tu
06:00 PM Closing remarks

Solving inverse problems with deep networks: New architectures, theoretical foundations, and applications

Reinhard Heckel, Paul Hand, Richard Baraniuk, Joan Bruna, Alex Dimakis, Deanna Needell

West 301 - 305, Fri Dec 13, 08:00 AM

There is a long history of algorithmic development for solving inverse problems arising in sensing and imaging systems and beyond. Examples include medical and computational imaging, compressive sensing, as well as community detection in networks. Until recently, most algorithms for solving inverse problems in the imaging and network sciences were based on static signal models derived from physics or intuition, such as wavelets or sparse representations.

Today, the best performing approaches for the aforementioned image reconstruction and sensing problems are based on deep learning, which learns various elements of the method, including i) signal representations, ii) stepsizes and parameters of iterative algorithms, iii) regularizers, and iv) entire inverse functions. For example, it has recently been shown that solving a variety of inverse problems by transforming an iterative, physics-based algorithm into a deep network whose parameters can be learned from training data offers faster convergence and/or a better quality solution. Moreover, even with very little or no learning, deep neural networks enable superior performance for classical linear inverse problems such as denoising and compressive sensing. Motivated by those success stories, researchers are redesigning traditional imaging and sensing systems.

However, the field is mostly wide open, with a range of theoretical and practical questions unanswered. In particular, deep-neural-network-based approaches often lack the guarantees of traditional physics-based methods and, while typically superior, can make drastic reconstruction errors, such as fantasizing a tumor in an MRI reconstruction.

This workshop aims at bringing together theoreticians and practitioners in order to chart out recent advances and discuss new directions in deep-neural-network-based approaches for solving inverse problems in the imaging and network sciences.

Schedule

08:30 AM Opening Remarks Heckel, Hand, Dimakis, Bruna, Needell, Baraniuk
08:40 AM The spiked matrix model with generative priors Zdeborová
09:10 AM Robust One-Bit Recovery via ReLU Generative Networks: Improved Statistical Rate and Global Landscape Analysis Qiu, Wei, Yang
09:40 AM Coffee Break
10:30 AM Computational microscopy in scattering media Waller
11:00 AM Denoising via Early Stopping Soltanolkotabi
11:30 AM Neural Reparameterization Improves Structural Optimization Hoyer, Sohl-Dickstein, Greydanus
12:00 PM Lunch Break
02:00 PM Learning-Based Low-Rank Approximations Indyk
02:30 PM Blind Denoising, Self-Supervision, and Implicit Inverse Problems Batson
03:00 PM Learning Regularizers from Data Chandrasekaran

04:15 PM Poster Session Scarlett, Indyk, Vakilian, Weller, Mitra, Aubin, Loureiro, Krzakala, Zdeborová, Monakhova, Yurtsever, Waller, Sommerhof, Moeller, Anirudh, Qiu, Wei, Yang, Thiagarajan, Asif, Gillhofer, Brandstetter, Hochreiter, Petersen, Patel, Oberai, Kamath, Karmalkar, Price, Ahmed, Kadkhodaie, Mohan, Simoncelli, Fernandez-Granda, Leong, Sakla, Willett, Hoyer, Sohl-Dickstein, Greydanus, Jagatap, Hegde, Kellman, Tamir, Laanait, Dia, Ravanelli, Binas, Rostamzadeh, Jalali, Fang, Schwing, Lachapelle, Brouillard, Deleu, Lacoste-Julien, Yu, Mazumdar, Rawat, Zhao, Chen, Li, Ramsauer, Rizzuti, Mitsakos, Cao, Strohmer, Li, Peng, Ongie

Abstracts (6):

Abstract 2: The spiked matrix model with generative priors in Solving inverse problems with deep networks: New architectures, theoretical foundations, and applications, Zdeborová 08:40 AM

Using a low-dimensional parametrization of signals is a generic and powerful way to enhance performance in signal processing and statistical inference. A very popular and widely explored type of dimensionality reduction is sparsity; another type is generative modelling of signal distributions. Generative models based on neural networks, such as GANs or variational auto-encoders, are particularly performant and are gaining in applicability. In this paper we study spiked matrix models, where a low-rank matrix is observed through a noisy channel. This problem, with a sparse structure of the spikes, has attracted broad attention in the past literature. Here, we replace the sparsity assumption by generative modelling, and investigate the consequences for statistical and algorithmic properties. We analyze the Bayes-optimal performance under specific generative models for the spike. In contrast with the sparsity assumption, we do not observe regions of parameters where statistical performance is superior to the best known algorithmic performance. We show that in the analyzed cases the approximate message passing algorithm is able to reach optimal performance. We also design enhanced spectral algorithms and analyze their performance and thresholds using random matrix theory, showing their superiority to classical principal component analysis. We complement our theoretical results by illustrating the performance of the spectral algorithms when the spikes come from real datasets.
Abstract 3: Robust One-Bit Recovery via ReLU Generative Networks: Improved Statistical Rate and Global Landscape Analysis in Solving inverse problems with deep networks: New architectures, theoretical foundations, and applications, Qiu, Wei, Yang 09:10 AM

We study the robust one-bit compressed sensing problem, whose goal is to design an algorithm that faithfully recovers any sparse target vector $\theta_0\in\mathbb{R}^d$ \emph{uniformly} from $m$ quantized noisy measurements. Under the assumption that the measurements are sub-Gaussian, to recover any $k$-sparse $\theta_0$ ($k\ll d$) \emph{uniformly} up to an error $\varepsilon$ with high probability, the best known computationally tractable algorithm requires\footnote{Here, an algorithm is ``computationally tractable'' if it has provable convergence guarantees. The notation $\tilde{\mathcal{O}}(\cdot)$ omits a logarithmic factor of $\varepsilon^{-1}$.} $m\geq\tilde{\mathcal{O}}(k\log d/\varepsilon^4)$. In this paper, we consider a new framework for the one-bit sensing problem where the sparsity is implicitly enforced via mapping a low-dimensional representation $x_0$ through a known $n$-layer ReLU generative network $G:\mathbb{R}^k\rightarrow\mathbb{R}^d$. Such a framework poses low-dimensional priors on $\theta_0$ without a known basis. We propose to recover the target $G(x_0)$ via an unconstrained empirical risk minimization (ERM) problem under a much weaker \emph{sub-exponential measurement assumption}. For such a problem, we establish a joint statistical and computational analysis. In particular, we prove that the ERM estimator in this new framework achieves an improved statistical rate of $m=\tilde{\mathcal{O}}(kn\log d/\varepsilon^2)$, recovering any $G(x_0)$ uniformly up to an error $\varepsilon$. Moreover, from the lens of computation, despite non-convexity, we prove that the objective of our ERM problem has no spurious stationary point; that is, any stationary point is equally good for recovering the true target up to scaling, with a certain accuracy. Our analysis sheds some light on the possibility of inverting a deep generative model under partial and quantized measurements, complementing the recent success of using deep generative models for inverse problems.
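As a loose illustration of the recovery pipeline (not the authors' estimator or analysis), the sketch below draws sign measurements of G(x0) for a small random ReLU generator and minimizes a least-squares ERM surrogate over the latent code by numerical gradient descent; recovery is only expected up to scaling, and every constant here is arbitrary.

```python
import numpy as np

def relu_generator(weights):
    """A small fixed (known) ReLU network G: R^k -> R^d."""
    def G(x):
        h = x
        for W in weights[:-1]:
            h = np.maximum(W @ h, 0.0)
        return weights[-1] @ h
    return G

def erm_recover(G, A, y, k, steps=300, lr=0.02, h=1e-4):
    """Crude least-squares ERM surrogate over the latent code:
    minimize mean (y_i - <a_i, G(x)>)^2 by finite-difference descent."""
    rng = np.random.default_rng(0)
    x = rng.normal(size=k)
    loss = lambda x: np.mean((y - A @ G(x)) ** 2)
    for _ in range(steps):
        g = np.array([(loss(x + h * e) - loss(x)) / h for e in np.eye(k)])
        x -= lr * g
    return G(x)

k, d, m = 3, 20, 200
rng = np.random.default_rng(1)
W = [rng.normal(size=(8, k)), rng.normal(size=(d, 8))]
G = relu_generator(W)
x_true = rng.normal(size=k)
A = rng.normal(size=(m, d)) / np.sqrt(m)
y = np.sign(A @ G(x_true))            # one-bit (sign) measurements
theta_hat = erm_recover(G, A, y, k)   # approximates G(x_true) up to scaling
```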

Abstract 5: Computational microscopy in scattering media in Solving inverse problems with deep networks: New architectures, theoretical foundations, and applications, Waller 10:30 AM

Computational imaging involves the joint design of imaging system hardware and software, optimizing across the entire pipeline from acquisition to reconstruction. Computers can replace bulky and expensive optics by solving computational inverse problems. This talk will describe new microscopes that use computational imaging to enable 3D fluorescence and phase measurement, using image reconstruction algorithms that are based on large-scale nonlinear non-convex optimization combined with unrolled neural networks. We further discuss engineering of data capture for computational microscopes by end-to-end learned design.

Abstract 7: Neural Reparameterization Improves Structural Optimization in Solving inverse problems with deep networks: New architectures, theoretical foundations, and applications, Hoyer, Sohl-Dickstein, Greydanus 11:30 AM

Structural optimization is a popular method for designing objects such as bridge trusses, airplane wings, and optical devices. Unfortunately, the quality of solutions depends heavily on how the problem is parameterized. In this paper, we propose using the implicit bias over functions induced by neural networks to improve the parameterization of structural optimization. Rather than directly optimizing densities on a grid, we instead optimize the parameters of a neural network which outputs those densities. This reparameterization leads to different, and often better, solutions. On a selection of 116 structural optimization tasks, our approach produces an optimal design 50% more often than the best baseline method.
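The reparameterization idea can be shown on a toy problem: instead of optimizing a density grid directly, optimize the weights of a small network that emits the grid. The "structural" objective below is a stand-in for a real compliance simulation, and the architecture is arbitrary; this is our sketch, not the paper's setup.

```python
import numpy as np

def tiny_net(theta, coords):
    """A 1-hidden-layer network mapping grid coordinates to densities
    in [0, 1]; theta packs both weight matrices."""
    W1 = theta[:2 * 8].reshape(8, 2)
    W2 = theta[2 * 8:].reshape(1, 8)
    h = np.tanh(W1 @ coords.T)
    return 1.0 / (1.0 + np.exp(-(W2 @ h).ravel()))

def num_grad(f, z, h=1e-4):
    return np.array([(f(z + h * e) - f(z - h * e)) / (2 * h)
                     for e in np.eye(z.size)])

# toy objective over a 6x6 density grid (stand-in for a physics
# simulation): prefer a smooth vertical density gradient
n = 6
coords = np.stack(np.meshgrid(np.linspace(0, 1, n),
                              np.linspace(0, 1, n)), -1).reshape(-1, 2)
target = coords[:, 1]
J = lambda rho: np.mean((rho - target) ** 2)

# optimize the network weights instead of the densities themselves
theta = np.random.default_rng(0).normal(size=2 * 8 + 8) * 0.1
for _ in range(200):
    theta -= 0.5 * num_grad(lambda t: J(tiny_net(t, coords)), theta)
print(J(tiny_net(theta, coords)))
```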
penalty function is to Rather than directly optimizing densities on a induce a desired structure in the solution, and grid, we instead optimize the parameters of a these functions are neural network which outputs those densities. specifed based on prior domain-specifc This reparameterization leads to diferent and expertise. We consider the 48 Dec. 13, 2019

Abstract 11: Learning Regularizers from Data in Solving inverse problems with deep networks: New architectures, theoretical foundations, and applications, Chandrasekaran 03:00 PM

Regularization techniques are widely employed in the solution of inverse problems in data analysis and scientific computing due to their effectiveness in addressing difficulties due to ill-posedness. In their most common manifestation, these methods take the form of penalty functions added to the objective in variational approaches for solving inverse problems. The purpose of the penalty function is to induce a desired structure in the solution, and these functions are specified based on prior domain-specific expertise. We consider the problem of learning suitable regularization functions from data in settings in which precise domain knowledge is not directly available; the objective is to identify a regularizer that promotes the type of structure contained in the data. The regularizers obtained using our framework are specified as convex functions that can be computed efficiently via semidefinite programming. Our approach for learning such semidefinite regularizers combines recent techniques for rank minimization problems along with the Operator Sinkhorn procedure. (Joint work with Yong Sheng Soh.)

EMC2: Energy Efficient Machine Learning and Cognitive Computing (5th edition)

Raj Parihar, Michael Goldfarb, Satyam Srivastava, TAO SHENG, Debajyoti Pal

West 306, Fri Dec 13, 08:00 AM

A new wave of intelligent computing, driven by recent advances in machine learning and cognitive algorithms coupled with process technology and new design methodologies, has the potential to usher in unprecedented disruption in the way modern computing systems are designed and deployed. These new and innovative approaches often provide an attractive and efficient alternative not only in terms of performance but also in terms of power, energy, and area. This disruption is easily visible across the whole spectrum of computing systems, ranging from low-end mobile devices to large-scale data centers and servers, including intelligent infrastructures.

A key class of these intelligent solutions provides real-time, on-device cognition at the edge to enable many novel applications, including computer vision and image processing, language understanding, speech and gesture recognition, malware detection and autonomous driving. Naturally, these applications have diverse requirements for performance, energy, reliability, accuracy, and security that demand a holistic approach to designing the hardware, software, and intelligence algorithms to achieve the best power, performance, and area (PPA).

Topics:
- Architectures for the edge: IoT, automotive, and mobile
- Approximation, quantization, and reduced precision computing
- Hardware/software techniques for sparsity
- Neural network architectures for resource-constrained devices
- Neural network pruning, tuning, and automatic architecture search
- Novel memory architectures for machine learning
- Communication/computation scheduling for better performance and energy
- Load balancing and efficient task distribution techniques
- Exploring the interplay between precision, performance, power and energy
- Exploration of new and efficient applications for machine learning
- Characterization of machine learning benchmarks and workloads
- Performance profiling and synthesis of workloads
- Simulation and emulation techniques, frameworks and platforms for machine learning
- Power, performance and area (PPA) based comparison of neural networks
- Verification, validation and determinism in neural networks
- Efficient on-device learning techniques
- Security, safety and privacy challenges and building secure AI systems

Schedule

08:00 AM TBD LeCun
08:45 AM Efficient Computing for AI and Robotics Sze
09:30 AM Putting the "Machine" Back in Machine Learning: The Case for Hardware-ML Model Co-design Marculescu
10:00 AM Poster Session 1 Spasov, Nayak, Diego Andilla, Zhang, Trivedi

10:30 AM Abandoning the Dark Arts: New Directions in Efficient DNN Design Keutzer
11:00 AM Hardware-aware Neural Architecture Design for Small and Fast Models: from 2D to 3D Han
11:30 AM Oral Session 1 Yu, Hartmann, Li, Shafee, Yang, Zafrir
12:30 PM Lunch
02:00 PM Cheap, Fast, and Low Power Deep Learning: I need it now! Delp
02:45 PM Advances and Prospects for In-memory Computing Verma
03:15 PM Algorithm-Accelerator Co-Design for Neural Network Specialization Zhang
03:45 PM Poster Session 2 Prato, Thakker, Galindez Olascoaga, Zhang, Partovi Nia, Adamczewski
04:15 PM Adaptive Multi-Task Neural Networks for Efficient Inference Feris
04:45 PM Kernel and Graph Optimization for DL Model Execution Lee
05:15 PM Configurable Cloud-Scale DNN Processor for Real-Time AI Darvish Rouhani
05:45 PM Oral Session 2 Liao, McKinstry, Izsak, Li, Huang, Mordido

Abstracts (7):

Abstract 1: TBD in EMC2: Energy Efficient Machine Learning and Cognitive Computing (5th edition), LeCun 08:00 AM

TBD

Abstract 2: Efficient Computing for AI and Robotics in EMC2: Energy Efficient Machine Learning and Cognitive Computing (5th edition), Sze 08:45 AM

Computing near the sensor is preferred over the cloud due to privacy and/or latency concerns for a wide range of applications, including robotics/drones, self-driving cars, smart Internet of Things, and portable/wearable electronics. However, at the sensor there are often stringent constraints on energy consumption and cost, in addition to the throughput and accuracy requirements of the application. In this talk, we will describe how joint algorithm and hardware design can be used to reduce energy consumption while delivering real-time and robust performance for applications including deep learning, computer vision, autonomous navigation/exploration and video/image processing. We will show how energy-efficient techniques that exploit correlation and sparsity to reduce compute, data movement and storage costs can be applied to various tasks, including image classification, depth estimation, super-resolution, localization and mapping.
Abstract 3: Putting the "Machine" Back in Machine Learning: The Case for Hardware-ML Model Co-design in EMC2: Energy Efficient Machine Learning and Cognitive Computing (5th edition), Marculescu 09:30 AM

Machine learning (ML) applications have entered and impacted our lives unlike any other technology advance from the recent past. Indeed, almost every aspect of how we live or interact with others relies on or uses ML for applications ranging from image classification and object detection to processing multi-modal and heterogeneous datasets. While the holy grail for judging the quality of an ML model has largely been serving accuracy, and only recently its resource usage, neither of these metrics translates directly to energy efficiency, runtime, or mobile device battery lifetime. This talk will uncover the need for building accurate, platform-specific power and latency models for convolutional neural networks (CNNs) and efficient hardware-aware CNN design methodologies, thus allowing machine learners and hardware designers to identify not just the best-accuracy NN configuration, but also those that satisfy given hardware constraints. Our proposed modeling framework is applicable to both high-end and mobile platforms and achieves 88.24% accuracy for latency, 88.34% for power, and 97.21% for energy prediction. Using similar predictive models, we demonstrate a novel differentiable neural architecture search (NAS) framework, dubbed Single-Path NAS, that uses one single-path over-parameterized CNN to encode all architectural decisions based on shared convolutional kernel parameters. Single-Path NAS achieves state-of-the-art top-1 ImageNet accuracy (75.62%), outperforming existing mobile NAS methods for similar latency constraints (~80ms), and finds the final configuration up to 5,000x faster than prior work. Combined with our quantized CNNs (Flexible Lightweight CNNs, or FLightNNs) that customize precision level in a layer-wise fashion and achieve almost iso-accuracy at 5-10x energy reduction, such a modeling, analysis, and optimization framework is poised to lead to true co-design of hardware and ML model, orders of magnitude faster than the state of the art, while satisfying both accuracy and latency or energy constraints.
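As a cartoon of the platform-specific predictive models mentioned above (with synthetic stand-in data and made-up features; a real workflow would fit to layer latencies profiled on the target device):

```python
import numpy as np

# hypothetical per-layer features: [MACs, params, input_size, output_size]
rng = np.random.default_rng(0)
X = rng.uniform(0.1, 10.0, size=(200, 4))
latency = X @ np.array([0.7, 0.05, 0.1, 0.1]) + 0.3 + rng.normal(0, 0.05, 200)

# fit a per-layer latency model by least squares
A = np.hstack([X, np.ones((200, 1))])
coef, *_ = np.linalg.lstsq(A, latency, rcond=None)

def predict_latency(layers):
    """Sum of predicted per-layer latencies for a candidate network."""
    L = np.hstack([layers, np.ones((len(layers), 1))])
    return float((L @ coef).sum())

print(predict_latency(rng.uniform(0.1, 10.0, size=(5, 4))))
```

A NAS loop can then rank candidate architectures by predicted latency without ever running them on hardware, which is the design point the talk argues for.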

Abstract 5: Abandoning the Dark Arts: New Directions in Efficient DNN Design in EMC2: Energy Efficient Machine Learning and Cognitive Computing (5th edition), Keutzer 10:30 AM

Deep neural net models have provided the most accurate solutions to a very wide variety of problems in vision, language, and speech; however, the design, training, and optimization of efficient DNNs typically requires resorting to the "dark arts" of ad hoc methods and extensive hyperparameter tuning. In this talk we present our progress on abandoning these dark arts by using Differential Neural Architecture Search to guide the design of efficient DNNs and by using Hessian-based methods to guide the processes of training and quantizing those DNNs.

Abstract 9: Cheap, Fast, and Low Power Deep Learning: I need it now! in EMC2: Energy Efficient Machine Learning and Cognitive Computing (5th edition), Delp 02:00 PM

In this talk I will describe the need for low-power machine learning systems. I will motivate this by describing several current projects at Purdue University that need energy-efficient deep learning; in some cases, real deployment of these methods will not be possible without lower-power solutions. The applications include precision farming, health care monitoring, and edge-based surveillance.

Abstract 10: Advances and Prospects for In-memory Computing in EMC2: Energy Efficient Machine Learning and Cognitive Computing (5th edition), Verma 02:45 PM

Edge AI applications retain the need for high-performing inference models, while driving platforms beyond their limits of energy efficiency and throughput. Digital hardware acceleration, enabling 10-100x gains over general-purpose architectures, is already widely deployed, but is ultimately restricted by the data movement and memory accessing that dominate deep-learning computations. In-memory computing, based on both SRAM and emerging memory, offers fundamentally new trade-offs for overcoming these barriers, with the potential for 10x higher energy efficiency and area-normalized throughput demonstrated in recent designs. But those trade-offs introduce new challenges, especially affecting scaling to the level of computations required, integration in practical heterogeneous architectures, and mapping of diverse software. This talk examines those trade-offs to characterize the challenges. It then explores recent research that provides promising paths forward, making in-memory computing more of a practical reality than ever before.

Abstract 11: Algorithm-Accelerator Co-Design for Neural Network Specialization in EMC2: Energy Efficient Machine Learning and Cognitive Computing (5th edition), Zhang 03:15 PM

In recent years, machine learning (ML) with deep neural networks (DNNs) has been widely deployed in diverse application domains. However, the growing complexity of DNN models, the slowdown of technology scaling, and the proliferation of edge devices are driving a demand for higher DNN performance and energy efficiency. ML applications have shifted from general-purpose processors to dedicated hardware accelerators in both academic and commercial settings. In line with this trend, there has been an active body of research on both algorithms and hardware architectures for neural network specialization.

This talk presents our recent investigation into DNN optimization and low-precision quantization, using a co-design approach featuring contributions to both algorithms and hardware accelerators. First, we review static network pruning techniques and show a fundamental link between group convolutions and circulant matrices, two previously disparate lines of research in DNN compression. Then we discuss channel gating, a dynamic, fine-grained, and trainable technique for DNN acceleration. Unlike static approaches, channel gating exploits input-dependent dynamic sparsity at run time, resulting in a significant reduction in compute cost with a minimal impact on accuracy. Finally, we present outlier channel splitting, a technique to improve DNN weight quantization by removing outliers from the weight distribution without retraining.
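A toy version of the channel-gating idea, for a 1x1 convolution of our own construction: compute a partial sum over a base subset of input channels and use its magnitude to decide, per output position, whether the remaining channels are worth evaluating. In this dense NumPy sketch the skipped work is only counted, not actually avoided; a real accelerator kernel would exploit the gate to skip computation.

```python
import numpy as np

def channel_gated_conv(x, w_base, w_rest, threshold=0.1):
    """Input-dependent channel gating for a 1x1 conv (toy version):
    a partial sum over 'base' input channels drives a run-time gate
    that decides where the remaining channels get evaluated."""
    base = np.einsum("chw,oc->ohw", x[: w_base.shape[1]], w_base)
    gate = np.abs(base) > threshold               # decided at run time
    rest = np.einsum("chw,oc->ohw", x[w_base.shape[1]:], w_rest)
    out = np.where(gate, base + rest, base)       # gated-off: base only
    flops_saved = 1.0 - gate.mean()               # counted, not avoided here
    return out, flops_saved

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 8, 8))                   # C x H x W
w_base = rng.normal(size=(4, 4)) / 4              # O x C_base
w_rest = rng.normal(size=(4, 12)) / 12            # O x (C - C_base)
y, saved = channel_gated_conv(x, w_base, w_rest)
print(f"fraction of positions gated off: {saved:.2f}")
```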

using a co-design approach featuring agreed to participate. contributions to both algorithms and hardware accelerators. First, we review static network Luke Oakden-Raynor, MBBS (Adelaide) pruning techniques and show a fundamental link , MD/PhD (Stanford) between group convolutions and circulant Lilly Peng, MD/PhD (Google) matrices – two previously disparate lines of Daphne Koller, PhD (in sitro) research in DNN compression. Then we discuss Jef Dean, PhD (Google) channel gating, a dynamic, fne-grained, and trainable technique for DNN acceleration. Unlike Attendees at the workshop will gain an static approaches, channel gating exploits input- appreciation for problems that are unique to the dependent dynamic sparsity at run time. This application of machine learning for healthcare results in a signifcant reduction in compute cost and a better understanding of how machine with a minimal impact on accuracy. Finally, we learning techniques may be leveraged to solve present outlier channel splitting, a technique to important clinical problems. This year’s workshop improve DNN weight quantization by removing builds on the last two NeurIPS ML4H workshops, outliers from the weight distribution without which were both attended by more than 500 retraining. people each year, and helped form the foundations of an emerging research community.

Please see the attached document for the full Machine Learning for Health (ML4H): What program. makes machine learning in medicine diferent? Schedule

Andrew Beam, Tristan Naumann, Brett Beaulieu- Jones, Irene Y Chen, Sam Finlayson, Emily Alsentzer, 08:45 Daphne Koller Talk Adrian Dalca, Matthew McDermott AM 09:15 Spotlight Paper Talks Kapur, Raghu, Li West Ballroom A, Fri Dec 13, 08:00 AM AM The goal of the NeurIPS 2019 Machine Learning 09:45 Cofee Break for Health Workshop (ML4H) is to foster AM collaborations that meaningfully impact medicine 10:30 Alan Karthikesalingam & Nenad by bringing together clinicians, health data AM Tomasev Talk Tomasev experts, and machine learning researchers. 11:00 Paper spotlight talks Asif, Jaeger Attendees at this workshop can also expect to AM broaden their network of collaborators to include 11:20 Message from sponsor Baranov, Ji clinicians and machine learning researchers who AM are focused on solving some of the most import problems in medicine and healthcare. The organizers of this proposal have successfully run NeurIPS workshops in the past and are well- equipped to run this year’s workshop should this proposal be accepted.

This year’s theme of “What makes machine learning in medicine diferent?” aims to elucidate the obstacles that make the development of machine learning models for healthcare uniquely challenging. To speak to this theme, we have received commitments to speak from some of the leading researchers and physicians in this area. Below is a list of confrmed speakers who have 52 Dec. 13, 2019

11:30 AM Poster Session I Zheng, Kapur, Asif, Rozenberg, Gilet, Sidorov, Kumar, Van Steenkiste, Boag, Ouyang, Jaeger, Liu, Balagopalan, Rajan, Skreta, Pattisapu, Goschenhofer, Prabhu, Jin, Gardiner, Li, kumar, Hu, Motani, Lovelace, Roshan, Wang, Valmianski, Lee, Mallya, Chaibub Neto, Kemp, Charpignon, Nigam, Weng, Boughorbel, Bellot, Gondara, Zhang, Bahadori, Zech, Shao, Choi, Seyyed-Kalantari, Aiken, Bica, Shen, Chin-Cheong, Roy, Baldini, Min, Deschrijver, Marttinen, Pascual Ortiz, Nagesh, Rindtorf, Mulyar, Hoebel, Shaka, Machart, Gatys, Ng, Hüser, Taylor, Barbour, Martinez, McCreery, Eyre, Natarajan, Yi, Ma, Nagpal, Du, Gao, Tuladhar, Shleifer, Ren, Mashouri, Lu, Bagherzadeh-Khiabani, Choudhury, Raghu, Fleming, Jain, YANG, Harley, Pfohl, Rumetshofer, Fedorov, Dash, Pfau, Tomkins, Targonski, Brudno, Li, Yu, Patel
01:30 PM Emily Fox Talk Fox
02:00 PM Luke Oakden-Rayner Talk Oakden-Rayner
02:30 PM Spotlight Paper Talks Rozenberg, Wei, Xu
03:00 PM Anna Goldenberg Talk Goldenberg
03:30 PM Coffee Break
04:15 PM Lily Peng & Dale Webster Talk Peng
04:45 PM Panel Discussion (Luke Oakden-Rayner, Emily Fox, Alan Karthikesalingam, Daphne Koller, Anna Goldenberg)

Abstracts (4):

Abstract 2: Spotlight Paper Talks in Machine Learning for Health (ML4H): What makes machine learning in medicine different?, Kapur, Raghu, Li 09:15 AM

"Non-Invasive Silent Speech Recognition in Multiple Sclerosis with Dysphonia", Arnav Kapur et al.

"Transfusion: Understanding Transfer Learning for Medical Imaging", Maithra Raghu et al.

"Deep Survival Experts: A Fully Parametric Survival Regression Model", Xinyu Li et al.

Abstract 5: Paper spotlight talks in Machine Learning for Health (ML4H): What makes machine learning in medicine different?, Asif, Jaeger 11:00 AM

"Privacy Preserving Human Fall Detection using Video Data", Umar Asif et al.

"Retina U-Net: Embarrassingly Simple Exploitation of Segmentation Supervision for Medical Object Detection", Paul Jäger et al.

Abstract 8: Emily Fox in Machine Learning for Health (ML4H): What makes machine learning in medicine different?, Fox 01:30 PM

Models of Cognition: From Predicting Cognitive Impairment to the Brain Networks underlying Complex Cognitive Processes

The ubiquity of smartphone usage in many people's lives makes it a rich source of information about a person's mental and cognitive state. In this talk, we first consider how such data sources can be used to provide insights into an individual's potential cognitive impairment. Based on a study enriched with subjects diagnosed with mild cognitive impairment or Alzheimer's disease, we develop structured models of users' smartphone interactions to reveal differences in phone usage patterns between people with and without cognitive impairment. In particular, we focus on inferring specific types of phone usage sessions that are predictive of cognitive impairment. Our model achieves state-of-the-art results when discriminating between healthy and symptomatic subjects, and its interpretability enables novel insights into which aspects of phone usage strongly relate with cognitive health in our dataset.

We then turn to a scientific analysis of brain functioning underlying cognitive behaviors. Recent neuroimaging modalities, such as magnetoencephalography (MEG), provide rich descriptions of brain activity over time, enabling new studies of the neural underpinnings of complex cognitive processes. We focus on inferring the functional connectivity of auditory attention using MEG recordings. We explore notions of undirected, contemporaneous interactions using sparse and interpretable deep generative models, as well as time-varying directed interactions using Bayesian dynamical models.

new studies of the neural underpinnings of communities and topics that fall under the complex cognitive processes. We focus on umbrella of metalearning. We expect that the inferring the functional connectivity of auditory presence of these diferent communities will attention using MEG recordings. We explore result in a fruitful exchange of ideas and notions of undirected, contemporaneous stimulate an open discussion about the current interactions using sparse and interpretable deep challenges in metalearning, as well as possible generative models, as well as time-varying solutions. directed interactions using Bayesian dynamical models. Schedule

Meta-Learning

Roberto Calandra, Ignasi Clavera Gilaberte, Frank Hutter, Joaquin Vanschoren, Jane Wang

West Ballroom B, Fri Dec 13, 08:00 AM

Recent years have seen rapid progress in meta-learning methods, which learn (and optimize) the performance of learning methods based on data, generate new learning methods from scratch, and learn to transfer knowledge across tasks and domains. Meta-learning can be seen as the logical conclusion of the arc that machine learning has undergone in the last decade, from learning classifiers, to learning representations, and finally to learning algorithms that themselves acquire representations and classifiers. The ability to improve one's own learning capabilities through experience can also be viewed as a hallmark of intelligent beings, and there are strong connections with work on human learning in neuroscience. The goal of this workshop is to bring together researchers from all the different communities and topics that fall under the umbrella of meta-learning. We expect that the presence of these different communities will result in a fruitful exchange of ideas and stimulate an open discussion about the current challenges in meta-learning, as well as possible solutions.

Schedule

Meta-Learning

Roberto Calandra, Ignasi Clavera Gilaberte, Frank Hutter, Joaquin Vanschoren, Jane Wang

West Ballroom B, Fri Dec 13, 08:00 AM

Recent years have seen rapid progress in meta-learning methods, which learn (and optimize) the performance of learning methods based on data, generate new learning methods from scratch, and learn to transfer knowledge across tasks and domains. Metalearning can be seen as the logical conclusion of the arc that machine learning has undergone in the last decade, from learning classifiers, to learning representations, and finally to learning algorithms that themselves acquire representations and classifiers. The ability to improve one's own learning capabilities through experience can also be viewed as a hallmark of intelligent beings, and there are strong connections with work on human learning in neuroscience. The goal of this workshop is to bring together researchers from all the different communities and topics that fall under the umbrella of metalearning. We expect that the presence of these different communities will result in a fruitful exchange of ideas and stimulate an open discussion about the current challenges in metalearning, as well as possible solutions.
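
To make the "learning to learn" framing above concrete, here is a minimal Python sketch of an outer loop that adapts a shared initialization so that a few inner-loop gradient steps solve each new task (a first-order, Reptile-style update). The toy 1-D regression task family and all names are invented for illustration; none of this comes from a specific workshop paper.

import numpy as np

def loss_and_grad(w, task):
    # Toy task family: fit the slope of y = a * x, so the per-task
    # loss is the mean squared error in the scalar parameter w.
    a, xs = task
    err = (w - a) * xs
    return np.mean(err ** 2), np.mean(2 * err * xs)

def inner_adapt(w0, task, steps=5, lr=0.1):
    # Fast, task-specific adaptation: ordinary gradient descent.
    w = w0
    for _ in range(steps):
        _, g = loss_and_grad(w, task)
        w = w - lr * g
    return w

w_meta, outer_lr = 0.0, 0.05
for episode in range(1000):
    task = (np.random.randn(), np.random.randn(20))   # sample a new task
    w_adapted = inner_adapt(w_meta, task)
    # Outer (meta) update: move the shared initialization toward the
    # solution the inner loop found, so future adaptation gets easier.
    w_meta = w_meta + outer_lr * (w_adapted - w_meta)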

Schedule

09:00 AM Opening Remarks
09:10 AM Meta-learning as hierarchical modeling Grant
09:40 AM How Meta-Learning Could Help Us Accomplish Our Grandest AI Ambitions, and Early, Exotic Steps in that Direction Clune
10:10 AM Poster Spotlights 1
10:30 AM Coffee/Poster session 1 Takagi, Javed, Sommer, Sharaf, D'Oro, Wei, Doveh, White, Gonzalez, Nguyen, Li, Yu, Ramalho, Nomura, Alvi, Ton, Huang, Lee, Flennerhag, Zhang, Friesen, Blomstedt, Dubatovka, Bartunov, Yi, Shcherbatyi, Simon, Shang, MacLeod, Liu, Fowl, Mesquita, Quillen
11:30 AM Interaction of Model-based RL and Meta-RL Abbeel
12:00 PM Discussion 1
02:00 PM Abstraction & Meta-Reinforcement Learning Abel
02:30 PM Scalable Meta-Learning Hadsell
03:00 PM Poster Spotlights 2
03:20 PM Coffee/Poster session 2 Song, Mangla, Salinas, Zhuang, Feng, Hu, Puri, Maddox, Raghu, Tossou, Yin, Dasgupta, Lee, Alet, Xu, Franke, Harrison, Warrell, Dhillon, Zela, Qiu, Siems, Mendonca, Schlessinger, Li, Manolache, Dutta, Glass, Singh, Koehler
04:30 PM Contributed Talk 1: Meta-Learning with Warped Gradient Descent (Sebastian Flennerhag)
04:45 PM Contributed Talk 2: MetaPix: Few-shot video retargeting (Jessica Lee)
05:00 PM Compositional generalization in minds and machines Lake
05:30 PM Discussion 2

Abstracts (3):

Abstract 3: How Meta-Learning Could Help Us Accomplish Our Grandest AI Ambitions, and Early, Exotic Steps in that Direction in Meta-Learning, Clune 09:40 AM

A dominant trend in machine learning is that hand-designed pipelines are replaced by higher-performing learned pipelines once sufficient compute and data are available. I argue that trend will apply to machine learning itself, and thus that the fastest path to truly powerful AI is to create AI-generating algorithms (AI-GAs) that on their own learn to solve the hardest AI problems. This paradigm is an all-in bet on meta-learning. To produce AI-GAs, we need work on Three Pillars: meta-learning architectures, meta-learning learning algorithms, and automatically generating environments. In this talk I will present recent work from our team in each of the three pillars: Pillar 1: Generative Teaching Networks (GTNs); Pillar 2: Differentiable plasticity, differentiable neuromodulated plasticity ("backpropamine"), and a Neuromodulated Meta-Learning algorithm (ANML); Pillar 3: the Paired Open-Ended Trailblazer (POET). My goal is to motivate future research into each of the three pillars and their combination.

Abstract 8: Abstraction & Meta-Reinforcement Learning in Meta-Learning, Abel 02:00 PM

Reinforcement learning is hard in a fundamental sense: even in finite and deterministic environments, it can take a large number of samples to find a near-optimal policy. In this talk, I discuss the role that abstraction can play in achieving reliable yet efficient learning and planning. I first introduce classes of state abstraction that induce a trade-off between optimality and the size of an agent's resulting abstract model, yielding a practical algorithm for learning useful and compact representations from a demonstrator. Moreover, I show how these learned, simple representations can underlie efficient learning in complex environments. Second, I analyze the problem of searching for options that make planning more efficient. I present new computational complexity results that illustrate it is NP-hard to find the optimal options that minimize planning time, but show this set can be approximated in polynomial time. Collectively, these results provide a partial path toward abstractions that minimize the difficulty of high quality learning and decision making.

Abstract 14: Compositional generalization in minds and machines in Meta-Learning, Lake 05:00 PM

People learn in fast and flexible ways that elude the best artificial neural networks. Once a person learns how to "dax," they can effortlessly understand how to "dax twice" or "dax vigorously" thanks to their compositional skills. In this talk, we examine how people and machines generalize compositionally in language-like instruction learning tasks. Artificial neural networks have long been criticized for lacking systematic compositionality (Fodor & Pylyshyn, 1988; Marcus, 1998), but new architectures have been tackling increasingly ambitious language tasks. In light of these developments, we reevaluate these classic criticisms and find that artificial neural nets still fail spectacularly when systematic compositionality is required. We then show how people succeed in similar few-shot learning tasks and find they utilize three inductive biases that can be incorporated into models. Finally, we show how more structured neural nets can acquire compositional skills and human-like inductive biases through meta-learning.

Biological and Artificial Reinforcement Learning

Raymond Chua, Sara Zannone, Feryal Behbahani, Rui Ponte Costa, Claudia Clopath, Blake Richards, Doina Precup

West Ballroom C, Fri Dec 13, 08:00 AM

Reinforcement learning (RL) algorithms learn through rewards and a process of trial-and-error. This approach was strongly inspired by the study of animal behaviour and has led to outstanding achievements in machine learning (e.g. in games, robotics, science). However, artificial agents still struggle with a number of difficulties, such as sample efficiency, learning in dynamic environments and over multiple timescales, and generalizing and transferring knowledge. On the other hand, biological agents excel at these tasks. The brain has evolved to adapt and learn in dynamic environments, while integrating information and learning on different timescales and for different durations. Animals and humans are able to extract information from the environment in efficient ways by directing their attention and actively choosing what to focus on. They can achieve complicated tasks by solving sub-problems and combining knowledge, as well as representing the environment in efficient ways and planning their decisions off-line. Neuroscience and cognitive science research has largely focused on elucidating the workings of these mechanisms. Learning more about the neural and cognitive underpinnings of these functions could be key to developing more intelligent and autonomous agents. Similarly, having a computational and theoretical framework, together with a normative perspective to refer to, could and does contribute to elucidating the mechanisms used by animals and humans to perform these tasks. Building on the connection between biological and artificial reinforcement learning, our workshop will bring together leading and emergent researchers from Neuroscience, Psychology and Machine Learning to share: (i) how neural and cognitive mechanisms can provide insights to tackle challenges in RL research and (ii) how machine learning advances can help further our understanding of the brain and behaviour.

Schedule

09:00 AM Opening Remarks Chua, Behbahani, Zannone, Ponte Costa, Clopath, Precup, Richards
09:15 AM Invited Talk #1: From brains to agents and back Wang
09:45 AM Coffee Break & Poster Session Mohinta, Agostinelli, Moringen, Lee, Lo, Maass, Shefer, Bredenberg, Eysenbach, Xia, Markou, Lichtenberg, Richemond, Zhang, Lanier, Lin, Fedus, Berseth, Sarrico, Crosby, McAleer, Ghiassian, Scherr, Bellec, Salaj, Kolbeinsson, Rosenberg, Shin, Lee, Cecchi, Rish, Hajek
10:30 AM Contributed Talk #1: Humans flexibly transfer options at multiple levels of abstractions Xia
10:45 AM Contributed Talk #2: Slow processes of neurons enable a biologically plausible approximation to policy gradient Maass
11:00 AM Invited Talk 2: Understanding information demand at different levels of complexity Gottlieb
11:30 AM Invited Talk #3: Predictive Cognitive Maps with Multi-scale Successor Representations and Replay Momennejad
12:00 PM Lunch Break & Poster Session
02:00 PM Invited Talk #4: Multi-Agent Interaction and Online Optimization in RL Mordatch
02:30 PM Invited Talk #5: Materials Matter: How biologically inspired alternatives to conventional neural networks improve meta-learning and continual learning Clune
03:00 PM Invited Talk #6: Features or Bugs: Synergistic Idiosyncrasies in Human Learning and Decision-Making Yu
03:30 PM Coffee Break & Poster Session
04:15 PM Contributed Talk #3: MEMENTO: Further Progress Through Forgetting Fedus
04:30 PM Invited Talk #7: Richard Sutton Sutton
05:00 PM Panel Discussion led by Grace Lindsay Lindsay, Richards, Precup, Gottlieb, Clune, Wang, Sutton, Yu, Momennejad

Abstracts (8):

Abstract 4: Contributed Talk #1: Humans flexibly transfer options at multiple levels of abstractions in Biological and Artificial Reinforcement Learning, Xia 10:30 AM

Humans are great at using prior knowledge to solve novel tasks, but how they do so is not well understood. Recent work showed that in contextual multi-armed bandits environments, humans create simple one-step policies that they can transfer to new contexts by inferring context clusters. However, the daily tasks humans face are often temporally extended, and demand more complex, hierarchically structured skills. The options framework provides a potential solution for representing such transferable skills. Options are abstract multi-step policies, assembled from simple actions or other options, that can represent meaningful reusable skills. We developed a novel two-stage decision making protocol to test if humans learn and transfer multi-step options. We found transfer effects at multiple levels of policy complexity that could not be explained by flat reinforcement learning models. We also devised an option model that can qualitatively replicate the transfer effects in human participants. Our results provide evidence that humans create options, and use them to explore in novel contexts, consequently transferring past knowledge and speeding up learning.
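
For readers unfamiliar with the options framework the abstract builds on (Sutton, Precup & Singh, 1999), here is a minimal Python sketch of an option as a data structure: an initiation set, an intra-option policy, and a stochastic termination condition. The chain world and every name below are invented purely to make the sketch executable; they are not from the talk.

import numpy as np
from dataclasses import dataclass
from typing import Callable, Set

@dataclass
class Option:
    initiation_set: Set[int]               # states where the option may start
    policy: Callable[[int], int]           # intra-option policy: state -> action
    termination: Callable[[int], float]    # beta(s): probability of stopping

def chain_step(state, action, n=10):
    """Toy chain world: action +1/-1 moves along the chain; reward at the end."""
    nxt = min(max(state + action, 0), n - 1)
    return nxt, float(nxt == n - 1), nxt == n - 1

def run_option(state, option, rng):
    """Execute one option until it terminates; return final state, total reward."""
    total = 0.0
    while True:
        state, r, done = chain_step(state, option.policy(state))
        total += r
        if done or rng.random() < option.termination(state):
            return state, total

# A reusable multi-step skill: "keep moving right, usually."
go_right = Option(initiation_set=set(range(10)),
                  policy=lambda s: +1,
                  termination=lambda s: 0.1)
state, reward = run_option(0, go_right, np.random.default_rng(0))
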
Abstract 5: Contributed Talk #2: Slow processes of neurons enable a biologically plausible approximation to policy gradient in Biological and Artificial Reinforcement Learning, Maass 10:45 AM

Recurrent neural networks underlie the astounding information processing capabilities of the brain, and play a key role in many state-of-the-art algorithms in deep reinforcement learning. But it has remained an open question how such networks could learn from rewards in a biologically plausible manner, with synaptic plasticity that is both local and online. We describe such an algorithm that approximates actor-critic policy gradient in recurrent neural networks. Building on an approximation of backpropagation through time (BPTT): e-prop, and using the equivalence between forward and backward view in reinforcement learning (RL), we formulate a novel learning rule for RL that is both online and local, called reward-based e-prop. This learning rule uses neuroscience inspired slow processes and top-down signals, while still being rigorously derived as an approximation to actor-critic policy gradient. To empirically evaluate this algorithm, we consider a delayed reaching task, where an arm is controlled using a recurrent network of spiking neurons. In this task, we show that reward-based e-prop performs as well as an agent trained with actor-critic policy gradient with biologically implausible BPTT.
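
As a structural sketch only (not the Bellec et al. derivation for spiking networks), the "local and online" character of such a three-factor rule can be shown in a few lines of Python: each synapse keeps a decaying eligibility trace built from locally available pre- and postsynaptic quantities, and a global scalar learning signal gates when those traces are committed to the weights. The random TD-error stand-in and all dimensions below are invented.

import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 10, 4
W = rng.normal(scale=0.1, size=(n_out, n_in))
trace = np.zeros_like(W)
decay, lr = 0.9, 0.01    # the slow trace decay is the "slow process"

for t in range(1000):
    x = rng.random(n_in)                      # presynaptic activity
    y = np.tanh(W @ x)                        # postsynaptic activity
    # Eligibility trace: local, online memory of pre/post coincidence.
    trace = decay * trace + np.outer(1 - y**2, x)
    td_error = rng.normal()                   # stand-in top-down learning signal
    W += lr * td_error * trace                # three-factor weight update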

Abstract 6: Invited Talk 2: Understanding information demand at different levels of complexity in Biological and Artificial Reinforcement Learning, Gottlieb 11:00 AM

In the 1950s, Daniel Berlyne wrote extensively about the importance of curiosity – our intrinsic desire to know. To understand curiosity, Berlyne argued, we must explain why humans exert so much effort to obtain knowledge, and how they decide which questions to explore, given that exploration is difficult and its long-term benefits are impossible to ascertain. I propose that these questions, although relatively neglected in neuroscience research, are key to understanding cognition and complex decision making of the type that humans routinely engage in and autonomous agents only aspire to. I will describe our investigations of these questions in two types of paradigms. In one paradigm, agents are placed in contexts with different levels of uncertainty and reward probability and can sample information about the eventual outcome. We find that, in humans and monkeys, information sampling is partially sensitive to uncertainty but is also biased by Pavlovian tendencies, which push agents to engage with signals predicting positive outcomes and avoid those predicting negative outcomes in ways that interfere with a reduction of uncertainty. In a second paradigm, agents are given several tasks of different difficulty and can freely organize their exploration in order to learn. In these contexts, uncertainty-based heuristics become ineffective, and optimal strategies are instead based on learning progress – the ability to first engage with and later reduce uncertainty. I will show evidence that humans are motivated to select difficult tasks consistent with learning maximization, but they guide their task selection according to success rates rather than learning progress per se, which risks trapping them in tasks with too high levels of difficulty (e.g., random unlearnable tasks). Together, the results show that information demand has consistent features that can be quantitatively measured at various levels of complexity, and a research agenda exploring these features will greatly expand our understanding of complex decision strategies.

Abstract 7: Invited Talk #3: Predictive Cognitive Maps with Multi-scale Successor Representations and Replay in Biological and Artificial Reinforcement Learning, Momennejad 11:30 AM

Reinforcement Learning's principles of temporal difference learning can drive representation learning, even in the absence of rewards. Representation learning is especially important in problems that require a cognitive map (Tolman, 1948), common in mammalian spatial navigation and non-spatial inference, e.g., shortcut- and latent learning, policy revaluation, and remapping. Here I focus on models of predictive cognitive maps that learn successor representations (SR) at multiple scales, and use replay to update SR maps similar to Dyna models (SR-Dyna). SR- and SR-Dyna based representation learning capture biological representation learning reflected in place-, grid-, and distance-to-goal cell firing patterns (Stachenfeld et al. 2017, Momennejad and Howard 2018), the interaction between boundary vector cells and place cells (De Cothi and Barry 2019), subgoal learning (Weinstein and Botvinick 2014), remapping, policy revaluation, and latent learning behavior (Momennejad et al. 2017; Russek, Momennejad et al. 2017). The SR framework makes testable predictions about representation learning in biological systems: e.g., about how predictive features are extracted from visual experience and abstracted into spatial representations that guide navigation. Specifically, the SR is sensitive to the policy the animal has taken during navigation, generating predictions about the representation of goals and how rewarding locations distort the predictive map. Finally, deep RL using SR has been shown to support option discovery, which is especially useful for empowering agents with intrinsic motivation in environments that have sparse rewards and complex structures. These findings can lead to novel directions of human and animal experimentation. I will summarize behavioral and neural findings in human and rodent studies by us and other groups and discuss the road ahead.
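
Since the abstract leans on the successor representation, a minimal tabular sketch may help: M[s, s'] estimates the discounted expected future occupancy of s' starting from s, learned with a TD rule. The sizes and constants below are arbitrary; replay-based "SR-Dyna" variants would additionally re-apply stored transitions offline.

import numpy as np

n_states, gamma, alpha = 25, 0.95, 0.1
M = np.eye(n_states)                     # SR matrix, initialized to identity

def sr_td_update(s, s_next):
    # TD target: one-hot occupancy of s plus discounted SR of the successor.
    target = np.eye(n_states)[s] + gamma * M[s_next]
    M[s] += alpha * (target - M[s])

# Given a reward vector R over states, values follow linearly: V = M @ R.
# This is why re-weighting rewards (policy revaluation) is cheap under the SR.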

Abstract 9: Invited Talk #4: Multi-Agent Interaction and Online Optimization in RL in Biological and Artificial Reinforcement Learning, Mordatch 02:00 PM

AI and robotics have made inspiring progress over the recent years on training systems to solve specific, well-defined tasks. But the need to specify tasks bounds the level of complexity that can ultimately be reached in training with such an approach. The sharp distinction between training and deployment stages likewise limits the degree to which these systems can improve and adapt after training. In my talk, I will advocate for multi-agent interaction and online optimization processes as key ingredients towards overcoming these limitations.

In the first part, I will show that through multi-agent competition, a simple objective such as a hide-and-seek game, and standard reinforcement learning algorithms at scale, agents can create a self-supervised autocurriculum with multiple distinct rounds of emergent strategy, many of which require sophisticated tool use and coordination. Multi-agent interaction leads to behaviors that center around more human-relevant skills than other self-supervised reinforcement learning methods such as intrinsic motivation, and holds promise of open-ended growth of complexity.

In the second part, I will argue for the usefulness and generality of online optimization processes and show examples of incorporating them in model-based control and generative modeling contexts via energy-based models. I will show intriguing advantages, such as compositionality, robustness to distribution shift, non-stationarity, and adversarial attacks in generative modeling problems, and planned exploration and fast adaptation to changing environments in control problems.

This is joint work with many wonderful colleagues and students at OpenAI, MIT, University of Washington, and UC Berkeley.

Abstract 10: Invited Talk #5: Materials Matter: How biologically inspired alternatives to conventional neural networks improve meta-learning and continual learning in Biological and Artificial Reinforcement Learning, Clune 02:30 PM

I will describe how alternatives to conventional neural networks that are very loosely biologically inspired can improve meta-learning, including continual learning. First I will summarize differentiable Hebbian learning and differentiable neuromodulated Hebbian learning (aka "backpropamine"). Both are techniques for training deep neural networks with synaptic plasticity, meaning the weights can change during meta-testing/inference. Whereas meta-learned RNNs can only store within-episode information in their activations, such plastic Hebbian networks can store information in their weights in addition to their activations, improving performance on some classes of problems. Second, I will describe a new, unpublished method that improves the state of the art in continual learning. ANML (A Neuromodulated Meta-Learning algorithm) meta-learns a neuromodulatory network that gates the activity of the main prediction network, enabling the learning of up to 600 simple tasks sequentially.
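
A rough sketch of the plastic-weight idea behind differentiable Hebbian plasticity and "backpropamine" (Miconi et al.): the effective weight is a fixed component plus a per-synapse plasticity coefficient times a Hebbian trace that keeps changing at inference time, with a neuromodulatory scalar gating how fast the trace is written. In the real method W, alpha, and the network producing the modulation are meta-trained by backprop; here they are random stand-ins and the loop only illustrates the fast-weight dynamics.

import numpy as np

rng = np.random.default_rng(1)
n = 8
W = rng.normal(scale=0.1, size=(n, n))      # slow weights (meta-trained)
alpha = rng.normal(scale=0.1, size=(n, n))  # per-synapse plasticity (meta-trained)
hebb = np.zeros((n, n))                     # fast weights, reset per episode
eta = 0.05

x = rng.random(n)
for t in range(100):
    y = np.tanh((W + alpha * hebb) @ x)     # effective weights include the trace
    m = np.tanh(y.mean())                   # stand-in neuromodulatory signal m(t)
    hebb = (1 - eta) * hebb + eta * m * np.outer(y, x)  # gated Hebbian update
    x = y
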
Abstract 11: Invited Talk #6: Features or Bugs: Synergistic Idiosyncrasies in Human Learning and Decision-Making in Biological and Artificial Reinforcement Learning, Yu 03:00 PM

Combining a multi-armed bandit task and Bayesian computational modeling, we find that humans systematically under-estimate reward availability in the environment. This apparent pessimism turns out to be an optimism bias in disguise, and one that compensates for other idiosyncrasies in human learning and decision-making under uncertainty, such as a default tendency to assume non-stationarity in environmental statistics as well as the adoption of a simplistic decision policy. In particular, reward rate underestimation discourages the decision-maker from switching away from a "good" option, thus achieving near-optimal behavior (which never switches away after a win). Furthermore, we demonstrate that the Bayesian model that best predicts human behavior is equivalent to a particular form of Q-learning often used in the brain sciences, thus providing statistical, normative grounding to phenomenological models of human and animal behavior.

Abstract 13: Contributed Talk #3: MEMENTO: Further Progress Through Forgetting in Biological and Artificial Reinforcement Learning, Fedus 04:15 PM

Modern Reinforcement Learning (RL) algorithms, even those with intrinsic reward bonuses, suffer performance plateaus in hard-exploration domains, suggesting these algorithms have reached their ceiling. However, in what we describe as the MEMENTO observation, we find that new agents launched from the position where the previous agent saturated can reliably make further progress. We show that this is not an artifact of limited model capacity or training duration, but rather indicative of interference in learning dynamics between various stages of the domain [Schaul et al., 2019], signatures of multi-task and continual learning. To mitigate interference we design an end-to-end learning agent which partitions the environment into various segments, and models the value function separately in each score context per Jain et al. [2019]. We demonstrate increased learning performance by this ensemble of agents on Montezuma's Revenge and further show how this ensemble can be distilled into a single agent with the same model capacity as the original learner. Since the solution is empirically expressible by the original network, this provides evidence of interference, and our approach validates an avenue to circumvent it.

Graph Representation Learning

Will Hamilton, Rianne van den Berg, Michael Bronstein, Stefanie Jegelka, Thomas Kipf, Jure Leskovec, Renjie Liao, Yizhou Sun, Petar Veličković

West Exhibition Hall A, Fri Dec 13, 08:00 AM

Graph-structured data is ubiquitous throughout the natural and social sciences, from telecommunication networks to quantum chemistry. Building relational inductive biases into deep learning architectures is crucial if we want systems that can learn, reason, and generalize from this kind of data. Furthermore, graphs can be seen as a natural generalization of simpler kinds of structured data (such as images), and therefore, they represent a natural avenue for the next breakthroughs in machine learning.

Recent years have seen a surge in research on graph representation learning, including techniques for deep graph embeddings, generalizations of convolutional neural networks to graph-structured data, and neural message-passing approaches inspired by belief propagation. These advances in graph neural networks and related techniques have led to new state-of-the-art results in numerous domains, including chemical synthesis, 3D-vision, recommender systems, question answering, and social network analysis.

The workshop will consist of contributed talks, contributed posters, and invited talks on a wide variety of methods and problems related to graph representation learning. We will welcome 4-page original research papers on work that has not previously been published in a machine learning conference or workshop. In addition to traditional research paper submissions, we will also welcome 1-page submissions describing open problems and challenges in the domain of graph representation learning. These open problems will be presented as short talks (5-10 minutes) immediately preceding a coffee break to facilitate and spark discussions.

The primary goal for this workshop is to facilitate community building; with hundreds of new researchers beginning projects in this area, we hope to bring them together to consolidate this fast-growing area of graph representation learning into a healthy and vibrant subfield.
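
The neural message-passing pattern mentioned above fits in a few lines: each node aggregates transformed features from its neighbors, then updates its own embedding. This is a generic sketch; the mean aggregator, tanh nonlinearity, and random weights are illustrative choices, not a specific model from the workshop.

import numpy as np

def message_passing_layer(H, A, W_self, W_neigh):
    """H: (n, d) node features; A: (n, n) adjacency; returns updated (n, d)."""
    deg = A.sum(axis=1, keepdims=True).clip(min=1)
    msgs = (A @ H) / deg                   # mean over neighbors' features
    return np.tanh(H @ W_self + msgs @ W_neigh)

rng = np.random.default_rng(0)
n, d = 6, 4
A = (rng.random((n, n)) < 0.3).astype(float)
A = np.maximum(A, A.T)                     # make the toy graph undirected
H = rng.normal(size=(n, d))
W1, W2 = rng.normal(size=(d, d)), rng.normal(size=(d, d))
for _ in range(2):                         # two rounds of message passing
    H = message_passing_layer(H, A, W1, W2)
graph_embedding = H.mean(axis=0)           # simple readout over all nodes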

Schedule

08:40 AM Opening remarks Hamilton
09:00 AM Marco Gori: Graph Representations, Backpropagation, and Biological Plausibility Gori
09:30 AM Peter Battaglia: Graph Networks for Learning Physics Battaglia
10:00 AM Open Challenges - Spotlight Presentations Sumba Toral, Maron, Kolbeinsson
10:30 AM Coffee Break
11:00 AM Andrew McCallum: Learning DAGs and Trees with Box Embeddings and Hyperbolic Embeddings McCallum
11:30 AM Poster Session #1 Jamadandi, Sanborn, Yao, Cai, Chen, Andreoli, Stoehr, Su, Duan, Ferreira, Belli, Boyarski, Ye, Ghalebi, Sarkar, KHADEMI, Faerman, Bose, Ma, Meng, Kazemi, Wang, Wu, Wu, Joshi, Brockschmidt, Zambon, Graber, Van Belle, Malik, Glorot, Krenn, Cameron, Huang, Stoica, Toumpa
12:30 PM Lunch
01:30 PM Outstanding Contribution Talk: Pre-training Graph Neural Networks Liu
01:45 PM Outstanding Contribution Talk: Variational Graph Convolutional Networks Bonilla
02:00 PM Outstanding Contribution Talk: Probabilistic End-to-End Graph-based Semi-Supervised Learning vargas vieyra
02:15 PM Wengong Jin: Representation and Synthesis of Molecular Graphs Jin
02:45 PM Presentation and Discussion: Open Graph Benchmark Leskovec
03:15 PM Poster Session #2 Li, Meltzer, Sun, SALHA, Vlastelica Pogančić, Liu, Frasca, Côté, Verma, CELIKKANAT, D'Oro, Vijayan, Schuld, Veličković, Tayal, Pei, Xu, Chen, Cheng, Chami, Kim, Gomes, Maziarka, Hofmann, Levie, Gogoglou, Gong, Monti, Wang, Leng, Vivona, Flam-Shepherd, Holtz, Zhang, KHADEMI, Hsieh, Stanić, Meng, Jiao
04:15 PM Bistra Dilkina: Graph Representation Learning for Optimization on Graphs Dilkina
04:45 PM Marinka Zitnik: Graph Neural Networks for Drug Discovery and Development Zitnik
05:15 PM Invited Presentation: Deep Graph Library Zhang

Abstracts (3):

Abstract 2: Marco Gori: Graph Representations, Backpropagation, and Biological Plausibility in Graph Representation Learning, Gori 09:00 AM

Neural architectures and many learning environments can conveniently be expressed by graphs. Interestingly, it has been recently shown that the notion of receptive field and the correspondent convolutional computation can nicely be extended to graph-based data domains with successful results. On the other hand, graph neural networks (GNN) were introduced by extending the notion of time-unfolding, which ended up in a state-based representation along with a learning process that requires state relaxation to a fixed point. It turns out that algorithms based on this approach applied to learning tasks on collections of graphs are more computationally expensive than recent graph convolutional nets.

In this talk we advocate the importance of refreshing state-based graph representations in the spirit of the early introduction of GNN for the case of "network domains" that are characterized by a single graph (e.g. traffic nets, social nets). In those cases, data over the graph turn out to be a continuous stream, where time plays a crucial role and blurs the classic statistical distinction between training and test set. When expressing the graphical domain and the neural network within the same Lagrangian framework for dealing with constraints, we show novel learning algorithms that seem to be very appropriate for network domains. Finally, we show that in the proposed learning framework, the Lagrangian multipliers are associated with the delta term of Backpropagation, and provide intriguing arguments on its biological plausibility.

Abstract 3: Peter Battaglia: Graph Networks for Learning Physics in Graph Representation Learning, Battaglia 09:30 AM

I'll describe a series of studies that use graph networks to reason about and interact with complex physical systems. These models can be used to predict the motion of bodies in particle systems, infer hidden physical properties, control simulated robotic systems, build physical structures, and interpret the symbolic form of the underlying laws that govern physical systems. More generally, this work underlines graph neural networks' role as a first-class member of the deep learning toolkit.

Abstract 15: Bistra Dilkina: Graph Representation Learning for Optimization on Graphs in Graph Representation Learning, Dilkina 04:15 PM

Numerous real world applications involve discrete optimization problems on graphs, many of which are NP-hard, and hence developing effective combinatorial algorithms is a research challenge. In this talk, I will show how leveraging the power of machine learning, and in particular graph representation learning, can provide a new paradigm for designing data-driven algorithms to solve combinatorial graph optimization problems. Our approaches automatically learn solution strategies from a distribution of instances by explicitly considering the combinatorial task during training. We show that we match and often outperform hand-designed algorithms, both with learning greedy algorithms for Minimum Vertex Cover, Maxcut and TSP, as well as when using our new framework ClusterNet to learn a graph representation for an efficient differentiable k-means proxy for graph problems such as partitioning for Community Detection and node selection for Facility Location.
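
To make the "learning greedy algorithms" idea tangible, here is a schematic Python sketch for Minimum Vertex Cover: repeatedly score the remaining nodes with a learned function of simple features and greedily add the best one. The linear scorer and hand-set weights below stand in for the graph-embedding network and trained parameters used in the actual work.

import numpy as np

def greedy_vertex_cover(A, score_weights):
    """A: (n, n) adjacency matrix; returns a vertex cover as a list of nodes."""
    n = A.shape[0]
    covered = np.zeros((n, n), dtype=bool)   # which edges are covered so far
    cover = []
    while not covered[A.astype(bool)].all(): # loop until every edge is covered
        uncovered_deg = (A * ~covered).sum(axis=1)
        feats = np.stack([uncovered_deg, A.sum(axis=1)], axis=1)
        scores = feats @ score_weights       # learned scoring rule (stand-in)
        scores[cover] = -np.inf              # never pick a node twice
        v = int(np.argmax(scores))
        cover.append(v)
        covered[v, :] = covered[:, v] = True # v covers all its incident edges
    return cover

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(greedy_vertex_cover(A, score_weights=np.array([1.0, 0.1])))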

Bayesian Deep Learning

Yarin Gal, Jose Miguel Hernández-Lobato, Christos Louizos, Eric Nalisnick, Zoubin Ghahramani, Kevin Murphy, Max Welling

West Exhibition Hall C, Fri Dec 13, 08:00 AM

Extending on the workshop's success from the past 3 years, this workshop will study the developments in the field of Bayesian deep learning (BDL) over the past year. The workshop will be a platform to host the recent flourish of ideas using Bayesian approaches in deep learning, and using deep learning tools in Bayesian modelling. The program includes a mix of invited talks, contributed talks, and contributed posters. Future directions for the field will be debated in a panel discussion.

Speakers:
* Andrew Wilson
* Deborah Marks
* Jasper Snoek
* Roger Grosse
* Chelsea Finn
* Yingzhen Li
* Alexander Matthews

Workshop summary:
While deep learning has been revolutionary for machine learning, most modern deep learning models cannot represent their uncertainty nor take advantage of the well studied tools of probability theory. This has started to change following recent developments of tools and techniques combining Bayesian approaches with deep learning. The intersection of the two fields has received great interest from the community, with the introduction of new deep learning models that take advantage of Bayesian techniques, and Bayesian models that incorporate deep learning elements. Many ideas from the 1990s are now being revisited in light of recent advances in the fields of approximate inference and deep learning, yielding many exciting new results.

Schedule

08:00 AM Opening remarks
08:05 AM Invited talk
08:25 AM Contributed talk
08:40 AM Invited talk 2 Matthews
09:00 AM Contributed talk 2
09:20 AM Poster spotlights
09:35 AM Poster session Farquhar, Daxberger, Look, Benatan, Zhang, Havasi, Gustafsson, Brofos, Seedat, Livne, Ustyuzhaninov, Cobb, McGregor, McClure, Davidson, Hiranandani, Arora, Itkina, Nielsen, Harvey, Valdenegro-Toro, Peluchetti, Moriconi, Cui, Smidl, Cemgil, Fitzsimons, Zhao, vargas vieyra, Bhattacharyya, Sharma, Dubourg-Felonneau, Warrell, Voloshynovskiy, Rosca, Song, Ross, Fashandi, Gao, Shokri Razaghi, Chang, Xiao, Boehm, Giannone, Krishnan, Davison, Ashukha, Liu, Huang, Nikishin, Park, Ahuja, Subedar, Gadetsky, Arias Figueroa, Rudner, Aslam, Csiszárik, Moberg, Hebbal, Grosse, Marttinen, An, Jónsson, Kessler, Kumar, Figurnov, Tickoo, Saemundsson, Heljakka, Varga, Heim, Rossi, Laves, Gharbieh, Roberts, Pérez Rey, Willetts, Chakrabarty, Ghaisas, Shneider, Buntine, Adamczewski, Gitiaux, Lin, Fu, Rätsch, Gomez, Bodin, Phung, Svensson, Tusi Amaral Laganá Pinto, Alizadeh, Du, Murphy, Benkő, Vattikuti, Gordon, Kanan, Ihler, Graham, Teng, Kirsch, Pevny, Holotyak
10:35 AM Invited talk 3
10:55 AM Contributed talk 3
11:10 AM Invited talk 4
11:30 AM Contributed talk 4
01:20 PM Invited talk 5
01:40 PM Contributed talk 5
01:55 PM Invited talk 6

02:10 PM Contributed talk 6
02:30 PM Poster session 2
03:30 PM Contributed talk 7
03:50 PM Invited talk 7
04:05 PM Contributed talk 8
04:30 PM Panel session
05:30 PM Poster session 3

Dec. 14, 2019

Real Neurons & Hidden Units: future directions at the intersection of neuroscience and AI

Guillaume Lajoie, Eli Shlizerman, Maximilian Puelma Touzel, Jessica Thompson, Konrad Kording

East Ballroom A, Sat Dec 14, 08:00 AM

Recent years have witnessed an explosion of progress in AI. With it, a proliferation of experts and practitioners are pushing the boundaries of the field without regard to the brain. This is in stark contrast with the field's transdisciplinary origins, when interest in designing intelligent algorithms was shared by neuroscientists, psychologists and computer scientists alike. Similar progress has been made in neuroscience, where novel experimental techniques now afford unprecedented access to brain activity and function. However, the traditional neuroscience research program lacks frameworks to truly advance an end-to-end understanding of biological intelligence. For the first time, mechanistic discoveries emerging from deep learning, reinforcement learning and other AI fields may be able to steer fundamental neuroscience research in ways beyond standard uses of machine learning for modelling and data analysis. For example, successful training algorithms in artificial networks, developed without biological constraints, can motivate research questions and hypotheses about the brain. Conversely, a deeper understanding of brain computations at the level of large neural populations may help shape future directions in AI. This workshop aims to address this novel situation by building on existing AI-Neuro relationships but, crucially, outlining new directions for artificial systems and next-generation neuroscience experiments. We invite contributions concerned with the modern intersection between neuroscience and AI and, in particular, addressing questions that can only now be tackled due to recent progress in AI, on the role of recurrent dynamics, inductive biases to guide learning, global versus local learning rules, and interpretability of network activity. This workshop will promote discussion and showcase diverse perspectives on these open questions.

Schedule

08:15 AM Opening Remarks Lajoie, Thompson, Puelma Touzel, Shlizerman, Kording
08:30 AM Invited Talk: Hierarchical Reinforcement Learning: Computational Advances and Neuroscience Connections Precup
09:00 AM Invited Talk: Deep learning without weight transport Lillicrap
09:30 AM Contributed talk: Eligibility traces provide a data-inspired alternative to backpropagation through time. Guillaume Bellec, Franz Scherr, Elias Hajek, Darjan Salaj, Anand Subramoney, Robert Legenstein, Wolfgang Maass
09:45 AM Coffee Break + Posters
10:30 AM Invited Talk: Computing and learning in the presence of neural noise Savin
11:00 AM Invited Talk: Universality and individuality in neural dynamics across large populations of recurrent networks Sussillo
11:30 AM Contributed talk: How well do deep neural networks trained on object recognition characterize the mouse visual system? Santiago A. Cadena, Fabian H. Sinz, Taliah Muhammad, Emmanouil Froudarakis, Erick Cobos, Edgar Y. Walker, Jake Reimer, Matthias Bethge,
11:45 AM Contributed talk: Functional Annotation of Human Cognitive States using Graph Convolution Networks Yu Zhang, Pierre Bellec
12:00 PM Lunch Break
02:00 PM Invited Talk: Simultaneous rigidity and flexibility through modularity in cognitive maps for navigation Fiete

02:30 PM Invited Talk: Theories for the emergence of internal representations in neural networks: from perception to navigation Ganguli
03:00 PM Contributed talk: Adversarial Training of Neural Encoding Models on Population Spike Trains Poornima Ramesh, Mohamad Atayi, Jakob H Macke
03:15 PM Contributed talk: Learning to Learn with Feedback and Local Plasticity. Jack Lindsey
03:30 PM Coffee Break + Posters
04:15 PM Poster Session Sainath, Akrout, Delahunt, Kutz, Yang, Marino, Abbott, Vecoven, Ernst, warrington, Kagan, Cho, Harris, Grinberg, Hopfield, Krotov, Muhammad, Cobos, Walker, Reimer, Tolias, Ecker, Sheth, Zhang, Wołczyk, Tabor, Maszke, Pogodin, Corneil, Gerstner, Lin, Cecchi, Reinen, Rish, Bellec, Salaj, Subramoney, Maass, Wang, Pakman, Lee, Paninski, Tripp, Graber, Schwing, Prince, Ocker, Buice, Lansdell, Kording, Lindsey, Sejnowski, Farrell, Shea-Brown, Farrugia, Nepveu, Im, Branson, Hu, Iyer, Mihalas, Aenugu, Hazan, Dai, Nguyen, Tsao, Baraniuk, Anandkumar, Tanaka, Nayebi, Baccus, Ganguli, Pospisil, Muller, Cheng, Varoquaux, Dadi, Gklezakos, Rao, Louis, Papadimitriou, Vempala, Yadati, Zdeblick, Witten, Roberts, Prabhu, Bellec, Ramesh, Macke, Cadena, Bellec, Scherr, Marschall, Kim, Rapp, Fonseca, Armitage, Im, Hardcastle, Sharma, Bair, Valente, Shang, Stern, Patil, Wang, Gorantla, Stratton, Edwards, Lu, Ester, Vlasov, Golkar
04:45 PM Invited Talk: Sensory prediction error signals in the neocortex Richards
05:15 PM Panel Session: A new hope for neuroscience Bengio, Richards, Lillicrap, Fiete, Sussillo, Precup, Kording, Ganguli

Abstracts (3):

Abstract 3: Invited Talk: Deep learning without weight transport in Real Neurons & Hidden Units: future directions at the intersection of neuroscience and AI, Lillicrap 09:00 AM

Recent advances in machine learning have been made possible by employing the backpropagation-of-error algorithm. Backprop enables the delivery of detailed error feedback across multiple layers of representation to adjust synaptic weights, allowing us to effectively train even very large networks. Whether or not the brain employs similar deep learning algorithms remains contentious; how it might do so remains a mystery. In particular, backprop uses the weights in the forward pass of the network to precisely compute error feedback in the backward pass. This way of computing errors across multiple layers is fundamentally at odds with what we know about the local computations of brains. We will describe new proposals for biologically motivated learning algorithms that are as effective as backpropagation without requiring weight transport.
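
One well-known proposal in this family is feedback alignment (Lillicrap et al.), which the sketch below illustrates: the backward pass delivers errors through a fixed random matrix B instead of the transpose of the forward weights, so no weight transport is needed. The toy regression target and all sizes are invented for illustration.

import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 4, 16, 2
W1 = rng.normal(scale=0.5, size=(n_hid, n_in))
W2 = rng.normal(scale=0.5, size=(n_out, n_hid))
B = rng.normal(scale=0.5, size=(n_hid, n_out))  # fixed random feedback weights

lr = 0.01
for step in range(2000):
    x = rng.normal(size=n_in)
    target = np.array([x[:2].sum(), x[2:].sum()])  # toy regression target
    h = np.tanh(W1 @ x)
    y = W2 @ h
    e = y - target                                 # output error
    dh = (B @ e) * (1 - h ** 2)                    # error routed via B, not W2.T
    W2 -= lr * np.outer(e, h)
    W1 -= lr * np.outer(dh, x)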

Abstract 6: Invited Talk: Computing and learning in the presence of neural noise in Real Neurons & Hidden Units: future directions at the intersection of neuroscience and AI, Savin 10:30 AM

One key distinction between artificial and biological neural networks is the presence of noise, both intrinsic, e.g. due to synaptic failures, and extrinsic, arising through complex recurrent dynamics. Traditionally, this noise has been viewed as a 'bug', and the main computational challenge that the brain needs to face. More recently, it has been argued that circuit stochasticity may be a 'feature', in that it can be recruited for useful computations, such as representing uncertainty about the state of the world. Here we lay out a new argument for the role of stochasticity during learning. In particular, we use a mathematically tractable stochastic neural network model that allows us to derive local plasticity rules for optimizing a given global objective. This rule leads to representations that reflect both task structure and stimuli priors in interesting ways. Moreover, in this framework stochasticity is both a feature, as learning cannot happen in the absence of noise, and a bug, as the noise corrupts neural representations. Importantly, the network learns to use recurrent interactions to compensate for its negative effects, and maintain robust circuit function.

Abstract 17: Invited Talk: Sensory prediction error signals in the neocortex in Real Neurons & Hidden Units: future directions at the intersection of neuroscience and AI, Richards 04:45 PM

Many models have postulated that the neocortex implements a hierarchical inference system, whereby each region sends predictions of the inputs it expects to lower-order regions, allowing the latter to learn from any prediction errors. The combining of top-down predictions with bottom-up sensory information to generate errors that can then be communicated across the hierarchy is critical to credit assignment in deep predictive learning algorithms. Indirect experimental evidence supporting a hierarchical prediction system in the neocortex comes from both human and animal work. However, direct evidence for top-down guided prediction errors in the neocortex that can be used for deep credit assignment during unsupervised learning remains limited. Here, we address this issue with 2-photon calcium imaging of layer 2/3 and layer 5 pyramidal neurons in the primary visual cortex of awake mice during passive exposure to visual stimuli where unexpected events occur. To assess the evidence for top-down guided prediction errors we recorded from both the somatic compartments and the apical dendrites in layer 1, where a large number of top-down inputs are received. We find evidence for a diversity of prediction error signals depending on both the stimulus type and cell type. These signals can be learnt in some cases, and in turn, they appear to drive some learning. This data will help us both to understand hierarchical inference in the neocortex and potentially to guide new unsupervised techniques for machine learning.

Fair ML in Healthcare

Shalmali Joshi, Irene Y Chen, Shems Saleh

East Ballroom B, Sat Dec 14, 08:00 AM

Clinical healthcare has been a natural application domain for ML, with a few modest success stories of practical deployment. Inequity and healthcare disparity have long been a concern in clinical and public health. However, the challenges of fair and equitable care using ML in health have largely remained unexplored. While a few works have attempted to highlight potential concerns and pitfalls in recent years, there are massive gaps in the academic ML literature in this context. The goal of this workshop is to investigate issues around fairness that are specific to ML based healthcare. We hope to investigate a myriad of questions via the workshop.

Schedule

09:00 AM Check-in and set up contributed posters
09:15 AM Opening Remarks
09:30 AM Keynote - Milind Tambe
10:00 AM Invited Talk - Ziad Obermeyer Obermeyer
10:30 AM Coffee Break and Poster Session Panda, Sattigeri, Varshney, Natesan Ramamurthy, Singh, Mhasawade, Joshi, Seyyed-Kalantari, McDermott, Yona, Atwood, Srinivasan, Halpern, Sculley, Babaki, Carvalho, Williams, Razavian, Zhang, Lu, Chen, Mao, Zhou, Kallus
11:00 AM Breakout Sessions
12:45 PM Lunch Break
02:00 PM Invited Talk - Sharad Goel
02:30 PM Invited Talk - Noa Dagan/Noam Barda Barda, Dagan
03:00 PM Invited Talk - Chelsea Barabas

03:30 PM Coffee Break and Poster Session
04:00 PM Discussion Panel - All invited speakers will be panelists
05:00 PM Spotlight Talks and Poster Session

Tackling Climate Change with ML

David Rolnick, Priya Donti, Lynn Kaack, Alexandre Lacoste, Tegan Maharaj, John Platt, Jennifer Chayes, Yoshua Bengio

East Ballroom C, Sat Dec 14, 08:00 AM

Climate change is one of the greatest problems society has ever faced, with increasingly severe consequences for humanity as natural disasters multiply, sea levels rise, and ecosystems falter. Since climate change is a complex issue, action takes many forms, from designing smart electric grids to tracking greenhouse gas emissions through satellite imagery. While no silver bullet, machine learning can be an invaluable tool in fighting climate change via a wide array of applications and techniques. These applications require algorithmic innovations in machine learning and close collaboration with diverse fields and practitioners. This workshop is intended as a forum for those in the machine learning community who wish to help tackle climate change.

Schedule

08:15 AM Welcome and Opening Remarks
08:30 AM Jeff Dean (Google AI) Dean
09:05 AM Spotlight talks
09:45 AM Coffee Break + Poster Session
10:30 AM Felix Creutzig (TU Berlin, MCC) Creutzig
11:05 AM Spotlight talks
11:15 AM Climate Change: A Grand Challenge for ML Bengio, Gomes, Ng, Dean, Mackey
12:00 PM Lunch + Poster Session Diehl, Cai, Hoedt, Kochanski, Kim, Lee, Park, Zhou, Gauch, Wilson, Chatterjee, Shrotriya, Papadimitriou, Schön, Zantedeschi, Baasch, Waegeman, Cosne, Farrell, Lucier, Mones, Robinson, Chitsiga, Kristof, Das, Min, Puchko, Luccioni, Story, Hickey, Hu, Lütjens, Wang, Jing, Flaspohler, Wang, Sinha, Tang, Tiihonen, Glatt, Komurcu, Drgona, Gomez-Romero, Kapoor, Fitzpatrick, Rezvanifar, Albert, Irzak, Lamb, Mahesh, Maeng, Kratzert, Friedler, Dalmasso, Robson, Malobola, Maystre, Lin, Mukkavili, Hutchinson, Lacoste, Wang, Wang, Zhang, Preston, Pettit, Vrabie, Molina-Solana, Buonassisi, Annex, Marques, Voss, Rausch, Evans
02:00 PM Carla Gomes (Cornell) Gomes
02:40 PM Spotlight talks
03:30 PM Coffee Break + Poster Session
04:15 PM Lester Mackey (Microsoft Research and Stanford) Mackey
04:40 PM Spotlight talks
05:00 PM Practical Challenges in Applying ML to Climate Change Chayes, Platt, Creutzig, Gonzalez, Miller

Joint Workshop on AI for Social Good

Fei Fang, Joseph Bullock, Marc-Antoine Dilhac, Brian Green, natalie saltiel, Dhaval Adjodah, Jack Clark, Sean McGregor, Margaux Luck, Jonnie Penn, Tristan Sylvain, Geneviève Boucher, Sydney Swaine-Simon, Girmaw Abebe Tadesse, Myriam Côté, Anna Bethke, Yoshua Bengio

East Meeting Rooms 11 + 12, Sat Dec 14, 08:00 AM

The accelerating pace of intelligent systems research and real world deployment presents three clear challenges for producing "good" intelligent systems: (1) the research community lacks incentives and venues for results centered on social impact, (2) deployed systems often produce unintended negative consequences, and (3) there is little consensus for public policy that maximizes "good" social impacts, while minimizing the likelihood of harm. As a result, researchers often find themselves without a clear path to positive real world impact.

The Workshop on AI for Social Good addresses these challenges by bringing together machine learning researchers, social impact leaders, ethicists, and public policy leaders to present their ideas and applications for maximizing the social good. This workshop is a collaboration of three formerly separate lines of research (i.e., this is a "joint" workshop), including researchers in applications-driven AI research, applied ethics, and AI policy. Each of these research areas is unified into a 3-track framework promoting the exchange of ideas between the practitioners of each track.

We hope that this gathering of research talent will inspire the creation of new approaches and tools, provide for the development of intelligent systems benefiting all stakeholders, and converge on public policy mechanisms for encouraging these goals.

Schedule

08:00 AM Opening remarks Bengio
08:05 AM Computational Sustainability: Computing for a Better World and a Sustainable Future Gomes
08:25 AM Translating AI Research into operational impact to achieve the Sustainable Development Goals Luengo-Oroz
08:45 AM Sacred Waveforms: An Indigenous Perspective on the Ethics of Collecting and Usage of Spiritual Data for Machine Learning Running Wolf, Running Wolf
09:15 AM Balancing Competing Objectives for Welfare-Aware Machine Learning with Imperfect Data Rolf
09:20 AM Dilated LSTM with ranked units for Classification of suicide note Schoene
09:25 AM Speech in Pixels: Automatic Detection of Offensive Memes for Moderation Giro-i-Nieto
09:30 AM Towards better healthcare: What could and should be automated? Fruehwirt
09:35 AM All Tracks Poster Session
09:45 AM Break / All Tracks Poster Session
10:30 AM Towards a Social Good? Theories of Change in AI saltiel, Richardson, Hamid
11:30 AM Hard Choices in AI Safety Dobbe, Mintz
11:35 AM The Effects of Competition and Regulation on Error Inequality in Data-driven Markets Elzayn
11:40 AM Learning Fair Classifiers in Online Stochastic Setting Sun
11:45 AM Fraud detection in telephone conversations for financial services using linguistic features
11:50 AM A Typology of AI Ethics Tools, Methods and Research to Translate Principles into Practices Kinsey
11:55 AM AI Ethics for Systemic Issues: A Structural Approach Schim van der Loeff
12:00 PM Lunch - on your own
02:00 PM ML system documentation for transparency and responsible AI development - a process and an artifact Yang
02:05 PM Beyond Principles and Policy Proposals: A framework for the agile governance of AI Wallach
02:30 PM Untangling AI Ethics: Working Toward a Root Issue Lin
02:45 PM AI in Healthcare: Working Towards Positive Clinical Impact Tomasev
03:00 PM Implementing Responsible AI Green, Wallach, Lin, Tomasev, Yang, Kinsey
03:30 PM Break / All Tracks Poster Session
04:15 PM "Good" isn't good enough Green
04:20 PM Automated Quality Control for a Weather Sensor Network Dietterich
04:50 PM AI and Sustainable Development Fang, Gomes, Luengo-Oroz, Dietterich, Cornebise
05:50 PM Open announcement and Best Papers/Posters Award

Abstracts (23):

Abstract 1: Opening remarks in Joint Workshop on AI for Social Good, Bengio 08:00 AM

Speaker bio:
Yoshua Bengio is Full Professor of the Department of Computer Science and Operations Research, scientific director of Mila, CIFAR Program co-director of the CIFAR Learning in Machines and Brains program (formerly Neural Computation and Adaptive Perception), scientific director of IVADO and Canada Research Chair in Statistical Learning Algorithms. His main research ambition is to understand principles of learning that yield intelligence. He supervises a large group of graduate students and post-docs. His research is widely cited (over 130000 citations found by Google Scholar in August 2018, with an H-index over 120, and rising fast).

Abstract 2: Computational Sustainability: Computing for a Better World and a Sustainable Future in Joint Workshop on AI for Social Good, Gomes 08:05 AM

Computational sustainability is a new interdisciplinary research field with the overarching goal of developing computational models, methods, and tools to help manage the balance of environmental, economic, and societal needs for a sustainable future. I will provide a short overview of computational sustainability, with examples ranging from wildlife conservation and biodiversity to evaluating the impacts of hydropower dam proliferation in the Amazon basin. Our research leverages the recent artificial intelligence (AI) advances in deep learning, reasoning, and decision making. I will highlight cross-cutting computational themes, how AI enriches sustainability sciences and, conversely, how sustainability questions enrich AI and computer science.

Speaker bio:
Carla Gomes is a Professor of Computer Science and the Director of the Institute for Computational Sustainability at Cornell University. Her research area is artificial intelligence with a focus on large-scale constraint-based reasoning, optimization and machine learning. She is noted for her pioneering work in developing computational methods to address challenges in sustainability.

Abstract 3: Translating AI Research into operational impact to achieve the Sustainable Development Goals in Joint Workshop on AI for Social Good, Luengo-Oroz 08:25 AM

In September 2015, Member States of the United Nations adopted the Sustainable Development Goals: a set of goals to end poverty, protect the planet and ensure prosperity for all as part of a new global agenda. To achieve the SDGs by 2030, governments, private sector, civil society and academia must work together. In this talk, I will present my journey working almost a decade at UN Global Pulse - an innovation initiative of the UN Secretary-General - researching and developing real applications of data innovation and AI for sustainable development, humanitarian action and peace. The work of the UN includes providing food and assistance to 80 million people, supplying vaccines to 45% of the world's children and assisting 65 million people fleeing war, famine and persecution. Examples of innovation projects include understanding perceptions on refugees from social data; mapping population movements in the aftermath of natural disasters; understanding recovery from shocks with financial transaction data; using satellite data to inform humanitarian operations in conflict zones; or monitoring public radio to give voice to citizens in unconnected areas. Based on these examples, the session will discuss operational realities and the global policy environment, as well as challenges and opportunities for the research community to ensure that its groundbreaking discoveries are used responsibly and can be translated into social impact for all.

Speaker bio:
Dr. Miguel Luengo-Oroz is the Chief Data Scientist at UN Global Pulse, an innovation initiative of the United Nations Secretary-General. He is the head of the data science teams across the network of Pulse labs in New York, Jakarta & Kampala. Over the last decade, Miguel has built and directed teams bringing data and AI to operations and policy through innovation projects with international organizations, govs, private sector & academia. He has worked in multiple domains including poverty, food security, refugees & migrants, conflict prevention, human rights, economic indicators, gender, hate speech and climate change.

Abstract 4: Sacred Waveforms: An Indigenous Perspective on the Ethics of Collecting and Usage of Spiritual Data for Machine Learning in Joint Workshop on AI for Social Good, Running Wolf, Running Wolf 08:45 AM

This talk is an introduction to the intersection of revitalizing sacred knowledge and exploitation of this data. For centuries Indigenous Peoples of the Americas have resisted the loss of their land, technology, and cultural knowledge. This resistance has been enabled by vibrant cultural protocols, unique to each tribal nation, which control the sharing of, and limit access to, sacred knowledge. Technology has made preserving cultural data easy, but there is a natural tension between reigniting ancient knowledge and mediums that allow uncontrollable exploitation of this data. Easy-to-access ML opens a new path toward creating new Indigenous technology, such as ASR, but creating AI using Indigenous heritage requires care.

Speakers bio:
Michael Running Wolf was raised in a rural village in Montana with intermittent water and electricity; naturally he now has a Masters of Science in Computer Science. Though he is a published poet, he is a computer nerd at heart. His lifelong goal is to pursue endangered indigenous language revitalization using Augmented Reality and Virtual Reality (AR/VR) technology. He was raised with a grandmother who only spoke his tribal language, Cheyenne, which like many other indigenous languages is near extinction. By leveraging his advanced Master's degree in Computer Science and his technical skills, Michael hopes to strengthen the ecology of thought represented by indigenous languages through immersive technology.

Caroline Running Wolf, nee Old Coyote, is an enrolled member of the Apsáalooke Nation (Crow) in Montana, with a Swabian (German) mother and also Pikuni, Oglala, and Ho-Chunk heritage. Thanks to her genuine interest in people and their stories she is a multilingual Cultural Acclimation Artist dedicated to supporting Indigenous language and culture vitality. Together with her husband, Michael Running Wolf, they create virtual and augmented reality experiences to advocate for Native American voices, languages and cultures. Caroline has a Master's degree in Native American Studies from Montana State University in Bozeman, Montana. She is currently pursuing her PhD in anthropology at the University of British Columbia in Vancouver, Canada.

Abstract 5: Balancing Competing Objectives for Welfare-Aware Machine Learning with Imperfect Data in Joint Workshop on AI for Social Good, Rolf 09:15 AM

From financial loans and humanitarian aid, to medical diagnosis and criminal justice, consequential decisions in society increasingly rely on machine learning. In most cases, the machine learning algorithms used in these contexts are trained to optimize a single metric of performance; however, most real-world decisions exist in a multi-objective setting that requires the balance of multiple incentives and outcomes. To this end, we develop a methodology for optimizing multi-objective decisions. Building on the traditional notion of Pareto optimality, we focus on understanding how to balance multiple objectives when those objectives are measured noisily or not directly observed. We believe this regime of imperfect information is far more common in real-world decisions, where one cannot easily measure the social consequences of an algorithmic decision. To show how the multi-objective framework can be used in practice, we present results using data from roughly 40,000 videos promoted by YouTube's recommendation algorithm. This illustrates the empirical trade-off between maximizing user engagement and promoting high-quality videos. We show that multi-objective optimization could produce substantial increases in average video quality at the expense of almost negligible reductions in user engagement.

Speaker bio:
Esther Rolf is a 4th year Ph.D. student in the Computer Science department at the University of California, Berkeley, advised by Benjamin Recht and Michael I. Jordan. She is an NSF Graduate Research Fellow and is a fellow in the Global Policy Lab in the Goldman School of Public Policy at UC Berkeley. Esther's research targets machine learning algorithms that interact with society. Her current focus lies in two main domains: the field of algorithmic fairness, which aims to design and audit black-box decision algorithms to ensure equity and benefit for all individuals, and in machine learning for environmental monitoring, where abundant sources of temporally recurrent data provide an exciting opportunity to make inferences and predictions about our planet.
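
As a rough illustration of the Pareto framing in the abstract above, the sketch below keeps the empirically non-dominated candidates among noisy two-objective measurements (say, engagement and quality). The synthetic data and plain sample-based comparison are invented for illustration; the paper's treatment of noisy and unobserved objectives is far more careful.

import numpy as np

rng = np.random.default_rng(0)
true_eng = rng.random(50)
true_qual = 1 - true_eng + 0.2 * rng.random(50)     # objectives in tension
noisy = np.stack([true_eng, true_qual], 1) + 0.05 * rng.normal(size=(50, 2))

def pareto_front(points):
    """Indices of points not dominated in both (higher-is-better) objectives."""
    idx = []
    for i, p in enumerate(points):
        dominated = ((points >= p).all(1) & (points > p).any(1)).any()
        if not dominated:
            idx.append(i)
    return idx

frontier = pareto_front(noisy)   # candidate trade-off policies to choose among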

Abstract 6: Dilated LSTM with ranked units for Classification of suicide note in Joint Workshop on AI for Social Good, Schoene 09:20 AM

Recent statistics in suicide prevention show that people are increasingly posting their last words online, and with the unprecedented availability of textual data from social media platforms, researchers have the opportunity to analyse such data. This work focuses on identifying suicide notes from other types of text in a document-level classification task, using a hierarchical recurrent neural network to uncover linguistic patterns in the data.

Speaker bio:
Annika Marie Schoene is a third-year PhD candidate in Natural Language Processing at the University of Hull and is affiliated with IBM Research UK. The main focus of her work lies in investigating recurrent neural networks for fine-grained emotion detection in social media data. She also has an interest in mental health issues on social media, where she looks at how to identify suicidal ideation in textual data.
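For readers unfamiliar with the hierarchical setup mentioned in Abstract 6, the sketch below shows only the generic pattern (a word-level LSTM feeding a sentence-level LSTM); the paper's dilation and unit-ranking mechanisms are not reproduced, and all sizes and data are arbitrary placeholders.

    import torch
    import torch.nn as nn

    class HierarchicalClassifier(nn.Module):
        """Generic word-level -> sentence-level LSTM document classifier.
        A plain hierarchical RNN; the dilated/ranked-unit details from the
        talk's title are not modeled here."""
        def __init__(self, vocab=10_000, emb=64, hid=128, classes=2):
            super().__init__()
            self.embed = nn.Embedding(vocab, emb)
            self.word_lstm = nn.LSTM(emb, hid, batch_first=True)
            self.sent_lstm = nn.LSTM(hid, hid, batch_first=True)
            self.out = nn.Linear(hid, classes)

        def forward(self, docs):
            # docs: (batch, sentences, words) of token ids
            b, s, w = docs.shape
            x = self.embed(docs.view(b * s, w))
            _, (h, _) = self.word_lstm(x)           # h: (1, b*s, hid)
            sents = h.squeeze(0).view(b, s, -1)     # one vector per sentence
            _, (h, _) = self.sent_lstm(sents)
            return self.out(h.squeeze(0))           # document-level logits

    logits = HierarchicalClassifier()(torch.randint(0, 10_000, (4, 6, 12)))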

Abstract 7: Speech in Pixels: Automatic Detection of Offensive Memes for Moderation in Joint Workshop on AI for Social Good, Giro-i-Nieto 09:25 AM

This work addresses the challenge of hate speech detection in Internet memes, and attempts to use visual information to automatically detect hate speech, unlike previous works that have focused on language.

Speaker bio:
Xavier Giro-i-Nieto is an associate professor at the Universitat Politecnica de Catalunya (UPC) in Barcelona and a visiting researcher at the Barcelona Supercomputing Center (BSC). He obtained his doctoral degree from UPC in 2012 under the supervision of Prof. Ferran Marques (UPC) and Prof. Shih-Fu Chang (Columbia University). His research interests focus on deep learning applied to multimedia and reinforcement learning.

Abstract 8: Towards better healthcare: What could and should be automated? in Joint Workshop on AI for Social Good, Fruehwirt 09:30 AM

While artificial intelligence (AI) and other automation technologies might lead to enormous progress in healthcare, they may also have undesired consequences for people working in the field. In this interdisciplinary study, we capture empirical evidence of not only what healthcare work could be automated, but also what should be automated. We quantitatively investigate these research questions by utilizing probabilistic machine learning models trained on thousands of ratings, provided by both healthcare practitioners and automation experts. Based on our findings, we present an analytical tool (Automatability-Desirability Matrix) to support policymakers and organizational leaders in developing practical strategies on how to harness the positive power of automation technologies, while accompanying change and empowering stakeholders in a participatory fashion.

Speaker bio:
Wolfgang Frühwirt is an Associate Member of the Oxford-Man Institute (University of Oxford, Engineering Department), where he works with the Machine Learning Research Group.

Abstract 11: Towards a Social Good? Theories of Change in AI in Joint Workshop on AI for Social Good, saltiel, Richardson, Hamid 10:30 AM

Considerable hope and energy is put into AI -- and its critique -- under the assumption that the field will make the world a "better" place by maximizing social good. Will it? For whom? At what time scale? Most importantly, who defines "social good"?

This panel invites dissent. Leading voices will pick apart these questions by sharing their own competing theories of social change in relation to AI. In addition to answering audience questions, they will share how they decide on trade-offs between pragmatism and principles, and how they resist elements of AI research that are known to be dangerous and/or socially degenerative, particularly in relation to surveillance, criminal justice and privacy.

We undertake a probing and genuine conversation around these questions. Audience members are invited to submit questions at: https://app.sli.do/event/yndumbf3/live/questions.

Facilitators bio:
Dhaval Adjodah is a research scientist at the MIT Media Lab. His research investigates the current limitations of generalization in machine learning, as well as how to move beyond them by leveraging the social cognitive adaptations humans evolved to collaborate effectively at scale. Beyond pushing the limits of modern machine learning, he is also interested in improving institutions by using online human experiments to better understand the cognitive limits and biases that affect everyday individual economic, political, and social decisions. During his PhD, Dhaval was an intern in Prof. Yoshua Bengio's group at MILA, a member of the Harvard Berkman Assembly on Ethics and Governance in Artificial Intelligence, and a fellow at the Dalai Lama Center For Ethics And Transformative Values. He has a B.S. in Physics from MIT, and an M.S. in Technology Policy from the MIT Institute for Data, Systems, and Society.

Natalie Saltiel (MIT)

Speakers bio:
Dr. Prem Natarajan is a Vice President in Amazon's Alexa unit where he leads the Natural Understanding (NU) organization within Alexa AI. NU is a multidisciplinary science and engineering organization that develops, deploys, and maintains state-of-the-art conversational AI technologies including natural language understanding, intelligent dialog systems, entity linking and resolution, and associated worldwide runtime operations. Dr. Natarajan joined Amazon from the University of Southern California (USC), where he was Senior Vice Dean of Engineering in the Viterbi School of Engineering, Executive Director of the Information Sciences Institute (a 300-person R&D organization), and Research Professor of computer science with distinction. Prior to that, as Executive VP at Raytheon BBN Technologies, he led the speech, language, and multimedia business unit, which included research and development operations, and commercial products for real-time multimedia monitoring, document analysis, and information extraction. During his tenure at USC and at BBN, Dr. Natarajan directed R&D efforts in speech recognition, natural language processing, computer vision, and other applications of machine learning. While at USC, he directly led nationally influential DARPA and IARPA sponsored research efforts in biometrics/face recognition, OCR, NLP, media forensics, and forecasting. Most recently, he helped to launch the Fairness in AI (FAI) program – a collaborative effort between NSF and Amazon for funding fairness focused research efforts in US Universities.

Rashida Richardson: As Director of Policy Research, Rashida designs, implements, and coordinates AI Now's research strategy and initiatives on the topics of law, policy, and civil rights. Rashida joins AI Now after working as Legislative Counsel at the American Civil Liberties Union of New York (NYCLU), where she led the organization's work on privacy, technology, surveillance, and education issues. Prior to the NYCLU, she was a staff attorney at the Center for HIV Law and Policy, where she worked on a wide range of HIV-related legal and policy issues nationally, and she previously worked at Facebook Inc. and HIP Investor in San Francisco. Rashida currently serves on the Board of Trustees of Wesleyan University, the Advisory Board of the Civil Rights and Restorative Justice Project, the Board of Directors of the College & Community Fellowship, and she is an affiliate and Advisory Board member of the Center for Critical Race + Digital Studies. She received her BA with honors in the College of Social Studies at Wesleyan University and her JD from Northeastern University School of Law.

Sarah T. Hamid is an abolitionist and organizer in Southern California, working to build community defense against carceral technologies. She's built and worked on campaigns against predictive policing, risk assessment technologies, public/private surveillance partnerships, electronic monitoring, and automated border screening. In March 2019, she co-founded the Prison Tech Research Group (PTR-Grp), a coalition of abolitionists working on the intersection of technology/innovation and the prison-industrial complex. PTR-Grp focuses on private-public research partnerships deployed under the guise of prison reform, which stage the prison as a site for technological innovation and low-cost testing. The project centers the needs and safety of incarcerated and directly impacted people who face the violently expropriative data science industry with few safety nets. Sarah also facilitates the monthly convening of the Community Defense Syllabus, during which activists of color from all over the country work to theorize the intersection of race and carceral computing. In 2020, she will lead the launch and roll-out of the Carceral Tech Resistance Network, a community archive and knowledge-sharing project that seeks to amplify the capacity of community organizations to resist the encroachment and experimentation of harmful technologies.

Abstract 12: Hard Choices in AI Safety in Joint Workshop on AI for Social Good, Dobbe, Gilbert, Mintz 11:30 AM

As AI systems become prevalent in high stakes domains such as surveillance and healthcare, researchers now examine how to design and implement them in a safe manner. However, the potential harms caused by systems to stakeholders in complex social contexts, and how to address these, remain unclear. In this paper, we explain the inherent normative uncertainty in debates about the safety of AI systems. We then address this as a problem of vagueness by examining its place in the design, training, and deployment stages of AI system development. We adopt Ruth Chang's theory of intuitive comparability to illustrate the dilemmas that manifest at each stage. We then discuss how stakeholders can navigate these dilemmas by incorporating distinct forms of dissent into the development pipeline, drawing on Elizabeth Anderson's work on the epistemic powers of democratic institutions. We outline a framework of sociotechnical commitments to formal, substantive and discursive challenges that address normative uncertainty across stakeholders, and propose the cultivation of related virtues by those responsible for development.

Speakers bio:
Roel Dobbe's research addresses the development, analysis, integration and governance of data-driven systems. His PhD work combined optimization, machine learning and control theory to enable monitoring and control of safety-critical systems, including energy & power systems and cancer diagnosis and treatment. In addition to research, Roel has experience in industry and public institutions, where he has served as a management consultant for AT Kearney, a data scientist for C3 IoT, and a researcher for the National ThinkTank in The Netherlands. His diverse experiences led him to examine the ways in which values and stakeholder perspectives are represented in the process of designing and deploying AI and algorithmic decision-making and control systems. Roel founded Graduates for Engaged and Extended Scholarship around Computing & Engineering (GEESE), a student organization stimulating graduate students across all disciplines studying or developing technologies to take a broader lens at their field of study and engage across disciplines. Roel has published his work in various journals and conferences, including Automatica, the IEEE Conference on Decision and Control, the IEEE Power & Energy Society General Meeting, IEEE/ACM Transactions on Computational Biology and Bioinformatics, and NeurIPS.

Thomas Krendl Gilbert is an interdisciplinary Ph.D. candidate in Machine Ethics and Epistemology at UC Berkeley. With prior training in philosophy, sociology, and political theory, Tom researches the various technical and organizational predicaments that emerge when machine learning alters the context of expert decision-making. In particular, he is interested in how different algorithmic learning procedures (e.g. reinforcement learning) reframe classic ethical questions, such as the problem of aggregating human values and interests. In his free time he enjoys sailing and creative writing.

Yonatan Mintz is a Postdoctoral Research Fellow at the H. Milton Stewart School of Industrial and Systems Engineering at the Georgia Institute of Technology; previously he completed his PhD at the department of Industrial Engineering and Operations Research at the University of California, Berkeley. His research interests focus on human sensitive decision making, and in particular the application of machine learning and optimization methodology for personalized healthcare and fair and accountable decision making. Yonatan's work has been published in many journals and conferences across the machine learning, operations research, and medical fields.

Abstract 13: The Effects of Competition and Regulation on Error Inequality in Data-driven Markets in Joint Workshop on AI for Social Good, Elzayn 11:35 AM

Much work has documented instances of unfairness in deployed machine learning models, and significant effort has been dedicated to creating algorithms that take into account issues of fairness. Our work highlights an important but understudied source of unfairness: market forces that drive differing amounts of firm investment in data across populations. We develop a high-level framework, based on insights from learning theory and industrial organization, to study this phenomenon. In a simple model of this type of data-driven market, we first show that a monopoly will invest unequally in the groups. There are two natural avenues for preventing this disparate impact: promoting competition and regulating firm behavior. We show first that competition, under many natural models, does not eliminate incentives to invest unequally, and can even exacerbate it. We then consider two avenues for regulating the monopoly - requiring the monopoly to ensure that each group's error rates are low, or forcing each group's error rates to be similar to each other - and quantify the price of fairness (and who pays it). These models imply that mitigating fairness concerns may require policy-driven solutions, and not only technological ones.

Speaker bio:
Hadi Elzayn is a 4th year PhD Candidate in Applied Math and Computational Science at the University of Pennsylvania, advised by Michael Kearns. He is interested in the intersection of computer science and economics, and in the particular topics of how algorithmic learning interacts with social concerns like fairness, privacy, and markets (and how to design algorithms respecting those concerns). He received his BA from Columbia University in Mathematics and Economics. He has interned at Microsoft Research, and previously worked at the consulting firm TGG.

Abstract 14: Learning Fair Classifiers in Online Stochastic Setting in Joint Workshop on AI for Social Good, Sun 11:40 AM

One thing that differentiates policy-driven machine learning is that new public policies are often implemented in a trial-and-error fashion, as data might not be available upfront. In this work, we try to accomplish approximate group fairness in an online decision-making process where examples are sampled i.i.d. from an underlying distribution. Our work follows from the classical learning-from-experts scheme, extending the multiplicative weights algorithm by keeping separate weights for label classes as well as groups. Although accuracy and fairness are often conflicting goals, we try to mitigate the trade-offs using an optimization step, and demonstrate the performance on a real data set.

Speaker bio:
Yi (Alicia) Sun is a PhD Candidate in the Institute for Data, Systems and Society at MIT. Her research interests are designing algorithms that are robust and reliable, as well as aligned with societal values.
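The extension in Abstract 14 builds on the standard multiplicative weights update. The toy below shows only that base pattern with a separate weight vector per group; the losses and learning rate are placeholders, and the paper's fairness-aware optimization step is omitted.

    import numpy as np

    rng = np.random.default_rng(0)
    n_experts, n_groups, eta = 5, 2, 0.1

    # One weight vector per group, a stand-in for the abstract's idea of
    # keeping separate weights for label classes as well as groups.
    weights = np.ones((n_groups, n_experts))
    total_loss = 0.0

    for t in range(1000):
        g = rng.integers(n_groups)              # group of the arriving example
        p = weights[g] / weights[g].sum()       # expert distribution for group g
        losses = rng.uniform(0, 1, n_experts)   # placeholder expert losses
        total_loss += p @ losses
        weights[g] *= np.exp(-eta * losses)     # multiplicative weights update

    print("average loss:", total_loss / 1000)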

Abstract 15: Fraud detection in telephone conversations for financial services using linguistic features in Joint Workshop on AI for Social Good, 11:45 AM

In collaboration with linguistics experts and expert interrogators, we present an approach for fraud detection in transcribed telephone conversations. The proposed approach exploits the syntactic and semantic information of the transcription to extract both the linguistic markers and the sentiment of the customer's response. The results of the proposed approach are demonstrated with real-world financial services data using efficient, robust and explainable classifiers such as Naive Bayes, Decision Tree, Nearest Neighbours, and Support Vector Machines.

Speaker bio:
Nikesh Bajaj is a Postdoctoral Research Fellow at the University of East London, working on the Innovate UK funded project - Automation and Transparency across Financial and Legal Services - in collaboration with Intelligent Voice Ltd. and Strenuus Ltd. The project includes working with machine learning researchers, data scientists, linguistics experts and expert interrogators to model human behaviour for deception detection. He completed his PhD from Queen Mary University of London in a joint program with the University of Genova. His PhD work focused on predictive analysis of auditory attention using physiological signals (e.g. EEG, PPG, GSR). In addition to research, Nikesh has 5+ years of teaching experience. His research interests focus on signal processing, machine learning, deep learning, and optimization.
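Abstract 15 name-checks four standard classifier families. A minimal, self-contained comparison on invented placeholder transcripts might look as follows; the actual system's linguistic-marker and sentiment features are not shown.

    from sklearn.pipeline import make_pipeline
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import LinearSVC

    # Toy transcripts and labels, invented here; real data would be
    # transcribed call-centre conversations with expert annotations.
    texts = ["i already gave you my account number yesterday",
             "please verify the card ending four two one five",
             "i do not remember opening that account",
             "the billing address matches my statement"]
    labels = [1, 0, 1, 0]   # 1 = flagged as potentially fraudulent (toy)

    for clf in [MultinomialNB(), DecisionTreeClassifier(),
                KNeighborsClassifier(n_neighbors=3), LinearSVC()]:
        model = make_pipeline(TfidfVectorizer(), clf)
        model.fit(texts, labels)
        print(type(clf).__name__, model.predict(["i never opened that account"]))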
Having ethics’ at each stage of the AI development studied the intersection of social, economic and pipeline, and to signal to researchers where political aspects of development, I am interested further work is needed. in how dilemmas around AI refect wider debates on power relations in society and I want to Speaker bio: explore how AI could be a vehicle for Libby Kinsey: Libby is lead technologist for AI at transformative social change. I am particularly Digital Catapult, the UK's advanced digital passionate about climate justice, which I have technology innovation centre, where she works engaged with academically and through with a multi-disciplinary team to support campaigning. organisations in building their AI capabilities responsibly. She spent her early career in 75 Dec. 14, 2019

Abstract 19: ML system documentation for transparency and responsible AI development - a process and an artifact in Joint Workshop on AI for Social Good, Yang 02:00 PM

One large-scale multistakeholder effort to implement the values of the Montreal Declaration as well as other AI ethical principles is ABOUT ML, a recently-launched project led by the Partnership on AI to synthesize and advance the existing research by bringing PAI's Partner community and beyond into a public conversation, and to catalyze building a set of resources that allow more organizations to experiment with pilots. Eventually ABOUT ML aims to surface research-driven best practices and aid with translating those into new industry norms. This talk will be an overview of the work to date and ways to get involved moving forward.

Speaker bio:
Jingying Yang is a Program Lead on the Research team at the Partnership on AI, where she leads a portfolio of collaborative multistakeholder projects on the topics of safety, fairness, transparency, and accountability, including the ABOUT ML project to set new industry norms on ML documentation. Previously, she worked in Product Operations at Lyft, for the state of Massachusetts on health care policy, and in management consulting at Bain & Company.

Abstract 20: Beyond Principles and Policy Proposals: A framework for the agile governance of AI in Joint Workshop on AI for Social Good, Wallach 02:05 PM

The mismatch between the speed at which innovative technologies are deployed and the slow traditional implementation of ethical/legal oversight requires creative, agile, multi-stakeholder, and cooperative approaches to governance. Agile governance must go beyond hard law and regulations to accommodate soft law, corporate self-governance, and technological solutions to challenges. This presentation will summarize the concepts, insights, and creative approaches to AI oversight that have led to the 1st International Congress for the Governance of AI, which will convene in Prague on April 16-18, 2020.

Speaker bio:
Wendell Wallach is an internationally recognized expert on the ethical and governance concerns posed by emerging technologies, particularly artificial intelligence and neuroscience. He is a consultant, an ethicist, and a scholar at Yale University's Interdisciplinary Center for Bioethics, where he chairs the working research group on technology and ethics. He is co-author (with Colin Allen) of Moral Machines: Teaching Robots Right from Wrong, which maps the new field variously called machine ethics, machine morality, computational morality, and friendly AI. His latest book is A Dangerous Master: How to Keep Technology from Slipping Beyond Our Control. Wallach is the principal investigator of a Hastings Center project on the control and responsible innovation in the development of autonomous machines.

Abstract 21: Untangling AI Ethics: Working Toward a Root Issue in Joint Workshop on AI for Social Good, Lin 02:30 PM

Given myriad issues in AI ethics as well as many competing frameworks/declarations, it may be useful to step back to see if we can find a root or common issue, which may help to suggest a broad solution to the complex problem. This involves returning to first principles: what is the nature of AI? I will suggest that AI is the power of increasing omniscience, which is not only generally disruptive to society but also a threat to our autonomy. A broad solution, then, is to aim at restoring that autonomy.

Speaker bio:
Patrick Lin is the director of the Ethics + Emerging Sciences Group, based at California Polytechnic State University, San Luis Obispo, where he is also a philosophy professor. He has published several books and papers in the field of technology ethics, especially with respect to robotics—including Robot Ethics (MIT Press, 2012) and Robot Ethics 2.0 (Oxford University Press, 2017)—human enhancement, cyberwarfare, space exploration, nanotechnology, and other areas.

Abstract 22: AI in Healthcare: Working Towards Positive Clinical Impact in Joint Workshop on AI for Social Good, Tomasev 02:45 PM

Artificial intelligence (AI) applications in healthcare hold great promise, aiming to empower clinicians to diagnose and treat medical conditions earlier and more effectively. To ensure that AI solutions deliver on this promise, it is important to approach the design of prototype solutions with clinical applicability in mind, envisioning how they might fit within existing clinical workflows. Here we provide a brief overview of how we are incorporating this thinking in our research projects, while highlighting challenges that lie ahead.

Speaker bio:
Nenad Tomasev: My research interests lie at the intersection of theory and impactful real-world AI applications, with a particular focus on AI in healthcare, which I have been pursuing at DeepMind since early 2016. In our most recent work, published in Nature in July 2019, we demonstrate how deep learning can be used for accurate early predictions of patient deterioration from electronic health records, and alerting that opens possibilities for timely interventions and preventative care. Prior to moving to London, I had been involved with other applied projects at Google, such as Email Intelligence and the Chrome Data team. I obtained my PhD in 2013 from the Artificial Intelligence Laboratory at JSI in Slovenia, where I was working on better understanding the consequences of the curse of dimensionality in instance-based learning in many dimensions.

Abstract 23: Implementing Responsible AI in Joint Workshop on AI for Social Good, Green, Wallach, Lin, Tomasev, Yang, Kinsey 03:00 PM

This panel will discuss what might be some practical solutions for encouraging and implementing responsible AI. There will be time for audience Q&A.

Audience members are invited to submit questions at https://app.sli.do/event/kfdhmkbd/live/questions

Facilitator:
Brian Patrick Green is Director of Technology Ethics at the Markkula Center for Applied Ethics at Santa Clara University. His interests include AI and ethics, the ethics of space exploration and use, the ethics of technological manipulation of humans, the ethics of catastrophic risk, and the intersection of human society and technology, including religion and technology. Green teaches AI ethics in the Graduate School of Engineering and is co-author of the Ethics in Technology Practice corporate technology ethics resources.

Abstract 25: "Good" isn't good enough in Joint Workshop on AI for Social Good, Green 04:15 PM

Despite widespread enthusiasm among computer scientists to contribute to "social good," the field's efforts to promote good lack a rigorous foundation in politics or social change. There is limited discourse regarding what "good" actually entails, and instead a reliance on vague notions of what aspects of society are good or bad. Moreover, the field rarely considers the types of social change that result from algorithmic interventions, instead following a "greedy algorithm" approach of pursuing technology-centric incremental reform at all points. In order to reason well about doing good, computer scientists must adopt language and practices to reason reflexively about political commitments, a praxis that considers the long-term impacts of technological interventions, and an interdisciplinary focus that no longer prioritizes technical considerations as superior to other forms of knowledge.

Speaker bio:
Ben Green is a PhD Candidate in Applied Math at Harvard, an Affiliate at the Berkman Klein Center for Internet & Society at Harvard, and a Research Fellow at the AI Now Institute at NYU. He studies the social and policy impacts of data science, with a focus on algorithmic fairness, municipal governments, and the criminal justice system. His book, The Smart Enough City: Putting Technology in Its Place to Reclaim Our Urban Future, was published in 2019 by MIT Press.

Abstract 26: Automated Quality Control for a Weather Sensor Network in Joint Workshop on AI for Social Good, Dietterich 04:20 PM

TAHMO (the Trans-African Hydro-Meteorological Observatory) is a growing network of more than 500 automated weather stations. The eventual goal is to operate 20,000 stations covering all of sub-Saharan Africa and providing ground truth for weather and climate models. Because sensors fail and go out of calibration, some form of quality control is needed to detect bad values and determine when a technician needs to visit a station. We are deploying a three-layer architecture that consists of (a) fitted anomaly detection models, (b) probabilistic diagnosis of broken sensors, and (c) spatial statistics to detect extreme weather events (that may exonerate flagged sensors).

Speaker bio:
Dr. Dietterich is Distinguished Emeritus Professor of computer science at Oregon State University and currently pursues interdisciplinary research at the boundary of computer science, ecology, and sustainability policy.
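A drastically simplified sketch of layer (a) of Abstract 26's three-layer architecture, with a comment pointing at the layer (c) spatial cross-check. The data, the baseline "fitted model", and the threshold are all invented stand-ins, not TAHMO's actual pipeline.

    import numpy as np

    rng = np.random.default_rng(0)

    # Invented stand-ins for real station data: a smooth temperature signal,
    # one corrupted reading, and a neighbour-based baseline.
    t = np.linspace(0, 20, 500)
    neighbours = 24 + 3 * np.sin(t)               # baseline from nearby stations
    station = neighbours + rng.normal(0, 0.5, 500)
    station[250] = 55.0                           # simulated sensor failure

    # Layer (a), crudely: flag readings whose residual against the fitted
    # baseline is extreme (the baseline stands in for a fitted model here).
    residual = station - neighbours
    z = (residual - residual.mean()) / residual.std()
    print("flagged:", np.where(np.abs(z) > 4)[0])

    # Layer (c) idea: before declaring a sensor broken, spatial statistics can
    # check whether neighbouring stations saw the same extreme (a real event).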
Abstract 27: AI and Sustainable Development in Joint Workshop on AI for Social Good, Fang, Gomes, Luengo-Oroz, Dietterich, Cornebise 04:50 PM

The focus of this panel is the use of AI for Sustainable Development; it will explore the many opportunities this technology presents to improve lives around the world, as well as address the challenges and barriers to its applications. While there is much outstanding work being done to apply AI to such situations, too often this research is not deployed, and there is a disconnect between the research and industry communities and the public sector actors. With leading researchers and practitioners from across the academic, public, UN and private sectors, this panel brings a diversity of experience to address these important issues.

Audience members are invited to submit questions at: https://app.sli.do/event/skexhgej/live/questions

Facilitator: I am an Assistant Professor in the Institute for Software Research in the School of Computer Science at Carnegie Mellon University.

Speaker bio:
Carla Gomes is a Professor of Computer Science and the Director of the Institute for Computational Sustainability at Cornell University. Her research area is artificial intelligence with a focus on large-scale constraint-based reasoning, optimization and machine learning. She is noted for her pioneering work in developing computational methods to address challenges in sustainability.

Dr. Miguel Luengo-Oroz is the Chief Data Scientist at UN Global Pulse, an innovation initiative of the United Nations Secretary-General. He is the head of the data science teams across the network of Pulse labs in New York, Jakarta & Kampala. Over the last decade, Miguel has built and directed teams bringing data and AI to operations and policy through innovation projects with international organizations, governments, the private sector and academia. He has worked in multiple domains including poverty, food security, refugees & migrants, conflict prevention, human rights, economic indicators, gender, hate speech and climate change.

Thomas G. Dietterich: Dr. Dietterich is Distinguished Emeritus Professor of computer science at Oregon State University and currently pursues interdisciplinary research at the boundary of computer science, ecology, and sustainability policy.

Julien Cornebise is an Honorary Associate Professor at University College London. He focuses on putting Machine Learning firmly into the hands of nonprofits, certain parts of certain governments, NGOs, and UN agencies: those who actually work on tackling our societies' biggest problems. He built and, until recently, was a Director of Research of Element AI's AI for Good team and head of its London Office. Prior to this, Julien was at DeepMind (later acquired by Google) as an early employee, where he led several fundamental research projects used in early demos and fundraising, then co-created its Health Research team. Since leaving DeepMind in 2016, he has been working with Amnesty International, Human Rights Watch, and other actors. Julien holds an MSc in Computer Engineering, an MSc in Mathematical Statistics, and a PhD in Mathematics, specialized in Computational Statistics, from University Paris VI Pierre and Marie Curie and Telecom ParisTech. He received the 2010 Savage Award in Theory and Methods from the International Society for Bayesian Analysis for his PhD work.

Machine Learning for Autonomous Driving

Rowan McAllister, Nick Rhinehart, Fisher Yu, Li Erran Li, Anca Dragan

East Meeting Rooms 1 - 3, Sat Dec 14, 08:00 AM

Autonomous vehicles (AVs) provide a rich source of high-impact research problems for the machine learning (ML) community at NeurIPS in diverse fields including computer vision, probabilistic modeling, gesture recognition, pedestrian and vehicle forecasting, human-machine interaction, and multi-agent planning. The common goal of autonomous driving can catalyze discussion between these subfields, generating a cross-pollination of research ideas. Beyond the benefits to the research community, AV research can improve society by reducing road accidents; giving independence to those unable to drive; and inspiring younger generations towards ML with tangible examples of ML-based technology clearly visible on local streets.

As many NeurIPS attendees are key drivers behind AV-applied ML, the proposed NeurIPS 2019 Workshop on Autonomous Driving intends to bring researchers together from both academia and industry to discuss machine learning applications in autonomous driving. Our proposal includes regular paper presentations, invited speakers, and technical benchmark challenges to present the current state of the art, as well as the limitations and future directions for autonomous driving.

Schedule

08:50 AM Welcome McAllister, Rhinehart, Li
09:00 AM Invited Talk Koltun
09:30 AM Coffee + Posters Chen, Gählert, Leurent, Lehner, Bhattacharyya, Behl, Lim, Kim, Novosel, Osiński, Das, Shen, Hawke, Sicking, Shahian Jahromi, Tulabandhula, Michaelis, Rusak, BAO, Rashed, Chen, Ansari, Cha, Zahran, Reda, Kim, Dohyun, Suk, Jhung, Kister, Fahrland, Jakubowski, Miłoś, Mercat, Arsenali, Homoceanu, Liu, El Sallab, Sobh, Arnab, Galias
10:30 AM Towards Robust Interactive Autonomy Gilitschenski
11:00 AM Human-inspired AI for autonomous driving Baker
11:30 AM ArgoAI Challenge Chang, Singh, Hartnett, Cebron, Luo, Huang
12:00 PM Lunch
01:30 PM Invited Talk Urtasun
02:00 PM DiDi Challenge
02:30 PM Invited Talk Wolf
03:00 PM Patch Refinement - Localized 3D Object Detection Lehner
03:15 PM Conditional Flow Variational Autoencoders for Structured Sequence Prediction Bhattacharyya
03:30 PM Coffee + Posters Caine, Wang, Sakib, Otawara, Kaushik, amirloo, Djuric, Rock, Agarwal, Filos, Tigkas, Lee, Jeon, Jaipuria, Wang, Zhao, Zhang, Singh, Banijamali, Rohani, Sinha, Joshi, Chan, Abdou, Chen, Kim, mohamed, OKelly, Singhania, Tsukahara, Keyaki, Palanisamy, Norden, Marchetti-Bowick, Gu, Arora, Deshpande, Schneider, Jui, Aggarwal, Gangopadhyay, Yan
04:30 PM Meta Learning Deep Visual Words for Fast Video Object Segmentation Behl
04:45 PM Urban Driving With Conditional Imitation Learning Reda
05:00 PM Safety and Interaction: the Game Theory of Autonomous Vehicles Fernández Fisac
05:30 PM Mixed Autonomy Traffic: A Reinforcement Learning Perspective Wu

Privacy in Machine Learning (PriML)

Borja Balle, Kamalika Chaudhuri, Antti Honkela, Antti Koskela, Casey Meehan, Mi Jung Park, Mary Anne Smart, Adrian Weller

East Meeting Rooms 8 + 15, Sat Dec 14, 08:00 AM

The goal of our workshop is to bring together privacy experts working in academia and industry to discuss the present and the future of privacy-aware technologies powered by machine learning. The workshop will focus on the technical aspects of privacy research and deployment, with invited and contributed talks by distinguished researchers in the area. The programme of the workshop will emphasize the diversity of points of view on the problem of privacy. We will also ensure there is ample time for discussions that encourage networking between researchers, which should result in mutually beneficial new long-term collaborations.

Schedule

08:10 AM Opening
08:15 AM Privacy for Federated Learning, and Federated Learning for Privacy McMahan
09:05 AM Gaussian Differential Privacy Dong, Roth
09:25 AM QUOTIENT: Two-Party Secure Neural Network Training & Prediction Agrawal, Kusner, Gascon
09:45 AM Coffee break
10:30 AM Fair Decision Making using Privacy-Protected Data Machanavajjhala
11:20 AM Spotlight talks
11:30 AM Poster Session Canonne, Jun, Neel, Wang, Vietri, Song, Lebensold, Zhang, Gondara, Li, Mireshghallah, Dong, Sarwate, Koskela, Jälkö, Kusner, Chen, Park, Machanavajjhala, Kalpathy-Cramer, Feldman, Tomkins, Phan, Esfandiari, Jaiswal, Sharma, Druce, Meehan, Zhao, Hsu, Railsback, Flaxman, Adebayo, Korolova, Xu, Holohan, Basu, Joseph, Thai, Yang, Vitercik, Hutchinson, Wang, Yauney, Tao, Jin, Lee, McMillan, Izmailov, Guo, Swaroop, Orekondy, Esmaeilzadeh, Procopio, Polyzotis, Mohammadi, Agrawal
12:30 PM Lunch break
02:00 PM Fair Universal Representations via Generative Models and Model Auditing Guarantees Sankar
02:50 PM Pan-Private Uniformity Testing Amin, Joseph
03:10 PM Private Stochastic Convex Optimization: Optimal Rates in Linear Time Feldman, Koren, Talwar
03:30 PM Coffee break
04:15 PM Formal Privacy At Scale: The 2020 Decennial Census TopDown Disclosure Limitation Algorithm Leclerc
05:05 PM Panel Discussion
05:55 PM Closing

Abstracts (8):

Abstract 3: Gaussian Differential Privacy in Privacy in Machine Learning (PriML), Dong, Roth 09:05 AM

Differential privacy has seen remarkable success as a rigorous and practical formalization of data privacy in the past decade. This privacy definition and its divergence based relaxations, however, have several acknowledged weaknesses, either in handling composition of private algorithms or in analyzing important primitives like privacy amplification by subsampling. Inspired by the hypothesis testing formulation of privacy, this paper proposes a new relaxation, which we term "f-differential privacy" (f-DP). This notion of privacy has a number of appealing properties and, in particular, avoids difficulties associated with divergence based relaxations. First, f-DP preserves the hypothesis testing interpretation. In addition, f-DP allows for lossless reasoning about composition in an algebraic fashion. Moreover, we provide a powerful technique to import existing results proven for original DP to f-DP and, as an application, obtain a simple subsampling theorem for f-DP. In addition to the above findings, we introduce a canonical single-parameter family of privacy notions within the f-DP class that is referred to as "Gaussian differential privacy" (GDP), defined based on testing two shifted Gaussians. GDP is focal among the f-DP class because of a central limit theorem we prove. More precisely, the privacy guarantees of any hypothesis testing based definition of privacy (including original DP) converge to GDP in the limit under composition. The CLT also yields a computationally inexpensive tool for analyzing the exact composition of private algorithms. Taken together, this collection of attractive properties renders f-DP a mathematically coherent, analytically tractable, and versatile framework for private data analysis. Finally, we demonstrate the use of the tools we develop by giving an improved privacy analysis of noisy stochastic gradient descent.
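For readers skimming, the two notions named in the abstract can be written out explicitly; these are the standard hypothesis-testing definitions (this display is an editorial addition, with φ a rejection rule and Φ the standard normal CDF):

    T(P, Q)(\alpha) = \inf_{\phi} \{\, 1 - \mathbb{E}_{Q}[\phi] \;:\; \mathbb{E}_{P}[\phi] \le \alpha \,\}

    M \ \text{is} \ f\text{-DP} \iff T(M(S), M(S')) \ge f \ \text{for all neighboring} \ S, S'

    G_\mu = T(\mathcal{N}(0,1), \mathcal{N}(\mu,1)), \qquad G_\mu(\alpha) = \Phi(\Phi^{-1}(1-\alpha) - \mu)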

Abstract 4: QUOTIENT: Two-Party Secure Neural Network Training & Prediction in Privacy in Machine Learning (PriML), Agrawal, Kusner, Gascon 09:25 AM

Recently, there has been a wealth of effort devoted to the design of secure protocols for machine learning tasks. Much of this is aimed at enabling secure prediction from highly-accurate Deep Neural Networks (DNNs). However, as DNNs are trained on data, a key question is how such models can also be trained securely. The few prior works on secure DNN training have focused either on designing custom protocols for existing training algorithms, or on developing tailored training algorithms and then applying generic secure protocols. In this work, we investigate the advantages of designing training algorithms alongside a novel secure protocol, incorporating optimizations on both fronts. We present QUOTIENT, a new method for discretized training of DNNs, along with a customized secure two-party protocol for it. QUOTIENT incorporates key components of state-of-the-art DNN training such as layer normalization and adaptive gradient methods, and improves upon the state-of-the-art in DNN training in two-party computation. Compared to prior work, we obtain an improvement of 50X in WAN time and 6% in absolute accuracy.

Abstract 6: Fair Decision Making using Privacy-Protected Data in Privacy in Machine Learning (PriML), Machanavajjhala 10:30 AM

Data collected about individuals is regularly used to make decisions that impact those same individuals. We consider settings where sensitive personal data is used to decide who will receive resources or benefits. While it is well known that there is a tradeoff between protecting privacy and the accuracy of decisions, in this talk, I will describe our recent work on a first-of-its-kind empirical study into the impact of formally private mechanisms (based on differential privacy) on fair and equitable decision-making.

Abstract 7: Spotlight talks in Privacy in Machine Learning (PriML), 11:20 AM

1. [Jonathan Lebensold, William Hamilton, Borja Balle and Doina Precup] Actor Critic with Differentially Private Critic (#08)
2. [Andres Munoz, Umar Syed, Sergei Vassilvitskii and Ellen Vitercik] Private linear programming without constraint violations (#17)
3. [Ios Kotsogiannis, Yuchao Tao, Xi He, Ashwin Machanavajjhala, Michael Hay and Gerome Miklau] PrivateSQL: A Differentially Private SQL Query Engine (#27)
4. [Amrita Roy Chowdhury, Chenghong Wang, Xi He, Ashwin Machanavajjhala and Somesh Jha] Crypt$\epsilon$: Crypto-Assisted Differential Privacy on Untrusted Servers (#31)
5. [Jiaming Xu and Dana Yang] Optimal Query Complexity of Private Sequential Learning (#32)
6. [Hsiang Hsu, Shahab Asoodeh and Flavio Calmon] Discovering Information-Leaking Samples and Features (#43)
7. [Martine De Cock, Rafael Dowsley, Anderson Nascimento, Davis Railsback, Jianwei Shen and Ariel Todoki] Fast Secure Logistic Regression for High Dimensional Gene Data (#44)
8. [Giuseppe Vietri, Grace Tian, Mark Bun, Thomas Steinke and Steven Wu] New Oracle-Efficient Algorithms for Private Synthetic Data Release (#45)

Abstract 10: Fair Universal Representations via Generative Models and Model Auditing Guarantees in Privacy in Machine Learning (PriML), Sankar 02:00 PM

There is a growing demand for ML methods that limit inappropriate use of protected information to avoid both disparate treatment and disparate impact. In this talk, we present Generative Adversarial rePresentations (GAP) as a data-driven framework that leverages recent advancements in adversarial learning to allow a data holder to learn universal representations that decouple a set of sensitive attributes from the rest of the dataset while allowing learning multiple downstream tasks. We will briefly highlight the theoretical and practical results of GAP.

In the second half of the talk we will focus on model auditing. Privacy concerns have led to the development of privacy-preserving approaches for learning models from sensitive data. Yet, in practice, models (even those learned with privacy guarantees) can inadvertently memorize unique training examples or leak sensitive features. To identify such privacy violations, existing model auditing techniques use finite adversaries defined as machine learning models with (a) access to some finite side information (e.g., a small auditing dataset), and (b) finite capacity (e.g., a fixed neural network architecture). We present requirements under which an unsuccessful attempt to identify privacy violations by a finite adversary implies that no stronger adversary can succeed at such a task. We will do so via parameters that quantify the capabilities of the finite adversary, including the size of the neural network employed by such an adversary and the amount of side information it has access to, as well as the regularity of the (perhaps privacy-guaranteeing) audited model.

fxed neural network architecture). In the second We study diferentially private (DP) algorithms for half of the talk, we present requirements under stochastic convex optimization: the problem of which an unsuccessful attempt to identify privacy minimizing the population loss given i.i.d. violations by a fnite adversary implies that no samples from a distribution over convex loss stronger adversary can succeed at such a task. functions. A recent work of Bassily et al. (2019) We will do so via parameters that quantify the has established the optimal bound on the excess capabilities of the fnite adversary, including the population loss achievable given n samples. size of the neural network employed by such an Unfortunately, their algorithm achieving this adversary and the amount of side information it bound is relatively inefcient: it requires has access to as well as the regularity of the O(min{n3/2,n5/2/d}) gradient computations, (perhaps privacy-guaranteeing) audited model. where d is the dimension of the optimization problem. We describe two new techniques for deriving DP convex optimization algorithms both Abstract 11: Pan-Private Uniformity Testing achieving the optimal bound on excess loss and in Privacy in Machine Learning (PriML), using O(min{n,n2/d}) gradient computations. In particular, the algorithms match the running time Amin, Joseph 02:50 PM of the optimal non-private algorithms. The frst A centrally diferentially private algorithm maps approach relies on the use of variable batch sizes raw data to diferentially private outputs. In and is analyzed using the privacy amplifcation by contrast, a locally diferentially private algorithm iteration technique of Feldmanet al. (2018). The may only access data through public interaction second approach is based on a general reduction with data holders, and this interaction must be a to the problem of localizing an approximately diferentially private function of the data. We optimal solution with diferential privacy. Such study the intermediate model of pan-privacy. localization, in turn, can be achieved using Unlike a locally private algorithm, a pan-private existing (non-private) uniformly stable algorithm receives data in the clear. Unlike a optimization algorithms. As in the earlier work, centrally private algorithm, the algorithm our algorithms require a mild smoothness receives data one element at a time and must assumption. We also give a linear-time algorithm maintain a diferentially private internal state achieving the optimal bound on the excess loss while processing this stream. First, we show that for the strongly convex case, as well as a faster pan-privacy against multiple intrusions on the algorithm for the non-smooth case. internal state is equivalent to sequentially interactive local privacy. Next, we contextualize pan-privacy against a single intrusion by Abstract 14: Formal Privacy At Scale: The analyzing the sample complexity of uniformity 2020 Decennial Census TopDown Disclosure testing over domain [k]. Focusing on the Limitation Algorithm in Privacy in Machine dependence on k, centrally private uniformity Learning (PriML), Leclerc 04:15 PM testing has sample complexity Θ(√k), while noninteractive locally private uniformity testing To control vulnerabilities to reconstruction- has sample complexity Θ(k). We show that the abetted re-identifcation attacks that leverage sample complexity of pan-private uniformity massive external data stores and cheap testing is Θ(k2/3). 
Abstract 12: Private Stochastic Convex Optimization: Optimal Rates in Linear Time in Privacy in Machine Learning (PriML), Feldman, Koren, Talwar 03:10 PM

We study differentially private (DP) algorithms for stochastic convex optimization: the problem of minimizing the population loss given i.i.d. samples from a distribution over convex loss functions. A recent work of Bassily et al. (2019) has established the optimal bound on the excess population loss achievable given n samples. Unfortunately, their algorithm achieving this bound is relatively inefficient: it requires O(min{n^{3/2}, n^{5/2}/d}) gradient computations, where d is the dimension of the optimization problem. We describe two new techniques for deriving DP convex optimization algorithms, both achieving the optimal bound on excess loss and using O(min{n, n^2/d}) gradient computations. In particular, the algorithms match the running time of the optimal non-private algorithms. The first approach relies on the use of variable batch sizes and is analyzed using the privacy amplification by iteration technique of Feldman et al. (2018). The second approach is based on a general reduction to the problem of localizing an approximately optimal solution with differential privacy. Such localization, in turn, can be achieved using existing (non-private) uniformly stable optimization algorithms. As in the earlier work, our algorithms require a mild smoothness assumption. We also give a linear-time algorithm achieving the optimal bound on the excess loss for the strongly convex case, as well as a faster algorithm for the non-smooth case.
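Restating the abstract's gradient-computation counts side by side (n samples, dimension d):

    \text{Bassily et al. (2019): } O\!\left(\min\{n^{3/2},\; n^{5/2}/d\}\right) \qquad \text{this work: } O\!\left(\min\{n,\; n^{2}/d\}\right)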

Abstract 14: Formal Privacy At Scale: The 2020 Decennial Census TopDown Disclosure Limitation Algorithm in Privacy in Machine Learning (PriML), Leclerc 04:15 PM

To control vulnerabilities to reconstruction-abetted re-identification attacks that leverage massive external data stores and cheap computation, the U.S. Census Bureau has elected to adopt a formally private approach to disclosure limitation in the 2020 Decennial Census of Population and Housing. To this end, a team of disclosure limitation specialists have worked over the past 3 years to design and implement the U.S. Census Bureau TopDown Disclosure Limitation Algorithm (TDA). This formally private algorithm generates Persons and Households micro-data, which will then be tabulated to produce the final set of demographic statistics published as a result of the 2020 Census enumeration. In this talk, I outline the main features of TDA, describe the current iteration of the process used to help policy makers decide how to set and allocate privacy-loss budget, and outline known issues with - and intended fixes for - the current implementation of TDA.

Machine Learning and the Physical Sciences

Atilim Gunes Baydin, Juan Carrasquilla, Shirley Ho, Karthik Kashinath, Michela Paganini, Savannah Thais, Anima Anandkumar, Kyle Cranmer, Roger Melko, Mr. Prabhat, Frank Wood

West 109 + 110, Sat Dec 14, 08:00 AM

Machine learning methods have had great success in learning complex representations that enable them to make predictions about unobserved data. Physical sciences span problems and challenges at all scales in the universe: from finding exoplanets in trillions of sky pixels, to finding machine learning inspired solutions to the quantum many-body problem, to detecting anomalies in event streams from the Large Hadron Collider. Tackling a number of associated data-intensive tasks including, but not limited to, segmentation, 3D computer vision, sequence modeling, causal reasoning, and efficient probabilistic inference are critical for furthering scientific discovery. In addition to using machine learning models for scientific discovery, the ability to interpret what a model has learned is receiving an increasing amount of attention.

In this targeted workshop, we would like to bring together computer scientists, mathematicians and physical scientists who are interested in applying machine learning to various outstanding physical problems, in particular in inverse problems and approximating physical processes; understanding what the learned model really represents; and connecting tools and insights from physical sciences to the study of machine learning models. In particular, the workshop invites researchers to contribute papers that demonstrate cutting-edge progress in the application of machine learning techniques to real-world problems in physical sciences, and using physical insights to understand what the learned model means.

By bringing together machine learning researchers and physical scientists who apply machine learning, we expect to strengthen the interdisciplinary dialogue, introduce exciting new open problems to the broader community, and stimulate production of new approaches to solving open problems in sciences. Invited talks from leading individuals in both communities will cover the state-of-the-art techniques and set the stage for this workshop.

Schedule

08:10 AM Opening Remarks Baydin, Carrasquilla, Ho, Kashinath, Paganini, Thais, Anandkumar, Cranmer, Melko, Prabhat, Wood
08:20 AM Bernhard Schölkopf Schölkopf
09:00 AM Towards physics-informed deep learning for turbulent flow prediction Yu
09:20 AM JAX, M.D.: End-to-End Differentiable, Hardware Accelerated, Molecular Dynamics in Pure Python Schoenholz
09:40 AM Morning Coffee Break & Poster Session Metodiev, Zhang, Stoye, Churchill, Sarkar, Cranmer, Brehmer, Jimenez Rezende, Harrington, Nigam, Thuerey, Maziarka, Sanchez Gonzalez, Okan, Ritchie, Erichson, Cheng, Jiang, Pahng, Koelle, Khairy, Pol, Anirudh, Born, Sanchez-Lengeling, Timar, Goodall, Kriváchy, Lu, Adler, Trask, Cherrier, Konno, Kasim, Golling, Alperstein, Ustyuzhanin, Stokes, Golubeva, Char, Korovina, Cho, Chatterjee, Westerhout, Muñoz-Gil, Zamudio-Fernandez, Wei, Lee, Kofler, Power, Kazeev, Ustyuzhanin, Maevskiy, Friederich, Tavakoli, Neiswanger, Kulchytskyy, hari, Leu, Atzberger
10:40 AM Katie Bouman Bouman
11:20 AM Alán Aspuru-Guzik Aspuru-Guzik
12:00 PM Hamiltonian Graph Networks with ODE Integrators Sanchez Gonzalez
12:20 PM Lunch Break
02:00 PM Maria Schuld Schuld
02:40 PM Lenka Zdeborova Zdeborová
03:20 PM Afternoon Coffee Break & Poster Session Komkov, Fort, Wang, Yu, Park, Schoenholz, Cheng, Griffiths, Shimmin, Mukkavili, Schwaller, Knoll, Sun, Kisamori, Graham, Portwood, Huang, Novello, Munchmeyer, Jungbluth, Levine, Ayed, Atkinson, Hermann, Grönquist, Saha, Glaser, Li, Iiyama, Anirudh, Koch-Janusz, Sundar, Lanusse, Edelen, Köhler, Yip, guo, Ju, Hanuka, Albert, Salvatelli, Verzetti, Duarte, Moreno, de Bézenac, Vlontzos, Singh, Klijnsma, Neuberg, Wright, Mustafa, Schmidt, Farrell, Sun
04:20 PM Towards an understanding of wide, deep neural networks Bahri
05:00 PM Learning Symbolic Physics with Graph Networks Cranmer
05:20 PM Metric Methods with Open Collider Data Metodiev
05:40 PM Equivariant Hamiltonian Flows Jimenez Rezende

Program Transformations for ML

Pascal Lamblin, Atilim Gunes Baydin, Alexander Wiltschko, Bart van Merriënboer, Emily Fertig, Barak Pearlmutter, David Duvenaud, Laurent Hascoet Schedule

West 114 + 115, Sat Dec 14, 08:00 AM 08:30 Opening statements AM Machine learning researchers often express complex models as a program, relying on 08:40 Jan-Willem van de Meent - program transformations to add functionality. AM Compositional Methods for Learning New languages and transformations (e.g., and Inference in Deep Probabilistic TorchScript and TensorFlow AutoGraph) are Programs van de Meent becoming core capabilities of ML libraries. 09:30 Applications of a disintegration However, existing transformations, such as AM transformation Narayanan automatic diferentiation (AD), inference in 09:50 Cofee break probabilistic programming languages (PPL), and AM optimizing compilers are often built in isolation, 10:30 Christine Tasson - Semantics of and limited in scope. This workshop aims at AM Functional Probabilistic Programs viewing program transformations in ML in a Tasson unifed light, making these capabilities more 11:20 The Diferentiable Curry Vytiniotis accessible, and building entirely new ones. AM Program transformations are an area of active 85 Dec. 14, 2019

11:40 Functional Tensors for Probabilistic To enable a more iterative design of these AM Programming Obermeyer methods, we introduce compositional constructs, 12:00 Lunch break & Poster session which we refer to as combinators, which serve to PM Considine, Innes, Phan, Maclaurin, defne both model structure and evaluation Manhaeve, Radul, Gowda, Sharma, strategies that correspond to diferent Sennesh, Kochurov, Plotkin, Wiecki, importance sampling schemes. Together these Kukreja, Shan, Johnson, Belov, Pradhan, constructs defne a path towards a more Meert, Kimmig, De Raedt, Patton, compositional design of variational methods that Hofman, A. Saurous, Roy, Bingham, are correct by construction. Jankowiak, Carroll, Lao, Paull, Abadi, Rojas Jimenez, Chen 02:00 Optimized execution of PyTorch Abstract 5: Christine Tasson - Semantics of PM programs with TorchScript DeVito Functional Probabilistic Programs in 02:50 Skye Wanderman-Milne - JAX: Program Transformations for ML, Tasson PM accelerated machine-learning 10:30 AM research via composable function Probabilities are extensively used in Computer transformations in Python Science. Algorithms use Wanderman-Milne probabilistic choices for improving efciency or 03:40 Cofee break even for tackling PM problems that are unsolvable with deterministic 04:20 Generalized Abs-Linear Learning computing. Recently, PM Griewank (Functional) Probabilistic Programming has been 04:40 Towards Polyhedral Automatic introduced for PM Diferentiation Hueckelheim applications in Machine Learning and Artifcial 05:00 Taylor-Mode Automatic Intelligence. PM Diferentiation for Higher-Order Probabilistic programs are used to describe Derivatives in JAX Bettencourt statistical models and for 05:20 Panel and general discussion developing probabilistic data analysis. PM In Probabilistic Programming Languages, inference algorithms are often Abstracts (3): delegated to compilers including optimizations. This program Abstract 2: Jan-Willem van de Meent - transformations are error prone, yet they should Compositional Methods for Learning and not change the Inference in Deep Probabilistic Programs in probabilistic models. Hence the need for formal Program Transformations for ML, van de methods to avoid bugs. Meent 08:40 AM Developing formal semantics for probabilistic Deep learning and probabilistic programming are computing is challenging domains that have a lot in common in certain but crucial in order to systematize the analysis respects; both rely on software abstractions to and certifcation of enable iterative model development. probabilistic programs.

In this talk we discuss how we can integrate In this talk, I will frst introduce functional techniques from both domains in problems where probabilistic programing we would like to use priors to induce structured and the related problems. Then, I will present representations. To do so, we employ reweighted recent works in wake-sleep methods, which combine importance semantics of probabilistic computing, based on sampling methods (which have been approximation of programs operationalized in probabilistic programming) according to their use of resources. with variational methods for learning proposals. 86 Dec. 14, 2019

Abstract 10: Skye Wanderman-Milne - JAX: 03:00 Trafc4cast -- Trafc Map Movie accelerated machine-learning research via PM Forecasting Hochreiter, Sigal, Neun, composable function transformations in Jonietz, Choi, Martin, Yu, Liu, Nguyen, Python in Program Transformations for ML, Herruzo Sánchez, Shi, Gruca, Sutherland, Wanderman-Milne 02:50 PM Kreil, Kopp 04:15 The Game of Drones Competition JAX is a system for high-performance machine PM Toumieh, Vemprala, Shin, Kumar, Ivanov, learning research. It ofers the familiarity of Shim, Martinez-Carranza, Gyde, Kapoor, Python+NumPy together with hardware Nagami, Taubner, Madaan, Gillette, acceleration, and it enables the defnition and Stubbs composition of user-wielded function transformations useful for machine learning programs. These transformations include Abstracts (2): automatic diferentiation, automatic batching, end-to-end compilation (via XLA), parallelizing Abstract 2: The MineRL competition in over multiple accelerators, and more. Composing Competition Track Day 2, Ogura, Booth, Sun, these transformations is the key to JAX's power Topin, Houghton, Guss, Milani, Vinyals, Hofmann, and simplicity. KIM, Ramanauskas, Laurent, Nishio, Kanervisto, Skrynnik, Amiranashvili, Scheller, WANG, Schraner 09:00 AM

MineRL Competition on Sample Efcient Competition Track Day 2 Reinforcement Learning.

Hugo Jair Escalante Competition chairs: Brandon Houghton, William

West 116 + 117, Sat Dec 14, 08:00 AM Guss, Stephanie Milani, Nicholay Topin https://nips.cc/Conferences/2019/ * Overview and highlights of the competition. CallForCompetitions Brandon Houghton, William Guss

* Competition Awards. Stephanie Milani Schedule * Special Awards. Oriol Vinyals & advisory board.

08:00 Causality for Climate (C4C) Käding, * Discussion of future competitions. Katja AM Gerhardus, Runge Hofmann 09:00 The MineRL competition Ogura, Booth, AM Sun, Topin, Houghton, Guss, Milani, * Competitors Presentations Vinyals, Hofmann, KIM, Ramanauskas, Laurent, Nishio, Kanervisto, Skrynnik, Amiranashvili, Scheller, WANG, Schraner 10:30 Cofee break Abstract 9: The Game of Drones Competition AM in Competition Track Day 2, Toumieh, Vemprala, Shin, Kumar, Ivanov, Shim, Martinez- 11:00 The Animal-AI Olympics Makoviichuk, Carranza, Gyde, Kapoor, Nagami, Taubner, AM Crosby, Beyret, Feyereisl, Yamakawa Madaan, Gillette, Stubbs 04:15 PM 12:00 The AutoDL Challenge Treguer, Kim, PM Guo, Luo, Zhao, Li, Guo, Zhang, Ota * Opening/Introduction 01:00 Lunch break PM -- Speakers: Ratnesh Madaan, Keiko Nagami 02:00 Overview of the Live Reinforcement PM Learning Malaria Challenge Remy -- Ashish Kapoor 87 Dec. 14, 2019

* Tier 1 ICLR, ICML, AAAI. Now, we wish to extend these ideas and explore a new direction: how emergent -- Speakers: Rahul Kumar, Charbel Toumieh, communication can become more like natural Andrey Ivanov, Antony Gillette, Joe Booth, Jose language, and what natural language Martinez-Carranza understanding can learn from emergent communication. -- Chair: Ratnesh Madaan, Keiko Nagami The push towards emergent natural language is a necessary and important step in all facets of the * Tier 2 feld. For studying the evolution of human language, emerging a natural language can -- Speakers: Sangyun Shin, David Hyunchul Shim, uncover the requirements that spurred crucial Ratnesh Madaan, Keiko Nagami aspects of language (e.g. compositionality). When emerging communication for multi-agent scenarios, protocols may be sufcient for * Tier 3 machine-machine interactions, but emerging a natural language is necessary for human- -- Speakers: Sangyun Shin, Charbel Toumieh machine interactions. Finally, it may be possible to have truly general natural language -- Chair: Ratnesh Madaan, Keiko Nagami, understanding if agents learn the language through interaction as humans do. To make this progress, it is necessary to close the gap * Prize Distribution between artifcial and natural language learning.

--Speaker: Ashish Kapoor, To tackle this problem, we want to take an interdisciplinary approach by inviting researchers -- Chair: Ratnesh Madaan, Keiko Nagami from various felds (machine learning, game theory, evolutionary biology, linguistics, cognitive science, and programming languages) to participate and engaging them to unify the Emergent Communication: Towards difering perspectives. We believe that the third Natural Language iteration of this workshop with a novel, unexplored goal and strong commitment to

Abhinav Gupta, Michael Noukhovitch, Cinjon Resnick, diversity will allow this burgeoning feld to Natasha Jaques, Angelos Filos, Marie Ossenkopf, fourish. Angeliki Lazaridou, Jakob Foerster, Ryan Lowe, Douwe Kiela, Kyunghyun Cho Schedule West 118 - 120, Sat Dec 14, 08:00 AM

Communication is one of the most impressive 08:55 Introductory Remarks human abilities but historically it has been AM studied in machine learning on confned datasets 09:00 Invited Talk - 1 Gibson of natural language, and by various other felds in AM simple low-dimensional spaces. Recently, with 09:45 Contributed Talk - 1 Lee the rise of deep RL methods, the questions AM around the emergence of communication can 10:00 Cofee Break / Poster Session now be studied in new, complex multi-agent AM scenarios. Two previous successful workshops 10:30 Invited Talk - 2 Zaslavsky (2017, 2018) have gathered the community to AM discuss how, when, and to what end 11:15 Contributed Talk - 2 Cowen-Rivers communication emerges, producing research that AM was later published at top ML venues such as 88 Dec. 14, 2019

11:30 Extended Poster Session LaCroix, naming system of a single language. Second, I AM Ossenkopf, Lee, Fitzgerald, Mihai, Hare, will derive a theoretical link between optimized Zaidi, Cowen-Rivers, Marzoev, semantic systems and local, context-dependent Kharitonov, Yuan, Korbak, Liang, Ren, interactions that involve pragmatic skills. Dessì, Potash, Guo, Hashimoto, Liang, Specifcally, I will show that pressure for efcient Zubek, Fu, Zhu, Lerer coding may also give rise to human pragmatic 02:00 Invited Talk - 3 Eisner reasoning, as captured by the Rational Speech PM Act framework. This line of work identifes 02:45 Contributed Talk - 3 Lerer information-theoretic optimization principles that PM characterize human semantic and pragmatic communication, and that could be used to inform 03:00 Invited Talk - 4 Andreas artifcial agents with human-like communication PM systems. 03:45 Cofee Break / Poster Session PM 04:15 Invited Talk - 5 Lee PM Science meets Engineering of Deep 05:00 Panel Discussion Andreas, Gibson, Lee, Learning PM Zaslavsky, Eisner, Schmidhuber

05:55 Closing Remarks Levent Sagun, CAGLAR Gulcehre, Adriana Romero, PM Negar Rostamzadeh, Nando de Freitas

West 121 + 122, Sat Dec 14, 08:00 AM Abstracts (1): Deep learning can still be a complex mix of art Abstract 5: Invited Talk - 2 in Emergent and engineering despite its tremendous success Communication: Towards Natural Language, in recent years, and there is still progress to be Zaslavsky 10:30 AM made before it has fully evolved into a mature Information-theoretic principles in semantic and scientifc discipline. The interdependence of pragmatic communication architecture, data, and optimization gives rise to an enormous landscape of design and Maintaining useful semantic representations of performance intricacies that are not well- the environment and pragmatically reasoning understood. The evolution from engineering about utterances are crucial aspects of human towards science in deep learning can be achieved language. However, it is not yet clear what by pushing the disciplinary boundaries. Unlike in computational principles could give rise to the natural and physical sciences -- where human-like semantics and pragmatics in experimental capabilities can hamper progress, machines. In this talk, I will propose a possible i.e. limitations in what quantities can be probed answer to this open question by hypothesizing and measured in physical systems, how much that pressure for efcient coding may underlie and how often -- *in deep learning the vast both abilities. First, I will argue that languages majority of relevant quantities that we wish to efciently encode meanings into words by measure can be tracked in some way*. As such, a optimizing the Information Bottleneck (IB) greater limiting factor towards scientifc tradeof between the complexity and accuracy of understanding and principled design in deep the lexicon. This proposal is supported by cross- learning is how to *insightfully harness the linguistic data from three semantic domains: tremendous collective experimental capability of names for colors, artifacts, and animals. the feld*. As a community, some primary aims Furthermore, it suggests that semantic systems would be to (i) identify obstacles to better models may evolve by navigating along the IB theoretical and algorithms, (ii) identify the general trends limit via an annealing-like process. This process that are potentially important which we wish to generates quantitative predictions, which are understand scientifcally and potentially directly supported by an analysis of recent data theoretically and; (iii) careful design of scientifc documenting changes over time in the color experiments whose purpose is to clearly resolve 89 Dec. 14, 2019

and pinpoint the origin of mysteries (so-called 12:00 Lunch Break and Posters Song, Hofer, 'smoking-gun' experiments). PM Chang, Cohen, Islam, Blumenfeld, Madsen, Frankle, Goldt, Chatterjee, Panigrahi, Renda, Bartoldson, Birhane, Schedule Baratin, Chatterji, Novak, Forde, Jiang, Du, Adilova, Kamp, Weinstein, Hubara, 08:00 Welcoming remarks and introduction Ben-Nun, Hoefer, Soudry, Yu, Zhong, AM Sagun, Gulcehre, Romero, Rostamzadeh, Yang, Dhillon, Carbonell, Zhang, Gilboa, de Freitas Brandstetter, Johansen, Dziugaite, 08:15 Surya Ganguli - An analytic theory of Somani, Morcos, Kalaitzis, Sedghi, Xiao, AM generalization dynamics and Zech, Yang, Kaur, Ma, Tsai, transfer learning in deep linear Salakhutdinov, Yaida, Lipton, Roy, Carbin, networks Ganguli Krzakala, Zdeborová, Gur-Ari, Dyer, Krishnan, Mobahi, Bengio, Neyshabur, 08:35 Yasaman Bahri - Tractable limits for Netrapalli, Sankaran, Cornebise, Bengio, AM deep networks: an overview of the Michalski, Ebrahimi Kahou, Arefn, Hron, large width regime Bahri Lee, Sohl-Dickstein, Schoenholz, Schwab, 08:55 Florent Krzakala - Learning with Li, Choe, Petzka, Verma, Lin, AM "realistic" synthetic data Krzakala Sminchisescu 09:15 Surya Ganguli, Yasaman Bahri, 02:00 Douwe Kiela - Benchmarking AM Florent Krzakala moderated by PM Progress in AI: A New Benchmark for Lenka Zdeborova Krzakala, Bahri, Natural Language Understanding Ganguli, Zdeborová, Dieng, Bruna Kiela 09:45 Cofee and posters 02:20 Audrey Durand - Trading of theory AM PM and practice: A bandit perspective 10:30 Carl Doersch - On Self-Supervised Durand AM Learning for Vision Doersch 02:40 Kamalika Chaudhuri - A Three 10:50 Raquel Urtasun - Science and PM Sample Test to Detect Data Copying AM Engineering for Self-driving Urtasun in Generative Models Chaudhuri 11:10 Sanja Fidler - TBA Fidler 03:00 Audrey Durand, Douwe Kiela, AM PM Kamalika Chaudhuri moderated by 11:30 Carl Doersch, Raquel Urtasun, Sanja Yann Dauphin Durand, Chaudhuri, AM Fidler moderated by Natalia Dauphin, Firat, Gorur, Kiela Neverova Urtasun, Fidler, Neverova, 03:30 Cofee and posters Radosavovic, Doersch PM 04:15 Panel - The Role of Communication PM at Large: Aparna Lakshmiratan, Jason Yosinski, Been Kim, Surya Ganguli, Finale Doshi-Velez Lakshmiratan, Doshi-Velez, Ganguli, Lipton, Paganini, Anandkumar, Yosinski 05:10 Contributed Session - Spotlight Talks PM Frankle, Schwab, Morcos, Ma, Tsai, Salakhutdinov, Jiang, Krishnan, Mobahi, Bengio, Yaida, Yang

Abstracts (5):

Abstract 5: Surya Ganguli, Yasaman Bahri, Florent Krzakala moderated by Lenka Zdeborova in Science meets Engineering of 90 Dec. 14, 2019

Deep Learning, Krzakala, Bahri, Ganguli, Dauphin. Session advisors are Orhan First and Zdeborová, Dieng, Bruna 09:15 AM Dilan Gorur.

The mini panel with Florent Krzakala, Yasaman Bahri, Surya Ganguli will be moderated by Lenka Abstract 17: Panel - The Role of Zdeborova. Session advisors are Joan Bruna and Adji Bousso Dieng. Communication at Large: Aparna Lakshmiratan, Jason Yosinski, Been Kim, Surya Ganguli, Finale Doshi-Velez in Science meets Engineering of Deep Learning, Abstract 10: Carl Doersch, Raquel Urtasun, Lakshmiratan, Doshi-Velez, Ganguli, Lipton, Sanja Fidler moderated by Natalia Neverova Paganini, Anandkumar, Yosinski 04:15 PM in Science meets Engineering of Deep Learning, Urtasun, Fidler, Neverova, The panel with Aparna Lakshmiratan, Jason Radosavovic, Doersch 11:30 AM Yosinski, Been Kim, Surya Ganguli, Finale Doshi- Velez will be moderated by Zack Lipton. The mini-panel with Carl Doersch, Raquel Urtasun, Sanja Fidler will be moderated by Natalia Neverova. Session advisors are Alp Guler and Ilija Radosavovic. ML For Systems

Milad Hashemi, Azalia Mirhoseini, Anna Goldie, Kevin Abstract 11: Lunch Break and Posters in Swersky, Xinlei XU, Jonathan Raiman Science meets Engineering of Deep West 202 - 204, Sat Dec 14, 08:00 AM Learning, Song, Hofer, Chang, Cohen, Islam, Blumenfeld, Madsen, Frankle, Goldt, Chatterjee, Compute requirements are growing at an Panigrahi, Renda, Bartoldson, Birhane, Baratin, exponential rate, and optimizing these computer Chatterji, Novak, Forde, Jiang, Du, Adilova, Kamp, systems often involves complex high-dimensional Weinstein, Hubara, Ben-Nun, Hoefer, Soudry, Yu, combinatorial problems. Yet, current methods rely Zhong, Yang, Dhillon, Carbonell, Zhang, Gilboa, heavily on heuristics. Very recent work has Brandstetter, Johansen, Dziugaite, Somani, outlined a broad scope where machine learning Morcos, Kalaitzis, Sedghi, Xiao, Zech, Yang, Kaur, vastly outperforms these traditional heuristics: Ma, Tsai, Salakhutdinov, Yaida, Lipton, Roy, including scheduling, data structure design, Carbin, Krzakala, Zdeborová, Gur-Ari, Dyer, microarchitecture, compilers, circuit design, and Krishnan, Mobahi, Bengio, Neyshabur, Netrapalli, the control of warehouse scale computing Sankaran, Cornebise, Bengio, Michalski, Ebrahimi systems. In order to continue to scale these Kahou, Arefn, Hron, Lee, Sohl-Dickstein, computer systems, new learning approaches are Schoenholz, Schwab, Li, Choe, Petzka, Verma, Lin, needed. The goal of this workshop is to develop Sminchisescu 12:00 PM novel machine learning methods to optimize and Since we are a small workshop, we will hold the accelerate software and hardware systems. poster sessions during the day, including all the breaks as the authors wish. Machine Learning for Systems is an interdisciplinary workshop that brings together researchers in computer architecture and systems and machine learning. This workshop is Abstract 15: Audrey Durand, Douwe Kiela, meant to serve as a platform to promote Kamalika Chaudhuri moderated by Yann discussions between researchers in the Dauphin in Science meets Engineering of workshops target areas. Deep Learning, Durand, Chaudhuri, Dauphin, Firat, Gorur, Kiela 03:00 PM This workshop is part two of a two-part series The mini-panel with Audrey Durand, Douwe Kiela, with one day focusing on ML for Systems and the Kamalika Chaudhuri will be moderated by Yann other on Systems for ML. Although the two workshops are being led by diferent organizers, 91 Dec. 14, 2019

we are coordinating our call for papers to ensure 05:20 Panel that the workshops complement each other and PM that submitted papers are routed to the appropriate venue.

The third Conversational AI workshop – Schedule today's practice and tomorrow's potential

Alborz Geramifard, Jason Williams, Bill Byrne, Asli 09:00 Opening Celikyilmaz, Milica Gasic, Dilek Hakkani-Tur, Matt AM Henderson, Luis Lastras, Mari Ostendorf 09:10 Invited Speaker: Eytan Bakshy AM Bakshy West 205 - 207, Sat Dec 14, 08:00 AM 09:45 Break In the span of only a few years, conversational AM systems have become commonplace. Every day, 10:30 Poster Session 1 Mao, Nathan, Baldini, millions of people use natural-language interfaces AM Sivakumar, Wang, Magalle Hewa, Shi, such as Siri, Google Now, Cortana, Alexa and Kaufman, Fang, Zhou, Ding, He, Lubin others via in-home devices, phones, or 11:00 Contributed Talk 1: A Weak messaging channels such as Messenger, Slack, AM Supervision Approach to Detecting Skype, among others. At the same time, interest Visual Anomalies for Automated among the research community in conversational Testing of Graphics Units Szeskin systems has blossomed: for supervised and 11:15 Contributed Talk 2: Learned TPU reinforcement learning, conversational systems AM Cost Model for XLA Tensor Programs often serve as both a benchmark task and an Kaufman inspiration for new ML methods at conferences 11:30 Contributed Talk 3: Learned Multi- which don't focus on speech and language per AM dimensional Indexing Nathan se, such as NIPS, ICML, IJCAI, and others. Such 11:45 Contributed Talk 4: Neural Hardware movement has not been unnoticed by major AM Architecture Search Lin publications. This year in collaboration with AAAI 12:00 Lunch community, the AI magazine will have a special PM issue on conversational AI (https://tinyurl.com/ 01:45 Invited Speaker: Jef Dean Dean y6shq2ld). Moreover, research community PM challenge tasks are proliferating, including the seventh Dialog Systems Technology Challenge 02:15 Invited Speaker: Akanksha Jain Jain (DSTC7), the Amazon Alexa prize, and the PM Conversational Intelligence Challenge live 02:45 Contributed Talk 5: Predictive competitions at NIPS (2017, 2018). PM Precompute with Recurrent Neural Networks Wang Following the overwhelming participation in our 03:00 Poster Session 2 Wang, Lin, Duan, last two NeurIPS workshops: PM Paliwal, Haj-Ali, Marcus, Hope, Xu, Le, 2017: 9 invited talks, 26 submissions, 3 oral Sun, Cutler, Nathan, Sun papers, 13 accepted papers, 37 reviewers 03:30 Break 2018: 4 invited talks, 42 submission, 6 oral PM papers, 23 accepted papers, 58 reviewers, we are 04:15 Contributed Talk 6: Zero-Shot excited to continue promoting cross-pollination of PM Learning for Fast Optimization of ideas between academic research centers and Computation Graphs Paliwal industry. The goal of this workshop is to bring 04:30 Invited Speaker: Ion Stoica Stoica together researchers and practitioners in this PM area, to clarify impactful research problems, 04:55 Invited Speaker: Mohammad understand well-founded methods, share fndings PM Alizadeh Alizadeh from large-scale real-world deployments, and generate new ideas for future lines of research. 92 Dec. 14, 2019

This one day workshop will include invited talks 04:45 Contributed talk 8: Recurrent and a panel from academia and industry, PM Chunking Mechanisms for contributed work, and open discussion. Conversational Machine Reading Comprehension 05:00 Panel discussion Schedule PM 05:50 Closing Geramifard, Williams 08:45 Opening Remarks Williams PM AM 08:55 Invited talk - Alan Ritter AM Document Intelligence 09:25 Contributed talk 1

AM Nigel Dufy, Rama Akkiraju, Tania Bedrax Weiss, Paul 09:40 Poster lighting round Zheng, Søgaard, Bennett, Hamid Reza Motahari-Nezhad AM Saleh, Jang, Gong, Florez, Li, Madotto, Nguyen, Kulikov, einolghozati, Wang, West 208 + 209, Sat Dec 14, 08:00 AM Eric, Hansen, Lubis, Wu Business documents are central to the operation 09:55 Posters + cofee break of business. Such documents include sales AM agreements, vendor contracts, mortgage terms, 10:40 Invited Talk - Ryuichiro Higashinaka loan applications, purchase orders, invoices, AM Higashinaka fnancial statements, employment agreements 10:55 Contributed talk 2: Persona-aware and a wide many more. The information in such AM Dialogue Generation with Enriched business documents is presented in natural Profle language, and can be organized in a variety of 11:10 Contributed talk 3: HSCJN: A Holistic ways from straight text, multi-column formats, AM Semantic Constraint Joint Network and a wide variety of tables. Understanding these for Diverse Response Generation documents is made challenging due to 11:25 Contributed talk 4: Hierarchical inconsistent formats, poor quality scans and OCR, AM Reinforcement Learning for Open- internal cross references, and complex document Domain Dialog structure. Furthermore, these documents often 11:55 Lunch refect complex legal agreements and reference, AM explicitly or implicitly, regulations, legislation, 01:45 Invited talk - David Traum Traum case law and standard business practices. PM The ability to read, understand and interpret 02:15 Invited talk - Y-Lan Boureau Boureau business documents, collectively referred to here PM as “Document Intelligence”, is a critical and challenging application of artifcial intelligence 02:45 Contributed talk 5: Domain Transfer (AI) in business. While a variety of research has PM in Dialogue Systems without Turn- advanced the fundamentals of document Level Supervision understanding, the majority have focused on 03:00 Contributed talk 6: Reinforcement documents found on the web which fail to PM Learning of Multi-Domain Dialog capture the complexity of analysis and types of Policies Via Action Embeddings understanding needed across business 03:15 Contributed talk 7: Bayes-Adaptive documents. Realizing the vision of document PM Monte-Carlo Planning and Learning intelligence remains a research challenge that for Goal-Oriented Dialogues requires a multi-disciplinary perspective spanning 03:30 Posters + cofee break not only natural language processing and PM understanding, but also computer vision, 04:15 Invited talk - Gabriel Skantze knowledge representation and reasoning, PM information retrieval, and more -- all of which have been profoundly impacted and advanced by 93 Dec. 14, 2019

neural network-based approaches and deep Abstract 2: David Lewis: Artifcial Intelligence learning in the last few years. in Legal Discovery in Document Intelligence, We propose to organize a workshop for AI Lewis 08:10 AM researchers, academics and industry practitioners to discuss the opportunities and challenges for Abstract: In December 2006, a change to the US document intelligence. Federal Rules of Civil Procedure made “electronically stored information” – efectively every bit of storage in an enterprise – fair game Schedule for discovery requests in civil litigation. The result was a multi-billion dollar electronic discovery industry, a remarkable embrace by lawyers and 08:00 Opening Remarks judges of the artifacts of experimental machine AM learning (learning curves, efectiveness 08:10 David Lewis: Artifcial Intelligence in estimates, active learning,...), and a torrent of AM Legal Discovery Lewis technical challenges for machine learning, 09:05 Ndapa Nakashole: Generalizing natural language processing, information AM Representations of Language for retrieval, and statistics. I will discuss the state of Documents Analysis across Diferent e-discovery science and technology, and its Domains Nakashole spread to new applications such as internal 10:00 Cofee Break investigation and breach response. AM Biography: David D. Lewis, Ph.D. is Chief Data 10:30 Poster Teaser Presentations Scientist at Brainspace, a Cyxtera business, AM where he leads their research eforts as well as 12:05 Posters Denk, Androutsopoulos, the machine learning software development PM Bakhteev, Kane, Stojanov, Park, team. Prior to joining Brainspace, he was Mamidibathula, Liepieshov, Höhne, Feng, variously a freelance consultant, corporate Bayraktar, Aruleba, OGALTSOV, researcher (Bell Labs, AT&T Labs), research Kuznetsova, Bennett, , Fadnis, Lastras, professor, and software company co-founder. Jabbarzadeh Gangeh, Reisswig, Elwany, Dave has published more than 40 peer-reviewed Chalkidis, DeGange, Zhang, de Oliveira, scientifc publications and 9 patents. He was Koçyiğit, Dong, Liao, Hwang elected a Fellow of the American Association for 01:30 Rajasekar Krishnamurthy: Document Advancement of Science in 2006 for foundational PM Intelligence for Enterprise AI work in text categorization, and won a Test of Applications: Requirements & Time Award from ACM SIGIR in 2017 for his paper Research Challenges Krishnamurthy w/ Gale introducing uncertainty sampling 02:30 Asli Celikyilmaz: Learning Structure PM in Text Generation Celikyilmaz 03:30 Cofee Break Abstract 3: Ndapa Nakashole: Generalizing PM Representations of Language for 04:00 Discussion: Document Intelligence Documents Analysis across Diferent PM Research Challenges & Directions Domains in Document Intelligence, Nakashole 09:05 AM 05:00 Best Paper Talk: BERTGrid PM Contextualized Embedding for 2D Abstract. Labeled data for tasks such as Document Representation and information extraction, question answering, text Understanding classifcation, and other types of document 05:30 Summary of Workshop and Closing analysis are often drawn from a limited set of PM Remarks document types and genres because of availability, and cost. At test time, we would like to apply the trained models to diferent document Abstracts (6): types and genres. However, a model trained on one dataset often fails to generalize to data drawn from distributions other than that of the 94 Dec. 14, 2019

training data. In this talk, I will talk about our Nigel P. Dufy. work on generalizing representations of “CORD: A Consolidated Receipt Dataset for Post- language, and discuss some of the document OCR Parsing”, by Seunghyun Park, Seung Shin, types we are studying. Bado Lee, Junyeop Lee, Jaeheung Surh, Minjoon Biography: Ndapa Nakashole is an Assistant Seo, Hwalsuk Lee. Professor at the University of California, San “On recognition of Cyrillic Text”, by Kostiantyn Diego, where she teaches and carries out Liepieshov, Oles Dobosevych. research on Statistical Natural Language “Representation Learning in Geology and Processing. Before that she was postdoctoral GilBERT”, by Zikri Bayraktar, Hedi Driss, Marie scholar at Carnegie Mellon University. She Lefranc. obtained her PhD from Saarland University and “Neural Contract Element Extraction Revisited”, the Max Planck Institute for Informatics. She by Ilias Chalkidis, Manos Fergadiotis, Prodromos completed undergraduate studies in Computer Malakasiotis, Ion Androutsopoulos. Science at the University of Cape Town, South “Doc2Dial: a Framework for Dialogue Composition Africa. Grounded in Business Documents”, by Song Feng, Kshitij Fadni, Q. Vera Liao, Luis A. Lastras. “On Domain Transfer When Predicting Intent in

Abstract 5: Poster Teaser Presentations in Text”, by Petar Stojanov, Ahmed Hassan Document Intelligence, 10:30 AM Awadallah, Paul Bennett, Saghar Hosseini. “BERT Goes to Law School: Quantifying the Papers presented as re follows: Competitive Advantage of Access to Large Legal “Repurposing Decoder-Transformer Language Corpora in Contract Understanding", by Emad Models for Abstractive Summarization” by Luke Elwany, Dave Moore, Gaurav Oberoi. de Oliveira, and Alfredo Láinez Rodrigo. “Information Extraction from Text Regions with “From Stroke to Finite Automata: An Ofine Complex Structure”, by Kaixuan Zhang, Zejiang Recognition Approach”, by Kehinde Aruleba. Shen, Jie Zhou, Melissa Dell. “Post-OCR parsing: building simple and robust “DeepErase: Unsupervised Ink Artifact Removal in parser via BIO tagging”, by Wonseok Hwang, Document Text Images”, Yike Qi, W. Ronny Seonghyeon Kim, Minjoon Seo, Jinyeong Yim, Huang, Qianqian Li, Jonathan L. Degange. Seunghyun Park, Sungrae Park, Junyeop Lee, “Towards Neural Similarity Evaluator” , Hassan Bado Lee, Hwalsuk Lee. Kané, Yusuf Kocyigit, Pelkins Ajanoh, Ali Abdalla, “CrossLang: the system of cross-lingual Mohamed Coulibali. plagiarism detection”, by Oleg Bakhteev, Alexandr Ogaltsov, Andrey Khazov, Kamil Safn, Rita Kuznetsova. Abstract 6: Posters in Document Intelligence, “BERTgrid: Contextualized Embedding for 2D Denk, Androutsopoulos, Bakhteev, Kane, Document Representation and Understanding”, Stojanov, Park, Mamidibathula, Liepieshov, Timo I. Denk, Christian Reisswig. Höhne, Feng, Bayraktar, Aruleba, OGALTSOV, “Chargrid-OCR: End-to-end trainable Optical Kuznetsova, Bennett, , Fadnis, Lastras, Character Recognition through Semantic Jabbarzadeh Gangeh, Reisswig, Elwany, Chalkidis, Segmentation and Object Detection” by Christian DeGange, Zhang, de Oliveira, Koçyiğit, Dong, Reisswig, Anoop R Katti, Marco Spinaci, Johannes Liao, Hwang 12:05 PM Höhne. “SVDocNet: Spatially Variant U-Net for Blind Papers presented as re follows: Document Deblurring”, by Bharat Mamidibathula, “Repurposing Decoder-Transformer Language Prabir Kumar Biswas. Models for Abstractive Summarization” by Luke “Semantic Structure Extraction for Spreadsheet de Oliveira, and Alfredo Láinez Rodrigo. Tables with a Multi-task Learning Architecture", by “From Stroke to Finite Automata: An Ofine Haoyu Dong, Shijie Liu, Zhouyu Fu, Shi Han, Recognition Approach”, by Kehinde Aruleba. Dongmei Zhang. “Post-OCR parsing: building simple and robust “Document Enhancement System Using Auto- parser via BIO tagging”, by Wonseok Hwang, encoders", by Mehrdad J. Gangeh, Sunil R. Seonghyeon Kim, Minjoon Seo, Jinyeong Yim, Tiyyagura, Sridhar V. Dasaratha, Hamid Motahari, Seunghyun Park, Sungrae Park, Junyeop Lee, 95 Dec. 14, 2019

Bado Lee, Hwalsuk Lee. Kané, Yusuf Kocyigit, Pelkins Ajanoh, Ali Abdalla, “CrossLang: the system of cross-lingual Mohamed Coulibali. plagiarism detection”, by Oleg Bakhteev, Alexandr Ogaltsov, Andrey Khazov, Kamil Safn, Rita Kuznetsova. Abstract 7: Rajasekar Krishnamurthy: “BERTgrid: Contextualized Embedding for 2D Document Intelligence for Enterprise AI Document Representation and Understanding”, Applications: Requirements & Research Timo I. Denk, Christian Reisswig. Challenges in Document Intelligence, “Chargrid-OCR: End-to-end trainable Optical Krishnamurthy 01:30 PM Character Recognition through Semantic Segmentation and Object Detection” by Christian Abstract: Enterprise applications and Business Reisswig, Anoop R Katti, Marco Spinaci, Johannes processes rely heavily on experts and knowledge Höhne. workers reading, searching and analyzing “SVDocNet: Spatially Variant U-Net for Blind business documents to perform their daily tasks. Document Deblurring”, by Bharat Mamidibathula, For instance, legal professionals read contracts to Prabir Kumar Biswas. identify non-standard clauses, risks and “Semantic Structure Extraction for Spreadsheet exposures. Loan ofcers analyze borrower Tables with a Multi-task Learning Architecture", by business documents to understand income, Haoyu Dong, Shijie Liu, Zhouyu Fu, Shi Han, expense and contractual commitments before Dongmei Zhang. making lending decisions. “Document Enhancement System Using Auto- Document Intelligence is the ability for a system encoders", by Mehrdad J. Gangeh, Sunil R. to read, understand and interpret business Tiyyagura, Sridhar V. Dasaratha, Hamid Motahari, documents through the application of AI-based Nigel P. Dufy. technologies. It has the potential to signifcantly “CORD: A Consolidated Receipt Dataset for Post- improve an employee's productivity and an OCR Parsing”, by Seunghyun Park, Seung Shin, organization's efectiveness by augmenting the Bado Lee, Junyeop Lee, Jaeheung Surh, Minjoon expert in their daily task. Several challenges arise Seo, Hwalsuk Lee. in this context such as variability in document “On recognition of Cyrillic Text”, by Kostiantyn authoring, necessity to contextually understand Liepieshov, Oles Dobosevych. textual and tabular content and organization/role- “Representation Learning in Geology and specifc variations in semantic interpretations. GilBERT”, by Zikri Bayraktar, Hedi Driss, Marie Furthermore, as experts rely on document Lefranc. intelligence, they expect the system to exhibit “Neural Contract Element Extraction Revisited”, key properties such as explainability, consistent by Ilias Chalkidis, Manos Fergadiotis, Prodromos model evolution and ability to enhance the Malakasiotis, Ion Androutsopoulos. system's knowledge with a few examples. “Doc2Dial: a Framework for Dialogue Composition In this talk, using real-world enterprise Grounded in Business Documents”, by Song application examples, I frst describe how Feng, Kshitij Fadni, Q. Vera Liao, Luis A. Lastras. document intelligence can play a key role in “On Domain Transfer When Predicting Intent in augmenting enterprise AI applications. I then Text”, by Petar Stojanov, Ahmed Hassan outline key challenges that arise in business Awadallah, Paul Bennett, Saghar Hosseini. document understanding and desiderata that “BERT Goes to Law School: Quantifying the enterprise AI applications and users expect. 
I Competitive Advantage of Access to Large Legal conclude with a set of open research challenges Corpora in Contract Understanding", by Emad that need to be tackled spanning across language Elwany, Dave Moore, Gaurav Oberoi. understanding, knowledge representation and “Information Extraction from Text Regions with reasoning, deep learning and systems research. Complex Structure”, by Kaixuan Zhang, Zejiang Biography: Rajasekar Krishnamurthy is a Principal Shen, Jie Zhou, Melissa Dell. Research Staf Member and Senior Manager “DeepErase: Unsupervised Ink Artifact Removal in leading the Watson Discovery team in the Watson Document Text Images”, Yike Qi, W. Ronny AI organization. Prior to this role, he was a Huang, Qianqian Li, Jonathan L. Degange. Principal Research Staf Member at IBM Research “Towards Neural Similarity Evaluator” , Hassan - Almaden leading the NLP, Entity Resolution and 96 Dec. 14, 2019

Discovery department. Rajasekar's technical Washington. She is also an Afliate Professor at interests focus around helping enterprises derive the University of Washington. Her research business insights from a variety of unstructured interests are mainly in deep learning and natural content sources ranging from public and third- language, specifcally on language generation party data sources to governing business with long-term coherence, language documents within an enterprise. Rajasekar has understanding, language grounding with vision, expertise in building scalable and usable and building intelligent agents for human- analytics tools for individual stages in analyzing computer interaction She has received several unstructured documents, such as text analytics, “best of” awards including NAFIPS 2007, document structure analysis and entity Semantic Computing 2009, and CVPR 2019. resolution. He is a member of the IBM Academy of Technology. He received a B.Tech in Computer Science and Engineering from the Indian Institute of Technology-Madras, and a Ph.D.in Computer Learning Transferable Skills Science from the University of Wisconsin- Madison. Marwan Mattar, Arthur Juliani, Danny Lange, Matthew Crosby, Benjamin Beyret

West 211 - 214, Sat Dec 14, 08:00 AM Abstract 8: Asli Celikyilmaz: Learning Structure in Text Generation in Document After spending several decades on the margin of Intelligence, Celikyilmaz 02:30 PM AI, reinforcement learning has recently emerged as a powerful framework for developing Abstract: Automatic text generation enables intelligent systems that can solve complex tasks computers to summarize text, describe pictures in real-world environments. This has had a to visually impaired, write stories or articles tremendous impact on a wide range of tasks about an event, have conversations in customer- ranging from playing games such as Go and service, chit-chat with individuals, and other StarCraft to learning dexterity. However, one settings, and customize content based on the attribute of intelligence that still eludes modern characteristics and goal of the human learning systems is generalizability. Until very interlocutor. Neural text generation (NLG) – using recently, the majority of reinforcement learning neural network models to generate coherent text research involved training and testing algorithms – have seen a paradigm shift in the last years, on the same, sometimes deterministic, caused by the advances in deep contextual environment. This has resulted in algorithms that language modeling (e.g., LSTMs, GPT, GPT2) and learn policies that typically perform poorly when transfer learning (e.g., ELMo, BERT). While these deployed in environments that difer, even tools have dramatically improved the state of slightly, from those they were trained on. Even NLG, particularly for low resources tasks, state-of- more importantly, the paradigm of task-specifc the-art NLG models still face many challenges: a training results in learning systems that scale lack of diversity in generated text, commonsense poorly to a large number of (even interrelated) violations in depicted situations, difculties in tasks. making use of factual information, and difculties in designing reliable evaluation metrics. In this Recently there has been an enduring interest in talk I will discuss existing work on text only developing learning systems that can learn transformers that specifes how to generate long- transferable skills. This could mean robustness to text with better discourse structure and narrative changing environment dynamics, the ability to fow, generate multi-document summaries, build quickly adapt to environment and task variations automatic knowledge graphs with commonsense or the ability to learn to perform multiple tasks at transformers as text generators. I will conclude once (or any combination thereof). This interest the talk with a discussion of current challenges has also resulted in a number of new data sets and shortcomings of neural text generation, and challenges (e.g. Obstacle Tower Environment, pointing to avenues for future research. Animal-AI, CoinRun) and an urgency to Biography: Asli Celikyilmaz is a Principal standardize the metrics and evaluation protocols Researcher at Microsoft Research in Redmond, 97 Dec. 14, 2019

to better assess the generalization abilities of 05:45 Closing Remarks Mattar, Juliani, novel algorithms. We expect this area to continue PM Crosby, Beyret, Lange to increase in popularity and importance, but this can only happen if we manage to build consensus on which approaches are promising, and, equally Sets and Partitions important, how to test them.

Nicholas Monath, Manzil Zaheer, Andrew McCallum, The workshop will include a mix of invited Ari Kobren, Junier Oliva, Barnabas Poczos, Ruslan speakers, accepted papers (oral and poster Salakhutdinov sessions) and a panel discussion. The workshop welcomes both theoretical and applied research, West 215 + 216, Sat Dec 14, 08:00 AM in addition to novel data sets and evaluation protocols. Classic problems for which the input and/or output is set-valued are ubiquitous in machine learning. For example, multi-instance learning, Schedule estimating population statistics, and point cloud classifcation are all problem domains in which the input is set-valued. In multi-label 09:00 Opening Remarks Mattar, Juliani, classifcation the output is a set of labels, and in AM Crosby, Beyret, Lange clustering, the output is a partition. New tasks 09:15 Challenges of Deep RL in Complex that take sets as input are also rapidly emerging AM Environments Hadsell in a variety of application areas including: high 10:00 Cofee Break energy physics, cosmology, crystallography, and AM art. As a natural means of succinctly capturing 10:30 Environments and Data Sets Cobbe, large collections of items, techniques for learning AM De Fabritiis, Makoviichuk representations of sets and partitions have 11:20 Vladlen Koltun (Intel) Koltun signifcant potential to enhance scalability, AM capture complex dependencies, and improve 12:05 Lunch interpretability. The importance and potential of PM improved set processing has led to recent work 01:30 Innate Bodies, Innate Brains, and on permutation invariant and equivariant PM Innate World Models Ha representations (Ravanbakhsh et al, 2016; 02:15 Oral Presentations Petangoda, Zaheer et al, 2017; Ilse et al, 2018; Hartford et al, PM Pascual-Diaz, Grau-Moya, Marinier, 2018; Lee et al, 2019, Cotter et al, 2019, Bloom- Pietquin, Efros, Isola, Darrell, Lu, Pathak, Reddy & Teh, 2019, and more) and continuous Ferret representations of set-based outputs and partitions (Tai and Lin, 2012; Belanger & 03:15 Poster Presentations Mehta, McCallum, 2015; Wiseman et al, 2016; Caron et PM Lampinen, Chen, Pascual-Diaz, Grau- al, 2018; Zhang et al, 2019; Vikram et al 2019). Moya, Faisal, Tompson, Lu, Khetarpal, Klissarov, Bacon, Precup, Kurutach, The goal of this workshop is to explore: Tamar, Abbeel, He, Igl, Whiteson, - Permutation invariant and equivariant Boehmer, Marinier, Pietquin, Hausman, representations; empirical performance, Levine, Finn, Yu, Lee, Eysenbach, limitations, implications, inductive biases of Parisotto, Xing, Salakhutdinov, Ren, proposed representations of sets and partitions, Anandkumar, Pathak, Lu, Darrell, Efros, as well as rich models of interaction among set Isola, Liu, Han, Niu, Sugiyama, Kumar, elements; Petangoda, Ferret, McClelland, Liu, Garg, - Inference methods for predicting sets or Lange clusterings; approaches based on gradient- 04:15 Multi-Task Reinforcement Learning descent, continuous representations, amenable PM and Generalization Hofmann to end-to-end optimization with other models; 05:00 Solving Rubik’s Cube with a Robot - New applications of set and partition-based PM Hand Zaremba models. 98 Dec. 14, 2019

09:45 Cofee Break & Poster Session 1 The First Workshop on Sets and Partitions, to be AM Zhang, Hare, Prugel-Bennett, Leung, held as a part of the NeurIPS 2019 conference, Flaherty, Wiratchotisatian, Epasto, focuses on models for tasks with set-based Lattanzi, Vassilvitskii, Zadimoghaddam, inputs/outputs as well as models of partitions and Tulabandhula, Fuchs, Kosiorek, Posner, novel clustering methodology. The workshop Hang, Goldie, Ravi, Mirhoseini, Xiong, welcomes both methodological and theoretical Ren, Liao, Urtasun, Zhang, Borassi, Luo, contributions, and also new applications. Trapp, Dubourg-Felonneau, Kussad, Connections to related problems in optimization, Bender, Zaheer, Oliva, Stypułkowski, algorithms, theory as well as investigations of Zieba, Dill, Li, Ge, Kang, Parker Jones, learning approaches to set/partition problems are Wong, Payne, Li, Nazi, Erdem, Erdem, also highly relevant to the workshop. We invite O'Connor, Garcia, Zamorski, Chorowski, both paper submissions and submissions of open Sinha, Cliford, Cassidy problems. We hope that the workshops will 10:30 Invited Talk - Siamak Ravanbakhsh - inspire further progress in this important feld. AM Equivariant Multilayer Perceptrons Ravanbakhsh Organizing Committee: 11:15 Contributed Talk - Towards deep Andrew McCallum, UMass Amherst AM amortized clustering Lee, Lee, Teh Ruslan Salakhutdinov, CMU 11:30 Contributed Talk - Fair Hierarchical Barnabas Poczos, CMU AM Clustering Ahmadian, Epasto, Knittel, Junier Oliva, UNC Chapel Hill Kumar, Mahdian, Pham Manzil Zaheer, Google Research 11:45 Invited Talk - Abhishek Khetan - Ari Kobren, UMass Amherst AM Molecular geometries as point Nicholas Monath, UMass Amherst clouds: Learning physico-chemical with senior advisory support from Alex Smola. properties using DeepSets Khetan 12:30 Lunch Break (on your own) Invited Speakers: PM Siamak Ravanbakhsh Abhishek Khetan 02:00 Contributed Talk - Limitations of Eunsu Kang PM Deep Learning on Point Clouds Bueno Amr Ahmed 02:15 Contributed Talk - Chirality Nets: Stefanie Jegelka PM Exploiting Structure in Human Pose Regression Yeh, Hu, Schwing 02:30 Invited Talk - Eunsu Kang - Sets for Schedule PM Arts Kang 03:15 Cofee Break & Poster Session 2 Lee, 08:45 Opening Remarks Zaheer, Monath, PM Lee, Teh, Yeh, Hu, Schwing, Ahmadian, AM Kobren, Oliva, Poczos, Salakhutdinov, Epasto, Knittel, Kumar, Mahdian, Bueno, McCallum Sanghi, Jayaraman, Arroyo-Fernández, 09:00 Invited Talk - Stefanie Jegelka - Set Hryniowski, Mathur, Singh, Haddadan, AM Representations in Graph Neural Portilheiro, Zhang, Yuksekgonul, Arias Networks and Reasoning Jegelka Figueroa, Maurya, Ravindran, NIELSEN, Pham, Payan, McCallum, Mehta, Sun 04:15 Invited Talk - Alexander J. Smola - PM Sets and symmetries Smola 05:00 Panel Discussion PM 05:40 Closing Remarks PM

Abstracts (3): 99 Dec. 14, 2019

Abstract 3: Cofee Break & Poster Session 1 in Sets and Partitions, Zhang, Hare, Prugel- Clustering by Learning to Optimize Normalized Bennett, Leung, Flaherty, Wiratchotisatian, Cuts. Azade Nazi, Will Hang, Anna Goldie, Sujith Epasto, Lattanzi, Vassilvitskii, Zadimoghaddam, Ravi, Azalia Mirhoseini Tulabandhula, Fuchs, Kosiorek, Posner, Hang, Goldie, Ravi, Mirhoseini, Xiong, Ren, Liao, Deformable Filter Convolution for Point Cloud Urtasun, Zhang, Borassi, Luo, Trapp, Dubourg- Reasoning. Yuwen Xiong, Mengye Ren, Renjie Felonneau, Kussad, Bender, Zaheer, Oliva, Liao, Wong, Raquel Urtasun Stypułkowski, Zieba, Dill, Li, Ge, Kang, Parker Jones, Wong, Payne, Li, Nazi, Erdem, Erdem, Learning Embeddings from Cancer Mutation Sets O'Connor, Garcia, Zamorski, Chorowski, Sinha, for Classifcation Tasks. Geofroy Dubourg- Cliford, Cassidy 09:45 AM Felonneau, Yasmeen Kussad, Dominic Kirkham, John Cassidy, Harry W Cliford Poster Session 1 Paper Titles & Authors:

Exchangeable Generative Models with Flow Deep Set Prediction Networks. Yan Zhang, Scans. Christopher M. Bender, Kevin O'Connor, Jonathon Hare, Adam Prügel-Bennett Yang Li, Juan Jose Garcia, Manzil Zaheer, Junier Oliva Deep Hyperedges: a Framework for Transductive and Inductive Learning on Hypergraphs. Joshua Conditional Invertible Flow for Point Cloud Payne Generation. Stypulkowski Michal, Zamorski Maciej, Zieba Maciej, Chorowski Jan FSPool: Learning Set Representations with Featurewise Sort Pooling. Yan Zhang, Jonathon Getting Topology and Point Cloud Generation to Hare, Adam Prügel-Bennett Mesh. Austin Dill, Chun-Liang Li, Songwei Ge, Eunsu Kang Deep Learning Features Through Dictionary Learning with Improved Clustering for Image Distributed Balanced Partitioning and Classifcation. Shengda Luo, Alex Po Leung, Haici Applications in Large-scale Load Balancing. Aaron Zhang Archer, Kevin Aydin, MohammadHossein Bateni, Vahab Mirrokni, Aaron Schild, Ray Yang, Richard Globally Optimal Model-based Clustering via Zhuang Mixed Integer Nonlinear Programming. Patrick Flaherty, Pitchaya Wiratchotisatian, Andrew C. Trapp Abstract 9: Contributed Talk - Limitations of

Sliding Window Algorithms for k-Clustering Deep Learning on Point Clouds in Sets and Problems. Michele Borassi, Alessandro Epasto, Partitions, Bueno 02:00 PM Silvio Lattanzi, Sergei Vassilvitski, Morteza Limitations of Deep Learning on Point Clouds Zadimoghaddam Christian Bueno, Alan G. Hylton

Optimized Recommendations When Customers Select Multiple Products. Prasoon Patidar, Deeksha Sinha, Theja Tulabandhula Abstract 12: Cofee Break & Poster Session 2 in Sets and Partitions, Lee, Lee, Teh, Yeh, Hu, Schwing, Ahmadian, Epasto, Knittel, Kumar, Manipulating Person Videos with Natural Language. Levent Karacan, Mehmet Gunel, Aykut Mahdian, Bueno, Sanghi, Jayaraman, Arroyo- Erdem, Erkut Erdem Fernández, Hryniowski, Mathur, Singh, Haddadan, Portilheiro, Zhang, Yuksekgonul, Arias Figueroa, Maurya, Ravindran, NIELSEN, Pham, Payan, Permutation Invariance and Relational Reasoning McCallum, Mehta, Sun 03:15 PM in Multi-Object Tracking. Fabian B. Fuchs, Adam R. Kosiorek, Li Sun, Oiwi Parker Jones, Ingmar Poster Session 2 Paper Titles & Authors: Posner. 100 Dec. 14, 2019

Towards deep amortized clustering. Juho Lee, Yoonho Lee, Yee Whye Teh Hypergraph Partitioning using Tensor Eigenvalue Decomposition. Deepak Maurya, Balaraman Chirality Nets: Exploiting Structure in Human Ravindran, Shankar Narasimhan Pose Regression. Raymond Yeh, Yuan-Ting Hu, Alexander Schwing Information Geometric Set Embeddings: From Sets to Distributions. Ke Sun, Frank Nielsen Fair Hierarchical Clustering. Sara Ahmadian, Alessandro Epasto, Marina Knittel, Ravi Kumar, Document Representations using Fine-Grained Mohammad Mahdian, Philip Pham Topics. Justin Payan, Andrew McCallum

Limitations of Deep Learning on Point Clouds. Christian Bueno, Alan G. Hylton Context and Compositionality in Biological How Powerful Are Randomly Initialized Pointcloud and Artifcial Neural Systems Set Functions? Aditya Sanghi, Pradeep Kumar

Jayaraman

On the Possibility of Rewarding Structure Learning Agents: Mutual Information on Linguistic Random Sets. Ignacio Arroyo-Fernández, Mauricio Carrasco-Ruiz, José Anibal Arias-Aguilar

Modelling Convolution as a Finite Set of Operations Through Transformation Semigroup Theory. Andrew Hryniowski, Alexander Wong

HCA-DBSCAN: HyperCube Accelerated Density Based Spatial Clustering for Applications with Noise. Vinayak Mathur, Jinesh Mehta, Sanjay Singh

Finding densest subgraph in probabilistically evolving graphs. Sara Ahmadian, Shahrzad Haddadan

Representation Learning with Multisets. Vasco Portilheiro

PairNets: Novel Fast Shallow Artificial Neural Networks on Partitioned Subspaces. Luna Zhang

Fair Correlation Clustering. Sara Ahmadian, Alessandro Epasto, Ravi Kumar, Mohammad Mahdian

Learning Maximally Predictive Prototypes in Multiple Instance Learning. Mert Yuksekgonul, Ozgur Emre Sivrikaya, Mustafa Gokce Baydogan

Deep Clustering using MMD Variational Autoencoder and Traditional Clustering Algorithms. Jhosimar Arias


Context and Compositionality in Biological and Artificial Neural Systems

Javier Turek, Shailee Jain, Alexander Huth, Leila Wehbe, Emma Strubell, Alan Yuille, Tal Linzen, Christopher Honey, Kyunghyun Cho

West 217 - 219, Sat Dec 14, 08:00 AM

The ability to integrate semantic information across narratives is fundamental to language understanding in both biological and artificial cognitive systems. In recent years, enormous strides have been made in NLP and Machine Learning to develop architectures and techniques that effectively capture these effects. The field has moved away from traditional bag-of-words approaches that ignore temporal ordering, and instead embraced RNNs, Temporal CNNs and Transformers, which incorporate contextual information at varying timescales. While these architectures have led to state-of-the-art performance on many difficult language understanding tasks, it is unclear what representations these networks learn and how exactly they incorporate context. Interpreting these networks, systematically analyzing the advantages and disadvantages of different elements, such as gating or attention, and reflecting on the capacity of the networks across various timescales are open and important questions.

On the biological side, recent work in neuroscience suggests that areas in the brain are organized into a temporal hierarchy in which different areas are not only sensitive to specific semantic information but also to the composition of information at different timescales. Computational neuroscience has moved in the direction of leveraging deep learning to gain insights about the brain. By answering questions on the underlying mechanisms and representational interpretability of these artificial networks, we can also expand our understanding of temporal hierarchies, memory, and capacity effects in the brain.

In this workshop we aim to bring together researchers from machine learning, NLP, and neuroscience to explore and discuss how computational models should effectively capture the multi-timescale, context-dependent effects that seem essential for processes such as language understanding.

We invite you to submit papers related to the following (non-exhaustive) topics:
* Contextual sequence processing in the human brain
* Compositional representations in the human brain
* Systematic generalization in deep learning
* Compositionality in human intelligence
* Compositionality in natural language
* Understanding composition and temporal processing in neural network models
* New approaches to compositionality and temporal processing in language
* Hierarchical representations of temporal information
* Datasets for contextual sequence processing
* Applications of compositional neural networks to real-world problems

Submissions should be up to 4 pages excluding references, and should be in NIPS format and anonymous. The review process is double-blind.

We also welcome published papers that are within the scope of the workshop (without re-formatting). These papers do not have to be anonymous, and will only receive a very light review.

Schedule

08:00 AM Opening Remarks Huth
08:15 AM Gary Marcus - Deep Understanding: The Next Challenge for AI Marcus
09:00 AM Gina Kuperberg - How probabilistic is language comprehension in the brain? Insights from multimodal neuroimaging studies Kuperberg
09:45 AM Poster Session + Break
10:30 AM Uncovering the compositional structure of vector representations with Role Learning Networks Soulos
10:40 AM Spiking Recurrent Networks as a Model to Probe Neuronal Timescales Specific to Working Memory Kim
10:50 AM Learning Compositional Rules via Neural Program Synthesis Nye
11:00 AM Tom Mitchell - Understanding Neural Processes: Getting Beyond Where and When, to How Mitchell
12:00 PM Poster Session + Lunch Nye, Kim, St Clere Smithe, Itoh, Florez, G. Djokic, Aenugu, Toneva, Schlag, Schwartz, Sobroza Marques, Sainath, Li, Bommasani, Kim, Soulos, Frankland, Chirkova, Han, Kortylewski, Pang, Rabovsky, Mamou, Kumar, Marra
02:00 PM Yoshua Bengio - Towards compositional understanding of the world by agent-based deep learning Bengio
03:00 PM Ev Fedorenko - Composition as the core driver of the human language system Fedorenko
03:30 PM Break
04:00 PM Panel Discussion Willke, Fedorenko, Lee, Smolensky
05:30 PM Closing remarks Wehbe

Abstracts (9):

Abstract 1: Opening Remarks in Context and Compositionality in Biological and Artificial Neural Systems, Huth 08:00 AM

Note: schedule not final and may change

Abstract 2: Gary Marcus - Deep Understanding: The Next Challenge for AI in Context and Compositionality in Biological and Artificial Neural Systems, Marcus 08:15 AM

Note: schedule not final and may change

Abstract 3: Gina Kuperberg - How probabilistic is language comprehension in the brain? Insights from multimodal neuroimaging studies in Context and Compositionality in Biological and Artificial Neural Systems, Kuperberg 09:00 AM

Note: schedule not final and may change

Abstract 5: Uncovering the compositional structure of vector representations with Role Learning Networks in Context and Compositionality in Biological and Artificial Neural Systems, Soulos 10:30 AM

By Paul Soulos, R. Thomas Mccoy, Tal Linzen, Paul Smolensky

Abstract 6: Spiking Recurrent Networks as a Model to Probe Neuronal Timescales Specific to Working Memory in Context and Compositionality in Biological and Artificial Neural Systems, Kim 10:40 AM

By Robert Kim, Terry Sejnowski

Abstract 7: Learning Compositional Rules via Neural Program Synthesis in Context and Compositionality in Biological and Artificial Neural Systems, Nye 10:50 AM

By Nye, Armando Solar-Lezama, Joshua Tenenbaum, Brenden Lake

Abstract 8: Tom Mitchell - Understanding Neural Processes: Getting Beyond Where and When, to How in Context and Compositionality in Biological and Artificial Neural Systems, Mitchell 11:00 AM

Cognitive neuroscience has always sought to understand the computational processes that occur in the brain. Despite this, years of brain imaging studies have shown us only where in the brain we can observe neural activity correlated with particular types of processing, and when. They have taught us remarkably little about the key question of how the brain computes the neural representations we observe.

The good news is that a new paradigm has begun to emerge over the past few years, to directly address the how question. The key idea in this paradigm shift is to create explicit hypotheses concerning how computation is done in the brain, in the form of computer programs that perform the same computation (e.g., visual object recognition, sentence processing, equation solving). Alternative hypotheses can then be tested to see which computer program aligns best with the observed neural activity when humans and the program process the same input stimuli. We will use our work studying language processing as a case study to illustrate this new paradigm, in our case using ELMo and BERT deep neural networks as the computer programs that process the same input sentences as the human. Using this case study, we will examine the potential and the limits of this new paradigm as a route toward understanding how the brain computes.

Abstract 10: Yoshua Bengio - Towards compositional understanding of the world by agent-based deep learning in Context and Compositionality in Biological and Artificial Neural Systems, Bengio 02:00 PM

Note: schedule not final and may change

Abstract 13: Panel Discussion in Context and Compositionality in Biological and Artificial Neural Systems, Willke, Fedorenko, Lee, Smolensky 04:00 PM

Note: schedule not final and may change
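The alignment paradigm described in Abstract 8, fitting a linear map from a candidate model's representations to recorded brain activity and scoring held-out predictions, is simple to sketch. The following is a minimal illustration on hypothetical data (random arrays standing in for per-word language-model features and fMRI responses); it is not the speakers' actual pipeline.

    # Minimal sketch of the encoding-model paradigm from Abstract 8 (hypothetical
    # data; not the speakers' pipeline). A model "aligns" with the brain to the
    # extent that a linear map from its representations predicts held-out activity.
    import numpy as np
    from sklearn.linear_model import RidgeCV
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n_words, n_features, n_voxels = 1000, 768, 50
    embeddings = rng.normal(size=(n_words, n_features))  # e.g., BERT states per word
    brain = rng.normal(size=(n_words, n_voxels))         # e.g., fMRI response per word

    X_tr, X_te, Y_tr, Y_te = train_test_split(
        embeddings, brain, test_size=0.2, random_state=0)
    model = RidgeCV(alphas=np.logspace(-2, 4, 7)).fit(X_tr, Y_tr)
    pred = model.predict(X_te)

    # Per-voxel correlation between predicted and observed activity is the usual
    # score for comparing competing hypotheses (ELMo vs. BERT, layer by layer).
    scores = [np.corrcoef(pred[:, v], Y_te[:, v])[0, 1] for v in range(n_voxels)]
    print(f"mean held-out correlation: {np.mean(scores):.3f}")

Competing hypotheses (e.g., ELMo vs. BERT, or different layers of one model) are then ranked by how well each one's features predict held-out activity, which is the sense of "aligns best" in the abstract.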

Robot Learning: Control and Interaction in the Real World

Roberto Calandra, Kate Rakelly, Sanket Kamthe, Danica Kragic, Stefan Schaal, Markus Wulfmeier

West 220 - 222, Sat Dec 14, 08:00 AM

The growing capabilities of learning-based methods in control and robotics have precipitated a shift in the design of software for autonomous systems. Recent successes fuel the hope that robots will increasingly perform varying tasks working alongside humans in complex, dynamic environments. However, the application of learning approaches to real-world robotic systems has been limited because real-world scenarios introduce challenges that do not arise in simulation.

In this workshop, we aim to identify and tackle the main challenges to learning on real robotic systems. First, most machine learning methods rely on large quantities of labeled data. While raw sensor data is available at high rates, the required variety is hard to obtain and the human effort to annotate or design reward functions is an even larger burden. Second, algorithms must guarantee some measure of safety and robustness to be deployed in real systems that interact with property and people. Instantaneous reset mechanisms, common in simulation for recovering from even critical failures, present a great challenge for real robots. Third, the real world is significantly more complex and varied than curated datasets and simulations. Successful approaches must scale to this complexity and be able to adapt to novel situations.

Schedule

09:00 AM Introduction and opening remarks
09:15 AM Invited Talk - Marc Deisenroth Deisenroth (he/him)
09:45 AM Coffee Break
10:30 AM Posters 1
11:15 AM Contributed Talk - Laura Smith Smith
11:30 AM Invited Talk - Takayuki Osa Osa
12:00 PM Lunch Break
01:30 PM Invited Talk - Raia Hadsell Hadsell
02:00 PM Invited Talk - Nima Fazeli Fazeli
02:30 PM Posters 2
03:30 PM Coffee Break
04:00 PM Contributed Talk (Best Paper) - Michelle Lee & Carlos Florensa Florensa, Lee
04:15 PM Invited Talk - Angela Schoellig Schoellig
04:45 PM Invited Talk - Edward Johns Johns
05:15 PM Panel

Abstracts (4):

Abstract 4: Posters 1 in Robot Learning: Control and Interaction in the Real World, 10:30 AM

All poster presenters are welcome to present at both poster sessions. Please see the list of accepted papers at our website: http://www.robot-learning.ml/2019/

Abstract 5: Contributed Talk - Laura Smith in Robot Learning: Control and Interaction in the Real World, Smith 11:15 AM

AVID: Translating Human Demonstrations for Automated Learning

Abstract 10: Posters 2 in Robot Learning: Control and Interaction in the Real World, 02:30 PM

All poster presenters are welcome to present at both poster sessions. Please see the list of accepted papers at our website: http://www.robot-learning.ml/2019/

Abstract 12: Contributed Talk (Best Paper) - Michelle Lee & Carlos Florensa in Robot Learning: Control and Interaction in the Real World, Florensa, Lee 04:00 PM

Combining Model-Free and Model-Based Strategies for Sample-Efficient Reinforcement Learning


NeurIPS Workshop on Machine Learning for Creativity and Design 3.0

Luba Elliott, Sander Dieleman, Adam Roberts, Jesse Engel, Tom White, Rebecca Fiebrink, Parag Mital, Christine Payne, Nao Tokui

West 223 + 224, Sat Dec 14, 08:00 AM

Generative machine learning and machine creativity have continued to grow and attract a wider audience to machine learning. Generative models enable new types of media creation across images, music, and text - including recent advances such as StyleGAN, MuseNet and GPT-2. This one-day workshop broadly explores issues in the applications of machine learning to creativity and design. We will look at algorithms for generation and creation of new media, engaging researchers building the next generation of generative models (GANs, RL, etc). We investigate the social and cultural impact of these new models, engaging researchers from HCI/UX communities and those using machine learning to develop new creative tools. In addition to covering the technical advances, we also address the ethical concerns ranging from the use of biased datasets to the use of synthetic media such as "DeepFakes". Finally, we'll hear from some of the artists and musicians who are adopting machine learning including deep learning and reinforcement learning as part of their own artistic process. We aim to balance the technical issues and challenges of applying the latest generative models to creativity and design with philosophical and cultural issues that surround this area of research.

Schedule

08:15 AM Welcome and Introduction
08:30 AM Transfer Learning for Text Generation Radford
09:00 AM Deepfakes: commodification, consequences and countermeasures Patrini
09:30 AM AI Art Gallery Overview Elliott
09:45 AM Coffee Break
10:30 AM Artist Lightning Talks Hastie, Petric, Thio
10:30 AM Yann LeCun (?)
11:00 AM Neural Painters: A learned differentiable constraint for generating brushstroke paintings Nakano
11:10 AM Transform the Set: Memory Attentive Generation of Guided and Unguided Image Collages Jetchev, Vollgraf
11:20 AM Paper Dreams: An Interactive Interface for Generative Visual Expression Bernal, Zhou
11:30 AM Deep reinforcement learning for 2D soft body locomotion Rojas
11:40 AM Towards Sustainable Architecture: 3D Convolutional Neural Networks for Computational Fluid Dynamics Simulation and Reverse Design Workflow Musil
11:50 AM Human and GAN collaboration to create haute couture dress Seita, Koga
12:00 PM Lunch Break
01:30 PM Poster Session 1 Lee, Saeed, Broad, Gillick, Hertzmann, Aggarwal, Sung, Champandard, Park, Mellor, Herrmann, Wu, Lee, Jieun, Han, jung, Kim
02:30 PM How to Chain Trip Evans, Bechtolt, Kieswetter
03:00 PM Sougwen Chung Chung
03:30 PM Coffee Break

04:15 PM MidiMe: Personalizing a MusicVAE model with user data Dinculescu
04:25 PM First Steps Towards Collaborative Poetry Generation Uthus, Voitovich
04:35 PM Panel Discussion
05:00 PM Poster Session 2 Saxena, Frosst, Cabannes, Kogan, Dill, Sarkar, Moniz, Thio, Coleman, De Bleser, Quanz, Kereliuk, Achlioptas, Elhoseiny, Ge, Gomez, Brew
05:05 PM Artwork Sarin, Bourached, Carr, Zukowski, Zhou, Malakhova, Petric, Laurenzo, O'Brien, Wegner, Kishi, Burnam


Medical Imaging meets NeurIPS

Hervé Lombaert, Ben Glocker, Ender Konukoglu, Marleen de Bruijne, Aasa Feragen, Ipek Oguz, Jonas Teuwen

West 301 - 305, Sat Dec 14, 08:00 AM

Medical imaging and radiology are facing a major crisis with an ever-increasing complexity and volume of data alongside immense economic pressure. The current advances and widespread use of imaging technologies now overload the human capacity for interpreting medical images, dangerously posing a risk of missing critical patterns of disease. Machine learning has emerged as a key technology for developing novel tools in computer aided diagnosis, therapy and intervention. Still, progress is slow compared to other fields of visual recognition, which is mainly due to the domain complexity and constraints in clinical applications, i.e., robustness, high accuracy and reliability. "Medical Imaging meets NeurIPS" aims to bring researchers together from the medical imaging and machine learning communities to discuss the major challenges in the field and opportunities for research and novel applications. The proposed event will be the continuation of a successful workshop organized in NeurIPS 2017 and 2018 (https://sites.google.com/view/med-nips-2018). It will feature a series of invited speakers from academia, medical sciences and industry to give an overview of recent technological advances and remaining major challenges.

Schedule

08:00 AM Opening Remarks Lombaert, Glocker, Konukoglu, de Bruijne, Feragen, Oguz, Teuwen
08:15 AM Keynote I – Rene Vidal (Johns Hopkins University) Vidal
09:00 AM Oral Session I – Methods Weng, Kohl, Patra
10:00 AM Coffee Break + Poster Session I Weng, Kohl, Taleb, Patra, Namdar, Perkonigg, Gong, Imran, Abdi, Manakov, Paetzold, Glocker, Sahoo, Fadnavis, Roth, Liu, Zhang, Preuhs, Eitel, Trivedi, Weiss, Stern, Vazquez Romaguera, Hofmanninger, Kaku, Olatunji, Razdaibiedina, Zhang
10:30 AM Keynote II – Julia Schnabel (King's College London) Schnabel
11:15 AM Oral Session II – Image Analysis and Segmentation Taleb, Namdar, Perkonigg, Gong
12:35 PM Lunch
02:00 PM Keynote III – Leo Grady (Paige AI) Grady
02:45 PM Oral Session III – Imaging Parmar, Hallgrimsson, Kames
03:45 PM Coffee Break + Poster Session II Parmar, Hallgrimsson, Kames, Patra, Imran, Yang, Zimmerer, Chakravarty, Schobs, Gossmann, CHEN, Dutt, Yao, Martinez Manzanera, Pinckaers, Dalmis, Gupta, Haq, Ruhe, Gamper, De Goyeneche Macaya, Tamir, Jeon, OOTA, Heckel, Douglas, Sidorov, Wang, Garcia, Soni, Shukla
04:15 PM Keynote IV – Daniel Sodickson (NYU Langone Health) Sodickson
05:00 PM fastMRI Challenge Talks Yakubova, Pezzotti, Wang, Zitnick, Karkalousos, Sun, Caan, Murrell, Putzky
06:00 PM Closing Remarks

Abstracts (10):

Abstract 2: Keynote I – Rene Vidal (Johns Hopkins University) in Medical Imaging meets NeurIPS, Vidal 08:15 AM

Machine Learning in Hematology: Reinventing the Blood Test

Abstract 3: Oral Session I – Methods in Medical Imaging meets NeurIPS, Weng, Kohl, Patra 09:00 AM

09:00 – Multimodal Multitask Representation Learning for Metadata Prediction in Pathology – Weng, Cai, Lin, Tan, Chen
09:20 – A Hierarchical Probabilistic U-Net for Modeling Multi-Scale Ambiguities – Kohl, Romera-Paredes, Maier-Hein, Rezende, Eslami, Kohli, Zisserman, Ronneberger
09:40 – Task incremental learning of Chest X-ray data on compact architectures – Patra

Abstract 4: Coffee Break + Poster Session I in Medical Imaging meets NeurIPS, Weng, Kohl, Taleb, Patra, Namdar, Perkonigg, Gong, Imran, Abdi, Manakov, Paetzold, Glocker, Sahoo, Fadnavis, Roth, Liu, Zhang, Preuhs, Eitel, Trivedi, Weiss, Stern, Vazquez Romaguera, Hofmanninger, Kaku, Olatunji, Razdaibiedina, Zhang 10:00 AM

Multimodal Multitask Representation Learning for Metadata Prediction in Pathology – Weng, Cai, Lin, Tan, Chen [ORAL+Poster]
A Hierarchical Probabilistic U-Net for Modeling Multi-Scale Ambiguities – Kohl, Romera-Paredes, Maier-Hein, Rezende, Eslami, Kohli, Zisserman, Ronneberger [ORAL+Poster]
Task incremental learning of Chest X-ray data on compact architectures – Patra [ORAL+Poster]
Multimodal Self-Supervised Learning for Medical Image Analysis – Taleb, Lippert, Nabi, Klein [ORAL+Poster]
Evolution-based Fine-tuning of CNNs for Prostate Cancer Detection – Namdar, Gujrathi, Haider, Khalvati [ORAL+Poster]
Unsupervised deep clustering for predictive texture pattern discovery in medical images – Perkonigg, Sobotka, Ba-Ssalamah, Langs [ORAL+Poster]
Large-scale classification of breast MRI exams using deep convolutional networks – Gong, Muckley, Wu, Makino, Kim, Heacock, Moy, Knoll, Geras [ORAL+Poster]
Variational Inference and Bayesian CNNs for Uncertainty Estimation in Multi-Factorial Bone Age Prediction – Stern, Urschler, Payer, Eggenreich
In-plane organ motion prediction using a recurrent encoder-decoder framework – Vazquez Romaguera, Plantefeve, Kadoury
Separation of target anatomical structure and occlusions in thoracic X-ray images – Hofmanninger, Langs
Knee Cartilage Segmentation Using Diffusion-Weighted MRI – Duarte, Hedge, Kaku, Mohan, Raya
Learning to estimate label uncertainty for automatic radiology report parsing – Olatunji, Yao
Multi-defect microscopy image restoration under limited data conditions – Razdaibiedina, Velayutham, Modi
GAN-enhanced Conditional Echocardiogram Generation – Abdi, Tsang, Abolmaesumi
Invasiveness Prediction of Pulmonary Adenocarcinomas Using Deep Feature Fusion Networks – Li, Ma, Li
Push it to the Limit: Discover Edge-Cases in Image Data with Autoencoders – Manakov, Tresp, Maximilian
Noise-aware PET image Enhancement with Adaptive Deep Learning – Xiang, Wang, Gong, Zaharchuk, Zhang
clDice - a Novel Connectivity-Preserving Loss Function for Vessel Segmentation – Paetzold, Shit, Ezhov, Tetteh, Ertuerk, Menze
Machine Learning with Multi-Site Imaging Data: An Empirical Study on the Impact of Scanner Effects – Glocker, Robinson, Coelho de Castro, Dou, Konukoglu
Extraction of hierarchical functional connectivity components in human brain using resting-state fMRI – Sahoo, Bassett, Davatzikos
A Study into Echocardiography View Conversion – Abdi, Jafari, Fels, Tsang, Abolmaesumi
Variable Projection optimization for Intravoxel Incoherent Motion (IVIM) MRI estimation – Fadnavis, Garyfallidis
Boosting Liver and Lesion Segmentation from CT Scans by Mask Mining – Roth, Konopczynski, Hesser
Unsupervised Sparse-view Backprojection via Convolutional and Spatial Transformer Networks – Liu, Sajda
Collaborative Unsupervised Domain Adaptation for Medical Image Diagnosis – Zhang, Wei, Zhao, Niu, Wu, Tan, Huang
Image Quality Assessment for Rigid Motion Compensation – Preuhs, Manhart, Roser, Stimpel, Syben, Psychogios, Kowarschik, Maier
Harnessing spatial MRI normalization: patch individual filter layers for CNNs – Eitel, Albrecht, Paul, Ritter
Binary Mode Multinomial Deep Learning Model for more efficient Automated Diabetic Retinopathy Detection – Trivedi, Desbiens, Gross, Ferres, Dodhia
PILOT: Physics-Informed Learned Optimal Trajectories for Accelerated MRI – Weiss, Senouf, Vedula
Bipartite Distance For Shape-Aware Landmark Detection in Spinal X-Rays – Zubaer, Huang, Fan, Cheung, To, Qian, Terzopoulos

Abstract 5: Keynote II – Julia Schnabel (King's College London) in Medical Imaging meets NeurIPS, Schnabel 10:30 AM

Deep learning for medical image quality control

Abstract 6: Oral Session II – Image Analysis and Segmentation in Medical Imaging meets NeurIPS, Taleb, Namdar, Perkonigg, Gong 11:15 AM

11:15 – Multimodal Self-Supervised Learning for Medical Image Analysis – Taleb, Lippert, Nabi, Klein
11:35 – Evolution-based Fine-tuning of CNNs for Prostate Cancer Detection – Namdar, Gujrathi, Haider, Khalvati
11:55 – Unsupervised deep clustering for predictive texture pattern discovery in medical images – Perkonigg, Sobotka, Ba-Ssalamah, Langs
12:15 – Large-scale classification of breast MRI exams using deep convolutional networks – Gong, Muckley, Wu, Makino, Kim, Heacock, Moy, Knoll, Geras

Abstract 8: Keynote III – Leo Grady (Paige AI) in Medical Imaging meets NeurIPS, Grady 02:00 PM

Changing the Paradigm of Pathology: AI and Computational Diagnostics

Abstract 9: Oral Session III – Imaging in Medical Imaging meets NeurIPS, Parmar, Hallgrimsson, Kames 02:45 PM

14:45 – High Resolution Medical Image Analysis with Spatial Partitioning – Hou, Cheng, Shazeer, Parmar, Li, Korfatis, Drucker, Blezek, Song
15:05 – Estimating localized complexity of white-matter wiring with GANs – Hallgrimsson, Sharan, Grafton, Singh
15:25 – Training a Variational Network for use on 3D High Resolution MRI Data in 1 Day – Kames, Doucette, Rauscher

Abstract 10: Coffee Break + Poster Session II in Medical Imaging meets NeurIPS, Parmar, Hallgrimsson, Kames, Patra, Imran, Yang, Zimmerer, Chakravarty, Schobs, Gossmann, CHEN, Dutt, Yao, Martinez Manzanera, Pinckaers, Dalmis, Gupta, Haq, Ruhe, Gamper, De Goyeneche Macaya, Tamir, Jeon, OOTA, Heckel, Douglas, Sidorov, Wang, Garcia, Soni, Shukla 03:45 PM

High Resolution Medical Image Analysis with Spatial Partitioning – Hou, Cheng, Shazeer, Parmar, Li, Korfatis, Drucker, Blezek, Song [ORAL+Poster]
Estimating localized complexity of white-matter wiring with GANs – Hallgrimsson, Sharan, Grafton, Singh [ORAL+Poster]
Training a Variational Network for use on 3D High Resolution MRI Data in 1 Day – Kames, Doucette, Rauscher [ORAL+Poster]
Saliency cues for continual learning of ultrasound – Patra
End-to-End Fully Automatic Segmentation of Scoliotic Vertebrae from Spinal X-Ray Images – Imran, Huang, Tang, Fan, Cheung, To, Qian, Terzopoulos
Hepatocellular Carcinoma Intra-arterial Treatment Response Prediction for Improved Therapeutic Decision-Making – Yang, Dvornek, Zhang, Chapiro, Lin, Abajian, Duncan
High- and Low-level image component decomposition using VAEs for improved reconstruction and anomaly detection – Zimmerer, Petersen, Maier-Hein
Radiologist Validated Systematic Search over Deep Neural Networks for Screening Musculoskeletal Radiographs – Chakravarty, Sheet, Ghosh, Sarkar, Sethuraman
A Biased Sampling Network to Localise Landmarks for Automated Disease Diagnosis – Schobs, Zhou, Cogliano, Swift, Lu
Variational inference based assessment of mammographic lesion classification algorithms under distribution shift – Gossmann, Cha, Sun
Batch-wise Dice Loss: Rethinking the Data Imbalance for Medical Image Segmentation – Chang, Lin, Wu, Chen, Hsu
Towards Artifact Rejection in Microscopic Urinalysis – Dutt
Analysis of focal loss with noisy labels – Yao, Jadhav
Data Augmentation for Early Stage Lung Nodules using Deep Image Prior and CycleGan – Martinez Manzanera, Ellis, Baltatzis, Devaraj, Desai, Le Golgoc, Nair, Glocker, Schnabel
Neural Ordinary Differential Equations for Semantic Segmentation of Individual Colon Glands – Pinckaers, Litjens
Class-Aware CycleGAN: A domain adaptation method for mammography and tomosynthesis – Dalmis, Birhanu, Vanegas, Kallenerg, Kroes
Tracking-Assisted Segmentation of Biological Cells – Gupta, de Bruin, Panteli, Gavves
Deep learning feature based medical image retrieval for large-scale datasets – Haq, Moradi, Wang
Generating CT-scans with 3D Generative Adversarial Networks Using a Supercomputer – Ruhe, Codreanu, van Leeuwen, Podareanu, Saletore, Teuwen
Meta-SVDD: Probabilistic Meta-Learning for One-Class Classification in Cancer Histology Images – Gamper, Chan, Tsang, Snead, Rajpoot
One-Click Spine MRI – De Goyeneche, Peterson, He, Addy, Santos
Improved generalizability of deep-learning based low dose volumetric contrast-enhanced MRI – Tamir, Pasumarthi, Gong, Zaharchuk, Zhang
Deep Recursive Bayesian Maximal Path for Fully Automatic Extraction of Coronary Arteries in CT Images – Jeon, Shim, Chang
A Deep Multi-Modal Method for Patient Wound Healing Assessment – Oota, Rowtula, Mohammed, Galitz, Liu, Gupta
Signal recovery with un-trained convolutional neural networks – Heckel
On the Similarity of Deep Learning Representations Across Didactic and Adversarial Examples – Douglas, Farahani
Generative Smoke Removal – Sidorov, Wang, Alaya-Chekh
Towards High Fidelity Direct-Contrast Synthesis from Magnetic Resonance Fingerprinting – Wang, Karasan, Doneva, Tan
HR-CAMs: Using multi-level features for precise discriminative localization of pathology – Ingalhalikar, Shinde, Chougule, Saini
Towards Autism detection on brain structural MRI scans with Adversarially Learned Inference – Garcia
Neural Network Compression using Reinforcement Learning in Medical Image Segmentation – Chhabra, Soni, Avinash

Abstract 11: Keynote IV – Daniel Sodickson (NYU Langone Health) in Medical Imaging meets NeurIPS, Sodickson 04:15 PM

AI and Radiology: How machine learning will change the way we see patients, and the way we see ourselves

Abstract 12: fastMRI Challenge Talks in Medical Imaging meets NeurIPS, Yakubova, Pezzotti, Wang, Zitnick, Karkalousos, Sun, Caan, Murrell, Putzky 05:00 PM

3 Winner Talks of fastMRI

Learning with Temporal Point Processes

Manuel Rodriguez, Le Song, Isabel Valera, Yan Liu, Abir De, Hongyuan Zha

West 306, Sat Dec 14, 08:00 AM

In recent years, there has been an increasing number of machine learning models and algorithms based on the theory of temporal point processes, which is a mathematical framework to model asynchronous event data. These models and algorithms have found a wide range of human-centered applications, from social and information networks and recommender systems to crime prediction and health. Moreover, this emerging line of research has already established connections to deep learning, deep generative models, Bayesian nonparametrics, causal inference, stochastic optimal control and reinforcement learning. However, despite these recent advances, learning with temporal point processes is still a relatively niche topic within the machine learning community---there are only a few research groups across the world with the necessary expertise to make progress. In this workshop, we aim to popularize temporal point processes within the machine learning community at large. In our view, this is the right time to organize such a workshop because, as algorithmic decisions become more consequential to individuals and society, temporal point processes will play a major role in the development of human-centered machine learning models and algorithms accounting for the feedback loop between algorithmic and human decisions, which are inherently asynchronous events. Moreover, it will be a natural follow-up of a very successful and well-attended ICML 2018 tutorial on learning with temporal point processes, which two of us recently taught.

Schedule

08:30 AM Welcome Address and Introduction
08:35 AM Invited Talk by Negar Kiyavash Kiyavash
09:15 AM Fused Gromov-Wasserstein Alignment for Hawkes Processes
09:30 AM Insider Threat Detection via Hierarchical Neural Temporal Point Processes Wu
09:45 AM Coffee Break
10:30 AM Invited Talk by Niloy Ganguly Ganguly
11:10 AM Intermittent Demand Forecasting with Deep Renewal Processes Turkmen
11:25 AM Temporal Logic Point Processes
11:40 AM The Graph Hawkes Network for Reasoning on Temporal Knowledge Graphs
11:55 AM Multivariate coupling estimation between continuous signals and point processes Besserve
12:10 PM Lunch Break
01:50 PM Invited Talk by Walter Dempsey Dempsey
02:30 PM Better Approximate Inference for Partial Likelihood Models with a Latent Structure Setlur
02:45 PM Deep Point Process Destructors
03:00 PM A sleep-wake detection algorithm for memory-constrained wearable devices: Change Point Decoder Cakmak
03:15 PM Topics are not Marks: Modeling Text-based Cascades using Multi-network Hawkes Process Choudhari
03:30 PM Poster Setup + Coffee Break
04:15 PM Temporal point process models vs. discrete time models
05:00 PM Poster Session Cakmak, Zhang, Prabhakarannair Kusumam, Ahmed, Wu, Choudhari, Inouye, Taylor, Besserve, Turkmen, Islam, Artés, Setlur, Fu, Han, De, Du, Sanchez Martin
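For readers new to the workshop's central object: a temporal point process is specified by a conditional intensity function, and the self-exciting (Hawkes) case can be simulated with Ogata's thinning algorithm. The sketch below is standard textbook material included for orientation, not code from any workshop contribution; the parameters mu, alpha and beta are arbitrary.

    # Illustrative background (not from any workshop paper): sampling a
    # self-exciting Hawkes process with intensity
    #   lambda(t) = mu + alpha * sum_{t_i < t} exp(-beta * (t - t_i))
    # via Ogata's thinning algorithm.
    import math
    import random

    def sample_hawkes(mu=0.5, alpha=0.8, beta=1.2, horizon=20.0, seed=0):
        random.seed(seed)
        events, t = [], 0.0
        while t < horizon:
            # The exponential kernel only decays between events, so the current
            # intensity upper-bounds lambda until the next arrival.
            lam_bar = mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in events)
            t += random.expovariate(lam_bar)        # candidate arrival time
            if t >= horizon:
                break
            lam_t = mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in events)
            if random.random() <= lam_t / lam_bar:  # accept with prob lambda(t)/lam_bar
                events.append(t)
        return events

    print(sample_hawkes())  # asynchronous event times, e.g. user actions or messages

With alpha/beta < 1 the process is stationary; each event transiently raises the intensity, producing the bursty, clustered event times these models are designed to capture.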

The Optimization Foundations of Reinforcement Learning

Bo Dai, Niao He, Nicolas Le Roux, Lihong Li, Dale Schuurmans, Martha White

West Ballroom A, Sat Dec 14, 08:00 AM

Interest in reinforcement learning (RL) has boomed with recent improvements in benchmark tasks that suggest the potential for a revolutionary advance in practical applications. Unfortunately, research in RL remains hampered by limited theoretical understanding, making the field overly reliant on empirical exploration with insufficient principles to guide future development. It is imperative to develop a stronger fundamental understanding of the success of recent RL methods, both to expand the usability of the methods and accelerate future deployment. Recently, fundamental concepts from optimization and control theory have provided a fresh perspective that has led to the development of sound RL algorithms with provable efficiency. The goal of this workshop is to catalyze the growing synergy between RL and optimization research, promoting a rational reconsideration of the foundational principles for reinforcement learning, and bridging the gap between theory and practice.

Schedule

08:00 AM Opening Remarks Dai, He, Le Roux, Li, Schuurmans, White
08:10 AM Unsupervised State Embedding and Aggregation towards Scalable Reinforcement Learning Wang
08:50 AM Adaptive Trust Region Policy Optimization: Convergence and Faster Rates of regularized MDPs Shani, Efroni, Mannor
09:10 AM Poster Spotlight 1 Brandfonbrener, Bruna, Zahavy, Kaplan, Mansour, Kloft, Karampatziakis, Langford, Mineiro, Lee, Agarwal, Jiang, He, Yasui
09:30 AM Poster and Coffee Break 1 Sidford, Mahajan, Ribeiro, Lewandowski, Sayed, Amortila, Tewari, Steger, Anandkumar, Mujika, Kappen, Zhou, Boots, Finn, Wei, Jin, Sutton, Gehring, Boutilier, Lin, Sharif, Fox, Wang, Brandfonbrener, Zhou, Sinclair, Hochreiter, Jha, Romeres, Precup, Thalmeier, Macua, Gorbunov, Hazan, Smirnova, Dohmatob, Li, Qiu, Li, Banerjee, Brunskill, Meier, Luan, Basar, Doan, Yu, Liu, Zahavy, Schaefer, Liu, Neu, Kaplan, Sun, Yao, Klassen, Zhao, Gómez, Liu, Cevher, Bhandari, Preiss, Subramanian, Li, Ye, Suttle, Chang, Wei, Liu, Li, Chen, Song, Smith, Bas Serrano, Bruna, Langford, Liu, Jiang, Feng, Du, Chow, Ye, Mansour, Lee, Arjona-Medina, Zhang, Singh, Luo, Efroni, Chen, Wang, Dai, Wei, Ahmed, Chen, Wang, Li, Yang, Xu, Tang, Shrivastava, Zhang, Zheng, SATPATHI, Mao, Brandfonbrener, Di-Castro Shashua, Liu, Vall Islam, Fu, Naik, Kumar, Petit, Kamoutsi, Totaro, Raghunathan, Wu, Lee, Ding, Koppel, Sun, Tjandraatmadja, Karami, Mei, Xiao, Wen, Zhang, Goroshin, Pezeshki, Zhai, Amortila, Huang, Vasileva, Bergou, Ahmadyan, Sun, Zhang, Gruber, Wang, Parshakova
10:30 AM The Provable Effectiveness of Policy Gradient Methods in Reinforcement Learning Kakade
11:10 AM Panel Discussion Sutton, Precup
11:40 AM Poster Spotlight 2 Sidford, Wang, Yang, Ye, Fu, Yang, Chen, Wang, Nachum, Dai, Kostrikov, Schuurmans, Tang, Feng, Li, Zhou, Liu, Toro Icarte, Waldie, Klassen, Valenzano, Castro, Du, Kakade, Wang, Chen, Liu, Li, Wang, Zhao, Amortila, Precup, Panangaden, Bellemare
02:00 PM Reinforcement Learning Beyond Optimization Van Roy
02:40 PM Learning in structured MDPs with convex cost function: improved regret bounds for inventory management Agrawal
03:20 PM Poster and Coffee Break 2 Hausman, Dong, Goldberg, Li, Yang, Wang, Shani, Wang, Amdahl-Culleton, Cassano, Dymetman, Bellemare, Tomczak, Castro, Dinu, Holzleitner, White, Wang, Jordan, Jovanovic, Yu, Chen, Ryu, Zaheer, He, Vieillard, Nachum, Pietquin, Sener, Xu, Kamalaruban, Mineiro, Rolland, Bacon, Panangaden, Cai, Liu, Gu, Seraj, Valenzano, Dadashi, Toro Icarte, Cheng, Yu, Ghadimi, Sokota, McNamee, Russo, Levine, Valcarcel, Kakade, Zhang, McIlraith, Mannor, Whiteson, Munoz de Cote, Waldie
04:20 PM On the Convergence of GTD($\lambda$) with General $\lambda$ Yu
05:00 PM Continuous Online Learning and New Insights to Online Imitation Learning Lee, Cheng, Goldberg, Boots
05:20 PM Logarithmic Regret for Online Control Agarwal, Hazan, Singh
05:40 PM Closing Remarks Dai, He, Le Roux, Li, Schuurmans, White

Abstracts (10):

Abstract 2: Unsupervised State Embedding and Aggregation towards Scalable Reinforcement Learning in The Optimization Foundations of Reinforcement Learning, Wang 08:10 AM

TBA

Abstract 3: Adaptive Trust Region Policy Optimization: Convergence and Faster Rates of regularized MDPs in The Optimization Foundations of Reinforcement Learning, Shani, Efroni, Mannor 08:50 AM

Trust region policy optimization (TRPO) is a popular and empirically successful policy search algorithm in Reinforcement Learning (RL) in which a surrogate problem, which restricts consecutive policies to be `close' to one another, is iteratively solved. Nevertheless, TRPO has been considered a heuristic algorithm inspired by Conservative Policy Iteration (CPI). We show that the adaptive scaling mechanism used in TRPO is in fact the natural ``RL version" of traditional trust-region methods from convex analysis. We first analyze TRPO in the planning setting, in which we have access to the model and the entire state space. Then, we consider sample-based TRPO and establish an $\tilde O(1/\sqrt{N})$ convergence rate to the global optimum. Importantly, the adaptive scaling mechanism allows us to analyze TRPO in regularized MDPs, for which we prove fast rates of $\tilde O(1/N)$, much like results in convex optimization. This is the first result in RL of better rates when regularizing the instantaneous cost or reward.
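As background for Abstract 3, the surrogate problem that TRPO iteratively solves is usually written as the following constrained step (Schulman et al., 2015); this is the standard formulation, not the adaptive-scaling variant the speakers analyze:

\[
\pi_{k+1} \in \arg\max_{\pi}\; \mathbb{E}_{s \sim d^{\pi_k},\, a \sim \pi_k}\!\Big[\tfrac{\pi(a \mid s)}{\pi_k(a \mid s)}\, A^{\pi_k}(s, a)\Big]
\quad \text{s.t.} \quad
\mathbb{E}_{s \sim d^{\pi_k}}\!\big[ D_{\mathrm{KL}}\big(\pi_k(\cdot \mid s) \,\big\|\, \pi(\cdot \mid s)\big)\big] \le \delta,
\]

where $A^{\pi_k}$ is the advantage function of the current policy $\pi_k$, $d^{\pi_k}$ its state-visitation distribution, and $\delta$ the trust-region radius that keeps consecutive policies "close".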

Abstract 6: The Provable Effectiveness of Policy Gradient Methods in Reinforcement Learning in The Optimization Foundations of Reinforcement Learning, Kakade 10:30 AM

Reinforcement learning is now the dominant paradigm for how an agent learns to interact with the world in order to achieve some long-term objectives. Here, policy gradient methods are among the most effective methods in challenging reinforcement learning problems, because they: are applicable to any differentiable policy parameterization; admit easy extensions to function approximation; easily incorporate structured state and action spaces; and are easy to implement in a simulation-based, model-free manner.

However, little is known about even their most basic theoretical convergence properties, including:
- do they converge to a globally optimal solution, say with a sufficiently rich policy class?
- how well do they cope with approximation error, say due to using a class of neural policies?
- what is their finite sample complexity?
This talk will survey a number of results on these basic questions. We will highlight the interplay of theory, algorithm design, and practice.

Joint work with: Alekh Agarwal, Jason Lee, Gaurav Mahajan

Abstract 7: Panel Discussion in The Optimization Foundations of Reinforcement Learning, Sutton, Precup 11:10 AM

TBA

Abstract 9: Reinforcement Learning Beyond Optimization in The Optimization Foundations of Reinforcement Learning, Van Roy 02:00 PM

The reinforcement learning problem is often framed as one of quickly optimizing an uncertain Markov decision process. This formulation has led to substantial insight and progress in algorithms and theory. However, this perspective is limiting and can also give rise to poor algorithm designs. I will discuss this issue and how it is addressed by popular reinforcement learning algorithms.

Abstract 10: Learning in structured MDPs with convex cost function: improved regret bounds for inventory management in The Optimization Foundations of Reinforcement Learning, Agrawal 02:40 PM

We present a learning algorithm for the stochastic inventory control problem under lost sales penalty and positive lead times, when the demand distribution is a priori unknown. Our main result is a regret bound of $O(L\sqrt{T}+D)$ for the algorithm, where $T$ is the time horizon, $L$ is the fixed and known lead time, and $D$ is an unknown parameter of the demand distribution

described roughly as the number of time steps needed to generate enough demand for depleting one unit of inventory. Our results significantly improve the existing regret bounds for this problem. Notably, even though the state space of the underlying Markov Decision Process (MDP) in this problem is continuous and $L$-dimensional, our regret bounds depend linearly on $L$. Our techniques utilize the convexity of the long-run average cost and a newly derived bound on the `bias' of base-stock policies, to establish an almost blackbox connection between the problem of learning and optimization in such MDPs and stochastic convex bandit optimization. The techniques presented here may be of independent interest for other settings that involve learning large structured MDPs but with convex cost functions.

Abstract 12: On the Convergence of GTD($\lambda$) with General $\lambda$ in The Optimization Foundations of Reinforcement Learning, Yu 04:20 PM

Gradient temporal-difference (GTD) algorithms are an important family of extensions of the standard TD algorithm, overcoming the stability issues in off-policy learning with function approximation by minimizing the projected Bellman error using stochastic gradient descent (SGD). In this talk, we consider GTD($\lambda$) for policy evaluation in the case of linear function approximation and finite-space Markov decision processes (MDPs) with discounted rewards. GTD($\lambda$) differs from the typical SGD algorithm for minimizing an objective function in the following way. In an MDP, each stationary policy is associated with a family of Bellman equations. By selecting the values of the eligibility trace parameters, $\lambda$, we determine the Bellman operator that defines the objective function. This choice influences both the approximation error and the learning behavior of the resulting algorithm. We first describe a dynamic scheme to judiciously set $\lambda$ in a history-dependent way. This scheme is based on the idea of randomized stopping times and generalized Bellman equations, and it is useful for balancing the bias-variance tradeoff in off-policy learning. We then present asymptotic convergence results for several variations of GTD($\lambda$), including saddle-point and mirror-descent TD variants, for different choices of $\lambda$: constant, state-dependent, and history-dependent. These convergence results are obtained by combining special properties of the joint state and eligibility trace process in TD learning with ergodic theory for weak Feller Markov chains, mean-ODE-based proof methods, and convex optimization theory. Our results not only resolve long-standing open questions regarding the convergence of GTD($\lambda$) but also provide the basis for using a broader class of generalized Bellman operators for approximate policy evaluation with linear TD methods.

Abstract 13: Continuous Online Learning and New Insights to Online Imitation Learning in The Optimization Foundations of Reinforcement Learning, Lee, Cheng, Goldberg, Boots 05:00 PM

Online learning is a powerful tool for analyzing iterative algorithms. However, the classic adversarial setup sometimes fails to capture certain regularity in online problems in practice. Motivated by this, we establish a new setup, called Continuous Online Learning (COL), where the gradient of the online loss function changes continuously across rounds with respect to the learner's decisions. We show that COL covers and more appropriately describes many interesting applications, from general equilibrium problems (EPs) to optimization in episodic MDPs. Using this new setup, we revisit the difficulty of achieving sublinear dynamic regret. We prove that there is a fundamental equivalence between achieving sublinear dynamic regret in COL and solving certain EPs, and we present a reduction from dynamic regret to both static regret and the convergence rate of the associated EP. At the end, we specialize these new insights into online imitation learning and show improved understanding of its learning stability.

Abstract 14: Logarithmic Regret for Online Control in The Optimization Foundations of Reinforcement Learning, Agarwal, Hazan, Singh 05:20 PM

We study optimal regret bounds for control in linear dynamical systems under adversarially changing strongly convex cost functions, given the knowledge of transition dynamics. This includes several well-studied and fundamental frameworks such as the Kalman filter and the linear quadratic regulator. State-of-the-art methods achieve regret which scales as T^0.5, where T is the time horizon.

We show that the optimal regret in this setting can be significantly smaller, scaling as polylog T. This regret bound is achieved by two different efficient iterative methods, online gradient descent and online natural gradient.

Abstract 15: Closing Remarks in The Optimization Foundations of Reinforcement Learning, Dai, He, Le Roux, Li, Schuurmans, White 05:40 PM

Awards Announcement
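For orientation on Abstract 12: the objective minimized by GTD-style methods is the mean squared projected Bellman error. The display below is the standard form for linear function approximation (Sutton et al., 2009), written here as textbook background rather than a formula quoted from the talk; the abstract's point is that the operator $T^{(\lambda)}$ in it is itself a design choice:

\[
\mathrm{MSPBE}(\theta) \;=\; \big\| \Phi\theta \;-\; \Pi_{\xi}\, T^{(\lambda)} \Phi\theta \big\|_{\xi}^{2},
\]

where $\Phi\theta$ is the linear value estimate, $T^{(\lambda)}$ the multi-step (generalized) Bellman operator determined by the eligibility-trace parameters $\lambda$, and $\Pi_{\xi}$ the projection onto the feature span weighted by the state-occupancy distribution $\xi$.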

Machine Learning with Guarantees

Ben London, Gintare Karolina Dziugaite, Dan Roy, Thorsten Joachims, Aleksander Madry, John Shawe-Taylor

West Ballroom B, Sat Dec 14, 08:00 AM

As adoption of machine learning grows in high-stakes application areas (e.g., industry, government and health care), so does the need for guarantees: how accurate a learned model will be; whether its predictions will be fair; whether it will divulge information about individuals; or whether it is vulnerable to adversarial attacks. Many of these questions involve unknown or intractable quantities (e.g., risk, regret or posterior likelihood) and complex constraints (e.g., differential privacy, fairness, and adversarial robustness). Thus, learning algorithms are often designed to yield (and optimize) bounds on the quantities of interest. Beyond providing guarantees, these bounds also shed light on black-box machine learning systems.

Classical examples include structural risk minimization (Vapnik, 1991) and support vector machines (Cristianini & Shawe-Taylor, 2000), while more recent examples include non-vacuous risk bounds for neural networks (Dziugaite & Roy, 2017, 2018), algorithms that optimize both the weights and structure of a neural network (Cortes, 2017), counterfactual risk minimization for learning from logged bandit feedback (Swaminathan & Joachims, 2015; London & Sandler, 2019), robustness to adversarial attacks (Schmidt et al., 2018; Wong & Kolter, 2018), differentially private learning (Dwork et al., 2006; Chaudhuri et al., 2011), and algorithms that ensure fairness (Dwork et al., 2012).

This one-day workshop will bring together researchers in both theoretical and applied machine learning, across areas such as statistical learning theory, adversarial learning, fairness and privacy, to discuss the problem of obtaining performance guarantees and algorithms to optimize them. The program will include invited and contributed talks, poster sessions and a panel discussion. We particularly welcome contributions describing fundamentally new problems, novel learning principles, creative bound optimization techniques, and empirical studies of theoretical findings.

Schedule

08:45 AM Welcome Address London
09:00 AM Tengyu Ma, "Designing Explicit Regularizers for Deep Models" Ma
09:45 AM Vatsal Sharan, "Sample Amplification: Increasing Dataset Size even when Learning is Impossible" Sharan
10:15 AM Break / Poster Session 1 Marcu, Yang, Gourdeau, Zhu, Lykouris, Chi, Kozdoba, Bhagoji, Wu, Nandy, Smith, Wen, Xie, Pitas, Shit, Andriushchenko, Yu, Letarte, Khodak, Mozannar, Podimata, Foulds, Wang, Zhang, Kuzelka, Levine, Lu, Mhammedi, Viallard, Cai, Gondara, Lucas, Mahdaviyeh, Baratin, Bommasani, Barp, Ilyas, Wu, Behrmann, Rivasplata, Nazemi, Raghunathan, Stephenson, Singla, Gupta, Choi, Kilcher, Lyle, Manino, Bennett, Xu, Chatterji, Barut, Prost, Toro Icarte, Blaas, Yun, Lale, Jiang, Medini, Rezaei, Meinke, Mell, Kazantsev, Garg, Sinha, Lokhande, Rizk, Zhao, Akash, Hou, Ghodsi, Hein, Sypherd, Yang, Pentina, Gillot, Ledent, Gur-Ari, MacAulay, Zhang
10:45 AM Mehryar Mohri, "Learning with Sample-Dependent Hypothesis Sets" Mohri
11:30 AM James Lucas, "Information-theoretic limitations on novel task generalization" Lucas
12:00 PM Lunch Break
01:45 PM Soheil Feizi, "Certifiable Defenses against Adversarial Attacks" Feizi
02:30 PM Maksym Andriushchenko, "Provably Robust Boosted Decision Stumps and Trees against Adversarial Attacks" Andriushchenko
03:00 PM Coffee Break / Poster Session 2
03:30 PM Aaron Roth, "Average Individual Fairness" Roth
04:15 PM Hussein Mozannar, "Fair Learning with Private Data" Mozannar
04:45 PM Emma Brünskill, "Some Theory RL Challenges Inspired by Education" Brunskill
05:30 PM Discussion Panel

Abstracts (8):

Abstract 2: Tengyu Ma, "Designing Explicit Regularizers for Deep Models" in Machine Learning with Guarantees, Ma 09:00 AM

I will discuss some recent results on designing explicit regularizers to improve the generalization performance of deep neural networks. We derive data-dependent generalization bounds for deep neural networks. We empirically regularize the bounds and obtain improved generalization performance (in terms of the standard accuracy or the robust accuracy). I will also touch on recent results on applying these techniques to imbalanced datasets.

Based on joint work with Colin Wei, Kaidi Cao, Adrien Gaidon, and Nikos Arechiga

https://arxiv.org/abs/1910.04284
https://arxiv.org/abs/1906.07413
https://arxiv.org/abs/1905.03684

Abstract 3: Vatsal Sharan, "Sample Amplification: Increasing Dataset Size even when Learning is Impossible" in Machine Learning with Guarantees, Sharan 09:45 AM

Given data drawn from an unknown distribution, $D$, to what extent is it possible to ``amplify'' this dataset and faithfully output a larger set of samples that appear to have been drawn from $D$? We formalize this question as follows: an $(n,m)$ amplification procedure takes as input $n$ independent draws from an unknown distribution $D$, and outputs a set of $m > n$ ``samples'' which must be indistinguishable from $m$ samples drawn i.i.d. from $D$. We consider this sample amplification problem in two fundamental settings: the case where $D$ is an arbitrary discrete distribution supported on $k$ elements, and the case where $D$ is a $d$-dimensional Gaussian with unknown mean, and fixed covariance matrix. Perhaps surprisingly, we show a valid amplification procedure exists for both of these settings, even in the regime where the size of the input dataset, $n$, is significantly less than what would be necessary to learn distribution $D$ to non-trivial accuracy. We also show that our procedures are optimal up to constant factors. Beyond these results, we also formalize a number of curious directions for future research along this vein.
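One natural way to make the "indistinguishable" requirement in Abstract 3 precise is via total variation distance; the display below is a standard formalization consistent with the abstract's wording, with the slack parameter $\varepsilon$ left abstract since the abstract itself does not fix a constant:

\[
d_{\mathrm{TV}}\!\big( \mathcal{A}(X_1, \dots, X_n),\; D^{\otimes m} \big) \;\le\; \varepsilon,
\qquad X_1, \dots, X_n \overset{\text{i.i.d.}}{\sim} D,
\]

i.e., no statistical test can reliably tell the $m$ outputs of the amplifier $\mathcal{A}$ from $m$ fresh i.i.d. draws when $\varepsilon$ is small.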

Abstract 4: Break / Poster Session 1 in Machine Learning with Guarantees, Marcu, Yang, Gourdeau, Zhu, Lykouris, Chi, Kozdoba, Bhagoji, Wu, Nandy, Smith, Wen, Xie, Pitas, Shit, Andriushchenko, Yu, Letarte, Khodak, Mozannar, Podimata, Foulds, Wang, Zhang, Kuzelka, Levine, Lu, Mhammedi, Viallard, Cai, Gondara, Lucas, Mahdaviyeh, Baratin, Bommasani, Barp, Ilyas, Wu, Behrmann, Rivasplata, Nazemi, Raghunathan, Stephenson, Singla, Gupta, Choi, Kilcher, Lyle, Manino, Bennett, Xu, Chatterji, Barut, Prost, Toro Icarte, Blaas, Yun, Lale, Jiang, Medini, Rezaei, Meinke, Mell, Kazantsev, Garg, Sinha, Lokhande, Rizk, Zhao, Akash, Hou, Ghodsi, Hein, Sypherd, Yang, Pentina, Gillot, Ledent, Gur-Ari, MacAulay, Zhang 10:15 AM

Visit https://sites.google.com/view/mlwithguarantees/accepted-papers for the list of papers.

Posters will be up all day.

Abstract 6: James Lucas, "Information-theoretic limitations on novel task generalization" in Machine Learning with Guarantees, Lucas 11:30 AM

Machine learning models have traditionally been developed under the assumption that the training and test distributions match exactly. However, recent successes in few-shot learning and related problems are encouraging signs that these models can be adapted to more realistic settings where train and test differ. Unfortunately, there is severely limited theoretical support for these algorithms and little is known about the difficulty of these problems. In this work, we provide novel information-theoretic lower bounds on minimax rates of convergence for algorithms that are trained on data from multiple sources and tested on novel data. Our bounds depend intuitively on the information shared between sources of data and characterize the difficulty of learning in this setting for arbitrary algorithms.

Abstract 8: Soheil Feizi, "Certifiable Defenses against Adversarial Attacks" in Machine Learning with Guarantees, Feizi 01:45 PM

While neural networks have achieved high performance in different learning tasks, their accuracy drops significantly in the presence of small adversarial perturbations to inputs. In the last couple of years, several practical defenses based on regularization and adversarial training have been proposed, which are often followed by stronger attacks to defeat them. To escape this cycle, a new line of work focuses on developing certifiably robust classifiers. In these models, for a given input sample, one can calculate a robustness certificate such that for `any' perturbation of the input within the robustness radius, the classification output will `provably' remain unchanged. In this talk, I will present two certifiable defenses: (1) Wasserstein smoothing to defend against non-additive Wasserstein adversarial attacks, and (2) curvature-based robust training to certifiably defend against L2 attacks by globally bounding curvature values of the network.

This is joint work with Alex Levine and Sahil Singla.

Abstract 9: Maksym Andriushchenko, "Provably Robust Boosted Decision Stumps and Trees against Adversarial Attacks" in Machine Learning with Guarantees, Andriushchenko 02:30 PM

The problem of adversarial robustness has been studied extensively for neural networks. However, for boosted decision trees and decision stumps there are almost no results, even though they are widely used in practice (e.g. XGBoost) due to their accuracy, interpretability, and efficiency. We show in this paper that for boosted decision stumps the \textit{exact} min-max robust loss and test error for an $l_\infty$-attack can be computed in $O(T\log T)$ time per input, where $T$ is the number of decision stumps, and the optimal update step of the ensemble can be done in $O(n^2\,T\log T)$, where $n$ is the number of data points. For boosted trees we show how to efficiently calculate and optimize an upper bound on the robust loss, which leads to state-of-the-art robust test error for boosted trees on MNIST (12.5\% for $\epsilon_\infty=0.3$), FMNIST (23.2\% for $\epsilon_\infty=0.1$), and CIFAR-10 (74.7\% for $\epsilon_\infty=8/255$). Moreover, the robust test error rates we achieve are competitive to the ones of provably robust CNNs. Code of our method is available at \url{https://git.io/Je18r}. This is a short version of the corresponding NeurIPS 2019 paper \cite{andriushchenko2019provably}.

Abstract 10: Coffee Break / Poster Session 2 in Machine Learning with Guarantees, 03:00 PM

Visit https://sites.google.com/view/mlwithguarantees/accepted-papers for the list of papers.

Posters will be up all day.

Abstract 12: Hussein Mozannar, "Fair Learning with Private Data" in Machine Learning with Guarantees, Mozannar 04:15 PM

We study learning non-discriminatory predictors when the protected attributes are privatized or noisy. We observe that, in the population limit, non-discrimination against noisy attributes is equivalent to that against original attributes. We show this to hold for various fairness criteria. We then characterize the amount of difficulty, in sample complexity, that privacy adds to testing non-discrimination. Using this relationship, we propose how to carefully adapt existing non-discriminatory learners to work with privatized protected attributes. Care is crucial, as naively using these learners may create the illusion of non-discrimination, while continuing to be highly discriminatory.
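The per-stump computation behind Abstract 9 above is easy to make concrete: under an $l_\infty$ perturbation of radius eps, an attacker can flip a stump's branch exactly when the attacked coordinate can cross the threshold. The sketch below is an illustrative reconstruction for a single stump with logistic loss, not the authors' released code (see https://git.io/Je18r for that).

    # Illustrative reconstruction (not the authors' code): exact worst-case
    # logistic loss of one decision stump f(x) = w_l if x[j] <= b else w_r,
    # under an l_inf perturbation of radius eps, for a label y in {-1, +1}.
    import math

    def robust_stump_loss(x, y, j, b, w_l, w_r, eps):
        reachable = []
        if x[j] - eps <= b:   # attacker can place coordinate j on the left branch
            reachable.append(w_l)
        if x[j] + eps > b:    # ... or on the right branch
            reachable.append(w_r)
        # Worst case over all perturbations = branch value minimizing margin y*f(x).
        worst_margin = min(y * f for f in reachable)
        return math.log(1.0 + math.exp(-worst_margin))

    # A point 0.05 from the threshold is flippable with eps = 0.1:
    print(robust_stump_loss(x=[0.55], y=+1, j=0, b=0.5, w_l=-1.0, w_r=+1.0, eps=0.1))

For an ensemble, the worst case must be taken jointly over all stumps that share a coordinate, which is where the abstract's $O(T\log T)$ bound (via sorting thresholds per coordinate) comes in; the per-stump case above is only the building block.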

"Do the right thing": machine learning and causal inference for improved decision making

Michele Santacatterina, Thorsten Joachims, Nathan Kallus, Adith Swaminathan, David Sontag, Angela Zhou

West Ballroom C, Sat Dec 14, 08:00 AM

In recent years, machine learning has seen important advances in its theoretical and practical domains, with some of the most significant applications in online marketing and commerce, personalized medicine, and data-driven policy-making. This dramatic success has led to increased expectations for autonomous systems to make the right decision at the right target at the right time. This gives rise to one of the major challenges of machine learning today, namely the understanding of the cause-effect connection. Indeed, actions, interventions, and decisions have important consequences, and so, in seeking to make the best decision, one must understand the process of identifying causality. By embracing causal reasoning, autonomous systems will be able to answer counterfactual questions, such as "What if I had treated a patient differently?" and "What if I had ranked a list differently?", thus helping to establish the evidence base for important decision-making processes.

The purpose of this workshop is to bring together experts from different fields to discuss the relationships between machine learning and causal inference and to discuss and highlight the formalization and algorithmization of causality toward achieving human-level machine intelligence.

This purpose will guide the makeup of the invited talks and the topics for the panel discussions. The panel discussions will tackle controversial topics, with the intent of drawing out an engaging intellectual debate and conversation across fields.

This workshop will advance and extend knowledge on how machine learning can be used to conduct causal inference, and how causal inference can support the development of machine learning methods for improved decision-making.

Schedule

08:45 AM Opening Remarks Joachims, Kallus, Santacatterina, Swaminathan, Sontag, Zhou
09:00 AM Susan Athey Athey
09:30 AM Andrea Rotnitzky Rotnitzky

10:00 AM Poster Spotlights Namkoong, Charpignon, Rudolph, Coston, Saito, Dhillon, Markham
10:15 AM Coffee break, posters, and 1-on-1 discussions Lu, Chen, Namkoong, Charpignon, Rudolph, Coston, von Kügelgen, Prasad, Dhillon, Xu, Wang, Markham, Rohde, Singh, Zhang, Hassanpour, Sharma, Lee, Pouget-Abadie, Krijthe, Mahajan, Ke, Wirnsberger, Semenova, Mykhaylov, Shen, Takatsu, Sun, Yang, Franks, Wong, Zaman, Mitchell, kang, Yang
11:00 AM Susan Murphy Murphy
11:30 AM Ying-Qi Zhao Zhao
12:00 PM Tentative topic: Reasoning about untestable assumptions in the face of unknowable counterfactuals
12:45 PM Lunch
02:30 PM Susan Shortreed Shortreed
03:00 PM Contributed talk 1 Chen, Boehnke, Wang, Bonaldi
03:15 PM Contributed talk 2 Mahajan, Khosravi, D'Amour
03:30 PM Poster Spotlights Griveau-Billion, Singh, Zhang, Lee, Krijthe, Charles, Semenova, Ladhania, Oprescu
03:45 PM Coffee break, posters, and 1-on-1 discussions von Kügelgen, Rohde, Schumann, Charles, Veitch, Semenova, Demirer, Syrgkanis, Nair, Puli, Uehara, Gopalan, Ding, Ng, Khosravi, Sherman, Zeng, Wieczorek, Liu, Gan, Hartford, Oprescu, D'Amour, Boehnke, Saito, Griveau-Billion, Modi, Karimov, Berrevoets, Graham, Mayer, Sridhar, Dahabreh, Mishler, Wadsworth, Qureshi, Ladhania, Morishita, Welle
05:00 PM Closing Remarks

Abstracts (5):

Abstract 4: Poster Spotlights in "Do the right thing": machine learning and causal inference for improved decision making, Namkoong, Charpignon, Rudolph, Coston, Saito, Dhillon, Markham 10:00 AM

Poster spotlights ID: 10, 11, 16, 17, 20, 24, 31

Abstract 8: Tentative topic: Reasoning about untestable assumptions in the face of unknowable counterfactuals in "Do the right thing": machine learning and causal inference for improved decision making, 12:00 PM

Tentative topic: How machine learning and causal inference work together: cross-pollination and new challenges.

Abstract 11: Contributed talk 1 in "Do the right thing": machine learning and causal inference for improved decision making, Chen, Boehnke, Wang, Bonaldi 03:00 PM

Oral Spotlights ID: 8, 9, 27

Abstract 12: Contributed talk 2 in "Do the right thing": machine learning and causal inference for improved decision making, Mahajan, Khosravi, D'Amour 03:15 PM

Oral Spotlights ID: 57, 93, 113

Abstract 13: Poster Spotlights in "Do the right thing": machine learning and causal inference for improved decision making, Griveau-Billion, Singh, Zhang, Lee, Krijthe, Charles, Semenova, Ladhania, Oprescu 03:30 PM

Poster Spotlights ID: 34, 35, 39, 50, 56, 68, 75, 111, 112


Bridging Game Theory and Deep Learning

Ioannis Mitliagkas, Gauthier Gidel, Niao He, Reyhane Askari Hemmat, Nika Haghtalab, Simon Lacoste-Julien

West Exhibition Hall A, Sat Dec 14, 08:00 AM

Advances in generative modeling and adversarial learning gave rise to a recent surge of interest in differentiable two-player games, with much of the attention falling on generative adversarial networks (GANs). Solving these games introduces distinct challenges compared to the standard minimization tasks that the machine learning (ML) community is used to. A symptom of this issue is ML and deep learning (DL) practitioners using optimization tools on game-theoretic problems. Our NeurIPS 2018 workshop, "Smooth games optimization in ML", aimed to rectify this situation, addressing theoretical aspects of games in machine learning, their special dynamics, and typical challenges. For this year, we significantly expand our scope to tackle questions like the design of game formulations for other classes of ML problems, the integration of learning with game theory as well as their important applications. To that end, we have confirmed talks from Éva Tardos, David Balduzzi and Fei Fang. We will also solicit contributed posters and talks in the area.

Schedule

08:15 AM Opening remarks
08:30 AM Invited talk: Eva Tardos (Cornell) Tardos
09:10 AM Morning poster spotlight
09:30 AM Morning poster session -- coffee break
11:00 AM Invited talk: David Balduzzi (DeepMind) Balduzzi
11:40 AM Contributed talk: What is Local Optimality in Nonconvex-Nonconcave Minimax Optimization? Netrapalli
12:05 PM Contributed talk: Characterizing Equilibria in Stackelberg Games Fiez
12:30 PM Lunch break
02:00 PM Invited talk: Fei Fang (CMU) Fang
02:40 PM Contributed talk: On Solving Local Minimax Optimization: A Follow-the-Ridge Approach Wang
03:05 PM Contributed talk: Exploiting Uncertain Real-Time Information from Deep Learning in Signaling Games for Security and Sustainability Bondi
03:30 PM Coffee break
04:00 PM Invited talk: Aryan Mokhtari (UT Austin) Mokhtari
04:40 PM Afternoon poster spotlight
05:00 PM Discussion panel
05:30 PM Concluding remarks -- afternoon poster session

Abstracts (4):

Abstract 6: Contributed talk: What is Local Optimality in Nonconvex-Nonconcave Minimax Optimization? in Bridging Game Theory and Deep Learning, Netrapalli 11:40 AM

Minimax optimization has found extensive applications in modern machine learning, in settings such as generative adversarial networks (GANs), adversarial training and multi-agent reinforcement learning. As most of these applications involve continuous nonconvex-nonconcave formulations, a very basic question arises---``what is a proper definition of local optima?''

Most previous work answers this question using classical notions of equilibria from simultaneous games, where the min-player and the max-player act simultaneously. In contrast, most applications in machine learning, including GANs and adversarial training, correspond to sequential games, where the order of which player acts first is crucial (since minimax is in general not equal to maximin due to the nonconvex-nonconcave nature of the problems). The main contribution of this paper is to propose a proper mathematical definition of local optimality for this sequential setting---local minimax, as well as to present its properties and existence results. Finally, we establish a strong connection to a basic local search algorithm---gradient descent ascent (GDA): under mild conditions, all stable limit points of GDA are exactly local minimax points up to some degenerate points.
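GDA, the local search algorithm referenced in Abstract 6 (and the baseline that Follow-the-Ridge, in Abstract 10 below, repairs), is two lines of code; on the bilinear toy game f(x, y) = x*y its simultaneous form famously spirals away from the equilibrium at the origin. The sketch is standard illustrative material, not code from either paper.

    # Simultaneous gradient descent ascent (GDA) on the bilinear game
    # f(x, y) = x * y, whose unique equilibrium is (0, 0). The iterates spiral
    # outward, illustrating the limit-cycling behavior these abstracts analyze.
    # Standard illustration, not code from the workshop papers.
    def gda(x=1.0, y=1.0, lr=0.1, steps=100):
        for _ in range(steps):
            gx, gy = y, x                      # grad_x f = y, grad_y f = x
            x, y = x - lr * gx, y + lr * gy    # min-player descends, max ascends
        return x, y

    print(gda())  # far from (0, 0): the distance grows every step

Each step multiplies the distance to the origin by sqrt(1 + lr^2) > 1, so no step size fixes the cycling; that is exactly the failure mode around fixed points that Follow-the-Ridge's correction term is designed to remove.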
Abstract 7: Contributed talk: Characterizing Equilibria in Stackelberg Games in Bridging Game Theory and Deep Learning, Fiez 12:05 PM

This paper investigates the convergence of learning dynamics in Stackelberg games on continuous action spaces, a class of games distinguished by the hierarchical order of play between agents. We establish connections between the Nash and Stackelberg equilibrium concepts and characterize conditions under which attractors of simultaneous gradient descent are Stackelberg equilibria in zero-sum games. Moreover, we show that the only stable attractors of the Stackelberg gradient dynamics are Stackelberg equilibria in zero-sum games. Using this insight, we develop two-timescale learning dynamics that converge to Stackelberg equilibria in zero-sum games and the set of stable attractors in general-sum games.
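A minimal sketch of hierarchical learning dynamics in this spirit (our illustration, not the authors' algorithm; step sizes and starting point are arbitrary): on the same toy quadratic as above, the leader (min-player) descends the total gradient grad_x f - H_xy H_yy^{-1} grad_y f, i.e. differentiates through the follower's best response, while the follower ascends its own gradient on a faster timescale.

    # Illustrative Stackelberg-style gradient dynamics on the toy quadratic
    # f(x, y) = -3x^2 - y^2 + 4xy (not the paper's code). For this f the
    # Hessian blocks are constant: H_xy = 4, H_yy = -2.

    def grads(x, y):
        return -6 * x + 4 * y, -2 * y + 4 * x  # (df/dx, df/dy)

    eta_x, eta_y = 0.05, 0.25   # follower (max-player) on the faster timescale
    x, y = 0.1, 0.1
    for _ in range(200):
        gx, gy = grads(x, y)
        total_gx = gx - (4 / -2) * gy   # leader: d/dx f(x, best_response(x))
        x, y = x - eta_x * total_gx, y + eta_y * gy
    print(x, y)  # converges to the Stackelberg equilibrium (0, 0)

Simultaneous gradient descent ascent with equal step sizes is repelled from this same point, so the example is one toy instance of the gap between simultaneous and hierarchical dynamics that the abstract characterizes.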
Abstract 10: Contributed talk: On Solving Local Minimax Optimization: A Follow-the-Ridge Approach in Bridging Game Theory and Deep Learning, Wang 02:40 PM

Many tasks in modern machine learning can be formulated as finding equilibria in \emph{sequential} games. In particular, two-player zero-sum sequential games, also known as minimax optimization, have received growing interest. It is tempting to apply gradient descent to solve minimax optimization given its popularity in supervised learning. However, we note that naive application of gradient descent fails to find local minimax -- the analogy of local minima in minimax optimization, since the fixed points of gradient dynamics might not be local minimax. In this paper, we propose \emph{Follow-the-Ridge} (FR), an algorithm that locally converges to and only converges to local minimax. We show theoretically that the algorithm addresses the limit cycling problem around fixed points, and is compatible with preconditioning and \emph{positive} momentum. Empirically, FR solves quadratic minimax problems and improves GAN training on simple tasks.
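A rough sketch of the ridge correction as we read it (an assumption-laden illustration, not the authors' code): the max-player's ascent step is augmented with a term H_yy^{-1} H_yx grad_x f, which keeps the iterate near the ridge of the max-player's local best responses. On the toy quadratic above, whose Hessian blocks are constant, this turns the repelling fixed point of equal-step GDA into an attractor.

    # Sketch of a Follow-the-Ridge-style update on f(x, y) = -3x^2 - y^2 + 4xy
    # (our reconstruction, not the authors' code). Here the Hessian blocks are
    # constant: H_yy = -2, H_yx = 4, so H_yy^{-1} H_yx grad_x f = -2 * grad_x f.

    def grads(x, y):
        return -6 * x + 4 * y, -2 * y + 4 * x  # (df/dx, df/dy)

    eta = 0.05
    x, y = 0.1, 0.1
    for _ in range(200):
        gx, gy = grads(x, y)
        x, y = (x - eta * gx,                      # min-player: gradient descent
                y + eta * gy + eta * (-2.0) * gx)  # max-player: ascent + ridge correction
    print(x, y)  # approaches (0, 0), where equal-step GDA diverged

Here the ridge is y = 2x, the max-player's local best response; the correction makes the y-iterate track it, which is the mechanism behind the local convergence claim.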
Abstract 11: Contributed talk: Exploiting Uncertain Real-Time Information from Deep Learning in Signaling Games for Security and Sustainability in Bridging Game Theory and Deep Learning, Bondi 03:05 PM

Motivated by real-world deployment of drones for conservation, this paper advances the state-of-the-art in security games with signaling. The well-known defender-attacker security games framework can help in planning for such strategic deployments of sensors and human patrollers, and warning signals to ward off adversaries. However, we show that defenders can suffer significant losses when ignoring real-world uncertainties, such as detection uncertainty resulting from imperfect deep learning models, despite carefully planned security game strategies with signaling. In fact, defenders may perform worse than forgoing drones completely in this case. We address this shortcoming by proposing a novel game model that integrates signaling and sensor uncertainty; perhaps surprisingly, we show that defenders can still perform well via a signaling strategy that exploits the uncertain real-time information primarily from deep learning models. For example, even in the presence of uncertainty, the defender still has an informational advantage in knowing that she has or has not actually detected the attacker; and she can design a signaling scheme to ``mislead'' the attacker who is uncertain as to whether he has been detected. We provide a novel algorithm, scale-up techniques, and experimental results from simulation based on our ongoing deployment of a conservation drone system in South Africa.
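The informational-advantage argument can be made concrete with a deliberately tiny, entirely hypothetical example (the detection rate, payoffs, and strategy space below are invented for illustration and are far simpler than the paper's game model): a defender who always signals upon detection and bluffs with probability b otherwise should pick the largest b that keeps her warnings credible to a Bayesian attacker.

    import numpy as np

    # Hypothetical toy signaling game with an imperfect detector; all numbers
    # are made up for illustration and are not from the paper.
    q = 0.3                                       # P(detect | attacker present)
    U_CAUGHT, U_FLEE, U_SUCCESS = 1.0, 0.2, -1.0  # defender utilities

    def defender_eu(b):
        """Defender signals whenever she detects; bluffs w.p. b otherwise."""
        p_signal = q + (1 - q) * b
        posterior = q / p_signal          # attacker's P(detected | signal)
        if posterior >= 0.5:              # attacker's best response: flee on signal
            return U_FLEE * p_signal + U_SUCCESS * (1 - q) * (1 - b)
        return U_CAUGHT * q + U_SUCCESS * (1 - q)  # bluff called; attacker continues

    bs = np.linspace(0, 1, 101)
    b_best = max(bs, key=defender_eu)
    print(b_best, defender_eu(b_best))  # bluff as much as credibility allows

On this toy grid the optimal bluff rate sits just below the credibility threshold: bluff any more and the attacker rationally starts ignoring signals, so the defender's expected utility drops.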

Deep Reinforcement Learning

Pieter Abbeel, Chelsea Finn, Joelle Pineau, David Silver, Satinder Singh, Joshua Achiam, Carlos Florensa, Christopher Grimm, Haoran Tang, Vivek Veeriah

West Exhibition Hall C, Sat Dec 14, 08:00 AM

In recent years, the use of deep neural networks as function approximators has enabled researchers to extend reinforcement learning techniques to solve increasingly complex control tasks. The emerging field of deep reinforcement learning has led to remarkable empirical results in rich and varied domains like robotics, strategy games, and multiagent interaction. This workshop will bring together researchers working at the intersection of deep learning and reinforcement learning, and it will help interested researchers outside of the field gain a high-level view about the current state of the art and potential directions for future contributions.

Schedule

08:45 AM Welcome Comments
09:00 AM Grandmaster Level in StarCraft II using Multi-Agent Reinforcement Learning - Invited Talk Vinyals
09:30 AM Contributed Talks Tang, Guo, Hafner
10:00 AM Bayes-Adaptive Deep Reinforcement Learning via Meta-Learning - Invited Talk Whiteson
10:30 AM Coffee Break
11:00 AM Optico: A Framework for Model-Based Optimization with MuJoCo Physics - Invited Talk Todorov
11:30 AM Contributed Talks Lu, Hausknecht, Nachum
12:00 PM Late-Breaking Papers (Talks) Silver, Du, Plappert
01:30 PM Invited Talk Brunskill
02:00 PM Contributed Talks Agarwal, Gleave, Lee
02:30 PM Poster Session Sabatelli, Stooke, Abdi, Rauber, Adolphs, Osband, Meisheri, Kurach, Ackermann, Benatan, ZHANG, Tessler, Shen, Samvelyan, Islam, Dalal, Harries, Kurenkov, Żołna, Dasari, Hartikainen, Nachum, Lee, Holzleitner, Nguyen, Song, Grimm, Silva, Luo, Wu, Lee, Paine, Qu, Graves, Flet-Berliac, Tang, Nair, Hausknecht, Bagaria, Schmitt, Baker, Parmas, Eysenbach, Lee, Lin, Seita, Gupta, Simmons-Edler, Guo, Corder, Kumar, Fujimoto, Lerer, Clavera Gilaberte, Rhinehart, Nair, Yang, Wang, Sohn, Hernandez-Garcia, Lee, Srivastava, Khetarpal, Xiao, Carvalho Melo, Agarwal, Yu, Berseth, Chaplot, Tang, Srinivasan, Medini, Havens, Laskin, Mujika, Saphal, Marino, Ray, Achiam, Mandlekar, Liu, Hafner, Tang, Xiao, Walton, Druce, Alet, Hong, Chan, Nagabandi, Liu, Sun, Liu, Jayaraman, Co-Reyes, Sanborn
04:00 PM NeurIPS RL Competitions Results Presentations
05:00 PM Assessing the Robustness of Deep RL Algorithms - Invited Talk Littman
05:30 PM Panel Discussion

Abstracts (7):

Abstract 3: Contributed Talks in Deep Reinforcement Learning, Tang, Guo, Hafner 09:30 AM

* "Playing Dota 2 with Large Scale Deep Reinforcement Learning" - OpenAI, Christopher Berner, Greg Brockman, Brooke Chan, Vicki Cheung, Przemysław Dębiak, Christy Dennison, David Farhi, Quirin Fischer, Shariq Hashme, Chris Hesse, Rafal Józefowicz, Scott Gray, Catherine Olsson, Jakub Pachocki, Michael Petrov, Henrique Pondé de Oliveira Pinto, Jonathan Raiman, Tim Salimans, Jeremy Schlatter, Jonas Schneider, Szymon Sidor, Ilya Sutskever, Jie Tang, Filip Wolski, Susan Zhang
* "Efficient Exploration with Self-Imitation Learning via Trajectory-Conditioned Policy" - Yijie Guo, Jongwook Choi, Marcin Moczulski, Samy Bengio, Mohammad Norouzi, Honglak Lee
* "Efficient Visual Control by Latent Imagination" - Danijar Hafner, Timothy Lillicrap, Jimmy Ba, Mohammad Norouzi

Abstract 7: Contributed Talks in Deep Reinforcement Learning, Lu, Hausknecht, Nachum 11:30 AM

* "Adaptive Online Planning for Lifelong Reinforcement Learning" - Kevin Lu, Igor Mordatch, Pieter Abbeel
* "Interactive Fiction Games: A Colossal Adventure" - Matthew Hausknecht, Prithviraj Ammanabrolu, Marc-Alexandre Côté, Xingdi Yuan
* "Hierarchy is Exploration: An Empirical Analysis of the Benefits of Hierarchy" - Ofir Nachum, Haoran Tang, Xingyu Lu, Shixiang Gu, Honglak Lee, Sergey Levine

Abstract 8: Late-Breaking Papers (Talks) in Deep Reinforcement Learning, Silver, Du, Plappert 12:00 PM

* "Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model" - Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, Timothy Lillicrap, David Silver
* "Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning?" - Simon S. Du, Sham M. Kakade, Ruosong Wang, Lin F. Yang
* "Solving Rubik's Cube with a Robot Hand" - OpenAI, Ilge Akkaya, Marcin Andrychowicz, Maciek Chociej, Mateusz Litwin, Bob McGrew, Arthur Petron, Alex Paino, Matthias Plappert, Glenn Powell, Raphael Ribas, Jonas Schneider, Nikolas Tezak, Jerry Tworek, Peter Welinder, Lilian Weng, Qiming Yuan, Wojciech Zaremba, Lei Zhang

Abstract 9: Invited Talk in Deep Reinforcement Learning, Brunskill 01:30 PM

(Talk title and abstract TBD.)

Abstract 10: Contributed Talks in Deep Reinforcement Learning, Agarwal, Gleave, Lee 02:00 PM

* "Striving for Simplicity in Off-Policy Deep Reinforcement Learning" - Rishabh Agarwal, Dale Schuurmans, Mohammad Norouzi
* "Adversarial Policies: Attacking Deep Reinforcement Learning" - Adam R Gleave, Michael Dennis, Neel Kant, Cody Wild, Sergey Levine, Stuart Russell
* "A Simple Randomization Technique for Generalization in Deep Reinforcement Learning" - Kimin Lee, Kibok Lee, Jinwoo Shin, Honglak Lee

Abstract 12: NeurIPS RL Competitions Results Presentations in Deep Reinforcement Learning, 04:00 PM

16:00 - 16:15 Learn to Move: Walk Around
16:15 - 16:30 Animal Olympics
16:30 - 16:45 Robot open-Ended Autonomous Learning (REAL)
16:45 - 17:00 MineRL

Abstract 14: Panel Discussion in Deep Reinforcement Learning, 05:30 PM

(Topic and panelists TBA.)
