
NeurIPS 2020 Workshop book

Schedule Highlights Generated Fri Dec 18, 2020

Workshop organizers make last-minute changes to their schedule. Download this document again to get the latest changes, or use the NeurIPS mobile application.

Dec. 10, 2020

None Topological Data Analysis and Beyond Rieck, Chazal, Krishnaswamy, Kwitt, Natesan Ramamurthy, Umeda, Wolf

Dec. 11, 2020

None Privacy Preserving Machine Learning - PriML and PPML Joint Edition Balle, Bell, Bellet, Chaudhuri, Gascon, Honkela, Koskela, Meehan, Ohrimenko, Park, Raykova, Smart, Wang, Weller
None Tackling Climate Change with ML Dao, Sherwin, Donti, Kuntz, Kaack, Yusuf, Rolnick, Nakalembe, Monteleoni, Bengio
None Meta-Learning Wang, Vanschoren, Grant, Schwarz, Visin, Clune, Calandra
None OPT2020: Optimization for Machine Learning Paquette, Schmidt, Stich, Gu, Takac
None Advances and Opportunities: Machine Learning for Education Garg, Heffernan, Meyers
None Differential Geometry meets Deep Learning (DiffGeo4DL) Bose, Mathieu, Le Lan, Chami, Sala, De Sa, Nickel, Ré, Hamilton
None Workshop on Dataset Curation and Security Baracaldo Angel, Bisk, Blum, Curry, Dickerson, Goldblum, Goldstein, Li, Schwarzschild
None Machine Learning for Health (ML4H): Advancing Healthcare for All Hyland, Schmaltz, Onu, Nosakhare, Alsentzer, Chen, McDermott, Roy, Akera, Kiyasseh, Falck, Adams, Bica, Bear Don't Walk IV, Sarkar, Pfohl, Beam, Beaulieu-Jones, Belgrave, Naumann
None Learning Meaningful Representations of Life (LMRL.org) Wood, Marks, Jones, Dieng, Aspuru-Guzik, Kundaje, Engelhardt, Liu, Boyden, Lindorff-Larsen, Nitzan, Krishnaswamy, Boomsma, Wang, Van Valen, Ashenberg
None First Workshop on Quantum Tensor Networks in Machine Learning Liu, Zhao, Biamonte, Caiafa, Liang, Cohen, Leichenauer
None Human in the loop dialogue systems Hedayatnia, Goel, Oraby, See, Khatri, Boureau, Geramifard, Walker, Hakkani-Tur
None The pre-registration experiment: an alternative publication model for machine learning research Bertinetto, Henriques, Albanie, Paganini, Varol
None Differentiable computer vision, graphics, and physics in machine learning Jatavallabhula, Allen, Dean, Hansen, Song, Shkurti, Paull, Nowrouzezahrai, Tenenbaum
None Causal Discovery and Causality-Inspired Machine Learning Huang, Magliacane, Zhang, Belgrave, Bareinboim, Malinsky, Richardson, Meek, Spirtes, Schölkopf
None Self-Supervised Learning for Speech and Audio Processing Mohamed, Lee, Watanabe, Li, Sainath, Livescu
None Machine Learning and the Physical Sciences Anandkumar, Cranmer, Ho, Prabhat, Zdeborová, Baydin, Carrasquilla, Dieng, Kashinath, Louppe, Nord, Paganini, Thais
None ML Competitions at the Grassroots (CiML 2020) Chklovski, Mendrik, Banifatemi, Stolovitzky
None Resistance AI Workshop Kite, Tesfaldet, Abdurahman, Agnew, Creager, Foryciarz, Gontijo Lopes, Kalluri, Png, Sabin, Skoularidou, Vilarino, Wang, Kapoor, Carroll
None Workshop on Deep Learning and Inverse Problems Heckel, Hand, Baraniuk, Zdeborová, Feizi
None 3rd Robot Learning Workshop Itkina, Bewley, Calandra, Gilitschenski, PEREZ, Senanayake, Wulfmeier, Vanhoucke
None Machine Learning for Autonomous Driving McAllister, Weng, Omeiza, Rhinehart, Yu, Ros, Koltun
None Fair AI in Finance Kumar, Rudin, Paisley, Moulinier, Bruss, K., Tibbs, Olabiyi, Gandrabur, Vyetrenko, Compher
None Object Representations for Learning and Reasoning Agnew, Assouel, Chang, Creswell, Kosoy, Rajeswaran, van Steenkiste
None Crowd Science Workshop: Remoteness, Fairness, and Mechanisms as Challenges of Data Supply by Humans for Automation Baidakova, Casati, Drutsa, Ustalov
None Competition Track Friday Escalante, Hofmann
None ML Retrospectives, Surveys & Meta-Analyses (ML-RSA) Yadav, Pradhan, Dodge, Jaiswal, Henderson, Gupta, Lowe, Jessica Forde, Pineau
None Deep Reinforcement Learning Abbeel, Finn, Pineau, Silver, Singh, Devin, Laskin, Lee, Rajendran, Veeriah
None KR2ML - Knowledge Representation and Reasoning Meets Machine Learning Thost, Talamadupula, Srikumar, Zhang, Tenenbaum
None BabyMind: How Babies Learn and How Machines Can Imitate Zhang, Marcus, Cangelosi, Knoeferle, Obermayer, Vernon, Yu
None Machine Learning for Economic Policy Zheng, Trott, Liang, Morgenstern, Parkes, Haghtalab

Dec. 12, 2020

None Algorithmic Fairness through the Lens of Causality and Interpretability Dieng, Schrouff, Kusner, Farnadi, Diaz
None Medical Imaging Meets NeurIPS Teuwen, Dou, Glocker, Oguz, Feragen, Lombaert, Konukoglu, de Bruijne
None Learning Meets Combinatorial Algorithms Vlastelica, Song, Ferber, Amos, Martius, Dilkina, Yue
None Machine Learning for the Developing World (ML4D): Improving Resilience Afonja, Klemmer, Kalavakonda, Azeez, Salama, Rodriguez Diaz
None Biological and Artificial Reinforcement Learning Chua, Behbahani, Lee, Zannone, Ponte Costa, Richards, Momennejad, Precup
None I Can’t Believe It’s Not Better! Bridging the gap between theory and empiricism in probabilistic machine learning Forde, Ruiz, Fernandez Pradier, Schein, Doshi-Velez, Valera, Blei, Wallach
None Machine Learning for Engineering Modeling, Simulation and Design Beatson, Donti, Abdel-Rahman, Hoyer, Yu, Kolter, Adams
None Machine Learning for Creativity and Design 4.0 Elliott, Dieleman, Roberts, White, Ippolito, Grimm, Tesfaldet, Azadi
None Cooperative AI Graepel, Amodei, Conitzer, Dafoe, Hadfield, Horvitz, Kraus, Larson, Bachrach
None Machine Learning for Molecules Hernández-Lobato, Kusner, Paige, Segler, Wei
None Navigating the Broader Impacts of AI Research Ashurst, Campbell, Raji, Barocas, Russell
None Beyond BackPropagation: Novel Ideas for Training Neural Architectures Malinowski, Swirszcz, Patraucean, Gori, Huang, Löwe, Choromanska
None MLPH: Machine Learning in Public Health Chunara, Flaxman, Lizotte, Patel, Rosella
None Wordplay: When Language Meets Games Ammanabrolu, Hausknecht, Yuan, Côté, Trischler, Mathewson, Urbanek, Weston, Riedl
None Interpretable Inductive Biases and Physically Structured Learning Lutter, Terenin, Ho, Wang
None AI for Earth Sciences Mukkavilli, Hansen, Dudek, Beucler, Kochanski, Mudigonda, Kashinath, McGovern, Miller, Frischmann, Gentine, Dudek, Courville, Kammen, Kumar
None Machine Learning for Mobile Health Futoma, Dempsey, Heller, Ma, Foti, Njifon, Zhang, Shi
None Talking to Strangers: Zero-Shot Emergent Communication Ossenkopf, Filos, Gupta, Noukhovitch, Lazaridou, Foerster, Bullard, Chaabouni, Kharitonov, Dessì
None Shared Visual Representations in Human and Machine Intelligence (SVRHM) Deza, Peterson, Murty, Griffiths
None Competition Track Saturday Escalante, Hofmann
None Machine Learning for Structural Biology Townshend, Eismann, Dror, Zhong, Anand, Ingraham, Boomsma, Ovchinnikov, Rao, Greisen, Kolodny, Berger
None Second Workshop on AI for Humanitarian Assistance and Disaster Response Gupta, Murphy, Heim, Wang, Goodman, Patel, Bilinski, Nemni
None Consequential Decisions in Dynamic Environments Kilbertus, Zhou, Wilson, Miller, Hu, Liu, Kallus, Mitchell
None HAMLETS: Human And Model in the Loop Evaluation and Training Strategies Kaushik, Paranjape, Arabshahi, Elazar, Nie, Bartolo, Kirichenko, Saito Stenetorp, Bansal, Lipton, Kiela
None International Workshop on Scalability, Privacy, and Security in Federated Learning (SpicyFL 2020) Li, Dou, Talwalkar, Li, Wang, Wang
None Workshop on Computer Assisted Programming (CAP) Odena, Sutton, Polikarpova, Tenenbaum, Solar-Lezama, Dillig
None The Challenges of Real World Reinforcement Learning Mankowitz, Dulac-Arnold, Mannor, Gottesman, Nagabandi, Precup, Mann, Dulac-Arnold
None Self-Supervised Learning -- Theory and Practice Xie, Zhang, Agrawal, Misra, Rudin, Mohamed, Yuan, Zoph, van der Maaten, Yang, Xing
None Machine Learning for Systems Goldie, Mirhoseini, Raiman, Maas, XU
None Offline Reinforcement Learning Kumar, Agarwal, Tucker, Li, Precup, Kumar
None Deep Learning through Information Geometry Chaudhari, Alemi, Jog, Mehta, Nielsen, Soatto, Ver Steeg

Dec. 10, 2020 Schedule

Topological Data Analysis and Beyond

Bastian Rieck, Frederic Chazal, Smita Krishnaswamy, Roland Kwitt, Karthi Natesan Ramamurthy, Yuhei Umeda, Guy Wolf

Thu Dec 10, 23:00 PM

The last decade saw an enormous boost in the field of computational topology: methods and concepts from algebraic and differential topology, formerly confined to the realm of pure mathematics, have demonstrated their utility in numerous areas such as computational biology, personalised medicine, materials science, and time-dependent data analysis, to name a few.

The newly-emerging domain comprising topology-based techniques is often referred to as topological data analysis (TDA). Next to their applications in the aforementioned areas, TDA methods have also proven to be effective in supporting, enhancing, and augmenting both classical machine learning and deep learning models.

We believe that it is time to bring together theorists and practitioners in a creative environment to discuss the goals beyond the currently-known bounds of TDA. We want to start a conversation between experts, non-experts, and users of TDA methods to debate the next steps the field should take. We also want to disseminate methods to a broader audience and demonstrate how easy the integration of topological concepts into existing methods can be.

**Important links**:

- [Gather.Town (for poster sessions)](https://neurips.gather.town/app/EfqcVjt6CmhKKeu0/TDA%20and%20Beyond%20@%20NeurIPS)
- [Rocket.Chat (for asking questions)](https://neurips2020.rocket.chat/channel/topological-data-analysis-and-beyond-99)
- [Slack (for asking questions)](https://join.slack.com/t/tda-in-ml/shared_invite/zt-brm7ypv4-Br0vXGge8wUoaSmgp~JTGA)

N/A Gather.Town (for poster sessions)
N/A Rocket.Chat (for asking questions to panellists)
N/A Slack (for asking questions to panellists)
11:00 PM Opening Remarks Chazal, Krishnaswamy, Kwitt, Natesan Ramamurthy, Rieck, Umeda, Wolf
11:15 PM Keynote: Kathryn Hess: Topological Insights in Neuroscience Hess
11:45 PM Invited Talk: Vidit Nanda: Singularity Detection in Data Nanda
12:00 AM Invited Talk: Yuzuru Yamakage: Industrial Application of TDA-ML technology: Achievement so far and expectations of future Yamakage
12:15 AM Invited Talk: Katharine Turner Turner
12:30 AM Invited Talk: Manohar Kaul: Solving Partial Assignment Problems using Random Simplicial Complexes Kaul
12:45 AM Invited Talk: Yasuaki Hiraoka: Characterizing Rare Events in Persistent Homology Hiraoka
01:00 AM Invited Talk: Serguei Barannikov: Topological Obstructions to Neural Networks’ Learning Barannikov
01:15 AM Invited Talk: Ulrich Bauer: The Representation Theory of Filtered Hierarchical Clustering Bauer
01:30 AM Spotlight: Topo Sampler: A Topology Constrained Noise Sampling for GANs Dey, Das
01:33 AM Spotlight: Weighting Vectors for Machine Learning: Numerical Harmonic Analysis Applied to Boundary Detection Bunch, Kline, Dickinson, Fung
01:36 AM Spotlight: Hypothesis Classes with a Unique Persistence Diagram are Nonuniformly Learnable Bishop, Tran-Thanh, Davies
01:39 AM Spotlight: Quantifying Barley Morphology Using the Euler Characteristic Transform Amézquita, Munch, Quigley, Ophelders, Landis, Chitwood, Koenig

01:42 AM Spotlight: giotto-tda: A Topological Data Analysis Toolkit for Machine Learning and Data Exploration Tauzin, Lupo, Hess, Tunstall, Pérez, Caorsi, Reise, Medina-Mardones, Dassatti
01:45 AM Poster Session I & Break
02:15 AM Discussion I
03:00 AM Keynote: Gunnar Carlsson Carlsson
03:30 AM Lunch Break
04:00 AM Invited Talk: Lida Kanari: A Topological Insight on Neuronal Morphologies Kanari
04:30 AM Invited Talk: Peter Bubenik Bubenik
04:45 AM Invited Talk: Andrew J. Blumberg Blumberg
N/A Demo: Teaspoon Package
05:00 AM Invited Talk: Bei Wang: Topology and Neuron Activations in Deep Learning Wang
05:15 AM Invited Talk: Lorin Crawford: A Machine Learning Pipeline for Feature Selection and Association Mapping with 3D Shapes Crawford
05:30 AM Invited Talk: Chao Chen Chen
05:45 AM Invited Talk: Mathieu Carrière: Probabilistic and Statistical Aspects of Reeb spaces and Mappers Carriere
05:45 AM Invited Talk: Brittany Terese Fasy: Searching in the Space of Persistence Diagrams Fasy
06:15 AM Invited Talk: Don Sheehy Sheehy
06:30 AM Spotlight: k-simplex2vec: A Simplicial Extension of node2vec Hacker
06:33 AM Spotlight: Sheaf Neural Networks Hansen, Gebhart
06:36 AM Spotlight: Characterizing the Latent Space of Molecular Generative Models with Persistent Homology Metrics Schiff, Das, Chenthamarakshan, Natesan Ramamurthy
06:39 AM Spotlight: Permutation Invariant Networks to Learn Wasserstein Metrics Sehanobish, Ravindra, van Dijk
06:42 AM Spotlight: Multidimensional Persistence Module Classification via Lattice-Theoretic Convolutions Riess
06:45 AM Poster Session II & Break
07:15 AM Discussion II
08:00 AM Invited Talk: Laxmi Parida: TDA on Covid 19 OMICS data PARIDA
08:15 AM Invited Talk: Jose Perea: TALLEM – Topological Assembly of LocalLy Euclidean Models Perea
08:30 AM Invited Talk: Yusu Wang: Discrete Morse-based Graph Reconstruction and Data Analysis Wang
08:45 AM Invited Talk: Robert Ghrist: The Tarski Laplacian Ghrist
09:00 AM Invited Talk: Elizabeth Munch: Persistent Homology of Complex Networks for Dynamic State Detection in Time Series Munch
09:15 AM Invited Talk: Leland McInnes: UMAP + MAPPER = UMAPPER McInnes
09:30 AM Invited Talk: Facundo Mémoli: Spatiotemporal Persistent Homology for Dynamic Metric Spaces Mémoli
09:45 AM Poster Session III & Break
10:15 AM Discussion III
11:15 AM Closing Remarks Chazal, Krishnaswamy, Kwitt, Natesan Ramamurthy, Rieck, Umeda, Wolf

Dec. 11, 2020 Schedule

Privacy Preserving Machine Learning - PriML and PPML Joint Edition

Borja Balle, James Bell, Aurélien Bellet, Kamalika Chaudhuri, Adria Gascon, Antti Honkela, Antti Koskela, Casey Meehan, Olga Ohrimenko, Mi Jung Park, Mariana Raykova, Mary Anne Smart, Yu-Xiang Wang, Adrian Weller

Fri Dec 11, 01:20 AM

This one day workshop focuses on privacy preserving techniques for machine learning and disclosure in large scale data analysis, both in the distributed and centralized settings, and on scenarios that highlight the importance and need for these techniques (e.g., via privacy attacks). There is growing interest from the Machine Learning (ML) community in leveraging cryptographic techniques such as Multi-Party Computation (MPC) and Homomorphic Encryption (HE) for privacy preserving training and inference, as well as Differential Privacy (DP) for disclosure. Simultaneously, the systems security and cryptography community has proposed various secure frameworks for ML. We encourage both theory and application-oriented submissions exploring a range of approaches listed below. Additionally, given the tension between the adoption of machine learning technologies and ethical, technical and regulatory issues about privacy, as highlighted during the COVID-19 pandemic, we invite submissions for the special track on this topic.

Schedule

01:20 AM Welcome & Introduction
01:30 AM Invited Talk #1: Reza Shokri (National University of Singapore) Shokri
02:00 AM Invited Talk #2: Katrina Ligett (Hebrew University) Ligett
02:30 AM Invited Talk Q&A with Reza and Katrina
03:00 AM Break
03:10 AM Contributed Talk #1: POSEIDON: Privacy-Preserving Federated Neural Network Learning Sav
03:25 AM Contributed Talk Q&A
03:30 AM Poster Session & Social on Gather.Town
08:30 AM Welcome & Introduction
08:40 AM Invited Talk #3: Carmela Troncoso (EPFL) Troncoso
09:00 AM Invited Talk #4: Dan Boneh (Stanford University) Boneh
09:30 AM Invited Talk Q&A with Carmela and Dan
10:00 AM Break
10:10 AM Poster Session & Social on Gather.Town
11:10 AM Break
11:20 AM Contributed Talk #2: On the (Im)Possibility of Private Machine Learning through Instance Encoding Carlini
11:35 AM Contributed Talk #3: Poirot: Private Contact Summary Aggregation Wang
11:50 AM Contributed Talk #4: Greenwoods: A Practical Random Forest Framework for Privacy Preserving Training and Prediction Chaudhari
12:05 PM Contributed Talks Q&A
12:20 PM Break
12:25 PM Contributed Talk #5: Shuffled Model of Federated Learning: Privacy, Accuracy, and Communication Trade-offs Data
12:40 PM Contributed Talk #6: Sample-efficient proper PAC learning with approximate differential privacy Ghazi
12:55 PM Contributed Talk #7: Training Production Language Models without Memorizing User Data Ramaswamy, Thakkar
01:10 PM Contributed Talks Q&A
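Differential privacy recurs throughout the poster program that follows. As a minimal, self-contained refresher (not tied to any particular paper listed here), the Laplace mechanism releases a sensitivity-1 counting query with epsilon-differential privacy:

```python
# Minimal sketch of the Laplace mechanism; values and query are illustrative.
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng):
    # Adding Laplace noise with scale sensitivity/epsilon yields epsilon-DP.
    return true_value + rng.laplace(scale=sensitivity / epsilon)

rng = np.random.default_rng(0)
data = rng.integers(0, 2, size=1000)   # hypothetical private bits
true_count = int(data.sum())           # counting query has sensitivity 1
for eps in (0.1, 1.0, 10.0):
    noisy = laplace_mechanism(true_count, sensitivity=1, epsilon=eps, rng=rng)
    print(f"epsilon={eps}: true={true_count}, released={noisy:.1f}")
```

Smaller epsilon means more noise and stronger privacy; the loop above makes that trade-off visible directly.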

N/A Privacy Attacks on Machine Unlearning Gao
N/A On Polynomial Approximations for Privacy-Preserving and Verifiable ReLU Networks Avestimehr
N/A Differentially Private Generative Models Through Optimal Transport Kreis
N/A Secure Medical Image Analysis with CrypTFlow Alvarez-Valle
N/A Quantifying Privacy Leakage in Graph Embedding Boutet
N/A Differentially Private Bayesian Inference For GLMs Jälkö
N/A Optimal Client Sampling for Federated Learning Horváth
N/A Machine Learning with Membership Privacy via Knowledge Transfer Shejwalkar
N/A DYSAN: Dynamically sanitizing motion sensor data against sensitive inferences through adversarial networks JOURDAN
N/A Characterizing Private Clipped Gradient Descent on Convex Generalized Linear Problems Song
N/A Challenges of Differentially Private Prediction in Healthcare Settings Papernot
N/A Unifying Privacy Loss for Data Analytics Rogers
N/A Privacy in Multi-armed Bandits: Fundamental Definitions and Lower Bounds on Regret Basu
N/A Local Differentially Private Regret Minimization in Reinforcement Learning Garcelon
N/A Dynamic Channel Pruning for Privacy Singh
N/A SOTERIA: In Search of Efficient Neural Networks for Private Inference Shokri
N/A Does Domain Generalization Provide Inherent Membership Privacy Mahajan
N/A Fairness in the Eyes of the Data: Certifying Machine-Learning Models Baum
N/A Dataset Inference: Ownership Resolution in Machine Learning Papernot
N/A Privacy-preserving XGBoost Inference Meng
N/A Asymmetric Private Set Intersection with Applications to Contact Tracing and Private Vertical Federated Machine Learning Cebere
N/A Multi-Headed Global Model for handling Non-IID data Arora
N/A PrivAttack: A Membership Inference Attack Framework Against Deep Reinforcement Learning Agents gomrokchi
N/A Effectiveness of MPC-friendly Softmax Replacement Keller
N/A Accuracy, Interpretability and Differential Privacy via Explainable Boosting Nori
N/A Differentially private cross-silo federated learning Heikkilä
N/A On the Sample Complexity of Privately Learning Unbounded High-Dimensional Gaussians Aden-Ali
N/A Privacy Amplification by Decentralization Bellet
N/A Enabling Fast Differentially Private SGD via Static Graph Compilation and Batch-Level Parallelism Subramani
N/A Understanding Unintended Memorization in Federated Learning Thakkar
N/A Privacy Risks in Embedded Deep Learning Shejwalkar
N/A CrypTen: Secure Multi-Party Computation Meets Machine Learning Sengupta
N/A Twinify: A software package for differentially private data release Jälkö
N/A Hiding Among the Clones: A Simple and Nearly Optimal Analysis of Privacy Amplification by Shuffling Feldman
N/A Secure Single-Server Aggregation with (Poly)Logarithmic Overhead Bell
N/A Robust and Private Learning of Halfspaces Ghazi
N/A Randomness Beyond Noise: Differentially Private Optimization Improvement through Mixup Xiao
N/A Distributed Differentially Private Averaging with Improved Utility and Robustness to Malicious Parties Bellet
N/A DAMS: Meta-estimation of private sketch data structures for differentially private contact tracing Vepakomma
N/A SparkFHE: Distributed Dataflow Framework with Fully Homomorphic Encryption Hu
N/A SWIFT: Super-fast and Robust Privacy-Preserving Machine Learning Koti
N/A MP2ML: A Mixed-Protocol Machine Learning Framework for Private Inference Boemer
N/A Robustness Threats of Differential Privacy Oseledets
N/A Adversarial Attacks and Countermeasures on Private Training in MPC Jagielski
N/A Data-oblivious training for XGBoost models Leung
N/A Generative Adversarial User Privacy in Lossy Single-Server Information Retrieval Weng
N/A Towards General-purpose Infrastructure for Protecting Scientific Data Under Study Prakash
N/A Privacy Preserving Chatbot Conversations Biswas
N/A Differentially Private Stochastic Coordinate Descent Damaskinos
N/A New Challenges for Fully Homomorphic Encryption Joye
N/A Data Appraisal Without Data Sharing Xu
N/A Mitigating Leakage in Federated Learning with Trusted Hardware Ghareh Chamani
N/A A Principled Approach to Learning Stochastic Representations for Privacy in Deep Neural Inference Mireshghallah
N/A Revisiting Membership Inference Under Realistic Assumptions Jayaraman
N/A Network Generation with Differential Privacy Zheng
N/A Privacy Regularization: Joint Privacy-Utility Optimization in Language Models Mireshghallah
N/A Tight Approximate Differential Privacy for Discrete-Valued Mechanisms Using FFT Koskela
N/A Individual Privacy Accounting via a Rényi Filter Feldman

Tackling Climate Change with ML

David Dao, Evan Sherwin, Priya Donti, Lauren Kuntz, Lynn Kaack, Yumna Yusuf, David Rolnick, Catherine Nakalembe, Claire Monteleoni, Yoshua Bengio

Fri Dec 11, 03:00 AM

Climate change is one of the greatest problems society has ever faced, with increasingly severe consequences for humanity as natural disasters multiply, sea levels rise, and ecosystems falter. Since climate change is a complex issue, action takes many forms, from designing smart electric grids to tracking greenhouse gas emissions through satellite imagery. While no silver bullet, machine learning can be an invaluable tool in fighting climate change via a wide array of applications and techniques. These applications require algorithmic innovations in machine learning and close collaboration with diverse fields and practitioners. This workshop is intended as a forum for those in the machine learning community who wish to help tackle climate change. Building on our past workshops on this topic, this workshop aims to especially emphasize the pipeline to impact, through conversations about machine learning with decision-makers and other global leaders in implementing climate change strategies. The all-virtual format of NeurIPS 2020 provides a special opportunity to foster cross-pollination between researchers in machine learning and experts in complementary fields.

Schedule

03:00 AM Welcome and opening remarks
04:00 AM Introduction to Spotlights
04:05 AM Spotlight: Deep Learning for Climate Model Output Statistics Steininger
04:15 AM Spotlight: An Enriched Automated PV Registry: Combining Image Recognition and 3D Building Data Mayer
04:22 AM Spotlight: Interpretability in Convolutional Neural Networks for Building Damage Classification in Satellite Imagery Chen
04:32 AM Spotlight: A Machine Learning Approach to Methane Emissions Mitigation in the Oil and Gas Industry Wang
04:42 AM Spotlight: RainBench: Enabling Data-Driven Precipitation Forecasting on a Global Scale Tong
04:52 AM Introduction to first poster session
05:00 AM Poster session 1
06:00 AM Introduction to Spotlights
06:09 AM Spotlight: The Peruvian Amazon Forestry Dataset: A Leaf Image Classification Corpus Vizcarra Aguilar
06:19 AM Spotlight: Data-driven modeling of cooling demand in a commercial building Naeem
06:25 AM Spotlight: Structural Forecasting for Tropical Cyclone Intensity Prediction: Providing Insight with Deep Learning McNeely
06:37 AM Spotlight: FireSRnet: Geoscience-driven super-resolution of future fire risk from climate change Ballard
06:47 AM Spotlight: Spatiotemporal Features Improve Fine-Grained Butterfly Image Classification Skreta
07:00 AM Climate Change and ML for Policy Hsu, Newman, Rattling Leaf, Sr., Cisse
08:00 AM Poster session 2
09:00 AM Introduction to Zico Kolter
09:40 AM Q&A with Zico Kolter
10:00 AM Climate Change and ML in the Private Sector Walcott-Bryant, Boche, Anandkumar
11:00 AM Introduction to Spotlights
11:05 AM Spotlight: Machine Learning for Glacier Monitoring in the Hindu Kush Himalaya Sankaran
11:15 AM Spotlight: Wildfire Smoke and Air Quality: How Machine Learning Can Guide Forest Management Tomaselli
11:25 AM Spotlight: OGNet: Towards a Global Oil and Gas Infrastructure Database using Deep Learning on Remotely Sensed Imagery Sheng
11:36 AM Spotlight: Climate Change Driven Crop Yield Failures Sharma
11:45 AM Spotlight: Towards Tracking the Emissions of Every Power Plant on the Planet Couture
12:00 PM Poster session 3
01:00 PM Introduction to Jennifer Chayes
01:40 PM Q&A with Jennifer Chayes
02:50 PM Closing remarks
03:15 PM Poster reception
N/A The Human Effect Requires Affect: Addressing Social-Psychological Factors of Climate Change with Machine Learning Tilbury
N/A Hyperspectral Remote Sensing of Aquatic Microbes to Support Water Resource Management Kim
N/A Leveraging Machine learning for Sustainable and Self-sufficient Energy Communities Faustine
N/A HECT: High-Dimensional Ensemble Consistency Testing for Climate Models Dalmasso
N/A Monitoring the Impact of Wildfires on Tree Species with Deep Learning ZHOU

N/A Predicting the Solar Potential of Rooftops using Image Segmentation and Structured Data de Barros Soares
N/A FlowDB: A new large scale river flow, flash flood, and precipitation dataset Godfried
N/A Can Federated Learning Save The Planet? Qiu
N/A Machine Learning towards a Global Parametrization of Atmospheric New Particle Formation and Growth Nicolaou
N/A Annual and in-season mapping of cropland at field scale with sparse labels Tseng
N/A Monitoring Shorelines via High-Resolution Satellite Imagery and Deep Learning Ramesh
N/A Accurate river level predictions using a Wavenet-like model Doyle
N/A Machine Learning Informed Policy for Environmental Justice in Atlanta with Climate Justice Implications Barczyk
N/A Satellite imagery analysis for Land Use, Land Use Change and Forestry: A pilot study in Kigali, Rwanda Aboh
N/A Understanding global fire regimes using Artificial Intelligence Pais
N/A Revealing the Oil Majors' Adaptive Capacity to the Energy Transition with Deep Multi-Agent Reinforcement Learning Radovic
N/A Explaining Complex Energy Systems: A Challenge Hülsmann
N/A Deep learning architectures for inference of AC-OPF solutions Falconer
N/A Using attention to model long-term dependencies in occupancy behavior Kleinebrahm
N/A Spatio-Temporal Learning for Feature Extraction in Time-Series Images Kamdem De Teyou
N/A Machine learning for advanced solar cell production: adversarial denoising, sub-pixel alignment and the digital twin Demant
N/A Residue Density Segmentation for Monitoring and Optimizing Tillage Practices Hobbs
N/A Privacy Preserving Demand Forecasting to Encourage Consumer Acceptance of Smart Energy Meters Briggs
N/A Predicting Landsat Reflectance with Deep Generative Fusion Bouabid
N/A Storing Energy with Organic Molecules: Towards a Metric for Improving Molecular Performance for Redox Flow Batteries Mejia Mendoza
N/A Learning the distribution of extreme precipitation from atmospheric general circulation model variables Hess
N/A Graph Neural Networks for Improved El Niño Forecasting Rühling Cachay
N/A Estimating Forest Ground Vegetation Cover From Nadir Photographs Using Deep Convolutional Neural Networks Hampton
N/A VConstruct: Filling Gaps in Chl-a Data Using a Variational Autoencoder Ehrler
N/A Physics-constrained Deep Recurrent Neural Models of Building Thermal Dynamics Drgona
N/A Automated Salmonid Counting in Sonar Data Kulits
N/A Analyzing Sustainability Reports Using Natural Language Processing Luccioni
N/A Deep Reinforcement Learning in Electricity Generation Investment for the Minimization of Long-Term Carbon Emissions and Electricity Costs Kell
N/A ForestNet: Classifying Drivers of Deforestation in Indonesia using Deep Learning on Satellite Imagery Irvin
N/A OfficeLearn: An OpenAI Gym Environment for Building Level Energy Demand Response Spangher
N/A Meta-modeling strategy for data-driven forecasting Skinner

N/A Mangrove Ecosystem Detection using Mixed-Resolution Imagery with a Hybrid-Convolutional Neural Network Hicks
N/A Is Africa leapfrogging to renewables or heading for carbon lock-in? A machine-learning-based approach to predicting success of power-generation projects Alova
N/A Short-term PV output prediction using convolutional neural network: learning from an imbalanced sky images dataset via sampling and data augmentation Nie
N/A EarthNet2021: A novel large-scale dataset and challenge for forecasting localized climate impacts Requena-Mesa
N/A Investigating two super-resolution methods for downscaling precipitation: ESRGAN and CAR
N/A High-resolution global irrigation prediction with Sentinel-2 30m data Hawkins
N/A Do Occupants in a Building exhibit patterns in Energy Consumption? Analyzing Clusters in Energy Social Games Das
N/A A Generative Adversarial Gated Recurrent Network for Power Disaggregation & Consumption Awareness Kaselimi
N/A NightVision: Generating Nighttime Satellite Imagery from Infra-Red Observations Harder
N/A Counting Cows: Tracking Illegal Cattle Ranching From High-Resolution Satellite Imagery Laradji
N/A pymgrid: An Open-Source Python Microgrid Simulator for Applied Artificial Intelligence Research Henri
N/A Quantitative Assessment of Drought Impacts Using XGBoost based on the Drought Impact Reporter Zhang
N/A Movement Tracks for the Automatic Detection of Fish Behavior in Videos McIntosh
N/A ACED: Accelerated Computational Electrochemical systems Discovery Kurchin
N/A Context-Aware Urban Energy Efficiency Optimization Using Hybrid Physical Models Choi
N/A Quantifying the presence of air pollutants over a road network in high spatio-temporal resolution Bohm
N/A DeepWaste: Applying Deep Learning to Waste Classification for a Sustainable Planet Narayan
N/A Emerging Trends of Sustainability Reporting in the ICT Industry: Insights from Discriminative Topic Mining Shi
N/A Deep Fire Topology: Understanding the role of landscape spatial patterns in wildfire susceptibility Pais
N/A A Temporally Consistent Image-based Sun Tracking Algorithm for Solar Energy Forecasting Applications Paletta
N/A Optimal District Heating in China with Deep Reinforcement Learning Le Coz
N/A Formatting the Landscape: Spatial conditional GAN for varying population in satellite imagery Langer
N/A Expert-in-the-loop Systems Towards Safety-critical Machine Learning Technology in Wildfire Intelligence Sousa
N/A A Comparison of Data-Driven Models for Predicting Stream Water Temperature Weierbach
N/A ClimaText: A Dataset for Climate Change Topic Detection Leippold
N/A Towards DeepSentinel: An extensible corpus of labelled Sentinel-1 and -2 imagery and a proposed general purpose sensor-fusion semantic embedding model Kruitwagen
N/A Loosely Conditioned Emulation of Global Climate Models With Generative Adversarial Networks Hutchinson

N/A Machine Learning Climate Model Dynamics: Offline versus Online Performance Brenowitz
N/A Characterization of Industrial Smoke Plumes from Remote Sensing Data Mommert
N/A Narratives and Needs: Analyzing Experiences of Cyclone Amphan Using Twitter Discourse Crayton
N/A Street to Cloud: Improving Flood Maps With Crowdsourcing and Semantic Segmentation Sunkara
N/A Artificial Intelligence, Machine Learning and Modeling for Understanding the Oceans and Climate Change Martí
N/A Towards Data-Driven Physics-Informed Global Precipitation Forecasting from Satellite Imagery Zantedeschi
N/A Long-Range Seasonal Forecasting of 2m-Temperature with Machine Learning Vos
N/A Electric Vehicle Range Improvement by Utilizing Deep Learning to Optimize Occupant Thermal Comfort Warey
N/A In-N-Out: Pre-Training and Self-Training using Auxiliary Information for Out-of-Distribution Robustness Jones
N/A A Way Toward Low-Carbon Shipping: Improving Port Operations Planning using Machine Learning El Mekkaoui
N/A Short-term prediction of photovoltaic power generation using Gaussian process regression Al Lawati
N/A Forecasting Marginal Emissions Factors in PJM Wang
N/A Automated Identification of Oil Field Features using CNNs DILEEP
N/A A Multi-source, End-to-End Solution for Tracking Climate Change Adaptation in Agriculture Coca-Castro
N/A Short-Term Solar Irradiance Forecasting Using Calibrated Probabilistic Models Zelikman

Abstracts (15):

Abstract 3: Spotlight: Deep Learning for Climate Model Output Statistics in Tackling Climate Change with ML, Steininger 04:05 AM

Climate models are an important tool for the assessment of prospective climate change effects but they suffer from systematic and representation errors, especially for precipitation. Model output statistics (MOS) reduce these errors by fitting the model output to observational data with machine learning. In this work, we explore the feasibility and potential of deep learning with convolutional neural networks (CNNs) for MOS. We propose the CNN architecture ConvMOS specifically designed for reducing errors in climate model outputs and apply it to the climate model REMO. Our results show a considerable reduction of errors and mostly improved performance compared to three commonly used MOS approaches.
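The ConvMOS architecture itself is not reproduced in this book. As a rough, self-contained illustration of the general idea (a CNN learning a gridded correction from climate-model output to observations), here is a minimal PyTorch sketch on synthetic data; the network, tensor shapes, and training setup are assumptions, not the paper's:

```python
# Minimal sketch of CNN-based model output statistics (MOS): a small CNN
# learns a residual correction from a model's precipitation grid to
# observations. All shapes and data are illustrative stand-ins.
import torch
import torch.nn as nn

class SimpleMOSNet(nn.Module):
    def __init__(self, in_channels=1, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, hidden, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, 1, kernel_size=3, padding=1),
        )

    def forward(self, x):
        # Predict a correction and add it to the raw model output (residual MOS).
        return x + self.net(x)

# Synthetic stand-ins for (time, 1, lat, lon) model output and observations.
model_precip = torch.rand(64, 1, 32, 32)
observations = model_precip * 0.8 + 0.1 * torch.rand(64, 1, 32, 32)

mos = SimpleMOSNet()
opt = torch.optim.Adam(mos.parameters(), lr=1e-3)
for epoch in range(10):
    opt.zero_grad()
    loss = nn.functional.mse_loss(mos(model_precip), observations)
    loss.backward()
    opt.step()
print(f"final training MSE: {loss.item():.4f}")
```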

Abstract 4: Spotlight: An Enriched Automated PV Registry: Combining Image Recognition and 3D Building Data in Tackling Climate Change with ML, Mayer 04:15 AM

While photovoltaic (PV) systems are installed at an unprecedented rate, reliable information on an installation level remains scarce. As a result, automatically created PV registries are a timely contribution to optimize grid planning and operations. This paper demonstrates how aerial imagery and three-dimensional building data can be combined to create an address-level PV registry, specifying area, tilt, and orientation angles. We demonstrate the benefits of this approach for PV capacity estimation. In addition, this work presents, for the first time, a comparison between automated and officially-created PV registries. Our results indicate that our enriched automated registry proves to be useful to validate, update, and complement official registries.

Abstract 5: Spotlight: Interpretability in Convolutional Neural Networks for Building Damage Classification in Satellite Imagery in Tackling Climate Change with ML, Chen 04:22 AM

Natural disasters ravage the world's cities, valleys, and shores on a monthly basis. Having precise and efficient mechanisms for assessing infrastructure damage is essential to channel resources and minimize the loss of life. Using a dataset that includes labeled pre- and post-disaster satellite imagery, we train multiple convolutional neural networks to assess building damage on a per-building basis. In order to investigate how to best classify building damage, we present a highly interpretable deep-learning methodology that seeks to explicitly convey the most useful information required to train an accurate classification model. We also delve into which loss functions best optimize these models. Our findings include that ordinal cross-entropy loss is the most effective to use and that including the type of disaster that caused the damage in combination with a pre- and post-disaster image best predicts the level of damage caused. Our research seeks to computationally contribute to aiding in this ongoing and growing humanitarian crisis, heightened by climate change.

Abstract 6: Spotlight: A Machine Learning Approach to Methane Emissions Mitigation in the Oil and Gas Industry in Tackling Climate Change with ML, Wang 04:32 AM

Reducing methane emissions from the oil and gas sector is a key component of climate policy in the United States. Methane leaks across the supply chain are stochastic and intermittent, with a small number of sites (‘super-emitters’) responsible for a majority of emissions. Thus, cost-effective emissions reduction critically relies on effectively identifying the super-emitters from thousands of well-sites and millions of miles of pipelines. Conventional approaches such as walking surveys using optical gas imaging technology are slow and time-consuming. In addition, several variables contribute to the formation of leaks such as infrastructure age, production, weather conditions, and maintenance practices. Here, we develop a machine learning algorithm to predict high-emitting sites that can be prioritized for follow-up repair. Such prioritization can significantly reduce the cost of surveys and increase emissions reductions compared to conventional approaches. Our results show that the algorithm using logistic regression performs the best out of several algorithms. The model achieved a 70% accuracy rate with a 57% recall and a 66% balanced accuracy rate. Compared to the conventional approach, the machine learning model reduced the time to achieve a 50% emissions mitigation target by 42%. Correspondingly, the mitigation cost reduced from $85/t CO2e to $49/t CO2e.

Abstract 7: Spotlight: RainBench: Enabling Data-Driven Precipitation Forecasting on a Global Scale in Tackling Climate Change with ML, Tong 04:42 AM

Climate change is expected to aggravate extreme precipitation events, directly impacting the livelihood of millions. Without a global precipitation forecasting system in place, many regions -- especially those constrained in resources to collect expensive groundstation data -- are left behind. To mitigate such unequal reach of climate change, a solution is to alleviate the reliance on numerical models (and by extension groundstation data) by enabling machine-learning-based global forecasts from satellite imagery. Though prior works exist in regional precipitation nowcasting, there lacks work in global, medium-term precipitation forecasting. Importantly, a common, accessible baseline for meaningful comparison is absent. In this work, we present RainBench, a multi-modal benchmark dataset dedicated to advancing global precipitation forecasting. We establish baseline tasks and release PyRain, a data-handling pipeline to enable efficient processing of decades-worth of data by any modeling framework. Whilst our work serves as a basis for a new chapter on global precipitation forecast from satellite imagery, the greater promise lies in the community joining forces to use our released datasets and tools in developing machine learning approaches to tackle this important challenge.

Abstract 11: Spotlight: The Peruvian Amazon Forestry Dataset: A Leaf Image Classification Corpus in Tackling Climate Change with ML, Vizcarra Aguilar 06:09 AM

This paper introduces the Peruvian Amazon Forestry Dataset, which includes 59,441 leaves samples from ten of the most profitable and endangered Amazon timber-tree species. Besides, the proposal includes a background removal algorithm to feed a fine-tuned CNN. We evaluate the quantitative (accuracy metric) and qualitative (visual interpretation) impacts of each stage by ablation experiments. The results show a 96.64% training accuracy and 96.52% testing accuracy on the VGG-19 model. Furthermore, the visual interpretation of the model evidences that leaf venations have the highest correlation in the plant recognition task.
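The dataset and background-removal stage are not reproduced here, but the fine-tuning step the abstract mentions can be sketched as follows. Training only a new final layer and using random tensors as stand-in images are illustrative assumptions; in practice one would load ImageNet-pretrained weights and real leaf images:

```python
# Hedged sketch of fine-tuning VGG-19 for a 10-class leaf classifier.
import torch
import torch.nn as nn
from torchvision import models

num_species = 10
vgg = models.vgg19(weights=None)  # in practice: load ImageNet weights

# Freeze the convolutional features; train only the new final layer.
for p in vgg.features.parameters():
    p.requires_grad = False
vgg.classifier[6] = nn.Linear(4096, num_species)

opt = torch.optim.Adam(vgg.classifier[6].parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

# One illustrative step on a random batch standing in for leaf images.
images = torch.rand(8, 3, 224, 224)
labels = torch.randint(0, num_species, (8,))
opt.zero_grad()
loss = loss_fn(vgg(images), labels)
loss.backward()
opt.step()
print(f"loss after one step: {loss.item():.3f}")
```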

Abstract 12: Spotlight: Data-driven modeling of cooling demand in a commercial building in Tackling Climate Change with ML, Naeem 06:19 AM

Heating, ventilation, and air conditioning (HVAC) systems account for 30% of the total energy consumption in buildings. Design and implementation of energy-efficient schemes can play a pivotal role in minimizing energy usage. As an important first step towards improved HVAC system controls, this study proposes a new framework for modeling the thermal response of buildings by leveraging data measurements and formulating a data-driven system identification model. The proposed method combines principal component analysis (PCA) to identify the most significant predictors that influence the cooling demand of a building with an auto-regressive integrated moving average with exogenous variables (ARIMAX) model. The performance of the developed model was evaluated both analytically and visually. It was found that our PCA-based ARIMAX (2-0-5) model was able to accurately forecast the cooling demand for the prediction horizon of 7 days. In this work, the actual measurements from a university campus building are used for model development and validation.
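A minimal sketch of the PCA-plus-ARIMAX(2, 0, 5) idea on synthetic data (the paper uses real campus-building measurements; the channel count, series length, and forecast setup below are assumptions):

```python
# Reduce weather covariates with PCA, then fit ARIMAX(2,0,5) for demand
# with the leading components as exogenous regressors.
import numpy as np
from sklearn.decomposition import PCA
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(1)
T = 500
weather = rng.normal(size=(T, 8))   # hypothetical sensor channels
demand = 10 + weather[:, :2].sum(axis=1) + rng.normal(scale=0.5, size=T)

pca = PCA(n_components=2)
exog = pca.fit_transform(weather)   # most significant predictors

model = SARIMAX(demand, exog=exog, order=(2, 0, 5))
fit = model.fit(disp=False)

# Forecast a 7-step horizon given (here: persisted) future exogenous values.
future_exog = np.repeat(exog[-1:], 7, axis=0)
print(fit.forecast(steps=7, exog=future_exog))
```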

Abstract 13: Spotlight: Structural Forecasting for Tropical Cyclone Intensity Prediction: Providing Insight with Deep Learning in Tackling Climate Change with ML, McNeely 06:25 AM

Tropical cyclone (TC) intensity forecasts are ultimately issued by human forecasters. The human in-the-loop pipeline requires that any forecasting guidance must be easily digestible by TC experts if it is to be adopted at operational centers like the National Hurricane Center. Our proposed framework leverages deep learning to provide forecasters with something neither end-to-end prediction models nor traditional intensity guidance does: a powerful tool for monitoring high-dimensional time series of key physically relevant predictors and the means to understand how the predictors relate to one another and to short-term intensity changes.

Abstract 14: Spotlight: FireSRnet: Geoscience-driven super-resolution of future fire risk from climate change in Tackling Climate Change with ML, Ballard 06:37 AM

With fires becoming increasingly frequent and severe across the globe in recent years, understanding climate change’s role in fire behavior is critical for quantifying current and future fire risk. However, global climate models typically simulate fire behavior at spatial scales too coarse for local risk assessments. Therefore, we propose a novel approach towards super-resolution (SR) enhancement of fire risk exposure maps that incorporates not only 2000 to 2020 monthly satellite observations of active fires but also local information on land cover and temperature. Inspired by SR architectures, we propose an efficient deep learning model trained for SR on fire risk exposure maps. We evaluate this model on resolution enhancement and find it outperforms standard image interpolation techniques at both 4x and 8x enhancement while having comparable performance at 2x enhancement. We then demonstrate the generalizability of this SR model over northern California and New South Wales, Australia. We conclude with a discussion and application of our proposed model to climate model simulations of fire risk in 2040 and 2100, illustrating the potential for SR enhancement of fire risk maps from the latest state-of-the-art climate models.

Abstract 15: Spotlight: Spatiotemporal Features Improve Fine-Grained Butterfly Image Classification in Tackling Climate Change with ML, Skreta 06:47 AM

Understanding the changing distributions of butterflies gives insight into the impacts of climate change across ecosystems and is a prerequisite for conservation efforts. eButterfly is a citizen science website created to allow people to track the butterfly species around them and use these observations to contribute to research. However, correctly identifying butterfly species is a challenging task for non-specialists and currently requires the involvement of entomologists to verify the labels of novice users on the website. We have developed a computer vision model to label butterfly images from eButterfly automatically, decreasing the need for human experts. We employ a model that incorporates geographic and temporal information of where and when the image was taken, in addition to the image itself. We show that we can successfully apply this spatiotemporal model for fine-grained image recognition, significantly improving the accuracy of our classification model compared to a baseline image recognition system trained on the same dataset.
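The paper's exact architecture is not given in this book; one common way to realize such a spatiotemporal model is late fusion of an image encoder with a metadata encoder, sketched below with assumed sizes and class count:

```python
# Hedged sketch: combine an image with (lat, lon, day-of-year) metadata.
import torch
import torch.nn as nn

class SpatioTemporalClassifier(nn.Module):
    def __init__(self, n_classes=100):
        super().__init__()
        self.image_enc = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.meta_enc = nn.Sequential(nn.Linear(3, 32), nn.ReLU(),
                                      nn.Linear(32, 32))
        self.head = nn.Linear(32 + 32, n_classes)

    def forward(self, image, metadata):
        # Late fusion: concatenate image and metadata embeddings.
        feats = torch.cat([self.image_enc(image), self.meta_enc(metadata)], dim=1)
        return self.head(feats)

model = SpatioTemporalClassifier()
images = torch.rand(4, 3, 64, 64)
meta = torch.rand(4, 3)  # normalized (lat, lon, day-of-year)
print(model(images, meta).shape)  # torch.Size([4, 100])
```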

Abstract 22: Spotlight: Machine Learning for Glacier Monitoring in the Hindu Kush Himalaya in Tackling Climate Change with ML, Sankaran 11:05 AM

Glacier mapping is key to ecological monitoring in the Hindu Kush Himalaya region. Climate change poses a risk to individuals whose livelihoods depend on the health of glacier ecosystems. In this work, we present a machine learning based approach to support ecological monitoring, with a focus on glaciers. Our approach is based on semi-automated mapping from satellite images. We utilize readily available remote sensing data to create a model to identify and outline both clean ice and debris-covered glaciers from satellite imagery. We also release data and develop a web tool that allows experts to visualize and correct model predictions, with the ultimate aim of accelerating the glacier mapping process.

Abstract 23: Spotlight: Wildfire Smoke and Air Quality: How Machine Learning Can Guide Forest Management in Tackling Climate Change with ML, Tomaselli 11:15 AM

Prescribed burns are currently the most effective method of reducing the risk of widespread wildfires, but a largely missing component in forest management is knowing which fuels one can safely burn to minimize exposure to toxic smoke. Here we show how machine learning, such as spectral clustering and manifold learning, can provide interpretable representations and powerful tools for differentiating between smoke types, hence providing forest managers with vital information on effective strategies to reduce climate-induced wildfires while minimizing production of harmful smoke.
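As a toy illustration of the techniques named in the abstract (no smoke data is included in this book), the following clusters synthetic "spectra" with spectral clustering and computes a two-dimensional spectral embedding for inspection:

```python
# Illustrative only: two hypothetical smoke types with different signatures.
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.manifold import SpectralEmbedding

rng = np.random.default_rng(2)
type_a = rng.normal(loc=0.0, scale=0.3, size=(50, 20))
type_b = rng.normal(loc=1.0, scale=0.3, size=(50, 20))
spectra = np.vstack([type_a, type_b])

labels = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                            random_state=0).fit_predict(spectra)
embedding = SpectralEmbedding(n_components=2).fit_transform(spectra)
print("cluster sizes:", np.bincount(labels))
print("2D embedding shape:", embedding.shape)
```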

Abstract 24: Spotlight: OGNet: Towards a Global Oil and Gas Infrastructure Database using Deep Learning on Remotely Sensed Imagery in Tackling Climate Change with ML, Sheng 11:25 AM

At least a quarter of the warming that the Earth is experiencing today is due to anthropogenic methane emissions. There are multiple satellites in orbit and planned for launch in the next few years which can detect and quantify these emissions; however, to attribute methane emissions to their sources on the ground, a comprehensive database of the locations and characteristics of emission sources worldwide is essential. In this work, we develop deep learning algorithms that leverage freely available high-resolution aerial imagery to automatically detect oil and gas infrastructure, one of the largest contributors to global methane emissions. We use the best algorithm, which we call OGNet, together with expert review to identify the locations of oil refineries and petroleum terminals in the U.S. We show that OGNet detects many facilities which are not present in four standard public datasets of oil and gas infrastructure. All detected facilities are associated with characteristics critical to quantifying and attributing methane emissions, including the types of infrastructure and number of storage tanks. The data curated and produced in this study is freely available at https://link/provided/in/camera/ready/version.
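OGNet's code is not reproduced here; the sketch below shows only the generic tile-and-classify pattern that this kind of aerial-imagery detection typically builds on, with an untrained stand-in classifier and an assumed tile size:

```python
# Hedged sketch: score fixed-size tiles of a scene with a binary classifier;
# high-scoring tiles become detection candidates for expert review.
import torch
import torch.nn as nn

tile = 64
classifier = nn.Sequential(  # stand-in for a trained facility classifier
    nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 1), nn.Sigmoid(),
)

scene = torch.rand(3, 512, 512)  # stand-in for one aerial image
detections = []
with torch.no_grad():
    for y in range(0, 512, tile):
        for x in range(0, 512, tile):
            patch = scene[:, y:y + tile, x:x + tile].unsqueeze(0)
            score = classifier(patch).item()
            if score > 0.5:  # threshold would be tuned on validation data
                detections.append((x, y, score))
print(f"{len(detections)} candidate tiles flagged for review")
```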

Abstract 25: Spotlight: Climate Change Driven Crop Yield Failures in Tackling Climate Change with ML, Sharma 11:36 AM

Extreme temperatures, precipitation and variations in other meteorological factors affect crop yields, and hence climate change jeopardizes the entire food supply chain and dependent economic activities. We utilize Deep Neural Networks and Gaussian Processes for understanding crop yields as functions of climatological variables, and use change detection techniques to identify climatological thresholds where yield drops significantly.

Abstract 26: Spotlight: Towards Tracking the Emissions of Every Power Plant on the Planet in Tackling Climate Change with ML, Couture 11:45 AM

Greenhouse gases emitted from fossil-fuel-burning power plants are a major contributor to climate change. Current methods to track emissions from individual sources are expensive and only used in a few countries. While carbon dioxide concentrations can be measured globally using remote sensing, direct methods do not provide sufficient spatial resolution to distinguish emissions from different sources. We use machine learning to infer power generation and emissions from visible and thermal power plant signatures in satellite images. By training on a data set of power plants for which we know the generation or emissions, we are able to apply our models globally. This paper demonstrates initial progress on this project by predicting whether a power plant is on or off from a single satellite image.
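A hedged sketch of the prediction task in this last abstract (not the project's model): a small CNN over stacked visible-plus-thermal channels predicting on/off status. The channel layout and image sizes are assumptions for illustration:

```python
# Illustrative on/off classifier over 3 visible + 1 thermal channels.
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Conv2d(4, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 2),                      # off / on logits
)

batch = torch.rand(8, 4, 128, 128)         # stand-in satellite chips
labels = torch.randint(0, 2, (8,))
loss = nn.functional.cross_entropy(net(batch), labels)
loss.backward()
print(f"one illustrative training-step loss: {loss.item():.3f}")
```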

Meta-Learning

Jane Wang, Joaquin Vanschoren, Erin Grant, Jonathan Schwarz, Francesco Visin, Jeff Clune, Roberto Calandra

Fri Dec 11, 03:00 AM

**How to join the virtual workshop**: The 2020 Workshop on Meta-Learning will be a series of streamed pre-recorded talks + live question-and-answer (Q&A) periods, and poster sessions on Gather.Town. You can participate by:

* Accessing the **livestream** on our [NeurIPS.cc virtual workshop page](https://neurips.cc/virtual/2020/protected/workshop_16141.html) - likely this page!
* Asking questions to the speakers and panelists on Sli.do, on the [MetaLearn 2020 website](https://meta-learn.github.io/2020/)
* Joining the **Zoom to message questions to the moderator during the panel discussion**, also from the NeurIPS.cc virtual workshop page.
* Joining the **poster sessions on Gather.Town** (you can find the list of papers (and their virtual placement) for each session on the [MetaLearn 2020 website](https://meta-learn.github.io/2020/)):
  * [Session 1](https://neurips.gather.town/app/eLVTscsoraKKHuLI/posterRoom0);
  * [Session 2](https://neurips.gather.town/app/ahqPiPXgtuJF1JU7/posterRoom1);
  * [Session 3](https://neurips.gather.town/app/DK1OFwX1bToEpTYi/posterRoom2).
* Chatting with us and other participants on the [MetaLearn 2020 Rocket.Chat](https://neurips2020.rocket.chat/channel/meta-learning-99)!
* Entering **panel discussion questions** in [this sli.do](https://app.sli.do/event/uihcwqn4)!

**Focus of the workshop**: Recent years have seen rapid progress in meta-learning methods, which transfer knowledge across tasks and domains to learn new tasks more efficiently, optimize the learning process itself, and even generate new learning methods from scratch. Meta-learning can be seen as the logical conclusion of the arc that machine learning has undergone in the last decade, from learning classifiers and policies over hand-crafted features, to learning representations over which classifiers and policies operate, and finally to learning algorithms that themselves acquire representations, classifiers, and policies. Meta-learning methods are also of substantial practical interest. For instance, they have been shown to yield new state-of-the-art automated machine learning algorithms and architectures, and have substantially improved few-shot learning systems. Moreover, the ability to improve one’s own learning capabilities through experience can also be viewed as a hallmark of intelligent beings, and there are strong connections with work on human learning in cognitive science and reward learning in neuroscience.

Schedule

03:00 AM Introduction and opening remarks
03:10 AM Introduction for invited speaker, Frank Hutter Wang
03:11 AM Meta-learning neural architectures, initial weights, hyperparameters, and algorithm components Hutter
03:36 AM Q/A for invited talk #1 Hutter
03:40 AM On episodes, Prototypical Networks, and few-shot learning Laenen, Bertinetto
04:00 AM Poster session #1
05:00 AM Introduction for invited speaker, Luisa Zintgraf Visin
05:01 AM Exploration in meta-reinforcement learning Zintgraf
05:26 AM Q/A for invited talk #2 Zintgraf
05:30 AM Introduction for invited speaker, Tim Hospedales Schwarz
05:31 AM Meta-Learning: Representations and Objectives Hospedales
05:56 AM Q/A for invited talk #3 Hospedales
06:00 AM Break
07:00 AM Poster session #2
08:00 AM Introduction for invited speaker, Louis Kirsch Vanschoren
08:01 AM General meta-learning Kirsch
08:26 AM Q/A for invited talk #4 Kirsch
08:30 AM Introduction for invited speaker, Fei-Fei Li Grant
08:31 AM Creating diverse tasks to catalyze robot learning Fei-Fei
08:56 AM Q/A for invited talk #5 Fei-Fei
09:00 AM Poster session #3
10:00 AM Introduction for invited speaker, Kate Rakelly Grant
10:01 AM An inference perspective on meta-reinforcement learning Rakelly
10:26 AM Q/A for invited talk #6 Rakelly
10:30 AM Reverse engineering learned optimizers reveals known and novel mechanisms Maheswaranathan, Sussillo, Metz, Sun, Sohl-Dickstein
10:45 AM Bayesian optimization by density ratio estimation Tiao, Klein, Archambeau, Bonilla, Seeger, Ramos
11:00 AM Panel discussion
N/A Decoupling Exploration and Exploitation in Meta-Reinforcement Learning without Sacrifices Liu
N/A A Meta-Learning Approach for Graph Representation Learning in Multi-Task Settings Buffelli
N/A Prior-guided Bayesian Optimization Souza
N/A Task Similarity Aware Meta Learning: Theory-inspired Improvement on MAML Zhou
N/A MPLP: Learning a Message Passing Learning Protocol Randazzo
N/A Few-shot Sequence Learning with Transformers Logeswaran
N/A Uniform Priors for Meta-Learning Sinha
N/A Is Support Set Diversity Necessary for Meta-Learning? Li
N/A How Important is the Train-Validation Split in Meta-Learning? Bai
N/A Learning Flexible Classifiers with Shot-CONditional Episodic (SCONE) Training Triantafillou
N/A Learning not to learn: nature versus nurture in silico Lange
N/A Meta-Learning Initializations for Image Segmentation Hendryx
N/A Meta-Learning of Compositional Task Distributions in Humans and Machines Kumar
N/A Multi-Objective Multi-Fidelity Hyperparameter Optimization with Application to Fairness Schmucker

N/A NAS-Bench-301 and the Case for Surrogate Benchmarks for Neural Architecture Search Siems
N/A Open-Set Incremental Learning via Bayesian Prototypical Embeddings Willes
N/A Pareto-efficient Acquisition Functions for Cost-Aware Bayesian Optimization Guinet
N/A Similarity of classification tasks Nguyen
N/A Task Meta-Transfer from Limited Parallel Labels Jian
N/A Flexible Dataset Distillation: Learn Labels Instead of Images Bohdal
N/A Putting Theory to Work: From Learning Bounds to Meta-Learning Algorithms Bouniot
N/A Model-Based Meta-Reinforcement Learning for Flight with Suspended Payloads Belkhale
N/A HyperVAE: Variational Hyper-Encoding Network Nguyen
N/A Adaptive Risk Minimization: A Meta-Learning Approach for Tackling Group Shift Zhang
N/A Measuring few-shot extrapolation with program induction Alet
N/A Learning in Low Resource Modalities via Cross-Modal Generalization Liang
N/A Training more effective learned optimizers Metz
N/A Hyperparameter Transfer Across Developer Adjustments Stoll
N/A Model-Agnostic Graph Regularization for Few-Shot Learning Shen
N/A Prototypical Region Proposal Networks for Few-shot Localization and Classification Skomski
N/A Tailoring: encoding inductive biases by optimizing unsupervised objectives at prediction time Alet
N/A Meta-Learning Bayesian Neural Network Priors Based on PAC-Bayesian Theory Rothfuss
N/A Synthetic Petri Dish: A Novel Surrogate Model for Rapid Architecture Search Rawal
N/A Contextual HyperNetworks for Novel Feature Adaptation Lamb
N/A Continual learning with direction-constrained optimization Teng
N/A Continual Model-Based Reinforcement Learning with Hypernetworks Huang
N/A Data Augmentation for Meta-Learning Ni
N/A Defining Benchmarks for Continual Few-Shot Learning Patacchiola
N/A Few-Shot Unsupervised Continual Learning through Meta-Examples Bertugli
N/A Exploring Representation Learning for Flexible Few-Shot Tasks Ren
N/A Learning to Generate Noise for Multi-Attack Robustness Madaan
N/A … of PuPpets: Model-Agnostic Meta-Learning via Pre-trained Parameters for Natural Language Generation Lin
N/A Meta-Learning via Hypernetworks Zhao
N/A MobileDets: Searching for Object Detection Architecture for Mobile Accelerators Xiong
N/A Towards Meta-Algorithm Selection Tornede
N/A Meta-Learning Backpropagation And Improving It Kirsch

Abstracts (55):

Abstract 3: Meta-learning neural architectures, initial weights, hyperparameters, and algorithm components in Meta-Learning, Hutter 03:11 AM

Meta-learning is a powerful set of approaches that promises to replace many components of the deep learning toolbox by learned alternatives, such as learned architectures, optimizers, hyperparameters, and weight initializations. While typical approaches focus on only one of these components at a time, in this talk, I will discuss various efficient approaches for tackling two of them simultaneously. I will also highlight the advantages of *not* learning complete algorithms from scratch but of rather exploiting the inductive bias of existing algorithms by learning to improve existing algorithms. Finally, I will briefly discuss the connection of meta-learning and benchmarks.

Abstract 5: On episodes, Prototypical Networks, and few-shot learning in Meta-Learning, Laenen, Bertinetto 03:40 AM

Episodic learning is a popular practice among researchers and practitioners interested in few-shot learning. It consists of organising training in a series of learning problems, each relying on small support and query sets to mimic the few-shot circumstances encountered during evaluation. In this paper, we investigate the usefulness of episodic learning in Prototypical Networks, one of the most popular algorithms making use of this practice. Surprisingly, in our experiments we found that episodic learning is detrimental to performance, and that it is under no circumstance beneficial to differentiate between a support and query set within a training batch. This non-episodic version of Prototypical Networks, which corresponds to the classic Neighbourhood Component Analysis, reliably improves over its episodic counterpart in multiple datasets, achieving an accuracy that is competitive with the state-of-the-art, despite being extremely simple.
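For readers unfamiliar with Prototypical Networks, the decision rule at their core — nearest class-mean in embedding space — is easy to state in code. The embeddings below are random stand-ins (a real system would produce them with a trained network); whether that network is trained episodically or not, the paper's question, is orthogonal to this rule:

```python
# Nearest-prototype classification, the core of Prototypical Networks.
import torch

def prototype_predict(support, support_labels, queries, n_classes):
    # One prototype per class: the mean embedding of its support examples.
    protos = torch.stack([support[support_labels == c].mean(0)
                          for c in range(n_classes)])
    # Assign each query to the prototype with smallest squared distance.
    dists = torch.cdist(queries, protos) ** 2
    return dists.argmin(dim=1)

n_classes, dim = 5, 64
support = torch.randn(50, dim)                      # stand-in embeddings
support_labels = torch.arange(n_classes).repeat_interleave(10)
queries = torch.randn(20, dim)
print(prototype_predict(support, support_labels, queries, n_classes))
```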

Abstract 8: Exploration in meta- reinforcement learning in Meta-Learning, Abstract 16: General meta-learning in Meta- Zintgraf 05:01 AM Learning, Kirsch 08:01 AM

Learning a new task often requires exploration: Humans develop learning algorithms that are gathering data to learn about the environment incredibly general and can be applied across a and how to solve the task. But how do we wide range of tasks. Unfortunately, this process is efciently explore, and how can an agent make often tedious trial and error with numerous the best use of prior knowledge it has about the possibilities for suboptimal choices. General environment? Meta-reinforcement learning allows meta-learning seeks to automate many of these us to learn inductive biases for exploration from choices, generating new learning algorithms data, which plays a crucial role in enabling agents automatically. Diferent from contemporary meta- to rapidly pick up new tasks. In the frst part of learning, where the generalization ability has this talk, I look at diferent meta-learning problem been limited, these learning algorithms ought to settings that exist in the literature, and what type be general-purpose. This allows us to leverage of exploratory behaviour is necessary in these data at scale for learning algorithm design that is settings. This generally depends on how much difcult for humans to consider. I present a time the agent has to interact with the General Meta Learner, MetaGenRL, that meta- environment, before its performance is learns novel Reinforcement Learning algorithms 21 Dec. 11, 2020
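Since several of the abstracts above turn on the episodic few-shot setup, a concrete sketch may help. The snippet below is a minimal NumPy rendering of the Prototypical Networks classification rule discussed in Abstract 5: class prototypes are mean support embeddings, and queries score against them by negative squared distance. The function and variable names are illustrative only; the paper's actual pipeline embeds images with a trained network.

import numpy as np

def prototypical_logits(support, support_labels, queries, n_classes):
    """Classify query embeddings against class prototypes.

    support:        (n_support, d) embedded support examples
    support_labels: (n_support,) integer labels in [0, n_classes)
    queries:        (n_query, d) embedded query examples
    Returns (n_query, n_classes) logits = -squared distance to each prototype.
    """
    # Prototype of each class = mean of its support embeddings.
    prototypes = np.stack([
        support[support_labels == c].mean(axis=0) for c in range(n_classes)
    ])
    # Negative squared Euclidean distance acts as the logit.
    diff = queries[:, None, :] - prototypes[None, :, :]
    return -(diff ** 2).sum(axis=-1)

# Toy episode: 2-way 3-shot with 4-dimensional embeddings.
rng = np.random.default_rng(0)
support = rng.normal(size=(6, 4))
labels = np.array([0, 0, 0, 1, 1, 1])
queries = rng.normal(size=(5, 4))
pred = prototypical_logits(support, labels, queries, n_classes=2).argmax(axis=1)
print(pred)

The non-episodic variant the authors favour drops the support/query split entirely and, applied over whole training batches, recovers classic Neighbourhood Component Analysis.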

Abstract 19: Creating diverse tasks to catalyze robot learning in Meta-Learning, Fei-Fei 08:31 AM

Data has become an essential catalyst for the development of artificial intelligence. But it is challenging to obtain data for robotic learning. So how should we tackle this issue? In this talk, we start with a retrospective of how ImageNet and other large-scale datasets incentivized the deep learning revolution in the past decade, and aim to tackle the new challenges faced by robotic data. To this end, we introduce two lines of work in the Stanford Vision and Learning Lab on creating tasks to catalyze robot learning in this new era. We first present the design of a large-scale and realistic environment in simulation that enables human and robotic agents to perform interactive tasks. We further propose a novel approach for automatically generating suitable tasks as curricula to expedite reinforcement learning in hard-exploration problems.

Abstract 23: An inference perspective on meta-reinforcement learning in Meta-Learning, Rakelly 10:01 AM

While meta-learning algorithms are often viewed as algorithms that learn to learn, an alternative viewpoint frames meta-learning as inferring a hidden task variable from experience consisting of observations and rewards. From this perspective, learning-to-learn is learning-to-infer. This viewpoint can be useful in solving problems in meta-reinforcement learning, which I'll demonstrate through two examples: (1) enabling off-policy meta-learning and (2) performing efficient meta-reinforcement learning from image observations. Finally, I'll discuss how I think this perspective can inform future meta-reinforcement learning research.

Abstract 25: Reverse engineering learned optimizers reveals known and novel mechanisms in Meta-Learning, Maheswaranathan, Sussillo, Metz, Sun, Sohl-Dickstein 10:30 AM

Learned optimizers are algorithms that can themselves be trained to solve optimization problems. In contrast to baseline optimizers (such as momentum or Adam) that use simple update rules derived from theoretical principles, learned optimizers use flexible, high-dimensional, nonlinear parameterizations. Although this can lead to better performance in certain settings, their inner workings remain a mystery. How is a learned optimizer able to outperform a well tuned baseline? Has it learned a sophisticated combination of existing optimization techniques, or is it implementing completely new behavior? In this work, we address these questions by careful analysis and visualization of learned optimizers. We study learned optimizers trained from scratch on three disparate tasks, and discover that they have learned interpretable mechanisms, including: momentum, gradient clipping, learning rate schedules, and a new form of learning rate adaptation. Moreover, we show how the dynamics of learned optimizers enables these behaviors. Our results help elucidate the previously murky understanding of how learned optimizers work, and establish tools for interpreting future learned optimizers.

Abstract 26: Bayesian optimization by density ratio estimation in Meta-Learning, Tiao, Klein, Archambeau, Bonilla, Seeger, Ramos 10:45 AM

Bayesian optimization (BO) is among the most effective and widely-used blackbox optimization methods. BO proposes solutions according to an explore-exploit trade-off criterion encoded in an acquisition function, many of which are derived from the posterior predictive of a probabilistic surrogate model. Prevalent among these is the expected improvement (EI). Naturally, the need to ensure analytical tractability in the model poses limitations that can ultimately hinder the efficiency and applicability of BO. In this paper, we cast the computation of EI as a binary classification problem, building on the well-known link between class probability estimation (CPE) and density ratio estimation (DRE), and the lesser-known link between density ratios and EI. By circumventing the tractability constraints imposed on the model, this reformulation provides several natural advantages, not least in scalability, increased flexibility, and greater representational capacity.
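The reformulation sketched in Abstract 26 can be pictured in a few lines of Python: split past observations at a quantile of the objective, fit a probabilistic classifier to separate "good" from "bad" points, and use the predicted class probability in place of expected improvement. All helper names below are illustrative, and the classifier choice is arbitrary; this is a sketch of the idea, not the authors' implementation.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

def density_ratio_acquisition(X, y, candidates, gamma=0.25, seed=0):
    """Score candidates with a classifier standing in for expected improvement.

    X, y:       past evaluations (inputs and objective values, lower is better)
    candidates: points to score, shape (m, d)
    gamma:      quantile defining the "good" set
    """
    tau = np.quantile(y, gamma)
    good = (y <= tau).astype(int)          # 1 = among the best gamma-fraction
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    clf.fit(X, good)
    # Probability of being "good" is a monotone proxy for the density ratio,
    # and hence for expected improvement under this framing.
    return clf.predict_proba(candidates)[:, 1]

rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(30, 2))
y = (X ** 2).sum(axis=1)                   # toy objective: minimize ||x||^2
cands = rng.uniform(-2, 2, size=(100, 2))
best = cands[np.argmax(density_ratio_acquisition(X, y, cands))]
print(best)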

Abstract 28: Decoupling Exploration and Exploitation in Meta-Reinforcement Learning without Sacrifices in Meta-Learning, Liu N/A

The goal of meta-reinforcement learning (meta-RL) is to build agents that can quickly learn new tasks by leveraging prior experience on related tasks. Learning a new task often requires both exploring to gather task-relevant information and exploiting this information to solve the task. In principle, optimal exploration and exploitation can be learned end-to-end by simply maximizing task performance. However, such meta-RL approaches struggle with local optima due to a chicken-and-egg problem: learning to explore requires good exploitation to gauge the exploration's utility, but learning to exploit requires information gathered via exploration. Optimizing separate objectives for exploration and exploitation can avoid this problem, but prior meta-RL exploration objectives yield suboptimal policies that gather information irrelevant to the task. We alleviate both concerns by constructing an exploitation objective that automatically identifies task-relevant information and an exploration objective to recover only this information. This avoids local optima in end-to-end training, without sacrificing optimal exploration. Empirically, DREAM substantially outperforms existing approaches on complex meta-RL problems, such as sparse-reward 3D visual navigation.

Abstract 29: A Meta-Learning Approach for Graph Representation Learning in Multi-Task Settings in Meta-Learning, Buffelli N/A

Graph Neural Networks (GNNs) are a framework for graph representation learning, where a model learns to generate low dimensional node embeddings that encapsulate structural and feature-related information. GNNs are usually trained in an end-to-end fashion, leading to highly specialized node embeddings. However, generating node embeddings that can be used to perform multiple tasks (with performance comparable to single-task models) is an open problem. We propose a novel meta-learning strategy capable of producing multi-task node embeddings. Our method avoids the difficulties arising when learning to perform multiple tasks concurrently by, instead, learning to quickly (i.e. with a few steps of gradient descent) adapt to multiple tasks singularly. We show that the embeddings produced by our method can be used to perform multiple tasks with comparable or higher performance than classically trained models. Our method is model-agnostic and task-agnostic, thus applicable to a wide variety of multi-task domains.

Abstract 30: Prior-guided Bayesian Optimization in Meta-Learning, Souza N/A

While Bayesian Optimization (BO) is a very popular method for optimizing expensive black-box functions, it fails to leverage the knowledge of domain experts. This causes BO to waste function evaluations on bad design choices (e.g., machine learning hyperparameters) that the expert already knows to work poorly. To address this issue, we introduce Prior-guided Bayesian Optimization (PrBO). PrBO allows users to transfer their knowledge into the optimization process in the form of priors about which parts of the input space will yield the best performance, rather than BO's standard priors over functions (which are much less intuitive for users). PrBO then combines these priors with BO's standard probabilistic model to form a pseudo-posterior used to select which points to evaluate next. We show that PrBO is around 12x faster than state-of-the-art methods without user priors and 10,000x faster than random search on a common suite of benchmarks. PrBO also converges faster even if the user priors are not entirely accurate and robustly recovers from misleading priors.
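One simple way to picture the pseudo-posterior in Abstract 30 (my reading of the idea, not the authors' code): multiply a standard acquisition value by a user-supplied prior density over the input space, and anneal the prior's influence as evidence accumulates.

import numpy as np
from scipy.stats import norm

def prior_weighted_acquisition(acq_values, prior_density, t, beta=1.0):
    """Combine a standard acquisition with a user prior over inputs.

    acq_values:    acquisition scores for candidate points (e.g., EI)
    prior_density: user's prior belief density at the same points
    t:             current iteration; the prior's weight decays as beta / t
    """
    # Pseudo-posterior score: acquisition * prior^(beta / t).
    return acq_values * prior_density ** (beta / max(t, 1))

# Toy 1-D example: the expert believes the optimum lies near x = 0.2.
xs = np.linspace(0, 1, 201)
acq = np.exp(-(xs - 0.7) ** 2 / 0.05)       # stand-in acquisition surface
prior = norm.pdf(xs, loc=0.2, scale=0.1)    # user prior over the input space
for t in (1, 10, 100):
    x_next = xs[np.argmax(prior_weighted_acquisition(acq, prior, t))]
    print(t, round(float(x_next), 3))

Early on the prior dominates and proposals cluster near the expert's guess; as t grows, the data-driven acquisition takes over, which is how a misleading prior can be recovered from.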

Abstract 31: Task Similarity Aware Meta Learning: Theory-inspired Improvement on MAML in Meta-Learning, Zhou N/A

Few-shot learning ability is heavily desired for machine intelligence. By meta-learning a model initialization from training tasks with fast adaptation ability to new tasks, model-agnostic meta-learning (MAML) has achieved remarkable success in a number of few-shot learning applications. However, theoretical understandings on the learning ability of MAML remain absent yet, hindering developing new and more advanced meta learning methods in a principled way. In this work, we solve this problem by theoretically justifying the fast adaptation capability of MAML when applied to new tasks. Specifically, we prove that the learnt meta-initialization can quickly adapt to new tasks with only a few steps of gradient descent. This result, for the first time, explicitly reveals the benefits of the unique designs in MAML. Then we propose a theory-inspired task similarity aware MAML which clusters tasks into multiple groups according to the estimated optimal model parameters and learns group-specific initializations. The proposed method improves upon MAML by speeding up the adaptation and giving stronger few-shot learning ability. Experimental results on the few-shot classification tasks testify its advantages.

Abstract 32: MPLP: Learning a Message Passing Learning Protocol in Meta-Learning, Randazzo N/A

We present a novel method for learning the weights of an artificial neural network: a Message Passing Learning Protocol (MPLP). In MPLP, we abstract every operation occurring in ANNs as independent agents. Each agent is responsible for ingesting incoming multidimensional messages from other agents, updating its internal state, and generating multidimensional messages to be passed on to neighbouring agents. We demonstrate the viability of MPLP as opposed to traditional gradient-based approaches on simple feed-forward neural networks, and present a framework capable of generalizing to non-traditional neural network architectures. MPLP is meta learned using end-to-end gradient-based meta-optimisation.

Abstract 33: Few-shot Sequence Learning with Transformers in Meta-Learning, Logeswaran N/A

Few-shot algorithms aim at learning new tasks provided only a handful of training examples. In this work we investigate few-shot learning in the setting where the data points are sequences (or sets) of tokens and propose an efficient learning algorithm based on Transformers. In the simplest setting, we append a token to an input sequence which represents the particular task to be undertaken, and show that the embedding of this token can be optimized on the fly given few labeled examples. Our approach does not require complicated changes to the model architecture such as adapter layers nor computing second order derivatives as is currently popular in the meta-learning and few-shot learning literature. We demonstrate our approach on a variety of tasks, and analyze the generalization properties of several model variants and baseline approaches. In particular, we show that compositional task descriptors can improve performance. Experiments show that our approach works at least as well as other methods, while being more computationally efficient.

Abstract 34: Uniform Priors for Meta-Learning in Meta-Learning, Sinha N/A

Deep Neural Networks have shown great promise on a variety of downstream applications; but their ability to adapt and generalize to new data and tasks remains a challenging problem. However, the ability to perform few-shot adaptation to novel tasks is important for the scalability and deployment of machine learning models. It is therefore crucial to understand what makes for good, transferable features in deep networks that best allow for such adaptation. In this paper, we shed light on this by showing that features that are most transferable have high uniformity in the embedding space and propose a uniformity regularization scheme that encourages better transfer and feature reuse for few-shot learning. We evaluate our regularization on few-shot Meta-Learning benchmarks and show that uniformity regularization consistently offers benefits over baseline methods while also being able to achieve state-of-the-art on the Meta-Dataset.
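As a rough illustration of the uniformity idea in Abstract 34 (the paper's exact regularizer may differ), a common way to encourage uniformly spread features is a pairwise Gaussian-potential penalty on normalized embeddings, minimized when points repel each other on the unit hypersphere:

import torch

def uniformity_loss(embeddings, t=2.0):
    """Penalty that is low when normalized embeddings spread uniformly
    over the unit hypersphere (log of the mean Gaussian potential)."""
    z = torch.nn.functional.normalize(embeddings, dim=1)
    sq_dists = torch.cdist(z, z, p=2).pow(2)
    n = z.shape[0]
    # Only off-diagonal pairs contribute.
    mask = ~torch.eye(n, dtype=torch.bool)
    return torch.exp(-t * sq_dists[mask]).mean().log()

z = torch.randn(128, 64, requires_grad=True)
loss = uniformity_loss(z)
loss.backward()          # gradient pushes embeddings apart on the sphere
print(float(loss))

In practice a term like this would be added, with a small weight, to the usual meta-learning objective.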

Abstract 35: Is Support Set Diversity Necessary for Meta-Learning? in Meta-Learning, Li N/A

Meta-learning is a popular framework for learning with limited data in which a model is produced by training over multiple few-shot learning tasks. For classification problems, these tasks are typically constructed by sampling a small number of support and query examples from a subset of the classes. While conventional wisdom is that task diversity should improve the performance of meta-learning, in this work we find evidence to the contrary: we propose a modification to traditional meta-learning approaches in which we keep the support sets fixed across tasks, thus reducing task diversity. Surprisingly, we find that not only does this modification not result in adverse effects, it almost always improves the performance for a variety of datasets and meta-learning methods. We also provide several initial analyses to understand this phenomenon. Our work serves to: (i) more closely investigate the effect of support set construction for the problem of meta-learning, and (ii) suggest a simple, general, and competitive baseline for few-shot learning.

Abstract 36: How Important is the Train-Validation Split in Meta-Learning? in Meta-Learning, Bai N/A

Meta-learning aims to perform fast adaptation on a new task through learning a "prior" from multiple existing tasks. A common practice in meta-learning is to perform a \emph{train-validation split} where the prior adapts to the task on one split of the data, and the resulting predictor is evaluated on another split. Despite its prevalence, the importance of the train-validation split is not well understood either in theory or in practice, particularly in comparison to the more direct \emph{non-splitting} method, which uses all the per-task data for both training and evaluation. We provide a detailed theoretical study on whether and when the train-validation split is helpful on the linear centroid meta-learning problem, in the asymptotic setting where the number of tasks goes to infinity. We show that the splitting method converges to the optimal prior as expected, whereas the non-splitting method does not in general without structural assumptions on the data. In contrast, if the data are generated from linear models (the realizable regime), we show that both the splitting and non-splitting methods converge to the optimal prior. Further, perhaps surprisingly, our main result shows that the non-splitting method achieves a \emph{strictly better} asymptotic excess risk under this data distribution, even when the regularization parameter and split ratio are optimally tuned for both methods. Our results highlight that data splitting may not always be preferable, especially when the data is realizable by the model. We validate our theories by experimentally showing that the non-splitting method can indeed outperform the splitting method, on both simulations and real meta-learning tasks.

Abstract 37: Learning Flexible Classifiers with Shot-CONditional Episodic (SCONE) Training in Meta-Learning, Triantafillou N/A

Early few-shot classification work advocates for episodic training, i.e. training over learning episodes each posing a few-shot classification task. However, the role of this training regime remains poorly understood. Standard classification methods (``pre-training'') followed by episodic fine-tuning have recently achieved strong results. We aim to understand the role of this episodic fine-tuning phase through an exploration of the effect of the ``shot'' (number of examples per class) that is used during fine-tuning. We discover that using a fixed shot can specialize the pre-trained model to solving episodes of that shot at the expense of performance on other shots, in agreement with a trade-off recently observed in the context of end-to-end episodic training. To amend this, we propose a shot-conditional form of episodic fine-tuning, inspired from recent work that trains a single model on a distribution of losses. We show that this flexible approach constitutes an effective general solution that does not suffer disproportionately on any shot. We then subject it to the large-scale Meta-Dataset benchmark of varying shots and imbalanced episodes and observe performance gains in that challenging environment.
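The two estimators contrasted in Abstract 36 are easy to state in the linear centroid setting. The sketch below uses illustrative notation, not the paper's: a prior shrinks a task's sample mean, and the adapted centroid is judged either on held-out task data (splitting) or on all of it (non-splitting).

import numpy as np

rng = np.random.default_rng(0)

def adapt(prior, X, lam):
    """Linear centroid adaptation: shrink the sample mean toward the prior.
    lam is the ridge weight; lam -> infinity returns the prior itself."""
    return (X.mean(axis=0) + lam * prior) / (1.0 + lam)

def task_risk(w, X):
    """Squared-error risk of centroid w on samples X."""
    return ((X - w) ** 2).sum(axis=1).mean()

# One toy task: samples scattered around an unknown centroid.
true_centroid = rng.normal(size=5)
X = true_centroid + rng.normal(scale=1.0, size=(20, 5))
prior = np.zeros(5)
lam = 0.5

# Train-validation split: adapt on one half, evaluate on the other.
w_split = adapt(prior, X[:10], lam)
risk_split = task_risk(w_split, X[10:])

# Non-splitting: adapt and evaluate on all the data.
w_full = adapt(prior, X, lam)
risk_full = task_risk(w_full, X)

print(risk_split, risk_full)

Meta-learning then tunes the prior (and the regularization weight) against the chosen risk estimate averaged over many tasks; the paper asks which estimate yields the better prior as the number of tasks grows.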

Abstract 38: Learning not to learn: Nature versus nurture in silico in Meta-Learning, Lange N/A

Animals are equipped with a rich innate repertoire of sensory, behavioral and motor skills, which allows them to interact with the world immediately after birth. At the same time, many behaviors are highly adaptive and can be tailored to specific environments by means of learning. In this work, we use mathematical analysis and the framework of meta-learning (or 'learning to learn') to answer when it is beneficial to learn such an adaptive strategy and when to hard-code a heuristic behavior. We find that the interplay of ecological uncertainty, task complexity and the agents' lifetime has crucial effects on the meta-learned amortized Bayesian inference performed by an agent. There exist two regimes: One in which meta-learning yields a learning algorithm that implements task-dependent information-integration and a second regime in which meta-learning imprints a heuristic or 'hard-coded' behavior. Further analysis reveals that non-adaptive behaviors are not only optimal for aspects of the environment that are stable across individuals, but also in situations where an adaptation to the environment would in fact be highly beneficial, but could not be done quickly enough to be exploited within the remaining lifetime. Hard-coded behaviors should hence not only be those that always work, but also those that are too complex to be learned within a reasonable time frame.

Abstract 39: Meta-Learning Initializations for Image Segmentation in Meta-Learning, Hendryx N/A

We evaluate first-order model agnostic meta-learning algorithms (including FOMAML and Reptile) on few-shot image segmentation, present a novel neural network architecture built for fast learning which we call EfficientLab, and leverage a formal definition of the test error of meta-learning algorithms to decrease error on out of distribution tasks. We show state of the art results on the FSS-1000 dataset by meta-training EfficientLab with FOMAML and using Bayesian optimization to infer the optimal test-time adaptation routine hyperparameters. We also construct a benchmark dataset, binary PASCAL, for the empirical study of how image segmentation meta-learning systems improve as a function of the number of labeled examples. On the binary PASCAL dataset, we show that when generalizing out of meta-distribution, meta-learned initializations provide only a small improvement over joint training in accuracy but require significantly fewer gradient updates. Our code and meta-learned model are available at https://drive.google.com/drive/folders/1VhTJtYQ_byC9woS1fBaRi-hdWksfm5qq?usp=sharing.

Abstract 40: Meta-Learning of Compositional Task Distributions in Humans and Machines in Meta-Learning, Kumar N/A

Modern machine learning systems struggle with sample efficiency and are usually trained with enormous amounts of data for each task. This is in sharp contrast with humans, who often learn with very little data. In recent years, meta-learning, in which one trains on a family of tasks (i.e. a task distribution), has emerged as an approach to improving the sample complexity of machine learning systems and to closing the gap between human and machine learning. However, in this paper, we argue that current meta-learning approaches still differ significantly from human learning. We argue that humans learn over tasks by constructing compositional generative models and using these to generalize, whereas current meta-learning methods are biased toward the use of simpler statistical patterns. To highlight this difference, we construct a new meta-reinforcement learning task with a compositional task distribution. We also introduce a novel approach to constructing a ``null task distribution'' with the same statistical complexity as the compositional distribution but without explicit compositionality. We train a standard meta-learning agent, a recurrent network trained with model-free reinforcement learning, and compare it with human performance across the two task distributions. We find that humans do better in the compositional task distribution whereas the agent does better in the non-compositional null task distribution -- despite comparable statistical complexity. This work highlights a particular difference between human learning and current meta-learning models, introduces a task that displays this difference, and paves the way for future work on human-like meta-learning.
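For readers unfamiliar with the first-order methods named in Abstract 39, Reptile avoids second-order derivatives entirely: adapt a copy of the model on one task, then nudge the initialization toward the adapted weights. A minimal PyTorch sketch, with a toy task sampler standing in for real segmentation tasks:

import copy
import torch

def reptile_step(model, sample_task_batch, inner_steps=5,
                 inner_lr=1e-2, outer_lr=0.1):
    """One Reptile meta-update: adapt a clone on a task, then move the
    initialization toward the adapted weights (first-order, no Hessians)."""
    X, y = sample_task_batch()
    adapted = copy.deepcopy(model)
    opt = torch.optim.SGD(adapted.parameters(), lr=inner_lr)
    for _ in range(inner_steps):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(adapted(X), y)
        loss.backward()
        opt.step()
    with torch.no_grad():
        for p, q in zip(model.parameters(), adapted.parameters()):
            p += outer_lr * (q - p)   # interpolate toward adapted weights

# Toy usage: regression tasks y = a*x with a task-specific slope a.
model = torch.nn.Linear(1, 1)
def sample_task_batch():
    a = torch.randn(1)
    X = torch.randn(16, 1)
    return X, a * X
for _ in range(100):
    reptile_step(model, sample_task_batch)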

Abstract 41: Multi-Objective Multi-Fidelity Hyperparameter Optimization with Application to Fairness in Meta-Learning, Schmucker N/A

In many real-world applications, the performance of machine learning models is evaluated not along a single objective, but across multiple, potentially competing ones. For instance, for a model deciding whether to grant or deny loans, it is critical to make sure decisions are fair and not only accurate. As it is often infeasible to find a single model performing best across all objectives, practitioners are forced to find a trade-off between the individual objectives. While several multi-objective optimization (MO) techniques have been proposed in the machine learning literature (and beyond), little effort has been put towards using MO for hyperparameter optimization (HPO) problems; a task that has gained immense relevance and adoption in recent years. In this paper, we evaluate the suitability of existing MO algorithms for HPO and propose a novel multi-fidelity method for this problem. We evaluate our approach on public datasets with a special emphasis on fairness-motivated applications, and report substantially lower wall-clock times when approximating Pareto frontiers compared to the state-of-the-art.

Abstract 42: NAS-Bench-301 and the Case for Surrogate Benchmarks for Neural Architecture Search in Meta-Learning, Siems N/A

Several tabular NAS benchmarks have been proposed to simulate runs of NAS methods in seconds in order to allow scientifically sound empirical evaluations. However, all existing tabular NAS benchmarks are limited to extremely small architectural spaces since they rely on exhaustive evaluations of the space. This leads to unrealistic results that do not transfer to larger search spaces. Motivated by the fact that similar architectures tend to yield comparable results, we propose NAS-Bench-301 which covers a search space many orders of magnitude larger than any previous NAS benchmark. We achieve this by meta-learning a performance predictor that predicts the capability of different neural architectures to facilitate base-level learning, and using it to define a surrogate benchmark. We fit various regression models on our dataset, which consists of ~60k architecture evaluations, and build surrogates via deep ensembles to also model uncertainty. We benchmark a wide range of NAS algorithms using NAS-Bench-301 and obtain comparable results to the true benchmark at a fraction of the real cost.

Abstract 43: Open-Set Incremental Learning via Bayesian Prototypical Embeddings in Meta-Learning, Willes N/A

As autonomous decision-making agents move from narrow operating environments to unstructured worlds, learning systems must move from a closed-world formulation to an open-world, incremental, few-shot setting in which agents continuously learn new labels from small amounts of information. This stands in stark contrast to modern machine learning systems that are typically designed with a known set of classes and a large number of examples for each class. In this work, we extend embedding-based few-shot learning algorithms toward open-world problems. In particular, we investigate both the lifelong setting---in which an entirely new set of classes exists at evaluation time---as well as the incremental setting, in which new classes are added to a set of base classes available at training time. We combine Bayesian non-parametric class priors with an embedding-based pre-training scheme to yield a highly flexible framework for use in both the lifelong and the incremental settings. We benchmark our framework on MiniImageNet and show strong performance compared to baseline methods.
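The surrogate-benchmark idea in Abstract 42 can be miniaturized as follows: fit an ensemble of regressors on (architecture encoding, accuracy) pairs and report the ensemble mean and spread as predicted performance and uncertainty. Everything here is illustrative; NAS-Bench-301 itself uses far richer encodings and models.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

class SurrogateBenchmark:
    """Tiny stand-in for a surrogate NAS benchmark: an ensemble of
    regressors maps an architecture encoding to predicted accuracy."""

    def __init__(self, n_members=5):
        self.members = [GradientBoostingRegressor(random_state=s)
                        for s in range(n_members)]

    def fit(self, arch_encodings, accuracies):
        for i, m in enumerate(self.members):
            # Bootstrap resampling gives the ensemble its diversity.
            idx = np.random.default_rng(i).integers(
                0, len(arch_encodings), len(arch_encodings))
            m.fit(arch_encodings[idx], accuracies[idx])
        return self

    def query(self, arch_encoding):
        preds = np.array([m.predict(arch_encoding[None, :])[0]
                          for m in self.members])
        return preds.mean(), preds.std()   # predicted accuracy, uncertainty

# Toy data: 10-dim binary encodings with a noisy linear "accuracy".
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 10)).astype(float)
y = 0.7 + 0.02 * X @ rng.normal(size=10) + rng.normal(0, 0.01, 500)
bench = SurrogateBenchmark().fit(X, y)
print(bench.query(X[0]))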

Abstract 44: Pareto-efficient Acquisition Functions for Cost-Aware Bayesian Optimization in Meta-Learning, Guinet N/A

Bayesian optimization (BO) is a popular method to optimize expensive black-box functions. It efficiently tunes machine learning algorithms under the implicit assumption that hyperparameter evaluations cost approximately the same. In reality, the cost of evaluating different hyperparameters, be it in terms of time, dollars or energy, can span several orders of magnitude of difference. While a number of heuristics have been proposed to make BO cost-aware, none of these have been proven to work robustly. In this work, we reformulate cost-aware BO in terms of Pareto efficiency and introduce the cost Pareto Front, a mathematical object allowing us to highlight the shortcomings of commonly used acquisition functions. Based on this, we propose a novel Pareto-efficient adaptation of the expected improvement. On 144 real-world black-box function optimization problems we show that our Pareto-efficient acquisition functions significantly outperform previous solutions, bringing up to 50\% speed-ups while providing finer control over the cost-accuracy trade-off. We also revisit the common choice of Gaussian process cost models, showing that simple, low-variance cost models predict training times effectively.

Abstract 45: Similarity of classification tasks in Meta-Learning, Nguyen N/A

Recent advances in meta-learning have led to remarkable performances on several benchmarks. Such success depends on not only the meta-learning algorithms, but also the similarity between training and testing tasks. However, such task similarity observation is often ignored when evaluating meta-learning methods, potentially biasing the classification results of the testing tasks. For instance, recent studies have found a large variance of classification results among testing tasks, suggesting that not all testing tasks are equally related to training tasks. This motivates the need to analyse task similarity to optimise and better understand the performance of meta-learning. Despite some successes in investigating task similarity, most studies in the literature rely on task-specific models or the need of external models pre-trained on some large data sets. We, therefore, propose a generative approach based on a variant of Latent Dirichlet Allocation to model classification tasks without depending on any particular models nor external pre-trained networks. The proposed modelling approach allows us to represent any classification task in the latent \say{topic} space, so that we can analyse task similarity, or select the most similar tasks to facilitate the meta-learning of a novel task. We demonstrate that the proposed method can provide an insightful evaluation for meta-learning algorithms on two few-shot classification benchmarks. We also show that the proposed task-selection strategy for meta-learning produces more accurate classification results on a new testing task than a method that randomly selects the training tasks.

Abstract 46: Task Meta-Transfer from Limited Parallel Labels in Meta-Learning, Jian N/A

In this work we introduce a novel meta-learning algorithm that learns to utilize the gradient information of auxiliary tasks to improve the performance of a model on a given primary task. Our proposed method learns to project gradients from the auxiliary tasks to the primary task from a {\em small} training set with ``parallel labels,'' i.e., examples annotated with respect to both the primary task and the auxiliary tasks. This strategy enables the learning of models with strong performance on the primary task by leveraging a large collection of auxiliary examples and few primary examples. Our scheme differs from methods for transfer learning, multi-task learning or domain adaptation in several ways: unlike na\"ive transfer learning, our strategy uses auxiliary examples to directly optimize the model with respect to the primary task instead of the auxiliary task; unlike hard-sharing multi-task learning methods, our algorithm devotes the entire capacity of the backbone model to attend the primary task instead of splitting it over multiple tasks; unlike most domain adaptation techniques, our scheme does not require any overlap in labels between the auxiliary and the primary task, thus enabling knowledge transfer between completely disjoint tasks. Experiments on two image analysis benchmarks involving multiple tasks demonstrate the performance improvements of our meta-learning scheme over na\"ive transfer learning, multi-task learning as well as prior related work.
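A crude fixed-projection stand-in for the learned gradient projection in Abstract 46 (the paper learns this mapping from parallel labels; the rule below is only meant to convey the mechanics): keep the component of the auxiliary gradient that agrees with the primary gradient and discard conflicting directions.

import numpy as np

def project_auxiliary_gradient(g_primary, g_aux):
    """Keep the part of the auxiliary gradient that agrees with the
    primary gradient; drop it entirely if the two conflict."""
    denom = g_primary @ g_primary
    if denom == 0.0:
        return np.zeros_like(g_aux)
    coef = (g_aux @ g_primary) / denom
    return max(coef, 0.0) * g_primary   # zero out conflicting directions

g_primary = np.array([1.0, 0.0, 1.0])
g_aligned = np.array([2.0, 1.0, 0.0])    # partially aligned auxiliary task
g_opposed = np.array([-1.0, 0.0, -1.0])  # conflicting auxiliary task
print(project_auxiliary_gradient(g_primary, g_aligned))  # non-zero update
print(project_auxiliary_gradient(g_primary, g_opposed))  # suppressed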

Abstract 47: Flexible Dataset Distillation: Learn Labels Instead of Images in Meta-Learning, Bohdal N/A

We study the problem of dataset distillation - creating a small set of synthetic examples capable of training a good model. In particular, we study the problem of label distillation - creating synthetic labels for a small set of real images, and show it to be more effective than the prior image-based approach to dataset distillation. Methodologically, we introduce a more robust and flexible meta-learning algorithm for distillation, as well as an effective first-order strategy based on convex optimization layers. Distilling labels with our new algorithm leads to improved results over prior image-based distillation. More importantly, it leads to clear improvements in flexibility of the distilled dataset in terms of compatibility with off-the-shelf optimizers and diverse neural architectures. Interestingly, label distillation can be applied across datasets, for example enabling learning Japanese character recognition by training only on synthetically labeled English letters.

Abstract 48: Putting Theory to Work: From Learning Bounds to Meta-Learning Algorithms in Meta-Learning, Bouniot N/A

In this paper, we review the recent advances in meta-learning theory and show how they can be used in practice both to better understand the behavior of popular meta-learning algorithms and to improve their generalization capacity. This latter is achieved by integrating the theoretical assumptions ensuring efficient meta-learning in the form of regularization terms into several popular meta-learning algorithms for which we provide a large study of their behavior on classic few-shot classification benchmarks. To the best of our knowledge, this is the first contribution that puts the most recent learning bounds of meta-learning theory into practice for the popular task of few-shot classification.

Abstract 49: Model-Based Meta-Reinforcement Learning for Flight with Suspended Payloads in Meta-Learning, Belkhale N/A

Transporting suspended payloads is challenging for autonomous aerial vehicles because the payload can cause significant and unpredictable changes to the robot's dynamics. These changes can lead to suboptimal flight performance or even catastrophic failure. Although adaptive control and learning-based methods can in principle adapt to changes in these hybrid robot-payload systems, rapid mid-flight adaptation to payloads that have a priori unknown physical properties remains an open problem. We propose a meta-learning approach that learns how to adapt models of altered dynamics within seconds after picking up or dropping a payload. Our experiments demonstrate that our approach outperforms non-adaptive methods on several challenging suspended payload transportation tasks.

Abstract 50: HyperVAE: Variational Hyper-Encoding Network in Meta-Learning, Nguyen N/A

We propose a framework called HyperVAE for encoding distributions of distributions. When a target distribution is modeled by a VAE, its neural network parameters \theta are drawn from a distribution p(\theta) which is modeled by a hyper-level VAE. Given a target distribution, we predict the posterior distribution of the latent code, then use a matrix-network decoder to generate a posterior distribution q(\theta). HyperVAE can encode the parameters \theta in full in contrast to common hyper-networks practices, which generate only the scale and bias vectors to modify the target-network parameters. Thus HyperVAE preserves information about the model for each task in the latent space. We evaluate HyperVAE in density estimation tasks, outlier detection and discovery of novel design classes.

Abstract 51: Adaptive Risk Minimization: A Meta-Learning Approach for Tackling Group Shift in Meta-Learning, Zhang N/A

A fundamental assumption of most machine learning algorithms is that the training and test data are drawn from the same underlying distribution. However, this assumption is violated in almost all practical applications: machine learning systems are regularly tested under distribution shift, due to temporal correlations, particular end users, or other factors. In this work, we consider the setting where the training data are structured into groups and test time shifts correspond to changes in the group distribution. Prior work has approached this problem by attempting to be robust to all possible test time distributions, which may degrade average performance. In contrast, we propose to use ideas from meta-learning to learn models that are adaptable, such that they can adapt to shift at test time using a batch of unlabeled test points. We acquire such models by learning to adapt to training batches sampled according to different distributions, which simulate structural shifts that may occur at test time. Our primary contribution is to introduce the framework of adaptive risk minimization (ARM), a formalization of this setting that lends itself to meta-learning. We develop meta-learning methods for solving the ARM problem, and compared to a variety of prior methods, these methods provide substantial gains on image classification problems in the presence of shift.
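A bare-bones version of the test-time loop that Abstract 51 formalizes (names illustrative; ARM in the paper comes in several concrete variants, and meta-training backpropagates through this adaptation): update the model on an unlabeled batch, here by an entropy-minimization step, before predicting.

import copy
import torch

def adapt_on_unlabeled_batch(model, x_batch, steps=1, lr=1e-3):
    """Adapt a classifier to an unlabeled test batch by minimizing the
    entropy of its predictions (one simple instantiation of ARM-style
    test-time adaptation; the paper's variants differ in detail)."""
    adapted = copy.deepcopy(model)
    opt = torch.optim.SGD(adapted.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        probs = torch.softmax(adapted(x_batch), dim=1)
        entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1).mean()
        entropy.backward()
        opt.step()
    return adapted

# Toy usage: a linear classifier adapting to a batch from a shifted group.
model = torch.nn.Linear(8, 3)
x_test_batch = torch.randn(32, 8) + 0.5   # batch-level shift
adapted = adapt_on_unlabeled_batch(model, x_test_batch)
print(adapted(x_test_batch).argmax(dim=1)[:5])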

data are drawn from the same underlying their corresponding input-output pairs. Statistical distribution. However, this assumption is violated analysis and preliminary human experiments in almost all practical applications: machine show the potential of this benchmark for enabling learning systems are regularly tested under progress in few-shot extrapolation. distribution shift, due to temporal correlations, particular end users, or other factors. In this work, we consider the setting where the training Abstract 53: Towards Meta-Algorithm data are structured into groups and test time Selection in Meta-Learning, Tornede N/A shifts correspond to changes in the group distribution. Prior work has approached this Instance-specifc algorithm selection (AS) deals problem by attempting to be robust to all with the automatic selection of an algorithm from possible test time distributions, which may a fxed set of candidates most suitable for a degrade average performance. In contrast, we specifc instance of an algorithmic problem class, propose to use ideas from meta-learning to learn where "suitability" often refers to an algorithm's models that are adaptable, such that they can runtime. Over the past years, a plethora of adapt to shift at test time using a batch of algorithm selectors have been proposed.As an AS unlabeled test points. We acquire such models by selector is again an algorithm solving a specifc learning to adapt to training batches sampled problem, the idea of algorithm selection could according to diferent distributions, which also be applied to AS algorithms, leading to a simulate structural shifts that may occur at test meta-AS approach: Given an instance, the goal is time. Our primary contribution is to introduce the to select an algorithm selector, which is then framework of adaptive risk minimization (ARM), a used to select the actual algorithm for solving the formalization of this setting that lends itself to problem instance. We elaborate on consequences meta-learning. We develop meta-learning of applying AS on a meta-level and identify methods for solving the ARM problem, and possible problems. Empirically, we show that compared to a variety of prior methods, these meta-algorithm selection can indeed prove methods provide substantial gains on image benefcial in some cases. In general, however, classifcation problems in the presence of shift. successful AS approaches have problems with solving the meta-level problem.

Abstract 52: Measuring few-shot extrapolation with program induction in Abstract 54: Meta-Learning Backpropagation Meta-Learning, Alet N/A And Improving It in Meta-Learning, Kirsch N/ A Neural networks are capable of learning complex functions, but still have problems generalizing In the past a large number of variable update from few examples and beyond their training rules have been proposed for meta learning such distribution. Meta-learning provides a paradigm as fast weights, hyper networks, learned learning to train networks to learn from few examples, but rules, and meta recurrent neural networks. We it has been shown that some of its most popular unify these architectures by demonstrating that a benchmarks do not require signifcant adaptation single weight-sharing and sparsity principle to each task nor learning representations that underlies them that can be used to express extrapolate beyond the training distribution. complex learning algorithms. We propose a Program induction lies at the opposite end of the simple implementation of this principle, the spectrum: programs are capable of extrapolating Variable Shared Meta RNN, and demonstrate that from very few examples, but we still do not know it allows implementing neuronal dynamics and how to efciently search these discrete spaces. backpropagation solely by running the recurrent We propose a common benchmark for both neural network in forward-mode. This ofers a communities, by learning to extrapolate from few direction for backpropagation that is biologically examples coming from the execution of small plausible. Then we show how backpropagation programs. These are obtained by leveraging a C+ itself can be further improved through meta- + interpreter on codes from programming learning. That is, we can use a human-engineered competitions and extracting small sub-codes with 30 Dec. 11, 2020

Abstract 55: Hyperparameter Transfer Across Developer Adjustments in Meta-Learning, Stoll N/A

After developer adjustments to a machine learning (ML) system, how can the results of an old hyperparameter optimization automatically be used to speed up a new hyperparameter optimization? This question poses a challenging problem, as developer adjustments can change which hyperparameter settings perform well, or even the hyperparameter space itself. While many approaches exist that leverage knowledge obtained on previous tasks, so far, knowledge from previous development steps remains entirely untapped. In this work, we remedy this situation and propose a new research framework: hyperparameter transfer across adjustments (HT-AA). To lay a solid foundation for this research framework, we provide four HT-AA baseline algorithms and eight benchmarks. The best baseline, on average, reaches a given performance 2x faster than a prominent HPO algorithm without transfer. As hyperparameter optimization is a crucial step in ML development but requires extensive computational resources, this speed up would lead to faster development cycles, lower costs, and reduced environmental impacts. To make these benefits available to ML developers off-the-shelf, we provide a python package that implements the proposed transfer algorithm.

Abstract 56: Model-Agnostic Graph Regularization for Few-Shot Learning in Meta-Learning, Shen N/A

In many domains, relationships between categories are encoded in the knowledge graph. Recently, promising results have been achieved by incorporating knowledge graphs as side-information in hard classification tasks with severely limited data. However, prior models consist of highly complex architectures with many sub-components that all seem to impact performance. In this paper, we present a comprehensive empirical study on graph embedded few-shot learning. We introduce a graph regularization approach that allows a deeper understanding of the impact of incorporating graph information between labels. Our proposed regularization is widely applicable and model-agnostic, and boosts performance of any few-shot learning model, including metric-learning, meta-learning, and fine-tuning. Our approach improves strong base learners by up to 2% on Mini-ImageNet and 6.7% on ImageNet-FS, outperforming state-of-the-art models and other graph embedded methods. Additional analyses reveal that graph-regularized models result in lower loss for more difficult tasks such as lower-shot and less informative few-shot episodes.

Abstract 57: Prototypical Region Proposal Networks for Few-shot Localization and Classification in Meta-Learning, Skomski N/A

Recently proposed few-shot image classification methods have generally focused on use cases where the objects to be classified are the central subject of images. Despite success on benchmark vision datasets aligned with this use case, these methods typically fail on use cases involving densely-annotated, busy images: images common in the wild where objects of relevance are not the central subject, instead appearing potentially occluded, small, or among other incidental objects belonging to other classes of potential interest. To localize relevant objects, we employ a prototype-based few-shot segmentation model which compares the encoded features of unlabeled query images with support class centroids to produce region proposals indicating the presence and location of support set classes in a query image. These region proposals are then used as additional conditioning input to few-shot image classifiers. We develop a framework to unify the two stages (segmentation and classification) into an end-to-end classification model---PRoPnet---and empirically demonstrate that our methods improve accuracy on image datasets with natural scenes containing multiple object classes.
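To illustrate the kind of label-graph regularizer studied in Abstract 56 (a generic graph-smoothness penalty; the paper's exact form may differ), related classes are pulled toward each other in embedding space in proportion to their edge weight in the knowledge graph:

import torch

def graph_regularizer(class_embeddings, adjacency):
    """Graph smoothness penalty: sum_ij A_ij * ||w_i - w_j||^2 / 2,
    encouraging related classes (per the label graph) to have
    nearby classifier embeddings."""
    sq_dists = torch.cdist(class_embeddings, class_embeddings).pow(2)
    return (adjacency * sq_dists).sum() / 2

# Toy label graph over 4 classes: 0-1 and 2-3 are related.
A = torch.tensor([[0., 1., 0., 0.],
                  [1., 0., 0., 0.],
                  [0., 0., 0., 1.],
                  [0., 0., 1., 0.]])
W = torch.randn(4, 16, requires_grad=True)   # one embedding per class
task_loss = torch.zeros(())                  # stand-in for the few-shot loss
loss = task_loss + 0.1 * graph_regularizer(W, A)
loss.backward()

Because the term only touches the class embeddings, it can be bolted onto metric-learning, meta-learning, or fine-tuning baselines alike, which is what makes the approach model-agnostic.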

Abstract 58: Tailoring: encoding inductive biases by optimizing unsupervised objectives at prediction time in Meta-Learning, Alet N/A

From CNNs to attention mechanisms, encoding inductive biases into neural networks has been a fruitful source of improvement in machine learning. Auxiliary losses are a general way of encoding biases in order to help networks learn better representations by adding extra terms to the loss function. However, since they are minimized on the training data, they suffer from the same generalization gap as regular task losses. Moreover, by changing the loss function, the network is optimizing a different objective than the one we care about. In this work we solve both problems: first, we take inspiration from transductive learning and note that, after receiving an input but before making a prediction, we can fine-tune our models on any unsupervised objective. We call this process tailoring, because we customize the model to each input. Second, we formulate a nested optimization (similar to those in meta-learning) and train our models to perform well on the task loss after adapting to the tailoring loss. The advantages of tailoring and meta-tailoring are discussed theoretically and demonstrated empirically on several diverse examples: encoding inductive conservation laws from physics to improve predictions, improving local smoothness to increase robustness to adversarial examples, and using contrastive losses on the query image to improve generalization.

Abstract 59: Meta-Learning Bayesian Neural Network Priors Based on PAC-Bayesian Theory in Meta-Learning, Rothfuss N/A

Bayesian Neural Networks (BNNs) are a promising approach towards improved uncertainty quantification and sample efficiency. Due to their complex parameter space, choosing informative priors for BNNs is challenging. Thus, often a naive, zero-centered Gaussian is used, resulting both in bad generalization and poor uncertainty estimates when training data is scarce. In contrast, meta-learning aims to extract such prior knowledge from a set of related learning tasks. We propose a principled and scalable algorithm for meta-learning BNN priors based on PAC-Bayesian bounds. Whereas previous approaches require optimizing the prior and multiple variational posteriors in an interdependent manner, our method does not rely on difficult nested optimization problems. Our experiments show that the proposed method is not only computationally more efficient but also yields better predictions and uncertainty estimates when compared to previous meta-learning methods and BNNs with standard priors.

Abstract 60: Synthetic Petri Dish: A Novel Surrogate Model for Rapid Architecture Search in Meta-Learning, Rawal N/A

Neural Architecture Search (NAS) explores a large space of architectural motifs -- a compute-intensive process that often involves ground-truth evaluation of each motif by instantiating it within a large network, and training and evaluating the network with thousands or more data samples. Inspired by how biological motifs such as cells are sometimes extracted from their natural environment and studied in an artificial Petri dish setting, this paper proposes the Synthetic Petri Dish model for evaluating architectural motifs. In the Synthetic Petri Dish, architectural motifs are instantiated in very small networks and evaluated using very few learned synthetic data samples (to effectively approximate performance in the full problem). The relative performance of motifs in the Synthetic Petri Dish can substitute for their ground-truth performance, thus accelerating the most expensive step of NAS. Unlike other neural network-based prediction models that parse the structure of the motif to estimate its performance, the Synthetic Petri Dish predicts motif performance by training the actual motif in an artificial setting, thus deriving predictions from its true intrinsic properties. Experiments in this paper demonstrate that the Synthetic Petri Dish can therefore predict the performance of new motifs with significantly higher accuracy, especially when insufficient ground truth data is available. Our hope is that this work can inspire a new research direction in studying the performance of extracted components of models in a synthetic diagnostic setting optimized to provide informative evaluations.
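A compact sketch of the prediction-time loop that Abstract 58 calls tailoring (illustrative; the paper additionally meta-trains through this inner loop, i.e. meta-tailoring): before predicting on an input, take a few gradient steps on an unsupervised loss evaluated at that input alone. The smoothness objective below mirrors the robustness example in the abstract.

import copy
import torch

def tailored_predict(model, x, unsup_loss, steps=3, lr=1e-2):
    """Fine-tune a copy of the model on an unsupervised objective for this
    particular input, then predict with the customized copy."""
    tailored = copy.deepcopy(model)
    opt = torch.optim.SGD(tailored.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        unsup_loss(tailored, x).backward()
        opt.step()
    with torch.no_grad():
        return tailored(x)

# Toy unsupervised objective: encourage local smoothness around x.
def smoothness_loss(model, x, eps=0.1):
    noisy = x + eps * torch.randn_like(x)
    return ((model(x) - model(noisy)) ** 2).mean()

model = torch.nn.Sequential(torch.nn.Linear(4, 16), torch.nn.Tanh(),
                            torch.nn.Linear(16, 2))
x = torch.randn(1, 4)
print(tailored_predict(model, x, smoothness_loss))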

Abstract 61: Contextual HyperNetworks for Novel Feature Adaptation in Meta-Learning, Lamb N/A

While deep learning has obtained state-of-the-art results in many applications, the adaptation of neural network architectures to incorporate new output features remains a challenge, as neural networks are commonly trained to produce a fixed output dimension. This issue is particularly severe in online learning settings, where new output features, such as items in a recommender system, are added continually with few or no associated observations. As such, methods for adapting neural networks to novel features which are both time and data-efficient are desired. To address this, we propose the Contextual HyperNetwork (CHN), an auxiliary model which generates parameters for extending the base model to a new feature, by utilizing both existing data as well as any observations and/or metadata associated with the new feature. At prediction time, the CHN requires only a single forward pass through a neural network, yielding a significant speed-up when compared to re-training and fine-tuning approaches. To assess the performance of CHNs, we use a CHN to augment a partial variational autoencoder (P-VAE), a deep generative model which can impute the values of missing features in sparsely-observed data. We show that this system obtains improved few-shot learning performance for novel features over existing imputation and meta-learning baselines across recommender systems, e-learning, and healthcare tasks.

Abstract 62: Continual learning with direction-constrained optimization in Meta-Learning, Teng N/A

This paper studies a new design of the optimization algorithm for training deep learning models with a fixed architecture of the classification network in a continual learning framework, where the training data is non-stationary and the non-stationarity is imposed by a sequence of distinct tasks. This setting implies the existence of a manifold of network parameters that correspond to good performance of the network on all tasks. Our algorithm is derived from the geometrical properties of this manifold. We first analyze a deep model trained on only one learning task in isolation and identify a region in network parameter space, where the model performance is close to the recovered optimum. We provide empirical evidence that this region resembles a cone that expands along the convergence direction. We study the principal directions of the trajectory of the optimizer after convergence and show that traveling along a few top principal directions can quickly bring the parameters outside the cone but this is not the case for the remaining directions. We argue that catastrophic forgetting in a continual learning setting can be alleviated when the parameters are constrained to stay within the intersection of the plausible cones of individual tasks that were so far encountered during training. Enforcing this is equivalent to preventing the parameters from moving along the top principal directions of convergence corresponding to the past tasks. For each task we introduce a new linear autoencoder to approximate its corresponding top forbidden principal directions. They are then incorporated into the loss function in the form of a regularization term for the purpose of learning the coming tasks without forgetting. We empirically demonstrate that our algorithm performs favorably compared to other state-of-the-art regularization-based continual learning methods, including EWC and SI.

Abstract 63: Continual Model-Based Reinforcement Learning with Hypernetworks in Meta-Learning, Huang N/A

Effective planning in model-based reinforcement learning (MBRL) and model-predictive control (MPC) relies on the accuracy of the learned dynamics model. In many instances of MBRL and MPC, this model is assumed to be stationary and is periodically re-trained from scratch on state transition experience collected from the beginning of environment interactions. This implies that the time required to train the dynamics model - and the pause required between plan executions - grows linearly with the size of the collected experience. We argue that this is too slow for lifelong robot learning and propose HyperCRL, a method that continually learns the encountered dynamics in a sequence of tasks using task-conditional hypernetworks. Our method has three main attributes: first, it enables constant-time dynamics learning sessions between planning and only needs to store the most recent fixed-size portion of the state transition experience; second, it uses fixed-capacity hypernetworks to represent non-stationary and task-aware dynamics; third, it outperforms existing continual learning alternatives that rely on fixed-capacity networks, and does competitively with baselines that remember an ever increasing coreset of past experience. We show that HyperCRL is effective in continual model-based reinforcement learning in robot locomotion and manipulation scenarios, such as tasks involving pushing and door opening.
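The core mechanism in Abstract 63, a task-conditional hypernetwork, can be sketched as follows (shapes and names are illustrative, not HyperCRL's actual architecture): a small generator maps a learned task embedding to the weights of a dynamics model, so only one embedding per task needs to be stored.

import torch

class TaskConditionalHypernetwork(torch.nn.Module):
    """Maps a task embedding to the weights of a one-layer dynamics model
    predicting next_state from (state, action)."""

    def __init__(self, emb_dim, state_dim, action_dim):
        super().__init__()
        in_dim = state_dim + action_dim
        self.n_weights = in_dim * state_dim + state_dim   # W and b
        self.generator = torch.nn.Sequential(
            torch.nn.Linear(emb_dim, 64), torch.nn.ReLU(),
            torch.nn.Linear(64, self.n_weights))
        self.state_dim, self.in_dim = state_dim, in_dim

    def forward(self, task_emb, state, action):
        params = self.generator(task_emb)
        W = params[: self.in_dim * self.state_dim].view(self.state_dim,
                                                        self.in_dim)
        b = params[self.in_dim * self.state_dim:]
        x = torch.cat([state, action], dim=-1)
        return x @ W.t() + b                  # predicted next state

hnet = TaskConditionalHypernetwork(emb_dim=8, state_dim=4, action_dim=2)
task_emb = torch.randn(8)                     # one learned embedding per task
state, action = torch.randn(16, 4), torch.randn(16, 2)
print(hnet(task_emb, state, action).shape)    # torch.Size([16, 4])

Since the hypernetwork's own capacity is fixed, adding a task costs only a new embedding vector, which is what makes the dynamics-learning sessions constant-time.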

Abstract 64: Data Augmentation for Meta-Learning in Meta-Learning, Ni N/A

Conventional image classifiers are trained by randomly sampling mini-batches of images. To achieve state-of-the-art performance, sophisticated data augmentation schemes are used to expand the amount of training data available for sampling. In contrast, meta-learning algorithms sample not only images, but classes as well. We investigate how data augmentation can be used not only to expand the number of images available per class, but also to generate entirely new classes. We systematically dissect the meta-learning pipeline and investigate the distinct ways in which data augmentation can be integrated at both the image and class levels. Our proposed meta-specific data augmentation significantly improves the performance of meta-learners on few-shot classification benchmarks.

Abstract 65: Defining Benchmarks for Continual Few-Shot Learning in Meta-Learning, Patacchiola N/A

In recent years there has been substantial progress in few-shot learning, where a model is trained on a small labeled dataset related to a specific task, and in continual learning, where a model has to retain knowledge acquired on a sequence of datasets. However, the field has still to frame a suite of benchmarks for the hybrid setting combining these two paradigms, where a model is trained on several sequential few-shot tasks, and then tested on a validation set stemming from all those tasks. In this paper we propose such a setting, naming it Continual Few-Shot Learning (CFSL). We first define a theoretical framework for CFSL, then we propose a range of flexible benchmarks to unify the evaluation criteria. As part of the benchmark, we introduce a compact variant of ImageNet, called SlimageNet64, which retains all original 1000 classes but only contains 200 instances of each one (a total of 200K data-points) downscaled to 64 by 64 pixels. We provide baselines for the proposed benchmarks using a number of popular few-shot and continual learning methods, exposing previously unknown strengths and weaknesses of those algorithms. The dataloader and dataset will be released with an open-source license.

Abstract 66: Few-Shot Unsupervised Continual Learning through Meta-Examples in Meta-Learning, Bertugli N/A

In real-world applications, data do not reflect the ones commonly used for neural networks training, since they are usually few, unbalanced, unlabeled and can be available as a stream. Hence many existing deep learning solutions suffer from a limited range of applications, in particular in the case of online streaming data that evolve over time. To narrow this gap, in this work we introduce a novel and complex setting involving unsupervised meta-continual learning with unbalanced tasks. These tasks are built through a clustering procedure applied to a fitted embedding space. We exploit a meta-learning scheme that simultaneously alleviates catastrophic forgetting and favors the generalization to new tasks. Moreover, to encourage feature reuse during the meta-optimization, we exploit a single inner loop taking advantage of an aggregated representation achieved through the use of a self-attention mechanism. Experimental results on few-shot learning benchmarks show competitive performance even compared to the supervised case. Additionally, we empirically observe that in an unsupervised scenario, the small tasks and the variability in the clusters pooling play a crucial role in the generalization capability of the network. Further, on complex datasets, the exploitation of more clusters than the true number of classes leads to higher results, even compared to the ones obtained with full supervision, suggesting that a predefined partitioning into classes can miss relevant structural information.
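Class-level augmentation of the kind investigated in Abstract 64 can be as simple as promoting a transformed copy of a class to a brand-new class; the 90-degree rotation below is one common instance, shown purely for illustration and not necessarily the paper's exact scheme.

import numpy as np

def augment_class_pool(class_images):
    """Given a dict {class_id: array of images (n, h, w)}, add new classes
    formed by rotating every image of an existing class by 90 degrees."""
    augmented = dict(class_images)
    next_id = max(class_images) + 1
    for cid, imgs in class_images.items():
        augmented[next_id] = np.rot90(imgs, k=1, axes=(1, 2)).copy()
        next_id += 1
    return augmented

rng = np.random.default_rng(0)
pool = {0: rng.normal(size=(20, 28, 28)), 1: rng.normal(size=(20, 28, 28))}
pool = augment_class_pool(pool)
print(sorted(pool))          # [0, 1, 2, 3] -- two new rotated classes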

Abstract 67: Exploring Representation Learning for Flexible Few-Shot Tasks in Meta-Learning, Ren N/A

Existing approaches to few-shot learning deal with tasks that have persistent, rigid notions of classes. Typically, the learner observes data only from a fixed number of classes at training time and is asked to generalize to a new set of classes at test time. Two examples from the same class would always be assigned the same labels in any episode. In this work, we consider a realistic setting where the relationship between examples can change from episode to episode depending on the task context, which is not given to the learner. We define two new benchmark datasets for this flexible few-shot scenario, where the tasks are based on images of faces (Celeb-A) and shoes (Zappos50K). While classification baselines learn representations that work well for standard few-shot learning, they suffer in our flexible tasks since the classification criteria shift from training to testing. On the other hand, unsupervised contrastive representation learning with instance-based invariance objectives preserves such flexibility. A combination of instance and class invariance learning objectives is found to perform best on our new flexible few-shot learning benchmarks, and a novel variant of Prototypical Networks is proposed for selecting useful feature dimensions.

Abstract 68: Learning to Generate Noise for Multi-Attack Robustness in Meta-Learning, Madaan N/A

The majority of existing adversarial defense methods are tailored to defend against a single category of adversarial perturbation (e.g. $\ell_\infty$-attack). However, this makes these methods extraneous as the attacker can adopt diverse adversaries to deceive the system. Moreover, training on multiple perturbations simultaneously significantly increases the computational overhead during training. To address these challenges, we propose a novel meta-learning framework that explicitly learns to generate noise to improve the model's robustness against multiple types of attacks. Its key component is Meta Noise Generator (MNG) that outputs optimal noise to stochastically perturb a given sample, such that it helps lower the error on diverse adversarial perturbations. By utilizing samples generated by MNG, we train a model by enforcing the label consistency across multiple perturbations. We validate the robustness of models trained by our scheme on various datasets and against a wide variety of perturbations, demonstrating that it significantly outperforms the baselines across multiple perturbations with a marginal computational cost.

Abstract 69: MAster of PuPpets: Model-Agnostic Meta-Learning via Pre-trained Parameters for Natural Language Generation in Meta-Learning, Lin N/A

Pre-trained Transformer-based language models have been an enormous success in generating realistic natural language. However, how to adapt these models to specific domains effectively remains unsolved. On the other hand, Model-Agnostic Meta-Learning (MAML) has been an influential framework for few-shot learning, while how to determine the initial parameters of MAML is still not well-researched. In this paper, we fuse the information from the pre-training stage with meta-learning to learn how to adapt a pre-trained generative model to a new domain. In particular, we find that applying the pre-trained information as the initial state of meta-learning helps the model adapt to new tasks efficiently and is competitive with the state-of-the-art results over evaluation metrics on the Persona dataset. Besides, in few-shot experiments, we show that the proposed model converges significantly faster than naive transfer learning baselines.

Abstract 70: Meta-Learning via Hypernetworks in Meta-Learning, Zhao N/A

Recent developments in few-shot learning have shown that during fast adaption, gradient-based meta-learners mostly rely on embedding features of powerful pretrained networks. This leads us to research ways to effectively adapt features and utilize the meta-learner's full potential. Here, we demonstrate the effectiveness of hypernetworks in this context. We propose a soft row-sharing hypernetwork architecture and show that training the hypernetwork with a variant of MAML is tightly linked to meta-learning a curvature matrix used to condition gradients during fast adaptation. We achieve similar results as state-of-the-art model-agnostic methods in the overparametrized case, while outperforming many MAML variants without using different optimization schemes in the compressive regime. Furthermore, we empirically show that hypernetworks do leverage the inner loop optimization for better adaptation, and analyse how they naturally try to learn the shared curvature of constructed tasks on a toy problem when using our proposed training algorithm.

Abstract 70: Meta-Learning via Hypernetworks in Meta-Learning, Zhao N/A

Recent developments in few-shot learning have shown that during fast adaption, gradient-based meta-learners mostly rely on embedding features of powerful pretrained networks. This leads us to research ways to effectively adapt features and utilize the meta-learner's full potential. Here, we demonstrate the effectiveness of hypernetworks in this context. We propose a soft row-sharing hypernetwork architecture and show that training the hypernetwork with a variant of MAML is tightly linked to meta-learning a curvature matrix used to condition gradients during fast adaptation. We achieve similar results as state-of-the-art model-agnostic methods in the overparametrized case, while outperforming many MAML variants without using different optimization schemes in the compressive regime. Furthermore, we empirically show that hypernetworks do leverage the inner loop optimization for better adaptation, and analyse how they naturally try to learn the shared curvature of constructed tasks on a toy problem when using our proposed training algorithm.

Abstract 71: MobileDets: Searching for Object Detection Architecture for Mobile Accelerators in Meta-Learning, Xiong N/A

Inverted bottleneck layers, which are built upon depthwise convolutions, have been the predominant building blocks in state-of-the-art object detection models on mobile devices. In this work, we investigate the optimality of this design pattern over a broad range of mobile accelerators by revisiting the usefulness of regular convolutions. We achieve substantial improvements in the latency-accuracy trade-off by incorporating regular convolutions in the search space, effectively placing them in the network via neural architecture search, and directly optimizing the network architectures for object detection. We obtain a family of object detection models, MobileDets, that achieve state-of-the-art results across mobile accelerators. On the COCO object detection task, MobileDets outperform MobileNetV3+SSDLite by 1.7 mAP at comparable mobile CPU inference latencies. MobileDets also outperform MobileNetV2+SSDLite by 1.9 mAP on mobile CPUs, 3.7 mAP on EdgeTPUs, 3.4 mAP on DSPs and 2.7 mAP on edge GPUs without latency increase. Moreover, MobileDets are comparable with the state-of-the-art MnasFPN on mobile CPUs even without using the feature pyramid, and achieve better mAP scores on both EdgeTPUs and DSPs with up to 2x speedup.

Abstract 72: Learning in Low Resource Modalities via Cross-Modal Generalization in Meta-Learning, Liang N/A

The natural world is abundant with underlying concepts expressed naturally in multiple heterogeneous sources such as the visual, acoustic, tactile, and linguistic modalities. Despite vast differences in these raw modalities, humans seamlessly perceive multimodal data, learn new concepts, and show extraordinary capabilities in generalizing across input modalities. Much of the existing progress in multimodal learning, however, focuses primarily on problems where the same set of modalities are present at train and test time, which makes learning in low-resource modalities particularly difficult. In this work, we propose a general algorithm for cross-modal generalization: a learning paradigm where data from more abundant source modalities is used to learn useful representations for scarce target modalities. Our algorithm is based on meta-alignment, a novel method to align representation spaces across modalities while ensuring quick generalization to new concepts. Experimental results on generalizing from image to audio classification and from text to speech classification demonstrate strong performance on classifying data from an entirely new target modality with only a few (1-10) labeled samples. In addition, our method works particularly well when the target modality suffers from noisy or limited labels, a scenario particularly prevalent in low-resource modalities.

Abstract 73: Training more effective learned optimizers in Meta-Learning, Metz N/A

Much as replacing hand-designed features with learned functions has revolutionized how we solve perceptual tasks, we believe learning algorithms will transform how we train models. In this work, we focus on general-purpose learned optimizers capable of training a wide variety of problems with no user-specified hyperparameters. We introduce a new, neural network parameterized, hierarchical optimizer with access to additional features such as validation loss to enable automatic regularization. Most learned optimizers have been trained on only a single task or a small number of tasks. We train our optimizers on thousands of tasks, making use of orders of magnitude more compute, resulting in optimizers that generalize better to unseen tasks. The learned optimizers not only perform well but learn behaviors that are distinct from existing first-order optimizers. For instance, they generate update steps that have implicit regularization and adapt as the problem hyperparameters (e.g. batch size) or architecture (e.g. neural network width) change. Finally, these learned optimizers show evidence of being useful for out-of-distribution tasks such as training themselves from scratch.
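(A toy sketch of the core idea only -- a neural-network-parameterized optimizer that maps per-parameter features to updates; the abstract's actual optimizer is hierarchical and uses richer features such as validation loss, and all names here are hypothetical.)

```python
import torch
import torch.nn as nn

class LearnedOptimizer(nn.Module):
    """Toy per-parameter learned optimizer: a tiny MLP maps features of
    each parameter (current gradient and momentum) to an additive update.
    During meta-training, the MLP weights themselves would be optimized
    across many tasks."""
    def __init__(self, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def step(self, params, grads, momenta, beta=0.9):
        new_params, new_momenta = [], []
        for p, g, m in zip(params, grads, momenta):
            m = beta * m + (1 - beta) * g          # running momentum
            feats = torch.stack([g.flatten(), m.flatten()], dim=-1)
            update = self.net(feats).view_as(p)     # learned update rule
            new_params.append(p + 0.01 * update)    # small output scale
            new_momenta.append(m)
        return new_params, new_momenta
```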

OPT2020: Optimization for Machine Learning

Courtney Paquette, Mark Schmidt, Sebastian Stich, Quanquan Gu, Martin Takac

Fri Dec 11, 03:15 AM

Optimization lies at the heart of many machine learning algorithms and enjoys great interest in our community. Indeed, this intimate relation of optimization with ML is the key motivation for the OPT series of workshops.

Looking back over the past decade, a strong trend is apparent: the intersection of OPT and ML has grown to the point that now cutting-edge advances in optimization often arise from the ML community. The distinctive feature of optimization within ML is its departure from textbook approaches, in particular, its focus on a different set of goals driven by "big data, nonconvexity, and high dimensions," where both theory and implementation are crucial.

We wish to use OPT 2020 as a platform to foster discussion, discovery, and dissemination of the state-of-the-art in optimization as relevant to machine learning. And well beyond that: as a platform to identify new directions and challenges that will drive future research, and continue to build the OPT+ML joint research community.

**Invited Speakers**
Volkan Cevher (EPFL)
Michael Friedlander (UBC)
Donald Goldfarb (Columbia)
Andreas Krause (ETH, Zurich)
Suvrit Sra (MIT)
Rachel Ward (UT Austin)
Ashia Wilson (MSR)
Tong Zhang (HKUST)

**Instructions**
Please join us in gather.town for all breaks and poster sessions (click "Open Link" on any break or poster session).

To see all submitted papers and posters, go to the "opt-ml website" at the top of the page.

Use the RocketChat or Zoom link (top of page) if you want to ask the speaker a direct question during the Live Q&A and Contributed Talks.

Schedule

03:15 AM  Welcome event (gather.town) Gu, Paquette, Schmidt, Stich, Takac
03:50 AM  Welcome remarks to Session 1 Stich
04:00 AM  Invited speaker: The Convexity of Learning Infinite-width Deep Neural Networks, Tong Zhang Zhang
04:20 AM  Live Q&A with Tong Zhang (Zoom) Stich
04:30 AM  Invited speaker: Adaptation and universality in first-order methods, Volkan Cevher Cevher
05:00 AM  Contributed talks in Session 1 (Zoom) Stich, Condat, Li, Shamir, Vlaar, Zaki
05:00 AM  Contributed Video: Distributed Proximal Splitting Algorithms with Rates and Acceleration, Laurent Condat Condat
05:00 AM  Contributed Video: Employing No Regret Learners for Pure Exploration in Linear Bandits, Mohammadi Zaki Zaki
05:00 AM  Contributed Video: PAGE: A Simple and Optimal Probabilistic Gradient Estimator for Nonconvex Optimization, Zhize Li Li
05:00 AM  Contributed Video: Constraint-Based Regularization of Neural Networks, Tiffany Vlaar Vlaar
05:00 AM  Contributed Video: Can We Find Near-Approximately-Stationary Points of Nonsmooth Nonconvex Functions?, Ohad Shamir Shamir
06:00 AM  Poster Session 1 (gather.town) Condat, Vlaar, Shamir, Zaki, Li, Liu, Horváth, Safaryan, Choukroun, Shridhar, Kahale, Jin, Jawanpuria, Yadav, Koyama, Kim, Li, Purkayastha, Salim, Banerjee, Richtarik, Mahto, Ye, Mishra, Liu, Zhu
06:50 AM  Welcome remarks to Session 2 Takac
07:00 AM  Invited speaker: Adaptive Sampling for Stochastic Risk-Averse Learning, Andreas Krause Krause
07:20 AM  Live Q&A with Andreas Krause (Zoom) Takac
07:30 AM  Invited speaker: Practical Kronecker-factored BFGS and L-BFGS methods for training deep neural networks, Donald Goldfarb Goldfarb
08:00 AM  Contributed Video: How to make your optimizer generalize better, Sharan Vaswani Vaswani
08:00 AM  Contributed talks in Session 2 (Zoom) Takac, Horváth, Liu, Loizou, Vaswani
08:00 AM  Contributed Video: Adaptive Gradient Methods Converge Faster with Over-Parameterization (and you can do a line-search), Sharan Vaswani Vaswani
08:00 AM  Contributed Video: Adaptivity of Stochastic Gradient Methods for Nonconvex Optimization, Samuel Horvath Horváth
08:00 AM  Contributed Video: DDPNOpt: Differential Dynamic Programming Neural Optimizer, Guan-Horng Liu Liu
08:00 AM  Contributed Video: Stochastic Polyak Step-size for SGD: An Adaptive Learning Rate for Fast Convergence, Nicolas Loizou Loizou
08:30 AM  Break (gather.town)
09:00 AM  Invited speaker: SGD without replacement: optimal rate analysis and more, Suvrit Sra Sra
09:20 AM  Live Q&A with Suvrit Sra (Zoom) Takac
09:45 AM  Poster Session 2 (gather.town) Vaswani, Loizou, Li, Nakkiran, Gao, Baghal, Wu, Yousefzadeh, Wang, Wang, Xie, Borovykh, Jastrzebski, Dan, Zhang, Tuddenham, Pattathil, Redko, Cohen, Esfandiari, Jiang, ElAraby, Yun, Psenka, Gower, Wang
10:50 AM  Welcome remarks to Session 3 Schmidt
11:00 AM  Invited speaker: Stochastic Geodesic Optimization, Ashia Wilson Wilson
11:20 AM  Live Q&A with Ashia Wilson (Zoom) Schmidt
11:30 AM  Invited speaker: Concentration for matrix products, and convergence of Oja's algorithm for streaming PCA, Rachel Ward Ward
11:50 AM  Live Q&A with Rachel Ward (Zoom) Schmidt
12:00 PM  Contributed Video: Incremental Greedy BFGS: An Incremental Quasi-Newton Method with Explicit Superlinear Rate, Zhan Gao Gao
12:00 PM  Contributed Video: Learning Rate Annealing Can Provably Help Generalization, Even for Convex Problems, Preetum Nakkiran Nakkiran
12:00 PM  Contributed Video: TenIPS: Inverse Propensity Sampling for Tensor Completion, Chengrun Yang Yang
12:00 PM  Contributed talks in Session 3 (Zoom) Schmidt, Gao, Li, Nakkiran, Wu, Yang
12:00 PM  Contributed Video: When Does Preconditioning Help or Hurt Generalization?, Denny Wu Wu
12:00 PM  Contributed Video: Variance Reduction on Adaptive Stochastic Mirror Descent, Wenjie Li Li
12:30 PM  Break (gather.town)
01:30 PM  Invited speaker: Fast convergence of stochastic subgradient method under interpolation, Michael Friedlander Friedlander
01:30 PM  Intro to Invited Speaker 8 Schmidt

01:50 PM  Live Q&A with Michael Friedlander (Zoom) Schmidt
02:00 PM  Poster Session 3 (gather.town) Wu, Yang, Ergen, Lotfi, Guille-Escuret, Ginsburg, Lyu, Xie, Newton, Basu, Wang, Lucas, LI, Ding, Gonzalez Ortiz, Askari Hemmat, Bu, Lawton, Thekumparampil, Liang, Roberts, Zhu, Zhou
02:50 PM  Welcome remarks to Session 4 Gu
03:00 PM  Invited speaker: Online nonnegative matrix factorization for Markovian and other real data, Deanna Needell and Hanbaek Lyu Lyu, Needell
03:20 PM  Live Q&A with Deanna Needell and Hanbaek Lyu (Zoom) Gu
03:30 PM  Contributed Video: A Study of Condition Numbers for First-Order Optimization, Charles Guille-Escuret Guille-Escuret
03:30 PM  Contributed Video: Stochastic Damped L-BFGS with controlled norm of the Hessian approximation, Sanae Lotfi Lotfi
03:30 PM  Contributed Video: Convex Programs for Global Optimization of Convolutional Neural Networks in Polynomial-Time, Tolga Ergen Ergen
03:30 PM  Contributed Video: Affine-Invariant Analysis of Frank-Wolfe on Strongly Convex Sets, Lewis Liu
03:30 PM  Contributed talks in Session 4 (Zoom) Gu, Lotfi, Guille-Escuret, Ergen, Zhou
03:30 PM  Contributed Video: On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization, Dongruo Zhou Zhou
04:00 PM  Closing remarks Gu, Paquette, Schmidt, Stich, Takac

Abstracts (39):

Abstract 1: Welcome event (gather.town) in OPT2020: Optimization for Machine Learning, Gu, Paquette, Schmidt, Stich, Takac 03:15 AM

Please join us in gather.town for all breaks and poster sessions. Click on "Open Link" to join gather.town.

Abstract 3: Invited speaker: The Convexity of Learning Infinite-width Deep Neural Networks, Tong Zhang in OPT2020: Optimization for Machine Learning, Zhang 04:00 AM

Deep learning has received considerable empirical successes in recent years. Although deep neural networks (DNNs) are highly nonconvex with respect to the model parameters, it has been observed that the training of overparametrized DNNs leads to consistent solutions that are highly reproducible with different random initializations.

I will explain this phenomenon by modeling DNNs using feature representations, and show that the optimization landscape is convex with respect to the features. Moreover, we show that optimization with respect to the nonconvex DNN parameters leads to a global optimal solution under an idealized regularity condition, which can explain various empirical findings.

Abstract 5: Invited speaker: Adaptation and universality in first-order methods, Volkan Cevher in OPT2020: Optimization for Machine Learning, Cevher 04:30 AM

In this talk, we review some of the recent advances in first-order methods for convex and non-convex optimization as well as their universality properties. We say an algorithm is universal if it does not require to know whether the optimization objective is smooth or not.

We first recall the AdaGrad method and show that AdaGrad is a universal algorithm without any modifications: it implicitly exploits the smoothness of the problem to achieve the standard O(1/k) rate in the presence of smooth objectives, where k is the iteration count.

To this end, we introduce an accelerated, universal variant of AdaGrad, dubbed AcceleGrad, that in addition obtains the optimal convergence rate of O(1/k^2) in the smooth setting with deterministic oracles. We then introduce UniXGrad, which is the first algorithm that simultaneously achieves optimal rates for smooth or non-smooth problems with either deterministic or stochastic first-order oracles in the constrained convex setting.

We conclude the presentation with results in non-convex optimization revolving around ADAM-type algorithms, including new convergence rates.
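(For reference, a minimal sketch of the plain AdaGrad iteration the talk builds on; note that no smoothness constant is supplied anywhere, which is the sense in which the method adapts.)

```python
import numpy as np

def adagrad(grad, x0, lr=1.0, eps=1e-8, steps=1000):
    """Standard coordinate-wise AdaGrad: the accumulated squared
    gradients shrink the effective stepsize per coordinate, so the
    method needs no smoothness constant up front."""
    x = np.asarray(x0, dtype=float)
    acc = np.zeros_like(x)              # running sum of squared gradients
    for _ in range(steps):
        g = grad(x)
        acc += g * g
        x -= lr * g / (np.sqrt(acc) + eps)
    return x

# Example: minimize a smooth quadratic without telling AdaGrad it is smooth.
x_min = adagrad(lambda x: 2 * x, x0=np.ones(3))
```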

Abstract 6: Contributed talks in Session 1 (Zoom) in OPT2020: Optimization for Machine Learning, Stich, Condat, Li, Shamir, Vlaar, Zaki 05:00 AM

Join us to hear some new, exciting work at the intersection of optimization and ML. Come and ask questions and join the discussion.

Speakers:
Laurent Condat, "Distributed Proximal Splitting Algorithms with Rates and Acceleration"
Zhize Li, "PAGE: A Simple and Optimal Probabilistic Gradient Estimator for Nonconvex Optimization"
Ohad Shamir, "Can We Find Near-Approximately-Stationary Points of Nonsmooth Nonconvex Functions?"
Tiffany Vlaar, "Constraint-Based Regularization of Neural Networks"
Mohammadi Zaki, "Employing No Regret Learners for Pure Exploration in Linear Bandits"

You can find a video on the NeurIPS website where the speakers discuss their paper in detail.

Abstract 7: Contributed Video: Distributed Proximal Splitting Algorithms with Rates and Acceleration, Laurent Condat in OPT2020: Optimization for Machine Learning, Condat 05:00 AM

We propose new generic distributed proximal splitting algorithms, well suited for large-scale convex nonsmooth optimization. We derive sublinear and linear convergence results with new nonergodic rates, as well as new accelerated versions of the algorithms, using varying stepsizes.

Abstract 8: Contributed Video: Employing No Regret Learners for Pure Exploration in Linear Bandits, Mohammadi Zaki in OPT2020: Optimization for Machine Learning, Zaki 05:00 AM

We study the best arm identification problem in linear multi-armed bandits (LMAB) in the fixed confidence ($\delta$-PAC) setting; this is also the problem of optimizing an unknown linear function over a discrete ground set with noisy, zeroth-order access. We propose an explicitly implementable and provably order-optimal sample-complexity algorithm to solve this problem. Most previous approaches rely on access to a minimax optimization oracle which is at the heart of the complexity of the problem. We propose a method to solve this optimization problem (up to suitable accuracy) by interpreting the problem as a two-player zero-sum game, and attempting to sequentially converge to its saddle point using low-regret learners to compute the players' strategies in each round, which yields a concrete querying algorithm. The algorithm, which we call the {\em Phased Elimination Linear Exploration Game} (PELEG), maintains a high-probability confidence ellipsoid containing $\theta^*$ in each round and uses it to eliminate suboptimal arms in phases. We analyze the sample complexity of PELEG and show that it matches, up to order, an instance-dependent lower bound on sample complexity in the linear bandit setting, without requiring boundedness assumptions on the parameter space. PELEG is, thus, the first algorithm to achieve both order-optimal sample complexity and explicit implementability for this setting. We also provide numerical results for the proposed algorithm consistent with its theoretical guarantees.

Abstract 9: Contributed Video: PAGE: A Simple and Optimal Probabilistic Gradient Estimator for Nonconvex Optimization, Zhize Li in OPT2020: Optimization for Machine Learning, Li 05:00 AM

In this paper, we propose a novel stochastic gradient estimator---ProbAbilistic Gradient Estimator (PAGE)---for nonconvex optimization. PAGE is easy to implement as it is designed via a small adjustment to vanilla SGD: in each iteration, PAGE uses the vanilla minibatch SGD update with probability $p$ and reuses the previous gradient with a small adjustment, at a much lower computational cost, with probability $1-p$. We give a simple formula for the optimal choice of $p$. We prove tight lower bounds for nonconvex problems, which are of independent interest. Moreover, we prove matching upper bounds both in the finite-sum and online regimes, which establish that PAGE is an optimal method. Besides, we show that for nonconvex functions satisfying the Polyak-Łojasiewicz (PL) condition, PAGE can automatically switch to a faster linear convergence rate. Finally, we conduct several deep learning experiments (e.g., LeNet, VGG, ResNet) on real datasets in PyTorch, and the results demonstrate that PAGE not only converges much faster than SGD in training but also achieves higher test accuracy, validating our theoretical results and confirming the practical superiority of PAGE.
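(A minimal sketch of the PAGE recursion just described -- a fresh minibatch gradient with probability p, otherwise the previous estimator corrected by a cheap gradient difference on a small batch; the helper functions are assumptions for illustration.)

```python
import numpy as np

def page_sgd(grad_batch, sample_batch, x0, lr=0.1, p=0.5,
             big=256, small=16, steps=1000):
    """grad_batch(x, batch) returns the minibatch gradient at x;
    sample_batch(n) draws n example indices. With probability p the
    estimator is refreshed with a large batch; otherwise the previous
    estimator is reused, corrected by a gradient difference evaluated
    on a small batch at both the new and old iterates."""
    x = np.asarray(x0, dtype=float)
    g = grad_batch(x, sample_batch(big))          # initialize the estimator
    for _ in range(steps):
        x_new = x - lr * g
        if np.random.rand() < p:
            g = grad_batch(x_new, sample_batch(big))
        else:
            b = sample_batch(small)               # cheap correction step
            g = g + grad_batch(x_new, b) - grad_batch(x, b)
        x = x_new
    return x
```
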
Abstract 10: Contributed Video: Constraint-Based Regularization of Neural Networks, Tiffany Vlaar in OPT2020: Optimization for Machine Learning, Vlaar 05:00 AM

We propose a method for efficiently incorporating constraints into a stochastic gradient Langevin framework for the training of deep neural networks. Constraints allow direct control of the parameter space of the model. Appropriately designed, they reduce the vanishing/exploding gradient problem, control weight magnitudes and stabilize deep neural networks, and thus improve the robustness of training algorithms and the generalization capabilities of the trained neural network. We present examples of constrained training methods motivated by orthogonality preservation for weight matrices and explicit weight normalizations. We describe the methods in the overdamped formulation of Langevin dynamics and the underdamped form, in which momenta help to improve sampling efficiency. Our methods see performance improvements on image classification tasks.

Abstract 11: Contributed Video: Can We Find Near-Approximately-Stationary Points of Nonsmooth Nonconvex Functions?, Ohad Shamir in OPT2020: Optimization for Machine Learning, Shamir 05:00 AM

It is well-known that given a bounded, smooth nonconvex function, standard gradient-based methods can find $\epsilon$-stationary points (where the gradient norm is less than $\epsilon$) in $O(1/\epsilon^2)$ iterations. However, many important nonconvex optimization problems, such as those associated with training modern neural networks, are inherently \emph{not} smooth, making these results inapplicable. Moreover, as recently pointed out in \citet{zhang2020complexity}, it is generally impossible to provide finite-time guarantees for finding an $\epsilon$-stationary point of nonsmooth functions. Perhaps the most natural relaxation of this is to find points which are *near* such $\epsilon$-stationary points. In this paper, we show that even this relaxed goal is hard to obtain in general, given only black-box access to the function values and gradients. We also discuss the pros and cons of alternative approaches.

Abstract 12: Poster Session 1 (gather.town) in OPT2020: Optimization for Machine Learning, Condat, Vlaar, Shamir, Zaki, Li, Liu, Horváth, Safaryan, Choukroun, Shridhar, Kahale, Jin, Jawanpuria, Yadav, Koyama, Kim, Li, Purkayastha, Salim, Banerjee, Richtarik, Mahto, Ye, Mishra, Liu, Zhu 06:00 AM

Please join us in gather.town for all breaks and poster sessions. Click on "Open Link" to join gather.town.

Abstract 14: Invited speaker: Adaptive Sampling for Stochastic Risk-Averse Learning, Andreas Krause in OPT2020: Optimization for Machine Learning, Krause 07:00 AM

In high-stakes machine learning applications, it is crucial to not only perform well on average, but also when restricted to difficult examples. To address this, we consider the problem of training models in a risk-averse manner. We propose an adaptive sampling algorithm for stochastically optimizing the Conditional Value-at-Risk (CVaR) of a loss distribution. We use a distributionally robust formulation of the CVaR to phrase the problem as a zero-sum game between two players, and solve it efficiently using regret minimization. Our approach relies on sampling from structured Determinantal Point Processes (DPPs), which allows scaling it to large data sets. Finally, we empirically demonstrate its effectiveness on large-scale convex and non-convex learning tasks.
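(The talk's method solves a distributionally robust game with DPP-based sampling; as a much simpler illustration of the CVaR objective itself, one can average only the worst alpha-fraction of per-sample losses in a minibatch.)

```python
import torch

def cvar_loss(per_sample_losses, alpha=0.1):
    """Empirical Conditional Value-at-Risk of a batch of losses: the mean
    of the worst alpha-fraction. Minimizing this focuses training on the
    difficult examples rather than the average case. (Illustration only;
    not the adaptive DPP-based sampler described in the abstract.)"""
    k = max(1, int(alpha * per_sample_losses.numel()))
    worst, _ = torch.topk(per_sample_losses, k)
    return worst.mean()

# Usage sketch: losses = criterion(model(x), y) with reduction='none',
# then cvar_loss(losses, alpha=0.1).backward().
```
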
Abstract 16: Invited speaker: Practical Kronecker-factored BFGS and L-BFGS methods for training deep neural networks, Donald Goldfarb in OPT2020: Optimization for Machine Learning, Goldfarb 07:30 AM

In training deep neural network (DNN) models, computing and storing a full BFGS approximation or storing a modest number of (step, change in gradient) vector pairs for use in an L-BFGS implementation is impractical. In our methods, we approximate the Hessian by a block-diagonal matrix and use the structure of the gradient and Hessian to further approximate these blocks, each of which corresponds to a layer, as the Kronecker product of two much smaller matrices, analogous to the approach in KFAC for approximating the Fisher matrix in a stochastic natural gradient method. Because of the indefinite and highly variable nature of the Hessian in a DNN, we also propose a new damping approach to keep the BFGS and L-BFGS approximations bounded, both above and below. In tests on autoencoder feed-forward and convolutional neural network models, our methods outperformed KFAC and were competitive with state-of-the-art first-order stochastic methods.

Abstract 17: Contributed Video: How to make your optimizer generalize better, Sharan Vaswani in OPT2020: Optimization for Machine Learning, Vaswani 08:00 AM

We study the implicit regularization of optimization methods for linear models interpolating the training data in the under-parametrized and over-parametrized regimes. For over-parameterized linear regression, where there are infinitely many interpolating solutions, different optimization methods can converge to solutions with varying generalization performance. In this setting, we show that projections onto linear spans can be used to move between solutions. Furthermore, via a simple reparameterization, we can ensure that an arbitrary optimizer converges to the minimum l2-norm solution with favourable generalization properties. For under-parameterized linear classification, optimizers can converge to different decision boundaries separating the data. We prove that for any such classifier, there exists a family of quadratic norms ||.||_P such that the classifier's direction is the same as that of the maximum P-margin solution. We argue that analyzing convergence to the standard maximum l2-margin is arbitrary and show that minimizing the norm induced by the data can result in better generalization. We validate our theoretical results via experiments on synthetic and real datasets.

Abstract 18: Contributed talks in Session 2 (Zoom) in OPT2020: Optimization for Machine Learning, Takac, Horváth, Liu, Loizou, Vaswani 08:00 AM

Join us to hear some new, exciting work at the intersection of optimization and ML. Come and ask questions and join the discussion.

Speakers:
Samuel Horvath, "Adaptivity of Stochastic Gradient Methods for Nonconvex Optimization"
Guan-Horng Liu, "DDPNOpt: Differential Dynamic Programming Neural Optimizer"
Nicolas Loizou, "Stochastic Polyak Step-size for SGD: An Adaptive Learning Rate for Fast Convergence"
Sharan Vaswani, "Adaptive Gradient Methods Converge Faster with Over-Parameterization (and you can do a line-search)"
Sharan Vaswani, "How to make your optimizer generalize better"

You can find a video on the NeurIPS website where the speakers discuss their paper in detail.

Abstract 19: Contributed Video: Adaptive Gradient Methods Converge Faster with Over-Parameterization (and you can do a line-search), Sharan Vaswani in OPT2020: Optimization for Machine Learning, Vaswani 08:00 AM

Adaptive gradient methods are typically used for training over-parameterized models capable of exactly fitting the data; we thus study their convergence in this interpolation setting. Under an interpolation assumption, we prove that AMSGrad with a constant step-size and momentum can converge to the minimizer at the faster $O(1/T)$ rate for smooth, convex functions. Furthermore, in this setting, we show that AdaGrad can achieve an $O(1)$ regret in the online convex optimization framework. When interpolation is only approximately satisfied, we show that constant step-size AMSGrad converges to a neighbourhood of the solution. On the other hand, we prove that AdaGrad is robust to the violation of interpolation and converges to the minimizer at the optimal rate. However, we demonstrate that even for simple, convex problems satisfying interpolation, the empirical performance of these methods heavily depends on the step-size and requires tuning. We alleviate this problem by using stochastic line-search (SLS) and Polyak's step-sizes (SPS) to help these methods adapt to the function's local smoothness. By using these techniques, we prove that AdaGrad and AMSGrad do not require knowledge of problem-dependent constants and retain the convergence guarantees of their constant step-size counterparts. Experimentally, we show that these techniques help improve the convergence and generalization performance across tasks, from binary classification with kernel mappings to classification with deep neural networks.

Abstract 20: Contributed Video: Adaptivity of Stochastic Gradient Methods for Nonconvex Optimization, Samuel Horvath in OPT2020: Optimization for Machine Learning, Horváth 08:00 AM

Adaptivity is an important yet under-studied property in modern optimization theory. The gap between the state-of-the-art theory and the current practice is striking in that algorithms with desirable theoretical guarantees typically involve drastically different settings of hyperparameters, such as step-size schemes and batch sizes, in different regimes. Despite the appealing theoretical results, such divisive strategies provide little, if any, insight to practitioners to select algorithms that work broadly without tweaking the hyperparameters. In this work, blending the ``geometrization'' technique introduced by \cite{lei2016less} and the \texttt{SARAH} algorithm of \cite{nguyen2017sarah}, we propose the Geometrized \texttt{SARAH} algorithm for non-convex finite-sum and stochastic optimization. Our algorithm is proved to achieve adaptivity to both the magnitude of the target accuracy and the Polyak-Łojasiewicz (PL) constant, if present. In addition, it achieves the best-available convergence rate for non-PL objectives simultaneously while outperforming existing algorithms for PL objectives.

Abstract 21: Contributed Video: DDPNOpt: Differential Dynamic Programming Neural Optimizer, Guan-Horng Liu in OPT2020: Optimization for Machine Learning, Liu 08:00 AM

Interpretation of Deep Neural Networks (DNNs) training as an optimal control problem with nonlinear dynamical systems has received considerable attention recently, yet the algorithmic development remains relatively limited. In this work, we make an attempt along this line by first showing that most widely-used algorithms for training DNNs can be linked to the Differential Dynamic Programming (DDP), a celebrated second-order method rooted in trajectory optimization. In this vein, we propose a new class of optimizer, the DDP Neural Optimizer (DDPNOpt), for training DNNs. DDPNOpt features layer-wise feedback policies which improve convergence and robustness. It outperforms other optimal-control inspired training methods in both convergence and complexity, and is competitive against state-of-the-art first and second order methods. Our work opens up new avenues for principled algorithmic design built upon the optimal control theory.

Abstract 22: Contributed Video: Stochastic Polyak Step-size for SGD: An Adaptive Learning Rate for Fast Convergence, Nicolas Loizou in OPT2020: Optimization for Machine Learning, Loizou 08:00 AM

We propose a stochastic variant of the classical Polyak step-size \citep{polyak1987introduction} commonly used in the subgradient method. Although computing the Polyak step-size requires knowledge of the optimal function values, this information is readily available for typical modern machine learning applications. Consequently, the proposed stochastic Polyak step-size (SPS) is an attractive choice for setting the learning rate for stochastic gradient descent (SGD). We provide theoretical convergence guarantees for SGD equipped with SPS in different settings, including strongly convex, convex and non-convex functions. Furthermore, our analysis results in novel convergence guarantees for SGD with a constant step-size. We show that SPS is particularly effective when training over-parameterized models capable of interpolating the training data. In this setting, we prove that SPS enables SGD to converge to the true solution at a fast rate without requiring the knowledge of any problem-dependent constants or additional computational overhead. We experimentally validate our theoretical results via extensive experiments on synthetic and real datasets. We demonstrate the strong performance of SGD with SPS compared to state-of-the-art optimization methods when training over-parameterized models.
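(A minimal sketch of SGD with the stochastic Polyak step-size, assuming the per-sample optimal values f_i^* are known -- e.g. zero for losses that can be driven to zero under interpolation; the helper names are hypothetical.)

```python
import numpy as np

def sgd_sps(loss_i, grad_i, sample, x0, c=0.5, f_star=0.0,
            steps=1000, eps=1e-12):
    """SGD with the stochastic Polyak step-size:
        gamma_t = (f_i(x_t) - f_i^*) / (c * ||grad f_i(x_t)||^2).
    loss_i(x, i) and grad_i(x, i) evaluate the loss/gradient of sample i;
    sample() draws a random index. For overparameterized interpolating
    models, f_i^* is typically 0, so the step-size is readily computable."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        i = sample()
        g = grad_i(x, i)
        gamma = (loss_i(x, i) - f_star) / (c * np.dot(g, g) + eps)
        x -= gamma * g
    return x
```
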
Abstract 23: Break (gather.town) in OPT2020: Optimization for Machine Learning, 08:30 AM

Please join us in gather.town for all breaks and poster sessions. Click on "Open Link" to join gather.town.

Abstract 24: Invited speaker: SGD without replacement: optimal rate analysis and more, Suvrit Sra in OPT2020: Optimization for Machine Learning, Sra 09:00 AM

Stochastic gradient descent (SGD) is the workhorse of machine learning. There are two fundamental versions of SGD: (i) those that pick stochastic gradients with replacement, and (ii) those that pick without replacement. Ironically, version (ii) is what is used in practice (across most ML toolkits), while version (i) is what almost all published work analyzes. This mismatch is well-known. It arises because analyzing SGD without replacement involves biased gradients and must cope with lack of independence between the stochastic gradients used. In this talk, I will present recent progress on analyzing without-replacement SGD, the bulk of which will focus on minimax optimal convergence rates. The rates are obtained without assuming componentwise convexity. I will mention further refinements of the results assuming this additional convexity, which remove drawbacks common to previous works (such as a large number of epochs required).

Abstract 26: Poster Session 2 (gather.town) in OPT2020: Optimization for Machine Learning, Vaswani, Loizou, Li, Nakkiran, Gao, Baghal, Wu, Yousefzadeh, Wang, Wang, Xie, Borovykh, Jastrzebski, Dan, Zhang, Tuddenham, Pattathil, Redko, Cohen, Esfandiari, Jiang, ElAraby, Yun, Psenka, Gower, Wang 09:45 AM

Please join us in gather.town for all breaks and poster sessions. Click on "Open Link" to join gather.town.

Abstract 28: Invited speaker: Stochastic Geodesic Optimization, Ashia Wilson in OPT2020: Optimization for Machine Learning, Wilson 11:00 AM

Geodesic convexity offers a promising systematic way to handle non-convexity for many problems of interest in statistics and computer science. The focus of this talk will be to describe efforts to extend the basic tools of convex optimization on Euclidean space to the general setting of Riemannian manifolds. We begin by motivating our focus on geodesic optimization with several examples, reviewing the basics of geodesic spaces and several techniques from optimization along the way. Particular attention will be given to optimization techniques which achieve oracle lower bounds for minimizing stochastic functions, namely accelerated methods. We end with a discussion of how one might adapt these techniques to the Riemannian setting.

Abstract 30: Invited speaker: Concentration for matrix products, and convergence of Oja's algorithm for streaming PCA, Rachel Ward in OPT2020: Optimization for Machine Learning, Ward 11:30 AM

We present new nonasymptotic growth and concentration bounds for a product of independent random matrices, similar in spirit to concentration for sums of independent random matrices developed in the previous decade. Our matrix product concentration bounds provide a new, direct convergence proof of Oja's algorithm for streaming Principal Component Analysis, and should be useful more broadly for analyzing the convergence of stochastic gradient descent for certain classes of nonconvex optimization problems, including neural networks. This talk covers joint work with Amelia Henriksen, De Huang, Jon Niles-Weed, and Joel Tropp.
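(For reference, a minimal sketch of the Oja iteration for the top principal component -- the algorithm whose convergence the matrix-product bounds above are used to prove.)

```python
import numpy as np

def oja_streaming_pca(stream, dim, lr=0.01):
    """Oja's algorithm for streaming PCA (top component): for each sample
    x, take a stochastic gradient step w += lr * (x . w) * x and
    renormalize. Each iteration multiplies w by (I + lr * x x^T), which
    is why products of random matrices govern its convergence."""
    rng = np.random.default_rng(0)
    w = rng.normal(size=dim)
    w /= np.linalg.norm(w)
    for x in stream:                      # one pass over the data stream
        w += lr * np.dot(x, w) * x
        w /= np.linalg.norm(w)
    return w
```
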
Abstract 32: Contributed Video: Incremental Greedy BFGS: An Incremental Quasi-Newton Method with Explicit Superlinear Rate, Zhan Gao in OPT2020: Optimization for Machine Learning, Gao 12:00 PM

Finite-sum minimization, i.e., problems where the objective may be written as the sum over a collection of instantaneous costs, are ubiquitous in modern machine learning. Efficient numerical techniques for their solution must trade off per-step complexity with the number of steps required for convergence. Incremental Quasi-Newton methods (IQN) achieve a favorable balance of these competing attributes in the sense that their complexity is independent of the sample size, while their convergence rate can be faster than linear. This local superlinear behavior, to date, however, is known only asymptotically. In this work, we put forth a new variant of IQN, specifically of the Broyden-Fletcher-Goldfarb-Shanno (BFGS) type, that incorporates a greedy basis vector selection step, and admits a non-asymptotic explicit local superlinear rate. To the best of our knowledge, this is the first time an explicit superlinear rate has been given for Quasi-Newton methods in the incremental setting.

Abstract 33: Contributed Video: Learning Rate Annealing Can Provably Help Generalization, Even for Convex Problems, Preetum Nakkiran in OPT2020: Optimization for Machine Learning, Nakkiran 12:00 PM

Learning rate schedule can significantly affect generalization performance in modern neural networks, but the reasons for this are not yet understood. Li et al. (2019) recently proved this behavior can exist in a simplified non-convex neural-network setting. In this work, we show that this phenomenon can exist even for convex learning problems -- in particular, linear regression in 2 dimensions. We give a toy convex problem where learning rate annealing (large initial learning rate, followed by small learning rate) can lead gradient descent to minima with provably better generalization than using a small learning rate throughout. In our case, this occurs due to a combination of the mismatch between the test and train loss landscapes, and early-stopping.
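(A minimal sketch of the large-then-small schedule itself; the paper's specific 2-D regression construction is not reproduced here.)

```python
import numpy as np

def gd_annealed(grad, x0, lr_large=0.5, lr_small=0.01,
                switch=100, steps=500):
    """Gradient descent with learning rate annealing: a large initial
    step size for the first `switch` iterations, then a small one. The
    abstract's result is that which minimum this schedule reaches can
    generalize provably better than using lr_small throughout."""
    x = np.asarray(x0, dtype=float)
    for t in range(steps):
        lr = lr_large if t < switch else lr_small
        x -= lr * grad(x)
    return x
```
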

Abstract 34: Contributed Video: TenIPS: Inverse Propensity Sampling for Tensor Completion, Chengrun Yang in OPT2020: Optimization for Machine Learning, Yang 12:00 PM

Tensors are widely used to model relationships among objects in a high-dimensional space. The recovery of missing entries in a tensor has been extensively studied, generally under the assumption that entries are missing completely at random (MCAR). However, in most practical settings, observations are missing not at random (MNAR): the probability that a given entry is observed (also called the propensity) may depend on other entries in the tensor or even on the value of the missing entry. In this paper, we study the problem of completing a partially observed tensor with MNAR observations, without prior information about the propensities. To complete the tensor, we assume that both the original tensor and the tensor of propensities have low multilinear rank. The algorithm first estimates the propensities using a convex relaxation and then predicts missing values using a randomized linear algebra approach, reweighting the observed tensor by the inverse propensities. We provide finite-sample error bounds on the resulting complete tensor. Numerical experiments demonstrate the effectiveness of our approach.

Abstract 35: Contributed talks in Session 3 (Zoom) in OPT2020: Optimization for Machine Learning, Schmidt, Gao, Li, Nakkiran, Wu, Yang 12:00 PM

Join us to hear some new, exciting work at the intersection of optimization and ML. Come and ask questions and join the discussion.

Speakers:
Zhan Gao, "Incremental Greedy BFGS: An Incremental Quasi-Newton Method with Explicit Superlinear Rate"
Wenjie Li, "Variance Reduction on Adaptive Stochastic Mirror Descent"
Preetum Nakkiran, "Learning Rate Annealing Can Provably Help Generalization, Even for Convex Problems"
Denny Wu, "When Does Preconditioning Help or Hurt Generalization?"
Chengrun Yang, "TenIPS: Inverse Propensity Sampling for Tensor Completion"

You can find a video on the NeurIPS website where the speakers discuss their paper in detail.

Abstract 36: Contributed Video: When Does Preconditioning Help or Hurt Generalization?, Denny Wu in OPT2020: Optimization for Machine Learning, Wu 12:00 PM

While second order optimizers such as natural gradient descent (NGD) often speed up optimization, their effect on generalization has been called into question. This work presents a more nuanced view on how the \textit{implicit bias} of optimizers affects the comparison of generalization properties. We provide an exact bias-variance decomposition for overparameterized ridgeless regression under a general class of preconditioner $\boldsymbol{P}$, and consider the inverse population Fisher information matrix (used in NGD) as a particular example. We determine the optimal $\boldsymbol{P}$ for both the bias and variance, and find that the relative generalization performance of different optimizers depends on label noise and the ``shape'' of the signal (true parameters): when the labels are noisy, the model is misspecified, or the signal is misaligned, NGD can achieve lower risk; conversely, GD generalizes better under clean labels, a well-specified model, or aligned signal. Based on this analysis, we discuss approaches to manage the bias-variance tradeoff, and the benefit of interpolating between first- and second-order updates. We then extend our analysis to regression in the reproducing kernel Hilbert space and demonstrate that preconditioned GD can decrease the population risk faster than GD. Lastly, we empirically compare the generalization error of first- and second-order optimizers in neural networks, and observe robust trends matching our theoretical analysis.

Abstract 37: Contributed Video: Variance Reduction on Adaptive Stochastic Mirror Descent, Wenjie Li in OPT2020: Optimization for Machine Learning, Li 12:00 PM

We study the application of the variance reduction technique on general adaptive stochastic mirror descent algorithms in nonsmooth nonconvex optimization problems. We prove that variance reduction helps to reduce the gradient complexity of most general stochastic mirror descent algorithms, so it works well with time-varying step sizes and adaptive optimization algorithms such as AdaGrad. We check the validity of our claims using experiments in deep learning.

Abstract 38: Break (gather.town) in OPT2020: Optimization for Machine Learning, 12:30 PM

Please join us in gather.town for all breaks and poster sessions. Click on "Open Link" to join gather.town.

Abstract 39: Invited speaker: Fast convergence of stochastic subgradient method under interpolation, Michael Friedlander in OPT2020: Optimization for Machine Learning, Friedlander 01:30 PM

This paper studies the behaviour of the stochastic subgradient descent (SSGD) method applied to over-parameterized empirical-risk optimization models that exactly fit the training data. We prove that for models with composite structures often found in neural networks, the interpolation condition implies that the model is effectively smooth at all minimizers, and therefore that SSGD converges at rates normally achievable only for smooth convex problems. We also prove that the fast rates we derive are optimal for any subgradient method applied to convex problems where interpolation holds.

This is joint work with Huang Fang and Zhenan Fan.

Abstract 42: Poster Session 3 (gather.town) in OPT2020: Optimization for Machine Learning, Wu, Yang, Ergen, Lotfi, Guille-Escuret, Ginsburg, Lyu, Xie, Newton, Basu, Wang, Lucas, LI, Ding, Gonzalez Ortiz, Askari Hemmat, Bu, Lawton, Thekumparampil, Liang, Roberts, Zhu, Zhou 02:00 PM

Please join us in gather.town for all breaks and poster sessions. Click on "Open Link" to join gather.town.

Abstract 44: Invited speaker: Online nonnegative matrix factorization for Markovian and other real data, Deanna Needell and Hanbaek Lyu in OPT2020: Optimization for Machine Learning, Lyu, Needell 03:00 PM

Online Matrix Factorization (OMF) is a fundamental tool for dictionary learning problems, giving an approximate representation of complex data sets in terms of a reduced number of extracted features. Convergence guarantees for most of the OMF algorithms in the literature assume independence between data matrices, and the case of dependent data streams remains largely unexplored. In this talk, we present results showing that a non-convex generalization of the well-known OMF algorithm for i.i.d. data converges almost surely to the set of critical points of the expected loss function, even when the data matrices are functions of some underlying Markov chain satisfying a mild mixing condition. As the main application, by combining online non-negative matrix factorization and a recent MCMC algorithm for sampling motifs from networks, we propose a novel framework of Network Dictionary Learning that extracts `network dictionary patches' from a given network in an online manner that encodes main features of the network. We demonstrate this technique on real-world data and discuss recent extensions and variations.

Abstract 46: Contributed Video: A Study of Condition Numbers for First-Order Optimization, Charles Guille-Escuret in OPT2020: Optimization for Machine Learning, Guille-Escuret 03:30 PM

In this work we introduce a new framework for the theoretical study of convergence and tuning of first-order optimization algorithms (FOA). The study of such algorithms typically requires assumptions on the objective functions: the most popular ones are probably smoothness and strong convexity. These metrics are used to tune the hyperparameters of FOA. We introduce a class of perturbations quantified via a new norm, called the *-norm. We show that adding a small perturbation to the objective function has an equivalently small impact on the behavior of any FOA, which suggests that it should have a minor impact on the tuning of the algorithm. However, we show that smoothness and strong convexity can be heavily impacted by arbitrarily small perturbations, leading to excessively conservative tunings and convergence issues. In view of these observations, we propose a notion of continuity of the metrics, which is essential for a robust tuning strategy. Since smoothness and strong convexity are not continuous, we propose a comprehensive study of existing alternative metrics which we prove to be continuous. We describe their mutual relations and provide their guaranteed convergence rates for the Gradient Descent algorithm accordingly tuned.

Abstract 47: Contributed Video: Stochastic Damped L-BFGS with controlled norm of the Hessian approximation, Sanae Lotfi in OPT2020: Optimization for Machine Learning, Lotfi 03:30 PM

We propose a new stochastic variance-reduced damped L-BFGS algorithm, where we leverage estimates of bounds on the largest and smallest eigenvalues of the Hessian approximation to balance its quality and conditioning. Our algorithm, VARCHEN, draws from previous work that proposed a novel stochastic damped L-BFGS algorithm called SdLBFGS. We establish almost sure convergence to a stationary point and a complexity bound. We empirically demonstrate that VARCHEN is more robust than SdLBFGS-VR and SVRG on a modified DavidNet problem -- a highly nonconvex and ill-conditioned problem that arises in the context of deep learning -- and their performance is comparable on a logistic regression problem and a nonconvex support-vector machine problem.

Abstract 48: Contributed Video: Convex Programs for Global Optimization of Convolutional Neural Networks in Polynomial-Time, Tolga Ergen in OPT2020: Optimization for Machine Learning, Ergen 03:30 PM

We study training of Convolutional Neural Networks (CNNs) with ReLU activations and introduce exact convex optimization formulations with a polynomial complexity with respect to the number of data samples, the number of neurons and data dimension. Particularly, we develop a convex analytic framework utilizing semi-infinite duality to obtain equivalent convex optimization problems for two-layer CNNs, where convex problems are regularized by the sum of $\ell_2$ norms of variables.

Abstract 49: Contributed Video: Affine-Invariant Analysis of Frank-Wolfe on Strongly Convex Sets, Lewis Liu in OPT2020: Optimization for Machine Learning, 03:30 PM

When the constraint set $\mathcal{C}$ is strongly convex, the Frank-Wolfe algorithm, which is affine co-variant, enjoys accelerated convergence rates. In contrast, existing results rely on norm-dependent assumptions, usually incurring non-affine invariant bounds. In this work, we introduce new structural assumptions on the problem and derive an affine invariant, norm-independent analysis of Frank-Wolfe. Based on our analysis, we propose an affine-invariant backtracking line-search. Interestingly, we show that typical backtracking line-searches using smoothness of the objective function surprisingly converge to an affine-invariant stepsize, despite using affine-dependent norms in the computation of the stepsize.
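(For reference, a standard Frank-Wolfe sketch on an l2 ball -- a strongly convex set matching the abstract's setting; the affine-invariant analysis and backtracking line-search are not reproduced here, and the classic 2/(t+2) stepsize is used instead.)

```python
import numpy as np

def frank_wolfe_l2_ball(grad, x0, radius=1.0, steps=100):
    """Frank-Wolfe over {x : ||x|| <= radius}. Each iteration calls a
    linear minimization oracle -- on the l2 ball,
    argmin_{||s|| <= r} <g, s> = -r * g / ||g|| -- and moves toward it;
    no projection is ever needed."""
    x = np.asarray(x0, dtype=float)
    for t in range(steps):
        g = grad(x)
        s = -radius * g / (np.linalg.norm(g) + 1e-12)  # LMO solution
        gamma = 2.0 / (t + 2.0)            # classic open-loop stepsize
        x = x + gamma * (s - x)
    return x
```
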

Abstract 50: Contributed talks in Session 4 (Zoom) in OPT2020: Optimization for Machine Learning, Gu, Lotfi, Guille-Escuret, Ergen, Zhou 03:30 PM

Join us to hear some new, exciting work at the intersection of optimization and ML. Come and ask questions and join the discussion.

Speakers:
Tolga Ergen, "Convex Programs for Global Optimization of Convolutional Neural Networks in Polynomial-Time"
Charles Guille-Escuret, "A Study of Condition Numbers for First-Order Optimization"
Lewis Liu, "Affine-Invariant Analysis of Frank-Wolfe on Strongly Convex Sets"
Sanae Lotfi, "Stochastic Damped L-BFGS with controlled norm of the Hessian approximation"
Dongruo Zhou, "On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization"

You can find a video on the NeurIPS website where the speakers discuss their paper in detail.

Abstract 51: Contributed Video: On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization, Dongruo Zhou in OPT2020: Optimization for Machine Learning, Zhou 03:30 PM

Adaptive gradient methods are workhorses in deep learning. However, the convergence guarantees of adaptive gradient methods for nonconvex optimization have not been thoroughly studied. In this paper, we provide a fine-grained convergence analysis for a general class of adaptive gradient methods including AMSGrad, RMSProp and AdaGrad. For smooth nonconvex functions, we prove that adaptive gradient methods in expectation converge to a first-order stationary point. Our convergence rate is better than existing results for adaptive gradient methods in terms of dimension, and is strictly faster than stochastic gradient descent (SGD) when the stochastic gradients are sparse. To the best of our knowledge, this is the first result showing the advantage of adaptive gradient methods over SGD in the nonconvex setting. In addition, we also prove high probability bounds on the convergence rates of AMSGrad, RMSProp as well as AdaGrad, which have not been established before. Our analyses shed light on better understanding the mechanism behind adaptive gradient methods in optimizing nonconvex objectives.

Advances and Opportunities: Machine Learning for Education

Kumar Garg, Neil Heffernan, Kayla Meyers

Fri Dec 11, 05:30 AM

This workshop will explore how advances in machine learning could be applied to improve educational outcomes.

Such an exploration is timely given: the growth of online learning platforms, which have the potential to serve as testbeds and data sources; a growing pool of CS talent hungry to apply their skills towards social impact; and the chaotic shift to online learning globally during COVID-19, and the many gaps it has exposed.

The opportunities for machine learning in education are substantial, from uses of NLP to power automated feedback for the substantial amounts of student work that currently gets no review, to advances in voice recognition diagnosing errors by early readers.

Similar to the rise of computational biology, recognizing and realizing these opportunities will require a community of researchers and practitioners that are bilingual: technically adept at the cutting-edge advances in machine learning, and conversant in the most pressing challenges and opportunities in education.

With representation from senior representatives from industry, academia, government, and education, this workshop is a step in that community-building process, with a focus on three things:
1. identifying what learning platforms are of a size and instrumentation that the ML community can leverage,
2. building a community of experts bringing rigorous theoretical and methodological insights across academia, industry, and education, to facilitate combinatorial innovation,
3. scoping potential Kaggle competitions and "ImageNets for Education," where benchmark datasets fine-tuned to an education goal can fuel goal-driven algorithmic innovation.

In addition to bringing speakers across verticals and issue areas, the talks and small group conversations in this workshop will be designed for a diverse audience -- from researchers, to industry professionals, to teachers and students. This interdisciplinary approach promises to generate new connections, high-potential partnerships, and inspire novel applications for machine learning in education.

This workshop is not the first Machine Learning for Education workshop; there have been several (ml4ed.cc), and the existence of these others speaks to recognition of the obvious importance that ML will have for education moving forward!

Schedule

05:25 AM  Welcome address Garg
05:30 AM  Opening Remarks from National Science Foundation Director Sethuraman Panchanathan Panchanathan
05:45 AM  Panel discussion on effective partnerships to leverage machine learning and improve education Garg, Ritter, Lim, Roschelle
06:45 AM  Carolyn Rosé, Professor of Human-Computer Interaction at Carnegie Mellon University, The power of intelligent conversation systems in collaborative learning Rosé
07:15 AM  Jacob Whitehill, Assistant Professor of Computer Science at Worcester Polytechnic Institute, Using machine learning to create scientific instruments for classroom observation Whitehill
07:50 AM  Panel discussion on ImageNets for education Garg, Whitmer, Picou, Crossley
09:00 AM  Spotlight on ImageNets for Education
09:30 AM  Joon Suh Choi, PhD Candidate at Georgia State University, on research on ARTE Choi
09:40 AM  Zachary Pardos, Associate Professor, Graduate School of Education, University of California, Berkeley, "Neural course embedding for recommendation" Pardos
10:10 AM  Alina von Davier, Chief of Assessment, Duolingo, Machine learning and next generation assessments von Davier
10:30 AM  Panel discussion of talent pipeline into education research and the learning engineering field Garg, Tang, Vase, Koedinger
11:40 AM  Remarks from Burr Settles, Research Director, Duolingo Settles
12:00 PM  Remarks from Candace Marie Thille, Director of Learning Sciences, Amazon.com Marie Thille
12:15 PM  Ryan Baker, Assistant Professor of Economics and Education at the University of Pennsylvania, Predicting students' affect and motivation through meta-cognitive data Baker
12:30 PM  Discussion on how young technologists can contribute to learning engineering Garg, Park, Binney, Mak
12:40 PM  Remarks from Bryan Richardson, Senior Program Officer, the Bill & Melinda Gates Foundation's K-12 program Richardson
01:00 PM  Panel discussion on minimizing bias in machine learning in education Heffernan, Osoba, Brunskill, Fisler
02:00 PM  Closing remarks from Fei-Fei Li, Sequoia Professor of Computer Science, Stanford University & Co-Director of Stanford's Human-Centered AI Institute Fei-Fei
N/A  ImageNets for Math Handwriting Recognition: Aida Calculus Dataset Hancock, Thomas
N/A  ImageNets for the Whole Child Jarratt, Martinez
N/A  ImageNets for Math Errors Shukla, Ching
N/A  ImageNets for Teaching CS Barnes, Price, Larimore
N/A  ImageNets for Reading Gabrieli, Baffour

Abstracts (8):

Abstract 3: Panel discussion on effective partnerships to leverage machine learning and improve education in Advances and Opportunities: Machine Learning for Education, Garg, Ritter, Lim, Roschelle 05:45 AM

Moderator: Kumar Garg, Managing Director and Head of Partnerships, Schmidt Futures

Panelists include:
Steve Ritter, Founder & Chief Scientist, Carnegie Learning
Heejae Lim, Founder & CEO, TalkingPoints
Jeremy Roschelle, Executive Director, Digital Promise

Abstract 6: Panel discussion on ImageNets for education in Advances and Opportunities: Machine Learning for Education, Garg, Whitmer, Picou, Crossley 07:50 AM

Moderator: Kumar Garg, Managing Director and Head of Partnerships, Schmidt Futures

Panelists:
John Whitmer, Former Senior Director of Data Science & Analytics, ACTnext
Aigner Picou, Program Director, The Learning Agency Lab

Abstract 7: Spotlight on ImageNets for Moderator: Neil Hefernan, William Smith Dean's Education in Advances and Opportunities: Professor of Computer Science at Worcester Machine Learning for Education, 09:00 AM Polytechnic Institute, and Co-Founder of ASSISTments In 2007, Professor Fei-Fei Li started assembling a massive dataset of 14 million pictures, labeled Panelists: with the objects that appeared in those images. Osonde Osoba, Senior Information Scientist, This dataset, dubbed ImageNet, spurred dramatic RAND Corporation progress over the next decade in computer Emma Brunskill, Assistant Professor in the vision, the feld of artifcial intelligence that trains Computer Science Department, Stanford computers to understand images and videos. University Such datasets can serve as “benchmark” Kathi Fisler, Research Professor, Brown University challenges that researchers compete on, and incentivize advancements in fundamental and domain-specifc felds. Abstract 18: Closing remarks from Fei-Fei Li,

We sent solicited ideas for a potential dataset Sequoia Professor of Computer Science, that could drive a similarly transformative impact Stanford University & Co-Director of in education. Applicants submitted 300 word Stanford’s Human-Centered AI Institute in Advances and Opportunities: Machine abstracts, as we selected a few to showcase. Learning for Education, Fei-Fei 02:00 PM

Please use this time to listen to the recordings at Will be followed by a 10 minutes Q+A the bottom of the schedule to learn more about the benchmark data set ideas.

Abstract 19: ImageNets for Math : Aida Calculus Dataset in Abstract 11: Panel discussion of talent Advances and Opportunities: Machine pipeline into education research and the Learning for Education, Hancock, Thomas N/A learning engineering feld in Advances and Opportunities: Machine Learning for Authors: Education, Garg, Tang, Vase, Koedinger 10:30 Zac Hancock, Michael Chifala, Callie Federer, AM Jiamin He, & Quinn N Lathrop

Moderator: Kumar Garg One of the best ways to learn and practice math is by hand on paper. Digital math applications Panelists: can take advantage of this natural interaction by Richard Tang, Student, University of California, including a handwriting recognition capability. We Berkeley introduce a dataset that can be used to create Ajoy Vase, COO, the Learning Collider at Teachers such models to bridge math learners and digital College, Columbia University applications. Given the importance of Ken Koedinger, Professor of Human Computer mathematical expressions across all scientifc Interaction and Psychology, Carnegie Mellon branches, including physics, engineering, and University economics, this dataset can become an important resource for advancing the use of Q&A to follow machine learning for the beneft of education.

Our dataset (available at www.kaggle.com/ Abstract 17: Panel discussion on minimizing aidapearson/ocr-data) consists of 100,000 images bias in machine learning in education in of handwritten math expressions within calculus. Advances and Opportunities: Machine The images are synthetically generated which Learning for Education, Hefernan, Osoba, afords 100% correct pixel-level tagging and Brunskill, Fisler 01:00 PM results in realistic images capable of training models whose performance generalize to real 51 Dec. 11, 2020
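For readers who want to experiment with the dataset, a minimal loading sketch follows. The directory layout ("images/" of PNG files) and the annotation file name ("annotations.json") are illustrative assumptions, not documented in this abstract; consult the Kaggle page for the actual structure.

# Minimal sketch: iterating over a local copy of the Aida Calculus dataset.
# File names and layout below are assumptions for illustration only.
import json
from pathlib import Path
from PIL import Image  # pip install pillow

DATA_DIR = Path("ocr-data")  # hypothetical local download of the Kaggle dataset

def iter_samples(data_dir):
    """Yield (image, annotation) pairs, skipping images without labels."""
    with open(data_dir / "annotations.json") as f:
        annotations = json.load(f)
    for image_path in sorted((data_dir / "images").glob("*.png")):
        label = annotations.get(image_path.name)
        if label is not None:
            yield Image.open(image_path), label

for image, label in iter_samples(DATA_DIR):
    print(image.size, label)
    break  # just inspect the first sample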

Abstract 22: ImageNets for Teaching CS in Advances and Opportunities: Machine Learning for Education, Barnes, Price, Larimore N/A

Abstract:
In this breakout session, we propose the idea of an "ImageNet for Teaching Computer Science." The proposed idea involves collecting a large set of labeled programming datasets from classrooms, using a shared format, and developing a set of benchmarks and challenges that will facilitate research for K-20 computing education. This data would benefit a growing research community at the intersection of computing education and learning analytics, with implications for students across many fields that teach computing.

Rationale: Programming data is ideal for learning analytics/edu data mining, since it is rich, capturing students' every state and action as they work, and the data is structured by syntax rules. Advances in programming analysis techniques for open-ended, sequential, and semi-structured data will have broad applications across educational domains. However, recent advances in deep learning require larger datasets and more meaningful labels than those typically available from individual classrooms, necessitating cross-institutional data collection and labeling efforts.

Background and Progress: A series of workshops from CS-SPLICE (https://cssplice.github.io/) and CSEDM (http://go.ncsu.edu/csedm2020) have brought together the research community to develop infrastructure and analysis techniques for programming data. The community has developed the shared ProgSnap2 format for programming log data (https://go.ncsu.edu/progsnap2), which is already used by 10+ datasets, comprising 750,000+ program snapshots in various languages (many of the datasets can be found on https://pslcdatashop.web.cmu.edu/). Researchers have used this data to develop automated support (e.g. hints, feedback, curated examples), predict student success, and personalize interventions. The CSEDM Data Challenge (https://go.ncsu.edu/csedm-dc) is a recurring data mining competition (held 2019, planned 2021) to gain insight from classroom programming data, which has helped to define shared machine learning benchmarks on common datasets.

Next Steps: The key challenges will be collecting diverse existing datasets, and creating infrastructure to support collecting and labeling new data. This will allow us to tackle novel research challenges, such as generalizing algorithms and labels across problems -- for example, detecting knowledge components, strategies, or misconceptions on one problem using data from others. The CS-SPLICE and CSEDM communities include developers of many widely-used educational programming platforms and will be important stakeholders in driving the work forward.

Acknowledgements: This reflects joint work by Thomas Price, Tiffany Barnes, Min Chi, Samiha Marwan, Yang Shi, Preya Shabrina, and Ye Mao at NC State University. It presents and builds on ideas and foundational work by the CS-SPLICE and CSEDM teams.
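As a flavor of working with ProgSnap2-style logs, here is a minimal pandas sketch. ProgSnap2 stores process data as CSV event tables; the file name ("MainTable.csv") and column names used here (SubjectID, EventType) follow one common reading of the spec and should be checked against https://go.ncsu.edu/progsnap2 for any given dataset.

# Minimal sketch: summarizing a ProgSnap2-style event log.
# File and column names are assumptions; verify against the spec.
import pandas as pd

events = pd.read_csv("MainTable.csv")

# Events per student, e.g. to gauge how much process data each one produced.
events_per_student = events.groupby("SubjectID").size()

# Distribution of event types (e.g. submissions, runs, edits).
print(events["EventType"].value_counts())
print(events_per_student.describe())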

Differential Geometry meets Deep Learning (DiffGeo4DL)

Joey Bose, Emile Mathieu, Charline Le Lan, Ines Chami, Fred Sala, Christopher De Sa, Maximillian Nickel, Chris Ré, Will Hamilton

Fri Dec 11, 05:45 AM

Recent years have seen a surge in research at the intersection of differential geometry and deep learning, including techniques for stochastic optimization on curved spaces (e.g., hyperbolic or spherical manifolds), learning embeddings for non-Euclidean data, and generative modeling on Riemannian manifolds. Insights from differential geometry have led to new state-of-the-art approaches to modeling complex real-world data, such as graphs with hierarchical structure, 3D medical data, and meshes. Thus, it is of critical importance to understand, from a geometric lens, the natural invariances, equivariances, and symmetries that reside within data.

In order to support the burgeoning interest of differential geometry in deep learning, the primary goal for this workshop is to facilitate community building and to work towards the identification of key challenges in comparison with regular deep learning, along with techniques to overcome these challenges. With many new researchers beginning projects in this area, we hope to bring them together to consolidate this fast-growing area into a healthy and vibrant subfield. In particular, we aim to strongly promote novel and exciting applications of differential geometry for deep learning with an emphasis on bridging theory to practice, which is reflected in our choices of invited speakers, which include both machine learning practitioners and researchers who are primarily geometers.

Schedule

05:00 AM gather.town
05:45 AM Opening Remarks Bose
06:00 AM Invited Talk 1: Geometric deep learning for 3D human body synthesis Bronstein
06:30 AM Invited Talk 2: Gauge Theory in Geometric Deep Learning Cohen
07:00 AM Contributed Talk 1: Learning Hyperbolic Representations for Unsupervised 3D Segmentation Hsu, Gu, Yeung
07:06 AM Contributed Talk 2: Witness Autoencoder: Shaping the Latent Space with Witness Complexes Varava, Kragic, Schönenberger, Chung, Chung, Polianskii
07:12 AM Contributed Talk 3: A Riemannian gradient flow perspective on learning deep linear neural networks Terstiege, Rauhut, Bah, Westdickenberg
07:18 AM Contributed Talk 4: Directional Graph Networks Beaini, Passaro, Létourneau, Hamilton, Corso, Liò
07:24 AM Contributed Talk 5: A New Neural Network Architecture Invariant to the Action of Symmetry Subgroups Ozay, Kicki, Skrzypczynski
07:30 AM Virtual Coffee Break on Gather.Town
08:00 AM Invited Talk 3: Reparametrization invariance in representation learning Hauberg
08:30 AM Tree Covers: An Alternative to Metric Embeddings Sahoo, Chami, Ré
08:30 AM Grassmann Iterative Linear Discriminant Analysis with Proxy Matrix Optimization Nagananda, Minnehan, Savakis
08:30 AM Isometric Gaussian Process Latent Variable Model Jørgensen, Hauberg
08:30 AM A Metric for Linear Symmetry-Based Disentanglement Pérez Rey, Tonnaer, Menkovski, Holenderski, Portegies
08:30 AM Graph of Thrones: Adversarial Perturbations dismantle Aristocracy in Graphs Jamadandi, Mudenagudi
08:30 AM Hermitian Symmetric Spaces for Graph Embeddings Lopez, Pozzetti, Trettel, Wienhard
08:30 AM Quaternion Graph Neural Networks Nguyen, Nguyen, Phung
08:30 AM Universal Approximation Property of Neural Ordinary Differential Equations Teshima, Tojo, Ikeda, Ishikawa, Oono
08:30 AM GENNI: Visualising the Geometry of Equivalences for Neural Network Identifiability Kolbeinsson, Jennings, Deisenroth (he/him), Lengyel, Petangoda, Lazarou, Highnam, Falk
08:30 AM Deep Networks and the Multiple Manifold Problem Buchanan, Gilboa, Wright
08:30 AM Poster Session 1 on Gather.Town Bose, Chami
09:30 AM Panel Discussion Bose, Mathieu, Le Lan, Chami
10:15 AM Virtual Coffee Break on Gather.Town
10:45 AM Focused Breakout Session Chami, Bose
N/A Focused Breakout Session Companion Notebook: Poincare Embeddings
N/A Focused Breakout Session Companion Notebook: Wrapped Normal Distribution
11:30 AM Invited Talk 4: An introduction to the Calderon and Steklov inverse problems on Riemannian manifolds with boundary Kamran
12:00 PM The Intrinsic Dimension of Images and Its Impact on Learning Zhu, Goldblum, Abdelkader, Goldstein, Pope
12:00 PM Sparsifying networks by traversing Geodesics Raghavan, Thomson
12:00 PM Deep Riemannian Manifold Learning Lou, Nickel, Amos
12:00 PM Extendable and invertible manifold learning with geometry regularized autoencoders Duque, Morin, Wolf, Moon
12:00 PM Leveraging Smooth Manifolds for Lexical Semantic Change Detection across Corpora Goel, Kumaraguru
12:00 PM Affinity guided Geometric Semi-Supervised Metric Learning Dutta, Harandi, Shekhar
12:00 PM Poster Session 2 on Gather.Town Le Lan, Mathieu
12:00 PM Towards Geometric Understanding of Low-Rank Approximation Sugiyama, Ghalamkari
12:00 PM Convex Optimization for Blind Source Separation on a Statistical Manifold Luo, azizi, Sugiyama
12:00 PM Unsupervised Orientation Learning Using Autoencoders Daems, Wyffels
12:00 PM QuatRE: Relation-Aware Quaternions for Knowledge Graph Embeddings Nguyen, Phung
01:00 PM Invited Talk 5: Disentangling Orientation and Camera Parameters from Cryo-Electron Microscopy Images Using Differential Geometry and Variational Autoencoders Miolane
01:30 PM Invited Talk 6: Learning a robust classifier in hyperbolic space Weber

Abstracts (34):

Abstract 1: gather.town in Differential Geometry meets Deep Learning (DiffGeo4DL), 05:00 AM

For Poster sessions. Gather.Town: https://gather.town/app/jXxVw7lqYrgIZ2zL/difgeo4dl

Abstract 3: Invited Talk 1: Geometric deep learning for 3D human body synthesis in Differential Geometry meets Deep Learning (DiffGeo4DL), Bronstein 06:00 AM

Geometric deep learning, a new class of ML methods trying to extend the basic building blocks of deep neural architectures to geometric data (point clouds, graphs, and meshes), has recently excelled in many challenging analysis tasks in computer vision and graphics such as deformable 3D shape correspondence. In this talk, I will present recent research efforts in 3D shape synthesis, focusing in particular on the human body, face, and hands.

Abstract 4: Invited Talk 2: Gauge Theory in Geometric Deep Learning in Differential Geometry meets Deep Learning (DiffGeo4DL), Cohen 06:30 AM

It is often said that differential geometry is in essence the study of connections on a principal bundle. These notions have been discovered independently in gauge theory in physics, and over the last few years it has become clear that they also provide a very general and systematic way to model convolutional neural networks on homogeneous spaces and general manifolds. Specifically, representation spaces in these networks are described as fields of geometric quantities on a manifold (i.e. sections of associated vector bundles). These quantities can only be expressed numerically after making an arbitrary choice of frame / gauge (section of a principal bundle). Network layers map between representation spaces, and should be equivariant to symmetry transformations. In this talk I will discuss two results that have a bearing on geometric deep learning research. First, we discuss the "convolution is all you need theorem", which states that any linear equivariant map between homogeneous representation spaces is a generalized convolution. Secondly, in the case of gauge symmetry (when all frames should be considered equivalent), we show that defining a non-trivial equivariant linear map between representation spaces requires the introduction of a principal connection which defines parallel transport. We will not assume familiarity with bundles or gauge theory, and use examples relevant to neural networks to illustrate the ideas.

Abstract 5: Contributed Talk 1: Learning Hyperbolic Representations for Unsupervised 3D Segmentation in Differential Geometry meets Deep Learning (DiffGeo4DL), Hsu, Gu, Yeung 07:00 AM

There exists a need for unsupervised 3D segmentation on complex volumetric data, particularly when annotation ability is limited or discovery of new categories is desired. Using the observation that 3D data is innately hierarchical, we propose learning effective representations of 3D patches for unsupervised segmentation through a variational autoencoder with a hyperbolic latent space and a proposed gyroplane convolutional layer, which better models underlying hierarchical structure within a 3D image. We also introduce a hierarchical triplet loss and multi-scale patch sampling scheme to embed relationships across varying levels of granularity. We demonstrate the effectiveness of our hyperbolic representations for unsupervised 3D segmentation on a hierarchical toy dataset and the BraTS dataset.

Abstract 6: Contributed Talk 2: Witness Autoencoder: Shaping the Latent Space with Witness Complexes in Differential Geometry meets Deep Learning (DiffGeo4DL), Varava, Kragic, Schönenberger, Chung, Chung, Polianskii 07:06 AM

We present a Witness Autoencoder (W-AE) – an autoencoder that captures geodesic distances of the data in the latent space. Our algorithm uses witness complexes to compute geodesic distance approximations on a mini-batch level, and leverages topological information from the entire dataset while performing batch-wise approximations. This way, our method allows us to capture the global structure of the data even with a small batch size, which is beneficial for large-scale real-world data. We show that our method captures the structure of the manifold more accurately than the recently introduced topological autoencoder (TopoAE).

Abstract 7: Contributed Talk 3: A Riemannian gradient flow perspective on learning deep linear neural networks in Differential Geometry meets Deep Learning (DiffGeo4DL), Terstiege, Rauhut, Bah, Westdickenberg 07:12 AM

We study the convergence of gradient flows related to learning deep linear neural networks from data. In this case, the composition of the network layers amounts to simply multiplying the weight matrices of all layers together, resulting in an overparameterized problem. The gradient flow with respect to these factors can be re-interpreted as a Riemannian gradient flow on the manifold of rank-$r$ matrices endowed with a suitable Riemannian metric. We show that the flow always converges to a critical point of the underlying functional. Moreover, we establish that, for almost all initializations, the flow converges to a global minimum on the manifold of rank $k$ matrices for some $k\leq r$.
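To make the overparameterization concrete, the deep linear network setup the abstract refers to is (in our notation, not necessarily the authors'): the end-to-end map is the product of the layer matrices,

$$ W \;=\; W_N W_{N-1} \cdots W_1, $$

and training the factors by gradient flow,

$$ \dot W_j(t) \;=\; -\,\nabla_{W_j}\, L\big(W_N(t)\cdots W_1(t)\big), \qquad j = 1,\dots,N, $$

induces a flow on the product $W(t)$, which the paper re-interprets as a Riemannian gradient flow on the manifold of rank-$r$ matrices.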

Abstract 8: Contributed Talk 4: Directional Graph Networks in Differential Geometry meets Deep Learning (DiffGeo4DL), Beaini, Passaro, Létourneau, Hamilton, Corso, Liò 07:18 AM

In order to overcome the expressive limitations of graph neural networks (GNNs), we propose the first method that exploits vector flows over graphs to develop globally consistent directional and asymmetric aggregation functions. We show that our directional graph networks (DGNs) generalize convolutional neural networks (CNNs) when applied on a grid. Whereas recent theoretical works focus on understanding local neighbourhoods, local structures and local isomorphism with no global information flow, our novel theoretical framework allows directional convolutional kernels in any graph.

First, by defining a vector field in the graph, we develop a method of applying directional derivatives and smoothing by projecting node-specific messages into the field.
Then we propose the use of the Laplacian eigenvectors as such vector field.
Finally, we bring the power of CNN data augmentation to graphs by providing a means of doing reflection and rotation on the underlying directional field.

Abstract 9: Contributed Talk 5: A New Neural Network Architecture Invariant to the Action of Symmetry Subgroups in Differential Geometry meets Deep Learning (DiffGeo4DL), Ozay, Kicki, Skrzypczynski 07:24 AM

We propose a computationally efficient $G$-invariant neural network that approximates functions invariant to the action of a given permutation subgroup $G \leq S_n$ of the symmetric group on input data. The key element of the proposed network architecture is a new $G$-invariant transformation module, which produces a $G$-invariant latent representation of the input data. Theoretical considerations are supported by numerical experiments, which demonstrate the effectiveness and strong generalization properties of the proposed method in comparison to other $G$-invariant neural networks.

Abstract 11: Invited Talk 3: Reparametrization invariance in representation learning in Differential Geometry meets Deep Learning (DiffGeo4DL), Hauberg 08:00 AM

Generative models learn a compressed representation of data that is often used for downstream tasks such as interpretation, visualization and prediction via transfer learning. Unfortunately, the learned representations are generally not statistically identifiable, leading to a high risk of arbitrariness in the downstream tasks. We propose to use differential geometry to construct representations that are invariant to reparametrizations, thereby solving the bulk of the identifiability problem. We demonstrate that the approach is deeply tied to the uncertainty of the representation, and that practical applications require high-quality uncertainty quantification. With the identifiability problem solved, we show how to construct better priors for generative models, and that the identifiable representations reveal signals in the data that were otherwise hidden.

Abstract 12: Tree Covers: An Alternative to Metric Embeddings in Differential Geometry meets Deep Learning (DiffGeo4DL), Sahoo, Chami, Ré 08:30 AM

We study the problem of finding distance-preserving graph representations. Most previous approaches focus on learning continuous embeddings in metric spaces such as Euclidean or hyperbolic spaces. Based on the observation that embedding into a metric space is not necessary to produce faithful representations, we explore a new conceptual approach to represent graphs using a collection of trees, namely a tree cover. We show that with the same amount of storage, covers achieve lower distortion than learned metric embeddings. While the distance induced by covers is not a metric, we find that tree covers still have the desirable properties of graph representations, including efficiency in query and construction time.
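The abstract does not spell out how a cover induces a distance; one natural convention (an assumption on our part, possibly different from the paper's) is the minimum path length over the trees in the cover, as in this toy sketch:

# Minimal sketch of a tree-cover-induced distance (one plausible convention).
import networkx as nx

def tree_cover_distance(trees, u, v):
    """Minimum shortest-path length over the trees containing both endpoints."""
    return min(
        nx.shortest_path_length(t, u, v, weight="weight")
        for t in trees
        if u in t and v in t
    )

# Toy usage: cover a cycle with two BFS spanning trees from different roots.
g = nx.cycle_graph(8)
trees = [nx.bfs_tree(g, 0).to_undirected(), nx.bfs_tree(g, 4).to_undirected()]
print(tree_cover_distance(trees, 2, 6))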

Abstract 13: Grassmann Iterative Linear Discriminant Analysis with Proxy Matrix Optimization in Differential Geometry meets Deep Learning (DiffGeo4DL), Nagananda, Minnehan, Savakis 08:30 AM

Linear Discriminant Analysis (LDA) is one of the most common methods for dimensionality reduction in machine learning and statistics. It is a supervised method that aims to find the most discriminant space in the reduced dimensional space, which can be further used with a linear classifier for classification. In this work, we present an iterative optimization method called the Proxy Matrix Optimization (PMO), which makes use of automatic differentiation and stochastic gradient descent (SGD) on the Grassmann manifold to arrive at the optimal projection matrix. We show that PMO does better than the prevailing manifold optimization methods.

Abstract 14: Isometric Gaussian Process Latent Variable Model in Differential Geometry meets Deep Learning (DiffGeo4DL), Jørgensen, Hauberg 08:30 AM

We propose a fully generative unsupervised model where the latent variable respects both the distances and the topology of the modeled data. The model leverages the Riemannian geometry of the generated manifold to endow the latent space with a well-defined stochastic distance measure, which is modeled as Nakagami distributions. These stochastic distances are sought to be as similar as possible to observed distances along a neighborhood graph through a censoring process. The model is inferred by variational inference. We demonstrate how the new model can encode invariances in the learned manifolds.

Abstract 15: A Metric for Linear Symmetry-Based Disentanglement in Differential Geometry meets Deep Learning (DiffGeo4DL), Pérez Rey, Tonnaer, Menkovski, Holenderski, Portegies 08:30 AM

The definition of Linear Symmetry-Based Disentanglement (LSBD) proposed by Higgins et al. outlines the properties that should characterize a disentangled representation that captures the symmetries of data. However, it is not clear how to measure the degree to which a data representation fulfills these properties. In this work, we propose a metric for the evaluation of the level of LSBD that a data representation achieves. We provide a practical method to evaluate this metric and use it to evaluate the disentanglement for the data representation obtained for three datasets with underlying SO(2) symmetries.

Abstract 16: Graph of Thrones: Adversarial Perturbations dismantle Aristocracy in Graphs in Differential Geometry meets Deep Learning (DiffGeo4DL), Jamadandi, Mudenagudi 08:30 AM

This paper investigates the effect of adversarial perturbations on the hyperbolicity of graphs. Learning low-dimensional embeddings of graph data in certain curved Riemannian manifolds has recently gained traction due to their desirable property of acting as useful geometrical inductive biases. More specifically, models of Hyperbolic geometry such as the Poincar\'{e} Ball and Hyperboloid Model have found extensive applications for learning representations of discrete data such as Graphs and Trees with hierarchical anatomy. The hyperbolicity concept indicates whether the graph data under consideration is suitable for embedding in hyperbolic geometry. Lower values of hyperbolicity imply distortion-free embedding in hyperbolic space. We study adversarial perturbations that attempt to poison the graph structure, consequently rendering hyperbolic geometry an ineffective choice for learning representations. To circumvent this problem, we advocate for utilizing Lorentzian manifolds in machine learning pipelines and empirically show they are better suited to learn hierarchical relationships. Despite the recent proliferation of adversarial robustness methods for graph data, this is the first work that explores the relationship between adversarial attacks and the hyperbolicity property while also providing resolution to navigate such vulnerabilities.
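For context, the hyperbolicity referred to here is usually computed via Gromov's four-point condition. A brute-force sketch (standard definition; not the authors' code):

# Four-point-condition delta-hyperbolicity of a graph. O(n^4) brute force,
# fine for toy graphs. Smaller delta = more tree-like, hence friendlier
# to hyperbolic embedding.
from itertools import combinations
import networkx as nx

def gromov_delta(g):
    d = dict(nx.all_pairs_shortest_path_length(g))
    delta = 0.0
    for w, x, y, z in combinations(g.nodes, 4):
        s = sorted([d[w][x] + d[y][z], d[w][y] + d[x][z], d[w][z] + d[x][y]])
        delta = max(delta, (s[2] - s[1]) / 2)
    return delta

print(gromov_delta(nx.balanced_tree(2, 3)))  # trees are 0-hyperbolic
print(gromov_delta(nx.cycle_graph(8)))       # cycles are "fatter"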

Abstract 17: Hermitian Symmetric Spaces for Graph Embeddings in Differential Geometry meets Deep Learning (DiffGeo4DL), Lopez, Pozzetti, Trettel, Wienhard 08:30 AM

Learning faithful graph representations as sets of vertex embeddings has become a fundamental intermediary step in a wide range of machine learning applications. The quality of the embeddings is usually determined by how well the geometry of the target space matches the structure of the data. In this work we learn continuous representations of graphs in spaces of symmetric matrices over C. These spaces offer a rich geometry that simultaneously admits hyperbolic and Euclidean subspaces, and are amenable to analysis and explicit computations. We implement an efficient method to learn embeddings and compute distances, and develop the tools to operate with such spaces. The proposed models are able to automatically adapt to very dissimilar arrangements without any apriori estimates of graph features. On various datasets with very diverse structural properties and reconstruction measures our model ties the results of competitive baselines for geometrically pure graphs and outperforms them for graphs with mixed geometric features, showcasing the versatility of our approach.

Abstract 18: Quaternion Graph Neural Networks in Differential Geometry meets Deep Learning (DiffGeo4DL), Nguyen, Nguyen, Phung 08:30 AM

Recently, graph neural networks (GNNs) have become a principal research direction to learn low-dimensional continuous embeddings of nodes and graphs to predict node and graph labels, respectively. However, Euclidean embeddings have high distortion when using GNNs to model complex graphs such as social networks. Furthermore, existing GNNs are not very efficient with the high number of model parameters when increasing the number of hidden layers. Therefore, we move beyond the Euclidean space to a hyper-complex vector space to improve graph representation quality and reduce the number of model parameters. To this end, we propose quaternion graph neural networks (QGNN) to generalize GCNs within the Quaternion space to learn quaternion embeddings for nodes and graphs. The Quaternion space, a hyper-complex vector space, provides highly meaningful computations through the Hamilton product compared to the Euclidean and complex vector spaces. As a result, our QGNN can reduce the model size up to four times and enhance learning better graph representations. Experimental results show that the proposed QGNN produces state-of-the-art accuracies on a range of well-known benchmark datasets for three downstream tasks, including graph classification, semi-supervised node classification, and text (node) classification.
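For readers unfamiliar with the quaternion algebra this work (and QuatRE, later in this session) relies on, here is a minimal NumPy sketch of the Hamilton product — the standard definition, not the authors' code:

# Hamilton product of quaternions p = (a, b, c, d) ~ a + bi + cj + dk.
# It is non-commutative, which is what gives quaternion models extra
# expressiveness over real or complex multiplications.
import numpy as np

def hamilton_product(p, q):
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return np.array([
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    ])

i, j = np.array([0, 1, 0, 0]), np.array([0, 0, 1, 0])
print(hamilton_product(i, j))  # -> k = [0, 0, 0, 1]
print(hamilton_product(j, i))  # -> -k: order matters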
Abstract 19: Universal Approximation Property of Neural Ordinary Differential Equations in Differential Geometry meets Deep Learning (DiffGeo4DL), Teshima, Tojo, Ikeda, Ishikawa, Oono 08:30 AM

Neural ordinary differential equations (NODEs) are an invertible neural network architecture promising for their free-form Jacobian and the availability of a tractable Jacobian determinant estimator. Recently, the representation power of NODEs has been partly uncovered: they form an $L^p$-universal approximator for continuous maps under certain conditions. However, the $L^p$-universality may fail to guarantee an approximation for the entire input domain, as it may still hold even if the approximator largely differs from the target function on a small region of the input space. To further uncover the potential of NODEs, we show their stronger approximation property, namely the $\sup$-universality for approximating a large class of diffeomorphisms. It is shown by leveraging a structure theorem of the diffeomorphism group, and the result complements the existing literature by establishing a fairly large set of mappings that NODEs can approximate with a stronger guarantee.
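In symbols, the $\sup$-universality claimed here amounts to the following schematic statement (our notation): for every target diffeomorphism $f$ in the class considered, every compact $K \subset \mathbb{R}^d$, and every $\varepsilon > 0$, there is a NODE flow $\Phi$ with

$$ \sup_{x \in K} \| \Phi(x) - f(x) \| < \varepsilon, $$

whereas $L^p$-universality only bounds $\big( \int_K \| \Phi(x) - f(x) \|^p \, dx \big)^{1/p}$, which can remain small even when $\Phi$ and $f$ disagree badly on a set of small measure.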

Abstract 20: GENNI: Visualising the Geometry of Equivalences for Neural Network Identifiability in Differential Geometry meets Deep Learning (DiffGeo4DL), Kolbeinsson, Jennings, Deisenroth (he/him), Lengyel, Petangoda, Lazarou, Highnam, Falk 08:30 AM

In this paper, we propose an efficient algorithm to visualise symmetries in neural networks. Typically the models are defined with respect to a parameter space, where non-equal parameters can produce the same function. Our proposed tool, GENNI, allows us to identify parameters that are functionally equivalent and to then visualise the subspace of the resulting equivalence class. Specifically, we experiment on simple cases, to demonstrate how to identify and provide possible solutions for more complicated scenarios.

Abstract 21: Deep Networks and the Multiple Manifold Problem in Differential Geometry meets Deep Learning (DiffGeo4DL), Buchanan, Gilboa, Wright 08:30 AM

We study the multiple manifold problem, a binary classification task modeled on applications in machine vision, in which a deep fully-connected neural network is trained to separate two low-dimensional submanifolds of the unit sphere. We provide an analysis of the one-dimensional case, proving for a simple manifold configuration that when the network depth $L$ is large relative to certain geometric and statistical properties of the data, the network width $n$ grows as a sufficiently large polynomial in $L$, and the number of i.i.d. samples from the manifolds is polynomial in $L$, randomly-initialized gradient descent rapidly learns to classify the two manifolds perfectly with high probability. Our analysis demonstrates concrete benefits of depth and width in the context of a practically-motivated model problem: the depth acts as a fitting resource, with larger depths corresponding to smoother networks that can more readily separate the class manifolds, and the width acts as a statistical resource, enabling concentration of the randomly-initialized network and its gradients. Along the way, we establish essentially optimal nonasymptotic rates of concentration for the neural tangent kernel of deep fully-connected ReLU networks using martingale techniques, requiring width $n \geq L\,\mathrm{poly}(d_0)$ to achieve uniform concentration of the initial kernel over a $d_0$-dimensional submanifold of the unit sphere. Our approach should be of use in establishing similar results for other network architectures.

Abstract 26: Focused Breakout Session Companion Notebook: Poincare Embeddings in Differential Geometry meets Deep Learning (DiffGeo4DL), N/A

Link to Colab notebook on Poincare Embeddings.

Abstract 27: Focused Breakout Session Companion Notebook: Wrapped Normal Distribution in Differential Geometry meets Deep Learning (DiffGeo4DL), N/A

Link to Google Colab notebook on plotting a Wrapped Normal Distribution.
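As a flavor of what the Poincare-embeddings notebook covers, here is the closed-form geodesic distance on the Poincare ball — the standard formula, as a minimal sketch rather than the notebook's own code:

# Geodesic distance between points inside the unit Poincare ball:
# d(u, v) = arccosh(1 + 2 ||u - v||^2 / ((1 - ||u||^2)(1 - ||v||^2))).
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    uu = 1.0 - np.dot(u, u)
    vv = 1.0 - np.dot(v, v)
    duv = np.sum((u - v) ** 2)
    return np.arccosh(1.0 + 2.0 * duv / max(uu * vv, eps))

# Distances blow up near the boundary, which is what lets hyperbolic
# space embed trees with low distortion.
print(poincare_distance(np.array([0.0, 0.0]), np.array([0.5, 0.0])))
print(poincare_distance(np.array([0.9, 0.0]), np.array([0.0, 0.9])))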

Abstract 28: Invited Talk 4: An introduction to the Calderon and Steklov inverse problems on Riemannian manifolds with boundary in Differential Geometry meets Deep Learning (DiffGeo4DL), Kamran 11:30 AM

Given a compact Riemannian manifold with boundary, the Dirichlet-to-Neumann operator is a non-local map which assigns to data prescribed on the boundary of the manifold the normal derivative of the unique solution of the Laplace-Beltrami equation determined by the given boundary data. Physically, it can be thought of for example as a voltage to current map in an anisotropic medium in which the conductivity is modeled geometrically through a Riemannian metric. The Calderon problem is the inverse problem of recovering the Riemannian metric from the Dirichlet-to-Neumann operator, while the Steklov inverse problem is to recover the metric from the knowledge of the spectrum of the Dirichlet-to-Neumann operator. These inverse problems are both severely ill-posed. We will give an overview of some of the main results known about these questions, and time permitting, we will discuss the question of stability for the inverse Steklov problem.

Abstract 29: The Intrinsic Dimension of Images and Its Impact on Learning in Differential Geometry meets Deep Learning (DiffGeo4DL), Zhu, Goldblum, Abdelkader, Goldstein, Pope 12:00 PM

It is widely believed that natural image data exhibits low-dimensional structure despite being embedded in a high-dimensional pixel space. This idea underlies a common intuition for the success of deep learning and has been exploited for enhanced regularization and adversarial robustness. In this work, we apply dimension estimation tools to popular datasets and investigate the role of low-dimensional structure in neural network learning. We find that common natural image datasets indeed have very low intrinsic dimension relative to the high number of pixels in the images. Additionally, we find that low-dimensional datasets are easier for neural networks to learn. We validate our findings by carefully-designed experiments to vary the intrinsic dimension of both synthetic and real data and evaluate its impact on sample complexity.
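One common dimension estimation tool of the kind the abstract mentions is the nearest-neighbor MLE of Levina and Bickel (the variant below averages inverse estimates; the paper's exact tooling may differ):

# Levina-Bickel-style MLE of intrinsic dimension from k nearest neighbors.
import numpy as np
from scipy.spatial import cKDTree

def mle_intrinsic_dimension(x, k=10):
    """x has shape (n_points, n_features); returns a scalar dimension estimate."""
    tree = cKDTree(x)
    dist, _ = tree.query(x, k=k + 1)            # first column is the point itself
    dist = dist[:, 1:]
    logs = np.log(dist[:, -1:] / dist[:, :-1])  # log(T_k / T_j) for j < k
    inv_dim = logs.sum(axis=1) / (k - 1)
    return 1.0 / inv_dim.mean()

# Sanity check: points on a 2D plane linearly embedded in 10 dimensions.
rng = np.random.default_rng(0)
plane = rng.normal(size=(2000, 2)) @ rng.normal(size=(2, 10))
print(mle_intrinsic_dimension(plane))  # close to 2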

Abstract 30: Sparsifying networks by traversing Geodesics in Differential Geometry meets Deep Learning (DiffGeo4DL), Raghavan, Thomson 12:00 PM

The geometry of weight spaces and functional manifolds of neural networks plays an important role towards `understanding' the intricacies of ML. In this paper, we attempt to solve certain open questions in ML, by viewing them through the lens of geometry, ultimately relating it to the discovery of points or paths of equivalent function in these spaces. We propose a mathematical framework to evaluate geodesics in the functional space, to find high-performance paths from a dense network to its sparser counterpart. Our results are obtained on VGG-11 trained on CIFAR-10 and MLPs trained on MNIST. Broadly, we demonstrate that the framework is general, and can be applied to a wide variety of problems, ranging from sparsification to alleviating catastrophic forgetting.

Abstract 31: Deep Riemannian Manifold Learning in Differential Geometry meets Deep Learning (DiffGeo4DL), Lou, Nickel, Amos 12:00 PM

We present a new class of learnable Riemannian manifolds with a metric parameterized by a deep neural network. The core manifold operations--specifically the Riemannian exponential and logarithmic maps--are solved using approximate numerical techniques. Input and parameter gradients are computed with an adjoint sensitivity analysis. This enables us to fit geodesics and distances with gradient-based optimization of both on-manifold values and the manifold itself. We demonstrate our method's capability to model smooth, flexible metric structures in graph and dynamical system embedding tasks.

Abstract 32: Extendable and invertible manifold learning with geometry regularized autoencoders in Differential Geometry meets Deep Learning (DiffGeo4DL), Duque, Morin, Wolf, Moon 12:00 PM

A fundamental task in data exploration is to extract simplified low-dimensional representations that capture intrinsic geometry in data, especially for faithfully visualizing data in two or three dimensions. Common approaches to this task use kernel methods for manifold learning. However, these methods typically only provide an embedding of fixed input data and cannot extend to new data points. Autoencoders have also recently become popular for representation learning. But while they naturally compute feature extractors that are both extendable to new data and invertible (i.e., reconstructing original features from latent representation), they have limited capabilities to follow global intrinsic geometry compared to kernel-based manifold learning. We present a new method for integrating both approaches by incorporating a geometric regularization term in the bottleneck of the autoencoder. Our regularization, based on the diffusion potential distances from the recently-proposed PHATE visualization method, encourages the learned latent representation to follow intrinsic data geometry, similar to manifold learning algorithms, while still enabling faithful extension to new data and reconstruction of data in the original feature space from latent coordinates. We compare our approach with leading kernel methods and autoencoder models for manifold learning to provide qualitative and quantitative evidence of our advantages in preserving intrinsic structure, out of sample extension, and reconstruction.
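Schematically, the regularized objective described here has the shape (our notation; the paper's exact weighting and distance terms may differ):

$$ L(\theta, \phi) \;=\; \frac{1}{n}\sum_{i=1}^{n} \big\| x_i - g_\phi(f_\theta(x_i)) \big\|^2 \;+\; \lambda \sum_{i<j} \Big( \big\| f_\theta(x_i) - f_\theta(x_j) \big\| - d^{\mathrm{PHATE}}_{ij} \Big)^2, $$

where $f_\theta$ is the encoder, $g_\phi$ the decoder, the first term is the usual reconstruction loss, and the second term pulls latent distances toward the diffusion potential distances $d^{\mathrm{PHATE}}_{ij}$ between inputs.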

Abstract 33: Leveraging Smooth Manifolds for Lexical Semantic Change Detection across Corpora in Differential Geometry meets Deep Learning (DiffGeo4DL), Goel, Kumaraguru 12:00 PM

Comparing two bodies of text and detecting words with significant lexical semantic shift between them is an important part of digital humanities. Traditional approaches have relied on aligning the different embeddings in the Euclidean space using the Orthogonal Procrustes problem. This study presents a geometric framework that leverages optimization on smooth Riemannian manifolds for obtaining corpus-specific orthogonal rotations and a corpus-independent scaling to project the different vector spaces into a shared latent space. This enables us to capture any affine relationship between the embedding spaces while utilising the rich geometry of smooth manifolds.
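For reference, the classical Orthogonal Procrustes alignment this abstract builds on has a closed-form SVD solution — a minimal sketch of the Euclidean baseline, not the authors' manifold method:

# Rotation R minimizing ||a @ R - b||_F over orthogonal matrices:
# R = U V^T, where U S V^T is the SVD of a^T b (classical closed form).
import numpy as np

def orthogonal_procrustes(a, b):
    u, _, vt = np.linalg.svd(a.T @ b)
    return u @ vt

# Toy check: recover a random rotation applied to "word vectors".
rng = np.random.default_rng(0)
q, _ = np.linalg.qr(rng.normal(size=(5, 5)))   # ground-truth rotation
a = rng.normal(size=(100, 5))                  # "corpus 1" embeddings
b = a @ q                                      # "corpus 2" embeddings
print(np.allclose(orthogonal_procrustes(a, b), q))  # True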

Abstract 34: Affinity guided Geometric Semi-Supervised Metric Learning in Differential Geometry meets Deep Learning (DiffGeo4DL), Dutta, Harandi, Shekhar 12:00 PM

In this paper, we revamp the forgotten classical Semi-Supervised Distance Metric Learning (SSDML) problem from a Riemannian geometric lens, to leverage stochastic optimization within an end-to-end deep framework. The motivation comes from the fact that apart from a few classical SSDML approaches learning a linear Mahalanobis metric, deep SSDML has not been studied. We first extend existing SSDML methods to their deep counterparts and then propose a new method to overcome their limitations. Due to the nature of constraints on our metric parameters, we leverage Riemannian optimization. Our deep SSDML method with a novel affinity propagation based triplet mining strategy outperforms its competitors.

Abstract 36: Towards Geometric Understanding of Low-Rank Approximation in Differential Geometry meets Deep Learning (DiffGeo4DL), Sugiyama, Ghalamkari 12:00 PM

Rank reduction of matrices has been widely studied in linear algebra. However, its geometric understanding is limited and its theoretical connection to statistical models remains unrevealed. We tackle this problem using information geometry and present a geometric unified view of matrix rank reduction. Our key idea is to treat each matrix as a probability distribution represented by the log-linear model on a partially ordered set (poset), which enables us to formulate rank reduction as projection onto a statistical submanifold, which corresponds to the set of low-rank matrices. This geometric view enables us to derive a novel efficient rank-1 reduction method, called Legendre rank-1 reduction, which analytically solves mean-field approximation and minimizes the KL divergence from a given matrix.

Abstract 37: Convex Optimization for Blind Source Separation on a Statistical Manifold in Differential Geometry meets Deep Learning (DiffGeo4DL), Luo, azizi, Sugiyama 12:00 PM

We present a novel blind source separation (BSS) method using a hierarchical structure of sample space that is incorporated with a log-linear model. Our approach is formulated as a convex optimization with theoretical guarantees to uniquely recover a set of source signals by minimizing the KL divergence from a set of mixed signals. Source signals, received signals, and mixing matrices are realized as different layers in our hierarchical sample space. Our empirical results have demonstrated superiority compared to well-established techniques.

Abstract 38: Unsupervised Orientation Learning Using Autoencoders in Differential Geometry meets Deep Learning (DiffGeo4DL), Daems, Wyffels 12:00 PM

We present a method to learn the orientation of symmetric objects in real-world images in an unsupervised way. Our method explicitly maps in-plane relative rotations to the latent space of an autoencoder, by rotating both in the image domain and latent domain. This is achieved by adding a proposed \textit{crossing loss} to a standard autoencoder training framework which enforces consistency between the image domain and latent domain rotations. This relative representation of rotation is made absolute by using the symmetry of the observed object, resulting in an unsupervised method to learn the orientation. Furthermore, orientation is disentangled in latent space from other descriptive factors. We apply this method on two real-world datasets: aerial images of planes in the DOTA dataset and images of densely packed honeybees. We empirically show this method can learn orientation using no annotations with high accuracy compared to the same models trained with annotations.
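One natural way to write down the rotation-consistency constraint described here — our schematic reading, not necessarily the authors' exact loss — is

$$ L_{\mathrm{cross}} \;=\; \mathbb{E}_{x,\theta}\, \big\| E(R^{\mathrm{img}}_{\theta} x) \;-\; R^{\mathrm{lat}}_{\theta}\, E(x) \big\|^2, $$

where $E$ is the encoder, $R^{\mathrm{img}}_{\theta}$ rotates the input image by angle $\theta$, and $R^{\mathrm{lat}}_{\theta}$ applies the corresponding rotation to the designated orientation coordinates of the latent code.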

Abstract 39: QuatRE: Relation-Aware Quaternions for Knowledge Graph Embeddings in Differential Geometry meets Deep Learning (DiffGeo4DL), Nguyen, Phung 12:00 PM

We propose an effective embedding model, named QuatRE, to learn quaternion embeddings for entities and relations in knowledge graphs. QuatRE aims to enhance correlations between head and tail entities given a relation within the Quaternion space with the Hamilton product. QuatRE achieves this goal by further associating each relation with two relation-aware quaternion vectors, which are used to rotate the head and tail entities' quaternion embeddings, respectively. To obtain the triple score, QuatRE rotates the rotated embedding of the head entity using the normalized quaternion embedding of the relation, followed by a quaternion-inner product with the rotated embedding of the tail entity. Experimental results demonstrate that our QuatRE produces state-of-the-art performances on four well-known benchmark datasets for knowledge graph completion.

Abstract 40: Invited Talk 5: Disentangling Orientation and Camera Parameters from Cryo-Electron Microscopy Images Using Differential Geometry and Variational Autoencoders in Differential Geometry meets Deep Learning (DiffGeo4DL), Miolane 01:00 PM

Cryo-electron microscopy (cryo-EM) is capable of producing reconstructed 3D images of biomolecules at near-atomic resolution. However, raw cryo-EM images are highly corrupted 2D projections of the target 3D biomolecules. Reconstructing the 3D molecular shape requires the estimation of the orientation of the biomolecule that has produced the given 2D image, and the estimation of camera parameters to correct for intensity defects. Current techniques performing these tasks are often computationally expensive, while the dataset sizes keep growing. There is a need for next-generation algorithms that preserve accuracy while improving speed and scalability. In this paper, we use variational autoencoders (VAEs) to learn a low-dimensional latent representation of cryo-EM images. Analyzing the latent space with differential geometry of shape spaces leads us to design a new estimation method for orientation and camera parameters of single-particle cryo-EM images, which has the potential to accelerate the traditional reconstruction algorithm.

Abstract 41: Invited Talk 6: Learning a robust classifier in hyperbolic space in Differential Geometry meets Deep Learning (DiffGeo4DL), Weber 01:30 PM

Recently, there has been a surge of interest in representing large-scale, hierarchical data in hyperbolic spaces to achieve better representation accuracy with lower dimensions. However, beyond representation learning, there are few empirical and theoretical results that develop performance guarantees for downstream machine learning and optimization tasks in hyperbolic spaces. In this talk we consider the task of learning a robust classifier in hyperbolic space. We start with algorithmic aspects of developing analogues of classical methods, such as the perceptron or support vector machines, in hyperbolic spaces. We also discuss more broadly the challenges of generalizing such methods to non-Euclidean spaces. Furthermore, we analyze the role of geometry in learning robust classifiers by evaluating the trade-off between low embedding dimensions and low distortion for both Euclidean and hyperbolic spaces.

Workshop on Dataset Curation and Security

Nathalie Baracaldo Angel, Yonatan Bisk, Avrim Blum, Michael Curry, John Dickerson, Micah Goldblum, Tom Goldstein, Bo Li, Avi Schwarzschild

Fri Dec 11, 06:00 AM

Classical machine learning research has been focused largely on models, optimizers, and computational challenges. As technical progress and hardware advancements ease these challenges, practitioners are now finding that the limitations and faults of their models are the result of their datasets. This is particularly true of deep networks, which often rely on huge datasets that are too large and unwieldy for domain experts to curate them by hand. This workshop addresses issues in the following areas: data harvesting, dealing with the challenges and opportunities involved in creating and labeling massive datasets; data security, dealing with protecting datasets against risks of poisoning and backdoor attacks; policy, security, and privacy, dealing with the social, ethical, and regulatory issues involved in collecting large datasets, especially with regards to privacy; and data bias, related to the potential of biased datasets to result in biased models that harm members of certain groups.

Dates and details can be found at [securedata.lol](https://securedata.lol/)

Schedule

06:00 AM Dawn Song (topic TBD) Song
06:30 AM What Do Our Models Learn? Madry
07:00 AM Discussion
07:15 AM Break
07:30 AM Darrell West (TBD) West
08:00 AM Adversarial, Socially Aware, and Commonsensical Data Choi
08:30 AM Discussion panel
08:45 AM Lunch Break
10:00 AM Dataset Curation via Active Learning Nowak
10:30 AM Don't Steal Data O'Sullivan
11:30 AM Poster Session

Abstracts (1):

Abstract 2: What Do Our Models Learn? in Workshop on Dataset Curation and Security, Madry 06:30 AM

Large-scale vision benchmarks have driven—and often even defined—progress in machine learning. However, these benchmarks are merely proxies for the real-world tasks we actually care about. How well do our benchmarks capture such tasks?

In this talk, I will discuss the alignment between our benchmark-driven ML paradigm and the real-world use cases that motivate it. First, we will explore examples of biases in the ImageNet dataset, and how state-of-the-art models exploit them. We will then demonstrate how these biases arise as a result of design choices in the data collection and curation processes.

Throughout, we illustrate how one can leverage relatively standard tools (e.g., crowdsourcing, image processing) to quantify the biases that we observe. Based on joint works with Logan Engstrom, Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras and Kai Xiao.

Machine Learning for Health (ML4H): Advancing Healthcare for All

Stephanie Hyland, Allen Schmaltz, Charles Onu, Ehi Nosakhare, Emily Alsentzer, Irene Y Chen, Matthew McDermott, Subhrajit Roy, Benjamin Akera, Dani Kiyasseh, Fabian Falck, Griffin Adams, Ioana Bica, Oliver J Bear Don't Walk IV, Suproteem Sarkar, Stephen Pfohl, Andrew Beam, Brett Beaulieu-Jones, Danielle Belgrave, Tristan Naumann

Fri Dec 11, 06:00 AM

The application of machine learning to healthcare is often characterised by the development of cutting-edge technology aiming to improve patient outcomes. By developing sophisticated models on high-quality datasets we hope to better diagnose, forecast, and otherwise characterise the health of individuals. At the same time, when we build tools which aim to assist highly-specialised caregivers, we limit the benefit of machine learning to only those who can access such care. The fragility of healthcare access both globally and locally prompts us to ask, "How can machine learning be used to help enable healthcare for all?" - the theme of the 2020 ML4H workshop.

Participants at the workshop will be exposed to new questions in machine learning for healthcare, and be prompted to reflect on how their work sits within larger healthcare systems. Given the growing community of researchers in machine learning for health, the workshop will provide an opportunity to discuss common challenges, share expertise, and potentially spark new research directions. By drawing in experts from adjacent disciplines such as public health, fairness, epidemiology, and clinical practice, we aim to further strengthen the interdisciplinarity of machine learning for health.

See our workshop for more information: https://ml4health.github.io/

Schedule

06:00 AM Opening Remarks
06:10 AM Noémie Elhadad: Large scale characterization for health equity assessment Elhadad
06:30 AM Mark Dredze: Reducing Health Disparities in the Future of Medicine Dredze
06:50 AM Panel with Noémie Elhadad and Mark Dredze
07:25 AM Break
07:40 AM Sponsor remarks: Modeling Pan-tumor, Personalized Healthcare Insights in a Multi-modal, Real-world Oncology Database with Sarah McGough
08:00 AM Spotlight A-1: "ML4H Auditing: From Paper to Practice" Oala
08:10 AM Spotlight A-2: "The unreasonable effectiveness of Batch-Norm statistics in addressing catastrophic forgetting across medical institutions" Gupta
08:20 AM Spotlight A-3: "DeepHeartBeat: Latent trajectory learning of cardiac cycles using cardiac ultrasounds" Laumer
08:30 AM Poster session A
09:30 AM Lunch
12:30 PM Judy Gichoya: Operationalising Fairness in Medical Algorithms: A grand challenge Gichoya
12:50 PM Ziad Obermeyer: Explaining Pain Disparities Obermeyer
01:10 PM Panel with Judy Gichoya and Ziad Obermeyer
01:45 PM Spotlight B-1: "A Bayesian Hierarchical Network for Combining Heterogeneous Data Sources in Medical Diagnoses" Donnat
01:55 PM Spotlight B-2: "Assessing racial inequality in COVID-19 testing with Bayesian threshold tests" Pierson
02:05 PM Spotlight B-3: "EEG-GCNN: Augmenting Electroencephalogram-based Neurological Disease Diagnosis using a Domain-guided Graph Convolutional Neural Network" Wagh
02:15 PM Poster session B
03:15 PM Break
03:30 PM Andrew Ng: Practical limitations of today's deep learning in healthcare Ng
03:50 PM Panel with Andrew Ng
04:10 PM Closing remarks

Abstracts (13):

Abstract 2: Noémie Elhadad: Large scale characterization for health equity assessment in Machine Learning for Health (ML4H): Advancing Healthcare for All, Elhadad 06:10 AM

Large scale characterization for health equity assessment

Abstract 3: Mark Dredze: Reducing Health Disparities in the Future of Medicine in Machine Learning for Health (ML4H): Advancing Healthcare for All, Dredze 06:30 AM

Health disparities in the United States are one of the largest factors in reducing the health of the population. Disparities mean some groups have lower life expectancy, are dying at higher rates from COVID-19, and utilize fewer mental health services, to name just a few examples. The future of medicine will be based on Artificial Intelligence and new technological platforms that promise to improve outcomes and reduce cost. Our role as AI researchers should be to ensure that these new technologies also reduce health disparities. In this talk I will describe recent work showing how we can work to reduce health disparities in the future of medicine. By ensuring that our tasks, datasets, algorithms and evaluations are equitable and representative of all types of patients, we can ensure that the research we develop will reduce health disparities.

Abstract 4: Panel with Noémie Elhadad and Mark Dredze in Machine Learning for Health (ML4H): Advancing Healthcare for All, 06:50 AM

Please use the video feed above to watch the panel. Post your questions at any time in RocketChat.

Abstract 5: Break in Machine Learning for Health (ML4H): Advancing Healthcare for All, 07:25 AM

Click on "Open Link" to mingle with other attendees in the Gather.Town Lounge

Abstract 6: Sponsor remarks: Modeling Pan-tumor, Personalized Healthcare Insights in a Multi-modal, Real-world Oncology Database with Sarah McGough in Machine Learning for Health (ML4H): Advancing Healthcare for All, 07:40 AM

Please use the video feed above to watch this talk. Post your questions at any time in RocketChat.

Abstract 10: Poster session A in Machine Learning for Health (ML4H): Advancing Healthcare for All, 08:30 AM

Click on "Open Link" to attend the poster session in Gather.Town

Abstract 11: Lunch in Machine Learning for Health (ML4H): Advancing Healthcare for All, 09:30 AM

Click on "Open Link" to mingle with other attendees in the Gather.Town Lounge

Abstract 12: Judy Gichoya: Operationalising Fairness in Medical Algorithms: A grand challenge in Machine Learning for Health (ML4H): Advancing Healthcare for All, Gichoya 12:30 PM

The year 2020 has brought into focus a second pandemic of social injustice and systemic bias, with the disproportionate deaths observed for minority patients infected with COVID. As we observe an increase in development and adoption of AI for medical care, we note variable performance of the models when tested on previously unseen datasets, and also bias when outcome proxies such as healthcare costs are utilized. Despite progressive maturity in AI development with increased availability of large open source datasets and regulatory guidelines, operationalizing fairness is difficult and remains largely unexplored. In this talk, we review the background/context for FAIR and UNFAIR sequelae of AI algorithms in healthcare, describe practical approaches to FAIR Medical AI, and issue a grand challenge with open/unanswered questions.

Abstract 14: Panel with Judy Gichoya and Ziad Obermeyer in Machine Learning for Health (ML4H): Advancing Healthcare for All, 01:10 PM

Please use the video feed above to watch this panel. Post your questions at any time in RocketChat.

Abstract 18: Poster session B in Machine Learning for Health (ML4H): Advancing Healthcare for All, 02:15 PM

Click on "Open Link" to attend the poster session in Gather.Town

Abstract 19: Break in Machine Learning for Health (ML4H): Advancing Healthcare for All, 03:15 PM

Click on "Open Link" to mingle with other attendees in the Gather.Town Lounge

Abstract 20: Andrew Ng: Practical limitations of today's deep learning in healthcare in Machine Learning for Health (ML4H): Advancing Healthcare for All, Ng 03:30 PM

Recent advances in training deep learning algorithms have demonstrated potential to accommodate the complex variations present in medical data. In this talk, I will describe technical advancements and challenges in the development and clinical application of deep learning algorithms designed to interpret medical images. I will also describe advances and current challenges in the deployment of medical imaging deep learning algorithms into practice. This talk presents work that is jointly done with Matt Lungren, Curt Langlotz, Nigam Shah, and several more collaborators.

Abstract 21: Panel with Andrew Ng in Machine Learning for Health (ML4H): Advancing Healthcare for All, 03:50 PM

Please use the video feed above to watch this panel. Post your questions at any time in RocketChat.

Learning Meaningful Representations of Life (LMRL.org)

Elizabeth Wood, Debora Marks, Ray Jones, Adji Dieng, Alan Aspuru-Guzik, Anshul Kundaje, Barbara Engelhardt, Chang Liu, Edward Boyden, Kresten Lindorff-Larsen, Mor Nitzan, Smita Krishnaswamy, Wouter Boomsma, Yixin Wang, David Van Valen, Orr Ashenberg

Fri Dec 11, 06:00 AM

This workshop is designed to bring together trainees and experts in machine learning with those at the very forefront of biological research today for this purpose. Our full-day workshop will advance the joint project of the CS and biology communities with the goal of "Learning Meaningful Representations of Life" (LMRL), emphasizing interpretable representation learning of structure and principle. As last year, the workshop will be oriented around four layers of biological abstraction: molecule, cell, synthetic biology, and phenotypes.

Mapping structural molecular detail to organismal phenotype and function; predicting emergent effects of human genetic variation; and designing novel interventions including prevention, diagnostics, therapeutics, and the development of new synthetic biotechnologies for causal investigations are just some of the challenges that hinge on appropriate formal structures to make them accessible to the broadest possible community of computer scientists, biologists, and their tools.

Schedule

12:00 AM Meet in Gather.town
12:00 AM Join the Discussion in Slack
05:00 AM Hilary Finucane Finucane
05:25 AM Live Discussion with Jacob Ulirsch (Finucane Lab)
05:40 AM Daniela Witten Witten
06:20 AM Poster Session I - All Posters
07:00 AM Timothy Springer
07:15 AM Cell Panel
08:00 AM Akiko Iwasaki Iwasaki
08:00 AM David Ryan Koes Koes
08:00 AM Barak Raveh Raveh
08:00 AM Viviana Gradinaru Gradinaru
08:00 AM Christina Leslie Leslie
08:21 AM Leeat Keren Keren
08:22 AM Cecilia Clementi Clementi
08:30 AM Tamara Broderick Broderick
08:30 AM Jose Miguel Hernandez Lobato Hernández-Lobato
08:37 AM Eran Segal Segal
08:42 AM David Zeevi Zeevi
08:44 AM Pedro Beltrao Beltrao
09:04 AM Jesse Bloom Bloom
09:09 AM Tamara Broderick
09:10 AM Hattie Chung Chung
09:18 AM Harlan Krumholz Krumholz
09:20 AM Chang Liu Liu
09:25 AM Geoffrey Schiebinger Schiebinger
09:25 AM Christine Peter Peter
09:34 AM John Chodera Chodera
10:00 AM Samantha Riesenfeld Riesenfeld
10:00 AM Hirunima Jayasekara Jayasekara
10:00 AM Phenotype Panel
10:00 AM Eli Weinstein Weinstein
10:06 AM Caroline Weis Weis
10:10 AM Maria Littmann Littmann
10:15 AM Martin Voegele Voegele
10:30 AM James Morton Morton
10:32 AM Manik Kuchroo Kuchroo
10:35 AM Surojit Biswas Biswas, Biswas
10:36 AM Claus Hélix-Nielsen
10:40 AM Juan Caicedo and Shantanu Singh Singh, Caicedo
10:42 AM Max Shen Shen
10:54 AM Eli Draizen Draizen
11:00 AM David Baker Baker
12:10 PM Pamela Silver and Debora Marks
12:30 PM Jennifer Listgarten Listgarten
01:20 PM Sri Kosuri
01:45 PM Sri Kosuri
02:00 PM Poster Session II - All Posters

First Workshop on Quantum Tensor Networks in Machine Learning

Xiao-Yang Liu, Qibin Zhao, Jacob Biamonte, Cesar F Caiafa, Paul Pu Liang, Nadav Cohen, Stefan Leichenauer

Fri Dec 11, 06:00 AM

Quantum tensor networks in machine learning (QTNML) are envisioned to have great potential to advance AI technologies. Quantum machine learning promises quantum advantages (potentially exponential speedups in training, quadratic speedup in convergence, etc.) over classical machine learning, while tensor networks provide powerful simulations of quantum machine learning algorithms on classical computers. As a rapidly growing interdisciplinary area, QTNML may serve as an amplifier for computational intelligence, a transformer for machine learning innovations, and a propeller for AI industrialization.

Tensor networks, a contracted network of factor tensors, have arisen independently in several areas of science and engineering. Such networks appear in the description of physical processes, and an accompanying collection of numerical techniques has elevated the use of quantum tensor networks into a variational model of machine learning. Underlying these algorithms is the compression of high-dimensional data needed to represent quantum states of matter. These compression techniques have recently proven ripe to apply to many traditional problems faced in deep learning. Quantum tensor networks have shown significant power in compactly representing deep neural networks, and efficient training and theoretical understanding of deep neural networks. More potential QTNML technologies are rapidly emerging, such as approximating probability functions and probabilistic graphical models. However, the topic of QTNML is relatively young and many open problems are still to be explored.

Quantum algorithms are typically described by quantum circuits (quantum computational networks). These networks are indeed a class of tensor networks, creating an evident interplay between classical tensor network contraction algorithms and executing tensor contractions on quantum processors. The modern field of quantum enhanced machine learning has started to utilize several tools from tensor network theory to create new quantum models of machine learning and to better understand existing ones.

The interplay between tensor networks, machine learning and quantum algorithms is rich. Indeed, this interplay is based not just on numerical methods but on the equivalence of tensor networks to various quantum circuits, rapidly developing algorithms from the mathematics and physics communities for optimizing and transforming tensor networks, and connections to low-rank methods for learning. A merger of tensor network algorithms with state-of-the-art approaches in deep learning is now taking place. A new community is forming, which this workshop aims to foster.

Schedule

06:00 AM Opening Remarks Liu
06:05 AM Invited Talk 1: Tensor Networks as a Data Structure in Probabilistic Modeling and for Learning Dynamical Laws from Data Eisert
06:35 AM Invited Talk 1 Q&A by Jens Eisert
06:45 AM Invited Talk 2: Expressiveness in Deep Learning via Tensor Networks and Quantum Entanglement Cohen
07:17 AM Invited Talk 2 Q&A by Cohen Cohen
07:25 AM Invited Talk 3: Tensor Networks and Counting Problems on the Lattice Verstraete
07:55 AM Invited Talk 3 Q&A by Frank Verstraete
08:05 AM Invited Talk 4: Quantum in ML and ML in Quantum Oseledets
08:50 AM Invited Talk 4 Q&A by Ivan Oseledets
09:00 AM Invited Talk 5: Live Presentation of TensorLy By Jean Kossaifi Anandkumar, Kossaifi
09:40 AM Invited Talk 6: A Century of the Tensor Network Formulation from the Ising Model Nishino
10:07 AM Invited Talk 6 Q&A by Tomotoshi Nishino
10:15 AM Poster 1: Multi-Graph Tensor Networks by Yao Lei Xu Xu
10:18 AM Poster 2: High Performance Single-Site Finite DMRG on GPUs by Hao Hong Hao
10:21 AM Poster 3: Variational Quantum Circuit Model for Knowledge Graph Embeddings by Yunpu Ma Ma
10:24 AM Poster 4: Hybrid quantum-classical classifier based on tensor network and variational quantum circuit by Samuel Yen-Chi Chen Chen
10:27 AM Poster 5: A Neural Matching Model based on Quantum Interference and Quantum Many-body System Gao
10:30 AM Contributed Talk 1: Paper 3: Tensor network approaches for data-driven identification of non-linear dynamical laws Goeßmann
10:40 AM Contributed Talk 2: Paper 6: Anomaly Detections with Tensor Networks Wang
10:50 AM Contributed Talk 3: Paper 32: High-order Learning Model via Fractional Tensor Network Decomposition Li
11:00 AM Panel Discussion 1: Theoretical, Algorithmic and Physical Biamonte, Oseledets, Eisert, Cohen, Rabusseau, Liu
11:45 AM Break
12:00 PM Panel Discussion 2: Software and High Performance Implementation Evenbly, Ganahl, Springer, Liu
12:45 PM Break
01:00 PM Invited Talk 7: cuTensor: High-Performance CUDA Tensor Primitives Springer
01:28 PM Invited Talk 7 Q&A by Paul Springer
01:35 PM Invited Talk 8: TensorNetwork: A Python Package for Tensor Network Computations Ganahl
02:05 PM Invited Talk 8 Q&A by Martin Ganahl
02:15 PM Invited Talk 9: Tensor Network Models for Structured Data Rabusseau
02:51 PM Invited Talk 9 Q&A by Guillaume Rabusseau
03:00 PM Invited Talk 10: Getting Started with Tensor Networks Evenbly
03:30 PM Invited Talk 10 Q&A by Evenbly Evenbly
03:40 PM Contributed Talk 4: Paper 27: Limitations of gradient-based Born Machine over tensor networks on learning quantum nonlocality Najafi
03:50 PM Contributed Talk 5: Paper 19: Deep convolutional tensor network Blagoveschensky
04:00 PM Poster 6: Paper 16: Quantum Tensor Networks for Variational Reinforcement Learning Fang
04:04 PM Poster 7: Paper 13: Quantum Tensor Networks, Stochastic Processes, and Weighted Automata
04:07 PM Poster 8: Paper 24: Modeling Natural Language via Quantum Many-body Wave Function and Tensor Network, YAO
04:10 PM Invited Talk 11: Tensor Methods for Efficient and Interpretable Spatiotemporal Learning Yu
04:32 PM Invited Talk 11 Q&A by Rose Yu
04:40 PM Invited Talk 12: Learning Quantum Channels with Tensor Networks Torlai
05:10 PM Invited Talk 12: Q&A Torlai
05:20 PM Closing Remarks Liu

Abstracts (21):

Abstract 1: Opening Remarks in First Workshop on Quantum Tensor Networks in Machine Learning, Liu 06:00 AM

A short introduction

Abstract 2: Invited Talk 1: Tensor Networks as a Data Structure in Probabilistic Modeling and for Learning Dynamical Laws from Data in First Workshop on Quantum Tensor Networks in Machine Learning, Eisert 06:05 AM

Recent years have enjoyed a significant interest in exploiting tensor networks in the context of machine learning, both as a tool for the formulation of new learning algorithms and for enhancing the mathematical understanding of existing methods. In this talk, we will explore two readings of such a connection. On the one hand, we will consider the task of identifying the underlying non-linear governing equations, required both for obtaining an understanding and making future predictions. We will see that this problem can be addressed in a scalable way making use of tensor network based parameterizations for the governing equations. On the other hand, we will investigate the expressive power of tensor networks in probabilistic modelling. Inspired by the connection of tensor networks and machine learning, and the natural correspondence between tensor networks and probabilistic graphical models, we will provide a rigorous analysis of the expressive power of various tensor-network factorizations of discrete multivariate probability distributions. Joint work with A. Goeßmann, M. Götte, I. Roth, R. Sweke, G. Kutyniok, I. Glasser, N. Pancotti, J. I. Cirac.

Abstract 4: Invited Talk 2: Expressiveness in Deep Learning via Tensor Networks and Quantum Entanglement in First Workshop on Quantum Tensor Networks in Machine Learning, Cohen 06:45 AM

Understanding deep learning calls for addressing three fundamental questions: expressiveness, optimization and generalization. This talk will describe a series of works aimed at unraveling some of the mysteries behind expressiveness. I will begin by showing that state of the art deep learning architectures, such as convolutional networks, can be represented as tensor networks --- a prominent computational model for quantum many-body simulations. This connection will inspire the use of quantum entanglement for defining measures of data dependencies modeled by deep networks. Next, I will turn to derive a quantum max-flow / min-cut theorem characterizing the entanglement captured by deep networks. The theorem will give rise to new results that shed light on expressiveness in deep learning and, in addition, provide new tools for deep network design. Works covered in the talk were in collaboration with Yoav Levine, Or Sharir, Ronen Tamari, David Yakira and Amnon Shashua.

Abstract 6: Invited Talk 3: Tensor Networks and Counting Problems on the Lattice in First Workshop on Quantum Tensor Networks in Machine Learning, Verstraete 07:25 AM

An overview will be given of counting problems on the lattice, such as the calculation of the hard square constant and of the residual entropy of ice. Unlike Monte Carlo techniques, which have difficulty in calculating such quantities, we will demonstrate that tensor networks provide a natural framework for tackling these problems. We will also show that tensor networks reveal nonlocal hidden symmetries in those systems, and that the typical critical behaviour is witnessed by matrix product operators which form representations of tensor fusion categories.

Abstract 8: Invited Talk 4: Quantum in ML and ML in Quantum in First Workshop on Quantum Tensor Networks in Machine Learning, Oseledets 08:05 AM

In this talk, I will cover recent results in two areas: 1) Using quantum-inspired methods in machine learning, including using low-entanglement states (matrix product states/tensor train decompositions) for different regression and classification tasks. 2) Using machine learning methods for efficient classical simulation of quantum systems. I will cover our results on simulating quantum circuits on parallel computers using graph-based algorithms, and also efficient numerical methods for optimization using tensor-trains for the computation of large quantum number (up to B=100) on GPUs. The code is a combination of classical linear algebra algorithms, Riemannian optimization methods and efficient software implementation in TensorFlow.

1. Rakhuba, M., Novikov, A. and Oseledets, I., 2019. Low-rank Riemannian eigensolver for high-dimensional Hamiltonians. Journal of Computational Physics, 396, pp. 718-737.
2. Schutski, Roman, Danil Lykov, and Ivan Oseledets. Adaptive algorithm for quantum circuit simulation. Physical Review A 101, no. 4 (2020): 042335.
3. Khakhulin, Taras, Roman Schutski, and Ivan Oseledets. Graph Convolutional Policy for Solving Tree Decomposition via Reinforcement Learning Heuristics. arXiv preprint arXiv:1910.08371 (2019).
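To make the matrix product state / tensor-train decompositions mentioned in this talk concrete, the sketch below factors a small numpy array into a chain of three-way cores by sequential truncated SVDs (the textbook TT-SVD construction). This is a generic illustration, not the speaker's code; the function names and the fixed max_rank truncation are assumptions made for the example.

    import numpy as np

    def tt_svd(tensor, max_rank):
        # Sequential SVDs turn a d-way array into d three-way "cores";
        # storage drops from prod(n_k) down to roughly sum(n_k * r^2).
        shape = tensor.shape
        d = len(shape)
        cores, r_prev = [], 1
        mat = tensor.reshape(r_prev * shape[0], -1)
        for k in range(d - 1):
            U, S, Vt = np.linalg.svd(mat, full_matrices=False)
            r = min(max_rank, S.size)
            cores.append(U[:, :r].reshape(r_prev, shape[k], r))
            mat = (S[:r, None] * Vt[:r]).reshape(r * shape[k + 1], -1)
            r_prev = r
        cores.append(mat.reshape(r_prev, shape[-1], 1))
        return cores

    def tt_reconstruct(cores):
        # Contract the train back into a dense tensor (for checking only).
        out = cores[0]
        for core in cores[1:]:
            out = np.tensordot(out, core, axes=([-1], [0]))
        return out.squeeze(axis=(0, -1))

    T = np.random.rand(4, 5, 6)
    cores = tt_svd(T, max_rank=3)
    approx = tt_reconstruct(cores)
    print(approx.shape, np.linalg.norm(T - approx) / np.linalg.norm(T))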

Abstract 10: Invited Talk 5: Live Presentation of TensorLy By Jean Kossaifi in First Workshop on Quantum Tensor Networks in Machine Learning, Anandkumar, Kossaifi 09:00 AM

Live Presentation

Abstract 11: Invited Talk 6: A Century of the Tensor Network Formulation from the Ising Model in First Workshop on Quantum Tensor Networks in Machine Learning, Nishino 09:40 AM

A hundred years have passed since the Ising model was proposed by Lenz in 1920. One finds that the square lattice Ising model is already an example of a two-dimensional tensor network (TN), which is formed by contracting 4-leg tensors. In 1941, Kramers and Wannier assumed a variational state in the form of the matrix product state (MPS), and they optimized it `numerically'. Baxter reached the concept of the corner-transfer matrix (CTM), and performed a variational computation in 1968. Independently from these statistical studies, MPS was introduced by Affleck, Lieb, Kennedy and Tasaki (AKLT) in 1987 for the study of one-dimensional quantum spin chains, by Derrida for asymmetric exclusion processes, and also (implicitly) by the establishment of the density matrix renormalization group (DMRG) by White in 1992. After a brief (?) introduction of these prehistories, I'll speak about my contribution to this area, the applications of DMRG and CTMRG methods to two-dimensional statistical models, including those on hyperbolic lattices, fractal systems, and random spin models. Analysis of the spin-glass state, which is related to learning processes, from the viewpoint of the entanglement structure would be a target of future studies in this direction.

Abstract 18: Contributed Talk 1: Paper 3: Tensor network approaches for data-driven identification of non-linear dynamical laws in First Workshop on Quantum Tensor Networks in Machine Learning, Goeßmann 10:30 AM

To date, scalable methods for data-driven identification of non-linear governing equations do not exploit or offer insight into fundamental underlying physical structure. In this work, we show that various physical constraints can be captured via tensor network based parameterizations for the governing equation, which naturally ensures scalability. In addition to providing analytic results motivating the use of such models for realistic physical systems, we demonstrate that efficient rank-adaptive optimization algorithms can be used to learn optimal tensor network models without requiring a priori knowledge of the exact tensor ranks.

Abstract 19: Contributed Talk 2: Paper 6: Anomaly Detections with Tensor Networks in First Workshop on Quantum Tensor Networks in Machine Learning, Wang 10:40 AM

Originating from condensed matter physics, tensor networks are compact representations of high-dimensional tensors. In this paper, the prowess of tensor networks is demonstrated on the particular task of one-class anomaly detection. We exploit the memory and computational efficiency of tensor networks to learn a linear transformation over a space with dimension exponential in the number of original features. The linearity of our model enables us to ensure a tight fit around training instances by penalizing the model's global tendency to predict normality via its Frobenius norm---a task that is infeasible for most deep learning models. Our method outperforms deep and classical algorithms on tabular datasets and produces competitive results on image datasets, despite not exploiting the locality of images.

Abstract 20: Contributed Talk 3: Paper 32: High-order Learning Model via Fractional Tensor Network Decomposition in First Workshop on Quantum Tensor Networks in Machine Learning, Li 10:50 AM

We consider high-order learning models, of which the weight tensor is represented by (symmetric) tensor network (TN) decomposition. Although such models have been widely used on various tasks, it is challenging to determine the optimal order in complex systems (e.g., deep neural networks). To tackle this issue, we introduce a new notion of fractional tensor network (FrTN) decomposition, which generalizes the conventional TN models with an integer order by allowing the order to be an arbitrary fraction. Due to the density of fractions in the field of real numbers, the order of the model can be formulated as a learnable parameter and simply optimized by stochastic gradient descent (SGD) and its variants. Moreover, it is uncovered that FrTN strongly connects to well-known methods such as ℓp-pooling (Gulcehre et al., 2014) and "squeeze-and-excitation" (Hu et al., 2018) operations in deep learning studies. On the numerical side, we apply the proposed model to enhancing the classic ResNet-26/50 (He et al., 2016) and MobileNet-v2 (Sandler et al., 2018) on both CIFAR-10 and ILSVRC-12 classification tasks, and the results demonstrate the effectiveness brought by the learnable order parameters in FrTN.

Abstract 21: Panel Discussion 1: Theoretical, Algorithmic and Physical in First Workshop on Quantum Tensor Networks in Machine Learning, Biamonte, Oseledets, Eisert, Cohen, Rabusseau, Liu 11:00 AM

Theoretical, Algorithmic and Physical Discussions of Quantum Tensor Networks in Machine Learning.

Abstract 23: Panel Discussion 2: Software and High Performance Implementation in First Workshop on Quantum Tensor Networks in Machine Learning, Evenbly, Ganahl, Springer, Liu 12:00 PM

Software and High Performance Implementation discussion of Quantum Tensor Networks in Machine Learning.

Abstract 25: Invited Talk 7: cuTensor: High-Performance CUDA Tensor Primitives in First Workshop on Quantum Tensor Networks in Machine Learning, Springer 01:00 PM

This talk discusses cuTENSOR, a high-performance CUDA library for tensor operations that efficiently handles the ubiquitous presence of high-dimensional arrays (i.e., tensors) in today's HPC and DL workloads. This library supports highly efficient tensor operations such as tensor contractions, element-wise tensor operations such as tensor permutations, and tensor reductions. While providing high performance, cuTENSOR also enables users to express their mathematical equations for tensors in a straightforward way that hides the complexity of dealing with these high-dimensional objects behind an easy-to-use API.

Abstract 27: Invited Talk 8: TensorNetwork: A Python Package for Tensor Network Computations in First Workshop on Quantum Tensor Networks in Machine Learning, Ganahl 01:35 PM

TensorNetwork is an open source python package for tensor network computations. It has been designed with the goal in mind to help researchers and engineers with rapid development of highly efficient tensor network algorithms for physics and machine learning applications. After a brief introduction to tensor networks, I will discuss some of the main design principles of the TensorNetwork package, and show how one can use it to speed up tensor network algorithms by running them on accelerated hardware, or by exploiting tensor sparsity.
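As a flavour of the package discussed in Invited Talk 8, the snippet below builds and contracts a two-node network with the open-source tensornetwork library. It is a minimal sketch based on the library's documented Node/edge API around 2020; if the installed version differs, call names may vary, and the shapes chosen here are arbitrary.

    import numpy as np
    import tensornetwork as tn

    # Two 3-way factor tensors joined along one shared (bond) index.
    a = tn.Node(np.random.rand(2, 3, 4), name="A")
    b = tn.Node(np.random.rand(4, 3, 2), name="B")
    bond = a[2] ^ b[0]           # connect A's last leg to B's first leg

    result = tn.contract(bond)   # sums over the shared bond index
    print(result.tensor.shape)   # (2, 3, 3, 2) -- the remaining free legs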

Abstract 29: Invited Talk 9: Tensor Network Models for Structured Data in First Workshop on Quantum Tensor Networks in Machine Learning, Rabusseau 02:15 PM

In this talk, I will present uniform tensor network models (also known as translation-invariant tensor networks), which are particularly suited for modelling structured data such as sequences and trees. Uniform tensor networks are tensor networks where the core tensors appearing in the decomposition of a given tensor are all equal, which can be seen as a weight sharing mechanism in tensor networks. In the first part of the talk, I will show how uniform tensor networks are particularly suited to represent functions defined over sets of structured objects such as sequences and trees. I will then present how these models are related to classical computational models such as hidden Markov models, weighted automata, second-order recurrent neural networks and context free grammars. In the second part of the talk, I will present a classical learning algorithm for weighted automata and show how it can be interpreted as a means to convert non-uniform tensor networks to uniform ones. Lastly, I will present ongoing work leveraging the tensor network formalism to design efficient and versatile probabilistic models for sequence data.

Abstract 31: Invited Talk 10: Getting Started with Tensor Networks in First Workshop on Quantum Tensor Networks in Machine Learning, Evenbly 03:00 PM

I will provide an overview of the tensor network formalism and its applications, and discuss the key operations, such as tensor contractions, required for building tensor network algorithms. I will also demonstrate the TensorTrace graphical interface, a software tool which is designed to allow users to implement and code tensor network routines easily and effectively. Finally, the utility of tensor networks towards tasks in machine learning will be briefly discussed.

Abstract 33: Contributed Talk 4: Paper 27: Limitations of gradient-based Born Machine over tensor networks on learning quantum nonlocality in First Workshop on Quantum Tensor Networks in Machine Learning, Najafi 03:40 PM

Nonlocality is an important constituent of quantum physics which lies at the heart of many striking features of quantum states such as entanglement. An important category of highly entangled quantum states are Greenberger-Horne-Zeilinger (GHZ) states, which play key roles in various quantum-based technologies and are particularly of interest in benchmarking noisy quantum hardware. A novel quantum-inspired generative model known as the Born Machine, which leverages the probabilistic nature of quantum physics, has shown great success in learning classical and quantum data over tensor network (TN) architectures. To this end, we investigate the task of training the Born Machine for learning the GHZ state over two different architectures of tensor networks. Our result indicates that gradient-based training schemes over TN Born Machines fail to learn the non-local information of the coherent superposition (or parity) of the GHZ state. This leads to an important question of what kind of architecture design, initialization and optimization schemes would be more suitable to learn the non-local information hidden in the quantum state, and whether we can adapt quantum-inspired training algorithms to learn such quantum states.
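To make the Born Machine setup of the last abstract concrete: a matrix product state assigns every bitstring an amplitude, and the model's probability of a string is the squared amplitude, normalized over all strings. The brute-force normalization below is only feasible for a toy number of sites; it is an illustrative sketch with invented core shapes, not the paper's training code.

    import numpy as np
    from itertools import product

    def mps_amplitude(cores, bits):
        # Contract the chain, picking the physical index at each site.
        v = np.ones((1,))
        for core, b in zip(cores, bits):   # core shape: (r_left, 2, r_right)
            v = v @ core[:, b, :]
        return v.item()

    def born_probability(cores, bits, space):
        amps = np.array([mps_amplitude(cores, s) for s in space])
        Z = (amps ** 2).sum()              # normalization over all strings
        return mps_amplitude(cores, bits) ** 2 / Z

    n = 3
    cores = [np.random.randn(1 if i == 0 else 2, 2, 1 if i == n - 1 else 2)
             for i in range(n)]
    space = list(product([0, 1], repeat=n))
    print(sum(born_probability(cores, s, space) for s in space))  # ~1.0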

Abstract 34: Contributed Talk 5: Paper 19: Deep convolutional tensor network in First Workshop on Quantum Tensor Networks in Machine Learning, Blagoveschensky 03:50 PM

Neural networks have achieved state of the art results in many areas, supposedly due to parameter sharing, locality, and depth. Tensor networks (TNs) are linear algebraic representations of quantum many-body states based on their entanglement structure. TNs have found use in machine learning. We devise a novel TN based model called Deep convolutional tensor network (DCTN) for image classification, which has parameter sharing, locality, and depth. It is based on the Entangled plaquette states (EPS) TN. We show how EPS can be implemented as a backpropagatable layer. We test DCTN on MNIST, FashionMNIST, and CIFAR10 datasets. A shallow DCTN performs well on MNIST and FashionMNIST and has a small parameter count. Unfortunately, depth increases overfitting and thus decreases test accuracy. Also, DCTN of any depth performs badly on CIFAR10 due to overfitting. It is to be determined why. We discuss how the hyperparameters of DCTN affect its training and overfitting.

Abstract 38: Invited Talk 11: Tensor Methods for Efficient and Interpretable Spatiotemporal Learning in First Workshop on Quantum Tensor Networks in Machine Learning, Yu 04:10 PM

Multivariate spatiotemporal data is ubiquitous in science and engineering, from climate science to sports analytics, to neuroscience. Such data contain higher-order correlations and can be represented as a tensor. Tensor latent factor models provide a powerful tool for reducing dimensionality and discovering higher-order structures. However, existing tensor models are often slow or fail to yield interpretable latent factors. In this talk, I will demonstrate advances in tensor methods to generate interpretable latent factors for high-dimensional spatiotemporal data. We provide theoretical guarantees and demonstrate their applications to real-world climate, basketball, and neuroscience data.
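A minimal example of the kind of tensor latent factor model the talk refers to: a rank-R CP decomposition of a 3-way (say, location x time x variable) array fitted by alternating least squares. This is a standard baseline written for clarity, not the speaker's method; the rank and iteration count are arbitrary choices.

    import numpy as np

    def unfold(T, mode):
        return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

    def cp_als(T, rank, iters=200):
        # Fit T[i,j,k] ~= sum_r A[i,r] * B[j,r] * C[k,r]; the columns of
        # A, B, C are the interpretable latent factors.
        rng = np.random.default_rng(0)
        factors = [rng.normal(size=(n, rank)) for n in T.shape]
        for _ in range(iters):
            for m in range(3):
                others = [f for i, f in enumerate(factors) if i != m]
                # Khatri-Rao product of the other two factor matrices.
                kr = (others[0][:, None, :] * others[1][None, :, :]).reshape(-1, rank)
                factors[m] = unfold(T, m) @ kr @ np.linalg.pinv(kr.T @ kr)
        return factors

    T = np.random.rand(6, 7, 8)
    A, B, C = cp_als(T, rank=4)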

Abstract 40: Invited Talk 12: Learning Quantum Channels with Tensor Networks in First Workshop on Quantum Tensor Networks in Machine Learning, Torlai 04:40 PM

We present a new approach to quantum process tomography, the reconstruction of an unknown quantum channel from measurement data. Specifically, we combine a tensor-network representation of the Choi matrix (a complete description of a quantum channel) with unsupervised machine learning of single-shot projective measurement data. We show numerical experiments for both unitary and noisy quantum circuits, for a number of qubits well beyond the reach of standard process tomography techniques.

Abstract 42: Closing Remarks in First Workshop on Quantum Tensor Networks in Machine Learning, Liu 05:20 PM

TBD


Human in the loop dialogue systems

Behnam Hedayatnia, Rahul Goel, Shereen Oraby, Abigail See, Chandra Khatri, Y-Lan Boureau, Alborz Geramifard, Marilyn Walker, Dilek Hakkani-Tur

Fri Dec 11, 06:10 AM

Conversational interaction systems such as Amazon Alexa, Google Assistant, Apple Siri, and Microsoft Cortana have become very popular over the recent years. Such systems have allowed users to interact with a wide variety of content on the web through a conversational interface. Research challenges such as the Dialogue System Technology Challenges, Dialogue Dodecathlon, Amazon Alexa Prize and the Vision and Language Navigation task have continued to inspire research in conversational AI. These challenges have brought together researchers from different communities such as spoken language understanding, reinforcement learning, language generation, and multi-modal question answering.

Unlike other popular NLP tasks, dialogue frequently has humans in the loop, whether it is for evaluation, active learning or online reward estimation. Through this workshop we aim to bring together researchers from academia and industry to discuss the challenges and opportunities in such human in the loop setups. We hope that this sparks interesting discussions about conversational agents, interactive systems, and how we can use humans most effectively when building such setups. We will highlight areas such as human evaluation setups, reliability in human evaluation, human in the loop training, interactive learning and user modeling. We also highly encourage non-English based dialogue systems in these areas.

The one-day workshop will include talks from senior technical leaders and researchers to share insights associated with evaluating dialogue systems. We also plan on having oral presentations and poster sessions on works related to the topic of the workshop. Finally we will end the workshop with an interactive panel of speakers. As an outcome we expect the participants from the NeurIPS community to walk away with a better understanding of human in the loop dialogue modeling as well as key areas of research in this field. Additionally we would like to see discussions around the unification of human evaluation setups in some way.

**This workshop will consist of live QA sessions. Therefore, in order to get the most out of the workshop, it is recommended that you watch all the prerecorded talks before the workshop day. Additionally we have put Reserved blocks of time as an opportunity to watch the pre-recorded talks before the Q/A.**

Schedule

06:10 AM Welcome and Opening Remarks Hedayatnia
06:20 AM Reserved Block of time to watch Pre-recorded Invited Talks 1, 2, 3
07:50 AM Invited Talk 1 Q/A - Milica Gašić Gasic
08:05 AM Invited Talk 2 Q/A - Larry Heck Heck
08:20 AM Invited Talk 3 Q/A - Maxine Eskenazi Eskenazi, Mehri
08:35 AM Reserved Block of time to watch Pre-recorded Contributed Talks 1, 2
09:05 AM Contributed Talk 1 Q/A Jhan
09:15 AM Contributed Talk 2 Q/A Cordier
09:25 AM Poster Session Presentations Chhibber, Lu, Rojas-Barahona, Stasaski, Park, Thattai, Augustin, Veron, Mazumder, Krivosheev, Bozzon
10:30 AM Breakout session: Human Evaluation Hedayatnia
10:30 AM Breakout session: Automatic Evaluation Liu
11:00 AM Reserved Block of time to watch Pre-recorded Contributed Talks 3, 4
11:20 AM Contributed Talk 3 Q/A Águas Lopes
11:30 AM Contributed Talk 4 Q/A Huang, Chen
11:40 AM Reserved Block of time to watch Pre-recorded Invited Talks 4, 5, 6
12:50 PM Invited Talk 4 Q/A - Jason Weston Weston
01:05 PM Invited Talk 5 Q/A - Zhou Yu Yu
01:20 PM Invited Talk 6 Q/A - Gokhan Tür Tur
01:35 PM Reserved Block of time to watch Pre-recorded Contributed Talks 5, 6
01:55 PM Contributed Talk 5 Q/A Ng
02:05 PM Contributed Talk 6 Q/A Lin
02:20 PM Reserved Block of time to watch Pre-recorded Invited Talks 7, 8, 9
03:30 PM Invited Talk 7 Q/A - Ankur Parikh Parikh
03:45 PM Invited Talk 8 Q/A - Percy Liang Liang
04:00 PM Invited Talk 9 Q/A - Alexander Rudnicky Rudnicky
04:15 PM Panel Eskenazi, Parikh, Thattai, Rudnicky, Weston
05:15 PM Closing Remarks / Best Paper Award Hedayatnia
N/A Invited Talk 1 Presentation - Milica Gašić - On the track of multi-domain dialogue models Gasic
N/A Invited Talk 2 Presentation - Larry Heck - Master-Apprentice Learning Heck
N/A Invited Talk 3 Presentation - Maxine Eskenazi - Human > User in the Loop Eskenazi
N/A Invited Talk 4 Presentation - Jason Weston - (Towards) Learning from Conversing Weston
N/A Invited Talk 5 Presentation - Zhou Yu - Augment Intelligence with Multimodal Information Yu
N/A Invited Talk 6 Presentation - Gokhan Tür - Past, Present, Future of Conversational AI Tur
N/A Invited Talk 7 Presentation - Ankur Parikh - Towards High Precision Text Generation Parikh
N/A Invited Talk 8 Presentation - Percy Liang - Semantic Parsing for Natural Language Interfaces Liang
N/A Invited Talk 9 Presentation - Alexander Rudnicky - Creating socialbots with human-like conversational abilities Rudnicky
N/A Contributed Talk 1 Presentation - "CheerBots: Chatbots toward Empathy and Emotion using Reinforcement Learning" Jhan
N/A Contributed Talk 2 Presentation - "Diluted Near-Optimal Expert Demonstrations for Guiding Dialogue Stochastic Policy Optimisation" Cordier
N/A Contributed Talk 3 Presentation - "The Lab vs The Crowd: An Investigation into Data Quality for Neural Dialogue Models" Águas Lopes
N/A Contributed Talk 4 Presentation - "NICE: Neural Image Commenting Evaluation with an Emphasis on Emotion and Empathy" Huang
N/A Contributed Talk 5 Presentation - "Improving Dialogue Breakdown Detection with Semi-Supervised Learning" Ng
N/A Contributed Talk 6 Presentation - "Dialog Simulation with Realistic Variations for Training Goal-Oriented Conversational Systems" Lin

Abstracts (18):

Abstract 3: Invited Talk 1 Q/A - Milica Gašić in Human in the loop dialogue systems, Gasic 07:50 AM

Current dialogue models are unnatural, narrow in domain and frustrating for users. Ultimately, we would rather like to converse with continuously evolving, human-like dialogue models at ease with large and extending domains. Limitations of the dialogue state tracking module, which maintains all information about what has happened in the dialogue so far, are central to this challenge. Its ability to extend its domain of operation is directly related to how natural the user perceives the system. I will talk about some of the latest research coming from the HHU Dialogue Systems and Machine Learning group that addresses this question.

Abstract 4: Invited Talk 2 Q/A - Larry Heck in Human in the loop dialogue systems, Heck 08:05 AM

I will present my recent research on expanding the AI skills of digital assistants through explicit human-in-the-loop dialogue and demonstrations. Digital assistants learn from other digital assistants, with each assistant initially trained through human interaction in the style of a "Master and Apprentice". For example, when a digital assistant does not know how to complete a requested task, rather than responding "I do not know how to do this yet", the digital assistant responds with an invitation to the human: "can you teach me?". Apprentice-style learning is powered by a combination of all the modalities: natural language conversations, non-verbal modalities including gestures, touch, robot manipulation and motion, gaze, images/videos, and speech prosody. The new apprentice learning model is always helpful and always learning in an open world – as opposed to the current commercial digital assistants that are sometimes helpful, trained exclusively offline, and function over a closed world of "walled garden" knowledge. Master-Apprentice learning has the potential to yield exponential growth in the collective intelligence of digital assistants.

Abstract 5: Invited Talk 3 Q/A - Maxine Eskenazi in Human in the loop dialogue systems, Eskenazi, Mehri 08:20 AM

Most of the work on intelligent agents in the past has centered on the agent itself, ignoring the needs and opinions of the user. We will show that it is essential to include the user in agent development and assessment. There is a significant advantage to relying on real users as opposed to paid users, which are the most prevalent at present. This introduces a study to assess system generation that employed the user's following utterance for a more realistic picture of the appropriateness of an utterance. This takes us to a discussion of user-centric evaluation where two novel metrics, USR and FED, are introduced. Finally we present an interactive Challenge with real users held as a thread of DSTC9.

Abstract 7: Contributed Talk 1 Q/A in Human in the loop dialogue systems, Jhan 09:05 AM

Apart from the coherence and fluency of responses, an empathetic chatbot places more emphasis on people's feelings. By considering altruistic behaviors in human interaction, empathetic chatbots enable people to get a better interactive and supportive experience. This study presents a framework whereby several empathetic chatbots are based on understanding users' implied feelings and replying empathetically for multiple dialogue turns. We call these chatbots CheerBots. CheerBots can be retrieval-based or generative-based and were fine-tuned by deep reinforcement learning. To respond in an empathetic way, we develop a simulating agent, a Conceptual Human Model, as an aid for CheerBots in training, with consideration of changes in the user's future emotional states in order to arouse sympathy. Finally, automatic metrics and human rating results demonstrate that CheerBots outperform other baseline chatbots and achieve reciprocal altruism. The code and the pre-trained models will be made available.

Abstract 8: Contributed Talk 2 Q/A in Human in the loop dialogue systems, Cordier 09:15 AM

These interactions can be taken from either human-to-human or human-machine conversations. However, human interactions are scarce and costly, making learning from few interactions essential. One solution to speed up the learning process is to guide the agent's exploration with the help of an expert. We present in this paper several imitation learning strategies for dialogue policy where the guiding expert is a near-optimal handcrafted policy. We incorporate these strategies with state-of-the-art reinforcement learning methods based on Q-learning and actor-critic. We notably propose a randomised exploration policy which allows for a seamless hybridisation of the learned policy and the expert. Our experiments show that our hybridisation strategy outperforms several baselines, and that it can accelerate the learning when facing real humans.
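The hybridisation idea in this abstract can be pictured as follows: at each turn, either the near-optimal handcrafted expert or the learned Q-policy chooses the action. The sketch below shows only that general shape, not the authors' specific randomised exploration scheme; the callables and the expert_prob parameter are assumptions, and expert_prob would typically be annealed as learning progresses.

    import random

    def hybrid_action(state, q_values, expert_policy, expert_prob):
        # With probability expert_prob, follow the handcrafted expert;
        # otherwise act greedily on the learned Q-values.
        if random.random() < expert_prob:
            return expert_policy(state)
        qs = q_values(state)          # dict: action -> estimated value
        return max(qs, key=qs.get)

    # Toy usage with stand-in policies:
    expert = lambda s: "ask_cuisine"
    q_fn = lambda s: {"ask_cuisine": 0.2, "confirm_booking": 0.7}
    print(hybrid_action({}, q_fn, expert, expert_prob=0.3))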

Abstract 9: Poster Session Presentations in Human in the loop dialogue systems, Chhibber, Lu, Rojas-Barahona, Stasaski, Park, Thattai, Augustin, Veron, Mazumder, Krivosheev, Bozzon 09:25 AM

Gather.town room is linked:

https://neurips.gather.town/app/PWaiZS2fB5KdXNUK/HLDS%20Poster%20Session

Abstract 10: Breakout session: Human Evaluation in Human in the loop dialogue systems, Hedayatnia 10:30 AM

https://us02web.zoom.us/j/71869602731?pwd=dFRoY3JwVUp6d2pOd3Q2ZXp3U3Z0QT09

Meeting ID: 718 6960 2731
Passcode: HLDS

Abstract 11: Breakout session: Automatic Evaluation in Human in the loop dialogue systems, Liu 10:30 AM

https://us02web.zoom.us/j/81906080248?pwd=aVlOdzFoZzJHWjZoaFlTODRVTEwxdz09

Meeting ID: 819 0608 0248
Passcode: HLDS

Abstract 13: Contributed Talk 3 Q/A in Human in the loop dialogue systems, Águas Lopes 11:20 AM

Challenges around collecting and processing quality data have hampered progress in data-driven dialogue models. Previous approaches are moving away from costly, resource-intensive lab settings, where collection is slow but where the data is deemed of high quality. The advent of crowd-sourcing platforms, such as Amazon Mechanical Turk, has provided researchers with an alternative cost-effective and rapid way to collect data. However, the collection of fluid, natural spoken or textual interaction can be challenging, particularly between two crowd-sourced workers. In this study, we compare the performance of dialogue models for the same interaction task but collected in two different settings: in the lab vs. crowd-sourced. We find that fewer lab dialogues are needed to reach similar accuracy, less than half the amount of lab data as crowd-sourced data. We discuss the advantages and disadvantages of each data collection method.

Abstract 14: Contributed Talk 4 Q/A in Human in the loop dialogue systems, Huang, Chen 11:30 AM

Emotion and empathy are examples of human qualities lacking in many human-machine interactions. The goal of our work is to generate engaging dialogue grounded in a user-shared image with increased emotion and empathy while minimizing socially inappropriate or offensive outputs. We release the Neural Image Commenting Evaluation (NICE) dataset consisting of almost two million images and their corresponding, human-generated comments, as well as a set of baseline models and over 28,000 human annotated samples. Instead of relying on manually labeled emotions, we also use automatically generated linguistic representations as a source of weakly supervised labels. Based on the annotations, we define two different task settings on the NICE dataset. Then, we propose a novel model - Modeling Affect Generation for Image Comments (MAGIC) - which aims to generate comments for images, conditioned on linguistic representations that capture style and affect, and to help generate more empathetic, emotional, engaging and socially appropriate comments. Using this model we achieve state-of-the-art performance on one setting and set a benchmark for the NICE dataset. Experiments show that our proposed method can generate more human-like and engaging image comments.

Abstract 16: Invited Talk 4 Q/A - Jason Weston in Human in the loop dialogue systems, Weston 12:50 PM

(Towards) Learning from Conversing

Abstract 17: Invited Talk 5 Q/A - Zhou Yu in Human in the loop dialogue systems, Yu 01:05 PM

Augment Intelligence with Multimodal Information

Abstract 18: Invited Talk 6 Q/A - Gokhan Tür in Human in the loop dialogue systems, Tur 01:20 PM

Recent advances in deep learning based methods for language processing, especially using self-supervised learning methods, have resulted in new excitement towards building more sophisticated Conversational AI systems. While this is partially true for social chatbots or retrieval based applications, the underlying skeleton of the goal oriented systems has remained unchanged: still most language understanding models rely on supervised methods with manually annotated datasets, even though the resulting performances are significantly better with much less data. In this talk I will cover two directions we are exploring to break from this: The first approach aims to incorporate multimodal information for better understanding and semantic grounding. The second part introduces an interactive self-supervision method to gather immediate actionable user feedback, converting frictional moments into learning opportunities for interactive learning.

Abstract 20: Contributed Talk 5 Q/A in Human in the loop dialogue systems, Ng 01:55 PM

Building user trust in dialogue agents requires smooth and consistent dialogue exchanges. However, agents can easily lose conversational context and generate irrelevant utterances. We call these situations dialogue breakdown, where agent utterances prevent users from continuing the conversation. Building systems to detect dialogue breakdown allows agents to recover appropriately or avoid breakdown entirely. In this paper we investigate the use of semi-supervised learning methods to improve dialogue breakdown detection, including continued pre-training on the dataset and a manifold-based data augmentation method. We demonstrate the effectiveness of these methods on the Dialogue Breakdown Detection Challenge (DBDC) English shared task. Our submissions to the 2020 DBDC5 shared task place first, beating baselines and other submissions by over 12% accuracy. In ablations on DBDC4 data from 2019, our semi-supervised learning methods improve the performance of a baseline BERT model by 2% accuracy. These methods are applicable generally to any dialogue task and provide a simple way to improve model performance.
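A minimal sketch of the supervised core of such a breakdown detector: fine-tune a pretrained encoder as a binary classifier with the Hugging Face transformers library. The semi-supervised pieces the paper adds (continued pre-training on dialogue data and manifold-based augmentation) are omitted here, and the model name and toy batch are placeholder assumptions.

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    # Binary label: does the last system utterance break the dialogue?
    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2)

    contexts = ["USER: hi SYSTEM: the weather is potato"]  # toy example
    labels = torch.tensor([1])                             # 1 = breakdown

    batch = tok(contexts, padding=True, truncation=True, return_tensors="pt")
    optim = torch.optim.AdamW(model.parameters(), lr=2e-5)
    loss = model(**batch, labels=labels).loss   # cross-entropy over 2 classes
    loss.backward()
    optim.step()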

Abstract 21: Contributed Talk 6 Q/A in Human in the loop dialogue systems, Lin 02:05 PM

Goal-oriented dialog systems enable users to complete specific goals like requesting information about a movie or booking a ticket. Typically the dialog system pipeline contains multiple ML models, including natural language understanding, state tracking and action prediction (policy learning). These models are trained through a combination of supervised or reinforcement learning methods and therefore require collection of labeled domain specific datasets. However, collecting annotated datasets with language and dialog-flow variations is expensive, time-consuming and scales poorly due to human involvement. In this paper, we propose an approach for automatically creating a large corpus of annotated dialogs from a few thoroughly annotated sample dialogs and the dialog schema. Our approach includes a novel goal-sampling technique for sampling plausible user goals and a dialog simulation technique that uses heuristic interplay between the user and the system, where the user tries to achieve the sampled goal. We validate our approach by generating data and training three different downstream conversational ML models. We achieve 18-50% relative accuracy improvements on a held-out test set compared to a baseline dialog generation approach that only samples natural language and entity value variations from existing catalogs but does not generate any novel dialog flow variations. We also qualitatively establish that the proposed approach is better than the baseline.
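A toy rendition of the goal-sampling plus simulation recipe described above: sample a user goal from the schema's catalogs, then let scripted user and system turns play out until the goal is satisfied. The schema, slot names and utterance templates below are invented purely for illustration and are far simpler than the paper's simulator.

    import random

    SCHEMA = {"cuisine": ["thai", "pizza"], "time": ["6pm", "8pm"]}

    def sample_goal():
        # A plausible user goal is one value per slot from the catalogs.
        return {slot: random.choice(values) for slot, values in SCHEMA.items()}

    def simulate_dialog(goal):
        state, dialog = {}, []
        for slot, value in goal.items():            # user pursues the goal
            dialog.append(("user", f"i want {slot} = {value}"))
            state[slot] = value                     # system tracks state
            dialog.append(("system", f"ok, {slot} set to {value}"))
        dialog.append(("system", "booking confirmed"))
        return dialog

    for turn in simulate_dialog(sample_goal()):
        print(turn)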
Abstract 23: Invited Talk 7 Q/A - Ankur Parikh in Human in the loop dialogue systems, Parikh 03:30 PM

Despite large advances in neural text generation in terms of fluency, existing generation techniques are prone to hallucination and often produce output that is unfaithful or irrelevant to the source text. In this talk, we take a multi-faceted approach to this problem from 3 aspects: data, evaluation, and modeling. From the data standpoint, we propose ToTTo, a tables-to-text dataset with high quality annotator-revised references that we hope can serve as a benchmark for the high precision text generation task. While the dataset is challenging, existing n-gram based evaluation metrics are often insufficient to detect hallucinations. To this end, we propose BLEURT, a fully learnt end-to-end metric based on transfer learning that can quickly adapt to measure specific evaluation criteria. Finally, we propose a model based on confidence decoding to mitigate hallucinations.

Abstract 24: Invited Talk 8 Q/A - Percy Liang in Human in the loop dialogue systems, Liang 03:45 PM

Natural language promises to be the ultimate interface for interacting with computers, allowing users to effortlessly tap into the wealth of digital information and extract insights from it. Today, virtual assistants such as Alexa, Siri, and Google Assistant have given a glimpse into how this long-standing dream can become a reality, but there is still much work to be done. In this talk, I will discuss building natural language interfaces based on semantic parsing, which converts natural language into programs that can be executed by a computer. There are multiple challenges for building semantic parsers: how to acquire data without requiring laborious annotation, how to represent the meaning of sentences, and perhaps most importantly, how to widen the domains and capabilities of a semantic parser. Finally, I will talk about a new promising paradigm for tackling these challenges based on learning interactively from users.
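To illustrate what "converting natural language into programs that can be executed by a computer" means in the simplest possible terms, here is a hand-written pattern-to-program mapping over a toy database. A real semantic parser learns this mapping from data; every name, pattern and value below is hypothetical.

    # Map an utterance to a small executable "program" (a Python lambda).
    DB = {"paris": 2.1, "london": 8.9, "tokyo": 14.0}  # city -> population (M)

    PATTERNS = {
        "population of": lambda city: DB[city],
        "largest city": lambda _: max(DB, key=DB.get),
    }

    def parse_and_execute(utterance):
        for trigger, program in PATTERNS.items():
            if trigger in utterance.lower():
                arg = utterance.lower().split()[-1].strip("?")
                return program(arg)
        raise ValueError("unparseable utterance")

    print(parse_and_execute("What is the population of tokyo?"))  # 14.0
    print(parse_and_execute("Which is the largest city?"))        # tokyo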

Abstract 25: Invited Talk 9 Q/A - Alexander Rudnicky in Human in the loop dialogue systems, Rudnicky 04:00 PM

We have two different communities in spoken language interaction, one focused on goal-oriented dialog systems, the other on open-domain conversational agents. The latter has allowed us to focus on the mechanics of conversation and on the role of social behaviors. This talk describes some of our recent work on conversation systems.


The pre-registration experiment: an alternative publication model for machine learning research

Luca Bertinetto, João Henriques, Samuel Albanie, Michela Paganini, Gul Varol

Fri Dec 11, 06:15 AM

Machine learning research has benefited considerably from the adoption of standardised public benchmarks. In this workshop proposal, we do not argue against the importance of these benchmarks, but rather against the current incentive system and its heavy reliance upon performance as a proxy for scientific progress. The status quo incentivises researchers to "beat the state of the art", potentially at the expense of deep scientific understanding and rigorous experimental design. Since typically only positive results are rewarded, the negative results inevitably encountered during research are often omitted, allowing many other groups to unknowingly and wastefully repeat the same negative findings. Pre-registration is a publishing and reviewing model that aims to address these issues by changing the incentive system. A pre-registered paper is a regular paper that is submitted for peer-review without any experimental results, describing instead an experimental protocol to be followed after the paper is accepted. This implies that it is important for the authors to make compelling arguments from theory or past published evidence. As for reviewers, they must assess these arguments together with the quality of the experimental design, rather than comparing numeric results. In this workshop, we propose to conduct a full pilot study in pre-registration for machine learning. It follows a successful small-scale trial of pre-registration in computer vision and is more broadly inspired by the success of pre-registration in the life sciences.

Schedule

06:15 AM Opening Remarks Bertinetto
06:31 AM Francis Bach - Where is Machine Learning Going? Bach
07:01 AM Yoshua Bengio - Incentives for Researchers Bengio
07:31 AM Contributed talk - Contrastive Self-Supervised Learning for Skeleton Action Recognition Du
07:36 AM Contributed talk - PCA Retargeting: Encoding Linear Shape Models as Convolutional Mesh Autoencoders O'Sullivan
07:41 AM Contributed talk - Testing the Genomic Bottleneck Hypothesis in Hebbian Meta-Learning Palm
07:46 AM Contributed talk - Policy Convergence Under the Influence of Antagonistic Agents in Markov Games Dowling
08:01 AM Poster session (on gather.town)
09:00 AM Break 1
10:31 AM Joelle Pineau - Can pre-registration lead to better reproducibility in ML research? Pineau
11:01 AM Contributed talk - Confronting Domain Shift in Trained Neural Networks Martinez
11:06 AM Contributed talk - Unsupervised Resource Allocation with Graph Neural Networks Cranmer
11:11 AM Contributed talk - FedPerf: A Practitioners' Guide to Performance of Federated Learning Algorithms Semwal
11:16 AM Contributed talk - On the low-density latent regions of VAE-based language models Li
11:31 AM Jessica Zosa Forde - Build, Start, Run, Push: Computational Registration of ML Experiments Forde
12:00 PM Introduction to break 2
12:01 PM Break 2
12:31 PM Kirstie Whitaker - The Turing Way: Transparent research through the scientific lifecycle Whitaker
01:01 PM Open Discussion
N/A Closing remarks

Differentiable computer vision, graphics, and physics in machine learning

Krishna Jatavallabhula, Kelsey Allen, Victoria Dean, Johanna Hansen, Shuran Song, Florian Shkurti, Liam Paull, Derek Nowrouzezahrai, Josh Tenenbaum

Fri Dec 11, 06:45 AM

"Differentiable programs" are parameterized programs that allow themselves to be rewritten by gradient-based optimization. They are ubiquitous in modern-day machine learning. Recently, explicitly encoding our knowledge of the rules of the world in the form of differentiable programs has become more popular. In particular, differentiable realizations of well-studied processes such as physics, rendering, projective geometry, and optimization, to name a few, have enabled the design of several novel learning techniques. For example, many approaches have been proposed for self-supervised learning of depth estimation from unlabeled videos. Differentiable 3D reconstruction pipelines have demonstrated the potential for task-driven representation learning. A number of differentiable rendering approaches have been shown to enable single-view 3D reconstruction and other inverse graphics tasks (without requiring any form of 3D supervision). Differentiable physics simulators are being built to perform physical parameter estimation from video or for model-predictive control. While these advances have largely occurred in isolation, recent efforts have attempted to bridge the gap between the aforementioned areas. Narrowing the gaps between these otherwise isolated disciplines holds tremendous potential to yield new research directions and solve long-standing problems, particularly in understanding and reasoning about the 3D world.

Hence, we propose the "first workshop on differentiable computer vision, graphics, and physics in machine learning" with the aim of:
1. Narrowing the gap and fostering synergies between the computer vision, graphics, physics, and machine learning communities
2. Debating the promise and perils of differentiable methods, and identifying challenges that need to be overcome
3. Raising awareness about these techniques to the larger ML community
4. Discussing the broader impact of such techniques, and any ethical implications thereof.

Schedule

06:45 AM Opening remarks Jatavallabhula, Allen, Hansen, Dean
07:00 AM Sanja Fidler Fidler
07:30 AM Andrea Tagliasacchi Tagliasacchi
08:02 AM Peter Battaglia Battaglia
08:32 AM Peter Battaglia - Q&A
08:38 AM Camillo Jose Taylor Taylor
08:54 AM Camillo Jose Taylor - Q&A
09:00 AM Oral 01: phiflow - A differentiable PDE solving framework for deep learning via physical simulations Thuerey
09:13 AM Oral 02: Differentiable HDR image synthesis using multi-exposure images Kim
09:23 AM Oral 03: DELUCA - Differentiable control library - environments, methods, and benchmarking Gradu
09:35 AM Oral 04: Blendshape-augmented facial action units detection Cui
09:44 AM Oral 05: Inverse articulated-body dynamics from video via variational sequential Monte-Carlo Biderman
09:58 AM Contributed Talk - Q&A
10:10 AM Bethany Lusch Lusch
10:36 AM Bethany Lusch - Q&A

10:42 AM Yuanming Hu Hu
11:14 AM Yuanming Hu - Q&A
11:20 AM Georgia Gkioxari Gkioxari
11:40 AM Georgia Gkioxari - Q&A
11:46 AM Ming Lin Lin
12:16 PM Panel Discussion
01:15 PM Poster session (gather.town)
N/A Poster 01: Using differentiable physics for self-supervised assimilation of chaotic dynamical systems McCabe
N/A Poster 02: Learned equivariant rendering without transformation supervision Resnick
N/A Poster 03: Differentiable data augmentation with Kornia Shi
N/A Poster 04: Semantic adversarial robustness with differentiable ray-tracing Venkatesh
N/A Poster 05: Inverse graphics GAN Lunz
N/A Poster 06: Instance-wise depth and motion learning from monocular videos Lee
N/A Poster 07: System level differentiable simulation of radio access networks Rivkin
N/A Poster 08: Solving physics puzzles by reasoning about paths Harter
N/A Poster 09: Sparse-input neural network augmentations for differentiable simulators Heiden, Millard
N/A Poster 10: Tractable loss function and color image generation of multinary restricted Boltzmann machine Hwang
N/A Poster 11: Differentiable path tracing by regularizing discontinuities Quinn
N/A Poster 12: Spring-Rod system identification via differentiable physics engine Wang
N/A Poster 13: End-to-end differentiable 6DoF object pose estimation with local and global constraints Gupta, Medhi, Chattopadhyay, Gupta
N/A Poster 14: MSR-Net: Multi-scale relighting network for one-to-one relighting Shah
N/A Poster 15: Towards end-to-end training of proposal-based 3D human pose estimation Ajisafe


Causal Discovery and Causality-Inspired Machine Learning

Biwei Huang, Sara Magliacane, Kun Zhang, Danielle Belgrave, Elias Bareinboim, Daniel Malinsky, Thomas Richardson, Christopher Meek, Peter Spirtes, Bernhard Schölkopf

Fri Dec 11, 06:50 AM

Causality is a fundamental notion in science and engineering, and one of the fundamental problems in the field is how to find the causal structure or the underlying causal model. For instance, one focus of this workshop is on *causal discovery*, i.e., how can we discover causal structure over a set of variables from observational data with automated procedures? Another area of interest is *how a causal perspective may help understand and solve advanced machine learning problems*.

Recent years have seen impressive progress in theoretical and algorithmic developments of causal discovery from various types of data (e.g., from i.i.d. data, under distribution shifts or in nonstationary settings, under latent confounding or selection bias, or with missing data), as well as in practical applications (such as in neuroscience, climate, biology, and epidemiology). However, many practical issues, including confounding, the large scale of the data, the presence of measurement error, and complex causal mechanisms, are still to be properly addressed, to achieve reliable causal discovery in practice.

Moreover, causality-inspired machine learning (in the context of transfer learning, reinforcement learning, deep learning, etc.) leverages ideas from causality to improve generalization, robustness, interpretability, and sample efficiency and is attracting more and more interest in Machine Learning (ML) and Artificial Intelligence. Despite the benefit of the causal view in transfer learning and reinforcement learning, some tasks in ML, such as dealing with adversarial attacks and learning disentangled representations, are closely related to the causal view but are currently underexplored, and cross-disciplinary efforts may facilitate the anticipated progress.
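As a small taste of automated causal discovery from observational data, the sketch below implements a crude additive-noise-model heuristic for the bivariate case: regress in both directions and prefer the direction whose residuals look independent of the putative cause. Real methods use flexible regressors with proper independence tests (e.g., HSIC); this illustrative version uses polynomial fits and a correlation proxy, and all names are invented.

    import numpy as np

    def dependence_proxy(cause, effect):
        coeffs = np.polyfit(cause, effect, 3)       # simple nonlinear fit
        resid = effect - np.polyval(coeffs, cause)
        # Correlation between the cause and squared residuals stands in
        # for a real independence test such as HSIC.
        return abs(np.corrcoef(cause, resid ** 2)[0, 1])

    def anm_direction(x, y):
        return "x -> y" if dependence_proxy(x, y) < dependence_proxy(y, x) \
               else "y -> x"

    rng = np.random.default_rng(0)
    x = rng.uniform(-2, 2, 2000)
    y = x ** 3 + rng.normal(scale=1.0, size=2000)   # ground truth: x causes y
    print(anm_direction(x, y))                      # expected: "x -> y"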

learning, deep learning, etc.) leverages ideas 09:30 Cofee Break & Social on from causality to improve generalization, AM Gather.Town robustness, interpretability, and sample efciency 10:00 Spotlights 1 and is attracting more and more interest in AM Machine Learning (ML) and Artifcial Intelligence. 10:30 Poster Session 1 (Gather.Town) Despite the beneft of the causal view in transfer AM learning and reinforcement learning, some tasks 11:30 Cofee Break & Social on in ML, such as dealing with adversarial attacks AM Gather.Town and learning disentangled representations, are 12:00 Keynotes: Dominik Janzing Janzing closely related to the causal view but are PM currently underexplored, and cross-disciplinary 12:30 Keynotes: Caroline Uhler Uhler eforts may facilitate the anticipated progress. PM

This workshop aims to provide a forum for 01:00 Cofee Break & Social on discussion for researchers and practitioners in PM Gather.Town machine learning, statistics, healthcare, and 01:30 Keynotes: Karthika Mohan Mohan other disciplines to share their recent research in PM causal discovery and to explore the possibility of 02:00 Oral: Ignavier Ng Ng interdisciplinary collaboration. We also PM particularly encourage real applications, such as 02:10 Cofee Break & Social on in neuroscience, biology, and climate science, of PM Gather.Town causal discovery methods. 02:40 Keynotes: Shohei Shimizu Shimizu PM ************* 03:10 Spotlights 2 After each keynote, there will be 5 minutes for a PM live Q&A. You may post your questions in 03:40 Poster Session 2 (Gather.Town) Rocket.Chat before or during the keynote time. PM The poster session and the virtual cofee break 04:40 Closing Remarks will be on Gather.Town. There is no Q&A for orals PM and spotlight talks, but all papers will attend the poster session and you can interact with authors there. More details will come soon. Self-Supervised Learning for Speech and Schedule Audio Processing

Abdelrahman Mohamed, Hung-yi Lee, Shinji Watanabe, Shang-Wen Li, Tara Sainath, Karen Livescu

Fri Dec 11, 06:50 AM

There is a trend in the machine learning community to adopt self-supervised approaches to pre-train deep networks. Self-supervised learning utilizes proxy supervised learning tasks, for example, distinguishing parts of the input signal from distractors, or generating masked input segments conditioned on the unmasked ones, to obtain training data from unlabeled corpora. These approaches make it possible to use a tremendous amount of unlabeled data on the web to train large networks and solve complicated tasks. ELMo, BERT, and GPT in NLP are famous examples in this direction.
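As a minimal illustration of the masked-prediction proxy task described above, the following PyTorch sketch (schematic and illustrative only, not any particular system discussed at the workshop) hides random segments of an unlabeled feature sequence and trains a small Transformer to reconstruct them from the surrounding context:

import torch
import torch.nn as nn

torch.manual_seed(0)
T, D = 100, 40                       # frames per utterance, features per frame
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=D, nhead=4, batch_first=True),
    num_layers=2,
)
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)

for step in range(200):
    x = torch.randn(8, T, D)         # stand-in for a batch of unlabeled features
    mask = torch.zeros(8, T, dtype=torch.bool)
    for b in range(8):               # hide a few random contiguous segments
        for _ in range(3):
            s = torch.randint(0, T - 10, (1,)).item()
            mask[b, s:s + 10] = True
    corrupted = x.clone()
    corrupted[mask] = 0.0            # replace masked frames with zeros
    pred = encoder(corrupted)
    loss = ((pred - x)[mask] ** 2).mean()   # reconstruct only the masked frames
    opt.zero_grad(); loss.backward(); opt.step()

No labels are used anywhere; the supervision signal comes entirely from the hidden parts of the input itself.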

Recently, self-supervised approaches for speech and audio processing have also been gaining attention. These approaches combine methods for utilizing no or partial labels, unpaired text and audio data, contextual text and video supervision, and signals from user interactions. Although the research direction of self-supervised learning is active in speech and audio processing, current works are limited to several problems such as automatic speech recognition, speaker identification, and speech translation, partially due to the diversity of modeling in various speech and audio processing problems. There is still much unexplored territory in this research direction for self-supervised learning.

This workshop will bring concentrated discussions on self-supervision for the field of speech and audio processing via several invited talks, oral and poster sessions with high-quality papers, and a panel of leading researchers from academia and industry. Alongside research work on new self-supervised methods, data, applications, and results, this workshop will call for novel work on understanding, analyzing, and comparing different self-supervision approaches for speech and audio processing. The workshop aims to:
- Review existing and inspire new self-supervised methods and results,
- Motivate the application of self-supervision approaches to more speech and audio processing problems in academia and industry, and encourage discussion amongst experts and practitioners from the two realms,
- Encourage works on studying methods for understanding learned representations, comparing different self-supervision methods, and comparing self-supervision to other self-training as well as transfer learning methods that low-resource speech and audio processing have long utilized,
- Facilitate communication within the field of speech and audio processing (e.g., people who attend conferences such as INTERSPEECH and ICASSP) as well as between the field and the whole machine learning community for sharing knowledge, ideas, and data, and encourage future collaboration to inspire innovation in the field and the whole community.

Schedule

06:50 AM Opening remarks Lee
07:00 AM Invited talk - A Broad Perspective into Self Supervised Learning for Speech Recognition Ramabhadran
07:35 AM Q&A for invited talk - 1
07:45 AM Invited talk - Multimodal Distant Supervision Hasegawa-Johnson
08:20 AM Q&A for invited talk - Multimodal Distant Supervision
08:30 AM Self-Supervised Learning using Contrastive Mixtures for Personalized Speech Enhancement Sivaraman
08:40 AM Self-supervised Pre-training Reduces Label Permutation Instability of Speech Separation Huang
08:50 AM Augmentation adversarial training for self-supervised speaker recognition Huh
09:00 AM Neural Composition: Learning to Generate from Multiple Models Filimonov
09:10 AM Towards Semi-Supervised Semantics Understanding from Speech Lai
09:20 AM The Zero Resource Speech Benchmark 2021: Metrics and baselines for unsupervised spoken language modeling Nguyen
09:30 AM Q&A for contributed talks between 11:30 and 12:30
09:45 AM Break
10:00 AM Invited talk - Speech Processing with Weak Supervision Yu
10:35 AM Q&A for invited talk - Speech Processing with Weak Supervision
10:45 AM Towards Localisation of Keywords in Speech Using Weak Supervision Olaleye
10:55 AM Text-Free Image-to-Speech Synthesis Using Learned Segmental Units Hsu
11:05 AM Self-Supervised Audio-Visual Separation of On-Screen Sounds from Unlabeled Videos Tzinis

11:15 AM Multi-Format Contrastive Learning of Audio Representations van den Oord
11:25 AM Q&A for contributed talks between 1:45 and 2:25
11:40 AM Break
11:55 AM Invited talk - Underfitting and Uncertainty in Self-Supervised Predictive Models Finn
12:30 PM Q&A for invited talk - Underfitting and Uncertainty in Self-Supervised Predictive Models
12:40 PM Invited talk - Towards robust self-supervised learning of speech representations Ravanelli
01:15 PM Q&A for invited talk - Towards robust self-supervised learning of speech representations
01:25 PM Similarity Analysis of Self-Supervised Speech Representations Chung
01:35 PM Representation Learning for Sequence Data with Deep Autoencoding Predictive Components Bai
01:45 PM Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition Zhang
01:55 PM A Correspondence Variational Autoencoder for Unsupervised Acoustic Word Embedding Peng
02:05 PM HUBERT: How much can a bad teacher benefit ASR pre-training? Hsu
02:15 PM Q&A for contributed talks between 4:25 and 5:15
02:30 PM Break
02:45 PM Invited talk - Flexible contextualized speech representation learning for diverse downstream tasks Kirchhoff
03:20 PM Q&A for invited talk - Flexible contextualized speech representation learning for diverse downstream tasks
03:30 PM Invited talk - De-noising Sequence-to-Sequence Pre-training Zettlemoyer
04:05 PM Q&A for invited talk - De-noising Sequence-to-Sequence Pre-training
04:15 PM Closing remark Mohamed


Abstracts (1):

Abstract 35: Invited talk - De-noising Sequence-to-Sequence Pre-training in Self-Supervised Learning for Speech and Audio Processing, Zettlemoyer 03:30 PM

De-noising auto-encoders can be pre-trained at a very large scale by noising and then reconstructing any input text. Existing methods, based on variations of masked language models, have transformed the field and now provide the de facto initialization to be tuned for nearly every task. In this talk, I will present our work on sequence-to-sequence pre-training that introduces and carefully measures the impact of two new types of noising strategies. I will first describe an approach that allows arbitrary noising, by learning to translate any corrupted text back to the original with standard Transformer-based neural machine translation architectures. I will show that the resulting mono-lingual (BART) and multi-lingual (mBART) models provide effective initialization for learning a wide range of discrimination and generation tasks, including question answering, summarization, and machine translation. I will also present our recently introduced MARGE model, where we self-supervise the reconstruction of target text by retrieving a set of related texts (in many languages) and conditioning on them to maximize the likelihood of generating the original. The objective noisily captures aspects of paraphrase, translation, multi-document summarization, and information retrieval, allowing for strong zero-shot performance with no fine-tuning, as well as consistent performance gains when fine-tuned for individual tasks. Together, these techniques provide the most comprehensive set of pre-training methods to date, as well as the first viable alternative to the dominant masked language modeling pre-training paradigm.
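As a pointer for readers who want to experiment: pre-trained BART checkpoints are available in open-source libraries. A minimal sketch, assuming the Hugging Face transformers package and the public facebook/bart-base checkpoint (illustrative only):

from transformers import BartTokenizer, BartForConditionalGeneration

tok = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

# BART is trained to map corrupted text back to the original; <mask> marks a hidden span.
corrupted = "De-noising auto-encoders can be <mask> by noising and then reconstructing any input text."
inputs = tok(corrupted, return_tensors="pt")
out = model.generate(inputs["input_ids"], max_length=40, num_beams=4)
print(tok.decode(out[0], skip_special_tokens=True))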

Machine Learning and the Physical Sciences

Anima Anandkumar, Kyle Cranmer, Shirley Ho, Mr. Prabhat, Lenka Zdeborová, Atilim Gunes Baydin, Juan Carrasquilla, Adji Dieng, Karthik Kashinath, Gilles Louppe, Brian Nord, Michela Paganini, Savannah Thais

Gather Town link: https://neurips.gather.town/app/GS7AwXNphTXVVEZH/NeurIPS%20ML4PS

Fri Dec 11, 07:00 AM

Machine learning methods have had great success in learning complex representations that enable them to make predictions about unobserved data. The physical sciences span problems and challenges at all scales in the universe: from finding exoplanets in trillions of sky pixels, to finding machine-learning-inspired solutions to the quantum many-body problem, to detecting anomalies in event streams from the Large Hadron Collider. Tackling a number of associated data-intensive tasks, including, but not limited to, segmentation, 3D computer vision, sequence modeling, causal reasoning, and efficient probabilistic inference, is critical for furthering scientific discovery. In addition to using machine learning models for scientific discovery, the ability to interpret what a model has learned is receiving an increasing amount of attention.

In this targeted workshop, we would like to bring together computer scientists, mathematicians, and physical scientists who are interested in applying machine learning to various outstanding physical problems, in particular in inverse problems and approximating physical processes; understanding what the learned model really represents; and connecting tools and insights from the physical sciences to the study of machine learning models. In particular, the workshop invites researchers to contribute papers that demonstrate cutting-edge progress in the application of machine learning techniques to real-world problems in the physical sciences, and in using physical insights to understand what the learned model means.

By bringing together machine learning researchers and physical scientists who apply machine learning, we expect to strengthen the interdisciplinary dialogue, introduce exciting new open problems to the broader community, and stimulate the production of new approaches to solving open problems in the sciences. Invited talks from leading individuals in both communities will cover the state-of-the-art techniques and set the stage for this workshop.

Schedule

07:00 AM Session 1 | Opening remarks
07:10 AM Session 1 | Invited talk: Lauren Anderson, "3D Milky Way Dust Map using a Scalable Gaussian Process" Anderson, Baydin
07:35 AM Session 1 | Invited talk Q&A: Lauren Anderson
07:45 AM Session 1 | Invited talk: Michael Bronstein, "Geometric Deep Learning for Functional Protein Design" Bronstein, Baydin
08:10 AM Session 1 | Invited talk Q&A: Michael Bronstein
08:20 AM Session 1 | Poster session
09:50 AM Session 2 | Opening remarks
09:55 AM Session 2 | Invited talk: Estelle Inack, "Variational Neural Annealing" Inack, Baydin
10:20 AM Session 2 | Invited talk Q&A: Estelle Inack
10:30 AM Session 2 | Invited talk: Phiala Shanahan, "Generative Flow Models for Gauge Field Theory" Shanahan, Baydin
10:55 AM Session 2 | Invited talk Q&A: Phiala Shanahan
11:05 AM Session 2 | Poster session
12:35 PM Session 3 | Opening remarks
12:40 PM Session 3 | Invited talk: Laura Waller, "Physics-based Learning for Computational Microscopy" Waller, Baydin
01:05 PM Session 3 | Invited talk Q&A: Laura Waller
01:15 PM Session 3 | Community development breakouts
02:45 PM Session 3 | Feedback from community development breakouts

ML Competitions at the Grassroots (CiML 2020)

Tara Chklovski, Adrienne Mendrik, Amir Banifatemi, Gustavo Stolovitzky

Fri Dec 11, 07:00 AM

For the eighth edition of the CiML (Challenges in Machine Learning) workshop at NeurIPS, our goals are to: 1) increase diversity in the participant community in order to increase the quality of model predictions; 2) identify and share best practices in building AI capability in vulnerable communities; and 3) celebrate pioneers from these communities who are modeling lifelong learning, curiosity, and courage in learning how to use ML to address critical problems in their communities.

The workshop will provide concrete recommendations to the ML community on designing and implementing competitions that are more accessible to a broader public, and more effective in building long-term AI/ML capability.

The workshop will feature keynote speakers from ML, behavioral science, and gender and development, interspersed with small-group discussions around best practices in implementing ML competitions. We will invite submissions of 2-page extended abstracts on topics relating to machine learning competitions, with a special focus on methods of creating diverse datasets, strategies for addressing behavioral barriers to participation in ML competitions from underrepresented communities, and strategies for measuring the long-term impact of participation in an ML competition.

Schedule

07:00 AM Welcome and Opening Remarks
07:15 AM Keynote talk by Isabelle Guyon and Evelyne Viegas - "AI Competitions and the Science Behind Contests" Guyon, Viegas
07:45 AM Live from the Field Moderated Q&A - "ML competitions as a way to engage and educate the broader public" (featuring families and educators from around the world)
08:20 AM Virtual Poster Presentations
09:00 AM Coffee Break
09:15 AM Keynote talk by Saugato Datta
09:45 AM Small Group/Breakout Participant Discussion
10:20 AM Virtual Poster Presentations
10:50 AM Keynote talk by Lara Mangravite "Responsible Data Sharing for AI: Expanding who, what and why" Mangravite
11:30 AM Virtual Poster Presentations
11:50 AM Closing Keynote by Aleksandra (Saška) Mojsilović - "Platforms 4 Good: Realizing the potential of AI in addressing societal challenges" Mojsilovic
12:20 PM Closing Remarks from Organizers


Abstracts (1):

Abstract 7: Small Group/Breakout Participant Discussion in ML Competitions at the Grassroots (CiML 2020), 09:45 AM

Social Norms - What do young people see others around them doing in your community?
Identity Threats - What aspects of young people's identities might participating in coding/ML competitions come into conflict with?
Framing - Are there other ways to frame the call to participate in an ML competition?
Scarcity - What features of the day-to-day lives of those you find hard to reach might deter their participation? What is the decision to (not) compete really about?

Resistance AI Workshop

Suzanne Kite, Mattie Tesfaldet, J Khadijah Abdurahman, William Agnew, Elliot Creager, Agata Foryciarz, Raphael Gontijo Lopes, Pratyusha Kalluri, Marie-Therese Png, Manuel Sabin, Maria Skoularidou, Ramon Vilarino, Rose Wang, Sayash Kapoor, Micah Carroll

Fri Dec 11, 07:00 AM

It has become increasingly clear in recent years that AI research, far from producing neutral tools, has been concentrating power in the hands of governments and companies and away from marginalized communities. Unfortunately, NeurIPS has lacked a venue explicitly dedicated to understanding and addressing the root of these problems. As Black feminist scholar Angela Davis famously said, "Radical simply means grasping things at the root." Resistance AI exposes the root problem of AI to be how technology is used to rearrange power in the world. AI researchers engaged in Resistance AI both resist AI that centralizes power into the hands of the few and dream up and build human/AI systems that put power in the hands of the people. This workshop will enable AI researchers in general, researchers engaged in Resistance AI, and marginalized communities in particular to reflect on AI-fueled inequity and co-create tactics for how to address this issue in our own work.

Logistics:
We will use the main/webinar Zoom + livestream for most events, with interactive events taking place on a separate auxiliary/breakout Zoom or gather.town. Please see our workshop site for details: https://sites.google.com/view/resistance-ai-neurips-20/schedule
See also our welcome doc here for further detail, including community guidelines and where each activity can be found: http://bit.ly/rai-welcome

Schedule

07:00 AM Welcome and Land Acknowledgement Kite
07:15 AM Introduction to Resistance AI & Community guidelines Kite
07:30 AM Dreaming Up Resistance AI Activity Kalluri
08:45 AM Break
09:00 AM Panel 1: Tensions & Cultivating Resistance AI Birhane, Gebru, Raval, Foryciarz, Vilarino
10:00 AM Indigenous Protocols and Artificial Intelligence, Working Group Roundtable Kite, Cordes, Parker Jones, Lewis
11:00 AM Discussion in Small Groups
11:25 AM Introduction to Talks 1
11:30 AM Invited Talk 1: Salomon Kabongo KABENAMUALU Kabongo
11:45 AM Invited Talk 2: Jamelle Watson-Daniels Watson-Daniels
12:00 PM Break
12:30 PM Panel 2: Tensions & Cultivating Resistance AI Gangadharan, Foryciarz, Saba, Khan, Mathew, Marda, Carroll
01:30 PM Poster Session
02:30 PM Break
02:45 PM Introduction to Talks 2
02:50 PM Invited Talk 3: Inioluwa Deborah Raji Raji
03:05 PM Invited Talk 4: Saadia Gabriel Gabriel
03:20 PM Activity: Making Tactics and Commitments
04:15 PM Debrief & Until next time!
05:00 PM Resistance AI Social


Abstracts (2):

Abstract 1: Welcome and Land Acknowledgement in Resistance AI Workshop, Kite 07:00 AM

[main/webinar zoom + livestream]

Abstract 6: Indigenous Protocols and Artificial Intelligence, Working Group Roundtable in Resistance AI Workshop, Kite, Cordes, Parker Jones, Lewis 10:00 AM

For background on this panel topic, please see the recent paper on Indigenous Protocols and Artificial Intelligence: http://www.indigenous-ai.net/position-paper


Workshop on Deep Learning and Inverse Problems

Reinhard Heckel, Paul Hand, Richard Baraniuk, Lenka Zdeborová, Soheil Feizi

Fri Dec 11, 07:30 AM

Learning-based methods, and in particular deep neural networks, have emerged as highly successful and universal tools for image and signal recovery and restoration. They achieve state-of-the-art results on tasks including image denoising, image compression, and image reconstruction from few and noisy measurements. They are starting to be used in important imaging technologies, for example in GE's newest computational tomography scanners and in the newest generation of the iPhone.

The field has a range of theoretical and practical questions that remain unanswered. In particular, learning and neural network-based approaches often lack the guarantees of traditional physics-based methods. Further, while superior on average, learning-based methods can make drastic reconstruction errors, such as hallucinating a tumor in an MRI reconstruction or turning a pixelated picture of Obama into a white male.

This virtual workshop aims at bringing together theoreticians and practitioners in order to chart out recent advances and discuss new directions in deep neural network-based approaches for solving inverse problems in the imaging sciences and beyond. NeurIPS, with its visibility and attendance by experts in machine learning, offers the ideal frame for this exchange of ideas. We will use this virtual format to make this topic accessible to a broader audience than the in-person meeting is able to, as described below.

Schedule

07:30 AM Newcomer presentation Heckel, Hand
07:55 AM Opening Remarks Heckel, Hand, Feizi, Zdeborová, Baraniuk
08:00 AM Victor Lempitsky - Generative Models for Landscapes and Avatars Lempitsky
08:30 AM Thomas Pock - Variational Networks Pock
09:00 AM Risk Quantification in Deep MRI Reconstruction Edupuganti
09:15 AM GAN2GAN: Generative Noise Learning for Blind Denoising with Single Noisy Images Cha
09:30 AM Discussion
10:00 AM Rebecca Willett - Model Adaptation for Inverse Problems in Imaging Willett
10:30 AM Stefano Ermon - Generative Modeling via Denoising Ermon
11:00 AM Compressed Sensing with Approximate Priors via Conditional Resampling Jalal
11:15 AM Chris Metzler - Approximate Message Passing (AMP) Algorithms for Computational Imaging Metzler
11:30 AM Discussion
01:00 PM Poster Session
02:00 PM Peyman Milanfar - Denoising as a Building Block: Theory and Applications Milanfar
02:30 PM Rachel Ward Ward
03:00 PM Larry Zitnick - fastMRI Zitnick
03:30 PM Discussion


Abstracts (8):

Abstract 1: Newcomer presentation in Workshop on Deep Learning and Inverse Problems, Heckel, Hand 07:30 AM

This session consists of a 15-minute talk and a 10-minute Q/A geared toward newcomers to the field, introducing them to the major questions and approaches related to deep learning and inverse problems.

Abstract 5: Risk Quantification in Deep MRI Reconstruction in Workshop on Deep Learning and Inverse Problems, Edupuganti 09:00 AM

Reliable medical image recovery is crucial for accurate patient diagnoses, but little prior work has centered on quantifying uncertainty when using non-transparent deep learning approaches to reconstruct high-quality images from limited measured data. In this study, we develop methods to address these concerns, utilizing a VAE as a probabilistic recovery algorithm for pediatric knee MR imaging. Through our use of SURE, which examines the end-to-end network Jacobian, we demonstrate a new and rigorous metric for assessing risk in medical image recovery that applies universally across model architectures.
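The SURE identity underlying this abstract can be sketched generically. The following toy example (illustrative only; the paper's estimator builds on the same identity but examines the end-to-end network Jacobian) estimates the divergence term with a Monte-Carlo probe:

import torch

def monte_carlo_sure(f, y, sigma, eps=1e-3):
    """Unbiased risk estimate for y = x + N(0, sigma^2 I) and a denoiser/reconstructor f."""
    n = y.numel()
    fy = f(y)
    b = torch.randn_like(y)                        # Hutchinson probe vector
    div = (b * (f(y + eps * b) - fy)).sum() / eps  # ~ trace of the Jacobian of f at y
    return ((fy - y) ** 2).sum() / n - sigma ** 2 + (2 * sigma ** 2 / n) * div

# Toy check with a linear shrinkage "denoiser", where SURE matches the true MSE in expectation.
x = torch.zeros(10_000)
sigma = 0.5
y = x + sigma * torch.randn_like(x)
f = lambda z: 0.8 * z
print("SURE estimate:", monte_carlo_sure(f, y, sigma).item())
print("true MSE:     ", ((f(y) - x) ** 2).mean().item())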

Abstract 6: GAN2GAN: Generative Noise Learning for Blind Denoising with Single Noisy Images in Workshop on Deep Learning and Inverse Problems, Cha 09:15 AM

We tackle a challenging blind image denoising problem, in which only single distinct noisy images are available for training a denoiser, and no information about the noise is known, except that it is zero-mean, additive, and independent of the clean image. In such a setting, which often occurs in practice, it is not possible to train a denoiser with standard discriminative training or with the recently developed Noise2Noise (N2N) training; the former requires the underlying clean image for each given noisy image, and the latter requires two independently realized noisy images for each clean image. To that end, we propose the GAN2GAN (Generated-Artificial-Noise to Generated-Artificial-Noise) method, which first learns a generative model that can 1) simulate the noise in the given noisy images and 2) generate rough, noisy estimates of the clean images, and then 3) iteratively trains a denoiser with subsequently synthesized noisy image pairs (as in N2N), obtained from the generative model. Our results show that the denoiser trained with GAN2GAN achieves impressive denoising performance on both synthetic and real-world datasets in the blind denoising setting.


Abstract 7: Discussion in Workshop on Deep Learning and Inverse Problems, 09:30 AM

Visit the Gather.town to discuss with speakers and other attendees.


Abstract 10: Compressed Sensing with Approximate Priors via Conditional Resampling in Workshop on Deep Learning and Inverse Problems, Jalal 11:00 AM

We characterize the measurement complexity of compressed sensing of signals drawn from a known prior distribution, even when the support of the prior is the entire space (rather than, say, sparse vectors). We show, for Gaussian measurements and any prior distribution on the signal, that the conditional resampling estimator achieves near-optimal recovery guarantees. Moreover, this result is robust to model mismatch, as long as the distribution estimate (e.g., from an invertible generative model) is close to the true distribution in Wasserstein distance. We implement the conditional resampling estimator for deep generative priors using Langevin dynamics, and empirically find that it produces accurate estimates with more diversity than MAP.
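The Langevin-dynamics implementation mentioned at the end of the abstract can be illustrated with a toy Gaussian stand-in for the learned prior (a minimal sketch, not the authors' code):

import numpy as np

rng = np.random.default_rng(0)
n, m, sigma = 50, 20, 0.1
A = rng.normal(size=(m, n)) / np.sqrt(m)          # Gaussian measurement matrix
x_true = rng.normal(size=n)
y = A @ x_true + sigma * rng.normal(size=m)       # noisy compressed measurements

def grad_log_post(x):
    grad_lik = A.T @ (y - A @ x) / sigma**2       # gradient of log N(y; Ax, sigma^2 I)
    grad_prior = -x                               # gradient of log N(x; 0, I), a toy prior
    return grad_lik + grad_prior

# Unadjusted Langevin dynamics: x <- x + eta * grad + sqrt(2*eta) * noise
eta = 1e-4
x = np.zeros(n)
for _ in range(20_000):
    x = x + eta * grad_log_post(x) + np.sqrt(2 * eta) * rng.normal(size=n)

print("relative error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))

In the paper's setting, the Gaussian log-prior gradient would be replaced by the score of a deep generative model, and repeated runs yield a diverse set of posterior samples rather than a single MAP point.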

Abstract 12: Discussion in Workshop on Deep Learning and Inverse Problems, 11:30 AM

Visit the Gather.town to discuss with speakers and other attendees.


Abstract 13: Poster Session in Workshop on Deep Learning and Inverse Problems, 01:00 PM

Visit the gather.town to see the posters.


Abstract 17: Discussion in Workshop on Deep Learning and Inverse Problems, 03:30 PM

Visit the Gather.town to discuss with speakers and other attendees.

3rd Robot Learning Workshop

Fri Dec 11, 07:30 AM Biyik 08:31 Contributed Talk 1 - "Accelerating In the proposed workshop, we aim to discuss the AM Reinforcement Learning with challenges and opportunities for machine Learned Skill Priors" (Best Paper learning research in the context of physical Runner-Up) Pertsch systems. This discussion involves the 08:45 Poster Session 1 presentation of recent methods and the AM experiences made during the deployment on 09:46 Invited Talk - "Object- and Action- real-world platforms. Such deployment requires a AM Centric Representational Robot signifcant degree of generalization. Namely, the Learning" Florence, Seita real world is vastly more complex and diverse 10:31 Invited Talk - "State of Robotics @ compared to fxed curated datasets and AM Google" Parada simulations. Deployed machine learning models 11:15 Break must scale to this complexity, be able to adapt to AM novel situations, and recover from mistakes. Moreover, the workshop aims to strengthen 03:00 Discussion Panel Florence, Sadigh, further the ties between the robotics and PM Parada, Bohg, Calandra, Stone, Ramos machine learning communities by discussing how 04:01 Invited Talk - "Learning-based their respective recent directions result in new PM Control of a Legged Robot" Hwangbo, challenges, requirements, and opportunities for Byun future research. 04:46 Contributed Talk 2 - "Multi-Robot PM Deep Reinforcement Learning via Following the success of previous robot learning Hierarchically Integrated Models" workshops at NeurIPS, the goal of this workshop (Best Paper) Kang is to bring together a diverse set of scientists at 05:00 Break various stages of their careers and foster PM interdisciplinary communication and discussion. 06:15 Poster Session 2 In contrast to the previous robot learning PM 91 Dec. 11, 2020

Abstracts (6):

Abstract 2: Invited Talk - "Walking the Boundary of Learning and Interaction" in 3rd Robot Learning Workshop, Sadigh, Biyik 07:45 AM

There have been significant advances in the field of robot learning in the past decade. However, many challenges still remain when considering how robot learning can advance interactive agents such as robots that collaborate with humans. This includes autonomous vehicles that interact with human-driven vehicles or pedestrians, service robots collaborating with their users in their homes over short or long periods of time, or assistive robots helping patients with disabilities. This introduces an opportunity for developing new robot learning algorithms that can help advance interactive autonomy.

In this talk, we will discuss a formalism for human-robot interaction built upon ideas from representation learning. Specifically, we will first discuss the notion of latent strategies, low-dimensional representations sufficient for capturing non-stationary interactions. We will then talk about the challenges of learning such representations when interacting with humans, and how we can develop data-efficient techniques that enable actively learning computational models of human behavior from demonstrations and preferences.

Abstract 3: Contributed Talk 1 - "Accelerating Reinforcement Learning with Learned Skill Priors" (Best Paper Runner-Up) in 3rd Robot Learning Workshop, Pertsch 08:31 AM

Intelligent agents rely heavily on prior experience when learning a new task, yet most modern reinforcement learning (RL) approaches learn every task from scratch. One approach for leveraging prior knowledge is to transfer skills learned on prior tasks to the new task. However, as the amount of prior experience increases, the number of transferable skills grows too, making it challenging to explore the full set of available skills during downstream learning. Yet, intuitively, not all skills should be explored with equal probability; for example, information about the current state can hint at which skills are promising to explore. In this work, we propose to implement this intuition by learning a prior over skills. We propose a deep latent variable model that jointly learns an embedding space of skills and the skill prior from offline agent experience. We then extend common maximum-entropy RL approaches to use skill priors to guide downstream learning. We validate our approach, SPiRL (Skill-Prior RL), on complex navigation and robotic manipulation tasks and show that learned skill priors are essential for effective skill transfer from rich datasets. Videos and code are available at https://clvrai.com/spirl.
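The key objective change can be sketched compactly: where maximum-entropy RL adds an entropy bonus, the skill-prior variant instead penalizes divergence from the learned prior. A schematic PyTorch sketch (illustrative only; module shapes and the alpha weight are made up, and this is not the authors' released code):

import torch
import torch.nn as nn
import torch.distributions as D

class GaussianHead(nn.Module):
    """Maps a state to a diagonal Gaussian over latent skills z (hypothetical shapes)."""
    def __init__(self, state_dim=4, skill_dim=8):
        super().__init__()
        self.net = nn.Linear(state_dim, 2 * skill_dim)
    def forward(self, s):
        mu, log_std = self.net(s).chunk(2, dim=-1)
        return D.Normal(mu, log_std.clamp(-5, 2).exp())

policy, prior = GaussianHead(), GaussianHead()   # the prior would be pre-trained on offline data
critic = nn.Sequential(nn.Linear(4 + 8, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(policy.parameters(), lr=3e-4)

s = torch.randn(16, 4)                           # a batch of states
dist = policy(s)
z = dist.rsample()                               # reparameterized skill sample
kl = D.kl_divergence(dist, prior(s)).sum(-1)     # KL(pi(z|s) || p_prior(z|s)) replaces entropy
q = critic(torch.cat([s, z], dim=-1)).squeeze(-1)
loss = (0.1 * kl - q).mean()                     # maximize Q while staying close to the prior
opt.zero_grad(); loss.backward(); opt.step()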

Abstract 5: Invited Talk - "Object- and Action-Centric Representational Robot Learning" in 3rd Robot Learning Workshop, Florence, Seita 09:46 AM

In this talk we'll discuss different views on representations for robot learning, in particular towards the goal of precise, generalizable vision-based manipulation skills that are sample-efficient and scalable to train. Object-centric representations, on the one hand, can enable using rich additional sources of learning, and can enable various efficient downstream behaviors. Action-centric representations, on the other hand, can learn high-level planning, and do not have to explicitly instantiate objectness. As case studies, we'll look at two recent papers in these two areas.


Abstract 6: Invited Talk - "State of Robotics @ Google" in 3rd Robot Learning Workshop, Parada 10:31 AM

Robotics@Google's mission is to make robots useful in the real world through machine learning. We are excited about a new model for robotics, designed for generalization across diverse environments and instructions. This model is focused on scalable data-driven learning, which is task-agnostic, leverages simulation, learns from past experience, and can be quickly adapted to work in the real world through limited interactions. In this talk, we'll share some of our recent work in this direction in both manipulation and locomotion applications.

Abstract 9: Invited Talk - "Learning-based Control of a Legged Robot" in 3rd Robot Learning Workshop, Hwangbo, Byun 04:01 PM

Legged robots pose one of the greatest challenges in robotics. Dynamic and agile maneuvers of animals cannot be imitated by existing methods that are crafted by humans. A compelling alternative is reinforcement learning, which requires minimal craftsmanship and promotes the natural evolution of a control policy. However, so far, reinforcement learning research for legged robots has mainly been limited to simulation, and only a few comparably simple examples have been deployed on real systems. The primary reason is that training with real robots, particularly with dynamically balancing systems, is complicated and expensive. Recent algorithmic improvements have made simulation cheaper and more accurate at the same time. Leveraging such tools to obtain control policies is thus a seemingly promising direction. However, a few simulation-related issues have to be addressed before utilizing them in practice. The biggest obstacle is the so-called reality gap -- discrepancies between the simulated and the real system. Hand-crafted models often fail to achieve a reasonable accuracy due to the complexities of the actuation systems of existing robots. This talk will focus on how such obstacles can be overcome. The main approaches are twofold: a fast and accurate algorithm for solving contact dynamics, and a data-driven simulation-augmentation method using deep learning. These methods are applied to the ANYmal robot, a sophisticated medium-dog-sized quadrupedal system. Using policies trained in simulation, the quadrupedal machine achieves locomotion skills that go beyond what had been achieved with prior methods: ANYmal is capable of precisely and energy-efficiently following high-level body velocity commands, running faster than ever before, and recovering from falling even in complex configurations.


Abstract 10: Contributed Talk 2 - "Multi-Robot Deep Reinforcement Learning via Hierarchically Integrated Models" (Best Paper) in 3rd Robot Learning Workshop, Kang 04:46 PM

Deep reinforcement learning algorithms require large and diverse datasets in order to learn successful perception-based control policies. However, gathering such datasets with a single robot can be prohibitively expensive. In contrast, collecting data with multiple platforms with possibly different dynamics is a more scalable approach to large-scale data collection. But how can deep reinforcement learning algorithms leverage these dynamically heterogeneous datasets? In this work, we propose a deep reinforcement learning algorithm with hierarchically integrated models (HInt). At training time, HInt learns separate perception and dynamics models, and at test time, HInt integrates the two models in a hierarchical manner and plans actions with the integrated model. This method of planning with hierarchically integrated models allows the algorithm to train on datasets gathered by a variety of different platforms, while respecting the physical capabilities of the deployment robot at test time. Our simulated and real-world navigation experiments show that HInt outperforms conventional hierarchical policies and single-source approaches.
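The hierarchical integration described in the abstract can be sketched schematically: a perception model maps observations into a shared abstract state, a dynamics model rolls that state forward, and a simple planner scores candidate action sequences on the composed model. A toy PyTorch sketch with made-up dimensions and a random-shooting planner (illustrative only, not the authors' implementation):

import torch
import torch.nn as nn

perception = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 4))   # obs -> abstract state
dynamics = nn.Sequential(nn.Linear(4 + 2, 32), nn.ReLU(), nn.Linear(32, 4))  # (state, action) -> next state

def plan(obs, goal, horizon=5, n_candidates=256):
    """Random-shooting planner on the integrated perception + dynamics model."""
    state = perception(obs).expand(n_candidates, -1)
    actions = torch.randn(n_candidates, horizon, 2)       # candidate action sequences
    for t in range(horizon):
        state = dynamics(torch.cat([state, actions[:, t]], dim=-1))
    cost = ((state - goal) ** 2).sum(-1)                  # distance to goal at the horizon
    return actions[cost.argmin(), 0]                      # first action of the best sequence

obs, goal = torch.randn(1, 64), torch.zeros(4)
with torch.no_grad():
    print("chosen action:", plan(obs, goal))

Because only the perception model is platform-specific, the dynamics model can in principle be trained on data pooled from many different robots, which is the point of the hierarchy.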
Machine Learning for Autonomous Driving

Rowan McAllister, Xinshuo Weng, Daniel Omeiza, Nick Rhinehart, Fisher Yu, German Ros, Vladlen Koltun

Fri Dec 11, 07:55 AM

Welcome to the NeurIPS 2020 Workshop on Machine Learning for Autonomous Driving!

Autonomous vehicles (AVs) offer a rich source of high-impact research problems for the machine learning (ML) community, including perception, state estimation, probabilistic modeling, time series forecasting, gesture recognition, robustness guarantees, real-time constraints, user-machine communication, multi-agent planning, and intelligent infrastructure.

Further, the interaction between ML subfields towards a common goal of autonomous driving can catalyze interesting inter-field discussions that spark new avenues of research, which this workshop aims to promote. As an application of ML, autonomous driving has the potential to greatly improve society by reducing road accidents, giving independence to those unable to drive, and even inspiring younger generations with tangible examples of ML-based technology clearly visible on local streets.

All are welcome to submit and/or attend! This will be the 5th NeurIPS workshop in this series. Previous workshops in 2016, 2017, 2018 and 2019 enjoyed wide participation from both academia and industry.

Schedule

07:55 AM Welcome McAllister
08:00 AM Invited Talk: Patrick Perez Pérez
08:30 AM Q&A: Patrick Perez Pérez
08:40 AM Invited Talk: Angela Schoellig Schoellig
09:20 AM Break and Posters
10:00 AM Invited Talk: Jianxiong Xiao Xiao
10:40 AM Invited Talk: Pin Wang Wang
11:00 AM Q&A: Pin Wang Wang
11:10 AM Invited Talk: Ehud Sharlin Sharlin
11:50 AM Q&A: Ehud Sharlin Sharlin
12:00 PM Break and Posters
01:00 PM Invited Talk: Byron Boots Boots
01:30 PM Q&A: Byron Boots Boots
01:40 PM Invited Talk: Brandyn White White
02:10 PM Q&A: Brandyn White White
02:20 PM Break and Posters
03:00 PM CARLA Challenge Ros
04:00 PM Invited Talk: Beipeng Mu Mu
04:30 PM Q&A: Beipeng Mu Mu
N/A Paper 56: IDE-Net: Extracting Interactive Driving Patterns from Human Data Sun, Zhan
N/A Paper 7: Real-time Semantic and Class-agnostic Instance Segmentation in Autonomous Driving Siam, Rashed, El Sallab
N/A Paper 38: Multi-Task Network Pruning and Embedded Optimization for Real-time Deployment in ADAS Dellinger, Mendoza Barrenechea, Leang
N/A Paper 64: Modeling Affect-based Intrinsic Rewards for Exploration and Learning McDuff, Kapoor
N/A Paper 32: Reinforcement Learning Based Approach for Multi-Vehicle Platooning Problem with Nonlinear Dynamic Behavior Ramadan
N/A Paper 18: Uncertainty-aware Vehicle Orientation Estimation for Joint Detection-Prediction Models Vallespi, Djuric
N/A Paper 30: MODETR: Moving Object Detection with Transformers El Sallab, Rashed
N/A Paper 33: Risk Assessment for Machine Learning Models Hueger, Schlicht
N/A Paper 51: Multi-modal Agent Trajectory Prediction with Local Self-Attention Contexts Bhat, Francis
N/A Paper 60: Traffic Forecasting using Vehicle-to-Vehicle Communication and Recurrent Neural Networks Yu
N/A Paper 2: Energy-Based Continuous Inverse Optimal Control Xu, Xie, Baker, Zhao, Wu
N/A Paper 9: Stochastic-YOLO: Efficient Probabilistic Object Detection under Dataset Shifts Azevedo, Mattina, Maji
N/A Paper 11: Vehicle Trajectory Prediction by Transfer Learning of Semi-Supervised Models Lamm, Drori

N/A Paper 12: DepthNet Nano: A Highly Compact Self-Normalizing Neural Network for Monocular Depth Estimation McAllister
N/A Paper 13: Conditional Imitation Learning Driving Considering Camera and LiDAR Fusion Eraqi
N/A Paper 39: Bézier Curve Based End-to-End Trajectory Synthesis for Agile Autonomous Driving Weiss, Behl
N/A Paper 41: Extracting Traffic Smoothing Controllers Directly From Driving Data using Offline RL Vinitsky
N/A Paper 44: CARLA Real Traffic Scenarios – novel training ground and benchmark for autonomous driving Osiński, Miłoś, Jakubowski, Galias, Homoceanu
N/A Paper 15: Calibrating Self-supervised Monocular Depth Estimation McAllister
N/A Paper 50: Diverse Sampling for Flow-Based Trajectory Forecasting Ma, Inala, Jayaraman, Bastani
N/A Paper 8: EvolveGraph: Multi-Agent Trajectory Prediction with Dynamic Relational Reasoning Li, Yang, Tomizuka, Choi
N/A Paper 53: A Distributed Delivery-Fleet Management Framework using Deep Reinforcement Learning and Dynamic Multi-Hop Routing Aggarwal, Bhargava
N/A Paper 62: Instance-wise Depth and Motion Learning from Monocular Videos Lee, Im, Lin, Kweon
N/A Paper 58: Vehicle speed data imputation based on parameter transferred LSTM KWON, Park
N/A Paper 37: Investigating the Effect of Sensor Modalities in Multi-Sensor Detection-Prediction Models Mohta, Chou, Becker, Djuric, Vallespi
N/A Paper 27: Explainable Autonomous Driving with Grounded Relational Inference Srishankar, Martin, Tomizuka
N/A Paper 40: Real2sim: Automatic Generation of Open Street Map Towns For Autonomous Driving Benchmarks Tigas, Gal
N/A Paper 19: Multiagent Driving Policy for Congestion Reduction in a Large Scale Scenario Cui, Stone
N/A Paper 31: SAFENet: Self-Supervised Monocular Depth Estimation with Semantic-Aware Feature Extraction Kim
N/A Paper 46: Disagreement-Regularized Imitation of Complex Multi-Agent Interactions Song, Ermon
N/A Paper 59: Annotating Automotive Radar efficiently: Semantic Radar Labeling Framework (SeRaLF) Isele
N/A Paper 45: A Comprehensive Study on the Application of Structured Pruning methods in Autonomous Vehicles Sobh, Hamed
N/A Paper 49: ULTRA: A reinforcement learning generalization benchmark for autonomous driving Graves
N/A Paper 52: Distributionally Robust Online Adaptation via Offline Population Synthesis Sinha, O'Kelly, Zheng
N/A Paper 57: Single Shot Multitask Pedestrian Detection and Behavior Prediction McAllister
N/A Paper 1: Multimodal Trajectory Prediction for Autonomous Driving with Semantic Map and Dynamic Graph Attention Network McAllister
N/A Paper 55: Physically Feasible Vehicle Trajectory Prediction Hoang, Marchetti-Bowick
N/A Paper 10: Certified Interpretability Robustness for Class Activation Mapping Gu, Weng, Chen, Liu, Daniel
N/A Paper 20: YOLObile: Real-Time Object Detection on Mobile Devices via Compression-Compilation Co-Design CAI, Niu, Wang
N/A Paper 16: Driving Behavior Explanation with Multi-level Fusion Cord, Pérez
N/A Paper 24: 3D-LaneNet+: Anchor Free Lane Detection using a Semi-Local Representation Oron

N/A Paper 22: RAMP-CNN: A Novel Neural Network for Enhanced Automotive Radar Object Recognition McAllister
N/A Paper 21: Haar Wavelet based Block Autoregressive Flows for Trajectories Bhattacharyya, Straehle, Fritz, Schiele
N/A Paper 14: PePScenes: A Novel Dataset and Baseline for Pedestrian Action Prediction in 3D Rasouli, Rohani
N/A Paper 43: DeepSeqSLAM: A Trainable CNN+RNN for Joint Global Description and Sequence-based Place Recognition in Large-Scale Changing Environments Chancan
N/A Paper 61: Predicting times of waiting on red signals using BERT Gora, Szejgis
N/A Paper 42: Temporally-Continuous Probabilistic Prediction using Polynomial Trajectory Parameterization Su, Djuric, Vallespi, Bradley
N/A Paper 6: FisheyeYOLO: Object Detection on Fisheye Cameras for Autonomous Driving Rashed, El Sallab


Abstracts (4):

Abstract 5: Break and Posters in Machine Learning for Autonomous Driving, 09:20 AM

https://neurips.gather.town/app/RhCLTvx08wOwYaga/ml4ad


Abstract 11: Break and Posters in Machine Learning for Autonomous Driving, 12:00 PM

https://neurips.gather.town/app/RhCLTvx08wOwYaga/ml4ad


Abstract 16: Break and Posters in Machine Learning for Autonomous Driving, 02:20 PM

https://neurips.gather.town/app/RhCLTvx08wOwYaga/ml4ad


Abstract 17: CARLA Challenge in Machine Learning for Autonomous Driving, Ros 03:00 PM

The CARLA Autonomous Driving Challenge 2020 is organized as part of the Machine Learning for Autonomous Driving Workshop at NeurIPS 2020. This competition is open to any participant from academia and industry.

The challenge follows the same structure and rules defined for the CARLA AD Leaderboard. You can participate in either of the two available tracks: SENSORS and MAP, using the canonical sensors available for the challenge.

The top-1 submissions of each track will be invited to present their results at the Machine Learning for Autonomous Driving Workshop. Additionally, all participants are invited to submit a technical report (up to 4 pages) describing their submissions. Based on the novelty and originality of these technical reports, the organization will select up to two teams to present their work at the workshop.


Fair AI in Finance

Senthil Kumar, Cynthia Rudin, John Paisley, Isabelle Moulinier, C. Bayan Bruss, Eren K., Susan Tibbs, Oluwatobi Olabiyi, Simona Gandrabur, Svitlana Vyetrenko, Kevin Compher

Fri Dec 11, 08:00 AM

The financial services industry has unique needs for fairness when adopting artificial intelligence and machine learning (AI/ML). First and foremost, there are strong ethical reasons to ensure that models used for activities such as credit decisioning and lending are fair and unbiased, or that machine reliance does not cause humans to miss critical pieces of data. Then there are the regulatory requirements to actually prove that the models are unbiased and that they do not discriminate against certain groups.

Emerging techniques such as algorithmic credit scoring introduce new challenges. Traditionally, financial institutions have relied on a consumer's past credit performance and transaction data to make lending decisions.

But with the emergence of algorithmic credit scoring, lenders also use alternate data, such as that gleaned from social media, and this immediately raises questions around systemic biases inherent in models used to understand customer behavior.

We also need to pay careful attention to the ways in which AI can not only be de-biased, but also play an active role in making financial services more accessible to those historically shut out due to prejudice and other social injustices.

The aim of this workshop is to bring together researchers from different disciplines to discuss fair AI in financial services. For the first time, four major banks have come together to organize this workshop, along with researchers from two universities as well as the SEC and FINRA (Financial Industry Regulatory Authority). Our confirmed invited speakers come from different backgrounds, including AI, law, and cultural anthropology, and we hope that this will offer an engaging forum with diversity of thought to discuss the fairness aspects of AI in financial services. We are also planning a panel discussion on systemic bias and its impact on the financial outcomes of different customer segments, and how AI can help.

Schedule

08:00 AM Opening Remarks Kumar
08:05 AM Invited Talk 1: Modeling the Dynamics of Poverty Abebe
08:35 AM Invited Talk 2: Unavoidable Tensions in Explaining Algorithmic Decisions Barocas
09:05 AM Break
09:15 AM Invited Talk 3: Stories of Invisibility: Re-thinking Human in the Loop Design Elish
09:45 AM Invited Talk 4: Actionable Recourse in Machine Learning Ustun
10:15 AM Break
10:30 AM Invited Talk 5: Navigating Value Trade-offs in ML for Consumer Finance - A Legal and Regulatory Perspective Aggarwal
11:00 AM Invited Talk 6: Reconciling Legal and Technical Approaches to Algorithmic Bias Xiang
11:30 AM Lunch Break
12:30 PM Panel Discussion: Building a Fair Future in Finance Elish, Xiang, Stoica, Posey
01:15 PM Break
01:20 PM Invited Talk 7: Fair Portfolio Design Kearns
01:50 PM Invited Talk 8: Fair AI in the securities industry: a review of methods and metrics Bryant
02:20 PM Invited Talk 9: Building Compliant Models: Fair Feature Selection with Multiobjective Monte Carlo Tree Search Chen
02:50 PM Break
03:10 PM Spotlight Talk 1: Quantifying risk-fairness trade-off in regression Schreuder, Chzhen
03:25 PM Spotlight Talk 2: Black Loans Matter: Distributionally Robust Fairness for Fighting Subgroup Discrimination Weber
03:40 PM Spotlight Talk 3: An Experiment on Leveraging SHAP Values to Investigate Racial Bias Vilarino, Vicente
03:55 PM Spotlight Talk 4: Fairness, Welfare, and Equity in Personalized Pricing Kallus, Zhou
04:10 PM Spotlight Talk 5: Robust Welfare Guarantees for Decentralized Credit Organizations Abebe, Ikeokwu, Taggart
04:25 PM Spotlight Talk 6: Partially Aware: Some Challenges Around Uncertainty and Ambiguity in Fairness Buet-Golfouse
04:40 PM Spotlight Talk 7: Hidden Technical Debts for Fair Machine Learning in Financial Services Huang, Nourian, Griest
04:55 PM Lightning Talk 1: Insights into Fairness through Trust: Multi-scale Trust Quantification for Financial Deep Learning Wong, Hryniowski, Wang

04:58 PM Lightning Talk 2: Pareto Robustness for Fairness Beyond Demographics Martinez, Bertran, Papadaki, Rodrigues, Sapiro
05:01 PM Lightning Talk 3: Developing a Philosophical Framework for Fair Machine Learning: The Case of Algorithmic Collusion and Market Fairness Michelson
05:04 PM Lightning Talk 4: Latent-CF: A Simple Baseline for Reverse Counterfactual Explanations Bruss, Balasubramanian, Barr, Sharpe, Wittenbach


Object Representations for Learning and Reasoning

William Agnew, Rim Assouel, Michael Chang, Antonia Creswell, Eliza Kosoy, Aravind Rajeswaran, Sjoerd van Steenkiste

Fri Dec 11, 08:00 AM

Recent advances in deep reinforcement learning and robotics have enabled agents to achieve superhuman performance on a variety of challenging games and to learn complex manipulation tasks. While these results are very promising, several open problems remain. In order to function in real-world environments, learned policies must be robust to input perturbations and able to rapidly generalize or adapt to novel situations. Moreover, to collaborate and live with humans in these environments, the goals and actions of embodied agents must be interpretable and compatible with human representations of knowledge. Hence, it is natural to consider how humans so successfully perceive, learn, and plan in order to build agents that are equally successful at solving real-world tasks.

There is much evidence to suggest that objects are a core level of abstraction at which humans perceive and understand the world [8]. Objects have the potential to provide a compact, causal, robust, and generalizable representation of the world. Recently, there have been many advancements in scene representation, allowing scenes to be represented by their constituent objects, rather than at the level of pixels. While these works have shown promising results, there is still a lack of agreement on how to best represent objects, how to learn object representations, and how best to leverage them in agent training.

In this workshop we seek to build a consensus on what object representations should be by engaging with researchers from developmental psychology and by defining concrete tasks and capabilities that agents building on top of such abstract representations of the world should succeed at. We will discuss how object representations may be learned through invited presenters with expertise both in unsupervised and supervised object representation learning methods. Finally, we will host conversations and research on new frontiers in object learning.

Schedule

08:00 AM Introduction Agnew
08:15 AM Keynote: Elizabeth Spelke
09:02 AM Learning Object-Centric Video Models by Contrasting Sets
09:04 AM Structure-Regularized Attention for Deformable Object Representation
09:06 AM Learning Long-term Visual Dynamics with Region Proposal Interaction Networks
09:08 AM Self-Supervised Attention-Aware Reinforcement Learning
09:10 AM Emergence of compositional abstractions in human collaborative assembly
09:12 AM Semantic State Representation for Reinforcement Learning
09:14 AM Odd-One-Out Representation Learning
09:16 AM Word(s) and Object(s): Grounded Language Learning In Information Retrieval
09:20 AM Discrete Predictive Representation for Long-horizon Planning
09:22 AM Dynamic Regions Graph Neural Networks for Spatio-Temporal Reasoning
09:26 AM Dexterous Robotic Grasping with Object-Centric Visual Affordances

09:28 AM Understanding designed objects by program synthesis
09:29 AM Learning Embeddings that Capture Spatial Semantics for Indoor Navigation
09:30 AM Poster Session A in GatherTown
10:30 AM Panel Discussion Hamrick, Greff, Lee, Higgins, Tenenbaum
11:45 AM Break in GatherTown
12:25 PM Invited Talk: Jessica Hamrick Hamrick
12:55 PM Invited Talk: Irina Higgins Higgins
01:25 PM Invited Talk: Sungjin Ahn Ahn
01:55 PM Contributed Talk: A Symmetric and Object-Centric World Model for Stochastic Environments
02:07 PM Contributed Talk: OGRE: An Object-based Generalization for Reasoning Environment
02:19 PM Invited Talk: Wilka Carvalho Carvalho
02:49 PM Break in GatherTown
03:20 PM Invited Talk: Renée Baillargeon
03:50 PM Invited Talk: Dieter Fox
04:20 PM Contributed Talk: Disentangling 3D Prototypical Networks for Few-Shot Concept Learning
04:32 PM Contributed Talk: Deep Affordance Foresight: Planning for What Can Be Done Next
04:44 PM Contributed Talk: Estimating Mass Distribution of Articulated Objects using Non-prehensile Manipulation
04:56 PM Panel Carvalho, Fan, Kulkarni, Xie
06:10 PM Concluding Remarks
06:15 PM Poster Session B in GatherTown


Abstracts (9):

Abstract 2: Keynote: Elizabeth Spelke in Object Representations for Learning and Reasoning, 08:15 AM

Elizabeth Spelke is the Marshall L. Berkman Professor of Psychology at Harvard University and an investigator at the NSF-MIT Center for Brains, Minds and Machines. Her laboratory focuses on the sources of uniquely human cognitive capacities, including capacities for formal mathematics, for constructing and using symbols, and for developing comprehensive taxonomies of objects. She probes the sources of these capacities primarily through behavioral research on human infants and preschool children, focusing on the origins and development of their understanding of objects, actions, people, places, number, and geometry. In collaboration with computational cognitive scientists, she aims to test computational models of infants' cognitive capacities. In collaboration with economists, she has begun to take her research from the laboratory to the field, where randomized controlled experiments can serve to evaluate interventions, guided by research in cognitive science, that seek to enhance young children's learning.


Abstract 17: Panel Discussion in Object Representations for Learning and Reasoning, Hamrick, Greff, Lee, Higgins, Tenenbaum 10:30 AM

How can we obtain object representations in real-world environments? How can object representations be applied in robotics? Join us for a panel discussion with Jessica Hamrick, Irina Higgins, Michelle Lee, and Josh Tenenbaum, moderated by Klaus Greff.


Abstract 19: Invited Talk: Jessica Hamrick in Object Representations for Learning and Reasoning, Hamrick 12:25 PM

Jessica Hamrick is a Senior Research Scientist at DeepMind, where she studies how to build machines that can flexibly build and deploy models of the world. Her work combines insights from cognitive science with structured relational architectures, model-based deep reinforcement learning, and planning. Jessica received a Ph.D. in Psychology from UC Berkeley in 2017, and an M.Eng. and B.S. in Computer Science from MIT in 2012.

Abstract 20: Invited Talk: Irina Higgins in Object Representations for Learning and Reasoning, Higgins 12:55 PM

Irina Higgins is a research scientist at DeepMind, where she works in the Frontiers team. Her work aims to bring together insights from the fields of neuroscience and physics to advance general artificial intelligence through improved representation learning. Before joining DeepMind, Irina was a British Psychological Society Undergraduate Award winner for her achievements as an undergraduate student in Experimental Psychology at Westminster University, followed by a DPhil at the Oxford Centre for Computational Neuroscience and Artificial Intelligence, where she focused on understanding the computational principles underlying speech processing in the auditory brain. During her DPhil, Irina also worked on developing poker AI, applying machine learning in the finance sector, and working on speech recognition at Google Research.

Abstract 21: Invited Talk: Sungjin Ahn in Object Representations for Learning and Reasoning, Ahn 01:25 PM

Sungjin Ahn is an Assistant Professor of Computer Science at Rutgers University and directs the Rutgers Machine Learning (RUML) lab. He is also affiliated with the Rutgers Center for Cognitive Science. His research focus is on how an AI agent can learn the structure and representations of the world in an unsupervised and compositional way, with a particular interest in object-centric learning. His approach to achieving this is based on deep learning, Bayesian modeling, reinforcement learning, and inspiration from cognitive science and neuroscience. He received his Ph.D. at the University of California, Irvine with Max Welling and did a postdoc with Yoshua Bengio at Mila. He then joined Rutgers University in Fall 2018. He co-organized the ICML 2020 Workshop on Object-Oriented Learning and received the best paper award at ICML 2012.


Abstract 24: Invited Talk: Wilka Carvalho in Object Representations for Learning and Reasoning, Carvalho 02:19 PM

Wilka Carvalho is a PhD Candidate in Computer Science at the University of Michigan–Ann Arbor, where he is advised by Honglak Lee, Satinder Singh, and Richard Lewis. His long-term research goal is to develop cognitive theories of learning that help us understand how humans infer, reason with, and exploit the rich structure present in realistic visual scenes to enable sophisticated behavioral policies. Towards this end, he is studying how object-centric representation learning and reinforcement learning can bring us closer to human-level artificial intelligence. He is supported by an NSF GRFP Fellowship and a UM Rackham Merit Fellowship.


Abstract 26: Invited Talk: Renée Baillargeon in Object Representations for Learning and Reasoning, 03:20 PM

Renée Baillargeon is an Alumni Distinguished Professor of Psychology at the University of Illinois Urbana-Champaign. Her research examines cognitive development in infancy and focuses primarily on causal reasoning. In particular, she explores how infants make sense of the events they observe, and what explanatory frameworks and learning mechanisms enable them to do so. In addition to this primary focus on causal reasoning, she is interested in a broad range of related issues including object perception, categorization, object individuation, number, and executive-function skills.

Abstract 27: Invited Talk: Dieter Fox in Object Representations for Learning and Reasoning, 03:50 PM

Object Representations for Robot Manipulation

Reasoning about objects is a fundamental task in robot manipulation. Different representations can have important repercussions on the capabilities and generality of a manipulation system. In this talk I will discuss different ways we represent and reason about objects, ranging from explicit 3D models to raw point clouds.

reason about objects, ranging from explicit 3D (c) economic mechanisms that incentivize quality models to raw point clouds. and efectiveness for requester while maintaining a high level of quality and fairness for crowd performers (also known as workers). Abstract 31: Panel in Object Representations Because quality, fairness and opportunities for for Learning and Reasoning, , Carvalho, Fan, crowd workers are central to our workshop, we Kulkarni, Xie 04:56 PM will invite a diverse group of crowd workers from What should be in an object representation, what a global public crowdsourcing platform to our should an object representation be able to do, panel-led discussion. and how do we measure and compare them? Join us for a panel discussion with Wilka Carvalho, Workshop web site: https://research.yandex.com/ Judy Fan, Tejas Kulkarni, and Chris Xie, moderated workshops/crowd/neurips-2020 by Rachit Dubey. Gathertown: https://neurips.gather.town/app/ 8eTm8IQJRRpltf4F/crowdscience

Schedule Crowd Science Workshop: Remoteness, Fairness, and Mechanisms as Challenges of Data Supply by Humans for Automation 08:00 Introduction & Icebreakers AM Daria Baidakova, Fabio Casati, Alexey Drutsa, Dmitry 08:15 Data Excellence: Better Data for Ustalov AM Better AI (by Lora Aroyo) Aroyo 08:35 Q&A with Lora Aroyo "Data Fri Dec 11, 08:00 AM AM Excellence: Better Data for Better AI " Despite the obvious advantages, automation driven by machine learning and artifcial 08:45 A Gamifed Crowdsourcing intelligence carries pitfalls for the lives of millions AM Framework for Data-Driven Co- of people: disappearance of many well- creation of Policy Making and Social established mass professions and consumption of Foresight (by Andrea Tocchetti and labeled data that are produced by humans Marco Brambilla) Tocchetti managed by out of time approach with full-time 09:00 Q&A with Andrea Tocchetti and ofce work and pre-planned task types. AM Marco Brambilla "A Gamifed Crowdsourcing methodology can be considered Crowdsourcing Framework for Data- as an efective way to overcome these issues Driven Co-creation of Policy Making since it provides freedom for task executors in and Social Foresight" terms of place, time and which task type they 09:05 Conversational Crowdsourcing (by want to work on. However, many potential AM Sihang Qiu, Ujwal Gadiraju, participants of crowdsourcing processes hesitate Alessandro Bozzon and Geert-Jan to use this technology due to a series of doubts Houben) Gadiraju, Bozzon (that have not been removed during the past 09:20 Q&A with Sihang Qiu, Ujwal decade). AM Gadiraju, Alessandro Bozzon and Geert-Jan Houben "Conversational This workshop brings together people studying Crowdsourcing" research questions on 09:25 Cofee Break AM (a) quality and efectiveness in remote crowd 09:35 Quality Control in Crowdsourcing (by work; AM Seid Muhie Yimam) Yimam (b) fairness and quality of life at work, tackling 09:55 Q&A with Seid Muhie Yimam issues such as fair task assignment, fair work AM "Quality Control in Crowdsourcing" conditions, and on providing opportunities for growth; and 101 Dec. 11, 2020

10:05 AM What Can Crowd Computing Do for the Next Generation of AI Technology? (by Ujwal Gadiraju and Jie Yang) Gadiraju
10:20 AM Q&A with Ujwal Gadiraju and Jie Yang "What Can Crowd Computing Do for the Next Generation of AI Technology?"
10:25 AM Real-Time Crowdsourcing of Health Data in a Low-Income country: A case study of Human Data Supply on Malaria first-line treatment policy tracking in Nigeria (by Olubayo Adekanmbi, Wuraola Fisayo Oyewusi and Ezekiel Ogundepo) Adekanmbi, Oyewusi
10:40 AM Q&A with Olubayo Adekanmbi, Wuraola Fisayo Oyewusi and Ezekiel Ogundepo: "Real-Time Crowdsourcing of Health Data in a Low-Income country: A case study of Human Data Supply on Malaria first-line treatment policy tracking in Nigeria"
10:45 AM Coffee Break
11:00 AM Panel Discussion "Successes and failures in crowdsourcing: experiences from work providers, performers and platforms"
12:30 PM Lunch Break
01:00 PM Modeling and Aggregation of Complex Annotations Via Annotation Distance (by Matt Lease) Lease
01:20 PM Q&A with Matt Lease: "Modeling and Aggregation of Complex Annotations Via Annotation Distance"
01:30 PM Active Learning from Crowd in Item Screening (by Evgeny Krivosheev, Burcu Sayin, Alessandro Bozzon and Zoltán Szlávik) Krivosheev, Sayin Günel, Bozzon, Szlavik
01:45 PM Q&A with Evgeny Krivosheev, Burcu Sayin, Alessandro Bozzon and Zoltán Szlávik: "Active Learning from Crowd in Item Screening"
01:50 PM Human Computation Requires and Enables a New Approach to Ethics (by Libuse Veprek, Patricia Seymour and Pietro Michelucci) Vepřek, Michelucci
02:05 PM Q&A with Libuse Veprek, Patricia Seymour and Pietro Michelucci: "Human computation requires and enables a new approach to ethics"
02:10 PM Coffee Break
02:20 PM Bias in Human-in-the-Loop Artificial Intelligence (by Gianluca Demartini) Demartini
02:40 PM Q&A with Gianluca Demartini: "Bias in Human-in-the-loop Artificial Intelligence"
02:50 PM VAIDA: An Educative Benchmark Creation Paradigm using Visual Analytics for Interactively Discouraging Artifacts (by Anjana Arunkumar, Swaroop Mishra, Bhavdeep Sachdeva, Chitta Baral and Chris Bryan) Arunkumar, Mishra, Baral
03:05 PM Q&A with Anjana Arunkumar, Swaroop Mishra, Bhavdeep Sachdeva, Chitta Baral and Chris Bryan: "VAIDA: An Educative Benchmark Creation Paradigm using Visual Analytics for Interactively Discouraging Artifacts"
03:10 PM Achieving Data Excellence (by Praveen Paritosh) Paritosh
03:30 PM Q&A with Praveen Paritosh "Achieving Data Excellence"
03:40 PM Closing

Abstracts (7):

Abstract 4: A Gamified Crowdsourcing Framework for Data-Driven Co-creation of Policy Making and Social Foresight (by Andrea Tocchetti and Marco Brambilla) in Crowd Science Workshop: Remoteness, Fairness, and Mechanisms as Challenges of Data Supply by Humans for Automation, Tocchetti 08:45 AM
Over the last decades, communication between governments and citizens has become a remarkable problem. Governments' decisions are not always aligned with the visions of the citizens about the future. Achieving such alignment requires cooperation between communities and public institutions. Therefore, it's important to find a way to innovate governance and policymaking, developing new ways to harness the potential of public engagement and participatory foresight in complex governance decisions. In this paper we propose a comprehensive framework that combines crowdsourcing and machine learning, aiming to improve the collective engagement and contribution of the crowd in policy-making decisions. Our approach brings together social networking, gamification, and data analysis practices for extracting relevant and coordinated future visions concerning public policies. The framework is validated through two experiments with citizens and policy-making domain experts. The findings confirm the effectiveness of the framework principles and provide useful feedback for future development.

Abstract 6: Conversational Crowdsourcing (by Sihang Qiu, Ujwal Gadiraju, Alessandro Bozzon and Geert-Jan Houben) in Crowd Science Workshop: Remoteness, Fairness, and Mechanisms as Challenges of Data Supply by Humans for Automation, Gadiraju, Bozzon 09:05 AM

The trend of remote work leads to the prosperity of crowdsourcing marketplaces. In crowdsourcing marketplaces, online workers can select their preferable tasks and then complete them to get paid, while requesters design and publish tasks to acquire their desirable data. The conventional user interface of the crowdsourcing task is the web page, where users provide answers using HTML-based web elements, and the task-related information (including instructions and questions) is displayed on a single web page. Although the conventional way of presenting tasks is straightforward, it could negatively affect workers' satisfaction and performance by causing problems such as boredom and fatigue. To address this challenge, we proposed a novel paradigm --- conversational crowdsourcing, which employs conversational interfaces to facilitate crowdsourcing task execution. With conversational crowdsourcing, workers receive task information as messages from a conversational agent, and provide answers by sending messages back to the agent. In this vision paper, we introduce our recent work in terms of using conversational crowdsourcing to improve worker performance and experience by employing novel human-computer interaction affordances. Our findings reveal that conversational crowdsourcing has important implications in improving the worker satisfaction and requester-worker relationship in crowdsourcing marketplaces.
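Since the paradigm described above is at heart an interaction protocol, a tiny sketch may help readers picture it. This is purely illustrative and not the authors' system: the `agent` object with `send`/`receive` methods is a hypothetical stand-in for whatever chat platform delivers the task.

```python
def run_conversational_task(agent, questions):
    """Deliver a crowdsourcing task as a conversation (illustrative sketch).

    Instead of rendering one long web form, each question arrives as a
    chat message and the worker answers in the same channel. The `agent`
    interface (send/receive) is an assumption for this example.
    """
    agent.send(f"Hi! This task has {len(questions)} questions.")
    answers = []
    for i, question in enumerate(questions, start=1):
        agent.send(f"Q{i}: {question}")
        answers.append(agent.receive())  # worker's reply in the chat
    agent.send("All done -- thanks for your work!")
    return answers
```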

Abstract 11: What Can Crowd Computing Do for the Next Generation of AI Technology? (by Ujwal Gadiraju and Jie Yang) in Crowd Science Workshop: Remoteness, Fairness, and Mechanisms as Challenges of Data Supply by Humans for Automation, Gadiraju, 10:05 AM

The unprecedented rise in the adoption of artificial intelligence techniques and automation across several critical domains is concomitant with shortcomings of such technology with respect to robustness, usability, interpretability, and trustworthiness. Crowd computing offers a viable means to leverage human intelligence at scale for data creation, enrichment, and interpretation, demonstrating a great potential to improve the performance of AI systems and increase the adoption of AI in general. Existing research and practice has mainly focused on leveraging crowd computing for training data creation. However, this perspective is rather limiting in terms of how AI can fully benefit from crowd computing. In this vision paper, we identify opportunities in crowd computing to propel better AI technology, and argue that to make such progress, fundamental problems need to be tackled from both computation and interaction standpoints. We discuss important research questions in both these themes, with an aim to shed light on the research needed to pave a future where humans and AI can work together seamlessly, while benefiting from each other.

Abstract 13: Real-Time Crowdsourcing of Health Data in a Low-Income country: A case study of Human Data Supply on Malaria first-line treatment policy tracking in Nigeria (by Olubayo Adekanmbi, Wuraola Fisayo Oyewusi and Ezekiel Ogundepo) in Crowd Science Workshop: Remoteness, Fairness, and Mechanisms as Challenges of Data Supply by Humans for Automation, Adekanmbi, Oyewusi 10:25 AM

Malaria is one of the leading causes of high morbidity and mortality in Nigeria, despite various policy interventions to frontally address the menace. While a national malaria policy agenda exists on the use of Artemisinin-based antimalarials as the first-line drug of choice for treatment, there have been challenges in implementation monitoring across various drug distribution layers, particularly the informal channels that dominate over eighty percent of the antimalarial drug distribution value chain. The lack of sustained policy monitoring through a structured and systematic surveillance system can encourage irrational drug usage, trigger antimalarial drug resistance, and worsen the disease burden in an economy where over ninety percent of the population live below the poverty line. We explored the use of real-time data collection through ordinary local residents, who leverage low-cost smartphones with an on-device app to run quick mystery shopping at drug outlets to check recommended malaria treatment drugs in four (4) states across the country. The instant survey data is collected via guided mystery shopping, which requires the volunteer participants to answer three basic questions after a 5-10 minute in-store observation. Each submission is verified with the drug store picture and auto-generated location co-ordinates. The antimalarial policy compliance level is immediately determined and can be anonymously aggregated into a national map for onward sharing with pharmaceutical trade groups, government agencies and non-profits for immediate intervention via requisite stakeholder education. This crowd-sourcing effort provides an affordable option that can be scaled up to support healthcare surveillance and effective policy compliance tracking in developing nations, where there is a paucity of data as a result of high illiteracy and infrastructural inadequacy.

Abstract 20: Active Learning from Crowd in Item Screening (by Evgeny Krivosheev, Burcu Sayin, Alessandro Bozzon and Zoltán Szlávik) in Crowd Science Workshop: Remoteness, Fairness, and Mechanisms as Challenges of Data Supply by Humans for Automation, Krivosheev, Sayin Günel, Bozzon, Szlavik 01:30 PM

In this paper, we explore how to efficiently combine crowdsourcing and machine intelligence for the problem of document screening, where we need to screen a finite number of documents with a set of machine-learning filters. Specifically, we focus on building a set of machine learning classifiers that evaluate documents, and then screen them efficiently. It is a challenging task since the budget is limited and there are countless ways to spend the given budget on the problem. We propose a multi-label active learning screening-specific sampling technique -- objective-aware sampling -- for querying unlabelled documents for annotation. Our algorithm decides which machine filter needs more training data and how to choose unlabeled items to annotate in order to minimize the risk of overall classification errors, rather than minimizing a single filter's error. Our results demonstrate that objective-aware sampling significantly outperforms state-of-the-art sampling strategies on multi-filter classification problems.
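As a rough illustration of the idea, the sketch below chooses which filter to strengthen based on its estimated contribution to overall screening error, then picks the most uncertain items from that filter's pool. All names and the error estimate are assumptions for this example; it follows the general spirit of objective-aware sampling, not the authors' published algorithm.

```python
import numpy as np

def objective_aware_sampling(filters, val_sets, unlabeled_pools, batch_size):
    """Pick which filter to query and which items to annotate (sketch).

    `filters` are sklearn-style classifiers, `val_sets` are (X, y) pairs
    used to estimate each filter's error, and `unlabeled_pools` holds
    candidate items per filter. A document passes the screen only if
    every filter accepts it, so the overall error is dominated by the
    weakest filter -- the budget is directed there.
    """
    overall_risks = []
    for clf, (X_val, y_val) in zip(filters, val_sets):
        # Estimated misclassification rate: a proxy for this filter's
        # contribution to the overall screening error.
        overall_risks.append(np.mean(clf.predict(X_val) != y_val))

    target = int(np.argmax(overall_risks))  # filter hurting the objective most
    pool = unlabeled_pools[target]

    # Uncertainty sampling within the chosen filter's pool.
    proba = filters[target].predict_proba(pool)[:, 1]
    uncertainty = -np.abs(proba - 0.5)          # highest near the boundary
    query_idx = np.argsort(uncertainty)[-batch_size:]
    return target, query_idx
```

The design point worth noting is that the argmax is taken over the screening objective (overall risk) rather than over any single filter's own validation accuracy.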
Abstract 22: Human Computation Requires and Enables a New Approach to Ethics (by Libuse Veprek, Patricia Seymour and Pietro Michelucci) in Crowd Science Workshop: Remoteness, Fairness, and Mechanisms as Challenges of Data Supply by Humans for Automation, Vepřek, Michelucci 01:50 PM

With humans increasingly serving as computational elements in distributed information processing systems, and in consideration of the profit-driven motives and potential inequities that might accompany the emerging thinking economy, we recognize the need for establishing a set of related ethics to ensure the fair treatment and wellbeing of online cognitive laborers and the conscientious use of the capabilities to which they contribute. Toward this end, we first describe human-in-the-loop computing in the context of the new concerns it raises that are not addressed by traditional ethical research standards. We then describe shortcomings in the traditional approach to ethical review and a dynamic approach for sustaining an ethical framework that can continue to evolve within the rapidly shifting context of disruptive new technologies.

Abstract 27: VAIDA: An Educative Benchmark Creation Paradigm using Visual Analytics for Interactively Discouraging Artifacts (by Anjana Arunkumar, Swaroop Mishra, Bhavdeep Sachdeva, Chitta Baral and Chris Bryan) in Crowd Science Workshop: Remoteness, Fairness, and Mechanisms as Challenges of Data Supply by Humans for Automation, Arunkumar, Mishra, Baral 02:50 PM

We present VAIDA, a novel benchmark creation paradigm (BCP) for NLP. VAIDA provides realtime feedback to crowdworkers about the quality of samples as they are being created, educating them about potential artifacts and allowing them to update samples to remove the same. Concurrently, VAIDA supports backend analysts in reviewing and approving submitted samples for benchmark inclusion, analyzing the overall quality of the dataset, and resampling splits to obtain and freeze the optimum state. VAIDA is domain, model, task, and metric agnostic, and constitutes a paradigm shift for robust, validated, and dynamic benchmark creation via human-and-metric-in-the-loop workflows. We demonstrate VAIDA's effectiveness by leveraging DQI (a data quality metric) over four datasets. We further evaluate via expert review and a user study with NASA TLX. We find that VAIDA decreases the mental demand, temporal demand, effort, and frustration of crowdworkers (by 29.7%) and analysts (by 12.1%); it increases their performance by 30.8% and 26% respectively.

Competition Track Friday

Hugo Jair Escalante, Katja Hofmann

Fri Dec 11, 08:00 AM

First session for the competition program at NeurIPS2020.

Machine learning competitions have grown in popularity and impact over the last decade, emerging as an effective means to advance the state of the art by posing well-structured, relevant, and challenging problems to the community at large. Motivated by a reward or merely the satisfaction of seeing their machine learning algorithm reach the top of a leaderboard, practitioners innovate, improve, and tune their approach before evaluating on a held-out dataset or environment. The competition track of NeurIPS has matured in 2020, its fourth year, with a considerable increase in both the number of challenges and the diversity of domains and topics. A total of 16 competitions are featured this year as part of the track, with 8 competitions associated with each of the two days. The list of competitions that are part of the program is available here:

https://neurips.cc/Conferences/2020/CompetitionTrack

Schedule

08:00 AM Opening - Competition Track Session Hofmann, Escalante
08:15 AM 3D+texture garment reconstruction challenge design (data, metrics, tracks, etc) Bertiche
08:35 AM 3D+texture garment reconstruction challenge results Madadi
09:00 AM Opening the L2RPN challenge @ NeurIPS2020 Marot
09:03 AM Winning the L2RPN challenge Marot
09:13 AM A L2RPN Winning approach LU
09:23 AM The Best L2RPN winning approach Zhou
09:33 AM L2RPN Post Challenge open questions Marot
09:40 AM Closing and award ceremony Marot
10:00 AM Introducing the Hide-and-Seek privacy challenge Jordon
10:03 AM The importance of synthetic data Jordon
10:18 AM Synthetic data in the healthcare setting Jordon
10:28 AM What we learned from the Hide-and-Seek privacy challenge Jordon
10:38 AM Closing remarks Jordon
11:00 AM Background on black box optimization (BBO) Turner
11:16 AM BBO challenge platform with Valohai Kiili
11:27 AM Spotlight for 1st place (BBO challenge) Turner, Cowen-Rivers
11:32 AM Spotlight for 2nd place (BBO challenge) Turner, Liu
11:37 AM Spotlight for 3rd place (BBO challenge) Turner, Sazanovich
02:00 PM Opening the SpaceNet 7 Challenge @ NeurIPS2020 Van Etten
02:07 PM Introduction to SpaceNet Shermeyer
02:15 PM The SpaceNet 7 Dataset Martinez-Manso
02:25 PM The SpaceNet 7 Metric Van Etten
02:30 PM The Winners of SpaceNet 7 Van Etten
02:40 PM SpaceNet 7 Closing and Future Plans Van Etten
03:00 PM Introduction to the 2020 NeurIPS education challenge Lamb
03:05 PM Competition overview: motivation, impact, dataset, tasks Lamb
03:20 PM Competition results and insights Wang
03:30 PM Beyond the competition: what's next? Wang
03:35 PM Q&A and discussion Wang, Lamb
04:00 PM Traffic Map Movies - An Introduction to the Traffic4cast Challenge Hochreiter
04:05 PM The Traffic4cast Competition Design and Data Kopp
04:10 PM The Best Traffic4Cast Submissions Kreil
04:12 PM 1st prize: Utilizing UNet for the Future Traffic Map Prediction - Traffic4cast highlight talk Choi
04:17 PM 2nd prize: TLab: Traffic Map Movie Forecasting Based on HR-NET - Traffic4cast highlight talk Wu
04:22 PM 3rd prize: Towards Good Practices of U-Net for Traffic Forecasting - Traffic4cast highlight talk Xu
04:27 PM Graph Ensemble Net and the Importance of Feature & Loss Function Design for Traffic Prediction - Traffic4cast highlight talk Qi
04:32 PM Uncertainty Intervals for Graph-based Spatio-Temporal Traffic Prediction - Traffic4cast highlight talk Maas
04:37 PM Traffic4cast Award Ceremony, Outlook, and Follow Up Challenges Kreil
05:00 PM The Hateful Memes Challenge: Competition Overview Kiela
05:15 PM The Hateful Memes Challenge: Live award ceremony and winner presentations Kiela

ML Retrospectives, Surveys & Meta-Analyses (ML-RSA)

Chhavi Yadav, Prabhu Pradhan, Jesse Dodge, Mayoore Jaiswal, Peter Henderson, Abhishek Gupta, Ryan Lowe, Jessica Forde, Joelle Pineau

Fri Dec 11, 08:30 AM

The exponential growth of AI research has led to several papers floating on arXiv, making it difficult to review existing literature. Despite the huge demand, the proportion of survey & analysis papers published is very low, due to reasons like the lack of a venue and of incentives. Our workshop, ML-RSA, provides a platform for and incentivizes the writing of such papers. It meets the need of taking a step back, looking at a sub-field as a whole, and evaluating actual progress.
We will accept 3 types of papers: broad survey papers, meta-analyses, and retrospectives. Survey papers will mention and cluster different types of approaches, provide pros and cons, highlight good source-code implementations and applications, and emphasize impactful literature. We expect this type of paper to provide a detailed investigation of the techniques and link together themes across multiple works. The main aim of these will be to organize techniques and lower the barrier to entry for newcomers. Meta-analyses, on the other hand, are forward-looking, aimed at providing critical insights on the current state of affairs of a sub-field and proposing new directions based on them. These are expected to be more than just an ablation study -- though an empirical analysis is encouraged, as it can provide for a stronger narrative. Ideally, they will seek to showcase trends that are not possible to see when looking at individual papers. Finally, retrospectives seek to provide further insights ex post by the authors of a paper: these could be technical, insights into the research process, or other helpful information that isn't apparent from the original work.

Schedule

08:30 AM Introduction
09:00 AM Invited: Shakir Mohamed Mohamed
09:30 AM Q&A with Shakir Mohamed
10:00 AM Brainstorming
10:55 AM Intro to speaker 2: Kilian Weinberger
11:00 AM Invited: Kilian Weinberger Weinberger
11:30 AM Q&A with Kilian Weinberger
12:00 PM Panel Weinberger, De-Arteaga, Santurkar, Frankle, Raji
12:55 PM Intro to speaker 3: Maria De-Arteaga
01:00 PM Invited: Maria De-Arteaga De-Arteaga
01:30 PM Q&A with Maria De-Arteaga
01:55 PM Intro to speaker 4: Shibani Santurkar
02:00 PM Invited: Shibani Santurkar Santurkar
02:35 PM Q&A with Shibani Santurkar
02:55 PM Poster Session Starts
04:55 PM Intro to speaker 5: Lana Sinapayen
05:00 PM Invited: Lana Sinapayen Sinapayen
05:30 PM Q&A with Lana Sinapayen
05:55 PM Intro to Speaker 6: Reza Shokri
06:00 PM Invited: Reza Shokri Shokri
06:30 PM Q&A with Reza Shokri
07:00 PM Awardees' Talks
08:00 PM Closing

Abstracts (2):

Abstract 8: Panel in ML Retrospectives, Surveys & Meta-Analyses (ML-RSA), Weinberger, De-Arteaga, Santurkar, Frankle, Raji 12:00 PM

Moderator: Jessica Forde

Abstract 15: Poster Session Starts in ML Retrospectives, Surveys & Meta-Analyses (ML-RSA), 02:55 PM

Please find zoom links for posters on our Rocket Chat.

Deep Reinforcement Learning

Pieter Abbeel, Chelsea Finn, Joelle Pineau, David Silver, Satinder Singh, Coline Devin, Misha Laskin, Kimin Lee, Janarthanan Rajendran, Vivek Veeriah

Fri Dec 11, 08:30 AM
In recent years, the use of deep neural networks as function approximators has enabled researchers to extend reinforcement learning techniques to solve increasingly complex control tasks. The emerging field of deep reinforcement learning has led to remarkable empirical results in rich and varied domains like robotics, strategy games, and multiagent interactions. This workshop will bring together researchers working at the intersection of deep learning and reinforcement learning, and it will help interested researchers outside of the field gain a high-level view of the current state of the art and potential directions for future contributions.

Schedule

08:30 AM Invited talk: Pierre-Yves Oudeyer "Machines that invent their own problems: Towards open-ended learning of skills" Oudeyer
09:00 AM Contributed Talk: Learning Functionally Decomposed Hierarchies for Continuous Control Tasks with Path Planning Christen, Jendele, Aksan, Hilliges
09:15 AM Contributed Talk: Maximum Reward Formulation In Reinforcement Learning Gottipati, Pathak, Nuttall, Chunduru, Touati, Ganapathi, Taylor, Chandar
09:30 AM Contributed Talk: Accelerating Reinforcement Learning with Learned Skill Priors Pertsch, Lee, Lim
09:45 AM Contributed Talk: Asymmetric self-play for automatic goal discovery in robotic manipulation Robotics, Plappert, Sampedro, Xu, Akkaya, Kosaraju, Welinder, D'Sa, Petron, Ponde, Paino, Noh, Weng, Yuan, Chu, Zaremba
10:00 AM Invited talk: Marc Bellemare "Autonomous navigation of stratospheric balloons using reinforcement learning" Bellemare
10:30 AM Break
11:00 AM Invited talk: Peter Stone "Grounded Simulation Learning for Sim2Real with Connections to Off-Policy Reinforcement Learning" Stone
11:30 AM Contributed Talk: Mirror Descent Policy Optimization Tomar, Shani, Efroni, Ghavamzadeh
11:45 AM Contributed Talk: Planning from Pixels using Inverse Dynamics Models Paster, McIlraith, Ba
12:00 PM Invited talk: Matt Botvinick "Alchemy: A Benchmark Task Distribution for Meta-Reinforcement Learning Research" Botvinick
12:30 PM Poster session 1
01:30 PM Invited talk: Susan Murphy "We used RL but…. Did it work?!" Murphy
02:00 PM Contributed Talk: MaxEnt RL and Robust Control Eysenbach, Levine
02:15 PM Contributed Talk: Reset-Free Lifelong Learning with Skill-Space Planning Lu, Grover, Abbeel, Mordatch
02:30 PM Invited talk: Anusha Nagabandi "Model-based Deep Reinforcement Learning for Robotic Systems" Nagabandi
03:00 PM Break
03:30 PM Invited talk: Ashley Edwards "Learning Offline from Observation" Edwards
04:00 PM NeurIPS RL Competitions: Flatland challenge Mohanty
04:07 PM NeurIPS RL Competitions: Learning to run a power network Marot
04:15 PM NeurIPS RL Competitions: Procgen challenge Mohanty
04:22 PM NeurIPS RL Competitions: MineRL Guss, Milani
04:30 PM Invited talk: Karen Liu "Deep Reinforcement Learning for Physical Human-Robot Interaction" Liu
05:00 PM Panel discussion Oudeyer, Bellemare, Stone, Botvinick, Murphy, Nagabandi, Edwards, Liu, Abbeel
06:00 PM Poster session 2
N/A Poster: Addressing Distribution Shift in Online Reinforcement Learning with Offline Datasets
N/A Poster: Learning Latent Landmarks for Generalizable Planning
N/A Poster: Model-Based Reinforcement Learning via Latent-Space Collocation
N/A Poster: Explanation Augmented Feedback in Human-in-the-Loop Reinforcement Learning
N/A Poster: Regularized Inverse Reinforcement Learning
N/A Poster: BeBold: Exploration Beyond the Boundary of Explored Regions
N/A Poster: Goal-Conditioned Reinforcement Learning in the Presence of an Adversary
N/A Poster: Policy Guided Planning in Learned Latent Space
N/A Poster: Model-Based Meta-Reinforcement Learning for Flight with Suspended Payloads
N/A Poster: Domain Adversarial Reinforcement Learning
N/A Poster: Sample Efficient Training in Multi-Agent Adversarial Games with Limited Teammate Communication
N/A Poster: Structure and randomness in planning and reinforcement learning
N/A Poster: Learning Intrinsic Symbolic Rewards in Reinforcement Learning
N/A Poster: Inter-Level Cooperation in Hierarchical Reinforcement Learning
N/A Poster: D2RL: Deep Dense Architectures in Reinforcement Learning
N/A Poster: Quantifying Differences in Reward Functions
N/A Poster: Targeted Query-based Action-Space Adversarial Policies on Deep Reinforcement Learning Agents
N/A Poster: Bringing order into Actor-Critic Algorithms using Stackelberg Games
N/A Poster: Parameter-based Value Functions
N/A Poster: ReaPER: Improving Sample Efficiency in Model-Based Latent Imagination
N/A Poster: Modular Training, Integrated Planning Deep Reinforcement Learning for Mobile Robot Navigation
N/A Poster: Motion Planner Augmented Reinforcement Learning for Robot Manipulation in Obstructed Environments
N/A Poster: How to make Deep RL work in Practice
N/A Poster: Utilizing Skipped Frames in Action Repeats via Pseudo-Actions
N/A Poster: Which Mutual-Information Representation Learning Objectives are Sufficient for Control?
N/A Poster: Addressing reward bias in Adversarial Imitation Learning with neutral reward functions
N/A Poster: Trust, but verify: model-based exploration in sparse reward environments
N/A Poster: Harnessing Distribution Ratio Estimators for Learning Agents with Quality and Diversity
N/A Poster: Super-Human Performance in Gran Turismo Sport Using Deep Reinforcement Learning
N/A Poster: Evolving Reinforcement Learning Algorithms
N/A Poster: Self-Supervised Policy Adaptation during Deployment
N/A Poster: Provably Efficient Policy Optimization via Thompson Sampling
N/A Poster: AWAC: Accelerating Online Reinforcement Learning With Offline Datasets
N/A Poster: Deep Q-Learning with Low Switching Cost
N/A Poster: Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning
N/A Poster: FactoredRL: Leveraging Factored Graphs for Deep Reinforcement Learning
N/A Poster: Visual Imitation with Reinforcement Learning using Recurrent Siamese Networks
N/A Poster: Policy Learning Using Weak Supervision
N/A Poster: Value Generalization among Policies: Improving Value Function with Policy Representation
N/A Poster: Learning Accurate Long-term Dynamics for Model-based Reinforcement Learning
N/A Poster: Shortest-Path Constrained Reinforcement Learning for Sparse Reward Tasks
N/A Poster: Safety Aware Reinforcement Learning
N/A Poster: Planning from Pixels using Inverse Dynamics Models
N/A Poster: Randomized Ensembled Double Q-Learning: Learning Fast Without a Model
N/A Poster: MaxEnt RL and Robust Control
N/A Poster: Compute- and Memory-Efficient Reinforcement Learning with Latent Experience Replay
N/A Poster: Preventing Value Function Collapse in Ensemble Q-Learning by Maximizing Representation Diversity
N/A Poster: PettingZoo: Gym for Multi-Agent Reinforcement Learning
N/A Poster: Asymmetric self-play for automatic goal discovery in robotic manipulation
N/A Poster: Evaluating Agents Without Rewards
N/A Poster: Conservative Safety Critics for Exploration
N/A Poster: Optimizing Memory Placement using Evolutionary Graph Reinforcement Learning
N/A Poster: Unlocking the Potential of Deep Counterfactual Value Networks
N/A Poster: C-Learning: Learning to Achieve Goals via Recursive Classification
N/A Poster: Robust Domain Randomised Reinforcement Learning through Peer-to-Peer Distillation
N/A Poster: Accelerating Reinforcement Learning with Learned Skill Priors
N/A Poster: DREAM: Deep Regret minimization with Advantage baselines and Model-free learning
N/A Poster: Unified View of Inference-based Off-policy RL: Decoupling Algorithmic and Implemental Source of Performance Gaps
N/A Poster: Predictive PER: Balancing Priority and Diversity towards Stable Deep Reinforcement Learning
N/A Poster: Adversarial Environment Generation for Learning to Navigate the Web
N/A Poster: Action and Perception as Divergence Minimization
N/A Poster: Reinforcement Learning for Sparse-Reward Object-Interaction Tasks in First-person Simulated 3D Environments
N/A Poster: DERAIL: Diagnostic Environments for Reward And Imitation Learning
N/A Poster: Causal Curiosity: RL Agents Discovering Self-supervised Experiments for Causal Representation Learning
N/A Poster: DisCo RL: Distribution-Conditioned Reinforcement Learning for General-Purpose Policies
N/A Poster: GRAC: Self-Guided and Self-Regularized Actor-Critic
N/A Poster: Emergent Road Rules In Multi-Agent Driving Environments
N/A Poster: Deep Bayesian Quadrature Policy Gradient
N/A Poster: Backtesting Optimal Trade Execution Policies in Agent-Based Market Simulator
N/A Poster: A Policy Gradient Method for Task-Agnostic Exploration
N/A Poster: Learning Functionally Decomposed Hierarchies for Continuous Control Tasks with Path Planning
N/A Poster: On Effective Parallelization of
N/A Poster: Learning to Represent Action Values as a Hypergraph on the Action Vertices
N/A Poster: Hyperparameter Auto-tuning in Self-Supervised Robotic Learning
N/A Poster: Combating False Negatives in Adversarial Imitation Learning
N/A Poster: Curriculum Learning through Distilled Discriminators
N/A Poster: Solving Compositional Reinforcement Learning Problems via Task Reduction
N/A Poster: Autoregressive Dynamics Models for Offline Policy Evaluation and Optimization
N/A Poster: TACTO: A Simulator for Learning Control from Touch Sensing
N/A Poster: Learning Markov State Abstractions for Deep Reinforcement Learning
N/A Poster: Online Safety Assurance for Deep Reinforcement Learning
N/A Poster: Multi-task Reinforcement Learning with a Planning Quasi-Metric
N/A Poster: Lyapunov Barrier Policy Optimization
N/A Poster: Interactive Visualization for Debugging RL
N/A Poster: Pairwise Weights for Temporal Credit Assignment
N/A Poster: A Deep Value-based Policy Search Approach for Real-world Vehicle Repositioning on Mobility-on-Demand Platforms
N/A Poster: Understanding Learned Reward Functions
N/A Poster: Reinforcement Learning with Bayesian Classifiers: Efficient Skill Learning from Outcome Examples
N/A Poster: Beyond Exponentially Discounted Sum: Automatic Learning of Return Function
N/A Poster: A Variational Inference Perspective on Goal-Directed Behavior in Reinforcement Learning
N/A Poster: FinRL: A Deep Reinforcement Learning Library for Automated Stock Trading in Quantitative Finance
N/A Poster: Latent State Models for Meta-Reinforcement Learning from Images
N/A Poster: Continual Model-Based Reinforcement Learning with Hypernetworks
N/A Poster: Dream and Search to Control: Latent Space Planning for Continuous Control
N/A Poster: Model-Based Visual Planning with Self-Supervised Functional Distances
N/A Poster: Maximum Mutation Reinforcement Learning for Scalable Control
N/A Poster: Variational Empowerment as Representation Learning for Goal-Based Reinforcement Learning
N/A Poster: Unsupervised Task Clustering for Multi-Task Reinforcement Learning
N/A Poster: Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning
N/A Poster: Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms
N/A Poster: What Matters for On-Policy Deep Actor-Critic Methods? A Large-Scale Study
N/A Poster: Learning to Sample with Local and Global Contexts in Experience Replay Buffer
N/A Poster: Semantic State Representation for Reinforcement Learning
N/A Poster: Safe Reinforcement Learning with Natural Language Constraints
N/A Poster: Model-based Navigation in Environments with Novel Layouts Using Abstract n-D Maps
N/A Poster: Learning to Weight Imperfect Demonstrations
N/A Poster: Influence-aware Memory for Deep Reinforcement Learning in POMDPs
N/A Poster: Revisiting Rainbow: Promoting more insightful and inclusive deep reinforcement learning research
N/A Poster: Optimizing Traffic Bottleneck Throughput using Cooperative, Decentralized Autonomous Vehicles
N/A Poster: Measuring Visual Generalization in Continuous Control from Pixels
N/A Poster: Weighted Bellman Backups for Improved Signal-to-Noise in Q-Updates
N/A Poster: SCC: an efficient deep reinforcement learning agent mastering the game of StarCraft II
N/A Poster: Reinforcement Learning with Latent Flow
N/A Poster: Mirror Descent Policy Optimization
N/A Poster: Disentangled Planning and Control in Vision Based Robotics via Reward Machines
N/A Poster: OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement Learning
N/A Poster: Exploring Zero-Shot Emergent Communication in Embodied Multi-Agent Populations
N/A Poster: Parrot: Data-driven Behavioral Priors for Reinforcement Learning
N/A Poster: Diverse Exploration via InfoMax Options
N/A Poster: Successor Landmarks for Efficient Exploration and Long-Horizon Navigation
N/A Poster: Off-Dynamics Reinforcement Learning: Training for Transfer with Domain Classifiers
N/A Poster: Data-Efficient Reinforcement Learning with Self-Predictive Representations
N/A Poster: Amortized Variational Deep Q Network
N/A Poster: R-LAtte: Visual Control via Deep Reinforcement Learning with Attention Network
N/A Poster: Decoupling Representation Learning from Reinforcement Learning
N/A Poster: Unsupervised Domain Adaptation for Visual Navigation
N/A Poster: Discovering Diverse Multi-Agent Strategic Behavior via Reward Randomization
N/A Poster: Towards Effective Context for Meta-Reinforcement Learning: an Approach based on Contrastive Learning
N/A Poster: Tonic: A Deep Reinforcement Learning Library for Fast Prototyping and Benchmarking
N/A Poster: Chaining Behaviors from Data with Model-Free Reinforcement Learning
N/A Poster: Multi-Agent Option Critic Architecture
N/A Poster: XT2: Training an X-to-Text Typing Interface with Online Learning from Implicit Feedback
N/A Poster: An Algorithmic Causal Model of Credit Assignment in Reinforcement Learning
N/A Poster: Greedy Multi-Step Off-Policy Reinforcement Learning
N/A Poster: Abstract Value Iteration for Hierarchical Deep Reinforcement Learning
N/A Poster: Online Hyper-parameter Tuning in Off-policy Learning via Evolutionary Strategies
N/A Poster: Reusability and Transferability of Macro Actions for Reinforcement Learning
N/A Poster: An Examination of Preference-based Reinforcement Learning for Treatment Recommendation
N/A Poster: Reset-Free Lifelong Learning with Skill-Space Planning
N/A Poster: Maximum Reward Formulation In Reinforcement Learning
N/A Poster: Energy-based Surprise Minimization for Multi-Agent Value Factorization
N/A Poster: Efficient Competitive Self-Play Policy Optimization
N/A Poster: Correcting Momentum in Temporal Difference Learning
N/A Poster: Decoupling Exploration and Exploitation in Meta-Reinforcement Learning without Sacrifices
N/A Poster: Learning to Reach Goals via Iterated Supervised Learning
N/A Poster: Model-Based Reinforcement Learning: A Compressed Survey
N/A Poster: C-Learning: Horizon-Aware Cumulative Accessibility Estimation
N/A Poster: XLVIN: eXecuted Latent Value Iteration Nets
N/A Poster: Discovery of Options via Meta-Gradients
N/A Poster: PixL2R: Guiding Reinforcement Learning Using Natural Language by Mapping Pixels to Rewards
N/A Poster: Skill Transfer via Partially Amortized Hierarchical Planning
N/A Poster: Mastering Atari with Discrete World Models
N/A Poster: Average Reward Reinforcement Learning with Monotonic Policy Improvement

Abstracts (5):

Abstract 8: Invited talk: Peter Stone "Grounded Simulation Learning for Sim2Real with Connections to Off-Policy Reinforcement Learning" in Deep Reinforcement Learning, Stone 11:00 AM

For autonomous robots to operate in the open, dynamically changing world, they will need to be able to learn a robust set of skills from relatively little experience. This talk introduces Grounded Simulation Learning as a way to bridge the so-called reality gap between simulators and the real world in order to enable transfer learning from simulation to a real robot. Grounded Simulation Learning has led to the fastest known stable walk on a widely used humanoid robot. Connections to theoretical advances in off-policy reinforcement learning will be highlighted.

Abstract 13: Invited talk: Susan Murphy "We used RL but…. Did it work?!" in Deep Reinforcement Learning, Murphy 01:30 PM

Digital healthcare is a growing area of importance in modern healthcare due to its potential in helping individuals improve their behaviors so as to better manage chronic health challenges such as hypertension, mental health, cancer and so on. Digital apps and wearables observe the user's state via sensors/self-report, deliver treatment actions (reminders, motivational messages, suggestions, social outreach, ...) and observe rewards repeatedly on the user across time. This area is seeing increasing interest by RL researchers, with the goal of including in the digital app/wearable an RL algorithm that "personalizes" the treatments to the user. But after RL is run on a number of users, how do we know whether the RL algorithm actually personalized the sequential treatments to the user? In this talk we report on our first efforts to address this question after our RL algorithm was deployed on each of 111 individuals with hypertension.

Abstract 16: Invited talk: Anusha Nagabandi "Model-based Deep Reinforcement Learning for Robotic Systems" in Deep Reinforcement Learning, Nagabandi 02:30 PM

Deep learning has shown promising results in robotics, but we are still far from having intelligent systems that can operate in the unstructured settings of the real world, where disturbances, variations, and unobserved factors lead to a dynamic environment. In this talk, we'll see that model-based deep RL can indeed allow for efficient skill acquisition, as well as the ability to repurpose models to solve a variety of tasks. We'll scale up these approaches to enable locomotion with a 6-DoF legged robot on varying terrains in the real world, as well as dexterous manipulation with a 24-DoF anthropomorphic hand in the real world. We then focus on the inevitable mismatch between an agent's training conditions and the test conditions in which it may actually be deployed, thus illuminating the need for adaptive systems. Inspired by the ability of humans and animals to adapt quickly in the face of unexpected changes, we present a meta-learning algorithm within this model-based RL framework to enable online adaptation of large, high-capacity models using only small amounts of data from the new task. These fast adaptation capabilities are seen in both simulation and the real world, with experiments such as a 6-legged robot adapting online to an unexpected payload or suddenly losing a leg. We will then further extend the capabilities of our robotic systems by enabling the agents to reason directly from raw image observations. Bridging the benefits of representation learning techniques with the adaptation capabilities of meta-RL, we'll present a unified framework for effective meta-RL from images. With robotic arms in the real world that learn peg insertion and ethernet cable insertion to varying targets, we'll see the fast acquisition of new skills, directly from raw image observations in the real world. Finally, this talk will conclude that model-based deep RL provides a framework for making sense of the world, thus allowing for reasoning and adaptation capabilities that are necessary for successful operation in the dynamic settings of the real world.
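The talk summarizes rather than specifies the adaptation procedure, but the general pattern (a meta-trained dynamics model specialized online with a few gradient steps on recent transitions, then used for planning) can be sketched as follows. The model interface, transition window, and update rule are assumptions for illustration, not the speaker's code.

```python
import copy
import torch

def adapt_and_plan(meta_model, recent_transitions, plan_fn,
                   inner_lr=1e-3, inner_steps=5):
    """Online adaptation in model-based meta-RL (illustrative sketch).

    `meta_model` maps (state, action) -> predicted next state and is
    assumed to have been meta-trained so that a few gradient steps on
    recent data specialize it to the current dynamics (e.g., a robot
    carrying an unexpected payload). `recent_transitions` is a short
    window of (s, a, s_next) tensors from the last few timesteps.
    """
    model = copy.deepcopy(meta_model)  # leave the meta-parameters intact
    opt = torch.optim.SGD(model.parameters(), lr=inner_lr)
    s, a, s_next = recent_transitions
    for _ in range(inner_steps):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(model(s, a), s_next)
        loss.backward()
        opt.step()
    # Plan under the freshly adapted model, e.g. with random shooting or MPC.
    return plan_fn(model)
```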

Abstract 18: Invited talk: Ashley Edwards "Learning Offline from Observation" in Deep Reinforcement Learning, Edwards 03:30 PM

A common trope in sci-fi is to have a robot that can quickly solve some problem after watching a person, studying a video, or reading a book. While these settings are (currently) fictional, the benefits are real. Agents that can solve tasks by observing others have the potential to greatly reduce the burden of their human teachers, removing some of the need to hand-specify rewards or goals. In this talk, I consider the question of how an agent can not only learn by observing others, but also how it can learn quickly by training offline before taking any steps in the environment. First, I will describe an approach that trains a latent policy directly from state observations, which can then be quickly mapped to real actions in the agent's environment. Then I will describe how we can train a novel value function, Q(s,s'), to learn off-policy from observations. Unlike previous imitation-from-observation approaches, this formulation goes beyond simply imitating and rather enables learning from potentially suboptimal observations.
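For readers unfamiliar with the Q(s,s') idea mentioned above: it values state transitions rather than state-action pairs, so it can be trained from action-free observations, with actions recovered later (e.g., via an inverse-dynamics model). Below is a hedged sketch of how one-step targets might be formed; the assumption that candidate successor states come from some learned proposal model is ours, and this is not the authors' exact code.

```python
import torch

def qss_targets(q_net, rewards, next_states, candidate_next2, gamma=0.99):
    """One-step targets for a Q(s, s') value function (illustrative).

    q_net scores transitions (s, s') and returns a [N] tensor. Because
    the data contain no actions, the usual max over actions is replaced
    by a max over candidate successor states s'' -- here a tensor of
    proposals per transition, assumed to come from a learned model.
    Shapes: rewards [B], next_states [B, D], candidate_next2 [B, K, D].
    """
    with torch.no_grad():
        B, K, D = candidate_next2.shape
        s_rep = next_states.unsqueeze(1).expand(B, K, D)
        # Value of the best reachable transition out of s'.
        best = q_net(s_rep.reshape(B * K, D),
                     candidate_next2.reshape(B * K, D)).reshape(B, K)
        best = best.max(dim=1).values
        return rewards + gamma * best  # regression target for q_net(s, s')
```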

Abstract 23: Invited talk: Karen Liu "Deep Reinforcement Learning for Physical Human-Robot Interaction" in Deep Reinforcement Learning, Liu 04:30 PM

Creating realistic virtual humans has traditionally been considered a research problem in Computer Animation primarily for entertainment applications. With the recent breakthrough in collaborative robots and deep reinforcement learning, accurately modeling human movements and behaviors has become a common challenge also faced by researchers in robotics and artificial intelligence. For example, mobile robots and autonomous vehicles can benefit from training in environments populated with ambulating humans and learning to avoid colliding with them. Healthcare robotics, on the other hand, need to embrace physical contacts and learn to utilize them for enabling humans' activities of daily living. An immediate concern in developing such an autonomous and powered robotic device is the safety of human users during the early development phase, when the control policies are still largely suboptimal. Learning from physically simulated humans and environments presents a promising alternative which enables robots to safely make and learn from mistakes without putting real people at risk. However, deploying such policies to interact with people in the real world adds additional complexity to the already challenging sim-to-real transfer problem. In this talk, I will present our current progress on solving the problem of sim-to-real transfer with humans in the environment, actively interacting with the robots through physical contacts. We tackle the problem from two fronts: developing more relevant human models to facilitate robot learning, and developing human-aware robot perception and control policies. As an example of contextualizing our research effort, we develop a mobile manipulator to put clothes on people with physical impairments, enabling them to carry out day-to-day tasks and maintain independence.

KR2ML - Knowledge Representation and Reasoning Meets Machine Learning

Veronika Thost, Kartik Talamadupula, Vivek Srikumar, Chenwei Zhang, Josh Tenenbaum

Fri Dec 11, 08:40 AM

Machine learning (ML) has seen a tremendous amount of recent success and has been applied in a variety of applications. However, it comes with several drawbacks, such as the need for large amounts of training data and the lack of explainability and verifiability of the results. In many domains, there is structured knowledge (e.g., from electronic health records, laws, clinical guidelines, or common sense knowledge) which can be leveraged for reasoning in an informed way (i.e., including the information encoded in the knowledge representation itself) in order to obtain high quality answers. Symbolic approaches for knowledge representation and reasoning (KRR) are less prominent today - mainly due to their lack of scalability - but their strength lies in the verifiable and interpretable reasoning that can be accomplished. The KR2ML workshop aims at the intersection of these two subfields of AI. It will shine a light on the synergies that (could/should) exist between KRR and ML, and will initiate a discussion about the key challenges in the field.

Schedule

08:40 AM Poster Teasers
09:00 AM Opening Remarks
09:06 AM Invited Talk #1 Etzioni
09:36 AM Invited Talk #2 Lin
10:06 AM Invited Talk #3 Rocktäschel
10:35 AM Q&A #1 Etzioni, Rocktäschel, Lin
10:50 AM Break #1
11:06 AM Invited Talk #4 Leskovec
11:36 AM Invited Talk #5 Ji
12:06 PM Invited Talk #6 Wu
12:35 PM Q&A #2 Ji, Leskovec, Wu
12:50 PM Poster Session
02:00 PM Panel #1 Bengio, Kahneman, Kautz, Lamb, Marcus, Rossi
03:01 PM Contributed Talk #2-v2 Illanes
03:30 PM Q&A #3
03:45 PM Break #2
04:01 PM Invited Talk #7 Bengio
04:35 PM Panel #2 Etzioni, Ji, Kambhampati, Lin, Wu
05:20 PM Closing Remarks
N/A Poster #1 Ren
N/A Poster #4 Zhang
N/A Poster #6 McIlraith
N/A Poster #2 Ren
N/A Poster #17 Kolev
N/A Poster #20 Shvo
N/A Poster #19 Dobrowolska
N/A Poster #14 Stoica
N/A Poster #5 Rama
N/A Poster #9 Safavi
N/A Poster #15 Stoica
N/A Poster #3 Pittala
N/A Poster #7 Parikh
N/A Poster #8 Christoffersen
N/A Poster #11 Rader
N/A Poster #12 Biggio
N/A Poster #10 Yan
N/A Poster #13 Parmar
N/A Poster #18 Vaezipoor
N/A Poster #16 Niu, Thattai

BabyMind: How Babies Learn and How Machines Can Imitate

Byoung-Tak Zhang, Gary Marcus, Angelo Cangelosi, Pia Knoeferle, Klaus Obermayer, David Vernon, Chen Yu

Fri Dec 11, 08:40 AM
Deep neural network models have shown remarkable performance in tasks such as visual object recognition, speech recognition, and autonomous driving. We have seen continuous improvements throughout the years which have led to these models surpassing human performance in a variety of tasks such as image classification, video games, and board games. However, the performance of deep learning models heavily relies on a massive amount of data, which requires huge time and effort to collect and label.

Recently, to overcome these weaknesses and limitations, attention has shifted towards machine learning paradigms such as semi-supervised learning, incremental learning, and meta-learning, which aim to be more data-efficient. However, these learning models still require a huge amount of data to achieve high performance on real-world problems. There have been only a few achievements or breakthroughs, especially in terms of the ability to grasp abstract concepts and to generalize problems.

In contrast, human babies gradually make sense of the environment through their experiences, a process known as learning by doing, without a large amount of labeled data. They actively engage with their surroundings and explore the world through their own interactions. They gradually acquire the abstract concept of objects and develop the ability to generalize problems. Thus, if we understand how a baby's mind develops, we can imitate those learning processes in machines and thereby solve previously unsolved problems such as domain generalization and overcoming the stability-plasticity dilemma. In this workshop, we explore how these learning mechanisms can help us build human-level intelligence in machines.

In this interdisciplinary workshop, we bring together eminent researchers in Computer Science, Cognitive Science, Psychology, Brain Science, Developmental Robotics and various other related fields to discuss the below questions on babies vs. machines.

■ How far is the state-of-the-art machine intelligence from babies?
■ How does a baby learn from their own interactions and experiences?
■ What sort of insights can we acquire from the baby's mind?
■ How can those insights help us build smart machines with baby-like intelligence?
■ How can machines learn from babies to do better?
■ How can these machines further contribute to solving real-world problems?

We will invite selected experts in the related fields to give insightful talks. We will also encourage interdisciplinary contributions from researchers on the above topics. Hence, we expect this workshop to be a good starting point for participants in various fields to discuss theoretical fundamentals, open problems, and major directions of further development in an exciting new area.

Schedule

08:40 AM Opening Remarks: BabyMind, Byoung-Tak Zhang and Gary Marcus Zhang, Marcus
09:00 AM Invited Talk: Latent Diversity in Human Concepts Kidd
09:40 AM Invited Talk: The Role of Embodiment in Development Brock
10:20 AM Coffee Break
10:45 AM Contributed Talk: Automatic Recall Machines: Internal Replay, Continual Learning and the Brain Ji
11:00 AM Contributed Talk: Architecture Agnostic Neural Networks Talukder
11:15 AM Contributed Talk: Human-Like Active Learning: Machines Simulating Human Learning Process Lim
11:30 AM Poster Session Park, Yu, Laflaquière, Zhang, Caselles-Dupré, Snell, Ball, Shin, Sucevic, Chen, Choi, Ko, Ji
12:30 PM Lunch Break
01:30 PM Contributed Talk: What can babies teach us about contrastive methods? Sucevic
01:45 PM Contributed Talk: Learning Canonical Transformations Dulberg
02:00 PM Contributed Talk: Modeling Social Interaction for Baby in Simulated Environment for Developmental Robotics Park
02:15 PM Contributed Talk: ARLET: Adaptive Representation Learning with End-to-end Training Choi
02:30 PM Contributed Talk: Not all input is equal: the efficacy of book reading in infants' word learning is mediated by child-directed speech Ko
02:45 PM Coffee Break & Poster Session
03:30 PM Invited Talk: Developmental Robotics: Language Learning, Trust and Theory of Mind Cangelosi
04:10 PM Invited Talk: Growing into intelligence the human way: What do we start with, and how do we learn the rest? Tenenbaum
04:50 PM Panel Discussion

Abstracts (4):

Abstract 2: Invited Talk: Latent Diversity in Human Concepts in BabyMind: How Babies Learn and How Machines Can Imitate, Kidd 09:00 AM

Celeste Kidd (University of California, Berkeley)

Abstract 3: Invited Talk: The Role of Embodiment in Development in BabyMind: How Babies Learn and How Machines Can Imitate, Brock 09:40 AM

Oliver Brock (Technical University of Berlin)

Abstract 16: Invited Talk: Developmental Robotics: Language Learning, Trust and Theory of Mind in BabyMind: How Babies Learn and How Machines Can Imitate, Cangelosi 03:30 PM

Angelo Cangelosi

Abstract 17: Invited Talk: Growing into intelligence the human way: What do we start with, and how do we learn the rest? in BabyMind: How Babies Learn and How Machines Can Imitate, Tenenbaum 04:10 PM

Josh Tenenbaum (Massachusetts Institute of Technology)

Machine Learning for Economic Policy

Stephan Zheng, Alex Trott, Annie Liang, Jamie Morgenstern, David Parkes, Nika Haghtalab

Fri Dec 11, 09:00 AM

www.mlforeconomicpolicy.com
[email protected]

The goal of this workshop is to inspire and engage a broad interdisciplinary audience, including computer scientists, economists, and social scientists, around topics at the exciting intersection of economics, public policy, and machine learning. We feel that machine learning offers enormous potential to transform our understanding of economics, economic decision making, and public policy, and yet its adoption by economists and social scientists remains nascent.

We want to use the workshop to expose some of the critical socio-economic issues that stand to benefit from applying machine learning, expose underexplored economic datasets and simulations, and identify machine learning research directions that would have significant positive socio-economic impact. In effect, we aim to accelerate the use of machine learning to rapidly develop, test, and deploy fair and equitable economic policies that are grounded in representative data.

For example, we would like to explore questions around whether machine learning can be used to help with the development of effective economic policy, to understand economic behavior through granular economic data sets, to automate economic transactions for individuals, and how we can build rich and faithful simulations of economic systems with strategic agents. We would like to develop economic policies and mechanisms that target socio-economic issues including diversity and fair representation in economic outcomes, economic equality, and improving economic opportunity. In particular, we want to highlight both the opportunities as well as the barriers to adoption of ML in economics.
Schedule Abstract 6: Panel Discussion: Algorithms & Methodology in Machine Learning for Economic Policy, 10:45 AM 09:00 Introduction 1 AM Eva Tardos, 09:05 Keynote: Michael Kearns Kearns Thore Graepel, AM Doyne Farmer, & Emma Pierson 09:45 Best Paper (Empirical) AM 10:00 5 Minute Break AM Abstract 9: Best Paper (Methodology) in 10:05 Keynote: Doina Precup Precup Machine Learning for Economic Policy, 12:40 AM PM

10:45 Panel Discussion: Algorithms & “Empirical Welfare Maximization with AM Methodology Constraints”, L. Sun 11:45 15 Minute Break AM 12:00 Keynote: Susan Athey Athey Abstract 11: Keynote: Sendhil Mullainathan PM in Machine Learning for Economic Policy, 12:40 Best Paper (Methodology) Mullainathan 01:00 PM PM 12:55 5 Minute Break Machine Learning and Economic Policy: The Uses PM of Prediction. 01:00 Keynote: Sendhil Mullainathan PM Mullainathan Machine learning tools excel at producing models that work in a predictive sense. Economics and 01:40 Panel Discussion: ML in Economics & policy, however, rely heavily on causality. One PM Real-World Policy fruitful approach to this tension is to marry causal 02:40 5 Minute Break inference and machine learning techniques. In PM this talk, I will argue for a complementary, 02:45 Posters, Focus Groups, Unstructured second approach: that prediction in and of itself PM Discussion can be very useful for a swath of applications. Many important policy problems have embedded in them pure prediction problems. Moreover, Abstracts (6): prediction tools by themselves can help reveal Abstract 2: Keynote: Michael Kearns in fundamental social mechanisms. These kinds of Machine Learning for Economic Policy, applications are plentiful, but sit in a blind spot: Kearns 09:05 AM because we have not had prediction tools in the past, we are not used to seeing them. Privacy and Fairness in Markets and Finance

Abstract 9: Best Paper (Methodology) in Machine Learning for Economic Policy, 12:40 PM

"Empirical Welfare Maximization with Constraints", L. Sun

Abstract 11: Keynote: Sendhil Mullainathan in Machine Learning for Economic Policy, Mullainathan 01:00 PM

Machine Learning and Economic Policy: The Uses of Prediction.

Machine learning tools excel at producing models that work in a predictive sense. Economics and policy, however, rely heavily on causality. One fruitful approach to this tension is to marry causal inference and machine learning techniques. In this talk, I will argue for a complementary, second approach: that prediction in and of itself can be very useful for a swath of applications. Many important policy problems have embedded in them pure prediction problems. Moreover, prediction tools by themselves can help reveal fundamental social mechanisms. These kinds of applications are plentiful, but sit in a blind spot: because we have not had prediction tools in the past, we are not used to seeing them.

"Estimating Policy Functions in Payment Systems Rediet Abebe, using Reinforcement Learning” P. Castro, A. Sharad Goel, Desai, H. Du, R. Garratt, F. Rivadeneyra Dan Bjorkegren, & Marietje Schaake 118 Dec. 12, 2020

Dec. 12, 2020

Algorithmic Fairness through the Lens of Causality and Interpretability

Awa Dieng, Jessica Schrouff, Matt J Kusner, Golnoosh Farnadi, Fernando Diaz

Sat Dec 12, 01:00 AM

Website: www.afciworkshop.org

Black-box machine learning models have gained widespread deployment in decision-making settings across many parts of society, from sentencing decisions to medical diagnostics to loan lending. However, many models were found to be biased against certain demographic groups. Initial work on algorithmic fairness focused on formalizing statistical measures of fairness that could be used to train new classifiers. While these models were an important first step towards addressing fairness concerns, there were immediate challenges with them. Causality has recently emerged as a powerful tool to address these shortcomings. Causality can be seen as a model-first approach: starting with the language of structural causal models or potential outcomes, the idea is to frame, then solve, questions of algorithmic fairness in this language. Such causal definitions of fairness can have far-reaching impact, especially in high-risk domains. Interpretability, on the other hand, can be viewed as a user-first approach: can the ways in which algorithms work be made more transparent, making it easier for them to align with our societal values on fairness? In this way, interpretability can sometimes be more actionable than causality work.

Given these initial successes, this workshop aims to more deeply investigate how open questions in algorithmic fairness can be addressed with causality and interpretability. Questions such as: What improvements can causal definitions provide compared to existing statistical definitions of fairness? How can causally grounded methods help develop more robust fairness algorithms in practice? What tools for interpretability are useful for detecting bias and building fair systems? What are good formalizations of interpretability when addressing fairness questions?

Schedule

01:47 AM Tutorial: Questions
01:55 AM Invited Talk: On Prediction, Action and Interference (Silva)
02:25 AM Questions: Invited talk, R. Silva
02:38 AM Introduction to contributed talks
03:10 AM Introduction to invited talk by Hoda Heidari
03:45 AM Short break -- Join us on Gathertown
03:55 AM Virtual Breakout Session 1
04:55 AM Introduction to Poster session
07:55 AM Introduction to invited talk by Jon Kleinberg
08:32 AM Questions: Invited talk, J. Kleinberg
09:15 AM Introduction to invited talk by Lily Hu

Abstracts (7):

Abstract 1: Tutorial: Questions in Algorithmic Fairness through the Lens of Causality and Interpretability, 01:47 AM

Submit your questions on Rocketchat and the moderator will convey them to the speaker.

Abstract 2: Invited Talk: On Prediction, Action and Interference in Algorithmic Fairness through the Lens of Causality and Interpretability, Silva 01:55 AM

Ultimately, we want the world to be less unfair by changing it. Just making fair passive predictions is not enough, so our decisions will eventually have an effect on how a societal system works. We will discuss ways of modelling hypothetical interventions so that particular measures of counterfactual fairness are respected: that is, how are sensitive attributes interacting with our actions to cause an unfair distribution of outcomes, and, that being the case, how do we mitigate such uneven impacts within the space of feasible actions? To make matters even harder, interference is likely: what happens to one individual may affect another. We will discuss how to express assumptions about, and consequences of, such causative factors for fair policy making, accepting that this is a daunting task but that we owe the public an explanation of our reasoning. Joint work with Matt Kusner, Chris Russell and Joshua Loftus.

Abstract 3: Questions: Invited talk, R. Silva in Algorithmic Fairness through the Lens of Causality and Interpretability, 02:25 AM

Submit your questions in Rocket.chat and the moderator will convey them to the speaker. To ask questions live, please join the Zoom call; we highly encourage you to use Rocketchat. Please join the speaker during the breakout sessions for more discussions.

Abstract 4: Introduction to contributed talks in Algorithmic Fairness through the Lens of Causality and Interpretability, 02:38 AM

Please join the authors on Gather.Town during the poster sessions for questions. Feel free to submit your questions on Rocketchat and the moderator will convey them to the authors.

Abstract 7: Virtual Breakout Session 1 in Algorithmic Fairness through the Lens of Causality and Interpretability, 03:55 AM

Please join the Zoom for breakout discussions. In case the Zoom is full, you can join the Breakouts through Gather.Town at the corresponding table.

Fairness in Health: 11:55 AM - 12:55 PM
onlinequestions event ID: 12122001
Zoom: https://ucl.zoom.us/j/98811169765?pwd=SStyWFNmdFlUQUFnekt4Q2FWSXhYQT09

Q&A with Ricardo Silva: 11:55 AM - 12:55 PM
onlinequestions event ID: 12122002
Zoom: https://ucl.zoom.us/j/91814715763?pwd=dmZkWkh6ZmN4bWN3WjY2L0dpakE2Zz09

Q&A with Hoda Heidari: 11:55 AM - 12:55 PM
onlinequestions event ID: 12122003
Zoom: https://us02web.zoom.us/j/89640547267?pwd=RzVZOW9ISmtaSmhLaE5BTFJnRFdtUT09

Abstract 8: Introduction to Poster session in Algorithmic Fairness through the Lens of Causality and Interpretability, 04:55 AM

Please join the gather.town for the poster session.

Abstract 10: Questions: Invited talk, J. Kleinberg in Algorithmic Fairness through the Lens of Causality and Interpretability, 08:32 AM

Submit your questions in Rocket.chat and the moderator will convey them to the speaker. To ask questions live, please join the Zoom call; we highly encourage you to use Rocketchat. Please join the speaker during the breakout sessions for more discussions.

Sat Dec 12, 02:30 AM Abstract 7: Virtual Breakout Session 1 in Algorithmic Fairness through the Lens of 'Medical Imaging meets NeurIPS' is a satellite Causality and Interpretability, 03:55 AM workshop established in 2017. The workshop aims to bring researchers together from the Please join the Zoom for breakout discussions. In medical image computing and machine learning case the Zoom is full, you can join the Breakouts communities. The objective is to discuss the through Gather.Town at the corresponding table. major challenges in the feld and opportunities for joining forces. This year the workshop will feature Fairness in Health: 11:55 AM - 12:55PM online oral and poster sessions with an emphasis onlinequestions event ID: 12122001 on audience interactions. In addition, there will Zoom: https://ucl.zoom.us/j/98811169765? 120 Dec. 12, 2020

Schedule

02:20 AM Introduction by Ben Glocker (Glocker)
02:30 AM Keynote by Lena Maier-Hein: Addressing the Data Bottleneck in Biomedical Image Analysis (Maier-Hein)
03:10 AM DeepSim: Semantic similarity metrics for learned image registration (Czolbe)
03:20 AM Representing Ambiguity in Registration Problems with Conditional Invertible Neural Networks (Trofimova)
03:30 AM Poster Session 1
05:00 AM Keynote by Nathan Silberman: Real-world Insights from Patient-facing Machine Learning Models (Silberman)
05:40 AM Using StyleGAN for Visual Interpretability of Deep Learning Models on Medical Images (Schutte)
05:50 AM Context-aware Self-supervised Learning for Medical Images Using Graph Neural Network (Sun)
06:00 AM Break
06:45 AM Keynote by Spyridon Bakas: The Federated Tumor Segmentation (FeTS) Initiative: Towards a paradigm-shift in multi-institutional collaborations (Bakas)
07:25 AM Deep learning to assist radiologists in breast cancer diagnosis with ultrasound imaging (Shen)
07:35 AM Privacy-preserving medical image analysis (Ziller)
07:45 AM Poster Session 2
09:00 AM Keynote by Jerry Prince: New Approaches for Magnetic Resonance Image Harmonization (Prince)
09:40 AM Brain2Word: Improving Brain Decoding Methods and Evaluation (Pascual Ortiz)
09:50 AM 3D Infant Pose Estimation Using Transfer Learning (Ellershaw)
10:00 AM FastMRI Introduction (Muckley)
10:15 AM FastMRI Talk 1 (Mostapha)
10:25 AM FastMRI Talk 2 (Ramzi)
10:35 AM FastMRI Talk 3 (Kim)
10:50 AM FastMRI keynote Yvonne Lui: Fast(er) MRI: a radiologist's perspective (Lui)
11:30 AM Closing remarks

N/A AI system for predicting the deterioration of COVID-19 patients in the emergency department (Shamout)
N/A COVIDNet-S: SARS-CoV-2 lung disease severity grading of chest X-rays using deep convolutional neural networks (Wong)
N/A Zero-dose PET Reconstruction with Missing Input by U-Net with Attention Modules (Ouyang)
N/A RANDGAN: Randomized Generative Adversarial Network for Detection of COVID-19 in Chest X-ray (Motamed)

N/A Attention Transfer Outperforms Transfer Learning in Medical Image Disease Classifiers (Akbarian)
N/A A Bayesian Unsupervised Deep-Learning Based Approach for Deformable Image Registration (Khawaled)
N/A Semi-Supervised Learning of MR Image Synthesis without Fully-Sampled Ground-Truth Acquisitions (Yurt)
N/A Learning MRI contrast agnostic registration (Hoffmann, Dalca)
N/A Adversarial cycle-consistent synthesis of cerebral microbleeds for data augmentation (Faryna)
N/A Embracing the Disharmony in Heterogeneous Medical Data (Wang)
N/A Deep Learning extracts novel MRI biomarkers for Alzheimer's disease progression (Li)
N/A Multi-Label Incremental Few-Shot Learning for Medical Image Pathology classifiers (Seyyed-Kalantari)
N/A Retrospective Motion Correction of MR Images using Prior-Assisted Deep Learning (Chatterjee)
N/A Towards disease-aware image editing of chest X-rays (Saboo)
N/A Diffusion MRI-based structural connectivity robustly predicts "brain-age" (Gurusamy)
N/A Clinical Validation of Machine Learning Algorithm Generated Images (Kwon)
N/A Hierarchical Amortized Training for Memory-efficient High Resolution 3D GAN (Sun)
N/A Can We Learn to Explain Chest X-Rays?: A Cardiomegaly Use Case (Jethani)
N/A Joint Hierarchical Bayesian Learning of Full-structure Noise for Brain Source Imaging (Hashemi)
N/A Learning to estimate a surrogate respiratory signal from cardiac motion by signal-to-signal translation (Iyer)
N/A Biomechanical modelling of brain atrophy through deep learning (da Silva)
N/A Ultrasound Diagnosis of COVID-19: Robustness and Explainability (Roberts)
N/A Self-supervised out-of-distribution detection in brain CT scans (Kim)
N/A Improving Interpretability in Medical Imaging Diagnosis using Adversarial Training (Margeloiu)
N/A Unsupervised detection of Hypoplastic Left Heart Syndrome in fetal screening (Chotzoglou)
N/A Hip Fracture Risk Modeling Using DXA and Deep Learning (Sadowski)
N/A Autoencoder Image Compression Algorithm for Reduction of Resource Requirements (Kwon)
N/A Classification with a domain shift in medical imaging (Fontanella)
N/A 3D UNet with GAN discriminator for robust localisation of the fetal brain and trunk in MRI with partial coverage of the fetal body (Uus)
N/A Decoding Brain States: Clustering fMRI Dynamic Functional Connectivity Timeseries with Deep Autoencoders (Spencer)
N/A Community Detection in Medical Image Datasets: Using Wavelets and Spectral Clustering (Yousefzadeh)
N/A RATCHET: Medical Transformer for Chest X-ray Diagnosis and Reporting (Hou)
N/A Annotation-Efficient Deep Semi-Supervised Learning for Automatic Knee Osteoarthritis Severity Diagnosis from Plain Radiographs (Nguyen)
N/A MVD-Fuse: Detection of White Matter Degeneration via Multi-View Learning of Diffusion Microstructure (Fadnavis)
N/A Scalable solutions for MR image classification of Alzheimer's disease (Brueningk)
N/A Quantification of task similarity for efficient knowledge transfer in biomedical image analysis (Scholz)
N/A Harmonization and the Worst Scanner Syndrome (Moyer)

N/A LVHNet: Detecting Cardiac Structural Abnormalities with Chest X-Rays (Bhave)
N/A Predicting the Need for Intensive Care for COVID-19 Patients using Deep Learning on Chest Radiography (Hu)
N/A Probabilistic Recovery of Missing Phase Images in Contrast-Enhanced CT (Patel)
N/A A Deep Learning Model to Detect Anemia from Echocardiography (Hughes)
N/A A Critic Evaluation Of Covid-19 Automatic Detection From X-Ray Images (Maguolo)
N/A Comparing Sparse and Deep Neural Network(NN)s: Using AI to Detect Cancer (Strauss)
N/A Encoding Clinical Priori in 3D Convolutional Neural Networks for Prostate Cancer Detection in bpMRI (Saha)
N/A Semantic Video Segmentation for Intracytoplasmic Sperm Injection Procedures (He)
N/A Modified VGG16 Network for Medical Image Analysis (Vatsavai)
N/A StND: Streamline-based Non-rigid partial-Deformation Tractography Registration (Chandio)

Abstracts (61):

Abstract 2: Keynote by Lena Maier-Hein: Addressing the Data Bottleneck in Biomedical Image Analysis in Medical Imaging Meets NeurIPS, Maier-Hein 02:30 AM

Machine learning has begun to revolutionize almost all areas of health research. Success stories cover a wide variety of application fields ranging from radiology and dermatology to gastroenterology and mental health applications. Strikingly, however, such widely known success stories appear to be lacking in some subfields of healthcare, such as surgery. A main reason for this phenomenon could be the lack of large annotated training data sets. In the past years, we have investigated the hypothesis that this bottleneck can be overcome by simulated data. This talk will highlight some of the successes and challenges we encountered on our journey.

Abstract 3: DeepSim: Semantic similarity metrics for learned image registration in Medical Imaging Meets NeurIPS, Czolbe 03:10 AM

We propose a semantic similarity metric for image registration. Existing metrics like Euclidean distance or normalized cross-correlation focus on aligning intensity values, giving difficulties with low intensity contrast or noise. Our semantic approach learns dataset-specific features that drive the optimization of a learning-based registration model. Comparing to existing unsupervised and supervised methods across multiple image modalities and applications, we achieve consistently high registration accuracy and faster convergence than the state of the art, and the learned invariance to noise gives smoother transformations on low-quality images.

Abstract 4: Representing Ambiguity in Registration Problems with Conditional Invertible Neural Networks in Medical Imaging Meets NeurIPS, Trofimova 03:20 AM

Image registration is the basis for many applications in the fields of medical image computing and computer assisted interventions. One example is the registration of 2D X-ray images with preoperative three-dimensional computed tomography (CT) images in intraoperative surgical guidance systems. Due to the high safety requirements in medical applications, estimating registration uncertainty is of crucial importance in such a scenario. However, previously proposed methods, including classical iterative registration methods and deep learning-based methods, have one characteristic in common: they lack the capacity to represent the fact that a registration problem may be inherently ambiguous, meaning that multiple (substantially different) plausible solutions exist. To tackle this limitation, we explore the application of invertible neural networks (INNs) as the core component of a registration methodology. In the proposed framework, INNs enable going beyond point estimates as network output by representing the possible solutions to a registration problem by a probability distribution that encodes different plausible solutions via multiple modes. In a first feasibility study, we test the approach for a 2D/3D registration setting by registering spinal CT volumes to X-ray images. To this end, we simulate the X-ray images taken by a C-arm with multiple orientations using the principle of digitally reconstructed radiographs (DRRs). Due to the symmetry of the human spine, there are potentially multiple substantially different poses of the C-arm that can lead to similar projections. The hypothesis of this work is that the proposed approach is able to identify multiple solutions in such ambiguous registration problems.
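
The core idea of Abstract 3, scoring alignment in a learned feature space instead of raw intensities, can be sketched in a few lines. The following is an illustrative PyTorch sketch, not the authors' code; the toy encoder stands in for whatever dataset-specific feature extractor DeepSim actually learns:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticSimilarityLoss(nn.Module):
    """Score alignment in the feature space of an encoder rather than on raw intensities."""

    def __init__(self, encoder: nn.Module):
        super().__init__()
        self.encoder = encoder.eval()
        for p in self.encoder.parameters():
            p.requires_grad = False  # the feature extractor stays frozen

    def forward(self, warped: torch.Tensor, fixed: torch.Tensor) -> torch.Tensor:
        f_warped = self.encoder(warped)
        f_fixed = self.encoder(fixed)
        # cosine similarity per spatial location, turned into a distance
        cos = F.cosine_similarity(f_warped, f_fixed, dim=1)
        return (1.0 - cos).mean()

# Toy stand-in encoder; in the paper's setting this would be trained on the
# target dataset (e.g. via self-supervision or a segmentation task).
encoder = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1),
)
loss_fn = SemanticSimilarityLoss(encoder)
warped = torch.rand(2, 1, 64, 64)  # moving image after applying the predicted warp
fixed = torch.rand(2, 1, 64, 64)   # fixed target image
print(loss_fn(warped, fixed).item())
```

In a registration network, this loss would simply replace the intensity-based dissimilarity term while the rest of the training loop stays unchanged.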

Abstract 6: Keynote by Nathan Silberman: Real-world Insights from Patient-facing Machine Learning Models in Medical Imaging Meets NeurIPS, Silberman 05:00 AM

While many machine learning products have a typical pathway for development, those in the medical imaging domain require a unique approach due to the higher bar for safety, efficacy, and the realities of clinical practice. In this talk, Nathan Silberman will discuss insights gained from launching and monitoring medical imaging machine learning products in clinically demanding settings.

Abstract 7: Using StyleGAN for Visual Interpretability of Deep Learning Models on Medical Images in Medical Imaging Meets NeurIPS, Schutte 05:40 AM

As AI-based medical devices are becoming more common in imaging fields like radiology and histology, interpretability of the underlying predictive models is crucial to expand their use in clinical practice. Existing heatmap-based interpretability methods such as GradCAM only highlight the location of predictive features but do not explain how they contribute to the prediction. In this paper, we propose a new interpretability method that can be used to understand the predictions of any black-box model on images, by showing how the input image would be modified in order to produce different predictions. A StyleGAN is trained on medical images to provide a mapping between latent vectors and images. Our method identifies the optimal direction in the latent space to create a change in the model prediction. By shifting the latent representation of an input image along this direction, we can produce a series of new synthetic images with changed predictions. We validate our approach on histology and radiology images, and demonstrate its ability to provide meaningful explanations that are more informative than GradCAM heatmaps. Our method reveals the patterns learned by the model, which allows clinicians to build trust in the model's predictions, discover new biomarkers and eventually reveal potential biases.

Abstract 8: Context-aware Self-supervised Learning for Medical Images Using Graph Neural Network in Medical Imaging Meets NeurIPS, Sun 05:50 AM

Although self-supervised learning enables us to bootstrap the training by exploiting unlabeled data, the generic self-supervised methods for natural images do not sufficiently incorporate context. For medical images, a desirable method should be sensitive enough to detect deviation from normal-appearing tissue of each anatomical region; here, anatomy is the context. We introduce a novel approach with two levels of self-supervised representation learning objectives: one on the regional anatomical level and another on the patient level. We use graph neural networks to incorporate the relationship between different anatomical regions. The structure of the graph is informed by anatomical correspondences between each patient and an anatomical atlas. In addition, the graph representation has the advantage of handling arbitrarily sized images in full resolution. Experiments on large-scale Computed Tomography (CT) datasets of lung images show that our approach compares favorably to baseline methods that do not account for the context. We use the learnt embedding for staging lung tissue abnormalities related to COVID-19.
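
To make the mechanism in Abstract 7 concrete, here is a hedged sketch of finding a latent direction that changes a classifier's prediction. The `generator` and `classifier` below are toy stand-ins, not the StyleGAN and diagnostic model used in the paper:

```python
import torch
import torch.nn as nn

# Toy stand-ins for illustration only.
latent_dim = 64
generator = nn.Sequential(nn.Linear(latent_dim, 32 * 32), nn.Tanh(), nn.Unflatten(1, (1, 32, 32)))
classifier = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 1), nn.Sigmoid())

def find_direction(steps: int = 100, lr: float = 0.05) -> torch.Tensor:
    """Optimize a single latent-space direction that raises the classifier's prediction."""
    direction = torch.zeros(latent_dim, requires_grad=True)
    opt = torch.optim.Adam([direction], lr=lr)
    for _ in range(steps):
        z = torch.randn(16, latent_dim)                   # batch of random latents
        score = classifier(generator(z + direction)).mean()
        loss = -score                                      # ascend the predicted probability
        opt.zero_grad()
        loss.backward()
        opt.step()
    return direction.detach()

direction = find_direction()
z0 = torch.randn(1, latent_dim)
# Series of synthetic images whose predicted label gradually changes:
series = [generator(z0 + alpha * direction) for alpha in (-2.0, 0.0, 2.0)]
print([classifier(img).item() for img in series])
```

Applied to the latent code of a real, GAN-inverted image, stepping along such a direction yields the "series of new synthetic images with changed predictions" the abstract describes.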

Abstract 10: Keynote by Spyridon Bakas: The Federated Tumor Segmentation (FeTS) Initiative: Towards a paradigm-shift in multi-institutional collaborations in Medical Imaging Meets NeurIPS, Bakas 06:45 AM

Spyridon Bakas' talk will revolve around his most recent focus on federated learning (FL), where he co-authored what seems to be the first study on FL in medicine, and has been funded by the Informatics Technology for Cancer Research (ITCR) program of the National Cancer Institute of the National Institutes of Health (NIH) to develop the federated tumor segmentation (FeTS - https://www.fets.ai/) platform, in collaboration with Intel, that enables the first-ever real-world consortium of 43 international institutions (so far) looking into FL for tumor segmentation, starting with brain tumors.

Abstract 11: Deep learning to assist radiologists in breast cancer diagnosis with ultrasound imaging in Medical Imaging Meets NeurIPS, Shen 07:25 AM

Sonography is an important tool in the detection and characterization of breast masses. Though consistently shown to detect additional cancers as a supplemental imaging modality, breast ultrasound has been noted to have a high false-positive rate relative to mammography and magnetic resonance imaging. Here, we propose a deep neural network that can detect benign and malignant lesions in breast ultrasound images. The network achieves an area under the receiver operating characteristic curve (AUROC) of 0.902 (95% CI: 0.892-0.911) on a test set consisting of 103,611 exams (around 2 million images) collected at Anonymized Institution between 2012 and 2019. To confirm its generalizability, we evaluated the network on an independent external test set on which it achieved an AUROC of 0.908 (95% CI: 0.884-0.933). This highlights the potential of AI in improving the accuracy, consistency, and efficiency of breast ultrasound diagnostics worldwide.

Abstract 12: Privacy-preserving medical image analysis in Medical Imaging Meets NeurIPS, Ziller 07:35 AM

The utilisation of artificial intelligence in medicine and healthcare has led to successful clinical applications in several domains. The conflict between data usage and privacy protection requirements in such systems must be resolved for optimal results as well as ethical and legal compliance. This calls for innovative solutions such as privacy-preserving machine learning (PPML). We present PriMIA (Privacy-preserving Medical Image Analysis), a software framework designed for PPML in medical imaging. In a real-life case study we demonstrate significantly better classification performance of a securely aggregated federated learning model compared to human experts on unseen datasets. Furthermore, we show an inference-as-a-service scenario for end-to-end encrypted diagnosis, where neither the data nor the model are revealed. Lastly, we empirically evaluate the framework's security against a gradient-based model inversion attack and demonstrate that no usable information can be recovered from the model.

Abstract 14: Keynote by Jerry Prince: New Approaches for Magnetic Resonance Image Harmonization in Medical Imaging Meets NeurIPS, Prince 09:00 AM

Magnetic resonance (MR) images have exquisite soft tissue contrast and are critical to modern clinical imaging and medical science research. Automatic processing of MR images has always been hampered, however, by the lack of standardized tissue contrasts and standardized intensity scales. MR image harmonization, or intensity normalization, has long been investigated and used as part of neuroimaging pipelines to try to make quantitative measures compatible between MR scanners and across sites, but this has been a difficult task. In this talk, I will give an overview of past work and then describe three new harmonization approaches, each facilitated by a different style of deep network, that have recently been developed in my lab. The first approach is based on image synthesis, the second on domain adaptivity, and the third on a disentangled latent space. I will present brief overviews and results for each method and then discuss their limitations as well as needs and opportunities for future research on MR image harmonization.

Abstract 15: Brain2Word: Improving Brain Decoding Methods and Evaluation in Medical Imaging Meets NeurIPS, Pascual Ortiz 09:40 AM

Brain decoding, understood as the process of mapping brain activities to the stimuli that generated them, has been an active research area in the last years. In the case of language stimuli, recent studies have shown that it is possible to decode fMRI scans into an embedding of the word a subject is reading. However, such word embeddings are designed for natural language processing tasks rather than for brain decoding. Therefore, they limit our ability to recover the precise stimulus. In this work, we propose to directly classify an fMRI scan, mapping it to the corresponding word within a fixed vocabulary. Unlike existing work, we evaluate on scans from previously unseen subjects. We argue that this is a more realistic setup, and we present a model that can decode fMRI data from unseen subjects. Our model achieves 5.22% Top-1 and 13.59% Top-5 accuracy in this challenging task, significantly outperforming all the considered competitive baselines.

Abstract 16: 3D Infant Pose Estimation Using Transfer Learning in Medical Imaging Meets NeurIPS, Ellershaw 09:50 AM

This paper presents the first deep learning-based 3D infant pose estimation model. We transfer-learn models first trained in the adult domain. The model outperforms the current 2D and 3D state of the art on the synthetic infant MINI-RGBD test dataset, achieving an average joint position error (AJPE) of 8.17 pixels and 28.47 mm respectively. Furthermore, unlike the current 3D state of the art, the model presented here does not require a depth channel as input. This is an important step in the development of an automated general movement assessment tool for infants, which has the potential to support the diagnosis of a range of neurological disorders, including cerebral palsy.

Abstract 17: FastMRI Introduction in Medical Imaging Meets NeurIPS, Muckley 10:00 AM

Shortening the scan time for acquiring an MR image is a major outstanding problem for the MRI community. To engage the community towards this objective, we hosted the second fastMRI competition for reconstructing MR images from subsampled k-space data. The data set for the 2020 competition focused on brain images and included 7,299 anonymized, fully-sampled brain scans, with 894 of these held back for challenge evaluation purposes. Our challenge included a qualitative evaluation component where radiologists assessed submissions for "quality of depiction of pathology." Our challenge also introduced a Transfer track, where participants were asked to run their models on scanners from MRI manufacturers not represented in the data set. Results showed one team scoring best in both SSIM scores and qualitative radiologist evaluations, establishing a new state of the art for MRI acceleration.

Abstract 21: FastMRI keynote Yvonne Lui: Fast(er) MRI: a radiologist's perspective in Medical Imaging Meets NeurIPS, Lui 10:50 AM

The use of deep-learning-based magnetic resonance image reconstruction methods is a rapidly developing area. Such techniques hold promise to solve significant clinical challenges in terms of improvements in image quality and decreases in acquisition time, for patient comfort, increased accessibility, and decreased cost. Neuroimaging is the use case for this year's fastMRI challenge and clinically relevant: MRI is the best way to image the brain, with excellent soft tissue contrast that other imaging modalities lack, and brain MRI is the single most common type of MRI performed. In addition to quantitative metrics for reconstruction quality, in medical applications it is important to incorporate expert reader review into the evaluation process. We will review the evaluations of the six subspecialty neuroradiologists who compose the 2020 fastMRI challenge expert panel and discuss the broader context of deep learning-based approaches to MR image reconstruction.
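
For readers unfamiliar with the fastMRI setup described above, the reconstruction task starts from undersampled k-space. Below is a minimal simulation sketch, assuming a simple random line mask with a fully-sampled center rather than the challenge's official sampling patterns:

```python
import torch

def undersample_kspace(image: torch.Tensor, keep_fraction: float = 0.25) -> torch.Tensor:
    """Simulate accelerated MRI: keep a subset of k-space lines and return the
    zero-filled reconstruction, i.e. the degraded baseline a model must improve on."""
    kspace = torch.fft.fftshift(torch.fft.fft2(image))
    n_lines = image.shape[-2]
    mask = torch.zeros(n_lines, 1)
    keep = torch.randperm(n_lines)[: int(keep_fraction * n_lines)]
    mask[keep] = 1.0
    center = n_lines // 2
    mask[center - 8 : center + 8] = 1.0  # always keep the low-frequency center
    zero_filled = torch.fft.ifft2(torch.fft.ifftshift(kspace * mask))
    return zero_filled.abs()

image = torch.rand(1, 320, 320)          # stand-in for a fully-sampled MR slice
print(undersample_kspace(image).shape)   # torch.Size([1, 320, 320])
```

A reconstruction network is then trained to map the zero-filled image (or the masked k-space itself) back to the fully-sampled target.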

Abstract 23: AI system for predicting the deterioration of COVID-19 patients in the emergency department in Medical Imaging Meets NeurIPS, Shamout N/A

During the COVID-19 pandemic, rapid and accurate triage of patients at the emergency department is critical to inform decision-making. We propose a data-driven approach for automatic prediction of deterioration risk using a deep neural network that learns from chest X-ray images, and a gradient boosting model that learns from routine clinical variables. Our AI prognosis system, trained using data from 3,661 patients, achieves an area under the curve (AUC) of 0.786 (95% CI: 0.742-0.827) when predicting deterioration within 96 hours. The deep neural network extracts informative areas of chest X-ray images to assist clinicians in interpreting the predictions, and performs comparably to two radiologists in a reader study. In order to verify performance in a real clinical setting, we silently deployed a preliminary version of the deep neural network at Anonymous Institution during the first wave of the pandemic, which produced accurate predictions in real time. In summary, our findings demonstrate the potential of the proposed system for assisting front-line physicians in the triage of COVID-19 patients.

Abstract 24: COVIDNet-S: SARS-CoV-2 lung disease severity grading of chest X-rays using deep convolutional neural networks in Medical Imaging Meets NeurIPS, Wong N/A

Assessment of lung disease severity is a crucial step in the clinical workflow for patients with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the cause of the coronavirus disease 2019 (COVID-19) pandemic. A routine procedure for performing such an assessment involves analyzing chest X-rays (CXRs), with two key metrics being the extent of lung involvement and the degree of opacity. In this study, we introduce COVIDNet-S, a pair of deep convolutional neural networks based on the COVID-Net architecture for performing automatic geographic extent grading and opacity extent grading. We further introduce COVIDx-S, a benchmark dataset consisting of 396 CXRs from SARS-CoV-2 positive patient cases around the world, graded by two board-certified expert chest radiologists (with 20+ years of experience) and a 2nd-year radiology resident. To the best of our knowledge, this is the largest study as well as dataset of its kind for SARS-CoV-2 severity grading. Furthermore, this is the first study of its kind to make both models and dataset open access for the research community. Experimental results using 100-trial stratified Monte Carlo cross-validation (split between geographic and opacity extent) showed that the COVIDNet-S networks achieved R^2 of 0.664 +/- 0.001 and 0.635 +/- 0.002 between predicted scores and radiologist scores for geographic extent and opacity extent, respectively, with the best performing COVIDNet-S networks achieving R^2 of 0.739 and 0.741 for geographic extent and opacity extent, respectively. These promising results illustrate the potential of leveraging deep convolutional neural networks for computer-aided assessment of SARS-CoV-2 lung disease severity.

Abstract 25: Zero-dose PET Reconstruction with Missing Input by U-Net with Attention Modules in Medical Imaging Meets NeurIPS, Ouyang N/A

Positron emission tomography (PET) is a widely used molecular imaging technique with many clinical applications. To obtain high quality images, the amount of injected radiotracer in current protocols leads to a risk of radiation exposure in scanned subjects. Recently, deep learning has been successfully used to enhance the quality of low-dose PET images. Extending this to "zero-dose," i.e., predicting PET images based solely on data from other imaging modalities such as multimodal MRI, is significantly more challenging but also much more impactful. In this work, we propose an attention-based framework that uses multi-contrast MRI to reconstruct PET images using the most commonly used radiotracer, 18F-fluorodeoxyglucose (FDG), a marker of metabolism. We also introduce an input dropout training strategy to handle possible missing MRI contrasts. We evaluate our methods on a dataset of patients with brain tumors, showing the ability to create realistic and clinically meaningful FDG brain PET images with low errors compared with full-dose ground truth PET images.
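
Abstract 25's input-dropout strategy for missing MRI contrasts can be illustrated with a short sketch. This is an assumption-laden illustration, not the authors' implementation; the channel count and drop probability are made up:

```python
import torch

def input_dropout(contrasts: torch.Tensor, p_drop: float = 0.3) -> torch.Tensor:
    """Randomly zero whole MRI contrast channels during training so the model
    learns to cope with missing inputs at test time. `contrasts` is (N, C, H, W)
    with one channel per MR contrast; at least one channel is always kept."""
    n, c = contrasts.shape[:2]
    keep = (torch.rand(n, c) > p_drop).float()
    # guarantee at least one surviving contrast per sample
    idx = torch.randint(c, (n,))
    keep[torch.arange(n), idx] = 1.0
    return contrasts * keep.view(n, c, 1, 1)

batch = torch.rand(4, 3, 64, 64)  # e.g. T1, T2, FLAIR channels (hypothetical)
print(input_dropout(batch).shape)
```

At inference, a genuinely missing contrast is then simply passed in as a zeroed channel, matching what the network saw in training.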

Abstract 26: RANDGAN: Randomized Generative Adversarial Network for Detection of COVID-19 in Chest X-ray in Medical Imaging Meets NeurIPS, Motamed N/A

Automation of COVID-19 testing using medical images can speed up the testing process of patients where health care systems lack sufficient numbers of the reverse-transcription polymerase chain reaction (RT-PCR) tests. Supervised deep learning models such as convolutional neural networks (CNNs) need enough labeled data for all classes to correctly learn the task of detection. Gathering labeled data is a cumbersome task and requires time and resources, which could further strain health care systems and radiologists at the early stages of a pandemic such as COVID-19. In this study, we propose a randomized generative adversarial network (RANDGAN) that detects images of an unknown class (COVID-19) from known and labelled classes (Normal and Viral Pneumonia) without the need for labels and training data from the unknown class of images (COVID-19). We used the largest publicly available COVID-19 chest X-ray dataset, COVIDx, which is comprised of Normal, Pneumonia, and COVID-19 images from multiple public databases. In this work, we use transfer learning to segment the lungs in the COVIDx dataset. Next, we show why segmentation of the region of interest (lungs) is vital to correctly learn the task of classification, specifically in datasets that contain images from different sources, as is the case for the COVIDx dataset. Finally, we show improved results in detection of COVID-19 cases using our generative model (RANDGAN) compared to conventional generative adversarial networks (GANs) for anomaly detection in medical images, improving the area under the ROC curve from 0.71 to 0.77.

Abstract 27: Attention Transfer Outperforms Transfer Learning in Medical Image Disease Classifiers in Medical Imaging Meets NeurIPS, Akbarian N/A

Convolutional neural networks (CNNs) are widely used in medical image diagnostics. However, training CNNs is prohibitive in a low-data environment. In this study, for the low-data medical image domain, we propose a novel knowledge transfer approach to facilitate the training of CNNs. Our approach adopts the attention transfer framework to transfer knowledge from a carefully pre-trained CNN teacher to a student CNN. The performance of the CNN models is then evaluated on three medical image datasets: Diabetic Retinopathy, CheXpert, and ChestX-ray8. We compare our results with the well-known and widely used transfer learning approach. We show that the teacher-student (attention transfer) framework not only outperforms transfer learning in both in-domain and cross-domain knowledge transfer, but also behaves as a regularizer.

Abstract 28: A Bayesian Unsupervised Deep-Learning Based Approach for Deformable Image Registration in Medical Imaging Meets NeurIPS, Khawaled N/A

Unsupervised deep-learning (DL) models were recently proposed for deformable image registration tasks. In such models, a neural network is trained to predict the best deformation field by minimizing some dissimilarity function between the moving and the target images. We introduce a fully Bayesian framework for unsupervised DL-based deformable image registration. Our method provides a principled way to characterize the true posterior distribution, thus avoiding potential over-fitting. We demonstrated the added value of our Bayesian unsupervised DL-based registration framework on the MNIST and brain MRI (MGH10) datasets in comparison to VoxelMorph. Our experiments show that our approach provided better estimates of the deformation field by means of improved mean squared error (0.0063 vs. 0.0065) and Dice coefficient (0.73 vs. 0.71) for the MNIST and the MGH10 datasets, respectively. Further, it provides an estimate of the uncertainty in the deformation field.

Abstract 29: Semi-Supervised Learning of MR Image Synthesis without Fully-Sampled Ground-Truth Acquisitions in Medical Imaging Meets NeurIPS, Yurt N/A

In this study, we present a novel semi-supervised generative model for multi-contrast MRI that synthesizes high-quality images without requiring large training sets of costly fully-sampled images of source or target contrasts. To do this, the proposed method introduces a selective loss expressed only in the available k-space coefficients, and further leverages randomized sampling trajectories across training subjects to effectively learn relationships between acquired and nonacquired k-space samples at all locations. Comprehensive experiments on multi-contrast brain images clearly demonstrate that the proposed method maintains performance equivalent to a gold-standard model based on fully-supervised training, while alleviating the undesirable dependency on large-scale fully-sampled MRI acquisitions.

Abstract 30: Learning MRI contrast agnostic registration in Medical Imaging Meets NeurIPS, Hoffmann, Dalca N/A

We introduce a strategy for learning image registration without imaging data, producing powerful networks agnostic to magnetic resonance imaging (MRI) contrast. While classical methods accurately estimate the spatial correspondence between images, they solve an optimization problem for every new image pair. Learning methods are fast at test time but limited to images with contrasts and geometric content seen at training. We propose to remove this dependency using a generative strategy that exposes networks to a wide range of synthetic images during training, forcing them to generalize. We show that networks trained within this framework generalize to a broad array of unseen MRI contrasts and surpass state-of-the-art brain registration accuracy for any contrast combination tested. Critically, training on shapes synthesized from noise distributions results in competitive performance, removing the dependency on acquired data of any kind. However, if available, synthesizing images from anatomical labels can further boost accuracy.

Abstract 31: Adversarial cycle-consistent synthesis of cerebral microbleeds for data augmentation in Medical Imaging Meets NeurIPS, Faryna N/A

We propose a novel framework for controllable pathological image synthesis for data augmentation. Inspired by CycleGAN, we perform cycle-consistent image-to-image translation between two domains: healthy and pathological. Guided by a semantic mask, an adversarially trained generator synthesizes pathology on a healthy image in the specified location. We demonstrate our approach on an institutional dataset of cerebral microbleeds in traumatic brain injury patients. We utilize synthetic images generated with our method for data augmentation in the detection of cerebral microbleeds. Enriching the training dataset with synthetic images shows the potential to increase detection performance for cerebral microbleeds in traumatic brain injury patients.

Abstract 32: Embracing the Disharmony in Heterogeneous Medical Data in Medical Imaging Meets NeurIPS, Wang N/A

Heterogeneity in medical imaging data is often tackled, in the context of machine learning, using domain invariance, i.e. deriving models that are robust to domain shifts, which can be both within domain (e.g. demographics) and across domains (e.g. scanner/protocol characteristics). However, this approach can be detrimental to performance because it necessitates averaging across intra-class variability and reduces the discriminatory power of learned models, in order to achieve better intra- and inter-domain generalization. This paper instead embraces the heterogeneity and treats it as a multi-task learning problem to explicitly adapt trained classifiers to both inter-site and intra-site heterogeneity. We demonstrate that the error of a base classifier on challenging 3D brain magnetic resonance imaging (MRI) datasets can be reduced by 2-3x, in certain tasks, by adapting to the specific demographics of the patients and different acquisition protocols. Learning the characteristics of domain shifts is achieved via auxiliary learning tasks leveraging commonly available data and variables, e.g. demographics. In our experiments, we use gender classification and age regression as auxiliary tasks helping the network weights trained on a source site adapt to data from a target site; we show that this approach improves classification error by 5-30% across different datasets on the main classification tasks, e.g. disease classification.

Abstract 33: Deep Learning extracts novel MRI biomarkers for Alzheimer's disease progression in Medical Imaging Meets NeurIPS, Li N/A

Case/control genome-wide association studies (GWAS) for late-onset Alzheimer's disease (AD) may miss genetic variants relevant for delineating disease stages when using clinically defined case/control as a phenotype, since the cases highlight advanced AD and widely heterogeneous mild cognitive impairment patients are usually excluded. More precise phenotypes for AD are in demand. Here we use a transfer learning technique to train three-dimensional convolutional neural network (CNN) models based on structural Magnetic Resonance Images (MRI) from the screening stage in the ADNI consortium to derive image features that reflect AD progression. CNN-derived image phenotypes are significantly associated with genetic variants mapped to candidate genes enriched for amyloid beta degradation, tau phosphorylation, calcium-ion-binding-dependent synaptic loss, APP-regulated inflammation response, and insulin resistance. This is the first attempt to show that non-invasive MRI biomarkers are linked to AD progression characteristics, reinforcing their use in early AD diagnosis and progression monitoring.

Abstract 34: Multi-Label Incremental Few-Shot Learning for Medical Image Pathology classifiers in Medical Imaging Meets NeurIPS, Seyyed-Kalantari N/A

Deep learning models for medical image classification are typically trained to predict pre-determined radiological findings and cannot incorporate a novel disease at test time efficiently. Retraining an entirely new classifier is often out of the question due to insufficient novel data or a lack of compute or base-disease data. Thus, learning fast-adapting models is essential. Few-shot learning has shown promise for efficient adaptation to new classes at test time, but the literature revolves primarily around single-label natural images distributed over a large number of different classes. However, this setting differs notably from the medical imaging domain, where images are multilabel, of fewer total categories, and retention of base label predictions is desired. In this paper, we study incremental few-shot learning for low- and multilabel medical image data to address the problem of learning a novel disease with few finetuning samples while retaining knowledge over base findings. We show strong performance on incrementally learned novel disease labels for chest X-rays with strong performance retention on base classes.

Abstract 35: Retrospective Motion Correction of MR Images using Prior-Assisted Deep Learning in Medical Imaging Meets NeurIPS, Chatterjee N/A

In MRI, motion artefacts are among the most common types of artefacts. They can greatly degrade images and make them unusable for accurate diagnosis. Traditional methods, such as prospective or retrospective motion correction, are commonly used to avoid or limit the presence of motion artefacts. Recently, several other methods based on deep learning approaches have been proposed to solve this problem. This work tries to enhance the performance of existing deep learning models by making use of additional information present as image priors. The proposed approach has shown promising results and will be further investigated for clinical validity.

Abstract 36: Towards disease-aware image editing of chest X-rays in Medical Imaging Meets NeurIPS, Saboo N/A

Disease-aware image editing by means of generative adversarial networks (GANs) constitutes a promising avenue for advancing the use of AI in the healthcare sector; we present a proof of concept of the same. While GAN-based techniques have been successful in generating and manipulating natural images, their application to the medical domain, however, is still in its infancy. Working with the CheXpert data set, here we show that StyleGAN can be trained to generate realistic chest X-rays. Inspired by the Cyclic Reverse Generator (CRG) framework, we train an encoder that allows for faithfully inverting the generator on synthetic X-rays and provides organ-level reconstructions of real ones. Employing a guided manipulation of latent codes, we confer the medical condition of cardiomegaly (increased heart size) onto real X-rays from healthy patients.

Abstract 37: Diffusion MRI-based structural connectivity robustly predicts "brain-age" in Medical Imaging Meets NeurIPS, Gurusamy N/A

Neuroimaging-based biomarkers of brain health are necessary for early diagnosis of cognitive decline in the aging population. While many recent studies have investigated whether an individual's "brain-age" can be accurately predicted based on anatomical or functional brain biomarkers, comparatively few studies have sought to predict brain-age with structural connectivity features alone. Here, we investigated this question with data from a large cross-sectional study of elderly volunteers in India (n=158 participants, age range 51-86 yrs, 66 females). We analyzed 23 standardized cognitive test scores obtained from these participants with factor analysis. All test score variations could be explained with just three latent cognitive factors, each of which declined markedly with age. Next, using diffusion magnetic resonance imaging (dMRI) and tractography, we estimated the structural brain connectome in a subset of n=101 individuals. Structural connectivity features robustly predicted inter-individual variations in cognitive factor scores (r=0.293-0.407, p<0.001) and chronological age (r=0.517-0.535, p<0.001), and identified critical connections in the prefrontal and parietal cortex whose strength most strongly predicted each of these variables. dMRI-based structural connectivity may serve as a reliable tool for predicting age-related cognitive decline in healthy individuals, as well as accelerated decline in patient populations.

Abstract 38: Clinical Validation of Machine Learning Algorithm Generated Images in Medical Imaging Meets NeurIPS, Kwon N/A

Generative machine learning (ML) methods can reduce the time, cost, and radiation associated with medical image acquisition, compression, or generation techniques. While quantitative metrics are commonly used in the evaluation of ML-generated images, it is unknown how well these quantitative metrics relate to the diagnostic utility of images. Here, fellowship-trained radiologists provided diagnoses and qualitative evaluations on chest radiographs reconstructed from the current standard JPEG2000 or variational autoencoder (VAE) techniques. Cohen's kappa coefficient measured the agreement of diagnoses based on different reconstructions. Methods that produced similar Fréchet inception distance (FID) showed similar diagnostic performance. Thus, in place of time-intensive expert radiologist verification, an appropriate target FID -- an objective quantitative metric -- can evaluate the clinical utility of ML-generated medical images.

Abstract 39: Hierarchical Amortized Training for Memory-efficient High Resolution 3D GAN in Medical Imaging Meets NeurIPS, Sun N/A

Generative Adversarial Networks (GANs) have many potential medical imaging applications, including data augmentation, domain adaptation, and model explanation. Due to the limited embedded memory of Graphical Processing Units (GPUs), most current 3D GAN models are trained on low-resolution medical images. In this work, we propose a novel end-to-end GAN architecture that can generate high-resolution 3D images. We achieve this goal by separating training and inference. During training, we adopt a hierarchical structure that simultaneously generates a low-resolution version of the image and a randomly selected sub-volume of the high-resolution image. The hierarchical design has two advantages: First, the memory demand for training on high-resolution images is amortized among sub-volumes. Furthermore, anchoring the high-resolution sub-volumes to a single low-resolution image ensures anatomical consistency between sub-volumes. During inference, our model can directly generate full high-resolution images. Experiments on 3D thorax CT and brain MRI demonstrate that our approach outperforms baselines in the quality of generated images.
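
The memory-amortization trick in Abstract 39 (pairing a downsampled whole volume with one randomly anchored high-resolution sub-volume per step) can be sketched as follows; shapes and the patch size are illustrative assumptions, not the paper's settings:

```python
import torch
import torch.nn.functional as F

def sample_training_pair(volume: torch.Tensor, patch: int = 32, scale: int = 4):
    """One training step's targets for a hierarchical 3D GAN: a downsampled
    whole volume plus one randomly located full-resolution sub-volume, so
    memory grows with the patch size rather than with the full volume."""
    d, h, w = volume.shape[-3:]
    low_res = F.interpolate(volume, scale_factor=1 / scale,
                            mode="trilinear", align_corners=False)
    z = torch.randint(0, d - patch + 1, (1,)).item()
    y = torch.randint(0, h - patch + 1, (1,)).item()
    x = torch.randint(0, w - patch + 1, (1,)).item()
    sub = volume[..., z : z + patch, y : y + patch, x : x + patch]
    return low_res, sub, (z, y, x)  # the location keeps sub-volumes anchored

vol = torch.rand(1, 1, 128, 128, 128)  # stand-in high-resolution CT/MRI volume
low, sub, loc = sample_training_pair(vol)
print(low.shape, sub.shape, loc)
```

Feeding the anchor location back to the generator is what lets independently sampled sub-volumes remain anatomically consistent with the shared low-resolution image.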

Abstract 40: Can We Learn to Explain Chest X-Rays?: A Cardiomegaly Use Case in Medical Imaging Meets NeurIPS, Jethani N/A

In order to capitalize on the numerous applications of machine learning for medical imaging analysis, clinicians need to understand the clinical decisions made by machine learning (ML) models. This allows clinicians to trust ML models, understand their failure modes, and ideally learn from their superhuman capabilities and expand clinical knowledge. Providing explanations for each high-resolution image in a large medical database can be computationally expensive. Recent methods amortize this cost by learning a selector model that takes a sample and selects the subset of its features that is important. We show that while the selector models learned by these methods make it simple for practitioners to explain new images, the model learns to counterintuitively encode predictions within its selections, omitting the important features. We demonstrate that this phenomenon can occur even with simple medical imaging tasks, such as detecting cardiomegaly in chest X-rays. We propose REAL-X to address these issues and show that our method provides trustworthy explanations through quantitative and expert radiologist evaluation.

Abstract 41: Joint Hierarchical Bayesian Learning of Full-structure Noise for Brain Source Imaging in Medical Imaging Meets NeurIPS, Hashemi N/A

Many problems in human brain imaging involve hierarchical Bayesian (type-II maximum likelihood) regression models for observations with latent variables for source and noise, where parameters of priors for source and noise terms need to be estimated jointly from data. One example is the biomagnetic inverse problem, where crucial factors influencing the accuracy of brain source estimation are not only the noise level but also its correlation structure. Importantly, existing approaches have not addressed estimation of a full-structure noise covariance matrix. Using ideas from Riemannian geometry, we derive an efficient algorithm for updating both source and full-structure noise covariance along the manifold of positive definite matrices. Our results demonstrate that the novel framework significantly improves upon state-of-the-art techniques in the real-world scenario with fully-structured noise covariance.

Abstract 42: Learning to estimate a surrogate respiratory signal from cardiac motion by signal-to-signal translation in Medical Imaging Meets NeurIPS, Iyer N/A

In this work, we develop a neural-network-based method to convert a noisy motion signal generated from segmenting cardiac SPECT images into a high-quality surrogate signal, such as those seen from external motion tracking systems (EMTs). This synthetic surrogate will be used as input to our pre-existing motion correction technique developed for EMT surrogate signals. In our method, we test two families of neural networks to perform signal-to-signal translation (noisy internal motion to external surrogate): 1) fully connected networks and 2) convolutional neural networks. Our dataset consists of cardiac perfusion SPECT acquisitions for which cardiac motion was estimated (input: COM signals) in conjunction with a respiratory surrogate motion signal acquired using a commercial Vicon Motion Tracking System (ground truth: EMT signals). We obtain an r-score of 0.74 between the predicted surrogate and the EMT signal, and our goal is to lay a foundation to guide the optimization of neural networks for respiratory motion correction from SPECT without the need for an EMT.
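
A minimal sketch of the signal-to-signal translation setup in Abstract 42, using a small 1D convolutional network; the architecture and sizes are illustrative assumptions rather than the networks the authors evaluated:

```python
import torch
import torch.nn as nn

class SignalTranslator(nn.Module):
    """Toy 1D conv translator: noisy internal motion signal -> cleaner surrogate."""

    def __init__(self, hidden: int = 32, kernel: int = 9):
        super().__init__()
        pad = kernel // 2
        self.net = nn.Sequential(
            nn.Conv1d(1, hidden, kernel, padding=pad), nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel, padding=pad), nn.ReLU(),
            nn.Conv1d(hidden, 1, kernel, padding=pad),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, 1, time)
        return self.net(x)

model = SignalTranslator()
noisy_com = torch.rand(8, 1, 256)    # center-of-mass signal from SPECT segmentation
predicted_emt = model(noisy_com)     # surrogate for the external tracking signal
target_emt = torch.rand(8, 1, 256)   # stand-in for the Vicon ground truth
loss = nn.functional.mse_loss(predicted_emt, target_emt)
loss.backward()
print(predicted_emt.shape)
```

Training against paired EMT recordings, as the abstract describes, is an ordinary regression; the interesting part is that the learned surrogate then replaces the physical tracker downstream.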

Abstract 43: Biomechanical modelling of brain atrophy through deep learning in Medical Imaging Meets NeurIPS, da Silva N/A

We present a proof-of-concept, deep learning (DL) based, differentiable biomechanical model of realistic brain deformations. Using prescribed maps of local atrophy and growth as input, the network learns to deform images according to a Neo-Hookean model of tissue deformation. The tool is validated using longitudinal brain atrophy data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, and we demonstrate that the trained model is capable of rapidly simulating new brain deformations with minimal residuals. This method has the potential to be used in data augmentation or for the exploration of different causal hypotheses reflecting brain growth and atrophy.

Abstract 44: Ultrasound Diagnosis of COVID-19: Robustness and Explainability in Medical Imaging Meets NeurIPS, Roberts N/A

Diagnosis of COVID-19 at the point of care is vital to the containment of the global pandemic. Point-of-care ultrasound (POCUS) provides rapid imagery of the lungs to detect COVID-19 in patients in a repeatable and cost-effective way. Previous work has focused on using a public dataset of POCUS videos to train an AI model for diagnosis that obtains high sensitivity. Due to the high-stakes application, we propose the use of robust and explainable techniques. We demonstrate experimentally that robust models have more stable predictions and offer improved interpretability. A framework of contrastive explanations based on adversarial perturbations is used to explain model predictions in a way that aligns with human visual perception.

Abstract 45: Self-supervised out-of-distribution detection in brain CT scans in Medical Imaging Meets NeurIPS, Kim N/A

Medical imaging data suffer from limited availability of annotations, because annotating 3D medical data is a time-consuming and expensive task. Moreover, even when annotation is available, supervised learning-based approaches suffer from highly imbalanced data. Most of the scans acquired during screening are from normal subjects, while there are large variations among abnormal cases. To address these issues, unsupervised deep anomaly detection methods that train a model on large numbers of normal scans and detect abnormal scans by calculating the reconstruction error have recently been reported. In this paper, we propose a novel self-supervised learning technique for anomaly detection. Our architecture largely consists of two parts: 1) reconstruction and 2) predicting geometric transformations. By training the network to predict geometric transformations, the model can learn better image features and the distribution of normal scans. At test time, the geometric transformation predictor can assign an anomaly score by calculating the error between the applied geometric transformation and its prediction. Moreover, we further use self-supervised learning with context restoration for pretraining our model. Through comparative experiments on clinical brain CT scans, the effectiveness of the proposed method has been verified.

Abstract 46: Improving Interpretability in Medical Imaging Diagnosis using Adversarial Training in Medical Imaging Meets NeurIPS, Margeloiu N/A

We investigate the influence of adversarial training on the interpretability of convolutional neural networks (CNNs), specifically applied to diagnosing skin cancer. We show that gradient-based saliency maps of adversarially trained CNNs are significantly sharper and more visually coherent than those of standardly trained CNNs. Furthermore, we show that adversarially trained networks highlight regions with significant color variation within the lesion, a common characteristic of melanoma. We find that fine-tuning a robust network with a small learning rate further improves the saliency maps' sharpness. Lastly, we provide preliminary work suggesting that robustifying the first layers to extract robust low-level features leads to visually coherent explanations.
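
Abstract 45 scores anomalies by how poorly a transformation predictor trained on normal scans recovers the applied geometric transformation. Here is a toy sketch using 90-degree rotations as the transformation family (an assumption for illustration; the abstract does not enumerate its exact transformations):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy classifier predicting which of four 90-degree rotations was applied.
# In the paper's setting this is trained on normal scans only, so it performs
# poorly on out-of-distribution anatomy, which is what the score exploits.
predictor = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
    nn.Flatten(), nn.Linear(8, 4),
)

def anomaly_score(image: torch.Tensor) -> float:
    """Average error between the applied rotation and the predicted one."""
    score = 0.0
    for k in range(4):
        rotated = torch.rot90(image, k, dims=(-2, -1))
        logits = predictor(rotated)
        target = torch.full((image.shape[0],), k, dtype=torch.long)
        score += F.cross_entropy(logits, target).item()
    return score / 4

scan = torch.rand(1, 1, 64, 64)  # stand-in for a brain CT slice
print(anomaly_score(scan))
```

Thresholding this score (possibly combined with a reconstruction error, as the abstract's two-part architecture suggests) then flags out-of-distribution scans.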

Abstract 46: Improving Interpretability in Medical Imaging Diagnosis using Adversarial Training in Medical Imaging Meets NeurIPS, Margeloiu N/A

We investigate the influence of adversarial training on the interpretability of convolutional neural networks (CNNs), specifically applied to diagnosing skin cancer. We show that gradient-based saliency maps of adversarially trained CNNs are significantly sharper and more visually coherent than those of standardly trained CNNs. Furthermore, we show that adversarially trained networks highlight regions with significant colour variation within the lesion, a common characteristic of melanoma. We find that fine-tuning a robust network with a small learning rate further improves the sharpness of the saliency maps. Lastly, we provide preliminary work suggesting that robustifying the first layers to extract robust low-level features leads to visually coherent explanations.


Abstract 47: Unsupervised detection of Hypoplastic Left Heart Syndrome in fetal screening in Medical Imaging Meets NeurIPS, Chotzoglou N/A

Congenital heart disease is considered one of the most common congenital malformations, affecting 6-11 per 1000 newborns. In this work, an automated framework for the detection of cardiac anomalies during ultrasound screening examinations is proposed and evaluated on the example of Hypoplastic Left Heart Syndrome, a sub-category of congenital heart disease. We propose an unsupervised approach that learns healthy anatomy exclusively from clinically confirmed normal control patients. We evaluate a number of known anomaly detection frameworks together with a model architecture based on the α-GAN network, and find evidence that the proposed model achieves a performance of 0.8 AUC with better robustness to initialisation compared to individual state-of-the-art models.


Abstract 48: Hip Fracture Risk Modeling Using DXA and Deep Learning in Medical Imaging Meets NeurIPS, Sadowski N/A

The risk of hip fracture is predicted from dual-energy X-ray absorptiometry (DXA) images using deep learning and over 10,000 exams from the HealthABC longitudinal study. The approach is evaluated in four different clinical scenarios of increasing diagnostic intensity. In the scenario with the most information available, deep learning achieves an area under the ROC curve (AUC) of 0.75 on a held-out test set, while a standard linear model that relies on feature engineering achieves an AUC of 0.72.


Abstract 49: Autoencoder Image Compression Algorithm for Reduction of Resource Requirements in Medical Imaging Meets NeurIPS, Kwon N/A

Exponentially increasing amounts of compute resources are used by state-of-the-art machine learning (ML) models. We designed a lightweight medical imaging compression algorithm with preserved diagnostic utility. Our compression algorithm was a two-level, vector-quantized variational autoencoder (VQ-VAE-2). We trained our algorithm in a self-supervised manner on CheXpert radiographs and externally validated it on previously unseen MIMIC-CXR radiographs. We also used the compressed latent vectors or the reconstructed CheXpert images as inputs to train a DenseNet-121 classifier. The VQ-VAE achieved 2.5 times the compression ratio of the current JPEG2000 standard at a similar Fréchet inception distance. The classifier trained on latent vectors has a similar AUROC to that of the model trained on the original images. Model training with latent vectors required 6.2% of the memory and compute and 48.5% of the time per epoch compared to training with the original images. Autoencoders can decrease resource requirements for future ML research.
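
As a rough, hypothetical illustration of the vector-quantization step behind Abstract 49 (the actual model is a two-level convolutional VQ-VAE-2, which is far richer than this sketch), latent vectors can be compressed by storing only the index of their nearest codebook entry:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)

# Stand-ins for encoder outputs: 10,000 latent vectors of dimension 64.
latents = rng.normal(size=(10_000, 64)).astype(np.float32)

# Learn a codebook of 256 entries, so each vector compresses to one byte.
codebook = KMeans(n_clusters=256, n_init=4, random_state=0).fit(latents)

codes = codebook.predict(latents).astype(np.uint8)    # compressed form
reconstructed = codebook.cluster_centers_[codes]      # decompressed form

raw_bytes = latents.nbytes
compressed_bytes = codes.nbytes + codebook.cluster_centers_.nbytes
print(f"compression ratio: {raw_bytes / compressed_bytes:.1f}x")
print(f"mean quantization error: {np.mean((latents - reconstructed) ** 2):.3f}")
```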

Abstract 50: Classification with a domain shift in medical imaging in Medical Imaging Meets NeurIPS, Fontanella N/A

Labelled medical imaging datasets are often small in size, but other unlabelled datasets with a domain shift may be available. In this work, we propose a method that is able to exploit these additional unlabelled data, possibly with a domain shift, to improve predictions on our labelled data. To this aim, we learn features in a self-supervised way while projecting all the data onto the same space to achieve better transfer. We first test our approach on natural images and verify its effectiveness on Office-31 data. Then, we apply it to retinal fundus datasets: through a series of experiments on age-related macular degeneration (AMD) and diabetic retinopathy (DR) grading, we show how our method improves on the baseline of pre-training on ImageNet and fine-tuning on the labelled data in terms of classification accuracy, AUC and clinical interpretability.


Abstract 51: 3D UNet with GAN discriminator for robust localisation of the fetal brain and trunk in MRI with partial coverage of the fetal body in Medical Imaging Meets NeurIPS, Uus N/A

In fetal MRI, automated localisation of the fetal brain or trunk is a prerequisite for motion correction methods. However, the existing CNN-based solutions are prone to errors and may require manual editing. In this work, we propose to combine a multi-label 3D UNet with a GAN discriminator for localisation of both the fetal brain and trunk in fetal MRI stacks. The proposed method is robust for datasets with both full and partial coverage of the fetal body.

Abstract 52: Decoding Brain States: Clustering fMRI Dynamic Functional Connectivity Timeseries with Deep Autoencoders in Medical Imaging Meets NeurIPS, Spencer N/A

In dynamic functional connectivity analysis, brain states can be derived by identifying repetitively occurring functional connectivity patterns. This presents a high-dimensional, unsupervised learning task, often approached with k-means clustering. To advance this, we use deep autoencoders for dimensionality reduction before applying k-means in the embedded space. We provide quantitative validation on synthetic data and demonstrate better performance than currently used approaches. We go on to demonstrate the utility of this method by applying it to real data from human subjects.


Abstract 53: Community Detection in Medical Image Datasets: Using Wavelets and Spectral Clustering in Medical Imaging Meets NeurIPS, Yousefzadeh N/A

Medical image datasets can contain large numbers of images representing patients with different health conditions and varying disease severity. When dealing with raw unlabeled image datasets, the large number of samples often makes it hard for non-experts to understand the variety of images present. Supervised learning methods rely on labeled images, which requires considerable effort by medical experts to first understand the communities of images present in the data and then label them. Here, we propose an algorithm to facilitate the automatic identification of communities in medical image datasets. We further explain that such analysis can also be insightful in a supervised setting, when the images are already labeled. Such insights are useful because, in reality, health and disease severity can be considered a continuous spectrum, and within each class there usually are finer communities worthy of investigation, especially when they have similarities to communities in other classes. In our approach, we use wavelet decomposition of images in tandem with spectral methods. We show that the eigenvalues of a graph Laplacian can reveal the number of notable communities in an image dataset. In our experiments, we use a dataset of images labeled with different conditions for COVID patients. We detect 25 communities in the dataset and then observe that only 5 of those communities contain patients with pneumonia.
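
The eigenvalue-counting idea in Abstract 53 can be sketched in a few lines: on a similarity graph, the position of the largest gap among the smallest eigenvalues of the normalized Laplacian suggests the number of communities. The synthetic data and kernel bandwidth below are illustrative assumptions, not the authors' wavelet-based pipeline.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(2)

# Synthetic features: three well-separated clusters standing in for image communities.
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(40, 8)) for c in (-3, 0, 3)])

# Gaussian similarity graph (narrow bandwidth) and its normalized Laplacian.
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-d2 / (0.1 * d2.mean()))
Dinv = np.diag(1.0 / np.sqrt(W.sum(1)))
L = np.eye(len(X)) - Dinv @ W @ Dinv

evals = eigh(L, eigvals_only=True)        # ascending eigenvalues
gaps = np.diff(evals[:10])                # look for the largest gap
print("smallest eigenvalues:", np.round(evals[:6], 3))
print("estimated number of communities:", int(np.argmax(gaps)) + 1)
```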

Abstract 54: RATCHET: Medical Transformer for Chest X-ray Diagnosis and Reporting in Medical Imaging Meets NeurIPS, Hou N/A

Chest X-rays are one of the most common forms of radiological examination. They are relatively inexpensive and quick to perform; however, the ability to interpret a radiograph may take years of training for a highly skilled practitioner. Automatic generation of radiology reports, therefore, can be an attractive method to support clinical pathways and patient care. In this work, we present RATCHET: RAdiological Text Captioning for Human Examined Thoraxes. RATCHET is trained on free-text radiology reports from the MIMIC-CXR dataset and is demonstrated to be highly linguistically fluent whilst being clinically accurate.


Abstract 55: Annotation-Efficient Deep Semi-Supervised Learning for Automatic Knee Osteoarthritis Severity Diagnosis from Plain Radiographs in Medical Imaging Meets NeurIPS, Nguyen N/A

Osteoarthritis (OA) is a worldwide disease that occurs in joints, causing irreversible damage to cartilage and other joint tissues. The knee is particularly vulnerable to OA, and millions of people, regardless of gender, geographical location, and race, suffer from knee OA. When the disease reaches the late stages, patients have to undergo total knee replacement (TKR) surgery to avoid disability. For society, the direct and indirect costs of OA are high; for instance, OA is one of the five most expensive healthcare expenditures in Europe. In the United States, the burden of knee OA is also high, and TKR surgeries cost over 10 billion dollars annually. If knee OA could be detected at an early stage, its progression might be slowed down, thereby yielding significant benefits at personal and societal levels.

Radiographs, low-cost and widely available in primary care, are sufficiently informative for knee OA severity diagnosis. However, the process of visual assessment of radiographs is rather tedious, and as a result, various Deep Learning (DL) based methods for automatic diagnosis of knee OA severity have recently been developed. The primary drawback of these methods is their dependency on large amounts of annotations, which are expensive in terms of cost and time to collect. In this paper, we introduce Semixup, a novel Semi-Supervised Learning (SSL) method, which we apply to automatic diagnosis of knee OA severity in an annotation-efficient manner.


Abstract 56: MVD-Fuse: Detection of White Matter Degeneration via Multi-View Learning of Diffusion Microstructure in Medical Imaging Meets NeurIPS, Fadnavis N/A

Detecting neuro-degenerative disorders in early-stage and asymptomatic patients is challenging. Diffusion MRI (dMRI) has shown great success in generating biomarkers for cellular organization at the microscale level using complex biophysical models, but there has never been a consensus on a clinically usable standard model. Here, we propose a new framework (MVD-Fuse) to integrate measures of diverse diffusion models to detect alterations of white matter microstructure. The spatial maps generated by each measure are first considered as a different diffusion representation (view), the fusion of these views being used to detect differences between clinically distinct groups. We investigate three different strategies for performing intermediate fusion: neural networks (NN), multiple kernel learning (MKL) and multi-view boosting (MVB). As a proof of concept, we applied MVD-Fuse to a classification of premanifest Huntington's disease (pre-HD) individuals and healthy controls in the TRACK-ON cohort. Our results indicate that MVD-Fuse boosts predictive power, especially with MKL (0.90 AUC vs 0.85 with the best single diffusion measure). Overall, our results suggest that an improved characterization of pathological brain microstructure can be obtained by combining various measures from multiple diffusion models.


Abstract 57: Scalable solutions for MR image classification of Alzheimer's disease in Medical Imaging Meets NeurIPS, Brueningk N/A

Magnetic resonance imaging is one of the flagship techniques for non-invasive medical diagnosis. Yet high-resolution three-dimensional (3D) imaging poses a challenge for machine learning applications: how do we determine the optimal trade-off between computational cost and imaging detail retained? Here, we present two scalable approaches for image classification relying on topological data analysis and ensemble classification on parallelized 3D convolutional neural networks. We demonstrate the applicability of our models on a classification task of MR images of Alzheimer's disease patients and cognitively normal subjects. Our approaches achieve competitive results in terms of area under the precision-recall curve (0.95 +/- 0.03).


Abstract 58: Quantification of task similarity for efficient knowledge transfer in biomedical image analysis in Medical Imaging Meets NeurIPS, Scholz N/A

Shortage of annotated data is one of the greatest bottlenecks related to deep learning in healthcare. Methods proposed to address this issue include transfer learning, crowdsourcing and self-supervised learning. More recently, attempts to leverage the concept of meta learning have been made. Meta learning studies how learning systems can increase in efficiency through experience, where experience can be represented, for example, by solutions to tasks connected to previously acquired data. A core capability of meta learning-based approaches is the identification of similar previous tasks given a new task. Quantifying the similarity between tasks, however, is an open research problem. We address this challenge by investigating two complementary approaches: (1) leveraging images and labels to embed a complete data set in a vector of fixed length that serves as a task fingerprint, and (2) directly comparing the distributions of the images with sample-based and optimal transport-based methods, thereby neglecting the labels.
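
As a hypothetical illustration of the optimal-transport comparison mentioned in Abstract 58 (the paper's exact estimators are not specified here), the distance between two small, equally sized image samples can be computed as an exact one-to-one matching cost:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(3)

def ot_distance(A, B):
    """Exact optimal transport cost between two equally sized samples of
    flattened images, under uniform weights (a simple assignment problem)."""
    cost = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    rows, cols = linear_sum_assignment(cost)
    return cost[rows, cols].mean()

# Stand-ins for two imaging datasets: 64 images of 8x8 pixels each.
task_a = rng.normal(0.0, 1.0, size=(64, 64))
task_b = rng.normal(0.5, 1.0, size=(64, 64))   # shifted distribution

print("distance(a, shuffled a):", round(ot_distance(task_a, rng.permutation(task_a)), 2))
print("distance(a, b)         :", round(ot_distance(task_a, task_b), 2))
```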

Abstract 59: Harmonization and the Worst Scanner Syndrome in Medical Imaging Meets NeurIPS, Moyer N/A

We show that for a wide class of harmonization/domain-invariance schemes, several undesirable properties are unavoidable. If a predictive machine is made invariant to a set of domains, the accuracy of the output predictions (as measured by mutual information) is limited by the domain with the least amount of information to begin with. If a real label value is highly informative about the source domain, it cannot be accurately predicted by an invariant predictor. These results are simple and intuitive, but we believe that it is beneficial to state them for medical imaging harmonization.
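
One way to make the first claim of Abstract 59 concrete, offered here as an editor's back-of-the-envelope derivation and not as the paper's own theorem, uses standard information-theoretic identities:

```latex
% Let $X$ be the image, $D$ the scanner/domain, $Y$ the label, and let
% $Z = f(X)$ be a harmonized representation with $I(Z; D) = 0$.
\[
  I(Y;Z) \;\le\; I(Y;Z \mid D)
         \;=\; \textstyle\sum_d p(d)\, I(Y;Z \mid D{=}d)
         \;\le\; \textstyle\sum_d p(d)\, I(Y;X \mid D{=}d).
\]
% The first inequality holds because
% $I(Y;Z \mid D) - I(Y;Z) = I(Z;D \mid Y) - I(Z;D) = I(Z;D \mid Y) \ge 0$,
% and the last step is the data-processing inequality applied within each
% domain: a single low-information scanner drags down the achievable bound.
```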

Abstract 60: LVHNet: Detecting Cardiac Structural Abnormalities with Chest X-Rays in Medical Imaging Meets NeurIPS, Bhave N/A

Early identification of changes to heart size is critical to improving outcomes in heart failure. We introduce a deep learning model for detecting cardiac structural abnormality in chest X-rays. State-of-the-art deep learning models focus on detecting cardiomegaly, a label consistently shown to be a poor marker for cardiac disease. Our method targets four major cardiac structural abnormalities -- left ventricular hypertrophy (LVH), severe LVH, dilated cardiomyopathy phenotype, and hypertrophic cardiomyopathy phenotype -- with performance superior to radiologist assessments. Furthermore, upon interrogation, we find our model's predictions are driven most strongly by structural features of the heart, confirming that our model correctly focuses on the elements of chest X-rays pertinent to the diagnosis of cardiac structural abnormality.


Abstract 61: Predicting the Need for Intensive Care for COVID-19 Patients using Deep Learning on Chest Radiography in Medical Imaging Meets NeurIPS, Hu N/A

In this study, we propose an artificial intelligence (AI) COVID-19 prognosis method to predict patients' needs for intensive care by analyzing chest radiography (CXR) images using deep learning. The dataset consisted of the CXR exams of 1178 COVID-19-positive patients, as confirmed by reverse transcription polymerase chain reaction tests for the SARS-CoV-2 virus, 20% of which were held out for testing. Our model was based on DenseNet121, and a curriculum learning technique was employed to train on a sequence of gradually more specific and complex tasks: 1) fine-tuning a model pretrained on ImageNet using a previously established CXR dataset with a broad spectrum of pathologies, 2) refining on another established dataset to detect pneumonia, and 3) fine-tuning on our training/validation dataset to predict patients' needs for intensive care within 24, 48, 72, and 96 hours following the CXR exams. The classification performance was evaluated on the independent test set using the area under the receiver operating characteristic curve (AUC) as the performance metric in the task of distinguishing between those COVID-19-positive patients who required intensive care and those who did not. We achieved an AUC [95% confidence interval] of 0.77 [0.70, 0.84] when predicting the need for intensive care 24 hours in advance, and at least 0.73 [0.66, 0.80] for earlier predictions based on the AI prognostic marker derived from CXR images.


Abstract 62: Probabilistic Recovery of Missing Phase Images in Contrast-Enhanced CT in Medical Imaging Meets NeurIPS, Patel N/A

Contrast-Enhanced CT (CECT) imaging is used in the diagnosis of renal cancer and the planning of surgery. Often, some CECT phase images are either completely missing or are corrupted with external noise, making them useless. We propose a probabilistic deep generative model for imputing missing phase images in a sequence of CECT images. Our proposed model recovers the missing phase images with quantified uncertainty estimates, enabling medical decision-makers to make better-informed decisions. Furthermore, we propose a novel style-based adversarial loss to learn very fine-scale features unique to CECT imaging, resulting in better recovery. We demonstrate the efficacy of this algorithm using a patient dataset collected in an IRB-approved retrospective study.

Abstract 63: A Deep Learning Model to Detect Anemia from Echocardiography in Medical Imaging Meets NeurIPS, Hughes N/A

Computer vision models applied in medical imaging domains are capable of diagnosing diseases beyond what human physicians can do unassisted. This is especially the case in cardiology, where echocardiograms, electrocardiograms, and other imaging methods have been shown to contain large amounts of information beyond that described by simple clinical observation. Using 67,762 echocardiograms and temporally associated laboratory hemoglobin test results, we trained a video-based deep learning algorithm to predict abnormal lab values. On held-out test data, the model achieved an area under the curve (AUC) of 0.80 in predicting abnormal hemoglobin. We applied SmoothGrad to further understand the features used by the model, and compared its performance with a linear model based on demographics and features derived from the echocardiogram. These results suggest that advanced algorithms can obtain additional value from diagnostic imaging and identify phenotypic information beyond the ability of expert clinicians.


Abstract 64: A Critic Evaluation Of Covid-19 Automatic Detection From X-Ray Images in Medical Imaging Meets NeurIPS, Maguolo N/A

In this paper, we compare and evaluate different testing protocols used for automatic COVID-19 diagnosis from X-ray images in the recent literature. We show that similar results can be obtained using X-ray images that do not contain most of the lungs. We are able to remove the lungs from the images by turning the center of the X-ray scan black and training our classifiers only on the outer part of the images. Hence, we deduce that several testing protocols for this recognition task are not fair, and that the neural networks are learning patterns in the dataset that are not correlated with the presence of COVID-19. Finally, we show that creating a fair testing protocol is a challenging task, and we provide a method to measure how fair a specific testing protocol is. In future research we suggest checking the fairness of a testing protocol using our tools, and we encourage researchers to look for better techniques than the ones we propose.
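
The masking experiment in Abstract 64 is easy to reproduce in spirit: black out the central region of each radiograph so that little lung tissue remains, then train and evaluate as usual. A minimal sketch follows; the array shapes and the masked fraction are illustrative assumptions.

```python
import numpy as np

def mask_center(images, fraction=0.6):
    """Set a central square region of each image to black.

    images: array of shape (n, height, width); fraction: side length of the
    masked square relative to the image side.
    """
    out = images.copy()
    n, h, w = out.shape
    mh, mw = int(h * fraction), int(w * fraction)
    top, left = (h - mh) // 2, (w - mw) // 2
    out[:, top:top + mh, left:left + mw] = 0
    return out

# If a classifier scores nearly as well on mask_center(x_test) as on x_test,
# it is likely keying on peripheral artifacts rather than lung pathology.
x = np.random.rand(4, 224, 224)
print(mask_center(x).mean() < x.mean())  # True: central pixels zeroed
```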

Abstract 65: Comparing Sparse and Deep Neural Network(NN)s: Using AI to Detect Cancer in Medical Imaging Meets NeurIPS, Strauss N/A

Human pathologists inspect pathology slides containing millions of cells, but even experts disagree on diagnosis. While deep learning has shown human-pathologist-level success on the task of tumor discovery, it is hard to decipher why a classification decision was reached. Previously, adversarial examples have been used to visualize the decision criteria employed by deep learning algorithms, and they often demonstrate that classifications hinge on non-semantic features. Here, we demonstrate that adversarial examples for tumor-detector NN models exist. We compare the relative robustness to adversarial examples of two types of autoencoders, based either on deep NNs or on sparse coding. Our models consist of an autoencoder whose latent representation is fed into a cell-level classifier. We attack the models with adversarial examples, analyze the attack, and test how these attacks transfer to the model they were not built for. We found that the latent representations of both types of autoencoders did well at reconstructing pathologist-generated, pixel-level annotations and thus supported tumor detection at the cell level. Both models supported cell-level classification AUC-ROC scores of approximately 0.85 on holdout slides. Small (1%) adversarial perturbations were made to attack either model. Successful attacks on the deep model appeared to be random patterns (i.e. non-semantic), while successful attacks on the sparse model displayed cell-like features (i.e. potentially semantic). The deep model was attacked through the Fast Gradient Sign Method (FGSM), whereas we demonstrate a novel method for attacking the sparse model: running FGSM on a deep classifier that uses the sparse latent representation as its inputs and reconstructing an image from that attacked sparse latent representation. Adversarial examples made for one model did not successfully transfer to the opposite model, suggesting that the two classifiers use different criteria for classification.
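
FGSM, the attack named in Abstract 65, perturbs an input by the sign of the loss gradient. A self-contained sketch on a logistic model follows; the fixed linear "classifier" is a stand-in for the deep networks in the paper, which are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(4)

# A fixed logistic "classifier": p(y=1|x) = sigmoid(w.x + b).
w, b = rng.normal(size=64), 0.1
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, eps=0.05):
    """One FGSM step: move x by eps in the direction that increases the
    cross-entropy loss. For a logistic model the input gradient is
    (p - y) * w, so only its sign is needed."""
    p = sigmoid(w @ x + b)
    grad_x = (p - y) * w
    return x + eps * np.sign(grad_x)

x = rng.normal(size=64)
y = 1.0
print("before attack:", round(sigmoid(w @ x + b), 3))
print("after  attack:", round(sigmoid(w @ fgsm(x, y) + b), 3))
```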

Abstract 66: Encoding Clinical Priori in 3D Convolutional Neural Networks for Prostate Cancer Detection in bpMRI in Medical Imaging Meets NeurIPS, Saha N/A

We hypothesize that anatomical priors can be viable mediums to infuse domain-specific clinical knowledge into state-of-the-art convolutional neural networks (CNN) based on the U-Net architecture. We introduce a probabilistic population prior which captures the spatial prevalence and zonal distinction of clinically significant prostate cancer (csPCa), in order to improve its computer-aided detection (CAD) in bi-parametric MR imaging (bpMRI). To evaluate performance, we train 3D adaptations of the U-Net, U-SEResNet, UNet++ and Attention U-Net using 800 institutional training-validation scans, paired with radiologically estimated annotations and our computed prior. For 200 independent testing bpMRI scans with histologically confirmed delineations of csPCa, our proposed method of encoding clinical priori demonstrates a strong ability to improve patient-based diagnosis (up to 8.70% increase in AUROC) and lesion-level detection (average increase of 1.08 pAUC between 0.1-1.0 false positives per patient) across all four architectures.


Abstract 67: Semantic Video Segmentation for Intracytoplasmic Sperm Injection Procedures in Medical Imaging Meets NeurIPS, He N/A

We present the first deep learning model for the analysis of intracytoplasmic sperm injection (ICSI) procedures. Using a dataset of ICSI procedure videos, we train a deep neural network to segment key objects in the videos, achieving a mean IoU of 0.962, and to localize the needle tip, achieving a mean pixel error of 3.793 pixels at 14 FPS on a single GPU. We further analyze the variation between the dataset's human annotators and find the model's performance to be comparable to that of human experts.


Abstract 68: Modified VGG16 Network for Medical Image Analysis in Medical Imaging Meets NeurIPS, Vatsavai N/A

Thoracic diseases, like pneumonia and emphysema, affect millions of people around the globe every year. Chest radiography is essential to detecting and treating these diseases. Manually interpreting radiographic images is a time-consuming and fatiguing task, and in regions without enough access to radiologists or radiographic equipment, the inability to analyze these images adversely affects patient care. Recent deep learning-based thoracic disease classification using X-ray images has been shown to perform on par with expert radiologists in interpreting medical images. The purpose of this study is to compare the transfer learning performance of different deep learning algorithms on the detection of thoracic pathologies in chest radiographs. In addition, we present a simple modification to the well-known VGG16 network to overcome overfitting. Comparative analysis shows that careful utilization of pretrained networks may provide a good alternative to specialized handcrafted networks, given the lack of sufficient labeled images in the medical domain.


Abstract 69: StND: Streamline-based Non-rigid partial-Deformation Tractography Registration in Medical Imaging Meets NeurIPS, Chandio N/A

A brain pathway is digitally represented as a 3D line connecting an ordered sequence of 3D vector points called a streamline. Streamlines are generated by tractography methods applied to diffusion-weighted MRI. Direct alignment of white matter tractography/tracts is a crucial part of any diffusion MRI tractography-based method such as group analysis, tract segmentation, and tractometry analysis. In the past decade, several linear registration methods for streamlines have been developed, but the neuroimaging field still lacks robust methods for nonrigid streamline-based registration. In this paper, we introduce the StND method for streamline-based partial-deformation registration. We formulate a registration problem for the nonrigid registration of white matter tracts. In StND, we first perform affine streamline-based linear registration (SLR) on white matter tracts and then add a deformation step using the probabilistic non-rigid registration method called Coherent Point Drift. We model our collection of streamline data as a 3D point set and apply high-level deformations to better align tracts.
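
The affine step that StND (Abstract 69) performs before Coherent Point Drift can be sketched as a least-squares fit between streamline points; the CPD deformation itself is omitted here, and the assumption that the two point sets are already in correspondence is this sketch's simplification.

```python
import numpy as np

def fit_affine(source, target):
    """Least-squares affine map (A, t) minimizing ||source @ A.T + t - target||^2,
    assuming the two point sets are already in correspondence."""
    src = np.hstack([source, np.ones((len(source), 1))])  # homogeneous coords
    M, *_ = np.linalg.lstsq(src, target, rcond=None)
    return M[:3].T, M[3]

# A toy "tract": 3D points along a curve, and an affinely transformed copy.
t = np.linspace(0, 1, 200)
moving = np.stack([t, np.sin(3 * t), t ** 2], axis=1)
A_true = np.array([[1.1, 0.2, 0.0], [0.0, 0.9, 0.1], [0.0, 0.0, 1.0]])
fixed = moving @ A_true.T + np.array([5.0, -2.0, 0.5])

A, trans = fit_affine(moving, fixed)
err = np.abs(moving @ A.T + trans - fixed).max()
print("max alignment error:", err)  # ~0 for this noiseless example
```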

Learning Meets Combinatorial Algorithms

Marin Vlastelica, Jialin Song, Aaron Ferber, Brandon Amos, Georg Martius, Bistra Dilkina, Yisong Yue

Sat Dec 12, 03:00 AM

We propose to organize a workshop on machine learning and combinatorial algorithms. The combination of methods from machine learning and classical AI is an emerging trend. Many researchers have argued that "future AI" methods somehow need to incorporate discrete structures and symbolic/algorithmic reasoning. Additionally, learning-augmented optimization algorithms can impact a broad range of difficult but impactful optimization settings. Coupled learning and combinatorial algorithms have the ability to impact real-world settings such as hardware & software architectural design, self-driving cars, ridesharing, organ matching, supply chain management, theorem proving, and program synthesis, among many others. We aim to present diverse perspectives on the integration of machine learning and combinatorial algorithms.

This workshop aims to bring together academic and industrial researchers in order to describe recent advances and build lasting communication channels for the discussion of future research directions pertaining to the integration of machine learning and combinatorial algorithms. The workshop will connect researchers with various relevant backgrounds, such as those who work on hybrid methods, have particular expertise in combinatorial algorithms, or work on problems whose solutions likely require new approaches, as well as everyone interested in learning something about this emerging field of research. We aim to highlight open problems in bridging the gap between machine learning and combinatorial optimization in order to facilitate new research directions. The workshop will foster collaboration between the communities by curating a list of problems and challenges to promote research in the field.

Our technical topics of interest include (but are not limited to):
- Hybrid architectures with combinatorial building blocks
- Attacking hard combinatorial problems with learning
- Neural architectures mimicking combinatorial algorithms

Further information about speakers, paper submissions and the schedule is available at the workshop website: https://sites.google.com/view/lmca2020/home .


Schedule

03:00 AM Poster Session A: 3:00 AM - 4:30 AM PST Khakhulin, Addanki, Lee, Kim, Januszewski, Czechowski, Landolf, Vrček, Neumann, Gros, Fabre, Faber, Anquetil, Franzin, Bendinelli, Bartunov
06:50 AM Opening Vlastelica Pogančić, Martius
07:00 AM Invited Talk (Ellen Vitercik) Vitercik
07:25 AM Invited Talk (Petar Veličković) Veličković
07:50 AM Q&A for Session
08:10 AM Contributed Talk: A Framework For Differentiable Discovery Of Graph Algorithms Dai
08:18 AM Contributed Talk: Learning To Select Nodes In Bounded Suboptimal Conflict-Based Search For Multi-Agent Path Finding Huang
08:26 AM Contributed Talk: Neural Algorithms For Graph Navigation Zweig
08:35 AM Contributed Talk: Fit The Right Np-Hard Problem: End-To-End Learning Of Integer Programming Constraints Paulus
08:44 AM Contributed Talk: Language Generation Via Combinatorial Constraint Satisfaction: A Tree Search Enhanced Monte-Carlo Approach Jiang
08:52 AM Q&A for Contributed Talks
09:05 AM Break
09:10 AM Poster Session B Addanki, Deac, Xie, Landolf, Prouvost, Gros, Massobrio, Cauligi, Alford, Dai, Franzin, Panigrahy, Kates, Drori, Huang, Zhou, Vlastelica, Paulus, Zweig, Cho, Yin, Lisicki, Jiang, Sun
10:40 AM Break
11:10 AM Invited Talk (Zico Kolter) Kolter
11:35 AM Invited Talk (Katherine Bouman) Bouman
12:00 PM Invited Talk (Michal Rolinek) Rolinek
12:25 PM Q&A for Session 2
12:55 PM Break
01:25 PM Invited Talk (Armando Solar-Lezama) Solar-Lezama
01:50 PM Invited Talk (Kevin Ellis) Ellis
02:15 PM Invited Talk (Yuandong Tian) Tian
02:40 PM Q&A for Session 3
03:10 PM Guided Discussion and Closing
N/A Session A, Poster 1: Learning Elimination Ordering For Tree Decomposition Problem Khakhulin
N/A Session A, Poster 2: Neural Large Neighborhood Search Addanki
N/A Session A, Poster 5: Fragment Relation Networks For Geometric Shape Assembly Kim
N/A Session A, Poster 5: Fragment Relation Networks For Geometric Shape Assembly Lee
N/A Session A, Poster 6: Structure And Randomness In Planning And Reinforcement Learning Januszewski
N/A Session A, Poster 7: Trust, But Verify: Model-Based Exploration In Sparse Reward Environments Czechowski
N/A Session A, Poster 8: K-Plex Cover Pooling For Graph Neural Networks Landolf
N/A Session A, Poster 11: Investment Vs. Reward In A Competitive Knapsack Problem Gros
N/A Session A, Poster 21: Towards Transferring Algorithm Configurations Across Problems Franzin
N/A Session A, Poster 31: Continuous Latent Search For Combinatorial Optimization Bartunov
N/A Session B, Poster 2: Neural Large Neighborhood Search Addanki
N/A Session B, Poster 3: Xlvin: Executed Latent Value Iteration Nets Deac
N/A Session B, Poster 4: Differentiable Top-k With Optimal Transport Xie
N/A Session B, Poster 8: K-Plex Cover Pooling For Graph Neural Networks Landolf
N/A Session B, Poster 10: Ecole: A Gym-Like Library For Machine Learning In Combinatorial Optimization Solvers Prouvost
N/A Session B, Poster 11: Investment Vs. Reward In A Competitive Knapsack Problem Gros
N/A Session B, Poster 12: Virtual Savant: Learning For Optimization Massobrio
N/A Session B, Poster 18: Improving Learning To Branch Via Reinforcement Learning Sun
N/A Session B, Poster 19: Dreaming With ARC Alford
N/A Session B, Poster 26: Discrete Planning With Neuro-Algorithmic Policies Vlastelica
N/A Session B, Poster 28: Fit The Right Np-Hard Problem: End-To-End Learning Of Integer Programming Constraints Paulus
N/A Session B, Poster 32: Reinforcement Learning With Efficient Active Feature Acquisition Yin
N/A Session B, Poster 33: Evaluating Curriculum Learning Strategies In Neural Combinatorial Optimization Lisicki
N/A Session B, Poster 34: Language Generation Via Combinatorial Constraint Satisfaction: A Tree Search Enhanced Monte-Carlo Approach Jiang

N/A Session A, Poster 9: A Step Towards Neural Genome Assembly Vrček
N/A Session A, Poster 11: Investment Vs. Reward In A Competitive Knapsack Problem Neumann
N/A Session A, Poster 13: Neural-Driven Multi-Criteria Tree Search For Paraphrase Generation Fabre
N/A Session A, Poster 16: Learning Lower Bounds For Graph Exploration With Reinforcement Learning Faber
N/A Session A, Poster 17: Wasserstein Learning Of Determinantal Point Processes Anquetil
N/A Session A, Poster 27: A Seq2Seq Approach To Symbolic Regression Bendinelli
N/A Session B, Poster 15: CoCo: Learning Strategies For Online Mixed-Integer Control Cauligi
N/A Session B, Poster 20: A Framework For Differentiable Discovery Of Graph Algorithms Dai
N/A Session B, Poster 21: Towards Transferring Algorithm Configurations Across Problems Franzin
N/A Session B, Poster 22: Matching Through Embedding In Dense Graphs Panigrahy
N/A Session B, Poster 23: Galaxytsp: A New Billion-Node Benchmark For TSP Kates
N/A Session B, Poster 23: Galaxytsp: A New Billion-Node Benchmark For TSP Drori
N/A Session B, Poster 24: Learning To Select Nodes In Bounded Suboptimal Conflict-Based Search For Multi-Agent Path Finding Huang
N/A Session B, Poster 25: Learning For Integer-Constrained Optimization Through Neural Networks With Limited Training Zhou
N/A Session B, Poster 29: Neural Algorithms For Graph Navigation Zweig
N/A Session B, Poster 30: Differentiable Programming For Piecewise Polynomial Functions Cho


Machine Learning for the Developing World (ML4D): Improving Resilience

Tejumade Afonja, Konstantin Klemmer, Niveditha Kalavakonda, Femi (Oluwafemi) Azeez, Aya Salama, Paula Rodriguez Diaz

Sat Dec 12, 04:00 AM

A few months ago, the world was shaken by the outbreak of the novel Coronavirus, exposing the lack of preparedness for such a case in many nations around the globe. As we watched the daily number of cases of the virus rise exponentially, and governments scramble to design appropriate policies, communities collectively asked "Could we have been better prepared for this?" Similar questions have been brought up by the climate emergency the world is now facing.

At a time of global reckoning, this year's ML4D program will focus on building and improving resilience in developing regions through machine learning. Past iterations of the workshop have explored how machine learning can be used to tackle global development challenges, the potential benefits of such technologies, as well as the associated risks and shortcomings. This year we seek to ask our community to go beyond solely tackling existing problems, by building machine learning tools with foresight, anticipating application challenges, and providing sustainable, resilient systems for long-term use.

This one-day workshop will bring together a diverse set of participants from across the globe. Attendees will learn about how machine learning tools can help enhance preparedness for disease outbreaks, address the climate crisis, and improve countries' ability to respond to emergencies. It will also discuss how naive "tech solutionism" can threaten resilience by posing risks to human rights, enabling mass surveillance, and perpetuating inequalities. The workshop will include invited talks, contributed talks, a poster session of accepted papers, breakout sessions tailored to the workshop's theme, and panel discussions.

Schedule

03:30 AM Join us in Gather.Town during Breakouts, Networking and Poster Sessions!
04:00 AM Opening Remark by the ML4D Steering Committee Chair De-Arteaga
04:05 AM Introduction and Agenda Overview Afonja
04:18 AM Introduction of Invited Talk 1 Afonja
04:20 AM Invited Talk 1: Resilient societies - A framework for AI systems Sinha
04:40 AM Live QA with Anubha Sinha Sinha
04:50 AM Introduction of Invited Talk 2 Klemmer
04:52 AM Invited Talk 2: Artificial Intelligence in Earth Observation for the Developing World Zhu
05:20 AM Live QA with Xiaoxiang Zhu Zhu
05:30 AM Breakout Session on Gather.Town
06:00 AM Poster Presentation at Gather.Town
07:00 AM Introduction of Invited Talk 3 Salama
07:02 AM Invited Talk 3: Using Search Data to Inform Public Health in Africa Nsoesie
07:32 AM Live QA with Elaine Nsoesie Nsoesie
07:42 AM Introduction of Invited Talk 4 Rodriguez Diaz
07:44 AM Invited Talk 4: Colombian Mining Monitoring (CoMiMo) - detecting illegal mines using satellite data and Machine Learning Saavedra
08:12 AM Live QA with Santiago Saavedra Saavedra
08:22 AM Networking Session on Gather.Town
09:22 AM Contributed Talk 1: Explainable Poverty Mapping using Social Media Data, Satellite Images, and Geospatial Information Ledesma
09:32 AM Contributed Talk 2: Unsupervised learning for economic risk evaluation in the context of Covid-19 pandemic CORTES
09:42 AM Introduction of Invited Talk 5 Kalavakonda
09:44 AM Invited Talk 5: Earth Observations and Machine Learning for Agricultural Development Nakalembe
10:12 AM Live QA with Catherine Nakalembe Nakalembe
10:22 AM Contributed Talk 3: Accurate and Scalable Matching of Translators to Displaced Persons for Overcoming Language Barriers Vetterli
10:33 AM Contributed Talk 4: Incorporating Healthcare Motivated Constraints in Restless Bandit Based Resource Allocation Prins
10:43 AM Poster Presentation at Gather.Town
11:43 AM Breakout Session on Gather.Town
12:15 PM Discussion Panel with Amanda Coston Coston, Nsoesie, Nakalembe, Saavedra, Zhu, Mwebaze
01:15 PM ML4D Townhall Dubrawski
01:35 PM Best Paper / Poster Announcement Salama
01:40 PM Closing Notes


Abstracts (9):

Abstract 8: Invited Talk 2: Artificial Intelligence in Earth Observation for the Developing World in Machine Learning for the Developing World (ML4D): Improving Resilience, Zhu 04:52 AM

Geoinformation derived from Earth observation satellite data is indispensable for tackling grand societal challenges, such as urbanization, climate change, and the UN's SDGs. Furthermore, Earth observation has irreversibly arrived in the Big Data era, e.g. with ESA's Sentinel satellites and with the blooming of NewSpace companies. This requires not only new technological approaches to manage and process large amounts of data, but also new analysis methods. Here, methods of data science and artificial intelligence, such as machine learning, become indispensable. This talk showcases how innovative machine learning methods and big data analytics solutions can significantly improve the retrieval of large-scale geo-information from Earth observation data, and consequently lead to breakthroughs in geoscientific and environmental research. In particular, by the fusion of petabytes of EO data from satellites to social media, fermented with tailored and sophisticated data science algorithms, it is now possible to tackle unprecedented, large-scale, influential challenges, such as the mapping of urbanization on a global scale, with a particular focus on the developing world.


Abstract 11: Poster Presentation at Gather.Town in Machine Learning for the Developing World (ML4D): Improving Resilience, 06:00 AM

1. The Challenge of Diacritics in Yoruba Embeddings [Adewumi]
2. Combining Twitter and Earth Observation Data for Local Poverty Mapping [Kondmann, Häberle, and Zhu]
Hi-UCD: A Large-scale Dataset for Urban Semantic Change Detection in Remote Sensing Imagery [Tian, Zheng, Ma, and Zhong]
3. Application of Convolutional Neural Networks in Food Resource Assessment [Muhammad Shakaib Iqbal, Talha Iqbal, and Hazrat Ali]
4. Incorporating Healthcare Motivated Constraints in Restless Bandit Based Resource Allocation [Prins, Mate, Killian, Abebe, and Tambe]
5. Unsupervised learning for economic risk evaluation in the context of Covid-19 pandemic [Cortes and Quintero]
6. Assessing the use of transaction and location based insights derived from Automatic Teller Machines (ATM's) as near real time "sensing" systems of economic shocks [Dhar Burra and Lokanathan]
7. Who is more ready to get back in shape? [Idzalika]
8. Poor Man's Data in AI4SG [Sambasivan, Kapania, Highfill, Akrong, Olson, Paritosh, and Aroyo]
9. Explainable Poverty Mapping using Social Media Data, Satellite Images, and Geospatial Information [Ledesma, Garonita, Flores, Tingzon, and Dalisay]
10. Assessing the Quality of Gridded Population Data for Quantifying the Population Living in Deprived Communities [Mattos, McArdle, and Berlotto]
11. Automated and interpretable m-health discrimination of vocal cord pathology enabled by machine learning [Seedat, Aharonson, and Hamzany]
12. Inferring High Spatiotemporal Air Quality Index - A Study in Bangkok [Muhammad Rizal Khaefi]
13. Learning drivers of climate-induced human migrations with Gaussian processes [Camps-Valls, Guillem, and Tarraga]
14. Localization of Malaria Parasites and White Blood Cells in Thick Blood Smears [Nakasi, Mwebaze, Zawedde, Tusubira, and Maiga]
15. Detection of Malaria Vector Breeding Habitats using Topographic Models [Aishwarya N Jadhav]
16. Enhancing Poaching Predictions for Under-Resourced Wildlife Conservation Parks Using Remote Sensing Imagery [Guo, Xu, and Tambe]
17. I Spy With My Electricity Eye: Predicting levels of electricity consumption for residential buildings in Kenya from satellite imagery. [Fobi, Taneja, and Modi]
18. Bandit Data-driven Optimization: AI for Social Good and Beyond [Shi, Wu, Ghani, and Fang]
19. Accurate and Scalable Matching of Translators to Displaced Persons for Overcoming Language Barriers [Agarwal, Baba, Sachdeva, Tandon, Vetterli, and Alghunaim]
20. Crowd-Sourced Road Quality Mapping in the Developing World [Choi and Kamalu]
21. Learning Explainable Interventions to Mitigate HIV Transmission in Sex Workers Across Five States in India [Awasthi, Patel, Joshi, Karkal, and Sethi]
22. Deep Learning Towards Efficiency Malaria Dataset Creation [Waigama, Shaka, Apina, Ngatunga, Mmaka, and Maneno]


Abstract 13: Invited Talk 3: Using Search Data to Inform Public Health in Africa in Machine Learning for the Developing World (ML4D): Improving Resilience, Nsoesie 07:02 AM

Search queries and social media data can be used to inform public health surveillance in Africa. Specifically, these data can provide (1) early warning for public health crisis response; (2) fine-grained representation of public health concerns to develop targeted interventions; and (3) timely feedback on public health policies. This talk covers examples of how search data has been used for studying public health information needs, infectious disease surveillance, and monitoring risk factors for chronic conditions in Africa.


Abstract 16: Invited Talk 4: Colombian Mining Monitoring (CoMiMo) - detecting illegal mines using satellite data and Machine Learning in Machine Learning for the Developing World (ML4D): Improving Resilience, Saavedra 07:44 AM

Illegal mining is very common around the world: 67% of United States companies could not identify the origin of the minerals used in their supply chain (GAO, 2016). Currently, national governments around the world are not able to detect illegal activity, losing valuable resources for development. Meanwhile, the pollution generated by illegal mines seriously affects surrounding populations. We use Sentinel-1 and Sentinel-2 imagery and machine learning to identify mining activity. Through the user-friendly interface called Colombian Mining Monitoring (CoMiMo), we alert government authorities, NGOs, and concerned citizens about possible mining activity. They can verify whether the model is correct using high-resolution imagery and take action if needed.


Abstract 19: Contributed Talk 1: Explainable Poverty Mapping using Social Media Data, Satellite Images, and Geospatial Information in Machine Learning for the Developing World (ML4D): Improving Resilience, Ledesma 09:22 AM

Access to accurate, granular, and up-to-date poverty data is essential for humanitarian organizations to identify vulnerable areas for poverty alleviation efforts. Recent works have shown success in combining computer vision and satellite imagery for poverty estimation; however, the cost of acquiring high-resolution images coupled with black-box models can be a barrier to adoption for many development organizations. In this study, we present a cost-efficient and explainable approach to poverty estimation using machine learning and readily accessible data sources, including social media data, low-resolution satellite images, and volunteered geographic information. Using our method, we achieve an R-squared of 0.66 for wealth estimation in the Philippines, an improvement over previous benchmarks. Finally, we use feature importance analysis to identify the highest-contributing features, both globally and locally, to help decision-makers gain deeper insights into poverty.
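
A hedged sketch of the explainable-regression recipe in Abstract 19: fit an interpretable model on tabular area-level features and rank them by permutation importance. The feature names and data here are synthetic placeholders, not the paper's datasets.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(6)
features = ["nightlight_intensity", "road_density", "tweet_volume", "noise"]

# Synthetic area-level data: wealth depends on the first three features only.
X = rng.normal(size=(500, 4))
y = 0.6 * X[:, 0] + 0.3 * X[:, 1] + 0.1 * X[:, 2] + 0.1 * rng.normal(size=500)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
imp = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for name, score in sorted(zip(features, imp.importances_mean),
                          key=lambda p: -p[1]):
    print(f"{name:22s} {score:.3f}")
```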

Abstract 20: Contributed Talk 2: Unsupervised learning for economic risk evaluation in the context of Covid-19 pandemic in Machine Learning for the Developing World (ML4D): Improving Resilience, CORTES 09:32 AM

Justifying draconian measures during the Covid-19 pandemic was difficult not only because of the restriction of individual rights, but also because of their economic impact. The objective of this work is to present a machine learning approach to identify regions that should implement similar health policies. To that end, we successfully developed a system that gives a notion of economic impact given the prediction of new incident cases, through unsupervised learning and time series forecasting. This system was built taking into account computational restrictions and low maintenance requirements in order to improve its resilience. Finally, this system was deployed as part of a web application for simulation and data analysis of COVID-19 in Colombia, available at https://epidemiologia-matematica.org.


Abstract 22: Invited Talk 5: Earth Observations and Machine Learning for Agricultural Development in Machine Learning for the Developing World (ML4D): Improving Resilience, Nakalembe 09:44 AM

EO data offer timely, objective, repeatable, global, scalable, and long, dense records and methods to monitor diverse landscapes, and they are often low-cost alternatives to traditional agricultural monitoring. The importance of these data in informing life-saving decision-making cannot be overstated. NASA Harvest is NASA's Agriculture and Food Security Program. This talk will summarize the current state of food security in SSA based on the recent Status of Food Security and Nutrition Report, and provide an overview of NASA Harvest's Africa Program priorities and how we are leveraging Machine Learning to address critical data gaps necessary in planning, implementing and informing agricultural development and measuring progress towards SDG-2.


Abstract 24: Contributed Talk 3: Accurate and Scalable Matching of Translators to Displaced Persons for Overcoming Language Barriers in Machine Learning for the Developing World (ML4D): Improving Resilience, Vetterli 10:22 AM

Residents of developing countries are disproportionately susceptible to displacement as a result of humanitarian crises. During such crises, language barriers impede aid workers in providing services to those displaced. To build resilience, such services must be flexible and robust to a host of possible languages. Anonymous(1) aims to overcome these barriers by providing a platform capable of matching bilingual volunteers to displaced persons or aid workers in need of translation. However, Anonymous's large pool of translators comes with the challenge of selecting the right translator per request. In this paper, we describe a machine learning system capable of matching translator requests to volunteers at scale. We demonstrate that a simple logistic regression, operating on easily computable features, can accurately predict and rank translator response. In deployment, this lightweight system matches 82% of requests with a median response time of 59 seconds, allowing aid workers to accelerate their services supporting displaced persons.
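
A minimal sketch of the ranking approach described in Abstract 24: fit a logistic regression on simple per-translator features and rank candidates by predicted response probability. The feature names and data below are invented for illustration, not the deployed system's.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)

# Hypothetical features per (request, translator) pair:
# [language match, hours since last active, past acceptance rate]
X = np.column_stack([
    rng.integers(0, 2, 1000),          # language match (0/1)
    rng.exponential(5.0, 1000),        # recency in hours
    rng.beta(2, 2, 1000),              # historical acceptance rate
])
# Synthetic ground truth: matched, recently active translators respond more.
logits = 2.0 * X[:, 0] - 0.2 * X[:, 1] + 1.5 * X[:, 2] - 1.0
y = rng.random(1000) < 1 / (1 + np.exp(-logits))

ranker = LogisticRegression(max_iter=1000).fit(X, y)

# Rank a pool of candidate translators for one incoming request.
candidates = np.column_stack([np.ones(5), rng.exponential(5.0, 5), rng.beta(2, 2, 5)])
order = np.argsort(-ranker.predict_proba(candidates)[:, 1])
print("contact translators in this order:", order)
```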

Abstract 25: Contributed Talk 4: Incorporating Healthcare Motivated Constraints in Restless Bandit Based Resource Allocation in Machine Learning for the Developing World (ML4D): Improving Resilience, Prins 10:33 AM

As reinforcement learning is increasingly being considered in the healthcare space, it is important to consider how best to incorporate practitioner expertise. One notable case is in improving tuberculosis drug adherence, where a health worker must simultaneously monitor and provide services to many patients. We find that, without considering domain expertise, state-of-the-art algorithms allocate all resources to a small number of patients, neglecting most of the population. To avoid this undesirable behavior, we propose a human-in-the-loop model, where constraints are imposed by domain experts to improve the equitability of resource allocations. Our framework enforces these constraints on the distribution of actions without significant loss of utility on simulations derived from real-world data. This research opens a new line of inquiry on human-machine interactions in restless multi-armed bandits.


Biological and Artificial Reinforcement Learning

Raymond Chua, Feryal Behbahani, Julie J Lee, Sara Zannone, Rui Ponte Costa, Blake Richards, Ida Momennejad, Doina Precup

Sat Dec 12, 04:30 AM

Reinforcement learning (RL) algorithms learn through rewards and a process of trial-and-error. This approach is strongly inspired by the study of animal behaviour and has led to outstanding achievements. However, artificial agents still struggle with a number of difficulties, such as learning in changing environments and over longer timescales, state abstractions, and generalizing and transferring knowledge. Biological agents, on the other hand, excel at these tasks. The first edition of our workshop last year brought together leading and emerging researchers from Neuroscience, Psychology and Machine Learning to share how neural and cognitive mechanisms can provide insights for RL research and how machine learning advances can further our understanding of brain and behaviour. This year, we want to build on the success of our previous workshop by expanding on the challenges that emerged and extending to novel perspectives. The problem of state and action representation and abstraction emerged quite strongly last year, so this year's program aims to add new perspectives like hierarchical reinforcement learning, structure learning and

their biological underpinnings. Additionally, we will address learning over long timescales, such as lifelong learning or continual learning, by including views from synaptic plasticity and developmental neuroscience. We are hoping to inspire and further develop connections between biological and artificial reinforcement learning by bringing together experts from all sides and to encourage discussions that could help foster novel solutions for both communities.


Schedule

04:30 AM Organizers Opening Remarks Chua, Behbahani, Lee, Momennejad, Ponte Costa, Richards, Precup
04:45 AM Speaker Introduction: Shakir Mohamed Behbahani, Chua
04:46 AM Invited Talk #1 Shakir Mohamed: Pain and Machine Learning Mohamed
05:16 AM Invited Talk #1 QnA: Shakir Mohamed Mohamed, Behbahani, Chua
05:30 AM Speaker Introduction: Claudia Clopath Chua, Behbahani, Ponte Costa
05:31 AM Invited Talk #2 (Live, no recording) - Continual learning with different timescales. Clopath
06:01 AM Invited Talk #2 QnA - Claudia Clopath (Live, no recording) Clopath, Ponte Costa, Chua, Behbahani
06:15 AM Speaker Introduction: Contributed talk #1 Chua, Behbahani
06:16 AM Contributed Talk #1: Learning multi-dimensional rules with probabilistic feedback via value-based serial hypothesis testing Song, Cai, Niv
06:30 AM Speaker Introduction: Contributed talk #2 Chua, Behbahani, Zannone
06:31 AM Contributed Talk #2: Evaluating Agents Without Rewards Matusch, Hafner, Ba
06:45 AM Coffee Break
07:00 AM Speaker Introduction: Kim Stachenfeld Momennejad, Chua, Behbahani
07:01 AM Invited Talk #3 Kim Stachenfeld: Structure Learning and the Hippocampal-Entorhinal Circuit Stachenfeld
07:31 AM Invited Talk #3 QnA - Kim Stachenfeld Stachenfeld, Momennejad, Behbahani, Chua
07:45 AM Speaker Introduction: George Konidaris Chua, Behbahani
07:46 AM Invited Talk #4 George Konidaris - Signal to Symbol (via Skills) Konidaris
08:16 AM Invited Talk #4 QnA - George Konidaris Konidaris, Chua, Behbahani
08:30 AM Coffee Break
08:45 AM Panel Discussions Lindsay, Konidaris, Mohamed, Stachenfeld, Dayan, Niv, Precup, Hartley, Dasgupta
10:00 AM Break & Poster Session on Gather.Town (Main)
12:00 PM Speaker Introduction: Ishita Dasgupta Lee, Chua, Behbahani
12:01 PM Invited Talk #5 Ishita Dasgupta - Embedding structure in data: Progress and challenges for the meta-learning approach Dasgupta
12:31 PM Invited Talk #5 QnA - Ishita Dasgupta Dasgupta, Lee, Behbahani, Chua
12:45 PM Speaker Introduction: Catherine Hartley Lee, Chua, Behbahani
12:46 PM Invited Talk #6 Catherine Hartley - Developmental tuning of action selection Hartley
01:16 PM Invited Talk #6 QnA - Catherine Hartley Hartley, Lee, Chua, Behbahani
01:30 PM Coffee Break
01:45 PM Speaker Introduction: Contributed talk #3 speaker Behbahani, Chua
01:46 PM Contributed Talk #3: Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning Agarwal, C. Machado, Castro, Bellemare
02:00 PM Speaker Introduction: Yael Niv Precup, Chua, Behbahani
02:01 PM Invited Talk #7 Yael Niv - Latent causes, prediction errors and the organization of memory Niv

02:31 PM Invited Talk #7 QnA - Yael Niv Niv, Precup, Chua, Behbahani
02:45 PM Closing remarks Chua, Behbahani, Lee, Ponte Costa, Precup, Richards, Momennejad
02:55 PM Social & Poster Session on Gather.Town


I Can't Believe It's Not Better! Bridging the gap between theory and empiricism in probabilistic machine learning

Jessica Forde, Francisco Ruiz, Melanie Fernandez Pradier, Aaron Schein, Finale Doshi-Velez, Isabel Valera, David Blei, Hanna Wallach

Sat Dec 12, 04:45 AM

We've all been there. A creative spark leads to a beautiful idea. We love the idea, we nurture it, and name it. The idea is elegant: all who hear it fawn over it. The idea is justified: all of the literature we have read supports it. But, lo and behold: once we sit down to implement the idea, it doesn't work. We check our code for software bugs. We rederive our derivations. We try again and still, it doesn't work. We Can't Believe It's Not Better [1].

In this workshop, we will encourage probabilistic machine learning researchers who Can't Believe It's Not Better to share their beautiful idea, tell us why it should work, and hypothesize why it does not in practice. We also welcome work that highlights pathologies or unexpected behaviors in well-established practices. This workshop will stress the quality and thoroughness of the scientific procedure, promoting transparency, deeper understanding, and more principled science.

Focusing on the probabilistic machine learning community will facilitate this endeavor, not only by gathering experts that speak the same language, but also by exploiting the modularity of the probabilistic framework. Probabilistic machine learning separates modeling assumptions, inference, and model checking into distinct phases [2]; this facilitates criticism when the final outcome does not meet prior expectations. We aim to create an open-minded and diverse space for researchers to share unexpected or negative results and help one another improve their ideas.


Schedule

04:45 AM Intro Schein, F. Pradier
05:00 AM Invited Talk: Max Welling - The LIAR (Learning with Interval Arithmetic Regularization) is Dead Welling
05:30 AM Invited Talk: Danielle Belgrave - Machine Learning for Personalised Healthcare: Why is it not better? Belgrave
06:00 AM Invited Talk: Mike Hughes - The Case for Prediction Constrained Training Hughes
06:30 AM Margot Selosse---A bumpy journey: exploring deep Gaussian mixture models Selosse
06:33 AM Diana Cai---Power posteriors do not reliably learn the number of components in a finite mixture Cai
06:36 AM W Ronny Huang---Understanding Generalization through Visualizations Huang
06:39 AM Udari Madhushani---It Doesn't Get Better and Here's Why: A Fundamental Drawback in Natural Extensions of UCB to Multi-agent Bandits Madhushani
06:42 AM Erik Jones---Selective Classification Can Magnify Disparities Across Groups Jones
06:45 AM Yannick Rudolph---Graph Conditional Variational Models: Too Complex for Multiagent Trajectories? Rudolph
06:50 AM Coffee Break (Gather.town available: https://bit.ly/3gxkLA7)
07:00 AM Poster Session in gather.town: https://bit.ly/3gxkLA7
08:00 AM Charline Le Lan---Perfect density models cannot guarantee anomaly detection Le Lan
08:15 AM Fan Bao---Variational (Gradient) Estimate of the Score Function in Energy-based Latent Variable Models Bao

08:30 AM  Emilio Jorge---Inferential Induction: A Novel Framework for Bayesian Reinforcement Learning  Jorge
09:00 AM  Lunch Break (Gather.town available: https://bit.ly/3gxkLA7)
10:00 AM  Invited Talk: Andrew Gelman - It Doesn't Work, But The Alternative Is Even Worse: Living With Approximate Computation  Gelman
10:30 AM  Invited Talk: Roger Grosse - Why Isn't Everyone Using Second-Order Optimization?  Grosse
11:00 AM  Invited Talk: Weiwei Pan - What are Useful Uncertainties for Deep Learning and How Do We Get Them?  Pan
11:30 AM  Vincent Fortuin---Bayesian Neural Network Priors Revisited  Fortuin
11:33 AM  Ziyu Wang---Further Analysis of Outlier Detection with Deep Generative Models  Wang
11:36 AM  Siwen Yan---The Curious Case of Stacking Boosted Relational Dependency Networks  Yan
11:39 AM  Maurice Frank - Problems using deep generative models for probabilistic audio source separation  Frank
11:42 AM  Ramiro Camino---Oversampling Tabular Data with Deep Generative Models: Is it worth the effort?  Camino
11:45 AM  Ângelo Gregório Lovatto---Decision-Aware Model Learning for Actor-Critic Methods: When Theory Does Not Meet Practice  Lovatto
11:50 AM  Coffee Break (Gather.town available: https://bit.ly/3gxkLA7)
12:00 PM  Tin D. Nguyen---Independent versus truncated finite approximations for Bayesian nonparametric inference  Nguyen
12:15 PM  Ricky T. Q. Chen---Self-Tuning Stochastic Optimization with Curvature-Aware Gradient Filtering  Chen
12:30 PM  Elliott Gordon-Rodriguez---Uses and Abuses of the Cross-Entropy Loss: Case Studies in Modern Deep Learning  Gordon-Rodriguez
12:45 PM  Poster Session (in gather.town): https://bit.ly/3gxkLA7
01:15 PM  Breakout Discussions (in gather.town): https://bit.ly/3gxkLA7
01:45 PM  Panel & Closing  Broderick, Dinh, Lawrence, Lum, Wallach, Williamson

Abstracts (28):

Abstract 2: Invited Talk: Max Welling - The LIAR (Learning with Interval Arithmetic Regularization) is Dead in I Can't Believe It's Not Better! Bridging the gap between theory and empiricism in probabilistic machine learning, Welling 05:00 AM

Two years ago we embarked on a project called LIAR. LIAR was going to quantify uncertainty of a network through interval arithmetic (IA) calculations (which are an official IEEE standard). IA has the beautiful property that the answer of your computation is guaranteed to lie in a computed interval, and as such quantifies very precisely the numerical precision of your computation. Captured by this elegant idea, we applied this to neural networks. In particular, the idea was to add a regularization term to the objective that would try to keep the interval of the network's output small. This is particularly interesting in the context of quantization, where we quite naturally have intervals for the weights, activations and inputs due to their limited precision. By training a full precision neural network with intervals that represent the quantization error, and by encouraging the network to keep the resultant variation in the predictions small, we hoped to learn networks that were inherently robust to quantization noise. So far the good news. In this talk I will try to reconstruct the process of how the project ended up on the scrap pile. I will also try to produce some "lessons learned" from this project and hopefully deliver some advice for those who are going through a similar situation. I still can't believe it didn't work better ;-)
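As a concrete, purely illustrative rendering of the interval-arithmetic idea above: the sketch below propagates weight and input intervals through one affine-plus-ReLU layer and penalizes the width of the output interval. The layer sizes, the half-width eps and the penalty weight are invented for the example; this is not the LIAR implementation.

    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.normal(size=(4, 8))          # point-estimate weights
    x = rng.normal(size=8)               # input
    eps = 0.01                           # assumed quantization half-width

    # Interval versions of weights and inputs: [lo, hi] = value +/- eps.
    W_lo, W_hi = W - eps, W + eps
    x_lo, x_hi = x - eps, x + eps

    # Interval matrix-vector product: for each term w*x, the extreme
    # values are among the four corner products.
    corners = np.stack([W_lo * x_lo, W_lo * x_hi, W_hi * x_lo, W_hi * x_hi])
    y_lo = corners.min(axis=0).sum(axis=1)
    y_hi = corners.max(axis=0).sum(axis=1)

    # ReLU is monotone, so it maps intervals to intervals directly.
    y_lo, y_hi = np.maximum(y_lo, 0.0), np.maximum(y_hi, 0.0)

    # A LIAR-style regularizer would penalize the output interval width,
    # added to the task loss with some weight (0.1 here is arbitrary).
    interval_penalty = 0.1 * np.sum(y_hi - y_lo)
    print(interval_penalty)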

Abstract 3: Invited Talk: Danielle Belgrave - Machine Learning for Personalised Healthcare: Why is it not better? in I Can't Believe It's Not Better! Bridging the gap between theory and empiricism in probabilistic machine learning, Belgrave 05:30 AM

This talk presents an overview of probabilistic graphical modelling as a strategy for understanding heterogeneous subgroups of patients. The identification of such subgroups may elucidate underlying causal mechanisms which may lead to more targeted treatment and intervention strategies. We will look at (1) the ideal of personalisation within the context of machine learning for healthcare, (2) "from the ideal to the reality", and (3) some of the possible pathways to progress for making the ideal of personalised healthcare a reality. The last part of this talk focuses on the pipeline of personalisation and looks at how probabilistic graphical models are part of that pipeline.

Abstract 4: Invited Talk: Mike Hughes - The Case for Prediction Constrained Training in I Can't Believe It's Not Better! Bridging the gap between theory and empiricism in probabilistic machine learning, Hughes 06:00 AM

This talk considers adding supervision to well-known generative latent variable models (LVMs), including both classic LVMs (e.g. mixture models, topic models) and more recent "deep" flavors (e.g. variational autoencoders). The standard way to add supervision to LVMs would be to treat the added label as another observed variable generated by the model, and then maximize the joint likelihood of both labels and features. We find that across many models, this standard supervision leads to surprisingly negligible improvement in prediction quality over a more naive baseline that first fits an unsupervised model, and then makes predictions given that model's learned low-dimensional representation. We can't believe it is not better! Further, this problem is not properly solved by previous approaches that just upweight or "replicate" labels in the generative model (the problem is not just that we have more observed features than labels). Instead, we suggest the problem is related to model misspecification, and that the joint likelihood objective does not properly encode the desired performance goals at test time (we care about predicting labels from features, but not features from labels). This motivates a new training objective we call prediction constrained training, which can prioritize the label-from-feature prediction task while still delivering reasonable generative models for the observed features. We highlight promising results of our proposed prediction-constrained framework including recent extensions to semi-supervised VAEs and model-based reinforcement learning.

Abstract 5: Margot Selosse---A bumpy journey: exploring deep Gaussian mixture models in I Can't Believe It's Not Better! Bridging the gap between theory and empiricism in probabilistic machine learning, Selosse 06:30 AM

The deep Gaussian mixture model (DGMM) is a framework directly inspired by the finite mixture of factor analysers model (MFA) and by deep learning architectures composed of multiple layers. The MFA is a generative model that considers a data point as arising from a latent variable (termed the score) which is sampled from a standard multivariate Gaussian distribution and then transformed linearly. The linear transformation matrix (termed the loading matrix) is specific to a component in the finite mixture. The DGMM consists of stacking MFA layers, in the sense that the latent scores are no longer assumed to be drawn from a standard Gaussian, but rather are drawn from a mixture of factor analysers model. Thus the latent scores are at one point considered to be the input of an MFA and also to have latent scores themselves. Only the latent scores of the DGMM's last layer are considered to be drawn from a standard multivariate Gaussian distribution. In recent years, the DGMM gained prominence in the literature: intuitively, this model should be able to capture distributions more precisely than a simple Gaussian mixture model. We show in this work that while the DGMM is an original and novel idea, in certain cases it is challenging to infer its parameters. In addition, we give some insights into the probable reasons for this difficulty. Experimental results are provided on github: https://github.com/ansubmissions/ICBINB, alongside an R package that implements the algorithm and a number of ready-to-run R scripts.
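A minimal sketch of the generative process just described, sampling ancestrally from a two-layer DGMM. All dimensions, component counts and parameters are invented for illustration; the paper's own implementation is the R package cited above.

    import numpy as np

    rng = np.random.default_rng(1)
    d2, d1, d0 = 2, 3, 5     # score dims: top layer -> middle -> observed

    # Top layer: scores are standard Gaussian.
    z2 = rng.normal(size=d2)

    # Layer 1: an MFA whose latent score is z2; pick a component, transform.
    K1 = 2
    pi1 = np.array([0.5, 0.5])
    Lambda1 = rng.normal(size=(K1, d1, d2))       # loading matrices
    mu1 = rng.normal(size=(K1, d1))
    k1 = rng.choice(K1, p=pi1)
    z1 = mu1[k1] + Lambda1[k1] @ z2 + 0.1 * rng.normal(size=d1)

    # Layer 0 (observed): another MFA, taking z1 as its latent score.
    K0 = 3
    pi0 = np.array([0.3, 0.3, 0.4])
    Lambda0 = rng.normal(size=(K0, d0, d1))
    mu0 = rng.normal(size=(K0, d0))
    k0 = rng.choice(K0, p=pi0)
    x = mu0[k0] + Lambda0[k0] @ z1 + 0.1 * rng.normal(size=d0)
    print(x)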

Abstract 6: Diana Cai---Power posteriors do not reliably learn the number of components in a finite mixture in I Can't Believe It's Not Better! Bridging the gap between theory and empiricism in probabilistic machine learning, Cai 06:33 AM

Scientists and engineers are often interested in learning the number of subpopulations (or components) present in a data set. Data science folk wisdom tells us that a finite mixture model (FMM) with a prior on the number of components will fail to recover the true, data-generating number of components under model misspecification. But practitioners still widely use FMMs to learn the number of components, and statistical machine learning papers can be found recommending such an approach. Increasingly, though, data science papers suggest potential alternatives beyond vanilla FMMs, such as power posteriors, coarsening, and related methods. In this work we start by adding rigor to folk wisdom and proving that, under even the slightest model misspecification, the FMM component-count posterior diverges: the posterior probability of any particular finite number of latent components converges to 0 in the limit of infinite data. We use the same theoretical techniques to show that power posteriors with fixed power face the same undesirable divergence, and we provide a proof for the case where the power converges to a non-zero constant. We illustrate the practical consequences of our theory on simulated and real data. We conjecture how our methods may be applied to lend insight into other component-count robustification techniques.

Abstract 7: W Ronny Huang---Understanding Generalization through Visualizations in I Can't Believe It's Not Better! Bridging the gap between theory and empiricism in probabilistic machine learning, Huang 06:36 AM

The power of neural networks lies in their ability to generalize to unseen data, yet the underlying reasons for this phenomenon remain elusive. Numerous rigorous attempts have been made to explain generalization, but available bounds are still quite loose, and analysis does not always lead to true understanding. The goal of this work is to make generalization more intuitive. Using visualization methods, we discuss the mystery of generalization, the geometry of loss landscapes, and how the curse (or, rather, the blessing) of dimensionality causes optimizers to settle into minima that generalize well.

Abstract 8: Udari Madhushani---It Doesn't Get Better and Here's Why: A Fundamental Drawback in Natural Extensions of UCB to Multi-agent Bandits in I Can't Believe It's Not Better! Bridging the gap between theory and empiricism in probabilistic machine learning, Madhushani 06:39 AM

We identify a fundamental drawback of natural extensions of Upper Confidence Bound (UCB) algorithms to the multi-agent bandit problem in which multiple agents facing the same explore-exploit problem can share information. We provide theoretical guarantees that when agents use a natural extension of the UCB sampling rule, sharing information about the optimal option degrades their performance. For K the number of agents and T the time horizon, we prove that when agents share information only about the optimal option they suffer an expected group cumulative regret of O(K log T + K log K), whereas when they do not share any information they only suffer a group regret of O(K log T). Further, while information sharing about all options yields much better performance than with no information sharing, we show that including information about the optimal option is not as good as sharing information only about suboptimal options.

Abstract 9: Erik Jones---Selective Classification Can Magnify Disparities Across Groups in I Can't Believe It's Not Better! Bridging the gap between theory and empiricism in probabilistic machine learning, Jones 06:42 AM

Selective classification, in which models are allowed to abstain on uncertain predictions, is a natural approach to improving accuracy in settings where errors are costly but abstentions are manageable. In this paper, we find that while selective classification can improve average accuracies, it can simultaneously magnify existing accuracy disparities between various groups within a population, especially in the presence of spurious correlations. We observe this behavior consistently across five datasets from computer vision and NLP. Surprisingly, increasing the abstention rate can even decrease accuracies on some groups. To better understand when selective classification improves or worsens accuracy on a group, we study its margin distribution, which captures the model's confidences over all predictions. For example, when the margin distribution is symmetric, we prove that whether selective classification monotonically improves or worsens accuracy is fully determined by the accuracy at full coverage (i.e., without any abstentions) and whether the distribution satisfies a property we term left-log-concavity. Our analysis also shows that selective classification tends to magnify accuracy disparities that are present at full coverage. Fortunately, we find that it uniformly improves each group when applied to distributionally-robust models that achieve similar full-coverage accuracies across groups. Altogether, our results imply selective classification should be used with care and underscore the importance of models that perform equally well across groups at full coverage.
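The group-level effect described in Abstract 9 is easy to reproduce on synthetic data. In the sketch below (all distributions invented), one group is confidently wrong more often than the other, so abstaining below a confidence threshold raises one group's accuracy while lowering the other's.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 10000
    group = rng.integers(0, 2, size=n)            # two subpopulations

    # Hypothetical confidences and correctness: group 1 is less accurate
    # and, crucially, more often confidently wrong than group 0.
    conf = np.where(group == 0, rng.beta(5, 2, n), rng.beta(4, 2, n))
    p_correct = np.where(group == 0, 0.55 + 0.4 * conf, 1.25 - 0.6 * conf)
    correct = rng.random(n) < np.clip(p_correct, 0, 1)

    def group_acc(keep):
        # Accuracy on the non-abstained ("kept") predictions, per group.
        return [correct[keep & (group == g)].mean() for g in (0, 1)]

    print("full coverage:", group_acc(np.ones(n, dtype=bool)))
    print("selective    :", group_acc(conf > 0.8))   # group 1 gets worse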

Abstract 10: Yannick Rudolph---Graph Conditional Variational Models: Too Complex for Multiagent Trajectories? in I Can't Believe It's Not Better! Bridging the gap between theory and empiricism in probabilistic machine learning, Rudolph 06:45 AM

Recent advances in modeling multiagent trajectories combine graph architectures such as graph neural networks (GNNs) with conditional variational models (CVMs) such as variational RNNs (VRNNs). Originally, CVMs have been proposed to facilitate learning with multi-modal and structured data and thus seem to perfectly match the requirements of multi-modal multiagent trajectories with their structured output spaces. Empirical results of VRNNs on trajectory data support this assumption. In this paper, we revisit experiments and proposed architectures with additional rigour, ablation runs and baselines. In contrast to common belief, we show that both historic and current results with CVMs on trajectory data are misleading. Given a neural network with a graph architecture and/or structured output function, variational autoencoding does not contribute statistically significantly to empirical performance. Instead, we show that well-known emission functions do contribute, while coming with less complexity, engineering and computation time.

Abstract 12: Poster Session in gather.town: https://bit.ly/3gxkLA7 in I Can't Believe It's Not Better! Bridging the gap between theory and empiricism in probabilistic machine learning, 07:00 AM

Link to access the gather.town: https://bit.ly/3gxkLA7

Abstract 13: Charline Le Lan---Perfect density models cannot guarantee anomaly detection in I Can't Believe It's Not Better! Bridging the gap between theory and empiricism in probabilistic machine learning, Le Lan 08:00 AM

Thanks to the tractability of their likelihood, some deep generative models show promise for seemingly straightforward but important applications like anomaly detection, uncertainty estimation, and active learning. However, the likelihood values empirically attributed to anomalies conflict with the expectations these proposed applications suggest. In this paper, we take a closer look at the behavior of distribution densities and show that these quantities carry less meaningful information than previously thought, beyond estimation issues or the curse of dimensionality. We conclude that the use of these likelihoods for out-of-distribution detection relies on strong and implicit hypotheses and highlight the necessity of explicitly formulating these assumptions for reliable anomaly detection.
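The failure mode behind Abstract 13 can be seen with a standard worked example (not taken from the paper): under an exactly known d-dimensional standard Gaussian, the all-zeros point has far higher density than any typical draw, so "low density implies anomaly" fails even with a perfect model.

    import numpy as np

    rng = np.random.default_rng(3)
    d = 1000

    def log_density(x):
        # Exact log-density of N(0, I_d): no estimation error involved.
        return -0.5 * (x @ x) - 0.5 * d * np.log(2 * np.pi)

    samples = rng.normal(size=(5, d))    # typical in-distribution draws
    print("typical draws:", [round(log_density(x), 1) for x in samples])
    print("origin       :", round(log_density(np.zeros(d)), 1))
    # The origin scores roughly d/2 nats higher than any typical sample,
    # even though no draw from the model ever looks like it.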

Abstract 14: Fan Bao---Variational (Gradient) Estimate of the Score Function in Energy-based Latent Variable Models in I Can't Believe It's Not Better! Bridging the gap between theory and empiricism in probabilistic machine learning, Bao 08:15 AM

The learning and evaluation of energy-based latent variable models (EBLVMs) without any structural assumptions are highly challenging, because the true posteriors and the partition functions in such models are generally intractable. This paper presents variational estimates of the score function and its gradient with respect to the model parameters in a general EBLVM, referred to as VaES and VaGES respectively. The variational posterior is trained to minimize a certain divergence to the true model posterior, and the bias in both estimates can be bounded by the divergence theoretically. With a minimal model assumption, VaES and VaGES can be applied to the kernelized Stein discrepancy (KSD) and score matching (SM)-based methods to learn EBLVMs. Besides, VaES can also be used to estimate the exact Fisher divergence between the data and general EBLVMs.

Abstract 15: Emilio Jorge---Inferential Induction: A Novel Framework for Bayesian Reinforcement Learning in I Can't Believe It's Not Better! Bridging the gap between theory and empiricism in probabilistic machine learning, Jorge 08:30 AM

Bayesian Reinforcement Learning (BRL) offers a decision-theoretic solution to the reinforcement learning problem. While "model-based" BRL algorithms have focused on maintaining a posterior distribution on models, BRL "model-free" methods try to estimate value function distributions but make strong implicit assumptions or approximations. We describe a novel Bayesian framework, inferential induction, for correctly inferring value function distributions from data, which leads to a new family of BRL algorithms. We design an algorithm, Bayesian Backwards Induction (BBI), with this framework. We experimentally demonstrate that BBI is competitive with the state of the art. However, its advantage relative to existing BRL model-free methods is not as great as we had expected, particularly when the additional computational burden is taken into account.

Abstract 17: Invited Talk: Andrew Gelman - It Doesn't Work, But The Alternative Is Even Worse: Living With Approximate Computation in I Can't Believe It's Not Better! Bridging the gap between theory and empiricism in probabilistic machine learning, Gelman 10:00 AM

We can't fit the models we want to fit because it takes too long to fit them on our computer. Also, we don't know what models we want to fit until we try a few. I share some stories of struggles with data-partitioning and parameter-partitioning algorithms, what kinda worked and what didn't.

Abstract 18: Invited Talk: Roger Grosse - Why Isn't Everyone Using Second-Order Optimization? in I Can't Believe It's Not Better! Bridging the gap between theory and empiricism in probabilistic machine learning, Grosse 10:30 AM

In the pre-AlexNet days of deep learning, second-order optimization gave dramatic speedups and enabled training of deep architectures that seemed to be inaccessible to first-order optimization. But today, despite algorithmic advances such as K-FAC, nearly all modern neural net architectures are trained with variants of SGD and Adam. What's holding us back from using second-order optimization? I'll discuss three challenges to applying second-order optimization to modern neural nets: difficulty of implementation, implicit regularization effects of gradient descent, and the effect of gradient noise. All of these factors are significant, though not in the ways commonly believed.

Abstract 19: Invited Talk: Weiwei Pan - What are Useful Uncertainties for Deep Learning and How Do We Get Them? in I Can't Believe It's Not Better! Bridging the gap between theory and empiricism in probabilistic machine learning, Pan 11:00 AM

While deep learning has demonstrable success on many tasks, the point estimates provided by standard deep models can lead to overfitting and provide no uncertainty quantification on predictions. However, when models are applied to critical domains such as autonomous driving, precision health care, or criminal justice, reliable measurements of a model's predictive uncertainty may be as crucial as correctness of its predictions. In this talk, we examine a number of deep (Bayesian) models that promise to capture complex forms for predictive uncertainties; we also examine metrics commonly used to evaluate such uncertainties. We aim to highlight strengths and limitations of these models as well as the metrics; we also discuss ideas to improve both in meaningful ways for downstream tasks.
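A tiny generic example of the kind of uncertainty metric Abstract 19 examines (all numbers invented): the empirical coverage of 90% predictive intervals, which immediately exposes an overconfident variance estimate.

    import numpy as np

    rng = np.random.default_rng(10)
    n = 50000
    y = rng.normal(0.0, 1.0, n)           # true outcomes around the prediction

    for name, sigma in [("well-calibrated", 1.0), ("overconfident", 0.5)]:
        # 90% central predictive interval implied by the model's sigma.
        half_width = 1.645 * sigma
        coverage = (np.abs(y) < half_width).mean()
        print(name, round(coverage, 3))   # ~0.90 versus ~0.59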

Abstract 20: Vincent Fortuin---Bayesian Neural Network Priors Revisited in I Can't Believe It's Not Better! Bridging the gap between theory and empiricism in probabilistic machine learning, Fortuin 11:30 AM

Isotropic Gaussian priors are the de facto standard for modern Bayesian neural network inference. However, there has been recent controversy over the question whether they might be to blame for the undesirable cold posterior effect. We study this question empirically and find that for densely connected networks, Gaussian priors are indeed less well suited than more heavy-tailed ones. Conversely, for convolutional architectures, Gaussian priors seem to perform well and thus cannot fully explain the cold posterior effect. These findings coincide with the empirical maximum-likelihood weight distributions discovered by standard gradient-based training.

Abstract 21: Ziyu Wang---Further Analysis of Outlier Detection with Deep Generative Models in I Can't Believe It's Not Better! Bridging the gap between theory and empiricism in probabilistic machine learning, Wang 11:33 AM

The recent, counter-intuitive discovery that deep generative models (DGMs) can frequently assign a higher likelihood to outliers has implications for both outlier detection applications as well as our overall understanding of generative modeling. In this work, we present a possible explanation for this phenomenon, starting from the observation that a model's typical set and high-density region may not coincide. From this vantage point we propose a novel outlier test, the empirical success of which suggests that the failure of existing likelihood-based outlier tests does not necessarily imply that the corresponding generative model is uncalibrated. We also conduct additional experiments to help disentangle the impact of low-level texture versus high-level semantics in differentiating outliers. In aggregate, these results suggest that modifications to the standard evaluation practices and benchmarks commonly applied in the literature are needed.

Abstract 22: Siwen Yan---The Curious Case of Stacking Boosted Relational Dependency Networks in I Can't Believe It's Not Better! Bridging the gap between theory and empiricism in probabilistic machine learning, Yan 11:36 AM

Reducing bias while learning and inference is an important requirement to achieve generalizable and better performing models. The method of stacking took the first step towards creating such models by reducing inference bias, but the question of combining stacking with a model that reduces learning bias is still largely unanswered. In statistical relational learning, ensemble models of relational trees such as boosted relational dependency networks (RDN-Boost) are shown to reduce the learning bias. We combine RDN-Boost and stacking methods with the aim of reducing both learning and inference bias, subsequently resulting in better overall performance. However, our evaluation on three relational data sets shows no significant performance improvement over the baseline models.

Abstract 23: Maurice Frank - Problems using deep generative models for probabilistic audio source separation in I Can't Believe It's Not Better! Bridging the gap between theory and empiricism in probabilistic machine learning, Frank 11:39 AM

Recent advancements in deep generative modeling make it possible to learn prior distributions from complex data that subsequently can be used for Bayesian inference. However, we find that distributions learned by deep generative models for audio signals do not exhibit the right properties that are necessary for tasks like audio source separation using a probabilistic approach. We observe that the learned prior distributions are either discriminative and extremely peaked or smooth and non-discriminative. We quantify this behavior for two types of deep generative models on two audio datasets.
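The contrast Abstract 20 above draws between Gaussian and heavy-tailed weight priors can be quantified with generic sampling code (not the paper's experiments; scales are matched to unit variance for comparability):

    import numpy as np

    rng = np.random.default_rng(4)
    n = 1_000_000

    gaussian = rng.normal(0.0, 1.0, n)                 # isotropic Gaussian
    laplace = rng.laplace(0.0, 1.0 / np.sqrt(2), n)    # variance 1
    student_t = rng.standard_t(3, size=n) / np.sqrt(3.0)  # variance 1

    for name, w in [("gaussian", gaussian), ("laplace", laplace),
                    ("student-t", student_t)]:
        # Fraction of weights more than 4 standard deviations out: the
        # heavy-tailed priors allow occasional large weights far more often.
        print(name, (np.abs(w) > 4.0).mean())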

Abstract 24: Ramiro Camino---Oversampling Tabular Data with Deep Generative Models: Is it worth the effort? in I Can't Believe It's Not Better! Bridging the gap between theory and empiricism in probabilistic machine learning, Camino 11:42 AM

In practice, machine learning experts are often confronted with imbalanced data. Without accounting for the imbalance, common classifiers perform poorly and standard evaluation metrics mislead the practitioners on the model's performance. A common method to treat imbalanced datasets is under- and oversampling. In this process, samples are either removed from the majority class or synthetic samples are added to the minority class. In this paper, we follow up on recent developments in deep learning. We take proposals of deep generative models, and study the ability of these approaches to provide realistic samples that improve performance on imbalanced classification tasks via oversampling. Across 160K+ experiments, we show that the improvements in terms of performance metric, while shown to be significant when ranking the methods like in the literature, often are minor in absolute terms, especially compared to the required effort. Furthermore, we notice that a large part of the improvement is due to undersampling, not oversampling.

Abstract 25: Ângelo Gregório Lovatto---Decision-Aware Model Learning for Actor-Critic Methods: When Theory Does Not Meet Practice in I Can't Believe It's Not Better! Bridging the gap between theory and empiricism in probabilistic machine learning, Lovatto 11:45 AM

Actor-Critic methods are a prominent class of modern reinforcement learning algorithms based on the classic Policy Iteration procedure. Despite many successful cases, Actor-Critic methods tend to require a gigantic number of experiences and can be very unstable. Recent approaches have advocated learning and using a world model to improve sample efficiency and reduce reliance on the value function estimate. However, learning an accurate dynamics model of the world remains challenging, often requiring computationally costly and data-hungry models. More recent work has shown that learning an everywhere-accurate model is unnecessary and often detrimental to the overall task; instead, the agent should improve the world model on task-critical regions. For example, in Iterative Value-Aware Model Learning, the authors extend model-based value iteration by incorporating the value function (estimate) into the model loss function, showing the novel model objective reflects improved performance in the end task. Therefore, it seems natural to expect that model-based Actor-Critic methods can benefit equally from learning value-aware models, improving overall task performance, or reducing the need for large, expensive models. However, we show empirically that combining Actor-Critic and value-aware model learning can be quite difficult and that naive approaches such as maximum likelihood estimation often achieve superior performance with less computational cost. Our results suggest that, despite theoretical guarantees, learning a value-aware model in continuous domains does not ensure better performance on the overall task.

Abstract 27: Tin D. Nguyen---Independent versus truncated finite approximations for Bayesian nonparametric inference in I Can't Believe It's Not Better! Bridging the gap between theory and empiricism in probabilistic machine learning, Nguyen 12:00 PM

Bayesian nonparametric models based on completely random measures (CRMs) offer flexibility when the number of clusters or latent components in a data set is unknown. However, managing the infinite dimensionality of CRMs often leads to slow computation during inference. Practical inference typically relies on either integrating out the infinite-dimensional parameter or using a finite approximation: a truncated finite approximation (TFA) or an independent finite approximation (IFA). The atom weights of TFAs are constructed sequentially, while the atoms of IFAs are independent, which facilitates more convenient inference schemes. While the approximation error of TFA has been systematically addressed, there has not yet been a similar study of IFA. We quantify the approximation error between IFAs and two common target nonparametric priors (beta-Bernoulli process and Dirichlet process mixture model) and prove that, in the worst case, TFAs provide more component-efficient approximations than IFAs. However, in experiments on image denoising and topic modeling tasks with real data, we find that the error of Bayesian approximation methods overwhelms any finite approximation error, and IFAs perform very similarly to TFAs.
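For readers unfamiliar with the two approximations compared in Abstract 27, the sketch below shows standard textbook constructions for a Dirichlet process with concentration alpha (not the authors' code; K and alpha are invented): a TFA builds weights sequentially by truncated stick-breaking, while an IFA draws exchangeable weights from a symmetric Dirichlet.

    import numpy as np

    rng = np.random.default_rng(5)
    alpha, K = 2.0, 25                 # concentration and truncation level

    # TFA: truncated stick-breaking; weights are constructed sequentially.
    v = rng.beta(1.0, alpha, size=K)
    sticks = np.concatenate([[1.0], np.cumprod(1.0 - v)[:-1]])
    w_tfa = v * sticks
    w_tfa /= w_tfa.sum()               # renormalize the truncated tail

    # IFA: independent, exchangeable weights from Dirichlet(alpha/K, ...).
    w_ifa = rng.dirichlet(np.full(K, alpha / K))

    print("TFA weights (top 5):", np.sort(w_tfa)[::-1][:5])
    print("IFA weights (top 5):", np.sort(w_ifa)[::-1][:5])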

Abstract 28: Ricky T. Q. Chen---Self-Tuning Stochastic Optimization with Curvature-Aware Gradient Filtering in I Can't Believe It's Not Better! Bridging the gap between theory and empiricism in probabilistic machine learning, Chen 12:15 PM

Standard first-order stochastic optimization algorithms base their updates solely on the average mini-batch gradient, and it has been shown that tracking additional quantities such as the curvature can help de-sensitize common hyperparameters. Based on this intuition, we explore the use of exact per-sample Hessian-vector products and gradients to construct optimizers that are self-tuning and hyperparameter-free. Based on a dynamics model of the gradient, we derive a process which leads to a curvature-corrected, noise-adaptive online gradient estimate. The smoothness of our updates makes it more amenable to simple step size selection schemes, which we also base off of our estimated quantities. We prove that our model-based procedure converges in the noisy quadratic setting. Though we do not see similar gains in deep learning tasks, we can match the performance of well-tuned optimizers and, ultimately, this is an interesting step for constructing self-tuning optimizers.

Abstract 29: Elliott Gordon-Rodriguez---Uses and Abuses of the Cross-Entropy Loss: Case Studies in Modern Deep Learning in I Can't Believe It's Not Better! Bridging the gap between theory and empiricism in probabilistic machine learning, Gordon-Rodriguez 12:30 PM

Modern deep learning is primarily an experimental science, in which empirical advances occasionally come at the expense of probabilistic rigor. Here we focus on one such example; namely the use of the categorical cross-entropy loss to model data that is not strictly categorical, but rather takes values on the simplex. This practice is standard in neural network architectures with label smoothing and actor-mimic reinforcement learning, amongst others. Drawing on the recently discovered continuous-categorical distribution, we propose probabilistically-inspired alternatives to these models, providing an approach that is more principled and theoretically appealing. Through careful experimentation, including an ablation study, we identify the potential for outperformance in these models, thereby highlighting the importance of a proper probabilistic treatment, as well as illustrating some of the failure modes thereof.

Abstract 30: Poster Session (in gather.town): https://bit.ly/3gxkLA7 in I Can't Believe It's Not Better! Bridging the gap between theory and empiricism in probabilistic machine learning, 12:45 PM

Link to access the Gather.town: https://neurips.gather.town/app/5163xhrHdSWrUZsG/ICBINB

Abstract 31: Breakout Discussions (in gather.town): https://bit.ly/3gxkLA7 in I Can't Believe It's Not Better! Bridging the gap between theory and empiricism in probabilistic machine learning, 01:15 PM

Link to access the Gather.town: https://neurips.gather.town/app/5163xhrHdSWrUZsG/ICBINB

Abstract 32: Panel & Closing in I Can't Believe It's Not Better! Bridging the gap between theory and empiricism in probabilistic machine learning, Broderick, Dinh, Lawrence, Lum, Wallach, Williamson 01:45 PM

A panel discussion moderated by Hanna Wallach (MSR New York).

Panelists:
-- Tamara Broderick (MIT)
-- Laurent Dinh ()
-- Neil Lawrence (Cambridge)
-- Kristian Lum (Human Rights Data Analysis Group)
-- Sinead Williamson (UT Austin)

Machine Learning for Engineering Modeling, Simulation and Design

Alex Beatson, Priya Donti, Amira Abdel-Rahman, Stephan Hoyer, Rose Yu, J. Zico Kolter, Ryan Adams

Sat Dec 12, 04:50 AM

For full details see: https://ml4eng.github.io/

For questions, issues, and on-the-day help, email: [email protected]

gather.town link for poster sessions and breaks: https://neurips.gather.town/app/D2n0HkRXoVlgUSWV/ML4Eng-NeurIPS20

Modern engineering workflows are built on computational tools for specifying models and designs, for numerical analysis of system behavior, and for optimization, model-fitting and rational design. How can machine learning be used to empower the engineer and accelerate this workflow? We wish to bring together machine learning researchers and engineering academics to address the problem of developing ML tools which benefit engineering modeling, simulation and design, through reduction of required computational or human effort, through permitting new rich design spaces, through enabling production of superior designs, or through enabling new modes of interaction and new workflows.

Schedule

04:50 AM  Opening Remarks
05:00 AM  Nils Thuerey - Lead the Way! Deep Learning via Differentiable Simulations  Thuerey
05:30 AM  Nils Thuerey Q&A
05:40 AM  Angela Dai - Self-supervised generation of 3D shapes and scenes  Dai
06:10 AM  Angela Dai Q&A
06:20 AM  Poster Session 1
08:20 AM  Tatiana Lopez-Guevara - Robots, Liquids & Inference  Lopez-Guevara
08:50 AM  Tatiana Lopez-Guevara Q&A
09:00 AM  Peter Battaglia - Structured models of physics, objects, and scenes  Battaglia
09:30 AM  Peter Battaglia Q&A
09:40 AM  Break
10:30 AM  Panel discussion with invited speakers
11:30 AM  Karen E Willcox - Operator Inference: Bridging model reduction and scientific machine learning  Willcox
12:00 PM  Karen E Willcox Q&A
12:10 PM  Grace X Gu - Artificial intelligence for materials design and additive manufacturing  Gu
12:35 PM  Grace X Gu Q&A
12:50 PM  Closing remarks
01:00 PM  Poster Session 2
N/A  Real-time Prediction of Soft Tissue Deformations Using Data-driven Nonlinear Presurgical Simulations  Liu, Han, Emerson, Majditehran, Rabin, Kara

N/A  On the Effectiveness of Bayesian AutoML methods for Physics Emulators  Mitra, Dal Santo, Haghshenas, Mitra, Daly, Schmidt
N/A  Continuous calibration of a digital twin; a particle filter approach  Ward, Choudhary, Gregory
N/A  A Learning-boosted Quasi-Newton Method for AC Optimal Power Flow  Baker
N/A  Frequency-compensated PINNs for Fluid-dynamic Design Problems  Zhang, Dey, Kakkar, Dasgupta, Chakraborty
N/A  Flaw Detection in Metal Additive Manufacturing Using Deep Learned Acoustic Features  Zhang, Kara
N/A  Collaborative Multidisciplinary Design Optimization with Neural Networks  de Becdelievre, Kroo
N/A  A Nonlocal-Gradient Descent Method for Inverse Design in Nanophotonics  Bi, Zhang, Zhang
N/A  Scalable Deep-Learning-Accelerated Topology Optimization for Additively Manufactured Materials  Bi, Zhang, Zhang
N/A  Learning Mesh-Based Simulation with Graph Networks  Pfaff, Fortunato, Sanchez Gonzalez, Battaglia
N/A  A General Framework Combining Generative Adversarial Networks and Mixture Density Networks for Inverse Modeling in Microstructural Materials Design  Yang, Jha, Paul, Liao, Choudhary, Agrawal
N/A  Simultaneous Process Design and Control Optimization using Reinforcement Learning  Sachio, del Rio Chanona, Petsagkourakis
N/A  A Sequential Modelling Approach for Indoor Temperature Prediction and Heating Control in Smart Buildings  Huang, Miles, Zhang
N/A  Probabilistic Adjoint Sensitivity Analysis for Fast Calibration of Partial Differential Equation Models  Cockayne, Duncan
N/A  Multi-Loss Sub-Ensembles for Accurate Classification with Uncertainty Estimation  Achrack, Kellerman, Barzilay
N/A  Surrogates for Stiff Nonlinear Systems using Continuous Time Echo State Networks  Anantharaman, Rackauckas, Shah
N/A  Accelerating Inverse Design of Nanostructures Using Manifold Learning  Zandehshahvar, Kiarashinejad, Zhu, Maleki, Hemmatyar, Abdollahramezani, Pourabolghasem, Adibi
N/A  Combinatorial 3D Shape Generation via Sequential Assembly  Kim, Chung, Lee, Cho, Park
N/A  ManufacturingNet: A machine learning tool for engineers  Magar, Ghule, Doshi, Seshadri, Khalid, Barati Farimani
N/A  Model Order Reduction using a Deep Orthogonal Decomposition  Tait
N/A  On Training Effective Reinforcement Learning Agents for Real-time Power Grid Operation and Control  Diao, Shi, Zhang, Wang, Li, Xu, Lan, Bian, Duan, Wu
N/A  Predicting Nanorobot Shapes via Generative Models  Benjaminson, Taylor, Travers
N/A  Constraint active search for experimental design  Malkomes, Cheng, McCourt
N/A  An adversarially robust approach to security-constrained optimal power flow  Bedmutha, Donti, Kolter
N/A  Heat risk assessment using surrogate model for meso-scale surface temperature  Choi, Pozzi, Berges
N/A  Learning to Identify Drilling Defects in Turbine Blades with Single Stage Detectors  Panizza, Stefanek, Melacci, Veneri, Gori
N/A  Parameterized Reinforcement Learning for Optical System Optimization  Wankerl, Stern, Mahdavi, Eichler, Lang
N/A  Jacobian of Conditional Generative Models for Sensitivity Analysis of Photovoltaic Device Processes  Molamohammadi, Rezaei-Shoshtari, Quitoriano

N/A  Analog Circuit Design with Dyna-Style Reinforcement Learning  Lee, Oliehoek
N/A  Multilevel Delayed Acceptance MCMC with an Adaptive Error Model in PyMC3  Lykkegaard, Mingas, Scheichl, Fox, Dodwell
N/A  Bayesian polynomial chaos  Seshadri, Duncan, Scillitoe
N/A  Battery Model Calibration with Deep Reinforcement Learning  Unagar, Tian, Fink, Arias Chao
N/A  Modular mobile robot design selection with deep reinforcement learning  Whitman, Travers, Choset
N/A  Rethink AI-based Power Grid Control: Diving Into Algorithm Design  Zhou, Wang, Diao, Bian, Duan, Shi
N/A  End-to-End Differentiability and Computing to Accelerate Materials' Inverse Design  Liu, Zhao, Schoenholz, Cubuk, Bauchy
N/A  Building LEGO using Deep Generative Models of Graphs  Thompson, Taylor, DeVries, Ghalebi
N/A  An Industrial Application of Deep Reinforcement Learning for Chemical Production Scheduling  Hubbs, Kelloway, Wassick, Sahinidis, Grossmann
N/A  Information-Theoretic Multi-Objective Bayesian Optimization with Continuous Approximations  Belakaria, Deshwal, Doppa
N/A  Differentiable Implicit Layers  Look, Doneva, Kandemir, Gemulla, Peters
N/A  Signal Enhancement for Magnetic Navigation Challenge Problem  Gnadt, Belarge, Canciani, Conger, Curro, Edelman, Morales, O'Keefe, Taylor, Rackauckas
N/A  Robotic gripper design with Evolutionary Strategies and Graph Element Networks  Alet, Bauza, Jeewajee, Thomsen, Rodriguez, Kaelbling, Lozano-Pérez
N/A  Electric Vehicle Range Improvement by Utilizing Deep Learning to Optimize Occupant Thermal Comfort  Warey, Kaushik, Khalighi, Cruse, Venkatesan
N/A  Machine Learning-based Anomaly Detection with Magnetic Data  Mitra, Akhiyarov, Araya-Polo, Byrd
N/A  Autonomous Control of a Particle Accelerator using Deep Reinforcement Learning  Pang, Thulasidasan, Rybarcyk
N/A  Exact Preimages of Neural Network Aircraft Collision Avoidance Systems  Matoba, Fleuret
N/A  Uncertainty-aware Remaining Useful Life predictors  Biggio, Arias Chao, Fink
N/A  Decoding the genome of cement by Gaussian Process Regression  Song, Wang, Wang, Bauchy
N/A  A data centric approach to generative modelling of rough surfaces: An application to 3D-printed Stainless Steel  Fleming
N/A  Data-driven inverse design optimization of magnetically programmed soft structures  LIU, Karacakol, Alapan, Sitti
N/A  Efficient nonlinear manifold reduced order model  Kim, Choi, Widemann, Zohdi
N/A  Efficient Nanopore Optimization by CNN-accelerated Deep Reinforcement Learning  Wang, Cao, Barati Farimani
N/A  TPINN: An improved architecture for distributed physics informed neural networks  Manikkan, Srinivasan
N/A  Multi-stage Transmission Line Flow Control Using Centralized and Decentralized Reinforcement Learning Agents  Shang, Yang, Zhu, Ye, Zhang, Xu, Lyu, Diao
N/A  Context-Aware Urban Energy Efficiency Optimization Using Hybrid Physical Models  Choi, Nutkiewicz, Jain
N/A  Scalable Combinatorial Bayesian Optimization with Tractable Statistical models  Deshwal, Belakaria, Doppa
N/A  Scalable Multitask Latent Force Models with Applications to Predicting Lithium-ion Concentration  Tait, Brosa Planella, Widanage, Damoulas

N/A Learning Partially Known Stochastic These demonstrations will highlight the Dynamics with Empirical PAC Bayes properties and capabilities of PDE-powered deep Haußmann, Gerwinn, Look, Rakitsch, neural networks and serve as a starting point for Kandemir discussing future developments. N/A Prediction of high frequency resistance in polymer electrolyte Bio: membrane fuel cells using Long Nils is an Associate-Professor at the Technical Short Term Memory based model Lin University of Munich (TUM). He and his group N/A Placement in Integrated Circuits focus on deep learning methods for physical using Cyclic Reinforcement Learning simulations, with a particular focus on fuid and Simulated Annealing Vashisht, phenomena. He acquired a Ph.D. for his work on Rampal, Liao, Lu, Shanbhag, Fallon, Kara liquid simulations in 2006 from the University of Erlangen-Nuremberg. Until 2010 he held a position as a post-doctoral researcher at ETH Abstracts (68): Zurich. He received a tech-Oscar from the AMPAS in 2013 for his research on controllable smoke Abstract 2: Nils Thuerey - Lead the Way! efects. Subsequently, he worked for three years Deep Learning via Diferentiable as R&D lead at ScanlineVFX, before starting at Simulations in Machine Learning for TUM in October 2013. Engineering Modeling, Simulation and Design, Thuerey 05:00 AM

Differentiable physics solvers (from the broader field of differentiable programming) show particular promise for including prior knowledge into machine learning algorithms. Differentiable operators were shown to be powerful tools to guide deep learning processes, and PDEs provide a wide range of components to build such operators. They also represent a natural way for traditional solvers and deep learning methods to coexist: Using PDE solvers as differentiable operators in neural networks allows us to leverage existing numerical methods for efficient solvers, e.g., to provide reliable and flexible gradients to update the weights during a learning run.

Interestingly, it turns out to be beneficial to combine "traditional" supervised and physics-based approaches. The former poses a much more straightforward and more stable learning task by providing explicit reference data, while physics-based learning can provide gradients for a larger space of states that are only encountered at training time. Here, differentiable solvers are particularly powerful, e.g., to provide neural networks with feedback about how inferred solutions influence a physical model's long-term behavior. I will show and discuss examples with various advection-diffusion type PDEs, among others the Navier-Stokes equations for fluids, for different learning applications.

These demonstrations will highlight the properties and capabilities of PDE-powered deep neural networks and serve as a starting point for discussing future developments.

Bio:
Nils is an Associate Professor at the Technical University of Munich (TUM). He and his group focus on deep learning methods for physical simulations, with a particular focus on fluid phenomena. He acquired a Ph.D. for his work on liquid simulations in 2006 from the University of Erlangen-Nuremberg. Until 2010 he held a position as a post-doctoral researcher at ETH Zurich. He received a tech-Oscar from the AMPAS in 2013 for his research on controllable smoke effects. Subsequently, he worked for three years as R&D lead at ScanlineVFX, before starting at TUM in October 2013.
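A minimal sketch of the "PDE solver as differentiable operator" idea from the talk abstract above, using a 1-D heat equation with an invented target. A differentiable-physics framework would obtain the gradient by automatic differentiation; here the sensitivity is propagated by hand so the example stays dependency-free.

    import numpy as np

    n = 16
    x = np.linspace(0, 2 * np.pi, n, endpoint=False)
    dx2 = (x[1] - x[0]) ** 2

    def laplacian(u):
        # Periodic 1-D discrete Laplacian.
        return (np.roll(u, 1) - 2.0 * u + np.roll(u, -1)) / dx2

    def simulate_with_grad(nu, u0, dt=0.05, steps=100):
        # Explicit-Euler heat equation; s = du/d(nu) is stepped alongside
        # u, which is what autodiff through the solver would compute.
        u, s = u0.copy(), np.zeros_like(u0)
        for _ in range(steps):
            s = s + dt * (laplacian(u) + nu * laplacian(s))
            u = u + dt * nu * laplacian(u)
        return u, s

    u0 = np.sin(x)
    u_target, _ = simulate_with_grad(0.3, u0)   # data from "true" diffusivity

    nu = 0.05                                   # initial guess
    for _ in range(200):
        u, s = simulate_with_grad(nu, u0)
        grad = np.sum(2.0 * (u - u_target) * s) # dL/dnu, L = ||u - u_target||^2
        nu -= 0.002 * grad
    print(round(nu, 3))                         # recovers ~0.3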

Mention, as well as a Stanford Graduate physical scenes. The key insight is that many Fellowship. systems can be represented as graphs with nodes connected by edges, which can be processed by graph neural networks and Abstract 6: Poster Session 1 in Machine transformer-based models. The goal of the talk is to show how structured approaches are making Learning for Engineering Modeling, advances in solving increasingly challenging Simulation and Design, 06:20 AM problems in engineering, graphics, and everyday gather.town link: https://neurips.gather.town/app/ interactions with the world. D2n0HkRXoVlgUSWV/ML4Eng-NeurIPS20 Bio: Peter Battaglia is a research scientist at

Abstract 7: Tatiana Lopez-Guevara - Robots, DeepMind. He earned his PhD in Psychology at Liquids & Inference in Machine Learning for the University of Minnesota, and was later a Engineering Modeling, Simulation and postdoc and research scientist in MIT's Department of Brain and Cognitive Sciences. His Design, Lopez-Guevara 08:20 AM current work focuses on approaches for reasoning Our brains are able to exploit coarse physical about and interacting with complex systems, by models of fuids to quickly adapt and solve combining richly structured knowledge with everyday manipulation tasks. However, fexible learning algorithms. developing such capability in robots, so that they can autonomously manipulate fuids adapting to diferent conditions remains a challenge. In this Abstract 11: Break in Machine Learning for talk, I will present diferent strategies that a Engineering Modeling, Simulation and Robot can use to manipulate liquids by using Design, 09:40 AM approximate-but-fast simulation as an internal model. I'll describe strategies to pour and gather.town room will remain open for people calibrate the parameters of the model from who wish to socialize / network during the break: observations of real liquids with diferent https://neurips.gather.town/app/ viscosities via Bayesian Likelihood-free Inference. D2n0HkRXoVlgUSWV/ML4Eng-NeurIPS20 Finally, I'll present a methodology to learn the relevant parameters of a pouring task via Inverse
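The likelihood-free calibration mentioned above can be sketched generically with ABC rejection on a toy "simulator" (the viscosity parameter, summary statistic and tolerance are invented; this is not the speaker's setup):

    import numpy as np

    rng = np.random.default_rng(6)

    def simulator(viscosity, n=200):
        # Toy stand-in for a fluid simulator: a noisy summary statistic
        # of a pour (e.g., mean outflow) that depends on the viscosity.
        return (1.0 / (1.0 + viscosity)) + 0.05 * rng.normal(size=n).mean()

    observed = simulator(2.0)               # pretend real-world observation

    # ABC rejection: keep prior draws whose simulated summary is close.
    prior_draws = rng.uniform(0.1, 5.0, size=20000)
    summaries = np.array([simulator(v) for v in prior_draws])
    posterior = prior_draws[np.abs(summaries - observed) < 0.01]
    print(posterior.mean(), posterior.std())  # concentrates near 2.0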

Abstract 9: Peter Battaglia - Structured models of physics, objects, and scenes in Machine Learning for Engineering Modeling, Simulation and Design, Battaglia 09:00 AM

This talk will describe various ways of using structured machine learning models for predicting complex physical dynamics, generating realistic objects, and constructing physical scenes. The key insight is that many systems can be represented as graphs with nodes connected by edges, which can be processed by graph neural networks and transformer-based models. The goal of the talk is to show how structured approaches are making advances in solving increasingly challenging problems in engineering, graphics, and everyday interactions with the world.

Bio: Peter Battaglia is a research scientist at DeepMind. He earned his PhD in Psychology at the University of Minnesota, and was later a postdoc and research scientist in MIT's Department of Brain and Cognitive Sciences. His current work focuses on approaches for reasoning about and interacting with complex systems, by combining richly structured knowledge with flexible learning algorithms.

Abstract 11: Break in Machine Learning for Engineering Modeling, Simulation and Design, 09:40 AM

The gather.town room will remain open for people who wish to socialize / network during the break: https://neurips.gather.town/app/D2n0HkRXoVlgUSWV/ML4Eng-NeurIPS20

Abstract 13: Karen E Willcox - Operator Inference: Bridging model reduction and scientific machine learning in Machine Learning for Engineering Modeling, Simulation and Design, Willcox 11:30 AM

Model reduction methods have grown from the computational science community, with a focus on reducing high-dimensional models that arise from physics-based modeling, whereas machine learning has grown from the computer science community, with a focus on creating expressive models from black-box data streams. Yet recent years have seen an increased blending of the two perspectives and a recognition of the associated opportunities. This talk presents our work in operator inference, where we learn effective reduced-order operators directly from data. The governing equations define the form of the model we should seek to learn. Thus, rather than learn a generic approximation with weak enforcement of the physics, we learn low-dimensional operators whose structure is defined by the physics. This perspective provides new opportunities to learn from data through the lens of physics-based models and contributes to the foundations of Scientific Machine Learning, yielding a new class of flexible data-driven methods that support high-consequence decision-making under uncertainty for physical systems.

Bio: Karen E. Willcox is Director of the Oden Institute for Computational Engineering and Sciences, Associate Vice President for Research, and Professor of Aerospace Engineering and Engineering Mechanics at the University of Texas at Austin. She is also External Professor at the Santa Fe Institute. Before joining the Oden Institute in 2018, she spent 17 years as a professor at the Massachusetts Institute of Technology, where she served as the founding Co-Director of the MIT Center for Computational Engineering and the Associate Head of the MIT Department of Aeronautics and Astronautics. Prior to joining the MIT faculty, she worked at Boeing Phantom Works with the Blended-Wing-Body aircraft design group. She is a Fellow of the Society for Industrial and Applied Mathematics (SIAM) and Fellow of the American Institute of Aeronautics and Astronautics (AIAA).
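A bare-bones version of the operator-inference recipe described above, on an invented linear toy system (the actual method handles polynomial model forms, regularization and more): extract a POD basis from snapshots, then fit the reduced operator by least squares.

    import numpy as np

    rng = np.random.default_rng(7)

    # Toy high-dimensional linear system x' = A x, observed via snapshots.
    n, r, steps, dt = 100, 3, 400, 0.01
    Q = np.linalg.qr(rng.normal(size=(n, r)))[0]
    A = Q @ np.diag([-0.5, -1.0, -2.0]) @ Q.T    # hidden low-rank dynamics

    X = np.empty((n, steps))
    X[:, 0] = Q @ rng.normal(size=r)
    for k in range(steps - 1):
        X[:, k + 1] = X[:, k] + dt * (A @ X[:, k])  # explicit Euler snapshots

    # 1) POD basis from the snapshot SVD.
    V = np.linalg.svd(X, full_matrices=False)[0][:, :r]

    # 2) Reduced states and time-derivative estimates.
    Z = V.T @ X
    Zdot = np.gradient(Z, dt, axis=1)

    # 3) Operator inference: least-squares fit of A_r with zdot ~ A_r z.
    A_r = np.linalg.lstsq(Z.T, Zdot.T, rcond=None)[0].T
    print(np.sort(np.linalg.eigvals(A_r).real))  # ~ [-2, -1, -0.5]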

Abstract 15: Grace X Gu - Artificial intelligence for materials design and additive manufacturing in Machine Learning for Engineering Modeling, Simulation and Design, Gu 12:10 PM

Developments in computation spurred the fourth paradigm of materials discovery and design using artificial intelligence. Our research aims to advance design and manufacturing processes to create the next generation of high-performance engineering and biological materials by harnessing techniques integrating artificial intelligence, multiphysics modeling, and multiscale experimental characterization. This work combines computational methods and algorithms to investigate design principles and mechanisms embedded in materials with superior properties, including bioinspired materials. Additionally, we develop and implement deep learning algorithms to detect and resolve problems in current additive manufacturing technologies, allowing for automated quality assessment and the creation of functional and reliable structural materials. These advances will find applications in robotic devices, energy storage technologies, orthopedic implants, among many others. In the future, this algorithmically driven approach will enable materials-by-design of complex architectures, opening up new avenues of research on advanced materials with specific functions and desired properties.

Bio: Grace X. Gu is an Assistant Professor of Mechanical Engineering at the University of California, Berkeley. She received her PhD and MS in Mechanical Engineering from the Massachusetts Institute of Technology and her BS in Mechanical Engineering from the University of Michigan, Ann Arbor. Her current research focuses on creating new materials with superior properties for mechanical, biological, and energy applications using multiphysics modeling, artificial intelligence, and high-throughput computing, as well as developing intelligent additive manufacturing technologies to realize complex material designs previously impossible. Gu is the recipient of several awards, including the 3M Non-Tenured Faculty Award, MIT Tech Review Innovators Under 35, Johnson & Johnson Women in STEM2D Scholars Award, Royal Society of Chemistry Materials Horizons Outstanding Paper Prize, and SME Outstanding Young Manufacturing Engineer Award.

Abstract 18: Poster Session 2 in Machine Learning for Engineering Modeling, Simulation and Design, 01:00 PM

gather.town link: https://neurips.gather.town/app/D2n0HkRXoVlgUSWV/ML4Eng-NeurIPS20

The gather.town room will remain live past the official 6pm EST finish time: attendees who wish may stay to discuss, network and socialize as long as they like.

Abstract 19: Real-time Prediction of Soft Tissue Deformations Using Data-driven Nonlinear Presurgical Simulations in Machine Learning for Engineering Modeling, Simulation and Design, Liu, Han, Emerson, Majditehran, Rabin, Kara N/A

Imaging modalities provide clinicians with real-time visualization of anatomical regions of interest (ROI) for the purpose of minimally invasive surgery. During the procedure, low-resolution image data are acquired and registered with high-resolution preoperative 3D reconstruction to guide the execution of the surgical preplan. Unfortunately, due to the potential large strain and nonlinearities in the deformation of soft biological tissues, significant mismatch may be observed between ROI shapes during the pre- and intra-operative imaging stages, making the surgical preplan prone to failure. In an effort to bridge the gap between the two imaging stages, this paper presents a data-driven approach based on artificial neural networks for predicting the ROI deformation in real time with sparsely registered fiducial markers. For a head-and-neck tumor model with an average maximum displacement of 30 mm, the maximum surface offsets between benchmarks and predictions using the proposed approach for 98% of the test cases are under 1.0 mm, which is the typical resolution of high-quality interventional ultrasound. Each of the prediction processes takes less than 0.5 s. With the resulting prediction accuracy and computational efficiency, the proposed approach demonstrates its potential to be clinically relevant.

Abstract 20: On the Effectiveness of Bayesian AutoML methods for Physics Emulators in Machine Learning for Engineering Modeling, Simulation and Design, Mitra, Dal Santo, Haghshenas, Mitra, Daly, Schmidt N/A

The adoption of Machine Learning (ML) for building emulators for complex physical processes has seen an exponential rise in recent years. While ML models are good function approximators, optimizing the hyper-parameters of the model to reach a global minimum is not trivial, and often needs human knowledge and expertise. In this light, automatic ML or autoML methods have gained large interest as they automate the process of network hyper-parameter tuning. In addition, Neural Architecture Search (NAS) has shown promising outcomes for improving model performance. While autoML methods have grown in popularity for image, text and other applications, their effectiveness for high-dimensional, complex scientific datasets remains to be investigated. In this work, we propose a data-driven emulator for turbulence closure terms in the context of Large Eddy Simulation (LES) models, trained using Artificial Neural Networks within an autoML framework based on Bayesian Optimization that incorporates priors to jointly optimize the hyper-parameters and conduct a full neural network architecture search to converge to a global minimum. Additionally, the effects of using different network weight initializations and optimizers such as ADAM, SGDM and RMSProp are explored. Weight and function space similarities during the optimization trajectory are investigated, and critical differences in the learning process evolution are noted and compared to theory. We observe that the ADAM optimizer and Glorot initialization consistently perform better, while RMSProp outperforms SGDM, as the latter appears to have been stuck at a local optimum. Therefore, this autoML BayesOpt framework provides a means to choose the best hyper-parameter settings for a given dataset.

Abstract 21: Continuous calibration of a digital twin; a particle filter approach in Machine Learning for Engineering Modeling, Simulation and Design, Ward, Choudhary, Gregory N/A

Assimilation of continuously streamed monitored data is an essential component of a digital twin. The assimilated data are then used to ensure the digital twin is a true representation of the monitored system; one way this is achieved is by calibration of simulation models, whether data-derived or physics-based. Traditional manual calibration is not time-efficient in this context; new methods are required for continuous calibration. In this paper, a particle filter methodology for continuous calibration of the physics-based model element of a digital twin is presented and applied to an example of an underground farm. The results are compared against static Bayesian calibration and are shown to give insight into the time variation of dynamically varying model parameters.
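A minimal bootstrap particle filter in the spirit of Abstract 21 (a generic textbook construction with invented numbers, not the authors' digital-twin model): a slowly drifting physical parameter is tracked from noisy streamed measurements.

    import numpy as np

    rng = np.random.default_rng(8)
    n_particles, T = 2000, 100

    def model(theta):
        # Stand-in simulation model: predicted sensor reading given the
        # (time-varying) physical parameter theta.
        return 2.0 * theta + 1.0

    true_theta = 0.5 + 0.3 * np.sin(np.linspace(0, 3, T))  # drifting truth
    obs = model(true_theta) + 0.1 * rng.normal(size=T)     # streamed data

    particles = rng.uniform(0.0, 1.5, n_particles)         # prior draws
    estimates = []
    for y in obs:
        particles += 0.02 * rng.normal(size=n_particles)   # random-walk drift
        w = np.exp(-0.5 * ((y - model(particles)) / 0.1) ** 2)
        w /= w.sum()
        estimates.append(np.sum(w * particles))            # posterior mean
        idx = rng.choice(n_particles, n_particles, p=w)    # resample
        particles = particles[idx]

    print(np.max(np.abs(np.array(estimates) - true_theta)))  # tracking error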

Abstract 22: A Learning-boosted Quasi-Newton Method for AC Optimal Power Flow in Machine Learning for Engineering Modeling, Simulation and Design, Baker N/A

Despite being at the heart of many optimal power flow solvers, Newton-Raphson can suffer from slow and numerically unstable Jacobian matrix inversions at each iteration. To reduce the computational burden associated with calculating the full Jacobian and its inverse, many Quasi-Newton methods attempt to find a solution to the optimality conditions by leveraging an approximate Jacobian matrix. In this paper, a Quasi-Newton method based on machine learning is presented which performs iterative updates for candidate optimal solutions without having to calculate a Jacobian or approximate Jacobian matrix. The resulting learning-based algorithm utilizes a deep neural network with feedback. With proper choice of weights and activation functions, the model becomes a contraction mapping and convergence can be guaranteed. Results demonstrated on networks of up to 1,354 buses indicate the proposed method is capable of finding approximate solutions to AC OPF faster than Newton-Raphson, but can suffer from infeasible solutions in large networks.

Abstract 23: Frequency-compensated PINNs for Fluid-dynamic Design Problems in Machine Learning for Engineering Modeling, Simulation and Design, Zhang, Dey, Kakkar, Dasgupta, Chakraborty N/A

Incompressible fluid flow around a cylinder is one of the classical problems in fluid dynamics with strong relevance to many real-world engineering problems, for example, the design of offshore structures or of a pin-fin heat exchanger. Thus learning a high-accuracy surrogate for this problem can demonstrate the efficacy of a novel machine learning approach. In this work, we propose a physics-informed neural network (PINN) architecture for learning the relationship between simulation output and the underlying geometry and boundary conditions. In addition to using a physics-based regularization term, the proposed approach also exploits the underlying physics to learn a set of Fourier features, i.e. frequency and phase offset parameters, and then uses them for predicting flow velocity and pressure over the spatio-temporal domain. We demonstrate this approach by predicting simulation results over an out-of-range time interval and for novel design conditions. Our results show that incorporation of Fourier features improves the generalization performance over both the temporal domain and the design space.
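A generic sketch of the learned Fourier features in Abstract 23 (standard feature construction with invented shapes; the paper's PINN architecture and training are not reproduced): inputs are lifted through sinusoids whose frequencies and phase offsets are trainable parameters.

    import numpy as np

    rng = np.random.default_rng(9)

    def fourier_features(t, freqs, phases):
        # Lift scalar inputs t into sin(2*pi*f*t + p) features; freqs and
        # phases would be trained jointly with the downstream network.
        return np.sin(2.0 * np.pi * np.outer(t, freqs) + phases)

    t = np.linspace(0.0, 1.0, 128)
    freqs = rng.uniform(0.5, 8.0, size=16)     # trainable frequencies
    phases = rng.uniform(0.0, 2 * np.pi, 16)   # trainable phase offsets

    phi = fourier_features(t, freqs, phases)   # (128, 16) feature matrix

    # Residual of a linear readout on the features: small when the learned
    # frequencies cover the target signal's spectrum.
    target = np.sin(2 * np.pi * 3.0 * t)
    w = np.linalg.lstsq(phi, target, rcond=None)[0]
    print(np.abs(phi @ w - target).max())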
In results for each dataset and have included pre- this work, we propose a physics-informed neural trained models with our package. We believe network (PINN) architecture for learning the ManufacturingNet will enable engineers around relationship between simulation output and the the world to deploy machine learning models with underlying geometry and boundary conditions. In ease. The GitHub repository for ManufacturingNet addition to using a physics-based regularization can be found at https://github.com/BaratiLab/ term, the proposed approach also exploits the ManufacturingNet. underlying physics to learn a set of Fourier Keywords: Manufacturing, Deep Learning, features, i.e. frequency and phase ofset Programming, ManufacturingNet parameters, and then use them for predicting 164 Dec. 12, 2020
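The learnable Fourier features of Abstract 23 amount to a sinusoidal input mapping whose frequencies and phase offsets are trained jointly with the rest of the network. A minimal sketch, with layer sizes and the downstream head chosen arbitrarily:

```python
import torch
import torch.nn as nn

class FourierFeatures(nn.Module):
    def __init__(self, in_dim, n_features):
        super().__init__()
        self.freq = nn.Parameter(torch.randn(in_dim, n_features))   # learnable frequencies
        self.phase = nn.Parameter(torch.zeros(n_features))          # learnable phase offsets

    def forward(self, x):                    # x: (batch, in_dim), e.g. (x, y, t)
        return torch.sin(x @ self.freq + self.phase)

# illustrative surrogate: spatio-temporal coordinates -> (u, v, p)
surrogate = nn.Sequential(FourierFeatures(3, 64),
                          nn.Linear(64, 64), nn.Tanh(),
                          nn.Linear(64, 3))
out = surrogate(torch.rand(8, 3))            # velocity components and pressure
```

In a full PINN, the physics residual of the governing equations would be added to the data loss; the sketch only shows where the frequency and phase parameters live.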

Abstract 25: Model Order Reduction using a Deep Orthogonal Decomposition in Machine Learning for Engineering Modeling, Simulation and Design, Tait N/A

Near-term prediction of the structured spatio-temporal processes driving our climate is of profound importance to the safety and well-being of millions, but the pronounced nonlinear convection of these processes makes a complete mechanistic description even of the short-term dynamics challenging. However, convective transport provides not only a principled physical description of the problem, but is also indicative of the transport in time of informative features, which has led to the recent successful development of "physics free" approaches. In this work we demonstrate that there remains an important role to be played by physically informed models, which can successfully leverage deep learning (DL) to project the process onto a lower-dimensional space on which a minimal dynamical description holds. Our approach synthesises the feature extraction capabilities of DL with physically motivated dynamics to outperform existing model-free approaches, as well as state-of-the-art hybrid approaches, on complex real-world datasets including sea surface temperature and precipitation.

Abstract 26: On Training Effective Reinforcement Learning Agents for Real-time Power Grid Operation and Control in Machine Learning for Engineering Modeling, Simulation and Design, Diao, Shi, Zhang, Wang, Li, Xu, Lan, Bian, Duan, Wu N/A

Deriving fast and effectively coordinated control actions remains a grand challenge affecting the secure and economic operation of today's large-scale power grid. This paper presents a novel artificial intelligence (AI) based methodology to achieve multi-objective real-time power grid control for real-world implementation. The state-of-the-art off-policy reinforcement learning (RL) algorithm, soft actor-critic (SAC), is adopted to train AI agents with multi-thread offline training and periodic online training for regulating voltages and transmission losses without violating thermal constraints of lines. A software prototype was developed and deployed in the control center of SGCC Jiangsu Electric Power Company that interacts with their Energy Management System (EMS) every 5 minutes. Massive numerical studies using actual power grid snapshots in the real-time environment verify the effectiveness of the proposed approach. Well-trained SAC agents can learn to provide effective and subsecond (<20 ms) control actions in regulating voltage profiles and reducing transmission losses.

Abstract 27: Predicting Nanorobot Shapes via Generative Models in Machine Learning for Engineering Modeling, Simulation and Design, Benjaminson, Taylor, Travers N/A

The field of DNA nanotechnology has made it possible to assemble, with high yields, different structures that have actionable properties. For example, researchers have created components that can be actuated, used to sense (e.g., changes in pH), or to store and release loads. An exciting next step is to combine these components into multifunctional nanorobots that could, potentially, perform complex tasks like swimming to a target location in the human body, detecting an adverse reaction and then releasing a drug load to stop it. However, as we start to assemble more complex nanorobots, the yield of the desired nanorobot begins to decrease as the number of possible component combinations increases. Therefore, the ultimate goal of this work is to develop a predictive model to maximize yield. However, training predictive models typically requires a large dataset. For the nanorobots we are interested in assembling, this will be difficult to collect. This is because high-fidelity data, which allows us to exactly characterize the shape and size of individual structures, is extremely time-consuming to collect, whereas low-fidelity data is readily available but only captures overall statistics for different processes. Therefore, this work combines low- and high-fidelity data to train a generative model using a two-step process. First, we pretrain the model using a relatively small (1000s), high-fidelity dataset to represent the distribution of nanorobot shapes. Second, we bias the learned distribution towards samples with certain physical properties that are measured using low-fidelity data. In this work we bias our distribution towards a desired node degree of a graphical model that we take as a surrogate representation of the nanorobots that this work will ultimately focus on. We have not yet accumulated a high-fidelity dataset of nanorobots, so we leverage the MolGAN architecture [1] and the QM9 small molecule dataset [2-3] to demonstrate our approach.
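The projection idea in Abstract 25 (use deep learning to map the field onto a low-dimensional space on which a minimal dynamical description holds) can be sketched as an autoencoder plus a least-squares latent dynamics fit. The architecture, dimensions, and random snapshots below are assumptions for illustration; the encoder would of course be trained on real data first:

```python
import torch
import torch.nn as nn

enc = nn.Sequential(nn.Linear(1024, 128), nn.ReLU(), nn.Linear(128, 8))
dec = nn.Sequential(nn.Linear(8, 128), nn.ReLU(), nn.Linear(128, 1024))
snapshots = torch.randn(200, 1024)        # flattened fields at t = 0..199

z = enc(snapshots)                        # latent trajectories, shape (200, 8)
# fit a minimal latent dynamics model z_{t+1} ~ z_t M by least squares
M = torch.linalg.lstsq(z[:-1].detach(), z[1:].detach()).solution
z_next = z[-1] @ M                        # one-step latent forecast
forecast = dec(z_next)                    # decode back to the physical field
```

A physically informed variant would replace the generic linear map with dynamics motivated by convective transport, which is the abstract's point.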

Abstract 28: Constraint active search for experimental design in Machine Learning for Engineering Modeling, Simulation and Design, Malkomes, Cheng, McCourt N/A

Many problems in engineering and design require balancing competing objectives under the presence of uncertainty. The standard approach in the literature characterizes the relationship between design decisions and their corresponding outcomes as a Pareto frontier, which is discovered through multiobjective optimization. In this position paper, we suggest that this approach is not ideal for reasoning about practical design decisions. Instead of multiobjective optimization, we propose soliciting desired minimum performance constraints on all objectives to define regions of satisfaction. We present work-in-progress which visualizes design decisions that consistently satisfy user-defined thresholds in an additive manufacturing problem.

Abstract 29: An adversarially robust approach to security-constrained optimal power flow in Machine Learning for Engineering Modeling, Simulation and Design, Bedmutha, Donti, Kolter N/A

Security-constrained optimal power flow (SCOPF) is a critical problem for the operation of power systems, aiming to schedule power generation in a way that is robust to potential equipment failures. However, many SCOPF approaches require constructing large optimization problems that explicitly account for each of these potential system failures, thus suffering from issues of computational complexity that limit their use in practice. In this paper, we propose an approach to solving SCOPF inspired by adversarially robust training in neural networks. In particular, we frame SCOPF as a bi-level optimization problem -- viewing power generation settings as parameters associated with a neural network defender, and equipment failures as (adversarial) attacks -- and solve this problem via gradient-based techniques. We describe the results of initial experiments on a 30-bus test system.

Abstract 30: Heat risk assessment using surrogate model for meso-scale surface temperature in Machine Learning for Engineering Modeling, Simulation and Design, Choi, Pozzi, Berges N/A

The heat pattern of cities is characterized by temperatures higher than those of the surrounding environments, and cities are vulnerable to heat-induced risk because of their dense populations. Therefore, fast and accurate heat risk assessment is desired for mitigation plans and sustainable community management. This paper introduces a probabilistic model to forecast the meso-scale surface temperature at a relatively low computational cost, as an alternative to computationally intensive Numerical Weather Prediction (NWP) models. After calibrating the model, we integrate it into a probabilistic risk analysis framework to estimate the extreme temperature distribution around cities. Integrated with this framework, the surrogate model expands its applicability, providing insights on future risk and supporting various statistical inferences.

Abstract 31: Learning to Identify Drilling Defects in Turbine Blades with Single Stage Detectors in Machine Learning for Engineering Modeling, Simulation and Design, Panizza, Stefanek, Melacci, Veneri, Gori N/A

Nondestructive testing (NDT) is widely applied to defect identification of turbine components during manufacturing and operation. Operational efficiency is key for gas turbine OEMs (Original Equipment Manufacturers). Automating the inspection process as much as possible, while minimizing the uncertainties involved, is thus crucial. We propose a model based on RetinaNet to identify drilling defects in X-ray images of turbine blades. The application is challenging due to the large image resolutions in which defects are very small and hardly captured by the commonly used anchor sizes, and also due to the small size of the available dataset. As a matter of fact, all these issues are pretty common in the application of Deep Learning-based object detection models to industrial defect data. We overcome such issues using open-source models, splitting the input images into tiles and scaling them up, applying heavy data augmentation, and optimizing the anchor size and aspect ratios with a differential evolution solver. We validate the model with 3-fold cross-validation, showing a very high accuracy in identifying images with defects. We also define a set of best practices which can help other practitioners overcome similar challenges.
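Abstract 29's bi-level formulation follows the usual adversarial-training pattern: an inner loop that maximizes damage over failure variables, and an outer loop that hardens the generation settings. A generic gradient-based sketch with a placeholder cost function, not the paper's actual power-flow model:

```python
import torch

gen = torch.zeros(30, requires_grad=True)            # generation set-points (defender)
opt = torch.optim.Adam([gen], lr=1e-2)

def operating_cost(gen, attack):
    # placeholder for the true cost/violation under an equipment failure
    return (gen - 1.0).pow(2).sum() + (attack * gen).sum()

for step in range(100):
    attack = torch.zeros(30, requires_grad=True)     # failure/attack variable
    for _ in range(10):                              # inner loop: maximize cost
        grad, = torch.autograd.grad(-operating_cost(gen, attack), attack)
        attack = (attack - 0.1 * grad).clamp(0, 1).detach().requires_grad_()
    opt.zero_grad()
    operating_cost(gen, attack).backward()           # outer loop: minimize worst case
    opt.step()
```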

Abstract 32: Parameterized Reinforcement Learning for Optical System Optimization in Machine Learning for Engineering Modeling, Simulation and Design, Wankerl, Stern, Mahdavi, Eichler, Lang N/A

Designing a multi-layer optical system with designated optical characteristics is an inverse design problem in which the resulting design is determined by several discrete and continuous parameters. In particular, we consider three design parameters to describe a multi-layer stack: each layer's dielectric material and thickness, as well as the total number of layers. Such a combination of discrete and continuous parameters is a challenging optimization problem that often requires a computationally expensive search for an optimal system design. Hence, most methods merely determine the optimal thicknesses of the system's layers. To incorporate layer material and the total number of layers as well, we propose a method that considers the stacking of consecutive layers as parameterized actions in a Markov decision process. We propose an exponentially transformed reward signal that eases policy optimization and adapt a recent variant of Q-learning for inverse design optimization. We demonstrate that our method outperforms human experts and a naive reinforcement learning algorithm concerning the achieved optical characteristics. Moreover, the learned Q-values contain information about the optical properties of multi-layer optical systems, thereby allowing physical interpretation or what-if analysis.

Abstract 33: Jacobian of Conditional Generative Models for Sensitivity Analysis of Photovoltaic Device Processes in Machine Learning for Engineering Modeling, Simulation and Design, Molamohammadi, Rezaei-Shoshtari, Quitoriano N/A

Modeling and sensitivity analysis of complex photovoltaic device processes is explored in this work. We use conditional variational autoencoders to learn the generative model and latent space of the process, which is in turn used to predict the device performance. We further compute the Jacobian of the trained neural network to obtain global sensitivity indices of the inputs, in order to gain an intuition and interpretation of the process. The results show the outperformance of generative models compared to predictive models for learning device processes. Furthermore, comparison of the results with sampling-based sensitivity analysis methods demonstrates the validity of our approach and the interpretability of the learned latent space.

Abstract 34: Surrogates for Stiff Nonlinear Systems using Continuous Time Echo State Networks in Machine Learning for Engineering Modeling, Simulation and Design, Anantharaman, Rackauckas, Shah N/A

Modern design, control, and optimization often require simulation of highly nonlinear models, leading to prohibitive computational costs. These costs can be amortized by evaluating a cheap surrogate of the full model. Here we present a general data-driven method, the continuous-time echo state network (CTESN), for generating surrogates of nonlinear ordinary differential equations with dynamics at widely separated timescales. We empirically demonstrate near-constant-time performance using our CTESNs on a physically motivated scalable model of a heating system, whose full execution time increases exponentially, while maintaining relative error within 0.2%. We also show that our model captures fast transients as well as slow dynamics effectively, while other techniques such as physics-informed neural networks have difficulties trying to train and predict the highly nonlinear behavior of these models.
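The Jacobian-based sensitivity analysis of Abstract 33 is directly expressible with automatic differentiation. A sketch with a stand-in network; the normalization into indices is one plausible choice, not necessarily the authors':

```python
import torch
from torch.autograd.functional import jacobian

# stand-in for the trained network mapping process inputs -> device performance
net = torch.nn.Sequential(torch.nn.Linear(5, 32), torch.nn.Tanh(),
                          torch.nn.Linear(32, 1))
x0 = torch.rand(5)                                   # nominal process inputs

J = jacobian(lambda x: net(x), x0)                   # shape (1, 5)
sensitivity = J.abs().squeeze()
sensitivity = sensitivity / sensitivity.sum()        # normalized indices
print(sensitivity)   # relative influence of each process input
```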

Abstract 35: Accelerating Inverse Design of Nanostructures Using Manifold Learning in Machine Learning for Engineering Modeling, Simulation and Design, Zandehshahvar, Kiarashinejad, Zhu, Maleki, Hemmatyar, Abdollahramezani, Pourabolghasem, Adibi N/A

Deep learning and machine learning have recently attracted remarkable attention in the inverse design of nanostructures. However, limited works have used these techniques to reduce the design complexity of structures. In this work, we present an evolutionary-based method using manifold learning for inverse design of nanostructures with minimal design complexity. This method encodes the high-dimensional spectral responses obtained by electromagnetic simulation software for a class of nanostructures with different design complexities using an autoencoder (AE). We model the governing distributions of the data in the latent space using Gaussian mixture models (GMMs), which then provide the level of feasibility of a desired response for each structure, and use a neural network (NN) to find the optimum solution. This method also provides valuable information about the underlying physics of light-matter interactions by representing the sub-manifolds of feasible regions for each design complexity level (i.e., number of design parameters) in the latent space. To show the applicability of the method, we employ this technique for inverse design of a class of nanostructures consisting of dielectric metasurfaces with different complexity degrees.

Abstract 36: Combinatorial 3D Shape Generation via Sequential Assembly in Machine Learning for Engineering Modeling, Simulation and Design, Kim, Chung, Lee, Cho, Park N/A

Sequential assembly with geometric primitives has drawn attention in robotics and 3D vision since it yields a practical blueprint to construct a target shape. However, due to its combinatorial property, a greedy method falls short of generating a sequence of volumetric primitives. To alleviate this consequence, induced by a huge number of feasible combinations, we propose a combinatorial 3D shape generation framework. The proposed framework reflects an important aspect of human generation processes in real life -- we often create a 3D shape by sequentially assembling unit primitives with geometric constraints. To find the desired combination with few combination evaluations, we adopt Bayesian optimization, which is able to exploit and explore efficiently the feasible regions constrained by the current primitive placements. An evaluation function conveys global structure guidance for an assembly process and stability in terms of gravity and external forces simultaneously. Experimental results demonstrate that our method successfully generates combinatorial 3D shapes and simulates more realistic generation processes. We also introduce a new dataset for combinatorial 3D shape generation.

Abstract 37: Flaw Detection in Metal Additive Manufacturing Using Deep Learned Acoustic Features in Machine Learning for Engineering Modeling, Simulation and Design, Zhang, Kara N/A

While additive manufacturing has seen rapid proliferation in recent years, process monitoring and quality assurance methods capable of detecting micro-scale flaws have seen little improvement and remain largely expensive and time-consuming. In this work we propose a pipeline for training two deep learning flaw formation detection techniques, including convolutional neural networks and long short-term memory networks. We demonstrate that the flaw formation mechanisms of interest to this study, including keyhole porosity, lack of fusion, and bead-up, are separable using these methods. Both approaches have yielded a classification accuracy over 99% on unseen test sets. The results suggest that the implementation of machine learning enabled acoustic process monitoring is potentially a viable replacement for traditional quality assurance methods, as well as a tool to guide them.
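Abstract 35's feasibility scoring can be illustrated in a few lines: compress responses to a latent space, fit one Gaussian mixture per design-complexity level, and score a desired response under each. PCA stands in for the trained autoencoder, and all data below are random placeholders:

```python
import numpy as np
from sklearn.decomposition import PCA          # stand-in for the trained AE encoder
from sklearn.mixture import GaussianMixture

spectra = np.random.rand(300, 200)             # simulated spectral responses
labels = np.random.randint(0, 3, 300)          # design-complexity level per sample
latent = PCA(n_components=5).fit_transform(spectra)

gmm_by_complexity = {}
for level in range(3):
    gmm_by_complexity[level] = GaussianMixture(4).fit(latent[labels == level])

target = latent[0]                             # encoded desired response
for level, gmm in gmm_by_complexity.items():
    # higher log-density = more feasible at this design complexity
    print(level, gmm.score_samples(target[None]))
```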

Abstract 38: Collaborative Multidisciplinary Design Optimization with Neural Networks in Machine Learning for Engineering Modeling, Simulation and Design, de Becdelievre, Kroo N/A

The design of complex engineering systems leads to solving very large optimization problems involving different disciplines. Strategies allowing disciplines to optimize in parallel by providing sub-objectives and splitting the problem into smaller parts, such as Collaborative Optimization, are promising solutions. However, most of them have slow convergence, which reduces their practical use. Earlier efforts to speed up convergence by learning surrogate models have not yet succeeded at sufficiently improving the competitiveness of these strategies. This paper shows that, in the case of Collaborative Optimization, faster and more reliable convergence can be obtained by solving an interesting instance of binary classification: on top of the target label, the training data of one of the two classes contains the distance to the decision boundary and its derivative. Leveraging this information, we propose to train a neural network with an asymmetric loss function, a structure that guarantees Lipschitz continuity, and a regularization towards respecting basic distance function properties. The approach is demonstrated on a toy learning example, and then applied to a multidisciplinary aircraft design problem.

Abstract 39: A Nonlocal-Gradient Descent Method for Inverse Design in Nanophotonics in Machine Learning for Engineering Modeling, Simulation and Design, Bi, Zhang, Zhang N/A

Local-gradient-based optimization approaches lack the nonlocal exploration ability required for escaping from local minima when searching non-convex landscapes. A directional Gaussian smoothing (DGS) approach was recently proposed (arXiv:2002.03001) and used to define a truly nonlocal gradient, referred to as the DGS gradient, in order to enable nonlocal exploration in high-dimensional black-box optimization. Promising results show that replacing the traditional local gradient with the nonlocal DGS gradient can significantly improve the performance of gradient-based methods in optimizing highly multi-modal loss functions. However, the current DGS method is designed for unbounded and unconstrained optimization problems, making it inapplicable to real-world engineering optimization problems where the tuning parameters are often bounded and the loss function is usually constrained by physical processes. In this work, we propose to extend the DGS approach to the constrained inverse design framework in order to find better optima of multi-modal loss functions. A series of adaptive strategies for updating the smoothing radius and learning rate are developed to improve the computational efficiency and robustness. Our methodology is demonstrated on an example of designing a nanoscale wavelength demultiplexer, and shows superior performance compared to state-of-the-art approaches. By incorporating volume constraints, the optimized design achieves an equivalently high performance but significantly reduces the amount of material usage.

Abstract 40: Scalable Deep-Learning-Accelerated Topology Optimization for Additively Manufactured Materials in Machine Learning for Engineering Modeling, Simulation and Design, Bi, Zhang, Zhang N/A

Topology optimization (TO) is a popular and powerful computational approach for designing novel structures, materials, and devices. Two computational challenges have limited the applicability of TO to a variety of industrial applications. First, a TO problem often involves a large number of design variables to guarantee sufficient expressive power. Second, many TO problems require a large number of expensive physical model simulations, and those simulations cannot be parallelized. To address these issues, we propose a general scalable deep-learning (DL) based TO framework, referred to as SDL-TO, which utilizes parallel CPU+GPU schemes to accelerate the TO process for designing additively manufactured (AM) materials. Unlike existing studies of DL for TO, our framework accelerates TO by learning the iterative history data and simultaneously training on the mapping between the given design and its gradient. The surrogate gradient is learned by utilizing parallel computing on multiple CPUs combined with distributed DL training on multiple GPUs, and enables a fast online update scheme instead of an expensive update. Using a local sampling strategy, we reduce the intrinsic high dimensionality of the design space and improve the training accuracy and the scalability of the SDL-TO framework. The method is demonstrated on benchmark examples and AM materials design for heat conduction, and shows competitive performance compared to the baseline methods while significantly reducing the computational cost, with a speed-up of 8.6x over the standard TO implementation.
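The DGS gradient that Abstract 39 builds on estimates the derivative of a Gaussian-smoothed objective along each direction with Gauss-Hermite quadrature. A self-contained sketch on a toy multi-modal loss; the smoothing radius, quadrature order, and coordinate directions are assumptions:

```python
import numpy as np

nodes, weights = np.polynomial.hermite.hermgauss(7)   # weight function exp(-v^2)

def dgs_directional(f, x, xi, sigma):
    # derivative of the Gaussian-smoothed f along the unit direction xi
    vals = np.array([f(x + np.sqrt(2) * sigma * v * xi) for v in nodes])
    return (weights * vals * np.sqrt(2) * nodes).sum() / (np.sqrt(np.pi) * sigma)

f = lambda x: np.sum(x ** 2) + 0.5 * np.sin(8 * x).sum()   # multi-modal toy loss
x = np.ones(4)
dgs_grad = np.array([dgs_directional(f, x, e, sigma=0.5)
                     for e in np.eye(4)])                  # one entry per axis
print(dgs_grad)
```

Because each entry averages the loss over a wide Gaussian slice, the resulting "gradient" sees past small local ripples, which is exactly the nonlocal exploration the abstract refers to.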

Abstract 41: Learning Mesh-Based Simulation with Graph Networks in Machine Learning for Engineering Modeling, Simulation and Design, Pfaff, Fortunato, Sanchez Gonzalez, Battaglia N/A

Mesh-based simulations are central to modeling complex physical systems in many disciplines across science and engineering, as they support powerful numerical integration methods and their resolution can be adapted to strike favorable trade-offs between accuracy and efficiency. Here we introduce MeshGraphNets, a graph neural network-based method for learning simulations, which leverages mesh representations. Our model can be trained to pass messages on a mesh graph and to adapt the mesh discretization during forward simulation. We show that our method can accurately predict the dynamics of a wide range of physical systems, including aerodynamics, structural mechanics, and cloth -- and do so efficiently, running 1-2 orders of magnitude faster than the simulation on which it is trained. Our approach broadens the range of problems on which neural network simulators can operate and promises to improve the efficiency of complex, scientific modeling tasks.

Abstract 42: A General Framework Combining Generative Adversarial Networks and Mixture Density Networks for Inverse Modeling in Microstructural Materials Design in Machine Learning for Engineering Modeling, Simulation and Design, Yang, Jha, Paul, Liao, Choudhary, Agrawal N/A

Microstructural materials design is one of the most important applications of inverse modeling in materials science. Generally speaking, there are two broad modeling paradigms in scientific applications: forward and inverse. While forward modeling estimates the observations based on known parameters, inverse modeling attempts to infer the parameters given the observations. Inverse problems are usually more critical as well as difficult in scientific applications, as they seek to explore parameters that cannot be directly observed. Inverse problems are used extensively in various scientific fields, such as geophysics, healthcare and materials science. However, it is challenging to solve inverse problems, because they usually need to learn a one-to-many non-linear mapping, and also require significant computing time, especially for high-dimensional parameter spaces. Further, inverse problems become even more difficult to solve when the dimension of the input (i.e. observation) is much lower than that of the output (i.e. parameters). In this work, we propose a framework consisting of generative adversarial networks and mixture density networks for inverse modeling, and it is evaluated on a materials science dataset for microstructural materials design. Compared with baseline methods, the results demonstrate that the proposed framework can overcome the above-mentioned challenges and produce multiple promising solutions in an efficient manner.

Abstract 43: Simultaneous Process Design and Control Optimization using Reinforcement Learning in Machine Learning for Engineering Modeling, Simulation and Design, Sachio, del Rio Chanona, Petsagkourakis N/A

With the ever-increasing population and quality of healthcare, it is inevitable for the demand for energy and natural resources to rise. Therefore, it is important to design highly efficient and sustainable chemical processes in the pursuit of sustainability. The performance of a chemical plant is highly affected by its design and control: a design cannot be evaluated without its controls, and vice versa. To optimally address design and control simultaneously, one must formulate a bi-level mixed-integer nonlinear program with a dynamic optimization problem as the inner problem; this is intractable. However, by computing an optimal policy using reinforcement learning, a controller with a closed-form expression can be found and embedded into the mathematical program. In this work, an approach using a policy gradient method along with mathematical programming to solve the problem simultaneously is proposed. The approach was tested in two case studies and the performance of the controller was evaluated. It was shown that the proposed approach outperforms current state-of-the-art control strategies. This opens a whole new range of possibilities to address the simultaneous design and control of engineering systems.
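The mixture density network half of Abstract 42's framework is what turns a single observation into many candidate parameter sets. A minimal MDN head, with all sizes illustrative:

```python
import torch
import torch.nn as nn

K, obs_dim, par_dim = 5, 4, 10                   # mixture size, observation, parameters
trunk = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU())
head_pi = nn.Linear(64, K)                       # mixture weights
head_mu = nn.Linear(64, K * par_dim)             # component means
head_sig = nn.Linear(64, K * par_dim)            # component scales

def sample_parameters(obs, n=8):
    h = trunk(obs)
    pi = torch.softmax(head_pi(h), -1)
    mu = head_mu(h).view(K, par_dim)
    sig = torch.nn.functional.softplus(head_sig(h)).view(K, par_dim)
    comps = torch.multinomial(pi, n, replacement=True)   # pick mixture modes
    return mu[comps] + sig[comps] * torch.randn(n, par_dim)

candidates = sample_parameters(torch.rand(obs_dim))      # multiple promising solutions
```

Each sampled mode corresponds to a different plausible inverse solution, which is how the one-to-many mapping mentioned in the abstract is represented.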

Abstract 44: A Sequential Modelling Approach for Indoor Temperature Prediction and Heating Control in Smart Buildings in Machine Learning for Engineering Modeling, Simulation and Design, Huang, Miles, Zhang N/A

The rising availability of large-volume data has enabled a wide application of statistical Machine Learning (ML) algorithms in the domains of Cyber-Physical Systems (CPS), Internet of Things (IoT) and Smart Building Networks (SBN). This paper proposes a learning-based framework for sequentially applying data-driven statistical methods to predict indoor temperature, and yields an algorithm for controlling the building heating system accordingly. The framework consists of a two-stage modelling effort: in the first stage, a univariate time series model (AR) was employed to predict ambient conditions; together with other control variables, these served as the input features for a second-stage modelling, where a multivariate ML model (XGBoost) was deployed. The models were trained with real-world data from building sensor network measurements, and used to predict future temperature trajectories. Experimental results demonstrate the effectiveness of the modelling approach and control algorithm, and reveal the promising potential of the data-driven approach in smart building applications over traditional dynamics-based modelling methods. By making wise use of IoT sensory data and ML algorithms, this work contributes to efficient energy management and sustainability in smart buildings.

Abstract 45: Probabilistic Adjoint Sensitivity Analysis for Fast Calibration of Partial Differential Equation Models in Machine Learning for Engineering Modeling, Simulation and Design, Cockayne, Duncan N/A

Calibration of large-scale differential equation models to observational or experimental data is a widespread challenge throughout the applied sciences and engineering. A crucial bottleneck in state-of-the-art calibration methods is the calculation of local sensitivities, i.e. derivatives of the loss function with respect to the estimated parameters, which often necessitates several numerical solves of the underlying system of partial differential equations. In this paper, we present a new probabilistic approach which permits budget-constrained computation of local sensitivities, providing a quantification of the uncertainty incurred in the sensitivities from this constraint. Moreover, information from previous sensitivity estimates can be recycled in subsequent computations, reducing the overall computational effort for iterative gradient-based calibration methods.

Abstract 46: Multi-Loss Sub-Ensembles for Accurate Classification with Uncertainty Estimation in Machine Learning for Engineering Modeling, Simulation and Design, Achrack, Kellerman, Barzilay N/A

Deep neural networks (DNNs) have brought about a revolution in numerous fields during the last decade. However, in tasks with high safety requirements, such as medical or autonomous driving applications, providing an assessment of the model's reliability can be vital. Uncertainty estimation for DNNs has been addressed using Bayesian methods, providing mathematically founded models for reliability assessment. These models are computationally expensive and generally impractical for many real-time use cases. Recently, non-Bayesian methods were proposed to tackle uncertainty estimation more efficiently.

We propose an efficient method for uncertainty estimation in DNNs achieving high accuracy. We simulate the notion of multi-task learning on single-task problems by producing parallel predictions from similar models differing by their loss. This multi-loss approach allows one-phase training for single-task learning with uncertainty estimation. We keep our inference time relatively low by leveraging the advantage proposed by the Deep Sub-Ensembles method.

The novelty of this work resides in the proposed accurate variational inference with a simple and convenient training procedure, while remaining competitive in terms of computational time. We conduct experiments on SVHN, CIFAR10 and CIFAR100, as well as ImageNet, using different architectures. Our results show improved accuracy on the classification task and competitive results on several uncertainty measures.
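Abstract 44's two-stage pipeline (an AR model feeding a boosted-tree model) can be sketched with synthetic data. The paper uses XGBoost; a scikit-learn booster is substituted here, and the lag order and features are arbitrary choices:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

ambient = np.sin(np.arange(500) / 24) + 0.1 * np.random.randn(500)
indoor = 0.7 * ambient + 20 + 0.05 * np.random.randn(500)   # synthetic sensor data

# Stage 1: AR(3) model of ambient conditions, fit by least squares
lags = np.column_stack([ambient[2:-1], ambient[1:-2], ambient[:-3]])
coef, *_ = np.linalg.lstsq(lags, ambient[3:], rcond=None)
ambient_next = lags[-1] @ coef                   # one-step ambient forecast

# Stage 2: boosted trees from (ambient, hour-of-day) to indoor temperature
X = np.column_stack([ambient[3:], np.arange(497) % 24])
model = GradientBoostingRegressor().fit(X, indoor[3:])
print(model.predict([[ambient_next, 0]]))        # predicted indoor temperature
```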

Abstract 47: Analog Circuit Design with Dyna-Style Reinforcement Learning in Machine Learning for Engineering Modeling, Simulation and Design, Lee, Oliehoek N/A

In this work, we present a learning-based approach to analog circuit design, where the goal is to optimize circuit performance subject to certain design constraints. One of the aspects that makes this problem challenging to optimize is that measuring the performance of candidate configurations with simulation can be computationally expensive, particularly in the post-layout design. Additionally, the large number of design constraints and the interaction between the relevant quantities make the problem complex. Therefore, to better support the human designers, it is desirable to gain knowledge about the whole space of feasible solutions. In order to tackle these challenges, we take inspiration from model-based reinforcement learning and propose a method with two key properties. First, it learns a reward model, i.e., a surrogate model of the performance approximated by neural networks, to reduce the required number of simulations. Second, it uses a stochastic policy generator to explore the diverse solution space satisfying constraints. Together we combine these in a Dyna-style optimization framework, which we call DynaOpt, and empirically evaluate the performance on a circuit benchmark of a two-stage operational amplifier. The results show that, compared to the model-free method applied with 20,000 circuit simulations to train the policy, DynaOpt achieves much better performance by learning from scratch with only 500 simulations.

Abstract 48: Multilevel Delayed Acceptance MCMC with an Adaptive Error Model in PyMC3 in Machine Learning for Engineering Modeling, Simulation and Design, Lykkegaard, Mingas, Scheichl, Fox, Dodwell N/A

Uncertainty Quantification using Markov Chain Monte Carlo (MCMC) can be prohibitively expensive for target probability densities with expensive likelihood functions, for instance when it involves solving a Partial Differential Equation (PDE), as is the case in a wide range of engineering applications. Multilevel Delayed Acceptance (MLDA) with an Adaptive Error Model (AEM) is a novel approach which alleviates this problem by exploiting a hierarchy of models, with increasing complexity and cost, and correcting the inexpensive models on-the-fly. The method has been integrated with the open-source probabilistic programming package PyMC3 and is available in the latest development version. In this paper, we present the algorithm along with an illustrative example.

Abstract 49: Bayesian polynomial chaos in Machine Learning for Engineering Modeling, Simulation and Design, Seshadri, Duncan, Scillitoe N/A

In this brief paper we introduce Bayesian polynomial chaos, a Gaussian process analogue to polynomial chaos. We argue why this Bayesian re-formulation of polynomial chaos is necessary, then proceed to mathematically define it, followed by an examination of its utility in computing moments and sensitivities, multi-fidelity modelling, and information fusion.
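The delayed-acceptance mechanism underlying Abstract 48's MLDA is simple to state with two levels: a cheap coarse posterior screens proposals, and survivors are corrected with the fine/coarse ratio so the fine posterior stays the exact target. Gaussian stand-ins replace the PDE likelihoods in this sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
log_fine = lambda x: -0.5 * (x - 1.0) ** 2       # expensive model (stand-in)
log_coarse = lambda x: -0.5 * (x - 1.1) ** 2     # cheap, slightly biased model

x, chain = 0.0, []
for _ in range(5000):
    prop = x + 0.5 * rng.standard_normal()
    # Stage 1: screen the proposal with the coarse posterior only
    if np.log(rng.random()) < log_coarse(prop) - log_coarse(x):
        # Stage 2: delayed acceptance, correct with the fine/coarse ratio
        ratio = (log_fine(prop) - log_fine(x)) - (log_coarse(prop) - log_coarse(x))
        if np.log(rng.random()) < ratio:
            x = prop
    chain.append(x)
print(np.mean(chain), np.std(chain))   # close to the fine posterior: mean 1, std 1
```

The adaptive error model in the paper goes further, correcting the coarse model's bias on-the-fly so that fewer proposals are wasted at Stage 2.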

Abstract 50: Battery Model Calibration with Deep Reinforcement Learning in Machine Learning for Engineering Modeling, Simulation and Design, Unagar, Tian, Fink, Arias Chao N/A

Lithium-Ion (Li-I) batteries have recently become pervasive and are used in many physical assets. To enable a good prediction of the end of discharge of batteries, detailed electrochemical Li-I battery models have been developed. Their parameters are typically calibrated before the batteries are taken into operation and are typically not re-calibrated during operation. However, since battery performance is affected by aging, the reality gap between the computational battery models and the real physical systems leads to inaccurate predictions. A supervised machine learning algorithm would require an extensive representative training dataset mapping the observations to the ground truth calibration parameters. This may be infeasible for many practical applications. In this paper, we implement a Reinforcement Learning-based framework for reliably and efficiently inferring calibration parameters of battery models. The framework enables real-time inference of the computational model parameters in order to compensate for the reality gap given the observations. Most importantly, the proposed methodology does not need any labeled data samples (samples of observations and the ground truth calibration parameters). Furthermore, the framework does not require any information on the underlying physical model. The experimental results demonstrate that the proposed methodology is capable of inferring the model parameters with high accuracy and high robustness. While the achieved results are comparable to those obtained with supervised machine learning, they do not rely on ground truth information during training.

Abstract 51: Modular mobile robot design selection with deep reinforcement learning in Machine Learning for Engineering Modeling, Simulation and Design, Whitman, Travers, Choset N/A

The widespread adoption of robots will require a flexible and automated approach to robot design. Exploring the full space of all possible designs when creating a custom robot can prove to be computationally intractable, leading us to consider modular robots, composed of a common set of repeated components that can be reconfigured for each new task. But conducting a combinatorial optimization process to create a specialized design for each new task and setting is computationally expensive, especially if the task changes frequently. In this work, our goal is to select mobile robot designs that will perform best in a given environment under a known control policy, with the assumption that the selection process must be conducted for new environments frequently. We use deep reinforcement learning to create a neural network that, given a terrain map as an input, outputs the mobile robot designs deemed most likely to locomote successfully in that environment.

Abstract 52: Rethink AI-based Power Grid Control: Diving Into Algorithm Design in Machine Learning for Engineering Modeling, Simulation and Design, Zhou, Wang, Diao, Bian, Duan, Shi N/A

Recently, deep reinforcement learning (DRL)-based approaches have shown promise in solving complex decision and control problems in the power engineering domain. In this paper, we present an in-depth analysis of DRL-based voltage control from the aspects of algorithm selection, state space representation, and reward engineering. To resolve the observed issues, we propose a novel imitation learning-based approach to directly map power grid operating points to effective actions without any interim reinforcement learning process. The performance results demonstrate that the proposed approach has strong generalization ability with much less training time. The agent trained by imitation learning is effective and robust in solving the voltage control problem and outperforms the former RL agents.

Abstract 53: End-to-End Differentiability and Tensor Processing Unit Computing to Accelerate Materials' Inverse Design in Machine Learning for Engineering Modeling, Simulation and Design, LIU, Liu, Zhao, Schoenholz, Cubuk, Bauchy N/A

Numerical simulations have revolutionized material design. However, although simulations excel at mapping an input material to its output property, their direct application to inverse design (i.e., mapping an input property to an optimal output material) has traditionally been limited by their high computing cost and lack of differentiability, so that simulations are often replaced by surrogate machine learning models in inverse design problems. Here, taking the example of the inverse design of a porous matrix featuring a targeted sorption isotherm, we introduce a computational inverse design framework that addresses these challenges. We reformulate a lattice density functional theory of sorption as a differentiable simulation programmed on the TensorFlow platform, which leverages automated end-to-end differentiation. Thanks to its differentiability, the simulation is used to directly train a deep generative model, which outputs an optimal porous matrix based on an arbitrary input sorption isotherm curve. Importantly, this inverse design pipeline leverages for the first time the power of tensor processing units (TPUs) -- an emerging family of dedicated chips which, although specialized in deep learning, are flexible enough for intensive scientific simulations. This approach holds promise to accelerate inverse materials design.
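Abstract 50's key point, that calibration needs no labeled parameters, follows from the reward structure: the agent is scored only on the mismatch between simulated and observed signals. Sketched below with a trivial stand-in battery model and plain random search in place of the trained RL agent:

```python
import numpy as np

def battery_model(theta, t):
    # placeholder discharge curve, not the paper's electrochemical model
    return 4.2 - theta[0] * t - theta[1] * t ** 2

true_theta = np.array([0.01, 0.001])
t_grid = np.linspace(0, 10, 50)
observed = battery_model(true_theta, t_grid)        # unlabeled observations only

def reward(action):
    simulated = battery_model(action, t_grid)
    return -np.mean((simulated - observed) ** 2)    # no ground-truth parameters used

best, best_r = None, -np.inf
for _ in range(2000):                               # stand-in for policy optimization
    action = np.random.rand(2) * [0.05, 0.005]
    r = reward(action)
    if r > best_r:
        best, best_r = action, r
print(best, true_theta)   # inferred vs. true calibration parameters
```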

Abstract 54: Efficient nonlinear manifold reduced order model in Machine Learning for Engineering Modeling, Simulation and Design, Kim, Choi, Widemann, Zohdi N/A

Traditional linear subspace reduced order models (LS-ROMs) are able to accelerate physical simulations in which the intrinsic solution space falls into a subspace with a small dimension, i.e., the solution space has a small Kolmogorov n-width. However, for physical phenomena not of this type, such as advection-dominated flow phenomena, a low-dimensional linear subspace poorly approximates the solution. To address cases such as these, we have developed an efficient nonlinear manifold ROM (NM-ROM), which can better approximate high-fidelity model solutions with a smaller latent space dimension than the LS-ROMs. Our method takes advantage of the existing numerical methods that are used to solve the corresponding full order models (FOMs). The efficiency is achieved by developing a hyper-reduction technique in the context of the NM-ROM. Numerical results show that neural networks can learn a more efficient latent space representation on advection-dominated data from 2D Burgers' equations with a high Reynolds number. A speed-up of up to 11.7 for 2D Burgers' equations is achieved with an appropriate treatment of the nonlinear terms through a hyper-reduction technique.

Abstract 55: Efficient Nanopore Optimization by CNN-accelerated Deep Reinforcement Learning in Machine Learning for Engineering Modeling, Simulation and Design, Wang, Cao, Barati Farimani N/A

Two-dimensional nanomaterials, such as graphene, have been extensively studied because of their outstanding physical properties. Structure and geometry optimization of nanopores on such materials is beneficial for their performance in real-world engineering applications such as water desalination. However, the optimization process often involves a very large number of experiments or simulations, which are expensive and time-consuming. In this work, we propose a graphene nanopore optimization framework via the combination of deep reinforcement learning (DRL) and a convolutional neural network (CNN) for efficient water desalination. The DRL agent controls the geometry of the nanopore, while the CNN is employed to predict the water flux and ion rejection of the nanoporous graphene membrane at a certain external pressure. With the CNN-accelerated property prediction, our DRL agent can optimize the nanoporous graphene efficiently in an online manner. Experiments show that our framework can design nanopore structures that are promising for energy-efficient water desalination.

Abstract 56: Information-Theoretic Multi-Objective Bayesian Optimization with Continuous Approximations in Machine Learning for Engineering Modeling, Simulation and Design, Belakaria, Deshwal, Doppa N/A

Many real-world applications involve black-box optimization of multiple objectives using continuous function approximations that trade off accuracy and resource cost of evaluation. For example, in rocket launching research, we need to find designs that trade off return-time and angular distance using continuous-fidelity simulators (e.g., varying the tolerance parameter to trade off simulation time and accuracy) for design evaluations. The goal is to approximate the optimal Pareto set while minimizing the cost of evaluations. In this paper, we propose a novel approach referred to as Information-Theoretic Multi-Objective Bayesian Optimization with Continuous Approximations (iMOCA) to solve this problem. The key idea is to select the sequence of input and function approximations for multiple objectives which maximize the information gain per unit cost for the optimal Pareto front. Our experiments on diverse synthetic and real-world benchmarks show that iMOCA significantly improves over existing single-fidelity methods.
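The motivation for Abstract 54's nonlinear manifolds can be checked in a few lines: the singular values of an advecting pulse decay slowly, so no small linear subspace (i.e., small Kolmogorov n-width) captures it, whereas a nonlinear (autoencoder) manifold can. A toy illustration:

```python
import numpy as np

x = np.linspace(0, 1, 256)
# snapshots of a pulse advecting across the domain
snapshots = np.array([np.exp(-200 * (x - 0.1 - 0.8 * t) ** 2)
                      for t in np.linspace(0, 1, 100)])
s = np.linalg.svd(snapshots, compute_uv=False)
print(s[:10] / s[0])   # slow decay: many linear modes needed for this data
```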

Abstract 57: Multi-stage Transmission Line Flow Control Using Centralized and Decentralized Reinforcement Learning Agents in Machine Learning for Engineering Modeling, Simulation and Design, Shang, Yang, Zhu, Ye, Zhang, Xu, Lyu, Diao N/A

Planning future operational scenarios of bulk power systems that meet security and economic constraints typically requires intensive labor efforts in performing massive simulations. To automate this process and relieve engineers' burden, a novel multi-stage approach is presented in this paper to train centralized and decentralized reinforcement learning agents that can automatically adjust grid controllers for regulating transmission line flows at normal condition and under contingencies. The power grid flow control problem is formulated as a Markov Decision Process (MDP). At Stage 1, a centralized soft actor-critic (SAC) agent is trained to control generator active power outputs in a wide area to keep transmission line flows within specified security limits. If line overloading issues remain unresolved, Stage 2 trains decentralized SAC agents via load throw-over at local substations. The effectiveness of the proposed approach is verified on a series of actual planning cases used for operating the power grid of SGCC Zhejiang Electric Power Company.

Abstract 58: Context-Aware Urban Energy Efficiency Optimization Using Hybrid Physical Models in Machine Learning for Engineering Modeling, Simulation and Design, Choi, Nutkiewicz, Jain N/A

Buildings produce more U.S. greenhouse gas emissions through electricity generation than any other economic sector. To improve the energy efficiency of buildings, engineers often rely on physics-based building simulations to predict the impacts of retrofits in individual buildings. In dense urban areas, these models suffer from inaccuracy due to imprecise parameterization or external, unmodeled urban context factors such as inter-building effects and urban microclimates. In a case study of approximately 30 buildings in Sacramento, California, we demonstrate how our hybrid physics-driven deep learning framework can use these external factors advantageously to identify a more optimal energy efficiency retrofit installation strategy and achieve significant savings in both energy and cost.

Abstract 59: Scalable Combinatorial Bayesian Optimization with Tractable Statistical Models in Machine Learning for Engineering Modeling, Simulation and Design, Deshwal, Belakaria, Doppa N/A

We study the problem of optimizing expensive black-box functions over combinatorial spaces (e.g., sets, sequences, trees, and graphs). BOCS is a state-of-the-art Bayesian optimization method for tractable statistical models, which performs semi-definite programming based acquisition function optimization (AFO) to select the next structure for evaluation. Unfortunately, BOCS scales poorly for a large number of binary and/or categorical variables. Based on recent advances in submodular relaxation for solving Binary Quadratic Programs, we study an approach referred to as Parametrized Submodular Relaxation (PSR) towards the goal of improving the scalability and accuracy of solving AFO problems for the BOCS model. Experiments on diverse benchmark problems, including real-world applications in communications engineering and electronic design automation, show significant improvements with PSR for the BOCS model.
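The acquisition problem Abstract 59 accelerates is the optimization of a fitted binary quadratic model. The plain greedy bit-flip search below is shown only to make the problem shape concrete; the paper's contribution is the parametrized submodular relaxation, not this baseline:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
A = rng.standard_normal((n, n))
A = (A + A.T) / 2                       # fitted quadratic model coefficients
b = rng.standard_normal(n)

def acq(x):                             # acquisition value of a binary structure
    return x @ A @ x + b @ x

x = rng.integers(0, 2, n).astype(float)
best, improved = acq(x), True
while improved:                         # greedy local search over bit flips
    improved = False
    for i in range(n):
        x[i] = 1 - x[i]                 # tentative flip
        if acq(x) > best:
            best, improved = acq(x), True
        else:
            x[i] = 1 - x[i]             # revert
print(best, x)
```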

Abstract 60: Scalable Multitask Latent Force Models with Applications to Predicting Lithium-ion Concentration in Machine Learning for Engineering Modeling, Simulation and Design, Tait, Brosa Planella, Widanage, Damoulas N/A

Engineering applications typically require a mathematical reduction of a complex physical model to a more simplistic representation; unfortunately, this simplification typically leads to a missing physics problem. In this work we introduce a state space solution to recovering the hidden physics by sharing information between different operating scenarios, referred to as "tasks". We introduce an approximation that ensures the resulting model scales linearly in the number of tasks, and provide theoretical guarantees that this solution will exist for sufficiently small time-steps. Finally we demonstrate how this framework may be used to improve the prediction of Lithium-ion concentration in electric batteries.

Abstract 61: Machine Learning-based Anomaly Detection with Magnetic Data in Machine Learning for Engineering Modeling, Simulation and Design, Mitra, Akhiyarov, Araya-Polo, Byrd N/A

Pipeline integrity is an important area of concern for the oil and gas, refining, chemical, hydrogen, carbon sequestration, and electric-power industries, due to the safety risks associated with pipeline failures. Regular monitoring, inspection, and maintenance of these facilities is therefore required for safe operation. Large stand-off magnetometry (LSM) is a non-intrusive, passive magnetometer-based measurement technology that has shown promise in detecting defects (anomalies) in regions of elevated mechanical stresses. However, analyzing the noisy multi-sensor LSM data to clearly identify regions of anomalies is a significant challenge. This is mainly due to the high frequency of the data collection, mis-alignment between consecutive inspections and sensors, as well as the number of sensor measurements recorded. In this paper we present an LSM defect identification approach based on machine learning (ML). We show that this ML approach is able to successfully detect anomalous readings using a series of methods with increasing model complexity and capacity. The methods start from unsupervised learning with "point" methods and eventually increase complexity to supervised learning with sequence methods and multi-output predictions. We observe data leakage issues for some methods with randomized train/test splitting and resolve them by specific non-randomized splitting of training and validation data. We also achieve a 200x acceleration of the support-vector classifier (SVC) method by porting computations from CPU to GPU, leveraging the cuML RAPIDS AI library. For the sequence methods, we develop a customized Convolutional Neural Network (CNN) architecture based on 1D convolutional filters to identify and characterize multiple properties of these defects. In the end, we report the scalability of the best-performing methods and compare them for viability in field trials.

Abstract 62: Autonomous Control of a Particle Accelerator using Deep Reinforcement Learning in Machine Learning for Engineering Modeling, Simulation and Design, Pang, Thulasidasan, Rybarcyk N/A

We describe an approach to learning optimal control policies for a large, linear particle accelerator, using a powerful AI-based approach that couples deep reinforcement learning with a high-fidelity physics engine. The framework consists of an AI controller that uses deep neural nets for state and action-space representation and learns optimal policies using reward signals that are provided by the physics simulator. For this work, we only focus on controlling a small section of the entire accelerator. Nevertheless, initial results indicate that we can achieve better-than-human-level performance in terms of particle beam current and distribution. The ultimate goal of this line of work is to substantially reduce the tuning time for such facilities by orders of magnitude, and achieve near-autonomous control.
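A 1D-convolutional sequence classifier of the general shape Abstract 61 describes for magnetometry traces; the channel counts, window length, and two-class head are assumptions:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv1d(8, 16, kernel_size=7, padding=3),   # 8 magnetometer channels
    nn.ReLU(),
    nn.MaxPool1d(4),
    nn.Conv1d(16, 32, kernel_size=7, padding=3),
    nn.ReLU(),
    nn.AdaptiveAvgPool1d(1),                      # length-independent pooling
    nn.Flatten(),
    nn.Linear(32, 2),                             # anomaly vs. normal
)
logits = model(torch.randn(4, 8, 1024))           # a batch of sensor windows
```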

Abstract 63: Exact Preimages of Neural Network Aircraft Collision Avoidance Systems in Machine Learning for Engineering Modeling, Simulation and Design, Matoba, Fleuret N/A

A common pattern of progress in engineering has seen deep neural networks displacing human-designed logic. There are many advantages to this approach, but divorcing decision-making from human oversight and intuition has costs as well. One is that deep neural networks can map similar inputs to very different outputs in a way that makes their application to safety-critical problems problematic.

We present a method to check that the decisions of a deep neural network are as intended by constructing the exact preimage of its predictions. Preimages generalize verification in the sense that they can be used to verify a wide class of properties, and answer much richer questions besides. We examine the functioning of an aircraft collision avoidance system, and show how exact preimages reduce undue conservatism when examining dynamic safety.

Our method iterates backwards through the layers of piecewise linear deep neural networks. Uniquely, we compute all intermediate values that correspond to a prediction, propagating this calculation through layers using analytical formulae for layer preimages.

Abstract 64: Uncertainty-aware Remaining Useful Life predictors in Machine Learning for Engineering Modeling, Simulation and Design, Biggio, Arias Chao, Fink N/A

Remaining Useful Life (RUL) estimation is the problem of inferring how long a certain industrial asset is going to operate until a system failure occurs. Deploying successful RUL methods in real-life applications would result in a drastic change of perspective in the context of maintenance of industrial assets. In particular, the design of intelligent maintenance strategies capable of automatically establishing when interventions have to be performed has the potential of drastically reducing costs and machine downtimes. In light of their superior performance in a wide range of engineering fields, Machine Learning (ML) algorithms are natural candidates to tackle the challenges involved in the design of intelligent maintenance approaches. In particular, given the potentially catastrophic consequences associated with wrong maintenance decisions, it is desirable that ML algorithms provide uncertainty estimates alongside their predictions. In this work, we propose and compare a number of techniques based on Gaussian Processes (GPs) that can cope with this aspect. We apply these algorithms to the new C-MAPSS (Commercial Modular Aero-Propulsion System Simulation) dataset from NASA for aircraft engines. The results show that the proposed methods are able to provide very accurate RUL predictions along with sensible uncertainty estimates, resulting in more safely deployable solutions to real-life industrial applications.

Abstract 65: Decoding the genome of cement by Gaussian Process Regression in Machine Learning for Engineering Modeling, Simulation and Design, Song, Wang, Wang, Bauchy N/A

Reducing the carbon footprint of cement production is a pressing challenge faced by the construction industry. In the past few years, world annual cement consumption has been approximately 4 billion tons, where each ton leads to 1 ton of CO2 emissions. To curb this massive environmental impact, it is pertinent to improve material performance and reduce the carbon embodiment of cement. This requires an in-depth understanding of how cement strength is controlled by its chemical composition. Although this problem has been investigated for more than one hundred years, our current knowledge is still deficient for a clear decomposition of this complex composition-strength relationship. Here, we take advantage of Gaussian process regression (GPR) to decipher the fundamental compositional attributes (the cement "genome") governing cement strength performance. Among all machine learning methods applied to the same dataset, our GPR model achieves the highest accuracy in predicting cement strength based on the chemical compounds. Based on the optimized GPR model, we are able to decompose the influence of each oxide on cement strength to an unprecedented level.
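Gaussian process regression with uncertainty, the tool Abstract 65 applies to composition-strength data, in its minimal scikit-learn form; the oxide columns and data below are synthetic:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

X = np.random.rand(100, 4)     # e.g. fractions of CaO, SiO2, Al2O3, Fe2O3
y = 40 + 30 * X[:, 0] - 10 * X[:, 2] + np.random.randn(100)   # strength (MPa)

gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gpr.fit(X, y)
mean, std = gpr.predict(X[:5], return_std=True)   # predictions with uncertainty
# the influence of each oxide can then be probed by varying one column at a time
```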

infuence of each oxide on cement strength to an Engineering Modeling, Simulation and unprecedented level. Design, Karacakol, Alapan, Sitti N/A

Magnetically programmed soft structures with complex, fast, and reversible deformation Abstract 66: A data centric approach to capabilities are transforming various felds generative modelling of rough surfaces: An including soft robotics, wearable devices, and application to 3D-printed Stainless Steel in active metamaterials. While the encoded Machine Learning for Engineering Modeling, magnetization profle determines the shape- Simulation and Design, Fleming N/A transformation of the magnetic soft structures, The emergence of 3D printing technologies for the current design methods are mainly limited to stainless steel enables steel struc-tures with intuition-based trial and error process. In this work, a data-driven inverse design optimization almost arbitrarily complex geometries to be approach for magnetically programmed soft manufactured. A common design preference for steel structures is that they arethin-walled, to structures is introduced to achieve complex reduce weight and limit the requirement for raw shape-transformations. The proposed method is material. The mechanical properties of thin- optimizing the design of the magnetization profle by utilizing a genetic algorithm relying on ftness walled structures are principally determined by and novelty function running cost-efectively in a their geometry; however, 3D-printed steel components exhibit geometric variation beyond simulation environment. Inverse design that which was intended, due to the welding optimization of magnetization profles for the process involved, at a scale that is non-negligible quasi-static shape-transformation of 2D linear beams into 'M', 'P', and 'I' letter shapes are with respect to the thickness of the wall. The presented. 3D magnetization profle optimization cumulative impact of geometric variation is to alter the macro-scale mechanical properties of a enabled 3D deformation a rotating beam printed component, such as deformation under demonstration. The presented approach is also load. An important challenge is therefore to expanded to design of 3D magnetization profle for 3D shape-transformation of a linear beam predict the (random) macro-scale mechanical rotating along its longitudinal axis. The data- properties of a component, before it is manufactured. To address this, we trained a driven inverse design approach established here generative probabilistic model for rough surfaces paves the way for the automated design of defned on smooth manifolds to an magnetic soft structures with complex 3D shape- transformations. experimentally-obtained dataset consisting of samples of 3D-printed steel. Combined with fnite element simulation of components under load, we were able to produce detailed probabilistic Abstract 68: Building LEGO using Deep predictions of the mechanical properties of a 3D- Generative Models of Graphs in Machine printed steel component. The main technical Learning for Engineering Modeling, challenge was to transfer information from the Simulation and Design, Thompson, Taylor, training dataset to the hypothetical component, DeVries, Ghalebi N/A whose notional geometry may be described by a diferent manifold. Our proposed solution was to Generative models are now used to create a variety of high-quality digital artifacts. Yet their employ spatial random feld models which can be use in designing physical objects has received far characterised locally using a diferential operator, and to leverage the correspondence between the less attention. 
Abstract 68: Building LEGO using Deep Generative Models of Graphs in Machine Learning for Engineering Modeling, Simulation and Design, Thompson, Taylor, DeVries, Ghalebi N/A

Generative models are now used to create a variety of high-quality digital artifacts. Yet their use in designing physical objects has received far less attention. In this paper, we argue for building toy LEGO as a platform for developing generative models of sequential assembly. We develop a generative model based on graph-structured neural networks that can learn from human-built structures and produce visually compelling designs.

Abstract 69: An Industrial Application of Deep Reinforcement Learning for Chemical Production Scheduling in Machine Learning for Engineering Modeling, Simulation and Design, Hubbs, Kelloway, Wassick, Sahinidis, Grossmann N/A

We discuss the implementation of a deep reinforcement learning based agent to automatically make scheduling decisions for a continuous chemical reactor currently in operation. This model is tasked with scheduling the reactor on a daily basis in the face of uncertain demand and production interruptions. The reinforcement learning model has been trained on a simulator of the scheduling process that was built with historical demand and production data. The model has been successfully implemented to develop schedules on-line for an industrial reactor and has exhibited improvements over human-made schedules. We discuss the process of training, implementation, and development of this system and the application of reinforcement learning for complex, stochastic decision making in the chemical industry.

Abstract 70: TPINN: An improved architecture for distributed physics informed neural networks in Machine Learning for Engineering Modeling, Simulation and Design, Manikkan, Srinivasan N/A

Significant progress has been made to obtain approximate solutions to PDEs using neural networks as a basis. One of these approaches (and the most popular and well-developed one) is the Physics Informed Neural Network (PINN). PINN has proved to provide promising results in various forward and inverse problems with great accuracy. However, PINN cannot be employed in its native form for solving problems where the PDE changes its form or when there is a discontinuity in the parameters of the PDE across different sub-domains. Using separate PINNs for each sub-domain and connecting the corresponding solutions by interface conditions is a possible solution for this. However, this approach demands a high computational burden and memory usage. Here, we present a new method, Transfer Physics Informed Neural Network (TPINN), where one or more layers of the PINN are changed across different non-overlapping sub-domains while keeping the other layers the same for all the sub-domains. Solutions from different sub-domains are connected via problem-specific interface conditions which are incorporated into the loss function. We demonstrate the efficacy of TPINN through two heat transfer problems.
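The domain-decomposition idea lends itself to a compact sketch: below, two sub-domain solutions share a trunk network and differ only in their last layer, with interface terms added to the physics loss. The 1D conduction problem with a conductivity jump, the network sizes, and the loss weights are illustrative assumptions rather than the authors' setup.

# Illustrative TPINN-style training loop, assuming a 1D conduction problem
# k * u'' = 0 with a conductivity jump at x = 0.5; not the authors' code.
import torch

trunk = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(),
                            torch.nn.Linear(32, 32), torch.nn.Tanh())
heads = [torch.nn.Linear(32, 1) for _ in range(2)]  # per-sub-domain last layer
k = [1.0, 5.0]                                      # conductivity per domain

def u(x, d):                                        # solution on sub-domain d
    return heads[d](trunk(x))

def pde_residual(x, d):                             # residual of k * u'' = 0
    x = x.requires_grad_(True)
    ux = torch.autograd.grad(u(x, d).sum(), x, create_graph=True)[0]
    uxx = torch.autograd.grad(ux.sum(), x, create_graph=True)[0]
    return k[d] * uxx

params = list(trunk.parameters()) + [p for h in heads for p in h.parameters()]
opt = torch.optim.Adam(params, lr=1e-3)
for step in range(2000):
    xa = torch.rand(64, 1) * 0.5                    # sub-domain A: [0, 0.5]
    xb = 0.5 + torch.rand(64, 1) * 0.5              # sub-domain B: [0.5, 1]
    loss = pde_residual(xa, 0).pow(2).mean() + pde_residual(xb, 1).pow(2).mean()
    xi = torch.full((1, 1), 0.5, requires_grad=True)     # interface point
    ua, ub = u(xi, 0), u(xi, 1)
    ga = torch.autograd.grad(ua.sum(), xi, create_graph=True)[0]
    gb = torch.autograd.grad(ub.sum(), xi, create_graph=True)[0]
    loss = loss + (ua - ub).pow(2).mean()                # continuity of u
    loss = loss + (k[0] * ga - k[1] * gb).pow(2).mean()  # continuity of flux
    loss = loss + u(torch.zeros(1, 1), 0).pow(2).mean()       # u(0) = 0
    loss = loss + (u(torch.ones(1, 1), 1) - 1).pow(2).mean()  # u(1) = 1
    opt.zero_grad(); loss.backward(); opt.step()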

Abstract 71: Differentiable Implicit Layers in Machine Learning for Engineering Modeling, Simulation and Design, Look, Doneva, Kandemir, Gemulla, Peters N/A

In this paper, we introduce an efficient backpropagation scheme for non-constrained implicit functions. These functions are parametrized by a set of learnable weights and may optionally depend on some input, making them perfectly suitable as learnable layers in a neural network. We demonstrate our scheme on different applications: (i) neural ODEs with the implicit Euler method, and (ii) system identification in model predictive control.
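A minimal sketch of the general recipe behind such layers, using the implicit Euler step the abstract mentions: solve a fixed point in the forward pass, then obtain gradients from the implicit function theorem with a single linear solve instead of backpropagating through the solver iterations. The dynamics network, step size, and loss below are assumptions for illustration; the paper's actual scheme may differ in detail.

# Sketch, assuming the fixed-point map is a contraction for this step size.
import torch

torch.manual_seed(0)
f = torch.nn.Sequential(torch.nn.Linear(2, 16), torch.nn.Tanh(),
                        torch.nn.Linear(16, 2))
h = 0.1                                   # implicit Euler step size (assumed)

def solve(x):
    # Forward pass: fixed-point iteration for z = x + h*f(z).
    z = x.clone()
    for _ in range(100):
        z = x + h * f(z)
    return z.detach()

x = torch.randn(2)
z_star = solve(x)

# Implicit function theorem: with F(z, x) = z - x - h*f(z) = 0 at z*,
# dz/dx = -(dF/dz)^{-1} dF/dx = (dF/dz)^{-1} since dF/dx = -I, so for a
# loss L the input gradient is (dF/dz)^{-T} dL/dz: one linear solve, with
# no need to differentiate through the 100 solver iterations above.
J = torch.autograd.functional.jacobian(lambda z: z - x - h * f(z), z_star)
dL_dz = 2 * z_star                        # e.g. for the toy loss L = ||z*||^2
dL_dx = torch.linalg.solve(J.T, dL_dz)
print(dL_dx)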

Abstract 72: Signal Enhancement for Magnetic Navigation Challenge Problem in Machine Learning for Engineering Modeling, Simulation and Design, Gnadt, Belarge, Canciani, Conger, Curro, Edelman, Morales, O'Keefe, Taylor, Rackauckas N/A

Harnessing the magnetic field of the earth for navigation has shown promise as a viable alternative to other navigation systems. A magnetic navigation system collects its own magnetic field data using a magnetometer and uses magnetic anomaly maps to determine the current location. The greatest challenge with magnetic navigation arises when the magnetic field data from the magnetometer on the navigation system encompass the magnetic field from not just the earth, but also from the vehicle on which it is mounted. It is difficult to separate the earth's magnetic anomaly field magnitude, which is crucial for navigation, from the total magnetic field magnitude reading from the sensor. The purpose of this challenge problem is to decouple the earth and aircraft magnetic signals in order to derive a clean signal from which to perform magnetic navigation. Baseline testing on the dataset shows that the earth magnetic field can be extracted from the total magnetic field using machine learning (ML). The challenge is to remove the aircraft magnetic field from the total magnetic field using a trained neural network. These challenges offer an opportunity to construct an effective neural network for removing the aircraft magnetic field from the dataset, using an ML algorithm integrated with physics of magnetic navigation.
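The flavor of the task can be seen in a toy version: below, a synthetic maneuver-driven "aircraft" field is regressed out of a total-field signal with a linear calibration model, in the spirit of classical aeromagnetic compensation. The signals, regressors, and filter are all synthetic stand-ins; the challenge itself asks for a trained neural network combined with the physics.

# Toy compensation sketch on fully synthetic signals (not challenge data).
import numpy as np

rng = np.random.default_rng(1)
T = 5000
t = np.linspace(0, 100, T)
earth = 50_000 + 30 * np.sin(0.05 * t)                 # slow anomaly (nT)
attitude = np.stack([np.sin(0.8 * t), np.cos(1.3 * t), np.sin(2.1 * t)], 1)
aircraft = attitude @ np.array([120.0, -80.0, 45.0])   # maneuver-driven field
total = earth + aircraft + rng.normal(0, 5, T)         # magnetometer reading

# Regress the maneuver-correlated part out of a crudely high-passed signal,
# so the slow geological anomaly does not leak into the fit.
smooth = lambda s: np.convolve(s, np.ones(200) / 200, mode="same")
hp = total - smooth(total)
Xhp = attitude - np.apply_along_axis(smooth, 0, attitude)
coef, *_ = np.linalg.lstsq(Xhp, hp, rcond=None)
earth_est = total - attitude @ coef                    # remove aircraft field
print("RMS error (nT):", np.sqrt(np.mean((earth_est - earth) ** 2)))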

Abstract 73: Robotic gripper design with Evolutionary Strategies and Graph Element Networks in Machine Learning for Engineering Modeling, Simulation and Design, Alet, Bauza, Jeewajee, Thomsen, Rodriguez, Kaelbling, Lozano-Pérez N/A

Robots are increasingly pervasive in manufacturing. However, robotic grippers are often still very simple parallel-jaw grippers with flat fingers, which are very sub-optimal for many objects. Having engineers design a new gripper for every object is a very expensive and inefficient process. We instead propose to automatically design them using machine learning. First, we use Evolutionary Strategies in simulation to get a good initial gripper. We also propose an automatic curriculum design that automatically increases the difficulty of the design task in simulation to ease the design process. Once the gripper is designed in simulation we fine-tune it via back-propagation on a Graph Neural Network model trained on real data for many grippers and objects. By amortizing real-world data across grippers and objects we can be very data-efficient in the real world, leveraging prior experience in a manner analogous to that of meta-learning. We show that our method improves the default gripper by significant margins on multiple datasets of varied objects.
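The first stage (Evolutionary Strategies in simulation) follows the standard ES gradient estimator; here is a self-contained toy version in which the "gripper parameters" and the grasp reward are invented placeholders for the real simulator.

# Minimal OpenAI-style Evolutionary Strategies loop over hypothetical
# gripper finger parameters; grasp_reward is a toy simulator stand-in.
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(8)                          # finger-profile parameters
alpha, sigma, n_pop = 0.05, 0.1, 100

def grasp_reward(params, objects):
    # Stand-in: reward fingers whose profile matches each object's surface.
    return -np.mean([np.linalg.norm(params - o) for o in objects])

objects = [rng.normal(0, 1, 8) for _ in range(5)]    # toy object dataset

for it in range(300):
    eps = rng.normal(0, 1, (n_pop, theta.size))
    R = np.array([grasp_reward(theta + sigma * e, objects) for e in eps])
    R = (R - R.mean()) / (R.std() + 1e-8)            # normalize returns
    theta += alpha / (n_pop * sigma) * eps.T @ R     # ES gradient estimate
print("final reward:", grasp_reward(theta, objects))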

Abstract 74: Electric Vehicle Range Improvement by Utilizing Deep Learning to Optimize Occupant Thermal Comfort in Machine Learning for Engineering Modeling, Simulation and Design, Warey, Kaushik, Khalighi, Cruse, Venkatesan N/A

Heating, ventilation and air-conditioning (HVAC) systems can have a significant impact on the driving range of battery electric vehicles (EVs). Predicting thermal comfort in an automotive vehicle cabin's highly asymmetric and dynamic thermal environment is critical for developing energy-efficient HVAC systems. In this study we have coupled high-fidelity Computational Fluid Dynamics (CFD) simulations and Artificial Neural Networks (ANN) to predict vehicle occupant thermal comfort for any combination of steady-state boundary conditions. A vehicle cabin CFD model, validated against climatic wind tunnel measurements, was used to systematically generate training and test data that spanned the entire range of boundary conditions which impact occupant thermal comfort in an electric vehicle. Artificial neural networks were applied to the simulation data to predict the overall Equivalent Homogeneous Temperature (EHT) comfort index for each occupant. An ensemble of five neural network models was able to achieve a mean absolute error of 2 °C or less in predicting the overall EHT for all occupants in the vehicle on unseen or test data, which is acceptable for rapid evaluation and optimization of thermal comfort energy demand. The deep learning model developed in this work enables predictions of thermal comfort for any combination of steady-state boundary conditions in real time, without being limited by time-consuming and expensive CFD simulations or climatic wind tunnel tests. This model has been deployed as an easy-to-use web application within the organization for HVAC engineers to optimize thermal comfort energy demand and, thereby, the driving range of electric vehicle programs.
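The surrogate-plus-ensemble recipe is straightforward to sketch. The features, the synthetic "EHT" target, and the network sizes below are placeholders; the only element taken from the abstract is the averaging of five differently seeded regressors.

# Sketch: ensemble of five MLP regressors on synthetic boundary conditions.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (2000, 6))      # e.g. vent temps, flow rates, solar load
y = (24 + 5 * np.tanh(X[:, 0] + 0.5 * X[:, 1]) - 2 * X[:, 2]
     + rng.normal(0, 0.3, 2000))       # synthetic comfort-index target (deg C)
Xtr, Xte, ytr, yte = X[:1600], X[1600:], y[:1600], y[1600:]

ensemble = [MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000,
                         random_state=s).fit(Xtr, ytr) for s in range(5)]
pred = np.mean([m.predict(Xte) for m in ensemble], axis=0)  # ensemble average
print("MAE (deg C):", np.abs(pred - yte).mean())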

Abstract 75: Learning Partially Known Stochastic Dynamics with Empirical PAC Bayes in Machine Learning for Engineering Modeling, Simulation and Design, Haußmann, Gerwinn, Look, Rakitsch, Kandemir N/A

We propose a novel scheme for fitting heavily parameterized non-linear stochastic differential equations (SDEs). We assign a prior on the parameters of the SDE drift and diffusion functions to achieve a Bayesian model. We then infer this model using the well-known local reparameterization trick for the first time for empirical Bayes, i.e. to integrate out the SDE parameters. The model is then fit by maximizing the likelihood of the resultant marginal with respect to a potentially large number of hyperparameters, which prohibits stable training. As the prior parameters are marginalized, the model also no longer provides a principled means to incorporate prior knowledge. We overcome both of these drawbacks by deriving a training loss that comprises the marginal likelihood of the predictor and a PAC-Bayesian complexity penalty. We observe on synthetic as well as real-world time series prediction tasks that our method provides an improved model fit accompanied with favorable extrapolation properties when provided a partial description of the environment dynamics. Hence, we view the outcome as a promising attempt for building cutting-edge hybrid learning systems that effectively combine first-principle physics and data-driven approaches.
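Leaving out the PAC-Bayesian penalty that is the paper's actual contribution, the underlying pipeline (a neural drift fitted through an Euler-Maruyama transition likelihood) can be sketched as follows, on synthetic Ornstein-Uhlenbeck data.

# Much-simplified skeleton: neural drift + learned diffusion, no PAC-Bayes.
import torch

torch.manual_seed(0)
drift = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(),
                            torch.nn.Linear(32, 1))
log_diff = torch.nn.Parameter(torch.zeros(1))   # state-independent diffusion
dt = 0.05

# Synthetic observations from an Ornstein-Uhlenbeck process dx = -2x dt + 0.3 dW.
x = [torch.zeros(1)]
for _ in range(400):
    x.append(x[-1] - 2.0 * x[-1] * dt + 0.3 * dt ** 0.5 * torch.randn(1))
x = torch.stack(x)

opt = torch.optim.Adam(list(drift.parameters()) + [log_diff], lr=1e-2)
for step in range(1000):
    mu = x[:-1] + drift(x[:-1]) * dt            # Euler-Maruyama mean
    var = torch.exp(log_diff) ** 2 * dt         # transition variance
    nll = (0.5 * ((x[1:] - mu) ** 2 / var + torch.log(var))).mean()
    opt.zero_grad(); nll.backward(); opt.step()
print("learned diffusion:", torch.exp(log_diff).item())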

Abstract 76: Prediction of high frequency resistance in polymer electrolyte membrane fuel cells using Long Short Term Memory based model in Machine Learning for Engineering Modeling, Simulation and Design, Lin N/A

High-frequency resistance (HFR) is a critical quantity strongly related to a fuel cell system's performance. As such, an accurate and timely prediction of HFR is useful for understanding the system's operating status and the corresponding control strategy optimization. It is beneficial to estimate the fuel cell system's HFR from the measurable operating conditions without resorting to costly HFR measurement devices, the latter of which are difficult to implement at the real automotive scale. In this study, we propose a data-driven approach for a real-time prediction of HFR. Specifically, we use a long short-term memory (LSTM) based machine learning model that takes into account both the current and past states of the fuel cell, as characterized through a set of sensors. These sensor signals form the input to the LSTM. The data is experimentally collected from a vehicle lab that operates a 100 kW automotive fuel cell stack running on an automotive-scale test station. Our current results indicate that our prediction model achieves high accuracy HFR predictions and outperforms other frequently used regression models. We also study the effect of the extracted features generated by our LSTM model. Our study finds that even a simple LSTM based model can accurately predict HFR values.

Abstract 77: Placement in Integrated Circuits using Cyclic Reinforcement Learning and Simulated Annealing in Machine Learning for Engineering Modeling, Simulation and Design, Vashisht, Rampal, Liao, Lu, Shanbhag, Fallon, Kara N/A

Physical design and production of integrated circuits (IC) is becoming increasingly more challenging as the sophistication in IC technology is steadily increasing. Placement has been one of the most critical steps in IC physical design. Through decades of research, partition-based, analytical-based, and annealing-based placers have been enriching the placement solution toolbox. However, open challenges including long run time and lack of the ability to generalize continue to restrict wider applications of existing placement tools. We devise a learning-based placement tool based on cyclic application of reinforcement learning (RL) and simulated annealing (SA) by leveraging the advancement of RL. Results show that the RL module is able to provide a better initialization for SA and thus leads to a better final placement design. Compared to other recent learning-based placers, our method differs mainly in its combination of RL and SA, leveraging the RL model's ability to quickly get a good rough solution after training and the heuristics' ability to realize greedy improvements in the solution.
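The refinement stage can be illustrated in isolation: simulated annealing over cell coordinates with a Metropolis acceptance rule, starting from whatever initialization is supplied (in the paper, the RL module's output). The netlist, move proposal, and cooling schedule below are toy assumptions.

# Toy SA refinement of a placement, minimizing half-perimeter wirelength.
import numpy as np

rng = np.random.default_rng(0)
n_cells = 30
nets = [rng.choice(n_cells, 4, replace=False) for _ in range(40)]
pos = rng.uniform(0, 10, (n_cells, 2))        # the RL initialization would go here

def wirelength(p):
    # Half-perimeter wirelength summed over all nets.
    return sum(np.ptp(p[n, 0]) + np.ptp(p[n, 1]) for n in nets)

T, cost = 2.0, wirelength(pos)
for step in range(20000):
    cand = pos.copy()
    cand[rng.integers(n_cells)] += rng.normal(0, 0.5, 2)    # perturb one cell
    c = wirelength(cand)
    if c < cost or rng.random() < np.exp((cost - c) / T):   # Metropolis rule
        pos, cost = cand, c
    T *= 0.9997                                             # geometric cooling
print("final wirelength:", cost)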

Machine Learning for Creativity and Design 4.0

Luba Elliott, Sander Dieleman, Adam Roberts, Tom White, Daphne Ippolito, Holly Grimm, Mattie Tesfaldet, Samaneh Azadi

Sat Dec 12, 05:15 AM

Generative machine learning and machine creativity have continued to grow and attract a wider audience to machine learning. Generative models enable new types of media creation across images, music, and text - including recent advances such as StyleGAN2, Jukebox and GPT-3. This one-day workshop broadly explores issues in the applications of machine learning to creativity and design. We will look at algorithms for generation and creation of new media, engaging researchers building the next generation of generative models (GANs, RL, etc). We investigate the social and cultural impact of these new models, engaging researchers from HCI/UX communities and those using machine learning to develop new creative tools. In addition to covering the technical advances, we also address the ethical concerns ranging from the use of biased datasets to replicating artistic work. Finally, we'll hear from some of the artists and musicians who are adopting machine learning including deep learning and reinforcement learning as part of their own artistic process. We aim to balance the technical issues and challenges of applying the latest generative models to creativity and design with philosophical and cultural issues that surround this area of research.

Schedule

05:15 AM Introduction and Art Gallery Overview Elliott
05:30 AM Poster Session 1 Davis
06:30 AM Farewell to Fart: Working with generated text in the age of huge neural nets Shane
06:56 AM Artist+AI: Figures&Form Eaton
07:21 AM Artificial biodiversity Crespo
07:40 AM Art Showcase 1 Elliott
08:00 AM Panel Discussion 1 Elliott, Shane, Crespo, Eaton, Roberts, Fan
08:30 AM Social 1
09:15 AM magenta: Empowering Creative Agency with Machine Learning Engel
09:46 AM Art Showcase 2 Grimm
10:20 AM Computation and the Human Visual Perception of Art Hertzmann
10:47 AM LIGHT: Language in Games with Humans and Text Fan
11:09 AM Not the Only One Dinkins
11:35 AM Art Showcase 1 Elliott
11:50 AM Audio-reactive Latent Interpolations with StyleGAN Brouwer
12:00 PM Towards realistic MIDI instrument synthesizers Castellon
12:10 PM Creative Sketch Generation Ge
12:20 PM Agence: an interactive film exploring multi-agent systems and human agency Camarena
12:30 PM Art Showcase 2 Grimm
12:45 PM Poster Session 2
01:45 PM Panel Discussion 2 White, Engel, Hertzmann, Dinkins, Grimm
02:15 PM Social 2
N/A A Speech-Based Music Composition Tool with Transformer d'Eon
N/A Image Generation With Neural Cellular Automatas Chen
N/A Painting from Music using Neural Visual StyleTransfer Verma, Odlen, kivelson, Basica
N/A Weird AI Yankovic: Generating Parody Lyrics Riedl
N/A GANterpretations Castro
N/A White-box Audio VST Effect Programming Mitcheltree
N/A Text to Dialog: Using Semantic Similarity to Extend Narrative Immersion in Virtual Worlds Chen
N/A Latent Space Oddity: Exploring Latent Spaces to Design Guitar Timbres Taylor
N/A Diptychs of human and machine perceptions Cabannes
N/A A Note on Data Biases in Generative Models Esser
N/A Latent Compass Schwettmann
N/A Transformer-GAN: Symbolic music generation using a learned loss Muhamed

N/A Spatial Assembly: Generative Architecture With Reinforcement Learning, Self Play and Tree Search Tigas
N/A Randomized Overdrive Neural Networks Steinmetz
N/A Musical Diary - AI Application for Music Making and Journaling Joshi
N/A Copyspace: Where to Write on Images Lundin
N/A Colorization Transformer Kumar
N/A Generating Novel Glyph without Human Data by Learning to Communicate Park

N/A Horses With Blue Jeans - Creating New Worlds by Rewriting a GAN Bau
N/A Choreo-Graph: Learning Latent Graph Representations of the Dancing Body Pettee
N/A Network Bending Neural Vocoders McCallum
N/A Neural Style Transfer for Casual Creation Colton
N/A TräumerAI: Dreaming Music with StyleGAN Jeong
N/A Mask-Guided Discovery of Semantic Manifolds in Generative Models Yang
N/A Behaviour Aesthetics of Reinforcement Learning in a Robotic Art Installation Audry, Gagne, Scurto
N/A Resolution Dependent GAN Interpolation for Controllable Image Synthesis Between Domains Pinkney
N/A LiveGAN Smith
N/A A Framework and Dataset for Abstract Art Generation via CalligraphyGAN Zhuo

Cooperative AI

Thore Graepel, Dario Amodei, Vincent Conitzer, Allan Dafoe, Gillian Hadfield, Eric Horvitz, Sarit Kraus, Kate Larson, Yoram Bachrach

Sat Dec 12, 05:20 AM

https://www.CooperativeAI.com/

Problems of cooperation—in which agents seek ways to jointly improve their welfare—are ubiquitous and important. They can be found at all scales ranging from our daily routines—such as highway driving, communication via shared language, division of labor, and work collaborations—to our global challenges—such as disarmament, climate change, global commerce, and pandemic preparedness. Arguably, the success of the human species is rooted in our ability to cooperate, in our social intelligence and skills. Since machines powered by artificial intelligence and machine learning are playing an ever greater role in our lives, it will be important to equip them with the skills necessary to cooperate and to foster cooperation.

We see an opportunity for the field of AI, and particularly machine learning, to explicitly focus effort on this class of problems which we term Cooperative AI. The goal of this research would be to study the many aspects of the problem of cooperation, and innovate in AI to contribute to solving these problems. Central questions include how to build machine agents with the capabilities needed for cooperation, and how advances in machine learning can help foster cooperation in populations of agents (of machines and/or humans), such as through improved mechanism design and mediation.

Research could be organized around key capabilities necessary for cooperation, including: understanding other agents, communicating with other agents, constructing cooperative commitments, and devising and negotiating suitable bargains and institutions. Since artificial agents will often act on behalf of particular humans and in ways that are consequential for humans, this research will need to consider how machines can adequately learn human preferences, and how best to integrate human norms and ethics into cooperative arrangements.

We are planning to bring together scholars from diverse backgrounds to discuss how AI research can contribute to the field of cooperation.

Call for Papers

We invite high-quality paper submissions on the following topics (broadly construed, this is not an exhaustive list):

-Multi-agent learning
-Agent cooperation
-Agent communication
-Resolving commitment problems
-Agent societies, organizations and institutions
-Trust and reputation
-Theory of mind and peer modelling
-Markets, mechanism design and economics-based cooperation
-Negotiation and bargaining agents
-Team formation problems

Accepted papers will be presented during joint virtual poster sessions and be made publicly available as non-archival reports, allowing future submissions to archival conferences or journals.

Submissions should be up to eight pages excluding references, acknowledgements, and supplementary material, and should follow NeurIPS format. The review process will be double-blind.

Paper submissions: https://easychair.org/my/conference?conf=coopai2020#

Schedule

05:20 AM Welcome: Yoram Bachrach (DeepMind) and Gillian Hadfield (University of Toronto) Bachrach, Hadfield
05:30 AM Open Problems in Cooperative AI: Thore Graepel (DeepMind) and Allan Dafoe (University of Oxford) Graepel, Dafoe
06:00 AM Invited Speaker: Peter Stone (The University of Texas at Austin) on Ad Hoc Autonomous Agent Teams: Collaboration without Pre-Coordination Stone
06:30 AM Invited Speaker: Gillian Hadfield (University of Toronto) on The Normative Infrastructure of Cooperation Hadfield
07:00 AM Invited Speaker: James Fearon (Stanford University) on Two Kinds of Cooperative AI Challenges: Game Play and Game Design Fearon
07:30 AM Invited Speaker: Sarit Kraus (Bar-Ilan University) on Agent-Human Collaboration and Learning for Improving Human Satisfaction Kraus
08:00 AM Invited Speaker: William Isaac (DeepMind) on Can Cooperation make AI (and Society) Fairer? Isaac
08:30 AM Q&A: Open Problems in Cooperative AI with Thore Graepel (DeepMind), Allan Dafoe (University of Oxford), Yoram Bachrach (DeepMind), and Natasha Jaques (Google) [moderator] Graepel, Bachrach, Dafoe, Jaques
08:45 AM Q&A: Gillian Hadfield (University of Toronto): The Normative Infrastructure of Cooperation, with Natasha Jaques (Google) [moderator] Hadfield, Jaques
09:00 AM Q&A: William Isaac (DeepMind): Can Cooperation Make AI (and Society) Fairer?, with Natasha Jaques (Google) [moderator] Isaac, Jaques
09:15 AM Q&A: Peter Stone (The University of Texas at Austin): Ad Hoc Autonomous Agent Teams: Collaboration without Pre-Coordination, with Natasha Jaques (Google) [moderator] Stone, Jaques
09:30 AM Q&A: Sarit Kraus (Bar-Ilan University): Agent-Human Collaboration and Learning for Improving Human Satisfaction, with Natasha Jaques (Google) [moderator] Kraus, Jaques
09:45 AM Q&A: James Fearon (Stanford University): Cooperation Inside and Over the Rules of the Game, with Natasha Jaques (Google) [moderator] Fearon, Jaques
10:00 AM Poster Sessions (hosted in GatherTown)
11:00 AM Panel: Kate Larson (DeepMind) [moderator], Natasha Jaques (Google), Jeffrey Rosenschein (The Hebrew University of Jerusalem), Michael Wooldridge (University of Oxford) Larson, Jaques, Rosenschein, Wooldridge

11:45 AM Spotlight Talk: Too many cooks: Bayesian inference for coordinating multi-agent collaboration Wang
12:00 PM Spotlight Talk: Learning Social Learning Ndousse
12:15 PM Spotlight Talk: Benefits of Assistance over Reward Learning Shah
12:30 PM Spotlight Talk: Watch-And-Help: A Challenge for Social Perception and Human-AI Collaboration Puig
12:45 PM Closing Remarks: Eric Horvitz (Microsoft) Horvitz

Abstracts (15):

Abstract 3: Invited Speaker: Peter Stone (The University of Texas at Austin) on Ad Hoc Autonomous Agent Teams: Collaboration without Pre-Coordination in Cooperative AI, Stone 06:00 AM

As autonomous agents proliferate in the real world, both in software and robotic settings, they will increasingly need to band together for cooperative activities with previously unfamiliar teammates. In such "ad hoc" team settings, team strategies cannot be developed a priori. Rather, an agent must be prepared to cooperate with many types of teammates: it must collaborate without pre-coordination. This talk will cover past and ongoing research on the challenge of building autonomous agents that are capable of robust ad hoc teamwork.

Abstract 4: Invited Speaker: Gillian Hadfield (University of Toronto) on The Normative Infrastructure of Cooperation in Cooperative AI, Hadfield 06:30 AM

In this talk, I will present the case for the critical role played by third-party enforced rules in the extensive forms of cooperation we see in humans. Cooperation, I'll argue, cannot be adequately accounted for—or modeled for AI—within the framework of human preferences, coordination incentives or bilateral commitments and reciprocity alone. Cooperation is a group phenomenon and requires group infrastructure to maintain. This insight is critical for training AI agents that can cooperate with humans and, likely, other AI agents. Training environments need to be built with normative infrastructure that enables AI agents to learn and participate in cooperative activities—including the cooperative activity that undergirds all others: collective punishment of agents that violate community norms.

Abstract 5: Invited Speaker: James Fearon (Stanford University) on Two Kinds of Cooperative AI Challenges: Game Play and Game Design in Cooperative AI, Fearon 07:00 AM

Humans routinely face two types of cooperation problems: How to get to a collectively good outcome given some set of preferences and structural constraints; and how to design, shape, or shove structural constraints and preferences to induce agents to make choices that bring about better collective outcomes. In the terminology of economic theory, the first is a problem of equilibrium selection given a game structure, and the second is a problem of mechanism design by a "social planner." These two types of problems have been distinguished in and are central to a much longer tradition of political philosophy (e.g., state of nature arguments). It is fairly clear how AI can and might be constructively applied to the first type of problem, while less clear for the second type. How to think about using AI to contribute to optimal design of the terms and parameters – the rules of a game – for other agents? Put differently, could there be an AI of constitutional design?

Abstract 6: Invited Speaker: Sarit Kraus (Bar-Ilan University) on Agent-Human Collaboration and Learning for Improving Human Satisfaction in Cooperative AI, Kraus 07:30 AM

We consider environments where a set of human workers needs to handle a large set of tasks while interacting with human users. The arriving tasks vary: they may differ in their urgency, their difficulty and the required knowledge and time duration in which to perform them. Our goal is to decrease the number of workers, which we refer to as operators, that are handling the tasks, while

increasing the users' satisfaction. We present automated intelligent agents that will work together with the human operators in order to improve the overall performance of such systems and increase both operators' and users' satisfaction. Examples include: home hospitalization environments where remote specialists will instruct and supervise treatments that are carried out at the patients' homes; operators that tele-operate autonomous vehicles when human intervention is needed; and bankers that provide online service to customers. The automated agents could support the operators: the machine learning-based agent follows the operator's work and makes recommendations, helping him interact proficiently with the users. The agents can also learn from the operators and eventually replace the operators in many of their tasks.

Abstract 8: Q&A: Open Problems in Cooperative AI with Thore Graepel (DeepMind), Allan Dafoe (University of Oxford), Yoram Bachrach (DeepMind), and Natasha Jaques (Google) [moderator] in Cooperative AI, Graepel, Bachrach, Dafoe, Jaques 08:30 AM

Participants can send questions via Sli.do using this link: https://app.sli.do/event/ambolxqi

Abstract 9: Q&A: Gillian Hadfield (University of Toronto): The Normative Infrastructure of Cooperation, with Natasha Jaques (Google) [moderator] in Cooperative AI, Hadfield, Jaques 08:45 AM

Participants can send questions via Sli.do using this link: https://app.sli.do/event/02lguhzy

Abstract 10: Q&A: William Isaac (DeepMind): Can Cooperation Make AI (and Society) Fairer?, with Natasha Jaques (Google) [moderator] in Cooperative AI, Isaac, Jaques 09:00 AM

Participants can send questions via Sli.do using this link: https://app.sli.do/event/riko0stp

Abstract 11: Q&A: Peter Stone (The University of Texas at Austin): Ad Hoc Autonomous Agent Teams: Collaboration without Pre-Coordination, with Natasha Jaques (Google) [moderator] in Cooperative AI, Stone, Jaques 09:15 AM

Participants can send questions via Sli.do using this link: https://app.sli.do/event/50mlx6cq

Abstract 12: Q&A: Sarit Kraus (Bar-Ilan University): Agent-Human Collaboration and Learning for Improving Human Satisfaction, with Natasha Jaques (Google) [moderator] in Cooperative AI, Kraus, Jaques 09:30 AM

Participants can send questions via Sli.do using this link: https://app.sli.do/event/9opzmndo

Abstract 13: Q&A: James Fearon (Stanford University): Cooperation Inside and Over the Rules of the Game, with Natasha Jaques (Google) [moderator] in Cooperative AI, Fearon, Jaques 09:45 AM

Participants can send questions via Sli.do using this link: https://app.sli.do/event/uqh9pktn

Abstract 14: Poster Sessions (hosted in GatherTown) in Cooperative AI, 10:00 AM

Gather Town link: https://neurips.gather.town/app/1l0kNMMpqLZvr9Co/CooperativeAI

Abstract 16: Spotlight Talk: Too many cooks: Bayesian inference for coordinating multi-agent collaboration in Cooperative AI, Wang 11:45 AM

Authors: Rose Wang, Sarah Wu, James Evans, Joshua Tenenbaum, David Parkes and Max Kleiman-Weiner

Abstract 17: Spotlight Talk: Learning Social Learning in Cooperative AI, Ndousse 12:00 PM

Authors: Kamal Ndousse, Douglas Eck, Sergey Levine and Natasha Jaques

Abstract 18: Spotlight Talk: Benefits of Assistance over Reward Learning in Cooperative AI, Shah 12:15 PM

Authors: Rohin Shah, Pedro Freire, Neel Alex, Rachel Freedman, Dmitrii Krasheninnikov, Lawrence Chan, Michael Dennis, Pieter Abbeel, Anca Dragan and Stuart Russell

Abstract 19: Spotlight Talk: Watch-And-Help: A Challenge for Social Perception and Human-AI Collaboration in Cooperative AI, Puig 12:30 PM

Authors: Xavier Puig, Tianmin Shu, Shuang Li, Zilin Wang, Josh Tenenbaum, Sanja Fidler and Antonio Torralba

Machine Learning for Molecules

Jose Miguel Hernández-Lobato, Matt Kusner, Brooks Paige, Marwin Segler, Jennifer Wei

Sat Dec 12, 05:30 AM

Discovering new molecules and materials is a central pillar of human well-being, providing new medicines, securing the world's food supply via agrochemicals, or delivering new battery or solar panel materials to mitigate climate change. However, the discovery of new molecules for an application can often take up to a decade, with costs spiraling. Machine learning can help to accelerate the discovery process. The goal of this workshop is to bring together researchers interested in improving applications of machine learning for chemical and physical problems and industry experts with practical experience in pharmaceutical and agricultural development. In a highly interactive format, we will outline the current frontiers and present emerging research directions. We aim to use this workshop as an opportunity to establish a common language between all communities, to actively discuss new research problems, and also to collect datasets by which novel machine learning models can be benchmarked. The program is a collection of invited talks, alongside contributed posters. A panel discussion will provide different perspectives and experiences of influential researchers from both fields and also engage open participant conversation. An expected outcome of this workshop is the interdisciplinary exchange of ideas and initiation of collaboration.

Schedule

05:30 AM Discord for Q&A
05:30 AM Opening Remarks
05:41 AM Invited Talk: Nadine Schneider - Real-world application of ML in drug discovery Schneider
06:01 AM Invited Talk: Nadine Schneider - Live Q&A
06:11 AM Invited Talk: Frank Noe - The sampling problem in statistical mechanics and Boltzmann-Generating Flows Noe
06:31 AM Invited Talk: Frank Noe - Live Q&A
06:40 AM Contributed Talk: Evidential Deep Learning for Guided Molecular Property Prediction and Discovery - Ava Soleimany, Alexander Amini, Samuel Goldman, Daniela Rus, Sangeeta Bhatia and Connor Coley Soleimany
06:50 AM Contributed Talk: Gaussian Process Molecular Property Prediction with FlowMO - Henry Moss and Ryan-Rhys Griffiths Moss
07:00 AM Contributed Talk: Explaining Deep Graph Networks with Molecular Counterfactuals - Davide Bacciu and Danilo Numeroso Numeroso
07:11 AM Invited Talk: Klaus Robert-Müller & Kristof Schütt: Machine Learning meets Quantum Chemistry Müller, Schütt
07:31 AM Invited Talk: Klaus Robert-Müller and Kristof Schütt - Live Q&A
07:41 AM Invited Talk: Rocio Mercado - Applying Graph Neural Networks to Molecular Design Mercado

08:01 AM Invited Talk: Rocio Mercado - Live Q&A
08:10 AM Spotlight Talk: Comparison of Atom Representations in Graph Neural Networks for Molecular Property Prediction - Agnieszka Pocha, Tomasz Danel and Lukasz Maziarka Danel
08:15 AM Spotlight Talk: Completion of partial reaction equations - Alain C. Vaucher, Philippe Schwaller and Teodoro Laino Vaucher
08:20 AM Spotlight Talk: Molecular representation learning with language models and domain-relevant auxiliary tasks - Benedek Fabian, Thomas Edlich, Héléna Gaspar, Marwin Segler, Joshua Meyers, Marco Fiscato and Mohamed Ahmed Fabian
08:25 AM Spotlight Talk: Accelerate the screening of complex materials by learning to reduce random and systematic errors - Tian Xie, Yang Shao-Horn and Jeffrey Grossman. Xie
08:30 AM Poster Session Break
09:30 AM Panel Aspuru-Guzik, Listgarten, Müller, Schneider
10:00 AM Contributed Talk: Bayesian GNNs for Molecular Property Prediction - George Lamb and Brooks Paige Lamb
10:10 AM Contributed Talk: Design of Experiments for Verifying Biomolecular Networks - Ruby Sedgwick, John Goertz, Ruth Misener, Molly Stevens and Mark van der Wilk. Sedgwick
10:20 AM Contributed Talk: Multi-task learning for electronic structure to predict and explore molecular potential energy surfaces - Z. Qiao, F. Ding, M. Welborn, P.J. Bygrave, D.G.A. Smith, A. Anandkumar, F. R. Manby and TF. Miller III Qiao
10:31 AM Invited Talk: Patrick Walters - Challenges and Opportunities for Machine Learning in Drug Discovery Walters
10:51 AM Invited Talk: Patrick Walters - Live Q&A
11:01 AM Invited Talk: Yannick Djoumbou Feunang - In Silico Prediction and Identification of Metabolites with BioTransformer Djoumbou Feunang
11:21 AM Invited Talk: Yannick Djoumbou Feunang - Live Q&A
11:30 AM Spotlight Talk: Data augmentation strategies to improve reaction yield predictions and estimate uncertainty - Philippe Schwaller, Alain Vaucher, Teodoro Laino and Jean-Louis Reymond Schwaller
11:35 AM Spotlight Talk: Message Passing Networks for Molecules with Tetrahedral Chirality - Lagnajit Pattanaik, Octavian Ganea, Ian Coley, Klavs Jensen, William Green and Connor Coley. Pattanaik
11:40 AM Spotlight Talk: Protein model quality assessment using rotation-equivariant, hierarchical neural networks - Stephan Eismann, Patricia Suriana, Bowen Jing, Raphael Townshend and Ron Dror. Eismann
11:45 AM Spotlight Talk: Crystal Structure Search with Random Relaxations Using Graph Networks - Gowoon Cheon, Lusann Yang, Kevin McCloskey, Evan Reed and Ekin Cubuk Cheon
11:51 AM Invited Talk: Benjamin Sanchez-Lengeling - Evaluating Attribution of Molecules with Graph Neural Networks Sanchez-Lengeling
12:11 PM Invited Talk: Benjamin Sanchez-Lengeling - Live Q&A
12:21 PM Invited Talk: Jennifer Listgarten Listgarten
12:41 PM Invited Talk: Jennifer Listgarten - Live Q&A
12:50 PM Closing Remarks
01:00 PM Poster Session Part 2

Abstracts (9):

Abstract 1: Discord for Q&A in Machine Learning for Molecules, 05:30 AM

Please use this Discord for questions/discussion for all sessions.

First time Discord user? Check out this video (https://www.youtube.com/watch?v=LDVqruRsYtA).

Abstract 3: Invited Talk: Nadine Schneider - Real-world application of ML in drug discovery in Machine Learning for Molecules, Schneider 05:41 AM

Abstract:
The digital revolution finally reached the pharmaceutical industry and machine-learning models are becoming more and more relevant for drug discovery. Often people outside the domain underestimate the complexity related to drug discovery. The hope that novel algorithms and models can remedy the challenge of finding new drugs more efficiently is often shaken when data science experts dive more deeply into the domain. Nevertheless, there are many areas in drug discovery where machine learning and data science can make a difference. A very important point is trying to better understand the data before just applying new models. Especially in early drug discovery, many data sets are very small and tricky given the data distribution, data bias, data shift or incompleteness. Another critical point is the users: to make machine learning models effective and actionable, they need to be accessible and integrated into the daily work of the scientists, who are most of the time not data scientists themselves. With this, an important aspect is also education of the users to deepen their knowledge and create the right expectations on machine learning models. In this presentation, several of these aspects will be discussed in more detail using examples and learnings we made over the past years.

Biography:
Dr. Nadine Schneider obtained a BSc and MSc in Bioinformatics from the Saarland University in Germany. She did her PhD in Molecular Modeling in the group of Prof. Dr. Matthias Rarey at the University of Hamburg, Germany. In her PhD she worked on a novel protein-ligand scoring function which was integrated in the commercial modeling software SeeSAR (BioSolveIT GmbH). In 2014 she joined the Novartis Institutes for BioMedical Research (NIBR) in Basel (Switzerland) for a postdoc focusing on Cheminformatics and Data Science under supervision of Dr. Gregory Landrum and Dr. Nikolaus Stiefl. Since 2017 she is a researcher in the Computer-Aided Drug Design team in Global Discovery Chemistry in NIBR, Basel.

Abstract 5: Invited Talk: Frank Noe - The sampling problem in statistical mechanics and Boltzmann-Generating Flows in Machine Learning for Molecules, Noe 06:11 AM

Abstract:
The rare-event sampling problem is one of the fundamental problems in statistical mechanics and particularly in molecular dynamics or Monte-Carlo simulations of molecules. Here I will introduce Boltzmann-generating flows that combine invertible neural networks and statistical-mechanics based reweighting or resampling methods in order to train a machine-learning method to generate samples from the desired equilibrium distribution of the molecule or other many-body system. In particular, two recent developments will be described: equivariant flows that take symmetries in the molecular energy function into account, and Stochastic Normalizing Flows which combine deterministic invertible neural networks with stochastic sampling steps and are trained using path likelihood maximization techniques that have emerged in nonequilibrium statistical mechanics.

Biography:
Frank Noé has undergraduate degrees in electrical engineering and computer science and graduated in computer science and computational physics at University of Heidelberg. Frank is currently full professor for Mathematics, Computer Science and Physics at Freie Universität Berlin, Germany. Since 2015 he also holds an adjunct professorship in Chemistry at Rice University Houston, Texas. Frank's research focuses on developing new Machine Learning methods for the physical sciences, especially molecular sciences. Frank received two awards of the European Research Council, an ERC starting grant in 2012 and an ERC consolidator grant in 2017. He received the early career award in theoretical Chemistry of the American Chemical Society in 2019 and he is ISI highly cited researcher since 2019.
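A heavily simplified sketch of the Boltzmann-generator idea described in the talk: train a small invertible flow so that pushed-forward samples match exp(-U(x)), by minimizing the reverse KL divergence. The 2D double-well energy and the tiny coupling flow are invented for illustration; the equivariant and stochastic variants the talk covers are omitted.

# Minimal reverse-KL training of an affine coupling flow toward exp(-U).
import math
import torch

torch.manual_seed(0)

def U(x):                                   # invented 2D double-well energy
    return (x[:, 0] ** 2 - 1.0) ** 2 + 0.5 * x[:, 1] ** 2

class Coupling(torch.nn.Module):
    def __init__(self, flip):
        super().__init__()
        self.net = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(),
                                       torch.nn.Linear(32, 2))
        self.flip = flip

    def forward(self, x):                   # returns y and log|det dy/dx|
        x0, x1 = x[:, :1], x[:, 1:]
        if self.flip:
            x0, x1 = x1, x0
        s, t = self.net(x0).chunk(2, dim=1)
        s = torch.tanh(s)                   # bounded scale for stability
        y1 = x1 * torch.exp(s) + t
        y = torch.cat([y1, x0] if self.flip else [x0, y1], dim=1)
        return y, s.squeeze(1)

flows = [Coupling(i % 2 == 1) for i in range(4)]
opt = torch.optim.Adam([p for f in flows for p in f.parameters()], lr=1e-3)
for step in range(2000):
    z = torch.randn(256, 2)
    logq = -0.5 * (z ** 2).sum(1) - math.log(2 * math.pi)  # base log-density
    x = z
    for f in flows:
        x, logdet = f(x)
        logq = logq - logdet                # change of variables
    loss = (logq + U(x)).mean()             # reverse KL up to a constant
    opt.zero_grad(); loss.backward(); opt.step()

Samples from the trained flow can then be reweighted by importance weights proportional to exp(-U(x) - log q(x)), which is the statistical-mechanics reweighting step the abstract refers to.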

Abstract 10: Invited Talk: Klaus Robert-Müller & Kristof Schütt: Machine Learning meets Quantum Chemistry in Machine Learning for Molecules, Müller, Schütt 07:11 AM

Abstract:
Machine learning is emerging as a powerful tool in quantum chemistry and materials science, combining the accuracy of electronic structure methods with computational efficiency. Going beyond the simple prediction of chemical properties, machine learning potentials can be applied to perform fast molecular dynamics simulations, model solvent effects and response properties as well as find structures with desired properties by inverse design. In this talk, we will show how this opens a clear path towards unifying machine learning and quantum chemistry.

Biographies:
Klaus-Robert Müller has been a professor of computer science at Technische Universität Berlin since 2006; at the same time he is co-directing the Berlin Big Data Center. He studied physics in Karlsruhe from 1984 to 1989 and obtained his Ph.D. degree in computer science at Technische Universität Karlsruhe in 1992. After completing a postdoctoral position at GMD FIRST in Berlin, he was a research fellow at the University of Tokyo from 1994 to 1995. In 1995, he founded the Intelligent Data Analysis group at GMD-FIRST (later Fraunhofer FIRST) and directed it until 2008. From 1999 to 2006, he was a professor at the University of Potsdam. He was awarded the Olympus Prize for Pattern Recognition (1999), the SEL Alcatel Communication Award (2006), the Science Prize of Berlin by the Governing Mayor of Berlin (2014), and the Vodafone Innovations Award (2017). In 2012, he was elected member of the German National Academy of Sciences-Leopoldina, in 2017 of the Berlin Brandenburg Academy of Sciences and also in 2017 external scientific member of the Max Planck Society. In 2019 and 2020 he became ISI Highly Cited Researcher. His research interests are intelligent data analysis and machine learning with applications in neuroscience (specifically brain-computer interfaces), physics and chemistry.

Kristof T. Schütt is a senior researcher at the Berlin Institute for the Foundations of Learning and Data (BIFOLD). He received his master's degree in computer science in 2012 and his PhD in machine learning in 2018 at the machine learning group of Technische Universität Berlin. Until September 2020, he worked at the Audatic company developing neural networks for real-time speech enhancement. His research interests include interpretable neural networks, representation learning, generative models, and machine learning applications in quantum chemistry.

Abstract 12: Invited Talk: Rocio Mercado - Applying Graph Neural Networks to Molecular Design in Machine Learning for Molecules, Mercado 07:41 AM

Abstract:
Deep learning methods applied to chemistry can be used to accelerate the discovery of new molecules, such as promising pharmaceuticals. Notably, methods such as graph neural networks (GNNs) are interesting tools to explore for molecular design because graphs are natural data structures for describing molecules. The process of designing novel, drug-like compounds can be viewed as one of generating graphs which optimize all the features of the desirable molecules.

In this talk, I will provide an overview of how deep learning methods can be applied to complex drug design tasks, focusing on our recently published tool, GraphINVENT. GraphINVENT uses GNNs and a tiered deep neural network architecture to probabilistically generate new molecules a single bond at a time, and learns to build new molecules resembling a training set without any explicit programming of chemical rules. GraphINVENT is one of many recent platforms which aim to streamline the drug discovery process using AI.
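Only as a schematic of the action-by-action generation loop (the real system learns its action distribution from training molecules with a GNN), one might structure sampling as below; every atom type, score, and limit here is a placeholder, and the stand-in "policy" is untrained.

# Schematic of tiered, bond-by-bond graph building; not GraphINVENT itself.
import numpy as np

rng = np.random.default_rng(0)
ATOMS = ["C", "N", "O"]

def policy(nodes, edges):
    # Stand-in scorer returning unnormalized scores for five actions:
    # add atom C/N/O, connect the last atom to an earlier one, or terminate.
    # A trained GNN would compute these from the current graph state.
    return rng.random(5) + np.array([0, 0, 0, 0.5, 0.2 * len(nodes)])

nodes, edges = ["C"], []
while True:
    scores = policy(nodes, edges)
    p = np.exp(scores) / np.exp(scores).sum()       # softmax over actions
    a = rng.choice(5, p=p)
    if a < 3 and len(nodes) < 12:
        nodes.append(ATOMS[a])                      # tier 1: add an atom
        edges.append((len(nodes) - 2, len(nodes) - 1))
    elif a == 3 and len(nodes) > 2:
        edges.append((int(rng.integers(len(nodes) - 1)), len(nodes) - 1))
    else:
        break                                       # terminate action
print(nodes, edges)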

one of many recent platforms which aim to began with expert systems in the late 1980s, streamline the drug discovery process using AI. moved to machine learning in the 1990s, and has continued through 25 years in the Biography: pharmaceutical industry. Before joining Relay, Pat spent more than 20 years at Vertex I joined the Molecular AI group at AstraZeneca in Pharmaceuticals, where he was Global Head of October 2018. My work focuses on using deep Modeling & Informatics. He is a member of the learning methods for graph-based molecular editorial advisory board for the Journal of design. Before AstraZeneca, I was a PhD student Medicinal Chemistry and has been a guest editor in Professor Berend Smit’s molecular simulation for multiple scientifc journals. Pat received his group at UC Berkeley and EPFL. I received my Ph.D. in Organic Chemistry from the University of PhD in Chemistry from UC Berkeley in July 2018, Arizona, where he studied the application of and my BS in Chemistry from Caltech in June artifcial intelligence in conformational analysis. 2013. Before obtaining his Ph.D., he worked at Varian Instruments as both a chemist and a software developer. Pat received his B.S. in Chemistry from the University of California, Santa Barbara. Abstract 23: Invited Talk: Patrick Walters - Challenges and Opportunities for Machine Learning in Drug Discovery in Machine Learning for Molecules, Walters 10:31 AM Abstract 25: Invited Talk: Yannick Djoumbou Feunang - In Silico Prediction and Abstract: Identifcation of Metabolites with BioTransformer in Machine Learning for Over the last few years, we have seen a dramatic Molecules, Djoumbou Feunang 11:01 AM uptick in the application of Machine Learning in drug discovery. Developments in deep learning Abstract: have led to a renaissance in Quantitative Structure-Activity Relationships (QSAR) and de- Increased reliance on chemicals in both novo molecule generation. While the feld industrialized and developing countries has led to continues to advance, it faces several challenges. a dramatic change of our exposure patterns to As with any application of machine learning, the both natural and synthetic chemicals. This results will depend on the data, the diverse plethora of xenobiotics, some of which representation, and the algorithms used to have become nearly ubiquitous, includes among generate the machine learning models. In many others, pesticides, pharmaceuticals, food cases, drug discovery data presents some unique compounds, and their largely unknown chemo-/ challenges not found in data from other biotransformation products. To accurately assess disciplines. Furthermore, the optimal means of the various environmental health threats they representing molecules in machine learning is may pose, it is crucial to understand how they still an open question. This presentation will are biologically produced, activated, detoxifed, highlight current challenges and hopefully and eliminated from various biological matrices. motivate new work to move the feld forward. As it turns out, understanding the biological and environmental fate of xenobiotics is a major step towards deciphering the aforementioned Biography: mechanisms. Moreover, it contributes signifcantly to the development of safer and Pat Walters heads the Computation & Informatics more sustainable chemicals. Over the past group at Relay Therapeutics in Cambridge, MA. 
decade several in silico tools have been His group focuses on novel computational developed for the prediction and identifcation of methods that integrate computer simulations and metabolites, most of which are only commercially experimental data to provide insights that drive available and signifcantly biased towards drug- drug discovery programs. Pat is co-author of the like molecules. In this presentation, we will book “Deep Learning for the Life Sciences,” describe BioTransformer, an open source software published by O’Reilly and Associates. His AI work and freely accessible server for the prediction of 191 Dec. 12, 2020

Over the past decade several in silico tools have been developed for the prediction and identification of metabolites, most of which are only commercially available and significantly biased towards drug-like molecules. In this presentation, we will describe BioTransformer, an open source software and freely accessible server for the prediction of human CYP450-catalyzed metabolism, human gut microbial degradation, human phase-II metabolism, human promiscuous metabolism, and environmental microbial degradation. Moreover, we will present an assessment of its performance in predicting the metabolism of agrochemicals, conducted at Corteva Agriscience. Furthermore, we will illustrate a few examples of its application as demonstrated by various published scientific studies. Finally, we will share future perspectives for this open source project, and describe how it could significantly benefit the exposure science and regulatory communities.

Biography:
Dr. Yannick Djoumbou Feunang earned his PhD in Microbiology and Biotechnology at the University of Alberta - Canada, in 2017, where his research focused on developing Cheminformatics tools to enhance Metabolomics. Some of his main contributions include the software tools ClassyFire, BioTransformer, and CFM-ID 3.0, with applications of ontology and linked data, as well as machine learning and knowledge-based artificial intelligence, to biology and chemistry. Additionally, he has contributed to the development of databases such as DrugBank and HMDB. Since 2018, Dr. Djoumbou Feunang has worked as a Research Investigator for the Chemistry Data Science research group at Corteva Agriscience in Indianapolis, Indiana. His responsibilities include among others: (1) the development of machine learning models to support lead generation and optimization projects, and (2) the enhancement of Corteva's Cheminformatics scientific computing platform. He also currently leads a project aiming at building a cutting-edge, adapted in silico metabolism platform at Corteva Agriscience.

Abstract 31: Invited Talk: Benjamin Sanchez-Lengeling - Evaluating Attribution of Molecules with Graph Neural Networks in Machine Learning for Molecules, Sanchez-Lengeling 11:51 AM

Abstract:
The interpretability of machine learning models for molecules is critical to scientific discovery, understanding, and debugging. Attribution is one approach to interpretability, which highlights parts of the input that are influential to a neural network's prediction. With molecules, we can set up synthetic tasks such as the identification of subfragment logics to generate ground truth attributions and labels. This scenario serves as a testbed to quantitatively study attributions of molecular graphs with Graph Neural Networks (GNNs). We perform multiple experiments looking at the effect of GNN architectures, label noise, and spurious correlations in attributions. In the end, we make concrete recommendations for which attribution methods and models to use while also providing a framework for evaluating new attribution techniques.

Biography:
I am a research scientist at Google Research. My research centers around using machine learning techniques to build data-driven models for the prediction of molecular properties and the generation of new molecules and materials via generative models. Applications include solar cells, solubility, drug-design, and particularly smelly molecules. I am part of a team that wants to do for olfaction, what machine learning has done for vision and speech.

I am also passionate about science education and divulgation, I am one of the founders and organizers for Clubes de Ciencia Mexico and a LatinX-centered AI conference RIIAA. In my free time, I like to run, eat ice cream and cook food.
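As a minimal concrete example of graph attribution of the kind the talk evaluates, gradient-times-input on the node features of a toy message-passing model looks like this; the model, the "molecule", and the features are synthetic, and the talk's benchmark compares several such methods rather than endorsing this one.

# Gradient x input attribution per atom on a toy graph model.
import torch

torch.manual_seed(0)
node_mlp = torch.nn.Sequential(torch.nn.Linear(4, 32), torch.nn.ReLU(),
                               torch.nn.Linear(32, 1))

def predict(node_feats, adj):
    h = node_mlp(adj @ node_feats)     # one message-passing-like step
    return h.sum()                     # graph-level readout

feats = torch.randn(6, 4, requires_grad=True)   # 6 atoms, 4 features each
adj = torch.eye(6) + torch.diag(torch.ones(5), 1) + torch.diag(torch.ones(5), -1)

y = predict(feats, adj)
y.backward()
attribution = (feats.grad * feats).sum(dim=1)   # per-atom attribution scores
print(attribution)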

Abstract 33: Invited Talk: Jennifer Listgarten in Machine Learning for Molecules, Listgarten 12:21 PM

Abstract:
Data-driven design is making headway into a number of application areas, including protein, small-molecule, and materials engineering. The design goal is to construct an object with desired properties, such as a protein that binds to a target more tightly than previously observed. To that end, costly experimental measurements are being replaced with calls to a high-capacity regression model trained on labeled data, which can be leveraged in an in silico search for promising design candidates. The aim then is to discover designs that are better than the best design in the observed data. This goal puts machine-learning based design in a much more difficult spot than traditional applications of predictive modelling, since successful design requires, by definition, some degree of extrapolation—a pushing of the predictive models to their unknown limits, in parts of the design space that are a priori unknown. In this talk I'll discuss our emerging approaches to tackle this problem.

Biography:
Since Jan. 2018, Jennifer Listgarten has been a Professor in the Department of Electrical Engineering and Computer Science, and Center for Computational Biology, at the University of California, Berkeley. She is also a member of the steering committee for the Berkeley AI Research (BAIR) Lab, and a Chan Zuckerberg investigator. From 2007 to 2017 she was at Microsoft Research, through Cambridge, MA (2014-2017), Los Angeles (2008-2014), and Redmond, WA (2007-2008). She completed her Ph.D. in the machine learning group in the Department of Computer Science at the University of Toronto, located in her hometown. She has two undergraduate degrees, one in Physics and one in Computer Science, from Queen's University in Kingston, Ontario. Jennifer's research interests are broadly at the intersection of machine learning, applied statistics, molecular biology and science.
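The design loop described in the abstract above can be caricatured in a few lines: fit a regressor on labeled designs, then search in silico over candidates, with a crude distance penalty standing in for the extrapolation problem the talk is about. All data, the penalty weight, and the model choice are invented for illustration.

# Toy model-based design loop on synthetic data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

def true_fitness(x):                    # hidden ground truth, demo only
    return -np.sum((x - 0.7) ** 2, axis=-1)

X = rng.uniform(0, 1, (200, 5))         # observed designs with labels
y = true_fitness(X) + rng.normal(0, 0.05, 200)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

cands = rng.uniform(0, 1, (2000, 5))    # in silico search by random sampling
dist = np.min(np.linalg.norm(cands[:, None] - X[None], axis=-1), axis=1)
score = model.predict(cands) - 1.0 * dist   # penalize leaving the data
best = cands[np.argmax(score)]
print("proposed design:", best, "true fitness:", true_fitness(best))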

Navigating the Broader Impacts of AI Research

Carolyn Ashurst, Rosie Campbell, Deborah Raji, Solon ethical obligations of researchers. Barocas, Stuart Russell

Sat Dec 12, 05:30 AM

Following growing concerns with both harmful research impact and research conduct in computer science, including concerns with research published at NeurIPS, this year's conference introduced two new mechanisms for ethical oversight: a requirement that authors include a "broader impact statement" in their paper submissions and additional evaluation criteria asking paper reviewers to identify any potential ethical issues with the submissions.

These efforts reflect a recognition that existing research norms have failed to address the impacts of AI research, and take place against the backdrop of a larger reckoning with the role of AI in perpetuating injustice. The changes have been met with both praise and criticism: some within and outside the community see them as a crucial first step towards integrating ethical reflection and review into the research process, fostering necessary changes to protect populations at risk of harm. Others worry that AI researchers are not well placed to recognize and reason about the potential impacts of their work, as effective ethical deliberation may require different expertise and the involvement of other stakeholders.

This debate reveals that even as the AI research community is beginning to grapple with the legitimacy of certain research questions and critically reflect on its research practices, there remain many open questions about how to ensure effective ethical oversight. This workshop therefore aims to examine how concerns with harmful impacts should affect the way the research community develops its research agendas, conducts its research, evaluates its research contributions, and handles the publication and dissemination of its findings. This event complements other NeurIPS workshops this year devoted to normative issues in AI and builds on others from years past, but adopts a distinct focus on the ethics of research practice and the ethical obligations of researchers.

Schedule

05:30 AM Welcome
05:45 AM Morning keynote Wallach, Campbell
06:15 AM Ethical oversight in the peer review process Brown, Douglas, Gabriel, Hecht, Campbell
07:15 AM Morning break

07:30 AM Harms from AI research Hofmann, Moorosi, Prabhu, Raji, Metcalf, Stanley
08:30 AM How should researchers engage with controversial applications of AI? Koepke, ONEIL, Petty, Rudin, Raji, Bushway
09:30 AM Lunch and watch lightning talks (in parallel) from workshop submissions
10:30 AM Discussions with authors of submitted papers
11:30 AM Responsible publication: NLP case study Brundage, McCann, Raffel, Schulter, Waseem, Campbell
12:30 PM Afternoon break
12:45 PM Strategies for anticipating and mitigating risks Casovan, Gebru, Mohamed, Barocas, Ovadya
01:45 PM The roles of different parts of the research ecosystem in navigating broader impacts Greenberg, Venema, Zevenbergen, Irani, Barocas
02:45 PM Closing remarks
N/A AI in the "Real World": Examining the Impact of AI Deployment in Low-Resource Contexts Okolo
N/A Nose to Glass: Looking In to Get Beyond Seah
N/A Ideal theory in AI ethics Estrada
N/A Training Ethically Responsible AI Researchers: a Case Study Yuan, Vanea, Lucivero, Hallowell
N/A Overcoming Failures of Imagination in AI Infused System Development and Deployment Boyarskaya, Olteanu, Crawford
N/A Like a Researcher Stating Broader Impact For the Very First Time Abuhamad, Rheault
N/A An Open Review of OpenReview: A Critical Analysis of the Machine Learning Conference Review Process Tran, Valtchanov, Ganapathy, Feng, Slud, Goldblum, Goldstein
N/A Non-Portability of Algorithmic Fairness in India Sambasivan, Arnesen, Hutchinson, Prabhakaran
N/A An Ethical Highlighter for People-Centric Dataset Creation Hanley, Khandelwal, Averbuch-Elor, Snavely, Nissenbaum
N/A Ethical Testing in the Real World: Recommendations for Physical Testing of Adversarial Machine Learning Attacks Siva Kumar, Delano, Albert, Rigot, Penney
N/A Auditing Government AI: Assessing ethical vulnerability of machine learning Kennedy
N/A The Managerial Effects of Algorithmic Fairness Activism Cowgill, Dell'Acqua, Matz
N/A Biased Programmers? Or Biased Data? A Field Experiment in Operationalizing AI Ethics Cowgill, Dell'Acqua, Chaintreau, Verma, Deng, Hsu
N/A Anticipatory Ethics and the Role of Uncertainty Nanayakkara, Diakopoulos, Hullman

Abstracts (4):

Abstract 3: Ethical oversight in the peer review process in Navigating the Broader Impacts of AI Research, Brown, Douglas, Gabriel, Hecht, Campbell 06:15 AM

Discussion on potential reflection and oversight interventions such as 'broader impact statements' and their effectiveness

Abstract 5: Harms from AI research in Navigating the Broader Impacts of AI Research, Hofmann, Moorosi, Prabhu, Raji, Metcalf, Stanley 07:30 AM

Case studies and mitigation approaches

Abstract 7: Lunch and watch lightning talks (in parallel) from workshop submissions in Navigating the Broader Impacts of AI Research, 09:30 AM

some time to watch the videos of the submitted Schedule papers over lunch.

06:00 Live Intro Malinowski, Patraucean, AM Swirszcz, Löwe, Choromanska, Gori, Abstract 8: Discussions with authors of Huang submitted papers in Navigating the Broader 06:15 Introduction: Bastiaan Veeling Impacts of AI Research, 10:30 AM AM Malinowski Please join our Gather Town to meet the paper 06:17 Invited Talk Bastiaan Veeling Veeling authors! AM - Walk up to the title of the paper you're 06:45 Introduction: Olivier Teytaud Löwe interested in, then press 'x' to view the paper AM - There is a lounge area to the North of the paper 06:47 Invited Talk Olivier Teytaud Teytaud discussion room if you would like to have more AM informal conversations 07:15 Poster Session: Morning AM 08:30 Introduction: Karl Friston Malinowski AM Beyond BackPropagation: Novel Ideas for 08:32 Invited Talk Karl Friston Friston Training Neural Architectures AM 09:00 Panel discussion 1 Veeling, Teytaud, Mateusz Malinowski, Grzegorz Swirszcz, Viorica Patraucean, Marco Gori, Yanping Huang, Sindy Löwe, AM Friston, Löwe, Malinowski Anna Choromanska 09:45 Long Break AM Sat Dec 12, 06:00 AM 11:00 Introduction: Yoshua Bengio AM Choromanska Is backpropagation the ultimate tool on the path 11:02 Invited Talk Yoshua Bengio Bengio to achieving synthetic intelligence as its success AM and widespread adoption would suggest? 11:32 Introduction: Danielle Bassett AM Choromanska Many have questioned the biological plausibility of backpropagation as a learning mechanism 11:34 Invited Talk Danielle Bassett Bassett since its discovery. The weight transport and AM timing problems are the most disputable. The 12:09 Introduction: Oral 1.1 and Oral 1.2 same properties of backpropagation training also PM Patraucean have practical consequences. For instance, 12:10 Orals 1.1: Randomized Automatic backpropagation training is a global and coupled PM Diferentiation Oktay, McGreivy, procedure that limits the amount of possible Beatson, Adams parallelism and yields high latency. 12:22 Orals 1.2: ZORB: A Derivative-Free PM Backpropagation Algorithm for These limitations have motivated us to discuss Neural Networks Ranganathan, possible alternative directions. In this workshop, Lewandowski we want to promote such discussions by bringing 12:35 Oral 1.1 and 1.2 Q&A together researchers from various but related PM disciplines, and to discuss possible solutions from 12:40 Short Break 2 engineering, machine learning and PM neuroscientifc perspectives. 12:45 Introduction: Oral 2.1 and Oral 2.2 PM Patraucean 12:46 Orals 2.1: Policy Manifold Search for PM Improving Diversity-based Neuroevolution Rakicevic 195 Dec. 12, 2020
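One concrete alternative in this space, represented below by the Direct Feedback Alignment oral, replaces the transpose of the forward weights in the backward pass with a fixed random matrix, decoupling the two pathways. The following is a minimal numpy sketch of feedback alignment on a toy regression problem; the architecture, targets, and hyperparameters are illustrative assumptions, not drawn from any workshop paper.

import numpy as np

rng = np.random.default_rng(0)

# Two-layer regression network trained with feedback alignment:
# a fixed random matrix B carries the error backward instead of W2.T,
# avoiding the weight transport problem mentioned above.
n_in, n_hid, n_out = 8, 16, 1
W1 = rng.normal(0.0, 0.5, (n_hid, n_in))
W2 = rng.normal(0.0, 0.5, (n_out, n_hid))
B = rng.normal(0.0, 0.5, (n_hid, n_out))   # fixed random feedback weights

X = rng.normal(size=(256, n_in))
y = np.sin(X.sum(axis=1, keepdims=True))   # toy regression target

lr = 0.05
for step in range(1000):
    h = np.tanh(X @ W1.T)                  # forward pass
    y_hat = h @ W2.T
    e = y_hat - y                          # output error
    dh = (e @ B.T) * (1.0 - h ** 2)        # B replaces W2.T in the backward pass
    W2 -= lr * e.T @ h / len(X)
    W1 -= lr * dh.T @ X / len(X)

print("final MSE:", float((e ** 2).mean()))

Because the backward weights are fixed and random, the hidden-layer update needs no information about W2, which is exactly the kind of decoupling the alternatives discussed at this workshop aim for.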

Schedule

06:00 AM Live Intro Malinowski, Patraucean, Swirszcz, Löwe, Choromanska, Gori, Huang
06:15 AM Introduction: Bastiaan Veeling Malinowski
06:17 AM Invited Talk Bastiaan Veeling Veeling
06:45 AM Introduction: Olivier Teytaud Löwe
06:47 AM Invited Talk Olivier Teytaud Teytaud
07:15 AM Poster Session: Morning
08:30 AM Introduction: Karl Friston Malinowski
08:32 AM Invited Talk Karl Friston Friston
09:00 AM Panel discussion 1 Veeling, Teytaud, Friston, Löwe, Malinowski
09:45 AM Long Break
11:00 AM Introduction: Yoshua Bengio Choromanska
11:02 AM Invited Talk Yoshua Bengio Bengio
11:32 AM Introduction: Danielle Bassett Choromanska
11:34 AM Invited Talk Danielle Bassett Bassett
12:09 PM Introduction: Oral 1.1 and Oral 1.2 Patraucean
12:10 PM Orals 1.1: Randomized Automatic Differentiation Oktay, McGreivy, Beatson, Adams
12:22 PM Orals 1.2: ZORB: A Derivative-Free Backpropagation Algorithm for Neural Networks Ranganathan, Lewandowski
12:35 PM Oral 1.1 and 1.2 Q&A
12:40 PM Short Break 2
12:45 PM Introduction: Oral 2.1 and Oral 2.2 Patraucean
12:46 PM Orals 2.1: Policy Manifold Search for Improving Diversity-based Neuroevolution Rakicevic
12:59 PM Orals 2.2: Hardware Beyond Backpropagation: a Photonic Co-Processor for Direct Feedback Alignment Launay, Poli, Daudet, Krzakala
01:12 PM Oral 2.1 and Oral 2.2 Q&A
01:15 PM Introduction: David Duvenaud Huang
01:17 PM Invited Talk David Duvenaud Duvenaud
01:45 PM Introduction: Cristina Savin Huang
01:47 PM Invited Talk Cristina Savin Savin
02:15 PM Panel discussion 2 Bassett, Bengio, Savin, Duvenaud, Choromanska, Huang
03:00 PM Poster Session: Evening

MLPH: Machine Learning in Public Health

Rumi Chunara, Abraham Flaxman, Daniel Lizotte, Chirag Patel, Laura Rosella

Sat Dec 12, 06:00 AM

Public health and population health refer to the study of daily life factors and prevention efforts, and their effects on the health of populations. We expect that work featured in this workshop will differ from Machine Learning in Healthcare, as it will focus on data and algorithms related to the non-medical conditions that shape our health, including structural, lifestyle, policy, social, behavior and environmental factors. Indeed, much of the data traditionally used in machine learning and health problems is really about our interactions with the health care system, and this workshop aims to balance this with machine learning work using data on the non-medical conditions that shape our health. There are many machine learning opportunities specific to these data and how they are used to assess and understand health and disease that differ from healthcare-specific data and tasks (e.g. the data is often unstructured, must be captured across the life-course, in different environments, etc.). This is pertinent for both infectious diseases such as COVID-19 and non-communicable diseases such as diabetes, stroke, etc. Indeed, this workshop topic is especially timely given the COVID outbreak, protests regarding racism, and associated interest in exploring the relevance of machine learning to questions around disease incidence, prevention and mitigation related to both of these and their synergy. These questions require the use of data from outside of healthcare, as well as considerations of how machine learning can augment work in epidemiology and biostatistics.

Schedule

05:55 AM Opening Remarks - Rumi Chunara
06:00 AM Participatory Epidemiology and Machine Learning for Innovation in Public Health - Daniela Paolotti Paolotti
06:50 AM Unsupervised Discovery of Subgroups with Anomalous Maternal and Neonatal Outcomes with WHO's Safe Childbirth Checklist as Intervention - Girmaw Abebe Tadesse Tadesse
07:06 AM Detection of Malaria Vector Breeding Habitats using Topographic Models - Aishwarya Jadhav Jadhav
07:18 AM AutoODE: Bridging Physics-based and Data-driven modeling for COVID-19 Forecasting - Rui Wang Wang
07:28 AM FireNet - Dense Forecasting of Wildfire Smoke Particulate Matter Using Sparsity Invariant CNNs - Renhao Wang Wang
07:38 AM Predicting air pollution spatial variation with street-level imagery - Esra Suel Suel
07:40 AM Automated Medical Assistance: Attention Based Consultation System - Raj Pranesh Pranesh
07:43 AM An Expectation-Based Network Scan Statistic for a COVID-19 Early Warning System - Thorpe Woods Thorpe-Woods
07:49 AM Incorporating Healthcare Motivated Constraints in Restless Multi-Armed Bandit Based Resource Allocation - Aviva Prins Prins
07:52 AM Temporal Graph Analysis for Outbreak Pattern Detection in Covid-19 Contact Tracing Networks - Dario Antweiler Antweiler
07:55 AM Break
08:00 AM Public Health in Practice Panel: Matthew Biggerstaff (CDC), Brian DeRenzi (Dimagi), Roni Rosenfeld (CMU), Zainab Samad (AKU) Chunara
10:00 AM Images and Audio Data as a Resource for Environmental Health - Scott Weichenthal Weichenthal
10:45 AM Speed research encounter
11:45 AM Understanding Big Data in Biomedicine and Public Health - Latifa Jackson Jackson
12:31 PM How the COVID-19 Community Vulnerability Index (CCVI) and machine learning can enable a precision public health response to the pandemic - Nicholas Stewart Sgaier
12:34 PM Addressing Public Health Literacy Disparities through Machine Learning: A Human in the Loop Augmented Intelligence based Tool for Public Health - Anjala Susarla Susarla
12:39 PM Twitter Detects Who is Social Distancing During COVID-19 - Paiheng Xu Xu
12:42 PM Sequential Stochastic Network Structure Optimization With Applications to Addressing Canada's Obesity Epidemic - Nicholas Johnson Johnson
12:46 PM Detecting Individuals with Depressive Disorder From Personal and YouTube History Logs - Boyu Zhang Zhang
12:49 PM Scalable Gaussian Process Regression Via Median Posterior Inference for Estimating Multi-Pollutant Mixture Health Effects - Aaron Sonabend Sonabend
12:52 PM Steering a Historical Disease Forecasting Model Under a Pandemic: A Case of Flu and COVID-19 - Alexander Rodriguez Rodriguez
12:55 PM Break 2
01:00 PM High Performance AI for Pandemic Prediction and Response - Madhav Marathe Marathe
01:45 PM Closing remarks Chunara

Abstracts (1):

Abstract 15: Speed research encounter in MLPH: Machine Learning in Public Health, 10:45 AM

Information will be provided to participants in advance

Wordplay: When Language Meets Games

Prithviraj Ammanabrolu, Matthew Hausknecht, Eric Yuan, Marc-Alexandre Côté, Adam Trischler, Kory Mathewson, Jack Urbanek, Jason Weston, Mark Riedl

Sat Dec 12, 06:00 AM

This workshop will focus on exploring the utility of interactive narratives to fill a role as the learning environments of choice for language-based tasks, including but not limited to storytelling. A previous iteration of this workshop took place very successfully, with over a hundred attendees, also at NeurIPS, in 2018, and since then the community of people working in this area has rapidly increased. This workshop aims to be a centralized place where all researchers involved across a breadth of fields can interact and learn from each other. Furthermore, it will act as a showcase to the wider NLP/RL/Game communities on interactive narrative's place as a learning environment. The program will feature a collection of invited talks in addition to contributed talks and posters from each of these sections of the interactive narrative community and the wider NLP and RL communities.

Schedule

06:00 AM Opening Remarks
06:10 AM Invited Talk - From Ground Truth to Grounded Truth - Dafna Shahaf Shahaf
06:50 AM Live QA - Dafna Shahaf
07:00 AM Contributed talk - Process-Level Representation of Scientific Protocols with a Text-Based Game Annotation Interface - Ronen Tamari Tamari
07:20 AM Parallel Poster Session 1
08:07 AM Speaker intro - Angela Fan
08:09 AM Invited talk - LIGHT: Language in Games with Humans and Text - Angela Fan Fan
08:32 AM Live QA - Angela Fan
08:40 AM Contributed talk - The Game Engineer's Challenge: Generalizing and Grounding Language Abstractly and Concretely - Catherine Wong Wong
09:00 AM Lunch Break
10:00 AM Speaker intro - Karthik Narasimhan
10:02 AM Invited talk - Bringing Back Text Understanding into Text-based Games - Karthik Narasimhan Narasimhan
10:47 AM Live QA - Karthik Narasimhan
10:55 AM Contributed talk - Towards Emotion-Aware Storytelling Using Reinforcement Learning - Faeze Brahman Brahman
11:15 AM Speaker Intro - Nanyun Peng
11:17 AM Invited talk - Creative Language Generation: Stories, Sarcasms, and Similes - Nanyun Peng Peng
12:02 PM Live QA - Nanyun Peng
12:10 PM Coffee Break 2
12:30 PM Speaker Intro - Nick Walton
12:32 PM Invited talk - AI powered games that enable unlimited creativity in infinite worlds - Nick Walton Walton
01:17 PM Live QA - Nick Walton
01:25 PM Contributed talk - Playing Text-Based Games with Common Sense - Sahith Dambekodi Dambekodi
01:45 PM Parallel Poster Session 2

Interpretable Inductive Biases and Physically Structured Learning

Michael Lutter, Alexander Terenin, Shirley Ho, Lei Wang

Sat Dec 12, 06:30 AM

Over the last decade, deep networks have propelled machine learning to accomplish tasks previously considered far out of reach, reaching human-level performance in image classification and game-playing. However, research has also shown that deep networks are often brittle to distributional shifts in data: human-imperceptible changes can lead to absurd predictions. In many application areas, including physics, robotics, the social sciences and the life sciences, this motivates the need for robustness and interpretability, so that deep networks can be trusted in practical applications. Interpretable and robust models can be constructed by incorporating prior knowledge within the model or learning process as an inductive bias, thereby regularizing the model, avoiding overfitting, and making the model easier to understand for scientists who are not machine-learning experts. In the last few years, researchers from different fields have proposed various combinations of domain knowledge and machine learning, and have successfully applied these techniques to various applications.
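One generic way to read "incorporating prior knowledge as an inductive bias" is to add a domain-derived penalty to the data-fit loss. The sketch below fits a polynomial to noisy measurements of a decaying signal and penalizes positive slopes, standing in for real physical knowledge that the system cannot grow; the data, degree, and penalty weight are all illustrative assumptions, not taken from any workshop paper.

import numpy as np

rng = np.random.default_rng(1)

# Noisy observations of a decaying process: prior knowledge says dy/dt <= 0.
t = np.linspace(0.0, 1.0, 40)
y_obs = np.exp(-3.0 * t) + rng.normal(0.0, 0.05, t.shape)

degree, lam, lr = 6, 10.0, 0.005
Phi = np.vander(t, degree + 1)        # polynomial features t^6 ... t^0
dPhi = np.zeros_like(Phi)             # their time derivatives
for k in range(degree):
    dPhi[:, k] = (degree - k) * t ** (degree - k - 1)

w = np.zeros(degree + 1)
for _ in range(5000):
    resid = Phi @ w - y_obs
    viol = np.maximum(dPhi @ w, 0.0)  # positive slopes violate the prior
    grad = Phi.T @ resid / len(t) + lam * dPhi.T @ viol / len(t)
    w -= lr * grad

print("max fitted slope:", float((dPhi @ w).max()))  # pushed toward <= 0

The penalty acts as a regularizer in exactly the sense the description uses: it rules out overfitted wiggles that a purely data-driven fit would happily produce.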

Schedule

06:30 AM Introduction
06:35 AM Thomas Pierrot - Learning Compositional Neural Programs for Continuous Control PIERROT
06:50 AM Jessica Hamrick - Structured Computation and Representation in Deep Reinforcement Learning Hamrick
07:10 AM Manu Kalia - Deep learning of normal form autoencoders for universal, parameter-dependent dynamics Kalia
07:25 AM Rose Yu - Physics-Guided AI for Learning Spatiotemporal Dynamics Yu
07:50 AM Ferran Alet - Tailoring: encoding inductive biases by optimizing unsupervised objectives at prediction time Alet
08:05 AM Poster Session 1
09:00 AM Frank Noé - PauliNet: Deep Neural Network Solution of the Electronic Schrödinger Equation Noe
09:25 AM Kimberly Stachenfeld - Graph Networks with Spectral Message Passing Stachenfeld
09:40 AM Franziska Meier - Inductive Biases for Models and Learning-to-Learn Meier
10:10 AM Rui Wang - Shapley Explanation Networks Wang
10:25 AM Jeanette Bohg - On the Role of Hierarchies for Learning Manipulation Skills Bohg
11:00 AM Panel Discussion
12:00 PM 2 - Relevance of Rotationally Equivariant Convolutions for Predicting Molecular Properties Miller
12:00 PM 3 - Improving the trustworthiness of image classification models by utilizing bounding-box annotations
12:00 PM 8 - Individuality in the hive - Learning to embed lifetime social behavior of honey bees Wild
12:00 PM 12 - IV-Posterior: Inverse Value Estimation for Interpretable Policy Certificates López-Guevara
12:00 PM 22 - Modelling Advertising Awareness, an Interpretable and Differentiable Approach Blaz
12:00 PM 7 - A Symmetric and Object-Centric World Model for Stochastic Environments Emami
12:00 PM 20 - SOrT-ing VQA Models: Contrastive Gradient Learning for Improved Consistency Dharur
12:00 PM 24 - Deep Context-Aware Novelty Detection Rushe
12:00 PM 4 - Physics-informed Generative Adversarial Networks for Sequence Generation with Limited Data Chen
12:00 PM 5 - On the Structure of Cyclic Linear Disentangled Representations Painter
12:00 PM Poster Session 2
12:00 PM 17 - Uncovering How Neural Network Representations Vary with Width and Depth Nguyen
12:00 PM 14 - Learning Dynamical Systems Requires Rethinking Generalization Wang
12:00 PM 15 - Lie Algebra Convolutional Networks with Automatic Symmetry Extraction Dehmamy
12:00 PM 12 - Physics-aware, data-driven discovery of slow and stable coarse-grained dynamics for high-dimensional multiscale systems Kaltenbach
12:00 PM 1 - Real-time Classification from Short Event-Camera Streams using Input-filtering Neural ODEs Giannone
12:00 PM 13 - Gradient-based Optimization for Multi-resource Spatial Coverage Kamra
12:00 PM 16 - An Image is Worth 16 × 16 Tokens: Visual Priors for Efficient Image Synthesis with Transformers Rombach
12:00 PM 18 - Simulating Surface Wave Dynamics with Convolutional Networks KC
12:00 PM 26 - Is the Surrogate Model Interpretable? Kim

12:00 PM 11 - A novel approach for semiconductor etching process with inductive biases Myung
12:00 PM 10 - A Trainable Optimal Transport Embedding for Feature Aggregation Mialon
12:00 PM 25 - Complex Skill Acquisition through Simple Skill Imitation Learning Pasula
12:00 PM 19 - Choice of Representation Matters for Adversarial Robustness Sanyal
12:00 PM 21 - Solving Physics Puzzles by Reasoning about Paths Harter
12:00 PM 23 - Constraining neural networks output by an interpolating loss function with region priors Bergkvist
12:00 PM 9 - Thermodynamic Consistent Neural Networks for Learning Material Interfacial Mechanics Zhang
12:00 PM 6 - Interpretable Models for Granger Causality Using Self-explaining Neural Networks Marcinkevičs
01:00 PM Liwei Chen - Deep Learning Surrogates for Computational Fluid Dynamics Thuerey
01:15 PM Maziar Raissi - Hidden Physics Models Raissi
02:15 PM Closing Remarks

AI for Earth Sciences

Karthik Mukkavilli, Johanna Hansen, Natasha Dudek, Tom Beucler, Kelly Kochanski, Mayur Mudigonda, Karthik Kashinath, Amy McGovern, Paul D Miller, Chad Frischmann, Pierre Gentine, Gregory Dudek, Aaron Courville, Daniel Kammen, Vipin Kumar

Sat Dec 12, 06:45 AM

Our workshop, AI for Earth Sciences, seeks to bring cutting-edge geoscientific and planetary challenges to the fore for the machine learning and deep learning communities. We seek machine learning interest from major areas encompassed by Earth sciences, which include atmospheric physics, hydrologic sciences, cryosphere science, oceanography, geology, planetary sciences, space weather, volcanism, seismology, geo-health (i.e. water, land, air pollution, environmental epidemics), biosphere, and biogeosciences. We also seek interest in AI applied to energy, for renewable energy meteorology, thermodynamics and heat transfer problems. We call for papers demonstrating novel machine learning techniques in remote sensing for meteorology and geosciences, generative Earth system modeling, transfer learning from geophysics and numerical simulations, and uncertainty in Earth science learning representations. We also seek theoretical developments in interpretable machine learning in meteorology and geoscientific models, hybrid models with Earth science knowledge guided machine learning, representation learning from graphs and manifolds in spatiotemporal models, and dimensionality reduction in Earth sciences. In addition, we seek Earth science applications from vision, robotics, multi-agent systems and reinforcement learning. New labelled benchmark datasets and generative visualizations of the Earth are also of particular interest. A new area of interest is in integrated assessment models and human-centered AI for Earth.

AI4Earth Areas of Interest:
- Atmospheric Science
- Hydro and Cryospheres
- Solid Earth
- Theoretical Advances
- Remote Sensing
- Energy in the Earth system
- Extreme weather & climate
- Geo-health
- Biosphere & Biogeosciences
- Planetary sciences
- Benchmark datasets
- People-Earth

Schedule

N/A Link to Gather.Town for Casual Conversation
06:45 AM Introduction and opening remarks Mukkavilli
06:55 AM Sensors and Sampling Hansen

06:58 AM Yogesh Girdhar - Enabling Vision Guided Interactive Exploration in Bandwidth Limited Environments Girdhar
07:22 AM Eyes in the sky without boots on the ground: Using satellites and machine learning to monitor agriculture and food security during COVID-19 Kerner
07:35 AM Autonomous Robot Manipulation for Planetary Science: Mars Sample Return, Climbing Lava Tubes Detry
07:58 AM DeepFish: A realistic fish-habitat dataset to evaluate algorithms for underwater visual analysis Saleh, Laradji, Vázquez
08:06 AM Automatic three-dimensional mapping for tree diameter measurements in inventory operations Tremblay
08:20 AM Q/A and Discussion for Sensing & Sampling Session Hansen, Girdhar, Kerner, Detry
08:55 AM Ecology Dudek
09:00 AM Dan Morris Morris
09:25 AM Giulio De Leo De Leo
09:55 AM Graph Learning for Inverse Landscape Genetics Dharangutte
10:05 AM Segmentation of Soil Degradation Sites in Swiss Alpine Grasslands with Deep Learning Samarin
10:15 AM Novel application of Convolutional Neural Networks for the meta-modeling of large-scale spatial data Stern
10:20 AM Understanding Climate Impacts on Vegetation with Gaussian Processes in Granger Causality Morata Dolz
10:25 AM Interpreting the Impact of Weather on Crop Yield Using Attention Gangopadhyay
10:30 AM Q/A and Discussion for Ecology Session Dudek, Morris, De Leo
10:55 AM Water Mukkavilli
11:00 AM Pierre Gentine Gentine
11:25 AM A Machine Learner's Guide to Streamflow Prediction Gauch
11:40 AM A Deep Learning Architecture for Conservative Dynamical Systems: Application to Rainfall-Runoff Modeling Nearing
11:55 AM Dynamic Hydrology Maps from Satellite-LiDAR Fusion Mateo-García
12:10 PM Efficient Reservoir Management through Deep Reinforcement Learning Wang
12:20 PM Q/A and Discussion for Water Session Mukkavilli, Gentine, Nearing
12:45 PM Milind Tambe Tambe
01:15 PM Q/A and Discussion Mukkavilli, Mudigonda, Tambe
01:25 PM Atmosphere Beucler
01:30 PM Michael Pritchard Pritchard
01:55 PM Elizabeth Barnes Barnes
02:20 PM Spatio-temporal segmentation and tracking of weather patterns with light-weight Neural Networks Kapp-Schwoerer
02:35 PM Leveraging Lightning with Convolutional Recurrent AutoEncoder and ROCKET for Severe Weather Detection Ahmed
02:50 PM Towards Data-Driven Physics-Informed Global Precipitation Forecasting from Satellite Imagery Zantedeschi
02:55 PM Q/A and Discussion for Atmosphere Session Beucler, Pritchard, Barnes
03:25 PM Simulations, Physics-guided, and ML Theory Kashinath
03:30 PM Stephan Mandt Mandt
03:55 PM Rose Yu Yu
04:20 PM Generating Synthetic Multispectral Satellite Imagery from Sentinel-2 Alemohammad
04:30 PM Multiresolution Tensor Learning for Efficient and Interpretable Spatiotemporal Analysis Walker

04:40 PM Climate-StyleGAN: Modeling Turbulent Climate Dynamics Using Style-GAN Gupta
04:50 PM Interpretable Deep Generative Spatio-Temporal Point Processes Zhu
04:55 PM Completing physics-based model by learning hidden dynamics through data assimilation Filoche
05:00 PM Q/A and Discussion for ML Theory Session Kashinath, Mudigonda, Mandt, Yu
05:20 PM People-Earth Mudigonda
05:25 PM Q/A and Panel Discussion for People-Earth with Dan Kammen and Milind Tambe Kammen, Tambe, De Leo, Mudigonda, Mukkavilli
06:00 PM Solid Earth Kochanski
06:05 PM Soft Attention Convolutional Neural Networks for Rare Event Detection in Sequences Kulkarni
06:20 PM An End-to-End Earthquake Monitoring Method for Joint Earthquake Detection and Association using Deep Learning Zhu
06:30 PM Single-Station Earthquake Location Using Deep Neural Networks Mousavi
06:40 PM Framework for automatic globally optimal well log correlation Datskiv
06:45 PM Q/A and Discussion for Solid Earth Kochanski
07:00 PM Benchmark Datasets Kashinath
07:05 PM Stephan Rasp Rasp
07:30 PM RainBench: Enabling Data-Driven Precipitation Forecasting on a Global Scale Tong
07:45 PM WildfireDB: A Spatio-Temporal Dataset Combining Wildfire Occurrence with Relevant Covariates Singla, Diao
08:00 PM LandCoverNet: A global benchmark land cover classification training dataset Alemohammad
08:10 PM Applying Machine Learning to Crowd-sourced Data from Earthquake Detective Ranadive
08:20 PM An Active Learning Pipeline to Detect Hurricane Washover in Post-Storm Aerial Images Goldstein
08:25 PM Developing High Quality Training Samples for Deep Learning Based Local Climate Classification in Korea Kim
08:30 PM Q/A and Discussion for Benchmark Datasets Kashinath
08:55 PM Posters Mukkavilli
08:55 PM Workshop Closing Remarks Mukkavilli
09:00 PM Nowcasting Solar Irradiance Over Oahu Sadowski
09:00 PM Predicting Streamflow By Using BiLSTM with Attention from heterogeneous spatiotemporal remote sensing products Bhatia - IITGN
09:00 PM Interpretability in Convolutional Neural Networks for Building Damage Classification in Satellite Imagery Chen
09:00 PM A Comparison of Data-Driven Models for Predicting Stream Water Temperature Weierbach
09:00 PM MonarchNet: Differentiating Monarch Butterflies from Those with Similar Appearances Chen
09:00 PM Towards Automated Satellite Conjunction Management with Bayesian Deep Learning Pinto
09:00 PM Domain Adaptive Shake-shake Residual Network for Corn Disease Recognition Fang
09:00 PM Optimising Placement of Pollution Sensors in Windy Environments Hellan
09:00 PM Bias correction of global climate model using machine learning algorithms to determine meteorological variables in different tropical climates of Indonesia Nathaniel
09:00 PM Temporally Weighting Machine Learning Models for High-Impact Severe Hail Prediction Burke

09:00 PM Semantic Segmentation of Medium-Resolution Satellite Imagery using Conditional Generative Adversarial Networks Alemohammad
09:00 PM Spectral Unmixing With Multinomial Mixture Kernel and Wasserstein Generative Adversarial Loss Ozkan
09:00 PM Integrating data assimilation with structurally equivariant spatial transformers: Physically consistent data-driven models for weather forecasting Chattopadhyay
09:00 PM Unsupervised Regionalization of Particle-resolved Aerosol Mixing State Indices on the Global Scale Zheng
09:00 PM Inductive Predictions of Extreme Hydrologic Events in The Wabash River Watershed Majeske

Abstracts (29):

Abstract 2: Introduction and opening remarks in AI for Earth Sciences, Mukkavilli 06:45 AM

AI for Earth Sciences, Workshop Founder & Chair, S. Karthik Mukkavilli

Abstract 3: Sensors and Sampling in AI for Earth Sciences, Hansen 06:55 AM

Sensors and Sampling, Session Chair, Johanna Hansen

Abstract 4: Yogesh Girdhar - Enabling Vision Guided Interactive Exploration in Bandwidth Limited Environments in AI for Earth Sciences, Girdhar 06:58 AM

WARPLab's research focuses on both the science and systems of exploration robots in extreme, communication-starved environments such as the deep sea. It aims to develop robotics and machine learning-based techniques to enable search, discovery, and mapping of natural phenomena that are difficult to observe and study due to various physical and information-theoretic challenges.

WARPLab is headed by Yogesh Girdhar, and is part of the Deep Submergence Laboratory (DSL) and the Applied Ocean Physics & Engineering (AOPE) department at Woods Hole Oceanographic Institution.

Abstract 5: Eyes in the sky without boots on the ground: Using satellites and machine learning to monitor agriculture and food security during COVID-19 in AI for Earth Sciences, Kerner 07:22 AM

Talk Title: "Eyes in the sky without boots on the ground: Using satellites and machine learning to monitor agriculture and food security during COVID-19"

Hannah Kerner is an Assistant Research Professor at the University of Maryland, College Park. Her research focuses on developing machine learning solutions for remote sensing applications in agricultural monitoring, food security, and Earth/planetary science. She is the Machine Learning Lead and U.S. Domestic Co-Lead for NASA Harvest, NASA's food security initiative run out of the University of Maryland.

Abstract 6: Autonomous Robot Manipulation for Planetary Science: Mars Sample Return, Climbing Lava Tubes in AI for Earth Sciences, Detry 07:35 AM

Talk Title: Autonomous Robot Manipulation for Planetary Science: Mars Sample Return, Climbing Lava Tubes

This talk will highlight work at NASA on robotic missions from a machine vision perspective. The discussion will focus on the science questions that NASA hopes to answer through returned samples from Mars and the challenges imposed on robotic systems used for scientific data collection.

Related Papers:
http://renaud-detry.net/publications/Pham-2020-AEROCONF.pdf
https://www.liebertpub.com/doi/10.1089/ast.2019.2177

Renaud Detry is the group leader for the Perception Systems group at NASA's Jet Propulsion Laboratory (JPL). Detry earned his Master's and Ph.D. degrees in computer engineering and robot learning from ULiege in 2006 and 2010. He served as a postdoc at KTH and ULiege between 2011 and 2015, before joining the Robotics and Mobility Section at JPL in 2016. His research interests are perception and learning for manipulation, robot grasping, and mobility, for terrestrial and planetary applications. At JPL, Detry leads the machine-vision team of the Mars Sample Return surface mission, and he leads and contributes to a variety of research projects related to industrial robot manipulation, orbital image understanding, in-space assembly, and autonomous wheeled or legged mobility for Mars, Europa, and Enceladus.

Abstract 7: DeepFish: A realistic fish-habitat dataset to evaluate algorithms for underwater visual analysis in AI for Earth Sciences, Saleh, Laradji, Vázquez 07:58 AM

Visual analysis of complex fish habitats is an important step towards sustainable fisheries for human consumption and environmental protection. Deep learning methods have shown great promise for scene analysis when trained on large-scale datasets. However, current datasets for fish analysis tend to focus on the classification task within constrained, plain environments which do not capture the complexity of underwater fish habitats. To address this limitation, we present DeepFish as a benchmark suite with a large-scale dataset to train and test methods for several computer vision tasks. The dataset consists of approximately 40 thousand images collected underwater from 20 habitats in the marine environments of tropical Australia. The dataset originally contained only classification labels. Thus, we collected point-level and segmentation labels to have a more comprehensive fish analysis benchmark. These labels enable models to learn to automatically monitor fish count, identify their locations, and estimate their sizes. Our experiments provide an in-depth analysis of the dataset characteristics, and the performance evaluation of several state-of-the-art approaches based on our benchmark. Although models pre-trained on ImageNet have successfully performed on this benchmark, there is still room for improvement. Therefore, this benchmark serves as a testbed to motivate further development in this challenging domain of underwater computer vision.

Abstract 8: Automatic three-dimensional mapping for tree diameter measurements in inventory operations in AI for Earth Sciences, Tremblay 08:06 AM

Forestry is a major industry in many parts of the world, yet this potential domain of application has been overlooked by the robotics community. For instance, forest inventory, a cornerstone of efficient and sustainable forestry, is still traditionally performed manually by qualified professionals. The lack of automation in this particular task, consisting chiefly of measuring tree attributes, limits its speed and, therefore, the area that can be economically covered. To this effect, we propose to use recent advancements in three-dimensional mapping approaches in forests to automatically measure tree diameters from mobile robot observations. While previous studies showed the potential for such technology, they lacked a rigorous analysis of diameter estimation methods in challenging and large-scale forest environments. Here, we validated multiple diameter estimation methods, including two novel ones, in a new publicly available dataset which includes four different forest sites, 11 trajectories, totaling 1458 tree observations, and 14,000 m2. From our extensive validation, we concluded that our mapping method is usable in the context of automated forest inventory, with our best diameter estimation method yielding a root mean square error of 3.45 cm for our whole dataset and 2.04 cm in ideal conditions consisting of mature forest with well-spaced trees. Furthermore, we release this dataset to the public (https://norlab.ulaval.ca/research/montmorencydataset), to spur further research in robotic forest inventories. Finally, stemming from this large-scale experiment, we provide recommendations for future deployments of mobile robots in a forestry context.

Jean-François is a Ph.D. student at McGill's Mobile Robotics Lab, under the supervision of Prof. Dave Meger. He is interested in model-based RL for mobile robot navigation in unstructured environments such as forests, tundra or underwater. Previously he was a masters student at the Northern Robotics Laboratory (Norlab), working on lidar mapping and perception for forestry applications.

Abstract 9: Q/A and Discussion for Sensing & Sampling Session in AI for Earth Sciences, Hansen, Girdhar, Kerner, Detry 08:20 AM

Moderated by Johanna Hansen

Abstract 10: Ecology in AI for Earth Sciences, Dudek 08:55 AM

Ecology, Session Chair, Natasha Dudek

Abstract 11: Dan Morris in AI for Earth Sciences, Morris 09:00 AM

Program Director of Microsoft AI for Earth

Abstract 12: Giulio De Leo in AI for Earth Sciences, De Leo 09:25 AM

Talk Title (tentative): ML and control of parasitic diseases of poverty in tropical and subtropical countries, with a special focus on schistosomiasis

Professor at Stanford University
Senior Fellow at Stanford Woods Institute for the Environment

Abstract 18: Q/A and Discussion for Ecology Session in AI for Earth Sciences, Dudek, Morris, De Leo 10:30 AM

Moderated by Natasha Dudek

Abstract 19: Water in AI for Earth Sciences, Mukkavilli 10:55 AM

By S. Karthik Mukkavilli

Abstract 21: A Machine Learner's Guide to Streamflow Prediction in AI for Earth Sciences, Gauch 11:25 AM

Long Oral (15m)

Abstract 22: A Deep Learning Architecture for Conservative Dynamical Systems: Application to Rainfall-Runoff Modeling in AI for Earth Sciences, Nearing 11:40 AM

Long Talk (15m)

Abstract 23: Dynamic Hydrology Maps from Satellite-LiDAR Fusion in AI for Earth Sciences, Mateo-García 11:55 AM

Long Talk (15m)

Abstract 25: Q/A and Discussion for Water Session in AI for Earth Sciences, Mukkavilli, Gentine, Nearing 12:20 PM

Moderated by S. Karthik Mukkavilli

Abstract 26: Milind Tambe in AI for Earth Sciences, Tambe 12:45 PM

Prof Milind Tambe
Director, Center for Research on Computation & Society
Gordon McKay Professor of Computer Science
Harvard John A. Paulson School of Engineering and Applied Sciences
Mail: Maxwell Dworkin 125, 33 Oxford Street, Cambridge, MA 02138
Director for AI for Social Good, Google India Research Center
teamcore.seas.harvard.edu/tambe

Abstract 28: Atmosphere in AI for Earth Sciences, Beucler 01:25 PM

By Tom Beucler

Abstract 30: Elizabeth Barnes in AI for Earth Sciences, Barnes 01:55 PM

Identifying Opportunities for Skillful Weather Prediction with Interpretable Neural Networks

Abstract 31: Spatio-temporal segmentation and tracking of weather patterns with light-weight Neural Networks in AI for Earth Sciences, Kapp-Schwoerer 02:20 PM

Long Talk (15m)

Abstract 32: Leveraging Lightning with Convolutional Recurrent AutoEncoder and ROCKET for Severe Weather Detection in AI for Earth Sciences, Ahmed 02:35 PM

Long Talk (15m)

Abstract 35: Simulations, Physics-guided, and ML Theory in AI for Earth Sciences, Kashinath 03:25 PM

By Karthik Kashinath

Abstract 43: Q/A and Discussion for ML Theory Session in AI for Earth Sciences, Kashinath, Mudigonda, Mandt, Yu 05:00 PM

Moderated by Karthik Kashinath and Mayur Mudigonda

Abstract 44: People-Earth in AI for Earth Sciences, Mudigonda 05:20 PM

By Mayur Mudigonda

Abstract 46: Solid Earth in AI for Earth Sciences, Kochanski 06:00 PM

By Kelly Kochanski

Abstract 52: Benchmark Datasets in AI for Earth Sciences, Kashinath 07:00 PM

By Karthik Kashinath

Abstract 54: RainBench: Enabling Data-Driven Precipitation Forecasting on a Global Scale in AI for Earth Sciences, Tong 07:30 PM

Long Talk (15m)

Abstract 55: WildfireDB: A Spatio-Temporal Dataset Combining Wildfire Occurrence with Relevant Covariates in AI for Earth Sciences, Singla, Diao 07:45 PM

Long Talk (15m)

Machine Learning for Mobile Health

Joe Futoma, Walter Dempsey, Katherine Heller, Yi-An Ma, Nicholas Foti, Marianne Njifon, Kelly Zhang, Hera Shi

Sat Dec 12, 07:00 AM

Mobile health (mHealth) technologies have transformed the mode and quality of clinical research. Wearable sensors and mobile phones provide real-time data streams that support automated clinical decision making, allowing researchers and clinicians to provide ecological and in-the-moment support to individuals in need. Mobile health technologies are used across various health fields. Their inclusion in clinical care has aimed to improve HIV medication adherence, to increase activity, to supplement counseling/pharmacotherapy in treatment for substance use, to reinforce abstinence in addictions, and to support recovery from alcohol dependence. The development of mobile health technologies, however, has progressed at a faster pace than the science and methodology to evaluate their validity and efficacy.

Current mHealth technologies are limited in their ability to understand how adverse health behaviors develop, how to predict them, and how to encourage healthy behaviors. In order for mHealth to progress and have expanded impact, the field needs to facilitate collaboration among machine learning researchers, statisticians, mobile sensing researchers, human-computer interaction researchers, and clinicians. Techniques from multiple fields can be brought to bear on the substantive problems facing this interdisciplinary discipline: experimental design, causal inference, multi-modal complex data analytics, representation learning, reinforcement learning, deep learning, transfer learning, data visualization, and clinical integration.

This workshop will assemble researchers from the key areas in this interdisciplinary space necessary to better address the challenges currently facing the widespread use of mobile health technologies.

Schedule

07:00 AM Intro
07:10 AM Invited Talk: Matthew Nock Nock
07:30 AM Invited Talk: Lee Hartsell Hartsell
07:50 AM Invited Talk: AI for Decision Support in Low Resource Areas Salim Jr
08:25 AM Using Wearables for Influenza-Like Illness Detection: The importance of design Nestor
08:35 AM Representing and Denoising Wearable ECG Recordings Chan
08:45 AM Towards Personal Hand Hygiene Detection in Free-living Using Wearable Devices Tang
09:00 AM Q&A for Morning Spotlight Talks
09:10 AM Discussion for Invited Speakers: Matthew Nock, Lee Hartsell, Ally Salim Jr
09:40 AM Poster Session in Gather Town
10:20 AM Lunch / Networking Break
11:00 AM Invited Talk: Assessing Personalization in Digital Health Murphy
11:20 AM Invited Talk: Tanzeem Choudhury Choudhury
11:40 AM Invited Talk: Language-based Behavior and Interventions in Mobile Health Althoff
12:15 PM A generative, predictive model for menstrual cycle lengths that accounts for potential self-tracking artifacts in mobile health data Li
12:25 PM Using Convolutional Variational Autoencoders to Predict Post-Trauma Health Outcomes from Actigraphy Data Cakmak
12:35 PM Fast Physical Activity Suggestions: Efficient Hyperparameter Learning in Mobile Health Menictas
12:50 PM Q&A for Afternoon Spotlight Talks
01:00 PM Discussion with Invited Speakers: Susan Murphy, Tanzeem Choudhury, Tim Althoff
01:30 PM Poster Session in Gather Town
02:15 PM Concluding Remarks (in Gather.Town)

Abstracts (9):

Abstract 4: Invited Talk: AI for Decision Support in Low Resource Areas in Machine Learning for Mobile Health, Salim Jr 07:50 AM

Decision making is one of those extremely complex things that humans can do with relative ease most of the time. Healthcare providers do this hundreds and thousands of times per day, and do an amazing job given their various levels of expertise and the resources available to them. The Elsa Health Assistant is a set of tools and technologies that leverage advances in Artificial Intelligence and causal modeling to augment the capacity of lower-cadre healthcare providers and support optimal and consistent decision making. Here we will share the challenges, failures and successes of the technologies and the team.

Abstract 5: Using Wearables for Influenza-Like Illness Detection: The importance of design in Machine Learning for Mobile Health, Nestor 08:25 AM

Consumer wearable sensors are estimated to be used by one in five Americans for tracking fitness and other personal health. Recently, they have been touted as low-cost vehicles for frequent healthcare monitoring and have received approval as diagnostic devices to detect conditions such as atrial fibrillation. Common fitness tracker measurements such as heart rate or steps can be used to implicate underlying causes. One application of interest is to anticipate or detect influenza-like illness (ILI). However, timely detection of influenza is a challenge, as the virus can be transmitted prior to symptom onset (pre-symptomatic), or by individuals who harbour the virus but do not experience symptoms (asymptomatic). Similarly, 44% of viral shedding of COVID-19, another disease which causes ILI, in symptomatic individuals happens prior to the onset of symptoms. We investigate if ILI (as caused by influenza, COVID-19, and other diseases) can be detected by wearable sensors, and if possible, how early we can anticipate the onset of symptoms. Having a system to warn users that they are about to become ill can reduce viral transmission -- mitigating the spread of seasonal influenza and suppressing the COVID-19 epidemic.

ILI symptoms can be detected from wearable sensors. For example, temperature covaries with cardiac rhythm. The associated increase of resting heart rate (RHR) during ILI has been demonstrated in previous studies and has been used to estimate the incidence of influenza at a population level, using data collected from wearable sensors. Yet, individual-level ILI predictions from wearable features have been elusive, though research is actively underway. Rigorous work is required to evaluate the sensitivity of models that anticipate ILI onset prior to experiencing symptoms. In this paper we expose potential pitfalls in building ILI prediction models. Specifically, we compare the performance of a model trained and evaluated retrospectively with a held-out set of subjects versus prospectively on a held-out future week of data, mimicking actual deployment scenarios. We show that when the design is focused on deployment, though the performance may drop, it is still improved over naive baselines, indicating potential real-world applications.
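The retrospective-versus-prospective contrast in this abstract comes down to how the held-out data are chosen. A schematic sketch with a hypothetical subject-week table; all column names and values are invented for illustration:

import numpy as np
import pandas as pd

rng = np.random.default_rng(2)

# Hypothetical ILI dataset: one row per subject-week.
df = pd.DataFrame({
    "subject": np.repeat(np.arange(20), 10),
    "week": np.tile(np.arange(10), 20),
    "rhr": rng.normal(60.0, 5.0, 200),     # resting heart rate feature
    "ili": rng.integers(0, 2, 200),        # illness label
})

# Retrospective design: hold out whole subjects; training data may
# include weeks recorded after some test observations.
test_subjects = {0, 1, 2, 3}
retro_train = df[~df.subject.isin(test_subjects)]
retro_test = df[df.subject.isin(test_subjects)]

# Prospective design: hold out a future week for everyone, mimicking
# deployment, where nothing after the cutoff exists at training time.
cutoff = 8
pro_train = df[df.week < cutoff]
pro_test = df[df.week >= cutoff]

print(len(retro_train), len(retro_test), len(pro_train), len(pro_test))

The same model class can look much stronger under the first split than the second, which is the pitfall the authors highlight.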
Abstract 6: Representing and Denoising Wearable ECG Recordings in Machine Learning for Mobile Health, Chan 08:35 AM

Modern wearable devices are embedded with a range of noninvasive biomarker sensors that hold promise for improving detection and treatment of disease. One such sensor is the single-lead electrocardiogram (ECG), which measures electrical signals in the heart. The benefits of the sheer volume of ECG measurements with rich longitudinal structure made possible by wearables come at the price of potentially noisier measurements compared to clinical ECGs, e.g., due to movement. In this work, we develop a statistical model to simulate a structured noise process in ECGs derived from a wearable sensor, design a beat-to-beat representation that is conducive for analyzing variation, and devise a factor analysis-based method to denoise the ECG. We study synthetic data generated using a realistic ECG simulator and a structured noise model. At varying levels of signal-to-noise, we quantitatively measure an upper bound on performance and compare estimates from linear and non-linear models. Finally, we apply our method to a set of ECGs collected by wearables in a mobile health study.
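The factor-analysis denoising step can be illustrated generically: fit a low-rank factor model across aligned beats and take its reconstruction as the denoised signal. This sketch uses scikit-learn's FactorAnalysis on synthetic beats and is not the authors' model; the beat template and noise level are invented for illustration.

import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(4)

# Synthetic "beats": each row is one aligned heartbeat of 120 samples.
t = np.linspace(0.0, 1.0, 120)
template = np.exp(-((t - 0.5) ** 2) / 0.002)        # crude R-wave bump
amps = 1.0 + 0.1 * rng.normal(size=(300, 1))        # beat-to-beat variation
clean = amps * template
noisy = clean + 0.2 * rng.normal(size=clean.shape)  # wearable-style noise

# Low-rank factor model across beats; its reconstruction from the
# posterior-mean factors serves as the denoised beat matrix.
fa = FactorAnalysis(n_components=3).fit(noisy)
z = fa.transform(noisy)
denoised = z @ fa.components_ + fa.mean_

print("noisy MSE:   ", float(((noisy - clean) ** 2).mean()))
print("denoised MSE:", float(((denoised - clean) ** 2).mean()))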

other face-to-head gestures across six sessions. mobile health data in Machine Learning for With 10 min of person-specifc training data, the Mobile Health, Li 12:15 PM real-time algorithm achieved its best Mobile health (mHealth) apps such as menstrual performance (F1-score of 0.88 for handwashing steps and 0.80 for face touching) using leave- trackers provide a rich source of self-tracked one-session-out validation. We also describe a health observations that can be leveraged for pilot evaluation on six-hour, free-living waking- statistical modeling. However, such data streams are notoriously unreliable since they hinge on day datasets of two participants annotated via user adherence to the app. Thus, it is crucial for front-facing video. machine learning models to account for self- tracking artifacts like skipped self-tracking. In this abstract, we propose and evaluate a hierarchical, Abstract 12: Invited Talk: Assessing generative model for predicting next cycle length Personalization in Digital Health in Machine based on previously tracked cycle lengths that Learning for Mobile Health, Murphy 11:00 AM accounts explicitly for the possibility of users Reinforcement Learning provides an attractive forgetting to track their period. Our model ofers suite of online learning methods for personalizing several advantages: 1) accounting explicitly for self-tracking artifacts yields better prediction interventions in a Digital Health. However after accuracy as likelihood of skipping increases; 2) as an reinforcement learning algorithm has been run in a clinical study, how do we assess whether a generative model, predictions can be updated personalization occurred? We might fnd users for online as a given cycle evolves; and 3) its whom it appears that the algorithm has indeed hierarchical nature enables modeling of an individual's cycle length history while learned in which contexts the user is more incorporating population-level information. Our responsive to a particular intervention. But could this have happened completely by chance? We experiments using real mHealth cycle length data discuss some frst approaches to addressing from 5,000 menstruators show that our method these questions. yields state-of-the-art performance against neural network-based and summary statistic-based baselines.

Abstract 14: Invited Talk: Language-based Behavior and Interventions in Mobile Health in Machine Learning for Mobile Health, Althoff 11:40 AM

Mobile health seeks to provide in-the-moment support to individuals in need. In this talk, I will discuss the challenges associated with behavior and interventions that are based on language. Language is high-dimensional and complex, but is a critical component of many health and support interactions. Specifically, I will describe how we can measure empathy in mental health peer support and how we can give feedback in order to empower peer supporters to increase expressed levels of empathy, using large-scale neural transformer architectures and reinforcement learning.

Abstract 15: A generative, predictive model for menstrual cycle lengths that accounts for potential self-tracking artifacts in mobile health data in Machine Learning for Mobile Health, Li 12:15 PM

Mobile health (mHealth) apps such as menstrual trackers provide a rich source of self-tracked health observations that can be leveraged for statistical modeling. However, such data streams are notoriously unreliable since they hinge on user adherence to the app. Thus, it is crucial for machine learning models to account for self-tracking artifacts like skipped self-tracking. In this abstract, we propose and evaluate a hierarchical, generative model for predicting next cycle length based on previously tracked cycle lengths that accounts explicitly for the possibility of users forgetting to track their period. Our model offers several advantages: 1) accounting explicitly for self-tracking artifacts yields better prediction accuracy as likelihood of skipping increases; 2) as a generative model, predictions can be updated online as a given cycle evolves; and 3) its hierarchical nature enables modeling of an individual's cycle length history while incorporating population-level information. Our experiments using real mHealth cycle length data from 5,000 menstruators show that our method yields state-of-the-art performance against neural network-based and summary statistic-based baselines.
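The central modeling idea, that an observed gap between tracked periods may actually be the sum of several true cycles when some were never logged, can be sketched as a small mixture model. The Gaussian cycle distribution and skip probability below are illustrative placeholders, not the paper's fitted values.

import numpy as np
from scipy import stats

# If s consecutive cycles were skipped, the observed gap is the sum of
# s + 1 true cycles; under a Gaussian cycle model that sum is Gaussian too.
mu, sigma = 29.0, 3.0   # per-user true cycle mean / sd (illustrative)
p_skip = 0.15           # probability that a period went untracked

def estimate_true_length(observed, max_skips=2):
    """Posterior over the number of skipped cycles, then a skip-corrected
    estimate of the underlying cycle length."""
    post = []
    for s in range(max_skips + 1):
        prior = (p_skip ** s) * (1.0 - p_skip)
        like = stats.norm.pdf(observed, (s + 1) * mu, np.sqrt(s + 1) * sigma)
        post.append(prior * like)
    post = np.array(post) / np.sum(post)
    est = sum(post[s] * observed / (s + 1) for s in range(max_skips + 1))
    return est, post

est, post = estimate_true_length(58.0)
print("P(skips = 0, 1, 2):", np.round(post, 3), "estimate:", round(est, 1))

An observed 58-day gap is far more consistent with one skipped 29-day cycle than with a single long cycle, so the skip-aware estimate stays near 29 days rather than doubling.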

Abstract 16: Using Convolutional Variational Autoencoders to Predict Post-Trauma Health Outcomes from Actigraphy Data in Machine Learning for Mobile Health, Cakmak 12:25 PM

Depression and post-traumatic stress disorder (PTSD) are psychiatric conditions commonly associated with experiencing a traumatic event. Estimating mental health status through non-invasive techniques such as activity-based algorithms can help to identify successful early interventions. In this work, we used locomotor activity captured from 1113 individuals who wore a research-grade smartwatch post-trauma. A convolutional variational autoencoder (VAE) architecture was used for unsupervised feature extraction from four weeks of actigraphy data. By using VAE latent variables and the participant's pre-trauma physical health status as features, a logistic regression classifier achieved an area under the receiver operating characteristic curve (AUC) of 0.64 to estimate mental health outcomes. The results indicate that the VAE model is a promising approach for actigraphy data analysis for mental health outcomes in long-term studies.

Abstract 17: Fast Physical Activity Suggestions: Efficient Hyperparameter Learning in Mobile Health in Machine Learning for Mobile Health, Menictas 12:35 PM

Users can be supported to adopt healthy behaviors, such as regular physical activity, via relevant and timely suggestions on their mobile devices. Recently, reinforcement learning algorithms have been found to be effective for learning the optimal context under which to provide suggestions. However, these algorithms are not necessarily designed for the constraints posed by mobile health (mHealth) settings: that they be efficient, domain-informed and computationally affordable. We propose an algorithm for providing physical activity suggestions in mHealth settings. Using domain science, we formulate a contextual bandit algorithm which makes use of a linear mixed effects model. We then introduce a procedure to efficiently perform hyperparameter updating, using far less computational resources than competing approaches. Not only is our approach computationally efficient, it is also easily implemented with closed-form matrix algebraic updates, and we show improvements over state of the art approaches both in speed and accuracy of up to 99% and 56% respectively.
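The "closed-form matrix algebraic updates" the abstract refers to can be illustrated with the standard template such algorithms build on: a Bayesian linear contextual bandit with Thompson sampling, where the posterior over reward weights is updated by rank-one matrix operations. This is a generic sketch, not the authors' mixed-effects algorithm; all dimensions and parameters are assumptions.

import numpy as np

rng = np.random.default_rng(5)
d, sigma2 = 4, 0.25          # context dimension, known reward noise

A = np.eye(d)                # posterior precision (identity prior)
b = np.zeros(d)              # precision-weighted mean

def choose(contexts):
    """Thompson sampling: draw one weight vector, act greedily under it."""
    mean = np.linalg.solve(A, b)
    w = rng.multivariate_normal(mean, np.linalg.inv(A))
    return int(np.argmax(contexts @ w))

true_w = np.array([0.5, -0.2, 0.8, 0.1])   # unknown to the algorithm
for step in range(200):
    contexts = rng.normal(size=(3, d))     # e.g. 3 candidate suggestions
    a = choose(contexts)
    reward = contexts[a] @ true_w + rng.normal(0.0, np.sqrt(sigma2))
    x = contexts[a]
    A += np.outer(x, x) / sigma2           # closed-form rank-one update
    b += x * reward / sigma2

print("posterior mean:", np.round(np.linalg.solve(A, b), 2))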

Talking to Strangers: Zero-Shot Emergent Communication

Marie Ossenkopf, Angelos Filos, Abhinav Gupta, Michael Noukhovitch, Angeliki Lazaridou, Jakob Foerster, Kalesha Bullard, Rahma Chaabouni, Eugene Kharitonov, Roberto Dessì

Sat Dec 12, 07:00 AM

Communication is one of the most impressive human abilities, but historically it has been studied in machine learning mainly on confined datasets of natural language. Thanks to deep RL, emergent communication can now be studied in complex multi-agent scenarios. Three previous successful workshops (2017-2019) have gathered the community to discuss how, when, and to what end communication emerges, producing research later published at top ML venues (e.g., ICLR, ICML, AAAI). However, many approaches to studying emergent communication rely on extensive amounts of shared training time. Our question is: can we do that faster?

Humans interact with strangers on a daily basis. They possess a basic shared protocol, but a huge part is nevertheless defined by the context. Humans are capable of adapting their shared protocol to ever new situations, and general AI would need this capability too.

We want to explore the possibilities for artificial agents of evolving ad hoc communication spontaneously, by interacting with strangers. Since humans excel at this task, we want to start by having the participants of the workshop take the role of their agents and develop their own bots for an interactive game. This will illuminate the necessities of zero-shot communication learning in a practical way and form a base of understanding to build algorithms upon. The participants will be split into groups and will have one hour to develop their bots. Then, a round-robin tournament will follow, where bots will play an iterated zero-shot communication game with other teams' bots.

This interactive approach is especially aimed at the defined NeurIPS workshop goals to clarify questions for a subfield or application area and to crystallize common problems. It condenses our experience from former workshops on how workshop design can facilitate cooperation and progress in the field. We also believe that this will maximize the interactions and exchange of ideas between our community.

Schedule

07:00 AM Welcome Remarks
07:08 AM Intro to Ruth Byrne
07:10 AM Invited Talk 1: Ruth Byrne (TCD) - How people make inferences about other people's inferences Byrne
07:40 AM Rules of the Game and Demo
07:50 AM Coffee Break + Group Assignment
08:00 AM Live Coding Session
09:00 AM Poster Session 1
09:45 AM Lunch Break + Game Matches
10:45 AM Winner's Talk
11:00 AM Intro to Michael Bowling
11:02 AM Invited Talk 2: Michael Bowling (University of Alberta) - Hindsight Rationality: Alternatives to Nash Bowling
11:37 AM Intro to Richard Futrell
11:40 AM Invited Talk 3: Richard Futrell (UCI) - Information-theoretic models of natural language Futrell
12:15 PM Poster Session 2
01:00 PM Chat about the Game
01:15 PM Panel Discussion
02:00 PM Closing Remarks
02:10 PM After-Workshop Social

Abstracts (14):

Abstract 1: Welcome Remarks in Talking to Strangers: Zero-Shot Emergent Communication, 07:00 AM

Welcome! The whole workshop will be held in gather.town. Please drop in any time.

Abstract 3: Invited Talk 1: Ruth Byrne (TCD) - How people make inferences about other people's inferences in Talking to Strangers: Zero-Shot Emergent Communication, Byrne 07:10 AM

I consider the sorts of models people construct to reason about other people's thoughts, based on several strands of evidence from cognitive science experiments. The first is from studies of how people think about decisions to cooperate or not with another person in various sorts of social interactions in which they must weigh their own self-interest against the common interest. I discuss results from well-known games such as the Prisoner's dilemma, such as the finding that people who took part in the game imagine the outcome would have been different if a different decision had been made by the other player, not themselves. The second strand of evidence comes from studies of how people think about other people's false beliefs. I discuss reasoning in change-of-intentions tasks, in which an observer who witnesses an actor carrying out an action forms a false belief about the reason. People appear to develop the skills to make inferences about other people's false beliefs by creating counterfactual alternatives to reality about how things would have been. I consider how people construct models of other people's thoughts, and consider the implications for how AI agents could construct models of other AI agents.

Abstract 4: Rules of the Game and Demo in Talking to Strangers: Zero-Shot Emergent Communication, 07:40 AM

Explanation of the game rules for the live coding session

Abstract 5: Coffee Break + Group Assignment in Talking to Strangers: Zero-Shot Emergent Communication, 07:50 AM

Find your group

Abstract 6: Live Coding Session in Talking to Strangers: Zero-Shot Emergent Communication, 08:00 AM

Time for your team to solve the game. Session will be held in gather.town

Abstract 7: Poster Session 1 in Talking to Strangers: Zero-Shot Emergent Communication, 09:00 AM

Will be held in gather.town

Abstract 8: Lunch Break + Game Matches in Talking to Strangers: Zero-Shot Emergent Communication, 09:45 AM

The matches between the teams will be shown live during the lunch break. Optional attendance.

Abstract 9: Winner's Talk in Talking to Strangers: Zero-Shot Emergent Communication, 10:45 AM

Short presentation of which strategies seemed to work well. No thorough analysis yet.

Abstract 11: Invited Talk 2: Michael Bowling (University of Alberta) - Hindsight Rationality: Alternatives to Nash in Talking to Strangers: Zero-Shot Emergent Communication, Bowling 11:02 AM

I will look at some of the often unstated principles common in multiagent learning research (and emergent communication work too), suggesting that they may be responsible for holding us back. In response, I will offer an alternative set of principles, which leads to the view of hindsight rationality, with connections to online learning and correlated equilibria. I will then describe some recent technical work on understanding how we can build increasingly more powerful algorithms for hindsight rationality in sequential decision-making settings.

Speaker's Bio:
Michael Bowling is a professor at the University of Alberta, a Fellow of the Alberta Machine Intelligence Institute, and a senior scientist in DeepMind. Michael led the Computer Poker Research Group, which built some of the best poker-playing artificial intelligence programs in the world, including being the first to beat professional players at both limit and no-limit variants of the game. He also was behind the use of Atari 2600 games to evaluate the general competency of reinforcement learning algorithms, and popularized research in Hanabi, a game that illustrates emergent communication and theory of mind.
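Information locality as described in this talk is directly measurable: estimate the mutual information between words at distance d and check how it decays as d grows. A toy plug-in estimator on an invented corpus follows; real studies use large corpora and bias-corrected estimators.

import numpy as np
from collections import Counter

corpus = ("the cat sat on the mat the dog sat on the rug "
          "the cat ate the fish the dog ate the bone").split()

def mutual_information(words, d):
    """Plug-in MI between word pairs separated by distance d."""
    pairs = Counter(zip(words, words[d:]))
    left = Counter(words[:-d])
    right = Counter(words[d:])
    n = sum(pairs.values())
    mi = 0.0
    for (a, b), c in pairs.items():
        p_ab = c / n
        mi += p_ab * np.log2(p_ab / ((left[a] / n) * (right[b] / n)))
    return mi

for d in (1, 2, 3, 4):
    print("distance", d, "MI:", round(mutual_information(corpus, d), 2), "bits")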

Abstract 14: Poster Session 2 in Talking to Strangers: Zero-Shot Emergent Communication, 12:15 PM

Will be held in gather.town


Abstract 15: Chat about the Game in Talking to Strangers: Zero-Shot Emergent Communication, 01:00 PM

We will chat about the game and our insights into the problem of zero-shot coordination after playing it.


Abstract 16: Panel Discussion in Talking to Strangers: Zero-Shot Emergent Communication, 01:15 PM

In this panel, we want to discuss the lessons we learned in this workshop about the possibilities and necessities of learning to cooperate and communicate with strangers.

Guests:
Angeliki Lazaridou
Jakob Foerster
Ruth Byrne
Michael Bowling
Richard Futrell


Abstract 18: After-Workshop Social in Talking to Strangers: Zero-Shot Emergent Communication, 02:10 PM

Come together to discuss the workshop in our cozy gather.town bar.


Shared Visual Representations in Human and Machine Intelligence (SVRHM)

Arturo Deza, Joshua Peterson, N Apurva Ratan Murty, Tom Griffiths

Sat Dec 12, 07:50 AM

https://twitter.com/svrhm2020

The goal of the 2nd Shared Visual Representations in Human and Machine Intelligence (SVRHM) workshop is to disseminate relevant, parallel findings in the fields of computational neuroscience, psychology, and cognitive science that may inform modern machine learning. In the past few years, machine learning methods---especially deep neural networks---have widely permeated the vision science, cognitive science, and neuroscience communities. As a result, scientific modeling in these fields has greatly benefited, producing a swath of potentially critical new insights into the human mind. Since human performance remains the gold standard for many tasks, these cross-disciplinary insights and analytical tools may point towards solutions to many of the current problems that machine learning researchers face (e.g., adversarial attacks, compression, continual learning, and self-supervised learning). Thus we propose to invite leading cognitive scientists with strong computational backgrounds to disseminate their findings to the machine learning community with the hope of closing the loop by nourishing new ideas and creating cross-disciplinary collaborations. In particular, this year's version of the workshop will have a heavy focus on the relative roles of larger datasets and stronger inductive biases as we work on tasks that go beyond object recognition.

Schedule

07:45 AM Arturo Deza, Josh Peterson, Ratan Murty, Tom Griffiths
08:00 AM Martin Hebart
08:30 AM David Mayo
09:00 AM Tim Kietzmann
09:30 AM S.P. Arun
10:00 AM Robert Geirhos
10:15 AM Aviv Netanyahu
10:30 AM Poster Session
11:30 AM Grace Lindsay

12:00 PM Leyla Isik
12:30 PM Carlos Ponce
01:00 PM Aude Oliva
01:30 PM Salman Khan
01:45 PM Melanie Sclar
02:00 PM Poster Session
03:00 PM Bria Long
03:30 PM Gamaleldin Elsayed
04:00 PM Miguel Eckstein
04:30 PM Alexei Efros
05:00 PM Arturo Deza, Josh Peterson, Ratan Murty, Tom Griffiths


Competition Track Saturday

Hugo Jair Escalante, Katja Hofmann

Sat Dec 12, 08:00 AM

Second session for the competition program at NeurIPS 2020.

Machine learning competitions have grown in popularity and impact over the last decade, emerging as an effective means to advance the state of the art by posing well-structured, relevant, and challenging problems to the community at large. Motivated by a reward or merely the satisfaction of seeing their machine learning algorithm reach the top of a leaderboard, practitioners innovate, improve, and tune their approach before evaluating on a held-out dataset or environment. The competition track of NeurIPS has matured in 2020, its fourth year, with a considerable increase in both the number of challenges and the diversity of domains and topics. A total of 16 competitions are featured this year as part of the track, with 8 competitions associated to each of the two days. The list of competitions that are part of the program are available here: https://neurips.cc/Conferences/2020/CompetitionTrack

Schedule

08:00 AM Introducing EfficientQA: Open domain question answering with memory constraints as a testbed for language understanding and knowledge representations Kwiatkowski
08:05 AM Track winner presentations: Kwiatkowski
08:25 AM Showdown against trivia experts Boyd-Graber
09:00 AM Predicting Generalization in Deep Learning (PGDL): Opening remark Jiang
09:06 AM Keynote speech: Sanjeev Arora (PGDL) Arora, Jiang
09:16 AM Introduction to winning team (PGDL) Jiang
09:17 AM Winning team presentation: On Representations and Generalization (PGDL) Jiang, Natekar, Sharma
09:27 AM Introduction to Runner up 1 (PGDL) Jiang
09:28 AM Runner up presentation: Robustness to Augmentations as a Generalization Metric (PGDL) Aithal K
09:33 AM Introduction to Runner up 2 (PGDL) Jiang
09:34 AM Runner up presentation: Ranking generalization via smoothness of latent graphs (PGDL) Lassance
09:39 AM Closing remark (PGDL) Jiang
10:00 AM Opening and design of the INTERPRET challenge @ NeurIPS 2020 Zhan
10:15 AM Winner talks of INTERPRET challenge Ma
10:35 AM Analysis, research opportunities and closing of INTERPRET challenge Sun
11:00 AM NLC2CMD Competition Organizers: Introduction, Problem Description, CLAI Agarwal
11:05 AM NLC2CMD Competition Keynote: Tellina Lin, Agarwal, Chakraborti
11:15 AM NLC2CMD Competition Organizers: Metrics, Data, Tracks Agarwal
11:20 AM NLC2CMD Competition Participant Team: AINixClaiSimple Gros
11:22 AM NLC2CMD Competition Participant Team: coinse-team Yoon
11:24 AM NLC2CMD Competition Participant Team: AICore Lee
11:26 AM NLC2CMD Competition Participant Team: magnum Fu
11:28 AM NLC2CMD Competition Participant Team: Hubris Maene
11:30 AM NLC2CMD Competition Participant Team: jb Litvinov
11:32 AM NLC2CMD Competition Organizers: Results (Live) Talamadupula
02:00 PM Introduction to AIDO Paull
02:03 PM Short Scientific Talk (AIDO) Di Lillo
02:05 PM Advanced Perception League Paull
02:13 PM Intro to Urban League (includes highlights from semifinals) Paull
02:19 PM Live robot competition (LF, LFP, LFVM) Paull
02:34 PM Interviews with winners Paull
02:40 PM Conclusions and Wrap up Paull
03:00 PM Introduction - Flatland Mohanty
03:05 PM Flatland Competition Design & Results Mohanty
03:15 PM Winner Talks: Team An Old Driver Mohanty
03:19 PM Winner Talks: Team JBR_HSE Mohanty
03:23 PM Winner Talks: Team ai-team-flatland Mohanty
03:27 PM "Real world applications of Flatland": Panel Discussion with SBB, DeutschBahn, SNCF Mohanty
03:42 PM Concluding Remarks Mohanty
04:00 PM Introduction - Procgen Mohanty
04:03 PM Introduction to the Procgen Benchmark Cobbe
04:12 PM NeurIPS 2020 Procgen Challenge Design Mohanty
04:20 PM Winner Announcements & Analysis of top submissions Mohanty
04:25 PM Sample Efficiency & Generalization in RL: An assortment of tricks (talks by top participants) Mohanty
04:44 PM Concluding Remarks Mohanty
05:00 PM Introduction and results of the 2020 MineRL Competition Guss, Milani, Topin

Abstracts (2):

Abstract 2: Track winner presentations: in Competition Track Saturday, Kwiatkowski 08:05 AM

- smallest question answering system with 25% accuracy
- best performing system smaller than 500 MB
- best performing system smaller than 6 GB
- best performing system overall

Abstract 3: Showdown against trivia experts in Competition Track Saturday, Boyd-Graber 08:25 AM

Five top human teams of trivia experts took on the competition's baseline systems for the opportunity to take on the computer systems in each of the competition's divisions. The team of humans will compete against the computer on thirty questions from the test set. We will present highlights from the preliminary competition as well as the final showdown between computer systems and the human teams.


Machine Learning for Structural Biology

Raphael Townshend, Stephan Eismann, Ron Dror, Ellen Zhong, Namrata Anand, John Ingraham, Wouter Boomsma, Sergey Ovchinnikov, Roshan Rao, Per Greisen, Rachel Kolodny, Bonnie Berger

Sat Dec 12, 08:00 AM

Spurred on by recent advances in neural modeling and wet-lab methods, structural biology, the study of the three-dimensional (3D) atomic structure of proteins and other macromolecules, has emerged as an area of great promise for machine learning. The shape of macromolecules is intrinsically linked to their biological function (e.g., much like the shape of a bike is critical to its transportation purposes), and thus machine learning algorithms that can better predict and reason about these shapes promise to unlock new scientific discoveries in human health as well as increase our ability to design novel medicines.

Moreover, fundamental challenges in structural biology motivate the development of new learning systems that can more effectively capture physical inductive biases, respect natural symmetries, and generalize across atomic systems of varying sizes and granularities. Through the Machine Learning in Structural Biology workshop, we aim to include a diverse range of participants and spark a conversation on the required representations and learning algorithms for atomic systems, as well as dive deeply into how to integrate these with novel wet-lab capabilities.

Schedule

08:00 AM Opening Remarks Townshend
08:12 AM Keynote -- Michael Levitt Levitt
08:51 AM Invited Talk - Charlotte Deane: Predicting the conformational ensembles of proteins Deane
09:11 AM Invited Talk - Frank Noe: Deep Markov State Models versus Covid-19 Noe
09:31 AM Invited Talk - Andrea Thorn: Finding Secondary Structure in Cryo-EM maps: HARUSPEX Thorn
09:50 AM Break
10:22 AM Keynote - David Baker: Rosetta design of COVID antivirals and diagnostics Baker
11:00 AM Morning Poster Session Zhong
12:01 PM Contributed Talk - Predicting Chemical Shifts with Graph Neural Networks Yang
12:11 PM Contributed Talk - Cryo-ZSSR: multiple-image super-resolution based on deep internal learning Huang, Chen, Rudin
12:21 PM Contributed Talk - Wasserstein K-Means for Clustering Tomographic Projections Rao, Moscovich
12:30 PM Lunch + Panel Discussion on Future of ML for Structural Biology (Starts at 1pm) Townshend
02:01 PM Invited Talk - Possu Huang Huang
02:20 PM Contributed talks intro Rao
02:21 PM Contributed Talk - ProGen: Language Modeling for Protein Generation Madani, McCann, Naik, Huang, Socher
02:31 PM Contributed Talk - Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences Rives, Goyal, Meier, Lin, Guo, Ott, Zitnick, Fergus
02:41 PM Contributed Talk - SidechainNet: An All-Atom Protein Structure Dataset for Machine Learning King, Koes
02:51 PM Contributed Talk - Generating 3D Molecular Structures Conditional on a Receptor Binding Site with Deep Generative Models Masuda, Ragoza, Koes
03:01 PM Contributed Talk - Learning from Protein Structure with Geometric Vector Perceptrons Jing, Eismann, Suriana, Townshend, Dror
03:11 PM Afternoon Poster Session Rao
04:11 PM Invited Talk - Mohammed AlQuraishi: (Nearly) end-to-end differentiable learning of protein structure AlQuraishi
04:31 PM Invited Talk - Chaok Seok: Ab initio protein structure prediction by global optimization of neural network energy: Can AI learn physics? Seok
04:50 PM Concluding Remarks Townshend

05:00 PM Happy Hour Townshend
06:00 PM GEFA: Early Fusion Approach in Drug-Target Affinity Prediction Nguyen Minh, Nguyen, Le, Tran
06:00 PM Sequence and structure based deep learning models for the identification of peptide binding sites Abdin, Wen
06:00 PM Learning a Continuous Representation of 3D Molecular Structures with Deep Generative Models Ragoza, Masuda, Koes
06:00 PM MXMNet: A Molecular Mechanics-Driven Neural Network Based on Multiplex Graph for Molecules Zhang, Liu
06:00 PM ESM-1b: Optimizing Evolutionary Scale Modeling Meier, Liu, Lin, Goyal, Ott, Rives
06:00 PM Design-Bench: Benchmarks for Data-Driven Offline Model-Based Optimization Trabucco, Kumar, GENG, Levine
06:00 PM Designing a Prospective COVID-19 Therapeutic with Reinforcement Learning Lopez Carranza, PIERROT, Phillips, Laterre, Kerkeni, Beguir
06:00 PM Combining variational autoencoder representations with structural descriptors improves prediction of docking scores Garcia Ortegon, Rasmussen, Kajino
06:00 PM Protein model quality assessment using rotation-equivariant, hierarchical neural networks Eismann, Suriana, Jing, Townshend, Dror
06:00 PM Learning Super-Resolution Electron Density Map of Proteins using 3D U-Net MULLICK, Wang, Barati Farimani
06:00 PM Fast and adaptive protein structure representations for machine learning Durairaj, van Dijk
06:00 PM Pre-training Protein Language Models with Label-Agnostic Binding Pairs Enhances Performance in Downstream Tasks Filipavicius
06:00 PM DHS-Crystallize: Deep-Hybrid-Sequence based method for predicting protein Crystallization Alavi
06:00 PM Is Transfer Learning Necessary for Protein Landscape Prediction? Belanger, Dohan
06:00 PM Cross-Modality Protein Embedding for Compound-Protein Affinity and Contact Prediction You, Shen
06:00 PM Exploring generative atomic models in cryo-EM reconstruction Zhong, Lerer, Berger
06:00 PM Profile Prediction: An Alignment-Based Pre-Training Task for Protein Sequence Models Vig, Madani
06:00 PM Conservative Objective Models: A Simple Approach to Effective Model-Based Optimization Trabucco, Kumar, GENG, Levine
06:00 PM The structure-fitness landscape of pairwise relations in generative sequence models marshall, Koo, Ovchinnikov


Second Workshop on AI for Humanitarian Assistance and Disaster Response

Ritwik Gupta, Robin Murphy, Eric Heim, Zhangyang Wang, Bryce Goodman, Nirav Patel, Piotr Bilinski, Edoardo Nemni

Sat Dec 12, 08:00 AM

Natural disasters are one of the oldest threats to both individuals and the societies they co-exist in. As a result, humanity has ceaselessly sought ways to provide assistance to people in need after disasters have struck. Further, natural disasters are but a single, extreme example of the many possible humanitarian crises. Disease outbreak, famine, and oppression against disadvantaged groups can pose even greater dangers to people, dangers that have less obvious solutions. In this proposed workshop, we seek to bring together the Artificial Intelligence (AI) and Humanitarian Assistance and Disaster Response (HADR) communities in order to bring AI to bear on real-world humanitarian crises. Through this workshop, we intend to establish meaningful dialogue between the communities.

By the end of the workshop, the NeurIPS research community can come to understand the practical challenges of aiding those who are experiencing crises, while the HADR community can understand the landscape that is the state of art and practice in AI. Through this, we seek to begin establishing a pipeline for transitioning the research created by the NeurIPS community to real-world humanitarian issues.

Schedule

08:00 AM Introduction and Welcome Gupta
08:15 AM Invited Talk: Duncan Duncan
08:45 AM Invited Talk: Vaughan Vaughan
09:15 AM Invited Talk: SeLegue SeLegue
09:45 AM Invited Talk: Yun Yun
10:15 AM Break (with Zoom session)
10:45 AM Spotlight Talk: Tingzon Tingzon
11:00 AM Spotlight Talk: Chen Chen
11:15 AM Spotlight Talk: Huot Huot
11:30 AM Spotlight Talk: Lee Lee
11:45 AM Lunch (with Zoom Session)
01:00 PM Invited Talk: Bjørgo Bjørgo
01:30 PM Invited Talk: Rahnemoonfar Rahnemoonfar
02:00 PM Invited Talk: Ye Ye
02:30 PM Invited Talk: Rao and Dzombak Rao, Dzombak
03:00 PM Break (with Zoom session)
03:30 PM Spotlight Talk: Tsai Tsai
03:45 PM Spotlight Talk: Benson Benson
04:00 PM Spotlight Talk: Boin Boin
04:15 PM Spotlight Talk: Jain Jain
04:30 PM Break (with Zoom session)
04:45 PM Spotlight Talk: Rodriguez Rodriguez
05:00 PM Spotlight Talk: Ebrahimi Kahou Ebrahimi Kahou
05:15 PM Spotlight Talk: Nevo Nevo
05:30 PM Live Q&A


Consequential Decisions in Dynamic Environments

Niki Kilbertus, Angela Zhou, Ashia Wilson, John Miller, Lily Hu, Lydia T. Liu, Nathan Kallus, Shira Mitchell

Sat Dec 12, 08:00 AM

Machine learning is rapidly becoming an integral component of sociotechnical systems. Predictions are increasingly used to grant beneficial resources or withhold opportunities, and the consequences of such decisions induce complex social dynamics by changing agent outcomes and prompting individuals to proactively respond to decision rules. This introduces challenges for standard machine learning methodology. Static measurements and training sets poorly capture the complexity of dynamic interactions between algorithms and humans. Strategic adaptation to decision rules can render statistical regularities obsolete. Correlations momentarily observed in data may not be robust enough to support interventions for long-term welfare. Recognizing the limits of traditional, static approaches to decision-making, researchers in fields ranging from public policy to computer science to economics have recently begun to view consequential decision-making through a dynamic lens. This workshop will confront the use of machine learning to make consequential decisions in dynamic environments. Work in this area sits at the nexus of several different fields, and the workshop will provide an opportunity to better understand and synthesize social and technical perspectives on these issues and catalyze conversations between researchers and practitioners working across these diverse areas.

Schedule

08:00 AM Welcome and introduction
08:10 AM Invited Talk 1: What do we want? And when do we want it? Alternative objectives and their implications for experimental design. Kasy
08:30 AM Invited Talk 2: Country-Scale Bandit Implementation for Targeted COVID-19 Testing Bastani
08:50 AM Q&A for invited talks 1 & 2
09:00 AM Poster Session 1
10:00 AM Break 1
10:20 AM Introduction of invited speakers 3, 4
10:30 AM Invited Talk 3: Modeling the Dynamics of Poverty Abebe
10:50 AM Invited Talk 4: From Moderate Deviations Theory to Distributionally Robust Optimization: Learning from Correlated Data Kuhn
11:10 AM Q&A for invited talks 3, 4
11:20 AM Contributed Talk 1: Fairness Under Partial Compliance Dai, Lipton
11:25 AM Contributed Talk 2: Better Together? How Externalities of Size Complicate Notions of Solidarity and Actuarial Fairness Donahue, Barocas
11:30 AM Contributed Talk 3: Algorithmic Recourse: from Counterfactual Explanations to Interventions Karimi, Schölkopf, Valera
11:35 AM Q&A for contributed talks 1, 2, 3
11:45 AM Break 2
12:20 PM Introduction of invited speakers 5, 6, 7
12:30 PM Invited Talk 5: What are some hurdles before we can attempt machine learning? Examples from the Public and Non-Profit Sector Iwata
12:50 PM Invited Talk 6: Unexpected Consequences of Algorithm-in-the-Loop Decision Making Chen
01:13 PM Invited Talk 7: Prediction Dynamics Hardt
01:35 PM Q&A for invited talks 5, 6, 7
01:50 PM Break 3
02:20 PM Contributed Talk 4: Strategic Recourse in Linear Classification Chen, Liu
02:25 PM Contributed Talk 5: Performative Prediction in a Stateful World Hod
02:30 PM Contributed Talk 6: Do Offline Metrics Predict Online Performance in Recommender Systems? Krauth, Dean, Guo, Recht, Jordan
02:35 PM Q&A for contributed talks 4, 5, 6
02:45 PM Poster Session 2
03:45 PM Wrap up

Abstracts (6):

Abstract 2: Invited Talk 1: What do we want? And when do we want it? Alternative objectives and their implications for experimental design. in Consequential Decisions in Dynamic Environments, Kasy 08:10 AM

This talk will be based, in particular, on the following two papers:

Adaptive treatment assignment in experiments for policy choice
(joint with Anja Sautmann)
Forthcoming, Econometrica, 2020
Manuscript: https://maxkasy.github.io/home/files/papers/adaptiveexperimentspolicy.pdf

An Adaptive Targeted Field Experiment: Job Search Assistance for Refugees in Jordan
(joint with Stefano Caria, Grant Gordon, Soha Osman, Simon Quinn and Alex Teytelboym)
Working paper, 2020
Manuscript: https://maxkasy.github.io/home/files/papers/RefugeesWork.pdf

Abstract 3: Invited Talk 2: Country-Scale Bandit Implementation for Targeted COVID-19 Testing in Consequential Decisions in Dynamic Environments, Bastani 08:30 AM

In collaboration with the Greek government, we use machine learning to manage the threat of COVID-19. With tens of thousands of international visitors every day, Greece cannot test each visitor to ensure that they are not a carrier of COVID-19. We developed a bandit policy that balances allocating scarce tests to (i) continuously monitor the dynamic infection risk of passengers from different locations (exploration), and (ii) preferentially target risky tourist profiles for testing (exploitation). Our solution is currently deployed across all ports of entry to Greece. I will describe a number of technical challenges, including severely imbalanced outcomes, batched/delayed feedback, high-dimensional arms, port-specific testing constraints, and transferring knowledge from (unreliable) public epidemiological data. Joint work with Kimon Drakopoulos, Vishal Gupta and Jon Vlachogiannis.
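For orientation, the exploration/exploitation trade-off described above can be illustrated with a toy Thompson-sampling allocator over passenger profiles (a minimal sketch under our own assumptions -- a Beta-Bernoulli risk model and made-up profile names -- not the deployed system, which additionally handles batched and delayed feedback):

import random

# Minimal Thompson-sampling sketch for allocating scarce tests across
# passenger profiles. Hypothetical Beta-Bernoulli model only.
class TestAllocator:
    def __init__(self, profiles):
        # Beta(1, 1) prior on each profile's infection rate.
        self.alpha = {p: 1.0 for p in profiles}
        self.beta = {p: 1.0 for p in profiles}

    def choose_profile(self):
        # Sample a plausible infection rate per profile; test the riskiest.
        draws = {p: random.betavariate(self.alpha[p], self.beta[p])
                 for p in self.alpha}
        return max(draws, key=draws.get)

    def record_result(self, profile, positive):
        # Posterior update from an observed test outcome.
        if positive:
            self.alpha[profile] += 1
        else:
            self.beta[profile] += 1

allocator = TestAllocator(["profile_A", "profile_B", "profile_C"])
for _ in range(100):
    p = allocator.choose_profile()
    outcome = random.random() < 0.05  # stand-in for a real test result
    allocator.record_result(p, outcome)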

Abstract 8: Invited Talk 3: Modeling the Dynamics of Poverty in Consequential Decisions in Dynamic Environments, Abebe 10:30 AM

The dynamic nature of poverty presents a challenge in designing effective assistance policies. A significant gap in our understanding of poverty is related to the role of income shocks in triggering or perpetuating cycles of poverty. Such shocks can constitute unexpected expenses -- such as a medical bill or a parking ticket -- or an interruption to one's income flow. Shocks have recently garnered increased public attention, in part due to prevalent evictions and food insecurity during the COVID-19 pandemic. However, shocks do not play a correspondingly central role in the design and evaluation of poverty-alleviation programs.

To bridge this gap, we present a model of economic welfare that incorporates dynamic experiences with shocks and pose a set of algorithmic questions related to subsidy allocations. We then computationally analyze the impact of shocks on poverty using a longitudinal, survey-based dataset. We reveal insights about the multi-faceted and dynamic nature of shocks and poverty. We discuss how these insights can inform the design of poverty-alleviation programs and highlight directions at this emerging interface of algorithms, economics, and social work.

Abstract 9: Invited Talk 4: From Moderate Deviations Theory to Distributionally Robust Optimization: Learning from Correlated Data in Consequential Decisions in Dynamic Environments, Kuhn 10:50 AM

We aim to learn a performance function of the invariant state distribution of an unknown linear dynamical system based on a single trajectory of correlated state observations. The function to be learned may represent, for example, an identification objective or a value function. To this end, we develop a distributionally robust estimation scheme that evaluates the worst- and best-case values of the given performance function across all stationary state distributions that are sufficiently likely to have generated the observed state trajectory. By leveraging new insights from moderate deviations theory, we prove that our estimation scheme offers consistent upper and lower confidence bounds whose exponential convergence rate can be actively tuned. In the special case of a quadratic cost, we show that the proposed confidence bounds can be computed efficiently by solving Riccati equations.
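Schematically (our gloss of the construction, with notation introduced for illustration; the talk's precise statement relies on moderate deviations rates not reproduced here), the two bounds evaluate the performance function over an ambiguity ball around a trajectory-based estimate \hat{P}_T of the stationary distribution:

\overline{J}_T \;=\; \sup_{P:\, D(P \,\|\, \hat{P}_T) \le r} \mathbb{E}_{x\sim P}[f(x)], \qquad \underline{J}_T \;=\; \inf_{P:\, D(P \,\|\, \hat{P}_T) \le r} \mathbb{E}_{x\sim P}[f(x)],

where f is the performance function, D a suitable divergence, and the radius r controls the tunable exponential convergence rate of the resulting confidence bounds.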

Abstract 17: Invited Talk 5: What are some hurdles before we can attempt machine learning? Examples from the Public and Non-Profit Sector in Consequential Decisions in Dynamic Environments, Iwata 12:30 PM

Machine learning and predictive analytics are more accessible to the public and nonprofit space now more than ever. Local government and nonprofits strive to leverage these new technologies to improve outcomes, performance, and operations. While a willingness to collaborate and connect on common goals through a shared understanding of data needs serves to build towards a stronger culture around data, the complexities around defining critical terms in dynamic environments pose significant hurdles to being able to scale any machine learning for large and cross-departmental initiatives in service of the public. I will share examples from my professional work in NYC government, and probe into challenges with data-driven processes, consensus-based motivations and outcomes.

Abstract 18: Invited Talk 6: Unexpected Consequences of Algorithm-in-the-Loop Decision Making in Consequential Decisions in Dynamic Environments, Chen 12:50 PM

The rise of machine learning has fundamentally altered decision making: rather than being made solely by people, many important decisions are now made through an "algorithm-in-the-loop" process where machine learning models inform people. Yet insufficient research has considered how the interactions between people and models actually influence human decision making. In this talk, I'll discuss results from a set of controlled experiments on algorithm-in-the-loop human decision making in two contexts (pretrial release and financial lending). For example, when presented with algorithmic risk assessments, our study participants exhibited additional bias in their decisions and showed a change in their decision-making process by increasing risk aversion. These results highlight the urgent need to expand our analyses of algorithmic decision making aids beyond evaluating the models themselves to investigating the full sociotechnical contexts in which people and algorithms interact.

This talk is based on joint work with Ben Green.


HAMLETS: Human And Model in the Loop Evaluation and Training Strategies

Divyansh Kaushik, Bhargavi Paranjape, Forough Arabshahi, Yanai Elazar, Yixin Nie, Max Bartolo, Polina Kirichenko, Pontus Lars Erik Saito Stenetorp, Mohit Bansal, Zachary Lipton, Douwe Kiela

Sat Dec 12, 08:15 AM

Human involvement in AI system design, development, and evaluation is critical to ensure that the insights being derived are practical, and the systems built are meaningful, reliable, and relatable to those who need them. Humans play an integral role in all stages of machine learning development, be it during data generation, interactively teaching machines, or interpreting, evaluating and debugging models. With growing interest in such "human in the loop" learning, we aim to highlight new and emerging research opportunities for the ML community that arise from the evolving needs to design evaluation and training strategies for humans and models in the loop. The specific focus of this workshop is on emerging and under-explored areas of human- and model-in-the-loop learning, such as employing humans to seek richer forms of feedback for data than labels alone, learning from dynamic adversarial data collection with humans employed to find weaknesses in models, learning from human teachers instructing computers through conversation and/or demonstration, investigating the role of humans in model interpretability, and assessing social impact of ML systems. This workshop aims to bring together interdisciplinary researchers from academia and industry to discuss major challenges, outline recent advances, and facilitate future research in these areas.

Schedule

08:15 AM Opening Remarks Kaushik, Paranjape, Kiela
08:30 AM Invited Talk by Tom Mitchell (Carnegie Mellon) Kaushik
08:55 AM Break Kaushik
09:00 AM Invited Talk by Jenn Wortman Vaughan (Microsoft Research) Kaushik

09:25 AM Break Kaushik
09:30 AM Invited Talk by Sanjoy Dasgupta (UCSD) Kaushik
09:55 AM Break Kaushik
10:00 AM Invited Talk by Finale Doshi-Velez (Harvard) Kaushik
10:25 AM Q & A and Panel Session with Tom Mitchell, Jenn Wortman Vaughan, Sanjoy Dasgupta, and Finale Doshi-Velez Mitchell, Wortman Vaughan, Dasgupta, Doshi-Velez, Lipton
11:30 AM Poster Session 1
12:45 PM Break (Or meet on GatherTown)
01:55 PM Break Kaushik
02:00 PM Invited Talk by Dan Weld (University of Washington) Kaushik
02:25 PM Break Kaushik
02:30 PM Invited Talk by Kristen Grauman (UT Austin, FAIR) Kaushik
02:55 PM Break Kaushik
03:00 PM Invited Talk by Scott Yih (FAIR) Kaushik
03:25 PM Break Kaushik
03:30 PM Invited Talk by Emma Brunskill (Stanford) Kaushik
03:55 PM Break Kaushik
04:00 PM Invited Talk by Alex Ratner (Snorkel, University of Washington) Kaushik
04:25 PM Q & A and Panel Session with Dan Weld, Kristen Grauman, Scott Yih, Emma Brunskill, and Alex Ratner Grauman, Yih, Ratner, Brunskill, Kiela, Weld
05:30 PM Poster Session 2
06:45 PM Closing Remarks
07:00 PM Social/Get-Together on GatherTown


International Workshop on Scalability, Privacy, and Security in Federated Learning (SpicyFL 2020)

Xiaolin Andy Li, Dejing Dou, Ameet Talwalkar, Hongyu Li, Jianzong Wang, Yanzhi Wang

Sat Dec 12, 08:20 AM

In the recent decade, we have witnessed rapid progress in machine learning in general and deep learning in particular, mostly driven by tremendous data. As these intelligent algorithms, systems, and applications are deployed in real-world scenarios, we are now facing new challenges, such as scalability, security, privacy, trust, cost, regulation, and environmental and societal impacts. In the meantime, data privacy and ownership have become more and more critical in many domains, such as finance, health, government, and social networks. Federated learning (FL) has emerged to address data privacy issues. To make FL practically scalable, useful, efficient, and effective on security and privacy mechanisms and policies, it calls for joint efforts from the community, academia, and industry. More challenges, interplays, and tradeoffs in scalability, privacy, and security need to be investigated in a more holistic and comprehensive manner by the community. We are expecting broader, deeper, and greater evolution of these concepts and technologies, and confluence towards holistic trustworthy AI ecosystems.

This workshop provides an open forum for researchers, practitioners, and system builders to exchange ideas, discuss, and shape roadmaps towards scalable and privacy-preserving federated learning in particular, and scalable and trustworthy AI ecosystems in general.
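For readers new to federated learning, the basic setup the workshop builds on can be conveyed with a minimal federated-averaging sketch in the style of FedAvg (our simplification: plain linear regression, full client participation, and none of the security, compression, or systems concerns the talks address):

import numpy as np

# Minimal FedAvg sketch: each client runs local SGD on its private
# data, and the server averages the resulting weights. Raw data never
# leaves its owner; only model parameters are shared.
def local_update(w, X, y, lr=0.01, steps=10):
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # MSE gradient
        w = w - lr * grad
    return w

def fedavg_round(w_global, clients):
    # clients: list of (X, y) shards that stay on their devices.
    local_ws = [local_update(w_global.copy(), X, y) for X, y in clients]
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    # Weighted average of client models, proportional to data size.
    return np.average(local_ws, axis=0, weights=sizes)

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(5):
    X = rng.normal(size=(50, 2))
    clients.append((X, X @ true_w + 0.1 * rng.normal(size=50)))

w = np.zeros(2)
for _ in range(20):
    w = fedavg_round(w, clients)
print(w)  # approaches true_w without pooling raw data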

Schedule

08:20 AM Opening Remarks Li
08:30 AM Keynote Talk 1: Dawn Song
09:00 AM A Better Alternative to Error Feedback for Communication-Efficient Distributed Learning, Samuel Horváth and Peter Richtárik
09:15 AM Backdoor Attacks on Federated Meta-Learning, Chien-Lun Chen, Leana Golubchik and Marco Paolieri
09:30 AM FedBE: Making Bayesian Model Ensemble Applicable to Federated Learning, Hong-You Chen and Wei-Lun Chao
09:45 AM Preventing Backdoors in Federated Learning by Adjusting Server-side Learning Rate, Mustafa Ozdayi, Murat Kantarcioglu and Yulia Gel
10:00 AM Keynote Talk 2: H. Brendan McMahan
10:30 AM Lightning Talk Session 1: 10 papers, 2m each
10:50 AM Keynote Talk 3: Ruslan Salakhutdinov
11:20 AM FedML: A Research Library and Benchmark for Federated Machine Learning, Chaoyang He, et al.
11:35 AM Learning to Attack Distributionally Robust Federated Learning, Wen Shen, Henger Li and Zizhan Zheng
11:50 AM Keynote Talk 4: Virginia Smith
12:20 PM Lightning Talk Session 2: 8 papers, 2m each
12:36 PM Poster Session 1
01:30 PM Keynote Talk 5: John C. Duchi
02:00 PM On Biased Compression for Distributed Learning, Aleksandr Beznosikov, Samuel Horváth, Mher Safaryan and Peter Richtarik
02:15 PM PAC Identifiability in Federated Personalization, Ben
02:30 PM Model Pruning Enables Efficient Federated Learning on Edge Devices, Yuang Jiang, Shiqiang Wang, Victor Valls, Bong Jun Ko, Wei-Han Lee, Kin Leung and Leandros Tassiulas
02:45 PM Hybrid FL: Algorithms and Implementation, Xinwei Zhang, Tianyi Chen, Mingyi Hong and Wotao Yin
03:00 PM Break
03:30 PM Keynote Talk 6: Tao Yang
04:00 PM Lightning Talk Session 3: 10 papers, 2m each
04:20 PM Keynote Talk 7: Tong Zhang
04:50 PM Lightning Talk Session 4: 5 papers, 2m each
05:00 PM Panel Discussion
06:00 PM Poster Session 2 (Papers presented in the afternoon)
07:00 PM Closing Remarks Li

Abstracts (3):

Abstract 1: Opening Remarks in International Workshop on Scalability, Privacy, and Security in Federated Learning (SpicyFL 2020), Li 08:20 AM

Introductory comments by the organizers.

Abstract 10: FedML: A Research Library and Benchmark for Federated Machine Learning, Chaoyang He, et al. in International Workshop on Scalability, Privacy, and Security in Federated Learning (SpicyFL 2020), 11:20 AM

Chaoyang He, Songze Li, Jinhyun So, Mi Zhang, Xiao Zeng, Hongyi Wang, Xiaoyang Wang, Praneeth Vepakomma, Abhishek Singh, Hang Qiu, Xinghua Zhu, Jianzong Wang, Li Shen, Peilin Zhao, Yan Kang, Yang Liu, Ramesh Raskar, Qiang Yang, Murali Annavaram and Salman Avestimehr

Abstract 27: Closing Remarks in International Workshop on Scalability, Privacy, and Security in Federated Learning (SpicyFL 2020), Li 07:00 PM

Comments by the organizers.

Workshop on Computer Assisted Programming (CAP)

Augustus Odena, Charles Sutton, Nadia Polikarpova, Josh Tenenbaum, Armando Solar-Lezama, Isil Dillig

Sat Dec 12, 08:30 AM

There are many tasks that could be automated by writing computer programs, but most people don't know how to program computers (this is the subject of program synthesis, the study of how to automatically write programs from user specifications). Building tools for doing computer-assisted programming could thus improve the lives of many people (and it's also a cool research problem!). There has been substantial recent interest in the ML community in the problem of automatically writing computer programs from user specifications, as evidenced by the increased volume of Program Synthesis submissions to ICML, ICLR, and NeurIPS.

Despite this recent work, a lot of exciting questions are still open, such as how to combine symbolic reasoning over programs with deep learning, how to represent programs and user specifications, and how to apply program synthesis within computer vision, robotics, and other control problems. There is also work to be done on fusing work done in the ML community with research on Programming Languages (PL) through collaboration between the ML and PL communities, and there remains the challenge of establishing benchmarks that allow for easy comparison and measurement of progress. The aim of the CAP workshop is to address these points. This workshop will bring together researchers in programming languages, machine learning, and related areas who are interested in program synthesis and other methods for automatically writing programs from a specification of intended behavior.

Schedule

08:30 AM Welcome Talk Odena
08:40 AM Sumit Gulwani Talk Gulwani
09:10 AM Roopsha Samanta Talk Samanta
09:40 AM Spotlight Session 1 Odena, Nye, Shrivastava, Agarwal, Hellendoorn, Sutton
10:10 AM Poster Session 1
11:00 AM Swarat Chaudhuri Talk Chaudhuri
11:30 AM Elena Glassman Talk Glassman
12:00 PM Spotlight Session 2 Odena, Shi, Bieber, Alet, Sutton, Iyer
12:30 PM Kevin Ellis Talk Ellis
01:00 PM Poster Session 2
02:30 PM Satish Chandra Talk Chandra, Odena, Sutton
03:00 PM Xinyun Chen Talk Chen
03:30 PM Panel Odena, Sutton, Samanta, Chen, Glassman
04:00 PM Closing Talk Odena

Abstracts (7):

Abstract 2: Sumit Gulwani Talk in Workshop on Computer Assisted Programming (CAP), Gulwani 08:40 AM

Title: New directions in Programming by Examples

Abstract: Programming by examples (PBE) involves synthesizing programs in an underlying domain-specific language from input-output examples. Our journey in developing usable PBE systems has motivated two kinds of advances: (a) development of algorithms that can synthesize intended programs in real-time and from very few examples, and (b) variants of the classical PBE problem, including predictive synthesis and modeless synthesis.

We have leveraged logical reasoning techniques and their integration with machine learning techniques to develop effective PBE solutions for some domains including string/datatype transformations, table extraction from semi-structured documents (e.g., custom text files, webpages, PDF), and repetitive edits in code. These solutions have shipped inside various mass-market products including Excel, PowerBI, Visual Studio, and Sql Server Management Studio. In this talk, I will describe these applications, technical advances, and the form factors inside different products.

Bio: Sumit Gulwani is a computer scientist connecting ideas, research & practice, and (with) people with varied roles. He invented the popular Flash Fill feature in Excel and has shipped program synthesis innovations across multiple Microsoft products (Office, SQL, Visual Studio, Powershell, PowerQuery), having authored 65+ patent applications. He has co-authored 10 award winning papers (including test-of-time awards from ICSE and POPL) amongst 130+ research publications across multiple computer science areas and delivered 50+ keynotes/invited talks. He has received the Robin Milner Young Researcher Award, the ACM SIGPLAN Outstanding Doctoral Dissertation Award (PhD from UC-Berkeley), and the President's Gold Medal from IIT Kanpur.
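To make the PBE setting concrete, here is a toy brute-force sketch under our own assumptions (a four-primitive string DSL and depth-bounded enumeration); real engines such as FlashFill use version-space algebras, deductive search, and ranking functions rather than enumeration:

from itertools import product

# Toy programming-by-examples search over a tiny string DSL.
PRIMITIVES = {
    "lower": str.lower,
    "upper": str.upper,
    "strip": str.strip,
    "first_word": lambda s: s.split()[0] if s.split() else "",
}

def synthesize(examples, max_depth=3):
    # Try every pipeline of primitives up to max_depth and return the
    # first one consistent with all input-output examples.
    for depth in range(1, max_depth + 1):
        for names in product(PRIMITIVES, repeat=depth):
            def program(s, names=names):
                for n in names:          # apply primitives left to right
                    s = PRIMITIVES[n](s)
                return s
            if all(program(i) == o for i, o in examples):
                return " then ".join(names)
    return None

print(synthesize([("  Hello World ", "hello"), ("  Foo Bar", "foo")]))
# -> "lower then first_word"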

Abstract 3: Roopsha Samanta Talk in Workshop on Computer Assisted Programming (CAP), Samanta 09:10 AM

MANTIS: SEMANTICS-GUIDED INDUCTIVE PROGRAM SYNTHESIS

The dream of classical program synthesis is to generate programs from complete, formal specifications of their expected behavior. An increasingly favored paradigm of synthesis is inductive program synthesis, where specifications of program behavior are provided in the form of examples. Inductive program synthesis not only helps make program synthesis more tractable, but also has the potential to democratize programming!

Unfortunately, inductive synthesis engines encounter challenges like overfitting, ambiguity, and brittleness, similar to other inductive learning engines. PL researchers have typically attacked these problems by applying syntactic biases to the search space in the form of tailored domain-specific languages, grammars and ranking functions. In this talk, I will show how one can further enhance the generalizability and robustness of such synthesis engines by applying semantic biases to the search space.

Bio: Roopsha Samanta is an Assistant Professor in the Department of Computer Science at Purdue University. She leads the Purdue Formal Methods (PurForM) group and is a member of the Purdue Programming Languages (PurPL) group. Before joining Purdue in 2016, she completed her PhD at UT Austin in 2013, advised by E. Allen Emerson and Vijay K. Garg, and was a postdoctoral researcher at IST Austria from 2014-2016 with Thomas A. Henzinger. She is a recipient of the 2019 NSF CAREER award.

Her research interests are in program verification, program synthesis, and concurrency. She likes to work at the intersection of formal methods and programming languages to develop frameworks to assist programmers in writing reliable programs. Her current research agenda is centered around two themes—formal reasoning about distributed systems and semantics-guided inductive program synthesis.

https://www.cs.purdue.edu/homes/roopsha/

Abstract 6: Swarat Chaudhuri Talk in Workshop on Computer Assisted Programming (CAP), Chaudhuri 11:00 AM

Neural Attribute Grammars for Semantics-Guided Program Generation
Swarat Chaudhuri
UT Austin

Abstract:

I will talk about Neural Attribute Grammars (NAG), a framework for deep statistical generation of source code modulo language-level semantic requirements (such as type safety or initialization of variables before use). Neural models for source code have received significant attention in the recent past. However, these models tend to be trained on syntactic program representations, and consequently, often generate programs that violate essential semantic invariants. In contrast, the NAG framework exposes the semantics of the target language to the training procedure for the neural model using attribute grammars. During training, the model learns to replicate the relationship between the syntactic rules used to construct a program, and the semantic attributes (for example, symbol tables) of the context in which the rule is fired. In the talk, I will give some concrete examples of NAGs and show how to use them in the conditional generation of Java programs. I will demonstrate that these NAGs generate semantically "sensible" programs with significantly higher frequency than traditional neural models of source code.

(This talk is based on joint work with Rohan Mukherjee, Chris Jermaine, Tom Reps, Dipak Chaudhari, and Matt Amodio.)

Bio: Swarat Chaudhuri is an Associate Professor of computer science at the University of Texas at Austin. His research studies topics in the intersection of machine learning and programming languages, including program induction, probabilistic programming, neurosymbolic programming, programmatically interpretable/explainable learning, learning-accelerated formal reasoning, and formally certified learning. Swarat received a bachelor's degree from the Indian Institute of Technology, Kharagpur, in 2001, and a doctoral degree from the University of Pennsylvania in 2007. Before joining UT Austin, he held faculty positions at Rice University and the Pennsylvania State University. He is a recipient of the National Science Foundation CAREER award, the ACM SIGPLAN John Reynolds Doctoral Dissertation Award, and the Morris and Dorothy Rubinoff Dissertation Award from the University of Pennsylvania.

I will talk about Neural Attribute Grammars Abstract 7: Elena Glassman Talk in Workshop (NAG), a framework for deep statistical on Computer Assisted Programming (CAP), generation of source code modulo language-level Glassman 11:30 AM semantic requirements (such as type safety or initialization of variables before use). Neural Title models for source code have received signifcant Increasing the Power of [Human+Program attention in the recent past. However, these Synthesis] through Interface Design models tend to be trained on syntactic program representations, and consequently, often Abstract generate programs that violate essential Program synthesis is a powerful tool for semantic invariants. In contrast, the NAG generating programs, but in the hands of users, framework exposes the semantics of the target its potential can be severely limited by language to the training procedure for the neural unanticipated usability obstacles. In this talk, I model using attribute grammars. During training, will describe several key usability obstacles and the model learns to replicate the relationship new synthesis-powered interaction mechanisms between the syntactic rules used to construct a that help users get past these obstacles to their program, and the semantic attributes (for goal: a program that behaves the way they want example, symbol tables) of the context in which it to. the rule is fred. In the talk, I will give some concrete examples of NAGs and show how to use Updated Bio them in the conditional generation of Java Elena Glassman is an Assistant Professor of programs. I will demonstrate that these NAGs Computer Science at the Harvard Paulson School generate semantically "sensible" programs with of Engineering & Applied Sciences and the signifcantly higher frequency than traditional Stanley A. Marks & William H. Marks Professor at neural models of source code. the Radclife Institute for Advanced Study, specializing in human-computer interaction. At (This talk is based on joint work with Rohan MIT, she earned a PhD and MEng in Electrical Mukherjee, Chris Jermaine, Tom Reps, Dipak Engineering and Computer Science and a BS in Chaudhari, and Matt Amodio.) Electrical Science and Engineering. Before joining Harvard, she was a postdoctoral scholar in Bio: Swarat Chaudhuri is an Associate Professor Electrical Engineering and Computer Science at of computer science at the University of Texas at the University of California, Berkeley, where she Austin. His research studies topics in the received the Berkeley Institute for Data Science intersection of machine learning and Moore/Sloan Data Science Fellowship. programming languages, including program induction, probabilistic programming, neurosymbolic programming, programmatically Abstract 9: Kevin Ellis Talk in Workshop on interpretable/explainable learning, learning- Computer Assisted Programming (CAP), Ellis accelerated formal reasoning, and formally 12:30 PM certifed learning. Swarat received a bachelor's degree from the Indian Institute of Technology, Title: Kharagpur, in 2001, and a doctoral degree from Growing generalizable, interpretable knowledge the University of Pennsylvania in 2007. Before with wake-sleep program learning joining UT Austin, he held faculty positions at Rice University and the Pennsylvania State Abstract: University. He is a recipient of the National Two challenges in engineering program synthesis 226 Dec. 12, 2020

Abstract 11: Satish Chandra Talk in Workshop on Computer Assisted Programming (CAP), Chandra, Odena, Sutton 02:30 PM

Title: Automatic Program Repair using Getafix

Abstract: Developers spend a significant amount of their time fixing bugs. Fixes often are repetitive, so it appears that some portion of this work should be automated. Indeed, some recent approaches offer automation, but these typically explore a large space of potential fixes by making varying combinations of mutations, trying them all until one passes the test suite. This is not only computationally expensive, but the suggested fix may not look natural to a developer. We present Getafix, a tool that offers readable bug fixes without requiring massive computational resources. Getafix learns from your bug fix history. It extracts past code changes that fixed bugs and learns, in an off-line phase, a set of templates from those fixes. As new bug reports appear, Getafix uses these templates to create and rank a set of suggestions in mere seconds, as well as offer fixes that resemble human-made fixes. At Facebook, Getafix has been used to auto-fix bugs reported by static analysis tools like Infer.

Abstract 12: Xinyun Chen Talk in Workshop on Computer Assisted Programming (CAP), Chen 03:00 PM

Title: Deep Learning for Program Synthesis from Input-Output Examples

Abstract: There has been an emerging interest in applying machine learning-based techniques, especially deep neural networks, for program synthesis. However, because of some unique characteristics of the program domain, directly applying deep learning techniques developed for other applications is generally inappropriate. In this talk, I will present my work on program synthesis from input-output examples, aiming at synthesizing programs with higher complexity and better generalization. I will first discuss our work on execution-guided synthesis, where we develop approaches to leverage the execution results of both partial and full programs. In the second part of my talk, I will discuss our work on neural-symbolic architectures for compositional generalization.
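The execution-guided idea from the first part of the talk can be sketched in a toy form (our illustration, assuming a two-operation arithmetic DSL and a hand-written monotonicity pruning rule; the talk's approach uses learned neural models rather than this rule):

# Toy execution-guided enumeration: extend partial programs one
# operation at a time, run them on the example inputs, and prune any
# prefix whose intermediate value already exceeds its target (sound
# here because both ops are monotonically increasing).
OPS = {"inc": lambda x: x + 1, "double": lambda x: x * 2}

def run(program, x):
    for op in program:
        x = OPS[op](x)
    return x

def synthesize(examples, max_len=6):
    frontier = [[]]
    for _ in range(max_len):
        next_frontier = []
        for prog in frontier:
            for op in OPS:
                cand = prog + [op]
                outs = [run(cand, x) for x, _ in examples]
                if all(o == t for o, (_, t) in zip(outs, examples)):
                    return cand
                if all(o <= t for o, (_, t) in zip(outs, examples)):
                    next_frontier.append(cand)  # still plausible prefix
        frontier = next_frontier
    return None

print(synthesize([(1, 6), (2, 10)]))  # -> e.g. ['double', 'double', 'inc', 'inc']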

The Challenges of Real World Reinforcement Learning

Daniel Mankowitz, Gabriel Dulac-Arnold, Shie Mannor, Omer Gottesman, Anusha Nagabandi, Doina Precup, Timothy A Mann, Gabe Dulac-Arnold

Sat Dec 12, 08:30 AM

Reinforcement Learning (RL) has had numerous successes in recent years in solving complex problem domains. However, this progress has been largely limited to domains where a simulator is available or the real environment is quick and easy to access. This is one of a number of challenges that are bottlenecks to deploying RL agents on real-world systems. Two recent papers identify nine important challenges that, if solved, will take a big step towards enabling RL agents to be deployed to real-world systems (Dulac et al. 2019, 2020). The goals of this workshop are four-fold: (1) providing a forum for researchers in academia, industry researchers as well as industry practitioners from diverse backgrounds to discuss the challenges faced in real-world systems; (2) discussing and prioritizing the nine research challenges, including determining which challenges we should focus on next and whether any new challenges should be added to the list or existing ones removed from it; (3) discussing problem formulations for the various challenges and critiquing these formulations or developing new ones. This is especially important for more abstract challenges such as explainability. We should also be asking ourselves whether the current Markov Decision Process (MDP) formulation is sufficient for solving these problems or whether modifications need to be made; and (4) discussing approaches to solving combinations of these challenges.

Schedule

08:30 AM Introduction and Overview Mankowitz, Dulac-Arnold
08:40 AM Keynote: Aviv Tamar Tamar
09:20 AM Keynote: Emma Brunskill Brunskill
10:00 AM Keynote: Jost Tobias Springenberg Springenberg
10:40 AM Mini-panel discussion 1 - Bridging the gap between theory and practice Tamar, Brunskill, Springenberg, Gottesman, Mankowitz
11:20 AM Poster session 1
11:50 AM Keynote: Franziska Meier
12:30 PM Keynote: Marc Raibert, Scott Kuindersma
01:10 PM Mini-panel discussion 2 - Real World RL: An industry perspective Meier, Dulac-Arnold, Mannor, Mann
01:50 PM Lunch
03:20 PM Spotlight Talks
04:00 PM Keynote: Tom Dietterich Dietterich
04:40 PM Keynote: Chelsea Finn Finn
05:20 PM Mini-panel discussion 3 - Prioritizing Real World RL Challenges Finn, Dietterich, Schoellig, Dragan, Nagabandi, Precup
06:00 PM Poster session 2
06:30 PM Keynote: Angela Schoellig Schoellig
07:10 PM Keynote: Anca Dragan Dragan

Abstracts (14):

Abstract 2: Keynote: Aviv Tamar in The Challenges of Real World Reinforcement Learning, Tamar 08:40 AM

Real World RL Challenges

Abstract 3: Keynote: Emma Brunskill in The Challenges of Real World Reinforcement Learning, Brunskill 09:20 AM

More practical Batch Offline Reinforcement Learning

Abstract 4: Keynote: Jost Tobias Springenberg in The Challenges of Real World Reinforcement Learning, Springenberg 10:00 AM

Challenges for RL in Robotics

Abstract 6: Poster session 1 in The Challenges of Real World Reinforcement Learning, 11:20 AM

You can now chat to the paper authors by clicking the above Gather.town link.

Links to individual poster presentations can be found here: https://sites.google.com/corp/view/neurips2020rwrl#h.ey6lwdtrdt7c

Abstract 7: Keynote: Franziska Meier in The Challenges of Real World Reinforcement Learning, 11:50 AM

Challenges of Model-based Inverse Reinforcement Learning

Abstract 8: Keynote: Marc Raibert, Scott Kuindersma in The Challenges of Real World Reinforcement Learning, 12:30 PM

Boston Dynamics

Abstract 9: Mini-panel discussion 2 - Real World RL: An industry perspective in The Challenges of Real World Reinforcement Learning, Meier, Dulac-Arnold, Mannor, Mann 01:10 PM

The following speakers that will be at this event do not have NeurIPS profiles:
Franziska Meier - [email protected]
Marc Raibert - [email protected]
Scott Kuindersma - [email protected]

Abstract 10: Lunch in The Challenges of Real World Reinforcement Learning, 01:50 PM

Enjoy your lunch break.

If you intend to attend the 3rd mini-panel session, we encourage you to watch the talks of Anca Dragan and Angela Schoellig during lunch, as their keynote talks will only occur *after* the mini-panel session. Thus, if you want to ask them questions, please take the time to watch the talks now.

Abstract 11: Spotlight Talks in The Challenges of Real World Reinforcement Learning, 03:20 PM

We have 4 spotlight talks. These talks can be found at the following link: https://sites.google.com/corp/view/neurips2020rwrl#h.9w5kdo7eecim

Abstract 12: Keynote: Tom Dietterich in The Challenges of Real World Reinforcement Learning, Dietterich 04:00 PM

Applying RL to Ecosystem Management: Lessons Learned

Abstract 13: Keynote: Chelsea Finn in The Challenges of Real World Reinforcement Learning, Finn 04:40 PM

Reinforcement Learning for Real Robots

Abstract 15: Poster session 2 in The Challenges of Real World Reinforcement Learning, 06:00 PM

You can now chat to the paper authors by clicking the above Gather.town link.

Links to individual poster presentations can be found here: https://sites.google.com/corp/view/neurips2020rwrl#h.ey6lwdtrdt7c

Abstract 16: Keynote: Angela Schoellig in The Challenges of Real World Reinforcement Learning, Schoellig 06:30 PM

Machine Learning for Safety-Critical Robotics Applications

Abstract 17: Keynote: Anca Dragan in The Challenges of Real World Reinforcement Learning, Dragan 07:10 PM

Reinforcement Learning that optimizes what people really want


Self-Supervised Learning -- Theory and Practice

Pengtao Xie, Shanghang Zhang, Pulkit Agrawal, Ishan Misra, Cynthia Rudin, Abdelrahman Mohamed, Wenzhen Yuan, Barret Zoph, Laurens van der Maaten, Xingyi Yang, Eric Xing

Sat Dec 12, 08:50 AM

Self-supervised learning (SSL) is an unsupervised approach for representation learning without relying on human-provided labels. It creates auxiliary tasks on unlabeled input data and learns representations by solving these tasks. SSL has demonstrated great success on images (e.g., MoCo, PIRL, SimCLR) and texts (e.g., BERT) and has shown promising results in other data modalities, including graphs, time-series, audio, etc. On a wide variety of tasks, SSL without using human-provided labels achieves performance that is close to fully supervised approaches.

The existing SSL research mostly focuses on improving the empirical performance without a theoretical foundation. While the proposed SSL approaches are empirically effective, why they perform well is theoretically not clear. For example, why do certain auxiliary tasks in SSL perform better than others? How many unlabeled data examples are needed by SSL to learn a good representation? How is the performance of SSL affected by neural architectures?

In this workshop, we aim to bridge this gap between theory and practice. We bring together SSL-interested researchers from various domains to discuss the theoretical foundations of empirically well-performing SSL approaches and how the theoretical insights can further improve SSL's empirical performance. Different from previous SSL-related workshops which focus on the empirical effectiveness of SSL approaches without considering their theoretical foundations, our workshop focuses on establishing the theoretical foundation of SSL and providing theoretical insights for developing new SSL approaches.

We invite submissions of both theoretical works and empirical works, and the intersection of the two. The topics include but are not limited to:

- Theoretical foundations of SSL
- Sample complexity of SSL methods
- Theory-driven design of auxiliary tasks in SSL
- Comparative analysis of different auxiliary tasks
- Comparative analysis of SSL and supervised approaches
- Information theory and SSL
- SSL for computer vision, natural language processing, robotics, speech processing, time-series analysis, graph analytics, etc.
- SSL for healthcare, social media, neuroscience, biology, social science, etc.
- Cognitive foundations of SSL

In addition to invited talks by leading researchers from diverse backgrounds including CV, NLP, robotics, theoretical ML, etc., the workshop will feature poster sessions and a panel discussion to share perspectives on establishing foundational understanding of existing SSL approaches and theoretically-principled ways of developing new SSL methods. We accept submissions of short papers (up to 4 pages excluding references in NeurIPS format), which will be peer-reviewed by at least two reviewers. The accepted papers are allowed to be submitted to other conference venues.
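As a concrete reference point for the auxiliary-task framing above, the following is a minimal NumPy sketch of a contrastive InfoNCE-style objective of the kind used by SimCLR-like methods (our simplification: a random linear "encoder" and Gaussian noise standing in for a real augmentation pipeline):

import numpy as np

# Minimal contrastive-loss sketch (InfoNCE / NT-Xent style): two
# "augmented views" of the same example should embed close together
# and far from the other examples in the batch.
def info_nce(z1, z2, temperature=0.1):
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature          # pairwise similarities
    targets = np.arange(len(z1))              # positives on the diagonal
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[targets, targets].mean()

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 32))                  # a batch of inputs
W = rng.normal(size=(32, 16))                 # stand-in encoder
z1 = (x + 0.01 * rng.normal(size=x.shape)) @ W   # "view 1"
z2 = (x + 0.01 * rng.normal(size=x.shape)) @ W   # "view 2"
print(info_nce(z1, z2))  # lower when matched views align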

10:13 QA: Yejin Choi Choi 05:39 QA: Yuandong Tian Tian AM PM 10:15 Poster Session I 05:40 Panel Discussion & Closing Choi, AM PM Efros, Finn, Grauman, Le, LeCun, 11:15 Invited Talk: Jitendra Malik Malik Salakhutdinov, Xing AM 11:38 QA: Jitendra Malik Malik AM Machine Learning for Systems 11:40 Invited Talk: Jia Deng Deng AM Anna Goldie, Azalia Mirhoseini, Jonathan Raiman, 12:03 QA: Jia Deng Deng Martin Maas, Xinlei XU PM Sat Dec 12, 09:00 AM 12:05 Invited Talk: Alexei Efros Efros PM **NeurIPS 2020 Workshop on Machine Learning 12:28 QA: Alexei Efros Efros for Systems** PM 12:30 Break Website: http://mlforsystems.org/ PM 01:30 Invited Talk: Yann LeCun LeCun Submission Link: https:// PM cmt3.research.microsoft.com/MLFS2020/ 01:53 QA: Yann LeCun LeCun Submission/Index PM 01:55 Invited Talk: Kristen Grauman Important Dates: PM Grauman Submission Deadline: **October 9th, 2020** 02:18 QA: Kristen Grauman Grauman (AoE) PM Acceptance Notifcations: October 23rd, 2020 02:20 Invited Talk: Katerina Fragkiadaki Camera-Ready Submission: November 29th, 2020 PM Fragkiadaki Workshop: December 12th, 2020 02:43 QA: Katerina Fragkiadaki Fragkiadaki PM Call for Papers: 02:45 Invited Talk: Abhinav Gupta Gupta PM Machine Learning for Systems is an 03:08 QA: Abhinav Gupta Gupta interdisciplinary workshop that brings together PM researchers in computer systems and machine 03:10 Poster Session II learning. This workshop is meant to serve as a PM platform to promote discussions between 04:10 Invited Talk: Leonidas J. Guibas researchers in these target areas. PM Guibas 04:33 QA: Leonidas J. Guibas Guibas We invite submission of up to 4-page extended PM abstracts in the broad area of using machine 04:35 Invited Talk: Quoc V. Le Le learning in the design of computer systems. We PM are especially interested in submissions that move beyond using machine learning to replace 04:58 QA: Quoc V. Le Le numerical heuristics. This year, we hope to see PM novel system designs, streamlined cross-platform 05:00 Invited Talk: Chelsea Finn Finn optimization, and new benchmarks for ML for PM Systems. 05:23 QA: Chelsea Finn Finn PM Accepted papers will be made available on the 05:25 Contributed Talk: Yuandong Tian Tian workshop website, but there will be no formal PM proceedings. Authors may therefore publish their 231 Dec. 12, 2020

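As a toy illustration of the kind of problem in scope — replacing a hand-written cost heuristic with a learned model that a system can query — here is a minimal sketch. The features, timings, and the imagined autotuner that would consume the model are all synthetic and hypothetical, not drawn from any submission.

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic "profiling data": static features of a configuration
    # (e.g., loop tile sizes) and a measured runtime for each.
    features = rng.uniform(1, 64, size=(200, 3))
    runtimes = features.prod(axis=1) / 50 + rng.normal(0, 1, size=200)

    # Learned cost model: linear least squares on log-features.
    X = np.column_stack([np.ones(len(features)), np.log(features)])
    w, *_ = np.linalg.lstsq(X, runtimes, rcond=None)

    def predicted_cost(config):
        return np.concatenate(([1.0], np.log(config))) @ w

    # The system ranks candidate configurations by predicted cost
    # instead of profiling every one of them.
    candidates = rng.uniform(1, 64, size=(10, 3))
    best = min(candidates, key=predicted_cost)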
Areas of interest:

* Supervised, unsupervised, and reinforcement learning research with applications to:
  - Systems Software
  - Runtime Systems
  - Distributed Systems
  - Security
  - Compilers, data structures, and code optimization
  - Databases
  - Computer architecture, microarchitecture, and accelerators
  - Circuit design and layout
  - Interconnects and Networking
  - Storage
  - Datacenters
* Representation learning for hardware and software
* Optimization of computer systems and software
* Systems modeling and simulation
* Implementations of ML for Systems and challenges
* High-quality datasets for ML for Systems problems

Submission Instructions:

We welcome submissions of up to 4 pages (not including references). This is not a strict limit, but authors are encouraged to adhere to it if possible. All submissions must be in PDF format and should follow the NeurIPS 2020 format. Submissions do not have to be anonymized.

Please submit your paper no later than October 9th, 2020, midnight anywhere in the world, to CMT (link available soon).

Schedule

08:00 AM Poster Session & Hallway Track (gather.town)
09:00 AM Opening Remarks
10:25 AM Break
11:20 AM Q&A (Talks #1-4)
12:05 PM Break
01:40 PM Q&A (Talks #5-8)
03:05 PM Q&A (Talks #9-12)
03:15 PM Break
04:55 PM Q&A (Talks #13-17)
05:40 PM Closing Remarks


Offline Reinforcement Learning

Aviral Kumar, Rishabh Agarwal, George Tucker, Lihong Li, Doina Precup

Sat Dec 12, 09:00 AM

The common paradigm in reinforcement learning (RL) assumes that an agent frequently interacts with the environment and learns from its own collected experience. This mode of operation is prohibitive for many complex real-world problems, where repeatedly collecting diverse data is expensive (e.g., robotics or educational agents) and/or dangerous (e.g., healthcare). Offline RL instead focuses on training agents with logged data, with no further environment interaction. Offline RL promises to bring forward a data-driven RL paradigm and carries the potential to scale up end-to-end learning approaches to real-world decision-making tasks such as robotics, recommendation systems, dialogue generation, autonomous driving, healthcare systems, and other safety-critical applications. Recently, successful deep RL algorithms have been adapted to the offline setting and have demonstrated potential in a number of domains; however, significant algorithmic and practical challenges remain. The goal of this workshop is to bring attention to offline RL both from within and from outside the RL community; to discuss algorithmic challenges that need to be addressed; to discuss potential real-world applications as well as limitations and open challenges; and to come up with concrete problem statements and evaluation protocols, inspired by real-world applications, for the research community to work on.
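A minimal sketch of the setting just described: the learner sees only a fixed log of (state, action, reward, next state) transitions gathered by some behavior policy, and must produce a policy without ever querying the environment. The MDP sizes, reward, and random behavior policy below are hypothetical; real offline RL methods additionally guard against bootstrapping from state-action pairs the log never covers.

    import numpy as np

    rng = np.random.default_rng(0)
    n_states, n_actions, gamma, lr = 5, 2, 0.9, 0.1

    # A fixed log of transitions from an unknown behavior policy (here: uniform).
    # No further interaction with the environment is allowed after this point.
    states = rng.integers(n_states, size=1000)
    actions = rng.integers(n_actions, size=1000)
    log = [(s, a, float(s == 4 and a == 1), rng.integers(n_states))
           for s, a in zip(states, actions)]

    # Plain Q-learning, replaying only the logged data.
    Q = np.zeros((n_states, n_actions))
    for _ in range(200):
        for s, a, r, s_next in log:
            target = r + gamma * Q[s_next].max()
            Q[s, a] += lr * (target - Q[s, a])

    policy = Q.argmax(axis=1)   # greedy policy extracted purely from the log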

For details on submission, please visit: https://offline-rl-neurips.github.io/ (submission deadline: October 9, 11:59 pm PT).

Speakers:
Emma Brunskill (Stanford)
Finale Doshi-Velez (Harvard)
John Langford (Microsoft Research)
Nan Jiang (UIUC)
Brandyn White ( Research)
Nando de Freitas (DeepMind)

Schedule

08:50 AM Introduction Kumar, Tucker, Agarwal
09:00 AM Offline RL de Freitas
09:30 AM Q&A w/ Nando de Freitas
09:40 AM Contributed Talk 1: Offline Reinforcement Learning by Solving Derived Non-Parametric MDPs Shrestha
09:50 AM Contributed Talk 2: Chaining Behaviors from Data with Model-Free Reinforcement Learning Singh
10:00 AM Contributed Talk 3: Addressing Distribution Shift in Online Reinforcement Learning with Offline Datasets Lee, Seo, Lee
10:10 AM Contributed Talk 4: Addressing Extrapolation Error in Deep Offline Reinforcement Learning Gulcehre
10:20 AM Q/A for Contributed Talks 1
10:30 AM Poster Session 1 (gather.town)
11:20 AM Causal Structure Discovery in RL Langford
11:50 AM Q&A w/ John Langford
12:00 PM Panel Brunskill, Jiang, de Freitas, Doshi-Velez, Levine, Langford, Li, Tucker, Agarwal, Kumar
01:10 PM Learning a Multi-Agent Simulator from Offline Demonstrations White, White
01:40 PM Q&A w/ Brandyn White
01:50 PM Towards Reliable Validation and Evaluation for Offline RL Jiang
02:20 PM Q&A w/ Nan Jiang
02:30 PM Contributed Talk 5: Latent Action Space for Offline Reinforcement Learning Zhou
02:40 PM Contributed Talk 6: What are the Statistical Limits for Batch RL with Linear Function Approximation? Wang
02:50 PM Contributed Talk 7: Distilled Thompson Sampling: Practical and Efficient Thompson Sampling via Imitation Learning Daulton, Namkoong
03:00 PM Contributed Talk 8: Batch-Constrained Distributional Reinforcement Learning for Session-based Recommendation Garg
03:10 PM Q/A for Contributed Talks 2
03:20 PM Poster Session 2 (gather.town)
04:30 PM Counterfactuals and Offline RL Brunskill
05:00 PM Q&A w/ Emma Brunskill
05:10 PM Batch RL Models Built for Validation Doshi-Velez
05:40 PM Q&A w/ Finale Doshi-Velez
05:50 PM Closing Remarks

Abstracts (1):

Abstract 4: Contributed Talk 1: Offline Reinforcement Learning by Solving Derived Non-Parametric MDPs in Offline Reinforcement Learning, Shrestha 09:40 AM

Aayam Shrestha (Oregon State University)*; Stefan Lee (Oregon State University); Prasad Tadepalli (Oregon State University); Alan Fern (Oregon State University)

Deep Learning through Information Geometry

Pratik Chaudhari, Alex Alemi, Varun Jog, Dhagash Mehta, Frank Nielsen, Stefano Soatto, Greg Ver Steeg

Sat Dec 12, 09:20 AM

Attempts at understanding deep learning have come from different disciplines, namely physics, statistics, information theory, and machine learning. These lines of investigation have very different modeling assumptions and techniques; it is unclear how their results may be reconciled. This workshop builds upon the observation that Information Geometry has strong overlaps with these directions and may serve as a means to develop a holistic understanding of deep learning. The workshop program is designed to answer two specific questions. The first is: how do the geometry of the hypothesis class and the information-theoretic properties of optimization inform generalization? Good datasets have been a key propeller of the empirical success of deep networks, yet our theoretical understanding of data is poor. The second question the workshop will focus on is: how can we model data, and use that understanding, to improve optimization and generalization in the low-data regime?

Gather.Town link: https://neurips.gather.town/app/vPYEDmTHeUbkACgf/dl-info-neurips2020

Schedule

09:20 AM Opening Remarks
09:30 AM Keynote 1: Ke Sun Sun
10:15 AM Contributed Talk 1: The Volume of Non-Restricted Boltzmann Machines and Their Double Descent Model Complexity Cheema, Sugiyama
10:30 AM Contributed Talk 2: From em-Projections to Variational Auto-Encoder Han
10:45 AM Keynote 2: Marco Gori Gori
12:30 PM Keynote 3: Shun-ichi Amari Amari
01:15 PM Keynote 4: Alexander Rakhlin Rakhlin
02:15 PM Contributed Talk 3: An Information-Geometric Distance on the Space of Tasks Gao
02:30 PM Keynote 5: Gintare Karolina Dziugaite Dziugaite
03:15 PM Keynote 6: Guido Montufar Montufar
04:00 PM Contributed Talk 4: Annealed Importance Sampling with q-Paths Brekelmans
04:30 PM Panel Discussion and Closing Remarks
05:00 PM Poster Session (Gather Town)

Abstracts (4):

Abstract 3: Contributed Talk 1: The Volume of Non-Restricted Boltzmann Machines and Their Double Descent Model Complexity in Deep Learning through Information Geometry, Cheema, Sugiyama 10:15 AM

Prasad Cheema, Mahito Sugiyama

Abstract 4: Contributed Talk 2: From em-Projections to Variational Auto-Encoder in Deep Learning through Information Geometry, Han 10:30 AM

Tian Han, Jun Zhang, Ying Nian Wu

Abstract 8: Contributed Talk 3: An Information-Geometric Distance on the Space of Tasks in Deep Learning through Information Geometry, Gao 02:15 PM

Yansong Gao, Pratik Chaudhari

Abstract 11: Contributed Talk 4: Annealed Importance Sampling with q-Paths in Deep Learning through Information Geometry, Brekelmans 04:00 PM

Rob Brekelmans, Vaden Masrani, Thang D Bui, Frank Wood, Aram Galstyan, Greg Ver Steeg, Frank Nielsen
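For readers unfamiliar with the method behind this last abstract: annealed importance sampling (AIS) estimates a ratio of normalizing constants by moving samples along a path of intermediate distributions. The sketch below uses the standard geometric path pi_beta ∝ p0^(1-beta) * p1^beta between a Gaussian p0 and an unnormalized target p1; the paper's q-paths generalize this choice of path, and the distributions, schedule, and step sizes here are illustrative assumptions, not the authors' code.

    import numpy as np

    rng = np.random.default_rng(0)

    log_p0 = lambda x: -0.5 * x**2                  # N(0,1), up to a constant
    log_p1 = lambda x: -0.5 * (x - 3.0)**2 / 0.25   # unnormalized target

    betas = np.linspace(0.0, 1.0, 50)
    n = 1000
    x = rng.normal(size=n)                          # exact samples from p0
    log_w = np.zeros(n)

    for b0, b1 in zip(betas[:-1], betas[1:]):
        # Accumulate importance weights along the path.
        log_w += (b1 - b0) * (log_p1(x) - log_p0(x))
        # One Metropolis step targeting pi_{b1} keeps samples near the path.
        log_pi = lambda y: (1 - b1) * log_p0(y) + b1 * log_p1(y)
        prop = x + rng.normal(scale=0.5, size=n)
        accept = np.log(rng.uniform(size=n)) < log_pi(prop) - log_pi(x)
        x = np.where(accept, prop, x)

    # Estimate of log(Z1/Z0) for the unnormalized densities above.
    log_Z_ratio = np.logaddexp.reduce(log_w) - np.log(n)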