NIPS 2017 Workshop book

Schedule Highlights. Generated Fri Dec 18, 2020.

Workshop organizers make last-minute changes to their schedules. Download this document again to get the latest changes, or use the NIPS mobile application.

Dec. 8, 2017

101 A: Learning in the Presence of Strategic Behavior (Haghtalab, Mansour, Roughgarden, Syrgkanis, Wortman Vaughan)
101 B: Visually grounded interaction and language (Strub, de Vries, Das, Kottur, Lee, Malinowski, Pietquin, Parikh, Batra, Courville, Mary)
102 A+B: Advances in Modeling and Learning Interactions from Complex Data (Dasarathy, Kolar, Baraniuk)
102 C: 6th Workshop on Automated Knowledge Base Construction (AKBC) (Pujara, Arad, Dalvi Mishra, Rocktäschel)
102 C: Synergies in Geometric Data Analysis (TWO DAYS) (Meila, Chazal, Chen)
103 A+B: Competition track (Escalera, Weimer)
104 A: Machine Learning for Health (ML4H) - What Parts of Healthcare are Ripe for Disruption by Machine Learning Right Now? (Beam, Fiterau, Schulam, Fries, Hughes, Wiltschko, Snoek, Antropova, Ranganath, Jedynak, Naumann, Dalca, Althoff, ASTHANA, Tandon, Kandola, Ratner, Kale, Shalit, Ghassemi, Kohane)
104 B: Acting and Interacting in the Real World: Challenges in Robot Learning (Posner, Hadsell, Riedmiller, Wulfmeier, Paul)
104 C: Deep Learning for Physical Sciences (Baydin, Prabhat, Cranmer, Wood)
201 A: Machine Learning for Audio Signal Processing (ML4Audio) (Purwins, Sturm, Plumbley)
201 B: Nearest Neighbors for Modern Applications with Massive Data: An Age-old Solution with New Challenges (Chen, Shah, Lee)
202: Machine Deception (Goodfellow, Hwang, Goodman, Rodriguez)
203: Discrete Structures in Machine Learning (Singer, Bilmes, Krause, Jegelka, Karbasi)
204: Transparent and interpretable Machine Learning in Safety Critical Environments (Tosi, Vellido, Álvarez)
Grand Ballroom A: NIPS 2017 Time Series Workshop (Kuznetsov, Anava, Yang, Khaleghi)
Grand Ballroom B: Conversational AI - today's practice and tomorrow's potential (Geramifard, Williams, Heck, Glass, Bordes, Young, Tesauro)
Hall A: OPT 2017: Optimization for Machine Learning (Sra, J. Reddi, Agarwal, Recht)
Hall C: From 'What If?' To 'What Next?': Causal Inference and Machine Learning for Intelligent Decision Making (Volfovsky, Swaminathan, Toulis, Kallus, Silva, Shawe-Taylor, Joachims, Li)
Hyatt Hotel, Regency Ballroom A+B+C: Extreme Classification: Multi-class & Multi-label Learning in Extremely Large Label Spaces (Varma, Kloft, Dembczynski)
Hyatt Hotel, Regency Ballroom D+E+F+H: Learning on Distributions, Functions, Graphs and Groups (d'Alché-Buc, Muandet, Sriperumbudur, Szabó)
Hyatt Hotel, Seaview Ballroom: Machine Learning for Creativity and Design (Eck, Ha, Eslami, Dieleman, Fiebrink, Elliott)
Hyatt Hotel, Shoreline: Machine Learning and Computer Security (Steinhardt, Papernot, Li, Liu, Liang, Song)
S1: ML Systems Workshop @ NIPS 2017 (Lakshmiratan, Bird, Sen, Ré, Li, Gonzalez, Crankshaw)
S4: Machine Learning for Molecules and Materials (Chmiela, Hernández-Lobato, Schütt, Aspuru-Guzik, Tkatchenko, Ramsundar, von Lilienfeld, Kusner, Tsuda, Paige, Müller)
S5: Workshop on Worm's Neural Information Processing (WNIP) (Hasani, Zimmer, Larson, Grosu)
S7: Machine Learning for the Developing World (De-Arteaga, Herlands)
Seaside Ballroom: Advances in Approximate Bayesian Inference (Ruiz, Mandt, Zhang, McInerney, Tran, Broderick, Titsias, Blei, Welling)

Dec. 9, 2017

101 A: (Almost) 50 shades of Bayesian Learning: PAC-Bayesian trends and insights (Guedj, Germain, Bach)
101 B: Deep Learning at Supercomputer Scale (Elsen, Hafner, Stone, Saeta)
102 A+B: Machine Learning on the Phone and other Consumer Devices (Aradhye, Quinonero Candela, Prasad)
102 C: Synergies in Geometric Data Analysis (2nd day) (Meila, Chazal, Chen)
103 A+B: Medical Imaging meets NIPS (Glocker, Konukoglu, Lombaert, Bhatia)
103 C: Workshop on Prioritising Online Content (Shawe-Taylor, Pontil, Cesa-Bianchi, Yilmaz, Watkins, Riedel, Grobelnik)
104 A: Cognitively Informed Artificial Intelligence: Insights From Natural Intelligence (Mozer, Lake, Yu)
104 B: Machine Learning in Computational Biology (Zou, Kundaje, Quon, Fusi, Mostafavi)
104 C: The future of gradient-based machine learning software & techniques (Wiltschko, van Merriënboer, Lamblin)
201 A: 2017 NIPS Workshop on Machine Learning for Intelligent Transportation Systems (Li, Dragan, Niebles, Savarese)
201 B: Aligned Artificial Intelligence (Hadfield-Menell, Steinhardt, Duvenaud, Krueger, Dragan)
202: NIPS Highlights (MLTrain), Learn How to code a paper with state of the art frameworks (Dimakis, Vasiloglou, Van den Broeck, Ihler, Araki)
203: Learning Disentangled Features: from Perception to Control (Denton, Narayanaswamy, Kulkarni, Lee, Bouchacourt, Tenenbaum, Pfau)
204: BigNeuro 2017: Analyzing brain data from nano to macroscale (Dyer, Kiar, Gray Roncal, Koerding, Vogelstein)
Grand Ballroom A: Hierarchical Reinforcement Learning (Barto, Precup, Mannor, Schaul, Fox, Florensa)
Grand Ballroom B: Learning with Limited Labeled Data: Weak Supervision and Beyond (Augenstein, Bach, Belilovsky, Blaschko, Lampert, Oyallon, Platanios, Ratner, Ré)
Hall A: Deep Learning: Bridging Theory and Practice (Arora, Raghu, Salakhutdinov, Schmidt, Vinyals)
Hall C: Bayesian Deep Learning (Gal, Hernández-Lobato, Louizos, Wilson, Kingma, Ghahramani, Murphy, Welling)
Hyatt Hotel, Beacon Ballroom D+E+F+H: Workshop on Meta-Learning (Calandra, Hutter, Larochelle, Levine)
Hyatt Hotel, Regency Ballroom A+B+C: Interpreting, Explaining and Visualizing Deep Learning - Now what? (Müller, Vedaldi, Hansen, Samek, Montavon)
Hyatt Hotel, Seaview Ballroom: Optimal Transport and Machine Learning (Bousquet, Cuturi, Peyré, Sha, Solomon)
Hyatt Hotel, Shoreline: Collaborate & Communicate: An exploration and practical skills workshop that builds on the experience of AIML experts who are both successful collaborators and great communicators (Gorman)
S1: Machine Learning Challenges as a Research Tool (Guyon, Viegas, Escalera, Abernethy)
S4: Emergent Communication Workshop (Foerster, Mordatch, Lazaridou, Cho, Kiela, Abbeel)
S7: Bayesian optimization for science and engineering (Martinez-Cantin, Hernández-Lobato, Gonzalez)
Seaside Ballroom: Teaching Machines, Robots, and Humans (Cakmak, Rafferty, Singla, Zhu, Zilles)

Dec. 8, 2017

Learning in the Presence of Strategic Behavior

Nika Haghtalab, Yishay Mansour, Tim Roughgarden, Vasilis Syrgkanis, Jenn Wortman Vaughan

101 A, Fri Dec 08, 08:00 AM

Machine learning is primarily concerned with the design and analysis of algorithms that learn about an entity. Increasingly, machine learning is being used to design policies that affect the entity it once learned about. This can cause the entity to react and present a different behavior. Ignoring such interactions could lead to solutions that are ultimately ineffective in practice. For example, to design an effective ad display one has to take into account how a viewer would react to the displayed advertisements, for example by choosing to scroll through or click on them. Additionally, in many environments, multiple learners learn concurrently about one or more related entities. This can bring about a range of interactions between individual learners. For example, multiple firms may compete or collaborate on performing market research. How do the learners and entities interact? How do these interactions change the task at hand? What are some desirable interactions in a learning environment? And what are the mechanisms for bringing about such desirable interactions? These are some of the questions we would like to explore more in this workshop.

Traditionally, learning theory has adopted two extreme views in this respect: first, when learning occurs in isolation from strategic behavior, such as in the classical PAC setting where the data is drawn from a fixed distribution; second, when the learner faces an adversary whose goal is to inhibit the learning process, such as the minimax setting where the data is generated by an adaptive worst-case adversary. While these extreme perspectives have led to elegant results and concepts, such as VC dimension, Littlestone dimension, regret bounds, and more, many types of problems that we would like to solve involve strategic behaviors that do not fall into these two extremes. Examples of these problems include, but are not limited to:

1. Learning from data that is produced by agents who have a vested interest in the outcome or the learning process. For example, learning a measure of quality of universities by surveying members of academia who stand to gain or lose from the outcome, or when a GPS routing app has to learn patterns of traffic delay by routing individuals who have no interest in taking slower routes.

2. Learning a model for the strategic behavior of one or more agents by observing their interactions, for example, learning the economic demands of buyers by observing their bidding patterns when competing with other buyers.

3. Learning as a model of interactions between agents. Examples of this include applications to swarm robotics, where individual agents have to learn to interact in a multi-agent setting in order to achieve individual or collective goals.

4. Interactions between multiple learners. In many settings, two or more learners learn about the same or multiple related concepts. How do these learners interact? What are the scenarios under which they would share knowledge, information, or data? What are the desirable interactions between learners? As an example, consider multiple competing pharmaceutical firms that are learning about the effectiveness of a certain treatment. In this case, while competing firms would prefer not to share their findings, it is beneficial to society when such findings are shared. How can we incentivize these learners to perform such desirable interactions?

The main goal of this workshop is to address current challenges and opportunities that arise from the presence of strategic behavior in learning theory. This workshop aims at bringing together members of different communities, including machine learning, economics, theoretical computer science, and social computing, to share recent results, discuss important directions for future research, and foster collaborations.

Schedule

09:00 AM (Invited Talk) Yiling Chen: Learning in Strategic Data Environments. Chen
09:45 AM Strategic Classification from Revealed Preferences Dong
10:00 AM Learning in Repeated Auctions with Budgets: Regret Minimization and Equilibrium Gur
10:15 AM Spotlights Podimata, Zuo, Feng, Kim
11:00 AM (Invited Talk) Eva Tardos: Online learning with partial information for players in games. Tardos
11:45 AM (Invited Talk) Mehryar Mohri: Regret minimization against strategic buyers. Mohri
12:30 PM Lunch Break
01:50 PM (Invited Talk) Percy Liang: Learning with Adversaries and Collaborators Liang
02:35 PM Spotlights Kangasrääsiö, Everett, Liang, Cai, Wu, Muthukumar, Schmit
03:00 PM Poster Session & Coffee break
03:30 PM (Invited Talk) Alex Peysakhovich: Towards cooperative AI Peysakhovich
04:15 PM Statistical Tests of Incentive Compatibility in Display Ad Auctions Munoz
04:30 PM Optimal Economic Design through Deep Learning Parkes
04:45 PM Learning Against Non-Stationary Agents with Opponent Modeling & Deep Reinforcement Learning Everett

Abstracts (12):

Abstract 1: (Invited Talk) Yiling Chen: Learning in Strategic Data Environments. in Learning in the Presence of Strategic Behavior, Chen 09:00 AM

We live in a world where activities and interactions are recorded as data: food consumption, workout activities, buying and selling products, sharing information and experiences, borrowing and lending money, and exchanging excess resources. Scientists use the rich data of these activities to understand human social behavior, generate accurate predictions, and make policy recommendations. Machine learning traditionally takes such data as given, often treating them as independent samples from some unknown statistical distribution. However, such data are possessed or generated by potentially strategic people in the context of specific interaction rules. Hence, what data become available depends on the interaction rules. For example, people with sensitive medical conditions may not reveal their medical data in a survey but could be willing to share them when compensated; crowd workers may not put in a good-faith effort in completing a task if they know that the requester cannot verify the quality of their contributions. In this talk, I argue that a holistic view that jointly considers data acquisition and learning is important. I will discuss two projects. The first project considers acquiring data from strategic data holders who have private costs for revealing their data, and then learning from the acquired data. We provide a risk bound on learning, analogous to classic risk bounds, for situations when agents' private costs can correlate with their data in arbitrary ways. The second project leverages techniques in learning to design a mechanism for obtaining high-quality data from strategic data holders. The mechanism has a strong incentive property: it is a dominant strategy for each agent to truthfully reveal their data even if we have no ground truth to directly evaluate their contributions.

This talk is based on joint works with Jacob Abernethy, Chien-Ju Ho, Yang Liu, and Bo Waggoner.

Abstract 2: Strategic Classification from Revealed Preferences in Learning in the Presence of Strategic Behavior, Dong 09:45 AM

We study an online linear classification problem, in which the data is generated by strategic agents who manipulate their features in an effort to change the classification outcome. In rounds, the learner deploys a classifier, and an adversarially chosen agent arrives, possibly manipulating her features to optimally respond to the learner. The learner has no knowledge of the agents' utility functions or ``real'' features, which may vary widely across agents. Instead, the learner is only able to observe their ``revealed preferences'' --- i.e. the actual manipulated feature vectors they provide. For a broad family of agent cost functions, we give a computationally efficient learning algorithm that is able to obtain diminishing ``Stackelberg regret'' --- a form of policy regret that guarantees that the learner is obtaining loss nearly as small as that of the best classifier in hindsight, even allowing for the fact that agents will best-respond differently to the optimal classifier.

Jinshuo Dong, Aaron Roth, Zachary Schutzman, Bo Waggoner and Zhiwei Steven Wu
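
The Stackelberg regret mentioned here can be written out explicitly. The following is a minimal formalization in our own notation, following the abstract's verbal description rather than the authors' paper: if f_t is the classifier deployed in round t and BR_t(f) denotes the round-t agent's best response to a classifier f, then

    Regret_T = sum_{t=1..T} loss(f_t, BR_t(f_t)) - min_{f in F} sum_{t=1..T} loss(f, BR_t(f)).

The comparator f is charged the loss it would incur if agents best-responded to f itself, not to the deployed f_t; this is what makes Stackelberg regret a form of policy regret rather than ordinary external regret.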

Abstract 3: Learning in Repeated Auctions with Budgets: Regret Minimization and Equilibrium in Learning in the Presence of Strategic Behavior, Gur 10:00 AM

In online advertising markets, advertisers often purchase ad placements through bidding in repeated auctions based on realized viewer information. We study how budget-constrained advertisers may compete in such sequential auctions in the presence of uncertainty about future bidding opportunities and competition. We formulate this problem as a sequential game of incomplete information, where bidders know neither their own valuation distribution, nor the budgets and valuation distributions of their competitors. We introduce a family of practical bidding strategies we refer to as adaptive pacing strategies, in which advertisers adjust their bids according to the sample path of expenditures they exhibit. Under arbitrary competitors' bids, we establish through matching lower and upper bounds the asymptotic optimality of this class of strategies as the number of auctions grows large. When all the bidders adopt these strategies, we establish the convergence of the induced dynamics and characterize a regime (well motivated in the context of display advertising markets) under which these strategies constitute an approximate Nash equilibrium in dynamic strategies: the benefit of unilaterally deviating to other strategies, including ones with access to complete information, becomes negligible as the number of auctions and competitors grows large. This establishes a connection between regret minimization and market stability, by which advertisers can essentially follow equilibrium bidding strategies that also ensure the best performance that can be guaranteed off-equilibrium.

Yonatan Gur and Santiago Balseiro.
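
The adaptive pacing idea admits a compact illustration. The Python sketch below is ours, not the authors' code: the bid-shading rule bid = v/(1+mu) and the projected gradient update of the multiplier toward a target spend rate follow the abstract's description, but the parameter names, step size, and projection bounds are assumptions.

    # Sketch of an adaptive pacing loop for one budget-constrained bidder.
    # Assumptions (ours): step size eps, multiplier cap mu_max, and an
    # observe_spend callback returning the realized payment for a bid.

    def adaptive_pacing(values, budget, observe_spend, eps=0.01, mu_max=10.0):
        T = len(values)
        rho = budget / T                       # target per-auction spend rate
        mu = 0.0                               # pacing multiplier (dual variable)
        remaining = budget
        bids = []
        for v in values:
            bid = min(v / (1.0 + mu), remaining)   # shade value, never exceed budget left
            bids.append(bid)
            spend = observe_spend(bid)             # realized expenditure this auction
            remaining -= spend
            # raise mu when overspending relative to rho, lower it when underspending,
            # and project back onto [0, mu_max]
            mu = min(max(mu + eps * (spend - rho), 0.0), mu_max)
        return bids

For instance, observe_spend could simulate a second-price auction in which the bidder pays the highest competing bid whenever it wins; the point of the sketch is only that the bidder adapts mu from the sample path of its own expenditures.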

Abstract 4: Spotlights in Learning in the Presence of Strategic Behavior, Podimata, Zuo, Feng, Kim 10:15 AM

1. Strategyproof [...]. Yiling Chen, Chara Podimata and Nisarg Shah.
2. Incentive-Aware Learning for Large Markets. Mohammad Mahdian, Vahab Mirrokni and Song Zuo.
3. Learning to Bid Without Knowing Your Value. Zhe Feng, Chara Podimata and Vasilis Syrgkanis.
4. Budget Management Strategies in Repeated Auctions: Uniqueness and Stability of System Equilibria. Santiago Balseiro, Anthony Kim, Mohammad Mahdian and Vahab Mirrokni.
5. Dynamic Second Price Auctions with Low Regret. Vahab Mirrokni, Renato Paes Leme, Rita Ren and Song Zuo.

Abstract 5: (Invited Talk) Eva Tardos: Online learning with partial information for players in games. in Learning in the Presence of Strategic Behavior, Tardos 11:00 AM

Learning has been adopted as a general behavioral model for players in repeated games. Learning offers a way that players can adapt to a (possibly changing) environment. Learning guarantees high social welfare in many games (including traffic routing as well as online auctions), even when the game or the population of players is dynamically changing. The rate at which the game can change depends on the speed of convergence of the learning algorithm. When players observe all other participants, classical learning algorithms with full-information feedback offer very fast convergence. However, such full-information feedback is often not available, and the convergence of classical algorithms with partial feedback is much slower. In this talk we develop a black-box approach for learning where the learner observes as feedback only the losses of a subset of the actions. The simplicity and black-box nature of the approach allows us to use this faster learning as a behavioral assumption in games. Talk based on joint work with Thodoris Lykouris and Karthik Sridharan.

Abstract 6: (Invited Talk) Mehryar Mohri: Regret minimization against strategic buyers. in Learning in the Presence of Strategic Behavior, Mohri 11:45 AM

This talk presents an overview of several recent algorithms for regret minimization against strategic buyers in the context of posted-price auctions, which are crucial for revenue optimization in online advertisement.

Joint work with Andres Munoz Medina.

Abstract 8: (Invited Talk) Percy Liang: Learning with Adversaries and Collaborators in Learning in the Presence of Strategic Behavior, Liang 01:50 PM

We argue that the standard machine learning paradigm is both too weak and too strong. First, we show that current systems for image classification and reading comprehension are vulnerable to adversarial attacks, suggesting that existing learning setups are inadequate to produce systems with robust behavior. Second, we show that in an interactive learning setting where incentives are aligned, a system can learn a simple natural language from a user from scratch, suggesting that much more can be learned in a cooperative setting.

Abstract 9: Spotlights in Learning in the Presence of Strategic Behavior, Kangasrääsiö, Everett, Liang, Cai, Wu, Muthukumar, Schmit 02:35 PM

1. Multi-Agent Counterfactual Regret Minimization for Partial-Information Collaborative Games. Matthew Hartley, Stephan Zheng and Yisong Yue.
2. Inference of Strategic Behavior based on Incomplete Observation Data. Antti Kangasrääsiö and Samuel Kaski.
3. Inferring agent objectives at different scales of a complex adaptive system. Dieter Hendricks, Adam Cobb, Richard Everett, Jonathan Downing and Stephen Roberts.
4. A Reinforcement Learning Framework for Eliciting High Quality Information. Zehong Hu, Yang Liu, Yitao Liang and Jie Zhang.
5. Learning Multi-item Auctions with (or without) Samples. Yang Cai and Constantinos Daskalakis.
6. Competing Bandits: Learning under Competition. Yishay Mansour, Aleksandrs Slivkins and Zhiwei Steven Wu.
7. Robust commitments and partial reputation. Vidya Muthukumar and Anant Sahai.
8. Learning with Abandonment. Ramesh Johari and Sven Schmit.

Abstract 11: (Invited Talk) Alex Peysakhovich: Towards cooperative AI in Learning in the Presence of Strategic Behavior, Peysakhovich 03:30 PM

Social dilemmas are situations where individuals face a temptation to increase their payoffs at a cost to total welfare. Importantly, social dilemmas are ubiquitous in real-world interactions. We show how to modify modern reinforcement learning methods to construct agents that act in ways that are simple to understand, begin by cooperating, try to avoid being exploited, and are forgiving (that is, they try to return to mutual cooperation). Such agents can maintain cooperation in Markov social dilemmas with both perfect and imperfect information. Our construction does not require training methods beyond a modification of self-play; thus, if an environment is such that good strategies can be constructed in the zero-sum case (e.g. Atari), then we can construct agents that solve social dilemmas in this environment.

Abstract 12: Statistical Tests of Incentive Compatibility in Display Ad Auctions in Learning in the Presence of Strategic Behavior, Munoz 04:15 PM

Consider a buyer participating in a repeated auction in an ad exchange. How does a buyer figure out whether her bids will be used against her in the form of reserve prices? There are many potential A/B testing setups that one can use. However, we will show that many natural experimental designs have serious flaws.

For instance, one can use additive or multiplicative perturbations to the bids. We show that additive perturbations to bids can lead to paradoxical results, as reserve prices are not guaranteed to be monotone for non-MHR distributions, and thus higher bids may lead to lower reserve prices!

Similarly, one may be tempted to measure bid influence on reserves by randomly perturbing one's bids. However, unless the perturbations are aligned with the partitions used by the seller to compute optimal reserve prices, the results are guaranteed to be inconclusive.

Finally, in practice additional market considerations play a large role---if the optimal reserve price is further constrained by the seller to satisfy additional business logic, the power of the buyer to detect the extent to which his bids are being used against him is limited.

In this work we develop tests that a buyer can use to measure the impact of current bids on reserve prices. In addition, we analyze the cost of running such experiments, exposing trade-offs between test accuracy, cost, and underlying market dynamics. We validate our results with experiments on real-world data and show that a buyer can detect reserve price optimization done by the seller at a reasonable cost.

Andres Munoz Medina, Sebastien Lahaie, Sergei Vassilvitskii and Balasubramanian Sivan

Abstract 13: Optimal Economic Design through Deep Learning in Learning in the Presence of Strategic Behavior, Parkes 04:30 PM

Designing an auction that maximizes expected revenue is an intricate task. Despite major efforts, only the single-item case is fully understood. We explore the use of tools from deep learning on this topic. The design objective is revenue-optimal, dominant-strategy incentive compatible auctions. For a baseline, we show that multi-layer neural networks can learn almost-optimal auctions for a variety of settings for which there are analytical solutions, even without encoding characterization results into the design of the network. Looking ahead, deep learning has promise for deriving auctions with high revenue for poorly understood problems.

Paul Duetting, Zhe Feng, Harikrishna Narasimhan, and David Parkes

Abstract 14: Learning Against Non-Stationary Agents with Opponent Modeling & Deep Reinforcement Learning in Learning in the Presence of Strategic Behavior, Everett 04:45 PM

Humans, like all animals, both cooperate and compete with each other. Through these interactions we learn to observe, act, and manipulate to maximize our utility function, and continue doing so as others learn with us. This is a decentralized non-stationary learning problem, where to survive and flourish an agent must adapt to the gradual changes of other agents as they learn, as well as capitalize on future sudden shifts in their behavior. To date, the majority of the work in deep multi-agent reinforcement learning has focused on only one of these types of adaptations. In this paper, we introduce the Switching Agent Model (SAM) as a way of dealing with both types of non-stationarity through the combination of opponent modeling and deep multi-agent reinforcement learning.

Richard Everett

Visually grounded interaction and language

Florian Strub, Harm de Vries, Abhishek Das, Satwik Kottur, Stefan Lee, Mateusz Malinowski, Olivier Pietquin, Devi Parikh, Dhruv Batra, Aaron Courville, Jeremie Mary

101 B, Fri Dec 08, 08:00 AM

Everyday interactions require a common understanding of language, i.e. for people to communicate effectively, words (for example ‘cat’) should invoke similar beliefs over physical concepts (what cats look like, the sounds they make, how they behave, what their skin feels like, etc.). However, how this ‘common understanding’ emerges is still unclear.

One appealing hypothesis is that language is tied to how we interact with the environment. As a result, meaning emerges by ‘grounding’ language in modalities in our environment (images, sounds, actions, etc.).

Recent concurrent works in machine learning have focused on bridging visual and natural language understanding through visually-grounded language learning tasks, e.g. through natural images (Visual Question Answering, Visual Dialog) or through interactions with virtual physical environments. In cognitive science, progress in fMRI enables creating a semantic atlas of the cerebral cortex, or decoding semantic information from visual input. And in psychology, recent studies show that a baby's most likely first words are based on their visual experience, laying the foundation for a new theory of infant language acquisition and learning.

As the grounding problem requires an interdisciplinary attitude, this workshop aims to gather researchers with broad expertise in various fields — machine learning, computer vision, natural language, neuroscience, and psychology — to discuss their cutting-edge work as well as perspectives on future directions in this exciting space of grounding and interactions.

We will accept papers related to:
— language acquisition or learning through interactions
— visual captioning, dialog, and question-answering
— reasoning in language and vision
— visual synthesis from language
— transfer learning in language and vision tasks
— navigation in virtual worlds with natural-language instructions
— machine translation with visual cues
— novel tasks that combine language, vision and actions
— understanding and modeling the relationship between language and vision in humans
— semantic systems and modeling of natural language and visual stimuli representations in the human brain

Important dates
------
Submission deadline: 3rd November 2017
Extended submission deadline: 17th November 2017
Acceptance notification (first deadline): 10th November 2017
Acceptance notification (second deadline): 24th November 2017
Workshop: 8th December 2017

Paper details
------
— Contributed papers may include novel research, preliminary results, extended abstracts, positional papers or surveys
— Papers are limited to 4 pages, excluding references, in the latest camera-ready NIPS format: https://nips.cc/Conferences/2017/PaperInformation/StyleFiles
— Papers published at the main conference can be submitted without reformatting
— Please submit via email: [email protected]

Accepted papers
------
— All accepted papers will be presented during 2 poster sessions
— Up to 5 accepted papers will be invited to deliver short talks
— Accepted papers will be made publicly available as non-archival reports, allowing future submissions to archival conferences and journals

Invited Speakers
------
Raymond J. Mooney - University of Texas
Sanja Fidler - University of Toronto
Olivier Pietquin - DeepMind
Jack Gallant - University of California, Berkeley
Devi Parikh - Georgia Tech / FAIR
Felix Hill - DeepMind
Chen Yu - Indiana University

Schedule

08:30 AM Welcome!
08:45 AM Visually Grounded Language: Past, Present, and Future... Mooney
09:30 AM Connecting high-level semantics with low-level vision Fidler
10:15 AM Break + Poster (1) Chaplot, MA, Brodeur, Matsuo, Kobayashi, Shinagawa, Yoshino, Guo, Murdoch, Mysore Sathyendra, Ricks, Zhang, Peterson, Zhang, Mironenco, Anderson, Johnson, Yoo, Barzdins, Zaidi, Andrews, Witteveen, OOTA, Vijayaraghavan, Wang, Zhu, Liepins, Quinn, Raj, Cartillier, Chu, Caballero, Obermeyer
10:40 AM The interface between vision and language in the human brain? Gallant
11:25 AM Towards Embodied Question Answering Parikh
02:00 PM Dialogue systems and RL: interconnecting language, vision and rewards Pietquin
02:45 PM Accepted Papers
03:15 PM Break + Poster (2)
03:40 PM Grounded Language Learning in a Simulated 3D World Hill
04:25 PM How infants learn to speak by interacting with the visual world? Yu
05:10 PM Panel Discussion Hill, Pietquin, Gallant, Mooney, Fidler, Yu, Parikh


Advances in Modeling and Learning Interactions from Complex Data

Gautam Dasarathy, Mladen Kolar, Richard Baraniuk

102 A+B, Fri Dec 08, 08:00 AM

Whether it is biological networks of proteins and genes or technological ones like sensor networks and the Internet, we are surrounded today by complex systems composed of entities interacting with and affecting each other. An urgent need has therefore emerged for developing novel techniques for modeling, learning, and conducting inference in such networked systems. Consequently, we have seen progress from a variety of disciplines in both fundamental methodology and in applications of such methods to practical problems. However, much work remains to be done, and a unifying and principled framework for dealing with these problems remains elusive. This workshop aims to bring together theoreticians and practitioners in order to both chart out recent advances and to discuss new directions in understanding interactions in large and complex systems. NIPS, with its attendance by a broad and cross-disciplinary set of researchers, offers the ideal venue for this exchange of ideas.

The workshop will feature a mix of contributed talks, contributed posters, and invited talks by leading researchers from diverse backgrounds working in these areas. We will also have a specific segment of the schedule reserved for the presentation of open problems, and will have plenty of time for discussions where we will explicitly look to spark off collaborations amongst the attendees.

We encourage submissions in a variety of topics including, but not limited to:
* Computationally and statistically efficient techniques for learning graphical models from data including convex, greedy, and active approaches.
* New probabilistic models of interacting systems including nonparametric and exponential family graphical models.
* Community detection algorithms including semi-supervised and adaptive approaches.
* Techniques for modeling and learning causal relationships from data.

* Bayesian techniques for modeling complex data and causal relationships.
* Kernel methods for directed and undirected graphical models.
* Applications of these methods in various areas like sensor networks, computer networks, social networks, and biological networks like phylogenetic trees and graphs.

Successful submissions will emphasize the role of statistical and computational learning to the problem at hand. The author(s) of these submissions will be invited to present their work as either a poster or as a contributed talk. Alongside these, we also solicit submissions of open problems that go with the theme of the workshop. The author(s) of the selected open problems will be able to present the problem to the attendees and solicit feedback/collaborations.

Schedule

08:00 AM Opening Remarks Dasarathy
08:05 AM Community Detection and Invariance to Distribution Bresler
08:40 AM Edge Exchangeable Temporal Network Models Ng
09:00 AM A Data-Driven Sparse-Learning Approach to Model Reduction in Chemical Reaction Networks HARIRCHI
09:20 AM Poster Spotlights
10:00 AM Coffee Break + Posters
11:00 AM Your dreams may come true with MTP2 Uhler
11:35 AM Estimating Mixed Memberships with Sharp Eigenvector Deviations Mao
12:00 PM Lunch Break
02:00 PM Recovering Latent Causal Relations from Time Series Data Kiyavash
02:35 PM Learning High-Dimensional DAGs: Provable Statistical Guarantees and Scalable Approximation
04:00 PM Conditional Densities and Efficient Models in Infinite Exponential Families Gretton
04:35 PM The Expxorcist: Nonparametric Graphical Models Via Conditional Exponential Densities Suggala
04:55 PM Mathematical and Computational challenges in Reconstructing Evolution Warnow

Abstracts (9):

Abstract 2: Community Detection and Invariance to Distribution in Advances in Modeling and Learning Interactions from Complex Data, Bresler 08:05 AM

We consider the problem of recovering a hidden community of size K from a graph where edges between members of the community have label X drawn i.i.d. according to P and all other edges have labels drawn i.i.d. according to Q. The information limits for this problem were characterized by Hajek-Wu-Xu in 2016 in terms of the KL-divergence between P and Q. We complement their work by showing that for a broad class of distributions P and Q the computational difficulty is also determined by the KL-divergence. We additionally show how to reduce general P and Q to the case P = Ber(p) and Q = Ber(q) and vice versa, giving a direct computational equivalence (up to polynomial time).
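
To fix notation, the observation model in this abstract can be written compactly (our notation, following the abstract's description):

    L_ij ~ P  i.i.d.  if i, j in S,        L_ij ~ Q  i.i.d.  otherwise,

where S is the hidden community of size K among n vertices and L_ij is the label of edge (i, j). Both the statistical threshold and, for the class of distributions covered by this result, the computational threshold for recovering S are then governed by the Kullback-Leibler divergence D(P || Q).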

Abstract 3: Edge Exchangeable Temporal Network Models in Advances in Modeling and Learning Interactions from Complex Data, Ng 08:40 AM

We propose a dynamic edge exchangeable network model that can capture sparse connections observed in real temporal networks, in contrast to existing models which are dense. The model achieved superior link prediction accuracy when compared to a dynamic variant of the blockmodel, and is able to extract interpretable time-varying community structures. In addition to sparsity, the model accounts for the effect of social influence on vertices' future behaviours. Compared to the dynamic blockmodels, our model has a smaller latent space. The compact latent space requires a smaller number of parameters to be estimated in variational inference and results in a computationally friendly inference algorithm.

Abstract 4: A Data-Driven Sparse-Learning Approach to Model Reduction in Chemical Reaction Networks in Advances in Modeling and Learning Interactions from Complex Data, HARIRCHI 09:00 AM

In this paper, we propose an optimization-based sparse learning approach to identify the set of most influential reactions in a chemical reaction network. This reduced set of reactions is then employed to construct a reduced chemical reaction mechanism, which is relevant to chemical interaction network modeling. The problem of identifying influential reactions is first formulated as a mixed-integer quadratic program, and then a relaxation method is leveraged to reduce the computational complexity of our approach. Qualitative and quantitative validation of the sparse encoding approach demonstrates that the model captures important network structural properties with moderate computational load.

Abstract 7: Your dreams may come true with MTP2 in Advances in Modeling and Learning Interactions from Complex Data, Uhler 11:00 AM

We study maximum likelihood estimation for exponential families that are multivariate totally positive of order two (MTP2). Such distributions appear in the context of ferromagnetism in the Ising model and in various latent models, as for example Brownian motion tree models used in phylogenetics. We show that maximum likelihood estimation for MTP2 exponential families is a convex optimization problem. For quadratic exponential families such as Ising models and Gaussian graphical models, we show that MTP2 implies sparsity of the underlying graph without the need of a tuning parameter. In addition, we characterize a subgraph and a supergraph of Gaussian graphical models under MTP2. Moreover, we show that the MLE always exists even in the high-dimensional setting. These properties make MTP2 constraints an intriguing alternative to methods for learning sparse graphical models such as the graphical lasso.
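
For readers unfamiliar with the condition, MTP2 can be stated in one line (this is the standard definition; the notation is ours, not the abstract's): a density p is multivariate totally positive of order two if

    p(x ∨ y) p(x ∧ y) >= p(x) p(y)   for all x, y,

where ∨ and ∧ denote the coordinatewise maximum and minimum. In the Gaussian case this is equivalent to the precision matrix being an M-matrix (all off-diagonal entries nonpositive), which is one way to see why the constraint induces graph sparsity without a tuning parameter.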

Abstract 8: Estimating Mixed Memberships with Sharp Eigenvector Deviations in Advances in Modeling and Learning Interactions from Complex Data, Mao 11:35 AM

Real-world networks often have nodes belonging to multiple communities. We consider the detection of overlapping communities under the popular Mixed Membership Stochastic Blockmodel (MMSB). Using the inherent geometry of this model, we link the inference of overlapping communities to the problem of finding corners in a noisy rotated and scaled simplex, for which consistent algorithms exist. We use this as a building block for our algorithm to infer the community memberships of each node, and prove its consistency. As a byproduct of our analysis, we derive sharp row-wise eigenvector deviation bounds, and provide a cleaning step that improves the performance drastically for sparse networks. We also propose both necessary and sufficient conditions for identifiability of the model, while existing methods typically present sufficient conditions. The empirical performance of our method is shown using simulated and real datasets scaling up to 100,000 nodes.

Abstract 10: Recovering Latent Causal Relations from Time Series Data in Advances in Modeling and Learning Interactions from Complex Data, Kiyavash 02:00 PM

Discovering causal relationships from data is a challenging problem that is exacerbated when some of the variables of interest are latent. In this talk, we discuss the problem of learning the support of the transition matrix between random processes in a Vector Autoregressive (VAR) model from samples when a subset of the processes are latent. It is well known that ignoring the effect of the latent processes may lead to very different estimates of the influences even among observed processes. We are not only interested in identifying the influences among the observed processes, but also aim at learning those between the latent ones, and those from the latent to the observed ones. We show that the support of the transition matrix among the observed processes and the lengths of all latent paths between any two observed processes can be identified successfully under some conditions on the VAR model. Our results apply to both non-Gaussian and Gaussian cases, and experimental results on various synthetic and real-world datasets validate our theoretical findings.

Abstract 12: Conditional Densities and Efficient Models in Infinite Exponential Families in Advances in Modeling and Learning Interactions from Complex Data, Gretton 04:00 PM

The exponential family is one of the most powerful and widely used classes of models in statistics. A method was recently developed to fit this model when the natural parameter and sufficient statistic are infinite dimensional, using a score matching approach. The infinite exponential family is a natural generalisation of the finite case, much like the Gaussian and Dirichlet processes generalise their respective finite models. In this talk, I'll describe two recent results which make this model more applicable in practice, by reducing the computational burden and improving performance for high-dimensional data. The first is a Nyström-like approximation to the full solution. We prove that this approximate solution has the same consistency and convergence rates as the full-rank solution (exactly in Fisher distance, and nearly in other distances), with guarantees on the degree of cost and storage reduction. The second result is a generalisation of the model family to the conditional case, again with consistency guarantees. In experiments, the conditional model generally outperforms a competing approach with consistency guarantees, and is competitive with a deep conditional density model on datasets that exhibit abrupt transitions and heteroscedasticity.

Abstract 13: The Expxorcist: Nonparametric Graphical Models Via Conditional Exponential Densities in Advances in Modeling and Learning Interactions from Complex Data, Suggala 04:35 PM

Non-parametric multivariate density estimation faces strong statistical and computational bottlenecks, and the more practical approaches impose near-parametric assumptions on the form of the density functions. In this paper, we leverage recent developments to propose a class of non-parametric models which have very attractive computational and statistical properties.

Abstract 14: Mathematical and Computational challenges in Reconstructing Evolution in Advances in Modeling and Learning Interactions from Complex Data, Warnow 04:55 PM

Reconstructing evolutionary histories is a basic step in much biological discovery, as well as in historical linguistics and other domains. Inference methods based on mathematical models of evolution have been used to make substantial advances, including in understanding the early origins of life, predicting protein structures and functions, and addressing questions such as "Where did the Indo-European languages begin?" In this talk, I will describe the current state of the art in phylogeny estimation in these domains, what is understood from a mathematical perspective, and identify fascinating open problems where novel mathematical research - drawing from graph theory, algorithms, and probability theory - is needed. This talk will be accessible to mathematicians, computer scientists, and probabilists, and does not require any knowledge in biology.


6th Workshop on Automated Knowledge Base Construction (AKBC)

Jay Pujara, Dor Arad, Bhavana Dalvi Mishra, Tim Rocktäschel

102 C, Fri Dec 08, 08:00 AM

Extracting knowledge from text, images, audio, and video and translating these extractions into a coherent, structured knowledge base (KB) is a

task that spans the areas of machine learning, natural language processing, computer vision, databases, search, and artificial intelligence. Over the past two decades, machine learning techniques used for information extraction, graph construction, and automated knowledge base construction have evolved from simple rule learning to end-to-end neural architectures, with papers on the topic consistently appearing at NIPS. Hence, we believe this workshop will appeal to NIPS attendees and be a valuable contribution.

Furthermore, there has been significant interest and investment in knowledge base construction in both academia and industry in recent years. Most major internet companies and many startups have developed knowledge bases that power digital assistants (e.g. Siri, Alexa, Google Now) or provide the foundations for search and discovery applications. A similarly abundant set of knowledge systems have been developed at top universities such as Stanford (DeepDive), Carnegie Mellon (NELL), the University of Washington (OpenIE), the University of Mannheim (DBpedia), and the Max Planck Institut Informatik (YAGO, WebChild), among others. Our workshop serves as a forum for researchers working on knowledge base construction in both academia and industry.

With this year's workshop we would like to continue the successful tradition of the previous five AKBC workshops. AKBC fills a unique need in the field, bringing together industry leaders and academic researchers. Our workshop is focused on stellar invited talks from high-profile speakers who identify the pressing research areas where current methods fall short and propose visionary approaches that will lead to the next generation of knowledge bases. Our workshop prioritizes a participatory environment where attendees help identify the most promising research, contribute to surveys on controversial questions, and suggest debate topics for speaker panels.

In addition, for the first time, AKBC will address a longstanding issue in AKBC, that of equitable comparison and evaluation across methods, by including a shared evaluation platform, Stanford's KBP Online (https://kbpo.stanford.edu/), which will allow crowdsourced labels for KBs without strong assumptions about the data or methods used. Together, this slate of high-profile research talks, outstanding contributed papers, an interactive research environment, and a novel evaluation service will ensure AKBC is a popular addition to the NIPS program.

Schedule

09:00 AM Challenges and Innovations in Building a Product Knowledge Graph Dong
09:30 AM End-to-end Learning for Broad Coverage Semantics: SRL, Coreference, and Beyond Zettlemoyer
10:00 AM Graph Convolutional Networks for Extracting and Modeling Relational Data Titov
10:30 AM Poster Session - Session 1
11:30 AM Learning Hierarchical Representations of Relational Data Nickel
12:00 PM Reading and Reasoning with Neural Program Interpreters Riedel
02:00 PM Multimodal KB Extraction and Completion Singh
02:30 PM Go for a Walk and Arrive at the Answer: Reasoning Over Knowledge Bases with Reinforcement Learning
02:45 PM Multi-graph Affinity Embeddings for Multilingual Knowledge Graphs
03:00 PM A Study of Automatically Acquiring Explanatory Inference Patterns from Corpora of Explanations: Lessons from Elementary Science Exams
03:15 PM NELL: Lessons and Future Directions Mitchell
03:45 PM Poster Session - Session 2 Rawat, Joulin, Jansen, Lee, Chen, Xu, Verga, Juba, Dumitrache, Jat, Logan, Sridhar, Yang, Das, Pezeshkpour, Monath
04:45 PM Discussion Panel and Debate
05:45 PM Closing Remarks

Abstracts (7):

Workshop on Automated Knowledge Base gains, including over 20% relative error Construction (AKBC), Dong 09:00 AM reductions when compared to non-neural methods. I will also discuss our frst steps Knowledge graphs have been used to support a towards scaling the amount of data such wide range of applications and enhance search methods can be trained on by many orders of results for multiple major search engines, such as magnitude, including semi- Google and Bing. At Amazon we are building a via contextual word embeddings and supervised Product Graph, an authoritative knowledge graph learning through crowdsourcing. Our hope is that for all products in the world. The thousands of these advances, when combined, will enable very product verticals we need to model, the vast high quality semantic analysis in any domain number of data sources we need to extract from easily gathered supervision. knowledge from, the huge volume of new products we need to handle every day, and the various applications in Search, Discovery, Abstract 3: Graph Convolutional Networks for Personalization, Voice, that we wish to support, Extracting and Modeling Relational Data in all present big challenges in constructing such a graph. 6th Workshop on Automated Knowledge Base Construction (AKBC), Titov 10:00 AM In this talk we describe four scientifc directions Graph Convolutional Networks (GCNs) is an we are investigating in building and using such a efective tool for modeling graph structured data. graph, namely, harvesting product knowledge We investigate their applicability in the context of from the web, hands-of-the-wheel knowledge both extracting semantic relations from text integration and cleaning, human-in-the-loop (specifcally, semantic role labeling) and knowledge learning, and graph mining and graph- modeling relational data (link prediction). For enhanced search. This talk will present our semantic role labeling, we introduce a version of progress to achieve near-term goals in each GCNs suited to modeling syntactic dependency direction, and show the many research graphs and use them as sentence encoders. opportunities towards our moon-shot goals. Relying on these linguistically-informed encoders, we achieve the best reported scores on standard benchmarks for Chinese and English. For link Abstract 2: End-to-end Learning for Broad prediction, we propose Relational GCNs (RGCNs), Coverage Semantics: SRL, Coreference, and GCNs developed specifcally to deal with highly Beyond in 6th Workshop on Automated multi-relational data, characteristic of realistic Knowledge Base Construction (AKBC), knowledge Zettlemoyer 09:30 AM bases. By explicitly modeling neighbourhoods of entities, RGCNs accumulate evidence over Deep learning with large supervised training sets multiple inference steps in relational graphs and has had signifcant impact on many research yield competitive results on standard link challenges, from speech recognition to machine prediction benchmarks. translation. However, applying these ideas to problems in computational semantics has been Joint work with Diego Marcheggiani, Michael difcult, at least in part due to modest dataset Schlichtkrull, Thomas Kipf, Max Welling, Rianna sizes and relatively complex van den Berg and Peter Bloem. tasks.

In this talk, I will present two recent results on end-to-end deep learning for classic challenge Abstract 5: Learning Hierarchical problems in computational semantics: semantic Representations of Relational Data in 6th role labeling and coreference resolution. In both Workshop on Automated Knowledge Base Construction (AKBC), Nickel 11:30 AM cases, we will introduce relative simple deep neural network approaches that use no Representation learning has become an preprocessing (e.g. no POS tagger or syntactic invaluable approach for making statistical parser) and achieve signifcant performance inferences from relational data. However, while 16 Dec. 8, 2017

complex relational datasets often exhibit a latent Knowledge Base Construction (AKBC), Singh hierarchical structure, state-of-the-art embedding 02:00 PM methods typically do not account for this Existing pipelines for constructing KBs primarily property. In this talk, I will introduce a novel approach to learning such hierarchical support a restricted set of representations of symbolic data by embedding data types, such as focusing on the text of the them into hyperbolic space -- or more precisely documents when extracting information, ignoring the various modalities of evidence that we into an n-dimensional Poincaré ball. I will discuss regularly encounter, such as images, semi- how the underlying hyperbolic geometry allows us to learn parsimonious representations which structured tables, video, and audio. Similarly, simultaneously capture hierarchy and similarity. approaches that reason over incomplete and Furthermore, I will show that Poincaré uncertain KBs are limited to basic entity-relation graphs, ignoring the diversity of data types that embeddings can outperform Euclidean are useful for relational reasoning, such as text, embeddings signifcantly on data with latent hierarchies, both in terms of representation images, and numerical attributes. In this work, capacity and in terms of generalization ability. we present a novel AKBC pipeline that takes the frst steps in combining textual and relational evidence with other sources like numerical, image, and tabular data. We focus on two tasks: Abstract 6: Reading and Reasoning with single entity attribute extraction from documents Neural Program Interpreters in 6th and relational knowledge graph completion. For Workshop on Automated Knowledge Base each, we introduce new datasets that contain Construction (AKBC), Riedel 12:00 PM multimodal information, propose benchmark evaluations, and develop models that build upon We are getting better at teaching end-to-end neural models how to answer questions about advances in deep neural encoders for diferent content in natural language text. However, data types. progress has been mostly restricted to extracting answers that are directly stated in text. In this talk, I will present our work towards teaching Abstract 11: NELL: Lessons and Future machines not only to read, but also to reason Directions in 6th Workshop on Automated with what was read and to do this in a Knowledge Base Construction (AKBC), interpretable and controlled fashion. Our main Mitchell 03:15 PM hypothesis is that this can be achieved by the The Never Ending Language Learner (NELL) development of neural abstract machines that follow the blueprint of program interpreters for research project has produced a computer real-world programming languages. We test this program that has been running continuously idea using two languages: an imperative (Forth) since January 2010, learning to build a large knowledge base by extracting structured beliefs and a declarative (Prolog/Datalog) one. In both (e.g., PersonFoundedCompany(Gates,Microsoft), cases we implement diferentiable interpreters that can be used for learning reasoning patterns. BeverageServedWithBakedGood(tea,crumpets)) Crucially, because they are based on from unstructured text on the web. This talk will interpretable host languages, the interpreters provide an update on new NELL research results, refect on the lessons learned from this efort, also allow users to easily inject prior knowledge and discuss specifc challenges for future systems and inspect the learnt patterns. 
Abstract 11: NELL: Lessons and Future Directions in 6th Workshop on Automated Knowledge Base Construction (AKBC), Mitchell 03:15 PM

The Never Ending Language Learner (NELL) research project has produced a computer program that has been running continuously since January 2010, learning to build a large knowledge base by extracting structured beliefs (e.g., PersonFoundedCompany(Gates,Microsoft), BeverageServedWithBakedGood(tea,crumpets)) from unstructured text on the web. This talk will provide an update on new NELL research results, reflect on the lessons learned from this effort, and discuss specific challenges for future systems that attempt to build large knowledge bases automatically.


Synergies in Geometric Data Analysis (TWO DAYS)

Marina Meila, Frederic Chazal, Yu-Chia Chen

102 C, Fri Dec 08, 08:00 AM

This two-day workshop will bring together researchers from the various subdisciplines of Geometric Data Analysis (GDA), such as manifold learning, topological data analysis, and shape analysis; it will showcase recent progress in this field and establish directions for future research. The focus will be on high-dimensional and big data, and on mathematically founded methodology.

Specific aims
======
One aim of this workshop is to build connections between Topological Data Analysis (TDA) on one side and Manifold Learning on the other. This is starting to happen, after years of more or less separate evolution of the two fields. The moment has been reached when the mathematical, statistical and algorithmic foundations of both areas are mature enough -- it is now time to lay the foundations for joint topological and differential geometric understanding of data, and this workshop will explicitly focus on this process.

The second aim is to bring GDA closer to real applications. We see the challenge of real problems and real data as a motivator for researchers to explore new research questions, to reframe and expand the existing theory, and to step out of their own sub-area. In particular, for people in GDA to see TDA and ML as one.

The impact of GDA in practice also depends on having scalable implementations of the most current results in theory. This workshop will showcase the GDA tools which achieve this and initiate a collective discussion about the tools that need to be built.

We intend this workshop to be a forum for researchers in all areas of Geometric Data Analysis. Through the tutorials, we are reaching out to the wider NIPS audience, to the many potential users of Geometric Data Analysis, to make them aware of the state of the art in GDA, and of the tools available. Last but not least, we hope that the scientists invited will bring these methods back to their communities.

Schedule

Day 1 (Fri Dec 8):
08:10 AM Supervised learning of labeled pointcloud differences via cover-tree entropy reduction Harer
09:10 AM Estimating the Reach of a Manifold Aamari
09:40 AM Multiscale geometric feature extraction Polonik
10:10 AM Poster spotlights
10:15 AM Maximum likelihood estimation of Riemannian metrics from Euclidean data Arvanitidis
10:15 AM Parallel multi-scale reduction of persistent homology Mendoza Smith
10:15 AM A dual framework for low rank tensor completion Nimishakavi
10:30 AM Coffee break
11:00 AM Persistent homology of KDE filtration of Rips complexes Shin, Rinaldo
11:30 AM Characterizing non-linear methods using Laplacian-like operators Ting
01:30 PM Poster session I
02:00 PM Multiscale characterization of molecular dynamics Clementi
03:30 PM Functional Data Analysis using a Topological Summary Statistic: the Smooth Euler Characteristic Transform Crawford
04:00 PM Consistent manifold representation for TDA Sauer
05:00 PM Discussion: Geometric Data Analysis Chazal, Meila

Day 2 (Sat Dec 9):
08:30 AM Topological Data Analysis with GUDHI and scalable manifold learning and clustering with megaman Rouvreau, Meila
08:50 AM Introduction to the R package TDA KIM
09:20 AM Riemannian metric estimation and the problem of isometric embedding Perrault-Joncas
09:50 AM Ordinal distance comparisons: from topology to geometry von Luxburg
10:20 AM Coffee break
10:50 AM Geometric Data Analysis software
11:30 AM Poster session II
02:00 PM Modal-sets, and density-based Clustering Kpotufe
02:30 PM A Note on Community Trees in Networks Chen
03:30 PM Beyond Two-sample-tests: Localizing Data Discrepancies in High-dimensional Spaces Cazals

Abstracts (15):

Abstract 1: Supervised learning of labeled Abstract 6: Parallel multi-scale reduction of pointcloud diferences via cover-tree persistent homology in Synergies in entropy reduction in Synergies in Geometric Geometric Data Analysis (TWO DAYS), Data Analysis (TWO DAYS), Harer 08:10 AM Mendoza Smith 10:15 AM

We introduce a new algorithm, called CDER, for Persistent homology is a mathematical formalism supervised machine learning that merges the based on algebraic topology and is central to multi-scale geometric properties of Cover Trees Topological Data Analysis (TDA). Its paradigm with the information-theoretic properties of consists in estimating the topological features of entropy. CDER applies to a training set of labeled a shape embedded in an Euclidean space from a pointclouds embedded in a common Euclidean point-cloud sampled from it. The estimation is space. If typical pointclouds corresponding to done at multiple scales by reducing a, so-called, distinct labels tend to difer at any scale in any boundary matrix which is a sparse binary matrix sub-region, CDER can identify these diferences in that encodes a simplicial complex fltration built linear time, creating a set of distributional from the point-cloud. The reduction process is coordinates which act as a feature extraction similar to Gaussian elimination and represents an mechanism for supervised learning. We describe important computational bottleneck in the the use of CDER both directly on point clouds and pipeline. To improve the scalability of the TDA on persistence diagrams. framework, several strategies to accelerate it have been proposed. Herein, we present a number of structural dependencies in boundary Abstract 2: Estimating the Reach of a matrices and use them to design a novel parallel Manifold in Synergies in Geometric Data reduction algorithm. In particular, we show that Analysis (TWO DAYS), Aamari 09:10 AM this structural information: (i) makes part of the reduction immediately apparent, (ii) decreases Various problems in manifold estimation make the total number of column operations required use of a quantity called the reach, denoted by for reduction, (iii) gives a framework for which τM, which is a measure of the regularity of the the process can be massively parallelised. manifold. This paper is the frst investigation into Simulations on synthetic examples show that the the problem of how to estimate the reach. First, computational burden can be conducted in a we study the geometry of the reach through an small fraction of the number of iterations needed 19 Dec. 8, 2017

by traditional methods. Moreover, whereas the way to estimate the target quantity. In practice, traditional methods reveal barcodes sequentially however, calculating the persistent homology of from a fltration order, this approach gives an KDEs on d-dimensional Euclidean spaces requires alternative method by which barcodes are partly to approximate the ambient space to a grid, revealed for multiple scales simultaneously and which could be computationally inefcient when further refned as the algorithm progresses. the dimension of the ambient space is high or Specifcally, our numerical experiments show that topological features are in diferent scales. In this for a Vietoris-Rips fltration with 10^4 simplices, abstract, we consider the persistent homologies the essential topological information can be of KDE fltrations on Rips complexes as estimated with 95% precision in two iterations alternative estimators. We show consistency and that the reduction completed to within 1% in results for both the persistent homology of the about ten iterations of our algorithm as opposed upper level sets fltration of the density and its to nearly approximately eight thousand iterations simplifed version. We also describe a novel for traditional methods. methodology to construct an asymptotic confdence set based on the bootstrap procedure. Unlike existing procedures, our method does not heavily rely on grid-approximations, scales to Abstract 7: A dual framework for low rank tensor completion in Synergies in higher dimensions, and is adaptive to Geometric Data Analysis (TWO DAYS), heterogeneous topological features. Nimishakavi 10:15 AM

We propose a novel formulation of the low-rank Abstract 10: Characterizing non-linear tensor completion problem that is based on the dimensionality reduction methods using duality theory and a particular choice of low-rank Laplacian-like operators in Synergies in regularizer. This low-rank regularizer along with Geometric Data Analysis (TWO DAYS), Ting the dual perspective provides a simple 11:30 AM characterization of the solution to the tensor completion problem. Motivated by large-scale (note: the talk is 30 mins, but the server has setting, we next derive a rank-constrained problems with 12:00 noon) reformulation of the proposed optimization problem, which is shown to lie on the Riemannian We examine a number of non-linear spectrahedron manifold. We exploit the versatile dimensionality reduction techniques including Riemannian optimization framework to develop Laplacian Eigenmaps, LLE, MVU, HLLE, LTSA, and computationally efcient conjugate gradient and t-SNE. In each case we show that the non-linear trust-region algorithms. The experiments confrm embedding can be characterized by a Laplacian the benefts of our choice of regularization and or Laplacian-like operator. By comparing the the proposed algorithms outperform state-of-the- resulting operators, one can uncover the art algorithms on several real-world data sets in similarities and diferences between the methods. diferent applications. For example, HLLE and LTSA can be shown to be asymptotically identical, and whilst maximum variance unfolding (MVU) can be shown to generate a Laplacian, the behavior of the Abstract 9: Persistent homology of KDE fltration of Rips complexes in Synergies in Laplacian is completely diferent from that Geometric Data Analysis (TWO DAYS), Shin, generated by Laplacian Eigenmaps. We discuss Rinaldo 11:00 AM the implications of this characterization for generating new non-linear dimensionality When we observe a point cloud in the Euclidean reduction methods and smoothness penalties. space, the persistent homology of the upper level sets fltration of the density is one of the most important tools to understand topological Abstract 13: Functional Data Analysis using a features of the data generating distribution. The Topological Summary Statistic: the Smooth persistent homology of KDEs (kernel density Euler Characteristic Transform, in Synergies estimators) for the density function is a natural 20 Dec. 8, 2017

in Geometric Data Analysis (TWO DAYS), Abstract 17: Introduction to the R package Crawford 03:30 PM TDA in Synergies in Geometric Data Analysis (TWO DAYS), KIM 08:50 AM Lorin Crawford1,2,3,†, Anthea Monod4,†, Andrew X. Chen4, Sayan Mukherjee5,6,7,8, and Rau ́l We present a short tutorial and introduction to Rabad ́an4 using the R package TDA, which provides some tools for Topological Data Analysis. In particular, it includes implementations of functions that, given some data, provide topological information about Abstract 15: Discussion: Geometric Data Analysis in Synergies in Geometric Data the underlying space, such as the distance Analysis (TWO DAYS), Chazal, Meila 05:00 PM function, the distance to a measure, the kNN density estimator, the kernel density estimator, One aim of this workshop is to build connections and the kernel distance. The salient topological between Topological Data Analysis on one side features of the sublevel sets (or superlevel sets) and Manifold Learning on the other. The moment of these functions can be quantifed with has been reached when the mathematical, persistent homology. We provide an R interface statistical and algorithmic foundations of both for the efcient algorithms of the C++ libraries areas are mature enough -- it is now time to lay GUDHI, Dionysus, and PHAT, including a function the foundations for joint topological and for the persistent homology of the Rips fltration, diferential geometric understanding of data, and and one for the persistent homology of sublevel this discussion will explicitly focus on this sets (or superlevel sets) of arbitrary functions process. evaluated over a grid of points. The signifcance of the features in the resulting persistence The second aim is to bring GDA closer to real diagrams can be analyzed with functions that applications. We see the challenge of real implement the methods discussed in Fasy, Lecci, problems and real data as a motivator for Rinaldo, Wasserman, Balakrishnan, and Singh researchers to explore new research questions, to (2014), Chazal, Fasy, Lecci, Rinaldo, and reframe and expand the existing theory, and to Wasserman (2014c) and Chazal, Fasy, Lecci, step out of their own sub-area. Michel, Rinaldo, and Wasserman (2014a). The R package TDA also includes the implementation of an algorithm for density clustering, which allows Abstract 16: Topological Data Analisys with us to identify the spatial organization of the probability mass associated to a density function GUDHI and scalable manifold learning and and visualize it by means of a dendrogram, the clustering with megaman in Synergies in Geometric Data Analysis (TWO DAYS), cluster tree. Rouvreau, Meila 08:30 AM

Presentation and demo of the Gudhi library for Abstract 21: Geometric Data Analysis Topological Data Analysis, followed by a software in Synergies in Geometric Data presentation of the magaman package. Analysis (TWO DAYS), 10:50 AM

The aim of the presentations will be give an We invite the GDA community to discuss the introduction for beginners into the practical side goods, the bads and the ways forward in the of GDA, and to give an overview of the software software for GDA. capabilities. The presenters will leave ample time for questions and will be available during poster [NOTE THE START TIME: 10 minutes before the sessions for more detailed discussions and ofcial end of the break] demos. http://gudhi.gforge.inria.fr/ Abstract 23: Modal-sets, and density-based http://github.com/mmp2/megaman Clustering in Synergies in Geometric Data Analysis (TWO DAYS), Kpotufe 02:00 PM 21 Dec. 8, 2017

Modes or Modal-sets are points or regions of Competition track space where the underlying data density is locally-maximal. They are relevant in problems Sergio Escalera, Markus Weimer such as clustering, outlier detection, or can simply serve to identify salient structures in high- 103 A+B, Fri Dec 08, 08:00 AM dimensional data (e.g. point-cloud data from medical imaging, astronomy, etc). This is the frst NIPS edition on "NIPS Competitions". We received 23 competition In this talk we will argue that modal-sets, as proposals related to data-driven and live general extremal surfaces, yield more stable competitions on diferent aspects of NIPS. clustering than usual modes (extremal points) of Proposals were reviewed by several qualifed a density. For one, modal-sets can be consistently researchers and experts in challenge estimated, at non-trivial convergence rates, organization. Five top-scored competitions were despite the added complexity of unknown accepted to be run and present their results surface-shape and dimension. Furthermore, during the NIPS 2017 Competition track day. modal-sets neatly dovetail into existing Evaluation was based on the quality of data, approaches that cluster data around point- problem interest and impact, promoting the modes, yielding competitive, yet more stable design of new models, and a proper schedule and clustering on a range of real-world problems. managing procedure. Below, you can fnd the fve accepted competitions. Organizers and participants in these competitions will be invited to present their work to this workshop, to be held Abstract 24: A Note on Community Trees in on December 8th. Networks in Synergies in Geometric Data Analysis (TWO DAYS), Chen 02:30 PM Accepted competitions: We introduce the concept of community trees The Conversational Intelligence Challenge that summarizes topological structures within a Webpage: http://convai.io network. A community tree is a tree structure Classifying Clinically Actionable Genetic representing clique communities from the clique Mutations Webpage: https://www.kaggle.com/c/ percolation method (CPM). The community tree msk-redefning-cancer-treatment also generates a persistent diagram. Community Learning to Run Webpage: https:// trees and persistent diagrams reveal topological www.crowdai.org/challenges/nips-2017-learning- structures of the underlying networks and can be to-run used as visualization tools. We study the stability Human-Computer Question Answering of community trees and derive a quantity called Competition Webpage: http://sites.google.com/ the total star number (TSN) that presents an view/hcqa/ upper bound on the change of community trees. Adversarial Attacks and Defences Webpage: Our fndings provide a topological interpretation https://www.kaggle.com/nips-2017-adversarial- for the stability of communities generated by the learning-competition CPM. Schedule

Abstract 25: Beyond Two-sample-tests: 08:15 Opening Escalera, Weimer Localizing Data Discrepancies in High- AM dimensional Spaces in Synergies in Geometric Data Analysis (TWO DAYS), Cazals 08:30 AI XPRIZE Milestone award 03:30 PM AM Banifatemi, khalaf, McGregor 09:00 Competition I: Adversarial Attacks https://hal.inria.fr/hal-01159235/document AM and Defenses Kurakin, Goodfellow, Bengio, Zhao, Dong, Pang, Liao, Xie, Ganesh, Elibol 22 Dec. 8, 2017

10:00 Cofee Break and Competition I: Abstract 3: Competition I: Adversarial AM Adversarial Attacks and Defenses Attacks and Defenses in Competition track, posters Kurakin, Goodfellow, Bengio, Zhao, Dong, Pang, 10:30 Competition II: Learning to Run Liao, Xie, Ganesh, Elibol 09:00 AM AM Kidziński, Ong, Mohanty, Fries, Hicks, * Introduction into adversarial examples. Invited Zheng, Yuan, Plis speaker, Dawn Song 12:15 Lunch * Overview of the competition, Alexey Kurakin, PM Ian Goodfellow 01:15 DeepArt competition Ecker, Gatys, * Winner of attack competition, Yinpeng Dong, PM Bethge Fangzhou Liao, Tianyu Pang 01:30 Competition III: The Conversational * Winner of defense competition, Yinpeng Dong, PM Intelligence Challenge Burtsev, Lowe, Fangzhou Liao, Tianyu Pang Serban, Bengio, Rudnicky, Black, * 2nd place defense competition, Cihang Xie Prabhumoye, Rodichev, Smetanin, Fedorenko, Lee, HONG, Lee, Kim, Gontier, Saito, Gershfeld, Burachenok Abstract 5: Competition II: Learning to Run in 03:00 Cofee break and Competition III: Competition track, Kidziński, Ong, Mohanty, PM The Conversational Intelligence Fries, Hicks, Zheng, Yuan, Plis 10:30 AM Challenge posters 03:30 Competition IV: Classifying Clinically * Overview of the Challenge PM Actionable Genetic Mutations * Keynote: Carmichael Ong Macgari, Grigorenko, Das, Thorbergsson, * Challenge logistics (CrowdAI platform) Kan, Zhang, Chen, Li, Kumar, Gao * Run top submissions on the fnal test 05:15 Break environment PM * Talks from top participants 05:45 Competition V: Human-Computer * AWS & NVIDIA sponsored prize ceremony PM Question Answering Boyd-Graber, * Keynote: Sergey Levine Daumé III, He, Iyyer, Rodriguez 07:30 Closing Escalera, Weimer PM Abstract 7: DeepArt competition in 07:30 Dinner Competition track, Ecker, Gatys, Bethge 01:15 PM PM

Competition review, results, and award ceremony Abstracts (6):

Abstract 2: AI XPRIZE Milestone award in Abstract 8: Competition III: The Competition track, Banifatemi, khalaf, Conversational Intelligence Challenge in McGregor 08:30 AM Competition track, Burtsev, Lowe, Serban, Bengio, Rudnicky, Black, Prabhumoye, Rodichev, • Overview of the Challenge Smetanin, Fedorenko, Lee, HONG, Lee, Kim, Gontier, Saito, Gershfeld, Burachenok 01:30 PM • Overview of top 10 competitors * Overview of the Challenge * Awarding prize • Presentation by top 2 competitors * Short presentation by winning team * Inspirational talk: Alexander Rudnicky • Winners recognition on stage Abstract: Current automatic systems use limited models of conversation with humans. Historically • Keynote: XPRIZE scientifc Advisor research in dialog systems has focused on goal- directed interaction for which objective measures of success could be defned. More recently 23 Dec. 8, 2017

attention has turned to so-called open-domain helped form the foundations of a new research conversation chatbots, similar to social community. exchanges between people but for which measures of success are not yet agreed upon. On This year’s program emphasizes identifying refection, natural human communication is previously unidentifed problems in healthcare mostly a combination of task, social and likely that the machine learning community hasn't other levels. The challenge is to understand how addressed, or seeing old challenges through a to manage such communication in a way that new lens. While healthcare and medicine are refects human capabilities. often touted as prime examples for disruption by * Panel Discussion. Moderated by Julian Serban. AI and machine learning, there has been Panelists: Alexander Rudnicky, Dilek Hakkani-Tur, vanishingly little evidence of this disruption to Jianfeng Gao and Joelle Pineau. date. To interested parties who are outside of the medical establishment (e.g. machine learning researchers), the healthcare system can appear Abstract 12: Competition V: Human-Computer byzantine and impenetrable, which results in a high barrier to entry. In this workshop, we hope to Question Answering in Competition track, reduce this activation energy by bringing Boyd-Graber, Daumé III, He, Iyyer, Rodriguez 05:45 PM together leaders at the forefront of both machine learning and healthcare for a dialog on areas of * Overview of the Challenge medicine that have immediate opportunities for * Talks from participants machine learning. Attendees at this workshop will * Competition against human team quickly gain an understanding of the key * Q&A with organizers, experts, developers problems that are unique to healthcare and how machine learning can be applied to addressed these challenges.

Machine Learning for Health (ML4H) - The workshop will feature invited talks from What Parts of Healthcare are Ripe for leading voices in both medicine and machine learning. A key part of our workshop is the Disruption by Machine Learning Right clinician pitch; a short presentation of open Now? clinical problems where data-driven solutions can make an immediate diference. This year’s Andrew Beam, Andrew Beam, Madalina Fiterau, program will also include spotlight presentations Madalina Fiterau, Peter Schulam, Peter Schulam, Jason Fries, Jason Fries, Mike Hughes, Mike Hughes, and two poster sessions highlighting novel Alex Wiltschko, Alex Wiltschko, Jasper Snoek, Jasper research contributions at the intersection of Snoek, Natalia Antropova, Natalia Antropova, Rajesh machine learning and healthcare. The workshop Ranganath, Rajesh Ranganath, Bruno Jedynak, Bruno will conclude with an interactive a panel Jedynak, Tristan Naumann, Tristan Naumann, Adrian discussion where all speakers respond to Dalca, Adrian Dalca, Adrian Dalca, Adrian Dalca, Tim Althof, Tim Althof, Shubhi ASTHANA, Shubhi questions provided by the audience. ASTHANA, Prateek Tandon, Prateek Tandon, Jaz Kandola, Jaz Kandola, Alexander Ratner, Alexander Ratner, David Kale, David Kale, Uri Shalit, Uri Shalit, Schedule Marzyeh Ghassemi, Marzyeh Ghassemi, Isaac S Kohane, Isaac S Kohane 08:00 Welcome and opening remarks 104 A, Fri Dec 08, 08:00 AM AM 08:20 Keynote: Zak Kohane, Harvard DBMI The goal of the NIPS 2017 Machine Learning for AM Kohane Health Workshop (ML4H) is to foster 08:55 Jennifer Chayes, Microsoft Research collaborations that meaningfully impact medicine AM New England Chayes by bringing together clinicians, health data 09:20 Keynote: Susan Murphy, U. Michigan experts, and machine learning researchers. We AM Murphy aim to build on the success of the last two NIPS ML4H workshops which were widely attended and 24 Dec. 8, 2017

09:55 Contributed spotlights 05:25 Keynote: Atul Butte Butte AM PM 10:20 Cofee break and Poster Session I 06:00 Closing Remarks AM Khandwala, Gallant, Way, Raghu, Shen, PM Gasimova, Bozkurt, Boag, Lopez- Martinez, Bodenhofer, Nasiri GhoshehBolagh, Guo, Kurz, Pillay, Perros, Abstracts (3): Chen, Yahi, Sushil, Purushotham, Abstract 7: Invited clinical panel in Machine Tutubalina, Virdi, Schulz, Weisenthal, Learning for Health (ML4H) - What Parts of Srikishan, Veličković, Ahuja, Miller, Craig, Healthcare are Ripe for Disruption by Ji, Dabek, Pou-Prom, Zhang, Kalyanam, Machine Learning Right Now?, Velazquez, Weng, Bhat, Chen, Kohl, Gao, Zhu, Poh, Priest, strigo 10:50 AM Urteaga, Honoré, De Palma, Al-Shedivat, Rajpurkar, McDermott, Chen, Sui, Lee, Susann Beier, U. Auckland Cheng, Fang, Hussain, Furlanello, Waks, James Priest, Stanford Chougrad, Kjellstrom, Doshi-Velez, Irina Strigo, UCSF Fruehwirt, Zhang, Hu, Chen, Park, Enrique Velazquez, Rady Children’s Hospital Mikelsons, Dakka, Hyland, chevaleyre, Lee, Giro-i-Nieto, Kale, Hughes, Erion, Mehra, Zame, Trajanovski, Chakraborty, Abstract 9: Interactive panel in Machine Peterson, Srivastava, Jin, Tejeda Lemus, Learning for Health (ML4H) - What Parts of Ray, Madl, Futoma, Gong, Ahmad, Lei, Healthcare are Ripe for Disruption by Legros Machine Learning Right Now?, 01:30 PM 10:50 Invited clinical panel Velazquez, Priest, AM strigo Interactive panel moderated by Zak Kohane: 11:50 Keynote II: Fei-Fei Li, Stanford Fei-Fei - Atul Butte AM - Jennifer Chayes 01:30 Interactive panel - Fei-Fei Li PM - Jill Mesirov - Susan Murphy 02:30 Jill Mesirov, UCSD Mesirov - Mustafa Sulyman PM 02:55 Greg Corrado, Google Corrado PM 03:20 Cofee break and Poster Session II Abstract 13: Award session + A word from PM Kane, Haque, Papalexakis, Guibas, Li, our afliates in Machine Learning for Health Arias, Nalisnick, Smyth, Rudzicz, Zhu, (ML4H) - What Parts of Healthcare are Ripe Willke, Elhadad, Rafauf, Suresh, Varma, for Disruption by Machine Learning Right Yue, Rudovic, Foschini, Ahmad, Haq, Now?, 03:50 PM Maggio, Jurman, Parbhoo, Bashivan, Award session and a word from afliates Islam, Musolesi, Wu, Ratner, Dunnmon, - Eunho Yang, KAIST, Korea Esteban, Galstyan, Ver Steeg, - Sung-ju Hwang,UNIST, Korea Khachatrian, Górriz, van der Schaar, representing AItrics, DeepMind, IBM, Google, MSR Nemchenko, Patwardhan, Tandon 03:50 Award session + A word from our PM afliates 04:10 Mihaela Van Der Schaar, Oxford Acting and Interacting in the Real World: PM Challenges in Robot Learning 04:35 Networking Break

PM Ingmar Posner, Raia Hadsell, Martin Riedmiller, 05:00 Jure Leskovec, Stanford Leskovec Markus Wulfmeier, Rohan Paul PM 104 B, Fri Dec 08, 08:00 AM 25 Dec. 8, 2017

In recent years robotics has made signifcant reinforcement learning for real world applications, strides towards applications of real value to the learning from demonstration for real world public domain. Robots are now increasingly applications, knowledge learning from expected to work for and alongside us in observation and interaction, active concept complex, dynamic environments. Machine acquisition and learning causal models. learning has been a key enabler of this success, particularly in the realm of robot perception where, due to substantial overlap with the Schedule machine vision community, methods and training data can be readily leveraged. 08:50 Welcome AM Recent advances in reinforcement learning and 09:00 Pieter Abbeel: Reducing Data Needs learning from demonstration — geared towards AM for Real-World Reinforcement teaching agents how to act — provide a Learning tantalising glimpse at a promising future 09:30 Pierre Sermanet: Self-Supervised trajectory for robot learning. Mastery of AM Imitation Sermanet challenges such as the Atari suite and AlphaGo 11:30 Raquel Urtasun: Deep Learning for build signifcant excitement as to what our robots AM Self-Driving Cars Urtasun may be able to do for us in the future. However, this success relies on the ability of learning 02:00 Jitendra Malik: Vision for cheaply, often within the confnes of a virtual PM Manipulation and Navigation Malik environment, by trial and error over as many 02:30 Martial Hebert: Reducing episodes as required. This presents a signifcant PM supervision Hebert challenge for embodied systems acting and 03:00 Poster Session Bruce, Quillen, interacting in the real world. Not only is there a PM Rakicevic, Chua, Schenck, Chien, cost (either monetary or in terms of execution Babaeizadeh, Wichers, yan, Wohlhart, time) associated with a particular trial, thus Ibarz, Konolige limiting the amount of training data obtainable, but there also exist safety constraints which Abstracts (6): make an exploration of the state space simply unrealistic: teaching a real robot to cross a real Abstract 2: Pieter Abbeel: Reducing Data road via reinforcement learning for now seems a Needs for Real-World Reinforcement noble yet somewhat far fetched goal. A Learning in Acting and Interacting in the signifcant gulf therefore exists between prior art Real World: Challenges in Robot on teaching agents to act and efective Learning, 09:00 AM approaches to real-world robot learning. This, we posit, is one of the principal impediments at the Reinforcement learning and imitation learning moment in advancing real-world robotics science. have seen success in many domains, including autonomous helicopter fight, Atari, simulated In order to bridge this gap researchers and locomotion, Go, robotic manipulation. However, practitioners in robot learning have to address a of these methods remains number of key challenges to allow real-world very high. In this talk I will present several ideas systems to be trained in a safe and data-efcient towards reducing sample complexity: (i) manner. This workshop aims to bring together Hindsight Experience Replay, which infuses experts in reinforcement learning, learning from learning signal into (traditionally) zero-reward demonstration, deep learning, feld robotics and runs, and is compatible with existing of-policy beyond to discuss what the principal challenges algorithms; (ii) Some recent advances in Model- are and how they might be addressed. 
With a based Reinforcement Learning, which achieve particular emphasis on data efcient learning, of 100x sample complexity gain over the more particular interest will be contributions in widely studied model-free methods; (iii) Meta- representation learning, curriculum learning, task Reinforcement Learning, which can signifcantly transfer, one-shot learning, domain transfer (in reduce sample complexity by building of other particular from simulation to real-world tasks), skills acquired in the past; (iv) Domain 26 Dec. 8, 2017

Randomization, a simple idea that can often Interacting in the Real World: Challenges in enable training fully in simulation, yet still Robot Learning, Urtasun 11:30 AM recover policies that perform well in the real Raquel Urtasun is the Head of Uber ATG Toronto. world. She is also an Associate Professor in the Department of Computer Science at the University of Toronto, a Raquel Urtasun is the Abstract 3: Pierre Sermanet: Self-Supervised Head of Uber ATG Toronto. She is also an Imitation in Acting and Interacting in the Associate Professor in the Department of Real World: Challenges in Robot Learning, Computer Science at the University of Toronto, a Sermanet 09:30 AM Canada Research Chair in Machine Learning and We propose a self-supervised approach for Computer Vision and a co-founder of the Vector Institute for AI. Prior to this, she was an Assistant learning representations and robotic behaviors Professor at the Toyota Technological Institute at entirely from unlabeled videos recorded from multiple viewpoints. We study how these Chicago (TTIC), an academic computer science representations can be used in two robotic institute afliated with the University of Chicago. imitation settings: imitating object interactions She was also a visiting professor at ETH Zurich during the spring semester of 2010. She received from videos of humans, and imitating human her Bachelors degree from Universidad Publica de poses. Imitation of human behavior requires a viewpoint-invariant representation that captures Navarra in 2000, her Ph.D. degree from the the relationships between end-efectors (hands or Computer Science department at Ecole robot grippers) and the environment, object Polytechnique Federal de Lausanne (EPFL) in 2006 and did her postdoc at MIT and UC Berkeley. attributes, and body pose. We train our She is a world leading expert in machine representations using a triplet loss, where multiple simultaneous viewpoints of the same perception for self-driving cars. Her research observation are attracted in the embedding interests include machine learning, computer space, while being repelled from temporal vision, robotics and remote sensing. Her lab was selected as an NVIDIA NVAIL lab. She is a neighbors which are often visually similar but recipient of an NSERC EWR Steacie Award, an functionally diferent. This signal causes our model to discover attributes that do not change NVIDIA Pioneers of AI Award, a Ministry of across viewpoint, but do change across time, Education and Innovation Early Researcher while ignoring nuisance variables such as Award, three Google Faculty Research Awards, an Amazon Faculty Research Award, a Connaught occlusions, motion blur, lighting and background. New Researcher Award and two Best Paper We demonstrate that this representation can be used by a robot to directly mimic human poses Runner up Prize awarded at the Conference on without an explicit correspondence, and that it Computer Vision and Pattern Recognition (CVPR) can be used as a reward function within a in 2013 and 2017 respectively. She is also an Editor of the International Journal in Computer reinforcement learning algorithm. While Vision (IJCV) and has served as Area Chair of representations are learned from an unlabeled collection of task-related videos, robot behaviors multiple machine learning and vision conferences such as pouring are learned by watching a single (i.e., NIPS, UAI, ICML, ICLR, CVPR, ECCV). 3rd-person demonstration by a human. 
Reward functions obtained by following the human demonstrations under the learned representation Abstract 5: Jitendra Malik: Vision for enable efcient reinforcement learning that is Manipulation and Navigation in Acting and practical for real-world robotic systems. Video Interacting in the Real World: Challenges in results, open-source code and dataset are Robot Learning, Malik 02:00 PM available at https://sermanet.github.io/imitate I will describe recent results from my group on visually guided manipulation and navigation. We are guided considerably by insights from human Abstract 4: Raquel Urtasun: Deep Learning development and cognition. In manipulation, our for Self-Driving Cars in Acting and work is based on object-oriented task models 27 Dec. 8, 2017

acquired by experimentation. In navigation, we Learning Robotic Assembly from CAD show the benefts of architectures based on cognitive maps and landmarks. Learning Robot Skill Embeddings

Self-Supervised Visual Planning with Temporal Skip Connections Abstract 6: Martial Hebert: Reducing supervision in Acting and Interacting in the Real World: Challenges in Robot Learning, Overcoming Exploration in Reinforcement Hebert 02:30 PM Learning with Demonstrations

A key limitation, in particular for computer vision Deep Reinforcement Learning for Vision-Based tasks, is their reliance on vast amounts of Robotic Grasping strongly supervised data. This limits scalability, prevents rapid acquisition of new concepts, and Soft Value Iteration Networks for Planetary Rover limits adaptability to new tasks or new Path Planning conditions. To address this limitation, I will explore ideas in learning visual models from limited data. The basic insight behind all of these ideas is that it is possible to learn from a large Posters: corpus of vision tasks how to learn models for One-Shot Visual Imitation Learning via Meta- new tasks with limited data, by representing the Learning way visual models vary across tasks, also called model dynamics. The talk will also show One-Shot Reinforcement Learning for Robot examples from common visual classifcation Navigation with Interactive Replay < Jake Bruce; tasks. Niko Suenderhauf; Piotr Mirowski; Raia Hadsell; Michael Milford >

Abstract 7: Poster Session in Acting and Bayesian Active Edge Evaluation on Expensive Interacting in the Real World: Challenges in Graphs < Sanjiban Choudhury > Robot Learning, Bruce, Quillen, Rakicevic, Sim-to-Real Transfer of Accurate Grasping with Chua, Schenck, Chien, Babaeizadeh, Wichers, Eye-In-Hand Observations and Continuous yan, Wohlhart, Ibarz, Konolige 03:00 PM Control < Mengyuan Yan; Iuri Frosio*; Stephen Spotlights: Tyree; Kautz Jan > Deep Object-Centric Representations for Generalizable Robot Learning < Learning Robotic Manipulation of Granular Media Coline Devin> < Connor Schenck*; Jonathan Tompson; Dieter Fox; Sergey Levin> Using Simulation and Domain Adaptation to Improve Efciency of Deep Robotic Grasping End-to-End Learning of Semantic Grasping < Eric Jang > Learning Deep Composable Maximum-Entropy Policies for Real-World Robotic Manipulation Self-supervised Deep Reinforcement Learning with Generalized Computation Graphs for Robot SE3-Pose-Nets: Structured Deep Dynamics Navigation Models for Visuomotor Control Efcient Robot Task Learning and Transfer via Learning Flexible and Reusable Locomotion Informed Search in Movement Parameter Space Primitives for a Microrobot < Nemanja Rakicevic*; Kormushev Petar >

Policy Search using Robust Bayesian Optimization Metrics for Deep Generative Models based on Learned Skills 28 Dec. 8, 2017

Unsupervised Hierarchical Video Prediction < world problems in physical sciences (including Nevan wichers*; Dumitru Erhan; Honglak Lee > the felds and subfelds of astronomy, chemistry, Earth science, and physics). Imitation from Observation: Learning to Imitate Behaviors from Raw Video via Context Translation We will discuss research questions, practical implementation challenges, performance / scaling, and unique aspects of processing and Domain Randomization and Generative Models analyzing scientifc datasets. The target audience for Robotic Grasping comprises members of the machine learning community who are interested in scientifc Learning to Grasp from Vision and Touch applications and researchers in the physical sciences. By bringing together these two Neural Network Dynamics Models for Control of communities, we expect to strengthen dialogue, Under-actuated Legged Millirobots introduce exciting new open problems to the wider NIPS community, and stimulate production On the Importance of Uncertainty for Control with of new approaches to solving science problems. Deep Dynamics Models Invited talks from leading individuals from both communities will cover the state-of-the-art Increasing Sample-Efciency via Online Meta- techniques and set the stage for this workshop. Learning

Stochastic Variational Video Prediction Schedule

(Author information copied from CMT please 08:50 Introduction and opening remarks contact the workshop organisers under AM [email protected] for any changes) 09:00 Invited talk 1: Deep recurrent AM inverse modeling for radio astronomy and fast MRI imaging Welling Deep Learning for Physical Sciences 09:40 Contributed talk 1: Neural Message AM Passing for Jet Physics Henrion Atilim Gunes Baydin, Mr. Prabhat, Kyle Cranmer, Frank Wood 10:00 Contributed talk 2: A Foray into AM Using Neural Network Control 104 C, Fri Dec 08, 08:00 AM Policies For Rapid Switching Between Beam Parameters in a Free Physical sciences span problems and challenges Electron Laser Edelen at all scales in the universe: from fnding 10:20 Poster session 1 and cofee break exoplanets and asteroids in trillions of sky-survey AM Hagge, McGregor, Stoye, Pham, Hong, pixels, to automatic tracking of extreme weather Farbin, Seo, Zoghbi, George, Fort, Farrell, phenomena in climate datasets, to detecting Pajot, Pearson, McCarthy, Germain, anomalies in event streams from the Large Anderson, Lezcano Casado, Mudigonda, Hadron Collider at CERN. Tackling a number of Nachman, de Oliveira, Jing, Li, Kim, associated data-intensive tasks, including, but Gebhard, Zahavy not limited to, regression, classifcation, 11:00 Invited talk 2: Adversarial Games for clustering, dimensionality reduction, likelihood- AM Particle Physics Louppe free inference, generative models, and 11:40 Contributed talk 3: Implicit Causal experimental design are critical for furthering AM Models for Genome-wide Association scientifc discovery. The Deep Learning for Studies Tran Physical Sciences (DLPS) workshop invites 12:00 Contributed talk 4: Graphite: researchers to contribute papers that PM Iterative Generative Modeling of demonstrate progress in the application of Graphs Grover machine and deep learning techniques to real- 29 Dec. 8, 2017

12:20 Sponsor presentation: Intel Nervana models in high dimensional spaces, problems PM Tang associated with data-poor domains, adversarial 12:25 Lunch break examples, high computational requirements, and PM research driven by companies using large in- 02:00 Invited talk 3: Learning priors, house datasets that is ultimately not PM likelihoods, or posteriors Murray reproducible. 02:40 Contributed talk 5: Deep Learning ML4Audio aims to promote progress, PM for Real-time Gravitational Wave systematization, understanding, and convergence Detection and Parameter Estimation of applying machine learning in the area of audio with Real LIGO Data George signal processing. Specifcally, we are interested 03:00 Poster session 2 and cofee break in work that demonstrates novel applications of PM McGregor, Hagge, Stoye, Pham, Hong, machine learning techniques to audio data, as Farbin, Seo, Zoghbi, George, Fort, Farrell, well as methodological considerations of merging Pajot, Pearson, McCarthy, Germain, machine learning with audio signal processing. Anderson, Lezcano Casado, Mudigonda, We seek contributions in, but not limited to, the Nachman, de Oliveira, Jing, Li, Kim, following topics: Gebhard, Zahavy - audio information retrieval using machine 04:00 Invited talk 4: A machine learning learning; PM perspective on the many-body - audio synthesis with given contextual or musical problem in classical and quantum constraints using machine learning; physics Carrasquilla - audio source separation using machine learning; 04:40 Invited talk 5: Quantum Machine - audio transformations (e.g., sound morphing, PM Learning von Lilienfeld style transfer) using machine learning; 05:20 Contributed talk 6: Physics-guided - , online learning, one-shot PM Learning of Neural Networks: An learning, reinforcement learning, and incremental Application in Lake Temperature learning for audio; Modeling Karpatne - applications/optimization of generative 05:40 Panel session Murray, Welling, adversarial networks for audio; PM Carrasquilla, von Lilienfeld, Louppe, - cognitively inspired machine learning models of Cranmer sound cognition; 06:40 Closing remarks - mathematical foundations of machine learning PM for audio signal processing.

This workshop especially targets researchers, developers and musicians in academia and Machine Learning for Audio Signal industry in the area of MIR, audio processing, Processing (ML4Audio) hearing instruments, speech processing, musical HCI, musicology, music technology, music Hendrik Purwins, Bob L. Sturm, Mark Plumbley entertainment, and composition.

201 A, Fri Dec 08, 08:00 AM ML4Audio Organisation Committee: Abstracts and full papers: http://media.aau.dk/ Hendrik Purwins, Aalborg University Copenhagen, smc/ml4audio/ Denmark ([email protected]) Bob L. Sturm, Queen Mary University of London, Audio signal processing is currently undergoing a UK ([email protected]) paradigm change, where data-driven machine Mark Plumbley, University of Surrey, UK learning is replacing hand-crafted feature design. ([email protected]) This has led some to ask whether audio signal processing is still useful in the "era of machine Program Committee: learning." There are many challenges, new and Abeer Alwan (University of California, Los old, including the interpretation of learned Angeles) Jon Barker (University of Shefeld) 30 Dec. 8, 2017

Sebastian Böck (Johannes Kepler University Linz) 11:30 Compact Mads Græsbøll Christensen (Aalborg University) AM based on Tensor Train for Polyphonic Maximo Cobos (Universitat de Valencia) Music Modeling Sakti Sander Dieleman (Google DeepMind) 11:50 Singing Voice Separation using Monika Dörfer (University of Vienna) AM Generative Adversarial Networks Shlomo Dubnov (UC San Diego) Choi, Lee Philippe Esling (IRCAM) 12:10 Audio Cover Song Identifcation Cédric Févotte (IRIT) PM using Convolutional Neural Network Emilia Gómez (Universitat Pompeu Fabra) Chang, Lee Emanuël Habets (International Audio Labs 12:30 Lunch Break Erlangen) PM Jan Larsen (Danish Technical University) 01:30 Polyphonic piano transcription using Marco Marchini (Spotify) PM deep neural networks Eck Rafael Ramirez (Universitat Pompeu Fabra) 02:00 Deep learning for music Gaël Richard (TELECOM ParisTech) PM recommendation and generation Fatemeh Saki (UT Dallas) Dieleman Sanjeev Satheesh (Baidu SVAIL) Jan Schlüter (Austrian Research Institute for 02:30 Exploring Ad Efectiveness using Artifcial Intelligence) PM Acoustic Features Prockup, Vahabi Joan Serrà (Telefonica) 03:00 Poster Session Music and Malcolm Slaney (Google) PM environmental sounds Nieto, Pons, Emmanuel Vincent (INRIA Nancy) Raj, Tax, Elizalde, Nam, Kumar Gerhard Widmer (Austrian Research Institute for 04:00 Sight and sound Freeman Artifcial Intelligence) PM Tao Zhang (Starkey Hearing Technologies) 04:30 k-shot Learning of Acoustic Context PM de Vries Schedule 04:50 Towards Learning Semantic Audio PM Representations from Unlabeled Data Jansen 08:00 Overture Purwins 05:10 Cost-sensitive detection with AM PM variational for 08:15 Acoustic word embeddings for environmental acoustic sensing Li, AM speech search Livescu Roberts 08:45 Learning Word Embeddings from 05:30 Break AM Speech Glass, Chung PM 09:05 Multi-Speaker Localization Using 05:45 Panel: Machine learning and audio AM Convolutional Neural Network PM signal processing: State of the art Trained with Noise Chakrabarty, and future perspectives Hochreiter, Li, Habets Livescu, Mandal, Nieto, Slaney, Purwins 09:25 Adaptive Front-ends for End-to-end AM Source Separation Venkataramani, Abstracts (18): Smaragdis 09:45 Poster Session Speech: source Abstract 2: Acoustic word embeddings for AM separation, enhancement, speech search in Machine Learning for recognition, synthesis Zarar, Fakoor, Audio Signal Processing (ML4Audio), Livescu DUMPALA, Kim, Smaragdis, Dubey, Ko, 08:15 AM Sakti, Wang, Guo, Kenyon, Tjandra, Tax, Lee For a number of speech tasks, it can be useful to represent speech segments of arbitrary length by 11:00 Learning and transforming sound for fxed-dimensional vectors, or embeddings. In AM interactive musical applications particular, vectors representing word segments -- Marchini acoustic word embeddings -- can be used in 31 Dec. 8, 2017

query-by-example search, example-based speech are challenging and expensive to collect and recognition, or spoken term discovery. *Textual* annotate. word embeddings have been common in natural language processing for a number of years now; the acoustic analogue is only recently starting to Abstract 4: Multi-Speaker Localization Using be explored. This talk will present our work on Convolutional Neural Network Trained with acoustic word embeddings and their application Noise in Machine Learning for Audio Signal to query-by-example search. I will speculate on Processing (ML4Audio), Chakrabarty, Habets applications across a wider variety of audio tasks. 09:05 AM

The problem of multi-speaker localization is Karen Livescu is an Associate Professor at TTI- formulated as a multi-class multi-label Chicago. She completed her PhD and post-doc in classifcation problem, which is solved using a electrical engineering and computer science at convolutional neural network (CNN) based source MIT and her Bachelor's degree in Physics at localization method. Utilizing the common Princeton University. Karen's main research assumption of disjoint speaker activities, we interests are at the intersection of speech and propose a novel method to train the CNN using language processing and machine learning. Her synthesized noise signals. The proposed recent work includes multi-view representation localization method is evaluated for two speakers learning, segmental neural models, acoustic word and compared to a well-known steered response embeddings, and automatic sign language power method. recognition. She is a member of the IEEE Spoken Language Technical Committee, an associate editor for IEEE Transactions on Audio, Speech, Abstract 5: Adaptive Front-ends for End-to- and Language Processing, and a technical co- end Source Separation in Machine Learning chair of ASRU 2015 and 2017. for Audio Signal Processing (ML4Audio), Venkataramani, Smaragdis 09:25 AM

Abstract 3: Learning Word Embeddings from (+ Jonah Casebeer) Speech in Machine Learning for Audio Source separation and other audio applications have traditionally relied on the use of short-time Signal Processing (ML4Audio), Glass, Chung Fourier transforms as a front-end frequency 08:45 AM domain representation step. We present an auto- In this paper, we propose a novel deep neural encoder neural network that can act as an network architecture, Sequence-to- Sequence equivalent to short-time front-end transforms. We Audio2Vec, for unsupervised learning of fxed- demonstrate the ability of the network to learn length vector representations of audio segments optimal, real-valued basis functions directly from excised from a speech corpus, where the vectors the raw waveform of a signal and further show contain semantic information pertaining to the how it can be used as an adaptive front-end for segments, and are close to other vectors in the end-to-end supervised source separation. embedding space if their corresponding segments are semantically similar. The design of the proposed model is based on the RNN Abstract 6: Poster Session Speech: source Encoder-Decoder framework, and borrows the separation, enhancement, recognition, methodology of continuous skip-grams for synthesis in Machine Learning for Audio training. The learned vector representations are Signal Processing (ML4Audio), Zarar, Fakoor, evaluated on 13 widely used word similarity DUMPALA, Kim, Smaragdis, Dubey, Ko, Sakti, benchmarks, and achieved competitive results to Wang, Guo, Kenyon, Tjandra, Tax, Lee 09:45 AM that of GloVe. The biggest advantage of the proposed model is its capability of extracting Poster abstracts and full papers: http:// semantic information of audio segments taken media.aau.dk/smc/ml4audio/ directly from raw speech, without relying on any other modalities such as text or images, which SPEECH SOURCE SEPARATION 32 Dec. 8, 2017

*Lijiang Guo and Minje Kim. Bitwise Source created diverse tools for generating audio tracks Separation on Hashed Spectra: An Efcient by transforming prerecorded music material. Our Posterior Estimation Scheme Using Partial Rank artists integrated these tools in their composition Order Metrics process and produced some pop tracks. I present *Minje Kim and Paris Smaragdis. Bitwise Neural some of those tools, with audio examples, and Networks for Efcient SingleChannel Source give an operative defnition of music style Separation transfer as an optimization problem. Such *Mohit Dubey, Garrett Kenyon, Nils Carlson and defnition allows for an efcient solution which Austin Thresher. Does Phase Matter For Monaural renders possible a multitude of musical Source Separation? applications: from composing to live performance. SPEECH ENHANCEMENT *Rasool Fakoor, Xiaodong He, Ivan Tashev and Marco Marchini works at Spotify in the Creator Shuayb Zarar. Reinforcement Learning To Adapt Technology Research Lab, Paris. His mission is Speech Enhancement to Instantaneous Input bridging the gap between between creative Signal Quality artists and intelligent technologies. *Jong Hwan Ko, Josh Fromm, Matthai Phillipose, Previously, he worked as research assistant for Ivan Tashev and Shuayb Zarar. Precision Scaling the Pierre-and-Marie-Curie University at the Sony of Neural Networks for Efcient Audio Processing Computer Science Laboratory of Paris and worked for the Flow Machine project. His previous AUTOMATIC SPEECH RECOGNITION academic research also includes unsupervised *Marius Paraschiv, Lasse Borgholt, Tycho Tax, music generation and ensemble performance Marco Singh and Lars Maaløe. Exploiting analysis, this research was carried out during my Nontrivial Connectivity for Automatic Speech M.Sc. and Ph.D. at the Music Technology Group Recognition (DTIC, Pompeu Fabra University). He has a double *Brian Mcmahan and Delip Rao. Listening to the degree in Mathematics from Bologna University. World Improves Speech Command Recognition * Andros Tjandra, Sakriani Sakti and Satoshi Nakamura. End-to-End Speech Recognition with Abstract 8: Compact Recurrent Neural Local Monotonic Attention Network based on Tensor Train for *Sri Harsha Dumpala, Rupayan Chakraborty and Polyphonic Music Modeling in Machine Sunil Kumar Kopparapu. A Novel Approach for Learning for Audio Signal Processing Efective Learning in Low Resourced Scenarios (ML4Audio), Sakti 11:30 AM

SPEECH SYNTHESIS (+Andros Tjandra, Satoshi Nakamura) *Yuxuan Wang, Rj SkerryRyan, Ying Xiao, Daisy This paper introduces a novel compression Stanton, Joel Shor, Eric Battenberg, Rob Clark and method for recurrent neural networks (RNNs) Rif A. Saurous. Uncovering Latent Style Factors based on Tensor Train (TT) format. The objective for Expressive Speech Synthesis in this work are to reduce the number of *Younggun Lee, Azam Rabiee and Soo-Young Lee. parameters in RNN and maintain their expressive Emotional End-to-End Neural Speech Synthesizer power. The key of our approach is to represent the dense matrices weight parameter in the simple RNN and Gated Recurrent Unit (GRU) RNN Abstract 7: Learning and transforming sound architectures as the n- dimensional tensor in TT- for interactive musical applications in format. To evaluate our proposed models, we compare it with uncompressed RNN on Machine Learning for Audio Signal polyphonic sequence prediction tasks. Our Processing (ML4Audio), Marchini 11:00 AM proposed TT-format RNN are able to preserve the Recent developments in object recognition performance while reducing the number of RNN (especially convolutional neural networks) led to parameters signifcantly up to 80 times smaller. a new spectacular application: image style transfer. But what would be the music version of style transfer? In the fow-machine project, we 33 Dec. 8, 2017

Abstract 9: Singing Voice Separation using Learning for Audio Signal Processing Generative Adversarial Networks in (ML4Audio), Eck 01:30 PM Machine Learning for Audio Signal I'll discuss the problem of transcribing polyphonic Processing (ML4Audio), Choi, Lee 11:50 AM piano music with an emphasis on generalizing to (+Ju-heon Lee) unseen instruments. We optimize for two In this paper, we propose a novel approach objectives. We frst predict pitch onset events and extending Wasserstein generative adversarial then conditionally predict pitch at the frame networks (GANs) [3] to separate singing voice level. I'll discuss the model architecture, which from the mixture signal. We used the mixture combines CNNs and LSTMs. I'll also discuss signal as a condition to generate singing voices challenges faced in robust piano transcription, and applied the U-net style network for the stable such as obtaining enough data to train a good training of the model. Experiments with the model I'll also provide some demos and links to DSD100 dataset show the promising results with working code. This collaboration was led by Curtis the potential of using the GANs for music source Hawthorne, Erich Elsen and Jialin Song (https:// separation. arxiv.org/abs/1710.11153).

Douglas Eck works at the Google Brain team on the Magenta project, an efort to generate music, Abstract 10: Audio Cover Song Identifcation using Convolutional Neural Network in video, images and text using machine Machine Learning for Audio Signal intelligence. Processing (ML4Audio), Chang, Lee 12:10 PM He also worked on music search and recommendation for Google Play Music. I was an (+Juheon Lee, Sankeun Choe) Associate Professor in Computer Science at In this paper, we propose a new approach to University of Montreal in the BRAMS research cover song identifcation using CNN center. He also worked on music performance (convolutional neural network). Most previous modeling. studies extract the feature vectors that characterize the cover song relation from a pair of songs and used it to compute Abstract 13: Deep learning for music the (dis)similarity between the two songs. Based recommendation and generation in Machine on the observation that there is a meaningful Learning for Audio Signal Processing pattern between cover songs and that this can be (ML4Audio), Dieleman 02:00 PM learned, we have reformulated the cover song identifcation problem in a machine learning The advent of deep learning has made it possible framework. To do this, we frst build the CNN to extract high-level information from perceptual using as an input a cross-similarity matrix signals without having to specify manually and generated from a pair of songs. We then explicitly how to obtain it; instead, this can be construct the data set composed of cover song learned from examples. This creates pairs and non-cover song pairs, which are used opportunities for automated content analysis of as positive and negative training samples, musical audio signals. In this talk, I will discuss respectively. The trained CNN outputs the how deep learning techniques can be used for probability of being in the cover song relation audio-based music recommendation. I will also given a cross-similarity matrix generated from discuss my ongoing work on music generation in any two pieces of music and identifes the cover the raw waveform domain with WaveNet. song by ranking on the probability. Experimental results show that the proposed algorithm Sander Dieleman is a Research Scientist at achieves performance better than or comparable DeepMind in London, UK, where he has worked to the state-of-the-arts. on the development of AlphaGo and WaveNet. He was previously a PhD student at Ghent University, where he conducted research on Abstract 12: Polyphonic piano transcription and deep learning techniques for learning hierarchical representations of musical using deep neural networks in Machine 34 Dec. 8, 2017

Abstract 13: Deep learning for music recommendation and generation in Machine Learning for Audio Signal Processing (ML4Audio), Dieleman 02:00 PM

The advent of deep learning has made it possible to extract high-level information from perceptual signals without having to specify manually and explicitly how to obtain it; instead, this can be learned from examples. This creates opportunities for automated content analysis of musical audio signals. In this talk, I will discuss how deep learning techniques can be used for audio-based music recommendation. I will also discuss my ongoing work on music generation in the raw waveform domain with WaveNet.

Sander Dieleman is a Research Scientist at DeepMind in London, UK, where he has worked on the development of AlphaGo and WaveNet. He was previously a PhD student at Ghent University, where he conducted research on deep learning techniques for learning hierarchical representations of musical audio signals. During his PhD he also developed the Theano-based deep learning library Lasagne and won solo and team gold medals respectively in Kaggle's "Galaxy Zoo" competition and the first National Data Science Bowl. In the summer of 2014, he interned at Spotify in New York, where he worked on implementing audio-based music recommendation using deep learning on an industrial scale.

Abstract 14: Exploring Ad Effectiveness using Acoustic Features in Machine Learning for Audio Signal Processing (ML4Audio), Prockup, Vahabi 02:30 PM

Online audio advertising is a form of advertising used abundantly in online music streaming services. In these platforms, providing high-quality ads ensures a better user experience and results in longer user engagement. In this paper we describe a way to predict ad quality using hand-crafted, interpretable acoustic features that capture the timbre, rhythm, and harmonic organization of the audio signal. We then discuss how the characteristics of the sound can be connected to concepts such as the clarity of the ad and its message.

Matt Prockup is currently a scientist at Pandora working on methods and tools for Music Information Retrieval at scale. He recently received his Ph.D. in Electrical Engineering from Drexel University. His research interests span a wide scope of topics including audio signal processing, recommender systems, machine learning, and human computer interaction. He is also an avid drummer, percussionist, and composer.

Puya - Hossein Vahabi is a senior research scientist at Pandora working on Audio/Video Computational Advertising. Before Pandora, he was a research scientist at Yahoo Labs. He has a PhD in CS and has been a research associate of the Italian National Research Council for many years. His background is in computational advertising, graph mining and information retrieval.

Abstract 15: Poster Session Music and environmental sounds in Machine Learning for Audio Signal Processing (ML4Audio), Nieto, Pons, Raj, Tax, Elizalde, Nam, Kumar 03:00 PM

Poster abstracts and full papers: http://media.aau.dk/smc/ml4audio/

MUSIC:
*Jordi Pons, Oriol Nieto, Matthew Prockup, Erik M. Schmidt, Andreas F. Ehmann and Xavier Serra. End-to-end learning for music audio tagging at scale
*Jongpil Lee, Taejun Kim, Jiyoung Park and Juhan Nam. Raw Waveform-based Audio Classification Using Sample-level CNN Architectures
*Alfonso Perez-Carrillo. Estimation of violin bowing features from Audio recordings with Convolutional Networks

ENVIRONMENTAL SOUNDS:
*Bhiksha Raj, Benjamin Elizalde, Rohan Badlani, Ankit Shah and Anurag Kumar. NELS - Never-Ending Learner of Sounds
*Tycho Tax, Jose Antich, Hendrik Purwins and Lars Maaløe. Utilizing Domain Knowledge in End-to-End Audio Processing
*Anurag Kumar and Bhiksha Raj. Deep CNN Framework for Audio Event Recognition using Weakly Labeled Web Data

Abstract 16: Sight and sound in Machine Learning for Audio Signal Processing (ML4Audio), Freeman 04:00 PM

William T. Freeman is the Thomas and Gerd Perkins Professor of Electrical Engineering and Computer Science at MIT. His current research interests include motion rerendering, computational photography, and learning for vision. He received outstanding paper awards at computer vision or machine learning conferences in 1997, 2006, 2009 and 2012, and recently won "test of time" awards for papers written in 1991 and 1995. Previous research topics include steerable filters and pyramids, the generic viewpoint assumption, color constancy, bilinear models for separating style and content, and belief propagation in networks with loops. He holds 30 patents.

Abstract 17: k-shot Learning of Acoustic Context in Machine Learning for Audio Signal Processing (ML4Audio), de Vries 04:30 PM

(+Ivan Bocharov, Tjalling Tjalkens)

In order to personalize the behavior of hearing aid devices in different acoustic scenes, we need personalized acoustic scene classifiers. Since we cannot afford to burden an individual hearing aid user with the task of collecting a large acoustic database, we want to train an acoustic scene classifier on one in-situ recorded waveform (of a few seconds' duration) per class. In this paper we develop a method that achieves high levels of classification accuracy from a single recording of an acoustic scene.

Abstract 18: Towards Learning Semantic Audio Representations from Unlabeled Data in Machine Learning for Audio Signal Processing (ML4Audio), Jansen 04:50 PM

(+Manoj Plakal, Ratheet Pandya, Daniel P. W. Ellis, Shawn Hershey, Jiayang Liu, R. Channing Moore, Rif A. Saurous)

Our goal is to learn semantically structured audio representations without relying on categorically labeled data. We consider several class-agnostic semantic constraints that are inherent to non-speech audio: (i) sound categories are invariant to additive noise and translations in time, (ii) mixtures of two sound events inherit the categories of the constituents, and (iii) the categories of events in close temporal proximity in a single recording are likely to be the same or related. We apply these constraints to sample training data for triplet-loss embedding models using a large unlabeled dataset of YouTube soundtracks. The resulting low-dimensional representations provide both greatly improved query-by-example retrieval performance and reduced labeled data and model complexity requirements for supervised sound classification.
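The triplet-sampling idea in Abstract 18 can be sketched in a few lines. This is an illustrative reconstruction under stated assumptions (log-mel inputs, Euclidean embeddings, a fixed margin), not the authors' implementation; the function names are hypothetical.

```python
# Sketch of triplet construction and loss in the spirit of Abstract 18:
# anchor/positive pairs come from, e.g., noise augmentation (constraint i);
# negatives are random other clips. The embedding f is trained so that
# d(anchor, positive) < d(anchor, negative) by a margin.
import torch
import torch.nn.functional as F

def triplet_loss(f, anchor, positive, negative, margin=1.0):
    """f maps a batch of log-mel patches to embedding vectors."""
    za, zp, zn = f(anchor), f(positive), f(negative)
    d_pos = (za - zp).pow(2).sum(dim=1)      # squared distance to positive
    d_neg = (za - zn).pow(2).sum(dim=1)      # squared distance to negative
    return F.relu(d_pos - d_neg + margin).mean()

def noise_positive(clip, noise_scale=0.1):
    # Constraint (i): sound categories are invariant to additive noise.
    return clip + noise_scale * torch.randn_like(clip)
```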

Abstract 19: Cost-sensitive detection with variational autoencoders for environmental acoustic sensing in Machine Learning for Audio Signal Processing (ML4Audio), Li, Roberts 05:10 PM

(+Ivan Kiskin, Davide Zilli, Marianne Sinka, Henry Chan, Kathy Willis)

Environmental acoustic sensing involves the retrieval and processing of audio signals to better understand our surroundings. While large-scale acoustic data make manual analysis infeasible, they provide a suitable playground for machine learning approaches. Most existing machine learning techniques developed for environmental acoustic sensing do not provide flexible control of the trade-off between the false positive rate and the false negative rate. This paper presents a cost-sensitive classification paradigm, in which the hyper-parameters of classifiers and the structure of variational autoencoders are selected in a principled Neyman-Pearson framework. We examine the performance of the proposed approach using a dataset from the HumBug project, which aims to detect the presence of mosquitoes using sound collected by simple embedded devices.

Abstract 21: Panel: Machine learning and audio signal processing: State of the art and future perspectives in Machine Learning for Audio Signal Processing (ML4Audio), Hochreiter, Li, Livescu, Mandal, Nieto, Slaney, Purwins 05:45 PM

How can end-to-end audio processing be further optimized? How can an audio processing system be built that generalizes across domains, in particular different languages, music styles, or acoustic environments? How can complex musical hierarchical structure be learned? How can we use machine learning to build a music system that is able to react in the same way an improvisation partner would? Can we build a system that could put a composer in the role of a perceptual engineer?

Sepp Hochreiter (Johannes Kepler University Linz, http://www.bioinf.jku.at/people/hochreiter/)
Bo Li (Google, https://research.google.com/pubs/BoLi.html)
Karen Livescu (Toyota Technological Institute at Chicago, http://ttic.uchicago.edu/~klivescu/)
Arindam Mandal (Amazon Alexa, https://scholar.google.com/citations?user=tV1hW0YAAAAJ&hl=en)
Oriol Nieto (Pandora, http://urinieto.com/about/)
Malcolm Slaney (Google, http://www.slaney.org/malcolm/pubs.html)
Hendrik Purwins (Aalborg University Copenhagen, http://personprofl.aau.dk/130346?lang=en)

Nearest Neighbors for Modern Applications with Massive Data: An Age-old Solution with New Challenges

George H Chen, Devavrat Shah, Christina Lee

201 B, Fri Dec 08, 08:00 AM

Many modern methods for prediction leverage nearest neighbor (NN) search to find past training examples most similar to a test example, an idea that dates back in text to at least the 11th century in the "Book of Optics" by Alhazen. Today, NN methods remain popular, often as a cog in a bigger prediction machine, used for instance in recommendation systems, forecasting baseball player performance and election outcomes, survival analysis in healthcare, image in-painting, crowdsourcing, graphon estimation, and more. The popularity of NN methods is due in no small part to the proliferation of high-quality fast approximate NN search methods that scale to the high-dimensional massive datasets typical of contemporary applications. Moreover, NN prediction readily pairs with methods that learn similarities, such as metric learning methods or Siamese networks. In fact, some well-known pairings that result in nearest neighbor predictors that learn similarities include random forests and many boosting methods.

Despite the popularity, success, and age of nearest neighbor methods, our theoretical understanding of them is still surprisingly incomplete (perhaps much to the chagrin of the initial efforts of analysis by Fix, Hodges, Cover, and Hart) and can also be disconnected from what practitioners actually want or care about. Many successful approximate nearest neighbor methods in practice do not have known theoretical guarantees, and many of the guarantees for exact nearest neighbor methods do not readily handle approximation. Meanwhile, many applications use variations on NN methods to which existing theory may not extend, or for which existing theory is not easily usable by a practitioner. Suffice it to say, a lot is lost in translation between different communities working with NN methods.

In short, NN methods is an exciting field at the intersection of classical statistics, machine learning, data structures and domain-specific expertise. The aim of this workshop is to bring together theoreticians and practitioners alike from these various different backgrounds, with a diverse range of perspectives, to bring everyone up to speed on:

- Best known statistical/computational guarantees (especially recent non-asymptotic results)
- Latest methods/systems that have been developed especially for fast approximate NN search that scale to massive datasets
- Various applications in which NN methods are heavily used as a critical component in prediction or inference

By gathering a diverse crowd, we hope attendees share their perspectives, identify ways to bridge theory and practice, and discuss avenues of future research.

Schedule

09:00 AM A millennium of nearest neighbor methods – an introduction to the NIPS nearest neighbor workshop 2017 Chen
09:20 AM Analyzing Robustness of Nearest Neighbors to Adversarial Examples Chaudhuri
10:05 AM New perspective from Blackwell's "comparisons of experiments" on generative adversarial networks Oh
11:00 AM Data-dependent methods for similarity search in high dimensions Indyk
11:45 AM Fast k-Nearest Neighbor Search via Prioritized DCI Li
01:45 PM How to stop worrying and learn to love Nearest Neighbors Efros
02:30 PM A Platform for Data Science Doshi
02:50 PM Poster Session (encompasses coffee break) Chen, Balle, Lee, frosio, Malik, Kautz, Li, Sugiyama, Carreira-Perpinan, Raziperchikolaei, Tulabandhula, Noh, Yu
04:15 PM The Boundary Forest Algorithm Yedidia
05:00 PM Iterative Collaborative Filtering for Sparse Matrix Estimation Lee

Abstracts (10):

Abstract 1: A millennium of nearest neighbor methods – an introduction to the NIPS nearest neighbor workshop 2017 in Nearest Neighbors for Modern Applications with Massive Data: An Age-old Solution with New Challenges, Chen 09:00 AM

I give a brief history of nearest neighbor (NN) methods, starting from Alhazen's "Book of Optics" in the 11th century, and leading up to the present time. Surprisingly, the most general nonasymptotic results on NN prediction only recently emerged in 2014 in the work of Chaudhuri and Dasgupta. Turning toward "user-friendly" theory with an eye toward practitioners, I mention recent guarantees (2013-2015) in the contemporary applications of time series forecasting, online collaborative filtering, and medical image segmentation — in all three cases, clustering structure enables successful NN prediction.

Abstract 2: Analyzing Robustness of Nearest Neighbors to Adversarial Examples in Nearest Neighbors for Modern Applications with Massive Data: An Age-old Solution with New Challenges, Chaudhuri 09:20 AM

Motivated by applications such as autonomous vehicles, test-time attacks via adversarial examples have received a great deal of recent attention. In this setting, an adversary is capable of making queries to a classifier, and perturbs a test example by a small amount in order to force the classifier to report an incorrect label. While a long line of work has explored a number of attacks, not many reliable defenses are known, and there is an overall lack of general understanding about the foundations of designing machine learning algorithms robust to adversarial examples.

In this talk, we take a step towards addressing this challenging question by introducing a new theoretical framework, analogous to bias-variance theory, which we can use to tease out the causes of vulnerability. We apply our framework to a simple classification algorithm: nearest neighbors, and analyze its robustness to adversarial examples. Motivated by our analysis, we propose a modified version of the nearest neighbor algorithm, and demonstrate both theoretically and empirically that it has superior robustness to standard nearest neighbors.

Abstract 3: New perspective from Blackwell's "comparisons of experiments" on generative adversarial networks in Nearest Neighbors for Modern Applications with Massive Data: An Age-old Solution with New Challenges, Oh 10:05 AM

We bring the tools from Blackwell's seminal result on comparing two stochastic experiments to shine new light on a modern application of great interest: generative adversarial networks (GANs). Binary hypothesis testing is at the center of GANs, and we propose new data processing inequalities that allow us to discover new algorithms to combat mode collapse, provide sharper analyses, and provide simpler proofs. This leads to a new architecture to handle one of the major challenges in GANs known as "mode collapse": the lack of diversity in the samples generated by the learned generators. The hypothesis testing view of GANs allows us to analyze the new architecture and show that it encourages generators with no mode collapse. Experimental results show that the proposed architecture can learn to generate samples with diversity that is orders of magnitude better than competing approaches, while being simpler. For this talk, I will assume no prior background on GANs.
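For orientation alongside Abstracts 1 and 2, the baseline predictor these talks build on is worth pinning down. The sketch below is a plain brute-force k-NN classifier with an optional learned transform (the pairing with metric learning mentioned in the workshop overview); it is illustrative only, and the `transform` hook is an assumption, not a method from any talk.

```python
# Minimal k-NN classifier with a pluggable similarity. Brute force; real
# systems replace the linear scan with an approximate index at scale.
import numpy as np

def knn_predict(X_train, y_train, x_query, k=5, transform=None):
    """If `transform` is a learned map (e.g. from metric learning),
    neighbors are found in the transformed space instead."""
    if transform is not None:
        X_train, x_query = transform(X_train), transform(x_query)
    d = np.linalg.norm(X_train - x_query, axis=1)   # Euclidean distances
    nearest = np.argsort(d)[:k]                     # indices of k closest
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]                # majority vote
```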

I will give an overview of recent research on designing provable data-dependent approximate similarity search methods for high-dimensional data. Unlike many approaches proposed in the algorithms literature, the new methods lead to data structures that adapt to the data while retaining correctness and running time guarantees. The high-level principle behind those methods is that every data set (even a completely random one) has some structure that can be exploited algorithmically. This interpolates between basic "data-oblivious" approaches (e.g., those based on simple random projections) that are analyzable but do not adapt easily to the structure of the data, and basic "data-dependent" methods that exploit specific structure of the data, but might not perform well if this structure is not present.

Concretely, I will cover two recent lines of research:
- Data-dependent Locality-Sensitive Hashing: approximate nearest neighbor search methods that provably improve over the best possible data-oblivious LSH algorithms (covers work from SODA'14, STOC'15 and NIPS'15).
- Data-dependent metric compression: algorithms that represent sets of points using a small number of bits while approximately preserving the distances up to higher accuracy than previously known (covers work from SODA'17 and NIPS'17).

Abstract 5: Fast k-Nearest Neighbor Search via Prioritized DCI in Nearest Neighbors for Modern Applications with Massive Data: An Age-old Solution with New Challenges, Li 11:45 AM

Most exact methods for k-nearest neighbour search suffer from the curse of dimensionality; that is, their query times exhibit exponential dependence on either the ambient or the intrinsic dimensionality. Dynamic Continuous Indexing (DCI) [19] offers a promising way of circumventing the curse and successfully reduces the dependence of query time on intrinsic dimensionality from exponential to sublinear. In this paper, we propose a variant of DCI, which we call Prioritized DCI, and show a remarkable improvement in the dependence of query time on intrinsic dimensionality. In particular, a linear increase in intrinsic dimensionality, or equivalently, an exponential increase in the number of points near a query, can be mostly counteracted with just a linear increase in space. We also demonstrate empirically that Prioritized DCI significantly outperforms prior methods. In particular, relative to Locality-Sensitive Hashing (LSH), Prioritized DCI reduces the number of distance evaluations by a factor of 14 to 116 and the memory consumption by a factor of 21.

Abstract 6: How to stop worrying and learn to love Nearest Neighbors in Nearest Neighbors for Modern Applications with Massive Data: An Age-old Solution with New Challenges, Efros 01:45 PM

Nearest neighbors is an algorithm everyone loves to hate. It's too trivial, too brute-force, doesn't offer any insights. In this talk, I will try to make you fall in love with the humble nearest neighbor. First, I will discuss the use of nearest neighbors as an exceptionally useful baseline to guard against our field's natural bias in favor of elegant algorithms over data. Then, I will discuss some scenarios where nearest neighbors is particularly useful in practice.

Abstract 7: A Platform for Data Science in Nearest Neighbors for Modern Applications with Massive Data: An Age-old Solution with New Challenges, Doshi 02:30 PM

We describe an extensible data science platform based on non-parametric statistical methods such as nearest neighbors. This platform was borne out of the need for us to provide scalable and flexible predictive analytics solutions for retailers and the federal government. In this talk, I will describe the formalism associated with the platform, discuss its performance on benchmark datasets out-of-the-box, and show a live demo of building a recommendation system and of detecting geo-spatial anomalies in the maritime domain.
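As a concrete reference point for the data-oblivious/data-dependent distinction in Abstract 4, here is the simplest data-oblivious scheme: hashing by signs of random projections. This is a sketch of the baseline the talk contrasts against, not of the data-dependent methods themselves; all names and parameter choices are assumptions.

```python
# Data-oblivious approximate NN baseline: bucket points by the sign pattern
# of a few random projections, then scan only the query's bucket.
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)

def build_index(X, n_bits=12):
    planes = rng.standard_normal((X.shape[1], n_bits))
    codes = (X @ planes > 0)                      # (n, n_bits) sign pattern
    buckets = defaultdict(list)
    for i, c in enumerate(codes):
        buckets[c.tobytes()].append(i)
    return planes, buckets

def query(x, X, planes, buckets, k=5):
    code = (x @ planes > 0).tobytes()
    cand = buckets.get(code, range(len(X)))       # empty bucket: full scan
    cand = np.fromiter(cand, dtype=int)
    d = np.linalg.norm(X[cand] - x, axis=1)       # exact distances on the
    return cand[np.argsort(d)[:k]]                # candidate set only
```

The design tension the talk addresses is visible here: the random planes ignore the data entirely, which makes the scheme analyzable but wasteful when the data has exploitable structure.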

Abstract 8: Poster Session (encompasses coffee break) in Nearest Neighbors for Modern Applications with Massive Data: An Age-old Solution with New Challenges, Chen, Balle, Lee, frosio, Malik, Kautz, Li, Sugiyama, Carreira-Perpinan, Raziperchikolaei, Tulabandhula, Noh, Yu 02:50 PM

- Learning Supervised Binary Hashing without Binary Code Optimization (Miguel Carreira-Perpinan; Ramin Raziperchikolaei)
- Sub-linear Privacy-preserving Search with Unsecured Server and Semi-honest Parties (Beidi Chen)
- On Nearest Neighbors in Non Local Means Denoising (Iuri Frosio; Jan Kautz)
- Fast k-Nearest Neighbour Search via Prioritized DCI (Ke Li; Jitendra Malik)
- Generative Local Metric Learning for Nearest Neighbor Methods (Yung-Kyun Noh; Masashi Sugiyama; Daniel D Lee)
- Private Document Classification in Federated Databases (Phillipp Schoppmann; Adria Gascon; Borja Balle)
- Optimizing Revenue Over Data-Driven Assortments (Deeksha Sinha; Theja Tulabandhula)
- Fast Distance Metric Learning for Multi-Task Large Margin Nearest Neighbor Classification (Adams Yu)

Abstract 9: The Boundary Forest Algorithm in Nearest Neighbors for Modern Applications with Massive Data: An Age-old Solution with New Challenges, Yedidia 04:15 PM

I explain the Boundary Forest algorithm, a simple, fast, and incremental online algorithm that can be used in both supervised and unsupervised settings, for classification, regression, or retrieval problems. As a classification algorithm, its generalization performance is at least as good as K-nearest-neighbors, while being able to respond to queries very quickly. It maintains trees of examples that let it train and respond to test queries in a time logarithmic in the number of stored examples, which is typically very much less than the number of training examples.

Abstract 10: Iterative Collaborative Filtering for Sparse Matrix Estimation in Nearest Neighbors for Modern Applications with Massive Data: An Age-old Solution with New Challenges, Lee 05:00 PM

The sparse matrix estimation problem consists of estimating the distribution of an n-by-n matrix Y, from a sparsely observed single instance of this matrix where the entries of Y are independent random variables. This captures a wide array of problems; special instances include matrix completion in the context of recommendation systems, graphon estimation, and community detection in (mixed membership) stochastic block models. Inspired by classical collaborative filtering for recommendation systems, we propose a novel iterative, collaborative filtering-style algorithm for matrix estimation in this generic setting. Under model assumptions of uniform sampling, bounded entries, and finite spectrum, we provide bounds on the mean squared error (MSE) of our estimator and show improved sample complexity.
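The classical collaborative filtering that Abstract 10 takes as its starting point can be sketched directly. The following is a minimal user-user scheme under simplifying assumptions (Pearson-correlation similarity, a single entry estimated at a time); it is not the paper's iterative algorithm, which instead expands neighborhoods over paths in the observation graph.

```python
# Classical user-user collaborative filtering: estimate a missing entry as a
# similarity-weighted average over users with overlapping observations.
import numpy as np

def estimate_entry(Y, mask, u, i, eps=1e-8):
    """Y: (n, m) partially observed matrix; mask: boolean observed-pattern."""
    weights, values = [], []
    for v in range(Y.shape[0]):
        if v == u or not mask[v, i]:
            continue
        overlap = mask[u] & mask[v]               # columns both users rated
        if overlap.sum() < 2:
            continue                              # too little shared support
        sim = np.corrcoef(Y[u, overlap], Y[v, overlap])[0, 1]
        if np.isnan(sim):
            continue
        weights.append(sim); values.append(Y[v, i])
    if not weights:
        return Y[mask].mean()                     # global-mean fallback
    w, vals = np.array(weights), np.array(values)
    return float((w * vals).sum() / (np.abs(w).sum() + eps))
```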

Machine Deception

Ian Goodfellow, Tim Hwang, Bryce Goodman, Mikel Rodriguez

202, Fri Dec 08, 08:00 AM

Machine deception refers to the capacity of machine learning systems to manipulate human and machine agents into believing, acting upon or otherwise accepting false information. The development of machine deception has had a long, foundational and under-appreciated impact on shaping research in the field of artificial intelligence. Thought experiments such as Alan Turing's eponymous "Turing test" - where an automated system attempts to deceive a human judge into believing it is a human interlocutor - or Searle's "Chinese room" - in which a human operator attempts to imbue the false impression of consciousness in a machine - are simultaneously exemplars of machine deception and some of the most famous and influential concepts in the field of AI.

As the field of machine learning advances, so too does machine deception seem poised to give rise to a host of practical opportunities and concerns. Machine deception can have many benign and beneficial applications. Chatbots designed to mimic human agents offer technical support and even provide therapy at a cost and scale that may not be otherwise achievable.

On the other hand, the rise of techniques that leverage bots and other autonomous agents to manipulate and shape political speech online has put machine deception in the political spotlight and raised fundamental questions regarding the ability to preserve truth in the digital domain. These concerns are amplified by recent demonstrations of machine learning techniques that synthesize hyper-realistic manipulations of audio and video.

The proposed workshop will bring together research at the forefront of machine deception, including:

Machine-machine deception: Where a machine agent deceives another machine agent, e.g. the use of "bot farms" that automate posting on social media platforms to manipulate content ranking algorithms, or evolutionary networks that generate images that "fool" deep neural networks.

Human-machine deception: Where a human agent deceives a machine agent, e.g. the use of human "troll farms" to manipulate content ranking algorithms, or the use of adversarial examples to exploit the fragility of autonomous systems (e.g. a stop sign sticker for self-driving cars or printed eye-glasses for facial recognition).

Machine-human deception: Where a machine agent is leveraged to deceive a human agent, e.g. the use of GANs to produce realistic manipulations of audio and video content.

Although the workshop will primarily focus on the technical aspects of machine deception, submissions from the fields of law, policy, social sciences and psychology will also be encouraged. It is envisaged that this interdisciplinary forum will both shine a light on what is possible given state-of-the-art tools today, and provide instructive guidance for both technologists and policy-makers going forward.

Schedule

10:00 AM Generating Natural Adversarial Examples Zhao
10:30 AM Adversarial Patch Gilmer
11:30 AM Interpretation of Neural Networks is Fragile Abid
12:00 PM Generative Models for Spear Phishing Posts on Social Media Tully, Seymour
01:30 PM CycleGAN, a Master of Steganography Chu
02:00 PM Machine Against Machine: Minimax-Optimal Attacks and Defenses Hamm
03:00 PM A3T: Adversarially Augmented Adversarial Training Baratin, Lacoste-Julien, Bengio, Erraqabi
03:30 PM Thermometer Encoding: One Hot way to resist Adversarial Examples

Discrete Structures in Machine Learning

Yaron Singer, Jeff A Bilmes, Andreas Krause, Stefanie Jegelka, Amin Karbasi

203, Fri Dec 08, 08:00 AM

Traditionally, machine learning has been focused on methods where objects reside in continuous domains. The goal of this workshop is to advance state-of-the-art methods in machine learning that involve discrete structures.

Models with ultimately discrete solutions play an important role in machine learning. At its core, statistical machine learning is concerned with making inferences from data, and when the underlying variables of the data are discrete, both the tasks of model inference as well as predictions using the inferred model are inherently discrete algorithmic problems. Many of these problems are notoriously hard, and even those that are theoretically tractable become intractable in practice with abundant and steadily increasing amounts of data. As a result, standard theoretical models and off-the-shelf algorithms become either impractical or intractable (and in some cases both).

While many problems are hard in the worst case, the problems of practical interest are often much more well-behaved, and have the potential to be modeled in ways that make them tractable. Indeed, many discrete problems in machine learning can possess beneficial structure; such structure has been an important ingredient in many successful (approximate) solution strategies. Examples include submodularity, marginal polytopes, symmetries and exchangeability.

Machine learning, algorithms, discrete mathematics and combinatorics, as well as applications in computer vision, speech, NLP, biology and network analysis, are all active areas of research, each with an increasingly large body of foundational knowledge. The workshop aims to ask questions that enable communication across these fields. In particular, this year we aim to address how the investigation of combinatorial structures allows us to capture complex, high-order dependencies in discrete learning problems prevalent in deep learning, social networks, etc. An emphasis will be given to uncertainty and structure that results from problem instances being estimated from data.
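Submodularity, named above as one of the exploitable structures, has a canonical algorithmic payoff that is worth making concrete. The sketch below is the classic greedy algorithm for maximizing a monotone submodular set function under a cardinality constraint, which is guaranteed a (1 - 1/e) approximation (Nemhauser et al.); the coverage example is an illustrative toy.

```python
# Greedy maximization of a monotone submodular function f under |S| <= k.
def greedy_submodular(f, ground_set, k):
    """f: callable on a frozenset; returns the greedy k-element subset."""
    S = set()
    for _ in range(k):
        gains = {e: f(frozenset(S | {e})) - f(frozenset(S))
                 for e in ground_set - S}
        best = max(gains, key=gains.get)
        if gains[best] <= 0:           # monotone f: gains stay nonnegative
            break
        S.add(best)
    return S

# Toy coverage instance: f(S) = number of items covered by the chosen sets.
covers = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c"}}
f = lambda S: len(set().union(*(covers[e] for e in S)) if S else set())
print(greedy_submodular(f, set(covers), k=2))      # e.g. {1, 2}
```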

Machine learning, algorithms, discrete 204, Fri Dec 08, 08:00 AM mathematics and combinatorics as well as applications in computer vision, speech, NLP, The use of machine learning has become biology and network analysis are all active areas pervasive in our society, from specialized of research, each with an increasingly large body scientifc data analysis to industry intelligence of foundational knowledge. The workshop aims to and practical applications with a direct impact in ask questions that enable communication across the public domain. This impact involves diferent these felds. In particular, this year we aim to social issues including privacy, ethics, liability address the investigation of combinatorial and accountability. This workshop aims to discuss structures allows to capture complex, high-order the use of machine learning in safety critical dependencies in discrete learning problems environments, with special emphasis on three prevalent in deep learning, social networks, etc. main application domains: An emphasis will be given on uncertainty and - Healthcare structure that results from problem instances - Autonomous systems being estimated from data. - Complainants and liability in data driven industries We aim to answer some of these questions: How Schedule do we make our models more comprehensible and transparent? Shall we always trust our decision making process? How do we involve feld 09:15 Andrea Montanari experts in the process of making machine AM learning pipelines more practically interpretable 10:00 Spotlight session I Thaker, Korba, from the viewpoint of the application domain? AM Lenail 11:00 David Tse AM Schedule 12:00 Lunch PM 08:50 Opening remarks Tosi, Vellido, Álvarez 01:30 Bobby Kleinberg AM PM 09:10 Invited talk: Is interpretability and 02:45 Spotlight session III Kazemi, Salehi AM explainability enough for safe and PM reliable decision making? Saria 03:00 Cofee Break and Posters 09:45 Invited talk: The Role of Explanation PM AM in Holding AIs Accountable Doshi- 04:00 Nina Balcan Velez PM 10:20 Contributed talk: Beyond Sparsity: 05:00 Contributed talk AM Tree-based Regularization of Deep PM Models for Interpretability Wu, 05:15 Poster session Parbhoo, Doshi-Velez PM 10:30 Cofe break 1 AM 11:00 Invited talk: Challenges for AM Transparency Weller 11:20 Contributed talk: Safe Policy Search AM with Gaussian Process Models Polymenakos, Roberts 42 Dec. 8, 2017

11:30 AM Poster spotlights Kuwajima, Tanaka, Liang, Komorowski, Que, Drumond, Raghu, Celi, Göpfert, Ross, Tan, Caruana, Lou, Kumar, Taylor, Poursabzi-Sangdeh, Wortman Vaughan, Wallach
12:00 PM Poster session part I
12:30 PM Lunch break
02:00 PM Invited talk: When the classifier doesn't know: optimum reject options for classification. Hammer
02:35 PM Contributed talk: Predict Responsibly: Increasing Fairness by Learning To Defer Madras, Zemel, Pitassi
02:45 PM Contributed talk: Deep Motif Dashboard: Visualizing and Understanding Genomic Sequences Using Deep Neural Networks Lanchantin, Singh, Wang
02:55 PM Best paper prize announcement
03:00 PM Coffee break and Poster session part II
03:30 PM Invited talk: Robot Transparency as Optimal Control Dragan
04:05 PM Invited talk 6 Amodei
04:40 PM Panel discussion
05:20 PM Final remarks Tosi, Vellido, Álvarez
05:30 PM End of workshop

Abstracts (8):

Abstract 1: Opening remarks in Transparent and interpretable Machine Learning in Safety Critical Environments, Tosi, Vellido, Álvarez 08:50 AM

Opening remarks and introduction to the Workshop on Transparent and Interpretable Machine Learning in Safety Critical Environments.

Abstract 3: Invited talk: The Role of Explanation in Holding AIs Accountable in Transparent and interpretable Machine Learning in Safety Critical Environments, Doshi-Velez 09:45 AM

As AIs are used in more common and consequential situations, it is important that we find ways to take advantage of our computational capabilities while also holding the creators of these systems accountable. In this talk, I'll start out by sharing some of the challenges associated with deploying AIs in healthcare, and how interpretability or explanation is an essential tool in this domain. Then I'll speak more broadly about the role of explanation in holding AIs accountable under the law (especially in the context of current regulation around AIs). In doing so, I hope to spark discussions about how we, as a machine learning community, believe that our work should be regulated.

Abstract 4: Contributed talk: Beyond Sparsity: Tree-based Regularization of Deep Models for Interpretability in Transparent and interpretable Machine Learning in Safety Critical Environments, Wu, Parbhoo, Doshi-Velez 10:20 AM

The lack of interpretability remains a key barrier to the adoption of deep models in many healthcare applications. In this work, we explicitly regularize deep models so human users might step through the process behind their predictions in little time. Specifically, we train deep time-series models so their class-probability predictions have high accuracy while being closely modeled by decision trees with few nodes. On two clinical decision-making tasks, we demonstrate that this new tree-based regularization is distinct from simpler L2 or L1 penalties, resulting in more interpretable models without sacrificing predictive power.
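A crude offline proxy for the idea in Abstract 4 can be sketched as distillation plus a fidelity check: fit a small tree to the trained network's outputs and measure agreement. Note this is only an after-the-fact diagnostic under stated assumptions; the paper instead differentiates a surrogate of tree complexity during training, which this sketch does not attempt.

```python
# Offline proxy for tree-regularized interpretability: distill a trained
# model into a depth-limited decision tree and report its fidelity.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def tree_fidelity(net_predict, X, max_depth=3):
    """net_predict: X -> hard labels from the trained deep model."""
    y_net = net_predict(X)
    tree = DecisionTreeClassifier(max_depth=max_depth).fit(X, y_net)
    fidelity = (tree.predict(X) == y_net).mean()   # agreement in [0, 1]
    return tree, fidelity
```

High fidelity at a small depth is the property the tree-regularization penalty is designed to encourage in the deep model itself.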

Abstract 7: Contributed talk: Safe Policy Search with Gaussian Process Models in Transparent and interpretable Machine Learning in Safety Critical Environments, Polymenakos, Roberts 11:20 AM

We propose a method to optimise the parameters of a policy which will be used to safely perform a given task in a data-efficient manner. We train a Gaussian process model to capture the system dynamics, based on the PILCO framework. Our model has useful analytic properties, which allow closed-form computation of error gradients and estimation of the probability of violating given state space constraints. During training, as well as operation, only policies that are deemed safe are implemented on the real system, minimising the risk of failure.

Abstract 8: Poster spotlights in Transparent and interpretable Machine Learning in Safety Critical Environments, Kuwajima, Tanaka, Liang, Komorowski, Que, Drumond, Raghu, Celi, Göpfert, Ross, Tan, Caruana, Lou, Kumar, Taylor, Poursabzi-Sangdeh, Wortman Vaughan, Wallach 11:30 AM

[1] "Network Analysis for Explanation"
[2] "Using prototypes to improve convolutional networks interpretability"
[3] "Accelerated Primal-Dual Policy Optimization for Safe Reinforcement Learning"
[4] "Deep Reinforcement Learning for Sepsis Treatment"
[5] "Analyzing Feature Relevance for Linear Reject Option SVM using Relevance Intervals"
[6] "The Neural LASSO: Local Linear Sparsity for Interpretable Explanations"
[7] "Detecting Bias in Black-Box Models Using Transparent Model Distillation"
[8] "Data masking for privacy-sensitive learning"
[9] "CLEAR-DR: Interpretable Computer Aided Diagnosis of Diabetic Retinopathy"
[10] "Manipulating and Measuring Model Interpretability"

Abstract 12: Contributed talk: Predict Responsibly: Increasing Fairness by Learning To Defer in Transparent and interpretable Machine Learning in Safety Critical Environments, Madras, Zemel, Pitassi 02:35 PM

Machine learning systems, which are often used for high-stakes decisions, suffer from two mutually reinforcing problems: unfairness and opaqueness. Many popular models, though generally accurate, cannot express uncertainty about their predictions. Even in regimes where a model is inaccurate, users may trust the model's predictions too fully, and allow its biases to reinforce the user's own.

In this work, we explore models that learn to defer. In our scheme, a model learns to classify accurately and fairly, but also to defer if necessary, passing judgment to a downstream decision-maker such as a human user. We further propose a learning algorithm which accounts for potential biases held by decision-makers later in a pipeline. Experiments on real-world datasets demonstrate that learning to defer can make a model not only more accurate but also less biased. Even when operated by biased users, we show that deferring models can still greatly improve the fairness of the entire pipeline.

Abstract 13: Contributed talk: Deep Motif Dashboard: Visualizing and Understanding Genomic Sequences Using Deep Neural Networks in Transparent and interpretable Machine Learning in Safety Critical Environments, Lanchantin, Singh, Wang 02:45 PM

Deep neural network (DNN) models have recently obtained state-of-the-art prediction accuracy for the transcription factor binding site (TFBS) classification task. However, it remains unclear how these approaches identify meaningful DNA sequence signals and give insights as to why TFs bind to certain locations. In this paper, we propose a toolkit called the Deep Motif Dashboard (DeMo Dashboard) which provides a suite of visualization strategies to extract motifs, or sequence patterns, from deep neural network models for TFBS classification. We demonstrate how to visualize and understand three important DNN models using three visualization methods: saliency maps, temporal output scores, and class optimization. In addition to providing insights as to how each model makes its prediction, the visualization techniques indicate that CNN-RNN makes predictions by modeling both motifs as well as dependencies among them.
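The simplest baseline for the deferral scheme in Abstract 12 is a confidence-thresholded reject option, sketched below. The paper's contribution goes well beyond this: it trains the deferral rule jointly and models the downstream decision-maker's own biases, neither of which this hypothetical sketch attempts.

```python
# Minimal reject-option baseline: pass low-confidence cases downstream.
import numpy as np

DEFER = -1   # sentinel for "pass to the human decision-maker"

def predict_or_defer(proba, threshold=0.8):
    """proba: (n, n_classes) predicted class probabilities."""
    conf = proba.max(axis=1)            # model confidence per example
    labels = proba.argmax(axis=1)
    return np.where(conf >= threshold, labels, DEFER)
```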

Abstract 16: Invited talk: Robot Transparency as Optimal Control in Transparent and interpretable Machine Learning in Safety Critical Environments, Dragan 03:30 PM

In this talk, we will formalize transparency as acting in a dynamical system or MDP in which we augment the physical state with the human's belief about the robot. We will characterize the dynamics model in this MDP, and show that approximate solutions lead to cars that drive in a way that is easier to anticipate, robots that come up with instructive demonstrations of their task knowledge, manipulator arms that clarify their intent, and navigation robots that clarify their future task plans. Lastly, we will briefly explore robots that express more interesting properties, like their level of confidence in their task, or the weight of an object they are carrying.

NIPS 2017 Time Series Workshop

Vitaly Kuznetsov, Oren Anava, Scott Yang, Azadeh Khaleghi

Grand Ballroom A, Fri Dec 08, 08:00 AM

Data, in the form of time-dependent sequential observations, emerge in many key real-world problems, ranging from biological data, financial markets, and weather forecasting to audio/video processing. However, despite the ubiquity of such data, most mainstream machine learning algorithms have been primarily developed for settings in which sample points are drawn i.i.d. from some (usually unknown) fixed distribution. While there exist algorithms designed to handle non-i.i.d. data, these typically assume a specific parametric form for the data-generating distribution. Such assumptions may undermine the complex nature of modern data, which can possess long-range dependency patterns that we now have the computing power to discern. On the other extreme lie online learning algorithms that consider a more general framework without any distributional assumptions. However, by being purely agnostic, common online algorithms may not fully exploit the stochastic aspect of time-series data.

This is the third instalment of the time series workshop at NIPS and will build on the success of the previous events: the NIPS 2015 Time Series Workshop and the NIPS 2016 Time Series Workshop.

The goal of this workshop is to bring together theoretical and applied researchers interested in the analysis of time series and the development of new algorithms to process sequential data. This includes algorithms for time series prediction, classification, clustering, anomaly and change point detection, correlation discovery, and dimensionality reduction, as well as a general theory for learning and comparing stochastic processes. We invite researchers from the related areas of batch and online learning, reinforcement learning, data analysis and statistics, econometrics, and many others to contribute to this workshop.

Schedule

09:00 AM Introduction to Time Series Workshop
09:15 AM Marco Cuturi: Soft-DTW, a differentiable loss for time series data
10:00 AM Learning theory and algorithms for shapelets and other local features. Daiki Suehiro, Kohei Hatano, Eiji Takimoto, Shuji Yamamoto, Kenichi Bannai and Akiko Takeda.
10:15 AM Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting. Yaguang Li, Rose Yu, Cyrus Shahabi and Yan Liu. Li
10:30 AM Morning Coffee Break
11:00 AM Panel discussion featuring Marco Cuturi (ENSAE / CREST), Claire Monteleoni (GWU), Karthik Sridharan (Cornell), Firdaus Janoos (Two Sigma) and Matthias Seeger (Amazon)
11:45 AM Poster Session Zand, Tu, Lee, Covert, Hernandez, Ebrahimzadeh, Slawinska, Supratak, Lu, Alberg, Shen, Yeo, Pao, Chai, Agarwal, Giannakis, Amjad
12:30 PM Lunch
02:30 PM DISCOVERING ORDER IN UNORDERED DATASETS: GENERATIVE MARKOV NETWORKS. Yao-Hung Hubert Tsai, Han Zhao, Nebojsa Jojic and Ruslan Salakhutdinov.

02:45 PM Vitaly Kuznetsov: Kaggle web traffic time series forecasting competition: results and insights
03:30 PM Afternoon Coffee Break
04:00 PM Skip RNN: Learning to Skip State Updates in Recurrent Neural Networks. Víctor Campos, Brendan Jou, Xavier Giró-i-Nieto, Jordi Torres and Shih-Fu Chang.
04:15 PM Karthik Sridharan: Online learning, Probabilistic Inequalities and the Burkholder Method
05:00 PM Scalable Joint Models for Reliable Event Prediction. Hossein Soleimani, James Hensman and Suchi Saria.
05:15 PM Claire Monteleoni: Algorithms for Climate Informatics: Learning from spatiotemporal data with both spatial and temporal non-stationarity
06:00 PM An Efficient ADMM Algorithm for Structural Break Detection in Multivariate Time Series. Alex Tank, Emily Fox and Ali Shojaie. Tank
06:15 PM Conclusion and Awards

Abstracts (5):

Abstract 2: Marco Cuturi: Soft-DTW, a differentiable loss for time series data in NIPS 2017 Time Series Workshop, 09:15 AM

I will present in this talk a modification of the dynamic time warping distance which is, unlike the original quantity, differentiable in all of its inputs. As a result, that alternative distance can be used naturally as a learning loss to learn with datasets of time series, to produce means, clusters or structured prediction where the goal is to forecast entire time series.
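The modification described in Abstract 2 replaces the min in the classical DTW recursion with a soft-min, which makes the alignment cost differentiable. The sketch below implements the forward recursion for univariate series under simple assumptions (squared pointwise cost, smoothing parameter gamma > 0); it is an illustration of the idea, not Cuturi's implementation, and omits the gradient computation.

```python
# Soft-DTW forward pass: R[i,j] = cost(i,j) + softmin(R[i-1,j], R[i,j-1],
# R[i-1,j-1]), with softmin_g(a) = -g * log(sum(exp(-a / g))).
# As gamma -> 0, this recovers ordinary DTW.
import numpy as np

def soft_dtw(x, y, gamma=1.0):
    """x: (n,), y: (m,) univariate series; returns the smoothed DTW cost."""
    n, m = len(x), len(y)
    D = (x[:, None] - y[None, :]) ** 2            # pointwise squared costs
    R = np.full((n + 1, m + 1), np.inf)
    R[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            prev = np.array([R[i-1, j], R[i, j-1], R[i-1, j-1]])
            zmax = (-prev / gamma).max()          # log-sum-exp stabilization
            softmin = -gamma * (np.log(np.exp(-prev / gamma - zmax).sum())
                                + zmax)
            R[i, j] = D[i-1, j-1] + softmin
    return R[n, m]
```

Because every step is smooth, the cost can be backpropagated through, which is what allows it to serve as a training loss for means, clustering, or forecasting of entire series.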

Abstract 7: Poster Session in NIPS 2017 Time Series Workshop, Zand, Tu, Lee, Covert, Hernandez, Ebrahimzadeh, Slawinska, Supratak, Lu, Alberg, Shen, Yeo, Pao, Chai, Agarwal, Giannakis, Amjad 11:45 AM

Feel free to enjoy posters at lunch time as well!

Víctor Campos, Brendan Jou, Xavier Giró-i-Nieto, Jordi Torres and Shih-Fu Chang. Skip RNN: Learning to Skip State Updates in Recurrent Neural Networks.
Yao-Hung Hubert Tsai, Han Zhao, Nebojsa Jojic and Ruslan Salakhutdinov. DISCOVERING ORDER IN UNORDERED DATASETS: GENERATIVE MARKOV NETWORKS.
Yaguang Li, Rose Yu, Cyrus Shahabi and Yan Liu. Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting.
Alex Tank, Emily Fox and Ali Shojaie. An Efficient ADMM Algorithm for Structural Break Detection in Multivariate Time Series.
Hossein Soleimani, James Hensman and Suchi Saria. Scalable Joint Models for Reliable Event Prediction.
Daiki Suehiro, Kohei Hatano, Eiji Takimoto, Shuji Yamamoto, Kenichi Bannai and Akiko Takeda. Learning theory and algorithms for shapelets and other local features.
Tao-Yi Lee, Yuh-Jye Lee, Hsing-Kuo Pao, You-Hua Lin and Yi-Ren Yeh. Elastic Motif Segmentation and Alignment of Time Series for Encoding and Classification.
Yun Jie Serene Yeo, Kian Ming A. Chai, Weiping Priscilla Fan, Si Hui Maureen Lee, Junxian Ong, Poh Ling Tan, Yu Li Lydia Law and Kok-Yong Seng. DP Mixture of Warped Correlated GPs for Individualized Time Series Prediction.
Anish Agarwal, Muhammad Amjad, Devavrat Shah and Dennis Shen. Time Series Forecasting = Matrix Estimation.
Rose Yu, Stephan Zheng, Anima Anandkumar and Yisong Yue. Long-term Forecasting using Tensor-Train RNNs.
Pranamesh Chakraborty, Chinmay Hegde and Anuj Sharma. Trend Filtering in Network Time Series with Applications to Traffic Incident Detection.
Jaleh Zand and Stephen Roberts. MiDGaP: Mixture Density Gaussian Processes.
Dimitrios Giannakis, Joanna Slawinska, Abbas Ourmazd and Zhizhen Zhao. Vector-Valued Spectral Analysis of Space-Time Data.
Neil Dhir and Adam Kosiorek. Bayesian delay embeddings for dynamical systems.
Ruofeng Wen, Kari Torkkola and Balakrishnan Narayanaswamy. A Multi-Horizon Quantile Recurrent Forecaster.
Aleksander Wieczorek and Volker Roth. Time Series Classification with Causal Compression.
Alessandro Davide Ialongo, Mark van der Wilk and Carl Edward Rasmussen. Closed-form Inference and Prediction in Gaussian Process State-Space Models.
Daniel Hernandez, Liam Paninski and John Cunningham. Variational inference for latent nonlinear dynamics.
Hao Liu, Haoli Bai, Lirong He and Zenglin Xu. Structured Inference for Recurrent Hidden Semi-markov Model.
Alex Tank, Ian Covert, Nick Foti, Ali Shojaie and Emily Fox. An Interpretable and Sparse Neural Network Model for Nonlinear Granger Causality Discovery.
Petar Veličković, Laurynas Karazija, Nicholas Lane, Sourav Bhattacharya, Edgar Liberis, Pietro Lio, Angela Chieh, Otmane Bellahsen and Matthieu Vegreville. Cross-modal Recurrent Models for Weight Objective Prediction from Multimodal Time-series Data.
John Alberg and Zachary Lipton. Improving Factor-Based Quantitative Investing by Forecasting Company Fundamentals.
Achintya Kr. Sarkar and Zheng-Hua Tan. Time-Contrastive Learning Based DNN Bottleneck Features for Text-Dependent Speaker Verification.
Kun Tu, Bruno Ribeiro, Ananthram Swami and Don Towsley. Temporal Clustering in time-varying Networks with Time Series Analysis.
Ankit Gandhi, Vineet Chaoji and Arijit Biswas. Modeling Customer Time Series for Age Prediction.
Shaojie Bai, J. Zico Kolter and Vladlen Koltun. Convolutional Sequence Modeling Revisited.
Zahra Ebrahimzadeh and Samantha Kleinberg. Multi-Scale Change Point Detection in Multivariate Time Series.
Apurv Shukla, Se-Young Yun and Daniel Bienstock. Non-Stationary Streaming PCA.
Kun Zhao, Takayuki Osogami and Rudy Raymond. Fluid simulation with dynamic Boltzmann machine in batch manner.
Anderson Zhang, Miao Lu, Deguang Kong and Jimmy Yang. Bayesian Time Series Forecasting with Change Point and .
Akara Supratak, Steffen Schneider, Hao Dong, Ling Li and Yike Guo. Towards Desynchronization Detection in Biosignals.
Rudy Raymond, Takayuki Osogami and Sakyasingha Dasgupta. Dynamic Boltzmann Machines for Second Order Moments and Generalized Gaussian Distributions.
Itamar Ben-Ari and Ravid Shwartz-Ziv. Sequence modeling using a memory controller extension for LSTM.

Abstract 8: Lunch in NIPS 2017 Time Series Workshop, 12:30 PM

Lunch on your own.

Abstract 13: Karthik Sridharan: Online learning, Probabilistic Inequalities and the Burkholder Method in NIPS 2017 Time Series Workshop, 04:15 PM

Online learning is a framework that makes minimal assumptions about the sequence of instances provided to a learner. This makes online learning an excellent framework for dealing with sequences of instances that vary with time. In this talk, we will look at inherent connections between online learning, certain Probabilistic Inequalities and the so-called Burkholder Method. We will see how one can derive new, optimal, adaptive online learning algorithms using the Burkholder Method via the connection with Probabilistic Inequalities. We will use this insight to help us get a step closer to what I shall term Plug-&-Play ML, that is, help us move a step towards building machine learning systems automatically.
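The minimal-assumption setting Abstract 13 describes is easiest to see in the classic experts framework. Below is a standard exponentiated-weights (Hedge) sketch for arbitrary bounded loss sequences; it is background illustration only, not the Burkholder-method constructions from the talk.

```python
# Exponentiated weights over N experts on an arbitrary loss sequence:
# no stochastic assumptions on how the losses are generated.
import numpy as np

def hedge(loss_matrix, eta=0.5):
    """loss_matrix: (T, N) losses in [0, 1]; returns learner's total loss."""
    T, N = loss_matrix.shape
    w = np.ones(N) / N
    total = 0.0
    for t in range(T):
        p = w / w.sum()                      # play the weighted mixture
        total += p @ loss_matrix[t]
        w *= np.exp(-eta * loss_matrix[t])   # downweight poor experts
    return total   # regret vs. best expert is O(sqrt(T log N)) for tuned eta
```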

Abstract 15: Claire Monteleoni: Algorithms for Climate Informatics: Learning from spatiotemporal data with both spatial and temporal non-stationarity in NIPS 2017 Time Series Workshop, 05:15 PM

Climate Informatics is emerging as a compelling application of machine learning. This is due in part to the urgent nature of climate change, and its many remaining uncertainties (e.g. how will a changing climate affect severe storms and other extreme weather events?). Meanwhile, progress in climate informatics is made possible in part by the public availability of vast amounts of data, both simulated by large-scale physics-based models, and observed. Not only are time series at the crux of the study of climate science, but also, by definition, climate change implies non-stationarity. In addition, much of the relevant data is spatiotemporal, and also varies over location. In this talk, I will discuss our work on learning in the presence of spatial and temporal non-stationarity, and exploiting local dependencies in time and space. Along the way, I will highlight open problems in which machine learning, including deep learning methods, may prove fruitful.

Conversational AI - today's practice and tomorrow's potential

Alborz Geramifard, Jason Williams, Larry Heck, Jim Glass, Antoine Bordes, Steve Young, Gerald Tesauro

Grand Ballroom B, Fri Dec 08, 08:00 AM

In the span of only a few years, conversational systems have become commonplace. Every day, millions of people use natural-language interfaces such as Siri, Google Now, Cortana, Alexa, Facebook M and others, via in-home devices, phones, or messaging channels such as Messenger, Slack and Skype, among others. At the same time, interest among the research community in conversational systems has blossomed: for supervised and reinforcement learning, conversational systems often serve as both a benchmark task and an inspiration for new ML methods at conferences which don't focus on speech and language per se, such as NIPS, ICML, IJCAI, and others. Research community challenge tasks are proliferating, including the sixth Dialog Systems Technology Challenge (DSTC6), the Amazon Alexa prize, and the Conversational Intelligence Challenge live competition at NIPS 2017.

Now more than ever, it is crucial to promote cross-pollination of ideas between academic research centers and industry. The goal of this workshop is to bring together researchers and practitioners in this area, to clarify impactful research problems, share findings from large-scale real-world deployments, and generate new ideas for future lines of research.

This workshop will include invited talks from academia and industry, contributed work, and open discussion. In these talks, senior technical leaders from many of the most popular conversational services will give insights into real usage and challenges at scale. An open call for papers will be issued, and we will prioritize forward-looking papers that propose interesting and impactful contributions. We will end the day with an open discussion, including a panel consisting of academic and industrial researchers.

Schedule

09:00 AM Opening (Alborz) Geramifard
09:10 AM Invited Talk - Satinder Singh Singh
09:50 AM An Ensemble Model with Ranking for Social Dialogue
10:10 AM A Deep Reinforcement Learning Chatbot

10:30 AM Coffee Break / Hangup Posters
11:00 AM Invited Talk - Joelle Pineau Pineau
11:40 AM Multi-Domain Adversarial Learning for Slot Filling in Spoken Language Understanding
12:00 PM Lunch Break
01:30 PM Invited Talk: Amazon (Ashwin) Ram
02:10 PM Invited Talk - Apple Talk (Blaise) Thomson
02:50 PM One Minute Poster Spotlight Zhao, khatri, Joshi, Fazel-Zarandi, Gupta, Li, Cercas Curry, Hedayatnia, Zhou, Venkatesh, Mi, Cheng
03:05 PM Coffee Break / Poster Session
03:30 PM Invited Talk: Microsoft (Asli and Jianfeng) Gao
04:10 PM Invited Talk: Google (Matt and Dilek) Hakkani-Tur
04:50 PM Poster Session
05:20 PM Invited Talk: IBM (David, Rudolf) Kadlec, Nahamoo
06:00 PM Panel by organizers
06:50 PM Closing Remarks (Jason) Williams

Abstracts (1):

Abstract 17: Panel by organizers in Conversational AI - today's practice and tomorrow's potential, 06:00 PM

Q&A

OPT 2017: Optimization for Machine Learning

Suvrit Sra, Sashank J. Reddi, Alekh Agarwal, Benjamin Recht

Hall A, Fri Dec 08, 08:00 AM

Dear NIPS Workshop Chairs,

We propose to organize the workshop: OPT 2017: Optimization for Machine Learning.

This year marks a major milestone in the history of OPT, as it will be the 10th anniversary edition of this long-running NIPS workshop.

The previous OPT workshops enjoyed packed to overpacked attendance. This huge interest is no surprise: optimization is the 2nd largest topic at NIPS and is indeed foundational for the wider ML community.

Looking back over the past decade, a strong trend is apparent: the intersection of OPT and ML has grown monotonically to the point that now several cutting-edge advances in optimization arise from the ML community. The distinctive feature of optimization within ML is its departure from textbook approaches, in particular, by having a different set of goals driven by "big data," where both models and practical implementation are crucial.

This intimate relation between OPT and ML is the core theme of our workshop. OPT workshops have previously covered a variety of topics, such as frameworks for convex programs (D. Bertsekas), the intersection of ML and optimization, especially SVM training (S. Wright), large-scale learning via stochastic gradient methods and its tradeoffs (L. Bottou, N. Srebro), exploitation of structured sparsity (Vandenberghe), randomized methods for extremely large-scale convex optimization (A. Nemirovski), complexity-theoretic foundations of convex optimization (Y. Nesterov), distributed large-scale optimization (S. Boyd), asynchronous and sparsity-based stochastic gradient (B. Recht), algebraic techniques in machine learning (P. Parrilo), insights into nonconvex optimization (A. Lewis), sums-of-squares techniques (J. Lasserre), optimization in the context of deep learning (Y. Bengio), stochastic convex optimization (G. Lan), new views on interior point (E. Hazan), among others.

Several ideas propounded in these talks have become important research topics in ML and optimization --- especially in the field of randomized algorithms, stochastic gradient and variance-reduced stochastic gradient methods.
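The variance-reduction idea just mentioned can be made concrete in a few lines. The following is a minimal SVRG-style sketch for a finite-sum objective f(w) = (1/n) sum_i f_i(w); it is a generic illustration of the technique, with hypothetical function names, not code from any of the cited talks.

```python
# SVRG-style variance-reduced stochastic gradient for a finite sum.
# grad_i(w, i) is assumed to return the gradient of f_i at w.
import numpy as np

def svrg(grad_i, w0, n, step=0.1, epochs=10, inner=None):
    rng = np.random.default_rng(0)
    w_ref = w0.copy()
    inner = inner or n
    for _ in range(epochs):
        # Full gradient at the snapshot point, recomputed once per epoch.
        full_grad = sum(grad_i(w_ref, i) for i in range(n)) / n
        w = w_ref.copy()
        for _ in range(inner):
            i = rng.integers(n)
            # Control variate: unbiased estimate with shrinking variance
            # as w approaches w_ref.
            g = grad_i(w, i) - grad_i(w_ref, i) + full_grad
            w = w - step * g
        w_ref = w
    return w_ref
```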

An edited book, "Optimization for Machine Learning" (S. Sra, S. Nowozin, and S. Wright; MIT Press, 2011), grew out of the first three OPT workshops, and contains high-quality contributions from many of the speakers and attendees; there have been sustained requests for the next edition of such a volume.

We wish to use OPT 2017 as a platform to foster discussion, discovery, and dissemination of the state-of-the-art in optimization as relevant to machine learning. And even beyond that, as a platform to identify new directions and challenges that will drive future research.

Continuing its trend, the workshop will bring experts in optimization to share their perspectives while leveraging crossover experts in ML to share their views and recent advances. Our tentative invited speakers for this year are:

Yurii Nesterov (already agreed)
Dimitri Bertsekas (already agreed)
Francis Bach (already agreed)

Distinction from other optimization workshops at NIPS:

Compared to the other optimization-focused workshops that happen (or have happened) at NIPS, key distinguishing features of OPT are: (a) it provides a unique bridge between the ML community and the wider optimization community, and is the longest-running NIPS workshop on optimization (since NIPS 2008); (b) it encourages theoretical work on an equal footing with practical efficiency; (c) it caters to a wide body of NIPS attendees, experts and beginners alike; and (d) it covers optimization in a broad spectrum, with a focus on bringing in new optimization ideas from different communities into ML while identifying key future directions for the broader OPTML community.

Organization
------------

The main features of the proposed workshop are:
1. One day long with morning and afternoon sessions
2. Four invited talks by leading experts from optimization and ML
3. Contributed talks from the broader OPT and ML community
4. A panel discussion exploring key future directions for OPTML

Schedule

08:50 AM Opening Remarks
09:00 AM Poster Session Lau, Maly, Loizou, Kroer, Yao, Park, Kovacs, Yin, Zhukov, Lim, Barmherzig, Metaxas, Shi, Udwani, Brendel, Zhou, braverman, Liu, Golikov
09:00 AM Invited Talk: Leon Bottou
09:45 AM Invited Talk: Yurii Nesterov
10:30 AM Coffee Break 1
11:00 AM Spotlight: Oracle Complexity of Second-Order Methods for Smooth Convex Optimization
11:15 AM Spotlight: Gradient Diversity: a Key Ingredient for Scalable Distributed Learning
11:30 AM Invited Talk: Francis Bach
12:15 PM Lunch Break
02:00 PM Invited Talk: Dimitri Bertsekas
02:45 PM Spotlight: Lower Bounds for Finding Stationary Points of Non-Convex, Smooth High-Dimensional Functions
03:00 PM Coffee Break 2
03:30 PM Invited Talk: Pablo Parrilo
04:15 PM Spotlight: Efficiently Optimizing over (Non-Convex) Cones via Approximate Projections
04:30 PM Poster Session II

Abstracts (1):

Abstract 1: Opening Remarks in OPT 2017: data will be made available for future learning. Optimization for Machine Learning, 08:50 AM Learning algorithms have to reason about how changes to the system will afect future data, Opening Remarks by the Organizers. giving rise to challenging counterfactual and causal reasoning issues that the learning algorithm has to account for. Modern and scalable policy learning algorithms also require From 'What If?' To 'What Next?' : Causal operating with non-experimental data, such as Inference and Machine Learning for logged user interaction data where users click Intelligent Decision Making ads suggested by recommender systems trained on historical user clicks. Alexander Volfovsky, Adith Swaminathan, Panos Toulis, Nathan Kallus, Ricardo Silva, John Shawe- To further bring the community together around Taylor, Thorsten Joachims, Lihong Li the use of such interaction data, this workshop will host a Kaggle challenge problem based on Hall C, Fri Dec 08, 08:30 AM the frst real-world dataset of logged contextual bandit feedback with non-uniform action- In recent years machine learning and causal selection propensities. The dataset consists of inference have both seen important advances, several gigabytes of data from an ad placement especially through a dramatic expansion of their system, which we have processed into multiple theoretical and practical domains. Machine well-defned learning problems of increasing learning has focused on ultra high-dimensional complexity, feedback signal, and context. models and scalable stochastic algorithms, Participants in the challenge problem will be able whereas causal inference has been guiding policy to discuss their results at the workshop. in complex domains involving economics, social and health sciences, and business. Through such advances a powerful cross-pollination has Schedule emerged as a new set of methodologies promising to deliver robust data analysis than each feld could individually -- some examples 08:30 Introductions Toulis, Volfovsky include concepts such as doubly-robust methods, AM targeted learning, double machine learning, 08:45 Looking for a Missing Signal Bottou causal trees, all of which have recently been AM introduced. 09:20 Invited Talk Brunskill AM This workshop is aimed at facilitating more 10:00 Contributed Talk 1 Heckerman interactions between researchers in machine AM learning and causal inference. In particular, it is 10:15 Contributed Talk 2 Mitrovic an opportunity to bring together highly technical AM individuals who are strongly motivated by the 11:00 Invited Talk 3 Hahn practical importance and real-world impact of AM their work. Cultivating such interactions will lead 11:35 Invited Talk 4 Sontag to the development of theory, methodology, and AM - most importantly - practical tools, that better target causal questions across diferent domains. 12:10 Poster session Zaidi, Kurz, Heckerman, PM Lin, Riezler, Shpitser, Yan, Goudet, In particular, we will highlight theory, algorithms Deshpande, Pearl, Mitrovic, Vegetabile, and applications on automatic decision making Lee, Sachs, Mohan, Rose, Ramakers, systems, such as recommendation engines, Hassanpour, Baldi, Nabi, Hammarlund, medical decision systems and self-driving cars, as Sherman, Lawrence, Jabbari, Semenova, both producers and users of data. The challenge Dimakopoulou, Gajane, Greiner, Zadik, here is the feedback between learning from data Blocker, Xu, EL HAY, Jebara, Rostykus and then taking actions that may afect what 51 Dec. 
Schedule

08:30 AM Introductions Toulis, Volfovsky
08:45 AM Looking for a Missing Signal Bottou
09:20 AM Invited Talk Brunskill
10:00 AM Contributed Talk 1 Heckerman
10:15 AM Contributed Talk 2 Mitrovic
11:00 AM Invited Talk 3 Hahn
11:35 AM Invited Talk 4 Sontag
12:10 PM Poster session Zaidi, Kurz, Heckerman, Lin, Riezler, Shpitser, Yan, Goudet, Deshpande, Pearl, Mitrovic, Vegetabile, Lee, Sachs, Mohan, Rose, Ramakers, Hassanpour, Baldi, Nabi, Hammarlund, Sherman, Lawrence, Jabbari, Semenova, Dimakopoulou, Gajane, Greiner, Zadik, Blocker, Xu, EL HAY, Jebara, Rostykus
01:35 PM Contributed Talk 3 Shpitser
01:50 PM Contributed Talk 4 Pearl
02:05 PM Invited Talk 5 Imbens
03:30 PM Invited Talk Yu
04:05 PM Causal inference with machine learning
05:00 PM Causality and Machine Learning Challenge: Criteo Ad Placement Challenge

Abstracts (1):

Abstract 2: Looking for a Missing Signal in From 'What If?' To 'What Next?' : Causal Inference and Machine Learning for Intelligent Decision Making, Bottou 08:45 AM

We know how to spot objects in images, but we must learn on more images than a human can see in a lifetime. We know how to translate text (somehow), but we must learn it on more text than a human can read in a lifetime. We know how to learn to play Atari games, but we must learn by playing more games than any teenager can endure. The list is long. We can of course try to pin this inefficiency on some properties of our algorithms. However, we can also take the point of view that there is possibly a lot of signal in natural data that we simply do not exploit. I will report on two works in this direction. The first one establishes that something as simple as a collection of static images contains nontrivial information about the causal relations between the objects they represent. The second one, time permitting, shows how an attempt to discover such a structure in observational data led to a clear improvement of Generative Adversarial Networks.

Extreme Classification: Multi-class & Multi-label Learning in Extremely Large Label Spaces

Manik Varma, Marius Kloft, Krzysztof Dembczynski

Hyatt Hotel, Regency Ballroom A+B+C, Fri Dec 08, 08:00 AM

Extreme classification is a rapidly growing research area focusing on multi-class and multi-label problems involving an extremely large number of labels. Many applications have been found in diverse areas ranging from language modelling to document tagging in NLP, face recognition to learning universal feature representations in computer vision, gene function prediction in bioinformatics, etc. Extreme classification has also opened up a new paradigm for ranking and recommendation by reformulating them as multi-label learning tasks where each item to be ranked or recommended is treated as a separate label (a reformulation sketched in code after the topic list below). Such reformulations have led to significant gains over traditional collaborative filtering and content based recommendation techniques. Consequently, extreme classifiers have been deployed in many real-world applications in industry.

Extreme classification raises a number of interesting research questions including those related to:

* Large scale learning and distributed and parallel training
* Log-time and log-space prediction and prediction on a test-time budget
* Label embedding and tree based approaches
* Crowd sourcing, preference elicitation and other data gathering techniques
* Bandits, semi-supervised learning and other approaches for dealing with training set biases and label noise
* Bandits with an extremely large number of arms
* Fine-grained classification
* Zero shot learning and extensible output spaces
* Tackling label polysemy, synonymy and correlations
* Structured output prediction and multi-task learning
* Learning from highly imbalanced data
* Dealing with tail labels and learning from very few data points per label
* PU learning and learning from missing and incorrect labels
* Feature extraction, feature sharing, lazy feature evaluation, etc.
* Performance evaluation
* Statistical analysis and generalization bounds
* Applications to new domains
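The ranking-as-multi-label reformulation mentioned above can be illustrated in a few lines. The sketch below (an illustration, not a method endorsed by the workshop) treats each item as a label and trains one binary classifier per label with scikit-learn; the synthetic users, feature sizes and interaction counts are assumptions, and at extreme scale the per-label linear scan would be replaced by the log-time tree or embedding approaches listed above.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.preprocessing import MultiLabelBinarizer

    rng = np.random.default_rng(0)

    # Toy recommendation data: each user has a feature vector, and the items
    # they interacted with become that user's label set.
    n_users, n_features, n_items = 200, 20, 50
    X = rng.normal(size=(n_users, n_features))
    interactions = [rng.choice(n_items, size=rng.integers(1, 5), replace=False)
                    for _ in range(n_users)]
    Y = MultiLabelBinarizer(classes=list(range(n_items))).fit_transform(interactions)

    # One binary classifier per item ("label"); extreme-classification methods
    # replace this linear scan over labels with log-time structures.
    model = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, Y)
    scores = model.predict_proba(X[:1])        # a score for every item
    top_k = np.argsort(scores[0])[::-1][:5]    # recommend the 5 highest-scoring
    print(top_k)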

The workshop aims to bring together researchers interested in these areas to encourage discussion and improve upon the state-of-the-art in extreme classification. In particular, we aim to bring together researchers from the natural language processing, computer vision and core machine learning communities to foster interaction and collaboration. Several leading researchers will present invited talks detailing the latest advances in the area. We also seek extended abstracts presenting work in progress which will be reviewed for acceptance as a spotlight + poster or a talk. The workshop should be of interest to researchers in core supervised learning as well as application domains such as recommender systems, computer vision, computational advertising, information retrieval and natural language processing. We expect a healthy participation from both industry and academia.

Schedule

09:00 AM Introduction by Manik Varma Varma
09:05 AM John Langford (MSR) on Dreaming Contextual Memory Langford
09:35 AM Ed Chi (Google) on Learned Deep Retrieval for Recommenders Chi
10:05 AM David Sontag (MIT) on Representation Learning for Extreme Multi-class Classification & Density Estimation
10:35 AM Coffee Break
11:00 AM Inderjit Dhillon (UT Austin & Amazon) on Stabilizing Gradients for Deep Neural Networks with Applications to Extreme Classification
11:30 AM Wei-cheng Chang (CMU) on Deep Learning Approach for Extreme Multi-label Text Classification
12:00 PM Lunch
01:30 PM Pradeep Ravikumar (CMU) on A Parallel Primal-Dual Sparse Method for Extreme Classification Ravikumar
02:00 PM Maxim Grechkin (UW) on EZLearn: Exploiting Organic Supervision in Large-Scale Data Annotation Grechkin
02:15 PM Sayantan Dasgupta (Michigan) on Multi-label Learning for Large Text Corpora using Latent Variable Model Dasgupta
02:30 PM Yukihiro Tagami (Yahoo) on Extreme Multi-label Learning via Nearest Neighbor Graph Partitioning and Embedding Tagami
03:00 PM Coffee Break
03:15 PM Mehryar Mohri (NYU) on Tight Learning Bounds for Multi-Class Classification Mohri
03:45 PM Ravi Ganti (Walmart Labs) on Exploiting Structure in Large Scale Bandit Problems Ganti
04:00 PM Hai S Le (WUSTL) on Precision-Recall versus Accuracy and the Role of Large Data Sets Le
04:15 PM Loubna Benabbou (EMI) on A Reduction Principle for Generalizing Bona Fide Risk Bounds in Multi-class Setting BENABBOU
04:30 PM Marius Kloft (Kaiserslautern) on Generalization Error Bounds for Extreme Multi-class Classification Kloft

Learning on Distributions, Functions, Graphs and Groups

Florence d'Alché-Buc, Krikamol Muandet, Bharath Sriperumbudur, Zoltán Szabó

Hyatt Hotel, Regency Ballroom D+E+F+H, Fri Dec 08, 08:00 AM

The increased variability of acquired data has recently pushed the field of machine learning to extend its scope to non-standard data including for example functional (Ferraty & Vieu, 2006; Wang et al., 2015), distributional (Póczos et al., 2013), graph, or topological data (Carlsson, 2009; Kurlin). Successful applications span a wide range of disciplines such as healthcare (Zhou et al., 2013), action recognition from iPod/iPhone accelerometer data (Sun et al., 2013), causal inference (Lopez-Paz et al., 2015), bioinformatics (Kondor & Pan, 2016; Kusano et al., 2016), cosmology (Ravanbakhsh et al., 2016; Law et al., 2017), acoustic-to-articulatory speech inversion (Kadri et al., 2016), network inference (Brouard et al., 2016), climate research (Szabó et al., 2016), and ecological inference (Flaxman et al., 2015).

Leveraging the underlying structure of these non-standard data types often leads to a significant boost in prediction accuracy and inference performance. In order to achieve these compelling improvements, however, numerous challenges and questions have to be addressed: (i) choosing an adequate representation of the data, (ii) constructing appropriate similarity measures (inner product, norm or metric) on these representations, (iii) efficiently exploiting their intrinsic structure such as multi-scale nature or invariances, (iv) designing affordable computational schemes (relying e.g., on surrogate losses), (v) understanding the computational-statistical tradeoffs of the resulting algorithms, and (vi) exploring novel application domains.

The goal of this workshop is (i) to discuss new theoretical considerations and applications related to learning with non-standard data, (ii) to explore future research directions by bringing together practitioners with various domain expertise and algorithmic tools, and theoreticians interested in providing sound methodology, and (iii) to accelerate the advances of this recent area and its application arsenal.

We encourage submissions on a variety of topics, including but not limited to (a small code illustration of point (ii) above follows the list):

- Novel applications for learning on non-standard objects
- Learning theory/algorithms on distributions
- Topological and geometric data analysis
- Functional data analysis
- Multi-task learning, structured output prediction, and surrogate losses
- Vector-valued learning (e.g., operator-valued kernel)
- Gaussian processes
- Learning on graphs and networks
- Group theoretic methods and invariances in learning
- Learning with non-standard input/output data
- Large-scale approximations (e.g. sketching, random Fourier features, hashing, Nyström method, inducing point methods), and statistical-computational efficiency tradeoffs
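As a small illustration of challenge (ii) above -- constructing similarity measures on non-standard representations -- the following sketch computes the (biased) empirical squared maximum mean discrepancy between two samples, i.e. the distance between their kernel mean embeddings. The Gaussian kernel, bandwidth and synthetic samples are assumptions.

    import numpy as np

    def gaussian_kernel(X, Y, sigma=1.0):
        # Gram matrix of the Gaussian kernel between two samples.
        sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq / (2 * sigma ** 2))

    def mmd2(X, Y, sigma=1.0):
        # Squared maximum mean discrepancy: the RKHS distance between the
        # kernel mean embeddings of the two empirical distributions.
        return (gaussian_kernel(X, X, sigma).mean()
                - 2 * gaussian_kernel(X, Y, sigma).mean()
                + gaussian_kernel(Y, Y, sigma).mean())

    rng = np.random.default_rng(0)
    P = rng.normal(0.0, 1.0, size=(500, 2))   # sample from distribution P
    Q = rng.normal(0.5, 1.0, size=(500, 2))   # sample from a shifted Q
    # Large value for different distributions, near zero for matching ones.
    print(mmd2(P, Q), mmd2(P, rng.normal(0.0, 1.0, size=(500, 2))))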

References:

Frédéric Ferraty and Philippe Vieu. Nonparametric Functional Data Analysis: Theory and Practice. Springer Series in Statistics, Springer-Verlag, 2006.

Jane-Ling Wang, Jeng-Min Chiou, and Hans-Georg Müller. Review of Functional Data Analysis. Annual Review of Statistics, 3:1-41, 2015.

Barnabás Póczos, Aarti Singh, Alessandro Rinaldo, Larry Wasserman. Distribution-free Distribution Regression. International Conference on AI and Statistics (AISTATS), PMLR 31:507-515, 2013.

Gunnar Carlsson. Topology and data. Bulletin of the American Mathematical Society, 46(2):255-308, 2009.

Vitaliy Kurlin. Research blog: http://kurlin.org/blog/.

Jiayu Zhou, Jun Liu, Vaibhav A. Narayan, and Jieping Ye. Modeling disease progression via multi-task learning. NeuroImage, 78:233-248, 2013.

Xu Sun, Hisashi Kashima, and Naonori Ueda. Large-scale personalized human activity recognition using online multitask learning. IEEE Transactions on Knowledge and Data Engineering, 25:2551-2563, 2013.

David Lopez-Paz, Krikamol Muandet, Bernhard Schölkopf, and Ilya Tolstikhin. Towards a Learning Theory of Cause-Effect Inference. International Conference on Machine Learning (ICML), PMLR 37:1452-1461, 2015.

Risi Kondor, Horace Pan. The Multiscale Laplacian Graph Kernel. Advances in Neural Information Processing Systems (NIPS), 2982-2990, 2016.

Genki Kusano, Yasuaki Hiraoka, Kenji Fukumizu. Persistence weighted Gaussian kernel for topological data analysis. International Conference on Machine Learning (ICML), PMLR 48:2004-2013, 2016.

Siamak Ravanbakhsh, Junier Oliva, Sebastian Fromenteau, Layne Price, Shirley Ho, Jeff Schneider, Barnabás Póczos. Estimating Cosmological Parameters from the Dark Matter Distribution. International Conference on Machine Learning (ICML), PMLR 48:2407-2416, 2016.

Ho Chung Leon Law, Dougal J. Sutherland, Dino Sejdinovic, Seth Flaxman. Bayesian Distribution Regression. Technical Report, 2017 (https://arxiv.org/abs/1705.04293).

Hachem Kadri, Emmanuel Duflos, Philippe Preux, Stéphane Canu, Alain Rakotomamonjy, and Julien Audiffren. Operator-valued kernels for learning from functional response data. Journal of Machine Learning Research, 17:1-54, 2016.

Céline Brouard, Marie Szafranski, and Florence d'Alché-Buc. Input output kernel regression: Supervised and semi-supervised structured output prediction with operator-valued kernels. Journal of Machine Learning Research, 17:1-48, 2016.

Zoltán Szabó, Bharath K. Sriperumbudur, Barnabás Póczos, Arthur Gretton. Learning Theory for Distribution Regression. Journal of Machine Learning Research, 17(152):1-40, 2016.

Seth Flaxman, Yu-Xiang Wang, and Alex Smola. Who supported Obama in 2012? Ecological inference through distribution regression. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 289-298, 2015.

Schedule

09:00 AM On Structured Prediction Theory with Calibrated Convex Surrogate Losses. Lacoste-Julien
09:30 AM Differentially Private Database Release via Kernel Mean Embeddings.
09:50 AM Bayesian Distribution Regression.
10:10 AM On Kernel Methods for Covariates that are Rankings (poster).
10:10 AM Learning from Conditional Distributions via Dual Embeddings (poster). Song
10:10 AM Squared Earth Mover's Distance Loss for Training Deep Neural Networks on Ordered-Classes (poster). Hou
10:10 AM Large Scale Graph Learning from Smooth Signals (poster).
10:10 AM Convolutional Layers based on Directed Multi-Graphs (poster). Arodz
10:10 AM Kernels on Fuzzy Sets: an Overview (poster). Guevara Diaz
10:10 AM The Geometric Block Model (poster). Pal
10:10 AM Graph based Feature Selection for Structured High Dimensional Data (poster). Zhang
10:10 AM Learning from Graphs with Structural Variation (poster). Holm, Nielsen
10:10 AM Worst-case vs. Average-case Design for Estimation from Fixed Pairwise Comparisons (poster).
10:10 AM Differentially Private Database Release via Kernel Mean Embeddings (poster).
10:10 AM Post Selection Inference with Maximum Mean Discrepancy (poster).
10:10 AM Algorithmic and Statistical Aspects of Linear Regression without Correspondence (poster).
10:10 AM When is Network Lasso Accurate: The Vector Case (poster).
10:10 AM Bayesian Distribution Regression (poster).
10:10 AM The Weighted Kendall Kernel (poster).
11:00 AM When is Network Lasso Accurate: The Vector Case. Tran
11:20 AM Worst-case vs. Average-case Design for Estimation from Fixed Pairwise Comparisons.
11:40 AM The Weighted Kendall Kernel.
12:00 PM On Kernel Methods for Covariates that are Rankings.

12:20 PM Lunch Break
01:50 PM Learning on topological and geometrical structures of data. Fukumizu
02:20 PM Operator-valued kernels and their application to functional data analysis. Kadri
02:50 PM Poster Session II & Coffee
03:50 PM Distribution Regression and its Applications. Poczos
04:20 PM Covariant Compositional Networks for Learning Graphs Kondor

Abstracts (27):

Abstract 1: On Structured Prediction Theory with Calibrated Convex Surrogate Losses. in Learning on Distributions, Functions, Graphs and Groups, Lacoste-Julien 09:00 AM

We provide novel theoretical insights on structured prediction in the context of efficient convex surrogate loss minimization with consistency guarantees. For any task loss, we construct a convex surrogate that can be optimized via stochastic gradient descent and we prove tight bounds on the so-called "calibration function" relating the excess surrogate risk to the actual risk. In contrast to prior related work, we carefully monitor the effect of the exponential number of classes in the learning guarantees as well as on the optimization complexity. As an interesting consequence, we formalize the intuition that some task losses make learning harder than others, and that the classical 0-1 loss is ill-suited for general structured prediction.

This (https://arxiv.org/abs/1703.02403) is joint work with Anton Osokin and Francis Bach.

Abstract 2: Differentially Private Database Release via Kernel Mean Embeddings. in Learning on Distributions, Functions, Graphs and Groups, 09:30 AM

Authors: Matej Balog, Ilya Tolstikhin, Bernhard Schölkopf.

Abstract 3: Bayesian Distribution Regression. in Learning on Distributions, Functions, Graphs and Groups, 09:50 AM

Authors: Ho Chung Leon Law, Dougal J. Sutherland, Dino Sejdinovic, Seth Flaxman.

Abstract 4: On Kernel Methods for Covariates that are Rankings (poster). in Learning on Distributions, Functions, Graphs and Groups, 10:10 AM

Authors: Horia Mania, Aaditya Ramdas, Martin Wainwright, Michael Jordan, Benjamin Recht. Poster session continues at 14:50 - 15:50.

Abstract 5: Learning from Conditional Distributions via Dual Embeddings (poster). in Learning on Distributions, Functions, Graphs and Groups, Song 10:10 AM

Authors: Bo Dai, Niao He, Yunpeng Pan, Byron Boots, Le Song. Poster session continues at 14:50 - 15:50.

Abstract 6: Squared Earth Mover's Distance Loss for Training Deep Neural Networks on Ordered-Classes (poster). in Learning on Distributions, Functions, Graphs and Groups, Hou 10:10 AM

Authors: Le Hou, Chen-Ping Yu, Dimitris Samaras. Poster session continues at 14:50 - 15:50.

Abstract 7: Large Scale Graph Learning from Smooth Signals (poster). in Learning on Distributions, Functions, Graphs and Groups, 10:10 AM

Authors: Vassilis Kalofolias, Nathanael Perraudin. Poster session continues at 14:50 - 15:50.

Abstract 8: Convolutional Layers based on Directed Multi-Graphs (poster). in Learning on Distributions, Functions, Graphs and Groups, Arodz 10:10 AM

Author: Tomasz Arodz. Poster session continues at 14:50 - 15:50.

Abstract 9: Kernels on Fuzzy Sets: an Overview (poster). in Learning on Distributions, Functions, Graphs and Groups, Guevara Diaz 10:10 AM

Author: Jorge Luis Guevara Diaz. Poster session continues at 14:50 - 15:50.

Abstract 10: The Geometric Block Model (poster). in Learning on Distributions, Functions, Graphs and Groups, Pal 10:10 AM

Authors: Sainyam Galhotra, Arya Mazumdar, Soumyabrata Pal, Barna Saha. Poster session continues at 14:50 - 15:50.

Abstract 11: Graph based Feature Selection for Structured High Dimensional Data (poster). in Learning on Distributions, Functions, Graphs and Groups, Zhang 10:10 AM

Authors: Thosini K. Bamunu Mudiyanselage, Yanqing Zhang. Poster session continues at 14:50 - 15:50.

Abstract 12: Learning from Graphs with Structural Variation (poster). in Learning on Distributions, Functions, Graphs and Groups, Holm, Nielsen 10:10 AM

Authors: Rune K. Nielsen, Aasa Feragen, Andreas Holm. Poster session continues at 14:50 - 15:50.

Abstract 13: Worst-case vs. Average-case Design for Estimation from Fixed Pairwise Comparisons (poster). in Learning on Distributions, Functions, Graphs and Groups, 10:10 AM

Authors: Ashwin Pananjady, Cheng Mao, Vidya Muthukumar, Martin Wainwright, Thomas Courtade. Poster session continues at 14:50 - 15:50.

Abstract 14: Differentially Private Database Release via Kernel Mean Embeddings (poster). in Learning on Distributions, Functions, Graphs and Groups, 10:10 AM

Authors: Matej Balog, Ilya Tolstikhin, Bernhard Schölkopf. Poster session continues at 14:50 - 15:50.

Abstract 15: Post Selection Inference with Maximum Mean Discrepancy (poster). in Learning on Distributions, Functions, Graphs and Groups, 10:10 AM

Authors: Denny Wu, Makoto Yamada, Ichiro Takeuchi, Kenji Fukumizu. Poster session continues at 14:50 - 15:50.

Abstract 16: Algorithmic and Statistical Aspects of Linear Regression without Correspondence (poster). in Learning on Distributions, Functions, Graphs and Groups, 10:10 AM

Authors: Daniel Hsu, Kevin Shi, Xiaorui Sun. Poster session continues at 14:50 - 15:50.

Abstract 17: When is Network Lasso Accurate: The Vector Case (poster). in Learning on Distributions, Functions, Graphs and Groups, 10:10 AM

Authors: Nguyen Quang Tran, Alexander Jung, Saeed Basirian. Poster session continues at 14:50 - 15:50.

Abstract 18: Bayesian Distribution Regression (poster). in Learning on Distributions, Functions, Graphs and Groups, 10:10 AM

Authors: Ho Chung Leon Law, Dougal J Sutherland, Dino Sejdinovic, Seth Flaxman. Poster session continues at 14:50 - 15:50.

Abstract 19: The Weighted Kendall Kernel (poster). in Learning on Distributions, Functions, Graphs and Groups, 10:10 AM

Authors: Yunlong Jiao, Jean-Philippe Vert. Poster session continues at 14:50 - 15:50.

Abstract 20: When is Network Lasso Accurate: The Vector Case. in Learning on Distributions, Functions, Graphs and Groups, Tran 11:00 AM

Authors: Nguyen Quang Tran, Alexander Jung, Saeed Basirian.

Abstract 21: Worst-case vs. Average-case Design for Estimation from Fixed Pairwise Comparisons. in Learning on Distributions, Functions, Graphs and Groups, 11:20 AM

Authors: Ashwin Pananjady, Cheng Mao, Vidya Muthukumar, Martin Wainwright, Thomas Courtade.

Abstract 22: The Weighted Kendall Kernel. in Learning on Distributions, Functions, Graphs and Groups, 11:40 AM

Authors: Yunlong Jiao, Jean-Philippe Vert.

Abstract 23: On Kernel Methods for Covariates that are Rankings. in Learning on Distributions, Functions, Graphs and Groups, 12:00 PM

Authors: Horia Mania, Aaditya Ramdas, Martin Wainwright, Michael Jordan, Benjamin Recht.

Abstract 25: Learning on topological and geometrical structures of data. in Learning on Distributions, Functions, Graphs and Groups, Fukumizu 01:50 PM

Topological data analysis (TDA) is a recent methodology for extracting topological and geometrical features from complex geometric data structures. Persistent homology, a new mathematical notion proposed by Edelsbrunner (2002), provides a multiscale descriptor for the topology of data, and has recently been applied to a variety of data analysis tasks. In this talk I will introduce a machine learning framework for TDA by combining persistent homology and kernel methods. As an expression of persistent homology, persistence diagrams are widely used to express the lifetimes of generators of homology groups. While they serve as a compact representation of data, it is not straightforward to apply standard data analysis to persistence diagrams, since they consist of a set of points in 2D space expressing the lifetimes. We introduce a method of kernel embedding of the persistence diagrams to obtain their vector representation, which enables one to apply any kernel method in topological data analysis, and propose a persistence weighted Gaussian kernel as a suitable kernel for vectorization of persistence diagrams. Some theoretical properties including Lipschitz continuity of the embedding are also discussed. I will also present applications to change point detection and time series analysis in the field of material sciences and biochemistry.
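The persistence weighted Gaussian kernel described in this talk admits a compact implementation. The sketch below is a minimal reading of the construction (persistence-dependent weights times a Gaussian kernel on diagram points); the arctan weight with parameters C and p follows Kusano et al. (2016) as cited in the workshop references above, and the toy diagrams are assumptions.

    import numpy as np

    def pwg_kernel(D, E, sigma=1.0, C=1.0, p=1.0):
        # Persistence weighted Gaussian kernel between two persistence
        # diagrams, each an (n, 2) array of (birth, death) pairs. Points are
        # weighted by an arctan of their persistence, so long-lived
        # (topologically significant) features dominate.
        wD = np.arctan(C * (D[:, 1] - D[:, 0]) ** p)
        wE = np.arctan(C * (E[:, 1] - E[:, 0]) ** p)
        sq = ((D[:, None, :] - E[None, :, :]) ** 2).sum(-1)
        gauss = np.exp(-sq / (2 * sigma ** 2))
        return wD @ gauss @ wE

    D = np.array([[0.0, 1.5], [0.2, 0.4]])   # one long-lived, one noisy feature
    E = np.array([[0.1, 1.4], [0.5, 0.6]])
    print(pwg_kernel(D, E))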

Abstract 26: Operator-valued kernels and their application to functional data analysis. in Learning on Distributions, Functions, Graphs and Groups, Kadri 02:20 PM

A positive semidefinite operator-valued kernel generalizes the well-known notion of reproducing kernel, and is a main concept underlying many kernel-based vector-valued learning algorithms. In this talk I will give a brief introduction to learning with operator-valued kernels, discuss current challenges in the field, and describe convenient schemes to overcome them. I'll overview our recent work on learning with functional data in the case where both attributes and labels are functions. In this setting, a set of rigorously defined infinite-dimensional operator-valued kernels that can be valuably applied when the data are functions is described, and a learning scheme for nonlinear functional data analysis is introduced. The methodology is illustrated through speech and audio signal processing experiments.
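A common entry point to operator-valued kernels is the separable case K(x, x') = k(x, x') A, where a positive semidefinite matrix A couples the output components. The sketch below (a minimal finite-dimensional illustration, not the infinite-dimensional construction discussed in the talk) solves the corresponding vector-valued kernel ridge regression system; the Gaussian scalar kernel, the coupling matrix A and the toy data are assumptions.

    import numpy as np

    def gaussian_gram(X, Y, sigma=1.0):
        sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq / (2 * sigma ** 2))

    def separable_ovk_ridge(X, Y, A, lam=0.1, sigma=1.0):
        # Vector-valued kernel ridge regression with the separable
        # operator-valued kernel K(x, x') = k(x, x') * A. The representer
        # coefficients C solve (A kron K + lam I) vec(C) = vec(Y).
        n, d = Y.shape
        K = gaussian_gram(X, X, sigma)
        vecC = np.linalg.solve(np.kron(A, K) + lam * np.eye(n * d),
                               Y.reshape(-1, order="F"))
        C = vecC.reshape(n, d, order="F")
        # Predictions: f(x) = sum_i k(x, x_i) A c_i, in matrix form k(x,X) C A.
        return lambda Xnew: gaussian_gram(Xnew, X, sigma) @ C @ A

    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(60, 1))
    Y = np.hstack([np.sin(3 * X), np.cos(3 * X)])   # two correlated outputs
    A = np.array([[1.0, 0.3], [0.3, 1.0]])          # output-coupling matrix
    f = separable_ovk_ridge(X, Y, A)
    print(f(np.array([[0.2]])))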

Abstract 28: Distribution Regression and its Applications. in Learning on Distributions, Functions, Graphs and Groups, Poczos 03:50 PM

The most common machine learning algorithms operate on finite-dimensional vectorial feature representations. In many applications, however, the natural representation of the data consists of distributions, sets, and other complex objects rather than finite-dimensional vectors. In this talk we will review machine learning algorithms that can operate directly on these complex objects. We will discuss applications in various scientific problems including estimating the cosmological parameters of our Universe, dynamical mass measurements of galaxy clusters, finding anomalous events in fluid dynamics, and estimating phenotypes in agriculturally important plants.

Abstract 29: Covariant Compositional Networks for Learning Graphs in Learning on Distributions, Functions, Graphs and Groups, Kondor 04:20 PM

Most existing neural networks for learning graphs deal with the issue of permutation invariance by conceiving of the network as a message passing scheme, where each node sums the feature vectors coming from its neighbors. We argue that this imposes a limitation on their representation power, and instead propose a new general architecture for representing objects consisting of a hierarchy of parts, which we call covariant compositional networks (CCNs). Here covariance means that the activation of each neuron must transform in a specific way under permutations, similarly to steerability in CNNs. We achieve covariance by making each activation transform according to a tensor representation of the permutation group, and derive the corresponding tensor aggregation rules that each neuron must implement. Experiments show that CCNs can outperform competing methods on some standard graph learning benchmarks.
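For contrast with the CCN construction, the baseline the abstract argues against -- summing neighbor feature vectors -- is easy to write down. The following is a minimal sketch of one such message passing (graph convolution) layer on a toy graph; the adjacency matrix, feature sizes and random weights are illustrative assumptions.

    import numpy as np

    def message_passing_layer(A, H, W):
        # One round of neighborhood aggregation: every node sums the feature
        # vectors of its neighbors (A @ H), mixes them with a weight matrix
        # and applies a nonlinearity. Permutation-invariant because summation
        # ignores neighbor order -- the limitation the CCN abstract targets.
        return np.tanh(A @ H @ W)

    rng = np.random.default_rng(0)
    # A small graph: 5 nodes with self-loops, 4 input features per node.
    A = np.array([[1, 1, 0, 0, 1],
                  [1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [1, 0, 0, 1, 1]], dtype=float)
    H = rng.normal(size=(5, 4))            # initial node features
    W1 = rng.normal(size=(4, 8)) * 0.5     # layer weights (untrained)
    W2 = rng.normal(size=(8, 8)) * 0.5
    H = message_passing_layer(A, message_passing_layer(A, H, W1), W2)
    graph_embedding = H.sum(axis=0)        # pooled, permutation-invariant
    print(graph_embedding.shape)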

Machine Learning for Creativity and Design

Douglas Eck, David Ha, Ali Eslami, Sander Dieleman, Rebecca Fiebrink, Luba Elliott

Hyatt Hotel, Seaview Ballroom, Fri Dec 08, 08:00 AM

In the last year, generative machine learning and machine creativity have gotten a lot of attention in the non-research world. At the same time there have been significant advances in generative models for media creation and for design. This one-day workshop explores several issues in the domain of generative models for creativity and design. First, we will look at algorithms for generation and creation of new media and new designs, engaging researchers building the next generation of generative models (GANs, RL, etc.) and also from a more information-theoretic view of creativity (compression, entropy, etc.). Second, we will investigate the social and cultural impact of these new models, engaging researchers from HCI/UX communities. Finally, we'll hear from some of the artists and musicians who are adopting machine learning approaches like deep learning and reinforcement learning as part of their artistic process. We'll leave ample time for discussing both the important technical challenges of generative models for creativity and design, as well as the philosophical and cultural issues that surround this area of research.

Background

In 2016, DeepMind's AlphaGo made two moves against Lee Sedol that were described by the Go community as "brilliant," "surprising," "beautiful," and so forth. Moreover, there was little discussion surrounding the fact that these very creative moves were actually made by a machine (Wired); it was enough that they were great examples of Go playing. At the same time, the general public showed more concern for other applications of generative models. Algorithms that allow for convincing voice style transfer (Lyrebird) or puppet-like video face control (Face2Face) have raised concerns that generative ML will be used to make convincing forms of fake news (FastCompany).

Balancing this, the arts and music worlds have positively embraced generative models. Starting with DeepDream and expanding with image and video generation advances (e.g. GANs) we've seen lots of new and interesting art and music [citations] technologies provided by the machine learning community. We've seen research projects like Google Brain's Magenta, Sony CSL's FlowMachines and IBM's Watson undertake collaborations and attempt to build tools and ML models for use by these communities.

Research

Recent advances in generative models enable new possibilities in art and music production. Language models can be used to write science fiction film scripts (Sunspring) and even replicate the style of individual authors (Deep Tingle). Generative models for image and video allow us to create visions of people, places and things that resemble the distribution of actual images (GANs etc.). Sequence modelling techniques have opened up the possibility of generating realistic musical scores (MIDI generation etc.) and even raw audio that resembles human speech and physical instruments (DeepMind's WaveNet, MILA's Char2Wav and Google's NSynth). In addition, sequence modelling allows us to model vector images to construct stroke-based drawings of common objects according to human doodles (sketch-rnn).

In addition to field-specific research, a number of papers have come out that are directly applicable to the challenges of generation and evaluation, such as learning from human preferences (Christiano et al., 2017) and CycleGAN. The application of Novelty Search (Stanley), evolutionary complexification (Stanley - CPPN, NEAT; Nguyen et al. - Plug&Play GANs, Innovation Engine) and intrinsic motivation (Oudeyer et al. 2007, Schmidhuber on Fun and Creativity) techniques, where objective functions are constantly evolving, is still not common practice in art and music generation using machine learning.

Another focus of the workshop is how to better enable human influence over generative models. This could include learning from human preferences, exposing model parameters in ways that are understandable and relevant to users in a given application domain (e.g., similar to Morris et al. 2008), enabling users to manipulate models through changes to training data (Fiebrink et al. 2011), allowing users to dynamically mix between multiple generative models (Akten & Grierson 2016), or other techniques. Although questions of how to make learning algorithms controllable and understandable to users are relatively nascent in the modern context of deep learning and reinforcement learning, such questions have been a growing focus of work within the human-computer interaction community (e.g., examined in a CHI 2016 workshop on Human-Centred Machine Learning) and the AI Safety community (e.g. Christiano et al. 2017, using human preferences to train deep reinforcement learning systems). Such considerations also underpin the new Google "People + AI Research" (PAIR) initiative.

Artists and Musicians

All the above techniques improve our capabilities of producing text, sound and images. Art and music that stands the test of time, however, requires more than that. Recent research includes a focus on novelty in creative adversarial networks (Elgammal et al., 2017) and considers how generative algorithms can integrate into human creative processes, supporting exploration of new ideas as well as human influence over generated content (Akten & Grierson 2016a, 2016b). Artists including Mario Klingemann, Gene Kogan, Mike Tyka, and Memo Akten have further contributed to this space of work by creating artwork that compellingly demonstrates capabilities of generative algorithms, and by publicly reflecting on the artistic affordances of these new tools.

The goal of this workshop is to bring together researchers interested in advancing art and music generation to present new work, foster collaborations and build networks.

In this workshop, we are particularly interested in how the following can be used in art and music generation: reinforcement learning, generative adversarial networks, novelty search and evaluation, as well as learning from user preferences. We welcome submissions of short papers, demos and extended abstracts related to the above.

There will also be an open call for a display of artworks incorporating machine learning techniques.

Schedule

08:30 AM Welcome and Introduction
08:45 AM Invited Talk Schmidhuber
09:15 AM Invited Talk Denton
09:45 AM Invited Talk Fiebrink
10:15 AM GANosaic - Mosaic Creation with Generative Texture Manifolds Jetchev, Bergmann, Seward
10:20 AM TopoSketch: Drawing in Latent Space
10:25 AM Input parameterization for DeepDream
11:00 AM Invited Talk Goodfellow
11:30 AM Improvised Comedy as a Turing Test
12:00 PM Lunch
01:00 PM Invited Talk Elgammal
01:30 PM Hierarchical Variational Autoencoders for Music
02:00 PM Lexical preferences in an automated story writing system
02:30 PM ObamaNet: Photo-realistic lip-sync from text Kumar, Sotelo, Kumar, de Brébisson
03:00 PM Art / Coffee Break
03:30 PM Towards the High-quality Anime Characters Generation with Generative Adversarial Networks
03:35 PM Crowd Sourcing Clothes Design Directed by Adversarial Neural Networks Osone, Kato, Sato, Muramatsu
03:40 PM Paper Cubes: Evolving 3D characters in Augmented Reality using Recurrent Neural Networks Fuste, Jongejan
03:45 PM Open discussion
04:15 PM Improvisational Storytelling Agents
04:15 PM Artwork Ambrosi, Erler, Salavon, Reimann-Dubbers, Barrat
04:15 PM Disentangled representations of style and content for visual art with generative adversarial networks
04:15 PM Learning to Create Piano Performances
04:15 PM SocialML: machine learning for social media video creators
04:15 PM Combinatorial Meta Search
04:15 PM Exploring Audio Style Transfer
04:15 PM Deep Learning for Identifying Potential Conceptual Shifts for Co-creative Drawing Karimi
04:15 PM Neural Style Transfer for Audio Spectograms Verma, Smith
04:15 PM The Emotional GAN: Priming Adversarial Generation of Art with Emotion Amores Fernandez
04:15 PM Algorithmic composition of polyphonic music with the WaveCRF
04:15 PM SOMNIA: Self-Organizing Maps as Neural Interactive Art
04:15 PM Generating Black Metal and Math Rock: Beyond Bach, Beethoven, and Beatles
04:15 PM Deep Interactive Evolutionary Computation Bontrager
04:15 PM Sequential Line Search for Generative Adversarial Networks
04:15 PM ASCII Art Synthesis with Convolutional Networks
04:15 PM Compositional Pattern Producing GAN
04:15 PM Generative Embedded Mapping Systems for Design
04:15 PM Consistent Comic Colorization with Pixel-wise Background Classification Choo, Kang
04:15 PM Imaginary Soundscape : Cross-Modal Approach to Generate Pseudo Sound Environments Kajihara, Tokui
04:15 PM Repeating and Remembering: GANs in an art context Ridler
04:15 PM AI for Fragrance Design Segal

Machine Learning and Computer Security

Jacob Steinhardt, Nicolas Papernot, Bo Li, Chang Liu, Percy Liang, Dawn Song

Hyatt Hotel, Shoreline, Fri Dec 08, 08:00 AM

While traditional computer security relies on well-defined attack models and proofs of security, a science of security for machine learning systems has proven more elusive. This is due to a number of obstacles, including (1) the highly varied angles of attack against ML systems, (2) the lack of a clearly defined attack surface (because the source of the data analyzed by ML systems is not easily traced), and (3) the lack of clear formal definitions of security that are appropriate for ML systems. At the same time, security of ML systems is of great import due to the recent trend of using ML systems as a line of defense against malicious behavior (e.g., network intrusion, malware, and ransomware), as well as the prevalence of ML systems as parts of sensitive and valuable software systems (e.g., sentiment analyzers for predicting stock prices). This workshop will bring together experts from the computer security and machine learning communities in an attempt to highlight recent work in this area, as well as to clarify the foundations of secure ML and chart out important directions for future work and cross-community collaborations.
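One concrete instance of the attacks studied in this area is the fast gradient sign method of Goodfellow et al. The sketch below applies it to a plain logistic-regression model so the input gradient can be written by hand; the random "trained" weights and the step size are illustrative assumptions.

    import numpy as np

    def fgsm_perturb(x, y, w, b, eps=0.1):
        # Fast gradient sign method: take one step of size eps in the sign
        # of the loss gradient with respect to the input, the canonical way
        # to craft an adversarial example against a differentiable model.
        p = 1.0 / (1.0 + np.exp(-(w @ x + b)))   # predicted P(y = 1 | x)
        grad_x = (p - y) * w                     # d(cross-entropy)/dx
        return x + eps * np.sign(grad_x)

    rng = np.random.default_rng(0)
    w, b = rng.normal(size=20), 0.0              # a fixed "trained" model
    x, y = rng.normal(size=20), 1.0              # an input of class 1
    x_adv = fgsm_perturb(x, y, w, b, eps=0.2)
    before = 1.0 / (1.0 + np.exp(-(w @ x + b)))
    after = 1.0 / (1.0 + np.exp(-(w @ x_adv + b)))
    print(before, after)                         # confidence drops after attack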

Schedule

09:00 AM Opening Remarks Song
09:15 AM AI Applications in Security at Ant Financial Qi
09:45 AM A Word Graph Approach for Dictionary Detection and Extraction in DGA Domain Names Pereira
10:00 AM Practical Machine Learning for Cloud Intrusion Detection Siva Kumar
10:15 AM Poster Spotlights I Na, Song, Sinha, Shin, Huang, Narodytska, Staib, Pei, Suya, Ghorbani, Buckman, Hein, Zhang, Qi, Tian, Du, Tsipras
11:00 AM International Security and the AI Revolution Dafoe
11:30 AM BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain Garg
11:45 AM Poster Spotlights II
01:30 PM Defending Against Adversarial Examples Goodfellow
02:00 PM Provable defenses against adversarial examples via the convex outer adversarial polytope Kolter
02:15 PM Games People Play (With Bots) Brinkman
02:45 PM Synthesizing Robust Adversarial Examples Ilyas, Athalye, Engstrom, Kwok
03:00 PM Poster Session
03:45 PM Safety beyond Security: Societal Challenges for Machine Learning Hardt
04:15 PM Towards Verification of Deep Neural Networks Barrett

ML Systems Workshop @ NIPS 2017

Aparna Lakshmiratan, Sarah Bird, Siddhartha Sen, Chris Ré, Li Erran Li, Joseph Gonzalez, Dan Crankshaw

S1, Fri Dec 08, 08:00 AM

A new area is emerging at the intersection of artificial intelligence, machine learning, and systems design. This birth is driven by the explosive growth of diverse applications of ML in production, the continued growth in data volume, and the complexity of large-scale learning systems. The goal of this workshop is to bring together experts working at the crossroads of machine learning, system design and software engineering to explore the challenges faced when building practical large-scale ML systems. In particular, we aim to elicit new connections among these diverse fields, and identify tools, best practices and design principles. We also want to think about how to do research in this area and properly evaluate it. The workshop will cover ML and AI platforms and algorithm toolkits, as well as dive into machine learning-focused developments in distributed learning platforms, programming languages, data structures, GPU processing, and other topics.

This workshop will follow the successful model we have previously run at ICML, NIPS and SOSP 2017.

Our plan is to run this workshop annually at one ML venue and one Systems venue, and eventually merge these communities into a full conference venue. We believe this dual approach will help to create a low barrier to participation for both communities.

Schedule

08:45 AM Opening Remarks
09:00 AM Invited Talk: Ray: A distributed execution engine for emerging AI applications, Ion Stoica, UC Berkeley Stoica
09:20 AM Contributed Talk 1: The Case for Learning Database Indexes
09:40 AM Invited Talk: Federated Multi-Task Learning, Virginia Smith, Stanford University Smith
10:00 AM Poster Previews: 1 min lightning talks
11:30 AM Invited Talk: Accelerating Persistent Neural Networks at Datacenter Scale, Daniel Lo, Microsoft Research Lo
11:50 AM Contributed Talk 2: DLVM: A modern compiler framework for neural network DSLs
12:10 PM Lunch
01:20 PM Updates from Current ML Systems (TensorFlow, PyTorch, Caffe2, CNTK, MXNet, TVM, Clipper, MacroBase, ModelDB) Monga, Chintala, Zhang, Chen, Crankshaw, Tai, Tulloch, Vartak
02:50 PM Invited Talk: Machine Learning for Systems and Systems for Machine Learning, Jeff Dean, Google Brain Dean
03:20 PM Invited Talk: Creating an Open and Flexible ecosystem for AI models with ONNX, Sarah Bird, Dmytro Dzhulgakov, Facebook Research Bird
03:40 PM Posters and Coffee Tristan, Lee, Dorogush, Hido, Terry, Siam, Nakada, Coleman, Ha, Zhang, Stooke, Meng, Kappler, Schwartz, Olston, Schelter, Sun, Kang, Hummer, Chung, Kraska, Ramchandran, Hynes, Boden, Kwak
04:30 PM Contributed Talk 3: NSML: A Machine Learning Platform That Enables You to Focus on Your Models
04:50 PM Contributed Talk 4: DAWNBench: An End-to-End Deep Learning Benchmark and Competition
05:10 PM Panel Gibson, Gonzalez, Langford, Song

Machine Learning for Molecules and Materials

Stefan Chmiela, Jose Miguel Hernández-Lobato, Kristof Schütt, Alan Aspuru-Guzik, Alexandre Tkatchenko, Bharath Ramsundar, Anatole von Lilienfeld, Matt Kusner, Koji Tsuda, Brooks Paige, Klaus-Robert Müller

S4, Fri Dec 08, 08:00 AM

The success of machine learning has been demonstrated time and time again in classification, generative modelling, and reinforcement learning. In particular, we have recently seen interesting developments where ML has been applied to the natural sciences (chemistry, physics, materials science, neuroscience and biology). Here, often the data is not abundant and very costly. This workshop will focus on the unique challenges of applying machine learning to molecules and materials.

Accurate prediction of chemical and physical properties is a crucial ingredient toward rational compound design in chemical and pharmaceutical industries. Many discoveries in chemistry can be guided by screening large databases of computational molecular structures and properties, but high level quantum-chemical calculations can take up to several days per molecule or material at the required accuracy, placing the ultimate achievement of in silico design out of reach for the foreseeable future. In large part the current state of the art for such problems is the expertise of individual researchers or at best highly-specific rule-based heuristic systems. Efficient methods in machine learning, applied to property and structure prediction, can therefore have a pivotal impact in enabling chemical discovery and fostering fundamental insights.

Because of this, in the past few years there has been a flurry of recent work towards designing machine learning techniques for molecule [1, 2, 4-11, 13-18, 20, 21, 23-32, 34-38] and material data [1-3, 5, 6, 12, 19, 24, 33]. These works have drawn inspiration from and made significant contributions to areas of machine learning as diverse as learning on graphs and models in natural language processing. Recent advances enabled the acceleration of molecular dynamics simulations, contributed to a better understanding of interactions within quantum many-body systems and increased the efficiency of density functional theory based quantum mechanical modeling methods. This young field offers unique opportunities for machine learning researchers and practitioners, as it presents a wide spectrum of challenges and open questions, including but not limited to representations of physical systems, physically constrained models, manifold learning, interpretability, model bias, and causality.

The goal of this workshop is to bring together researchers and industrial practitioners in the fields of computer science, chemistry, physics, materials science, and biology all working to innovate and apply machine learning to tackle the challenges involving molecules and materials. In a highly interactive format, we will outline the current frontiers and present emerging research directions. We aim to use this workshop as an opportunity to establish a common language between all communities, to actively discuss new research problems, and also to collect datasets by which novel machine learning models can be benchmarked. The program is a collection of invited talks, alongside contributed posters. A panel discussion will provide different perspectives and experiences of influential researchers from both fields and also engage open participant conversation. An expected outcome of this workshop is the interdisciplinary exchange of ideas and initiation of collaboration.
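A representative baseline from this literature is kernel ridge regression on a Coulomb-matrix descriptor, as in [7]. The sketch below is a simplified rendering (no descriptor sorting, no cross-validated hyperparameters, and a synthetic target standing in for a quantum-chemical label), intended only to make the screening pipeline concrete.

    import numpy as np

    def coulomb_matrix(Z, R):
        # Simple molecular descriptor in the spirit of Rupp et al. [7]:
        # off-diagonal entries Z_i Z_j / |R_i - R_j|, diagonal 0.5 * Z_i^2.4.
        n = len(Z)
        M = np.zeros((n, n))
        for i in range(n):
            for j in range(n):
                if i == j:
                    M[i, j] = 0.5 * Z[i] ** 2.4
                else:
                    M[i, j] = Z[i] * Z[j] / np.linalg.norm(R[i] - R[j])
        return M[np.triu_indices(n)]        # flatten to a feature vector

    def krr_fit(X, y, sigma=10.0, lam=1e-6):
        # Kernel ridge regression with a Gaussian kernel on descriptors.
        sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        K = np.exp(-sq / (2 * sigma ** 2))
        alpha = np.linalg.solve(K + lam * np.eye(len(y)), y)
        return lambda Xnew: np.exp(
            -((Xnew[:, None, :] - X[None, :, :]) ** 2).sum(-1)
            / (2 * sigma ** 2)) @ alpha

    rng = np.random.default_rng(0)
    # Fake "molecules": 3 atoms each with random charges and positions, and
    # a synthetic target property in place of a quantum-chemical label.
    mols = [(rng.integers(1, 9, size=3), rng.normal(size=(3, 3)))
            for _ in range(100)]
    X = np.array([coulomb_matrix(Z, R) for Z, R in mols])
    y = X.sum(axis=1) + rng.normal(scale=0.1, size=100)
    predict = krr_fit(X, y)
    print(predict(X[:3]), y[:3])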

References

[1] Behler, J., Lorenz, S., Reuter, K. (2007). Representing molecule-surface interactions with symmetry-adapted neural networks. J. Chem. Phys., 127(1), 07B603.
[2] Behler, J., Parrinello, M. (2007). Generalized neural-network representation of high-dimensional potential-energy surfaces. Phys. Rev. Lett., 98(14), 146401.
[3] Kang, B., Ceder, G. (2009). Battery materials for ultrafast charging and discharging. Nature, 458(7235), 190.
[4] Bartók, A. P., Payne, M. C., Kondor, R., Csányi, G. (2010). Gaussian approximation potentials: The accuracy of quantum mechanics, without the electrons. Phys. Rev. Lett., 104(13), 136403.
[5] Behler, J. (2011). Atom-centered symmetry functions for constructing high-dimensional neural network potentials. J. Chem. Phys., 134(7), 074106.
[6] Behler, J. (2011). Neural network potential-energy surfaces in chemistry: a tool for large-scale simulations. Phys. Chem. Chem. Phys., 13(40), 17930-17955.
[7] Rupp, M., Tkatchenko, A., Müller, K.-R., von Lilienfeld, O. A. (2012). Fast and accurate modeling of molecular atomization energies with machine learning. Phys. Rev. Lett., 108(5), 058301.
[8] Snyder, J. C., Rupp, M., Hansen, K., Müller, K.-R., Burke, K. (2012). Finding density functionals with machine learning. Phys. Rev. Lett., 108(25), 253002.
[9] Montavon, G., Rupp, M., Gobre, V., Vazquez-Mayagoitia, A., Hansen, K., Tkatchenko, A., Müller, K.-R., von Lilienfeld, O. A. (2013). Machine learning of molecular electronic properties in chemical compound space. New J. Phys., 15(9), 095003.
[10] Hansen, K., Montavon, G., Biegler, F., Fazli, S., Rupp, M., Scheffler, M., Tkatchenko, A., Müller, K.-R. (2013). Assessment and validation of machine learning methods for predicting molecular atomization energies. J. Chem. Theory Comput., 9(8), 3404-3419.
[11] Bartók, A. P., Kondor, R., Csányi, G. (2013). On representing chemical environments. Phys. Rev. B, 87(18), 184115.
[12] Schütt, K. T., Glawe, H., Brockherde, F., Sanna, A., Müller, K.-R., Gross, E. K. U. (2014). How to represent crystal structures for machine learning: towards fast prediction of electronic properties. Phys. Rev. B, 89(20), 205118.
[13] Ramsundar, B., Kearnes, S., Riley, P., Webster, D., Konerding, D., Pande, V. (2015). Massively multitask networks for drug discovery. arXiv preprint arXiv:1502.02072.
[14] Rupp, M., Ramakrishnan, R., von Lilienfeld, O. A. (2015). Machine learning for quantum mechanical properties of atoms in molecules. J. Phys. Chem. Lett., 6(16), 3309-3313.
[15] Botu, V., Ramprasad, R. (2015). Learning scheme to predict atomic forces and accelerate materials simulations. Phys. Rev. B, 92(9), 094306.
[16] Hansen, K., Biegler, F., Ramakrishnan, R., Pronobis, W., von Lilienfeld, O. A., Müller, K.-R., Tkatchenko, A. (2015). Machine learning predictions of molecular properties: Accurate many-body potentials and nonlocality in chemical space. J. Phys. Chem. Lett., 6(12), 2326-2331.
[17] Alipanahi, B., Delong, A., Weirauch, M. T., Frey, B. J. (2015). Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat. Biotechnol., 33(8), 831-838.
[18] Duvenaud, D. K., Maclaurin, D., Aguilera-Iparraguirre, J., Gomez-Bombarelli, R., Hirzel, T., Aspuru-Guzik, A., Adams, R. P. (2015). Convolutional networks on graphs for learning molecular fingerprints. NIPS, 2224-2232.
[19] Faber, F. A., Lindmaa, A., von Lilienfeld, O. A., Armiento, R. (2016). Machine learning energies of 2 million elpasolite (ABC2D6) crystals. Phys. Rev. Lett., 117(13), 135502.
[20] Gomez-Bombarelli, R., Duvenaud, D., Hernandez-Lobato, J. M., Aguilera-Iparraguirre, J., Hirzel, T. D., Adams, R. P., Aspuru-Guzik, A. (2016). Automatic chemical design using a data-driven continuous representation of molecules. arXiv preprint arXiv:1610.02415.
[21] Wei, J. N., Duvenaud, D., Aspuru-Guzik, A. (2016). Neural networks for the prediction of organic chemistry reactions. ACS Cent. Sci., 2(10), 725-732.
[22] Sadowski, P., Fooshee, D., Subrahmanya, N., Baldi, P. (2016). Synergies between quantum mechanics and machine learning in reaction prediction. J. Chem. Inf. Model., 56(11), 2125-2128.
[23] Lee, A. A., Brenner, M. P., Colwell, L. J. (2016). Predicting protein-ligand affinity with a random matrix framework. Proc. Natl. Acad. Sci., 113(48), 13564-13569.
[24] Behler, J. (2016). Perspective: Machine learning potentials for atomistic simulations. J. Chem. Phys., 145(17), 170901.
[25] De, S., Bartók, A. P., Csányi, G., Ceriotti, M. (2016). Comparing molecules and solids across structural and alchemical space. Phys. Chem. Chem. Phys., 18(20), 13754-13769.
[26] Schütt, K. T., Arbabzadah, F., Chmiela, S., Müller, K.-R., Tkatchenko, A. (2017). Quantum-chemical insights from deep tensor neural networks. Nat. Commun., 8, 13890.
[27] Segler, M. H., Waller, M. P. (2017). Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chem. Eur. J., 23(25), 5966-5971.
[28] Kusner, M. J., Paige, B., Hernández-Lobato, J. M. (2017). Grammar variational autoencoder. arXiv preprint arXiv:1703.01925.
[29] Coley, C. W., Barzilay, R., Jaakkola, T. S., Green, W. H., Jensen, K. F. (2017). Prediction of organic reaction outcomes using machine learning. ACS Cent. Sci., 3(5), 434-443.
[30] Altae-Tran, H., Ramsundar, B., Pappu, A. S., Pande, V. (2017). Low data drug discovery with one-shot learning. ACS Cent. Sci., 3(4), 283-293.
[31] Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., Dahl, G. E. (2017). Neural message passing for quantum chemistry. arXiv preprint arXiv:1704.01212.
[32] Chmiela, S., Tkatchenko, A., Sauceda, H. E., Poltavsky, I., Schütt, K. T., Müller, K.-R. (2017). Machine learning of accurate energy-conserving molecular force fields. Sci. Adv., 3(5), e1603015.
[33] Ju, S., Shiga, T., Feng, L., Hou, Z., Tsuda, K., Shiomi, J. (2017). Designing nanostructures for phonon transport via Bayesian optimization. Phys. Rev. X, 7(2), 021024.
[34] Ramakrishnan, R., von Lilienfeld, A. (2017). Machine learning, quantum chemistry, and chemical space. Reviews in Computational Chemistry, 225-256.
[35] Hernandez-Lobato, J. M., Requeima, J., Pyzer-Knapp, E. O., Aspuru-Guzik, A. (2017). Parallel and distributed Thompson sampling for large-scale accelerated exploration of chemical space. arXiv preprint arXiv:1706.01825.
[36] Smith, J., Isayev, O., Roitberg, A. E. (2017). ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost. Chem. Sci., 8(4), 3192-3203.
[37] Brockherde, F., Li, L., Burke, K., Müller, K.-R. By-passing the Kohn-Sham equations with machine learning. Nat. Commun., in press.
[38] Schütt, K. T., Kindermans, P. J., Sauceda, H. E., Chmiela, S., Tkatchenko, A., Müller, K. R. (2017). MolecuLeNet: A continuous-filter convolutional neural network for modeling quantum interactions. NIPS (accepted).

Schedule

08:00 AM Opening remarks Müller
08:20 AM Machine Learning for Molecular Materials Design Aspuru-Guzik
08:45 AM TBA4 DiStasio Jr.
09:00 AM New Density Functionals Created by Machine Learning Burke
09:35 AM Poster spotlights Strubell, Goh, Tsubaki, Gaudin, Schwaller, Ragoza, Gomez-Bombarelli
10:45 AM Quantum Machine Learning von Lilienfeld
11:05 AM Machine Learning in Organic Synthesis Planning And Execution Jensen
11:25 AM Neural-network Quantum States Carleo
11:40 AM Quantitative Attribution: Do Neural Network Models Learn the Correct Chemistry? Colwell
01:55 PM TBA11 Smola
02:00 PM ChemTS: An Efficient Python Library for De Novo Molecular Generation Tsuda
02:15 PM Symmetry Matters: Learning Scalars and Tensors in Materials and Molecules Ceriotti
02:35 PM Towards Exact Molecular Dynamics Simulations with Machine-Learned Force Fields Chmiela
03:25 PM N-body Neural Networks: A General Compositional Architecture For Representing Multiscale Physical Systems Kondor
03:45 PM Distilling Expensive Simulations with Neural Networks Vinyals
04:05 PM Automatic Chemical Design Using a Data-driven Continuous Representation of Molecules Duvenaud
04:20 PM Planning Chemical Syntheses with Neural Networks and Monte Carlo Tree Search Segler
05:50 PM Closing remarks Hernández-Lobato

Workshop on Worm's Neural Information Processing (WNIP)

Ramin Hasani, Manuel Zimmer, Stephen Larson, Radu Grosu

S5, Fri Dec 08, 08:00 AM

A fundamental challenge in neuroscience is to understand the elemental computations and algorithms by which brains perform information processing. This is of great significance to biologists, as well as to engineers and computer scientists who aim at developing energy efficient and intelligent solutions for the next generation of computers and autonomous devices. The benefits of collaborations between these fields are reciprocal, as brain-inspired computational algorithms and devices not only advance engineering, but also assist neuroscientists by confirming their models and making novel predictions. A large impediment toward such an efficient interaction is still the complexity of brains. We thus propose that the study of small model organisms should pioneer these efforts.

The nematode worm, C. elegans, provides a ready experimental system for reverse-engineering the nervous system, being one of the best studied animals in the life sciences. The neural connectome of C. elegans has been known for 30 years, providing the structural basis for building models of its neural information processing. Despite its small size, C. elegans exhibits complex behaviors, such as locating food and mating partners, and navigating its environment by integrating a plethora of environmental cues. Over the past years, the field has made enormous progress in understanding some of the neural circuits that control sensory processing, decision making and locomotion. In the laboratory, the crawling behavior of worms occurs mainly in 2D. This enables the use of machine learning tools to obtain quantitative behavioral descriptions of unprecedented accuracy. Moreover, neuronal imaging techniques have been developed so that the activity of nearly all nerve cells in the brain can be recorded in real time. Leveraging these advancements, the community-wide C. elegans OpenWorm project will make a realistic in silico simulation of a nervous system and the behavior it produces possible for the first time.

The goal of this workshop is to gather researchers in neuroscience and machine learning together, to advance understanding of the neural information processing of the worm and to outline what challenges still lie ahead. We particularly aim to:

- Comprehensively introduce the nervous system of C. elegans. We will discuss the state-of-the-art findings and potential future solutions for modeling its neurons and synapses, complete networks of neurons and the various behaviors of the worm,
- Identify main challenges and their solutions in behavioral and neural data extraction, such as imaging techniques, generation of time series data from calcium imaging records and high resolution behavioral data, as well as cell recognition, cell tracking and image segmentation,
- Explore machine learning techniques for interpretation of brain data, such as time series analysis, feature extraction methods, complex network analysis, complex nonlinear systems analysis, large-scale parameter optimization methods, and representation learning,
- Get inspiration from this well-understood brain to design novel network architectures, control algorithms and neural processing units.

We have invited leading neuroscientists, machine learning scientists and interdisciplinary experts to address these main objectives of the workshop, in the form of keynote talks and a panel discussion. We also invite submissions of 4-page papers for posters, spotlight presentations and contributed talks, and offer travel awards.

Topics of interest are: deep learning applications in nervous system data analysis, neural circuits analysis, behavior modeling, novel computational approaches and algorithms for brain data interpretation, brain simulation platforms, optimization algorithms for nonlinear systems, applications of machine learning methods to brain data and cell biology, complex network analysis, cell modeling, cell recognition and tracking, and dynamic modeling of neural circuits and genetic regulatory networks.

The workshop's webpage: https://sites.google.com/site/wwnip2017/

Schedule

09:00 AM Opening Remarks Hasani
09:15 AM Capturing the continuous complexity of natural behavior Stephens
09:45 AM Neuronal analysis of value-based decision making in C. elegans Lockery
10:15 AM 3 spotlight presentations Buchanan, Lechner, Li
10:30 AM Poster Session 1 Fuchs, Lung, Lechner, Li, Gordus, Venkatachalam, Chaudhary, Hůla, Rolnick, Linderman, Mena, Paninski, Cohen
11:00 AM Mechanisms and Functions of Neuronal Population Dynamics in C. elegans Zimmer
11:30 AM From salt navigation in Caenorhabditis elegans to robot navigation in urban environments, or: the role of sensory computation in balancing exploration and exploitation during animal search Cohen
12:00 PM Multi-neuronal imaging of C. elegans courtship and mating Venkatachalam
12:15 PM Using Network Control Principles to Probe the Structure and Function of Neuronal Connectomes Schafer
12:30 PM Lunch Break
02:00 PM Biological Neurons Are Different From Neural Networks: Simulating C. elegans in an open science project Larson
02:30 PM Parking with a worm's brain Grosu
03:00 PM Break / Poster Session 2
03:30 PM Evolving Neural Circuits for Behavior: C. elegans Locomotion Izquierdo

04:00 PM Panel Discussion

Abstracts (7):

Abstract 2: Capturing the continuous complexity of natural behavior in Workshop on Worm's Neural Information Processing (WNIP), Stephens 09:15 AM

While animal behavior is often quantified through discrete motifs, this is only an approximation to fundamentally continuous dynamics and ignores important variability within each motif. Here, we develop a behavioral phase space in which the instantaneous state is smoothly unfolded as a combination of postures and their short-time dynamics. We apply this approach to C. elegans and show that the dynamics lie on a 6D space, which is globally composed of three sets of cyclic trajectories that form the animal's basic behavioral motifs: forward, backward and turning locomotion. In contrast to global stereotypy, variability is evident by the presence of locally-unstable dynamics for each set of cycles. Across the full phase space we show that the Lyapunov spectrum is symmetric, with positive, chaotic exponents driving variability balanced by negative, dissipative exponents driving stereotypy. The symmetry of the spectrum holds for different environments and for human walking, suggesting a general condition of motor control. Finally, we use the reconstructed phase space to analyze the complexity of the dynamics along the worm's body and find evidence for multiple, spatially-separate oscillators driving C. elegans locomotion.
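The phase-space construction in this abstract can be sketched compactly: principal components of body-bend angles give low-dimensional posture coordinates (often called "eigenworms"), and stacking short time windows unfolds the instantaneous state together with its short-time dynamics. The toy undulation data, mode count and window length below are assumptions, not the authors' parameters.

    import numpy as np

    def posture_modes(angles, k=4):
        # PCA of body-bend angles: the leading right singular vectors are the
        # posture modes, and projections give low-dimensional posture series.
        centered = angles - angles.mean(axis=0)
        _, _, Vt = np.linalg.svd(centered, full_matrices=False)
        return centered @ Vt[:k].T

    def delay_embed(series, delays=5):
        # Stack delayed copies so each row holds a short posture history,
        # unfolding the instantaneous state with its short-time dynamics.
        T = len(series) - delays + 1
        return np.hstack([series[i:i + T] for i in range(delays)])

    rng = np.random.default_rng(0)
    t = np.linspace(0, 60, 3000)[:, None]
    segments = np.arange(24)[None, :]                # 24 body segments
    angles = np.sin(segments / 3.0 - 2.0 * t)        # toy undulation
    angles += 0.05 * rng.normal(size=angles.shape)   # measurement noise
    phase_space = delay_embed(posture_modes(angles, k=4), delays=5)
    print(phase_space.shape)                         # (2996, 20) state vectors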
The model on Worm's Neural Information Processing proposes that the signs and strengths of synaptic (WNIP), Lockery 09:45 AM weights in the biological circuit are adjusted to Decision making is a central function of the brain ensure that subjective value is a monotonic and the focus of intensive study in neuroscience, function of the relative quantity of high and low psychology, and economics. Value-based decision quality food, a property that guarantees making (e.g., ‘which fragrance do you prefer?’ transitivity under GARP. Work in progress tests not 'which smells more like roses?') guides the model using calcium imaging, optogenetic signifcant, sometimes life-changing, choices yet activation, and ablations of each neuron in the its neuronal basis is poorly understood. Research circuit. into this question would be accelerated by the 68 Dec. 8, 2017
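The transitivity test at the heart of this GARP analysis is easy to state computationally: strict pairwise choices must form an acyclic preference graph. The following minimal sketch is illustrative only (it is not the authors' code, and the food names are hypothetical):

```python
# Minimal sketch of the transitivity check behind a GARP-style analysis:
# given strict pairwise choices "winner preferred to loser", value-based
# consistency requires the induced directed graph to be acyclic
# (A > B > C implies A > C, never a cycle back to A).

def is_transitive(preferences):
    """preferences: iterable of (winner, loser) pairs from choice tests.
    Returns True if no preference cycle exists."""
    graph = {}
    for winner, loser in preferences:
        graph.setdefault(winner, set()).add(loser)

    def has_cycle(node, visiting, done):
        visiting.add(node)
        for nxt in graph.get(node, ()):
            if nxt in visiting or (nxt not in done and has_cycle(nxt, visiting, done)):
                return True
        visiting.discard(node)
        done.add(node)
        return False

    done = set()
    return not any(node not in done and has_cycle(node, set(), done)
                   for node in list(graph))

# Hypothetical example: high-quality food beats medium, medium beats low.
choices = [("high", "medium"), ("medium", "low"), ("high", "low")]
assert is_transitive(choices)                           # internally consistent
assert not is_transitive(choices + [("low", "high")])   # a cycle violates GARP
```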

Abstract 6: Mechanisms and Functions of Neuronal Population Dynamics in C. elegans in Workshop on Worm's Neural Information Processing (WNIP), Zimmer 11:00 AM

Populations of neurons in the brains of many different animals, ranging from invertebrates to primates, typically coordinate their activities to generate low dimensional and transient activity dynamics, an operational principle serving many neuronal functions like sensory coding, decision making and motor control. However, the mechanisms that bind individual neurons to global population states are not yet known. Are population dynamics driven by a smaller number of pacemaker neurons, or are they an emergent property of neuronal networks? What are the features in global network architecture that support coordinated network wide dynamics?
In order to address these problems, we study neuronal population dynamics in C. elegans. We recently developed a calcium imaging approach to record the activity of nearly all neurons in the worm brain in real time and at single cell resolution. We show that brain activity of C. elegans is dominated by brain wide coordinated population dynamics involving a large fraction of interneurons and motor neurons. The activity patterns of these neuronal ensembles recur in an orderly and cyclical fashion. In subsequent experiments, we characterized these brain dynamics functionally and found that they represent action commands and their assembly into a typical action sequence of these animals: forward crawling – backward crawling – turning. Deciphering the mechanisms underlying neuronal population dynamics is key to understanding the principal computations performed by neuronal networks in the brains of animals, and perhaps will inspire the design of novel machine learning algorithms for robotic control. In this talk, I will discuss three of our approaches to uncover these mechanisms:
First, using graph theory, we aim to identify the key features of neuronal network architecture that support functional dynamics. We found that rich club neurons, i.e. highly interconnected network hubs, contribute most to brain dynamics. However, simple measures of synaptic connectivity (e.g. connection strength) failed to predict functional interactions between these neurons, unlike higher order network statistics that measure the similarity in synaptic input patterns.
We next performed systematic perturbations by interrogation of rich club neurons via transgenic neuronal inhibition tools. Using whole brain imaging in combination with computational analysis methods, we found that upon inhibition of critical hubs, leading to a disintegration of the network, most other individual neurons remain vigorously active; however, the global coordination across neurons was abolished. Based on these results we hypothesize that neuronal population dynamics are an emergent property of neuronal networks.
Finally, we aim to recapitulate C. elegans brain dynamics in silico. Here, we generate neuronal network simulations based on deterministic and stochastic biophysical models of neurons and synapses, at multiscale levels of abstraction. We then adopt a genetic algorithm for neuronal circuit parameter optimization, to find the best matches between simulations and measured calcium dynamics. This approach enables us to test our hypotheses, and to predict unknown properties of neural circuits important for brain dynamics.

Abstract 7: From salt navigation in Caenorhabditis elegans to robot navigation in urban environments, or: the role of sensory computation in balancing exploration and exploitation during animal search in Workshop on Worm's Neural Information Processing (WNIP), Cohen 11:30 AM

Effective spatial navigation is essential for the survival of animals. Navigation, or the search for favorable conditions, is fundamentally an adaptive behavior that can depend on the changing environment, the animal's past history of success and failure and its internal state. C. elegans implements combinations of systematic and stochastic navigational strategies that are modulated by plasticity across a range of time scales. Here, we combine experiments and computational modeling to characterise adaptation in gustatory and nociceptive salt sensing neurons and construct a simulation framework in which animals can navigate a virtual environment. Our model, and simulations on a variety of smooth, rugged or complex landscapes, suggest that these different forms of sensory adaptation combine to dynamically modulate navigational strategies, giving rise to effective exploration and navigation of the environment. Inspired by this compact and elegant sensory circuit, we present a robotic simulation framework, capable of robustly searching for landmarks in a toy simulation environment.

Abstract 9: Using Network Control Principles to Probe the Structure and Function of Neuronal Connectomes in Workshop on Worm's Neural Information Processing (WNIP), Schafer 12:15 PM

William R. Schafer (1), Gang Yan (2,3), Petra E. Vértes (4), Emma K. Towlson (3), Yee Lian Chew (1), Denise S. Walker (1), & Albert-László Barabási (3)
(1) Division of Neurobiology, MRC Laboratory of Molecular Biology, Cambridge Biomedical Campus, Francis Crick Avenue, Cambridge CB2 0QH, UK.
(2) School of Physics Science and Engineering, Tongji University, Shanghai 200092, China.
(3) Center for Complex Network Research and Department of Physics, Northeastern University, Boston, Massachusetts 02115, USA.
(4) Department of Psychiatry, Behavioural and Clinical Neuroscience Institute, University of Cambridge, Cambridge CB2 0SZ, UK.

Large-scale efforts are underway to map the neuronal connectomes of many animals, from flies to humans. However, even for small connectomes, such as that of C. elegans, it has been difficult to relate the structure of neuronal wiring patterns to the function of neural circuits. Recent theoretical studies have suggested that control theory might provide a framework to understand structure-function relationships in complex biological networks, including neuronal connectomes. To test this hypothesis experimentally, we have used the complete neuronal connectome of C. elegans to identify neurons predicted to affect the controllability of the body muscles and assess the effect of ablating these neurons on locomotor behavior. We identified 12 neural classes whose removal from the connectome reduced the structural controllability of the body neuromusculature, one of which was the uncharacterized PDB motorneuron. Consistent with the control theory prediction, ablation of PDB had a specific effect on locomotion, altering the dorsoventral polarity of large turns. Control analysis also predicted that three members of the DD motorneuron class (DD4, DD5 and DD6) are individually required for body muscle controllability, while more anterior DDs (DD1, DD2 and DD3) are not. Indeed, we found that ablation of DD4 or DD5, but not DD2 or DD3, led to abnormalities in posterior body movements, again consistent with control theory predictions. We are currently using the control framework to probe other parts of the C. elegans connectome, and are developing more sophisticated approaches to behavioral analysis in order to more precisely relate ablation phenotypes to specific muscle groups. We anticipate that the control framework validated by this work may have application in the analysis of larger neuronal connectomes and other complex networks.

Abstract 11: Biological Neurons Are Different From Neural Networks: Simulating C. elegans in an open science project in Workshop on Worm's Neural Information Processing (WNIP), Larson 02:00 PM

The membrane potential of a biological neuron is considered to be one of the most important properties to understand its dynamic state. While the action potential or discrete "spike" feature of mammalian neurons has been emphasized as an information bearing signal, biological evidence exists that even without action potentials, neurons process information and give rise to different behavioral states. Nowhere is this more evident than in the nematode worm C. elegans, where its entire nervous system of 302 neurons, despite a lack of action potentials, organizes complex behaviors such as mating, predator avoidance, location of food sources, and many others.
For thirty years, C. elegans has remained the only adult animal that has had its nervous system connectivity mapped at the level of individual synapses and gap junctions. As part of the international open science collaboration known as OpenWorm, we have built a simulation framework, known as c302, that enables us to assemble the known connectivity and other biological data of the C. elegans nervous system into a Hodgkin-Huxley-based simulation that can be run in the NEURON simulation engine.
Using a physical simulation of the C. elegans body, known as Sibernetic, we have injected simple sinusoidal activation patterns into the muscle cells of the C. elegans and produced simple crawling and swimming behavior. With the goal of producing the same simple sinusoids in the muscle cells, we have used c302 to select a subnetwork from the full C. elegans nervous system and used machine learning techniques to fit dynamic parameters that are underspecified by the data. Our preliminary results still leave many important biological features out, but initially demonstrate that it is possible to make motor neurons produce sinusoidal activity patterns in the muscles as used in the physical simulation.
In this talk I will discuss these initial results and discuss future directions for a better understanding of the information processing underlying the C. elegans nervous system.

Abstract 14: Evolving Neural Circuits for Behavior: C. elegans Locomotion in Workshop on Worm's Neural Information Processing (WNIP), Izquierdo 03:30 PM

One of the grand scientific challenges of this century is to understand how behavior is grounded in the interaction between an organism's brain, its body, and its environment. Although a lot of attention and resources are focused on understanding the human brain, I will argue that the study of simpler organisms is an ideal place to begin to address this challenge. I will introduce the nematode worm Caenorhabditis elegans, with just 302 neurons, the only fully-reconstructed connectome at the cellular level, and a rich behavioral repertoire that we are still discovering. I will describe a computational approach to address this grand challenge. I will lay out some of the advantages of expressing our understanding in equations and computational models rather than just words. I will describe our unique methodology for exploring the unknown biological parameters of the model through the use of evolutionary algorithms. We train the neural networks on what they should do, with little or no instructions on how to do it. The effort is then to analyze and understand the evolved solutions as a way to generate novel, often unexpected, hypotheses. As an example, I will focus on how the rhythmic pattern is both generated and propagated along the body during locomotion.


Machine Learning for the Developing World

Maria De-Arteaga, William Herlands

S7, Fri Dec 08, 08:00 AM

Six billion people live in developing world countries. The unique development challenges faced by these regions have long been studied by researchers ranging from sociology to statistics and ecology to economics. With the emergence of mature machine learning methods in the past decades, researchers from many fields - including core machine learning - are increasingly turning to machine learning to study and address challenges in the developing world. This workshop is about delving into the intersection of machine learning and development research.

Machine learning presents tremendous potential to development research and practice. Supervised methods can provide expert telemedicine decision support in regions with few resources; deep learning techniques can analyze satellite imagery to create novel economic indicators; NLP algorithms can preserve and translate obscure languages, some of which are only spoken. Yet, there are notable challenges with machine learning in the developing world. Data cleanliness, computational capacity, power availability, and internet accessibility are more limited than in developed countries. Additionally, the specific applications differ from what many machine learning researchers normally encounter. The confluence of machine learning's immense potential with the practical challenges posed by developing world settings has inspired a growing body of research at the intersection of machine learning and the developing world.

This one-day workshop is focused on machine learning for the developing world, with an emphasis on developing novel methods and technical applications that address core concerns of developing regions. We will consider a wide range of development areas including health, education, institutional integrity, violence mitigation, economics, societal analysis, and environment. From the machine learning perspective we are open to all methodologies with an emphasis on novel techniques inspired by particular use cases in the developing world.

Invited speakers will address particular areas of interest, while poster sessions and a guided panel discussion will encourage interaction between attendees. We wish to review the current approaches to machine learning in the developing world, and inspire new approaches and paradigms that can lay the groundwork for substantial innovation.

Schedule

08:45 AM Introductory remarks Dubrawski
09:00 AM Skyler Speakman (IBM Research Africa): Three Population Covariate Shift for Mobile Phone-based Credit Scoring Speakman
09:30 AM Contributed talk: Unique Entity Estimation with Application to the Syrian Conflict Chen
09:40 AM Contributed talk: Field Test Evaluation of Predictive Models for Wildlife Poaching Activity in Uganda Gholami
09:50 AM Contributed talk: Household poverty classification in data-scarce environments: a machine learning approach Kshirsagar
10:00 AM Ernest Mwebaze (UN Global Pulse): ML4D: what works and how it works - case studies from the developing world Mwebaze
11:00 AM Emma Brunskill (Stanford) Brunskill
11:30 AM P. Anandan (Wadhwani Institute of AI)
11:40 AM Posters Bhattacharya, Lam, Vidyapu, Shankar, Anders, Wilder, Khan, Li, Saquib, Kshirsagar, Perez, Zhang, Gholami, Abebe
12:30 PM Lunch
02:00 PM Jen Ziemke (International Crisis Mappers) Ziemke
02:30 PM Caitlin Augustin (DataKind): Data for Social Good Augustin
03:00 PM Coffee break
03:30 PM Stefano Ermon (Stanford): Measuring Progress Towards Sustainable Development Goals with Machine Learning Ermon
04:00 PM Panel discussion

Abstracts (5):

Abstract 2: Skyler Speakman (IBM Research Africa): Three Population Covariate Shift for Mobile Phone-based Credit Scoring in Machine Learning for the Developing World, Speakman 09:00 AM

Mobile money platforms are gaining traction across developing markets as a convenient way of sending and receiving money over mobile phones. Recent joint collaborations between banks and mobile-network operators leverage a customer's past mobile phone transactions in order to create a credit score for the individual. These scores allow access to low-value, short-term, un-collateralized loans. In this talk we will look at the problem of launching a mobile-phone based credit scoring system in a new market without either labeled examples of repayment or the marginal distribution of features of borrowers in the new market. The latter assumption rules out traditional transfer learning approaches such as a direct covariate shift. We apply a Three Population Covariate Shift method to account for the differences in the original and new markets. The three populations are: a) Original Market Members, b) Original Market Borrowers who self-selected into a loan product, and c) New Market Members. The goal of applying a generalized covariate shift to these three populations is to understand the repayment behavior of a fourth: d) New Market Borrowers who will self-select into a loan product when it becomes available.
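One standard way to realize the reweighting idea this abstract describes is to estimate a density ratio between the two member populations with a probabilistic classifier and apply it to the original-market borrowers. The sketch below illustrates only that general covariate-shift machinery; the talk's actual three-population estimator may differ, and all data and names here are synthetic:

```python
# Illustrative density-ratio reweighting for the three-population setting
# (a sketch of general covariate-shift machinery, not the talk's exact method).
# X_a: original-market members, X_c: new-market members,
# X_b / y_b: original-market borrowers with repayment labels.
import numpy as np
from sklearn.linear_model import LogisticRegression

def market_shift_weights(X_a, X_c, X_b):
    """Estimate w(x) ~ p_new_market(x) / p_orig_market(x) with a
    probabilistic classifier, then evaluate it on the borrowers X_b."""
    X = np.vstack([X_a, X_c])
    z = np.concatenate([np.zeros(len(X_a)), np.ones(len(X_c))])  # 1 = new market
    clf = LogisticRegression(max_iter=1000).fit(X, z)
    p_new = clf.predict_proba(X_b)[:, 1]
    # Bayes' rule up to the class prior: p(x|new)/p(x|orig) is proportional
    # to p(new|x)/p(orig|x); the final factor corrects for sample sizes.
    return p_new / np.clip(1.0 - p_new, 1e-6, None) * (len(X_a) / len(X_c))

# Synthetic demo: fit the repayment model on original-market borrowers,
# reweighted toward the (shifted) new-market feature distribution.
rng = np.random.default_rng(0)
X_a, X_c = rng.normal(0.0, 1, (500, 3)), rng.normal(0.5, 1, (500, 3))
X_b = rng.normal(0.2, 1, (300, 3))
y_b = (X_b[:, 0] + rng.normal(0, 0.5, 300) > 0).astype(int)

w = market_shift_weights(X_a, X_c, X_b)
repay_model = LogisticRegression(max_iter=1000).fit(X_b, y_b, sample_weight=w)
```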

Abstract 6: Ernest Mwebaze (UN Global Pulse): ML4D: what works and how it works - case studies from the developing world in Machine Learning for the Developing World, Mwebaze 10:00 AM

Present advances in Artificial Intelligence and Machine Learning offer unique opportunities for solving impactful developing world problems. At the AI and Data Science research lab at Makerere University and UN PulseLab Kampala in Uganda, we have 8 years of trying to marry good computational techniques with good developing world problems. In this talk I will give some examples of the projects we are working on or have worked on. These include automating disease diagnosis in crops and humans, crowd-sourcing surveillance, traffic monitoring and using public radio data to infer humanitarian crises. I will talk about the relative strengths of the different types of data that can be reliably collected in the developing world and some deployment options that (seem to) work.

Abstract 12: Caitlin Augustin (DataKind): Data for Social Good in Machine Learning for the Developing World, Augustin 02:30 PM

We are living inside a data revolution that is transforming the way we understand and interact with each other and the world - and it has only just begun. Every field is now having its "data moment," giving mission-driven organizations brand new opportunities to harness data to advance their work. In fact, the same algorithms that companies use to boost profits can help these organizations boost their impact. From poverty alleviation to healthcare access to improved education, machine learning has the potential to move the needle on seemingly insurmountable issues, but only if there is close collaboration between data scientists and subject matter experts. Since DataKind was founded in 2011, its volunteers have delivered over $25 million in pro bono services to social change organizations worldwide - helping organizations deliver vaccines more effectively, creating chatbots that connect people to critical services during a natural disaster, helping at-risk students reach graduation, using satellite imagery to estimate poverty and identify crop diseases, and more. This talk will focus on the ways that DataKind engages with nonprofits across industries and economies, with particular emphasis on techniques, tools, and approaches that can provide guidance to ML in the developing world. We'll dive in on the exciting potential of big data to tackle big social issues and how data scientists can apply their skills for the greater good.

Abstract 14: Stefano Ermon (Stanford): Measuring Progress Towards Sustainable Development Goals with Machine Learning in Machine Learning for the Developing World, Ermon 03:30 PM

Recent technological developments are creating new spatio-temporal data streams that contain a wealth of information relevant to sustainable development goals. Modern AI techniques have the potential to yield accurate, inexpensive, and highly scalable models to inform research and policy. As a first example, I will present a machine learning method we developed to predict and map poverty in developing countries. Our method can reliably predict economic well-being using only high-resolution satellite imagery. Because images are passively collected in every corner of the world, our method can provide timely and accurate measurements in a very scalable and economic way, and could revolutionize efforts towards global poverty eradication. As a second example, I will present some ongoing work on monitoring food security outcomes.

Abstract 15: Panel discussion in Machine Learning for the Developing World, 04:00 PM

This panel discussion brings together core machine learning researchers and developing world application domain experts in a conversation regarding challenges, opportunities and future directions of ML4D.
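The satellite-based poverty mapping described in Abstract 14 is, at its core, transfer learning: reuse features from a pretrained image model and fit a small regression on scarce survey labels. A hypothetical minimal sketch follows (the feature extractor is a stub and all data are synthetic; the published pipeline is considerably more elaborate):

```python
# Hypothetical sketch of the transfer-learning shape of satellite-based
# poverty prediction: pretrained-model image features -> small regression
# fit on survey-based wealth labels. Everything here is synthetic.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

def image_features(tiles):
    """Stand-in for a pretrained CNN applied to satellite tiles
    (e.g., penultimate-layer activations)."""
    return tiles.reshape(len(tiles), -1)  # placeholder featurization

rng = np.random.default_rng(0)
tiles = rng.random((200, 8, 8))   # fake satellite image tiles
wealth = rng.random(200)          # fake survey-based wealth index

X = image_features(tiles)
scores = cross_val_score(Ridge(alpha=1.0), X, wealth, cv=5, scoring="r2")
print(scores.mean())  # meaningless on random data; shown only for pipeline shape
```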

Advances in Approximate Bayesian Inference

Francisco Ruiz, Stephan Mandt, Cheng Zhang, James McInerney, Dustin Tran, Tamara Broderick, Michalis Titsias, David Blei, Max Welling

Seaside Ballroom, Fri Dec 08, 08:00 AM

Approximate inference is key to modern probabilistic modeling. Thanks to the availability of big data, significant computational power, and sophisticated models, machine learning has achieved many breakthroughs in multiple application domains. At the same time, approximate inference becomes critical since exact inference is intractable for most models of interest. Within the field of approximate Bayesian inference, variational and Monte Carlo methods are currently the mainstay techniques. For both methods, there has been considerable progress on both efficiency and performance.

In this workshop, we encourage submissions advancing approximate inference methods. We are open to a broad scope of methods within the field of Bayesian inference. In addition, we also encourage applications of approximate inference in many domains, such as computational biology, recommender systems, differential privacy, and industry applications.

Schedule

08:30 AM Introduction Zhang, Ruiz, Tran, McInerney, Mandt
08:35 AM Invited talk: Iain Murray (TBA) Murray
09:00 AM Contributed talk: Learning Implicit Generative Models Using Differentiable Graph Tests Djolonga
09:15 AM Invited talk: Gradient Estimators for Implicit Models Li
09:40 AM Industry talk: Variational Autoencoders for Recommendation Liang
10:00 AM Poster Spotlights Locatello, Pakman, Tang, Rainforth, Borsos, Järvenpää, Nalisnick, Abbati, LU, Huggins, Singh, Luo
10:30 AM Coffee Break and Poster Session 1
11:25 AM Industry talk: Cedric Archambeau (TBA) Archambeau
11:45 AM Contributed talk: Variational Inference based on Robust Divergences Futami
12:00 PM Lunch Break
01:00 PM Poster Session Horii, Jeong, Schwedes, He, Calderhead, Erdil, Altosaar, Muchmore, Khanna, Gemp, Zhang, Zhou, Cremer, DeYoreo, Terenin, McVeigh, Singh, Yang, Bodin, Evans, Chai, Zhe, Ling, ADAM, Maaløe, Miller, Pakman, Djolonga, Ge
02:05 PM Contributed talk: Adversarial Sequential Monte Carlo Kempinska
02:20 PM Contributed talk: Scalable Logit Gaussian Process Classification Wenzel
02:35 PM Invited talk: Variational Inference in Deep Gaussian Processes Damianou
03:00 PM Coffee Break and Poster Session 2
03:30 PM Contributed talk: Taylor Residual Estimators via Automatic Differentiation Miller
03:45 PM Invited talk: Differential privacy and Bayesian learning Honkela
04:10 PM Contributed talk: Frequentist Consistency of Variational Bayes Wang
04:25 PM Panel: On the Foundations and Future of Approximate Inference Blei, Ghahramani, Heller, Salimans, Welling, Hofman
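As background to the description above (standard material, not tied to any particular talk): variational methods turn posterior inference into optimization of the evidence lower bound (ELBO), whose defining identity is, in LaTeX:

```latex
% The evidence lower bound (ELBO) maximized by variational inference:
% for any q(z), log p(x) >= ELBO(q), with equality iff q equals the posterior.
\log p(x)
  = \underbrace{\mathbb{E}_{q(z)}\!\left[\log \frac{p(x,z)}{q(z)}\right]}_{\mathrm{ELBO}(q)}
  + \mathrm{KL}\!\left(q(z)\,\|\,p(z \mid x)\right)
  \;\geq\; \mathrm{ELBO}(q).
```

Since the KL term is nonnegative, maximizing the ELBO over a tractable family of distributions q simultaneously tightens a bound on the evidence and drives q toward the exact posterior.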

Dec. 9, 2017

(Almost) 50 shades of Bayesian Learning: PAC-Bayesian trends and insights

Benjamin Guedj, Pascal Germain, Francis Bach

101 A, Sat Dec 09, 08:00 AM

Industry-wide successes of machine learning at the dawn of the (so-called) big data era have led to an increasing gap between practitioners and theoreticians. The former are using off-the-shelf statistical and machine learning methods, while the latter are designing and studying the mathematical properties of such algorithms. The tradeoff between those two movements is somewhat addressed by Bayesian researchers, where sound mathematical guarantees often meet efficient implementation and provide model selection criteria. In the late 90s, a new paradigm emerged in the statistical learning community, used to derive probably approximately correct (PAC) bounds on Bayesian-flavored estimators. This PAC-Bayesian theory has been pioneered by Shawe-Taylor and Williamson (1997), and McAllester (1998, 1999). It has been extensively formalized by Catoni (2004, 2007) and has triggered, slowly but surely, increasing research efforts during the last decades.

We believe it is time to pinpoint the current PAC-Bayesian trends relative to other modern approaches in the (statistical) machine learning community. Indeed, we observe that, while the field grows on its own, it has taken some undesirable distance from related areas. Firstly, it seems to us that the relation to Bayesian methods has been forsaken in numerous works, despite the potential of PAC-Bayesian theory to bring new insights to the Bayesian community and to go beyond the classical Bayesian/frequentist divide. Secondly, the PAC-Bayesian methods share similarities with other quasi-Bayesian (or pseudo-Bayesian) methods studying Bayesian practices from a frequentist standpoint, such as the Minimum Description Length (MDL) principle (Grünwald, 2007). Last but not least, even if some practical and theory-grounded learning algorithms have emerged from PAC-Bayesian works, these are almost unused for real-world problems.

In short, this workshop aims at gathering statisticians and machine learning researchers to discuss current trends and the future of {PAC,quasi}-Bayesian learning. From a broader perspective, we aim to bridge the gap between several communities that can all benefit from sharper statistical guarantees and sound theory-driven learning algorithms.

References
[1] J. Shawe-Taylor and R. Williamson. A PAC analysis of a Bayes estimator. In Proceedings of COLT, 1997.
[2] D. A. McAllester. Some PAC-Bayesian theorems. In Proceedings of COLT, 1998.
[3] D. A. McAllester. PAC-Bayesian model averaging. In Proceedings of COLT, 1999.
[4] O. Catoni. Statistical Learning Theory and Stochastic Optimization. Saint-Flour Summer School on Probability Theory 2001 (Jean Picard ed.), Lecture Notes in Mathematics. Springer, 2004.
[5] O. Catoni. PAC-Bayesian supervised classification: the thermodynamics of statistical learning. Institute of Mathematical Statistics Lecture Notes—Monograph Series, 56. Institute of Mathematical Statistics, 2007.
[6] P. D. Grünwald. The Minimum Description Length Principle. The MIT Press, 2007.

Schedule

08:30 AM Overture Guedj, Bach, Germain
08:30 AM François Laviolette - A Tutorial on PAC-Bayesian Theory Laviolette
09:30 AM Peter Grünwald - A Tight Excess Risk Bound via a Unified PAC-Bayesian-Rademacher-Shtarkov-MDL Complexity Grünwald
11:00 AM Jean-Michel Marin - Some recent advances on Approximate Bayesian Computation techniques Marin
11:45 AM Contributed talk 1 - A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks Neyshabur
02:00 PM Olivier Catoni - Dimension-free PAC-Bayesian Bounds Catoni
02:40 PM Contributed talk 2 - Dimension free PAC-Bayesian bounds for the estimation of the mean of a random vector Catoni
03:30 PM Yevgeny Seldin - A Strongly Quasiconvex PAC-Bayesian Bound Seldin
04:15 PM John Shawe-Taylor - Distribution Dependent Priors for Stable Learning Shawe-Taylor
05:00 PM Daniel Roy - Deep Neural Networks: From Flat Minima to Numerically Nonvacuous Generalization Bounds via PAC-Bayes Roy
05:30 PM Neil Lawrence, Francis Bach and François Laviolette Lawrence, Bach, Laviolette
06:25 PM Concluding remarks Bach, Guedj, Germain

Abstracts (5):

Abstract 3: Peter Grünwald - A Tight Excess Risk Bound via a Unified PAC-Bayesian-Rademacher-Shtarkov-MDL Complexity in (Almost) 50 shades of Bayesian Learning: PAC-Bayesian trends and insights, Grünwald 09:30 AM

Over the last 15 years, machine learning theorists have bounded the performance of empirical risk minimization by (localized) Rademacher complexity; Bayesians with frequentist sympathies have studied Bayesian consistency and rate of convergence theorems, sometimes under misspecification; and PAC-Bayesians have studied convergence properties of generalized Bayesian and Gibbs posteriors. We show that, amazingly, most such bounds readily follow from essentially a single result that bounds excess risk in terms of a novel complexity COMP$(\eta,w)$, which depends on a learning rate $\eta$ and a luckiness function $w$, the latter generalizing the concept of a 'prior'. Depending on the choice of $w$, COMP$(\eta,w)$ specializes to PAC-Bayesian (KL(posterior||prior)) complexity, MDL (normalized maximum likelihood) complexity and Rademacher complexity, and the bounds obtained are optimized for generalized Bayes, ERM, penalized ERM (such as Lasso) or other methods. Tuning $\eta$ leads to optimal excess risk convergence rates, even for very large (polynomial entropy) classes which have always been problematic for the PAC-Bayesian approach; the optimal $\eta$ depends on 'fast rate' properties of the domain, such as central, Bernstein and Tsybakov conditions.

Joint work with Nishant Mehta, University of Victoria. See https://arxiv.org/abs/1710.07732

Abstract 4: Jean-Michel Marin - Some recent advances on Approximate Bayesian Computation techniques in (Almost) 50 shades of Bayesian Learning: PAC-Bayesian trends and insights, Marin 11:00 AM

In an increasing number of application domains, the statistical model is so complex that the point-wise computation of the likelihood is intractable. That is typically the case when the underlying probability distribution involves numerous latent variables. Approximate Bayesian Computation (ABC) is a widely used technique to bypass that difficulty. I will review some recent developments on ABC techniques, emphasizing the fact that modern machine learning approaches are useful in this field. Although intrinsically very different from PAC-Bayesian strategies - the choice of a generative model is essential within the ABC paradigm - I will highlight some links between these two methodologies.

Abstract 6: Olivier Catoni - Dimension-free PAC-Bayesian Bounds in (Almost) 50 shades of Bayesian Learning: PAC-Bayesian trends and insights, Catoni 02:00 PM

PAC-Bayesian inequalities have already proved to be a great tool to obtain dimension free generalization bounds, such as margin bounds for Support Vector Machines. In this talk, we will play with PAC-Bayesian inequalities and influence functions to present new robust estimators for the mean of random vectors and random matrices, as well as for linear regression.
A common theme of the presentation will be to establish dimension free bounds and to work under mild polynomial moment assumptions regarding the tail of the sample distribution.

Joint work with Ilaria Giulini.
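For orientation, the KL(posterior||prior) complexity term that recurs throughout these abstracts enters via bounds of the following classical form (a McAllester/Maurer-style statement for losses in [0,1], quoted as standard background rather than from any specific talk):

```latex
% PAC-Bayes: for a prior \pi, confidence \delta, and i.i.d. sample size n,
% with probability at least 1-\delta, simultaneously for all posteriors \rho:
\mathbb{E}_{h\sim\rho}\big[L(h)\big]
  \;\leq\; \mathbb{E}_{h\sim\rho}\big[\hat{L}_n(h)\big]
  + \sqrt{\frac{\mathrm{KL}(\rho\,\|\,\pi) + \ln\frac{2\sqrt{n}}{\delta}}{2n}}.
```

The bound holds uniformly over posteriors, which is what allows it to be minimized over rho, the theme of several of the optimization-oriented talks in this workshop.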

learning" regime. Logically, in order to explain generalization, we need nonvacuous bounds. Abstract 8: Yevgeny Seldin - A Strongly Quasiconvex PAC-Bayesian Bound in I will discuss recent work using PAC-Bayesian (Almost) 50 shades of Bayesian Learning: bounds and optimization to arrive at nonvacuous PAC-Bayesian trends and insights, Seldin generalization bounds for neural networks with 03:30 PM millions of parameters trained on only tens of

We propose a new PAC-Bayesian bound and a thousands of examples. We connect our fndings way of constructing a hypothesis space, so that to recent and old work on fat minima and MDL- the bound is convex in the posterior distribution based explanations of generalization, as well as to variational inference for deep learning. Time and also convex in a trade-of parameter permitting, I'll discuss new work interpreting between empirical performance of the posterior distribution and its complexity. The complexity is Entropy-SGD as a PAC-Bayesian method. measured by the Kullback-Leibler divergence to a prior. We derive an alternating procedure for Joint work with Gintare Karolina Dziugaite, based on https://arxiv.org/abs/1703.11008 minimizing the bound. We show that the bound can be rewritten as a one-dimensional function of the trade-of parameter and provide sufcient conditions under which the function has a single global minimum. When the conditions are Deep Learning at Supercomputer Scale satisfed the alternating minimization is guaranteed to converge to the global minimum of Erich Elsen, Danijar Hafner, Zak Stone, Brennan Saeta the bound. We provide experimental results demonstrating that rigorous minimization of the 101 B, Sat Dec 09, 08:00 AM bound is competitive with cross-validation in tuning the trade-of between complexity and Five years ago, it took more than a month to train empirical performance. In all our experiments the a state-of-the-art image recognition model on the trade-of turned to be quasiconvex even when ImageNet dataset. Earlier this year, Facebook the sufcient conditions were violated. demonstrated that such a model could be trained in an hour. However, if we could parallelize this Joint work with Niklas Thiemann, Christian Igel, training problem across the world’s fastest and Olivier Wintenberger. supercomputers (~100 PFlops), it would be possible to train the same model in under a minute. This workshop is about closing that gap: Abstract 10: Daniel Roy - Deep Neural how can we turn months into minutes and Networks: From Flat Minima to Numerically increase the productivity of machine learning Nonvacuous Generalization Bounds via PAC- researchers everywhere? Bayes in (Almost) 50 shades of Bayesian Learning: PAC-Bayesian trends and insights, This one-day workshop will facilitate active Roy 05:00 PM debate and interaction across many diferent disciplines. The conversation will range from One of the defning properties of deep learning is algorithms to infrastructure to silicon, with invited that models are chosen to have many more speakers from Cerebras, DeepMind, Facebook, parameters than available training data. In light Google, OpenAI, and other organizations. When of this capacity for overftting, it is remarkable should synchronous training be preferred over that simple algorithms like SGD reliably return asynchronous training? Are large batch sizes the solutions with low test error. One roadblock to key to reach supercomputer scale, or is it explaining these phenomena in terms of implicit possible to fully utilize a supercomputer at batch regularization, structural properties of the size one? How important is sparsity in enabling solution, and/or easiness of the data is that many us to scale? Should sparsity patterns be learning bounds are quantitatively vacuous when structured or unstructured? To what extent do we applied to networks learned by SGD in this "deep expect to customize model architectures for 77 Dec. 9, 2017

When should synchronous training be preferred over asynchronous training? Are large batch sizes the key to reach supercomputer scale, or is it possible to fully utilize a supercomputer at batch size one? How important is sparsity in enabling us to scale? Should sparsity patterns be structured or unstructured? To what extent do we expect to customize model architectures for particular problem domains, and to what extent can a "single model architecture" deliver state-of-the-art results across many different domains? How can new hardware architectures unlock even higher real-world training performance?

Our goal is to bring people who are trying to answer any of these questions together in hopes that cross pollination will accelerate progress towards deep learning at true supercomputer scale.

Schedule

08:10 AM Generalization Gap Keskar
08:30 AM Closing the Generalization Gap Hubara
08:50 AM Don't Decay the Learning Rate, Increase the Batch Size Smith
09:10 AM ImageNet In 1 Hour Goyal
09:30 AM Training with TPUs Ying
09:50 AM Coffee Break
10:10 AM KFAC and Natural Gradients Johnson, Duckworth
10:30 AM Neumann Optimizer Krishnan
10:50 AM Evolutionary Strategies Salimans
11:15 AM Future Hardware Directions Diamos, Dean, Knowles, James, Gray
01:30 PM Learning Device Placement Mirhoseini
01:50 PM Scaling and Sparsity Diamos
02:10 PM Small World Network Architectures Gray
02:30 PM Scalable RL and AlphaGo Lillicrap
03:20 PM Scaling Deep Learning to 15 PetaFlops Kurth
03:40 PM Scalable Silicon Compute Knowles
04:00 PM Practical Scaling Techniques Kapasi
04:20 PM Designing for Supercompute-Scale Deep Learning James
05:00 PM Supercomputers for Deep Learning Sukumar
05:00 PM Adaptive Memory Networks Li


Machine Learning on the Phone and other Consumer Devices

Hrishi Aradhye, Joaquin Quinonero Candela, Rohit Prasad

102 A+B, Sat Dec 09, 08:00 AM

Deep Machine Learning has changed the computing paradigm. Products of today are built with machine intelligence as a central attribute, and consumers are beginning to expect near-human interaction with the appliances they use. However, much of the Deep Learning revolution has been limited to the cloud, enabled by popular toolkits such as Caffe, TensorFlow, and MxNet, and by specialized hardware such as TPUs. In comparison, mobile devices until recently were just not fast enough, there were limited developer tools, and there were limited use cases that required on-device machine learning. That has recently started to change, with the advances in real-time computer vision and spoken language understanding driving real innovation in intelligent mobile applications. Several mobile-optimized neural network libraries were recently announced (CoreML [1], Caffe2 for mobile [2], TensorFlow Lite [3]), which aim to dramatically reduce the barrier to entry for mobile machine learning. Innovation and competition at the silicon layer has enabled new possibilities for hardware acceleration. To make things even better, mobile-optimized versions of several state-of-the-art benchmark models were recently open sourced [4]. Widespread increase in availability of connected "smart" appliances for consumers and IoT platforms for industrial use cases means that there is an ever-expanding surface area for mobile intelligence and ambient devices in homes. All of these advances in combination imply that we are likely at the cusp of a rapid increase in research interest in on-device machine learning, and in particular, on-device neural computing.

Significant research challenges remain, however.

Mobile devices are even more personal than "personal computers" were. Enabling machine learning while simultaneously preserving user trust requires ongoing advances in the research of differential privacy and federated learning techniques. On-device ML has to keep model size and power usage low while simultaneously optimizing for accuracy. There are a few exciting novel approaches recently being developed in mobile optimization of neural networks. Lastly, the newly prevalent use of camera and voice as interaction models has fueled exciting research towards neural techniques for image and speech/language understanding.

With this emerging interest as well as the wealth of challenging research problems in mind, we are proposing the first NIPS workshop dedicated to on-device machine learning for mobile and ambient home consumer devices. We believe that interest in this space is only going to increase, and we hope that the workshop plays the role of an influential catalyst to foster research and collaboration in this nascent community.

The next wave of ML applications will have significant processing on mobile and ambient devices. Some immediate examples of these are single-image depth estimation, object recognition and segmentation running on-device for creative effects, or on-device recommender and ranking systems for privacy-preserving, low-latency experiences. This workshop will bring ML practitioners up to speed on the latest trends for on-device applications of ML, offer an overview of the latest HW and SW framework developments, and champion active research towards hard technical challenges emerging in this nascent area. The target audience for the workshop is both industrial and academic researchers and practitioners of on-device, native machine learning. The workshop will cover both "informational" and "aspirational" aspects of this emerging research area for delivering ground-breaking experiences on real-world products.

[1] https://developer.apple.com/machine-learning/
[2] https://caffe2.ai/
[3] https://www.tensorflow.org/mobile/
[4] https://opensource.googleblog.com/2017/06/mobilenets-open-source-models-for.html

Schedule

08:05 AM Qualcomm presentation on ML-optimized mobile hardware Teague
08:30 AM fpgaConvNet: A Toolflow for Mapping Diverse Convolutional Neural Networks on Embedded FPGAs Venieris
08:45 AM High performance ultra-low-precision convolutions on mobile devices Tulloch, Jia
09:00 AM Caffe2: Lessons from Running Deep Learning on the World's Smart Phones Jia
09:30 AM CoreML: High-Performance On-Device Inference Kapoor
10:00 AM Data center to the edge: a journey with TensorFlow Monga
11:00 AM On-Device ML Frameworks Gehlhaar, Jia, Monga
11:45 AM Poster Spotlight 1
12:05 PM Poster Session 1
12:05 PM Lunch
01:30 PM Federated learning for model training on decentralized data Ramage
02:00 PM Personalized and Private Peer-to-Peer Machine Learning Bellet, Guerraoui, Tommasi
02:15 PM SquishedNets: Squishing SqueezeNet further for edge device scenarios via deep evolutionary synthesis Li
02:30 PM A Cascade Architecture for Keyword Spotting on Mobile Devices Alvarez, Thornton, Ghodrat
02:45 PM Multiple-Instance, Cascaded Classification for Keyword Spotting in Narrow-Band Audio Abdulkader, Nassar, Mahmoud, Galvez
03:00 PM Coffee Break

03:30 PM Machine Learning for Alexa Mandal
04:00 PM Now Playing: Continuous low-power music recognition Ritter, Guo, Kumar, Odell, Velimirović, Roblek, Lyon
04:15 PM Learning On-Device Conversational Models Ravi, Rudick, Fan
04:30 PM Google Lens Chen
05:00 PM Poster Spotlight 2
05:20 PM Poster Session 2 Shafiq, Nevado Vilchez, Yamada, Dasgupta, Geyer, Nabi, Rodrigues, Manino, Serb, Carreira-Perpinan, Lim, Low, Pandey, White, Pidlypenskyi, Wang, Brighton, Zhu, Gupta, Leroux


Synergies in Geometric Data Analysis (TWO DAYS)

Marina Meila, Frederic Chazal, Yu-Chia Chen

102 C, Sat Dec 09, 08:00 AM

This two day workshop will bring together researchers from the various subdisciplines of Geometric Data Analysis, such as manifold learning, topological data analysis and shape analysis; it will showcase recent progress in this field and will establish directions for future research. The focus will be on high dimensional and big data, and on mathematically founded methodology.

Specific aims
======
One aim of this workshop is to build connections between Topological Data Analysis on one side and Manifold Learning on the other. This is starting to happen, after years of more or less separate evolution of the two fields. The moment has been reached when the mathematical, statistical and algorithmic foundations of both areas are mature enough -- it is now time to lay the foundations for joint topological and differential geometric understanding of data, and this workshop will explicitly focus on this process.

The second aim is to bring GDA closer to real applications. We see the challenge of real problems and real data as a motivator for researchers to explore new research questions, to reframe and expand the existing theory, and to step out of their own sub-area. In particular, for people in GDA to see TDA and ML as one.

The impact of GDA in practice also depends on having scalable implementations of the most current results in theory. This workshop will showcase the GDA tools which achieve this and initiate a collective discussion about the tools that need to be built.

We intend this workshop to be a forum for researchers in all areas of Geometric Data Analysis. Through the tutorials, we are reaching out to the wider NIPS audience, to the many potential users of Geometric Data Analysis, to make them aware of the state of the art in GDA, and of the tools available. Last but not least, we hope that the scientists invited will bring these methods back to their communities.


Medical Imaging meets NIPS

Ben Glocker, Ender Konukoglu, Hervé Lombaert, Kanwal Bhatia

103 A+B, Sat Dec 09, 08:00 AM

**Scope**

'Medical Imaging meets NIPS' is a satellite workshop at NIPS 2017. The workshop aims to bring researchers together from the medical image computing and machine learning community. The objective is to discuss the major challenges in the field and opportunities for joining forces. The event will feature a series of high-profile invited speakers from industry, academia, engineering and medical sciences who aim to give an overview of recent advances, challenges, latest technology and efforts for sharing clinical data.

**Motivation**

Medical imaging is facing a major crisis with an ever increasing complexity and volume of data and immense economic pressure.

05:00 Closing Glocker, Konukoglu, Lombaert, not led to more extreme and polarised opinions. PM Bhatia What is the efect of content prioritisation – and more generally, the efect of the afordances of the social network – on the nature of discussion Abstracts (1): and debate? Social networks could potentially enable society-wide debate and collective Abstract 1: Opening in Medical Imaging intelligence. On the other hand, they could also meets NIPS, Glocker, Konukoglu, Lombaert, encourage communal reinforcement by enforcing Bhatia 08:45 AM conformity within friendship groups, in that it is a Opening remarks by the organizers daring person who posts an opinion at odds with the majority of their friends. Each design of content prioritisation may nudge users towards particular styles of both content-viewing and of Workshop on Prioritising Online Content content-posting and discussion. What is the nature of the interaction between content- John Shawe-Taylor, Massimiliano Pontil, Nicolò Cesa- presentation and users’ viewing and debate? Bianchi, Emine Yilmaz, Chris Watkins, Sebastian Content may be prioritised either ‘transparently’ Riedel, Marko Grobelnik according to users’ explicit choices of what they want to see, combined with transparent 103 C, Sat Dec 09, 08:00 AM community voting, and moderators whose decisions can be questioned (e.g. Reddit). At the Social Media and other online media sources play other extreme, content may be prioritised by a critical role in distributing news and informing proprietary algorithms that model each user’s public opinion. Initially it seemed that preferences and then predict what they want to democratising the dissemination of information see. What is the range of possible designs and and news with online media might be wholly what are their efects? Could one design good – but during the last year we have intelligent power-tools for moderators? witnessed other perhaps less positive efects. The online portal Reddit is a rare exception to the The algorithms that prioritise content for users general rule in that it has proven a popular site aim to provide information that will be ‘liked’ by despite employing a more nuanced algorithm for each user in order to retain their attention and the prioritisation of content. The approach was, interest. These algorithms are now well-tuned however, apparently designed to manage trafc and are indeed able to match content to diferent fows rather than create a better balance of users’ preferences. This has meant that users opinions. It would, therefore, appear that even for increasingly see content that aligns with their this algorithm its efect on prioritisation is only world view, confrms their beliefs, supports their partially understood or intended. opinions, in short that maintains their If we view social networks as implementing a ‘information bubble’, creating the so-called echo- large scale message-passing algorithm chambers. As a result, views have often become attempting to perform inference about the state more polarised rather than less, with people of the world and possible interventions and/or expressing genuine disbelief that fellow citizens improvements, the current prioritisation could possibly countenance alternative opinions, algorithms create many (typically short) cycles. It be they pro- or anti-brexit, pro- or anti-Trump. 
is well known that inference based on message Perhaps the most extreme example is that of fake passing fails to converge to an optimal solution if news in which news is created in order to satisfy the underlying graph contains cycles because and reinforce certain beliefs. information then becomes incorrectly weighted. This polarisation of views cannot be benefcial for Perhaps a similar situation is occurring with the society. As the success of Computer Science and use of social media? Is it possible to model this more specifcally Machine Learning have led to phenomenon as an approximate inference task? this undesirable situation, it is natural that we The workshop will provide a forum for the should now ask how Online Content might be presentation and discussion of analyses of online prioritised in such a way that users are still prioritisation with emphasis on the biases that satisfed with an outlet but at the same time are such prioritisations introduce and reinforce. 82 Dec. 9, 2017

Particular interest will be placed on presentations 12:10 Lunch Break that consider alternative ways of prioritising PM content where it can be argued that they will 01:00 Philosophy and ethics of defning, reduce the negative side-efects of current PM identifying, and tackling fake news methods while maintaining user loyalty. and inappropriate content Watkins 02:00 Political echo chambers in social Call for contributions - see conference web page PM media Gionis via link above. 03:00 Cofee break PM We will issue a call for contributions highlighting 03:30 Reality around fake news but not restricted to the following themes: PM (*) predicting future global events from media (*) detecting and predicting new major trends in the scientifc literature Abstracts (1): (*) enhancing content with information from fact checkers Abstract 13: Political echo chambers in social (*) detection of fake news media in Workshop on Prioritising Online (*) detecting and mitigating tribalism among Content, Gionis 02:00 PM online personas Echo chambers describe situations where one is (*) adapted and improved mechanisms of exposed only to opinions that agree with their information spreading own. In this talk we will discuss the phenomenon (*) algorithmic fairness in machine learning of political echo chambers in social media. We identify the diferent components in the Schedule phenomenon and characterize users based on their behavior with respect to content production and consumption. Among other fndings, we 09:00 Automating textual claim observe that users who try to bridge the echo AM verifcation Vlachos chambers have to pay a "price of bipartisanship." 10:30 Reducing controversy by connecting We then discuss ideas for combating echo AM opposing views Gionis, Garimella chambers. We frst present a model for learning 10:50 Leveraging the Crowd to Detect and ideological-leaning factors, of social-media users AM Reduce the Spread of Fake News and media sources, in a joint latent space. The and Misinformation Oh, Schölkopf model space can be used to develop exploratory 11:10 A Framework for Automated Fact- and interactive interfaces that can help users to AM Checking for Real-Time Validation of difuse their information flter bubble. Second we Emerging Claims on the Web present an infuence-based approach for Hanselowski balancing the information exposure of users in 11:30 Equality of Opportunity in Rankings the social network. AM Joachims, Singh 11:35 The Unfair Externalities of AM Exploration Slivkins, Wortman Vaughan 11:40 Developing an Information Source Cognitively Informed Artifcial Intelligence: AM Lexicon Addawood, Mishra Insights From Natural Intelligence 11:45 Mitigating the spread of fake news AM by identifying and disrupting echo Mike Mozer, Brenden Lake, Angela Yu chambers Warren, Huyen 104 A, Sat Dec 09, 08:00 AM 11:50 An Efcient Method to Impose AM Fairness in Linear Models Pontil, The goal of this workshop is to bring together Shawe-Taylor cognitive scientists, neuroscientists, and AI 11:55 Poster session Cesa-Bianchi researchers to discuss opportunities for AM improving machine learning by leveraging our 83 Dec. 9, 2017

scientifc understanding of human perception and concepts and skills form a compositional basis for cognition. There is a history of making these learning new concepts and skills. connections: artifcial neural networks were originally motivated by the massively parallel, (2) _Memory_. Human memory operates on deep architecture of the brain; considerations of multiple time scales, from memories that literally biological plausibility have driven the persist for the blink of an eye to those that development of learning procedures; and persist for a lifetime. These diferent forms of architectures for computer vision draw parallels memory serve diferent computational purposes. to the connectivity and physiology of mammalian Although forgetting is typically thought of as a visual cortex. However, beyond these celebrated disadvantage, the ability to selectively forget/ examples, cognitive science and neuroscience override irrelevant knowledge in nonstationary has fallen short of its potential to infuence the environments is highly desirable. next generation of AI systems. Areas such as memory, attention, and development have rich (3) _Attention and Decision Making_. These refer theoretical and experimental histories, yet these to relatively high-level cognitive functions that concepts, as applied to AI systems so far, only allow task demands to purposefully control an bear a superfcial resemblance to their biological agent’s external environment and sensory data counterparts. stream, dynamically reconfgure internal representation and architecture, and devise The premise of this workshop is that there are action plans that strategically trade of multiple, valuable data and models from cognitive science oft-conficting behavioral objectives. that can inform the development of intelligent adaptive machines, and can endow learning The long-term aims of this workshop are: architectures with the strength and fexibility of the human . The structures * to promote work that incorporates insights from and mechanisms of the mind and brain can human cognition to suggest novel and improved provide the sort of strong inductive bias needed AI architectures; for machine-learning systems to attain human- like performance. We conjecture that this * to facilitate the development of ML methods inductive bias will become more important as that can better predict human behavior; and researchers move from domain-specifc tasks such as object and speech recognition toward * to support the development of a feld of tackling general intelligence and the human-like ‘cognitive computing’ that is more than a ability to dynamically reconfgure cognition in marketing slogan一a feld that improves on both service of changing goals. For ML researchers, natural and artifcial cognition by synergistically the workshop will provide access to a wealth of advancing each and integrating their strengths in data and concepts situated in the context of complementary manners. contemporary ML. For cognitive scientists, the workshop will suggest research questions that are of critical interest to ML researchers. Schedule

The workshop will focus on three interconnected 08:30 Workshop overview Mozer, Yu, Lake topics of particular relevance to ML: AM 08:40 Cognitive AI Lake (1) _Learning and development_. Cognitive AM capabilities expressed early in a child’s 09:05 Computational modeling of human development are likely to be crucial for AM face processing Yu bootstrapping adult learning and intelligence. 09:30 People infer object shape in a 3D, Intuitive physics and intuitive psychology allow AM object-centered coordinate system the developing organism to build an Jacobs understanding of the world and of other agents. Additionally, children and adults often 09:55 Relational neural expectation demonstrate “learning-to-learn,” where previous AM maximization van Steenkiste 84 Dec. 9, 2017

10:10 AM Contextual dependence of human preference for complex objects: A Bayesian statistical account (Ryali)
10:15 AM A biologically-inspired sparse, topographic recurrent neural network model for robust change detection (Sridharan)
10:20 AM Visual attention guided deep imitation learning (Zhang)
10:25 AM Human learning of video games (Tsividis)
10:30 AM COFFEE BREAK AND POSTER SESSION
11:00 AM Life history and learning: Extended human childhood as a way to resolve explore/exploit trade-offs and improve hypothesis search (Gopnik)
11:25 AM Meta-reinforcement learning in brains and machines (Botvinick)
11:50 AM Revealing human inductive biases and metacognitive processes with rational models (Griffiths)
12:15 PM Learning to select computations (Lieder, Callaway, Gul, Krueger)
02:00 PM From deep learning of disentangled representations to higher-level cognition (Bengio)
02:25 PM Access consciousness and the construction of actionable representations (Mozer)
02:50 PM Evaluating the capacity to reason about beliefs (Nematzadeh)
03:05 PM COFFEE BREAK AND POSTER SESSION II
03:30 PM Mapping the spatio-temporal dynamics of cognition in the human brain (Oliva)
03:55 PM Scale-invariant temporal memory in AI (Howard)
04:20 PM Scale-invariant temporal history (SITH): Optimal slicing of the past in an uncertain world (Spears, Jacques, Howard, Sederberg)
04:35 PM Efficient human-like semantic representations via the information bottleneck principle (Zaslavsky)
04:40 PM The mutation sampler: A sampling approach to causal representation (Davis)
04:45 PM Generating more human-like recommendations with a cognitive model of generalization (Bourgin)
04:50 PM POSTER: Concept acquisition through meta-learning (Grant)
04:50 PM POSTER: Question asking as program generation (Rothe)
04:50 PM POSTER: Variational probability flow for biologically plausible training of deep neural networks (Liu, Lin)
04:50 PM POSTER: Cognitive modeling and the wisdom of the crowd (Lee)
04:50 PM POSTER: Improving transfer using augmented feedback in progressive neural networks (Bablani, Chadha)
04:50 PM POSTER: Learning to organize knowledge with N-gram machines (Yang)
04:50 PM POSTER: Using STDP for unsupervised, event-based online learning (Thiele)
04:50 PM POSTER: Sample-efficient reinforcement learning through transfer and architectural priors (Spector)
04:50 PM POSTER: Power-law temporal discounting over a logarithmically compressed timeline for scale invariant reinforcement learning (Tiganj)
04:50 PM POSTER: Pre-training attentional mechanisms (Lindsey)
04:50 PM POSTER: Context-modulation of hippocampal dynamics and deep convolutional networks (Aimone)
04:50 PM POSTER: Curiosity-driven reinforcement learning with homeostatic regulation (Magrans de Abril)
05:25 PM Object-oriented intelligence (Battaglia)
05:50 PM Representational primitives, in minds and machines (Marcus)

Machine Learning in Computational Biology

James Zou, Anshul Kundaje, Gerald Quon, Nicolo Fusi, Sara Mostafavi

104 B, Sat Dec 09, 08:00 AM

The field of computational biology has seen dramatic growth over the past few years. A wide range of high-throughput technologies developed in the last decade now enable us to measure parts of a biological system at various resolutions—at the genome, epigenome, transcriptome, and proteome levels. These technologies are now being used to collect data for an ever-increasingly diverse set of problems, ranging from classical problems such as predicting differentially regulated genes between time points and predicting subcellular localization of RNA and proteins, to models that explore complex mechanistic hypotheses bridging the gap between genetics and disease, population genetics and transcriptional regulation. Fully realizing the scientific and clinical potential of these data requires developing novel supervised and unsupervised learning methods that are scalable, can accommodate heterogeneity, are robust to systematic noise and confounding factors, and provide mechanistic insights.

The goals of this workshop are to i) present emerging problems and innovative machine learning techniques in computational biology, and ii) generate discussion on how to best model the intricacies of biological data and synthesize and interpret results in light of the current work in the field. We will invite several leaders at the intersection of computational biology and machine learning who will present current research problems in computational biology and lead these discussions based on their own research and experiences. We will also have the usual rigorous screening of contributed talks on novel learning approaches in computational biology. We encourage contributions describing either progress on new bioinformatics problems or work on established problems using methods that are substantially different from established alternatives. Deep learning, kernel methods, graphical models, feature selection, non-parametric models and other techniques applied to relevant bioinformatics problems would all be appropriate for the workshop. We will also encourage contributions to address new challenges in analyzing data generated from gene editing, single cell genomics and other novel technologies. The target audience is people with an interest in machine learning and applications to relevant problems from the life sciences, including NIPS participants without any existing research link to computational biology. Many of the talks will be of interest to the broad machine learning community.

Schedule

08:55 AM Opening Remarks
09:00 AM Christina Leslie - Decoding immune cell states and dysfunction
09:40 AM Denoising scRNA-seq Data Using Deep Count Autoencoders (Gökcen Eraslan)
09:55 AM Fine Mapping of Chromatin Interactions via Neural Nets (Artur Jaroszewicz)
10:10 AM TF-MoDISco: Learning High-Quality, Non-Redundant Transcription Factor Binding Motifs Using Deep Learning (Avanti Shrikumar)
10:30 AM Coffee break + posters
11:00 AM Spotlights
12:00 PM Lunch + Posters
01:30 PM Posters
02:00 PM Eran Halperin - A new sparse PCA algorithm with guaranteed asymptotic properties and applications in methylation data
02:40 PM Variational Bayes inference algorithm for causal multivariate mediation with linkage disequilibrium (Yongjin Park)
02:55 PM Reference-Free Archaic Admixture Segmentation Using A Permutation-Equivariant Network (Jeffrey Chan)

03:10 PM Robust and Scalable Models of Microbiome Dynamics for Bacteriotherapy Design (Travis Gibson)
04:00 PM Ben Raphael - Inferring Tumor Evolution
04:40 PM Drug Response Variational Autoencoder (Ladislav Rampášek)
04:55 PM Variational auto-encoding of protein sequences (Sam Sinai)
05:10 PM Antigen Identification for Cancer Immunotherapy by Deep Learning on Tumor HLA Peptides
05:25 PM Sponsors + prizes

The future of gradient-based machine learning software & techniques

Alex Wiltschko, Bart van Merriënboer, Pascal Lamblin

104 C, Sat Dec 09, 08:00 AM

Many algorithms in machine learning, computer vision, physical simulation, and other fields require the calculation of gradients and other derivatives. Manual derivation of gradients can be time consuming and error-prone. Automatic differentiation comprises a set of techniques to calculate the derivative of a numerical computation expressed as a computer program. These techniques are commonly used in atmospheric sciences and computational fluid dynamics, and have more recently also been adopted by machine learning researchers.

Practitioners across many fields have built a wide set of automatic differentiation tools, using different programming languages, computational primitives and intermediate compiler representations. Each of these choices comes with positive and negative trade-offs, in terms of their usability, flexibility and performance in specific domains.

This workshop will bring together researchers in the fields of automatic differentiation and machine learning to discuss ways in which advanced automatic differentiation frameworks and techniques can enable more advanced machine learning models, run large-scale machine learning on accelerators with better performance, and increase the usability of machine learning frameworks for practitioners. Topics for discussion will include:

* What abstractions (languages, kernels, interfaces, instruction sets) do we need to develop advanced automatic differentiation frameworks for the machine learning ecosystem?
* What different use cases exist in machine learning, from large-scale performance-critical models to small prototypes, and how should our toolsets reflect these needs?
* What advanced techniques from the automatic differentiation literature, such as checkpointing, differentiating through iterative processes or chaotic systems, cross-country elimination, etc., could be adopted by the ML community to enable research on new models?
* How can we foster greater collaboration between the fields of machine learning and automatic differentiation?
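For readers new to the area, the technique named in the description above can be shown in a few lines. This is a generic illustration of forward-mode automatic differentiation with dual numbers, not any particular framework discussed at the workshop.

```python
from dataclasses import dataclass
import math

@dataclass
class Dual:
    """A value bundled with its derivative w.r.t. one chosen input."""
    val: float
    dot: float  # derivative part, propagated by the chain rule

    def __add__(self, other):
        return Dual(self.val + other.val, self.dot + other.dot)

    def __mul__(self, other):  # product rule
        return Dual(self.val * other.val,
                    self.val * other.dot + self.dot * other.val)

def sin(x):
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)  # chain rule

def f(x):
    return x * x + sin(x)  # an ordinary program: f(x) = x^2 + sin(x)

# Seeding dot = 1.0 differentiates w.r.t. x; no symbolic math and no finite
# differences: the derivative is computed alongside the value, exactly.
y = f(Dual(1.5, 1.0))
print(y.val, y.dot)  # f(1.5) and f'(1.5) = 2*1.5 + cos(1.5)
```

Reverse-mode AD, the flavor behind backpropagation and the frameworks on the schedule below, instead records the computation and propagates derivatives from outputs back to inputs.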

Schedule

09:00 AM Introduction and opening remarks (Wiltschko)
09:10 AM Beyond backprop: automatic differentiation in machine learning (Baydin)
09:50 AM Automatic differentiation in PyTorch (Paszke)
10:30 AM Morning Coffee Break
11:00 AM Optimal Smoothing for Pathwise Adjoints (Hüser)
11:40 AM Poster session (Naumann, Schwartz, Wei, Meissner, Druce, Lin, Pothen, Yang)
01:40 PM Algorithmic differentiation techniques in the deep learning context (Utke)
02:20 PM Some highlights on Source-to-Source Adjoint AD (Hascoet)
03:00 PM Afternoon Coffee Break
03:30 PM Divide-and-Conquer Checkpointing for Arbitrary Programs with No User Annotation (Siskind)
04:10 PM Automatic Differentiation of Parallelised Convolutional Neural Networks - Lessons from Adjoint PDE Solvers (Hueckelheim)
04:50 PM Panel discussion (Baydin, Paszke, Hüser, Utke, Hascoet, Siskind, Hueckelheim, Griewank)

2017 NIPS Workshop on Machine Learning for Intelligent Transportation Systems

Li Erran Li, Anca Dragan, Juan Carlos Niebles, Silvio Savarese

201 A, Sat Dec 09, 08:00 AM

Our transportation systems are poised for a transformation as we make progress on autonomous vehicles, vehicle-to-vehicle (V2V) and vehicle-to-everything (V2X) communication infrastructures, and smart road infrastructures such as smart traffic lights.

There are many challenges in transforming our current transportation systems to the future vision. For example, how to make perception accurate and robust to accomplish safe autonomous driving? How to learn long term driving strategies (known as driving policies) so that autonomous vehicles can be equipped with adaptive human negotiation skills when merging, overtaking and giving way, etc.? How do we achieve near-zero fatality? How do we optimize efficiency through intelligent traffic management and control of fleets? How do we optimize for traffic capacity during rush hours? To meet these requirements in safety, efficiency, control, and capacity, the systems must be automated with intelligent decision making.

Machine learning will be essential to enable intelligent transportation systems. Machine learning has made rapid progress in self-driving, e.g. real-time perception and prediction of traffic scenes, and has started to be applied to ride-sharing platforms such as Uber (e.g. demand forecasting) and crowd-sourced video scene analysis companies such as Nexar (understanding and avoiding accidents). To address the challenges arising in our future transportation system such as traffic management and safety, we need to consider transportation systems as a whole rather than solving problems in isolation. New machine learning solutions are needed as transportation places specific requirements such as extremely low tolerance on uncertainty and the need to intelligently coordinate self-driving cars through V2V and V2X.

The goal of this workshop is to bring together researchers and practitioners from all areas of intelligent transportation systems to address core challenges with machine learning. These challenges include, but are not limited to:
accurate and efficient pedestrian detection, pedestrian intent detection,
machine learning for object tracking,
unsupervised representation learning for autonomous driving,
deep reinforcement learning for learning driving policies,
cross-modal and simulator to real-world transfer learning,
scene classification, real-time perception and prediction of traffic scenes,
uncertainty propagation in deep neural networks,
efficient inference with deep neural networks,
predictive modeling of risk and accidents through telematics,
modeling, simulation and forecast of demand and mobility patterns in large scale urban transportation systems,
machine learning approaches for control and coordination of traffic leveraging V2V and V2X infrastructures.

The workshop will include invited speakers, panels, presentations of accepted papers and posters. We invite papers in the form of short, long and position papers to address the core challenges mentioned above. We encourage researchers and practitioners on self-driving cars, transportation systems and ride-sharing platforms to participate. Since this is a topic of broad and current interest, we expect at least 150 participants from leading university researchers, auto-companies and ride-sharing companies.

Schedule

08:45 AM Opening Remarks (Li)

09:00 AM Machine Learning for Self-Driving Cars, Raquel Urtasun, Uber ATG and University of Toronto (Urtasun)
09:30 AM Exhausting the Sim with Domain Randomization and Trying to Exhaust the Real World, Pieter Abbeel, UC Berkeley and Embodied Intelligence (Abbeel, Kahn)
10:00 AM Learning-based system identification and decision making for autonomous driving, Marin Kobilarov, Zoox (Kobilarov)
10:30 AM Poster and Coffee
11:00 AM Hesham M. Eraqi, Mohamed N. Moustafa, Jens Honer, End-to-End Deep Learning for Steering Autonomous Vehicles Considering Temporal Dependencies
11:10 AM Abhinav Jauhri (CMU), Carlee Joe-Wong, John Paul Shen, On the Real-time Vehicle Placement Problem (Jauhri, Shen)
11:20 AM Andrew Best (UNC), Sahil Narang, Lucas Pasqualin, Daniel Barber, Dinesh Manocha, AutonoVi-Sim: Autonomous Vehicle Simulation Platform with Weather, Sensing, and Traffic control (Best)
11:30 AM Nikita Jaipuria (MIT), Golnaz Habibi, Jonathan P. How, CASNSC: A context-based approach for accurate pedestrian motion prediction at intersections (Jaipuria)
11:40 AM 6 Spotlight Talks (3 min each) (Siam, Prabhushankar, Parashar, Mukadam, Yao, Senanayake)
12:00 PM Lunch
01:30 PM Adaptive Deep Learning for Perception, Action, and Explanation, Trevor Darrell (UC Berkeley) (Darrell)
02:00 PM The challenges of applying machine learning to autonomous vehicles, Andrej Karpathy, Tesla
02:30 PM Micro-Perception Approach to Intelligent Transport, Ramesh Sarukkai (Lyft) (Sarukkai)
03:00 PM Posters and Coffee
03:30 PM On Autonomous Driving: Challenges and Opportunities, Sertac Karaman, MIT (Karaman)
04:00 PM Generative Adversarial Imitation Learning, Stefano Ermon, Stanford (Ermon)
04:30 PM Deep Learning and photo-realistic Simulation for real-world Machine Intelligence in Autonomous Driving, Adrien Gaidon (TRI) (Gaidon)
05:00 PM Panel Discussion (Kahn, Sarukkai, Gaidon, Karaman)

Abstracts (11):

Abstract 2: Machine Learning for Self-Driving Cars, Raquel Urtasun, Uber ATG and University of Toronto in 2017 NIPS Workshop on Machine Learning for Intelligent Transportation Systems, Urtasun 09:00 AM

Bio: Raquel Urtasun is the Head of Uber ATG Toronto. She is also an Associate Professor in the Department of Computer Science at the University of Toronto, a Canada Research Chair in Machine Learning and Computer Vision and a co-founder of the Vector Institute for AI. Prior to this, she was an Assistant Professor at the Toyota Technological Institute at Chicago (TTIC), an academic computer science institute affiliated with the University of Chicago. She was also a visiting professor at ETH Zurich during the spring semester of 2010. She received her Bachelors degree from Universidad Publica de Navarra in 2000, her Ph.D. degree from the Computer Science department at Ecole Polytechnique Federal de Lausanne (EPFL) in 2006 and did her postdoc at MIT and UC Berkeley. She is a world leading expert in machine perception for self-driving cars. Her research interests include machine learning, computer vision, robotics and remote sensing. Her lab was selected as an NVIDIA NVAIL lab. She is a recipient of an NSERC EWR Steacie Award, an NVIDIA Pioneers of AI Award, a Ministry of Education and Innovation Early Researcher Award, three Google Faculty Research Awards, an Amazon Faculty Research Award, a Connaught New Researcher Award and two Best Paper Runner up Prizes awarded at the Conference on Computer Vision and Pattern Recognition (CVPR) in 2013 and 2017 respectively. She is also an Editor of the International Journal in Computer Vision (IJCV) and has served as Area Chair of multiple machine learning and vision conferences (i.e., NIPS, UAI, ICML, ICLR, CVPR, ECCV).

Abstract 3: Exhausting the Sim with Domain Randomization and Trying to Exhaust the Real World, Pieter Abbeel, UC Berkeley and Embodied Intelligence in 2017 NIPS Workshop on Machine Learning for Intelligent Transportation Systems, Abbeel, Kahn 09:30 AM

Abstract: Reinforcement learning and imitation learning have seen success in many domains, including autonomous helicopter flight, Atari, simulated locomotion, Go, robotic manipulation. However, sample complexity of these methods remains very high. In this talk I will present two ideas towards effective data collection, and initial findings indicating promise for both: (i) Domain Randomization, which relies on extensive variation (none of it necessarily realistic) in simulation, aiming at generalization to the real world per the real world (hopefully) being like just another random sample. (ii) Self-supervised Deep RL, which considers the problem of autonomous data collection. We evaluate our approach on a real-world RC car and show it can learn to navigate through a complex indoor environment with a few hours of fully autonomous, self-supervised training.

BIO: Pieter Abbeel (Professor at UC Berkeley [2008- ], Co-Founder Embodied Intelligence [2017- ], Co-Founder Gradescope [2014- ], Research Scientist at OpenAI [2016-2017]) works in machine learning and robotics, in particular his research focuses on making robots learn from people (apprenticeship learning), how to make robots learn through their own trial and error (reinforcement learning), and how to speed up skill acquisition through learning-to-learn. His robots have learned advanced helicopter aerobatics, knot-tying, basic assembly, and organizing laundry. His group has pioneered deep reinforcement learning for robotics, including learning visuomotor skills and simulated locomotion. He has won various awards, including best paper awards at ICML, NIPS and ICRA, the Sloan Fellowship, the Air Force Office of Scientific Research Young Investigator Program (AFOSR-YIP) award, the Office of Naval Research Young Investigator Program (ONR-YIP) award, the DARPA Young Faculty Award (DARPA-YFA), the National Science Foundation Faculty Early Career Development Program Award (NSF-CAREER), the Presidential Early Career Award for Scientists and Engineers (PECASE), the CRA-E Undergraduate Research Faculty Mentoring Award, the MIT TR35, the IEEE Robotics and Automation Society (RAS) Early Career Award, IEEE Fellow, and the Dick Volz Best U.S. Ph.D. Thesis in Robotics and Automation Award.

Abstract 4: Learning-based system identification and decision making for autonomous driving, Marin Kobilarov, Zoox in 2017 NIPS Workshop on Machine Learning for Intelligent Transportation Systems, Kobilarov 10:00 AM

Abstract: The talk will include a brief overview of methods for planning and perception developed at Zoox, and focus on some recent results for learning-based system identification and decision making.

Marin Kobilarov is principal engineer for planning and control at Zoox and assistant professor in Mechanical Engineering at the Johns Hopkins University where he leads the Laboratory for Autonomous Systems, Control, and Optimization. His research focuses on planning and control of robotic systems, on approximation methods for optimization and statistical learning, and applications to autonomous vehicles. Until 2012, he was a postdoctoral fellow in Control and Dynamical Systems at the California Institute of Technology. He obtained a Ph.D. from the University of Southern California in Computer Science (2008) and a B.S. in Computer Science and Applied Mathematics from Trinity College, Hartford, CT (2003).

Abstract 10: 6 Spotlight Talks (3 min each) in 2017 NIPS Workshop on Machine Learning for Intelligent Transportation Systems, Siam, Prabhushankar, Parashar, Mukadam, Yao, Senanayake 11:40 AM

1. Mennatullah Siam, Heba Mahgoub, Mohamed Zahran, Senthil Yogamani, Martin Jagersand, Ahmad El-Sallab, Motion and appearance based Multi-Task Learning network for autonomous driving

2. Dogancan Temel, Gukyeong Kwon, Mohit Prabhushankar, Ghassan AlRegib, CURE-TSR: Challenging Unreal and Real Environments for Traffic Sign Recognition

3. Priyam Parashar, Akansel Cosgun, Alireza Nakhaei and Kikuo Fujimura, Modeling Preemptive Behaviors for Uncommon Hazardous Situations From Demonstrations

4. Mustafa Mukadam, Akansel Cosgun, Alireza Nakhaei, Kikuo Fujimura, Tactical Decision Making for Lane Changing with Deep Reinforcement Learning

5. Hengshuai Yao, Masoud S. Nosrati, Kasra Rezaee, Monte-Carlo Tree Search vs. Model-Predictive Controller: A Track-Following Example

6. Ransalu Senanayake, Thushan Ganegedara, Fabio Ramos, Deep occupancy maps: a continuous mapping technique for dynamic environments

Abstract 12: Adaptive Deep Learning for Perception, Action, and Explanation, Trevor Darrell (UC Berkeley) in 2017 NIPS Workshop on Machine Learning for Intelligent Transportation Systems, Darrell 01:30 PM

Learning of layered or "deep" representations has provided significant advances in computer vision in recent years, but has traditionally been limited to fully supervised settings with very large amounts of training data, where the model lacked interpretability. New results in adversarial adaptive representation learning show how such methods can also excel when learning across modalities and domains, and further can be trained or constrained to provide natural language explanations or multimodal visualizations to their users. I'll present recent long-term recurrent network models that learn cross-modal description and explanation, using implicit and explicit approaches, which can be applied to domains including fine-grained recognition and visuomotor policies.

Abstract 13: The challenges of applying machine learning to autonomous vehicles, Andrej Karpathy, Tesla in 2017 NIPS Workshop on Machine Learning for Intelligent Transportation Systems, 02:00 PM

Abstract:
I'll cover some of the discrepancies between machine learning problems in academia and those in the industry, especially in the context of autonomous vehicles. I will also cover Tesla's approach to massive fleet learning and some of the associated open research problems.

Bio:
Andrej is a Director of AI at Tesla, where he focuses on computer vision for the Autopilot. Previously he was a research scientist at OpenAI working on Reinforcement Learning and a PhD student at Stanford working on end-to-end learning of Convolutional/Recurrent neural network architectures for images and text.

Abstract 14: Micro-Perception Approach to Intelligent Transport, Ramesh Sarukkai (Lyft) in 2017 NIPS Workshop on Machine Learning for Intelligent Transportation Systems, Sarukkai 02:30 PM

Abstract: In this talk, we will focus on the broader angle of applying machine learning to different aspects of transportation - ranging from traffic congestion, real-time speed estimation, image based localization, and active map making as examples. In particular, as we grow the portfolio of models, we see a unique opportunity in building out a unified framework with a number of micro-perception services for intelligent transport which allows for portability and optimization across multiple transport use cases. We also discuss implications for existing ride-sharing transport as well as potential impact to autonomous.

Bio: Dr. Ramesh Sarukkai currently heads up the Geo teams (Mapping, Localization & Perception) at Lyft. Prior to that he was a Director of Engineering at Facebook and Google/YouTube where he led a number of platform & products initiatives including applied machine learning teams, consumer/advertising video products and core payments/risk/developer platforms. He has given a number of talks, keynotes and panels at major conferences/workshops such as W3C WWW Conferences and ACM Multimedia, and published/presented papers at leading journals/conferences on internet technologies, speech/audio, computer vision and machine learning, in addition to authoring a book on "Foundations of Web Technology" (Kluwer/Springer). He also holds a large number of patents in the aforementioned areas and graduated with a PhD in computer science from the University of Rochester.

Abstract 16: On Autonomous Driving: Challenges and Opportunities, Sertac Karaman, MIT in 2017 NIPS Workshop on Machine Learning for Intelligent Transportation Systems, Karaman 03:30 PM

Abstract: Fully-autonomous driving has long been sought after. DARPA's efforts dating back a decade ignited the first spark, showcasing the possibilities. Then, the AI revolution pushed the boundaries. These led to the creation of a rapidly-growing ecosystem around developing self-driving capabilities. In this talk, we briefly summarize our experience in the DARPA Urban Challenge as Team MIT, one of the six finishers of the race. We then highlight a few of our recent research results at MIT, including end-to-end deep learning for parallel autonomy and sparse-to-dense depth estimation for autonomous driving. We conclude with a few questions that may be relevant in the near future.

Bio: Sertac Karaman is an Associate Professor of Aeronautics and Astronautics at the Massachusetts Institute of Technology (since Fall 2012). He has obtained B.S. degrees in mechanical engineering and in computer engineering from the Istanbul Technical University, Turkey, in 2007; an S.M. degree in mechanical engineering from MIT in 2009; and a Ph.D. degree in electrical engineering and computer science also from MIT in 2012. His research interests lie in the broad areas of robotics and control theory. In particular, he studies the applications of probability theory, stochastic processes, stochastic geometry, formal methods, and optimization for the design and analysis of high-performance cyber-physical systems. The application areas of his research include driverless cars, unmanned aerial vehicles, distributed aerial surveillance systems, air traffic control, certification and verification of control systems software, and many others. He is the recipient of an IEEE Robotics and Automation Society Early Career Award in 2017, an Office of Naval Research (ONR) Young Investigator Award in 2017, an Army Research Office (ARO) Young Investigator Award in 2015, a National Science Foundation Faculty Career Development (CAREER) Award in 2014, an AIAA Wright Brothers Graduate Award in 2012, and an NVIDIA Fellowship in 2011.

Abstract 17: Generative Adversarial Imitation Learning, Stefano Ermon, Stanford in 2017 NIPS Workshop on Machine Learning for Intelligent Transportation Systems, Ermon 04:00 PM

Abstract: Consider learning a policy from example expert behavior, without interaction with the expert or access to a reward or cost signal. One approach is to recover the expert's cost function with inverse reinforcement learning, then compute an optimal policy for that cost function. This approach is indirect and can be slow. In this talk, I will discuss a new generative modeling framework for directly extracting a policy from data, drawing an analogy between imitation learning and generative adversarial networks. I will derive a model-free imitation learning algorithm that obtains significant performance gains over existing methods in imitating complex behaviors in large, high-dimensional environments. Our approach can also be used to infer the latent structure of human demonstrations in an unsupervised way. As an example, I will show a driving application where a model learned from demonstrations is able to both produce different driving styles and accurately anticipate human actions using raw visual inputs.
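To make the GAN analogy in this abstract concrete, here is a deliberately tiny sketch of the adversarial imitation loop: a discriminator learns to tell expert state-action pairs from policy ones, and the policy is reinforced with a reward derived from the discriminator. Everything here (the one-step tabular task, the sign conventions, the step sizes) is an invented illustration, not the implementation evaluated in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A = 5, 2                      # number of states and actions
expert_action = 1                # the "expert" demonstrator always picks 1

logits = np.zeros((S, A))        # tabular softmax policy
d_logit = np.zeros((S, A))       # discriminator logit per (state, action)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for step in range(3000):
    s = rng.integers(S)
    p = np.exp(logits[s] - logits[s].max())
    p /= p.sum()
    a = rng.choice(A, p=p)       # policy's action in state s

    # Discriminator step: one common convention labels policy pairs 1
    # and expert pairs 0, trained with a log-loss gradient step.
    for ss, aa, label in [(s, a, 1.0), (rng.integers(S), expert_action, 0.0)]:
        d = sigmoid(d_logit[ss, aa])
        d_logit[ss, aa] -= 0.1 * (d - label)

    # Policy step (REINFORCE): reward is high where D says "expert-like",
    # so the policy is trained to fool the discriminator.
    reward = -np.log(sigmoid(d_logit[s, a]) + 1e-8)
    grad = -p
    grad[a] += 1.0               # gradient of log pi(a|s) w.r.t. logits[s]
    logits[s] += 0.05 * reward * grad

probs = np.exp(logits)
probs /= probs.sum(axis=1, keepdims=True)
print("P(expert action per state):", np.round(probs[:, 1], 2))
```

With enough steps the policy concentrates on the expert's action in every state, which is the degenerate tabular version of matching the expert's occupancy measure.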

Bio: Stefano Ermon is currently an Assistant Professor in the Department of Computer Science at Stanford University, where he is affiliated with the Artificial Intelligence Laboratory. He completed his PhD in computer science at Cornell in 2015. His research interests include techniques for scalable and accurate inference in graphical models, large-scale combinatorial optimization, and robust decision making under uncertainty, and is motivated by a range of applications, in particular ones in the emerging field of computational sustainability. Stefano's research has won several awards, including three Best Paper Awards, a World Bank Big Data Innovation Challenge, and was selected by Scientific American as one of the 10 World Changing Ideas in 2016. He is a recipient of the Sony Faculty Innovation Award and NSF CAREER Award.

Abstract 18: Deep Learning and photo-realistic Simulation for real-world Machine Intelligence in Autonomous Driving, Adrien Gaidon (TRI) in 2017 NIPS Workshop on Machine Learning for Intelligent Transportation Systems, Gaidon 04:30 PM

Photo-realistic simulation is rapidly gaining momentum for visual training and test data generation in autonomous driving and general robotic contexts. This is particularly the case for video analysis, where manual labeling of data is extremely difficult or even impossible. This scarcity of adequate labeled training data is widely accepted as a major bottleneck of deep learning algorithms for important video understanding tasks like segmentation, tracking, and action recognition. In this talk, I will describe our use of modern game engines to generate large scale, densely labeled, high-quality synthetic video data with little to no manual intervention. In contrast to approaches using existing video games to record limited data from human game sessions, we build upon the more powerful approach of "virtual world generation". Pioneering this approach, the recent Virtual KITTI [1] and SYNTHIA [2] datasets are among the largest fully-labelled datasets designed to boost perceptual tasks in the context of autonomous driving and video understanding (including semantic and instance segmentation, 2D and 3D object detection and tracking, optical flow estimation, depth estimation, and structure from motion). With our recent PHAV dataset [3], we push the limits of this approach further by providing stochastic simulations of human actions, camera paths, and environmental conditions. I will describe our work on these synthetic 4D environments to automatically generate potentially infinite amounts of varied and realistic data. I will also describe how to measure and mitigate the domain gap when learning deep neural networks for different perceptual tasks needed for self-driving. I will finally show some recent results on more interactive simulation for autonomous driving and adversarial learning to automatically improve the output of simulators.

Abstract 19: Panel Discussion in 2017 NIPS Workshop on Machine Learning for Intelligent Transportation Systems, Kahn, Sarukkai, Gaidon, Karaman 05:00 PM

Dmitry Chichkov - NVIDIA
Adrien Gaidon - TRI
Gregory Kahn - Berkeley
Sertac Karaman - MIT
Ramesh Sarukkai - Lyft

Aligned Artificial Intelligence

Dylan Hadfield-Menell, Jacob Steinhardt, David Duvenaud, David Krueger, Anca Dragan

201 B, Sat Dec 09, 08:00 AM

In order to be helpful to users and to society at large, an autonomous agent needs to be aligned with the objectives of its stakeholders. Misaligned incentives are a common and crucial problem with human agents --- we should expect similar challenges to arise from misaligned incentives with artificial agents. For example, it is not uncommon to see reinforcement learning agents 'hack' their specified reward function. How do we build learning systems that will reliably achieve a user's _intended_ objective? How can we ensure that autonomous agents behave reliably in unforeseen situations? How do we design systems whose behavior will be aligned with the values and goals of society at large?

As AI capabilities develop, it is crucial for the AI community to come to satisfying and trustworthy answers to these questions. This workshop will focus on three central challenges in value alignment: learning complex rewards that reflect human preferences (e.g. meaningful oversight, preference elicitation, inverse reinforcement learning, learning from demonstration or feedback), engineering reliable AI systems (e.g. robustness to distributional shift, model misspecification, or adversarial data, via methods such as adversarial training, KWIK-style learning, or transparency to human inspection), and dealing with bounded rationality and incomplete information in both AI systems and their users (e.g. acting on incomplete task specifications, learning from users who sometimes make mistakes). We also welcome submissions that do not directly fit these categories but generally deal with problems relating to value alignment in artificial intelligence.

Schedule

09:15 AM Opening Remarks (Hadfield-Menell)
09:30 AM Dynamic Safe Interruptibility for Decentralized Multi-Agent Reinforcement Learning (Hendrikx)
09:45 AM Minimax-Regret Querying on Side Effects in Factored Markov Decision Processes (Singh)
10:15 AM Robust Covariate Shift with Exact Loss Functions (Liu)
11:00 AM Adversarial Robustness for Aligned AI (Goodfellow)
11:30 AM Incomplete Contracting and AI Alignment (Hadfield)
01:15 PM Learning from Human Feedback (Christiano)
01:45 PM Finite Supervision Reinforcement Learning (Saunders, Langlois)
02:00 PM Safer Classification by Synthesis (Wang)
02:15 PM Aligned AI Poster Session (Askell, Muszynski, Wang, Yang, Nguyen, Low, Jaillet, Schumann, Liu, Eckersley, Wang, Saunders)
03:30 PM Machine Learning for Human Deliberative Judgment (Evans)
04:00 PM Learning Reward Functions (Leike)
04:30 PM Informal Technical Discussion: Open Problems in AI Alignment

NIPS Highlights (MLTrain), Learn How to code a paper with state of the art frameworks

Alex Dimakis, Nikolaos Vasiloglou, Guy Van den Broeck, Alexander Ihler, Assaf Araki

202, Sat Dec 09, 08:00 AM

Every year hundreds of papers are published at NIPS. Although the authors provide sound and scientific description and proof of their ideas, there is no space for explaining all the tricks and details that can make the implementation of the paper work. The goal of this workshop is to help authors evangelize their paper to the industry and expose the participants to all the Machine Learning/Artificial Intelligence know-how that cannot be found in the papers. Also the effect/importance of tuning parameters is rarely discussed, due to lack of space.

Submissions
We encourage you to prepare a poster of your favorite paper that explains graphically and at a higher level the concepts and the ideas discussed in it. You should also submit a jupyter notebook that explains in detail how equations in the paper translate to code. You are welcome to use any of the famous platforms like TensorFlow, Keras, MxNet, CNTK, etc. For more information visit https://www.mltrain.cc/

Schedule

09:00 AM Lessons learned from designing Edward (Tran)
09:45 AM Tips and tricks of coding papers on PyTorch (Chintala)
10:05 AM Differentiable Learning of Logical Rules for Knowledge Base Reasoning (Cohen, Yang)
10:45 AM Coding Reinforcement Learning Papers (Zhang)

11:15 AM A Linear-Time Kernel Goodness-of-Fit Test (NIPS best paper) (Jitkrittum)
11:35 AM Imagination-Augmented Agents for Deep Reinforcement Learning (Racanière)
11:55 AM Inductive Representation Learning on Large Graphs (Hamilton)
12:15 PM Probabilistic Programming with PYRO (Goodman)
12:35 PM Poster Session
02:00 PM Simple and Efficient Implementation of Neural Nets with Automatic Operation Batching (Neubig)
02:45 PM Learning Texture Manifolds with the Periodic Spatial GAN by Nikolay Jetchev, Zalando (Vollgraf)
03:05 PM MLPACK, A case study: implementing ID3 decision trees to be as fast as possible (Curtin)
03:40 PM Self-Normalizing Neural Networks (Unterthiner)
04:00 PM Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model (Lu)
04:25 PM Break
05:00 PM Spotlights

Abstracts (2):

Abstract 10: Simple and Efficient Implementation of Neural Nets with Automatic Operation Batching in NIPS Highlights (MLTrain), Learn How to code a paper with state of the art frameworks, Neubig 02:00 PM

In this talk I will talk about how to easily and efficiently develop neural network models for complicated problems such as natural language processing using dynamic neural networks. First, I will briefly explain different paradigms in neural networks: static networks (e.g. TensorFlow), dynamic and eager (e.g. PyTorch), and dynamic and lazy (e.g. DyNet). I will discuss how to efficiently implement models within dynamic neural networks, including minimizing the number of computations and mini-batching. Then I'll introduce our recently proposed method for automatic batching in dynamic networks, which makes it much easier to implement complicated networks efficiently. Code examples for the implementation will be provided.
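Abstract 10 says code examples will be provided in the talk; as a rough flavor of the "dynamic and lazy" style it refers to, the sketch below builds one lazy graph per minibatch so the framework can batch the per-example operations itself. It is written against the DyNet Python API from memory, so treat the exact calls, and especially the autobatch configuration flag, as assumptions to verify against the DyNet documentation.

```python
import dynet_config
dynet_config.set(autobatch=True)    # assumed flag: enable auto-batching
import dynet as dy

pc = dy.ParameterCollection()
pW = pc.add_parameters((64, 32))    # hidden layer weights
pU = pc.add_parameters((5, 64))     # output layer weights (5 classes)
trainer = dy.SimpleSGDTrainer(pc)

def train_minibatch(examples):
    """examples: list of (features, label); features is a list of 32 floats."""
    dy.renew_cg()                   # fresh lazy computation graph
    W, U = dy.parameter(pW), dy.parameter(pU)
    losses = []
    for feats, label in examples:
        h = dy.tanh(W * dy.inputVector(feats))
        losses.append(dy.pickneglogsoftmax(U * h, label))
    # A single forward/backward on the summed loss; with autobatching on,
    # DyNet groups the per-example matrix-vector products into batched ops.
    loss = dy.esum(losses)
    value = loss.value()            # triggers the (auto-batched) forward pass
    loss.backward()
    trainer.update()
    return value
```

The appeal of the lazy style is that the loop above is written one example at a time, with no manual padding or batching, yet executes with batched efficiency.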

Abstract 16: Spotlights in NIPS Highlights (MLTrain), Learn How to code a paper with state of the art frameworks, 05:00 PM

Ben Athiwaratkun: Bayesian GAN in PyTorch
Dhyani Dushyanta: A Convolutional Encoder Model for Neural Machine Translation
Forough Arabshahi: Combining Symbolic Expressions and Black-box Function Evaluations in Neural Programs
Jean Kossaifi: Tensor Regression Networks with TensorLy and MXNet
Joseph Paul Cohen: ShortScience.org - Reproducing Intuition
Kamyar Azizzadenesheli: Efficient Exploration through Bayesian Deep Q-Networks
Ashish Khetan: Learning from noisy, single-labeled data
Rose Yu: Long-Term Forecasting using Tensor-Train RNNs
Shayenne da Lu Moura: Melody Transcription System
Tschannen Michael: Fast Linear Algebra in Stacked Strassen Networks
Yang Shi: Multimodal Compact Bilinear Pooling for Visual Question Answering
Yu-Chia Chen: Improved Graph Laplacian via Geometric Consistency

Learning Disentangled Features: from Perception to Control

Emily Denton, Siddharth Narayanaswamy, Tejas Kulkarni, Honglak Lee, Diane Bouchacourt, Josh Tenenbaum, David Pfau

203, Sat Dec 09, 08:00 AM

An important facet of human experience is our ability to break down what we observe and interact with, along characteristic lines. Visual scenes consist of separate objects, which may have different poses and identities within their category. In natural language, the syntax and semantics of a sentence can often be separated from one another. In planning and cognition, plans can be broken down into immediate and long term goals. Inspired by this, much research in deep representation learning has gone into finding disentangled factors of variation. However, this research often lacks a clear definition of what disentangling is or much relation to work in other branches of machine learning, neuroscience or cognitive science. In this workshop we intend to bring a wide swathe of scientists studying disentangled representations under one roof to try to come to a unified view of the problem of disentangling.

The workshop will address these issues through 3 focuses:

What is disentangling: Are disentangled representations just the same as statistically independent representations, or is there something more? How does disentangling relate to interpretability? Can we define what it means to separate style and content, or is human judgement the final arbiter? Are disentangled representations the same as equivariant representations?

How can disentangled representations be discovered: What is the current state of the art in learning disentangled representations? What are the cognitive and neural underpinnings of disentangled representations in animals and humans? Most work in disentangling has focused on perception, but we will encourage dialogue with researchers in natural language processing and reinforcement learning as well as neuroscientists and cognitive scientists.

Why do we care about disentangling: What are the downstream tasks that can benefit from using disentangled representations? Does the downstream task define the relevance of the disentanglement to learn? What does disentangling get us in terms of improved prediction or behavior in intelligent agents?

Schedule

08:30 AM Welcome: Josh Tenenbaum
09:00 AM Stefano Soatto
09:30 AM Irina Higgins
10:00 AM Finale Doshi-Velez
10:30 AM Poster session + Coffee break
11:00 AM Doris Tsao
11:30 AM Spotlight talks
12:15 PM Lunch
02:00 PM Doina Precup
02:30 PM Pushmeet Kohli
03:00 PM Poster session + Coffee break (Kågebäck, Melnyk, Karimi, Brunner, Banijamali, Donahue, Zhao, Parascandolo, Thomas, Kumar, Burgess, Nilsson, Larsson, Eastwood, Peychev)
03:30 PM Yoshua Bengio
04:00 PM Ahmed Elgammal
04:30 PM Final Poster Break
05:00 PM Panel discussion

Abstracts (1):

Abstract 2: Stefano Soatto in Learning Disentangled Features: from Perception to Control, Soatto 09:00 AM

Emergence of Invariance and Disentangling in Deep Representations

BigNeuro 2017: Analyzing brain data from nano to macroscale

Eva Dyer, Greg Kiar, Will Gray Roncal, Konrad P Koerding, Joshua T Vogelstein

204, Sat Dec 09, 08:00 AM

Datasets in neuroscience are increasing in size at alarming rates relative to our ability to analyze them. This workshop aims at discussing new frameworks for processing and making sense of large neural datasets.

The morning session will focus on approaches for processing large neuroscience datasets. Examples include: distributed + high-performance computing, GPU and other hardware accelerations, spatial databases and other compression schemes used for large neuroimaging datasets, approaches for handling large data sizes, randomization and stochastic optimization.

The afternoon session will focus on abstractions for modelling large neuroscience datasets. Examples include graphs, graphical models, manifolds, mixture models, latent variable models, spatial models, and factor learning.

In addition to talks and discussions, we plan to have papers submitted and peer reviewed. Workshop "proceedings" will consist of links to unpublished arXiv or bioRxiv papers that are of exceptional quality and are well aligned with the workshop scope. Some accepted papers will also be invited for an oral presentation; the remaining authors will be invited to present a poster.

Schedule

08:40 AM Opening Remarks (Dyer, Gray Roncal)
09:00 AM Can brain data be used to reverse engineer the algorithms of human perception? (DiCarlo)
09:35 AM Backpropagation and deep learning in the brain (Lillicrap)
09:55 AM Algorithms, tools, and progress in connectomic reconstruction of neural circuits (Jain)
10:15 AM Multimodal deep learning for natural human neural recordings and video (Brunton)
10:35 AM Coffee Break
11:00 AM More Steps towards Biologically Plausible Backprop (Bengio)
11:40 AM Panel on "What neural systems can teach us about building better machine learning systems" (Lillicrap, DiCarlo, Rozell, Jain, Kutz, Gray Roncal, Brunton)
12:20 PM Lunch Break
01:40 PM Discovery of governing equations and biological principles from spatio-temporal time-series recordings (Kutz)
02:15 PM Large scale calcium imaging data analysis for the 99% (Pnevmatikakis)
02:35 PM Machine learning for cognitive mapping (Varoquaux)
02:55 PM Dealing with clinical heterogeneity in the discovery of new biomarkers of brain disorders (Dansereau)
03:15 PM Morphological error detection for connectomics (Rolnick)
03:25 PM Poster Spotlights
03:45 PM Poster Session
04:30 PM Deep nets meet real neurons: pattern selectivity of V4 through transfer learning and stability analysis (Yu)
05:05 PM Mapping brain structure and function with deep learning (Calhoun)
05:25 PM bossDB: A Petascale Database for Large-Scale Neuroscience Informing Machine Learning (Drenkow)
05:45 PM Machine vision and learning for extracting a mechanistic understanding of neural computation (Branson)
06:05 PM Closing Panel: Analyzing brain data from nano to macroscale (Gray Roncal, Dyer)

Hierarchical Reinforcement Learning

Andrew G Barto, Doina Precup, Shie Mannor, Tom Schaul, Roy Fox, Carlos Florensa

Grand Ballroom A, Sat Dec 09, 08:00 AM

Reinforcement Learning (RL) has become a powerful tool for tackling complex sequential decision-making problems. It has been shown to train agents to reach super-human capabilities in game-playing domains such as Go and Atari. RL can also learn advanced control policies in high-dimensional robotic systems. Nevertheless, current RL agents have considerable difficulties when facing sparse rewards, long planning horizons, and more generally a scarcity of useful supervision signals. Unfortunately, the most valuable control tasks are specified in terms of high-level instructions, implying sparse rewards when formulated as an RL problem. Internal spatio-temporal abstractions and memory structures can constrain the decision space, improving data efficiency in the face of scarcity, but are likewise challenging for a supervisor to teach.

Hierarchical Reinforcement Learning (HRL) is emerging as a key component for finding spatio-temporal abstractions and behavioral patterns that can guide the discovery of useful large-scale control architectures, both for deep-network representations and for analytic and optimal-control methods. HRL has the potential to accelerate planning and exploration by identifying skills that can reliably reach desirable future states. It can abstract away the details of low-level controllers to facilitate long-horizon planning and meta-learning in a high-level feature space. Hierarchical structures are modular and amenable to separation of training efforts, reuse, and transfer. By imitating a core principle of human cognition, hierarchies hold promise for interpretability and explainability.

There is a growing interest in HRL methods for structure discovery, planning, and learning, as well as HRL systems for shared learning and policy deployment. The goal of this workshop is to improve cohesion and synergy among the research community and increase its impact by promoting better understanding of the challenges and potential of HRL. This workshop further aims to bring together researchers studying both theoretical and practical aspects of HRL, for a joint presentation, discussion, and evaluation of some of the numerous novel approaches to HRL developed in recent years.

Schedule

09:00 AM Opening Remarks (Fox)
09:10 AM Deep Reinforcement Learning with Subgoals (David Silver)
09:40 AM Landmark Options Via Reflection (LOVR) in Multi-task Lifelong Reinforcement Learning (Nicholas Denis)
09:50 AM Crossmodal Attentive Skill Learner (Shayegan Omidshafiei)
10:00 AM HRL with gradient-based subgoal generators, asymptotically optimal incremental problem solvers, various meta-learners, and PowerPlay (Jürgen Schmidhuber)
11:00 AM Meta-Learning Shared Hierarchies (Pieter Abbeel)
11:30 AM Best Paper Award and Talk: Learning with options that terminate off-policy (Anna Harutyunyan)
11:55 AM Spotlights & Poster Session (Abel, Denis, Eckstein, Fruit, Goel, Gruenstein, Harutyunyan, Klissarov, Kong, Kumar, Kumar, Liu, McNamee, Omidshafiei, Pitis, Rauber, Roderick, Shu, Wang, Zhang)
12:30 PM Lunch Break
01:30 PM Hierarchical Imitation and Reinforcement Learning for Robotics (Jan Peters)
02:00 PM Deep Abstract Q-Networks (Melrose Roderick)
02:10 PM Federated Control with Hierarchical Multi-Agent Deep Reinforcement Learning (Saurabh Kumar)
02:20 PM Effective Master-Slave Communication On A Multi-Agent Deep Reinforcement Learning System (Xiangyu Kong)
02:30 PM Sample efficiency and off-policy hierarchical RL (Emma Brunskill)
03:00 PM Coffee Break
03:30 PM Applying variational information bottleneck in hierarchical domains (Matt Botvinick)
04:00 PM Progress on Deep Reinforcement Learning with Temporal Abstraction (Doina Precup)
04:30 PM Panel Discussion (Botvinick, Brunskill, Campos, Peters, Precup, Silver, Tenenbaum, Fox)
05:30 PM Poster Session (Abel, Denis, Eckstein, Fruit, Goel, Gruenstein, Harutyunyan, Klissarov, Kong, Kumar, Kumar, Liu, McNamee, Omidshafiei, Pitis, Rauber, Roderick, Shu, Wang, Zhang)

Learning with Limited Labeled Data: Weak Supervision and Beyond

Isabelle Augenstein, Stephen Bach, Eugene Belilovsky, Matthew Blaschko, Christoph Lampert, Edouard Oyallon, Emmanouil Antonios Platanios, Alexander Ratner, Chris Ré

Grand Ballroom B, Sat Dec 09, 08:00 AM

Modern representation learning techniques like deep neural networks have had a major impact both within and beyond the field of machine learning, achieving new state-of-the-art performances with little or no feature engineering on a vast array of tasks. However, these gains are often difficult to translate into real-world settings as they require massive hand-labeled training sets. And in the vast majority of real-world settings, collecting such training sets by hand is infeasible due to the cost of labeling data or the paucity of data in a given domain (e.g. rare diseases in medical applications). In this workshop we focus on techniques for few sample learning and using weaker supervision when large unlabeled datasets are available, as well as theory associated with both.

One increasingly popular approach is to use weaker forms of supervision—i.e. supervision that is potentially noisier, biased, and/or less precise. An overarching goal of such approaches is to use domain knowledge and resources from subject matter experts, but to solicit it in higher-level, lower-fidelity, or more opportunistic ways. Examples include higher-level abstractions such as heuristic labeling rules, feature annotations, constraints, expected distributions, and generalized expectation criteria; noisier or biased labels from distant supervision, crowd workers, and weak classifiers; data augmentation strategies to express class invariances; and potentially mismatched training data such as in multitask and transfer learning settings.

Along with practical methods and techniques for dealing with limited labeled data settings, this workshop will also focus on the theory of learning in this general setting. Although several classic techniques in statistical learning theory exist which handle the case of few samples and high dimensions, extending these results for example to the recent success of deep learning is still a challenge. How can the theory or the techniques that have gained success in deep learning be adapted to the case of limited labeled data? How can systems designed (and potentially deployed) for large scale learning be adapted to small data settings? What are efficient and practical ways to incorporate prior knowledge?

This workshop will focus on highlighting both practical and theoretical aspects of learning with limited labeled data, including but not limited to topics such as:
- Learning from noisy labels
- "Distant" or heuristic supervision (see the sketch after this list)
- Non-standard labels such as feature annotations, distributions, and constraints
- Data augmentation and/or the use of simulated data
- Frameworks that can tackle both very few samples and settings with more data without extensive intervention
- Effective and practical techniques for incorporating domain knowledge
- Applications of machine learning for small data problems in medical images and industry
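The "heuristic labeling rules" idea named above admits a very small worked example: write a few noisy rules, let them vote, and use the votes as labels for a downstream model. The rules and the unweighted majority vote below are invented for illustration; systems in this space also learn how much to trust each rule.

```python
import numpy as np

# Three hand-written rules; each votes +1, -1, or 0 (abstain) on an example.
def lf_keyword(text):   return +1 if "excellent" in text else 0
def lf_negation(text):  return -1 if "not" in text else 0
def lf_too_short(text): return -1 if len(text) < 10 else 0

LABELING_FUNCTIONS = [lf_keyword, lf_negation, lf_too_short]

def weak_label(text):
    """Unweighted majority vote over the rules; 0 means 'still unlabeled'."""
    total = sum(lf(text) for lf in LABELING_FUNCTIONS)
    return 0 if total == 0 else int(np.sign(total))

corpus = ["excellent and thorough", "not excellent", "meh"]
print([weak_label(t) for t in corpus])
# -> [1, 0, -1]: the conflicting votes on "not excellent" cancel out,
#    so that example stays unlabeled rather than getting a bad label.
```

The resulting noisy labels can then train any ordinary classifier, which is the practical appeal: domain expertise enters through cheap rules instead of per-example annotation.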

incorporating domain knowledge 04:45 Invited Talk: Sameer Singh, "That - Applications of machine learning for small data PM Doesn't Make Sense! A Case Study problems in medical images and industry in Actively Annotating Model Explanations" 05:15 Contributed Talk 3: Local Afne Schedule PM Approximators of Deep Neural Nets for Improving Knowledge Transfer 08:30 Welcome & Opening Remarks 05:30 Contributed Talk 4: Co-trained AM PM Ensemble Models for Weakly 08:40 Invited Talk: "Tales from fMRI: Supervised Cyberbullying Detection AM Learning from limited labeled data" 05:45 Invited Talk: What’s so Hard About Varoquaux PM Natural Language Understanding? 09:10 Invited Talk: Learning from Limited Ritter AM Labeled Data (But a Lot of Unlabeled 06:15 Closing Remarks & Awards Data) Mitchell PM 09:40 Contributed Talk 1: "Smooth AM Neighbors on Teacher Graphs for Semi-supervised Learning" Deep Learning: Bridging Theory and 09:55 1-minute Poster Spotlights (Session Practice AM #1)

10:15 Poster Sessions Forster, Inouye, Sanjeev Arora, Maithra Raghu, Russ Salakhutdinov, AM Srivastava, De Cock, Sharma, Kozinski, Ludwig Schmidt, Oriol Vinyals Babkin, he, Cui, Rao, Raskar, Das, Zhao, Lanka Hall A, Sat Dec 09, 08:00 AM 11:00 Invited Talk: "Light Supervision of The past fve years have seen a huge increase in AM Structured Prediction Energy the capabilities of deep neural networks. Networks" McCallum Maintaining this rate of progress however, faces 11:30 Invited Talk: "Forcing Neural Link some steep challenges, and awaits fundamental AM Predictors to Play by the Rules", insights. As our models become more complex, Sebastian Riedel and venture into areas such as unsupervised 12:00 Lunch learning or reinforcement learning, designing PM improvements becomes more laborious, and 02:00 Panel: Limited Labeled Data in success can be brittle and hard to transfer to new PM Medical Imaging Rubin, Lungren settings. 02:30 1-minute Poster Spotlights (Session PM #2) This workshop seeks to highlight recent works 02:50 Poster Session / Cofee Break Ren, that use *theory as well as systematic PM Lundquist, Hickson, Dubey, Shinoda, experiments* to isolate the fundamental Marasović, Stretcu, Reda, Raunak, dos questions that need to be addressed in deep Santos, Canas, Mager Hois, Hirzel learning. These have helped fesh out core 03:30 Invited Talk: Sample and questions on topics such as generalization, PM Computationally Efcient Active adversarial robustness, large batch training, Learning Algorithms Balcan generative adversarial nets, and optimization, 04:00 Contributed Talk 2: "EZLearn: and point towards elements of the theory of deep PM Exploiting Organic Supervision in learning that is expected to emerge in the future. Large-Scale Data Annotation" 04:15 Invited Talk: Overcoming Limited PM Data with GANs Goodfellow The workshop aims to enhance this confuence of theory and practice, highlighting infuential work with these methods, future open directions, and core fundamental problems. There will be an 100 Dec. 9, 2017

emphasis on discussion, via panels and round Bridging Theory and Practice of GANs tables, to identify future research directions that are promising and tractable. Abstract 4: Spotlights 1 in Deep Learning: Schedule Bridging Theory and Practice, 09:45 AM 1) Generalization in deep nets: the role of 08:35 Opening Remarks distance from initialization AM 2) Entropy-SG(L)D optimizes the prior of a (valid) PAC-Bayes bound 08:45 Invited Talk #1 (Yoshua Bengio) 3) Large Batch Training of DNNs with Layer-wise AM Adaptive Rate Scaling 09:15 Invited Talk #2 (Ian Goodfellow) AM 09:45 Spotlights 1 AM Abstract 5: Invited Talk #3 (Peter Bartlett) in Deep Learning: Bridging Theory and 10:00 Invited Talk #3 (Peter Bartlett) Practice, 10:00 AM AM 10:30 Cofee Generalization in Deep Networks AM 11:00 Invited Talk #4 (Doina Precup) AM Abstract 7: Invited Talk #4 (Doina Precup) in 11:30 Spotlights 2 Deep Learning: Bridging Theory and AM Practice, 11:00 AM 11:45 Poster Session 1 and Lunch Dathathri, AM Rangamani, Sharma, RoyChowdhury, Experimental design in Deep Reinforcement Advani, Guss, Yun, Hardy, Alberti, Learning Sachan, Veit, Shinozaki, Chin 01:30 Invited Talk #5 (Percy Liang) PM Abstract 8: Spotlights 2 in Deep Learning: 02:00 Contributed Talks 1,2,3,4 Bridging Theory and Practice, 11:30 AM PM 1) Measuring robustness of NNs via Minimal 03:00 Poster Session 2 Adversarial Examples PM 2) A classifcation based perspective on GAN- 04:00 Invited Talk #6 (Sham Kakade) distributions PM 3) Learning one hidden layer neural nets with 04:30 Panel landscape design PM

Abstracts (9):

Abstract 2: Invited Talk #1 (Yoshua Bengio) in Deep Learning: Bridging Theory and Practice, 08:45 AM

Generalization, Memorization and SGD

Abstract 3: Invited Talk #2 (Ian Goodfellow) in Deep Learning: Bridging Theory and Practice, 09:15 AM

Bridging Theory and Practice of GANs

Abstract 4: Spotlights 1 in Deep Learning: Bridging Theory and Practice, 09:45 AM

1) Generalization in deep nets: the role of distance from initialization
2) Entropy-SG(L)D optimizes the prior of a (valid) PAC-Bayes bound
3) Large Batch Training of DNNs with Layer-wise Adaptive Rate Scaling

Abstract 5: Invited Talk #3 (Peter Bartlett) in Deep Learning: Bridging Theory and Practice, 10:00 AM

Generalization in Deep Networks

Abstract 7: Invited Talk #4 (Doina Precup) in Deep Learning: Bridging Theory and Practice, 11:00 AM

Experimental design in Deep Reinforcement Learning

Abstract 8: Spotlights 2 in Deep Learning: Bridging Theory and Practice, 11:30 AM

1) Measuring robustness of NNs via Minimal Adversarial Examples
2) A classification based perspective on GAN distributions
3) Learning one hidden layer neural nets with landscape design

Abstract 10: Invited Talk #5 (Percy Liang) in Deep Learning: Bridging Theory and Practice, 01:30 PM

Fighting Black Boxes, Adversaries, and Bugs in Deep Learning

Abstract 11: Contributed Talks 1,2,3,4 in Deep Learning: Bridging Theory and Practice, 02:00 PM

1) Don't Decay the Learning Rate, Increase the Batch Size
2) Meta-Learning and Universality: Deep Representations and Gradient Descent Can Approximate Any Learning Algorithm
3) Hyperparameter Optimization: A Spectral Approach
4) Learning Implicit Generative Models with Method of Learned Moments

Abstract 13: Invited Talk #6 (Sham Kakade) in Deep Learning: Bridging Theory and Practice, 04:00 PM

Towards Bridging Theory and Practice in Deep RL

Bayesian Deep Learning

Yarin Gal, José Miguel Hernández-Lobato, Christos Louizos, Andrew Wilson, Diederik Kingma, Zoubin Ghahramani, Kevin Murphy, Max Welling

Hall C, Sat Dec 09, 08:00 AM

While deep learning has been revolutionary for machine learning, most modern deep learning models cannot represent their uncertainty nor take advantage of the well studied tools of probability theory. This has started to change following recent developments of tools and techniques combining Bayesian approaches with deep learning. The intersection of the two fields has received great interest from the community over the past few years, with the introduction of new deep learning models that take advantage of Bayesian techniques, as well as Bayesian models that incorporate deep learning elements [1-11]. In fact, the use of Bayesian techniques in deep learning can be traced back to the 1990s, in seminal works by Radford Neal [12], David MacKay [13], and Dayan et al. [14]. These gave us tools to reason about deep models' confidence, and achieved state-of-the-art performance on many tasks. However, earlier tools did not adapt when new needs arose (such as scalability to big data), and were consequently forgotten. Such ideas are now being revisited in light of new advances in the field, yielding many exciting new results.

Extending on last year's workshop's success, this workshop will again study the advantages and disadvantages of such ideas, and will be a platform to host the recent flourish of ideas using Bayesian approaches in deep learning and using deep learning tools in Bayesian modelling. The program includes a mix of invited talks, contributed talks, and contributed posters. It will be composed of five main themes: deep generative models, variational inference using neural network recognition models, practical approximate inference techniques in Bayesian neural networks, applications of Bayesian neural networks, and information theory in deep learning. Future directions for the field will be debated in a panel discussion.

Topics:

Probabilistic deep models for classification and regression (such as extensions and application of Bayesian neural networks),
Generative deep models (such as variational autoencoders),
Incorporating explicit prior knowledge in deep learning (such as posterior regularization with logic rules),
Approximate inference for Bayesian deep learning (such as variational Bayes / expectation propagation / etc. in Bayesian neural networks),
Scalable MCMC inference in Bayesian deep models,
Deep recognition models for variational inference (amortized inference),
Model uncertainty in deep learning,
Bayesian deep reinforcement learning,
Deep learning with small data,
Deep learning in Bayesian modelling,
Probabilistic semi-supervised learning techniques,
Active learning and Bayesian optimization for experimental design,
Applying non-parametric methods, one-shot learning, and Bayesian deep learning in general,
Implicit inference,
Kernel methods in Bayesian deep learning.

References:

[1] Kingma, DP and Welling, M, "Auto-encoding variational bayes", 2013.
[2] Rezende, D, Mohamed, S, and Wierstra, D, "Stochastic backpropagation and approximate inference in deep generative models", 2014.
[3] Blundell, C, Cornebise, J, Kavukcuoglu, K, and Wierstra, D, "Weight uncertainty in neural network", 2015.
[4] Hernandez-Lobato, JM and Adams, R, "Probabilistic backpropagation for scalable learning of Bayesian neural networks", 2015.
[5] Gal, Y and Ghahramani, Z, "Dropout as a Bayesian approximation: Representing model uncertainty in deep learning", 2015.
[6] Gal, Y and Ghahramani, Z, "Bayesian convolutional neural networks with Bernoulli approximate variational inference", 2015.
[7] Kingma, D, Salimans, T, and Welling, M, "Variational dropout and the local reparameterization trick", 2015.
[8] Balan, AK, Rathod, V, Murphy, KP, and Welling, M, "Bayesian dark knowledge", 2015.
[9] Louizos, C and Welling, M, "Structured and Efficient Variational Deep Learning with Matrix Gaussian Posteriors", 2016.
[10] Lawrence, ND and Quinonero-Candela, J, "Local distance preservation in the GP-LVM through back constraints", 2006.
[11] Tran, D, Ranganath, R, and Blei, DM, "Variational Gaussian Process", 2015.
[12] Neal, R, "Bayesian Learning for Neural Networks", 1996.
[13] MacKay, D, "A practical Bayesian framework for backpropagation networks", 1992.
[14] Dayan, P, Hinton, G, Neal, R, and Zemel, S, "The Helmholtz machine", 1995.
[15] Wilson, AG, Hu, Z, Salakhutdinov, R, and Xing, EP, "Deep Kernel Learning", 2016.
[16] Saatchi, Y and Wilson, AG, "Bayesian GAN", 2017.
[17] MacKay, DJC, "Bayesian Methods for Adaptive Models", PhD thesis, 1992.

Schedule

08:05 AM Deep Probabilistic Programming Tran
08:30 AM TBD 1
08:45 AM Automatic Model Selection in BNNs with Horseshoe Priors Doshi-Velez
09:10 AM Deep Bayes for Distributed Learning, Uncertainty Quantification and Compression Welling
09:40 AM Poster spotlights
09:55 AM Discussion over coffee and poster session 1
10:55 AM Stochastic Gradient Descent as Approximate Bayesian Inference
11:20 AM TBD 2
11:35 AM TBD 2.5 Kalchbrenner
12:00 PM Lunch
01:35 PM Deep Kernel Learning Salakhutdinov
02:00 PM TBD 3
02:15 PM Bayes by Backprop Fortunato
02:40 PM Discussion over coffee and poster session 2
03:35 PM How do the Deep Learning layers converge to the Information Bottleneck limit by Stochastic Gradient Descent? Tishby
04:00 PM Panel Session Lawrence, Doshi-Velez, Ghahramani, LeCun, Welling, Teh, Winther
05:00 PM Poster session Zheng, Rudner, Tegho, McClure, Tang, D'CRUZ, Gamboa Higuera, Seelamantula, Arias Figueroa, Berlin, Voisin, Amini, Doan, Hu, Botev, Sünderhauf, ZHANG, Lambert
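As a concrete illustration of one technique in scope here, the sketch below shows the MC-dropout recipe of reference [5] on an untrained toy network: dropout is kept active at prediction time, and the spread of repeated stochastic forward passes is read as predictive uncertainty. This is our own schematic, not code from the workshop; all layer sizes and names are illustrative.

    # Schematic MC-dropout uncertainty estimate (toy, untrained network).
    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical tiny regression net: one hidden layer with dropout.
    W1, b1 = rng.normal(size=(1, 50)), np.zeros(50)
    W2, b2 = rng.normal(size=(50, 1)), np.zeros(1)

    def forward(x, p_drop=0.5):
        h = np.maximum(0.0, x @ W1 + b1)      # ReLU hidden layer
        mask = rng.random(h.shape) > p_drop   # dropout stays ON at test time
        h = h * mask / (1.0 - p_drop)
        return h @ W2 + b2

    x = np.array([[0.3]])
    samples = np.stack([forward(x) for _ in range(100)])   # T = 100 MC passes
    mean, std = samples.mean(axis=0), samples.std(axis=0)  # predictive mean / uncertainty

In a trained network the same loop would be run on the fitted weights; the point is only that the T stochastic passes are treated as samples from an approximate posterior predictive.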

Workshop on Meta-Learning

Roberto Calandra, Frank Hutter, Hugo Larochelle, Sergey Levine

Hyatt Beacon Ballroom D+E+F+H, Sat Dec 09, 08:00 AM

Recent years have seen rapid progress in meta-learning methods, which learn (and optimize) the performance of learning methods based on data, generate new learning methods from scratch, and learn to transfer knowledge across tasks and domains. Meta-learning can be seen as the logical conclusion of the arc that machine learning has undergone in the last decade, from learning classifiers, to learning representations, and finally to learning algorithms that themselves acquire representations and classifiers. The ability to improve one's own learning capabilities through experience can also be viewed as a hallmark of intelligent beings, and there are strong connections with work on human learning in neuroscience.

Meta-learning methods are also of substantial practical interest, since they have, e.g., been shown to yield new state-of-the-art automated machine learning methods, novel deep learning architectures, and substantially improved one-shot learning systems.

Some of the fundamental questions that this workshop aims to address are:
- What are the fundamental differences in the learning "task" compared to traditional "non-meta" learners?
- Is there a practical limit to the number of meta-learning layers (e.g., would a meta-meta-meta-learning algorithm be of practical use)?
- How can we design more sample-efficient meta-learning methods?
- How can we exploit our domain knowledge to effectively guide the meta-learning process?
- What are the meta-learning processes in nature (e.g., in humans), and how can we take inspiration from them?
- Which ML approaches are best suited for meta-learning, in which circumstances, and why?
- What principles can we learn from meta-learning to help us design the next generation of learning systems?

The goal of this workshop is to bring together researchers from all the different communities and topics that fall under the umbrella of meta-learning. We expect that the presence of these different communities will result in a fruitful exchange of ideas and stimulate an open discussion about the current challenges in meta-learning, as well as possible solutions.

In terms of prospective participants, our main targets are machine learning researchers interested in the processes related to understanding and improving current meta-learning algorithms. Specific target communities within machine learning include, but are not limited to: meta-learning, optimization, deep learning, reinforcement learning, evolutionary computation, Bayesian optimization and AutoML. Our invited speakers also include researchers who study human learning, to provide a broad perspective to the attendees.

Schedule

08:30 AM Introduction and opening remarks Calandra
08:40 AM Learning to optimize with reinforcement learning Malik
09:10 AM Informing the Use of Hyperparameter Optimization Through Metalearning Giraud-Carrier
09:40 AM Poster Spotlight
10:00 AM Poster session (and Coffee Break) Andreas, Li, Vercellino, Miconi, Zhang, Franceschi, Xiong, Ahmed, Itti, Klinger, Rohaninejad
11:00 AM Invited talk: Jane Wang
11:30 AM Model-Agnostic Meta-Learning: Universality, Inductive Bias, and Weak Supervision Finn
01:30 PM Learn to learn high-dimensional models from few examples Tenenbaum
02:00 PM Multiple Adaptive Bayesian Linear Regression for Scalable Bayesian Optimization with Warm Start
02:15 PM Learning to Model the Tail
02:30 PM Poster session (and Coffee Break)
03:30 PM Meta Unsupervised Learning Vinyals
04:00 PM Panel Discussion

Abstracts (2):

Abstract 7: Model-Agnostic Meta-Learning: Universality, Inductive Bias, and Weak Supervision in Workshop on Meta-Learning, Finn 11:30 AM

Meta-learning holds the promise of enabling machine learning systems to replace manual engineering of hyperparameters and architectures, effectively reuse data across tasks, and quickly adapt to unexpected scenarios. In this talk, I will present a unified view of the meta-learning problem, discussing how a variety of approaches attempt to solve the problem, and when we might prefer some approaches over others. Further, I will discuss interesting theoretical and empirical properties of the model-agnostic meta-learning algorithm. Finally, I will conclude by showing new results on learning to learn from weak supervision with applications in imitation learning on a real robot and human-like concept acquisition. (A schematic sketch of the MAML inner/outer loop follows after these abstracts.)

Abstract 12: Meta Unsupervised Learning in Workshop on Meta-Learning, Vinyals 03:30 PM

In this talk I'll cover some recent work on few-shot learning which we did at DeepMind. I'll describe how the work in MANN and Matching Networks influenced our most recent work on few-shot learning for distributions, "Few-shot Autoregressive Density Estimation: Towards Learning to Learn Distributions".
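The following is a minimal sketch of the model-agnostic meta-learning loop discussed in Abstract 7, on a toy family of linear-regression tasks. It is our own illustration, not Finn's released code, and it uses the first-order approximation (true MAML also differentiates through the inner gradient step).

    # First-order MAML sketch: adapt per task with one gradient step,
    # then move the shared initialization toward low post-adaptation loss.
    import numpy as np

    rng = np.random.default_rng(0)
    theta = np.zeros(2)          # meta-initialization for a linear model y = a*x + b
    alpha, beta = 0.01, 0.001    # inner (task) and outer (meta) step sizes

    def loss_grad(params, x, y):
        a, b = params
        pred = a * x + b
        da = np.mean(2 * (pred - y) * x)   # d/da of mean squared error
        db = np.mean(2 * (pred - y))
        return np.array([da, db])

    for step in range(1000):
        meta_grad = np.zeros_like(theta)
        for _ in range(4):                 # batch of sampled tasks
            a_true, b_true = rng.uniform(-2, 2, size=2)
            x = rng.uniform(-1, 1, 20)
            y = a_true * x + b_true
            adapted = theta - alpha * loss_grad(theta, x[:10], y[:10])  # inner step
            # First-order approximation: post-adaptation gradient on held-out data
            meta_grad += loss_grad(adapted, x[10:], y[10:])
        theta -= beta * meta_grad / 4      # outer (meta) update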

Interpreting, Explaining and Visualizing Deep Learning - Now what?

Klaus-Robert Müller, Andrea Vedaldi, Lars K Hansen, Wojciech Samek, Grégoire Montavon

Hyatt Hotel, Regency Ballroom A+B+C, Sat Dec 09, 08:00 AM

Machine learning has become an indispensable tool for a number of tasks ranging from the detection of objects in images to the understanding of natural languages. While these models reach impressively high predictive accuracy, they are often perceived as black boxes, and it is not clear what information in the input data is used for predicting. In sensitive applications such as medical diagnosis or self-driving cars, where a single incorrect prediction can be very costly, the reliance of the model on the right features must be guaranteed. This indeed lowers the risk that the model behaves erroneously in the presence of novel factors of variation in the test data. Furthermore, interpretability is instrumental when applying machine learning to the sciences, as the detailed understanding of the trained model (e.g., what features it uses to capture the complex relations between physical or biological variables) is a prerequisite for building meaningful new scientific hypotheses. Without such understanding and the possibility of verification that the model has learned something meaningful (e.g., obeying the known physical or biological laws), even the best predictor is of no use for scientific purposes. Finally, also from the perspective of a deep learning engineer, being able to visualize what the model has (or has not) learned is valuable, as it allows one to improve current models by, e.g., identifying biases in the data or the training procedure, or by comparing the strengths and weaknesses of different architectures.

Not surprisingly, the problem of visualizing and understanding neural networks has recently received a lot of attention in the community. Various techniques for interpreting deep neural networks have been proposed and several workshops have been organized on related topics. However, the theoretical foundations of the interpretability problem are yet to be investigated and the usefulness of the proposed methods in practice still needs to be demonstrated.

Our NIPS 2017 Workshop "Interpreting, Explaining and Visualizing Deep Learning - Now what?" aims to review recent techniques and establish new theoretical foundations for interpreting and understanding deep learning models. However, it will not stop at the methodological level, but also address the "now what?" question. This strong focus on the applications of interpretable methods in deep learning distinguishes this workshop from previous events, as we aim to take the next step by exploring and extending the practical usefulness of interpreting, explaining and visualizing in deep learning. Also with this workshop we aim to identify new fields of application for interpretable deep learning. Since the workshop will host invited speakers from various application domains (computer vision, NLP, neuroscience, medicine), it will provide an opportunity for participants to learn from each other and initiate new interdisciplinary collaborations. The workshop will contain invited research talks, short methods and applications talks, a poster and demonstration session and a panel discussion. A selection of accepted papers together with the invited contributions will be published in an edited book by Springer LNCS in order to provide a representative overview of recent activities in this emerging research field.

Schedule

08:15 AM Opening Remarks Müller
08:45 AM Invited Talk 1 Kim
09:15 AM Invited Talk 2 Batra
09:45 AM Methods 1 Montavon, Tsang, Ancona
10:30 AM Coffee Break (morning)
11:00 AM Methods 2 Kindermans
11:15 AM Invited Talk 3 Hochreiter
11:45 AM Posters 1 Lewis, Khalifa Bashier Babiker, Qi, Rieger, Xie, Dabek, NAGASUBRAMANIAN, Zhou, Hupkes, CHANG, Douglas, Ceolini, Doran, Liu, Li, Goebel
12:15 PM Lunch
01:15 PM Posters 2
01:45 PM Invited Talk 4 Nguyen
02:15 PM Invited Talk 5 Lee
02:45 PM Applications 1 Samek
03:00 PM Coffee Break (afternoon)
03:30 PM Applications 2 Greydanus
03:45 PM Invited Talk 6 Caruana
04:15 PM Invited Talk 7 Darrell
04:45 PM Closing Remarks Hansen


Optimal Transport and Machine Learning

Olivier Bousquet, Marco Cuturi, Gabriel Peyré, Fei Sha, Justin Solomon

Hyatt Hotel, Seaview Ballroom, Sat Dec 09, 08:00 AM

Optimal transport (OT) is gradually establishing itself as a powerful and essential tool to compare probability measures, which in machine learning take the form of point clouds, histograms, bags-of-features, or more generally datasets to be compared with probability densities and generative models. OT can be traced back to early work by Monge, and later to Kantorovich and Dantzig during the birth of linear programming. The mathematical theory of OT has produced several important developments since the 90's, crowned by Cédric Villani's Fields Medal in 2010. OT is now transitioning into more applied spheres, including recent applications to machine learning, because it can tackle challenging learning scenarios including dimensionality reduction, structured prediction problems that involve histograms, and estimation of generative models in highly degenerate, high-dimensional problems. This workshop will follow the one organized 3 years ago (NIPS 2014) and will seek to amplify that trend. We will provide the audience with an update on all of the very recent successes brought forward by efficient solvers and innovative applications through a long list of invited talks. We will add to that a few contributed presentations (oral, and, if needed, posters) and, finally, a panel for all invited speakers to take questions from the audience and formulate more nuanced opinions on this nascent field.

Schedule

08:00 AM Structured Optimal Transport (with T. Jaakkola, S. Jegelka) Alvarez-Melis
08:20 AM Approximate Bayesian computation with the Wasserstein distance Jacob
09:00 AM Gradient flow in the Wasserstein metric Craig
09:40 AM Approximate inference with Wasserstein gradient flows (with T. Poggio) Frogner
10:00 AM 6 x 3 minutes spotlights Flamary, Chen, Rujeerapaiboon, Adler, Lee, Roberts
11:00 AM Optimal planar transport in near-linear time Andoni
11:40 AM Laplacian operator and Brownian motions on the Wasserstein space Gangbo
01:40 PM Geometrical Insights for Unsupervised Learning Bottou
02:20 PM Improving GANs Using Optimal Transport (with H. Zhang, A. Radford, D. Metaxas) Salimans
02:40 PM Overrelaxed Sinkhorn-Knopp Algorithm for Regularized Optimal Transport (with L. Chizat, C. Dossal, N. Papadakis) THIBAULT
03:30 PM Domain adaptation with optimal transport: from mapping to learning with joint distribution Flamary
04:10 PM Sharp asymptotic and finite-sample rates of convergence of empirical measures in Wasserstein distance Bach
04:50 PM 7 x 3 minutes spotlights Cazelles, Genevay, Mena, Brauer, Fischer, Petzka, Seguy, Rolet, Sonoda
05:10 PM short Q&A session with plenary speakers
05:30 PM Closing session
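Several of the talks and spotlights in this schedule (the overrelaxed Sinkhorn-Knopp talk, the Sinkhorn-divergence and Sinkhorn-Newton spotlights below) build on the same primitive: the Sinkhorn-Knopp iteration for entropy-regularized OT. A minimal textbook sketch, with our own toy histograms rather than any speaker's code:

    # Sinkhorn-Knopp iteration for entropy-regularized optimal transport.
    import numpy as np

    def sinkhorn(a, b, C, eps=0.1, n_iter=500):
        """Entropic OT between histograms a, b with cost matrix C."""
        K = np.exp(-C / eps)                 # Gibbs kernel
        u = np.ones_like(a)
        for _ in range(n_iter):              # alternating marginal scaling
            v = b / (K.T @ u)
            u = a / (K @ v)
        P = u[:, None] * K * v[None, :]      # transport plan with marginals a, b
        return np.sum(P * C)                 # (regularized) transport cost

    # Example: two Gaussian-like histograms on a 1-D grid, squared-distance cost.
    x = np.linspace(0, 1, 50)
    C = (x[:, None] - x[None, :]) ** 2
    a = np.exp(-((x - 0.3) ** 2) / 0.01); a /= a.sum()
    b = np.exp(-((x - 0.7) ** 2) / 0.01); b /= b.sum()
    print(sinkhorn(a, b, C))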

Abstracts (9):

Abstract 2: Approximate Bayesian computation with the Wasserstein distance in Optimal Transport and Machine Learning, Jacob 08:20 AM

A growing range of generative statistical models prohibits the numerical evaluation of their likelihood functions. Approximate Bayesian computation has become a popular approach to overcome this issue, simulating synthetic data given parameters and comparing summaries of these simulations with the corresponding observed values. We propose to avoid these summaries and the ensuing loss of information through the use of Wasserstein distances between empirical distributions of observed and synthetic data. We describe how the approach can be used in the setting of dependent data such as time series, and how approximations of the Wasserstein distance allow in practice the method to scale to large datasets. In particular, we propose a new approximation to the optimal assignment problem using the Hilbert space-filling curve. The approach is illustrated on various examples including i.i.d. data and time series. (A toy sketch of this rejection step follows after Abstract 3 below.)

Abstract 3: Gradient flow in the Wasserstein metric in Optimal Transport and Machine Learning, Craig 09:00 AM

Optimal transport not only provides powerful techniques for comparing probability measures, but also for analyzing their evolution over time. For a range of partial differential equations arising in physics, biology, and engineering, solutions are gradient flows in the Wasserstein metric: each equation has a notion of energy for which solutions dissipate energy as quickly as possible, with respect to the Wasserstein structure. Steady states of the equation correspond to minimizers of the energy, and stability properties of the equation translate into convexity properties of the energy. In this talk, I will compare Wasserstein gradient flow with more classical gradient flows arising in optimization and machine learning. I'll then introduce a class of particle blob methods for simulating Wasserstein gradient flows numerically.
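To make Abstract 2's recipe concrete, here is a toy rejection-ABC sketch in which the usual summary statistics are replaced by the 1-D Wasserstein distance between observed and simulated samples. The model, prior, and tolerance are our own illustrative choices, not the authors'.

    # Wasserstein-ABC (toy): accept parameters whose synthetic data lie
    # within a Wasserstein ball of the observations.
    import numpy as np
    from scipy.stats import wasserstein_distance

    rng = np.random.default_rng(0)
    y_obs = rng.normal(loc=2.0, scale=1.0, size=200)   # observed data, unknown mean

    accepted = []
    for _ in range(5000):
        theta = rng.uniform(-5, 5)                     # draw from the prior
        y_sim = rng.normal(loc=theta, scale=1.0, size=200)
        # 1-D Wasserstein distance between the two empirical distributions
        if wasserstein_distance(y_obs, y_sim) < 0.2:   # tolerance epsilon
            accepted.append(theta)

    print(np.mean(accepted))   # ABC posterior mean, close to 2.0

In one dimension the distance reduces to comparing sorted samples; the Hilbert space-filling curve approximation mentioned in the abstract is what extends this cheap sorted-sample idea to higher dimensions.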

Abstract 5: 6 x 3 minutes spotlights in Optimal Transport and Machine Learning, Flamary, Chen, Rujeerapaiboon, Adler, Lee, Roberts 10:00 AM

1. Nicolas Courty, Rémi Flamary and Mélanie Ducoffe. Learning Wasserstein Embeddings
2. Yongxin Chen, Tryphon Georgiou and Allen Tannenbaum. Optimal transport for Gaussian mixture models
3. Napat Rujeerapaiboon, Kilian Schindler, Daniel Kuhn and Wolfram Wiesemann. Size Matters: Cardinality-Constrained Clustering and Outlier Detection via Conic Optimization
4. Jonas Adler, Axel Ringh, Ozan Öktem and Johan Karlsson. Learning to solve inverse problems using Wasserstein loss
5. John Lee, Adam Charles, Nicholas Bertrand and Christopher Rozell. An Optimal Transport Tracking Regularizer
6. Lucas Roberts, Leo Razoumov, Lin Su and Yuyang Wang. Gini-regularized Optimal Transport with an Application in Spatio-Temporal Forecasting

Abstract 6: Optimal planar transport in near-linear time in Optimal Transport and Machine Learning, Andoni 11:00 AM

We show how to compute the Earth Mover Distance between two planar sets of size N in N^{1+o(1)} time. The algorithm is based on a generic framework that decomposes the natural Linear Programming formulation for the transport problem into a tree of smaller LPs, and recomposes it in a divide-and-conquer fashion. The main enabling idea is to use *sketching* -- a generalization of the dimension reduction method -- in order to reduce the size of the "partial computation" so that the conquer step is more efficient. We will conclude with some open questions in the area. This is joint work with Aleksandar Nikolov, Krzysztof Onak, and Grigory Yaroslavtsev.

Abstract 7: Laplacian operator and Brownian motions on the Wasserstein space in Optimal Transport and Machine Learning, Gangbo 11:40 AM

We endow the space of probability measures on $\mathbb{R}^d$ with $\Delta_w$, a Laplacian operator. A Brownian motion is shown to be consistent with the Laplacian operator. The smoothing effect of the heat equation is established for a class of functions. Special perturbations of the Laplacian operator, denoted $\Delta_{w,\epsilon}$, appearing in Mean Field Games theory, are considered (joint work with Y. T. Chow).

Abstract 8: Geometrical Insights for Unsupervised Learning in Optimal Transport and Machine Learning, Bottou 01:40 PM

After arguing that choosing the right probability distance is critical for achieving the elusive goals of unsupervised learning, we compare the geometric properties of the two currently most promising distances: (1) the earth-mover distance, and (2) the energy distance, also known as maximum mean discrepancy. These insights allow us to give a fresh viewpoint on reported experimental results and to risk a couple of predictions. Joint work with Leon Bottou, Martin Arjovsky, David Lopez-Paz, and Maxime Oquab.

Abstract 11: Domain adaptation with optimal transport: from mapping to learning with joint distribution in Optimal Transport and Machine Learning, Flamary 03:30 PM

This presentation deals with the unsupervised domain adaptation problem, where one wants to estimate a prediction function f in a given target domain without any labeled sample, by exploiting the knowledge available from a source domain where labels are known. After a short introduction of recent developments in domain adaptation and their relation to optimal transport, we will present a method that estimates a barycentric mapping between the feature distributions in order to adapt the training dataset prior to learning. Next we propose a novel method that models with optimal transport the transformation between the joint feature/label space distributions of the two domains. We aim at recovering an estimated target distribution ptf=(X,f(X)) by optimizing simultaneously the optimal coupling and f. We discuss the generalization of the proposed method, and provide an efficient algorithmic solution. The versatility of the approach, both in terms of class of hypothesis or loss functions, is demonstrated with real world classification and regression problems and large datasets where stochastic approaches become necessary. Joint work with Nicolas COURTY, Devis TUIA, Amaury HABRARD, and Alain RAKOTOMAMONJY.

Abstract 12: Sharp asymptotic and finite-sample rates of convergence of empirical measures in Wasserstein distance in Optimal Transport and Machine Learning, Bach 04:10 PM

The Wasserstein distance between two probability measures on a metric space is a measure of closeness with applications in statistics, probability, and machine learning. In this work, we consider the fundamental question of how quickly the empirical measure obtained from n independent samples from μ approaches μ in the Wasserstein distance of any order. We prove sharp asymptotic and finite-sample results for this rate of convergence for general measures on general compact metric spaces. Our finite-sample results show the existence of multi-scale behavior, where measures can exhibit radically different rates of convergence as n grows. See more details in: J. Weed, F. Bach. Sharp asymptotic and finite-sample rates of convergence of empirical measures in Wasserstein distance. Technical Report, arXiv:1707.00087, 2017.

Abstract 13: 7 x 3 minutes spotlights in Optimal Transport and Machine Learning, Cazelles, Genevay, Mena, Brauer, Fischer, Petzka, Seguy, Rolet, Sonoda 04:50 PM

1. Jérémie Bigot, Elsa Cazelles and Nicolas Papadakis. Central limit theorems for Sinkhorn divergence between probability distributions on finite spaces
2. Aude Genevay. Learning Generative Models with Sinkhorn Divergences
3. Gonzalo Mena, David Belanger, Gonzalo Muñoz and Jasper Snoek. Sinkhorn Networks: Using Optimal Transport Techniques to Learn Permutations
4. Christoph Brauer, Christian Clason, Dirk Lorenz and Benedikt Wirth. A Sinkhorn-Newton method for entropic optimal transport
5. Henning Petzka, Asja Fischer and Denis Lukovnikov. On the regularization of Wasserstein GANs
6. Vivien Seguy, Bharath Bhushan Damodaran, Rémi Flamary, Nicolas Courty, Antoine Rolet and Mathieu Blondel. Large Scale Optimal Transport and Mapping Estimation
7. Sho Sonoda and Noboru Murata. Transportation analysis of denoising autoencoders: a novel method for analyzing deep neural networks


Collaborate & Communicate: An exploration and practical skills workshop that builds on the experience of AIML experts who are both successful collaborators and great communicators.

Katherine Gorman

Hyatt Hotel, Shoreline, Sat Dec 09, 08:00 AM

For many in the sciences, collaboration is a given, or at least a given assumption. The field of AIML is no different, and collaboration across fields and disciplines has long been a source of data and funding. But for many, effective collaboration can be confounding, and for those who have never worked with someone from a different field, it can be confusing and daunting.

Good collaboration requires good communication, but more fundamentally, clear communication is a core skillset for anyone. It takes practice, and in highly specialized fields, it is often subject to an all-too-common malady: the curse of knowledge. The curse of knowledge happens when experts in a field, communicating within their field, begin to make assumptions about the knowledge and understanding of their audience and begin to overlook the fundamentals of clear communication. They do this because for an audience of their peers, these fundamentals seem to become less necessary, while shortcuts like jargon seem to make communication faster and more efficient. But today, clear communication around issues and techniques in machine intelligence work is crucial not only within the community, but also to foster collaboration across disciplines, and between the community and the lay public.

In this workshop we will explore stories of success in both of these topics through a series of short talks by pairs of collaborators followed by a panel discussion. We will then apply some of the principles from these areas by exploring practical models and skills to aid in both.

We intend to work with partners to organize attendees at the beginning of the workshop to create real opportunities for good collaborations among them and to inspire them to consider how they might work together.

Attendees will leave with resources to help them more effectively collaborate and to help them organize and communicate about their own work.

Schedule

09:00 AM DUE TO LOW ATTENDANCE THIS WORKSHOP IS CANCELED

Machine Learning Challenges as a Research Tool

Isabelle Guyon, Evelyne Viegas, Sergio Escalera, Jacob D Abernethy

S1, Sat Dec 09, 08:00 AM

Challenges in machine learning and data science are competitions running over several weeks or months to resolve problems using provided datasets or simulated environments. The playful nature of challenges naturally attracts students, making challenges a great teaching resource. For this fourth edition of the CiML workshop at NIPS we want to explore the impact of machine learning challenges as a research tool. The workshop will give a large part to discussions around several axes: (1) benefits and limitations of challenges as a research tool; (2) methods to induce and train young researchers; (3) experimental design to foster contributions that will push the state of the art.

CiML is a forum that brings together workshop organizers, platform providers, and participants to discuss best practices in challenge organization and new methods and application opportunities to design high impact challenges. Following the success of last year's workshop, in which a fruitful exchange led to many innovations, we propose to reconvene and discuss new opportunities for challenges as a research tool, one of the hottest topics identified in last year's discussions. We have invited prominent speakers in this field. We will also reserve time for an open discussion to dig into other topics including open innovation, collaborative challenges (coopetitions), platform interoperability, and tool mutualisation.

The audience of this workshop is targeted to workshop organizers, participants, and anyone with a scientific problem involving machine learning which may be formulated as a challenge. The emphasis of the workshop is on challenge design. Hence it complements nicely the workshop on the NIPS 2017 competition track and will help pave the way toward next year's competition program.

Schedule

08:00 AM Introduction - Isabelle Guyon and Evelyne Viegas Guyon
08:10 AM Balázs Kégl, RAMP platform Kégl
08:40 AM Automatic evaluation of chatbots - Varvara Logacheva
09:10 AM TrackML - David Rousseau
09:40 AM Data Science Bowl - Andre Farris Farris
10:10 AM CrowdAI - Mohanty Sarada
10:40 AM Break, poster viewing 1
11:00 AM Ben Hamner, Kaggle platform Hamner
11:30 AM Incentivizing productivity in data science - Jacob Abernethy
12:20 PM Lunch Break, poster viewing 2
12:20 PM Establishing Uniform Image Segmentation - Maximilian Chen and Michael C. Darling Chen, Darling
01:30 PM Welcome - Isabelle Guyon and Evelyne Viegas
01:40 PM Project Malmo, Minecraft - Katja Hofmann
02:10 PM Project Alloy - Laura Seaman and Israel Ridgley Ridgley, Seaman
02:40 PM Education and public service - Jonathan C. Stroud
03:10 PM Break, poster viewing 3

03:30 PM AutoDL (Google challenge) - Olivier Bousquet
04:00 PM Scoring rule markets - Rafael Frongillo and Bo Waggoner
04:30 PM ENCODE-DREAM challenge - Akshay Balsubramani
05:00 PM Codalab platform, Xavier Baro
05:30 PM New opportunities for challenges, looking to the future - Sergio Escalera
06:20 PM Wrap up - Evelyne Viegas and Isabelle Guyon


Emergent Communication Workshop

Jakob Foerster, Igor Mordatch, Angeliki Lazaridou, Kyunghyun Cho, Douwe Kiela, Pieter Abbeel

S4, Sat Dec 09, 08:00 AM

Communication is one of the most impressive human abilities. The question of how communication arises has been studied for many decades, if not centuries. However, due to computational and representational limitations, problem settings in the past had to be restricted to low-dimensional, simple observation spaces. With the rise of deep reinforcement learning methods, this question can now be studied in complex multi-agent settings, which has led to flourishing activity in the area over the last two years. In these settings agents can learn to communicate in grounded multi-modal environments and rich communication protocols emerge.

However, the recent research has been largely disconnected from the study of emergent communication in other fields and even from work done on this topic in previous decades. This workshop will provide a forum for a variety of researchers from different fields (machine learning, game theory, linguistics, cognitive science, and programming languages) interested in the question of communication and emergent language to exchange ideas.

https://sites.google.com/site/emecom2017/

Schedule

08:50 AM Opening Remarks
09:00 AM "Language in context" Goodman
09:30 AM "Communication via Physical Action" Dragan
10:00 AM Invited Talk 3 Kohli
10:30 AM Coffee Break
11:00 AM Morning panel discussion Schmidhuber, Goodman, Dragan, Kohli, Batra
12:15 PM Poster Session and Lunch
02:15 PM Contributed Talks 1 Resnick, Wen, Zheng, Bhutani, Choi
03:00 PM Coffee Break + Poster Presentation
03:30 PM Contributed Talks 2 Raileanu, Kottur, Grouchy
04:00 PM "From Democritus to Signaling Networks" Skyrms
04:30 PM "Language Emergence as Boundedly Optimal Control" Singh
05:00 PM Afternoon Panel discussion Skyrms, Singh, Andreas
06:15 PM Closing Remarks

Abstracts (4):

Abstract 6: Morning panel discussion in Emergent Communication Workshop, Schmidhuber, Goodman, Dragan, Kohli, Batra 11:00 AM

All speakers from the morning session and further invited panelists.

Abstract 8: Contributed Talks 1 in Emergent Communication Workshop, Resnick, Wen, Zheng, Bhutani, Choi 02:15 PM

- Vehicle Communication Strategies for Simulated Highway Driving. Cinjon Resnick*, NYU; Ilia Kulikov, NYU; Kyunghyun Cho, New York University; Jason Weston, FAIR
- Multiagent Bidirectionally-Coordinated Nets: Emergence of Human-level Coordination in Learning to Play StarCraft Combat Games. Ying Wen*, UCL; Jun Wang, UCL
- A two dimensional decomposition approach for matrix completion through gossip. Mukul Bhutani*, Amazon; Bamdev Mishra, Amazon.com
- Multi-Agent Compositional Communication Learning from Raw Visual Input. Edward Choi*, Georgia Institute of Technology; Angeliki Lazaridou, DeepMind; Nando de Freitas, DeepMind
- Analyzing Language Learned by an Active Question Answering Agent. Jannis Bulian*, Google; Christian Buck, Google; Massimiliano Ciaramita, Google; Wojciech Gajewski, Google; Andrea Gesmundo, Google; Neil Houlsby, Google; Wei Wang, Google
- MACE: Structured Exploration via Deep Hierarchical Coordination. Stephan Zheng*, Caltech; Yisong Yue, Caltech
- Multi-agent Communication by Bi-directional Recurrent Neural Networks. Ying Wen*, UCL; Yaodong Yang, University College London; Jun Wang, UCL

Abstract 10: Contributed Talks 2 in Emergent Communication Workshop, Raileanu, Kottur, Grouchy 03:30 PM

- Modeling Opponent Hidden States in Deep Reinforcement Learning. Roberta Raileanu*, NYU; Arthur Szlam, FAIR; Rob Fergus, FAIR
- Emergent Communication through Negotiation. Kris Cao*, University of Cambridge; Angeliki Lazaridou, DeepMind; Stephen Clark, University of Cambridge/DeepMind; Joel Z Leibo, DeepMind; Marc Lanctot, DeepMind; Karl Tuyls, DeepMind
- Natural Language Does Not Emerge 'Naturally' in Multi-Agent Dialog. Satwik Kottur*, Carnegie Mellon University; Satwik Kottur; José M.F. Moura; Stefan Lee; Dhruv Batra (EMNLP 2017)
- Emergence of Language with Multi-agent Games: Learning to Communicate with Sequences of Symbols. Serhii Havrylov*, Institute for Language, Cognition and Computation, the University of Edinburgh; Ivan Titov, Institute for Language, Cognition and Computation, the University of Edinburgh (NIPS 2017)
- An unbroken evolutionary pathway to symbolic communication. Paul Grouchy*, University of Toronto; Paul Grouchy, Gabriele M. T. D'Eleuterio; Morten H. Christiansen; Hod Lipson (Nature Scientific Reports)

Abstract 13: Afternoon Panel discussion in Emergent Communication Workshop, Skyrms, Singh, Andreas 05:00 PM

Brian Skyrms, Satinder Singh, Jacob Andreas


Bayesian optimization for science and engineering

Ruben Martinez-Cantin, Jose Miguel Hernández-Lobato, Javier Gonzalez

S7, Sat Dec 09, 08:00 AM

Bayesian optimization (BO) is a recent subfield of machine learning comprising a collection of methodologies for the efficient optimization of expensive black-box functions. BO techniques work by fitting a model to black-box function data and then using the model's predictions to decide where to collect data next, so that the optimization problem can be solved using only a small number of function evaluations. The resulting methods are characterized by their high sample-efficiency when compared to alternative black-box optimization algorithms, enabling the solution of new challenging problems. For example, in recent years, BO has become a popular tool in the machine learning community for the excellent performance attained in the problem of hyperparameter tuning, with important results both in academia and industry. This success has made BO a crucial player in the current trend of "automatic machine learning".

As new BO methods have been developed, the area of applicability has been continuously expanding. While the problem of hyperparameter tuning permeates all disciplines, the field has moved towards more specific problems in science and engineering requiring new advanced methodology. Today, Bayesian optimization is the most promising approach for accelerating and automating science and engineering. Therefore, we have chosen this year's theme for the workshop to be "Bayesian optimization for science and engineering".

We enumerate below a few of the recent directions in which BO methodology is being pushed forward to address specific problems in science and engineering:

- Beyond Gaussian processes. While GPs are the default choice in BO, specific problems in science and engineering require collecting larger amounts of complex data. In these settings, it is necessary to use alternative models to capture complex patterns with higher accuracy, for example, Bayesian neural networks or deep Gaussian processes. How can we do efficient BO with these types of models?

- Optimization in structured domains. Many problems in science and engineering require performing optimization in complex spaces which are different from the typical box-constrained subset of the real coordinate space. For example, how can we efficiently optimize over graphs, discrete sequences, trees, computer programmes, etc.?

- Safe optimization. Critical applications in science and engineering may include configurations in the search space that may be unsafe or harmful and may lead to system failure. How can we perform efficient BO while avoiding these unsafe settings?

- Incorporating domain specific knowledge. In many optimization settings a lot of domain specific information is available that can be used to build bespoke BO methods with significantly better performance than their off-the-shelf counterparts. How can we encode and transfer available knowledge into BO methods in an easy and fast manner?

- Optimization with structured output response. In specific cases, each black box may produce additional output values besides an estimate of the objective function. For example, when tuning a simulator, besides the final output value, we may also obtain structured data related to the execution trace of the simulator. How can we design BO methods that automatically exploit this extra structured output?

- Scalability and fast evaluations. Recent problems require collecting a massive amount of data in parallel, at a cost that is usually lower than in typical BO problems. How can we design efficient BO methods that collect very large batches of data in parallel? How can we automatically adjust the cost of BO methods so that they are efficient even when the data collection is not highly expensive?

The target audience for this workshop consists of both industrial and academic practitioners of Bayesian optimization as well as researchers working on theoretical and practical advances in model based optimization across different engineering areas. We expect that this pairing of theoretical and applied knowledge will lead to an interesting exchange of ideas and stimulate an open discussion about the long term goals and challenges of the Bayesian optimization community.

The main goal of the workshop is to serve as a forum of discussion and to encourage collaboration between the diverse set of scientists that develop and use Bayesian optimization and related techniques. Researchers and practitioners in academia are welcome, as well as people from the wider optimization, engineering and probabilistic modeling communities.

Schedule

09:00 AM Introduction
09:10 AM Invited talk: Towards Safe Bayesian Optimization Krause
09:40 AM Invited talk: Learning to learn without gradient descent by gradient descent. Chen
10:10 AM Poster spotlights 1
11:00 AM Invited talk: Scaling Bayesian Optimization in High Dimensions Jegelka
11:30 AM Poster spotlights 2
02:00 PM Invited talk: Neuroadaptive Bayesian Optimization - Implications for Cognitive Sciences Lorenz
02:30 PM Poster session
04:00 PM Invited talk: Knowledge Gradient Methods for Bayesian Optimization Frazier
04:30 PM Invited talk: Quantifying and reducing uncertainties on sets under Gaussian Process priors Ginsbourger
04:55 PM Panel
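To make the fit-model-then-decide loop described above concrete, here is a bare-bones sketch of BO with a GP surrogate and the expected-improvement acquisition on a toy 1-D function. The kernel, grid, and budget are our own illustrative choices rather than any particular package's defaults.

    # Minimal Bayesian optimization loop: GP posterior + expected improvement.
    import numpy as np
    from scipy.stats import norm

    def f(x): return -np.sin(3 * x) - x ** 2 + 0.7 * x   # expensive black box (toy)

    def kernel(A, B, ell=0.3):                            # RBF kernel, k(x, x) = 1
        return np.exp(-0.5 * (A[:, None] - B[None, :]) ** 2 / ell ** 2)

    X = np.array([-0.9, 1.1]); y = f(X)                   # initial design
    grid = np.linspace(-2, 2, 400)

    for _ in range(15):                                   # BO iterations
        K = kernel(X, X) + 1e-6 * np.eye(len(X))          # jitter for stability
        Ks = kernel(grid, X)
        mu = Ks @ np.linalg.solve(K, y)                   # GP posterior mean
        var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
        sd = np.sqrt(np.maximum(var, 1e-12))
        best = y.max()
        z = (mu - best) / sd
        ei = (mu - best) * norm.cdf(z) + sd * norm.pdf(z) # expected improvement
        x_next = grid[np.argmax(ei)]                      # where to evaluate next
        X = np.append(X, x_next); y = np.append(y, f(x_next))

    print(X[np.argmax(y)], y.max())                       # best point found

Each of the workshop directions above swaps out one piece of this loop: the GP surrogate (Bayesian neural networks, deep GPs), the search space (structured domains), or the acquisition rule (safety constraints, batches, knowledge gradients).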

Teaching Machines, Robots, and Humans

Maya Cakmak, Anna Rafferty, Adish Singla, Jerry Zhu, Sandra Zilles

Seaside Ballroom, Sat Dec 09, 08:00 AM

This workshop focuses on "machine teaching", the inverse problem of machine learning, in which the goal is to find an optimal training set given a machine learning algorithm and a target model. The study of machine teaching began in the early 1990s, primarily coming out of computational learning theory. Recently, there has been a surge of interest in machine teaching as several different communities within machine learning have found connections to this problem; these connections have included the following:

* machine teaching has close connections to newly introduced models of interaction in the machine learning community, such as curriculum learning, self-paced learning, and knowledge distillation. [Hinton et al. 2015; Bengio et al. 2009]

* there are strong theoretical connections between the Teaching-dimension (the sample complexity of teaching) and the VC-dimension (the sample complexity of learning from randomly chosen examples). [Doliwa et al. 2014]

* the machine teaching problem formulation has been recently studied in the context of diverse applications including personalized educational systems, cyber-security problems, robotics, program synthesis, human-in-the-loop systems, and crowdsourcing. [Jha et al. 2016; Zhu 2015; Mei & Zhu 2015; Ba & Caruana 2014; Patil et al. 2014; Singla et al. 2014; Cakmak & Thomaz 2014]

In this workshop, we draw attention to machine teaching by emphasizing how the area of machine teaching interacts with emerging research trends and application domains relevant to the NIPS community. The goal of this workshop is to foster these ideas by bringing together researchers with expertise/interest in the inter-related areas of machine teaching, interactive machine learning, robotics, cyber-security problems, generative adversarial networks, educational technologies, and cognitive science.

Topics of interest in the workshop include (but are not limited to):

* Theoretical foundations of machine teaching:
** using tools from information theory to develop better mathematical models of teaching;
** characterizing the complexity of teaching when a teacher has limited power, or incomplete knowledge of the student's model, or a mismatch in feature representations;
** algorithms for adaptive teaching by interactively inferring the learner's state;
** new notions of Teaching-dimension for generic teaching settings.

* Connections to machine learning models:
** the information complexity of teaching and query complexity;
** machine teaching vs. curriculum learning and other models of interactive machine learning;
** teaching reinforcement learning agents.

* Applications of machine teaching to adversarial attacks, including cyber-security problems, generative adversarial networks, attacks on machine learning algorithms, etc.

* Applications of machine teaching to educational technologies:
** using the machine teaching formulation to enable more rigorous and generalizable approaches for developing intelligent tutoring systems;
** behavioral experiments to identify good cognitive models of human learning processes.

* Novel applications for machine teaching such as program synthesis, human-robot interactions, social robotics, etc.

Schedule

09:00 AM Overview of Machine Teaching
10:00 AM "Reinforcement Learning with People" - Emma Brunskill (Stanford)
11:00 AM "Active Classification with Comparison Queries" - Shay Moran (Technion)
11:30 AM Discussion session (Topic: Open questions and new research directions)
02:00 PM "Iterative Machine Teaching" - Le Song (Georgia Tech)
02:30 PM Spotlight presentations for posters (14 papers)
03:30 PM Poster presentations (14 papers)
04:30 PM "Machine Teaching: A New Paradigm for Building Machine Learning Systems" - Patrice Simard (Microsoft Research)
05:00 PM "Improving Language Learning and Assessment with Data" - Burr Settles (Duolingo)
05:30 PM Discussion session (Topic: Novel applications and industry insights)