An Introduction to Bayesian Artificial Intelligence in Medicine

Total Page:16

File Type:pdf, Size:1020Kb

An Introduction to Bayesian Artificial Intelligence in Medicine An Introduction to Bayesian Artificial Intelligence in Medicine Giovanni Briganti1 and Marco Scutari2 1Unit of Epidemiology, Biostatistics and Clinical Research, Universit´eLibre de Bruxelles 2IDSIA Dalle Molle Institute for Artificial Intelligence August 7, 2020 Abstract Artificial Intelligence has become a topic of interest in the medical sciences in the last two decades, but most studies of its applications in clinical studies have been criticised for having unreliable designs and poor replicability: there is an increasing need for literacy among medical professionals on this increasingly popular domain of research. This work provides a short introduction to the specific topic of Bayesian Artificial Intelligence: we introduce Bayesian reasoning, networks and causal discovery as well as their (potential) applications in clinical practice. Keywords: statistics, causality, directed acyclic graphs 1 Introduction The origins of Artificial Intelligence (AI) can be traced back to the work of Alan Turing in 19501. He proposed a test (known today as the Turing Test) in which a certain level of intelligence is required from a machine to fool a human into thinking he is carrying on a conversation with another human (the Imitation Game). Although such level of intelligence has not yet been attained, a simple conversation with a virtual assistant like Siri serves as a clear example of how AI research has rapidly evolved in the past decades. AI has become a topic of interest in the medical sciences in the last two decades, with more and more applications being approved by health authorities worldwide. However, most physicians and medical scientists have no formal training in the disciplines the world of smart medical monitoring, diagnosis and follow-up has sprung from. This, along with historical epistemological differences in the practice of medicine and engineering disciplines, is one of the main obstacles to widespread collaboration between physicians and engineers in the development of AI software. It may also be argued that that is in turn one of the root causes of the ongoing replicability crisis in medical AI research2, characterised by unreliable study designs and poor replication3. Although there is an increasingly rich literature on how AI can be used in the clinical practice, few works aim to interest physicians in the fundamental concepts and terminology of AI. The gradual shift towards quantitative methods in the last century has made them 1 familiar with such concepts from probability and statistics as correlation, regression and confidence intervals; it is worthwhile to expand on these concepts and link them with modern AI. The most common AI approaches in the medical literature are neural networks (in the early 2000s) and clustering (in the last decade)4. Bayesian reasoning and methods for AI are less known, although they can be used for medical decision making, to study human physi- ology5, to identify interactions between symptoms6, and to investigate symptom recovery7. Bayesian models can facilitate reasoning in probabilistic terms when dealing with uncertainty, which is omnipresent in the medical sciences since we cannot construct a complete mecha- nistic model of diseases and of the physiological mechanisms they impact. One important characteristic of Bayesian models, and in particular of Bayesian networks, is that the vari- ables that are used as inputs for the model and their relationships are direct representations of real-world entities and of their interplay, not a purely mathematical construct as in neural networks8. This is why such approach is interesting to the medical field: Bayesian networks are the method of choice for explainable reasoning in AI. This work aims to introduce physicians to Bayesian AI through a clinical perspective on probabilistic reasoning and Bayesian networks. More detailed accounts on the topic may be found in popular textbooks9;10. 2 Probabilistic reasoning Bayes' theorem describes the probability of an event based on prior knowledge of conditions that may be related to that event: that is expressed with the well known mathematical notation Pr(B j A) Pr(A) Pr(A j B) = ; Pr(B) the probability Pr(A j B) of an event A given prior knowledge of a condition B (that is, B has occurred). Pr(A j B) is a conditional probability, while Pr(A) and Pr(B) are known as marginal probabilities, that is, the probabilities of observing the events A and B individually. In the context of medical diagnosis, the goal is to determine the probability Pr(Di j Cp) of presence of a particular disease or disorder Di given the clinical presentation of the patient 11 0 Cp , the prior probabilities Pi of the disease in the patient's reference group, and the prior 0 probabilities Pj of other diseases Dj; that is 0 Pi Pr (Cp j Di) Pr (Di j Cp) = 0 P 0 Pi Pr (Cp j Di) + j Pj Pr (Cp j Dj) where Pr(Cp j Dj) is the probability of having the same clinical presentation given other diseases. Bayes' theorem makes it possible to work with the distributions of dependent (or condi- tionally dependent) variables. However, in order to reduce the number of variables we need to observe simultaneously in probabilistic systems, it is also important to determine whether two variables A and B are independent (A ?? B), Pr(A j B) = Pr(A); 2 or conditionally independent (A ?? B j C) given the value of a third variable C, Pr(A \ B j C) = Pr(A j C) Pr(B j C): Extracting conditional dependence tables is one of the building blocks upon which we can build Bayesian probabilistic reasoning. It allows to take a set of variables (say, A, B and C again) and to compute the conditional probabilities of some of them (say, A j C), P B2fx;yg Pr(C = x; B; A = y) Pr(A = y j C = x) = P : B;A2fx;yg Pr(C = x; B; A) Conversely, we can also take variables or set of variables that are (conditionally) independent from each other and combine them to obtain their joint probability. This joint probability will be structured as a larger conditional dependence table that is the product of smaller tables associated with the original variables. For instance, Pr(A = y; B = z j C = x) = Pr(A = y j B = z) Pr(B = z j C = x) assuming A ?? C. The ability of explicitly merging and splitting set of variables to separate variables of interest we need for diagnostic purposes from redundant variables is one of the reasons that makes Bayesian reasoning easy while at the same time mathematically rigorous. 3 Bayesian thinking in machine learning Machine learning (ML) is the sub-field of AI that studies the algorithms and the statistical tools that allow computer systems to perform specific and well-defined tasks without explicit instructions12. Implementing machine learning requires four components. First, we need a working model of the world that describes the tasks and their context in a way that is understandable by a computer. In practice, this means choosing a class of Bayesian models defined over the variables of interest, and over relevant exogenous variables, and implementing it in software. Generative models such as Bayesian networks describe how variables interact with each other, and therefore represent the joint probability Pr (X1;:::;XN ); while discriminative models such as random forests and neural networks only focus on how a group of variables predicts a target variable by estimating Pr (X1 j X2;:::;XN ) : Clearly, depending on the application, a generative model may be a more appropriate choice than a discriminative model or vice versa. If characterising the phenomenon we are modelling from a systems perspective, generative models should be preferred. If we are only interested in predicting some clinical event, as in the case of diagnostic devices, discriminative models provide better predictive accuracy at the cost of being less expressive. Second, we need to measure the performance of the model, usually by being able to predict new events. Third, we must encode the knowledge of the world from training data, experts or both into the model: this step is called learning. The computer system can then 3 Figure 1: An undirected network of five nodes connected through edges. learn what is the best model within the prescribed class by finding that that maximize the chosen performance metric, drawing either from observational or experimental data or expert knowledge available from practitioners. Fourth, the computer uses the model as a proxy of reality and performs inference as new inputs come in, all while deciding if and how to perform the assigned task. Successfully implementing machine learning applications is, however, far from easy: it requires large amounts of data, and it is difficult to decide how to structure the model from a probabilistic and mathematical point of view. Principled software engineering also plays an important role13. For these reasons, machine learning models should comprise a limited number of variables so that they are easy to construct and interpret. As a side effect, compact models are unlikely to require copious amounts of computational power to learn, unlike state- of-the-art neural networks13. Clinical settings limit our knowledge of the patients to what we can learn from them: hence probability should be used to determine whether two variables are associated given other variables. Formally, if the occurrence of an event in one variable affects the probability of an event occurring in another variable, we say the two are associated with or probabilistically dependent on each other. Association is symmetric in probability; it does not distinguish between cause and effects in itself. Bayesian network models, however, go beyond probability theory to represent causal effects as arcs in a graph to allow for principled causal reasoning. 4 Bayesian Networks We introduce in this section the fundamental notions of Bayesian networks.
Recommended publications
  • Much Has Been Written About the Turing Test in the Last Few Years, Some of It
    1 Much has been written about the Turing Test in the last few years, some of it preposterously off the mark. People typically mis-imagine the test by orders of magnitude. This essay is an antidote, a prosthesis for the imagination, showing how huge the task posed by the Turing Test is, and hence how unlikely it is that any computer will ever pass it. It does not go far enough in the imagination-enhancement department, however, and I have updated the essay with a new postscript. Can Machines Think?1 Can machines think? This has been a conundrum for philosophers for years, but in their fascination with the pure conceptual issues they have for the most part overlooked the real social importance of the answer. It is of more than academic importance that we learn to think clearly about the actual cognitive powers of computers, for they are now being introduced into a variety of sensitive social roles, where their powers will be put to the ultimate test: In a wide variety of areas, we are on the verge of making ourselves dependent upon their cognitive powers. The cost of overestimating them could be enormous. One of the principal inventors of the computer was the great 1 Originally appeared in Shafto, M., ed., How We Know (San Francisco: Harper & Row, 1985). 2 British mathematician Alan Turing. It was he who first figured out, in highly abstract terms, how to design a programmable computing device--what we not call a universal Turing machine. All programmable computers in use today are in essence Turing machines.
    [Show full text]
  • Turing Test Does Not Work in Theory but in Practice
    Int'l Conf. Artificial Intelligence | ICAI'15 | 433 Turing test does not work in theory but in practice Pertti Saariluoma1 and Matthias Rauterberg2 1Information Technology, University of Jyväskylä, Jyväskylä, Finland 2Industrial Design, Eindhoven University of Technology, Eindhoven, The Netherlands Abstract - The Turing test is considered one of the most im- Essentially, the Turing test is an imitation game. Like any portant thought experiments in the history of AI. It is argued good experiment, it has two conditions. In the case of control that the test shows how people think like computers, but is this conditions, it is assumed that there is an opaque screen. On actually true? In this paper, we discuss an entirely new per- one side of the screen is an interrogator whose task is to ask spective. Scientific languages have their foundational limita- questions and assess the nature of the answers. On the other tions, for example, in their power of expression. It is thus pos- side, there are two people, A and B. The task of A and B is to sible to discuss the limitations of formal concepts and theory answer the questions, and the task of the interrogator is to languages. In order to represent real world phenomena in guess who has given the answer. In the case of experimental formal concepts, formal symbols must be given semantics and conditions, B is replaced by a machine (computer) and again, information contents; that is, they must be given an interpreta- the interrogator must decide whether it was the human or the tion. They key argument is that it is not possible to express machine who answered the questions.
    [Show full text]
  • THE TURING TEST RELIES on a MISTAKE ABOUT the BRAIN For
    THE TURING TEST RELIES ON A MISTAKE ABOUT THE BRAIN K. L. KIRKPATRICK Abstract. There has been a long controversy about how to define and study intel- ligence in machines and whether machine intelligence is possible. In fact, both the Turing Test and the most important objection to it (called variously the Shannon- McCarthy, Blockhead, and Chinese Room arguments) are based on a mistake that Turing made about the brain in his 1948 paper \Intelligent Machinery," a paper that he never published but whose main assumption got embedded in his famous 1950 paper and the celebrated Imitation Game. In this paper I will show how the mistake is a false dichotomy and how it should be fixed, to provide a solid foundation for a new understanding of the brain and a new approach to artificial intelligence. In the process, I make an analogy between the brain and the ribosome, machines that translate information into action, and through this analogy I demonstrate how it is possible to go beyond the purely information-processing paradigm of computing and to derive meaning from syntax. For decades, the metaphor of the brain as an information processor has dominated both neuroscience and artificial intelligence research and allowed the fruitful appli- cations of Turing's computation theory and Shannon's information theory in both fields. But this metaphor may be leading us astray, because of its limitations and the deficiencies in our understanding of the brain and AI. In this paper I will present a new metaphor to consider, of the brain as both an information processor and a producer of physical effects.
    [Show full text]
  • The Turing Test and Other Design Detection Methodologies∗
    Detecting Intelligence: The Turing Test and Other Design Detection Methodologies∗ George D. Montanez˜ 1 1Machine Learning Department, Carnegie Mellon University, Pittsburgh PA, USA [email protected] Keywords: Turing Test, Design Detection, Intelligent Agents Abstract: “Can machines think?” When faced with this “meaningless” question, Alan Turing suggested we ask a dif- ferent, more precise question: can a machine reliably fool a human interviewer into believing the machine is human? To answer this question, Turing outlined what came to be known as the Turing Test for artificial intel- ligence, namely, an imitation game where machines and humans interacted from remote locations and human judges had to distinguish between the human and machine participants. According to the test, machines that consistently fool human judges are to be viewed as intelligent. While popular culture champions the Turing Test as a scientific procedure for detecting artificial intelligence, doing so raises significant issues. First, a simple argument establishes the equivalence of the Turing Test to intelligent design methodology in several fundamental respects. Constructed with similar goals, shared assumptions and identical observational models, both projects attempt to detect intelligent agents through the examination of generated artifacts of uncertain origin. Second, if the Turing Test rests on scientifically defensible assumptions then design inferences become possible and cannot, in general, be wholly unscientific. Third, if passing the Turing Test reliably indicates intelligence, this implies the likely existence of a designing intelligence in nature. 1 THE IMITATION GAME gin et al., 2003), this paper presents a novel critique of the Turing Test in the spirit of a reductio ad ab- In his seminal paper on artificial intelligence (Tur- surdum.
    [Show full text]
  • A Review on Neural Turing Machine
    A review on Neural Turing Machine Soroor Malekmohammadi Faradounbeh, Faramarz Safi-Esfahani Faculty of Computer Engineering, Najafabad Branch, Islamic Azad University, Najafabad, Iran. Big Data Research Center, Najafabad Branch, Islamic Azad University, Najafabad, Iran. [email protected], [email protected] Abstract— one of the major objectives of Artificial Intelligence is to design learning algorithms that are executed on a general purposes computational machines such as human brain. Neural Turing Machine (NTM) is a step towards realizing such a computational machine. The attempt is made here to run a systematic review on Neural Turing Machine. First, the mind-map and taxonomy of machine learning, neural networks, and Turing machine are introduced. Next, NTM is inspected in terms of concepts, structure, variety of versions, implemented tasks, comparisons, etc. Finally, the paper discusses on issues and ends up with several future works. Keywords— NTM, Deep Learning, Machine Learning, Turing machine, Neural Networks 1 INTRODUCTION The Artificial Intelligence (AI) seeks to construct real intelligent machines. The neural networks constitute a small portion of AI, thus the human brain should be named as a Biological Neural Network (BNN). Brain is a very complicated non-linear and parallel computer [1]. The almost relatively new neural learning technology in AI named ‘Deep Learning’ that consists of multi-layer neural networks learning techniques. Deep Learning provides a computerized system to observe the abstract pattern similar to that of human brain, while providing the means to resolve cognitive problems. As to the development of deep learning, recently there exist many effective methods for multi-layer neural network training such as Recurrent Neural Network (RNN)[2].
    [Show full text]
  • The Future Just Arrived
    BLOG The Future Just Arrived Brad Neuman, CFA Senior Vice President Director of Market Strategy What if a computer were actually smart? You wouldn’t purpose natural language processing model. The have to be an expert in a particular application to interface is “text-in, text-out” that goal. but while the interact with it – you could just talk to it. You wouldn’t input is ordinary English language instructions, the need an advanced degree to train it to do something output text can be anything from prose to computer specific - it would just know how to complete a broad code to poetry. In fact, with simple commands, the range of tasks. program can create an entire webpage, generate a household income statement and balance sheet from a For example, how would you create the following digital description of your financial activities, create a cooking image or “button” in a traditional computer program? recipe, translate legalese into plain English, or even write an essay on the evolution of the economy that this strategist fears could put him out of a job! Indeed, if you talk with the program, you may even believe it is human. But can GPT-3 pass the Turing test, which assesses a machine’s ability to exhibit human-level intelligence. The answer: GPT-3 puts us a lot closer to that goal. I have no idea because I can’t code. But a new Data is the Fuel computer program (called GPT-3) is so smart that all you have to do is ask it in plain English to generate “a The technology behind this mind-blowing program has button that looks like a watermelon.” The computer then its roots in a neural network, deep learning architecture i generates the following code: introduced by Google in 2017.
    [Show full text]
  • (GPT-3) in Healthcare Delivery ✉ Diane M
    www.nature.com/npjdigitalmed COMMENT OPEN Considering the possibilities and pitfalls of Generative Pre- trained Transformer 3 (GPT-3) in healthcare delivery ✉ Diane M. Korngiebel 1 and Sean D. Mooney2 Natural language computer applications are becoming increasingly sophisticated and, with the recent release of Generative Pre- trained Transformer 3, they could be deployed in healthcare-related contexts that have historically comprised human-to-human interaction. However, for GPT-3 and similar applications to be considered for use in health-related contexts, possibilities and pitfalls need thoughtful exploration. In this article, we briefly introduce some opportunities and cautions that would accompany advanced Natural Language Processing applications deployed in eHealth. npj Digital Medicine (2021) 4:93 ; https://doi.org/10.1038/s41746-021-00464-x A seemingly sophisticated artificial intelligence, OpenAI’s Gen- language texts. Although its developers at OpenAI think it performs erative Pre-trained Transformer 3, or GPT-3, developed using well on translation, question answering, and cloze tasks (e.g., a fill-in- computer-based processing of huge amounts of publicly available the-blank test to demonstrate comprehension of text by providing textual data (natural language)1, may be coming to a healthcare the missing words in a sentence)1, it does not always predict a 1234567890():,; clinic (or eHealth application) near you. This may sound fantastical, correct string of words that are believable as a conversation. And but not too long ago so did a powerful computer so tiny it could once it has started a wrong prediction (ranging from a semantic fit in the palm of your hand.
    [Show full text]
  • Does the Turing Test Demonstrate Intelligence Or Not?
    Does the Turing Test demonstrate intelligence or not? The Harvard community has made this article openly available. Please share how this access benefits you. Your story matters Citation Stuart M. Shieber. Does the Turing Test demonstrate intelligence or not? In Proceedings of the Twenty-First National Conference on Artificial Intelligence (AAAI-06), Boston, MA, 16-20 July 2006. Published Version http://www.aaai.org/Library/AAAI/2006/aaai06-245.php Citable link http://nrs.harvard.edu/urn-3:HUL.InstRepos:2252596 Terms of Use This article was downloaded from Harvard University’s DASH repository, and is made available under the terms and conditions applicable to Other Posted Material, as set forth at http:// nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of- use#LAA Does the Turing Test Demonstrate Intelligence or Not? Stuart M. Shieber∗ Harvard University 33 Oxford Street — 245 Cambridge, MA 02138 [email protected] Introduction descriptions of the Test (Newman et al., 1952) accord with The Turing Test has served as a defining inspiration this view: throughout the early history of artificial intelligence re- The idea of the test is that the machine has to pretend to search. Its centrality arises in part because verbal behavior be a man, by answering questions put to it, and it will Proceedings of indistinguishable from that of humans seems like an in- only pass if the pretence is reasonably convincing. controvertible criterion for intelligence, a “philosophical We had better suppose that each jury has to judge quite conversation stopper” as Dennett (1985) says. On the other a number of times, and that sometimes they really are hand, from the moment Turing’s seminal article (Turing, dealing with a man and not a machine.
    [Show full text]
  • Navigation Turing Test (NTT): Learning to Evaluate Human-Like Navigation
    Navigation Turing Test (NTT): Learning to Evaluate Human-Like Navigation Sam Devlin * 1 Raluca Georgescu * 1 Ida Momennejad * 2 Jaroslaw Rzepecki * 1 Evelyn Zuniga * 1 Gavin Costello 3 Guy Leroy 1 Ali Shaw 3 Katja Hofmann 1 Abstract mated NTT or ANTT), a novel approach to learn to quantify the degree to which behavior is perceived as human-like A key challenge on the path to developing agents by humans. While human judges can be highly accurate that learn complex human-like behavior is the when judging human-like behavior, their scalability and need to quickly and accurately quantify human- speed are limited. As a result, it is prohibitive to use human likeness. While human assessments of such be- assessments in, e.g., iterations of training machine learning- havior can be highly accurate, speed and scala- based agents. We aim to remove this bottleneck, providing bility are limited. We address these limitations a scalable and accurate proxy of human assessments. through a novel automated Navigation Turing Test (ANTT) that learns to predict human judgments We demonstrate our NTT approach on the complex task of of human-likeness. We demonstrate the effective- goal-directed navigation in 3D space. This task is highly ness of our automated NTT on a navigation task relevant and practical because: (1) it has been studied from a in a complex 3D environment. We investigate six wide range of perspectives in both biological (Banino et al., classifcation models to shed light on the types of 2018; de Cothi, 2020) and artifcial (Alonso et al., 2020; architectures best suited to this task, and validate Chaplot et al., 2020) agents, so is relatively well understood; them against data collected through a human NTT.
    [Show full text]
  • The Chinese Room: Just Say “No!” to Appear in the Proceedings of the 22Nd Annual Cognitive Science Society Conference, (2000), NJ: LEA
    The Chinese Room: Just Say “No!” To appear in the Proceedings of the 22nd Annual Cognitive Science Society Conference, (2000), NJ: LEA Robert M. French Quantitative Psychology and Cognitive Science University of Liège 4000 Liège, Belgium email: [email protected] Abstract presenting Searle’s well-known transformation of the It is time to view John Searle’s Chinese Room thought Turing’s Test. Unlike other critics of the Chinese Room experiment in a new light. The main focus of attention argument, however, I will not take issue with Searle’s has always been on showing what is wrong (or right) argument per se. Rather, I will focus on the argument’s with the argument, with the tacit assumption being that central premise and will argue that the correct approach somehow there could be such a Room. In this article I to the whole argument is simply to refuse to go beyond argue that the debate should not focus on the question “If this premise, for it is, as I hope to show, untenable. a person in the Room answered all the questions in perfect Chinese, while not understanding a word of Chinese, what would the implications of this be for The Chinese Room strong AI?” Rather, the question should be, “Does the Instead of Turing’s Imitation Game in which a very idea of such a Room and a person in the Room who computer in one room and a person in a separate room is able to answer questions in perfect Chinese while not both attempt to convince an interrogator that they are understanding any Chinese make any sense at all?” And I believe that the answer, in parallel with recent human, Searle asks us to begin by imagining a closed arguments that claim that it would be impossible for a room in which there is an English-speaker who knows machine to pass the Turing Test unless it had no Chinese whatsoever.
    [Show full text]
  • IEEE Paper Template in A4 (V1)
    International Journal of Research Trends in Computer Science & Information Technology (IJRTCSIT)[ISSN: 2455-6513] Volume 6, Issue 2 ,December [2020] A review on GPT-3 - An AI revolution from Open AI Naman Mishra1, Priyank Singhal2, Shakti Kundu3 Teerthanker Mahaveer University, Moradabad, UP, India [email protected], [email protected], [email protected] Abstract— the great achievements, and others have injected When we talk about AI we think of machines that themselves very smoothly into our daily regime. In have the ability to think, to be able to make between these extremes, programs of AI have decisions on their own without any human become vital tools in field of science and commerce. interference. To to so one thing has been extremely Even though not every AI achievement makes it to important i.e, how well a machine is able to process mainstream media there have been some amazing and generate language. Until recently we relied on developments in AI and they have in some small small-scale statistical models or formalized way become a part of how technology works today. grammar systems. Over the course of last few years Voice Assistants like Amazon Alexa or Google we have seen better models comes out for these be Home are best examples of products backed by AI it ELMo, BERT or GPT-n series by OpenAI. In this around us. paper we try to look at the large-scale statistical models with the focus on GPT-3 currently the The world of Artificial Intelligence is changing largest model in the world and try to understand rapidly every we are seeing great developments in their impact in the world of AI.
    [Show full text]
  • Experimental Evidence That People Cannot Differentiate AI-Generated from Human-Written Poetry
    Computers in Human Behavior 114 (2021) 106553 Contents lists available at ScienceDirect Computers in Human Behavior journal homepage: http://www.elsevier.com/locate/comphumbeh Full length article Artificial intelligence versus Maya Angelou: Experimental evidence that people cannot differentiate AI-generated from human-written poetry Nils Kobis¨ a,b,*, Luca D. Mossink a a Department of Economics & Center for Experimental Economics and Political Decision Making (CREED), University of Amsterdam, the Netherlands b Max Planck Institute for Human Development, Center for Humans & Machines, Germany ARTICLE INFO ABSTRACT Keywords: The release of openly available, robust natural language generation algorithms (NLG) has spurred much public Natural language generation attention and debate. One reason lies in the algorithms’ purported ability to generate humanlike text across Computational creativity various domains. Empirical evidence using incentivized tasks to assess whether people (a) can distinguish and (b) Turing prefer algorithm-generated versus human-written text is lacking. We conducted two experiments assessing Test behavioral reactions to the state-of-the-art Natural Language Generation algorithm GPT-2 (Ntotal = 830). Using Creativity Machine behavior the identical starting lines of human poems, GPT-2 produced samples of poems. From these samples, either a random poem was chosen (Human-out-of-theloop) or the best one was selected (Human-in-the-loop) and in turn matched with a human-written poem. In a new incentivized version of the Turing Test, participants failed to reliably detect the algorithmicallygenerated poems in the Human-in-the-loop treatment, yet succeeded in the Human-out-of-the-loop treatment. Further, people reveal a slight aversion to algorithm-generated poetry, in­ dependent on whether participants were informed about the algorithmic origin of the poem (Transparency) or not (Opacity).
    [Show full text]