Can the Bot Speak? The Paranoid Voice in Conversational UI
Benjamin H. Bratton

Published in Across & Beyond: A Transmediale Reader on Post-digital Practices, Concepts, and Institutions. R. Bishop, K. Gansing, J. Parikka & E. Wilk (eds.). Berlin: Sternberg Press, 2016.

“Four Sail. Bay be choose. Never worn.”

I.

The design of user interfaces rotates from the specification of buttons-with-words-on-them, with which we cascade through tasks, toward the conjuring of personalities with which we speak and negotiate outcomes. But who and what do we make of these bots?

Bots come in many guises and voices. Some do not and cannot speak, but many do quite capably. Some of these are actually many bots but appear to human users as if they were a single personality (such as Siri, Alexa, or Cortana); a schematic sketch of this arrangement appears at the end of this section. Other bots, those working like Morlocks deep down in the Internet “pipes,” have no voice, at least of the sort that most people can hear. They are mute creatures of simple instruction. The epigenesis of automated personalities may be subordinated and subaltern (and may even look like slavery), and it may well be that conventions of CUI (Conversational User Interface) personality design unwittingly exploit these older human social relations. For conversational bots, however, the issue is not that they do not have voice, but that they only have voice. Is their vocality, their only appearance, also their fate and their prison?

Bots, it seems, will evolve not only as an interface membrane that cleaves humans and computers, but as part of a genre of software applications that would help organize how planetary-scale computation involves and internalizes human society. That is, bots are a layer within a larger infrastructural landscape, articulating how that system appears to us and us to it.1 To me, this represents an interesting convergence of the User and Interface layers of The Stack: the interface performs as if it were itself another User.2 It appears less as a tool or a diagram and more as a sympathetic collaborator with whom one may enter into temporary or long-term composite agencies, some of which may simply speak through the mobile phone, others through specific hardware slabs, and others through disembodied ambient entities.3 (The design challenges posed cannot ultimately be solved by generic strategies. If every bot is an empty text field, mic icon, or mini-monolith, then a user has no clue what the bot can do: from skeuomorphic affordance to what?)

For these composite agencies, it’s not clear how to filter the delightful from the dubious. We see pitched weirdness in how humans may love or hate robots, ones often with far less algorithmic intelligence than a personal assistant app. We consider robot pets for elderly patients that calm the creeping dementia, robot lovedolls who withstand the amorous performances of the lonely, robot vacuum cleaners that commit accidental suicide by trying to clean the surface of hot stoves while their caretakers are away, robot insectoids forced into gladiatorial death matches by adolescent hobbyists, as we gratefully consume what robot slaves assemble in our factories, and so on.4 However, this survey may reveal less about what bots, robots, and robotics are and are capable of doing than it says sad things about humans and the residual pathologies of legacy Humanisms. If so, then the work ahead of us is to make better sense of the uneven landscape of bots, both extant and emergent, and the burden they bear to perform synthetic personality on our behalf.
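The “many bots, one persona” arrangement noted above can be stated more concretely. What follows is a minimal sketch in Python, not a description of how Siri, Alexa, or Cortana actually work: the handler functions, keyword routing, and canned replies are all invented for illustration.

```python
# A minimal sketch of the "many bots, one persona" arrangement:
# several narrow services answer behind a single conversational voice.
# Handler names, keywords, and replies here are hypothetical.

def weather_bot(utterance: str) -> str:
    return "Tomorrow looks clear."           # stub: would call a forecast service

def calendar_bot(utterance: str) -> str:
    return "You are free at three."          # stub: would query a calendar

def fallback_bot(utterance: str) -> str:
    return "I'm not sure I can help with that."

# The single persona is little more than a routing table plus a shared tone.
ROUTES = {
    "weather": weather_bot,
    "rain": weather_bot,
    "meeting": calendar_bot,
    "schedule": calendar_bot,
}

def persona(utterance: str) -> str:
    """Route an utterance to whichever narrow bot claims it."""
    for keyword, handler in ROUTES.items():
        if keyword in utterance.lower():
            return handler(utterance)
    return fallback_bot(utterance)

print(persona("Will it rain tomorrow?"))     # -> "Tomorrow looks clear."
print(persona("When is my next meeting?"))   # -> "You are free at three."
```

In this toy, the persona is nothing but a routing table and a consistent voice; the apparent unity of the speaker is an interface effect produced over a federation of mute, narrow services.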
At stake is a new medium built on symbolic interaction, and one for which empathetic “humanization” may be destructive, and even dangerous. As we learn to model and interact with the entire Stack guided by beseeching bots, we will also learn new ways of speaking and commanding at a distance: we will learn new voices.5 As we learn to speak to bots based on how we speak to one another, then in time, will we learn to speak to other humans in accordance with how we’ve learned to speak to those artificial personalities? If one conversation accustoms us to the other, there are myriad ways it could all go bizarrely wrong (or bizarrely right).

Among the early warning signals is what may be called a “paranoid style” of bot interaction that thrives because of, not in spite of, the sentimental humanization of anthropomorphic A.I. Human-ness may at first provide a degree of recognition, identification, empathy, or comfort, but soon the motivations of this new character are called into question. With what dramas and plots is this creature aligned? What designs does it have on us? The bot’s actual relationship to computational infrastructure, a situation that demands our interpretation, may be ignored in favor of entertaining conspiracies. The “user-centered” ethos that computation “disappear” into familiar environments may be the cause of, not the solution to, this serious problem. We will ultimately conclude that there are better ways for bots to appear and co-negotiate the world with us than by the simulation of obsequious dialogue.

II.

What does it mean to use an interface to do your bidding? Does it matter if you write it as code or speak it as sounds? Or both? When you talk to Siri and tell “her” to do something, are you the user or the programmer, or is she, or neither of you? In this context, is programming a kind of agency, or agency a kind of programming, or neither?

Any form of embodied cognition is more than a channel through which one thinks. In time, it becomes the way that one thinks. As a medium it is, if not the message itself, then the form that makes the message possible. Media theory offers diverse perspectives on how the contours of significance are formed not only in but through the technical array of encoding, relay, and performance, and on how these modes of mediation stack one upon the next.6 One form of inscription and distribution may project itself onto (and into) the domain of another, such that one interface becomes the interface to another, and another, and so on. This may be unseen or unconsidered by those users whose cognition is being thus distributed, but more apparent to someone else reading what has already been written in these ways.

We see one version of this in the dynamic between the cognitive technologies of programming (a specific medium of inscription) and the cognitive technologies of interfacial interaction (reading and responding to available mediated inscriptions, and piloting them according to user intention). For the former, a programmer’s ideas and intentions are encoded and can be activated later as a form of available, automated instrumentation, not so dissimilar to how the thought process of any tool-maker is available to anyone who later uses that tool. For the latter, the user’s interests and intentions are extended from the interfacial image of those available programs and how it frames their context and allows for their manipulation.7 For one, cognition is encoded into the interface; for the other, cognition is augmented by use of the interface.
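The undecidability posed above, whether speaking to Siri is using or programming, can be restaged in miniature. In the hypothetical sketch below, a spoken sentence is “compiled” into a small sequence of executable steps; every function name and recognized word is an invented placeholder, not any assistant’s actual grammar.

```python
# A toy illustration of the user/programmer blur: a spoken sentence is
# "compiled" into a sequence of machine actions, so the speaker is, in a
# weak but real sense, writing a program. All names here are hypothetical.

def set_alarm(time: str) -> str:
    return f"alarm set for {time}"

def send_message(recipient: str) -> str:
    return f"message drafted to {recipient}"

def compile_utterance(utterance: str):
    """Translate a command into a list of callable steps (a tiny program)."""
    program = []
    words = utterance.lower().split()
    if "alarm" in words:
        program.append(lambda: set_alarm("7:00"))
    if "message" in words:
        program.append(lambda: send_message("Ada"))
    return program

# Speaking the sentence produces, and then runs, the program it encodes.
for step in compile_utterance("set an alarm and send a message"):
    print(step())
```

Read one way, the speaker has merely used an interface; read another, they have authored and executed a two-step program. The sketch restates the question rather than resolving it.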
That said, isn’t this distinction between inputs and outputs too neatly drawn? In practice one becomes the other quite regularly. From the machine’s perspective, any form of programming (even and especially including machine language) translates between animal cognition and electronic logic and back again all the time. These translations may be arranged in an orderly stack of relative abstraction (from assembly language up to visual programming tools, for example), and in this way at least, writing/programming at a higher layer of abstraction is also to use an interface so as to automate, by relay, the writing of programs at a lower layer of abstraction. The distinction made above, between programming as the encoding of thought and the use of existing interfaces to augment user/programmer cognition in an expanded field, is thus irrevocably blurred. To encode at one level is only possible through the already-given interfacial instrumentation that translates between cognition and machine: a process no more mysterious here than for users of typewriters who once manipulated a contraption of letters to articulate ideas into alphanumeric symbols.8

In this blur, any user who may imagine him or herself to be only tweaking interfaces at the surface level of screens is also a kind of programmer. Who knew? This identity may be limited to a rather soft and prosaic sense of “programming,” such as when a user clicks buttons in Photoshop that execute a particular bit of code in a strategically nested sequence, resulting in a desired manipulation of the source file that is appreciated as a modified digital “image.” Or it may connote programming in a more active and ecological sense, such as when a user chooses a particular search result and thereby trains the algorithmic search apparatus a bit more about what psychodemographically similar homo sapiens find interesting. The first connotation refers to how humans make things on computers, the latter to how A.I.s learn and grow, continuously programmed through the accumulation of interactions they have with humans and human artifacts.

And so to use a bot, to converse with a bot, is not only to augment and extend one’s cognition into an interfacial field (à la Douglas Engelbart); it is also to use one’s own voice as a medium of semantic abstraction in order to program the A.I., not just to command it. The careers of early bot assistants, as frosting on the machine learning cake, are established by this arrangement of mutual learning and cognition.
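The “ecological” sense of programming described above, in which each chosen search result trains the apparatus a little, has a familiar minimal form in online learning from implicit feedback. The sketch below, with invented weights and a made-up learning rate, illustrates the principle only; it is not any actual search engine’s ranking method.

```python
# A minimal sketch of "programming by use": each click on a result nudges
# that result's weight upward, so ordinary use retrains the ranking.
# Weights and the learning rate are invented for illustration.

LEARNING_RATE = 0.1

weights = {"result_a": 0.5, "result_b": 0.5, "result_c": 0.5}

def ranked_results():
    """Return result ids ordered by current learned weight."""
    return sorted(weights, key=weights.get, reverse=True)

def register_click(result_id: str):
    """A click is implicit feedback: reward the chosen result
    and let the unchosen ones decay slightly."""
    for rid in weights:
        if rid == result_id:
            weights[rid] += LEARNING_RATE * (1.0 - weights[rid])
        else:
            weights[rid] -= LEARNING_RATE * weights[rid]

print(ranked_results())        # initial, arbitrary order
register_click("result_c")     # the user "programs" the apparatus
register_click("result_c")
print(ranked_results())        # result_c now ranks first
```

Every act of use leaves a small gradient behind; accumulated across millions of users, this is what it means for the A.I. to be continuously programmed through clicks and conversation.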