Dialogue Systems and Chatbots

Natalie Parde, Ph.D.
Department of Computer Science
University of Illinois at Chicago

CS 421: Natural Language Processing, Fall 2019

Many slides adapted from Jurafsky and Martin (https://web.stanford.edu/~jurafsky/slp3/).

What are dialogue systems?
• Programs capable of communicating with users in natural language via text, speech, or both
• Often referred to as conversational agents

Dialogue systems are everywhere!

Types of Dialogue Systems

• Task-Oriented: Designed to leverage conversational interactions to help users complete tasks
• Chatbots: Designed to carry out extended, unstructured conversations (similar to human chats)

Many dialogue systems contain elements of both of these types!

• Task-oriented agents may seem more natural if they also contain a chatbot component

U: Hey
A: Hi, how are you?
U: I’m doing good, how are you?
A: I’m doing good as well. Would you like me to help you reserve a room for your meeting?

Designing high-quality conversational agents requires an understanding of how human conversation works.

Natalie: Hi, I would like to order thirteen buckets of cheesy popcorn.
Salesperson: Um okay, when do you need those?
Natalie: I want to bring them to a party on Saturday.
Salesperson: And what size buckets would you like?
Natalie: Extra large.
Salesperson: Okay, our cheesy popcorn is really popular. Would you be okay with six buckets of cheesy popcorn and seven buckets of caramel popcorn?
Natalie: No.
Salesperson: Okay, what about some of our other flavors? We have ranch-flavored popcorn---
Natalie: I’ll take that. Eight buckets of ranch-flavored popcorn and five buckets of cheesy popcorn.
Salesperson: Okay.
Natalie: Actually, wait. Seven buckets of ranch and six buckets of cheesy popcorn, still all in extra large.
Salesperson: Okay, we will have seven extra-large buckets of ranch-flavored popcorn and six extra-large buckets of cheesy popcorn ready for you to pick up on Friday.

Properties of Human Conversation

• Turns: Individual contributions to the dialogue
  • Typically a sentence, but may be shorter (e.g., a single word) or longer (e.g., multiple sentences)
• In the popcorn-ordering dialogue above, each labeled utterance (e.g., Natalie’s “Extra large.”) is a single turn

Properties of Human Conversation

• Understanding turn structure is very important for spoken dialogue systems!
• Systems must know:
  • When to stop talking
  • When to start talking
  • How to deal with interruptions
  • When the human user has finished speaking
• Detecting when a user has finished speaking is called endpoint detection (a minimal sketch follows below)
  • Challenging due to noise and speech pauses

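To make endpoint detection concrete, here is a minimal Python sketch of a threshold-based endpointer. The frame length, energy threshold, and silence duration are illustrative assumptions, not values from the slides; real endpointers must contend with background noise and natural mid-utterance pauses.

def detect_endpoint(frame_energies, energy_threshold=0.01, min_silence_frames=50):
    # Declare an endpoint after min_silence_frames consecutive low-energy
    # frames (e.g., 50 frames x 10 ms = 0.5 s of silence).
    silent_run = 0
    for i, energy in enumerate(frame_energies):
        if energy < energy_threshold:
            silent_run += 1
            if silent_run >= min_silence_frames:
                return i - min_silence_frames + 1  # first frame of the silence
        else:
            silent_run = 0  # speech resumed; this was a pause, not an endpoint
    return None  # no endpoint found; the speaker may still be talking

This naive rule also illustrates why endpointing is hard: a long hesitation triggers it just as readily as a genuine end of turn.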
Properties of Human Conversation

• Speech Acts: Types of actions performed by the speaker
  • Also referred to as dialogue acts
• Major dialogue act groups:
  • Constatives
  • Directives
  • Commissives
  • Acknowledgments

Properties of Human Conversation

• Constatives: Making a statement
  • Answering
  • Claiming
  • Confirming
  • Denying
  • Disagreeing
  • Stating

Properties of Human Conversation

• Directives: Attempting to get the addressee to do something
  • Advising
  • Asking
  • Forbidding
  • Inviting
  • Ordering
  • Requesting

Properties of Human Conversation

• Commissives: Committing the speaker to a future action
  • Promising
  • Planning
  • Vowing
  • Betting
  • Opposing

Properties of Human Conversation

• Acknowledgments: Expressing the speaker’s attitude regarding some social action
  • Apologizing
  • Greeting
  • Thanking
  • Accepting an acknowledgment

Properties of Human Conversation

• Grounding: Establishing common ground by acknowledging that the speaker has been heard and/or understood
  • Saying “okay”
  • Repeating what the other speaker said
  • Using implicit signals of understanding, like “and” at the beginning of an utterance


Properties of Human Conversation

• Conversations have structure
  • Questions set up an expectation for an answer
  • Proposals set up an expectation for an acceptance or rejection
• These dialogue act pairs are called adjacency pairs
  • First pair part: Question
  • Second pair part: Answer

Properties of Human Conversation

• However, two dialogue acts in an adjacency pair don’t always immediately follow one another!
• Adjacency pairs can be separated by side sequences or subdialogues
  • Interruptions
  • Clarifying questions
  • Corrections
• Some adjacency pairs also have presequences
  • Requests may be preceded by questions about a system’s capabilities

Properties of Human Conversation

• Initiative: Conversational control
  • Generally, the speaker asking questions has the conversational initiative
• In everyday dialogue, most interactions are mixed-initiative
  • Participants sometimes ask questions, and sometimes answer them

Properties of Human Conversation

• Although normal in human-human conversations, mixed-initiative dialogue is very difficult for dialogue systems to achieve!
• In question-answering systems (e.g., “Alexa, what’s the weather like right now?”) the initiative lies entirely with the user
  • Systems such as these are called user-initiative systems
• Opposite of a user-initiative system: a system-initiative system
  • Can be very frustrating!

Salesperson: Which variety of caramel popcorn would you like?
Natalie: I don’t want caramel popcorn.
Salesperson: Which variety of caramel popcorn would you like?
Natalie: Can I quit?
Salesperson: Which variety of caramel popcorn would you like?
Natalie: Um, regular.

Properties of Human Conversation

• Inference: Drawing conclusions based on more information than is present in the uttered words
• Implicature: The act of implying meaning beyond what is directly communicated
  • Mentioning the party on Saturday (especially in response to the salesperson’s question!) implies that the popcorn will be needed by that time
  • The salesperson infers that the popcorn should be ready for pickup by Friday

Chatbots

• Dialogue systems designed to carry out extended conversations with the goal of mimicking unstructured, informal, human-human chats
• Can be used for entertainment
  • Cleverbot: https://www.cleverbot.com/
• Or for more practical purposes
  • ELIZA
Chatbot architectures can be divided into two classes.

• Rule-based: Interpret and respond to input using a (sometimes extremely complex) set of rules
• Corpus-based: Automatically learn to map from a user utterance to a system response based on large datasets of human-human conversations
Rule-based Chatbots

• Revisiting ELIZA:
  • Rogerian psychologist (reflects patients’ statements right back at them)
  • Takes the stance of knowing almost nothing about the world
    • Common position for chatbots trying to pass the Turing Test …users may assume that the chatbot is pretending not to know anything to fulfill some type of conversational goal

U: Men are all alike.
E: IN WHAT WAY
U: They’re always bugging us about something or other.
E: CAN YOU THINK OF A SPECIFIC EXAMPLE
U: Well my boyfriend made me come here.
E: YOUR BOYFRIEND MADE YOU COME HERE
U: He says I’m depressed much of the time.
E: I AM SORRY TO HEAR YOU ARE DEPRESSED
Rule-based Chatbots

• ELIZA works by matching patterns and transforming slots from those patterns into responses
  • (.* [yY][oO][uU] .* [mM][eE]) → (WHAT MAKES YOU THINK I 3 YOU)
  • You hate me → WHAT MAKES YOU THINK I HATE YOU
• Each pattern used by ELIZA is based on keyword(s) that might occur in users’ sentences
Rule-based Chatbots

• In ELIZA, keywords are associated with a rank
  • More specific keywords → higher rank
  • More general keywords → lower rank
• Consider two rules:
  • (I .*) → (YOU SAY YOU 2)
  • (EVERYBODY .*) → (WHO IN PARTICULAR ARE YOU THINKING OF)
• Consider an input sentence:
  • “I know everybody made a cooler chatbot.”
• Since “I” is more general than “everybody,” the system will return WHO IN PARTICULAR ARE YOU THINKING OF rather than YOU SAY YOU KNOW EVERYBODY MADE A COOLER CHATBOT
Rule-based Chatbots

• What if no keywords are matched in an input?
  • ELIZA defaults to a non-committal response
    • PLEASE GO ON
    • THAT’S VERY INTERESTING
    • I SEE
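To make the keyword-rank mechanics concrete, here is a minimal Python sketch of ELIZA-style rule matching. The two rules come from the slides above, with rank 1 treated as the highest priority (as in the example table below); the data structures and function name are illustrative, not ELIZA’s actual implementation.

import re

# (pattern, transformation, rank); rank 1 is the highest priority, so the
# more specific keyword "everybody" outranks the more general "I".
RULES = [
    (r"\bEVERYBODY\b.*", "WHO IN PARTICULAR ARE YOU THINKING OF", 1),
    (r"\bI\b (.*)", r"YOU SAY YOU \1", 2),
]

def respond(sentence):
    matches = [(rank, m.expand(transformation))
               for pattern, transformation, rank in RULES
               if (m := re.search(pattern, sentence, flags=re.IGNORECASE))]
    if not matches:
        return "PLEASE GO ON"  # non-committal default, per the slide above
    return min(matches)[1].upper()  # lowest rank number = highest priority

print(respond("I know everybody made a cooler chatbot."))
# -> WHO IN PARTICULAR ARE YOU THINKING OF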
Rule-based Chatbots

• Alternatively, if no keywords are matched in an input, ELIZA can access facts from its memory
• Whenever “my” is the highest-ranked keyword, select a transformation rule, apply it to the input, and store the transformed input on a stack
  • My boyfriend made me come here. → DOES THAT HAVE ANYTHING TO DO WITH THE FACT THAT YOUR BOYFRIEND MADE YOU COME HERE
• Pop the most recent transformed input off the stack if no keywords are matched in an input
Example: ELIZA

Rule                                                                     Rank
(I .*) → (YOU SAY YOU 2)                                                 2
(I’m .*) → (I AM SORRY TO HEAR THAT YOU ARE 2)                           1
(my .*) → (YOUR 2)                                                       1
(.* always .*) → (CAN YOU THINK OF A SPECIFIC EXAMPLE)                   1
(.*) → (IN WHAT WAY)                                                     3
(.* my .*) → (DOES THAT HAVE ANYTHING TO DO WITH THE FACT THAT YOUR 3)   (Memory)

Memory: (empty)

User: Men are all alike.
ELIZA: IN WHAT WAY
Memory: (empty)
User: They’re always bugging us about something or other.
ELIZA: CAN YOU THINK OF A SPECIFIC EXAMPLE
Memory: (empty)
User: Well my boyfriend made me come here.
ELIZA: YOUR BOYFRIEND MADE YOU COME HERE
Memory: 1) DOES THAT HAVE ANYTHING TO DO WITH THE FACT THAT YOUR BOYFRIEND MADE YOU COME HERE
User: He says I’m depressed much of the time.
ELIZA: I AM SORRY TO HEAR THAT YOU ARE DEPRESSED MUCH OF THE TIME
Memory: 1) DOES THAT HAVE ANYTHING TO DO WITH THE FACT THAT YOUR BOYFRIEND MADE YOU COME HERE
User: Maybe he is the problem.
ELIZA: DOES THAT HAVE ANYTHING TO DO WITH THE FACT THAT YOUR BOYFRIEND MADE YOU COME HERE
(No keyword matched, so the stored transformation is popped from the memory stack.)
Memory: (empty)
Formal Algorithm: ELIZA

find the word w in the sentence s that has the highest keyword rank
if w exists:
    choose the highest ranked rule r for w that matches s
    response ← apply the transformation in r to s
    if w == “my”:
        future ← apply a transformation from the memory rule list to s
        push future onto the memory stack
else:  # No keyword applies!
    either:
        response ← apply the transformation for NONE to s
    or:
        response ← pop the top response from the memory stack

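Below is a minimal runnable Python rendering of this algorithm. The rule table is abbreviated, the helper names are illustrative rather than ELIZA’s original code, and pronoun swapping (me → you) is omitted for brevity.

import re

# Abbreviated keyword table: keyword -> (rank, pattern, transformation).
# Rank 1 is the highest priority.
RULES = {
    "my":  (1, r"\bmy (.*)", r"YOUR \1"),
    "i'm": (1, r"\bi'm (.*)", r"I AM SORRY TO HEAR THAT YOU ARE \1"),
    "i":   (2, r"\bi (.*)", r"YOU SAY YOU \1"),
}
MEMORY_RULE = (r"\bmy (.*)",
               r"DOES THAT HAVE ANYTHING TO DO WITH THE FACT THAT YOUR \1")
NONE_RESPONSE = "PLEASE GO ON"
memory_stack = []

def eliza_respond(sentence):
    s = sentence.lower().rstrip(".!?")
    # Find the highest-ranked keyword whose pattern matches the sentence.
    best = None
    for keyword, (rank, pattern, transformation) in RULES.items():
        m = re.search(pattern, s)
        if m and (best is None or rank < best[0]):
            best = (rank, keyword, transformation, m)
    if best:
        rank, keyword, transformation, m = best
        if keyword == "my":
            # Store a memory transformation for later, per the algorithm.
            mm = re.search(MEMORY_RULE[0], s)
            memory_stack.append(mm.expand(MEMORY_RULE[1]).upper())
        return m.expand(transformation).upper()
    # No keyword applies: pop a stored response, or fall back to NONE.
    return memory_stack.pop() if memory_stack else NONE_RESPONSE

print(eliza_respond("Well my boyfriend made me come here."))
print(eliza_respond("Maybe he is the problem."))  # pops the memory stack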
Continued popularity of ELIZA’s framework…

• Still used in many applications today!
  • Modern chatbot systems like ALICE are based on updated versions of ELIZA’s pattern/action architecture
• PARRY
  • Another clinical psychology chatbot, developed several years after ELIZA
  • Included a model of its own mental state, with affect variables for its levels of fear and anger
    • High anger → hostile output
    • High fear → sequence of deluded statements
  • First known system to pass the Turing Test (psychiatrists were unable to distinguish transcripts from PARRY from transcripts from people with real paranoia)
Corpus-based Chatbots

• Do not use hand-built rules
• Instead, learn to map inputs to outputs based on large human-human conversation corpora
• Very data-intensive!
  • Training modern corpus-based chatbots requires hundreds of millions, or even billions, of words
Example input → response pairs:
• Hello → Hi, how are you?
• Doing good, thanks, and you? → I’m doing good as well.
• Thanks! → You’re welcome.
• Well, see you later. → Bye!
What kind of corpora are used to train corpus-based chatbots?

• Large spoken conversational corpora
  • Switchboard corpus of American English telephone conversations: https://catalog.ldc.upenn.edu/LDC97S62
• Movie dialogue
• Text from microblogging sites (e.g., Twitter)
• Collections of crowdsourced conversations
  • Topical-Chat: https://github.com/alexa/alexa-prize-topical-chat-dataset
Possible responses can also be extracted from non-dialogue corpora.

• Possible sources:
  • News
  • Online knowledge repositories (e.g., Wikipedia)
• This allows the chatbot to tell stories or mention facts acquired from non-conversational sources
  • “Did you know that Illinois has the 25th largest land area of all U.S. states?”
As humans interact with a chatbot, their own utterances can be used as additional training data as well.

• This allows a chatbot’s quality to gradually improve over time
• Some privacy concerns can emerge when using this strategy (it’s crucial to remove personally identifiable information!)
Corpus-based Chatbots

• Two main architectures:
  • Information retrieval
  • Machine learned sequence transduction
• Most corpus-based chatbots do (surprisingly!) very little modeling of conversational context
• The focus? Generate a single response turn that is appropriate given the user’s immediately previous utterance(s)
Corpus-based Chatbots

• Since they tend to rely only on very recent utterances, many corpus-based chatbots are viewed more as response generation systems
• This makes them similar to question answering systems:
  • Focus on single responses
  • Ignore context or larger conversational goals
Information Retrieval-based Chatbots

• Respond to a user’s turn by repeating some appropriate turn from a corpus of natural human conversational text
• Any information retrieval algorithm can be used to choose the appropriate response
• Two simple methods:
  • Return the response to the most similar turn
  • Return the most similar turn
How can we return the response to the most similar turn?

• Look for a turn that resembles the user’s turn, and return the human response to that turn
• More formally, given:
  • A user query, q
  • A conversational corpus, C
• Find the turn t in C that is most similar to q (e.g., using cosine similarity) and return the human response to t:
  • r = response(argmax_{t ∈ C} sim(q, t))
How can we return the most similar turn?

• Directly match the user’s query, q, with turns from C, since a good response will often share words or semantic patterns with the prior turn
• More formally:
  • r = argmax_{t ∈ C} sim(q, t)

Which of these methods works better?

• It depends on the application
• More often, returning the most similar turn seems to work better
  • Slightly less noise than the alternative of selecting the response to the most similar turn

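A minimal sketch of both retrieval strategies, using scikit-learn’s TfidfVectorizer and cosine similarity over a toy corpus (the corpus, method names, and function are illustrative):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy conversational corpus: (turn, human response to that turn) pairs.
corpus = [
    ("hello", "hi, how are you?"),
    ("doing good, thanks, and you?", "i'm doing good as well."),
    ("well, see you later.", "bye!"),
]
turns = [t for t, _ in corpus]
responses = [r for _, r in corpus]

vectorizer = TfidfVectorizer()
turn_vectors = vectorizer.fit_transform(turns)

def respond(query, method="most_similar_turn"):
    q = vectorizer.transform([query])
    sims = cosine_similarity(q, turn_vectors)[0]
    best = sims.argmax()
    if method == "response_to_most_similar_turn":
        return responses[best]  # r = response(argmax_t sim(q, t))
    return turns[best]          # r = argmax_t sim(q, t)

print(respond("hello there"))
print(respond("hello there", method="response_to_most_similar_turn"))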
11/12/19 Natalie Parde - UIC CS 421 48 Various

techniques • Possible additional features: can be used • Entire conversation with the user so far • Particularly useful when dealing with to improve short user queries, e.g., “yes” • User-specific information performance • Sentiment with IR- • Information from external knowledge based sources chatbots.
Encoder-Decoder Chatbots

• Machine learned sequence transduction: System learns from a corpus to transduce a question to an answer
  • A machine learning version of ELIZA
• Intuition borrowed from phrase-based machine translation
  • Learn to convert one phrase of text into another
• Key difference?
  • In phrase-based machine translation, words or phrases in the source and target sentences tend to align well with one another
  • In response generation, a user’s input might share no words or phrases with a coherent, relevant response
How does a chatbot learn to perform this transduction?

• Encoder-decoder models
  • Accept sequential information as input, and return different sequential information as output
• Also recently used in:
  • Machine translation
  • Question answering
  • Summarization
How do encoder-decoder models work?

• In NLP applications, encoders and decoders are often some type of RNN
• Encoders take sequential input and generate an encoded representation of it
  • This representation is comprised of the outputs from the last hidden state of the encoder network
  • …it is undecipherable to casual observers!
• Decoders take this representation as input and generate a sequential (interpretable) output

Example: “What is the weather like today?” → Encoder → Decoder → “Very cold …is it November or January?”
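A minimal sketch of a GRU-based encoder-decoder in PyTorch, assuming toy vocabulary sizes, random example tensors, and teacher forcing during training (all hyperparameters and names are illustrative; real systems add attention, beam search, and far more data):

import torch
import torch.nn as nn

class EncoderDecoder(nn.Module):
    """Toy GRU encoder-decoder: encode the user turn, decode a response."""
    def __init__(self, vocab_size, emb_dim=64, hid_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, src_ids, tgt_ids):
        # Encode: the final hidden state summarizes the user's utterance.
        _, hidden = self.encoder(self.embed(src_ids))
        # Decode: condition response generation on that hidden state
        # (teacher forcing: gold response tokens are fed as decoder inputs).
        dec_out, _ = self.decoder(self.embed(tgt_ids), hidden)
        return self.out(dec_out)  # per-step vocabulary logits

model = EncoderDecoder(vocab_size=1000)
src = torch.randint(0, 1000, (1, 6))  # e.g., "what is the weather like today"
tgt = torch.randint(0, 1000, (1, 5))  # e.g., "very cold is it november"
logits = model(src, tgt)              # shape: (1, 5, 1000)
loss = nn.CrossEntropyLoss()(logits.view(-1, 1000), tgt.view(-1))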
Encoder-Decoder Chatbots

• Basic encoder-decoder models tend to produce repetitive (and therefore boring) responses that don’t encourage further conversation
  • “I’m okay”
  • “I don’t know”
• To avoid this, it is important to incentivize response diversity
  • Mutual information objective function
  • Beam search
Encoder-Decoder Chatbots

• Other challenges?
  • Inability to model prior context
    • Can be solved by using a hierarchical model that summarizes information over multiple turns
  • Often poor multi-turn coherence
    • Can be addressed to some extent using reinforcement learning or adversarial networks (a generator paired with a discriminator) to learn to choose responses that make the overall conversation more natural
How are chatbots evaluated?

• Currently the best evaluation technique? Collect human ratings!
• Automated metrics correlate poorly with human judgments
  • They typically measure slot-filling accuracy or word overlap with a gold standard
  • These metrics work best when there are few possible responses and the responses lexically overlap with one another
Task-based Dialogue Systems

• The dialogue system has a specific goal (generally, helping a user solve some task)
  • Making a reservation
  • Purchasing a product
• Task-based dialogue systems are frame-based
  • They assume a set of user intentions, each of which consists of a collection of slots that can be filled by possible values
  • The set of frames is sometimes called a domain ontology
Frame-based Dialogue Systems

• Simple (but very long-lived!) frame-based dialogue system: GUS
  • Genial Understander System
  • Originally used for travel planning (Bobrow et al., 1977)

GUS: Hello. My name is GUS. I can help you plan a trip by air. Where do you want to go?
Client: I want to go to San Diego on May 28
GUS: What time do you want to leave?
Client: I must be in San Diego before 10 am
GUS: Would you like P.S.A. flight 102 that arrives at 9:15 am?
Client: When is the next flight?
GUS: Air California flight 310 that leaves at 8:30 am and arrives at 10:00 am
Client: I’ll take the first one.
GUS: In what name should I make a reservation, please?
Client: Dan Bobrow
GUS: I have confirmed the following flight: P.S.A. flight 102 on Wednesday May 28 from San Jose to San Diego that leaves at 7:30 am and arrives at 9:15 am.
Components of Frame-based Dialogue Systems

• Slots: Specify what the system needs to know
• Slot fillers are constrained to values of specific semantic types
  • In the travel domain, these could be cities, dates, airlines, or times
• These types may have hierarchical structures
  • In GUS, date is a frame itself, with slots for month, year, day, and weekday
Components of Frame-based Dialogue Systems

Slot               Type   Question Template
ORIGIN CITY        city   “From what city are you leaving?”
DESTINATION CITY   city   “Where are you going?”
DEPARTURE TIME     time   “When would you like to leave?”
DEPARTURE DATE     date   “What date would you like to leave?”
ARRIVAL TIME       time   “When do you want to arrive?”
ARRIVAL DATE       date   “What day would you like to arrive?”
The control architecture for frame-based dialogue systems is also designed around the frame.

Control Structure for Frame-based Dialogue

• Goal:
  1. Fill the slots in the frame with the fillers the user intends
  2. Perform the relevant action for the user
• The system achieves its goal by asking questions of a user
  • Typically these questions are constructed using pre-specified question templates associated with each slot of each frame
Control Structure for Frame-based Dialogue

• The system continues questioning the user until it is able to fill all slots needed to perform the desired task
• GUS attaches condition-action rules to slots to reduce monotony
  • If a user has specified a flight destination city, it may automatically fill the hotel destination slot with that value as well
Control Structure for Frame-based Dialogue

• Many domains require multiple frames!
• Dialogue systems must be able to disambiguate which slot of which frame a given input is supposed to fill, and then switch dialogue control to that frame
  • This can be done using production rules (see the sketch below)
    • Different types of inputs and recent dialogue history match different frames
    • Control is switched to the matched frame
• Once the system has enough information, it performs the desired task (e.g., querying a database of flights) and returns the result to the user

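A minimal sketch of production-rule frame switching in Python; the trigger patterns and frame names are illustrative assumptions, not from any specific system:

import re

# Toy production rules: each rule maps an input pattern to the frame that
# should take dialogue control when the rule fires.
PRODUCTION_RULES = [
    (re.compile(r"\b(flight|fly|airline)\b", re.I), "AIR-TRAVEL"),
    (re.compile(r"\b(hotel|room|night)\b", re.I), "HOTEL"),
    (re.compile(r"\b(car|rental)\b", re.I), "CAR-RENTAL"),
]

def switch_frame(utterance, current_frame):
    """Return the frame that should control the dialogue after this input."""
    for pattern, frame in PRODUCTION_RULES:
        if pattern.search(utterance):
            return frame
    return current_frame  # no rule fired; keep control where it is

print(switch_frame("I also need a hotel room near the airport", "AIR-TRAVEL"))
# -> HOTEL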
Natural Language Understanding in Frame-based Dialogue Systems

• In a frame-based dialogue system, natural language understanding is necessary for performing three tasks:
  • Domain classification
  • Intent determination
  • Slot filling
Natural Language Understanding in Frame-based Dialogue Systems

• Domain classification: What is the user talking about?
  • e.g., booking a flight, managing a calendar, setting an alarm
• Intent determination: What task is the user trying to accomplish?
  • e.g., retrieve all flights in a given time window, delete a calendar appointment
• Slot filling: What particular slots and fillers does the user intend the system to understand from their utterance, with respect to their intent?
Natural Language Understanding for Slot Filling in Frame-based Dialogue Systems

Show me the morning flights from Chicago to Dallas on Thursday.

Domain: AIR-TRAVEL
Intent: SHOW FLIGHTS
Origin-City: Chicago
Origin-Date: Thursday
Origin-Time: morning
Destination-City: Dallas

Wake me tomorrow at 6

Domain: ALARM-CLOCK
Intent: SET-ALARM
Time: 2019-11-13 0600
Natural Language Understanding for Slot Filling in Frame-based Dialogue Systems

• In GUS, and in many commercial applications, slots are filled using handwritten rules
• Rule-based systems often include large quantities (thousands!) of rules, structured as semantic grammars
• Other systems use supervised learning for slot filling

wake me (up)? | set (the|an) alarm | get me up → Intent: SET-ALARM

• Semantic Grammar: A context-free grammar in which the left-hand side of each rule corresponds to the semantic entities (slot names) being expressed
• Semantic grammars can be parsed using any CFG parsing algorithm
Semantic Grammar

SHOW → show me | i want | can i see
DEPART_TIME_RANGE → (after | around | before) HOUR | morning | afternoon | evening
HOUR → one | two | three | four | … | twelve (AMPM)
AMPM → am | pm
FLIGHTS → (a) flight | flights
ORIGIN → from CITY
DESTINATION → to CITY
CITY → Chicago | Dallas | Denver | Phoenix

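A minimal sketch of parsing with a simplified fragment of this semantic grammar using NLTK’s CFG tools (the fragment below collapses the grammar to a single expansion path; because left-hand sides are slot names, slot fillers can be read directly off the parse tree):

import nltk

grammar = nltk.CFG.fromstring("""
    S -> SHOW FLIGHTS ORIGIN DESTINATION
    SHOW -> 'show' 'me'
    FLIGHTS -> 'flights'
    ORIGIN -> 'from' CITY
    DESTINATION -> 'to' CITY
    CITY -> 'chicago' | 'dallas' | 'denver' | 'phoenix'
""")

parser = nltk.ChartParser(grammar)
tokens = "show me flights from chicago to dallas".split()
for tree in parser.parse(tokens):
    tree.pretty_print()  # ORIGIN spans "from chicago"; DESTINATION "to dallas"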
Other Components of Frame-based Dialogue Systems

• Automated Speech Recognition: Converts audio input to a string of text
  • May or may not be constrained based on the current dialogue domain and/or intent
• Natural Language Generation: Produces the utterances that the system outputs to the user
  • Frame-based systems typically use template-based generation
    • What time do you want to leave ORIGIN-CITY?
    • Will you return to ORIGIN-CITY from DESTINATION-CITY?
• Text to Speech Synthesis: Converts a string of text to an audio output
  • May be done at runtime or using prerecorded statements or phrases
Advantages and Disadvantages of GUS

Why use a simple, rule-based architecture for task-oriented dialogue systems?
• High precision
• Given a narrow domain and sufficient expertise, can provide sufficient coverage

Why explore other options?
• Handwritten rules or grammars can be expensive and slow to create
• Low recall
Summary: Dialogue Systems and Chatbots (Part 1)

• Dialogue systems are programs capable of communicating with users in natural language
• Two types of dialogue systems:
  • Conversational (chatbot)
  • Task-oriented
• Dialogue systems must understand a variety of communicative functions, including turn-taking, dialogue acts, grounding, conversational structure, initiative, and implicature
• Chatbots can be rule-based or corpus-based
  • Corpus-based chatbots can use information retrieval or sequence transduction methods
• Task-based dialogue systems are frame-based
  • Frame-based dialogue systems assume a set of intentions, each of which consists of a set of slots that can be filled by possible values
  • Slots can be filled using rule-based or machine learning approaches
More Sophisticated Frame-based Dialogue Systems

• Dialogue-State (or Belief-State) Architecture: A modular dialogue system architecture comprised of six main components:
  • Automated Speech Recognition (ASR)
  • Natural Language Understanding (NLU)
  • Dialogue State Tracker
  • Dialogue Policy
  • Natural Language Generation (NLG)
  • Text to Speech (TTS)
Dialogue State Architecture

Spoken input: haɪ, aɪ wʊd laɪk tu ˈɔrdər ˈθɜrˈtin ˈbʌkəts ʌv ˈʧizi ˈpɑpˌkɔrn.

1. Automated Speech Recognition → “Hi, I would like to order thirteen buckets of cheesy popcorn.”
2. Natural Language Understanding → Intent: Order; Buckets: 13; Flavor: Cheesy
3. Dialogue State Tracker (previous dialogue act: None) → Intent: Order; Buckets: 13; Flavor: Cheesy; Bucket-Size: ?; Ready-By: ?
4. Dialogue Policy → Next dialogue act: Get_Ready-By
5. Natural Language Generation → “When do you need those?”
6. Text to Speech → wɛn duː juː niːd ðəʊz?
The dialogue state tracker and dialogue policy are sometimes grouped together as a single dialogue manager.

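The modular flow above can be mirrored directly in code. Here is a minimal sketch of the dialogue manager portion (state tracker + policy) with the popcorn example hard-wired in; every class, function, and template is an illustrative stand-in, not a real system’s API.

from dataclasses import dataclass, field

@dataclass
class DialogueState:
    # The frame being filled, plus the most recent dialogue act.
    slots: dict = field(default_factory=lambda: {
        "Intent": None, "Buckets": None, "Flavor": None,
        "Bucket-Size": None, "Ready-By": None})
    last_act: str = "None"

def nlu(text):
    # Stand-in for NLU: would extract intent and slot fillers from the text.
    return {"Intent": "Order", "Buckets": 13, "Flavor": "Cheesy"}

def track(state, observed):
    # Dialogue state tracker: merge newly observed fillers into the frame.
    state.slots.update(observed)
    return state

def policy(state):
    # Dialogue policy: ask about the first still-unfilled slot.
    for slot, value in state.slots.items():
        if value is None:
            return f"Get_{slot}"
    return "Confirm_Order"

def nlg(act):
    # Template-based generation keyed on the chosen dialogue act.
    templates = {"Get_Bucket-Size": "What size buckets would you like?",
                 "Get_Ready-By": "When do you need those?"}
    return templates.get(act, "Okay.")

state = track(DialogueState(),
              nlu("Hi, I would like to order thirteen buckets of cheesy popcorn."))
act = policy(state)  # "Get_Bucket-Size": the first unfilled slot in this toy frame
print(nlg(act))      # -> What size buckets would you like?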
Automated Speech Recognition

• Input: Audio
• Output: Transcribed string of words
• Can be optimized for domain-dependent dialogue systems by constraining the vocabulary to a fixed, smaller set of relevant words
Automated Speech Recognition

• Very small vocabulary for a given dialogue state → finite state grammar
• Larger vocabulary needed for a dialogue state → n-gram language model with probabilities conditioned on the dialogue state
• State-specific language models are restrictive grammars
  • Few options for the user → user has less initiative
  • More options for the user → user has more initiative
Automated Speech Recognition

• ASR systems need to work quickly (users are often unwilling to wait for long pauses while their input is processed)
  • Prioritizing efficiency may necessitate constraining the vocabulary
• ASR systems generally return a confidence score for an output text sequence
  • The dialogue system can use this score to determine whether to request clarification, or move forward on the assumption that the sequence is correct

Input: ˌmɪnɪˈsoʊtə

Hypothesis   Confidence
Minnesota    70%
Mini soda    30%

Output: Minnesota

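A minimal sketch of a confidence-based clarification policy; the thresholds and the hypothesis list are illustrative toy values, not from a real ASR system:

hypotheses = [("Minnesota", 0.70), ("Mini soda", 0.30)]

def decide(hypotheses, accept_threshold=0.80, clarify_threshold=0.50):
    # Accept outright, confirm with the user, or ask for a repeat,
    # depending on the confidence of the top hypothesis.
    text, confidence = max(hypotheses, key=lambda h: h[1])
    if confidence >= accept_threshold:
        return ("accept", text)
    if confidence >= clarify_threshold:
        return ("confirm", f"Did you say '{text}'?")
    return ("reject", "Sorry, could you repeat that?")

print(decide(hypotheses))  # -> ('confirm', "Did you say 'Minnesota'?")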
Natural Language Understanding

• Similar to the simple GUS frame-based architecture
  • Slot fillers are extracted from the user’s utterance
• However, generally uses machine learning rather than rules
Dialogue State Tracker and Dialogue Policy

• Dialogue State Tracker: Maintains the current state of the dialogue
  • Most recent dialogue act
  • All slot values the user has expressed so far
• Dialogue Policy: Decides what the system should do or say next
  • In GUS, the system just asked questions until the frame was full
  • In more sophisticated dialogue systems, the policy might help the system decide:
    • When to answer the user’s questions
    • When to ask the user a clarification question
    • When to make a suggestion
Natural Language Generation

• In GUS, the sentences produced came from pre-written templates
• In more sophisticated dialogue systems, the natural language generation component can be conditioned on prior context to produce more natural-sounding dialogue turns
Text to Speech Synthesis

• Inputs:
  • Words
  • Prosodic annotations
• Output:
  • Audio waveform
What is prosody?

• Prosody: Elements of speech such as intonation, tone, stress, and rhythm
• Often carries hints regarding:
  • A speaker’s emotional state
  • The type of utterance being spoken
  • The presence of sarcasm
  • The focus of the utterance
Common Prosodic Trends

• Question: Is there any popcorn left?
• Sarcasm: I just love having no popcorn.
• Anger: You ate my last bucket of popcorn!
• Emphasis: So, caramel or cheesy popcorn?
Spoken Dialogue Systems vs. Text-based Dialogue Systems

• Automated speech recognition and text to speech synthesis are only necessary in spoken dialogue systems
  • Dialogue systems which accept spoken input and produce spoken output
• Other dialogue systems can eliminate those components, moving directly from:
  • Input to natural language understanding
  • Natural language generation to output
Modified Dialogue State Architecture

Text input: “Hi, I would like to order thirteen buckets of cheesy popcorn.”

1. Natural Language Understanding → Intent: Order; Buckets: 13; Flavor: Cheesy
2. Dialogue State Tracker (previous dialogue act: None) → Intent: Order; Buckets: 13; Flavor: Cheesy; Bucket-Size: ?; Ready-By: ?
3. Dialogue Policy → Next dialogue act: Get_Ready-By
4. Natural Language Generation → “When do you need those?”
Dialogue Acts

• Dialogue-state systems make use of dialogue acts
  • The interactive function of the turn
    • Question
    • Statement
    • Clarification
• Different types of dialogue systems require that different types of dialogue acts be labeled
  • Dialogue tagsets tend to be task-specific
Sample Dialogue Act Tagset

Tag                    Valid System Act?  Valid User Act?  Description
Hello(a=x, b=y, …)     ✓                  ✓                Open a dialogue and give info a=x, b=y, …
Inform(a=x, b=y, …)    ✓                  ✓                Give info a=x, b=y, …
Request(a, b=x, …)     ✓                  ✓                Request value for a given b=x, …
Reqalts(a=x, …)                           ✓                Request alternative with a=x, …
Confirm(a=x, b=y, …)   ✓                  ✓                Explicitly confirm a=x, b=y, …
Confreq(a=x, …, d)     ✓                                   Implicitly confirm a=x, … and request value of d
Select(a=x, a=y)       ✓                                   Select either a=x or a=y
Affirm(a=x, b=y, …)    ✓                  ✓                Affirm and give further info a=x, b=y, …
Negate(a=x)                               ✓                Negate and give corrected value a=x
Deny(a=x)                                 ✓                Deny that a=x
Bye()                  ✓                  ✓                Close a dialogue
Sample Annotated Dialogue

U: Hi, I am looking for somewhere to eat. → Hello(task=find, type=restaurant)
S: You are looking for a restaurant. What type of food do you like? → Confreq(type=restaurant, food)
U: I’d like an Italian restaurant somewhere near the lake. → Inform(food=Italian, near=lake)
S: Petterino’s is a nice Italian restaurant near the lake. → Inform(name=“Petterino’s”, type=restaurant, food=Italian, near=lake)
U: Is it reasonably priced? → Confirm(pricerange=moderate)
S: Yes, Petterino’s is in the moderate price range. → Affirm(name=“Petterino’s”, pricerange=moderate)
U: What is the phone number? → Request(phone)
S: The phone number for Petterino’s is 123-456-7890. → Inform(name=“Petterino’s”, phone=“123-456-7890”)
U: Okay, thank you. Goodbye! → Bye()
Slot Filling in Dialogue State Architectures

• Special case of supervised semantic parsing
  • A labeled training set associates each sentence with the correct set of slots, domain, and intent
• Many possible ways to train a classifier for this purpose
• One method: Train a sequence model to map from input words to slot fillers, domain, and intent
Slot Filling in Dialogue State Architectures

Example (sequence labeling): for the input “fly to Chicago on Sunday,” the model tags each token — “Chicago” as Destination, “Sunday” as Departure-Date, and the remaining words as ∅.
Slot Filling in Dialogue State Architectures

• Domain and intent can be determined via:
  • A one-vs.-many classifier
  • Adding domain+intent as the desired output for the final end-of-sentence token in the sequence labeler

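A minimal sketch of a sequence labeler for slot filling in PyTorch, assuming toy index mappings for words and tags (real systems use IOB tags over a large vocabulary, usually with a pretrained encoder):

import torch
import torch.nn as nn

# Toy vocabularies (illustrative only).
word_ids = {"fly": 0, "to": 1, "chicago": 2, "on": 3, "sunday": 4}
tag_names = ["∅", "Destination", "Departure-Date"]

class SlotTagger(nn.Module):
    """BiLSTM tagger: one slot label per input token."""
    def __init__(self, vocab_size=5, n_tags=3, emb_dim=16, hid_dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hid_dim, batch_first=True,
                            bidirectional=True)
        self.out = nn.Linear(2 * hid_dim, n_tags)

    def forward(self, tokens):
        hidden, _ = self.lstm(self.embed(tokens))
        return self.out(hidden)  # per-token tag logits

model = SlotTagger()
tokens = torch.tensor([[word_ids[w] for w in "fly to chicago on sunday".split()]])
logits = model(tokens)       # shape: (1, 5, 3)
pred = logits.argmax(-1)[0]  # untrained, so these tags are arbitrary
print([tag_names[i] for i in pred.tolist()])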
Common Industrial Approach to ML-based Slot Filling

• Bootstrapping!
  1. Start with a seed GUS-style, rule-based system
  2. Predict labels for new user utterances
  3. Train a classifier on the new utterances with the predicted labels
  4. Repeat as needed
Dialogue Management

• Core component of task-based dialogue systems
• Decides what step to take next to bring the conversation closer to its goal
• Can range from simple (minimal history and/or state tracking) to complex (advanced state tracking and dialogue policy modules)
• Simplest dialogue management architecture: the finite state dialogue manager
Finite State Dialogue Manager

• States (nodes): Questions that the dialogue manager asks the user
• Transitions (arcs): Actions to take depending on how the user responds
• The system has full conversational initiative!
  • Asks a series of questions
  • Ignores or misinterprets inputs that are not direct answers to its questions
Finite State Dialogue Manager

Example (flight booking):
1. “Where are you going?”
2. “From what city are you departing?”
3. “What date do you want to leave?”
4. “Is it a one-way trip?”
   • Yes → “Do you want to go from ORIGIN to DESTINATION on DATE?”
     • Yes → Book Flight; No → return to state 1
   • No → “What date do you want to return?” → “Do you want to go from ORIGIN to DESTINATION on DATE, returning on RETURN?”
     • Yes → Book Flight; No → return to state 1
Finite State Dialogue Manager

• Many finite state systems also allow universal commands
  • Commands that can be stated anywhere in the dialogue and still be recognized
    • Help
    • Start over
    • Correction
    • Quit
Advantages and Disadvantages of Finite State Dialogue Managers

Advantages:
• Easy to implement
• Sufficient for simple tasks

Disadvantages:
• Can be awkward and annoying
• Cannot easily handle complex sentences
Dialogue Management

• More common in modern dialogue state architectures:
  • Dialogue state tracker
  • Dialogue policy
What does a dialogue state tracker do?

Determine both:
• The current state of the frame
  • What slots have been filled, and how?
• The user’s most recent dialogue act

The current state of the frame is more than just the slot fillers expressed in the current sentence!
• It is the entire state of the frame up to and including this point

11/12/19 Natalie Parde - UIC CS 421 105 Example: Dialogue State Tracker

The tracker tags each user utterance with a dialogue act drawn from an inventory such as: Hello(a=x, b=y, …), Inform(a=x, b=y, …), Request(a, b=x, …), Reqalts(a=x, …), Confirm(a=x, b=y, …), Confreq(a=x, …, d), Select(a=x, a=y), Affirm(a=x, b=y, …), Negate(a=x), Deny(a=x), Bye()

It updates the dialogue state after each user turn:

U: I’m looking for an upscale restaurant. → inform(price=expensive)
S: Sure. What cuisine?
U: Turkish food would be great. → inform(price=expensive, cuisine=Turkish)
S: Okay. Where should the restaurant be?
U: Near the Chicago Theatre. → inform(price=expensive, cuisine=Turkish, area=ChicagoTheatre)
S: So an upscale Turkish restaurant near the Chicago Theatre?
U: Yes, please. → affirm(price=expensive, cuisine=Turkish, area=ChicagoTheatre)
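A minimal sketch of the frame updates in this example. The dialogue act parsing is faked with pre-tagged turns, since a real tracker would derive each act and its arguments from the NLU output:

```python
def update_frame(frame, act, args):
    """Fold one user dialogue act into the cumulative dialogue state."""
    if act == "inform":
        frame.update(args)         # new slot fillers override old ones
    elif act in ("negate", "deny"):
        for slot in args:
            frame.pop(slot, None)  # drop slots the user rejected
    # affirm leaves the frame unchanged
    return frame

frame = {}
turns = [("inform", {"price": "expensive"}),
         ("inform", {"cuisine": "Turkish"}),
         ("inform", {"area": "ChicagoTheatre"}),
         ("affirm", {})]
for act, args in turns:
    frame = update_frame(frame, act, args)
print(frame)
# {'price': 'expensive', 'cuisine': 'Turkish', 'area': 'ChicagoTheatre'}
```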

11/12/19 Natalie Parde - UIC CS 421 111 How can we detect corrections?

• Users generally correct errors (either theirs or the system’s) by repeating or reformulating their utterance (“Turkish food sounds great---or, wait, no. How about Greek food instead?”)
• Harder to do than detecting regular utterances!
• Speakers often hyperarticulate corrections (“I said TURKISH FOOD not TURKEY LEGS.” “No. TURK-ISH-FOOD!”)
• Common characteristics of corrections (a small heuristic detector is sketched below):
• Exact or close-to-exact repetitions
• Paraphrases
• Contain “no” or swear words
• Low ASR confidence
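A minimal sketch of heuristic correction detection using the cues above; the overlap and confidence thresholds are invented for illustration:

```python
def looks_like_correction(utterance, previous_utterance, asr_confidence):
    """Flag an utterance as a likely correction of the previous one."""
    words = set(utterance.lower().split())
    prev = set(previous_utterance.lower().split())
    overlap = len(words & prev) / max(len(words | prev), 1)
    return (overlap > 0.8             # (near-)exact repetition
            or "no" in words          # explicit negation
            or asr_confidence < 0.4)  # low ASR confidence

print(looks_like_correction("no i said turkish food",
                            "turkish food sounds great", 0.9))  # True
```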

11/12/19 Natalie Parde - UIC CS 421 112 Dialogue Policy

• Goal: Determine what action the system should take next • What dialogue act should be generated?

• More formally, at turn $i$ in the conversation, we want to predict which action $A_i$ to take, based on the entire dialogue state (the sequence of dialogue acts from the system $A$ and the user $U$):
• $\hat{A}_i = \operatorname{argmax}_{A_i \in A} P(A_i \mid A_1, U_1, \ldots, A_{i-1}, U_{i-1})$
• To simplify this, we can maintain the dialogue state as the set of slot fillers the user has expressed (thereby allowing us to condition on the current state of $\mathrm{Frame}_{i-1}$ and the last turn):
• $\hat{A}_i = \operatorname{argmax}_{A_i \in A} P(A_i \mid \mathrm{Frame}_{i-1}, A_{i-1}, U_{i-1})$

11/12/19 Natalie Parde - UIC CS 421 113 How can we estimate these probabilities?
• Neural classifier trained on vector representations of the slot fillers and utterances
• More sophisticated models may also use reinforcement learning
• Reinforcement learning system gets a reward at the end of a dialogue
• Uses that reward to train a policy to take an optimal sequence of actions
• Large positive reward if the dialogue system terminates with the correct slot representation for a training instance
• Large negative reward if all slots are wrong (a toy reward function is sketched below)
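A toy version of such a terminal reward; the +1/-1 values and the partial-credit middle case are invented for illustration:

```python
def terminal_reward(predicted_frame, reference_frame):
    """Score a finished dialogue against the reference slot representation."""
    slots = set(reference_frame)
    correct = sum(predicted_frame.get(s) == reference_frame[s] for s in slots)
    if correct == len(slots):
        return 1.0   # large positive reward: every slot is right
    if correct == 0:
        return -1.0  # large negative reward: every slot is wrong
    return correct / len(slots) - 0.5  # partial credit in between

ref = {"price": "expensive", "cuisine": "Turkish"}
print(terminal_reward({"price": "expensive", "cuisine": "Turkish"}, ref))  # 1.0
print(terminal_reward({"price": "cheap", "cuisine": "Greek"}, ref))        # -1.0
```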

11/12/19 Natalie Parde - UIC CS 421 114 How can dialogue managers handle mistakes?

• First, check to make sure the user’s input has been interpreted correctly: • Confirm understandings with the user • Reject utterances that the system is likely to have misunderstood • These checks can be performed explicitly or implicitly

11/12/19 Natalie Parde - UIC CS 421 115 Explicit Confirmation
• System asks the user a direct question to confirm its understanding

S: From which city do you want to leave?
U: Chicago.
S: You want to leave from Chicago?
U: Yes.

U: I’d like to fly from Chicago to Dallas on November twenty-seventh.
S: Okay, I have you going from Chicago to Dallas on November twenty-seventh. Is that correct?
U: Yes.

11/12/19 Natalie Parde - UIC CS 421 116 Implicit Confirmation
• System demonstrates its understanding as a grounding strategy
• Usually done by repeating back its understanding as part of the next question

U: I want to travel to Chicago.
S: When do you want to travel to Chicago?

U: Hi, I’d like to fly to Chicago tomorrow afternoon.
S: Traveling to Chicago on November fifteenth in the afternoon. What is your full name?

11/12/19 Natalie Parde - UIC CS 421 117 When to use explicit vs. implicit confirmation?

Explicit Confirmation:
• Easier for users to correct the system’s misrecognitions
• User can just say “no” when the system tries to confirm
• But, can be awkward and unnatural
• Also extends the length of the conversation

Implicit Confirmation:
• Sounds more conversationally natural
• More efficient
• But, more difficult for users to correct the system’s misrecognitions

11/12/19 Natalie Parde - UIC CS 421 118 What if the dialogue manager has no idea how to handle the user’s input? • Systems can outright reject the user’s input • Often, when utterances are rejected, systems will follow a strategy of progressive prompting or escalating detail

I’m sorry, I didn’t understand that.

11/12/19 Natalie Parde - UIC CS 421 119 Progressive Prompting
• System gradually gives the user more guidance about how to formulate an utterance that will be accepted

S: When would you like to leave?
U: Well, I need to be in Dallas for Thanksgiving.
S: Sorry, I didn’t get that. Please say the month and day you’d like to leave.
U: I would like to leave on November twenty-seventh.

11/12/19 Natalie Parde - UIC CS 421 120 Other Strategies for Error Handling
• Rapid Reprompting: For the first rejection, the system just says “I’m sorry?” or “What was that?”
• From the second rejection onward, progressive prompting can be applied
• Users tend to prefer rapid reprompting as a first-level error prompt (Cohen et al., 2004)
• Explicitly confirm low-confidence ASR outputs
• Design thresholds based on the cost of making an error (sketched below):
• Low confidence → Reject
• Confidence just above minimum threshold → Confirm explicitly
• Confidence comfortably above threshold → Confirm implicitly
• Very high confidence → Don’t confirm at all
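A minimal sketch of these confidence thresholds; the cutoff values are invented and would in practice be tuned to the cost of an error:

```python
# Possible system responses to an ASR hypothesis, ordered by confidence.
REJECT, CONFIRM_EXPLICIT, CONFIRM_IMPLICIT, NO_CONFIRMATION = range(4)

def confirmation_strategy(asr_confidence,
                          t_reject=0.3, t_explicit=0.6, t_implicit=0.9):
    if asr_confidence < t_reject:
        return REJECT            # too unreliable to act on
    if asr_confidence < t_explicit:
        return CONFIRM_EXPLICIT  # just above minimum: ask directly
    if asr_confidence < t_implicit:
        return CONFIRM_IMPLICIT  # comfortably above: echo back
    return NO_CONFIRMATION       # very high confidence

print(confirmation_strategy(0.75))  # 2 (confirm implicitly)
```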

11/12/19 Natalie Parde - UIC CS 421 121 Natural Language Generation
• Creates a natural language realization of the selected dialogue act
• Two stages:
• Content planning: What should be said?
• Surface realization: How should it be said?

11/12/19 Natalie Parde - UIC CS 421 122 Content Planning

• Most of the work is done by the dialogue policy • Which dialogue act should be generated? • What attributes (slots and values) should be included in the dialogue act?

Dialogue Act: Recommend

Prespecified Attributes: Cuisine=Turkish Area=ChicagoTheatre Price=Expensive

11/12/19 Natalie Parde - UIC CS 421 123 Surface Realization
• Generates a sentence of the specified type, containing the specified attributes
• Often a machine learning model trained on many examples of representation/sentence pairs
• Recommend(Cuisine=Turkish, Area=ChicagoTheatre, Price=Expensive)
• “So you want an upscale Turkish restaurant near the Chicago Theatre?”
• “Okay, so we’re looking for a high-end Turkish restaurant near the Chicago Theatre.”

11/12/19 Natalie Parde - UIC CS 421 124 What if it is hard to find training data like this? • Unlikely that we’ll see every possible combination of attributes • Thus, generality of training samples can be increased by delexicalization • Delexicalization: The process of replacing specific words in the training set that represent slot values with generic placeholder tokens

Recommend(Cuisine=Turkish, Area=ChicagoTheatre, Price=Expensive)

So you want an PRICE CUISINE restaurant near the AREA?

Okay, so we’re looking for a PRICE CUISINE restaurant near the AREA.
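A minimal sketch of delexicalization via string replacement. Note that the attribute values here use the surface forms that actually appear in the sentence (e.g., “upscale” for Price=Expensive), since a real implementation must also match such lexical variants:

```python
def delexicalize(sentence, attributes):
    """Swap each slot value in the sentence for its slot-name placeholder."""
    for slot, value in attributes.items():
        sentence = sentence.replace(value, slot.upper())
    return sentence

# Surface forms as they appear in the sentence (not the raw frame values).
attrs = {"Cuisine": "Turkish", "Area": "Chicago Theatre", "Price": "upscale"}
print(delexicalize(
    "So you want an upscale Turkish restaurant near the Chicago Theatre?",
    attrs))
# So you want an PRICE CUISINE restaurant near the AREA?
```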

11/12/19 Natalie Parde - UIC CS 421 125 Mapping from Frames to Delexicalized Sentences
• Generally performed by encoder-decoder models
• Input: sequence of tokens that represent the dialogue act and its arguments
• Cuisine=Turkish
• Price=Expensive
• Area=ChicagoTheatre
• DialogueAct=Recommend
• Output: delexicalized sentence

[Figure: the input tokens (Cuisine=Turkish, Price=Expensive, Area=ChicagoTheatre, DialogueAct=Recommend) feed an encoder, and the decoder produces “So you want a PRICE CUISINE restaurant near the AREA?”]
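A minimal sketch of linearizing the frame into the encoder's input token sequence; the formatting convention is an invented illustration, since real systems vary:

```python
def linearize(dialogue_act, attributes):
    """Turn a dialogue act and its slot-value arguments into input tokens."""
    tokens = [f"DialogueAct={dialogue_act}"]
    tokens += [f"{slot}={value}" for slot, value in sorted(attributes.items())]
    return tokens

print(linearize("Recommend",
                {"Cuisine": "Turkish", "Price": "Expensive",
                 "Area": "ChicagoTheatre"}))
# ['DialogueAct=Recommend', 'Area=ChicagoTheatre', 'Cuisine=Turkish', 'Price=Expensive']
```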

11/12/19 Natalie Parde - UIC CS 421 126 Relexicalization

• Once we’ve generated a delexicalized string, we need to relexicalize it • Relexicalization: Filling in generic slots with specific words • We can do this using the input frame from the content planner

Input frame (Dialogue Act: Recommend; Prespecified Attributes: Cuisine=Turkish, Area=ChicagoTheatre, Price=Expensive) + delexicalized string “So you want an PRICE CUISINE restaurant near the AREA?” → “So you want an expensive Turkish restaurant near the Chicago Theatre?”
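A minimal sketch of relexicalization; the value-to-surface-form mapping is an invented illustration:

```python
# How each frame value should be rendered in the output string (assumed).
SURFACE_FORMS = {"Expensive": "expensive", "Turkish": "Turkish",
                 "ChicagoTheatre": "Chicago Theatre"}

def relexicalize(template, attributes):
    """Replace slot-name placeholders with surface forms of slot values."""
    for slot, value in attributes.items():
        template = template.replace(slot.upper(), SURFACE_FORMS.get(value, value))
    return template

attrs = {"Price": "Expensive", "Cuisine": "Turkish", "Area": "ChicagoTheatre"}
print(relexicalize("So you want an PRICE CUISINE restaurant near the AREA?",
                   attrs))
# So you want an expensive Turkish restaurant near the Chicago Theatre?
```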

11/12/19 Natalie Parde - UIC CS 421 127 Generating Clarification Questions
• Human clarification questions tend to be targeted at specific elements of the misunderstanding (Purver 2004, Ginzburg and Sag 2000, Stoyanchev et al. 2013)
• These can be created using rule-based approaches, or by probabilistically guessing which slots in a sentence might have been misrecognized (a rule-based sketch follows the examples below)

What flights are going to UNKNOWN on November 14th?

What flights are going to where on November 14th?

Can you please repeat your destination for November 14th?
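A minimal rule-based sketch along these lines, replacing the low-confidence slot with a wh-word; the slot-to-wh-word mapping is invented for illustration:

```python
# Which wh-word to substitute for each potentially misrecognized slot.
WH_WORDS = {"DESTINATION": "where", "DATE": "when", "ORIGIN": "where from"}

def clarification_question(template, misrecognized_slot):
    """Ask about the one slot flagged as low confidence."""
    return template.replace("UNKNOWN", WH_WORDS[misrecognized_slot])

print(clarification_question(
    "What flights are going to UNKNOWN on November 14th?", "DESTINATION"))
# What flights are going to where on November 14th?
```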

11/12/19 Natalie Parde - UIC CS 421 128 Evaluating Task-based Dialogue Systems

• Task success • User satisfaction • Efficiency cost • Quality cost

11/12/19 Natalie Parde - UIC CS 421 129 Measuring Task Success

• How correct was the total solution?
• Slot Error Rate: The percentage of slots that were filled with incorrect values
• Slot Error Rate = (# of inserted, deleted, or substituted slots) / (# of total reference slots)

Utterance: “What flights are going from Chicago to Dallas on the afternoon of November 27th?”

Slot | Filler
ORIGIN | Chicago
DESTINATION | Denver
TIME | afternoon
DATE | 11/27/2019

One of the four reference slots is incorrect (DESTINATION should be Dallas), so Slot Error Rate = 1/4 = 0.25 (a sketch of the computation follows)
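A minimal sketch of this computation on the example above:

```python
def slot_error_rate(hypothesis, reference):
    """Count insertions, deletions, and substitutions against the reference."""
    substituted = sum(1 for s in reference
                      if s in hypothesis and hypothesis[s] != reference[s])
    deleted = sum(1 for s in reference if s not in hypothesis)
    inserted = sum(1 for s in hypothesis if s not in reference)
    return (substituted + deleted + inserted) / len(reference)

ref = {"ORIGIN": "Chicago", "DESTINATION": "Dallas",
       "TIME": "afternoon", "DATE": "11/27/2019"}
hyp = {"ORIGIN": "Chicago", "DESTINATION": "Denver",
       "TIME": "afternoon", "DATE": "11/27/2019"}
print(slot_error_rate(hyp, ref))  # 0.25
```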

11/12/19 Natalie Parde - UIC CS 421 130 Measuring Task Success
• Alternative metric: task error rate
• Task Error Rate: The percentage of times that the overall task was completed incorrectly
• Was the (correct) meeting added to the calendar?
• Did users end up booking the flights they wanted?
• In addition to slot error rate and task error rate, we can apply our standard NLP metrics:
• Precision
• Recall
• F-measure

11/12/19 Natalie Parde - UIC CS 421 131 Measuring User Satisfaction
• Typically survey-based
• Users interact with a dialogue system to perform a task, and then complete a questionnaire about their experience

On a scale from 1 (worst) to 5 (best)….
TTS Performance: Was the system easy to understand?
ASR Performance: Did the system understand what you said?
Task Ease: Was it easy to find the information you wanted?
Interaction Pace: Was the pace of interaction with the system appropriate?
User Expertise: Did you know what you could say at each point?
System Response: Was the system often sluggish and slow to reply to you?
Expected Behavior: Did the system work the way you expected it to?
Future Use: Do you think you’d use the system in the future?

11/12/19 Natalie Parde - UIC CS 421 132 Measuring Efficiency Cost
• How efficiently does the system help users perform tasks?
• Total elapsed time
• Number of total turns
• Number of system turns
• Number of user queries
• Turn correction ratio: the number of system or user turns that were used solely to correct errors, divided by the total number of turns

11/12/19 Natalie Parde - UIC CS 421 133 Measuring Quality Cost

• What are the costs of other aspects of the interaction that affect users’ perceptions of the system? • Number of times the ASR system fails to return anything useful • Number of times the user had to interrupt the system • Number of times the user didn’t respond to the system quickly enough (causing event time-outs or follow-up prompts) • Appropriateness/correctness of the system’s questions, answers, and error messages

11/12/19 Natalie Parde - UIC CS 421 134 Dialogue System Design

• Users play an important role in designing dialogue systems • Research in dialogue systems is closely linked to research in human- computer interaction • Design of dialogue strategies, prompts, and error messages is often referred to as voice design

11/12/19 Natalie Parde - UIC CS 421 135 Design

• Generally follows user-centered design principles 1. Study the user and task 2. Build simulations and prototypes 3. Iteratively test the design on users

11/12/19 Natalie Parde - UIC CS 421 136 Studying the User and Task

• Understand the potential users • Interview them about their needs and expectations • Observe human-human dialogues • Understand the nature of the task • Investigate similar dialogue systems • Talk to domain experts

11/12/19 Natalie Parde - UIC CS 421 137 Building Simulations and Prototypes

• Wizard-of-Oz Studies: Users interact with what they think is an automated system (but that is actually a human “wizard” disguised by a software interface)
• Wizard-of-Oz studies can be used to test architectures prior to implementation:
1. Wizard gets input from the user
2. Wizard uses a database to run sample queries based on the user input
3. Wizard outputs a response, either by typing it or by selecting an option from a menu
• Often used in text-only interactions, but the output can be disguised using a text-to-speech system for voice interfaces
• Wizard-of-Oz studies can also be used to collect training data
• Although not a perfect simulation of the real system (they tend to be idealistic), results from Wizard-of-Oz studies provide a useful first idea of domain issues

11/12/19 Natalie Parde - UIC CS 421 138 Iteratively Testing the Design

• Often, users will interact with the system in unexpected ways • Testing prototypes early (and often) minimizes the chances of substantial issues in the final version • Application designers are often not able to anticipate these issues since they’ve been working on the design for so long themselves!

11/12/19 Natalie Parde - UIC CS 421 139 Ethical Issues in Dialogue System Design

• Bias • Machine learning systems of any kind tend to replicate human biases that occur in training data • Especially problematic for chatbots that are trained to replicate human responses! • Microsoft’s Tay chatbot: https://www.theverge.com/2016/3/24/11297050/tay-microsoft- chatbot-racist • Corpora drawn from social media (e.g., Twitter or Reddit) tend to be particularly problematic (Henderson et al. 2017, Hutto et al. 2015, Davidson et al. 2017)

11/12/19 Natalie Parde - UIC CS 421 140 Ethical Issues in Dialogue System Design

• Privacy • Home dialogue agents may accidentally record private information, which may then be used to train a conversational model • Adversaries can potentially recover this information • Very important to anonymize personally identifiable information when training chatbots on transcripts of human-human or human-machine conversation! • Gender Equality • Current chatbots tend to be assigned female names and voices • Perpetuates stereotypes of subservient females • Most commercial chatbots evade or give positive responses to sexually harassing language, rather than responding in clear negative ways (Fessler, 2017)

11/12/19 Natalie Parde - UIC CS 421 141 Summary: Dialogue Systems and Chatbots (Part 2)
• Modern dialogue systems tend to use the dialogue-state architecture, which contains components for:
• Automated speech recognition
• Natural language understanding
• Dialogue state tracking
• Dialogue policy
• Natural language generation
• Text to speech
• These components have to handle many expected and unexpected inputs (different dialogue act types, as well as unrecognized, corrected, or mistaken input)
• Dialogue systems are typically evaluated based on task success, user satisfaction, efficiency cost, and quality cost
• One way to gain an initial understanding of domain issues (as well as to collect relevant data) is to conduct a Wizard-of-Oz study
• Dialogue system designers should be aware of ethical issues in dialogue system design, including concerns about bias, privacy, and gender equality

11/12/19 Natalie Parde - UIC CS 421 142