Technical Disclosure Commons
Defensive Publications Series
June 2021
VOICE USER INTERFACE BASED PERMISSION GRANT SYSTEM FOR VEHICLES
Xin Li
Zheng Kou
Sriram Natarajan
Sukhyun Song
Follow this and additional works at: https://www.tdcommons.org/dpubs_series
Recommended Citation Li, Xin; Kou, Zheng; Natarajan, Sriram; and Song, Sukhyun, "VOICE USER INTERFACE BASED PERMISSION GRANT SYSTEM FOR VEHICLES", Technical Disclosure Commons, (June 09, 2021) https://www.tdcommons.org/dpubs_series/4368
This work is licensed under a Creative Commons Attribution 4.0 License. This Article is brought to you for free and open access by Technical Disclosure Commons. It has been accepted for inclusion in Defensive Publications Series by an authorized administrator of Technical Disclosure Commons. Li et al.: VOICE USER INTERFACE BASED PERMISSION GRANT SYSTEM FOR VEHICLES
VOICE USER INTERFACE BASED PERMISSION GRANT SYSTEM FOR VEHICLES
ABSTRACT
A vehicle (e.g., an automobile, a motorcycle, a bus, a recreational vehicle (RV), a semi-trailer
truck, a tractor or other type of farm equipment, a train, a plane, a helicopter, etc.) may include a
so-called “head unit” that provides a voice user interface (VUI) by which to enable spoken
human interaction with the head unit to respond to requests for permission (e.g., to access user
personal data, to enable the usage of third-party services, etc.). For example, responsive to
detecting that an action to be performed has not been granted permission, the head unit may
produce (e.g., via one or more speakers) an audio prompt requesting the required permission. A
user may answer the audio prompt with an audio input in the form of human speech, which the
head unit may receive (e.g., via one or more microphones). The head unit may parse the audio
input using speech recognition (e.g., a natural language understanding module) to identify a valid
input (e.g., grant or deny permission to perform an action, request additional information, etc.) to
which the audio input corresponds and, responsive to identifying a valid input, the head unit may
perform the action (e.g., granting or denying permission to perform the action, providing
additional information, etc.) associated with the valid input. In this way, the head unit may
enable the user to control the granting of permissions via the VUI, which may be particularly
beneficial in vehicle settings in which the user is operating the vehicle, as the hands-free, eyes-
free user experience may reduce distractions to the user while operating the vehicle and thereby
promote safety.
Published by Technical Disclosure Commons, 2021 2 Defensive Publications Series, Art. 4368 [2021]
DESCRIPTION
FIG. 1 below is a conceptual diagram illustrating a system 10 including a head unit 100
of a vehicle (e.g., an automobile, a motorcycle, a bus, a recreational vehicle (RV), a semi-trailer truck,
a tractor or other type of farm equipment, a train, a plane, a helicopter, etc.) and a computing
system 120. As shown in FIG. 1, head unit 100 includes one or more processors 102, a display
104, one or more communication components 106 (“COMM components 106”), one or more
speakers 108, one or more microphones 110, and one or more storage devices 112. As further
shown in FIG. 1, computing system 120 includes one or more processors 122, one or more
communication components 126 (“COMM components 126”), and one or more storage devices
132.
Head unit 100 of system 10 may represent an integrated head unit that provides an
interface (e.g., a voice user interface (VUI), a graphical user interface (GUI), etc.). In general,
head unit 100 may control one or more vehicle systems, such as a heating, ventilation, and air
conditioning (HVAC) system, a lighting system (for controlling interior and/or exterior lights),
an infotainment system, a seating system (for controlling a position of a driver and/or passenger
seat), etc. Head unit 100 may be included in a motorcycle, a bus, an RV, a semi-trailer truck, a
tractor or other type of farm equipment, a train, a plane, a helicopter, a personal transport
vehicle, or any other type of vehicle.
Processors 102 may implement functionality and/or execute instructions associated with
head unit 100. Examples of processors 102 may include one or more of an application specific
integrated circuit (ASIC), a field programmable gate array (FPGA), an application processor, a
display controller, an auxiliary processor, a central processing unit (CPU), a graphics processing
unit (GPU), one or more sensor hubs, and any other hardware configured to function as a
processor, a processing unit, or a processing device. One or more applications 114 and a voice
user interface module 116 (“VUI module 116”) may be operable by processors 102 to perform
various actions, operations, or functions of head unit 100.
Display 104 of head unit 100 may be a presence-sensitive display that functions as an
input device and as an output device. For example, presence-sensitive display 104 may function
as an input device using a presence-sensitive input component, such as a resistive touchscreen, a
surface acoustic wave touchscreen, a pressure-sensitive screen, an acoustic pulse recognition
touchscreen, or another presence-sensitive display technology. Additionally, presence-sensitive
display 104 may function as an output (e.g., display) device using any of one or more display
components, such as a liquid crystal display (LCD), dot matrix display, light emitting diode
(LED) display, active-matrix organic light-emitting diode (AMOLED) display, etc.
COMM components 106 of head unit 100 may include wireless communication devices
capable of transmitting and/or receiving communication signals, such as a cellular radio, a 3G
radio, a 4G radio, a 5G radio, a Bluetooth® radio (or any other PAN radio), an NFC radio, or a
Wi-Fi™ radio (or any other wireless local area network (WLAN) radio). COMM components
106 may be configured to send and receive information via a network 111 (e.g., a local area
network (LAN), wide area network (WAN), a global network, such as the Internet, etc.).
Storage devices 112 of head unit 100 may include one or more computer-readable storage
media. For example, storage devices 112 may be configured for long-term, as well as short-term
storage of information, such as instructions, data, or other information used by head unit 100. In
some examples, storage devices 112 may include non-volatile storage elements. Examples of
such non-volatile storage elements include magnetic hard discs, optical discs, solid state discs,
and/or the like. In other examples, in place of, or in addition to the non-volatile storage elements,
storage devices 112 may include one or more so-called “temporary” memory devices, meaning
that a primary purpose of these devices may not be long-term data storage. For example, the
devices may comprise volatile memory devices, meaning that the devices may not maintain
stored contents when the devices are not receiving power. Examples of volatile memory devices
include random-access memories (RAM), dynamic random-access memories (DRAM), static
random-access memories (SRAM), etc.
Computing system 120 of system 10 may be any suitable remote computing system, such
as one or more desktop computers, laptop computers, mainframes, servers, cloud computing
systems, virtual machines, etc., capable of sending and receiving information via network 111.
As shown in FIG. 1, computing system 120 includes processors 122, which may be substantially
similar to processors 102, COMM components 126, which may be substantially similar to
COMM components 106, and storage devices 132, which may be substantially similar to storage
devices 112. While described herein as being performed at least in part by computing system
120, any or all techniques of the present disclosure may be performed by one or more other
devices, such as head unit 100. That is, in some examples, head unit 100 may be operable to
perform one or more techniques of the present disclosure alone.
Head unit 100 may require permission from a user of a vehicle to perform an action, such
as accessing the user’s contact list (e.g., on a computing device connected to head unit 100),
disclosing the user’s information to make a third-party query, etc. In general, head unit 100
may request the permission by, for example, displaying one or more GUI elements via display
104 of head unit 100. Responsive to the user providing user input (e.g., a tap, a long press, a
swipe, etc.) via display 104, head unit 100 may perform the action corresponding to the user
input (e.g., granting permission, denying permission, presenting additional information, etc.).
However, this user experience may be dangerous because, to interact with display 104, the user
may need to look at display 104 and remove at least one hand from the steering wheel of the
vehicle, potentially distracting the user from a driving task.
In accordance with techniques of this disclosure, head unit 100 may include VUI module
116 configured to enable spoken human interaction with head unit 100 to respond to requests for
permission (e.g., to access user personal data, to enable the usage of third-party services, etc.).
For example, responsive to head unit 100 detecting that application 114 (e.g., a dialer
application) has not been granted permission, VUI module 116 of head unit 100 may produce,
via speakers 108 (e.g., speakers built into the vehicle, speakers integrated into a computing
device, portable speakers, etc.) an audio prompt requesting the required permission. In some
examples, the audio prompt may identify the permission being requested. For example, if
application 114 needs ‘user device contacts setting’ permission to access a user’s contacts stored
at a computing device (e.g., connected to head unit 100), the audio prompt emitted by VUI
module 116 via speakers 108 may identify the permission being requested as the ‘user device
contacts setting’ permission, thus giving a user more information to make a decision regarding
the requested permission.
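The check-then-prompt step described above might be sketched as follows. This is only an illustration under stated assumptions: `PermissionStore`, `build_audio_prompt`, and `prompt_if_needed` are hypothetical names that do not appear in this disclosure, the prompt wording is invented, and the returned text would still need to be converted to speech and emitted via speakers 108.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class PermissionStore:
    """Hypothetical record of which permissions each application holds."""
    granted: Dict[str, Set[str]] = field(default_factory=dict)

    def is_granted(self, app: str, permission: str) -> bool:
        return permission in self.granted.get(app, set())


def build_audio_prompt(app: str, permission: str) -> str:
    """Compose prompt text that names the permission being requested.

    The wording is an assumption made for this sketch.
    """
    return f"{app} needs the '{permission}' permission. Is it OK to grant it?"


def prompt_if_needed(store: PermissionStore, app: str, permission: str) -> Optional[str]:
    """Return prompt text when the permission is missing, otherwise None."""
    if store.is_granted(app, permission):
        return None
    return build_audio_prompt(app, permission)
```

Naming the permission in the prompt text mirrors the ‘user device contacts setting’ example above; a deployed system would likely map internal permission identifiers to friendlier spoken phrases.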
A user may respond to the audio prompt with an audio input in the form of human
speech. Head unit 100 may receive the audio input from the user via microphones 110 (e.g.,
microphones built into the vehicle, microphones built into a computing device, portable
microphones, etc.) and store the audio input data into an audio input repository 118. Head unit
100 may transmit, via COMM components 106, the audio input data stored in audio input
repository 118 to computing system 120 via network 111. Computing system 120 may then store
the audio input data in audio input repository 138.
A natural language understanding module 136 (“NLU module 136”) of computing
system 120 may be configured to analyze the audio input data stored in audio input repository
138 using speech recognition techniques to match the user’s response with a valid input to the
audio prompt provided by VUI module 116. A valid input may be any input with a semantic
meaning that matches a semantic meaning of a predetermined response to the audio prompt, such
as an input corresponding to a grant of permission, an input corresponding to a denial of
permission, an input corresponding to a request for additional information, etc.
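To make the notion of a valid input concrete, the sketch below maps an already-transcribed response to one of a small set of predetermined responses. It is a deliberately simple keyword matcher standing in for NLU module 136, which this disclosure describes as a machine learning model; the `ValidInput` enum, the phrase lists, and `match_valid_input` are assumptions made purely for illustration.

```python
import re
from enum import Enum
from typing import Optional


class ValidInput(Enum):
    GRANT = "yes"
    DENY = "no"
    MORE_INFO = "more information"
    NO_MATCH = "no match"
    NO_INPUT = "no input"


# Hypothetical phrase lists; a trained model would compare semantic meaning
# instead. Requests for more information are checked first so that
# "I don't know. Why ...?" is not mistaken for a denial.
_PHRASES = {
    ValidInput.MORE_INFO: ("why", "what do you mean", "more information", "i don't know"),
    ValidInput.DENY: ("no", "nope", "cancel"),
    ValidInput.GRANT: ("yes", "sure", "ok", "okay", "go ahead"),
}


def match_valid_input(transcript: Optional[str]) -> ValidInput:
    """Map a transcribed user response to a predetermined valid input."""
    if transcript is None or not transcript.strip():
        return ValidInput.NO_INPUT
    # Normalize case and curly apostrophes before whole-word matching.
    text = transcript.lower().replace("\u2019", "'")
    for valid_input, phrases in _PHRASES.items():
        for phrase in phrases:
            if re.search(r"\b" + re.escape(phrase) + r"\b", text):
                return valid_input
    return ValidInput.NO_MATCH
```

Whole-word matching keeps “know” from being read as “no”; anything that matches none of the lists falls through to the no-match case rather than being guessed.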
NLU module 136 may be a machine learning model trained to perform natural language
processing tasks, such as converting speech to text, analyzing the text to determine its semantic
meaning, etc. In some examples, NLU module 136 may represent one or more neural networks,
such as one or more recurrent neural networks. In some instances, at least some of the nodes of a
recurrent neural network may form a cycle. That is, NLU module 136 may pass or retain
information from a previous portion of the input data sequence to a subsequent portion of the
input data sequence through the use of recurrent or directed cyclical node connections, where the
input data sequence includes words in a sentence for natural language processing, speech
detection or processing, etc.
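The way a recurrent connection carries information from earlier to later portions of a sequence can be illustrated with a minimal single-layer recurrent cell. The architecture of NLU module 136 is not specified in this disclosure, so the Elman-style update, the `tanh` activation, and the function name `rnn_forward` are assumptions chosen only to show the mechanism.

```python
import math
from typing import List


def rnn_forward(inputs: List[List[float]],
                w_in: List[List[float]],
                w_rec: List[List[float]],
                bias: List[float]) -> List[List[float]]:
    """Run a single-layer recurrent cell over a sequence of input vectors.

    w_in is a (hidden x input) matrix, w_rec is (hidden x hidden), and bias
    has one entry per hidden unit. Returns the hidden state after each step.
    """
    hidden = [0.0] * len(bias)  # initial state: nothing seen yet
    states = []
    for x in inputs:
        # The new state depends on the current input AND the previous state;
        # the w_rec term is the cyclical connection.
        hidden = [
            math.tanh(
                sum(w * xi for w, xi in zip(w_in[j], x))
                + sum(w * hi for w, hi in zip(w_rec[j], hidden))
                + bias[j]
            )
            for j in range(len(bias))
        ]
        states.append(hidden)
    return states
```

With the recurrent weights set to zero, each state depends only on the current input; with nonzero recurrent weights, an earlier word in the sentence still influences the state computed for a later word.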
Responsive to NLU module 136 determining the input to which the audio input
corresponds, computing system 120 may transmit an indication of the corresponding input to
head unit 100 via network 111. Head unit 100 may then perform the action associated with the
corresponding input. For example, if NLU module 136 determines that the audio input
corresponds to granting application 114 permission to access a user’s contacts stored at a
computing device (e.g., connected to head unit 100), head unit 100 may grant application 114
permission to do so. In another example, if NLU module 136 determines that the audio input
corresponds to denying application 114 permission to access a user’s contacts stored at a
computing device (e.g., connected to head unit 100), head unit 100 may deny application 114
permission to do so. In yet another example, if NLU module 136 cannot determine an input to
which the audio input corresponds, VUI module 116 may repeat the audio prompt to the user to
receive and process a new audio input.
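One possible shape for the exchange between computing system 120 and head unit 100 is sketched below. The JSON message layout, the field names, the `"grant"` and `"deny"` labels, and both function names are hypothetical; this disclosure states only that an indication of the corresponding input is transmitted and then acted upon.

```python
import json
from typing import Optional, Set, Tuple


def encode_indication(request_id: str, valid_input: Optional[str]) -> str:
    """Computing-system side: serialize the matched input.

    A valid_input of None means no corresponding input could be determined.
    """
    return json.dumps({"request_id": request_id, "input": valid_input})


def apply_indication(message: str,
                     granted: Set[Tuple[str, str]],
                     app: str,
                     permission: str) -> str:
    """Head-unit side: act on the indication and report what was done."""
    valid_input = json.loads(message)["input"]
    if valid_input == "grant":
        granted.add((app, permission))
        return "granted"
    if valid_input == "deny":
        granted.discard((app, permission))
        return "denied"
    # No corresponding input: the audio prompt should be repeated.
    return "repeat_prompt"
```

Keeping the permission state on the head unit and sending only a small label back over network 111 is one design choice among several; as noted earlier, the head unit may also perform the matching locally.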
FIG. 2 is a flow diagram illustrating an example dialogue between VUI module 116 and a
user, where the blue text represents audio prompts from VUI module 116, and the green text
represents audio input from the user. As shown in FIG. 2, VUI module 116 may explain, by
emitting sound via speakers 108, the details of the permission being requested by, for example,
application 114, which in the example of FIG. 2 is a third-party application, and ask the user to
grant permission. The style of the audio prompt provided by VUI module 116 may be
personalized and colloquial to encourage a user-friendly dialogue. VUI module 116 may receive
audio input (e.g., “yes,” “more information,” “no,” any other query, etc.) from the user via
microphones 110. Head unit 100 may then transmit the audio input data to computing system
120 via network 111 for NLU module 136 to match the user’s response with a corresponding
(valid) input to the audio prompt. Upon completion thereof, computing system 120
may transmit an indication of the corresponding input to head unit 100 via network 111. Head
unit 100 may then perform an action associated with the corresponding input.
For example, if a user says, “sure,” to the audio prompt from VUI module 116 requesting
permission on behalf of application 114, NLU module 136 may process the audio input data
associated with that audio input and determine that the semantic meaning of “sure” matches the
semantic meaning of the predetermined response “yes.” NLU module 136 may transmit an
indication that the corresponding input is “yes,” and VUI module 116 may convert a
predetermined text response associated with the corresponding input “yes” into speech (e.g.,
outputted by speakers 108). In the example of FIG. 2, the predetermined text response is “sure,
you can learn more later in assistant settings if you’re interested.” As shown in the example of
FIG. 2, VUI module 116 may also provide a notification (e.g., via display 104) associated with
the corresponding input “yes.” VUI module 116 may perform the action associated with the
corresponding input “yes” by granting application 114 permission to access user information to
fulfill requests and save preferences.
In another example, if a user says, “I don’t know. Why does the application want my
information?” NLU module 136 may determine that the semantic meaning of that audio data
(e.g., after being converted to text) matches the semantic meaning of the predetermined response
“More information/what do you mean?” NLU module 136 may transmit an indication of that
corresponding input, and VUI module 116 may output a text-to-speech response via speakers
108, such as “You can find more details in application settings if you’d like to take a look later.
For now, I’ll cancel your request.” Head unit 100 may then perform the action associated with
“More information/what do you mean?” by cancelling the request. In yet another example, if the
user says, “nope,” NLU module 136 may determine that the semantic meaning of “nope”
matches the semantic meaning of the predetermined response “no,” resulting in VUI module 116
outputting the text-to-speech response, “OK, I’ve canceled your request. You can learn more
later in application settings if you’re interested.” Head unit 100 may then cancel the request.
In yet another example, if a user provides an audio input with no corresponding input
(e.g., the user says something unrelated to the audio prompt) or no input (e.g., the user is silent),
NLU module 136 may determine that the semantic meaning of the audio input matches the
semantic meaning of “Any other query (no match) / no-input.” In this case, VUI module 116
may output another audio prompt, such as “Sorry, are you OK with sharing your information and
preferences?” The user may respond to the audio prompt with an audio input similar to the ones
described in previous examples (e.g., “yes,” “no,” any other query, no input, etc.). If the user
responds “yes,” then head unit 100 may fulfill the request for permission and output dialogue like
“Let’s get the application.” If the user responds “no,” then head unit 100 may deny the request and
may state that the request has been canceled. If the audio data corresponds to any other query or
no input, then head unit 100 may cancel the request, and VUI module 116 may output “OK, I’ve
canceled your request to Application. You can learn more in application settings.”
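The dialogue described in the preceding examples can be summarized as a small two-turn decision procedure. The sketch below is one reading of that flow, not a definitive implementation: the input labels (`"yes"`, `"no"`, `"more_info"`, anything else for no match or no input), the outcome labels, and the function name are assumptions, and the spoken line used for a second-turn “no” is assumed because the text says only that the head unit may state that the request has been canceled.

```python
from typing import List, Optional, Tuple

# First-turn responses, taken from the examples in the text.
INITIAL_REPLIES = {
    "yes": ("granted",
            "Sure, you can learn more later in assistant settings if you're interested."),
    "more_info": ("canceled",
                  "You can find more details in application settings if you'd like "
                  "to take a look later. For now, I'll cancel your request."),
    "no": ("canceled",
           "OK, I've canceled your request. You can learn more later in "
           "application settings if you're interested."),
}
REPROMPT = "Sorry, are you OK with sharing your information and preferences?"
CANCEL = ("OK, I've canceled your request to Application. "
          "You can learn more in application settings.")


def run_permission_dialogue(first: str,
                            second: Optional[str] = None) -> Tuple[str, List[str]]:
    """Return (outcome, lines spoken by the VUI) for up to two user turns."""
    if first in INITIAL_REPLIES:
        outcome, line = INITIAL_REPLIES[first]
        return outcome, [line]
    # No match or no input on the first turn: ask once more.
    spoken = [REPROMPT]
    if second == "yes":
        return "granted", spoken + ["Let's get the application."]
    if second == "no":
        # Assumed wording for the denial confirmation.
        return "denied", spoken + [CANCEL]
    # Any other query, or silence, ends with the request canceled.
    return "canceled", spoken + [CANCEL]
```

Limiting the exchange to a single re-prompt keeps the interaction short, which fits the goal of minimizing driver distraction.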
Responsive to VUI module 116 outputting terminating dialogue (e.g., “Let’s get the
application,” “OK, I’ve canceled your request to Application. You can learn more in application
settings,” etc.), head unit 100 may stop receiving or processing audio data from microphones 110.
For example, head unit 100 may cause microphones 110 to not actively detect audio by disabling
microphones 110, powering off microphones 110, etc. In some examples, head unit 100 may cause
microphones 110 to stop providing the stream of audio data to processors 102.
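Cutting off the audio stream after terminating dialogue might look like the following gate between the microphone callback and the processing code. `MicrophoneStream`, its methods, and `finish_turn` are hypothetical stand-ins; real audio hardware would be disabled through a platform-specific API that this sketch does not model.

```python
from typing import Callable


class MicrophoneStream:
    """Hypothetical gate on the audio path from the microphones to the processors."""

    def __init__(self, sink: Callable[[bytes], None]) -> None:
        self._sink = sink
        self.listening = True

    def on_audio(self, chunk: bytes) -> None:
        """Forward a captured chunk only while the dialogue is still open."""
        if self.listening:
            self._sink(chunk)

    def stop(self) -> None:
        self.listening = False


def finish_turn(stream: MicrophoneStream, terminating: bool) -> None:
    """Close the audio path once the VUI has spoken a terminating line."""
    if terminating:
        stream.stop()
```

For example, if chunks are delivered before and after a call to `finish_turn(stream, True)`, only the earlier chunks reach the sink.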
One or more advantages of the techniques described in this disclosure include providing a
user a hands-free, eyes-free user experience for controlling the granting of permissions via a
VUI, which may be particularly beneficial in vehicle settings in which the user is operating the
vehicle, as such a user experience may reduce distractions and thereby promote safety. Another
advantage includes providing a user the opportunity to address the granting or denying of
permissions in real-time or just-in-time (e.g., as opposed to notifying the user that an application
was granted permission to perform an action without the user’s consent), thus promoting the
user’s interests in data privacy, network usage, etc.
It is noted that the techniques of this disclosure may be combined with any other suitable
technique or combination of techniques. As one example, the techniques of this disclosure may
be combined with the techniques described in U.S. Patent Application Publication No.
2020/0099538A1. In another example, the techniques of this disclosure may be combined with
the techniques described in U.S. Patent Application Publication No. 2020/0143812A1. In yet
another example, the techniques of this disclosure may be combined with the techniques
described in U.S. Patent Application Publication No. 2020/0221420A1. In yet another example,
the techniques of this disclosure may be combined with the techniques described in U.S. Patent
Application Publication No. 2016/0164881A1.