High Technology Letters ISSN NO : 1006-6748

Voice Control Desktop Assistant

Jessica Sarah 1, Amisha Michelle Danny 2, Juan Mark Deen 3, Ankit Ahirwar 4, Abhishek Solanki 5, Anurag Shrivastava 6, Divyansh Gajbe 7

1 Department of Computer Science and Engineering, Vellore Institute of Technology, Kotri Kalan, Ashta, Near Indore Road, Bhopal, Madhya Pradesh 466114, India; 2 Department of Computer Science and Engineering, Kalinga Institute of Industrial Technology, KIIT Road, Patia, Bhubaneswar, Odisha 751024, India; 3 Department of Computer Science and Engineering and Bioinformatics, Vellore Institute of Technology, Vellore Campus, Tiruvalam Rd, Katpadi, Vellore, Tamil Nadu 632014, India; 4,5,6,7 Department of Computer Science and Engineering, University Institute of Technology, RGPV, Bhopal 462033, India

Abstract

In recent times, the demand for controlling electronic devices through voice has been increasing steadily. Work on computers has grown drastically during the pandemic, and with more and more work done on a desktop, performing general and routine tasks with extra manual effort becomes tedious alongside academic or professional work. A desktop assistant can be built to act as a better companion that automates some of these tasks. This paper shows how to automate the day-to-day tasks of users with voice commands and make desktop work more comfortable.

Keywords: Desktop Assistant, Human voice interaction, Python, SQL.

1. Introduction

At present, human-machine interaction is an exciting topic, toward which much work has already been done and continues to develop. Interaction with the machine in the form of voice and gestures gives the user a comfortable working experience. Developments in machine learning algorithms and natural language processing give more flexibility and convenience to human-machine interaction. AT&T researchers at Bell Labs collected information on vowel formant frequency shifts and, in the 1950s, produced the world's first test system for the pronunciation of the ten English digits. Dynamic programming approaches were proposed in the 1960s by Soviet researchers. In the 1970s, pitch and cepstrum technologies were increasingly applied to speech recognition with the advent of LPC voice feature parameters. Speech recognition technology reached a climax with the hidden Markov model (HMM) put forward in the 1980s. Several speech recognition systems have since been introduced, such as the Whisper system and the IBM ViaVoice system. Related technology was pushed ahead simultaneously; discriminative training based on the criterion of maximum likelihood estimation appeared [1]. Speech recognition is a practical application that establishes a solid foundation for automatic speech recognition

Volume 27, Issue 7, 2021, 754, http://www.gjstx-e.cn/

robust against acoustic environmental distortion. The literature provides a thorough overview of classical and modern noise- and reverberation-robust techniques developed over the past thirty years, emphasizing practical methods proven to be successful and likely to be developed further for future applications. The strengths and weaknesses of robustness-enhancing speech recognition techniques have been carefully analyzed. The author [2] discussed acoustic models based on Gaussian mixture models and deep neural networks for noise-robust technology; in addition, guidance is offered for the selection of best practices. Speech recognition systems are projected to be employed as the primary man-machine interface for robots used in rehabilitation, entertainment, and other applications in the future. Study [3] outlines the creation of a hidden Markov model-based voice recognition system for biped robot control. According to the researchers in [4], unimodal repair is less accurate than multimodal error correction, and multimodal correction is faster than unimodal correction by respeaking on a dictation job. Their research also shows that system-initiated error repair (based on confidence metrics) may not speed up error correction. The key technologies developed in the 1990s were stochastic language understanding, statistical learning of acoustic and language models, and methods for implementing large-vocabulary speech understanding systems. After five decades of research, speech recognition technology has finally entered the marketplace, benefiting users in various ways. The challenge of designing a machine that genuinely functions like an intelligent human remains a major one going forward [5]. This study is a work toward making machine operation automated with the least manual effort from the user.
The proposed assistant can perform the following tasks through voice commands:
• Take voice commands from the user and execute general tasks (the basic functionality of the assistant).
• Search the Web, Wikipedia, and YouTube in a browser with voice instructions.
• Send emails to contacts, with document attachments, while users are busy with their own tasks.
• Play the user's favorite videos on YouTube and also download them to the computer.
• Play music from the user's computer.
• Maintain the user's queries and quick notes.
• Take screenshots and set up reminders.
All these tasks are performed through voice commands and are almost fully automated.

1.1 Objective of the Proposed Method
The objective of this study is to develop an application that performs general tasks with the help of voice commands. The expected achievements for fulfilling this objective are:
• Interact with the user through voice and visuals.
• Accept the user's voice command.
• Based on the given command, extract the task and the data needed to perform the operation.


• Present the results to the user in a convenient manner.

2. Proposed Methodology This section deals primarily with the proposed techniques, methodologies, and concepts relevant to voice-controlled desktop assistants, focusing on a single pipeline of speech detection, processing, data extraction, and operations.

2.1 Proposed Work The proposed model can be divided into three main modules. The modules and their functions are defined as follows.

2.1.1: Speech Recognition Speech recognition is the process of converting spoken words to text, and it is the leading block for interaction with the machine in this study. Python supports many speech recognition engines, including the Google Speech Engine, Google Cloud Speech API, Microsoft Bing Voice Recognition, and IBM Speech to Text.

2.1.2: Processing of the Query (Command) The user-given command is processed to extract the type of task to be performed and the data required for performing it. Processing user queries is a complex task; it is done procedurally by matching the query's keywords against the commands defined in the dataset. After the command is detected, the required data is obtained by filtering the keywords by their relative positions. Example query: send email to juanmark0521

2.1.3: Performing Operations: After the command and its data are extracted, the main task is to execute the given command. In the program, every task the proposed model can execute is defined as a function. The query-recognizer part of the program, which runs in a continuous loop, calls the required function with the data obtained from the query.
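The function-based execution described above can be sketched as a dispatch table that maps recognized command keywords to handler functions. This is an illustrative sketch under stated assumptions, not the paper's actual code; the function and keyword names are hypothetical.

```python
# Hypothetical sketch: each task the assistant supports is defined as a
# function, and a dispatch table maps command keywords to those functions.

def open_browser(data=None):
    return "opening browser"

def search_wikipedia(data=None):
    return f"searching wikipedia for {data}"

# Dispatch table: command keyword -> handler function (illustrative names).
COMMANDS = {
    "open browser": open_browser,
    "wikipedia": search_wikipedia,
}

def execute(command, data=None):
    """Look up the recognized command and call its handler with the data."""
    handler = COMMANDS.get(command)
    if handler is None:
        return "command not recognized"
    return handler(data)
```

A dictionary dispatch keeps the continuous query-recognizer loop simple: identifying a command reduces to one lookup, and new tasks are added by registering one more function.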

2.2: Use Case Diagram The use case diagram proposed in fig. 2.2 shows the assistant's tasks.


Fig.2.2 Use case Diagram

2.3: Proposed Logic and Algorithm The working of any program depends mainly on the algorithm used in it. The logic used in this study is procedural, as described below:
1. Import all the required modules and libraries.
2. Create the database if it does not exist. ## when running the program for the first time
3. Define the speech recognition method and the speak method.
4. Define the setup to store the user credentials and contact data.
5. Define the methods that execute the operations performed by the assistant.
6. Define the query-identification logic inside an infinite loop to continuously identify each command and execute it with the required data.
7. End of program.

2.3.1: Algorithm for Speech Recognition Algorithm for taking the user command as voice input:
1. Create an instance of the speech_recognition recognizer for speech identification.
2. Using the microphone as an operating system service, capture the user's audio through the speech_recognition instance.
3. Process the recorded audio with the speech recognition engine API to convert the audio to text form.
4. Return the text query to the calling function.


2.3.2: Algorithm for Query Identification and Execution: The following logic is implemented to identify and execute the user's command:
1. while(true) ## infinite loop for taking continuous commands from the user
2. Get the command as a text query from the algorithm above. ## using the speech recognition algorithm
3. Identify the command with the help of conditional logic and the pre-defined dataset of commands.
4. Remove the command keywords from the query and, if required, filter out the data needed for execution.
5. Execute the command by calling the respective function with the data.
6. if (end command): break the while loop and exit the program; else: continue the command-identification logic ## go to step 2
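The steps above can be sketched in plain Python. In this illustrative sketch, a list of scripted text queries stands in for live speech input, and the tiny keyword dataset is an assumption made only for demonstration.

```python
# Illustrative sketch of the query-identification loop (steps 1-6 above);
# scripted queries replace live microphone input for demonstration.

COMMAND_KEYWORDS = {"send email", "search", "play music"}  # toy dataset

def identify(query):
    """Return the matched command keyword, 'exit' for the end command,
    or None when no keyword from the dataset appears in the query."""
    q = query.lower()
    if "exit" in q:
        return "exit"
    for keyword in COMMAND_KEYWORDS:
        if keyword in q:
            return keyword
    return None

def run(queries):
    """Process queries until the end command, collecting executed commands."""
    executed = []
    for query in queries:          # stands in for while(true) + speech input
        command = identify(query)
        if command == "exit":
            break                  # step 6: end command exits the loop
        if command is not None:
            executed.append(command)   # step 5: stand-in for execution
    return executed
```

In the real assistant, `run` would loop forever over microphone input and each matched keyword would dispatch to a task function rather than being collected in a list.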

2.4: Control Flow Diagram The following diagram describes the control flow of the proposed model, as shown in fig 2.4.

Fig.2.4 Control Flow Diagram

2.5: Dataflow Diagram Fig 2.5 shows the flow of data (speech) through the program for processing, where the query flow depends on the speech command.


Fig. 2.5 Dataflow Diagram

2.6: Speech Recognition Synthesis

2.6.1 Speech Recognition

A desktop voice assistant is based on human speech, in which speech recognition is the technology that enables a computer to capture the words spoken by a human with the help of a microphone. The speech recognizer then recognizes these words, and in the end the system outputs the recognized words. The process of speech recognition consists of different steps, discussed one by one in the following sections. The ideal situation is that a speech recognition engine recognizes all words uttered by a human; in practice, however, the performance of a speech recognition engine depends on several factors. Multiple speakers and noisy environments are the most significant of these factors, as shown in fig 2.6.

Fig 2.6 Methods of Speech Recognition Synthesis (overview diagram covering: types of speech recognition, the speech recognition process model, components of a speech recognition system, and speech recognition weaknesses and flaws)


2.6.1.1 Types of speech recognition

Speech recognition systems can be divided into a number of classes based on their ability to recognize words and the list of words they have. A few classes of speech recognition are classified as follows:

• Isolated Speech: Isolated-word recognition usually involves a pause between two utterances; it does not mean that it accepts only a single word, but rather that it requires one utterance at a time [4].
• Connected Speech: Connected-word recognition is similar to isolated speech but allows separate utterances with minimal pause between them.
• Continuous Speech: Continuous speech recognition allows the user to speak almost naturally; it is also called computer dictation.
• Spontaneous Speech: At a basic level, this can be thought of as speech that sounds natural and is not rehearsed. An ASR system with spontaneous speech ability should be able to handle a variety of natural speech features such as words run together, "ums" and "ahs", and even slight stutters.

2.6.1.2 Speech Recognition Process Model

Fig.2.6.1.2 Model of Speech Recognition Process

2.6.1.3 Components of Speech Recognition System

Voice Input: With the help of a microphone, audio is input to the system; the PC sound card produces the equivalent digital representation of the received audio.
Digitization: The process of converting the analog signal into digital form is known as digitization, and it involves both sampling and quantization. Sampling converts a continuous signal into a discrete signal, while approximating a continuous range of values to a finite set is known as quantization.
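Sampling and quantization can be illustrated on a toy signal. The sketch below is only a didactic model of what the sound card does; the sample rate, tone frequency, and number of quantization levels are arbitrary choices for demonstration.

```python
import math

def digitize(signal, sample_rate, duration, levels):
    """Sample a continuous signal at sample_rate Hz for `duration` seconds,
    then quantize each sample (assumed in [-1, 1]) to `levels` discrete steps."""
    n = int(sample_rate * duration)
    samples = [signal(i / sample_rate) for i in range(n)]   # sampling
    step = 2.0 / (levels - 1)
    quantized = [round(s / step) * step for s in samples]   # quantization
    return quantized

# A 5 Hz tone sampled at 40 Hz and quantized to 9 amplitude levels.
tone = lambda t: math.sin(2 * math.pi * 5 * t)
digital = digitize(tone, sample_rate=40, duration=0.5, levels=9)
```

Real digitization in a sound card works the same way in principle, but at rates such as 16 kHz or 44.1 kHz and with 16-bit (65,536-level) quantization.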


Acoustic Model: An acoustic model is created by taking audio recordings of speech and their text transcriptions and using software to create statistical representations of the sounds that make up each word. A speech recognition engine uses it to recognize speech; the acoustic model breaks the words into phonemes.
Language Model: Language modeling is used in many natural language processing applications, such as speech recognition, to capture the properties of a language and predict the next word in the speech sequence. The language model compares the phonemes to words in its built-in dictionary.
Speech Engine: The job of the speech recognition engine is to convert the input audio into text; to accomplish this, it uses all sorts of data, software algorithms, and statistics. Its first operation is digitization, as discussed earlier, to convert the audio into a suitable format for further processing. Once the audio signal is in a proper format, it searches for the best match by considering the words it knows; once the signal is recognized, it returns the corresponding text string.

2.6.1.4 Speech Recognition weakness and flaws

Despite all these advantages and benefits, a hundred-percent-perfect speech recognition system cannot be developed. Many factors can reduce the accuracy and performance of a speech recognition program. The speech recognition process is easy for a human but difficult for a machine; compared with the human mind, speech recognition programs seem far less intelligent. The human capabilities of thinking, understanding, and reacting are natural, while they are complicated tasks for a computer program. The program first needs to understand the spoken words with respect to their meanings, and it has to strike a sufficient balance between words, noise, and silences. A human has a built-in capability for filtering noise from speech, while a machine requires training; a computer requires help in separating the speech sound from other sounds.

2.6.2 Speech Synthesis

A speech synthesizer converts written text into spoken language. Speech synthesis is also referred to as text-to-speech (TTS) conversion as shown in fig.2.6.2.


Fig.2.6.2 Speech Synthesis Model (diagram of the pipeline: structure analysis, text pre-processing, text-to-phoneme conversion, prosody analysis, and waveform production)

The major steps in producing speech from text are as follows:

2.6.2.1 Structure analysis

Process the input text to determine where paragraphs, sentences, and other structures start and end. For most languages, punctuation and formatting data are used in this stage.

2.6.2.2 Text pre-processing

Analyze the input text for particular constructs of the language. In English, special treatment is required for abbreviations, acronyms, dates, times, numbers, currency amounts, email addresses, and many other forms. Other languages need special processing for these forms, and most languages have other specialized requirements.

2.6.2.3 Text-to-phoneme conversion

Convert each word to phonemes. A phoneme is a basic unit of sound in a language. US English has around 45 phonemes, including consonant and vowel sounds. For example, "times" is spoken as four phonemes "t ay m s." Different languages have different sets of sounds (different phonemes). For example, Japanese has fewer phonemes, including sounds not found in English, such as "ts" in "tsunami."
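A dictionary-based text-to-phoneme lookup can be sketched as follows. The tiny phoneme dictionary is purely illustrative; real synthesizers use large pronunciation lexicons plus letter-to-sound rules for unknown words.

```python
# Toy grapheme-to-phoneme lookup (illustrative entries only).
PHONEME_DICT = {
    "times": ["t", "ay", "m", "s"],   # the four-phoneme example from the text
    "hello": ["hh", "ah", "l", "ow"],
}

def to_phonemes(word):
    """Return the phoneme sequence for a word, or None if it is unknown."""
    return PHONEME_DICT.get(word.lower())
```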

2.6.2.4 Prosody analysis

The sentence structure, words, and phonemes determined in the previous steps are used to work out the appropriate prosody for the sentence. Prosody includes many features of speech other than the sounds of the words being spoken: the pitch (or melody), the timing (or rhythm), the pausing, the speaking rate, the emphasis on words, and many other features. Correct prosody is


essential for making speech sound right and for correctly conveying the meaning of a sentence.

2.6.2.5 Waveform production

Finally, the phonemes and prosody information are used to produce the audio waveform for each sentence. There are many ways in which the speech can be produced from the phoneme and prosody information. Most current systems do it in two ways: concatenation of chunks of recorded human speech or formant synthesis using signal processing techniques based on knowledge of how phonemes sound and how prosody affects those phonemes. The details of waveform generation are not typically necessary to application developers.

2.7 Mechanism used in Proposed Model

The proposed model, as shown in fig 2.7, uses the desktop system's default microphone for speech recognition. Converting speech to the relevant text is done through the API used by the speech recognition engine of the Python module speech_recognition. The speech_recognition instance created in the program first uses the microphone to capture the user's voice. The recorded audio is then passed to the speech_recognition engine, which sends the audio to a speech-to-text API on a server that returns the relevant text for the given speech. The engine offers multiple API services for speech-to-text conversion; this model uses recognize_google(), as Google has a large dataset and provides the most relevant results. The obtained text of the captured audio is then used for further processing, i.e., identifying and executing the user command.

Fig. 2.7 Proposed mechanism for speech to text conversion

2.8 Capturing and Identifying Voice The proposed model continuously captures the user's voice, scans it, and identifies whether the command is addressed to the assistant by checking for the triggering word 'jarvis'.


A command whose starting word is identified as 'jarvis' is forwarded for executing the task specified in the command. If the captured speech does not start with 'jarvis', the command is rejected and the assistant again starts capturing audio, as shown in fig 2.8. Voices in which the triggering keyword is not mentioned are ignored.

Fig.2.8 Accepting the command given to the assistant and rejecting other voices
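The trigger-word check can be sketched as a simple prefix test on the recognized text. The wake word 'jarvis' is from the paper; the function name is an illustrative assumption.

```python
def accept_command(recognized_text):
    """Accept the command only when it starts with the trigger word 'jarvis';
    return the remainder of the command, or None when the speech is rejected."""
    words = recognized_text.lower().split()
    if words and words[0] == "jarvis":
        return " ".join(words[1:])
    return None   # trigger word absent: ignore this audio and keep listening
```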

2.9 User command processing

The user's queries, once received, are converted to text statements. The general formats of commands given by the user are, for example:
1. send email to anurag
2. Akshay Kumar search on Wikipedia
3. Tell me news headlines
For any of these formats, the first processing step is to extract the command keywords. The predefined command models in the dataset are matched one by one against the keywords present in the query. When the command keywords match the query, the task is performed according to the command. If data is required for the execution, it is extracted from the remaining part of the query by removing the search keywords and identifiers.

2.9.1 Commands with Data and Identifiers

Processing this type of command involves three steps:
1. The main keyword of the command is matched against the command dataset to identify the query.
2. After the command is matched with a suitable command from the dataset, further action is taken to identify the current working directory for performing the query action.


3. To execute the command identified in the query, the data required for execution is extracted by removing the identified parts from the text query. This matched query data is forwarded to the executing program, as shown in fig 2.9.1. Commands for YouTube and email operations belong to this category.

Fig.2.9.1 Structure Representation of user query
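The three steps above can be sketched for the email example, where "send email" is the command keyword, "to" is the identifier, and the remainder of the query is the data. This is an illustrative sketch; the parsing in the actual assistant may differ.

```python
def parse_email_command(query):
    """Split a query like 'send email to juanmark0521' into the command
    keyword and the data that follows the identifier 'to'."""
    q = query.lower()
    if "send email" not in q:           # step 1: match the command keyword
        return None
    _, _, rest = q.partition(" to ")    # steps 2-3: strip identifier, keep data
    return {"command": "send email", "recipient": rest.strip()}
```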

2.9.2 Commands with Data Only These commands are pre-processed in the same manner described above:
1. As there are no identifiers present in the query, the command keyword is matched directly against the dataset.
2. After identifying the command, the query is executed with the data available in the query.
The commands for web searching and Wikipedia searching belong to this category.

2.9.3 Direct Commands These commands do not require as much processing as commands with data. The query is matched directly against the dataset by its main keywords. These commands are used for routine tasks; the commands for getting news headlines, opening a browser, and playing offline music belong to this category.

2.10 Limitations of the Speech Recognition Model Used

The following are some limitations of the model used in the proposed system. • Noise in the surroundings makes it challenging to capture and process the user's voice.


• The desktop system must have the required hardware, i.e., an inbuilt microphone and speaker.
• Commands given by users that the voice assistant does not correctly recognize cannot be evaluated as text; they are discarded.
• Every command must start with 'Jarvis'.

3. Experimental Setups and Result Outcomes

A Windows prompt interface is used for the working of the assistant. Control inputs are taken through voice, and the results are given as audio as well as text in the window.

3.1 System Requirements
1. Intel Core i3 processor with a minimum of 4 GB RAM
2. Active Internet connection
3. Microsoft Windows operating system
4. System microphone and speaker

3.2 Starting Interface and Setup The assistant starts by greeting the user, as shown in fig 3.2(a) and (b), and displaying a brief introduction in the window. After that, the user can change the setup details.

Fig.3.2(a) Starting Interface


Fig.3.2(b) Setup interface

3.3 Getting News Updates The assistant can present the user with the latest news headlines from across the world. It speaks the headlines as audio and also logs them onto the screen window for reading, as shown in fig 3.3. User Voice Input: Jarvis tell me the latest news headlines Output: Speak news headlines

Fig.3.3 Showing result for news headlines


3.4 Setting a Reminder When the assistant is given the command to set a reminder, it asks what it should remind the user about; the user describes the reminder, and then a GUI appears to set the time, as shown in fig.3.4. User voice command: Jarvis set reminder for me Output: preparing reminder -> take reminder description as voice input -> take day and time -> schedule the reminder

Fig.3.4 Setting the reminder
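A reminder can be scheduled with the standard library's threading.Timer, as a simplified stand-in for the GUI-driven scheduling described above. The callback here just records the description; in the assistant it would speak the reminder aloud. The delay and description are illustrative values.

```python
import threading

reminders = []   # fired reminders collect here (stand-in for speaking aloud)

def set_reminder(description, delay_seconds):
    """Schedule a reminder to fire after delay_seconds using a timer thread."""
    timer = threading.Timer(delay_seconds,
                            lambda: reminders.append(description))
    timer.start()
    return timer

t = set_reminder("team meeting", 0.05)
t.join()   # wait for the reminder to fire (for demonstration only)
```

In the real assistant the timer would run in the background while the query loop keeps listening, rather than being joined immediately.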

3.5 Making a text note

To make a text note, the command is given by the user; after identifying the query, the assistant starts taking voice input continuously and stores it in a buffer until the user completes the note. It then writes the given note into a text file with a date-time stamp, as shown in fig.3.5(a) and (b).

User voice input: Jarvis make note for me Output: Taking Note->saving to the text file in document


Fig. 3.5(a) Taking Note from the user

Fig 3.5(b) Text file of saved note
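Saving a note to a timestamped text file can be sketched with the standard library. The file-name pattern and target directory are illustrative assumptions; a temporary directory is used here only for demonstration.

```python
import datetime
import os
import tempfile

def save_note(text, directory):
    """Write the dictated note to a text file named with a date-time stamp."""
    stamp = datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    path = os.path.join(directory, f"note_{stamp}.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    return path

# Demonstration: save a note into a temporary directory.
tmp = tempfile.mkdtemp()
note_path = save_note("buy milk", tmp)
```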

3.6 Sending Email

To send an email, the receiver's email ID is obtained from the database contacts, and the content to be sent is taken from the user's voice. After capturing continuous voice into a buffer, it is sent as an email, as shown in fig.3.6.

User Voice Input: Jarvis send email to juanmark0521 Output: Preparing email -> getting email address from contacts -> Taking content for email through user voice -> sending the email



Fig.3.6 Preparing and sending email
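The email itself can be assembled with the standard library's email module and sent with smtplib. The sketch below only builds the message; the addresses, server, and credentials are placeholders, and the actual send is shown in comments rather than executed.

```python
from email.mime.text import MIMEText

def build_email(sender, recipient, body):
    """Assemble a plain-text email message from dictated content."""
    msg = MIMEText(body)
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Message from voice assistant"   # illustrative subject
    return msg

msg = build_email("me@example.com", "friend@example.com", "hello from Jarvis")

# Sending would then use smtplib, e.g. (placeholders, not executed here):
# import smtplib
# with smtplib.SMTP("smtp.example.com", 587) as s:
#     s.starttls()
#     s.login(user, password)
#     s.send_message(msg)
```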

3.7 Sending Email with Attachment To send an email with an attachment, the command is given to the assistant and, based on it, a GUI is provided to format the email and add the attachment. As fig.3.7 shows, after getting the required input the assistant sends the email. User Voice Input: Jarvis make email with attachments Output: Preparing the email -> GUI for filling email content -> user browses for the file -> Sending email

Fig.3.7 Preparing and sending email with attachments

3.8 Wikipedia Search To search Wikipedia, the search data is extracted from the query. The result is then fetched from Wikipedia and presented to the user as text and voice, as shown in fig.3.8.


User Voice Input: Jarvis search Narendra Modi Wikipedia Output: Search result fetched from Wikipedia -> Presented to the user as text and audio

Fig.3.8 Presenting Wikipedia result

3.9 Playing Video on YouTube To play a video on YouTube, the data describing the video is fetched from the query; after filtering the results obtained for the search data, the best-suited video is played, as shown in fig.3.9. User Voice Input: Jarvis play Hanuman Chalisa on Youtube Output: Searching for the appropriate result on Youtube -> Play the found video in the default web browser

Fig.3.9 Playing Youtube Video


3.10 Downloading a Video from YouTube There are two options for downloading a YouTube video: first, provide the video link explicitly; second, if a video is already playing on YouTube through the assistant, it is downloaded directly to desktop storage at the location provided in the setup, as shown in fig 3.10(a) and (b).

User voice input: Jarvis download this youtube video Output: Preparing download resources -> Download the video to the Download directory Note: The user can also provide the video link explicitly.

Fig 3.10(a) Downloading youtube video

Fig.3.10(b) Downloaded Video saved to Download Directory


3.11 Web Searching To search the web, the search data is embedded in the URL, and the result is shown, as in fig 3.11(a) and (b), using the desktop's default web browser. User Voice Input: Jarvis search new cabinet minister on google Output: Preparing URL for the given search data -> Showing result in the default web browser

Fig.3.11(a) Getting query for web searching

Fig.3.11(b) Showing result in default Web Browser
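Embedding the spoken search data in a URL can be sketched with the standard library; the query string must be percent-encoded so that spaces and special characters survive. The search endpoint shown is Google's public one, and opening the browser is left as a comment.

```python
import urllib.parse

def search_url(query):
    """Build a Google search URL with the spoken query safely encoded."""
    return "https://www.google.com/search?q=" + urllib.parse.quote_plus(query)

url = search_url("new cabinet minister")
# webbrowser.open(url) would then show the result in the default browser.
```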

3.12 Taking a Screenshot After receiving the command to take a screenshot, the captured image is saved to the Images directory, as shown in fig. 3.12(a) and (b). User Voice Input: Jarvis capture screen


Output: Screenshot taken -> saved to Images Directory

Fig.3.12(a) Taking Screenshot on user command

Fig.3.12(b) Saving Screenshot in Image Directory

3.13 Battery Status The assistant gives the battery percentage, plugged-in status, and estimated battery time remaining in both visual and voice form, as shown in fig 3.13. User Voice Input: Jarvis show current battery status Output: Battery status is shown


Fig. 3.13 Showing Battery Status to User

3.14 Programming Language and Tools

3.14.1.Python Programming Language

Python is an interpreted, high-level, general-purpose programming language. Python's design philosophy emphasizes code readability, with its notable use of significant indentation. Its language constructs and object-oriented approach aim to help programmers write clear, logical code for small and large-scale projects. Python is dynamically typed and garbage-collected. It supports multiple programming paradigms, including structured (mainly procedural), object-oriented, and functional programming. Python is often described as a "batteries included" language due to its comprehensive standard library. Guido van Rossum began working on Python in the late 1980s as a successor to the ABC programming language and first released it in 1991 as Python 0.9.0. Python 2.0 was released in 2000 and introduced new features such as list comprehensions and a garbage collection system using reference counting. Python 3.0 was released in 2008 and was a significant revision of the language that is not entirely backward-compatible; much Python 2 code does not run unmodified on Python 3. Python 2 was discontinued with version 2.7.18 in 2020. This study uses Python 3.7 as the programming language. Python has been widely used in data science, machine learning, software development, and web development; the main reasons for its wide use are its readability and its flexibility across domains, with libraries, modules, and frameworks for each [6],[7],[8],[9].

3.14.2.SQL

SQL (Structured Query Language) is a domain-specific language used in programming, designed for managing data held in a relational database management system (RDBMS) or for stream processing in a relational data stream management system (RDSMS). It is advantageous in handling structured data, i.e., data incorporating relations among entities and variables. In this study, SQLite3, Python's built-in database, is used to store user data, and SQL is used to manipulate and update the SQLite3 database. SQL is a high-level language whose syntax is similar to general English sentences, which makes it easily readable [10],[11].
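Storing the contacts used by the email commands can be sketched with Python's sqlite3 module. The table name, columns, and sample data are illustrative, and an in-memory database stands in for the on-disk file the assistant would use.

```python
import sqlite3

# In-memory database for demonstration; the assistant would use a file.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IF NOT EXISTS contacts (name TEXT PRIMARY KEY, email TEXT)"
)
conn.execute("INSERT INTO contacts VALUES (?, ?)",
             ("juanmark0521", "juanmark0521@example.com"))
conn.commit()

def lookup_email(name):
    """Fetch a contact's email address by name, or None if absent."""
    row = conn.execute("SELECT email FROM contacts WHERE name = ?",
                       (name,)).fetchone()
    return row[0] if row else None
```

Parameterized queries (the `?` placeholders) keep voice-derived strings from being interpreted as SQL, which matters when the data comes from recognized speech.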


3.14.3. Programming tools

1. pyttsx3 pyttsx3 is a text-to-speech conversion library in Python that is very easy to use; an engine instance converts the entered text into speech. Unlike alternative libraries, it works offline and is compatible with both Python 2 and 3. An application invokes the pyttsx3.init() factory function to get a reference to a pyttsx3 engine. o The pyttsx3 module supports two voices, female and male, provided by "sapi5" on Windows. o pyttsx3 is used for the assistant's speech output and works in offline mode.

2. sapi5 driver o Microsoft Speech API (SAPI5) is the technology for voice recognition and synthesis provided by Microsoft. Starting with Windows XP, it ships as part of the Windows OS. o The Python pyttsx3 library uses the SAPI5 driver to manage the program's speech output. o It supports two voices, one male and one female.

3. speech_recognition
o speech_recognition is a Python library used to convert given audio data to text.
o It uses the system microphone as a source for the user's voice and, with the help of the Google Speech Recognition API, converts the audio into the text data used for functioning.
o It also supports several other APIs for converting speech to text.

4. PyAudio
o PyAudio is a Python binding used for working with audio streams.
o PyAudio provides Python bindings for PortAudio, the cross-platform audio I/O library. With PyAudio, users can easily use Python to play and record audio on a variety of platforms.

5. wikipedia
o wikipedia is a Python library that makes it easy to access and parse data from Wikipedia.
o It is used to search Wikipedia, get article summaries, get data such as links and images from a page, and more.
o It wraps the MediaWiki API so users can focus on using Wikipedia data.

6. pytube
o pytube is a lightweight, dependency-free Python library used for downloading videos from the web.
o pytube is used to sort the different file formats and links from the YouTube media to get the specific URL for downloading the video to the user's system.

7. Tkinter
Tkinter is the standard GUI library for Python. Python, when combined with Tkinter, provides a quick and easy way to create GUI applications. Tkinter provides a powerful


object-oriented interface to the Tk GUI toolkit. Creating a GUI application using Tkinter is an easy task; all a user needs to do is perform the following steps:
o Import the Tkinter module.
o Create the GUI application's main window.
o Add one or more widgets to the GUI application.
o Enter the main event loop to react to each event triggered by the user.
Tkinter is used to build the graphical user interface for interacting with the user. It is lightweight and one of the default modules of Python 3. In this study, the GUI is used for setting the timer and sending an email.

8. smtplib
Simple Mail Transfer Protocol (SMTP) is the protocol that handles sending emails and routing them between mail servers. Python provides the smtplib module, which defines an SMTP client session object that can send mail to any Internet machine with an SMTP or ESMTP listener daemon. Here, it is used to connect the sender and the desktop assistant so that emails can be sent from Python scripts.

9. Sqlite3
Sqlite3 is a built-in offline database module in Python used to store data. In this study, SQLite3 is used to store user data such as email login credentials, default directory paths, the user's email contacts, and the data used to set up reminders.

10. PyInstaller

This is a Python tool to create an executable file from a Python script. PyInstaller bundles a Python application and all its dependencies into a single package. The user can then run the packaged app without installing a Python interpreter or any modules. PyInstaller supports Python 3.6 or newer and correctly bundles major Python packages such as NumPy, PyQt, Django, wxPython, etc. PyInstaller is tested against Windows, Mac OS X, and GNU/Linux. However, it is not a cross-compiler: to make a Windows app, users run PyInstaller on Windows; to make a GNU/Linux app, on GNU/Linux, and so on. PyInstaller has been used successfully with AIX, Solaris, FreeBSD, and OpenBSD, but testing against them is not part of its continuous integration tests. Using PyInstaller, we can convert our Python application into a Windows executable file, which can run freely on the Windows operating system. It also makes the application functional on a system that does not have Python and its modules installed.
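As a sketch, a single-file Windows executable for the assistant could typically be built with a command like the following (the script name is an illustrative assumption, not taken from the paper):

```
pyinstaller --onefile --noconsole assistant.py
```

Here --onefile bundles everything into one executable, and --noconsole suppresses the console window on Windows so only the assistant's GUI appears.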

3.15 Working of the Assistant

The assistant works in the following way:
• The assistant starts with a greeting and waits continuously for input from the user.
• When a user opens it for the first time, it asks the user to complete a setup in which some necessary details must be provided for its functioning.


• It takes the user's email account credentials for sending emails.
• It takes paths for the default directories in which it will manage documents and downloaded files.
• It also takes some of the user's contact emails for communication.
• Users can also add other directory paths for file management.
• All these settings can be changed afterward as well.
• All this data is stored using sqlite3, the built-in offline database for Python.
• After the setup, the assistant is ready for work.
• Give it voice commands to automate the user's tasks.
• Downloading tasks run in parallel on another thread in the background.
• When a user wants to watch videos on YouTube, give it a command like 'Jarvis, play … on YouTube'.
• When a user wants to send an email to a contact, give it a command like 'Jarvis, send email to …'; it will ask the user what to send, and the content is given through voice. It will then send the email to the user's contact.
• If a user wants to download a YouTube video to their device, give it a command like 'Jarvis, download the YouTube video for me'. It will download the video the user is playing, or it will ask the user to provide the link externally.
• If the user wants to search for anything or any person, give it a command like 'Jarvis, search for …'; it will provide the search results and speak the response from Wikipedia.
• Users can also do a direct search on Wikipedia.
• It can take a quick note with a command like 'make a note.'
• Users can set a reminder by giving a command like 'set reminder for me.'
• Users can play music from their music directory by giving a command like 'play some music.'
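The command phrases above suggest a simple keyword-based router. A minimal sketch of how recognized queries might be mapped to handlers follows; the trigger phrases and handler labels are illustrative assumptions, not the paper's actual routing code.

```python
# Hypothetical command dispatcher: checks for trigger phrases in the
# recognized query and returns a label naming the matched task.
def dispatch(query: str) -> str:
    query = query.lower()
    if "send email" in query:
        return "email"
    if "download youtube video" in query:
        return "download"
    if "on youtube" in query:
        return "play_youtube"
    if "search for" in query:
        return "wikipedia_search"
    if "set reminder" in query:
        return "reminder"
    if "play some music" in query:
        return "music"
    return "unknown"

print(dispatch("jarvis play a song on youtube"))  # play_youtube
```

Note that the more specific phrase "download youtube video" is tested before the generic "on youtube" so a download request is not mistaken for a playback request.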

4. Programming Codes for the functioning of the Assistant

Following are the main functions of the assistant with their programming code.

4.1.Taking command from the User

The command is taken from the user with the help of the speech_recognition module and the microphone service of the operating system. The command is then converted into text using the Google Speech Recognition API, and this query is returned to the calling program for further processing. Following is the Python code for taking the user command:

Code for the user query command

def takeCommand():
    # It takes microphone input from the user and returns string output
    r = sr.Recognizer()
    with sr.Microphone() as source:


        print("Listening...")
        r.pause_threshold = 1
        audio = r.listen(source)
    try:
        print("Recognizing...")
        query = r.recognize_google(audio, language='en-in')
        print(f"User said: {query}\n")
    except Exception as e:
        # print(e)
        print("Say that again please...")
        return "None"
    return query

4.2.For sending Email

For sending email, the smtplib and email EmailMessage modules are used. The recipient's email id is taken from the database according to the user's command; if the user wants to send email to an unknown contact, the email id is taken from the user. The email content is taken as voice input from the user. Before sending the email, the assistant asks the user whether they want to add any document as an attachment; according to the user's input, the document is grabbed from the directory, or the user is asked to attach a file by giving its path. The email is then sent using the SMTP SSL server. For sending emails, the user's Gmail account credentials are required to be set during the setup. Following is the programming code used for sending email.

Code for sending Email

def sendEmail(to, content, file_path=None):
    msg = EmailMessage()
    msg['From'] = str(getEmail_id())
    msg['To'] = str(to)
    msg.set_content(str(content))
    if file_path != None:
        with open(file_path, 'rb') as f:
            file_data = f.read()
            file_name = f.name  # name is an attribute, not a method
        msg.add_attachment(file_data, maintype='application',
                           subtype='octet-stream', filename=file_name)
    server = smtplib.SMTP_SSL("smtp.gmail.com", 465)
    server.ehlo()
    # server.starttls()
    server.login(str(getEmail_id()), str(getEmail_pass()))
    server.send_message(msg)
    server.quit()


4.3. Playing and downloading Videos from YouTube

For playing videos on YouTube, the urllib, webbrowser, and "pytube" modules are used. When the user asks the assistant to play a YouTube video, it extracts the search string from the command, fetches the HTML of the YouTube search results, filters the first video link out of that HTML, and then plays the video in the web browser through this link. The same link is used for downloading the video from YouTube using the pytube module. The playing and downloading of YouTube videos is done with the following program code.

Code for playing and downloading

def getYTSearchString(s):
    return s.replace(' ', '+')

# Play the YouTube video of the given title
def playYoutube(videoTitle):
    search_keyword = getYTSearchString(videoTitle)
    html = urllib.request.urlopen(
        "https://www.youtube.com/results?search_query=" + search_keyword)
    video_ids = re.findall(r"watch\?v=(\S{11})", html.read().decode())
    url = "https://www.youtube.com/watch?v=" + video_ids[0]
    ytvt = YouTube(url)
    print(ytvt.title)
    global TEMPRORY_VARIABLE_youtubePlayingVideoLink
    TEMPRORY_VARIABLE_youtubePlayingVideoLink = url
    time.sleep(1)
    webbrowser.open(url)

# Download the YouTube video of the given link in another thread
def downloadYoutubeVideo(link):
    yt = YouTube(link)
    # Showing details
    print("Title: ", yt.title)
    print("Number of views: ", yt.views)
    print("Length of video: ", yt.length)
    print("Rating of video: ", yt.rating)
    # Getting the highest resolution possible
    ys = yt.streams.get_highest_resolution()
    # Starting download
    loc = getDefaultDirectoryPath(defaultDirectories[1])
    try:
        if loc == None:
            print('Directory not set')
            speak('directory not set for downloads')
        else:
            print("Downloading...")
            ys.download(str(loc))


            print("Download completed!!")
    except Exception as e:
        print(e)

4.4. Searching on Wikipedia

The wikipedia module is used to get the search result from Wikipedia, and that content is provided to the user: the assistant speaks the search results. The search keyword is extracted from the query made by the user, and that keyword is searched through the wikipedia module, which returns the search result in text form. This text is spoken by the speak function defined in the program.
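The keyword-extraction step can be sketched as below. The trigger phrase and helper name are illustrative assumptions, not the paper's exact code.

```python
# Hypothetical extraction of the search keyword from a recognized query:
# everything after the trigger phrase is treated as the keyword.
def extractSearchKeyword(query: str, trigger: str = "search for") -> str:
    query = query.lower()
    if trigger in query:
        return query.split(trigger, 1)[1].strip()
    return query.strip()

print(extractSearchKeyword("jarvis search for alan turing"))  # alan turing
```

The extracted keyword would then be passed to the wikipedia module's summary lookup, and the returned text handed to the speak function.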

4.5.Taking Screenshot

The screenshot is taken using pyautogui, then converted to a numpy array and to BGR so that it can be saved to storage. Following is the Python code for taking a screenshot:

Code for screenshot

import cv2
import numpy as np
import pyautogui

image = pyautogui.screenshot()
# convert it to a numpy array and BGR
# so we can write it to the disk
image = cv2.cvtColor(np.array(image), cv2.COLOR_RGB2BGR)
# writing it to the disk using opencv
cv2.imwrite("image1.png", image)

4.6. Searching on the Web Browser
Searching on the web browser is done using the urllib and webbrowser modules. Based on the user's input, the search query URL is generated and opened with the web browser.
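A minimal sketch of that step follows; the search engine URL is an assumption, since the paper does not specify which one is used.

```python
import urllib.parse
import webbrowser

def buildSearchUrl(query: str) -> str:
    # percent-encode the query so spaces and symbols are URL-safe
    return "https://www.google.com/search?q=" + urllib.parse.quote_plus(query)

url = buildSearchUrl("voice control desktop assistant")
print(url)
# webbrowser.open(url)  # would open the URL in the user's default browser
```

quote_plus handles the encoding, turning spaces into '+' and escaping any characters that are not valid in a URL query string.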

5. Conclusion
The voice assistant is a small work toward automating user tasks through human-machine interaction. It is designed to minimize the human effort needed to interact with many other subsystems that would otherwise have to be operated manually. With more integration of technologies like machine learning and Natural Language Processing, the system can make human life more comfortable and perform more complex tasks with automation. The advantage of a voice desktop assistant is that it performs tasks with less human effort: there is less physical interaction with the computer, and tasks can run in parallel with the user's other activities. The limitation of this study is that, while running in parallel with other applications, the assistant could affect their performance. Furthermore, it requires the system microphone, and a noisy environment can disturb its working. It provides many applications: it can perform the user's daily tasks and is helpful for people with vision disabilities. In the future, it can be integrated with deep learning and Natural


Language Processing methods to perform tasks with more flexibility in the language and queries given to it.

References

[1] Hector Perez-Meana, "Advances in Audio and Speech Signal Processing: Technologies and Applications", US, Idea Group Publishing, 2007.
[2] Jinyu Li, Li Deng, Reinhold Haeb-Umbach, Yifan Gong, "Robust Automatic Speech Recognition: A Bridge to Practical Applications", 2016.
[3] Dwivedi S, Dutta A, Mukerjee A and Kulkarni P, "Development of a speech interface for control of a biped robot," Proc. 2004 IEEE International Workshop on Robot and Human Interactive Communication, IEEE Press, Sept. 2004, pp. 601-605.
[4] B. Suhm, B. Myers and A. Weibel, "Multimodal Error Correction for Speech User Interfaces", ACM Transactions on Computer-Human Interaction, 8(1), pp. 60-98, doi:10.1145/371127.371166.
[5] Van Rossum, Guido, "An Introduction to Python for UNIX/C Programmers", Proceedings of the NLUUG Najaarsconferentie (Dutch UNIX Users Group), pp. 1-8, 1993.
[6] Kuhlman, Dave, "A Python Book: Beginning Python, Advanced Python, and Python Exercises", 23 June 2012.
[7] Guttag, John V., "Introduction to Computation and Programming Using Python: With Application to Understanding Data", MIT Press, ISBN 978-0-262-52962-4, 12 August 2016.
[8] Allen, Grant; Owens, Mike, "The Definitive Guide to SQLite" (2nd ed.), Apress, p. 368, ISBN 978-1-4302-3225-4, November 5, 2010.
[9] Kreibich, Jay A., "Using SQLite" (1st ed.), O'Reilly Media, p. 528, ISBN 978-0-596-52118-9, August 17, 2010.
[10] Newman, Chris, "SQLite (Developer's Library)" (1st ed.), Sams, p. 336, ISBN 0-672-32685-X, November 9, 2004.
[11] Hinegardner, Jeremy, "Skype client using SQLite?", sqlite-users (mailing list), August 28, 2007.
