SEMOUR: A Scripted Emotional Speech Repository for Urdu Nimra Zaheer Obaid Ullah Ahmad Ammar Ahmed Information Technology University Information Technology University Information Technology University Lahore, Pakistan Lahore, Pakistan Lahore, Pakistan
[email protected] [email protected] [email protected] Muhammad Shehryar Khan Mudassir Shabbir Information Technology University Information Technology University Lahore, Pakistan Lahore, Pakistan
[email protected] [email protected] Mfcc Chroma Mel spectrogram Script Actors Recording Audio Clip Human Experimentation Audio Clip Feature Extraction DNN Preparation Recruitment Sessions Segmentation Annotation SEMOUR (scripted emotional speech repository for Predictions Urdu) Figure 1: This figure elaborates steps involved in the design of SEMOUR: scripted emotional speech repository for Urdu along with the architecture for our proposed machine learning network for Speech Emotion Recognition. ABSTRACT 1 INTRODUCTION Designing reliable Speech Emotion Recognition systems is a com- Sound waves are an information-rich medium containing multi- plex task that inevitably requires sufficient data for training pur- tudes of properties like pitch (relative positions of frequencies on poses. Such extensive datasets are currently available in only a few a scale), texture (how different elements are combined), loudness languages, including English, German, and Italian. In this paper, (intensity of sound pressure in decibels), and duration (length of we present SEMOUR, the first scripted database of emotion-tagged time a tone is sounded), etc. People instinctively use this informa- speech in the Urdu language, to design an Urdu Speech Recogni- tion to perceive underlying emotions, e.g., sadness, fear, happiness, tion System. Our gender-balanced dataset contains 15, 040 unique aggressiveness, appreciation, sarcasm, etc., even when the words instances recorded by eight professional actors eliciting a syntac- carried by the sound may not be associated with these emotions.