Proc. IEEE Canadian Conf. Electrical, Computer Engineering (St. John’s, NF), pp. 470-473, May 1997
COMPARISON OF VOICE ACTIVITY DETECTION ALGORITHMS FOR
WIRELESS PERSONAL COMMUNICATIONS SYSTEMSy
Khaled El-Maleh and Peter Kabal
TSP Lab oratory, Dept. of Electrical Eng.
McGill University
Montreal, Queb ec, Canada H3A 2A7
email: fkhaled,[email protected]
ABSTRACT (VBR) co ding has b een recently adopted for CDMA-
based cellular and PCS systems to enhance capacity
V oice activity detection (VAD) algorithms havebe-
b y reducing interference. A V AD device is an indis-
come an in tegral part of many of the recently stan-
p ensable part of any VBR co dec as it controls b oth the
dardized wireless cellular and P ersonal Communica-
a verage bit rate, and the o v erall qualit y of the co der
tions Systems (PCS). In this pap er, we presentacom-
[3].
parative study of the p erformance of three recently pro-
This pap er is organized in ve parts. Section 2
posed VAD algorithms under various acoustical back-
starts with a general review of the basics of a V AD
ground noise conditions. We also prop ose new ideas to
design. It then describ es, in some detail, three selected
enhance the p erformance of a VAD algorithm in wire-
V AD designs for the this study. Section 3 rep orts the
less PCS sp eech applications.
results of a simulation study that has b een p erformed
to study the p erformance of the three VADs under var-
1. INTRODUCTION
ious background noise conditions. We then discuss and
analyze the simulation results in Section 4. Finally,
Conv ersational sp eech is a sequence of consecutive seg-
conclusions are presented in Section 5.
ments of silence and sp eech. In wireless telephony,the
user is often roaming and th us encountering di erent
2. VOICE ACTIVITY DETECTION
types and levels of background acoustical noises. This
ALGORITHMS
bac kground noise which contaminates the signal results
in either noise only or sp eech plus noise segments. In
The basic principle of a VAD device is that it extracts
manyspeech pro cessing applications, it is desirable to
some measured features or quantities from the input
detect sp eech in the noise. This pro cess is called voice
signal and then compare these values with thresholds,
activit y detection (VAD) [1]. The VAD op eration can
usually extracted from noise only p erio ds. Voice ac-
be viewed as a decision problem in which the detec-
tivity(VAD=1) is declared if the measured values ex-
tor decides b etw een noise only,orofspeech plus noise.
ceed the thresholds. Otherwise, no sp eec h activityor
This is a challenging problem in noisy acoustical envi-
noise (VAD=0) is present. What generally character-
ronments.
izes a VAD design is the wa y it selects its features, and
V oice activity detection is used in a variet y of sp eech
the wa y it de nes and up dates the thresholds. In gen-
communication systems such as sp eech co ding, sp eech
eral, a V AD algorithm outputs a binary decision in a
recognition, hands-free telephony , audio-conferencing,
frame-by-frame basis where a \frame" of the input sig-
and echo cancellation. The recently prop osed multiple-
nal is a short unit of time such as 20{40 ms. Accuracy,
access sc hemes, such as CDMA, and enhanced TDMA
robustness to noise conditions, simplicity , adaptation,
for cellular and PCS systems use some form of V AD
and real-time pro cessing are some of the required fea-
[1]. Moreov er, in GSM-based wireless systems a VAD
tures of a go o d VAD.
mo dule is used for discontinuous transmission to sav e
In the early V AD algorithms, short-time energy,
the battery life of p ortable units [2]. V ariable bit rate
zero crossing rate, and LPC co ecients w ere among
the common features used in the detection pro cess [4].
yThis work w as supp orted b y a grant from the Canadian
Cepstral features [5], formant shap e [6], a least-square
Institute for T elecommunications Research under the NCE pro-
gram of the Gov ernment of Canada p erio dicity measure [7] are some of the recent ideas in
VAD designs. 2.3. The TOS VAD
In this work, we consider three recently prop osed
Symmetrically distributed (non-skewed) pro cesses are
VAD algorithms. These include the VAD used in the
characterized to have a third-order cumulant (TOC)
GSM cellular system [1,2], the VAD used in the en-
that is identically zero at all lags. However, sp eechsig-
hanced variable rate co dec (EVRC) of the North Amer-
nals have b een observed exp erimentally to be skewed
ican CDMA-based PCS and cellular systems [3], and a
enough to pro duce signi cantly non-zero TOC at all
third-order statistics (TOS)-based VAD [8].
lags. Under the assumption that many noises can be
mo deled as Gaussian or symmetrically distributed pro-
2.1. The GSM VAD
cesses, it is p ossible to discriminate sp eech from noise.
In [8], a novel time domain Gaussianity test is used in
In the GSM VAD, an adaptive noise-suppressor lter
the sp eech detection pro cess. The test statistic of this
^
is used to lter the input signal frame. The co ecients
VAD, d is de ned as
of the lter are computed during noise-only p erio ds.