Applied Soft Computing 15 (2014) 121–135
Contents lists available at ScienceDirect
Applied Soft Computing
j ournal homepage: www.elsevier.com/locate/asoc
Moving object detection using Markov Random Field and Distributed
Differential Evolution
a,∗ a b
Ashish Ghosh , Ajoy Mondal , Susmita Ghosh
a
Machine Intelligence Unit, Indian Statistical Institute, 203 B.T. Road, Kolkata 700108, India
b
Department of Computer Science and Engineering, Jadavpur University, Kolkata 700032, India
a
r t a b
i s
c l e i n f o t r a c t
Article history: In this article, we present an algorithm for detecting moving objects from a given video sequence. Here,
Received 16 March 2012
spatial and temporal segmentations are combined together to detect moving objects. In spatial segmen-
Received in revised form 22 July 2013
tation, a multi-layer compound Markov Random Field (MRF) is used which models spatial, temporal, and
Accepted 25 October 2013
edge attributes of image frames of a given video. Segmentation is viewed as a pixel labeling problem
Available online 9 November 2013
and is solved using the maximum a posteriori (MAP) probability estimation principle; i.e., segmenta-
tion is done by searching a labeled configuration that maximizes this probability. We have proposed
Keywords:
using a Differential Evolution (DE) algorithm with neighborhood-based mutation (termed as Distributed
Markov Random Field
Differential Evolution (DDE) algorithm) for estimating the MAP of the MRF model. A window is consid-
MAP estimation
ered over the entire image lattice for mutation of each target vector of the DDE; thereby enhancing the
Differential Evolution
Distributed Differential Evolution speed of convergence. In case of temporal segmentation, the Change Detection Mask (CDM) is obtained
Object detection by thresholding the absolute differences of the two consecutive spatially segmented image frames. The
intensity/color values of the original pixels of the considered current frame are superimposed in the
changed regions of the modified CDM to extract the Video Object Planes (VOPs). To test the effective-
ness of the proposed algorithm, five reference and one real life video sequences are considered. Results
of the proposed method are compared with four state of the art techniques and provide better spatial
segmentation and better identification of the location of moving objects.
© 2013 Elsevier B.V. All rights reserved.
1. Introduction decades, MRF models [3–7] and Hidden MRF models [8] are used
for segmenting images and these models are quite popular for
Detection and tracking of moving objects from a given video segmenting video sequences [9–20]. In [9,21], to detect moving
sequence are challenging tasks in video processing and have been objects, MRF model was considered in the spatial direction only.
an active research area for the last few decades [1,2]. It has wide Whereas, in a video, two image frames are not independent of
applicability in video surveillance, event detection, anomaly detec- each other. Hence, temporal coherence can be explored to obtain
tion, dynamic scene analysis, activity recognition, activity based better segmented output. Considering this temporal coherence, a
human recognition [1,2] etc. To detect moving objects for a given multi-layer MRF modeling is adhered to in [10–16] (one frame in
video sequence, various object regions which are moving with temporal direction). Kim and Park [11] showed that a combination
respect to their background are identified [2]. It can be performed of spatial segmentation and temporal segmentation provides better
using (i) change/motion detection and (ii) motion estimation [2] results for detecting moving objects. To preserve the boundary of
and the detected objects, in turn, can be tracked. Tracking provides the objects, Sududhi et al. [12] proposed a multi-layer compound
the velocity, acceleration and position of moving objects at different MRF model which takes care of the spatial distribution of color,
time instants. temporal color coherence, and edge map/line-field in the tempo-
Literature on detection of moving objects using different ral frames (two frames in temporal direction) to obtain a spatial
stochastic models is quite rich. Markov Random Field (MRF) model segmentation. Temporal coherence helps in preserving the region
is one such approach which represents spatial continuity among information in segmentation. The use of edge map/line-field can
adjacent pixels and is robust to degradation. Since the last few preserve the accurate object boundary in segmentation.
To the best of our knowledge, in the literature no method is
available which exploits the merits of Differential Evolution (DE)
∗ scheme with MRF model. In the present article, the applicability of
Corresponding author. Tel.: +91 33 2575 3110/3100; fax: +91 33 2578 3357.
DE has been explored with MRF model for moving object detec-
E-mail addresses: [email protected] (A. Ghosh), [email protected]
(A. Mondal), [email protected] (S. Ghosh). tion. A preliminary experiment of this work was reported in [17].
1568-4946/$ – see front matter © 2013 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.asoc.2013.10.021
122 A. Ghosh et al. / Applied Soft Computing 15 (2014) 121–135
A robust analysis of the results with application in different com- modeling), DE algorithm, DE algorithm with neighborhood-based
plex video sequences is reported in the present article. Comparison mutation, the proposed DDE algorithm for MAP estimation, seg-
with recent state-of-the-art techniques along with its quantitative mentation of subsequent frames based on change information, and
analysis has also been made in the present version. temporal segmentation are described. Experimental results and
The present work considers, a spatial segmentation scheme discussion are given in Section 4 and conclusive remarks are put
where multi-layer compound MRF model [12] has been used for in Section 5.
detecting moving objects in a given video sequence. Spatial seg-
mentation as well as temporal segmentation are combined for
detecting moving objects. The assumed MRF model takes care of 2. Related work
spatial distribution of color, temporal color coherence, and edge
map/line-field in temporal direction (two frames in the temporal For change/motion detection initially a reference frame is con-
direction) of a given video frame. The considered MRF model is sidered where no objects are present [2]. Detection of moving
found to preserve accurate object boundary. The spatial segmen- objects becomes very difficult using temporal segmentation if the
tation using the considered multi-layer compound MRF model is reference frame is absent or movement of objects is either very
equivalent to a pixel labeling problem and is solved using the MAP slow, or very fast, or they halt for some time and then move again
estimation principle. A Distributed Differential Evolution (DDE) [1,2].
algorithm is proposed for searching the maximum a posterior prob- In order to solve these problems a region based robust video
ability of MRF modeled frame. In the proposed search technique, a frame segmentation algorithm is needed [1,2]. Salember and Mar-
population represents a segmented output of the considered video ques [23] had proposed a computationally efficient watershed
frame and size of the population is equal to the dimension of the based spatial segmentation scheme. To detect the object’s bound-
video frame. A population consists of a set of target vectors. Here, ary, they have used both spatial and temporal segmentations. But
each target vector encodes the red (R), green (G), and blue (B) com- this scheme sometimes produces over-segmented results.
ponents of the segmented output and a set of such target vectors Image segmentation using MRF model can be achieved by esti-
constitutes each population. The population is initialized randomly. mating maximum a posteriori (MAP) probability of the concerned
It is to be noted that the segmentation of the MRF modeled frame model [3,10,11,24]. For a given observed image or a video frame
corresponds to the MAP estimation and the proposed DDE algo- Y, segmentation is obtained by searching a configuration X which
rithm maximizes this probability. This maximization process has maximizes this probability. This maximization process is quite
been done iteratively by evolving the target vectors (i.e., possible complex owing to the large search space. Hence, efficient and
solutions) of the population using the following three operations: effective optimization algorithms are required to maximize the
mutation, crossover, and selection. The algorithm continues until posterior probability.
some stopping criterion is reached. At the end of the evolutionary The MAP of the MRF model has been estimated using var-
process, a stable labeled configuration is obtained. Such a configu- ious optimization algorithms such as Simulated Annealing (SA)
ration is considered as the required segmented output of the given [4,24,25], Iterated Conditional Mode (ICM) [3,24], and Genetic Algo-
input video frame. rithm (GA) [5,24]. Though the convergence speed of SA is less but
In the proposed DDE, to mutate each target vector, instead of it has the capability of finding better solutions [5,24]. On the other
considering the whole population, a small window centered at the hand, ICM consumes less time as compared to other optimization
target vector is considered thereby reducing computational time techniques; but sometimes it may get stuck to local optima [5,24].
as compared to the conventional Differential Evolution algorithm. In [12], Subudhi et al. has proposed an MRF model based spatial
It also provides a better spatial segmentation as the considered segmentation scheme by combining both SA and ICM to estimate
window maintains the spatial regularity of the lattice in MRF. the MAP of the corresponding model. It has the ability to provide
In temporal segmentation, the difference images of each of the comparable results quickly. In [12], the SA was initially executed
R, G, and B channels are obtained by taking absolute difference for a few iterations to get a suboptimal solution and thereafter ICM
of the respective R, G, and B components of the two consecutive (with suboptimal solution as initialization) was used to obtain the
spatially segmented image frames. Thereafter, Change Detection optimal solution. As SA is executed for a few iterations, the sub-
Masks (CDMs) of each of the R, G, and B channels are obtained optimal solutions obtained using SA sometimes correspond to the
by thresholding the corresponding difference images. The changed local optima or nearer to the local optima and thereafter ICM may
and the unchanged regions in the current frame are obtained by get stuck to the same and may not reach the global optimum or
considering the union of CDMs of these three channels. To obtain near to the global optimum. Hence, segmentation using the same
the exact location of the moving objects in the current frame, we algorithm may not be a good choice. Moreover, the setting of the
modify the obtained CDM by considering the information of pixels number of iterations in SA, is not easier.
belonging to the moving objects in the previous frame and result GA was used as an alternative optimizer to SA and ICM in [5] due
of the corresponding current spatial segmentation. Video Object to suitable values of the parameter, its ability to explore and exploit
Plane (VOP) is extracted by superimposing the intensity/color val- the search space quickly using the concept of natural evolution
ues of original pixels of the current frame on the changed regions and selection [26]. It is robust for identifying solutions to combi-
of the modified CDM. natorial optimization problems. In [5,26,27], it is mentioned that
To test the effectiveness of the proposed algorithm investigation the GA sometimes converges to a premature solution and has low
was carried out on five reference and one real life video sequences. searching speed. To overcome these problems, GA has been modi-
Results obtained by the proposed method are compared with those fied in [28,29]. In [28], a deterministic hill-climbing procedure was
of MRF with DGA scheme [10,11], MRF with DGA scheme with embedded into GA. Although the computational cost was low, it
multi-layer compound MRF model as used in the proposed method still landed up to a suboptimal solution. Zhang et al. developed evo-
(henceforth, called as modified MRF with DGA scheme), MRF with lutionary algorithm based on a combination of GA and SA [30,31].
Graph Cut scheme [7,22], and MRF with SA and ICM scheme [12]. These algorithms produce better segmentation than those obtained
Organization of the article is as follows. Section 2 describes the using GA alone, but had a lower convergence speed. A combination
related work. In Section 3, the proposed methodology for mov- of ICM and GA was proposed by Lai and co-worker [5]. For quick
ing object detection where spatial segmentation (spatio-temporal convergence and to obtain better segmentation, a new version of
MRF based image modeling, MAP estimation framework for MRF GA called, Distributed Genetic Algorithm (DGA) has been proposed
A. Ghosh et al. / Applied Soft Computing 15 (2014) 121–135 123
Fig. 1. Block diagram of the proposed method.
for segmenting textured images in [32] and video sequences in change information based initialization is utilized for segmenting
[10,11]. In DGA [11], the chromosome with the highest fitness value the subsequent frames.
among its neighborhood was selected. During crossover, a neigh- In temporal segmentation, the difference images for each of
boring chromosome within the window was randomly picked up the R, G, and B channels are obtained by computing the absolute
and a component of the feature vector was chosen and replaced by differences of the corresponding R, G, and B components of two con-
the corresponding value of the feature vector of the selected chro- secutive spatially segmented image frames. Then, the CDMs of each
mosome. For this reason, DGA merges smaller regions to a large of the R, G, and B channels are obtained by thresholding the cor-
region and produces over-segmented results at the boundary of an responding difference images. These CDMs represent the changed
object. and the unchanged regions in the respective channel. Union of
Unlike evolutionary approaches, some deterministic methods CDMs of these three channels is taken to obtain the changed and
such as Belief Propagation (BP) and Graph Cuts (GC) have been the unchanged regions in the current frame. The changed regions
used for MAP estimation. In [14], Yin and Collins had presented correspond to moving objects whereas the unchanged regions cor-
BP approach for moving object detection using a 3D MRF model. In respond to stationary objects and background. To obtain the exact
this method, each hidden state in the 3D MRF model represented a locations of the moving objects, the CDM is modified. Information
pixel’s motion likelihood and was estimated using message passing of the pixels belonging to the moving objects in the previous frame
in a 6-connected spatio-temporal neighborhood. Huang et al. [15] and output of the current spatial segmentation are considered to
had proposed a framework for segmenting moving objects using modify it. The intensity/color values of the original pixels of the
a new spatio-temporal MRF model and Loopy Belief Propagation current frame are then superimposed on the changed regions of
(LBP) was used to estimate MAP of MRF model. In [22], Szeliski et al. the modified CDM to extract the VOP.
studied different energy minimization methods such as ICM, swap-
move, expansion-move, max-product LBP, and Tree-Reweighed
3.1. Spatial segmentation
Message passing (TRW) for stereo, image stitching, interactive seg-
mentation, and denoising. The authors mentioned that the TRW
Spatial segmentation can be done by combining the spatial and
and expansion-move GC algorithms achieve minimum energy.
temporal features of a given video image frame in a single frame-
Feng and Jian-ya [7] had proposed a novel MRF based image
work [11,33]. In the present article, MRF (with both spatial and
segmentation scheme where MAP of MRF model was estimated
temporal features) based image modeling is considered for spatial
using expansion-move algorithm in GC. However, these approaches
segmentation.
[7,14,15,22] are found to provide over-segmented results.
3.1.1. Spatio-temporal MRF based image modeling
Here, the observed video sequence y is assumed to be a 3D vol-
3. Proposed method for moving object detection
ume, consisting of spatio-temporal image frames. The tth frame of
the video sequence y is represented as yt. Each pixel of yt is a spa-
Fig. 1 represents the block diagram of the proposed method.
tial site s, denoted by yst, q is a temporal site, and e is an edge site.
As mentioned earlier, in the present article spatial and temporal
Let, S be the set of all the sites. Thus S = {s} ∪ {q} ∪ {e}. Let Yt rep-
segmentations are integrated together to detect moving objects.
resent a random field and yt be a realization of it at time t. Thus,
Spatial segmentation accurately provides the boundary of the
yst denotes a pixel at site s at time t. Let x denote the segmenta-
regions in a given scene, whereas temporal segmentation deter-
tion of video sequence y and xt the segmented version of yt. Let Xt
mines the changed and the unchanged regions. MRF model with
be the MRF and xt represents one of its realization. In the present
spatio-temporal framework is used for spatial segmentation as it
investigation, the second order MRF modeling in spatial as well as in
was used in [12]. Color features, both in spatial and temporal direc-
temporal directions are considered to take care of the spatial distri-
tions, and edge map/line-field in temporal direction are taken into
bution of color and temporal color coherence, respectively. Another
consideration while modeling the image frame. The edge map has
MRF model (with the edge/line-field of the current frame xt and the
been obtained using a 3 × 3 Laplacian mask. Segmentation using the
edges/line-fields of the (t − 1)th frame xt−1 and the (t − 2)th frame
MRF model is usually done through the MAP estimation principle.
xt−2) is considered to preserve the accurate object boundary in seg-
Using a local mutation based DE termed as Distributed Differential
mentation. As Xt is an MRF, it satisfies the following Markovian
Evolution (DDE) is proposed to estimate the MAP of such a model. In
property:
DDE unlike conventional DE algorithm, a small window is consid-
ered over the image lattice for mutating each of the target vectors.
P(Xst =xst |Xqt =xqt , ∀q ∈ S, s =/ q)=P(Xst =xst |Xqt =xqt , (q, t) ∈ st ),
Random initialization is done for segmenting the initial frame. A
124 A. Ghosh et al. / Applied Soft Computing 15 (2014) 121–135
Fig. 2. (a) MRF model in spatial direction, (b) MRF model in temporal direction, and (c) MRF model with line-field in temporal direction.
where st denotes the neighborhood of site s at time t. For temporal second order clique function in spatial direction. Fig. 2(b) shows
MRF, the following Markovian property is also satisfied: the considered MRF model in temporal direction. Here, each site s
at location (i, j) in the tth frame is modeled with neighbors of the
P(Xst = xst |Xqr = xqr , s =/ q, t =/ r, ∀(s, t), (q, r) ∈ V)
corresponding pixels in the temporal direction i.e., in the (t − 1)th
= −
P(Xst = xst |Xqr = xqr , s =/ q, t =/ r, (q, r) ∈ sr ). and the (t 2)th frames. Similarly, Fig. 2(c) diagrammatically rep-
resents the considered MRF model with edge/line-field of the tth
Here V represents the 3D volume of the video sequence and sr
frame with the neighbors of the corresponding pixels in the (t − 1)th
denotes the neighborhood of (s, t) in temporal direction.
and the (t − 2)th frames.
The novelty of the MRF model lies in the fact that it takes into
The prior probability of the MRF framework can be represented
account the spatial and temporal characteristics of a region [24]. −U(Xt)/T
by Gibb’s distribution with P(Xt) = (1/z)e , where z represents
The regions are assumed to have uniform intensities. The consid- −U(Xt )/T
the partition function and it is expressed as z = e , and
xt
ered image frame is assumed to come from an imperfect imaging
U(Xt) denotes the energy function (a function of clique potentials).
modality. It is also assumed that some noise has corrupted the
The parameter T is the temperature constant and is set to 1 as in
actual image to produce the observed image yt. Given this observed
[24]. The following clique potential functions have been considered
image yt our aim is to find out the image xt, which maximizes the
for spatial and temporal directions:
posterior probability. An estimate of xt is denoted by xˆt . Hence, it
For spatial direction,
is assumed that due to the presence of noise xt cannot be observed
properly. But a noisy version of xt (i.e., yt) is observed. Noise is
+˛ if xst =/ xqt , (s, t), (q, t) ∈ S,
assumed to be white (additive i.i.d., i.e., independent and identically Vcps(xst , xqt ) =
−
˛ if x = x , (s, t), (q, t) ∈ S.
distributed). Hence Yt can be expressed as [24] st qt
Similarly, in temporal direction