Particle Filter-Based Fingertip Tracking with Circular Hough Transform Features

Martin Do, Tamim Asfour, and Rüdiger Dillmann
Karlsruhe Institute of Technology (KIT), Adenauerring 2, 76131 Karlsruhe, Germany.
{martin.do, asfour, dillmann}@kit.edu

Abstract

In this work, we present a fingertip tracking framework which allows the observation of finger movements in task space. By applying a multi-scale edge extraction technique, an edge map is generated in which low-contrast edges are preserved while noise is suppressed. Based on circular image features, determined from the map using the Hough transform, the fingertips are accurately tracked by combining a particle filter with a subsequent mean-shift procedure. To increase the robustness of the proposed method, dynamical motion models are trained for the prediction of the finger displacements. Experiments were conducted on various image sequences from which statements on the performance of the framework can be derived.

1 Introduction

Towards an intuitive and natural interface to machines, markerless human observation has become a major research focus during the past years. Regarding coarse-granular human tracking incorporating the torso, arms, head, and legs, considerable progress has been achieved, whereas human hand tracking still remains an unresolved issue, although the hand is considered one of the most crucial body parts regarding the interaction with other humans and the environment.

Regarding markerless tracking and detection of human body parts, most systems have limited sensor capabilities, which in the common case amount to a stereo camera setup. Full hand tracking approaches in joint angle space have been proposed in [1], [2], [3]. However, due to the highly complex structure of the hand, whose motion involves 27 DoF, tracking can only be achieved at a low frame rate or on multiple views from different perspectives.

Using stereo vision, a reasonable solution lies in reducing the dimensionality of the problem by shifting from joint angle space into task space. In [4], a finger tracking approach based on active contours is presented for air-writing. The target to be tracked consists of a contour which is laid around the pointing finger. As a result, since no reliable statement can be made on the actual fingertip position, one has to assume that the finger pose does not change.

Hence, based on curvature properties, in [5] fingertips are detected within a contour which is extracted from skin blob tracking. A more elaborate approach is presented in [6], where particles are propagated from the center of the hand to positions close to the contour. Intersections of the contour with line segments at the particles, and examination of the transitions between non-skin and skin areas, indicate whether a particle represents a fingertip. However, this method is specifically designed to detect the tips of stretched fingers. Based on multi-scale color features, [7] introduces a hierarchical representation of the hand consisting of blobs of different sizes, with each blob representing a part of the hand. The blob features are matched with a number of hierarchical 2D models, each incorporating a specific finger pose. Therefore, tracking is accomplished under the assumption that the local finger poses with respect to the hand remain fixed.

In order to implement a continuous fingertip tracking method, we would like to rely on prominent features which can be extracted at any time of an image sequence. In [8], for detecting a guitarist's fingertips, circular features are proposed which are localized by performing a circular Hough transform. For the same application, [9] defined semi-circular templates which are used to find the fingers' positions.

In our work, we adopt the concept of circular features to tackle the more complex problem of tracking the fingertips of a freely moving hand, where overlaps of fingers and palm occur frequently, leading to difficulties regarding the robust extraction of these features. For tracking, we combine particle filtering with a mean-shift algorithm. In addition, a dynamical motion model for prediction was trained to enhance the robustness of the proposed framework.

The paper is organized as follows. Section 2 describes the feature extraction, consisting of an edge detection step and the Hough transform. Details on the tracking procedure performed on the resulting map are given in Section 3. Subsequently, first experiments with the framework are explained in Section 4. In Section 5, the work is summarized and notes on future work are given.

2 Feature Extraction

In order to generate the edge image, a skin color segmentation is performed to extract the hand and finger regions. Morphological operators are applied on the segmented image to eliminate noise and to produce a uniform region. To detect the edges in this preprocessed image, image gradients are calculated on various scales.

Figure 1. Left: Original input image. Center/Left: Color segmented image. Center/Right: Edge image using Canny detector. Right: Edge image using the method proposed in Section 2.1.

2.1 Multi-Scale Edge Extraction

Considering the problem of fingertip tracking, due to small intensity variances between different parts of the hand, e.g. the fingernail and the skin, or the finger regions and the palm, it is desirable to detect edges whose contrast can vary over a broad range. Depending on the parameters, applying standard algorithms such as the Canny edge detector on a wider scale leads to an edge image in which numerous false edges occur. To preserve low-contrast edges in certain areas while reducing noise close to high-contrast edges, we implemented, based on the work of [10], a filter approach consisting of a steerable Gaussian derivative filter on multiple scales. The basis filter for x is defined as follows:

    G^x_k(x, y; \sigma_k) = -\frac{x}{2\pi\sigma_k^4} \, e^{-\frac{x^2 + y^2}{2\sigma_k^2}}.    (1)

G^y_k(x, y; \sigma_k) is defined analogously. To determine the scale at which a gradient can be reliably estimated, the magnitude of the filter responses r^x_k(x, y; \sigma_k) and r^y_k(x, y; \sigma_k), obtained by convolution of the image I with the filters from Eq. 1, is checked against a noise threshold. While the magnitude can be calculated according to:

    r^m_k(x, y; \sigma_k) = \sqrt{r^x_k(x, y; \sigma_k)^2 + r^y_k(x, y; \sigma_k)^2},    (2)

the threshold is set by the following function:

    c_k = \frac{\sqrt{-2\ln\left(1 - (1 - \alpha)^{1/R}\right)}}{2\sqrt{2\pi}\,\sigma_k^2}\, s_l,    (3)

with s_l representing the standard deviation and \alpha the significance level for an image with R pixels, which defines an upper boundary for the allowed misclassification of image pixels. To take into account local intensity and contrast conditions, we focus on local signal noise in a specific region rather than on global sensor noise. Therefore, Eq. 3 depends on the local standard deviation s_l calculated within a 2\sigma^{max}_k \times 2\sigma^{max}_k neighborhood, where \sigma^{max}_k denotes the largest scale being examined. Hence, we calculate each gradient at the minimum reliable scale \sigma^{min}_k where the likelihood of error due to local signal noise falls below a standard tolerance. This guarantees that a more accurate gradient map is estimated which is less sensitive to signal noise and errors.

The voting is performed over the range R = \{-m \cdot 1.1 \cdot r, \ldots, m \cdot 1.1 \cdot r\} with m \in \mathbb{N}, whereby a range of pixels along the edge tangent is considered during the voting process. In order to increase the robustness of the tracking algorithm, a density distribution is formed in Hough space by convolving I_H with a Gaussian kernel G(u, v; r/2).

Since the hand motion occurs in 3D Euclidean space, a fixation of r is only valid if movement of the fingertip in the direction of the camera's z-axis is excluded during tracking. Adapting r in each frame allows fingers to be tracked in all directions. Based on the density distribution generated in frame t, a radius estimate \hat{r}_t is determined by applying an Expectation Maximization algorithm. Further details are given in Section 3.3.

3 Tracking Fingertips

3.1 Prediction

Providing a prediction of the movement of the objects to be tracked increases the robustness of a statistical tracking framework. We train dynamical motion models in the form of a second-order auto-regressive (AR) process as proposed in [4], which is described as follows:

    q_t - \bar{q} = A_1(q_{t-1} - \bar{q}) + A_2(q_{t-2} - \bar{q}) + b_0 \omega_k,    (5)

where q_t \in \mathbb{R}^D denotes the current configuration, \bar{q} the mean configuration, and \omega_k \in [0, 1]. To learn the AR parameters A_1, A_2 \in \mathbb{R}^{D \times D} and b_0 \in \mathbb{R}^D, training data is provided in the form of a configuration sequence Q = \{q'_0, \ldots, q'_M\}, where the sequence is generated by manual labeling of the fingertips in each frame of a recorded image sequence.

Two AR models are trained to provide predictions for the local fingertip movement with respect to a static hand pose as well as for the movement of the hand itself. Based on the assumption that the motion of each finger is influenced by the motion of the neighboring fingers, the first model is trained with training data whose instances q'_i \in Q, with D = N, consist of the lengths of the vectors v^{j, j+1}_t = q^{j+1 \bmod N}_t - q^j_t between the fingertips j and j+1 \bmod N:

    q'_i(j) = \| v^{j, j+1}_t \|, \quad j = 1, \ldots, N.    (6)

For finger j, this leads to the following displacement vector:

    \bar{v}^j = A^1_N(i, i+1)\, v^{i,i+1} + A^j_N(i, i+1)\, v^{i,i+1} + b^j \omega.