applied sciences
Article Hand Gesture Detection and Recognition Using Spectrogram and Image Processing Technique with a Single Pair of Ultrasonic Transducers
Guo-Hua Feng 1,* and Gui-Rong Lai 2
1 Department of Power Mechanical Engineering and NEMS Institute, National Tsing Hua University, Hsinchu 300, Taiwan 2 Department of Mechanical Engineering, National Chung Cheng University, Chiayi 621, Taiwan; [email protected] * Correspondence: [email protected]
Abstract: This paper presents an effective signal processing scheme of hand gesture recognition with a superior accuracy rate of judging identical and dissimilar hand gestures. This scheme is implemented with the air sonar possessing a pair of cost-effective ultrasonic emitter and receiver along with signal processing circuitry. Through the circuitry, the Doppler signals of hand gestures are obtained and processed with the developed algorithm for recognition. Four different hand gestures of push motion, wrist motion from flexion to extension, pinch out, and hand rotation are investigated. To judge the starting time of hand gesture occurrence, the technique based on continuous short- period analysis is proposed. It could identify the starting time of the hand gesture with small-scale motion and avoid faulty judgment while no hand in front of the sonar. Fusing the short-time Fourier Citation: Feng, G.-H.; Lai, G.-R. transform spectrogram of hand gesture to the image processing techniques of corner feature detection, Hand Gesture Detection and feature descriptors, and Hamming-distance matching are the first-time, to our knowledge, employed Recognition Using Spectrogram and to recognize hand gestures. The results show that the number of matching points is an effective Image Processing Technique with parameter for classifying hand gestures. Based on the experimental data, the proposed scheme could a Single Pair of Ultrasonic achieve an accuracy rate of 99.8% for the hand gesture recognition. Transducers. Appl. Sci. 2021, 11, 5407. https:// Keywords: ultrasound; hand gesture; image processing; short-time Fourier transform; air sonar doi.org/10.3390/app11125407
Academic Editor: Theodore E. Matikas 1. Introduction Received: 22 May 2021 Nowadays, ultrasound is broadly applied in several fields, and the major areas include Accepted: 8 June 2021 medical and structural diagnoses and range-finding-related applications. For example, Published: 10 June 2021 Doppler radar can be used in aviation, meteorology, speed gun, motion detection, intruder warning, and light control. In particular, Doppler radar has been used for posture and Publisher’s Note: MDPI stays neutral gesture recognition in motion sensing [1–5]. with regard to jurisdictional claims in Posture or gesture recognition commonly uses mechanical or light stimulation sensors, published maps and institutional affil- such as cameras, infrared sensors, and wearable devices. Although their installation iations. may not be difficult and the obtained results could be intuitively analyzed, these devices have inherent limitations [6]. Cameras can infringe on users’ privacy; infrared sensors are susceptible to misjudgment; securing wearable sensors is necessary for functional operation [7]. In contrast, acoustic stimulus sensing, such as sonar, possesses several Copyright: © 2021 by the authors. unique advantages, particularly for the monitoring of the home environment and can be Licensee MDPI, Basel, Switzerland. fully operated in the dark [8]. In addition, visuals need not be recorded so that the privacy This article is an open access article of users will not be affected [9]. These characteristics make sonar a superior sensing system distributed under the terms and for gesture recognition applications. conditions of the Creative Commons The hand gesture detection system possesses several advantages. The contactless Attribution (CC BY) license (https:// interaction mode allows the users not to touch the control panel. This avoids the po- creativecommons.org/licenses/by/ tential cross-contamination of multiple users via the touch panel/screen and the likely 4.0/).
Appl. Sci. 2021, 11, 5407. https://doi.org/10.3390/app11125407 https://www.mdpi.com/journal/applsci Appl. Sci. 2021, 11, 5407 2 of 18
damage/fatigue of the physical control device due to inappropriate/intensified operation. In addition, the proposed hand gestures in the study are aimed at robot manipulation. The hand gestures intuitively correspond to the motion of the robot. This facilitates the users not only to operate the robot but to collaborate with the robot to execute a task. Liu et al. used a single speaker and microphone in a smartphone with the Doppler effect to study gesture recognition via ultrasound. The vectors containing +1 and −1 can be used to indicate the direction of gesture movement in each time period [10]. The Sound- Wave team also used a microphone and speaker in a mobile device. The gestures inves- tigated by them consisted of scrolling, single or double tapping, forward and backward hand gestures, and continuous motion. These gestures can be judged from the amplitude of the reflected signal [11]. Przybyla et al. developed a three-dimensional range-finding function chip. Its searchable range is up to 1 m in a 45◦ field of view. Low power consump- tion is the main feature of the chip [12]. Zhou et al. reported the ultrasonic hand gesture recognition system which is capable of recognizing 3D micro-hand gestures with a high cross-user recognition rate. This system possesses the potential to develop a practical low-power human machine interaction system [13]. On the other hand, researchers working in the field of computer vision have broadly employed image processing methods such as feature detection, description, and matching for object recognition, image classification, and retrieval. Features, such as points, edges, or corners, can be considered as the information of interest within the investigated im- age [14]. A good feature detector can find features possessing strong information changes that are repeatedly recognized among several images taken with different viewing angles and lighting conditions [15]. A feature descriptor is an algorithm used to process the image feature points into meaningful feature vectors. Feature descriptors encode interesting image features into a series of numbers and act as a type of numerical identifier that can be employed to differentiate one feature from another. Researchers have conducted a considerable number of studies on image feature extraction and description and proposed several classic feature description and extraction methods, such as scale invariant feature transform (SIFT), speeded up robust features (SURF), features from accelerated segment test (FAST), and binary robust independent elementary feature (BRIEF) [16–19]. These methods obtain image feature points and their descriptors by finding the local extremum in the image and describing the features using the luminance information of their neighborhood. Feature matching is obtained by calculating the distances between feature points in different images. For example, the SIFT feature descriptor uses the Euclidean distance as the judgment standard between descriptors, whereas the BRIEF descriptor [20] is a type of binary descriptor that uses the Hamming distance as the judgment standard to describe the correspondence between two feature points [21,22]. As described above, existing air sonar applied to hand gesture recognition by time- domain reflected signal by decoding its signal level or amplitude, the likely environmental noise and the distance between the users and speaker/microphone could decrease the accu- racy rate of hand gesture recognition. Moreover, most of the image recognition techniques deals with the images directly obtained via camera. The large amount data points of images accompanying with the frame change not only requires the higher cost hardware but more computational effort compared to our proposed air sonar approach. In this study, we constructed an air sonar with a pair of ultrasonic emitters and receiver to investigate the hand gesture recognition. Through the cost-effective circuity, the acquired Doppler signals were processed for the study of hand gesture recognition. Two algorithms were de- veloped with the one to judge the starting time of hand gestures by continuous short-time Fourier transform and the other to recognize the hand gesture by image processing with spectrogram. The Superior recognition results were obtained using the proposed scheme. Further details are described below. Appl.Appl. Sci. 2021,, 11,, 5407x FOR PEER REVIEW 33 of of 18
2. Linear and Rotational Doppler Effect on Hand Gesture Motion The common hand gestures incl includeude translational and rotational motion of the hand. The signals from air sonar with linear and rotationalrotational Doppler effect will be applied to our hand gesture recognition. Consider the constructedconstructed ultrasonic transmitter and receiver pair as as an an air air sonar sonar system system fixed fixed and and located located at the at origin the origin Q of the Q ofradar the coordinate radar coordinate system (U,system V, W) (U, (Figure V, W) 1). (Figure The hand1). The is described hand is described in the local in coordinate the local coordinate (x, y, z) and (x, has y, a z) trans- and lationhas a translationand rotation and with rotation respect with to the respect radar to coordinate. the radar coordinate. A reference A coordinate reference coordinatesystem (X, Y,system Z) is (X,introduced, Y, Z) is introduced, which has whichthe same has translation the same translation as coordinate as coordinate (x, y, z) but (x, has y, z) no but rota- has tionno rotation with respect with respectto the (U, to theV, W) (U, coordinate. V, W) coordinate.
Z
M’’ M’ z Y ωz r O’ M Radar coordinate ro X (U, V, W) ω O y t = t Reference y coordinate W ωx x (X, Y, Z) Target coordinate V (x, y, z) t = 0 U
Figure 1. The coordinate of an air sonar and a target for micro Doppler analysis.
Suppose the hand has a translation velocity v with respect to the radar and an angular Suppose the hand has a translation velocity v with respect to the radar and anT angular rotation velocity ωω represented in the reference coordinate as ωω= (ωωX, ωωY, ωωZ) T. A point rotation velocity represented in the reference coordinateT as = ( X, Y, Z) . 0A point scatterer M on the hand, located at r0 = (X0, Y0, Z0) , at time t = 0 will move to M at time scatterer M on the hand, located at r0 = (X0, Y0, Z0)T, at time t = 0 will move to M’ at time t. t. This movement can be considered as a translation from M to M00 with velocity v and This movement can be considered as a translation from M to M” with velocity v and a a rotation from M00 to M0 with an angular velocity ω. At time t, the range vector from rotation from M″ to M′ with an angular velocity ω. At time t, the range vector from the the radar to the scatterer at M0 becomes radar to the scatterer at M′ becomes * * * * * OM 0 ⃑ =QO ⃑ +OO 0 ⃑ +O′M′0 0 ⃑ =𝑅 ⃑ +𝑣*⃑𝑡+𝑟*⃑. OM = QO + OO + O M = R0 + v t + r . The scalar range of OM * ⃑ can be expressed as The scalar range of OM0 can be expressed as 𝑅 = 𝑅 ⃑ +𝑣⃑𝑡+𝑟⃑ . * * * R =k R + v t + r k . When the air sonar emits a continuoust 0 sinusoidal wave with a carrier frequency f0, the echo signal s(t) received from the scatterer on the hand at the position (x, y, z) is ex- When the air sonar emits a continuous sinusoidal wave with a carrier frequency pressed as a function of Rt. f 0, the echo signal s(t) received from the scatterer on the hand at the position (x, y, z) is expressed as a function of R . 2𝑅 t 𝑠(𝑡) =𝜌(𝑥, 𝑦,) 𝑧 exp 𝑗 2π𝑓 , (1) 𝑐 2Rt where 𝜌(𝑥, 𝑦, 𝑧) is the reflectivitys(t) = ρfunction(x, y, z) expof thej po2πintf scatterer, M described in the target (1) c local coordinates (x, y, z), c is the speed of sound, and the phase of the signal is where2π𝑓(2𝑅ρ (/𝑐)x, y. ,Thez) is Doppler the reflectivity frequency function shift by of thehand point motion scatterer can be M obtained described by in taking the target the timelocal derivative coordinates of (x the, y, phasez), c is as the [23] speed of sound, and the phase of the signal is 2π f (2Rt/c). The Doppler frequency shift by hand2𝑅 motion can be obtained by taking the time derivative 𝑑(2πf( )) 1 𝑐 2𝑓 𝑑𝑅 2𝑓 of the phase as [23] 𝑓 = = = (𝑣⃑ +𝜔 ⃑ ×𝑟⃑) 𝑛 ⃑, (2) 2𝜋 𝑑𝑡 𝑐 𝑑𝑡 𝑐 2R ( π ( t )) T where 𝑛 ⃑ =(𝑅 ⃑ +𝑣⃑𝑡1 +⃑)/𝑅 𝑟 d 2 . f c 2 f dRt 2 f * * * * fD = = = ( v + ω × r ) nm, (2) Thus, the Doppler2 πfrequencydt shift includedc dt thec effect of translation and rotation as well as* the direction* * from* the radar to the scatterer on the hand. Four hand gestures, = ( + + ) wherenamely,n m“push”,R0 “wristv t motionr /Rt. from flexion to extension”, “pinch out”, and “hand rota- tion”, were performed with the right hand for this study. Appl. Sci. 2021, 11, x FOR PEER REVIEW 4 of 18
Appl. Sci. 2021, 11, 5407 (1) Push: As shown in Figure 2a, the push gesture involves only a translational mo-4 of 18 tion of the palm. While the hand moved forward from an initial position to a stop position, the normal direction of the palm was toward the radar and the palm underwent the ac- celeration, near-constant velocity, and deceleration stages. The maximum Doppler fre- quencyThus, shift the was Doppler contributed frequency by the shift center included of the palm the effectbecause of the translation velocity direction and rotation was as wellnormal as the to direction the direction from of the the radar radar to to the the scatterer center of on the the palm. hand. The Four minimum hand gestures, Doppler namely, fre- “push”,quency “wrist shift resulted motion from from the flexion corner to of extension”, the moving “pinch palm. out”, and “hand rotation”, were performed(2) Wrist with motion the right from hand flexion for thisto extension: study. As shown in Figure 2b, the hand moved from(1) the Push: position As shownwith the in palm Figure away2a, from the pushthe radar gesture to that involves with the only palm a translationaltoward the motionradar. Initially, of the palm. the wrist While was the flexed. hand Then, moved regarding forward thefrom angular an displacement, initial position the to hand a stop position,underwent the acceleration, normal direction constant of the speed, palm and was th towarden deceleration the radar stages. and the Finally, palm the underwent wrist thewas acceleration, extended. The near-constant pivot region, velocity, which can and be decelerationconsidered as stages. the wrist The joint, maximum possesses Doppler zero frequencyvelocity during shift was the contributedentire motion, by thus the centerresulting of in the a palmzero Doppler because frequency the velocity shift. direction The waslargest normal Doppler to the frequency direction shift of the was radar contributed to the center by the of largest the palm. value The of the minimum hand velocity Doppler frequencyat a certain shift scatterer resulted point from on the the corner hand ofmultiplying the moving the palm. unit direction vector from that scatterer to the radar.
FigureFigure 2. 2.Hand Hand gestures: gestures: ((aa)) PushPush motion. ( (bb)) Wrist Wrist from from flexion flexion to to extension. extension. (c) ( cPinch) Pinch out out mo- motion. (dtion.) Hand (d) rotationHand rotation motion. motion. (e) Used (e) Used ultrasonic ultrasonic emitter emitter and and receiver. receiver.
(2) Wrist motion from flexion to extension: As shown in Figure2b, the hand moved from the position with the palm away from the radar to that with the palm toward the radar. Initially, the wrist was flexed. Then, regarding the angular displacement, the hand un- derwent acceleration, constant speed, and then deceleration stages. Finally, the wrist was extended. The pivot region, which can be considered as the wrist joint, possesses Appl. Sci. 2021, 11, 5407 5 of 18
Appl. Sci. 2021, 11, x FOR PEER REVIEWzero velocity during the entire motion, thus resulting in a zero Doppler frequency5 shift.of 18 The largest Doppler frequency shift was contributed by the largest value of the hand velocity at a certain scatterer point on the hand multiplying the unit direction vector from that scatterer to the radar. (3) Pinch out: This motion is mainly performed by the thumb and index finger. In the (3) Pinch out: This motion is mainly performed by the thumb and index finger. In the initial state, the front ends of the index finger and thumb contacted each other. The re- initial state, the front ends of the index finger and thumb contacted each other. The remain- maining fingers were clenched. The motion is illustrated in Figure 2c. The front ends of ing fingers were clenched. The motion is illustrated in Figure2c. The front ends of the two the two fingers gradually opened up along with the forward movement of the hand. The fingers gradually opened up along with the forward movement of the hand. The thumb andthumb index and finger index underwentfinger underwent acceleration acceleration from the from initial the positioninitial position and then and decelerated then decel- toerated a stop to position,a stop position, thus completing thus completing this motion. this motion. The index The index finger finger rotated rotated in a clockwisein a clock- direction,wise direction, and the and thumb the thumb rotated rotated in a counterclockwise in a counterclockwise direction. direction. These These two motionstwo motions pro- videprovide a negative a negative Doppler Doppler frequency frequency shift with shift the with magnitude the magnitude gradually gradually increasing, increasing, reaching thereaching maximum, the maximum, and then and decreasing then decreasing to zero. Meanwhile,to zero. Meanwhile, the remaining the remaining fingers formed fingers aformed partial a fist partial and underwentfist and underwent acceleration acceleration and deceleration and deceleration phases. phases. This caused This acaused similar a Dopplersimilar Doppler frequency frequency shift as the shift push as the motion, push but motion, covered but a covered smaller scatteringa smaller areascattering compared area withcompared the push with motion. the push motion. (4)(4) HandHand rotation:rotation: The motion of handhand rotationrotation isis similarsimilar toto thethe wristwrist motionmotion fromfrom flexionflexion to extension,extension, withwith thethe majormajor differencedifference beingbeing thethe axisaxis ofof rotation.rotation. RegardingRegarding handhand rotation,rotation, the hand was aligned aligned with with the the lower lower arm, arm, and and all all the the fingers fingers closed closed to to form form a aplane plane with with the the palm palm (Figure (Figure 2d).2d). The The axis axis of of rotation rotation was the centerline ofof thethe lowerlower arm.arm. TheThe rotationrotation motionmotion startedstarted fromfrom thethe palmpalm towardtoward thethe radarradar andand rotatedrotated toto makemake thethe palmpalm faceface awayaway fromfrom thethe radar.radar. TheThe handhand motionmotion underwentunderwent twotwo phasesphases ofof accelerationacceleration andand deceleration. The axis of rotation had zero velocity, andand thusthus contributedcontributed zerozero frequencyfrequency shift. During the motion, the part of the hand from the axisaxis ofof rotationrotation toto thethe littlelittle fingerfinger edge provided a positivepositive Doppler frequency shift, followed by a negative Doppler fre- quencyquency shift.shift. Meanwhile,Meanwhile, the the rest rest of of the the hand hand produced produced a negative a negative Doppler Doppler frequency frequency shift, followedshift, followed by a small by a small positive positive Doppler Doppler frequency frequency shift. shift. Hand gesture cancan bebe servedserved asas anan importantimportant tooltool forfor humanhuman interaction.interaction. ComparedCompared toto existingexisting interfaces,interfaces, handhand gesturesgestures havehave thethe advantagesadvantages ofof beingbeing intuitiveintuitive andand easyeasy toto use. The The investigated investigated four four different different kinds kinds of ofhand hand gestures gestures which which could could be applied be applied to hu- to human-robotman-robot interaction. interaction. The The importance importance of of the the characterization characterization of of these these hand hand gestures areare as following:following: thethe pushpush motion motion commands commands the the robot robot arm arm to to move move forward, forward, the the wrist wrist motion mo- fromtion from flexion flexion to extension to extension asks asks the robotthe robot grip grip to make to make a right a right turn, turn, the pinchthe pinch out out motion mo- instructstion instructs the gripperthe gripper of robot of robot to open,to open, and and the the hand hand rotation rotation motion motion requests requests the the robotrobot body/platform to to turn turn left left (Figure (Figure 33).).
Figure 3. The potential robot operating applicationapplication ofof thethe proposedproposed handhand gesturegesture recognition.recognition.
Appl. Sci. 2021, 11, 5407 6 of 18
3. Hardware Setup of Air Sonar 3.1. Ultrasonic Emitter and Receiver We employed MA40S4S as the emitter and MA40S4R as the receiver (Murata Co., Nagaokakyo, Japan) with an operating frequency of approximately 40 kHz in the air sonar system. The sinusoidal signal was sent using a function generator to actuate the ultrasonic emitter, and an NI-USB6351 data acquisition card was used to acquire the electrical signal from the ultrasonic receiver. To avoid the sinusoidal wave generated due to distortion, we set the sampling frequency to 250 kHz, which is five times higher than the operating frequency, to examine the performance of the selected pair of ultrasonic transducers. Two tests were performed under a sinusoidal voltage input of 20 Vpp: (1) the emitter was fixed, and the receiver was placed 3, 18, and 38 cm away such that it was facing the emit- ter. The amplitudes of the acquired sinusoidal signals were 6.52, 0.736, and 0.188 Vpp, respectively. (2) The emitter and receiver were soldered on a circuit board 1 cm away from each other. A large glass plate was held parallel to the circuit board 7 and 40 cm away from the ultrasonic transducers as a reflector of the ultrasonic wave. The resulting amplitudes of the acquired signals were 2.3 and 0.424 Vpp, respectively. These two sim- ple tests verify the capability of the selected transducer pair to obtain a sufficiently large signal compared with the environmental noise level during the subsequent hand gesture experiment performed approximately 10 to 30 cm away from the transducer pair.
3.2. Circuit for Air Sonar Operation The air sonar was designed with a transmitted frequency of 40 kHz. The minimum sampling rate should be greater than 80 kHz to acquire the reflected signal directly. It can be even higher than 200 kHz to obtain a better waveform of the reflected signal. The sampling rate and data amount for algorithm processing should be reduced to make the hand gesture recognition technology easily implementable online. As we utilized the Doppler effect of the received signal for our hand gesture recog- nition, the sampling frequency is much lower than that required to acquire the signal emitted from the ultrasonic emitter. This could be realized through a frequency-mixing technique. Consider f 1 as the frequency of the received ultrasonic wave, which is given by the frequency of the transmitted ultrasonic wave plus the Doppler frequency shift. Consider f 2 as the frequency of the mixed signal, as specified. When signals of frequencies f 1 and f 2 enter the mixer, the mixed signals of frequencies f 1 + f 2 and f 1 − f 2 emerge at the output terminals. We employed the signal with the frequency (f 1 − f 2) for investigating the Doppler effect. Based on the preliminary experiment, the magnitude of the Doppler frequency shift of interest is less than 500 Hz. We constructed the mixed signal with the frequency f 1 − f 2 = f 1 (40.8 kHz + Doppler frequency shift) − f 2(37.6 kHz) = 3.2 kHz. The frequency of 3.2 kHz was selected, as it was five times higher than the Doppler frequency shift in this study. The oscillators, mixer, and filters were employed to construct the circuitry for reducing the carrier frequency [24]. Figure4 shows the circuit implementation. Two oscillators were realized using a Wien bridge sine-wave oscillator. One oscillator was operated at 40.8 kHz (f 1) to drive the ultrasonic emitter, and the other oscillator was operated at 37.6 kHz (f 2), as the mixed signal. The analog multiplier AD633 served as the mixing function. A bandpass filter was applied to remove unwanted noises and obtain a reasonable frequency bandwidth signal. A low-pass filter was employed to remove the signal with a carrier frequency of f 1 + f 2. In addition, a voltage follower was added to avoid the loading effect of the voltage signal from the ultrasonic receiver to the bandpass filter. After bandpass filtering, a voltage amplifier was used to adjust the gain so that the signal sent to the mixer had a proper amplitude during the hand gesture operation. Appl. Sci. 2021, 11, 5407 7 of 18 Appl. Sci. 2021, 11, x FOR PEER REVIEW 7 of 18
Figure 4. The circuit implementation for air sonar operation. Figure 4. The circuit implementation for air sonar operation.
4.4. Detecting Detecting the the Occurrence Occurrence of of the The Hand Hand Gesture Gesture Event Event 4.1.4.1. Ultrasonic Ultrasonic Emitter Emitter and and Receiver Receiver ToTo detect detect the the occurrence occurrence of of hand hand gesture gesture event, event, we we utilized utilized the the continuous continuous short- short- periodperiod analysis. analysis. EachEach short-period short-period was was set set as as 0.02 0.02 s, s, which which corresponded corresponded to to 1000 1000 data data pointspoints based based on on a samplinga sampling rate rate of 50of kHz.50 kHz. The The time time resolution resolution for detecting for detecting the occurrence the occur- ofrence a hand of a gesture hand gesture would would be one-fiftieth be one-fiftieth of a second. of a second. The spectrum The spectrum was analyzed was analyzed by fast by Fourierfast Fourier transform transform (FFT). (FFT). TwoTwo signal signal processing processing techniques techniques were were applied applied in in this this study: study: bandpass bandpass and and notch notch filtering.filtering. The The bandpass bandpass filter filter was was employed employed to to process process the the signal signal to to remove remove the the frequency frequency bandsbands other other than than that that from from 2.7 2.7 to to 3.7 3.7 kHz. kHz. We We chose chose a a Butterworth Butterworth filter filter of of order order 7 7 with with thethe cutoff cutoff frequencies frequencies of of 2.7 2.7 and and 3.7 3.7 kHz kHz to to design design the the bandpass bandpass filter. filter. The The filtered filtered time time signalssignals can can be be easily easily used used to to distinguish distinguish between between no no hand hand gesture gesture and and the the occurrence occurrence of of aa hand hand gesture gesture through through an an amplitude amplitude change. change. Moreover,Moreover, considering considering the the critical critical moment moment at at which which the the hand hand gesture gesture started, started, we we ob- ob- servedserved that that the the frequency frequency peaks peaks split split into into two two major major frequencies, frequencies, with with one one at at a a carrier carrier frequencyfrequency of of ~3200 ~3200 Hz Hz and and the the other other deviating deviating from from the the carrier carrier frequency. frequency. The The strength strength of of thethe latter latter peak peak had had a a varied varied amplitude, amplitude, which which was was related related to to the the hand hand gesture gesture executed. executed. OnceOnce the the algorithm algorithm for for detecting detecting the the hand hand gesture gesture starting starting time time was was to to select select the the charac- charac- teristicteristic frequency frequency as as the the peak peak frequency frequency with with the the maximum maximum amplitude amplitude for for FFT FFT results results of of each short-period (1000 sample points), the carrier frequency was possible to be selected each short-period (1000 sample points), the carrier frequency was possible to be selected as the characteristic frequency. Practically, the carrier frequency should have a narrower as the characteristic frequency. Practically, the carrier frequency should have a narrower bandwidth compared to that of the Doppler frequency because the carrier frequency was bandwidth compared to that of the Doppler frequency because the carrier frequency was synthesized by the constructed circuit. The Doppler frequency could easily cover a wider synthesized by the constructed circuit. The Doppler frequency could easily cover a wider frequency range because performing the hand gesture was hard to maintain a constant ve- frequency range because performing the hand gesture was hard to maintain a constant locity at all points of hand to reflect the ultrasonic waves. Thus, a proper notch filter design velocity at all points of hand to reflect the ultrasonic waves. Thus, a proper notch filter to effectively reduce the amplitude of carrier frequency peak and still allow the Doppler design to effectively reduce the amplitude of carrier frequency peak and still allow the frequency peak to be selected as the characteristic frequency during hand gesture operation Doppler frequency peak to be selected as the characteristic frequency during hand gesture was needed. operation was needed. By closely examining the amplitude of the main lobe for the case of no hand gesture, we consideredBy closely a examining value of 0.007 the amplitude as a reference of the for main the notch lobe for filter the design. case of Theno hand notch gesture, filter we considered a value of 0.007 as a reference for the notch filter design. The notch filter Appl. Sci. 2021, 11, 5407 8 of 18 Appl. Sci. 2021, 11, x FOR PEER REVIEW 8 of 18
was set a center frequency as the carrier frequency. The bandwidth and the maximum was set a center frequency as the carrier frequency. The bandwidth and the maximum attenuation gain of the notch filter were 5 Hz and 0.01, respectively. attenuation gain of the notch filter were 5 Hz and 0.01, respectively. Figure5 shows the processed results of the time-domain signals and their frequency Figure 5 shows the processed results of the time-domain signals and their frequency spectrum without any hand gesture and with a hand gesture (push motion) occurring spectrum without any hand gesture and with a hand gesture (push motion) occurring approximately at 5 s. The spectrum results were overlaid by fast Fourier transform analysis approximately at 5 s. The spectrum results were overlaid by fast Fourier transform analy- through a continuous short-period analysis. sis through a continuous short-period analysis.
FigureFigure 5.5. TheThe acquiredacquired signalsignal afterafter bandpassbandpass and notch filtering:filtering: (a) time-domain signal and (b) its continuouscontinuous FFTFFT resultsresults ofof eacheach 2020 msms short-timeshort-time periodperiod atat nono handhand gesture gesture condition. condition. Time-domainTime-do- signalmain signal (c) and (c) its and continuous its continuous FFT resultsFFT results of each of each 20 ms 20 short-time ms short-time period period (d) under(d) under hand hand gesture ges- (pushture (push motion) motion) condition. condition. The The spectrum spectrum results results were were overlaid overlaid by by fast fast Fourier Fourier transform transform analysis analy- throughsis through a continuous a continuous short-period short-period analysis. analysis.
4.2.4.2. SchemeScheme ofof JudgingJudging thethe StartStart TimeTime ofof HandHand GestureGesture TwoTwo parametersparameters werewere investigatedinvestigated toto determinedetermine thethe startstart timetime ofof thethe handhand gesturegesture operationoperation forfor eacheach short-timeshort-time signalsignal processedprocessed withwith thethe aforementionedaforementioned bandwidthbandwidth andand notchnotch filters:filters: thethe frequencyfrequency shiftsshifts andand thethe amplitudesamplitudes ofof thethe frequencyfrequency peaks.peaks. FigureFigure6 6aa shows shows the the frequency frequency values values of of the the frequency frequency peaks peaks obtained obtained with with each each time time frameframe ofof 0.020.02 s.s. These frequency valuesvalues werewere firstfirst atat aa nearnear constantconstant level,level, andand thenthen showedshowed aa rapidrapid increaseincrease andand subsequentsubsequent dropdrop fromfrom aa maximum.maximum. TheThe timetime intervalinterval ofof thethe evidentevident upsurgeupsurge andand decrease decrease region region was was approximately approximately 1 s,1 whichs, which indicated indicated that that the the hand hand gesture ges- startedture started from anfrom acceleration an acceleration state, then state, changed then changed to a deceleration to a deceleration one, and thenone, stopped.and then stopped.Although checking the maximum peak frequencies could determine the start time for mostAlthough of the studied checking hand the gestures, maximum a likely peak situation frequencies is that could the maximum determine peaks the start occurred time aroundfor most the of designed the studied cut-off hand frequencies gestures,of a thelikely bandpass situation filter is withoutthat the handmaximum gesture peaks motion. oc- Thiscurred was around because the the designed amplitude cut-off of the frequencie carrier frequencys of the bandpass after processing filter without with thehand notch ges- filterture motion. could be This less was than because the amplitudes the amplitude of the of peaks the carrier around frequency the frequency after processing region close with to thethe cut-offnotch filter frequency could be of less the than Butterworth the amplit filter.udes This of the would peaks produce around the the frequency peak-frequency region curveclose to jump the cut-off from the frequency carrier frequencyof the Butterworth to the frequency filter. This near would the produce cut-off frequencythe peak-fre- of thequency bandpass curve filterjump and from thus the cause carrier a mistakefrequency in to judging the frequency the start near time the of thecut-off hand frequency gesture. In contrast, a more reliable method could be the use of the amplitudes of the maximum of the bandpass filter and thus cause a mistake in judging the start time of the hand ges- peaks to judge the start time of the hand gestures. Figure6b shows the amplitudes of ture. the maximum peak obtained with each time frame of 0.02 s. The amplitudes of the maxi- In contrast, a more reliable method could be the use of the amplitudes of the maxi- mum peaks resulted in a larger variation during a hand gesture compared with the cases mum peaks to judge the start time of the hand gestures. Figure 6b shows the amplitudes of a hand gesture at a standby position or no hand gesture. We first attempted to utilize of the maximum peak obtained with each time frame of 0.02 s. The amplitudes of the Appl. Sci. 2021, 11, x FOR PEER REVIEW 9 of 18
Appl. Sci. 2021, 11, 5407 9 of 18
maximum peaks resulted in a larger variation during a hand gesture compared with the cases of a hand gesture at a standby position or no hand gesture. We first attempted to theutilize amplitude the amplitude difference difference of the maximum of the maximum peaks between peaks between neighboring neighboring time frames time to frames judge theto judge start time the start of the time hand of gestures. the hand This gestures. method This is consideredmethod is asconsidered the amplitude as the differentia- amplitude tiondifferentiation of continuously of continuously maximum peaks.maximum The resultpeaks. displayedThe result more displayed sharp more peaks sharp compared peaks withcompared the case with without the case differentiation. without differentiation. For other hand For other gestures, hand this gestures, differentiation this differentia- value wastion not value sufficiently was not sufficiently large to be selectedlarge to be as sele thected representative as the representative start time ofstart hand time gestures, of hand such as hand rotation motion. Consequently, we then performed the second differentiation gestures, such as hand rotation motion. Consequently, we then performed the second dif- of the result. The results of the second derivative were even sharper than those of the first ferentiation of the result. The results of the second derivative were even sharper than those derivative for all the hand gestures (Figure6c). of the first derivative for all the hand gestures (Figure 6c).
FigureFigure 6. 6. ((aa)) TheThe frequenciesfrequencies of the maximum maximum peaks peaks under under a a push push hand hand gesture gesture motion. motion. (b ()b The) The amplitudesamplitudes ofof thethe maximummaximum peaks analyzed with the the signal signal as as ( (aa).). (c ()c )The The signal signal in in (b ()b after) after 2nd 2nd
differentiation.differentiation. ( d(d)) The The result result of ofT Tpickpick [[nn]] usedused toto judgejudge thethe startstart timetime ofof handhand gesture.gesture.
Finally,Finally, we we selected selected the the start start time timeT Tpickpick [[nn]] ofof thethe hand hand gesture gesture based based on on the the second second derivativederivative ofof thethe strength strength at at the the peak peak frequency frequency with with the the maximum maximum strength strength by by defining defining the following parameter: the following parameter: A[n + 2] Tpick[n] = 𝐴[𝑛+2] , (3) 𝑇 [𝑛] = A[n + 1] − A[n] , (3) 𝐴[𝑛+1] − 𝐴[𝑛] where A[n] is the second derivative of the amplitudes of the maximum peaks at time n. whereIf weA[n consider] is the second that the derivative amplitude of the of theamplitudes maximum of peakthe maximum as a function peaks of at the time hand n. position,If weA [considern] indicates that the the acceleration amplitude of of the the maximum hand and thepeak difference as a functionA[n + of 1] the− Ahand[n] indicatesposition, theA[n jerk] indicates of the hand.the acceleration Therefore, of in the terms hand of and physical the difference meaning, A we[n determine+ 1] − A[n] theindicates moment the of jerk occurrence of the hand. of minimum Therefore, jerk in terms and a of high physical acceleration meaning, as we the determine start time the of themoment hand gesture.of occurrence A threshold of minimu valuem could jerk and be determined a high acceleration to find the as appropriatethe start timeTpick of [then] sohand that gesture. the time A step threshold n could value be evaluated could be as determined the start time to offind the the hand appropriate gesture (Figure Tpick [6nd).] so that the time step n could be evaluated as the start time of the hand gesture (Figure 6d). 5. Scheme of Hand Gesture Recognition 5.1.5. Scheme Convert of Motion Hand Signal Gesture to Time-Frequency Recognition Response 5.1. ConvertAfter judging Motion theSignal start to timeTime-Frequency of the hand Response gesture, we need to evaluate the acquired signal further for hand gesture recognition. We proposed a method of utilizing the time– After judging the start time of the hand gesture, we need to evaluate the acquired frequency response to evaluate gestures. Figure7 shows the block diagram of the proposed signal further for hand gesture recognition. We proposed a method of utilizing the time– method. As the common gesture motions were performed for approximately 1 s, we could frequency response to evaluate gestures. Figure 7 shows the block diagram of the pro- pick only the signal data in this time interval for performing the analysis of the time– posed method. As the common gesture motions were performed for approximately 1 s, frequency response once the start of the hand gesture was judged. This effectively reduced the electrical power of the microprocessor required for implementing this technique on Appl. Sci. 2021, 11, x FOR PEER REVIEW 10 of 18
we could pick only the signal data in this time interval for performing the analysis of the time–frequency response once the start of the hand gesture was judged. This effectively reduced the electrical power of the microprocessor required for implementing this tech- nique on a portable device. In this investigation, we took 150,000 sample points, which corresponded to a signal of a duration of 3 s, for spectrogram analysis while monitoring the occurrence of hand gestures. More specifically, once we detected the data point (=i) representing the start time of the hand gesture, the data points ranging from (i − 69,999) to (i + 80,000) were extracted for the spectrogram analysis. Considering the data length of 150,000 was a conservative choice to analyze a hand gesture motion fully because the re- sulting spectrogram, which is described in the following section, indicated that an effec- Appl. Sci. 2021, 11, 5407 tive hand gesture pattern only occupied approximately one half of the spectrogram10 in of the 18 x direction (time axis). The data length for judging one gesture could be significantly re- duced by employing this technique in future real-time applications. The analysis of the time–frequency response utilized the short-time Fourier trans- a portable device. In this investigation, we took 150,000 sample points, which corresponded form (STFT). The two STFT parameters important for our analysis are the window and to a signal of a duration of 3 s, for spectrogram analysis while monitoring the occurrence ofoverlap hand gestures.sizes. The Morewindow specifically, size affected once the we resolution detected in the the data time-domain point (=i) variation. representing Alt- thehough start a time small of window the hand size gesture, yielded the a data better points time ranging resolution, from the (i −frequency-domain69,999) to (i + 80,000) reso- werelution extracted was poor. for The the overlap spectrogram size was analysis. also critical Considering for the resolution the data in length both the of time 150,000 and wasfrequency a conservative domains. choice A smaller to analyze overlapping a hand resulted gesture in motion a worse fully resolution, because which the resulting indicated spectrogram,that the pixelated which image is described appeared in thein the following spectrogram. section, However, indicated a thatsmoother an effective spectrogram hand gesturewith a patternlarger overlapping only occupied required approximately a considerable one half calculation of the spectrogram effort. A Hamming in the x direction window (timewith axis).a window The size data of length 20,000 for and judging an overlap one si gestureze of 19,000 could was be selected significantly in this reduced study. This by employingindicates that this the technique spectrogram in future derived real-time for our applications. hand gesture recognition had a time reso- lution of 0.02 s and a frequency resolution of 2.5 Hz.
FigureFigure 7. 7.The The block block diagram diagram of of the the proposed proposed method method for for hand hand gesture gesture recognition. recognition.
5.2. TheProcessing analysis Spectrogram of the time–frequency Result of the responseImage for utilizedHand Gesture the short-time Recognition Fourier transform (STFT).As The previously two STFT mentioned, parameters STFT important was applied for our to analysis obtain the are spectrogram the window andfor the overlap signal sizes.of interest. The window The Hamming size affected window the size resolution was set in as the 20,000, time-domain and the overlap variation. length Although was set a small window size yielded a better time resolution, the frequency-domain resolution was poor. The overlap size was also critical for the resolution in both the time and frequency domains. A smaller overlapping resulted in a worse resolution, which indicated that the pixelated image appeared in the spectrogram. However, a smoother spectrogram with a larger overlapping required a considerable calculation effort. A Hamming window with a window size of 20,000 and an overlap size of 19,000 was selected in this study. This indicates that the spectrogram derived for our hand gesture recognition had a time resolution of 0.02 s and a frequency resolution of 2.5 Hz.
5.2. Processing Spectrogram Result of the Image for Hand Gesture Recognition As previously mentioned, STFT was applied to obtain the spectrogram for the signal of interest. The Hamming window size was set as 20,000, and the overlap length was set as 19,000 for analyzing the 150,000 data points. The image obtained from the surf function with the spectrogram computed results using the MATLAB software was directly used to Appl. Sci. 2021, 11, x FOR PEER REVIEW 11 of 18
as 19,000 for analyzing the 150,000 data points. The image obtained from the surf function with the spectrogram computed results using the MATLAB software was directly used to obtain the plot with the frequency in the range of 2.7 to 3.7 kHz, as shown in Figure 8a. This image has two major characteristics: (1) the calculated power spectrum is in a double- precision float-point format; (2) the edge of the signal pattern appears insufficiently smooth. The former characteristic could make our proposed method of image pattern recognition more complicated and time-consuming. The latter characteristic could be an issue while finding the feature points. Instead of using the default image output with surf function for the STFT result, we processed the output data of the image as follows. We converted the calculated power spectrum from the STFT format into the uint8 format, that is, all the values of the power spectrum were shown as integers from 0 to 255, and then displayed the resulting values as a grayscale image. During this conversion, only the frequency range of the STFT data was processed, which ranges from 2.7 to 3.3 kHz. This reduced the processing time of the calculation compared with dealing with the entire frequency range of the STFT results from 0 to 25 kHz. Figure 8b shows the resulting image of the push motion. However, the contrast of the image was not sufficient to differentiate the motion part from the background. An image processing technique for histogram equalization was em- ployed to enhance the contrast. Based on the histogram analysis of the image derived from the STFT, all the pixels were at a higher gray level, which indicates that the image had an over-exposure trait. We utilized the following equations so that the pixel values could be Appl. Sci. 2021, 11, 5407 mapped in the range from 0 to 255: 11 of 18
0, if 𝑟 <