AF Kohn (2006). Autocorrelation and Cross-Correlation Methods. In
Total Page:16
File Type:pdf, Size:1020Kb
260 AUTOCORRELATION AND CROSSCORRELATION METHODS A. F. Kohn (2006). Autocorrelation and Cross-Correlation Methods. In: Wiley Encyclopedia of Biomedical Engineering, Ed. Metin Akay, Hoboken: John Wiley & Sons, pp. 260-283. ISBN: 0-471-24967-X. AUTOCORRELATION AND CROSSCORRELATION METHODS ANDRE´ FABIO KOHN University of Sa˜o Paulo Sao Paulo, Brazil 1. INTRODUCTION Any physical quantity that varies with time is a signal. Examples from physiology are being an electrocardiogram (ECG), an electroencephalogram (EEG), an arterial pres- sure waveform, and a variation of someone’s blood glucose concentration along time. In Fig. 1, one one can see ex- amples of two signals: an electromyogram (EMG) in (a) and the corresponding force in (b). When a muscle con- tracts, it generates a force or torque while it generates an electrical signal that is the EMG (1). The experiment to obtain these two signals is simple. The subject is seated with one foot strapped to a pedal coupled to a force or torque meter. Electrodes are attached to a calf muscle of the right foot (m. soleus), and the signal is amplified. The subject is instructed to produce an alternating pressure on the pedal, starting after an initial rest period of about 2.5 s. During this initial period, the foot stays relaxed on the pedal, which corresponds to practically no EMG signal and a small force because of the foot resting on the pedal. When the subject controls voluntarily the alternating con- tractions, the random-looking EMG has waxing and wan- ing modulations in its amplitude while the force also exhibits an oscillating pattern (Fig. 1). Biological signals vary as time goes on, but when they are measured, for example, by a computerized system, the measures are usually only taken at pre-specified times, usually at equal time intervals. In a more formal jargon, it is said that although the original biological signal is de- fined in continuous time, the measured biological signal is defined in discrete time. For continuous-time signals, the time variable t takes values either from 1 to þ1(in theory) or in an interval between t1 and t2 (a subset of the real numbers, t1 indicating the time when the signal AUTOCORRELATION AND CROSSCORRELATION METHODS 261 estimated by 5000 P 1 NÀ1 ðwðiÞw ÞðhðiÞhÞ rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiN i ¼ 0 r ¼ P P ; ð1Þ 0 1 NÀ1 2 1 NÀ1 2 N i ¼ 0 ðwðiÞw Þ Á N i ¼ 0 ðhðiÞhÞ − 5000 where ½wðiÞ; hðiÞ, i ¼ 0; 1; 2; ...; N À 1 are the N pairs of 0 5 10 15 20 measurements (e.g., from subject number 0 up to subject number N À 1); w and h are the mean values computed (a) from the N values of w(i) and h(i), respectively. Sometimes 1000 r is called the linear correlation coefficient to emphasize that it quantifies the degree of linear relation between two 800 variables. If the correlation coefficient between the two 600 variables w and h is near the maximum value 1, it is said that the variables have a strong positive linear correla- 400 tion, and the measurements will gather around a line with positive slope when one variable is plotted against the 200 other. On the contrary, if the correlation coefficient is near 0 the minimum attainable value of À 1, it is said that the 0510 15 20 two variables have a strong negative linear correlation. In t (s) this case, the measured points will gather around a neg- (b) atively sloped line. If the correlation coefficient is near the Figure 1. Two signals obtained from an experiment involving a value 0, the two variables are not linearly correlated and human pressing a pedal with his right foot. (a) The EMG of the the plot of the measured points will show a spread that soleus muscle and (b) the force or torque applied to the pedal are does not follow any specific straight line. Here, it may be represented. The abscissae are in seconds and the ordinates are in important to note that two variables may have a strong arbitrary units. The ordinate calibration is not important here nonlinear correlation and yet have almost zero value because in the computation of the correlation a division by the for the linear correlation coefficient r. For example, standard deviation of each signal exists. Only the first 20 s of 100 normally distributed random samples were generated the data are shown here. A 30 s data record was used to compute by computer for a variable h, whereas variable w the graphs of Figs. 2 and 3. was computed according to the quadratic relation w ¼ 300 ðh À hÞ2 þ 50. A plot of the pairs of points ðw; hÞ will show that the samples follow a parabola, which means started being observed in the experiment and t2 the final that they are strongly correlated along such a parabola. time of observation). Such signals are indicated as y(t), On the other hand, the value of r was 0.0373. Statistical x(t), w(t), and so on. On the other hand, a discrete-time analysis suggests that such a low value of linear correla- signal is a set of measurements taken sequentially in tion is not significantly different to zero. Therefore, a near time (e.g., at every millisecond). Each measurement point zero value of r does not necessarily mean the two variables is usually called a sample, and a discrete-time signal is are not associated with one another, it could mean indicated by by y(n), x(n), and w(n), where the index n is that they are nonlinearly associated (see Extensions and an integer that points to the order of the measurements in Further Applications). the sequence. Note that the time interval T between two On the other hand, a random signal is a broadening of adjacent samples is not shown explicitly in the y(n) rep- the concept of a random variable by the introduction of resentation, but this information is used whenever an in- variations along time and is part of the theory of random terpretation is required based on continuous-time units processes. Many biological signals vary in a random way (e.g., seconds). As a result of the low price of computers in time (e.g., the EMG in Fig. 1) and hence their mathe- and microprocessors, almost any equipment used today matical characterization has to rely on probabilistic con- in medicine or biomedical research uses digital signal cepts (3–5). For a random signal, the mean and the processing, which means that the signals are functions autocorrelation are useful quantifiers, the first indicating of discrete time. the constant level about which the signal varies and the From basic probability and statistics theory, it is known second indicating the statistical dependencies between the that in the analysis of a random variable (e.g., the height values of two samples taken at a certain time interval. The of a population of human subjects), the mean and the time relationship between two random signals may be an- variance are very useful quantifiers (2). When studying alyzed by the cross-correlation, which is very often used in the linear relationship between two random variables biomedical research. (e.g., the height and the weight of individuals in a popu- Let us analyze briefly the problem of studying quanti- lation), the correlation coefficient is an extremely useful tatively the time relationship between the two signals quantifier (2). The correlation coefficient between N shown in Fig. 1. Although the EMG in Fig. 1a looks measurements of pairs of random variables, such as erratic, its amplitude modulations seem to have some pe- the weight w and height h of human subjects, may be riodicity. Such slow amplitude modulations are sometimes 262 AUTOCORRELATION AND CROSSCORRELATION METHODS called the ‘‘envelope’’ of the signal, which may be esti- negative delay, meaning the EMG precedes the soleus mated by smoothing the absolute value of the signal. The muscle force. Many factors, experimental and physiologic, force in Fig. 1b is much less erratic and exhibits a clearer contribute to such a delay between the electrical activity of oscillation. Questions that may develop regarding such the muscle and the torque exerted by the foot. signals (the EMG envelope and the force) include: what Signal processing tools such as the autocorrelation and periodicities are involved in the two signals? Are they the the cross-correlation have been used with much success in same in the two signals? If so, is there a delay between the a number of biomedical research projects. A few examples two oscillations? What are the physiological interpreta- will be cited for illustrative purposes. In a study of absence tions? To answer the questions on the periodicities of each epileptic seizures in animals, the cross-correlation be- signal, one may analyze their respective autocorrelation tween waves obtained from the cortex and a brain region functions, as shown in Fig. 2. The autocorrelation of the called the subthalamic nucleus was a key tool to show that absolute value of the EMG (a simple estimate of the en- the two regions have their activities synchronized by a velope) shown in Fig. 2a has low-amplitude oscillations, specific corticothalamic network (6). The cross-correlation those of the force (Fig. 2b) are large, but both have the function was used in Ref. 7 to show that insulin secretion same periodicity. The much lower amplitude oscillations by the pancreas is an important determinant of insulin in the autocorrelation function of the absolute value of the clearance by the liver. In a study of preterm neonates, it EMG when compared with that of the force autocorrela- was shown in Ref.