IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 39, NO. 9, SEPTEMBER 1994

Fast Identification n-Widths and Uncertainty Principles for LTI and Slowly Varying Systems

George Zames, Fellow, IEEE, Lin Lin, and Le Yi Wang, Member, IEEE

Abstract—The optimal worst-case uncertainty that can be achieved by identification depends on the observation time. In the first part of the paper, this dependence is evaluated for selected linear time invariant systems in the ℓ¹ and H∞ norms and shown to be derivable from a monotonicity principle. The minimal time required is shown to depend on the metric complexity of the a priori information set. Two notions of n-width (or metric dimension) are introduced to characterize this complexity. In the second part of the paper, the results are applied to systems in which the law governing the evolution of the uncertain elements is not time invariant. Such systems cannot be identified accurately. The inherent uncertainty is bounded in the case of slow time variation. The n-widths and related optimal inputs provide benchmarks for the evaluation of actual inputs occurring in adaptive feedback systems.

I. INTRODUCTION

IDENTIFICATION speed is one of the main elements affecting the performance of adaptive¹ control systems. Speed can be defined in terms of the minimal observation time needed to achieve a specified accuracy in input-output behavior or, alternatively, the maximum accuracy achievable in given time. Here, optimal speed will be computed for certain representative problems. It will be shown that the optimal speed that can be achieved in identification depends on the a priori information, independently of the particular algorithm that may be used. (The optimal speed can only be achieved for certain optimized inputs. Its main relevance in adaptive control will be in providing a benchmark against which the effectiveness of input ensembles that are actually present in feedback systems can be compared.)

It has been emphasized elsewhere [3], [2] that the concept of metric complexity can provide a unifying framework for the subjects of identification, adaptive feedback, and organization. For example, feedback can be viewed [2] as an agent for the reduction of complexity. One of the objectives here will be to establish a direct link between identification speed and complexity. With that in mind, identification problems will be considered in which the a priori information locates the plant in a region of uncertainty in a metric space. It will be shown that identification speed depends on the size of the region of uncertainty, which can be quantified using a measure of metric complexity such as an n-width or a metric dimension. In fact, the dependence of accuracy on time will be characterized by two notions of n-width which are related to the standard notions of Kolmogorov and Gel'fand.

The paper is divided into two parts. The first part, consisting of Sections II–IV, is concerned with linear time invariant systems. In the second part, consisting of Section V, the results will be extended to obtain uncertainty principles for the identification of slowly varying linear systems. Slowly varying linear systems are of interest in adaptive control because from a certain point of view they are the most general ones for which an input-output theory is useful. In particular, identification of uncertain elements has predictive value only if their future behavior is like their past or, at worst, approximately like their past. If a "black-box" system changes substantially in relation to the length of time needed to identify it, however, then accurate identification is inherently impossible. This fact is expressed through uncertainty principles, which relate the inherent uncertainty to the n-widths mentioned above.

Previous Results

The information-based approach to worst-case identification was introduced by Zames [4], [3] using concepts of ε-dimension (inverse n-width) and ε-entropy. Recently there has been a revived interest in this subject. We note especially the related works of Tse, Dahleh, and Tsitsiklis [5] and Helmicki et al. [6] as well as various results on worst-case identification, e.g., Makila [7], Gu et al. [8]. A mathematical framework for slowly varying linear time-varying (LTV) system control was given in [9], [10], and there are results on LTV identification in [11]. Portions of the present paper originally appeared in [1].

Notation

C, R, Z and Z+ denote the complexes, reals, integers and nonnegative integers.

ℓ^p[a, b], 1 ≤ p ≤ ∞, −∞ < a ≤ b < ∞, denotes the space of sequences of real numbers f(t), t being an integer in the interval a ≤ t ≤ b, satisfying ‖f‖_p := [Σ_{t=a}^{b} |f(t)|^p]^{1/p} < ∞ for 1 ≤ p < ∞, and ‖f‖_∞ := sup_{t∈[a,b]} |f(t)| < ∞. This notation is extended in the usual way to the cases where the interval has one (or both) end points missing, such as [a, b), or is (semi) infinite, such as [a, ∞).

P_[n,m] is the truncation operator on ℓ^p, defined by (P_[n,m]f)(t) := f(t) for t ∈ [n, m], and 0 otherwise.

Manuscript received February 26, 1993; revised October 12, 1993. Recommended by Associate Editor, A. Vicino.
G. Zames and L. Lin are with the Department of Electrical Engineering, McGill University, Montreal, PQ, Canada H3A 2A7.
L. Y. Wang is with the Department of Electrical and Computer Engineering, Wayne State University, Detroit, MI 48202 USA.
IEEE Log Number 9402994.

¹See [2] for a formal definition of input-output adaptation, based on the facts that performance is a function of accuracy (or uncertainty) and accuracy increases with time.
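As a concrete illustration (ours, not part of the paper), the truncation operator P_[n,m] of the Notation section can be sketched in Python, representing a sequence by a dict from integer times to values, with absent keys read as 0:

```python
def truncate(f, n, m):
    # P[n,m]f: keep f(t) for n <= t <= m; everything outside [n, m] becomes 0
    # (here: is simply dropped from the dict representation).
    return {t: v for t, v in f.items() if n <= t <= m}

f = {0: 1.0, 1: 2.0, 2: 3.0, 5: 4.0}
assert truncate(f, 1, 2) == {1: 2.0, 2: 3.0}
```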


‖S‖_L is the norm of the largest function in the subset S of a normed space L, i.e., ‖S‖_L := sup{‖k‖_L : k ∈ S}.
S|[t1,t2) is the subset of functions of S mapping Z to R with support in the interval [t1,t2) of Z, i.e., S|[t1,t2) := S ∩ P_[t1,t2)S.
Null(Φ) := {k ∈ X : Φ(k) = 0}.
H^p(D_r) denotes the Hardy space on D_r := {z ∈ C : |z| < r}, r > 0. ‖H‖_{∞,r} := sup_{z∈D_r} |H(z)| is the norm defined on H∞(D_r). H∞(D) := H∞(D_r)|_{r=1}.
U* is the dual space of U.
T: U → U is the shift operator, i.e., (Tu)(t) = u(t − 1).
d^n denotes the Gel'fand n-width.
θ^n denotes the identification n-width.
ρ(·) denotes the variation rate of a time-varying system.

II. FORMULATION: TIME-INVARIANT CASE

We will consider discrete-time systems represented by convolution operators of the form K: U → Y,

y(t) = Σ_{τ=0}^{∞} k(τ)u(t − τ), t, τ ∈ Z (1)

where k(·) ∈ L, under the following assumptions:

i) U, Y are normed linear spaces of functions Z → R representing inputs and outputs respectively. Initially, we assume that the sets U and Y are equal to ℓ∞(−∞, ∞). (Later this will be relaxed so as to include the case U = Y = ℓ²(−∞, ∞). Inputs start at −∞ to allow situations in which the system is running before observations begin.)

ii) L is a normed linear algebra consisting of causal weighting functions Z+ → R acting on input pasts. The set L is contained in ℓ¹[0, ∞), ensuring that (1) is well defined. Moreover the norm on L satisfies ‖·‖_L ≤ Const.‖·‖_{ℓ¹}. (The H∞ norm is an example of such a norm.)

Identification is concerned with estimating the kernel k ∈ L from observations of the output y and input u, given the a priori information that the true kernel lies in a subset S_prior of L. In setting up our benchmark problem, certain assumptions will be made that merit an explanation.

In general, y may be contaminated by noise, and much identification research since the 1950s was concerned with reducing the uncertainty created by such noise. The concern here, on the other hand, is primarily with (multiplicative) plant uncertainty, which will be shown to be inherently irreducible below a certain level even when there is no (additive) noise. The control implications of these two kinds of uncertainty may be very different. Since the former is less well understood than the latter, it seems desirable initially to isolate the effects of plant uncertainty when there is no noise and defer the noisy case to a sequel.² Then, for fixed u ∈ U, the map

Q_u: L → Y, Q_u(k) := Ku = y

is a linear (possibly unbounded) map from kernels to outputs.

Say the output is observed on a time interval [t0, t1) which may be infinite. For our reference problem it is important not to assume at the outset that the input is zero before t0, i.e., not to rule out the possibility that prior excitation might speed up the identification. It is that possibility which makes the problem nontrivial, and there are several reasons for not excluding it:

If observations start while the plant is in operation and the initial conditions at time t_s are not known, the response to them acts as a noise component in the output. Until that noise decays to some small level, say at t0 (t0 > t_s), observations of the output often convey no information about the plant. The useful output observation interval starts at t0, whereas the input is free to be shaped earlier, after t_s.

If the input is designed to contain a deterministic dithering component, say an almost-periodic function, then knowledge of the component completely determines its past, which can be viewed as a known prior excitation for the purposes of experiment. One would like to know whether such a prior excitation could be beneficial.

Inputs which start prior to the observations will play a role in "moving window" adaptive problems (see Section III-B).

If the input u ∈ U is known and fixed, then on the basis of the observations of the output Ku(t) in [t0, t1), the location of the true kernel, k_true, is narrowed down from the a priori data set S_prior to a smaller set, S(k_true):

S(k_true) = {k ∈ S_prior : (Ku − K_true u)(t) = 0 ∀t ∈ [t0, t1)}. (2)

It will be assumed henceforth that the a priori data set satisfies the following.

Assumption 1: S_prior is a closed convex symmetric (i.e., k ∈ S_prior ⇒ −k ∈ S_prior) subset of L.

If the estimated kernel, k_est ∈ L, is optimally chosen for S(k_true), then it can be shown³ that the norm of the worst-case uncertainty, i.e.,

e(u) := sup_{k_true ∈ S_prior} inf_{k_est} sup_{k ∈ S(k_true)} ‖k − k_est‖_L (3)

is related to the radius of a null set, δ(u), given by

δ(u) := sup{‖k‖_L : k ∈ S_prior and (Ku)(t) = 0 ∀t ∈ [t0, t1)} (4)
     = sup{‖k‖_L : k ∈ S_prior ∩ Null(P_[t0,t1)Q_u)} (5)

by the inequalities

δ(u) ≤ e(u) ≤ 2δ(u). (6)

Thus the radius δ(u) is both a lower bound and a good indicator of worst-case plant uncertainty. Moreover, it will be shown later that if u is chosen to minimize e(u), then the stronger conclusion that e(u_opt) = δ(u_opt) is possible, provided certain additional assumptions are satisfied. In that case, δ(u_opt) is the worst-case identification error. The dependence of δ(u_opt) on observation length and other factors will be computed next.

²Some of the differences between the noisy and noise free case are elaborated in [12].
³The lower bound in (6) is obtained when k_true = 0, in which case Assumption 1 implies that S(k_true) is also convex and symmetric. An optimal choice of k_est is at the Chebyshev center of S(k_true) and the resulting uncertainty in k_true is the radius of S(0) =: δ. The upper bound is obtained by choosing k_est to be any function in S(k_true).

III. UNCERTAINTY VS. OBSERVATION LENGTH: TIME-INVARIANT RESULTS

The worst-case identification uncertainty optimized over all inputs depends on the length of the interval over which the output is observed. For time-invariant systems, this can be expressed as a kind of "n-width." We recognize two situations: the first in which the input is free to be selected exogenously, and the second in which the input is a member of some ensemble which might not be entirely free. For these two respective situations, two notions of n-width will be introduced. The first, θ^n, involves inputs optimized for a particular observation interval of length n. The second, θ̄^n, motivated by certain problems in adaptive control, involves inputs optimized over all shifts of such an observation interval.

Actually, in online identification, the input is seldom free to be optimized. The point of finding optimal inputs for fast identification is, rather, to provide lower bounds and an ideal against which actual input ensembles can be compared, and towards which they can eventually be modified, e.g., by the introduction of a dither signal. With that in mind, we introduce the following.

A. The Time-Width θ^n

In the following definition of θ^n, the output observation interval [t0, t0 + n) is viewed as being fixed and the input as being optimized for that interval. At the outset, we do not wish to exclude the possibility that an optimal input might start prior to the observation interval. (See comments preceding (2).)

Definition 1: For any n ∈ Z+ and arbitrary t0 ∈ Z

θ^n(S_prior, L) := inf_{u∈U} sup{‖k‖_L : k ∈ S_prior, (Ku)(t) = 0, t ∈ [t0, t0 + n)}, n > 0
θ^n(S_prior, L) := ‖S_prior‖_L, n = 0 (7)

where ‖S_prior‖_L := sup{‖k‖_L : k ∈ S_prior}. θ^n will be referred to as the identification n-width or the time-width of S_prior.

θ^n is the uncertainty radius δ(u) optimized over all inputs in U when observations of the output are restricted to n consecutive samples. An optimal input is one for which the infimum (7) is attained, i.e., δ(u_opt) = θ^n. Since the systems in S_prior are time-invariant, t0 can be fixed at 0 without loss of generality.

θ^n is an indicator of the intrinsic uncertainty or error achievable on the basis of observations taken over n consecutive instants, and satisfies

θ^n ≤ inf_{u∈U} e(u) ≤ 2θ^n (8)

by (6). The n-widths of Kolmogorov, d_n, as well as Gel'fand, d^n, which have been used elsewhere in the context of identification (see Section IV for definitions and details), also provide lower bounds to inf_{u∈U} e(u). They are indicators of the least error achievable based on n arbitrary linear measurements in the Gel'fand case, or the least error that can be achieved in approximating the system by a linear combination of n basis systems in the Kolmogorov case. These three notions of n-width coincide frequently, and in the examples in this paper, but not in general.⁴

θ^n has been computed by the authors [1] or elsewhere in the recent literature [13] for certain special cases of the a priori data set S_prior. The results, which are summarized in the following example, were obtained by a variety of unrelated methods but appear in many respects to be very similar. Our first objective will be to derive them from a common principle.

Example 1: Let U = Y = ℓ∞(−∞, ∞), n > 0, C > 0, and 0 < r < 1. It will be shown in Section IV that

i) If S¹_prior = {k ∈ ℓ¹[0, ∞) : |k(τ)| ≤ Cr^τ ∀τ ∈ Z+}, then θ^n(S¹_prior, L) = Cr^n/(1 − r).

ii) If S²_prior = {k ∈ ℓ¹[0, ∞) : Σ_{τ=0}^{∞} |k(τ)|C⁻¹r^{−τ} ≤ 1}, then θ^n(S²_prior, L) = Cr^n.

In i) and ii), L = ℓ¹[0, ∞). In the next two cases L is the Wiener algebra, i.e., the H∞-normed algebra of functions in H∞(D) with Fourier coefficients (restricted) in ℓ¹[0, ∞).

iii) If S³_prior = {K ∈ H∞(D_{r⁻¹}) : ‖K‖_{∞,r⁻¹} ≤ C}, then

θ^n(S³_prior, L) = Cr^n. (14)

iv) If S⁴_prior = {K ∈ H∞(D) : ‖K′‖_∞ ≤ C}, where K′(z) denotes the derivative of K, then θ^n(S⁴_prior, L) = C/n.

⁴Although they always provide a bound accurate to within a factor of three. For details see L. Lin and L. Y. Wang, "Time Complexity of Fast Identification," in Proc. 1993 IEEE CDC, 1993, pp. 2099–2104.
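To make Example 1, part i) concrete, here is a small numerical sketch of ours (not the authors' computation): under an impulse input at the start of the observation interval, the first n kernel samples are revealed exactly, and the remaining ℓ¹ uncertainty for the envelope f(t) = C r^t is the tail sum C r^n/(1 − r):

```python
def theta_n(C, r, n, horizon=10_000):
    # l1 norm of the tail of the envelope f(t) = C * r**t, i.e., the
    # uncertainty radius left after the first n kernel samples are pinned down.
    return sum(C * r**t for t in range(n, horizon))

C, r, n = 1.0, 0.5, 4
# agrees with the closed form C * r**n / (1 - r)
assert abs(theta_n(C, r, n) - C * r**n / (1 - r)) < 1e-9
```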

In each of these examples an impulse at the start of the observation interval attains δ(u_opt). It is far from trivial, however, to show that such an input is in fact optimal and, consequently, that excitation of the system prior to the observation interval cannot reduce the worst-case error. It will be shown that in fact it cannot if the a priori set of impulse responses has a property of monotone decrease with time, which our examples have in common, and which will be described next.

By the norm ‖S‖_L of any subset S of L we mean ‖S‖_L := sup{‖k‖_L : k ∈ S}. A subset S of L will be called monotone decreasing if, given any fixed interval [t0, t1), the norm of S intersected with any subspace of functions of L having support on a (variable) subinterval [t_b, t_b + i) of [t0, t1) is monotone decreasing as t_b increases. A somewhat more general property than monotonicity of S requires the previous statement to be true only for subintervals of length i ≤ q, in which case S will be called q-monotone decreasing. These notions will now be defined precisely, for which some notation is needed.

Notation: For any subset S of L, S|[t1,t2) denotes the subset of functions of S with support in the interval [t1,t2) of Z, i.e., S|[t1,t2) := S ∩ P_[t1,t2)S. For a set of the form S|[0,m), a p-section is its intersection with any p-dimensional subspace of L|[0,m). A tail p-section of S|[0,m) is the intersection of S with the⁵ span sp{z^{m−p}, ..., z^{m−1}}.

Definition 2: S_prior will be called q-monotone decreasing, 1 ≤ q ≤ ∞, if every p-section X_p in S_prior|[0,m) is not smaller than the tail p-section S_prior|[m−p,m), whenever 1 ≤ p < m < ∞, p ≤ q ≤ ∞. (Here "S1 is smaller than S2" means that ‖S1‖_L ≤ ‖S2‖_L.)

In other words, in any subset of the form S_prior|[0,m), the smallest p-section is the tail p-section, and this is true for all p up to some q.

Theorem 1:
a) If the a priori set S_prior is q-monotone decreasing, then the n-width θ^n has bounds

‖S_prior|[n,n+p)‖_L ≤ θ^n(S_prior, L) ≤ ‖S_prior|[n,∞)‖_L, (p ≤ q, p < ∞). (17)

b) Moreover, if⁶

lim_{p→q} ‖S_prior|[n,n+p)‖_L = ‖S_prior|[n,∞)‖_L (18)

then θ^n(S_prior, L) = ‖S_prior|[n,∞)‖_L and an impulse at the start of the observation interval is optimal for θ^n.

For some of our results we will need an additional assumption.

Assumption 2: S_prior is closed under the truncations P_[0,t]. More generally,⁷ there is a causal Cesaro operation C_n taking S_prior into C_n(S_prior) and satisfying

‖(I − P_[0,n)C_n)S_prior‖_L ≤ ‖S_prior|[n,∞)‖_L. (19)

Here a "Cesaro operation" is any map satisfying (19), and is so called because it will typically be obtained via a Cesaro summation; i.e., the first n samples of each impulse response in S will be multiplied by a weighting function which decreases with time.

Proposition 1: If in addition to Assumption 1, Assumption 2 also holds, then under the conditions of Theorem 1b, θ^n(S_prior, L) is precisely the optimal worst-case identification error e(u_opt) for the interval [t0, t0 + n). (The estimated kernel k_est which attains this error lies in sp{1, z, ..., z^{n−1}}.)

The proof of the lower bound (17) of Theorem 1 actually extends to the Gel'fand n-width d^n, and we have

‖S_prior|[n,n+p)‖_L ≤ d^n(S_prior, L) ≤ ‖S_prior|[n,∞)‖_L. (20)

The proofs of Theorem 1, Proposition 1, and the above inequalities are in the next section.

Remark—Speed vs. Complexity: The minimal time T(ε) needed to attain an accuracy ε for fixed (S_prior, L) is the inverse of the n-width:

T(ε) = inf{n : θ^n(S_prior, L) ≤ ε}. (21)

T(ε) is a measure of the metric complexity of the set S_prior. In [4] this was called the metric dimension or ε-dimension of S_prior. An implication of Theorem 1b is that the length of time needed to identify a system to a given tolerance is exactly the metric complexity of the a priori data set.

The optimal input for θ^n, i.e., the input which attains the optimal identification accuracy θ^n in n instants, typically loses its optimality when shifted in relation to the observation interval. When the observation interval is not fixed in relation to the input, θ^n gives a lower bound which may be unattainable. In particular, in the adaptive control of slowly time-varying systems, the identified model is periodically updated on the basis of measurements from the recent past, and the model is then used to update the feedback law as in [9]. The observation interval lies in a "moving window" of constant length which advances in relation to the input, and a single input must therefore be effective for many intervals. For such cases, we introduce the second n-width, θ̄^n, in which the relation between the input and observation interval is subjected to arbitrary time shifts. θ̄^n itself is then invariant under shifts of the optimal input. It provides a benchmark for the comparison of suboptimal input ensembles, whether free or fixed.
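The inverse relation (21) between time and accuracy can be sketched numerically (an illustration of ours, using the exponential prior of Example 1, part i), for which θ^n = Cr^n/(1 − r)):

```python
def T(eps, C, r):
    # smallest n with theta^n = C * r**n / (1 - r) <= eps  (cf. (21))
    n = 0
    while C * r**n / (1 - r) > eps:
        n += 1
    return n

# with C = 1, r = 0.5: theta^n = 2 * 0.5**n, so accuracy 0.1 needs n = 5 samples
assert T(0.1, 1.0, 0.5) == 5
```

As the code makes plain, a slower decay rate r (a "larger" a priori set) raises T(ε), i.e., a metrically more complex prior takes longer to identify.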

⁵By the usual abuse of notation, z^i denotes the ith power function in L.
⁶For finite q, the limit in (18) can be avoided and we can simply let p = q.
⁷If S_prior is closed under truncations then (19) is satisfied with C_n equal to the identity.

Definition 3: For any n ∈ Z+,

θ̄^n(S_prior, L) := inf_{u∈U} sup_{t0∈Z} sup{‖k‖_L : k ∈ S_prior, (Ku)(t) = 0, t ∈ [t0, t0 + n)}, n > 0
θ̄^n(S_prior, L) := ‖S_prior‖_L, n = 0 (22)

i.e., θ̄^n is the worst-location uncertainty sup_{t0} δ(u) optimized over all inputs in U.

Proposition 2: In our examples, under the hypotheses of Example 1,

θ^n(S^i_prior, L) ≤ θ̄^n(S^i_prior, L) ≤ Q θ^n(S^i_prior, L) (23)

where the upper bound is valid for the following Q. For S¹_prior, S²_prior, and S³_prior, Q = 2; for S⁴_prior, Q = π.

Remark: In our examples, there is a loss of accuracy whenever there is no freedom to position the observation interval advantageously, but the loss never exceeds a factor of π.

IV. PROOFS AND DETAILS FOR THE TIME-INVARIANT CASE

In this section we will delineate the properties of the two n-widths, θ^n and θ̄^n, and provide proofs for the claims made in the previous section.

The n-width θ^n can be extended to more general classes of operators, for example, H∞ frequency responses acting on ℓ² inputs, as follows. Let U = Y be any normed linear space contained in ℓ∞(−∞, ∞) which is invariant under the bilateral shift T: U → U, (Tu)(t) := u(t − 1), and the time-reversal involution u*(t) := u(−t). (‖·‖_U may be different from ‖·‖_{ℓ∞}.) L is a normed algebra contained in ℓ¹[0, ∞) and L is a subspace of U*, the dual space of U. (‖·‖_L may be different from both ‖·‖_{ℓ¹} and ‖·‖_{U*}.) Then our previous definitions of θ^n and θ̄^n remain valid. This extended setup will be assumed throughout the rest of Section IV.

The n-width θ^n is often related to the Kolmogorov n-width, d_n, and bounded below by the Gel'fand n-width, d^n, both standard notions in metric complexity theory, which are defined as follows.

Definition 4: Let L be a normed linear space and S a subset of L. The n-width, in the sense of Kolmogorov, of S in L is given by

d_n(S, L) := inf_{X_n} sup_{k∈S} inf_{g∈X_n} ‖k − g‖_L (24)

where the infimum is taken over all n-dimensional subspaces X_n of L.

The Gel'fand n-width of S in L is given by

d^n(S, L) := inf_{L^n} sup{‖k‖_L : k ∈ S ∩ L^n} (25)

where the infimum is taken over all subspaces L^n of L of codimension n. A subspace is said to be of codimension n if there exist n independent bounded linear functionals f_1, ..., f_n such that L^n = {k ∈ L : f_i(k) = 0, i = 1, ..., n}.

The Gel'fand n-width can be seen as the optimized worst-case uncertainty or error when identification is based on n arbitrary linear measurements, whereas in the case of the n-width θ^n these measurements are restricted to be n consecutive output values. The Kolmogorov n-width is the least worst-case error in approximating the system by a linear combination of n basis systems in L.

Proposition 3: Let S be a convex set which contains the origin. If there exists λ > 0 such that ‖k‖_L ≥ λ‖k‖_{U*} for all k ∈ S, then

d^n(S, L) ≤ θ^n(S, L). (26)

Proof: See Appendix.

This proposition and (8) indicate that d^n is a lower bound to the error inf_{u∈U} e(u) that can be achieved on the basis of n linear measurements without regard to how the system model is realized. On the other hand, the Kolmogorov n-width d_n can similarly be shown to be a lower bound to the error that can be achieved by a realization involving a linear combination of not more than n basis vectors, without regard to how the measurements are taken. In all of our examples, d^n and d_n will coincide.

Before proving Theorem 1 we establish a result similar to Theorem 1 but for the Gel'fand n-width. The sets here are monotone decreasing in the sense specified before Theorem 1.

Theorem 1′: If the a priori set S_prior is q-monotone decreasing (1 ≤ q ≤ ∞), then the Gel'fand n-width d^n has bounds

‖S_prior|[n,n+p)‖_L ≤ d^n(S_prior, L) ≤ ‖S_prior|[n,∞)‖_L, (p ≤ q, p < ∞). (27)

Moreover, if the limit (18) holds, then

d^n(S_prior, L) = ‖S_prior|[n,∞)‖_L (28)

and the subspace L^n_opt := {k ∈ L : k(t) = 0, t ∈ [0, n)} is optimal for d^n.

Proof: It will be shown that for any L^n, L^n|[0,n+p) is a subspace of L|[0,n+p) with dimension greater than or equal to p. Therefore, by definition of p-section, the boundary of S_prior|[0,n+p) ∩ L^n|[0,n+p) is either a p-section or contains a p-section. It will follow, by the monotonicity of S_prior, that the last term in (30) is not less than ‖S_prior|[n,n+p)‖_L, giving the lower bound of (27) for all finite p, p ≤ q.

To show that dim(L^n|[0,n+p)) ≥ p, we notice that

k ∈ sp{1, z, ..., z^{n+p−1}} ∩ L^n ⟺ Σ_{t=0}^{n+p−1} k(t)f_i(t) = 0, i = 1, ..., n (31)

where the f_i's are the functionals defining L^n. Put

F = [ f_1(0)  ...  f_1(n+p−1)
      ...
      f_n(0)  ...  f_n(n+p−1) ]. (32)

By (31), we have

(k(0), ..., k(n+p−1))ᵀ ∈ Null(F). (33)

It follows that dim(L^n|[0,n+p)) = dim(Null(F)) ≥ p.

The upper bound is achieved by taking L^n = L^n_opt. When the upper bound equals the lower bound in (28), the optimality of L^n_opt follows from the definition of d^n. □

Proof of Theorem 1: Theorem 1 can be proved by invoking Theorem 1′ and Proposition 3. The upper bound in (17) is achieved by taking u to be an impulse at the start of the observation interval. To show the lower bound in (17), we notice that for all p ≤ q, p < ∞, θ^n(S_prior, L) ≥ θ^n(S_prior|[0,n+p), L). Since S_prior|[0,n+p) is contained in a finite dimensional subspace, and in such a case the hypotheses of Proposition 3 are automatically fulfilled, θ^n(S_prior, L) ≥ d^n(S_prior|[0,n+p), L), by that proposition. Since S_prior is monotone decreasing by hypothesis, S_prior|[0,n+p) is also monotone decreasing. Hence by Theorem 1′, θ^n(S_prior, L) ≥ ‖S_prior|[n,n+p)‖_L. Noticing that the above inequality holds for all finite p ≤ q, we get the lower bound in (17). When the upper bound coincides with the lower bound, the optimality of the impulse follows from the definition of θ^n. □

Proof of Proposition 1: Since Assumption 1 is satisfied, the lower bound on e(u) in (6) implies that e(u_opt) ≥ θ^n. Therefore, it is enough to show that e(u_opt) ≤ θ^n.

Under the conditions of Theorem 1, if identity (18) holds, the optimal input u_opt is an impulse at the start of the observation interval. Therefore, the set S(k_true) coincides with {k ∈ S_prior : P_[0,n)(k) = P_[0,n)(k_true)}. Let C_n be the Cesaro operation satisfying (19) and k_est := P_[0,n)C_n(k_true). By the causality of C_n, k_est is equal to P_[0,n)C_n P_[0,n)(k_true). Therefore, for all k ∈ S(k_true),

‖k − k_est‖_L ≤ ‖(I − P_[0,n)C_n)S_prior‖_L ≤ ‖S_prior|[n,∞)‖_L.

Therefore, by definition of e(·) in (3), e(u_opt) ≤ ‖S_prior|[n,∞)‖_L. The last norm is equal to θ^n by Theorem 1. □

The estimates of n-width described in Example 1 are established by the following corollaries to Theorems 1 and 1′.

Corollary 1: Let U = Y = ℓ∞(−∞, ∞) and f ∈ ℓ¹[0, ∞) be a monotone decreasing positive function. If

S_prior = {k ∈ ℓ¹[0, ∞) : |k(τ)| ≤ f(τ) ∀τ ∈ Z+} (34)

then the optimal input is an impulse at the start of the observation interval, and

e(u_opt) = θ^n(S_prior, ℓ¹[0, ∞)) = d^n(S_prior, ℓ¹[0, ∞)) = ‖P_[n,∞)(f)‖_{ℓ¹}. (35)

Proof: It will be shown that S_prior is ∞-monotone decreasing. For this it is enough to show that for any positive integers p < m < ∞, which will be held fixed in the proof, if M is a p-dimensional subspace of sp{1, z, ..., z^{m−1}}, then the p-section A := M ∩ S_prior|[0,m) of the set S_prior|[0,m) is not smaller than the tail p-section S_prior|[m−p,m), i.e., there is a function k ∈ A such that

‖k‖_{ℓ¹} ≥ ‖S_prior|[m−p,m)‖_{ℓ¹} = ‖P_[m−p,m)(f)‖_{ℓ¹} (36)

where ‖·‖_{ℓ¹} denotes the ℓ¹ norm, and the last identity holds because S_prior is closed under truncation P_[m−p,m). It will be shown that in fact there exists k ∈ A which touches the boundary of A at p points at least, i.e., there exist τ1, τ2, ..., τp in the interval [0, m) at which |k(τi)| = |f(τi)|, and |k(τ)| ≤ |f(τ)| elsewhere in the interval [0, m). Such a k clearly has the requisite property (36) because f is monotone decreasing. Its existence will be established by induction on the number of points touching the boundary.

Let k_1 be a nonzero vector in M. Since f is positive, there exists a constant a ∈ R such that a·k_1 ∈ A and a·k_1 touches the boundary of A at one point at least, say at τ1 ∈ [0, m). Suppose, next, that for some integer i, 1 ≤ i < p, there exists k_i ∈ A which touches the boundary of A at (least at) i points, τ1, τ2, ..., τi, in [0, m). Let us show that there exists k_{i+1} ∈ A which touches the boundary at (least at) i + 1 points, τ1, τ2, ..., τ_{i+1}.

Let {v_1, ..., v_p} be a basis of M, and V_i: M → sp{z^{τ1}, ..., z^{τi}} be the transformation whose matrix representation relative to this basis is

V_i = [ v_1(τ1)  ...  v_p(τ1)
        ...
        v_1(τi)  ...  v_p(τi) ]. (37)

For i < p, Null(V_i) ≠ 0. Let Δk_i ≠ 0 be any function in Null(V_i). Then Δk_i ∈ M and k_i + aΔk_i ∈ M for all a ∈ R. Since by this construction Δk_i(τj) = 0, j = 1, 2, ..., i, k_i + aΔk_i will stay on the boundary of A at τ1, τ2, ..., τi for all a ∈ R. Since Δk_i ≠ 0, k_i + aΔk_i =: k_{i+1} must touch the boundary of A at some point τ_{i+1} in [0, m) for some value a ∈ R, i.e., |k_{i+1}(τ_{i+1})| = |f(τ_{i+1})|, and τ_{i+1} ≠ τ1, τ2, ..., τi. Thus k_{i+1} touches the boundary of A at (least at) i + 1 points, and has the requisite properties. Since

lim_{p→∞} ‖S_prior|[n,n+p)‖_{ℓ¹} = lim_{p→∞} ‖P_[n,n+p)(f)‖_{ℓ¹} = ‖P_[n,∞)(f)‖_{ℓ¹}

identities (18) in Theorem 1 and (28) in Theorem 1′ apply. Therefore, by Theorem 1 and Theorem 1′, θ^n = d^n = ‖P_[n,∞)(f)‖_{ℓ¹}. Also by Theorem 1, the optimal input is an impulse at the start of the observation interval. As S_prior is closed under truncations P_[0,t], by Proposition 1, e(u_opt) = θ^n. □

Corollary 2: Let U = Y = ℓ∞(−∞, ∞), and f ∈ ℓ¹[0, ∞) be a monotone decreasing positive function. If

S_prior = {k ∈ ℓ¹[0, ∞) : Σ_{i=0}^{∞} |k(i)|f⁻¹(i) ≤ 1} (38)

then the optimal input is an impulse at the start of the observation interval, and

e(u_opt) = θ^n(S_prior, ℓ¹[0, ∞)) = d^n(S_prior, ℓ¹[0, ∞)) = f(n). (39)

Proof: We will prove that S_prior is one-monotone decreasing. For this it is enough to show that every boundary function in S_prior|[0,m] is not less than the tail function k_m defined by

k_m(i) = 0, i ≠ m; k_m(m) = f(m). (40)

Let k ∈ S_prior|[0,m] be a boundary function, i.e., Σ_{i=0}^{m} |k(i)|f⁻¹(i) = 1. Then, as f is monotone decreasing,

‖k‖_{ℓ¹} ≥ f(m) Σ_{i=0}^{m} |k(i)|f⁻¹(i) = f(m) = ‖k_m‖_{ℓ¹}.

Thus S_prior satisfies the monotonicity hypothesis in Theorem 1 and Theorem 1′.

To show that identities (18) in Theorem 1 and (28) in Theorem 1′ hold, i.e., ‖S_prior|[n,n+1)‖_{ℓ¹} = ‖S_prior|[n,∞)‖_{ℓ¹}, it is enough to show that ‖S_prior|[n,n+1)‖_{ℓ¹} ≥ ‖S_prior|[n,∞)‖_{ℓ¹}. Let k ∈ S_prior|[n,∞). As f is decreasing and Σ_{i=n}^{∞} |k(i)|f⁻¹(i) ≤ 1, we get

‖k‖_{ℓ¹} ≤ f(n) Σ_{i=n}^{∞} |k(i)|f⁻¹(i) ≤ f(n).

Hence, ‖S_prior|[n,∞)‖_{ℓ¹} ≤ f(n) = ‖k_n‖_{ℓ¹} ≤ ‖S_prior|[n,n+1)‖_{ℓ¹}, as k_n ∈ S_prior|[n,n+1). Therefore, by Theorem 1, the optimal input is an impulse at the start of the observation interval, and θ^n = f(n). Similarly, Theorem 1′ implies that d^n = f(n). Since S_prior is closed under the truncations P_[0,t], by Proposition 1, e(u_opt) = θ^n. □

Example 1, parts i) and ii) follow from the preceding corollaries when f(t) = Cr^t.

Corollary 3: Let U = Y = ℓ∞(−∞, ∞) and L be the Wiener algebra, i.e., the H∞-normed algebra of functions in H∞(D) with Fourier coefficients in ℓ¹[0, ∞). Let 0 < r ≤ 1, l ≥ 0 and l ≠ 0 if r = 1. If

S(r, l) = {K ∈ H∞(D_{r⁻¹}) : ‖K^{(l)}‖_{∞,r⁻¹} ≤ C} (41)

where K^{(l)} denotes the lth derivative of K, then the optimal input is an impulse at the start of the observation interval, and

e(u_opt) = θ^n(S(r, l), L) = d^n(S(r, l), L) = ((n − l)!/n!) C r^{n−l}. (43)

Proof: The proof is trivial when n < l. We prove this corollary for n ≥ l. S(r, l) is one-monotone decreasing, i.e., every boundary vector in S(r, l)|[0,n] is not less than the tail vector k_n defined by

k_n(i) = 0, i ≠ n; k_n(n) = ((n − l)!/n!) C r^{n−l}. (44)

This fact follows from a theorem in Pinkus [13, Theorem 2.1, p. 250] which implies that S(r, l)|[0,n] contains an (n + 1)-dimensional H∞ ball of radius ((n − l)!/n!) C r^{n−l}. That radius equals the H∞ norm of k_n, which in turn equals ‖S(r, l)|[n,∞)‖_L. Since k_n as defined in (44) is in S(r, l)|[n,n+1), ‖S(r, l)|[n,∞)‖_L = ‖k_n‖_L = ‖S(r, l)|[n,n+1)‖_L. It follows that identities (18) in Theorem 1 and (28) in Theorem 1′ hold. Therefore, an impulse at the start of the observation interval is the optimal input, and θ^n = d^n = ((n − l)!/n!) C r^{n−l}. The theorem in [13] also implies that there exists a mapping C_n on L satisfying (19) for all K ∈ S(r, l). Hence, by Proposition 1, e(u_opt) = θ^n. □

Example 1, parts iii) and iv) are special cases of Corollary 3 for the sets S(r, 0) and S(1, 1).

Proof of Proposition 2: The lower bound follows from the fact that the supremum defining θ̄^n is over a larger set than for θ^n.

For the upper bound, we consider a sequence of impulses as the input, i.e., u(t) = 1 for t = mn, m ∈ Z, and u(t) = 0 elsewhere. Since the input is a periodic function of period n, the output is similarly a periodic function and is zero on an interval of length n if and only if it is zero everywhere. In this case the location of the observation interval does not affect δ(u) and we can arbitrarily set t0 = 0, whereupon

θ̄^n(S^i_prior, L) ≤ sup{‖k‖_L : k ∈ S^i_prior, (Ku)(t) = 0, 0 ≤ t < n}. (45)

i) In the cases of S¹_prior and S²_prior, (Ku)(t) = 0, 0 ≤ t < n, implies that

Σ_{τ=−t}^{∞} k(t + τ)u(−τ) = 0, 0 ≤ t < n.

Since u is a sequence of impulses, we have

i.e., k(t) = - E,"=,k(t + mn),0 5 t < n. It follows Now we prove that K(eiw)is Lipschitz continuous with that Lipschitz constant llK('llm.Since K(z)E H"(D), integration on the arc {reiw:w1 5 w 5 wz} gives us IK(reiwl)- K(reiW2)l t=O t=n

n-1 03 ---K(reiw) dw , (46) 5 Ik(t + mn)I + ll~[n,m)(k)Illb t=O m=l w1 dK(reiw)d(re2") = 2lP[n,m)(k)IlP. dui. (47)

By (43, we have 5 rIK'(reiw)Idw, (48) --7L 6 (Skrior, l'> I 'SUP {IIp[n,W)(k)IIll : E SZprior} L ~IIK'II"IW1 - (49) = 28"(s;,;,,, P). 4. Therefore ii) In the case of Sirior,considering the discrete Fourier transforms of both the input and the output, we have IK(eiwl)- K(eiwz)I

where K(wj) = E,"=,k(T)eiwJT and wj = (27rj/n),j = 0, 1, . . . , n - 1. Therefore by (45)
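The impulse-train argument in the proof of Proposition 2, part i), can be checked numerically. The sketch below is an illustration, not from the paper; the period n, the horizon T, and the decaying tail values of the kernel are arbitrary choices. It builds a causal kernel nulled by the period-n impulse train, so that k(t) = −Σ_{m≥1} k(t+mn) on [0, n), and verifies the bound ||k||_{l¹} ≤ 2||P_{[n,∞)}(k)||_{l¹} together with the vanishing of the output on one period.

```python
# Numerical check of ||k||_1 <= 2 ||P_[n,inf) k||_1 for kernels nulled by an
# impulse train of period n (proof of Proposition 2, part i).
# The tail values below are arbitrary illustrative choices.

n = 5          # period of the impulse train
T = 200        # truncation horizon standing in for the infinite tail
r = 0.8        # decay rate of the illustrative tail (arbitrary)

# Arbitrary tail k(t), t >= n, with geometric decay.
tail = {t: ((-1) ** t) * (r ** t) for t in range(n, T)}

# Head forced by (K u)(t) = sum_{m>=0} k(t + m n) = 0 for 0 <= t < n, i.e.,
# k(t) = -sum_{m>=1} k(t + m n).
head = {t: -sum(tail.get(t + m * n, 0.0) for m in range(1, T // n + 1))
        for t in range(n)}

k = {**head, **tail}

ell1 = sum(abs(v) for v in k.values())       # ||k||_1
tail1 = sum(abs(v) for v in tail.values())   # ||P_[n,inf) k||_1

# The head is an aliased (folded) copy of the tail, so its l1 norm cannot
# exceed the tail's, giving ||k||_1 <= 2 ||P_[n,inf) k||_1.
assert ell1 <= 2 * tail1 + 1e-12

# The output of the impulse train is n-periodic; it vanishes on one period,
# hence everywhere.
y = [sum(k.get(t + m * n, 0.0) for m in range(0, T // n + 1)) for t in range(n)]
assert all(abs(v) < 1e-9 for v in y)
print(ell1 <= 2 * tail1, max(abs(v) for v in y) < 1e-9)
```

The folding of the tail onto the head is exactly the aliasing induced by a periodic input, which is why a periodic probe cannot distinguish a kernel from its tail beyond one period.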

where B_n is a Blaschke product in H∞(D_{r⁻¹}) with zeros at the points e^{iω_j}; explicitly, B_n(z) = r^n (z^n − 1)/(1 − r^{2n} z^n). This implies that θ̄^n(S³_prior, H∞) ≤ C||B_n||_∞. To compute ||B_n||_∞, we notice that

|B_n(e^{iω})|² = 2r^{2n}(1 − cos nω)/(1 + r^{4n} − 2r^{2n} cos nω).

Therefore ||B_n||_∞ = 2r^n/(1 + r^{2n}), attained at cos nω = −1, and θ̄^n(S³_prior, H∞) ≤ 2Cr^n/(1 + r^{2n}).

iii) In the case of S⁴_prior, if K ∈ S⁴_prior, then K has bounded derivative in D, and Hardy's inequality implies that K(z) = Σ_{τ=0}^{∞} k(τ)z^τ with Σ_{τ=0}^{∞} |k(τ)| < ∞. Hence K(e^{iω}) is defined for all ω, K(e^{iω}) = lim_{r→1} K(re^{iω}), and K(e^{iω}) is Lipschitz continuous with Lipschitz constant ||K'||_∞: since K(z) ∈ H∞(D), integration along the arc {re^{iω}: ω₁ ≤ ω ≤ ω₂} gives us

|K(re^{iω₁}) − K(re^{iω₂})| = | ∫_{ω₁}^{ω₂} (d/dω) K(re^{iω}) dω | (46)
= | ∫_{ω₁}^{ω₂} K'(re^{iω}) (d(re^{iω})/dω) dω | (47)
≤ ∫_{ω₁}^{ω₂} r |K'(re^{iω})| dω (48)
≤ r ||K'||_∞ |ω₁ − ω₂|, (49)

and letting r → 1, |K(e^{iω₁}) − K(e^{iω₂})| ≤ ||K'||_∞ |ω₁ − ω₂|. It follows that

θ̄^n(S⁴_prior, H∞) ≤ sup {||K||_∞ : ||K'||_∞ ≤ C, K(e^{iω_j}) = 0, j = 0, 1, …, n−1}.

Since every ω lies within π/n of one of the zeros ω_j, the Lipschitz bound gives θ̄^n(S⁴_prior, H∞) ≤ Cπ/n. □

V. IDENTIFICATION UNCERTAINTY IN SLOWLY VARYING SYSTEMS

In the first part of the paper, which dealt with time-invariant systems, it was shown that the observations needed to achieve a certain accuracy require time to complete, and the relation between time and accuracy is captured by the n-width θ^n. If, however, the system changes in the course of that time and the observations cannot catch up with the changes, then there is an irreducible error in identification. The purpose of the second part of the paper, which deals with time-varying systems, is to capture these facts in an uncertainty principle.

Time-varying systems will be represented by Volterra sum operators K: U → Y, where

y(t) = Σ_{τ=0}^{∞} k(t, τ) u(t − τ), t, τ ∈ Z. (53)

Here, as in the time-invariant case, U and Y are normed linear spaces contained in the set l∞(−∞, ∞). A distinction will be made between kernels k(·,·): Z × Z⁺ → R and the weighting functions that these kernels induce, denoted by k_t(·), t ∈ Z, which satisfy k_t(τ) := k(t, τ), k_t(·): Z⁺ → R. It will be assumed that kernels k(·,·) belong to a normed algebra B satisfying the condition k_t(·) ∈ L ∀t ∈ Z, where L is a convolution algebra defined as in the time-invariant case (recall that ||·||_L ≤ const·||·||_{l¹}; an example of L is H∞ of an enlarged disk). The norm on B is

||k||_B := sup_{t ∈ Z} ||k_t||_L. (55)

The product in B of two kernels in B is the kernel of the product operator. Identification will be considered in the L-norm for weighting functions and/or the B-norm for the kernels. The B-norm is a natural choice where it coincides with the operator norm of the time-varying operator, as in the case where L = l¹. More generally, the precise operator norm may be intractable, but the B-norm is nevertheless suitable for the "frozen-time" analysis of systems, as in [9], [10]. There, the systems vary slowly with time or "approximately commute with the shift," and the B-norm is an upper bound on the operator norms of the local⁸ operators.

The rate of change ρ of such a system is defined to be

ρ(k) := sup_{t ∈ Z} ||k_{t+1} − k_t||_L. (56)

For any⁹ subset Ŝ ⊂ B, ρ(Ŝ) := sup_{k ∈ Ŝ} ρ(k).

Suppose that the a priori information concerning a system locates its weighting functions k_t(·) in a set S_prior ⊂ L and limits its rate of change, but does not otherwise constrain the manner in which it changes with time, i.e.,

Ŝ_prior = {k ∈ B: k_t ∈ S_prior ⊂ L ∀t ∈ Z, and ρ(k) ≤ c < ∞} (57)

which implies that ρ(Ŝ_prior) = c < ∞. Here S_prior again satisfies Assumption 1 of Section II and is therefore a closed convex symmetric set.

First, we consider the identification of the weighting functions k_t(·) in the L-norm. To get the most general lower bounds to uncertainty, assume that the entire histories of the input u and output y on (−∞, ∞) are known. (If they are not, a greater lower bound is possible.) Based on these observations of u and y, the location of the true kernel k_true is narrowed down, as in the time-invariant case, from Ŝ_prior to a smaller set

Ŝ(k_true) := {k ∈ Ŝ_prior : (Ku − K_true u)(t) = 0 ∀t ∈ Z} (58)

and the uncertainty in the corresponding weighting function at time t₀ is reduced from S_prior to the subset

S(k_true, t₀) := {k(t₀, ·): k ∈ Ŝ(k_true)}. (59)

Again, as in (3), the worst-case uncertainty in identifying the weighting function at t₀, for an optimally chosen estimate (k_est)_{t₀}, is

e(u, t₀) := sup_{k_true ∈ Ŝ_prior} inf_{(k_est)_{t₀} ∈ L} sup_{k_{t₀} ∈ S(k_true, t₀)} ||k_{t₀} − (k_est)_{t₀}||_L (60)

and is a function of the input u. We would like to relate this uncertainty optimized over all inputs, i.e.,

Δ(S_prior, L, t₀) := inf_{u ∈ U} e(u, t₀) (61)

to the n-width θ^n(S_prior, L), and show that if the rate ρ(Ŝ_prior) is greater than zero then there is an irreducible uncertainty in identification no matter what the input. For this we will need the following lower bound on e(u, t₀), whose derivation is similar to that of (6) for the time-invariant case:

e(u, t₀) ≥ sup {||k_{t₀}||_L : k_{t₀}(·) = k(t₀, ·) for some k ∈ Ŝ_prior for which (Ku)(t) = 0 ∀t ∈ Z}. (62)

Given θ^n(S_prior, L), introduce the function ψ mapping the positive integers {1, 2, …} into R⁺ ∪ {∞}, ψ(n) := (1/n)θ^{2n−1}(S_prior, L). Then ψ(n) is monotone decreasing in n. Let ψ⁻¹ be the inverse relation ψ⁻¹: R⁺ → Z⁺,

ψ⁻¹(x) := inf {n: n > 0, ψ(n) ≤ x}

which is also monotone decreasing.

Theorem 2: If the a priori uncertainty set Ŝ_prior has a rate of change ρ(Ŝ_prior), then the optimal worst-case uncertainty Δ(S_prior, L, t₀) in identification of the weighting function k_{t₀} at time t₀ has the lower bound¹⁰

Δ(S_prior, L, t₀) ≥ θ^{2ψ⁻¹(ρ)−1} ∀t₀ ∈ Z (63)

where θ := θ(S_prior, L) and ρ := ρ(Ŝ_prior).

Proof: By (62), it is enough to show that for all u ∈ U and t₀ ∈ Z there exists a null kernel k ∈ Ŝ_prior ∩ Null(û) whose frozen-time system k_{t₀} is appropriately large. Choose n = ψ⁻¹(ρ). By definition of θ^{2n−1}, given ε > 0, u ∈ U, and t₀ ∈ Z, there is an impulse response k_{t₀} ∈ S_prior for which the (time-invariant) system operator K_{t₀} satisfies (K_{t₀}u)(t) = 0 for t₀ − n < t < t₀ + n, and

θ^{2n−1} − ε ≤ ||k_{t₀}||_L ≤ θ^{2n−1}. (64)

Define k ∈ B by

k(t, ·) := k_{t₀}(·)(1 − |t − t₀|/n) if t₀ − n < t < t₀ + n, and k(t, ·) := 0 elsewhere. (65)

⁸The local operator of K at time t is the time-invariant operator K_t with the impulse response k_t(·).
⁹We will use hatted capitals, such as Ŝ, to denote sets of kernels in B, and unhatted capitals to denote the sets of corresponding weighting functions in L.
¹⁰Note that the superscript following θ is not an exponent but an index of n-width.

The resulting (time-varying) operator K is null, i.e., Ku = 0, and k(t, ·) ∈ S_prior for all t. Also, the choice n = ψ⁻¹(ρ), together with (64) and (65), implies that K has an appropriate rate, i.e., ρ(k) ≤ ψ(n) ≤ ρ(Ŝ_prior). Hence k ∈ Ŝ_prior ∩ Null(û). Finally,

e(u, t₀) ≥ ||k_{t₀}||_L (by (62))
≥ θ^{2n−1} − ε (by (64))
= θ^{2ψ⁻¹(ρ)−1} − ε (ψ⁻¹(ρ) = n)

and since this holds for all ε > 0 and u ∈ U, the theorem follows. □

Theorem 2 is our uncertainty principle. It shows that there is an irreducible worst-case uncertainty in identification, which increases with the time interval needed to achieve a given accuracy and with the rate of change of the system during that interval.

In fact, the uncertainty is greater than zero not only in the worst case but for any system which is not too close to the boundary of S_prior. This is shown in the next corollary to Theorem 2. Denote the uncertainty for a weighting function (k_true)_{t₀} by

e(u, t₀, k_true) := inf_{(k_est)_{t₀} ∈ L} sup_{k_{t₀} ∈ S(k_true, t₀)} ||k_{t₀} − (k_est)_{t₀}||_L. (66)

Also, denote the optimal uncertainty for (k_true)_{t₀} by

Δ(S_prior, L, t₀, k_true) := inf_{u ∈ U} e(u, t₀, k_true). (67)

Corollary 4: If (k_true)_t ∈ αS_prior ∀t ∈ Z and ρ(k_true) ≤ βρ(Ŝ_prior) for some 0 ≤ α < 1, 0 ≤ β < 1, then for all t₀ ∈ Z

Δ(S_prior, L, t₀, k_true) ≥ (1 − α) θ^{2ψ⁻¹[(1−β)ρ]−1} (68)

where θ := θ(S_prior, L), ρ := ρ(Ŝ_prior), and αS_prior := {αk: k ∈ S_prior}.

Proof: It will be shown later that a closed convex-symmetric set Ŝ♭ ⊂ B can be found such that (k_true + Ŝ♭) ⊂ Ŝ_prior. It will follow that for all u ∈ U

e(u, t₀, k_true) ≥ sup {||k_{t₀}||_L : k_{t₀} = k(t₀, ·) for some k ∈ Ŝ♭ for which Ku = 0}. (69)

To see that (69) will follow, note that (k_true + Ŝ♭) ⊂ Ŝ_prior implies that the set in (59) satisfies

S(k_true, t₀) ⊇ S♭(k_true, t₀) := {(k_true + k)(t₀, ·) : k ∈ Ŝ♭, Ku = 0}. (72)

Since S♭(k_true, t₀) is convex and symmetric around (k_true)_{t₀}, the optimal choice of (k_est)_{t₀} for the right-hand side of (72) is (k_true)_{t₀}, i.e.,

e(u, t₀, k_true) ≥ sup_{k_{t₀} ∈ S♭(k_true, t₀)} ||k_{t₀} − (k_true)_{t₀}||_L (73)

which implies (69) by definition of S♭(k_true, t₀).

An appropriate set Ŝ♭ is the subset of Ŝ_prior

Ŝ♭ := {k: k_t ∈ (1−α)S_prior ∀t ∈ Z, and ρ(k) ≤ (1−β)ρ(Ŝ_prior)}

for then k ∈ Ŝ♭ implies (k + k_true)_t ∈ ((1−α)S_prior + αS_prior) ⊂ S_prior ∀t ∈ Z, and ρ(k + k_true) ≤ (1−β)ρ(Ŝ_prior) + βρ(Ŝ_prior) = ρ(Ŝ_prior), which implies that k + k_true ∈ Ŝ_prior. Since (69) holds for all u ∈ U, applying the method of proof of Theorem 2 to Ŝ♭, we get the corollary. □

When α = β = 0, the lower bound for Δ(S_prior, L, t₀, k_true) is the same as the one for the optimal worst-case uncertainty Δ(S_prior, L, t₀) in the theorem.

Now we consider the identification of a time-varying kernel in the normed space B. Based on the observations on the infinite interval (−∞, ∞), uncertainty as to the true kernel is reduced to Ŝ(k_true), as in (58). The worst-case uncertainty for the optimally chosen estimate k_est in the B-norm is

e(u) := sup_{k_true ∈ Ŝ_prior} inf_{k_est ∈ B} sup_{k ∈ Ŝ(k_true)} ||k − k_est||_B. (74)

Since the B-norm of a time-varying kernel is the supremum of the L-norms of its weighting functions, it is not difficult to show that e(u) ≥ e(u, t₀) for all t₀. Therefore, the lower bound in Theorem 2 is also a lower bound to the optimal worst-case uncertainty of the time-varying kernel, Δ(Ŝ_prior, L) := inf_{u ∈ U} e(u); i.e., we have the following.

Corollary 5: Under the hypotheses of Theorem 2,

Δ(Ŝ_prior, L) ≥ θ^{2ψ⁻¹(ρ)−1}. (75)

A result similar to Corollary 4 is also easy to obtain for time-varying kernels.

Example 2: In the following we assume that the sets S^i_prior, i = 1, 2, 3, 4, are defined as in Example 1; (k_true)_t ∈ αS_prior ∀t ∈ Z; and ρ(k_true) ≤ βρ(Ŝ_prior). For fixed r > 0, let ψ̄ be the function ψ̄: [0, ∞) → Z⁺,

ψ̄(x) := inf {n: r^{2n−1}/n ≤ x}.

i) If S_prior = S¹_prior, then it has been claimed that θ^{2n−1} = (C/(1−r)) r^{2n−1}. By Corollary 4, for all t₀ ∈ Z

Δ(S_prior, L, t₀, k_true) ≥ (1 − α) (C/(1−r)) r^{2ψ̄[(1−β)ρ(1−r)/C]−1}.
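The bound in case i) can be evaluated by direct search over n; the sketch below is an illustration only, and all parameter values (C, r, α, β, ρ) are arbitrary choices, with the inverse relation taken as ψ̄(x) = inf{n: r^(2n−1)/n ≤ x} as in Example 2.

```python
# Illustrative evaluation of the case-i) lower bound
#   (1 - alpha) * (C / (1 - r)) * r**(2*psi_bar(x) - 1),
# with x = (1 - beta) * rho * (1 - r) / C.  All parameter values are
# arbitrary choices for illustration.

def psi_bar(x, r, n_max=10_000):
    """Inverse relation: smallest n > 0 with r**(2n-1)/n <= x."""
    for n in range(1, n_max + 1):
        if r ** (2 * n - 1) / n <= x:
            return n
    raise ValueError("no n found; increase n_max")

C, r = 1.0, 0.9          # a priori bound and decay rate (arbitrary)
alpha, beta = 0.25, 0.5  # closeness of k_true to the set/rate boundaries
rho = 0.05               # rate of change of the uncertainty set (arbitrary)

x = (1 - beta) * rho * (1 - r) / C
n = psi_bar(x, r)
bound = (1 - alpha) * (C / (1 - r)) * r ** (2 * n - 1)

# Sanity checks: psi_bar returns the *first* n satisfying the inequality,
# and a slower rate of change yields a larger index n, hence a smaller
# irreducible uncertainty (longer usable observation interval).
assert r ** (2 * n - 1) / n <= x
assert n == 1 or r ** (2 * (n - 1) - 1) / (n - 1) > x
n_slow = psi_bar((1 - beta) * 0.005 * (1 - r) / C, r)
assert n_slow >= n
print(n, bound)
```

The monotone interplay checked at the end, a larger rate ρ forcing a smaller index and therefore a larger lower bound, is exactly the trade-off stated after Theorem 2.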

ii) If S_prior = S²_prior or S_prior = S³_prior, then θ^{2n−1} = Cr^{2n−1}. Hence, for all t₀ ∈ Z

Δ(S_prior, L, t₀, k_true) ≥ (1 − α) C r^{2ψ̄[(1−β)ρ/C]−1}.
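Case ii) for S³_prior rests on the Blaschke-product bound computed in the first part of the paper. The factored form B_n(z) = rⁿ(zⁿ − 1)/(1 − r^{2n} zⁿ) used below is an assumption, reconstructed to be consistent with the modulus formula |B_n(e^{iω})|² = 2r^{2n}(1 − cos nω)/(1 + r^{4n} − 2r^{2n} cos nω); under that assumption, a grid search over the unit circle confirms ||B_n||_∞ = 2rⁿ/(1 + r^{2n}).

```python
import cmath

# Grid check that B_n(z) = r**n (z**n - 1) / (1 - r**(2n) z**n), a Blaschke
# product with zeros at the n-th roots of unity, has sup norm
# 2 r**n / (1 + r**(2n)) on the unit circle.  The form of B_n is an
# assumption (see lead-in); n and r are arbitrary illustrative values.

def B(z, n, r):
    return r ** n * (z ** n - 1) / (1 - r ** (2 * n) * z ** n)

n, r = 4, 0.8
grid = [cmath.exp(1j * 2 * cmath.pi * k / 20000) for k in range(20000)]
sup = max(abs(B(z, n, r)) for z in grid)
closed_form = 2 * r ** n / (1 + r ** (2 * n))

# The maximum is attained where cos(n*w) = -1; the grid contains such points.
assert abs(sup - closed_form) < 1e-3
# B_n vanishes at the n-th roots of unity, the DFT frequencies omega_j.
assert all(abs(B(cmath.exp(1j * 2 * cmath.pi * j / n), n, r)) < 1e-9
           for j in range(n))
print(sup, closed_form)
```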

iii) If S_prior = S⁴_prior, then θ^{2n−1} = C/(2n−1). By definition of ψ, ψ(n) = C/(n(2n−1)), so that the inverse relation satisfies ψ⁻¹(x) = inf{n: n > 0, n(2n−1) ≥ C/x} and, by Corollary 4,

Δ(S_prior, L, t₀, k_true) ≥ (1 − α) C (2ψ⁻¹[(1−β)ρ] − 1)⁻¹ ∀t₀ ∈ Z

where k_true satisfies the hypotheses of the corollary.

Remarks on Prediction Uncertainty

It might be expected that for quickly changing uncertain plants, observations from the remote past should contribute little to the identification of the present weighting function; i.e., the useful observation interval should get shorter as the time-variation rate ρ(Ŝ_prior) increases. This is borne out by Theorem 2 and the examples, which show that the optimal identification error is bounded below by a monotone increasing function of the time-variation rate ρ(Ŝ_prior). The error in predicting (k_true)_{t₀+1} from observations of y on (−∞, t₀] is bounded below by Δ(S_prior, L, t₀, k_true) + ρ at least.¹¹ If that bound exceeds ||S_prior||_L, then identification provides no information about future behavior, and it becomes impossible to construct a model with any predictive power. This happens whenever the rate ρ satisfies ρ ≥ ρ_max, where ρ_max is the solution of

(1 − α) θ^{2ψ⁻¹[(1−β)ρ_max]−1} + ρ_max = ||S_prior||_L (79)

where k_true satisfies the hypotheses of the corollary.

¹¹A greater lower bound can be obtained by exploiting the fact that observations are available only on (−∞, t₀).

Concluding Remarks

The minimal time needed to identify a system to a specified accuracy in input-output behavior has been computed for selected cases. That time has been shown to depend on the metric complexity of the a priori data set, as measured by the metric dimension or n-width. It is the relationship between time and complexity that makes it fruitful to pose identification problems in the context of complexity theory. In that context, identification and feedback both serve the common purpose of reducing plant uncertainty and thereby reducing complexity [2], [3].

If a system changes while it is being identified, then there is an irreducible uncertainty as to its input-output behavior, which has been related to its rate of change. The irreducible identification errors derived here exist even if there is no additive noise.

APPENDIX I

Proof of Proposition 3: For any u ∈ U, write u_i = T^{−i}u*, which is in U under our assumption that U is closed under T and under time-reversal (·)*. It will be shown later that, under the hypotheses of the proposition, there exists a subspace S ⊂ L containing the a priori uncertainty set S_prior and with the property that, for each u ∈ U, the sum Σ_{t=0}^{∞} k(t)u_i(t) defines a linear functional bounded in the L norm on S. Now let L^n(u) be the space consisting of those k ∈ S which lie in the intersection of the null spaces of the functionals determined by the u_i, i = 0, …, n−1. L^n(u) is a subspace of codimension n in S. As d^n is by definition an infimum over all spaces of codimension n,

||S_prior ∩ L^n(u)||_L ≥ d^n(S_prior, S). (76)

Since (76) holds for all u ∈ U, the infimum θ^n of the left side of (76) over u satisfies

θ^n(S_prior, L) ≥ d^n(S_prior, S). (77)

Now, using the fact that every bounded linear functional on S can be extended to a bounded linear functional on L with preservation of norm (by the Hahn-Banach Theorem), it is not hard to show that d^n(S_prior, S) = d^n(S_prior, L). The proposition follows.

It remains to show the existence of such a subspace S. Put

S := {k ∈ L: ck ∈ S_prior for some c ∈ R}. (78)

As S_prior is a convex set which contains the origin, S is a subspace. Because S ⊂ L ⊂ U*, each u_i ∈ U defines a linear functional on S bounded in the U* norm. Since ||k||_L ≥ λ||k||_{U*} for all k ∈ S by hypothesis, the functional on S defined by u is also bounded in the L norm, and S has the properties claimed. □

REFERENCES

[1] L. Lin, L. Y. Wang, and G. Zames, "Uncertainty principles and identification n-widths for LTI and slowly varying systems," in Proc. Amer. Contr. Conf., 1992, pp. 296-300.
[2] G. Zames, "Adaptive feedback, identification and complexity: An overview," in Proc. 32nd Conf. Dec. Contr., 1993, pp. 2068-2075.
[3] G. Zames, "Feedback organizations, learning and complexity in H∞," Plenary Lecture, Proc. Amer. Contr. Conf., Atlanta, 1988.
[4] G. Zames, "On the metric complexity of causal linear systems: ε-entropy and ε-dimension for continuous time," IEEE Trans. Automat. Contr., vol. AC-24, no. 2, pp. 222-230, Apr. 1979.
[5] D. N. C. Tse, M. A. Dahleh, and J. N. Tsitsiklis, "Optimal asymptotic identification under bounded disturbances," in Proc. 30th Conf. Dec. Contr., 1991, pp. 623-628.
[6] A. J. Helmicki, C. A. Jacobson, and C. N. Nett, "Control oriented system identification: A worst-case/deterministic approach in H∞," IEEE Trans. Automat. Contr., vol. 36, pp. 1163-1176, 1991.
[7] P. M. Mäkilä, "On identification of stable systems and optimal approximation," Automatica, vol. 27, pp. 663-676, 1991.
[8] G. Gu and P. P. Khargonekar, "Linear and nonlinear algorithms for identification in H∞ with error bounds," IEEE Trans. Automat. Contr., vol. 37, no. 7, pp. 953-963, July 1992.

[9] G. Zames and L. Y. Wang, "Local-global algebras for slow H∞ adaptation: Part I: Inversion and stability," IEEE Trans. Automat. Contr., vol. 36, no. 2, pp. 130-142, Feb. 1991.
[10] L. Y. Wang and G. Zames, "Local-global algebras for slow H∞ adaptation: Part II: Optimization of stable plants," IEEE Trans. Automat. Contr., vol. 36, no. 2, pp. 143-151, Feb. 1991.
[11] L. Gerencsér, "On a class of mixing processes," Stochastics, vol. 26, pp. 165-191, 1989.
[12] L. Lin, L. Y. Wang, and G. Zames, "Fast identification n-widths and quasianalytic inputs for continuous LTI systems," in Proc. Amer. Contr. Conf., 1993, pp. 1226-1230.
[13] A. Pinkus, n-Widths in Approximation Theory. New York: Springer-Verlag, 1985.

Lin Lin was born in Beijing, China, on May 22, 1961. He received the B.E. and M.E. degrees from Tsinghua University, Beijing, China, in 1984 and 1987, respectively, and the Ph.D. degree from McGill University, Montreal, Canada, in 1993, all in electrical engineering.

Since December 1993, Dr. Lin has been with INRS-Telecommunications, a research institute affiliated with Bell Northern Research in Montreal, where he is currently holding an NSERC postdoctoral fellowship. His research interests are in the areas of system identification, complexity theory, adaptive systems, and noise cancellation.

George Zames (S'57-M'61-SM'78-F'79) received the B.Eng. degree in 1954 from McGill University, Montreal, Canada, and the Sc.D. degree in 1960 from the Massachusetts Institute of Technology, Cambridge, MA.

Dr. Zames is currently the Macdonald Professor of Electrical Engineering at McGill University. He is a Fellow of the Royal Society of Canada, the Canadian Institute for Advanced Research, and the IEEE. He has written many papers on systems and control theory, won many awards for his research, and served on many journal editorial boards.

Le Yi Wang (S'85-M'89) received the M.E. degree in computer control from the Shanghai Institute of Mechanical Engineering, Shanghai, China, in 1982, and the Ph.D. degree in electrical engineering from McGill University, Montreal, Canada, in 1990.

From 1982 to 1984, he taught computer courses in the Department of Automation, the Shanghai Institute of Mechanical Engineering. Since 1990, he has been with Wayne State University, Detroit, Michigan, where he is currently Assistant Professor in the Department of Electrical and Computer Engineering. His research interests are in the areas of H-infinity optimization, robust control, slowly time-varying systems, system identification, and adaptive systems.

Dr. Wang was awarded the Research Initiation Award from the National Science Foundation in 1992.