arXiv:1005.5170v1 [cs.LG] 25 May 2010 elvle otfntosdfie ncmlxdmis( me a domains as complex 10], on 4, defined 8, functions 12, cost 2, valued 1, sig real 7, the [13, in filtering popular adaptive very complex of become has [15] calculus Wirtinger’s Introduction 1 http://users.uoa.gr/ naEciendmi ihadul iesoaiy( dimensionality double a with Instea domain used. Euclidean be a cannot on derivative complex the therefore and n ope aibe hl nscin3temi oin and notions main the 3 section in while variable, complex one h rdet,wihmybcm eiu ftedul dimens double the if tedious become el may simpl more which Therefore, gradients, and derivative. the equivalent comp complex the an the when using with that resemblance described out be Wirt turns may of it derivatives However, foundation derivative. theoretical real the the that emphasize should result We therefore and derivatives of definition alternative Spaces) presen Kernel Hilbert rigorous extend Reproducing it more b) as a and (such insightful spaces provide more Hilbert finds to author endeavors the Kreutz-D that it K. aspects a) of text introductory twofold: the is is summari framework to unified attempt a first in recommended Wirtinge highly the and of use excellent ideas An and main the calculus of Wirtinger’s presentation self-consistent of [14 concepts therefor are the and abstruse, textbooks understand technically these and completely of specialized a Some highly in mentioned conditions). introduced Riemann are are Cauchy they calculus non-holomor or Wirtinger’s superficially, involves of either it ideas since the ignored, n textbooks, only usually is nee It is fields specialist. calculus the mathematical to other only accessible from dea are great materials and a manifolds properly studied on be calculus to order in which properties, a eoepplri h inlpoesn omnt,most community, processing [13]. signal filters t the estimation applied linear in been has popular and become countries deri has complex speaking standard German the the cumbe of in rules known become the to computations ba resemblance is the great that a formulation, that equivalent is alternative an approach provides this of price ∗ .Buolsi ihteDprmn fIfraisadtelec and Informatics of Department the with is Bouboulis P. h aucithstomi at.Scin2 el ihordi with deals 2, Section parts. main two has manuscript The otCmlxAayi’txbosda with deal textbooks Analysis’ Complex Most omnmsocpin(sal oeb einr ntefiel the in beginners by done (usually misconception common A itne’ aclsi eea ibr Spaces Hilbert general in Calculus Wirtinger’s ∼ ldalla/pantelis/). atlsBuols ebr EE AMS. IEEE, Member, Bouboulis, Pantelis a 1 2010 31, May ope analytic complex 1 R ndffrn rdetrlsi iiiainproblems. minimization in rules gradient different in s 2 C trlta ntesoeo hsltrtr,Wirtinger’s literature, this of scope the in that atural o eomne o oen h at nyto only wants who someone for recommended not e ν muiain,Uiest fAhn,Gec,emi:(see e-mail: Greece, Athens, of University ommunications, ’ aclscno efudi n ftoeworks. those of any in found be cannot calculus r’s e nsml ue n rnilsadwihbears which and principles and rules simple on sed ν ,te h eldrvtvsmyb mlyd The employed. be may derivatives real the then ), .Sc ucin,ovosy r o holomorphic not are obviously, functions, Such ). hcfntos eetees nsm fthese of some in Nevertheless, functions. phic ue a edrvdadtecmuain of computations the and derived be may rules e ,i ecnie httecs ucini defined is function cost the that consider we if d, a rcsigcmuiymil ntecontext the in mainly community processing nal n fcmuig na lgn a,gainsof gradients way, elegant an in computing, of ans fntosfo oooy ieeta geometry, differential topology, from notions of l . ieetstu mil sab-rdc fthe of bi-product a as (mainly set-up different lhuh nms ae,te r presented are they cases, most in although, , aie lhuhti ehdlg srelatively is methodology this Although vative. h oin fWrigrscluu ngeneral on calculus Wirtinger’s of notions the s nhshrfil.Mroe,argru and rigorous a Moreover, field. his/her in m lao[] h i ftepeetmanuscript present the of aim The [6]. elgado oa space ional ob mlyd hrfr,ms fthese of most Therefore, employed. be to d ne’ aclsi h omndfiiinof definition common the is calculus inger’s eut fteetne itne’ Calculus Wirtinger’s extended the of results ydet h ok fPcnooo widely on Picinbono of works the to due ly e tutr stknit con,tereal the account, into taken is structure lex gn omlto hc er surprising a bears which formulation egant ealterltdcnet n rsn them present and concepts related the all ze aino h eae aeil ouigon focusing material, related the of tation 1 ] oee,ms fteewrsare works these of most However, 9]. 11, , rcia plctos[,5,ol recently only 5], [3, applications practical o sm n eiu.Wrigrscalculus Wirtinger’s tedious. and rsome ayWrigrscluu o ucin of functions for calculus Wirtinger’s nary (i.e., )i htWrigrscluu ssan uses calculus Wirtinger’s that is d) holomorphic R 2 ∗ ν scniee,aesimplified. are considered, is ucin n their and functions ) in RKHSs are presented. Throughout the paper, we will denote the set of all integers, real and complex numbers by N, R and C respectively. Vector or matrix valued quantities appear in boldfaced symbols. The present report, has been inspired by the need of the author and its colleagues to understand the underlying theory of Wirtinger’s Calculus and to further extend it to include the kernel case. Many parts have been considerably improved after long discussions with prof. S. Theodoridis and L. Dalla.
2 Wirtinger’s Calculus on C
Consider the function f : X ⊆ C → C, f(z)= f(x + iy)= u(x,y)+ v(x,y)i, where u, and v are real valued functions defined on an open subset X of R2. Any such function, f, may be regarded as defined either on a subset of C or on a subset of R2. Furthermore, f may be regarded either as a complex valued function, or as a vector valued function, which takes values in R2. Therefore, we may equivalently write:
f(z)= f(x + iy)= f(x,y)= u(x,y)+ iv(x,y) = (u(x,y), v(x,y)).
The complex derivative of f at c, if it exists, is defined as the limit:
f(c + h) − f(c) f ′(c) = lim . h→0 h This definition, although similar with the typical real derivative of elementary calculus, exploits the complex structure of X. More specifically, the division that appears in the definition is based on the complex multiplication, which forces a great deal of structure on f. From this simple fact follow all the important strong properties of the complex derivative, which do not have counterparts in the ordinary real calculus. For example, it is well known that if f ′ exists, then so does f (n), for n ∈ N. If f is differentiable at any z0 ∈ A, f is called holomorphic in A, or complex analytic in A, in the sense that it can be expanded as a Taylor series, i.e.,
∞ f (n)(c) f(c + h)= hn. (1) n! n =0 The proof of this statement is out of the scope of this manuscript, but it can be found at any complex analysis textbook. The expression “f is complex analytic at z0” means that f is complex-analytic at a neighborhood around z0. We will say that f is real analytic, when both u and v have a Taylor’s series expansion in the real domain.
2.1 Cauchy-Riemann conditions We begin our study, exploring the relations between the complex derivative and the real derivatives. In the following we will say that f is differentiable in the complex sense, if the complex derivative exists, and that f is differentiable in the real sense, if both u and v have partial derivatives.
Proposition 2.1. If the complex derivative of f at a point c (i.e.,f ′(c)) exists, then u and v are differentiable at the point (c1, c2), where c = c1 + c2i. Furthermore, ∂u ∂v ∂u ∂v (c , c )= (c , c ) and (c , c )= − (c , c ). (2) ∂x 1 2 ∂y 1 2 ∂y 1 2 ∂x 1 2
There are several proves of this proposition, that can be found in any complex analysis textbook. Here we present the two most characteristic ones.
2 1st Proof. Since f is differentiable in the complex sense, the limit:
f(c + h) − f(c) f ′(c) = lim h→0 h exists. Consider the special case where h = h1 (i.e., h → 0 on the real axes). Then f(c + h) − f(c) u(c + h , c )+ iv(c + h , c ) − u(c , c ) − iv(c , c ) = 1 1 2 1 1 2 1 2 1 2 h h1 u(c + h , c ) − u(c , c ) v(c + h , c ) − v(c , c ) = 1 1 2 1 2 + i 1 1 2 1 2 . h1 h1 In this case, since the left part of the equation converges to f(c), the real and imaginary parts of the second half of the equation must also be convergent. Thus, u, and v have partial derivatives with respect to x and ′ f (c)= ∂u(c)/∂x + i∂v(c)/∂x. Following the same rationale, if we set h = ih2 (i.e., h → 0 on the imaginary axes), we take
f(c + h) − f(c) u(c , c + h )+ iv(c , c + h ) − u(c , c ) − iv(c , c ) = 1 2 2 1 2 2 1 2 1 2 h ih2 u(c , c + h ) − u(c , c ) v(c , c + h2) − v(c , c ) = 1 2 2 1 2 + i 1 2 1 2 ih2 ih2 v(c , c + h ) − v(c , c ) u(c , c + h ) − u(c , c ) = 1 2 2 1 2 − i 1 2 2 1 2 . h2 h2 The last equation guarantees the existence of the partial derivatives of u and v with respect to y. We conclude that u and v have partial derivatives and that ∂u ∂v ∂v ∂u f ′(c)= (c)+ i (c)= (c) − i (c). ∂x ∂x ∂y ∂x
2nd Proof. Considering the first order Taylor expansion of f around c, we take:
f(c + h)= f(c)+ f ′(c)h + o(|h|), (3) where the notation o means that o(|h|)/|h|→ 0, as |h|→ 0. Substituting f ′(c)= A + Bi we have:
f(c + h)=f(c) + (A + Bi)(h1 + ih2)+ o(h)
=u(c1, c2)+ Ah1 − Bh2 + ℜ[o(h)] + i (v(c1, c2)+ Bh1 + Ah2 + ℑ[o(|h|)]) .
Therefore,
u(c1 + h1, c2 + h2)=u(c1, c2)+ Ah1 − Bh2 + ℜ[o(|h|)], (4)
v(c1 + h1, c2 + h2)=v(c1, c2)+ Bh1 + Ah2 + ℑ[o(|h|)]. (5)
Since o(|h|)/|h|→ 0, we also have
2 2 2 2 ℜ[o(|(h1, h2)|)]/ h1 + h2 → 0 and ℑ[o(|(h1, h2)|)]/ h1 + h2 → 0 as h → 0. Thus, equations (4-5) are the first order Taylor expansions of u and v around (c1, c2). Hence we deduce that: ∂u ∂u ∂v ∂v (c , c )= A, (c , c )= −B, (c , c )= B, (c , c )= A. ∂x 1 2 ∂y 1 2 ∂x 1 2 ∂y 1 2 This completes the proof.
3 Equations (2) are called the Cauchy Riemann conditions. It is well known that they provide a necessary and sufficient condition, for a complex function f to be differentiable in the complex sense, providing that f is differentiable in the real sense. This is explored in the following proposition.
Proposition 2.2. If f is differentiable in the real sense at a point (c1, c2) and the Cauchy-Riemann condi- tions hold: ∂u ∂v ∂u ∂v (c , c )= (c , c ) and (c , c )= − (c , c ), (6) ∂x 1 2 ∂y 1 2 ∂y 1 2 ∂x 1 2 then f is differentiable in the complex sense at the point c = c1 + c2i.
Proof. Consider the first order Taylor expansions of u and v at c = c1 + ic2 = (c1, c2): ∂u ∂u u(c + h)= u(c)+ (c)h + (c)h + o(|h|), ∂x 1 ∂y 2 ∂v ∂v v(c + h)= v(c)+ (c)h + (c)h + o(|h|). ∂x 1 ∂y 2 Multiplying the second relation by i and adding it to the first one, we take: ∂u ∂v ∂u ∂v f(c + h)= f(c)+ (c)+ i (c) h + (c)+ i (c) h + o(|h|). ∂x ∂x 1 ∂y ∂y 2 To simplify the notation we may define ∂f ∂u ∂v ∂f ∂u ∂v (c)= (c)+ i (c) and (c)= (c)+ i (c) ∂x ∂x ∂x ∂y ∂y ∂y and obtain: ∂f ∂f f(c + h)= f(c)+ (c)h + (c)h + o(|h|). ∂x 1 ∂y 2
h+h∗ h−h∗ Nest, we substitute h1 and h2 using the relations h1 = 2 and h2 = 2i . 1 ∂f 1 ∂f 1 ∂f 1 ∂f f(c + h)= f(c)+ (c)+ (c) h + (c) − (c) h∗ + o(|h|) 2 ∂x i ∂y 2 ∂x i ∂y 1 ∂f ∂f 1 ∂f ∂f = f(c)+ (c) − i (c) h + (c)+ i (c) h∗ + o(|h|). (7) 2 ∂x ∂y 2 ∂x ∂y It will be shown that equation (7) is essential for the development of Wirtinger’s calculus. To complete the proof of the proposition we compute the fraction that appears in the definition of the complex derivative: f(c + h) − f(c) 1 ∂f ∂f 1 ∂f ∂f h∗ o(|h|) = (c) − i (c) + (c)+ i (c) + h 2 ∂x ∂y 2 ∂x ∂y h h h∗ 1 Recall that for the limit limh→0 λ h to exist, it is necessary that λ = 0 . Hence, since o(|h|)/h → 0 as h → 0, f is differentiable in the complex sense, iff ∂f ∂f (c)+ i (c) = 0. ∂x ∂y However, according to our definition, ∂f ∂f ∂u ∂v ∂v ∂u (c)+ i (c)= (c) − (c) + i (c)+ (c) . ∂x ∂y ∂x ∂y ∂x ∂y ∗ 1 iθ → h −2iθ → To prove it, just set h = re and let r 0, while keeping θ constant. Then λ h = λe 0, if and only if λ = 0.
4 Thus, f is differentiable in the complex sense, iff the Cauchy-Riemann conditions hold. Moreover, in this case: 1 ∂f ∂f f ′(c)= (c) − i (c) 2 ∂x ∂y 1 ∂u ∂v i ∂u ∂v = (c)+ i (c) − (c)+ i (c) 2 ∂x ∂x 2 ∂y ∂y 1 ∂u ∂v i ∂v ∂u = (c)+ (c) + (c) − (c) 2 ∂x ∂y 2 ∂x ∂y ∂u ∂v = (c)+ i (c) ∂x ∂x ∂v ∂u = (c) − i (c). ∂y ∂y
More information on holomorphic functions and their relation to harmonic real functions may be found in [14, 11, 9].
2.2 Wirtinger’s Derivatives In many practical applications we are dealing with functions f that are not differentiable in the complex sense. For example, in minimization problems the cost function is real valued and thus cannot be complex- differentiable at every x ∈ X (unless it is a constant function2). In these cases, our only option is to work with the real derivatives of u and v. However, this might make the computations of the gradients cumbersome and tedious. To cope with this problem, we will develop an alternative formulation which, although it is based on the real derivatives, it strongly resembles the notion of the complex derivative. In fact, if f is differentiable in the complex sense, the developed derivatives will coincide with the complex ones. To provide some more insights into the ideas that lie behind the derivation of Wirtinger’s Calculus, we present an alternative definition of a complex derivative, which we call the conjugate-complex derivative at c. We shall say that f is conjugate-complex differentiable (or that it is differentiable in the conjugate-complex sense) at c, if the limit
∗ ′ f(c + h ) − f(c) f(c + h) − f(c) f∗(c) = lim = lim (8) h∗→0 h h→0 h∗ exists. Following a procedure similar to the one presented in section 2.1 we may prove the following propo- sitions.
′ Proposition 2.3. If the conjugate-complex derivative of f at a point c (i.e.,f∗(c)) exists, then u and v are differentiable at the point (c1, c2), where c = c1 + c2i. Furthermore, ∂u ∂v ∂u ∂v (c , c )= − (c , c ) and (c , c )= (c , c ). (9) ∂x 1 2 ∂y 1 2 ∂y 1 2 ∂x 1 2 These are called the conjugate Cauchy Riemann conditions.
Proposition 2.4. If f is differentiable in the real sense at a point (c1, c2) and the conjugate Cauchy- Riemann conditions hold, then f is differentiable in the conjugate-complex sense at the point c = c1 + c2i.
If a function f is differentiable in the conjugate-complex sense, at every point of an open set A, we will say that f is conjugate holomorphic on A. As in the case of the holomorphic functions, one may prove that
2This statement can be proved using the Cauchy Riemann conditions. For any real valued complex function f defined on a , v vanishes. Therefore the Cauchy-Riemann conditions dictate that ∂u/∂x = ∂u/∂y = 0. Hence, u must be a constant.
5 ′ similar strong results hold for conjugate-holomorphic functions. In particular, it can be shown that if f∗(z) exists for every z in a neighborhood of c, then f has a form of a Taylor series expansion around c, i.e.,
∞ f (n)(c) f(c + h)= ∗ (h∗)n. (10) n! n=0 In this case, we will say that f is conjugate-complex analytic at c. Note, that if f(z) is complex analytic at c, then f ∗(z) is conjugate-complex analytic at c. It is evident that if neither the Cauchy Riemann conditions, nor the conjugate Cauchy-Riemann con- ditions are satisfied for a function f, then the complex derivatives cannot be exploited and the function cannot be expressed neither in terms of h or h∗, as in the case of complex or conjugate-complex differen- tiable functions. Nevertheless, if f is differentiable in the real sense (i.e., u and v have partial derivatives), we may still find a form of Taylor’s series expansion. Recall, for example, that in the proof of proposition 2.2, we concluded, based on the first order Taylor’s series expansion of u, v, that (equation (7)):
1 ∂f ∂f 1 ∂f ∂f f(c + h)= f(c)+ (c) − i (c) h + (c)+ i (c) h∗ + o(|h|). 2 ∂x ∂y 2 ∂x ∂y One may notice that in the more general case, where f is real-differentiable, it’s Taylor’s expansion is casted in terms of both h and h∗. This can be generalized for higher order Taylor’s expansion formulas following the same rationale. Observe also that, if f is complex or conjugate-complex differentiable, this relation degenerates (due to the Cauchy Riemann conditions) to the respective Taylor’s expansion formula (i.e., (1) or 10)). In this context, the following definitions come naturally. We define the Wirtinger’s derivative (or W-derivative for short) of f at c as follows
∂f 1 ∂f ∂f 1 ∂u ∂v i ∂v ∂u (c)= (c) − i (c) = (c)+ (c) + (c) − (c) . (11) ∂z 2 ∂x ∂y 2 ∂x ∂y 2 ∂x ∂y Consequently, the conjugate Wirtinger’s derivative (or CW-derivative for short) of f at c is defined by:
∂f 1 ∂f ∂f 1 ∂u ∂v i ∂v ∂u (c)= (c)+ i (c) = (c) − (c) + (c)+ (c) . (12) ∂z∗ 2 ∂x ∂y 2 ∂x ∂y 2 ∂x ∂y Note that both the W-derivative and the CW-derivative exist, if f is differentiable in the real sense. In view of these new definitions, equation (7) may now be recasted as follows ∂f ∂f f(c + h)= f(c)+ (c)h + (c)h∗ + o(|h|). (13) ∂z ∂z∗ At first glance the definitions the W and CW derivatives seem rather obscure. Although, it is evident that they are defined so that that they are consistent with the Taylor’s formula (equation (7)), their computation seems quite difficult. However, this is not the case. We will show that they may be computed quickly using well-known differentiation rules. First, observe that if f satisfies the Cauchy Riemann conditions then the W-derivative degenerates to the standard complex derivative. The following theorem establishes the fundamental property of W and CW derivatives. Its proof is rather obvious.
Theorem 2.5. If f is complex differentiable at c, then its W derivative degenerates to the standard complex derivative, while its CW derivative vanishes, i.e., ∂f ∂f (c)= f ′(c), (c) = 0. ∂z ∂z∗ Consequently, if f is conjugate-complex differentiable at c, then its CW derivative degenerates to the standard conjugate-complex derivative, while its W derivative vanishes, i.e., ∂f ∂f (c)= f ′ (c), (c) = 0. ∂z∗ ∗ ∂z
6 In the sequel, we will develop the main differentiation rules of Wirtinger’s derivatives. Most of the proofs of the following properties are straightforward. Nevertheless, we present them all for completeness.
Proposition 2.6. If f is differentiable in the real sense at c, then ∂f ∗ ∂f ∗ (c) = (c). (14) ∂z ∂z∗
Proof. ∂f ∗ 1 ∂u ∂v i ∂v ∂u (c) = (c)+ (c) − (c) − (c) ∂z 2 ∂x ∂y 2 ∂x ∂y 1 ∂u ∂(−v) i ∂(−v) ∂u = (c) − (c) + (c)+ (c) 2 ∂x ∂y 2 ∂x ∂y ∂f ∗ = (c). ∂z∗
Proposition 2.7. If f is differentiable in the real sense at c, then ∂f ∗ ∂f ∗ (c) = (c). (15) ∂z∗ ∂z
Proof. ∂f ∗ 1 ∂u ∂v i ∂v ∂u (c) = (c) − (c) − (c)+ (c) ∂z∗ 2 ∂x ∂y 2 ∂x ∂y 1 ∂u ∂(−v) i ∂(−v) ∂u = (c)+ (c) + (c) − (c) 2 ∂x ∂y 2 ∂x ∂y ∂f ∗ = (c). ∂z
Proposition 2.8 (Linearity). If f, g are differentiable in the real sense at c and α, β ∈ C, then ∂(αf + βg) ∂f ∂g (c)= α (c)+ β (c), (16) ∂z ∂z ∂z ∂(αf + βg) ∂f ∂g (c)= α (c)+ β (c) (17) ∂z∗ ∂z∗ ∂z∗
Proof. Let f(z) = f(x,y) = uf (x,y) + ivf (x,y), g(z) = g(x,y) = ug(x,y) + ivg(x,y) be two complex functions and α, β ∈ C, such that α = α1 + iα2, β = β1 + iβ2. Then
r(z)=αf(z)+ βg(z) = (α1 + iα2)(uf (x,y)+ ivf (x,y)) + (β1 + iβ2)(ug(x,y)+ ivg(x,y))
= (α1uf (x,y) − α2vf (x,y)+ β1ug(x,y) − β2vg(x,y))
+ i (α1vf (x,y)+ α2uf (x,y)+ β1vg(x,y)+ β2ug(x,y)) .
7 Thus, the W-derivative of r will be given by: ∂r 1 ∂u ∂v i ∂v ∂u (c)= r (c)+ r (c) + r (c) − r (c) ∂z 2 ∂x ∂y 2 ∂x ∂y 1 ∂u ∂v ∂u ∂v ∂v ∂u ∂v ∂u = α f (c) − α f (c)+ β g (c) − β g (c)+ α f (c)+ α f (c)+ β g (c)+ β g (c) 2 1 ∂x 2 ∂x 1 ∂x 2 ∂x 1 ∂y 2 ∂y 1 ∂y 2 ∂y i ∂v ∂u ∂v ∂u ∂u ∂v ∂u ∂v + α f (c)+ α f (c)+ β g (c)+ β g (c) − α f (c)+ α f (c) − β g (c)+ β g (c) 2 1 ∂x 2 ∂x 1 ∂x 2 ∂x 1 ∂y 2 ∂y 1 ∂y 2 ∂y 1 ∂u i ∂v 1 ∂u i ∂v = (α + iα ) f (c)+ (α + iα ) f (c)+ (β + iβ ) g (c)+ (β + iβ ) g (c) 2 1 2 ∂x 2 1 2 ∂x 2 1 2 ∂x 2 1 2 ∂x 1 ∂v i ∂u 1 ∂v i ∂u + (α + iα ) f (c) − (α + iα ) f (c)+ (β + iβ ) g (c) − (β + iβ ) g (c) 2 1 2 ∂y 2 1 2 ∂y 2 1 2 ∂y 2 1 2 ∂y 1 ∂u ∂v i ∂v ∂u =α f (c)+ f (c) + f (c) − f (c) 2 ∂x ∂y 2 ∂x ∂y 1 ∂u ∂v i ∂v ∂u + β g (c)+ g (c) + g (c) − g (c) 2 ∂x ∂y 2 ∂x ∂y ∂f ∂g =α (c)+ β (c). ∂z ∂z On the other hand, in view of Propositions 2.7 and 2.6 and the linearity property of the W-derivative, the CW-derivative of r at c will be given by: ∂r ∂(αf + βg) ∂(αf + βg)∗ ∗ (c)= (c)= (c) ∂z∗ ∂z∗ ∂z ∂(α∗f ∗ + β∗g∗) ∗ ∂f ∗ ∂g∗ ∗ = (c) = α∗ (c)+ β∗ (c) ∂z ∂z ∂z ∂f ∗ ∗ ∂g∗ ∗ ∂f ∂g =α (c) + β (c) = α (c)+ β (c). ∂z ∂z ∂z∗ ∂z∗
Proposition 2.9 (Product Rule). If f, g are differentiable in the real sense at c, then ∂(f g) ∂f ∂g (c)= (c)g(c)+ f(c) (c), (18) ∂z ∂z ∂z ∂(f g) ∂f ∂g (c)= (c)g(c)+ f(c) (c). (19) ∂z∗ ∂z∗ ∂z∗
Proof. Let f(z) = f(x,y) = uf (x,y) + ivf (x,y), g(z) = g(x,y) = ug(x,y) + ivg(x,y) be two complex functions differentiable at c. Consider the complex function r defined as r(z)= f(z)g(z). Then
r(z) = (uf (z)+ ivg(z))(ug(z)+ ivg(z)) = (uf ug − vf vg)+ i(uf vg + vf ug). Hence the W-derivative of r at c is given by: ∂r 1 ∂u ∂v i ∂v ∂u (c)= r (c)+ r (c) + r (c) − r (c) ∂z 2 ∂x ∂y 2 ∂x ∂y 1 ∂(u u − v v ) ∂(u v + v u ) i ∂(u v + v u ) ∂(u u − v v ) = f g f g (c)+ f g f g (c) + f g f g (c) − f g f g (c) 2 ∂x ∂y 2 ∂x ∂y 1 ∂(u u ) ∂(v v ) ∂(u v ) ∂(v u ) = f g (c) − f g (c)+ f g (c)+ f g (c) 2 ∂x ∂x ∂y ∂y i ∂(u v ) ∂(v u ) ∂(u u ) ∂(v v ) + f g (c)+ f g (c) − f g (c)+ f g (c) 2 ∂x ∂x ∂y ∂y
8 1 ∂u ∂u ∂v ∂v = f (c)u (c)+ g (c)u (c) − f (c)v (c) − g (c)v (c) 2 ∂x g ∂x f ∂x g ∂x f ∂u ∂v ∂v ∂u + f (c)v (c)+ g (c)u (c)+ f (c)u (c)+ g (c)v (c) ∂y g ∂y f ∂y g ∂y f i ∂u ∂v ∂v ∂u + f (c)v (c)+ g (c)u (c)+ f (c)u (c)+ g (c)v (c) 2 ∂x g ∂x f ∂x g ∂x f ∂u ∂u ∂v ∂v − f (c)u (c) − g (c)u (c)+ f (c)v (c)+ g (c)v (c) . ∂y g ∂y f ∂y g ∂y f After factorization we obtain: ∂r 1 ∂u ∂v i ∂v ∂u (c)=u (c) f (c)+ f (c) + f (c) − f (c) ∂z g 2 ∂x ∂y 2 ∂x ∂y 1 ∂v ∂u i ∂u ∂v + v (c) − f (c)+ f (c) + f (c)+ f (c) g 2 ∂x ∂y 2 ∂x ∂y 1 ∂u ∂v i ∂v ∂u + u (c) g (c)+ g (c) + g (c) − g (c) f 2 ∂x ∂y 2 ∂x ∂y 1 ∂v ∂u i ∂u ∂v + v (c) − g (c)+ g (c) + g (c)+ g (c) . f 2 ∂x ∂y 2 ∂x ∂y Applying the simple rule 1/i = −i, we take: ∂r 1 ∂u ∂v i ∂v ∂u (c)=u (c) f (c)+ f (c) + f (c) − f (c) ∂z g 2 ∂x ∂y 2 ∂x ∂y 1 ∂u ∂v i ∂v ∂u + iv (c) f (c)+ f (c) + f (c) − f (c) g 2 ∂x ∂y 2 ∂x ∂y 1 ∂u ∂v i ∂v ∂u + u (c) g (c)+ g (c) + g (c) − g (c) f 2 ∂x ∂y 2 ∂x ∂y 1 ∂u ∂v i ∂v ∂u + iv (c) g (c)+ g (c) + g (c) − g (c) f 2 ∂x ∂y 2 ∂x ∂y ∂f ∂g =(u (c)+ iv ) (c) + (u (c)+ iv ) (c), g g ∂z f f ∂z which gives the result. The product rule of the CW-derivative follows from the product rule of the W-derivative and Propositions 2.6, 2.7 as follows: ∂(fg) ∂(fg)∗ ∗ ∂f ∗ ∂g∗ ∗ (c)= (c) = (c)g∗(c)+ (c)f ∗(c) ∂z∗ ∂z ∂z ∂z ∂f ∂g = (c)g(c)+ (c)f(c). ∂z∗ ∂z∗
Lemma 2.10 (Reciprocal Rule). If f is differentiable in the real sense at c and f(c) = 0, then ∂( 1 ) ∂f (c) f (c)= − ∂z , (20) ∂z f 2(c) 1 ∂f ∂( ) ∗ (c) f (c)= − ∂z . (21) ∂z∗ f 2(c)
Proof. Let f(z)= uf (x,y)+ ivf (x,y) be a complex function, differentiable in the real sense at c, such that f(c) = 0. Consider the function r(z) = 1/f(z). Then
uf (z) vf (z) r(z)= 2 2 − i 2 2 . uf (z)+ vf (z) uf (z)+ vf (z)
9 For the partial derivatives of ur, vr we have:
∂uf 2 2 2 ∂uf ∂vf ∂ur ∂x (c)(uf (c)+ vf (c)) − 2uf (c) ∂x (c) − 2uf (c)vf (c) ∂x (c) (c)= 2 2 2 , ∂x (uf (c)+ vf (c)) ∂uf 2 2 2 ∂uf ∂vf ∂ur ∂y (c)(uf (c)+ vf (c)) − 2uf (c) ∂y (c) − 2uf (c)vf (c) ∂y (c) (c)= 2 2 2 , ∂y (uf (c)+ vf (c)) ∂vf 2 2 2 ∂vf ∂uf ∂vr ∂x (c)(uf (c)+ vf (c)) − 2vf (c) ∂x (c) − 2uf (c)vf (c) ∂x (c) (c)= − 2 2 2 , ∂x (uf (c)+ vf (c)) ∂vf 2 2 2 ∂vf ∂uf ∂vr ∂y (c)(uf (c)+ vf (c)) − 2vf (c) ∂y (c) − 2uf (c)vf (c) ∂y (c) (c)= − 2 2 2 . ∂y (uf (c)+ vf (c)) Therefore, ∂r 1 ∂u ∂v i ∂v ∂u (c)= r (c)+ r (c) + r (c) − r (c) ∂z 2 ∂x ∂y 2 ∂x ∂y 1 ∂uf 2 2 ∂vf 2 2 = (c) −u (c)+ v (c) + 2iuf (c)vf (c) + (c) −2uf (c)vf (c) − iu + iv (c) 2(u2 (c)+ v2(c))2 ∂x f f ∂x f f f f ∂v ∂u + f (c) −u2 (c)+ v2(c) + 2iu (c)v (c) + f (c) 2u (c)v (c)+ iu2 (c) − iv2(c) ∂y f f f f ∂y f f f f 2 2 u (c) − v (c) − 2iuf (c)vf (c) ∂u ∂v ∂v ∂u = f f − f (c)+ f (c) − i f (c) − f (c) 2(u2 (c)+ v2(c))2 ∂x ∂y ∂x ∂y f f ∂f (c) = − ∂z . f 2(c) To prove the corresponding rule of the CW-derivative we apply the reciprocal rule of the W-derivative as well as Propositions 2.6, 2.7:
1 1 ∗ ∂f ∗ ∗ ∂f ∂( ) ∂( ∗ ) f f ∂z (c) ∂z∗ (c) ∗ (c)= (c) = − 2 = − 2 . ∂z ∂z (f ∗(c)) (f(c))
Proposition 2.11 (Division Rule). If f, g are differentiable in the real sense at c and g(c) = 0, then
f ∂( ) ∂f (c)g(c) − f(c) ∂g (c) g (c)= ∂z ∂z , (22) ∂z g2(c) f ∂f ∂g ∂( ) ∗ (c)g(c) − f(c) ∗ (c) g (c)= ∂z ∂z . (23) ∂z g2(c)
f(c) 1 Proof. It follows immediately from the multiplication rule and the reciprocal rule g(c) = f(c) g(c) .
Proposition 2.12 (Chain Rule). If f is differentiable in the real sense at c and g is differentiable in the real sense at f(c), then ∂g ◦ f ∂g ∂f ∂g ∂f ∗ (c)= (f(c)) (c)+ (f(c)) (c), (24) ∂z ∂z ∂z ∂z∗ ∂z ∂g ◦ f ∂g ∂f ∂g ∂f ∗ (c)= (f(c)) (c)+ (f(c)) (c). (25) ∂z∗ ∂z ∂z∗ ∂z∗ ∂z∗
10 Proof. Consider the function h(z) = g ◦ f(z) = ug(uf (x,y), vf (x,y)) + ivg(uf (x,y), vf (x,y)). Then the partial derivatives of uh and vh are given by the chain rule: ∂u ∂u ∂u ∂u ∂v h (c)= g (f(c)) f (c)+ g (f(c)) f (c), ∂x ∂x ∂x ∂y ∂x ∂u ∂u ∂u ∂u ∂v h (c)= g (f(c)) f (c)+ g (f(c)) f (c), ∂y ∂x ∂y ∂y ∂y ∂v ∂v ∂u ∂v ∂v h (c)= g (f(c)) f (c)+ g (f(c)) f (c), ∂x ∂x ∂x ∂y ∂x ∂v ∂v ∂u ∂v ∂v h (c)= g (f(c)) f (c)+ g (f(c)) f (c). ∂y ∂x ∂y ∂y ∂y In addition, we have:
∂g ∂f 1 ∂u ∂u ∂u ∂v ∂u ∂v ∂u ∂u (f(c)) (c)= g (f(c)) f (c)+ g (f(c)) f (c)+ i g (f(c)) f (c) − i g (f(c)) f (c) ∂z ∂z 4 ∂x ∂x ∂x ∂y ∂x ∂x ∂x ∂y ∂v ∂u ∂v ∂v ∂v ∂v ∂v ∂u + g (f(c)) f (c)+ g (f(c)) f (c)+ i g (f(c)) f (c) − i g (f(c)) f (c) ∂y ∂x ∂y ∂y ∂y ∂x ∂y ∂y ∂v ∂u ∂v ∂v ∂v ∂v ∂v ∂u + i g (f(c)) f (c)+ i g (f(c)) f (c) − g (f(c)) f (c)+ g (f(c)) f (c) ∂x ∂x ∂x ∂y ∂x ∂x ∂x ∂y ∂u ∂u ∂u ∂v ∂u ∂v ∂u ∂u −i g (f(c)) f (c) − i g (f(c)) f (c)+ g (f(c)) f (c) − g (f(c)) f (c) ∂y ∂x ∂y ∂y ∂y ∂x ∂y ∂y and ∂g ∂f ∗ 1 ∂u ∂u ∂u ∂v ∂u ∂v ∂u ∂u (f(c)) (c)= g (f(c)) f (c) − g (f(c)) f (c) − i g (f(c)) f (c) − i g (f(c)) f (c) ∂z∗ ∂z 4 ∂x ∂x ∂x ∂y ∂x ∂x ∂x ∂y ∂v ∂u ∂v ∂v ∂v ∂v ∂v ∂u − g (f(c)) f (c)+ g (f(c)) f (c)+ i g (f(c)) f (c)+ i g (f(c)) f (c) ∂y ∂x ∂y ∂y ∂y ∂x ∂y ∂y ∂v ∂u ∂v ∂v ∂v ∂v ∂v ∂u + i g (f(c)) f (c) − i g (f(c)) f (c)+ g (f(c)) f (c)+ g (f(c)) f (c) ∂x ∂x ∂x ∂y ∂x ∂x ∂x ∂y ∂u ∂u ∂u ∂v ∂u ∂v ∂u ∂u +i g (f(c)) f (c) − i g (f(c)) f (c)+ g (f(c)) f (c)+ g (f(c)) f (c) . ∂y ∂x ∂y ∂y ∂y ∂x ∂y ∂y Summing up the last two relations and eliminating the opposite terms, we obtain:
∂g ∂f ∂g ∂f ∗ 1 ∂u ∂u ∂u ∂u ∂u ∂u (f(c)) (c)+ (f(c)) (c)= g (f(c)) f (c) − i g (f(c)) f (c)+ g (f(c)) f (c) ∂z ∂z ∂z∗ ∂z 2 ∂x ∂x ∂x ∂y ∂y ∂y ∂u ∂v ∂v ∂u ∂v ∂u + i g (f(c)) f (c)+ i g (f(c)) f (c)+ g (f(c)) f (c) ∂y ∂x ∂x ∂x ∂x ∂y ∂u ∂v ∂u ∂v −i g (f(c)) f (c)+ g (f(c)) f (c) ∂y ∂y ∂y ∂x 1 ∂u ∂v i ∂v ∂u = h (c)+ h (c) + h (c) − h (c) 2 ∂x ∂y 2 ∂x ∂y ∂h = (c). ∂z To prove the chain rule of the CW-derivative, we apply the chain rule of the W-derivative as well as
11 Propositions 2.6, 2.7 and obtain:
∂h ∂h∗ ∗ ∂g∗ ∂f ∂g∗ ∂f ∗ ∗ (c)= = (f(c)) (c)+ (f(c)) (c) ∂z∗ ∂z ∂z ∂z ∂z∗ ∂z ∂g∗ ∗ ∂f ∗ ∂g∗ ∗ ∂f ∗ ∗ = (f(c)) (c) + (f(c)) (c) ∂z ∂z ∂z∗ ∂z ∂g ∂f ∗ ∂g ∂f = (f(c)) (c)+ (f(c)) (c), ∂z∗ ∂z∗ ∂z ∂z∗ which completes the proof.
In the following we examine some examples in order to make the aforementioned rules more intelligible.
2 ∂f ∂f C 1. f(z)= z . Since f is complex analytic, ∂z∗ (z)=0 and ∂z (z) = 2z, for all z ∈ .
∗ ∂f ∂f C 2. f(z)= z . Since f is conjugate-complex analytic, ∂z (z)=0 and ∂z∗ (z) = 1, for all z ∈ .
3 ∗ 2 3 ∗ 2 ∂f ∂(z ) ∂(iz) ∂((z ) ) 2 3. f(z)= z − iz + (z ) . Applying the linearity rule we take: ∂z (z)= ∂z + ∂z + ∂z = 3z + i. 3 ∗ 2 ∂f ∂(z ) ∂(iz) ∂((z ) ) ∗ Similarly, ∂z∗ (z)= ∂z∗ + ∂z∗ + ∂z∗ = 2z .
1 ∂f 1 ∂f C 4. f(z)= z . Applying the reciprocal rule: ∂z (z)= − z2 , ∂z∗ (z) = 0, for all z ∈ − {0}.
5. f(z)= z2 + z∗ 3. Consider the functions g(z)= z3 and h(z)= z2+z∗. Then f(z)= g◦h(z). Applying ∂f ∂g 2 ∗ 3 ∂h ∂g 2 ∗ 3 ∂h∗ 2 ∗ 2 2 ∗ 2 the chain rule: (z)= ((z + z ) ) (z)+ ∗ ((z + z ) ) (z) = 3(z + z ) 2z +0=6z(z + z ) . ∂z ∂z ∂z ∂z ∂z ∂f 2 ∗ 2 Similarly, ∂z∗ (z) = 3(z + z ) .
It should have become clear by now, that all the aforementioned rules can be summarized to the following simple statements:
• To compute the W-derivative of a function f, which is expressed in terms of z and z∗, apply the known differentiation rules considering z∗ as a constant. • To compute the CW-derivative of a function f, which is expressed in terms of z and z∗, apply the known differentiation rules considering z as a constant.
Most texts or papers, that deal with Wirtinger’s methodology, highlight the aforementioned rules only, disregarding the underlying rich mathematical body. Therefore, the first-time reader is left puzzled and with many unanswered questions. For example, it is difficult for the beginner to comprehend the notion “keep z∗ constant”, since if z∗ is kept fixed, then so does z. We should emphasize, that these statements should be regarded as a simple computational trick, rather than as a rigorous mathematical rule. This trick works well, as shown in the examples, due to Theorem 2.5, which states the W and CW derivative vanish for conjugate-complex analytic and complex analytic functions respectively, and the differentiation rules (Propositions 2.8-2.12). Nonetheless, special care should be considered, whenever this trick is applied. 2 ∂f Given the function f(z) = |z| , one might conclude that ∂z∗ = 0, since if we consider z as a constant, according to the aforementioned statements, then f(z) is also a constant. However, on a closer look, one ∗ ∂f may recast f as f(z)= zz . Then the same trick produces ∂z∗ = z. Which one is correct? ∂f ∂f The answer, of course, is that ∂z∗ = z. One should not conclude that ∂z∗ = 0, since f is not conjugate- holomorphic. We can only apply the aforementioned tricks on functions f that are expressed in terms of z and z∗ (hence the term f(z, z∗) used often in the literature) in such a way, that: a) if we replace z with a fixed w, while keeping z∗ intact, the resulting function will be conjugate-complex analytic and b) if we
12 replace z∗ with a fixed w, while keeping z intact, the resulting function will be complex analytic. Utilizing the methodology employed in Proposition 2.2, one may prove that this is possible for any real analytic function f. Of course, another course of action (a more formal one) is to disregard the aforementioned tricks and use directly the differentiation rules (as seen in the examples).
2.3 Second Order Taylor’s expansion formula Let f a complex function differentiable in the real sense. Following the same procedure that led us to 7 we may derive Taylor’s expansion formulas of any order. In this section we consider the second order expansion, since it is useful in many problems that employ Newton-based minimization. Consider the second order Taylor’s expansion of u and v around c = c1 + ic2: ∂u ∂u u(c + h)= u(c + h , c + h )=u(c , c )+ (c , c )h + (c , c )h 1 1 2 2 1 2 ∂x 1 2 1 ∂y 1 2 2 1 + (h , h ) H (h , h )T + o(|h|2), 2 1 2 u 1 2 ∂v ∂v v(c + h)= v(c + h , c + h )=v(c , c )+ (c , c )h + (c , c )h 1 1 2 2 1 2 ∂x 1 2 1 ∂y 1 2 2 1 + (h , h ) H (h , h )T + o(|h|2), 2 1 2 v 1 2 where Hu, Hv are the associated Hessian matrices. Since f(c + h)= u(c + h)+ iv(c + h), after some algebra one obtains that:
∂2f ∂2f ∂f ∂f h 1 2 (c) ∗ (c) h ∗ ∂z2 ∂z∂z2 2 f(c + h)= f(c)+ , ∗ ∗ + (h, h ) ∂ f ∂ f ∗ + o(|h| ). ∂z ∂z h 2 ∗ (c) ∗ 2 (c) h ∂z ∂z ∂(z )
2.4 Wirtinger’s calculus applied on real valued functions Many problems of complex signal processing involve minimization problems of real valued cost functions defined on complex domains, i.e., f : X ⊂ C → R. Therefore, in order to successfully implement the associated minimization algorithms, the gradients of the respective cost functions need to be deployed. Since, such functions are neither complex analytic nor conjugate-complex analytic3, our only option is to compute the gradients, either by employing ordinary R2 calculus, that is regarding C as R2 and evaluating the gradient (i.e., the partial derivatives) of the cost function f(x + iy) = f(x,y), or by using Wirtinger’s calculus. Both cases will eventually lead to the same results, but the application of Wirtinger’s calculus provides a more elegant and comfortable alternative, especially if the cost function by its definition is given in terms of z and z∗ instead of x and y (i.e., the real and imaginary part of z). As the function under consideration f(z) is real valued, the W and CW derivatives are simplified, i.e.,
∂f 1 ∂f ∂f ∂f 1 ∂f ∂f (c)= (c) − i (c) and (c)= (c)+ i (c) ∂z 2 ∂x ∂y ∂z∗ 2 ∂x ∂y and the following important property can be derived.
Lemma 2.13. If f : X ⊆ C → R is differentiable in the real sense, then
∂f ∗ ∂f (c) = (c). (26) ∂z ∂z∗
3Of course, one may not use a complex valued function as a cost function of a minimization problem, since we cannot define inequalities in C.
13 An important consequence is that if f is a real valued function defined on C, then its first order Taylor’s expansion at z is given by: ∂f ∂f f(c + h)= f(c)+ (c)h + (c)h∗ + o(|h|) ∂z ∂z∗ ∂f ∂f ∗ = f(c)+ (c)h + (c)h + o(|h|) ∂z ∂z ∂f = f(c)+ ℜ (c)h + o(|h|). ∂z However, in view of the Cauchy Riemann inequality we have:
∂f ∂f ∗ ∂f ∗ ℜ (c)h = ℜ h, (c) ≤ h, (c) ∂z ∂z ∂z C C ∂f ≤ | h| (c) . ∂z∗ ∂f ∂f The equality in the above relationship holds, if h ⇈ ∂z∗ . Hence, the direction of increase of f is ∂z∗ . Therefore, any gradient descent based algorithm minimizing f(z) is based on the update scheme:
∂f z = z − µ (z ). (27) n n−1 ∂z∗ n−1 Assuming differentiability of f, a standard result from elementary real calculus states that a necessary condition for a point (x0,y0) to be an optimum (in the sense that f(x,y) is minimized) is that this point is a stationary point of f, i.e. the partial derivatives of f at (x0,y0) vanish. In the context of Wirtinger’s calculus we have the following obvious corresponding result.
Proposition 2.14. If f : X ⊆ C → R is differentiable at x in the real sense, then a necessary condition for a point c to be a local optimum (in the sense that f(c) is minimized or maximized) is that either the W, or the CW derivative vanishes4.
3 Wirtinger’s Calculus on general complex Hilbert spaces
In this section we generalize the main ideas and results of Wirtinger’s calculus on general Hilbert spaces. To this end, we begin with a brief review of the Fr´echet derivative, which generalizes differentiability to abstract Banach spaces.
3.1 Fr´echet Derivatives Consider a Hilbert space H over the field F (typically R or C). The operator T : H → F ν is said to be T ν Fr´echet differentiable at f0, if there exists a linear continuous operator W = (W1,W2,...,Wν) : H → F such that