<<

A Potpourri of Nonlinear Algebra
Chris Hillar


ICERM | Computational Nonlinear Algebra | June 2014

Outline

Computational complexity of nonlinear algebra

“Real-life” examples:

Tensor problems - graph theory, optimization, Groebner bases

Neuroscience: The Retina Equations - bipartite graphs, probability, analysis

Computational Nonlinear Algebra

Problem: Solve, in finite time, a finite set of polynomial (quadratic) equations

computability                              | ring | reference
Undecidable ("Uncomputable")               |  Z   | [Hilbert's 10th Problem; Davis, Putnam, Robinson, Matijasevič '61/'70]
?????                                      |  Q   | [Poonen '03]
Decidable ("Computable")                   |  R   | [Tarski-Seidenberg]
Decidable ("Computable")                   |  C   | [Hironaka '64, Buchberger '70]

Some "Random" Polynomial Systems:

a)  a_1 a_2 a_3 + c_1 c_2 c_3 = 2,   a_1 a_3 b_2 + c_1 c_3 d_2 = 0,   a_2 a_3 b_1 + c_2 c_3 d_1 = 0,
    a_3 b_1 b_2 + c_3 d_1 d_2 = 4,   a_1 a_2 b_3 + c_1 c_2 d_3 = 0,   a_1 b_2 b_3 + c_1 d_2 d_3 = 4,
    a_2 b_1 b_3 + c_2 d_3 d_1 = 4,   b_1 b_2 b_3 + d_1 d_2 d_3 = 0

b)  For i = 1, ..., 4:
    a_i c_i - b_i d_i - u^2,   b_i c_i + a_i d_i,   c_i u - a_i^2 + b_i^2,
    d_i u - 2 a_i b_i,   a_i u - c_i^2 + d_i^2,   b_i u - 2 d_i c_i,
    together with
    a_1^2 - b_1^2 + a_1 a_3 - b_1 b_3 + a_3^2 - b_3^2,   a_1^2 - b_1^2 + a_1 a_4 - b_1 b_4 + a_4^2 - b_4^2,
    a_1^2 - b_1^2 + a_1 a_2 - b_1 b_2 + a_2^2 - b_2^2,   a_2^2 - b_2^2 + a_2 a_3 - b_2 b_3 + a_3^2 - b_3^2,
    a_3^2 - b_3^2 + a_3 a_4 - b_3 b_4 + a_4^2 - b_4^2,
    2 a_1 b_1 + a_1 b_2 + a_2 b_1 + 2 a_2 b_2,   2 a_2 b_2 + a_2 b_3 + a_3 b_2 + 2 a_3 b_3,
    2 a_1 b_1 + a_1 b_3 + a_3 b_1 + 2 a_3 b_3,   2 a_1 b_1 + a_1 b_4 + a_4 b_1 + 2 a_4 b_4,
    2 a_3 b_3 + a_3 b_4 + a_4 b_3 + 2 a_4 b_4,
    w_1^2 + w_2^2 + ... + w_{17}^2 + w_{18}^2.

c)  a_{000} x_0 y_0 + a_{010} x_0 y_1 + a_{100} x_1 y_0 + a_{110} x_1 y_1 = 0,   a_{001} x_0 y_0 + a_{011} x_0 y_1 + a_{101} x_1 y_0 + a_{111} x_1 y_1 = 0,
    a_{000} x_0 z_0 + a_{001} x_0 z_1 + a_{100} x_1 z_0 + a_{101} x_1 z_1 = 0,   a_{010} x_0 z_0 + a_{011} x_0 z_1 + a_{110} x_1 z_0 + a_{111} x_1 z_1 = 0,
    a_{000} y_0 z_0 + a_{001} y_0 z_1 + a_{010} y_1 z_0 + a_{011} y_1 z_1 = 0,   a_{100} y_0 z_0 + a_{101} y_0 z_1 + a_{110} y_1 z_0 + a_{111} y_1 z_1 = 0.

A Briefer on Computational Complexity

I. Model of Computation - What are inputs / outputs? - What is a computation?

II. Model of Complexity - Cost of computation?

III. Model of Reducibility - What are equivalent problems?

[Photos: Stephen Cook, Dick Karp, Leonid Levin]

I. Model of Computation [Turing 1936]:
- Inputs: finite list of rational numbers
- Outputs: YES/NO or rational vectors
[Photo: Turing machine replica by Mike Davey]

II. Model of Complexity: time complexity, the number of tape-level moves.

III. Model of Reducibility: P1 reduces to P2 when a polynomial-sized transformation carries any input I of P1 to an input I' of P2 with the same YES/NO answer.

Classes: P (polynomial-time), NP, NP-complete, NP-hard, ...

[Figure: the world of all computational problems, with P inside NP inside NP-hard; matrix problems sit in P, tensor problems are NP-hard.]

NP-complete decision problems [Cook-Karp-Levin 1971/2]

Graph coloring: Given graph G, is there a proper 3-coloring?

[Figure: two copies of a graph on vertices 1, 2, 3, 4: one with a proper 3-coloring (YES) and one without (NO).]

Graph 3-coloring is an NP-complete problem (a proposed coloring can be verified quickly).

1 Million $$$ prize (Clay Math)

Connection to nonlinear algebra

Theorem [Bayer '82]: Whether or not a graph is 3-colorable can be encoded as whether a system of cubic equations over C has a nonzero solution.

Reformulation [H., Lim '13]: Whether or not a graph G on v vertices with edge set E is 3-colorable can be encoded as whether the following system of homogeneous quadratics has a nonzero solution in C:

C_G = { x_i y_i - u^2,  y_i u - x_i^2,  x_i u - y_i^2,   i = 1, ..., v;
        \sum_{j : {i,j} \in E} (x_i^2 + x_i x_j + x_j^2),   i = 1, ..., v }.

Substituting x_i = a_i + i b_i and y_i = c_i + i d_i gives a quadratic system over the reals R. A proper 3-coloring assigns each vertex a power of the primitive cube root of unity \omega: the three colors are x_i = 1, x_i = \omega, x_i = \omega^2.

Example: The following graph is 3-colorable:

[Figure: graph on vertices 1, 2, 3, 4 with a proper 3-coloring by 1, \omega, \omega^2.]
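Independently of the algebraic encoding, 3-colorability of small graphs can of course be checked directly by brute force; a minimal sketch (the edge lists are illustrative stand-ins for the slides' example graphs, and the function name is mine):

```python
from itertools import product

def is_3_colorable(num_vertices, edges):
    """Brute-force test: try all 3^n assignments of colors {0, 1, 2}
    and accept if some assignment makes every edge bichromatic."""
    for coloring in product(range(3), repeat=num_vertices):
        if all(coloring[u] != coloring[v] for u, v in edges):
            return True
    return False

# a 4-cycle with one chord: 3-colorable
g1 = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
# adding the other diagonal gives K4: not 3-colorable
g2 = g1 + [(1, 3)]

print(is_3_colorable(4, g1))  # True
print(is_3_colorable(4, g2))  # False
```

This exhaustive check is exponential in the number of vertices, which is exactly why the NP-completeness of the problem (and of the equivalent polynomial feasibility question) matters.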

The system has a (nonzero) solution over the reals. Written out, it is the system b) above: 35 homogeneous quadratics in the 35 indeterminates a_i, b_i, c_i, d_i (i = 1, ..., 4), u, and w_1, ..., w_{18}.

Example: The following graph is not 3-colorable:

[Figure: the graph on vertices 1, 2, 3, 4 with the edge {2, 4} added, drawn twice.]

The system does not have a (nonzero) solution over R:

It is the system b) above together with the polynomials for the new edge {2, 4}:

a_2^2 - b_2^2 + a_2 a_4 - b_2 b_4 + a_4^2 - b_4^2,   2 a_2 b_2 + a_2 b_4 + a_4 b_2 + 2 a_4 b_4.

Example: The graph G below is uniquely 3-colorable [example of Akbari, Mirrokni, Sadjad '01, disproving a conjecture of Xu '90]

The coloring ideal I_G is trivial (< 2 sec computation) [H., Windfeldt '08]

Tensor eigenvalues

Problem: Given A = [[a_{ijk}]] \in Q^{n \times n \times n}, find (x, \lambda) with x \ne 0 such that

\sum_{i,j=1}^{n} a_{ijk} x_i x_j = \lambda x_k,   k = 1, ..., n.

[Lim 2005], [Qi 2005], [Ni et al. 2007], [Qi 2007], [Cartwright and Sturmfels 2012]
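For intuition, an eigenpair of a small tensor can be found numerically by a higher-order analogue of power iteration: repeatedly apply x -> A x x and normalize. This is a sketch, not part of the slides; it assumes the iteration converges, which it does for the all-ones tensor used below (all names are mine):

```python
import math

def tensor_apply(A, x):
    """The vector (A x x)_k = sum_{i,j} a_ijk x_i x_j."""
    n = len(x)
    return [sum(A[i][j][k] * x[i] * x[j] for i in range(n) for j in range(n))
            for k in range(n)]

def eigenpair_power_iteration(A, x0, iters=100):
    """Higher-order power iteration: x <- (A x x) / ||A x x||.
    At a fixed point, A x x = lambda * x with lambda = ||A x x||."""
    x = x0[:]
    for _ in range(iters):
        y = tensor_apply(A, x)
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    y = tensor_apply(A, x)
    lam = math.sqrt(sum(v * v for v in y))
    return x, lam

# all-ones 2x2x2 tensor: (A x x)_k = (x_1 + x_2)^2 for each k,
# so the iteration converges to x = (1/sqrt(2), 1/sqrt(2)), lambda = 2*sqrt(2)
A = [[[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]]
x, lam = eigenpair_power_iteration(A, [1.0, 0.5])
```

For general tensors such an iteration need not converge; the point of the complexity results below is that no efficient method can work in all cases unless P = NP.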

Some Facts: Generic or random tensors over complex numbers have a finite number of eigenvalues and eigenvectors (up to scaling equivalence), although their count is exponential.

Still, it is possible for a tensor to have an infinite number of non-equivalent eigenvalues, but in that case they comprise a cofinite set of complex numbers.

Another important fact is that over the reals, every 3-tensor has a real eigenpair.

Decision problem

Problem: Given A = [[a_{ijk}]] \in Q^{n \times n \times n} and \lambda \in Q, does there exist 0 \ne x \in C^n such that

\sum_{i,j=1}^{n} a_{ijk} x_i x_j = \lambda x_k,   k = 1, ..., n?

Decidable (computable on a Turing machine), e.g. via:
- Quantifier elimination
- Buchberger's algorithm and Groebner bases
- Multivariate resultants

All quickly become inefficient as n grows. Is there an efficient algorithm? No, because quadratic equations are hard to solve [Bayer 1982], [Lovasz 1994], [Grenet et al. 2010], ...

Corollary: Deciding if \lambda = 0 is a tensor eigenvalue is NP-hard.

Corollary: Unless P = NP, there is no polynomial-time approximation scheme for finding tensor eigenvectors to within \epsilon = 3/4.

[Figure: nested complexity classes P, NP, NP-complete, NP-hard; matrix problems lie in P while tensor problems are NP-hard. The coloring values x_i = 1 at (1, 0) and x_i = \omega, \omega^2 at (-1/2, \pm\sqrt{3}/2) are shown on the example graph on vertices 1, 2, 3, 4.]

Computational complexity of tensor problems [H., Lim '13]

Tensor Rank

Rank-1 tensors: A = x \otimes y \otimes z = [[x_i y_j z_k]], with x, y, z \in F^n (the Segre variety).

Definition: Tensor rank over the field F is

rank_F(A) = min { r : A = \sum_{i=1}^{r} x_i \otimes y_i \otimes z_i }.

Theorem [Hastad '90]: Tensor rank is NP-hard over Q.

Note: tensor rank can change when the field changes (unlike matrix rank in linear algebra).

Question: Is tensor rank different over the reals / complex numbers?

The following system, a) above, has no solution over the rationals [Singular, Macaulay2, Maple, ...]:

a_1 a_2 a_3 + c_1 c_2 c_3 = 2,   a_1 a_3 b_2 + c_1 c_3 d_2 = 0,   a_2 a_3 b_1 + c_2 c_3 d_1 = 0,
a_3 b_1 b_2 + c_3 d_1 d_2 = 4,   a_1 a_2 b_3 + c_1 c_2 d_3 = 0,   a_1 b_2 b_3 + c_1 d_2 d_3 = 4,
a_2 b_1 b_3 + c_2 d_3 d_1 = 4,   b_1 b_2 b_3 + d_1 d_2 d_3 = 0

A = z \otimes z \otimes z + \bar{z} \otimes \bar{z} \otimes \bar{z},   z = x + \sqrt{2} y,   \bar{z} = x - \sqrt{2} y,   x = [1, 0]^T,   y = [0, 1]^T
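That this irrational rank-2 decomposition really produces a rational tensor can be checked by expanding the outer products; a quick numerical sketch (function name is mine):

```python
import math

def outer3(u):
    """The rank-1 tensor u ⊗ u ⊗ u as nested lists: entry [i][j][k] = u_i u_j u_k."""
    n = len(u)
    return [[[u[i] * u[j] * u[k] for k in range(n)]
             for j in range(n)] for i in range(n)]

s = math.sqrt(2)
z = [1.0, s]          # z = x + sqrt(2) y
zbar = [1.0, -s]      # zbar = x - sqrt(2) y

Tz, Tzbar = outer3(z), outer3(zbar)
# A = z⊗z⊗z + zbar⊗zbar⊗zbar: a rank-2 decomposition over R
A = [[[Tz[i][j][k] + Tzbar[i][j][k] for k in range(2)]
      for j in range(2)] for i in range(2)]
# the sqrt(2) terms cancel: A_000 = 2, A_011 = A_101 = A_110 = 4, rest 0
```

The resulting tensor is rational, yet the unsolvability over Q of the system a) shows it admits no rank-2 decomposition with rational factors.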

Theorem [H., Lim '13]: There are rational tensors with different rank over the rationals versus the reals: rank_R(A) < rank_Q(A).

The 2 x 2 x 2 hyperdeterminant of a tensor A is zero (the defining equation for the dual variety to the Segre variety):

Det_{2,2,2}(A) = \frac{1}{4} \left( \det\left( \begin{bmatrix} a_{000} & a_{010} \\ a_{001} & a_{011} \end{bmatrix} + \begin{bmatrix} a_{100} & a_{110} \\ a_{101} & a_{111} \end{bmatrix} \right) - \det\left( \begin{bmatrix} a_{000} & a_{010} \\ a_{001} & a_{011} \end{bmatrix} - \begin{bmatrix} a_{100} & a_{110} \\ a_{101} & a_{111} \end{bmatrix} \right) \right)^2 - 4 \det \begin{bmatrix} a_{000} & a_{010} \\ a_{001} & a_{011} \end{bmatrix} \det \begin{bmatrix} a_{100} & a_{110} \\ a_{101} & a_{111} \end{bmatrix}.

[Cayley 1845]
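The discriminant-style formula is easy to implement directly; a sketch (the function name and the slice convention A[i][j][k] for a_{ijk} are mine):

```python
def det2(m):
    """Determinant of a 2x2 matrix given as [[a, b], [c, d]]."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def hyperdet222(A):
    """2x2x2 hyperdeterminant via the discriminant formula above.
    A[i][j][k] holds a_{ijk}; M0, M1 are the slices i = 0 and i = 1,
    with rows indexed by k and columns by j as in the displayed matrices."""
    M0 = [[A[0][0][0], A[0][1][0]], [A[0][0][1], A[0][1][1]]]
    M1 = [[A[1][0][0], A[1][1][0]], [A[1][0][1], A[1][1][1]]]
    Mplus = [[M0[r][c] + M1[r][c] for c in range(2)] for r in range(2)]
    Mminus = [[M0[r][c] - M1[r][c] for c in range(2)] for r in range(2)]
    return (det2(Mplus) - det2(Mminus)) ** 2 / 4 - 4 * det2(M0) * det2(M1)

# a rank-1 tensor x ⊗ y ⊗ z is degenerate, so its hyperdeterminant vanishes
x, y, z = [1, 2], [3, 5], [7, 11]
rank1 = [[[x[i] * y[j] * z[k] for k in range(2)] for j in range(2)]
         for i in range(2)]
print(hyperdet222(rank1))  # 0.0
```

Since Det_{2,2,2} is a degree-4 polynomial in 8 unknowns, it is cheap here; the conjecture below concerns the hyperdeterminants of larger formats, whose expansions explode combinatorially.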

There exist (x, y, z) \ne 0 such that:

c)  a_{000} x_0 y_0 + a_{010} x_0 y_1 + a_{100} x_1 y_0 + a_{110} x_1 y_1 = 0,   a_{001} x_0 y_0 + a_{011} x_0 y_1 + a_{101} x_1 y_0 + a_{111} x_1 y_1 = 0,
    a_{000} x_0 z_0 + a_{001} x_0 z_1 + a_{100} x_1 z_0 + a_{101} x_1 z_1 = 0,   a_{010} x_0 z_0 + a_{011} x_0 z_1 + a_{110} x_1 z_0 + a_{111} x_1 z_1 = 0,
    a_{000} y_0 z_0 + a_{001} y_0 z_1 + a_{010} y_1 z_0 + a_{011} y_1 z_1 = 0,   a_{100} y_0 z_0 + a_{101} y_0 z_1 + a_{110} y_1 z_0 + a_{111} y_1 z_1 = 0.

Conjecture: It is NP-hard to compute the hyperdeterminant.

... and now for something completely different ...

Neuroscience Motivation: Spike Coding of Continuous Signals

[Figure: continuous signals in the world pass through a neural sensor circuit (the retina); ganglion neurons output binary spike trains such as 0100011 and 1010011. Portrait: Santiago Ramón y Cajal.]

Shannon (1948)

[Photos: Claude Shannon, father of "Entropy Theory", and "Theseus", the first robot mouse.]

Given a distribution on a finite number of states:

p_i = probability of being in state i

p = (p_1, ..., p_N)

Definition: The entropy of a distribution is

H(p) = \sum_{i=1}^{N} p_i \log \frac{1}{p_i}

- entropy is a measure of the uncertainty in a random variable

- entropy provides an absolute limit on the best possible lossless encoding or compression of a communication; in particular it is bounded above by log N:

H(p) = \sum_{i=1}^{N} p_i \log \frac{1}{p_i} \le \log N

Example: Flipping coins.

Heads: p_H = 1/2,   Tails: p_T = 1/2

H(p_H, p_T) = (1/2) \log 2 + (1/2) \log 2 = 1

A fair coin flip has 1 bit of entropy (n fair coin flips have n bits of entropy).
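These examples are easy to check numerically; a small sketch (the function name is mine; log base 2 so that entropy is measured in bits):

```python
import math

def entropy(p):
    """Shannon entropy H(p) = sum_i p_i * log2(1/p_i), in bits.
    Terms with p_i = 0 contribute 0 by the usual convention."""
    return sum(pi * math.log2(1.0 / pi) for pi in p if pi > 0)

print(entropy([0.5, 0.5]))      # fair coin: 1.0 bit
print(entropy([1 / 8] * 8))     # uniform on 8 states: log2(8) = 3.0 bits
print(entropy([1.0]))           # a certain outcome: 0.0 bits
```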

Example: The uniform distribution has highest entropy:

p = (1/N, ..., 1/N),   H(p) = \log N

Example: Compressing letters over the interwebs:

[Figure: bar chart of the probability of each letter in the Oxford English Dictionary, in decreasing order e t a o i n s h r d l c u m w f g y p b v k j x q z; the entropy is H = 4.2 bits (per letter).]

- So it takes only 4.2 < 4.7 ≈ log(26) bits per character to code English letters
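A prefix-free code whose average length approaches the entropy bound can be built greedily (Huffman's algorithm); a minimal sketch over a toy 4-letter alphabet (the frequencies are illustrative, not the OED values, and the function name is mine):

```python
import heapq

def huffman_code_lengths(freqs):
    """Greedy Huffman construction; returns {symbol: code length in bits}.
    Repeatedly merge the two least probable subtrees, deepening their leaves."""
    heap = [(p, i, {sym: 0}) for i, (sym, p) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    counter = len(heap)  # unique tie-breaker so dicts are never compared
    while len(heap) > 1:
        p1, _, d1 = heapq.heappop(heap)
        p2, _, d2 = heapq.heappop(heap)
        merged = {s: l + 1 for s, l in d1.items()}
        merged.update({s: l + 1 for s, l in d2.items()})
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]

freqs = {"e": 0.5, "t": 0.25, "a": 0.125, "o": 0.125}
lengths = huffman_code_lengths(freqs)
avg = sum(freqs[s] * lengths[s] for s in freqs)
# with dyadic probabilities the average length equals the entropy: 1.75 bits
```

For the dyadic toy distribution the code is exactly optimal; for real letter frequencies the average length would land slightly above the 4.2-bit entropy.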

- E.g., e = "0", t = "01", a = "10", ...

Maximum Entropy

• Statistical Physics (Jaynes, 1957)
• Biology, Neuroscience (Bialek)

[Diagram: a statistical process produces measurements; the measurements constrain a MaxEnt model, which is tested against new measurements.]

The MaxEnt model is the most random / generic distribution given the constraints / measurements.

What about computation?

[Diagram: learning (development): a statistical process produces measurements fit by a MaxEnt model, which is then tested on a new measurement. Below: a neural circuit maps Input through Computation to Output.]

Example: Neural Processing

Distributions on graphs

[Figure: a small labeled graph; each edge is present independently with probability p = 1/2.]

i.i.d. Bernoulli distribution on each edge

Example: Erdős-Renyi (ER) distribution on graphs

Maximum Entropy for graphs

Example: The maximum entropy distribution on graphs is the ER distribution with p = 1/2

[Figure: triangle graph on vertices 1, 2, 3 with edge probability p = 1/2.]

Expected degree sequence

Given a distribution on graphs, can compute an expected degree sequence:

E[d_i] = E[ \sum_{j \ne i} w_{ij} ]

Example: ER with p = 1/2

E[d_i] = (n - 1)/2

Maximum Entropy for graphs

Problem: Given an expected degree sequence d, what is the maximum entropy distribution on graphs with this expected degree sequence?

Answer (classical), shown here for the triangle:

d_1 = 1/(1 + e^{\theta_1 + \theta_2}) + 1/(1 + e^{\theta_1 + \theta_3})
d_2 = 1/(1 + e^{\theta_2 + \theta_1}) + 1/(1 + e^{\theta_2 + \theta_3})
d_3 = 1/(1 + e^{\theta_3 + \theta_1}) + 1/(1 + e^{\theta_3 + \theta_2})

Erdős-Renyi corresponds to \theta = 0.

Chatterjee-Diaconis-Sly (2011)

[Figure: scatter plot of the MLE estimate against the true \theta^* for n = 300, r = 2; the points concentrate along the diagonal. Photo: Persi Diaconis.]

Theorem: One sample of a graph from such a maximum entropy distribution determines the distribution for large n

Application: Binary Coding of Continuous Signals

[Diagram: the original \theta gives expected degrees d; from a single graph sample one computes the "empirical" degree sequence d̂ and a reconstruction \hat{\theta}.]

What about graphs which have [H-Wibisono]:

- r different edge values {0, ..., r-1}
- nonnegative integer edge values {0, 1, ...}
- nonnegative real values

Edges are exponential random variables with means 1/(\theta_i + \theta_j):

d_i = \sum_{j \ne i} 1/(\theta_i + \theta_j)

Sanyal-Sturmfels-Vinzant (2013)

The “Retina Equations”:

d_i = \sum_{j \ne i} 1/(\theta_i + \theta_j),   i = 1, ..., N     (1)

[Photo: Bernd Sturmfels]

Studied (more generally) using matroid theory and algebraic geometry.

Theorem [H., Wibisono '13]: There is almost surely a unique nonnegative solution to any retina equation. Moreover, given one sample from a graph distribution, solving the equations gives the original parameters for large n:

|\theta - \hat{\theta}|_\infty \le C \sqrt{\frac{\log n}{n}}   with high probability

Proof ingredients:

(1) Large deviation theory

(2) Inverses of special class of matrices (positive symmetric diagonally dominant matrices) Diagonally Dominant Matrices

Definition: A positive matrix is diagonally dominant if each off-diagonal row sum is at most the diagonal entry

J = [ 6  1  2  3        S_4 = [ 3  1  1  1
      1  8  1  5                1  3  1  1
      2  1 14  3                1  1  3  1
      3  5  1 10 ]              1  1  1  3 ]

Theorem [H., Lin, Wibisono]: For a positive, symmetric diagonally dominant (n x n) matrix J with smallest entry at least 1:

||J^{-1}||_\infty \le ||S_n^{-1}||_\infty = \frac{3n - 4}{2(n - 2)(n - 1)}

[Figure: bipartite, bipartite, and non-bipartite graphs.]
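The bound can be checked numerically for the matrix S_4 shown on the slide; a sketch with a hand-rolled Gauss-Jordan inverse so that no external libraries are needed (this only verifies the n = 4 case, and the helper names are mine):

```python
def inverse(M):
    """Invert a small matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    A = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0.0:
                f = A[r][col]
                A[r] = [v - f * w for v, w in zip(A[r], A[col])]
    return [row[n:] for row in A]

def inf_norm(M):
    """Operator infinity norm: the maximum absolute row sum."""
    return max(sum(abs(v) for v in row) for row in M)

S4 = [[3.0, 1.0, 1.0, 1.0],
      [1.0, 3.0, 1.0, 1.0],
      [1.0, 1.0, 3.0, 1.0],
      [1.0, 1.0, 1.0, 3.0]]
n = 4
bound = (3 * n - 4) / (2 * (n - 2) * (n - 1))  # = 2/3
# inf_norm(inverse(S4)) equals the bound, showing S_4 is the extremal case
```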

N = \lim_{t \to \infty} (S + tP)^{-1}   (P is the signless Laplacian of the graph G)

Theorem [H., Wibisono '13]: There is almost surely a unique nonnegative solution to any retina equation. Moreover, given one sample from a graph distribution, solving the equations gives the original parameters for large n.
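For small n the retina equations (1) can be solved directly; a numerical sketch using Newton's method with a hand-rolled linear solve (the starting guess inverts the symmetric approximation d_i ≈ (n-1)/(2 θ_i); all names are mine, and convergence is only checked empirically here):

```python
def F(theta):
    """The retina map: F(theta)_i = sum over j != i of 1/(theta_i + theta_j)."""
    n = len(theta)
    return [sum(1.0 / (theta[i] + theta[j]) for j in range(n) if j != i)
            for i in range(n)]

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0.0:
                f = M[r][c]
                M[r] = [v - f * w for v, w in zip(M[r], M[c])]
    return [M[i][n] for i in range(n)]

def solve_retina(d, iters=50):
    """Newton's method for F(theta) = d, damped to stay in the positive orthant."""
    n = len(d)
    theta = [(n - 1) / (2.0 * di) for di in d]  # symmetric starting guess
    for _ in range(iters):
        # Jacobian: J_ij = -1/(theta_i+theta_j)^2 for i != j; J_ii is their sum
        J = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i != j:
                    s = -1.0 / (theta[i] + theta[j]) ** 2
                    J[i][j] = s
                    J[i][i] += s
        r = [fi - di for fi, di in zip(F(theta), d)]
        step = solve_linear(J, r)
        t = 1.0
        cand = [th - t * st for th, st in zip(theta, step)]
        while min(cand) <= 0:  # halve the step until all parameters stay positive
            t *= 0.5
            cand = [th - t * st for th, st in zip(theta, step)]
        theta = cand
    return theta

true_theta = [0.5, 1.0, 1.5]
d = F(true_theta)
recovered = solve_retina(d)  # recovers true_theta to high accuracy
```

Note that -J here is exactly a weighted signless-Laplacian-type matrix, so the matrix theorem above controls the Newton correction; that is the role it plays in the proof.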

Proof sketch: with F(\theta) = (d_1, ..., d_n), where d_i = \sum_{j \ne i} 1/(\theta_i + \theta_j),

F(\theta) \approx F(\hat{\theta}) + J \cdot (\theta - \hat{\theta})     (J is the Jacobian of the map F)

|\theta - \hat{\theta}|_\infty \le |J^{-1}|_\infty |d - \hat{d}|_\infty \le \frac{C}{n} \sqrt{n \log n} \le C \sqrt{\frac{\log n}{n}}

using large deviation theory (a strengthening of the central limit theorem) to bound |d - \hat{d}|_\infty and the matrix theorem to bound |J^{-1}|_\infty.

[Diagram: \theta maps under F to d; \hat{d} maps under F^{-1} to \hat{\theta}.]

\theta - original parameters of the graph distribution
d - expected degree sequence of the graph distribution
d̂ - degrees computed from a single graph sample
\hat{\theta} - parameters inferred from the sampled degrees

Problems: Solve these other "Retina Equations"

Maximum entropy graph distributions with binary edges:

d_i = \sum_{j \ne i} 1/(\theta_i \theta_j + 1),   i = 1, ..., N

with nonnegative integer edges:

d_i = \sum_{j \ne i} 1/(\theta_i \theta_j - 1),   i = 1, ..., N

also others ...

Redwood Center for Theoretical Neuroscience (U.C. Berkeley)

Mathematical Sciences Research Institute
National Science Foundation