Research Statement
LUIS DAVID GARCIA–PUENTE
My mathematical research interests are in algebraic statistics, computational algebraic geometry, and combinatorial commutative algebra. These areas lie in the interplay among algebra, geometry, combinatorics, and symbolic computation. In recent years new algorithms have been developed and several old and new methods from these fields have led to significant and unexpected advances in several diverse areas of application. My research projects arise from problems in discrete multivariate analysis, probabilistic independence models, computational biology, and geometric modelling. While most of my work has been in algebraic statistics, I have also mentored several undergraduate students as part of the Texas A&M REU/VIGRE course “Algebraic Methods in Computational Biology.” I have established a network of collaborators from mathematics, statistics, and computer science. My main professional goal is to begin a research program involving both undergraduate and graduate students. My research area is ideal for this, because many problems have simple formulations, computation and experimentation play an important role, and solutions involve beautiful ideas from diverse areas of knowledge.
1 Algebraic Statistics
Algebraic statistics is a new field using ideas from combinatorics, discrete geometry, computational algebra and algebraic geometry to formulate, interpret, and solve statistical problems. It has been applied to experimental design, discrete statistical analysis, statistical inference, and computational biology. The core principle in this area is that discrete statistical models are the non-negative real points of certain algebraic varieties.
Bayesian Networks
Bayesian networks are directed graphical models. The geometry of these models is relevant in statistics; see [6]. Algebraic statistics provides the right framework to understand this geometry. Bayesian networks can be described either by a recursive factorization of probability distributions or by conditional independence statements dictated by a graph, known as global Markov statements. This is an instance of the computational algebra principle that varieties can be presented either parametrically or implicitly. For example, let G be a graph consisting of two disjoint nodes representing the random variables X and Y. The only global Markov statement encoded by G is global(G) = {X ⊥⊥ Y}. The recursive factorization of the joint probability distribution of X and Y is given by
p(X = i, Y = j) = p(X = i)p(Y = j|X = i) = p(X = i)p(Y = j).
The equivalence of these two representations for Bayesian networks is the Factorization Theorem, a well-known theorem in statistics. This theorem is surprisingly delicate and no longer holds in the usual setting of algebraic geometry.
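The two-node example can be illustrated numerically. The sketch below is only an illustration under assumed values: the marginals `px` and `py` and the helper names are hypothetical and do not come from the text. It builds the joint table from the factorization p(X = i)p(Y = j) and checks, in exact rational arithmetic, that conditioning on X leaves the distribution of Y unchanged.

```python
from fractions import Fraction


def joint_from_marginals(px, py):
    """Joint table p[i][j] = p(X = i) * p(Y = j) for the graph with no edge."""
    return [[a * b for b in py] for a in px]


def conditional_y_given_x(joint, i):
    """p(Y = j | X = i) for all j, computed from the joint table."""
    row_total = sum(joint[i])
    return [entry / row_total for entry in joint[i]]


# Hypothetical marginals, chosen only for illustration.
px = [Fraction(1, 4), Fraction(3, 4)]
py = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]

joint = joint_from_marginals(px, py)

# Independence: every conditional distribution of Y equals its marginal.
independent = all(conditional_y_given_x(joint, i) == py for i in range(len(px)))
print(independent)
```

Exact fractions are used instead of floating point so that the equality test is not affected by rounding.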
Factorization Theorem
In the previous example, the Factorization theorem implies that X is independent of Y if and only if the joint probability distribution of X and Y factors as p(X, Y) = p(X)p(Y). Now assume that X has two states and Y has three states. Let $p_{ij}$ be an indeterminate denoting the probability p(X = i, Y = j). Let $I_{X \perp\!\!\!\perp Y}$ be the ideal generated by the 2 × 2 minors of the 2 × 3 matrix
$$\begin{pmatrix} p_{11} & p_{12} & p_{13} \\ p_{21} & p_{22} & p_{23} \end{pmatrix}.$$
Then every joint probability distribution p = p(X, Y) on two independent random variables is a point in the variety $V(I_{X \perp\!\!\!\perp Y})$. On the other hand, the factorization of the joint probability distribution gives a map
$$f : \bigl((a), (b, c)\bigr) \longmapsto \bigl(ab,\; ac,\; a(1-b-c),\; (1-a)b,\; (1-a)c,\; (1-a)(1-b-c)\bigr),$$
where a = p(X = 0), b = p(Y = 0), and c = p(Y = 1). The image of this map corresponds to the set of all distributions that factor according to G. The map f gives a parametric definition of an algebraic variety. In this context, each distribution in the image of f is a point in the variety $V(I_f)$, where the ideal $I_f$ is the kernel of the ring homomorphism induced by f. In [4], Michael Stillman, Bernd Sturmfels and I proved the following generalization of the Factorization theorem.
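As a check on the running example only (it is not part of the result in [4]), the sketch below evaluates the parametrization at sample rational parameters and confirms that all three 2 × 2 minors of the resulting 2 × 3 table vanish, so the sampled image points lie on the variety cut out by the independence ideal. The function names, the parameter grid, and the non-independent table used for contrast are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import combinations


def f(a, b, c):
    """Parametrization from the text, arranged as a 2 x 3 table
    (rows: states of X, columns: states of Y, in the order listed)."""
    d = 1 - b - c
    return [[a * b, a * c, a * d],
            [(1 - a) * b, (1 - a) * c, (1 - a) * d]]


def minors_2x2(p):
    """The three 2 x 2 minors of a 2 x 3 matrix."""
    return [p[0][j] * p[1][k] - p[0][k] * p[1][j]
            for j, k in combinations(range(3), 2)]


# Illustrative grid of rational parameters; b + c < 1 for every choice.
grid = [Fraction(n, 7) for n in range(1, 4)]
all_vanish = all(m == 0
                 for a in grid for b in grid for c in grid
                 for m in minors_2x2(f(a, b, c)))

# A hypothetical table in which X and Y are dependent, for contrast.
dependent = [[Fraction(1, 2), 0, 0],
             [0, Fraction(1, 4), Fraction(1, 4)]]
print(all_vanish, any(m != 0 for m in minors_2x2(dependent)))
```

A check at sample points is weaker than a symbolic proof. It shows that the sampled image points satisfy the implicit equations, and that a dependent table violates at least one of them.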
Theorem 1. The prime ideal $I_f$ is a minimal primary component of $I_{\mathrm{global}(G)}$. More precisely,