Some Characteristics of the Second of Random Two Dimensional Simplicial Complexes

by

Kang Tan

B.Sc, Fudan University, 1989

A THESIS SUBMITTED TN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in THE FACULTY OF GRADUATE STUDIES (Department of Mathematics)

We accept this thesis as coiiforrriing to the required standard

The University of British Columbia April 1996 © Kang Tan, 1996 In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of

The University of British Columbia Vancouver, Canada

Da,e M IS,!**

DE-6 (2/88) ABSTRACT

In this thesis, through generating random two-dimensional simplicial complexes, we studied the event (b2=0) for some specific probabilities. We found when the

probability of event (b2=0) takes on certain specific values, the pair (no,n2) lies on certain lines. However, this research is limited by our sample space ( i.e. for

P(b2=0) « 10%, 10 < no < 80; for P(b2=0) « 50%, 12 < no < 100; for P(b2=0) « 90%, 12 < no < 145). The " linear behavior" may not hold asymptotically. In the same time, we endeavor to find the number of tetrahedra and 6-triangles in the simplicial

complexes. When the event (b2=0) occurs in our specific probabilities, it seems the second Betti number should come from tetrahedra and 6-triangles with high probability. However, the expectation of the number of tetrahedra and 6-triangles goes to zero,

when no goes to infinity and there exists linear relationships between the pair (no,n2). This evidence may also support that the " linear behavior" may not hold asymptotically.

If n2 and n0 vary linearly with no going to infinity, then the probability that no(K) — no is extremely small in model MB for reasons are similar to the Coupon Collector's

problem. Hence, the probability that we cannot find element in S(no,n2) is large, which indicates that the model may have problems. TABLE OF CONTENTS

Abstract ii

Table of Contents iii

List of Tables iv

List of Figures v

Acknowledgment vi

INTRODUCTION 1

Section One Algebraic Concepts 2 Simplices and Simplicial Complexes 2 Boundary and 3 The Betti Number and the Euler Number 4

Section Two The Probabilistic Model 5

Section Three The Random Generator and Experimental Outcomes 10 The Random Simplicial Complex Generator 10 The Data and Data Analysis 14

Section Four Tetrahedra and Six-triangles in a Simplicial Complex 18 The Expected Number of Tetrahedra and Six-triangles in S(no, n2) 19 The Process to Detect Tetrahedra and Six-triangles in a Simplicial Complex 21 Bibliography 36

iii LIST OF TABLES

Table 1 The Upper Bounds of PMBO^I) for Some Specific Values of no and n2 8

Table 2 Computation Results of Floating Point and Exact Precision for Some Data 14

Table 3 Data Distribution for P(b2=0) « 10 14

Table 4 Data Distribution for P(b2=0) » 50% 15

Table 5 Data Distribution for P(b2=0) « 90% 15

Table 6 Regression Functions for P(b2=0) * 10% 16

Table 7 Regression Functions for P(b2=0) * 50% 17

Table 8 Regression Functions for P(b2=0) « 90% 17

Table 9 Portion of the Second Betti Number Coming from Tetrahedra and Six-triangles 23

iv LIST OF FIGURES

Figure 1 Scatter Plot of n2 against no when P(b2=0) near 10% 24

Figure 2 Scatter Plot of n2 against no when P(b2=0) near 50% 25

Figure 3 Scatter Plot of n2 against no when P(b2=0) near 90% 26

Figure 4 Regression Line of n2 against no when P(b2=0) = 10% 27

Figure 5 Regression Line of n2 against no when P(b2=0) near 10% 28

Figure 6 Regression Line of n2 against no when P(b2=0) = 50% 29

Figure 7 Regression Line of n2 against no when P(b2=0) near 50% 30

Figure 8 Regression Line of n2 against no when P(b2=0) = 90% 31

Figure 9 Regression Line of n2 against no when P(b2=0) near 90% 32

Figure 10 Distribution of b2 when no=20 n2= 143 33

Figure 11 Distribution of b2 when no=30 n2=200 33

Figure 12 Distribution of b2 when no=50 n2=400 33

Figure 13 Distribution of b2 when no=45 n2=256 34

Figure 14 Distribution of b2 when no=50 n2=260 34

Figure 15 Distribution of b2 when no=80 n2=500 34

Figure 16 Distribution of b2 when no=30 n2=80 35

Figure 17 Distribution of b2 when no=75 n2=250 35 Figure 18 Distribution of D2 when no=l 00 n2=350 35

v ACKNOWLEDGMENT

I wish to express gratefulness to my supervisor, Prof. Joel Friedman. During the 1 year study and research, he always gave me valuable suggestions, patiently explained anything and kindly guided me to finish this thesis. I am also thankful to Prof. Jack Snoeyink for his helpful comments on revising the thesis which led to a considerable improvement of my presentation.

vi Introduction

Distinguishing the topological isomorphism classes of topological spaces is one of the main interests in the of . To distinguish these isomorphism classes, one of the methods used is to compute the homology groups of topological spaces. The homology groups of a topological space can be computed from those of an "approximating" geometric simplicial complex; the homology groups of a geometric simplicial complex are equal to those of its abstraction, an abstract simplicial complex, which is a finite combinatorial object (generalizing graphs). Throughout this thesis, when we refer to simplicial complexes we mean abstract simplicial complexes.

A part of a homology group is its rank. The i-th Betti number, bi, is the rank of the i-th homology group. The integers, bo, bi, b„, are called Betti numbers, and are fundamental invariants of the n-dimensional topological spaces. The Betti numbers of a simplicial complex have intuitive meanings. For example, the 0-th Betti number, bo, is the number of connected components in the simplicial complex. The general goal of this line of research is to understand the Betti numbers of the simplicial complexes in any dimension.

In a one-dimensional simplicial complex (i.e. graph), one can easily findb i for any

graph, G, if bo is known, by the formula bi - b0= x(G) = E(G) - V(G), where %(G), E(G), and V(G), are the Euler characteristic, number of edges, and number of vertices of graph G, respectively. A two-dimensional simplicial complex is a collection of subsets of a fixed set, V, which is closed under taking subsets, and where the largest faces are triangles1. In a two-dimensional simplicial complex, K, even if bo is given2, one cannot

'A triangle is a set of size three, namely {vi, v2, v3}. 2 If a simplicial complex has more than one components, we can study each component separately. 1 / 1

find b2 or bi from the Euler characteristic formula, b2 -bi +bo= %(K)=n2(K) - n\(K) + n0(K),

where %(K), n2(K), ni(K), and no(K), are the Euler characteristic, number of triangles, number of edges, and number of vertices of K, respectively. For this reason, simplicial complexes in two dimensions are the lowest dimensional simplicial complexes of interest. When studying these two dimensional simplicial complexes, we will use a probabilistic model to study typical simplicial complexes.

The remaining sections of this thesis are arranged as follows. In section 1, we review the necessary concepts of algebraic topology. We describe our probabilistic model in section 2. We explicate the process of generating a random simplicial complex, display computational data, and interpret the results in section 3. In section 4, we study some combinatorial quantities which seem to be related to the second Betti number, when certain conditions hold in our model.

1. Algebraic Topology Concepts

Let us briefly review some necessary concepts from algebraic topology. All of our terminology follows that in [2].

1.1 Simplices and Simplicial Complexes For 0 < k < n, a geometric k-simplex c in R" is the convex hull of a set T of k+1 geometrically independent points. The dimension of C is dim(c) = |T| - 1 = k. For every U c T , the simplex a defined by U is a face of C, and if TJ ^ T then a is called a proper face of c. In Rn, we use the terms vertex for 0-simplex, edge for 1-simplex, and triangle

for 2-simplex. A collection of simplices, H, is a geometric simplicial complex if it satisfies

2 following two properties: (i) if a is a face of c and c sH, then asH. (ii) if a e if, o e if, then a n o either a face of both or an empty set. The largest dimension of any simplex in

H is the dimension ofH.

An abstract simplicial complex is a collection of subsets of a fixed set, K, which is closed under taking subsets. A simplex a ofK is an element of K. The dimension of a is one less than the number of vertices in it. A k-simplex of K is a simplex with dimension k. Each nonempty subset of a is called a face of a. The dimension of K is the maximum dimension of its simplices. The vertex set of K is the union of the O-simplices of K. Clearly, an abstract simplicial complex is completely determined by its maximal faces. Two abstract simplicial complexes K and Tare said to be isomorphic if there is a bijective correspondence / mapping the vertex set of K to the vertex set of T such that

{ v0, vn } e K if and only if {f[yo),/v„) } e T.

Let H be a geometric simplicial complex, and let K be an abstract simplicial complex whose vertices are in one to one correspondence with the vertices of H, a subset of vertices being a simplex of K if and only if they correspond to the vertices of some simplex of H. K is called an abstraction of H, and any geometric simplicial complex having K as an abstraction is called a realization of AT. The homology groups of a geometric simplicial complex, H, are

equal to those of its abstraction, K. Hence, the Betti numbers of H are the same as those of K.

1.2. Boundary and Homology

Each k-simplex of a complex K can be oriented by assigning a linear ordering to its vertices, denoted by c = [v<>, vi, vj. Two orientations of K are the same if one sequence differs from the other by an even number of transpositions. The boundary of o is

k A defined to be dkc = Xi=o (-1)' [v0, Vi, v; , .., vk], where the caret means that v; is

3 omitted. If o is a vertex, then 90c is defined to be zero. So 8k maps each oriented k-simplex to a formal sum of oriented (k-l)-simplices. A formal sum of integer multiples of oriented k-simplices is called a k-chain. If the coefficient of a k-simplex in some k-chains is non-zero, we say the simplex belongs to the chain. The freeabelia n group of k-chains of K is denoted by Ck = Ck(X). The map 9k naturally extends to the boundary homomorphism 3k: Ck —> Ck-i defined by 3k(Zj=idajOj) = Lj=idaj9k(Cj), where the aj are integers, the Oj are k-simplices of K, and d is the dimension of Ck. Note that a (k-1)-simplex can occur in more than one term of this sum. The coefficients of these terms are added in the usual way.

A number of interesting groups can be defined using the boundary homomorphism.

The group of .ST-cycles, Zk = Zk(K), is the kernel of 9k, which is, the subgroup of k-chains z e Ck with 3kZ = 0. The group of k-boundaries, Bk = Bk(X), is the image of 9k+i, which is the subgroup of k-chains z e Ck for which there exists a (k+l)-chain z' e Ck+i with z = 3k+i z*. It can be easily shown that 9k°9k+i z1 = 0 for every (k+l)-chain z'; therefore, Bk

is subgroup of Zk. Finally the quotient group, Hk = Hk(fy) = Zk/Bk = ker(9k)/im(9k+i), is defined to be the k-th homology group of K.

1.3. The Betti number and the Euler number

By the fundamental theorem for finitelygenerate d abelian groups, such a group, G,

b is isomorphic to a direct sum Z © Z/ti © Z/t2 ©...© Z/tm, where b is non-negative integer,

and the coefficients, ti, are integers greater than or equal to 2 such that t; divide t;+i for 1 < i < m - 1. If G is free, G is isomorphic to Zb, and is denoted as G = Zb. Let

G = Hk, then bk = bk(£) = b(Hk) (i.e. the dimension of homology group ker(9k)/im(3k+i) ),

4 and where bk is called the k-th Beth' number of K. Hence,

bk= dim(ker(3k)) - dim(im(3k+1)).

The Euler number of a simplicial complex K is defined classically by the equation,

d i X(K) = Iw, (-l) ni, where n; is the number of simplicies of K in dimension i and d is the dimension of K; by Hopf trace theorem, we have, x(£) = Xi=od(-l)'bi.

2. The Probabilistic Model

In previous research on the evolution of random graphs, one adds random edges to a fixed vertex set and studies how connected components evolve. A variation of this technique allows us to add random triangles and study the evolution of the Betti numbers

of the simplicial complexes. Let V(no) be the vertex set {1, 2, no} and S(no,n2) be the

set { K | no(£) = no, n2(K) = n2, where K is a two-dimensional simplicial complex }. With

T initially empty, we will generate a random element inS(no,n2) by repeatedly adding to T

a random triangle not in T until the number of triangles in T reaches n2. Then we take closure of triangles in T to get a simplicial complex, K. We only keep a simplicial complex, K, if no(K) = no. The above process, defined to be model MA, is used to generate

a random simplicial complex; it only dependents on parameters no and n2.

In order to simplify the calculation of the probability in our model, we introduce another model MB which is like MA except that we do not discard simplicial complexes with no(K) < no. For any event, A(X), in MA, we have:

PMA(A(/£) occurring ) = PMB( A(K) occurring given n<)(K) = no ) (1)

5 Let us consider event ( no(K) < no ) (i.e. 3 v e V(no) and v g V(K) ) in MB. Fix a Vk in V(rio) and consider the event that Vk£V(AT). We pick triangles one by one without repetition. The first triangle in V(K) does not contain Vk with probability Cs^'VCs"0, where

C3110 is the binomial coefficient of no choose 3. Then the second triangle in V(K) does not

contain vk with probability (C^-iyiCs^-l); similarly, the i-th triangle in V(K) does not

n<, 1 ,v> contain vk with probability (C3 " -i+l)/(C3 -i+l).

Therefore, PMB( vkgV(K) ) = n^KCs^-i+iyCCs^-i+l)]. We have:

PMB( no(K) < no) = PMB( 3 some vk in V(no) such that vk e V(K) ) (2)

= PMB(^k=in°{vkgV(Z)}) (3)

^Zk^P^v^V^) (4)

< noTL-^Cs^-i+iyCsM+l) (5)

n > 1 ,

< noe3nj/n<> (8)

We get formula (6) by the fact: x/y > (x-i)/(y-i), where 0 < x < y and 0 < i < y. From formula (7) to (8), we use the inequity: log(l+x) < x, where x > (-1). Notice that when

n2 > (nologno)/3, formula (8) is bounded by 1. We calculate formula (8) in the values of

pairs (no,n2) considered and obtain (8) is bounded by 0.03. Hence, PMB(IIO(£) = no) > 0.97.

PMA(A(Z) occurring) < PMB( A(K) occurring) / PMB( no(K) = no) (9)

< 1.03 1PMB( A(K) occurring) (10)

We will use formula (10) to get the upper bound of the probability for any event occurring in our model.

6 . The Betti numbers of random simplicial complexes are random variables

depending on the pair (no, n2). If we know any two of them, we can find another one by the Euler characteristic formula. Suppose that K is a simplicial complex generated by our probabilistic model. The 0-th Betti number, bo(£), is the number of connected components in K. Intuitively speaking, after adding enough random triangles to K, K contains only one component with large probability. Let us consider the event (bo(X) > 1) (i.e. K has at least

two components ) in model MB; suppose Ci(K) and C2(K) to be any two components of K. We have

6 < V(d(£)) + V(C2(K)) < no (11) where V(CI(JK)) is the number of vertices in component Ci(K) because each component has at least three vertices. Without loss of generality, suppose

3 < V(Ci(£)) < [no/2] (12) where [no/2] is the maximal integer less than or equal to no/2. Let A be subset of V(no) such that A - V(Ci(X)) and 3 < |A| =j < [no/2]; let CA be complement of A in V(no). We pick triangles one by one without repetition. After picking the (i-l)-th triangle, the

n J J number of triangles whose vertices lie in either A or CA is (Dj-i+1), where Dj = C3 °" + C3 is the number of triangles whose vertices lie in either A or CA; the total number of triangles is (Cs^-i+l) after the (i-l)-th triangle is picked. Hence, given that all the previous triangles lie in either A or CA, the probability that all of the i-th triangle's vertices also lie

n in either A or CA is (Dj-iH4)/(C3 °-i+l). We have:

PMB( b0(£)>l ) = PMB( 3 subset of V(no), A, with size j such that some triangles' vertices are in A and the other triangles' vertices are in CA) (13)

[n(/2] = PMB( uj=3 ( |A|=j and 3 d(K) c K s.t. V(d(£)) = A) (14)

7 < ZJPMB( IA|=j and 3 d(K) cKs.t V(Ci(£)) = A) (15)

no < ZjCj^niCDj -i +l)/(C3 -i+l) (16)

no no n2 < ZjCj (D/C3 ) (17)

0 where 3 < j < [no/2] and 1 < i < n2. We get (16) because there are Q" subsets of V(n0) with size j; from formula (16) to (17), we use the same fact as from formula (5) to (6) for Dj < Cs*0.

We calculate formula (17) for some specific values of no and n2 and list the results in Table 1.

(no, n2) (10, 55) (12, 25) (145, 600)

1.526e-152 2.41e-564 PMB(D0>1) < 1.254e-478

Table 1. The upper bounds of PMBO^I) for some specific values of n0 and n2.

In Table 1, we only checked three pairs of (no,n2). Of the data we have, 10 is the

smallest no, 145 is the largest no, and 25/12 is the smallest ratio of n2/no. Because

PMA(DO=1) = 1- PMA(DO>1) > 1- PMB(bo>l), b0 is almost certainly equal to one if n2 is large

with respect to no. Therefore, if we know either one of bi or b2, we can find the other one by the Euler characteristic formula.

In this thesis, we focus on properties of the second Betti number. Some reasons

for us to study b2 rather than bi are: (i) finding b2 requires less computation than findingb i

in that computing b2 requires the computation of only one matrix's rank, but computing bi

requires the computation of ranks of two matrices, (ii) when b2 does not equal zero, it is important to know how many tetrahedra3 and 6-triangles4 the simplicial complex contains.

3 In a two-dimensional simplicial complex, a tetrahedron is a set of the form { {vi,V2,v3}, {vi, v2, v4},

{vi, v3, v4}, {y2, v3, v4} } for any four vertices set {vi, v2, v3> v4}. 4 A 6-triangle is a set of six triangles such that there is a triangle, A, not in the simplicial complex,and 8 If, for some values of (no, n2), the second Betti number, b2, comes from tetrahedra and

6-triangles in that they form a basis for rational H2, we can use a more efficient way to

study the properties of the Betti numbers of the class S(no,n2).

Note that for fixed no, as triangles are added, the second Betti number, b2, does not

decrease. To describe b2, there are many interesting events such as (b2=0), (b2<10), and

(b2< VnT). However, we choose to focus on the event (b2=0) because it is the simplest and

requires the least computational resources. To study the event (b2=0), we should

investigate how the event P(b2=0) depends on the pair (no, n2) or investigate which n2

values give us P(b2=0) near various values for given no. In this thesis we focus specifically

on the three probabilities; P(b2=0) * 10% ( which we take to be 8-12% ), P(b2=0) * 50%

( i.e 48-52%), and P(b2=0) « 90% (i.e. 88-92%). It should be noted that choosing these three numbers (10%, 50%, 90%) is somewhat arbitrary. One can choose any other three

percentages or more to study the event (b2=0). In order to get a broader understanding of

how n2 depends on no when P(b2=0) jumps from one level to another, and to avoid difficulty in analyzing the data, it is better not to choose these percentages too close to one another.

There are many other probabilistic models which add a random triangle. For example, model(i): add to K a random triangle (i.e. repeats of a triangle are allowed); model(ii): add to K a random triangle having no common edges with the triangles already

in K. To study the event (b2=0), we compare the above different random simplicial complex generators with our model. Once there exists two of the same triangles in the

simplicial complex generated by model(i), the second Betti number b2 will be greater than

or equal to one. Hence, P(b2=0) = 0. It is easy to reduce the computation of the second

Betti number, b2, of a simplicial complex generated by model(i) by computing the b2 of

there are exactly two tetrahedra with a common triangle A if A is added into the set. 9 the simplicial complex generated using different triangles. Let Ki be the simplicial complex

generated by model(i), let K2 be the simplicial complex reduced from Ki by keeping only

one copy of each triangle in K\. It is easy to see b2(Ki) = b2(K2) + 2i( multiplicity of triangle i in K\ - 1), where 0 < i < no(K{). We do not use the model (ii), because it is also

easy to see that the second Betti number, b2, of a simplicial complex generated by this model is always zero.

3. The Random Simplicial Complex Generator and Experimental Outcomes

In this section we completely describe the process used to generate a random simplicial complex and our experiments. First, let us review notation used throughout the

rest of this thesis: b2 denotes the second Betti number of a simplicial complexes; d2

denotes the matrix induced by the boundary map, C2 —>d; T(K) denotes the set of triangles of simplicial complex K; B(K) denotes the set of edges of simplicial complex K;

V(K) denotes the vertex set of simplicial complex K; n2(K) denotes the number of triangles

of K such that |T(K)| = n2(K); m(K) denotes the number of edges K such that |E(K)| =

n\(K); no(AT) denotes the number of vertices of K such that |V(£)| = n0(K);

V(n0) = {1, 2, no} and S(no,n2) = { K \ ru>(K) = no, n2(K) = n2, where K is a two- dimensional simplicial complex}.

3.1 The Random Simplicial Complex Generator

Given a pair (no,n2), we want to generate a random simplicial complex, K, according to the model MA. We use three different integers to denote the three vertices of

10 a triangle. For each fixed no, we randomly generate three different integers in V(n0) by C++ library function randO- In other words, we randomly create a triangle with its vertices in V(no). In order to guarantee that a random triangle is generated, we first create an array, N, with size5 no+1 and N[i] = i, where 1 < i < no. Secondly, we generate the first random integer, ii, in V(no). If no ^ ii, we swap N[no] and N[ii]. Thirdly, we generate the

second random integer, i2, in V(no-l). If no-1 * ii, we swap N[no-l] and N[i2]. Swapping the integers in this way ensures that no duplicate integers are generated. Lastly, we

generate the third random integer, i3, in V(no-2). Now we have a triangle { N[n0],

N[no-l], N[i3] }. Because any two triangles are the same if their vertices are different only in order, we sort these three integers in increasing order. With initial F(£) = 0, we generate the maximal faces set of a random simplicial complex K by adding different

random triangles to F(K) until the number of triangles in F(K) reaches n2 (i.e. we discard the repeated triangle if we find it is already in F(K)). Furthermore, we keep all of the triangles generated in lexicographical order. Keeping the triangles in this order allows easier detection of duplicate triangles and easier detection of tetrahedra which will be discussed in section 5.

We should also notice that the number of vertices of that simplicial complex, K, generated by F(£) may be smaller than no. To make sure that no is the value attained, we use an array VT with size no+1 to record the frequency of each vertex when we add the

triangle {vi, v2, v3}. If VT[i] * 0, V 1 < i < no, we list all edges of F(£), arrange them in lexicographical order and put them in B(K). Now a random simplicial complex K such

that K e S(no,n2) has been successfully generated.

5In C++, an array index starts from zero. 11 In order to find the chain matrix d2, we create an n2 by ni matrix 32, for the i-th

triangle (vi, vm, vn} and the j-th edge (vr, vs}, let 92(i, j) = 0, if the edge

{v„ vs} £ {vi, vm, v„}. Let 32(i, j) = 1 if the j-th edge (vr, v„} is either {vi, vm} or {vm, v„},

otherwise 32(i, j) = -1, where 1, m, n, r, and s belong to V(K) such that 1< m < n and r < s. We use double precision to store the matrix entries. Then we perform Gaussian

elimination to the chain matrix, 32, in which we repeatedly choose the first element, c, such that |c| > 10"8 in each column as the pivot element6, and treat c as zero if |c| < 10"8.

By the definition of b2, we have b2 = dim(ker(32)) = n2 - dim(im(32)) = n2 - rank(32).

We give our algorithm to find the distribution of b2 of the class S(no,n2)

Algorithm: input no, n2.

• generate a simplicial complex K e S(no,n2).

• find the chain matrix 32.

• compute the rank of d2 via Gaussian elimination.

• b2 = n2 - rank(32).

7 repeat above 100 times using different random seed to estimate P(b2=0) for

fixed pair (no, n2)

For fixed no, our goal is to find the corresponding n2 such that the pair (no,n2) gives

P(b2=0) « 10%, P(b2=0) « 50%, and P(b2=0) = 90%, respectively. For fixedn 0 < 50, we

choose n2= 4no. If the value n2 results in the P(b2=0) being greater than 12%, then we

increase the first value of n2= 4no by 20 and repeat the same process. We stop increasing

%e guess it is small enough for our matrix. However, when some nonzero entries below the pivot element in a column of a matrix are smaller than 10"8 in exact arithmetric, the rank calculated in this way may be less than the true rank of the matrix. 7 To estimate P(b2=0) for fixed pair (no,n2), the larger sample, the better. We choose 100 as sample size due to

computer time constraint. The standard deviation of P(b2=0) * 10% (or 90%) is 3 for 100 experiments. The

standard deviation of P(b2=0) » 50% is 5 for 100 experiments. 12 n2 once P(b2=0) * 10% or P(b2=0) < 7%. If we cannot get n2 such that P(b2=0) » 10%, we

reduce 20 to 10, use the smallest value of n2 which satisfies P(b2=0) > 13%, and repeat.

For no > 50, we use the same process except that we begin with n2 = 7no and increase the

n2 value using intervals of 40. For P(b2=0) » 50%, we begin with some n2 which is the

nearest to P(b2=0) = 50% in the results of finding n2 such that P(b2=0) « 10%. Then we

decrease n2 until P(b2=0) > 52% or P(b2=0) « 50%. If we cannot find the appropriate n2,

then we increase n2 and repeat the same process. To find n2 such that P(b2=0) « 90%, we

start with n2=2no and increase this n2 value using intervals of 10, until that we find some n2

that yields P(b2=0) < 88%. Then we adjust n2 by increasing the value and repeat the same

process. After we find the first n2, we can adjust this n2 value slightly and input new

random seed to find another n2. We should point out that choosing n2=2no, n2=4no,

n2=7no, and increasing values is somewhat arbitrary. One can choose other small values of

n2 and increase these values to achieve the desired values of P(b2=0).

Our code is written in C++. We use double precision to store the matrix entries

which introduces two problems. One problem is that we cannot compute b2 for large n2 and ni due to the lack of computer storage space. In fact, this is a general problem for any large computation. The other problem is that using float precision to compute the rank of the matrix may be unreliable. We used exact multiple precision calculations to check some floating point calculations. It is clearly shown in Table 2 that we get the same results for the pairs (12, 100) and (20, 200). However, multiple precision calculations did not work in our computer when testing the large pair (25, 300), because the matrix involved could be as large as 300 by 300. For such a pair the computer gave the error message "Virtual memory exceeded in 'new' ". Hence, the multiple precision calculations need much more storage space than double precision calculations, and cannot compute the second Betti number of the simplicial complexes such as those in S(25,300). To overcome the above

13 problems, a new technique for computing the Betti numbers was developed, which is faster and requires less storage [3].

(12, 100) (20, 200) (25, 300) (no,n2)

floating point precision b2=10 b2=18 b2=8

out of memory multiple precision b2=10 b2=18

Table 2. Computation results of floating point and exact precision for some data

3.2. The Data and Data Analysis

To obtain a comprehensive notion of behaviour of event (b2=0), one must get large

number of (no,n2) sample pairs. This, however, is not practical for us because of limited resources and time. For this reason, we had to confine our study to a limited number of

test result and in our case we choose 100 (no,n2) pairs.

For P(b2=0) « 10%, P(b2=0) » 50%, and P(b2=0) « 90%, the scatterplots are

shown in Figure 1-3, respectively. For example, when P(b2=0) « 10%, each point (no, n2) in

the plot denotes that P(b2=0) of the class S(no, n2) is near to 10%. We use different symbols

to denote the different values of P(b2=0) (see Figure 1). As well, Table 3-5 list the sample forms for each case.

P(b2=0) 8% 9% 10% 11% 12%

distribution of data 17 20 18 20 25

Table 3. Data distribution for P(b2=0) * 10%

14 P(b2=0) 48% 49% 50% 51% 52%

distribution of data 24 15 19 15 27

Table 4. Data distribution for P(b2=0) «50%

P(b2=0) 88% 89% 90% 91% 92%

distribution of data 22 18 18 16 26

Table 5. Data distribution for P(b2=0) » 90%

In Figure 1-3, there seems to be a strong linear relationship between no and n2.

Therefore, to analyze the data, we consider model (A), n2 = Co + aono for the data such

that P(b2=0) = 10%, P(b2=0) = 50% and P(b2=0) = 90%, where ao and c0 are unknown regression coefficients. For each case, we have about 20 sample points (see Table 3-5).

We also consider model (B), n2 = c0 + aono, for the data such that P(b2=0) « 10%, P(b2=0)

« 50% and P(b2=0) » 90%, where ao and c0 are unknown regression coefficients. For each case, we have 100 sample points which are all assumed to be of the same probability (see Table 3-5). In order to make the best use of the data, it is reasonable to consider the

coefficients, ao and Co in model (B), as the function of p = P(b2=0). This will hopefully give us a less biased estimate of ao and Co because much more information is provided. The

simplest case is the linear part of the Taylor series at a specific value of P(b2=0). We consider the model (C) to be:

n2 = Co + ci( p - 10%) + ( ao + ai( p - 10%))no for P(b2=0) « 10%,

n2 = Co + ci( p - 50%) + ( ao + ax( p - 50%))no for P(b2=0) « 50%, and

n2 = Co + ci( p - 90%) + ( ao + ai( p - 90%))no for P(b2=0) « 90%,

15 where ao, ai, c0, and Ci are unknown variables, and p is the observed probability for the data.

A useful measure of how well an estimated regression fits the observed (no, n2) is

2 the R -value, the squared correlation between the observed n2 and f(no), where fQ is the

2 2 1 2 fitted function. The formula for the Revalue is R = 1- JXpj - fW)) /!^ - E(n2)) ,

1 where E(n2) is the mean of sample n2, fi^no )i s the fitted value at point no' and 1 < i < 100. If R2-value is near to zero, then f() is a poor fit; if R2-value is near to one, f() is a good fit. However, it is not sufficient merely to study the R2-value and then to evaluate the model on this basis. There are many examples with the same R2-value, but the nature of the relationship between the two variables on the plots varies enormously as is shown in [4]. Clearly, the scatterplots are vital for helping us to gain real insight into the nature of the data.

All of our three models can be fit using the various statistical functions, (i.e. multiregress and R-squared ) in the Maple package developed by the University of Waterloo. The results are shown in Table 6-8.

10% parameters R-squared value

model (A) ao=9.6927, Co =-67.907 0.9908

model (B) ao=9.51, Co = -61.384 0.9926

model (C) ao=9.50567, ai=-4.02949, Co=-61.05784, Ci=105.16423 0.9963

Table 6. Regression functions of model (A), model (B) and model (C)

16 50% parameters R-squared value

model (A) ao=6.6807, Co=-50.928 0.9926

model (B) ao=6.7691, c0= -57.775 0.9913

model (C) ao=6.6744, a^-1.50752, CQ= -50.6871, Ci=36.8622 0.9972

Table 7. Regression functions of model (A), model (B) and model (C)

90% parameters R-squared value

model (A) ao=4.0021, c0=-36.445 0.9869

model (B) ao=4.1481,Co=-47.141 0.9839

model (C) ao=4.0577, ^=-0.11962, Co=-42.111, Ci=-4.6374 0.9956

Table 8. Regression functions of model (A), model (B) and model (C)

Each model gives a good fit for each of P(b2=0) « 10%, P(b2=0) « 50% and

P(b2=0) = 90%. The difference between the coefficients, ao and c0 in model (A) and (C), shows that model (A) and model (C) are consistent. Although the difference between the R-squared values is very small, model (C) improves the fit for each case because it

contains more parameters. In our data , if the pair (no,n2) satisfies a linear relationship

(see Tables 6-8 ), P(b2=0) achieves some specific values (i.e. 10%, 50%, 90%). This linear

17 relationship is limited by the data scope for P(b2=0) = 10%, where 10 < no < 80; for

P(b2=0) « 50%, where 12 < no < 100; and for P(b2=0) « 90%, where 12 < no < 145. We

display the regression line for P(b2=0) = 10%, P(b2=0) = 50%, and P(b2=0) = 90% in Figures 4, 6, and 8, respectively. In Figures 5, 7, and 9, we give the regression lines generated by model (B). In Figures 5, 7, and 9, we have indicated all of the different probabilities with the same symbol due to computer software contraint.

When P(b2=0) « 10%, 50% and 90%, the distributions of b2 are shown in

Figures 10-12, 13-15, and 16-18, respectively, where E(b2) means the expectation of the

second Betti number, b2, of the sample simplicial complexes and S(b2) means the unbiased

variance of b2 of the sample simplicial complexes. For each case, we also investigated

many other b2's distributions for different pairs (no, n2), but their shapes are all similar to those already shown in Figures 10-18.

4. Tetrahedra and Six-triangles in a Simplicial Complex

By a " hole" in a simplicial complex, K, we mean that a subset of K, denoted by H, such that for each 1-simplex c e H, there are exactly two 2-simplices in H, namely a and P, 3 o = anp\ Its size is the number of 2-simplices in it. Intuitively, a hole is soccerball whose patches are triangles rather than octagons. By a collection of independent holes in a simplicial complex, K, we mean that these holes are independent as 2-chains. In some specific situations, the number of independent holes in K are equal to the second Betti number of K. Therefore, one way to approximate the second Betti

18 number of a simplicial complex is to detect all of the independent holes in it. Suppose there are k triangles making up a hole, then the number of edges of this collection triangles is 3k/2 which must be an integer. Therefore, k is even. In the simplicial complex generated by our model, there are no duplicate triangles. We have k = 2m, where m > 2. Hence, we approximate the second Betti number by findingal l independent tetrahedra ( i.e. holes of size four), 6-triangles ( holes of size six), and so on in the simplicial complex. For our specific values of P(b2=0), we guess that the second Betti number mainly comes from tetrahedra and 6-triangles. In this section, t(K) denotes number of tetrahedra in the simplicial complex K; s(K) denotes number of 6-triangles in the simplicial complex K. We calculate the probability of the event occurring in model MB.

4.1. The Expected Number of Tetrahedra and Six-triangles in S(n0,n2)

Given no and n2, in order to calculate the expected number of tetrahedra with respect to model MB, we pick four triangles in T(K) to see if they form a tetrahedron,

where T(K) consists of n2 random triangles. We order the four triangles arbitrarily such as {triangle 1, triangle 2, triangle 3, triangle 4 }. Notice that if the four triangles are to form

a tetrahedron, the first triangle can have its three vertices chosen arbitrarily. To form a tetrahedron with the first triangle, the second triangle must have two common vertices with the first triangle. The number of ways of choosing these two vertices is 3, the number of ways of choosing one other vertex is no-3, and the number of triangles is (Cs^-l) because no repeated triangles are allowed, so the probability that the second triangle forms

no the start of a tetrahedron with the first triangle is 3(no-3)/(C3 -l). For the third triangle to form the start of a tetrahedron with the previous triangles, two vertices are fixed, so we can only choose one of two vertices which are common vertices of the first two triangles, therefore, the probability that the third triangle forms the start of a tetrahedron with the

1? previous triangles is 2/(C3°0-2). The fourth triangle's vertices are fixed, which determine only one triangle, then the probability that the fourth triangle forms the start of a tetrahedron with the previous triangles is l/(C3"°-3). Thus, the probability ofK containing

n<, n<, no a tetrahedron is [3(n0-3)/(C3 -l)][2/(C3 -2)][l/(C3 -3)]. The number of the collection of

four triangles in n2 triangles is As a result, the expected number of tetrahedra in K, E(T), is

nj n,, n,, n<, E(T) = C4 [3(n0-3)/(C3 -l)][2/(C3 -2)][l/(C3 -3)] (18)

4 8 2 Formula (18) is approximately 54n2 /no . When n2 « 0.38no, we have E(T) « 1; when

4 4 n2=cno, we have E(T) » 54c/no, where c is a constant. Notice that E(T) goes to zero as

n0 goes to infinity.

Let C(no) be the set of triangles whose vertices are in set V(no) and CF(K) be the

n<> complement of F(X) in C(no). The size of CF(K) is n« = | CF(K) |= C3 - n2. By the definition of 6-triangle, we know that a 6-triangle in the simplicial complex K is determined by one triangle in the set CF(K) and the two other vertices in V(no). Fix a triangle, a, in CF(K). The number of ways of choosing the two other vertices in V(no) to form a 6-triangle with a is C^'3. Fix these two vertices. The probability that any six

N triangles with fixed vertices in F(£) form a 6-triangle is 1/C6 °, where No = C3"°-l, since N0

is the number of triangles excluding a, and C6N° is the total number of the collection of six

1 triangles in N0 triangles. The number of the collection of six triangles in K is C^ . The expected number of 6-triangles mK, E(S), is therefore:

nj n<> E(S) = C6 (C3 - n2) CF*IC?* (19)

20 6 13 When n2 « C3"°, E(S) is .approximately 3888n2/no ; when n2=cno, we have E(S) « 3888c6/no7, where c is a constant. Notice that E(S) goes to zero as no goes to infinity.

4.2. The Processes to Detect Tetrahedra and Six-triangles A tetrahedron in the simplicial complex is made up of four triangles of the form

{vi, v2, v3}, {vi, v2, v4}, {vi, v3, v4} and {v2, v3, v4}. For each triangle {vi, v2, v3} in the

simplicial complex K, we look for the second triangle of the form (vi, v2, v4}, where v4 is

any vertex in K such that v4 is greater than v3, if the triangle {vi, v2, v4} is not found, then

we choose the next triangle after {vi, v2, v3} as the new {vi, v2, v3} and start again. If the

second triangle {vi, v2, v4} is found, then we look for the third triangle {vi, v3, v4} mK. If

the third triangle (vi, v3, v4} is found, then we look for the fourth triangle {v2, v3, v4}.

Otherwise we choose the next triangle below (vi, v2, v3} as {vi, v2, v3}. If we cannot find

the fourth triangle (v2, v3, v4}, then we choose the next triangle below (vi, v2, v3}. As

well, we record the fourth triangle in set M for later use. If the fourth triangle (v2, v3, v4} is found, we have then found one tetrahedron in K. Each tetrahedron independent of the

previous ones in the simplicial complex increases the second Betti number, b2, by one.

To find the number of 6-triangles, we let Ka = KKJG for each a in M and find the

number of tetrahedra in Ka. Because of the way we generate the set M, we know that

there is at least one tetrahedron in K 1, then K contains

d (C2 °) 6-triangles which increases the second Betti number, b2, by da- 1. For each element

d c in M, we have (C2 °) 6-triangles. The total number of 6-triangles in the simplicial

d complex K is s(K) = £a(C2 °). We sum all of the do-1 over each element in M to get contribution to the second Betti number by 6-triangles.

21 We should point out that any one 6-triangle and any two tetrahedra found in this

way are independent. Because we focus on the specific probabilities of event (b2=0)

8 n (i.e n2«C3 °), we conjecture that at these probabilities the second Betti number of the

simplicial complex K is greater than or equal to t(K) +£o(d0-l) with large probability The main advantage of finding tetrahedra and 6-triangles is that we only need to store the triangle set, F(X), which is far smaller than the corresponding chain matrix, especially when no is large. The algorithm is simple and the only operation is to compare the two triangles to see if they are same. Because it requires less storage space, computing the number of tetrahedra and 6-triangles is quick. Although it is not clear how to find the number of higher-triangles efficiently, we might give a lower bound of the second Betti

number, b2, for a simplicial complex through detecting the tetrahedra and 6-triangles in it under certain situation.

Table 9 gives the experimental data resulting from calculations performed on some

pairs (no,n2) with no small. P(# of tetrahedra > 0) = 5% means, among 100 trials, there are 5 sample simplicial complexes which contain tetrahedra. The average number of tetrahedra by our experiment is given as E( # of tetrahedra ). E(T) is the expected number of tetrahedra calculated by formula (18). E(S) is the expected number of 6-triangles calculated by formula (19).

When n2 is small, the tetrahedra and 6-triangles seem to be independent 22 (no, n2) (12,25) (15,35) (20,143) (30,80)

P(b2*0) 10% 10% 89% 11%

E(b2) 0.12 0.1 2.65 0.12

P(# of tetrahedra >0) 10% 5% 52% 8%

E(# of tetrahedra) 0.11 0.05 1.27 0.08

P(#of6-triangles>0) 0 0 18% 0

E(# of 6-triangles) 0 0 0.22 0

E(T) 0.05 0.04 1.16 0.004

E(S) 0.009 0.004 0.41 0.00006

Table 9. Portion of b2 coming from tetrahedra and 6-triangles

23 112

750 T

O

700 + 0 o ft

650

600

O 550 o

500 +

• 8% 450 I 8 0 9% 400 6 A10%

350 f ° | O ^

250 4-

200 + . & O

150 +

100

50

H 1 1 1 1 1——H 1 1 1 1 1 1 1 1 1 no 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80

Figure 1. Scatter plot of m against no when P(b2=0) near 10%

24 m 650 "

600 " o * 9 550 " $ 3 500 " $ a 450 " ?

048% 400 * ft S 049%

350 • A 50% o t -51% 300 • • 52%

250 - * 200 - ***

150 " a

100 "

• 50 "

no < i i i i i i i i i i i i i i i i i i 0 i 1 1 t ;1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100

Figure 2. Scatter plot of m against no when P(b2=0) near 50%

25 112

• o . • q •

o A

O" •

0 88% • ft O 89%

A 90%

-91%

• 92%

95

no 100 110 120 130 140 150

Figure 3. Scatter plot of m against no when P(b2=0) near 90%

26 27 28 29 30 31 Figure 9. Regression line of 112 against no, when P(b2=0) near 90%

32 probability of D2

0.8

0.6

0.4 4-

24% 18% 15% 0.2 12% 15% 9% 6% 1%

2 3 4 5 b2 Figure 10. Distribution of bi when no=20 112=143 E(b2>=2.65 S(b2)= 1.74295

probability of b2

0.8 4-

0.6

0.4 29% 22% 12% 15% 9% 0.2 7% 3o/o 3o/0 b2 1 I - 0 1 2 3 4 5 6 7 Figure 11. Distribution of D2 when no=30 m=200 E(b2)=2.35 S(b2) =1.72548

probability of D2

0.6

0.4 23%.... 0.2 18% 1^70 14% -8%. —\ 1%—I 0 !— 1 2 3 4 5 b2 Figure 12. Distribution of b2 when no = 50 m = 400 E(b2)= 2.65 S(b2)=1.6959

33 probability of b2

bz 0 1 2 3 4 5 Figure 13. Distribution of b2 when no= 45 n2= 256 E(b2)=.75 S(b2)=1.04563 probability of b2

0.8 --

0.6 "" 50% -- 34%

-- 19% 4% 1 \ b2 0 1 2 3

Figure 14. Distribution of b2 when no= 50 m=260 E(b2)= .7 S(b2)=.834847

probability of b2

0.8

0.6 48%

0.4 33%

0.2 + 13%

5% 1% =ib2 0 12 3 4 Figure 15. Distribution of b2 when no =80 m=500 E(b2>= .78 S(b2)=1.00449

34 probability of b2

l T 89%

0.8 4

0.6

0.4

0.2 4 10% 1%

b2

Figure 16. Distribution of D2 when no=30 112= 80 E(b2>= .12 S(b2)=.356186

probability of b2

1 92%

0.8

0.6 4

0.4

0.2

0 4 1 1 1 0 1 2 Figure 17. Distribution of b2 when no=75 m=250 E(b2>= .08 S(b2)= .027266

probability of 02 89%

0.8 4

0.6

0.4

0.2 4 10% i2L 0 =1 b2

Figure 18. Distribution of b2 when no = 100 m=350 E(b2>=.12 S(b2)=.356186

35 Bibliography

[1] Bela Bollobas. Random Graphs. Academic Press. 1985

[2] James R. Munkes. Element of Algebraic Topology. Academic Press. 1984.

[3] Joel Friedman. Computing Betti Numbers via Combinatorial Laplacians. 1995 Preprint.

[4] John M.Chambers etc. Graph Methods for Data Analysis.Duxbury Press. 1983

36