
Describing Graphs: A First-Order Approach to Graph Canonization

Neil Immerman

Eric Lander

ABSTRACT In this paper we ask the question, "What must be added to first-order logic plus least-fixed point to obtain exactly the polynomial-time properties of unordered graphs?" We consider the languages $L_k$, consisting of first-order logic restricted to $k$ variables, and $C_k$, consisting of $L_k$ plus "counting quantifiers". We give efficient canonization algorithms for graphs characterized by $C_k$ or $L_k$. It follows from known results that all trees and almost all graphs are characterized by $C_2$.

This paper appeared in Complexity Theory Retrospective, Alan Selman, ed., Springer-Verlag.

Introduction

In this paper we present a new and different approach to the graph canonization and isomorphism problems. Our approach involves a combination of complexity theory with mathematical logic. We consider first-order languages for describing graphs. We define what it means for a language to characterize a set of graphs (see the Definition in the section on characterization of graphs). We next define the languages $L_k$ (resp. $C_k$) consisting of the formulas of first-order logic in which only $k$ variables occur (resp. $L_k$ plus "counting quantifiers"). We then study which sets of graphs are characterized by certain $L_k$'s and $C_k$'s. It follows by a result of Babai and Kučera that the language $C_2$ characterizes almost all graphs. We also show that $C_2$ characterizes all trees. In a later section we give a simple $O(n^k \log n)$ step algorithm to test if two graphs $G$ and $H$ on $n$ vertices agree on all sentences in $L_k$ or $C_k$. If $G$ is characterized by $L_k$ or $C_k$, a variant of this algorithm computes a canonical labeling for $G$ in the same time bound.

Neil Immerman: Computer Science Dept., University of Massachusetts, Amherst, MA. Research supported by NSF grants DCR and CCR. This paper appeared in Complexity Theory Retrospective, Alan Selman, ed., Springer-Verlag.

Eric Lander: Whitehead Institute, Cambridge, MA. Research supported by grants from the National Science Foundation (DCB) and from the System Development Foundation (G).

This line of research has two main goals. First, finding a language appropriate for graph canonization is a basic problem, central to the first author's work on descriptive computational complexity. We will explain this setting in the next section.

A canonization algorithm for a set of graphs $S$ gives a unique ordering (canonical labeling) to each isomorphism class from $S$. Thus two graphs from $S$ are isomorphic if and only if they are identical in the canonical ordering. The second goal of this work is to describe a simple and general class of canonization algorithms. We hope that variants of these algorithms will be powerful enough to provide simple canonical forms for all graphs, and do so without resorting to the high-powered group theory needed in the present best algorithms.
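To make the notion of a canonical form concrete, here is a minimal brute-force sketch in Python. It is not the algorithm of this paper: it simply tries every relabeling, so it takes exponential time, and the function name `canonical_form` and the edge-list encoding are assumptions made for illustration.

```python
from itertools import permutations

def canonical_form(n, edges):
    """Smallest sorted edge list over all relabelings of vertices 0..n-1.

    Brute force over all n! permutations; only meant for tiny graphs.
    """
    best = None
    for perm in permutations(range(n)):
        # Relabel each undirected edge and normalize it as a sorted pair.
        relabeled = sorted(tuple(sorted((perm[u], perm[v]))) for u, v in edges)
        if best is None or relabeled < best:
            best = relabeled
    return best

path_a = [(0, 1), (1, 2)]            # path whose middle vertex is 1
path_b = [(2, 0), (0, 1)]            # the same path, middle vertex 0
triangle = [(0, 1), (1, 2), (0, 2)]

# Isomorphic graphs get identical canonical forms; non-isomorphic ones do not.
print(canonical_form(3, path_a) == canonical_form(3, path_b))
print(canonical_form(3, path_a) == canonical_form(3, triangle))
```

Two graphs on the same number of vertices are isomorphic exactly when this function returns the same list for both, which is the property the text asks of a canonization algorithm.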

Descriptive Complexity

In this section we discuss an alternate view of complexity in which the complexity of the descriptions of problems is measured. This approach has provided new insights and techniques to help us understand the standard complexity notions: time, memory space, parallel time, number of processors. The motivations for the present paper come from Descriptive Complexity. We can only sketch this area here. The interested reader should consult a more extensive survey.

Given a property $S$, one can discuss the computational complexity of checking whether or not an input satisfies $S$. One can also ask, "What is the complexity of expressing the property $S$?" It is natural that these two questions are related. However, it is startling how closely tied they are when the second question refers to expressing the property in first-order logic. We will now describe the first-order languages in detail. Next we will state some facts relating descriptive and computational complexity.

First-Order Logic

In this paper we will confine our attention to graphs and properties of graphs. Thus when we mention complexity classes P, NP, etc., we will really be referring to those problems of ordered graphs that are contained in P, NP, etc. (If you want to know why the word "ordered" was included in the previous sentence, please read on. One of the main concerns of this paper is how to remove the need to order the vertices of graphs.)

For our purposes, a graph will be defined as a finite logical structure $G = \langle V, E \rangle$. $V$ is the universe (the vertices), and $E$ is a binary relation on $V$ (the edges). As an example, the undirected graph pictured in the Figure has a vertex set $V$ and an edge relation $E$ consisting of pairs corresponding to the six undirected edges. By convention, we will assume that all structures referred to in this paper have universe $\{0, 1, \ldots, n-1\}$ for some natural number $n$.

FIGURE: An Undirected Graph

The first-order language of graphs is built up in the usual way from the variables $x_1, x_2, \ldots$, the relation symbols $E$ and $=$, the logical connectives $\land, \lor, \lnot, \to$, and the quantifiers $\forall$ and $\exists$. The quantifiers range over the vertices of the graph in question. For example, consider the following first-order sentence:

$$\varphi \equiv \forall x\, \forall y\, [\, E(x,y) \to (E(y,x) \land x \neq y) \,]$$

$\varphi$ says that $G$ is undirected and loop free. Unless we specifically say otherwise, we will assume that all graphs $G$ in this paper satisfy $\varphi$; in symbols, $G \models \varphi$.
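As a small illustration of what it means for a graph to satisfy a first-order sentence, the following Python sketch checks the "undirected and loop free" condition directly. The representation of $E$ as a set of ordered pairs and the function name `is_undirected_loop_free` are assumptions made for this example.

```python
def is_undirected_loop_free(edges):
    """Check: for all x, y, E(x, y) implies (E(y, x) and x != y).

    `edges` is a set of ordered pairs. Pairs outside E satisfy the
    implication vacuously, so it is enough to range over E itself.
    """
    return all((y, x) in edges and x != y for (x, y) in edges)

undirected = {(0, 1), (1, 0), (1, 2), (2, 1)}   # symmetric, no loops
one_way = {(0, 1)}                              # missing the reverse pair
with_loop = {(0, 0)}                            # a self loop

print(is_undirected_loop_free(undirected))
print(is_undirected_loop_free(one_way))
print(is_undirected_loop_free(with_loop))
```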

It is useful to consider an apparently more general set of structures. The first-order language of colored graphs consists of the addition of a countable set of unary relations $\{C_1, C_2, \ldots\}$ to the first-order language of graphs. Define a colored graph to be a graph that interprets these new unary relations so that all but finitely many of the predicates are false at each vertex. These unary relations may be thought of as colorings of the vertices. (A vertex of a colored graph may satisfy zero, one, or several of the color relations. However, we will say that two vertices are the same color iff they satisfy the same set of color relations. Thus, by increasing the number of color relations, we may assume that each vertex satisfies a unique color relation.)

Colorings can be simulated in uncolored graphs by attaching gadgets. For example, a colored graph $G$ with colors green and yellow can be modelled as a graph $G'$ with some auxiliary vertices, so that in $G'$ each vertex $v$ from $G$ is now connected to either a triangle, or a square, or a pentagon, or a hexagon, according as $v$ is green, yellow, green and yellow, or neither green nor yellow. All mention of color predicates in this paper can be removed in this way.


Results from Descriptive Complexity

The ordering $0 < 1 < \cdots < n-1$ of the vertices of a graph is irrelevant to a "graph property". Unfortunately, however, it is impossible to present an unordered graph to a computer. The vertices and edges must be presented in some order. Furthermore, many algorithms compute graph properties by visiting the vertices of the graph in some order which depends on this arbitrary ordering.

Let $\mathrm{FO}(\le)$ denote the first-order language of ordered graphs. This language includes the logical relation $\le$, which must be interpreted as the usual ordering on the vertices, $V = \{0, 1, \ldots, n-1\}$. We will see in a Fact below that $\mathrm{FO}(\le)$ is contained in $\mathrm{CRAM}[1]$, the set of properties that can be checked by a concurrent parallel random access machine in constant time, using polynomially many processors.

In order to express a richer class of problems, we consider uniform sequences of first-order sentences $\{\varphi_i\}_{i \in \mathbf{N}}$, where the sentence $\varphi_n$ expresses the property in question for graphs with at most $n$ vertices.

Let $\mathrm{FO}(\le)[t(n)]\text{-}\mathrm{VAR}[v(n)]$ be the set of problems expressible by a uniform sequence of first-order sentences such that the $n$th sentence has length $O(t(n))$ and uses at most $v(n)$ distinct variables. The following fact says that the set of polynomial-time recognizable properties of ordered graphs is equal to the set of properties expressible by uniform sequences of first-order sentences of polynomial length using a bounded number of distinct variables.

Fact

$$\mathrm{P} = \bigcup_{k=1}^{\infty} \mathrm{FO}(\le)[n^k]\text{-}\mathrm{VAR}[k]$$

The Least Fixed Point (LFP) operator has long been used by logicians to formalize the power to define new relations by induction. It has been shown that the uniform sequence of formulas in the preceding Fact can be represented by a single use of LFP applied to a first-order formula. Thus,

Fact

$$\mathrm{P} = (\mathrm{FO}(\le) + \mathrm{LFP}) = \bigcup_{k=1}^{\infty} \mathrm{FO}(\le)[n^k]\text{-}\mathrm{VAR}[k]$$

The uniformity in question can be purely syntactical, i.e., the $n$th sentence of an $\mathrm{FO}[t(n)]$ property consists of a fixed block of quantifiers repeated $t(n)$ times, followed by a fixed quantifier-free formula. Uniformity is discussed extensively elsewhere. In this paper the reader may think of uniform as meaning that the map from $n$ to $\varphi_n$ is easily computable, e.g., in logspace.


Example The monotone circuit value problem is an example of a complete problem for P, which we will use to illustrate the preceding Fact. Instances of this problem consist of boolean circuits with only "and" and "or" gates and a unique output gate whose value, as determined by the inputs, is one.

We can code the monotone circuit value problem as a colored directed graph with colors $T, A, R$ as follows: inputs are colored True if they are on; other vertices are colored And if they are "and" gates, otherwise they are "or" gates. The unique output gate is colored Root. The obvious inductive definition of circuit value can be written in (FO + LFP) as follows:

$$V(x) \equiv T(x) \lor \Bigl[ (\exists y)(E(x,y) \land V(y)) \land \bigl( A(x) \to (\forall y)(E(x,y) \to V(y)) \bigr) \Bigr]$$

The intuitive meaning of $V(x)$ is that the value of node $x$ is one. Thus $V(x)$ is true if $x$ is an input that is on, or if $x$ is an "or" gate and there exists a gate $y$ such that $V(y)$ and $y$ is an input to $x$, or if $x$ is an "and" gate having at least one input and all of its inputs $y$ satisfy $V(y)$.

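Fix a graph $C$. The right-hand side of this definition, viewed as a formula $\varphi(V, x)$, induces an operator $\varphi_C$ from unary relations on the vertices of $C$ to unary relations on the vertices of $C$ as follows:

$$\varphi_C(R) = \{\, z \mid C \models \varphi(R, z) \,\}$$

Furthermore, the correct Value relation on the circuit $C$ is the least fixed point of $\varphi_C$, i.e., the smallest unary relation $V$ such that $\varphi_C(V) = V$. Use the notation $(\mathrm{LFP}\, \varphi(V, x))$ to denote this least fixed point.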

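Let $(\exists! x)$ ("there exists a unique $x$ such that") be an abbreviation, so that $(\exists! x)\alpha(x)$ stands for the formula

$$(\exists x)\bigl( \alpha(x) \land (\forall y)(\alpha(y) \to y = x) \bigr)$$

The circuit value problem can now be expressed as follows:

$$(\exists! w) R(w) \land (\forall w)\bigl( R(w) \to (\mathrm{LFP}\, \varphi(V, x))(w) \bigr)$$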
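The least fixed point in this example can be computed by iterating the operator from the empty relation until nothing changes. The following Python sketch does this for the circuit value operator described above. The dictionary `inputs_of` (mapping a gate to its inputs) and all function and variable names are assumptions made for this illustration.

```python
def circuit_value_lfp(vertices, inputs_of, true_inputs, and_gates):
    """Iterate V -> phi_C(V) from the empty set up to the least fixed point.

    A vertex x enters the next stage if T(x) holds, or if some input of x
    is already in V and (x is not an "and" gate or all inputs are in V).
    """
    value = set()
    while True:
        new_value = set()
        for x in vertices:
            ins = inputs_of.get(x, ())
            some_true = any(y in value for y in ins)
            all_true = all(y in value for y in ins)
            if x in true_inputs or (some_true and (x not in and_gates or all_true)):
                new_value.add(x)
        if new_value == value:      # fixed point reached
            return value
        value = new_value

def circuit_accepts(vertices, inputs_of, true_inputs, and_gates, roots):
    """Unique Root vertex, and that vertex lies in the least fixed point."""
    value = circuit_value_lfp(vertices, inputs_of, true_inputs, and_gates)
    return len(roots) == 1 and roots <= value

# Hypothetical circuit: root = OR(g_or, g_and), g_or = OR(a, b), g_and = AND(a, b)
vertices = ["a", "b", "g_or", "g_and", "root"]
inputs_of = {"g_or": ["a", "b"], "g_and": ["a", "b"], "root": ["g_or", "g_and"]}
print(sorted(circuit_value_lfp(vertices, inputs_of, {"a"}, {"g_and"})))
print(circuit_accepts(vertices, inputs_of, {"a"}, {"g_and"}, {"root"}))
```

Because the operator is monotone, the stages form an increasing chain, so the loop stops after at most one pass per vertex.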

A particular kind of inductive operator that is worth studying on its own is the transitive closure operator (TC). Let $\varphi(\bar{x}, \bar{y})$ be any binary relation on $k$-tuples. Then $\mathrm{TC}[\varphi(\bar{x}, \bar{y})]$ denotes the reflexive, transitive closure of $\varphi$. The following has been proved (the finishing touch was supplied separately):

Fact

$$(\mathrm{FO}(\le) + \mathrm{TC}) = \mathrm{NSPACE}[\log n]$$


Example Consider the following complete problem for $\mathrm{NSPACE}[\log n]$. The GAP problem consists of the set of directed graphs having a unique vertex colored $S$ and a unique vertex colored $T$, and such that there is a path from the $S$ vertex to the $T$ vertex. It is obvious how to express GAP in $(\mathrm{FO}(\le) + \mathrm{TC})$:

$$(\exists! x) S(x) \land (\exists! x) T(x) \land (\exists s\, t)\bigl( S(s) \land T(t) \land \mathrm{TC}[E(x,y)](s,t) \bigr)$$
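As an illustration of the GAP sentence, the following Python sketch computes the reflexive, transitive closure of $E$ from one start vertex by breadth-first search and then checks the three conjuncts. The adjacency-dictionary encoding and the names `reachable_from` and `in_gap` are assumptions made for this example, not part of the paper.

```python
from collections import deque

def reachable_from(adj, start):
    """Vertices y with TC[E](start, y); reflexive, so start is included."""
    seen = {start}
    queue = deque([start])
    while queue:
        x = queue.popleft()
        for y in adj.get(x, ()):
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return seen

def in_gap(adj, s_colored, t_colored):
    """Unique S vertex, unique T vertex, and a path from the first to the second."""
    if len(s_colored) != 1 or len(t_colored) != 1:
        return False
    (s,) = s_colored
    (t,) = t_colored
    return t in reachable_from(adj, s)

adj = {0: [1], 1: [2], 2: [], 3: [0]}   # directed edges 0->1->2 and 3->0
print(in_gap(adj, {0}, {2}))
print(in_gap(adj, {2}, {0}))
```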

It is interesting to examine the relationship between the number of variables needed to describe a problem and the computational complexity of the problem. Let $\mathrm{FO}(\le)[t(n)]\text{-}\mathrm{VAR}[v]$ be the restriction of $\mathrm{FO}(\le)[t(n)]$ to sentences with at most $v$ distinct variables. Then the following bounds can be derived from the proof of the earlier Fact characterizing P:

Fact

k k k

DTIMEn FOn VARk DTIMEn

Thus, the $\mathrm{DTIME}[n^k]$ properties of ordered graphs are roughly the properties expressible by first-order sentences with $k$ variables and length $n^k$. Obviously this is very rough. A closer relationship between machine complexity and first-order expressibility is obtained if one takes into account the built-in parallelism of quantifiers.

Let $\mathrm{CRAM}[t(n)]\text{-}\mathrm{PROC}[p(n)]$ be the set of problems accepted by a concurrent-read, concurrent-write parallel random access machine (CRAM) in parallel time $O(t(n))$ using $O(p(n))$ processors. In order to get a precise relationship with the CRAM model when $t(n) < \log n$, it was necessary to add another logical relation to FO. Since variables range over the universe $\{0, \ldots, n-1\}$, they may be thought of as $\log n$ bit numbers. Let the relation $\mathrm{BIT}(x, y)$ be true just if the $x$th bit in the binary expansion of $y$ is a one. It has been shown that $\mathrm{FO}(\le, \mathrm{BIT})[t(n)]\text{-}\mathrm{VAR}[O(1)]$ is exactly the set of properties checkable by a CRAM in parallel time $O(t(n))$ using polynomially many processors. In fact, $\mathrm{FO}(\le, \mathrm{BIT})[t(n)]\text{-}\mathrm{VAR}[v]$ corresponds to CRAM-TIME$[t(n)]$ using roughly $n^v$ processors.

Fact For all $t(n)$,

$$\mathrm{FO}(\le, \mathrm{BIT})[t(n)]\text{-}\mathrm{VAR}[O(1)] = \mathrm{CRAM}[t(n)]\text{-}\mathrm{PROC}[n^{O(1)}]$$

In particular, we have that the first-order properties are those checkable in constant time by a CRAM using polynomially many processors.

Fact

$$\mathrm{FO}(\le, \mathrm{BIT}) = \mathrm{CRAM}[1]\text{-}\mathrm{PROC}[n^{O(1)}]$$


Properties of Unordered Graphs

The Facts above on $(\mathrm{FO}(\le) + \mathrm{LFP})$ and $(\mathrm{FO}(\le) + \mathrm{TC})$ give natural languages expressing, respectively, the polynomial-time and nondeterministic logspace properties of ordered graphs. When the ordering is not present, it is possible to prove nearly optimal upper and lower bounds on the number of quantifiers and variables needed to express various properties in first-order logic.

For example, graphs $Y_k$ and $N_k$ have been constructed with the property that $Y_k$ has a complete subgraph on $k$ vertices, but $N_k$ does not. However, using Ehrenfeucht-Fraïssé games (cf. the section on lower bounds below), one can show that $Y_k$ and $N_k$ agree on all sentences with $k-1$ variables, but without ordering. It thus follows that $k$ variables are necessary and sufficient to express the existence of a complete subgraph of size $k$. If these bounds applied to the languages with ordering, they would imply that $\mathrm{P} \neq \mathrm{NP}$.

There is a similar construction of a sequence of pairs of graphs which differ on a polynomial-time complete property but agree on all sentences of polylogarithmic length without ordering. If this result went through with the ordering, it would follow that $\mathrm{NC} \neq \mathrm{P}$, and in particular that $\mathrm{NSPACE}[\log n]$ is not equal to P.

The reason these arguments do not go through with ordering is as follows. For any constant $c$, there is a very simple formula with ordering, $\varphi_c(x)$, that holds just for $x$ equal to the $c$th vertex. It follows that whenever two graphs agree on all simple sentences with ordering, they are equal.

It is of great interest to understand the role of ordering and, if possible, to replace the ordering with a more benign construction. Furthermore, the most basic problem on which to study the role of ordering is graph isomorphism. If two graphs differ on any property, they are certainly not isomorphic.

Let a graph property be an order-independent property of ordered graphs. One can ask the question:

Question Is there a natural language for the polynomial-time graph properties?

Gurevich has conjectured that the answer to this Question is "No". An affirmative answer to this question would imply a similar answer to the more basic:

Question Is there a recursively enumerable listing of all polynomial-time graph properties?

These two Questions are important in various settings. It is well known that graphs are the most general logical structures. Thus these questions are equivalent to the corresponding questions concerning relational databases, i.e., give a database query language for expressing exactly the polynomial-time queries that are independent of the arbitrary ordering of tuples. We believe that the answers to both Questions are "Yes", and we ask the more practical:

Footnote (on the formula $\varphi_c(x)$ with ordering): More precisely, the formula has variables and length $O(\log n)$.

Question What must we add to first-order logic after taking out the ordering so that the Fact characterizing P remains true? Put another way, describe a language $L$ that expresses exactly the polynomial-time graph properties.

The ordering relation is crucial for simulating computation: a Turing machine will be given an input graph in some order. It will visit the vertices of the graph using this ordering, and it is difficult to see how to simulate an arbitrary computation without reference to this ordering. It is well known that first-order logic without ordering is not strong enough to express computation. Let EVEN be the set of graphs with an even number of vertices. We will show below that the property EVEN requires $n$ variables for graphs with $n$ vertices. For a property to be expressible in (FO + LFP), a necessary condition is that it is expressible in a constant number of variables, independent of $n$.

In view of this, it is natural to add the ability to count to first-order logic without ordering. This is formalized in the section on counting quantifiers, where we define the languages $C_k$ of first-order logic restricted to $k$ distinct variables, plus counting quantifiers. We show later that the very simple language $C_2$ suffices to give unique descriptions, and thus efficient canonical forms, for almost all graphs.

For a long time we suspected that first-order logic plus least fixed point and counting was enough to express all polynomial-time graph properties. It would have immediately followed that for each polynomial-time graph property $Q$ there would be a fixed $k$ such that for all $n$ the property $Q$ restricted to graphs of size $n$ is expressible in $C_k$. In particular, if our suspicion were right, then for every set of graphs $S$ admitting a polynomial-time graph isomorphism algorithm there would exist a fixed $k$ such that $C_k$ characterizes $S$ (to be defined later). This implies that for any two graphs $G$ and $H$ from $S$, if $G$ and $H$ are $C_k$-equivalent, i.e., $G$ and $H$ agree on all sentences from $C_k$, then they are isomorphic. For example, the sets of graphs of bounded color class size (defined below) admit polynomial-time graph isomorphism algorithms. We show in a later proposition that graphs of small bounded color class size are characterized by one of these languages. However, the following recent result shows in a strong way that no $C_k$ characterizes the graphs of color class size 4. Thus our suspicion was wrong: first-order logic plus least fixed point and counting does not express all the polynomial-time graph properties.

Footnote (on graphs as the most general logical structures): More precisely, every first-order language may be interpreted in the first-order theory of graphs. We would like to know to whom this is due and where it appears in print.

Fact There exists a sequence of pairs of nonisomorphic graphs $\{G_n, H_n\}$ such that $G_n$ and $H_n$ have $O(n)$ vertices, color class size 4, and admit linear time and logspace canonization algorithms. However, $G_n$ and $H_n$ are $C_n$-equivalent.

Characterization of Graphs

Throughout this paper we will be considering various languages for describing colored graphs. We are interested in knowing when a language suffices to characterize a particular graph or class of graphs. Some of the following definitions and notation are adapted from earlier work.

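Definition For a given language $L$ we say that the graphs $G$ and $H$ are $L$-equivalent ($G \equiv_L H$) iff for all sentences $\varphi \in L$,

$$G \models \varphi \iff H \models \varphi .$$

A partial valuation over a graph $G = \langle V, E \rangle$ is a partial function $u : \{x_1, x_2, \ldots\} \to V$. The domain of $u$ is denoted $\partial u$. Call a $k$-configuration over $G, H$ a pair $(u, v)$ where $u$ is a partial valuation over $G$ and $v$ is a partial valuation over $H$ such that $\partial u = \partial v \subseteq \{x_1, \ldots, x_k\}$. If $(u, v)$ is a $k$-configuration over $G$ and $H$, we say that $(G, u)$ and $(H, v)$ are $L$-equivalent ($(G, u) \equiv_L (H, v)$) iff for all formulas $\varphi \in L$ with free variables from $x_1, \ldots, x_k$,

$$(G, u) \models \varphi \iff (H, v) \models \varphi .$$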

Using the concept of $L$-equivalence, we can now define what it means for the language $L$ to characterize a set of graphs.

Definition We say that $L$ $k$-characterizes $G$ iff for all graphs $H$ and for all $k$-configurations $(u, v)$ over $G, H$, if $(G, u)$ and $(H, v)$ are $L$-equivalent, then there is an isomorphism from $G$ to $H$ extending the correspondence given by $(u, v)$. In symbols:

$$(G, u) \equiv_L (H, v) \implies (\exists f \supseteq v \circ u^{-1})(f : G \cong H)$$

We say that $L$ characterizes $G$ iff $L$ 0-characterizes $G'$ for all colorings $G'$ of $G$. For a set of graphs $S$, we say that $L$ characterizes $S$ iff for all $G \in S$, $L$ characterizes $G$.


Proposition Let GRAPHS be the set of all finite colored graphs, and let FO be the first-order language of colored graphs. Then FO characterizes GRAPHS.

Proof Let $G \in$ GRAPHS have $n$ vertices, and let $u$ be a partial valuation over $G$. For simplicity, suppose that $\partial u = \{x_1, \ldots, x_k\}$ and $u(x_1), \ldots, u(x_k)$ are all distinct. Let $g_1, \ldots, g_n$ be a numbering of $G$'s vertices so that $g_i = u(x_i)$ for $i \le k$. Let $r$ be a subscript greater than that of any color relation holding in $G$. It is simple to write a first-order formula $\varphi_r$ with $n - k + 1$ quantifiers that says: (a) there exist $x_{k+1}, \ldots, x_n$ such that the $x_i$'s are all distinct; (b) any other vertex is equal to one of the $x_i$'s; (c) each pair $x_i, x_j$ has an edge or not exactly as the edge $(g_i, g_j)$ occurs or not in $G$; and finally (d) for each $x_i$, $i \le n$, and each $C_j$, $j \le r$, $C_j(x_i)$ holds exactly if $C_j(g_i)$ holds in $G$. Let $H$ be any graph and let $r$ be greater than the index of any color relation holding in $H$. Let $v$ be any valuation over $H$ such that $(H, v)$ satisfies $\varphi_r$. Let $v'$ be an extension of $v$ to a valuation over $H$ with $\partial v' = \{x_1, \ldots, x_n\}$ making the quantifier-free part of $\varphi_r$ true. Then clearly $f : g_i \mapsto v'(x_i)$ is the desired isomorphism.

i i

The preceding Proposition leads to an inefficient graph canonization algorithm. In the next section we consider languages weaker than full first-order logic in order to obtain efficient algorithms.

The Language $L_k$

Define $L_k$ to be the set of first-order formulas $\varphi$ such that the quantified variables in $\varphi$ are a subset of $x_1, x_2, \ldots, x_k$. Note that variables in first-order formulas are similar to variables in programs: they can be reused, i.e., requantified. For example, consider the following sentence in $L_2$:

$$\psi \equiv \forall x_1\, \exists x_2\, \bigl( E(x_1, x_2) \land \exists x_1\, \lnot E(x_1, x_2) \bigr)$$

The sentence $\psi$ says that every vertex is adjacent to some vertex which is itself not adjacent to every vertex. As an example, the undirected graph pictured earlier satisfies $\psi$. Note that the outermost quantifier $\forall x_1$ refers only to the free occurrence of $x_1$ within its scope.
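The analogy between requantified variables and program variables can be made literal. The Python sketch below evaluates a two-variable sentence of the shape discussed above (for every vertex there is a neighbor that is not adjacent to every vertex); the inner generator expression deliberately reuses the name `x1`, which in Python 3 shadows the outer `x1` without overwriting it. The function name and the set-of-ordered-pairs encoding are assumptions made for this example.

```python
def every_vertex_has_nonuniversal_neighbor(vertices, edges):
    """Evaluate: forall x1 exists x2 (E(x1,x2) and exists x1 not E(x1,x2))."""
    for x1 in vertices:                                   # forall x1
        witness = False
        for x2 in vertices:                               # exists x2
            if (x1, x2) not in edges:
                continue
            # exists x1: the name x1 is requantified inside this scope only
            if any((x1, x2) not in edges for x1 in vertices):
                witness = True
                break
        if not witness:
            return False
    return True

# Undirected path 0 - 1 - 2, stored as symmetric ordered pairs.
path_edges = {(0, 1), (1, 0), (1, 2), (2, 1)}
print(every_vertex_has_nonuniversal_neighbor([0, 1, 2], path_edges))
print(every_vertex_has_nonuniversal_neighbor([0, 1, 2, 3], path_edges))
```

In the second call vertex 3 is isolated, so the outer universal quantifier fails.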

In this section we will consider the question, "Which graphs are characterized by $L_k$?" Define a color class to be the set of vertices which satisfy a particular set of color relations and no others. The color class size of a graph is defined to be the cardinality of the largest color class.

Proposition $L_2$ characterizes the colored graphs with color class size one.


Proof This is clear. In $L_2$ we can assert that each color class is of size at most one, e.g., $\forall x_1 \forall x_2\, (B(x_1) \land B(x_2) \to x_1 = x_2)$. We can also say which edges exist, e.g., the blue vertex is connected to the red vertex. Thus if graph $G$ has color class size one, and if $(G, g) \equiv_{L_2} (H, h)$, then there is an isomorphism $f : G \to H$. Since $f$ preserves colors, $f(g) = h$.

Next we consider the much more powerful language $L_3$. In this language we can express the existence of paths.

Proposition For any natural number $r$, the formula $P_r(x_1, x_2)$, meaning that there is a path of length at most $r$ from $x_1$ to $x_2$, can be written in $L_3$.

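Proof By induction. $P_1(x_1, x_2)$ is $E(x_1, x_2) \lor x_1 = x_2$. Inductively,

$$P_{s+t}(x_1, x_2) \equiv \exists x_3\, \bigl( P_s(x_1, x_3) \land P_t(x_3, x_2) \bigr)$$

Note that a maximum of three distinct variables is used.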
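The inductive definition of the path formula translates directly into a recursive procedure that never holds more than three vertices at once (the two endpoints and one midpoint). The Python sketch below is only an illustration of that recursion; the adjacency-dictionary encoding and the name `path_at_most` are assumptions, and the split of $r$ into two halves is one arbitrary choice of $s$ and $t$.

```python
def path_at_most(adj, r, a, b):
    """P_r(a, b): is there a path of length at most r (r >= 1) from a to b?

    Base case:  P_1(a, b) holds iff E(a, b) or a == b.
    Inductive:  P_{s+t}(a, b) holds iff some c has P_s(a, c) and P_t(c, b).
    """
    if r == 1:
        return a == b or b in adj[a]
    s = r // 2
    t = r - s
    return any(path_at_most(adj, s, a, c) and path_at_most(adj, t, c, b)
               for c in adj)

# Undirected path 0 - 1 - 2 - 3
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(path_at_most(adj, 3, 0, 3))
print(path_at_most(adj, 2, 0, 3))
```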

We will see in the section on lower bounds that there are graphs with color class size 2 that cannot be distinguished by a sentence in $L_2$. The ability of $L_3$ to talk about path lengths makes it slightly less trivial.

Proposition $L_3$ characterizes graphs of color class size at most three.

Proof Let $G$ and $H$ be colored graphs, let $g$ and $h$ be vertices of $G$ and $H$, and suppose that $(G, g) \equiv_{L_3} (H, h)$. We will build an isomorphism $f : G \to H$ such that $f(g) = h$.

We first refine the colorings of the vertices of $G$ and $H$ to correspond to $L_3$ types. For $A, B \in \{G, H\}$, vertices $a \in A$ and $b \in B$ will have the same $L_k$-refined color iff they satisfy the same $L_k$ formulas, i.e.,

$$\{\, \varphi \in L_k \mid A \models \varphi^{x_1}_{a} \,\} = \{\, \varphi \in L_k \mid B \models \varphi^{x_1}_{b} \,\}$$

The following lemma says that we may assume that the color types of $G$ and $H$ are already refined.

Lemma Let the finite colored graphs $G$ and $H$ be $L_k$-equivalent, and let $G'$ and $H'$ be the $L_k$ color refinements of $G$ and $H$. Then $G'$ and $H'$ are $L_k$-equivalent.

The notation $\varphi^{x}_{t}$ denotes the formula $\varphi$ with the term $t$ substituted for the variable $x$.


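Proof Since $G$ and $H$ are finite, each refined color class $C'_i$ is determined by the conjunction $\varphi_i \in L_k$ of a finite set of formulas. That is, for all $i$, $G'$ and $H'$ both satisfy

$$\forall x_1\, \bigl( C'_i(x_1) \leftrightarrow \varphi_i \bigr)$$

Note that $\varphi_i$ has $x_1$ as its free variable. Thus any occurrence of $C'_i(x_1)$ may be replaced by the equivalent $\varphi_i$. Similarly, any occurrence of $C'_i(x_j)$, $1 < j \le k$, may be replaced by $\varphi_i^{\pi_j}$, where $\pi_j$ is a permutation of $\{x_1, \ldots, x_k\}$ sending $x_1$ to $x_j$. Now for any formula $\alpha$ of $L_k$ that uses the refined colors, we may replace each occurrence of $C'_i(x_j)$ by $\varphi_i^{\pi_j}$ to obtain an equivalent formula $\alpha'$ of $L_k$ over the original colors.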

By the above lemma, we may assume that the color classes of $G$ and $H$ correspond exactly to the $L_3$ types of the vertices. Let $R$ and $B$ be two colors, and consider the edges between red and blue vertices in $G$ or $H$. Note that this is a regular bipartite graph, because we can express in $L_3$ the number of blue vertices that a red vertex has as neighbors. Note also that for color classes of size at most 3, the only regular bipartite graphs representing nontrivial relationships between vertices are the one-to-one correspondence graphs and their complements. Let us then change such bipartite graphs as follows: replace the complete bipartite graph by its complement, and replace the graphs of valence two, whose complements are one-to-one correspondence graphs, by these complements. Note that when we perform these changes on $G$ and $H$, the new graphs are still $L_3$-equivalent, and they are isomorphic now iff they were before.

Let the color valence of a graph be the maximum number of edges from any vertex to vertices of a fixed color. We have reduced the problem to constructing an isomorphism between $L_3$-equivalent graphs $G$ and $H$ when these graphs have color valence one. We construct the isomorphism $f$ as follows: Begin by letting $f(g) = h$. Next, while there is a vertex $g_1$ in the domain of $f$ with a unique neighbor $g_2$ of color $C_i$ not yet in the domain of $f$, do the following: Let $h_2$ be the neighbor of $f(g_1)$ of color $C_i$, and let $f(g_2) = h_2$.
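The construction of $f$ just described is a simple propagation, sketched below in Python under the stated assumption that both graphs have color valence one. The names `build_map`, `adj_g`, `color_g` and so on are hypothetical; the sketch only builds the map and returns `None` when the assumption visibly fails. It does not itself verify that the result is an isomorphism, which is what the remainder of the proof argues.

```python
def build_map(adj_g, color_g, adj_h, color_h, g, h):
    """Extend f(g) = h along uniquely colored neighbors.

    Assumes color valence one: each vertex has at most one neighbor of
    any given color, so the matching neighbor in H is forced.
    """
    f = {g: h}
    stack = [g]
    while stack:
        g1 = stack.pop()
        for g2 in adj_g[g1]:
            if g2 in f:
                continue
            # The neighbor of f(g1) in H having the same color as g2.
            candidates = [b for b in adj_h[f[g1]] if color_h[b] == color_g[g2]]
            if len(candidates) != 1:
                return None          # no forced choice: assumption violated
            f[g2] = candidates[0]
            stack.append(g2)
    return f

# Two copies of a red - blue - yellow path.
adj_g = {"g": {"gb"}, "gb": {"g", "gy"}, "gy": {"gb"}}
color_g = {"g": "red", "gb": "blue", "gy": "yellow"}
adj_h = {"h": {"hb"}, "hb": {"h", "hy"}, "hy": {"hb"}}
color_h = {"h": "red", "hb": "blue", "hy": "yellow"}
print(build_map(adj_g, color_g, adj_h, color_h, "g", "h"))
```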

We claim that the function $f$ constructed above is an isomorphism from $G$ to $H$. If not, then it must be the case that there is a loop of a certain color sequence in one of the graphs but not the other. For example, suppose that we chose $g_1, g_2, \ldots, g_j$ and $h_1, h_2, \ldots, h_j$ so that $g_1$ and $h_1$ are color $C_1$, and for $i < j$, $g_{i+1}$ and $h_{i+1}$ are the unique neighbors of $g_i$ and $h_i$, respectively, of color $C_{i+1}$. However, suppose now that the neighbor of $h_j$ of color $C_1$ is $h_1$, but that $g_1$ is not a neighbor of $g_j$. In this case there is a certain easily describable loop in $H$ but not in $G$. That means that $G$ and $H$ disagree on the following $L_3$ formula:

C x E x x x C x E x x C x x

x C x E x x x C x E x x

i j i i


Since $G \equiv_{L_3} H$, they must agree on the above formula. Therefore $f$ is an isomorphism as claimed.

In the next section we describe some games that may be used to prove lower bounds on the expressibility of the $L_k$'s. We will show, as an example using these games, that $L_2$ does not suffice to characterize graphs of color class size 2. Recently it has been shown (cf. the Fact on the graphs $G_n$, $H_n$ above) that no fixed $L_k$ suffices to characterize the graphs of color class size 4.

Lower Bounds

In this section we will show that $L_k$ is not expressive enough to characterize graphs efficiently. We will use the combinatorial games of Ehrenfeucht and Fraïssé, as modified for $L_k$. All of the results in this section could be proved by induction on the complexity of the sentences in question, but we find that the games offer more intuitive arguments.

Let $G$ and $H$ be two graphs, and let $k$ be a natural number. Define the $L_k$ game on $G$ and $H$ as follows. There are two players, and there are $k$ pairs of pebbles, $g_1, h_1, \ldots, g_k, h_k$. On each move, Player I picks up any of the pebbles and he places it on a vertex of one of the graphs. (Say he picks up $g_i$. He must then place it on a vertex from $G$.) Player II then picks up the corresponding pebble (if Player I chose $g_i$ then she must choose $h_i$) and places it on a vertex of the appropriate graph ($H$ in this case).

Let $p_i(r)$ be the vertex on which pebble $p_i$ is sitting just after move $r$. Then we say Player I wins the game at move $r$ if the map that takes $g_i(r)$ to $h_i(r)$, $i = 1, \ldots, k$, is not an isomorphism of the induced $k$-vertex subgraphs. (Note that if the graphs are colored, then an isomorphism must preserve color as well as edges.) Thus Player II has a winning strategy for the $L_k$ game just if she can always find matching points to preserve the isomorphism. Player I is trying to point out a difference between the two graphs, and Player II is trying to keep them looking the same.

As an example, consider the $L_2$ game on the graphs $G$ and $H$ shown in the accompanying Figure. Suppose that Player I's first move is to place $g_1$ on a red vertex in $G$. Player II may answer by putting $h_1$ on either of the red vertices in $H$. Now suppose Player I puts $h_2$ on an adjacent yellow vertex in $H$. Player II has a response because in $G$, $g_1$ also has an adjacent yellow vertex. The reader should convince himself or herself that in fact Player II has a winning strategy for the $L_2$ game on the given $G$ and $H$. The relevant theorem concerning the relationship between this game and the matter at hand is:

Fact (Theorem C) Let $(u, v)$ be a $k$-configuration over $G, H$. Player II has a winning strategy for the $L_k$ game on $(u, v)$ if and only if $(G, u) \equiv_{L_k} (H, v)$.
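For very small graphs the pebble game can be solved by brute force, which gives a way to experiment with $L_k$-equivalence. The Python sketch below is an illustration, not the efficient algorithm promised in the text: it enumerates all pebble positions, keeps those that are partial isomorphisms, and repeatedly discards positions from which Player I has a move that Player II cannot answer. What remains is the set of positions from which Player II can play forever. All names (`player2_wins`, `partial_iso`, the encoding of a position as a tuple of `None` or vertex pairs) are assumptions made for this example.

```python
from itertools import product

def partial_iso(pos, edges_g, edges_h, color_g, color_h):
    """Is the pebbled correspondence a color- and edge-preserving bijection?"""
    pairs = [p for p in pos if p is not None]
    for a, b in pairs:
        if color_g[a] != color_h[b]:
            return False
    for a1, b1 in pairs:
        for a2, b2 in pairs:
            if (a1 == a2) != (b1 == b2):
                return False
            if ((a1, a2) in edges_g) != ((b1, b2) in edges_h):
                return False
    return True

def player2_wins(k, verts_g, edges_g, verts_h, edges_h, color_g=None, color_h=None):
    """Decide the k-pebble game from the empty position (edges: ordered pairs)."""
    color_g = color_g or {v: 0 for v in verts_g}
    color_h = color_h or {v: 0 for v in verts_h}
    cells = [None] + [(a, b) for a in verts_g for b in verts_h]
    good = {pos for pos in product(cells, repeat=k)
            if partial_iso(pos, edges_g, edges_h, color_g, color_h)}

    def survives(pos):
        for i in range(k):
            def move(a, b):
                return pos[:i] + ((a, b),) + pos[i + 1:]
            # Player I moves pebble i in G; Player II needs an answer in H.
            if not all(any(move(a, b) in good for b in verts_h) for a in verts_g):
                return False
            # Player I moves pebble i in H; Player II needs an answer in G.
            if not all(any(move(a, b) in good for a in verts_g) for b in verts_h):
                return False
        return True

    while True:
        bad = {pos for pos in good if not survives(pos)}
        if not bad:
            return (None,) * k in good
        good -= bad

# Edgeless graphs on 2 and 3 vertices.
print(player2_wins(2, [0, 1], set(), [0, 1, 2], set()))
print(player2_wins(3, [0, 1], set(), [0, 1, 2], set()))
```

The number of positions grows like $(|G| \cdot |H| + 1)^k$, so this is only usable on toy inputs.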

FIGURE: The $L_2$ Game (two colored graphs, $G$ and $H$, with vertices labeled r, b, and y)

Note that we have the following:

Corollary $L_2$ does not characterize graphs of color class size 2.

We will prove in a later section that testing whether $G \equiv_{L_k} H$ can be done in time $O(n^k \log n)$. Furthermore, if $L_k$ characterizes a set $S$ of graphs, then canonical forms for the graphs in $S$ may be computed in this same time bound.

It is interesting to note that not only does no $L_k$ characterize all graphs, but almost all graphs are indistinguishable in $L_k$. Thus if two graphs of sufficiently large size $n$ are chosen at random, they will almost certainly be $L_k$-equivalent but not isomorphic.

Fact Fix $k$, and let $\Pr_n[G \equiv_{L_k} H]$ be the probability that two randomly chosen graphs of size $n$ are $L_k$-equivalent. Then

$$\lim_{n \to \infty} \Pr_n[G \equiv_{L_k} H] = 1$$

Not only does $L_k$ not characterize most graphs, it is not strong enough to express counting.

Proposition Let EVEN be the set of graphs with an even number of vertices. This property is not expressible in $L_n$ for graphs with $n$ or more vertices. Furthermore, $L_n$ does not characterize the set of totally disconnected graphs on $n$ vertices.

Proof Let $D_n$ be the uncolored graph with $n$ vertices and no edges. We claim that $D_n \equiv_{L_n} D_{n+1}$. The following is a winning strategy for Player II in the $n$-pebble game on $D_n$ and $D_{n+1}$. Player I's moves are answered preserving distinctness. That is, if Player I places pebble $i$ on a vertex already occupied by pebble $j$, then Player II does the same. If Player I places pebble $i$ on a vertex not occupied by any other pebbles, then Player II does the same. This is possible because there are at least $n$ vertices and only $n - 1$ other pebbles. Since there are no edges, the resulting maps are always isomorphisms.

In the next section we increase the expressive power of the $L_k$'s by adding the ability to count.

Counting Quantifiers

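In this section we add counting quantifiers to the languages $L_k$, thus obtaining the new languages $C_k$. For each positive integer $i$, we include the quantifier $(\exists i\, x)$. The meaning of $(\exists i\, x)\varphi(x)$ is that there exist at least $i$ vertices $x$ such that $\varphi(x)$. We will sometimes also use the quantifiers $(\exists! i\, x)$, meaning that there exist exactly $i$ $x$'s:

$$(\exists! i\, x)\varphi(x) \equiv (\exists i\, x)\varphi(x) \land \lnot (\exists\, (i+1)\, x)\varphi(x)$$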

Example As our first example, note that the following sentence in $C_2$ characterizes the graph $D_n$ of the preceding Proposition:

$$(\exists! n\, x)(x = x) \land (\forall x\, \forall y)\, \lnot E(x, y)$$
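Counting quantifiers are easy to evaluate by exhaustive search, as the following Python sketch shows. It implements "at least $i$" and "exactly $i$" as defined above and uses them to check the sentence describing the edgeless graph on $n$ vertices. The function names and the representation of predicates as Python callables are assumptions made for this illustration.

```python
def exists_at_least(i, vertices, pred):
    """(exists i x) pred(x): at least i vertices satisfy pred."""
    return sum(1 for x in vertices if pred(x)) >= i

def exists_exactly(i, vertices, pred):
    """(exists! i x) pred(x): at least i, and not at least i + 1."""
    return (exists_at_least(i, vertices, pred)
            and not exists_at_least(i + 1, vertices, pred))

def is_edgeless_on_n_vertices(n, vertices, edges):
    """Exactly n vertices x satisfy x = x, and no pair (x, y) is an edge."""
    return (exists_exactly(n, vertices, lambda x: x == x)
            and all((x, y) not in edges for x in vertices for y in vertices))

print(is_edgeless_on_n_vertices(3, [0, 1, 2], set()))
print(is_edgeless_on_n_vertices(3, [0, 1, 2, 3], set()))
print(is_edgeless_on_n_vertices(3, [0, 1, 2], {(0, 1), (1, 0)}))
```

Counting the witnesses costs no more than scanning them, which is the informal reason the text gives for why counting does not make equivalence testing harder.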

Note that every sentence in $C_k$ is equivalent to an ordinary first-order sentence with perhaps many more variables and quantifiers. We will see that testing $C_k$ equivalence is no harder than testing $L_k$ equivalence: the idea is that to test the truth of $\exists x$ or $\forall x$ we have to consider all possible $x$'s anyway, and it doesn't cost more to count them. We show later that $C_k$ equivalence can be tested in time $O(n^k \log n)$. Similarly, graphs characterized by $C_k$ can be given canonical labelings in the same time.

The following notation is useful

Denition Let b e a set of nite graphs Dene varnresp

vcn to b e the minimum k suchthatL resp C characterizes the

k k

graphs in with at most n vertices Let varn varGRAP H S n and

vcn vcGRAP H S n When varnorvcnisboundedwewrite

var max varn and vc max vcn

n n

For example by combining various results obtained so far weknowthat

var GRAPHSnn var CC and var CC var CC

Here we let CC$k$ denote the set of color class $k$ graphs.

We will now examine $C_k$, attempting to compute $\mathrm{vc}(S)$ for various sets of graphs $S$. A modification of the $L_k$ game provides a combinatorial tool for analyzing the expressive power of $C_k$. Given a pair of graphs, define the $C_k$ game on $G$ and $H$ as follows. Just like the $L_k$ game, we have two players and $k$ pairs of pebbles. Now, however, each move has two steps:

1. Player I picks up a pebble, say $g_i$. He then chooses a set $A$ of vertices from one of the graphs, in this case $G$. Now Player II must answer with a set $B$ of vertices from the other graph. $B$ must have the same cardinality as $A$.

2. Player I places $h_i$ on some vertex $b \in B$. Player II answers by placing $g_i$ on some $a \in A$.

The definition for winning is as before. Note that what is going on in the two-step move is that Player I is asserting that there exist $|A|$ vertices in $G$ with a certain property. Player II answers that there are the same number of such vertices in $H$. A straightforward extension of the proof of Fact shows that this game does indeed capture expressibility in $C_k$.

Theorem. Let $u, v$ be a $k$-configuration over $G, H$. Player II has a winning strategy for the $C_k$ game on $(u, v)$ if and only if $(G, u) \equiv_{C_k} (H, v)$.

Consider the following example of the $C_k$ game.

Proposition. Player II has a win for the $C_2$ game on the graphs

pictured in Figure Thus vc CC

Proof. Player II's winning strategy is as follows. She matches the first vertex chosen by Player I with any vertex of the same color. Now suppose that at any point in the game the first pair of pebbles are placed on vertices $g_1$ and $h_1$, both vertices of the same color, say red. Suppose that Player I's next move involves the other pair of pebbles. There is a correspondence between the vertices in $G$ and $H$ as follows:

    In G                                    In H
    $g_1$                                   $h_1$
    blue vertex adjacent to $g_1$           blue vertex adjacent to $h_1$
    yellow vertex adjacent to $g_1$         yellow vertex adjacent to $h_1$
    red vertex not adjacent to $g_1$        red vertex not adjacent to $h_1$
    yellow vertex not adjacent to $g_1$     yellow vertex not adjacent to $h_1$
    blue vertex not adjacent to $g_1$       blue vertex not adjacent to $h_1$

If Player I chooses a set $A$, then Player II chooses the set $B$ to be the corresponding set of vertices under the above map. Whichever vertex Player I then picks from $B$, Player II will choose the corresponding vertex in $A$. Thus the chosen pair of vertices will be the same color, and either both adjacent or both not adjacent to the other chosen pair. Thus Player II can always preserve the partial isomorphism.

Vertex Refinement Corresponds to $C_2$

It turns out that the expressive power of $C_2$ is characterized by the well-known method of vertex refinement. Let $G = \langle V, E, C_1, \ldots, C_r \rangle$ be a colored graph in which every vertex satisfies exactly one color relation. Let $f : V \to \{1, \ldots, n\}$ be given by $f(v) = i \iff v \in C_i$. We then define $f'$, the refinement of $f$, as follows. The new color of each vertex $v$ is defined to be the following tuple:

$$\langle f(v), n_1, \ldots, n_r \rangle$$

where $n_i$ is the number of vertices of color $i$ that $v$ is adjacent to. We sort these new colors lexicographically and assign $f'(v)$ to be the number of the new color class which $v$ inhabits. Thus two vertices are in the same new color class just if they were in the same old color class and they were adjacent to the same number of vertices of each color. We keep refining the coloring until at some level $f^{k} = f^{k+1}$. We let $\bar{f} = f^{k}$ and call $\bar{f}$ the stable refinement of $f$.
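A minimal Python sketch of this refinement process is given below. It follows the definition directly rather than efficiently. It assumes that a graph is given as a dictionary `adj` mapping each vertex to its set of neighbours and that a coloring is a dictionary from vertices to positive integers; the names `refine_once` and `stable_refinement` are illustrative and not part of the text.

```python
def refine_once(adj, color):
    """One refinement step: returns the coloring f' computed from f = color."""
    palette = sorted(set(color.values()))
    signature = {}
    for v in adj:
        # n_i = number of neighbours of v that currently have color i
        counts = tuple(sum(1 for u in adj[v] if color[u] == c) for c in palette)
        signature[v] = (color[v],) + counts  # the tuple <f(v), n_1, ..., n_r>
    # Sort the distinct tuples lexicographically and number the new classes 1, 2, ...
    ordered = sorted(set(signature.values()))
    rank = {sig: i for i, sig in enumerate(ordered, start=1)}
    return {v: rank[signature[v]] for v in adj}


def stable_refinement(adj, color=None):
    """Iterate refine_once until no color class is split any further."""
    if color is None:
        color = {v: 1 for v in adj}  # uncolored graph: a single initial class
    while True:
        new = refine_once(adj, color)
        # Refinement only splits classes, so an unchanged class count means stability.
        if len(set(new.values())) == len(set(color.values())):
            return new
        color = new
```

On the path with vertices 0, 1, 2, 3, 4 the sketch first separates the two endpoints from the inner vertices by degree, and then separates the middle vertex from its two neighbours, giving three stable color classes.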

The equivalence of stable colorings and $C_2$ equivalence is summed up by the following.

Theorem. Given a colored graph $G = \langle V, E, C_1, \ldots, C_r \rangle$ with two vertices $g_1$ and $g_2$, the following are equivalent:

1. $\bar{f}(g_1) = \bar{f}(g_2)$.

2. For all $\varphi(x_1) \in C_2$, $G \models \varphi(g_1) \iff G \models \varphi(g_2)$.

3. Player II wins the $C_2$ game on two copies of $G$, with pebble pair number 1 initially placed on $g_1$ and $g_2$ respectively.

Proof. By induction on $r$ we show that the following are equivalent:

1. $f^{r}(g_1) = f^{r}(g_2)$.

2. For all $\varphi(x_1) \in C_2$ of quantifier depth $r$, $G \models \varphi(g_1) \iff G \models \varphi(g_2)$.

3. Player II wins the $r$-move $C_2$ game on two copies of $G$, with pebble pair number 1 initially placed on $g_1$ and $g_2$ respectively.

The base case is by definition: $f^{0}(g_1) = f^{0}(g_2)$, that is $f(g_1) = f(g_2)$, if and only if $g_1$ and $g_2$ satisfy the same initial color predicate. This is true if and only if $g_1$ and $g_2$ satisfy all the same quantifier-free formulas. This in turn is true if and only if the map sending $g_1$ to $g_2$ is a partial isomorphism. This last is the definition of Player II winning the 0-move game.

Assume that the equivalence holds for all $g_1$ and $g_2$ and for all $r \le m$.

($\neg 1 \Rightarrow \neg 2$): Suppose that $f^{m+1}(g_1) \ne f^{m+1}(g_2)$. There are two cases. If $f^{m}(g_1) \ne f^{m}(g_2)$, then by the inductive assumption there is a quantifier depth $m$ formula $\varphi \in C_2$ on which $g_1$ and $g_2$ differ. Otherwise, it must be that $g_1$ and $g_2$ have a different number of neighbors of some $f^{m}$ color class $i$. Let $N$ be the maximum of these two numbers. By induction, two vertices are in the same $f^{m}$ color class iff they agree on all quantifier depth $m$ $C_2$ formulas. Since quantifier depth $m$ formulas are closed under conjunction and the graphs in question are finite, there is a depth $m$ formula $\varphi_i \in C_2$ such that for all $g \in G$,

$$f^{m}(g) = i \iff G \models \varphi_i(g).$$

It follows that $g_1$ and $g_2$ differ on the formula

$$(\exists N\, x_2)\bigl(E(x_1, x_2) \wedge \varphi_i(x_2)\bigr).$$

($\neg 2 \Rightarrow \neg 3$): Suppose that $G \models \varphi(g_1)$ but $G \models \neg\varphi(g_2)$ for some $\varphi \in C_2$ of quantifier depth $m+1$. If $\varphi$ is a conjunction, then $g_1$ and $g_2$ must differ on at least one of the conjuncts, so we may assume that $\varphi$ is of the form $(\exists N\, x_2)\psi(x_1, x_2)$. On the first move of the game, Player I chooses a set $A$ of $N$ vertices $v$ such that $G \models \psi(g_1, v)$. Whatever Player II chooses as $B$, there will be at least one vertex $v' \in B$ such that $G \models \neg\psi(g_2, v')$. Player I puts his pebble number 2 on this $v'$. Player II must respond with some $v \in A$. The vertices $v$ and $v'$ now differ on a quantifier depth $m$ formula. Thus, by induction, Player II loses the remaining $m$-move game.

($1 \Rightarrow 3$): Suppose that $f^{m+1}(g_1) = f^{m+1}(g_2)$. It follows that $g_1$ and $g_2$ have the same number of neighbors of each $f^{m}$ color. Thus a one-to-one correspondence exists between the vertices in the first copy of $G$ and those in the second, preserving both the property of being adjacent to $g_i$ and the $f^{m}$ color. (Note that since we are considering two copies of the same graph, if both copies have the same number of red neighbors of $g_i$, then they also both have the same number of red non-neighbors of $g_i$.) It follows that Player II can assure that after the first move the pair of vertices chosen will be in the same $f^{m}$ color class. Thus, by the induction hypothesis, Player II has a win for the remaining $m$-move game.

All Trees and Almost All Graphs

The preceding theorem, combined with some facts about stable colorings, provides us with several corollaries concerning graphs characterized by $C_2$. First, it is well known that the set of finite trees is characterized by stable coloring. Thus:

Corollary. Let TREES be the set of finite trees. Then $\mathrm{vc}(\mathrm{TREES}) = 2$.

It is interesting to compare this corollary with the more complicated situation in which counting is not present.

Fact. Let $T_k$ be the set of finite trees such that each node has at most $k$ children, and let $S_k$ be the subset of $T_k$ in which each non-leaf has exactly $k$ children. Then

if k

if k

var T

k

k if k

if k

var S if k

k

dke if k

Babai and Kučera have proved the following result about stable colorings of random graphs.

Fact. There exists a constant such that if $G$ is chosen randomly from the set of all labeled graphs on $n$ vertices, then the probability that $G$ has two vertices of the same stable color is exponentially small in $n$.

Corollary. Almost all finite graphs are characterized by $C_2$.

It is easy to see that the result of Babai and Kučera fails for regular graphs: all regular graphs of degree $d$ on $n$ vertices are $C_2$ equivalent. More recently, Kučera has given a linear algorithm for canonization of regular graphs of a given fixed degree. It follows from his results that

Corollary. For all $d$ and sufficiently large $n$, C characterizes more than O n of the regular graphs of degree $d$ on $n$ vertices.

Equivalence and Canonization Algorithms

The stable coloring of a graph is computable in $O(|E| \log n)$ steps. We present the algorithm for completeness.

Algorithm.

    Place indices $1, \ldots, r$ of initial color classes on list $L$.
    While $L \ne \emptyset$ do begin
        For each vertex $v$ adjacent to some color classes in $L$,
            record how many neighbors of each such color class $v$ has.
        Sort these records to form new color classes.
        Replace $L$ with indices of all but the largest piece of each old class.
    end

Theorem. The above algorithm computes the vertex refinement of a graph $G$. It can be implemented to run in $O(|E| \log n)$ time on a RAM.

Proof. If we implement the sorting step as a bucket sort, then the amount of work in performing an iteration of the while loop is proportional to the number of edges traversed. Note that each time an edge is traversed, the color class of its head is at most half of its previous size. Thus $O(|E| \log n)$ steps suffice.
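The worklist idea behind this algorithm can be sketched in Python as follows. This is an illustrative variant under stated assumptions, not the authors' implementation: it processes one splitter class at a time instead of all of $L$ per iteration, it uses dictionaries rather than bucket sort, and it makes no claim to meet the exact $O(|E| \log n)$ bound. The name `stable_partition` and the `adj`/`color` dictionary representation are hypothetical; the returned class identifiers are arbitrary, so only the induced partition is meaningful.

```python
from collections import defaultdict


def stable_partition(adj, color):
    """Worklist color refinement; returns a dict vertex -> class id."""
    cls, members, ids = {}, defaultdict(set), {}
    for v in adj:
        cid = ids.setdefault(color[v], len(ids))
        cls[v] = cid
        members[cid].add(v)
    next_id = len(ids)
    work = set(members)  # the list L: classes still to be used as splitters
    while work:
        s = work.pop()
        # For every vertex adjacent to class s, count its neighbours in s.
        count = defaultdict(int)
        for u in members[s]:
            for v in adj[u]:
                count[v] += 1
        # Group the touched vertices by (current class, count).
        groups = defaultdict(lambda: defaultdict(set))
        for v, c in count.items():
            groups[cls[v]][c].add(v)
        for cid, by_count in groups.items():
            pieces = sorted(by_count.items())  # deterministic order by count
            touched = sum(len(p) for _, p in pieces)
            rest = len(members[cid]) - touched  # vertices with no neighbour in s
            if rest == 0 and len(pieces) == 1:
                continue  # the class is not split
            new_ids = []
            for idx, (_, piece) in enumerate(pieces):
                if rest == 0 and idx == 0:
                    continue  # with no untouched remainder, the first piece keeps cid
                members[cid] -= piece
                members[next_id] = piece
                for v in piece:
                    cls[v] = next_id
                new_ids.append(next_id)
                next_id += 1
            parts = [cid] + new_ids
            if cid in work:
                work.update(new_ids)  # cid is still pending, so every piece is pending
            else:
                largest = max(parts, key=lambda p: len(members[p]))
                work.update(p for p in parts if p != largest)  # all but the largest
    return cls
```

Skipping the largest piece of a split class is what yields the halving argument in the proof: a vertex is placed back on the worklist only as part of a piece at most half the size of the class it came from.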

Corollary. We can test if $G \equiv_{C_2} H$ in $O(|E| \log n)$ steps, where $|E|$ is the number of edges in $G$.

Proof. We compute the stable coloring of $G \cup H$. $G$ and $H$ are $C_2$ equivalent iff each color class has the same number of vertices from each graph.
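The test in this proof can be sketched in Python as follows. The sketch assumes uncolored graphs given as dictionaries from vertices to neighbour sets, and it uses a naive refinement loop rather than the $O(|E| \log n)$ algorithm; the names `stable_colors` and `c2_equivalent` are illustrative.

```python
def stable_colors(adj):
    """Naive stable refinement of an uncolored graph (all vertices start equal)."""
    color = {v: 0 for v in adj}
    while True:
        # Old color plus the sorted multiset of neighbour colors.
        sig = {v: (color[v], tuple(sorted(color[u] for u in adj[v]))) for v in adj}
        rank = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        new = {v: rank[sig[v]] for v in adj}
        if len(rank) == len(set(color.values())):
            return new
        color = new


def c2_equivalent(adj_g, adj_h):
    """Refine the disjoint union and compare per-class vertex counts."""
    # Tag each vertex with its graph of origin to build the disjoint union.
    union = {("G", v): {("G", u) for u in nb} for v, nb in adj_g.items()}
    union.update({("H", v): {("H", u) for u in nb} for v, nb in adj_h.items()})
    color = stable_colors(union)
    tally = {}
    for (side, _), c in color.items():
        pair = tally.setdefault(c, {"G": 0, "H": 0})
        pair[side] += 1
    return all(p["G"] == p["H"] for p in tally.values())
```

As an example of the remark on regular graphs, the 6-cycle and the disjoint union of two triangles are both 2-regular on six vertices, so `c2_equivalent` reports them equivalent although they are not isomorphic.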

As promised, we show how to modify the above algorithm to compute canonical labelings of graphs characterized by $C_2$.

Theorem. Let $S$ be a set of finite graphs characterized by $C_2$. Then canonical labelings for $S$ are computable in $O(|E| \log n)$ steps.

Proof. We modify the algorithm as follows. When a stable coloring is reached, if each vertex has a unique stable color then a canonical labeling is determined. Otherwise, let $C_i$ be the first color class of size greater than one, and let $g$ be a vertex of color $C_i$. Make $g$ a new color, $C_{new}$, add $C_{new}$ to $L$, and continue the refinement.

Suppose that $(G, g) \equiv_{C_2} (H, h)$. Let $G'$ and $H'$ be the result of coloring $g$ and $h$ new. Since $C_2$ characterizes $S$, $G'$ and $H'$ are isomorphic. It follows that $C_2$ equivalent graphs will result in the same canonical labeling. Furthermore, the analysis of the revised algorithm is unchanged.
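The modification described in this proof can be sketched in Python as follows. The sketch is only meaningful for graphs from a class characterized by $C_2$ (trees, for example), and it uses a naive refinement loop instead of the worklist algorithm. The names `refine` and `canonical_form`, and the choice to return the sorted relabeled edge list as the canonical form, are assumptions made for illustration.

```python
def refine(adj, color):
    """Stable refinement; colors are renumbered 0..m-1 in lexicographic order."""
    while True:
        sig = {v: (color[v], tuple(sorted(color[u] for u in adj[v]))) for v in adj}
        rank = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        new = {v: rank[sig[v]] for v in adj}
        if len(rank) == len(set(color.values())):
            return new
        color = new


def canonical_form(adj):
    """Relabel vertices by their final colors and return the sorted edge list."""
    color = refine(adj, {v: 0 for v in adj})
    while len(set(color.values())) < len(adj):
        sizes = {}
        for c in color.values():
            sizes[c] = sizes.get(c, 0) + 1
        first = min(c for c, k in sizes.items() if k > 1)  # first class of size > 1
        g = next(v for v in adj if color[v] == first)      # any vertex of that class
        color = dict(color)
        color[g] = len(sizes)                              # a brand-new color for g
        color = refine(adj, color)
    # Every vertex now has a unique color, which serves as its canonical label.
    return sorted((color[u], color[v]) for u in adj for v in adj[u] if color[u] < color[v])
```

Two differently labeled copies of the path on four vertices receive the same canonical form under this sketch, while the star on four vertices receives a different one.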

We will next present the algorithm to test $C_k$ equivalence for larger $k$. Define stable colorings of $k$-tuples as follows. Initially we give each $k$-tuple of vertices from $G$ a color according to its isomorphism type. That is, $\langle g_1, \ldots, g_k \rangle$ has the same initial color as $\langle h_1, \ldots, h_k \rangle$ just if the map $g_i \mapsto h_i$, $i = 1, \ldots, k$, is an isomorphism.

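We next form the new color of $\langle g_1, \ldots, g_k \rangle$ as the tuple

$$\Bigl\langle f(g_1, \ldots, g_k),\ \mathrm{SORT}\bigl\{ \langle f(g, g_2, \ldots, g_k),\ f(g_1, g, \ldots, g_k),\ \ldots,\ f(g_1, \ldots, g_{k-1}, g) \rangle \;:\; g \in G \bigr\} \Bigr\rangle$$

That is, the new color of a $k$-tuple is formed from the old color as well as from considering, for each vertex $g$, the old colors of the $k$ $k$-tuples resulting from the substitution of $g$ into each possible place.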
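The refinement of $k$-tuples can be sketched in Python as follows. The sketch is a brute-force illustration for small uncolored graphs with sortable vertex labels. As a design choice of this sketch (not taken from the text), several graphs are refined in lockstep with shared color names so that their color histograms can be compared directly; the names `initial_color` and `tuple_histograms` are hypothetical.

```python
from itertools import product


def initial_color(adj, t):
    """Isomorphism type of the tuple t: which entries coincide, which are adjacent."""
    k = len(t)
    return tuple((t[i] == t[j], t[j] in adj[t[i]])
                 for i in range(k) for j in range(i + 1, k))


def tuple_histograms(graphs, k):
    """Refine the k-tuples of each graph; return one color histogram per graph."""
    tuples = [(gi, t) for gi, adj in enumerate(graphs)
              for t in product(sorted(adj), repeat=k)]
    raw = {x: initial_color(graphs[x[0]], x[1]) for x in tuples}
    names = {c: i for i, c in enumerate(sorted(set(raw.values())))}
    color = {x: names[raw[x]] for x in tuples}
    while True:
        sig = {}
        for gi, t in tuples:
            # For each vertex g, the old colors of the k tuples obtained by
            # substituting g into each position; the list of these is then sorted.
            subs = sorted(
                tuple(color[(gi, t[:i] + (g,) + t[i + 1:])] for i in range(k))
                for g in graphs[gi]
            )
            sig[(gi, t)] = (color[(gi, t)], tuple(subs))
        names = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        stable = len(names) == len(set(color.values()))
        color = {x: names[sig[x]] for x in tuples}
        if stable:
            break
    hists = [{} for _ in graphs]
    for (gi, _), c in color.items():
        hists[gi][c] = hists[gi].get(c, 0) + 1
    return hists
```

With $k = 2$ the histograms of the 6-cycle and of two disjoint triangles differ: an adjacent pair in a triangle has a common neighbour, which shows up as a substitution pattern that never occurs in the 6-cycle. Vertex refinement alone cannot separate these two graphs.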

Theorem. A stable coloring of $k$-tuples in an $n$ vertex graph may be computed in $O(k^2 n^{k+1} \log n)$ steps.

Proof. This is a generalization of the vertex refinement algorithm above. We must refine the coloring for each color class $B_i$ of $k$-tuples. Each such refinement takes $O(kn)$ steps for each $k$-tuple in $B_i$. Each of the $n^k$ $k$-tuples will have its color class treated at most $\log(n^k) = k \log n$ times.

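Theorem. Let $G$ be a graph whose $k$-tuples of vertices are colored. Let $\bar{g}, \bar{h} \in G^k$. The following are equivalent:

1. $\bar{f}(\bar{g}) = \bar{f}(\bar{h})$.

2. For all $\varphi(x_1, \ldots, x_k) \in C_{k+1}$, $G \models \varphi(\bar{g}) \iff G \models \varphi(\bar{h})$.

3. Player II wins the $C_{k+1}$ game on two copies of $G$ with pebbles $1, \ldots, k$ initially placed on $g_1, \ldots, g_k$ and $h_1, \ldots, h_k$ respectively.

Proof. The proof is similar to that of the corresponding theorem for vertex refinement.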

Corollary. $C_{k+1}$ equivalence may be tested in $O(n^{k+1} \log n)$ steps. If $k$ is allowed to vary with $n$, this becomes $O(k^2 n^{k+1} \log n)$. Similarly, if $S$ is characterized by $C_{k+1}$, then canonical labelings for $S$ may be computed in the same time bound.

Conclusions

We have begun a study of which sets of graphs are characterized by the languages $L_k$ and $C_k$. For such sets of graphs we have given simple and efficient canonization algorithms. General directions for further study include the following:

1. There are many interesting questions concerning the values $\mathrm{var}(S)$ and $\mathrm{vc}(S)$ for various classes of graphs $S$. In particular, it would be very interesting to determine vc(Planar Graphs) and vc(Genus $k$ graphs).

2. The question in its new form, "What must we add to first-order logic with fixed point and counting in order to obtain all polynomial-time graph properties?", deserves considerable further study.

3. Fact implies that (FO + LFP + counting) does not even include all of DSPACE[$\log n$]. It would be very interesting, and perhaps more tractable, to answer the question for other classes such as NSPACE[$\log n$].

Acknowledgements. Thanks to Steven Lindell for suggesting some improvements to this paper.

References

A.V. Aho, J.E. Hopcroft, and J.D. Ullman, The Design and Analysis of Computer Algorithms, Addison-Wesley.

László Babai, "Moderately Exponential Bound for Graph Isomorphism," Proc. Conf. on Fundamentals of Computation Theory, Szeged, August.

L. Babai, W.M. Kantor, E.M. Luks, "Computational Complexity and the Classification of Finite Simple Groups," IEEE FOCS Symp.

László Babai and Luděk Kučera, "Canonical Labelling of Graphs in Linear Average Time," IEEE Symp. on Foundations of Computer Science.

László Babai and Eugene M. Luks, "Canonical Labeling of Graphs," ACM STOC Symp.

D. Mix Barrington, N. Immerman, and H. Straubing, "On Uniformity Within NC," Third Annual Structure in Complexity Theory Symp.

Jon Barwise, "On Moschovakis Closure Ordinals," J. Symb. Logic.

J. Cai, M. Fürer, N. Immerman, "An Optimal Lower Bound on the Number of Variables for Graph Identification," IEEE FOCS Symp.

Ashok Chandra and David Harel, "Structure and Complexity of Relational Queries," JCSS.

A. Ehrenfeucht, "An Application of Games to the Completeness Problem for Formalized Theories," Fund. Math.

Ron Fagin, "Probabilities on Finite Models," J. Symbolic Logic.

R. Fraïssé, "Sur les Classifications des Systems de Relations," Publ. Sci. Univ. Alger I.

Leslie Goldschlager, "The Monotone and Planar Circuit Value Problems are Log Space Complete for P," SIGACT News.

Yuri Gurevich, "Logic and the Challenge of Computer Science," in Current Trends in Theoretical Computer Science, ed. Egon Börger, Computer Science Press.

Christoph M. Hoffmann, Group-Theoretic Algorithms and Graph Isomorphism, Springer-Verlag Lecture Notes in Computer Science.

John E. Hopcroft and Robert Tarjan, "Isomorphism of Planar Graphs," in Complexity of Computer Computations, R. Miller and J.W. Thatcher, eds., Plenum Press.

Neil Immerman, "Number of Quantifiers is Better than Number of Tape Cells," JCSS, June.

Neil Immerman, "Upper and Lower Bounds for First Order Expressibility," JCSS.

Neil Immerman, "Relational Queries Computable in Polynomial Time," Information and Control.

Neil Immerman, "Languages That Capture Complexity Classes," SIAM J. Comput.

Neil Immerman, "Nondeterministic Space is Closed Under Complementation," SIAM J. Comput.

Neil Immerman, "Expressibility and Parallel Complexity," SIAM J. of Comput.

Neil Immerman, "Expressibility as a Complexity Measure: Results and Directions," Second Structure in Complexity Theory Conf.

Neil Immerman, "Descriptive and Computational Complexity," in Computational Complexity Theory, ed. J. Hartmanis, Proc. Symp. in Applied Math., American Mathematical Society.

Neil Immerman and Dexter Kozen, "Definability with Bounded Number of Bound Variables," Information and Computation.

Luděk Kučera, "Canonical Labeling of Regular Graphs in Linear Average Time," IEEE FOCS Symp.

Eugene M. Luks, "Isomorphism of Graphs of Bounded Valence Can be Tested in Polynomial Time," JCSS.

Yiannis N. Moschovakis, Elementary Induction on Abstract Structures, North Holland.

Bruno Poizat, "Deux ou trois choses que je sais de Ln," J. Symbolic Logic.

Simon Thomas, "Theories With Finitely Many Models," J. Symbolic Logic.

M. Vardi, "Complexity of Relational Query Languages," Symposium on Theory of Computation.