Descriptive Complexity a Logicians

Approach to Computation

Neil Immerman

Computer Science Dept

University of Massachusetts

Amherst MA

immermancsumassedu

App eared in Notices of the American Mathematical Society

A basic issue in computer science is the complexity of problems If one is doing a

calculation once on a mediumsized input the simplest may b e the b est

metho d to use even if it is not the fastest However when one has a subproblem that

will havetobesolved millions of times optimization is imp ortant A fundamental

issue in theoretical computer science is the computational complexity of problems

Howmuch time and howmuch memory space is needed to solve a particular problem

Here are a few examples of such problems

Given a directed graph and two sp ecied p oints s t determine if

there is a path from s to t A simple lineartime algorithm marks s and then

continues to mark every vertex at the head of an edge whose tail is marked

When no more vertices can b e marked t is reachable from s i it has b een

marked

Mintriangulation Given a p olygon in the plane and a length determine if

there is a triangulation of the p olygon of total length less than or equal to L

Even though there are exp onentially many p ossible triangulations a dynamic

programming algorithm can nd an optimal one in O n steps

ThreeColorability Given an undirected graph determine whether its vertices

can b e colored using three colors with no two adjacentvertices having the same

color Again there are exp onentially many p ossibilities but in this case no known

algorithm is faster than exp onential time

Research supp orted in part by NSF grant CCR

Computational complexity measures howmuch time andor memory space is needed

as a function of the input size Let TIMEtn b e the of problems that can b e

solved by that p erform at most O tn steps for inputs of size n The

Polynomial Time P is the set of problems that are solvable in time

at most some p olynomial in n

k

P TIME n

k

Even though the class TIMEtn is sensitive to the exact machine mo del used in

the computations the class P is quite robust The problems Reachability and Min

triangulation are elements of P

Some imp ortant computational problems app ear to require more than p olynomial

time An interesting class of such problems is contained in nondeterministic p oly

nomial time NP A nondeterministic computation is one that maymake arbitrary

choices as it works If anyofthesechoices lead to an accept state then wesaythe

input is accepted As an example let us consider the threecolorability problem A

nondeterministic algorithm traverses the input graph arbitrarily assigning to each

vertex a color red yellow or blue Then it checks whether each edge joins vertices

of dierent colors If it accepts

A nondeterministic computation can b e mo deled as a whose ro ot is the starting

conguration Every choice forms a branching no de The computation accepts if any

of the leaves is an accepting conguration The nondeterministic time of this com

putation is the length of the path from ro ot to accepting leaf not the exp onentially

larger size of the tree of all p ossible choices

Continuing the example given a graph G with v vertices the nondeterministic three

v

colorability algorithm denes a computation tree that has branches one for each

p ossible coloring of the vertices Each such branch ends in an accepting leaf if and

only if the corresp onding coloring is a valid three coloring It follows that there

exists an accepting leaf just if G is threecolorable Thus threecolorabilitycanbe

checked in nondeterministic O n time where n is the number of vertices plus edges

ThreecolorabilityisinNP

The threecolorability problem as well as hundreds of other wellknown combinatorial

problems are NP complete See for a survey of many of these This means that

not only are they in NP but they are the hardest problems in NP all problems

in NP are reducible to each NPcomplete problem A reduction from problem A to

problem B is a p olynomialtime mapping from anyinputtoA to an input to B that

has the same answer It follows that if A is reducible to B and B is in P then A is in

P At present the fastest known algorithm for any of these problems is exp onential

An ecient algorithm for any one of these problems would translate to an ecient

algorithm for all of them

The P NP question is an example of our inability to determine what can or cannot

b e computed in a certain amount of computational resource time space parallel

time etc The only truly eectivetoolthatwe currently have for this is Cantors

diagonalization argument This is very useful for proving hierarchy theorems ie

that more of a given computational resource enables us to compute more

TIMEn TIMEn NTIMEn NTIMEn SPACEn SPACEn

However there are no known techniques for comparing dierenttyp es of resources

eg time versus nondeterministic time times versus space etc

These notions of complexitymight not seem fundamental but rather tied to the sort

of machine that the algorithm will b e p erformed on On the contrary the notions

of time space parallel time and even nondeterministic time are fundamental and

have many equivalent formulations One such formulation whichIwillnow describ e

involves no machines at all but relies instead on classic notions from mathematical

logic

Complexitytheorytypically considers yesno problems this is the examination of

the diculty of computing a particular bit of the desired output Yesno problems

are prop erties of the input the set of all inputs to which the answer is yes have

the prop erty in question Rather than asking the complexityofchecking if a certain

input has a prop erty T in Descriptive Complexitywe ask how hard is it to express

the prop erty T in a formal language It is plausible that prop erties that are harder

to check might b e harder to express What is surprising is how closely mathemat

ics mimics the physical world when we use rstorder logic descriptive complexity

exactly captures the imp ortant complexity classes

In Descriptive Complexitywe view inputs as nite logical structures eg a binary

string w w w w is co ded as

jw j

w

A hf jw jgS i

w

consisting of a universe U f jw jg of the bit p ositions in the string the

w w th

monadic S dened so that S i holds i the i bit of w is one and is

the usual total ordering on U

G

A graph is a logical structure A hf vgE i whose universe is the set

G

G

of vertices and E is the binary edge relation A graph problem is a set of nite

structures whose vo cabulary consists of a single Similarlywemay

think of any problem T in some complexity class C as a set of structures of some xed

vo cabulary

Recall that in rstorder logic we can quantify over the universe Wecansayfor

example that a string ends in a one The following sentence do es this by asserting

the existence of a string p osition x that is the last p osition and asserting S x ie

the bit at that p osition is a one

xy y x S x

As another example wecansay that there are exactly two edges leaving every vertex

xyz w y z E x y E x z E x w w y w z

For any structure Awe use the notation Aj to mean that is true in A

In secondorder logic we also havevariables X that range over relations over the

i

universe These variables may b e quantied

A secondorder existential formula SO b egins with second order existential quan

tiers and is followed by a rstorder formula As an example the following second

order existential sentence says that the graph in question is threecolorable It

do es this by asserting that there are three unary relations Red R Yellow Y and

Blue B dened on the universe of vertices It go es on to saythatevery vertex has

some color and no two adjacentvertices have the same color

h

R Y B x R x Y x B x y E x y

i

R x R y Y x Y y B x B y

Descriptive Complexity b egan with the following theorem of Ronald Fagin Fagins

theorem says that NP is equal to the set of problems describable in secondorder

existential logic ObservethatFagins Theorem characterizes the complexity class

NP purely by logic with no mention of machines or time

Theorem A set of structures T is in NP i there exists a secondorder exis

tential formula such that T fA j A jg

To capture complexity classes P and b elow rstorder logic is more appropriate

Since rstorder logic is weaker than secondorder it is natural to let formulas grow

with the size of inputs

For example one can ask How long a rstorder formula is needed to express graph

connectivity A graph G is connected i G jxy P x y whereP x y means

that there is a path from x to y We are interested in paths of length less than

G

n jV j Firstorder formulas of size O log n are b oth necessary and sucientto

express connectivity

In order to characterize complexity classes as in Fagins theorem but via rstorder

logics one must lo ok at the co ding of inputs A graph or other structure is given to a

computer in some order eg vertices v v v Furthermore algorithms mayuse

n

the ordering eg searching through eachvertex in turn Tosimulate the machine

the logical languages need access to the ordering This was also necessary for Fagins

theorem but in SO wemay existentially quantify a total ordering on the universe

From now on we will assume that all rstorder languages include a binary relation

symbol denoting a total ordering on the universe With this proviso wecan

relate computational complexity to rstorder descriptive complexity

Let FOSIZEsn b e the set of prop erties expressible by uniform sequences of rst

th

order formulas f g such that the n formula has O sn symb ols and ex

i

iZ

presses the prop erty in question for structures of size n Uniform means that the

map n has very low complexityegSPACElog n Later we will see that

n

purely syntactic uniformity conditions suce

The following theorem shows that FOSIZE is a go o d measure of space In the

following NSPACE is the nondeterministic version of space in which again we allow

algorithms to make arbitrary choices and we only count the amount of space used in

an accepting path In Walter Savitchproved that NSPACEsn is contained

in SPACEsn a result that is not obvious It is not known whether Savitchs

theorem is optimal nor is it known whether SPACEsn is equal to NSPACEsn

The following theorem closely relates the descriptive measure FO SIZE to space by

tting it within the b ounds of Savitchs theorem

Theorem For sn log nanyproblem in nondeterministic space sn can

beexpressed by rstorder formulas of size sn log nAny problem so expressible

is in deterministic space sn Insymbols

NSPACEsn FOSIZEsn log n SPACE sn

One way to increase the p ower of rstorder logic is byallowing inductive denitions

This is formalized via a least xed p oint op erator LFP

As an example supp ose that wewant to dene the transitive of the edge

relation of a graph This would b e useful if wewere given a directed graph G

G G

V E st and wewanted to assert that there is a path from s to t Let E be

the reexive of the edge relation E Given E we can describ e the

reachability prop erty simplyE s t We can dene E inductively as follows

E x y x y E x y z E x z E z y

Equation asserts that E is a xed p oint of the following map R R of binary

relations

R x y E x y z R x z R z y

Since R app ears only p ositively that is without any negation signs in this op erator

has a least xed p oint whichwetake as the meaning of the inductive denition

Thus we can write

E LFP

Rxy

In fact if we consider the sequence then LFP

is the union of this sequence Since the sequence is monotone and there are at most

k

tuples in a k ary relation LFP is in fact a p olynomial iterator of formulas Thus n

LFP is a particularly natural way to iterate rstorder formulas letting their size grow

while the number of variables they use remain constant The following theorem says

that if we add to rstorder logic the p ower to dene new relations by induction then

we can express exactly the prop erties that are checkable in p olynomial time This

is exciting b ecause a very natural descriptive class rstorder logic plus inductive

denitions captures a natural and imp ortant complexity class Polynomial time is

characterized using only basic logical notions and no mention of computation

Theorem Aproblem is in polynomial time i it is describable in

rstorder logic with the addition of the least xedpoint operator This is equivalent

to being expressible by a rstorder formula iteratedpolynomial ly many times In

O

symbols P FO LFP FOn

Theorems and cast the P NP question in a dierentlight In the following

we are using the fact that if P were equal to NPthenNPwould b e closed under

complementation It would then followthatevery secondorder formula would b e

equivalent to a secondorder existential one

Corollary P is equal to NP i every secondorder expressible property over nite

ordered structures is already expressible in rstorder logic using inductive denitions

In symbols

P NP FO LFP SO

Wemention two other natural op erators that let us capture imp ortant complexity

b e a formula with k free variables We can think x classes Let x x x

k

k

k

of as representing an edge relation over a vertex set V U consisting of all k tuples

from the original universe Dene the transitive closure op erator writing TC

x x

to denote the reexive transitive closure of the binary relation NSPACElog nis

an imp ortant complexity class that includes most path problems The Reachability

problem mentioned at the b eginning of this article is complete for NSPACElog n

The following theorem says that this complexity class is captured byFOTC

Note that transitive closures are a sp ecial kind of inductive denition see Equation

Every time we reapply the inductive equation the paths considered double in

length That is whyFOTC FOlog n

Theorem The complexity class nondeterministic logarithmic spaceis

equal to the set of problems describable in rstorder logic with the addition of a

transitive closureoperator This is a subclass of the set of problems describable by

rstorder formulas iterated log n times In symbols

NSPACElog n FO TC FOlog n

We write FOtn to denote those prop erties expressible for structures of size n by formulas

that consist of a blo ck of restricted quantiers rep eated tn times What can b e expressed with a

b ounded number of variables is exactly what can b e computed with p olynomially much hardware

ie p olynomialspace and p olynomially many pro cessors The depth of nesting of quantiers needed

to express a prop erty is precisely the parallel time needed to check it See for a more detailed

survey of these results

Supp ose we are given a rstorder formula R x x where R is a new relation

k

variable but in which R need not o ccur only p ositively Then the least xed p ointof

may not exist However wemay describ e its partial xed p oint PFP whichis

equal to the rst xed p ointinthesequence or if there is no such

k

n

xed p oint Since this sequence must rep eat after at most steps PFP maybe

thought of as an exp onential iterator

The following theorem shows that the arbitrary iteration of rstorder formulas

which is the same as iterating them exp onentially allows the description of exactly

all prop erties computable using a p olynomial amount of space

Theorem Aproblem is in polynomial space i it is describable in

rst logic with the addition of the partial xedpoint operator This is equivalent to

being expressible by rstorder formulas of polynomial size and also equivalent to being

expressible by a rstorder formula iteratedexponential ly In symbols

O 

O n

PSPACE FO PFP FOSIZEn FO

Complementation

One of the early successes of descriptive complexitywas in resp onse to questions con

cerning database query languages A very p opular mo del of databases is the relational

mo del In this mo del databases are exactly nite logical structures Furthermore

query languages are based on rstorder logic

As an example supp ose that wehave an airline database One of its relations might

b e FLIGHTS with arguments ightnumb er origin departure time destination

arrival time The query What are the direct ights from JFK to LAX leaving in

the morning could b e phrased as

t t t FLIGHTSx JFK t LAXt

d a d d a

Note that this query has one free variable x The resp onse to the query should b e

the set of xs suchthat x is the ightnumb er of a direct ight from JFK to LAX

that leaves b efore no on

Of course not all queries that wemightwant to express are rstorder For example

the reachability query is there a route with no xed limit on the numb er of hops

taking me from s to t is not rstorder For this and related reasons Chandra

and Harel prop osed a hierarchy of query languages ab ove rstorder logic based on

alternating applications of quantication negation and the least xed p oint op erator

It was known that for innite structures this gave a strict hierarchy the same

was conjectured for nite structures

There is a big dierence between innite and nite structures Over an innite

structure a least xed p ointmightnever reach a stage of the induction at which

it has completed Over a nite structure there is a nite stage where the xed

point is realized Furthermore one can verify within the induction that this stage

has b een reached Thus bysaying that wehave reached the nal stage and that

tuple t is not present we can express the negation of LFP t It follows that the

hierarchy prop osed by Chandra and Harel is not a hierarchy at all Every formula

in FO LFP is expressible as a single xed p oint of a rstorder formula Wesay

that the hierarchy collapses to its rst level

Theorem Every formula in rstorder logic with the addition of the least

xedpoint operator is equivalent to a single application of least xedpoint to a rst

order formula

A related question could b e asked ab out a prop osed hierarchyofFO TC In fact

the transitive closure hierarchy also collapses to its rstlevel

Theorem Every formula in rstorder logic with the addition of the tran

sitive closureoperator is equivalent to a single application of transitive closuretoa

rstorder formula

Deterministic complexity classes are closed under complementation It had b een long

believed in the computer science community that nondeterministic space is a one

sided class not closed under complementation In particular it was b elieved that

the complement of the Reachability problem that is the set of graphs G V E s t

such that there is no path from s to twas not recognizable in NSPACElog n It is

conjectured that SPACElog n is strictly contained in NSPACElog n and the most

likely waytoprove that app eared to b e via showing that NSPACElog n is not closed

under complementation Thus the following corollary of Theorem was quite

surprising

Corollary For sn log n NSPACEsn is closedundercomple

mentation

Coincidently Corollary which had b een op en since was proved indep endently

byRob ert Szelep csenyi at almost exactly the same time but using dierent metho ds

Lower Bounds and Ordering

One tantalizing feature of Theorems and is that they hold out the prosp ect

that we can settle questions such as P NP via logical metho ds In particular there

are quantier games due to EhrenfeuchtandFrassethatcharacterize the expressive

power of logical languages EhrenfeuchtFrasse games are played on a pair of struc

tures A B There are twoplayers the Sp oiler and the Duplicator At eachmove

the Sp oiler cho oses some element of the universe of one of the structures and the

Duplicator must resp ond with an element from the other structure If at the end

of the game the map from the elements chosen from A to those chosen from B is

an isomorphism of the induced substructures then the Duplicator wins otherwise the

Sp oiler wins For most logical languages L there is a corresp onding game G with the

L

following fundamental prop erty Write A B to mean that for all L Aj

L

i Bj

Duplicator has a winning strategy for game G A B A B

L L

We use EhrenfeuchtFrasse games to prove that certain prop erties are not expressible

in certain logics If A and B disagree on prop erty S and yet we can construct a winning

strategy for the Duplicator on G A B then wehave proved that S is not expressible

L

in L

SOmonadic is a subset of SO in which the only secondorder variables are relations

that take one argument We can thus assert the existence of colorings of the vertices

Fagin used EhrenfeuchtFrasse games to provethat

Theorem Graph Connectivity is not describable by a secondorder existen

tial formula in which al l relational variables are monadic

However nonconnectivityisinSOmonadic The following formula says that there

exists a set C of vertices that is not empty and not the whole universe that is closed

under edge connections Thus Theorem implies that SOmonadic is not closed

under complementation

NotConnected C xy C x C y xy C x E x y C y

Proving lower b ounds on SO when relational variables need not b e monadic app ears

hard Lower b ounds for rstorder logic may b e more tractable Alower b ound

corresp onding to Theorem was presented for the AGAP problem AGAP is a

generalization of the graph reachability problem to andor graphs and is complete

for P The following theorem proves a lower b ound on the size of rstorder formulas

needed to express the AGAP prop erty

Theorem The AGAP property is describable by linearsize rstorder for

mulas that do not include the ordering relation However it is not describable by such

p

log n

formulas of size less than

The relationship b etween space and time is not well understo o d Wedoknow that

k

NSPACElog n P SPACEn

k

From Theorem weknow that NSPACElog n FOSIZE log n Theorem is

alower b ound on languages without ordering If Theorem could b e shown with

ordering it would follow that AGAP is not in FOSIZE log nandthus that P strictly

contains NSPACElog n

The EhrenfeuchtFrasse games b ecome much less useful as a pro of technique when

working with ordered graphs This is b ecause in a fairly weak ordered language we

th

can express the prop ertyofavertex v that it is the i vertex in the ordering Once

i

a language has this strength two structures will b e equivalent i they are identical

If we can assert ab out a graph G that vertex has an edge to vertex then any

graph that agrees on all such sentences is identical to G

On the other hand if we just remove the ordering then Theorems and all

fail For example it is easy to show that without an ordering we cannot count In

fact if EVEN represents the query The size of the universe is even then

Theorem In the absence of the ordering relation rstorder logic with the

addition of the xedpoint operator cannot describe the problem EVEN

All the graph prop erties that wewant to express are order indep endent Thus it

would b e very nice to have a language that captures all p olynomialtime computable

orderindep endent prop erties This would b e analogous to Theorem whichsays that

FO LFP captures p olynomial time for ordered structures

Before examples involving the counting of large unstructured sets were the only

problems known to b e in orderindep endent P but not in FOwo LFP Consider

the language FOwo LFP COUNTinwhich structures are twosorted their

universe is partitioned into an unordered domain D fd d d g and a separate

n

numb er domain N f ng Wehave the database predicates dened on D

and the standard ordering dened on N The two sorts are combined via counting

quantiers

ixx

meaning that there exist at least i elements x suchthatx Here i is a number

variable and x is a domain variable

For quite a while it was an op en question whether the language FOwo LFP

COUNTwas equal to order indep endentP

Instead in itwas proved that FO wo LFP COUNT is strictly contained in

orderindep endent P In the following lower b ound the graphs are almost ordered

They consist of ngroupsofvertices each and there is a given total ordering on

the groups

 k

In fact it would follow that no Pcomplete problem is in SPACElog n for any k From

this we could conclude that Pcomplete problems do not admit the kind of extensive sp eedup via

parallelism that problems such as reachabilitymatrixmultiplication and matrix inversion do

Theorem Thereisanorderindependent property of graphs T that is very

easy to compute in a complexity class wel l below P but in the absenceofordering

is not expressible in rstorder logic with the addition of the xedpoint operator and

counting quantiers

Most imp ortant complexity classes b elowPhave had their languages without order

ing or with some partial orderings separated No such separation

had b een proved for the languages FO wo LFP and FOwo PFP In

Abiteb oul and Vianu explained whybyproving

Theorem The fol lowing conditions areequivalent

In the absenceofordering rstorder logic plus the least xedpoint operator

describes al l the properties describable by rstorder logic plus the partial xed

point operator

In the presenceofordering rstorder logic plus the least xedpoint operator

describes al l the properties describable by rstorder logic plus the partial xed

point operator

P PSPACE

Theorem is proved byshowing that if two structures G and H are FOwoLFP

equivalent then they are FO wo PFP equivalentaswell Thus ordering is not

the problem but EhrenfeuchtFrasse games wont help separate P from PSPACE

Descriptive complexityreveals a simple but elegant view of computation Nonethe

less the basic problems of how to compare dierent computational resources remain

mysterious

References

S Abiteb oul and V Vianu Generic Computation And Its Complexity rd

ACM STOC

JCaiMFurer N Immerman An Optimal Lower Bound on the Number of

Variables for Graph Identication Combinatorica

A Chandra and D Harel Structure and Complexity of Relational Queries

st Symp on Foundations of Computer Science Also app eared

in a revised form in JCSS

K Etessami and N Immerman Tree Canonization and Transitive Closure

IEEE Symp Logic In Comput Sci

K Etessami and N Immerman Reachability and the Power of Lo cal Ordering

Eleventh Symp Theoretical Aspects Comp Sci

R Fagin Generalized FirstOrder Sp ectra and PolynomialTime Recognizable

Sets in Complexity of Computation ed R Karp SIAMAMS Proc

R Fagin Monadic generalized sp ectra Zeitschr f math Logik und Grund la

gen d Math

M R Garey and D S Johnson Computers and Intractability Freeman

E Gradel and G McColm On the Power of Deterministic Transitive Closures

Information and Computation

N Immerman Numb er of Quantiers is Better than Number of Tap e Cells

JCSS

N Immerman Upp er and Lower Bounds for First Order Expressibility JCSS

No

N Immerman Relational Queries Computable in Polynomial Time Informa

tion and Control A preliminary version of this pap er app eared

in th ACM STOC Symp

N Immerman Languages That Capture Complexity Classes SIAM J Com

put No A preliminary version of this pap er app eared in

th ACM STOC Symp

N Immerman Nondeterministic Space is Closed Under Complementation

SIAM J Comput No Also app eared in Third Structure

in Complexity Theory Conf

N Immerman Descriptive and Computational Complexity in Computational

Complexity Theory ed J Hartmanis Proc Symp in Applied Math Amer

ican Mathematical So ciety

k

N Immerman DSPACEn VARk Sixth IEEE Structure in Com

plexity Theory Symp

SY Kuro da Classes of Languages and LinearBounded Automata Informa

tion and Control

Yiannis N Moschovakis Elementary Induction on Abstract Structures North

Holland

S Patnaik and N Immerman DynFO A Parallel Dynamic Complexity

Class ACM Symp Principles Database Systems

WJSavitch Relationships Between Nondeterministic and Deterministic Tap e

Complexities J Comput System Sci

R Szelep csenyi The Metho d of Forced Enumeration for Nondeterministic Au

tomata Acta Informatica

M Vardi Complexity of Relational Query Languages th ACM STOC

Symp