The Power of Team Exploration: Two Robots Can Learn Unlabeled Directed Graphs

Michael A. Bender*                          Donna K. Slonim†
Aiken Computation Laboratory                MIT Laboratory for Computer Science
Harvard University                          545 Technology Square
Cambridge, MA 02138                         Cambridge, MA 02139
[email protected]                      [email protected]

September 6, 1995

Abstract

We show that two cooperating robots can learn exactly any strongly-connected directed graph with n indistinguishable nodes in expected time polynomial in n. We introduce a new type of homing sequence for two robots, which helps the robots recognize certain previously-seen nodes. We then present an algorithm in which the robots learn the graph and the homing sequence simultaneously by actively wandering through the graph. Unlike most previous learning results using homing sequences, our algorithm does not require a teacher to provide counterexamples. Furthermore, the algorithm can efficiently use any additional information available that distinguishes nodes. We also present an algorithm in which the robots learn by taking random walks. The rate at which a random walk on a graph converges to the stationary distribution is characterized by the conductance of the graph. Our random-walk algorithm learns in expected time polynomial in n and in the inverse of the conductance and is more efficient than the homing-sequence algorithm for high-conductance graphs.



* Supported by NSF Grant CCR-93-13775.
† Supported by NSF Grant CCR-93-10888 and NIH Grant 1 T32 HG00039-01.

1 Introduction

Consider a robot trying to construct a street map of an unfamiliar city by driving along the city's roads. Since many streets are one-way, the robot may be unable to retrace its steps. However, it can learn by using street signs to distinguish intersections. Now suppose that it is nighttime and that there are no street signs. The task becomes significantly more challenging.

In this paper we present a probabilistic polynomial-time algorithm to solve an abstraction of the above problem by using two cooperating learners. Instead of learning a city, we learn a strongly-connected directed graph G with n nodes. Every node has d outgoing edges labeled from 0 to d-1. Nodes in the graph are indistinguishable, so a robot cannot recognize if it is placed on a node that it has previously seen. Moreover, since the graph is directed, a robot is unable to retrace its steps while exploring.
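For concreteness, the following minimal Python sketch (ours, not the paper's) models this abstraction: an anonymous strongly-connected digraph whose out-edges at every node are labeled 0 through d-1, and two robots whose only observation is whether they currently occupy the same node. The class and method names are illustrative assumptions.

```python
class Environment:
    """Anonymous directed graph: succ[v][i] is the node reached from v by the edge labeled i."""
    def __init__(self, succ):
        self.succ = succ                      # list of length-d lists; node ids are hidden from the learner
        self.pos = {"Lewis": 0, "Clark": 0}   # both robots start together

    def move(self, robot, label):
        """The named robot crosses the out-edge with the given label."""
        self.pos[robot] = self.succ[self.pos[robot]][label]

    def together(self):
        """The only observation the learners get: are the two robots at the same node?"""
        return self.pos["Lewis"] == self.pos["Clark"]

# A strongly-connected 4-node example with outdegree d = 2 (labels 0 and 1).
env = Environment([[1, 0], [2, 3], [3, 0], [0, 2]])
env.move("Lewis", 0)
print(env.together())   # False: Lewis moved along label 0, Clark stayed put
```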

For this model, one might imagine a straightforward learning algorithm with a running time polynomial in a specific property of the graph's structure such as cover time or mixing time. Any such algorithm could require an exponential number of steps, however, since the cover time and mixing time of directed graphs can be exponential in the number of nodes. In this paper, we present a probabilistic algorithm for two robots to learn any strongly-connected directed graph in O(d^2 n^5) steps with high probability.

The two robots in our model can recognize when they are at the same node and can communicate freely by radio. Radio communication is used only to synchronize actions. In fact, if we assume that the two robots move synchronously and share a polynomial-length random string, then no communication is necessary. Thus with only minor modifications, our algorithms may be used in a distributed setting.

Our main algorithm runs without prior knowledge of the number of nodes in the graph, n, in time polynomial in n. We show that no probabilistic polynomial-time algorithm for a single robot with a constant number of pebbles can learn all unlabeled directed graphs when n is unknown. Thus, our algorithms demonstrate that two robots are strictly more powerful than one.

1.1 Related Work

Previous results showing the power of team learning are plentiful, particularly in the field of inductive inference (see Smith [Smi94] for an excellent survey). Several team learning papers explore the problems of combining the abilities of a number of different learners. Cesa-Bianchi et al. [CBF+93] consider the task of learning a probabilistic binary sequence given the predictions of a set of experts on the same sequence. They show how to combine the prediction strategies of several experts to predict nearly as well as the best of the experts. In a related paper, Kearns and Seung [KS93] explore the statistical problems of combining several independent hypotheses to learn a target concept from a known, restricted concept class. In their model, each hypothesis is learned from a different, independently-drawn set of random examples, so the learner can combine the results to perform significantly better than any of the hypotheses alone.

There are also many results on learning unknown graphs, but most previous work has concentrated on learning undirected graphs or graphs with distinguishable nodes. For example, Deng and Papadimitriou consider the problem of learning strongly-connected directed graphs with labeled nodes, so that the learner can recognize previously-seen nodes. They provide a learning algorithm whose competitive ratio versus the optimal time to traverse all edges in the graph is exponential in the deficiency of the graph [DP90, Bet92]. Betke, Rivest, and Singh introduce the notion of piecemeal learning of undirected graphs with labeled nodes. In piecemeal learning, the learner must return to a fixed starting point from time to time during the learning process. Betke, Rivest, and Singh provide linear algorithms for learning grid graphs with rectangular obstacles [BRS93], and with Awerbuch [ABRS95] extend this work to show nearly-linear algorithms for general graphs.

Rivest and Schapire [RS87, RS93] explore the problem of learning deterministic finite automata whose nodes are not distinguishable except by the observed output. We rely heavily on their results in this paper. Their work has been extended by Freund et al. [FK+93] and by Dean et al. [DA+92]. Freund et al. analyze the problem of learning finite automata with average-case labelings by the observed output on a random string, while Dean et al. explore the problem of learning DFAs with a robot whose observations of the environment are not always reliable. Ron and Rubinfeld [RR95] present algorithms for learning "fallible" DFAs, in which the data is subject to persistent random errors. Recently, Ron and Rubinfeld [RR95b] have shown that a teacher is unnecessary for learning finite automata with small cover time.

In our model a single robot is powerless because it is completely unable to distinguish one node from any other. However, when equipped with a number of pebbles that can be used to mark nodes, the single robot's plight improves. Rabin first proposed the idea of dropping pebbles to mark nodes [Rab67]. This suggestion led to a body of work exploring the searching capabilities of a finite automaton supplied with pebbles. Blum and Sakoda [BS77] consider the question of whether a finite set of finite automata can search a 2- or 3-dimensional obstructed grid. They prove that a single automaton with just four pebbles can completely search any 2-dimensional finite maze, and that a single automaton with seven pebbles can completely search any 2-dimensional infinite maze. They also prove, however, that no collection of finite automata can search every 3-dimensional maze. Blum and Kozen [BK78] improve this result to show that a single automaton with 2 pebbles can search a finite, 2-dimensional maze. Their results imply that mazes are strictly easier to search than planar graphs, since they also show that no single automaton with pebbles can search all planar graphs. Savitch [Sav73] introduces the notion of a maze-recognizing automaton (MRA), which is a DFA with a finite number of distinguishable pebbles. The mazes in Savitch's paper are n-node 2-regular graphs, and the MRAs have the added ability to jump to the node with the next higher or lower number in some ordering. Savitch shows that maze-recognizing automata and log n space-bounded Turing machines are equivalent for the problem of recognizing threadable mazes (i.e., mazes in which there is a path between a given pair of nodes).

Most of these papers use pebbles to model memory constraints. For example, suppose that the nodes in a graph are labeled with (log n)-bit names and that a finite automaton with k log n bits of memory is used to search the graph. This situation is modeled by a single robot with k distinguishable pebbles. A robot dropping a pebble at a node corresponds to a finite automaton storing the name of that node. In our paper, by contrast, we investigate time rather than space constraints. Since memory is now relatively cheap but time is often critical, it makes sense to ask whether a robot with any reasonable amount of memory can use a constant number of pebbles to learn graphs in polynomial time.

Cook and Rackoff generalized the idea of pebbles to jumping automata for graphs (JAGs) [CR80]. A jumping automaton is equipped with pebbles that can be dropped to mark nodes and that can "jump" to the locations of other pebbles. Thus, this model is similar to our two-robot model in that the second robot may wait at a node for a while to mark it and then catch up to the other robot later. However, the JAG model is somewhat broader than the two-robot model. Cook and Rackoff show upper and lower bounds of log n and log n / log log n on the amount of space required to determine whether there is a directed path between two designated nodes in any n-node graph. JAGs have been used primarily to prove space efficiency for st-connectivity algorithms, and they have recently resurfaced as a tool for analyzing time and space tradeoffs for graph traversal and connectivity problems (e.g. [BB+90, Poo93, Edm93]).

Universal traversal sequences have been used to provide upper and lower bounds for the exploration of undirected graphs. Certainly, a universal traversal sequence for the class of directed graphs could be used to learn individual graphs. However, for arbitrary directed graphs with n nodes, a universal traversal sequence must have size exponential in n. Thus, such sequences will not provide efficient solutions to our problem.

1.2 Strategy of the Learning Algorithm

The power behind the two-robot model lies in the robots' abilities to recognize each other and to move independently. Nonetheless, it is not obvious how to harness this power. If the robots separate in unknown territory, they could search for each other for an amount of time exponential in the size of the graph. Therefore, in any successful strategy for our model the two robots must always know how to find each other. One strategy that satisfies this requirement has both robots following the same path whenever they are in unmapped territory. They may travel at different speeds, however, with one robot scouting ahead and the other lagging behind. We call this a lead-lag strategy. In a lead-lag strategy the lagging robot must repeatedly make a difficult choice. The robot can wait at a particular node, thus marking it, but the leading robot may not find this marked node again in polynomial time. Alternatively, the lagging robot can abandon its current node to catch up with the leader, but then it may not know how to return to that node. In spite of these difficulties, our algorithms successfully employ a lead-lag strategy.

Our work also builds on techniques of Rivest and Schapire [RS93]. They present an algorithm for a single robot to learn minimal deterministic finite automata. With the help of an equivalence oracle, their algorithm learns a homing sequence, which it uses in place of a reset function. It then runs several copies of Angluin's algorithm [Ang87] for learning DFAs given a reset. Angluin has shown that any algorithm for actively learning DFAs requires an equivalence oracle [Ang81].

In this paper, we introduce a new type of homing sequence for two robots. Because of the strength of the homing sequence, our algorithm does not require an equivalence oracle. For any graph, the expected running time of our algorithm is O(d^2 n^5). In practice, our algorithm can use additional information such as indegree, outdegree, or color of nodes to find better homing sequences and to run faster.

Note that throughout the paper, the analyses of the algorithms account for only the number of steps that the robots take across edges in the graph. Additional calculations performed between moves are not considered, so long as they are known to take time polynomial in n. In practice, such calculations would not be a noticeable factor in the running time of our algorithms.

Two robots can learn specific classes of directed graphs more quickly, such as the class of graphs with high conductance. Conductance, a measure of the expansion properties of a graph, was introduced by Sinclair and Jerrum [SJ89]. The class of directed graphs with high conductance includes graphs with exponentially-large cover time. We present a randomized algorithm that learns graphs with conductance greater than n^{-1/4} in O(dn^2 log n) steps with high probability.

2 Preliminaries

Let G = (V, E) represent the unknown graph, where G has n nodes, each with outdegree d. An edge from node u to node v with label i is denoted ⟨u, i, v⟩. We say that an algorithm learns graph G if it outputs a graph isomorphic to G. Our algorithms maintain a graph map which represents the subgraph of G learned so far. Included in map is an implicit start node u_0. It is worth emphasizing the difference between the target graph G and the graph map that the learner constructs. The graph map is meant to be a map of the underlying environment, G. However, since the robots do not always know their exact location in G, in some cases map may contain errors and therefore may not be isomorphic to any subgraph of G. Much of the notation in this section is needed to specify clearly whether we are referring to a robot's location in the graph G or to its putative location in map.

A node u in map is called unfinished if it has any unexplored outgoing edges. Node u is map-reachable from node v if there is a path from v to u containing only edges in map. For robot k, the node in map corresponding to k's location in G (if map is correct) is denoted Loc_M(k). Robot k's location in G is denoted Loc_G(k).

Let f be an automorphism on the nodes of G such that

    for all a, b in G:  ⟨a, i, b⟩ ∈ G  ⟺  ⟨f(a), i, f(b)⟩ ∈ G.

We say nodes c and d are equivalent (written c ≡ d) iff there exists such an f where f(c) = d.

We now present notation to describe the movements of k robots in a graph. An action A_i of the ith robot is either a label of an outgoing edge to explore, or the symbol r for "rest." A k-robot sequence of actions is a sequence of steps denoting the actions of the k robots; each step is a k-tuple ⟨A_0, ..., A_{k-1}⟩. For sequences s and t of actions, s · t denotes the sequence of actions obtained by concatenating s and t.

A path is a sequence of edge labels. Let |path| represent the length of path. A robot follows a path by traversing the edges in the path in order, beginning at a particular start node in map. The node in map reached by starting at u_0 and following path is denoted final_M(path, u_0). Let s be a two-robot sequence of actions such that if both robots start together at any node in any graph and execute s, they follow exactly the same path, although perhaps at different speeds. We call such a sequence a lead-lag sequence. Note that if two robots start together and execute a lead-lag sequence, they end together. The node in G reached if both robots start at node a in G and follow lead-lag sequence s is denoted final_G(s, a).

For convenience, we name our robots Lewis and Clark. Whenever Lewis and Clark execute a lead-lag sequence of actions, Lewis leads while Clark lags behind.
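As a companion to this notation, here is a small Python sketch (our own illustration, not the paper's) of one way a learner might store map: a partial map with integer node ids where node 0 plays the role of u_0, a test for unfinished nodes, final_M, and a BFS route used to move between map nodes along learned edges.

```python
from collections import deque

class Map:
    """Partial map of G: edges[u][lab] = v once edge <u, lab, v> has been learned."""
    def __init__(self, d):
        self.d = d
        self.edges = {0: {}}                 # node 0 plays the role of the start node u_0

    def add_node(self):
        k = len(self.edges)
        self.edges[k] = {}
        return k

    def unfinished(self):
        """Nodes with at least one unexplored outgoing edge label."""
        return [u for u, out in self.edges.items() if len(out) < self.d]

    def final(self, path, u0=0):
        """final_M(path, u0): the map node reached by following a label sequence from u0."""
        u = u0
        for lab in path:
            u = self.edges[u][lab]
        return u

    def route(self, src, dst):
        """Labels of some path from src to dst using only learned edges (None if dst is not map-reachable)."""
        queue, seen = deque([(src, [])]), {src}
        while queue:
            u, labs = queue.popleft()
            if u == dst:
                return labs
            for lab, v in self.edges[u].items():
                if v not in seen:
                    seen.add(v)
                    queue.append((v, labs + [lab]))
        return None
```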

3 Using a Reset to Learn

Learning a directed graph with indistinguishable nodes is difficult because once both robots have left a known portion of the graph, they do not know how to return. This problem would vanish if there were a reset function that could transport both robots to a particular start node u_0. We describe an algorithm for two robots to learn directed graphs given a reset. Having a reset is not a realistic model, but this algorithm forms the core of later algorithms, which learn without a reset.

Algorithm Learn-with-Reset maintains the invariant that if a robot starts at u_0, there is a directed path it can follow that visits every node in map at least once. To learn a new edge (one not yet in map) using algorithm Learn-with-Reset, Lewis crosses the edge and then Clark tours the entire known portion of the map. If they encounter each other, Lewis's position is identified; otherwise Lewis is at a new node. The depth-first strategy employed by Learn-Edge is essential in later algorithms. In Learn-with-Reset, as in all the procedures in this paper, variables are passed by reference and are modified destructively.

Lemma 1 The variable path in Learn-with-Reset denotes a tour of length ≤ dn^2 that starts at u_0 and traverses all edges in map.

Learn-with-Reset:
 1  map := ({u_0}, ∅)           { map is the graph consisting of node u_0 and no edges }
 2  path := empty path          { path is the null sequence of edge labels }
 3  k := 1                      { k counts the number of nodes in map }
 4  while there are unfinished nodes in map
 5      do Learn-Edge(map, path, k)
 6         Reset
 7  return map

Learn-Edge(map, path, k):       { path = tour through all edges in map }
 1  Lewis follows path to final_M(path, u_0)
 2  u_i := some unfinished node in map map-reachable from Loc_M(Lewis)
 3  Lewis moves to node u_i; append the path taken to path
 4  pick an unexplored edge l out of node u_i
 5  Lewis moves along edge l; append edge l to path    { Lewis crosses a new edge }
 6  Clark follows path to final_M(path, u_0)           { Clark looks for Lewis }
 7  if ∃j such that Clark encounters Lewis at node u_j
 8     then add edge ⟨u_i, l, u_j⟩ to map
 9     else add new node u_k and edge ⟨u_i, l, u_k⟩ to map
10          k := k + 1

Proof: Every time Lewis crosses an edge, that edge is appended to path. Since no edge is added to map until Lewis has crossed it, path must traverse all edges in map. In each call to Learn-Edge, at most n edges are added to path. The body of the while loop is executed dn times, so |path| ≤ dn^2. □

Lemma 2 Map is always a subgraph of G.

Proof: Initially map contains a single node u_0 and no edges. Assume inductively that map is a subgraph of G after the cth call to Learn-Edge, when map has c edges. To learn the next edge, the algorithm chooses a node u_i in map and explores a new edge e = ⟨u_i, l, v⟩. By Lemma 1 and the inductive hypothesis, if Clark encounters Lewis at u_j then v is identified as u_j. Otherwise v is recognized to be a new node and named u_k. Therefore the updated map is a subgraph of G. □

Lemma 3 If map contains any unfinished nodes, then there is always some unfinished node in map map-reachable from final_M(path, u_0).

Proof: Suppose this assumption were false. Then there is some unfinished node in map, but all nodes of map in the strongly-connected component of final_M(path, u_0) are finished. Thus by Lemma 2, there are no additional edges of G leaving that component, so graph G is not strongly connected. □

Theorem 4 After O(d^2 n^3) moves and dn calls to Reset, Learn-with-Reset halts and outputs a graph isomorphic to G.

Proof: The correctness of the output follows from Lemmas 1-3. For each call to Learn-Edge, each robot takes length(path) ≤ dn^2 steps. The algorithm Learn-Edge is executed at most dn times, so the algorithm halts within O(d^2 n^3) steps. □
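Combining the two sketches above, here is a compact Python rendering of Learn-with-Reset and Learn-Edge. It assumes the Environment and Map classes sketched earlier and a reset callback that teleports both robots back to u_0, exactly the unrealistic primitive this section grants; it is an illustration of the pseudocode, not the paper's implementation.

```python
def learn_with_reset(env, d, reset):
    """Learn G given a reset: learn one new edge per iteration, then reset both robots to u_0."""
    m, path = Map(d), []                     # path is a tour from u_0 crossing every learned edge
    while m.unfinished():
        learn_edge(env, m, path)
        reset(env)
    return m

def learn_edge(env, m, path):
    """One call of Learn-Edge: Lewis crosses one unexplored edge, then Clark tours map to identify it."""
    for lab in path:                         # line 1: Lewis retraces the tour
        env.move("Lewis", lab)
    here = m.final(path)
    for u_i in m.unfinished():               # line 2: an unfinished node map-reachable from Lewis
        detour = m.route(here, u_i)
        if detour is not None:
            break
    for lab in detour:                       # line 3: Lewis walks to u_i inside map
        env.move("Lewis", lab)
    path.extend(detour)
    new_lab = next(l for l in range(m.d) if l not in m.edges[u_i])
    env.move("Lewis", new_lab)               # lines 4-5: Lewis crosses a new edge
    path.append(new_lab)
    met = 0 if env.together() else None      # Lewis may have landed back on u_0 itself
    v = 0
    for lab in path[:-1]:                    # line 6: Clark retraces the tour (these edges are all in map)
        env.move("Clark", lab)
        v = m.edges[v][lab]
        if met is None and env.together():
            met = v                          # line 7: Clark sees Lewis at map node u_j = v
    if met is None:                          # line 9: Lewis reached a brand-new node
        met = m.add_node()
    m.edges[u_i][new_lab] = met              # lines 8-10: record the learned edge

# Usage sketch: a reset that teleports both robots to node 0 (using the hidden state, for illustration only).
# m = learn_with_reset(env, 2, lambda e: e.pos.update({"Lewis": 0, "Clark": 0}))
```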

4 Homing Sequences

In practice, robots learning a graph do not have access to a reset function. In this section we suggest an alternative technique: we introduce a new type of homing sequence for two robots.

Intuitively, a homing sequence is a sequence of actions whose observed output uniquely determines the final node reached in G. Rivest and Schapire [RS93] show how a single robot with a teacher can use homing sequences to learn strongly-connected minimal DFAs. The output at each node indicates whether that node is an accepting or rejecting state of the automaton. If the target DFA is not minimal, their algorithm learns the minimal encoding of the DFA. In other words, their algorithm learns the function that the graph computes rather than the structure of the graph.

In unlabeled graphs the nodes do not produce output. However, two robots can generate output indicating when they meet.

Definitions: Each step of a two-robot sequence of actions produces an output symbol: T if the robots are together and S if they are separate. An output sequence is a string in {T, S}* denoting the observed output of a sequence of actions. Let s be a lead-lag sequence of actions and let a be a node in G. Then output(s, a) denotes the output produced by executing the sequence s, given that both robots start at a. A lead-lag sequence s of actions is a two-robot homing sequence iff

    for all nodes u, v in G:  output(s, u) = output(s, v)  ⟹  final_G(s, u) ≡ final_G(s, v).

Because the output of a sequence depends on the positions of both robots, it provides information about the underlying structure of the graph. Figure 1 illustrates the definition of a two-robot homing sequence. This new type of homing sequence is powerful. Unlike most previous learning results using homing sequences, our algorithms do not require a teacher to provide counterexamples.

In fact, two robots on a graph define a DFA whose states are pairs of nodes in G and whose edges correspond to pairs of actions. Since the automata defined in this way form a restricted class of DFAs, our results are not inconsistent with Angluin's work [Ang81] showing that a teacher is necessary for learning general DFAs.
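To make the definition concrete, the following Python sketch (ours) simulates a lead-lag sequence on a known graph and checks the homing property by brute force. A two-robot sequence is represented as a list of (Lewis-action, Clark-action) pairs, with None standing for the rest action r; for simplicity the check compares final nodes for equality, which is sufficient but slightly stronger than equivalence up to automorphism. The example graph at the end is hypothetical, not the graph of Figure 1.

```python
def run(succ, seq, start):
    """Simulate a two-robot sequence from a common start node; return (output string, Lewis's final node)."""
    lewis = clark = start
    out = []
    for a_lewis, a_clark in seq:
        if a_lewis is not None:
            lewis = succ[lewis][a_lewis]
        if a_clark is not None:
            clark = succ[clark][a_clark]
        out.append('T' if lewis == clark else 'S')   # T = together, S = separate
    return ''.join(out), lewis

def is_homing(succ, seq):
    """True if equal outputs force equal final nodes, over every possible start node."""
    end_for_output = {}
    for u in range(len(succ)):
        out, end = run(succ, seq, u)
        if end_for_output.setdefault(out, end) != end:
            return False
    return True

# h has the same shape as the homing sequence of Figure 1: Lewis acts 0, r, 1, r while Clark acts r, 0, r, 1.
h = [(0, None), (None, 0), (1, None), (None, 1)]
succ = [[1, 1], [2, 3], [3, 0], [0, 2]]        # a hypothetical 4-node, outdegree-2 graph
print(is_homing(succ, h))
```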

Theorem 5 Every strongly-connected directed graph has a two-robot homing sequence.

Proof: The following algorithm (based on that of Kohavi [Koh78, RS93]) constructs a homing sequence: Initially, let h be empty. As long as there are two nodes u and v in G such that output(h, u) = output(h, v) but final(h, u) ≢ final(h, v), let x be a lead-lag sequence whose output distinguishes final(h, u) from final(h, v). Since final(h, u) ≢ final(h, v) and G is strongly connected, such an x always exists. Append x to h.

Each time a sequence is appended to h, the number of different outputs of h increases by at least 1. Since G has n nodes, there are at most n possible output sequences. Therefore, after n - 1 iterations, h is a homing sequence. □

In Section 5 we show that it is possible to find a counterexample x efficiently. Given a strongly-connected graph G and a node a in G, a pair of robots can verify whether they are together at a node equivalent to a on some graph isomorphic to G. We describe a verification algorithm Verify(a, G) in Section 5. The sequence of actions returned by a call of Verify(u, G) is always a suitable counterexample x. Using the bound from Corollary 8, we claim that this algorithm produces a homing sequence of length O(n^4) for all graphs. Note that shorter homing sequences exist; the homing sequence produced by algorithm Learn-Graph in Section 5 has length O(dn^3).

4.1 Using a Homing Sequence to Learn

Given a homing sequence h, an algorithm can learn G by maintaining several running copies of Learn-with-Reset. Instead of a single start node, there are as many as n possible start nodes, each corresponding to a different output sequence of h. Note that many distinct output sequences may be associated with the same final node in G.

The new algorithm Learn-with-HS maintains several copies of map and path, one for each output sequence of h. Thus, graph map_c denotes the copy of the map associated with output sequence c.

[Figure 1 drawing: a four-node graph on nodes a, b, c, d with edge labels 0 and 1; the drawing itself is not reproduced here.]

    output = T if Lewis and Clark are together at a node,
             S if Lewis and Clark are at separate nodes.

    homing sequence h = (Lewis: 0 r 1 r ; Clark: r 0 r 1)        sequence s = (Lewis: 0 r ; Clark: r 0)

    start node   output   end node        start node   output   end node
        a         TTST        b               a           TT        a
        b         STST        b               b           ST        a
        c         STST        b               c           ST        a
        d         STTT        c               d           ST        c

Figure 1: Illustration of a two-robot homing sequence and a lead-lag sequence. Note that both h and s are lead-lag sequences. However, sequence h is a two-robot homing sequence, because for each output sequence there is a unique end node. Note that the converse is not true. Sequence s is not a two-robot homing sequence, because the robots may end at nodes a or c and yet see the same output sequence ST.

Initially, Lewis and Clark are at the same node. Whenever algorithm Learn-with-Reset would use a reset, Learn-with-HS executes the homing sequence h. If the output of h is c, the algorithm learns a new edge in map_c as if it had been reset to u_0 in map_c (see Figure 2). After each execution of h, the algorithm learns a new edge in some map_c. Since there are at most n copies, each with dn edges to learn, eventually one map will be completed. Recall that a homing sequence is a lead-lag sequence. Therefore, at the beginning and end of every homing sequence the two robots are together.

Theorem 6 If Learn-with-HS is called with a homing sequence h as input, it halts within O(d^2 n^4 + dn^2 |h|) steps and outputs a graph isomorphic to G.

Learn-with-HS(h):
 1  done := FALSE
 2  while not done
 3      do execute h; c := the output sequence produced         { instead of a reset }
 4         if map_c is undefined
 5            then map_c := ({u_0}, ∅)                           { map_c = graph consisting of node u_0 and no edges }
 6                 path_c := empty path                          { path_c is the null sequence of edge labels }
 7                 k_c := 1                                      { k_c counts the number of nodes in map_c }
 8         Learn-Edge(map_c, path_c, k_c)
 9         if map_c has no unfinished nodes
10            then done := TRUE
11  return map_c

Proof: The algorithm Learn-with-HS maintains at most n running versions of Learn-with-Reset, one for each output of the homing sequence. In particular, whenever the two robots execute a homing sequence and obtain an output c, they have identified their position as the start node u_0 in map_c, and can learn one new edge in map_c before executing another homing sequence.

Eventually, one of the versions halts and outputs a complete map_c. Therefore, the correctness of Learn-with-HS follows directly from Theorem 4 and the definition of a two-robot homing sequence.

Let r = O(d^2 n^3) be the number of steps taken by Learn-with-Reset. Since there are at most n start nodes, Learn-with-HS takes at most nr + dn^2 |h| steps. □
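The driver below is a Python sketch of Learn-with-HS in the same spirit as the earlier sketches (Environment, Map, learn_edge); it assumes h is given as a list of (Lewis-action, Clark-action) pairs and really is a two-robot homing sequence. It is illustrative only, not the paper's implementation.

```python
def execute(env, seq):
    """Run a two-robot sequence on the environment and return its observed T/S output string."""
    out = []
    for a_lewis, a_clark in seq:
        if a_lewis is not None:
            env.move("Lewis", a_lewis)
        if a_clark is not None:
            env.move("Clark", a_clark)
        out.append('T' if env.together() else 'S')
    return ''.join(out)

def learn_with_hs(env, d, h):
    """Learn-with-HS sketch: one copy of Learn-with-Reset's state per output sequence of h."""
    copies = {}                                   # output sequence c  ->  (map_c, path_c)
    while True:
        c = execute(env, h)                       # executing h takes the place of a reset
        if c not in copies:
            copies[c] = (Map(d), [])
        m, path = copies[c]
        learn_edge(env, m, path)                  # learn one new edge in map_c
        if not m.unfinished():
            return m
```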

5 Learning a Homing Sequence

Unlike a reset function, a two-robot homing sequence can be learned. The algorithm Learn-Graph maintains a candidate homing sequence h and improves h as it learns G.

Definition: Candidate homing sequence h is called a bad homing sequence if there exist nodes u, v, u ≠ v, such that output(h, u) = output(h, v), but final_G(h, u) ≢ final_G(h, v).

Definition: Let a be a node in G. We say that map_c with start node u_0 is a good representation of ⟨a, G⟩ iff there exists an isomorphism f from the nodes in map_c = (V^c, E^c) to the nodes in a subgraph G' = (V', E') of G, such that f(u_0) = a, and

    for all u_i, u_j ∈ V^c:  ⟨u_i, ℓ, u_j⟩ ∈ E^c  ⟹  ⟨f(u_i), ℓ, f(u_j)⟩ ∈ E'.

In algorithms Learn-with-Reset and Learn-with-HS, the graphs map and map_c are always good representations of G. In Learn-Graph, if the candidate homing sequence h is bad, a particular map_c may not be a good representation of G. However, the algorithm can test for such maps. Whenever a map_c is shown to be in error, h is improved and all maps are discarded. By the proof of Theorem 5, we know that a candidate homing sequence must be improved at most n - 1 times. In Section 5.1 we explain how to use adaptive homing sequences to discard only one map per improvement.

We now define a test that with probability at least 1/n detects an error in map_c if one exists.

    output of homing sequence   starting node of map   map
    TTST                        b                       (partial map drawing, not reproduced)
    STST                        b                       (partial map drawing, not reproduced)
    STTT                        c                       (partial map drawing, not reproduced)

Figure 2: A possible "snapshot" of the learners' knowledge during an execution of Learn-with-HS. The robots are learning the graph G from Figure 1 using the two-robot homing sequence h from Figure 1. Node names in maps are not known to the learner, but are added for clarity. The following example demonstrates how the robots learn a new edge using Learn-with-HS. Suppose that the robots execute h and see output TTST. Then the robots are together at node b. Lewis follows path (1, 0) to unfinished node c and then crosses the edge labeled 1. Now Clark follows path (1, 0, 1). Since Clark sees Lewis after 2 steps, the dotted edge is added to map_TTST. Next, the robots execute h again and see output STST. Thus, they go on to learn some edge in map_STST.

Definition: Let path_c be a path such that a robot starting at u_0 and following path_c traverses every edge in map_c = (V^c, E^c). Let u_0, ..., u_m be the nodes in V^c numbered in order of their first appearance in path_c. If both robots are at u_0 then test_c(u_i) denotes the following lead-lag sequence of actions: (1) Lewis follows path_c to the first occurrence of u_i; (2) Clark follows path_c to the first occurrence of u_i; (3) Lewis follows path_c to the end; (4) Clark follows path_c to the end.

Definition: Given map_c and any lead-lag sequence t of actions, define expected(t, map_c) to be the expected output if map_c is correct and if both robots start at node u_0 and execute sequence t. We abbreviate expected(test_c(u_i), map_c) by expected(test_c(u_i)).

Lemma 7 Suppose Lewis and Clark are both at some node a in G. Let path_c be a path such that a robot starting at u_0 and following path_c traverses every edge in map_c. Then map_c is a good representation of ⟨a, G⟩ iff for all u_i ∈ V^c, output(test_c(u_i)) = expected(test_c(u_i)).

Proof:

(⟹): By definition of good representation and expected(test_c(u_i)).

(⟸): Suppose that all tests produce the expected output. We define a function f as follows: Let f(u_0) = a. Let p(u_i) be the prefix of path_c up to the first occurrence of u_i. Define f(u_i) to be the node in G that a robot reaches if it starts at a and follows p(u_i).

Learn-Graph:
 1  done := FALSE
 2  h := ()    { the empty sequence }
 3  while not done
 4      do execute h; c := the output sequence produced               { instead of a reset }
 5         if map_c is undefined
 6            then map_c := ({u_0}, ∅)                                 { map_c is the graph consisting of node u_0 and no edges }
 7                 path_c := empty path                                { path_c is the null sequence of edge labels }
 8                 k_c := 1                                            { k_c counts the number of nodes in map_c }
 9         if map_c has no unfinished node map-reachable from final_M(path_c)
10            then Lewis and Clark move to final_M(path_c)
11                 comp := maximal strongly-connected component in map_c containing final_M(path_c)
12                 h-improve := Verify(final_M(path_c), comp)
13                 if h-improve = ()
14                    then done := TRUE                                { map_c is complete }
15                    else                                             { h-improve ≠ (): error detected }
16                         append h-improve to end of h                { improve homing sequence ... }
17                         discard all maps and paths                  { ... and start learning maps from scratch }
18            else v := value of a fair 0/1 coin flip                  { learn edges or test for errors? }
19                 if v = 0                                            { test for errors }
20                    then u_i := a random node in map_c               { randomly pick a node to test }
21                         h-improve := Test(map_c, path_c, i)
22                         if h-improve ≠ ()                           { error detected }
23                            then append h-improve to end of h        { improve homing sequence ... }
24                                 discard all maps and paths          { ... start learning maps from scratch }
25                    else Learn-Edge(map_c, path_c, k_c)
26  return map_c

Test(map_c, path_c, i):    { u_0, u_1, ..., u_{k_c} = the nodes in map_c indexed by first appearance in path_c }
 1  h-improve := the following sequence of actions:
 2      Lewis follows path_c to the first occurrence of u_i in path_c
 3      Clark follows path_c to the first occurrence of u_i in path_c
 4      Lewis follows path_c to the end
 5      Clark follows path_c to the end
 6  if output(h-improve) ≠ expected-output(h-improve)                  { if error detected }
 7     then return h-improve                                           { return test_c(u_i) }
 8     else return ()                                                  { return empty sequence }

Verify(v_0, map):    { v_0, v_1, ..., v_k are the nodes in map ordered by first appearance in path }
 1  path := a path such that a robot starting at v_0 in map and following path visits all nodes in map and returns to v_0
 2  for each i, 0 ≤ i ≤ k
 3      do h-improve := Test(map, path, i)
 4         if h-improve ≠ ()
 5            then return h-improve
 6  return ()

Let G' = (V', E') be the image of f(map_c) on (V, E).

We first show that f is an isomorphism from V^c to V'. By definition of G', f must be surjective. To see that f is injective, assume the contrary. Then there exist two nodes u_i, u_j ∈ V^c such that i ≠ j but f(u_i) = f(u_j). But then output(test_c(u_i)) ≠ expected(test_c(u_i)), which contradicts our assumption that all tests succeed. Next, we show that ⟨u_i, ℓ, u_j⟩ ∈ E^c ⟺ ⟨f(u_i), ℓ, f(u_j)⟩ ∈ E', proving that map_c is a good representation of ⟨a, G⟩.

(⟸): By definition of G', the image of map_c.

(⟹): Inductively assume that ⟨u_i, ℓ, u_j⟩ ∈ E^c ⟹ ⟨f(u_i), ℓ, f(u_j)⟩ ∈ E' for the first m edges in path_c, and suppose that this prefix of the path visits only nodes u_0, ..., u_i. Now consider the (m+1)st edge e = ⟨a, ℓ, b⟩. There are two possibilities. In one case, edge e leads to some new node u_{i+1}. Then by definition f(u_{i+1}) is b's image in G, so ⟨f(a), ℓ, f(b)⟩ ∈ G'. Otherwise e leads to some previously-seen node u_{i-k}. Suppose that f(u_{i-k}) is not the node reached in G by starting at u_0 and following the first m+1 edges in path_c. Then output(test_c(u_{i-k})) ≠ expected(test_c(u_{i-k})), so test_c(u_{i-k}) fails, and we arrive at a contradiction. Therefore f(b) = f(u_{i-k}) and ⟨f(a), ℓ, f(b)⟩ ∈ G'. □

Corollary 8 Suppose Lewis and Clark are together at u_0 in map_c. Let map_c be strongly connected and have n nodes, u_0, ..., u_{n-1}. Then the two robots can verify whether map_c is a good representation of ⟨Loc_G(Lewis), G⟩ in O(n^3) steps.

Proof: Since map_c is strongly connected, there exists a path path_c with the following property: a robot starting at u_0 and following path_c visits all nodes in map_c and returns to u_0. Index the remaining nodes in order of their first appearance in path_c. The two robots verify whether, for all u_i in order, output(test_c(u_i)) = expected(test_c(u_i)). Note that Lewis and Clark are together at u_0 after each test. By Lemma 7, this procedure verifies map_c. Since path_c has length O(n^2), each test has length O(n^2), so verification requires O(n^3) steps. □
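The following Python sketch (ours, in the style of the earlier sketches) runs test_c(u_i) and compares the observed output with the output expected if map_c were correct; it assumes both robots are together at the node playing the role of u_0, as they are right after executing the candidate homing sequence. The returned schedule is the improving sequence that Learn-Graph would append to h.

```python
def test_node(env, m, path, i):
    """Run test(u_i): Lewis to the first occurrence of u_i, Clark to the first occurrence of u_i,
    Lewis to the end of the tour, Clark to the end. Return the sequence if its observed output
    differs from the expected output, else None."""
    nodes, u = [0], 0
    for lab in path:                              # map nodes visited along the tour, starting at u_0
        u = m.edges[u][lab]
        nodes.append(u)
    first = nodes.index(i)                        # position of the first occurrence of u_i
    schedule = ([(lab, None) for lab in path[:first]] +
                [(None, lab) for lab in path[:first]] +
                [(lab, None) for lab in path[first:]] +
                [(None, lab) for lab in path[first:]])
    lew = clk = 0                                 # the robots' positions in map, if map is correct
    expected, observed = [], []
    for a_lewis, a_clark in schedule:
        if a_lewis is not None:
            lew = m.edges[lew][a_lewis]
            env.move("Lewis", a_lewis)
        if a_clark is not None:
            clk = m.edges[clk][a_clark]
            env.move("Clark", a_clark)
        expected.append('T' if lew == clk else 'S')
        observed.append('T' if env.together() else 'S')
    return schedule if observed != expected else None
```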

In Learn-Graph, after the robots execute a homing sequence, they randomly decide either to learn a new edge or to test a random node in map_c. The following lemma shows that a test that failed can be used to improve the homing sequence.

Lemma 9 Let h be a candidate homing sequence in Learn-Graph, and let u_k be a node such that output(test_c(u_k)) ≠ expected(test_c(u_k)). Then there are two nodes a, b in G that h does not distinguish but that h · test_c(u_k) does.

Proof: Let a be a node in G such that when both robots start at a, output(test_c(u_k)) ≠ expected(test_c(u_k)). Suppose that at step i in test_c(u_k), the expected output is T (respectively S), but the actual output is S (resp. T). Each edge in path_c and map_c was learned using Learn-Edge. If map_c indicates that the ith node in path_c is u_k, there must be a start node b in G where u_k really is the ith node in path_c. Since output(test_c(u_k)) ≠ expected(test_c(u_k)), the sequence h · test_c(u_k) distinguishes a from b. □

The algorithm Learn-Graph runs until there are no more map-reachable unexplored nodes in some map_c. If map_c is not strongly connected, then it is not a good representation of G. In this case, the representation of the last strongly-connected component on path must be incorrect. Thus, calling Verify on this component from the last node on path returns a sequence that improves h. If map_c is strongly connected, then either Verify returns an improvement to the homing sequence, or map_c is a good representation of G.

Before we can prove the correctness of our algorithm, we need one more set of tools. Consider the following statement of Chernoff bounds from Raghavan [Rag89].

Lemma 10 Let X_1, ..., X_m be independent Bernoulli trials with E[X_j] = p_j. Let the random variable X = Σ_{j=1}^m X_j, where μ = E[X] ≥ 0. Then for γ > 0,

    Pr[X > (1 + γ)μ] < [ e^γ / (1 + γ)^{(1+γ)} ]^μ,

and

    Pr[X < (1 - γ)μ] < e^{-γ^2 μ / 2}.

In our analysis in this section and in Section 6, the random variables may not be independent. However, the following corollary bounds the conditional probabilities. The proof of this corollary is exactly analogous to that of a similar corollary by Aumann and Rabin [AR94, Corollary 1].

Corollary 11 Let X_1, ..., X_m be 0/1 random variables (not necessarily independent), and let b_j ∈ {0, 1} for 1 ≤ j ≤ m. Let the random variable X = Σ_{j=1}^m X_j. For any b_1, ..., b_{j-1} and γ > 0, if Pr[X_j = 1 | X_1 = b_1, X_2 = b_2, ..., X_{j-1} = b_{j-1}] ≤ p_j and μ = Σ_{j=1}^m p_j > 0, then

    Pr[X > (1 + γ)μ] < [ e^γ / (1 + γ)^{(1+γ)} ]^μ,

and for any b_1, ..., b_{j-1} and γ > 0, if Pr[X_j = 1 | X_1 = b_1, X_2 = b_2, ..., X_{j-1} = b_{j-1}] ≥ p_j and μ = Σ_{j=1}^m p_j > 0, then

    Pr[X < (1 - γ)μ] < e^{-γ^2 μ / 2}.
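For concreteness, here is the specialization used in the proof of Lemma 18 below (our restatement): taking γ = 1/2 in the upper-tail bound yields the (8e/27) factor that appears there,

    \Pr\left[X > \tfrac{3}{2}\mu\right] \;<\; \left(\frac{e^{1/2}}{(3/2)^{3/2}}\right)^{\mu} \;=\; \left(\frac{8e}{27}\right)^{\mu/2} \;=\; e^{-\frac{\mu}{2}\ln\frac{27}{8e}}.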

Theorem 12 The algorithm Learn-Graph always outputs a map isomorphic to G and halts in O(d^2 n^6) steps with overwhelming probability (≥ 1 - e^{-cn}), where constant c > 0 can be chosen as needed.

Proof: Since Learn-Graph verifies map_c before finishing, if the algorithm terminates then by Corollary 8 it outputs a map isomorphic to G. It is therefore only necessary to show that the algorithm runs in O(d^2 n^6) steps with overwhelming probability.

In each iteration of the while loop in Learn-Graph, if there are no map-reachable unfinished nodes, then the algorithm attempts to verify the map. Otherwise, the algorithm decides randomly whether to learn a new edge or to test a random node in the graph. It follows from Lemma 10 that a constant fraction of the random decisions are for learning and a constant fraction are for testing.

By Theorem 4, the total number of steps spent learning edges in each version of map is O(d^2 n^3). For each candidate homing sequence, there are n versions of map, and the candidate homing sequence is improved at most n times. Thus, O(d^2 n^5) steps are spent learning nodes and edges.

We consider the number of steps taken testing nodes. Each test requires |path| = O(dn^2) steps. Once a map contains an error, the probability that the robots choose to test a node that is in error is at least 1/n. A map with more than dn edges must be faulty. Note that the candidate homing sequence is improved at most n - 1 times. Thus by Corollary 11, with overwhelming probability after O(n^2) tests of maps with at least dn edges, the candidate sequence h is a homing sequence. Overall, the algorithm has to learn O(dn^3) edges, and therefore it executes O(dn^3) tests. Thus the total number of steps spent testing is O(d^2 n^5).

After each test or verification, the algorithm executes a candidate homing sequence. Since there are O(dn) edges in each map, candidate homing sequences are executed O(dn^3) times. Each improvement of the candidate homing sequence extends its length by |path|, so the time spent executing homing sequences is O(d^2 n^6). Thus, the total running time of the algorithm is O(d^2 n^6). □

5.1 Improvements to the Algorithm

The running time for Learn-Graph can be decreased significantly by using two-robot adaptive homing sequences. As in Rivest and Schapire [RS93], an adaptive homing sequence is a decision tree, so the actions in later steps of the sequence depend on the output of earlier steps. With an adaptive homing sequence, only one map_c needs to be discarded each time the homing sequence is improved. Thus the running time of Learn-Graph decreases by a factor of n to O(d^2 n^5).

Any additional information that distinguishes nodes can be included in the output, so homing sequences can be shortened even more. For example, a robot learning an unfamiliar city could easily count the number of roads leading into and out of intersections. It might also recognize stop signs, traffic lights, railroad tracks, or other common landmarks. Therefore, in any practical application of this algorithm we expect a significantly lower running time than the O(d^2 n^5) bound suggests.

Graphs with high conductance can be learned even faster using the algorithm presented in Section 6.

5.2 Limitations of a Single Robot with Pebbles

We now compare the computational power of two robots to that of one robot with a constant number of pebbles. Note that although Learn-Graph runs in time polynomial in n, the algorithm requires no prior knowledge of n. We argue here that a single robot with a constant number of pebbles cannot efficiently learn strongly-connected directed graphs without prior knowledge of n.

As a tool we introduce a family C = ∪_n C_n of graphs called combination locks.¹ For a graph C_n = (V, E) in C_n (the class of n-node combination locks), V = {u_0, u_1, ..., u_{n-1}} and either ⟨u_i, 0, u_{(i+1) mod n}⟩ and ⟨u_i, 1, u_0⟩ ∈ E, or ⟨u_i, 1, u_{(i+1) mod n}⟩ and ⟨u_i, 0, u_0⟩ ∈ E, for all i < n (see Figure 3(a)). In order for a robot to "pick a lock" in C_n, that is, to reach node u_{n-1}, it must follow the unique n-node simple path from u_0 to u_{n-1}. Thus any algorithm for a single robot with no pebbles can expect to take 2^{Ω(n)} steps to pick a random combination lock in C_n.

¹ Graphs of this sort have been used in theoretical computer science for many years (see [Moo56], for example). More recently they have reemerged as tools to prove the hardness of learning problems. We are not sure who first coined the term "combination lock."
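As an illustration (ours, not the paper's), a lock C_n for a given 0/1 combination can be generated in the style of the Environment sketch from Section 1; the wrap-around convention at the last node follows the definition above, and the particular combination matches Figure 3(a) only up to indexing.

```python
def combination_lock(combo):
    """Build the successor lists of a combination lock: at node i the edge labeled combo[i]
    advances to node (i + 1) mod n, while the other label sends the robot back to node 0."""
    n = len(combo)
    succ = []
    for i, bit in enumerate(combo):
        out = [0, 0]                 # out[label] = successor node
        out[bit] = (i + 1) % n
        out[1 - bit] = 0
        succ.append(out)
    return succ

lock = combination_lock([0, 1, 0, 1, 1])   # a lock whose combination is 0, 1, 0, 1, 1
```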


Figure 3: (a) A combination-lock, whose combination is 0, 1, 0, 1, 1. (b) A graph in R_11. Graphs in R = ∪_{n=1}^∞ R_n cannot be learned by one robot with a constant number of pebbles.

We construct a restricted family R of graphs and consider algorithms for a single robot with a single pebble. For all positive integers n, the class R_n contains all graphs consisting of a directed ring of n/2 nodes with an (n/2)-node combination lock inserted into the ring (as in Figure 3(b)). Let R = ∪_{n=1}^∞ R_n. We claim that there is no probabilistic algorithm for one robot and one pebble that learns arbitrary graphs in R in polynomial time with high probability.

To see the claim, consider a single robot in node u_0 of a random graph in R. Until the robot drops its pebble for the first time it has no information about the graph. Furthermore, with high probability the robot needs to take 2^{Ω(n)} steps to emerge from a randomly-chosen n-node combination lock unless it drops a pebble in the lock. But since the size of the graph is unknown, the robot always risks dropping the pebble before entering the lock. If the pebble is dropped outside the lock, the robot will not see the pebble again until it has passed through the lock. A robot that cannot find its pebble has no way of marking nodes and cannot learn.

More formally, suppose that there were some probabilistic algorithm for one robot and a pebble to learn random graphs in R in polynomial time with probability greater than 1/2. Then there must be some constant c such that the probability that the robot drops its pebble in its first c steps is greater than 1/2. Otherwise, the probability that the algorithm fails to learn in time polynomial in n is greater than 1/2. Therefore, the probability that the robot loses its pebble and fails to learn a random graph in R_{2c} efficiently is at least 1/2.

A similar argument holds for a robot with a constant number of pebbles. We conjecture that even if the algorithm is given n as input, a single robot with a constant number of pebbles cannot learn strongly-connected directed graphs. However, using techniques similar to those in Section 6, one robot with a constant number of pebbles and prior knowledge of n can learn high-conductance graphs in polynomial time with high probability.

6 Learning High Conductance Graphs

For graphs with good expansion, learning by walking randomly is more efficient than learning by using homing sequences. In this section we define conductance and present an algorithm that runs more quickly than Learn-Graph for graphs with conductance greater than √(log n / (dn^2)).

6.1 Conductance

The conductance [SJ89] of a graph characterizes the rate at which a random walk on the graph converges to the stationary distribution π. For a given directed graph G = (V, E), consider a weighted graph G' = (V, E, W) with the same vertices and edges as G, but with edge weights defined as follows. Let M = {m_{i,j}} be the transition matrix of a random walk that leaves i by each outgoing edge with probability 1/(2 · degree(i)) and remains at node i with probability 1/2. Let P^0 be an initial distribution on the n nodes in G, and let P^t = P^0 M^t be the distribution after t steps of the walk defined by M. Note that π is a steady state distribution if for every node i, P^t_i = π_i ⟹ P^{t+1}_i = π_i. For irreducible and aperiodic Markov chains, π exists and is unique. Then the edge weight w_{i,j} = π_i m_{i,j} is proportional to the steady state probability of traversing the edge from i to j. Note that the total weight entering a node is equal to the total weight leaving it; that is, Σ_j w_{i,j} = Σ_j w_{j,i}.

Consider a set S ⊆ V, which defines a cut (S, S̄). For sets of nodes S and T, let W_{S,T} = Σ_{s∈S, t∈T} w_{s,t}. We denote W_{S,V} by W_S, so W_V represents the total weight of all edges in the graph. Then the conductance of S is defined as Φ_S = W_{S,S̄} / W_S = W_{S,S̄} / Σ_{i∈S} π_i.

The conductance of a graph is the least conductance over all cuts whose total weight is at most W_V / 2: Φ(G) = min_S max{Φ_S, Φ_S̄}. The conductance of a directed graph can be exponentially small.

Mihail [Mih89] shows that after a walk of length Φ^{-2} log(2n^2/ε), the L_1 norm of the distance between the current distribution P and the stationary distribution π is at most ε (i.e., Σ_i |P_i - π_i| ≤ ε). In the rest of this section, a choice of ε = 1/n^2 is sufficient, so a random walk of length Φ^{-2} log(2n^5) is used to approximate the stationary distribution. We call T = Φ^{-2} log(2n^5) the approximate mixing time of a random walk on an n-node graph with conductance Φ.
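The quantities just defined can be computed directly for a small example; the Python sketch below (ours, using numpy) builds the lazy transition matrix M, approximates π by iterating the walk, and evaluates the conductance Φ_S of one cut. It is meant only to make the definitions concrete.

```python
import numpy as np

def transition_matrix(succ):
    """Lazy walk of Section 6.1: from node i, stay put with probability 1/2 and leave by each
    outgoing edge with probability 1/(2 * outdegree(i))."""
    n = len(succ)
    M = np.zeros((n, n))
    for i, out in enumerate(succ):
        M[i, i] += 0.5
        for j in out:
            M[i, j] += 0.5 / len(out)
    return M

def stationary(M, steps=5000):
    """Approximate the stationary distribution pi by running the walk from the uniform distribution."""
    p = np.full(M.shape[0], 1.0 / M.shape[0])
    for _ in range(steps):
        p = p @ M
    return p

def cut_conductance(M, pi, S):
    """Phi_S = W_{S, S-bar} / W_S with edge weights w_{ij} = pi_i * m_{ij}; W_S = sum of pi over S."""
    Sbar = set(range(M.shape[0])) - set(S)
    w_cross = sum(pi[i] * M[i, j] for i in S for j in Sbar)
    return w_cross / sum(pi[i] for i in S)

succ = [[1, 0], [2, 3], [3, 0], [0, 2]]        # the small 4-node example used earlier (hypothetical)
M = transition_matrix(succ)
pi = stationary(M)
print(cut_conductance(M, pi, {0, 1}))
```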

6.2 An Algorithm for High Conductance Graphs

If a graph has high conductance it can be learned more quickly. In a high-conductance graph, we can estimate the steady state probability of node i by leaving Clark at node i while Lewis takes w random walks of Φ^{-2} log(2n^5) steps each. Let x be the number of times that Lewis sees Clark at the last step of a walk. If w is large enough, x/w is a good approximation to π_i.
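A sketch of this estimator, in the style of the earlier Environment sketch (the function name is ours): Clark stays put while Lewis performs w consecutive random walks of length T, and the fraction of walks that end on Clark's node estimates π of that node.

```python
import random

def estimate_stationary_prob(env, d, w, T):
    """Estimate pi at Clark's current node: Lewis takes w random walks of T steps each,
    and we count the walks whose final step leaves Lewis on Clark's node."""
    x = 0
    for _ in range(w):
        for _ in range(T):
            env.move("Lewis", random.randrange(d))
        if env.together():
            x += 1
    return x / w
```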

Definitions: Call a node i a likely node if π_i ≥ 1/(2n) + 1/n^2. Note that every graph must have at least one such node. The 1/n^2 term appears because of the distance ε = 1/n^2 from the stationary distribution. Its inclusion here simplifies the analysis later. A node that is not likely is called unlikely.

Algorithm Learn-Graph2 uses this estimation technique to find a likely node u_0 and then calls the procedure Build-Map to construct a map of G starting from u_0. The procedure Build-Map learns at least one new edge each iteration by sending Lewis across an unexplored edge ⟨u, ℓ, v⟩ of some unfinished node u in map. Clark waits at start node u_0 while Lewis walks randomly until he meets Clark. If u_0 is a likely node, this walk is expected to take O(Tn) steps. Lewis stores this random walk in the variable path. Thus, path_i is the label of the edge traversed at the ith step of the random walk, path[i ... j] represents edges path_i to path_j, and |path| represents the length of path. We say that path-step(Lewis) = i if Lewis has just crossed the ith edge on path.

Learn-Graph2(w, B, M, T):
 1  done := FALSE
 2  T := Φ^{-2} log(2n^5)                                            { the mixing time }
 3  while not done
 4      do map := ({u_0}, ∅)                                          { map is the graph consisting of node u_0 and no edges }
 5         lost := FALSE
 6         Lewis and Clark together take a random walk of length T
 7         Lewis takes w random walks of length T                     { approx. stationary prob. of Loc_G(Clark) }
 8         x := number of walks where Lewis and Clark are together at the last step
 9         path := the path Lewis followed since leaving Clark
10         if x/w ≤ B                                                 { bound B < 1/n chosen for ease of proof }
11            then Clark follows path to catch up to Lewis            { not at a frequently-visited node }
12            else Lewis moves randomly until he sees Clark           { call the node where they meet u_0 }
13                 done := Build-Map(map, M, T)
14  return map

The procedure Compress-Path returns the shortest subpath of path that connects v to u_0. Finally, Truncate-Path-At-Map compares nodes on the path with all nodes in map and returns the shortest subpath connecting v to some node in map. By adding the final path to the map, Build-Map connects the new node v to map, so map always represents a strongly connected subgraph of G. Figure 4 illustrates a single iteration of the main loop in Build-Map.

Algorithm Learn-Graph2 takes as input parameters the number of random walks w, a bound B to separate the probability of likely and unlikely nodes (we choose B to be approximately 3/(4n)), the mixing time T, and a quantity M. This quantity is chosen so that the probability of a robot's starting at a likely node and walking randomly for MT steps without returning to its start node is very small.

Build-Map(map, M, T):
 1  while there are unfinished nodes in map and not lost              { Lewis and Clark both at u_0 }
 2      do u_i := an unfinished node in map
 3         restart := a minimal path in map from u_0 to u_i
 4         m := largest index of the nodes in map
 5         path := empty path
 6         Lewis follows restart and traverses unexplored edge ℓ      { Lewis crosses a new edge }
 7         while length(path) < M·T and the robots are not together
 8             do Lewis traverses a random edge ℓ' and adds ℓ' to end of path
 9         if the robots are together                                 { ... until he sees Clark at u_0 }
10            then both robots follow restart to u_i and cross edge ℓ
11                 path := Compress-Path(path, restart)               { removes loops from path }
12                 (path, u_j) := Truncate-Path-At-Map(path, map)     { shortest path back to map }
13                 if |path| = 0
14                    then add edge ⟨u_i, ℓ, u_j⟩ to map
15                    else add nodes u_{m+1}, ..., u_{m+|path|} to map
16                         add edges ⟨u_i, ℓ, u_{m+1}⟩ and ⟨u_{m+|path|}, path_{|path|}, u_j⟩ to map
17                         ∀k, 1 ≤ k < |path|: add edge ⟨u_{m+k}, path_k, u_{m+k+1}⟩ to map
18                 both robots move to u_0
19            else                                                    { if Lewis walks MT steps without seeing Clark }
20                 Clark follows restart, ℓ, path to catch up to Lewis
21                 lost := TRUE
22  if lost
23     then return FALSE
24     else return TRUE

Compress-Path(path, restart):
 1  while Clark not at end of path                                    { Lewis and Clark both at u_0 }
 2      do while Lewis not at end of path
 3             do Lewis traverses the next edge of path
 4                if Lewis and Clark are together                     { found a loop in path; remove it }
 5                   then path := path[1 ... path-step(Clark)] · path[path-step(Lewis) + 1 ... |path|]
 6                        Lewis follows restart and edge ℓ
 7                        Lewis traverses edges path[1 ... path-step(Clark)]
 8         both robots are now together and traverse one edge of path
 9  return path                                                       { Lewis and Clark both at u_0 }

Truncate-Path-At-Map(path, map):
 1  earliest := |path|                   { first position on path that is a node already in map }
 2  earliest-node := u_0                 { the name of this node }
 3  for each node u_k in map
 4      Clark moves to u_k
 5      Lewis follows restart and edge ℓ
 6      while Lewis not at end of path
 7          do if Lewis and Clark are together and path-step(Lewis) < earliest
 8                then earliest := path-step(Lewis)
 9                     earliest-node := u_k
10             Lewis traverses the next edge on path
11  both robots move to u_0
12  return (path[1 ... earliest], earliest-node)
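The random-walk step at the heart of Build-Map (lines 6-9 and 19-21) can be sketched as follows in the style of the earlier Environment sketch; Lewis records the labels he traverses until he bumps into Clark, who waits at u_0, or until the M·T step budget runs out. The function name is ours, and the sketch omits Compress-Path and Truncate-Path-At-Map.

```python
import random

def walk_until_meeting(env, d, budget):
    """Lewis walks randomly, recording edge labels, until he is on Clark's node or the budget
    (M*T steps) is exhausted; returns the recorded path, or None if Lewis got lost."""
    path = []
    while len(path) < budget:
        lab = random.randrange(d)
        env.move("Lewis", lab)
        path.append(lab)
        if env.together():
            return path            # Lewis found Clark at u_0; path records the walk
    return None                    # failure condition 3: Build-Map would return FALSE
```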


Figure 4: Procedure Build-Map during one execution of the while loop. The ovals represent map, the portion of graph G learned so far. Note that map is strongly connected. Node u_0, the first node added to map, is with high probability a node with a large stationary probability (a likely node). The robots find u_0 in procedure Learn-Graph2 using random walks. In line 2 of Build-Map, the robots agree on a node u_i with unexplored outgoing edges (an unfinished node). Then Lewis moves to u_i and follows the unexplored edge ℓ, while Clark stays at u_0. Since ℓ is unexplored, Lewis is now at an unknown node. Lewis walks randomly until, visiting u_0, he finds Clark. The dotted line of Figure 4(a) depicts this random walk, denoted path. Random walk path may pass through the same node many times. In procedure Compress-Path, the robots collectively remove all of the loops from the path, reducing the path to the solid line in (a) and (b). In procedure Truncate-Path-At-Map, the robots find u_j, the first node in path already in map. All the nodes and edges of path until u_j (the bold line in (b)) are added to map.

In Sections 6.3 and 6.4 we prove the following theorems.

Theorem 13 When Learn-Graph2 halts, it outputs a graph isomorphic to G.

Theorem 14 Suppose Learn-Graph2 is run on a graph G with w = (4+c)√(dn^3) and M = (4+c)dn^2 for some constant c > 0, and B = (3/(4n))(1 + 2/n). Then for sufficiently large n, with probability at least 1 - δ, Learn-Graph2 halts within O((4+c) dn^3 T) steps, where δ = e^{-(4+c)√(nd)/20} + e^{-(4+c)√(nd)/2} + e^{-dn/4}.

6.3 Correctness of Learn-Graph2

In this section, we prove the correctness of each procedure in Learn-Graph2.

Lemma 15 The procedure Compress-Path halts in O(n |path|) steps and returns a path in which no node occurs more than once.

Proof: We prove the following invariant by induction: in Compress-Path, whenever Lewis reaches the end of path, each node in path[1 ... path-step(Clark)] appears at most once in the entire path.

Assume that this claim holds after Clark has crossed the first k edges in path. By the inductive hypothesis, we know that when Clark crosses the (k+1)st edge, he arrives at some new node not previously encountered along path. Now Lewis follows the entire path. Whenever the path loops back to Loc_G(Clark), the loop is removed from the path. Thus, all repeated occurrences of the new node are removed from the path, proving the inductive step.

Since there are n nodes in the graph, Clark can only make n moves before he returns to u_0. Lewis can move at most |path| steps for every move of Clark's, so the total running time is O(n |path|). □

Lemma 16 The procedure Truncate-Path-At-Map finds the index of the first path step leading to a node already in map. The algorithm runs in O(n^2) steps.

Proof: For each node u_k in map, Lewis traverses the path once while Clark waits at u_k. The procedure keeps track of the earliest node found that is already in map, so the procedure's correctness follows. Clark takes at most n steps to reach each node u_k. Lewis needs at most n steps to follow the compressed path and n more to return to the start of the path. Thus the algorithm requires no more than 3n^2 steps. □

The algorithm Learn-Graph2 halts only when Build-Map returns TRUE. The following

lemma shows that whenever Build-Map returns TRUE, map is isomorphic to G.

Lemma 17 In Build-Map , whenever Clark is at u , map is a goodrepresentation of hLoc Clark;Gi.

0 G 20

0 0 0

Pro of: We inductively build a subgraph G =V ;E ofG =V; E and construct an isomorphism

0

f from map to G . Initially, Lewis and Clark are b oth at u and map consists of the single no de

0

0 0

u and no edges. Let V = Loc Clark;E = ;; and f u = Loc Clark. Then map is a good

0 G 0 G

representation of hLoc Clark;Gi.

G

Consider the robots starting an iteration of the first while loop in Build-Map. Both robots are together at u_0. Inductively assume that map is a good representation of ⟨Loc_G(Clark), G⟩ and that map is strongly-connected. Thus, Lewis can always reach an unfinished node in map if one exists. Lewis walks to an unfinished node u_i, crosses a new edge ℓ to an unknown node v, and then walks randomly until he returns to u_0, where Clark is waiting. From Lemmas 15 and 16, after the algorithm executes subroutines Compress-Path and Truncate-Path, ℓ·path is a path that begins at u_i, crosses edge ℓ, ends at u_j, and whose intermediate nodes are not represented in map.

If path is empty after Truncate-Path-At-Map, then v is node u_j, already in map. Adding edge ⟨f(u_i), ℓ, f(u_j)⟩ to E' and ⟨u_i, ℓ, u_j⟩ to map therefore maintains the invariant that G' is a subgraph of G and preserves the isomorphism between map and G'.

If path is not empty, then by Lemmas 15 and 16 all nodes reached by starting at u_i and following path to the end are distinct, and only the last node reached is already in map. Let m be the highest index so far of any node in map. The algorithm adds new nodes u_{m+1}, ..., u_{m+|path|} to map, along with new edges ⟨u_i, ℓ, u_{m+1}⟩, ⟨u_{m+k}, path_k, u_{m+k+1}⟩ for all k with 1 ≤ k < |path|, and ⟨u_{m+|path|}, path_{|path|}, u_j⟩. Let f(u_{m+k}) be the location of Lewis in G after Lewis has crossed the kth edge of the path. Add the |path| new nodes to V', and the corresponding edges ⟨f(u_{m+k}), path_k, f(u_{m+k+1})⟩ to E'. Then f is an isomorphism from map to G' ⊆ G, so map is a good representation of ⟨Loc_G(Clark), G⟩. Since path connects an unfinished node to another node in map, map remains strongly-connected. □
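
The bookkeeping of this inductive step can be pictured with the following sketch (data-structure choices are ours: map is a dictionary from node indices to labeled out-edges, and the path is given by its edge labels); it mirrors the case analysis above when adding the compressed, truncated path to map.

    def add_path_to_map(map_edges, u_i, ell, path_labels, u_j, m):
        """Record the walk u_i --ell--> ... --> u_j in map.

        map_edges:   dict mapping node index -> {edge label: node index}.
        ell:         the new edge label crossed at the unfinished node u_i.
        path_labels: labels path_1, ..., path_|path| of the compressed,
                     truncated path (empty if edge ell led straight to u_j).
        m:           highest node index currently used in map.
        Returns the new highest node index.
        """
        if not path_labels:
            # The unknown node v was already in map: add the single edge.
            map_edges.setdefault(u_i, {})[ell] = u_j
            return m
        # Create new nodes u_{m+1}, ..., u_{m+|path|} and chain them from
        # u_i to u_j using the recorded labels.
        map_edges.setdefault(u_i, {})[ell] = m + 1
        for k, label in enumerate(path_labels, start=1):
            target = u_j if k == len(path_labels) else m + k + 1
            map_edges.setdefault(m + k, {})[label] = target
        return m + len(path_labels)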

When Build-Map halts and returns TRUE, there are no unfinished nodes in map. Since map is isomorphic to G' ⊆ G and has no unfinished nodes, map must have the same number of nodes and edges as G. Therefore, map is isomorphic to G when Build-Map returns TRUE, proving Theorem 13. □

6.4 Running Time and Failure Probability of Learn-Graph2

We proved that when the algorithm terminates it is correct. In this section, we prove Theorem 14 by analyzing the probability that the algorithm terminates in a reasonable amount of time. We say the algorithm fails if any of the following cases holds:

1. Algorithm Learn-Graph2 finds an unlikely node but estimates that it is a likely node.

2. Algorithm Learn-Graph2 fails to find a likely node in the allotted time. We allow w = (4 + ε)√(dn³) iterations, each consisting of w random walks.

3. Algorithm Learn-Graph2 calls Build-Map from a likely node, but Build-Map returns FALSE.

In fact, these conditions overestimate the probability that the algorithm fails to run in O((4 + ε)dn³ T) steps. The next three lemmas bound the probabilities of each of the three failure conditions. At the end of the section, we analyze the running time of Learn-Graph2 when no failure condition occurs.

Lemma 18 (Failure Condition 1) Suppose that Learn-Graph2 is run with w = (4 + ε)√(dn³) and M = (4 + ε)dn². Assume that the algorithm estimates node u's steady-state probability to be greater than B = (3/(4n))(1 + 2/n). Then the probability that u is not a likely node is at most e^{−(1/20)(4+ε)√(dn)}.

Proof: Call each random walk in Learn-Graph2 a phase. Let X_i be the random variable with

    X_i = 1 if Lewis and Clark are together at the end of phase i, and X_i = 0 otherwise.

Then X = Σ_{i=1}^{w} X_i is the number of phases where Lewis and Clark end together. If u is an unlikely node, then E[X] ≤ (w/(2n))(1 + 2/n), because the estimation of π_u could be inaccurate by at most 1/n². We therefore bound the quantity

    Pr[ X > (3w/(4n))(1 + 2/n) ],

noting that (3w/(4n))(1 + 2/n) = (3/2)·(w/(2n))(1 + 2/n) ≥ (3/2)·E[X].

Using the Chernoff bound from Lemma 11 with β = 1/2, we get:

    Pr[ X > (3w/(4n))(1 + 2/n) ] ≤ ( e^{1/2} / (3/2)^{3/2} )^{(w/(2n))(1 + 2/n)}
                                 = (8e/27)^{(w/(4n))(1 + 2/n)}
                                 ≤ (8e/27)^{(4+ε)√(dn)/4}
                                 = e^{−((4+ε)√(dn)/4)·ln(27/(8e))}
                                 ≤ e^{−(1/20)(4+ε)√(dn)}.  □
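
For intuition, the likely-node test analyzed here can be sketched as follows (a simulator-level Python sketch under our own framing; run_phase is a hypothetical routine, and w and B are the parameters of the lemma).

    def test_likely_node(u, w, B, run_phase):
        """Phase-based test for a likely node, in the setting of Lemma 18.

        run_phase(u) performs one phase starting from the candidate node u
        (a random walk long enough to get close to the stationary
        distribution) and reports whether Lewis and Clark are together at
        its end (the event X_i = 1, which occurs with probability roughly
        the stationary probability of u).
        """
        meetings = sum(1 for _ in range(w) if run_phase(u))  # X = sum of X_i
        estimate = meetings / w          # estimate of u's steady-state prob.
        return estimate > B              # accept u iff the estimate exceeds B

Lemma 18 bounds the probability that this test accepts an unlikely node; Lemma 19 below bounds the probability that repeated candidate selection and testing never accepts a good node.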

Lemma 19 (Failure Condition 2) The probability that Learn-Graph2 fails to find and recognize a likely node after (4 + ε)√(dn³) iterations is at most e^{−(1/2)(4+ε)√(dn)} for sufficiently large n.

Proof: Define a good node to be a node with steady-state probability at least 1/n. Note that every graph has at least one good node. We can bound the probability that we fail to find and recognize a likely node by the probability that we fail to identify a good node within w iterations.

First, we bound the probability that the algorithm fails to recognize a good node when testing one. Random variables X_i and X are defined as in Lemma 18. Then, since the stationary probability of a good node is greater than 1/n,

    E[X] ≥ w(1/n − 1/n²).

To simplify the math, note that for sufficiently large n,

    w(1/n − 1/n²) ≥ (15w/(16n))(1 + 2/n).

Then by the Chernoff corollary in Lemma 11, for β = 1/5,

    Pr[ X < (1 − 1/5)·(15w/(16n))(1 + 2/n) ] = Pr[ X < (3w/(4n))(1 + 2/n) ]
                                             ≤ e^{−(1/2)(1/5)²(15w/(16n))(1 + 2/n)}
                                             ≤ e^{−(3/160)(4+ε)√(dn)}.

Define α to be the quantity

    α = (1 − 1/n)(1 − e^{−(3/160)(4+ε)√(dn)}).

The probability that the node reached at the end of a random walk of both robots is a good node and is identified as such is at least

    (1/n − 1/n²)(1 − e^{−(3/160)(4+ε)√(dn)}) = α/n.

Therefore the probability that after w trials no likely node is found and recognized is at most

    (1 − α/n)^w = ((1 − α/n)^n)^{(4+ε)√(dn)} ≤ e^{−α(4+ε)√(dn)},

since w = (4 + ε)√(dn³) = n(4 + ε)√(dn). Note that α approaches 1 as n increases. For sufficiently large n, α > 1/2, so the probability that the robots fail to find a likely node after w trials is at most e^{−(1/2)(4+ε)√(dn)}. □
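
The constants in the last two proofs are easy to sanity-check numerically; the following throwaway script (ours, not part of the algorithm) confirms the Chernoff constant used in Lemma 18 and that α exceeds 1/2 already for modest n and d.

    import math

    # Lemma 18: e^{1/2} / (3/2)^{3/2} = sqrt(8e/27), and the final step needs
    # (1/4) * ln(27 / (8e)) >= 1/20 (it is about 0.054).
    base = math.exp(0.5) / (1.5 ** 1.5)
    assert abs(base - math.sqrt(8 * math.e / 27)) < 1e-12
    assert 0.25 * math.log(27 / (8 * math.e)) >= 1 / 20

    # Lemma 19: alpha = (1 - 1/n)(1 - e^{-(3/160)(4+eps)sqrt(dn)}) tends to 1.
    eps, d = 0.1, 2
    for n in (100, 1000, 10000):
        alpha = (1 - 1 / n) * (1 - math.exp(-(3 / 160) * (4 + eps) * math.sqrt(d * n)))
        assert alpha > 0.5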

Now we analyze the running time of Build-Map. The procedure Build-Map executes the main while loop at most once for each of the dn edges in the graph. Let k_i be the length of the random walk in the ith iteration of the while loop. Then let K = Σ_{i=1}^{dn} k_i be the total length of all the random walks in the algorithm. Since by Lemmas 15 and 16 Compress-Path runs in O(n·k_i) steps and Truncate-Path-At-Map runs in O(n²) steps, the ith iteration takes O(n + k_i + n·k_i + n²) steps. Therefore the total running time of the algorithm is O(dn³ + nK).

Lemma 20 (Failure Condition 3) If u_0 is a likely node, then with probability at least 1 − e^{−dn/4}, K ≤ (4 + ε)dn² T.

Proof: We use an amortized analysis to prove the bound on K. First we subdivide all of Lewis' random walks during Build-Map into periods of T steps each, where T is the approximate mixing time. Recall that if Lewis starts from any node, after T steps Lewis is at node u_k with probability between π_k − 1/n² and π_k + 1/n². Thus, Lewis' position after the kth period is almost independent of Lewis' position after the (k + 1)st period.

We associate a 0/1-valued random variable X_k with the kth period of Lewis' random walk:

    X_k = 1 if Lewis is at node u_0 at the end of the kth period, and X_k = 0 otherwise.

Since u_0 is a likely node, X_k = 1 with probability at least 1/(2n). Let X = Σ_{k=1}^{(4+ε)dn²} X_k be the number of times Lewis returns to u_0 in (4 + ε)dn² periods. Note that E[X] is at least (4 + ε)dn/2.

Using Chernoff bounds with β = 1 − 2/(4 + ε), we find:

    Pr[ X < (1 − β)E[X] ] ≤ e^{−β²E[X]/2} ≤ e^{−(1 − 2/(4+ε))²(4+ε)dn/4} ≤ e^{−dn/4}.

Since (1 − β)E[X] ≥ dn, it follows that with probability at least 1 − e^{−dn/4} Lewis is at u_0 at the end of at least dn of the (4 + ε)dn² periods; each such period completes the random walk of the current iteration, so all dn iterations finish within these periods and K ≤ (4 + ε)dn² T. □
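
The particular choice β = 1 − 2/(4 + ε) is what makes the bound come out as e^{−dn/4}: it gives (1 − β)E[X] ≥ dn, so the event X < dn is covered, while β²E[X]/2 is still at least dn/4. A quick arithmetic check (ours, with the symbols of the proof):

    # With mu = (4 + eps) * d * n / 2 and beta = 1 - 2 / (4 + eps):
    #   (1 - beta) * mu = d * n, and
    #   beta**2 * mu / 2 = ((2 + eps)**2 / (4 + eps)) * d * n / 4 >= d * n / 4.
    for eps in (0.01, 0.1, 1.0, 5.0):
        d, n = 3, 50
        mu = (4 + eps) * d * n / 2
        beta = 1 - 2 / (4 + eps)
        assert abs((1 - beta) * mu - d * n) < 1e-9
        assert beta ** 2 * mu / 2 >= d * n / 4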

Thus, with high probability Build-Map runs in O(dn³ + (4 + ε)dn³ T) steps. Suppose none of the failure conditions occurs in a run of Learn-Graph2. Then the execution never calls Build-Map on an unlikely node, does call Build-Map on a likely node, and when it does, Build-Map returns TRUE. Therefore Learn-Graph2 makes at most w steady-state probability estimates, each taking O(wT) steps, before calling Build-Map once. Thus, the running time is O((4 + ε)dn³ T), proving Theorem 14. □

6.5 Exploring Without Prior Knowledge

Prior knowledge of n is used in two ways in Learn-Graph2: to estimate the stationary probability π_i of a node and to compute the mixing time T. If T is known but n is not, the algorithm can forego estimating π_i entirely and simply run Build-Map after step 6. The removal of lines 7–12 from Learn-Graph2 yields a new algorithm whose expected running time is polynomially slower than the original. If we know neither n nor T, we can run this new algorithm using standard doubling to estimate the quantity MT. This quantity can be used in line 6 of Learn-Graph2 and also in line 7 of Build-Map as an upper bound on the length of the random walks. Thus no prior knowledge of the graph is necessary.
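
The doubling strategy mentioned here can be sketched as follows (our own framing: try_learn is a hypothetical routine that runs the modified algorithm with the given cap used as the bound on random-walk length in line 6 of Learn-Graph2 and line 7 of Build-Map, returning a completed map or None if some walk fails to return in time).

    def learn_with_doubling(try_learn):
        """Guess-and-double estimation of the unknown quantity MT.

        Run the modified algorithm with a candidate cap on the walk length;
        if the attempt fails to produce a complete map, double the cap and
        try again.  Once the cap exceeds the true MT, each attempt succeeds
        with the probability analyzed in Section 6.4, so the total work is
        dominated by the last few attempts.
        """
        cap = 1
        while True:
            result = try_learn(cap)    # run with this walk-length bound
            if result is not None:
                return result          # a map isomorphic to G
            cap *= 2                   # the estimate of MT was too small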

7 Conclusions and Open Problems

Note that with high probability, a single robot with a pebble can simulate algorithm Learn-Graph2 with a substantial but polynomial slowdown. However, Learn-Graph2 does not run in polynomial expected time on graphs with exponentially-small conductance. An open problem is to establish tight bounds on the running time of an algorithm that uses one robot and a constant number of pebbles to learn an n-node graph G. We conjecture that the lower bound will be a function of the conductance of G, but there may be other graph characteristics (e.g., cover time) which yield better bounds. It would also be interesting to establish tight bounds on the number of pebbles a single robot needs to learn graphs in polynomial time.

Another direction for future work is to find other special classes of graphs that two robots can learn substantially more quickly than general directed graphs, and to find efficient algorithms for these cases.

Acknowledgements

We are particularly grateful to , who patiently listened to many half-baked ideas, helped us work through some of the rough spots, and suggested names for the robots. We gratefully thank Michael Rabin, Mike Sipser and Les Valiant for many enjoyable and helpful discussions. We also thank Rob Schapire for his insightful comments on an earlier draft of the paper.

References

[Ang81] Dana Angluin. A note on the number of queries needed to identify regular languages. Information and Control, 51:76–87, 1981.

[Ang87] Dana Angluin. Learning regular sets from queries and counterexamples. Information and Computation, 75:87–106, November 1987.

[AR94] Y. Aumann and M. O. Rabin. Clock construction in fully asynchronous parallel systems and PRAM simulation. Theoretical Computer Science, 128:3–30, 1994.

[ABRS95] Baruch Awerbuch, Margrit Betke, Ronald L. Rivest, and Mona Singh. Piecemeal graph exploration by a mobile robot. In Proceedings of the 1995 ACM Conference on Computational Learning Theory, pages 321–328, Santa Cruz, CA, July 1995.

[BB+90] Paul Beame, Allan Borodin, Prabhakar Raghavan, Walter Ruzzo, and Martin Tompa. Time-space tradeoffs for undirected graph traversal. In Proceedings of the 31st Annual Symposium on Foundations of Computer Science, pages 429–430, 1990.

[Bet92] Margrit Betke. Algorithms for exploring an unknown graph. Master's thesis. Published as MIT Laboratory for Computer Science Technical Report MIT/LCS/TR-536, March, 1992.

[BRS93] Margrit Betke, Ronald Rivest, and Mona Singh. Piecemeal learning of an unknown environment. In Proceedings of the 1993 Conference on Computational Learning Theory, pages 277–286, Santa Cruz, CA, July 1993. Published as MIT AI-Memo 1474, CBCL-Memo 93; to be published in .

[BK78] Manuel Blum and Dexter Kozen. On the power of the compass (or, why mazes are easier to search than graphs). In 19th Annual Symposium on Foundations of Computer Science, pages 132–142. IEEE, 1978.

[BS77] M. Blum and W. J. Sakoda. On the capability of finite automata in 2 and 3 dimensional space. In 18th Annual Symposium on Foundations of Computer Science, pages 147–161. IEEE, 1977.

[CBF+93] N. Cesa-Bianchi, Y. Freund, D. Helmbold, D. Haussler, R. Schapire, and M. Warmuth. How to use expert advice. In Proceedings of the Twenty-Fifth Annual ACM Symposium on Theory of Computing, pages 382–391, San Diego, CA, May 1993.

[CR80] Stephen A. Cook and Charles W. Rackoff. Space lower bounds for maze threadability on restricted machines. SIAM Journal on Computing, 9:636–652, 1980.

[DA+92] Thomas Dean, Dana Angluin, Kenneth Basye, Sean Engelson, Leslie Kaelbling, Evangelos Kokkevis, and Oded Maron. Inferring finite automata with stochastic output functions and an application to map learning. In Proceedings of the Tenth National Conference on Artificial Intelligence, pages 208–214, July 1992.

[DP90] Xiaotie Deng and Christos H. Papadimitriou. Exploring an unknown graph. In Proceedings of the 31st Symposium on Foundations of Computer Science, volume I, pages 355–361, 1990.

[DP+91] Robert Daley, Leonard Pitt, Mahendran Velauthapillai, and Todd Will. Relations between probabilistic and team one-shot learners. In Proceedings of COLT '91, pages 228–239. Morgan Kaufmann, 1991.

[Edm93] Jeff Edmonds. Time-space trade-offs for undirected st-connectivity on a JAG. In Proceedings of the Twenty-Fifth Annual ACM Symposium on Theory of Computing, pages 718–727, San Diego, CA, May 1993.

[FK+93] Yoav Freund, Michael Kearns, Dana Ron, Ronitt Rubinfeld, Robert Schapire, and Linda Sellie. Efficient learning of typical finite automata from random walks. In Proceedings of the Twenty-Fifth Annual ACM Symposium on Theory of Computing, pages 315–324, San Diego, CA, May 1993.

[Koh78] Zvi Kohavi. Switching and Finite Automata Theory. McGraw-Hill, second edition, 1978.

[KS93] Michael J. Kearns and H. Sebastian Seung. Learning from a population of hypotheses. In Proceedings of COLT '93, pages 101–110, 1993.

[Mih89] Milena Mihail. Conductance and convergence of Markov chains – a combinatorial treatment of expanders. In Proceedings of the 30th Annual Symposium on Foundations of Computer Science, pages 526–531, 1989.

[Moo56] Edward F. Moore. Gedanken-Experiments on Sequential Machines, pages 129–153. Princeton University Press, 1956. Edited by C. E. Shannon and J. McCarthy.

[Poo93] C. K. Poon. Space bounds for graph connectivity problems on node-named JAGs and node-ordered JAGs. In Proceedings of the Thirty-Fourth Annual Symposium on Foundations of Computer Science, pages 218–227, Palo Alto, California, November 1993.

[PW95] James Gary Propp and David Bruce Wilson. Exact sampling with coupled Markov chains and applications to statistical mechanics. To appear in the Proceedings of the 1995 Conference on Random Structures and Algorithms.

[Rab67] Michael O. Rabin. Maze threading automata. October 1967. Seminar talk presented at the University of California at Berkeley.

[RS87] Ronald L. Rivest and Robert E. Schapire. Diversity-based inference of finite automata. In Proceedings of the Twenty-Eighth Annual Symposium on Foundations of Computer Science, pages 78–87, Los Angeles, California, October 1987.

[Rag89] Prabhakar Raghavan. CS661: Lecture Notes on Randomized Algorithms. Yale University, Fall 1989.

[RS93] Ronald L. Rivest and Robert E. Schapire. Inference of finite automata using homing sequences. Information and Computation, 103(2):299–347, April 1993.

[RR95] Dana Ron and Ronitt Rubinfeld. Learning fallible deterministic finite automata. Machine Learning, 18:149–185, February/March 1995.

[RR95b] Dana Ron and Ronitt Rubinfeld. Exactly learning automata with small cover time. In Proceedings of the 1995 ACM Conference on Computational Learning Theory, pages 427–436, Santa Cruz, CA, July 1995.

[Sav73] Walter J. Savitch. Maze recognizing automata and nondeterministic tape complexity. JCSS, 7:389–403, 1973.

[SJ89] Alistair Sinclair and Mark Jerrum. Approximate counting, uniform generation and rapidly mixing Markov chains. Information and Computation, 82(1):93–133, July 1989.

[Smi94] Carl H. Smith. Three decades of team learning. To appear in Proceedings of the Fourth International Workshop on Algorithmic Learning Theory. Springer-Verlag, 1994.