The Power of Team Exploration: Two Robots Can Learn
Unlabeled Directed Graphs

Michael A. Bender*               Donna K. Slonim†
Aiken Computation Laboratory     MIT Laboratory for Computer Science
Harvard University               545 Technology Square
Cambridge, MA 02138              Cambridge, MA 02139
b [email protected] [email protected]
September 6, 1995
Abstract
We show that two cooperating robots can learn exactly any strongly-connected directed graph with $n$ indistinguishable nodes in expected time polynomial in $n$. We introduce a new type of homing sequence for two robots, which helps the robots recognize certain previously-seen nodes. We then present an algorithm in which the robots learn the graph and the homing sequence simultaneously by actively wandering through the graph. Unlike most previous learning results using homing sequences, our algorithm does not require a teacher to provide counterexamples. Furthermore, the algorithm can use efficiently any additional information available that distinguishes nodes. We also present an algorithm in which the robots learn by taking random walks. The rate at which a random walk on a graph converges to the stationary distribution is characterized by the conductance of the graph. Our random-walk algorithm learns in expected time polynomial in $n$ and in the inverse of the conductance and is more efficient than the homing-sequence algorithm for high-conductance graphs.
* Supported by NSF Grant CCR-93-13775.
† Supported by NSF Grant CCR-93-10888 and NIH Grant 1 T32 HG00039-01.
1 Introduction

Consider a robot trying to construct a street map of an unfamiliar city by driving along the city's roads. Since many streets are one-way, the robot may be unable to retrace its steps. However, it can learn by using street signs to distinguish intersections. Now suppose that it is nighttime and that there are no street signs. The task becomes significantly more challenging.

In this paper we present a probabilistic polynomial-time algorithm to solve an abstraction of the above problem by using two cooperating learners. Instead of learning a city, we learn a strongly-connected directed graph $G$ with $n$ nodes. Every node has $d$ outgoing edges labeled from $0$ to $d-1$. Nodes in the graph are indistinguishable, so a robot cannot recognize if it is placed on a node that it has previously seen. Moreover, since the graph is directed, a robot is unable to retrace its steps while exploring.

For this model, one might imagine a straightforward learning algorithm with a running time polynomial in a specific property of the graph's structure such as cover time or mixing time. Any such algorithm could require an exponential number of steps, however, since the cover time and mixing time of directed graphs can be exponential in the number of nodes. In this paper, we present a probabilistic algorithm for two robots to learn any strongly-connected directed graph in $O(d^2 n^5)$ steps with high probability.

The two robots in our model can recognize when they are at the same node and can communicate freely by radio. Radio communication is used only to synchronize actions. In fact, if we assume that the two robots move synchronously and share a polynomial-length random string, then no communication is necessary. Thus with only minor modifications, our algorithms may be used in a distributed setting.

Our main algorithm runs without prior knowledge of the number of nodes in the graph, $n$, in time polynomial in $n$. We show that no probabilistic polynomial-time algorithm for a single robot with a constant number of pebbles can learn all unlabeled directed graphs when $n$ is unknown. Thus, our algorithms demonstrate that two robots are strictly more powerful than one.
1.1 Related Work

Previous results showing the power of team learning are plentiful, particularly in the field of inductive inference (see Smith [Smi94] for an excellent survey). Several team learning papers explore the problems of combining the abilities of a number of different learners. Cesa-Bianchi et al. [CBF+93] consider the task of learning a probabilistic binary sequence given the predictions of a set of experts on the same sequence. They show how to combine the prediction strategies of several experts to predict nearly as well as the best of the experts. In a related paper, Kearns and Seung [KS93] explore the statistical problems of combining several independent hypotheses to learn a target concept from a known, restricted concept class. In their model, each hypothesis is learned from a different, independently-drawn set of random examples, so the learner can combine the results to perform significantly better than any of the hypotheses alone.

There are also many results on learning unknown graphs, but most previous work has concentrated on learning undirected graphs or graphs with distinguishable nodes. For example, Deng and Papadimitriou consider the problem of learning strongly-connected, directed graphs with labeled nodes, so that the learner can recognize previously-seen nodes. They provide a learning algorithm whose competitive ratio (versus the optimal time to traverse all edges in the graph) is exponential in the deficiency of the graph [DP90, Bet92]. Betke, Rivest, and Singh introduce the notion of piecemeal learning of undirected graphs with labeled nodes. In piecemeal learning, the learner must return to a fixed starting point from time to time during the learning process. Betke, Rivest, and Singh provide linear algorithms for learning grid graphs with rectangular obstacles [BRS93], and with Awerbuch [ABRS95] extend this work to show nearly-linear algorithms for general graphs.

Rivest and Schapire [RS87, RS93] explore the problem of learning deterministic finite automata whose nodes are not distinguishable except by the observed output. We rely heavily on their results in this paper. Their work has been extended by Freund et al. [FK+93], and by Dean et al. [DA+92]. Freund et al. analyze the problem of learning finite automata with average-case labelings by the observed output on a random string, while Dean et al. explore the problem of learning DFAs with a robot whose observations of the environment are not always reliable. Ron and Rubinfeld [RR95] present algorithms for learning "fallible" DFAs, in which the data is subject to persistent random errors. Recently, Ron and Rubinfeld [RR95b] have shown that a teacher is unnecessary for learning finite automata with small cover time.

In our model a single robot is powerless because it is completely unable to distinguish one node from any other. However, when equipped with a number of pebbles that can be used to mark nodes, the single robot's plight improves. Rabin first proposed the idea of dropping pebbles to mark nodes [Rab67]. This suggestion led to a body of work exploring the searching capabilities of a finite automaton supplied with pebbles. Blum and Sakoda [BS77] consider the question of whether a finite set of finite automata can search a 2- or 3-dimensional obstructed grid. They prove that a single automaton with just four pebbles can completely search any 2-dimensional finite maze, and that a single automaton with seven pebbles can completely search any 2-dimensional infinite maze. They also prove, however, that no collection of finite automata can search every 3-dimensional maze. Blum and Kozen [BK78] improve this result to show that a single automaton with 2 pebbles can search a finite, 2-dimensional maze. Their results imply that mazes are strictly easier to search than planar graphs, since they also show that no single automaton with pebbles can search all planar graphs. Savitch [Sav73] introduces the notion of a maze-recognizing automaton (MRA), which is a DFA with a finite number of distinguishable pebbles. The mazes in Savitch's paper are $n$-node 2-regular graphs, and the MRAs have the added ability to jump to the node with the next higher or lower number in some ordering. Savitch shows that maze-recognizing automata and $\log n$ space-bounded Turing machines are equivalent for the problem of recognizing threadable mazes (i.e., mazes in which there is a path between a given pair of nodes).

Most of these papers use pebbles to model memory constraints. For example, suppose that the nodes in a graph are labeled with $\log n$-bit names and that a finite automaton with $k \log n$ bits of memory is used to search the graph. This situation is modeled by a single robot with $k$ distinguishable pebbles. A robot dropping a pebble at a node corresponds to a finite automaton storing the name of that node. In our paper, by contrast, we investigate time rather than space constraints. Since memory is now relatively cheap but time is often critical, it makes sense to ask whether a robot with any reasonable amount of memory can use a constant number of pebbles to learn graphs in polynomial time.

Cook and Rackoff generalized the idea of pebbles to jumping automata for graphs (JAGs) [CR80]. A jumping automaton is equipped with pebbles that can be dropped to mark nodes and that can "jump" to the locations of other pebbles. Thus, this model is similar to our two-robot model in that the second robot may wait at a node for a while (to mark it) and then catch up to the other robot later. However, the JAG model is somewhat broader than the two-robot model. Cook and Rackoff show upper and lower bounds of $\log n$ and $\log n / \log\log n$ on the amount of space required to determine whether there is a directed path between two designated nodes in any $n$-node graph. JAGs have been used primarily to prove space efficiency for st-connectivity algorithms, and they have recently resurfaced as a tool for analyzing time and space tradeoffs for graph traversal and connectivity problems (e.g. [BB+90, Poo93, Edm93]).

Universal traversal sequences have been used to provide upper and lower bounds for the exploration of undirected graphs. Certainly, a universal traversal sequence for the class of directed graphs could be used to learn individual graphs. However, for arbitrary directed graphs with $n$ nodes, a universal traversal sequence must have size exponential in $n$. Thus, such sequences will not provide efficient solutions to our problem.
1.2 Strategy of the Learning Algorithm

The power behind the two-robot model lies in the robots' abilities to recognize each other and to move independently. Nonetheless, it is not obvious how to harness this power. If the robots separate in unknown territory, they could search for each other for an amount of time exponential in the size of the graph. Therefore, in any successful strategy for our model the two robots must always know how to find each other. One strategy that satisfies this requirement has both robots following the same path whenever they are in unmapped territory. They may travel at different speeds, however, with one robot scouting ahead and the other lagging behind. We call this a lead-lag strategy. In a lead-lag strategy the lagging robot must repeatedly make a difficult choice. The robot can wait at a particular node, thus marking it, but the leading robot may not find this marked node again in polynomial time. Alternatively, the lagging robot can abandon its current node to catch up with the leader, but then it may not know how to return to that node. In spite of these difficulties, our algorithms successfully employ a lead-lag strategy.

Our work also builds on techniques of Rivest and Schapire [RS93]. They present an algorithm for a single robot to learn minimal deterministic finite automata. With the help of an equivalence oracle, their algorithm learns a homing sequence, which it uses in place of a reset function. It then runs several copies of Angluin's algorithm [Ang87] for learning DFAs given a reset. Angluin has shown that any algorithm for actively learning DFAs requires an equivalence oracle [Ang81].

In this paper, we introduce a new type of homing sequence for two robots. Because of the strength of the homing sequence, our algorithm does not require an equivalence oracle. For any graph, the expected running time of our algorithm is $O(d^2 n^5)$. In practice, our algorithm can use additional information such as indegree, outdegree, or color of nodes to find better homing sequences and to run faster.

Note that throughout the paper, the analyses of the algorithms account for only the number of steps that the robots take across edges in the graph. Additional calculations performed between moves are not considered, so long as they are known to take time polynomial in $n$. In practice, such calculations would not be a noticeable factor in the running time of our algorithms.

Two robots can learn specific classes of directed graphs more quickly, such as the class of graphs with high conductance. Conductance, a measure of the expansion properties of a graph, was introduced by Sinclair and Jerrum [SJ89]. The class of directed graphs with high conductance includes graphs with exponentially-large cover time. We present a randomized algorithm that learns graphs with conductance greater than $n^{-1/2}$ in $O(dn^4 \log n)$ steps with high probability.
2 Preliminaries

Let $G = (V, E)$ represent the unknown graph, where $G$ has $n$ nodes, each with outdegree $d$. An edge from node $u$ to node $v$ with label $i$ is denoted $\langle u, i, v \rangle$. We say that an algorithm learns graph $G$ if it outputs a graph isomorphic to $G$. Our algorithms maintain a graph map which represents the subgraph of $G$ learned so far. Included in map is an implicit start node $u_0$. It is worth emphasizing the difference between the target graph $G$ and the graph map that the learner constructs. The graph map is meant to be a map of the underlying environment, $G$. However, since the robots do not always know their exact location in $G$, in some cases map may contain errors and therefore may not be isomorphic to any subgraph of $G$. Much of the notation in this section is needed to specify clearly whether we are referring to a robot's location in the graph $G$ or to its putative location in map.

A node $u$ in map is called unfinished if it has any unexplored outgoing edges. Node $u$ is map-reachable from node $v$ if there is a path from $v$ to $u$ containing only edges in map. For robot $k$, the node in map corresponding to $k$'s location in $G$ if map is correct is denoted $\mathit{Loc}_M(k)$. Robot $k$'s location in $G$ is denoted $\mathit{Loc}_G(k)$.

Let $f$ be an automorphism on the nodes of $G$ such that
$$\forall a, b \in G, \quad \langle a, i, b \rangle \in G \iff \langle f(a), i, f(b) \rangle \in G.$$
We say nodes $c$ and $d$ are equivalent (written $c \equiv d$) iff there exists such an $f$ where $f(c) = d$.
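As an illustration of the equivalence relation, the following Python sketch decides whether two nodes are equivalent in a small, fully known graph. It is not part of the learning algorithms (the robots never see node names); it only makes the definition concrete. The adjacency-list layout, the function name `equivalent`, and the example graphs are assumptions introduced here. The sketch relies on the observation that, in a strongly-connected graph whose outgoing edges are labeled $0$ to $d-1$, a label-preserving automorphism is fully determined by the image of a single node.

```python
from collections import deque


def equivalent(graph, c, d):
    """Return True if some label-preserving automorphism maps node c to node d.

    graph[u][i] is the node reached from u along the edge labeled i
    (a hypothetical representation; every node has the same outdegree
    and the graph is assumed to be strongly connected).
    """
    f = {c: d}                      # candidate automorphism, forced by f(c) = d
    queue = deque([c])
    while queue:
        a = queue.popleft()
        for i, b in enumerate(graph[a]):
            image = graph[f[a]][i]  # f must send edge <a, i, b> to <f(a), i, f(b)>
            if b not in f:
                f[b] = image
                queue.append(b)
            elif f[b] != image:
                return False        # two edges force different images for b
    # f must be defined everywhere and be a bijection on the nodes.
    return len(f) == len(graph) and len(set(f.values())) == len(graph)


# A directed 3-cycle: every node looks like every other node.
ring = {0: [1], 1: [2], 2: [0]}
print(equivalent(ring, 0, 2))       # True

# An asymmetric four-node graph: node a has a self-loop on edge 0, node b does not.
g = {"a": ["a", "b"], "b": ["a", "d"], "c": ["a", "c"], "d": ["c", "b"]}
print(equivalent(g, "a", "b"))      # False
```

Because the image of one node forces the image of every node reachable from it, the check needs only one traversal of the graph per candidate pair.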
We now present notation to describe the movements of $k$ robots in a graph. An action $A_i$ of the $i$th robot is either a label of an outgoing edge to explore, or the symbol $r$ for "rest." A $k$-robot sequence of actions is a sequence of steps denoting the actions of the $k$ robots; each step is a $k$-tuple $\langle A_0, \ldots, A_{k-1} \rangle$. For sequences $s$ and $t$ of actions, $s \circ t$ denotes the sequence of actions obtained by concatenating $s$ and $t$.

A path is a sequence of edge labels. Let $|\mathit{path}|$ represent the length of path. A robot follows a path by traversing the edges in the path in order beginning at a particular start node in map. The node in map reached by starting at $u_0$ and following path is denoted $\mathit{final}_M(\mathit{path}, u_0)$. Let $s$ be a two-robot sequence of actions such that if both robots start together at any node in any graph and execute $s$, they follow exactly the same path, although perhaps at different speeds. We call such a sequence a lead-lag sequence. Note that if two robots start together and execute a lead-lag sequence, they end together. The node in $G$ reached if both robots start at node $a$ in $G$ and follow lead-lag sequence $s$ is denoted $\mathit{final}_G(s, a)$.

For convenience, we name our robots Lewis and Clark. Whenever Lewis and Clark execute a lead-lag sequence of actions, Lewis leads while Clark lags behind.
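The notation above can be made concrete with a short Python sketch. The representation is an assumption made for illustration: a graph is a dictionary mapping each node to its list of successors indexed by edge label, a step is a pair (Lewis's action, Clark's action), and `REST`, `run`, `lead_lag`, and `is_lead_lag` are hypothetical names. The check in `is_lead_lag` also assumes the convention that Clark is never ahead of Lewis.

```python
REST = "r"  # the "rest" action


def run(graph, start, steps):
    """Execute a two-robot sequence of actions with both robots starting at `start`.

    Returns the final positions (lewis, clark).
    """
    lewis = clark = start
    for a_lewis, a_clark in steps:
        if a_lewis != REST:
            lewis = graph[lewis][a_lewis]
        if a_clark != REST:
            clark = graph[clark][a_clark]
    return lewis, clark


def lead_lag(path):
    """One simple lead-lag sequence: Lewis walks the whole path, then Clark does."""
    return [(label, REST) for label in path] + [(REST, label) for label in path]


def is_lead_lag(steps):
    """Check that both robots follow the same labels and Clark never overtakes Lewis."""
    lewis_path, clark_path = [], []
    for a_lewis, a_clark in steps:
        if a_lewis != REST:
            lewis_path.append(a_lewis)
        if a_clark != REST:
            clark_path.append(a_clark)
        # Clark's labels so far must be a prefix of Lewis's labels so far.
        if clark_path != lewis_path[:len(clark_path)]:
            return False
    return lewis_path == clark_path


# Hypothetical four-node graph with outdegree 2.
g = {"a": ["a", "b"], "b": ["a", "d"], "c": ["a", "c"], "d": ["c", "b"]}
steps = lead_lag([1, 0])
print(is_lead_lag(steps))    # True
print(run(g, "b", steps))    # ('c', 'c'): the robots end together
```

Since the check depends only on the labels and not on any particular graph, it matches the requirement that the two robots follow the same path from any start node in any graph.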
3 Using a Reset to Learn

Learning a directed graph with indistinguishable nodes is difficult because once both robots have left a known portion of the graph, they do not know how to return. This problem would vanish if there were a reset function that could transport both robots to a particular start node $u_0$. We describe an algorithm for two robots to learn directed graphs given a reset. Having a reset is not a realistic model, but this algorithm forms the core of later algorithms, which learn without a reset.

Algorithm Learn-with-Reset maintains the invariant that if a robot starts at $u_0$, there is a directed path it can follow that visits every node in map at least once. To learn a new edge (one not yet in map) using algorithm Learn-with-Reset, Lewis crosses the edge and then Clark tours the entire known portion of the map. If they encounter each other, Lewis's position is identified; otherwise Lewis is at a new node. The depth-first strategy employed by Learn-Edge is essential in later algorithms. In Learn-with-Reset, as in all the procedures in this paper, variables are passed by reference and are modified destructively.
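The procedure just described can be simulated in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' code: the hidden graph `G` is a dictionary of successor lists, the learner's map stores `None` for unexplored edges, the reset is simulated by restarting both robots at `start`, and all function names are hypothetical. The hidden node names are used only to answer the one question the robots can actually ask, namely whether they are together.

```python
from collections import deque


def route_to_unfinished(map_, src):
    """Shortest label sequence over known edges from src to an unfinished node."""
    seen, queue = {src}, deque([(src, [])])
    while queue:
        node, route = queue.popleft()
        if None in map_[node]:
            return route
        for label, nxt in enumerate(map_[node]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, route + [label]))
    raise ValueError("no map-reachable unfinished node (is G strongly connected?)")


def learn_edge(G, start, map_, path):
    """One round of edge learning; both robots begin at `start` (just after a reset)."""
    # Lewis follows path; track his true position and his putative map position.
    lewis_g, lewis_m = start, 0
    for label in path:
        lewis_g, lewis_m = G[lewis_g][label], map_[lewis_m][label]
    # Lewis walks on to an unfinished node u_i, extending path as he goes.
    for label in route_to_unfinished(map_, lewis_m):
        lewis_g, lewis_m = G[lewis_g][label], map_[lewis_m][label]
        path.append(label)
    u_i = lewis_m
    # Lewis crosses an unexplored edge l out of u_i.
    l = map_[u_i].index(None)
    lewis_g = G[lewis_g][l]
    # Clark tours the known map; the only observation is "together or not".
    clark_g, clark_m = start, 0
    met = clark_m if clark_g == lewis_g else None
    for label in path:
        if met is not None:
            break
        clark_g, clark_m = G[clark_g][label], map_[clark_m][label]
        if clark_g == lewis_g:
            met = clark_m
    path.append(l)
    if met is None:                      # Lewis is at a node not yet in map
        met = len(map_)
        map_[met] = [None] * len(map_[0])
    map_[u_i][l] = met


def learn_with_reset(G, start):
    """Learn G from `start`; returns map as {node index: [successor index per label]}."""
    map_ = {0: [None] * len(G[start])}   # node 0 plays the role of u_0
    path = []                            # tour from u_0 through every edge of map
    while any(None in succ for succ in map_.values()):
        learn_edge(G, start, map_, path)  # restarting at `start` stands in for Reset
    return map_


G = {"a": ["a", "b"], "b": ["a", "d"], "c": ["a", "c"], "d": ["c", "b"]}
print(learn_with_reset(G, "b"))
```

The learned map uses integer node names in order of discovery, so it should be compared with `G` up to isomorphism rather than by node name.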
Lemma 1 The variable path in Learn-with-Reset denotes a tour of length $\le dn^2$ that starts at $u_0$ and traverses all edges in map.
Learn-with-Reset():
1  map := ({u_0}, ∅)                    { map is the graph consisting of node u_0 and no edges }
2  path := empty path                   { path is the null sequence of edge labels }
3  k := 1                               { k counts the number of nodes in map }
4  while there are unfinished nodes in map
5      do Learn-Edge(map, path, k)
6         Reset
7  return map

Learn-Edge(map, path, k):               { path = tour through all edges in map }
1  Lewis follows path to final_M(path, u_0)
2  u_i := some unfinished node in map map-reachable from Loc_M(Lewis)
3  Lewis moves to node u_i; append the path taken to path
4  pick an unexplored edge l out of node u_i
5  Lewis moves along edge l; append edge l to path      { Lewis crosses a new edge }
6  Clark follows path to final_M(path, u_0)             { Clark looks for Lewis }
7  if ∃j such that Clark encounters Lewis at node u_j
8      then add edge ⟨u_i, l, u_j⟩ to map
9      else add new node u_k and edge ⟨u_i, l, u_k⟩ to map
10          k := k + 1

Proof: Every time Lewis crosses an edge, that edge is appended to path. Since no edge is added to map until Lewis has crossed it, path must traverse all edges in map. In each call to Learn-Edge, at most $n$ edges are added to path. The body of the while loop is executed $dn$ times, so $|\mathit{path}| \le dn^2$. □

Lemma 2 Map is always a subgraph of $G$.

Proof: Initially map contains a single node $u_0$ and no edges. Assume inductively that map is a subgraph of $G$ after the $c$th call to Learn-Edge (when map has $c$ edges). To learn the next edge, the algorithm chooses a node $u_i$ in map and explores a new edge $e = \langle u_i, l, v \rangle$. By Lemma 1 and the inductive hypothesis, if Clark encounters Lewis at $u_j$ then $v$ is identified as $u_j$. Otherwise $v$ is recognized to be a new node and named $u_k$. Therefore the updated map is a subgraph of $G$. □

Lemma 3 If map contains any unfinished nodes, then there is always some unfinished node in map map-reachable from $\mathit{final}_M(\mathit{path}, u_0)$.

Proof: Suppose the claim were false. Then there is some unfinished node in map, but all nodes of map in the strongly-connected component of $\mathit{final}_M(\mathit{path}, u_0)$ are finished. Thus by Lemma 2, there are no additional edges of $G$ leaving that component, so graph $G$ is not strongly connected, a contradiction. □

Theorem 4 After $O(d^2 n^3)$ moves and $dn$ calls to Reset, Learn-with-Reset halts and outputs a graph isomorphic to $G$.

Proof: The correctness of the output follows from Lemmas 1-3. For each call to Learn-Edge, each robot takes $|\mathit{path}| \le dn^2$ steps. The algorithm Learn-Edge is executed at most $dn$ times, so the algorithm halts within $O(d^2 n^3)$ steps. □

4 Homing Sequences

In practice, robots learning a graph do not have access to a reset function. In this section we suggest an alternative technique: we introduce a new type of homing sequence for two robots.
Intuitively, a homing sequence is a sequence of actions whose observed output uniquely determines the final node reached in $G$. Rivest and Schapire [RS93] show how a single robot with a teacher can use homing sequences to learn strongly-connected minimal DFAs. The output at each node indicates whether that node is an accepting or rejecting state of the automaton. If the target DFA is not minimal, their algorithm learns the minimal encoding of the DFA. In other words, their algorithm learns the function that the graph computes rather than the structure of the graph.

In unlabeled graphs the nodes do not produce output. However, two robots can generate output indicating when they meet.

Definitions: Each step of a two-robot sequence of actions produces an output symbol $T$ if the robots are together and $S$ if they are separate. An output sequence is a string in $\{T, S\}^*$ denoting the observed output of a sequence of actions. Let $s$ be a lead-lag sequence of actions and let $a$ be a node in $G$. Then $\mathit{output}(s, a)$ denotes the output produced by executing the sequence $s$, given that both robots start at $a$. A lead-lag sequence $s$ of actions is a two-robot homing sequence iff
$$\forall \text{ nodes } u, v \in G, \quad \mathit{output}(s, u) = \mathit{output}(s, v) \implies \mathit{final}_G(s, u) \equiv \mathit{final}_G(s, v).$$
Because the output of a sequence depends on the positions of both robots, it provides information about the underlying structure of the graph. Figure 1 illustrates the definition of a two-robot homing sequence.

This new type of homing sequence is powerful. Unlike most previous learning results using homing sequences, our algorithms do not require a teacher to provide counterexamples. In fact, two robots on a graph define a DFA whose states are pairs of nodes in $G$ and whose edges correspond to pairs of actions. Since the automata defined in this way form a restricted class of DFAs, our results are not inconsistent with Angluin's work [Ang81] showing that a teacher is necessary for learning general DFAs.
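The definition can be checked mechanically on a small known graph. The Python sketch below computes the $T$/$S$ output of a two-robot sequence and tests the homing condition. Several things here are assumptions: the dictionary representation, the names `output` and `is_homing`, and the example graph, whose edges were chosen to be consistent with the outputs tabulated in Figure 1 (the edge labeled 1 out of node d is arbitrary). The sketch also replaces equivalence of final nodes by plain equality, which is valid only when the graph has no nontrivial label-preserving automorphism, as is the case for this example.

```python
REST = "r"  # the "rest" action


def output(graph, start, steps):
    """Return (output string, final node) for a lead-lag sequence started at `start`."""
    lewis = clark = start
    out = []
    for a_lewis, a_clark in steps:
        if a_lewis != REST:
            lewis = graph[lewis][a_lewis]
        if a_clark != REST:
            clark = graph[clark][a_clark]
        out.append("T" if lewis == clark else "S")
    return "".join(out), lewis  # after a lead-lag sequence, lewis == clark


def is_homing(graph, steps):
    """True if equal outputs always imply equal final nodes (equality stands in for ≡)."""
    end_for_output = {}
    for u in graph:
        out, end = output(graph, u, steps)
        if end_for_output.setdefault(out, end) != end:
            return False
    return True


# Hypothetical graph: graph[u][i] is the node reached from u along edge i.
G = {"a": ["a", "b"], "b": ["a", "d"], "c": ["a", "c"], "d": ["c", "b"]}
h = [(0, REST), (REST, 0), (1, REST), (REST, 1)]   # Lewis: 0 r 1 r, Clark: r 0 r 1
s = [(0, REST), (REST, 0)]                          # Lewis: 0 r,     Clark: r 0

print(output(G, "a", h))   # ('TTST', 'b')
print(is_homing(G, h))     # True
print(is_homing(G, s))     # False: output ST can end at a or at c
```

Note that `h` sends both b and c to node b with the same output STST, which is allowed: a homing sequence identifies where the robots end, not where they started.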
Theorem 5 Every strongly-connected directed graph has a two-robot homing sequence.

Proof: The following algorithm (based on that of Kohavi [Koh78, RS93]) constructs a homing sequence: Initially, let $h$ be empty. As long as there are two nodes $u$ and $v$ in $G$ such that $\mathit{output}(h, u) = \mathit{output}(h, v)$ but $\mathit{final}_G(h, u) \not\equiv \mathit{final}_G(h, v)$, let $x$ be a lead-lag sequence whose output distinguishes $\mathit{final}_G(h, u)$ from $\mathit{final}_G(h, v)$. Since $\mathit{final}_G(h, u) \not\equiv \mathit{final}_G(h, v)$ and $G$ is strongly connected, such an $x$ always exists. Append $x$ to $h$.

Each time a sequence is appended to $h$, the number of different outputs of $h$ increases by at least 1. Since $G$ has $n$ nodes, there are at most $n$ possible output sequences. Therefore, after $n - 1$ iterations, $h$ is a homing sequence. □

In Section 5 we show that it is possible to find a counterexample $x$ efficiently. Given a strongly-connected graph $G$ and a node $a$ in $G$, a pair of robots can verify whether they are together at a node equivalent to $a$ on some graph isomorphic to $G$. We describe a verification algorithm Verify($a$, $G$) in Section 5. The sequence of actions returned by a call of Verify($u$, $G$) is always a suitable counterexample $x$. Using the bound from Corollary 8, we claim that this algorithm produces a homing sequence of length $O(n^4)$ for all graphs. Note that shorter homing sequences exist; the homing sequence produced by algorithm Learn-Graph in Section 5 has length $O(dn^3)$.

4.1 Using a Homing Sequence to Learn

Given a homing sequence $h$, an algorithm can learn $G$ by maintaining several running copies of Learn-with-Reset. Instead of a single start node, there are as many as $n$ possible start nodes, each corresponding to a different output sequence of $h$. Note that many distinct output sequences may be associated with the same final node in $G$. The new algorithm Learn-with-HS maintains several copies of map and path, one for each output sequence of $h$.
Thus, graph $\mathit{map}_c$ denotes the copy of the map associated with output sequence $c$. Initially, Lewis and Clark are at the same node. Whenever algorithm Learn-with-Reset would use a reset, Learn-with-HS executes the homing sequence $h$. If the output of $h$ is $c$, the algorithm learns a new edge in $\mathit{map}_c$ as if it had been reset to $u_0$ in $\mathit{map}_c$ (see Figure 2). After each execution of $h$, the algorithm learns a new edge in some $\mathit{map}_c$. Since there are at most $n$ copies, each with $dn$ edges to learn, eventually one map will be completed.

Recall that a homing sequence is a lead-lag sequence. Therefore, at the beginning and end of every homing sequence the two robots are together.

[Figure 1 graphic: a directed graph on nodes a, b, c, d with edges labeled 0 and 1; the drawing is not recoverable from the extracted text. The output symbol is T if Lewis and Clark are together at a node and S if they are at separate nodes.]

homing sequence h =  Lewis: 0 r 1 r        sequence s =  Lewis: 0 r
                     Clark: r 0 r 1                      Clark: r 0

start node   output   end node             start node   output   end node
a            TTST     b                    a            TT       a
b            STST     b                    b            ST       a
c            STST     b                    c            ST       a
d            STTT     c                    d            ST       c

Figure 1: Illustration of a two-robot homing sequence and a lead-lag sequence. Note that both $h$ and $s$ are lead-lag sequences. However, sequence $h$ is a two-robot homing sequence, because for each output sequence there is a unique end node. Note that the converse is not true. Sequence $s$ is not a two-robot homing sequence, because the robots may end at nodes $a$ or $c$ and yet see the same output sequence $ST$.

Theorem 6 If Learn-with-HS is called with a homing sequence $h$ as input, it halts within $O(d^2 n^4 + dn^2 |h|)$ steps and outputs a graph isomorphic to $G$.

Proof: The algorithm Learn-with-HS maintains at most $n$ running versions of Learn-with-Reset, one for each output of the homing sequence.
In particular, whenever the two robots execute a homing sequence and obtain an output $c$, they have identified their position as the start node $u_0$ in $\mathit{map}_c$, and can learn one new edge in $\mathit{map}_c$ before executing another homing sequence.

Eventually, one of the versions halts and outputs a complete $\mathit{map}_c$. Therefore, the correctness of Learn-with-HS follows directly from Theorem 4 and the definition of a two-robot homing sequence.

Let $r = O(d^2 n^3)$ be the number of steps taken by Learn-with-Reset. Since there are at most $n$ start nodes, Learn-with-HS takes at most $nr + dn^2 |h|$ steps. □

Learn-with-HS(h):
1  done := FALSE
2  while not done
3      do execute h; c := the output sequence produced      { instead of a reset }
4         if map_c is undefined
5             then map_c := ({u_0}, ∅)      { map_c = graph consisting of node u_0 and no edges }
6                  path_c := empty path      { path_c is the null sequence of edge labels }
7                  k_c := 1                  { k_c counts the number of nodes in map_c }
8         Learn-Edge(map_c, path_c, k_c)
9         if map_c has no unfinished nodes
10            then done := TRUE
11 return map_c

5 Learning a Homing Sequence

Unlike a reset function, a two-robot homing sequence can be learned. The algorithm Learn-Graph maintains a candidate homing sequence $h$ and improves $h$ as it learns $G$.

Definition: Candidate homing sequence $h$ is called a bad homing sequence if there exist nodes $u, v$, $u \ne v$, such that $\mathit{output}(h, u) = \mathit{output}(h, v)$, but $\mathit{final}_G(h, u) \not\equiv \mathit{final}_G(h, v)$.

Definition: Let $a$ be a node in $G$. We say that $\mathit{map}_c$ with start node $u_0$ is a good representation of $\langle a, G \rangle$ iff there exists an isomorphism $f$ from the nodes in $\mathit{map}_c = (V^c, E^c)$ to the nodes in a subgraph $G' = (V', E')$ of $G$, such that $f(u_0) = a$, and
$$\forall u_i, u_j \in V^c, \quad \langle u_i, \ell, u_j \rangle \in E^c \iff \langle f(u_i), \ell, f(u_j) \rangle \in E'.$$

In algorithms Learn-with-Reset and Learn-with-HS, the graphs map and $\mathit{map}_c$ are always good representations of $G$. In Learn-Graph if the candidate homing sequence $h$ is bad, a particular $\mathit{map}_c$ may not be a good representation of $G$.
However, the algorithm can test for such maps. Whenever a $\mathit{map}_c$ is shown to be in error, $h$ is improved and all maps are discarded. By the proof of Theorem 5, we know that a candidate homing sequence must be improved at most $n - 1$ times. In Section 5.1 we explain how to use adaptive homing sequences to discard only one map per improvement.

We now define a test that with probability at least $1/n$ detects an error in $\mathit{map}_c$ if one exists.

[Figure 2 graphic: a table with one row per output of the homing sequence (TTST, STST, STTT), giving the starting node of the corresponding map (b, b, c) and a drawing of the partially learned map; the drawings are not recoverable from the extracted text.]

Figure 2: A possible "snapshot" of the learners' knowledge during an execution of Learn-with-HS. The robots are learning the graph $G$ from Figure 1 using the two-robot homing sequence $h$ from Figure 1. Node names in maps are not known to the learner, but are added for clarity. The following example demonstrates how the robots learn a new edge using Learn-with-HS. Suppose that the robots execute $h$ and see output $TTST$. Then the robots are together at node $b$. Lewis follows path $1; 0$ to unfinished node $c$ and then crosses the edge labeled 1. Now Clark follows path $1; 0; 1$. Since Clark sees Lewis after 2 steps, the dotted edge is added to $\mathit{map}_{TTST}$. Next, the robots execute $h$ again and see output $STST$. Thus, they go on to learn some edge in $\mathit{map}_{STST}$.

Definition: Let $\mathit{path}_c$ be a path such that a robot starting at $u_0$ and following $\mathit{path}_c$ traverses every edge in $\mathit{map}_c = (V^c, E^c)$. Let $u_0 \ldots u_m$ be the nodes in $V^c$ numbered in order of their first appearance in $\mathit{path}_c$. If both robots are at $u_0$ then $\mathit{test}_c(u_i)$ denotes the following lead-lag sequence of actions: (1) Lewis follows $\mathit{path}_c$ to the first occurrence of $u_i$; (2) Clark follows $\mathit{path}_c$ to the first occurrence of $u_i$; (3) Lewis follows $\mathit{path}_c$ to the end; (4) Clark follows $\mathit{path}_c$ to the end.

Definition: Given $\mathit{map}_c$ and any lead-lag sequence $t$ of actions, define $\mathit{expected}(t, \mathit{map}_c)$ to be the expected output if $\mathit{map}_c$ is correct and if both robots start at node $u_0$ and execute sequence $t$.
We abbreviate $\mathit{expected}(\mathit{test}_c(u_i), \mathit{map}_c)$ by $\mathit{expected}(\mathit{test}_c(u_i))$.

Lemma 7 Suppose Lewis and Clark are both at some node $a$ in $G$. Let $\mathit{path}_c$ be a path such that a robot starting at $u_0$ and following $\mathit{path}_c$ traverses every edge in $\mathit{map}_c$. Then $\mathit{map}_c$ is a good representation of $\langle a, G \rangle$ iff $\forall u_i \in V^c$, $\mathit{output}(\mathit{test}_c(u_i)) = \mathit{expected}(\mathit{test}_c(u_i))$.

Proof: ($\Longrightarrow$): By definition of good representation and $\mathit{expected}(\mathit{test}_c(u_i))$.

($\Longleftarrow$): Suppose that all tests produce the expected output. We define a function $f$ as follows: Let $f(u_0) = a$. Let $p(u_i)$ be the prefix of $\mathit{path}_c$ up to the first occurrence of $u_i$. Define $f(u_i)$ to be the node in $G$ that a robot reaches if it starts at $a$ and follows $p(u_i)$. Let $G' = (V', E')$ be the image $f(\mathit{map}_c)$ of $\mathit{map}_c$ in $(V, E)$.

We first show that $f$ is an isomorphism from $V^c$ to $V'$. By definition of $G'$, $f$ must be surjective. To see that $f$ is injective, assume the contrary. Then there exist two nodes $u_i, u_j \in V^c$ such that $i \ne j$ but $f(u_i) = f(u_j)$. But then $\mathit{output}(\mathit{test}_c(u_i)) \ne \mathit{expected}(\mathit{test}_c(u_i))$, which contradicts our assumption that all tests succeed.

Next, we show that
$$\langle u_i, \ell, u_j \rangle \in E^c \iff \langle f(u_i), \ell, f(u_j) \rangle \in E',$$
proving that $\mathit{map}_c$ is a good representation of $\langle a, G \rangle$.

($\Longrightarrow$): By definition of $G'$, the image of $\mathit{map}_c$.

($\Longleftarrow$): Inductively assume that $\langle u_i, \ell, u_j \rangle \in E^c \iff \langle f(u_i), \ell, f(u_j) \rangle \in E'$ for the first $m$ edges in $\mathit{path}_c$, and suppose that this prefix of the path visits only nodes $u_0 \ldots u_i$.

Learn-Graph():
1  done := FALSE
2  h := empty sequence
3  while not done
4      do execute h; c := the output sequence produced      { instead of a reset }
5         if map_c is undefined
6             then map_c := ({u_0}, ∅)      { map_c is the graph consisting of node u_0 and no edges }
7                  path_c := empty path      { path_c is the null sequence of edge labels }
8                  k_c := 1                  { k_c counts the number of nodes in map_c }
9         if map_c has no unfinished node map-reachable from final_M(path_c)
10            then Lewis and Clark move to final_M(path_c)
11                 comp := maximal strongly-connected component in map_c containing final_M(path_c)
12                 h-improve := Verify(final_M(path_c), comp)
13                 if h-improve = empty sequence
14                     then done := TRUE                      { map_c is complete }
15                     else                                   { h-improve is nonempty: error detected }
16                          append h-improve to end of h      { improve homing sequence ... }
17                          discard all maps and paths        { ... and start learning maps from scratch }
18            else v := value of a fair 0/1 coin flip         { learn edges or test for errors? }
19                 if v = 0                                   { test for errors }
20                     then u_i := a random node in map_c     { randomly pick node to test }
21                          h-improve := Test(map_c, path_c, i)
22                          if h-improve is nonempty          { error detected }
23                              then append h-improve to end of h   { improve homing sequence ... }
24                                   discard all maps and paths     { ... start learning maps from scratch }
25                     else Learn-Edge(map_c, path_c, k_c)
26 return map_c

Test(map_c, path_c, i):      { u_0, u_1, ..., u_k = the nodes in map_c indexed by first appearance in path_c }
1  h-improve := the following sequence of actions:
2      Lewis follows path_c to the first occurrence of u_i in path_c
3      Clark follows path_c to the first occurrence of u_i in path_c
4      Lewis follows path_c to the end
5      Clark follows path_c to the end
6  if output(h-improve) ≠ expected-output(h-improve)        { if error detected }
7      then return h-improve                                { return test_c(u_i) }
8      else return empty sequence                           { return empty sequence }

Verify(v_0, map):            { v_0, v_1, ..., v_k are the nodes in map ordered by first appearance in path }
1  path := path such that a robot starting at v_0 in map and following path visits all nodes in map and returns to v_0
2  for each i, 0 ≤ i ≤ k
3      do h-improve := Test(map, path, i)
4         if h-improve is nonempty
5             then return h-improve
6  return empty sequence
Now consider the $(m+1)$st edge $e = \langle a, \ell, b \rangle$. There are two possibilities. In one case, edge $e$ leads to some new node $u_{i+1}$. Then by definition $f(u_{i+1})$ is $b$'s image in $G$, so $\langle f(a), \ell, f(b) \rangle \in G'$. Otherwise $e$ leads to some previously-seen node $u_{i-k}$. Suppose that $f(u_{i-k})$ is not the node reached in $G$ by starting at $u_0$ and following the first $m+1$ edges in $\mathit{path}_c$. Then $\mathit{output}(\mathit{test}_c(u_{i-k})) \neq \mathit{expected}(\mathit{test}_c(u_{i-k}))$, so $\mathit{test}_c(u_{i-k})$ fails, and we arrive at a contradiction. Therefore $f(b) = f(u_{i-k})$ and $\langle f(a), \ell, f(b) \rangle \in G'$. $\Box$

Corollary 8 Suppose Lewis and Clark are together at $u_0$ in $\mathit{map}_c$. Let $\mathit{map}_c$ be strongly connected and have $n$ nodes, $u_0, \ldots, u_{n-1}$. Then the two robots can verify whether $\mathit{map}_c$ is a good representation of $\langle \mathit{Loc}_G(\mathit{Lewis}), G \rangle$ in $O(n^3)$ steps.

Proof: Since $\mathit{map}_c$ is strongly connected, there exists a path $\mathit{path}_c$ with the following property: a robot starting at $u_0$ and following $\mathit{path}_c$ visits all nodes in $\mathit{map}_c$ and returns to $u_0$. Index the remaining nodes in order of their first appearance in $\mathit{path}_c$. The two robots verify whether, for all $u_i$ in order, $\mathit{output}(\mathit{test}_c(u_i)) = \mathit{expected}(\mathit{test}_c(u_i))$. Note that Lewis and Clark are together at $u_0$ after each test. By Lemma 7, this procedure verifies $\mathit{map}_c$. Since $\mathit{path}_c$ has length $O(n^2)$, each test has length $O(n^2)$, so verification requires $O(n^3)$ steps. $\Box$

In Learn-Graph, after the robots execute a homing sequence, they randomly decide either to learn a new edge or to test a random node in $\mathit{map}_c$. The following lemma shows that a failed test can be used to improve the homing sequence.

Lemma 9 Let $h$ be a candidate homing sequence in Learn-Graph, and let $u_k$ be a node such that $\mathit{output}(\mathit{test}_c(u_k)) \neq \mathit{expected}(\mathit{test}_c(u_k))$. Then there are two nodes $a, b$ in $G$ that $h$ does not distinguish but that $h$ followed by $\mathit{test}_c(u_k)$ does.

Proof: Let $a$ be a node in $G$ such that when both robots start at $a$, $\mathit{output}(\mathit{test}_c(u_k)) \neq \mathit{expected}(\mathit{test}_c(u_k))$.
Suppose that at step $i$ in $\mathit{test}_c(u_k)$, the expected output is $T$ (respectively $S$), but the actual output is $S$ (respectively $T$). Each edge in $\mathit{path}_c$ and $\mathit{map}_c$ was learned using Learn-Edge. If $\mathit{map}_c$ indicates that the $i$th node in $\mathit{path}_c$ is $u_k$, there must be a start node $b$ in $G$ where $u_k$ really is the $i$th node in $\mathit{path}_c$. Since $\mathit{output}(\mathit{test}_c(u_k)) \neq \mathit{expected}(\mathit{test}_c(u_k))$, the sequence $h$ followed by $\mathit{test}_c(u_k)$ distinguishes $a$ from $b$. $\Box$

The algorithm Learn-Graph runs until there are no more map-reachable unexplored nodes in some $\mathit{map}_c$. If $\mathit{map}_c$ is not strongly connected, then it is not a good representation of $G$. In this case, the representation of the last strongly-connected component on $\mathit{path}_c$ must be incorrect. Thus, calling Verify on this component from the last node on $\mathit{path}_c$ returns a sequence that improves $h$. If $\mathit{map}_c$ is strongly connected, then either Verify returns an improvement to the homing sequence, or $\mathit{map}_c$ is a good representation of $G$.

Before we can prove the correctness of our algorithm, we need one more set of tools. Consider the following statement of Chernoff bounds from Raghavan [Rag89].

Lemma 10 Let $X_1, \ldots, X_m$ be independent Bernoulli trials with $E[X_j] = p_j$. Let the random variable $X = \sum_{j=1}^{m} X_j$, where $\mu = E[X] \geq 0$. Then for $\delta > 0$,
$$\Pr[X > (1+\delta)\mu] < \left[ \frac{e^{\delta}}{(1+\delta)^{(1+\delta)}} \right]^{\mu},$$
and