
Improved Noise-Tolerant Learning and Generalized Statistical Queries

Javed A. Aslam*        Scott E. Decatur†

Aiken Computation Laboratory
Harvard University
Cambridge, MA

July 1994

Abstract

The statistical query learning model can be viewed as a tool for creating (or demonstrating the existence of) noise-tolerant learning algorithms in the PAC model. The complexity of a statistical query algorithm, in conjunction with the complexity of simulating SQ algorithms in the PAC model with noise, determines the complexity of the noise-tolerant PAC algorithms produced. Although roughly optimal upper bounds have been shown for the complexity of statistical query learning, the corresponding noise-tolerant PAC algorithms are not optimal due to inefficient simulations. In this paper we provide both improved simulations and a new variant of the statistical query model in order to overcome these inefficiencies.

We improve the time complexity of the classification noise simulation of statistical query algorithms. Our new simulation has a roughly optimal dependence on the noise rate. We also derive a simpler proof that statistical queries can be simulated in the presence of classification noise. This proof makes fewer assumptions on the queries themselves and therefore allows one to simulate more general types of queries.

We also define a new variant of the statistical query model based on relative error, and we show that this variant is more natural and strictly more powerful than the standard additive error model. We demonstrate efficient PAC simulations for algorithms in this new model and give general upper bounds on both learning with relative error statistical queries and PAC simulation. We show that any statistical query algorithm can be simulated in the PAC model with malicious errors in such a way that the resultant PAC algorithm has a roughly optimal tolerable malicious error rate and sample complexity.

Finally, we generalize the types of queries allowed in the statistical query model. We discuss the advantages of allowing these generalized queries and show that our results on improved simulations also hold for these queries.

This paper is available from the Center for Research in Computing Technology, Division of Applied Sciences, Harvard University, as technical report TR-17-94.

*Author was supported by an Air Force contract. Part of this research was conducted while the author was at MIT, supported by a DARPA contract and by an NSF CCR grant. Author's net address: jaa@das.harvard.edu.

†Author was supported by an NDSEG Doctoral Fellowship and by an NSF CCR grant. Author's net address: sed@das.harvard.edu.

1 Introduction

The statistical query model of learning was created so that algorithm designers could construct noise-tolerant PAC learning algorithms in a natural way. Ideally, such a model of robust learning should restrict the algorithm designer as little as possible while maintaining the ability to efficiently simulate these new algorithms in the PAC model with noise. In this paper we both extend and improve the current statistical query model in ways which both increase the power of the algorithm designer and decrease the complexity of simulating these new algorithms. We begin by introducing the various models of learning required for the exposition that follows.

Since Valiant's introduction of the Probably Approximately Correct model of learning, PAC learning has proven to be an interesting and well studied model of machine learning. In an instance of PAC learning, a learner is given the task of determining a close approximation of an unknown {0,1}-valued target function f from labelled examples of that function. The learner is given access to an example oracle and accuracy and confidence parameters. When polled, the oracle draws an example according to a distribution D and returns the example along with its label according to f. The error rate of an hypothesis output by the learner is the probability that an example chosen according to D will be mislabelled by the hypothesis. The learner is required to output an hypothesis such that, with high confidence, the error rate of the hypothesis is less than the accuracy parameter. Two standard complexity measures studied in the PAC model are sample complexity and time complexity. Efficient PAC learning algorithms have been developed for many function classes, and PAC learning continues to be a popular model of machine learning.

One criticism of the PAC model is that the data presented to the learner is assumed to be noise-free. In fact, most of the standard PAC learning algorithms would fail if even a small number of the labelled examples given to the learning algorithm were noisy. Two popular noise models for both theoretical and experimental research are the classification noise model introduced by Angluin and Laird and the malicious error model introduced by Valiant and further studied by Kearns and Li. In the classification noise model, each example received by the learner is mislabelled randomly and independently with some fixed probability. In the malicious error model, an adversary is allowed, with some fixed probability, to substitute a labelled example of his choosing for the labelled example the learner would ordinarily see.

While a limited number of efficient PAC algorithms had been developed which tolerate classification noise, no general framework for efficient learning in the presence of classification noise was known until Kearns introduced the Statistical Query model. (Angluin and Laird introduced a general framework for learning in the presence of classification noise; however, their methods do not yield computationally efficient algorithms in most cases.)

In the SQ model, the example oracle of the standard PAC model is replaced by a statistics oracle. An SQ algorithm queries this new oracle for the values of various statistics on the distribution of labelled examples, and the oracle returns the requested statistics to within some specified additive error. Upon gathering a sufficient number of statistics, the SQ algorithm returns an hypothesis of the desired accuracy. Since calls to the statistics oracle can be simulated, with high probability, by drawing a sufficiently large sample from the example oracle, one can view this new oracle as an intermediary which effectively limits the way in which a learning algorithm can make use of labelled examples. Two standard complexity measures of SQ algorithms are query complexity (the maximum number of statistics required) and tolerance (the minimum additive error required).

Kearns has demonstrated two important properties of the SQ model which make it worthy of study. First, he has shown that nearly every PAC learning algorithm can be cast within the SQ model, thus demonstrating that the SQ model is quite general and imposes a rather weak restriction on learning algorithms. Second, he has shown that calls to the statistics oracle can be simulated, with high probability, by a procedure which draws a sufficiently large sample from a classification noise oracle. An immediate consequence of these two properties is that nearly every PAC learning algorithm can be transformed into one which tolerates arbitrary amounts of classification noise.

Decatur has demonstrated that calls to the statistics oracle can also be simulated, with high probability, by a procedure which draws a sufficiently large sample from a malicious error oracle. The amount of malicious error tolerable in such a simulation is proportional to the tolerance of the SQ algorithm.

The complexity of a statistical query algorithm, in conjunction with the complexity of simulating SQ algorithms in the various noise models, determines the complexity of the noise-tolerant PAC learning algorithms obtained. Kearns has derived bounds on the minimum complexity of SQ algorithms, and Aslam and Decatur have demonstrated a general technique for constructing SQ algorithms which are nearly optimal with respect to these bounds. In spite of this, the robust PAC learning algorithms obtained by simulating SQ algorithms in the presence of noise are inefficient when compared to known lower bounds for PAC learning in the presence of noise. In fact, the PAC learning algorithms obtained by simulating SQ algorithms in the absence of noise are inefficient when compared to the tight bounds known for noise-free PAC learning. These shortcomings could be consequences of either inefficient simulations or a deficiency in the model itself. In this paper we show that both of these explanations are true, and we provide both new simulations and a variant of the SQ model which combat the current inefficiencies of PAC learning via statistical queries.

We improve the complexity of simulating SQ algorithms in the presence of classification noise by providing a more efficient simulation. If α is a lower bound on the minimum additive error requested by an SQ algorithm and η_b < 1/2 is an upper bound on the unknown noise rate, then Kearns' original simulation essentially runs Θ(1/(α(1-2η_b))) different copies of the SQ algorithm and processes the results of these runs to obtain an output. We show that this branching factor can be reduced to O((1/α) log(1/(1-2η_b))), thus reducing the time complexity of the simulation. We also provide a new and simpler proof that statistical queries can be estimated in the presence of classification noise, and we show that our formulation can easily be generalized to accommodate a strictly larger class of statistical queries.

We improve the complexity of simulating SQ algorithms in the absence of noise and in the presence of malicious errors by proposing a natural variant of the SQ model and providing efficient simulations for this variant. In the relative error SQ model, we allow SQ algorithms to submit statistical queries whose estimates are required within some specified relative error. We show that a class is learnable with relative error statistical queries if and only if it is learnable with standard additive error statistical queries. Thus, known learnability and hardness results for statistical queries also hold in this variant.

We demonstrate general bounds on the complexity of SQ learning with relative error statistical queries, and we show that many learning algorithms can naturally be written as highly efficient relative error SQ algorithms. We further provide simulations of relative error SQ algorithms in both the absence and presence of noise. These simulations, in the absence of noise and in the presence of malicious errors, are more efficient than the simulations of additive error statistical queries and yield PAC learning algorithms using a roughly optimal number of examples. These results hold for all classes which are learnable from statistical queries.

Finally, we show that our simulations of SQ algorithms in the absence of noise, in the presence of classification noise, and in the presence of malicious errors can all be modified to accommodate a strictly larger class of statistical queries. In particular, we show that our simulations can accommodate real-valued and probabilistic statistical queries. Probabilistic queries arise naturally when applying boosting techniques to algorithms which output probabilistic hypotheses, while real-valued queries allow an algorithm to query the expected value of a real-valued function of labelled examples. Our results on improved simulations hold for these generalized queries in both the absence and presence of noise.

The remainder of the paper is organized as follows. In Section 2 we formally define the learning models of interest. In Section 3 we describe the improved simulation of statistical query algorithms in the presence of classification noise. We introduce the model of statistical queries with relative error in Section 4 and give results relating to this model. In Section 5 we generalize the types of queries permitted in a statistical query algorithm. We conclude the paper with some open questions in Section 6.

2 Learning Models

In this section we formally define the relevant models of learning necessary for the exposition that follows. We begin by defining the example-based PAC learning model as well as the classification noise and malicious error variants. We then define the standard statistical query model, which we later generalize.

2.1 Example-Based PAC Learning

In an instance of PAC learning, a learner is given the task of determining a close approximation of an unknown {0,1}-valued target function from labelled examples of that function. The unknown target function f is assumed to be an element of a known function class F defined over an example space X. The example space X is typically either the Boolean hypercube {0,1}^n or n-dimensional Euclidean space. We use the parameter n to denote the common length of each example x ∈ X.

We assume that the examples are distributed according to some unknown probability distribution D on X. The learner is given access to an example oracle EX(f, D) as its source of data. A call to EX(f, D) returns a labelled example ⟨x, l⟩, where the example x ∈ X is drawn randomly and independently according to the unknown distribution D, and the label l = f(x). We often refer to a sequence of labelled examples drawn from an example oracle as a sample.

A learning algorithm draws a sample from EX(f, D) and eventually outputs an hypothesis h from some hypothesis class H defined over X. For any hypothesis h, the error rate of h is defined to be the distribution weight of those examples in X where h and f differ. Using the notation Pr_D[P(x)] to denote the probability of drawing an example in X, according to D, which satisfies the predicate P, we may define error(h) = Pr_D[h(x) ≠ f(x)]. We often think of H as a class of representations of functions in F, and as such we define size(f) to be the size of the smallest representation in H of the target function f.

The learner's goal is to output, with probability at least 1-δ, an hypothesis h whose error rate is at most ε, for the given error parameter ε and confidence parameter δ. A learning algorithm is said to be polynomially efficient if its running time is polynomial in 1/ε, 1/δ, n and size(f).

2.2 Classification Noise

In the classification noise model, the labelled example oracle EX(f, D) is replaced by a noisy example oracle EX_CN^η(f, D). Each time this noisy example oracle is called, an example x ∈ X is drawn according to D. The oracle then outputs ⟨x, f(x)⟩ with probability 1-η or ⟨x, ¬f(x)⟩ with probability η, randomly and independently for each example drawn. Despite the noise in the labelled examples, the learner's goal remains to output an hypothesis h which, with probability at least 1-δ, has error rate error(h) = Pr_D[h(x) ≠ f(x)] at most ε.

While the learner does not typically know the exact value of the noise rate η, the learner is given an upper bound η_b on the noise rate (η ≤ η_b < 1/2), and the learner is said to be polynomially efficient if its running time is polynomial in the usual PAC learning parameters as well as 1/(1-2η_b).

2.3 Malicious Errors

In the malicious error model, the labelled example oracle EX(f, D) is replaced by a noisy example oracle EX_MAL^β(f, D). When a labelled example is requested from this oracle, with probability 1-β an example x is chosen according to D and ⟨x, f(x)⟩ is returned to the learner. With probability β, a malicious adversary selects any example x ∈ X, selects a label l ∈ {0,1}, and returns ⟨x, l⟩. Again, the learner's goal is to output an hypothesis h which, with probability at least 1-δ, has error rate error(h) = Pr_D[h(x) ≠ f(x)] at most ε.
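To make these definitions concrete, the following sketch is ours (not from the paper); the function and parameter names are illustrative. It shows how the noise-free, classification noise, and malicious error oracles might be mocked up for experimentation.

```python
import random

def ex_oracle(f, draw_example):
    """Noise-free oracle EX(f, D): returns <x, f(x)> with x drawn from D."""
    x = draw_example()
    return x, f(x)

def ex_cn_oracle(f, draw_example, eta):
    """Classification noise oracle EX_CN^eta(f, D): the label is flipped
    independently with probability eta."""
    x = draw_example()
    label = f(x)
    if random.random() < eta:
        label = 1 - label
    return x, label

def ex_mal_oracle(f, draw_example, beta, adversary):
    """Malicious error oracle EX_MAL^beta(f, D): with probability beta the
    adversary may return an arbitrary labelled example of its choosing."""
    if random.random() < beta:
        return adversary()          # adversary chooses any <x, l>
    x = draw_example()
    return x, f(x)
```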

2.4 Statistical Query Based Learning

In the SQ model, the example oracle EX(f, D) from the standard PAC model is replaced by a statistics oracle STAT(f, D). An SQ algorithm queries the STAT oracle for the values of various statistics on the distribution of labelled examples (e.g., "What is the probability that a randomly chosen labelled example ⟨x, l⟩ has variable x_i = 1 and l = 0?"), and the STAT oracle returns the requested statistics to within some specified additive error. Formally, a statistical query is of the form [χ, α]. Here χ is a mapping from labelled examples to {0,1} (i.e., χ: X × {0,1} → {0,1}) corresponding to an indicator function for those labelled examples about which statistics are to be gathered, while α is an additive error parameter. A call to STAT(f, D) returns an estimate P̂_χ of P_χ = Pr_D[χ(x, f(x)) = 1] which satisfies |P̂_χ - P_χ| ≤ α.

A call to STAT(f, D) can be simulated, with high probability, by drawing a sufficiently large sample from EX(f, D) and outputting the fraction of labelled examples which satisfy χ(x, f(x)) = 1 as the estimate P̂_χ. Since the required sample size depends polynomially on 1/α, and the simulation time additionally depends on the time required to evaluate χ, an SQ learning algorithm is said to be polynomially efficient if 1/α, the time required to evaluate each χ, and the running time of the SQ algorithm are all bounded by polynomials in 1/ε, n and size(f).

An SQ learning algorithm is said to use query space Q if it only makes queries of the form [χ, α] where χ ∈ Q. Throughout, we let α also denote a lower bound on the additive error of every query made by an SQ algorithm.
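As a concrete illustration of the sampling-based simulation just described, the sketch below is ours (the sample-size constant is indicative rather than exact); it estimates P_χ by an empirical average using a Hoeffding-style sample size of O((1/α²) log(1/δ)).

```python
import math

def simulate_stat(ex_oracle, chi, alpha, delta):
    """Estimate P_chi = Pr_D[chi(x, f(x)) = 1] within additive error alpha,
    with probability at least 1 - delta, by drawing labelled examples."""
    m = int(math.ceil((1.0 / (2 * alpha ** 2)) * math.log(2.0 / delta)))
    hits = 0
    for _ in range(m):
        x, label = ex_oracle()
        hits += chi(x, label)
    return hits / m
```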

3 Classification Noise Simulations of SQ Algorithms

In this section we describe an improved method for efficiently simulating a statistical query algorithm using a classification noise oracle. The advantages of this new method are twofold. First, our simulation employs a new technique which significantly reduces the running time of simulating SQ algorithms. Second, our formulation for estimating individual queries is simpler and more easily generalized.

Kearns' procedure for simulating SQ algorithms works in the following way. Kearns shows that, given a query [χ, α], P_χ can be written as an expression involving the unknown noise rate η and other probabilities which can be estimated from the noisy example oracle EX_CN^η(f, D). We note that the derivation of this expression relies on χ being {0,1}-valued. The actual expression obtained has the form

$$P_\chi = p_1 \cdot \frac{P^{\eta}_{\chi_1} - \eta}{1-2\eta} + p_2 \cdot P_{\chi_2} \qquad (1)$$

where p_1 and p_2 are the probabilities of the regions of the example space on which χ does and does not depend on the label, P^η_{χ_1} is the probability that χ is satisfied given the former region (a quantity measured with respect to the noisy oracle), and P_{χ_2} is the probability that χ is satisfied given the latter region (a label-independent quantity estimable directly from noisy examples).

In order to estimate P_χ with additive error α, a sensitivity analysis is employed to determine how accurately each of the components on the right-hand side of Equation (1) must be known. Kearns shows that, for some constants c_1 and c_2, if η is estimated within additive error c_1·α(1-2η_b) and each of the probabilities is estimated within additive error c_2·α(1-2η_b), then the estimate obtained for P_χ from Equation (1) will be sufficiently accurate. Since the value of η is not known, the procedure for simulating SQ algorithms essentially guesses a set of values {η_1, ..., η_k} for η such that at least one η_j satisfies |η - η_j| ≤ c_1·α(1-2η_b), where α is the minimum tolerance of the SQ algorithm. The simulation therefore uniformly guesses Θ(1/(α(1-2η_b))) values of η between 0 and η_b. For each guess of η, the simulation runs a separate copy of the SQ algorithm and estimates the various P_χ's using the formula given above. Since some guess at η was good, at least one of the runs will have produced a good hypothesis with high probability. The various hypotheses are then tested to find a good hypothesis, of which at least one exists. Note that the guessing has a significant impact on the running time of the simulation.

In what follows, we show a new derivation of an expression for P_χ which is simpler and more easily generalizable than Kearns' original version. We also show that to estimate an individual P_χ it is only necessary to have an estimate of η within additive error c·α(1-2η) for some constant c. We further show that the number of guesses need only be O((1/α) log(1/(1-2η_b))), thus significantly reducing the time complexity of the SQ simulation.

3.1 A New Derivation for P_χ

In this section we present a simpler derivation of an expression for P_χ. In previous sections it was convenient to view a {0,1}-valued χ as a predicate, so that P_χ = Pr_D[χ(x, f(x))]. In this section it will be more convenient to view χ as a function, so that P_χ = E_D[χ(x, f(x))]. Further, by making no assumptions on the range of χ, the results obtained herein can easily be generalized; these generalizations are discussed in Section 5.

Let X be the example space and let Y = X × {0,1} be the labelled example space. We consider a number of different example oracles and the distributions these example oracles impose on the space of labelled examples. For a given target function f and distribution D over X, let EX(f, D) be the standard noise-free example oracle. In addition, we define the following example oracles: let EX(¬f, D) be the anti-example oracle, EX_CN^η(f, D) be the noisy example oracle, and EX_CN^η(¬f, D) be the noisy anti-example oracle. Note that we have access to EX_CN^η(f, D), and we can easily construct EX_CN^η(¬f, D) by simply flipping the label of each example drawn from EX_CN^η(f, D). Each of these oracles imposes a distribution over labelled examples; let D_f, D_{¬f}, D_f^η and D_{¬f}^η be these distributions, respectively. Note that P_χ = E_D[χ(x, f(x))] = E_{D_f}[χ].

Finally, for a labelled example y = ⟨x, l⟩, let ȳ = ⟨x, ¬l⟩. We define χ̄(y) = χ(ȳ). Note that χ̄ is a new function which, on input ⟨x, l⟩, simply outputs χ(⟨x, ¬l⟩). The function χ̄ is easily constructed from χ.

Theorem 1

$$P_\chi \;=\; E_{D_f}[\chi] \;=\; \frac{(1-\eta)\,E_{D_f^\eta}[\chi] \;-\; \eta\,E_{D_f^\eta}[\bar\chi]}{1-2\eta} \qquad (2)$$

Proof: For simplicity of exposition, we assume that X is a finite, discrete space (e.g., the Boolean hypercube {0,1}^n) so that D(x) is well defined. In general, the theorem holds for any probability space (X, D) where D is a probability measure on a σ-algebra of subsets of X.

We first make some observations regarding the relationship between the various distributions defined above. For every labelled example y:

$$D_f^\eta(y) = (1-\eta)\,D_f(y) + \eta\,D_{\neg f}(y) \qquad (3)$$
$$D_{\neg f}^\eta(y) = (1-\eta)\,D_{\neg f}(y) + \eta\,D_f(y) \qquad (4)$$
$$D_{\neg f}^\eta(y) = D_f^\eta(\bar y) \qquad (5)$$

Multiplying Equation (3) by (1-η) and Equation (4) by η, we obtain

$$(1-\eta)\,D_f^\eta(y) = (1-\eta)^2 D_f(y) + \eta(1-\eta)\,D_{\neg f}(y) \qquad (6)$$
$$\eta\,D_{\neg f}^\eta(y) = \eta(1-\eta)\,D_{\neg f}(y) + \eta^2 D_f(y) \qquad (7)$$

Subtracting Equation (7) from Equation (6) and solving for D_f(y), we finally obtain

$$D_f(y) = \frac{(1-\eta)\,D_f^\eta(y) - \eta\,D_{\neg f}^\eta(y)}{1-2\eta} \qquad (8)$$

This implies that

$$E_{D_f}[\chi] = \frac{(1-\eta)\,E_{D_f^\eta}[\chi] - \eta\,E_{D_{\neg f}^\eta}[\chi]}{1-2\eta}$$

and since E_{D_{¬f}^η}[χ] = E_{D_f^η}[χ̄] by Equation (5) and the definition of χ̄, we obtain Equation (2). □

Note that in the derivation given above we have not assumed that χ is {0,1}-valued. The derivation is quite general and can be applied to estimating the expectations of probabilistic and real-valued χ's; these results are given in Section 5.

Finally, note that if we define

$$\chi_\eta(y) = \frac{(1-\eta)\,\chi(y) - \eta\,\bar\chi(y)}{1-2\eta}$$

then P_χ = E_{D_f}[χ] = E_{D_f^η}[χ_η]. Thus, given a χ whose expectation we require with respect to the noise-free oracle, we can construct a new χ_η whose expectation with respect to the noisy oracle is identical to the answer we require. This formulation may be even more convenient if one has the capability of estimating the expectation of real-valued functions; we discuss this generalization in Section 5.
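The identity of Theorem 1 translates directly into an estimation procedure: average χ and χ̄ over a noisy sample and apply the correction. The sketch below is ours; it assumes the noise rate eta (in practice, one of the guesses of Section 3.4) is supplied.

```python
def estimate_p_chi_from_noisy_sample(noisy_sample, chi, eta):
    """Estimate P_chi = E_{D_f}[chi] from labelled examples drawn from the
    classification noise oracle EX_CN^eta(f, D), using Theorem 1:
        P_chi = ((1 - eta) * E[chi] - eta * E[chi_bar]) / (1 - 2 * eta),
    where chi_bar(x, l) = chi(x, 1 - l)."""
    m = len(noisy_sample)
    e_chi = sum(chi(x, l) for x, l in noisy_sample) / m
    e_chi_bar = sum(chi(x, 1 - l) for x, l in noisy_sample) / m
    return ((1 - eta) * e_chi - eta * e_chi_bar) / (1 - 2 * eta)
```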

3.2 Sensitivity Analysis

In this section we provide a sensitivity analysis of Equation (2) to determine the accuracy with which the various quantities must be estimated. We make use of the following claim, which can easily be shown.

Claim 1: Let a = b/c, where 0 ≤ a ≤ 1 and c ≥ c_0 > 0. To obtain an estimate of a within additive error α ≤ 1, it is sufficient to obtain estimates of b and c each within additive error α·c_0/3.

Lemma 1: Let η̂ be an estimate of η within additive error α(1-2η)/12, and let Ê_1 and Ê_2 be estimates of E_{D_f^η}[χ] and E_{D_f^η}[χ̄], each within additive error α(1-2η)/12. Then the quantity

$$\frac{(1-\hat\eta)\,\hat E_1 - \hat\eta\,\hat E_2}{1-2\hat\eta}$$

is within additive error α of P_χ = E_{D_f}[χ].

Proof: To obtain an estimate of the right-hand side of Equation (2) within additive error α, it is sufficient, by Claim 1 (with c_0 = 1-2η), to obtain estimates of the numerator and denominator each within additive error α(1-2η)/3. This condition holds for the denominator if η is estimated within additive error α(1-2η)/6. To obtain an estimate of the numerator within additive error α(1-2η)/3, it is sufficient to estimate each of the two summands of the numerator within additive error α(1-2η)/6. Finally, to obtain such estimates of these summands, it is sufficient to estimate η and each of E_{D_f^η}[χ] and E_{D_f^η}[χ̄] within additive error α(1-2η)/12. □

Estimates for E_{D_f^η}[χ] and E_{D_f^η}[χ̄] are obtained by sampling, and an estimate for η is obtained by guessing. We address these issues in the following sections.

3.3 Estimating E_{D_f^η}[χ] and E_{D_f^η}[χ̄]

One can estimate the expected values of all queries submitted by drawing a separate sample for each of the corresponding χ's and χ̄'s and applying Lemma 1. However, better results are obtained by appealing to uniform convergence.

Let Q be the query space of the SQ algorithm, and let Q̄ = {χ̄ : χ ∈ Q}. The query space of our simulation is Q' = Q ∪ Q̄. Note that for finite Q, |Q'| ≤ 2|Q|, and for all Q, VCdim(Q') = O(VCdim(Q)).

If α is a lower bound on the minimum additive error requested by the SQ algorithm and η_b is an upper bound on the noise rate, then by Lemma 1, α(1-2η_b)/12 is a sufficient additive error with which to estimate all expectations. Standard uniform convergence results can be applied to show that all expectations can be estimated within the given additive error using a single noisy sample of size

$$m_1 = O\!\left(\frac{1}{\alpha^2(1-2\eta_b)^2}\,\log\frac{|Q|}{\delta}\right)$$

in the case of a finite query space, or a single noisy sample of size

$$m_1 = O\!\left(\frac{\mathrm{VCdim}(Q')}{\alpha^2(1-2\eta_b)^2}\,\log\frac{1}{\alpha(1-2\eta_b)} \;+\; \frac{1}{\alpha^2(1-2\eta_b)^2}\,\log\frac{1}{\delta}\right)$$

in the case of an infinite query space of finite VC-dimension.

3.4 Guessing the Noise Rate

By Lemma 1, to obtain estimates for P_χ it is sufficient to have an estimate of the noise rate η within additive error α(1-2η)/12. Since the noise rate is unknown, the simulation guesses various values of the noise rate and runs the SQ algorithm for each guess. If one of the noise rate guesses is sufficiently accurate, then the corresponding run of the SQ algorithm will produce the desired accurate hypothesis.

To guarantee that an accurate guess is used, one could simply guess Θ(1/(α(1-2η_b))) values of η spaced uniformly between 0 and η_b. This is essentially the approach adopted by Kearns. Note that this would cause the simulation to run the SQ algorithm Θ(1/(α(1-2η_b))) times.

We now show that this branching factor can be reduced to O((1/α) log(1/(1-2η_b))) by constructing our guesses in a much better way. The result follows immediately from the following lemma when Δ = α/12.

Lemma 2: For all Δ < 1/2, there exists a sequence of guesses η_0, η_1, ..., η_i, where i = O((1/Δ) log(1/(1-2η_b))), such that for all η ∈ [0, η_b] there exists an η_j which satisfies |η - η_j| ≤ Δ(1-2η).

Proof: The sequence is constructed as follows. Let η_0 = 0, and consider how to determine η_{j+1} from η_j. The value η_j is a valid estimate for all η ≥ η_j which satisfy η - η_j ≤ Δ(1-2η); solving for η, we find that η_j is a valid estimate for all η ≤ (η_j + Δ)/(1+2Δ). Now consider an η ≤ η_{j+1}. The value η_{j+1} is a valid estimate for all such η which satisfy η_{j+1} - η ≤ Δ(1-2η); solving for η, we find that η_{j+1} is a valid estimate for all η ≥ (η_{j+1} - Δ)/(1-2Δ). To ensure that either η_j or η_{j+1} is a valid estimate for any η ∈ [η_j, η_{j+1}], we set

$$\frac{\eta_{j+1} - \Delta}{1-2\Delta} \;=\; \frac{\eta_j + \Delta}{1+2\Delta}.$$

Solving for η_{j+1} in terms of η_j, we obtain

$$\eta_{j+1} = \Delta + \frac{(1-2\Delta)(\eta_j + \Delta)}{1+2\Delta}.$$

Substituting, we obtain the following recurrence:

$$1 - 2\eta_{j+1} \;=\; \frac{1-2\Delta}{1+2\Delta}\,(1 - 2\eta_j).$$

Note that if η_j < 1/2, then η_{j+1} < 1/2 as well.

By constructing guesses using this recurrence, we ensure that for all η ∈ [0, η_i], at least one of {η_0, ..., η_i} is a valid estimate. Solving the recurrence with η_0 = 0, we find that

$$1 - 2\eta_i = \left(\frac{1-2\Delta}{1+2\Delta}\right)^{i}.$$

Since we are only concerned with η ≤ η_b, we may bound the number of guesses required by finding the smallest i which satisfies

$$\left(\frac{1-2\Delta}{1+2\Delta}\right)^{i} \le 1 - 2\eta_b.$$

Solving for i, we find that any i ≥ ln(1/(1-2η_b)) / ln((1+2Δ)/(1-2Δ)) is sufficient. Using the fact that ln(1+x) ≥ x/(1+x) for all x ≥ 0, we find that ln((1+2Δ)/(1-2Δ)) ≥ 2Δ, and therefore

$$i = \frac{1}{2\Delta}\,\ln\frac{1}{1-2\eta_b}$$

is an upper bound on the number of guesses required. □
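A minimal sketch (ours) of the guess construction used in the proof: starting from η_0 = 0, each successive guess is produced by the recurrence derived above, and the sequence stops once the guesses reach the known bound η_b.

```python
def noise_rate_guesses(eta_b, delta_rel):
    """Generate guesses eta_0, eta_1, ... such that every eta in [0, eta_b]
    is within additive error delta_rel * (1 - 2*eta) of some guess
    (cf. the recurrence 1 - 2*eta_{j+1} = ((1-2d)/(1+2d)) * (1 - 2*eta_j))."""
    guesses = [0.0]
    ratio = (1 - 2 * delta_rel) / (1 + 2 * delta_rel)
    gap = 1.0                      # current value of 1 - 2*eta_j
    while gap > 1 - 2 * eta_b:     # stop once eta_j >= eta_b
        gap *= ratio
        guesses.append((1 - gap) / 2)
    return guesses

# Example: alpha = 0.1, eta_b = 0.45  ->  delta_rel = alpha / 12
# print(len(noise_rate_guesses(0.45, 0.1 / 12)))
```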

3.5 The Overall Simulation

We now combine the results of the previous sections to obtain an overall simulation, as follows:

1. Draw m_1 examples from EX_CN^η(f, D) in order to estimate expectations in Step 2.

2. Run the SQ algorithm once for each of the O((1/α) log(1/(1-2η_b))) guesses of the noise rate, estimating the various queries by applying Lemma 1 and using the sample drawn in Step 1.

3. Draw m_3 examples and test the O((1/α) log(1/(1-2η_b))) hypotheses obtained in Step 2. Output one of these hypotheses whose error rate is at most ε.
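Schematically, the three steps fit together as in the following sketch (ours). The SQ algorithm, the per-guess query-answering functions (built from the estimator of Theorem 1), and the hypothesis test of Step 3 are all passed in as assumed parameters.

```python
def simulate_sq_with_classification_noise(sq_algorithm, noise_guesses,
                                          answer_query_at, test_hypotheses):
    """Step 2 of the overall simulation: run one copy of the SQ algorithm per
    noise-rate guess, then select a good hypothesis among the candidates.

    sq_algorithm(answer_query) -> hypothesis, where answer_query(chi, alpha)
        returns an estimate of P_chi;
    answer_query_at(eta_guess) -> such an answer_query function, built from
        the corrected estimator of Theorem 1 on the sample drawn in Step 1;
    test_hypotheses(candidates) -> one hypothesis of error at most epsilon."""
    candidates = [sq_algorithm(answer_query_at(eta)) for eta in noise_guesses]
    return test_hypotheses(candidates)
```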

Step 3 can be accomplished by a generalization of a technique due to Laird. The sample size required is

$$m_3 = O\!\left(\frac{1}{\epsilon(1-2\eta_b)^2}\left(\log\frac{1}{\delta} + \log\!\left(\frac{1}{\alpha}\log\frac{1}{1-2\eta_b}\right)\right)\right).$$

Since α = O(ε) for all SQ algorithms, the sample complexity of the overall simulation is

$$O\!\left(\frac{1}{\alpha^2(1-2\eta_b)^2}\left(\log\frac{|Q|}{\delta} + \log\!\left(\frac{1}{\alpha}\log\frac{1}{1-2\eta_b}\right)\right)\right)$$

in the case of a finite query space, or

$$O\!\left(\frac{\mathrm{VCdim}(Q')}{\alpha^2(1-2\eta_b)^2}\,\log\frac{1}{\alpha(1-2\eta_b)} \;+\; \frac{1}{\alpha^2(1-2\eta_b)^2}\left(\log\frac{1}{\delta} + \log\log\frac{1}{1-2\eta_b}\right)\right)$$

in the case of an infinite query space of finite VC-dimension.

To determine the running time of our simulation, one must distinguish between two different types of SQ algorithms. Some SQ algorithms submit a fixed set of queries, independent of the estimates they receive for previous queries. We refer to these algorithms as batch SQ algorithms. (We consider any SQ algorithm with a polynomially sized query space to be a batch algorithm, since all of its queries may be processed in advance.) Other SQ algorithms may submit various queries based upon the estimates they receive for previous queries. We refer to these algorithms as dynamic SQ algorithms. Note that multiple runs of a dynamic SQ algorithm may produce many more queries which need to be estimated.

With respect to η, Simon has shown a sample complexity (and therefore time complexity) lower bound for PAC learning in the presence of classification noise whose dependence on the noise rate is Ω(1/(1-2η)²). We therefore note that, by reducing the branching factor of the simulation from Θ(1/(α(1-2η_b))) to O((1/α) log(1/(1-2η_b))), the running time of our simulation is optimal with respect to the noise rate, modulo lower order logarithmic factors, for both dynamic and batch algorithms. For dynamic algorithms, the time complexity of our new simulation is in fact roughly a 1/(1-2η_b) factor better than the current simulation.

4 Statistical Queries with Relative Error Estimates

In the standard model of statistical query learning, a learning algorithm asks for an estimate of the probability that a predicate χ is true. The required accuracy of this estimate is specified by the learner in the form of an additive error parameter. The limitation of this model is clearly evident in even the standard noise-free statistical query simulation. This simulation uses O((1/α²) log(|Q|/δ)) examples. Since α = O(ε) for all SQ algorithms, this simulation effectively uses Ω(1/ε²) examples. However, the dependence on ε of the general bound on sample complexity for PAC learning is Θ̃(1/ε).

This sample complexity results from the worst case assumption that large probabilities may need to be estimated with small additive error. Either the nature of statistical query learning is such that learning sometimes requires the estimation of large probabilities with small additive error, or it is always sufficient to estimate each probability with an additive error comparable to the probability itself. If the former were the case, then the present model and simulations would be the best that one could hope for. We show that the latter is true, and that a model in which queries are specified with relative error is a more natural and strictly more powerful tool.

We define such a model of relative error statistical query learning, and we show how this new model relates to the standard additive error model. We also show general upper bounds on learning in this new model which demonstrate that, for all classes learnable by statistical queries, it is sufficient to make estimates with relative error independent of ε. We then give roughly optimal PAC simulations for relative error SQ algorithms. Finally, we demonstrate natural problems which only require estimates with constant relative error.

Throughout, for b ≥ 1 we use Õ(b) to denote O(b log^c b) for some constant c, and for b < 1 we use Õ(b) to denote O(b log^c(1/b)); Ω̃ and Θ̃ are defined analogously.

4.1 The Relative Error SQ Model

Given the motivation above, we modify the standard model of statistical query learning to allow for estimates to be requested with relative error. We replace the additive error STAT(f, D) oracle with a relative error RelSTAT(f, D) oracle which accepts a query χ, a relative error parameter μ, and a threshold parameter θ. The value P_χ = Pr_D[χ(x, f(x)) = 1] is defined as before. If P_χ is less than the threshold θ, then the oracle may return the symbol ⊥. If the oracle does not return ⊥, then it must return an estimate P̂_χ such that

$$P_\chi(1-\mu) \;\le\; \hat P_\chi \;\le\; P_\chi(1+\mu).$$

Note that the oracle may choose to return an accurate estimate even if P_χ < θ. A class is said to be learnable by relative error statistical queries if it satisfies the same conditions of additive error statistical query learning, except that we instead require 1/μ and 1/θ to be polynomially bounded. Let μ and θ also denote the lower bounds on the relative error and threshold of every query made by an SQ algorithm. Given this definition of relative error statistical query learning, we show the following desirable equivalence.

Theorem 2: F is learnable by additive error statistical queries if and only if F is learnable by relative error statistical queries.

Proof: One can take any query to the additive error oracle which requires additive tolerance α and simulate it by calling the relative error oracle with relative error α and threshold α (returning 0 if ⊥ is received). Similarly, one can take any query to the relative error oracle which requires relative error μ and threshold θ and simulate it by calling the additive error oracle with tolerance μθ/3 (returning ⊥ when the estimate obtained is less than θ - μθ/3). In each direction, the simulation uses polynomially bounded parameters if and only if the original algorithm uses polynomially bounded parameters. □

Kearns shows that almost all classes known to be PAC learnable are learnable with additive error statistical queries. By the above theorem, these classes are also learnable with relative error statistical queries. In addition, the hardness results of Kearns for learning parity functions and the general hardness results of Blum et al. based on Fourier analysis also hold for relative error statistical query learning.

One can convert an additive error SQ algorithm to a relative error SQ algorithm in a more efficient way than described in the proof of Theorem 2. The key idea is that, for each query [χ, α], we search for the magnitude of P_χ starting at α by successively doubling the threshold.

Theorem 3: An additive error query [χ, α] can be simulated by O(log(P_χ/α) + 1) relative error queries [χ, μ_i, θ_i], where for each i, θ_i ≥ α and μ_i = Ω(α/(P_χ + α)).

Proof: We first give the search algorithm, then prove its correctness, and finally prove bounds on the number and complexity of the queries made.

Let θ_0 = α and μ_0 = 1/6, and let θ_{i+1} = 2θ_i and μ_{i+1} = μ_i/2, so that μ_iθ_i = α/6 for all i. If the response P̂_i to the query [χ, μ_i, θ_i] is ⊥, then 0 is returned. If P̂_i ≤ 2θ_i(1+μ_i), then P̂_i is returned. Otherwise, the next query is asked.

If we ask query i+1, it must have been true that P̂_i > 2θ_i(1+μ_i), and since P̂_i ≤ P_χ(1+μ_i), we can conclude that P_χ > 2θ_i = θ_{i+1}. Thus, after the first query, no query will be asked which could be answered ⊥. If the first query is answered ⊥, then returning 0 is correct, since P_χ < θ_0 = α.

When P̂_i is returned, we know that P̂_i ≤ 2θ_i(1+μ_i), which implies that P_χ ≤ P̂_i/(1-μ_i) ≤ 2θ_i(1+μ_i)/(1-μ_i). But the additive difference between P̂_i and P_χ is guaranteed to be no more than μ_iP_χ ≤ 2μ_iθ_i(1+μ_i)/(1-μ_i) ≤ 6μ_iθ_i = α (using μ_i ≤ 1/6), and therefore the returned value is sufficiently accurate.

The search ends by the time θ_i is large enough to force P̂_i ≤ 2θ_i(1+μ_i). Since P̂_i ≤ P_χ(1+μ_i), this condition is guaranteed whenever θ_i ≥ P_χ/2, and since θ_0 = α and the thresholds double, this happens within O(log(P_χ/α) + 1) queries. Therefore the maximum number of queries made is O(log(P_χ/α) + 1), the largest threshold used is at most max(α, P_χ), and the worst case (smallest) relative error is μ_i = α/(6θ_i) = Ω(α/(P_χ + α)). □
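A sketch (ours, using the constants of the reconstruction above) of this search; rel_stat is an assumed helper that answers a relative error query [χ, μ, θ], returning None in place of the symbol ⊥.

```python
def additive_query_via_relative_queries(rel_stat, chi, alpha):
    """Answer an additive error query [chi, alpha] using relative error
    queries, doubling the threshold starting from alpha."""
    theta, mu = alpha, 1.0 / 6.0
    while True:
        estimate = rel_stat(chi, mu, theta)
        if estimate is None:              # the oracle returned "bottom"
            return 0.0                    # only possible on the first query
        if estimate <= 2 * theta * (1 + mu):
            return estimate               # additive error is at most alpha
        theta, mu = 2 * theta, mu / 2     # P_chi is larger; keep searching
```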

4.2 A Natural Example of Relative Error SQ

In this section we examine a learning problem which has both a simple additive error SQ algorithm and a simple relative error SQ algorithm. We consider the problem of learning a monotone conjunction of Boolean variables, in which the learning algorithm must determine which subset of the variables {x_1, ..., x_n} is contained in the unknown target conjunction f.

We construct an hypothesis h which contains all the variables in the target function f, and thus h will not misclassify any negative examples. We further guarantee that, for each variable x_i in h, the distribution weight of examples which satisfy x_i = 0 and f(x) = 1 is less than ε/n. Therefore the distribution weight of positive examples which h will misclassify is at most ε. Such an hypothesis has error rate at most ε.

Consider the following query: χ_i(x, l) = [x_i = 0 ∧ l = 1]. P_{χ_i} is simply the probability that x_i is false and f(x) is true. If variable x_i is in f, then P_{χ_i} = 0. If we mistakenly include a variable x_i in our hypothesis which is not in f, then the error due to this inclusion is at most P_{χ_i}. We simply construct our hypothesis to contain all target variables, but no variables x_i for which P_{χ_i} ≥ ε/n.

An additive error SQ algorithm queries each χ_i with additive error ε/(2n) and includes all variables for which the estimate P̂_{χ_i} ≤ ε/(2n). Even if P_{χ_i} = 0, the oracle is constrained to return an estimate with additive error less than ε/(2n). A relative error SQ algorithm queries each χ_i with constant relative error (say 1/2) and threshold ε/(2n), and includes all variables for which the response is ⊥ (or an estimate of 0).
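A sketch (ours) of the relative error SQ algorithm for monotone conjunctions; rel_stat is an assumed helper that answers a query with the given relative error and threshold, returning None for ⊥.

```python
def learn_monotone_conjunction(rel_stat, n, epsilon):
    """Return the set of variable indices kept in the hypothesis: a variable
    x_i is kept exactly when Pr[x_i = 0 and f(x) = 1] is (nearly) zero."""
    kept = []
    for i in range(n):
        chi_i = lambda x, l, i=i: int(x[i] == 0 and l == 1)
        answer = rel_stat(chi_i, 0.5, epsilon / (2 * n))
        if answer is None or answer == 0.0:   # P_chi_i < eps/(2n) or exactly 0
            kept.append(i)
        # otherwise P_chi_i > 0, so x_i is not in the target conjunction
    return kept
```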

The sample complexity of the standard noise-free PAC simulation of additive error SQ algorithms depends linearly on 1/α², while in Section 4.4 we show that the sample complexity of a noise-free PAC simulation of relative error SQ algorithms depends linearly on 1/(μ²θ). Note that in the above algorithms for learning conjunctions, 1/α² = Θ(n²/ε²) while 1/(μ²θ) = Θ(n/ε). We further note that the relative error μ is constant for learning conjunctions. We show in Section 4.3 that no learning problem requires μ to depend on ε, and in Section 4.5 that μ is actually a constant in many algorithms.

4.3 General Bounds on Relative Error SQ Learning

In this section we prove general upper bounds on the complexity of relative error statistical query learning. We do so by applying boosting techniques, and specifically these techniques as applied in the statistical query model. We first prove some useful lemmas which allow us to decompose relative error estimates of ratios and sums.

Lemma 3: Let a = b/c, where 0 ≤ a ≤ 1, 0 ≤ b ≤ 1 and 0 < c ≤ 1, and suppose an estimate of a is desired with relative error μ ≤ 1 and threshold τ, provided that c ≥ θ (when c < θ, no estimate of a is required). Then it is sufficient to estimate c with relative error μ/4 and threshold θ, and b with relative error μ/4 and threshold τθ/2.

Proof: If the estimate ĉ is ⊥, then c < θ; an estimate for a is not required, so we may halt. Otherwise, ĉ is within a (1 ± μ/4) factor of c. If the estimate b̂ is ⊥, then b < τθ/2, so whenever c ≥ θ we have a = b/c < τ/2 < τ, and we may answer ⊥. Otherwise, b̂ and ĉ are estimates of b and c, each within a (1 ± μ/4) factor, and it is easy to show that b̂/ĉ is within a (1 ± μ) factor of a. □

Lemma 4: Let s = Σ_i p_i z_i, where the p_i ≥ 0 are known, Σ_i p_i ≤ 1, s ≤ 1 and 0 ≤ z_i ≤ 1. If an estimate of s is desired with relative error μ ≤ 1 and threshold τ, then it is sufficient to estimate each z_i with relative error μ/3 and threshold μτ/4.

Proof: Let B = {i : the estimate of z_i is ⊥} and E = {i : the estimate of z_i is a value ẑ_i}, and let s_B = Σ_{i∈B} p_i z_i and s_E = Σ_{i∈E} p_i z_i, so that s = s_B + s_E. Let ŝ_E = Σ_{i∈E} p_i ẑ_i. If ŝ_E < τ/2 then we return ⊥; otherwise we return ŝ_E.

If ŝ_E < τ/2, then s_E ≤ ŝ_E/(1-μ/3) < 3τ/4. Since z_i < μτ/4 for every i ∈ B, we also have s_B ≤ (μτ/4)·Σ_{i∈B} p_i ≤ μτ/4, so s = s_E + s_B < 3τ/4 + τ/4 = τ, and we are correct in returning ⊥.

Otherwise we return ŝ_E, which is at least τ/2. If B = ∅, then ŝ_E is within a (1 ± μ/3), and therefore a (1 ± μ), factor of s. Otherwise, we are implicitly setting ẑ_i = 0 for each i ∈ B, and it is enough to show that the combined error |ŝ_E - s_E| + s_B is at most μs. Since ŝ_E ≥ τ/2, we have s ≥ s_E ≥ ŝ_E/(1+μ/3) ≥ 3τ/8, and therefore

$$|\hat s_E - s_E| + s_B \;\le\; \frac{\mu}{3}\,s_E + \frac{\mu\tau}{4} \;\le\; \frac{\mu}{3}\,s + \frac{2\mu}{3}\,s \;=\; \mu s. \qquad \Box$$
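The ratio decomposition can be read as a small combinator: given relative error estimators for the pieces, it produces a valid answer for the composite quantity. The sketch below is ours and uses the constants of the reconstructed Lemma 3; estimate_b and estimate_c are assumed helpers that return None for ⊥.

```python
def estimate_ratio(estimate_b, estimate_c, mu, tau, theta):
    """Estimate a = b/c with relative error mu and threshold tau, given that
    an answer is only required when c >= theta (cf. Lemma 3).
    estimate_b(mu', theta') and estimate_c(mu', theta') return relative error
    estimates of b and c, or None in place of 'bottom'."""
    c_hat = estimate_c(mu / 4.0, theta)
    if c_hat is None:                 # c < theta: no estimate required
        return None
    b_hat = estimate_b(mu / 4.0, tau * theta / 2.0)
    if b_hat is None:                 # b < tau*theta/2, hence a < tau
        return None
    return b_hat / c_hat              # within a (1 +- mu) factor of a
```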

Theorem 4: If a concept class F is SQ learnable by algorithm A, then F is SQ learnable with O(N log²(1/ε)) queries, each of relative error Ω(μ) and threshold Ω(μθε/log(1/ε)). Here N, μ and θ are the number of queries, worst case relative error and worst case threshold, respectively, of algorithm A run with a constant accuracy parameter. Note that N, μ and θ are independent of ε.

Proof: Aslam and Decatur show that, given an SQ learning algorithm A, one can construct a very efficient SQ algorithm by combining the outputs of O(log(1/ε)) runs of A. Each run of A is made with respect to a different distribution D_i and uses a constant accuracy parameter. Each run makes at most N queries, each with relative error no smaller than μ and threshold no smaller than θ.

In run i, the algorithm makes queries of the form STAT(f, D_i)[χ(x, f(x)), ·], where D_i is a distribution based on D. Since the learning algorithm only has access to a statistics oracle for D, they show that a query with respect to D_i may be written in terms of new queries with respect to D as follows:

$$\mathrm{STAT}(f,D_i)[\chi(x,f(x))] \;=\; \frac{\sum_{j} \tilde w_j \cdot \mathrm{STAT}(f,D)[\chi(x,f(x)) \wedge k_i(x)=j]}{\sum_{j} \tilde w_j \cdot \mathrm{STAT}(f,D)[k_i(x)=j]} \qquad (9)$$

In the above equation, k_i(x) is the number of previously constructed weak hypotheses which are correct on x, the index j ranges over 0, ..., i-1, and the weights w̃_j ∈ [0,1] are known quantities determined by the boosting scheme. It is also the case that if the denominator of Equation (9) is less than a fixed, known threshold (a quantity which is Ω(ε) up to logarithmic factors), then the query need not be estimated. Using Lemmas 3 and 4, it is easy to show that a query to STAT(f, D_i) with relative error μ and threshold θ can be estimated by making queries to STAT(f, D) with relative error Ω(μ) and threshold Ω(μθε/log(1/ε)). Since a query with respect to D_i requires O(i) queries with respect to D, the total number of queries made is O(N log²(1/ε)). □

4.4 Simulating Relative Error SQ Algorithms

In this section we give the complexity of simulating relative error SQ algorithms in the PAC model, both in the absence and presence of noise. We also give general upper bounds on the complexity of PAC algorithms derived from SQ algorithms, based on the simulations and the general upper bounds of Theorem 4.

The simulation of relative error SQ algorithms in the noise-free PAC model is based on a standard Chernoff bound analysis. We obtain the following theorem on the sample complexity of such a simulation. The proof of this theorem is identical to the proof of Theorem 7 below if, in the latter proof, the malicious error rate is set to 0.

Theorem 5: If F is learnable by a statistical query algorithm which makes N queries from query space Q, with worst case relative error μ and worst case threshold θ, then F is PAC learnable with sample complexity O((1/(μ²θ)) log(|Q|/δ)) when Q is finite, or O((N/(μ²θ)) log(N/δ)) when drawing a separate sample for each query.

Corollary 1: If F is SQ learnable, then F is PAC learnable with a sample complexity whose dependence on ε is Õ(1/ε).
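The noise-free simulation behind Theorem 5 amounts to returning an empirical average, with ⊥ returned when the average falls below a constant fraction of the threshold. The following sketch is ours; the sample-size constant is indicative of the O((1/(μ²θ)) log(1/δ)) bound rather than exact.

```python
import math

def simulate_rel_stat(ex_oracle, chi, mu, theta, delta):
    """Simulate a relative error query [chi, mu, theta] from the noise-free
    example oracle: draw O((1/(mu^2 theta)) log(1/delta)) examples, return the
    empirical average, or None (for 'bottom') when it is below theta/2."""
    m = int(math.ceil((12.0 / (mu ** 2 * theta)) * math.log(2.0 / delta)))
    total = sum(chi(*ex_oracle()) for _ in range(m))
    p_hat = total / m
    return None if p_hat < theta / 2.0 else p_hat
```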

Although one could use boosting techniques in the PAC model to achieve this nearly optimal sample complexity, these boosting techniques would result in a more complicated algorithm and output hypothesis (a circuit whose inputs were hypotheses from the original hypothesis class). If instead we have a relative error SQ algorithm meeting the bounds of Theorem 4, then we achieve this PAC sample complexity directly.

For SQ simulations in the classification noise model, we achieve the sample complexity given in Theorem 6 below. This sample complexity is essentially identical to that of the simulation of an additive error SQ algorithm for which α = μθ, and in fact this is one way of proving the result. Although this result does not improve the sample complexity of SQ simulations in the presence of classification noise, we believe that any improvement upon this bound will require the use of relative error statistical queries, for the reasons discussed in the introduction to Section 4.

Theorem 6: If F is learnable by a statistical query algorithm which makes N queries from query space Q, with worst case relative error μ and worst case threshold θ, then F is PAC learnable in the presence of classification noise. If η ≤ η_b < 1/2 is an upper bound on the noise rate, then the sample complexity required is

$$O\!\left(\frac{1}{\mu^2\theta^2(1-2\eta_b)^2}\left(\log\frac{|Q|}{\delta} + \log\!\left(\frac{1}{\mu\theta}\log\frac{1}{1-2\eta_b}\right)\right)\right)$$

when Q is finite, or

$$O\!\left(\frac{N}{\mu^2\theta^2(1-2\eta_b)^2}\left(\log\frac{N}{\delta} + \log\!\left(\frac{1}{\mu\theta}\log\frac{1}{1-2\eta_b}\right)\right)\right)$$

when drawing a separate sample for each query.

Corollary 2: If F is SQ learnable, then F is PAC learnable in the presence of classification noise. The dependence of the required sample complexity on ε and 1-2η_b is Õ(1/(ε²(1-2η_b)²)).

We next consider the simulation of relative error SQ algorithms in the presence of malicious errors. Decatur showed that an SQ algorithm can be simulated in the presence of malicious errors with a maximum allowable error rate which depends on the smallest additive error required by the SQ algorithm. In Theorem 7 we show that a relative error SQ algorithm can be simulated in the presence of malicious errors with a maximum allowable error rate and sample complexity which depend on the minimum relative error and threshold required by the SQ algorithm.

The key to this malicious error tolerant simulation is to draw a large enough sample such that, for each query, the combined error in an estimate due to both the adversary and the statistical fluctuation on error-free examples is less than the accuracy required for that query. We make use of the following lemmas.

Lemma 5 (Chernoff Bounds): Let X_1, ..., X_m be independent Bernoulli random variables, each of whose expectation is p, and let Y = (1/m) Σ_i X_i. If 0 < γ ≤ 1 and m ≥ (3/(γ²p)) ln(2/δ), then p(1-γ) ≤ Y ≤ p(1+γ) with probability at least 1-δ.

Lemma 6: Let P' be the fraction of examples satisfying χ in a noise-free sample of size m, and let P̂ be the fraction of examples satisfying χ in a sample of size m drawn from EX_MAL^β(f, D). Then, to ensure with high probability that |P̂ - P_χ| ≤ Δ_1 + Δ_2, it is sufficient to draw a sample of size m which, with high probability, simultaneously ensures both:

1. The adversary corrupts at most a Δ_1 fraction of the examples drawn from EX_MAL^β(f, D).

2. |P' - P_χ| ≤ Δ_2.

Proof: Let m be a sample size large enough to ensure that, with high probability, both conditions hold. Consider a sample of size m drawn from a malicious error oracle in which the adversary decided not to corrupt any examples for which it was given the opportunity. Then, by Condition 2, with high probability the fraction of examples satisfying χ on this sample is within Δ_2 of P_χ. But by Condition 1, with high probability the adversary may change the empirical fraction of examples satisfying χ by no more than Δ_1. Thus the empirical fraction of examples is within Δ_1 + Δ_2 of P_χ. □

Theorem 7: If F is learnable by a statistical query algorithm which makes N queries from query space Q, with worst case relative error μ and worst case threshold θ, then F is PAC learnable in the presence of malicious errors. The maximum allowable error rate is Ω(μθ), and the sample complexity required is O((1/(μ²θ)) log(|Q|/δ)) when Q is finite, or O((N/(μ²θ)) log(N/δ)) when drawing a separate sample for each query.

Proof: We first analyze the tolerable error and sample complexity for simulating a single query, and then determine these values for simulating the entire algorithm.

We assume that μ ≤ 1 and that the malicious error rate satisfies β ≤ μθ/16. For a given query [χ, μ, θ], P_χ is the probability with respect to the noise-free example oracle which needs to be estimated with relative error μ whenever P_χ ≥ θ. Let S be a sample of size m = (c_1/(μ²θ)) ln(c_2/δ') drawn from EX_MAL^β(f, D), for appropriately chosen constants c_1 and c_2. Note that this sample size is sufficient to ensure, with high probability, that the adversary does not corrupt more than a μθ/8 fraction of the examples.

We first show that this sample is large enough to confidently say that either P_χ < θ or P_χ ≥ θ/4. Let P' be the fraction of examples satisfying χ in a noise-free sample of size m. With high probability, P' is within a (1 ± μ/4) factor of P_χ when P_χ ≥ θ/4, and P' ≤ P_χ + θ/8 when P_χ < θ/4.

Let P̂ be the fraction of examples satisfying χ in the sample of size m drawn from EX_MAL^β(f, D). With high probability, the adversary corrupts no more than a μθ/8 fraction of the examples, so by Lemma 6 the same statements hold for P̂ with the deviations increased by μθ/8.

Thus, if P̂ < θ/2, then P_χ < θ and ⊥ can be returned: indeed, if P_χ were at least θ, then P̂ ≥ P_χ(1-μ/4) - μθ/8 ≥ θ(3/4 - 1/8) > θ/2. Otherwise P̂ ≥ θ/2, in which case P_χ ≥ θ/4 (if P_χ were less than θ/4, then P̂ ≤ P_χ + θ/8 + μθ/8 < θ/2), and P̂ is returned; we must show that P̂ is within a (1 ± μ) factor of P_χ. Since P_χ ≥ θ/4, the sample ensures with high probability that P' is within μP_χ/4 of P_χ, and the adversary corrupts no more than a μθ/8 ≤ μP_χ/2 fraction of the examples. Therefore Lemma 6 ensures, with high probability, that P̂ is within μP_χ of P_χ, i.e., P̂ is within a (1 ± μ) factor of P_χ.

The total sample complexity required when drawing a separate sample for each query follows directly from the above analysis. The total sample complexity required when using a finite query space follows by noting that, on a single sample of the stated size, the noise-free empirical estimates of all queries converge to their true probabilities simultaneously, and the adversary corrupts no more than a μθ/8 fraction of the examples. □

Corollary 3: If F is SQ learnable, then F is PAC learnable in the presence of malicious errors. The dependence on ε of the maximum allowable error rate is Ω̃(ε), while the dependence on ε of the required sample complexity is Õ(1/ε).

Note that we are within logarithmic factors of both the O(ε) maximum allowable malicious error rate and the Ω(1/ε) dependence of the lower bound on the sample complexity for noise-free PAC learning. In this malicious error tolerant PAC simulation, the sample, time, space and hypothesis size complexities are asymptotically identical to the corresponding complexities in our noise-free PAC simulation.

4.5 Very Efficient Malicious Error Learning

In the previous sections we showed general upper bounds on relative error SQ algorithms and on the efficiency of PAC algorithms derived from them. In this section we describe relative error SQ algorithms which actually achieve these bounds and therefore have very efficient malicious error tolerant PAC simulations. We first present a very efficient algorithm for learning conjunctions in the presence of malicious errors when there are many irrelevant attributes. We then highlight a property of this SQ algorithm which allows for its efficiency, and we further show that many other SQ algorithms naturally exhibit this property as well. We can simulate these SQ algorithms in the PAC model with malicious errors with roughly optimal malicious error tolerance and sample complexity.

Decatur gives an algorithm for learning conjunctions which tolerates a malicious error rate independent of the number of irrelevant attributes, thus depending only on the number of relevant attributes and the desired accuracy. This algorithm, while reasonably efficient, is based on an additive error SQ algorithm of Kearns and therefore does not have an optimal sample complexity.

We present an algorithm based on relative error statistical queries which tolerates the same malicious error rate and has a sample complexity whose dependence on ε roughly matches the general lower bound for noise-free PAC learning.

Theorem 8: The class of conjunctions of size k over n variables is PAC learnable with malicious errors. The maximum allowable malicious error rate is Ω̃(ε/k), and the sample complexity required is

$$O\!\left(\frac{k\log(1/\epsilon)}{\epsilon}\left(k\,\log\frac{1}{\epsilon}\,\log n + \log\frac{1}{\delta}\right)\right).$$

Proof: We present a proof for learning monotone conjunctions of size k, and we note that this proof can easily be extended to learning non-monotone conjunctions of size k. (By duality, identical results also hold for learning disjunctions.)

The target function f is a conjunction of k variables. We construct an hypothesis h which is a conjunction of r = O(k log(1/ε)) variables such that the distribution weight of misclassified positive examples is at most ε/2 and the distribution weight of misclassified negative examples is also at most ε/2.

First, all variables which could contribute more than ε/(2r) error on the positive examples are eliminated from consideration. This is accomplished by using the same queries that the monotone conjunction SQ algorithm of Section 4.2 uses. The queries are asked with constant relative error and threshold ε/(4r), and a variable remains under consideration only if the response is ⊥ (or an estimate of 0).

Next, the negative examples are greedily covered so that the distribution weight of misclassified negative examples is no more than ε/2. We say that a variable covers all negative examples for which this variable is false. We know that the set of variables of f is a cover, of size k, for the entire space of negative examples. We iteratively construct h by conjoining new variables such that the distribution weight of negative examples covered by each new variable is at least a 1/(2k) fraction of the distribution weight of negative examples remaining to be covered.

Given a partially constructed hypothesis h_j = x_{i_1} ∧ x_{i_2} ∧ ... ∧ x_{i_j}, let X_j be the set of negative examples not covered by h_j, i.e., X_j = {x : f(x) = 0 ∧ h_j(x) = 1}. Let D_j be the conditional distribution on X_j induced by D, i.e., for any x ∈ X_j, D_j(x) = D(x)/D(X_j). By definition, X_0 is the space of negative examples and D_0 is the conditional distribution on X_0. We know that the target variables not yet in h_j cover the remaining examples in X_j, and therefore there exists a cover of X_j of size at most k. Thus there exists at least one variable which covers a set of negative examples in X_j whose distribution weight with respect to D_j is at least 1/k.

Given h_j, for each variable x_i let χ_{ji}(x, l) = [A | B] = [x_i = 0 | l = 0 ∧ h_j(x) = 1]. Note that P_{χ_{ji}} is the distribution weight, with respect to D_j, of negative examples in X_j covered by x_i. Thus there exists a variable x_i such that P_{χ_{ji}} ≥ 1/k. To find such a variable, we ask queries of the above form with relative error 1/3 and threshold 1/(2k). Note that this is a query for a conditional probability, which must be determined by the ratio of two unconditional probabilities; we show how to do this below. Since there exists a variable x_i with P_{χ_{ji}} ≥ 1/k, we are guaranteed to find some variable x_{i'} such that the estimate P̂_{χ_{ji'}} is at least 2/(3k). Note that if P̂_{χ_{ji'}} ≥ 2/(3k), then P_{χ_{ji'}} ≥ 1/(2k). Thus, by conjoining x_{i'} to h_j, we are guaranteed to cover a set of negative examples in X_j whose distribution weight with respect to D_j is at least 1/(2k). Since the distribution weight of uncovered negative examples is reduced by at least a (1 - 1/(2k)) factor in each iteration, it is easy to show that this method requires no more than r = O(k log(1/ε)) iterations to cover all but a set of negative examples whose distribution weight with respect to D_0, and therefore with respect to D, is at most ε/2.

We now show how to estimate the conditional probability query [A | B] with relative error 1/3 and threshold 1/(2k). We estimate both queries which constitute the standard expansion of the conditional probability. Appealing to Lemma 3, we first estimate Pr[B], the probability that an example is negative and is not covered by h_j, using relative error 1/12 and threshold ε/2. If this estimate is ⊥, then the weight of negative examples misclassified by h_j is at most ε/2, so we halt and output h_j. Otherwise we have a lower bound of roughly ε/2 on this probability, and we can therefore estimate Pr[A ∧ B] with relative error 1/12 and threshold ε/(8k).

For this algorithm, the worst case relative error is a constant, the worst case threshold is Θ(ε/(k log(1/ε))), and log |Q| = O(k log(1/ε) log n). Therefore the theorem follows from Theorem 7. □
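A sketch (ours) of the covering phase: each round estimates the conditional probability via the ratio decomposition and conjoins a variable covering roughly a 1/(2k) fraction of the remaining negative weight. rel_stat (the assumed relative error oracle over D), the candidate variable set and the round limit are assumed inputs.

```python
def greedy_cover_negatives(rel_stat, candidates, k, epsilon, max_rounds):
    """Greedily conjoin variables until the uncovered negative weight is
    below epsilon/2 (cf. the covering phase of the conjunction algorithm)."""
    hypothesis = []                        # indices conjoined so far

    def satisfies(x, indices):
        return all(x[i] == 1 for i in indices)

    for _ in range(max_rounds):            # max_rounds = O(k log(1/epsilon))
        chosen = tuple(hypothesis)
        uncovered = lambda x, l: int(l == 0 and satisfies(x, chosen))
        weight = rel_stat(uncovered, 1.0 / 12, epsilon / 2)
        if weight is None:                 # uncovered negative weight < eps/2
            break
        for i in candidates:
            if i in hypothesis:
                continue
            covered = lambda x, l, i=i: int(l == 0 and x[i] == 0
                                            and satisfies(x, chosen))
            est = rel_stat(covered, 1.0 / 12, epsilon / (8 * k))
            if est is not None and est / weight >= 2.0 / (3 * k):
                hypothesis.append(i)       # covers >= ~1/(2k) of what remains
                break
    return hypothesis
```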

An important property of this statistical query algorithm is that, for every query, we need only determine whether P_χ falls below some threshold or above some constant fraction of that threshold. This allows the relative error parameter to be a constant. The learning algorithm described in Section 4.2 for monotone conjunctions has this property, and we note that many other learning problems which involve covering, such as learning axis-parallel rectangles and decision lists, also have this property. In all these cases we obtain very efficient malicious error tolerant algorithms.

5 Probabilistic and Real-Valued Queries

Throughout this paper we have assumed that queries submitted to the statistical query oracle were restricted to being deterministic, {0,1}-valued functions of labelled examples. In this case, the oracle returned an estimate of the probability that χ(x, f(x)) = 1 on an example x chosen randomly according to D.

We now generalize the SQ model to allow algorithms to submit queries which are probabilistic and real-valued. Formally, we define a probabilistic, real-valued query to be a χ such that, for every labelled example ⟨x, l⟩, χ(x, l) is a random variable whose range is [0, M] and whose expectation exists. We define P_χ to be the expected value of χ, where the expectation is taken over both the draw of the example and the random value of χ given the labelled example. Note that under these conditions P_χ always exists. (The range [0, M] is used so that we can show relative error simulations; for additive error, one may consider any interval [a, a+M] and simply translate.)

This generalization can be quite useful. If the learning algorithm requires the expected value of some function of labelled examples, it may simply specify this using a real-valued query. Probabilistic queries are required when applying boosting techniques to weak learning algorithms which output probabilistic hypotheses. (A weak learning algorithm is one which outputs an hypothesis whose accuracy is just slightly better than random guessing.) When applying boosting techniques, the queries constructed are functions of weak hypotheses; thus, if these weak hypotheses are probabilistic, the corresponding queries are also probabilistic. Goldman, Kearns and Schapire show that, by allowing a weak learning algorithm to output a probabilistic hypothesis, the complexity of learning is reduced. This generalization therefore gives the algorithm designer more freedom and power. Furthermore, the ability to efficiently simulate these algorithms in the PAC model, in both the absence and presence of noise, is retained, as shown below.
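For instance, when a weak hypothesis h is randomized, the query "does h agree with the label" is naturally a probabilistic {0,1}-valued query: each evaluation of χ involves a coin flip of h itself. A minimal sketch (ours, with illustrative names):

```python
import random

def make_agreement_query(probabilistic_hypothesis):
    """chi(x, l) is 1 if the (randomized) hypothesis outputs l on x, else 0;
    chi is a probabilistic {0,1}-valued query and P_chi is the accuracy of h."""
    def chi(x, l):
        return int(probabilistic_hypothesis(x) == l)
    return chi

def estimate_probabilistic_query(ex_oracle, chi, m):
    """Average m independent evaluations of chi on fresh labelled examples;
    the Hoeffding- and Chernoff-style bounds below control the deviation."""
    return sum(chi(*ex_oracle()) for _ in range(m)) / m

# Example of a probabilistic weak hypothesis: predict 1 with probability 0.7.
# h = lambda x: 1 if random.random() < 0.7 else 0
# chi = make_agreement_query(h)
```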

Since the expectation P_χ always exists, the results given below (Theorems 9 through 14) may be proven identically to the respective deterministic, {0,1}-valued cases, by simply applying Hoeffding and Chernoff style bounds for bounded, real-valued random variables (Lemmas 7 and 8). Note that when the range of the queries is a unit sized interval, i.e., M = 1, these sample complexities and noise tolerances are identical to those for deterministic {0,1}-valued queries.

Lemma 7 (Hoeffding; see McDiarmid): Let X_1, ..., X_m be independent, identically distributed random variables taking real values in the range [a, a+M], and let Y = (1/m) Σ_i X_i. Then for any t ≥ 0,

$$\Pr[\,|Y - E[Y]| \ge t\,] \;\le\; 2e^{-2mt^2/M^2}.$$

Lemma 8 (see Angluin and Valiant): Let X_1, ..., X_m be independent, identically distributed random variables taking real values in the range [0,1]. Let Y = (1/m) Σ_i X_i and p = E[Y]. Then for any 0 ≤ γ ≤ 1,

$$\Pr[\,Y \ge (1+\gamma)p\,] \;\le\; e^{-mp\gamma^2/3}$$
$$\Pr[\,Y \le (1-\gamma)p\,] \;\le\; e^{-mp\gamma^2/2}.$$

Theorem 9: If F is learnable by a statistical query algorithm which makes N probabilistic, [0, M]-valued queries from query space Q with worst case additive error α, then F is PAC learnable with sample complexity O((M²/α²) log(|Q|/δ)) when Q is finite, or O((NM²/α²) log(N/δ)) when drawing a separate sample for each query.

Theorem 10: If F is learnable by a statistical query algorithm which makes N probabilistic, [0, M]-valued queries from query space Q with worst case relative error μ and worst case threshold θ, then F is PAC learnable with sample complexity O((M/(μ²θ)) log(|Q|/δ)) when Q is finite, or O((NM/(μ²θ)) log(N/δ)) when drawing a separate sample for each query.

Theorem 11: If F is learnable by a statistical query algorithm which makes N probabilistic, [0, M]-valued queries from query space Q with worst case additive error α, then F is PAC learnable in the presence of classification noise. If η ≤ η_b < 1/2 is an upper bound on the noise rate, then the sample complexity required is

$$O\!\left(\frac{M^2}{\alpha^2(1-2\eta_b)^2}\left(\log\frac{|Q|}{\delta} + \log\!\left(\frac{1}{\alpha}\log\frac{1}{1-2\eta_b}\right)\right)\right)$$

when Q is finite, or

$$O\!\left(\frac{NM^2}{\alpha^2(1-2\eta_b)^2}\left(\log\frac{N}{\delta} + \log\!\left(\frac{1}{\alpha}\log\frac{1}{1-2\eta_b}\right)\right)\right)$$

when drawing a separate sample for each query.

Theorem 12: If F is learnable by a statistical query algorithm which makes N probabilistic, [0, M]-valued queries from query space Q with worst case relative error μ and worst case threshold θ, then F is PAC learnable in the presence of classification noise. If η ≤ η_b < 1/2 is an upper bound on the noise rate, then the sample complexity required is

$$O\!\left(\frac{M^2}{\mu^2\theta^2(1-2\eta_b)^2}\left(\log\frac{|Q|}{\delta} + \log\!\left(\frac{1}{\mu\theta}\log\frac{1}{1-2\eta_b}\right)\right)\right)$$

when Q is finite, or

$$O\!\left(\frac{NM^2}{\mu^2\theta^2(1-2\eta_b)^2}\left(\log\frac{N}{\delta} + \log\!\left(\frac{1}{\mu\theta}\log\frac{1}{1-2\eta_b}\right)\right)\right)$$

when drawing a separate sample for each query.

Theorem 13: If F is learnable by a statistical query algorithm which makes N probabilistic, [0, M]-valued queries from query space Q with worst case additive error α, then F is PAC learnable in the presence of malicious errors. The maximum allowable error rate is Ω(α/M), and the sample complexity required is O((M²/α²) log(|Q|/δ)) when Q is finite, or O((NM²/α²) log(N/δ)) when drawing a separate sample for each query.

Theorem 14: If F is learnable by a statistical query algorithm which makes N probabilistic, [0, M]-valued queries from query space Q with worst case relative error μ and worst case threshold θ, then F is PAC learnable in the presence of malicious errors. The maximum allowable error rate is Ω(μθ/M), and the sample complexity required is O((M/(μ²θ)) log(|Q|/δ)) when Q is finite, or O((NM/(μ²θ)) log(N/δ)) when drawing a separate sample for each query.

6 Open Questions

The question of what sample complexity is required to simulate statistical query algorithms in the presence of classification noise remains open. The current simulations of both additive and relative error SQ algorithms yield PAC algorithms whose sample complexities depend quadratically on 1/ε. However, in the absence of computational restrictions, all finite concept classes can be learned in the presence of classification noise using a sample complexity which depends only linearly on 1/ε.

As discussed in Section 4.5, many classes which are SQ learnable have algorithms with a constant worst case relative error. Can one show that all classes which are SQ learnable have algorithms with this property, or instead characterize exactly which classes do?

References

Dana Angluin. Computational learning theory: Survey and selected bibliography. In Proceedings of the 24th Annual ACM Symposium on the Theory of Computing, 1992.

Dana Angluin and Philip Laird. Learning from noisy examples. Machine Learning, 2(4), 1988.

Dana Angluin and Leslie G. Valiant. Fast probabilistic algorithms for Hamiltonian circuits and matchings. Journal of Computer and System Sciences, April 1979.

Javed Aslam and Scott Decatur. General bounds on statistical query learning and PAC learning with noise via hypothesis boosting. In Proceedings of the 34th Annual Symposium on Foundations of Computer Science, November 1993.

Avrim Blum, Merrick Furst, Jeffrey Jackson, Michael Kearns, Yishay Mansour, and Steven Rudich. Weakly learning DNF and characterizing statistical query learning using Fourier analysis. In Proceedings of the 26th Annual ACM Symposium on the Theory of Computing, May 1994.

Anselm Blumer, Andrzej Ehrenfeucht, David Haussler, and Manfred K. Warmuth. Learnability and the Vapnik-Chervonenkis dimension. Journal of the ACM, 36(4), 1989.

Scott Decatur. Statistical queries and faulty PAC oracles. In Proceedings of the Sixth Annual ACM Workshop on Computational Learning Theory. ACM Press, July 1993.

Andrzej Ehrenfeucht, David Haussler, Michael Kearns, and Leslie Valiant. A general lower bound on the number of examples needed for learning. Information and Computation, September 1989.

Yoav Freund. Boosting a weak learning algorithm by majority. In Proceedings of the Third Annual Workshop on Computational Learning Theory. Morgan Kaufmann, 1990.

Yoav Freund. An improved boosting algorithm and its implications on learning complexity. In Proceedings of the Fifth Annual ACM Workshop on Computational Learning Theory. ACM Press, 1992.

Sally A. Goldman, Michael J. Kearns, and Robert E. Schapire. On the sample complexity of weak learning. In Proceedings of the Third Annual Workshop on Computational Learning Theory. Morgan Kaufmann, 1990.

Michael Kearns. Efficient noise-tolerant learning from statistical queries. In Proceedings of the 25th Annual ACM Symposium on the Theory of Computing, San Diego, 1993.

Michael Kearns and Ming Li. Learning in the presence of malicious errors. In Proceedings of the 20th Annual ACM Symposium on Theory of Computing, Chicago, Illinois, May 1988.

Philip D. Laird. Learning from Good and Bad Data. Kluwer International Series in Engineering and Computer Science. Kluwer Academic Publishers, Boston, 1988.

Colin McDiarmid. On the method of bounded differences. In J. Siemons, editor, Surveys in Combinatorics, London Mathematical Society Lecture Note Series. Cambridge University Press, Cambridge, 1989.

Yasubumi Sakakibara. Algorithmic Learning of Formal Languages and Decision Trees. PhD thesis, Tokyo Institute of Technology, October 1991. International Institute for Advanced Study of Social Information Science, Fujitsu Laboratories Ltd., Research Report IIAS-RR.

Robert Schapire. The strength of weak learnability. Machine Learning, 5(2), 1990.

Hans Ulrich Simon. General bounds on the number of examples needed for learning probabilistic concepts. In Proceedings of the Sixth Annual ACM Workshop on Computational Learning Theory. ACM Press, 1993.

Leslie Valiant. A theory of the learnable. Communications of the ACM, 27(11), November 1984.

Leslie Valiant. Learning disjunctions of conjunctions. In Proceedings of the International Joint Conference on Artificial Intelligence, 1985.