A Practical Generalization of Fourier-based Learning

Adam Drake  [email protected]
Dan Ventura  [email protected]
Computer Science Department, Brigham Young University, Provo, UT 84602 USA

Abstract

This paper presents a search algorithm for finding functions that are highly correlated with an arbitrary set of data. The functions found by the search can be used to approximate the unknown function that generated the data. A special case of this approach is a method for learning Fourier representations. Empirical results demonstrate that on typical real-world problems the most highly correlated functions can be found very quickly, while combinations of these functions provide good approximations of the unknown function.

1. Introduction

The discrete Fourier transform converts a function into a unique spectral representation in which it is represented as a linear combination of Fourier basis functions. The ability to represent functions as a combination of basis functions led to the development of learning algorithms based on the Fourier transform. These Fourier-based learning algorithms, which have been used primarily in the field of computational learning theory, learn functions by approximating the coefficients of the most highly correlated basis functions.

The first Fourier-based learning algorithm was introduced by Linial, Mansour, and Nisan (1993). They presented an algorithm (hereafter referred to as the LMN algorithm) that learns functions by approximating the coefficients of the low-order basis functions. Given sufficient training examples drawn from a uniform distribution, the LMN algorithm can effectively learn any function whose spectral representation is concentrated on the low-order coefficients.

Another important Fourier-based learning algorithm was introduced by Kushilevitz and Mansour (1993). Their algorithm (hereafter referred to as the KM algorithm) does not require a function's spectral representation to be concentrated on the low-order coefficients. Instead, it recursively searches the space of Fourier basis functions to find, with high probability, the functions with the largest coefficients. It relies on a membership oracle (a black box that can be queried to learn the value of the function at any point) to efficiently carry out its search.

These algorithms have been successfully used to prove learnability results for many classes of problems. However, there has been little work in applying Fourier-based algorithms to real-world problems. The primary difficulty in applying Fourier techniques to real-world problems is that the number of Fourier basis functions, and the time required to compute the Fourier transform, is exponential in the number of inputs to a function. The LMN and KM algorithms avoid exponential complexity by imposing restrictions that limit real-world applicability.

The LMN algorithm avoids exponential complexity by approximating only the low-order coefficients. This restriction is undesirable because it limits the set of functions that can be effectively learned. Furthermore, there is generally nothing known about the spectral representation of a real-world learning problem, making it impossible to know beforehand whether the restriction to low-order coefficients is acceptable.

The KM algorithm avoids exponential complexity by relying on a membership oracle to guide its search for large coefficients. The requirement of a membership oracle greatly limits the applicability of the algorithm. Mansour and Sahar (2000) presented results of effectively applying the KM algorithm to a real-world problem for which a membership oracle exists. Unfortunately, however, the existence of a membership oracle is not typical of many learning scenarios.

The main result of this paper, which may be useful beyond the field of machine learning, is a new search algorithm that efficiently searches a space of functions to find those that are most highly correlated with an arbitrary set of data. A special case of this algorithm is a method for finding the most highly correlated Fourier basis functions. This search allows Fourier-based learning algorithms to overcome the limitations of previous approaches.

In addition to describing the search algorithm and demonstrating its effectiveness, results of learning real-world problems with the functions found by the search are presented. The first approach is a standard Fourier-based approach of learning a linear combination of Fourier basis functions. However, because the search algorithm can be used to find other highly correlated functions, the basic Fourier approach is generalized to allow other types of functions to be used in the linear combination. This extension, which is a departure from Fourier theory, has been found to be very useful in practice.
2. Definitions and Notation

Previous work in Fourier-based learning has been concerned with Boolean functions and the Fourier transform for Boolean functions, which is also known as a Walsh transform. The algorithms and results presented in this paper are also for Boolean functions, although many of the ideas presented could be applied to discrete-valued functions and the more general discrete Fourier transform.

Let f be a Boolean function of the form f : {0,1}^n → {1,−1}, where n is the number of inputs. There are 2^n Fourier basis functions for a function of n Boolean inputs, each indexed by a binary number α ∈ {0,1}^n. Each basis function χ_α is defined as follows:

  χ_α(x) = +1 if Σ_{i=0}^{n−1} x_i α_i is even, −1 if Σ_{i=0}^{n−1} x_i α_i is odd    (1)

where x ∈ {0,1}^n. Note that each basis function computes the parity (or logical XOR) of a subset of the inputs. Specifically, basis function χ_α computes the parity of those inputs x_i for which α_i = 1. There is one basis function for each possible subset of inputs.

The Fourier coefficients f̂(α) are defined by:

  f̂(α) = (1/2^n) Σ_{x=0}^{2^n−1} f(x) χ_α(x)    (2)

Thus, each coefficient is computed by taking the dot product of the outputs of functions f and χ_α and therefore measures the correlation between f and χ_α. In other words, basis functions that are highly correlated with f have coefficients with large absolute values. (A negative coefficient indicates an inverse correlation.)

Given the Fourier basis functions and coefficients as defined above, any Boolean function f can be represented as a linear combination of the basis functions:

  f(x) = Σ_{α=0}^{2^n−1} f̂(α) χ_α(x)    (3)

A Fourier-based learning algorithm can approximate the Fourier coefficients when the function is only partially known, as is the case in typical learning scenarios. Let X be a set of available training data, with x being a particular training example. Then the Fourier coefficients can be approximated by the following:

  f̃(α) = (1/|X|) Σ_{x∈X} f(x) χ_α(x)    (4)

As described previously, the search algorithm presented in this paper can find functions other than the Fourier basis functions. In particular, two additional types of Boolean functions are considered: functions that compute the conjunction (logical AND) of subsets of the inputs and functions that compute the disjunction (logical OR) of subsets of the inputs. These functions can be defined in a manner similar to the Fourier basis functions as follows:

  AND_α(x) = +1 if Σ_{i=0}^{n−1} x_i α_i = Σ_{i=0}^{n−1} α_i, −1 if Σ_{i=0}^{n−1} x_i α_i < Σ_{i=0}^{n−1} α_i    (5)

  OR_α(x) = +1 if Σ_{i=0}^{n−1} x_i α_i > 0, −1 if Σ_{i=0}^{n−1} x_i α_i = 0    (6)

Coefficients for the AND and OR functions can be computed in the same manner as shown in (4) simply by replacing χ_α with either AND_α or OR_α. These "coefficients" measure the correlation with the function f, but otherwise do not have the same meaning as the Fourier basis function coefficients. Neither the AND nor the OR functions form a basis for the space of Boolean functions, and the coefficients do not generally yield a linear combination for representing f. As each AND, OR, and parity function measures the correlation between itself and the training data, we refer to these functions as correlation functions.
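To make the definitions above concrete, the following Python sketch (our own illustration, not code from the paper) implements equations (1), (5), and (6) and estimates a coefficient from training data as in equation (4). The representation of inputs and labels as tuples of bits, and all of the function names, are assumptions made for the example.

def chi(alpha, x):
    """Fourier basis (parity/XOR) function of equation (1)."""
    return 1 if sum(a & b for a, b in zip(alpha, x)) % 2 == 0 else -1

def and_fn(alpha, x):
    """Conjunction function of equation (5): +1 only if every selected input is 1."""
    return 1 if sum(a & b for a, b in zip(alpha, x)) == sum(alpha) else -1

def or_fn(alpha, x):
    """Disjunction function of equation (6): +1 if any selected input is 1."""
    return 1 if sum(a & b for a, b in zip(alpha, x)) > 0 else -1

def empirical_coefficient(corr_fn, alpha, examples):
    """Estimate of a coefficient from a sample, as in equation (4).

    examples is a list of (x, fx) pairs, where x is a tuple of 0/1 inputs
    and fx is the +1/-1 output of the unknown function at x."""
    return sum(fx * corr_fn(alpha, x) for x, fx in examples) / len(examples)

For example, empirical_coefficient(chi, (1, 0, 1), data) estimates how strongly the parity of inputs 0 and 2 correlates with the labels in data; the same call with and_fn or or_fn yields the corresponding AND or OR correlation.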
3. A Best-first Search for Correlation Functions

This section presents the algorithm for finding correlation functions that are highly correlated with an arbitrary set of data. Empirical results show that although the search space is exponential in size, it is possible to find solutions while exploring only a fraction of the space.

3.1. The Algorithm

The algorithm uses a best-first search to explore the possible subsets of inputs over which a particular correlation function could be defined. The search space can be represented as a binary tree of nodes, n levels deep. Each node represents a partially or completely specified correlation function label. At the top of the tree is a node with a completely unspecified label. At each successive level of the tree one additional digit of the label is set, until finally at the leaf nodes the labels are completely specified. There are 2^n leaf nodes, one for each possible correlation function.

A best-first search is beneficial because as each digit of a correlation function label is set, some information can be obtained about the maximum coefficient of any correlation function whose label shares that digit value. By exploring nodes that could potentially lead to higher coefficients first, it is possible to find the most highly correlated functions while exploring only a fraction of the search space.

Consider the information gained when digit 0 of an AND function's label is set to 1. All AND functions for which α_0 = 1 include input x_0 in their calculation. Thus, any training examples for which x_0 = 0 will be classified as false. If there are two examples, x and x′, such that x_0 = x′_0 = 0 but f(x) ≠ f(x′), then no AND_α such that α_0 = 1 will be able to correctly classify both examples. In fact, they will correctly classify one and only one of them, regardless of what the other digits in the label are. Each conflicting pair of examples reduces the maximum possible coefficient value of functions for which α_0 = 1 by 2. (Technically, the coefficient value is reduced by 2/|X|, but for simplicity the normalization can be ignored during search.)

In describing the algorithm more precisely, some additional notation will be useful. As before, let X be the set of training examples, or known points of the target function f, with each training example being a pair of the form (x, f(x)), where x ∈ {0,1}^n and f(x) ∈ {−1,1}. Let α ∈ {0,1}^n be the label of some correlation function C, and let C_α be the correlation function with label α. As described in Section 2, the digits of α indicate the subset of inputs over which C_α is defined. Let β ∈ {0,1,∗}^n be a correlation function label in which some digits may be unspecified, with each ∗ indicating an unspecified digit. We shall say that α ⊆ β if for 0 ≤ i < n, α_i = β_i or β_i = ∗. Finally, let f̂_max(β) be the maximum coefficient of any correlation function C_α such that α ⊆ β.
The search algorithm is efficiently implemented with a priority queue of nodes, which sorts nodes in decreasing order of f̂_max(β). Each node contains a correlation function label β, a set of training examples X_β, and a variable count_β. The root node of the search is initialized with a completely unspecified correlation function (β = ∗^n), all of the training examples (X_β = X), and count_β = 0.

The algorithm begins by inserting the root node into a priority queue. Then, it enters a loop that continues until the desired number of correlation functions has been found. Each time through the loop, the node at the front of the queue is removed. If the label β of that node is completely specified, then β is added to the set of highly correlated functions. (Because the node with the highest value for f̂_max(β) is always at the front of the priority queue, a solution node at the front of the queue is certain to be at least as good as any other potential solution.) If β is not completely specified, then one of the unspecified digits is selected, and the two child nodes that result from setting that digit to 0 and 1 are created and inserted into the queue.

The value of f̂_max(β) can be computed for any internal node as follows:

  f̂_max(β) = abs(count_β) + |X_β|

where abs(count_β) is the absolute value of count_β. Each time a child node is created and given a label β, the examples of the parent node are checked to see if any pairs of examples cannot both be correctly classified by any C_α such that α ⊆ β. Any such pairs of examples are removed. Because the unnormalized coefficient of a correlation function is reduced by two for each pair of examples such that one and only one of them can be classified correctly, the number of examples removed in this way is equal to the reduction in f̂_max(β). When finding Fourier basis (XOR) functions, count_β is always zero. Its use when finding AND and OR functions is described later.

The value of f̂_max(β) is computed for leaf nodes by the following:

  f̂_max(β) = abs(count_β ± |X_β|)

where |X_β| is subtracted from count_β if X_β contains positive examples, and added to count_β if X_β contains negative examples. (At leaf nodes, there will never be both positive and negative examples in X_β.) The value of f̂_max(β) at a leaf node is equivalent to the absolute value of f̂(β).
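The two bounds above can be stated compactly in code. The following sketch is our own illustration (not the authors' implementation); it assumes a node is represented by its running total count_beta and its list of surviving (x, fx) examples.

def f_max_internal(count_beta, X_beta):
    """Bound for an internal node: every surviving example might still be
    classified correctly, so the unnormalized coefficient is at most
    abs(count_beta) + |X_beta|."""
    return abs(count_beta) + len(X_beta)

def f_max_leaf(count_beta, X_beta):
    """Value at a leaf, following the rule in the text: |X_beta| is
    subtracted from count_beta if the surviving examples are positive and
    added if they are negative (a leaf never holds both kinds)."""
    if X_beta and X_beta[0][1] > 0:
        return abs(count_beta - len(X_beta))
    return abs(count_beta + len(X_beta))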

FindCorrelationFunctions(X, numFunctionsToFind)
{
    initialize a node with β = ∗^n and X_β = X
    insert the initial node into the priority queue
    M ← {}
    while |M| < numFunctionsToFind
    {
        remove the node at the front of the queue
        if for that node β_i ≠ ∗ for 0 ≤ i < n, M ← M ∪ β
        else
        {
            determine which digit of β to set next
            create the 0 and 1 child nodes
            insert the nodes into the priority queue
        }
    }
    return M
}

Figure 1. General algorithm for finding the functions that are most highly correlated with an arbitrary set of data.

CreateZeroChild(Node parent, int i)
{
    Node child
    child.β ← parent.β
    child.β_i ← 0
    child.X_β ← parent.X_β
    child.count_β ← parent.count_β
    for each x, x′ ∈ child.X_β
        if f(x) ≠ f(x′) and ∀j (β_j ≠ ∗ ∨ x_j = x′_j)
            remove x and x′ from child.X_β
    return child
}

Figure 2. Method for creating the "0" child of a node. i is the digit of β to be set in the child.

The order in which digits of β are set is chosen with the intent of reducing the number of nodes that must be visited. The method used in our current implementation is to select the digit that minimizes the following:

  arg min_i ( min( f̂_max(β_i = 0), f̂_max(β_i = 1) ) )

in which f̂_max(β_i = 0) and f̂_max(β_i = 1) indicate the values of f̂_max(β) for the child nodes of the current node. This heuristic selects the digit that causes the greatest reduction in f̂_max(β) for either child. The intuition behind this heuristic is that larger decreases in f̂_max(β) indicate more information gain. Consider the worst case, in which f̂_max(β) does not decrease for either child. In this case, the algorithm has been given no useful information regarding the maximum coefficient down each path.

The algorithm for finding correlation functions is shown in Figure 1. The only portion of the algorithm that is significantly different for each type of correlation function is the step that creates the 1 child node (the node that is created by setting the selected digit to 1). For each type of correlation function there is a unique way of creating this child node. The methods for creating the 0 and 1 child nodes are depicted in Figures 2 and 3.
When the 0 child node is created, the process of determining which examples to remove from the child is fairly straightforward. Pairs of examples are removed if the examples' inputs differ in only those digits of β that have previously been specified and if they have different outputs. The reason for this is that if β_i = 0, then any examples that differ only in the ith input should not have different outputs, because that input will be ignored. If a pair of such examples do have different outputs, all C_α such that α ⊆ β will classify one and only one of them correctly.

The method for determining which examples should not be added to the 1 child is identical to that of the 0 child, except that some preprocessing must be done to the examples. For nodes representing parity functions, the preprocessing involves inverting the output of all examples such that x_i = 1, where i is the digit currently being set in the label. The reason for this step is that if β_i = 1 then χ_β(x) will be inverted for each example x such that x_i = 1, while its output will remain the same if x_i = 0. By inverting the output of each x such that x_i = 1, the same example-removing method used to create the 0 child can be used, while correctly accounting for how the parity functions classify the examples.

The preprocessing steps required for AND and OR functions are quite similar to each other. For AND functions, each example x for which x_i = 0 is removed, and its output is added to count_β. These examples are removed because as soon as there is an i such that β_i = 1 and x_i = 0 it is certain that AND_α(x) = −1 for all α ⊆ β. By adding f(x) to count_β, count_β keeps a running total of how many examples whose classifications are already determined can still be classified correctly. Note that a positive and negative example will cancel each other out, which has the same effect on f̂_max(β) as removing two examples from X_β. OR functions are treated in the same way as AND functions, except that examples are removed if x_i = 1, which indicates that OR_α(x) = 1 for all α ⊆ β.

CreateOneChild(Node parent, int i)
{
    Node child
    child.β ← parent.β
    child.β_i ← 1
    child.X_β ← parent.X_β
    child.count_β ← parent.count_β
    if parent represents a parity function
    {
        for each x ∈ child.X_β such that x_i = 1
            f(x) ← −f(x)
    }
    if parent represents an AND function
    {
        for each x ∈ child.X_β such that x_i = 0
            child.count_β ← child.count_β + f(x)
            remove x from child.X_β
    }
    if parent represents an OR function
    {
        for each x ∈ child.X_β such that x_i = 1
            child.count_β ← child.count_β + f(x)
            remove x from child.X_β
    }
    for each x, x′ ∈ child.X_β
        if f(x) ≠ f(x′) and ∀j (β_j ≠ ∗ ∨ x_j = x′_j)
            remove x and x′ from child.X_β
    return child
}

Figure 3. Method for creating the "1" child of a node. i is the digit of β to be set in the child.

The algorithm can be made to search for multiple types of correlation functions simultaneously simply by initializing the priority queue with one node for each type of correlation function. When finding multiple types of correlations, the algorithm can be thought of as searching multiple search trees simultaneously, switching between trees to explore the path that currently looks most promising. Some means of determining the type of correlation function a node represents must be added so that nodes can be correctly handled by the algorithm (for example, by adding a type field to each node).

The following theorem makes the claim for the correctness of the algorithm.

Theorem 1. The best-first search algorithm described in Figure 1 returns the labels of the correlation functions that are most highly correlated with X.

Proof Sketch. The proof of the theorem follows from two facts. First, the structure of the search space ensures that all possible correlation functions are considered. And second, because the node with the highest value of f̂_max(β) is always at the front of the priority queue, correlation functions are always added to the solution set in order of highest coefficient value. (This assumes that f̂_max(β) is correctly computed for each node.)

The time complexity of the algorithm is O(mnk log k), where n is the number of inputs, k is the number of training examples, and m is the number of nodes visited during the search. The nk log k term indicates the time required to create the child nodes of the current node, and assumes that training examples are stored in a structure that allows lookups in log k time. The number of nodes visited, m, may vary from n to 2^n − 1. Thus, the algorithm is ultimately bounded by an exponential term, and relies on the assumption that in practice m will not approach that bound. Experiments on real-world problems suggest that typical searches explore only a fraction of the search space.
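As a concrete illustration of Figures 1-3, the following self-contained Python sketch runs the best-first search restricted to parity (XOR) functions, for which count_β is always zero and a node's bound is simply the number of surviving examples. It is our own reconstruction, not the authors' code: the pairwise removal of Figures 2 and 3 is implemented here by grouping examples on the still-unspecified digits, the AND and OR node types are omitted, and all names are hypothetical.

import heapq
import itertools
from collections import defaultdict

def cancel_pairs(beta, examples):
    """Remove pairs that agree on every unspecified digit of beta but have
    opposite outputs; any parity function consistent with beta classifies
    exactly one example of such a pair correctly (the test in Figure 2)."""
    free = [j for j, b in enumerate(beta) if b == '*']
    groups = defaultdict(lambda: {1: [], -1: []})
    for x, fx in examples:
        groups[tuple(x[j] for j in free)][fx].append((x, fx))
    survivors = []
    for g in groups.values():
        k = min(len(g[1]), len(g[-1]))          # cancelling pairs
        survivors.extend(g[1][k:] + g[-1][k:])
    return survivors

def make_child(beta, examples, i, value):
    """Create the 0 or 1 child of a node (Figures 2 and 3, parity case)."""
    child_beta = beta[:i] + (value,) + beta[i + 1:]
    if value == 1:
        # Parity preprocessing: flip the output of examples with x_i = 1.
        examples = [(x, -fx if x[i] == 1 else fx) for x, fx in examples]
    return child_beta, cancel_pairs(child_beta, examples)

def find_parity_functions(examples, num_to_find):
    """Best-first search of Figure 1, restricted to parity functions.

    Returns (label, unnormalized coefficient magnitude) pairs in order of
    decreasing correlation with the examples."""
    n = len(examples[0][0])
    tiebreak = itertools.count()                # avoids comparing labels in the heap
    root = tuple('*' for _ in range(n))
    queue = [(-len(examples), next(tiebreak), root, examples)]
    found = []
    while queue and len(found) < num_to_find:
        bound, _, beta, ex = heapq.heappop(queue)
        if '*' not in beta:
            found.append((beta, -bound))        # leaf: the bound is exact
            continue
        # Digit-selection heuristic: set the digit whose better child keeps
        # the fewest examples, i.e. the largest drop in the bound.
        best = None
        for i in (j for j, b in enumerate(beta) if b == '*'):
            children = [make_child(beta, ex, i, 0), make_child(beta, ex, i, 1)]
            score = min(len(c[1]) for c in children)
            if best is None or score < best[0]:
                best = (score, children)
        for child_beta, child_ex in best[1]:
            heapq.heappush(queue, (-len(child_ex), next(tiebreak), child_beta, child_ex))
    return found

Calling find_parity_functions(data, 10) on a list of (bit-tuple, ±1) pairs returns the ten parity labels with the largest empirical correlations; the sign of each correlation can be recovered by evaluating equation (4) for the returned label.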
3.2. Empirical Evaluation

To test the algorithm, nine data sets were selected from the UCI machine learning repository (Blake & Merz, 1998). Each data set represents a Boolean classification problem, although several of the sets contain non-Boolean input features, which were encoded into binary as follows. Real-valued inputs were encoded as a single binary value, indicating whether the value was above or below some threshold. (The threshold was chosen by sorting examples in increasing order of that attribute value, considering all thresholds that lie at the midpoints between the values of adjacent examples, and selecting the threshold that best separated the examples into positive and negative classes.) Nominal-valued inputs were encoded into binary using the minimum number of bits needed to encode all possible values.

Table 1 shows the number of nodes expanded to find the 1 and 1,000 parity functions that are most highly correlated with each data set, compared to the total number of possible node expansions. (Note that the Pima data set, a problem with 8 inputs, has only 256 basis functions, and therefore has no data for finding 1,000 functions.) As can be seen, only a small fraction of the search space needs to be explored to find a large number of parity functions.

The run time required by the algorithm was also very small. Table 2 shows the time required by the algorithm to find those same numbers of functions, compared to the time required to compute the fast Walsh transform (FWT).
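The threshold selection described at the beginning of this subsection can be sketched as follows. This is our own reconstruction under the assumption that "best separated" means the fewest misclassified examples; the paper does not state the exact criterion.

def best_threshold(values, labels):
    """Choose a binarization threshold for one real-valued attribute.

    values holds the attribute values and labels the +1/-1 classes of the
    training examples.  Candidate thresholds are the midpoints between
    adjacent sorted values; the one that best separates the classes
    (fewest examples on the wrong side) is returned."""
    pairs = sorted(zip(values, labels))
    best_t, best_err = None, float('inf')
    for i in range(len(pairs) - 1):
        if pairs[i][0] == pairs[i + 1][0]:
            continue                          # no midpoint between equal values
        t = (pairs[i][0] + pairs[i + 1][0]) / 2.0
        above_pos = sum(1 for v, y in pairs if v > t and y == 1)
        above_neg = sum(1 for v, y in pairs if v > t and y == -1)
        below_pos = sum(1 for v, y in pairs if v <= t and y == 1)
        below_neg = sum(1 for v, y in pairs if v <= t and y == -1)
        err = min(above_neg + below_pos, above_pos + below_neg)
        if err < best_err:
            best_t, best_err = t, err
    return best_t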

Table 1. Nodes visited to find the 1 and 1,000 most highly correlated parity functions, compared to the total number of possible nodes. The number of attributes for each data set is shown in parentheses.

Data set       1        1,000      Total possible

Chess (37)     209      2,010      137,438,953,472
German (24)    197      3,378      16,777,215
Heart (16)     73       2,804      65,535
Pima (8)       8        n/a        256
SPECT (22)     1,115    114,868    4,194,303
Voting (16)    16       2,514      65,535
Wisc 1 (36)    540      126,474    68,719,476,735
Wisc 2 (33)    322      3,621      8,589,934,592
Wisc 3 (30)    44       2,445      1,073,741,824

Figure 4. A plot of the number of nodes expanded to find the most highly correlated Fourier basis function vs. the number of inputs. The trendline is a least squares fit of a single-term polynomial to the data.

Table 2. Time required to find the 1 and 1,000 most highly correlated parity functions, compared to the time required to compute the fast Walsh transform. The number of attributes for each data set is shown in parentheses.

Data set       1        1,000     FWT

Chess (37)     .5 s     1.2 s     11+ hrs
German (24)    < .1 s   .8 s      4.5 s
Heart (16)     < .1 s   < .1 s    < .1 s
Pima (8)       < .1 s   n/a       < .1 s
SPECT (22)     .3 s     6.4 s     1.0 s
Voting (16)    < .1 s   < .1 s    < .1 s
Wisc 1 (36)    .5 s     37.6 s    5+ hrs
Wisc 2 (33)    < .1 s   .2 s      45 min
Wisc 3 (30)    < .1 s   .2 s      5 min

The FWT, which computes the entire spectrum in O(n 2^n) time, is a Boolean version of the fast Fourier transform algorithm. Although the FWT can quickly calculate the Fourier spectra of problems for which n is small, its run time increases dramatically as n increases. (Note that for the Chess, Wisc 1, and Wisc 2 data sets the standard FWT algorithm requires more memory than can be allocated and addressed in a typical 32-bit workstation. The run times shown are estimates based on observing the growth in run time as n increases.)

Results for finding highly correlated AND and OR functions are generally even better. This is largely due to the fact that the output of an AND or OR function can often be determined by examining a single input, while the output of a parity function cannot be determined until all x_i such that β_i = 1 have been checked.

3.3. Analysis

Although the search algorithm has a worst-case exponential time complexity, it performs well on experiments with real-world data. Like other searches of this type, its expected real-world performance is difficult to characterize meaningfully because m, the number of nodes expanded, is very problem dependent. However, m is loosely related to n, the number of inputs, and k, the number of training examples.

The value of n certainly increases the upper bound on m, but empirical results suggest that in practice the value of m is not exponential in n. A least squares fit of a trendline to the plot of m vs. n for these data sets reveals that a single-term polynomial equation fits the data better than an exponential equation. Figure 4 shows a plot of m vs. n for the case of finding a single parity function, with the best-fit single-term polynomial equation included. The equation for this trendline has an exponent of 1.9, suggesting that m ∝ n^2. Empirical results indicate that as the number of correlation functions found increases, the exponent decreases. For example, when finding 1,000 parity functions, the exponent is 0.3, indicating a sub-linear relationship.

The value of k has a somewhat non-intuitive effect on the algorithm. Although the time required to expand a node increases with k, the number of nodes expanded generally decreases with k. This is because more examples allow the algorithm to distinguish between correlation functions more quickly. Therefore, despite the fact that k factors into the time complexity of the algorithm, increases in k can decrease the run time of the algorithm.
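For reference, a power-law trendline of the kind described above can be fit as in the following sketch (our own; the paper does not say exactly how its least-squares fit was computed). It fits m ≈ c · n^p by fitting a line to the (log n, log m) pairs, so the returned exponent p plays the role of the 1.9 and 0.3 values reported above.

import numpy as np

def fit_power_law(n_values, m_values):
    """Fit m ~ c * n**p by ordinary least squares in log-log space."""
    log_n = np.log(np.asarray(n_values, dtype=float))
    log_m = np.log(np.asarray(m_values, dtype=float))
    p, log_c = np.polyfit(log_n, log_m, 1)    # slope is the exponent p
    return float(np.exp(log_c)), float(p)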
Sta- for solving real-world problems, without sacrificing tistically significant improvements of the enhanced Fourier learner over the standard Fourier learner are highlighted in representational power or applicability. The results bold. of the previous section demonstrate that it is possi- ble to quickly find the functions that are most highly correlated with real-world data. This section presents Data set Standard Enhanced C4.5 results of learning real-world problems using the best- Chess 85.9 81.0 99.4 first search for correlation functions. German 71.5 71.5 73.4 Heart 79.9 84.4 81.5 4.1. Learning method Pima 73.6 76.1 74.5 SPECT 79.4 84.0 80.9 In the most straightforward application of Fourier- Voting 96.3 96.3 96.6 based learning, the most highly correlated basis func- Wisc 1 94.6 95.1 94.6 tions are found, weighted by their coefficients, and Wisc 2 71.0 74.9 75.8 Wisc 3 91.5 93.6 94.4 combined into a single classifier that uses the weighted vote of basis functions to make classifications. Let M be the set of labels of highly correlated basis functions found in the search. Given an input x, the Fourier- Inspired by recent work suggesting that AND and OR based classifier c makes classifications as follows: functions may generally be more useful than XOR functions for solving typical real-world learning prob- ( P ˜ˆ lems (Drake & Ventura, 2005), the standard Fourier +1 : if α∈M f(α)χα(x) ≥ 0 c(x) = ˜ (7) approach was extended to use AND and OR func- −1 : if P fˆ(α)χ (x) < 0 α∈M α tions in addition to XOR functions. For many of the data sets, this resulted in a significant improve- Although this method works well, we have found that ment in accuracy. These results are shown in Table 3 results can often be improved by altering the coeffi- under the label Enhanced. Those accuracies shown cients from those given by (4). In particular, we mod- in bold indicate statistically significant improvements ify the coefficients using incremental gradient descent of the enhanced Fourier-based learner over the stan- to minimize the error of the classifier. This can be dard Fourier-based learner. (Statistical significance thought of as learning an optimal set of coefficients for was measured with a random permutation test. In the selected basis functions. each of the statistically significant cases, 100,000 ran- dom permutations were tested, none of which yielded 4.2. Empirical Evaluation a larger difference in accuracy than the observed dif- ference.) The learning accuracies achieved when testing on real- world problems have been quite favorable. Table 3 Note that the enhanced Fourier-based learner performs presents the accuracy of the standard Fourier-based much worse on the Chess data set than C4.5, and even learner (described above) on the data sets used in Sec- performs worse than the standard learner. Additional tion 3. Results are compared to an implementation of experiments suggest that this may be a result of the the C4.5 decision tree algorithm and to an enhanced correlation functions being too highly correlated with Fourier-based learner (described later). Each accuracy each other. Future work will include exploring ways of represents the average of 30 10-fold cross validations. overcoming this difficulty. The standard Fourier-based learning results were ob- tained by finding up to 1,000 highly correlated basis 4.3. Analysis functions and using gradient descent on the coefficients In addition to providing good accuracies on several to reduce error. 
4.2. Empirical Evaluation

The learning accuracies achieved when testing on real-world problems have been quite favorable. Table 3 presents the accuracy of the standard Fourier-based learner (described above) on the data sets used in Section 3. Results are compared to an implementation of the C4.5 decision tree algorithm and to an enhanced Fourier-based learner (described later). Each accuracy represents the average of 30 10-fold cross validations.

The standard Fourier-based learning results were obtained by finding up to 1,000 highly correlated basis functions and using gradient descent on the coefficients to reduce error. In several cases, limiting the standard Fourier-based learner to fewer than 1,000 basis functions improved generalization.

As can be seen in Table 3, the standard Fourier-based learner performs fairly well. However, the decision tree algorithm consistently performs better.

Inspired by recent work suggesting that AND and OR functions may generally be more useful than XOR functions for solving typical real-world learning problems (Drake & Ventura, 2005), the standard Fourier approach was extended to use AND and OR functions in addition to XOR functions. For many of the data sets, this resulted in a significant improvement in accuracy. These results are shown in Table 3 under the label Enhanced. Those accuracies shown in bold indicate statistically significant improvements of the enhanced Fourier-based learner over the standard Fourier-based learner. (Statistical significance was measured with a random permutation test. In each of the statistically significant cases, 100,000 random permutations were tested, none of which yielded a larger difference in accuracy than the observed difference.)

Note that the enhanced Fourier-based learner performs much worse on the Chess data set than C4.5, and even performs worse than the standard learner. Additional experiments suggest that this may be a result of the correlation functions being too highly correlated with each other. Future work will include exploring ways of overcoming this difficulty.

4.3. Analysis

In addition to providing good accuracies on several data sets, this learning approach has several useful properties. For example, it is capable of learning complex functions while remaining relatively comprehensible. Contrast the correlation functions returned by this search algorithm (e.g., the AND of inputs 3 and 4, the XOR of inputs 5, 8, and 9, etc.) with the complex high-order features learned in the hidden layers of a neural network, for example.

Another useful property of the algorithm is its ability to find several types of correlations. Some learning algorithms learn a particular type of correlation well, but struggle to learn others. For example, an algorithm may learn conjunctions of attributes quite well, but struggle to learn XOR relationships. The algorithm presented here is capable of finding and using AND, OR, and XOR correlations. In addition, if deemed useful, other types of correlations could be added to the search.

As a final note, an additional approach to Fourier-based learning that has been considered is a boosting approach in which basis functions are treated as weak hypotheses combined to form a strong hypothesis. Such an approach was introduced by Jackson (1997). We have applied the AdaBoost algorithm of Freund and Schapire (1996) to our Fourier-based learner, but with mixed results. The boosting algorithm successfully reduced the number of basis functions needed to correctly classify the training data, and accuracy on the Chess data set was improved to 95%. For many of the data sets, however, the boosted classifier did not generalize as well. Finding a solution to this problem will be an area of future research.

5. Additional Applications

Although the algorithm for finding correlation functions is interesting as the foundation of a learning algorithm, it may be beneficial in other areas as well. One such area is data analysis. By exploring a large space of useful correlations, interesting properties of the data can be efficiently determined. For example, running the algorithm to find correlation functions for the SPECT data set reveals an OR function with a strong correlation to the output (much stronger than any AND or XOR correlations). This particular correlation function computes the logical OR of eight inputs. Such a high-order correlation would be nearly impossible for humans to observe, but the search algorithm finds it in seconds.

The search algorithm also has potential benefits as an automatic feature selector. The search for correlation functions can be thought of as a search for features that are relevant to the learning task. These features can be used as inputs to any existing learning algorithm. This technique may be especially useful to learning algorithms that don't easily learn high-order features. Consider again the example of the high-order OR correlation found in the SPECT data set. We easily achieved a 2% increase in the absolute accuracy of a simple perceptron by adding the high-order OR function as an additional feature.

6. Conclusion

The best-first search algorithm for finding functions that are highly correlated with an arbitrary set of data has been shown to be highly effective in practice. It allows Fourier-based learning algorithms to be efficiently applied to real-world problems without limiting the search for large coefficients or requiring a membership oracle. The search algorithm can also be used to find other types of highly correlated functions, greatly increasing its usefulness.

Areas for future work include extending the techniques presented here to non-Boolean functions, exploring the benefits of adding other types of functions to the search, determining better ways to find/combine functions for learning (including improving the mixed results of the boosting approach), and investigating methods for automating the process of determining which types of correlation functions to search for.
References

Blake, C., & Merz, C. (1998). UCI repository of machine learning databases.

Drake, A., & Ventura, D. (2005). Comparing high-order Boolean features. Proceedings of the 8th Joint Conference on Information Sciences, to appear.

Freund, Y., & Schapire, R. (1996). Experiments with a new boosting algorithm. Proceedings of the 13th International Conference on Machine Learning, 148-156.

Jackson, J. (1997). An efficient membership-query algorithm for learning DNF with respect to the uniform distribution. Journal of Computer and System Sciences, 55, 414-440.

Kushilevitz, E., & Mansour, Y. (1993). Learning decision trees using the Fourier spectrum. SIAM Journal on Computing, 22, 1331-1348.

Linial, N., Mansour, Y., & Nisan, N. (1993). Constant depth circuits, Fourier transform, and learnability. Journal of the ACM, 40, 607-620.

Mansour, Y., & Sahar, S. (2000). Implementation issues in the Fourier transform algorithm. Machine Learning, 14, 5-33.