<<

PROC. OF THE 11th PYTHON IN CONF. (SCIPY 2012) 11 A Tale of Four Libraries

Alejandro Weinstein‡∗, Michael Wakin‡

!

Abstract—This work describes the use some scientific Python tools to solve One of the contributions of our research is the idea of rep- information gathering problems using Reinforcement Learning. In particular, resenting the items in the datasets as vectors belonging to a we focus on the problem of designing an agent able to learn how to gather linear . To this end, we build a Latent Semantic Analysis information in linked datasets. We use four different libraries—RL-Glue, Gensim, (LSA) [Dee90] model to project documents onto a . NetworkX, and scikit-learn—during different stages of our research. We show This allows us, in addition to being able to compute similarities that, by using NumPy arrays as the default vector/matrix format, it is possible to between documents, to leverage a variety of RL techniques that integrate these libraries with minimal effort. require a vector representation. We use the Gensim library to build Index Terms—reinforcement learning, latent semantic analysis, machine learn- the LSA model. This library provides all the machinery to build, ing among other options, the LSA model. One place where Gensim shines is in its capability to handle big data sets, like the entirety of Wikipedia, that do not fit in memory. We also combine the vector Introduction representation of the items as a property of the NetworkX nodes. In addition to bringing efficient array computing and standard Finally, we also use the manifold learning capabilities of mathematical tools to Python, the NumPy/SciPy libraries provide sckit-learn, like the ISOMAP [Ten00], to perform some an ecosystem where multiple libraries can coexist and interact. exploratory . By reducing the dimensionality of the This work describes a success story where we integrate several LSA vectors obtained using Gensim from 400 to 3, we are able libraries, developed by different groups, to solve some of our to visualize the relative position of the vectors together with their research problems. connections. Our research focuses on using Reinforcement Learning (RL) to Source code to reproduce the results shown in this work is gather information in domains described by an underlying linked available at https://github.com/aweinstein/a_tale. dataset. We are interested in problems such as the following: given a Wikipedia article as a seed, find other articles that are interesting Reinforcement Learning relative to the starting point. Of particular interest is to find articles The RL paradigm [Sut98] considers an agent that interacts with that are more than one-click away from the seed, since these an environment described by a Markov Decision Process (MDP). articles are in general harder to find by a human. Formally, an MDP is defined by a state space , an action space In addition to the staples of scientific Python computing X , a transition function P, and a reward function r. At NumPy, SciPy, Matplotlib, and IPython, we use the libraries RL- A a given sample time t = 0,1,... the agent is at state x ∈ , and Glue [Tan09], NetworkX [Hag08], Gensim [Reh10], and scikit- t X it chooses action a ∈ . Given the current state x and selected learn [Ped11]. t A action a, the probability that the next state is x0 is determined by Reinforcement Learning considers the interaction between P(x,a,x0). After reaching the next state x0, the agent observes an a given environment and an agent. The objective is to design immediate reward r(x0). Figure1 depicts the agent-environment an agent able to learn a policy that allows it to maximize its interaction. In an RL problem, the objective is to find a function total expected reward. We use the RL-Glue library for our RL π : 7→ , called the policy, that maximizes the total expected experiments. This library provides the infrastructure to connect an X A reward environment and an agent, each one described by an independent " ∞ # t Python program. R = E ∑ γ r(xt ) , We represent the linked datasets we work with as graphs. t=1 For this we use NetworkX, which provides data structures to where γ ∈ (0,1) is a given discount factor. Note that typically efficiently represent graphs, together with implementations of the agent does not know the functions P and r, and it must many classic graph . We use NetworkX graphs to find the optimal policy by interacting with the environment. See describe the environments implemented using RL-Glue. We also Szepesvári [Sze10] for a detailed review of the theory of MDPs use these graphs to , analyze and visualize graphs built from and the different algorithms used in RL. unstructured data. We implement the RL algorithms using the RL-Glue library Corresponding author: [email protected] [Tan09]. The library consists of the RL-Glue Core program and * 1 ‡ the EECS department of the Colorado School of Mines a set of codecs for different languages to communicate with the library. To run an instance of a RL problem one needs to Copyright © 2012 Alejandro Weinstein et al. This is an open-access article write three different programs: the environment, the agent, and distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, the experiment. The environment and the agent programs match provided the original author and source are credited. exactly the corresponding elements of the RL framework, while 12 PROC. OF THE 11th PYTHON IN SCIENCE CONF. (SCIPY 2012) x a In the context of a collection of documents, or corpus, word r Agent Environment frequency is not enough to asses the importance of a term. For this reason, we introduce the quantity document frequency dft , defined to be the number of documents in the collection that contain term t. We can now define the inverse document frequency (idf) as N Fig. 1: The agent-environment interaction. The agent observes the idft = log , current state x and reward r; then it executes action π(x) = a. dft where N is the number of documents in the corpus. The idf is a measure of how unusual a term is. We define the tf-idf weight of the experiment orchestrates the interaction between these two. The term t in document d as following code snippets show the main methods that these three programs must implement: tf-idft,d = tft,d ×idft . ################# environment.py ################# This quantity is a good indicator of the discriminating power of a class env(Environment): term inside a given document. For each document in the corpus def env_start(self): # Set the current state we compute a vector of length M, where M is the total number of terms in the corpus. Each entry of this vector is the tf-idf weight return current_state for each term (if a term does not exist in the document, the weight is set to 0). We stack all the vectors to build the M × N term- def env_step(self, action): # Set the new state according to document matrix C. # the current state and given action. Note that since typically a document contains only a small fraction of the total number of terms in the corpus, the columns of return reward the term-document matrix are sparse. The method known as Latent #################### agent.py #################### Semantic Analysis (LSA) [Dee90] constructs a low-rank approx- class agent(Agent): imation Ck of rank at most k of C. The value of k, also known def agent_start(self, state): as the latent dimension, is a design parameter typically chosen # First step of an experiment to be in the low hundreds. This low-rank representation induces return action a projection onto a k-dimensional space. The similarity between the vector representation of the documents is now computed after def agent_step(self, reward, obs): projecting the vectors onto this subspace. One advantage of LSA # Execute a step of the RL algorithm is that it deals with the problems of synonymy, where different return action words have the same meaning, and polysemy, where one word has different meanings. ################# experiment.py ################## RLGlue.init() Using the Singular Value Decomposition (SVD) of the term- T RLGlue.RL_start() document matrix C = UΣV , the k-rank approximation of C is RLGlue.RL_episode(100) # Run an episode given by C = U V T , Note that RL-Glue is only a thin layer among these programs, k kΣk k allowing us to use any construction inside them. In particular, as where Uk, Σk, and Vk are the matrices formed by the k first described in the following sections, we use a NetworkX graph to columns of U, Σ, and V, respectively. The tf-idf representation model the environment. of a document q is projected onto the k-dimensional subspace as −1 T qk = Σk Uk q. Computing the Similarity between Documents Note that this projection transforms a sparse vector of length M To be able to gather information, we need to be able to quantify into a dense vector of length k. how relevant an item in the dataset is. When we work with In this work we use the Gensim library [Reh10] to build the documents, we use the similarity between a given document and vector space model. To test the library we downloaded the top 100 the seed to this end. Among the several ways of computing most popular books from project Gutenberg.2 After constructing similarities between documents, we choose the Vector Space the LSA model with 200 latent dimensions, we computed the Model [Man08]. Under this setup, each document is represented similarity between Moby Dick, which is in the corpus used to by a vector. The similarity between two documents is estimated build the model, and 6 other documents (see the results in Table by the cosine similarity of the document vector representations. 1). The first document is an excerpt from Moby Dick, 393 words The first step in representing a piece of text as a vector is long. The second one is an excerpt from the Wikipedia Moby to build a bag of words model, where we count the occurrences Dick article. The third one is an excerpt, 185 words long, of The of each term in the document. These word frequencies become Call of the Wild. The remaining two documents are excerpts from the vector entries, and we denote the term frequency of term t Wikipedia articles not related to Moby Dick. The similarity values in document d by tft,d. Although this model ignores information we obtain validate the model, since we can see high values (above related to the order of the words, it is still powerful enough to 0.8) for the documents related to Moby Dick, and significantly produce meaningful results. smaller values for the remaining ones.

1. Currently there are codecs for Python, C/C++, Java, Lisp, MATLAB, and 2. As per the , 2011 list, http://www.gutenberg.org/browse/scores/ Go. top. A TALE OF FOUR LIBRARIES 13

Arctic Text description LSA similarity Boot Insect Culture Cuba Flax Shoe Fabric Sandal (footwear) Trenchcoat Silk Excerpt from Moby Dick 0.87 Deer Weapon Tank Body piercing Cane Bullet Shirt Air gun Wool Arrow Leg Excerpt from Wikipedia Moby Dick article 0.83 Makeup Scissors Glove Hunting Military Artificial fibres Permit Cylinder Media Excerpt from The Call of the Wild 0.48 Helicopter Needle Cotton Hemp Human Torso Nylon Nobility Excerpt from Wikipedia Jewish Calendar 0.40 Rifle Soldier Purse Arm Submachine gun Water Polyester Clothing Sewing machine Army Eyeglasses article Gun Country Cloth Machine gun Leather Coat Pants Animal Pistol Scarification Excerpt from Wikipedia article 0.33 Airplane Food Fashion (clothing) Warship Umbrella Cartridge Acrylic Tent Vehicle Sock Spear Shoot Knife Jewelry Tattoo Police Paris Fur Sniper Costume Defense Revolver Body modification Ramie Skirt Thread Olympic Games Machine Wikimedia Commons Perfume Shotgun Hat Jeans TABLE 1: Similarity between Moby Dick and other documents. Accessories Fig. 2: Graph for the "Army" article in the simple Wikipedia with 97 nodes and 99 edges. The seed article is in blue. The size of Next, we build the LSA model for Wikipedia that allows us to the nodes (except for the seed node) is proportional to the similarity. compute the similarity between Wikipedia articles. Although this In red are all the nodes with similarity greater than 0.5. We found is a lengthy process that takes more than 20 hours, once the model two articles ("Defense" and "Weapon") similar to the seed three links is built, a similarity computation is very fast (on the order of 10 ahead. milliseconds). Results in the next section make use of this model. Note that although in principle it is simple to compute the

LSA model of a given corpus, the size of the datasets we are 21 sim= get_similarity(page_text) interested in make doing this a significant challenge. The two 22 self.G.add_edge(source, main difficulties are that in general (i) we cannot hold the vector 23 dest, 24 weight=sim, representation of the corpus in RAM memory, and (ii) we need to 25 d=dist) compute the SVD of a matrix whose size is beyond the limits of 26 new_links= [(l, dist, page.name) what standard solvers can handle. Here Gensim does stellar work 27 for l in page.links()] 28 links= new_links+ links by being able to handle both these challenges. 29 n+=1 30 31 return G Representing the State Space as a Graph We are interested in the problem of gathering information in We now show the result of running the code above for two domains described by linked datasets. It is natural to describe such different setups. In the first instance we crawl the Simple English domains by graphs. We use the NetworkX library [Hag08] to build Wikipedia4 using "Army" as the seed article. We set the limit on the graphs we work with. NetworkX provides data structures to the number of articles to visit to 100. The result is depicted5 in Fig. represent different kinds of graphs (undirected, weighted, directed, 2, where the node corresponding to the seed article is in light blue etc.), together with implementations of many graph algorithms. and the remaining nodes have a size proportional to the similarity NetworkX allows one to use any hashable Python object as a node with respect to the seed. Red nodes are the ones with similarity identifier. Also, any Python object can be used as a node, edge, or bigger than 0.5. We observe two nodes, "Defense" and "Weapon", graph attribute. We exploit this capability by using the LSA vector with similarities 0.7 and 0.53 respectively, that are three links away representation of a Wikipedia article, which is a NumPy array, as from the seed. a node attribute. In the second instance we crawl Wikipedia using the article The following code snippet shows a function3 used to build a "James Gleick"6 as seed. We set the limit on the number of directed graph where nodes represent Wikipedia articles, and the articles to visit to 2000. We show the result in Fig.3, where, edges represent links between articles. Note that we compute the as in the previous example, the node corresponding to the seed LSA representation of the article (line 11), and that this vector is in light blue and the remaining nodes have a size proportional is used as a node attribute (line 13). The function obtains up to to the similarity with respect to the seed. The eleven red nodes n_max articles by breadth-first crawling the Wikipedia, starting are the ones with similarity greater than 0.7. Of these, 9 are more from the article defined by page. than one link away from the seed. We see that the article with the

1 def crawl(page, n_max): biggest similarity, with a value of 0.8, is about " Wright 2 G= nx.DiGraph() (journalist)", and it is two links away from the seed (passing 3 n=0 through the "Slate " article). Robert Wright writes books 4 links= [(page,-1, None)] about , history and religion. It is very reasonable to 5 while n< n_max: 6 link= links.pop() consider him an author similar to James Gleick. 7 page= link[0] Another place where graphs can play an important role in the 8 dist= link[1]+1 RL problem is in finding basis functions to approximate the value- 9 page_text= page.edit().encode('utf-8') 10 # LSI representation of page_text 11 v_lsi= get_lsi(page_text) 3. The parameter page is a mwclient page object. See http://sourceforge. 12 # Add node to the graph net/apps/mediawiki/mwclient/. 13 G.add_node(page.name, v=v_lsi) 4. The Simple English Wikipedia (http://simple.wikipedia.org) has articles 14 if link[2]: written in simple English and has a much smaller number of articles than the 15 source= link[2] standard Wikipedia. We use it because of its simplicity. 16 dest= page.name 17 if G.has_edge(source, dest): 5. To generate this figure, we save the NetworkX graph in GEXF format, 18 # Link already exists and create the diagram using Gephi (http://gephi.org/). 19 continue 6. James Gleick is "an American author, journalist, and biographer, whose 20 else: books explore the cultural ramifications of science and technology". 14 PROC. OF THE 11th PYTHON IN SCIENCE CONF. (SCIPY 2012)

Palgrave Macmillan Hewlett Bay Park, Wayne County, New YorkYeshiva University MTA Regional Bus Operations Derby, Connecticut Terraced house Kinshasa Northeast Corridor Dover, Commuting Snow Castle Clinton National Monument All in the Family Riverhead (CDP), New York 1 World Trade Center ServiceCayuga County, New York Orleans County, New York Rio de Janeiro Taxicabs of George M. Cohan Traffic congestion Hamilton County, New York Building AMNRL Court of International Trade Westport, Connecticut Congress of the Confederation Wall Street Newark Liberty International Airport Jacksonville Axemen New Netherland United Airlines Seagram Building New Angoulme Oyster Bay Cove, New York Greater Hartford Mount Vernon, New York Indianapolis metropolitan area County, New York Utica, New York Precipitation (meteorology)

1972 Summer Paralympics Tourism in New York City George II of Great Britain Torrington, Connecticut John Keese Oneida County, New York Huntington (CDP), New York U.S. state BirminghamHoover Metropolitan Area IndianapolisCleveland, Ohio Cairo New York School Sussex County, New Jersey Outer barrier Gay rights movement Montreal New York Giants (MLB) Peter Minuit Massapequa, New York List of places known as of the world Columbia County, New YorkMuttontown, New York Niagara Falls, New York Freeway 1910 United States Census Archie Bunker Indian-American Gateway National Recreation Area National Institutes of HealthSons of Liberty Baltimore Metropolitan Area Islip (town), New York Brightwaters, New York Rhotic and non-rhotic accents Thomas Jefferson University World economy Louisville, Kentucky Thomas Menino 1850 United States Census Lloyd Harbor, New York Juilliard School New York City, New York Architecture of New York City 1996 Summer Paralympics New York in film American International Dodgers New Orleans People's Republic of Lower Daily News (New York) Matinecock, New York Charlotte, North Carolina National Invitation Tournament Holland Purchase Spain Mill Neck, New York Bloods Basingstoke Munsey Park, New York Old Brookville, New York 1870 United States Census Lawrence, Nassau County, New York New York, New York (disambiguation) Peekskill, New York Metro Detroit Upper West Side The New School Seneca County, New York Borough (New York City) Wantagh, New York Metropolitan Museum of Art Stickball Greenbelt Colombia College town Arnhem Bronx Zoo Huntington, New York Kant region Ridge-and-Valley Appalachians WNET List of urban areas by population Jersey City , North Jersey , New York East Rand The Record (Bergen County) Upper Manhattan Head of the Harbor, New York Southbury, Connecticut Niall of the Nine Hostages New Jersey Falafel Great Neck Estates, New York Giovanni da Verrazzano The Statesman's Yearbook Staten Island New York Amsterdam NewsPrudential Center Jazz at Lincoln Center Oakland, Denver, ColoradoCouncillor New York metropolitan area Lynbrook, New York hydride bomb Square Park, New York Democratic Party (United States) Delaware County, New York September 11, 2001 attacks 1810 United States Census Colson Whitehead List of tallest buildings in New York City Morgan Stanley Alley New York Supreme Court American Express Memphis, Tennessee San Francisco GiantsMaine Natural Hoboken, New Jersey Mercer County, New Jersey Freeport, New York Amityville, New YorkLevittown, New York Los Alamos National Laboratory North Haven, Connecticut Kingston, New York Geography of New York John F. Kennedy International Airport Asbury Park, New Jersey Saskia Sassen Saratoga County, New York Trumbull, Connecticut Central Park Zoo New Meadowlands Connecticut Jewish American literature Massapequa Park, New York Manhattan Chinatown Vivian Beaumont TheatreNiagara Frontier Tramway Amtrak Wyoming County, New York Battery Weed Percy Williams Bridgman Tioga County, New York Second Anglo-Dutch War Lower East Side, ManhattanGrand Central Terminal Abstract expressionism Rensselaer County, New York Johannes NBC Studios Richard E. Taylor New York State Unified Court System Karachi , Washington Dominican Republic Hybrid electric vehicle Evacuation Day (New York) Saint Lawrence Seaway DallasFort Worth metroplex New York City Department of City Planning New York Botanical Garden Independent (politician) New York-style pizza Feynman Long Division Puzzles Statue of Liberty National MonumentLaw enforcement in New York City Psychometrics York, Pennsylvania Otsego County, New York General Grant National Memorial National Football League Johannes Diderik van der Waals Gun politics Great Fire of New York Roslyn Estates, New York List of municipalities on Ulster County, New York Federal Writers' Project Subway Series Co-op City, BronxBridgeport, Connecticut New Haven County, Connecticut Babylon (village), New York John L. Hall West New York, New Jersey Demographics of New York City City New York Metropolitan Area Corning (city), New York Max 1830 United States Census Kuala Lumpur New York Supreme Court, Appellate Division Thomas A. Welton Nissequogue, New York North Hills, New York Intelligence quotient Brazil History of Long Island Fixing Broken Windows Monroe County, New York New York State Legislature Conference House Park Island Park, New York Valley Stream, New York Drainage basin Brookings Institute Brookfield, Connecticut Nathaniel Parker Willis Greater Orlando Art Deco Middlesex County, New Jersey Los Angeles metropolitan area 1840 United States Census Tri-State Region Santiago, ChileNew York Jets Town twinning John CockcroftMartin Ryle East Orange, New Jersey National Hockey League Glen Cove, New York Jacob Riis Park Theodor W. Hnsch Latin America Binghamton, New York Dutch guilder Central Park Conservatory Garden Cellular automata Weak decay 1976 Summer Paralympics Great Neck, New York List of U.S. cities with most pedestrian commuters East Rutherford, New Jersey New Canaan, Connecticut Nazi GermanyRobert Woodrow Wilson Dsseldorf Hamden, Connecticut Glens Falls, New York East Rockaway, New York Cheshire, Connecticut Atomic bombings of and Uniondale, New York Broome County, New York Hewlett Harbor, New York El Paso, Texas List of corporations based in New York City The Pleasure of Finding Things Out Van Cortlandt Park Kurt Gdel Morris County, New Jersey Yorkshire YangMills NASDAQ New York City Department of Hunterdon County, New Jersey Harrison, New York Central Park SummerStage Economy of Long Island CincinnatiGothic Revival architecture Passaic County, New Jersey Messenger Lectures Cedarhurst, New York Washington Irving Litchfield Hills Continental Airlines Kensington, New York Las Vegas, Nevada Federal Hall National Memorial Richmond County, New York John Peter Zenger Disco JetBlue Independent film Fresh water East Hills, New York Greater Danbury Sea Cliff, New York Racial and ethnic 1988 Summer Paralympics Murray Gell-Mann Dering Harbor, New York Yonkers, New YorkHarlem Success Academy Governors Island National Monument Bellerose Terrace, New York Millrose Games Thomaston, New York Public Prep Bibliography of New York Don QuixoteIntegral calculus American Mafia Education in New York Lazio Upper Brookville, New York Metro systems by annual passenger rides Open Library Monroe, Connecticut Italian people Shinichiro Tomonaga Woolworth Building Newtown, Connecticut History of New York Neighborhoods of New York City Madison Square Garden Nutation Delhi Transatlantic telephone cable Nashville Metropolitan Statistical Area Nicolaas BloembergenStckelbergFeynman interpretationToshihide Maskawa J. J. Thomson Run (island) Arlington, Texas Queens Metonymy Homicide NYCTV Newburgh (city), New York Ocean County, New Jersey Las Vegas metropolitan area New York (Anthony Burgess) Kansas City Metropolitan Area Orange, Connecticut Albany County, New YorkHewlett Neck, New York Hybrid vehicle Atlantic Beach, New York Virginia Beach, Virginia Clean diesel Space Shuttle Challenger disaster Encyclopdia Britannica Eleventh Edition Madison Avenue (Manhattan) Nanotechnology Humid subtropical climate United States District Court for the Eastern District of New York Madrid Metro 1964 New York World's Fair RobertErnest Oppenheimer Walton Gustaf Daln Journal of the American Planning Association Manhatta Huntington Bay, New York Geneticist Denver--Broomfield,Athens CO Metropolitan Statistical Area Fort Lee, New Jersey World Trade Center Memorial Party platform LaGuardia Airport Oklahoma City metropolitan area J.P. Morgan Chase& Co. DonaldAfrican A. Burial Glaser Ground National Monument Hempstead (village), New York John Kerry List of regions of the United States Transportation in New York VerizonElections in New Communications York American English Essex County, New Jersey metropolitan area Cuisine of New York City Metric ton Waldenstrm's macroglobulinemia Kings County, New York Asian (U.S. Census) Minor league baseball Merrick, New York New York's congressional districts South Shore (Long Island) Phoenix, Arizona Economy of New York Bagel Kansas City, Missouri Tom Wolfe Pelham Bay Park Prohibition in the United States Riverbank State Park Mesa Lewis County, New York Taxation in Flushingthe United MeadowsCorona States Park New Brunswick, New Jersey Elmira, New York Bachelor's degree Ontario County, New York United States Census, 2000 Erie County, New York Louisville Jefferson County, KYIN Metropolitan Statistical Area C. V. Raman Mineola, New YorkParis George W. Bush Bangkok Hampton Roads Northeastern United States Replica plating National Historic Landmark Mayors Against Illegal Guns Coalition Guyana Oyster Bay (town), New York Fundamental force County (US) Manhattan Rye (city), New York Constitution of the United States Irish Americans in New York City Food and water in New York City Atlanta, Georgia Supreme Court of the United States Marie CurieF. L. Vernon, Jr. 24/7 Baldwin, Nassau County, New York Superfluid SynesthesiaWomen's National Basketball Association Albert Abraham Michelson 2012 Summer Paralympics Peter GrnbergParks and recreation in New York City New York Botanical Gardens Thomson Reuters Corporation Freestyle music Black Spades Weston, Connecticut John Robert Schrieffer Staten IslandCombined Yankees statistical area Geography of New YorkTelephone City numbering plan Lists of New York City topics Genghis Blues Baghdad Port Jefferson, New York 2004 Summer Paralympics Feynman point Red Bull Arena (Harrison) New York City Ballet Politics of New York Unisphere Transportation on University of New York Hess Corporation List of cities in New York Lawrence M. Krauss Massachusetts Institute of Technology Schenectady County, New York Union County, New Jersey () Water treatment Rugby league City (New York) Taipei Jacksonville, FloridaMemorial Sloan-Kettering Cancer Center RappingTownhouse List of neighborhoods HarlemBethpage, New York Native Americans in the United States IBM Bohai Economic Rim Kyoto Naugatuck, Connecticut Viacom California Institute of Technology Raymond Davis, Jr. Gotham Gazette Lincoln Center for the Performing Arts Scissor SistersBen Roy Mottelson South Floral Park, New York Tulsa, Oklahoma Barnard College Pierre-Gilles de GennesLeon M. Lederman ModelMass (abstract) transit in New York City Wilton, Connecticut design Syracuse, New York Gentrification Gross metropolitan product Buffalo, New YorkDistrito Nacional Great Fire of New York (1776)North America College of Mount Saint Vincent Manorhaven, New York Indianapolis, Indiana New York Philharmonic Atomic bomb Big Apple Barcelona Prospect Park (Brooklyn) Sin-Itiro Tomonaga Plandome Manor,Mumbai New York OsakaResearch Triangle Madrid Tehran Washington Heights, Manhattan Punched card Weill Cornell Medical College of Cornell UniversitySalt marsh Martin Lewis PerlThousandGovernor Islands of New York Malverne, New York Electro-weak Westhampton Beach, New York Ridgefield, Connecticut Jazz World Series Geographic coordinate system Plymouth, Connecticut 1964 Summer Paralympics chromodynamics Hybrid taxi Norwalk, Connecticut World War II Smithtown, New York Darien, Connecticut Brooklyn Bridge New York City Fire Department They Might Be Giants theatre General Slocum Sacramento metropolitan area GeneticsRobert B. Leighton (physicist) Paterson, New Jersey Skyscraper NBC Washington Metro ChennaiInternational style (architecture) United States Senate Rockaway Peninsula () Connection Machine Le Szilrd Non-Hispanic whiteGuangzhou Mohawk Valley Park Avenue (Manhattan) Goldman Sachs Group Wichita, Kansas Greater Waterbury J. Hans D. JensenCentralNew Park York City Police Department Plandome Heights, New York Northern New Jersey Andre GeimErnest Orlando Lawrence Award HellmannFeynman theorem Geography of Hicksville, New York 1968 Summer ParalympicsMTVKlaus von Klitzing Onondaga County, New York Hong Kong Biotechnology AIA Guide to New York City Floral Park, New York New Jersey Nets Soccer USA Today M-theory Urban heat island Dutch Republic Universal Music Group Stonewall Rebellion Farmingdale, New York Ocean Beach, New York Queens, New York Saddle Rock, New York New York City (disambiguation) Entertainment industry Church (New York City) Bellport, New York Tuva or Bust! New York City OperaRichmondPetersburg Atlanta metropolitanProfessor area World Almanac Provo, Utah Central Hungary Erwin Schrdinger Nassau County, New York World Trade Center site Trinity (nuclear test) Manhattan Municipal Building Advertising Age Charles Hard Townes FeynmanKac formula Orange County, New York Madison, Connecticut Demonym West Hampton Dunes, New York Belvedere Castle Hubert UnitedHumphrey States Postal Service Louis NelE. C. George Sudarshan African American Crips Russell Alan HulseChicago metropolitan area Egypt CompStat Environmental issues in New York City John G. Kemeny Newburgh, New York Ruhr Delaware Valley Variational perturbation theory Foresight Nanotech Institute FeynmanFlower Hill,Prize New York List of Long Island recreational facilities American Community college metropolitan area Jamaica Bay Wildlife Refuge Aaron Novick Patchogue, New York Southampton (village), New York Ralph LeightonJersey City, New Jersey 1916 Zoning Resolution Kobe Jefferson County, New York Financial centre WarJohn Bardeen Plainfield, New JerseyFlorence National Basketball Association William Bellerose, New York Carl Feynman De jure Middletown, Orange County, New York Putnam County, New York Rahway, New Jersey Trenton, New Jersey Schrdinger equationUnited Nations Headquarters Hyderabad, India Theodore Roosevelt Birthplace National Historic Site Asharoken, New York Manhattanhenge Government of New York Baja California PubMed Central Clifford Shull Diffraction Transvaal Province United States Atomic Energy Commission New York City Quantum Princeton, New Jersey Phoenix metropolitan area Five Boroughs Los AngelesGreenhouse gas British EmpireQED (book)Statue of Liberty Delta Air Lines Mike Wallace (historian) Holland Tunnel Herman Melville Maria Goeppert-Mayer Gerard't Hooft List of people from New York City Brownstone North Fork, Suffolk County, New York Hidden track Greater Boston Indie rock , Berkeley Oak Ridge National Laboratory Old , NewPittsburgh York Williston Park, New York 1890 United States Census Lord Howe Charles Thomson Rees Wilson Tokyo 1950 United States Census Westchester County, New York George Washington Bridge Karl Alexander Mller Yokohama Mathematical operatorNew York State Constitution SUNY Downstate Medical Center United States Court of Appeals for the Second Circuit Criticality accidentNeodesha, Kansas to the United States Nagoya Chrysler Building List of counties in New York West Islip, New YorkWilliam Cullen Bryant LSD New Hyde Park, New York Hollywood, Los Angeles, California Greater Cleveland Subrahmanyan Chandrasekhar Rucker Park Giovanni Rossi Lomanitz Samba school List of museums and cultural institutions in New York City Tompkins County, New York The Bravery American Revolution Seattle metropolitan area Organized crime White American Greater Austin Ric Burns Table of United States Metropolitan Statistical Areas NucleonPlandome, New York Metropolitan municipality Digital object identifier Georges CharpakWilhelm Wien Skyscrapers International Standard Book Number Los Angeles, California PubMed Identifier UTC-5 Tom Newman (scientist) Niagara County, New York New York 1820 United States Census Robert Barro Institute for Advanced Study Raleigh, North Carolina Dallas Union City, New Jersey Sustainable design Asia Chicago"L" Quantum Robert Lane Greene Land reclamation White Plains, New York Puerto Rican people Port Authority Bus Terminal Beijing Independence Party of New York Arista National Honor Society Frontline (magazine) 1984 Summer Paralympics South Fork, Suffolk County, New York New York's Village Halloween Parade Christopher Ma Willard H. Wells Charles douard Guillaume Surely You're Joking, Mr. Feynman! Wikitravel Jackson Heights, Queens Bergen County, New Jersey Metropolitan area Transportation in New York City Hannes Alfvn Watertown, Connecticut Stonewall Inn Poquott, New York Ellis Island BabylonLong (town), Beach,New NewYork York New City York Public Advocate 1960 Summer Paralympics Centre Island, New York Symphony of Science Ithaca, New York Buffalo Niagara Falls metropolitan area Garden city movement Garden City, New York New York Insurance Parkway Stoke Mandeville Clay Ponds State Park Edward Harrigan Albuquerque, TIAA-CREF Daniel C. Tsui So Paulo New York City Department of Parks& Recreation Norman Foster Ramsey, Jr. Bibcode Center of the Philip Warren Anderson Fort Worth, Texas Pfizer Path integral formulation George Zweig Chautauqua County, New York Online Library Center LagosDay to Day Jerusalem District Infinity (film) OklahomaScarsdale, City, Oklahoma NewPhys.Suffolk Rev.York County, New York KolkataHowell, New Jersey Schenectady, New York New Milford, Connecticut North Branford, Connecticut Herkimer County, New York Erie Canal QED (play) Brooklyn Public LibraryEmpire State Flag of New York City Verrazano-Narrows Bridge Steuben County, New York MS-13 Gang United States Department of Energy American Civil ChenWar Ning YangFive Families Atlanta Hamilton, Ontario United States Bill of Rights Dhaka Max Delbruck Working Families Party Stewart International Airport Utne Reader Channel East Hampton (village), New York Martinus J. G. Veltman Project Tuva Cheesecake Eugene WignerEmilio G. Segr Quantum computer Rockland County, New York Major League Soccer Quantum computingVictor Weisskopf Gerd BinnigWilliam McLellan (nanotechnology) The Feynman Lectures on PhysicsOutlook (magazine) Winchester, Connecticut Yellow feverBeacon, New YorkList of New York City gardens New York Knights RLFC Jet Propulsion Laboratory Washington, D.C. Public-access Rochester, New York Federal government of the United States Above mean sea level Austin, Texas Cattaraugus County, New York Frederic de Hoffmann The New York UTC-4 Samuel C. C. Ting The Sunday Times Magazine New York City Marathon Jamaica Software development Commissioners' Plan of 1811 Taylor series Saltaire, New York Mother Jones (magazine) Kppen climate classification PATCO Speedline Differential calculus Russell Gardens, New York Houston, Texas Joseph Hooton Taylor, Jr. Salt Lake City metropolitan area Alice Tully Hall Livingston County, New York E (number) Alternative Greene County, New York Microbiologist Westfield, New Jersey Water tower Essen New York City mayoral elections Stamford, Connecticut Yates County, New York Long Branch, New Jersey String theoryMorristown, New Jersey Greenwich Village Chinatown, Manhattan List of sovereign states Loyalist (American Revolution) Albany, New York Massachusetts v. Environmental Protection Agency Robert MarshakAntony Hewish National Medal of ScienceNorth Hempstead, New York Parallel computing Roach Guards Avant-garde Cove Neck, New York Politics of Long Island Brian David Josephson 2016 Summer Paralympics Herald (Pakistan) Incheon Royal Society Musical theatre 1980 United States Census Quogue, New York 1990 United States Census National Broadcasting Company Owen Willans Richardson National Endowment for the Arts Feynman (disambiguation)Emily YoffeParis MtroKaplan Law School Metro Manila Open Directory Project SuperconductivityMax von Laue Half-derivative Greater St. Louis News Corporation United Nations headquarters MetLife 1960 United States Census Philip Morris International New York GiantsDeep inelastic Long Beach, California Japan Gauteng British Crown The Walrus Samba , Pennsylvania Latium The Stationery Office Douglas D. Osheroff Flexagon Green building Estuary Oceanside, New York Intercity bus International Ladies' Garment Workers'Southwestern Union Connecticut North Shore (Long Island) Portland, Oregon Mayor-council government Baxter Estates, New York Henry Way Kendall 2000 United States Census Capital District Yankee StadiumGothamFive Points, Manhattan Quantum electrodynamicsWalther Bothe New York dialectBethel, Connecticut Monmouth County, New Jersey Reason (magazine) San Diego, California Haiti National Oceanic and Atmospheric Administration Jean-Marie Colombani Time Warner Center Major League Baseball 1992 Summer Paralympics Kings Park, New YorkPresident of the United States List of New York state symbols The Character of PhysicalBachelor Law of Science Clinton County, New York Red Bull New York Baltimore, Cecil Frank Powell Ho Chi Minh City Hackensack, New Jersey Sullivan County, New York 1980 Summer Paralympics Einstein field equation Mount Sinai School of MedicineEastern Time Zone (North America) Westbury, New York Appalachians Stewart Manor, New York Roy J. Glauber Political machine Milwaukee, Wisconsin Conference House First Continental Congress WorldThe Trade Center Encyclopedia of New York City Warren County, New York Lahore KPRC-TV Sydney Kongar-ool Ondar Baruch Samuel Blumberg Kebab Fiorello H. LaGuardia Market capitalization Largest cities in the Americas New Jersey Transit rail operations United States District Court for the Southern District of NewFordham York University Troy Patterson Chinese people New York City ethnic enclaves Association of International Marathons and Road Races Sticky bead argumentViscosity Sagaponack, New YorkPort Authority of New York and New Jersey Danbury, Connecticut Edward Victor Appleton Veronica Dillon Long Island SoundList of references to Long Island places in popular culture San Francisco Bay Area Lindenhurst, New York Charles K. Kao Fire Island Rockefeller University Situation comedy Alma mater Roslyn Harbor, New York Liposarcoma Chemung County, New York Fort Tompkins Riverhead (town), New York The Wave (newspaper) Shanghai Wolcott, Connecticut Bayonne, New Jersey Compressed natural gas Lancaster, Pennsylvania Greater New Haven , Baron Blackett Broadway theater The Phoenix (magazine) Super Bowl XLVIII , New York 2000 Summer Paralympics School of Visual Arts Pace University Tokyo Subway Housing cooperative Southampton (town), New York Boisfeuillet Jones, Jr. 2010 United States Census 2008 Summer Paralympics Greater Jacksonville Metropolitan Area Setback (architecture) List of U.S. cities with high transit ridershipDoonesbury David Weigel Summer Paralympic Games Northport, New York St. John's University (Jamaica, NY) Esther Lederberg Omega-minus particle India Today Sum-over-paths New York County List of United States cities by population density Luis St. Patrick's DayKingdom of EnglandThe Progressive 1 E9 m Victor Stabin Principle of stationary actionMedia in New York City Monocle (2007 magazine) Hamilton Grange National Memorial Manhattan Neighborhood Network Bayville, New York Columbus, Ohio metropolitan area Will Leitch Megacity The Week (Indian magazine)Rob Walker (journalist) Gustav Ludwig HertzAlbert Einstein 1880 United States Census Music of Long Island Wilhelm Rntgen Wanamaker Mile New Statesman 1790 United States Census Minneapolis, Minnesota Rogers Commission List of hospitals in New York City Providence metropolitan area Cannabis (drug) Community Service Society of New York New York City Landmarks Preservation CommissionAnsonia, Connecticut Greenport, Suffolk County, New York Citigroup Indigenous peoples of the Americas Adirondack MountainsNew York State Colorado Springs, Colorado Simon Doonan The Mesa, Arizona Cairo Governorate James II of England Pike County, Pennsylvania Harlem Renaissance Greater Houston Roosevelt Island ZIP code Kings Point, New York Makoto Kobayashi (physicist) Johann Hari Cond Nast Building Greater San Antonio Passaic, New Jersey Rufus Wilmot Griswold Miami Republican Party (United States) Santo Domingo John William Strutt, 3rd Baron RayleighArXiv Jews Neural networkFreshman W. Daniel Hillis Butterfly EffectJohannesburg Unicameralism Fort Wadsworth Tel Aviv Guilford, Connecticut Jamestown, New York New York City arts organizations Trinidad and Tobago Old Westbury, New York Bogot Advertising agency The OldieGreat Neck Plaza, New York Kenneth T. Jackson Encyclopedia of New York City Alexei Alexeyevich Abrikosov Boston Area code 646 Newark, New Jersey Brooklyn Cyclones Rush hour North Haven, New York NYC (disambiguation) Schuyler County, New York Kearny, New Jersey Belle Terre, New York Political independent Massachusetts Bay Colony News WeeklyAsthma Guttenberg, New Jersey Josiah Willard GibbsFeynman diagram Wikiquote National Journal PlaNYC Washington Metropolitan AreaDemographics of New York Shelter Island (town), New York George Washington Isolation tank Washington County, New York Haute cuisine CBS Parton (particle physics) Hudson County, New Jersey Whitaker's Almanack Litchfield County, Connecticut Milford, Connecticut North China Battery Park City, Manhattan Marble Hill, Manhattan Matthew SandsKenneth G. Wilson Second Continental CongressAllegany County, New York Roslyn, New York Sands Point, New YorkInland Empire (California) Leon Neil Cooper Apartment building Village of the Branch, New York Southold, New York New Rochelle, New York New York County, New York East Hampton (town), New York 1900 United States Census Rockville Centre,Tribeca NewFilm Festival York Membership of the New York City Council Port Washington North, New York The Great White Way Greater Buenos AiresJakarta Memphis metropolitan area Views Omaha, Nebraska Minneapolis Saint Paul Rockefeller CenterPhiladelphia San Diego metropolitan area Portland metropolitan area Quantum cellular automata Biology Fred Kaplan Garfield, New Jersey Triangle Shirtwaist Factory fireShawangunk Ridge Brookville, New York Hans BetheMeriden, Connecticut Anthony Burgess List of Long Islanders Lima Essex County, New York Physicist Uniform Resource Locator Articles of Confederation Melting pot Parallel processing Tianjin Edward Manukyan Jerusalem Interpol (band) China Hip hop culture George E. Smith Martian Polykarp KuschBranford, Connecticut Grand Slam (tennis) American Broadcasting Company Baltimore International Standard Serial NumberOld master print Madison County, New York St. Lawrence County, New York Chonqing Real Time Opera List of theoretical Hempstead (town), New York Michael BloombergFur tradePerth Amboy, New Jersey Istanbul Poughkeepsie, New York Western Hemisphere Putnam Fellow BetheFeynman formula Nashville, Tennessee Jamaica Bay Albert Hibbs Matthew BroderickFermiDirac statistics Time zone Elizabeth, New Jersey IT Week James Surowiecki Science (journal) List of countries by homicide rate GuglielmoAltadena, California Marconi Prediction Battle of Long Island Doctoral advisor Fortnight MagazineStamp Act CongressGreat Migration (African American) Montgomery County, New York American Airlines American Museum of Natural History List of capitals in the United States Magill LenapeLondon Underground Somerset County, New Jersey New Fairfield, Connecticut Classified Ventures Shenzhen Linden, New Jersey Books on New York City John C. Mather Critical exponent Amandla (magazine) Beijing Review Stratford, Connecticut Ltd Westchester County Noseweek Schoharie County, New York MasatoshiPhilosophiae KoshibaNaturalis Principia Mathematica Administrative divisions of NewGoogle York Book Search Troy, New York Waterbury, Connecticut Great Irish Famine New York Stock Exchange Hearst Tower (New York City) Ketamine Shelton, Connecticut Rapid transit Tsung-Dao LeeFrits ZernikeBoseEinstein statistics Paul Krugman Heidelberg Music of New York City New Haven, Connecticut Ripponden Time Warner History of New York TheCity New York Times Barbara McClintock Tuva or Bust Hudson-Fulton Celebration Todt Hill Carnegie Hall Tampa Bay AreaPunk subculture Seoul United States Census Bureau List of people from New York New York Draft Riots Dongguan Christopher Hitchens Hectare The WeekForward Magazine Rome Long Island List of United States cities by population Bacteriophage lambda U.S. News& World Report United States East Williston, New York Islandia, New York Game design List of law enforcement agencies in Long Island Microsoft Excel Sports in New York City Timeline of New York City crimes and disasters Charles II of England John C. Lilly Brooklyn Islip (hamlet), New York Columbus, Ohio Broadway (New York City) Forth magazine Tucson, Arizona Irish people 1860 United States Census South Africa Austan Goolsbee US Open (tennis) EcuadorSag Harbor, New York HBO Harper's Magazine Adobe Flash Latin Kings (gang) E. B. White Fifth Avenue (Manhattan) Seymour, Connecticut Dutchess County, New York Government of New York City West Haven, Connecticut New York Times Laurel Hollow, New York New York Harbor Feynman sprinkler Lake Success, New York Bruce ReedNew Orleans metropolitan area The American Prospect San Jose, California Pennsylvania Station (New York City) Prospect (magazine) Times Herald-Record Area code 718 The New Republic Miami, Florida San Antonio, Texas The Jerusalem Report County (United States) Forty Thieves (New York gang) Overland (literary journal) Boston, MassachusettsEdwin G. Burrows Investigate (magazine) JSTOR Wallingford, Connecticut Silicon Valley Shoreham, New York Genetics (journal) Cortland County, New York Secaucus, New Jersey FermilabClaude Cohen-TannoudjiPhilipp Lenard List of Nobel laureates in Physics Barclays Center Area code 347 List of cities with most skyscrapers The EconomistClifton, New Jersey Rudolf Mssbauer The New York Review of Books Chenango County, New York Robert R. Wilson Sabbatical Horst Ludwig Strmer UrbanSacramento, area California Superfluidity Rev. Mod. Phys. Queens Borough Public Library South Florida metropolitan area Port Authority Trans-Hudson One-electron Ernstuniverse Ruska Community of Madrid Cincinnati Northern Kentucky metropolitan area Robert B. Laughlin Burial Ridge1939 New York World's Fair Bandung Caltech European American Physical ReviewHuman computer Punk rock General American Pulitzer Prize Charter school Lattingtown, New York TWA Oswego County, New York There'sEliot Spitzer Plenty of Room at the BottomMetropolitan Opera Geography of Long Island Summit, New Jersey East Haven, Connecticut MagnumSwedish Photos Cottage Marionette Theatre Metro-North Railroad Award Simon van der FreemanMeer Dyson Harrison, New Jersey WNYC Capital Ethiopia Wikipedia Honsh John A. Wheeler Phil Carter Flash VideoBASIC (physics) Melvin SchwartzLeo Esaki African Americans Lake Grove, New York The Strokes Tel Aviv District 1800 United States Census Ann Hulbert WheelerFeynman absorber theoryJacques Attali Montreal Metro Norval White Thomas Curtright Feynman diagrams Far Rockaway, QueensJohannes Stark Jewish quota Education in New York City Coney Island 1970 United States Census Michael Steinberger Maclean's Guido Pontecorvo Macy's Thanksgiving Day Parade Robert Coleman Richardson List of metropolitan areas by population Tim Wu Office of Scientific and Technical InformationGerald M. Rosberg Bangalore The Middle East (magazine) Google VideosBCS theory Physics Richard Chace Tolman Ultraviolet David Edelstein Peter Parnell El Tiempo LatinoNiels BohrWhat Do You Care What Other People Think? AtheismTuvaPieter ZeemanNASA Kaplan, Inc. John Gribbin The Monthly Amazon.com Village (magazine) Slate (magazine) WKMG-TV Hal S. Jones WJXT Post-Newsweek Stations Key West Literary Seminar The (Bangladesh) U. S. Department of Justice Boston Phoenix Dahlia Lithwick John Alderman () Ann L. McDaniel Fairfax County Times Jack Shafer FT Magazine The Washington Post Writers Group Newsline (magazine) Chaos theory James Gleick Atlantic Annie Lowrey Eric Leser Steven Landsburg The Fray (Internet forum) Salon.com PrivateSeth StevensonEye Washington Post Company New Zealand Listener Dhaka Courier KSAT-TV National ReviewStephen Jay Gould Sasha Frere-Jones Jonah Weiner Wilson Quarterly Linguistics William Saletan Robert Wright (journalist)Harvard College Jacob Weisberg David Plotz Ethiopian Review Andy Bowers Foreign AffairsThe Nation Human Events E!Sharp Dana Stevens (critic) The Atlantic WDIV-TV Adam Kirsch Podcast Dear Prudence (advice column)Virginia Heffernan Douglas Hofstadter Timothy Noah The Liberal The Herald (Everett) Asia Pacific Management Institute Farhad Manjoo WPLG The American Spectator BBC Focus on Africa David Helvarg Donald E. Graham Fareed Zakaria MSN Queue Magazine Literary Review of CanadaForeign Policy Greater Washington Publishing Ron Rosenbaum Anne Applebaum

Vijay Ravindran Tom VanderbiltNews magazine Nina Shen Rastogi Microsoft London Review of Books European Commission Authors Guild Newsweek EUobserver Daniel Radosh New African New York (magazine)Harvard Crimson Kaplan College Paaras National Book Award North& South (magazine) Franklin Foer Fig. 4: Environment described by three 16 × 20 rooms connected The Big Issue Policy Review Michael Kinsley Dissent (Australian magazine) Meghan O'Rourke Autodesk Forum (Bangladesh) The Drouth CableOne David Greenberg Standpoint (magazine) Robert PinskyExpress (newspaper) Griffith Review SCORE! Educational Centers The Spectator The Washington Post Company Antitrust Emily Bazelon Chief Executive Officer The Bulletin (Brussels weekly) through the middle row. Ian Bremmer Southern Maryland The PipelineTehelka The Gazette (Maryland) Minneapolis Maisonneuve (magazine) Washington Post This (Canadian magazine) Quadrant (magazine) National Public Radio Business School The Weekly Standard Time (magazine) John Dickerson (journalist) Variant (magazine) Canadian Dimension Eliot Porter National Observer (Australia) Jody Rosen USA MagazineIndependent station (North America) Fig. 3: Graph for the "James Gleick" Wikipedia article with 1975 nodes and 1999 edges. The seed article is in light blue. The size of the nodes (except for the seed node) is proportional to the similarity. In red are all the nodes with similarity bigger than 0.7. There are several articles with high similarity more than one link ahead. 0.06 0.04

0.02 π function. The value-function is the function V : X 7→ R defined 0.00 as 0.02 " ∞ # π t 0.04 V (x) = E ∑ γ r(xt ) x0 = x,at = π(xt ) , t=1 0.06 and plays a key role in many RL algorithms [Sze10]. When the 14 π 12 dimension of X is significant, it is common to approximate V (x) 10 8 rows by 10 20 6 π 30 4 V ≈ Vˆ = Φw, columns40 2 50 where Φ is an n-by-k matrix whose columns are the basis functions Second to fourth eigenvectors of the laplacian of the three used to approximate the value-function, n is the number of states, Fig. 5: rooms graph. Note how the eigendecomposition automatically cap- and w is a vector of dimension k. Typically, the basis functions tures the structure of the environment. are selected by hand, for example, by using polynomials or radial basis functions. Since choosing the right functions can be difficult, Mahadevan and Maggioni [Mah07] proposed a framework where Visualizing the LSA Space these basis functions are learned from the topology of the state We believe that being able to work in a vector space will allow us space. The key idea is to represent the state space by a graph and to use a series of RL techniques that otherwise we would not be use the k smoothest eigenvectors of the graph laplacian, dubbed available to use. For example, when using Proto-value functions, Proto-value functions, as basis functions. Given the graph that it is possible to use the Nyström approximation to estimate the represents the state space, it is very simple to find these basis value of an eigenvector for out-of-sample states [Mah06]; this is functions. As an example, consider an environment consisting of only possible if states can be represented as points belonging to a three 16 × 20 grid-like rooms connected in the middle, as shown Euclidean space. in Fig.4. Assuming the graph is stored in G, the following code7 How can we embed an entity in Euclidean space? In the computes the eigenvectors of the laplacian: previous section we showed that LSA can effectively compute L = nx.laplacian(G, sorted(G.nodes())) the similarity between documents. We can take this concept one evalues, evec = np.linalg.eigh(L) step forward and use LSA not only for computing similarities, but Figure5 shows 8 the second to fourth eigenvectors. Since in general also for embedding documents in Euclidean space. value-functions associated to this environment will exhibit a fast To evaluate the soundness of this idea, we perform an ex- change rate close to the room’s boundaries, these eigenvectors ploratory analysis of the simple Wikipedia LSA space. In order provide an efficient approximation basis. to be able to visualize the vectors, we use ISOMAP [Ten00] to reduce the dimension of the LSA vectors from 200 to 3 (we use 7. We assume that the standard import numpy as np and import the ISOMAP implementation provided by scikit-learn [Ped11]). networkx as nx statements were previously executed. We show a typical result in Fig.6, where each point represents the 8. The eigenvectors are reshaped from vectors of dimension 3 × 16 × 20 = LSA embedding of an article in R3, and a line between two points 960 to a matrix of size 16-by-60. To get meaningful results, it is necessary to represents a link between two articles. We can see how the points build the laplacian using the nodes in the grid in a row major order. This is why the nx.laplacian function is called with sorted(G.nodes()) as close to the "Water" article are, in effect, semantically related the second parameter. ("Fresh water", "Lake", "Snow", etc.). This result confirms that the A TALE OF FOUR LIBRARIES 15

[Ped11] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot and Rain E. Duchesnay. Scikit-learn: Machine Learning in Python, Journal of Machine Learning Research, 12:2825-2830, 2011 Sea 0.3 [Reh10]R. Reh˚uˇ rekˇ and P. Sojka. Software Framework for Topic Modelling Snow 0.4 with Large Corpora, in Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, pp. 45-50 May 2010 0.5 [Sze10] C. Szepesvári. Algorithms for Reinforcement Learning. San Rafael, 0.6 CA, Morgan and Claypool Publishers, 2010. [Sut98] R.S. Sutton and A.G. Barto. Reinforcement Learning. , Drink 0.7 Massachusetts, The MIT press, 1998. River 0.8 [Mah07] S. Mahadevan and M. Maggioni. Proto-value functions: A Laplacian Lake framework for learning representation and control in Markov deci- Fresh water Salt water sion processes. Journal of Machine Learning Research, 8:2169-2231, 0.20 2007. Water 0.25 [Man08] C.D. Manning, P. Raghavan and H. Schutze. An introduction to 0.2 0.30 information retrieval 0.1 . Cambridge, England. Cambridge University 0.0 0.35 0.1 Press, 2008 0.2 0.40 [Ten00] J.B Tenenbaum, V. de Silva, and J.C. Langford. A global geo- 0.3 0.4 0.45 metric framework for nonlinear dimensionality reduction . Science, 290(5500), 2319-2323, 2000 Fig. 6: ISOMAP projection of the LSA space. Each point represents [Mah06] S. Mahadevan, M. Maggioni, K. Ferguson and S.Osentoski. Learn- the LSA vector of a Simple English Wikipedia article projected ing representation and control in continuous Markov decision pro- onto R3 using ISOMAP. A line is added if there is a link between cesses. National Conference on Artificial Intelligence, 2006. the corresponding articles. The figure shows a close-up around the [Dee90] S. Deerwester, S.T. Dumais, G.W. Furnas, T.K. Landauer and R. "Water" article. We can observe that this point is close to points Harshman, R. (1990). Indexing by latent semantic analysis. Journal associated to articles with a similar semantic. of the American Society for Information Science, 41(6), 391-407.

LSA representation is not only useful for computing similarities between documents, but it is also an effective mechanism for embedding the information entities into a Euclidean space. This result encourages us to propose the use of the LSA representation in the definition of the state. Once again we emphasize that since Gensim vectors are NumPY arrays, we can use its output as an input to scikit-learn without any effort.

Conclusions We have presented an example where we use different elements of the scientific Python ecosystem to solve a research problem. Since we use libraries where NumPy arrays are used as the standard vector/matrix format, the integration among these components is transparent. We believe that this work is a good success story that validates Python as a viable scientific programming language. Our work shows that in many cases it is advantageous to use general purposes languages, like Python, for scientific computing. Although some computational parts of this work might be some- what simpler to implement in a domain specific language,9 the breadth of tasks that we work with could make it hard to integrate all of the parts using a domain specific language.

Acknowledgment This work was partially supported by AFOSR grant FA9550-09- 1-0465.

REFERENCES [Tan09] B. Tanner and A. White. RL-Glue: Language-Independent Software for Reinforcement-Learning Experiments, Journal of Machine Learn- ing Research, 10(Sep):2133-2136, 2009 [Hag08] A. Hagberg, D. Schult and P. Swart, Exploring Network Structure, Dynamics, and Function using NetworkX, in Proceedings of the 7th Python in Science Conference (SciPy2008), Gäel Varoquaux, Travis Vaught, and Jarrod Millman (Eds), (Pasadena, CA USA), pp. 11-15, Aug 2008

9. Examples of such languages are MATLAB, Octave, SciLab, etc.