PROC. OF THE 11th PYTHON IN SCIENCE CONF. (SCIPY 2012) 11 A Tale of Four Libraries
Alejandro Weinstein‡∗, Michael Wakin‡
!
Abstract—This work describes the use some scientific Python tools to solve One of the contributions of our research is the idea of rep- information gathering problems using Reinforcement Learning. In particular, resenting the items in the datasets as vectors belonging to a we focus on the problem of designing an agent able to learn how to gather linear space. To this end, we build a Latent Semantic Analysis information in linked datasets. We use four different libraries—RL-Glue, Gensim, (LSA) [Dee90] model to project documents onto a vector space. NetworkX, and scikit-learn—during different stages of our research. We show This allows us, in addition to being able to compute similarities that, by using NumPy arrays as the default vector/matrix format, it is possible to between documents, to leverage a variety of RL techniques that integrate these libraries with minimal effort. require a vector representation. We use the Gensim library to build Index Terms—reinforcement learning, latent semantic analysis, machine learn- the LSA model. This library provides all the machinery to build, ing among other options, the LSA model. One place where Gensim shines is in its capability to handle big data sets, like the entirety of Wikipedia, that do not fit in memory. We also combine the vector Introduction representation of the items as a property of the NetworkX nodes. In addition to bringing efficient array computing and standard Finally, we also use the manifold learning capabilities of mathematical tools to Python, the NumPy/SciPy libraries provide sckit-learn, like the ISOMAP algorithm [Ten00], to perform some an ecosystem where multiple libraries can coexist and interact. exploratory data analysis. By reducing the dimensionality of the This work describes a success story where we integrate several LSA vectors obtained using Gensim from 400 to 3, we are able libraries, developed by different groups, to solve some of our to visualize the relative position of the vectors together with their research problems. connections. Our research focuses on using Reinforcement Learning (RL) to Source code to reproduce the results shown in this work is gather information in domains described by an underlying linked available at https://github.com/aweinstein/a_tale. dataset. We are interested in problems such as the following: given a Wikipedia article as a seed, find other articles that are interesting Reinforcement Learning relative to the starting point. Of particular interest is to find articles The RL paradigm [Sut98] considers an agent that interacts with that are more than one-click away from the seed, since these an environment described by a Markov Decision Process (MDP). articles are in general harder to find by a human. Formally, an MDP is defined by a state space , an action space In addition to the staples of scientific Python computing X , a transition probability function P, and a reward function r. At NumPy, SciPy, Matplotlib, and IPython, we use the libraries RL- A a given sample time t = 0,1,... the agent is at state x ∈ , and Glue [Tan09], NetworkX [Hag08], Gensim [Reh10], and scikit- t X it chooses action a ∈ . Given the current state x and selected learn [Ped11]. t A action a, the probability that the next state is x0 is determined by Reinforcement Learning considers the interaction between P(x,a,x0). After reaching the next state x0, the agent observes an a given environment and an agent. The objective is to design immediate reward r(x0). Figure1 depicts the agent-environment an agent able to learn a policy that allows it to maximize its interaction. In an RL problem, the objective is to find a function total expected reward. We use the RL-Glue library for our RL π : 7→ , called the policy, that maximizes the total expected experiments. This library provides the infrastructure to connect an X A reward environment and an agent, each one described by an independent " ∞ # t Python program. R = E ∑ γ r(xt ) , We represent the linked datasets we work with as graphs. t=1 For this we use NetworkX, which provides data structures to where γ ∈ (0,1) is a given discount factor. Note that typically efficiently represent graphs, together with implementations of the agent does not know the functions P and r, and it must many classic graph algorithms. We use NetworkX graphs to find the optimal policy by interacting with the environment. See describe the environments implemented using RL-Glue. We also Szepesvári [Sze10] for a detailed review of the theory of MDPs use these graphs to create, analyze and visualize graphs built from and the different algorithms used in RL. unstructured data. We implement the RL algorithms using the RL-Glue library Corresponding author: [email protected] [Tan09]. The library consists of the RL-Glue Core program and * 1 ‡ the EECS department of the Colorado School of Mines a set of codecs for different languages to communicate with the library. To run an instance of a RL problem one needs to Copyright © 2012 Alejandro Weinstein et al. This is an open-access article write three different programs: the environment, the agent, and distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, the experiment. The environment and the agent programs match provided the original author and source are credited. exactly the corresponding elements of the RL framework, while 12 PROC. OF THE 11th PYTHON IN SCIENCE CONF. (SCIPY 2012) x a In the context of a collection of documents, or corpus, word r Agent Environment frequency is not enough to asses the importance of a term. For this reason, we introduce the quantity document frequency dft , defined to be the number of documents in the collection that contain term t. We can now define the inverse document frequency (idf) as N Fig. 1: The agent-environment interaction. The agent observes the idft = log , current state x and reward r; then it executes action π(x) = a. dft where N is the number of documents in the corpus. The idf is a measure of how unusual a term is. We define the tf-idf weight of the experiment orchestrates the interaction between these two. The term t in document d as following code snippets show the main methods that these three programs must implement: tf-idft,d = tft,d ×idft . ################# environment.py ################# This quantity is a good indicator of the discriminating power of a class env(Environment): term inside a given document. For each document in the corpus def env_start(self): # Set the current state we compute a vector of length M, where M is the total number of terms in the corpus. Each entry of this vector is the tf-idf weight return current_state for each term (if a term does not exist in the document, the weight is set to 0). We stack all the vectors to build the M × N term- def env_step(self, action): # Set the new state according to document matrix C. # the current state and given action. Note that since typically a document contains only a small fraction of the total number of terms in the corpus, the columns of return reward the term-document matrix are sparse. The method known as Latent #################### agent.py #################### Semantic Analysis (LSA) [Dee90] constructs a low-rank approx- class agent(Agent): imation Ck of rank at most k of C. The value of k, also known def agent_start(self, state): as the latent dimension, is a design parameter typically chosen # First step of an experiment to be in the low hundreds. This low-rank representation induces return action a projection onto a k-dimensional space. The similarity between the vector representation of the documents is now computed after def agent_step(self, reward, obs): projecting the vectors onto this subspace. One advantage of LSA # Execute a step of the RL algorithm is that it deals with the problems of synonymy, where different return action words have the same meaning, and polysemy, where one word has different meanings. ################# experiment.py ################## RLGlue.init() Using the Singular Value Decomposition (SVD) of the term- T RLGlue.RL_start() document matrix C = UΣV , the k-rank approximation of C is RLGlue.RL_episode(100) # Run an episode given by C = U V T , Note that RL-Glue is only a thin layer among these programs, k kΣk k allowing us to use any construction inside them. In particular, as where Uk, Σk, and Vk are the matrices formed by the k first described in the following sections, we use a NetworkX graph to columns of U, Σ, and V, respectively. The tf-idf representation model the environment. of a document q is projected onto the k-dimensional subspace as −1 T qk = Σk Uk q. Computing the Similarity between Documents Note that this projection transforms a sparse vector of length M To be able to gather information, we need to be able to quantify into a dense vector of length k. how relevant an item in the dataset is. When we work with In this work we use the Gensim library [Reh10] to build the documents, we use the similarity between a given document and vector space model. To test the library we downloaded the top 100 the seed to this end. Among the several ways of computing most popular books from project Gutenberg.2 After constructing similarities between documents, we choose the Vector Space the LSA model with 200 latent dimensions, we computed the Model [Man08]. Under this setup, each document is represented similarity between Moby Dick, which is in the corpus used to by a vector. The similarity between two documents is estimated build the model, and 6 other documents (see the results in Table by the cosine similarity of the document vector representations. 1). The first document is an excerpt from Moby Dick, 393 words The first step in representing a piece of text as a vector is long. The second one is an excerpt from the Wikipedia Moby to build a bag of words model, where we count the occurrences Dick article. The third one is an excerpt, 185 words long, of The of each term in the document. These word frequencies become Call of the Wild. The remaining two documents are excerpts from the vector entries, and we denote the term frequency of term t Wikipedia articles not related to Moby Dick. The similarity values in document d by tft,d. Although this model ignores information we obtain validate the model, since we can see high values (above related to the order of the words, it is still powerful enough to 0.8) for the documents related to Moby Dick, and significantly produce meaningful results. smaller values for the remaining ones.
1. Currently there are codecs for Python, C/C++, Java, Lisp, MATLAB, and 2. As per the April 20, 2011 list, http://www.gutenberg.org/browse/scores/ Go. top. A TALE OF FOUR LIBRARIES 13
Arctic Text description LSA similarity Boot Insect Culture Cuba Flax Shoe Fabric Sandal (footwear) Trenchcoat Silk Excerpt from Moby Dick 0.87 Deer Weapon Tank Body piercing Cane Bullet Shirt Air gun Wool Arrow Leg Excerpt from Wikipedia Moby Dick article 0.83 Makeup Scissors Glove Hunting Military Artificial fibres Permit Cylinder Media Excerpt from The Call of the Wild 0.48 Helicopter Needle Cotton Hemp Human Torso Nylon Nobility Excerpt from Wikipedia Jewish Calendar 0.40 Rifle Soldier Purse Arm Submachine gun Water Polyester Clothing Sewing machine Army Eyeglasses article United States Gun Country Cloth Machine gun Leather Coat Pants Animal Pistol Scarification Excerpt from Wikipedia Oxygen article 0.33 Airplane Food Fashion (clothing) Warship Umbrella Cartridge Temperature Acrylic Australia Tent Vehicle Sock Spear Shoot Knife Jewelry Tattoo Police Paris Fur Sniper Costume Defense Revolver Body modification Ramie Skirt Thread Olympic Games Machine Wikimedia Commons Perfume Shotgun Hat Jeans TABLE 1: Similarity between Moby Dick and other documents. Accessories Fig. 2: Graph for the "Army" article in the simple Wikipedia with 97 nodes and 99 edges. The seed article is in light blue. The size of Next, we build the LSA model for Wikipedia that allows us to the nodes (except for the seed node) is proportional to the similarity. compute the similarity between Wikipedia articles. Although this In red are all the nodes with similarity greater than 0.5. We found is a lengthy process that takes more than 20 hours, once the model two articles ("Defense" and "Weapon") similar to the seed three links is built, a similarity computation is very fast (on the order of 10 ahead. milliseconds). Results in the next section make use of this model. Note that although in principle it is simple to compute the
LSA model of a given corpus, the size of the datasets we are 21 sim= get_similarity(page_text) interested in make doing this a significant challenge. The two 22 self.G.add_edge(source, main difficulties are that in general (i) we cannot hold the vector 23 dest, 24 weight=sim, representation of the corpus in RAM memory, and (ii) we need to 25 d=dist) compute the SVD of a matrix whose size is beyond the limits of 26 new_links= [(l, dist, page.name) what standard solvers can handle. Here Gensim does stellar work 27 for l in page.links()] 28 links= new_links+ links by being able to handle both these challenges. 29 n+=1 30 31 return G Representing the State Space as a Graph We are interested in the problem of gathering information in We now show the result of running the code above for two domains described by linked datasets. It is natural to describe such different setups. In the first instance we crawl the Simple English domains by graphs. We use the NetworkX library [Hag08] to build Wikipedia4 using "Army" as the seed article. We set the limit on the graphs we work with. NetworkX provides data structures to the number of articles to visit to 100. The result is depicted5 in Fig. represent different kinds of graphs (undirected, weighted, directed, 2, where the node corresponding to the seed article is in light blue etc.), together with implementations of many graph algorithms. and the remaining nodes have a size proportional to the similarity NetworkX allows one to use any hashable Python object as a node with respect to the seed. Red nodes are the ones with similarity identifier. Also, any Python object can be used as a node, edge, or bigger than 0.5. We observe two nodes, "Defense" and "Weapon", graph attribute. We exploit this capability by using the LSA vector with similarities 0.7 and 0.53 respectively, that are three links away representation of a Wikipedia article, which is a NumPy array, as from the seed. a node attribute. In the second instance we crawl Wikipedia using the article The following code snippet shows a function3 used to build a "James Gleick"6 as seed. We set the limit on the number of directed graph where nodes represent Wikipedia articles, and the articles to visit to 2000. We show the result in Fig.3, where, edges represent links between articles. Note that we compute the as in the previous example, the node corresponding to the seed LSA representation of the article (line 11), and that this vector is in light blue and the remaining nodes have a size proportional is used as a node attribute (line 13). The function obtains up to to the similarity with respect to the seed. The eleven red nodes n_max articles by breadth-first crawling the Wikipedia, starting are the ones with similarity greater than 0.7. Of these, 9 are more from the article defined by page. than one link away from the seed. We see that the article with the
1 def crawl(page, n_max): biggest similarity, with a value of 0.8, is about "Robert Wright 2 G= nx.DiGraph() (journalist)", and it is two links away from the seed (passing 3 n=0 through the "Slate magazine" article). Robert Wright writes books 4 links= [(page,-1, None)] about sciences, history and religion. It is very reasonable to 5 while n< n_max: 6 link= links.pop() consider him an author similar to James Gleick. 7 page= link[0] Another place where graphs can play an important role in the 8 dist= link[1]+1 RL problem is in finding basis functions to approximate the value- 9 page_text= page.edit().encode('utf-8') 10 # LSI representation of page_text 11 v_lsi= get_lsi(page_text) 3. The parameter page is a mwclient page object. See http://sourceforge. 12 # Add node to the graph net/apps/mediawiki/mwclient/. 13 G.add_node(page.name, v=v_lsi) 4. The Simple English Wikipedia (http://simple.wikipedia.org) has articles 14 if link[2]: written in simple English and has a much smaller number of articles than the 15 source= link[2] standard Wikipedia. We use it because of its simplicity. 16 dest= page.name 17 if G.has_edge(source, dest): 5. To generate this figure, we save the NetworkX graph in GEXF format, 18 # Link already exists and create the diagram using Gephi (http://gephi.org/). 19 continue 6. James Gleick is "an American author, journalist, and biographer, whose 20 else: books explore the cultural ramifications of science and technology". 14 PROC. OF THE 11th PYTHON IN SCIENCE CONF. (SCIPY 2012)
Palgrave Macmillan Hewlett Bay Park, New York Wayne County, New YorkYeshiva University MTA Regional Bus Operations Derby, Connecticut Terraced house Kinshasa Northeast Corridor Dover, New Jersey Commuting Snow Castle Clinton National Monument All in the Family Riverhead (CDP), New York 1 World Trade Center Mexico Public Broadcasting ServiceCayuga County, New York Orleans County, New York Rio de Janeiro East River Taxicabs of New York City George M. Cohan Traffic congestion Hamilton County, New York Mexico City Empire State Building AMNRL Court of International Trade Westport, Connecticut Congress of the Confederation Wall Street Cooper Union Newark Liberty International Airport Jacksonville Axemen New Netherland United Airlines Seagram Building New Angoulme Oyster Bay Cove, New York Benjamin Franklin Greater Hartford Mount Vernon, New York Indianapolis metropolitan area Queens County, New York Utica, New York Precipitation (meteorology) Hudson Valley
1972 Summer Paralympics Tourism in New York City George II of Great Britain Upstate New York Torrington, Connecticut John Keese Oneida County, New York Huntington (CDP), New York U.S. state BirminghamHoover Metropolitan Area IndianapolisCleveland, Ohio Cairo New York School Sussex County, New Jersey Outer barrier San Francisco Gay rights movement Montreal New York Giants (MLB) Peter Minuit Massapequa, New York List of places known as the capital of the world Columbia County, New YorkMuttontown, New York Niagara Falls, New York Freeway 1910 United States Census Archie Bunker Indian-American Gateway National Recreation Area National Institutes of HealthSons of Liberty Baltimore Metropolitan Area Islip (town), New York Brightwaters, New York Rhotic and non-rhotic accents Thomas Jefferson University World economy Louisville, Kentucky Thomas Menino 1850 United States Census Lloyd Harbor, New York Juilliard School New York City, New York Architecture of New York City 1996 Summer Paralympics New York in film American International Group Los Angeles Dodgers New Orleans People's Republic of China Lower Manhattan Daily News (New York) Matinecock, New York Charlotte, North Carolina National Invitation Tournament Holland Purchase Spain Mill Neck, New York Bloods Basingstoke Munsey Park, New York Old Brookville, New York 1870 United States Census Lawrence, Nassau County, New York New York, New York (disambiguation) Peekskill, New York Metro Detroit Upper West Side The New School Seneca County, New York Borough (New York City) Wantagh, New York Metropolitan Museum of Art Stickball Staten Island Greenbelt Colombia College town Arnhem Bronx Zoo Huntington, New York Israel New York Knicks Kant region Ridge-and-Valley Appalachians WNET List of urban areas by population Jersey City Dallas, Texas North Jersey Rome, New York East Rand The Record (Bergen County) Upper Manhattan Head of the Harbor, New York Southbury, Connecticut Niall of the Nine Hostages New Jersey Falafel Great Neck Estates, New York Giovanni da Verrazzano The Statesman's Yearbook Southtowns Staten Island New York Amsterdam NewsPrudential Center Jazz at Lincoln Center Oakland, California Denver, ColoradoCouncillor New York metropolitan area Lynbrook, New York Uranium hydride bomb Washington Square Park, New York Democratic Party (United States) Delaware County, New York September 11, 2001 attacks Comedy Central 1810 United States Census Freedom of the press Colson Whitehead List of tallest buildings in New York City Morgan Stanley Silicon Alley New York Supreme Court American Express Memphis, Tennessee San Francisco GiantsMaine Natural logarithm Hoboken, New Jersey Mercer County, New Jersey Culture of New York City Freeport, New York Amityville, New YorkLevittown, New York Los Alamos National Laboratory North Haven, Connecticut Kingston, New York Geography of New York John F. Kennedy International Airport Asbury Park, New Jersey Saskia Sassen Saratoga County, New York Trumbull, Connecticut Central Park Zoo New Meadowlands Stadium Connecticut Jewish American literature Massapequa Park, New York Manhattan Chinatown Vivian Beaumont TheatreNiagara Frontier Roosevelt Island Tramway Amtrak Wyoming County, New York Battery Weed Percy Williams Bridgman Tioga County, New York Second Anglo-Dutch War Lower East Side, ManhattanGrand Central Terminal Abstract expressionism Rensselaer County, New York Johannes Georg Bednorz NBC Studios Richard E. Taylor New York State Unified Court System Karachi Seattle, Washington Dominican Republic Hybrid electric vehicle Evacuation Day (New York) Saint Lawrence Seaway DallasFort Worth metroplex New York City Department of City Planning New York Botanical Garden Independent (politician) New York-style pizza Feynman Long Division Puzzles Statue of Liberty National MonumentLaw enforcement in New York City Psychometrics York, Pennsylvania Staten Island Ferry Hip hop music Otsego County, New York General Grant National Memorial National Football League Johannes Diderik van der Waals Gun politics Alexander Hamilton Great Fire of New York Roslyn Estates, New York Isidor Isaac Rabi List of municipalities on Long Island Hungary Ulster County, New York Federal Writers' Project Subway Series Co-op City, BronxBridgeport, Connecticut New Haven County, Connecticut Babylon (village), New York John L. Hall West New York, New Jersey Demographics of New York City Tuberculosis Seal of New York City New York Metropolitan Area Corning (city), New York Max Planck 1830 United States Census Kuala Lumpur New York Supreme Court, Appellate Division Tammany Hall Thomas A. Welton William Alfred Fowler Nissequogue, New York North Hills, New York Intelligence quotient Brazil Max Born History of Long Island Fixing Broken Windows Monroe County, New York New York State Legislature Conference House Park Island Park, New York Valley Stream, New York Drainage basin Brookings Institute Brookfield, Connecticut Nathaniel Parker Willis Greater Orlando Art Deco Middlesex County, New Jersey Los Angeles metropolitan area 1840 United States Census Tri-State Region Santiago, ChileNew York Jets Town twinning John CockcroftMartin Ryle East Orange, New Jersey National Hockey League Glen Cove, New York Jacob Riis Park Theodor W. Hnsch Hendrik Lorentz Latin America Binghamton, New York Dutch guilder Central Park Conservatory Garden Cellular automata Weak decay 1976 Summer Paralympics Central Jersey Great Neck, New York List of U.S. cities with most pedestrian commuters East Rutherford, New Jersey New Canaan, Connecticut Nazi GermanyRobert Woodrow Wilson Staten Island Railway Dsseldorf Hamden, Connecticut Glens Falls, New York East Rockaway, New York Cheshire, Connecticut New York Bay Atomic bombings of Hiroshima and Nagasaki Uniondale, New York Broome County, New York Hewlett Harbor, New York El Paso, Texas List of corporations based in New York City The Pleasure of Finding Things Out Van Cortlandt Park Hudson River Kurt Gdel Morris County, New Jersey Yorkshire YangMills NASDAQ New York City Department of Education Hunterdon County, New Jersey Harrison, New York Central Park SummerStage Economy of Long Island CincinnatiGothic Revival architecture Passaic County, New Jersey Messenger Lectures Cedarhurst, New York Washington Irving Litchfield Hills England New York Post Continental Airlines Kensington, New York Las Vegas, Nevada Federal Hall National Memorial Richmond County, New York John Peter Zenger Disco JetBlue Independent film Antimatter Fresh water East Hills, New York Greater Danbury Sea Cliff, New York Racial and ethnic history of New York City 1988 Summer Paralympics Alexander Prokhorov Murray Gell-Mann Dering Harbor, New York Yonkers, New YorkHarlem Success Academy Governors Island National Monument Bellerose Terrace, New York Wolfgang Pauli Millrose Games Thomaston, New York Public Prep Bibliography of New York Don QuixoteIntegral calculus American Mafia Education in New York Lazio Upper Brookville, New York Metro systems by annual passenger rides Open Library Monroe, Connecticut Italian people Shinichiro Tomonaga Woolworth Building Newtown, Connecticut History of New York Neighborhoods of New York City Madison Square Garden Nutation Delhi El Diario La Prensa Transatlantic telephone cable Nashville Metropolitan Statistical Area Nicolaas BloembergenStckelbergFeynman interpretationToshihide Maskawa Strong interaction J. J. Thomson Run (island) Arlington, Texas Queens Metonymy Homicide Nuclear reactor NYCTV Newburgh (city), New York Ocean County, New Jersey Las Vegas metropolitan area New York (Anthony Burgess) Kansas City Metropolitan Area Orange, Connecticut Helium Albany County, New YorkHewlett Neck, New York Hybrid vehicle Atlantic Beach, New York Virginia Beach, Virginia Clean diesel Space Shuttle Challenger disaster Encyclopdia Britannica Eleventh Edition Madison Avenue (Manhattan) James Cronin Nanotechnology Humid subtropical climate United States District Court for the Eastern District of New York Madrid Metro 1964 New York World's Fair John von Neumann RobertErnest Oppenheimer Walton Gustaf Daln Journal of the American Planning Association Manhatta Huntington Bay, New York Geneticist Denver-Aurora-Broomfield,Athens CO Metropolitan Statistical Area Fort Lee, New Jersey World Trade Center Memorial Party platform LaGuardia Airport Oklahoma City metropolitan area J.P. Morgan Chase& Co. Jack Steinberger DonaldAfrican A. Burial Glaser Ground National Monument Hempstead (village), New York John Kerry List of regions of the United States Transportation in New York James Rainwater VerizonElections in New Communications York American English Essex County, New Jersey Western New York Milwaukee metropolitan area Cuisine of New York City Metric ton Waldenstrm's macroglobulinemia Kings County, New York Asian (U.S. Census) Minor league baseball Merrick, New York New York's congressional districts South Shore (Long Island) Phoenix, Arizona Economy of New York Abdus Salam Bagel Kansas City, Missouri New York City Hall Louis de Broglie Jerome Isaac Friedman Tom Wolfe Pelham Bay Park Prohibition in the United States Riverbank State Park Mesa Lewis County, New York Sony Music Entertainment Taxation in Flushingthe United MeadowsCorona States Park New Brunswick, New Jersey Elmira, New York Bachelor's degree Liquid helium Ontario County, New York United States Census, 2000 Erie County, New York Louisville Jefferson County, KYIN Metropolitan Statistical Area C. V. Raman Eric Allin Cornell Manne Siegbahn Mineola, New YorkParis George W. Bush Bangkok Hampton Roads Gluon Oersted Medal Northeastern United States Replica plating Walter Houser Brattain National Historic Landmark Mayors Against Illegal Guns Coalition Guyana Oyster Bay (town), New York Fundamental force County (US) Manhattan Rye (city), New York Constitution of the United States Nobel Prize Irish Americans in New York City Food and water in New York City Sheldon Lee Glashow Atlanta, Georgia Supreme Court of the United States Marie CurieF. L. Vernon, Jr. 24/7 Baldwin, Nassau County, New York Superfluid SynesthesiaWomen's National Basketball Association Albert Abraham Michelson Long Island Rail Road 2012 Summer Paralympics Peter GrnbergParks and recreation in New York City New York Botanical Gardens Albert Fert Thomson Reuters Corporation Freestyle music Black Spades Weston, Connecticut John Robert Schrieffer Igor Tamm Staten IslandCombined Yankees statistical area Geography of New YorkTelephone City numbering plan Lists of New York City topics Genghis Blues Baghdad Port Jefferson, New York 2004 Summer Paralympics Feynman point Red Bull Arena (Harrison) New York City Ballet Politics of New York Unisphere Transportation on Long Island City University of New York Hess Corporation List of cities in New York Lawrence M. Krauss Massachusetts Institute of Technology Val Logsdon Fitch Schenectady County, New York Union County, New Jersey David Lee (physicist) Water treatment Rugby league City (New York) Taipei Jacksonville, FloridaMemorial Sloan-Kettering Cancer Center RappingTownhouse List of Brooklyn neighborhoods Kai Siegbahn Pavel Cherenkov HarlemBethpage, New York Moscow Native Americans in the United States IBM Music industry Bohai Economic Rim Kyoto Naugatuck, Connecticut Viacom California Institute of Technology Raymond Davis, Jr. Gotham Gazette Lincoln Center for the Performing Arts Scissor SistersBen Roy Mottelson South Floral Park, New York Tulsa, Oklahoma Barnard College Nikolay Basov Pierre-Gilles de GennesLeon M. Lederman ModelMass (abstract) transit in New York City Wilton, Connecticut Nuclear weapon design Syracuse, New York Gentrification Gross metropolitan product Buffalo, New YorkDistrito Nacional Great Fire of New York (1776)North America College of Mount Saint Vincent Manorhaven, New York Indianapolis, Indiana New York Philharmonic Atomic bomb Big Apple Barcelona Prospect Park (Brooklyn) Felix Bloch Sin-Itiro Tomonaga Plandome Manor,Mumbai New York OsakaResearch Triangle Madrid Tehran Washington Heights, Manhattan Punched card Weill Cornell Medical College of Cornell UniversitySalt marsh Martin Lewis PerlThousandGovernor Islands of New York Malverne, New York Electro-weak Westhampton Beach, New York Ridgefield, Connecticut Jazz World Series Geographic coordinate system Plymouth, Connecticut 1964 Summer Paralympics Quantum chromodynamics Hans Georg Dehmelt Hybrid taxi Ernest Lawrence Norwalk, Connecticut World War II Smithtown, New York Darien, Connecticut Brooklyn Bridge New York City Fire Department They Might Be Giants Central New York Kip Thorne Language delay Broadway theatre Italy Leon Cooper General Slocum Sacramento metropolitan area Vitaly Ginzburg GeneticsRobert B. Leighton (physicist) Paterson, New Jersey Skyscraper NBC Washington Metro ChennaiInternational style (architecture) United States Senate Rockaway Peninsula Spin (physics) Connection Machine Le Szilrd Non-Hispanic whiteGuangzhou Mohawk Valley Park Avenue (Manhattan) Negative probability Goldman Sachs Group Wichita, Kansas Greater Waterbury J. Hans D. JensenCentralNew Park York City Police Department Plandome Heights, New York Northern New Jersey Andre GeimErnest Orlando Lawrence Award HellmannFeynman theorem Geography of New York Harbor Hicksville, New York 1968 Summer ParalympicsMTVKlaus von Klitzing Onondaga County, New York Hong Kong Biotechnology AIA Guide to New York City Floral Park, New York New Jersey Nets Soccer USA Today M-theory Urban heat island Dutch Republic Willis Lamb Universal Music Group Stonewall Rebellion Cornell University Farmingdale, New York Ocean Beach, New York Queens, New York Saddle Rock, New York New York City (disambiguation) Yoichiro Nambu Entertainment industry Trinity Church (New York City) Bellport, New York Tuva or Bust! New York City OperaRichmondPetersburg Atlanta metropolitanProfessor area World Almanac Provo, Utah Central Hungary Feynman slash notation Erwin Schrdinger Nassau County, New York World Trade Center site Doctor of Philosophy Trinity (nuclear test) Manhattan Municipal Building Advertising Age Charles Hard Townes FeynmanKac formula Orange County, New York Madison, Connecticut Demonym West Hampton Dunes, New York Belvedere Castle Hubert UnitedHumphrey States Postal Service Louis NelE. C. George Sudarshan Manuel Sandoval Vallarta African American Crips Feynman checkerboard Russell Alan HulseChicago metropolitan area Egypt CompStat Environmental issues in New York City John G. Kemeny Newburgh, New York Ruhr Delaware Valley Variational perturbation theory Foresight Nanotech Institute FeynmanFlower Hill,Prize New York Neutron List of Long Island recreational facilities American Jews Community college Robert Serber Pittsburgh metropolitan area Jamaica Bay Wildlife Refuge Aaron Novick Patchogue, New York Southampton (village), New York Ralph LeightonJersey City, New Jersey Carl David Anderson 1916 Zoning Resolution Kobe Jefferson County, New York Financial centre Cold WarJohn Bardeen Plainfield, New JerseyFlorence National Basketball Association John Archibald Wheeler William Lawrence Bragg Bellerose, New York Carl Feynman De jure Middletown, Orange County, New York Putnam County, New York Rahway, New Jersey Trenton, New Jersey Chicago George Paget Thomson Hagen Kleinert Schrdinger equationUnited Nations Headquarters Hyderabad, India Theodore Roosevelt Birthplace National Historic Site Asharoken, New York Wolfgang Paul Manhattanhenge Stephen Wolfram Government of New York Baja California PubMed Central Clifford Shull Diffraction United Nations Transvaal Province Bertram Brockhouse United States Atomic Energy Commission New York City Pseudoscience Quantum mechanics Princeton, New Jersey Phoenix metropolitan area Five Boroughs Los AngelesGreenhouse gas British EmpireQED (book)Statue of Liberty Delta Air Lines Mike Wallace (historian) New York Public Library Holland Tunnel Herman Melville Maria Goeppert-Mayer Gerard't Hooft Gabriel Lippmann List of people from New York City Brownstone North Fork, Suffolk County, New York Otto Stern Positron Hidden track Greater Boston Indie rock University of California, Berkeley Oak Ridge National Laboratory Old Field, NewPittsburgh York Williston Park, New York 1890 United States Census James Franck Lord Howe Charles Thomson Rees Wilson Tokyo 1950 United States Census Westchester County, New York George Washington Bridge Karl Alexander Mller Yokohama Mathematical operatorNew York State Constitution SUNY Downstate Medical Center United States Court of Appeals for the Second Circuit Criticality accidentNeodesha, Kansas Immigration to the United States Nagoya Chrysler Building List of counties in New York West Islip, New YorkWilliam Cullen Bryant LSD Spacetime New Hyde Park, New York Hollywood, Los Angeles, California Greater Cleveland Subrahmanyan Chandrasekhar Rucker Park Giovanni Rossi Lomanitz Samba school List of museums and cultural institutions in New York City Tompkins County, New York The Bravery Pyotr Kapitsa Columbia University American Revolution Seattle metropolitan area Organized crime White American Edward Teller Greater Austin Ric Burns Table of United States Metropolitan Statistical Areas NucleonPlandome, New York Metropolitan municipality Digital object identifier Georges CharpakWilhelm Wien Skyscrapers Fox Broadcasting Company International Standard Book Number Los Angeles, California PubMed Identifier UTC-5 Tom Newman (scientist) Niagara County, New York New York 1820 United States Census Robert Barro Institute for Advanced Study Raleigh, North Carolina Russia Dallas Union City, New Jersey Sustainable design Asia Chicago"L" Jack Kilby Quantum gravity Robert Lane Greene Land reclamation White Plains, New York Puerto Rican people Port Authority Bus Terminal Owen Chamberlain Beijing Independence Party of New York Arista National Honor Society Frontline (magazine) 1984 Summer Paralympics South Fork, Suffolk County, New York New York's Village Halloween Parade Edward Mills Purcell Arno Allan Penzias Southern Tier Christopher Ma Willard H. Wells Charles douard Guillaume Surely You're Joking, Mr. Feynman! Wikitravel Jackson Heights, Queens Bergen County, New Jersey Metropolitan area The Bronx Transportation in New York City Hannes Alfvn Watertown, Connecticut Stonewall Inn The Wall Street Journal Poquott, New York Ellis Island BabylonLong (town), Beach,New NewYork York New City York Public Advocate William Shockley 1960 Summer Paralympics Centre Island, New York Symphony of Science Alfred Kastler Ithaca, New York Buffalo Niagara Falls metropolitan area Garden city movement Steven Weinberg New York University Garden City, New York New York Life Insurance Parkway United Kingdom New Jersey Devils Stoke Mandeville Clay Pit Ponds State Park Edward Harrigan Julian Schwinger Elementary particle Albuquerque, New Mexico Toronto TIAA-CREF Feynman parametrization Daniel C. Tsui So Paulo New York City Department of Parks& Recreation Norman Foster Ramsey, Jr. Bibcode Center of the universe Philip Warren Anderson Fort Worth, Texas Pfizer William Henry Bragg Path integral formulation George Zweig Chautauqua County, New York Online Computer Library Center LagosDay to Day Jerusalem District Infinity (film) OklahomaScarsdale, City, Oklahoma NewPhys.Suffolk Rev.York County, New York KolkataHowell, New Jersey Schenectady, New York New Milford, Connecticut North Branford, Connecticut Herkimer County, New York Erie Canal QED (play) James Chadwick Klaus Fuchs Brooklyn Public LibraryEmpire State Flag of New York City Verrazano-Narrows Bridge Steuben County, New York MS-13 Gang Subatomic particle United States Department of Energy American Civil ChenWar Ning YangFive Families Atlanta Hamilton, Ontario United States Bill of Rights Dhaka Max Delbruck Working Families Party Stewart International Airport Utne Reader Fox News Channel Richard Feynman East Hampton (village), New York Martinus J. G. Veltman Project Tuva Cheesecake Eugene WignerEmilio G. Segr Quantum computer Rockland County, New York Budapest Major League Soccer Quantum computingVictor Weisskopf Gerd BinnigWilliam McLellan (nanotechnology) The Feynman Lectures on PhysicsOutlook (magazine) Winchester, Connecticut Yellow feverBeacon, New YorkList of New York City gardens New York Knights RLFC Jet Propulsion Laboratory Washington, D.C. Public-access television Rochester, New York Federal government of the United States Above mean sea level Jean Baptiste Perrin Austin, Texas Cattaraugus County, New York Frederic de Hoffmann Hideki Yukawa The New York Sun UTC-4 Electron Samuel C. C. Ting The Sunday Times Magazine New York City Marathon Jamaica Software development Commissioners' Plan of 1811 Taylor series Saltaire, New York Mother Jones (magazine) The Meaning of It All Kppen climate classification Mayor of New York City PATCO Speedline Differential calculus Russell Gardens, New York Houston, Texas Joseph Hooton Taylor, Jr. Salt Lake City metropolitan area Alice Tully Hall Livingston County, New York E (number)Soviet Union Frederick Reines Alternative newspaper Greene County, New York Microbiologist Westfield, New Jersey Water tower Essen New York City mayoral elections Stamford, Connecticut Yates County, New York Long Branch, New Jersey String theoryMorristown, New Jersey Greenwich Village Chinatown, Manhattan List of sovereign states Particle physics Loyalist (American Revolution) New York Mets Albany, New York Massachusetts v. Environmental Protection Agency Robert MarshakAntony Hewish National Medal of ScienceNorth Hempstead, New York Parallel computing Roach Guards Avant-garde Cove Neck, New York Politics of Long Island Brian David Josephson 2016 Summer Paralympics Herald (Pakistan) Incheon Royal Society Musical theatre 1980 United States Census Quogue, New York 1990 United States Census National Broadcasting Company Crime in New York City Owen Willans Richardson National Endowment for the Arts Feynman (disambiguation)Emily YoffeParis MtroKaplan Law School Metro Manila Open Directory Project SuperconductivityMax von Laue Half-derivative Greater St. Louis News Corporation United Nations headquarters MetLife 1960 United States Census Philip Morris International Charles Glover Barkla New York GiantsDeep inelastic scattering Long Beach, California Japan Gauteng British Crown Upper East Side The Walrus Samba Philadelphia, Pennsylvania Latium The Stationery Office Douglas D. Osheroff Flexagon Green building Estuary Oceanside, New York Times Square Intercity bus International Ladies' Garment Workers'Southwestern Union Connecticut North Shore (Long Island) Carlo Rubbia Karl Ferdinand Braun Portland, Oregon Mayor-council government Baxter Estates, New York Henry Way Kendall Internet Archive 2000 United States Census Capital District Yankee StadiumGothamFive Points, Manhattan Quantum electrodynamicsWalther Bothe New York dialectBethel, Connecticut Monmouth County, New Jersey Reason (magazine) San Diego, California Haiti National Oceanic and Atmospheric Administration Proton Jean-Marie Colombani Time Warner Center Major League Baseball 1992 Summer Paralympics Kings Park, New YorkPresident of the United States List of New York state symbols The Character of PhysicalBachelor Law of Science Clinton County, New York Red Bull New York Baltimore, Maryland Cecil Frank Powell Ho Chi Minh City Anthony James Leggett Hackensack, New Jersey Sullivan County, New York 1980 Summer Paralympics Einstein field equation Mount Sinai School of MedicineEastern Time Zone (North America) Westbury, New York Appalachians Stewart Manor, New York Roy J. Glauber Political machine Milwaukee, Wisconsin Conference House First Continental Congress WorldThe Trade Center Encyclopedia of New York City Cargo cult science Warren County, New York Central New York Region Lahore KPRC-TV Sydney Paul Dirac Kongar-ool Ondar Baruch Samuel Blumberg Kebab Champlain Valley Beta decay Fiorello H. LaGuardia Manhattan Project Market capitalization Largest cities in the Americas New Jersey Transit rail operations United States District Court for the Southern District of NewFordham York University Ernest Rutherford Troy Patterson Chinese people New York City ethnic enclaves Association of International Marathons and Road Races Sticky bead argumentViscosity Sagaponack, New YorkPort Authority of New York and New Jersey Danbury, Connecticut Wolfgang Ketterle Edward Victor Appleton Veronica Dillon Long Island SoundList of references to Long Island places in popular culture San Francisco Bay Area Salsa music Lindenhurst, New York Charles K. Kao Fire Island Rockefeller University Situation comedy Alma mater Konstantin Novoselov Roslyn Harbor, New York Liposarcoma Chemung County, New York Fort Tompkins New York City Council Riverhead (town), New York The Wave (newspaper) Shanghai Wolcott, Connecticut Bayonne, New Jersey Compressed natural gas Lancaster, Pennsylvania Greater New Haven Patrick Blackett, Baron Blackett Broadway theater The Phoenix (magazine) Super Bowl XLVIII Joan Feynman North Country, New York Dennis Gabor 2000 Summer Paralympics School of Visual Arts Arthur Leonard Schawlow Pace University Tokyo Subway Housing cooperative Southampton (town), New York Neutrino Boisfeuillet Jones, Jr. 2010 United States Census 2008 Summer Paralympics Greater Jacksonville Metropolitan Area Setback (architecture) New York Rangers List of U.S. cities with high transit ridershipDoonesbury David Weigel Summer Paralympic Games Northport, New York St. John's University (Jamaica, NY) Esther Lederberg Omega-minus particle India Today Sum-over-paths Tin Pan Alley New York County Ski country List of United States cities by population density Luis Walter Alvarez St. Patrick's DayKingdom of EnglandThe Progressive 1 E9 m Victor Stabin Principle of stationary actionMedia in New York City Monocle (2007 magazine) Hamilton Grange National Memorial Hudson Highlands Manhattan Neighborhood Network Bayville, New York Columbus, Ohio metropolitan area Boson Will Leitch Ilya Frank Megacity The Week (Indian magazine)Rob Walker (journalist) Gustav Ludwig HertzAlbert Einstein 1880 United States Census Music of Long Island Wilhelm Rntgen Wanamaker Mile New Statesman 1790 United States Census Minneapolis, Minnesota Rogers Commission List of hospitals in New York City Providence metropolitan area Cannabis (drug) Theoretical physics Community Service Society of New York New York City Landmarks Preservation CommissionAnsonia, Connecticut Greenport, Suffolk County, New York Citigroup Indigenous peoples of the Americas Adirondack MountainsNew York State Colorado Springs, Colorado Simon Doonan The Associated Press Mesa, Arizona Cairo Governorate James II of England Pike County, Pennsylvania Harlem Renaissance Quark Zhores Alferov Steven Chu Greater Houston Roosevelt Island ZIP code Kings Point, New York Makoto Kobayashi (physicist) United States Congress Johann Hari Cond Nast Building Greater San Antonio United States Army Passaic, New Jersey Rufus Wilmot Griswold Miami Republican Party (United States) Santo Domingo John William Strutt, 3rd Baron RayleighArXiv Jews Neural networkFreshman Carl Wieman W. Daniel Hillis Butterfly EffectJohannesburg Unicameralism Fort Wadsworth Tel Aviv Guilford, Connecticut Jamestown, New York New York City arts organizations Trinidad and Tobago Old Westbury, New York Bogot Advertising agency The OldieGreat Neck Plaza, New York Kenneth T. Jackson Encyclopedia of New York City Alexei Alexeyevich Abrikosov Boston Area code 646 Newark, New Jersey Brooklyn Cyclones Rush hour North Haven, New York NYC (disambiguation) Allegheny Plateau Isaac Newton Schuyler County, New York Kearny, New Jersey Belle Terre, New York Political independent John Hasbrouck Van Vleck Massachusetts Bay Colony News WeeklyAsthma Guttenberg, New Jersey Josiah Willard GibbsFeynman diagram Wikiquote National Journal PlaNYC Washington Metropolitan AreaDemographics of New York Shelter Island (town), New York George Washington Isolation tank Washington County, New York September 11 attacks Haute cuisine CBS New York Islanders George Smoot Parton (particle physics) Hudson County, New Jersey Whitaker's Almanack Litchfield County, Connecticut Milford, Connecticut North China Battery Park City, Manhattan Marble Hill, Manhattan Matthew SandsKenneth G. Wilson Second Continental CongressAllegany County, New York Roslyn, New York Sands Point, New YorkInland Empire (California) Leon Neil Cooper Apartment building Village of the Branch, New York Southold, New York New York Yankees New Rochelle, New York New York County, New York East Hampton (town), New York Warner Music Group 1900 United States Census Rockville Centre,Tribeca NewFilm Festival York Membership of the New York City Council Port Washington North, New York Ivar Giaever The Great White Way Greater Buenos AiresJakarta Memphis metropolitan area Victor Francis Hess Alberta Views Omaha, Nebraska Minneapolis Saint Paul Rockefeller CenterPhiladelphia San Diego metropolitan area Lev Landau Portland metropolitan area Quantum cellular automata Biology Fred Kaplan Garfield, New Jersey Triangle Shirtwaist Factory fireShawangunk Ridge Brookville, New York Heike Kamerlingh Onnes Hans BetheMeriden, Connecticut Anthony Burgess List of Long Islanders Lima Essex County, New York Physicist Uniform Resource Locator Articles of Confederation Melting pot Parallel processing Tianjin Edward Manukyan The Village Voice Jerusalem Interpol (band) Herbert Kroemer China Hip hop culture George E. Smith Martian Polykarp KuschBranford, Connecticut Grand Slam (tennis) American Broadcasting Company Baltimore International Standard Serial NumberOld master print Madison County, New York St. Lawrence County, New York Chonqing Real Time Opera List of theoretical physicists Riccardo Giacconi Pierre Curie Hempstead (town), New York Michael BloombergFur tradePerth Amboy, New Jersey Istanbul Poughkeepsie, New York Western Hemisphere Putnam Fellow BetheFeynman formula Nashville, Tennessee Jamaica Bay Albert Hibbs Matthew BroderickFermiDirac statistics Time zone Elizabeth, New Jersey IT Week James Surowiecki Science (journal) List of countries by homicide rate GuglielmoAltadena, California Marconi Prediction Battle of Long Island Heinrich Rohrer Doctoral advisor Fortnight MagazineStamp Act CongressGreat Migration (African American) Montgomery County, New York American Airlines Burton Richter American Museum of Natural History List of capitals in the United States Magill LenapeLondon Underground Somerset County, New Jersey Enrico Fermi Hugh David Politzer New Fairfield, Connecticut Classified Ventures Shenzhen Linden, New Jersey Books on New York City John C. Mather Critical exponent Amandla (magazine) Beijing Review Stratford, Connecticut Kaplan Financial Ltd Westchester County Noseweek Schoharie County, New York MasatoshiPhilosophiae KoshibaNaturalis Principia Mathematica Administrative divisions of NewGoogle York Book Search Troy, New York Waterbury, Connecticut Great Irish Famine New York Stock Exchange Hearst Tower (New York City) Ketamine Shelton, Connecticut Rapid transit Tsung-Dao LeeFrits ZernikeBoseEinstein statistics Clinton Davisson Paul Krugman London Heidelberg Music of New York City New Haven, Connecticut Ripponden David Gross Time Warner History of New York TheCity New York Times Barbara McClintock Tuva or Bust Hudson-Fulton Celebration Todt Hill Carnegie Hall Tampa Bay AreaPunk subculture New York Liberty Seoul United States Census Bureau List of people from New York Willard Boyle New York Draft Riots Dongguan Christopher Hitchens Hectare The WeekForward Magazine Rome Long Island Robert Andrews Millikan List of United States cities by population Bacteriophage lambda U.S. News& World Report United States East Williston, New York Islandia, New York Game design List of law enforcement agencies in Long Island Microsoft Excel Sports in New York City Timeline of New York City crimes and disasters New Amsterdam Charles II of England John C. Lilly William Daniel Phillips Brooklyn Islip (hamlet), New York Columbus, Ohio Broadway (New York City) Nobel Prize in Physics Forth magazine Tucson, Arizona Irish people 1860 United States Census South Africa Austan Goolsbee US Open (tennis) EcuadorSag Harbor, New York HBO Harper's Magazine Princeton University Adobe Flash Latin Kings (gang) E. B. White Fifth Avenue (Manhattan) Seymour, Connecticut Dutchess County, New York Government of New York City West Haven, Connecticut Frank Wilczek New York Times Laurel Hollow, New York New York Harbor Feynman sprinkler Lake Success, New York Bruce ReedNew Orleans metropolitan area The American Prospect Henri Becquerel San Jose, California Pennsylvania Station (New York City) Prospect (magazine) Times Herald-Record Area code 718 The New Republic Miami, Florida San Antonio, Texas The Jerusalem Report County (United States) Forty Thieves (New York gang) Overland (literary journal) Boston, MassachusettsEdwin G. Burrows Investigate (magazine) JSTOR Wallingford, Connecticut Aage Bohr Silicon Valley Shoreham, New York Nicholas Metropolis Genetics (journal) Cortland County, New York Secaucus, New Jersey Fermion FermilabClaude Cohen-TannoudjiPhilipp Lenard List of Nobel laureates in Physics Barclays Center Light cone Area code 347 List of cities with most skyscrapers The EconomistClifton, New Jersey Rudolf Mssbauer The New York Review of Books Chenango County, New York Werner Heisenberg New York City Subway Engineering Robert R. Wilson Sabbatical Horst Ludwig Strmer UrbanSacramento, area California Superfluidity Rev. Mod. Phys. The Washington Post Queens Borough Public Library South Florida metropolitan area Port Authority Trans-Hudson Weak interaction One-electron Ernstuniverse Ruska Community of Madrid Poland Cincinnati Northern Kentucky metropolitan area Economy of New York City Robert B. Laughlin Burial Ridge1939 New York World's Fair Bandung Caltech European American Robert Hofstadter Physical ReviewHuman computer Punk rock General American Pulitzer Prize Charter school Lattingtown, New York TWA Oswego County, New York Nevill Francis Mott There'sEliot Spitzer Plenty of Room at the BottomMetropolitan Opera Geography of Long Island Summit, New Jersey East Haven, Connecticut MagnumSwedish Photos Cottage Marionette Theatre Metro-North Railroad Albert Einstein Award Simon van der FreemanMeer Dyson Harrison, New Jersey Finger Lakes WNYC Tech Valley Capital Ethiopia Wikipedia Honsh John A. Wheeler Phil Carter Flash VideoBASIC Parity (physics) Melvin SchwartzLeo Esaki African Americans Lake Grove, New York The Strokes Tel Aviv District 1800 United States Census Ann Hulbert WheelerFeynman absorber theoryJacques Attali Montreal Metro Norval White Thomas Curtright Feynman diagrams Far Rockaway, QueensJohannes Stark Jewish quota Education in New York City Coney Island 1970 United States Census Michael Steinberger Maclean's Guido Pontecorvo Macy's Thanksgiving Day Parade Robert Coleman Richardson List of metropolitan areas by population Tim Wu Office of Scientific and Technical InformationGerald M. Rosberg Far Rockaway High School Bangalore Midtown Manhattan The Middle East (magazine) Google VideosBCS theory Physics Richard Chace Tolman Ultraviolet David Edelstein Peter Parnell List of physicists El Tiempo LatinoNiels BohrWhat Do You Care What Other People Think? AtheismTuvaPieter ZeemanNASA Kaplan, Inc. John Gribbin The Monthly Arthur Compton Amazon.com Village (magazine) Slate (magazine) WKMG-TV Hal S. Jones WJXT Post-Newsweek Stations Key West Literary Seminar The Star (Bangladesh) U. S. Department of Justice Boston Phoenix Dahlia Lithwick John Alderman (publishing) Ann L. McDaniel Fairfax County Times Jack Shafer FT Magazine The Washington Post Writers Group Newsline (magazine) Chaos theory James Gleick Atlantic Annie Lowrey Eric Leser Steven Landsburg The Fray (Internet forum) Salon.com PrivateSeth StevensonEye Washington Post Company New Zealand Listener Dhaka Courier KSAT-TV National ReviewStephen Jay Gould Sasha Frere-Jones Jonah Weiner Wilson Quarterly Linguistics William Saletan Robert Wright (journalist)Harvard College Jacob Weisberg David Plotz Ethiopian Review Andy Bowers Foreign AffairsThe Nation Human Events E!Sharp Mitchell Feigenbaum Dana Stevens (critic) The Atlantic The New Yorker WDIV-TV Adam Kirsch Podcast Dear Prudence (advice column)Virginia Heffernan Douglas Hofstadter Timothy Noah The Liberal The Herald (Everett) Asia Pacific Management Institute Farhad Manjoo WPLG The American Spectator BBC Focus on Africa David Helvarg Donald E. Graham Fareed Zakaria MSN Queue Magazine Literary Review of CanadaForeign Policy Greater Washington Publishing Ron Rosenbaum Anne Applebaum
Vijay Ravindran Tom VanderbiltNews magazine Nina Shen Rastogi Microsoft London Review of Books European Commission Authors Guild Newsweek EUobserver Daniel Radosh New African New York (magazine)Harvard Crimson Kaplan College Paaras National Book Award North& South (magazine) Franklin Foer Fig. 4: Environment described by three 16 × 20 rooms connected The Big Issue Policy Review Michael Kinsley Dissent (Australian magazine) Meghan O'Rourke Autodesk Forum (Bangladesh) The Drouth CableOne David Greenberg Standpoint (magazine) Benoit Mandelbrot Robert PinskyExpress (newspaper) Griffith Review SCORE! Educational Centers The Spectator The Washington Post Company Antitrust Emily Bazelon Chief Executive Officer The Bulletin (Brussels weekly) through the middle row. Concord Law School Ian Bremmer Southern Maryland Newspapers The PipelineTehelka The Gazette (Maryland) Minneapolis Kaplan University Maisonneuve (magazine) Washington Post This (Canadian magazine) Quadrant (magazine) National Public Radio The Slate Group Dublin Business School The Weekly Standard Time (magazine) John Dickerson (journalist) Variant (magazine) Canadian Dimension Eliot Porter National Observer (Australia) Jody Rosen USA The New York Times MagazineIndependent station (North America) Fig. 3: Graph for the "James Gleick" Wikipedia article with 1975 nodes and 1999 edges. The seed article is in light blue. The size of the nodes (except for the seed node) is proportional to the similarity. In red are all the nodes with similarity bigger than 0.7. There are several articles with high similarity more than one link ahead. 0.06 0.04
0.02 π function. The value-function is the function V : X 7→ R defined 0.00 as 0.02 " ∞ # π t 0.04 V (x) = E ∑ γ r(xt ) x0 = x,at = π(xt ) , t=1 0.06 and plays a key role in many RL algorithms [Sze10]. When the 14 π 12 dimension of X is significant, it is common to approximate V (x) 10 8 rows by 10 20 6 π 30 4 V ≈ Vˆ = Φw, columns40 2 50 where Φ is an n-by-k matrix whose columns are the basis functions Second to fourth eigenvectors of the laplacian of the three used to approximate the value-function, n is the number of states, Fig. 5: rooms graph. Note how the eigendecomposition automatically cap- and w is a vector of dimension k. Typically, the basis functions tures the structure of the environment. are selected by hand, for example, by using polynomials or radial basis functions. Since choosing the right functions can be difficult, Mahadevan and Maggioni [Mah07] proposed a framework where Visualizing the LSA Space these basis functions are learned from the topology of the state We believe that being able to work in a vector space will allow us space. The key idea is to represent the state space by a graph and to use a series of RL techniques that otherwise we would not be use the k smoothest eigenvectors of the graph laplacian, dubbed available to use. For example, when using Proto-value functions, Proto-value functions, as basis functions. Given the graph that it is possible to use the Nyström approximation to estimate the represents the state space, it is very simple to find these basis value of an eigenvector for out-of-sample states [Mah06]; this is functions. As an example, consider an environment consisting of only possible if states can be represented as points belonging to a three 16 × 20 grid-like rooms connected in the middle, as shown Euclidean space. in Fig.4. Assuming the graph is stored in G, the following code7 How can we embed an entity in Euclidean space? In the computes the eigenvectors of the laplacian: previous section we showed that LSA can effectively compute L = nx.laplacian(G, sorted(G.nodes())) the similarity between documents. We can take this concept one evalues, evec = np.linalg.eigh(L) step forward and use LSA not only for computing similarities, but Figure5 shows 8 the second to fourth eigenvectors. Since in general also for embedding documents in Euclidean space. value-functions associated to this environment will exhibit a fast To evaluate the soundness of this idea, we perform an ex- change rate close to the room’s boundaries, these eigenvectors ploratory analysis of the simple Wikipedia LSA space. In order provide an efficient approximation basis. to be able to visualize the vectors, we use ISOMAP [Ten00] to reduce the dimension of the LSA vectors from 200 to 3 (we use 7. We assume that the standard import numpy as np and import the ISOMAP implementation provided by scikit-learn [Ped11]). networkx as nx statements were previously executed. We show a typical result in Fig.6, where each point represents the 8. The eigenvectors are reshaped from vectors of dimension 3 × 16 × 20 = LSA embedding of an article in R3, and a line between two points 960 to a matrix of size 16-by-60. To get meaningful results, it is necessary to represents a link between two articles. We can see how the points build the laplacian using the nodes in the grid in a row major order. This is why the nx.laplacian function is called with sorted(G.nodes()) as close to the "Water" article are, in effect, semantically related the second parameter. ("Fresh water", "Lake", "Snow", etc.). This result confirms that the A TALE OF FOUR LIBRARIES 15
[Ped11] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot and Rain E. Duchesnay. Scikit-learn: Machine Learning in Python, Journal of Machine Learning Research, 12:2825-2830, 2011 Sea 0.3 [Reh10]R. Reh˚uˇ rekˇ and P. Sojka. Software Framework for Topic Modelling Snow 0.4 with Large Corpora, in Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, pp. 45-50 May 2010 0.5 [Sze10] C. Szepesvári. Algorithms for Reinforcement Learning. San Rafael, 0.6 CA, Morgan and Claypool Publishers, 2010. [Sut98] R.S. Sutton and A.G. Barto. Reinforcement Learning. Cambridge, Drink 0.7 Massachusetts, The MIT press, 1998. River 0.8 [Mah07] S. Mahadevan and M. Maggioni. Proto-value functions: A Laplacian Lake framework for learning representation and control in Markov deci- Fresh water Salt water sion processes. Journal of Machine Learning Research, 8:2169-2231, 0.20 2007. Water 0.25 [Man08] C.D. Manning, P. Raghavan and H. Schutze. An introduction to 0.2 0.30 information retrieval 0.1 . Cambridge, England. Cambridge University 0.0 0.35 0.1 Press, 2008 0.2 0.40 [Ten00] J.B Tenenbaum, V. de Silva, and J.C. Langford. A global geo- 0.3 0.4 0.45 metric framework for nonlinear dimensionality reduction . Science, 290(5500), 2319-2323, 2000 Fig. 6: ISOMAP projection of the LSA space. Each point represents [Mah06] S. Mahadevan, M. Maggioni, K. Ferguson and S.Osentoski. Learn- the LSA vector of a Simple English Wikipedia article projected ing representation and control in continuous Markov decision pro- onto R3 using ISOMAP. A line is added if there is a link between cesses. National Conference on Artificial Intelligence, 2006. the corresponding articles. The figure shows a close-up around the [Dee90] S. Deerwester, S.T. Dumais, G.W. Furnas, T.K. Landauer and R. "Water" article. We can observe that this point is close to points Harshman, R. (1990). Indexing by latent semantic analysis. Journal associated to articles with a similar semantic. of the American Society for Information Science, 41(6), 391-407.
LSA representation is not only useful for computing similarities between documents, but it is also an effective mechanism for embedding the information entities into a Euclidean space. This result encourages us to propose the use of the LSA representation in the definition of the state. Once again we emphasize that since Gensim vectors are NumPY arrays, we can use its output as an input to scikit-learn without any effort.
Conclusions We have presented an example where we use different elements of the scientific Python ecosystem to solve a research problem. Since we use libraries where NumPy arrays are used as the standard vector/matrix format, the integration among these components is transparent. We believe that this work is a good success story that validates Python as a viable scientific programming language. Our work shows that in many cases it is advantageous to use general purposes languages, like Python, for scientific computing. Although some computational parts of this work might be some- what simpler to implement in a domain specific language,9 the breadth of tasks that we work with could make it hard to integrate all of the parts using a domain specific language.
Acknowledgment This work was partially supported by AFOSR grant FA9550-09- 1-0465.
REFERENCES [Tan09] B. Tanner and A. White. RL-Glue: Language-Independent Software for Reinforcement-Learning Experiments, Journal of Machine Learn- ing Research, 10(Sep):2133-2136, 2009 [Hag08] A. Hagberg, D. Schult and P. Swart, Exploring Network Structure, Dynamics, and Function using NetworkX, in Proceedings of the 7th Python in Science Conference (SciPy2008), Gäel Varoquaux, Travis Vaught, and Jarrod Millman (Eds), (Pasadena, CA USA), pp. 11-15, Aug 2008
9. Examples of such languages are MATLAB, Octave, SciLab, etc.