Major Web Intelligence Tools
© 2005
Web Intelligence Tools
• I. Collection
– Offline Explorer
– SpidersRUs (AI Lab)
– Google Scholar
• II. Analysis (Data and Text Mining)
– Google APIs
– Google Translation
– GATE
– Arizona Noun Phraser (AI Lab)
– Self-Organizing Map, SOM (AI Lab)
– Weka
• III. Visualization
– NetDraw
– JUNG
– Analyst’s Notebook and Starlight
Collection: Offline Explorer
• Developed by MetaProducts Corporation, Offline Explorer can download Web sites to your hard disk for offline browsing. http://www.metaproducts.com/OE.html
• Advantages of Offline Explorer
– Save Time: Download up to 500 files simultaneously.
– Save Yesterday's Web Sites for Tomorrow's Use
– Monitor Web Sites
– Mine your Data: The TextPipe tool in the Offline Explorer Pro edition can extract or change the desired data, or even export it to a database.
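As a rough illustration of how an offline downloader might decide which pages to fetch, the sketch below extracts links from one page and applies a host filter and a file-extension filter. It is not Offline Explorer's implementation; the class `LinkCollector`, the function `filter_links`, and the `site.example` URLs are hypothetical names chosen for this example, and it uses only the Python standard library.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the href attributes of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))


def filter_links(links, allowed_host, blocked_extensions):
    """Keep links on allowed_host whose path does not end in a blocked extension."""
    kept = []
    for link in links:
        parsed = urlparse(link)
        if parsed.netloc != allowed_host:
            continue  # URL filter: stay on one site (an assumed policy)
        if parsed.path.lower().endswith(tuple(blocked_extensions)):
            continue  # file filter: skip unwanted file types
        kept.append(link)
    return kept


# A made-up page with one wanted link, one archive, and one off-site link.
page = (
    '<a href="/docs/a.html">A</a> '
    '<a href="big.zip">Z</a> '
    '<a href="http://other.example/x.html">X</a>'
)
collector = LinkCollector("http://site.example/index.html")
collector.feed(page)
to_download = filter_links(collector.links, "site.example", [".zip", ".exe"])
print(to_download)
```

A real downloader would also fetch each kept URL, save it to disk, and repeat the process on the newly saved pages; those steps are left out to keep the sketch self-contained.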
Offline Explorer
[Screenshot of Offline Explorer. Callouts: project list; project properties setup window; download URLs; file filters, URL filters, download and other advanced-level properties; file modification check.]
SpidersRUs
• The SpidersRUs Digital Library Toolkit was developed by the Artificial Intelligence Lab at the University of Arizona. http://ai.eller.arizona.edu/spidersrus/
• Provides modular tools for spidering, indexing, and searching, so that users can build digital libraries in different languages in a simple DIY (Do-It-Yourself) way. Users can create their own search engines easily and quickly via the friendly user interface.
• SpidersRUs can automate the development of vertical search engines in different domains and languages. It also works on non-English languages, including Asian and Middle Eastern languages.
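To make the indexing and searching steps concrete, here is a minimal sketch of an inverted index with an AND-style keyword search. It is a generic textbook technique, not the SpidersRUs code; the function names `tokenize`, `build_index`, and `search` and the sample documents are assumptions made for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into runs of word characters."""
    return re.findall(r"\w+", text.lower())


def build_index(documents):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents that contain every query token."""
    tokens = tokenize(query)
    if not tokens:
        return []
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return sorted(result)


# Made-up documents standing in for pages gathered by a spider.
documents = {
    "page1": "Digital library toolkit for spidering and indexing",
    "page2": "Vertical search engines for digital libraries",
    "page3": "Indexing Web pages in different languages",
}
index = build_index(documents)
print(search(index, "digital"))
print(search(index, "indexing languages"))
```

The regular expression `\w+` splits on whitespace and punctuation, so a run of Chinese characters would come out as a single token. A toolkit that supports languages written without spaces needs a language-specific word segmenter in place of `tokenize`.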
SpidersRUs
[Screenshot: an example of a Chinese search engine built by SpidersRUs. Callouts: keyword search; search results.]
Google Scholar
• Google Scholar provides a simple way to broadly search for scholarly literature. http://scholar.google.com/
• Features of Google Scholar:
– Search diverse sources from one convenient place
– Find papers, abstracts and citations
– Locate the complete paper through your library or on the web
– Learn about key papers and scholars in any area of research
Google Scholar
[Screenshot: search for “Bioterrorism” in Google Scholar. Callouts: 366 citations; list of papers citing this paper.]
Analysis: Google APIs
• Google provides many APIs to help you quickly develop your own applications. http://code.google.com/more/
• Examples of Google APIs:
– Google API for Inlink: Discovers what pages link to your website.
– Google Data APIs: Provide a simple, standard protocol for reading and writing data on the Web. Several Google services provide a Google Data API, including Google Base, Blogger, Google Calendar, Google Spreadsheets and Picasa Web Albums.
– Google AJAX Search API: Uses JavaScript to embed a simple, dynamic Google search box and display search results in your own Web pages.
– Google Analytics: Allows users to gather, view, and analyze data about their Website traffic. Users can see which content gets the most visits, as well as average page views and time on site per visit.
– Google Safe Browsing APIs: Allow client applications to check URLs against Google's constantly updated blacklists of suspected phishing and malware pages.
– YouTube Data API: Integrates online videos from YouTube into your applications.
Example: Google API for Inlink
[Screenshot: Google API for Inlink example. Callouts: input “link URL” and search; results: all the related inlink Web pages.]
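The idea behind an inlink query can be sketched without calling any Google service. If a crawler has recorded which pages each page links to, inverting that mapping gives the pages that link to a given URL. The code below is only that local computation; the function `build_inlinks` and the `*.example` URLs are hypothetical and do not reflect how the Google API is invoked or implemented.

```python
from collections import defaultdict


def build_inlinks(outlinks):
    """Invert a page -> linked pages mapping into page -> linking pages."""
    inlinks = defaultdict(set)
    for source, targets in outlinks.items():
        for target in targets:
            if target != source:  # ignore links from a page to itself
                inlinks[target].add(source)
    return inlinks


# Made-up link data, as a crawler might record it.
outlinks = {
    "http://a.example/": ["http://c.example/", "http://b.example/"],
    "http://b.example/": ["http://c.example/"],
    "http://c.example/": ["http://c.example/", "http://a.example/"],
}
inlinks = build_inlinks(outlinks)
print(sorted(inlinks["http://c.example/"]))
```

The number of inlinks per page is a common starting point for link analysis, which is why inlink data is often collected alongside page content.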