
Major Web Intelligence Tools

© 2005

Web Intelligence Tools

• I. Collection
– Offline Explorer
– SpidersRUs (AI Lab)
– Google Scholar

• II. Analysis (Data and Text Mining)
– Google APIs
– Google Translation
– GATE
– Arizona Noun Phraser (AI Lab)
– Self-Organizing Map, SOM (AI Lab)
– Weka

• III. Visualization
– NetDraw
– JUNG
– Analyst’s Notebook and Starlight

Collection: Offline Explorer

• Developed by MetaProducts Corporation, Offline Explorer can download Web sites to your hard disk for offline browsing. http://www.metaproducts.com/OE.html

• Advantages of Offline Explorer
– Save Time: Download up to 500 files simultaneously.
– Save Yesterday's Web Sites for Tomorrow's Use
– Monitor Web Sites
– Mine your Data
  • The TextPipe tool in the Offline Explorer Pro edition can extract or change the desired data, or even export it to a database.

Offline Explorer

Project list

Project properties setup window

Download URLs

File filters, URL filters, Download and other advanced-level properties

File modification check

SpidersRUs

• The SpidersRUs Digital Library Toolkit was developed by the Artificial Intelligence Lab at the University of Arizona. http://ai.eller.arizona.edu/spidersrus/

• It provides modular tools for spidering, indexing, and searching, so that users can build digital libraries in different languages in a simple DIY (Do-It-Yourself) way. Users can create their own search engines easily and quickly through a friendly user interface.

• SpidersRUs can automate the development of vertical search engines in different domains and languages. It works with non-English languages, including Asian and Middle Eastern languages.
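The spidering, indexing, and searching steps named above can be illustrated with a toy inverted index. This is a minimal sketch and not the SpidersRUs implementation: the `pages` dictionary is a hypothetical stand-in for pages already fetched by a spider, and the function names are invented for illustration. The simple `\w+` tokenizer is also an assumption; languages written without spaces, such as Chinese, would need a proper word segmenter in its place.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(pages):
    """Indexing step: map each token to the set of page URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for token in tokenize(text):
            index[token].add(url)
    return index


def search(index, query):
    """Searching step: return sorted URLs that contain every query token."""
    tokens = tokenize(query)
    if not tokens:
        return []
    # .get() avoids creating empty entries in the defaultdict for unknown tokens.
    hits = set.intersection(*(index.get(t, set()) for t in tokens))
    return sorted(hits)


# Hypothetical pages standing in for the output of the spidering step.
pages = {
    "http://example.org/a": "Digital library toolkit for spidering and indexing",
    "http://example.org/b": "Vertical search engines for a digital library",
    "http://example.org/c": "Network visualization tools",
}

index = build_index(pages)
results = search(index, "digital library")
print(results)
```

A query is answered by intersecting the URL sets of its tokens, so only pages that contain all query words are returned.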

SpidersRUs

Keyword search

Search results

An example of a Chinese search engine built by SpidersRUs

Google Scholar

• Google Scholar provides a simple way to broadly search for scholarly literature. http://scholar.google.com/

• Features of Google Scholar:
– Search diverse sources from one convenient place
– Find papers, abstracts and citations
– Locate the complete paper through your library or on the web
– Learn about key papers and scholars in any area of research

Google Scholar

Search for “Bioterrorism” in Google Scholar

List of papers citing this paper

366 citations

Analysis: Google APIs

• Google provides many APIs to help you quickly develop your own applications. http://code.google.com/more/

• Examples of Google APIs:
– Google API for Inlink: Discovers which pages link to your website.
– Google Data APIs: Provide a simple, standard protocol for reading and writing data on the Web. Several Google services provide a Google Data API, including Google Spreadsheets and Picasa Web Albums, among others.
– Google AJAX Search API: Uses JavaScript to embed a simple, dynamic search box and display search results in your own Web pages.
– Google Analytics: Allows users to gather, view, and analyze data about their Website traffic. Users can see which content gets the most visits, as well as average page views and time on site per visit.
– Google Safe Browsing APIs: Allow client applications to check URLs against Google's constantly updated blacklists of suspected phishing and malware pages.
– YouTube Data API: Integrates online videos from YouTube into your applications.
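The idea of checking URLs against a blacklist, as the Safe Browsing entry describes, can be sketched with a local set of hashed URL keys. This is only an illustration under stated assumptions and does not call or reproduce the Google Safe Browsing protocol: the canonicalization rule (lowercase host plus path, query string ignored), the function names, and the example hosts are all hypothetical.

```python
import hashlib
from urllib.parse import urlsplit


def canonical_key(url):
    """Reduce a URL to a lowercase 'host/path' key (a deliberate simplification)."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    path = parts.path or "/"
    return host + path


def digest(key):
    """Hash the key so the list stores digests rather than raw URLs."""
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def build_blacklist(urls):
    """Build a set of digests from known bad URLs."""
    return {digest(canonical_key(u)) for u in urls}


def is_blacklisted(url, blacklist):
    """Return True if the URL's canonical key is on the blacklist."""
    return digest(canonical_key(url)) in blacklist


# Hypothetical blacklist entries, for illustration only.
blacklist = build_blacklist([
    "http://bad.example/login",
    "http://worse.example/download.exe",
])

print(is_blacklisted("HTTP://Bad.Example/login?user=1", blacklist))
print(is_blacklisted("http://safe.example/", blacklist))
```

Storing digests in a set keeps each lookup constant-time regardless of how large the list grows, which matters when the list is updated frequently.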

Example: Google API for Inlink

Results: all the related inlink Web pages

Input “link URL” and search
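Conceptually, an inlink query asks which pages link to a given URL. The sketch below shows that idea on a small, hypothetical link graph by inverting a table of outgoing links. It does not call any Google API, and the function names and URLs are assumptions made for illustration.

```python
from collections import defaultdict


def invert_links(outlinks):
    """Turn a {page: [pages it links to]} map into {page: {pages linking to it}}."""
    inlinks = defaultdict(set)
    for source, targets in outlinks.items():
        for target in targets:
            if target != source:  # ignore self-links
                inlinks[target].add(source)
    return inlinks


def pages_linking_to(inlinks, url):
    """Return the sorted list of pages that link to the given URL."""
    return sorted(inlinks.get(url, set()))


# Hypothetical crawl output: each page with the links found on it.
outlinks = {
    "http://a.example/": ["http://target.example/", "http://b.example/"],
    "http://b.example/": ["http://target.example/"],
    "http://target.example/": ["http://a.example/", "http://target.example/"],
}

inlinks = invert_links(outlinks)
print(pages_linking_to(inlinks, "http://target.example/"))
```

In a social network analysis setting, the size of each inlink set gives the in-degree of a site, a common starting point for ranking or visualizing a set of Web sites.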