The Top 10 Alternative Search Engines (ASE) - Within Selected Categories Ranked by Webometric Indicators
Bernd Markscheffel
Bastian Eine

Within the scientific field of webometrics, many research objects have been analyzed and compared with the help of webometric indicators (e.g., the performance of a whole country, a research group or an individual). This paper presents a ranking of ASEs which is based on webometric indicators. As search engines have become an essential tool for searching for information on the web, many alternative search services have specialized in finding topic- or format-specific search results. By creating a ranking of these ASEs within selected categories, we present an overview of the ASEs which are currently available. Through webometric indicators, the ASEs were compared and the most popular ASEs of the respective categories determined.

Keywords: Search engines, Webometric indicators

Bernd Markscheffel
Chair of Information and Knowledge Management
Technische Universität Ilmenau
P.O. Box 100565
98684 Ilmenau
Germany
bernd.markscheffel@tu-ilmenau.de

Bastian Eine
Consultant for Search Engine Optimization and Online Marketing
Bochumer Straße 37
38108 Braunschweig
Germany
[email protected]

Originally presented at the 7th International Conference on Webometrics, Informetrics and Scientometrics (WIS) and 12th COLLNET Meeting, September 20–23, 2011, Istanbul Bilgi University, Istanbul, Turkey.
Published Online First: 10 March 2012
http://www.tarupublications.com/journals/cjsim/cjsim.htm
© COLLNET JOURNAL OF SCIENTOMETRICS AND INFORMATION MANAGEMENT (Online First)

1. Introduction

While searching for information on the web, search engines can assist the user in finding satisfying results. Although just a few dominate the search engine market, a large number of different web search services and tools exist (Maaß et al., [1]). Besides the well-known universal search engines like Google, Yahoo and Bing, several ASEs specialize in providing options to search for special document types, specific topics or time-sensitive information (Gelernter [2]; Lewandowski [3]). Because of the large number of ASEs and their rapidly changing range, we want to investigate the dynamic development of the ASE market. To this end, a ranking can help to create a picture of the most popular ASEs currently available.

To compare research objects like web sites, web pages or parts of web pages by their popularity or external impact, selected webometric indicators can be used (Ingwersen [4]). Webometric indicators can be based on data related to the web page content, web link structure, web usage or web technology (Björneborn & Ingwersen [5]; Thelwall et al., [6]). We used three different webometric indicators to rank ASEs. Through the Web Impact Factor (WIF) (Ingwersen [4]) and the PageRank (Page et al., [7]) we analyzed data based on the web link structure; with the help of the Alexa Traffic Rank we analyzed data based on the web usage (Alexa Internet [8]). These three indicators were chosen because they evaluate a large amount of data and can be retrieved easily. By using three different indicators based on two types of data, an objective ranking of the ASEs can be expected.

2. Methodology

First, we predefined the categories of ASEs to be analysed in our research. Then we determined a universal set of ASEs according to these categories. Subsequently, we retrieved the selected webometric indicators for the determined ASEs. Finally, we calculated a total value for each ASE based on the three indicators and created a ranking for each category.

2.1. Categorisation of ASEs

ASEs can be structured by several different approaches.
The following approaches can be used to differentiate between the multiple types of search engines:
• User behaviour (Broder [9])
• Universal & specialized search (Gelernter [2]; Lewandowski [3])
• Manual & automatic indexing (Baeza-Yates & Ribeiro-Neto [10])
• Invisible web search (Sherman & Price [11])
• Social search (Skusa & Maaß [12])
• Real time search (Lewandowski [13])
• Semantic search (Berners-Lee et al., [14]; Skusa & Maaß [12])
• Visual search (Weinhold [15]; Bekavac et al., [16])
• Personalized search (Griesbaum [17])
• Local search (Lewandowski [3])

These and further approaches are implemented in many different combinations by the ASEs. In addition, the market of ASEs and its development are highly dynamic. Hence, there is no universally valid categorisation of search engines. We selected the following categories of ASEs for our research:
• Image search engines
• Video search engines
• Audio search engines
• People search engines
• Question & answer services
• Social bookmarking services
• Blog search engines
• Twitter search engines
• News search engines
• Science search engines

2.2. Determination of the Universal Set

To obtain a complete, up-to-date and objective universal set for our ranking, we used two different methods for the determination of ASEs (a more detailed description of these methods can be found in Eine & Markscheffel [18]). On the one hand, we evaluated search engine lists established and maintained by experts. On the other hand, we specified search queries, submitted them to Google and evaluated the respective search results. At the beginning of our evaluation, these lists contained a total of 1695 search engines. We analyzed each of these search engines by the following criteria.
A search engine of our universal set has to
• Be available and functional
• Fit in one of the selected categories
• Use its own methods to utilise its own or an external search index
• Be without a restriction regarding topic or country (except the restrictions given by the categories)
• Offer its service without a registration or charges for the user (except science search engines)

In addition, the following criteria for our categories were specified:
• Image and video search engines include online communities, portals and archives with a search function available
• Audio search engines have to offer an option to download the results for free
• People search engines do not include social networks, white or yellow pages
• News search engines have to aggregate their search results from more resources than their own news articles.

Further ASEs were collected by specifying search queries and submitting them to Google. To this end, we determined keywords which the search queries should include in order to obtain a large number of ASEs in the result set. Keywords can be found on the websites of the search engines which we obtained from the search engine lists. In addition, we submitted the names of these search engines to Google to obtain each search engine's short text, which can also be seen as a source of potential keywords. The collected keywords were assigned to our categories. After that, one search query for each category was submitted to Google (the keywords for one query were combined by the OR operator). The first 100 search results were analyzed with the help of the same criteria as the ones used for the search engine lists. Finally, the ASEs obtained by these two methods were merged and duplicates removed.
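The query construction and the final merge step described above can be illustrated with a minimal Python sketch. It is not the authors' tooling: the function names, the example keywords, the quoting of multi-word keywords and the choice to treat two URLs on the same host as duplicates are all assumptions made for illustration.

```python
from urllib.parse import urlparse


def build_or_query(keywords):
    """Combine keywords with the OR operator (one query per category).

    Quoting multi-word keywords is an assumption, not taken from the paper.
    """
    terms = [f'"{k}"' if " " in k else k for k in keywords]
    return " OR ".join(terms)


def site_key(url):
    """Reduce a URL to its lowercase host without a leading 'www.'."""
    # urlparse only fills netloc when the string contains '//'.
    host = urlparse(url if "//" in url else "//" + url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def merge_candidates(from_lists, from_queries):
    """Merge both candidate sets, keeping the first URL seen per host."""
    merged = {}
    for url in list(from_lists) + list(from_queries):
        merged.setdefault(site_key(url), url)
    return list(merged.values())
```

For example, `build_or_query(["image search", "photo"])` yields `"image search" OR photo`. Matching on the host is a deliberately coarse duplicate test; a real study might need to distinguish several services hosted under one domain.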
The determined universal set of ASEs consists of
• 50 image search engines
• 45 video search engines
• 24 audio search engines
• 7 people search engines
• 13 question & answer services
• 21 social bookmarking services
• 17 blog search engines
• 27 Twitter search engines
• 25 news search engines
• 24 science search engines

2.3. Data Collection

To determine the most popular ASEs for each category, we retrieved the WIF, PageRank and Alexa Traffic Rank for each research object of our universal set.

The WIF is collected in a very simple form. In contrast to the WIF by Ingwersen [4], we took into account only the number of external inlinks for the ASEs. We did not include the search engines' total number of web pages because the popularity of the search function is the relevant subject of interest. We determined the number of external inlinks with the help of a universal search engine. Universal search engines provide commands or tools to retrieve the number of external inlinks to a web site or web page easily. The universal search engines thus report data which is gathered by their crawlers. These crawlers cannot reach all parts of the web due to the web structure (Broder et al., [9]) and the dynamics of the web. Hence, a universal search engine is not an ideal but a capable tool for our purpose (Thelwall [19]). We used the Yahoo Site Explorer to collect the total number of external inlinks.

Through the PageRank, we analyzed the web link structure as well. The basic [...] The weight of a link indicates that a link from one web page to another can have different impacts on a web page depending on the reputation of the outlinking web page. A web page has a high reputation when it receives inlinks from many popular web pages or has the status of a web page with high quality content regarding a specific topic.
When calculating the PageRank, a web page's PageRank is passed on to the outlinked web pages by allocating it equally among them. The calculation of a web page's PageRank is therefore recursive: its value depends on the PageRank of the web pages linking to it and in turn influences the PageRank of the outlinked web pages. We retrieved the PageRank of the ASEs using the Google Toolbar. Google Toolbar is a browser add-on that provides different functions and information to assist the user while surfing the web.
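The recursive calculation described above, in which each page passes its PageRank on in equal shares to the pages it links to, can be sketched as a simple iteration in Python. This is only an illustration on a toy graph, not the procedure used in this study, where the values were read from the Google Toolbar as a coarse score. The function name, the damping factor of 0.85, the fixed number of iterations and the handling of pages without outlinks are assumptions that do not come from the text above.

```python
def pagerank(outlinks, damping=0.85, iterations=50):
    """Iteratively approximate PageRank for a small link graph.

    outlinks maps each page to the list of pages it links to.
    The damping value and iteration count are illustrative assumptions.
    """
    pages = set(outlinks)
    for targets in outlinks.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution

    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = outlinks.get(page, [])
            if targets:
                # A page's rank is split equally among its outlinks.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Assumption: a page without outlinks spreads its rank evenly.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank
```

With the toy graph `{"a": ["b", "c"], "b": ["c"], "c": ["a"]}`, page `c` receives shares from both `a` and `b` and therefore ends up ranked above `b`, which receives only half of the rank of `a`.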