
The Top 10 Alternative Search Engines (ASE) within Selected Categories Ranked by Webometric Indicators

Bernd Markscheffel, Bastian Eine

Within the scientific field of webometrics, many research objects have been analyzed and compared with the help of webometric indicators (e.g., the performance of a whole country, a research group or an individual). This paper presents a ranking of ASEs based on webometric indicators. As search engines have become an essential tool for searching for information on the web, many alternative search services have specialized in finding topic- or format-specific search results. By creating a ranking of these ASEs within selected categories, we present an overview of the ASEs currently available. Through webometric indicators the ASEs were compared and the most popular ASEs of the respective categories determined.

Keywords: Search engines, Webometric indicators

Bernd Markscheffel
Chair of Information and Knowledge Management
Technische Universität Ilmenau
P.O. Box 100565
98684 Ilmenau
Germany
[email protected]

Bastian Eine
Consultant for Search Engine Optimization and Online Marketing
Bochumer Straße 37
38108 Braunschweig
Germany
[email protected]

Originally presented at the 7th International Conference on Webometrics, Informetrics and Scientometrics (WIS) and 12th COLLNET Meeting, September 20-23, 2011, Istanbul Bilgi University, Istanbul, Turkey. Published Online First: 10 March 2012. http://www.tarupublications.com/journals/cjsim/cjsim.htm

1. Introduction

While searching for information on the web, search engines can assist the user in finding satisfying results. Although just a few universal search engines dominate the market, a large number of different web search services and tools exist (Maaß et al. [1]). Beside the well-known universal search engines like Google, Yahoo and Bing, several ASEs are specialized in providing options to search for special document types, specific topics or time-sensitive information (Gelernter [2];

COLLNET JOURNAL OF SCIENTOMETRICS AND INFORMATION MANAGEMENT (Online First)

Lewandowski [3]). Because of the large number of ASEs and their rapidly changing range, we want to investigate the dynamic development of the ASE market. A ranking can therefore help to create a picture of the most popular ASEs currently available. To compare research objects like web sites, web pages or parts of web pages by their popularity or external impact, selected webometric indicators can be used (Ingwersen [4]). Webometric indicators can be based on data related to the web page content, web link structure, web usage or web technology (Björneborn & Ingwersen [5]; Thelwall et al. [6]). We used three different webometric indicators to rank ASEs. Through the Web Impact Factor (WIF) (Ingwersen [4]) and the PageRank (Page et al. [7]) we analyzed data based on the web link structure; with the help of the Alexa Traffic Rank we analyzed data based on web usage (Alexa Internet [8]). These three indicators were chosen because they evaluate a large amount of data and can be retrieved easily. By using three different indicators based on two types of data, an objective ranking of the ASEs can be expected.

2. Methodology

At first we predefined the categories of ASEs to analyse in our research. Then we determined a universal set of ASEs according to these categories. Subsequently we retrieved the selected webometric indicators for the determined ASEs. Finally, we calculated a total value for each ASE based on the three indicators and created a ranking for each category.

2.1. Categorisation of ASEs

ASEs can be structured by several different approaches. The following approaches can be used to differentiate between the multiple types of search engines:

• User behaviour (Broder [9])
• Universal & specialized search (Gelernter [2]; Lewandowski [3])
• Manual & automatic indexing (Baeza-Yates & Ribeiro-Neto [10])
• Invisible web search (Sherman & Price [11])
• Social search (Skusa & Maaß [12])
• Real time search (Lewandowski [13])
• Semantic search (Berners-Lee et al. [14]; Skusa & Maaß [12])
• Visual search (Weinhold et al. [15]; Bekavac et al. [16])
• Personalized search (Griesbaum [17])
• (Lewandowski [3])

These and further approaches are implemented in many different combinations by the ASEs. In addition, the market of ASEs and its development are highly dynamic. Hence,

there is no universally valid categorisation of search engines. We selected the following categories of ASEs for our research:

• Image search engines
• Video search engines
• Audio search engines
• People search engines
• Question & answer services
• Social bookmarking services
• Blog search engines
• Twitter search engines
• News search engines
• Science search engines

2.2. Determination of the Universal Set

To obtain a complete, up-to-date and objective universal set for our ranking, we used two different methods for the determination of ASEs (a more detailed description of these methods can be found in Eine & Markscheffel [18]). On the one hand we evaluated search engine lists established and maintained by experts. On the other hand we specified search queries, submitted them to Google and evaluated the respective search results. At the beginning of our evaluation, these lists contained a total number of 1695 search engines. We analyzed each of these search engines by the following criteria. A search engine of our universal set has to

• be available and functional
• fit in one of the selected categories
• use its own methods to utilise its own or an external search index
• be without a restriction regarding topic or country (except the restrictions given by the categories)
• offer its service without registration or charges for the user (except science search engines)

In addition, the following criteria for our categories were specified:

• Image and video search engines include online communities, portals and archives with an available search function
• Audio search engines have to offer an option to download the results for free
• People search engines do not include social networks, white or yellow pages
• News search engines have to aggregate their search results from more resources than their own news articles.


Further ASEs were collected by specifying search queries and submitting them to Google. For this purpose we determined keywords which the search queries should include to receive a large number of ASEs in the result set. Keywords can be found on the websites of the search engines which we obtained from the search engine lists. In addition, we submitted the names of these search engines to Google to receive each search engine's short text, which can also be seen as a source of potential keywords. The collected keywords were assigned to our categories. After that, one search query for each category was submitted to Google (the keywords for one query were combined by the OR operator). The first 100 search results were analyzed with the help of the same criteria as the ones used for the search engine lists. Finally, the ASEs obtained by these two methods were merged and duplicates removed. The determined universal set of ASEs consists of

• 50 image search engines
• 45 video search engines
• 24 audio search engines
• 7 people search engines
• 13 question & answer services
• 21 social bookmarking services
• 17 blog search engines
• 27 Twitter search engines
• 25 news search engines
• 24 science search engines
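The per-category query construction described above (category keywords combined with Google's OR operator) can be sketched as follows; the keyword lists below are invented examples for illustration, not the keyword lists actually used in the study:

```python
# Sketch of the query construction: the collected keywords of a category are
# combined with the OR operator into one search query submitted to Google.
# The keyword lists are hypothetical examples, not the study's actual data.
def build_query(keywords):
    return " OR ".join(keywords)

category_keywords = {
    "image search": ["image search engine", "photo search", "picture search"],
    "blog search": ["blog search engine", "weblog search"],
}

queries = {cat: build_query(kw) for cat, kw in category_keywords.items()}
# One combined query per category, e.g. "blog search engine OR weblog search"
```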

2.3. Data Collection

To determine the most popular ASEs for each category, we retrieved the WIF, PageRank and Alexa Traffic Rank for each research object of our universal set. The WIF is collected in a very simple form. In contrast to the WIF by Ingwersen [4], we took into account only the number of external inlinks for the ASEs. We did not include the search engines' total number of web pages because the popularity of the search function is the relevant subject of interest. We determined the number of external inlinks with the help of a universal search engine. Universal search engines provide commands or tools to retrieve the number of external inlinks to a web site or web page easily. However, universal search engines represent data which is gathered by their crawlers. These crawlers cannot reach all parts of the web due to the web structure (Broder et al. [9]) and the dynamics of the web. Hence, a universal search engine is not an ideal but a capable tool for our purpose (Thelwall [19]). We used the Yahoo Site Explorer to collect the total number of external inlinks. Through the PageRank, we analyzed the web link structure as well. The basic idea of the PageRank is to rate search results not only by the number of links but also by the weight of links (Page et al. [7]). The weight of a link indicates that a link from one web page to another can have different impacts on a web page depending on the reputation of the outlinking web page.
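In simplified notation, the difference between Ingwersen's WIF and the variant used here can be written as follows (with $B$ the number of external inlink pages pointing to a search engine's site and $N$ the site's number of indexed web pages; this is our reading of [4], not a formula taken from that paper):

$$\mathrm{WIF}_{\text{Ingwersen}} = \frac{B}{N}, \qquad \mathrm{WIF}_{\text{simplified}} = B$$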


A web page has a high reputation when it receives inlinks from many popular web pages or has the status of a web page with high-quality content regarding a specific topic. When calculating the PageRank, a web page's PageRank is passed on to the outlinked web pages by allocating it equally to them. So the calculation of a web page's PageRank is recursive because its value depends on the PageRank of the inlinking web pages and in turn has an influence on the PageRank of the outlinked web pages. We retrieved the PageRank of the ASEs using the Google Toolbar. The Google Toolbar is a browser add-on which provides different functions and information to assist the user while surfing the web. When visiting a web page, the PageRank of the respective web page is shown in the Google Toolbar as a rounded whole number. We visited the web page of each ASE in our universal set and collected the shown PageRank. For analyzing web usage data, we retrieved the Alexa Traffic Rank of the ASEs. The web service Alexa Internet evaluates the web usage of Alexa Toolbar users and "other, diverse traffic data sources" to calculate the Alexa Traffic Rank (Alexa Internet [8]). The Alexa Traffic Rank represents the rank of a web site in the Alexa Ranking. For this purpose, Alexa Internet collects the number of visitors of a web site (reach) and the number of page views within a web site. Thereby, "multiple page views of the same page made by the same user on the same day are counted only once" (Alexa Internet [8]). For the Alexa Traffic Rank, Alexa Internet calculates the arithmetic average of the reach and page-view data measured during the last three months. However, Alexa Internet points out that the analyzed data does not represent the global internet population, because it is collected primarily through the Alexa Toolbar. Thus the data represents the behavior of Alexa Toolbar users.
To prevent potential biases, Alexa Internet tries to include methods with which web site visitors without an installed Alexa Toolbar can be taken into account. These methods have not been explained in more detail. In addition, Alexa Internet notes that its ranking of web pages, especially for web pages with fewer than 1000 visitors per month, can be inaccurate because of the size of the web and Alexa's algorithms. In spite of these uncertainties, the Alexa Traffic Rank can be used to rate ASEs by their web usage data because it is an instrument which evaluates a large amount of data, is available without the need of any registration or charges, and produced representative results in another webometric study (Vaughan [20]). A main disadvantage of the Alexa Traffic Rank is that it analyses only a URL's second-level domain (e.g., http://video.google.com/ has the same Alexa Traffic Rank as http://google.com).
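The recursive character of the PageRank described above can be illustrated with a small power-iteration sketch on a toy link graph (an illustrative simplification with the damping factor 0.85 from Page et al. [7], not Google's production algorithm):

```python
# Toy PageRank via power iteration: each page's rank is passed on equally
# to its outlinked pages, so the calculation is recursive, as described above.
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}           # start from a uniform distribution
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if not outlinks:                      # dangling page: spread rank evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
            else:                                 # allocate rank equally to outlinks
                for target in outlinks:
                    new_rank[target] += damping * rank[page] / len(outlinks)
        rank = new_rank
    return rank

toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(toy_web)
# C is inlinked by both A and B and therefore ends up with the highest rank
```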

2.4. Calculation of Total Values and Creation of the Rankings

The collection of the data was performed on 6 October 2010. We could assign one value per indicator to each ASE. Subsequently we calculated one total value for each ASE based on the retrieved values. For this purpose, we normalized the values of the three indicators because of the indicators' different scales. Three ranking orders were created for each category. In the first ranking order the search engines were sorted in descending order based on the WIF, in the second ranking order descending based on the PageRank and in

the third ranking order ascending based on the Alexa Traffic Rank. For every ranking order each search engine takes one rank.

3. Findings

The findings of our research are represented by the rankings shown in Tables 1-6. A rank represents the position of the search engine in the respective ranking order. If there are two or more search engines with the same value for one indicator in the same ranking order, then the rank of these search engines is represented by the arithmetic average of the affected ranks. That way each search engine receives three ranks which represent the normalized values for the three indicators. By using ranking orders, the normalized values are now on an ordinal scale. This normalisation was chosen because several disruptive factors can have an influence on the values of these indicators (see discussion). Through normalising the values to the ordinal scale, the absolute distance between two values has less influence on the total values of the ASEs. The search engines' total values were determined by calculating the arithmetic average of their three normalized values. Finally, the rankings were created and sorted in ascending order based on the total values. If search engines share the same total value within a category, then the search engine with the higher WIF ranks higher.
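The normalisation and aggregation just described (tie-averaged ranks per indicator, then the arithmetic average of the three ranks, with ties broken by the higher WIF) can be sketched as follows; the four engines and their indicator values are invented, not the paper's data:

```python
# Rank normalisation with tie-averaging, as described in the Findings section:
# equal values share the arithmetic average of the affected ranks.
def tie_averaged_ranks(values, descending=True):
    order = sorted(values, reverse=descending)
    ranks = []
    for v in values:
        positions = [i + 1 for i, x in enumerate(order) if x == v]
        ranks.append(sum(positions) / len(positions))
    return ranks

# Hypothetical indicator values for four search engines (not the study's data).
engines  = ["A", "B", "C", "D"]
wif      = [5000, 1200, 1200, 300]   # ranked descending: higher is better
pagerank = [7, 5, 6, 5]              # ranked descending
alexa    = [40, 900, 350, 5000]      # ranked ascending: lower is better

r1 = tie_averaged_ranks(wif, descending=True)
r2 = tie_averaged_ranks(pagerank, descending=True)
r3 = tie_averaged_ranks(alexa, descending=False)

# Total value = arithmetic average of the three normalized (rank) values.
totals = {e: (a + b + c) / 3 for e, a, b, c in zip(engines, r1, r2, r3)}

# Final ranking: ascending total value; ties broken by the higher WIF.
ranking = sorted(engines, key=lambda e: (totals[e], -wif[engines.index(e)]))
```

Note how the shared WIF value of B and C yields the tie-averaged rank 2.5 for both, exactly as described for equal indicator values above.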

4. Discussion

The interpretation of our ranking is limited by our calculation of the total values and by potential inaccuracies of our indicators. The total values were calculated by averaging the equally weighted ordinal-scaled values of the indicators. The distances and quotients of ordinal-scaled values cannot be interpreted. Hence, the total values can provide only the ranking order within the categories. When interpreting the collected values of the indicators, the following aspects have to be considered. A search engine might be available at several URLs, so links to a search engine's web pages might be created by using different URLs. We have used only one URL for each ASE to retrieve the indicators' values. Therefore, the values and ranking refer to the URL which we used for the data collection. This might explain, for example, why Google's image search engine is not listed among the top ten. The values of Google's image search engine were collected by using the URL http://www.google.com/imghp. On 6 October 2010, we received a WIF of 7881 for this URL. When we retrieved the WIF by using the URL http://images.google.com/ we received a value of 289840 on the same date. Hence, Google's image search engine would have taken a higher rank if we had used the URL http://images.google.com/ for the determination of its WIF. An ASE which is available at only one URL might have a higher WIF than an ASE available at more than one URL when the total number of external inlinks is distributed across multiple URLs. We used only one URL per search engine to retrieve its WIF, so there might

Table 1
Ranking of the ASEs (Image Search Engines)

Rank  Search Engine        URL                               Average Rank
1.    Flickr               http://www.flickr.com/            2,3
2.    Photobucket          http://photobucket.com/           4,8
3.    deviantART           http://www.deviantart.com/        5,8
4.    SmugMug              http://www.smugmug.com/           6,8
5.    TinyPic              http://www.tinypic.com/           7,2
6.    Yahoo! Image Search  http://images.search.yahoo.com/   9,0
7.    Bing Images          http://www.bing.com/images        9,6
8.    morgueFile           http://www.morguefile.com/        9,8
9.    TinEye               http://www.tineye.com/            12,6
10.   Picsearch            http://www.picsearch.com/         13,2

[The individual WIF, PageRank and Alexa Traffic Rank columns are illegible in the source.]

Table 2
Ranking of the ASEs (Video and Audio Search Engines)

Video search engines:
1.    YouTube              http://www.youtube.com/
2.    MySpace Videos       http://vids.myspace.com/
3.    Google Video         http://video.google.com/
4.    Dailymotion          http://www.dailymotion.com/
5.    AOL Video            http://video.aol.com/
6.    Vimeo                http://www.vimeo.com/
7.    Veoh                 http://www.veoh.com/
8.    Truveo               http://www.truveo.com/
9.    Metacafe             http://www.metacafe.com/
10.   Megavideo            http://www.megavideo.com/

Audio search engines:
1.    Mp3Raid              http://www.mp3raid.com/
2.    BeeMP3               http://beemp3.com/
3.    FindSounds           http://www.findsounds.com/
4.    Mp3Bear              http://mp3bear.com/
5.    MP3hunting           http://www.mp3hunting.com/
6.    SeekaSong            http://www.seekasong.com/
7.    MusicRobot           http://www.musicrobot.com/
8.    public domain sounds http://www.pdsounds.org/
9.    Soungle              http://www.soungle.com/
10.   podscope             http://www.podscope.com/

[The indicator value columns are illegible in the source.]

Table 3
Ranking of the ASEs (Science and News Search Engines)

Science search engines:
1.    Google Scholar             http://scholar.google.com/
2.    Scirus                     http://www.scirus.com/
3.    INFOMINE                   http://infomine.ucr.edu/
4.    Science.gov                http://www.science.gov/
5.    CiteSeerX                  http://citeseerx.ist.psu.edu/
6.    vascoda                    http://www.vascoda.de/
7.    WorldWideScience           http://worldwidescience.org/
8.    CiteULike                  http://www.citeulike.org/
9.    Microsoft Academic Search  http://academic.research.microsoft.com/
10.   BASE                       http://www.base-search.net/

News search engines:
1.    Google News                http://news.google.com/
2.    Topix                      http://www.topix.com/
3.    Wikio                      http://www.wikio.com/
4.    AOL News                   http://www.aolnews.com/
5.    Lycos News                 http://news.lycos.com/
6.    NewsNow                    http://www.newsnow.co.uk/
7.    World News                 http://wn.com/
8.    Bing News                  http://www.bing.com/news
9.    PressDisplay               http://www.pressdisplay.com/
10.   Yahoo! News                http://news.search.yahoo.com/

[Rank assignments partly uncertain; the indicator value columns are illegible in the source.]

Table 4
Ranking of the ASEs (Twitter and Blog Search Engines)

Twitter search engines:
1.    Twitter Search       http://search.twitter.com/
2.    TweetMeme            http://tweetmeme.com/
3.    Twingly              http://www.twingly.com/
4.    Twellow              http://www.twellow.com/
5.    monitter             http://www.monitter.com/
6.    Twitscoop            http://www.twitscoop.com/
7.    BackTweets           http://backtweets.com/
8.    Twitterfall          http://www.twitterfall.com/
9.    TweepSearch          http://tweepsearch.com/
10.   Tweet Scan           http://www.tweetscan.com/

Blog search engines:
1.    Technorati           http://www.technorati.com/
2.    Topix                http://www.topix.com/
3.    Google Blog Search   http://blogsearch.google.com/
4.    Blogarama            http://www.blogarama.com/
5.    Bloggernity          http://www.bloggernity.com/
6.    Blog Search Engine   http://www.blogsearchengine.com/
7.    BlogCatalog          http://www.blogcatalog.com/
8.    IceRocket            http://www.icerocket.com/
9.    Bloghub              http://www.bloghub.com/
10.   Best of the Web Blog Search  http://blogs.botw.org/

[Rank assignments partly uncertain; the indicator value columns are illegible in the source.]

Table 5
Ranking of the ASEs (Social Bookmarking Services and Question & Answer Services)

Social bookmarking services:
1.    Digg                 http://digg.com/
2.    Mister Wong          http://www.mister-wong.de/
3.    StumbleUpon          http://www.stumbleupon.com/
4.    Mixx                 http://www.mixx.com/
5.    reddit               http://www.reddit.com/
6.    Delicious            http://www.delicious.com/
7.    Diigo                http://www.diigo.com/
8.    Connotea             http://www.connotea.org/
9.    Clipmarks            http://www.clipmarks.com/
10.   Social-Bookmarking.Net  http://www.social-bookmarking.net/

Question & answer services:
1.    Yahoo! Answers       http://answers.yahoo.com/
2.    Answers.com          http://www.answers.com/
3.    ChaCha               http://www.chacha.com/
4.    AllExperts           http://www.allexperts.com/
5.    Answerbag            http://www.answerbag.com/
6.    Wikianswers          http://answers.wikia.com/wiki/Wikianswers
7.    Mahalo               http://www.mahalo.com/
8.    Ask Me Help Desk     http://www.askmehelpdesk.com/
9.    FunAdvice            http://www.funadvice.com/
10.   QuestionBin          http://www.questionbin.com/

[Rank assignments partly uncertain; the indicator value columns are illegible in the source.]

Table 6
Ranking of the ASEs (People Search Engines)

1.    Pipl                 http://www.pipl.com/
2.    123people            http://www.123people.com/
3.    PeekYou              http://www.peekyou.com/
4.    Wink                 http://wink.com/
5.    Intelius             http://search.intelius.com/
6.    yasni                http://www.yasni.com/
7.    yoName               http://www.yoname.com/

[Rank assignments partly uncertain; the indicator value columns are illegible in the source.]

be a number of external inlinks which were not measured by our method. This problem cannot be solved, though, because we cannot determine the number of URLs at which a search engine is available.

Another problem is that there might be more services than a search function available at one URL, so a link to this URL does not only represent a reference to a search engine. One example is the online communities and portals for hosting and sharing. They are highly ranked in the categories image search engines and video search engines (e.g., Flickr, YouTube). Beside making a search function available, these online communities and portals are used for saving and managing documents and communicating with other users. We cannot identify the influence of these additional functions on the values of our indicators WIF and PageRank. There is a similar problem with the Alexa Traffic Rank. The Alexa Traffic Rank measures only the data for the second-level domain of a URL, so we cannot identify the influence of other functions available at the same second-level domain on our values. Especially the search engines with other popular services available at the same URL might benefit from this fact. It seems obvious that alternative search services, for example the ones from Google, Yahoo and Bing, benefit from their popular universal search services.

Because the crawlers cannot gather all the links on the web, the number of external inlinks determined for this ranking might not necessarily represent the actually existing number of external inlinks for the ASEs. Furthermore, the reasons for creating a link can be very different (Bar-Ilan [21]; Björneborn & Ingwersen [5]; Thelwall [22]). A link does not have to represent a recommendation. Especially deliberate arrangements for increasing a web site's visibility on the web (e.g., through search engine optimization) might not be detected by the automatic procedures which collect the data for our indicators. Hence, the values of our indicators might be influenced by such arrangements. We cannot identify which of the ASEs in our rankings have influenced values. Nevertheless, the number of external inlinks can help to create a ranking because these problems are the same for all of our research objects.

While interpreting the indicators we have to take into account the date or period of time over which the data have been collected. The data for the WIF are collected continually and repeatedly by the Yahoo Site Explorer crawler. Thus, our values for the WIF were up to date. The data for the PageRank shown in the Google Toolbar are updated every three to four months (Cutts [23]). So the values of the PageRank retrieved for this ranking were calculated up to four months ago. The Alexa Traffic Rank is calculated every day based on the web usage data collected in the last three months. Hence, the Alexa Traffic Rank represents an up-to-date value but is based on data from the last three months. To sum up, the total value of an ASE is partly based on one indicator (the PageRank) whose values were calculated up to four months ago.

Finally, due to the method for the determination of the universal set, our rankings include only ASEs from the English or German language area. However, there might be very popular ASEs in other language areas on the web which do not appear in this ranking (e.g., ASEs from Baidu, Yandex, Naver).


5. Conclusion and Future Work

With the help of webometric indicators we were able to create a ranking of ASEs within selected categories. Our ranking creates a picture of the most popular ASEs currently available. This paper can be seen as the beginning of a series of studies which will explore the development of the search engine market and the dynamics of the ASE categories. However, the significance of the values is limited due to the indicators we used and the calculation of the total values. Hence, conclusions based on this paper always have to be explained within the context of this ranking and its weak points.

In contrast to the approach used in this paper, a ranking could also be achieved by conducting a survey of the popularity of the ASEs. In this way, our results and the results of a survey could complement one another.

References

[1] Maaß, C., Skusa, A., Heß, A. and Pietsch, G. Der Markt für Internet-Suchmaschinen. In: D. Lewandowski (Ed.), Handbuch Internet-Suchmaschinen. Heidelberg, Aka Verlag, 2009, pp. 3-17.
[2] Gelernter, J. At the limits of Google: Specialized search engines. 2003. http://www.allbusiness.com/technology/software-services-applications-search-engines/10603593-1.html. Accessed August 6, 2010.
[3] Lewandowski, D. Spezialsuchmaschinen. In: D. Lewandowski (Ed.), Handbuch Internet-Suchmaschinen. Heidelberg, Aka Verlag, 2009, pp. 53-69.
[4] Ingwersen, P. The Calculation of Web Impact Factors. Journal of Documentation. Vol. 54(2), 1998, pp. 236-243.
[5] Björneborn, L. and Ingwersen, P. Perspectives of webometrics. Scientometrics. Vol. 50(1), 2001, pp. 65-82.
[6] Thelwall, M., Vaughan, L. and Björneborn, L. Webometrics. Annual Review of Information Science and Technology. Vol. 39, 2005, pp. 81-135.
[7] Page, L., Brin, S., Motwani, R. and Winograd, T. The PageRank Citation Ranking: Bringing Order to the Web. 1998. http://ilpubs.stanford.edu:8090/422/1/1999-66.pdf. Accessed September 10, 2010.
[8] Alexa Internet. About the Alexa Traffic Rankings. 2010. http://www.alexa.com/help/traffic-learn-more. Accessed August 15, 2010.
[9] Broder, A. A taxonomy of web search. 2002. http://www.sigir.org/forum/F2002/broder.pdf. Accessed July 19, 2010.
[10] Baeza-Yates, R. and Ribeiro-Neto, B. Modern Information Retrieval. New York, ACM Press, 1999.
[11] Sherman, C. and Price, G. The Invisible Web: Uncovering Information Sources Search Engines Can't See. Medford, Information Today, 2001.


[12] Skusa, A. and Maaß, C. Suchmaschinen: Status quo und Entwicklungstendenzen. In: D. Lewandowski and C. Maaß (Eds.), Web-2.0-Dienste als Ergänzung zu algorithmischen Suchmaschinen. Berlin, Logos, 2008, pp. 1-11.
[13] Lewandowski, D. Einstieg in Real Time Search. 2009. http://www.bui.haw-hamburg.de/fileadmin/user_upload/lewandowski/doc/Real_Time_Suche_Lewandowski.pdf. Accessed July 22, 2010.
[14] Berners-Lee, T., Hendler, J. and Lassila, O. The Semantic Web. 2001. http://www.scientificamerican.com/article.cfm?id=the-semantic-web&page=2. Accessed September 13, 2010.
[15] Weinhold, T., Bekavac, B., Hierl, S., Öttl, S. and Herget, J. Visualisierungen bei Internetsuchdiensten. In: D. Lewandowski (Ed.), Handbuch Internet-Suchmaschinen. Heidelberg, Aka Verlag, 2009, pp. 249-282.
[16] Bekavac, B., Herget, J., Hierl, S. and Öttl, S. Visualisierungskomponenten bei Web-basierten Suchmaschinen: Methoden, Kriterien und ein Marktüberblick. IWP - Information Wissenschaft & Praxis. Vol. 58(3), 2007, pp. 149-158.
[17] Griesbaum, J. Entwicklungstrends im Web Information Retrieval: Neue Potentiale für die Webrecherche durch Personalisierung & Web 2.0-Technologien. 2007. http://www.web-information-retrieval.de/files/volltext_kollaboratives_retrievalfinal_040707.pdf. Accessed July 22, 2010.
[18] Eine, B. and Markscheffel, B. Alternative Websuchdienste: Übersicht und Vergleich. Ilmenauer Beiträge zur Wirtschaftsinformatik, 2011-01.
[19] Thelwall, M. Link Analysis: An Information Science Approach. San Diego, Academic Press, 2004.
[20] Vaughan, L. A New Frontier of Informetric and Webometric Research: Mining Web Usage Data. 2008. http://www.collnet.de/Berlin-2008/VaughanWIS2008nfi.pdf. Accessed September 14, 2010.
[21] Bar-Ilan, J. Data collection methods on the Web for informetric purposes - A review and analysis. Scientometrics. Vol. 50(1), 2001, pp. 7-32.
[22] Thelwall, M. What is this link doing here? Beginning a fine-grained process of identifying reasons for academic hyperlink creation. Information Research. Vol. 8(3), 2003. http://informationr.net/ir/8-3/paper151.html. Accessed September 12, 2010.
[23] Cutts, M. More info on PageRank. 2006. http://www.mattcutts.com/blog/more-info-on-pagerank/. Accessed August 17, 2010.
