
Scientific Research and Essays Vol. 6(12), pp. 2564-2577, 18 June, 2011. Available online at http://www.academicjournals.org/SRE. DOI: 10.5897/SRE11.634. ISSN 1992-2248 ©2011 Academic Journals

Full Length Research Paper

Performance analysis of 5A's web robots

Ijaz Ali Shoukat and Mohsin Iftikhar

College of Computer and Information Sciences, King Saud University, Riyadh, Kingdom of Saudi Arabia.

Accepted 6 May, 2011

Human nature is inclined to follow low-effort heuristics when seeking scientific literature. Instead of manual literature investigation, web robots are vital tools that give information seekers a view onto the gigantic passageway of WWW (World Wide Web) knowledge. The reputation and significance of web robots need no further introduction. An optimal searching approach requires efficient response and reliable service within limited time to query a large array of worldwide information. The Web is vast, with varying index sizes and performance characteristics. This study evaluates performance metrics, namely (1) response time, (2) load, (3) page download time, (4) health status, (5) down time, (6) up time and (7) reliability, for popular search engines (AOL, Ask, AltaVista, Alltheweb, Aliweb) starting with the initial letter 'A', the so-called 5A's web robots. These 5A's crawlers have not been evaluated against such technical characteristics before. This paper evaluates the selected web robots under practical estimation of technical factors with a maximum capacity of seven effective performance metrics. We used two web benchmarking tools (Manage Engine Application Manager and the Web Application Testing tool, WAPT 6.0) under active probing techniques to obtain experimental results, as this approach has active literature support. Finally, this study concludes with new and beneficial results for researchers.

Key words: Performance analysis, web robots, search engines' evaluation, hybrid crawlers, Meta crawlers.

INTRODUCTION

The WWW (World Wide Web) domain has had an effective and painstaking influence over the last two decades, with massive growth. Researchers, scientists, students and teachers increasingly use search engines rather than other physical searching resources. Nowadays, web searching tools have a revolutionary impact on science and discoveries. A long list of searching tools is available, but in this paper we studied only some popular tools, as summarized in Table 1. Web robot is the same term as web spider, web crawler or search engine. Web robots or search engines can be categorized into five classifications. (1) Primary crawlers like Google, Yahoo, MSN, Aliweb and Ask, as all of these have their own indexing databases with the ability to access some parts of files automatically. (2) Meta crawlers like AOL, Altavista and AlltheWeb do not maintain their own indexing databases but lend services from two or more primary robots to show searching results in combined fashion for the end user. (3) Hybrid crawlers like the Mamma search engine follow the combined approach of a Meta search engine and a primary search engine, in such a way that they possess their own indexing as well as lend services from two or more primary robots to show a large capacity of searching results in combined fashion. (4) Human-powered scholar directories like IEEE Xplore, ACM and OpenJGate have their own document repositories built by scholars. (5) Informative paid or free inclusion services like Scribd and DocStoc maintain their own databases to store open public documents with or without pay. These kinds of directories are open access, which is why their documents have no quality control.

In this paper, we evaluated five web crawlers (Altavista, Ask, AOL, Alltheweb, Aliweb) with a large capacity of seven performance metrics, because previous studies provide limited criteria for comparative analysis with a limited number of search engines. Previous studies ignored the most important performance metrics, like up time, down time, mean

*Corresponding author. E-mail: [email protected].


Table 1. Category and ranking of web crawlers.

Search engine | Starting year (Dujmović and Bai, 2006) | Category | Popularity ranking | Information coverage
Aliweb | 1993 | Primary robot | ** | -
Ask | 1996 | Hybrid search engine: it has its own indexing database and is also powered by Teoma (Das and Nandy, 2010) | ****** | -
Altavista | - | Meta robot | ******* | -
AlltheWeb | 1999 | Meta robot, powered by Yahoo | ** | -
AOL | 2003 | Meta search engine powered by Google | *** | -
Lycos | 1995 | Meta search engine; it fetches results from Yahoo and Looksmart | *** | -
Dogpile | 1996 | Meta search engine powered by Google, Yahoo!, Bing and Ask | **** | -
Teoma | 2001 | Primary search engine with a small indexing database; Teoma contains indexes of 200 million pages (Sherman, 2002) | *** | -

time between failures (MTBF), mean time to repair (MTTR) and health severity, but this study provides a comparative evaluation of the selected search engines with all these factors, in addition to response time and reliability. We also identified the most and least loaded day of the week and hour of the day, where a higher load corresponds to a higher response time and vice versa. We implemented the active probing technique (Cherkasova et al., 2003) by utilizing two web application benchmarking tools, Manage Engine Application Manager by Zoho Corporation and the Web Application Testing Tool (WAPT 6.0), which is a consistent and cost-effective approach (Vail, 2005; Gupta and Sharma, 2010; Horák et al., 2009; Rajput et al., 2010) used to obtain experimental results.

LITERATURE REVIEW

In 2006, Dujmović and Bai (2006) reported that preliminary effort on searching techniques started in 1993 and later became part of the published literature (Chu and Rosenthal, 1996). Many studies (Sherman, 2002; Das and Nandy, 2010; Baeza-Yates and Ribeiro-Neto, 1999; Chakrabarti, 2003; Jones and Willett, 1997; Hawking, 2006) were published in the following decade in order to enhance searching methodology for public use. Four search engines, Google, Yahoo, MSN and Ask, have been quantitatively evaluated using the LSP method, in which Google remained superior to MSN and MSN proved better than Yahoo (Dujmović and Bai, 2006). A comparative analysis of server-side and client-side searching tools was reported in a study (Chau et al., 2002) in which the design of the two techniques was discussed in the light of their advantages and diversities, concluding that the selection of searching methods is driven by the motive of their development and their usage limits.

McCown and Nelson (2007) conducted a study analyzing the Application Programming Interfaces (APIs) and Web User Interfaces (WUIs) of Google, Yahoo and MSN. In this study, the measurement methodology was observing the top 10 and 100 search results against a variety of search queries. The other metrics for this evaluation were the total number of indexed web pages and the caching of URLs. The same study concluded that the MSN API produced better results than the Google and Yahoo APIs, because the Yahoo and Google API indexes, while not older, are smaller than MSN's.

According to Bar-Ilan (1998), when we search for some information via search engines, some information is lost or dropped by the same search engine, and a measurable portion of information may be lost in subsequent tries. In the Bar-Ilan (1998) study, six search engines, Altavista, Excite, Hotbot, Infoseek, Lycos and NorthernLight, were evaluated with the query "Informetrics or Informetric"; their results are shown in Table 2.

Now the web has grown to more than 11.5 billion pages (Gulli and Signorini, 2005). To investigate the overlaps among major search engines, Spink and Jansen (2006) collected 10,316 random queries from a search engine named "Dogpile". After that, an overlap


Table 2. Relative coverage of search engines (Bar-Ilan, 1998).

Search tools | Total relative coverage (%) | Average relative coverage (%) | Relative coverage by Lawrence and Giles (%) | Relative coverage by Bharat and Broder (%)
AltaVista | 52.8 | 42.6 | 46.5 | 62.5
Excite | 50.5 | 20.3 | 23.1 | 20.0
Hotbot | 44.9 | 46.3 | 57.5 | 48.1
Infoseek | 16.5 | 21.2 | 16.5 | 16.9
Lycos | 11.2 | 8.8 | 4.4 | -
NorthernLight | 24.5 | 31.8 | 32.9 | -

is applied to collect the percentages of overlapping results: 31.5% in Yahoo, 25.8% in Google and 27.7% in Ask. On the other hand, the shared-results percentage remained 3.3% for Google and Yahoo, 6.3% for Google and Ask, and 2.2% for Ask and Yahoo. The more interesting finding is that the results common to all three search engines (Google, Yahoo and Ask) are only 3%. Notably, for 26.4% of queries the results remained unsponsored by either Yahoo or Google. In summary, the Spink and Jansen (2006) study found that the overlaps among major search engines are very low.

From a searching point of view, the quality of search results counts for scholars to justify their opinions. To evaluate quality metrics (corpus size, index freshness and duplication in results), Yossef and Gurevich (2007) conducted a study which adopted a sampling procedure for estimating the aforementioned quality metrics.

The World Wide Web comprises a huge set of information, which creates hurdles for users in terms of seeing the complete set of search results via any search engine; therefore, ranking of results is needed. On this subject, the Bar-Ilan (2005) study entered the literature in April 2005 with the statement that ranking differs between search engines and that better ranking depends upon the eyes of the user; the coverage against various sets of queries (q1, q2, q3, ..., qn) and the number of shown results are indicated in Table 3, with coverage percentages shown in square brackets (Bar-Ilan, 2005).

An up-to-date repository represents the quality of fresh indexing with new information. For checking freshness, Dasdan and Xinh (2009) conducted a study which introduced the staleness frequency of already stored documents as a metric for the freshness of a search repository. According to this study, if a link or page has been viewed or clicked many times it is old and not fresh, whereas a page clicked or viewed only a few times is fresh.

Rangaswamy et al. (2009) reported the importance of search from the business point of view: what exactly can we do with search engines? Some layman users are not professional in searching for their desired information, which is why they are sometimes unable to find the most relevant keywords for their desired search. For such people, and for information which is not indexed by search engines, social networks play a major role by providing a facility for questioning and answering. Morris et al. (2010) reported a comparison of social networks and search engines. Furthermore, that study differentiated the pros and cons of searching vs. asking: asking is easy with social networks and searching is easy with search engines, but both have their own merits and demerits.

Murugesh et al. (2010) reported a comparison of three search engines (Google, Yahoo, MSN) by analyzing two properties: (1) the total search results against the same query for all three search engines, and (2) the search time for the same query by each search engine. This study found that Google is better than Yahoo and Yahoo is better than MSN. Furthermore, this study reported some other results, which are shown in Table 4.

According to the US Search Engine Report, comScore Releases April 2009, the core percentages of searches done by Google, Yahoo, MSN, Ask and AOL are as shown in Table 5.

In this study, we have practically analyzed Altavista, Ask, AOL, Alltheweb and Aliweb with a maximum capacity of performance metrics. To our knowledge, no other study is available which provides experimental analysis with such maximized exposure.

MATERIAL AND METHODS

In order to find experimental results for performance metrics like latency, response time, page download time, availability, health status and reliability, we used the active probing technique rather than instrumentation (Cherkasova et al., 2003). Non-technical readers may consult Cherkasova et al. (2003) for the basics of active probing and web instrumentation methods. Our experimental setup follows these steps:

1. Installation of WAPT 6.0 and Manage Engine Application Manager 9 on a single fixed machine (Windows XP OS, 2 GB RAM, 2.4 GHz processor, firewall disabled) at the College of Computer and Information Sciences, King Saud University, Riyadh, KSA.
2. Definition and creation of URL instances against each desired


Table 3. Result coverage (Bar-Ilan, 2005).

Query | Total No. | Google | AlltheWeb | AltaVista | HotBot
q01 ''relative ranking'' TREC | 88 | 58 [66%] | 30 [34%] | 24 [27%] | 6 [7%]
q02 ''SIGIR 2004'' | 219 | 177 [81%] | 40 [18%] | 27 [12%] | 30 [14%]
q03 ''everyday life information seeking'' | 262 | 170 [65%] | 114 [44%] | 60 [23%] | 67 [26%]
q04 ''natural language processing for IR'' | 229 | 137 [60%] | 78 [34%] | 12 [5%] | 63 [28%]
q05 ''multilingual retrieval'' Hebrew | 30 | 22 [73%] | 6 [20%] | 4 [13%] | 4 [13%]
q06 ''relative ranking'' IR | 792 | 568 [72%] | 292 [37%] | 114 [14%] | 56 [7%]
q07 ''social network analysis'' '' '' | 1120 | 748 [67%] | 408 [36%] | 169 [15%] | 206 [18%]
q08 ''link mining'' | 464 | 269 [58%] | 209 [45%] | 115 [25%] | 84 [18%]
q09 ''citation analysis'' ''link structure'' | 652 | 377 [58%] | 157 [24%] | 65 [10%] | 57 [9%]
q10 bibliometrics ''link analysis'' | 382 | 315 [82%] | 106 [28%] | 60 [16%] | 46 [12%]
q11 relevance ranking ''link analysis'' | 1791 | 1000 (1640) [56%] | 696 (1233) [39%] | 426 (708) [24%] | 345 (540) [19%]
q12 ''Cross-language retrieval'' | 2477 | 1000 (2480) [40%] | 1093 (5675) [44%] | 1000 (1100) [40%] | 421 (840) [17%]
q13 '' '' IR | 2709 | 1000 (3960) [37%] | 1088 (8346) [40%] | 1000 (1177) [37%] | 671 (1492) [25%]
q14 ''information retrieval'' | 2914 | 1000 (1,090,000) [34%] | 1100 (1,091,278) [38%] | 994 (264,826) [34%] | 449 (247,992) [15%]
q15 '' '' | 2856 | 998 (1,730,000) [35%] | 1100 (1,274,549) [39%] | 963 (367,411) [34%] | 450 (384,846) [16%]

search engine, by entering the universal domain name of their servers, with an interval time of 10 min in each hour.
3. Continuous monitoring of the selected search engines with the selected tool(s) for every hour of the day for several weeks, from May to June 2010, to get automatic readings for the desired performance metrics (as mentioned earlier).
4. Finally, we analyzed and calculated averages to obtain the final results.
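The probing procedure in steps 1 to 4 can be sketched in code. This is only an illustrative sketch of active probing, not the actual tooling used in the study (Manage Engine Application Manager 9 and WAPT 6.0); the function names are ours, and the probe loop is shown commented out.

```python
import time
import urllib.request

# The five search engines probed in this study (step 2).
TARGETS = [
    "http://www.altavista.com",
    "http://www.ask.com",
    "http://www.aol.com",
    "http://www.alltheweb.com",
    "http://www.aliweb.com",
]

def measure_response_ms(url, timeout=60):
    """One active probe: time an HTTP GET of the main page.
    Returns latency in milliseconds, or None on failure (a downtime sample)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read(1)  # stop the clock once the server starts answering
    except OSError:
        return None
    return (time.monotonic() - start) * 1000.0

def summarize(samples_ms):
    """Minimum, maximum and overall average over successful probes,
    as reported beneath each response-time figure."""
    ok = [s for s in samples_ms if s is not None]
    if not ok:
        return None
    return {"min": min(ok), "max": max(ok), "avg": sum(ok) / len(ok)}

# Step 3 as a loop: one reading per target every 10 minutes.
# while True:
#     for url in TARGETS:
#         record(url, measure_response_ms(url))   # record() is hypothetical
#     time.sleep(600)
```

Failed probes (None) would feed the downtime and health figures discussed later, while successful ones feed the response-time tables.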


Table 4. Daily search and ranking (Murugesh et al., 2010).

Search engine | Daily search done (%) | US research ranking | Boolean search | Index size
Google | 49.2 | 2 | Limited Boolean search | Google does not agree on Yahoo's 20 billion page index; Yahoo may have more duplicate results than Google
Yahoo | 23.8 | 6 | Better than Google | 20 billion pages
MSN | 9.6 | 7 | Offers full Boolean search | ???
Rest of search engines | 17.4 | - | - | Limited index sizes, or some use the indexing of Google, Yahoo or MSN

Table 5. Core percentage of search done.

Search engine | Core percentage of search done in April 2009 (%) | Core percentage of search done in March 2009 (%)
Google | 64.2 | 63.7
Yahoo | 20.4 | 20.5
MSN | 8.2 | 8.3
Ask | 3.8 | 3.8
AOL | 3.4 | 3.7

Response time: min. average 391.0 ms; max. average 12281.218 ms; overall average 2335.1714 ms.

Figure 1. Altavista’s response time.

The utilized benchmarking tools have significant literature support: one is Manage Engine Application Manager 9 by Zoho Corporation and the other is the Web Application Testing Tool (WAPT 6.0). The evaluation of WAPT reported by Vail (2005) supports its use as a consistent and cost-effective approach. Many studies (Gupta and Sharma, 2010; Horák et al., 2009; Rajput et al., 2010) have used WAPT 6.0 for obtaining experimental results such as response time, latency, availability and reliability.

EXPERIMENTAL RESULTS

The experimental results of the top five search engines (Altavista, Ask, AOL, Alltheweb and Aliweb) are reported subsequently with graphical representations; some are in tabular form. These search engines were monitored with Manage Engine Application Manager 9 and WAPT 6.0 during every hour of each day and night for many weeks of May and June 2010.

Hourly based average response time of Altavista

The average response time for each hour of the day from May 12, 2010 to June 7, 2010 is graphically shown in Figure 2. The highest average response time was observed at 3:00 PM and the lowest at 6:00 AM, which suggests that the load on Altavista peaks around 3:00 PM and is lowest around 6:00 AM.


Figure 2. Hourly based Altavista’s response time.

Table 6. Altavista’s response time for days of the Week.

Day of week Minimum value in ms Maximum value in ms Hourly average in ms Sunday 359 8,354 2416.48 Monday 359 8,104 2615.55 Tuesday 0 60,012 2877.82 Wednesday 359 41,005 2528.5 Thursday 358 7,447 1919.28 Friday 358 137,464 1833.44 Saturday 359 7,991 2408.13
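Aggregates like Table 6 can be derived from raw probe logs by grouping on the weekday of each sample's timestamp. A minimal sketch of that aggregation (the probe values below are invented for illustration, not the study's measurements):

```python
from collections import defaultdict
from datetime import datetime

def by_day_of_week(samples):
    """Group (timestamp, latency_ms) probes by weekday and report the
    minimum, maximum and average, mirroring the layout of Table 6."""
    groups = defaultdict(list)
    for ts, ms in samples:
        groups[ts.strftime("%A")].append(ms)
    return {day: {"min": min(v), "max": max(v), "avg": round(sum(v) / len(v), 2)}
            for day, v in groups.items()}

# Invented probes inside the May-June 2010 monitoring window.
probes = [
    (datetime(2010, 5, 16, 6, 0), 359.0),    # a Sunday, early morning
    (datetime(2010, 5, 16, 15, 0), 8354.0),  # same Sunday, afternoon peak
    (datetime(2010, 5, 17, 9, 0), 2615.0),   # a Monday
]
print(by_day_of_week(probes))
```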

Response time: min. average 375.0 ms; max. average 17873.182 ms; overall average 2013.6647 ms.

Figure 3. Ask response time.

Average response time of Altavista against days of the week

The average response time of Altavista for each day of the week was monitored for each hour from May 12, 2010 to June 7, 2010, as shown in Table 6. The results report that Tuesday has the highest response time, while Friday has the least, which suggests that Altavista carries its heaviest load on Tuesday.

Hourly based average response time of Ask

The average response time for each hour of each day from May 12, 2010 to June 7, 2010 is


Figure 4. Hourly based Ask’s response time.

Response time: min. average 867.0 ms; max. average 14112.695 ms; overall average 4000.869 ms.

Figure 5. AOL response time.

Figure 6. Hourly based response time of AOL.


Table 7. Ask’s response time for days of the week.

Day of the week Minimum value in ms Maximum value in ms Hourly average in ms Sunday 0 11,429 2060.24 Monday 296 8,209 2167.06 Tuesday 297 63,510 2112.39 Wednesday 312 61,754 2067.47 Thursday 296 61,115 1844.05 Friday 297 180,630 1863.09 Saturday 289 38,103 2061.77

Response time: min. average 406.0 ms; max. average 13368.739 ms; overall average 2420.4521 ms.

Figure 7. Response time of AlltheWeb search engine.

graphically shown in Figure 4. The average hourly response time was highest at 6:00 PM and lowest at 6:00 AM, which suggests that Ask's load peaks at 6:00 PM and is lowest at 6:00 AM.

Average response time of Ask against days of the week

The average response time of Ask for each day of the week was monitored for each hour from May 12, 2010 to June 7, 2010, as shown in Table 7. The results report that Monday has the highest response time, while Thursday has the least response time for the Ask search engine.

Hourly based average response time of AOL

The average response time for each hour of each day from May 12, 2010 to June 7, 2010 is graphically shown in Figure 6. The average hourly response time was highest at 3:00 PM and lowest at 6:00 AM, which suggests that AOL's load peaks at 3:00 PM and is lowest at 6:00 AM.

Average response time of AOL against days of the week

The average response time of AOL for each day of the week was monitored for each hour from May 12, 2010 to June 7, 2010, as shown in Table 8. The results clearly indicate that Monday has the highest response time, while Friday has the least response time for the AOL search engine.

Hourly based average response time of AlltheWeb

The average response time for each hour of each day from May 12, 2010 to June 7, 2010 is graphically shown in Figure 8. The average hourly response time was highest at 3:00 PM and lowest at 6:00 AM, which suggests that AlltheWeb's load peaks at 3:00 PM and is lowest at 6:00 AM.

Average response time of AlltheWeb against days of the week

The average response time of AlltheWeb against the day


Table 8. AOL’s response time against days of the week.

Days of the week Minimum value in ms Maximum value in ms Hourly average in ms Sunday 500 12,319 4209.46 Monday 500 110,749 4631.94 Tuesday 484 66,773 4459.73 Wednesday 453 105,552 4188.71 Thursday 467 10,386 3438.55 Friday 0 66,784 3030.81 Saturday 0 70,621 4396.77

Figure 8. Hourly based AlltheWeb’s response time.

Response time: min. average 31.5 ms; max. average 7369.0835 ms; overall average 2546.469 ms.

Figure 9. Response time of AliWeb.

of the week was monitored for each hour from May 12, 2010 to June 7, 2010, as shown in Table 9. According to the results, Tuesday has the highest response time, while Thursday has the least response time for the AlltheWeb search engine.

Hourly based average response time of AliWeb

The average response time for each hour of each day from May 12, 2010 to June 7, 2010 is graphically shown in Figure 10. The average hourly response time was highest at 2:00 PM and lowest at 10:00 PM, which suggests that AliWeb's load peaks at 2:00 PM and is lowest at 10:00 PM.

Average response time of AliWeb against days of the week

The average response time of AliWeb for each day of the week was monitored for each


Figure 10. Hourly based response time of AliWeb.

Table 9. Response time against days of the week for AlltheWeb.

Day of the week Minimum value in ms Maximum value in ms Hourly average in ms Sunday 340 8,444 2853.33 Monday 359 7,823 2709.03 Tuesday 343 73,143 3150.78 Wednesday 343 11,274 2534.27 Thursday 357 8,586 1723.81 Friday 358 7,854 1729.12 Saturday 359 8,693 2602.58

Table 10. Response time of AliWeb against the days of the week.

Day of the week Minimum value in ms Maximum value in ms Hourly average in ms Sunday 15 33,967 2492.56 Monday 0 35,040 2601.92 Tuesday 15 35,299 2663.03 Wednesday 15 34,231 2433.26 Thursday 0 38,219 2575.14 Friday 15 58,785 2599.38 Saturday 15 37,723 2475.83

hour from May 12, 2010 to June 7, 2010, as shown in Table 10. The results of the table show that Tuesday has the highest response time, while Wednesday has the least response time for the AliWeb search engine.

Average page download time results in seconds

The graphical representation of Table 11 is shown in Figure 11, which clearly shows that the main page download time of Altavista and AlltheWeb is the same and superior to that of AOL, Ask and Aliweb.

Availability of search engines

The availability of any search engine depends upon its server's up time, down time, mean time between failures (MTBF) and mean time to repair (MTTR), as shown in Table 12. The results indicate that the up-time percentages of all the search engines are almost the same, but down time, MTBF and MTTR show


Table 11. Average page download time results.

Day AltaVista (s) Ask (s) AOL (s) AliWeb (s) AlltheWeb (s) 15 May 2010 0.055 1.064 0.357 1.671 0.032 16 May 2010 0.029 1.018 0.311 1.588 0.031 17 May 2010 0.051 1.217 0.296 1.636 0.038 18 May 2010 0.054 1.311 0.199 1.852 0.061 19 May 2010 0.045 1.334 0.177 1.715 0.026 23 May 2010 0.032 1.353 0.185 1.566 0.034 14 June 2010 0.027 1.439 0.192 1.777 0.053 15 June 2010 0.049 1.285 0.195 1.770 0.045 16 June 2010 0.040 1.334 0.169 1.735 0.020 17 June 2010 0.039 1.333 0.195 1.544 0.032 18 October 2010 0.043 0.962 0.337 1.719 0.063 25 October 2010 0.036 0.946 0.244 1.603 0.041 05 November 2010 0.027 0.909 0.220 1.666 0.048 Average page loading time (s) 0.040 1.192 0.236 1.680 0.040

Figure 11. Average download time of main page.
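Page download time differs from a bare response-time probe in that the whole main-page body is transferred before the clock stops. A sketch of one such reading and of the per-column averaging used in Table 11 (the fetch function is illustrative, not the study's tooling):

```python
import time
import urllib.request

def page_download_seconds(url, timeout=60):
    """Time a complete fetch of a page body, in seconds."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()  # the full transfer is part of the measurement
    return time.monotonic() - start

def daily_average(readings_s):
    """Average of the per-day readings, as in the last row of Table 11."""
    return round(sum(readings_s) / len(readings_s), 3)

# The AltaVista column of Table 11:
altavista = [0.055, 0.029, 0.051, 0.054, 0.045, 0.032, 0.027,
             0.049, 0.040, 0.039, 0.043, 0.036, 0.027]
print(daily_average(altavista))
```

Averaging the column this way gives about 0.041 s, essentially the 0.040 s reported in Table 11 up to rounding.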

Table 12. Down time report from May 12 to June 6, 2010.

Factor AltaVista Ask AOL AliWeb AlltheWeb Up time (%) 99.91 99.71 99.54 100 99.91 Total down time (%) 32 min 38 s (0.09) 1 h 44 min 19 s (0.29) 2 h 47 min 55 s (0.46) 0.0 33 min 34 s (0.09) MTTR 16 min 19 s 34 min 46 s 13 min 59 s 0.0 s 16 min 47 s MTBF 300 h 52 min 17 s 200 h 21 min 37 s 49 h 59 min 5 s 0.0 s 301 h 6 min 26 s
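The quantities in Table 12 are linked by standard availability definitions: MTTR is total down time divided by the number of outages, MTBF is total up time divided by the number of failures, and availability equals MTBF / (MTBF + MTTR). A sketch of the derivation from a monitoring window and its outage durations (the outage numbers below are invented for illustration, not the study's logs):

```python
def availability_report(total_hours, outages_min):
    """Derive uptime %, MTTR and MTBF from a monitoring window (hours)
    and a list of outage durations (minutes).
    Availability = MTBF / (MTBF + MTTR) = up time / total time."""
    down_min = sum(outages_min)
    up_min = total_hours * 60 - down_min
    n = len(outages_min)
    mttr = down_min / n if n else 0.0
    mtbf = up_min / n if n else float("inf")  # no failures observed
    uptime_pct = 100.0 * up_min / (total_hours * 60)
    return {"uptime_pct": round(uptime_pct, 2),
            "mttr_min": round(mttr, 1),
            "mtbf_min": round(mtbf, 1)}

# Invented example: a ~26-day window with two outages of 16 and 17 minutes.
print(availability_report(total_hours=26 * 24, outages_min=[16, 17]))
```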

considerable variations.

Critical health status

The critical or bad health status of all the reported search engines is summarized in Table 13. The results of the table report that Aliweb's health remains the most robust, and Ask's health is more critical than Aliweb's but better than that of Altavista, AlltheWeb and AOL.

DISCUSSION AND ANALYSIS

We found on which day of the week, and in which hour among the 24 h of a day, the usage of each crawler is at its peak or at its least,


Table 13. Health status from May 12 to June 6, 2010.

Search engine | Health severity percentile based on up and down time (%) | Critical health percentile based on health severity (%)
Altavista | 0.156 | 31.64
Ask | 0.837 | 23.70
AOL | 11.723 | 51.98
AliWeb | 69.709 | 11.64
AlltheWeb | 0.056 | 33.57

Table 14. Result’s comparison.

Performance metric | Altavista | Ask | AOL | Aliweb | AlltheWeb
Response time (average) (s) | 2.34 | 2.01 | 4.00 | 2.54 | 2.42
Highest response day of the week | Tuesday | Monday | Monday | Tuesday | Tuesday
Least response day of the week | Friday | Thursday | Friday | Wednesday | Thursday
Highest response hour (PM) | 3:00 | 6:00 | 3:00 | 2:00 | 3:00
Least response hour (AM) | 6:00 | 6:00 | 6:00 | 10:00 | 6:00
Main page download time (s) | 0.040 | 1.192 | 0.236 | 1.680 | 0.040
Up time (%) | 99.91 | 99.71 | 99.54 | 100 | 99.91
Down time (%) | 0.09 | 0.29 | 0.46 | 0.0 | 0.09
Mean time to repair (average) | 16 min 19 s | 34 min 46 s | 14 min | 0.0 s | 16 min 47 s
Mean time between failures (average) | 300 h 52 min 17 s | 200 h 21 min 37 s | 49 h 59 min 5 s | 0.0 s | 301 h 6 min 26 s
Critical or bad health status (%) | 31.64 | 23.70 | 52 | 11.64 | 33.57
Reliability | Medium | Average | Low | Average | Medium

which indicates the load ratio for the respective day and hour. Across the seven days of the week, Ask and AOL showed their highest response times on Monday, while Altavista, Aliweb and AlltheWeb on average showed theirs on Tuesday, which means those days carry the heaviest load for the respective engines. Furthermore, within the 24 h of a day, Altavista, Ask, AOL and AlltheWeb had their least response times at 6:00 AM, indicating lower load at that hour, while Altavista, AOL and AlltheWeb had their highest response times at 3:00 PM, indicating more load at that hour. Finally, we analyzed the results as summarized in Table 14. Figures 1, 3, 5, 7 and 9 present the minimum, maximum and average response times of the selected web crawlers (Altavista, Ask, AOL, AlltheWeb and Aliweb), respectively.

Figure 12. Up and down time.

Figure 13. Response and page download time.

Figure 14. Health status and reliability.

Figure 15. MTBF and MTTR.

Conclusion

Optimal performance and efficiency are reliant on response time, download time, server up time, server down time, mean time between failures, mean time to repair and the severity of critical health status. The experimental results comparison of this study (Table 14 and Figure 13) shows that Ask possesses the best response time (2.01 s), followed by Altavista (2.34 s), which is superior to AOL, AlltheWeb and Aliweb. Moreover, AlltheWeb is in 3rd position with a 2.42 s response time, and the AOL search engine proved to be the worst in responding, with a response time of 4.0 s. According to main page download time, Altavista and AlltheWeb are equal in ranking with a download time of 0.040 s, followed by AOL with 0.24 s, as shown in Figure 13. The up time of all the reported search engines is almost the same, but in terms of down time Aliweb holds first position with 0.00% down time, Altavista and AlltheWeb share 2nd position with the same down time (0.09%), followed by Ask with 0.29% down time, as shown in Figure 12.

According to MTTR, once again Aliweb is in 1st position with no repairs, AOL is in 2nd position with a penalty of about 14 min, and Altavista and AlltheWeb are in close competition for 3rd position with about 16 min each, as discussed in Table 14 and graphically shown in Figure 15. According to MTBF, AliWeb recorded no failures at all, while AlltheWeb (about 301 h) and Altavista (about 300 h) are in close competition, followed by Ask (about 200 h) and AOL (about 50 h). The reliability of Ask and Aliweb is average and better than the others, the reliability of Altavista and AlltheWeb is medium, and AOL has low reliability. According to critical or bad health status, Aliweb is better than the others with a health severity of 11.64%, and Ask is in 2nd position with a severity of 23.70%, followed by a close competition between Altavista (31.64%) and AlltheWeb (33.57%), while AOL shows a severe health status with a 52% critical health penalty, as represented in Figure 14.

In the light of all the performance metrics taken together, there is a close competition among Altavista, Ask and AlltheWeb. According to response time, health severity, MTTR and reliability, Ask is better than Altavista and Alltheweb, but according to main page download time, down time and MTBF, Altavista and AlltheWeb are equal in ranking and better than the Ask search engine.


ACKNOWLEDGEMENT

We appreciate the Research Center of the College of Computer and Information Sciences for their generous support of this work. We are really grateful to them in this regard.

REFERENCES

Baeza-Yates R, Ribeiro-Neto B (1999). Modern Information Retrieval. Addison-Wesley.
Bar-Ilan J (1998). Search Engine Results over Time: A Case Study on Search Engine Stability. Cybermetrics: Int. J. Scientomet., Inform. Bibliomet., 2(3).
Bar-Ilan J (2005). Comparing rankings of search results on the Web. Inform. Process. Manage., 41: 1511-1519.
Chakrabarti S (2003). Mining the Web: Analysis of Hypertext and Semi-Structured Data. ISBN 978-1-55860-754-5, Elsevier.
Chau M, Chen H, Qin J, Zhou Y, Qin Y, Sung WK, McDonald D (2002). Comparison of Two Approaches to Building a Tool: A Case Study in the Nanotechnology Domain. In Proc. JCDL '02, July 13-17. ACM 1-58113-513-0/02/0007.
Cherkasova L, Fu Y, Tang W, Vahdat A (2003). Measuring End-to-End Internet Service Performance. ACM Trans. Inter. Tech., 3(4).
Chu H, Rosenthal M (1996). Search Engines for the World Wide Web: A Comparative Study and Evaluation Methodology. In Proc. ASIS 1996 Annual Conf., October 19-24.
Das A, Nandy S (2010). Hybrid - A Fast Retrieval of Topic-Related Web Resources for Domain-Specific Searching. Int. J. Inf. Tech. Knowl. Manage., 2(2): 355-360.
Dasdan A, Xinh HX (2009). User-Centric Content Freshness Metrics for Search Engines. In WWW 2009, April 20-24, Madrid, Spain. ACM 978-1-60558-487-4/09/04.
Dujmović J, Bai H (2006). Evaluation and Comparison of Search Engines Using the LSP Method. ComSIS, 3(2). UDC 004.738.52.
Gulli A, Signorini A (2005). The indexable web is more than 11.5 billion pages. In Proc. WWW 2005, May 10-14, Chiba, Japan.
Gupta S, Sharma DL (2010). Performance Analysis of Internal vs. External Security Mechanism in Web Applications. Int. J. Adv. Netw. Appl., 1(5): 314-317.
Hawking D (2006). Web Search Engines. Computer, 39(6): Part 1 and 39(8): Part 2.
Horák J, Ardielli J, Horáková B (2009). Testing of Web Map Services. Int. J. Spat. Data Infra. Res., Special Issue GSDI-11.
Jones SK, Willett P (1997). Readings in Information Retrieval. Morgan Kaufmann Series in Multimedia Information and Systems. ISBN 1558604545, Elsevier.
McCown F, Nelson ML (2007). Agreeing to Disagree: Search Engines and Their Public Interfaces. In Proc. ACM JCDL '07, June 17-22, 2007, Vancouver, British Columbia, Canada.
Morris MR, Teevan J, Panovich K (2010). A Comparison of Information Seeking Using Search Engines and Social Networks.
Murugesh V, Onyango D, Murugesh S (2010). Exploring search engine functionalities and Google search capabilities. In Proc. Int. Res. Symp. in Service Manage., pp. 1-12.
Rajput S, Vadivel S, Shetty S (2010). Design and Security Analysis of Web Application Based and Web Services Based Patient Management System (PMS). Int. J. Comput. Sci. Netw. Sec., 10(3): 22-27.
Rangaswamy A, Giles CL, Seres S (2009). A Strategic Perspective on Search Engines: Thought Candies for Practitioners and Researchers. J. Interact. Mark., 23: 49-60. Elsevier.
Sherman C (2002). Teoma vs. Google, Round Two. http://searchenginewatch.com/2159601
Spink A, Jansen BJ (2006). Overlap among major web search engines. Internet Res., 16(4): 419-426. Emerald Group Publishing Limited. ISSN 1066-2243. DOI 10.1108/10662240610690034.
Vail C (2005). Stress, Load, Volume, Performance, Benchmark and Base Line Testing Tool Evaluation and Comparison. http://www.vcca.com.
Yossef ZB, Gurevich M (2007). Efficient Search Engine Measurements. In Proc. WWW 2007, May 8-12, 2007, Banff, Alberta, Canada. ACM 978-1-59593-654-7/07/0005.