Link Analysis Algorithms to Handle Hanging and Spam Pages


Ravi Kumar Patchmuthu
Department of Computing
This thesis is presented for the Degree of Doctor of Philosophy of Curtin University. June 2014

Declaration

To the best of my knowledge and belief this thesis contains no material previously published by any other person except where due acknowledgment has been made. This thesis contains no material which has been accepted for the award of any other degree or diploma in any university.

Signature: ………………………… (RAVI KUMAR PATCHMUTHU)
Date: 18th June 2014

Abstract

On the Web, a page that has no forward (outgoing) links is called a hanging page, also known as a dangling page, zero-out-link page or dead-end page. Most ranking algorithms used by search engines simply ignore hanging pages. However, hanging pages cannot just be ignored, because they may hold relevant and useful information such as .pdf, .ppt, video and other attachment files. Hanging pages are one of the hidden problems of link-structure-based ranking algorithms because they do not propagate rank scores to other pages (an important function of such algorithms), they can be compromised by spammers to induce link spam in the Web, and they can harm Website optimization and the performance of link-structure-based ranking algorithms.

A detailed literature survey was conducted on link-structure-based ranking algorithms, hanging pages and their effect on Web information retrieval. Different link-structure-based ranking algorithms were explored and compared, the literature on Web spam, and link spam in particular, was reviewed, and a detailed survey of Web site optimization was carried out.

The first research objective is handling Hanging Pages (HP) in link-structure-based ranking algorithms; PageRank is used as the base algorithm throughout this research.
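In graph terms, a hanging page as defined in this abstract is simply a node with zero out-degree. The following is an illustrative sketch only (not code from the thesis), assuming a Web graph stored as an adjacency dictionary:

```python
# Illustrative sketch (not code from the thesis): a Web graph as a dict
# mapping each page to the set of pages it links to. A hanging page is
# any node with no outgoing links.

def hanging_pages(graph):
    """Return the set of pages with zero out-links."""
    nodes = set(graph)
    for targets in graph.values():
        nodes |= set(targets)  # pages that appear only as link targets
    return {page for page in nodes if not graph.get(page)}

# Toy graph: D (say, a .pdf attachment) links to nothing, so it is hanging.
web = {"A": {"B", "C"}, "B": {"C"}, "C": {"A", "D"}}
print(sorted(hanging_pages(web)))  # ['D']
```

Note that a hanging page such as D may never appear as a key in a crawled adjacency list at all, which is why the sketch collects link targets before testing out-degree.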
The remaining objectives are: developing a Hanging Relevancy Algorithm (HRA) that produces fair and relevant ranking results by including only the relevant hanging pages; developing a Link Spam Detection (LSD) algorithm to analyse and detect the contribution of hanging pages to link spam; and developing techniques and methods to improve Web Site Optimization (WSO) by studying the effect of hanging pages on Search Engine Optimization (SEO).

The following methodologies were used to meet these objectives. A Web graph was implemented in which nodes represent Web pages and edges represent hyperlinks. The PageRank algorithm was simulated, matrix and graph interfaces were created, and the ranks of Web pages were computed. Experiments were conducted to show the effects of hanging pages, and methods were proposed to handle hanging pages when ranking Web pages. The Hanging Relevancy Algorithm (HRA) was implemented so that only the relevant hanging pages are included in the rank computation, reducing its complexity. The Link Spam Detection (LSD) algorithm was implemented to detect link spam contributed by hanging pages. Experiments were conducted to show the effect of hanging pages on Search Engine Optimization (SEO), and factors were proposed for building optimized Web sites.

The PageRank algorithm was implemented and experiments were carried out to show its convergence. Three publicly available datasets, WEBSPAM-UK2006, WEBSPAM-UK2007 and EU2010, were used for the experiments, along with live data from the World Wide Web. Several experiments were conducted to show the effects of hanging pages on Web page ranking. Methods were proposed to include all the hanging pages in the PageRank computation and were compared with the original PageRank algorithm. The proposed methods were slower than PageRank but produced more relevant results.
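The methodology couples PageRank with hanging-page handling and link-spam analysis. As a hedged sketch (the thesis's own HRA and LSD algorithms are not reproduced here), the block below shows one standard way of keeping hanging pages in the computation, redistributing their rank uniformly over all nodes, together with a toy demonstration of how hanging pages compromised by a spammer can inflate a target page's rank:

```python
# Sketch under assumptions: this is generic dangling-node PageRank, not the
# thesis's HRA/LSD algorithms. Hanging pages stay in the computation by
# having their rank redistributed uniformly each iteration.

def pagerank(graph, d=0.85, iters=50):
    """graph: dict mapping each page to the set of pages it links to."""
    nodes = set(graph)
    for targets in graph.values():
        nodes |= set(targets)
    n = len(nodes)
    rank = {p: 1.0 / n for p in nodes}
    for _ in range(iters):
        # Rank mass sitting on hanging (zero-out-link) pages.
        dangling = sum(rank[p] for p in nodes if not graph.get(p))
        new = {p: (1 - d) / n + d * dangling / n for p in nodes}
        for p, out in graph.items():
            for q in out:
                new[q] += d * rank[p] / len(out)
        rank = new
    return rank

# Clean graph: D is a genuine hanging page; nobody links to the target T.
clean = {"A": {"B", "D"}, "B": {"C"}, "C": {"A"}, "T": {"A"}, "D": set()}

# Link spam: three otherwise-hanging pages are compromised to point at T.
spammed = dict(clean, **{f"S{i}": {"T"} for i in (1, 2, 3)})

boost = pagerank(spammed)["T"] / pagerank(clean)["T"]
print(f"target rank inflated {boost:.2f}x by spam links")
```

With these toy numbers the target's rank roughly doubles purely on the strength of links from hanging pages, which mirrors the motivation for detecting link spam contributed by such pages.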
The experiments showed the percentage of hanging pages in the following datasets: WEBSPAM-UK2006, 21.35%; WEBSPAM-UK2007, 43.11%; EU2010, 54.21%; and the Curtin University (Sarawak) Website, 35.57%. The study showed that hanging pages keep increasing in the Web. The Hanging Relevancy Algorithm (HRA) produced more relevant results with less computation time than including all the hanging pages in the computation. The experiments also showed that the ranks of certain relevant hanging pages increased by four, signifying that these pages deserved a better ranking.

Experiments were carried out on live Web data to demonstrate the contribution of hanging pages to link spam. The results showed that the rank of the target page after link spam had increased by two and its order had also improved. This demonstrated that hanging pages contribute to spam and that the proposed method detected the link spam contributed by them.

Experiments were done to determine the On-Site and Off-Site ranking factors, taking www.curtin.edu.my as a sample Website. The link analysis experiment showed that 90% of the sample Website's back links are external links and only 10% are internal back links; 80% of the Website's back links are followed and only 20% are no-followed. The experiments identified different ranking factors and suggested factors to improve the ranking of the Website. The research study has therefore helped to improve the rankings of relevant hanging pages and to reduce the link spam contributed by hanging pages in the Search Engine Result Pages of link-structure-based ranking algorithms.

Acknowledgement

This thesis would not have been possible without the guidance and help of several individuals who, in one way or another, contributed or extended their valuable assistance in the preparation and completion of this research study. First, I would like to thank my Supervisor, Dr.
Zhuquan Zang (Associate Professor and HOD, Department of Electrical and Computer Engineering), for providing me with all the administrative support and guidance throughout this research journey. Next, I would like to express my utmost gratitude to my Co-Supervisor/Associate Supervisor, Dr. Ashutosh Kumar Singh (formerly Associate Professor, Department of Electrical and Computer Engineering, Curtin University, Miri Campus; currently Professor, Department of Computer Application, NIT, Kurukshetra, India), to whom I reported during this research journey. I have enjoyed his energetic and enthusiastic thought-provoking ideas, constructive comments and valuable advice during our research discussions; much of this is reflected in my research and in the thesis as a whole. I am truly grateful for Dr. Singh's constant support, guidance and encouragement, and deeply thankful and proud to have him as my supervisor. The enthusiasm and dedication he has shown towards research have been commendable and particularly motivational for me during the tough stages of this research endeavour.

I would like to acknowledge the support given by the Chairperson of the Doctoral Thesis Committee, Dr. Chua Han Bing (Associate Professor and HOD, Chemical Engineering), and the ex-chairperson, Dr. K. Shenbaga (ex-Dean, Research and Development, Curtin University). Both of them provided guidance, encouragement and support, and shared tremendous knowledge with me. I am also obliged to acknowledge Professor Dr. Michael Cloke (Dean, SOE), Professor Dr. Marcus Man Kon Lee (Associate Dean, Research Training), Associate Professor Dr. Michael Kobina Danquah (Associate Dean, Research and Development) and Nur Afnizan Johan (Tom) for all the support given to me during the course of this research study. I would also like to extend my sincere gratitude to my research colleague, Alex Goh Kwang Leng, for participating in the research discussions and contributing to the research study as a whole.
I wish him good luck on his journey towards his doctoral study. I would like to acknowledge Dr. Anand Mohan (Director, NIT, Kurukshetra, India) for his expert advice on this research study. My appreciation also goes to Mdm. Sheila Gopinath for proofreading and editing this thesis, Billy Lau Pik Lik for his initial contribution to this research, and Dr. Herbert Raj, my brother-in-law, for correcting a few chapters. I extend my thanks to others as well who have helped me either directly or indirectly in this research study. My sincere gratitude goes to Jefri Bolkiah College of Engineering, Kuala Belait, and the Ministry of Education, Brunei, for supporting me in this research study. Special thanks go to my family, especially my wife Jelciana, for her constant support and motivation to complete this thesis. I would also like to acknowledge my son Arun and daughter Cynthea for their love and support, as well as my father and my in-laws for their telephonic support and motivation. Last of all but most importantly, I would like to thank Almighty God for providing me with such great strength and the opportunity to complete this thesis.

Publications

Journal Papers

1. Ravi Kumar Patchmuthu, Ashutosh Kumar Singh and Anand Mohan. "Efficient Methodologies to Overcome the Effects of Hanging Pages in Website Optimization", International Journal of Web Engineering and Technology. (Scopus, under review.)
2. Ravi Kumar Patchmuthu, Goh Kwang Leng, Ashutosh Kumar Singh and Anand Mohan. "Efficient Methodologies to Determine the Relevancy of Hanging Pages with Stability Analysis", Cybernetics and Systems: An International Journal. (ISI, under review.)
3. Ravi Kumar Patchmuthu, Ashutosh Kumar Singh and Anand Mohan. "A New Algorithm for Detection of Link Spam Contributed by Zero-Out-Link Pages", Turkish