Univerzita Karlova V Praze Internetové Vyhledávače S Důrazem Na

Total Page:16

File Type:pdf, Size:1020Kb

Univerzita Karlova V Praze Internetové Vyhledávače S Důrazem Na Univerzita Karlova v Praze Filozofická fakulta Ústav informačních studií a knihovnictví Studijní program: informační studia a knihovnictví Studijní obor: informační studia a knihovnictví Libor Nováček internetové vyhledávače s důrazem na technický vývoj a ekonomiku provozu Diplomová práce Praha 2008 Vedoucí diplomové práce: PhDr. Richard Papík, Ph.D. Oponent diplomové práce: Datum obhajoby: Hodnocení: Vysoká škola: Univerzita Karlova v Praze Fakulta: Filozofická fakulta Součást: Ustav informačních studií a knihovnictví Školní rok: 2004/2005 - ZADÁNÍ DIPLOMOVÉ PRÁCE (PROJEKTU, UMĚLECKÉHO DÍLA, UMĚLECKÉHO VÝKONU) pro Libor Nováček obor Informační studia a knihovnictví Název tématu: Internetové vyhledávače s důrazem na technický vývoj a ekonomiku provozu Zásady pro vypracování: Cílem diplomové práce je popsat a analyzovat vývoj hlavních celosvětových a českých vyhledávačů typu „web search engines“ a současně se záměřit na jejich technické a ekonomické aspekty. Předběžná osnova: 1. Úvod do problematiky vyhledávacích prostředků na Internetu - období před komerčním rozšířením WWW služby - období druhé poloviny 90. let minulého století - "search engines" v 21. století 2. Technický vývoj vyhledávačů 3. Ekonomické aspekty provozu vyhledávačů 4. Předpokládané trendy vývoje internetových vyhledávacích prostředků Diplomová práce bude připravena a upravena v souladu s platnými vnitřními předpisy FF UK a dalšími metodickými pokyny a normativními dokumenty. Tisk: Rapy 0078/2001 o |SEVT| 49 395 0 1/2001 Rozsah grafických prací: Rozsah průvodní zprávy: Seznam odborné literatury: 1. BAEZA-YATES, Ricardo; RIBEIRO-NETO, Berthier. Modern information retrieval. New York : Addison-Wesley. 1999. 513 s. ISBN 0-201-39829-X 2. Search engine watch [online]. Darien (Connecticut): Jupitermedia, 1998-. [cit. 2005-01-11]. Dostupné z WWW: <http://searchenainewatch.com> 3. HOCK, Randolph. The extreme searcher's guide to Web search engines: a handbook for the serious searcher. Medford (N .J.): CyberAge Books, 1999. 212 s. ISBN 0-910965-26-9 Vedoucí diplomové práce: PhDr. Richard Papík, Ph.D. Datum zadání diplomové práce: 11.1. 2005 Termín odevzdání diplomové práce: L.S. PhDr. Richard Papík, Ph.D. Vedoucí součésti-ředitel ŮISK FF UK Děkan FF UK V Praze dne 11.1.2005 Prohlášení: Prohlašuji, že jsem diplomovou práci zpracoval samostatně a že jsem uvedl všechny použité informační zdroje. V Praze, 21. srpna 2008 Identifikační záznam: NOVÁČEK, Libor. Internetové vyhledávače s důrazem na technický vývoj a ekonomiku provozu [Internet search engines, with emphasis on their technical development and business operationsj. Praha, 2008. 125 s., 75 s. příl. Diplomová práce. Univerzita Karlova v Praze, Filozofická fakulta, Ústav informačních studií a knihovnictví 2008. Vedoucí diplomové práce PhDr. Richard Papík, Ph.D.. Abstrakt Tématem práce jsou internetové vyhledávače, zejména vyhledávače operující v prostředí volně indexovatelného WWW. Nejprve je přiblížena struktura internetových vyhledávacích systémů. Dále jsou rozebrány jejich jednotlivé části, zvláštní pozornost je věnována zejména crawleru. Hlouběji se práce věnuje také způsobům řazení výsledků a jejich možnému ovlivňování. V další části práce jsou pak popsány hlavní zahraniční a domácí vyhledávače a to nejen vyhledávající v prostoru www. Důraz je kladen na okolnosti jejich vzniku, použité technologie a jejich financování. Pozornost je věnována též osobám zakladatelů. Klíčová slova: internetové vyhledávače, vyhledávací stroje, Internet, World Wide Web, WWW, volně indexovatelný web, web crawler, spider, robot, technologie, historie internetu, optimalizace pro vyhledávače (SEO), intelektuální kapitál, rozvojový kapitál. Obsah Poděkováni.............................................................................................................................11 Předmluva.............................................................................................................................. 11 Tvůrčí poznámka....................................................................................................................12 1.Úvo d................................................................................................................................. 14 1.1.Význam pojmu „internetový vyhledávač“ ...................................................................14 1.2. Vymezení obsahu práce.................................................................................. 15 2.Technologie a struktura webu.......................................................... ................................... 18 2.1 .Technologický pohled na web.....................................................................................18 2.2.„Motýlková struktura“ webu.......................................................................................19 2.3.Regionální působnost a překryv vyhledávačů............................................................. 20 2.4.Speciální webové technologie......................................................................................22 2.4.1 .Technologie AJAX...............................................................................................22 2.4.2.AJAX aplikace z hlediska vyhledávačů................................................................23 3.Architektura vyhledávačů................................................................................................... 25 3.1.1.Softwarová stavba vyhledávačů........................................................................... 27 3.2.Crawler neboli robot....................... ......... .................................................................. 28 3.2.1.Překážky pro robota............................................................................................. 30 3.2.2.0mezení crawlingu plynoucí z legislativy............................................................32 3.2.2.1.Právní situace v České republice...................................................................33 3.3.Index a databáze vyhledávače...................................................................................... 33 3.3.1.Algoritmy pro řazení výsledků vyhledávání......................................................... 34 3.3.2.PageRank algoritmus a související patenty...........................................................35 3.3.3.Zrychlení algoritmů PageRank................. ........................................................... 36 3.3.4.0dkazy přirozené („organické“) vs. odkazy „umělé“...........................................36 3.3.5.Nejdůležitější faktory ovlivňující pořadí výsledků................................................37 3.3.6.Atribut „nofollow“......................................................................................38 3.3.7.Google bomby......................................................................................................39 3.3.8.Google Sitemaps................................................................................................. 40 3.3.8.1.Podpora Sitemaps ostatními vyhledávači...................................................... 40 3.3.8.2.Nástroje pro tvorbu sitemaps........................................................................41 3.3.8.3.Technická specifikace SiteMaps....................................................................41 3.3.9.Google Analytics.................................................................................................. 41 3.4.Vyhledávače a jejich uživatelé..................................................................................... 42 3.4.1.Frekvence používání výhledávačů uživateli..........................................................43 3.4.2.Chování uživatelů při zadávání dotazů............................................................... 44 3.4.2.1.Bezpečnostní rizika a hrozby na straně uživatelů.......................................... 44 3.4.3.Průniky hackerů do vyhledávacích služeb................... .........................................45 3.5.Postavení vyhledávačů na trhu, srovnávání a měření................................................46 3.6.Tržm podíly českých vyhledávačů ........................................................................47 3.6.1.Tržní podíly českých vyhledávačů v roce 2004....................................................47 3.6.2.Tržní podíly českých vyhledávačů v letech 2005 - 2008...................................... 47 3.6.3. Tržní podíly podle unikátních návštěvníků....................................................48 4.Vybrané aspekty reklamy u vyhledávačů............................................................. ................ 49 4.1.Právní aspekty kontextové reklamy.............................................................................. 49 4.2.Certifikace AdWords profesionálů................................................................................49 4.3.Pay-per-Click systém Sklik systém vyhledávače Seznam.cz......................................... 50 8 4.4.Reklamní program AdSense.........................................................................................51 4.4.1 .MFA weby........................................................................................................... 52 4.4.1.1 .MFA v České republice.................................................................................53 4.4.1.2.Google AdSense for domains........................................................................53 4.4.1.3.AdSense v České republice...........................................................................53
Recommended publications
  • Signature Redacted Kimberly Smith Program in Media Arts and Sciences May 12, 2017 Signature Redacted
    New Materials for Teaching Computational Thinking in Early Childhood Education by Kimberly Smith M.EA. Fine Arts School of Visual Arts, 2012 B.F.A. Fine Arts LIBRARIES Oregon State University, 2005 ARCHIVES Submitted to the Department of Media Arts and Sciences, School of Architecture and Planning, in partial fulfillment of the requirements for the degree of Master of Science in Media Arts and Sciences at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY June 2017 2017 Massachusetts Institute of Technology. All rights reserved. The author hereby grants to MIT permission to reproduce and to distribute publicly paper and electronic copies of this thesis document in whole or in part in any medium now known or hereafter created. Signature of Author:-Signature redacted Kimberly Smith Program in Media Arts and Sciences May 12, 2017 Signature redacted Certified by Sepandar Kamvar Associate Professor of Media Arts and Sciences Signature redacted Accepted by Pattie Maes, Ph.D. kcademic Head, Program in Media Arts and Sciences I New Materials for Teaching Computational Thinking in Early Childhood Education by Kimberly Smith M.F.A. Fine Arts School of Visual Arts, 2012 B.EA. Fine Arts Oregon State University, 2005 Submitted to the Department of Media Arts and Sciences, School of Architecture and Planning, in partial fulfillment of the requirements for the degree of Master of Science in Media Arts and Sciences at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY June 2017 Abstract The need for computer science education is greater than ever. There are currently over 500,000 unfilled computer science jobs in the United States and many schools do not teach computer science in their classrooms.
    [Show full text]
  • Signature Redacted a Uthor
    Automatic Identification of Representative Content on Twitter by OF TECHNOLOGY Prashanth Vijayaraghavan JUL 14 2016 B.E, Anna University (2012) LIBRARIES Submitted to the Program in Media Arts and Sciences ARKIVES in partial fulfillment of the requirements for the degree of Master of Science in Media Arts and Sciences at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY June 2016 @ Massachusetts Institute of Technology 2016. All rights reserved. Signature redacted A uthor ....... ...................... Program in Media Arts and Sciences May 11, 2016 Signature redacted Certified by ..................... Xeb Roy Associate Professor Program in Media Arts and Sciences Thesis Supervisor Signature redacted Accepted by ........................ Pattie Maes Academic Head Program in Media Arts and Sciences --owN Automatic Identification of Representative Content on Twitter by Prashanth Vijayaraghavan Submitted to the Program in Media Arts and Sciences on May 11, 2016, in partial fulfillment of the requirements for the degree of Master of Science in Media Arts and Sciences Abstract Microblogging services, most notably Twitter, have become popular avenues to voice opinions and be active participants of discourse on a wide range of topics. As a consequence, Twitter has become an important part of the political battleground that journalists and political analysts can harness to analyze and understand the narratives that organically form, spread and decline among the public in a political campaign. A challenge with social media is that important discussions around certain issues can be overpowered by majoritarian or controversial topics that provoke strong reactions and attract large audiences. In this thesis we develop a method to identify the specific ideas and sentiments that represent the overall conversation surrounding a topic or event as reflected in collections of tweets.
    [Show full text]
  • Signature Redacted Author
    Understanding email communication patterns Daniel Smilkov M.Sc. Computer Science, 2011 B.Sc. Computer Science, 2009 Ss. Cyril and Methodius University, Skopje, Macedonia Submitted to the Program in Media Arts and Sciences, MASSACHUSETS WMNfVlE School of Architecture and Planning, OF TECHNOLOGY in Partial Fulfillment of the Requirements for the Degree of JUL Master of Science in Media Technology at the Massachusetts Institute of Technology LIBRARIES June 2014 © Massachusetts Institute of Technology, 2014. All rights reserved. Signature redacted Author Ilaniel Smilkov Program in Media Arts and Sciences May 16, 2014 Signature redacted Certified By Dr. Cesar A. Hidalgo Assistant Professor in Media Arts and Sciences Program in Media Arts and Sciences Signature redacted Accepted By Dr. Pattie Maes Professor of Media Technology, Associate Academic Head Program in Media Arts and Sciences 1 Understanding email communication patterns Daniel Smilkov Submitted to the Program in Media Arts and Sciences, School of Architecture and Planning, On May 16, 2014 in Partial Fulfillment of the Requirements for the Degree of Master of Science in Media Technology at the Massachusetts Institute of Technology Abstract It has been almost two decades since the beginning of the web. This means that the web is no longer just a technology of the present, but also, a record of our past. Email, one of the original forms of social media, is even older than the web and contains a detailed description of our personal and professional history. This thesis explores the world of email communication by introducing Immersion, a tool build for the purposes to analyze and visualize the information hidden behind the digital traces of email activity, to help us reflect on our actions, learn something new, quantify it, and hopefully make us react and change our behavior.
    [Show full text]
  • An Algorithmic Approach to Social Networks David Liben-Nowell
    An Algorithmic Approach to Social Networks by David Liben-Nowell B.A., Computer Science and Philosophy, Cornell University, 1999 M.Phil., Computer Speech and Language Processing, University of Cambridge, 2000 Submitted to the Department of Electrical Engineering and Computer Science in partial ful¯llment of the requirements for the degree of Doctor of Philosophy in Computer Science at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY June 2005 c Massachusetts Institute of Technology 2005. All rights reserved. Author . Department of Electrical Engineering and Computer Science May 20, 2005 Certi¯ed by. Erik D. Demaine Assistant Professor Thesis Supervisor Accepted by . Arthur C. Smith Chairman, Department Committee on Graduate Students An Algorithmic Approach to Social Networks by David Liben-Nowell Submitted to the Department of Electrical Engineering and Computer Science on May 20, 2005, in partial ful¯llment of the requirements for the degree of Doctor of Philosophy in Computer Science Abstract Social networks consist of a set of individuals and some form of social relationship that ties the individuals together. In this thesis, we use algorithmic techniques to study three aspects of social networks: (1) we analyze the \small-world" phenomenon by examining the geographic patterns of friendships in a large-scale social network, showing how this linkage pattern can itself explain the small-world results; (2) using existing patterns of friendship in a social network and a variety of graph-theoretic techniques, we show how to predict new relationships that will form in the network in the near future; and (3) we show how to infer social connections over which information ows in a network, by examining the times at which individuals in the network exhibit certain pieces of information, or interest in certain topics.
    [Show full text]
  • Web Search Engines: Practice and Experience
    Web Search Engines: Practice and Experience Tao Yang, University of California at Santa Barbara Apostolos Gerasoulis, Rutgers University 1 Introduction The existence of an abundance of dynamic and heterogeneous information on the Web has offered many new op- portunities for users to advance their knowledge discovery. As the amount of information on the Web has increased substantially in the past decade, it is difficult for users to find information through a simple sequential inspection of web pages or recall previously accessed URLs. Consequently, the service from a search engine becomes indispensable for users to navigate around the Web in an effective manner. General web search engines provide the most popular way to identify relevant information on the web by taking a horizontal and exhaustive view of the world. Vertical search engines focus on specific segments of content and offer great benefits for finding information on selected topics. Search engine technology incorporates interdisciplinary concepts and techniques from multiple fields in computer science, which include computer systems, information retrieval, machine learning, and computer linguistics. Understanding the computational basis of these technologies is important as search engine technologies have changed the way our society seeks out and interacts with information. This chapter gives an overview of search techniques to find information from on the Web relevant to a user query. It describes how billions of web pages are crawled, processed, and ranked, and our experiences in building Ask.com's search engine with over 100 million north American users. Section 2 is an overview of search engine components and historical perspective of search engine development.
    [Show full text]
  • Programming Abstractions, Compilation, and Execution Techniques for Massively Parallel Data Analysis
    Programming Abstractions, Compilation, and Execution Techniques for Massively Parallel Data Analysis vorgelegt von Dipl.-Inf. Stephan Ewen aus Mainz der Fakult¨atIV - Elektrotechnik und Informatik der Technischen Universit¨atBerlin zur Erlangung des akademischen Grades Doktor der Ingenieurwissenschaften - Dr. Ing. - genehmigte Dissertation Promotionsausschuss: Vorsitzender: Prof. Dr. Peter Pepper Gutachter: Prof. Dr. Volker Markl Prof. Dr. Odej Kao Prof. Mike Carey, Ph.D. Tag der wissenschaftlichen Aussprache: 22. Oktober 2014 Berlin 2015 D 83 ii Acknowledgments During the process of creating this thesis, I have been accompanied by a great many people over the time. Many of them offered valuable discussions and helped to shape the outcome of this thesis in one way or another. First and foremost, I want to thank my adviser Volker Markl. He has not only provided the great work environment and the collaborations under which this thesis was created, but helped with countless comments and valuable insights. Throughout my entire academic life, Volker has mentored me actively and given me the opportunity to work on amazing projects. I would like to thank my advisers Odej Kao and Mike Carey for their helpful comments and the discussions that we had at at various points, like project meetings, conferences, or academic visits. I worked with many great people on a day to day basis, among who I want to give special thanks to Alexander Alexandrov, Fabian H¨uske, Sebastian Schelter, Kostas Tzoumas, and Daniel Warneke. Thank you guys for the great work atmosphere, your ideas, your sense of humor, and for making the frequent long working hours not feel that bad.
    [Show full text]