Master's Thesis

Faculty of Computer Science and Management
Field of study: Computer Science
Specialization: Information Systems

Agregator wyników zapytań w wyszukiwarkach internetowych
Aggregator of results gathered from Internet search engines

Maksim Buben

Keywords: search engines, quality raters, aggregator

Short summary: This thesis aims to investigate the areas of application of search result aggregators in which the quality of the results obtained is higher than that of the individual search engines whose results are used to build the aggregated results.

Supervisor: Dr inż. Marek Kopel

Wrocław 2018

Abstract

The subject of this master's thesis is an aggregator of results gathered from Internet search engines. The thesis describes existing meta-search systems as well as meta-search engines that have ceased operating and are no longer used by Internet users. The reasons for this phenomenon are analyzed, and ways of developing information systems of the metasearch engine type are proposed.

The first part introduces the basic concepts needed to understand how such systems work: SERP (search engine results page), snippet, search query, types of queries, and organic results. It also presents the concepts needed to evaluate the quality of aggregated search results: relevance, pertinence, assessor (Ассесор), quality rater, Discounted Cumulated Gain (DCG), and Normalized Discounted Cumulated Gain (nDCG). Examples of the use of search result aggregation are given.

The second part describes and implements a search results aggregator that works on the basis of the positions of results in the search engines whose results are used to build the aggregated results.

The third part examines the quality of the aggregator's search results on the basis of user preferences. Documentation for the quality evaluation was prepared using two search quality evaluation guides: Google's Search Quality Evaluator Guidelines and Yandex's Guide for Assessors (Руководство для Ассесора). The research team consisted of 15 people: SEO specialists, PPC specialists, programmers, and Internet marketing specialists. The ratings given by these users were used to analyze the quality of both the individual search engines (Google, Bing, Yandex) and the aggregator, which uses data from those search engines to build its own results. Queries were sent to the search engines in three languages and had the character of queries about the actual (factual) state of affairs.

The summary presents conclusions drawn from the results obtained. On their basis it can be concluded that using an aggregator of query results for queries about the current state of affairs is justified and can improve the quality of search results, increasing user satisfaction.

Table of contents

Introduction
Review of the state of knowledge in query result aggregation
Standard meta-search
Advanced meta-search
The Nigma meta-search engine
Duckduckgo.com
Summary
Areas of application of query result aggregation
Search results (SERP)
Snippet
Search query
Organic results
Types of queries
Examples of tools that use search result aggregation
Ahrefs
Webpozycja
Serp.watch
Senuto
Google Search Console
Summary
Conceptual phase
Definitions of the basic concepts for evaluating the quality of an information retrieval system
Precision
Recall
Fall-out
F-measure (Van Rijsbergen's measure)
Discounted Cumulative Gain
Normalized Discounted Cumulative Gain
Aim of the thesis
Concept of the aggregation implementation
Description of the aggregation algorithm
Ranking algorithm
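The abstract and the table of contents name DCG and nDCG as the measures used to score rated result lists. As a rough illustration only, the sketch below computes both with the commonly used log2(rank + 1) discount. This excerpt does not show which variant of the formula the thesis uses, so the discount, the function names, and the sample grades are all assumptions.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain using the common log2(rank + 1) discount.

    `relevances` is a list of rater grades ordered by result position
    (position 1 first). The discount choice is an assumption.
    """
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Hypothetical rater grades (0-3) for the top five results of one query.
grades = [3, 2, 3, 0, 1]
print(round(ndcg(grades), 3))
```

A list that is already in the best possible order scores 1.0, so nDCG values for different engines (or for an aggregator) can be compared on the same 0 to 1 scale even when the raw grades differ per query.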