
INFO310 0 Advanced Topics in Model-Based Information Systems
Candidate 102
Essay exam created by Andreas Lothe Opdahl. Start: 03.11.2016 14:00. End: 09.11.2016 14:00. Submitted: 09.11.2016 12:42:35.

SEMANTIC TECHNOLOGIES IN SEARCH ENGINES: GOOGLE AND COMPETITORS

MARIO MARTINEZ REQUENA
[email protected]
Student number: 248948

Index

1. Introduction
2. Semantic search in Google
   1. How Google search engine works
   2. Knowledge Graph
   3. Knowledge Vault
   4. Google Hummingbird
   5. Minor semantic patents
      1. Identification of semantic units from within a search query
      2. Inferring User Interests
3. Competitors
   1. Kngine
   2. Wolfram Alpha
   3. Comparative Study
4. Conclusion and future directions
5. Personal opinion and difficulties through this work
6. References

1. Introduction

We are all living in the information age. Digital components are everywhere, from our cars to our health trackers, and we use our smartphones as if they were a part of us. Smartphones have become the primary way in which humans interact with the digital world: the number of “traditional computers” was surpassed by the number of smartphones in 2011 [1]. This is the first thing to understand about the new era of human-to-machine interaction.
Smartphones can now be seen as a part of the “new human being”, so the relationship between humans and machines needs to become more user friendly and more organic. Applied to search engines, which nowadays act as a kind of access point to the collective memory, this means that they need a “questions and answers” dynamic, a “human touch”. This is achieved in part by introducing elements of semantic search into traditional search engines.

Semantic search, according to the definition provided by Wikipedia [2], seeks to improve search accuracy by analysing the context and intent of the user. Both of these concepts are important because they can radically change what the correct answer to the same search question is. A traditional search engine would not even notice the difference. This is why the world's leading search companies are introducing semantic search into their powerful engines, and this paper discusses why and how.

2. Semantic search in Google

According to this ranking [3], Google is by far the leading search engine on the internet, which makes it the first subject of analysis. The Google search engine has been upgraded over the years. During its 18 years, the algorithm behind the engine has been changed many times, and big changes are announced publicly by Google.

One of the first big semantic changes that Google introduced was the Knowledge Graph, on May 16, 2012, which aims to give users more “environmental information” and entity recognition for the searches they perform [4].

Apart from the Knowledge Graph, Google has made changes to the engine itself.
The latest ones have been Google Caffeine, designed to return results faster by changing the way the crawlers index pages; Google Panda, which aimed to display higher-quality sites first; Google Penguin, which corrects errors from the Panda update and penalises sites that artificially increase the rank of their pages; and, finally, Google Hummingbird, announced on September 26, 2013, as the biggest change to the algorithm since 2001. Beyond the synonym handling that was already included, this upgrade aims to understand the context and intent of the user. In other words, it introduced semantic search into the algorithm.

Even if Google Hummingbird is probably one of the biggest semantic changes to the search engine, semantics have been part of Google for a long time. The earlier changes are not as big as Hummingbird, but they all help to create a more semantic Google.

1. How Google search engine works

The whole process behind a Google search cannot be displayed as a single line, because half of the process runs constantly. This is the crawling and indexing process: Google sends crawlers, called Googlebots, to surf the internet. They go through the web by following links from page to page. Apart from traditional links, they also crawl through books, maps, Wikipedia, the CIA World Factbook, etc. It is a continuous process, and because of that, sites that are updated frequently are crawled more often. A copy of each page is stored in a gigantic index, together with some data about it. This index also contains images.

From the search perspective, a user performs a search query. Google analyses, corrects and tries to understand the string of characters, voice command or image. This is the part that Hummingbird upgraded. Then, based on this analysis, it pulls pages from its index and ranks them based on more than 200 internal parameters. These parameters are mostly kept secret.
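The crawl, index and rank steps described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the in-memory `TOY_WEB`, the `.example` URLs and the scoring by simple term counts are hypothetical stand-ins, since Google's real ranking parameters are not public and are certainly far more elaborate.

```python
from collections import defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
TOY_WEB = {
    "a.example": ("semantic search improves search accuracy", ["b.example", "c.example"]),
    "b.example": ("google knowledge graph stores entities", ["c.example"]),
    "c.example": ("search engines rank pages from an index", ["a.example"]),
    "d.example": ("unreachable page about search", []),  # no page links here
}


def crawl(seed, web):
    """Follow links breadth-first from the seed and return the pages reached."""
    seen, queue, pages = {seed}, deque([seed]), {}
    while queue:
        url = queue.popleft()
        text, links = web[url]
        pages[url] = text  # keep a copy of the page, as the index does
        for link in links:
            if link in web and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


def build_index(pages):
    """Build an inverted index: term -> {url: number of occurrences}."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for term in text.lower().split():
            index[term][url] = index[term].get(url, 0) + 1
    return index


def search(query, index):
    """Rank pages by summed term counts (assumed toy signal); ties broken by URL."""
    scores = defaultdict(int)
    for term in query.lower().split():
        for url, count in index.get(term, {}).items():
            scores[url] += count
    return sorted(scores, key=lambda u: (-scores[u], u))


pages = crawl("a.example", TOY_WEB)
index = build_index(pages)
results = search("semantic search", index)
print(results)
```

In this sketch a page that nothing links to (`d.example`) never enters the index, which mirrors the point that crawlers discover pages by following links. A single term-count score stands in for the many ranking parameters mentioned in the text.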
These filters include, among others, the quality and freshness of the pages and the number of users who visit them. This is where SEO experts work, trying to perfect the details that make Google consider a page “good quality” in order to move it higher up the list. After this ranking, Google picks the relevant pieces of each page to show according to the search and builds the results page itself.

2. Knowledge Graph

One of Google's main statements is the following: “Google’s mission is to organize the world’s information and make it universally accessible and useful.” The introduction of the Knowledge Graph follows from this statement. It is not a remarkable change to the search algorithm, but it is one of the first big steps that Google has taken towards semantic technologies in its search engine.

The Knowledge Graph is a knowledge base that contains information about entities and the relationships between them. It extracts information from text in Wikipedia, Wikidata and the CIA World Factbook. Basically, it does not process the subject of your search as a string of characters to be found in a database; it treats your query as an entity, a real-world object or character, which, as an actual object, is related to other entities.

Entities can be classified by the way in which they are obtained:

Explicit entities: These entities are extracted directly, with semantic web technologies, from the structured mark-up of a web page.
Implicit entities: These entities are referred to in, or derived from, the text on a page. Natural language processing algorithms are used to get the entities out of the text.

This type of knowledge graph has been used by other companies in different fields:

Bing is the second search engine and works very similarly to Google; in 2013 Microsoft announced the Satori Knowledge Base, with almost no information about how it works.
Another popular search engine such as Yahoo!