Credibility in Web Search Engines Dirk Lewandowski Hamburg University of Applied Sciences, Germany
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Web Content Analysis
Web Content Analysis Workshop in Advanced Techniques for Political Communication Research: Web Content Analysis Session Professor Rachel Gibson, University of Manchester Aims of the Session • Defining and understanding the Web as on object of study & challenges to analysing web content • Defining the shift from Web 1.0 to Web 2.0 or ‘social media’ • Approaches to analysing web content, particularly in relation to election campaign research and party homepages. • Beyond websites? Examples of how web 2.0 campaigns are being studied – key research questions and methodologies. The Web as an object of study • Inter-linkage • Non-linearity • Interactivity • Multi-media • Global reach • Ephemerality Mitra, A and Cohen, E. ‘Analyzing the Web: Directions and Challenges’ in Jones, S. (ed) Doing Internet Research 1999 Sage From Web 1.0 to Web 2.0 Tim Berners-Lee “Web 1.0 was all about connecting people. It was an interactive space, and I think Web 2.0 is of course a piece of jargon, nobody even knows what it means. If Web 2.0 for you is blogs and wikis, then that is people to people. But that was what the Web was supposed to be all along. And in fact you know, this ‘Web 2.0’, it means using the standards which have been produced by all these people working on Web 1.0,” (from Anderson, 2007: 5) Web 1.0 The “read only” web Web 2.0 – ‘What is Web 2.0’ Tim O’Reilly (2005) “The “read/write” web Web 2.0 • Technological definition – Web as platform, supplanting desktop pc – Inter-operability , through browser access a wide range of programmes that fulfil a variety of online tasks – i.e. -
A Study on Vertical and Broad-Based Search Engines
International Journal of Latest Trends in Engineering and Technology IJLTET Special Issue- ICRACSC-2016 , pp.087-093 e-ISSN: 2278-621X A STUDY ON VERTICAL AND BROAD-BASED SEARCH ENGINES M.Swathi1 and M.Swetha2 Abstract-Vertical search engines or Domain-specific search engines[1][2] are becoming increasingly popular because they offer increased accuracy and extra features not possible with general, Broad-based search engines or Web-wide search engines. The paper focuses on the survey of domain specific search engine which is becoming more popular as compared to Web- Wide Search Engines as they are difficult to maintain and time consuming .It is also difficult to provide appropriate documents to represent the target data. We also listed various vertical search engines and Broad-based search engines. Index terms: Domain specific search, vertical search engines, broad based search engines. I. INTRODUCTION The Web has become a very rich source of information for almost any field, ranging from music to histories, from sports to movies, from science to culture, and many more. However, it has become increasingly difficult to search for desired information on the Web. Users are facing the problem of information overload , in which a search on a general-purpose search engine such as Google (www.google.com) results in thousands of hits.Because a user cannot specify a search domain (e.g. medicine, music), a search query may bring up Web pages both within and outside the desired domain. Example 1: A user searching for “cancer” may get Web pages related to the disease as well as those related to the Zodiac sign. -
Legality and Ethics of Web Scraping
Murray State's Digital Commons Faculty & Staff Research and Creative Activity 12-15-2020 Tutorial: Legality and Ethics of Web Scraping Vlad Krotov Leigh Johnson Murray State University Leiser Silva University of Houston Follow this and additional works at: https://digitalcommons.murraystate.edu/faculty Recommended Citation Krotov, V., Johnson, L., & Silva, L. (2020). Tutorial: Legality and Ethics of Web Scraping. Communications of the Association for Information Systems, 47, pp-pp. https://doi.org/10.17705/1CAIS.04724 This Peer Reviewed/Refereed Publication is brought to you for free and open access by Murray State's Digital Commons. It has been accepted for inclusion in Faculty & Staff Research and Creative Activity by an authorized administrator of Murray State's Digital Commons. For more information, please contact [email protected]. See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/343555462 Legality and Ethics of Web Scraping, Communications of the Association for Information Systems (forthcoming) Article in Communications of the Association for Information Systems · August 2020 CITATIONS READS 0 388 3 authors, including: Vlad Krotov Murray State University 42 PUBLICATIONS 374 CITATIONS SEE PROFILE Some of the authors of this publication are also working on these related projects: Addressing barriers to big data View project Web Scraping Framework: An Integrated Approach to Retrieving Big Qualitative Data from the Web View project All content following this -
Digital Citizenship Education
Internet Research Ethics: Digital Citizenship Education Internet Research Ethics: Digital Citizenship Education by Tohid Moradi Sheykhjan Ph.D. Scholar in Education University of Kerala Paper Presented at Seminar on New Perspectives in Research Organized by Department of Education, University of Kerala Venue Seminar Hall, Department of Education, University of Kerala, Thycaud, Thiruvananthapuram, Kerala, India 17th - 18th November, 2017 0 Internet Research Ethics: Digital Citizenship Education Internet Research Ethics: Digital Citizenship Education Tohid Moradi Sheykhjan PhD. Scholar in Education, University of Kerala Email: [email protected] Abstract Our goal for this paper discusses the main research ethical concerns that arise in internet research and reviews existing research ethical guidance in relation to its application for educating digital citizens. In recent years we have witnessed a revolution in Information and Communication Technologies (ICTs) that has transformed every field of knowledge. Education has not remained detached from this revolution. Ethical questions in relation to technology encompass a wide range of topics, including privacy, neutrality, the digital divide, cybercrime, and transparency. In a growing digital society, digital citizens recognize and value the rights, responsibilities and opportunities of living, learning and working in an interconnected digital world, and they engage in safe, legal and ethical behaviors. Too often we see technology users misuse and abuse technology because they are unaware of -
Meta Search Engine with an Intelligent Interface for Information Retrieval on Multiple Domains
International Journal of Computer Science, Engineering and Information Technology (IJCSEIT), Vol.1, No.4, October 2011 META SEARCH ENGINE WITH AN INTELLIGENT INTERFACE FOR INFORMATION RETRIEVAL ON MULTIPLE DOMAINS D.Minnie1, S.Srinivasan2 1Department of Computer Science, Madras Christian College, Chennai, India [email protected] 2Department of Computer Science and Engineering, Anna University of Technology Madurai, Madurai, India [email protected] ABSTRACT This paper analyses the features of Web Search Engines, Vertical Search Engines, Meta Search Engines, and proposes a Meta Search Engine for searching and retrieving documents on Multiple Domains in the World Wide Web (WWW). A web search engine searches for information in WWW. A Vertical Search provides the user with results for queries on that domain. Meta Search Engines send the user’s search queries to various search engines and combine the search results. This paper introduces intelligent user interfaces for selecting domain, category and search engines for the proposed Multi-Domain Meta Search Engine. An intelligent User Interface is also designed to get the user query and to send it to appropriate search engines. Few algorithms are designed to combine results from various search engines and also to display the results. KEYWORDS Information Retrieval, Web Search Engine, Vertical Search Engine, Meta Search Engine. 1. INTRODUCTION AND RELATED WORK WWW is a huge repository of information. The complexity of accessing the web data has increased tremendously over the years. There is a need for efficient searching techniques to extract appropriate information from the web, as the users require correct and complex information from the web. A Web Search Engine is a search engine designed to search WWW for information about given search query and returns links to various documents in which the search query’s key words are found. -
Methodologies for Crawler Based Web Surveys
Mike Thelwall Page 1 of 19 Methodologies for Crawler Based Web Surveys Mike Thelwall1 School of Computing, University of Wolverhampton, Wulfruna Street, Wolverhampton WV1 1SB, UK [email protected] Abstract There have been many attempts to study the content of the web, either through human or automatic agents. Five different previously used web survey methodologies are described and analysed, each justifiable in its own right, but a simple experiment is presented that demonstrates concrete differences between them. The concept of crawling the web also bears further inspection, including the scope of the pages to crawl, the method used to access and index each page, and the algorithm for the identification of duplicate pages. The issues involved here will be well-known to many computer scientists but, with the increasing use of crawlers and search engines in other disciplines, they now require a public discussion in the wider research community. This paper concludes that any scientific attempt to crawl the web must make available the parameters under which it is operating so that researchers can, in principle, replicate experiments or be aware of and take into account differences between methodologies. A new hybrid random page selection methodology is also introduced. Keywords Web surveys, search engine, indexing, random walks 1 Introduction The current importance of the web has spawned many attempts to analyse sets of web pages in order to derive general conclusions about its properties and patterns of use (Crowston and Williams, 1997; Hooi-Im et al, 1998; Ingwerson, 1998; Lawrence and Giles, 1998a; Miller and Mather, 1998; Henzinger et al., 1999; Koehler, 1999; Lawrence and Giles, 1999a; Smith, 1999; Snyder and Rosenbaum, 1999; Henzinger et al., 2000; Thelwall, 2000a; Thelwall, 2000b; Thelwall, 2000d). -
DEEP WEB IOTA Report for Tax Administrations
DEEP IOTA Report for Tax Administrations IOTA Report for Tax Administrations – Deep Web DEEP WEB IOTA Report for Tax Administrations Intra-European Organisation of Tax Administrations (IOTA) Budapest 2012 1 IOTA Report for Tax Administrations – Deep Web PREFACE This report on deep Web investigation is the second report from the IOTA “E- Commerce” Task Team of the “Prevention and Detection of VAT Fraud” Area Group. The team started operations in January 2011 in Wroclaw, Poland initially focusing its activities on problems associated with the audit of cloud computing, the report on which was published earlier in 2012. During the Task Teams’ second phase of work the focus has been on deep Web investigation. What can a tax administration possibly gain from the deep Web? For many the deep Web is something of a mystery, something for computer specialists, something they have heard about but do not understand. However, the depth of the Web should not represent a threat as the deep Web offers a very important source of valuable information to tax administrations. If you want to understand, to master and to fully exploit the deep Web, you need to see the opportunities that arise from using information buried deep within the Web, how to work within the environment and what other tax administrations have already achieved. This report is all about understanding, mastering and using the deep Web as the key to virtually all audits, not just those involving E-commerce. It shows what a tax administration can achieve by understanding the deep Web and how to use it to their advantage in every audit. -
Harnessing the Deep Web: Present and Future
Harnessing the Deep Web: Present and Future Jayant Madhavan Loredana Afanasiev Lyublena Antova Alon Halevy Google Inc. Universiteit van Amsterdam Cornell University Google Inc. [email protected] [email protected] [email protected] [email protected] 1. INTRODUCTION pre-compute queries to forms and inserts the resulting pages The Deep Web refers to content hidden behind HTML into a web-search index. These pages are then treated like forms. In order to get to such content, a user has to perform any other page in the index and appear in answers to web- a form submission with valid input values. The name Deep search queries. We have pursued both approaches in our Web arises from the fact that such content was thought to work. In Section 3 we explain our experience with both, be beyond the reach of search engines. The Deep Web is and where each approach provides value. also believed to be the biggest source of structured data on We argue that the value of the virtual integration ap- the Web and hence accessing its contents has been a long proach is in constructing vertical search engines in specific standing challenge in the data management community [1, domains. It is especially useful when it is not enough to 8, 9, 13, 14, 18, 19]. just focus on retrieving data from the underlying sources, Over the past few years, we have built a system that ex- but when users expect a deeper experience with the con- posed content from the Deep Web to web-search users of tent (e.g., making purchases) after they found what they Google.com. -
A History of Social Media
2 A HISTORY OF SOCIAL MEDIA 02_KOZINETS_3E_CH_02.indd 33 25/09/2019 4:32:13 PM CHAPTER OVERVIEW This chapter and the next will explore the history of social media and cultural approaches to its study. For our purposes, the history of social media can be split into three rough-hewn temporal divisions or ages: the Age of Electronic Communications (late 1960s to early 1990s), the Age of Virtual Community (early 1990s to early 2000s), and the Age of Social Media (early 2000s to date). This chapter examines the first two ages. Beginning with the Arpanet and the early years of (mostly corpo- rate) networked communications, our history takes us into the world of private American online services such as CompuServe, Prodigy, and GEnie that rose to prominence in the 1980s. We also explore the internationalization of online services that happened with the publicly- owned European organizations such as Minitel in France. From there, we can see how the growth and richness of participation on the Usenet system helped inspire the work of early ethnographers of the Internet. As well, the Bulletin Board System was another popular and similar form of connection that proved amenable to an ethnographic approach. The research intensified and developed through the 1990s as the next age, the Age of Virtual Community, advanced. Corporate and news sites like Amazon, Netflix, TripAdvisor, and Salon.com all became recogniz- able hosts for peer-to-peer contact and conversation. In business and academia, a growing emphasis on ‘community’ began to hold sway, with the conception of ‘virtual community’ crystallizing the tendency. -
Search Engines and Power: a Politics of Online (Mis-) Information
5/2/2020 Search Engines and Power: A Politics of Online (Mis-) Information Webology, Volume 5, Number 2, June, 2008 Table of Titles & Subject Authors Home Contents Index Index Search Engines and Power: A Politics of Online (Mis-) Information Elad Segev Research Institute for Law, Politics and Justice, Keele University, UK Email: e.segev (at) keele.ac.uk Received March 18, 2008; Accepted June 25, 2008 Abstract Media and communications have always been employed by dominant actors and played a crucial role in framing our knowledge and constructing certain orders. This paper examines the politics of search engines, suggesting that they increasingly become "authoritative" and popular information agents used by individuals, groups and governments to attain their position and shape the information order. Following the short evolution of search engines from small companies to global media corporations that commodify online information and control advertising spaces, this study brings attention to some of their important political, social, cultural and economic implications. This is indicated through their expanding operation and control over private and public informational spaces as well as through the structural bias of the information they attempt to organize. In particular, it is indicated that search engines are highly biased toward commercial and popular US- based content, supporting US-centric priorities and agendas. Consequently, it is suggested that together with their important role in "organizing the world's information" search engines -
Characterizing Search-Engine Traffic to Internet Research Agency Web Properties
Characterizing Search-Engine Traffic to Internet Research Agency Web Properties Alexander Spangher Gireeja Ranade Besmira Nushi Information Sciences Institute, University of California, Berkeley and Microsoft Research University of Southern California Microsoft Research Adam Fourney Eric Horvitz Microsoft Research Microsoft Research ABSTRACT The Russia-based Internet Research Agency (IRA) carried out a broad information campaign in the U.S. before and after the 2016 presidential election. The organization created an expansive set of internet properties: web domains, Facebook pages, and Twitter bots, which received traffic via purchased Facebook ads, tweets, and search engines indexing their domains. In this paper, we focus on IRA activities that received exposure through search engines, by joining data from Facebook and Twitter with logs from the Internet Explorer 11 and Edge browsers and the Bing.com search engine. We find that a substantial volume of Russian content was apolit- ical and emotionally-neutral in nature. Our observations demon- strate that such content gave IRA web-properties considerable ex- posure through search-engines and brought readers to websites hosting inflammatory content and engagement hooks. Our findings show that, like social media, web search also directed traffic to IRA Figure 1: IRA campaign structure. Illustration of the struc- generated web content, and the resultant traffic patterns are distinct ture of IRA sponsored content on the web. Content spread from those of social media. via a combination of paid promotions (Facebook ads), un- ACM Reference Format: paid promotions (tweets and Facebook posts), and search re- Alexander Spangher, Gireeja Ranade, Besmira Nushi, Adam Fourney, ferrals (organic search and recommendations). These path- and Eric Horvitz. -
Evaluating Internet Research Sources
Evaluating Internet Research Sources By Robert Harris Version Date: June 15, 2007 Introduction: The Diversity of Information Information is a Think about the magazine section in your local grocery store. If you reach out Commodity with your eyes closed and grab the first magazine you touch, you are about as Available in likely to get a supermarket tabloid as you are a respected journal (actually more Many Flavors likely, since many respected journals don't fare well in grocery stores). Now imagine that your grocer is so accommodating that he lets anyone in town print up a magazine and put it in the magazine section. Now if you reach out blindly, you might get the Elvis Lives with Aliens Gazette just as easily as Atlantic Monthly or Time. Welcome to the Internet. As I hope my analogy makes clear, there is an extremely wide variety of material on the Internet ranging in its accuracy, reliability, and value. Unlike most traditional information media (books, magazines, organizational documents), no one has to approve the content before it is made public. It's your job as a searcher, then, to evaluate what you locate in order to determine whether it suits your needs. Information Information is everywhere on the Internet, existing in large quantities and Exists on a continuously being created and revised. This information exists in a large Continuum of variety of kinds (facts, opinions, stories, interpretations, statistics) and is Reliability and created for many purposes (to inform, to persuade, to sell, to present a Quality viewpoint, and to create or change an attitude or belief).