Determining the Remote Client Perceived Response Time from Live Packet Streams, by David P. Olshefski, Jason Nieh, and Erich Nahum

Total Pages: 16

File Type: PDF, Size: 1020 KB

Appears in Proceedings of the Sixth Symposium on Operating Systems Design and Implementation (OSDI 2004)

ksniffer: Determining the Remote Client Perceived Response Time from Live Packet Streams

David P. Olshefski (1,2), Jason Nieh (1), and Erich Nahum (2); (1) Columbia University, (2) IBM T.J. Watson Research; [email protected], [email protected], [email protected]

Abstract

As dependence on the World Wide Web continues to grow, so does the need for businesses to have quantitative measures of the client perceived response times of their Web services. We present ksniffer, a kernel-based traffic monitor capable of determining pageview response times as perceived by remote clients, in real-time at gigabit traffic rates. ksniffer is based on novel, online mechanisms that take a "look once, then drop" approach to packet analysis to reconstruct TCP connections and learn client pageview activity. These mechanisms are designed to operate accurately with live network traffic even in the presence of packet loss and delay, and can be efficiently implemented in kernel space. This enables ksniffer to perform analysis that exceeds the functionality of current traffic analyzers while doing so at high bandwidth rates. ksniffer requires only passive monitoring of network traffic and can be integrated with systems that perform server management to achieve specified response time goals. Our experimental results demonstrate that ksniffer can run on an inexpensive, commodity, Linux-based PC and provide online pageview response time measurements, across a wide range of operating conditions, that are within five percent of the response times measured at the client by detailed instrumentation.

1 Introduction

For many businesses, the World Wide Web is a highly competitive environment. Customers seeking quality online services have choices, and often the characteristic that distinguishes a successful site from the rest is performance. Clients are keenly aware when response time exceeds acceptable thresholds and are not hesitant to simply take their business elsewhere. It is therefore extremely important for businesses to know the response time that their clients are experiencing. This places them in a difficult position: having to obtain accurate client perceived response time metrics in a timely, cost effective manner so that problems can be immediately identified and fixed. For larger Web sites, the requirement of having a scalable solution is key; in addition, the capability to transmit this information to an online cluster management system is also a necessity.

Server farm management systems that allocate resources on-demand to meet specified response time goals are receiving much attention. The ability of a Web hosting center to move CPU cycles, machines, bandwidth and storage from a hosted Web site that is meeting its latency goal to one that is not is a key requirement for an automated management system. Such allocation decisions must be based on accurate measurements. Over-allocating resources to one hosted Web site results in an overcharge to that customer and a reduction in the available physical resources left to meet the needs of the others. Under-allocation results in poor response time and unsatisfied Web site users. The ability to base these allocation decisions on a measure that is relevant to both the Web site owner and the end user of the Web site is a competitive advantage.

Unfortunately, obtaining an accurate measure of the client perceived response time is non-trivial. Current approaches include active probing from geographically distributed monitors, instrumenting HTML Web pages with JavaScript, offline analysis of packet traces, and instrumenting Web servers to measure application-level performance or per-connection performance. All of these approaches fall short, in one area or another, in terms of accuracy, cost, scalability, usefulness of information collected, and real-time availability of measurements.

We have created ksniffer, an online server-side traffic monitor that combines passive packet capture with fast online mechanisms to accurately determine client perceived pageview response times on a per-pageview basis. ksniffer uses a model of TCP retransmission and exponential backoff that accounts for latency due to connection setup overhead and network packet loss. It combines this model with higher-level online mechanisms that use access history and HTTP referer information, when available, to learn relationships among Web objects and to correlate connections and Web objects to determine pageview response times.
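The excerpt describes this retransmission model only in prose. As a rough illustration of why modeling SYN retransmission matters, the following sketch computes the connection-setup delay implied by lost SYN segments, assuming the classic BSD-style initial retransmission timeout of 3 seconds that doubles on each retry; the function name and constants are assumptions for illustration, not the authors' actual formula.

#include <stdio.h>

/*
 * Hedged sketch: estimate the extra connection-setup latency implied
 * by SYN retransmissions, assuming an initial 3 s retransmission
 * timeout that doubles on each retry. The server application never
 * observes this delay, but the remote client experiences it.
 */
static double syn_backoff_delay(int syn_retransmits)
{
    double rto = 3.0;    /* assumed initial SYN RTO in seconds */
    double delay = 0.0;
    for (int i = 0; i < syn_retransmits; i++) {
        delay += rto;    /* client waited a full RTO before resending */
        rto *= 2.0;      /* exponential backoff doubles the timeout */
    }
    return delay;
}

int main(void)
{
    /* Two lost SYNs add roughly 3 + 6 = 9 seconds of perceived latency. */
    printf("%.0f s\n", syn_backoff_delay(2));
    return 0;
}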
ksniffer mechanisms take a "look once, then drop" approach to packet analysis, use simple hashing data structures to match Web objects to pageviews, and can be efficiently implemented in kernel space. Furthermore, ksniffer only looks at TCP/IP and HTTP protocol header information and does not need to parse any HTTP data payload. This enables ksniffer to perform higher-level Web pageview analysis effectively online in the presence of high data rates; it can monitor traffic at gigabit line speeds while running on an inexpensive, commodity PC. These mechanisms enable ksniffer to provide accurate results across a wide range of operating conditions, including high load, connection drops, and packet loss. In these cases, obtaining accurate performance measures is most crucial because Web server and network resources may be overloaded.

ksniffer has several advantages over other approaches. First, ksniffer does not require any modifications to Web pages, Web servers, or browsers, making deployment easier and faster. This is particularly important for Web hosting companies that are responsible for maintaining the infrastructure surrounding a Web site but are often not permitted to modify the customer's server machines or content. Second, ksniffer captures network characteristics such as packet loss and delay, aiding in distinguishing network problems from server problems. Third, ksniffer measures the behavior of every session for every real client who visits the Web site. Therefore, it does not fall prey to biases that arise when sampling from a select, predefined set of client monitoring machines that have better connectivity, and use different Web browser software, than the actual users of the Web site. Fourth, ksniffer can obtain metrics for any Web content, not just HTML. Fifth, ksniffer performs online analysis of high-bandwidth, live packet traffic instead of offline analysis of traces stored on disk, bypassing the need to manage large amounts of disk storage for packet traces. More importantly, ksniffer can provide performance measurements to Web servers in real-time, enabling them to respond immediately to performance problems through diagnosis and resource management.

This paper presents the design and implementation of ksniffer. Section 2 presents an overview of the ksniffer architecture. Section 3 describes the ksniffer algorithms for reconstructing TCP connections and pageview activities. Section 4 discusses how ksniffer handles less ideal operating conditions, such as packet loss and server overload. Section 5 presents experimental results quantifying the accuracy and scalability of ksniffer under various operating conditions; we measure the accuracy of ksniffer against measurements obtained at the client and compare the scalability of ksniffer against user-space packet analysis systems. Section 6 discusses related work. Finally, ...

... as well as a platform for research into traffic analysis. Figure 1 depicts the ksniffer architecture.

[Figure 1: ksniffer architecture. The figure shows the ksniffer machine, with packets flowing from the device drivers through the device-independent layer into ksniffer's (a) TCP filtering and (b) protocol analysis modules alongside the kernel's IP stack; results reach local analysis in user space via an mmap'd socket interface, and a packet stream feeds remote analysis.]

ksniffer is designed to be implemented as a set of dynamically loadable kernel modules that reside above the network device-independent layer in the operating system. Its device independence makes it easy to deploy on any inexpensive, commodity PC without special NIC hardware or device driver modifications.
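The paper gives no source code for this layering. On Linux, the standard hook for receiving packets above the device-independent layer is dev_add_pack(), so a minimal sketch of such a module tap, for a modern kernel, could look like the following; this illustrates only the registration mechanism, not ksniffer's actual implementation, and the sniff_* names are invented.

#include <linux/module.h>
#include <linux/netdevice.h>
#include <linux/skbuff.h>
#include <linux/if_ether.h>

/* Hedged sketch: a loadable module that sits above the device-
 * independent layer as "just another protocol", in the spirit of
 * Figure 1. */

static int sniff_rcv(struct sk_buff *skb, struct net_device *dev,
                     struct packet_type *pt, struct net_device *orig_dev)
{
    /* Look once: examine TCP/IP and HTTP headers here ... */
    kfree_skb(skb);             /* ... then drop: release our reference */
    return 0;
}

static struct packet_type sniff_pt = {
    .type = htons(ETH_P_ALL),   /* see every protocol, both directions */
    .func = sniff_rcv,
};

static int __init sniff_init(void)
{
    dev_add_pack(&sniff_pt);    /* register as a protocol layer */
    return 0;
}

static void __exit sniff_exit(void)
{
    dev_remove_pack(&sniff_pt);
}

module_init(sniff_init);
module_exit(sniff_exit);
MODULE_LICENSE("GPL");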
ksniffer appears to the kernel simply as another network protocol layer within the stack and is treated no differently than TCP/IP, which is shown for comparison in Figure 1. ksniffer monitors bidirectional traffic and looks at each packet once, extracts any TCP/IP or HTTP header information that is present, then discards the packet. The in-kernel implementation exploits several performance advantages such as zero-copy buffer management, eliminated system calls, and reduced context switches [16, 17]. ksniffer does not produce packet trace log files, but can read configuration parameters and write debugging information to disk from kernel space.

This design gives ksniffer a three- to four-fold improvement in performance over user-space systems that copy every packet to user space. Each packet could potentially impact the response time measurement, yet ksniffer only examines a small percentage of the bytes within each packet (the TCP/IP fields and the HTTP headers, if present). By executing in kernel space, ksniffer avoids transferring large amounts of irrelevant bytes to user space, saving CPU cycles and memory bandwidth.

ksniffer provides a low-overhead shared memory interface (similar to MAGNET [13]) to export results (not packets) to user space. This allows more sophisticated analysis that is less performance critical ...
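The shared-memory export path described above, fixed-size result records handed to user space rather than per-packet copies, can be sketched as a simple single-producer ring. Everything below (the record fields, ring size, and function names) is a hypothetical illustration under that assumption, not the paper's actual interface.

#include <stdint.h>
#include <stdio.h>

/* Hypothetical pageview-result record; field names are illustrative.
 * Records, not packets, cross the kernel/user boundary, so each
 * pageview costs a few dozen bytes instead of its full packet stream. */
struct pv_record {
    uint32_t client_ip;      /* remote client address */
    uint32_t url_hash;       /* hash of the container page URL */
    uint64_t start_usec;     /* timestamp of the initial SYN */
    uint64_t end_usec;       /* last byte of the last embedded object */
    uint16_t num_objects;    /* objects correlated to this pageview */
    uint16_t flags;          /* e.g. packet loss observed */
};

#define RING_SLOTS 4096u     /* power of two so masking replaces modulo */

struct pv_ring {
    volatile uint32_t head;  /* advanced by the in-kernel producer */
    volatile uint32_t tail;  /* advanced by the user-space consumer */
    struct pv_record slot[RING_SLOTS];
};

/* Producer side: never blocks; if the consumer lags, drop the result.
 * A real kernel/user ring would also need memory barriers here. */
static int ring_push(struct pv_ring *r, const struct pv_record *rec)
{
    uint32_t h = r->head;
    if (h - r->tail == RING_SLOTS)
        return -1;                         /* ring full */
    r->slot[h & (RING_SLOTS - 1u)] = *rec;
    r->head = h + 1;                       /* publish after the copy */
    return 0;
}

int main(void)
{
    static struct pv_ring ring;
    struct pv_record rec = { .num_objects = 7, .url_hash = 0xBEEF };
    ring_push(&ring, &rec);
    printf("records queued: %u\n", (unsigned)(ring.head - ring.tail));
    return 0;
}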
Recommended publications
  • Causal Inference in the Presence of Interference in Sponsored Search Advertising
Causal Inference in the Presence of Interference in Sponsored Search Advertising. Razieh Nabi∗ (Johns Hopkins University, [email protected]), Joel Pfeiffer (Microsoft Bing Ads, [email protected]), Murat Ali Bayir (Microsoft Bing Ads, [email protected]), Denis Charles (Microsoft Bing Ads, [email protected]), Emre Kıcıman (Microsoft Research, [email protected]). Abstract: In classical causal inference, inferring cause-effect relations from data relies on the assumption that units are independent and identically distributed. This assumption is violated in settings where units are related through a network of dependencies. An example of such a setting is ad placement in sponsored search advertising, where the clickability of a particular ad is potentially influenced by where it is placed and where other ads are placed on the search result page. In such scenarios, confounding arises due to not only the individual ad-level covariates but also the placements and covariates of other ads in the system. In this paper, we leverage the language of causal inference in the presence of interference to model interactions among the ads. Quantification of such interactions allows us to better understand the click behavior of users, which in turn impacts the revenue of the host search engine and enhances user satisfaction. We illustrate the utility of our formalization through experiments carried out on the ad placement system of the Bing search engine. 1 Introduction: In recent years, advertisers have increasingly shifted their ad expenditures online. One of the most effective platforms for online advertising is search engine result pages. Given a user query, the search engine allocates a few ad slots (e.g., above or below its organic search results) and runs an auction among advertisers who are bidding and competing for these slots.
  • Understanding Attention Metrics
Understanding Attention Metrics. 01 Introduction: For years and years we've been told that if the click isn't on its deathbed, it should be shivved to death immediately. While early display ads in the mid-90s could boast CTRs above 40%, the likely number you'll see attached to a campaign these days is below 0.1%. But for some reason, even in 2015, click-through rate remains central to digital ad measurement even though its importance is widely ridiculed. Reliance on the click means delivering unfathomable amounts of ad impressions to users, which has hyper-inflated the value of the pageview. This has led to a broken advertising system that rewards quantity over quality. Instead of building audiences, digital publishers are chasing traffic, trying to lure users to sites via "clickbait" headlines where the user is assaulted with an array of intrusive ad units. The end effect is overwhelming, ineffective ads adjacent to increasingly shoddy content. However, the rise of digital video and viewability has injected linear TV measurement capabilities into the digital space. By including time in its calculation, viewability has opened a conversation about the value of user attention. It has introduced metrics with the potential to assist premium publishers in building attractive audiences; better quantify exposure for advertisers; and ultimately financially reward publishers with more engaged audiences. This is a new and fast-developing area, but one where a little education can go a long way. In addition to describing current attention metrics, this playbook will dive into what you should look for in an advanced metrics partner and how to use these metrics in selling inventory and optimizing campaign performance.
  • WTF IS NATIVE ADVERTISING? Introduction
WTF IS NATIVE ADVERTISING? Table of Contents: Introduction; Nomenclature; Decision Tree; Issues Still Plaguing Native Advertising (Scale, Metrics, Labelling, The Church/State Divide, Credibility); WTF is Native Programmatic?; The Native Ad Triumvirate (The Ad Man, The Publisher, The Platform); Glossary. Introduction: In 2013, native advertising galloped onto the scene like a masked hero, poised to hoist publishers atop a white horse, rescuing them from the twin menaces of programmatic advertising and sagging CPMs. But who's really there when you peel back the mask? Native advertising is a murky business. Ad executives may not consider it advertising. Editorial departments certainly don't consider it editorial. Even among its practitioners there is debate: is it a format or is it a function? Publishers who have invested in the studio model position native advertising as the perfect storm of context, creative capital and digital strategy. For platforms, it may be the same old banner advertising refitted for the social stream. Digiday created the WTF series to parse murky digital marketing concepts just like these. WTF is Native Advertising? Keep reading to find out. Nomenclature: Native advertising: an advertising message designed to mimic the form and function of its environment. Branded content: content created to promote a brand's products or values; it can take a variety of formats, not all of them technically "native," and branded content placed on third-party publishing sites or platforms can be considered native advertising. Content marketing: any marketing messages that do not fit within traditional formats like TV and radio spots, print ads or banner messaging. Content-recommendation widgets: another form of native advertising often used by publishers; these appear to consumers most often at the bottom of a web page with lines like "From around the web" or "You may also like."
  • Google Analytics Education and Skills for the Professional Advertiser
AN INDUSTRY GUIDE TO Google Analytics: Education and Skills for the Professional Advertiser. MODULE 5. American Advertising Federation, The Unifying Voice of Advertising. Google Analytics is a freemium service that records what is happening at a website. Tracking the actions of a website is the essence of Google Analytics. Google Analytics delivers information on who is visiting a website, what visitors are doing on that website, and what actions are carried out at the specific URL. Information is the essence of informed decision-making. The website traffic activity data of Google Analytics delivers a full understanding of customer/visitor journeys. Google Analytics instruction focuses on four (4) basic tenets: Audience (the people who visit a website), Acquisition, Behavior, and Conversions. CONTENTS (Module 5, Google Analytics): Learning Objectives; Key Definitions; Audience Tracking; The A,B,C Cycle (Acquisition, Behavior, Conversions); Strategy: Attribution Model; Case History; Review.
  • Auditing Wikipedia's Hyperlinks Network on Polarizing Topics
Auditing Wikipedia’s Hyperlinks Network on Polarizing Topics. Cristina Menghini (DIAG, Sapienza University, Rome, Italy, [email protected]), Aris Anagnostopoulos (DIAG, Sapienza University, Rome, Italy, [email protected]), Eli Upfal (Dept. of Computer Science, Brown University, Providence, RI, USA, [email protected]). ABSTRACT: People eager to learn about a topic can access Wikipedia to form a preliminary opinion. Despite the solid revision process behind the encyclopedia’s articles, the users’ exploration process is still influenced by the hyperlinks’ network. In this paper, we shed light on this overlooked phenomenon by investigating how articles describing complementary subjects of a topic interconnect, and thus may shape readers’ exposure to diverging content. To quantify this, we introduce the exposure to diverse information, a metric that captures how users’ exposure to multiple subjects of a topic varies click-after-click by leveraging navigation models. For the experiments, we collected six topic-induced networks about polarizing topics and analyzed the extent to which their ... readers to deepen their understanding of a topic by conveniently accessing other articles." Consequently, while reading an article, users are directly exposed to its content and indirectly exposed to the content of the pages it points to. Wikipedia’s pages are the result of collaborative efforts of a committed community that, following policies and guidelines [4, 20], generates and maintains up-to-date and high-quality content [28, 40]. Even though tools support the community for curating pages and adding links, it lacks a systematic way to contextualize the pages within the more general articles’ network. Indeed, it is important to stress that having access to high-quality pages does not imply a comprehensive exposure to an argument, especially for a broader or polarizing topic.
  • Chapter 3. Event Expressions: Filtering the Event Stream
Chapter 3. Event Expressions: Filtering the Event Stream. As we discussed in Chapter 1, the Live Web uses a data model that is the dual of the data model that has prevailed on the static Web. The static Web is built around interactive Web sites that leverage pools of static data held in databases. Interactive Web sites are built with languages like PHP and Ruby on Rails that are designed for taking user interaction and formulating SQL queries against databases. The static Web works by making dynamic queries against static data. In contrast, the Live Web is based on static queries against dynamic data. Streams of real-time data are becoming more and more common online. Years ago, such data streams were just a trickle, but the advent of technologies like RSS and Atom as well as the appearance of services like Twitter and Facebook has turned this trickle into a raging torrent. And this is just the beginning. The only way that we can hope to make use of all this data is if we can filter it and automate our responses to it as much as possible. As we'll discuss in detail in Chapter 4, the task is greater than mere filtering; the task is correlating these events contextually. Correlating events provides power well beyond merely using filters to tame the data torrent. This chapter introduces in detail the pattern language, called "event expressions," that we will use to match against streams of events. Event expressions are an important way that we match events with context.
  • The New World of Content Measurement Why Existing Metrics Are Flawed (And How to Fix Them)
The New World of Content Measurement: Why existing metrics are flawed (and how to fix them). Copyright © 2014 Contently. All rights reserved. contently.com. [Cover art: Carta Marina, Olaus Magnus] "Death to the pageview." (Paul Fredrich, VP of Product, Contently) Editor's Note: At Contently, three things have been a constant topic of discussion over the past six months: basketball, cult comedies from the early 1990s, and the disastrous state of content measurement in the marketing and media industries. Simply put, everyone was using flawed, superficial metrics to measure success, like a doctor using x-ray glasses instead of an x-ray machine. Even as forward-thinking content analytics platforms like Chartbeat began to make headway towards measuring content in more sophisticated ways, we saw brand publishers we worked with getting left out. That's because brands face a fundamentally different set of content measurement challenges than traditional publishing companies. In pursuit of a solution, we did a lot of research and hard thinking, and even built an analytics tool of our own. This report is the result of our findings. We hope you find it useful, and that it empowers you to firmly tie your content to business results. (Joe Lazauskas, Contently Editor in Chief) Table of Contents: I. Introduction; II. History; III. The Present Day; IV. Flawed Proxy Metrics; V. The Key to the Content Marketing Kingdom; VI. The Future. You're measuring the success of your content all wrong.
  • Lessons Learned Globally: Tobacco Control Digital Media Campaigns
Lessons Learned Globally: Tobacco Control Digital Media Campaigns. Suggested Citation: Gutierrez K, Newcombe R. Lessons Learned Globally: Tobacco Control Digital Media Campaigns. Saint Paul, Minnesota, United States: Global Dialogue for Effective Stop-Smoking Campaigns; 2012. Listings of Case Studies: Two lists of case studies are provided for more convenient review based on readers’ interests. The Table of Contents presents the case studies section in alphabetical order by country, then province or state (if applicable), and then in chronological order. It also lists the other parts of the document, such as the Executive Summary, Methodology, Lessons Learned, etc. The List of Campaigns by Goal, found after the Table of Contents, focuses only on the case studies, organizing them by the main tobacco control goal of each campaign. Campaigns are grouped according to whether they sought to: 1. Prevent initiation of tobacco use; 2. Reduce tobacco use via assisting tobacco users in quitting; 3. Reduce exposure to secondhand smoke. The List of Campaigns by Goal may be helpful to those who are seeking insights from campaigns that have a common tobacco control goal. On the following page, there is a grid detailing the digital elements used in each campaign, which may be helpful to those who are seeking information about effective use of particular digital vehicles/approaches. Table of Contents: List of Campaigns by Goal; Grid of Digital Elements Used in Each Campaign; I. Executive Summary; II. Introduction (Acknowledgements; Purpose of This Document; Methods; Limitations; Additional Campaign Information); III. Key Lessons Learned; IV. ...
  • Facebook Hotel Playbook CONTENTS
Facebook Hotel Playbook. CONTENTS: 01. Landscape and Our Recommended Approach; 02. Travel and Hotel Insights; 03. Hotel Strategy on Facebook; 04. Drive Hotel Demand (a. Dynamic Retargeting, b. Standard Retargeting, c. Dynamic Prospecting, d. Boost Programs); 05. Build Brand Confidence (a. Awareness, b. Inspiration, c. Loyalty Acquisition and Retention); 06. Media Buying Efficiencies; 07. Key Hotel Considerations; 08. Global Hotel Insights and Considerations; 09. Creative Tips for Recovery; 10. Creative Best Practices and Dynamic Creative; 11. General Measurement Guidance; 12. Nimble Testing Ideas; 13. Optimization and Signals; 14. Signals for Enhanced Measurement; 15. Other Considerations (a. Influencer Branded Content Marketing, b. Messenger, c. Facebook Marketing Partners Can Help); 16. Additional Resources; 17. Food and Beverage Playbook for Hotels. 01. Landscape and Our Recommended Approach: The COVID-19 outbreak has affected businesses around the world. Employee health, work patterns, production and consumer habits have all been affected. The travel industry has been hit especially hard, with an estimated $2.1T loss to the industry globally.1 And the hospitality industry has similarly been impacted as consumers change their habits. And with 70% of people surveyed globally saying they are "now spending more time on their smartphones," hoteliers must continue to adopt mobile-first strategies.3 With this continued shift of consumer behavior comes an opportunity to redefine media strategies and focus on driving new demand for hotels, regaining consumer trust and fortifying customer loyalty.
  • Download Legal Document
Case 1:15-cv-00662-TSE, Document 168-6, Filed 12/18/18. Wikimedia Foundation v. NSA, No. 15-cv-0062-TSE (D. Md.), Plaintiff’s Exhibit 2. IN THE UNITED STATES DISTRICT COURT FOR THE DISTRICT OF MARYLAND: WIKIMEDIA FOUNDATION, Plaintiff, v. NATIONAL SECURITY AGENCY / CENTRAL SECURITY SERVICE, et al., Defendants. No. 1:15-cv-00662-TSE. DECLARATION OF JONATHON PENNEY. Table of Contents: I. Qualifications; II. Assignment; III. Summary of Opinions; IV. Overview of Chilling Effects Theory & Supporting Empirical Evidence for Surveillance-Related Chill Online; V. Wikipedia Study: Design, Method, and Results; A. Design; i. Intervention Selection; ii. Data Selection ...
  • Modeling and Analysis the Web Structure Using Stochastic Timed Petri Nets
JOURNAL OF SOFTWARE, VOL. 3, NO. 8, NOVEMBER 2008. Modeling and Analysis the Web Structure Using Stochastic Timed Petri Nets. Po-Zung Chen, Chu-Hao Sun (Department of Computer Science and Information Engineering, Tamkang University, Taipei County, Taiwan; Email: [email protected], [email protected]); Shih-Yang Yang (Department of Information Management, Kang-Ning Junior College of Medical Care and Management, Taipei, Taiwan; Email: [email protected]). Abstract: Precise analysis of the Web structure can facilitate data pre-processing and enhance the accuracy of the mining results in the procedure of Web usage mining. STPN (Stochastic Timed Petri Nets) is a high-level graphical model widely used in modeling system activities with concurrency. STPN can save the analyzed results in an incidence matrix for future follow-up analyses, and some already-verified properties held by STPN, such as reachability, can also be used to solve some unsettled problems in the model. In the present study, we put forth the use of STPN as the Web structure model. We adopt Place in the STPN model to represent webpage on the ... profiles left by all the browser users. As the profiles are only the sequentially recorded contents of the services provided by the Web server, the profiles not only could contain multiple browsing profiles of different browser users but also could take in some extra or erroneous profiles. The website management personnel must proceed with preprocessing of these usage profiles if they are to correctly analyze said users’ webpage-contents usage sequence. Hence, data preprocessing is needed to enhance information processing before we can analyze the usage profiles.
  • Wikibrain: Democratizing Computation on Wikipedia
WikiBrain: Democratizing computation on Wikipedia. Shilad Sen (Macalester College, St. Paul, Minnesota, [email protected]), Toby Jia-Jun Li (University of Minnesota, Minneapolis, Minnesota, [email protected]), Brent Hecht (University of Minnesota, Minneapolis, Minnesota, [email protected]), and the WikiBrain Team∗: Matt Lesicko, Ari Weiland, Rebecca Gold, Yulun Li, Benjamin Hillmann (Macalester College, St. Paul, Minnesota). ABSTRACT: Wikipedia is known for serving humans' informational needs. Over the past decade, the encyclopedic knowledge encoded in Wikipedia has also powerfully served computer systems. Leading algorithms in artificial intelligence, natural language processing, data mining, geographic information science, and many other fields analyze the text and structure of articles to build computational models of the world. Many software packages extract knowledge from Wikipedia. However, existing tools either (1) provide Wikipedia data, but not well-known Wikipedia-based algorithms, or (2) narrowly focus on one such algorithm. This paper presents the WikiBrain software framework, an extensible Java-based platform that democratizes access to a range of Wikipedia-based algorithms and technologies. ... has raced towards this goal, growing in articles, revisions, and gigabytes by a factor of 20.1 The planet has taken notice. Humans turn to Wikipedia's 31 million articles more than any other online reference. The online encyclopedia's 287 languages attract a global audience that makes it the sixth most visited site on the web.2 However, outside the public eye, computers have also come to rely on Wikipedia. Algorithms across many fields extract meaning from the encyclopedia by analyzing articles' text, links, titles, and category structure. Search engines such as Google and Bing respond to user queries by extracting structured knowledge from Wikipedia.3 Leading semantic relatedness (SR) algorithms estimate the strength of association between words by mining Wikipedia's text, link structure ...