Lecture 14: Google's Map/Reduce

EN 600.320/420
Instructor: Randal Burns
13 March 2018

Department of Computer Science, Johns Hopkins University

Overview

 Map/Reduce
– A structured programming model for data parallelism
– Geared toward data-intensive applications
 The Google File System
– Large-scale parallel file system for low $$
– Failures are the normal case
– M/R requires such a global data service
 Open-source re-implementations
– Hadoop: Map/Reduce
– HDFS: Google File System
 What is it?
– Data-parallel, distributed-memory, data-intensive

Map/Reduce Model

 Map
– Process key/value pairs
– Produce intermediate key/value pairs

 Reduce
– Merge intermediate pairs on key value

 Map and reduce were LISP primitives
– Hence the functional style
– Actually write procedural code under functional constraints
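The functional heritage can be seen in Python's built-in `map` and `functools.reduce`, which mirror the LISP primitives; a brief sketch contrasting them with the keyed M/R signatures:

```python
from functools import reduce

# LISP-style primitives: map applies a function elementwise,
# reduce folds a sequence down to a single value.
squares = list(map(lambda x: x * x, [1, 2, 3, 4]))  # [1, 4, 9, 16]
total = reduce(lambda a, b: a + b, squares)         # 30

# The M/R model generalizes these to key/value pairs:
#   map(k1, v1)          -> list of (k2, v2)
#   reduce(k2, [v2, ..]) -> list of v2
# The runtime, not the programmer, groups intermediate values by k2.
```

The "functional constraints" on the procedural code are that map and reduce must be deterministic and side-effect free, which is what lets the runtime re-execute tasks after failures.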

An M/R Schematic

Example: Count all Words

 This is pseudocode
– Need a tokenizer, etc.
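A minimal single-process sketch of the word-count example, with a naive whitespace tokenizer standing in for a real one (the string-valued counts follow the original pseudocode, where all intermediate values are strings); this simulates the framework, it is not Hadoop's actual API:

```python
from collections import defaultdict

def map_fn(doc_name, text):
    # Emit (word, "1") for every token; a real tokenizer would
    # handle punctuation, case folding, etc.
    for word in text.split():
        yield (word, "1")

def reduce_fn(word, counts):
    # Sum the partial counts for one word.
    yield str(sum(int(c) for c in counts))

def run_mapreduce(inputs, map_fn, reduce_fn):
    # Shuffle phase: group intermediate values by key,
    # then hand each key and its value list to reduce.
    groups = defaultdict(list)
    for name, text in inputs:
        for k, v in map_fn(name, text):
            groups[k].append(v)
    return {k: list(reduce_fn(k, vs)) for k, vs in sorted(groups.items())}

result = run_mapreduce([("doc1", "the cat sat"), ("doc2", "the cat ran")],
                       map_fn, reduce_fn)
# result["the"] == ["2"], result["sat"] == ["1"]
```

The shuffle/group step is the part the programmer never writes: the framework guarantees every value for a given key reaches the same reduce invocation.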


Word Count Visualized

 Splits: by file, by lines
– Like most things, customizable within the framework
 Output is deceptive: not a file, but many files (one per reducer)

What can you do?

 Seems like a limited model
 But many string-processing problems fit naturally
 Can be used iteratively
 Examples
– Grep
– Web-log processing (e.g. URL hit counts)
– Reverse Web-link graph
– Term vector per host
– Build an inverted index
– Distributed sort (naturally)
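Distributed grep shows how degenerate a fit can be: map emits matching lines, and reduce is the identity. A sketch of the map side (the filename and sample lines are invented for illustration):

```python
import re

def grep_map(filename, lines, pattern):
    # Map: emit any line matching the pattern.
    # Reduce is the identity function, so matches pass straight through
    # to the output; the framework does all the distribution.
    for line in lines:
        if re.search(pattern, line):
            yield (filename, line)

hits = list(grep_map("log.txt",
                     ["GET /index 200", "GET /x 404", "POST /y 200"],
                     r"404"))
# hits == [("log.txt", "GET /x 404")]
```

Several of the other examples follow the same shape: URL hit counts are word count with URLs as keys, and the reverse Web-link graph maps each (source, target) link to (target, source) and reduces by concatenating the source lists.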
