RePriv: Re-Envisioning In-Browser Privacy

Matt Fredrikson (University of Wisconsin), Ben Livshits (Microsoft Research)

Share data to get personalized results vs. privacy concerns

Browser: Personalization & Privacy

• Broad applications:
  – Site personalization
  – Personalized search
  – Ads
• User data in browser
• Control information release

[Figure: your browser distills browsing history (Google, Netflix, Amazon) into a user interest profile:]
Top: Computers: Security: Internet: Privacy
Top: Arts: Movies: Genres: Film Noir
Top: Sports: Hockey: Ice Hockey
Top: Science: Math: Number Theory
Top: Recreation: Outdoors: Fishing

Scenario #1:

bn.com would like to learn your top interests. We will let them know you are interested in:

• Science
• Technology
• Outdoors

Accept / Decline

[Figure: interest profile sent over the RePriv protocol]

Scenario #2: Personalized Search

Would you like to install an extension called “Bing Personalizer” that will:

• Watch mouse clicks on bing.com
• Modify appearance of bing.com
• Store personal data in browser

Accept / Decline

[Figure: personalized results produced with RePriv]
“weather” → weather.com
“sports” → espn.com
“movies” → imdb.com
“recipes” → epicurious.com

Contributions of RePriv

• RePriv: An in-browser framework for collecting & managing personal data to facilitate personalization.

• Core Behavior Mining: Efficient in-browser behavior mining & controlled dissemination of personal data.

• RePriv Miners: A framework for integrating verified third-party code into the behavior mining & dissemination of RePriv.

• Real-world Evaluation: Evaluation of above mechanisms on real browsing histories & two in-depth case studies.

RePriv Architecture

[Figure: browser equipped with RePriv. Inside the browser: core mining, miners, RePriv APIs, personal store, user consent and policies. Outside the browser: 1st party providers, 3rd party providers.]

Core Mining

• Taxonomy from first two levels of ODP taxonomy
  – ~450 categories total
  – 20 top-level categories
  – Overlap exists

[Figure: taxonomy tree. Top → Science (Physics, Math), Sports (Football)]

• Naïve Bayes
  – All categories equally likely
  – Training: min(3000, # pages) sites per category
  – Attribute words occur in at least 15% of docs for ≥1 category

• Classification is fast enough: O(c·n)
  – n is # words in document
  – c is # document categories
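The classification step above can be sketched as follows. This is a minimal illustration in Python, not the RePriv implementation: the class name, the Laplace smoothing, and the toy training data are assumptions, and the attribute-word set is taken as given rather than selected by the 15% rule. It shows why a uniform prior lets the prior term drop out, and why scoring costs one pass over the document's n words for each of the c categories.

```python
import math
from collections import Counter, defaultdict


class UniformPriorNaiveBayes:
    """Multinomial naive Bayes in which all categories are equally likely."""

    def __init__(self, attribute_words):
        # Only attribute words contribute to a score; all other words are ignored.
        self.vocab = set(attribute_words)
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.totals = Counter()                  # category -> attribute-word total

    def train(self, category, words):
        for w in words:
            if w in self.vocab:
                self.word_counts[category][w] += 1
                self.totals[category] += 1

    def classify(self, words):
        # n document words scored against c categories: O(c * n).
        doc = [w for w in words if w in self.vocab]
        best, best_score = None, -math.inf
        for category, counts in self.word_counts.items():
            # Laplace (add-one) smoothing is an assumption of this sketch.
            denom = self.totals[category] + len(self.vocab)
            # Uniform prior: the prior is identical for every category, so it is omitted.
            score = sum(math.log((counts[w] + 1) / denom) for w in doc)
            if score > best_score:
                best, best_score = category, score
        return best


# Toy data, made up for illustration.
nb = UniformPriorNaiveBayes({"quark", "proton", "goal", "puck"})
nb.train("Top/Science/Physics", ["quark", "proton", "quark", "the"])
nb.train("Top/Sports/Hockey", ["goal", "puck", "puck", "the"])
label = nb.classify(["the", "puck", "goal"])
```

With the toy data, `label` is `"Top/Sports/Hockey"`: both document words were seen only in that category's training text.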

Global Mining Convergence

[Figure: avg. distance from final (y-axis, 0 to 40) vs. % history complete (x-axis, 0 to 90)]

RePriv vs. the White Pages

Source: WebMii

Verifying Miners

• Untrusted miners are written in Fine
• API wrappers for RePriv functionality written in Fine
• Refined types on security-critical arguments to reflect policy needs
• All miners state policy at top of source code
• Won’t compile unless code follows policy

Miner Name     C# LoC   Fine LoC   Verif. Time
TwitterMiner       89         36           6.4
BingMiner          78         35           6.8
NetflixMiner      112        110           7.7
GlueMiner         213        101           9.5

assume ExtensionId "twitterminer"
assume CanCommunicateXHR ".com" Nil
assume CanUpdateStore ("twitter.com" "twitterminer")

val MakeRequest: p:provs -> ({host:string | CanCommunicate host p}) -> t:tracked -> … tracked
val AddEntry: ({p:provs | CanUpdateStore p}) -> data:tracked -> string -> tracked,p> -> … unit
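To make the effect of the policy declarations concrete, here is a rough Python analogue. It swaps Fine's compile-time refinement checks for runtime checks, so it illustrates what the policy permits rather than how Fine verifies it. Every name here (`Policy`, `MinerApi`, `PolicyViolation`) and the host `api.example.com` are hypothetical and not part of RePriv.

```python
from dataclasses import dataclass, field


class PolicyViolation(Exception):
    """Raised at runtime where Fine would instead refuse to compile the miner."""


@dataclass(frozen=True)
class Policy:
    extension_id: str
    xhr_hosts: frozenset = frozenset()      # analogue of CanCommunicateXHR facts
    store_updates: frozenset = frozenset()  # analogue of CanUpdateStore facts


@dataclass
class MinerApi:
    """Hypothetical API wrappers that consult the policy before acting."""
    policy: Policy
    store: list = field(default_factory=list)

    def make_request(self, host, payload):
        if host not in self.policy.xhr_hosts:
            raise PolicyViolation(f"XHR to {host} is not permitted")
        # A real wrapper would issue the request; this sketch only records intent.
        return {"host": host, "payload": payload}

    def add_entry(self, provenance, topic, data):
        if provenance not in self.policy.store_updates:
            raise PolicyViolation(f"store update as {provenance} is not permitted")
        self.store.append((provenance, topic, data))


policy = Policy(
    extension_id="twitterminer",
    xhr_hosts=frozenset({"api.example.com"}),  # hypothetical host
    store_updates=frozenset({("twitter.com", "twitterminer")}),
)
api = MinerApi(policy)
api.add_entry(("twitter.com", "twitterminer"), "technology", "sample entry")
```

The practical difference is when the failure surfaces: in this sketch a forbidden call fails when it runs, whereas the slides state that a Fine miner violating its policy does not compile at all.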

Netflix Example

• Update interest profile based on Netflix.com interactions
  – Watches clicks on rating links, updates store
  – Reads store to find recently viewed movies by genre
• Can provide this information on request to
  – fandango.com
  – amazon.com
  – metacritic.com

assume ExtensionId "netflixminer"
assume forall (s:string) . (ExtensionId s) => CanUpdateStore (P "netflix.com" s)
assume forall (s:string) . CanReadDOMId "netflix.com" s
assume CanReadDOMClass "netflix.com" "rv1"
assume CanReadDOMClass "netflix.com" "rv2"
assume CanReadDOMClass "netflix.com" "rv3"
assume CanReadDOMClass "netflix.com" "rv4"
assume CanReadDOMClass "netflix.com" "rv5"
assume CanCaptureEvents "onclick" (P "netflix.com" "netflixminer")
assume CanServeInformation "fandango.com" (P "netflix.com" "netflixminer")
assume CanServeInformation "amazon.com" (P "netflix.com" "netflixminer")
assume CanServeInformation "metacritic.com" (P "netflix.com" "netflixminer")
assume CanHandleSites "netflix.com"
assume CanReadStore (P "netflix.com" "netflixminer")
assume CanReadLocalFile "moviegenres.txt"

… 114 lines of Fine code

let doGetMovies genre cdom =
  let flixEnts = GetStoreEntriesByTopic myprov "movie" in
  let genreFlix = bind myprov flixEnts (filterByGenre genre) in
  ExtensionReturn cdom myprov genreFlix
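The `doGetMovies` snippet threads a provenance label (`myprov`) through every operation on store data. A loose Python rendering of that idea follows. It assumes that `bind` applies a function to tracked data while keeping the provenance label attached, which is how the snippet reads but is not confirmed by the slides. `Tracked`, `filter_by_genre`, and the sample entries are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

T = TypeVar("T")
U = TypeVar("U")


@dataclass(frozen=True)
class Tracked(Generic[T]):
    """A value paired with the provenance label of the miner that produced it."""
    value: T
    provenance: tuple  # e.g. ("netflix.com", "netflixminer")


def bind(prov: tuple, tracked: Tracked, fn: Callable[[T], U]) -> Tracked:
    # Apply fn to the wrapped value; the result carries the same provenance label.
    if tracked.provenance != prov:
        raise ValueError("provenance mismatch")
    return Tracked(fn(tracked.value), prov)


def filter_by_genre(genre: str) -> Callable[[list], list]:
    return lambda entries: [e for e in entries if e["genre"] == genre]


def do_get_movies(store_entries: Tracked, prov: tuple, genre: str) -> Tracked:
    # Mirrors: bind myprov flixEnts (filterByGenre genre)
    return bind(prov, store_entries, filter_by_genre(genre))


MYPROV = ("netflix.com", "netflixminer")
# Sample store entries, made up for illustration.
entries = Tracked(
    [{"title": "Movie A", "genre": "film noir"},
     {"title": "Movie B", "genre": "animation"}],
    MYPROV,
)
result = do_get_movies(entries, MYPROV, "film noir")
```

Because the filtered list stays wrapped, any later release of `result` can still be checked against facts such as `CanServeInformation` for its provenance.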

Privacy-Aware News Personalization

Map RePriv interest taxonomy to del.icio.us topics

Query personal store for top interests

Ask del.icio.us API for “hot” stories in appropriate topic areas from nytimes.com

Replace nytimes.com front page with del.icio.us stories

Privacy Policy

Query del.icio.us with top interest data

Change “href” attribute of anchor elements on nytimes.com

Change TextContent of selected anchor and div elements on nytimes.com

Evaluation Process

Technology/Web 2.0 Technology/Mobile Science/Chemistry Science/Physics

• 2,200 questions
• Over 3 days
• Types of results
  – Default
  – Personalized
  – Random

News Personalization: Effectiveness

[Figure: user relevance score (0.5 to 10) for Personalized, Random, and Default results. Callouts: “Most responses rated highly!” and “Most responses rated poorly”.]

RePriv Summary

• Existing solutions require privacy sacrifice

• RePriv is a browser-based solution
  – User retains control of personal information
  – High-quality information mined from browser use
  – General-purpose mining useful & performant
  – Flexibility with rigorous guarantees of privacy

• Personalized content & privacy can coexist