RePriv: Re-Envisioning In-Browser Privacy
Matt Fredrikson Ben Livshits University of Wisconsin Microsoft Research New York Times
Share data to get Privacy personalized concerns results Netflix
Google news Amazon Browser: Personalization & Privacy
• Broad applications: – Site personalizationYourGoogleNetflixAmazon browser 12 – Personalized search1 Browsing history 11 1 10 Top: Computers: Security: Internet: Privacy – Ads10 2 Top: Arts: Movies: Genres: Film Noir 9 3 Top: Sports: Hockey: Ice Hockey 9 3 Top: Science: Math: Number Theory • User data in browser Top: Recreation: Outdoors: Fishing 8 4 7 5 Distill 6 • Control information release
User interest profile Scenario #1: Online Shopping
bn.com would like to learn your top interests. We will let them know you are interested in:
Interest • Science profile • Technology • Outdoors Interest profile
Accept Decline RePriv Protocol Scenario #2: Personalized Search
PersonalizedPersonalized Results Results Would you like to install an extension RePrivcalled “Bing Personalizer“weather”” that will: weather.com
“weather” •weather.comWatch mouse clicks“sports” on bing.com espn.com • Modify appearance of bing.com “sports” espn.com “movies” imdb.com • Store personal data in browser “movies” imdb.com “recipes” epicurious.com
“recipes” epicurious.comAccept Decline Contributions of RePriv
• An in-browser framework for collecting & RePriv managing personal data to facilitate personalization.
Core Behavior • Efficient in-browser behavior mining & controlled Mining dissemination of personal data.
•A framework for integrating verified third-party RePriv miners code into the behavior mining & dissemination of RePriv.
Real-world •Evaluation of above mechanisms on real browsing Evaluation histories & two in-depth case studies.
7 RePriv Architecture
Browser equipped with RePriv
Core mining Core mining Core mining Core mining 3rd party Miners providers
RePriv APIs User consent and policies consent User 1st party Personal store providers Core Mining
• Taxonomy from first two • Naïve Bayes levels of ODP taxonomy – All categories equally likely – ~450 categories total – Training: min(3000, # – 20 top-level categories pages) sites per category – Overlap exists – Attribute words occur in at least 15% of docs for ≥1 category
Physics • Science Classification is fast Math enough: O(c•n) Top – n is # words in document Sports Football – c is # document categories
Global Mining Convergence
40
35 30 25 20 15 10
Avg. Distance From Final From Distance Avg. 5 0 0 10 20 30 40 50 60 70 80 90 % History Complete RePriv vs. the White Pages
Source: WebMii • An in-browser framework for collecting & RePriv managing personal data to facilitate personalization.
Core Behavior • Efficient in-browser behavior mining & controlled Mining dissemination of personal data.
•A framework for integrating verified third-party RePriv miners code into the behavior mining & dissemination of RePriv.
Real-world •Evaluation of above mechanisms on real browsing Evaluation histories & two in-depth case studies. Verifying Miners
• Untrusted miners are written in Fine
• API wrappers for RePriv functionality written in Fine Miner Name C# LoC Fine LoC Verif. Time • Refined types on security-critical arguments to reflect TwitterMiner 89 36 6.4 policy BingMinerneeds 78 35 6.8 • All MinersNetflixMiner state policy at112 top of source110 code 7.7 GlueMiner 213 101 9.5 • Won’t compile unless code follows policy
assume ExtensionId "twitterminer" assume CanCommunicateXHR "twitter.com“ Nil assume CanUpdateStore("twitter.com“ “twitterminer”) val MakeRequest: p:provs -> ({host:string | CanCommunicate host p}) -> t:tracked,p> -> … unit
Netflix Example
• Update interest profile letbased doGetMovies on Netflix.com genre cdom = interactions … 114 lines of Fine code let– flixEntsWatches = GetStoreEntriesByTopic clicks on rating links, myprov "movie" in assume ExtensionIdupdates "netflixminer store" assume foralllet– genreFlixReads(s:string) . store(ExtensionId = bind to s)myprovfind => CanUpdateStore recently flixEnts- (P "netflix.com" s) assume forall viewed (s:string ) . CanReadDOMId movies (filterByGenre by "netflix.com" genre genre) s in assume CanReadDOMClass "netflix.com" "rv1" assume CanReadDOMClassExtensionReturn "netflix.com" cdom "rv2" myprov genreFlix assume CanReadDOMClass "netflix.com" "rv3" assume• Can CanReadDOMClass provide "netflix.com" this "rv4" assume CanReadDOMClass "netflix.com" "rv5" assumeinformation CanCaptureEvents "onclick on" (P request "netflix.com" "tonetflixminer ") assume CanServeInformation– fandango.com "fandango.com" (P "netflix.com" "netflixminer") assume CanServeInformation "amazon.com" (P "netflix.com" "netflixminer") assume CanServeInformation– amazon.com "metacritic.com" (P "netflix.com" "netflixminer") assume CanHandleSites– metacritic.com "netflix.com" assume CanReadStore (P "netflix.com" "netflixminer") assume CanReadLocalFile "moviegenres.txt"
• An in-browser framework for collecting & RePriv managing personal data to facilitate personalization.
Core Behavior • Efficient in-browser behavior mining & controlled Mining dissemination of personal data.
•A framework for integrating verified third-party RePriv miners code into the behavior mining & dissemination of RePriv.
Real-world •Evaluation of above mechanisms on real browsing Evaluation histories & two in-depth case studies. Privacy-Aware News Personalization
Map RePriv interest taxonomy to del.icio.us topics
Query personal store for top interests
Ask del.icio.us API for “hot” stories in appropriate topic areas from nytimes.com
Replace nytimes.com front page with del.icio.us stories Privacy Policy
ChangeQuery “href del.icio.us” attribute with of anchortop elements interest dataon nytimes.com
Change TextContent of selected anchor and div elements on nytimes.com Evaluation Process
Technology/Web 2.0 Technology/Mobile Science/Chemistry Science/Physics
• 2,200 questions • Over 3 days • Types of results – Default – Personalized – Random
News Personalization: Effectiveness
Personalized
Random Most responses
rated highly!
Default Most responses
rated poorly
0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 5.5 6 6.5 7 7.5 8 8.5 9 9.5 10 User Relevance Score RePriv Summary
• Existing solutions require privacy sacrifice
• RePriv is a browser-based solution – User retains control of personal information – High-quality information mined from browser use – General-purpose mining useful & performant – Flexibility with rigorous guarantees of privacy
• Personalized content & privacy can coexist