Data Intimacy and Machine Learning
Michael Kearns, University of Pennsylvania
AT&T Forum for Technology, Entertainment, and Policy Washington D.C. September 26, 2017
Accompanying whitepaper in preparation
Data Volume and Diversity
Utility Entertainment/Social Media Operation
[Figure: logos of Google/Alphabet products and services: Cardboard, Chrome, Daydream View, AdMob, AdSense, Android, Allo, Chromecast, Google+, Phones, Google Express, Google Calendar, Google Camera, Google Play, Google Play Games, Google Play Movies & TV, Android Auto, Android Pay, Android Wear, Google Analytics, Calico, DoubleClick, Google Maps, Google Flights, Google Fit, Google Play Music, Tango, YouTube, LinkNYC, Project Baseline, Project Fi, Nest, Google Trips, Gmail, YouTube Gaming, YouTube TV, Senosis Health, Tilt Brush, Waze, Waymo]
[Figure: logos of Facebook products and services: Facebook Check-In, Oculus VR, Octazen Solutions, Facebook Events, Facebook Groups, Facebook Social, Instagram, Free Basics, Facebook Pixel, Facebook Network, DeepFace, Facebook Marketplace, Facebook Messenger, WhatsApp]
Private Data vs. Intimate Data
• Private: social security number, credit cards/history, medical records, etc.
• Emphasis on objective “facts” and data, and keeping them locked down
• Intimate: opinions, attitudes, beliefs, moods, mental state, lifestyle, etc.
• May not be “written down” anywhere
• Intimate data is more valuable and actionable than private data
Data Intimacy: “As If Unobserved”
[Stephens-Davidowitz]
Data Intimacy: “As If Unobserved”
[Stephens-Davidowitz]
Data Intimacy: Drawing Inferences
[Kosinski, Stillwell, Graepel]
Data Intimacy: Drawing Inferences
[Backstrom, Kleinberg]
The Machine Learning Pipeline
[Diagram labels: raw data; feature engineering/extraction; feedback/supervision]
The Machine Learning Pipeline
Never Enough: Long Tails
Never Enough: Correlations
Implications
• Generic data: e.g. raw network traffic, packets; has no “meaning”
• Private data: sensitive and personal, but still “on the surface”
• Intimate data: not even explicit or tangible, requires inferences
• Intimate data is the most valuable, and cannot be measured in bits