Hacker Bits, August 2016
Total Page:16
File Type:pdf, Size:1020Kb
hacker bits August 2016 new bits Hello from Redmond! As we near the dog days of summer, we can’t think of anything more restorative than catching up on our reading with a tall glass of iced tea. And if you are looking for something worthwhile to read, don’t miss Cody Littlewood’s My first 10 minutes on a server, an essential primer for anyone looking to secure Ubuntu. Speaking of summer reading, some of you have written to tell us that you’d love to see more data- related articles…well, consider that done! We’ll be bringing you lots of informative data-related pieces in the future, so stay tuned! Before we sign off (and get back to our iced tea), we’d like to give a big shout-out to our new subscribers (we’re looking at you, fans of DevMastery)! Welcome and we hope you like what you see at Hacker Bits. Our mission at Hacker Bits is to help our readers learn more, read less and stay current, so feel free to let us know how we can do it better! Peace and stay cool! — Maureen and Ray [email protected] content bits August 2016 06 What is differential privacy? My first 10 minutes on a server — 12 primer for securing Ubuntu The “cobra effect” that is disabling paste on password 18 fields Mental models I find repeatedly 24 useful How are zlib, gzip and Zip 44 related? Boosting sales with machine 48 learning We built voice modulation to mask gender in technical 54 interviews A simple mind hack that helps 62 beat procrastination hacker bits 3 contributor bits Matthew Green Cody Littlewood Troy Hunt Gabriel Weinberg Matthew is a cryptogra- Cody is CEO at Codelitt Troy is a Pluralsight Gabriel Weinberg is pher and professor at Incubator, a corpo- author, Microsoft Re- the CEO & Founder Johns Hopkins Univer- rate skunkworks, R&D gional Director, Most of DuckDuckGo, the sity. He has designed program and prod- Valuable Professional search engine that and analyzed cryp- uct incubator. He is a (MVP) and world-re- doesn't track you. I also tographic systems used staunch supporter of nowned Internet secu- co-authored Traction, in wireless networks, the EFF, Mozilla, The rity specialist. He’s the the book that helps you payment systems and Linux Foundation, and creator of “Have I been get traction. He resides digital content protec- other open source, net pwned?”, the free on- in Valley Forge, PA and tion platforms. freedom and privacy line service for breach on Twitter @yegg. focused organizations. monitoring and notifi- cations, and blogs at troyhunt.com from his home in Australia. Mark Adler Per Harald Borgen Aline Lerner Shyal Beardsley Dr. Mark Adler is a Per is a developer and Aline is the co-found- Shyal started his career Fellow at NASA's Jet machine learning en- er and CEO of doing R&D in Visual FX, Propulsion Laboratory, thusiast from Norway. interviewing.io. She including working on where he has been to He currently works as a likes ranting on the In- the Harry Potter mov- Saturn and Mars vir- developer at Xeneta. He ternet about how hiring ies. He is now CTO for tually through robot previously co-founded is broken, and her work a real estate startup. spacecraft. He has been a kids app startup, Pro- has appeared in Forbes, He is also the owner at a key contributor to the pell, which he ran for the Wall Street Journal, shyal.com ltd. where he opensource Info-ZIP, 3 years as the CEO. He and Fast Company. focuses on rapid devel- gzip, and zlib projects. has a degree in macro opment and entrepre- economics and learned neurship. to code at Found- ers&Coders in London in 2015. 4 hacker bits Ray Li Maureen Ker Curator Editor Ray is a software en- Maureen is an editor, gineer and data en- writer, enthusiastic thusiast who has been cook and prolific collec- blogging at rayli.net tor of useless kitchen for over a decade. He gadgets. She is the loves to learn, teach author of 3 books and and grow. You’ll usu- 100+ articles. Her work ally find him wrangling has appeared in the data, programming and New York Daily News, lifehacking. and various adult and children’s publications. hacker bits 5 Security What is differential privacy? By MATTHEW GREEN 6 hacker bits Apple announced that they will be using a technique called "Differential Privacy" to improve the privacy of their data collection practices. t the 2016 WWDC key- could mean for Apple and for note, Apple announced Privacy adds mathematical your iPhone. Aa series of new security noise to a small sample of and privacy features, including the individual’s usage pat- one feature that's drawn a bit of tern. As more people share The motivation the same pattern, general attention and confusion. Specifi- In the past several years, "aver- patterns begin to emerge, cally, Apple announced that they age people" have gotten used to which can inform and en- will be using a technique called the idea that they're sending a hance the user experience. "Differential Privacy" (henceforth hell of a lot of personal informa- referred to as DP) to improve the tion to the various services they In iOS 10, this technology privacy of their data collection use. Surveys also tell us they're will help improve Quick- practices. starting to feel uncomfortable Type and emoji sugges- The reaction to this by most about it. tions, Spotlight deep link people has been a big "???", This discomfort makes sense suggestions and Lookup since few people have even when you think about compa- Hints in Notes. heard of Differential Privacy, let nies using our personal data to alone understand what it means. market to us. To make a long story short, Unfortunately Apple isn't known But sometimes there are it sounds like Apple is going to for being terribly open when decent motivations for col- be collecting a lot more data it comes to sharing the secret lecting usage information. For from your phone. They're mainly sauce that drives their platform, example, Microsoft recently doing this to make their services so we'll just have to hope that announced a tool that can better, and not to collect individ- at some point they’d decide to diagnose pancreatic cancer by ual users' usage habits. publish more. monitoring your Bing queries. To guarantee this, Apple What we know so far comes Google famously runs Google intends to apply sophisticated from Apple's iOS 10 Preview Flu Trends. And of course, we all statistical techniques to ensure guide: benefit from crowdsourced data that this aggregate data the that improves the quality of the statistical functions it computes Starting with iOS 10, Apple services we use — from map- over all your information — is using Differential Priva- ping applications to restaurant don't leak your individual contri- cy technology to help dis- reviews. butions. In principle this sounds cover the usage patterns Unfortunately, even pretty good. But of course, the of a large number of users well-meaning data collection can devil is always in the details. without compromising indi- go bad. For example, in the late While we don't have those vidual privacy. 2000s, Netflix ran a competition details, this seems like a good to develop a better film recom- time to at least talk a bit about To obscure an individu- mendation algorithm. To drive what Differential Privacy is, how al’s identity, Differential the competition, they released it can be achieved, and what it hacker bits 7 The neat thing about DP is (that it) can be applied to...complex statistical calculations like the ones used by Machine Learning algorithms. an "anonymized" viewing dataset up the results" and releasing that had been stripped of identi- Imagine you have two oth- them does not satisfy the DP fying information. erwise identical databases, definition, since computing Unfortunately, this de-identi- one with your information a sum on the database that fication turned out to be insuf- in it, and one without it. contains your information will ficient. In a well-known piece of Differential Privacy en- potentially produce a different work, Narayanan and Shmatikov sures that the probability result from computing the sum showed that such datasets could that a statistical query will on a database without it. be used to re-identify specific produce a given result is Thus, even though these users and even predict their po- (nearly) the same whether sums may not seem to leak litical affiliation (!), if you simply it's conducted on the first much information, they reveal at knew a little bit of additional or second database. least a little bit about you. A key information about a given user. observation of the Differential This sort of thing should be One way to look at this is Privacy research is that in many worrying to us. Not just because that DP provides a way to know cases, DP can be achieved if the companies routinely share data if your data has a significant ef- tallying party is willing to add (though they do) but because fect on the outcome of a query. random noise to the result. breaches happen, and even sta- If it doesn't, then you might as For example, rather than tistics about a dataset can some- well contribute to the database, simply reporting the sum, the times leak information about the since there's almost no harm tallying party can inject noise individual records used to com- that can come of it.