hacker bits August 2016 new bits

Hello from Redmond!

As we near the dog days of summer, we can’t think of anything more restorative than catching up on our reading with a tall glass of iced tea. And if you are looking for something worthwhile to read, don’t miss Cody Littlewood’s My first 10 minutes on a server, an essential primer for anyone looking to secure Ubuntu.

Speaking of summer reading, some of you have written to tell us that you'd love to see more data-related articles… well, consider that done! We'll be bringing you lots of informative data-related pieces in the future, so stay tuned!

Before we sign off (and get back to our iced tea), we’d like to give a big shout-out to our new subscribers (we’re looking at you, fans of DevMastery)! Welcome and we hope you like what you see at Hacker Bits. Our mission at Hacker Bits is to help our readers learn more, read less and stay current, so feel free to let us know how we can do it better!

Peace and stay cool!

Maureen and Ray
[email protected]

content bits

What is differential privacy?
My first 10 minutes on a server: primer for securing Ubuntu
The "cobra effect" that is disabling paste on password fields
Mental models I find repeatedly useful
How are , and related?
Boosting sales with machine learning
We built voice modulation to mask gender in technical interviews
A simple mind hack that helps beat procrastination

contributor bits

Matthew Green
Matthew is a cryptographer and professor at Johns Hopkins University. He has designed and analyzed cryptographic systems used in wireless networks, payment systems and digital content protection platforms.

Cody Littlewood
Cody is CEO at Codelitt Incubator, a corporate skunkworks, R&D program and product incubator. He is a staunch supporter of the EFF, Mozilla, The Linux Foundation, and other open source, net freedom and privacy focused organizations.

Troy Hunt
Troy is a Pluralsight author, Microsoft Regional Director, Most Valuable Professional (MVP) and world-renowned Internet security specialist. He's the creator of "Have I been pwned?", the free online service for breach monitoring and notifications, and blogs at troyhunt.com from his home in Australia.

Gabriel Weinberg
Gabriel Weinberg is the CEO & Founder of DuckDuckGo, the search engine that doesn't track you. He also co-authored Traction, the book that helps you get traction. He resides in Valley Forge, PA and on Twitter @yegg.

Mark Adler
Dr. Mark Adler is a Fellow at NASA's Jet Propulsion Laboratory, where he has been to Saturn and Mars virtually through robot spacecraft. He has been a key contributor to the open source Info-ZIP, gzip, and zlib projects.

Per Harald Borgen
Per is a developer and machine learning enthusiast from Norway. He currently works as a developer at Xeneta. He previously co-founded a kids app startup, Propell, which he ran for 3 years as the CEO. He has a degree in macroeconomics and learned to code at Founders&Coders in London in 2015.

Aline Lerner
Aline is the co-founder and CEO of interviewing.io. She likes ranting on the Internet about how hiring is broken, and her work has appeared in Forbes, the Wall Street Journal, and Fast Company.

Shyal Beardsley
Shyal started his career doing R&D in Visual FX, including working on the Harry Potter movies. He is now CTO for a real estate startup. He is also the owner at shyal.com ltd. where he focuses on rapid development and entrepreneurship.

Ray Li, Curator
Ray is a software engineer and data enthusiast who has been blogging at rayli.net for over a decade. He loves to learn, teach and grow. You'll usually find him wrangling data, programming and lifehacking.

Maureen Ker, Editor
Maureen is an editor, writer, enthusiastic cook and prolific collector of useless kitchen gadgets. She is the author of 3 books and 100+ articles. Her work has appeared in the New York Daily News, and various adult and children's publications.

Security

What is differential privacy?

By MATTHEW GREEN


At the 2016 WWDC keynote, Apple announced a series of new security and privacy features, including one feature that's drawn a bit of attention and confusion. Specifically, Apple announced that they will be using a technique called "Differential Privacy" (henceforth referred to as DP) to improve the privacy of their data collection practices.

The reaction to this by most people has been a big "???", since few people have even heard of Differential Privacy, let alone understand what it means. Unfortunately Apple isn't known for being terribly open when it comes to sharing the secret sauce that drives their platform, so we'll just have to hope that at some point they'd decide to publish more.

What we know so far comes from Apple's iOS 10 Preview guide:

    Starting with iOS 10, Apple is using Differential Privacy technology to help discover the usage patterns of a large number of users without compromising individual privacy.

    To obscure an individual's identity, Differential Privacy adds mathematical noise to a small sample of the individual's usage pattern. As more people share the same pattern, general patterns begin to emerge, which can inform and enhance the user experience.

    In iOS 10, this technology will help improve QuickType and emoji suggestions, Spotlight deep link suggestions and Lookup Hints in Notes.

To make a long story short, it sounds like Apple is going to be collecting a lot more data from your phone. They're mainly doing this to make their services better, and not to collect individual users' usage habits.

To guarantee this, Apple intends to apply sophisticated statistical techniques to ensure that this aggregate data (the statistical functions it computes over all your information) don't leak your individual contributions. In principle this sounds pretty good. But of course, the devil is always in the details.

While we don't have those details, this seems like a good time to at least talk a bit about what Differential Privacy is, how it can be achieved, and what it could mean for Apple and for your iPhone.

The motivation

In the past several years, "average people" have gotten used to the idea that they're sending a hell of a lot of personal information to the various services they use. Surveys also tell us they're starting to feel uncomfortable about it.

This discomfort makes sense when you think about companies using our personal data to market to us.

But sometimes there are decent motivations for collecting usage information. For example, Microsoft recently announced a tool that can diagnose pancreatic cancer by monitoring your Bing queries. Google famously runs Google Flu Trends. And of course, we all benefit from crowdsourced data that improves the quality of the services we use, from mapping applications to restaurant reviews.

Unfortunately, even well-meaning data collection can go bad. For example, in the late 2000s, Netflix ran a competition to develop a better film recommendation algorithm. To drive the competition, they released an "anonymized" viewing dataset that had been stripped of identifying information.

Unfortunately, this de-identification turned out to be insufficient. In a well-known piece of work, Narayanan and Shmatikov showed that such datasets could be used to re-identify specific users and even predict their political affiliation (!), if you simply knew a little bit of additional information about a given user.

This sort of thing should be worrying to us. Not just because companies routinely share data (though they do) but because breaches happen, and even statistics about a dataset can sometimes leak information about the individual records used to compute it. Differential Privacy is a set of tools that was designed to address this problem.

What is Differential Privacy?

Differential Privacy is a privacy definition that was originally developed by Dwork, Nissim, McSherry and Smith, with major contributions by many others over the years. Roughly speaking, what it states can be summed up intuitively as follows:

    Imagine you have two otherwise identical databases, one with your information in it, and one without it. Differential Privacy ensures that the probability that a statistical query will produce a given result is (nearly) the same whether it's conducted on the first or second database.

One way to look at this is that DP provides a way to know if your data has a significant effect on the outcome of a query. If it doesn't, then you might as well contribute to the database, since there's almost no harm that can come of it.

Consider a silly example: Imagine that you choose to enable a reporting feature on your iPhone that tells Apple if you like to use the ice cream emoji routinely in your iMessage conversations. This report consists of a single bit of information: 1 indicates you like ice cream, and 0 doesn't. Apple might receive these reports and fill them into a huge database. At the end of the day, it wants to be able to derive a count of the users who like this particular emoji.

It goes without saying that the simple process of "tallying up the results" and releasing them does not satisfy the DP definition, since computing a sum on the database that contains your information will potentially produce a different result from computing the sum on a database without it.

Thus, even though these sums may not seem to leak much information, they reveal at least a little bit about you. A key observation of the Differential Privacy research is that in many cases, DP can be achieved if the tallying party is willing to add random noise to the result.

For example, rather than simply reporting the sum, the tallying party can inject noise from a Laplace or Gaussian distribution, producing a result that's not quite exact, but that masks the contents of any given row. (For other interesting functions, there are many other techniques as well.)

Even more usefully, the calculation of "how much" noise to inject can be made without knowing the contents of the database itself (or even its size). That is, the noise calculation can be performed based only on knowledge of the function to be computed, and the acceptable amount of data leakage.
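To make the noise-injection idea concrete, here is a minimal Python sketch of the ice cream tally with Laplace noise added. It is an illustration under stated assumptions, not a description of what Apple does: the function names are hypothetical, the parameter `epsilon` (the usual name in the DP literature for the acceptable leakage per query) does not appear in the article, and the sketch assumes each user contributes exactly one bit, so one person can change the true count by at most 1.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise.

    The difference of two independent exponential samples with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(bits, epsilon, rng=None):
    """Count the 1-bits and add noise calibrated to `epsilon`.

    Assumption: each user contributes one bit, so adding or removing a user
    changes the true count by at most 1 (sensitivity 1). The noise scale is
    then 1 / epsilon and does not depend on the data or on its size.
    """
    rng = rng or random.Random()
    return sum(bits) + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    # Hypothetical reports: 600 ice cream fans out of 1,000 users.
    reports = [1] * 600 + [0] * 400
    for eps in (0.1, 1.0):
        print(eps, round(noisy_count(reports, eps), 1))
```

Note that the noise scale is computed from the query type and `epsilon` alone, which mirrors the point above that the amount of noise can be chosen without looking at the database. A smaller `epsilon` means more noise: less leakage, but a less accurate count.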

Figure: Mortality vs. info disclosure, from Fredrikson et al. The red line is patient mortality.

A tradeoff between privacy and accuracy

Now obviously calculating the total number of ice cream-loving users on a system is a pretty silly example. The neat thing about DP is that the same overall approach can be applied to much more interesting functions, including complex statistical calculations like the ones used by Machine Learning algorithms. It can even be applied when many different functions are all computed over the same database.

But there's a big caveat here. Namely, while the amount of "information leakage" from a single query can be bounded by a small value, this value is not zero. Each time you query the database on some function, the total "leakage" increases and can never go down. Over time, as you make more queries, this leakage can start to add up.

This is one of the more challenging aspects of DP. It manifests in two basic ways:

1. The more information you intend to "ask" of your database, the more noise has to be injected in order to minimize the privacy leakage. This means that in DP there is generally a fundamental tradeoff between accuracy and privacy, which can be a big problem when training complex ML models.
2. Once data has been leaked, it's gone. Once you've leaked as much data as your calculations tell you is safe, you can't keep going, at least not without risking your users' privacy. At this point, the best solution may be to just destroy the database and start over, if such a thing is possible.

The total allowed leakage is often referred to as a "privacy budget", and it determines how many queries will be allowed (and how accurate the results will be). The basic lesson of DP is that the devil is in the budget. Set it too high, and you leak your sensitive data. Set it too low, and the answers you get might not be particularly useful.

Now in some applications, like many of the ones on our iPhones, the lack of accuracy isn't a big deal. We're used to our phones making mistakes. But sometimes when DP is applied in complex applications, such as training Machine Learning models, this really does matter.

To give an absolutely crazy example of how big the tradeoffs can be, consider this paper by Fredrikson et al. from 2014. The authors began with a public database linking warfarin dosage outcomes to specific genetic markers. They then used ML techniques to develop a dosing model based on their database, but applied DP at various privacy budgets while training the model. Then they evaluated both the information leakage and the model's success at treating simulated "patients."

The results showed that the model's accuracy depends a lot on the privacy budget with which it was trained. If the budget is set too high, the database leaks a great deal of sensitive patient information, but the resulting model makes dosing decisions that are about as safe as standard clinical practice. On the other hand, when the budget was reduced to a level that achieved meaningful privacy, the "noise-ridden" model had a tendency to kill its "patients."

Now before you freak out, let me be clear: your iPhone is not going to kill you. Nobody is saying that this example even vaguely resembles what Apple is going to do on the phone. The lesson of this research is simply that there are interesting tradeoffs between effectiveness and the privacy protection given by any DP-based system. These tradeoffs depend to a great degree on specific decisions made by the system designers, the parameters chosen by the deploying parties, and so on. Hopefully Apple will soon tell us what those choices are.

How do you collect the data, anyway?

You'll notice that in each of the examples above, I've assumed that queries are executed by a trusted database operator who has access to all of the "raw" underlying data. I chose this model because it's the traditional model used in most of the literature, not because it's a particularly great idea.

In fact, it would be worrisome if Apple was actually implementing their system this way. That would require Apple to collect all of your raw usage information into a massive centralized database, and then ("trust us!") calculate privacy-preserving statistics on it. At a minimum this would make your data vulnerable to subpoenas, Russian hackers, nosy Apple executives and so on.

Fortunately this is not the only way to implement a Differentially Private system. On the theoretical side, statistics can be computed using fancy cryptographic techniques (such as secure multi-party computation or fully-homomorphic encryption). Unfortunately these techniques are probably too inefficient to operate at the kind of scale Apple needs.

A much more promising approach is not to collect the raw data at all. This approach was recently pioneered by Google to collect usage statistics in their Chrome browser. The system, called RAPPOR, is based on an implementation of the 50-year old randomized response technique. Randomized response works as follows:

1. When a user wants to report a piece of potentially embarrassing information (made up example: "Do you use Bing?"), they first flip a coin, and if the coin comes up "heads", they return a random answer, calculated by flipping a second coin. Otherwise they answer honestly.
2. The server then collects answers from the entire population, and (knowing the probability that the coins will come up "heads"), adjusts for the included "noise" to compute an approximate answer for the true response rate.

Intuitively, randomized response protects the privacy of individual user responses, because a "yes" result could mean that you use Bing, or it could just be the effect of the first mechanism (the random coin flip).

More formally, randomized response has been shown to achieve Differential Privacy, with specific guarantees that can be adjusted by fiddling with the coin bias.

RAPPOR takes this relatively old technique and turns it into something much more powerful. Instead of simply responding to a single question, it can report on complex vectors of questions, and may even return complicated answers, such as strings, e.g., which default homepage you use.

The latter is accomplished by first encoding the string into a Bloom filter, a bitstring constructed using hash functions in a very specific way. The resulting bits are then injected with noise, and summed, and the answers recovered using a (fairly complex) decoding process.

While there's no hard evidence that Apple is using a system like RAPPOR, there are some small hints. For example, Apple's Craig Federighi describes Differential Privacy as "using hashing, subsampling and noise injection to enable… crowdsourced learning while keeping the data of individual users completely private." That's pretty weak evidence for anything, admittedly, but the presence of "hashing" in that quote at least hints towards the use of RAPPOR-like filters.

Figure: I've met Craig Federighi. He actually looks like this in person.

The main challenge with randomized response systems is that they can leak data if a user answers the same question multiple times. RAPPOR tries to deal with this in a variety of ways, one of which is to identify static information and thus calculate "permanent answers" rather than re-randomizing each time.

But it's possible to imagine situations where such protections could go wrong. Once again, the devil is very much in the details; we'll just have to see. I'm sure many fun papers will be written either way.

So is Apple's use of DP a good thing or a bad thing?

As an academic researcher and a security professional, I have mixed feelings about Apple's announcement. On the one hand, as a researcher, I understand how exciting it is to see research technology actually deployed in the field. And Apple has a very big field.

On the flipside, as security professionals, it's our job to be skeptical, and at a minimum demand people release their security-critical code (as Google did with RAPPOR), or at least to be straightforward about what it is they're deploying.

If Apple is going to collect significant amounts of new data from the devices that we depend on so much, we should really make sure they're doing it right, rather than cheering them for Using Such Cool Ideas. (I made this mistake already once, and I still feel dumb about it.)

But maybe this is all too "inside baseball". At the end of the day, it sure looks like Apple is honestly trying to do something to improve user privacy, and given the alternatives, maybe that's more important than anything else.
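For readers who want to see the two randomized response steps from earlier in this article in concrete form, here is a minimal Python sketch. It shows the textbook two-coin scheme only. It is not RAPPOR and says nothing about Apple's system; the function names and the `p_random` parameter (the bias of the first coin) are made up for illustration.

```python
import random


def randomized_response(truth, rng, p_random=0.5):
    """Client side: report a yes/no answer using the two-coin scheme.

    With probability p_random (first coin comes up "heads"), return the
    result of a second, fair coin. Otherwise return the true answer.
    """
    if rng.random() < p_random:
        return rng.random() < 0.5
    return truth


def estimate_true_rate(reports, p_random=0.5):
    """Server side: adjust the observed "yes" rate for the known noise.

    Expected observed rate = p_random * 0.5 + (1 - p_random) * true_rate,
    so we solve that equation for true_rate.
    """
    observed = sum(reports) / len(reports)
    return (observed - 0.5 * p_random) / (1.0 - p_random)


if __name__ == "__main__":
    rng = random.Random()
    # Hypothetical population: 30% of 100,000 users would truthfully say "yes".
    population = [i < 30000 for i in range(100000)]
    reports = [randomized_response(answer, rng) for answer in population]
    print(round(estimate_true_rate(reports), 3))
```

The server never sees which individual reports are honest, yet the estimate lands close to the true rate once the population is large. "Fiddling with the coin bias" corresponds to changing `p_random`: with fair coins, a truthful "yes" is reported as "yes" with probability 0.75 and a truthful "no" with probability 0.25, a ratio of 3, which is the quantity the formal DP guarantee bounds.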

Reprinted with permission of the original author. First appeared at blog.cryptographyengineering.com.

Security

My first 10 minutes on a server — primer for securing Ubuntu

By CODY LITTLEWOOD


"My First 5 Minutes on a Server," by Bryan Kennedy, is an excellent intro into securing a server against most attacks. We have a few modifications to his approach that we wanted to document as part of our efforts of externalizing our processes and best practices. We also wanted to spend a bit more time explaining a few things that younger engineers may benefit from.

I check our logwatch email every morning and thoroughly enjoy watching several hundred (sometimes 1000s of) attempts at gaining access with little success. (Many are rather unimaginative, such as trying root with password 1234 over and over again.)

This general overview works for Debian/Ubuntu servers, which are our personal favourite choice. These usually only serve as hosts for Docker containers, but the principles still apply. We'll go more in depth in locking down a server specifically for use as a Docker host another time.

On large scale, you'll be better off with a full automated setup using something like Ansible or Shipyard. However, sometimes you're just creating a single server or working on a base for an Ansible recipe, which is what this is meant to cover.

Disclaimer: This is meant to serve as a primer and a base. You should extend upon it as your needs dictate.

First things first

We don't even have a password for our root user yet. We'll want to select something random and complex. We use a password manager's password generator set to the most difficult setting. The PW manager saves the password and it is encrypted with access only given by a long master password.

A couple of redundancies are provided here (long, complex, random password + password is stored behind encryption/another long password). Whether you use a PW manager or some other means, keep this safe and behind some form of encryption. You'll only need this root password if you lose your sudo password.

    passwd

Note: There's a lot of discussion going on both HN and Reddit on root passwords. It's worth a read.

Next, you'll need to update the repositories and upgrade your system, applying the latest patches. We'll have a section for how to automate security upgrades later on.

    apt-get update
    apt-get upgrade

Add your user

You should never be logging on to a server as root. We follow a similar convention as Bryan in our user name, but you could use whatever convention you'd like. With a small team, having one login user hasn't been an issue for us, but with a larger team, best practice would dictate that different users would be set up with different levels of permission, only granting sudo permissions to a select few.

    useradd deploy
    mkdir /home/deploy
    mkdir /home/deploy/.ssh
    chmod 700 /home/deploy/.ssh

Set up your preferred shell for the deploy user. Here we use bash:

    usermod -s /bin/bash deploy

Remember chmod 700 means that owner can read, write, and execute. We're still root but in a minute we'll recursively chown this folder for the deploy user and deploy group. Only this user should have access to do anything with the .ssh folder.

Require ssh key authentication

We tend to avoid passwords for logging into servers. There was a lot of discussion around this after Bryan's original guide came out, but I tend to fall into this camp as well. Here are a few notes on this:

1. SSH keys are better than passwords only because they contain and require more information.
2. Passwords can be brute forced. Guessing a private key is so close to impossible that keys can be considered perfectly secure.
3. What about a stolen machine? Yes, they have your private key, but expiring an SSH key is easy: just remove the public key from authorized_keys. You should also have your private key protected by a secure and long passphrase. See next point.
4. All of this works, AS LONG AS YOU HAVE A LONG AND SECURE PASSPHRASE PROTECTING YOUR KEY. Repeated because it's bloody important.

So let's make password authentication a thing of the past on our server. Copy the contents of your id_rsa.pub [1] on your local machine to your server's authorized keys file.

    vim /home/deploy/.ssh/authorized_keys

Let's set the right permissions based on the Linux security principle of least privilege:

    chmod 400 /home/deploy/.ssh/authorized_keys
    chown -R deploy:deploy /home/deploy

chmod 400 sets permissions so that the file can be read by owner. The second command, chown, makes the user deploy and group deploy owners (recursively) of their home directory. We referenced this earlier when setting read/write/execute permissions to owner for this directory.

We're going to come back in a second after we've properly tested our deploy user and sudo to disable logging in as the root user, and enforce ssh key logins only.

Test deploy user and set up sudo

We're going to test logging in as deploy, while keeping our ssh connection as root open just in case. If it works, we'll use our open connection as root user to set a password for deploy. Since we're disabling password logins, this password will be used when sudo-ing.

Again we use a pw manager to create a complex and random password, saving it behind an encrypted wall, and sharing it among the team (syncing the encrypted pw file).

    passwd deploy

Setting up sudo is simple. Open up the sudo file with:

    visudo

Add the %sudo group below the root user as shown below. Make sure to comment out any other users and groups with a #. (Users have no prefix and groups start with %.) Most fresh installs won't have any there, but just make sure.

    root ALL=(ALL) ALL
    %sudo ALL=(ALL:ALL) ALL

Then add user deploy to the sudo group.


    usermod -aG sudo deploy

deploy now has access to sudo permissions. Now normally you need to exit and re-login to the shell in order to start having access to the group's permissions. There's a little trick though to avoid having to do that:

    exec su -l deploy

This starts a new interactive shell for the user deploy with the new permissions to the sudo group. It will require deploy's password, but it feels faster than logging out and logging back in.

Note: Thanks to ackackacksyn on Reddit for pointing out that you should not add users directly to sudoers. Thanks to FredFS456 in /r/netsec for pointing out you need to logout and log back in for group permissions to take effect.

Enforce SSH key logins

SSH configuration for the machine is stored here:

    vim /etc/ssh/sshd_config

You'll want to change these lines (or add if missing) in the file to match below. I think they're pretty self-explanatory. You'll want to add the IP that you use to connect. We have a company VPN setup with OpenVPN with cryptographic authentication, so in order to connect to a server, you must also be authenticated and connected to the VPN.

    PermitRootLogin no
    PasswordAuthentication no
    AllowUsers deploy@(your-VPN-or-static-IP)
    AddressFamily inet

Enable all these rules by restarting the ssh service. You'll probably need to reconnect (do so by using your deploy user!)

    service ssh restart

Note: Thanks to raimue and mwpmaybe on HN for pointing out that fail2ban (installed later) does not support IPv6 right now, so I added AddressFamily inet to the sshd_config file, which will only allow IPv4 (which fail2ban does support).

Setting up a firewall

There are usually two camps. Those who use IPtables directly and those who use a handy interface called ufw, which is a layer on top of IPtables meant to simplify things. Simple is generally better with security. The DigitalOcean ufw guide is really good and goes over the basics.

ufw is installed by default on Ubuntu and is just an apt-get install ufw away on Debian.

By default ufw should deny all incoming connections and allow all outgoing connections, however, it won't be running (because otherwise how would you be connected?). We'll go through and explicitly allow the connections that we deem okay.

First we'll want to make sure that we are supporting IPv6. Just open up the config file.

    vim /etc/default/ufw

Set IPv6 to yes.

    IPV6=yes

For the rest of the ports that we're going to open up, we can just use the ufw tool from command line, which is very handy.

    sudo ufw allow from {your-ip} to any port 22
    sudo ufw allow 80
    sudo ufw allow 443
    sudo ufw disable
    sudo ufw enable

The first one is a redundancy measure that makes sure that only connections from our IP can connect via SSH (standard SSH port) [2]. The second and third open up http and https traffic.

Note: Thanks to chrisfosterelli for pointing out that if you're going to set up the first rule (and you should), make sure you have a static IP or secure VPN that you connect to. A dynamic IP will leave you locked out of your box some day in the future.

Automated security updates

I like these. They're not perfect, but it's better than missing patches as they come out.

    apt-get install unattended-upgrades
    vim /etc/apt/apt.conf.d/10periodic

Update this file to match this:

    APT::Periodic::Update-Package-Lists "1";
    APT::Periodic::Download-Upgradeable-Packages "1";
    APT::Periodic::AutocleanInterval "7";
    APT::Periodic::Unattended-Upgrade "1";

I generally agree with Bryan that you'll want to disable normal updates and only enable security updates. The idea here is that you don't want an application going down without you knowing about it because some package was updated that it relies on, while security patches very rarely create dependency nightmares at an application level.

    vim /etc/apt/apt.conf.d/50unattended-upgrades

Make the file look like this:

    Unattended-Upgrade::Allowed-Origins {
        "Ubuntu lucid-security";
        //"Ubuntu lucid-updates";
    };

You're all set.

Fail2ban

Fail2ban is a great package that actively blocks suspicious activity as it occurs. From their wiki: Fail2ban scans log files (e.g. /var/log/apache/error_log) and bans IPs that show the malicious signs: too many password failures, seeking for exploits, etc. It does this by adding rules to iptables.

Out of the box Fail2Ban comes with filters for various protocols (HTTPS, SMTP, SSH, etc). It also has integration with a lot of services like Apache and Nginx, which can provide a certain level of DDoS or brute force attack protection.

You'll want to be careful with using it in this way though because depending on the IP address where the DDoS is coming from, you could lock out real users for a time as well. It offers a lot of configuration options including integration with SendMail to notify you when an IP gets banned. Feel free to take a look at some of the links and see if any of the other options interest you.

We're just going to install it and leave the default settings for SSH as a base starting point though:

    apt-get install fail2ban

2 factor authentication

2FA is not optional for us when building anything that has very sensitive requirements. Theoretically, if you're enforcing 2FA (on top of all these other measures), then in order to gain access to your server (barring application vulnerabilities), the attacker would have to have:

1. Access to your certificate and key to access VPN
2. Access to your computer to have your private key
3. Access to your passphrase for your private key
4. Access to your phone for 2FA

These are quite a few hurdles to jump. Even then to gain root access via sudo, they'd have to have deploy's password that is stored behind AES encryption (5).

Install this package:

    apt-get install libpam-google-authenticator

Set up by running these commands and following the instructions:

    su deploy
    google-authenticator

2FA is very easy and adds a great layer of security.

Logwatch

This is really more of a simple pleasure and monitoring tool that helps you see what's going on after the fact. Logwatch monitors your logfiles and when configured, sends you a daily email with the information parsed very nicely.

The output is quite entertaining to watch and you'll be surprised at how many attempts are made every day to gain access to your server. I install it if for no other reason than to show the team how important good security is.

There's a great write-up by DigitalOcean on Logwatch install and config, but if we're keeping to 10 minutes, we'll just install it and run a cron job to run it and email us daily.

    apt-get install logwatch

Add a cron job:

    vim /etc/cron.daily/00logwatch

Add this line to the cron file:

    /usr/sbin/logwatch --output mail --mailto [email protected] --detail high

All done

There you are. Your main concern and point of vulnerability after completing this will be your application and services. These are another animal entirely though.

We're making a big push to externalize our processes and best practices. If you're interested in learning more, take a look at our repository. We open source all of our policies and best practices, as well as continue to add to them there.

Have suggestions or questions? Comment below or submit a PR/issue on the Github repo! There are also a lot of really good bits of info on the Hacker News thread and /r/netsec.

[1] Make sure it's `.pub`. This seems to be very simple, but I've seen two people (both *not* members of our organization; it would be a quick way to stop being part of our org.) in my career send me their private key (`id_rsa` without the .pub extension) when asking for their public keys.

[2] There's a couple of camps on whether to keep your SSH connection on a standard or non-standard port. See [here] and [here] for opposing sides.

Reprinted with permission of the original author. First appeared at codelitt.com/blog.

Security

The “cobra effect” that is disabling paste on password fields

By TROY HUNT

Back in the day when the British had a penchant for conquering the world, they ran into a little problem on the subcontinent: cobras. Turns out there were a hell of a lot of the buggers wandering around India and it also turned out that they were rather venomous, which didn’t sit well with the colonials.

Ingenious as the British were, they decided to offer the citizens a bounty — you hand in dead cobras that would otherwise have bitten some poor imperialist and you get some cash. Problem solved.

Unfortunately, the enterprising locals saw things differently and interpreted the “cash for cobras” scheme as a damn good reason to start breeding serpents and raking in the dollars. Having now seen the flaw in their original logic, the poms quickly scrapped the scheme, meaning no more snake bounty. Naturally the only thing for the locals to do with their now worthless cobras was to set them free so that they may seek out a nice cosy British settlement somewhere.

This became known as the cobra effect or in other words, a solution to a problem that actually makes the whole thing a lot worse.

Here’s a modern day implementation of the cobra effect as it relates to the ability to paste your password into a login field:

Let’s just allow the nuances of that one to sink in for a moment…

I imagine Steve was picturing armies of elite hackers working through password dictionaries with the old CTRL-C / CTRL-V and the ICO then sweeping in and taking British Gas’ proudly mounted security certificate down from the wall in the foyer. That’s about the best I can come up with, but let’s not single out British Gas here; this is a far broader issue than just them.

For your security, we have disabled pasting of passwords

As far as amusing infosec tweets go, British Gas did a fine job but there are plenty of others who also reject the ability to paste into the password field as well, albeit with less humour (see tweet above).

But what does this actually look like? I mean a website can’t disable the fact that your operating system and browser have some form of equivalent of CTRL-V, so what happens when you paste? Here's Go GE Capital before logging in:

Now here they are after entering the username and pasting the password:

Right…

Go GE Capital just kills whatever is pasted, it disappears in the blink of an eye. No warning, no “For your safety…” message, just goneski.

PayPal take a slightly different approach on their change password page:

Figure 1: PayPal's change password page

Ok, you get a message which is nice, but zero info on why you shouldn’t be pasting in your password. I’ll come back to this example later on though because there are other things going on here which help explain their rationale.

The mechanics of an anti-password-paster

Right, let’s drill into this and see what’s going on behind the scenes. We’ll pick someone different this time. Let’s try the Al Rajhi Bank in Saudi Arabia just for a change of flavour.

As you can see below, I’ve typed in a username and pasted a password:

Yeah, about that password… anyway, the question remains, what voodoo is breaking this most fundamental of behaviours? For Al Rajhi, it’s very simple:

That’s it — the onpaste event is returning false. Sound unfamiliar? I mean the onpaste event? Yes, it’s a thing, here’s a JSFiddle so you can play with it yourself.

But there’s also this regarding the event:

Non-standard event defined by Microsoft for use in Internet Explorer.

C’mon guys, we’ve been down the non-standard implementations in browsers before and it always ends in tears.

Thing is though, not only does it work in IE but it works in Chrome too. And Safari. And Firefox. And even if it didn’t, there are always bright JavaScript ideas to “hack” the ability of CTRL-V out of browsers.

The point is that there are many ways of skinning this cat and it can be very easy. However, it’s a conscious decision — the developer has to say “You […] fields to begin with, so let’s cover that now.

Why on earth would you want to paste a password anyway?!

The argument of whether you should use a password or a passphrase or just let the cat wander over the keyboard and see what happens has been well and truly had, and in the end it always comes back to this: The only secure password is the one you can’t remember.

If you haven’t already reached this enlightened state then free yourself from the shackles of human memory-based passwords and grab 1Password or KeePass or LastPass.

Figure 2: Error page from Al Rajhi bank

The premise of all of these is that you delegate “remembering” your password to the password manager and instead the only real memory required (at least of the human variety) is to remember the single master password that gains you access to all the other ones.

Again, the debates on the pros and cons of this have been had so go back to the aforementioned blog post and read the comments if you want to jump on that bandwagon.

Once a password manager has the credentials stored, clearly at some point in time, you’d want to get them back into a login form and actually use them to gain access to the website in question.

Password managers like 1Password can make this profoundly simple in that they have the ability to auto-fill the login form of the website in question. For example, I was able to successfully use it to attempt to login to the Al Rajhi bank (see figure 2).

As you can clearly see from the red error message, the login credentials were incorrect (I’m guessing “johnsmith” may not be that common of a name amongst their customers).

But as you can also clearly see from the request headers, both the username “johnsmith” and the password “joshsmith” were successfully sent. In other words, whilst non-standard onpaste implementation may be blocking the old CTRL-V, 1Password’s ability to auto-fill the form is not hindered.

However, clever tricks like auto-fill don’t always work.

Sometimes the name of the form field changes either due to the site being revised or because of a perceived value of rotating and obfuscating the ID of the field.

Sometimes you want to use the same credentials on multiple domains of the same service, and auto-fill only works against the domain the pattern was recorded on.

Sometimes you’re in iOS and you really just want to use Safari, so you copy from the 1Password app and paste into the browser.

There are many, many valid reasons why people would want to paste passwords in order to increase their security profile, yet the perception of those blocking this practice is that it actually decreases security. Why? Interesting you should ask…

Because we’ve been bad, please create a weak password

I was admittedly mystified by this practice so I asked around and got a whole range of answers along the lines of “because some developers are just stupid”.

Hard to argue with that although in fairness, this is often something dictated to them (although perhaps they should be doing a better job of articulating the counter-reasons).

I thought I’d ask 1Password, given these guys spend a bunch of time thinking about how to securely get creds into websites. They pretty much concurred with everyone else (see tweet below).

Security theatre madness. Sounds about right.

But there’s one angle to this that helps explain the madness and it goes back to that earlier PayPal screen grab.

This was of the change password page, not the login page. You can easily paste into the login page and in fact you can even paste into the original password field on the change password page, just not the new password field or the other field that confirms it.

But it’s not just PayPal. Apparently it’s the same with the Oyster Card site in the UK (see tweet).

You can easily paste into the password field of the login page for the Oyster card, but apparently you can’t on the change password page. The reason lies in the earlier message I showed from PayPal, in particular this part of the password criteria:

Use 8-20 characters

Ah, so because you’ve gone and put an arbitrary limit on the length of my password and taken away my ability to create a nice 50 character random string, you’ve had to kill the paste function, because otherwise I’d go around thinking I’ve got a 50 char password but it was actually truncated to 20 due to the maxlength attribute of the password field.

Nice one guys, good work there, bet you’re saving a bundle of ingress data costs by cutting out those extra bytes!

This, of course, is madness. Once passwords are stored, the hash of a 50 character password is an identical length to that of a 20 character one, which incidentally is also identical to that of a 100 character one. (Of course also keep in mind that there’s a lot more to password storage than just hashing).

All of you creating these low arbitrary limits are actually hashing our passwords, right? Guys?

I’ve written about why we’re forced into choosing bad passwords in the past and there’s never really a good reason, just various shades of bad reasons.

That these bad reasons must then extend to disabling customers’ ability to use a password manager to at least randomise the characters within the allowable limit only compounds the problem.

Here’s an idea — if you really want to force people into creating short passwords because [insert baseless reason here], how about just watching for when the password field suddenly has the maximum number of allowable characters in it without the corresponding number of onkeypress events, and then saying “Excuse me, you’ve pasted a password we can’t accept, could you please weaken it somewhat?”

But, but, but… malware could steal your login, yeah, that’s it, malware!

There is a counter-argument to pasting passwords and it goes like this: if your computer is infected by malware, the nasties inside it may be able to access your clipboard and steal the password you copied onto it.

Sound crazy? Trusteer doesn’t think so. (see figure 3)

Figure 3: Message from Trusteer.com

Of course the irony of this position is that it makes the assumption that a compromised machine may be at risk of its clipboard being accessed but not its keystrokes.

Why pull the password from memory for the small portion of people that elect to use a password manager when you can just grab the keystrokes with malware? Crikey, the operating system itself doesn’t even need to be compromised to do that — you just grab a cheapie keylogger off the web and plug it into someone’s USB keyboard. Job done.

In summary, just don’t

Of course it all comes back to this balance in security where there are no absolutes and things are often a trade-off between cost, convenience and actually making the web a safer place.

But in the case of passwords, one of the best damn things anyone can do is get themselves a password manager and stop typing in that same crappy combination of kids’ birthdays and dog’s name they’re using all over the place.

In the examples above, we’ve got a handful of websites forcing customers into creating arbitrarily short passwords, then disabling the ability to use password managers to the full extent possible. And to make it even worse, they’re using non-standard browser behaviour to do it!

Hey, I’ve got an idea — maybe I should offer a bounty where if anyone comes to me and demonstrates that they have a crappy password on a site but is then able to kill it off and create a secure one, I’d give them some cash. What could go wrong?!
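The point made earlier about hash length and silent truncation is easy to check. What follows is a minimal Python sketch, not anything the sites mentioned are known to run: it assumes PBKDF2-HMAC-SHA256 as the storage scheme and a 20 character `maxlength`, and the helper names (`random_password`, `derive_key`, `truncate`) are made up for illustration.

```python
import hashlib
import os
import secrets
import string


def random_password(length: int) -> str:
    """Generate a random alphanumeric password, as a password manager might."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


def derive_key(password: str, salt: bytes) -> bytes:
    """Derive a storage key with PBKDF2-HMAC-SHA256 (assumed scheme).

    With no explicit dklen, the output is the SHA-256 digest size
    (32 bytes), whatever the length of the password.
    """
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def truncate(password: str, maxlength: int = 20) -> str:
    """Mimic a field with a maxlength limit: extra characters are silently dropped."""
    return password[:maxlength]


if __name__ == "__main__":
    salt = os.urandom(16)

    # The stored value is the same size for 20, 50 and 100 character passwords.
    for n in (20, 50, 100):
        print(n, "characters ->", len(derive_key(random_password(n), salt)), "bytes stored")

    # A pasted 50 character password that gets cut to 20 is a different password.
    pasted = random_password(50)
    kept = truncate(pasted)
    print("kept", len(kept), "of", len(pasted), "characters; same password?", kept == pasted)
```

Under these assumptions the length limit saves nothing on the storage side, and truncation leaves the customer holding a password that is not the one the site kept.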

Reprinted with permission of the original author. First appeared at troyhunt.com.

Interesting

Mental models I find repeatedly useful

By GABRIEL WEINBERG


Around 2003, I came across Charlie Munger’s 1995 speech, The Psychology of Human Misjudgment, which introduced me to how behavioral economics can be applied in business and investing. More profoundly, though, it opened my mind to the power of seeking out and applying mental models across a wide array of disciplines.

A mental model is just a concept you can use to help try to explain things (e.g. Hanlon’s Razor — “never attribute to malice that which is adequately explained by carelessness.”). There are tens of thousands of mental models, and every discipline has their own set that you can learn through coursework, mentorship, or first-hand experience.

There is a much smaller set of concepts, however, that come up repeatedly in day-to-day decision making, problem solving, and truth seeking. As Munger says, “80 or 90 important models will carry about 90% of the freight in making you a worldly-wise person.”

This post is my attempt to enumerate the mental models that are repeatedly useful to me. This set is clearly biased from my own experience and surely incomplete. I hope to continue to revise it as I remember and learn more.

How to use this list

I find mental models are useful to try to make sense of things and to help generate ideas. To actually be useful, however, you have to apply them in the right context at the right time. And for that to happen naturally, you have to know them well and practice using them.

Therefore, here are two suggestions for using this list:

1. For mental models you don’t know or don’t know well, you can use this list as a jumping-off point to study them. I’ve provided links (mainly to Wikipedia) to start that process.

2. When you have a particular problem in front of you, you can go down this list, and see if any of the models could possibly apply.

Notes

• Most of the mental models on this list are here because they are useful outside of their specific discipline. For example, use of the mental model “peak oil” isn’t restricted to an energy context. Most references to “peak x” are an invocation of this model. Similarly, inflation as a concept applies outside of economics, e.g. grade inflation and expectations inflation.

• I roughly grouped the mental models by discipline, but as noted, this grouping is not to be taken as an assertion that they only apply within that discipline. The best ideas often arise when going cross-discipline.

• I realize my definition of mental model differs from some others, with mine being more broadly defined as any concept that helps explain, analyze, or navigate the world. I prefer this broader definition because it allows me to assemble a more wide-ranging list of useful concepts that may not be mental models under other definitions, but I nevertheless find on relatively equal footing in terms of usefulness in the real world.


• The numbers next to each mental model reflect the frequency with which they come up:

  1. Frequently (63 models)
  2. Occasionally (43 models)
  3. Rarely, though still repeatedly (83 models)

• If studying new models, I’d start with the lower numbers first. The quotes next to each concept are meant to be a basic definition to remind you what it is, and is not a teaching tool. Follow the link to learn more.

• I am not endorsing any of these concepts as normatively good; I’m just saying they have repeatedly helped me explain and navigate the world.

• I wish I had learned many of these years earlier. In fact, the proximate cause for posting this was so I could more effectively answer the question I frequently get from people I work with: “What should I learn next?” If you’re trying to be generally effective, my best advice is to start with the things on this list.

Explaining

• (1) Hanlon’s razor — “Never attribute to malice that which is adequately explained by carelessness.” (Related: fundamental attribution error — “the tendency for people to place an undue emphasis on internal characteristics of the agent (character or intention), rather than external factors, in explaining another person’s behavior in a given situation.”)

• (1) Occam’s razor — “Among competing hypotheses, the one with the fewest assumptions should be selected.” (Related: conjunction fallacy, overfitting, “when you hear hoof beats, think of horses not zebras.”)

• (1) Cognitive biases — “Tendencies to think in certain ways that can lead to systematic deviations from a standard of rationality or good judgments.” (See list of cognitive biases)

• (1) Arguing from first principles — “A first principle is a basic, foundational, self-evident proposition or assumption that cannot be deduced from any other proposition or assumption.” (Related: dimensionality reduction, orthogonality)

• (1) Proximate vs root cause — “A proximate cause is an event which is closest to, or immediately responsible for causing, some observed result. This exists in contrast to a higher-level ultimate cause (or distal cause), which is usually thought of as the ‘real’ reason something occurred.” (Related: 5 whys — “to determine the root cause of a defect or problem by repeating the question ‘Why?’”)

Modeling

• (1) Systems thinking — “By taking the overall system as well as its parts into account, systems thinking is designed to avoid potentially contributing to further development of unintended consequences.” (Related: causal loop diagrams, stock and flow, Le Chatelier’s principle, hysteresis — “the time-based dependence of a system’s output on present and past inputs.”)

• (1) Scenario analysis — “A process of analyzing possible future events by considering alternative possible outcomes.”

• (1) Power-law — “A functional relationship between two quantities, where a relative change in one quantity results in a proportional relative change in the other quantity, independent of the initial size of those quantities: one quantity varies as a power of another.” (Related: Pareto distribution; Pareto principle — “for many events, roughly 80% of the effects come from 20% of the causes.”, diminishing returns, premature optimization, heavy-tailed distribution, fat-tailed distribution, long tail — “the portion of the distribution having a large number of occurrences far from the “head” or central part of the distribution.”; black swan theory — “a metaphor that describes an event that comes as a surprise, has a major effect, and is often inappropriately rationalized after the fact with the benefit of hindsight.”)

• (1) Normal distribution — “A very common continuous probability distribution…physical quantities that are expected to be the sum of many independent processes (such as measurement errors) often have distributions that are nearly normal.” (Related: central limit theorem)

• (1) Sensitivity analysis — “The study of how the uncertainty in the output of a mathematical model or system (numerical or otherwise) can be apportioned to different sources of uncertainty in its inputs.”

• (1) Cost-benefit analysis — “A systematic approach to estimating the strengths and weaknesses of alternatives that satisfy transactions, activities or functional requirements for a business.” (Related: net present value — “a measurement of the profitability of an undertaking that is calculated by subtracting the present values of cash outflows (including initial cost) from the present values of cash inflows over a period of time.”, discount rate)

• (3) Simulation — “The imitation of the operation of a real-world process or system over time.”

• (3) Pareto efficiency — “A state of allocation of resources in which it is impossible to make any one individual better off without making at least one individual worse off…A Pareto improvement is defined to be a change to a different allocation that makes at least one individual better off without making any other individual worse off, given a certain initial allocation of goods among a set of individuals.”

Brainstorming

• (1) Lateral thinking — “Solving problems through an indirect and creative approach, using reasoning that is not immediately obvious and involving ideas that may not be obtainable by using only traditional step-by-step logic.”

• (1) Divergent thinking vs convergent thinking — “Divergent thinking is a thought process or method used to generate creative ideas by exploring many possible solutions. It is often used in conjunction with its cognitive opposite, convergent thinking, which follows a particular set of logical steps to arrive at one solution, which in some cases is a ‘correct’ solution.” (Related: groupthink)

• (2) Critical mass — “The smallest amount of fissile material needed for a sustained nuclear chain reaction.” “In social dynamics, critical mass is a sufficient number of adopters of an innovation in a social system so that the rate of adoption becomes self-sustaining and creates further growth.”

• (2) Activation energy — “The minimum energy that must be available to a chemical system with potential reactants to result in a chemical reaction.”

• (2) Catalyst — “A substance that increases the rate of a chemical reaction.”

• (2) Leverage — “The force amplification achieved by using a tool, mechanical device or machine system.”

• (2) Crowdsourcing — “The process of obtaining needed services, ideas, or content by soliciting contributions from a large group of people, especially an online community, rather than from employees or suppliers.” (Related: wisdom of the crowd — “a large group’s aggregated answers to questions involving quantity estimation, general world knowledge, and spatial reasoning has generally been found to be as good as, and often better than, the answer given by any of the individuals within the group.”, collective intelligence)

• (3) The structure of scientific revolutions — “An episodic model in which periods of such conceptual continuity in normal science were interrupted by periods of revolutionary science. The discovery of “anomalies” during revolutions in science leads to new paradigms. New paradigms then ask new questions of old data, move beyond the mere “puzzle-solving” of the previous paradigm, change the rules of the game and the “map” directing new research.” (Related: Planck’s principle — “the view that scientific change does not occur because individual scientists change their mind, but rather that successive generations of scientists have different views.”)

Experimenting

• (1) Scientific method — “Systematic observation, measurement, and experiment, and the formulation, testing, and modification of hypotheses.” (Related: reproducibility)

• (1) Proxy — “A variable that is not in itself directly relevant, but that serves in place of an unobservable or immeasurable variable. In order for a variable to be a good proxy, it must have a close correlation, not necessarily linear, with the variable of interest.” (Related: revealed preference)

• (1) Selection bias — “The selection of individuals, groups or data for analysis in such a way that proper randomization is not achieved, thereby ensuring that the sample obtained is not representative of the population intended to be analyzed.” (Related: sampling bias)

• (1) Response bias — “A wide range of cognitive biases that influence the responses of participants away from an accurate or truthful response.”

• (2) Observer effect — “Changes that the act of observation will make on a phenomenon being observed.” (Related: Schrödinger’s cat)

• (2) Survivorship bias — “The logical error of concentrating on the people or things that ‘survived’ some process and inadvertently overlooking those that did not because of their lack of visibility.”

• (3) Heisenberg uncertainty principle — “A fundamental limit to the precision with which certain pairs of physical properties of a particle, known as complementary variables, such as position x and momentum p, can be known.”

Interpreting

• (1) Order of magnitude — “An order-of-magnitude estimate of a variable whose precise value is unknown is an estimate rounded to the nearest power of ten.” (Related: order of approximation, back-of-the-envelope calculation, dimensional analysis, Fermi problem)

• (1) Major vs minor factors — Major factors explain major portions of the results, while minor factors only explain minor portions. (Related: first order vs second order effects — first order effects directly follow from a cause, while second order effects follow from first order effects.)

• (1) False positives and false negatives — “A false positive error, or in short false positive, commonly called a ‘false alarm’, is a result that indicates a given condition has been fulfilled, when it actually has not been fulfilled…A false negative error, or in short false negative, is where a test result indicates that a condition failed, while it actually was successful, i.e. erroneously no effect has been assumed.”

• (1) Confidence interval — “Confidence intervals consist of a range of values (interval) that act as good estimates of the unknown population parameter; however, the interval computed from a particular sample does not necessarily include the true value of the parameter.” (Related: error bar)

• (2) Bayes’ theorem — “Describes the probability of an event, based on conditions that might be related to the event. For example, suppose one is interested in whether a person has cancer, and knows the person’s age. If cancer is related to age, then, using Bayes’ theorem, information about the person’s age can be used to more accurately assess the probability that they have cancer.” (Related: base rate fallacy)

• (2) Regression to the mean — “The phenomenon that if a variable is extreme on its first measurement, it will tend to be closer to the average on its second measurement.”

• (2) Inflection point — “A point on a curve at which the curve changes from being concave (concave downward) to convex (concave upward), or vice versa.”

• (3) Simpson’s paradox — “A paradox in probability and statistics, in which a trend appears in different groups of data but disappears or reverses when these groups are combined.”

Deciding

• (1) Business case — “Captures the reasoning for initiating a project or task. It is often presented in a well-structured written document, but may also sometimes come in the form of a short verbal argument or presentation.” (Related: why this now?)

• (1) Opportunity cost — “The value of the best alternative forgone where, given limited resources, a choice needs to be made between several mutually exclusive alternatives. Assuming the best choice is made, it is the ‘cost’ incurred by not enjoying the benefit that would have been had by taking the second best available choice.” (Related: cost of capital)

• (1) Intuition — Personal experience coded into your personal neural network, which means your intuition is dangerous outside the bounds of your personal experience. (Related: thinking fast vs thinking slow — “a dichotomy between two modes of thought: ‘System 1’ is fast, instinctive and emotional; ‘System 2’ is slower, more deliberative, and more logical.”)

• (1) Local vs global optimum — “A local optimum of an optimization problem is a solution that is optimal (either maximal or minimal) within a neighboring set of candidate solutions. This is in contrast to a global optimum, which is the optimal solution among all possible solutions, not just those in a particular neighborhood of values.”

• (1) Decision trees — “A decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility.” (Related: expected value)

• (1) Sunk cost — “A cost that has already been incurred and cannot be recovered.” (Related: “throwing good money after bad”, “in for a penny, in for a pound”)

• (1) Availability bias — “People tend to heavily weigh their judgments toward more recent information, making new opinions biased toward that latest news.”

• (1) Confirmation bias — “The tendency to search for, interpret, favor, and recall information in a way that confirms one’s preexisting beliefs or hypotheses, while giving disproportionately less consideration to alternative possibilities.” (Related: cognitive dissonance)

• (3) Loss aversion — “People’s tendency to strongly prefer avoiding losses to acquiring gains.” (Related: diminishing marginal utility)

Reasoning

• (1) Anecdotal — “Using a personal experience or an isolated example instead of a sound argument or compelling evidence.”

• (1) False cause — “Presuming that a real or perceived relationship between things means that one is the cause of the other.” (Related: correlation does not imply causation, or in xkcd form)


• (1) Straw man — “Giving the impression of refuting an opponent’s argument, while actually refuting an argument that was not advanced by that opponent.”

• (1) Plausible — Thinking that just because something is plausible means that it is true.

• (1) Likely — Thinking that just because something is possible means that it is likely.

• (1) Appeal to emotion — “Manipulating an emotional response in place of a valid or compelling argument.”

• (1) Ad hominem — “Attacking your opponent’s character or personal traits in an attempt to undermine their argument.”

• (1) Slippery slope — “Asserting that if we allow A to happen, then Z will eventually happen too, therefore A should not happen.” (Related: broken windows theory — “maintaining and monitoring urban environments to prevent small crimes such as vandalism, public drinking, and toll-jumping helps to create an atmosphere of order and lawfulness, thereby preventing more serious crimes from happening.”)

• (1) Black or white — “When two alternative states are presented as the only possibilities, when in fact more possibilities exist.”

• (1) Bandwagon — “Appealing to popularity or the fact that many people do something as an attempted form of validation.”

• For a longer list, see Thou shall not commit logical fallacies (I have this poster on my office door.)

Negotiating

• (1) The third story — “The third story is one an impartial observer, such as a mediator, would tell; it’s a version of events both sides can agree on.”

• (1) Active listening — “Requires that the listener fully concentrates, understands, responds and then remembers what is being said.”

• (1) Trade-offs — “A situation that involves losing one quality or aspect of something in return for gaining another quality or aspect.”

• (1) Incentives — “Something that motivates an individual to perform an action.”

• (2) Best alternative to a negotiated agreement (BATNA) — “The most advantageous alternative course of action a party can take if negotiations fail and an agreement cannot be reached.”

• (2) Zero-sum vs non-zero-sum — “A zero-sum game is a mathematical representation of a situation in which each participant’s gain (or loss) of utility is exactly balanced by the losses (or gains) of the utility of the other participant(s)…In contrast, non-zero-sum describes a situation in which the interacting parties’ aggregate gains and losses can be less than or more than zero.” (Related: win-win — “A win–win strategy is a conflict resolution process that aims to accommodate all disputants.”)

• (3) Alternative dispute resolution (ADR) — “Dispute resolution processes and techniques that act as a means for disagreeing parties to come to an agreement short of litigation.” (Related: mediation, arbitration)


• (3) Prisoner’s dilemma — “A standard example of a game analyzed in game theory that shows why two completely ‘rational’ individuals might not cooperate, even if it appears that it is in their best interests to do so.” (Related: Nash equilibrium, evolutionarily stable strategy)

Mitigating

• (1) Unintended consequences — “Outcomes that are not the ones foreseen and intended by a purposeful action.” (Related: collateral damage — “Deaths, injuries, or other damage inflicted on an unintended target.”, Goodhart’s law — “When a measure becomes a target, it ceases to be a good measure”, Campbell’s law, Streisand effect — “The phenomenon whereby an attempt to hide, remove, or censor a piece of information has the unintended consequence of publicizing the information more widely, usually facilitated by the Internet”, cobra effect — “when an attempted solution to a problem actually makes the problem worse.”)
• (2) Preserving optionality — “A strategy of keeping options open and fluid, fighting the urge to make choices too soon, before all of the uncertainties have been resolved.” (Related: tyranny of small decisions — “a situation where a series of small, individually rational decisions can negatively change the context of subsequent choices, even to the point where desired alternatives are irreversibly destroyed.”, boiling frog — “an anecdote describing a frog slowly being boiled alive.”, path dependence)
• (2) Precautionary principle — “If an action or policy has a suspected risk of causing harm to the public, or to the environment, in the absence of scientific consensus (that the action or policy is not harmful), the burden of proof that it is not harmful falls on those taking an action that may or may not be a risk.”
• (2) Short-termism — “Short-termism refers to an excessive focus on short-term results at the expense of long-term interests.”

Managing

• (1) Weekly 1–1s — “1–1’s can add a whole new level of speed and agility to your company.”
• (1) Forcing function — “A forcing function is any task, activity or event that forces you to take action and produce a result.”
• (1) Pygmalion effect — “The phenomenon whereby higher expectations lead to an increase in performance.” (Related: market pull technology policy — where the government sets future standards beyond what the current market can deliver, and the market pulls that technology into existence.)
• (1) Virtual team — “A group of individuals who work across time, space and organizational boundaries with links strengthened by webs of communication technology.” At least in some circumstances, it is possible to have a completely virtual team. The downsides in lack of face-to-face communication can be outweighed by the upsides in sourcing from the entire world.


• (2) Introversion vs extraversion — “Extraversion tends to be manifested in outgoing, talkative, energetic behavior, whereas introversion is manifested in more reserved and solitary behavior. Virtually all comprehensive models of personality include these concepts in various forms.”
• (2) IQ vs EQ — “IQ is a total score derived from one of several standardized tests designed to assess human intelligence.” “EQ is the capacity of individuals to recognize their own, and other people’s emotions, to discriminate between different feelings and label them appropriately, and to use emotional information to guide thinking and behavior.”
• (2) Growth mindset vs fixed mindset — “Those with a ‘fixed mindset’ believe that abilities are mostly innate and interpret failure as the lack of necessary basic abilities, while those with a ‘growth mindset’ believe that they can acquire any given ability provided they invest effort or study.”
• (2) Hindsight bias — “The inclination, after an event has occurred, to see the event as having been predictable, despite there having been little or no objective basis for predicting it.” (Related: Pollyanna principle — “tendency for people to remember pleasant items more accurately than unpleasant ones”)
• (2) Organizational debt — “All the people/culture compromises made to ‘just get it done’ in the early stages of a startup.”
• (2) Generalist vs specialist — “A generalist is a person with a wide array of knowledge, the opposite of which is a specialist.” (Related: hedgehog vs fox — “A fox knows many things, but a hedgehog knows one important thing.”)
• (2) Consequence vs conviction — “Where there is low consequence and you have very low confidence in your own opinion, you should absolutely delegate. And delegate completely, let people make mistakes and learn. On the other side, obviously where the consequences are dramatic and you have extremely high conviction that you are right, you actually can’t let your junior colleague make a mistake.”
• (3) High-context vs low-context culture — “In a higher-context culture, many things are left unsaid, letting the culture explain. Words and word choice become very important in higher-context communication, since a few words can communicate a complex message very effectively to an in-group (but less effectively outside that group), while in a low-context culture, the communicator needs to be much more explicit and the value of a single word is less important.”
• (3) Peter principle — “The selection of a candidate for a position is based on the candidate’s performance in their current role, rather than on abilities relevant to the intended role. Thus, employees only stop being promoted once they can no longer perform effectively, and ‘managers rise to the level of their incompetence.’”
• (3) Maslow’s hierarchy of needs — “Maslow used the terms ‘physiological’, ‘safety’, ‘belongingness’ and ‘love’, ‘esteem’, ‘self-actualization’, and ‘self-transcendence’ to describe the pattern that human motivations generally move through… [though there is] little evidence for the ranking of needs that Maslow described or for the existence of a definite hierarchy at all.”
• (3) Loyalists vs mercenaries — “There are highly loyal teams that can withstand almost anything and remain steadfastly behind their leader. And there are teams that are entirely mercenary and will walk out without thinking twice about it.”
• (3) Dunbar’s number — “A suggested cognitive limit to the number of people with whom one can maintain stable social relationships… with a commonly used value of 150.”
• (3) Zero tolerance — “Strict punishment for infractions of a stated rule, with the intention of eliminating undesirable conduct.”
• (3) Commandos vs Infantry vs Police — “Three distinct groups of people that define the lifetime of a company: Commandos, Infantry, and Police: Whether invading countries or markets, the first wave of troops to see battle are the commandos…Grouping offshore as the commandos do their work is the second wave of soldiers, the infantry…But there is still a need for a military presence in the territory they leave behind, which they have liberated. These third-wave troops hate change. They aren’t troops at all but police.”

Developing

• (1) Technical debt — “A concept in programming that reflects the extra development work that arises when code that is easy to implement in the short run is used instead of applying the best overall solution.”
• (1) Binary search — “A search algorithm that finds the position of a target value within a sorted array. It compares the target value to the middle element of the array; if they are unequal, the half in which the target cannot lie is eliminated and the search continues on the remaining half until it is successful.” (Related: debugging)
• (1) Divide and conquer — “Recursively breaking down a problem into two or more sub-problems of the same or related type, until these become simple enough to be solved directly. The solutions to the sub-problems are then combined to give a solution to the original problem.”
• (1) Design pattern — “The re-usable form of a solution to a design problem.” (Related: anti-pattern — “a common response to a recurring problem that is usually ineffective and risks being highly counterproductive.”, dark pattern — “user interfaces designed to trick people.”)
• (3) Zawinski’s law — “Every program attempts to expand until it can read mail. Those programs which cannot so expand are replaced by ones which can.” (Related: Greenspun’s tenth rule — “any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.”)


• (3) Moore’s law — “The observation that the number of transistors in a dense integrated circuit doubles approximately every two years.”
• (3) Metcalfe’s law — “The value of a telecommunications network is proportional to the square of the number of connected users of the system…Within the context of social networks, many, including Metcalfe himself, have proposed modified models using (n × log n) proportionality rather than n^2 proportionality.”
• (3) Clarke’s third law — “Any sufficiently advanced technology is indistinguishable from magic.”

Business

• (1) Minimum viable product — “A product with just enough features to gather validated learning about the product and its continued development.” (Related: perfect is the enemy of good)
• (2) Capital allocation options — “Five capital allocation choices CEOs have: 1) invest in existing operations; 2) acquire other businesses; 3) issue dividends; 4) pay down debt; 5) repurchase stock. Along with this, they have three means of generating capital: 1) internal/operational cash flow; 2) debt issuance; 3) equity issuance.”
• (3) Luck surface area — “When you do something you’re excited about you will naturally pull others into your orbit. And the more people with whom you share your passion, the more who will be pulled into your orbit.”
• (3) Hunting elephants vs flies — “Salespeople sometimes refer to ‘elephants’, ‘deers’ and ‘rabbits’ when they talk about the first three categories of customers. To extend the metaphor to the 4th and 5th type of customer, let’s call them ‘mice’ and ‘flies’. So how can you hunt 1,000 elephants, 10,000 deer, 100,000 rabbits, 1,000,000 mice or 10,000,000 flies?” (Related: brontosaurus, whale, and microbe)
• (3) Secrets — “Every one of today’s most famous and familiar ideas was once unknown and unsuspected…There are many more secrets left to find, but they will yield only to relentless searchers.”
• (3) Strategic acquisition vs financial acquisition vs acquihire — Different motivations for an acquiring company typically have significantly different valuation models. (Related: rollup — “a technique used by investors (commonly private equity firms) where multiple small companies in the same market are acquired and merged.”, P/E-driven acquisitions, auction)

Influencing

• (1) Framing — “With the same information being used as a base, the ‘frame’ surrounding the issue can change the reader’s perception without having to alter the actual facts.” (Related: anchoring)
• (2) Cialdini’s six principles of influence — Reciprocity (“People tend to return a favor.”), Commitment (“If people commit…they are more likely to honor that commitment.”), Social Proof (“People will do things they see other people are doing.”), Authority (“People will tend to obey authority figures.”), Liking (“People are easily persuaded by other people they like.”), and Scarcity (“Perceived scarcity will generate demand”). (Related: foot-in-the-door technique)
• (3) Paradox of choice — “Eliminating consumer choices can greatly reduce anxiety for shoppers.” (Related: Hick’s law, “increasing the number of choices will increase the decision time logarithmically.”)
• (3) Major vs minor chords — “In Western music, a minor chord, in comparison, ‘sounds darker than a major chord.’”
• (3) Coda — “A term used in music primarily to designate a passage that brings a piece to an end.” (Related: CTA.) People psychologically expect codas, and so they can be used for influence.

Marketing

• (1) Bullseye framework — “With nineteen traction channels to consider, figuring out which one to focus on is tough. That’s why we’ve created a simple framework called Bullseye that will help you find the channel that will get you traction.”
• (1) Technology adoption lifecycle — “Describes the adoption or acceptance of a new product or innovation, according to the demographic and psychological characteristics of defined adopter groups. The process of adoption over time is typically illustrated as a classical normal distribution or ‘bell curve’. The model indicates that the first group of people to use a new product is called ‘innovators’, followed by ‘early adopters’. Next come the early majority and late majority, and the last group to eventually adopt a product are called ‘laggards’.” (Related: S-curve, crossing the chasm, installation period vs deployment period)
• (3) Jobs to be done — “Consumers usually don’t go about their shopping by conforming to particular segments. Rather, they take life as it comes. And when faced with a job that needs doing, they essentially ‘hire’ a product to do that job.”
• (3) Fear, uncertainty, and doubt (FUD) — “A disinformation strategy used in sales, marketing, public relations, politics and propaganda. FUD is generally a strategy to influence perception by disseminating negative and dubious or false information, and a manifestation of the appeal to fear.”

Competing

• (2) Supply and demand — “An economic model of price determination in a market. It concludes that in a competitive market, the unit price for a particular good, or other traded item such as labor or liquid financial assets, will vary until it settles at a point where the quantity demanded (at the current price) will equal the quantity supplied (at the current price), resulting in an economic equilibrium for price and quantity transacted.” (Related: perfect competition, arbitrage — “the practice of taking advantage of a price difference between two or more markets.”)
• (2) Winner take all market — A market that tends towards one dominant player. (Related: lock-in, monopoly, monopsony)
• (2) Two-sided market — “Economic platforms having two distinct user groups that provide each other with network benefits.”
• (3) Barriers to entry — “A cost that must be incurred by a new entrant into a market that incumbents don’t or haven’t had to incur.”
• (3) Price elasticity — “The measurement of how responsive an economic variable is to a change in another. It gives answers to questions such as ‘If I lower the price of a product, how much more will sell?’” (Related: Giffen good — “a product that people consume more of as the price rises and vice versa.”)
• (3) Market power — “The ability of a firm to profitably raise the market price of a good or service over marginal cost.”
• (3) Conspicuous consumption — “The spending of money on and the acquiring of luxury goods and services to publicly display economic power.” (Related: Veblen goods — “types of luxury goods, such as expensive wines, jewelry, fashion-designer handbags, and luxury cars, which are in demand because of the high prices asked for them.”)
• (3) Comparative advantage — “An agent has a comparative advantage over another in producing a particular good if they can produce that good at a lower relative opportunity cost or autarky price, i.e. at a lower relative marginal cost prior to trade.”
• (3) Creative destruction — “Process of industrial mutation that incessantly revolutionizes the economic structure from within, incessantly destroying the old one, incessantly creating a new one.”

Strategizing

• (1) Sustainable competitive advantage — Structural factors that allow a firm to outcompete its rivals for many years.
• (1) Core competency — “A harmonized combination of multiple resources and skills that distinguish a firm in the marketplace.” (Related: circle of competence — “you don’t have to be an expert on every company, or even many. You only have to be able to evaluate companies within your circle of competence. The size of that circle is not very important; knowing its boundaries, however, is vital.”)
• (1) Strategy vs tactics — Sun Tzu: “Strategy without tactics is the slowest route to victory. Tactics without strategy is the noise before defeat.”
• (1) Sphere of influence — “A spatial region or concept division over which a state or organization has a level of cultural, economic, military, or political exclusivity, accommodating to the interests of powers outside the borders of the state that controls it.”


• (2) Unknown unknowns — “Known unknowns refers to ‘risks you are aware of, such as cancelled flights…’ Unknown unknowns are risks that ‘come from situations that are so out of this world that they never occur to you.’”
• (2) Switching costs — “The costs associated with switching suppliers.”
• (3) Network effect — “The effect that one user of a good or service has on the value of that product to other people. When a network effect is present, the value of a product or service is dependent on the number of others using it.”
• (3) Economies of scale — “The cost advantages that enterprises obtain due to size, output, or scale of operation, with cost per unit of output generally decreasing with increasing scale as fixed costs are spread out over more units of output.”

Military

• (3) Two-front war — “A war in which fighting takes place on two geographically separate fronts.”
• (3) Flypaper theory — “The idea that it is desirable to draw enemies to a single area, where it is easier to kill them and they are far from one’s own vulnerabilities.” (Related: honeypot)
• (3) Fighting the last war — Using strategies and tactics that worked successfully in the past, but are no longer as useful.
• (3) Rumsfeld’s rule — “You go to war with the Army you have. They’re not the Army you might want or wish to have at a later time.” (Related: Joy’s law — “no matter who you are, most of the smartest people work for someone else.”)
• (3) Trojan horse — “After a fruitless 10-year siege, the Greeks constructed a huge wooden horse, and hid a select force of men inside. The Greeks pretended to sail away, and the Trojans pulled the horse into their city as a victory trophy. That night the Greek force crept out of the horse and opened the gates for the rest of the Greek army, which had sailed back under cover of night. The Greeks entered and destroyed [the city].”
• (3) Empty fort strategy — “Involves using reverse psychology (and luck) to deceive the enemy into thinking that an empty location is full of traps and ambushes, and therefore induce the enemy to retreat.” (Related: vaporware — “a product, typically computer hardware or software, that is announced to the general public but is never actually manufactured nor officially cancelled.”)
• (3) Exit strategy — “A means of leaving one’s current situation, either after a predetermined objective has been achieved, or as a strategy to mitigate failure.”
• (3) Boots on the ground — “The belief that military success can only be achieved through the direct physical presence of troops in a conflict area.”
• (3) Winning hearts and minds — “In which one side seeks to prevail not by the use of superior force, but by making emotional or intellectual appeals to sway supporters of the other side.”


• (3) Mutually assured destruction — “In which a full-scale use of nuclear weapons by two or more opposing sides would cause the complete annihilation of both the attacker and the defender. It is based on the theory of deterrence, which holds that the threat of using strong weapons against the enemy prevents [the enemy’s use of those same weapons].” (Related: Mexican standoff, Zugzwang)
• (3) Containment — “A military strategy to stop the expansion of an enemy. It is best known as the Cold War policy of the United States and its allies to prevent the spread of communism abroad.”
• (3) Appeasement — “A diplomatic policy of making political or material concessions to an enemy power in order to avoid conflict.” (Related: Danegeld, extortion)
• (3) Winning a battle but losing the war — “A poor strategy that wins a lesser (or sub-) objective but overlooks and loses the true intended objective.” (Related: sacrifice play)
• (3) Beachhead — “A temporary line created when a military unit reaches a landing beach by sea and begins to defend the area while other reinforcements help out until a unit large enough to begin advancing has arrived.”
• (3) Proxy war — “A conflict between two nations where neither country directly engages the other.”

Sports

• (3) Hail Mary pass — “A very long forward pass in American football, made in desperation with only a small chance of success… has become generalized to refer to any last-ditch effort with little chance of success.”

Market failure

• (2) Social vs market norms — “People are happy to do things occasionally when they are not paid for them. In fact there are some situations in which work output is negatively affected by payment of small amounts of money.”
• (3) Information asymmetry — “The study of decisions in transactions where one party has more or better information than the other.” (Related: adverse selection — “when traders with better private information about the quality of a product will selectively participate in trades which benefit them the most”; moral hazard — “when one person takes more risks because someone else bears the cost of those risks.”)
• (3) Externalities — “An externality is the cost or benefit that affects a party who did not choose to incur that cost or benefit.” (Related: tragedy of the commons — “A situation within a shared-resource system where individual users acting independently according to their own self-interest behave contrary to the common good of all users by depleting that resource through their collective action”; free rider problem — “when those who benefit from resources, goods, or services do not pay for them, which results in an under-provision of those goods or services.”; Coase theorem — “if trade in an externality is possible and there are sufficiently low transaction costs, bargaining will lead to a Pareto efficient outcome regardless of the initial allocation of property.”)
• (3) Deadweight loss — “A loss of economic efficiency that can occur when equilibrium for a good or service is not achieved or is not achievable.”

Political failure

• (2) Chilling effect — “The inhibition or discouragement of the legitimate exercise of natural and legal rights by the threat of legal sanction…Outside of the legal context in common usage; any coercion or threat of coercion (or other unpleasantries) can have a chilling effect on a group of people regarding a specific behavior, and often can be statistically measured or be plainly observed.”
• (3) Regulatory capture — “When a regulatory agency, created to act in the public interest, instead advances the commercial or political concerns of special interest groups that dominate the industry or sector it is charged with regulating.” (Related: Shirky principle — “Institutions will try to preserve the problem to which they are the solution.”)
• (3) Duverger’s law — “A principle which states that plurality-rule elections (such as first past the post) structured within single-member districts tend to favor a two-party system, and that ‘the double ballot majority system and proportional representation tend to favor multipartism.’”
• (3) Arrow’s impossibility theorem — “When voters have three or more distinct alternatives (options), no ranked order voting system can convert the ranked preferences of individuals into a community-wide (complete and transitive) ranking while also meeting a pre-specified set of criteria.” (Related: approval voting)

Investing

• (2) Fear of missing out (FOMO) — “A pervasive apprehension that others might be having rewarding experiences from which one is absent.”
• (2) Preferred stock vs common stock — “Preferred stock is a type of stock which may have any combination of features not possessed by common stock including properties of both an equity and a debt instrument, and is generally considered a hybrid instrument.”
• (3) Margin of safety — “The difference between the intrinsic value of a stock and its market price.”
• (3) Investing vs speculation — “Typically, high-risk trades that are almost akin to gambling fall under the umbrella of speculation, whereas lower-risk investments based on fundamentals and analysis fall into the category of investing.”
• (3) Compound interest — “Interest on interest. It is the result of reinvesting interest, rather than paying it out, so that interest in the next period is then earned on the principal sum plus previously-accumulated interest.”
• (3) Inflation — “A sustained increase in the general price level of goods and services in an economy over a period of time.” (Related: real vs nominal value, hyperinflation, deflation, debasement)
• (3) Gross domestic product (GDP) — “A monetary measure of the market value of all final goods and services produced in a period (quarterly or yearly).”
• (3) Efficient-market hypothesis — “Asset prices fully reflect all available information…Investors, including the likes of Warren Buffett, and researchers have disputed the efficient-market hypothesis both empirically and theoretically.” (Related: alpha)
• (3) Purchasing power parity — “Allows one to estimate what the exchange rate between two currencies would have to be in order for the exchange to be at par with the purchasing power of the two countries’ currencies.”
• (3) Insider trading — “The trading of a public company’s stock or other securities (such as bonds or stock options) by individuals with access to nonpublic information about the company.”
• (3) Poison pill — “A type of defensive tactic used by a corporation’s board of directors against a takeover. Typically, such a plan gives shareholders the right to buy more shares at a discount if one shareholder buys a certain percentage or more of the company’s shares.” (Related: proxy fight)

Learning

• (1) Deliberate practice — “How expert one becomes at a skill has more to do with how one practices than with merely performing a skill a large number of times.”
• (3) Imposter syndrome — “High-achieving individuals marked by an inability to internalize their accomplishments and a persistent fear of being exposed as a ‘fraud’.”
• (3) Dunning-Kruger effect — “Relatively unskilled persons suffer illusory superiority, mistakenly assessing their ability to be much higher than it really is…[and] highly skilled individuals may underestimate their relative competence and may erroneously assume that tasks which are easy for them are also easy for others.” (Related: overconfidence effect)
• (3) Spacing effect — “The phenomenon whereby learning is greater when studying is spread out over time, as opposed to studying the same amount of time in a single session.”

Productivity

• (1) Focus on high-leverage activities — “Leverage should be the central, guiding metric that helps you determine where to focus your time.” (Related: Eisenhower decision matrix — “what is important is seldom urgent, and what is urgent is seldom important.”, “The best time to plant a tree was 20 years ago. The second best time is now.”, law of triviality — “members of an organisation give disproportionate weight to trivial issues.”)
• (1) Maker’s vs manager’s schedule — “When you’re operating on the maker’s schedule, meetings are a disaster.” (Related: deep work)


• (2) Murphy’s law — “Anything that can go wrong, will.” (Related: Hofstadter’s law, “It always takes longer than you expect, even when you take into account Hofstadter’s law.”)
• (3) Parkinson’s law — “Work expands so as to fill the time available for its completion.”
• (3) Gates’s law — “Most people overestimate what they can do in one year and underestimate what they can do in ten years.”

Nature

• (2) Nature vs nurture — “the relative importance of an individual’s innate qualities as compared to an individual’s personal experiences in causing individual differences, especially in behavioral traits.”
• (2) Chain reaction — “A sequence of reactions where a reactive product or by-product causes additional reactions to take place. In a chain reaction, positive feedback leads to a self-amplifying chain of events.” (Related: cascading failure, domino effect)
• (2) Filling a vacuum — A vacuum “is space void of matter.” Filling a vacuum refers to the fact that if a vacuum is put next to something with pressure, it will be quickly filled by the gas producing that pressure. (Related: power vacuum)
• (2) Emergence — “Whereby larger entities, patterns, and regularities arise through interactions among smaller or simpler entities that themselves do not exhibit such properties.” (Related: decentralized system, spontaneous order)
• (3) Natural selection — “The differential survival and reproduction of individuals due to differences in phenotype. It is a key mechanism of evolution, the change in heritable traits of a population over time.”
• (3) Butterfly effect — “The concept that small causes can have large effects.” (Related: bullwhip effect — “increasing swings in inventory in response to shifts in customer demand as you move further up the supply chain.”)
• (3) Sustainability — “The endurance of systems and processes.”
• (3) Peak oil — “The point in time when the maximum rate of extraction of petroleum is reached, after which it is expected to enter terminal decline.”

Philosophy

• (2) Consequentialism — “Holding that the consequences of one’s conduct are the ultimate basis for any judgment about the rightness or wrongness of that conduct.” (Related: “ends justify the means”)
• (2) Distributive justice vs procedural justice — “Procedural justice concerns the fairness and the transparency of the processes by which decisions are made, and may be contrasted with distributive justice (fairness in the distribution of rights or resources), and retributive justice (fairness in the punishment of wrongs).”
• (3) Effective altruism — “Encourages individuals to consider all causes and actions, and then act in the way that brings about the greatest positive impact, based on their values.”
• (3) Utilitarianism — “Holding that the best moral action is the one that maximizes utility.”
• (3) Agnosticism — “The view that the truth values of certain claims — especially metaphysical and religious claims such as whether God, the divine, or the supernatural exist — are unknown and perhaps unknowable.”
• (3) Veil of ignorance — “A method of determining the morality of a certain issue (e.g., slavery) based upon the following thought experiment: parties to the original position know nothing about the particular abilities, tastes, and positions individuals will have within a social order. When such parties are selecting the principles for distribution of rights, positions, and resources in the society in which they will live, the veil of ignorance prevents them from knowing who will receive a given distribution of rights, positions, and resources in that society.”

Internet

• (2) Filter bubble — “In which a website algorithm selectively guesses what information a user would like to see based on information about the user (such as location, past click behavior and search history) and, as a result, users become separated from information that disagrees with their viewpoints, effectively isolating them in their own cultural or ideological bubbles.” (Related: echo chamber)
• (2) Botnet — “A number of Internet-connected computers communicating with other similar machines in which components located on networked computers communicate and coordinate their actions by command and control (C&C) or by passing messages to one another.” (Related: flash mob)
• (2) Spamming — “The use of electronic messaging systems to send unsolicited messages (spam), especially advertising, as well as sending messages repeatedly on the same site.” (Related: phishing — “the attempt to acquire sensitive information such as usernames, passwords, and credit card details (and sometimes, indirectly, money), often for malicious reasons, by masquerading as a trustworthy entity in an electronic communication.”, clickjacking, social engineering)
• (3) Content farm — “large amounts of textual content which is specifically designed to satisfy algorithms for maximal retrieval by automated search engines.” (Related: click farm — “where a large group of low-paid workers are hired to click on paid advertising links for the click fraudster.”)
• (3) Micropayment — “A financial transaction involving a very small sum of money and usually one that occurs online.”
• (3) Godwin’s law — “If an online discussion (regardless of topic or scope) goes on long enough, sooner or later someone will compare someone or something to Hitler or Nazism.”

Reprinted with permission of the original author. First appeared as a working document at medium.com/@yegg.

Programming

How are zlib, gzip and Zip related? What do they have in common and how are they different?

By MARK ADLER


Short form

.zip is an archive format using, usually, the Deflate compression method. The .gz gzip format is for single files, also using the Deflate compression method. Often gzip is used in combination with tar to make a compressed archive format, .tar.gz. The zlib library provides Deflate compression and decompression code for use by zip, gzip, png (which uses the zlib wrapper on Deflate data), and many other applications.

Long form

The ZIP format was developed by Phil Katz as an open format with an open specification, where his implementation, PKZIP, was shareware. It is an archive format that stores files and their directory structure, where each file is individually compressed. The file type is .zip. The files, as well as the directory structure, can optionally be encrypted.

The ZIP format supports several compression methods:

0. The file is stored (no compression)
1. The file is Shrunk
2. The file is Reduced with compression factor 1
3. The file is Reduced with compression factor 2
4. The file is Reduced with compression factor 3
5. The file is Reduced with compression factor 4
6. The file is Imploded
7. Reserved for Tokenizing compression algorithm
8. The file is Deflated
9. Enhanced Deflating using Deflate64(tm)
10. PKWARE Data Compression Library Imploding (old IBM TERSE)
11. Reserved by PKWARE
12. File is compressed using BZIP2 algorithm
13. Reserved by PKWARE
14. LZMA (EFS)
15. Reserved by PKWARE
16. Reserved by PKWARE
17. Reserved by PKWARE
18. File is compressed using IBM TERSE (new)
19. IBM LZ77 z Architecture (PFS)
97. WavPack compressed data
98. PPMd version I, Rev 1

Methods 1 to 7 are historical and are not in use. Methods 9 through 98 are relatively recent additions, and are in varying, small amounts of use. The only method in truly widespread use in the ZIP format is method 8, Deflate, and to some smaller extent method 0, which is no compression at all. Virtually every .zip file that you will come across in the wild will use exclusively methods 8 and 0, and likely just method 8. (Method 8 also has the means to effectively store the data with no compression and relatively little expansion, and Method 0 cannot be streamed, whereas Method 8 can be.)

The ISO/IEC 21320-1:2015 standard for file containers is a restricted zip format, such as


used in Java archive files (.jar), Office Open XML files (Microsoft Office .docx, .xlsx, .pptx), OpenDocument Format files (.odt, .ods, .odp), and EPUB files (.epub). That standard limits the compression methods to 0 and 8, and adds other constraints, such as no encryption or signatures.

Around 1990, the Info-ZIP group wrote portable, free, open source implementations of zip and unzip utilities, supporting compression with the Deflate format, and decompression of that and the earlier formats. This greatly expanded the use of the .zip format.

In the early 90's, the gzip format was developed as a replacement for the Unix compress utility, derived from the Deflate code in the Info-ZIP utilities. Unix compress was designed to compress a single file or stream, appending a .Z to the file name. compress uses the LZW compression algorithm, which at the time was under patent and its free use was in dispute by the patent holders.

Though some specific implementations of Deflate were patented by Phil Katz, the format was not, and so it was possible to write a Deflate implementation that did not infringe on any patents. That implementation has not been so challenged in the last 20+ years.

The Unix gzip utility was intended as a drop-in replacement for compress, and in fact is able to decompress compress-compressed data (assuming that you were able to parse that sentence). gzip appends a .gz to the file name. gzip uses the Deflate compressed data format, which compresses quite a bit better than Unix compress, has very fast decompression, and adds a CRC-32 as an integrity check for the data.

The header format also permits the storage of more information than the compress format allowed, such as the original file name and the file modification time.

Though compress only compresses a single file, it was common to use the tar utility to create an archive of files, their attributes, and their directory structure into a single .tar file, and to then compress it with compress to make a .tar.Z file. In fact the tar utility had and still has an option to do the compression at the same time, instead of having to pipe the output of tar to compress. This all carried forward to the gzip format, and tar has an option to compress directly to the .tar.gz format.

The .tar.gz format compresses better than the .zip approach, since the compression of a .tar can take advantage of redundancy across files, especially many small files.

.tar.gz is the most common archive format in use on Unix due to its very high portability, but there are more effective compression methods in use as well, so you will often see .tar.bz2 and .tar.xz archives.

Unlike .tar, .zip has a central directory at the end, which provides a list of the contents. That and the separate compression provide random access to


the individual entries in a .zip file. A .tar file would have to be decompressed and scanned from start to end in order to build a directory, which is how a .tar file is listed.

Shortly after the introduction of gzip, around the mid-1990's, the same patent dispute called into question the free use of the .gif image format, which was very widely used on bulletin boards and the World Wide Web (a new thing at the time). So a small group created the PNG lossless compressed image format, with file type .png, to replace .gif. That format also uses the Deflate format for compression, which is applied after filters on the image data expose more of the redundancy.

In order to promote widespread usage of the PNG format, two free code libraries were created: libpng and zlib. libpng handled all of the features of the PNG format, and zlib provided the compression and decompression code for use by libpng, as well as for other applications. zlib was adapted from the gzip code.

All of the mentioned patents have since expired.

The zlib library supports Deflate compression and decompression, and three kinds of wrapping around the Deflate streams. They are: no wrapping at all ("raw" Deflate); zlib wrapping, which is used in the PNG format data blocks; and gzip wrapping, to provide gzip routines.

The main difference between zlib and gzip wrapping is that the zlib wrapping is more compact, with six bytes vs. a minimum of 18 bytes for gzip, and the integrity check, Adler-32, runs faster than the CRC-32 that gzip uses.

Raw Deflate is used by programs that read and write the .zip format, which is another format that wraps around Deflate compressed data.

zlib is now in widespread use for data transmission and storage. For example, most HTTP transactions by servers and browsers compress and decompress the data using zlib.

Different implementations of Deflate can result in different compressed output for the same input data, as evidenced by the existence of selectable compression levels that allow trading off compression effectiveness for CPU time.

zlib and PKZIP are not the only implementations of Deflate compression and decompression. Both the 7-Zip archiving utility and Google's zopfli library have the ability to use much more CPU time than zlib in order to squeeze out the last few bits possible when using the Deflate format, reducing compressed sizes by a few percent as compared to zlib's highest compression level.

The pigz utility, a parallel implementation of gzip, includes the option to use zlib (compression levels 1-9) or zopfli (compression level 11), and somewhat mitigates the time impact of using zopfli by splitting the compression of large files over multiple processors and cores.
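The three kinds of wrapping can be compared directly with the zlib module that ships with Python. The sketch below is only an illustration and not part of the original answer: the sample data and the helper name deflate are made up, and it relies on the module's documented wbits convention (negative for raw Deflate, 8 to 15 for the zlib wrapper, 16 added for the gzip wrapper).

```python
import struct
import zlib

# Made-up sample input; any bytes would do.
data = b"How are zlib, gzip and Zip related? " * 200


def deflate(payload: bytes, wbits: int) -> bytes:
    """Compress payload at level 9, with the wrapper selected by wbits."""
    compressor = zlib.compressobj(9, zlib.DEFLATED, wbits)
    return compressor.compress(payload) + compressor.flush()


WRAPPERS = {"raw": -15, "zlib": 15, "gzip": 16 + 15}
streams = {name: deflate(data, wbits) for name, wbits in WRAPPERS.items()}

# Bytes added by each wrapper on top of the raw Deflate stream.
zlib_overhead = len(streams["zlib"]) - len(streams["raw"])
gzip_overhead = len(streams["gzip"]) - len(streams["raw"])
print(zlib_overhead, gzip_overhead)

# The zlib wrapper ends with a big-endian Adler-32 of the uncompressed data;
# the gzip wrapper ends with a little-endian CRC-32 followed by the length.
adler_ok = streams["zlib"][-4:] == struct.pack(">I", zlib.adler32(data))
crc_ok = streams["gzip"][-8:-4] == struct.pack("<I", zlib.crc32(data))

# Each stream decompresses with the same wbits value that produced it.
roundtrip_ok = all(
    zlib.decompress(streams[name], wbits) == data
    for name, wbits in WRAPPERS.items()
)
print(adler_ok, crc_ok, roundtrip_ok)
```

With a default gzip header (no file name or other optional fields), the two overheads should come out as the six and 18 bytes quoted above.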

Reprinted with permission of the original author. First appeared at stackoverflow.com/questions/.

Data

Boosting sales with machine learning How we use natural language processing to qualify leads

By PER HARALD BORGEN

In this article, I’ll explain how we’re making our sales process at Xeneta more effective by training a machine learning algorithm to predict the quality of our leads based upon their company descriptions.

Head over to GitHub if you want to check out the script immediately, and feel free to suggest improvements as it’s under continuous development.

An example of a list of potential companies to contact, pulled from sec.gov

The problem

It started with a request from business development representative Edvard, who was tired of performing the tedious task of going through big Excel sheets filled with company names, and trying to identify which ones we ought to contact.

This kind of pre-qualification of sales leads can take hours, as it forces the sales representative to figure out what every single company does (e.g. through reading about them on LinkedIn) so that he/she can make a qualified guess at whether or not the company is a good fit for our SaaS app.

And how do you make a qualified guess? To understand that, you’ll first need to know what we do:

In essence, Xeneta helps companies that ship containers discover saving potential by providing sea freight market intelligence.

More specifically, if your company ships above 500 containers per year, you’re likely to discover significant saving potential by using Xeneta, as we’re able to tell you exactly where you’re paying above the market average price.

This customer had a 748K USD saving potential down to market average on its sea freight spend.

This means that our target customers are vastly different from each other, as their only common denominator is that they’re somewhat involved in sea freight. Here are some examples of company categories we target:

• Automotive
• Freight forwarding
• Chemicals
• Consumer & retail
• Low paying commodities

This widget compares a customer's contracted rate (purple line) to the market average (green graph) for 20 foot containers from China to Northern Europe.


The hypothesis

Although the broad range of customers represents a challenge when finding leads, we’re normally able to tell if a company is of interest for Xeneta by reading their company description, as it often contains hints of whether or not they’re involved in sending stuff around the world.

This made us think:

Given a company description, can we train an algorithm to predict whether or not it’s a potential Xeneta customer?

If so, this algorithm could prove to be a huge time saver for the sales team, as it could roughly sort the Excel sheets before they start qualifying the leads manually.

The development

As I started working on this, I quickly realised that the machine learning part wasn’t the only problem – we also needed a way to get hold of the company descriptions.

We considered crawling the companies’ websites to fetch the About us section. But this smelled like a messy, unpredictable and time consuming activity, so we started looking for APIs to use instead.

After some searching we discovered FullContact, which has a Company API that provides you with descriptions of millions of companies.

However, their API only accepts company URLs as inputs, which are rarely present in our Excel sheets.

So we had to find a way to obtain the URLs as well, which made us land on the following workflow:

• Use the Google API to google the company name (hacky, I know…)
• Loop through the search results and find the most likely correct URL
• Use this URL to query the FullContact API

There’s of course a loss at each step here, so we’re going to find a better way of doing this. However, this worked well enough to test the idea out.

The dataset

Having these scripts in place, the next step was to create our training dataset. It needed to contain at least 1,000 qualified companies and 1,000 disqualified companies.

The first category was easy, as we could simply export a list of 1,000 Xeneta users from SalesForce.

Finding 1,000 disqualified companies was a bit tougher though, as we don’t keep track of the companies we’ve avoided contacting. So Edvard manually disqualified 1,000 companies.

Cleaning the data

With that done, it was time to start writing the natural language processing script, with step one being to clean up the descriptions, as they are quite dirty and contain a lot of irrelevant information.

In the examples below, I’ll go through each of the cleaning techniques we’re currently applying, and show you how a raw description ends up as an array of numbers.

An example of a raw description:

Leading provider of business solutions for the global insurance industry. TIA Technology A/S is a software company that develops leading edge business solutions for the global insurance industry.

RegExp

The first thing we do is to use regular expressions to get rid of non-alphabetical characters, as our model will only be able to learn words.

    import re

    # replace every character that is not a letter with a space
    description = re.sub("[^a-zA-Z]", " ", description)

After removing non-alphabetical characters:

Leading provider of business solutions for the global insurance industry TIA Technology A S is a software company that develops leading edge business solutions for the global insurance industry

Stemmer

We also stem the words. This means reducing multiple variations of the same word to its stem. So instead of accepting words like manufacturer, manufacture, manufactured & manufacturing, we rather simplify them to a single stem, roughly manufactur.

    from nltk.stem.snowball import SnowballStemmer

    stemmer = SnowballStemmer('english')

    # getDescription() is our own helper; the loop below needs a list of words
    description = getDescription()

    description = [stemmer.stem(word) for word in description]

After stemming the words:

lead provid of busi solut for the global insur industri tia technolog a s is a softwar compani that develop lead edg busi solut for the global insur industri

Stop words

We then remove stop words, using Natural Language Toolkit. Stop words are words that have little relevance for the conceptual understanding of the text, such as is, to, for, at, I, it, etc.

    from nltk.corpus import stopwords

    stopWords = set(stopwords.words('english'))

    description = getDescription()

    description = [word for word in description if word not in stopWords]

After removing stop words:

lead provid busi solut global insur industri tia technolog softwar compani develop lead edg busi solut global insur industri

Transforming the data

But cleaning and stemming the data won’t actually help us do any machine learning, as we also need to transform the descriptions into something the machine understands, which is numbers.

Bag of Words

For this, we’re using the Bag of Words (BoW) approach. If you’re not familiar with BoW, I’d recommend you read this Kaggle tutorial.

BoW is a simple technique to turn text phrases into vectors, where each item in the vectors represents a specific word. Scikit learn’s CountVectorizer gives you a super simple way to do this:

    from sklearn.feature_extraction.text import CountVectorizer

    vectorizer = CountVectorizer(analyzer='word', max_features=5000)
    vectorizer.fit(training_data)
    vectorized_training_data = vectorizer.transform(training_data)

The max_features parameter tells the vectorizer how many words you want to have in the vocabulary. In this example, the vectorizer will include the 5000 words that occur most frequently in our dataset and reject the rest of them.

An example of a very small (35 items) Bag of Words vector (ours is 5K items long):

[0 0 0 0 2 0 0 0 1 1 0 0 0 0 2 0 1 0 1 1 0 0 0 1 0 0 0 0 1 0 2 0 0 2 0]

Tf-idf transformation

Finally, we also apply a tf-idf transformation, which is short for term frequency inverse document frequency. It’s a technique that adjusts the importance of the different words in your documents.

More specifically, tf-idf will emphasise words that occur frequently in a description (term frequency), while de-emphasising words that occur frequently in the entire dataset (inverse document frequency).

    from sklearn.feature_extraction.text import TfidfTransformer

    tfidf = TfidfTransformer(norm='l1')
    tfidf.fit(vectorized_training_data)
    tfidf_vectorized_data = tfidf.transform(vectorized_training_data)

Again, scikit learn saves the day by providing tf-idf out of the box. Simply fit the model to your vectorized training data, and then use the transform method to transform it. (See figure 1)

Figure 1: The vector after applying tf-idf (sorry about the bad formatting)

The algorithm

After all the data has been cleaned, vectorised and transformed, we can finally start doing some machine learning, which is one of the simplest parts of this task.

I first sliced the data into 70% training data and 30% testing data, and then started off with two scikit learn algorithms: Random Forest (RF) and K Nearest Neighbors (KNN).

It quickly became clear that RF outperformed KNN, as the former quickly reached more than 80% accuracy while the latter stayed at 60%. Fitting a scikit learn model is

    from sklearn.ensemble import RandomForestClassifier

    def runForest(X_train, X_test, Y_train, Y_test):
        # train on the training split, then report accuracy on the test split
        forest = RandomForestClassifier(n_estimators=100)
        forest = forest.fit(X_train, Y_train)
        score = forest.score(X_test, Y_test)
        return score

    forest_score = runForest(X_train, X_test, Y_train, Y_test)

Figure 2: Fitting a scikit learn model

super simple. (See figure 2)

So I continued with RF to see how much I could increase the accuracy by tuning the following parameters:

• Vocabulary: how many words the CountVectorizer includes in the vocabulary (currently 5K)
• Gram Range: size of phrases to include in Bag Of Words (currently 1–3, meaning phrases of up to 3 words)
• Estimators: number of estimators to include in Random Forest (currently 90)

With these parameters tuned, the algorithm reaches an accuracy of 86.4% on the testing dataset, and is actually starting to become useful for our sales team.

The road ahead

However, the script is by no means finished. There are tons of ways to improve it. For example, the algorithm is likely to be biased towards the kind of descriptions we currently have in our training data. This might become a performance bottleneck when testing it on more real world data.

Here are a few activities we’re considering for the road ahead:

• Get more data (scraping, other APIs, improve data cleaning)
• Test other types of data transformation (e.g. word2vec)
• Test other ML algorithms (e.g. neural nets)

We’ll be pushing to GitHub regularly if you want to follow the progress. And feel free to leave a comment below if you have anything you’d like to add.

Thanks for reading! We are Xeneta, the world’s leading sea freight intelligence platform. We’re always looking for bright minds to join us, so head over to our website if you’re interested! You can follow us at both Twitter and Medium.
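To make the journey from a raw description to an array of numbers concrete, here is a minimal, dependency-free sketch of the steps described in this article: regex cleaning, stop word removal, Bag of Words counting and an l1-normalised tf-idf weighting. It is not the Xeneta script. The function names and the tiny stop word list are hypothetical, stemming is left out because it needs NLTK, the example description is split into two "documents" purely for illustration, and the idf formula is the smoothed variant documented for scikit-learn's TfidfTransformer.

```python
import math
import re
from collections import Counter

# Hypothetical mini stop word list; the article uses NLTK's English list.
STOP_WORDS = {"of", "for", "the", "is", "a", "that", "to", "at", "i", "it", "s"}


def clean(description):
    """Replace non-letters with spaces, lowercase, split, drop stop words."""
    letters_only = re.sub("[^a-zA-Z]", " ", description)
    return [w for w in letters_only.lower().split() if w not in STOP_WORDS]


def bag_of_words(docs, max_features):
    """Count vectors over the max_features most frequent words."""
    totals = Counter(word for doc in docs for word in doc)
    most_frequent = sorted(totals, key=lambda w: (-totals[w], w))[:max_features]
    vocabulary = sorted(most_frequent)
    vectors = [[doc.count(word) for word in vocabulary] for doc in docs]
    return vocabulary, vectors


def tfidf_l1(vectors):
    """Weight counts by idf = ln((1 + n) / (1 + df)) + 1, then l1-normalise."""
    n_docs = len(vectors)
    n_terms = len(vectors[0])
    df = [sum(1 for v in vectors if v[j] > 0) for j in range(n_terms)]
    idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]
    rows = []
    for vector in vectors:
        weighted = [count * weight for count, weight in zip(vector, idf)]
        total = sum(weighted) or 1.0
        rows.append([value / total for value in weighted])
    return rows


descriptions = [
    "Leading provider of business solutions for the global insurance industry.",
    "TIA Technology A/S is a software company that develops leading edge "
    "business solutions for the global insurance industry.",
]
docs = [clean(d) for d in descriptions]
vocabulary, counts = bag_of_words(docs, max_features=5000)
weights = tfidf_l1(counts)

print(vocabulary)
print(counts[0])
print([round(value, 3) for value in weights[0]])
```

In the first row, "provider" occurs in only one of the two documents, so it ends up with a larger weight than "business", which occurs in both. That is the de-emphasis of common words that the tf-idf section describes.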

Reprinted with permission of the original author. First appeared at medium.com/xeneta/.

Programming

We built voice modulation to mask gender in technical interviews. Here’s what happened.

By ALINE LERNER


Interviewing.io is a platform where people can practice technical interviewing anonymously and in the process, find jobs based on their interview performance rather than their resumes.

Since we started, we’ve amassed data from thousands of technical interviews, and in this blog, we routinely share some of the surprising stuff we’ve learned. In this post, I’ll talk about what happened when we built real-time voice masking to investigate the magnitude of bias against women in technical interviews.

In short, we made men sound like women and women sound like men, and looked at how that affected their interview performance. We also looked at what happened when women did poorly in interviews, how drastically that differed from men’s behavior, and why that difference matters for the thorny issue of the gender gap in tech.

The setup

When an interviewer and an interviewee match on our platform, they meet in a collaborative coding environment with voice, text chat, and a whiteboard and jump right into a technical question.

Interview questions on the platform tend to fall into the category of what you’d encounter at a phone screen for a back-end software engineering role, and interviewers typically come from a mix of large companies like Google, Facebook, Twitch, and Yelp, as well as engineering-focused startups like Asana, Mattermark, and others.

After every interview, interviewers rate interviewees on a few different dimensions.

Feedback form for interviewers

As you can see, we ask the interviewer if they would advance their interviewee to the next round. We also ask about a few different aspects of interview performance using a 1-4 scale. On our platform, a score of 3 or above is generally considered good.

Women historically haven’t performed as well as men…

One of the big motivators to think about voice masking was the increasingly uncomfortable disparity in interview performance on the platform between


men and women.[1] At that time, we had amassed over a thousand interviews with enough data to do some comparisons and were surprised to discover that women really were doing worse.

Specifically, men were getting advanced to the next round 1.4 times more often than women. Interviewee technical score wasn’t faring that well either: men on the platform had an average technical score of 3 out of 4, as compared to a 2.5 out of 4 for women.

Despite these numbers, it was really difficult for me to believe that women were just somehow worse at computers, so when some of our customers asked us to build voice masking to see if that would make a difference in the conversion rates of female candidates, we didn’t need much convincing.

… so we built voice masking

Since we started working on interviewing.io, in order to achieve true interviewee anonymity, we knew that hiding gender would be something we’d have to deal with eventually but we put it off for a while because it wasn’t technically trivial to build a real-time voice modulator. Some early ideas included sending female users a Bane mask.

When the Bane mask thing didn’t work out, we decided we ought to build something within the app, and if you play the videos below, you can get an idea of what voice masking on interviewing.io sounds like. In the first one, I’m talking in my normal voice.

And in the second one, I’m modulated to sound like a man.[2]

Armed with the ability to hide gender during technical interviews, we were eager to see what the hell was going on and get some insight into why women were consistently underperforming.

The experiment

The setup for our experiment was simple. Every Tuesday evening at 7 PM Pacific, interviewing.io hosts what we call practice rounds. In these practice rounds, anyone with an account can show up, get matched with an interviewer, and go to town. And during a few of these rounds, we decided to see what would happen to interviewees’ performance when we started messing with their perceived genders.

In the spirit of not giving away what we were doing and potentially compromising the experiment, we told both interviewees and interviewers that we were slowly rolling out our new voice masking feature and that

they could opt in or out of helping us test it out. Most people opted in, and we informed interviewees that their voice might be masked during a given round and asked them to refrain from sharing their gender with their interviewers. For interviewers, we simply told them that interviewee voices might sound a bit processed.

We ended up with 234 total interviews (roughly 2/3 male and 1/3 female interviewees), which fell into one of three categories:

• Completely unmodulated (useful as a baseline)
• Modulated without pitch change
• Modulated with pitch change

You might ask why we included the second condition, i.e. modulated interviews that didn’t change the interviewee’s pitch. As you probably noticed, if you played the videos above, the modulated one sounds fairly processed.

The last thing we wanted was for interviewers to assume that any processed-sounding interviewee must summarily have been the opposite gender of what they sounded like. So we threw that condition in as a further control.

The results

After running the experiment, we ended up with some rather surprising results. Contrary to what we expected (and probably contrary to what you expected as well!), masking gender had no effect on interview performance with respect to any of the scoring criteria (would advance to next round, technical ability, problem solving ability).

If anything, we started to notice some trends in the opposite direction of what we expected:

for technical ability, it appeared that men who were modulated to sound like women did a bit better than unmodulated men, and that women who were modulated to sound like men did a bit worse than unmodulated women. Though these trends weren’t statistically significant, I am mentioning them because they were unexpected and definitely something to watch for as we collect more data.

On the subject of sample size, we have no delusions that this is the be-all and end-all of pronouncements on the subject of gender and interview performance. We’ll continue to monitor the data as we collect more of it, and it’s very possible that as we do, everything we’ve found will be overturned.

I will say, though, that had there been any staggering gender bias on the platform, with a few hundred data points, we would have gotten some kind of result. So that, at least, was encouraging.

So if there’s no systemic bias, why are women performing worse?

After the experiment was over, I was left scratching my head. If the issue wasn’t interviewer bias, what could it be? I went back and looked at the seniority levels of men vs. women on the platform as well as the kind of work they were doing in their current jobs, and neither of those factors seemed to differ significantly between groups.


But there was one nagging thing in the back of my mind. I spend a lot of my time poring over interview data, and I had noticed something peculiar when observing the behavior of female interviewees. Anecdotally, it seemed like women were leaving the platform a lot more often than men. So I ran the numbers.

What I learned was pretty shocking. As it happens, women leave interviewing.io roughly 7 times as often as men after they do badly in an interview. And the numbers for two bad interviews aren’t much better.

You can see the breakdown of attrition by gender below (the differences between men and women are indeed statistically significant with P < 0.00001).

Also note that as much as possible, I corrected for people leaving the platform because they found a job (practicing interviewing isn’t that fun after all, so you’re probably only going to do it if you’re still looking), were just trying out the platform out of curiosity, or they didn’t like something else about their interviewing.io experience.

A totally speculative thought experiment

So, if these are the kinds of behaviors that happen in the interviewing.io microcosm, how much is applicable to the broader world of software engineering? Please bear with me as I wax hypothetical and try to extrapolate what we’ve seen here to our industry at large.

And also, please know that what follows is very speculative, based on not that much data, and could be totally wrong… but you gotta start somewhere.

If you consider the attrition data points above, you might want to do what any reasonable person would do in the face of an existential or moral quandary, i.e. fit the data to a curve. An exponential decay curve seemed reasonable for attrition behavior, and you can see what I came up with below.

The x-axis is the number of what I like to call “attrition events”, namely things that might happen to you over the course of your computer science studies and subsequent career that might make you want to quit.

The y-axis is what portion of people are left after each attrition event. The red curve denotes women, and the blue curve denotes men.

Now, as I said, this is pretty speculative, but it really got me thinking about what these curves might mean in the broader context of women in computer science. How many “attrition events” does one encounter between primary and secondary


education, and entering a collegiate program in CS and then starting to embark on a career? So, I don’t know, let’s say there are 8 of these events between getting into programming and looking around for a job. If that’s true, then we need 3 times as many women as men studying computer science to get to the same number in our pipelines.

Note that that’s 3 times more than men, not 3 times more than there are now. If we think about how many there are now, which, depending on your source, is between 1/3 and 1/4 of the number of men, to get to pipeline parity, we actually have to increase the number of women studying computer science by an entire order of magnitude.

Prior art, or why maybe this isn’t so nuts after all

Since gathering these findings and starting to talk about them a bit in the community, I began to realize that there was some supremely interesting academic work being done on gender differences around self-perception, confidence, and performance.

Some of the work below found slightly different trends than we did, but it’s clear that anyone attempting to answer the question of the gender gap in tech would be remiss in not considering the effects of confidence and self-perception in addition to the more salient matter of bias.

In a study investigating the effects of perceived performance on the likelihood of subsequent engagement, Dunning (of Dunning-Kruger fame) and Ehrlinger administered a scientific reasoning test to male and female undergrads, and then asked them how they did.

Not surprisingly, though there was no difference in performance between genders, women underrated their own performance more often than men. Afterwards, participants were asked whether they’d like to enter a Science Jeopardy contest on campus in which they could win cash prizes. Again, women were significantly less likely to participate, with participation likelihood being directly correlated with self-perception rather than actual performance.[3]

In a different study, sociologists followed a number of male and female STEM students over the course of their college careers via diary entries authored by the students. One prevailing trend that emerged immediately was the difference between how men and women handled the “discovery of their [place in the]

60 hacker bits It’s about women being bad at dusting themselves off after failing, which, despite everything, is probably a lot easier to fix.

pecking order of talent, an initi- far as to say that men “reported So while the attrition num- ation that is typical of socializa- they would experience a more bers aren’t great, I’m massively tion across the professions.” positive than negative affective encouraged by the fact that at For women, realizing that response after… being sexually least in these findings, it’s not they may no longer be at the top rejected.” about systemic bias against of the class and that there were Maybe tying coding to sex is women or women being bad at others who were performing bet- a bit tenuous, but, as they say, computers or whatever. Rather, ter, “the experience [triggered] a programming is like sex — one it’s about women being bad at more fundamental doubt about mistake and you have to support dusting themselves off after fail- their abilities to master the tech- it for the rest of your life. ing, which, despite everything, is nical constructs of engineering probably a lot easier to fix.  expertise [than men].” And of course, what survey Why I’m not of gender difference research depressed by our 1 Roughly 15% of our users are female. would be complete without an We want way more, but it’s a start. results and why you allusion to the wretched an- 2 If you want to hear more examples of nals of dating? When I told the shouldn’t be either voice modulation or are just generously interviewing.io team about the down to indulge me in some shameless Prior art aside, I would like disparity in attrition between bragging, we got to demo it on NPR and to leave off on a high note. I in Fast Company. genders, the resounding re- mentioned earlier that men are sponse was along the lines of, 3 In addition to asking interviewers doing a lot better on the plat- “Well, yeah. Just think about dat- how interviewees did, we also asked form than women, but here’s ing from a man’s perspective.” interviewees to rate themselves. After the startling thing. 
Once you reading the Dunning and Ehrlinger Indeed, a study published factor out interview data from study, we went back and checked to in the Archives of Sexual Be- see what role self-perception played in both men and women who quit havior confirms that men treat attrition. In our case, the answer is, I’m after one or two bad interviews, rejection in dating very different- afraid, TBD, as we’re going to need more the disparity goes away entirely. self-ratings to say anything conclusive. ly than women, even going so

Reprinted with permission of the original author. First appeared at blog.interviewing.io.

Interesting

A simple mind hack that helps beat procrastination

By SHYAL BEARDSLEY


Cleaning the flat, working out, reaching inbox zero: these are all worthy goals, yet sometimes getting stuck in the infernal hackersnews ∞ reddit loop seems so much more... relaxing.

Let’s begin by delving into the physiology of procrastination:

“[procrastination is] a battle of the limbic system, the unconscious zone that includes the pleasure center, and the prefrontal cortex, the internal planner. When the limbic system dominates, which is pretty often, the result is putting off until tomorrow what could, and should, be done today.”

Source: scienceabc.com

So how can we trick our limbic system to not “dominate”?

The answer resides in how we picture the task. When we picture a task, we often imagine completing the task, if such a thing is even possible, and are also aware of task inter-relatedness. Therein lies the problem.

This is what Neil Fiore has to say about it in his book: The Now Habit.

“The task before you is to walk a solid board that is thirty feet long, four inches thick, and one foot wide. You have all the physical, mental, and emotional abilities necessary to perform this task. You can carefully place one foot in front of the other, or you can dance, skip, or leap across the board. You can do it. No problem.”

This is how non-procrastinators visualise a task. However, Situation B is how procrastinators visualise the same task.

“Now imagine that the task is just the same, to walk a board thirty feet long and one foot wide, and you have the same abilities; only now the board is suspended between two buildings 100 feet above the pavement. Look across to the other end of the board and contemplate beginning your assignment. What do you feel? What are you thinking about? What are you saying to yourself?”

The task is more or less the same; however the mental representation is entirely different.

Imagine yourself starting, not finishing

What is the first, smallest, shortest, and least effortful task you can perform to get started? Say you need to clean your flat: then it could be to:

take 1 plate and put it in the sink.

That simple act has now set you in motion. In other words: the prefrontal cortex has won.

Why does this work?

It works because your limbic system is now experiencing the previously terrifying task, but is no longer feeling terrified by it. This enables you to build new, fresh memories of what was previously uncomfortable to the point of procrastination, but now feels perfectly fine and safe.

tldr: imagine yourself starting, not finishing.

Reprinted with permission of the original author. First appeared at shyal.com/blog.

food bit *

The unmanly fork

These days, eating with a fork couldn’t be more unremarkable, but way before this table tool gained acceptance, it was viewed as unmanly and ridiculous. After all, most people ate with their fingers, and anything that couldn’t be picked up was scooped up with spoons or cut up with knives. The early adopters of the fork were the 11th-century denizens of the Byzantine Empire, who astonished foreigners by delicately spearing their food with forks. This practice, however, didn’t take off until 600 years later, when the Europeans embraced the fork and established rules for its usage. The Americans, late to the party, only took up forks during the late 18th century, and by then, the explosion of specialized forks (fish fork, anyone?) had some wishing they’d never picked up forks in the first place.

* FOOD BIT is where we, enthusiasts of all edibles, sneak in a fun fact about food.

HACKER BITS is the monthly magazine that gives you the hottest technology stories crowdsourced by the readers of Hacker News. We select from the top voted stories and publish them in an easy-to-read magazine format.

Get HACKER BITS delivered to your inbox every month! For more, visit hackerbits.com.