Style Counsel: Seeing the (Random) Forest for the Trees in Adversarial Code Stylometry∗ Christopher McKnight Ian Goldberg Magnet Forensics University of Waterloo
[email protected] [email protected] ABSTRACT worm based on an examination of the reverse-engineered code [17], The results of recent experiments have suggested that code stylom- casting style analysis as a forensic technique. etry can successfully identify the author of short programs from This technique, however, may be used to chill speech for soft- among hundreds of candidates with up to 98% precision. This poten- ware developers. There are several cases of developers being treated tial ability to discern the programmer of a code sample from a large as individuals of suspicion, intimidated by authorities and/or co- group of possible authors could have concerning consequences for erced into removing their software from the Internet. In the US, the open-source community at large, particularly those contrib- Nadim Kobeissi, the Canadian creator of Cryptocat (an online se- utors that may wish to remain anonymous. Recent international cure messaging application) was stopped, searched, and questioned events have suggested the developers of certain anti-censorship by Department of Homeland Security officials on four separate oc- and anti-surveillance tools are being targeted by their governments casions in 2012 about Cryptocat and the algorithms it employs [16]. and forced to delete their repositories or face prosecution. In November 2014, Chinese developer Xu Dong was arrested, pri- In light of this threat to the freedom and privacy of individual marily for political tweets, but also because he allegedly “committed programmers around the world, we devised a tool, Style Counsel, to crimes of developing software to help Chinese Internet users scale aid programmers in obfuscating their inherent style and imitating the Great Fire Wall of China” [4] in relation to proxy software he another, overt, author’s style in order to protect their anonymity wrote.