
Deanonymizing ShapeShift: Linking Transactions Across Multiple

Nathan Borggren, Gary Koplik, Paul Bendich, John Harer
Geometric Data Analytics

[email protected]

Summary
● We identify ShapeShift transactions in the blockchains of […], […], […], and […] as they are converted to many other […], including the popular […].
● More than $250 million of traffic identified.
● Using machine learning, we can recall ~¾ of ShapeShift transactions while looking at only ~¼ of all transactions.

Introduction
● Prior to October 2018, ShapeShift allowed anonymous inter-cryptocurrency conversion.
● ShapeShift provides a mechanism to bypass the know-your-customer (KYC) and anti-money-laundering (AML) requirements common to exchanges and bank transactions.
● It also allows for novel strategies for inter-cryptocurrency laundering (aka mixing or tumbling).

Aggregating the Data
● Use ShapeShift's API to collect rates and transactions.
● Use other exchanges to collect BTC values in dollars and other 'fiat' currencies.
[Figure: Output of the ShapeShift rates API. The API gives an exchange rate between many available cryptocurrency pairs.]
[Figure: Output of the ShapeShift 'recent transactions API'. The last 50 transactions have been requested every 5 seconds since April 2018.]

BlockSci
● Powerful tool to analyze blockchains, from the Arvind Narayanan group at Princeton University.
● Builds an efficient […] for analyzing Bitcoin-like blockchains. (BTC, LTC, ZEC, DASH = good) (Monero, […] = bad)

Collecting ShapeShift Labels
● Using the full nodes and BlockSci, we can make a list of candidates for a recent ShapeShift transaction in the respective […] by looking for alignment in time and value.
● ShapeShift provides an additional API for traders to check whether their deposits have cleared; we, however, can use it to confirm that our hypothesized transaction is the ShapeShift transaction. When we are right, we receive a wealth of labels across both the incoming and outgoing blockchains.
We have now linked across blockchains! We even have values of Monero, which is private information in the Monero blockchain!

Volume of Transactions and Exploratory Data Analysis
● For all figures:
- Currency amounts (using the incoming currency amount) are first converted to BTC.
- All conversions are done with the end-of-day rate.
- Data goes from June 6th, 2018 through September 27th, 2018.
- June 17th, 2018 and September 21st, 2018 are missing from our dataset.
There is a clear trend away from Bitcoin towards coins focused on privacy.

Feature Extraction
● We computed features for every Bitcoin, Litecoin, Dash, and Zcash transaction from June 2018 to October 2018 that are:
- well-defined and comparable across multiple blockchains.
- well-defined for transactions (tx) with 1 or 100s of inputs.
- 163 one-hop and two-hop features computed.
● Features include values and/or summary statistics of:
- transaction number of inputs / outputs
- time since last tx
- size of tx (in bytes)
- value of tx

Machine Learning
● Highly unbalanced data (over 15 million points, fewer than 150,000 ShapeShift transactions, i.e. less than 1% ShapeShift labels).
● Aiming for a model with first priority on high recall of ShapeShift transactions, second priority on reducing false positives.
● Train with a balanced training set (reducing the number of non-ShapeShift transactions through random sampling).
● Test on unbalanced data, and cross-validate the results.

Bitcoin Results
● Used 6-fold cross-validation with a random forest.
● Can recall ~¾ of ShapeShift transactions while looking at only ~¼ of all transactions.
Feature legend (index: description):
18: max size in bytes of any of the nearest neighbors (1-hop)
2: # tx outputs (0 hops)
20: mean size in bytes (1-hop)
1: # tx inputs (0 hops)
118: min size in bytes of nearest-nearest-neighbors (2-hop)
5: mean time since last transaction (1-hop)
3: max time since last transaction (1-hop)
17: sum of number of outputs (1-hop)
4: min time since last transaction (1-hop)
21: median size in bytes (1-hop)

Further Studies
● WannaCry is known to have used ShapeShift (along with other mechanisms) to launder the malware's Bitcoin. The mixing went from 3 addresses to 5 million addresses in a single hop! Can our classifier filter the ShapeShift addresses to the top of this list and speed up their recovery?
● Use the classifier to extend our dataset to find transactions before the time period in which we began scraping data.
● Continue analysis of Litecoin, Dash, and Zcash.

References
Kalodner, H. A., et al. (2017). BlockSci: Design and applications of a blockchain analysis platform. http://arxiv.org/abs/1709.02489

Acknowledgements
Many thanks to Peter Merrill and Matthew Schmidt of LAS for discussions, assistance and feedback.
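
The label-collection step (finding candidates "by looking for alignment in time and value") can be illustrated with a minimal sketch. The record and transaction structures, their field names, the 600-second window, and the relative tolerance are all assumptions made for illustration; the poster does not state the thresholds or data layout actually used.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class ShapeShiftRecord:
    """One scraped entry; field names here are hypothetical."""
    currency: str
    amount: float
    timestamp: float  # Unix seconds when the record was scraped


@dataclass
class ChainTx:
    """Minimal view of an on-chain transaction (hypothetical structure)."""
    txid: str
    currency: str
    output_values: List[float]
    timestamp: float  # Unix seconds of the containing block


def find_candidates(record, chain_txs, max_delay=600.0, rel_tol=1e-6):
    """Return txids whose currency, time, and some output value align with the record.

    max_delay and rel_tol are illustrative assumptions, not values from the poster.
    """
    matches = []
    for tx in chain_txs:
        if tx.currency != record.currency:
            continue  # wrong blockchain
        if abs(tx.timestamp - record.timestamp) > max_delay:
            continue  # outside the assumed time window
        if any(math.isclose(v, record.amount, rel_tol=rel_tol)
               for v in tx.output_values):
            matches.append(tx.txid)
    return matches
```

A candidate list produced this way would still need confirmation, for example through the deposit-status API mentioned under Collecting ShapeShift Labels, since several transactions can share a value within the same window.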
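
The Feature Extraction requirement that features be well-defined for transactions with 1 or 100s of inputs suggests reducing each variable-length neighbor list to fixed summary statistics. The sketch below shows one way this could look for the 0-hop and 1-hop features; the dictionary keys, the feature names, and the zero fallback for a transaction with no neighbors are assumptions, not the authors' implementation.

```python
import statistics


def summarize(values):
    """Collapse a variable-length list of numbers into a fixed set of statistics.

    Returning zeros for an empty list is an assumption made for this sketch.
    """
    if not values:
        return {"min": 0.0, "max": 0.0, "mean": 0.0, "median": 0.0, "sum": 0.0}
    return {
        "min": float(min(values)),
        "max": float(max(values)),
        "mean": float(statistics.mean(values)),
        "median": float(statistics.median(values)),
        "sum": float(sum(values)),
    }


def one_hop_features(tx, neighbors):
    """Build a fixed-length feature dict from a transaction and its 1-hop neighbors.

    tx and each neighbor are plain dicts with hypothetical keys:
    n_inputs, n_outputs, size_bytes, time_since_last.
    """
    feats = {
        "n_inputs_0hop": tx["n_inputs"],
        "n_outputs_0hop": tx["n_outputs"],
    }
    for field in ("size_bytes", "time_since_last", "n_outputs"):
        stats = summarize([n[field] for n in neighbors])
        for name, value in stats.items():
            feats[f"{field}_{name}_1hop"] = value
    return feats
```

Because every statistic is computed per field regardless of how many neighbors exist, the output has the same keys for every transaction, which keeps the feature vector comparable across blockchains.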
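
The Machine Learning bullets describe training on a balanced set obtained by randomly sampling non-ShapeShift transactions, then testing on unbalanced data with recall as the first priority. The following sketch covers only the sampling and the two headline metrics (recall and the fraction of transactions flagged); the function names and the fixed seed are hypothetical, and the random forest itself is omitted.

```python
import random


def balanced_indices(labels, seed=0):
    """Keep every positive (1) and an equal-sized random sample of negatives (0)."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    keep_neg = rng.sample(neg, min(len(pos), len(neg)))
    return sorted(pos + keep_neg)


def recall_and_flagged_fraction(y_true, y_pred):
    """Return (recall on positives, fraction of all points predicted positive)."""
    if not y_true:
        return 0.0, 0.0
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    positives = sum(y_true)
    recall = true_pos / positives if positives else 0.0
    flagged = sum(y_pred) / len(y_true)
    return recall, flagged
```

Reporting the flagged fraction alongside recall mirrors the poster's framing: a result such as "recall ~¾ while looking at ~¼ of all transactions" corresponds to a recall near 0.75 with a flagged fraction near 0.25 on the unbalanced test data.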