<<

Cryptocurrency Address Clustering and Labeling

Mengjiao Wang, Hikaru Ichijo, Bob Xiao

Binance.com

1. Introduction

Anonymity is one of the most important qualities of technology. For example, one can simply create a address to send and receive funds without providing KYC to any authority. In general, the real identity behind addresses is not known, however, some addresses can be clustered according to their ownership by analyzing behavioral patterns, allowing those with known attribution to be assigned labels. These labels may be further used for legal and compliance purposes to assist in law enforcement investigations.

In this document, we discuss our methodology behind assigning attribution labels to cryptocurrency addresses. According to CoinMarketCap1, as of October 2019, there are more than three thousands in existence. Generally, we separate cryptocurrencies into two groups, each using a different method to assign address labels. The first group includes cryptocurrencies with an Unspent Transaction Output (UTXO), such as Bitcoin and . Common spending and one-time change addresses are commonly used methods to cluster addresses. The second group focuses on account-based cryptocurrencies, such as (including ERC20 tokens) and EOS, and we identify exchange addresses through analyzing funding patterns. Our initial address labels come primarily from two data sources: publicly available crowdsourcing websites such as the Bitcoin Abuse Database2 and Etherscan3; as well as data annotated by customers.

2. Related Works

Since the birth of Bitcoin, both anonymization and de-anonymization techniques have been studied within the crypto sphere. One such example is the use of common spending and one-time change addresses [2][3] to cluster addresses for UTXO-based cryptocurrencies, such as Bitcoin.

For account-based cryptocurrencies, like Ethereum, addresses are typically reused and do not change as frequently as UTXO-based cryptocurrencies. Although many address labels are published through official and unofficial channels (e.g. Etherscan for Ethereum and Bithomp for XRP), the vast majority of accounts still lack attribution. However, by using

1 ​CoinMarketCap ​https://coinmarketcap.com/ 2 Bitcoin Abuse ​https://www.bitcoinabuse.com/ 3 Etherscan ​https://etherscan.io/

available labels as seeds, we may expand address labels to include addresses belonging to cryptocurrency exchanges, dependent on their wallet architecture [5].

3. Address Clustering

a. UTXO-based Cryptocurrencies

i. Common Spending

Common spending is one of the most commonly used methods for clustering Bitcoin addresses. For example, in Figure 1, the three input addresses for the transaction can be attributed to the same owner. This is because, in order to initiate the transaction, the sender must know the private keys for all input addresses at the same time.

Figure 1: Screenshot of a Bitcoin Transaction

We can create clusters by first building a graph with only the input addresses and transactions themselves (Figure 2), which then allows us to identify new relationships within the graph. Each linked component can be treated as a part of the group.

Figure 2: Clustering via Transaction Graph

Common spending assumes that each Bitcoin address is controlled by a single entity in the real world. Although not necessarily true for multi-signature addresses, the multi-sig technique does not change the nature of address clustering, and there remains an implied association between the entities.

ii. One-time Change​ ​Inference

The change resulting from a Bitcoin transaction is returned to the sender using a new address located on the output side. Therefore, change address analysis offers another way to expand upon clusters. However, as change addresses are not explicitly marked in a Bitcoin transaction, and they cannot always be inferred correctly, we must develop a number of patterns to minimize false-positives.

An address has a high possibility of being a change address if it fits the following patterns:

● there is more than one input address in the transaction; ● one output address is new, but the remainder have been used; ● the transfer amount of the new output address extends greater than 4 digits following the decimal; ● the input and output share no common addresses;

Figure 3: One-time Change Patterns

b. Account-based Cryptocurrencies

Unlike UTXO-based cryptocurrencies, change is not needed as senders may always send exact quantities of assets to recipients. Most cryptocurrency exchanges generate dedicated blockchain addresses for customers to process deposits. Assets received in the dedicated address will later be automatically transferred to a hot wallet. As exchange hot wallets are usually public and labeled through crowdsourcing, we can infer customer addresses using this gathering pattern.

i. Fund Gathering Patterns

Fund gathering patterns are used to cluster addresses that are created by cryptocurrency exchanges for customer deposits. If the addresses are deemed to belong to the same exchange, they are then grouped together.

An example of fund gathering is as follows:

Figure 4: Ethereum is Swept to a Hot Wallet

A customer had 174.65893626 Ethereum and wanted to deposit it to their Binance account. To accomplish this, the customer simply transferred Ethereum to the deposit address for their account (provided and controlled by Binance). Some time after the transaction was confirmed, Binance automatically migrated the assets from that deposit address on-chain to the exchange’s hot wallet address.

This gathering pattern is summarized in Figure 5, showing that we may group customer deposit addresses, as well as exchange hot and cold wallets, into a single cluster.

Figure 5: Clustering Exchange Addresses using Gathering Patterns

Hot and cold wallets for most exchanges can be obtained from public sources (e.g. Etherscan for Ethereum). Once a hot wallet is correctly labeled, we may then identify customer deposit addresses using the hot wallet. Usually, a customer’s deposit address transfers assets only to the hot wallet address, but may receive assets from any source.

4. Experiments

We evaluated our methodology using the public transactional data of Bitcoin and Ethereum networks before November 2019. Detailed results are as follows:

a. Bitcoin

Executing our technique on the , we were able to assign labels for 115 million addresses. The top ten clusters with the most addresses are shown below:

category name number of addresses

exchange c​oinbase.com 23,189,630 merchant service gocoin.com 6,960,885 p2p exchange .com 5,793,274 merchant service coinpayments.net 5,080,340 merchant service .com 4,343,764 exchange binance.com 3,663,255 hosted wallet .com 2,171,501 exchange .co.th 2,087,623 exchange cubits.com 2,043,583 exchange bittrex.com 1,604,589

b. Ethereum

Using our methodology, we were able to label more than 7 million addresses from a set of only 515 exchange addresses. The top ten exchanges with most customer deposit addresses are below:

category peer_name number of addresses exchange binance.com 2,175,948 exchange bittrex.com 804,789 exchange kucoin.com 600,525 exchange .com 352,903 exchange .com 339,253 exchange .com 330,841 exchange poloniex.com 272,129 exchange .com 265,791 exchange .com 234,105 exchange hitbtc.com 224,654

5. Conclusion

● We demonstrated address clustering techniques for both UTXO-based and account-based cryptocurrencies;

● Our methodology was implemented and evaluated with the Bitcoin network and Ethereum network. In total, 115 million Bitcoin addresses and 7 million Ethereum addresses were assigned a label.

References

[1] S. Nakamoto. (2008). Bitcoin: A peer-to-peer electronic system. https://bitcoin.org/bitcoin.pdf

[2] M. Harrigan and C. Fretter, "The Unreasonable Effectiveness of Address Clustering," 2016 Intl IEEE Conferences on Ubiquitous Intelligence & Computing, Advanced and Trusted Computing, Scalable Computing and Communications, Cloud and Big Data Computing, Internet of People, and Smart World Congress (UIC/ATC/ScalCom/CBDCom/IoP/SmartWorld), Toulouse, 2016, pp. 368-373. doi: 10.1109/UIC-ATC-ScalCom-CBDCom-IoP-SmartWorld.2016.0071

[3] S. Meiklejohn et al. A fistful of : characterizing payments among men with no names. In: Proc. 2013 Internet Measurement Conference, IMC’13. Pp. 127–140, 2013, doi:10.1145/2504730.2504747

[4] D. Ermilov, M. Panov and Y. Yanovich, "Automatic Bitcoin Address Clustering" 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA), Cancun, 2017, pp. 461-466. doi: 10.1109/ICMLA.2017.0-118

[5] Huang Wenwei, Design Secure Wallet Architecture For Digital Asset Exchange, Blog, https://huangwenwei.com/blogs/design-secure-wallet-architecture-for-digital-asset-exchange