“The pool’s run dry” – analyzing anonymity in Zcash

Zcash is a cryptocurrency whose main feature is a “shielded pool” that is designed to provide strong anonymity guarantees. Indeed, the cryptographic foundations of the shielded pool are based on highly regarded academic research. The deployed Zcash protocol, however, allows for transactions outside of the shielded pool (which, from an anonymity perspective, are identical to Bitcoin transactions), and it can be easily observed from blockchain data that the majority of transactions do not use the pool. Nevertheless, users of the shielded pool should be able to treat it as their anonymity set when attempting to spend coins in an anonymous fashion.

In a recent paper, An Empirical Analysis of Anonymity in Zcash, we (George Kappos, Haaroon Yousaf, Mary Maller, and Sarah Meiklejohn) conducted an empirical analysis of Zcash to further our understanding of its shielded pool and broader ecosystem. Our main finding is that it is possible in many cases to identify the activity of founders and miners using the shielded pool (who are required by the consensus rules to put all newly generated coins into it). The implication for anonymity is that this activity can be excluded from any attempt to track coins as they move through the pool, which significantly shrinks the effective anonymity set for regular users. We have disclosed all our findings to the developers of Zcash, who have written their own blog post about this research. This work will be presented at the upcoming USENIX Security Symposium.

What is Zcash?

In Bitcoin, the sender(s) and receiver(s) in a transaction are publicly revealed on the blockchain. As with Bitcoin, Zcash has transparent addresses (t-addresses) but gives users the option to hide the details of their transactions using private addresses (z-addresses). Private transactions are conducted using the shielded pool and allow users to spend coins without revealing the amount and the sender or receiver. This is possible due to the use of zero-knowledge proofs.

As in Bitcoin, new coins are created in public “coingen” transactions within new blocks, which reward the miners of those blocks. In Zcash, a percentage of the newly minted coins is also sent to the founders (a predetermined list of Zcash addresses owned by the developers and embedded into the protocol).

There are thus five types of transactions in Zcash: public (also called transparent), private, shielded, deshielded, and mixed. Public transactions act just like Bitcoin transactions and use solely t-addresses. Shielded (t-to-z) transactions are those in which a user sends coins from a t-address into the shielded pool. Private (z-to-z) transactions use only z-addresses and do not reveal the amount, sender, or receiver. Deshielded (z-to-t) transactions send shielded coins to a t-address in order to withdraw money from the pool. Finally, mixed transactions use a combination of the above.
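To make the taxonomy concrete, here is a minimal sketch of a classifier. It assumes a simplified view of a transaction in which we only know whether it has transparent and/or shielded inputs and outputs; the `Tx` type, its field names, and the `classify` function are hypothetical and not part of any Zcash library. Coingen transactions, which have no inputs, are assumed to be filtered out beforehand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tx:
    """Hypothetical, simplified transaction summary."""
    t_in: bool   # spends from at least one t-address
    z_in: bool   # spends coins from the shielded pool
    t_out: bool  # pays at least one t-address
    z_out: bool  # pays into the shielded pool


# The four "pure" shapes; anything else combines them and is "mixed".
_PURE_TYPES = {
    (True, False, True, False): "public",      # t-to-t
    (True, False, False, True): "shielded",    # t-to-z
    (False, True, False, True): "private",     # z-to-z
    (False, True, True, False): "deshielded",  # z-to-t
}


def classify(tx: Tx) -> str:
    """Return one of the five transaction types described in the text."""
    if not (tx.t_in or tx.z_in) or not (tx.t_out or tx.z_out):
        # Assumption: coingen and other degenerate shapes are handled elsewhere.
        raise ValueError("transaction needs at least one input and one output")
    return _PURE_TYPES.get((tx.t_in, tx.z_in, tx.t_out, tx.z_out), "mixed")
```

For example, a transaction that spends from a t-address and pays both a t-address and the pool falls through the table and is labelled mixed.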

General blockchain statistics

We last parsed the blockchain on January 21 2018, when it had 258,472 blocks. Overall, 3,106,643 ZEC had been generated since the genesis block, out of which 80% (2,485,461 ZEC) went to the miners and the remaining 20% (621,182 ZEC) went to the founders. Across all transactions, 1,740,378 distinct t-addresses had been used. Of these, 0.5% (8,727) acted as inputs in a t-to-z transaction and 19% (330,780) acted as outputs in a z-to-t transaction. A breakdown of the different types of transactions is shown below.

By ranking addresses by their wealth (current balance), we observed that only 25% had a non-zero balance. Moreover, the top 1% of the wealthiest users controlled 78% of the total monetary supply, while the richest address of all (with 118,257.75 ZEC) had more coins than the total amount in the shielded pool.
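Statistics of this kind are straightforward to compute once per-address balances are available. The sketch below is illustrative only: the `wealth_stats` function is a hypothetical helper, the input is assumed to be a non-empty list of balances with a positive total, and the toy numbers in the usage note are made up rather than taken from the Zcash blockchain.

```python
def wealth_stats(balances):
    """Return (fraction of addresses with a non-zero balance,
    share of total supply held by the richest 1% of addresses).

    Assumes a non-empty list of non-negative balances with a positive sum.
    """
    n = len(balances)
    nonzero_fraction = sum(1 for b in balances if b > 0) / n

    # Size of the top 1%, rounded down but never smaller than one address.
    top_k = max(1, n // 100)
    richest = sorted(balances, reverse=True)[:top_k]

    top_share = sum(richest) / sum(balances)
    return nonzero_fraction, top_share
```

On a toy list of 100 addresses where one holds 90 coins, ten hold 1 coin each, and the rest are empty, this reports that 11% of addresses are non-zero and that the top 1% hold 90% of the supply.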

Clustering transparent transactions

Since public transactions behave just like in Bitcoin, it is easy to adapt the analysis that has been done there to Zcash. To cluster the owners of addresses, we used the well-known “multi-input” heuristic, which says that if two or more t-addresses are inputs in the same transaction then they are controlled by the same entity. We supplemented our clusters with address tags from three sources: a scaled-down “re-identification attack” in which we manually interacted with online services, the Zchain explorer, and the specialized heuristics we developed for founders and miners (described below).
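The multi-input heuristic is commonly implemented with a union-find (disjoint-set) structure. The following is a minimal sketch of that general technique, not the code used in the paper; the function name `cluster_by_multi_input` and the input format (one list of input t-addresses per transaction) are assumptions made for illustration.

```python
from collections import defaultdict


def cluster_by_multi_input(transactions):
    """Cluster t-addresses that appear together as inputs.

    `transactions` is an iterable of lists, each holding the input
    t-addresses of one transaction (assumed format).
    Returns a list of sets, one per cluster.
    """
    parent = {}

    def find(addr):
        # Register unseen addresses as their own root, then walk up
        # to the root while halving the path.
        parent.setdefault(addr, addr)
        while parent[addr] != addr:
            parent[addr] = parent[parent[addr]]
            addr = parent[addr]
        return addr

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for inputs in transactions:
        if not inputs:
            continue
        first = inputs[0]
        find(first)  # make sure single-input addresses are recorded too
        for other in inputs[1:]:
            union(first, other)

    clusters = defaultdict(set)
    for addr in list(parent):
        clusters[find(addr)].add(addr)
    return list(clusters.values())
```

Given the inputs `[["t1", "t2"], ["t2", "t3"], ["t4"]]`, the addresses t1, t2, and t3 end up in one cluster because t2 links the first two transactions, while t4 forms a cluster of its own.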

We found a total of 560,319 clusters, of which 97,539 contained more than a single address. We found that the largest four clusters (in number of addresses) belonged to digital asset exchanges, the largest being the U.S.-based exchange Poloniex. Many of the exchange clusters contained large fractions of miner addresses, implying that miners use the addresses of exchange accounts to receive their rewards. ShapeShift, a cross-currency digital asset exchange, was also heavily used, as it received and sent over 1.1M ZEC.

Case study: The Shadow Brokers

The Shadow Brokers (TSB) are a hacker collective that sell and distribute tools supposedly created by the NSA. In May 2017, they began to accept Zcash as a form of payment for their tools and services. We attempted to identify transactions to TSB by searching the blockchain for transactions that deposited the amounts requested by TSB during their sale periods. We clustered the addresses and transactions found.

Out of all our clusters, we found one that belonged to a new user (i.e., a user with essentially no previous activity) who sent transactions to the shielded pool with amounts and timings that corresponded to TSB’s sale activity. Most of their coins came directly from a cluster belonging to the exchange Bitfinex.

General interactions with the shielded pool

As mentioned earlier, both miners and founders are required to use the shielded pool. We were able to easily identify their deposits into the pool because: (a) in shielded transactions the source addresses are transparent, (b) founder addresses are publicly known and act as recipients in coingen transactions, and (c) miner addresses can be isolated as the set of all distinct addresses that received newly minted coins but were not founders. The deposits into the pool over time could thus be sorted into categories as shown below.
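The labelling rule in (a) to (c) can be sketched in a few lines. This is a simplified illustration under stated assumptions: `founder_addresses` stands for the publicly known founder list, `coingen_recipients` for every address that has received newly minted coins, and both function names are hypothetical.

```python
def label_coingen_recipients(coingen_recipients, founder_addresses):
    """Split coingen recipients into founder and miner address sets.

    Miners are all coingen recipients that are not founders (rule c).
    """
    founders = set(founder_addresses)
    miners = set(coingen_recipients) - founders
    return founders, miners


def categorize_deposit(source_addresses, founders, miners):
    """Label a t-to-z deposit by its transparent source addresses (rule a)."""
    sources = set(source_addresses)
    if sources & founders:
        return "founder"
    if sources & miners:
        return "miner"
    return "other"
```

A deposit whose source addresses include neither a founder nor a miner address falls into the "other" category.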

An optimist might say that in order to find the withdrawals corresponding to these deposits, you could just look for withdrawals using the same addresses we identified above. However, life is never so easy. The founders in particular never use the same addresses to withdraw from the pool, and miners rarely do, as we can see in Figure 8a.

The naïve interpretation of this analysis would suggest that miners and founders haven’t taken their money out of the pool, even though the total curve shows that almost all of the deposited coins have indeed been withdrawn. This led to the obvious conclusion that miners and founders just used different addresses for withdrawals than they did for deposits, so we would have to be at least a little more clever.

Identifying founders

Out of all the deposits made by the founders, we observed that the majority of them (75%) were of a particular value (249.9999 ZEC), which is roughly what the founders receive over 100 blocks. Moreover, this was a highly uncommon value across the whole blockchain: only five deposits into the pool with a value between 249 and 251 ZEC did not come from a founder address. Upon examining the withdrawals from the pool, we did not find any of exact value 249.9999 ZEC, but we did find 1,953 withdrawals of exactly 250.0001 ZEC. Beyond this correlation in value, we analyzed the block interval between deposits and withdrawals and observed an even stronger correlation here (details are in the paper). We thus concluded that any z-to-t transaction carrying 250.0001 ZEC in value was done by the founders.
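Expressed as code, the resulting rule is a simple exact-value test. The sketch below is an illustration rather than the paper's implementation: the function name and the `"z-to-t"` type label are assumptions, and values are handled as `Decimal` to avoid floating-point rounding when comparing amounts.

```python
from decimal import Decimal

# Withdrawal value attributed to the founders in the text.
FOUNDER_WITHDRAWAL_VALUE = Decimal("250.0001")


def is_founder_withdrawal(tx_type, value_zec):
    """Return True for a z-to-t transaction carrying exactly 250.0001 ZEC.

    `value_zec` should be a Decimal or a decimal string, not a float.
    """
    return tx_type == "z-to-t" and Decimal(value_zec) == FOUNDER_WITHDRAWAL_VALUE
```

Note that the test is deliberately exact: a withdrawal of 250 ZEC or 249.9999 ZEC is not flagged.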

Identifying miners

To identify miners, we exploited the fact that most of their activity was performed by mining pools, and that some of these pools engaged with the shielded pool in a predictable fashion. For example, some miners would deposit money in the shielded pool, followed by a withdrawal with hundreds of recipients, one of which was a previously identified miner address (i.e., an address that had previously received coins from a coingen transaction). This is similar to how some mining pools operate in other cryptocurrencies, where one address collects the reward and then distributes it to the individual miners that work for the pool.

We thus created a heuristic that identifies a withdrawal as belonging to a miner if it had over 100 recipients, with one of them belonging to a known mining pool. Once again, full details of this heuristic can be found in our paper. After applying both our heuristics, we were able to categorize withdrawals from the pool as shown in Figure 8c. In total, we could associate 69% of the activity surrounding the shielded pool with miners and founders, leaving only 31% as the anonymity set for regular users.
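The core of the miner heuristic as summarized here fits in a short predicate. This is a sketch of the summary above, not the full heuristic from the paper; the function name, the `known_pool_addresses` set, and the `min_recipients` parameter are assumptions introduced for illustration.

```python
def is_miner_withdrawal(recipients, known_pool_addresses, min_recipients=100):
    """Flag a z-to-t withdrawal as miner activity.

    True when the withdrawal has more than `min_recipients` recipient
    t-addresses and at least one of them is a known mining-pool address.
    """
    recipients = list(recipients)
    if len(recipients) <= min_recipients:
        return False
    return any(addr in known_pool_addresses for addr in recipients)
```

A withdrawal with exactly 100 recipients is not flagged, since the rule requires more than 100.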

But wait, there’s more! Identifying regular users too

Even for these regular users, we were still interested to see if it was possible to link together deposit and withdrawal transactions. Here we used a heuristic that had already been introduced by Jeffrey Quesnelle, and linked deposit and withdrawal transactions if they had exactly the same value and this particular value was unique in the whole blockchain. The total correlated value was over 1 million ZEC and represented 28.5% of all coins ever deposited in the pool. Luckily for regular users hoping to remain anonymous though, most (87%) of the linked coins were in transactions already attributed to the founders and miners.
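A value-matching heuristic of this kind can be sketched as follows. This is an illustration under a simplifying assumption: "unique" is interpreted as appearing in exactly one deposit and exactly one withdrawal among the pool transactions supplied, whereas the original heuristic considers the whole blockchain. The function name and the `(txid, value)` input format are hypothetical.

```python
from collections import Counter
from decimal import Decimal


def link_by_unique_value(deposits, withdrawals):
    """Link t-to-z deposits to z-to-t withdrawals with a matching unique value.

    `deposits` and `withdrawals` are lists of (txid, value) pairs, with
    values given as Decimal. Returns (deposit_txid, withdrawal_txid, value)
    triples for values seen exactly once on each side.
    """
    deposit_counts = Counter(value for _, value in deposits)
    withdrawal_counts = Counter(value for _, value in withdrawals)

    # Index only those deposits whose value is unique among deposits.
    unique_deposits = {
        value: txid for txid, value in deposits if deposit_counts[value] == 1
    }

    links = []
    for txid, value in withdrawals:
        if withdrawal_counts[value] == 1 and value in unique_deposits:
            links.append((unique_deposits[value], txid, value))
    return links
```

Round amounts such as 10 ZEC tend to occur many times and are therefore never linked; only unusual values, which occur once on each side, produce a match.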
