A Reflection on the Waves Of Malice: Malicious File Distribution on the Web (part 2)

The first part of this article introduced the malicious file download dataset and the delivery network structure. This final part explores the types of files delivered, discusses how the network varies over time, and concludes with challenges for the research community.

The Great Divide: A PUP Ecosystem and a Malware Ecosystem

We found a notable divide in the delivery of PUP and malware. First, there is much more PUP than malware in the wild: we found PUP-to-malware ratios of 5:1 by number of SHA-2s, and 17:2 by number of raw downloads. Second, we found that mixed delivery mechanisms of PUP and malware are not uncommon (e.g., see our OpenCandy case study in the paper). Third, the highly connected Giant Component is predominantly a PUP Ecosystem (8:1 PUP-to-malware by number of SHA-2s), while the many “islands” of download activity outside of this component are predominantly a Malware Ecosystem (1.78:1 malware-to-PUP by number of SHA-2s).

Comparing the structures of the two ecosystems, we found that the PUP Ecosystem uses more IP addresses and autonomous systems (ASes) per domain and per dropper than the Malware Ecosystem, possibly indicating higher CDN usage or the use of evasive fast-flux techniques to change IP addresses (though, given earlier results, the former is more likely). The Malware Ecosystem, on the other hand, delivered fewer SHA-2s per domain than the PUP Ecosystem while the overall numbers of raw downloads remained the same, which could again indicate a disparity in CDN usage between the two ecosystems (CDNs typically deliver a wide range of content). At the same time, fewer suspicious SHA-2s per domain could also be attributable to evasive techniques (e.g., malicious sites delivering a few types of files before changing domain) or to distributors in this ecosystem dealing with fewer clients and smaller operations.

We tried to estimate the number of pay-per-install (PPI) services in the wild by defining a PPI service as a network-only component (or group of components aggregated by e2LD) that delivered more than one type of malware or PUP family. Using this heuristic, we estimated a lower bound of 394 PPIs operating on the day, 215 of which were in the PUP Ecosystem. In terms of proportions, we found that the largest individual PPIs in the PUP and Malware Ecosystems involved about 99% and 24% of all e2LDs and IPs in their ecosystems, respectively.
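The heuristic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in the study: the input is assumed to be a flat list of (e2LD, family) observations, and the function name, field names and sample values are all hypothetical.

```python
from collections import defaultdict


def estimate_ppi_services(deliveries):
    """Group delivered families by e2LD and keep groups with more than one family.

    `deliveries` is an iterable of (e2ld, family) pairs. Both fields are
    assumptions made for this sketch, not the study's actual schema.
    """
    families_by_e2ld = defaultdict(set)
    for e2ld, family in deliveries:
        families_by_e2ld[e2ld].add(family)
    # Heuristic: more than one distinct malware or PUP family => candidate PPI.
    return {
        e2ld: families
        for e2ld, families in families_by_e2ld.items()
        if len(families) > 1
    }


# Made-up observations, for illustration only.
sample = [
    ("installer-hub.example", "family_a"),
    ("installer-hub.example", "family_b"),
    ("single-use.example", "family_c"),
    ("single-use.example", "family_c"),
]
candidates = estimate_ppi_services(sample)
print(sorted(candidates))
```

Because a service that happened to deliver only one family on the observed day is not counted, a count obtained this way is best read as a lower bound, as in the text.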

Given the number of possible explanations for these structural differences between the ecosystems, and the high degree of potential PPI usage in the wild (especially within the PUP Ecosystem), this is clearly an area that requires further research.

Keeping Track of the Waves

The final part of the study involved tracking these infrastructures and their activities over time. First, we generated tracking signatures for the network-only (server-side) and file-only (client-side) delivery infrastructures. In essence, this involved tracking the root and trunk nodes of a component, which typically had the highest node degrees and were thus more likely to be stable, as opposed to the leaf nodes, which were more likely to be ephemeral.
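As a rough sketch of the idea, a signature can be built by keeping only the non-leaf nodes of a component and comparing the resulting sets between observation days. Several things here are assumptions rather than the study's method: the component is represented as directed (source, target) edges, any node with an outgoing edge is treated as root or trunk (a simplification of selecting by node degree), and Jaccard similarity is used as the comparison measure. All names and sample values are hypothetical.

```python
def tracking_signature(edges):
    """Return the non-leaf (root and trunk) nodes of one component.

    `edges` is an iterable of (source, target) pairs. A node with at least
    one outgoing edge counts as root or trunk; nodes that only ever appear
    as targets are leaves and are left out of the signature.
    """
    return frozenset(source for source, _ in edges)


def jaccard(sig_a, sig_b):
    """Jaccard similarity of two signatures (assumed comparison measure)."""
    if not sig_a and not sig_b:
        return 1.0
    return len(sig_a & sig_b) / len(sig_a | sig_b)


# Made-up component on two days: the leaves change, and one trunk node is added.
day1 = [
    ("root.example", "cdn1.example"),
    ("cdn1.example", "file_a"),
    ("cdn1.example", "file_b"),
]
day2 = [
    ("root.example", "cdn1.example"),
    ("root.example", "cdn2.example"),
    ("cdn1.example", "file_c"),
    ("cdn2.example", "file_d"),
]
sig1 = tracking_signature(day1)
sig2 = tracking_signature(day2)
similarity = jaccard(sig1, sig2)
print(sorted(sig1), sorted(sig2), similarity)
```

In this toy example the leaf files differ completely between the two days, yet the signatures still overlap on the root and one trunk node, which is the property that makes the non-leaf nodes useful for tracking.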

