How 4chan and The_Donald Influence the Fake News Ecosystem

On July 2nd, 2017, Donald Trump, the President of the United States of America, tweeted a short video clip of himself punching out a CNN logo. The video was modified from an appearance Mr. Trump made at a WrestleMania event, and it originally appeared on Reddit’s /r/The_Donald subreddit. Although The_Donald was already infamous in some circles, the uproar the video caused was, for many, their first introduction to a community of users that has had a striking amount of influence on the world stage. While memes like the one born on The_Donald are worrying but mostly harmless, a shocking amount of disinformation (“fake news”) is also created by, and spread from, smaller fringe Web communities that have a relatively outsized influence on the greater Web.

In a nutshell, the explosion of the Web has commoditized the creation of false information and enabled it to spread like wildfire at unprecedented scale. After a decade and a half of experience with social media platforms, bad actors have honed their techniques and become surprisingly adept at crafting messages that at best make it difficult to distinguish fact from fiction, and at worst propagate dangerous falsehoods.

While recent discourse has tried to blur the lines between real and fake news, there are some fundamental differences, starting with the simple fact that fake news has to be created in the first place. Real news, even opinion pieces, is based on reporting and interpretation of factual material. This is not to dismiss the efforts of journalists, but the fact remains: they are not responsible for generating stories from whole cloth. The same cannot be said of the type of misinformation pushed by certain corners of the Web.

Consider recent events like the death of Heather Heyer during the Charlottesville protests earlier this year. While the facts had to be discovered, they were facts, supported by evidence gathered by trained professionals (both law enforcement and journalists) over a period of time. This is real news. However, the facts did not line up with the far-right political ideology espoused on 4chan’s /pol/ board (or, if we want to play Devil’s Advocate, they made for good trolling material), and so its users set about creating alternative narratives. They immediately began working towards a goal that is shocking to the uninitiated in its near single-mindedness: deflect, by any means possible, from the fact that a like-minded individual had committed a heinous act of violence.

4chan: Crowdsourced Opposition Intelligence

After more than a year of observing, measuring, and trying to understand the rise of the alt-right online, we saw a familiar pattern emerge: crowdsourced opposition intelligence.

/pol/ users mobilized in a perverse, yet fascinating, use of the Web. They produced dozens of (often conflicting) discussion threads putting forth alternative theories of Ms. Heyer’s killing, supported by everything from pure conjecture, to dubious analysis of mobile phone videos and pictures, to impressive investigations uncovering personal details and relationships of victims and bystanders. Over time, pieces of the fabrication were agreed upon and tweaked until they resembled, in large part, a plausible (albeit eyebrow-raising) false reality ready for consumption by the general public. Further, as bits of the narrative were debunked, it continued to evolve, weeks after the actual facts had been established.

One month after Heather Heyer’s killing, users on /pol/ were still pushing fabricated alternative narratives of events.

The Web Centipede

There are many anecdotal examples of smaller communities on the Web bubbling up and influencing the rest of the Web, but the plural of anecdote is not data. The research community has studied information diffusion on specific social media platforms like Facebook and Twitter, and indeed each of these platforms is under fire from government investigations in the US, UK, and EU, but the Web is much bigger than just Facebook and Twitter. There are other forces at play, where false information is incubated and crafted for maximum impact before it reaches a mainstream audience. Thus, we set out to measure just how this influence flows in a systematic and methodical manner, analyzing how URLs from 45 mainstream and 54 alternative news sources are shared across 8 months of Reddit, 4chan, and Twitter posts.

While we made many interesting findings, there are a few we’ll highlight here:

  1. Reddit and 4chan post mainstream news URLs at over twice the rate Twitter does, and 4chan in particular posts alternative news URLs at twice the rate of Twitter and Reddit.
  2. We found that alternative news URLs spread much faster than mainstream URLs, perhaps an artifact of automated bots.
  3. While 4chan was usually the slowest to post a given URL, it was also the most successful at “reviving” old stories: if a URL was re-posted after a long period of time, it probably showed up on 4chan originally.

Graph representation of the news ecosystem for mainstream news domains (left) and alternative news domains (right). We create two directed graphs, one for each type of news, where the nodes represent alternative or mainstream domains as well as the three platforms, and the edges capture only the first hop between platforms. For example, if a breitbart.com URL appears first on Twitter and later on the six selected subreddits, we add an edge from breitbart.com to Twitter, and from Twitter to the six selected subreddits. We weight these edges by the number of such unique URLs. Edges are colored the same as their source node.
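
To make the construction concrete, the sketch below shows how such a first-hop graph could be assembled in Python with networkx. It is a minimal illustration, not our measurement pipeline: the observation tuples and field names are assumptions made for the example.

```python
from collections import defaultdict

import networkx as nx

def build_influence_graph(observations):
    """Build the weighted, directed first-hop graph described above.

    observations: iterable of (url, news_domain, platform, timestamp)
    tuples, e.g. ("http://breitbart.com/a", "breitbart.com",
    "twitter", 1489272430). Field names are illustrative.
    """
    by_url = defaultdict(list)
    for url, domain, platform, ts in observations:
        by_url[url].append((ts, domain, platform))

    g = nx.DiGraph()
    for events in by_url.values():
        events.sort()                  # chronological order
        _, domain, first = events[0]   # platform that posted the URL first
        later = []                     # later platforms, first appearance only
        for _, _, p in events[1:]:
            if p != first and p not in later:
                later.append(p)
        # One edge from the news domain to the first platform, then one
        # from the first platform to each platform that posted the URL later.
        for src, dst in [(domain, first)] + [(first, p) for p in later]:
            if g.has_edge(src, dst):
                g[src][dst]["weight"] += 1
            else:
                g.add_edge(src, dst, weight=1)
    return g
```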

Measuring Influence Through the Lens of Mainstream and Alternative News

While comparative analysis of news URL posting behavior provides insight into how Web communities connect together like a centipede through which information flows, it is not sufficiently powerful to quantify the specific levels of influence they have.

Continue reading How 4chan and The_Donald Influence the Fake News Ecosystem

Find Security Champions in Blends of Organisational Culture

I was at the EuroUSEC ’17 workshop in Paris at the end of April. Our own Angela Sasse was also there to deliver the keynote talk, and Ruba Abu-Salma presented our paper “The Security Blanket of the Chat World: An Analytic Evaluation and a User Study of Telegram” (which was based on research by undergraduate students studying UCL’s COMP3096 “Research Group Project” module). I presented secondary analysis, conducted with Ingolf Becker and Angela Sasse, of a survey deployed at a large partner organisation. This analysis builds on research we presented at the Symposium on Usable Privacy and Security (SOUPS) in 2016. Based on survey responses and voluntary free-text comments, we saw potential for employees to inform policy from the ‘ground up’, in contrast to the current trend of identifying security champions as local representatives of pre-determined policy.

Top-down security policies

Organisational policies are intended to promote a unified approach to security, one that all the organisation’s employees are expected to follow. If security procedures and mechanisms are unusable, policies risk being seen as impossible to follow, or may be sidelined if they lack clear relevance to business goals. This can result in deliberate or unwitting non-compliance, and workarounds to prescribed procedures.

Organisations may appoint security champions as local representatives who promote policy in their part of the organisation. However, these security champions can be effective only if policy is workable. Encouraging ‘top-down’ policy compliance assumes that policy is correct, complete, and appropriate. It also assumes that policy applies to everyone equally and that employees have no role to play in shaping effective policy. Our analysis explores the potential for employees to inform effective policies, in particular whether it was possible to (i) identify local pockets of security expertise, and (ii) target engagement with employees that involves them in the creation of workable security solutions.

Identifying security champions ‘from the ground up’

Level 1 – Uninfluenced: Security behaviour is driven by personal knowledge.
Level 2 – Technically Controlled: Technical controls enforce compliance with policy.
Level 3 – Ad-hoc Knowledge and Application: Shallow understanding of policy; knowledge absorbed from the surrounding work environment.
Level 4 – Policy Compliant: Comprehensive knowledge and understanding of policy; willing policy compliance; a role model for the organisation’s security culture.
Level 5 – Active Approach to Security: Actively promotes and advances the security culture; carries the intent of policy into work activities; leverages well-understood values that support both security and business.

Employee security Attitude Levels. We studied an organisation with IT systems in place, so there were no participants at Level 1.

A scenario-based survey was deployed in the partner company. Scenarios were based upon in-depth interviews with employees that explored security behaviours in the workplace. Each scenario involved a dilemma, where fixed options described different responses and included an element of non-compliance or an implicit cost. Participant choices indicate their Attitude Level (above) and Behaviour Type (below), which we recorded across groups of employees to characterise the security culture of the organisation as a whole and of four specific divisions (a tallying sketch follows the Behaviour-Types list below). Both the interviews and the survey represent a cross-section of divisions, locations, and age groups. We collected 608 survey responses; crucially, the survey allowed participants to comment on the scenarios and the available options, so we also examined the 267 additional free-text comments that were provided.

Individualists – Rely on self for solutions.
Egalitarians – Rely on social or group solutions.
Hierarchists – Rely on existing systems or technologies.
Fatalists – Take a ‘naive’ approach, believing their actions are not significant in creating outcomes.

Behaviour Types.
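
As a rough illustration of how such responses could be tallied, the sketch below maps each participant’s scenario choices onto Behaviour Types and aggregates them. The coding of options to types is hypothetical, standing in for the scheme described in the paper.

```python
from collections import Counter

# Hypothetical coding of (scenario, chosen option) pairs to Behaviour
# Types; the real mapping comes from the survey design, not shown here.
OPTION_TO_TYPE = {
    ("scenario_1", "a"): "Individualist",
    ("scenario_1", "b"): "Egalitarian",
    ("scenario_1", "c"): "Hierarchist",
    ("scenario_1", "d"): "Fatalist",
    # ... one entry per scenario option
}

def behaviour_profile(responses):
    """responses: one dict per participant, mapping scenario id to the
    option they chose. Returns the Behaviour Type distribution, which
    can be computed per division to compare security cultures."""
    counts = Counter()
    for participant in responses:
        for scenario, option in participant.items():
            btype = OPTION_TO_TYPE.get((scenario, option))
            if btype is not None:
                counts[btype] += 1
    return counts
```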

Continue reading Find Security Champions in Blends of Organisational Culture

Underground abraCARDabra: Understanding carding forums

Paying for dinner? A taxi ride? A tropical drink? Sure. Swipe or tap your card and it is done. Convenient. Payment cards make it easy for us to make payments at “brick-and-mortar” locations and online marketplaces. However, they are also attractive targets for cybercriminals seeking to steal funds from the accounts linked to payment cards, as seen in a recent high-profile theft of credit card data affecting more than 1,000 hotels.

Theft of payment card information via phishing, skimming, or hacking, is usually the first step in the chain of payment card fraud. Other steps include sales, validation, and monetisation of the stolen data. These illicit deals are aided by underground online forums where cybercriminals actively trade stolen credit card information. To tackle payment card fraud, it is therefore important to understand the characteristics of these forums and the activity of miscreants using them. In our paper, presented at the 2017 APWG Symposium on Electronic Crime Research (eCrime2017), we analyse and discuss the characteristics of underground carding forums. We focus on the available products and prices, characteristics of sellers, and features of the forums. We won the Best Paper Award at eCrime2017.

Products

The main products available on carding forums are credit card numbers, dumps, and fullz. Credit card numbers comprise the information actually printed on credit cards, that is, cardholder name, card number (16 digits on most cards), expiry date, and the security code on the back of the card (usually 3 digits).

Dumps comprise stolen information from the tracks of magnetic stripe of a credit card. Dumps are usually obtained via skimmers. Skimmers are devices attached to Automated Teller Machines (ATMs) and Point of Sale (POS) terminals by miscreants to steal data from unsuspecting victims. Afterwards, the miscreants create clones of the skimmed credit cards and monetise the clones, for instance, by making illicit purchases with them.

Fullz contain further information about the cardholder. In other words, fullz usually comprise information printed on the card plus additional information such as bank account information, cardholder’s date of birth, Social Security number, etc.

Sellers

Generally, there are several types of participants on carding forums: sellers, buyers, intermediaries, mules, administrators, and others. These roles are not mutually exclusive; sellers may simultaneously be buyers. In this study, we focus on sellers since they come before buyers in the fraud chain.

Our approach

We studied previous work on underground marketplaces and forums, and derived the following hypotheses from the insights gained. We then searched for names of carding forums, found 25, and collected data from 5 active forums, on which we tested the hypotheses.

Hypothesis 1. Prices of fullz (credit card numbers and additional cardholder information) are higher than prices of credit card numbers.
Hypothesis 2. A small number of traders are responsible for a large proportion of traffic.
Hypothesis 3. Most traders sell only one product type (that is, they are specialised).
Hypothesis 4. Specialised traders sell their products at lower prices than unspecialised traders.
Hypothesis 5. Carding forums have working reputation systems that are as sophisticated as those of legal marketplaces (for instance, eBay).
Hypothesis 6. The vast majority of actors do not operate on more than one forum.

Summary of findings

Our analyses confirmed Hypothesis 1, Hypothesis 2, and Hypothesis 6. In other words, prices of fullz are indeed higher than prices of credit card numbers (credit card numbers: mean = $10.08, median = $10.00; fullz: mean = $31.82, median = $30.00). Also, a small number of traders are responsible for a large proportion of traffic. Finally, most sellers focus their efforts on a single forum, as expected.
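
As an illustration, the price comparison behind Hypothesis 1 could be checked along these lines. This is a sketch using a Mann-Whitney U test (a reasonable choice for skewed price data), not necessarily the exact test used in the paper.

```python
from statistics import mean, median

from scipy.stats import mannwhitneyu

def fullz_pricier(cc_prices, fullz_prices, alpha=0.01):
    """cc_prices, fullz_prices: lists of advertised prices in USD.

    Tests whether fullz prices are stochastically greater than credit
    card number prices; a non-parametric test suits skewed price data.
    """
    _, p_value = mannwhitneyu(fullz_prices, cc_prices, alternative="greater")
    print(f"cc numbers: mean=${mean(cc_prices):.2f}, median=${median(cc_prices):.2f}")
    print(f"fullz:      mean=${mean(fullz_prices):.2f}, median=${median(fullz_prices):.2f}")
    return p_value < alpha  # True -> fullz significantly more expensive
```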

Hypothesis 4 was partially rejected, while Hypothesis 3 and Hypothesis 5 were completely rejected. In other words, specialised sellers do not always sell their products at lower prices than the unspecialised ones, most sellers advertise more than one type of product, and most of the carding forums under study do not have working reputation systems that are as elaborate as those of legitimate online marketplaces.

In conclusion, dumps and fullz are relatively expensive; they are more than three times as expensive as credit card numbers. This may be due to the effort needed to obtain or monetise the data, the amount of available information, or differing supply and demand. Sellers have varying success. Even though some sellers complete hundreds of transactions, most sellers do not succeed in selling anything. This means that the trading sections of the forums are profitable distribution channels for high-profile actors. Finally, specialisation is not a key characteristic of sellers, not even of high-profile sellers.

Further details can be found in the full paper All Your Cards Are Belong To Us: Understanding Online Carding Forums, by Andreas Haslebacher, Jeremiah Onaolapo, and Gianluca Stringhini.

Battery Status Not Included: Assessing Privacy in W3C Web Standards

Designing standards with privacy in mind should be a standard in itself. Historically this was not always the case: the idea of designing systems with privacy in mind is relatively new, dating from the beginning of this decade. One milestone was the acceptance of this view at the IETF level in 2013.

This note and the referenced research work focus on designing standards with privacy in mind. The World Wide Web Consortium (W3C) is one of the most important standardization bodies. W3C standardizes something everyone – billions of people – uses every day for entertainment or work: the web and how web browsers work. It employs a focused, lengthy, but very open and consensus-driven process in order to make sure that the community’s voice is always heard during the drafting of new specifications. The actual inner workings of W3C are surprisingly complex. One interesting observation is that the W3C Process Document does not actually mention privacy reviews at all, although in practice such reviews do take place.

However, as the web and web browser features constantly grow in complexity, and as browsers and the web are expected to be designed at an ever-faster pace, considering privacy becomes a challenge. But this is what is needed in order to obtain good specifications and well-thought-out browser features, with sound threat models and well-designed privacy strategies.

My recent work, together with the Princeton team of Arvind Narayanan and Steven Englehardt, provides insight into the standardization process at W3C, specifically in the privacy areas of standardization. We point out a number of issues and areas for improvement. We also provide recommendations, as well as broad evidence of the wide misuse and abuse (in tracking and fingerprinting) of a browser mechanism called the Battery Status API.

The Battery Status API is a browser feature that was meant to allow websites to access information about the battery state of a user’s device. This seemingly innocuous mechanism initially had no identified privacy concerns. However, following my previous research work in collaboration with Gunes Acar, this view changed (see Leaking Battery and a later note).

The work I authored in 2015 identifies a number of privacy risks of the API: specifically, the potential exploitation of the API for tracking, as well as the possibility of recovering the raw value of the battery capacity – something websites were never meant to access. Ultimately, this work led to a fix in the Firefox browser and an update to the W3C specification. But the matter was not over yet.
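
To see why the readouts matter for tracking, consider the linking logic a tracker could run server-side. The sketch below is my illustration, not code from the study: it treats the triple of battery level and charging/discharging times as a short-lived identifier, so two visits from different sites reporting the same triple within a brief window very likely come from the same device.

```python
import time

WINDOW_SECONDS = 30   # battery readouts drift, so only match brief windows
recent = {}           # fingerprint -> (site, time of last observation)

def observe(site, level, charging_time, discharging_time):
    """Record one Battery Status readout reported from `site` and flag
    visits that match a readout recently seen on a different site."""
    fingerprint = (level, charging_time, discharging_time)
    now = time.time()
    # Expire stale readouts.
    for key in [k for k, (_, ts) in recent.items() if now - ts > WINDOW_SECONDS]:
        del recent[key]
    match = recent.get(fingerprint)
    if match is not None and match[0] != site:
        print(f"{site}: likely the same device as the visit to {match[0]}")
    recent[fingerprint] = (site, now)

# High-precision levels (as Firefox exposed them) make collisions between
# different devices unlikely within the window.
observe("news.example", 0.9301929283, 0, 8344)
observe("shop.example", 0.9301929283, 0, 8344)  # flagged as the same device
```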

Continue reading Battery Status Not Included: Assessing Privacy in W3C Web Standards

A Longitudinal Measurement Study of 4chan’s Politically Incorrect Forum and its Effect on the Web

The discussion board site 4chan has been a part of the Internet’s dark underbelly since its creation in 2003 by ‘moot’ (Christopher Poole). But recent events have brought it into the spotlight, making it a central figure in the outlandish 2016 US election campaign through its links to the “alt-right” movement and its rhetoric of hate and racism. However, although 4chan is increasingly “covered” by the mainstream media, we know little about how it actually operates and how instrumental it is in spreading hate on other social platforms. A new study with colleagues at UCL, Telefonica, and the University of Rome now sheds light on 4chan and, in particular, on /pol/, the “politically incorrect” board.

What is 4chan anyway?

4chan is an imageboard site, built around a typical bulletin-board model. An “original poster” creates a new thread by making a post, with one single image attached, to a board with a particular focus of interest. Other users can reply, with or without images. Some of 4chan’s most important aspects are anonymity (there is no identity associated with posts) and ephemerality (inactive threads are routinely deleted).

4chan currently features 69 boards, split into 7 high-level categories, e.g. Japanese Culture or Adult. In our study, we focused on the /pol/ board, whose declared purpose is “discussion of news, world events, political issues, and other related topics”. Arguably, /pol/ threads have two main characteristics. One is their racist connotation, with a not-so-unusual aggressive tone, offensive and derogatory language, and links to the “alt-right” movement – a segment of right-wing ideologies supporting Donald Trump and rejecting mainstream conservatism as well as immigration, multiculturalism, and political correctness. The other is the substantial amount of original content and “online” culture the board generates, ranging from the “lolcats” memes to “pepe the frog.”

The figure below shows four examples of typical /pol/ threads:

Examples of typical /pol/ threads. Thread (A) illustrates the derogatory use of “cuck” in response to a Bernie Sanders image, (B) shows a casual call for genocide with an image of a woman’s cleavage and a “humorous” response, (C) shows /pol/’s fears that a withdrawal of Hillary Clinton would guarantee Donald Trump’s loss, and (D) shows Kek, the “god” of memes, via which /pol/ believes it influences reality.

Raids towards other services

Another aspect of /pol/ is its reputation for coordinating and organizing so-called “raids” on other social media platforms. Raids are somewhat similar to Distributed Denial of Service (DDoS) attacks, except that rather than aiming to interrupt the service at a network level, they attempt to disrupt the community by actively harassing users and/or taking over the conversation.

Continue reading A Longitudinal Measurement Study of 4chan’s Politically Incorrect Forum and its Effect on the Web

Understanding the Use of Leaked Webmail Credentials in the Wild

Online accounts enable us to store and access documents, make purchases, and connect with new friends, among many other capabilities. Even though online accounts are convenient to use, they also expose users to risks such as inadvertent disclosure of private information and fraud. In recent times, data breaches and the subsequent exposure of users to attacks have become commonplace. For instance, over the last four years, account credentials of millions of users from Dropbox, Yahoo, and LinkedIn have been stolen in massive attacks conducted by cybercriminals.

After online accounts are compromised by cybercriminals, what happens to the accounts? In our paper, presented today at the 2016 ACM Internet Measurement Conference, we answer this question. To do so, we needed to monitor the compromised accounts. This is hard to do, since only large online service providers, for instance Google or Yahoo, have access to data from such compromised accounts. As a result, there is sparse research literature on the use of compromised online accounts. To address this problem, we developed an infrastructure to monitor the activity of attackers on Gmail accounts. This enables researchers to understand what happens to compromised webmail accounts in the wild, despite the lack of access to proprietary data on compromised accounts.

Cybercriminals usually sell the stolen credentials on the underground black market or use them privately, depending on the value of the compromised accounts. Such accounts can be used to send spam messages to other online accounts, or to retrieve sensitive personal or corporate information from the accounts, among a myriad of malicious uses. In the case of compromised webmail accounts, it is not uncommon to find password reset links, financial information, and authentication credentials for other online accounts inside them. This makes webmail accounts particularly attractive to cybercriminals, since they often contain a wealth of sensitive information that could potentially be used to compromise other accounts. For this reason, we focus on webmail accounts.

Our infrastructure works as follows. We embed scripts based on Google Apps Script in Gmail accounts, so that the accounts send notifications of activity to us. Such activity includes the opening of email messages, creation of email drafts, sending of email messages, and “starring” of email messages. We also record details of accesses including IP addresses, browser information, and access times of visitors to the accounts. Since we designed the Gmail accounts to lure cybercriminals to interact with them (in the sense of a honeypot system), we refer to the accounts as honey accounts.

To study webmail accounts stolen via malware, we also developed a malware sandbox infrastructure that executes information-stealing malware samples inside virtual machines (VMs). We supply honey credentials to the VMs, which drive web browsers and log in to the honey accounts automatically. The login action triggers the malware in the VMs to steal and exfiltrate the honey credentials to Command-and-Control servers under the control of botmasters.
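
For a sense of what the VM automation involves, here is a minimal sketch of a browser-driving login using Selenium in Python. It is illustrative only: the element locators are assumptions about Google’s login page (which changes often), and our actual infrastructure is described in the paper.

```python
import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

def login_honey_account(email, password):
    """Log in to a honey Gmail account from inside an infected VM, giving
    information-stealing malware a chance to capture the credentials."""
    driver = webdriver.Firefox()
    wait = WebDriverWait(driver, 30)
    try:
        driver.get("https://accounts.google.com/")
        # Locator IDs are illustrative; Google's login form changes often.
        wait.until(EC.element_to_be_clickable((By.ID, "identifierId"))).send_keys(email)
        driver.find_element(By.ID, "identifierNext").click()
        wait.until(EC.element_to_be_clickable((By.NAME, "Passwd"))).send_keys(password)
        driver.find_element(By.ID, "passwordNext").click()
        time.sleep(60)  # keep the session alive so the malware can act
    finally:
        driver.quit()
```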

Continue reading Understanding the Use of Leaked Webmail Credentials in the Wild

QUUX: a QUIC un-multiplexing of the Tor relay transport

Latency is a key factor in the usability of web browsing. This has added relevance in the context of anonymity systems such as Tor, because the anonymity property is strengthened by having a larger user-base.

Decreasing the latency of typical web requests in Tor could encourage a wider user base, making it more viable for typical users who value their privacy and less conspicuous for the people who most need it. With this in mind for my MSc Information Security project at UCL, supervised by Dr Steven J. Murdoch, I looked at the transport subsystem used by the Tor network, hoping to improve its performance.

After a literature review of the area (several alternative transport designs have been proposed in the past), I started to doubt my initial mental model for an alternative design.

Data flow in Freedom (Murdoch, 2011)
Data flow in Tor (Murdoch, 2011)

These diagrams show an end-to-end design (Freedom) and hop-by-hop design (Tor) respectively. In the end-to-end design, encrypted IP packets are transported between relays using UDP, with endpoints ensuring reliable delivery of packets. In the hop-by-hop design, TCP data is transported between relays, with relays ensuring reliable delivery of data.

The end-to-end Freedom approach seems elegant, with relays becoming somewhat closer to packet routers; however, it also leads to longer TCP round-trip times (RTTs) for web browser HTTP connections. Other things being equal, a longer TCP RTT will result in a slower transfer. Additional issues include the difficulty of ensuring fairness of utilisation (requiring an approach outlined by Viecco), and potentially greater vulnerability to latency-based attacks.

Therefore I opted to follow the hop-by-hop transport approach Tor currently takes. Tor multiplexes cells for different circuits over a single TCP connection between relay pairs, and as a result a lost packet for one circuit can hold up all circuits that share the same connection (head-of-line blocking). A long-lived TCP connection is beneficial for converging on an optimal congestion window size, but the approach suffers from head-of-line blocking and doesn’t compete effectively with other TCP connections using the same link.

To remedy these issues, I made a branch of Tor which used a QUIC connection in place of the long-lived TCP connection. Because a QUIC connection carries multiple TCP-like streams, it doesn’t suffer from head-of-line blocking. The streams also compete for utilisation at the same level as TCP connections, allowing them to more effectively use either the link capacity or the relay-configured bandwidth limit.
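
A toy model helps make the head-of-line blocking argument concrete. In the sketch below (an illustration, not Tor or QUIC code), cells from three circuits share one ordered delivery domain (the TCP case) or one domain per circuit (the QUIC case); losing a single cell delays every circuit in the former but only one circuit in the latter.

```python
RTT = 0.1    # seconds before the lost cell's retransmission arrives
GAP = 0.01   # spacing between consecutive cell send times
LOST = 2     # index of the cell that gets dropped

# Nine cells, interleaved round-robin across three circuits.
cells = [(i * GAP, i % 3) for i in range(9)]  # (send_time, circuit)

def total_added_delay(shared):
    """Total delivery delay under in-order delivery, either over one
    shared ordering domain (TCP) or one per circuit (QUIC streams)."""
    next_free = {}  # ordering domain -> earliest next delivery time
    delay = 0.0
    for i, (t, circuit) in enumerate(cells):
        domain = "connection" if shared else circuit
        arrival = t + RTT if i == LOST else t   # the loss costs one RTT
        delivered = max(arrival, next_free.get(domain, 0.0))
        next_free[domain] = delivered
        delay += delivered - t
    return delay

print(f"one shared TCP connection: {total_added_delay(True):.2f}s added delay")
print(f"per-circuit QUIC streams:  {total_added_delay(False):.2f}s added delay")
```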

Download time for a 320KiB file

Initial results from the experiments are promising, as shown above. There’s still a way to go before such a design could make it into the Tor network. This branch shows the viability of the approach for performance, but significant engineering work still lies ahead to create a robust and secure implementation that would be suitable for deployment. There will also likely be further research to more accurately quantify the performance benefits of QUIC for Tor. Further details can be found in my MSc thesis.

Moving towards security and privacy experiments for the real world

Jono and I recently presented our joint paper with Simon and Angela at the Learning from Authoritative Security Experiment Results (LASER) Workshop in San Jose, CA, USA. The workshop was co-located with the IEEE Security and Privacy Symposium. LASER has a different focus each year; in 2016, presented papers explored new approaches to computer security experiments that are repeatable and can be shared across communities.

Through our LASER paper, “Towards robust experimental design for user studies in security and privacy”, we wanted to advance the quest for better experiment design and execution. We proposed the following five principles for conducting robust experiments into usable security and privacy:

  1. Give participants a primary task
  2. Ensure participants experience realistic risk
  3. Avoid priming the participants
  4. Perform experiments double-blind whenever possible
  5. Define these elements precisely: threat model, security, privacy, and usability

Understanding users and their interaction with security is a blind spot for many security practitioners and designers. Learning from prior studies within and outside our research group, we have defined principles for conducting robust experiments into usable security and privacy. These principles are informed by efforts in other fields such as biology, qualitative research methods, and medicine, where four overarching experiment-design factors guided our principles:

Internal validity – The experiment is of “suitable scope to achieve the reported results” and is not “susceptible to systematic error”.

External validity – The result of the experiment “is not solely an artifact of the laboratory setting”.

Containment – There are no “confounds” in the results, and no experimental “effects are a threat to safety” of the participants, the environment, or society generally.

Transparency – “There are no explanatory gaps in the experimental mechanism” and the explanatory “diagram for the experimental mechanism is complete”, in that it covers all relevant entities and activities.

Continue reading Moving towards security and privacy experiments for the real world

Adblocking and Counter-Blocking: A Slice of the Arms Race

Anti-adblocking message from WIRED
If you use an adblocker, you are probably familiar with messages of the kind shown above, asking you to either disable your adblocker, or to consider supporting the host website via a donation or subscription. This is the battle du jour in the ongoing adblocking arms race — and it’s one we explore in our new report Adblocking and Counter-Blocking: A Slice of the Arms Race.

The reasons for the rising popularity of adblockers include improved browsing experience, better privacy, and protection against malvertising. As a result, online advertising revenue is gravely threatened by adblockers, prompting publishers to actively detect adblock users, and subsequently block them or otherwise coerce them into disabling the adblocker – practices we refer to as anti-adblocking. While there has been a degree of sound and fury on the topic, until now we haven’t been able to understand the scale, mechanism, and dynamics of anti-adblocking. This is the gap we have started to address, together with researchers from the University of Cambridge, Stony Brook University, University College London, University of California Berkeley, Queen Mary University of London, and the International Computer Science Institute (Berkeley). We do so by leveraging a novel approach for identifying third-party services shared across multiple websites, presenting a first characterization of anti-adblocking across the Alexa Top-5K websites.
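
The core of the shared-services idea can be sketched in a few lines: collect the third-party script hosts on each page and flag hosts that recur across many unrelated sites. This toy crawler (requests plus BeautifulSoup) is a simplification of the methodology; the study used instrumented browser crawls, and the threshold below is arbitrary.

```python
from collections import defaultdict
from urllib.parse import urlparse

import requests
from bs4 import BeautifulSoup

def third_party_script_hosts(site):
    """Return the external hosts serving <script src=...> on a site's
    front page. A toy fetch; real crawls need a full browser."""
    html = requests.get(f"http://{site}", timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    hosts = set()
    for tag in soup.find_all("script", src=True):
        host = urlparse(tag["src"]).netloc.lower()
        if host and not host.endswith(site.lower()):
            hosts.add(host)
    return hosts

def shared_providers(sites, min_sites=5):
    """Script hosts appearing on at least `min_sites` of the input sites
    are candidate shared services (analytics, ads, anti-adblocking)."""
    seen_on = defaultdict(set)
    for site in sites:
        for host in third_party_script_hosts(site):
            seen_on[host].add(site)
    return {h: s for h, s in seen_on.items() if len(s) >= min_sites}
```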

We find that at least 6.7% of Alexa Top-5K websites employ anti-adblocking, with the practice finding adoption across a diverse mix of publishers, particularly in the “General News”, “Blogs/Wiki”, and “Entertainment” categories. It turns out that these websites owe their anti-adblocking capabilities to 14 unique scripts pulled from 12 unique domains. Unsurprisingly, the most popular domains are those that have skin in the game – Google, Taboola, Outbrain, Ensighten and Pagefair – the latter being a company that specialises in anti-adblocking services. Then there are in-house anti-adblocking solutions that are distributed by a domain to client websites belonging to the same organisation: TripAdvisor distributes an anti-adblocking script to its eight websites with different country code top-level domains, while adult websites (all hosted by MindGeek) turn to DoublePimp. Finally, we visited a sample website for each anti-adblocking script with AdBlock Plus, Ghostery, and Privacy Badger enabled, and discovered that half of the 12 anti-adblocking suppliers are counter-blocked by at least one adblocker – suggesting that the arms race has already entered the next level.

It is hard to say how many levels deeper the adblocking arms race might go. While anti-adblocking may provide temporary relief to publishers, it is essentially a band-aid solution that masks a deeper issue – the disequilibrium between ads (and, particularly, their behavioural tracking back-end) and information. Any long-term solution must address the reasons that brought users to adblockers in the first place. In the meantime, as the arms race continues to escalate, we hope that studies such as ours will bring transparency to this opaque subject, and inform policy that moves us out of the current deadlock.


“Adblocking and Counter-Blocking: A Slice of the Arms Race” by Rishab Nithyanand, Sheharbano Khattak, Mobin Javed, Narseo Vallina-Rodriguez, Marjan Falahrastegar, Julia E. Powles, Emiliano De Cristofaro, Hamed Haddadi, and Steven J. Murdoch. arXiv:1605.05077v1 [cs.CR], May 2016.

This post also appears on the University of Cambridge Computer Laboratory Security Group blog, Light Blue Touchpaper.

On the hunt for Facebook’s army of fake likes

As social networks are increasingly relied upon to engage with people worldwide, it is crucial to understand and counter fraudulent activities. One of these is “like farming” – the process of artificially inflating the number of Facebook page likes. To counter it, researchers worldwide have designed detection algorithms to distinguish between genuine likes and artificial ones generated by farm-controlled accounts. However, it turns out that more sophisticated farms can often evade detection tools, including those deployed by Facebook.

What is Like Farming?

Facebook pages allow their owners to publicize products and events and, in general, to get in touch with customers and fans. Owners can also promote their pages via targeted ads – in fact, more than 40 million small businesses reportedly have active pages, and almost 2 million of them use Facebook’s advertising platform.

At the same time, as the number of likes attracted by a Facebook page is considered a measure of its popularity, an ecosystem of so-called “like farms” has emerged whose business is inflating page like counts. Farms typically do so either to later sell the pages to scammers at an increased resale/marketing value or as a paid service to page owners. Prices for like farms’ services are quite volatile, but they typically range between $10 and $100 per 100 likes, depending in part on whether one wants to target specific regions – e.g., likes from US users are usually more expensive.

Screenshot from http://www.getmesomelikes.co.uk/

How do farms operate?

There are a number of possible ways farms can operate, and ultimately this dramatically influences not only their cost but also how hard they are to detect. One obvious way is to instruct fake accounts; however, opening a fake account is somewhat cumbersome, since Facebook now requires users to solve a CAPTCHA and/or enter a code received via SMS. Another strategy is to rely on compromised accounts, i.e., to control real accounts whose credentials have been illegally obtained from password leaks or through malware. For instance, fraudsters could obtain Facebook passwords through a malicious browser extension on the victim’s computer, by hijacking a Facebook app, via social engineering attacks, or by finding credentials leaked from other websites (and dumped on underground forums) that are also valid on Facebook.

Continue reading On the hunt for Facebook’s army of fake likes