Social networks

A Privacy Framework for Research Using Social Media Data

Social media data enables researchers to understand current events and human behavior with unprecedented ease and scale. Yet, researchers often violate user privacy when they access, process, and store sensitive information contained within social media data.

Social media has proved largely beneficial for research. Studies measuring the spread of COVID-19 and predicting crime highlight valuable insights that would not be possible without the Big Data present on social media. Lurking within these datasets, though, are intimate personal stories related to sensitive topics such as sexual abuse, pregnancy loss, and gender transition, shared both anonymously and in identifying ways.

In our latest paper presented at the 2025 IEEE Symposium on Security and Privacy, “SoK: A Privacy Framework for Security Research Using Social Media Data,” we examine the tensions in social media research between pursuing better science enabled by social media data and protecting social media users’ privacy. We focus on security and privacy research for two reasons: it can involve sensitive topics such as misinformation, harassment, and abuse, and security researchers need to be held to a higher standard when revealing vulnerabilities that impact companies and users.

Methodology

Toward this end, we conducted a systematic literature review of security literature on social media. We collected over 10,000 papers from six different disciplines: Computer Security and Cryptography (CSC); Data Mining and Analysis (DMA); Human-Computer Interaction (HCI); Humanities, Literature & Arts, Communication (HLAC); Social Sciences, Criminology (SSC); and Social Sciences, Forensic Science (SSFS). After several rounds of screening, our final dataset included 601 papers spanning 16 years.

How do security researchers handle privacy of social media data?

Our most alarming finding is that only 35% of papers mention any considerations of data anonymization, availability, and storage. This means that most security and privacy papers fail to report how the researchers handled user privacy.


Consider unintended harms of cybersecurity controls, as they might harm the people you are trying to protect

Well-meaning cybersecurity risk owners will deploy countermeasures in an effort to manage the risks they see affecting their services or systems. What is not often considered is that those countermeasures may produce unintended, negative consequences themselves. These unintended consequences can potentially be harmful, adversely affecting user behaviour, user inclusion, or the infrastructure itself (including services of others).

Here, I describe a framework co-developed with several international researchers at a Dagstuhl seminar in mid-2019, resulting in an eCrime 2019 paper later in the year. We were drawn together by an interest in understanding unintended harms of cybersecurity countermeasures, and encouraging efforts to preemptively identify and avoid these harms. Our collaboration on this theme drew on our varied and multidisciplinary backgrounds and interests, including not only risk management and cybercrime, but also security usability, systems engineering, and security economics.

We saw it as necessary to focus on situations where there is often an urgency to counter threats, but where efforts to manage threats have the potential to introduce harms. As documented in the recently published seminar report, we explored specific situations in which potential harms may make resolving the overarching problems more difficult, and as such cannot be ignored – especially where potentially harmful countermeasures ought to be avoided. Example case studies of particular importance include tech-abuse by an intimate partner, online disinformation campaigns, combating CEO fraud and phishing emails in organisations, and online dating fraud.

Consider disinformation campaigns, for example. Efforts to counter disinformation on social media platforms can include fact-checking and automated detection algorithms behind the scenes. These can reduce the burden on users to address the problem. However, automation can also reduce users’ scepticism towards the information they see; fact-checking can be appropriated as a tool by any one group to challenge viewpoints of dissimilar groups.

We then see how unintended harms can shift the burden of managing cybersecurity to others in the ecosystem without them necessarily expecting it or being prepared for it. There can be vulnerable populations which are disadvantaged by the effects of a control more than others. An example may be legitimate users of social media who are removed – or have their content removed – from a platform, due to traits shared with malicious actors or behaviour, e.g., referring to some of the same topics, irrespective of sentiment – an example of ‘Misclassification’, in the list below. If a user, user group, or their online activity is removed from the system, the risk owner for that system may not notice that problems have been created for those users – the risk owner simply will not see them, because the control has excluded them. Anticipating and avoiding unintended harms before any such outcomes can occur is then crucial.


What We Disclose When We Choose Not To Disclose: Privacy Unraveling Around Explicit HIV Disclosure Fields

For many gay and bisexual men, mobile dating or “hook-up” apps are a regular and important part of their lives. Many of these apps now ask users for HIV status information to create a more open dialogue around sexual health, to reduce the spread of the virus, and to help fight HIV-related stigma. Yet, if a user wants to keep their HIV status private from other app users, this can be more challenging than one might first imagine. While most apps provide users with the choice to keep their status undisclosed with some form of “prefer not to say” option, our recent study, which we describe in a paper being presented today at the ACM Conference on Computer-Supported Cooperative Work and Social Computing 2018, finds that privacy may “unravel” around users who choose this non-disclosure option, which could limit disclosure choice.

Privacy unraveling is a theory developed by Peppet in which he suggests people will self-disclose their personal information when it is easy to do so, low-cost, and personally beneficial. Privacy may then unravel around those who keep their information undisclosed, as they are assumed to be “hiding” undesirable information, and are stigmatised and penalised as a consequence.

In our study, we explored the online views of Grindr users and found concerns over assumptions developing around HIV non-disclosures. For users who believe themselves to be HIV negative, the personal benefits of disclosing are high and the social costs low. In contrast, for HIV positive users, the personal benefits of disclosing are low, whilst the costs are high due to the stigma that HIV still attracts. As a result, people may assume that those not disclosing possess the low gain, high cost status, and are therefore HIV positive.
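The inference described above can be made concrete with Bayes' rule. The following is a minimal sketch only: the function name and all three input probabilities are hypothetical illustrations, not figures from the study.

```python
def p_positive_given_undisclosed(prior_positive, p_hide_if_positive, p_hide_if_negative):
    """Probability that a user is HIV positive given that their status is undisclosed.

    All three inputs are hypothetical, illustrative probabilities:
      prior_positive      - share of users who are HIV positive
      p_hide_if_positive  - chance an HIV positive user picks "prefer not to say"
      p_hide_if_negative  - chance an HIV negative user picks "prefer not to say"
    """
    hidden_positive = prior_positive * p_hide_if_positive
    hidden_negative = (1 - prior_positive) * p_hide_if_negative
    total_hidden = hidden_positive + hidden_negative
    if total_hidden == 0:
        raise ValueError("nobody is undisclosed, so the posterior is undefined")
    return hidden_positive / total_hidden


# Made-up numbers: positive users hide far more often than negative users.
unraveled = p_positive_given_undisclosed(0.1, 0.8, 0.2)

# Made-up numbers: both groups hide at the same rate.
no_signal = p_positive_given_undisclosed(0.1, 0.5, 0.5)

print(round(unraveled, 2), round(no_signal, 2))
```

With these invented inputs, the first case gives a posterior of roughly 0.31, about three times the assumed 0.1 prior, which is the unraveling effect. In the second case the posterior equals the prior, so a non-disclosure reveals nothing.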

We developed a series of conceptual designs that utilise Peppet’s proposed limits to privacy unraveling. One of these designs is intended to artificially increase the cost of disclosing an HIV negative status. We suggest time and money as two resources that could be used to artificially increase disclosure cost. For example, users reporting to be HIV negative could be asked to watch an educational awareness video on HIV prior to disclosing (time), or only those users who had a premium subscription could be permitted to disclose their status (money). An alternative (or parallel) approach is to reduce the high cost of disclosing an HIV positive status by designing in mechanisms to reduce social stigma around the condition. For example, all users could be offered the option to sign up to “living stigma-free”, which could also appear on their profile to signal their pledge to others.

Another design approach is to create uncertainty over whether users are aware of their own status. We suggest profiles disclosing an HIV negative status for more than 6 months be switched automatically to undisclosed unless they report a recent HIV test. This could act as a testing reminder, as well as increasing uncertainty over the reason for non-disclosures. We also suggest increasing uncertainty or ambiguity around HIV status disclosure fields by clustering undisclosed fields together. This may create uncertainty around the particular field the user is concerned about disclosing. Finally, design could be used to cultivate norms around non-disclosures. For example, HIV status disclosure could be limited to HIV positive users, with non-disclosures then assumed to be an HIV negative status, rather than an HIV positive status.
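The six-month expiry rule could be sketched as follows. This is an illustration under stated assumptions, not an implementation from the paper: the function name, the status strings, and the choice of 183 days as "six months" are all hypothetical.

```python
from datetime import date, timedelta

# Assumption: "6 months" is approximated as 183 days.
MAX_AGE = timedelta(days=183)


def displayed_status(declared_status, last_confirmed, today):
    """Return the status to show on a profile.

    declared_status - "negative", "positive", or "undisclosed" (hypothetical labels)
    last_confirmed  - date the user disclosed or last reported an HIV test, or None
    today           - the current date
    """
    if declared_status != "negative":
        # The expiry rule only applies to HIV negative disclosures.
        return declared_status
    if last_confirmed is None or today - last_confirmed > MAX_AGE:
        # Stale negative disclosures fall back to undisclosed.
        return "undisclosed"
    return "negative"


print(displayed_status("negative", date(2018, 1, 1), date(2018, 3, 1)))
print(displayed_status("negative", date(2018, 1, 1), date(2018, 9, 1)))
```

The first call is two months after confirmation and keeps the negative status. The second is eight months after and reverts to undisclosed, so undisclosed profiles would include users who simply have not reported a recent test.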

In our paper, we discuss some of the potential benefits and pitfalls of implementing Peppet’s proposed limits in design, and suggest further work needed to better understand the impact privacy unraveling could have in online social environments like these. We explore ways our community could contribute to building systems that reduce its effect in order to promote disclosure choice around this type of sensitive information.

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 675730.

Measuring and Modeling the Vivino Wine Social Network

Over the past few years, food and drink have become an essential part of our social media footprints. This shouldn’t come as a surprise – after all, eating and drinking were social activities long before the first #foodporn hashtag on Instagram. In fact, scientific studies have shown that what we gobble up or gulp down is shaped by social and regional influences, and that we tend to mirror the habits of people with shared social connections.

Nowadays, we have an unprecedented opportunity to study eating and drinking habits at scale, as people share more and more of them online, both on popular social networks like Instagram, Twitter, and Facebook and on “dedicated” apps like Yummly or Untappd.

Along these lines is our paper, “Of Wines and Reviews: Measuring and Modeling the Vivino Wine Social Network,” recently presented at the IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2018) in Barcelona. The paper – co-authored by former UCL undergraduate student Neema Kotonya, Italian wine journalist Paolo De Cristofaro, and UCL faculty Emiliano De Cristofaro – presents a preliminary big-data and social network analysis of how users worldwide consume, rate, and review wines. We did so through the lens of Vivino, a popular wine social network. (And, yes, Paolo is Emiliano’s brother! 😊)

What is Vivino?

Vivino.com is an online community for wine enthusiasts, available both as a web and a mobile application. It was founded in 2009 by Heini Zachariassen, with his colleague Theis Sondergaard joining in 2010. In a nutshell, Vivino allows users to review and purchase wines through third-party vendors. The mobile app also provides a “wine scanner” functionality – i.e., users can upload pictures of wine labels and access reviews and details about the wine/winery.

But Vivino is actually a social network, as it allows wine enthusiasts to communicate with and follow each other, as well as share reviews and recommendations. As of September 2018, it had 32 million users, 9.7 million wines covering a multitude of wine styles, grapes, and geographical regions, as well as 103.7 million ratings and almost 35 million reviews.


Thinking about fake news – As a security incident?

In Tristan and David’s Philosophy, Politics and Economics of Security and Privacy class, Jono gave a little information about incident response. As a result, we have been thinking about the recent furor over fake news. There are some big questions circling this topic, and we’re going to try to focus on a part we have some competence in: what an understanding of fake news as a security incident can contribute to the wider debate. Our goal here is mostly to highlight some lessons from security research that should be applicable, so we can help constrain the solution space. Ultimately, any solution will need to engage with wider civil society.

The lessons we will argue for in the following are:

1. Solutions need to support the elector’s primary task. Education to avoid cognitive biases is not a short- or medium-term solution.
2. Focus on aligning the incentives of the media companies and the voters. Reduce the return on investment for the adversary.
3. Any blocking should be strategically useful, and not merely reactionary.

First, we want a more specific term, as well as a less charged one. Fake news includes politically or financially motivated stories presented as factual reports on the world that are fictional in material ways, and usually are intended to stir strong feelings. This definition is hardly complete. Furthermore, similar to the term “post-truth” as discussed by Jasanoff and Simmet, the term “fake news” makes several value judgements we’d like to avoid. “Fake news” carries a strong suggestion that we, the speakers, know what is true and what isn’t, and it also indicates some condescension by the speaker for anyone who believes an item of fake news. We want to avoid such insults. Instead, let’s say we want to focus on the following hypothetical security policy: democratic elections should be free from foreign interference.

Grounding out this policy definition hangs on the term “interference.” This is hard. Ultimately, the will of an elector in a free and fair election needs to be respected. This makes it particularly challenging to agree on constraints to what information an elector has access to. In practice, no elector is omniscient, so some constraints de facto exist. But weighing in on this issue is outside our competence. Let’s assume for now that public policy will provide an assessment of “interference” eventually. The UK recently announced a “dedicated national security communications unit” would be charged with “combating disinformation by state actors and others.” In France, Emmanuel Macron plans legislation to fight interference from foreign sources during elections. Various social media platforms have likewise announced attempted fixes, which means they have some functional definition of what “interference” they’re seeking to remove. Unfortunately, “none of the tech giants claim to be ready” for the November 2018 elections in the US.

Interference in elections is a type of information warfare. An appropriate security policy needs to assess the threat environment and the capabilities of the adversaries. In particular, the Russian Federation has been assessed as a highly motivated and well-resourced actor in this space. We should note that Russia, in turn, assesses the intent and capability of the USA similarly. Tools and tactics within information warfare, particularly disinformation campaigns, help define “interference” within our security policy.

In this context, what can the security research community recommend? Well, the main targets of a disinformation campaign are ordinary citizens. They are targetable largely due to inherent cognitive biases in the way humans process and reason about information. In security terms, we could see these biases as vulnerabilities in the system. Classically, we have two options to secure the system: patch the vulnerability, or prevent the adversary from exploiting it by controlling or filtering the attack before it reaches the target.

Patch in this case would mean teaching people to avoid cognitive biases in their day-to-day reasoning. Psychology tells us this is hard. Intelligence analysts train for months or years for this. And the research in usable security has affirmed time and time again that the users are not the enemy. That is, the system must alleviate the burden on the user’s attention and not interfere with their primary task, or else the user will subvert or avoid the protections put in place. Any changes in user culture are slow. This leads us to lesson 1 on preventing disinformation campaigns for election interference: solutions need to support the elector’s primary task. Education to avoid cognitive biases is not a short-term or medium-term solution.

Controlling the attack vectors is more promising, although filtering them is not. A key aspect of any information security policy is aligning the economic incentives of the actors. Economics is a main reason why infosec is hard. It may not be easy to reorganize the incentives in the advertising and news distribution media space. However, as long as organizations profit from more clicks on an article no matter the content, there will be an incentive to drive viewers that is ultimately at cross-purposes with our security goal. Such misaligned incentives often swamp any technical security solutions. And any adversary with an economic incentive to attack usually will. Thus our second lesson: focus on aligning the incentives of the media companies and the voters; reduce the return on investment for the adversary. Exactly how to do these things will require future work.

Blocking access to information raises huge issues about human rights and free speech. However, the technical aspects of blacklisting are worth understanding before even attempting such human-rights debates. Blacklists of internet resources, such as domain names, IP addresses, or web pages, are useful. But they’re not a final solution. Whether blacklists move at the speed of national legislatures or are updated every five minutes, their main impact is to cause the adversary to move around. Blacklists alone are not enough. We would need to look for suspiciously mobile resources (i.e. fast-flux), and eventually whitelist resources. Blacklists such as those implemented by Facebook in response to Congress are helpful. But we should carefully consider how they drive the disinformation campaigns into a place we are better able to counteract them, and be sure we don’t make such campaigns harder to find instead. Lesson 3 is therefore that any blocking should be strategically useful, and not merely reactionary.

We’d be happy for further comments on fake news, disinformation campaigns that interfere with elections, lessons we’ve missed, disagreements about the value of security research to this topic, and other comments you might have! This is a wide open topic, and we’re still sounding it all out.

How 4chan and The_Donald Influence the Fake News Ecosystem

On July 2nd, 2017, Donald Trump, the President of the United States of America, tweeted a short video clip of him punching out a CNN logo. The video was modified from an appearance that Mr. Trump made at a Wrestlemania event. It originally appeared on Reddit’s /r/The_Donald subreddit. Although The_Donald was infamous in some circles, the uproar the video caused was, for many, their first introduction to a community of users that has had a striking amount of influence on the world stage. While memes like the one birthed from The_Donald are worrying but mostly harmless, a shocking amount of disinformation (“fake news”) is also created by and spread from smaller, fringe Web communities that have relatively outsized influence on the greater Web.

#FraudNewsCNN #FNN pic.twitter.com/WYUnHjjUjg

— Donald J. Trump (@realDonaldTrump) July 2, 2017

In a nutshell, the explosion of the Web has commoditized the creation of false information and enabled it to spread like wildfire at unprecedented scale. After a decade and a half of experience with social media platforms, bad actors have honed their techniques and become surprisingly adept at crafting messages that at best make it difficult to distinguish between fact and fiction, and at worst propagate dangerous falsehoods.

While recent discourse has tried to blur the lines between real and fake news, there are some fundamental differences. The simplest is that fake news has to be created in the first place. Real news, even opinion pieces, is based around reporting and interpretation of factual material. This is not to dismiss the efforts of journalists, but the fact remains: they are not responsible for generating stories from whole cloth. This is not the case with the type of misinformation pushed by certain corners of the Web.

Consider recent events like the death of Heather Heyer during the Charlottesville protests earlier this year. While facts had to be discovered, they were facts, supported by evidence gathered by trained professionals (both law enforcement and journalists) over a period of time. This is real news. However, the facts did not line up with the far right political ideology espoused on 4chan’s /pol/ board (or if we want to play Devil’s Advocate, it made for good trolling material), and thus its users set about creating alternative narratives. Immediately they began working towards a shocking, to the uninitiated, nearly singular goal: deflect from the fact that a like mind committed a heinous act of violence in any way possible.

4chan: Crowdsourced Opposition Intelligence

After more than a year of observing, measuring, and trying to understand the rise of the alt-right online, we saw a familiar pattern emerge: crowdsourced opposition intelligence.

/pol/ users mobilized in a perverse, yet fascinating, use of the Web. Dozens of often conflicting discussion threads put forth alternative theories of Ms. Heyer’s killing, supported by everything from pure conjecture, to dubious analysis of mobile phone video and pictures, to impressive investigations discovering personal details and relationships of victims and bystanders. Over time, pieces of the fabrication were agreed upon and tweaked until it resembled in large part a plausible, albeit eyebrow-raising, false reality ready for consumption by the general public. Further, as bits of the narrative are debunked, it continues to evolve, weeks after the actual facts have been established.

One month after Heather Heyer’s killing, users on /pol/ were still pushing fabricated alternative narratives of events.

The Web Centipede

There are many anecdotal examples of smaller communities on the Web bubbling up and influencing the rest of the Web, but the plural of anecdote is not data. The research community has studied information diffusion on specific social media platforms like Facebook and Twitter, and indeed each of these platforms is under fire from government investigations in the US, UK, and EU, but the Web is much bigger than just Facebook and Twitter. There are other forces at play, where false information is incubated and crafted for maximum impact before it reaches a mainstream audience. Thus, we set out to measure just how this influence flows in a systematic and methodological manner, analyzing how URLs from 45 mainstream and 54 alternative news sources are shared across 8 months of Reddit, 4chan, and Twitter posts.

While we made many interesting findings, there are a few we’ll highlight here:

Reddit and 4chan post mainstream news URLs at over twice the rate that Twitter does, and 4chan in particular posts alternative news URLs at twice the rate of Twitter and Reddit.
We found that alternative news URLs spread much faster than mainstream URLs, perhaps an artifact of automated bots.
While 4chan was usually the slowest to post a given URL, it was also the most successful at “reviving” old stories: if a URL was re-posted after a long period of time, it probably showed up on 4chan originally.

Graph representation of news ecosystem for mainstream news domains (left) and alternative news domains (right). We create two directed graphs, one for each type of news, where the nodes represent alternative or mainstream domains, as well as the three platforms, and the edges are the sequences that consider only the first-hop of the platforms. For example, if a breitbart.com URL appears first on Twitter and later on the six selected subreddits, we add an edge from breitbart.com to Twitter, and from Twitter to the six selected subreddits. We also add weights on these edges based on the number of such unique URLs. Edges are colored the same as their source node.
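The edge construction described in the caption can be sketched roughly as below. This is not the code used in the study: the function name, the tuple layout, and the sample posts are hypothetical, and it assumes that “first-hop” means only the source domain's first platform and the first transition between platforms are recorded.

```python
from collections import Counter


def first_hop_edges(posts):
    """Build weighted directed edges from (url, domain, platform, timestamp) tuples.

    For each unique URL, add an edge domain -> first platform to post it and,
    if a second platform posted it later, an edge first platform -> second platform.
    Edge weights count the unique URLs that followed that path.
    """
    first_seen = {}  # url -> {platform: earliest timestamp on that platform}
    domain_of = {}   # url -> news domain
    for url, domain, platform, ts in posts:
        domain_of[url] = domain
        per_platform = first_seen.setdefault(url, {})
        if platform not in per_platform or ts < per_platform[platform]:
            per_platform[platform] = ts

    weights = Counter()
    for url, per_platform in first_seen.items():
        # Platforms ordered by when they first posted this URL.
        order = sorted(per_platform, key=per_platform.get)
        weights[(domain_of[url], order[0])] += 1
        if len(order) > 1:
            weights[(order[0], order[1])] += 1
    return weights


# Hypothetical sample data; the URLs and timestamps are invented.
posts = [
    ("breitbart.com/story-a", "breitbart.com", "Twitter", 10),
    ("breitbart.com/story-a", "breitbart.com", "Reddit", 25),
    ("breitbart.com/story-a", "breitbart.com", "Twitter", 30),
    ("breitbart.com/story-b", "breitbart.com", "4chan", 5),
]
edges = first_hop_edges(posts)
for (source, target), weight in sorted(edges.items()):
    print(source, "->", target, weight)
```

Because each URL is processed once after its earliest appearance per platform has been collected, repeated posts of the same URL on one platform do not inflate the edge weights.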

Measuring Influence Through the Lens of Mainstream and Alternative News

While comparative analysis of news URL posting behavior provides insight into how Web communities connect together like a centipede through which information flows, it is not sufficiently powerful to quantify the specific levels of influence they have.

