Can Ethics Help Restore Internet Freedom and Safety?

Internet services are suffering from various maladies ranging from algorithmic bias to misinformation and online propaganda. Could computer ethics be a remedy? Mozilla’s head Mitchell Baker warns that computer science education without ethics will lead the next generation of technologists to inherit the ethical blind spots of those currently in charge. A number of leaders in the tech industry have lent their support to Mozilla’s Responsible Computer Science Challenge initiative to integrate ethics with undergraduate computer science training. There is a heightened interest in the concept of ethical by design, the idea of baking ethical principles and human values into the software development process from design to deployment.

Ethical education and awareness are important, and a number of useful resources exist. Most computer science practitioners refer to the codes of ethics and conduct provided by the field’s professional bodies such as the Association for Computing Machinery and the Institute of Electrical and Electronics Engineers, and in the UK the British Computer Society and the Institution of Engineering and Technology. Computer science research is predominantly guided by the principles laid out in the Menlo Report.

But aspirations and reality often diverge, and ethical codes do not directly translate to ethical practice. Or, to be precise, to the ethical practices of about five companies. The concentration of power among a small number of big companies means that their practices define the online experience of the majority of Internet users. I showed this amplified power in my study on the Web’s differential treatment of users of the Tor anonymity network.

An ethical code alone is not enough; it needs to be complemented by suitable enforcement and reinforcement. So who will do the job? Currently, for the most part, companies themselves are the judge and jury in how their practices are regulated. This is not a great idea. The obvious misalignment of incentives is aptly captured in an Urdu proverb that means: “The horse and grass can never be friends”. Self-regulation by companies can result in inconsistent and potentially biased regulation patterns, and/or over-regulation to stay legally safe.

Scanning the Internet for Liveness

Internet-wide scanning (or probing) has emerged as a key measurement technique to study a diverse set of the Internet’s properties, including address space utilization, host reachability, topology, service availability, vulnerabilities, and service discrimination. But despite its widespread use and critical importance for Internet measurement, we still lack a clear understanding of IP liveness—whether a target IP address responds to a probe packet. What type of probe packets should we send if we, for example, want to maximize the responding host population? What type of responses can we expect and which factors determine such responses? What degree of consistency can we expect when probing the same host with different probe packets?

In our recent paper Scanning the Internet for Liveness, we presented a systematic analysis of liveness and how it shows up in active scanning campaigns. We developed a taxonomy of liveness which we employed to develop a method to perform concurrent IPv4 scans using ICMP, five TCP-based, and two UDP-based protocols, capturing all responses to our probes. Our key findings are:

  • Responsive host populations are highly sensitive to the choice of probe. While ICMP discovers the highest number of raw IPs, hosts that respond only to our TCP and UDP probes account for a fifth of the total population of responsive hosts.
  • Collecting ICMP Error messages for TCP and UDP scans increases the responsive population by more than 13%, and provides new opportunities to interpret scan results.
  • At the transport layer, our concurrent measurements reveal that the majority of hosts exhibit inconsistent behaviour when probed on different ports, and that capturing negative responses significantly improves scanning completeness.
  • Our concurrent scans allow us to identify nearly 2M tarpits (IPs masquerading as fake hosts) that would bias measurements that do not take them into account.
  • Our study of cross-protocol liveness shows that responsiveness for some protocols is correlated, suggesting that the same seed set of responsive IP addresses can be potentially used to bootstrap multiple highly-correlated target populations to reduce scan traffic.
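
To make the notion of an "exclusive" contribution concrete, here is a minimal Python sketch. It assumes each scan has already been reduced to a set of responsive IP addresses per protocol; the function name `exclusive_share`, the protocol labels, and the toy addresses are illustrative assumptions, not the code or data released with the paper.

```python
from typing import Dict, Iterable, Set


def exclusive_share(responsive: Dict[str, Set[str]], group: Iterable[str]) -> float:
    """Fraction of all responsive IPs seen only by the protocols in `group`."""
    group = set(group)
    all_ips = set().union(*responsive.values())
    if not all_ips:
        return 0.0
    # IPs that answered at least one protocol inside the group.
    inside = set().union(*(ips for p, ips in responsive.items() if p in group))
    # IPs that answered at least one protocol outside the group.
    outside = set().union(*(ips for p, ips in responsive.items() if p not in group))
    return len(inside - outside) / len(all_ips)


# Hypothetical toy data using documentation-range addresses (not measured results).
scan = {
    "icmp": {"192.0.2.1", "192.0.2.2", "192.0.2.3"},
    "tcp80": {"192.0.2.2", "192.0.2.4"},
    "udp53": {"192.0.2.3", "192.0.2.5"},
}
tcp_udp_only = exclusive_share(scan, ["tcp80", "udp53"])
```

In this toy data set, two of the five responsive addresses answer only the TCP or UDP probes, so an ICMP-only scan would miss them.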

This work recently appeared in the April 2018 issue of ACM SIGCOMM Computer Communication Review (CCR), and was conducted in collaboration with Philipp Richter (MIT), Mobin Javed (LUMS Pakistan, ICSI Berkeley), Srikanth Sundaresan (Princeton University), Zakir Durumeric (Stanford University), Steven J. Murdoch (University College London), Richard Mortier (University of Cambridge) and Vern Paxson (UC Berkeley, ICSI Berkeley). Overall, this study yields practical insights and methodological improvements for the design and the execution of active Internet measurement studies. We released the code and data of this work as open source to allow for reproducibility of the results, and to enable further research.

Systematizing Consensus in the Age of Blockchains

We are at a crucial point in the evolution of blockchains, and the biggest hurdle in their widespread adoption is improving their performance and scalability. These properties are deeply related to the consensus protocol used—the core component of the blockchain allowing multiple nodes to agree on the data to be sealed in the chain. This week we published a pre-print of the first comprehensive systematization of blockchain consensus protocols. This blog post discusses the motivation for this study, the challenges in systematization, and a summary of the key contributions.

Consensus is an old, well-studied problem in computer science. The distributed systems community has studied it for decades, and developed robust and practical protocols that can tolerate faulty and malicious nodes. However, these protocols were designed for small closed groups and cannot be directly applied to blockchains that require consensus in very large, open-participation peer-to-peer settings.

The Bitcoin Consensus Protocol

Bitcoin’s main innovation was to enable consensus among an open, decentralized group of nodes. This involves a leader election based on proof-of-work: all nodes attempt to find the solution to a hash puzzle and the node that wins adds the next block to the blockchain. A downside of its probabilistic leader election process, combined with performance variations in decentralized networks, is that Bitcoin offers only weak consistency. Different nodes might end up having different views of the blockchain leading to forks. Moreover, Bitcoin suffers from poor performance which cannot be fixed without fundamental redesign, and its proof-of-work consumes a huge amount of energy.
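
The hash puzzle can be illustrated with a toy Python sketch. This is a simplification under stated assumptions: it hashes arbitrary bytes plus a nonce with a single SHA-256 and counts difficulty in leading zero bits, whereas Bitcoin itself hashes a structured block header with double SHA-256 and encodes its target differently. The names `solve_puzzle`, `verify`, and `difficulty_bits` are hypothetical.

```python
import hashlib


def _digest_value(block_data: bytes, nonce: int) -> int:
    """Interpret SHA-256(block_data || nonce) as a big-endian integer."""
    digest = hashlib.sha256(block_data + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big")


def verify(block_data: bytes, nonce: int, difficulty_bits: int) -> bool:
    """A solution is valid if the digest falls below the target."""
    return _digest_value(block_data, nonce) < (1 << (256 - difficulty_bits))


def solve_puzzle(block_data: bytes, difficulty_bits: int) -> int:
    """Brute-force search: try nonces in order until one is valid."""
    nonce = 0
    while not verify(block_data, nonce, difficulty_bits):
        nonce += 1
    return nonce


# Toy difficulty: roughly 2**12 attempts are expected on average.
nonce = solve_puzzle(b"toy block", difficulty_bits=12)
```

Finding the nonce takes many hash evaluations, while checking it takes one. Which node finds a solution first is down to chance and hashing power, which is why the leader election is probabilistic.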

Improved Blockchain Consensus Protocols

Because of these issues, over the last few years a plethora of designs for new consensus protocols have been proposed. Some replace Bitcoin’s proof-of-work with more energy-efficient alternatives, while others modify Bitcoin’s original design for better performance. To achieve strong consistency and performance similar to that of mainstream payment processing systems like Visa and PayPal, another vein of work proposes to repurpose classical consensus protocols for use in blockchains. As a result of these various design proposals, the area has become too complex to see the big picture.

Systematization Challenges

To date there exists no systematic and comprehensive study of blockchain consensus protocols. Such a study is challenging for two reasons. First, a comprehensive survey of blockchains would be incomplete without a discussion of classical consensus protocols. But that literature is vast and complex, which makes it hard to tailor to blockchains. Second, conducting a survey of consensus protocols in blockchains has its own difficulties. Though the field is young, it is both high-volume and fast-paced. The figure above shows the number of papers published on blockchains each year since Bitcoin’s inception in 2008 (sourced from CABRA). One might consider only accounting for work published in reputable venues, but this approach is not feasible in the case of blockchains because the bulk of the work is published in non-peer-reviewed venues and as white papers for industrial platforms.

Systematization of Blockchain Consensus Protocols

To fill this gap, this week we published a pre-print of the first comprehensive systematization of blockchain consensus protocols—mapping out their evolution from the classical distributed systems use case to their application to blockchains. After first discussing key themes in classical consensus protocols, we describe: (i) protocols based on proof-of-work, (ii) proof-of-X protocols that replace proof-of-work with more energy-efficient alternatives, and (iii) hybrid protocols that are compositions or variations of classical consensus protocols. We developed a framework to evaluate their performance, security and design properties, and used it to systematize key themes in different protocol categories. This work highlighted a number of open areas and challenges related to gaps between classical consensus protocols and blockchains, security vs performance tradeoffs, incentives, and privacy. We hope that this longitudinal perspective will inspire the design of new and faster consensus protocols that can cater to varying security and privacy requirements.

Chainspace: A Sharded Smart Contracts Platform

Thanks to their resilience, integrity, and transparency properties, blockchains have gained much traction recently, with applications ranging from the banking and energy sectors to legal contracts and healthcare. Blockchains initially received attention as Bitcoin’s underlying technology. But for all its success as a popular cryptocurrency, Bitcoin suffers from scalability issues: with a current block size of 1MB and a 10-minute inter-block interval, its throughput is capped at about 7 transactions per second, and a client that creates a transaction has to wait for about 10 minutes to confirm that it has been added to the blockchain. This is several orders of magnitude slower than what mainstream payment processing companies like Visa currently offer: transactions are confirmed within a few seconds, with a high throughput of 2,000 transactions per second on average, peaking at up to 56,000 transactions per second. A reparametrization of Bitcoin can somewhat assuage these issues, increasing throughput to 27 transactions per second and reducing latency to 12 seconds. Smart contract platforms, such as Ethereum, inherit those scalability limitations. More significant improvements, however, call for a fundamental redesign of the blockchain paradigm.
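
The throughput cap follows from simple arithmetic, sketched below in Python. The block size and inter-block interval come from the text; the average transaction size of 250 bytes is an assumption chosen for illustration, and real transaction sizes vary.

```python
def max_tps(block_size_bytes: int, block_interval_s: float, avg_tx_bytes: int) -> float:
    """Upper bound on transactions per second for a fixed block size and interval."""
    txs_per_block = block_size_bytes / avg_tx_bytes
    return txs_per_block / block_interval_s


# 1MB blocks every 10 minutes; 250 bytes per transaction is an assumed average.
bitcoin_tps = max_tps(block_size_bytes=1_000_000, block_interval_s=600, avg_tx_bytes=250)
```

Under these assumptions a block holds about 4,000 transactions, which works out to a little under 7 transactions per second. Raising the block size or shortening the interval moves the bound, which is the kind of reparametrization mentioned above.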

This week we published a pre-print of our new Chainspace system, a distributed ledger platform for high-integrity and transparent processing of transactions within a decentralized system. Chainspace uses smart contracts to offer extensibility, rather than catering to specific applications such as Bitcoin for a currency, or certificate transparency for certificate verification. Unlike Ethereum, Chainspace’s sharded architecture allows the ledger to scale linearly, since only the nodes concerned with a transaction have to process it. Our modest testbed of 60 cores achieves 350 transactions per second. In comparison, Bitcoin achieves a peak rate of less than 7 transactions per second for over 6k full nodes, and Ethereum currently processes 4 transactions per second (of a theoretical maximum of 25). Moreover, Chainspace is agnostic to the smart contract language and identity infrastructure, and supports privacy features through modern zero-knowledge techniques. We have released the Chainspace whitepaper, and the code is available as an open-source project on GitHub.

System Overview

The figure above illustrates the system design of Chainspace. Chainspace consists of a network of infrastructure nodes that manage valid objects and ensure that only valid transactions on those objects are committed. Let’s look at the data model of Chainspace first. An object represents a unit of data in the Chainspace system (e.g., a bank account), and is in one of the following three states: active (can be used by a transaction), locked (is being processed by an existing transaction), or inactive (was used by a previous transaction). Objects also have a type that determines the unique identifier of the smart contract that defines them. Smart contract procedures can operate on active objects only, while inactive objects are retained just for the purposes of audit. Chainspace allows composition of smart contracts from different authors to provide ecosystem features. Each smart contract is associated with a checker to enable private processing of transactions on infrastructure nodes, since checkers do not take any secret local parameters. Checkers are pure functions (i.e., deterministic, with no side effects) that return a boolean value.
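
The data model can be pictured with a small Python sketch. Everything here is a hypothetical illustration rather than Chainspace's actual API: the `Obj` class, the `"bank"` contract identifier, and the rule enforced by `transfer_checker` (value is conserved and no balance goes negative) are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Tuple


class Status(Enum):
    ACTIVE = "active"      # can be used by a transaction
    LOCKED = "locked"      # is being processed by an existing transaction
    INACTIVE = "inactive"  # was used by a previous transaction; kept for audit


@dataclass(frozen=True)
class Obj:
    contract_id: str  # the object's type: identifies the defining smart contract
    balance: int      # toy payload, e.g. a bank account balance
    status: Status = Status.ACTIVE


Checker = Callable[[Tuple[Obj, ...], Tuple[Obj, ...]], bool]


def transfer_checker(inputs: Tuple[Obj, ...], outputs: Tuple[Obj, ...]) -> bool:
    """Pure function: depends only on its arguments and has no side effects."""
    same_contract = all(o.contract_id == "bank" for o in inputs + outputs)
    no_overdraft = all(o.balance >= 0 for o in outputs)
    conserved = sum(o.balance for o in inputs) == sum(o.balance for o in outputs)
    return same_contract and no_overdraft and conserved


def can_process(inputs: Tuple[Obj, ...], outputs: Tuple[Obj, ...], checker: Checker) -> bool:
    """Accept only if every input is active and the contract's checker passes."""
    return all(o.status is Status.ACTIVE for o in inputs) and checker(inputs, outputs)


alice, bob = Obj("bank", 10), Obj("bank", 5)
ok = can_process((alice, bob), (Obj("bank", 7), Obj("bank", 8)), transfer_checker)
spent = Obj("bank", 10, Status.INACTIVE)
reused = can_process((spent, bob), (Obj("bank", 7), Obj("bank", 8)), transfer_checker)
```

Because the checker is deterministic and takes no secret inputs, any node can re-run it and reach the same verdict. The second call is rejected because one of its inputs is already inactive.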

Now, a valid transaction accepts active input objects along with other ancillary information, and generates output objects (e.g., transfers money to another bank account). To achieve high transaction throughput and low latency, Chainspace organizes nodes into shards that manage the state of objects, keep track of their validity, and record transactions aborted or committed. We implemented this using Sharded Byzantine Atomic Commit (S-BAC)—a protocol that composes existing Byzantine Fault Tolerant (BFT) agreement and atomic commit primitives in a novel way. Here is how the protocol works:

  • Intra-shard agreement. Within each shard, all honest nodes ensure that they consistently agree on accepting or rejecting a transaction.
  • Inter-shard agreement. Across shards, nodes must ensure that transactions are committed if all shards are willing to commit the transaction, and rejected (or aborted) if any shards decide to abort the transaction.

Consensus on committing (or aborting) transactions takes place in parallel across different shards. A nice property of S-BAC’s atomic commit protocol is that the entire shard, rather than a third party, acts as a coordinator. This is in contrast to other sharding-based cryptocurrency systems such as OmniLedger or RSCoin, where an untrusted client acts as the coordinator and is incentivized to act honestly. Such incentives do not hold for a generalized platform like Chainspace where objects may have shared ownership.
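
The two agreement steps can be sketched as a decision rule in Python. This is only a schematic of the outcome logic, not the S-BAC message flow: the intra-shard step is approximated by a quorum count, assuming the common BFT setting of n = 3f + 1 nodes with a quorum of 2f + 1, and the names `shard_decision` and `sbac_outcome` are hypothetical.

```python
from typing import Dict, List


def shard_decision(node_votes: List[bool]) -> bool:
    """Stand-in for intra-shard BFT agreement: accept on a 2f + 1 quorum."""
    n = len(node_votes)
    if n == 0:
        return False
    f = (n - 1) // 3  # faulty nodes tolerated when n = 3f + 1 (assumption)
    return sum(node_votes) >= 2 * f + 1


def sbac_outcome(votes_by_shard: Dict[str, List[bool]]) -> str:
    """Inter-shard rule: commit only if every concerned shard accepts."""
    decisions = [shard_decision(votes) for votes in votes_by_shard.values()]
    return "commit" if decisions and all(decisions) else "abort"


# Hypothetical votes from two shards of four nodes each.
committed = sbac_outcome({"shard-a": [True, True, True, False],
                          "shard-b": [True, True, True, True]})
aborted = sbac_outcome({"shard-a": [True, True, True, True],
                        "shard-b": [True, False, False, True]})
```

The second transaction aborts because one shard fails to reach a quorum, even though the other shard accepts unanimously.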

Adblocking and Counter-Blocking: A Slice of the Arms Race

[Figure: anti-adblocking message from WIRED]

If you use an adblocker, you are probably familiar with messages of the kind shown above, asking you to either disable your adblocker, or to consider supporting the host website via a donation or subscription. This is the battle du jour in the ongoing adblocking arms race — and it’s one we explore in our new report Adblocking and Counter-Blocking: A Slice of the Arms Race.

The reasons for the rising popularity of adblockers include improved browsing experience, better privacy, and protection against malvertising. As a result, online advertising revenue is gravely threatened by adblockers, prompting publishers to actively detect adblock users, and subsequently block them or otherwise coerce them to disable the adblocker. We refer to these practices as anti-adblocking. While there has been a degree of sound and fury on the topic, until now we haven’t been able to understand the scale, mechanism and dynamics of anti-adblocking. This is the gap we have started to address, together with researchers from the University of Cambridge, Stony Brook University, University College London, University of California Berkeley, Queen Mary University of London and the International Computer Science Institute (Berkeley). We address some of these questions by leveraging a novel approach for identifying third-party services shared across multiple websites to present a first characterization of anti-adblocking across the Alexa Top-5K websites.

We find that at least 6.7% of Alexa Top-5K websites employ anti-adblocking, with the practices finding adoption across a diverse mix of publishers; particularly publishers of “General News”, “Blogs/Wiki”, and “Entertainment” categories. It turns out that these websites owe their anti-adblocking capabilities to 14 unique scripts pulled from 12 unique domains. Unsurprisingly, the most popular domains are those that have skin in the game — Google, Taboola, Outbrain, Ensighten and Pagefair — the latter being a company that specialises in anti-adblocking services. Then there are in-house anti-adblocking solutions that are distributed by a domain to client websites belonging to the same organisation: TripAdvisor distributes an anti-adblocking script to its eight websites with different country code top-level domains, while adult websites (all hosted by MindGeek) turn to DoublePimp. Finally, we visited a sample website for each anti-adblocking script via AdBlock Plus, Ghostery and Privacy Badger, and discovered that half of the 12 anti-adblocking suppliers are counter-blocked by at least one adblocker — suggesting that the arms race has already entered the next level.

It is hard to say how many levels deeper the adblocking arms race might go. While anti-adblocking may provide temporary relief to publishers, it is essentially a band-aid solution that masks a deeper issue: the disequilibrium between ads (and, particularly, their behavioural tracking back-end) and information. Any long-term solution must address the reasons that brought users to adblockers in the first place. In the meantime, as the arms race continues to escalate, we hope that studies such as ours will bring transparency to this opaque subject, and inform policy that moves us out of the current deadlock.

“Ad-Blocking and Counter Blocking: A Slice of the Arms Race” by Rishab Nithyanand, Sheharbano Khattak, Mobin Javed, Narseo Vallina-Rodriguez, Marjan Falahrastegar, Julia E. Powles, Emiliano De Cristofaro, Hamed Haddadi, and Steven J. Murdoch. arXiv:1605.05077v1 [cs.CR], May 2016.

This post also appears on the University of Cambridge Computer Laboratory Security Group blog, Light Blue Touchpaper.

“Do you see what I see?” ask Tor users, as a large number of websites reject them but accept non-Tor users

If you use an anonymity network such as Tor on a regular basis, you are probably familiar with various annoyances in your web browsing experience, ranging from pages saying “Access denied” to having to solve CAPTCHAs before continuing. Interestingly, these hurdles disappear if the same website is accessed without Tor. The growing trend of websites extending this kind of “differential treatment” to anonymous users undermines Tor’s overall utility, and adds a new dimension to the traditional threats to Tor (attacks on user privacy, or governments blocking access to Tor). There is plenty of anecdotal evidence about Tor users experiencing difficulties in browsing the web, for example the user-reported catalog of services blocking Tor. However, we don’t have sufficient detail about the problem to answer deeper questions like: how prevalent is differential treatment of Tor on the web; are there any centralized players with Tor-unfriendly policies that have a magnified effect on the browsing experience of Tor users; can we identify patterns in where these Tor-unfriendly websites are hosted (or located), and so forth.

Today we present our paper on this topic: “Do You See What I See? Differential Treatment of Anonymous Users” at the Network and Distributed System Security Symposium (NDSS). Together with researchers from the University of Cambridge, University College London, University of California, Berkeley and the International Computer Science Institute (Berkeley), we conducted comprehensive network measurements to shed light on websites that block Tor. At the network layer, we scanned the entire IPv4 address space on port 80 from Tor exit nodes. At the application layer, we fetched the homepage of each of the 1,000 most popular websites (according to Alexa) from all Tor exit nodes. We compared these measurements with a baseline from non-Tor control measurements, and uncovered significant evidence of Tor blocking. We estimate that at least 1.3 million IP addresses that would otherwise allow a TCP handshake on port 80 block the handshake if it originates from a Tor exit node. We also show that at least 3.67% of the 1,000 most popular websites block Tor users at the application layer.
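
The comparison against a control baseline can be pictured as a set difference, sketched here in Python. This is a deliberately conservative illustration and not the paper's analysis pipeline: it assumes the handshake results are already available as sets of addresses, and it flags an address only if no Tor exit at all could complete a handshake. The function name `tor_blocked` and the sample data are hypothetical.

```python
from typing import Dict, Set


def tor_blocked(control_ok: Set[str], tor_ok_by_exit: Dict[str, Set[str]]) -> Set[str]:
    """Addresses reachable from the control vantage point but from no Tor exit."""
    reachable_via_tor = set().union(*tor_ok_by_exit.values())
    return control_ok - reachable_via_tor


# Hypothetical handshake results using documentation-range addresses.
control = {"198.51.100.1", "198.51.100.2", "198.51.100.3"}
via_exits = {
    "exit-1": {"198.51.100.1"},
    "exit-2": {"198.51.100.1", "198.51.100.3"},
}
blocked = tor_blocked(control, via_exits)
```

An address that some exits reach and others do not is not flagged by this rule, so a real analysis would also need to account for per-exit variation and transient failures.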
