Scanning the Internet for Liveness

Internet-wide scanning (or probing) has emerged as a key measurement technique to study a diverse set of the Internet’s properties, including address space utilization, host reachability, topology, service availability, vulnerabilities, and service discrimination. But despite its widespread use and critical importance for Internet measurement, we still lack a clear understanding of IP liveness—whether a target IP address responds to a probe packet. What type of probe packets should we send if we, for example, want to maximize the responding host population? What type of responses can we expect and which factors determine such responses? What degree of consistency can we expect when probing the same host with different probe packets?

In our recent paper Scanning the Internet for Liveness, we presented a systematic analysis of liveness and how it shows up in active scanning campaigns. We developed a taxonomy of liveness which we employed to develop a method to perform concurrent IPv4 scans using ICMP, five TCP-based, and two UDP-based protocols, capturing all responses to our probes. Our key findings are:

  • Responsive host populations are highly sensitive to the choice of probe. While ICMP discovers the highest number of raw IPs, our TCP and UDP measurements exclusively contribute a fifth to the total population of responsive hosts.
  • Collecting ICMP Error messages for TCP and UDP scans increases the responsive population by more than 13%, and provides new opportunities to interpret scan results.
  • At the transport layer, our concurrent measurements reveal that the majority of hosts exhibit inconsistent behaviour when probed on different ports, and that capturing negative responses significantly improves scanning completeness.
  • Our concurrent scans allow us to identify nearly 2M tarpits (IPs masquerading as fake hosts) that would bias measurements that do not take them into account.
  • Our study of cross-protocol liveness shows that responsiveness for some protocols is correlated, suggesting that the same seed set of responsive IP addresses can be potentially used to bootstrap multiple highly-correlated target populations to reduce scan traffic.

This work recently appeared in the April 2018 issue of ACM SIGCOMM Computer Communication Review (CCR), and was conducted in collaboration with Philipp Richter (MIT), Mobin Javed (LUMS Pakistan, ICSI Berkeley), Srikanth Sundaresan (Princeton University), Zakir Durumeric (Stanford University), Steven J. Murdoch (University College London), Richard Mortier (University of Cambridge) and Vern Paxson (UC Berkeley, ICSI Berkeley). Overall, this study yields practical insights and methodological improvements for the design and the execution of active Internet measurement studies. We released the code and data of this work as open source to allow for reproducibility of the results, and to enable further research.

Continue reading Scanning the Internet for Liveness

Tampering with OpenPGP digitally signed messages by exploiting multi-part messages

The EFAIL vulnerability in the OpenPGP and S/MIME secure email systems, publicly disclosed yesterday, allows an eavesdropper to obtain the contents of encrypted messages. There’s been a lot of finger-pointing as to which particular bit of software is to blame, but that’s mostly irrelevant to the people who need secure email. The end result is that users of encrypted email, who wanted formatting better than what a mechanical typewriter could offer, were likely at risk.

One of the methods to exploit EFAIL relied on the section of the email standard that allows messages to be in multiple parts (e.g. the body of the message and one or more attachments) – known as MIME (Multipurpose Internet Mail Extensions). The authors of the EFAIL paper used the interaction between MIME and the encryption standard (OpenPGP or S/MIME as appropriate) to cause the email client to leak the decrypted contents of a message.

However, not only can MIME be used to compromise the secrecy of messages, but it can also be used to tamper with digitally-signed messages in a way that would be difficult if not impossible for the average person to detect. I doubt I was the first person to discover this, and I reported it as a bug 5 years ago, but it still seems possible to exploit and I haven’t found a proper description, so this blog post summarises the issue.

The problem arises because it is possible to have a multi-part email where some parts are signed and some are not. Email clients could have adopted the fail-safe option of considering such a mixed message to be malformed and therefore treated as unsigned or as having an invalid signature. There’s also the fail-open option where the message is considered signed and both the signed and unsigned parts are displayed. The email clients I looked at (Enigmail with Mozilla Thunderbird, and GPGTools with Apple Mail) opt for a variant of the fail-open approach and thus allow emails to be tampered with while keeping their status as being digitally signed.

Continue reading Tampering with OpenPGP digitally signed messages by exploiting multi-part messages

An investigation of online censorship in Cyprus

The island of Cyprus, situated in the east of the Mediterranean sea, has always been an important commercial and information exchange hub. Today, this is reflected on the large number of submarine cables that facilitate telecommunications with neighboring countries (Greece, Turkey, Egypt, Israel, Syria, and Lebanon) and with the rest of the world (reaching as far as India, South Korea, and Australia). Nevertheless, the Republic of Cyprus (RoC) is officially regarded as a freedom of expression safe haven, where “Internet is completely free of any specific regulation”. Unfortunately, Cypriot netizens claim that such statements couldn’t be further from the truth.

In recent years, Internet Service Providers (ISPs) in RoC have implemented an Internet filtering infrastructure to comply with the laws and regulations implied by the National Betting Authority (NBA). In an effort to understand the capacity of this infrastructure, a multi-disciplinary group of volunteers from the hack66 Observatory in Nicosia has collected and analyzed connectivity measurements from end-user connections on a variety of websites and services. Their report was presented at the 7th International Conference on e-Democracy.

For their experiments, the hack66 Observatory team put together a testlist comprising of domains from the National Betting Authority blocklist, the CitizenLab lists for Greece and Turkey, and WordPress blogs banned in Turkey as reported at the Lumen Database. The analysis was based on over 45,000 measurements from four residential ISPs operating in the Republic of Cyprus, that were anonymously submitted using a custom OONI probe during the months of March to May 2017. In addition, the team collected data using open DNS resolvers in Cyprus. Early findings suggest that the most common blocking method is DNS hijacking. Furthermore, the measurements indicate that some of the ISPs have deployed middle-boxes – network components capable of performing censorship, traffic manipulation or surveillance.

A closer inspection on the variations of the censorship mechanism implementations among ISPs raised concerns with regard to transparency and privacy: some ISPs do not inform users why a blocked website is not accessible; while others redirect requests to a web server controlled by the NBA, that could in turn log user identifiers such as their IP address. Similarly, the hack66 Observatory team was able to identify a number of unreported Internet censorship cases, entries in the NBA blocklist that either are invalid or that require sophisticated blocking techniques, and collateral damage due to blocking of email delivery to the regulated domains.

Understanding the case of Internet freedom in Cyprus becomes more complicated when the geopolitical situation is taken into consideration. Apart from the Republic of Cyprus, the island of Cyprus is divided into three other segments: the self-declared Turkish Republic of Northern Cyprus; the United Nations-controlled Green Line buffer zone; and the Sovereign Base Areas of Akrotiri and Dhekelia that remain under British control for military purposes. Measurements from the Multimax ISP operating in the area occupied by Turkey indicate network interference practices similar to those of mainland Turkey. This could be interpreted as the existence of two distinct regimes in terms of information policy on the island of Cyprus. No volunteers submitted measurements from the UN buffer zone or the British Sovereign bases. However, it is known via the Snowden revelations that GCHQ is operating a wiretap base in Cyprus codenamed “SOUNDER”, jointly funded by the NSA.

The purpose of the hack66 Observatory is to “to collect and analyze data, and routes of data through EMEA, […] in order to promote evidence based policy making”. The timing is just right, given the recent RoC government announcement of a new bill in the making, to regulate media operations and stop fake news. With their report, the hack66 Observatory aims to provide policy makers with a valuable asset for understanding the limitations and implications of the existing censorship infrastructure, and to start a debate around Internet freedom on the entirety of the island of Cyprus.

A Privacy Enhancing Architecture for Secure Wearable Devices

Why do we need security on wearable devices? The primary reason comes from the fact that, being in direct contact with the user, wearable devices have access to very private and sensitive user’s information more often than traditional technologies. The huge and increasing diversity of wearable technologies makes almost any kind of information at risk, going from medical records to personal habits and lifestyle. For that reason, when considering wearables, it is particularly important to introduce appropriate technologies to protect these data, and it is primary that both the user and the engineer are aware of the exact amount of collected information as well as the potential threats pending on the user’s privacy. Moreover, it has also to be considered that the privacy of the wearable’s user is not the only one at risk. In fact, more and more devices are not limited to record the user’s activity, but can also gather information about people standing around.

This blog post presents a flexible privacy enhancing system from its architecture to the prototyping level. The system takes advantage from anonymous credentials and is based on the protocols developed by M. Chase, S. Meiklejohn, and G. Zaverucha in Algebraic MACs and Keyed-Verification Anonymous Credentials. Three main entities are involved in this multi-purpose system: a main server, wearable devices and localisation beacons. In this multipurpose architecture, the server firstly issue some anonymous credentials to the wearables. Then, each time a wearable reach a particular physical location (gets close to a localisation beacon) where it desires to perform an action, it starts presenting its credentials in order to ask the server the execution of that a particular action. Both the design of the wearable and the server remain generic and scalable in order to encourage further enhancements and easy integration into real-world applications; i.e., the central server can manage an arbitrary number of devices, each device can posses an arbitrary number of credentials and the coverage area of the localisation system is arbitrarily extendable.


The complete system’s architecture can be modelled as depicted in the figure below. Roughly speaking, a web interface is used to manage and display the device’s functions. Each user and admin access the system from that interface.


During the setup phase, the server issues the credentials to a selected device (according to the algorithms presented in Algebraic MACs and Keyed-Verification Anonymous Credentials) granting it a given privilege level. The credentials’ issuance is a short-range process. In fact, the wearable needs to be physically close to the server to allow the admins to physically verify, once and for all, the identity of the wearables’ users. In order to improve security and battery life, the wearable only communicates using extremely low-power and short-range radio waves (dotted line on the figure). The server beacons can be seen as continuities of the main server and have essentially two roles: the first is to operate as an interface between the wearables and the server, and the second is to act as a RF localisation system. Each time the wearable granted with enough privileges reaches some particular physical location (gets close to a localisation beacon), it starts presenting its credentials in order to prove to the server that it possesses credentials over some attributes (without revealing them), and that these credentials have been previously issued by the server itself. Note that the system preserves anonymity only if many users are involved (for each privilege level), but this is a classic requirement of anonymous systems. Finally, once the credentials have been successfully verified by the server, the server issues a signed request to an external entity (which can be, for instance, an automatic door, an alarm system or any compatible IoT entity) to perform the requested action.

Continue reading A Privacy Enhancing Architecture for Secure Wearable Devices

QUUX: a QUIC un-multiplexing of the Tor relay transport

Latency is a key factor in the usability of web browsing. This has added relevance in the context of anonymity systems such as Tor, because the anonymity property is strengthened by having a larger user-base.

Decreasing the latency of typical web requests in Tor could encourage a wider user base, making it more viable for typical users who value their privacy and less conspicuous for the people who most need it. With this in mind for my MSc Information Security project at UCL, supervised by Dr Steven J. Murdoch, I looked at the transport subsystem used by the Tor network, hoping to improve its performance.

After a literature review of the area (several alternative transport designs have been proposed in the past), I started to doubt my initial mental model for an alternative design.

Data flow in Freedom
Data flow in Freedom (Murdoch, 2011)
Data flow in Tor
Data flow in Tor (Murdoch, 2011)

These diagrams show an end-to-end design (Freedom) and hop-by-hop design (Tor) respectively. In the end-to-end design, encrypted IP packets are transported between relays using UDP, with endpoints ensuring reliable delivery of packets. In the hop-by-hop design, TCP data is transported between relays, with relays ensuring reliable delivery of data.

The end-to-end Freedom approach seems elegant, with relays becoming somewhat closer to packet routers, however it also leads to longer TCP round-trip times (RTT) for web browser HTTP connections. Other things being equal, a longer TCP RTT will result in a slower transfer. Additional issues include difficulty in ensuring fairness of utilisation (requiring an approach outlined by Viecco), and potentially greater vulnerability to latency-based attack.

Therefore I opted to follow the hop-by-hop transport approach Tor currently takes. Tor multiplexes cells for different circuits over a single TCP connection between relay-pairs, and as a result a lost packet for one circuit could hold up all circuits that share the same connection (head-of-line blocking). A long-lived TCP connection is beneficial for converging on an optimal congestion window size, but the approach suffers from head-of-line blocking and doesn’t compete effectively with other TCP connections using the same link.

To remedy these issues, I made a branch of Tor which used a QUIC connection in place of the long-lived TCP connection. Because a QUIC connection carries multiple TCP-like streams, it doesn’t suffer from head-of-line blocking. The streams also compete for utilisation at the same level as TCP connections, allowing them to more effectively use either the link capacity or the relay-configured bandwidth limit.

Download time for a 320KiB file
Download time for a 320KiB file

Initial results from the experiments are promising, as shown above. There’s still a way to go before such a design could make it into the Tor network. This branch shows the viability of the approach for performance, but significant engineering work still lies ahead to create a robust and secure implementation that would be suitable for deployment. There will also likely be further research to more accurately quantify the performance benefits of QUIC for Tor. Further details can be found in my MSc thesis.