Program obfuscation

I had the pleasure of visiting the Simons Institute for the Theory of Computing at UC Berkeley over the summer. One of the main themes of the programme was obfuscation.

Recently there has been a lot of exciting research on developing cryptographic techniques for program obfuscation. Obfuscation is, of course, not a new thing; you may already be familiar with the International Obfuscated C Code Contest. But the hope underlying this research effort is to replace ad hoc obfuscation, which may or may not be possible to reverse engineer, with general techniques that can be applied to any program and that satisfy a rigorous definition of what it means for a program to be obfuscated.

Even defining what it means for a program to be obfuscated is not trivial. Ideally, we would like an obfuscator to be a compiler that turns any computer program into a virtual black box. By this we mean that the obfuscated program should preserve functionality while leaking no other information about the original code: it should have the same input-output behaviour as the original program, and you should not be able to learn anything from it beyond what you could learn by executing the original program. Unfortunately, it turns out that this goal is too ambitious: there exist special programs that cannot be virtual black-box obfuscated.
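In symbols, the virtual black-box requirement is commonly stated roughly as follows. This is a sketch of the standard formulation, not a quotation: O is the obfuscator, A ranges over efficient adversaries, S is an efficient simulator that may only query P as an oracle, and negl is a negligible function.

```latex
\text{(functionality)}\quad \forall x:\ \mathcal{O}(P)(x) = P(x)
\\[4pt]
\text{(virtual black box)}\quad \forall \mathcal{A}\ \exists \mathcal{S}\ \forall P:\
\Bigl|\Pr\bigl[\mathcal{A}(\mathcal{O}(P)) = 1\bigr]
  - \Pr\bigl[\mathcal{S}^{P}(1^{|P|}) = 1\bigr]\Bigr| \le \operatorname{negl}(|P|)
```

The impossibility result works by exhibiting programs for which some property is easy to compute from any code for the program, yet hard to learn from oracle access alone, so no simulator can match the adversary.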

Instead, cryptographers have been working towards something called indistinguishability obfuscation. Here the goal is that, given two functionally equivalent programs P1 and P2 of the same size and an obfuscation O(Pi) of one of them, it should not be possible to tell which one has been obfuscated. Interestingly, even though this is a weaker notion than virtual black-box obfuscation, it has already found many applications in the construction of cryptographic schemes. Furthermore, it has been proved that indistinguishability obfuscation is the best possible obfuscation, in the sense that any information leaked by an indistinguishability-obfuscated program is also leaked by any other program of similar size computing the same functionality.
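To make the indistinguishability game concrete, here is a toy sketch. The two programs, the two "obfuscators" and the `advantage` function are hypothetical illustrations, and the game is deterministic, whereas the real definition is probabilistic and requires the advantage to be negligible for every efficient distinguisher. Handing out the source unchanged is trivially distinguishable; outputting the full truth table (feasible only because the input domain here has 16 elements) makes functionally equivalent programs literally identical.

```python
from typing import Callable, Hashable

DOMAIN = range(16)  # toy programs take 4-bit inputs

# Two functionally equivalent "programs" of the same size (10 characters each).
P1 = "x * 2 % 16"
P2 = "x + x & 15"


def run(src: str, x: int) -> int:
    """Evaluate a toy program (a Python expression in x) on input x."""
    return eval(src, {"__builtins__": {}}, {"x": x})


def identity_obfuscator(src: str) -> Hashable:
    """A non-obfuscator: hands out the source code unchanged."""
    return src


def truth_table_obfuscator(src: str) -> Hashable:
    """Output the program's full truth table over DOMAIN.

    Functionally equivalent programs get identical output, so this is a
    perfect indistinguishability obfuscator; but the table grows
    exponentially with the input length, so it only works for tiny domains.
    """
    return tuple(run(src, x) for x in DOMAIN)


def advantage(obfuscate: Callable[[str], Hashable]) -> int:
    """Advantage of the adversary that guesses '1' iff it sees obfuscate(P1)."""
    reference = obfuscate(P1)

    def adversary(obf: Hashable) -> int:
        return int(obf == reference)

    return abs(adversary(obfuscate(P1)) - adversary(obfuscate(P2)))


assert all(run(P1, x) == run(P2, x) for x in DOMAIN)  # functionally equivalent
assert len(P1) == len(P2)                              # same size
print(advantage(identity_obfuscator))     # 1: the source gives the game away
print(advantage(truth_table_obfuscator))  # 0: nothing to tell the two apart
```

The truth table also hints at why the definition fixes the program size: it is a perfect indistinguishability obfuscator, but its exponential size is exactly what real constructions have to avoid.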

So, when will you be able to use obfuscation algorithms on your own code? Well, current obfuscation algorithms have horrible efficiency and are far from practical, so there is still a lot of research to be done on improving them. Moreover, current obfuscation proposals are based on something called graded encoding schemes. At the moment, there is a tug of war going on between cryptographers proposing graded encoding schemes and cryptanalysts breaking them. Breaking a graded encoding scheme does not necessarily break the obfuscation algorithms built on it, but it is fair to say that right now the situation is a mess. From a researcher’s perspective, this makes the field very exciting, since there is still a lot to discover!

If you want to learn more about obfuscation, I recommend watching some of the excellent talks from the programme.

Scaling Tor hidden services

Tor hidden services offer several security advantages over normal websites:

  • both the client requesting the webpage and the server returning it can be anonymous;
  • websites’ domain names (.onion addresses) are derived from their public keys, so they are hard to impersonate; and
  • there is mandatory encryption from the client to the server.
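The link between a .onion name and its key can be sketched in a few lines. This follows the version-2 address format from Tor's rendezvous specification (the format in use at the time of writing; it has since been replaced by version-3 addresses based on ed25519 keys): the address is the base32 encoding of the first 80 bits of the SHA-1 hash of the service's DER-encoded public key. The key bytes below are a made-up placeholder, not a real key.

```python
import base64
import hashlib


def onion_address_v2(der_public_key: bytes) -> str:
    """Derive a version-2 .onion address from a DER-encoded RSA public key.

    permanent-id = first 80 bits (10 bytes) of SHA-1(DER(public key));
    the address is that ID in lowercase base32, followed by ".onion".
    """
    permanent_id = hashlib.sha1(der_public_key).digest()[:10]
    return base64.b32encode(permanent_id).decode("ascii").lower() + ".onion"


# Hypothetical placeholder bytes, used only to show the shape of the output;
# a real service would pass its actual DER-encoded RSA public key.
fake_key = b"not a real DER-encoded RSA key"
addr = onion_address_v2(fake_key)
print(addr)  # 16 base32 characters followed by ".onion"
```

Because the name is a hash of the key, impersonating a service means finding a different key whose truncated hash matches, rather than just registering a look-alike name.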

However, Tor hidden services as originally implemented did not take full advantage of parallel processing, whether on a single multi-core computer or through load balancing over multiple computers. Therefore, once a single hidden service has hit the limit of vertical scaling (getting faster CPUs), there is no option of horizontal scaling (adding more CPUs and more computers). There are also bottlenecks in the Tor network, such as the 3–10 introduction points that help to negotiate the connection between the hidden service and the rendezvous point that actually carries the traffic.

For my MSc Information Security project at UCL, supervised by Steven Murdoch with the assistance of Alec Muffett and other Security Infrastructure engineers at Facebook in London, I explored possible techniques for improving the horizontal scalability of Tor hidden services. More precisely, I looked at load-balancing techniques that could offer better performance and resilience against hardware and network failures. The research focused on popular non-anonymous hidden services, where the anonymity of the service provider is not required; Facebook’s .onion address is an example.

One approach I explored was simply to run multiple hidden service instances using the same private key (and hence the same .onion address). Each instance periodically uploads its own descriptor, which describes its available introduction points, to six hidden service directories on a distributed hash table. Which instance a client reaches depends on which instance most recently uploaded its descriptor. In theory, this approach allows an arbitrary number of hidden service instances, each periodically uploading its own descriptors and overwriting those of the others.
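Why instances sharing a key overwrite one another can be seen from how the six directories are chosen. The sketch below follows our reading of the version-2 rendezvous specification, simplified: there is no descriptor cookie (no client authorisation), and the relay fingerprints are hypothetical stand-ins for the HSDir-flagged relays in the consensus. Two replica descriptor IDs are derived from the service's permanent ID and the current time period, and each is stored on the three directories that follow it on the fingerprint ring, giving six in total. Since every instance with the same key computes the same IDs, they all upload to the same six directories.

```python
import bisect
import hashlib
import struct

PERIOD = 24 * 60 * 60  # descriptor validity period, in seconds
REPLICAS = 2           # non-consecutive replicas per descriptor
SPREAD = 3             # consecutive HSDirs per replica


def sha1(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()


def time_period(permanent_id: bytes, now: int) -> int:
    # Services switch periods at different times, offset by their first ID byte.
    return (now + permanent_id[0] * PERIOD // 256) // PERIOD


def descriptor_id(permanent_id: bytes, now: int, replica: int) -> bytes:
    # secret-id-part = H(time-period | replica), with no descriptor cookie.
    secret_id_part = sha1(struct.pack(">IB", time_period(permanent_id, now), replica))
    return sha1(permanent_id + secret_id_part)


def responsible_hsdirs(desc_id: bytes, hsdir_fingerprints: list[bytes]) -> list[bytes]:
    """The SPREAD directories whose fingerprints follow desc_id on the ring."""
    ring = sorted(hsdir_fingerprints)
    start = bisect.bisect_right(ring, desc_id)
    return [ring[(start + i) % len(ring)] for i in range(SPREAD)]


# Hypothetical service and relays, for illustration only.
permanent_id = sha1(b"hypothetical service key")[:10]
hsdirs = [sha1(f"relay-{i}".encode()) for i in range(50)]
now = 1_440_000_000  # an arbitrary Unix timestamp
for replica in range(REPLICAS):
    desc_id = descriptor_id(permanent_id, now, replica)
    print(replica, [fp.hex()[:8] for fp in responsible_hsdirs(desc_id, hsdirs)])
```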

This approach can work for popular hidden services because, with a large number of clients, some will be using the most recently uploaded descriptor while others will have cached older versions and continue to use them. However, my experiments showed that clients are distributed highly non-uniformly over hidden service instances set up in this way.

I therefore ran experiments on a private Tor network, using the Shadow network simulator to run multiple hidden service instances and measure the load distribution over time. The experiments were devised so that the instances uploaded their descriptors simultaneously, which resulted in different hidden service directories receiving different descriptors. As a result, clients connecting to the hidden service were balanced more uniformly over the available instances.
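The mechanism behind this can be illustrated with a deliberately crude toy model; it is not the Shadow setup and ignores descriptor caching. We assume that when uploads race, each directory keeps whichever upload happens to arrive last (modelled as a random order per directory), and that each new client fetches from one of the six directories chosen uniformly at random. All function names are hypothetical.

```python
import random
from collections import Counter

N_HSDIRS = 6


def sequential_uploads(instances: list[str]) -> list[str]:
    """Instances upload one after another; each upload overwrites all six
    HSDirs, so every directory ends up holding the last uploader's descriptor."""
    directories: list[str] = [""] * N_HSDIRS
    for inst in instances:
        for d in range(N_HSDIRS):
            directories[d] = inst
    return directories


def simultaneous_uploads(instances: list[str], rng: random.Random) -> list[str]:
    """All instances upload at once; at each HSDir the upload that happens to
    arrive last (here: a random order per directory) is the one kept."""
    directories = []
    for _ in range(N_HSDIRS):
        arrival_order = rng.sample(instances, len(instances))
        directories.append(arrival_order[-1])
    return directories


def client_load(directories: list[str], n_clients: int, rng: random.Random) -> Counter:
    """Each new client fetches the descriptor from a randomly chosen HSDir."""
    return Counter(rng.choice(directories) for _ in range(n_clients))


rng = random.Random(1)
instances = ["A", "B", "C"]
print(client_load(sequential_uploads(instances), 600, rng))  # all 600 go to "C"
print(client_load(simultaneous_uploads(instances, rng), 600, rng))
```

In the sequential case every new client lands on the last uploader; with simultaneous uploads the directories hold a mix of descriptors, so new clients are spread over several instances, though with only six directories the split can still be lumpy.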


Sarah Meiklejohn – Security and Cryptography

As a child, Sarah Meiklejohn thought she might become a linguist, largely because she was so strongly interested in the work being done to decode the ancient Aegean writing systems Linear A and Linear B.

“I loved all that stuff,” she says. “And then I started doing mathematics.” At that point, with the help of Simon Singh’s The Code Book, she realised the attraction was codebreaking rather than human languages themselves. Simultaneously, security and privacy were increasingly in the spotlight.

“I’m a very private person, and so privacy is near and dear to my heart,” she says. “It’s an important right that a lot of people don’t seem interested in exercising, but it’s still a right. Even if no one voted we would still agree that it was important for people to be able to vote.”

It was during her undergraduate years at Brown, which included a fifth-year master’s degree, that she made the transition from mathematics to cryptography and began studying computer science. She went on to do her PhD at the University of California, San Diego. Her appointment at UCL, which is shared between the Department of Computer Science and the Department of Crime Science, is her first job.

Probably her best-known work is A Fistful of Bitcoins: Characterizing Payments Among Men with No Names, which studied how much anonymity bitcoin really provides. It was written with Marjori Pomarole, Grant Jordan, Kirill Levchenko, Damon McCoy, Geoffrey M. Voelker, and Stefan Savage and presented at USENIX 2013.

“The main thing I was trying to focus on in that paper is what bitcoin is used for,” she says. The work began with buying some bitcoins (in 2012, at about £3 each) and performing transactions with them over a period of months. The data collected this way gave her some “ground truth” data.

“We developed these clustering techniques to get down to single users and owners.” As a result, they could identify which addresses belonged to which exchanges, which gave them a view of what was going on in the network. “So we could say this many bitcoins passed through this exchange per month, or how many were going to underground services like Silk Road.”
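The simplest clustering idea in this line of work is the multi-input heuristic: addresses that appear together as inputs to one transaction are assumed to share an owner, since spending from them requires all of their private keys. A minimal union-find sketch follows; the transactions are hypothetical, and this is one illustrative heuristic rather than the paper's full method.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest over address strings, with path halving."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, a: str) -> str:
        self.parent.setdefault(a, a)
        while self.parent[a] != a:
            self.parent[a] = self.parent[self.parent[a]]  # path halving
            a = self.parent[a]
        return a

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster_by_common_inputs(transactions: list[list[str]]) -> list[set[str]]:
    """Multi-input heuristic: all input addresses of one transaction are
    assumed to be controlled by the same user."""
    uf = UnionFind()
    for inputs in transactions:
        for addr in inputs:
            uf.find(addr)  # register every address, even singletons
        for addr in inputs[1:]:
            uf.union(inputs[0], addr)
    clusters: dict[str, set[str]] = defaultdict(set)
    for addr in uf.parent:
        clusters[uf.find(addr)].add(addr)
    return list(clusters.values())


# Hypothetical transactions, each given only by its list of input addresses.
txs = [["a1", "a2"], ["a2", "a3"], ["b1"], ["b1", "b2"], ["c1"]]
print(sorted(sorted(c) for c in cluster_by_common_inputs(txs)))
# [['a1', 'a2', 'a3'], ['b1', 'b2'], ['c1']]
```

Note how "a1" and "a3" never appear in the same transaction but still end up in one cluster through "a2"; this transitive merging is what lets clusters grow to cover a whole user or service.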


Category errors in (information) security: how logic can help

(Information) security can, arguably, be defined as the process of ensuring that just the right agents have just the right access to just the right (information) resources at just the right time. Of course, one can refine this rather pithy definition somewhat, and apply tailored versions of it to one’s favourite applications and scenarios.

A convenient taxonomy for information security is determined by the concepts of confidentiality, integrity, and availability, or CIA; informally:

  • confidentiality: the property that just the right agents have access to specified information or systems;
  • integrity: the property that specified information or systems are as they should be;
  • availability: the property that specified information or systems can be accessed or used when required.

Alternatives to confidentiality, integrity, and availability are sensitivity and criticality, in which sensitivity amounts to confidentiality together with some aspects of integrity and criticality amounts to availability together with some aspects of integrity.

But the key point about these categories of phenomena is that they are declarative; that is, they state what is required. For example, one may require that all documents marked ‘company private’ be accessible only to the company’s employees (confidentiality), that all passengers on the aircraft be free of weapons (integrity), or that the company’s servers be up and running 99.99% of the time (availability).

It’s all very well stating one’s security objectives declaratively, but how are they to be achieved? Declarative concepts should not be confused with operational concepts, that is, ones that describe how something is done. For example, passwords and encryption are used to keep documents confidential, security searches to ensure that passengers do not carry weapons onto an aircraft, and RAID servers to ensure adequate system availability. So, along with each declarative aim, there is a collection of operational tools that can be used to achieve it.
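The distinction can be made concrete in a short sketch, in which all names are hypothetical. The declarative objective from the ‘company private’ example becomes a predicate over what actually happened, while an operational mechanism, here a simple reference monitor standing in for passwords and encryption, is a procedure that decides each request. Whether the mechanism guarantees the property is then a separate question that has to be checked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    name: str
    company_private: bool


@dataclass(frozen=True)
class Agent:
    name: str
    employee: bool


# Declarative: WHAT must hold, stated as a property of who ended up reading what.
def confidentiality_holds(access_log: list[tuple[Agent, Document]]) -> bool:
    return all(agent.employee for agent, doc in access_log if doc.company_private)


# Operational: HOW it is meant to be achieved, via a reference monitor that
# checks each request before granting it and records every granted access.
class ReferenceMonitor:
    def __init__(self) -> None:
        self.access_log: list[tuple[Agent, Document]] = []

    def read(self, agent: Agent, doc: Document) -> bool:
        if doc.company_private and not agent.employee:
            return False                      # request refused
        self.access_log.append((agent, doc))  # request granted
        return True


alice = Agent("alice", employee=True)
mallory = Agent("mallory", employee=False)
memo = Document("memo", company_private=True)
monitor = ReferenceMonitor()
monitor.read(alice, memo)
monitor.read(mallory, memo)
print(confidentiality_holds(monitor.access_log))  # True
```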
