Program obfuscation

I had the pleasure of visiting the Simons Institute for the Theory of Computing at UC Berkeley over the Summer. One of the main themes of the programme was obfuscation.

Recently there has been a lot of exciting research on developing cryptographic techniques for program obfuscation. Obfuscation is of course not a new thing, you may already be familiar with the obfuscated C contest. But the hope underlying this research effort is to replace ad hoc obfuscation, which may or may not be possible to reverse engineer, with general techniques that can be applied to obfuscate any program and that satisfy a rigorous definition of what it means for a program to be obfuscated.

Even defining what it means for a program to be obfuscated is not trivial. Ideally, we would like an obfuscator to be a compiler that turns any computer program into a virtual black-box. By a virtual black-box we mean that the obfuscated program should preserve functionality while not leaking any other information about the original code, i.e., it should have the same input-output behaviour as the original program and you should not be able to learn anything beyond what you could learn by executing the original program. Unfortunately it turns out that this is too ambitious a goal: there are special programs, which are impossible to virtual black-box obfuscate.

Instead cryptographers have been working towards developing something called indistinguishability obfuscation. Here the goal is that given two functionally equivalent programs P1 and P2 of the same size and an obfuscation of one of them O(Pi), it should not be possible to tell which one has been obfuscated. Interestingly, even though this is a weaker notion than virtual black-box obfuscation it has already found many applications in the construction of cryptographic schemes. Furthermore, it has been proved that indistinguishability obfuscation is the best possible obfuscation in the sense that any information leaked by an obfuscated program is also leaked by any other program computing the same functionality.

So, when will you see obfuscation algorithms that you can use to obfuscate your code? Well, current obfuscation algorithms have horrible efficiency and are far from practical applicability so there is still a lot of research to be done on improving them. Moreover, current obfuscation proposals are based on something called graded encoding schemes. At the moment, there is a tug of war going on between cryptographers proposing graded encoding schemes and cryptanalysts breaking them. Breaking graded encoding schemes does not necessarily break the obfuscation algorithms building on them but it is fair to say that right now the situation is a mess. Which from a researcher’s perspective makes the field very exciting since there is still a lot to discover!

If you want to learn more about obfuscation I recommend the watching some of the excellent talks from the programme.