To ensure platform independence, mobile programs are distributed in forms that are isomorphic to the original source code. Such code is easy to decompile, and it therefore increases the risk of malicious reverse engineering attacks.
Several methods have been proposed to alleviate this situation. The highest level of protection is achieved with cryptographic solutions, but unfortunately such solutions require dedicated hardware with integrated decryption and execution units.
A more modest level of protection is achieved through obfuscation. An obfuscator is a tool that, through the application of code transformations, converts a program into an equivalent one that is more difficult to reverse engineer. The advantage of this method is that the protected program runs on standard hardware, without any changes to virtual machines or interpreters.
Obfuscation means making something difficult to perceive or understand. In the programming world, code obfuscation means making code harder to read or understand, generally for privacy or security purposes. In some applications, this obscurity can provide an additional level of protection for the source code.
Simple obfuscators may only rename private variables and methods, while more complex ones can also make public names unintelligible, updating the references to those names from other assemblies. Others can make the code flow harder to follow, and some obfuscators can confuse decompilers enough to stop them from producing any code at all. Others even encrypt the code, decrypting it only at runtime.
It is advisable to try several different obfuscators before settling on one and, if possible, to check the results of decompilation with a few different tools. Once an obfuscator has been chosen, it should be used consistently, with as much testing as possible carried out on the obfuscated version rather than the "clear" version, as obfuscation can introduce subtle problems.
Obfuscation tools mangle symbols and rearrange code blocks to foil decompilation. They may also encrypt strings containing sensitive data. Obfuscators cannot completely protect intellectual property: because the code resides on the client machine, a determined attacker with enough time can study the code and data structures until they understand what is going on.
Obfuscators do, however, provide value by raising the bar: they defeat most decompiler tools and prevent the casual hacker from stealing intellectual property. They can make the code as difficult to reverse engineer as optimized native code.
Obfuscating the code may make debugging more difficult or even impossible. Many third-party obfuscators offer features that help with debugging, such as producing a mapping file from obfuscated symbol names to the original symbol names.
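To illustrate how such a mapping file helps, here is a minimal sketch that restores original names in a stack trace from an obfuscated build. The "obfuscated -> original" line format, the symbol names, and the functions `load_mapping` and `deobfuscate_trace` are all hypothetical; real obfuscators each define their own mapping format and usually ship their own retrace tool.

```python
import re

# Hypothetical mapping file contents: one "obfuscated -> original" pair per line.
MAPPING_TEXT = """\
a -> LicenseManager
b -> check_license
c -> decode_key
"""

def load_mapping(text: str) -> dict[str, str]:
    """Parse 'obfuscated -> original' lines into a lookup table."""
    mapping = {}
    for line in text.splitlines():
        if "->" in line:
            obfuscated, original = (part.strip() for part in line.split("->", 1))
            mapping[obfuscated] = original
    return mapping

def deobfuscate_trace(trace: str, mapping: dict[str, str]) -> str:
    """Replace whole-word obfuscated symbols in a stack trace with original names."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], trace)

if __name__ == "__main__":
    mapping = load_mapping(MAPPING_TEXT)
    trace = 'File "app.py", line 42, in b\n    c(key)'
    print(deobfuscate_trace(trace, mapping))
```

The word-boundary anchors keep short obfuscated names such as `a` from matching inside longer words like `app`.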
Defining code obfuscation is difficult; it is different from encryption or mere mangling of code. Code obfuscation generates code that the compiler can still process but that is very difficult for humans to comprehend. From a computer science point of view, it is merely a translation. Computer scientists and software developers consider it a one-way translation, although it can be reversed if the code changes are properly logged. Professional obfuscation software exists that can un-obfuscate code, or even help by re-obfuscating it.
Various algorithms are used for code obfuscation, providing different degrees of transformation and protection against potential reverse engineering.
Renaming metadata to gibberish or less obvious identifiers is one of several defense mechanisms. Other obfuscation techniques include removing nonessential metadata, control flow obfuscation, string encryption, incremental obfuscation, and size reduction, all of which make decompilation and disassembly produce incomprehensible output.
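String encryption can be sketched as a build-time encoding step plus a runtime decoding step, so that sensitive literals do not appear verbatim in the shipped code. The sketch below uses XOR with a repeating key followed by base64; this is an assumption chosen for brevity, not a real cipher, and it only keeps literals out of a plain-text dump. The key, the URL, and all function names are hypothetical.

```python
import base64
from itertools import cycle

# Hypothetical build-time key; XOR is easily broken and is used here only for illustration.
KEY = b"k3y"

def encode_string(plain: str, key: bytes = KEY) -> str:
    """Build-time step: XOR the UTF-8 bytes with a repeating key, then base64-encode."""
    data = bytes(b ^ k for b, k in zip(plain.encode("utf-8"), cycle(key)))
    return base64.b64encode(data).decode("ascii")

def decode_string(blob: str, key: bytes = KEY) -> str:
    """Runtime step: undo the base64 encoding and the XOR to recover the literal."""
    data = base64.b64decode(blob)
    return bytes(b ^ k for b, k in zip(data, cycle(key))).decode("utf-8")

# For demonstration the blob is computed here; an obfuscator would run encode_string
# at build time and emit only the resulting literal into the shipped code.
ENCODED_URL = encode_string("https://license.example.com/check")

def license_server_url() -> str:
    """The program decodes the literal only at the moment it is needed."""
    return decode_string(ENCODED_URL)
```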
Some generic techniques used by other obfuscation utilities include reordering instantiations and methods, manipulating inheritance relationships, modifying variable scope, and mapping unboxed scalars to the corresponding object types. However, it is very important to keep in mind that obfuscation must not change a program's logic or observable behavior, as its purpose is to protect, not to deform.
There are three general methods for protecting source code:
Code authentication and verification - protects against unauthorized tampering with, and unauthorized access to, the code. This method is most efficient when authentication data is sent over the network. The user has the complete code, which in theory can also be in mangled form.
Server-side invocation - protects the code by restricting its distribution, so that the final code is never sent to the user. A fundamental requirement for this method is high bandwidth.
Code obfuscation - the code may need to be distributed to several entities and needs to be protected against reverse engineering or copying. It transforms the executable code so that tools such as decompilers have difficulty analyzing it.
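The first method, code authentication and verification, can be illustrated with a message authentication code computed over the code before distribution and checked later. This is a minimal sketch using Python's standard `hmac` module; the secret and function names are hypothetical. It assumes the check runs where the secret is kept (for example on a server that receives the code or its tag over the network), since a secret embedded in the client could be extracted; a purely client-side check would need public-key signatures instead.

```python
import hashlib
import hmac

# Hypothetical shared secret, held only by the party that signs and verifies.
SECRET = b"server-side-secret"

def sign_code(code: bytes, secret: bytes = SECRET) -> str:
    """Before distribution: compute an HMAC-SHA256 tag over the code."""
    return hmac.new(secret, code, hashlib.sha256).hexdigest()

def verify_code(code: bytes, tag: str, secret: bytes = SECRET) -> bool:
    """Later: recompute the tag and compare it in constant time to detect tampering."""
    expected = hmac.new(secret, code, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)

if __name__ == "__main__":
    code = b"print('hello')"
    tag = sign_code(code)
    print(verify_code(code, tag), verify_code(code + b"#patched", tag))
```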
With obfuscated code, information accessed by third parties is garbled or hidden, and generally harder to understand. Much as with a hash function, anyone who wants to crack obfuscated code must spend significantly more processing effort to de-obfuscate it than was required to obfuscate it.
Code obfuscation can be achieved through one or more of the following methods:
Source or binary structure obfuscation - A source code obfuscator accepts a program source file and generates another, functionally equivalent source file that is much harder to understand or reverse engineer. This is useful for the technical protection of intellectual property when the source code itself must be delivered to users for execution.
Data Obfuscation - This aims at obscuring data and data structures. Techniques include splitting variables, promoting scalars to objects, converting static data to procedures, changing the encoding, and changing variable lifetimes.
Control Flow Obfuscation - This changes the control hierarchy while preserving the logic. False conditional statements and other misleading constructs are introduced to confuse decompilers, but the logic of the code remains intact.
Preventive Obfuscation - The focus is on protection against decompilers and reverse engineering methods. Renaming metadata to gibberish or less obvious identifiers is one such technique.
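Control flow obfuscation with a false conditional can be sketched with an opaque predicate: a test whose outcome is fixed but not obvious from its text. In this minimal sketch, x * (x + 1) is a product of two consecutive integers and is therefore always even, so the guarded branch is always taken and the alternative branch is dead code. The function names and the dead-branch arithmetic are illustrative assumptions.

```python
def always_true(x: int) -> bool:
    """Opaque predicate: x * (x + 1) is a product of consecutive integers, hence even.

    The result never varies, but a decompiler or reader sees a data-dependent test.
    """
    return x * (x + 1) % 2 == 0

def discount_original(price: int) -> int:
    if price > 100:
        return price - 10
    return price

def discount_obfuscated(price: int) -> int:
    # Same behavior as discount_original; the real logic sits behind the opaque test.
    if always_true(price):
        if price > 100:
            return price - 10
        return price
    # Dead branch: never executed, but it looks like a plausible alternative path.
    return price * 3 - 7
```

The logic is preserved because the opaque test always succeeds; only the apparent control structure has changed.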
Obfuscation methods can also be classified depending on the information they target. Some simple transformations target the lexical structure of the program while others target the data structures or the control flow. Obfuscation methods are further classified based on the kind of operation they perform on the targeted information. Some methods manipulate the aggregation of control or data, while others affect the ordering.
The different obfuscation methods are:
Layout obfuscation - Targets the layout of the application, such as source code formatting, variable names and comments.
Data obfuscation - Targets the data structures used by the program. It has four subtypes:
  Storage obfuscation - Alters how data is stored in memory.
  Encoding obfuscation - Alters how stored data is interpreted.
  Aggregation obfuscation - Alters how data is grouped together.
  Ordering obfuscation - Alters how data is ordered.
Control obfuscation - Targets the control flow of the program. It has three subtypes:
  Aggregation obfuscation - Alters how statements are grouped together.
  Ordering obfuscation - Alters the order in which statements are executed.
  Computation obfuscation - Alters the computations a program performs, for example by inserting redundant or dead code.
Preventive transformation - The main goal of this method is not to obscure the code but to make it harder for deobfuscators to break. It has two subtypes:
  Inherent - Tries to make automatic deobfuscation techniques more difficult.
  Targeted - Tries to exploit known weaknesses in deobfuscators.
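Two of the data transformations above can be sketched concretely. In the encoding example, a loop counter i is never stored directly; the program keeps e = 8 * i + 3 and rewrites every use of i accordingly. In the storage example, a value is split into two parts whose XOR recovers it, a simple form of the variable splitting mentioned earlier. The constants 8 and 3, the XOR-based split, and all function names are illustrative assumptions, not a specific tool's scheme.

```python
def sum_below_plain(n: int) -> int:
    """Original: sum of 0 + 1 + ... + (n - 1)."""
    total, i = 0, 0
    while i < n:
        total += i
        i += 1
    return total

def sum_below_encoded(n: int) -> int:
    """Encoding obfuscation: the counter is stored as e = 8 * i + 3, never as i."""
    total, e = 0, 3            # e = 3 encodes i = 0
    while e < 8 * n + 3:       # equivalent to i < n
        total += (e - 3) // 8  # decode i only where its value is used
        e += 8                 # equivalent to i += 1
    return total

def split_value(v: int, mask: int) -> tuple[int, int]:
    """Storage obfuscation by variable splitting: hold v as (a, b) with v == a ^ b."""
    return v ^ mask, mask

def join_value(a: int, b: int) -> int:
    """Recombine the parts wherever the original value is needed."""
    return a ^ b
```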
Parameters for evaluating the quality of an obfuscation method
To study obfuscation methods in detail, it must be possible to evaluate the quality of a transformation. The quality of an obfuscation method is determined by the combination of its potency, resilience, stealth and cost.
Potency: Potency defines how much more obscure the transformed code is than the original. Software complexity metrics define various complexity measures for software, such as the number of predicates it contains, the depth of its inheritance tree, and its nesting levels. While the goal of good software design is to minimize complexity according to these measures, the goal of obfuscation is to maximize it.
Resilience: Resilience defines how well the transformed code can resist automated deobfuscation attacks. It is a combination of the programmer effort to create a deobfuscator and the time and space required by the deobfuscator. The highest degree of resilience is a one-way transformation that cannot be undone by a deobfuscator. An example is when the obfuscation removes information such as source code formatting.
Stealth: Stealth defines how well the obfuscated code blends with the rest of the program. If the transformation introduces code that stands out from the rest of the program, it may be difficult for a deobfuscator to spot, but it can easily be spotted by a reverse engineer. Stealth is context-sensitive; what is stealthy in one program may not be in another.
Cost: Cost is the execution time and space overhead in the obfuscated code compared to the original code. A transformation with no cost associated is free. Cost is also context-sensitive.
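Potency can be made concrete with a toy complexity metric. This minimal sketch assumes Python source as input and uses an illustrative metric E (number of predicates plus one, plus maximum nesting depth) rather than any standard one; potency is then computed with one common formulation, E(P') / E(P) - 1, where P is the original program and P' the obfuscated one. All names are hypothetical.

```python
import ast

BRANCHES = (ast.If, ast.While, ast.IfExp, ast.BoolOp)
BLOCKS = (ast.If, ast.While, ast.For)

def count_predicates(source: str) -> int:
    """Count branching constructs: if/while statements, conditional expressions, and/or."""
    return sum(isinstance(node, BRANCHES) for node in ast.walk(ast.parse(source)))

def max_nesting(source: str) -> int:
    """Depth of the deepest chain of nested if/while/for blocks."""
    def depth(node: ast.AST) -> int:
        deepest = max((depth(child) for child in ast.iter_child_nodes(node)), default=0)
        return deepest + 1 if isinstance(node, BLOCKS) else deepest
    return depth(ast.parse(source))

def complexity(source: str) -> int:
    """Toy metric E(P): predicates + 1 (McCabe-style) plus maximum nesting depth."""
    return count_predicates(source) + 1 + max_nesting(source)

def potency(original: str, obfuscated: str) -> float:
    """Relative increase in complexity: E(P') / E(P) - 1."""
    return complexity(obfuscated) / complexity(original) - 1

ORIGINAL_SRC = """
def discount(price):
    if price > 100:
        return price - 10
    return price
"""

OBFUSCATED_SRC = """
def discount(price):
    if price * (price + 1) % 2 == 0:
        if price > 100:
            return price - 10
        return price
    return price * 3 - 7
"""

if __name__ == "__main__":
    print(round(potency(ORIGINAL_SRC, OBFUSCATED_SRC), 3))  # 0.667
```

Here the opaque predicate raises E from 3 to 5, giving a potency of about 0.67; a positive value means the transformation increased complexity under this particular metric.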
Layout obfuscation refers to altering the formatting of the source file. This involves removing source code comments, removing debug information, and changing the names of elements such as classes, member variables and local variables.
Source code comment removal and formatting removal are free transformations, since they add no space or time overhead to the original application. Their potency is low because formatting carries very little semantic content. They are one-way transformations because the formatting, once removed, cannot be recovered. Scrambling variable names is also a one-way and free transformation, but it has much higher potency than formatting removal.
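Both layout transformations can be sketched with Python's standard `ast` module: parsing and regenerating the source discards comments and original formatting, and a node transformer scrambles local names. This is a deliberately simplified sketch, assuming Python 3.9+ for `ast.unparse`; it renames only positional parameters and assigned locals, ignores nested functions, `global`/`nonlocal` declarations and attribute names, and the names `LocalRenamer` and `layout_obfuscate` are hypothetical.

```python
import ast

class LocalRenamer(ast.NodeTransformer):
    """Rename each function's parameters and assigned locals to v0, v1, ..."""

    def __init__(self) -> None:
        self.mapping: dict[str, str] = {}

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.FunctionDef:
        names = [a.arg for a in node.args.args]
        for sub in ast.walk(node):
            is_store = isinstance(sub, ast.Name) and isinstance(sub.ctx, ast.Store)
            if is_store and sub.id not in names:
                names.append(sub.id)
        saved, self.mapping = self.mapping, {n: f"v{i}" for i, n in enumerate(names)}
        for a in node.args.args:
            a.arg = self.mapping[a.arg]
        self.generic_visit(node)  # rewrites Name nodes in the function body
        self.mapping = saved      # names outside this function are left untouched
        return node

    def visit_Name(self, node: ast.Name) -> ast.Name:
        node.id = self.mapping.get(node.id, node.id)
        return node

def layout_obfuscate(source: str) -> str:
    """Parse, rename locals, and regenerate source; comments and formatting are lost."""
    tree = LocalRenamer().visit(ast.parse(source))
    return ast.unparse(tree)

SOURCE = """
def average(values):
    # Sum the values, then divide by their count.
    total = 0
    for value in values:
        total += value
    return total / len(values)
"""

if __name__ == "__main__":
    print(layout_obfuscate(SOURCE))
```

The function's own name and built-ins such as `len` are kept, so callers still work; the comment and the descriptive local names cannot be recovered from the output, which is what makes these transformations one-way.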
There are many commercial tools, and some open source tools, available for code obfuscation. Code obfuscation also introduces overhead: unless the transformations are optimized, obfuscated code generally runs slower than the original code, and the packaged output can be larger. This may be the price to be paid for enhanced protection of the source code.