Botnet Software Research Paper
A botnet is a network of malware-infected hosts controlled by a botmaster, from whom the bots receive all the instructions they execute. The paper reviewed here proposes a novel technique to infer malware network protocol specifications from a sample malware binary, that is, an executable file containing the malware's instructions. Botmasters use Command-and-Control (C&C) protocols to direct infected hosts to carry out malicious activities. Each malware family follows its own set of communication conventions, referred to as its fingerprint, which acts as a unique identity for that family, and the authors attempt to extract this fingerprint. To do so, they apply reverse engineering: they explore the C&C messages and infer all the fields in them. Because these messages are encrypted, they must be decrypted before the inference algorithm can be applied. For this purpose, the authors propose a system that extracts C&C encryption keys through dynamic analysis of the malware binary. In short, the approach recovers the encryption keys, decrypts the malware's C&C traffic, and then infers the protocol specification from the decrypted messages. The two main contributions of the work are:
1. A technique for rich malware protocol inference that requires only decrypted network traffic (the encrypted traffic must therefore be decrypted before the inference algorithm is applied).
2. A technique for identifying the encryption used by malware networks and extracting their C&C encryption keys, which the first contribution depends on.
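The two contributions fit together as a pipeline: recover the key, decrypt the captured C&C messages, then infer their fields. The sketch below illustrates only that ordering, under strong assumptions. It assumes the malware uses RC4 (a stream cipher that some malware families are known to use; the review does not say which algorithm applies here), assumes the key has already been extracted by dynamic analysis, and replaces real protocol inference with a hypothetical `infer_fields` helper that simply splits on an assumed `|` delimiter. The names `rc4`, `infer_fields`, and `analyze` are illustrative, not taken from the paper.

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """RC4 stream cipher; encryption and decryption are the same operation."""
    # Key-scheduling algorithm (KSA): permute the state array using the key.
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    # Pseudo-random generation algorithm (PRGA): XOR each byte with the keystream.
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(byte ^ S[(S[i] + S[j]) % 256])
    return bytes(out)


def infer_fields(plaintext: bytes, delimiter: bytes = b"|") -> list[bytes]:
    """Hypothetical stand-in for protocol inference: split on an assumed delimiter."""
    return plaintext.split(delimiter)


def analyze(captured: list[bytes], key: bytes) -> list[list[bytes]]:
    """Decrypt each captured message with the recovered key, then infer its fields."""
    return [infer_fields(rc4(key, message)) for message in captured]
```

For example, with a hypothetical key `b"k3y"`, calling `analyze([rc4(b"k3y", b"DL|http://example.com/a.exe|1")], b"k3y")` returns `[[b"DL", b"http://example.com/a.exe", b"1"]]`, because RC4 decryption is the same XOR with the keystream as encryption.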
CRITICAL REVIEW OF PAST WORK
The main concept employed to achieve the steps defined above is reverse engineering. A considerable amount of work has already been done in this field, but since it is very broad, I divide it into three subcategories and highlight the relevant work in each. The first category generates protocol specifications from network traffic. In 2005, C. Leita et al. introduced ScriptGen, which observes network traffic and builds from it a state machine that produces approximate responses to protocol requests. In 2007, W. Cui proposed a reverse engineering technique that infers message formats using a predefined set of field semantics. However, binary protocols have rarely been explored, and previous work in this category did not deal with encryption, whereas the approach proposed in this paper does. Moreover, the protocol specifications inferred by this paper are quite detailed and include rich field types rather than a predefined, generic set.
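To make the idea of inferring message formats from network traffic concrete, the following sketch compares several captured messages of the same type byte by byte and groups offsets into constant fields (likely keywords or opcodes) and variable fields (likely parameters). This naive position-wise comparison is a simplified illustration chosen here; it is not the algorithm of ScriptGen or of W. Cui's work, which rely on more robust tokenization and alignment. The function name `infer_format` is hypothetical, and the input is assumed to be a non-empty list of messages.

```python
def infer_format(messages: list[bytes]) -> list[tuple[str, int, int]]:
    """Split messages into (kind, start, end) fields; kind is 'const' or 'var'."""
    min_len = min(len(m) for m in messages)
    # An offset is constant if every sample carries the same byte there.
    kinds = ["const" if len({m[off] for m in messages}) == 1 else "var"
             for off in range(min_len)]
    fields: list[tuple[str, int, int]] = []
    start = 0
    for off in range(1, min_len + 1):
        # Close the current field at the end or when the kind changes.
        if off == min_len or kinds[off] != kinds[start]:
            fields.append((kinds[start], start, off))
            start = off
    # Bytes beyond the shortest message are not always present, so treat them as variable.
    max_len = max(len(m) for m in messages)
    if max_len > min_len:
        if fields and fields[-1][0] == "var":
            fields[-1] = ("var", fields[-1][1], max_len)
        else:
            fields.append(("var", min_len, max_len))
    return fields
```

For instance, `infer_format([b"GET 1", b"GET 22"])` yields `[("const", 0, 4), ("var", 4, 6)]`: the shared `GET ` prefix is constant, while the differing suffix, including the extra trailing byte, is variable.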
The second category of reverse engineering work analyzes execution traces: the program is run, its execution is recorded, and the traces are analyzed to understand how the program processes the data it exchanges over the network. A lot of work has been done in this area. In 2008, G. Wondracek proposed a technique that generates network protocol specifications by analyzing how a program processes messages. This approach focused mostly on the server side, which is difficult to access in the world of malware. In 2009, J. Caballero worked directly on malware binaries, performing automatic reverse engineering of C&C protocols through dynamic analysis of the executable files containing the malware; he called this system Dispatcher. Dispatcher analyzes the data received from the network and derives detailed semantics for the message fields by identifying the prototypes of the functions that process that data. Also in 2009, Z. Wang considered encrypted protocols, locating the buffers that hold the plaintext extracted from incoming encrypted data. The method proposed in this paper differs from Dispatcher in that it learns generic encryption details, which can then be applied to analyze any sample of the messages. The paper also deals with signature generation and with decrypting messages online as they are delivered. Rather than requiring lengthy execution traces, the proposed technique only needs the malware binary itself, and even this analysis can be avoided when the same malware binary is encountered again.
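Dispatcher's idea of deriving field semantics from function prototypes can be illustrated with a small sketch. It assumes a hypothetical taint log, produced by some dynamic taint-analysis tool, that records which byte range of a received message flowed into which argument of which function. A field that reaches an argument whose meaning is known from a documented prototype inherits that meaning. The table uses three standard C library functions; the log format and the `label_fields` name are assumptions for illustration, not Dispatcher's actual interface.

```python
# Semantics implied by documented C library prototypes, keyed by (function, argument index).
PROTOTYPE_SEMANTICS = {
    ("gethostbyname", 0): "hostname",         # struct hostent *gethostbyname(const char *name);
    ("fopen", 0): "file path",                # FILE *fopen(const char *path, const char *mode);
    ("sleep", 0): "time interval (seconds)",  # unsigned int sleep(unsigned int seconds);
}


def label_fields(taint_log):
    """Map message byte ranges to semantics.

    taint_log: iterable of (function, arg_index, (start, end)) entries, a
    hypothetical format recording which message bytes reached which argument.
    """
    labels = {}
    for func, arg_index, span in taint_log:
        semantic = PROTOTYPE_SEMANTICS.get((func, arg_index))
        if semantic is not None:  # calls to functions without known semantics are ignored
            labels[span] = semantic
    return dict(sorted(labels.items()))
```

Given the log `[("strlen", 0, (0, 4)), ("gethostbyname", 0, (4, 19)), ("sleep", 0, (19, 23))]`, `label_fields` returns `{(4, 19): "hostname", (19, 23): "time interval (seconds)"}`; the `strlen` call carries no specific meaning and is skipped.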
The third category of work is based on hybrid approaches. Many researchers have worked in this area as well, highlighting the benefits of combining several approaches. In 2009, P. M. Comparetti proposed Prospex, which infers malware protocols by analyzing both the traffic communicated over the internet and execution traces. The biggest distinction between the approach proposed in this paper and Prospex, however, is that Prospex cannot handle encrypted data. Since most data communicated over the internet is encrypted, the essential first step is to decrypt the messages and only then infer the protocols from them.
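Hybrid systems such as Prospex infer a protocol state machine from the sequences of message types observed across sessions. A common starting point for this kind of inference is a prefix-tree acceptor, which the sketch below builds. Full systems typically also cluster messages into types and merge equivalent states, which is omitted here. The message types (`HELLO`, `CMD`, and so on) are hypothetical and are assumed to have been assigned already, for example to decrypted messages.

```python
def build_pta(sessions):
    """Build a prefix-tree acceptor from sessions given as lists of message types."""
    transitions = {}  # (state, message_type) -> next state; state 0 is the start
    accepting = set()
    next_state = 1
    for session in sessions:
        state = 0
        for msg_type in session:
            if (state, msg_type) not in transitions:
                transitions[(state, msg_type)] = next_state
                next_state += 1
            state = transitions[(state, msg_type)]
        accepting.add(state)  # the state reached at the end of a complete session
    return transitions, accepting


def accepts(transitions, accepting, session):
    """Check whether a session follows a complete path observed during inference."""
    state = 0
    for msg_type in session:
        if (state, msg_type) not in transitions:
            return False
        state = transitions[(state, msg_type)]
    return state in accepting
```

Built from the sessions `[["HELLO", "CMD", "ACK"], ["HELLO", "PING"]]`, the acceptor has four transitions sharing the initial `HELLO` edge; it accepts `["HELLO", "PING"]` but rejects `["HELLO", "ACK"]`, which was never observed.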
Many approaches have also been employed to generate signatures, or unique patterns, that can be used to identify different malware families. One of the works closest to the one proposed in this paper is ProVex. Its author, C. Rossow, proposed a technique to detect botnets that communicate through encrypted C&C channels. ProVex also handles encryption and decryption: it decrypts packets using predefined encryption algorithms and then extracts statistics about how the bytes are distributed in the payload. The difference from the proposed technique lies in the encryption keys: ProVex requires prior knowledge of them. In addition, the signatures it generates are highly probabilistic, which makes them more prone to false positives. The proposed paper addresses this issue as well.
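The ProVex workflow described above, decrypting with predefined algorithms and then inspecting the byte distribution of the payload, can be approximated as follows. This is a toy stand-in, not ProVex itself: it assumes repeating-key XOR as the only predefined algorithm, uses a hypothetical list of candidate keys (mirroring the prior key knowledge ProVex requires), and uses the fraction of printable ASCII bytes as a crude byte-distribution statistic in place of learned signatures. The names `CANDIDATE_KEYS`, `printable_ratio`, and `match_payload` are illustrative.

```python
def xor_decrypt(data: bytes, key: bytes) -> bytes:
    """Repeating-key XOR; applying it twice with the same key is the identity."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def printable_ratio(data: bytes) -> float:
    """Fraction of bytes in the printable ASCII range 0x20-0x7E."""
    if not data:
        return 0.0
    return sum(0x20 <= b <= 0x7E for b in data) / len(data)


# Hypothetical predefined keys; ProVex likewise needs the keys in advance.
CANDIDATE_KEYS = [b"\xa5", b"k3y"]


def match_payload(payload: bytes, threshold: float = 0.9):
    """Return the first candidate key whose decryption looks like plaintext, else None."""
    for key in CANDIDATE_KEYS:
        if printable_ratio(xor_decrypt(payload, key)) >= threshold:
            return key
    return None
```

A statistic this coarse will also match benign plaintext that happens to decrypt to printable bytes, which illustrates why purely statistical signatures are prone to false positives.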
Overall, the paper presents a good solution for inferring the C&C protocols used in the malicious networks of botnets. It also reduces the effort of manually understanding malicious communications by providing detailed protocol specifications.