A botnet is a network of hosts infected with malicious software and controlled by a botmaster, who issues the instructions the bots execute. The paper proposes a novel technique to infer malware network protocol specifications from a sample malware binary, that is, a file of executable instructions (for example, a file with a .bin extension). Botmasters use Command-and-Control (C&C) protocols to direct malware-infected hosts to carry out malicious activities. Every malware family follows its own set of instructions, called a fingerprint, when carrying out these activities. The authors try to extract this fingerprint, which acts as a unique identity for the malware family. They reverse engineer the protocol by examining the messages and inferring all the fields in them. Because the messages are encrypted, they must be decrypted before the inference algorithm can be applied. For this purpose, the authors propose a system that extracts the C&C encryption keys through dynamic analysis of the malware binary. In short, they extract the protocol specification from samples of the malware's communication, decrypting the traffic first and then applying the inference algorithm.
Since botnets can infect a computer and use it for malicious activities, there is a need for a system that can detect suspicious activity and, through a series of steps, reverse the process. Much work has already been done in the botnet domain, but state-of-the-art techniques do not handle encrypted botnet protocols, i.e. protocols in which the botmaster and the bots exchange messages in encrypted form. To unveil such C&C protocols, the messages must first be captured and decrypted, and the protocol must then be reverse engineered using type inference information. To infer the encryption information, the authors extend state-of-the-art techniques with binary analysis.
The key to solving the problem is reverse engineering. The first part of the reverse engineering consists of decrypting the traffic, using dynamic analysis to extract the keys from the malware binary. The second part consists of automatically deriving the protocol specification by applying type inference information to the decrypted traffic. There are different malware families, each with its own signatures and protocols. The message format studied is that of one malware family, ZeroAccess. Here, a message means the payload (an executable file) that is downloaded onto the infected computer. Once the messages are accessible, encryption analysis is performed to select the candidate functions that behave like encryption functions. The candidates are then filtered further by analyzing the inputs and outputs of the network system calls. Next, the analysis determines whether the encryption key is static or derived, so that decryption can be performed. Once the decryption procedure is learned, any message from the ZeroAccess family can be decrypted.
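The two filtering stages of the encryption analysis can be illustrated with a minimal sketch. The review does not say which heuristics the paper uses, so everything here is an assumption made for illustration: the FunctionTrace record, the idea of scoring a function by its share of arithmetic and bitwise instructions, the 0.5 threshold, and the function names and addresses in the sample data are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set

# Assumed set of mnemonics that are typical of cipher-like code.
ARITHMETIC_OPS = {"xor", "add", "sub", "shl", "shr", "rol", "ror", "and", "or", "not"}


@dataclass
class FunctionTrace:
    """Hypothetical record of one function observed during dynamic analysis."""
    name: str
    instructions: List[str]   # mnemonics executed by the function
    output_buffers: Set[int]  # addresses of buffers the function wrote to


def arithmetic_ratio(trace: FunctionTrace) -> float:
    """Fraction of executed instructions that are arithmetic or bitwise."""
    if not trace.instructions:
        return 0.0
    hits = sum(1 for op in trace.instructions if op in ARITHMETIC_OPS)
    return hits / len(trace.instructions)


def encryption_candidates(traces: List[FunctionTrace],
                          send_buffers: Set[int],
                          min_ratio: float = 0.5) -> List[str]:
    """Stage 1: keep functions that look arithmetic-heavy.
    Stage 2: keep only those whose output reaches a network send call."""
    selected = []
    for trace in traces:
        looks_like_cipher = arithmetic_ratio(trace) >= min_ratio
        feeds_network = bool(trace.output_buffers & send_buffers)
        if looks_like_cipher and feeds_network:
            selected.append(trace.name)
    return selected


# Made-up traces: only the first function is both arithmetic-heavy
# and writes to a buffer that is later passed to a send call.
traces = [
    FunctionTrace("sub_401000", ["xor", "rol", "add", "mov", "xor", "xor"], {0x1000}),
    FunctionTrace("sub_402000", ["mov", "push", "call", "mov", "pop", "ret"], {0x1000}),
    FunctionTrace("sub_403000", ["xor", "add", "shl", "xor"], {0x2000}),
]
send_buffers = {0x1000}

candidates = encryption_candidates(traces, send_buffers)
print(candidates)
```

In this sketch the second function is dropped because it performs no arithmetic, and the third because its output never reaches the network, which mirrors the idea of narrowing the candidates in two passes.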
Once the encryption analysis is done, the next step is protocol analysis. Different message types serve different purposes, so for simplicity the messages can be clustered by message type. Each message is then split into content, non-content, and magic fields. A magic field is a field that holds the same constant value in all messages. Other field types are also extracted, such as EXE fields, dependent fields, and composite fields. Finally, sequence alignment is used to reconcile the fields into a single specification of the protocol.
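The definition of a magic field suggests a simple check: look for byte positions whose value never changes across the decrypted messages of one cluster. The sketch below shows that check only. It assumes the messages are already decrypted and position-aligned, and the function name and the sample messages are invented for illustration; the paper's actual field inference is not described in enough detail here to reproduce.

```python
from typing import List, Tuple


def constant_byte_ranges(messages: List[bytes]) -> List[Tuple[int, int]]:
    """Return half-open (start, end) offset ranges whose bytes are
    identical in every message. These are magic field candidates."""
    if not messages:
        return []
    # Compare only the prefix that every message has.
    length = min(len(m) for m in messages)
    ranges = []
    start = None
    for offset in range(length):
        is_constant = all(m[offset] == messages[0][offset] for m in messages)
        if is_constant and start is None:
            start = offset                  # a constant run begins
        elif not is_constant and start is not None:
            ranges.append((start, offset))  # the constant run ends
            start = None
    if start is not None:
        ranges.append((start, length))
    return ranges


# Invented messages: a 4-byte constant header, a varying type byte,
# a constant zero byte, then varying content.
sample_messages = [
    b"\xde\xad\xbe\xef\x01\x00hello",
    b"\xde\xad\xbe\xef\x02\x00world",
    b"\xde\xad\xbe\xef\x03\x00abcde",
]

magic_candidates = constant_byte_ranges(sample_messages)
print(magic_candidates)  # [(0, 4), (5, 6)]
```

A byte that happens to be constant in a small sample is not necessarily a magic field, so in practice such candidates would need to be confirmed over many messages.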
The main motivation for all of the above steps, i.e. decrypting the messages and extracting the protocol specification, is that network signatures can then be generated.
Overall, the approach proposed in the paper is a good solution for inferring the C&C protocol used in a botnet. It also reduces the manual effort of understanding malicious communications by providing a detailed specification of the protocol.