Detecting Domain Generating Algorithm based Malicious Domains using Bigram and Word2Vec Model


  • Steven Smith
  • Md Ahsan Ayub


Domain Generating Algorithms (DGAs) are used to facilitate server communication for various types of cyber-attacks, primarily botnets. These algorithms help attackers evade defensive methods that block malicious communication, such as blacklisting. Using DGAs, botmasters and malware owners can easily communicate with their victim machines without fear of their command and control (C&C) server being blocked. Attackers can switch the domain used for their C&C server at short time intervals, because a DGA can generate millions of pseudo-random domains. Network administrators find it difficult to detect DGA activity in network logs because of the flood of DNS queries for non-existent domains (NXDomain). Therefore, automatic detection of DGA-generated domains is a critical countermeasure against the associated malicious botnet communication. In our research, we propose a comprehensive solution to detect such DGA-based domains used in malware communication. Two distinct feature extraction methods, the Bigram model and the Word2Vec model, are applied for text processing. We apply machine learning and deep learning techniques to the largest reported dataset, covering 84 different traditional and dictionary-based DGAs. Our results demonstrate exceptional success in both binary classification (classifying a given domain as benign or malicious) and multi-class classification (identifying the specific malware family that produced a domain).
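To make the Bigram feature extraction concrete, the following is a minimal sketch of one plausible reading: character-level bigrams taken from a domain name. The abstract does not say whether bigrams are formed over characters or tokens, nor how the top-level domain is handled, so the character-level choice, the stripping of the final suffix, and the function names `char_bigrams` and `bigram_counts` are all assumptions made for illustration.

```python
from collections import Counter
from typing import List


def char_bigrams(domain: str) -> List[str]:
    """Return overlapping character bigrams of a domain name.

    Assumption: the final dot-separated suffix (e.g. "com") is dropped
    and the remainder is lowercased before bigrams are formed.
    """
    name = domain.lower().rsplit(".", 1)[0]
    return [name[i:i + 2] for i in range(len(name) - 1)]


def bigram_counts(domain: str) -> Counter:
    """Count how often each bigram occurs; one possible feature vector."""
    return Counter(char_bigrams(domain))
```

Under these assumptions, `char_bigrams("google.com")` yields `go`, `oo`, `og`, `gl`, `le`. The resulting counts could then be fed to a classifier; pseudo-random DGA names would be expected to contain bigrams that are rare in benign names.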






Computer Science