*WINNER* NELMA: Never-Ending Learner for Malware Analysis

  • Joe Bivens
  • Moumita Kamal


The volume of potentially malicious software continues to outpace the security industry’s capacity for manual analysis. Machine learning models can provide highly accurate malware classification, but they often rely on small datasets labeled by humans. Additionally, obfuscation techniques widely employed by malware authors may conceal some malicious attributes from analysis. Inspired by NELL (Never-Ending Language Learner), NELMA (Never-Ending Learner for Malware Analysis) aims to provide a continually learning system that is largely self-sufficient, requiring little human interaction, and that integrates knowledge from multiple learning agents using different feature representations. NELMA is implemented as a flexible, modular, and extensible testbed for researchers, supporting future work in machine learning, malware analysis, data visualization, and software engineering. In its reference configuration, NELMA processes potentially malicious Android applications (APKs), is implemented primarily in Python (supporting popular libraries such as androguard and scikit-learn), and uses Elasticsearch as a backend database.
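
To make the idea of integrating several learning agents more concrete, the following is a minimal sketch of one self-labeling iteration. It is not NELMA's actual code: the `Agent` class, the `learn_iteration` function, the feature-view names, the toy samples, and the 0.8 confidence threshold are all hypothetical, and a simple feature-count scorer stands in for a real classifier (such as one from scikit-learn) so that the example runs with the standard library alone. The assumed promotion rule is that an unlabeled sample joins the knowledge base only when every agent agrees on its label with high confidence.

```python
from collections import Counter
from dataclasses import dataclass, field


def _empty_counts():
    # One feature counter per label; labels are illustrative.
    return {"malicious": Counter(), "benign": Counter()}


@dataclass
class Agent:
    """Hypothetical learning agent that sees only one feature view of a sample."""
    view: str  # e.g. "permissions" or "api_calls" (assumed view names)
    counts: dict = field(default_factory=_empty_counts)

    def train(self, knowledge_base):
        # Rebuild per-label feature counts from every labeled sample.
        self.counts = _empty_counts()
        for sample, label in knowledge_base:
            self.counts[label].update(sample[self.view])

    def predict(self, sample):
        # Score each label by how often the sample's features were seen under it.
        scores = {
            label: sum(seen[f] for f in sample[self.view])
            for label, seen in self.counts.items()
        }
        total = sum(scores.values())
        if total == 0:
            return None, 0.0  # this view has no evidence for the sample
        best = max(scores, key=scores.get)
        return best, scores[best] / total


def learn_iteration(agents, knowledge_base, unlabeled, threshold=0.8):
    """Retrain all agents, then promote samples they confidently agree on."""
    for agent in agents:
        agent.train(knowledge_base)
    promoted, remaining = [], []
    for sample in unlabeled:
        votes = [agent.predict(sample) for agent in agents]
        labels = {label for label, _ in votes}
        confident = all(conf >= threshold for _, conf in votes)
        if len(labels) == 1 and None not in labels and confident:
            promoted.append((sample, labels.pop()))
        else:
            remaining.append(sample)  # left for a later iteration or a human
    return knowledge_base + promoted, remaining


# Toy data: two seed labels and two unlabeled samples (all values invented).
seed = [
    ({"permissions": ["SEND_SMS", "READ_CONTACTS"], "api_calls": ["sendTextMessage"]}, "malicious"),
    ({"permissions": ["INTERNET"], "api_calls": ["openConnection"]}, "benign"),
]
unlabeled = [
    {"permissions": ["SEND_SMS"], "api_calls": ["sendTextMessage"]},
    {"permissions": ["INTERNET", "SEND_SMS"], "api_calls": ["openConnection"]},
]
agents = [Agent("permissions"), Agent("api_calls")]
kb, rest = learn_iteration(agents, seed, unlabeled)
```

In this toy run the first unlabeled sample is promoted because both views agree on it, while the second stays unlabeled because the two views disagree. Requiring agreement across independent views is one way to limit the label drift that self-training can otherwise accumulate; a real system would call this function repeatedly as new samples arrive.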

Computer Science