Catch'em all: Classification of Rare, Prominent, and Novel Malware Families

Abstract

National security is threatened by malware, which remains one of the most dangerous and costly cyber threats. As of last year, researchers reported 1.3 billion known malware specimens, motivating the use of data-driven machine learning (ML) methods for analysis. However, shortcomings in existing ML approaches hinder their mass adoption. These challenges include detecting novel malware and classifying malware under class imbalance, a situation where malware families are not equally represented in the data. Our work addresses these shortcomings with MalwareDNA, an advanced dimensionality reduction and feature extraction framework. We demonstrate stable performance under class imbalance on two tasks, malware family classification and novel malware detection, with a trade-off of an increased abstention (reject-option) rate.

Publication
In IEEE 12th International Symposium on Digital Forensics and Security (ISDFS), 2024

Keywords:

non-negative matrix factorization, novel malware, semi-supervised learning, reject-option, class-imbalance

Citation:

M. E. Eren, R. Barron, M. Bhattarai, S. Wanna, N. Solovyev, K. Rasmussen, B. S. Alexandrov, and C. Nicholas, “Catch’em all: Classification of Rare, Prominent, and Novel Malware Families,” 2024 IEEE International Symposium on Digital Forensics and Security (ISDFS), 2024, pp. 1-6.

BibTeX:

@INPROCEEDINGS{erenISDFS2024,
  author={M. E. {Eren} and R. {Barron} and M. {Bhattarai} and S. {Wanna} and N. {Solovyev} and K. {Rasmussen} and B. S. {Alexandrov} and C. {Nicholas}},
  booktitle={IEEE International Symposium on Digital Forensics and Security (ISDFS)}, 
  title={Catch'em all: Classification of Rare, Prominent, and Novel Malware Families}, 
  year={2024},
  volume={},
  number={},
  pages={1-6},
  doi={}}
Maksim E. Eren
Scientist

My research interests lie at the intersection of the machine learning and cybersecurity disciplines, with a concentration in tensor decomposition.