Maksim E. Eren

Scientist

Los Alamos National Laboratory

Biography

Maksim E. Eren is an early-career scientist in the Information Systems and Modeling (A-1) group at Los Alamos National Laboratory (LANL) and a LANL Center for National Security and International Studies (CNSIS) Fellow. He is an alumnus of the Scholarship for Service CyberCorps program. Maksim graduated Summa Cum Laude with a Bachelor’s degree in Computer Science from the University of Maryland, Baltimore County (UMBC) in 2020 and earned his Master’s degree from the same institution in 2022. In 2024, he received his Ph.D. from UMBC, focusing on tensor decomposition methods for malware characterization.

Maksim’s research interests span an interdisciplinary set of topics in artificial intelligence (AI) and applied data science. He is particularly interested in leveraging AI to address challenges across diverse domains, including biology and cybersecurity. Maksim’s work in AI and data science includes tensor decomposition, pattern extraction, natural language processing (NLP), malware characterization, anomaly detection, text mining, large language models (LLMs), knowledge graphs (KGs), high-performance computing (HPC), and data privacy. In addition to research, Maksim actively develops high-performance software and efficient machine learning (ML) pipelines optimized for extra-large datasets and real-world applications. At LANL, Maksim was a member of the 2021 R&D 100 winning project SmartTensors AI, where he released fast tensor decomposition and anomaly detection software, contributed to the design and development of various other tensor decomposition libraries, and developed state-of-the-art text mining tools.

Interests

Artificial Intelligence
Data Science
Tensor Decomposition
Cybersecurity
Natural Language Processing
High Performance Computing
Knowledge Representation
Pattern Extraction

Education

PhD in Computer Science, 2024
University of Maryland, Baltimore County (UMBC)
MS in Computer Science, 2022
University of Maryland, Baltimore County (UMBC)
BS in Computer Science, 2020
University of Maryland, Baltimore County (UMBC)
AA in Computer Science, 2018
Montgomery College (MC)

Featured Publications

Afia Anjum, Maksim E. Eren, Ismael Boureima, Boian S. Alexandrov, Manish Bhattarai

August 2024 In IEEE Conference on Machine Learning and Applications (ICMLA 2024) with Best Paper Award, 2024

Tensor Train Low-rank Approximation (TT-LoRA): Democratizing AI with Accelerated LLMs

In recent years, Large Language Models (LLMs) have demonstrated remarkable capabilities across a wide range of natural language processing (NLP) tasks, such as question-answering, sentiment analysis, text summarization, and machine translation. However, the ever-growing complexity of LLMs demands immense computational resources, hindering the broader research and application of these models. To address this, various parameter-efficient fine-tuning strategies, such as Low-Rank Approximation (LoRA) and Adapters, have been developed. Despite their potential, these methods often face limitations in compressibility. Specifically, LoRA struggles to scale effectively with the increasing number of trainable parameters in modern large-scale LLMs. Additionally, Low-Rank Economic Tensor-Train Adaptation (LoRETTA), which utilizes tensor train decomposition, has not yet achieved the level of compression necessary for fine-tuning very large-scale models with limited resources. This paper introduces Tensor Train Low-Rank Approximation (TT-LoRA), a novel parameter-efficient fine-tuning (PEFT) approach that extends LoRETTA with optimized tensor train (TT) decomposition integration. By eliminating Adapters and traditional LoRA-based structures, TT-LoRA achieves greater model compression without compromising downstream task performance, along with reduced inference latency and computational overhead. We conduct an exhaustive parameter search to establish benchmarks that highlight the trade-off between model compression and performance. Our results demonstrate significant compression of LLMs while maintaining comparable performance to larger models, facilitating their deployment on resource-constrained platforms.
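
As a rough illustration of the compression argument above, the sketch below counts trainable parameters for a two-factor low-rank update and for a tensor-train representation of the same weight update. It is not the paper's implementation: the 768 x 768 layer size, the mode sizes, and the ranks are hypothetical values chosen only to make the arithmetic concrete, and the function names `lora_params` and `tt_params` are invented for this example. The counts say nothing about downstream accuracy.

```python
from math import prod


def lora_params(m, n, r):
    """Parameters in a rank-r update B @ A, with B of shape m x r and A of shape r x n."""
    return r * (m + n)


def tt_params(mode_sizes, ranks):
    """Parameters in a tensor train whose i-th core has shape (ranks[i], mode_sizes[i], ranks[i+1]).

    `ranks` includes the boundary ranks, which must both equal 1.
    """
    if len(ranks) != len(mode_sizes) + 1 or ranks[0] != 1 or ranks[-1] != 1:
        raise ValueError("ranks must have one more entry than mode_sizes and start/end with 1")
    return sum(ranks[i] * n * ranks[i + 1] for i, n in enumerate(mode_sizes))


# Hypothetical 768 x 768 weight update (an assumption, not a figure from the paper).
m = n = 768
modes = [12, 8, 8, 8, 8, 12]           # 12*8*8 = 768 and 8*8*12 = 768
assert prod(modes) == m * n            # the reshaped tensor holds the same entries

full = m * n                           # dense update
lora = lora_params(m, n, r=8)          # two low-rank factors
tt = tt_params(modes, [1, 8, 8, 8, 8, 8, 1])

print(f"dense: {full}, low-rank (r=8): {lora}, tensor train (ranks 8): {tt}")
```

The matrix rank and the tensor-train ranks are not directly comparable quantities, so this is only a parameter count under the stated assumptions, not a claim about which setting performs better.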

Maksim E. Eren, Manish Bhattarai, Robert J. Joyce, Edward Raff, Charles Nicholas, Boian S. Alexandrov

September 2023 In ACM Transactions on Privacy and Security (TOPS) journal, 2023

Semi-supervised Classification of Malware Families Under Extreme Class Imbalance via Hierarchical Non-Negative Matrix Factorization with Automatic Model Selection

Identification of the family to which a malware specimen belongs is essential in understanding the behavior of the malware and developing mitigation strategies. Solutions proposed by prior work, however, are often not practicable due to the lack of realistic evaluation factors. These factors include learning under class imbalance, the ability to identify new malware, and the cost of production-quality labeled data. In practice, deployed models face prominent, rare, and new malware families. At the same time, obtaining a large quantity of up-to-date labeled malware for training a model can be expensive. In this paper, we address these problems and propose a novel hierarchical semi-supervised algorithm, which we call the HNMFk Classifier, that can be used in the early stages of the malware family labeling process. Our method is based on non-negative matrix factorization with automatic model selection, that is, with an estimation of the number of clusters. With HNMFk Classifier, we exploit the hierarchical structure of the malware data together with a semi-supervised setup, which enables us to classify malware families under conditions of extreme class imbalance. Our solution can perform abstaining predictions, or rejection option, which yields promising results in the identification of novel malware families and helps with maintaining the performance of the model when a low quantity of labeled data is used. We perform bulk classification of nearly 2,900 both rare and prominent malware families, through static analysis, using nearly 388,000 samples from the EMBER-2018 corpus. In our experiments, we surpass both supervised and semi-supervised baseline models with an F1 score of 0.80.

Maksim E. Eren, Juston S. Moore, Erik Skau, Elisabeth Moore, Manish Bhattarai, Gopinath Chennupati, Boian S. Alexandrov

February 2022 In ACM Digital Threats Research and Practice (DTRAP) Journal, 2022

General-Purpose Unsupervised Cyber Anomaly Detection via Non-Negative Tensor Factorization

Distinguishing malicious anomalous activities from unusual but benign activities is a fundamental challenge for cyber defenders. Prior studies have shown that statistical user behavior analysis yields accurate detections by learning behavior profiles from observed user activity. These unsupervised models are able to generalize to unseen types of attacks by detecting deviations from normal behavior, without knowledge of specific attack signatures. However, approaches proposed to date based on probabilistic matrix factorization are limited by the information conveyed in a two-dimensional space. Non-negative tensor factorization, on the other hand, is a powerful unsupervised machine learning method that naturally models multi-dimensional data, capturing complex and multi-faceted details of behavior profiles. Our new unsupervised statistical anomaly detection methodology matches or surpasses state-of-the-art supervised learning baselines across several challenging and diverse cyber application areas, including detection of compromised user credentials, botnets, spam e-mails, and fraudulent credit card transactions.

News

Using AI to develop enhanced cybersecurity measures

New research helps identify an unprecedented number of malware families

Last updated on Apr 14, 2025

Not too big - Machine learning tames huge datasets

Using the Summit supercomputer, Los Alamos algorithm breaks the exabyte barrier.

Last updated on Apr 14, 2025

Our paper that sets a new world record

A new world record by simultaneously classifying an unprecedented number of malware families under extreme class imbalance, surpassing prior work by a factor of 29

Last updated on Sep 27, 2023

R&D 100 winner of the day - SmartTensors AI Platform

The SmartTensors AI Platform, developed at Los Alamos National Laboratory, is a scalable, unsupervised machine-learning software suite capable of identifying and extracting essential hidden features and efficiently compressing information in massive datasets.

Last updated on Sep 28, 2023

Computer scientists build new tool to fight coronavirus

There are over 57,000 research papers that could help fight COVID-19. A text mining tool is helping scientists navigate the data.

Last updated on Sep 28, 2023

Recent Publications

Ryan Barron, Maksim E. Eren, Olga M. Serafimova, Cynthia Matuszek, Boian S. Alexandrov (2025). Bridging Legal Knowledge and AI: Retrieval-Augmented Generation with Vector Stores, Knowledge Graphs, and Hierarchical Non-negative Matrix Factorization. In 20th International Conference on Artificial Intelligence and Law (ICAIL), 2025.

Manish Bhattarai, Ryan Barron, Maksim E. Eren, Minh Vu, Vesselin Grantcharov, Ismael Boureima, Valentin Stanev, Cynthia Matuszek, Vladimir Valtchinov, Kim Rasmussen, Boian Alexandrov (2025). HEAL: Hierarchical Embedding Alignment Loss for Improved Retrieval and Representation Learning. In 13th International Conference on Learning Representations, Workshop on Scaling Self-Improving Foundation Models without Human Supervision (ICLR 2025 SSI-FM).

Maksim E. Eren, Boian S. Alexandrov, Charles Nicholas (2024). Classifying Malware Using Tensor Decomposition. Chapter in the Springer Nature book Malware: Handbook of Prevention and Detection, 2024.

Ryan Barron, Vesselin Grantcharov, Selma Wanna, Maksim E. Eren, Manish Bhattarai, Nicholas Solovyev, George Tompkins, Charles Nicholas, Kim O. Rasmussen, Cynthia Matuszek, Boian S. Alexandrov (2024). Domain-Specific Retrieval-Augmented Generation Using Vector Stores, Knowledge Graphs, and Tensor Factorization. In IEEE Conference on Machine Learning and Applications, Special Session on Machine Learning for Natural Language Processing (ICMLA 2024).

Maksim E. Eren (2024). Advanced Semi-supervised Tensor Decomposition Methods for Malware Characterization. Ph.D. Dissertation in Computer Science at the University of Maryland, Baltimore County Department of Computer Science and Electrical Engineering.

Afia Anjum, Maksim E. Eren, Ismael Boureima, Boian S. Alexandrov, Manish Bhattarai (2024). Tensor Train Low-rank Approximation (TT-LoRA): Democratizing AI with Accelerated LLMs. In IEEE Conference on Machine Learning and Applications (ICMLA 2024) with Best Paper Award, 2024.

Ryan Barron, Maksim E. Eren, Manish Bhattarai, Ismael Boureima, Cynthia Matuszek, Boian S. Alexandrov (2024). Binary Bleed: Fast Distributed and Parallel Method for Automatic Model Selection. In the IEEE High Performance Extreme Computing (HPEC) Conference, 2024.

Selma Wanna, Nick Solovyev, Ryan Barron, Maksim E. Eren, Manish Bhattarai, Kim Rasmussen, Boian S. Alexandrov (2024). TopicTag: Automatic Annotation of NMF Topic Models Using Chain of Thought and Prompt Tuning with LLMs. In ACM Symposium on Document Engineering 2024 (DocEng ’24), 2024.

Ryan Barron, Maksim E. Eren, Manish Bhattarai, Selma Wanna, Nicholas Solovyev, Kim Rasmussen, Boian S. Alexandrov, Charles Nicholas, Cynthia Matuszek (2024). Cyber-Security Knowledge Graph Generation by Hierarchical Nonnegative Matrix Factorization. In IEEE 12th International Symposium on Digital Forensics and Security (ISDFS), 2024.

Maksim E. Eren, Ryan Barron, Manish Bhattarai, Selma Wanna, Nicholas Solovyev, Kim Rasmussen, Boian S. Alexandrov, Charles Nicholas (2024). Catch'em all: Classification of Rare, Prominent, and Novel Malware Families. In IEEE 12th International Symposium on Digital Forensics and Security (ISDFS), 2024.

Software

lanl/T-ELF

Tensor Extraction of Latent Features (T-ELF) is one of the machine learning software packages developed as part of the R&D 100 winning SmartTensors AI project at Los Alamos National Laboratory (LANL). T-ELF presents an array of customizable software solutions crafted for analysis of datasets.

pyCP_ALS

pyCP_ALS is a Python implementation of the CP-ALS algorithm that was originally introduced in the MATLAB Tensor Toolbox.

RFoT

Random Forest of Tensors (RFoT) is a novel ensemble semi-supervised classification algorithm based on tensor decomposition. We show the capabilities of RFoT when classifying Windows Portable Executable (PE) malware and benign-ware.

lanl/pyDNTNK

pyDNTNK is a software package for applying non-negative hierarchical tensor decompositions, such as tensor train and hierarchical Tucker decompositions, in a distributed fashion to large datasets. It is built on top of pyDNMFk.

lanl/pyQBTNs

pyQBTNs is a Python library for Boolean matrix and tensor factorization using D-Wave quantum annealers.

lanl/pyCP_APR

pyCP_APR is a Python library for tensor decomposition and anomaly detection that was developed as part of the R&D 100 award-winning SmartTensors project. It is designed for fast analysis of large datasets, using GPUs to accelerate computation.

lanl/pyDNMFk

pyDNMFk is a software package for applying non-negative matrix factorization in a distributed fashion to large datasets. It can minimize the difference between the reconstructed data and the original data under several objectives (Frobenius norm, KL-divergence).
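
For readers unfamiliar with the idea, the following is a minimal single-machine sketch of non-negative matrix factorization under the Frobenius objective, using the classic multiplicative update rules. It does not use pyDNMFk and is not its API or its distributed algorithm; the function names (`nmf`, `frobenius_error`), the toy matrix, and the chosen rank are assumptions made for illustration only.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols_B = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols_B] for row in A]


def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def frobenius_error(X, W, H):
    """Frobenius norm of X - W @ H."""
    WH = matmul(W, H)
    return sum((x - y) ** 2 for rx, ry in zip(X, WH) for x, y in zip(rx, ry)) ** 0.5


def nmf(X, k, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix X (n x m) into W (n x k) and H (k x m)."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    errors = [frobenius_error(X, W, H)]
    for _ in range(n_iter):
        # H <- H * (W^T X) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # W <- W * (X H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
        errors.append(frobenius_error(X, W, H))
    return W, H, errors


# Toy non-negative data (hypothetical); rows 2 and 4 are combinations of rows 1 and 3.
X = [[1, 2, 3], [2, 4, 6], [1, 0, 1], [3, 2, 5]]
W, H, errors = nmf(X, k=2)
print(f"reconstruction error: {errors[0]:.3f} -> {errors[-1]:.3f}")
```

Because the updates only multiply by non-negative ratios, W and H stay non-negative throughout. Choosing the rank k is left to the caller here.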

lanl/pyDRESCALk

pyDRESCALk is a software package for applying non-negative RESCAL decomposition in a distributed fashion to large datasets. It can be utilized for decomposing relational datasets.