Matrix Factorization for Inferring Associations and Missing Links

Abstract

Missing link prediction is a method for network analysis, with applications in recommender systems, biology, social sciences, cybersecurity, information retrieval, and Artificial Intelligence (AI) reasoning in Knowledge Graphs. It identifies unseen but potentially existing connections in a network by analyzing the observed patterns and relationships. In proliferation detection, this supports efforts to identify and characterize attempts by state and non-state actors to acquire nuclear weapons or associated technology, a notoriously challenging but vital mission for global security. Dimensionality reduction techniques like Non-Negative Matrix Factorization (NMF) and Logistic Matrix Factorization (LMF) are effective but require selecting the matrix rank, that is, the number of hidden features k, to avoid over- or under-fitting. We introduce novel Weighted (WNMFk), Boolean (BNMFk), and Recommender (RNMFk) matrix factorization methods, along with ensemble variants incorporating logistic factorization, for link prediction. Our methods integrate automatic model determination for rank estimation, evaluating stability and accuracy with a modified bootstrap methodology and uncertainty quantification (UQ) that assesses prediction reliability under random perturbations. We incorporate Otsu threshold selection and k-means clustering for Boolean matrix factorization and compare them to coordinate descent-based Boolean thresholding. Our experiments highlight the impact of selecting the rank k, evaluate model performance under varying test-set sizes, and demonstrate the benefits of UQ for reliable predictions using abstention. We validate our methods on three synthetic datasets (Boolean and uniformly distributed) and benchmark them against LMF and symmetric LMF (symLMF) on five real-world protein-protein interaction networks, showing improved prediction performance.
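The core idea in the abstract, fitting a low-rank nonnegative model to the observed entries only and then reading predicted links off the reconstruction, can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' WNMFk or BNMFk: it uses generic masked multiplicative updates at a fixed rank k, omits the bootstrap-based rank selection, uncertainty quantification, and abstention, and binarizes scores with a plain Otsu threshold. The function names `masked_nmf` and `otsu_threshold`, the synthetic data, and all parameter values are hypothetical.

```python
import numpy as np


def masked_nmf(X, M, k, n_iter=2000, eps=1e-10, seed=0):
    """Hypothetical helper: generic weighted (masked) NMF, not the paper's WNMFk.

    Minimizes ||M * (X - W @ H)||_F^2 with W, H >= 0 using Lee-Seung-style
    multiplicative updates. M is a 0/1 mask (1 = observed, 0 = missing or
    held out), so unobserved entries of X never influence the fit.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    MX = M * X  # observed data with unobserved entries zeroed out
    for _ in range(n_iter):
        W *= (MX @ H.T) / ((M * (W @ H)) @ H.T + eps)
        H *= (W.T @ MX) / (W.T @ (M * (W @ H)) + eps)
    return W, H


def otsu_threshold(scores, n_bins=256):
    """Hypothetical helper: Otsu's method, the histogram cut maximizing between-class variance."""
    scores = np.asarray(scores, dtype=float).ravel()
    hist, edges = np.histogram(scores, bins=n_bins)
    centers = (edges[:-1] + edges[1:]) / 2
    p = hist / hist.sum()
    omega = np.cumsum(p)         # probability mass of the "below" class for each cut
    mu = np.cumsum(p * centers)  # unnormalized mean of the "below" class
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = np.where(denom > 0, (mu_total * omega - mu) ** 2 / denom, 0.0)
    # Bin i covers [edges[i], edges[i+1]), so the best cut sits at edges[i + 1].
    return edges[np.argmax(sigma_b) + 1]


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Synthetic Boolean network (assumption): OR of k = 3 random "communities".
    A = rng.random((30, 3)) < 0.3
    B = rng.random((3, 20)) < 0.3
    X = (A.astype(int) @ B.astype(int) > 0).astype(float)
    M = (rng.random(X.shape) > 0.2).astype(float)  # hide about 20% of entries
    W, H = masked_nmf(X, M, k=3)
    scores = W @ H
    t = otsu_threshold(scores[M == 1])  # pick the cut from observed entries only
    pred = scores >= t                  # predicted links, including hidden entries
    hidden = M == 0
    print("threshold:", round(float(t), 3))
    print("accuracy on hidden entries:", (pred[hidden] == (X[hidden] == 1)).mean())
```

Choosing the Otsu cut from observed entries only keeps the held-out links out of the threshold selection. In a fuller pipeline, k would come from a rank-selection step such as the stability analysis the abstract describes rather than being fixed in advance.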

Publication
Under review at IEEE Access, 2026

Keywords:

AI reasoning, missing link prediction, network analysis, matrix factorization, Boolean, data completion

Citation:

Ryan Barron, Maksim Ekin Eren, Duc Phuc Truong, Cynthia Matuszek, James Wendelberger, Mary Frances Dorn, and Boian Alexandrov. 2025. Matrix Factorization for Inferring Associations and Missing Links. arXiv:2503.04680.

BibTeX:

@article{Barron2025MatrixFF,
  title={Matrix Factorization for Inferring Associations and Missing Links},
  author={Ryan Barron and Maksim Ekin Eren and Duc Phuc Truong and Cynthia Matuszek and James Wendelberger and Mary Frances Dorn and Boian Alexandrov},
  journal={ArXiv},
  year={2025},
  volume={abs/2503.04680},
  url={https://api.semanticscholar.org/CorpusID:276813355}
}
Maksim E. Eren
Scientist

Maksim E. Eren is a Scientist at Los Alamos National Laboratory, specializing in machine learning and artificial intelligence for large-scale data science applications.