
Automated Vulnerability Validation and Verification: A Large Language Model Approach

By Alireza Lotfi, Charalampos Katsis, Elisa Bertino

DOI https://doi.org/10.48550/arXiv.2509.24037

Abstract

Software vulnerabilities remain a critical security challenge, providing entry points for attackers into enterprise networks. Despite advances in security practices, the lack of high-quality datasets capturing diverse exploit behavior limits effective vulnerability assessment and mitigation. This paper introduces an end-to-end multi-step pipeline leveraging generative AI, specifically large language models (LLMs), to address the challenges of orchestrating and reproducing attacks on known software vulnerabilities. Our approach extracts information from CVE disclosures in the National Vulnerability Database, augments it with external public knowledge (e.g., threat advisories, code snippets) using Retrieval-Augmented Generation (RAG), and automates the creation of containerized environments and exploit code for each vulnerability. The pipeline iteratively refines generated artifacts, validates attack success with test cases, and supports complex multi-container setups. Our methodology overcomes key obstacles, including noisy and incomplete vulnerability descriptions, by integrating LLMs and RAG to fill information gaps. We demonstrate the effectiveness of our pipeline across different vulnerability types, such as memory overflows, denial of service, and remote code execution, spanning diverse programming languages, libraries, and years. In doing so, we uncover significant inconsistencies in CVE descriptions, emphasizing the need for more rigorous verification in the CVE disclosure process. Our approach is model-agnostic, working across multiple LLMs, and we open-source the artifacts to enable reproducibility and accelerate security research. To the best of our knowledge, this is the first system to systematically orchestrate and exploit known vulnerabilities in containerized environments by combining general-purpose LLM reasoning with CVE data and RAG-based context enrichment.

Introduction

Software vulnerabilities have been exploited in high-profile cyberattacks, leading to significant security breaches. For instance, the Clop ransomware attack and the flaws in Ivanti VPN show how easily attackers can capitalize on unaddressed vulnerabilities. Although many vulnerabilities are disclosed every month, assessing their potential for exploitation is hampered by a lack of detailed information on how they behave when exploited. The paper therefore proposes automating the reproduction of attacks on known software vulnerabilities to deepen understanding and improve defenses.

Problem Scope

The authors focus on creating automated methods for reproducing known vulnerabilities (CVE entries). Their pipeline aims to:

Generate containerized environments to safely execute attacks.
Automate the setup of these environments, including necessary software components.
Create exploitation code for actual attack execution.
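
To make the first two goals concrete, here is a minimal sketch of turning a structured environment description into Dockerfile text. The `EnvSpec` type, its fields, and `render_dockerfile` are hypothetical names introduced for illustration only; the paper's actual generation step is LLM-driven and its internal format is not described in this summary. The sketch also assumes a Debian-style base image with `apt-get`.

```python
import json
from dataclasses import dataclass, field


@dataclass
class EnvSpec:
    """Hypothetical description of the environment one CVE needs."""
    base_image: str
    packages: list[str] = field(default_factory=list)
    start_command: list[str] = field(default_factory=list)


def render_dockerfile(spec: EnvSpec) -> str:
    """Render Dockerfile text from a spec (assumes an apt-based image)."""
    lines = [f"FROM {spec.base_image}"]
    if spec.packages:
        # Sort so the same spec always yields the same Dockerfile.
        pkgs = " ".join(sorted(spec.packages))
        lines.append(
            "RUN apt-get update && apt-get install -y --no-install-recommends "
            f"{pkgs} && rm -rf /var/lib/apt/lists/*"
        )
    if spec.start_command:
        # Exec-form CMD is a JSON array of strings.
        lines.append(f"CMD {json.dumps(spec.start_command)}")
    return "\n".join(lines) + "\n"
```

Keeping the rendering deterministic makes it easier to compare one generated environment with the next when an attempt has to be repeated.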

Challenges

Several challenges obstruct progress in this area:

Vulnerability descriptions are often unclear and inconsistent.
Disclosures frequently lack details on how exploits function.
Public exploit code is unavailable for many vulnerabilities.

Proposed Approach

The proposed approach uses LLMs in a structured multi-step pipeline that analyzes CVE disclosures, extracts critical information, and generates vulnerable environments and exploit code. The methodology also incorporates an iterative refinement process that improves the generated artifacts based on the results of previous attempts. The system operates in containerized environments to ensure safe and reproducible testing.
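
The iterative refinement described above can be sketched as a generate-validate loop. This is an illustrative outline under stated assumptions, not the authors' implementation: `generate` stands in for an LLM call, `validate` stands in for building the container and running the test case, and the names `Attempt`, `refine`, and `max_rounds` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Attempt:
    artifact: str   # generated Dockerfile or exploit code
    passed: bool    # whether the validation step succeeded
    feedback: str   # output of the validation step (e.g. error text)


def refine(
    generate: Callable[[str], str],
    validate: Callable[[str], tuple[bool, str]],
    max_rounds: int = 3,
) -> list[Attempt]:
    """Regenerate an artifact until validation passes or rounds run out."""
    history: list[Attempt] = []
    feedback = ""  # the first round has no prior failure to learn from
    for _ in range(max_rounds):
        artifact = generate(feedback)
        passed, feedback = validate(artifact)
        history.append(Attempt(artifact, passed, feedback))
        if passed:
            break
    return history
```

Bounding the loop with `max_rounds` is a design choice of this sketch: it keeps a vulnerability that cannot be reproduced from consuming unlimited model calls.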

Key Findings

Pipeline Effectiveness: The pipeline was tested on 102 CVEs spanning multiple programming languages and libraries, successfully reproducing 71 (approximately 70%) of them. This includes vulnerabilities that had no public proofs of concept available.
Issues with CVE Descriptions: The study highlights substantial inconsistencies in CVE descriptions. Because successful attack reproduction depends on the quality of the information in these disclosures, better and more standardized reporting would benefit security researchers and practitioners.
Integration of External Knowledge: By implementing Retrieval-Augmented Generation (RAG), the pipeline enriches its understanding beyond the raw CVE data, enhancing the context from which attack vectors can be derived.
Containerization: The use of Docker containers allows for the creation of isolated environments needed to test vulnerabilities without the risks associated with running tests on live systems. This reduces the complexity of reproducing multi-step attacks.
Open-Source Contribution: The authors have made their pipeline and generated artifacts openly available to encourage further research and reproducibility in vulnerability exploitation studies.
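
The RAG step mentioned in the findings above can be illustrated with a small retrieval sketch. Everything here is an assumption made for illustration: this summary does not say how the paper retrieves documents, so the example swaps in a simple Jaccard word-overlap score where a real system would more likely use embedding similarity. The function names `tokenize`, `retrieve`, and `build_prompt` are hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return up to k documents ranked by Jaccard overlap with the query."""
    q = tokenize(query)
    scored = []
    for doc in documents:
        d = tokenize(doc)
        union = q | d
        score = len(q & d) / len(union) if union else 0.0
        scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop documents that share no tokens with the query.
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(cve_description: str, documents: list[str], k: int = 2) -> str:
    """Attach the retrieved context to the CVE description."""
    context = retrieve(cve_description, documents, k)
    parts = ["CVE description:", cve_description, "", "Retrieved context:"]
    parts.extend(f"- {doc}" for doc in context)
    return "\n".join(parts)
```

The resulting prompt pairs the raw CVE text with the most relevant external snippets, which is the general idea behind enriching sparse disclosures before asking a model to generate an environment or exploit.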

Conclusion

The authors conclude that their pipeline addresses many of the challenges in reproducing known software vulnerabilities and provides a robust framework for automating vulnerability validation and verification. The findings underline the importance of improving the quality of CVE disclosures and suggest areas for future work that could extend the pipeline to more complex scenarios, such as multi-step attacks.

Future directions include integrating concrete attack information into CVE reports and developing tailored exploits for proprietary systems. The study emphasizes that greater rigor in vulnerability documentation, combined with refined detection methods, can significantly strengthen software security efforts.
