Self-Healing Infrastructure: Autonomous LLM Agents for Real-Time Remediation of Configuration Drift and Security Misconfigurations in IaC Deployments
Harish Apuri¹, Madhan Mohan Reddy Chinthala², Shikher Goel³, Mukesh Aurangabadkar⁴, Charani Yepuri⁵

¹Harish Apuri, Department of IT, IT Induct Inc, Charlotte (NC), United States of America (USA).

²Madhan Mohan Reddy Chinthala, Department of IT, Franklin Info Tech, Charlotte (NC), United States of America (USA).

³Shikher Goel, Department of IT, JPMorgan Chase, Jersey (New Jersey), United States of America (USA).

⁴Mukesh Aurangabadkar, Department of IT, Spectrum, Denver (Colorado), United States of America (USA).

⁵Charani Yepuri, Independent Researcher, Department of IT, Hyderabad (Telangana), India.

© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: The adoption of Infrastructure as Code (IaC) has improved the scalability and automation of cloud environments, but configuration drift and security misconfigurations remain critical operational and security issues. Current drift detection and remediation solutions rely largely on reactive, rule-based mechanisms and human intervention, which makes them ineffective in dynamic, multi-cloud environments. This research aims to develop and deploy a self-healing infrastructure architecture that autonomously detects and recovers from configuration drift and security misconfigurations in real time. To accomplish this, the paper proposes a new multi-agent architecture based on Large Language Models (LLMs), in which drift detectors, security reasoners, root-cause analysers, remediation generators, and post-remediation validators operate within a closed-loop pipeline. The framework is evaluated on a publicly available Terraform IaC dataset containing simulated drift scenarios. Experimental results show that the proposed LLM-agent system outperforms rule-based and semi-automated systems, achieving a drift detection rate of 96.8%, a security misconfiguration detection rate of 95.2%, and a mean time to remediation (MTTR) of 6.9 minutes. The framework also substantially reduces false positives and manual intervention while achieving high policy compliance. These findings support the usefulness of autonomous LLM agents for enabling proactive, intelligent, and scalable self-healing infrastructure management in contemporary cloud systems.
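The abstract names five agents arranged in a closed loop but gives no interface details. The following is a minimal Python sketch of how such a loop could be wired, assuming a hypothetical flat state model (resource address mapped to an attribute map) and replacing each LLM agent with a deterministic stand-in function. The function names, the `SECURITY_RULES` table, the `Finding` record, and the convergence bound `max_rounds` are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical state model: resource address -> flat attribute map.
State = dict[str, dict[str, Any]]

# Hypothetical security policy: attribute -> (insecure value, secure value).
SECURITY_RULES = {"acl": ("public-read", "private"), "encrypted": (False, True)}


@dataclass
class Finding:
    address: str
    attribute: str
    target: Any    # value the remediation should converge to
    kind: str      # "drift" or "security"
    cause: str = ""


def detect_drift(desired: State, actual: State) -> list[Finding]:
    """Drift detector stand-in: attribute-level diff of declared vs. observed state."""
    return [
        Finding(addr, attr, want, "drift")
        for addr, attrs in desired.items()
        for attr, want in attrs.items()
        if actual.get(addr, {}).get(attr) != want
    ]


def reason_security(actual: State) -> list[Finding]:
    """Security reasoner stand-in: flags attributes set to a known-insecure value."""
    return [
        Finding(addr, attr, SECURITY_RULES[attr][1], "security")
        for addr, attrs in actual.items()
        for attr, value in attrs.items()
        if attr in SECURITY_RULES and value == SECURITY_RULES[attr][0]
    ]


def analyse_root_cause(findings: list[Finding]) -> list[Finding]:
    """Root-cause analyser stand-in: labels each finding with a coarse cause."""
    for f in findings:
        f.cause = "out-of-band change" if f.kind == "drift" else "insecure setting"
    return findings


def generate_remediation(findings: list[Finding]) -> State:
    """Remediation generator stand-in: builds a patch; later findings overwrite
    earlier ones, so security fixes (listed last) take precedence over drift reverts."""
    patch: State = {}
    for f in findings:
        patch.setdefault(f.address, {})[f.attribute] = f.target
    return patch


def apply_patch(desired: State, actual: State, patch: State) -> None:
    """Stand-in for committing the IaC change and applying it to the environment."""
    for addr, attrs in patch.items():
        desired.setdefault(addr, {}).update(attrs)
        actual.setdefault(addr, {}).update(attrs)


def validate(desired: State, actual: State) -> bool:
    """Post-remediation validator stand-in: no remaining drift or insecure settings."""
    return not detect_drift(desired, actual) and not reason_security(actual)


def self_heal(desired: State, actual: State, max_rounds: int = 3) -> int:
    """Closed loop: detect, reason, analyse, remediate, validate. Returns rounds used."""
    for round_no in range(1, max_rounds + 1):
        findings = analyse_root_cause(detect_drift(desired, actual) + reason_security(actual))
        if not findings:
            return round_no - 1
        apply_patch(desired, actual, generate_remediation(findings))
        if validate(desired, actual):
            return round_no
    raise RuntimeError("remediation did not converge; escalate to a human operator")


if __name__ == "__main__":
    desired = {"aws_s3_bucket.logs": {"acl": "private", "encrypted": True}}
    actual = {"aws_s3_bucket.logs": {"acl": "public-read", "encrypted": True}}
    print(self_heal(desired, actual), actual)
    # 1 {'aws_s3_bucket.logs': {'acl': 'private', 'encrypted': True}}
```

In this sketch a security fix is written into the desired state as well as the live state, mirroring a change to the Terraform code rather than a one-off console edit, and the validator re-runs both detectors so that a remediation which reintroduces a misconfiguration triggers another round. In the architecture the paper describes, the detection, reasoning, and generation steps would be performed by LLM agents; only the control flow is illustrated here.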

Keywords: Self-Healing Infrastructure, Infrastructure as Code (IaC), Large Language Models (LLMs), Configuration Drift Remediation, Cloud Security Automation
Scope of the Article: Computer Science and Engineering

D475715040426