Apr 7, 2026
How We Built an AI Agent That Fixes CI/CD Pipeline Failures Automatically
A deep dive into how we built an autonomous AI agent that detects and fixes CI/CD pipeline failures without human intervention.
Engineering teams spend between 15% and 25% of their development time responding to CI/CD pipeline failures. This figure represents hours that do not go toward product work, architecture, or anything a team ships. The cost compounds once context-switching is factored in: Microsoft's Developer Productivity research found that each interruption to debug a build failure costs an average of 23 minutes of recovery time. Multiply that across a team and a sprint, and the number becomes an operational liability.
What the AI Agent Does
The system is a stateful agentic remediation system. When a CI/CD pipeline fails, it detects the failure, diagnoses the root cause using AI, generates a targeted code fix, and opens a pull request—all without requiring a developer to act. The fix is then validated against the same CI pipeline, running on GitHub runners, that surfaced the original failure.
Architecture Overview
The system runs as a distributed, event-driven architecture with three separate layers: Detection, Reasoning, and Orchestration. The entire codebase lives in a single Nx monorepo.
The Tech Stack
The backend runs on NestJS with TypeScript at maximum strictness. Data persistence uses Drizzle ORM against PostgreSQL, extended with pgvector for embedding-based semantic search. Redis powers both the caching layer and the job queue. The AI layer routes through OpenRouter to Claude 3.5 Sonnet, using LangChain.js for structured prompting and LangGraph for stateful agent execution.
How it Works: End-to-End
1. Detection
GitHub sends a webhook event to the controller on pipeline failure. All processing happens asynchronously via BullMQ.
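A minimal sketch of the detection gate. The `WorkflowRunEvent` shape and `shouldEnqueue` name are hypothetical, not the system's actual types; the idea is simply that only completed runs with a failing conclusion are handed to the queue:

```typescript
// Hypothetical subset of GitHub's workflow_run webhook payload.
interface WorkflowRunEvent {
  action: string;                // e.g. "completed"
  workflow_run: {
    conclusion: string | null;   // "success" | "failure" | ...
    head_branch: string;
  };
}

// Decide whether an incoming webhook event should be enqueued for
// remediation: only completed runs that actually failed qualify.
function shouldEnqueue(event: WorkflowRunEvent): boolean {
  return (
    event.action === "completed" &&
    event.workflow_run.conclusion === "failure"
  );
}
```

In the real system the qualifying event would then be pushed onto a BullMQ queue so the webhook handler can return immediately.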
2. Log Parsing
The agent strips noise (ANSI codes/timestamps) and isolates the specific TypeScript or build errors. It enriches these with source code snippets fetched directly from the GitHub commit.
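The noise-stripping step can be sketched as a pure function. This is an illustration under assumptions (the exact regexes and the `extractTsErrors` name are ours), not the production parser:

```typescript
// Remove ANSI colour codes and leading ISO timestamps, then keep only
// lines that look like TypeScript compiler errors, e.g.
// "src/foo.ts(12,5): error TS2345: ...".
const ANSI = /\x1b\[[0-9;]*m/g;
const TIMESTAMP = /^\d{4}-\d{2}-\d{2}T[\d:.]+Z\s*/;

function extractTsErrors(rawLog: string): string[] {
  return rawLog
    .split("\n")
    .map((line) => line.replace(ANSI, "").replace(TIMESTAMP, ""))
    .filter((line) => /error TS\d+:/.test(line));
}
```

The surviving error lines are what get enriched with source snippets from the failing commit.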
3. Semantic Search
Every past fix is stored in PostgreSQL with vector embeddings. The system performs a similarity search to see if a similar problem was solved before, improving accuracy and reducing token usage.
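The ranking pgvector performs server-side (its `<=>` operator returns cosine distance) boils down to cosine similarity. A minimal in-memory sketch of that ranking, with a hypothetical `bestMatch` threshold, for illustration only:

```typescript
// Cosine similarity between an error embedding and a stored fix embedding.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the most similar past fix, or null if nothing clears the
// threshold (0.8 here is an assumed cutoff, not the system's value).
function bestMatch(
  query: number[],
  fixes: { id: string; embedding: number[] }[],
  threshold = 0.8,
): string | null {
  let best: { id: string; score: number } | null = null;
  for (const f of fixes) {
    const score = cosineSimilarity(query, f.embedding);
    if (!best || score > best.score) best = { id: f.id, score };
  }
  return best && best.score >= threshold ? best.id : null;
}
```

A strong match lets the agent reuse a proven fix pattern instead of reasoning from scratch, which is where the accuracy and token savings come from.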
4. AI Diagnosis
An error classifier categorizes the failure (e.g., syntax, dependency). The agent generates a structured JSON fix with a confidence score.
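The structured output can be modeled as a plain interface with a confidence gate. The field names, categories, and 0.7 cutoff below are assumptions for the sketch, not the system's actual schema:

```typescript
// Hypothetical shape of the structured fix the model returns.
interface ProposedFix {
  category: "syntax" | "dependency" | "config" | "test";
  file: string;
  patch: string;
  confidence: number; // 0..1, model-reported
}

// Gate on confidence before committing anything: low-confidence or
// empty fixes are dropped rather than pushed as a PR.
function acceptFix(fix: ProposedFix, minConfidence = 0.7): boolean {
  return fix.confidence >= minConfidence && fix.patch.trim().length > 0;
}
```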
5. Fix & Validate
The agent commits changes and opens a PR. If the pipeline passes, it’s ready for review. If it fails, the agent captures the new logs and retries with an adjusted strategy (capped at three attempts).
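The capped retry loop above can be sketched as follows; `attemptFix` stands in for the full diagnose, patch, and re-run-pipeline cycle (a hypothetical signature, synchronous here for clarity):

```typescript
// Try up to maxAttempts times; stop early on the first passing pipeline.
function remediate(
  attemptFix: (attempt: number) => boolean,
  maxAttempts = 3,
): { fixed: boolean; attempts: number } {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (attemptFix(attempt)) return { fixed: true, attempts: attempt };
    // On failure, the real agent captures the new logs here and
    // adjusts its strategy before the next attempt.
  }
  return { fixed: false, attempts: maxAttempts };
}
```

The hard cap is what keeps a stubborn failure from burning tokens indefinitely; after three misses, the problem is escalated to a human.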
Safety and Security
The system operates on the principle of least privilege:
- Write access is restricted to temporary branches; no direct access to main.
- It never auto-merges; a human reviewer must approve every PR.
- Loop prevention ensures the agent never attempts to fix its own generated branches.
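The loop-prevention rule reduces to a branch-name check. The `ai-fix/` prefix below is a hypothetical naming convention used only for this sketch:

```typescript
// Branches the agent creates carry a recognizable prefix (assumed here).
const AGENT_BRANCH_PREFIX = "ai-fix/";

// Never react to failures on the agent's own branches: a failed fix
// attempt must not trigger another fix attempt on top of it.
function shouldRemediateBranch(branch: string): boolean {
  return !branch.startsWith(AGENT_BRANCH_PREFIX);
}
```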
The Dashboard
The Next.js frontend provides a single visibility layer for the entire system. On landing, it displays all connected repositories. Drilling into a repository reveals its branches; drilling into a branch shows individual commits with their pipeline statuses: passed, failed, in progress, or under repair. For each pipeline run, the dashboard shows the exact changes the agent made. Engineering teams gain full transparency without switching between tools or parsing logs.
Results
| Metric | Without an AI Agent | With an AI Agent |
|---|---|---|
| Mean Time to Recovery | 30 – 60 minutes | 3 minutes |
| Cost per Incident | $150 (developer time) | $0.05 (tokens) |
| Developer Interruptions | High | None |
| Night / Weekend Failures | Block releases | Auto-resolved |
What Comes Next
The roadmap addresses several key areas: converting the system into a platform any team can adopt with one click, real-time pipeline status surfacing, cross-repository learning, and multi-language support (Python, Go, Java, Rust).