Agentic Ops

Self-Healing CI/CD with AI Agents

Part of a bigger experiment: how much of running a company can you loop through AI agents? This piece is about the CI/CD pipeline. When a build breaks, a coding agent reads the logs, writes a fix, and opens a PR. No human required.

The experiment

The bigger picture

I've been running Far Horizons as a one-person company for five years. The question I keep coming back to is: what if I could hand off the boring operational loops to AI agents? Not the interesting decisions, but the stuff that breaks at 2am and needs someone to look at logs and push a fix.

This is one piece of that. The self-healing pipeline. Most CI/CD pipelines end at "send a Slack notification." You read the alert, open the logs, figure out the issue, write a fix, push it, wait for CI again. That loop takes minutes to hours depending on how deep you are in something else when the notification comes in.

So I wired up a webhook that fires when post-deploy checks fail. That webhook triggers Claude Code, which has access to the GitHub Actions logs and the repo. It diagnoses the issue, writes a fix, and opens a PR with auto-merge. When CI passes, it merges. The loop closes itself.
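A rough sketch of the receiving side, assuming the webhook body carries nothing but the run URL. The `FailurePayload` shape and the `parseRunUrl` helper are hypothetical names for illustration. The URL pattern is the standard GitHub Actions run URL.

```typescript
// Hypothetical payload shape: the article only says the run URL is sent.
interface FailurePayload {
  runUrl: string;
}

interface RunRef {
  owner: string;
  repo: string;
  runId: number;
}

// Parse a GitHub Actions run URL of the form
// https://github.com/<owner>/<repo>/actions/runs/<run_id>
// Returns null for anything that does not look like a run URL.
function parseRunUrl(runUrl: string): RunRef | null {
  const pattern =
    /^https:\/\/github\.com\/([^\/]+)\/([^\/]+)\/actions\/runs\/(\d+)(?:[\/?#].*)?$/;
  const match = pattern.exec(runUrl);
  if (match === null) return null;
  return { owner: match[1], repo: match[2], runId: Number(match[3]) };
}

// Turn an incoming payload into something the agent can look up logs for.
function handleFailure(payload: FailurePayload): RunRef {
  const ref = parseRunUrl(payload.runUrl);
  if (ref === null) {
    throw new Error(`Not a GitHub Actions run URL: ${payload.runUrl}`);
  }
  return ref;
}
```

Rejecting anything that isn't a run URL up front keeps a malformed or spoofed payload from ever reaching the agent.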

It's not foolproof. It only works because I own the full stack and there's nobody else merging code at the same time. But when it works, it's pretty wild. Push, break, fix, merge, all while I'm making coffee.

How it works

1. Push triggers CI

GitHub Actions builds, tests, and deploys to Cloudflare Workers.

2. Post-deploy checks run

Health checks and Playwright E2E tests validate the deployment.

3. Failures fire a webhook

Any check failure sends the run URL to a coding agent.

4. Agent diagnoses and fixes

Claude Code reads the logs, identifies the issue, writes a fix, and opens a PR with auto-merge.

5. CI re-triggers

The merged PR starts the loop again. Build, deploy, validate.
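The five steps above can be read as a small state machine. The sketch below is only a model of the loop, not the actual workflow definition. The stage names follow the pipeline's BUILD, DEPLOY, VALIDATE, ALERT, HEAL labels, and sending every failure to the alert stage is an assumption.

```typescript
type Stage = "build" | "deploy" | "validate" | "alert" | "heal" | "done";
type Outcome = "pass" | "fail";

// Transition table for the loop. Assumption: any failing stage fires the
// webhook ("alert"); a failed alert or a failed heal stops and waits for a human.
function nextStage(stage: Stage, outcome: Outcome): Stage {
  switch (stage) {
    case "build":
      return outcome === "pass" ? "deploy" : "alert";
    case "deploy":
      return outcome === "pass" ? "validate" : "alert";
    case "validate":
      return outcome === "pass" ? "done" : "alert";
    case "alert":
      return outcome === "pass" ? "heal" : "done";
    case "heal":
      // A merged fix re-triggers CI, so the loop starts over at build.
      return outcome === "pass" ? "build" : "done";
    case "done":
      return "done";
  }
}

// Replay a sequence of outcomes and record every stage visited.
function traceStages(outcomes: Outcome[]): Stage[] {
  let stage = "build" as Stage;
  const visited: Stage[] = [stage];
  for (const outcome of outcomes) {
    if (stage === "done") break;
    stage = nextStage(stage, outcome);
    visited.push(stage);
  }
  return visited;
}
```

For example, `traceStages(["pass", "pass", "fail", "pass", "pass", "pass", "pass", "pass"])` walks build, deploy, validate, alert, heal, then build, deploy, validate again, and ends at done.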

Architecture

The pipeline

Self-Healing Pipeline (Far Horizons Labs, experimental)

ci.yml · on: push · detect → diagnose → fix → deploy

  • 01 BUILD: Build + Test (CI; 3m 9s · 1m 4s; passing)
  • 02 DEPLOY: Deploy Services (CI; migrations, media, frontends; auto-deploy)
  • 03 VALIDATE: Health Check (post-deploy, ~20s; webhook on fail), Playwright E2E (post-deploy, 4m 19s; webhook on fail), Runtime Alerts (Sentry-style; planned, next up)
  • 04 ALERT: Failure Webhook (fires on any step fail; triggers the agent)
  • 05 HEAL: Coding Agent (reads logs, diagnoses, writes fix; deployed), then Pull Request (auto-merge on CI pass; no auto-rebase yet)

Merging the PR re-triggers CI. Auto-rebase is not yet implemented, so auto-merge fails if another commit lands first. The loop is only viable when you own the full stack.

Status

What's working today

Automated build & deploy

Push to main triggers GitHub Actions: build, test, deploy to Cloudflare Workers. Migrations, media, and frontends all go out in one pipeline.

Post-deploy validation

Health checks and Playwright E2E tests run after every deploy. If anything returns a non-200 or a test fails, a webhook fires.
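As a minimal sketch of the "non-200 or a failed test fires a webhook" rule: the result types and the `failures` field are hypothetical, since the article only says the run URL is sent. The function only builds the body and leaves the actual HTTP call out.

```typescript
interface HealthResult {
  url: string;
  status: number; // HTTP status returned by the health endpoint
}

interface E2eResult {
  name: string;
  passed: boolean;
}

// Hypothetical webhook body: run URL plus a human-readable failure list.
interface WebhookBody {
  runUrl: string;
  failures: string[];
}

// Returns the body to send, or null when every check passed
// and no webhook should fire.
function buildFailureWebhook(
  runUrl: string,
  health: HealthResult[],
  e2e: E2eResult[],
): WebhookBody | null {
  const failures = [
    ...health
      .filter((h) => h.status !== 200)
      .map((h) => `health: ${h.url} returned ${h.status}`),
    ...e2e.filter((t) => !t.passed).map((t) => `e2e: ${t.name} failed`),
  ];
  return failures.length === 0 ? null : { runUrl, failures };
}
```

Returning `null` on the happy path keeps the caller simple: send the body if there is one, otherwise do nothing.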

Agent-driven fixes

The failure webhook triggers a coding agent (Claude Code) that reads the CI logs, diagnoses the issue, writes a fix, and opens a PR with auto-merge enabled.
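The article doesn't say which commands the agent runs. One plausible route is the GitHub CLI, sketched here as a list of argument arrays. The `FixPlan` shape and `ghCommands` name are hypothetical, and the sketch assumes the fix branch has already been pushed and that auto-merge is enabled in the repository settings.

```typescript
// Hypothetical description of a fix the agent has already committed and pushed.
interface FixPlan {
  runId: number;  // failing GitHub Actions run
  branch: string; // branch holding the fix
  title: string;
  body: string;
}

// Build the GitHub CLI invocations for the log read, PR creation, and
// auto-merge steps. Nothing is executed here; the caller decides how to run them.
function ghCommands(plan: FixPlan): string[][] {
  return [
    // Fetch only the logs of the failed steps for diagnosis.
    ["gh", "run", "view", String(plan.runId), "--log-failed"],
    // Open the pull request against main.
    [
      "gh", "pr", "create",
      "--base", "main",
      "--head", plan.branch,
      "--title", plan.title,
      "--body", plan.body,
    ],
    // Ask GitHub to merge automatically once required checks pass.
    ["gh", "pr", "merge", plan.branch, "--auto", "--squash"],
  ];
}
```

Keeping the commands as data makes it easy to log exactly what the agent intended to do before anything touches the repo.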

Where it falls over

Current limitations

  • Auto-merge only works if nothing else merges first. The agent doesn't rebase yet
  • Only viable when you own the full stack. No shared repos, no external dependencies
  • Agent fixes are limited to what it can diagnose from logs. No runtime debugging yet
  • Human review is still recommended for non-trivial changes
  • Cost per agent invocation is non-zero. Needs monitoring at scale
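The first and fourth limitations suggest a guard in front of auto-merge. This sketch is an illustration rather than part of the current pipeline. The size thresholds standing in for "non-trivial" are made-up numbers, and `FixCandidate` is a hypothetical shape.

```typescript
interface FixCandidate {
  baseSha: string;      // commit on main the fix branch was cut from
  filesChanged: number;
  linesChanged: number;
}

type MergeDecision = "auto-merge" | "needs-rebase" | "needs-human-review";

// Hypothetical thresholds for what counts as a trivial change.
const LIMITS = { maxFiles: 3, maxLines: 40 };

function decideMerge(fix: FixCandidate, mainHeadSha: string): MergeDecision {
  // Another commit landed on main first: auto-merge would fail without a rebase.
  if (fix.baseSha !== mainHeadSha) return "needs-rebase";
  // Larger diffs go to a human instead of merging on green CI alone.
  if (fix.filesChanged > LIMITS.maxFiles || fix.linesChanged > LIMITS.maxLines) {
    return "needs-human-review";
  }
  return "auto-merge";
}
```

Until auto-rebase exists, a "needs-rebase" result would simply park the PR for manual attention.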

What's next

next

Auto-rebase

Agent rebases its branch when auto-merge is blocked because another commit landed on main first

next

Runtime error integration

Sentry-style alerts trigger the same healing loop, not just CI failures

planned

Cost dashboard

Track agent invocations, token usage, and fix success rate

planned

Multi-repo orchestration

Coordinate fixes across frontend and backend repos
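For the planned cost dashboard, the three numbers mentioned (invocations, token usage, fix success rate) could come from a simple fold over an invocation log. The `Invocation` record is a hypothetical shape, and counting a fix as successful when its PR merged is an assumption.

```typescript
// Hypothetical log record for one agent run.
interface Invocation {
  inputTokens: number;
  outputTokens: number;
  fixMerged: boolean; // assumption: success means the PR ended up merged
}

interface CostSummary {
  invocations: number;
  totalTokens: number;
  fixSuccessRate: number; // merged fixes / invocations, 0 when the log is empty
}

function summarize(log: Invocation[]): CostSummary {
  let totalTokens = 0;
  let merged = 0;
  for (const entry of log) {
    totalTokens += entry.inputTokens + entry.outputTokens;
    if (entry.fixMerged) merged += 1;
  }
  return {
    invocations: log.length,
    totalTokens,
    fixSuccessRate: log.length === 0 ? 0 : merged / log.length,
  };
}
```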

Interested?

This feeds into client work

The same approach works for error monitoring, content pipelines, and automated QA. If you're curious about what agents can do for your ops, let's talk.
