🔄 FixLoop — AI Agent Loop for Self-Correcting Code

Pranesh Nikhar · Apr 13, 2026 · 3 min read

 󰈤

An AI agent that writes code, runs tests, spots failures, and retries until everything passes. Supports OpenAI, Gemini, Groq, and Ollama.

🎯 What It Does

FixLoop is an AI agent loop that writes code from a natural-language description and runs pytest against it. If any test fails, it feeds the error output back to the LLM and retries until all tests pass or the iteration limit is reached.

$ python main.py --challenge fibonacci
┌────────────────────────────────────────────┐
│ 🎯 Challenge: fibonacci                     │
│ Trying iteration 1...                      │
│ ❌ 2/3 tests passed                        │
│ → Feeding errors back to LLM...            │
│ Trying iteration 2...                      │
│ ✅ 3/3 tests passed!                       │
│ 💾 Saved to solutions/fibonacci.py         │
└────────────────────────────────────────────┘

🧱 Tech Stack

Component	Technology
Language	Python 3.10+
LLMs	OpenAI SDK, google-genai SDK, Groq SDK, Ollama
Testing	pytest

The LLM client abstraction layer puts all four providers behind a common generate() interface.

🏗️ Architecture

main.py                       # argparse CLI entry point
fixloop/
├── runner.py                 # FixLoop class: tempdir → generate → test → loop
├── coder.py                  # LLM prompt to write code from description
├── tester.py                 # Runs pytest --tb=short, parses results
├── debugger.py               # LLM prompt with code + errors to fix bugs
├── llm.py                    # LLMClient abstraction (4 providers)
├── utils.py                  # Strip markdown code fences from LLM output
└── challenges.py             # Challenge dataclass + built-in challenges dict

Each module is under 100 lines — a deliberate design goal for maintainability.
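To make the tester's job concrete, here is a minimal sketch of running pytest with --tb=short and pulling the counts out of its summary line. It is an illustration under assumptions, not the actual tester.py: the names parse_counts and run_pytest, the returned dict shape, and the 60-second timeout are all hypothetical.

```python
import re
import subprocess
import sys

# Matches fragments such as "2 passed", "1 failed", or "2 errors" in pytest's summary.
_COUNT_RE = re.compile(r"(\d+) (passed|failed|errors?)\b")


def parse_counts(output: str) -> dict[str, int]:
    """Read passed/failed/error counts from the last line of output that has any."""
    counts = {"passed": 0, "failed": 0, "error": 0}
    for line in reversed(output.strip().splitlines()):
        matches = _COUNT_RE.findall(line)
        if matches:
            for number, label in matches:
                key = "error" if label.startswith("error") else label
                counts[key] = int(number)
            break
    return counts


def run_pytest(test_path: str, timeout: float = 60.0) -> tuple[dict[str, int], str]:
    """Run pytest on test_path; return the parsed counts and the combined output."""
    proc = subprocess.run(
        [sys.executable, "-m", "pytest", "--tb=short", test_path],
        capture_output=True,
        text=True,
        timeout=timeout,  # assumed safeguard against generated code that hangs
    )
    output = proc.stdout + proc.stderr
    return parse_counts(output), output
```

Scanning from the bottom of the output means assertion messages earlier in the traceback are unlikely to be mistaken for the summary line.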

🔄 The FixLoop Algorithm

1. Create temp directory
2. Prompt LLM to write {filename}.py from the challenge description
3. Run pytest on the generated file
4. Parse pytest output for passed/failed/error counts
5. If all pass → save solution, exit 0
6. If any fail → prompt LLM (debugger) with:
   - Generated code
   - pytest error output (stdout + stderr)
7. LLM returns fixed code (may be same as before)
8. Go to step 3, up to max_iterations
9. If exhausted → save best attempt, exit 1
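
The steps above can be sketched as a single function. This is a minimal sketch, not FixLoop's actual runner.py: the three callables (write_code, run_tests, fix_code) are hypothetical stand-ins for the coder, tester, and debugger modules, the temp-directory handling is left out, the default of 5 iterations is made up, and "best attempt" is assumed to mean the version that passed the most tests.

```python
from typing import Callable


def fix_loop(
    write_code: Callable[[], str],
    run_tests: Callable[[str], tuple[int, int, str]],
    fix_code: Callable[[str, str], str],
    max_iterations: int = 5,  # assumed default, not taken from the project
) -> tuple[bool, str, int]:
    """Return (all_passed, code, iterations_used)."""
    code = write_code()                      # step 2: first attempt
    best_code, best_passed = code, -1
    for iteration in range(1, max_iterations + 1):
        passed, total, output = run_tests(code)   # steps 3 and 4
        if passed > best_passed:
            best_code, best_passed = code, passed
        if total > 0 and passed == total:
            return True, code, iteration          # step 5: everything passed
        if iteration < max_iterations:
            code = fix_code(code, output)         # steps 6 and 7
    return False, best_code, max_iterations       # step 9: budget exhausted


# Usage with fake components that mirror the fibonacci run shown earlier:
# the first attempt passes 2/3 tests, the "fixed" version passes 3/3.
def fake_tests(code: str) -> tuple[int, int, str]:
    return (3, 3, "") if code == "v2" else (2, 3, "AssertionError")


ok, final_code, used = fix_loop(lambda: "v1", fake_tests, lambda code, err: "v2")
```

Passing the components in as callables keeps the loop itself free of LLM and subprocess details, which also makes it easy to exercise with fakes.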

📋 Built-in Challenges

Challenge	Description
`fibonacci`	Return nth Fibonacci number
`prime_checker`	Check if a number is prime
`valid_parentheses`	Validate balanced parentheses
`two_sum`	Find two indices summing to target
`valid_sudoku`	Validate a 9x9 Sudoku board

Each challenge includes:

- A natural-language description (fed to the coder)
- 3+ pytest test cases (fed to the tester)
- An entry point function name

Custom challenges can be added via the Challenge dataclass.
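
As a rough idea of what a custom challenge might look like, here is a sketch built from the three ingredients listed above. The field names (name, description, entry_point, tests), the CHALLENGES dict, the register helper, and the reverse_string example are all assumptions for illustration; the real dataclass in challenges.py may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Challenge:
    """Hypothetical shape: field names are guesses, not the project's API."""
    name: str          # also assumed to be the generated file name, e.g. my_challenge.py
    description: str   # natural-language task fed to the coder
    entry_point: str   # function the tests will import and call
    tests: str         # pytest source fed to the tester


CHALLENGES: dict[str, Challenge] = {}


def register(challenge: Challenge) -> None:
    """Add a challenge to the lookup table used by --challenge (assumed behavior)."""
    CHALLENGES[challenge.name] = challenge


register(Challenge(
    name="my_challenge",
    description="Write reverse_string(s) that returns the string s reversed.",
    entry_point="reverse_string",
    tests=(
        "from my_challenge import reverse_string\n\n"
        "def test_empty():\n    assert reverse_string('') == ''\n\n"
        "def test_word():\n    assert reverse_string('abc') == 'cba'\n\n"
        "def test_palindrome():\n    assert reverse_string('aba') == 'aba'\n"
    ),
))
```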

🔌 Provider Support

# OpenAI (default)
python main.py --challenge fibonacci

# Gemini
python main.py --challenge two_sum --provider gemini

# Groq
python main.py --challenge valid_parentheses --provider groq

# Ollama (local)
python main.py --challenge prime_checker --provider ollama --model llama3.1

The LLMClient abstraction wraps different SDKs behind a unified generate(system_prompt, user_prompt) interface, making provider switching transparent.
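
One way to picture that interface is a small protocol plus a provider lookup. This is a sketch under assumptions: only the generate(system_prompt, user_prompt) signature comes from the post, while CannedClient, PROVIDERS, and make_client are hypothetical names. The real provider classes would wrap the OpenAI, Gemini, Groq, and Ollama SDKs, which are deliberately not reproduced here.

```python
from typing import Callable, Protocol


class LLMClient(Protocol):
    """Anything with this generate() method can be plugged into the loop."""

    def generate(self, system_prompt: str, user_prompt: str) -> str: ...


class CannedClient:
    """Hypothetical offline provider: returns a fixed reply, handy for tests."""

    def __init__(self, reply: str) -> None:
        self.reply = reply

    def generate(self, system_prompt: str, user_prompt: str) -> str:
        return self.reply


# Hypothetical registry keyed by the --provider value. Only the offline
# stand-in is registered here; real entries would construct SDK-backed clients.
PROVIDERS: dict[str, Callable[[], LLMClient]] = {
    "canned": lambda: CannedClient("def fibonacci(n):\n    return n"),
}


def make_client(provider: str) -> LLMClient:
    """Look up a provider by name and build its client."""
    try:
        return PROVIDERS[provider]()
    except KeyError:
        raise ValueError(f"unknown provider: {provider!r}") from None
```

Because the coder and debugger only ever call generate(), swapping providers reduces to choosing a different registry entry.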

🧹 Key Detail: Markdown Fence Stripping

LLMs love wrapping code in markdown code fences:

```python
def fibonacci(n):
    ...
```

The utils.py module strips these automatically with a regex, along with any surrounding explanation text from the LLM response.
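
A minimal sketch of that kind of regex follows. The function name strip_fences and the exact pattern are assumptions, not the contents of utils.py. It keeps the body of the first fenced block and drops everything around it. The fence marker is built with string multiplication only so that literal triple backticks do not appear inside this example.

```python
import re

_FENCE = "`" * 3  # three backticks, spelled indirectly to keep this listing readable

# Opening fence with an optional language tag, then the body (lazy), then the closing fence.
_FENCE_RE = re.compile(_FENCE + r"[\w+-]*[ \t]*\n(.*?)" + _FENCE, re.DOTALL)


def strip_fences(response: str) -> str:
    """Return the first fenced block's body, or the whole response if unfenced."""
    match = _FENCE_RE.search(response)
    body = match.group(1) if match else response
    return body.strip() + "\n"


reply = (
    "Here is the fix:\n"
    + _FENCE + "python\n"
    + "def fibonacci(n):\n    return n\n"
    + _FENCE + "\nLet me know if it works!"
)
cleaned = strip_fences(reply)  # only the function definition remains
```

Taking only the first block is a simplification. A response that spreads the solution across several fenced blocks would need a different policy.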

🚀 Quick Start

pip install fixloop

# Run with OpenAI
export OPENAI_API_KEY=sk-...
python main.py --challenge fibonacci

# Run with Ollama (local)
python main.py --challenge two_sum --provider ollama --model llama3.1

# Custom challenge
python main.py --challenge my_challenge --max-iterations 10

💡 Why It’s Interesting

FixLoop is a minimal, focused implementation of the “write code → test → fix → retest” loop that powers more complex AI coding agents. It strips away everything non-essential: each module is under 100 lines, the loop is explicit and inspectable, and the provider abstraction is clean. Despite its simplicity, it tackles a genuinely hard problem: LLMs rarely write perfect code on the first try, and feeding test failures back for another attempt gives them a chance to correct their own mistakes. It’s a great reference for understanding how AI coding agents actually work under the hood.
