2025

5 Mistakes That Make Your RAG App Hallucinate

The problem isn't that RAG doesn't work. It's the small gaps we leave open.


You built a RAG system. You connected your LLM to your knowledge base. Users ask questions, documents get retrieved, answers get generated. Everything works... until your chatbot confidently tells a customer the price is $29.99 instead of $49.99.

Welcome to the world of RAG hallucinations.

The problem isn't that RAG doesn't work. There are small mistakes in how we build these systems that leave the door wide open for the LLM to make things up. Here are the five most common ones.

Ship Faster Without Breaking Your Agent

A simple evaluation system that catches prompt drift and tool bugs before your users do


Every time I changed the system prompt or a tool description in our agentic chatbot, something else broke. Tool B would finally start working, but then Tool A would hallucinate responses that were perfectly fine before. I'd check the logs, compare prompts, roll back changes in git, and manually test a dozen use cases. It was a nightmare that made development painfully slow.

When you're building an AI agent with multiple tools, you craft this big, careful system prompt that explains which tool to use when and how to use each one. Change one sentence, and the whole thing can shift in ways you don't expect. The complexity comes from these emergent behaviors. LLMs don't follow strict logic. They're probabilistic. A small wording change in your prompt can make the LLM act in weird ways.

This problem slowed us down so much that I had to solve it. I needed something that could check hundreds of use cases in minutes instead of hours, something that would tell me exactly what broke and why.
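The core of such a system can be sketched in a few lines: a table of cases mapping inputs to the tool the agent should pick, a runner that checks every case against the router, and a report of exactly which ones broke. Everything here is illustrative — `toy_route` stands in for whatever function your agent uses to select a tool.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    query: str
    expected_tool: str

def run_eval(route: Callable[[str], str], cases: list[EvalCase]) -> list[str]:
    """Run every case through the router; return exactly what broke and why."""
    failures = []
    for case in cases:
        actual = route(case.query)
        if actual != case.expected_tool:
            failures.append(f"{case.query!r}: expected {case.expected_tool}, got {actual}")
    return failures

# Hypothetical stand-in for the real agent's tool-selection step.
def toy_route(query: str) -> str:
    return "search_docs" if "how" in query.lower() else "create_ticket"

cases = [
    EvalCase("How do I reset my password?", "search_docs"),
    EvalCase("My invoice is wrong", "create_ticket"),
]
print(run_eval(toy_route, cases))  # [] when nothing regressed
```

Run this after every prompt change: an empty list means no regression, and any failure names the exact case and the tool mismatch, so you skip the manual log-diffing entirely.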

Why Your GenAI App Needs a Task Queue (And How Celery Solves It)

Building reliable AI applications means handling slow, expensive operations without making users wait


You've built a GenAI application. Users send requests, your code calls an LLM or generates images, and returns results. Simple enough. Until reality hits.

Your image generation model takes 35 seconds to run. Your agentic workflow makes six separate LLM calls, each taking 3-8 seconds. A user submits a request, and their browser times out waiting. Or worse, they refresh the page halfway through, and you've just burned through $2 in API credits for nothing.

This is the problem every GenAI developer hits eventually. AI operations are slow. They're expensive. They fail randomly when APIs have problems. And you can't make users sit there staring at a loading spinner for 40 seconds. You need a better way.
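The pattern a task queue gives you — the same one Celery implements with a broker and workers — can be shown with the standard library alone: the request handler enqueues a job and returns an id immediately, a worker processes it in the background, and the client polls for status instead of blocking. This is a minimal sketch of the pattern, not Celery itself.

```python
import queue
import threading
import time
import uuid

jobs: dict[str, dict] = {}            # job_id -> {"status": ..., "result": ...}
task_queue: queue.Queue = queue.Queue()

def submit(payload: str) -> str:
    """Return a job id immediately instead of making the user wait."""
    job_id = uuid.uuid4().hex
    jobs[job_id] = {"status": "pending", "result": None}
    task_queue.put((job_id, payload))
    return job_id

def worker() -> None:
    while True:
        job_id, payload = task_queue.get()
        jobs[job_id]["status"] = "running"
        time.sleep(0.1)  # stand-in for a 35-second image-generation call
        jobs[job_id] = {"status": "done", "result": f"generated:{payload}"}
        task_queue.task_done()

threading.Thread(target=worker, daemon=True).start()

job_id = submit("sunset over mountains")
task_queue.join()                      # a real client would poll jobs[job_id] instead
print(jobs[job_id]["status"])          # done
```

Celery swaps the in-process queue for a broker like Redis or RabbitMQ and the thread for separate worker processes, which is what makes the pattern survive crashes, retries, and multiple machines.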

Recommender Systems: The Unsexy AI That Actually Works

Everyone is obsessed with AI agents right now. Autonomous systems that plan and execute tasks. Multi-step reasoning. Tool use. The whole package.

Meanwhile, half the apps you use every day run on recommender systems. Netflix telling you what to watch. Spotify building your playlists. Amazon showing you products. YouTube deciding your next video. These systems move billions of dollars and nobody talks about them anymore.

I spent last year deep in recommender systems for my computer science degree. Built a few from scratch. Learned what works and what breaks at scale. The best part? If you've worked with RAG applications, you already understand half of it. The concepts overlap way more than you'd think.
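One place the overlap is literal: a basic content-based recommender ranks items by vector similarity, which is exactly the scoring step a RAG retriever applies to document chunks. A toy sketch with made-up item embeddings (the vectors and item names are purely illustrative):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Toy item embeddings along invented genre-ish dimensions.
items = {
    "space_doc": [0.9, 0.1, 0.0],
    "scifi_film": [0.8, 0.3, 0.1],
    "cooking_show": [0.0, 0.2, 0.9],
}

def recommend(liked: str, k: int = 2) -> list[str]:
    """Rank every other item by similarity to what the user liked —
    the same nearest-neighbor search RAG runs over chunk embeddings."""
    scores = {name: cosine(items[liked], vec)
              for name, vec in items.items() if name != liked}
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend("space_doc"))  # ['scifi_film', 'cooking_show']
```

Swap "items a user liked" for "chunks similar to a query" and you have retrieval; the hard parts that differ are feedback loops, cold starts, and scale.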

How I Build Python Projects (My Way, Simple & Solid)

An opinionated, scalable way to structure Python projects using hexagonal architecture, SOLID, and practical DDD.

A friend asked me: "What's the proper way to build a Python project?" Here's my take. It's what I use for clients and at my 9 to 5. I like SOLID and Domain-Driven Design, but I don't chase perfection. I just want code that stays tidy when the app grows.