AI Engineering

Vocabulary Drift: The Naming Problem Slowing Down Your AI Projects

Your deployment pipeline is only as fast as your slowest miscommunication.


Last quarter I was leading the architecture for a GenAI document processing system.

The technical work was fast. We had RAG pipelines running in days. Prototypes that impressed stakeholders. A team that could iterate quickly.

But something was broken. Users felt like we were crawling. Stakeholders were frustrated. Every meeting felt like we were speaking different languages.

The problem wasn't our code. It was our words.

5 Mistakes That Make Your RAG App Hallucinate

The problem isn't that RAG doesn't work. It's the small gaps we leave open.


You built a RAG system. You connected your LLM to your knowledge base. Users ask questions, documents get retrieved, answers get generated. Everything works... until your chatbot confidently tells a customer the price is $29.99 instead of $49.99.

Welcome to the world of RAG hallucinations.

The problem isn't that RAG doesn't work. There are small mistakes in how we build these systems that leave the door wide open for the LLM to make things up. Here are the five most common ones.
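To make the flow concrete, here is a minimal sketch of retrieve-then-generate. The knowledge base, the keyword-overlap scoring, and the `call_llm` placeholder are all illustrative stand-ins, not the pipeline from any real system:

```python
# Toy in-memory "knowledge base" standing in for a real vector store.
KNOWLEDGE_BASE = [
    "Pro plan pricing: the Pro plan costs $49.99 per month.",
    "Refund policy: refunds are available within 30 days of purchase.",
]

def retrieve(question: str, k: int = 1) -> list[str]:
    """Rank documents by naive keyword overlap with the question."""
    q_words = set(question.lower().split())
    scored = sorted(
        KNOWLEDGE_BASE,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(question: str, context: list[str]) -> str:
    """Ground the model in retrieved context; answering outside it is where hallucinations sneak in."""
    joined = "\n".join(context)
    return (
        "Answer using ONLY the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{joined}\n\nQuestion: {question}"
    )

def call_llm(prompt: str) -> str:
    """Placeholder for whatever LLM client you actually use."""
    raise NotImplementedError

if __name__ == "__main__":
    question = "How much does the Pro plan cost?"
    print(build_prompt(question, retrieve(question)))
```

Every one of the five mistakes lives somewhere in this loop: what gets retrieved, how the prompt constrains the model, and what you do when the context doesn't contain the answer.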

Ship Faster Without Breaking Your Agent

A simple evaluation system that catches prompt drift and tool bugs before your users do


Every time I changed the system prompt or a tool description in our agentic chatbot, something else broke. Tool B would finally start working, but then Tool A would hallucinate responses that were perfectly fine before. I'd check the logs, compare prompts, roll back changes in git, and manually test a dozen use cases. It was a nightmare that made development painfully slow.

When you're building an AI agent with multiple tools, you craft this big, careful system prompt that explains which tool to use when and how to use each one. Change one sentence, and the whole thing can shift in ways you don't expect. The complexity comes from these emergent behaviors. LLMs don't follow strict logic. They're probabilistic. A small wording change in your prompt can make the LLM act in weird ways.
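That "big, careful system prompt" usually looks something like the sketch below. The tool names and descriptions here are invented for illustration, not the chatbot from the post; the point is just how much behavior hangs on a few sentences of tool descriptions:

```python
# Hypothetical tools: the routing behavior of the agent depends almost
# entirely on how these one-line descriptions are worded.
TOOLS = {
    "search_orders": "Look up an order by order ID. Use for questions about order status.",
    "create_ticket": "Open a support ticket. Use only when the user explicitly asks for a human.",
}

def build_system_prompt(tools: dict[str, str]) -> str:
    lines = ["You are a support agent. Pick exactly one tool per user request."]
    for name, description in tools.items():
        lines.append(f"- {name}: {description}")
    return "\n".join(lines)

if __name__ == "__main__":
    print(build_system_prompt(TOOLS))
```

Rewrite one of those descriptions and the model may start routing requests that used to go to `search_orders` into `create_ticket`, with no error anywhere to tell you it happened.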

This problem slowed us down so much that I had to solve it. I needed something that could check hundreds of use cases in minutes instead of hours, something that would tell me exactly what broke and why.
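The core idea is simple enough to sketch: a fixed table of use cases with expected outcomes, run on every prompt change, reporting exactly which cases regressed. This is only a minimal illustration of that idea; the cases, the `run_agent` stand-in, and the pass/fail check are all assumptions, not the actual evaluation system described in the post:

```python
from dataclasses import dataclass

@dataclass
class Case:
    user_message: str
    expected_tool: str

# A handful of hypothetical cases; a real suite would have hundreds.
CASES = [
    Case("Where is my order #123?", expected_tool="search_orders"),
    Case("I want to talk to a human", expected_tool="create_ticket"),
]

def run_agent(message: str) -> str:
    """Stand-in for the real agent call: here, a dumb keyword heuristic."""
    return "search_orders" if "order" in message.lower() else "create_ticket"

def run_suite(cases: list[Case]) -> list[str]:
    """Run every case and collect a readable description of each regression."""
    failures = []
    for case in cases:
        chosen = run_agent(case.user_message)
        if chosen != case.expected_tool:
            failures.append(
                f"{case.user_message!r}: expected {case.expected_tool}, got {chosen}"
            )
    return failures

if __name__ == "__main__":
    failures = run_suite(CASES)
    print(f"{len(failures)} of {len(CASES)} cases failed")
    for failure in failures:
        print("FAIL:", failure)
```

Run against a real agent, a suite like this turns "something feels off after that prompt change" into a concrete list of which use cases broke and how.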