
Grounded — Production RAG Starter
Engineering Overview
Grounded implements the boring, reliable parts of a RAG system properly, in a codebase small enough to read in 20 minutes. Every answer cites the source chunks it used; if retrieval returns nothing relevant, it refuses (`grounded: false`) instead of guessing. Ingestion is idempotent: chunks are content-hashed, so re-ingesting embeds only new or changed content and a deploy does not pay to re-embed an unchanged corpus. Transient embedding and LLM errors are retried with backoff, while 4xx errors fail fast. A labelled Q&A eval harness scores retrieval and answers, so a prompt or model change cannot silently regress quality in CI. The embedder, vector store, and LLM are all swappable via environment variables; the default configuration uses an in-memory store and extractive answers, so the whole system is testable offline with no API key and no database.
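The cite-or-refuse behaviour can be sketched roughly as below. This is a minimal illustration, not the project's actual code: the `ask` function, the `Answer` type, the `retrieve` and `compose` callables, and the `min_score` threshold of 0.3 are all hypothetical names and values chosen for the example. Only the `grounded` flag comes from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class Answer:
    grounded: bool                 # False means the system refused to answer
    text: str
    citations: list = field(default_factory=list)  # ids of the chunks used


def ask(question, retrieve, compose, k=4, min_score=0.3):
    """Answer from retrieved chunks, or refuse if none is relevant enough.

    retrieve(question, k) -> list of (chunk_id, text, score), best first.
    compose(question, texts) -> answer string built only from those texts.
    Both callables and the threshold are assumptions made for this sketch.
    """
    hits = [h for h in retrieve(question, k) if h[2] >= min_score]
    if not hits:
        # Nothing cleared the relevance bar: refuse instead of guessing.
        return Answer(grounded=False, text="No relevant source found.")
    text = compose(question, [h[1] for h in hits])
    return Answer(grounded=True, text=text, citations=[h[0] for h in hits])
```

Passing `retrieve` and `compose` in as plain callables is one way to keep the refusal logic testable without a real vector store or LLM.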
The Problem
Most RAG demos look great until real users hit them: they hallucinate when the answer isn't in the corpus, re-embed everything on every deploy, double-charge on retries, and give you no way to tell whether a prompt change made retrieval quality worse.
The Solution
A small, readable starter that does the reliable parts properly: grounded cited answers with an explicit refusal path, content-hash idempotent ingestion, retry policy that distinguishes transient from fatal, and an eval harness that turns 'did this change make things worse?' into a CI signal instead of a vibe.
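As an illustration of a retry policy that separates transient from fatal errors, here is a minimal sketch. The `UpstreamError` type and `call_with_retry` helper are hypothetical, and the classification is an assumption that follows the text literally: 5xx is retried and every 4xx fails fast. The description does not say how 429 rate-limit responses are treated.

```python
import random
import time


class UpstreamError(Exception):
    """Hypothetical error carrying the HTTP status of a failed upstream call."""

    def __init__(self, status: int):
        super().__init__(f"upstream returned {status}")
        self.status = status


def is_transient(status: int) -> bool:
    # Assumption for this sketch: only 5xx responses are worth retrying.
    return status >= 500


def call_with_retry(fn, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call fn(); retry transient failures with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except UpstreamError as err:
            if not is_transient(err.status) or attempt == max_attempts:
                raise  # fatal error, or retries exhausted
            delay = base_delay * 2 ** (attempt - 1)
            # Jitter spreads out retries from concurrent callers.
            sleep(delay + random.uniform(0, delay / 2))
```

The `sleep` parameter is injectable so tests can record delays instead of actually waiting.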
Key Engineering Challenges
The interesting design call was making the whole pipeline runnable with zero external dependencies so it's genuinely testable in CI: an in-memory vector store and an extractive answerer stand in for the real embedder/LLM behind the same interface, so the guardrail, citation, and idempotency logic are all exercised offline. The other was idempotent ingestion — hashing chunk content (not just document IDs) so an edited document re-embeds only the chunks that actually changed, which is what keeps re-ingestion cheap.
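The content-hash idea can be sketched as follows. This is an illustration under stated assumptions rather than the project's implementation: SHA-256 is an assumed hash choice, a plain `dict` stands in for the vector store, and `ingest` and `chunk_hash` are hypothetical names. Removing chunks that no longer appear in a document is out of scope here.

```python
import hashlib


def chunk_hash(text: str) -> str:
    # Hash the chunk content itself, so an unchanged chunk keeps the same key.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def ingest(chunks, store, embed):
    """Embed only chunks whose content hash is not already in the store.

    store: dict mapping content hash -> (text, vector); stands in for a vector store.
    embed: callable taking a list of texts and returning one vector per text.
    Returns the number of chunks that were actually embedded.
    """
    new = {}
    for text in chunks:
        h = chunk_hash(text)
        if h not in store and h not in new:
            new[h] = text
    if new:
        # One batched embedding call covering only new or changed chunks.
        vectors = embed(list(new.values()))
        for (h, text), vec in zip(new.items(), vectors):
            store[h] = (text, vec)
    return len(new)
```

Re-running `ingest` with the same chunks makes no call to `embed`, and editing one chunk sends only that chunk to be embedded.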
Core Capabilities
System Architecture
[Architecture diagram components: API, Ingestion, Retrieval & Eval, Pluggable]
"An ask request retrieves the top-k chunks, refuses if nothing clears the relevance bar, otherwise composes a cited answer. Ingestion is a near-no-op on unchanged content thanks to content-hash dedup. The eval harness runs the same path against a labelled set so regressions surface in CI."
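A labelled-set check of the kind described might look like the sketch below. The metric (hit rate at k), the `EvalCase` shape, and the baseline-with-tolerance comparison are assumptions made for illustration; the description does not specify which scores the harness computes.

```python
from dataclasses import dataclass


@dataclass
class EvalCase:
    question: str
    expected_chunk_id: str  # hypothetical label: the chunk that holds the answer


def hit_rate_at_k(cases, retrieve, k=3):
    """Fraction of cases whose expected chunk appears in the top-k results.

    retrieve(question, k) -> list of chunk ids, best first (assumed interface).
    """
    if not cases:
        return 0.0
    hits = sum(1 for c in cases if c.expected_chunk_id in retrieve(c.question, k))
    return hits / len(cases)


def check_no_regression(score, baseline, tolerance=0.02):
    """Return True if the score is within tolerance of the stored baseline."""
    return score >= baseline - tolerance
```

In CI, a test could compute the score on the labelled set and fail the build when `check_no_regression` returns `False`.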