Grounded — Production RAG Starter

TypeScript · RAG · Fastify · PostgreSQL · pgvector · OpenAI · Idempotency · Vitest

Engineering Overview

Grounded implements the boring, reliable parts of a RAG system properly, in a codebase small enough to read in 20 minutes. Every answer cites the source chunks it used; if retrieval returns nothing relevant, it refuses (`grounded: false`) instead of guessing. Ingestion is idempotent: chunks are content-hashed, so re-ingesting embeds only new or changed content and avoids wasted API spend on every deploy. Transient embedding and LLM errors are retried with backoff, while 4xx errors fail fast. A labelled Q&A eval harness scores retrieval and answers so that a prompt or model change can't silently regress quality in CI. The embedder, vector store, and LLM are all swappable via env. The default configuration uses an in-memory store and extractive answers, so the whole thing is testable offline with no API key and no database.

The Problem

Most RAG demos look great until real users hit them: they hallucinate when the answer isn't in the corpus, re-embed everything on every deploy, double-charge on retries, and give you no way to tell whether a prompt change made retrieval quality worse.

The Solution

A small, readable starter that does the reliable parts properly: grounded, cited answers with an explicit refusal path, content-hash idempotent ingestion, a retry policy that distinguishes transient from fatal errors, and an eval harness that turns 'did this change make things worse?' into a CI signal instead of a vibe.

Key Engineering Challenges

The interesting design call was making the whole pipeline runnable with zero external dependencies so it's genuinely testable in CI: an in-memory vector store and an extractive answerer stand in for the real store and LLM behind the same interfaces, so the guardrail, citation, and idempotency logic are all exercised offline. The other challenge was idempotent ingestion: hashing chunk content (not just document IDs) so that an edited document re-embeds only the chunks that actually changed, which is what keeps re-ingestion cheap.
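As a rough illustration of the content-hash idea, here is a minimal sketch. It is not the project's actual code: `IngestIndex`, `contentHash`, the `Chunk` shape, and the synchronous `Embedder` type are all hypothetical names chosen for the example, and SHA-256 is an assumed choice of hash. A real embedder would be asynchronous and a real index would live in the vector store rather than a `Map`.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shapes for illustration only.
type Chunk = { docId: string; text: string };
type Embedder = (texts: string[]) => number[][];

// Hash the chunk text itself, so editing one chunk changes only that chunk's key.
function contentHash(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

class IngestIndex {
  // content hash -> embedding vector
  private vectors = new Map<string, number[]>();

  ingest(chunks: Chunk[], embed: Embedder): { embedded: number; skipped: number } {
    // Collect only chunks whose hash has not been seen before.
    const pending = new Map<string, string>();
    for (const chunk of chunks) {
      const hash = contentHash(chunk.text);
      if (!this.vectors.has(hash)) pending.set(hash, chunk.text);
    }

    // Map preserves insertion order, so keys and values line up by index.
    const hashes = Array.from(pending.keys());
    const texts = Array.from(pending.values());

    // The embedder is never called when nothing changed.
    const embedded = texts.length > 0 ? embed(texts) : [];
    hashes.forEach((hash, i) => this.vectors.set(hash, embedded[i]));

    return { embedded: hashes.length, skipped: chunks.length - hashes.length };
  }
}
```

Calling `ingest` twice with the same chunks should report everything as skipped on the second call, and editing one chunk should cause exactly one new embedding.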

Core Capabilities

Cited answers — every response references the exact source chunks it used

“I don't know” guardrail — refuses instead of hallucinating when nothing relevant is retrieved

Idempotent ingestion — content-hash dedup means re-ingesting only embeds new/changed chunks

Retries with backoff on transient API errors; 4xx fail fast

Eval harness scores retrieval + answers so prompt/model changes can't silently regress in CI

Provider/store-agnostic and offline-testable — default runs with no API key and no database
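The retry behaviour in the list above could look roughly like the following sketch. The names `withRetry`, `isTransient`, and `backoffMs`, the default delays, and the attempt limit are all assumptions made for illustration. The sketch follows the stated rule literally (every 4xx fails fast) and assumes an error without an HTTP status is a network-level failure worth retrying.

```typescript
type Sleep = (ms: number) => Promise<void>;

const defaultSleep: Sleep = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Assumption: errors carry an optional numeric `status` (HTTP status code).
// 5xx and status-less (network) errors are transient; any 4xx fails fast.
function isTransient(err: unknown): boolean {
  const status = (err as { status?: unknown } | null)?.status;
  if (typeof status !== "number") return true;
  return status >= 500;
}

// Exponential backoff: base, 2x base, 4x base, ... capped at capMs.
function backoffMs(attempt: number, baseMs = 200, capMs = 5000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 4,
  sleep: Sleep = defaultSleep,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const outOfAttempts = attempt + 1 >= maxAttempts;
      // Fatal errors and exhausted budgets propagate to the caller unchanged.
      if (outOfAttempts || !isTransient(err)) throw err;
      await sleep(backoffMs(attempt));
    }
  }
}
```

Injecting `sleep` keeps the policy testable offline without real delays. Production code would usually add jitter to the delay, which is omitted here for brevity.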

System Architecture

API

Fastify: POST /ingest, POST /ask

Cited answers — every response references its source chunks

“I don't know” guardrail when retrieval is empty

Ingestion

Content-hash dedup → embed only new/changed chunks (idempotent)

Retry with backoff on transient errors; 4xx fail fast

Retrieval & Eval

pgvector cosine similarity (in-memory store for offline mode)

Eval harness scores retrieval + answers on a labelled set
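To make the eval idea concrete, here is a small sketch of a retrieval check over a labelled set. The source does not say which metrics the harness uses, so hit rate at k is an assumed stand-in, and `LabelledCase`, `Retriever`, `hitRateAtK`, and `assertNoRegression` are hypothetical names. Answer scoring is left out.

```typescript
// Hypothetical shapes: each case lists the chunk ids that should be retrieved.
type LabelledCase = { question: string; expectedChunkIds: string[] };
type Retriever = (question: string, k: number) => string[];

// Fraction of cases where at least one expected chunk appears in the top k.
function hitRateAtK(cases: LabelledCase[], retrieve: Retriever, k: number): number {
  if (cases.length === 0) return 0;
  let hits = 0;
  for (const c of cases) {
    const retrieved = retrieve(c.question, k);
    if (c.expectedChunkIds.some((id) => retrieved.indexOf(id) !== -1)) hits++;
  }
  return hits / cases.length;
}

// Throwing makes the regression fail a test run, which is what turns it into a CI signal.
function assertNoRegression(score: number, baseline: number, tolerance = 0.02): void {
  if (score < baseline - tolerance) {
    throw new Error(
      `retrieval hit rate ${score.toFixed(2)} is below baseline ${baseline.toFixed(2)}`,
    );
  }
}
```

In a Vitest suite, the same check would sit inside a test so that a prompt or model change that lowers the score turns the build red.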

Pluggable

Embedder / store / LLM swappable via env

Default: no API key, no DB — fully offline-testable
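One way to wire up env-based swapping is sketched below. The variable names (`VECTOR_STORE`, `LLM_PROVIDER`, `DATABASE_URL`, `OPENAI_API_KEY`) and the `resolveConfig` helper are assumptions for illustration; the source does not name its actual env keys. The point is that an empty environment resolves to the offline defaults.

```typescript
type Env = Record<string, string | undefined>;

type Config = {
  store: "memory" | "pgvector";
  llm: "extractive" | "openai";
};

// Validate an env value against an allow-list, falling back when it is unset.
function pick<T extends string>(
  name: string,
  value: string | undefined,
  allowed: readonly T[],
  fallback: T,
): T {
  if (value === undefined) return fallback;
  const match = allowed.find((a) => a === value);
  if (match === undefined) {
    throw new Error(`${name} must be one of ${allowed.join(", ")}; got "${value}"`);
  }
  return match;
}

function resolveConfig(env: Env): Config {
  const store = pick("VECTOR_STORE", env.VECTOR_STORE, ["memory", "pgvector"] as const, "memory");
  const llm = pick("LLM_PROVIDER", env.LLM_PROVIDER, ["extractive", "openai"] as const, "extractive");

  // Fail at startup, not on the first request, when a real backend lacks credentials.
  if (store === "pgvector" && !env.DATABASE_URL) {
    throw new Error("VECTOR_STORE=pgvector requires DATABASE_URL");
  }
  if (llm === "openai" && !env.OPENAI_API_KEY) {
    throw new Error("LLM_PROVIDER=openai requires OPENAI_API_KEY");
  }
  return { store, llm };
}
```

In a Node process this would be called as `resolveConfig(process.env)`; taking the environment as a parameter keeps the function easy to test.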

"An ask request retrieves the top-k chunks, refuses if nothing clears the relevance bar, otherwise composes a cited answer. Ingestion is a near-no-op on unchanged content thanks to content-hash dedup. The eval harness runs the same path against a labelled set so regressions surface in CI."
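The ask path described in the quote above can be sketched against an in-memory store as follows. This is an illustrative sketch, not the project's implementation: the `StoredChunk` and `AskResult` shapes, the `ask` signature, the default `k`, and the `minScore` threshold of 0.75 are assumptions. Only the `grounded: false` refusal flag comes from the source. The extractive "answer" here simply joins the retrieved chunk texts with numbered citation markers.

```typescript
type StoredChunk = { id: string; text: string; vector: number[] };

type AskResult =
  | { grounded: true; answer: string; citations: string[] }
  | { grounded: false; answer: null; citations: string[] };

// Cosine similarity; returns 0 for zero-length vectors to avoid dividing by zero.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

function ask(queryVector: number[], store: StoredChunk[], k = 3, minScore = 0.75): AskResult {
  const hits = store
    .map((chunk) => ({ chunk, score: cosine(queryVector, chunk.vector) }))
    .filter((hit) => hit.score >= minScore) // the relevance bar
    .sort((x, y) => y.score - x.score)
    .slice(0, k);

  // Guardrail: nothing cleared the bar, so refuse instead of guessing.
  if (hits.length === 0) return { grounded: false, answer: null, citations: [] };

  // Extractive answer: each sentence carries a marker pointing at its source chunk.
  const answer = hits.map((hit, i) => `${hit.chunk.text} [${i + 1}]`).join(" ");
  return { grounded: true, answer, citations: hits.map((hit) => hit.chunk.id) };
}
```

Because the refusal and citation logic depends only on scores and chunk ids, it can be exercised with hand-written vectors and no embedding API.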

Tech Stack

Language: TypeScript

Server: Fastify

Vector store: PostgreSQL + pgvector (in-memory fallback for offline mode)

LLM/embeddings: OpenAI (swappable via env)

Tests: Vitest (10 tests, run offline with no API key or DB)

Inquiry

Interested in discussing this engineering approach?

Message Shailesh →