
Engineering Overview
A monorepo (pnpm workspaces) exploring AI-powered semantic code search. Shipped so far: a Next.js landing page with a feature showcase, a search UI with real-time suggestions and filtering against a ~50-example mock dataset, NestJS backend scaffolding, a Prisma schema with a pgvector column, and a Docker Compose file for Postgres, Redis, and pgAdmin. The ingestion → embedding → retrieval pipeline is prototyped but not yet wired end-to-end; Phase 2 (real content ingestion from GitHub and StackOverflow) is the active work.
The Problem
grep and file search break down on large codebases because they only match terms you already know. New engineers and AI tools struggle to navigate unfamiliar code when they do not yet know its vocabulary. A semantic layer over the codebase could turn 'where is auth handled?' into a ranked list of file and line locations.
The Solution
The design is a monorepo with an ingestion pipeline that chunks source at function and class boundaries rather than at fixed line counts, embeds each chunk, and stores the vectors in pgvector. Queries are embedded at runtime and matched against the stored vectors by cosine similarity. The UX layer is built against mock data so that the interaction design is validated before the retrieval backend is complete.
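To make the matching step concrete, here is a minimal in-memory sketch of cosine-similarity ranking in TypeScript. It is an illustration, not the project's code: the `EmbeddedChunk` and `SearchHit` shapes and the `rankChunks` helper are hypothetical names, and the embeddings are assumed to be plain number arrays of equal length. In the intended setup the comparison would presumably run inside Postgres, where pgvector's `<=>` operator returns cosine distance (1 minus cosine similarity).

```typescript
// Hypothetical record shape: one embedded chunk plus its source location.
interface EmbeddedChunk {
  file: string;
  line: number;
  embedding: number[];
}

// Hypothetical result shape: a location with its similarity score.
interface SearchHit {
  file: string;
  line: number;
  score: number;
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 for a zero vector.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Embedding dimensions do not match");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Score every chunk against the query embedding and keep the k best,
// highest similarity first.
function rankChunks(
  query: number[],
  chunks: EmbeddedChunk[],
  k: number
): SearchHit[] {
  return chunks
    .map((c) => ({
      file: c.file,
      line: c.line,
      score: cosineSimilarity(query, c.embedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

A brute-force scan like this is only reasonable for small corpora such as the mock dataset; at larger scale the same ordering would come from an indexed pgvector query.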
Key Engineering Challenges
The scaffolding phase made the key insight concrete: fixed-size chunks split function bodies mid-logic and produce low-quality embeddings, whereas function/class-boundary chunks preserve semantic units. Phase 1 validated the UX affordances needed around results (filters, sort, source badges) by running the UI against mock data first, so the retrieval work can focus on quality rather than on reactive UI changes. Next: wire real embeddings through the chunker, finalise the pgvector index strategy (ivfflat vs hnsw), and add chunker unit tests.
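The difference between the two chunking strategies can be sketched as follows. This is a deliberately naive illustration under stated assumptions, not the project's chunker: the `Chunk` shape and both function names are hypothetical, the input is assumed to be JavaScript/TypeScript-like source with unindented top-level `function` and `class` declarations, and braces are counted textually, so braces inside strings or comments would confuse it. A production chunker would more likely walk a real syntax tree.

```typescript
// Hypothetical chunk shape; line numbers are 1-indexed and inclusive.
interface Chunk {
  startLine: number;
  endLine: number;
  text: string;
}

// Baseline for comparison: fixed-size windows that ignore code structure.
function chunkFixed(source: string, size: number): Chunk[] {
  const lines = source.split("\n");
  const chunks: Chunk[] = [];
  for (let i = 0; i < lines.length; i += size) {
    const slice = lines.slice(i, i + size);
    chunks.push({
      startLine: i + 1,
      endLine: i + slice.length,
      text: slice.join("\n"),
    });
  }
  return chunks;
}

// Matches an unindented (top-level) function or class declaration.
const DECLARATION = /^(export\s+)?(default\s+)?(async\s+)?(function|class)\b/;

// Naive boundary chunker: a top-level declaration opens a chunk, and the
// chunk closes on the line where brace depth returns to zero.
// Code outside any declaration is skipped in this sketch.
function chunkByBoundary(source: string): Chunk[] {
  const lines = source.split("\n");
  const chunks: Chunk[] = [];
  let start = -1; // index of the open chunk's first line, or -1 if none
  let depth = 0;
  let sawBrace = false;

  for (let i = 0; i < lines.length; i++) {
    const line = lines[i];
    if (start === -1) {
      if (!DECLARATION.test(line)) continue;
      start = i;
      depth = 0;
      sawBrace = false;
    }
    for (let j = 0; j < line.length; j++) {
      if (line[j] === "{") {
        depth++;
        sawBrace = true;
      } else if (line[j] === "}") {
        depth--;
      }
    }
    if (sawBrace && depth === 0) {
      chunks.push({
        startLine: start + 1,
        endLine: i + 1,
        text: lines.slice(start, i + 1).join("\n"),
      });
      start = -1;
    }
  }
  return chunks;
}
```

On a file containing a three-line function followed by a five-line class, `chunkFixed(source, 2)` cuts the function after its second line, while `chunkByBoundary(source)` returns one chunk per declaration. Cases like this are natural candidates for the planned chunker unit tests.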
Core Capabilities
System Architecture
Frontend (Phase 1 — shipped)
Backend (scaffolded)
Infrastructure
"A pnpm monorepo laying groundwork for semantic code search. Phase 0/1 are complete (monorepo tooling + UI against mock data); the retrieval pipeline is next. Included here as a design + scaffolding sample, not a finished product."