Case Study · RAG / Knowledge Systems

RAG Knowledge Base

A retrieval-augmented generation system built over internal documentation, letting staff query years of institutional knowledge in natural language and get answers with cited sources.

RAG · Pinecone · OpenAI Embeddings · FastAPI · Python

800+ Docs Indexed

<3s Query Response

85% Search Time Saved

Overview

An organization had accumulated years of internal documentation — SOPs, policy documents, training materials, meeting notes — spread across Google Drive, Notion, and a legacy intranet. New staff spent weeks onboarding, and experienced staff regularly wasted time hunting for information they knew existed somewhere.

The goal: make the entire knowledge base queryable in natural language, with answers grounded in actual source documents and citations provided for verification.

Technical Approach

Built an ingestion pipeline that crawls the connected document sources, chunks content along document structure (headings and paragraphs rather than fixed-size windows), and generates embeddings with an OpenAI text-embedding model.
Stored all vectors in Pinecone with rich metadata (source, date, author, document type) to enable filtered retrieval.
Implemented a FastAPI backend that accepts natural language queries, retrieves the top-k most relevant chunks, and passes them to GPT-4 with a strict grounding prompt.
Responses always include source citations with links back to the original document, so users can verify each answer, unsupported claims are easier to catch, and trust builds over time.
Added a feedback loop where users can rate answers, with low-rated responses flagged for prompt refinement.
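
The query path described above (take the top-k chunks, wrap them in a grounding prompt, return citations) can be sketched roughly as follows. This is an illustrative sketch, not the project's actual code: `RetrievedChunk`, `GROUNDING_RULES`, `build_grounded_prompt`, and `format_citations` are hypothetical names, the prompt wording is an assumption, and the vector store and model calls are left out so the example runs on its own.

```python
from dataclasses import dataclass


@dataclass
class RetrievedChunk:
    """One retrieval hit. Field names are illustrative, not the real schema."""
    text: str
    source: str   # human-readable document title
    url: str      # link back to the original document
    score: float  # similarity score reported by the vector store


# Assumed wording for a "strict grounding" instruction.
GROUNDING_RULES = (
    "Answer using only the numbered context passages below. "
    "Cite every claim with its passage number in square brackets, e.g. [1]. "
    "If the passages do not contain the answer, say you could not find it."
)


def build_grounded_prompt(
    question: str, chunks: list[RetrievedChunk], top_k: int = 5
) -> tuple[str, list[RetrievedChunk]]:
    """Return the prompt text and the chunks actually used, best match first."""
    used = sorted(chunks, key=lambda c: c.score, reverse=True)[:top_k]
    context = "\n\n".join(
        f"[{i}] ({c.source})\n{c.text}" for i, c in enumerate(used, start=1)
    )
    prompt = f"{GROUNDING_RULES}\n\nContext:\n{context}\n\nQuestion: {question}"
    return prompt, used


def format_citations(used: list[RetrievedChunk]) -> list[str]:
    """Map passage numbers back to document links for display with the answer."""
    return [f"[{i}] {c.source}: {c.url}" for i, c in enumerate(used, start=1)]
```

Numbering the passages in the prompt and reusing the same numbers in the citation list is one simple way to tie each cited claim back to a link; the real system may do this differently.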

Results & Learnings

Staff reported an 85% reduction in time spent searching for information. New employee onboarding time dropped significantly — instead of asking colleagues or digging through folders, they could query the system directly. The citation feature was critical for adoption; users trusted answers they could verify.

The most important technical lesson: chunking strategy matters enormously. Naive fixed-size chunking produced poor retrieval quality. Switching to semantic chunking — splitting on meaningful boundaries like headings and paragraphs — dramatically improved answer relevance.
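
A minimal sketch of what splitting on headings and paragraphs can look like, assuming documents have already been converted to text with Markdown-style `#` headings (an assumption; the source formats here were Google Drive, Notion, and an intranet). `chunk_by_structure` and the `max_chars` limit are hypothetical, chosen only for illustration.

```python
import re

# Markdown-style heading line, e.g. "## Carry-over" (assumed input format).
HEADING = re.compile(r"^#{1,6}\s+(.+)$")


def chunk_by_structure(text: str, max_chars: int = 1200) -> list[dict]:
    """Split text into chunks that never cross a heading or cut a paragraph."""
    chunks: list[dict] = []
    heading = ""
    paragraphs: list[str] = []

    def flush() -> None:
        # Emit the buffered paragraphs as one chunk under the current heading.
        if paragraphs:
            chunks.append({"heading": heading, "text": "\n\n".join(paragraphs)})
            paragraphs.clear()

    # Blank lines separate blocks; a block is a heading, a paragraph, or both.
    for block in re.split(r"\n\s*\n", text.strip()):
        first_line, _, rest = block.strip().partition("\n")
        match = HEADING.match(first_line)
        if match:
            flush()  # a new heading closes the previous section
            heading = match.group(1).strip()
            body = rest.strip()
        else:
            body = block.strip()
        if not body:
            continue
        # Start a new chunk rather than exceed the size limit. A single
        # paragraph longer than max_chars is kept whole, not cut mid-sentence.
        if paragraphs and sum(len(p) for p in paragraphs) + len(body) > max_chars:
            flush()
        paragraphs.append(body)

    flush()
    return chunks
```

Keeping the heading alongside each chunk gives the retriever some section context and gives citations a more precise target than the document title alone.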
