Retrieval-Augmented Generation · GraphRAG · Hallucination Mitigation
RCM Knowledge Assistant
A true RAG system — not prompt-stuffing — over public revenue-cycle-management references. It retrieves from a vector index, augments with a domain knowledge graph, answers only from what it found with inline citations, and refuses when its sources don’t cover the question.
How it works
The retrieval pipeline
Each question runs the full guardrailed RAG loop below. Embeddings are computed locally with Xenova/all-MiniLM-L6-v2 (free, no embedding API); generation uses Groq llama-3.3-70b-versatile.
Guard
Requests are rate-limited by IP; a prompt-injection filter then blocks attempts to override the “answer only from context” contract or to exfiltrate the system prompt.
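A minimal sketch of the two guard checks, written in TypeScript on the assumption that the app runs on Node. The page does not publish its limits or filter rules, so `WINDOW_MS`, `MAX_REQUESTS`, the regex list, and the function names `allowRequest` and `looksLikeInjection` are all hypothetical; the limiter is a simple in-memory fixed-window counter.

```typescript
// Hypothetical limits: the real quota is not stated on the page.
const WINDOW_MS = 60000;
const MAX_REQUESTS = 10;

// In-memory fixed-window counter keyed by client IP.
const requestLog = new Map<string, { windowStart: number; count: number }>();

function allowRequest(ip: string, now: number = Date.now()): boolean {
  const entry = requestLog.get(ip);
  if (entry === undefined || now - entry.windowStart >= WINDOW_MS) {
    // First request from this IP, or the previous window has expired.
    requestLog.set(ip, { windowStart: now, count: 1 });
    return true;
  }
  if (entry.count >= MAX_REQUESTS) return false; // over quota for this window
  entry.count += 1;
  return true;
}

// Illustrative patterns only (hypothetical); they target attempts to override
// the "answer only from context" contract or to extract the system prompt.
const INJECTION_PATTERNS: RegExp[] = [
  /ignore (all |any )?(previous|prior|above) (instructions|context)/i,
  /(reveal|print|show|repeat) (me )?(your|the) (system )?prompt/i,
  /answer (without|outside) (the )?(context|sources)/i,
  /you are now\b/i,
];

function looksLikeInjection(question: string): boolean {
  return INJECTION_PATTERNS.some((re) => re.test(question));
}
```

An in-memory map resets on every cold start and is not shared across instances, so a multi-instance deployment would need a shared store. Pattern filters are also easy to paraphrase around, so they complement rather than replace the grounding controls later in the pipeline.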
Embed
The query is embedded locally into the same 384-dimension MiniLM vector space as the corpus, so there is no embedding-API cost and no key to manage.
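Sentence embeddings from MiniLM are usually produced by averaging the per-token outputs (ignoring padding) and then L2-normalizing the result. In transformers.js this is typically requested by calling a `pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2')` extractor with `{ pooling: 'mean', normalize: true }`; whether this app uses exactly those options is an assumption. The sketch below spells out that pooling step on plain arrays, and `meanPoolAndNormalize` is a hypothetical helper name.

```typescript
// Hypothetical helper: attention-masked mean pooling followed by L2 normalization,
// i.e. what `{ pooling: 'mean', normalize: true }` asks transformers.js to do.
function meanPoolAndNormalize(tokenEmbeddings: number[][], mask: number[]): number[] {
  const dim = tokenEmbeddings.length > 0 ? tokenEmbeddings[0].length : 0; // 384 for MiniLM-L6
  const sum: number[] = new Array(dim).fill(0);
  let count = 0;
  tokenEmbeddings.forEach((vec, t) => {
    if (mask[t] === 0) return; // skip padding tokens
    count += 1;
    for (let i = 0; i < dim; i++) sum[i] += vec[i];
  });
  const mean = sum.map((x) => x / Math.max(count, 1));
  // Scale to unit length; an all-zero vector is returned unchanged.
  const norm = Math.sqrt(mean.reduce((acc, x) => acc + x * x, 0));
  return norm === 0 ? mean : mean.map((x) => x / norm);
}
```

Because the output has unit length, the cosine similarity used in the retrieval step reduces to a plain dot product between query and chunk vectors.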
Retrieve + gate
Cosine similarity ranks the chunks and keeps the top k. If the best score is below 0.35, a confidence gate refuses the question as out of scope instead of letting the model hallucinate; this gate is the core hallucination-mitigation control.
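The ranking and the 0.35 gate can be sketched as follows. Only the threshold comes from the page; the `Chunk` shape, the default `k = 5`, and the `retrieve` function are assumptions for illustration.

```typescript
interface Chunk {
  id: string;
  text: string;
  url: string; // public source URL shown in the sources panel
  embedding: number[];
}

interface Hit {
  chunk: Chunk;
  score: number;
}

const MIN_SCORE = 0.35; // the gate threshold quoted on the page

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Rank every chunk by cosine similarity, keep the top k, and refuse when even
// the best match falls below the gate. k = 5 is a hypothetical default.
function retrieve(query: number[], corpus: Chunk[], k = 5): { hits: Hit[]; refused: boolean } {
  const ranked = corpus
    .map((chunk) => ({ chunk, score: cosine(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
  const refused = ranked.length === 0 || ranked[0].score < MIN_SCORE;
  return { hits: refused ? [] : ranked, refused };
}
```

Gating on the single best score, rather than on an average, means one strongly relevant chunk is enough to answer, while a question with no close match anywhere in the corpus is refused outright.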
Graph-augment
A domain knowledge graph (CARC/RARC codes → denial reasons → appeal steps) expands the vector hits with graph-adjacent chunks, surfacing related context that pure vector search would miss.
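One way to implement the expansion is a single hop over an adjacency map, sketched below. The `KnowledgeGraph` shape, the one-hop limit, the `maxExtra` cap, and the node labels in the comments are hypothetical; the page only says that graph-adjacent chunks are added to the vector hits.

```typescript
// Hypothetical graph layout; node ids such as "CARC:50" or
// "denial:medical-necessity" are illustrative labels.
interface KnowledgeGraph {
  edges: Map<string, string[]>;        // node -> adjacent nodes
  chunksByNode: Map<string, string[]>; // node -> ids of chunks that discuss it
  nodesByChunk: Map<string, string[]>; // chunk id -> nodes it mentions
}

// One-hop expansion: for each vector hit, follow edges from the nodes it
// mentions and append chunks attached to the neighbors, up to maxExtra.
function graphAugment(hitIds: string[], kg: KnowledgeGraph, maxExtra = 3): string[] {
  const result = [...hitIds];
  const seen = new Set(hitIds);
  for (const chunkId of hitIds) {
    for (const node of kg.nodesByChunk.get(chunkId) ?? []) {
      for (const neighbor of kg.edges.get(node) ?? []) {
        for (const extra of kg.chunksByNode.get(neighbor) ?? []) {
          if (seen.has(extra)) continue; // already a vector hit or already added
          if (result.length - hitIds.length >= maxExtra) return result;
          seen.add(extra);
          result.push(extra);
        }
      }
    }
  }
  return result;
}
```

Vector hits keep their original order at the front, so graph-adjacent chunks only add context and never displace a higher-scoring match. Reaching appeal steps from a code in one call would need a deeper traversal, such as a breadth-first search with a depth limit.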
Ground & cite
A strict system prompt instructs the model to answer only from the numbered context and to cite sources as [n]. The citation list is then reconciled against the [n] markers the answer actually uses.
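A sketch of the grounding prompt and the citation reconciliation step. The prompt wording, the `Source` shape, and the function names `buildPrompt` and `reconcileCitations` are hypothetical; the reconciliation shown keeps only sources whose `[n]` marker appears in the answer and ignores markers that point outside the retrieved set.

```typescript
interface Source {
  title: string;
  url: string;
  text: string;
}

// Hypothetical prompt wording; the page only says the prompt is strict and
// requires [n] citations drawn from the numbered context.
function buildPrompt(question: string, sources: Source[]): { system: string; user: string } {
  const context = sources.map((s, i) => `[${i + 1}] ${s.title}\n${s.text}`).join("\n\n");
  const system =
    "Answer ONLY from the numbered context. Cite every claim as [n]. " +
    "If the context does not answer the question, say you cannot answer.";
  return { system, user: `Context:\n${context}\n\nQuestion: ${question}` };
}

// Keep only sources whose [n] marker appears in the answer; markers that point
// outside the retrieved set (e.g. a hallucinated [9]) are ignored.
function reconcileCitations(answer: string, sources: Source[]): { n: number; source: Source }[] {
  const used = new Set<number>();
  const marker = /\[(\d+)\]/g;
  let m: RegExpExecArray | null;
  while ((m = marker.exec(answer)) !== null) {
    const n = Number(m[1]);
    if (n >= 1 && n <= sources.length) used.add(n);
  }
  return Array.from(used)
    .sort((a, b) => a - b)
    .map((n) => ({ n, source: sources[n - 1] }));
}
```

Keeping the original numbers (instead of renumbering) lets the inline chips in the answer text point at the same entries shown in the sources panel.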
Return
The API returns { answer, citations, confidence }. The UI renders inline [n] chips, a sources panel with the real public URLs, and a confidence badge.
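On the client, rendering inline chips amounts to splitting the answer on its `[n]` markers. The sketch below assumes `confidence` is a number and that each citation carries its marker number `n`; both the field types and the `toSegments` helper are hypothetical, since the page only names the three top-level keys.

```typescript
// Hypothetical response types; only the key names come from the page.
interface Citation {
  n: number;
  title: string;
  url: string;
}

interface AskResponse {
  answer: string;
  citations: Citation[];
  confidence: number;
}

type Segment = { kind: "text"; text: string } | { kind: "cite"; n: number; url?: string };

// Split the answer into plain-text runs and [n] chips; a chip whose number has
// no matching citation gets no URL, so the UI can render it as unlinked.
function toSegments(res: AskResponse): Segment[] {
  const byN = new Map(res.citations.map((c) => [c.n, c.url] as [number, string]));
  const segments: Segment[] = [];
  const marker = /\[(\d+)\]/g;
  let last = 0;
  let m: RegExpExecArray | null;
  while ((m = marker.exec(res.answer)) !== null) {
    if (m.index > last) segments.push({ kind: "text", text: res.answer.slice(last, m.index) });
    const n = Number(m[1]);
    segments.push({ kind: "cite", n, url: byN.get(n) });
    last = m.index + m[0].length;
  }
  if (last < res.answer.length) segments.push({ kind: "text", text: res.answer.slice(last) });
  return segments;
}
```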
Why this exists
A domain-grounded RAG proof of work
This is a portfolio demo for an AI Architect application. It targets the role’s stated requirements (RAG and GraphRAG, retrieval guardrails, and hallucination mitigation) applied to the revenue-cycle domain. The guardrails are real: a confidence gate that refuses, an injection filter, citation reconciliation, rate limiting, and a deterministic extractive fallback that keeps answers grounded even without an LLM key. It is labeled honestly as graph-augmented retrieval rather than full GraphRAG, and it is built entirely on public standards (X12 / WPC / CMS).