Retrieval-Augmented Generation · GraphRAG · Hallucination Mitigation

RCM Knowledge Assistant

A true RAG system — not prompt-stuffing — over public revenue-cycle-management references. It retrieves from a vector index, augments with a domain knowledge graph, answers only from what it found with inline citations, and refuses when its sources don’t cover the question.

Public sources: 31

RCM domains

Graph relationships: 590

Vector dimensions: 384

Prefer to read first? Browse the Knowledge Base — the same 31 public references, searchable.

See it answer

Grounded, cited, confidence-gated

Example answer

High confidence · 0.71

What does CARC 16 mean, and how do I work it?

CARC 16 means the claim or service lacks required information or contains a submission/billing error, and it must always be accompanied by at least one Remark Code (RARC) that specifies exactly what is missing [1]. Because it signals a data defect rather than a coverage denial, the fix is usually a corrected claim, not a formal appeal: read the paired RARC, supply the missing element (a required modifier, the rendering NPI, or a referring-provider ID), and resubmit [2].

Sources

[1] X12 / Washington Publishing Company · Claim Adjustment Reason Codes (CARC 16)
[2] CMS · Medicare Claims Processing Manual (claim-correction guidance)

A sample of what the assistant returns: grounded, cited, and confidence-scored. Ask your own question below.

Try it yourself

Ask your own question

How it works

The retrieval pipeline

Each question runs through the full guardrailed RAG loop below. The corpus is embedded once at build time with Xenova/all-MiniLM-L6-v2; at query time, a lightweight hosted endpoint turns the question into a single 384-d vector, so the serverless function ships zero ML weights and stays well under Vercel’s 250 MB cap. Generation uses moonshotai/kimi-k2-instruct served on Groq.

Guard

Requests are rate-limited by IP; a prompt-injection filter then blocks attempts to override the “answer only from context” contract or to exfiltrate the system prompt.
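A minimal sketch of the guard stage, assuming a fixed-window rate limiter and regex-based injection detection. The window length, request budget, pattern list, and every function name here are hypothetical; the project’s actual limits and filter rules are not given.

```typescript
// Hypothetical guard stage: fixed-window rate limiting per IP, then a
// regex-based prompt-injection check. Limits and patterns are illustrative.

const WINDOW_MS = 60_000; // assumed window length (1 minute)
const MAX_REQUESTS = 10; // assumed per-IP budget per window

// In-memory counters keyed by client IP.
const ipWindows = new Map<string, { start: number; count: number }>();

function allowRequest(ip: string, now: number = Date.now()): boolean {
  const w = ipWindows.get(ip);
  if (w === undefined || now - w.start >= WINDOW_MS) {
    // First request, or the previous window expired: open a new window.
    ipWindows.set(ip, { start: now, count: 1 });
    return true;
  }
  if (w.count >= MAX_REQUESTS) return false;
  w.count += 1;
  return true;
}

// Illustrative patterns for overriding the "answer only from context"
// contract or exfiltrating the system prompt.
const INJECTION_PATTERNS: RegExp[] = [
  /ignore (all |any )?(previous|prior|above) (instructions|rules)/i,
  /(reveal|print|show|repeat) (me )?(your|the) (system )?prompt/i,
  /you are no longer/i,
  /answer without (using )?(the )?(context|sources)/i,
];

function looksLikeInjection(question: string): boolean {
  return INJECTION_PATTERNS.some((re) => re.test(question));
}

type GuardResult =
  | { ok: true }
  | { ok: false; reason: "rate_limited" | "injection" };

// Rate limit first, then the injection filter, as in the step description.
function guard(ip: string, question: string, now: number = Date.now()): GuardResult {
  if (!allowRequest(ip, now)) return { ok: false, reason: "rate_limited" };
  if (looksLikeInjection(question)) return { ok: false, reason: "injection" };
  return { ok: true };
}
```

An in-memory map only limits requests reaching the same warm function instance; for the limit to hold across serverless instances, the counters would need a shared store.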

Embed

The query is embedded into the same 384-dimension MiniLM space as the prebuilt corpus index, so query and chunk vectors are directly comparable. The hosted endpoint returns only that one vector, which keeps the model weights out of the serverless bundle.
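Retrieval is only meaningful if the query vector really lives in the corpus’s space. A small validation sketch, assuming the hosted endpoint returns a plain JSON array of numbers (its response format is not documented here); `prepareQueryVector` is a hypothetical helper that rejects anything that is not 384-d and L2-normalizes the result.

```typescript
// Sketch: validate a query vector from the (assumed) hosted endpoint before
// comparing it with the corpus index. 384 matches all-MiniLM-L6-v2.
const EXPECTED_DIM = 384;

function prepareQueryVector(raw: unknown): Float32Array {
  if (!Array.isArray(raw) || raw.length !== EXPECTED_DIM) {
    throw new Error(`expected a ${EXPECTED_DIM}-d vector`);
  }
  const v = new Float32Array(EXPECTED_DIM);
  let sumSq = 0;
  for (let i = 0; i < EXPECTED_DIM; i++) {
    const x = raw[i];
    if (typeof x !== "number" || !Number.isFinite(x)) {
      throw new Error(`non-numeric component at index ${i}`);
    }
    v[i] = x;
    sumSq += x * x;
  }
  const norm = Math.sqrt(sumSq);
  if (norm === 0) throw new Error("zero vector");
  // L2-normalize so cosine similarity against normalized corpus vectors
  // reduces to a plain dot product.
  for (let i = 0; i < EXPECTED_DIM; i++) v[i] /= norm;
  return v;
}
```

Failing loudly on a dimension mismatch is deliberate: a vector from a different model would still produce similarity scores, just meaningless ones, and the confidence gate could not catch that.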

Retrieve + gate

Cosine similarity ranks the indexed chunks against the query and keeps the top k. If the best score is below 0.35, a confidence gate refuses the question as out of scope instead of letting the model hallucinate an answer; this is the core hallucination-mitigation control.
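A brute-force sketch of retrieval plus the confidence gate. Only the 0.35 threshold comes from the description; the `Chunk` shape, the default `k = 5`, and the function names are assumptions.

```typescript
// Sketch: rank every chunk by cosine similarity, keep the top k, and refuse
// when the best score is under the documented 0.35 gate.
interface Chunk { id: string; sourceId: number; text: string; vector: number[] }
interface Hit { chunk: Chunk; score: number }

type RetrievalResult =
  | { refused: true; bestScore: number }
  | { refused: false; bestScore: number; hits: Hit[] };

const MIN_BEST_SCORE = 0.35;

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function retrieve(query: number[], index: Chunk[], k = 5): RetrievalResult {
  const ranked = index
    .map((chunk) => ({ chunk, score: cosine(query, chunk.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
  const bestScore = ranked.length > 0 ? ranked[0].score : 0;
  // Confidence gate: refuse rather than let the model answer from weak context.
  if (bestScore < MIN_BEST_SCORE) return { refused: true, bestScore };
  return { refused: false, bestScore, hits: ranked };
}
```

For a small corpus, a linear scan like this is usually fast enough and avoids running a separate vector database; gating on the single best score keeps the refusal rule easy to reason about.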

Graph-augment

A domain knowledge graph (CARC/RARC codes → denial reasons → appeal steps) expands the vector hits with graph-adjacent chunks, surfacing related context that pure vector search would miss.
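A one-hop expansion sketch. The node ids, edges, `maxExtra` cap, and the choice to seed the walk from graph nodes whose chunks appeared in the vector hits are all hypothetical; the real graph schema is not described beyond the CARC/RARC → denial reason → appeal step chain.

```typescript
// Sketch: each graph node lists its neighbors and the chunk ids that discuss
// it. Vector hits seed the walk; one hop adds neighbor chunks not yet retrieved.
interface GraphNode { neighbors: string[]; chunkIds: string[] }
type KnowledgeGraph = Record<string, GraphNode>;

function expandWithGraph(graph: KnowledgeGraph, hitIds: string[], maxExtra = 3): string[] {
  const seen = new Set<string>(hitIds);
  // Seed nodes: graph nodes mentioned by at least one vector hit.
  const seeds = Object.keys(graph).filter((id) =>
    graph[id].chunkIds.some((c) => seen.has(c)),
  );
  const extra: string[] = [];
  for (const seed of seeds) {
    for (const neighbor of graph[seed].neighbors) {
      for (const chunkId of graph[neighbor]?.chunkIds ?? []) {
        if (extra.length >= maxExtra) return [...hitIds, ...extra];
        if (!seen.has(chunkId)) {
          seen.add(chunkId);
          extra.push(chunkId);
        }
      }
    }
  }
  // Vector hits stay first; graph-adjacent chunks are appended after them.
  return [...hitIds, ...extra];
}
```

Capping the number of added chunks keeps graph context from crowding out the vector hits that actually passed the confidence gate.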

Ground & cite

A strict system prompt instructs the model to answer only from the numbered context and to cite sources as [n]. Citations are then reconciled against the [n] markers the answer actually uses, so only sources the answer cites are returned.
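A sketch of the two grounding helpers, assuming each retrieved source carries a title, URL, and text. `SYSTEM_PROMPT` is illustrative wording, not the project’s prompt, and the helper names are hypothetical; `reconcileCitations` keeps only in-range [n] markers, so a citation to a source that was never sent is dropped rather than shown.

```typescript
// Sketch: number the context for the prompt, then keep only the sources the
// answer actually cites.
interface ContextSource { title: string; url: string; text: string }

// Number retrieved sources [1]..[n] so the model can cite them.
function buildContext(sources: ContextSource[]): string {
  return sources.map((s, i) => `[${i + 1}] ${s.title}\n${s.text}`).join("\n\n");
}

// Illustrative wording only.
const SYSTEM_PROMPT =
  "Answer ONLY from the numbered context and cite each claim as [n]. " +
  "If the context does not cover the question, say you cannot answer.";

interface ReconciledCitation { n: number; source: ContextSource }

function reconcileCitations(answer: string, sources: ContextSource[]): ReconciledCitation[] {
  const used: number[] = [];
  const marker = /\[(\d+)\]/g;
  let m: RegExpExecArray | null;
  while ((m = marker.exec(answer)) !== null) {
    const n = Number(m[1]);
    // Drop out-of-range markers (e.g. [7] when only 3 sources were sent) and duplicates.
    if (n >= 1 && n <= sources.length && used.indexOf(n) === -1) used.push(n);
  }
  return used.sort((a, b) => a - b).map((n) => ({ n, source: sources[n - 1] }));
}
```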

Return

The API returns { answer, citations, confidence }. The UI renders inline [n] chips, a sources panel with the real public URLs, and a confidence badge.
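The response contract as a TypeScript type, with a hypothetical helper that renders badge text in the form seen in the example answer. Treating `confidence` as the best retrieval score, the citation fields, and the High/Medium/Low cut-offs are assumptions; only the 0.35 refusal gate is stated in the text.

```typescript
// Sketch of the { answer, citations, confidence } contract.
interface Citation { n: number; title: string; url: string }

interface RagResponse {
  answer: string;
  citations: Citation[];
  confidence: number; // assumed: best retrieval score (0.71 in the example answer)
}

type Badge = "High" | "Medium" | "Low";

// Hypothetical cut-offs; only the 0.35 refusal gate is documented.
function confidenceBadge(confidence: number): Badge {
  if (confidence >= 0.6) return "High";
  if (confidence >= 0.45) return "Medium";
  return "Low";
}

// Produces text like "High confidence · 0.71".
function badgeText(res: RagResponse): string {
  return `${confidenceBadge(res.confidence)} confidence · ${res.confidence.toFixed(2)}`;
}
```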

Why this exists

A domain-grounded RAG proof of work

This is a portfolio demo for an AI Architect application. It targets the role’s exact asks (RAG and GraphRAG, retrieval guardrails, and hallucination mitigation), applied to the revenue-cycle domain. The guardrails are real: a confidence gate that refuses, an injection filter, citation reconciliation, rate limiting, and a deterministic extractive fallback that keeps answers grounded even without an LLM API key. It is honestly labeled as graph-augmented retrieval and built entirely on public standards (X12 / WPC / CMS).
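One way a deterministic extractive fallback could work, sketched under the assumption that plain token overlap with the question is the ranking signal (the project’s actual scoring is not described). The hypothetical `extractiveAnswer` selects the best-matching retrieved sentences verbatim and appends their citation numbers, so every word returned comes from a source.

```typescript
// Sketch: with no LLM available, return the retrieved sentences that share
// the most tokens with the question, each followed by its [n] citation.
interface RetrievedChunk { n: number; text: string } // n = citation number

function tokens(s: string): string[] {
  return s.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

function extractiveAnswer(question: string, chunks: RetrievedChunk[], maxSentences = 2): string {
  const q = tokens(question);
  const candidates: { sentence: string; n: number; score: number; order: number }[] = [];
  for (const c of chunks) {
    // Naive sentence split on terminal punctuation.
    for (const raw of c.text.match(/[^.!?]+[.!?]*/g) ?? []) {
      const sentence = raw.trim();
      const score = tokens(sentence).filter((w) => q.indexOf(w) !== -1).length;
      if (score > 0) candidates.push({ sentence, n: c.n, score, order: candidates.length });
    }
  }
  // Highest overlap first; ties keep retrieval order, so output is deterministic.
  candidates.sort((a, b) => b.score - a.score || a.order - b.order);
  return candidates
    .slice(0, maxSentences)
    .map((c) => `${c.sentence} [${c.n}]`)
    .join(" ");
}
```

Because nothing is generated, the fallback cannot introduce facts the sources lack; the trade-off is stiffer prose, and an empty result should be treated as a refusal.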