Building an MCP RAG Server: Enhancing Developer Tools with Contextual Documentation Search
Let’s be real. Documentation is often the unsung hero of developer experience. But even the best docs are useless if they’re buried somewhere nobody looks. For years, we’ve been trying to solve this with better search, better organization, and better writing. Yet somehow, most developers still resort to Google searches or asking teammates instead of digging through official docs.
What if there was a way to bring documentation directly into developers’ workflow? No context switching, no hunting through multiple tabs, no fighting with search engines that don’t understand what you’re really asking.
That’s what I’ve been experimenting with recently: an MCP RAG server that lets developers search documentation without ever leaving their code editor.
What’s the Problem?
In my previous posts I referred to an imaginary company called SuperNova. Let’s keep that going. Imagine that at SuperNova we had a growing documentation problem.
Our mobile SDK had hundreds of components, each with its own configuration options, examples, and gotchas. New team members were spending days trying to find the right information, and even experienced devs were wasting time hunting through multiple sources.
The problem isn’t the quality of our docs. It’s the experience of accessing them.
Breaking Down the Solution
Before jumping into code, let’s untangle what each piece of this solution actually does:
MCP (Model Context Protocol)
MCP is an open protocol that connects AI assistants to tools, letting them reach outside the chat window to access specific functions. Think of it as a standardized way for LLMs to call external tools and bring results back to your conversation.
In practical terms, it’s what allows coding assistants like Cursor to run executable code, access real-time data, or query custom databases - all without leaving the editor.
RAG (Retrieval-Augmented Generation)
RAG is a technique that enhances language models by retrieving relevant information from a knowledge base before generating responses. Instead of relying solely on what the model was trained on, RAG pulls in specific contextual information that might be more accurate, up-to-date, or domain-specific.
For documentation search, this means the model can pull in the exact reference material needed to answer a technical question, rather than hallucinating or giving general advice.
Vector Stores
Vector databases store and retrieve information based on semantic similarity, not just keyword matching. They convert text into numerical vectors (embeddings) that capture meaning, allowing for more intelligent “fuzzy” searches.
For our docs, this means developers can ask questions in natural language and get relevant results even if they don’t use the exact terminology from the documentation.
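To make that concrete, here’s a toy illustration (not from the repo) of what “semantic similarity” boils down to: once two pieces of text have been embedded as vectors, comparing them is just a distance calculation, most commonly cosine similarity.
// Toy example, not part of the actual server code.
// Once text is embedded as number[] vectors, "how similar are these?"
// reduces to cosine similarity between the two vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  const dot = a.reduce((sum, v, i) => sum + v * b[i], 0);
  const magA = Math.sqrt(a.reduce((sum, v) => sum + v * v, 0));
  const magB = Math.sqrt(b.reduce((sum, v) => sum + v * v, 0));
  return dot / (magA * magB);
}
The vector store does exactly this comparison, just across thousands of stored chunks at once.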
So how do all these pieces fit into a single flow developers can use without friction?
How They Work Together
Here’s how these components create a seamless documentation experience:
- A developer asks a question in Cursor’s chat interface
- The MCP protocol connects their question to our custom search tool
- Our RAG system finds relevant documentation chunks using semantic search
- The results get passed back to the assistant, which can now give an informed answer
No browser tabs. No context switching. Just answers, right there in the editor.
Building the Solution
I didn’t plan to build a local RAG server. But once I saw how well MCP plugged into Cursor, it clicked — this could actually work.
System Architecture
flowchart LR
Dev[Developer in Cursor] -->|"Ask Question"| Cursor[Cursor AI Assistant]
Cursor <-->|MCP Tool Call| MCP[MCP RAG Server]
MCP -->|Search Query| Vector[Vector Store]
Vector -->|Relevant Chunks| MCP
MCP -->|Contextual Results| Cursor
Cursor -->|Enhanced Answer| Dev
subgraph MCP RAG Server
Vector
Docs[(SuperNova HTML Docs)]
Embed[HuggingFace Embeddings]
end
Docs -->|Process & Chunk| Embed
Embed -->|Store| Vector
Setting Up the MCP Server
The heart of our system is the MCP server that exposes a tool for searching documentation. Here’s the key code snippet from mcp.ts:
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import { z } from 'zod';
import { rag } from './rag';

export const mcp = () => {
  const server = new McpServer({
    name: 'Supernova MCP RAG Server',
    version: '1.0.0',
  });

  const { initializeVectorStore, getVectorStore } = rag();

  // Expose a single tool the assistant can call to search the docs
  server.tool(
    'search_docs',
    {
      query: z.string(),
      k: z.number().optional().default(3),
    },
    async ({ query, k }) => {
      const vectorStore = getVectorStore();
      if (!vectorStore) throw new Error('Vector store not initialized');

      const results = await vectorStore.similaritySearch(query, k ?? 3);
      return {
        content: results.map(r => ({
          type: 'text' as const,
          text: r.pageContent,
          source: r.metadata.source,
        })),
      };
    }
  );

  // Build the vector store, then start listening for tool calls over stdio
  async function main() {
    await initializeVectorStore();
    const transport = new StdioServerTransport();
    await server.connect(transport);
  }

  return { main };
};
This defines a server with a single tool called search_docs that takes a query string and returns relevant documentation chunks.
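The entry point that actually boots the server isn’t shown above. A minimal index.js/index.ts might look like this (a sketch, assuming mcp() returns its main function as in the snippet above); it’s the compiled file the Cursor configuration later points at:
// index.ts: hypothetical entry point that starts the MCP RAG server
import { mcp } from './mcp';

mcp()
  .main()
  .catch((err) => {
    console.error('Failed to start MCP RAG server:', err);
    process.exit(1);
  });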
Building the RAG Pipeline
The real magic happens in the RAG implementation. Here’s where we:
- Find all HTML documentation files
- Extract and clean the text
- Split into manageable chunks
- Create embeddings using HuggingFace
- Store everything in an in-memory vector database
Here’s how the core RAG functionality works in rag.ts:
import fs from 'node:fs/promises';
import path from 'node:path';

// Recursively walk the docs directory, yielding .html file paths one at a time
async function* getAllHtmlFiles(dir: string): AsyncGenerator<string> {
  const dirents = await fs.readdir(dir, { withFileTypes: true });
  for (const dirent of dirents) {
    const res = path.resolve(dir, dirent.name);
    if (dirent.isDirectory()) {
      yield* getAllHtmlFiles(res);
    } else if (dirent.isFile() && res.endsWith('.html')) {
      yield res;
    }
  }
}
I used a generator function here for a reason. When recursively searching through a large documentation set, loading everything into memory at once can be inefficient. The generator pattern allows us to process files one at a time, keeping memory usage low while still handling a large number of files.
This function yields paths one by one as it finds them, rather than building a giant array in memory. It’s a small optimization, but these kinds of details matter when you’re dealing with potentially thousands of documentation files.
With the files identified, we can process them into a format suitable for semantic search.
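The article doesn’t reproduce all of rag.ts, but conceptually it follows the five steps listed above. Here’s a condensed sketch of what the rest of the file might look like, assuming LangChain’s MemoryVectorStore and the HuggingFace Inference API for embeddings; names like CHUNK_SIZE, stripHtml, and the ./docs path are illustrative, and the real implementation may differ in these details:
// rag.ts (condensed sketch; chunk size, cleanup, and paths are assumptions)
import fs from 'node:fs/promises';
import { Document } from '@langchain/core/documents';
import { HuggingFaceInferenceEmbeddings } from '@langchain/community/embeddings/hf';
import { MemoryVectorStore } from 'langchain/vectorstores/memory';

const CHUNK_SIZE = 1000;

// Naive HTML-to-text cleanup: drop tags and collapse whitespace
const stripHtml = (html: string) =>
  html.replace(/<[^>]+>/g, ' ').replace(/\s+/g, ' ').trim();

export const rag = () => {
  let vectorStore: MemoryVectorStore | null = null;

  const initializeVectorStore = async () => {
    const docs: Document[] = [];

    // Walk the docs folder, clean each HTML file, and split the text into chunks
    for await (const file of getAllHtmlFiles('./docs')) {
      const text = stripHtml(await fs.readFile(file, 'utf-8'));
      for (let i = 0; i < text.length; i += CHUNK_SIZE) {
        docs.push(
          new Document({
            pageContent: text.slice(i, i + CHUNK_SIZE),
            metadata: { source: file },
          }),
        );
      }
    }

    // Embed every chunk via the HuggingFace Inference API and keep it in memory
    const embeddings = new HuggingFaceInferenceEmbeddings({
      apiKey: process.env.HUGGINGFACEHUB_API_KEY,
    });
    vectorStore = await MemoryVectorStore.fromDocuments(docs, embeddings);
  };

  const getVectorStore = () => vectorStore;

  return { initializeVectorStore, getVectorStore };
};
The fixed-size chunking here is deliberately simple; the LangChain section below looks at smarter splitting.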
Real-World Considerations
This implementation works well for small to medium documentation sets, but there are some practical considerations to keep in mind before using it in production:
Performance Tradeoffs
The current setup comes with a few deliberate performance tradeoffs:
- In-Memory Storage: Fast for small to medium documentation sets, but not ideal for very large sets, e.g. 1000+ files.
- On-the-Fly Embedding: We generate embeddings at server startup, which makes initial startup slower but ensures all content is searchable.
- Sequential Processing: Files are processed one after another for simplicity.
For a POC or small team documentation, this approach works fine. The startup time is acceptable, and the memory usage is reasonable.
API Rate Limits
One practical challenge with free services is rate limits. The HuggingFace Inference API has limits (around 300 requests/hour for free users). If you have thousands of documentation chunks, you might hit these limits during initialization.
For a production system, you might want to:
- Batch your embedding requests
- Cache embeddings to disk so restarts don’t re-embed unchanged content (see the sketch after this list)
- Consider a paid API plan or self-hosted embedding model
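As an example of the caching idea, a disk cache keyed by a hash of each chunk could sit in front of the embedding call. This is a hypothetical helper, not part of the repo; it assumes an embedDocuments function with the same shape as LangChain’s embedding classes:
// Hypothetical helper: cache embeddings on disk, keyed by a SHA-256 hash of
// the chunk text, so unchanged chunks never hit the rate-limited API again.
import { createHash } from 'node:crypto';
import fs from 'node:fs/promises';

const CACHE_FILE = '.embeddings-cache.json';

export async function embedWithCache(
  texts: string[],
  embedDocuments: (texts: string[]) => Promise<number[][]>,
): Promise<number[][]> {
  // Load any previously cached vectors (start fresh if the file doesn't exist)
  let cache: Record<string, number[]> = {};
  try {
    cache = JSON.parse(await fs.readFile(CACHE_FILE, 'utf-8'));
  } catch {
    // no cache yet
  }

  // Work out which chunks still need embedding
  const hashes = texts.map(t => createHash('sha256').update(t).digest('hex'));
  const missing = new Map<string, string>(); // hash -> text
  hashes.forEach((h, i) => {
    if (!cache[h] && !missing.has(h)) missing.set(h, texts[i]);
  });

  // Only call the API for chunks we haven't seen before
  if (missing.size > 0) {
    const fresh = await embedDocuments([...missing.values()]);
    [...missing.keys()].forEach((h, i) => {
      cache[h] = fresh[i];
    });
    await fs.writeFile(CACHE_FILE, JSON.stringify(cache));
  }

  return hashes.map(h => cache[h]);
}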
Extending With LangChain and LangGraph
This setup works on its own, but if you want to extend it, LangChain and LangGraph offer some ready-made components:
- Better Chunking Strategies: LangChain provides more sophisticated text splitting methods that can preserve document structure (see the sketch after this list)
- Chain of Thought Reasoning: Build more complex query pipelines that refine search results
- Hybrid Search: Combine semantic and keyword search for better precision
- Source Attribution: Track exactly which parts of the documentation contributed to answers
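As an example of the first point, swapping the naive fixed-size chunking for LangChain’s RecursiveCharacterTextSplitter is only a few lines. This is a sketch; the package name and option values below assume the current LangChain JS packages:
// Sketch: replace fixed-size slicing with LangChain's recursive splitter,
// which tries to break on paragraphs and sentences before falling back to
// hard character cuts.
import { RecursiveCharacterTextSplitter } from '@langchain/textsplitters';

const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,   // target characters per chunk
  chunkOverlap: 200, // overlap so context isn't lost at chunk boundaries
});

// createDocuments takes raw strings plus optional per-document metadata,
// so the source file path travels with each chunk.
async function chunkFile(text: string, filePath: string) {
  return splitter.createDocuments([text], [{ source: filePath }]);
}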
Using It In Practice
Here’s how a developer actually uses this in their workflow:
- Open Cursor
- Configure Cursor to point at your MCP server:
{
  "mcpServers": {
    "mcp-rag-server": {
      "command": "node",
      "args": [
        "/path-to/supernova-mcp-rag/mcp-rag-server/dist/index.js"
      ],
      "disabled": false,
      "autoApprove": []
    }
  }
}
- Ask questions about documentation directly in their editor
For example, they might ask:
“Can you call the MCP tool to look up how to add a hero card using SuperNovaStorybook-Mobile-Swift?”
The assistant now has direct access to the relevant documentation chunks and can provide an accurate, contextual answer without the developer ever having to leave their editor.
Beyond Documentation: Other Applications
This same pattern can be applied to other developer experience improvements:
- Searching codebase for examples and patterns
- Finding relevant test cases
- Locating configuration options across microservices
- Exploring architecture decisions from ADRs
Is It Worth It?
Let’s break down the cost-benefit analysis:
Time Investment:
- Initial setup
- Customizing for your docs
- Adding to your workflow
Benefits:
- Reduced context switching during development
- Faster onboarding for new team members
- More accurate answers based on your actual documentation
- Documentation usage analytics (if you add telemetry)
For teams with substantial documentation or specialized knowledge requirements, the ROI can be significant. Even saving 15 minutes per developer per day adds up quickly: for a ten-person team, that is roughly 2.5 hours a day, or on the order of 600 hours a year.
Key Takeaways
- Modern tools make this accessible: You don’t need deep ML expertise to implement a practical RAG system
- Start small: An in-memory approach works fine for most documentation sets
- Free tiers work for POCs: HuggingFace’s free embedding API is sufficient for testing
- Workflow integration matters: The best documentation search is the one developers actually use
This is just the beginning of what’s possible with MCP and RAG for developer tools. The core idea—bringing information to where developers already work—can be extended in countless ways.
For your next project, consider: what other information could you bring directly into your development environment?
The full code is available at github.com/shabib87/supernova-mcp-rag