RAG for Security: Evolution and Modern Implementation

Anthony G. Tellez
Tags: RAG, Claude, AI, Security Analytics, Vector Search, Anthropic, LLM, Suricata, MITRE ATT&CK, Embeddings, CISA

In November 2024, I presented at SuriCon 2024 about using RAG for security operations. That work, done in partnership with Graphistry as part of my work at BNY Mellon, focused on OpenAI embeddings and graph visualization to transform Suricata rule management. Read the original SuriCon post to see what we built. This post covers how the approach has evolved since then, including a working demo you can run below.

From OpenAI to Claude

The SuriCon implementation used OpenAI's text-embedding-3-small for retrieval and GPT-4o for generation. The current implementation keeps the same embedding model for document and query encoding but moves generation to Anthropic's Claude Sonnet. The migration confirmed something worth stating directly: LLM choice matters less than you might think at the generation step. Claude's responses stay closer to retrieved context and are less likely to elaborate beyond what the source material supports — useful for security queries where a hallucinated CVE identifier or technique ID is worse than no answer. The pipeline itself is identical regardless of provider: embed the query, retrieve the top documents, pass them with the question to the model.
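That provider-agnostic shape can be sketched as a small orchestration function. This is a minimal sketch, not the demo's actual code; `embed`, `retrieve`, and `generate` are injected stand-ins for whichever provider backs each step.

```javascript
// Minimal sketch of the three-step pipeline described above. The embed,
// retrieve, and generate functions are injected, so swapping OpenAI for
// Anthropic (or anything else) changes nothing structurally.
async function answer(question, { embed, retrieve, generate }) {
  const queryVec = await embed(question);   // 1. embed the query
  const docs = await retrieve(queryVec, 5); // 2. retrieve the top documents
  return generate(question, docs);          // 3. generate a grounded answer
}
```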

The embedding model and retrieval quality are where architecture decisions actually matter. Document embeddings are generated offline using text-embedding-3-small and stored on Cloudflare R2. At query time, the same model encodes the user's question so both vectors live in the same 1,536-dimensional space. Cosine similarity against the pre-computed corpus runs in the browser in under 100ms.
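The retrieval step itself is small enough to show in full. Here is a sketch of browser-side cosine similarity over a pre-computed corpus, assuming each document object carries its embedding vector; all names are illustrative.

```javascript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every document against the query vector and keep the top k.
function topK(queryVec, corpus, k = 5) {
  return corpus
    .map((doc) => ({ ...doc, score: cosineSimilarity(queryVec, doc.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```

A linear scan like this is why a few thousand 1,536-dimensional vectors stay comfortably under 100ms in a browser.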

Expanding the Knowledge Base

The SuriCon work was scoped to Suricata rules from a single ruleset. The current implementation draws from four sources:

  • MITRE ATT&CK — ~500 enterprise techniques with tactic mappings and procedure descriptions
  • Emerging Threats Open — ~300 detection rules across 21 categories: malware, exploit kits, botnet C2, phishing, shellcode, web attacks, JA3 TLS fingerprints, and current campaigns
  • CISA Known Exploited Vulnerabilities — 1,536 real CVEs actively exploited in the wild, each tagged with vendor, product, required remediation action, and a flag for known ransomware campaign use
  • Suricata Documentation — 57 chunks from the official docs covering rule syntax, keywords, thresholding, and configuration

Total corpus: ~2,400 documents, each embedded with text-embedding-3-small and stored as a single JSON file on R2. The browser fetches it once on page load and caches it in memory.
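The fetch-once-and-cache behavior is a few lines of JavaScript. A sketch, with a placeholder URL and an injectable fetch function; the real implementation may differ.

```javascript
// Load the embeddings JSON on first use and reuse the in-memory promise
// afterwards, so repeat queries never re-download the corpus.
let corpusPromise = null;

function loadCorpus(fetchFn = fetch, url = "https://example.r2.dev/corpus.json") {
  if (!corpusPromise) {
    corpusPromise = fetchFn(url).then((res) => res.json());
  }
  return corpusPromise;
}
```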

Expanding the scope surfaced a problem the single-domain version did not have: semantic search across multiple knowledge domains requires the source data to carry enough context for embeddings to be meaningful. Early testing with unenriched Suricata rules produced retrieval results that were technically related but practically wrong. An analyst querying for "lateral movement" would get network reconnaissance rules back, because both share vocabulary around port scanning and network connections. The fix was enriching source text before embedding — appending category, protocol, and any available metadata so the resulting vector captures intent rather than surface syntax.
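The enrichment step amounts to simple text assembly before the embedding call. The field names below are assumptions about the rule object's shape, not the actual implementation:

```javascript
// Build the text that actually gets embedded: the rule message plus category,
// protocol, and any available metadata, so the resulting vector captures
// intent rather than surface syntax.
function enrichRule(rule) {
  const parts = [rule.msg, `category: ${rule.category}`, `protocol: ${rule.protocol}`];
  for (const [key, value] of Object.entries(rule.metadata || {})) {
    parts.push(`${key}: ${value}`);
  }
  return parts.join(" | ");
}
```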

The hybrid search approach matters here too. Dense vector search alone misses exact CVE identifiers and specific technique IDs that analysts type precisely. Combining semantic similarity with keyword scoring handles both cases: fuzzy conceptual queries and exact string lookups.
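One simple way to combine the two signals is a weighted blend. This is a sketch with an assumed 70/30 weighting, not the demo's actual scoring function:

```javascript
// Keyword overlap catches exact strings like "CVE-2019-0708" that dense
// vectors can miss: fraction of query terms present in the document text.
function keywordScore(query, text) {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  const haystack = text.toLowerCase();
  const hits = terms.filter((t) => haystack.includes(t)).length;
  return terms.length ? hits / terms.length : 0;
}

// alpha weights semantic similarity against keyword overlap (assumed value).
function hybridScore(semantic, keyword, alpha = 0.7) {
  return alpha * semantic + (1 - alpha) * keyword;
}
```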

What the CISA KEV Adds

This is the most practically useful addition. The CISA Known Exploited Vulnerabilities catalog is a living list of CVEs that threat actors are actively using right now — not theoretical vulnerabilities but ones observed in real attacks. Each entry includes the vendor and product name, a short description, a required remediation action, a due date under CISA's Binding Operational Directive, and a flag for whether the CVE has appeared in a documented ransomware campaign.
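As a concrete example, filtering the catalog for ransomware-linked CVEs from one vendor takes a few lines. The field names (`cveID`, `vendorProject`, `knownRansomwareCampaignUse`) match CISA's published KEV JSON feed; the function itself is illustrative:

```javascript
// Return the CVE IDs for a vendor's entries flagged as used in documented
// ransomware campaigns (the feed uses "Known" / "Unknown" for that field).
function ransomwareCves(kevEntries, vendor) {
  return kevEntries
    .filter((e) => e.vendorProject === vendor && e.knownRansomwareCampaignUse === "Known")
    .map((e) => e.cveID);
}
```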

The retrieval quality on vulnerability questions is noticeably different with real KEV data compared to the synthetic CVE dataset it replaced. A query like "what Microsoft RDP vulnerabilities need immediate patching" retrieves CVE-2019-0708 (BlueKeep), CVE-2019-1182, and other RDP-specific entries with accurate vendor guidance attached — rather than returning generated text that approximates what such an advisory might say.

Architecture: Static Site, Browser-Side RAG

The production implementation at BNY Mellon ran on OpenSearch, BigQuery, and VectorAI with server-side retrieval and generation. This portfolio implementation runs entirely in the browser with pre-computed embeddings. That is not an architecture I would recommend for production — it exposes API keys client-side and caps the corpus at what a browser can reasonably load — but it is the right choice for a static site on Cloudflare Pages with no backend.

The practical scale limit with this approach is around 3,000–5,000 documents before the initial load time becomes noticeable. Beyond that, you want a vector database (Pinecone, pgvector, Cloudflare Vectorize) and a server-side retrieval layer. The query embedding call also moves server-side, so API keys never reach the browser.

Context window economics also shifted from the SuriCon version. With Claude's larger context window, the temptation is to pass in as many retrieved documents as possible. In practice, 5 documents at around 5,000 tokens of context worked better than larger retrieval sets. More context added latency and diluted relevance — the model synthesizes across all retrieved documents even when only two or three are actually relevant. Smaller, higher-precision retrieval windows produce cleaner answers.
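That smaller-window policy is easy to express as a guard on context assembly. A sketch using a rough four-characters-per-token approximation; the 5-document and ~5,000-token figures come from the tuning above, and everything else is illustrative:

```javascript
// Assemble a bounded context: take ranked documents in order, stopping at the
// document cap or when the approximate token budget would be exceeded.
function buildContext(rankedDocs, maxDocs = 5, tokenBudget = 5000) {
  const chunks = [];
  let tokensUsed = 0;
  for (const doc of rankedDocs.slice(0, maxDocs)) {
    const approxTokens = Math.ceil(doc.text.length / 4); // rough heuristic
    if (tokensUsed + approxTokens > tokenBudget) break;
    chunks.push(`[${doc.source}] ${doc.text}`);
    tokensUsed += approxTokens;
  }
  return chunks.join("\n\n");
}
```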

Try It

The demo runs the full retrieval and generation pipeline in your browser. The left column shows traditional keyword search; the right column shows semantic RAG retrieval using OpenAI embeddings. Source labels on each result indicate which knowledge base it came from.

RAG Playground

Compare traditional keyword search with RAG-powered semantic search. Ask security questions and see how embeddings improve result quality.

Example Queries

The difference between the two columns is most visible on conceptual queries — "how do adversaries establish persistence" surfaces different results than an exact string match would — and on queries that require synthesizing across technique and vulnerability data simultaneously. Try asking about a CVE to see CISA guidance alongside related MITRE techniques, or ask a Suricata configuration question to pull from the documentation corpus.


This demo runs retrieval in your browser and calls the Anthropic and OpenAI APIs directly for generation and query embedding. No query data is stored or logged.