API Economics and MCP: Designing Tools for Credit-Metered Threat Intelligence
The friction point in most threat intelligence workflows is context switching. You are in Claude Code working through a hypothesis about a suspicious IP, and you need external data: what services it is running, whether a certificate is shared across related infrastructure, what its historical exposure looks like. The normal path is to break out of Claude Code, run a query in a separate terminal, copy the JSON result, and paste it back into your conversation. Do that four or five times across an investigation and you have lost the thread.
In December 2025 I built an MCP server for a credit-metered threat intelligence API to eliminate that overhead. What I found is that the interesting design decisions are not about MCP at all. They are about API credit economics, cache consistency, and how to structure tools so an AI assistant selects the right one without being told.
Why MCP Is Different from a CLI Wrapper
A shell alias that calls a CLI tool and prints results to the terminal solves the typing problem. It does not solve the context problem. Claude Code still cannot see those results unless you paste them in, and pasting breaks conversational flow.
Anthropic's Model Context Protocol lets you register tools that Claude Code can call directly during a conversation. The results flow into the context automatically. Claude Code can reason over the data inline, correlate it with other information from the same conversation, and call follow-up tools without any user intervention. The difference is not convenience. It is that the AI assistant is doing the synthesis work rather than acting as a relay between your terminal and the conversation.
Narrow Tools Instead of a Generic Query Interface
The first design question was whether to build a single generic tool that accepts arbitrary query strings, or to build narrowly scoped tools for specific question types. Narrow tools win for two reasons.
First, Claude Code can select the right tool for a specific question when the tool name and its typed inputs make the correct choice obvious. A question about certificate sharing across infrastructure has a different answer than a question about open ports, and a different tool should handle each. With a generic query interface, the model has to know the API's filter syntax to construct a valid query. With narrow tools, that knowledge is encoded in the tool schema.
Second, per-tool input validation prevents malformed queries from reaching the API. For credit-metered APIs, a query that fails server-side can still consume a credit. Catching the error at the tool layer costs nothing.
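As a concrete sketch of that tool-layer validation, here is the kind of check a host-lookup tool can run before any credit is at risk. The function name and the non-routable-address policy are illustrative, not taken from the server:

```python
# Tool-layer validation: reject malformed input before it can reach the
# credit-metered API, where even a failing query may consume a credit.
import ipaddress

def validate_host_lookup(params: dict) -> str:
    """Return the normalized IP, or raise ValueError without spending a credit."""
    raw = params.get("ip", "")
    try:
        addr = ipaddress.ip_address(raw.strip())
    except ValueError:
        raise ValueError(f"not a valid IP address: {raw!r}")
    # Non-routable addresses can never have useful external exposure data,
    # so refuse them here rather than burning a credit on an empty answer.
    if addr.is_private or addr.is_loopback:
        raise ValueError(f"refusing lookup of non-routable address: {addr}")
    return str(addr)
```

The same pattern generalizes to any narrow tool: the schema constrains what Claude Code can send, and the validator catches whatever slips through.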
One design pattern worth calling out specifically: many threat intelligence APIs separate the operation of counting results from the operation of retrieving them, and count queries often consume no credits while full retrieval queries do. Exposing explicit count tools surfaces this capability. For scoping work where you want to understand the size of a result set before committing to it, a pre-flight count is frequently the first call. Burying this in a generic query interface means the model has to know to ask for it; a named tool makes it discoverable.
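The pairing might be structured like this, with `client.count` and `client.search` as hypothetical stand-ins for a vendor SDK that separates free counts from credit-spending retrieval:

```python
# Illustrative pairing of a zero-credit pre-flight count with a
# credit-spending retrieval. The client interface is a hypothetical
# stand-in, not the actual vendor API.
def count_hosts(client, query: str) -> int:
    """Zero-credit scoping call: how many results would this query return?"""
    return client.count(query)

def search_hosts(client, query: str, max_results: int = 100) -> list:
    """Credit-spending retrieval, gated on a sane result-set size."""
    n = client.count(query)  # free pre-flight before committing credits
    if n > max_results:
        raise ValueError(f"query matches {n} hosts; narrow it before retrieval")
    return client.search(query)
```

Because `count_hosts` is a named tool with its own description, Claude Code can discover the scoping step on its own rather than needing to be told the API supports it.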
Cache Design and the Credit Calculation
Credit-metered APIs have a real cost, and active investigations re-query the same targets repeatedly as new context accumulates. Without a cache, each revisit consumes a credit for data you already have.
The cache in this server is SQLite rather than in-memory, and the reason is persistence. An in-memory cache dies when the session ends. Threat intelligence investigations rarely complete in a single sitting. A host that appears on day one may reappear on day three when a new artifact connects back to it. A persistent cache makes that second lookup free.
The cache key is a deterministic hash of the tool name and normalized input parameters. Normalization matters: the same query with parameters in a different order should produce the same key. TTL defaults to 24 hours, which handles the within-session and within-day re-query patterns without serving substantially stale data. It is configurable for deployments where freshness requirements differ.
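A minimal version of that key derivation, assuming JSON-serializable parameters, serializes with sorted keys so argument order cannot change the result:

```python
# Deterministic cache key: hash the tool name plus the parameters
# serialized with sorted keys, so the same query always maps to the
# same key regardless of argument order. A sketch of the approach,
# not the server's exact code.
import hashlib
import json

def cache_key(tool: str, params: dict) -> str:
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{tool}:{canonical}".encode()).hexdigest()
```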
Every cache entry includes a hit count field. Tracking which queries repeat in practice informs which tools to prioritize and which edge cases in the API's response schema matter most. The hit count data shaped subsequent development in ways that would not have been visible otherwise.
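A stripped-down sketch of such a cache, with the TTL and hit-count behavior described above; the schema and class shape are illustrative:

```python
# Persistent SQLite cache with TTL expiry and a per-entry hit counter,
# assuming a key already derived from the tool name and normalized
# parameters. Simplified illustration, not the server's implementation.
import json
import sqlite3
import time

class ResponseCache:
    def __init__(self, path=":memory:", ttl=24 * 3600):
        self.ttl = ttl
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache ("
            "key TEXT PRIMARY KEY, value TEXT, created REAL, hits INTEGER)"
        )

    def get(self, key):
        row = self.db.execute(
            "SELECT value, created FROM cache WHERE key = ?", (key,)
        ).fetchone()
        if row is None or time.time() - row[1] > self.ttl:
            return None  # miss or expired: caller spends a credit
        # Record the hit so repeat-query patterns are visible later.
        self.db.execute("UPDATE cache SET hits = hits + 1 WHERE key = ?", (key,))
        self.db.commit()
        return json.loads(row[0])

    def put(self, key, value):
        self.db.execute(
            "INSERT OR REPLACE INTO cache VALUES (?, ?, ?, 0)",
            (key, json.dumps(value), time.time()),
        )
        self.db.commit()
```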
The cache also creates a consistency property that matters during investigations. You want to see the same state of a target across multiple queries within a session. Seeing different states mid-investigation introduces ambiguity about whether differences reflect genuine changes or query timing. Consistent data within a session is more operationally useful than maximally fresh data, even if the tradeoff is a view that lags slightly behind reality.
Rate Limiting
Credit-metered APIs enforce rate limits, and hitting them during an investigation returns errors rather than results. The rate limiter here is a token bucket at one request per second with a burst capacity of five.
The burst capacity matters because of how Claude Code uses tools at session start. When a new investigation begins and several questions need answers immediately, Claude Code often calls multiple tools in rapid succession. Without burst capacity, those calls queue at one per second even though the rate limit has not been approached. Burst lets the first several queries go out immediately, after which subsequent queries pace to the steady-state rate.
The async lock on the rate limiter is not optional. The MCP server handles concurrent tool invocations, and without a lock, two concurrent calls could both read the available token count and both proceed, doubling effective throughput and triggering server-side limits. The same concurrency concern applies when the underlying API client makes synchronous HTTP calls: calling it directly inside an async tool handler blocks the event loop for the duration of the network request, preventing other pending tool calls from making progress. Running the synchronous call in a thread pool executor keeps the event loop free. This is easy to omit, and the failure mode is subtle: the server works correctly under light load and degrades under exactly the concurrent calls that characterize an active investigation.
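The whole pattern can be sketched as follows; the class is a simplified illustration of a lock-guarded token bucket plus an executor wrapper, not the server's exact implementation:

```python
# Token-bucket limiter at 1 req/s with a burst of 5, guarded by an
# asyncio lock so two concurrent tool calls cannot both spend the same
# token. Simplified sketch of the pattern described above.
import asyncio
import time

class TokenBucket:
    def __init__(self, rate=1.0, burst=5):
        self.rate, self.burst = rate, burst
        self.tokens = float(burst)
        self.last = time.monotonic()
        self.lock = asyncio.Lock()

    async def acquire(self):
        async with self.lock:
            now = time.monotonic()
            # Refill based on elapsed time, capped at burst capacity.
            self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens < 1:
                # Sleep just long enough for one token to accrue.
                await asyncio.sleep((1 - self.tokens) / self.rate)
                self.tokens = 1
            self.tokens -= 1

async def call_blocking(fn, *args):
    """Run a synchronous SDK call in a thread pool so the HTTP request
    does not block the event loop while in flight."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, fn, *args)
```

Holding the lock across the sleep also serializes waiting callers, which is the desired behavior here: queries pace out at the steady-state rate instead of all waking at once.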
Credential and Cache Hygiene
The API key is an environment variable passed at server startup. It does not appear in the codebase, in committed configuration files, or in the SQLite cache. The cache file itself should be excluded from version control: it contains full API responses, which depending on the investigation context may include sensitive target data. Gitleaks in CI/CD handles secret detection. The failure mode of committing an API key is not just a credit exposure problem: for a key associated with investigations into specific infrastructure, the key itself could reveal investigative interest.
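A fail-fast loader for that startup check might look like this; the variable name is a placeholder, not the server's actual configuration:

```python
# Fail-fast credential loading: the key comes only from the environment,
# never from a file in the repository, and a missing key stops the server
# at startup rather than failing on the first tool call.
import os
import sys

def load_api_key(var="THREATINTEL_API_KEY"):
    key = os.environ.get(var)
    if not key:
        sys.exit(f"error: {var} is not set; refusing to start")
    return key
```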
The Durable Lesson
The workflow change is concrete. An investigation that previously required repeated terminal context switches now runs entirely within a Claude Code conversation. Structured data flows back into context automatically and the analysis continues without interruption.
The more transferable lesson is about tool design for AI assistant workflows. Narrow, well-named tools with typed inputs outperform generic query interfaces because tool selection becomes part of the model's reasoning rather than an additional burden on the user. The cache and rate limiter are not implementation details. For any credit-metered API integrated into an interactive investigation workflow, they are the features that determine whether the system is actually usable — because the same questions recur, and hitting a limit or running up a credit bill mid-session is not an acceptable outcome.