Graph Analytics for Blockchain Forensics: Tracing $252M in Suspicious Transactions
When I joined BlockFi as Senior Security Architect and the first Head of Machine Learning in April 2021, the compliance and security teams were drowning in blockchain transaction data they could not effectively analyze. The tooling was ad hoc, the scale was growing rapidly, and the regulatory stakes were severe: OFAC sanctions compliance, AML requirements, and transaction monitoring. This post describes the graph analytics framework we built to address that, the technology choices we made, and what the experience taught me about applying graph theory to financial forensics at scale.
The Blockchain Forensics Problem
There is a tempting misconception about blockchain data: that because it is public and immutable, it is easy to analyze. The opposite is closer to the truth. The transparency is real, with every transaction on-chain and permanently recorded, but the analytical challenge is substantial.
In Bitcoin's UTXO model, addresses are not accounts. A single entity controls many addresses, and best practice for privacy involves generating a new address for every transaction. A wallet holding one million dollars might be represented by thousands of distinct addresses in the UTXO set. Linking those addresses back to a single controlling entity requires heuristic analysis: common input ownership, change address detection, and clustering by behavioral patterns. None of these relationships are obvious from the raw transaction graph.
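The common-input-ownership heuristic can be sketched as connected components over a co-spend graph. This is a minimal stdlib Python illustration, not our production clustering code; the transaction format (a dict with an "inputs" list) is assumed for the example.

```python
from collections import defaultdict, deque

def cluster_by_common_input(transactions):
    """Group addresses into wallet clusters via the common-input-ownership
    heuristic: addresses spent together as inputs of one transaction are
    assumed to share a controlling entity."""
    adjacency = defaultdict(set)
    for tx in transactions:
        inputs = tx["inputs"]
        # Link every input address to the first one; this connects the
        # whole input set without a quadratic number of edges.
        for addr in inputs[1:]:
            adjacency[inputs[0]].add(addr)
            adjacency[addr].add(inputs[0])
        for addr in inputs:
            adjacency.setdefault(addr, set())
    # Connected components over the co-spend graph = wallet clusters.
    clusters, seen = [], set()
    for start in adjacency:
        if start in seen:
            continue
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for nbr in adjacency[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        clusters.append(component)
    return clusters
```

Note how transitivity does the heavy lifting: if a and b are co-spent, and b and c are co-spent, all three land in one cluster even though a and c never appear in the same transaction.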
Mixing services and coinjoin transactions deliberately obscure the link between inputs and outputs, breaking naive chain-following approaches. Even without mixing, the sheer volume of the transaction graph makes brute-force traversal computationally intractable with conventional tools. Bitcoin history alone spans hundreds of millions of edges across a decade, with Ethereum adding more on top.
The question we needed to answer was not just "where did this transaction come from?" but "is this address or cluster of addresses connected to sanctioned entities, known fraud infrastructure, or behavioral patterns consistent with money laundering?" That requires graph analytics, not just ledger queries.
Why Relational Databases Are the Wrong Tool
Before building the graph stack, I spent time understanding why the existing tooling was failing. The compliance team was running SQL queries against a relational representation of the transaction ledger. The queries worked for simple cases but became unusably slow for anything involving multi-hop traversal.
The fundamental problem is that graph traversal in a relational database requires recursive joins. Finding all addresses reachable from a given address within three hops requires three self-joins on a transaction table with billions of rows. With proper indexing you can get somewhere, but the query plans become exponentially expensive as hop count increases, and the patterns that matter for AML and sanctions screening often involve five, ten, or fifteen hops across complex clustering of addresses.
Graph databases and graph processing frameworks are designed for exactly this access pattern. Traversal is a first-class operation, not an afterthought bolted onto a join model.
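To make the contrast concrete, here is a hedged sketch of hop-bounded reachability over an in-memory adjacency list. The point is the shape of the computation: each node is visited once, with cost linear in the visited subgraph, where the SQL equivalent requires one self-join per hop.

```python
from collections import deque

def reachable_within(adjacency, source, max_hops):
    """All nodes reachable from `source` in at most `max_hops` directed
    edges. Breadth-first traversal visits each node once -- no recursive
    self-joins over a billion-row transaction table."""
    seen = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # hop budget exhausted on this frontier
        for nbr in adjacency.get(node, ()):
            if nbr not in seen:
                seen[nbr] = seen[node] + 1
                queue.append(nbr)
    seen.pop(source)
    return seen  # node -> hop distance
```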
Graph Data Model Design
The data model we settled on was conceptually straightforward but required careful engineering to perform at scale.
Nodes represented two things: individual blockchain addresses and clustered wallet entities. Clustering was its own significant subproblem. We used a combination of deterministic heuristics (common input ownership for Bitcoin, contract interaction patterns for Ethereum) and probabilistic entity resolution to group addresses into wallet clusters. A wallet cluster node carried attributes: estimated controlling entity where known, risk scores, flags for known-bad attribution from threat intel feeds, and OFAC screening status.
Edges represented transactions: directed, weighted by value transferred, and timestamped. Edges also carried derived attributes: whether the transaction involved a mixing service, whether it occurred during a period of market volatility that might indicate manipulation, and metadata linking to the originating raw transaction record.
This dual-level model (address graph and wallet cluster graph) was essential. Most analytical queries ran against the cluster graph, which was orders of magnitude smaller than the full address graph. Dropping down to the address graph was reserved for cases where precise provenance tracing was required.
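The contraction from address graph to cluster graph can be sketched as follows, assuming an edge list of (sender, receiver, value) tuples and an address-to-cluster mapping; this is an illustration of the idea, not the production pipeline code.

```python
from collections import defaultdict

def contract_to_cluster_graph(address_edges, cluster_of):
    """Collapse an address-level transaction graph into the wallet
    cluster graph: one node per cluster, edge weights summed over all
    underlying address-to-address transfers."""
    cluster_edges = defaultdict(float)
    for src, dst, value in address_edges:
        cs, cd = cluster_of[src], cluster_of[dst]
        if cs != cd:  # drop intra-cluster self-transfers
            cluster_edges[(cs, cd)] += value
    return dict(cluster_edges)
```

Because every analytical query that tolerates cluster-level precision runs over this much smaller graph, the expensive address-level traversal becomes the exception rather than the default.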
The Analytics Stack
We chose tools deliberately, and the tradeoffs were different for each layer.
NVIDIA RAPIDS (cuGraph) gave us GPU-accelerated graph algorithm execution. For algorithms like PageRank, connected components, and shortest path that needed to run over the entire wallet cluster graph on a schedule, GPU acceleration made the difference between a pipeline that completed in minutes and one that ran for hours. The cuGraph API was close enough to NetworkX that porting algorithms was straightforward, though memory management on GPU required more care than the CPU equivalents.
Apache Arrow served as the data interchange layer. Moving data between Databricks, Rapids, and downstream consumers without serialization overhead was non-trivial at our data volumes. Arrow's columnar in-memory format eliminated the repeated serialization and deserialization that would otherwise dominate pipeline latency. The interoperability with both Spark (via the Arrow-Spark bridge) and Python analytics tools made it the obvious choice.
Graphistry handled visual graph exploration. When a compliance analyst needed to look at the neighborhood of a suspicious address, not just a risk score but a visual map of connected entities, Graphistry provided an interface that non-engineers could use effectively. The ability to visually trace fund flows, identify structural patterns, and annotate the graph during investigation was qualitatively different from reading tabular query results. Investigations that previously took hours of SQL iteration could be completed in minutes with visual exploration.
Neo4j served as the persistent graph store for entities and relationships that needed to be queryable interactively. While Databricks handled batch pipeline processing and Rapids handled compute-intensive analytics, Neo4j was the operational database for real-time compliance queries: "has this address transacted with any sanctioned entity within N hops?"
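An interactive screening query of that shape can be expressed in Cypher with a bounded variable-length pattern. The snippet below only builds the query string; the node label, relationship type, and property names (`:Address`, `:SENT`, `ofac_listed`) are hypothetical stand-ins for whatever schema you actually run.

```python
def ofac_proximity_query(max_hops):
    """Build an illustrative Cypher query for "has this address
    transacted with any sanctioned entity within N hops?". Schema
    names here are hypothetical."""
    if not 1 <= max_hops <= 15:
        raise ValueError("hop bound outside screening policy range")
    # Variable-length bounds cannot be Cypher parameters, so the hop
    # count is interpolated; the address itself stays a parameter.
    return (
        "MATCH p = shortestPath("
        "(a:Address {address: $addr})-[:SENT*.." + str(max_hops) + "]-"
        "(s:Address {ofac_listed: true})) "
        "RETURN length(p) AS hops"
    )
```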
Databricks orchestrated the overall pipeline: ingesting on-chain data from multiple sources, running the clustering and enrichment jobs, writing to both the Arrow-format analytical store and Neo4j, and scheduling the periodic full-graph analytics runs.
Key Graph Algorithms
Connected components was the foundation of wallet clustering. If a set of addresses is connected through co-spending patterns, they are likely controlled by the same entity. Running connected components over the address transaction graph, filtered by the co-spending heuristic, gave us initial wallet clusters. These were refined iteratively with additional signals.
PageRank variants provided risk scoring. A standard PageRank on a transaction graph propagates importance based on transaction volume and connectivity. We adapted this to propagate risk: addresses that transacted with high-risk entities have their own risk scores elevated. The intuition is similar to web PageRank but the semantics are inverted: rather than "important pages link to you," the signal becomes "risky entities transacted with you." We tuned the damping factor and seeded the initial risk scores from known-bad attribution data.
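The propagation idea can be illustrated with a deliberately simplified hop-decay scheme: a node's risk is either its own seeded score from known-bad attribution or the strongest incoming counterparty risk attenuated by a decay factor, whichever is larger. This is a sketch of the intuition, not the tuned PageRank variant we actually ran.

```python
def propagate_risk(in_edges, seeds, decay=0.8, iters=20):
    """Hop-decayed risk propagation: risk falls off geometrically with
    graph distance from flagged entities. `in_edges[v]` lists the nodes
    that sent funds to v; `seeds` holds known-bad baseline scores."""
    nodes = set(in_edges) | {u for srcs in in_edges.values() for u in srcs} | set(seeds)
    score = {v: seeds.get(v, 0.0) for v in nodes}
    for _ in range(iters):
        score = {
            v: max(
                seeds.get(v, 0.0),  # attribution never decays away
                max((decay * score[u] for u in in_edges.get(v, ())), default=0.0),
            )
            for v in nodes
        }
    return score
```

A real PageRank-style variant would additionally weight by transaction value and split outflow across counterparties; the key shared property is that risk decays with distance but never disappears entirely along a connected path.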
Shortest path was the core of fund tracing. Given a suspicious incoming transaction, tracing the shortest path back to a known source answered the provenance question directly. That source might be a sanctioned entity, a fraud report, or a prior investigation. Shortest path over the full blockchain graph was computationally expensive, which is why the wallet cluster graph (smaller by orders of magnitude) handled most of these queries, with address-level graphs reserved for confirmed cases requiring precise documentation.
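For unweighted hop-distance tracing, breadth-first search with parent tracking answers the provenance question and recovers the actual path for documentation. A minimal sketch, assuming an incoming-edge adjacency map at the cluster level:

```python
from collections import deque

def trace_to_source(in_edges, start, known_bad):
    """Trace the shortest incoming-transaction path from `start` back to
    any known-bad cluster. Returns the path (bad source first, `start`
    last) or None if no connection exists."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node in known_bad:
            path = []
            while node is not None:  # walk parents back to start
                path.append(node)
                node = parent[node]
            return path
        for sender in in_edges.get(node, ()):
            if sender not in parent:
                parent[sender] = node
                queue.append(sender)
    return None
```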
How a Suspicious Pattern Gets Flagged
Without describing any specific case, the general pattern worked like this.
A transaction arrives at a monitored address. The pipeline looks up the sending address in the graph and computes its risk score: a function of its PageRank-based risk propagation, its distance from any known-bad cluster, and behavioral features of the transaction itself (timing, amount, whether it has structuring characteristics).
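A toy version of that scoring function might look like the following; the weights, the proximity formula, and the structuring heuristic are all hypothetical stand-ins for illustration, not the production model.

```python
def composite_risk(propagated_risk, hops_to_bad, amount, round_amount_flag):
    """Illustrative composite score for an incoming transaction.
    Proximity to known-bad clusters and structuring-like behavior
    (round amounts just under a reporting threshold) push it up.
    All weights are hypothetical."""
    proximity = 1.0 / (1 + hops_to_bad) if hops_to_bad is not None else 0.0
    structuring = 0.2 if round_amount_flag and amount < 10_000 else 0.0
    return min(1.0, 0.5 * propagated_risk + 0.4 * proximity + structuring)
```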
If the risk score exceeds a threshold, the transaction is queued for review. The analyst opens the Graphistry visualization, sees the neighborhood of the sending address, and can immediately see whether it is connected to flagged entities, how many hops away those connections are, and the path of fund flow between them. The Neo4j query returns the shortest path to any OFAC-listed entity within the graph.
The documentation for a suspicious activity report is then substantially generated from the graph data: the structural evidence of connection, the fund flow path, the estimated amounts.
OFAC Sanctions Screening via Graph Proximity
Standard OFAC screening checks whether a specific address appears on the SDN list. That check is necessary but not sufficient. OFAC guidance, reinforced by FinCEN's subsequent clarification, makes clear that knowledge of sanctions-adjacent activity matters, not just direct transactions with listed entities.
Graph proximity gives us a principled measure of that adjacency. An address that is one direct transaction away from a sanctioned wallet is a different risk profile than one that is seven hops away through diverse intermediaries. Setting thresholds and escalation criteria based on hop count and transaction path characteristics, rather than just binary address matching, aligned our screening posture with the actual regulatory intent.
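As a sketch of what proximity-based screening looks like in code, the mapping from hop distance to escalation tier might be as simple as the following. The tier names and thresholds are illustrative, not regulatory guidance or our actual policy values.

```python
def escalation_tier(hops_to_sanctioned):
    """Map graph distance from the nearest sanctioned cluster to a
    screening action. None means no path was found within the search
    bound. Thresholds here are purely illustrative."""
    if hops_to_sanctioned is None:
        return "clear"
    if hops_to_sanctioned <= 1:
        return "block_and_report"   # direct counterparty
    if hops_to_sanctioned <= 3:
        return "enhanced_due_diligence"
    if hops_to_sanctioned <= 7:
        return "monitor"
    return "clear"
```

In practice the decision also weighed path characteristics, such as transferred value along the path and whether intermediaries were diverse or looked like purpose-built hops, not distance alone.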
This is also where the two-level graph model paid dividends: OFAC screening at the wallet cluster level meant we were not deceived by address proliferation, where a sanctioned actor generates hundreds of new addresses to dilute graph proximity.
Production Scale and Throughput
At peak, the pipeline processed millions of transactions per day across multiple blockchains. The batch analytics jobs ran on a schedule tuned to keep the risk scores fresh enough for compliance purposes without saturating the Databricks cluster. Real-time screening (the direct OFAC address match) happened synchronously at transaction time. The deeper graph analytics ran asynchronously, with alerts generated when risk scores crossed thresholds.
The hardest operational problem was keeping the entity clustering current. New address attribution data arrived continuously from threat intel feeds, blockchain analytics vendors, and our own internal investigations. Updating the cluster graph incrementally, rather than recomputing from scratch, required careful attention to the consistency of connected component updates.
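The standard data structure for that kind of incremental merging is union-find: new co-spend evidence or fresh attribution becomes a near-constant-time merge instead of a full connected-components recompute. A minimal sketch:

```python
class IncrementalClusters:
    """Union-find over addresses, so new evidence that two addresses
    share a controlling entity merges their clusters incrementally."""

    def __init__(self):
        self.parent = {}

    def find(self, addr):
        """Return the cluster representative for `addr`."""
        self.parent.setdefault(addr, addr)
        root = addr
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[addr] != root:  # path compression
            self.parent[addr], addr = root, self.parent[addr]
        return root

    def merge(self, a, b):
        """Record evidence that `a` and `b` share a controlling entity."""
        self.parent[self.find(a)] = self.find(b)

    def same_cluster(self, a, b):
        return self.find(a) == self.find(b)
```

What union-find does not give you for free is efficient *splits*, which is why retracting a bad attribution (a heuristic that turned out to be wrong) remained the expensive case requiring recomputation.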
What Traditional AML Tools Miss
Most traditional AML systems are built around accounts, not graphs. They look for patterns within a single account's activity, such as velocity, amounts, and geographic anomalies, and do not effectively model the network of relationships between accounts.
A sophisticated laundering scheme might involve dozens of intermediary wallets, each individually showing unremarkable transaction patterns, but collectively forming a recognizable layering structure. Graph analytics finds that structure. Rule-based systems tuned to individual account behavior miss it entirely.
The other gap is the speed of graph-based tracing relative to manual investigation. A compliance analyst following a chain of transactions manually, wallet by wallet, might take days to establish a connection that a graph query answers in seconds. Speed matters both for regulatory response timelines and for catching active fraud before more funds move.
The framework we built was not perfect. Entity resolution on-chain remains genuinely hard, and sophisticated actors who understand graph analytics can deliberately create structures that inflate hop count or dilute risk propagation. But it was substantially more effective than what existed before, and it established a foundation that made subsequent improvements incremental rather than requiring another ground-up build.