How BlockFi Is Using Machine Learning To Take Crypto Safety to the Moon!

Anthony G. Tellez · 3 min read
Machine Learning, Splunk, Cryptocurrency, Blockchain, Security, Fraud Detection, Graph Analytics, BlockFi, Conference, 2021

BlockFi is a cryptocurrency platform that lets clients grow wealth through loans, trading, and interest accounts. Our security operations team takes protecting those assets, and the personal information behind them, seriously enough that we built dedicated machine learning infrastructure around it.

At .conf21, I presented how BlockFi uses Splunk to identify operational risks and keep client assets safe. The session covered three distinct machine learning techniques, each addressing a different category of threat.

Anomaly detection is the workhorse. In a system where transaction patterns are highly individual — some clients trade aggressively, others hold for months — you cannot define "suspicious" with a static rule set. The model learns what normal looks like for each account and flags deviations. Account takeover attempts tend to show up as anomalies before they show up as anything else: an unusual login time, a new device paired with a withdrawal request, session behavior that doesn't match the account's history. Getting that signal early is the difference between catching the attempt and writing a post-mortem.
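
To make the per-account idea concrete, here is a minimal Python sketch. It is not the model BlockFi runs. It assumes withdrawal amount is the only signal and scores each new amount against that account's own history using a robust z-score (median and median absolute deviation). The function names, the 3.5 threshold, and the five-event minimum history are illustrative assumptions.

```python
from collections import defaultdict
from statistics import median


def robust_z(history, value):
    """Robust z-score of `value` against one account's past values."""
    med = median(history)
    mad = median([abs(x - med) for x in history])
    if mad == 0:
        # No spread in the history: any different value is maximally unusual.
        return 0.0 if value == med else float("inf")
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return 0.6745 * (value - med) / mad


def flag_withdrawals(events, threshold=3.5, min_history=5):
    """Flag (account_id, amount) events that deviate from that account's baseline.

    `events` is assumed to be an iterable of (account_id, amount) in time order.
    """
    history = defaultdict(list)
    flagged = []
    for account_id, amount in events:
        past = history[account_id]
        if len(past) >= min_history and abs(robust_z(past, amount)) > threshold:
            flagged.append((account_id, amount))
        past.append(amount)
    return flagged


# Hypothetical data: one account that moves large sums, one that moves small ones.
events = (
    [("whale", a) for a in (50000, 52000, 48000, 51000, 49000)]
    + [("saver", a) for a in (100, 120, 90, 110, 105)]
    + [("whale", 50500), ("saver", 5000)]
)
flagged = flag_withdrawals(events)
```

In this toy data the 5,000 withdrawal is flagged for the account that normally moves about 100, while the 50,500 withdrawal passes for the account that routinely moves around 50,000. A single static cutoff cannot express both.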

Forecasting serves a different purpose. On the security side, it's primarily risk assessment and threat prediction — using historical incident patterns to anticipate where pressure is likely to increase. On the infrastructure side, it feeds capacity planning so the systems running the models don't fall over when trading volume spikes during a market move.
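
As a rough illustration of the capacity-planning side, the sketch below uses Holt's linear trend method (double exponential smoothing) to project a metric a few steps ahead. This is a stand-in technique chosen for brevity, not necessarily the algorithm from the talk. The function name, the smoothing parameters `alpha` and `beta`, and the sample series are all assumptions.

```python
def holt_forecast(series, horizon, alpha=0.5, beta=0.3):
    """Forecast `horizon` future points with Holt's linear trend method.

    alpha smooths the level, beta smooths the trend; both lie in (0, 1].
    """
    if len(series) < 2:
        raise ValueError("need at least two observations to estimate a trend")

    # Initialise the level with the first point and the trend with the first difference.
    level = series[0]
    trend = series[1] - series[0]

    for y in series[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend

    # Extrapolate the final level along the final trend.
    return [level + step * trend for step in range(1, horizon + 1)]


# Hypothetical hourly event counts (in thousands) during a steady ramp-up.
hourly_events = [10, 20, 30, 40, 50]
projection = holt_forecast(hourly_events, horizon=3)
```

On a perfectly linear series like this one, the projection simply continues the line (approximately 60, 70, 80). Real volume data is noisier and often seasonal, so a production forecast would need a method that accounts for that.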

Graph analytics is where blockchain analysis gets interesting. The fundamental challenge with cryptocurrency forensics is that a single entity can control thousands of distinct addresses, and tracing fund flows across those addresses requires understanding the graph structure of the transaction network, not just querying a ledger. We used graph theory to cluster addresses, trace fund movements, and flag connections to known-bad infrastructure. The ability to follow money across the chain and identify when it touches a high-risk cluster is something simple SQL queries against a relational transaction table can't provide at scale.
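
The clustering and tracing steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not BlockFi's pipeline: it groups addresses with the common-input-ownership heuristic (addresses spent together in one transaction are assumed to share an owner), then runs a breadth-first search over cluster-to-cluster fund flows to count hops to a flagged address. The transaction format, the function names, and the `max_hops` limit are hypothetical.

```python
from collections import defaultdict, deque


class UnionFind:
    """Disjoint-set structure used to merge addresses into clusters."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def cluster_addresses(transactions):
    """Merge all input addresses of each transaction into one cluster."""
    uf = UnionFind()
    for tx in transactions:
        for addr in tx["inputs"] + tx["outputs"]:
            uf.find(addr)  # register every address, even singletons
        for addr in tx["inputs"][1:]:
            uf.union(tx["inputs"][0], addr)
    return uf


def hops_to_flagged(transactions, uf, start, flagged, max_hops=3):
    """Return the fewest cluster-level hops from `start` to a flagged address, or None."""
    edges = defaultdict(set)
    for tx in transactions:
        for src in tx["inputs"]:
            for dst in tx["outputs"]:
                edges[uf.find(src)].add(uf.find(dst))

    bad_clusters = {uf.find(addr) for addr in flagged}
    origin = uf.find(start)
    queue = deque([(origin, 0)])
    seen = {origin}
    while queue:
        cluster, hops = queue.popleft()
        if cluster in bad_clusters:
            return hops
        if hops == max_hops:
            continue
        for nxt in edges[cluster]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return None


# Hypothetical transactions: A1, A2, A3 are linked by co-spending.
txs = [
    {"inputs": ["A1", "A2"], "outputs": ["B1"]},
    {"inputs": ["A2", "A3"], "outputs": ["C1"]},
    {"inputs": ["B1"], "outputs": ["D1"]},
    {"inputs": ["D1"], "outputs": ["BAD"]},
]
uf = cluster_addresses(txs)
distance = hops_to_flagged(txs, uf, "A3", flagged={"BAD"})
```

Here A3 never transacts with B1 directly, yet the search still reaches the flagged address in three hops because A3 shares a cluster with A1 and A2. That kind of multi-hop, cluster-aware traversal is what becomes awkward to express as joins over a flat transaction table.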

Underlying all three of these techniques is an MLOps layer. Training models on raw event data requires efficient data summarization — you can't feed unprocessed log volume directly into most algorithms. Feature engineering for crypto security data involves constructing signals that aren't obvious from the raw fields: velocity calculations, behavioral fingerprints, network-derived risk scores. And getting models from development into production Splunk environments reliably, so that what worked in the notebook runs correctly in the scheduled search, requires treating the deployment step as seriously as the modeling step.
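
As one example of a derived signal, a velocity feature can be computed with a trailing time window. The Python sketch below is an assumption about what such a feature might look like, not the feature set from the talk: for each event it reports how many events, and how much total value, fell inside the preceding window. The function name, field names, and one-hour default are hypothetical.

```python
from collections import deque


def velocity_features(events, window_seconds=3600):
    """Compute trailing-window count and total for time-ordered events.

    `events` is assumed to be a list of (timestamp_seconds, amount) pairs
    for a single account, sorted by timestamp.
    """
    window = deque()
    total = 0
    features = []
    for ts, amount in events:
        window.append((ts, amount))
        total += amount
        # Evict events that are a full window or more older than the current one.
        while window[0][0] <= ts - window_seconds:
            _, old_amount = window.popleft()
            total -= old_amount
        features.append({"ts": ts, "count": len(window), "total": total})
    return features


# Hypothetical withdrawals for one account: (seconds since start, amount).
withdrawals = [(0, 100), (600, 50), (3500, 25), (3700, 10)]
rows = velocity_features(withdrawals)
```

Each output row summarizes many raw events into two numbers, which is the kind of compact input a model can consume. The same logic would have to produce identical values in development and in the scheduled production job for the model's behavior to carry over.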

Presented at Splunk .conf21