Splunk put big data on the security practitioner's desktop, but most SOC engineers aren't data scientists. The tools existed; the accessibility didn't. That gap was the problem I wanted to address at .conf16.
The Splunk Machine Learning Toolkit was already shipping. The question was whether it could be made operationally useful without requiring a PhD to stand it up. The answer was yes, but it required thinking carefully about which problems actually benefit from ML and which are better handled with a well-tuned SPL query.
The talk walked through three use cases where the ML approach genuinely earned its complexity.
Data exfiltration detection is hard with signatures because there's no universal signature for "too much data leaving." What there is instead is a baseline. Normal egress has a shape — predictable volumes, predictable destinations, predictable timing. Deviations from that shape are worth looking at. Splunk's outlier detection and time-series modeling give you a way to define that baseline programmatically and alert when something breaks pattern, without manually tuning a threshold for every host.
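The toolkit generates SPL for this kind of baseline, but the underlying statistics are compact enough to sketch. A minimal per-host z-score check in Python (illustrative only; the function name, the clean training window, and the 3-sigma threshold are my assumptions, not the toolkit's implementation):

```python
from statistics import mean, stdev

def egress_outlier(history, observed, k=3.0):
    """Score one observation against a host's historical egress baseline.

    history:  per-interval byte counts from a training window (assumed clean).
    observed: the new interval's byte count.
    Returns (is_outlier, z_score).
    """
    mu = mean(history)
    sigma = stdev(history) or 1.0  # guard against a perfectly flat baseline
    z = (observed - mu) / sigma
    return z > k, z

# A host that normally moves ~1.2 MB per interval suddenly moves ~9.8 MB:
print(egress_outlier([1200, 1100, 1300, 1250, 1150, 1220], 9800))
```

The important design point is that the baseline is computed per host from its own history, so you get a per-host threshold for free instead of tuning one by hand.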
Port and traffic analysis is similar. Most hosts talk to a consistent set of services. When a workstation starts hitting ports it's never touched before, or talking to infrastructure outside its normal peer group, that's a behavioral anomaly. Statistical analysis of port usage over time surfaces those deviations before a signature would ever fire.
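The same framing works for categorical data like ports. A hypothetical peer-group rarity check in Python (names and threshold are mine; a production version would track frequencies over time rather than a single snapshot):

```python
from collections import Counter

def rare_ports(host_ports, peer_port_sets, rarity_threshold=0.1):
    """Flag ports this host used that almost none of its peers ever touch.

    host_ports:     ports observed on the host under review.
    peer_port_sets: one set of historically seen ports per peer host.
    """
    n = len(peer_port_sets)
    # How many peers have ever used each port?
    usage = Counter(p for ports in peer_port_sets for p in ports)
    return sorted(p for p in set(host_ports) if usage[p] / n < rarity_threshold)

# Peers in this workstation's group talk on 22/80/443; it also hit 4444:
peers = [{22, 80, 443}] * 9 + [{80, 443}]
print(rare_ports([80, 443, 22, 4444], peers))  # → [4444]
```

Note that port 22 isn't flagged even though the host rarely uses it, because its peer group does — the comparison is against the group's behavior, not a global rule.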
Advanced threat detection — meaning the stuff that specifically evades signatures — is where the ML framing makes the most sense. Sophisticated attackers blend in. ML doesn't care about what the attack looks like; it cares about what normal looks like, and flags the distance from it.
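That "distance from normal" idea can be made literal. A toy multivariate version in Python (my own sketch: the feature choice and the z-score distance are assumptions, and the toolkit's algorithms are considerably more sophisticated):

```python
from statistics import mean, stdev

def anomaly_score(baseline_rows, candidate):
    """Distance of a candidate observation from 'normal': the Euclidean
    norm of its per-feature z-scores against a baseline of normal rows.

    baseline_rows: feature vectors (e.g. bytes out, connection count)
                   drawn from known-normal activity.
    """
    cols = list(zip(*baseline_rows))
    mus = [mean(c) for c in cols]
    sigmas = [stdev(c) or 1.0 for c in cols]  # flat features pass through
    return sum(((x - m) / s) ** 2
               for x, m, s in zip(candidate, mus, sigmas)) ** 0.5

# Baseline of (bytes out in KB, connections) for normal activity:
base = [(100, 5), (110, 6), (90, 4), (105, 5), (95, 5)]
print(anomaly_score(base, (100, 5)))   # near zero: looks normal
print(anomaly_score(base, (500, 40)))  # large: far from the baseline
```

Nothing here knows anything about the attack; the score is high simply because the observation sits far from everything the model has seen as normal.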
The throughline for all three: security practitioners don't need to become data scientists. They need tools that encode the data science so they can focus on the threat.
Presented at Splunk .conf16; video and slides are available.