SuriCon 2016 - Applying Data Science to Suricata

Anthony G. Tellez
Tags: Suricata, Splunk, Machine Learning, Security, Data Science, IDS, SuriCon, Conference, 2016

Suricata generates rich network telemetry. The challenge isn't collection; it's making the data work harder. The SuriCon audience already knew the IDS side cold. What I wanted to show was what happens when you layer Splunk's Machine Learning Toolkit on top of that telemetry.

The specific friction I was addressing: Suricata practitioners are network security people, not necessarily SIEM people or ML people. Splunk's Machine Learning Toolkit is powerful, but it's designed for Splunk users. Bridging those two communities meant showing the IDS side what was possible with their own data, in a tool they may not have been using that way.

The core of the talk was feature engineering from network data. Raw Suricata logs are events. What ML needs are features: derived fields that encode behavior over time, not just individual events. Session counts per source, destination diversity, byte ratios, timing patterns. Once you frame the problem that way, Suricata's telemetry becomes an extremely rich feature set.
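To make the event-to-feature step concrete, here is a minimal sketch in plain Python rather than Splunk. It assumes the input is Suricata EVE JSON with `flow` events carrying `src_ip`, `dest_ip`, `flow.bytes_toserver`, and `flow.bytes_toclient`; the sample lines, the function name `build_host_features`, and the feature names are made up for illustration and are not taken from the talk.

```python
import json
from collections import defaultdict

# Made-up sample lines shaped like Suricata EVE JSON output (illustrative only).
SAMPLE_EVE_LINES = [
    '{"event_type": "flow", "src_ip": "10.0.0.5", "dest_ip": "93.184.216.34",'
    ' "flow": {"bytes_toserver": 1200, "bytes_toclient": 48000}}',
    '{"event_type": "flow", "src_ip": "10.0.0.5", "dest_ip": "151.101.1.69",'
    ' "flow": {"bytes_toserver": 800, "bytes_toclient": 30000}}',
    '{"event_type": "dns", "src_ip": "10.0.0.5", "dest_ip": "10.0.0.1"}',
    '{"event_type": "flow", "src_ip": "10.0.0.9", "dest_ip": "203.0.113.7",'
    ' "flow": {"bytes_toserver": 9000, "bytes_toclient": 1000}}',
]


def build_host_features(lines):
    """Aggregate flow events into one feature row per source IP."""
    acc = defaultdict(lambda: {"sessions": 0, "dests": set(), "out": 0, "in": 0})
    for line in lines:
        event = json.loads(line)
        if event.get("event_type") != "flow":
            continue  # only flow records carry the byte counters used here
        flow = event.get("flow", {})
        host = acc[event["src_ip"]]
        host["sessions"] += 1
        host["dests"].add(event["dest_ip"])
        host["out"] += flow.get("bytes_toserver", 0)
        host["in"] += flow.get("bytes_toclient", 0)

    features = {}
    for src, h in acc.items():
        total = h["out"] + h["in"]
        features[src] = {
            "session_count": h["sessions"],        # sessions per source
            "distinct_dests": len(h["dests"]),     # destination diversity
            "bytes_out_ratio": h["out"] / total if total else 0.0,  # byte ratio
        }
    return features


if __name__ == "__main__":
    for src, row in build_host_features(SAMPLE_EVE_LINES).items():
        print(src, row)
```

Each output row describes a host rather than a single event, which is the shape a clustering or anomaly-detection step expects.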

From there, the use cases map cleanly. Clustering on connection patterns catches botnet behavior that rules miss — a compromised host checking in on a regular beacon interval looks nothing like human browsing, but it also doesn't look like any specific known C2. You find it by finding the cluster of hosts that behave like each other in a way no legitimate application would explain.
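A rough sketch of the beaconing idea follows, again in plain Python and not the talk's actual method. Each host is reduced to two timing features (log of the mean gap between connections, and the coefficient of variation of those gaps), then hosts are grouped with a simple single-linkage distance threshold, a deliberately small stand-in for a real clustering algorithm. The sample timestamps, the `eps` and `max_jitter` values, and all function names are assumptions chosen for illustration.

```python
import math
from statistics import mean, pstdev

# Hypothetical per-host connection times in epoch seconds (made-up data).
SAMPLE_CONNECTIONS = {
    "10.0.0.21": [0, 60, 120, 180, 240],        # steady 60 s check-in
    "10.0.0.22": [5, 66, 124, 185, 245],        # same cadence, slight jitter
    "10.0.0.30": [0, 3, 10, 200, 205, 900],     # bursty, human-like
    "10.0.0.31": [0, 20, 25, 400, 410, 1000],   # bursty, human-like
}


def timing_features(timestamps):
    """Return (log10 of mean gap, coefficient of variation of gaps), or None."""
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if len(gaps) < 2 or mean(gaps) <= 0:
        return None  # not enough data to describe timing
    avg = mean(gaps)
    return (math.log10(avg), pstdev(gaps) / avg)


def threshold_clusters(points, eps):
    """Single-linkage grouping: hosts within eps of a cluster member join it."""
    unassigned = set(points)
    clusters = []
    for name in list(points):
        if name not in unassigned:
            continue
        unassigned.discard(name)
        cluster, frontier = [name], [name]
        while frontier:
            current = frontier.pop()
            near = [n for n in unassigned
                    if math.dist(points[current], points[n]) <= eps]
            for n in near:
                unassigned.discard(n)
                cluster.append(n)
                frontier.append(n)
        clusters.append(sorted(cluster))
    return clusters


def beacon_candidates(connections, eps=0.3, max_jitter=0.1):
    """Return clusters of two or more hosts that share a low-jitter cadence."""
    points = {}
    for host, times in connections.items():
        feats = timing_features(times)
        if feats is not None:
            points[host] = feats
    clusters = threshold_clusters(points, eps)
    return [c for c in clusters
            if len(c) > 1 and all(points[h][1] <= max_jitter for h in c)]


if __name__ == "__main__":
    print(beacon_candidates(SAMPLE_CONNECTIONS))
```

No signature or C2 indicator is involved: the two steady hosts surface only because they resemble each other and have almost no timing jitter. The thresholds would need tuning against real traffic.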

Detecting data exfiltration in network logs comes down to volume and destination entropy. A host that starts sending unusual byte counts to destinations outside its normal peer group, especially at odd hours, stands out statistically even when it doesn't match any signature.
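The two signals named here can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the talk's implementation: destination entropy is taken to be the Shannon entropy of a host's per-connection destination list, "unusual byte counts" is approximated with a z-score against the host's own baseline, and the function names and sample values are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def destination_entropy(dest_ips):
    """Shannon entropy (bits) of how a host spreads connections over destinations."""
    counts = Counter(dest_ips)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def volume_zscore(baseline_bytes, observed_bytes):
    """How many baseline standard deviations the observed volume sits from the mean."""
    mu = mean(baseline_bytes)
    sigma = pstdev(baseline_bytes)
    return (observed_bytes - mu) / sigma if sigma else 0.0


def exfil_signals(baseline_dests, baseline_bytes, today_dests, today_bytes):
    """Compare today's behavior with the host's own baseline."""
    return {
        # A shift in either direction can be interesting, so report the signed change.
        "entropy_shift": destination_entropy(today_dests)
        - destination_entropy(baseline_dests),
        # Destinations never seen in the baseline window.
        "new_destinations": sorted(set(today_dests) - set(baseline_dests)),
        "volume_zscore": volume_zscore(baseline_bytes, today_bytes),
    }


if __name__ == "__main__":
    # Made-up values: four usual peers, then traffic concentrated on a new one.
    baseline_dests = ["192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.4"]
    today_dests = ["192.0.2.1", "198.51.100.9", "198.51.100.9", "198.51.100.9"]
    print(exfil_signals(baseline_dests, [100, 120, 80, 100], today_dests, 1000))
```

A real deployment would also bucket by hour of day to capture the odd-hours signal, which this sketch leaves out.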

The argument I was making to the Suricata community was straightforward: you've already done the hard part. You have the data. The ML toolkit doesn't require you to be a data scientist — it requires you to know your network, which you already do.

Presented at SuriCon 2016