At .conf17 I ran a hands-on workshop on building security apps in Splunk. Not conceptual security apps — working ones, with real data, that you could take home and deploy. The session was built around the Boss of the SOC dataset from .conf2016, which is the closest thing to production-quality adversarial data you can get your hands on in a workshop setting.
The premise of the workshop was that building a Splunk app is one of the most effective ways to operationalize security work, and that most security practitioners already have everything they need to do it. They just haven't been shown the path from raw data to something deployable.
Starting with Data You Can Trust
Before anything else, the app needs to ingest data correctly. A Technology Add-on (TA) handles that: it tells Splunk how to parse a specific data source, normalizes field names, and ensures the data lands in a state that downstream searches can rely on. Skipping this step is the most common reason security apps break after deployment. If the TA isn't right, every search built on top of it is wrong.
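As a sketch of what a TA actually pins down, here is a minimal props.conf stanza for a hypothetical acme:proxy sourcetype (the sourcetype name and field layout are illustrative, not from the workshop):

```ini
[acme:proxy]
# Event boundaries: one event per line, no line merging
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
# Timestamp parsing: explicit format with timezone, limited lookahead
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S %z
MAX_TIMESTAMP_LOOKAHEAD = 25
# Guard against silently truncated payloads
TRUNCATE = 10000
# Search-time field extraction into normalized field names
EXTRACT-proxy_fields = ^\S+ \S+ (?<src_ip>\S+) (?<dest_ip>\S+) (?<action>\w+)
```

Every attribute here is a standard props.conf setting; the point is that each one closes off a class of ingestion bug before any search is written.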
Data validation comes next. The question isn't whether Splunk received events — it's whether the events contain what you expect. Timestamps in the right timezone. Fields populated with the right types. No silently truncated payloads. This sounds tedious because it is, but it's the kind of problem that only becomes visible weeks later when a detection fires on garbage data or fails to fire on real activity.
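A validation search along these lines can surface several of those problems at once. The proxy index and src_ip field are hypothetical stand-ins for whatever the TA produces; _indextime minus _time exposes timestamp skew, and the conditional sum counts events missing a required field:

```spl
index=proxy earliest=-24h
| eval index_lag = _indextime - _time
| stats count AS events,
        avg(index_lag) AS avg_lag_sec,
        max(index_lag) AS max_lag_sec,
        sum(eval(if(isnull(src_ip), 1, 0))) AS missing_src_ip
  BY sourcetype
```

A large or negative lag usually means timezone or TIME_FORMAT problems; a nonzero missing_src_ip count means the extraction is wrong for some event shape.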
The Common Information Model is what makes it possible to write searches that work across multiple data sources. CIM standardizes field names — src_ip, dest_ip, user, action — so a search written against web proxy data also works against firewall logs. The alternative is writing and maintaining separate searches for every data source, which is how SOC tooling turns into an unmaintainable pile of one-offs.
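To make that concrete: a data-model search like the one below runs identically whether the underlying events came from a proxy TA or a firewall TA, because both map into the Web data model's normalized fields. The Web data model and its action, src, and dest fields ship with the CIM add-on; the index constraints live in the data model configuration rather than the search:

```spl
| tstats count FROM datamodel=Web WHERE Web.action="blocked" BY Web.src, Web.dest
| rename Web.src AS src, Web.dest AS dest
| sort - count
```

Swap in a new data source tomorrow and, as long as its TA maps to CIM, this search picks it up without modification.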
Getting to Signal Faster
Raw event data is expensive to search at scale. Summary indexing addresses this by pre-computing aggregations (counts, sums, distinct values) and writing them to a summary index on a schedule. A search that would scan millions of raw events instead reads thousands of pre-aggregated records. For dashboards and scheduled detections that run frequently against large datasets, this is the difference between a tool that's usable and one that people stop opening.
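A minimal version of the pattern, assuming a hypothetical proxy index and a proxy_summary index created for the purpose: a search scheduled hourly computes the aggregates and writes them out with collect, and dashboards then read the summary instead of the raw events:

```spl
index=proxy earliest=-1h@h latest=@h
| stats count AS events, dc(dest_ip) AS distinct_dests, sum(bytes_out) AS bytes_out BY src_ip
| collect index=proxy_summary
```

A dashboard panel then searches index=proxy_summary and re-aggregates the hourly rows. Splunk's si- commands (sistats, sitimechart) are the other route to the same end; collect keeps the example self-contained.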
Data enrichment is where threat context comes from. A lookup table that maps IP addresses to known bad actors, geolocation data that puts connections on a map, WHOIS data that shows when a domain was registered — none of this comes from the raw event, but all of it changes how an analyst interprets what they're seeing. Enrichment done at search time through lookups keeps the enrichment data current without requiring re-indexing.
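As a sketch, with a hypothetical threat_intel_by_ip lookup file that maps an ip column to threat fields (iplocation is built into Splunk; the lookup is something you maintain and can update without touching the index):

```spl
index=proxy
| lookup threat_intel_by_ip ip AS dest_ip OUTPUT threat_category, threat_source
| where isnotnull(threat_category)
| iplocation dest_ip
| table _time, src_ip, dest_ip, Country, threat_category, threat_source
```

Because the lookup is applied at search time, refreshing the intel file tomorrow changes what this search returns today, with no re-indexing.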
Analysis Before Visualization
There's a tendency to jump to dashboards before the underlying analysis is solid. The analysis techniques covered in the workshop — statistical baselines, frequency analysis, outlier detection — are the things a dashboard should surface, not the other way around. A visualization of bad analysis is just a better-looking wrong answer.
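A baseline-and-outlier search in this spirit, assuming a hypothetical proxy index with a user field: build a per-user hourly activity baseline over a week, then flag any hour more than three standard deviations above that user's own average:

```spl
index=proxy earliest=-7d@h latest=@h
| bin _time span=1h
| stats count BY user, _time
| eventstats avg(count) AS baseline, stdev(count) AS spread BY user
| where count > baseline + 3 * spread
| sort - count
```

This is the analysis; the dashboard's job is only to surface the rows it returns.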
Machine learning fits into this as hypothesis testing. You have a theory about what botnet beaconing looks like in network traffic. You encode that theory as features, train a model on labeled examples, and see whether the model's behavior matches your intuition. The Splunk Machine Learning Toolkit makes this accessible without requiring a Python environment or data science background — the clustering and anomaly detection assistants walk through feature selection and model configuration in the browser. The point isn't to automate analyst judgment. It's to get a signal in front of an analyst faster than signature-matching alone can.
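As an illustration of that hypothesis-testing framing (not the workshop's exact exercise): beaconing implies regular connection intervals and stable payload sizes, so you can encode those as per-pair features and hand them to MLTK's KMeans. This assumes MLTK is installed and a hypothetical proxy index with a bytes_out field:

```spl
index=proxy earliest=-24h
| bin _time span=1m
| stats count AS conns, sum(bytes_out) AS bytes_out BY src_ip, dest_ip, _time
| stats avg(conns) AS avg_conns, stdev(conns) AS conn_jitter,
        avg(bytes_out) AS avg_bytes, stdev(bytes_out) AS bytes_jitter
  BY src_ip, dest_ip
| fit KMeans avg_conns conn_jitter avg_bytes bytes_jitter k=4 into app:beacon_clusters
```

Pairs that land in a low-jitter cluster are beacon candidates for an analyst to review, and the saved model can score new traffic later with the apply command. If the clusters don't separate the way your theory predicts, that is the result: the hypothesis, not the tool, needs revisiting.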
Building Something You Can Actually Use
The workshop was iterative. Participants built a working app over the course of the session rather than watching a pre-built demo. The app that came out of it could be taken home, customized for a specific environment, or rebuilt against a different security framework. The Boss of the SOC dataset provided enough realistic activity — scanning, exploitation, exfiltration — to exercise the detection logic being built.
The methodology question the workshop was designed to answer isn't "how do I use Splunk" but "when does building an app actually solve my problem, and what should the app do." The answer depends on what data you have, what you're trying to detect, and what an analyst needs to be able to act on a finding. App structure follows from that, not the other way around.
Download Slides
Presented at Splunk .conf17 | Hands-on Workshop