Ever wanted to manage and integrate your Splunk Enterprise deployment using your favorite data science tool? Then this blog's for you.
Important Notes:
- This setup is intended for development and single-instance deployments only
- Requires sudo/root access to map user IDs (UIDs) and directory ownership correctly
Requirements
- Sudo/Root access
- Docker and familiarity with the Docker CLI
- Splunk Enterprise
- Understanding of Jupyter Notebook
Preparing the Environment
Verify Docker is installed:
$ docker --version
Docker version 18.09.2, build 6247962
If Docker isn't installed, see the Docker installation documentation.
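If you'd rather script these checks from Python (for example, later from a notebook), a minimal sketch like the one below finds the docker binary and parses its version string. The helper names are hypothetical, and the regular expression assumes the "Docker version X.Y.Z, build ..." format shown above.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

# Matches the first three numeric components of output such as
# "Docker version 18.09.2, build 6247962".
_VERSION_RE = re.compile(r"Docker version (\d+)\.(\d+)\.(\d+)")


def parse_docker_version(text: str) -> Optional[Tuple[int, int, int]]:
    """Return (major, minor, patch) parsed from `docker --version` output, or None."""
    match = _VERSION_RE.search(text)
    if match is None:
        return None
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def installed_docker_version() -> Optional[Tuple[int, int, int]]:
    """Return the installed Docker version, or None if docker is not on PATH."""
    if shutil.which("docker") is None:
        return None
    result = subprocess.run(
        ["docker", "--version"], capture_output=True, text=True, check=False
    )
    return parse_docker_version(result.stdout)
```

For the output shown above, parse_docker_version("Docker version 18.09.2, build 6247962") returns (18, 9, 2).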
Determine Splunk's UID
Following Splunk's best practices, run Splunk Enterprise as a dedicated, non-root local user. You'll need that user's UID so the container can map directory ownership correctly.
For a Splunk installation owned by user splunk:
$ id -u splunk
1001
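The same lookup can be done from Python with the standard pwd module. This is a sketch: splunk_ids is a hypothetical helper name, and it also returns the account's primary GID, because the docker run command below sets NB_GID as well as NB_UID. The pwd module is only available on Unix-like systems.

```python
import pwd
from typing import Tuple


def splunk_ids(user: str = "splunk") -> Tuple[int, int]:
    """Return (uid, gid) for a local account, similar to `id -u` and `id -g`.

    pwd.getpwnam raises KeyError if the account does not exist.
    """
    entry = pwd.getpwnam(user)
    return entry.pw_uid, entry.pw_gid
```

On the host from the example above, splunk_ids() would give the UID reported by id -u splunk.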
Stop Splunk Enterprise
If Splunk is running, shut it down:
$ /opt/splunk/bin/splunk stop
Install Jupyter Notebook via Docker
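Start the jupyter/base-notebook image, publishing Jupyter's port (8888) along with Splunk Web (8000) and the splunkd management port (8089), and mounting the Splunk installation as the notebook user's home directory: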
$ docker run -t -i --user root \
-p 8888:8888 -p 8000:8000 -p 8089:8089 \
-e NB_UID=1001 -e NB_GID=1001 \
-e JUPYTER_ENABLE_LAB=yes \
-e NB_USER=splunk \
-e CHOWN_EXTRA="/home/splunk" \
-v /opt/splunk/:/home/splunk/ \
jupyter/base-notebook
Assumptions
- Splunk Enterprise is installed in /opt/splunk/
- All files are owned by user splunk with UID 1001 (the command also sets the GID to 1001)
- Ports 8888, 8000, and 8089 are free on the host
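Before starting the container, you can check that those ports are actually free. This sketch (busy_ports is a hypothetical helper) tries to bind each port on all interfaces, which is where Docker publishes ports by default, and reports any port that fails to bind. Binding can fail for reasons other than another listener, so treat the result as a hint rather than a guarantee.

```python
import socket
from typing import Iterable, List


def busy_ports(ports: Iterable[int], host: str = "0.0.0.0") -> List[int]:
    """Return the ports in `ports` that cannot currently be bound on `host`."""
    busy = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                # Binding (without listening) is enough to detect a conflict;
                # the socket is closed again when the with-block exits.
                sock.bind((host, port))
            except OSError:
                busy.append(port)
    return busy
```

With Splunk stopped and nothing else using them, busy_ports([8888, 8000, 8089]) should return an empty list.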
Disconnect from Docker Terminal
Detach from the container without stopping it by pressing Ctrl+P, then Ctrl+Q.
Verify Jupyter Access
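Open Jupyter at http://localhost:8888 using the URL (including its token) printed in the container's startup output; docker logs <container> shows it again if you need it. If permissions are correct, Jupyter will see the host's /opt/splunk directory as /home/splunk.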

Test Permissions
Open a terminal in Jupyter:
splunk@6ae2fb6269c4:~$ whoami
splunk
splunk@6ae2fb6269c4:~$ pwd
/home/splunk
splunk@6ae2fb6269c4:~$ ls
bin etc lib openssl share var
splunk@6ae2fb6269c4:~$ bin/splunk start
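You can run a similar ownership check from a notebook cell. The sketch below (wrong_owner is a hypothetical helper) returns the directory and any direct children whose owner UID differs from the notebook user's, so with the UID mapping above, wrong_owner("/home/splunk") should return an empty list. It only checks the top level, so it won't catch a stray file deeper in the tree.

```python
import os
from pathlib import Path
from typing import List, Optional


def wrong_owner(root: str, uid: Optional[int] = None) -> List[str]:
    """Return `root` and its direct children that are not owned by `uid`.

    `uid` defaults to the current process's UID (the notebook user).
    Only the top level is checked, which keeps the scan fast on a large
    Splunk installation.
    """
    if uid is None:
        uid = os.getuid()
    base = Path(root)
    paths = [base, *base.iterdir()]
    # lstat() reports the owner of a symlink itself rather than its target.
    return [str(p) for p in paths if p.lstat().st_uid != uid]
```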
Access Splunk Web
Once Splunk has started, Splunk Web is available at http://localhost:8000 (the port published by the docker run command).

Verify that Splunk is running by checking for splunkd processes with top.

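Because the container publishes ports 8000 and 8089, you can also confirm from a notebook that Splunk Web and splunkd are accepting connections. This is a minimal sketch; port_is_listening is a hypothetical helper, and a successful TCP connection only shows that something is listening on the port, not that Splunk has finished initializing.

```python
import socket


def port_is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts (socket.timeout is an OSError
        # subclass), and name-resolution failures.
        return False
```

Once Splunk is up, port_is_listening("localhost", 8000) and port_is_listening("localhost", 8089) should both return True.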
Leverage Splunk's CLI for Data Science
From the Jupyter terminal, you can run Splunk searches directly with the Splunk CLI.
Basic Search
splunk@6ae2fb6269c4:~$ bin/splunk search 'index=_internal | fields _time | head 1'
Splunk username: admin
Password:
04-01-2019 08:28:15.935 +0000 INFO Metrics...
CSV Output
Set the output format to CSV so the results are easier to load into Python:
$ bin/splunk search 'index=_internal | fields _time | head 1' -output csv
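From a notebook, the same search can be run through subprocess and the CSV text parsed with the standard csv module. This sketch makes several assumptions: the helper names are hypothetical, the Splunk installation is mounted at /home/splunk as above, the CLI is already authenticated (for example, after running bin/splunk login in the Jupyter terminal) so it doesn't prompt for credentials, and the first line of the CSV output is a header row.

```python
import csv
import io
import subprocess
from typing import Dict, List


def parse_csv_output(text: str) -> List[Dict[str, str]]:
    """Parse CSV text (header row first) into one dict per result row."""
    return list(csv.DictReader(io.StringIO(text)))


def run_search_csv(query: str, splunk_home: str = "/home/splunk",
                   timeout: float = 300) -> List[Dict[str, str]]:
    """Run `bin/splunk search <query> -output csv` and parse the result.

    Assumes the CLI is already authenticated. stdin is redirected from
    /dev/null, and the timeout bounds how long the call can wait if the
    CLI prompts anyway.
    """
    result = subprocess.run(
        [f"{splunk_home}/bin/splunk", "search", query, "-output", "csv"],
        stdin=subprocess.DEVNULL,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return parse_csv_output(result.stdout)
```

For example, run_search_csv("index=_internal | fields _time | head 1") returns a list of row dictionaries keyed by field name. If pandas is available in your notebook environment, pandas.read_csv(io.StringIO(text)) turns the same CSV text into a DataFrame instead.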
Using ML Toolkit Commands
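Splunk's CLI supports app contexts, so you can also run Machine Learning Toolkit (MLTK) commands such as fit. This example assumes the MLTK app is installed: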
$ bin/splunk search '| inputlookup firewall_traffic.csv | head 50000
| fit LogisticRegression fit_intercept=true "used_by_malware"
from "bytes_sent" "bytes_received" "packets_sent" "packets_received"
"dest_port" "src_port" "has_known_vulnerability"
into "example_malware"'
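When you start varying features or algorithms, it can help to build the fit search string programmatically. The sketch below is a hypothetical helper, not part of any Splunk library; it assembles the same shape as the command above (a source search, then fit, the algorithm, optional parameters, the quoted target, from, the quoted features, into, and the quoted model name). It rejects names containing double quotes rather than guessing at SPL escaping rules.

```python
from typing import Sequence


def quote_field(name: str) -> str:
    """Wrap a field or model name in double quotes for SPL."""
    if '"' in name:
        # Avoid guessing at escaping rules; reject instead.
        raise ValueError(f"name contains a double quote: {name!r}")
    return f'"{name}"'


def build_fit_search(source: str, algorithm: str, target: str,
                     features: Sequence[str], model: str,
                     options: str = "") -> str:
    """Return '<source> | fit <algorithm> [options] "<target>" from "<f1>" ... into "<model>"'."""
    parts = [source, "| fit", algorithm]
    if options:
        parts.append(options)
    parts.append(quote_field(target))
    parts.append("from")
    parts.extend(quote_field(feature) for feature in features)
    parts.append("into")
    parts.append(quote_field(model))
    return " ".join(parts)
```

Called with the lookup, algorithm, parameters, target, features, and model name from the command above, it reproduces that search as a single-line string you can pass to bin/splunk search.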

Next Steps
In part two, we'll cover hands-on examples of leveraging this configuration for machine learning and analytics workflows.
This integration enables data scientists to use familiar tools while working with Splunk's powerful data platform.