Using Docker and Splunk to Operationalize the Machine Learning Toolkit

Anthony G. Tellez
Tags: Splunk, Docker, Machine Learning, Data Science, MLOps, MLTK, DevOps, 2019

Keeping a Splunk dev environment current is more work than it sounds. Every time a new version of the ML Toolkit or the Python for Scientific Computing add-on ships, there is a manual install cycle: download from Splunkbase, unpack, restart, verify. On a shared server, or when the environment has to be handed to someone else, that cycle repeats for every person and every machine. The official Splunk Docker image removes most of that friction: you can pull a fresh Splunk instance, tell it which apps to install at startup, and have a fully configured ML development environment running in a few minutes without installing Splunk on the host itself.

You need Docker installed, root or sudo access on the host, and a Splunkbase account. Verify Docker is working before going further:

$ docker --version
Docker version 18.09.2, build 6247962

If that doesn't return something similar, Docker's official installation docs cover all major platforms.
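Before pulling anything, a small preflight check can fail fast when a required command is missing. The function below is a minimal POSIX `sh` sketch; `require_cmds` is a hypothetical helper name, not part of Docker, and it only confirms that each command is on `PATH`.

```shell
#!/bin/sh
# Preflight sketch: report every command that is not on PATH.
# `require_cmds` is a hypothetical helper, not part of Docker or Splunk.
# Returns 0 if all commands are found, 1 if any are missing.
require_cmds() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: require_cmds docker || exit 1
```

`docker --version` only exercises the client. `docker info` also fails when the Docker daemon is not running, so it is a stricter check to run once the command itself is found.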

Pulling the Image

The official image lives on Docker Hub:

$ docker pull splunk/splunk:latest

More detail on the image is on its Docker Hub page (splunk/splunk).

Running with Pre-installed Apps

The part that makes this useful for MLTK work is the SPLUNK_APPS_URL environment variable. Passing it at launch, together with Splunkbase credentials, tells the container to download specific app versions from Splunkbase and install them during its initial setup. The container therefore comes up with the Python for Scientific Computing add-on and the ML Toolkit already installed, with no manual intervention:

$ docker run -d -p 8000:8000 \
  -e 'SPLUNK_START_ARGS=--accept-license' \
  -e 'SPLUNK_PASSWORD=splunk123' \
  -e SPLUNK_APPS_URL=https://splunkbase.splunk.com/app/2882/release/1.3/download,https://splunkbase.splunk.com/app/2890/release/4.1.0/download \
  -e SPLUNKBASE_USERNAME=<your_email@domain.com> \
  -e SPLUNKBASE_PASSWORD=<your_password> \
  splunk/splunk:latest

Replace the SPLUNKBASE_USERNAME and SPLUNKBASE_PASSWORD values with your Splunkbase credentials. The two URLs in SPLUNK_APPS_URL install Python for Scientific Computing v1.3 and the Splunk Machine Learning Toolkit v4.1.0. The comma-separated format means you can add more apps to that list without changing anything else about the command.
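Because each Splunkbase download URL above follows the same `app/<id>/release/<version>/download` pattern, the value of SPLUNK_APPS_URL can be generated from a short list of app IDs and versions instead of edited by hand. This is a sketch under that assumption: `splunkbase_urls` is a hypothetical helper, and the pattern is inferred from the two URLs in the command rather than from a documented Splunkbase contract.

```shell
#!/bin/sh
# Sketch: build a comma-separated SPLUNK_APPS_URL from "app_id:version" pairs.
# URL pattern assumed from the example command:
#   https://splunkbase.splunk.com/app/<id>/release/<version>/download
# `splunkbase_urls` is a hypothetical helper, not part of the Splunk image.
splunkbase_urls() {
  result=""
  for pair in "$@"; do
    id=${pair%%:*}       # text before the first colon, e.g. 2882
    version=${pair#*:}   # text after the first colon, e.g. 1.3
    url="https://splunkbase.splunk.com/app/${id}/release/${version}/download"
    if [ -z "$result" ]; then
      result=$url
    else
      result="${result},${url}"
    fi
  done
  printf '%s\n' "$result"
}

# Python for Scientific Computing 1.3 and the ML Toolkit 4.1.0, as above.
APPS=$(splunkbase_urls 2882:1.3 2890:4.1.0)
```

The result can be passed as `-e SPLUNK_APPS_URL="$APPS"` in the `docker run` command, and adding another app means appending one more `id:version` pair.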

Once the container is running, Splunk Web is at http://localhost:8000. Log in as admin with the password passed in SPLUNK_PASSWORD (splunk123 in this example). To change it, set a different SPLUNK_PASSWORD before launch or update it through the UI afterward.
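`docker run -d` returns as soon as the container starts, but Splunk Web can take a few minutes to respond while the container provisions itself and downloads the Splunkbase apps. A small retry loop lets a setup script wait for that point. This is a sketch: `retry` is a hypothetical helper, and the commented example assumes `curl` is installed on the host.

```shell
#!/bin/sh
# Sketch: run a command up to <attempts> times, sleeping <delay> seconds
# between failures. Returns 0 on the first success, 1 if every attempt fails.
# `retry` is a hypothetical helper name.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Only sleep if another attempt is coming.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Example (assumes curl): poll Splunk Web every 10 s for up to ~5 minutes.
# retry 30 10 curl -sf -o /dev/null http://localhost:8000 && echo "Splunk Web is up"
```

`curl -f` treats HTTP errors as failures but exits successfully on redirects, so any normal response from Splunk Web ends the loop. To watch provisioning directly instead, `docker logs -f <container-id>` follows the container's output.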

For a deeper look at what the ML Toolkit can do once the environment is up, the Splunk Machine Learning YouTube playlist covers the core use cases.