Creating Custom Containers for the Deep Learning Toolkit

Anthony G. Tellez · 4 min read
Tags: Machine Learning, Deep Learning, Docker, Splunk, GPU, MLOps

The Deep Learning Toolkit (DLTK) was launched at .conf19 with the intention of helping customers leverage additional deep learning frameworks as part of their machine learning workflows.

The app ships with four separate containers:

  • TensorFlow 2.0 - CPU
  • TensorFlow 2.0 - GPU
  • PyTorch
  • SpaCy

All containers ship with a base install of JupyterLab and TensorBoard to help customers develop neural networks or custom algorithms.

Extensibility

The DLTK was built with extensibility in mind, as many different deep learning frameworks are available to data scientists. This blog covers the open-source framework and how to build a custom container with pre-built libraries.

[Figure: overview of deep learning frameworks. Source: RTInsights, Top Deep Learning Tools]

DLTK Container Repository

Clone the DLTK container repository:

$ git clone https://github.com/splunk/splunk-mltk-container-docker

Review the README.md for context on the different flavors available for building base images. These are organized into tags such as tf-cpu and pytorch, which map to cases in the build.sh script located in the root directory.

Key Notes

  • The base images referenced in build.sh come from official repositories such as tensorflow/tensorflow on Docker Hub
  • The Dockerfile uses pip to install new libraries to customize the image

Creating an Image

Basic syntax:

$ ./build.sh tf-cpu your_local_docker_repo/

Creating a Custom Image: Nvidia Rapids Example

In this guide, we're creating a custom container that installs the NVIDIA RAPIDS framework [rapids.ai]. Think of these libraries as similar to the ones that ship with the Machine Learning Toolkit, but capable of running on NVIDIA GPUs.

Files to Modify

  1. build.sh - Add support for new tag
  2. Dockerfile - Use conda install instead of pip
  3. bootstrap.sh - Adjust container startup for virtual environments

Modify build.sh

Add after the nlp section:

rapidsai)
    base="rapidsai/rapidsai:cuda10.0-base-ubuntu18.04"
    ;;
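For context, build.sh resolves its first argument (the tag) to a base image with a case statement like the one above. A minimal runnable sketch of that mapping is below; the tf-cpu base tag shown is an assumption for illustration, so check the repository's build.sh for the exact values:

```shell
#!/bin/bash
# Sketch of build.sh's tag-to-base-image mapping.
# The tf-cpu base tag is an assumption; the rapidsai entry matches
# the case added to build.sh in this guide.
resolve_base() {
  case "$1" in
    tf-cpu)   echo "tensorflow/tensorflow:2.0.0" ;;
    rapidsai) echo "rapidsai/rapidsai:cuda10.0-base-ubuntu18.04" ;;
    *)        echo "unknown tag: $1" >&2; return 1 ;;
  esac
}

resolve_base rapidsai   # prints rapidsai/rapidsai:cuda10.0-base-ubuntu18.04
```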

Modify Dockerfile

Replace pip install commands with conda syntax:

Before:

RUN pip install Flask
RUN pip install h5py
RUN pip install pandas
RUN pip install scipy
RUN pip install scikit-learn
RUN pip install jupyterlab
RUN pip install shap
RUN pip install lime
RUN pip install matplotlib
RUN pip install networkx

After:

RUN conda install -y -n rapids jupyterlab flask h5py tensorboard nb_conda

Note: The RAPIDS container ships with specialized conda environments. Use -n rapids to install packages into the RAPIDS Python environment, and -y so the build runs non-interactively.
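To keep the final image smaller, the conda step can also clear the package cache in the same layer. A hedged Dockerfile sketch, using the same base image as the build.sh example (the cleanup step is an optional addition, not part of the stock DLTK Dockerfile):

```dockerfile
FROM rapidsai/rapidsai:cuda10.0-base-ubuntu18.04
# Install the DLTK's supporting libraries into the pre-built "rapids" env,
# then clear conda's package cache so the layer stays small
RUN conda install -y -n rapids jupyterlab flask h5py tensorboard nb_conda \
 && conda clean -afy
```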

Modify bootstrap.sh

Update shell settings and activate Rapids environment:

#!/bin/bash
source activate rapids

Building the Custom Image

Local Deployment

If Splunk runs on the same machine as Docker, Docker Hub setup is optional:

$ ./build.sh rapidsai docker_repo_name/

Example:

$ ./build.sh rapidsai anthonygtellez/
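The resulting image name follows the pattern repo prefix + mltk-container- + tag. A small sketch of that naming convention (the helper function is hypothetical; the pattern matches the push command in the next section):

```shell
#!/bin/bash
# Hypothetical helper illustrating the image name build.sh produces:
# <repo prefix>mltk-container-<tag>:latest
image_name() {
  echo "${2}mltk-container-${1}:latest"   # $1 = tag, $2 = repo prefix
}

image_name rapidsai anthonygtellez/   # prints anthonygtellez/mltk-container-rapidsai:latest
```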

Push to DockerHub

For remote deployments, push the image to Docker Hub:

$ docker push anthonygtellez/mltk-container-rapidsai:latest

Output:

The push refers to repository [docker.io/anthonygtellez/mltk-container-rapidsai]
8116c2c543aa: Pushed
aea568d7d2e7: Pushed
...
digest: sha256:46278436db8c3471f246d2334cc05ad9c8093ab7f98011fc92dc7d807faf4047 size: 2418

Note: Layers inherited from the rapidsai base image won't be re-uploaded; only new or changed layers are pushed.

Configuring the DLTK

Create images.conf in the DLTK local directory:

Location: $SPLUNK_HOME/etc/apps/mltk-container/local/images.conf

Configuration:

[conda]
title = Conda Rapids
image = mltk-container-rapidsai
repo = anthonygtellez/
runtime = none,nvidia

Deploy the Container

  1. Restart Splunk
  2. Open DLTK container management page
  3. Select new image from dropdown

[Screenshot: DLTK container management overview]

Note: First launch may take time as Docker downloads the image. Subsequent deployments will be faster.

Using Jupyter Notebook

Once running, access Jupyter to use the newly installed libraries:

[Screenshot: Jupyter notebook running in the custom container]

Additional Resources

Missed the .conf session? Watch it online:

FN1409 - Advances in Deep Learning with the MLTK Container for TensorFlow 2.0, PyTorch and Jupyter Notebooks

Watch on Splunk .conf


This framework enables customers to extend their machine learning pipelines with custom deep learning capabilities tailored to their specific needs.