Rewriting a Python Moderation Service in Go: From 3GB to 50MB
The Python service worked, in the sense that it produced correct output. But it consumed roughly 3GB of memory at rest, took between 30 and 60 seconds to fully initialize, and had to run on a machine with enough RAM to absorb that footprint before the first request was ever handled. For a service that needed to be always-on, respond to events within a few hundred milliseconds, and restart cleanly after a crash, those numbers were a problem. I rewrote it from scratch in Go over a few weeks. The result uses 20 to 50MB of memory and starts in under 100 milliseconds. This is an account of what changed and why.
What the Service Does
The service connects to a live platform's IRC and EventSub APIs to receive real-time chat events, classifies incoming messages against a rule set of 50-plus regex patterns, conditionally synthesizes speech responses via ElevenLabs TTS, posts behavioral data to a separate tracking microservice, and writes structured JSON logs to disk. Periodically, it flushes session data to S3-compatible object storage. It also sends commands to streaming software via a WebSocket API to update on-screen overlays based on live events.
Every one of those operations has a different latency profile. IRC messages arrive in bursts. TTS synthesis involves a network round trip to an external API. Behavior tracking API calls can be slow without affecting the critical path. S3 uploads are bursty and periodic. Coordinating all of this without one slow operation blocking the others was the core architectural problem.
Why Python Was the Wrong Tool for This
Python was the wrong choice for this service for reasons that have little to do with raw speed. Go is faster than Python for CPU-bound work, but this service barely does any CPU-bound work. The real problems were three distinct issues.
The GIL and I/O concurrency. Python's Global Interpreter Lock means that only one thread executes Python bytecode at a time. Async Python (asyncio) works around this for I/O-bound work, but it requires careful, consistent use of await throughout the entire call stack. Mixing synchronous and asynchronous code blocks the event loop, and this happens easily when you import a library that predates asyncio. The original service had exactly this problem in two places, and fixing one introduced the other.
Import overhead and library weight. Python's startup time scales with what you import. The original service imported several libraries that are standard in Python AI work. Some of them loaded model weights or initialized subsystems at import time even when those subsystems were never used in this service's execution path. The 3GB memory footprint was not the running service doing useful work. It was the Python runtime and imported libraries sitting in memory waiting.
No single binary distribution. Deploying the Python service required getting the right Python version onto the target machine, installing dependencies via pip into a virtual environment, and verifying that the environment was consistent with development. Go compiles to a static binary with no external runtime dependencies.
None of these are Python's fault in a general sense. Python is excellent for interactive data work, scripting, and applications where developer velocity matters more than runtime characteristics. An always-on, latency-sensitive service that does I/O coordination is just not that kind of application.
The Go Architecture
The Go service is organized around a small number of goroutines with explicit channel communication between them.
IRC consumer goroutine maintains a persistent TCP connection to the platform's chat server, reads incoming messages, and puts them on a messages channel. Reconnection logic lives here: exponential backoff with a cap at 60 seconds, and a forced restart if the connection has been down for more than five minutes.
EventSub consumer goroutine maintains a WebSocket connection to the platform's event subscription system and puts relevant events on the same messages channel as the IRC consumer. From the downstream perspective, chat messages and subscription events look identical because they are both Event structs with a type field and a payload. The consumer goroutines are the only place in the codebase that knows the two sources exist.
Classifier goroutine reads from the messages channel and applies the regex rule set synchronously. Regex matching is fast enough that this does not need further parallelization. On a match, it constructs an action struct and puts it on an actions channel.
Action dispatcher goroutine reads from the actions channel and fans out concurrent goroutines for each required response: TTS synthesis, behavior API post, log write, and overlay update. These are launched with go func() calls rather than a worker pool, because action bursts are short and the overhead of goroutine creation at this scale is negligible compared to the I/O latency of any of the actions.
S3 backup goroutine ticks on a five-minute timer using time.NewTicker. On each tick, it serializes current session state to JSON and uploads it to S3-compatible object storage. This is entirely independent of the main message processing pipeline and has no effect on message throughput regardless of how long the upload takes.
Overlay WebSocket goroutine maintains a persistent connection to the streaming software's WebSocket API and accepts commands via a dedicated channel. Other goroutines send overlay updates by putting command structs on that channel rather than calling WebSocket methods directly. This prevents concurrent writes on a single WebSocket connection without any mutex usage.
The practical result is that a slow TTS API call does not delay the next message from being classified. A slow behavior API post does not block overlay updates. Everything that can run concurrently does, and the channel structure makes the data flow between components explicit and auditable.
Rule-Based Classification: Why Not ML
The classification system is 50-plus regex patterns organized into named categories: backseating, unsolicited advice, solicitation, excessive repetition, spam, and a handful of others specific to the platform context. When a message matches one or more patterns, it is labeled with those categories and routed to a configurable response action.
I deliberately did not use a machine learning model for this. The reasons were practical and not ideological.
Latency. Running inference through a local model or sending text to a remote classification endpoint adds latency on the critical path between message receipt and response. Regex matching on a message is measured in microseconds.
Interpretability. When a message is misclassified, I need to know which pattern fired and why so I can update it. A regex pattern is self-documenting in context: debugging a misfire takes minutes of reading. Debugging a model's misclassification requires interpretability tooling that is not worth building for this task.
Updateability without redeployment. The regex patterns are loaded from a JSON config file at startup. Updating a pattern means editing a file and restarting the service, which takes about ten seconds. Adding a new category means adding a new key to the config. Retraining a model and redeploying it is a substantially higher bar for what are often minor behavioral adjustments.
LLM-powered moderation would provide better coverage on adversarial inputs and edge cases. But the latency and cost tradeoffs do not make sense for a task that already has well-defined rules expressible as patterns.
Structured Logging Without a Log Aggregator
The service writes to three separate JSON log files: chat.log for all incoming messages and their classifications, error.log for service errors and API failures, and session.log for session lifecycle events and metadata.
Each log entry is a JSON object on its own line, using newline-delimited JSON rather than a JSON array. The format is designed to be parseable with jq without preprocessing. When something goes wrong, I open a terminal and filter the log directly rather than standing up a log aggregation stack for a single-instance service.
{"ts":"2026-01-14T22:31:05Z","type":"message","user":"example","text":"let me help you with this","categories":["backseating"],"action":"tts_response","latency_ms":3}
Separate log files per concern turned out to be more useful in practice than a single log with a level or type field. When debugging a TTS synthesis failure, I do not want to scroll through a thousand chat message entries. When reviewing classification decisions for a session, I do not want error entries mixed in.
Log writes are handled by a dedicated goroutine with a buffered channel. Entries are put on the channel by action goroutines and written to disk serially by the log goroutine. This prevents concurrent writes to the same file without requiring a mutex on the write path, and keeps file handles open throughout the service lifetime rather than opening and closing them per entry.
Cross-Platform Builds via Makefile
The Go service builds to native binaries for Mac Intel, Mac ARM, and Linux via a Makefile with explicit GOOS and GOARCH settings:
build-mac-intel:
	GOOS=darwin GOARCH=amd64 go build -o bin/service-darwin-amd64 ./cmd/service

build-mac-arm:
	GOOS=darwin GOARCH=arm64 go build -o bin/service-darwin-arm64 ./cmd/service

build-linux:
	GOOS=linux GOARCH=amd64 go build -o bin/service-linux-amd64 ./cmd/service

build: build-mac-intel build-mac-arm build-linux

.PHONY: build build-mac-intel build-mac-arm build-linux
Cross-compilation is built into the Go toolchain. No Docker, no external cross-compilation toolchain, no per-platform CI runners. Running make build on a Mac ARM machine produces three working binaries in under ten seconds. The Linux binary can be copied directly to a Linux server and executed with no additional installation.
Graceful Shutdown and Final S3 Sync
The service registers a signal handler for SIGTERM and SIGINT. On receiving either signal, it follows a defined shutdown sequence: stop accepting new messages from the consumer goroutines, wait for the actions channel to drain with a configurable timeout (default 10 seconds), flush any pending log entries in the log goroutine's buffer, perform a final S3 sync of current session state, and close all external connections.
The final S3 sync was the most important part to get right. If the service is stopped mid-session, whether intentionally or due to a crash, the session data on S3 should reflect the state at shutdown, not the state of the last five-minute periodic backup. The graceful shutdown path writes the same JSON structure as the periodic backup goroutine, so downstream code reading session data does not need to distinguish between a scheduled backup and a shutdown snapshot.
Handling context.Context cancellation properly throughout the codebase made this straightforward. Each goroutine that does external I/O accepts a context. Canceling the root context at shutdown signal propagates cleanly through the whole system.
What Go Made Easier and Harder
Compared to the Python version, Go made the concurrent pipeline architecture dramatically simpler to reason about. Goroutines and channels are the right abstraction for this problem. The Python version had a mix of asyncio tasks and threads that required attention to which execution context each piece of code was running in. The Go version has explicit goroutines with explicit channels and none of that ambiguity. When I want two things to happen concurrently, I start two goroutines. When I want them to communicate, I use a channel.
The things Go made harder were mostly in the domain of text processing. Go's standard library has less built-in tooling for Unicode normalization and text transformation than Python's. I implemented a few utility functions that would have been single expressions against Python's str and unicodedata modules.
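As one illustrative example (my own, not taken from the service): case-folding a message and collapsing whitespace before classification is a one-liner in Python, `" ".join(text.lower().split())`, and a small function in Go:

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// foldAndCollapse lower-cases a message and collapses runs of whitespace,
// a typical pre-classification cleanup step.
func foldAndCollapse(s string) string {
	lowered := strings.Map(unicode.ToLower, s)
	return strings.Join(strings.Fields(lowered), " ")
}

func main() {
	fmt.Println(foldAndCollapse("  YOU   Should\tGo LEFT ")) // you should go left
}
```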
There are no machine learning libraries in this service, and that is the appropriate outcome. Go's ML ecosystem is thin by design. If I needed local model inference in this service, I would call out to a Python sidecar process rather than trying to replicate the Python ML ecosystem in Go. For networking, concurrency, structured I/O, and binary distribution, Go was the right tool for exactly this job.
The 3GB to 50MB memory improvement is real, but the startup time improvement is what matters more operationally. A service that starts in 100 milliseconds can be restarted transparently. A service that takes 30 to 60 seconds to start creates a visible gap in coverage every time it restarts for any reason, whether crash recovery, configuration change, or scheduled maintenance. That gap compounds over time in a way that raw memory consumption does not.
Part of an ongoing series on building personal engineering projects in Go.