feat: add Docker Compose deployment, refactor config paths and mock modes to CLI by hwei0 · Pull Request #6 · NetSys/turbo

hwei0 · 2026-03-01T10:10:45Z

Description

This PR adds a full Docker Compose deployment setup for TURBO, refactors how config paths (ZMQ sockets, log directories) are resolved at runtime, and moves mock mode toggles from YAML config fields to CLI flags.

Pre-built Docker image workflow (new)

Adds a root-level compose.yaml and .env.example that pull pre-built images from DockerHub, eliminating the need for local builds, SSL key generation, or Rust toolchain installation
The Quick Start now defaults to docker compose pull + docker compose up (no --build), reducing setup to: clone, download data, configure .env, and run
The existing docker/ directory (Dockerfiles, build-oriented compose.yaml, docker/.env.example) is preserved as the "build from source" workflow for development/customization
docker/README.Docker.md rewritten to focus on building from source as the secondary path, with shared config reference and troubleshooting kept in place

Docker Compose deployment

Adds 4 multi-stage Dockerfiles (Python base/binary, Rust base, QUIC binary) and a compose.yaml orchestrating 7 services across client and server profiles
Services start in dependency order via health checks: client_python_main writes /health/client_main_ready (only after all Client processes have bound their quic_rcv_zmq_socket), then client_python_monitor writes /health/monitor_ready, then quic_client starts. This is coordinated via a multiprocessing.Manager().Queue() for cross-process ZMQ bind readiness
All services use ipc: host for ZeroMQ IPC and POSIX shared memory across containers, init: true (tini) for signal forwarding, and a custom bridge network (10.64.89.0/24) for QUIC UDP
Both Python orchestrators now handle SIGTERM in addition to SIGINT for Docker graceful shutdown
Docker-specific YAML configs (docker/config/) mirror the manual configs with container-internal paths (/app/...)
.env.example provides all configurable variables (host UID/GID, model/eval data paths, mock mode toggles, networking)
Docker Compose Watch support for hot-reload during development
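The readiness handshake described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the function name `wait_for_binds`, its parameters, and the message format (one name per bound socket) are assumptions. It accepts any queue-like object with a blocking `get(timeout=...)`, which includes the proxy returned by `multiprocessing.Manager().Queue()`.

```python
import queue
import time
from pathlib import Path


def wait_for_binds(ready_queue, expected, health_file, timeout_s=30.0):
    """Block until `expected` distinct readiness messages arrive, then write the health file.

    Returns the sorted names that reported ready. Raises TimeoutError if the
    deadline passes first, in which case the health file is not written.
    """
    deadline = time.monotonic() + timeout_s
    ready = set()
    while len(ready) < expected:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"only {len(ready)}/{expected} sockets bound")
        try:
            # Each worker is assumed to put its name on the queue after bind().
            ready.add(ready_queue.get(timeout=remaining))
        except queue.Empty:
            continue  # the loop re-checks the deadline and raises
    health_file = Path(health_file)
    health_file.parent.mkdir(parents=True, exist_ok=True)
    health_file.touch()  # a container health check can now test for this file
    return sorted(ready)
```

In a real deployment the queue would be shared with the Client processes, each of which would call `put()` only after its ZMQ socket has been bound; the Compose health check would then only need to test for the file's existence.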

GPU support made optional via compose override

Removed hardcoded NVIDIA GPU reservations from both client_python_main (never needed — client uses PyTorch for CPU tensor ops only, no torch.cuda usage) and server_python_main
GPU reservations for the server are now in separate compose.gpu.yaml override files (root and docker/), included only on GPU hosts via -f compose.gpu.yaml
Non-GPU hosts can run the server with mock inference without the NVIDIA Container Toolkit: docker compose --profile server up
GPU hosts include the override: docker compose -f compose.yaml -f compose.gpu.yaml --profile server up
Fixes Error response from daemon: could not select device driver "nvidia" with capabilities: [[gpu]] on non-GPU hosts
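For launcher scripts that wrap these two invocations, the choice of compose files could be captured in a small helper. This is a hypothetical sketch (the function `compose_command` is not part of this PR); it only reproduces the two command lines listed above.

```python
def compose_command(profile, gpu=False, action="up"):
    """Build the `docker compose` argument list for one profile."""
    cmd = ["docker", "compose"]
    if gpu:
        # Explicit -f flags replace the default file lookup, so the base
        # file must be listed before the GPU override.
        cmd += ["-f", "compose.yaml", "-f", "compose.gpu.yaml"]
    cmd += ["--profile", profile, action]
    return cmd


if __name__ == "__main__":
    print(" ".join(compose_command("server")))
    print(" ".join(compose_command("server", gpu=True)))
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting issues.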

Documentation improvements

Updated README.md to recommend pre-built Docker images as the default, with detailed prerequisites including:
- Docker 28 incompatibility warning (nf_tables/iptables issue; Docker 27 recommended)
- Disk space requirements (client ~50 GB, server ~13 GB)
- Clarified that NVIDIA GPU drivers + Container Toolkit are server-only (not needed for client or mock inference)
- USB webcam prerequisites (client-only, with mock camera alternative)
- Docker Compose V2 plugin installation instructions
- "No GPU?" guidance for running with MOCK_INFERENCE=true
Added per-step annotations for multi-machine deployments ("both client and server", "server only", "client only") across all setup steps
.env table now shows quickstart default values (~/av-models, ~/full-eval, ~/experiment2-out) instead of "(must set)"
Clarified mock mode data requirements: full-eval evaluation data is always required on the client (even in mock camera mode — the bandwidth allocator needs it for utility curve computation); av-models model checkpoints are not needed in mock inference mode
Added "Data Required" column to the mock mode combination table
Manual (non-Docker) setup collapsed into a <details> block with deduplicated prerequisites
docker/README.Docker.md streamlined — duplicated setup steps replaced with cross-references to main README; removed erroneous Rust host prerequisite (Rust compiles inside the Docker build stage)

Config path refactoring (Python + Rust + YAML)

ZMQ socket paths: Changed from hardcoded full ipc:///absolute/path/socket-name to bare names (e.g., service1-camera-socket) in all YAML configs. The Python orchestrators (client_main.py, server_main.py) and web dashboard (web_config.py) resolve them at runtime to ipc://<zmq_dir>/<name>. The Rust QUIC binaries already used this pattern and were updated to match the renamed config key (zmq_pathdir → zmq_dir)
Log output directories: Removed per-component client_savedir, camera_savedir, ping_savedir, server_log_savedir, quic_client_log_path, and quic_server_log_path from YAML. Each orchestrator now auto-creates a timestamped run directory (e.g., experiment_output_dir/client_main_2024-01-15_10-30-00/client/) and injects the save path into component configs at startup. The Rust QUIC binaries now also create timestamped subdirectories using chrono
Server IP for ping: Removed DST_IP YAML field. Server IP is now extracted from the new required -s/--server_address CLI argument on client_main.py
New top-level config keys: experiment_output_dir, zmq_dir, client_subdir/server_subdir, quic_client_log_subdir/quic_server_log_subdir
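To make the two path rules concrete, here is a minimal sketch of how a bare socket name and a timestamped run directory might be derived. The function names and the rejection of non-bare names are assumptions for illustration; the timestamp format is inferred from the example directory name above, and the sketch only computes paths rather than creating them.

```python
from datetime import datetime
from pathlib import PurePosixPath


def resolve_zmq_endpoint(zmq_dir, socket_name):
    """Turn a bare socket name from YAML into a full ZeroMQ IPC endpoint."""
    # Assumption: full endpoints or nested paths in YAML are treated as errors.
    if socket_name.startswith("ipc://") or "/" in socket_name:
        raise ValueError(f"expected a bare socket name, got {socket_name!r}")
    return f"ipc://{PurePosixPath(zmq_dir) / socket_name}"


def run_directory(experiment_output_dir, prefix, subdir, now=None):
    """Compute <experiment_output_dir>/<prefix>_<timestamp>/<subdir>."""
    now = now or datetime.now()
    stamp = now.strftime("%Y-%m-%d_%H-%M-%S")
    return PurePosixPath(experiment_output_dir) / f"{prefix}_{stamp}" / subdir


if __name__ == "__main__":
    print(resolve_zmq_endpoint("/tmp/zmq", "service1-camera-socket"))
    print(run_directory("experiment_output_dir", "client_main", "client"))
```

Keeping only bare names in YAML means the same config file works on the host and inside a container; only `zmq_dir` and `experiment_output_dir` differ between the two.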

Mock mode refactoring

Mock camera (client_main.py --mock-camera) and mock inference (server_main.py --mock-inference) are now toggled via CLI flags instead of setting null vs a path in YAML. The YAML always specifies the mock data paths; the CLI flag controls whether they're used
In Docker, mock modes are toggled via MOCK_CAMERA/MOCK_INFERENCE environment variables in .env
Added mock_model_latency_csv_path to ModelServerConfig — mock inference now simulates realistic per-model latency from experiment_model_info.csv instead of a hardcoded 50ms sleep
Renamed mock output file from .npz to .npy and switched from np.fromfile() to np.load()
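The flag-based toggle can be sketched as below. This is an assumption-laden illustration rather than the actual `server_main.py`: the helper names, the returned dictionary, and the CSV column names `model` and `latency_ms` are all hypothetical. Only the `--mock-inference` flag and the `mock_model_latency_csv_path` key come from the description above.

```python
import argparse
import csv


def build_parser():
    """CLI with a boolean mock flag; the YAML no longer encodes the toggle."""
    parser = argparse.ArgumentParser(description="server_main.py (sketch)")
    parser.add_argument("--mock-inference", action="store_true",
                        help="use mock outputs instead of running real models")
    return parser


def load_mock_latencies(csv_lines):
    """Map model name -> latency in seconds (column names are hypothetical)."""
    reader = csv.DictReader(csv_lines)
    return {row["model"]: float(row["latency_ms"]) / 1000.0 for row in reader}


def select_inference_mode(args, config):
    """The config always carries the mock paths; the flag decides whether they apply."""
    if args.mock_inference:
        return {"mode": "mock", "latency_csv": config["mock_model_latency_csv_path"]}
    return {"mode": "real", "latency_csv": None}


if __name__ == "__main__":
    args = build_parser().parse_args()
    config = {"mock_model_latency_csv_path": "experiment_model_info.csv"}
    print(select_inference_mode(args, config))
```

One benefit of this split is that a single YAML file can serve both modes, so switching between mock and real inference does not require editing config.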

Other changes

Cargo.lock now tracked in version control — removed **/Cargo.lock from .gitignore and committed the lockfile to ensure reproducible Rust builds across environments
Added panic hooks with backtrace capture in both Rust QUIC binaries
Error callbacks in Python orchestrators now log full stack traces
start_web_dashboard.py: Added -c short flag for --config
web_frontend.py: Added allow_unsafe_werkzeug=True for running inside Docker containers
Dockerfile UID/GID defaults normalized to 1000 across all images; groupadd moved from base to binary stage
Updated CONFIGURATION.md, IPC.md, and LOGGING.md to reflect all the above changes
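For the stack-trace logging item above, one common pattern is an error callback that formats the whole exception chain instead of only `str(exc)`. The sketch below is an assumption about the general shape, not the code in this PR; the logger name and function names are hypothetical.

```python
import logging
import traceback

logger = logging.getLogger("orchestrator")


def format_worker_error(exc):
    """Return the full traceback for an exception, including chained causes."""
    return "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))


def log_worker_error(exc):
    """Error callback: log the complete stack trace rather than just the message."""
    logger.error("worker failed:\n%s", format_worker_error(exc))
```

A callback with this signature could be passed wherever the orchestrator accepts a one-argument error handler, for example the `error_callback` parameter of `multiprocessing.Pool.apply_async`, assuming the orchestrators use such an API.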

Fixes # (issue)

Type of change

New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

How Has This Been Tested?

Verified end-to-end connectivity on Google Cloud using mock inference (no GPU) and mock cameras. Also verified connectivity in mock modes from a laptop client to a Google Cloud server.

Checklist:

My commit title follows conventional commit guidelines
I have made corresponding changes to the documentation
I have tested my changes, confirming that my feature works

hwei0 and others added 8 commits February 13, 2026 15:41

docs: update docs with waymo terms of use

9125ffe

docs: fix pull request template placement

39a9d9f

refactor: restructure config format with timestamps and common root dir

7e16bef

fix: fix mock logic paths; add stack trace outputs for errors

42ac239

Merge branch 'NetSys:main' into mock-setup

2d8aeaa

feat: docker setup; other doc and config refactors

dd75f00

Merge branch 'mock-setup'

a63743c

feat: move mock toggle to CLI rather than via yaml

b063066

akrentsel reviewed Mar 1, 2026

hwei0 added 7 commits March 1, 2026 14:14

fix: remove Cargo.lock from .gitignore

735ff43

feat: add pre-built docker setup

92c239d

docs: update README with refinements for pre-reqs and setup steps

e9fd8f3

feat: split gpu docker compose into separate declaration; update docker requirements

5fdf4a8

docs: update config info; add troubleshooting info

859bdc9

refactor: rename some .env ip vars

62b8fe7

docs: add troubleshooting info when connecting docker client network to external ip

fdc686a

hwei0 enabled auto-merge (squash) March 2, 2026 11:12

hwei0 added 2 commits March 2, 2026 13:03

docs: update mock mode docs to clarify usage

86b6c04

docs: update setup steps to highlight 3 options

13ea260

hwei0 wants to merge 17 commits into NetSys:main from hwei0:main
