feat: add Docker Compose deployment, refactor config paths and mock modes to CLI #6
Open
hwei0 wants to merge 17 commits into NetSys:main
akrentsel reviewed Mar 1, 2026
Description
This PR adds a full Docker Compose deployment setup for TURBO, refactors how config paths (ZMQ sockets, log directories) are resolved at runtime, and moves mock mode toggles from YAML config fields to CLI flags.
Pre-built Docker image workflow (new)
- `compose.yaml` and `.env.example` that pull pre-built images from DockerHub, eliminating the need for local builds, SSL key generation, or Rust toolchain installation
- `docker compose pull` + `docker compose up` (no `--build`), reducing setup to: clone, download data, configure `.env`, and run
- `docker/` directory (Dockerfiles, build-oriented `compose.yaml`, `docker/.env.example`) is preserved as the "build from source" workflow for development/customization
- `docker/README.Docker.md` rewritten to focus on building from source as the secondary path, with shared config reference and troubleshooting kept in place

Docker Compose deployment
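For the startup ordering in this deployment (the `client_main_ready` marker is written only after every client process has bound its socket), a minimal Python sketch may help. It assumes each worker reports on a shared `multiprocessing.Manager().Queue()` once its bind is done. The function names (`worker`, `wait_for_binds`, `start_and_signal`) and the caller-supplied health directory are hypothetical and not taken from the PR's code.

```python
import multiprocessing as mp
from pathlib import Path


def worker(name, ready_queue):
    # A real client process would bind its quic_rcv_zmq_socket here;
    # this sketch only reports that the (hypothetical) bind finished.
    ready_queue.put(name)


def wait_for_binds(ready_queue, expected, timeout=10.0):
    """Block until every expected worker has reported a successful bind."""
    pending = set(expected)
    while pending:
        # Raises queue.Empty if no worker reports within the timeout.
        pending.discard(ready_queue.get(timeout=timeout))


def start_and_signal(names, health_dir):
    """Start workers, wait for all binds, then write the readiness marker."""
    with mp.Manager() as manager:
        ready_queue = manager.Queue()
        procs = [mp.Process(target=worker, args=(n, ready_queue)) for n in names]
        for p in procs:
            p.start()
        wait_for_binds(ready_queue, names)
        # Only now is it safe for dependent services to start.
        marker = Path(health_dir) / "client_main_ready"
        marker.touch()
        for p in procs:
            p.join()
    return marker
```

Call `start_and_signal` from under an `if __name__ == "__main__":` guard so it also works with the `spawn` start method. A container health check would then only need to test whether the marker file exists.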
- `compose.yaml` orchestrating 7 services across `client` and `server` profiles
- `client_python_main` writes `/health/client_main_ready` (only after all Client processes have bound their `quic_rcv_zmq_socket`), then `client_python_monitor` writes `/health/monitor_ready`, then `quic_client` starts. This is coordinated via a `multiprocessing.Manager().Queue()` for cross-process ZMQ bind readiness
- `ipc: host` for ZeroMQ IPC and POSIX shared memory across containers, `init: true` (tini) for signal forwarding, and a custom bridge network (`10.64.89.0/24`) for QUIC UDP
- Configs in `docker/config/` mirror the manual configs with container-internal paths (`/app/...`)
- `.env.example` provides all configurable variables (host UID/GID, model/eval data paths, mock mode toggles, networking)

GPU support made optional via compose override
- `client_python_main` (never needed: the client uses PyTorch for CPU tensor ops only, with no `torch.cuda` usage) and `server_python_main`
- `compose.gpu.yaml` override files (root and `docker/`), included only on GPU hosts via `-f compose.gpu.yaml`
- `docker compose --profile server up`
- `docker compose -f compose.yaml -f compose.gpu.yaml --profile server up`
- `Error response from daemon: could not select device driver "nvidia" with capabilities: [[gpu]]` on non-GPU hosts

Documentation improvements
- Updated `README.md` to recommend pre-built Docker images as the default, with detailed prerequisites including:
  - `nf_tables`/iptables issue; Docker 27 recommended
  - `MOCK_INFERENCE=true`
- `.env` table now shows quickstart default values (`~/av-models`, `~/full-eval`, `~/experiment2-out`) instead of "(must set)"
- `full-eval` evaluation data is always required on the client (even in mock camera mode, because the bandwidth allocator needs it for utility curve computation); `av-models` model checkpoints are not needed in mock inference mode
- `<details>` block with deduplicated prerequisites
- `docker/README.Docker.md` streamlined: duplicated setup steps replaced with cross-references to the main README; removed erroneous Rust host prerequisite (Rust compiles inside the Docker build stage)

Config path refactoring (Python + Rust + YAML)
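A minimal sketch of the two path conventions in this refactor: bare socket names expanded to `ipc://<zmq_dir>/<name>`, and a timestamped run directory created under `experiment_output_dir`. The helper names (`resolve_zmq_endpoint`, `make_run_dir`), the example `/tmp/zmq` directory, and the choice to pass through names that already carry a scheme are assumptions for illustration. The orchestrators' actual code may differ.

```python
from datetime import datetime
from pathlib import Path, PurePosixPath
import tempfile


def resolve_zmq_endpoint(zmq_dir, name):
    """Expand a bare socket name to ipc://<zmq_dir>/<name>."""
    if "://" in name:
        # Assumption: names that already carry a scheme are left untouched.
        return name
    return f"ipc://{PurePosixPath(zmq_dir) / name}"


def make_run_dir(experiment_output_dir, prefix, subdir, now=None):
    """Create <experiment_output_dir>/<prefix>_<timestamp>/<subdir>/."""
    stamp = (now or datetime.now()).strftime("%Y-%m-%d_%H-%M-%S")
    run_dir = Path(experiment_output_dir) / f"{prefix}_{stamp}" / subdir
    run_dir.mkdir(parents=True, exist_ok=True)
    return run_dir


if __name__ == "__main__":
    print(resolve_zmq_endpoint("/tmp/zmq", "service1-camera-socket"))
    with tempfile.TemporaryDirectory() as out:
        fixed = datetime(2024, 1, 15, 10, 30, 0)
        print(make_run_dir(out, "client_main", "client", now=fixed))
```

Resolving names in one helper keeps the YAML free of host-specific absolute paths, so the same config can be used on a host and inside a container by changing only `zmq_dir`.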
- ZMQ socket paths changed from `ipc:///absolute/path/socket-name` to bare names (e.g., `service1-camera-socket`) in all YAML configs. The Python orchestrators (`client_main.py`, `server_main.py`) and web dashboard (`web_config.py`) resolve them at runtime to `ipc://<zmq_dir>/<name>`. The Rust QUIC binaries already used this pattern and were updated to match the renamed config key (`zmq_pathdir` → `zmq_dir`)
- Removed `client_savedir`, `camera_savedir`, `ping_savedir`, `server_log_savedir`, `quic_client_log_path`, and `quic_server_log_path` from YAML. Each orchestrator now auto-creates a timestamped run directory (e.g., `experiment_output_dir/client_main_2024-01-15_10-30-00/client/`) and injects the save path into component configs at startup. The Rust QUIC binaries now also create timestamped subdirectories using `chrono`
- Removed the `DST_IP` YAML field. Server IP is now extracted from the new required `-s`/`--server_address` CLI argument on `client_main.py`
- `experiment_output_dir`, `zmq_dir`, `client_subdir`/`server_subdir`, `quic_client_log_subdir`/`quic_server_log_subdir`

Mock mode refactoring
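One way to picture the flag-based mock inference mode: a `store_true` CLI flag switches mocking on, and a latency table read from a CSV replaces a fixed sleep. Only the `--mock-inference` flag name comes from the PR. The CSV column names (`model`, `latency_ms`), the sample rows, and the helpers `load_latencies` and `mock_infer` are assumptions made up for this sketch.

```python
import argparse
import csv
import io
import time


def build_parser():
    parser = argparse.ArgumentParser(description="server_main sketch")
    # The flag only switches mock mode on; mock data paths stay in the YAML.
    parser.add_argument("--mock-inference", action="store_true")
    return parser


def load_latencies(csv_file):
    """Map model name -> latency in seconds.

    Assumed columns: 'model' and 'latency_ms' (the real CSV may differ).
    """
    return {row["model"]: float(row["latency_ms"]) / 1000.0
            for row in csv.DictReader(csv_file)}


def mock_infer(model, latencies, sleep=time.sleep):
    """Stand in for a model call by sleeping for that model's latency."""
    delay = latencies[model]
    sleep(delay)
    return delay


if __name__ == "__main__":
    args = build_parser().parse_args(["--mock-inference"])
    # Made-up sample rows standing in for the latency CSV.
    sample = io.StringIO("model,latency_ms\nsmall,12.5\nlarge,80\n")
    latencies = load_latencies(sample)
    if args.mock_inference:
        print(mock_infer("small", latencies))
```

Passing `sleep` as a parameter keeps the latency lookup testable without actually waiting.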
- Mock camera (`client_main.py --mock-camera`) and mock inference (`server_main.py --mock-inference`) are now toggled via CLI flags instead of setting `null` vs a path in YAML. The YAML always specifies the mock data paths; the CLI flag controls whether they're used
- `MOCK_CAMERA`/`MOCK_INFERENCE` environment variables in `.env`
- Added `mock_model_latency_csv_path` to `ModelServerConfig`: mock inference now simulates realistic per-model latency from `experiment_model_info.csv` instead of a hardcoded 50 ms sleep
- `.npz` to `.npy`, and switched from `np.fromfile()` to `np.load()`

Other changes
- Removed `**/Cargo.lock` from `.gitignore` and committed the lockfile to ensure reproducible Rust builds across environments
- `start_web_dashboard.py`: Added `-c` short flag for `--config`
- `web_frontend.py`: Added `allow_unsafe_werkzeug=True` for running inside Docker containers
- `1000` across all images; `groupadd` moved from base to binary stage
- Updated `CONFIGURATION.md`, `IPC.md`, and `LOGGING.md` to reflect all the above changes

Fixes # (issue)
Type of change
How Has This Been Tested?
Tested connectivity with mock GPUs and mock cameras on Google Cloud. Also tested connectivity in mock modes from a laptop to Google Cloud.
Checklist: