ATS Configuration Reload with observability/tracing - Token model #12892
brbzull0 wants to merge 8 commits into apache:11-Dev
validate_dependencies() incorrectly triggers on options that have default values but were not explicitly passed by the user. The root cause is that append_option_data() populates default values into the Arguments map before validate_dependencies() runs. When validate_dependencies() calls ret.get(key) for an option with a default, the lookup finds the entry and sets _is_called = true, making the option appear "used" even though the user never specified it on the command line. Fix by extracting the default-value loop into apply_option_defaults() and calling it after validate_dependencies() in parse().
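The ordering fix can be illustrated in isolation. The following is a minimal sketch, not the ArgParser code: the `Option` struct, the `ParsedArgs` alias, and the free-function signatures are hypothetical stand-ins, and only the idea of validating dependencies before applying defaults comes from the commit message.

```cpp
#include <map>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical option description: an optional default value and an optional
// dependency on another option.
struct Option {
  std::string name;
  std::optional<std::string> default_value;
  std::optional<std::string> requires_option;
};

using ParsedArgs = std::map<std::string, std::string>;

// Only options present in args are checked, so this must run while args
// still holds nothing but what the user explicitly passed.
void validate_dependencies(const std::vector<Option> &options, const ParsedArgs &args)
{
  for (const auto &opt : options) {
    if (opt.requires_option && args.count(opt.name) && !args.count(*opt.requires_option)) {
      throw std::runtime_error(opt.name + " requires " + *opt.requires_option);
    }
  }
}

// Defaults fill in whatever the user left out.
void apply_option_defaults(const std::vector<Option> &options, ParsedArgs &args)
{
  for (const auto &opt : options) {
    if (opt.default_value && !args.count(opt.name)) {
      args.emplace(opt.name, *opt.default_value);
    }
  }
}

// Validate first, then apply defaults, so a default never looks "used".
ParsedArgs parse(const std::vector<Option> &options, ParsedArgs user_args)
{
  validate_dependencies(options, user_args);
  apply_option_defaults(options, user_args);
  return user_args;
}
```

With the two calls swapped, an option that has both a default and a dependency would fail validation on every invocation, which is the symptom described above.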
Pre-register subtasks via reserve_subtask() when on_record_change() fires, preventing the main task from reaching SUCCESS before all record-triggered handlers have registered. Call RecFlushConfigUpdateCbs() after rereadConfig() to process callbacks synchronously instead of waiting for the 3s timer.
The test waited for 'sni.yaml finished loading' to appear 3 times in the diags log before proceeding after a config reload. The 3rd occurrence relied on duplicate handler execution from multiple trigger records firing independently — a pre-existing bug now fixed by the ConfigRegistry deduplication logic. With dedup, ssl_client_coordinator fires exactly once per reload cycle, so the expected count is 2 (startup + one reload).
cmcfarlen left a comment:
I tried this out and it seems to work! Pretty sweet.
Does this really need to wait on those two PRs mentioned in the description?
Pull request overview
This PR implements a comprehensive configuration reload framework for Apache Traffic Server that replaces the previous fire-and-forget mechanism with a token-based, observable reload system. The new system provides full traceability of reload operations, centralized config registration through ConfigRegistry, and CLI/RPC-based monitoring and status querying.
Changes:
- Introduces ConfigRegistry, ReloadCoordinator, ConfigReloadTask, and ConfigContext as core framework components that centralize config file registration, track reload sessions with tokens, and provide status reporting through task trees
- Migrates all existing config handlers (ip_allow, sni, ssl, remap, parent, cache, logging, splitdns, etc.) from the old ConfigUpdateHandler/ConfigUpdateContinuation pattern to the new ConfigRegistry pattern with ConfigContext callbacks
- Adds new traffic_ctl commands (config reload with --monitor, --token, --data, --show-details options; config status with --token, --count) and JSONRPC APIs (admin_config_reload, get_reload_config_status) along with an ArgParser fix for default value dependency validation
Reviewed changes
Copilot reviewed 100 out of 101 changed files in this pull request and generated 19 comments.
| File | Description |
|---|---|
| include/mgmt/config/ConfigContext.h | New context class passed to reload handlers for status tracking and inline YAML support |
| include/mgmt/config/ReloadCoordinator.h | Singleton managing reload sessions, tokens, concurrency, and history |
| include/mgmt/config/ConfigReloadErrors.h | Error code enum shared between server and client |
| include/mgmt/config/ConfigReloadExecutor.h | Header for async reload scheduling on ET_TASK |
| include/mgmt/config/FileManager.h | Replaced ConfigUpdateCbTable* with std::function<void()> for plugin callbacks |
| include/records/RecCore.h | Added RecFlushConfigUpdateCbs() and updated RecConfigWarnIfUnregistered signature |
| include/records/YAMLConfigReloadTaskEncoder.h | YAML encoder for task info in JSONRPC responses |
| include/iocore/eventsystem/ConfigProcessor.h | Removed legacy ConfigUpdateHandler/ConfigUpdateContinuation templates |
| include/proxy/*.h, include/iocore/net/*.h, include/iocore/dns/*.h | Updated reconfigure() signatures to accept ConfigContext |
| include/shared/rpc/yaml_codecs.h | Added default parameter to try_extract helper |
| include/tscore/ArgParser.h | Added apply_option_defaults() method |
| src/mgmt/config/ConfigContext.cc | ConfigContext implementation (status tracking, dependent contexts, YAML support) |
| src/mgmt/config/ReloadCoordinator.cc | Reload lifecycle management, token generation, history, deduplication |
| src/mgmt/config/ConfigReloadExecutor.cc | Schedules reload work on ET_TASK with FileManager integration |
| src/mgmt/config/ConfigRegistry.cc | New centralized config registry (not shown in diffs but referenced) |
| src/mgmt/config/FileManager.cc | Delegated records.yaml reload to ConfigRegistry, simplified plugin callbacks |
| src/mgmt/config/AddConfigFilesHere.cc | Removed; replaced by ConfigRegistry registrations in individual modules |
| src/mgmt/config/CMakeLists.txt | Updated build to include new source files |
| src/mgmt/rpc/handlers/config/Configuration.cc | New JSONRPC handlers for reload and status with token/force/inline support |
| src/traffic_server/traffic_server.cc | Replaced initializeRegistry() with register_config_files() using ConfigRegistry |
| src/traffic_server/RpcAdminPubHandlers.cc | Registered new get_reload_config_status RPC handler |
| src/traffic_ctl/traffic_ctl.cc | Added reload/status subcommand options with monitor, token, data, force flags |
| src/traffic_ctl/jsonrpc/CtrlRPCRequests.h | New request/response models for reload and status |
| src/traffic_ctl/jsonrpc/ctrl_yaml_codecs.h | YAML codecs for reload request/response serialization |
| src/traffic_ctl/CtrlPrinters.cc | Reload report/progress bar rendering with tree display |
| src/traffic_ctl/CtrlPrinters.h | Added printer methods and as<>() template |
| src/traffic_ctl/CtrlCommands.h | Added reload helper method declarations |
| src/traffic_ctl/TrafficCtlStatus.h | Added CTRL_EX_TEMPFAIL exit code |
| src/proxy/IPAllow.cc | Migrated to ConfigRegistry with add_file_dependency for ip_categories |
| src/proxy/CacheControl.cc | Migrated to ConfigRegistry |
| src/proxy/ParentSelection.cc | Migrated to ConfigRegistry |
| src/proxy/ReverseProxy.cc | Migrated to ConfigRegistry, removed UR_UpdateContinuation |
| src/proxy/logging/LogConfig.cc | Migrated to ConfigRegistry with deferred reload context |
| src/proxy/http/PreWarmConfig.cc | Migrated to ConfigRegistry via register_record_config |
| src/iocore/net/SSLClientCoordinator.cc | Migrated to ConfigRegistry with add_file_and_node_dependency |
| src/iocore/net/SSLConfig.cc | Migrated ticket key to ConfigRegistry, updated reconfigure signatures |
| src/iocore/net/SSLSNIConfig.cc | Updated reconfigure to accept ConfigContext |
| src/iocore/net/QUICMultiCertConfigLoader.cc | Updated reconfigure to accept ConfigContext |
| src/iocore/net/quic/QUICConfig.cc | Updated reconfigure to accept ConfigContext |
| src/iocore/dns/SplitDNS.cc | Migrated to ConfigRegistry |
| src/iocore/cache/Cache.cc | Migrated hosting.config to ConfigRegistry (late registration) |
| src/iocore/cache/CacheHosting.cc | Removed old config_callback |
| src/iocore/cache/P_CacheHosting.h | Removed CacheHostTableConfig continuation |
| src/records/RecCore.cc | Updated RecConfigWarnIfUnregistered to log via ConfigContext |
| src/records/P_RecCore.cc | Added RecFlushConfigUpdateCbs() |
| src/records/RecordsConfig.cc | Added reload timeout/check_interval records |
| src/records/CMakeLists.txt | Added reload infrastructure and test sources |
| src/tscore/ArgParser.cc | Extracted apply_option_defaults(), called after dependency validation |
| src/tscore/unit_tests/test_ArgParser.cc | Tests for case-sensitive short options and default value validation |
| src/records/unit_tests/test_ConfigReloadTask.cc | Unit tests for state transitions and timeout behavior |
| src/records/unit_tests/test_ConfigRegistry.cc | Unit tests for registry resolution and dependency management |
| Various CMakeLists.txt files | Added configmanager dependency to tests and libraries |
| tests/gold_tests/traffic_ctl/traffic_ctl_config_reload.test.py | New test for traffic_ctl config reload commands |
| tests/gold_tests/traffic_ctl/traffic_ctl_test_utils.py | Added ConfigReload/ConfigStatus helper classes |
| tests/gold_tests/jsonrpc/config_reload_tracking.test.py | JSONRPC-level reload tracking tests |
| tests/gold_tests/jsonrpc/config_reload_reserve_subtask.test.py | Tests for subtask reservation during reload |
| tests/gold_tests/jsonrpc/config_reload_full_smoke.test.py | Full smoke test for all config handlers |
| tests/gold_tests/jsonrpc/config_reload_dedup.test.py | Deduplication test for multi-trigger configs |
| tests/gold_tests/ip_allow/ip_allow_reload_triggered.test.py | IP allow reload via file dependency test |
| tests/gold_tests/parent_config/parent_config_reload.test.py | Parent config reload test |
| tests/gold_tests/cache/cache_config_reload.test.py | Cache/hosting config reload test |
| tests/gold_tests/dns/splitdns_reload.test.py | SplitDNS reload test |
    url_rewrite_CB(const char * /* name ATS_UNUSED */, RecDataT /* data_type ATS_UNUSED */, RecData data,
                   void * /* cookie ATS_UNUSED */)
    {
      int my_token = static_cast<int>((long)cookie);

      switch (my_token) {
      case REVERSE_CHANGED:
        rewrite_table->SetReverseFlag(data.rec_int);
        break;

      case TSNAME_CHANGED:
      case FILE_CHANGED:
      case HTTP_DEFAULT_REDIRECT_CHANGED:
        eventProcessor.schedule_imm(new UR_UpdateContinuation(reconfig_mutex), ET_TASK);
        break;

      case URL_REMAP_MODE_CHANGED:
        // You need to restart TS.
        break;

      default:
        ink_assert(0);
        break;
      }

      rewrite_table->SetReverseFlag(data.rec_int);
      return 0;

The url_rewrite_CB function now only handles REVERSE_CHANGED (calling SetReverseFlag), since TSNAME_CHANGED, FILE_CHANGED, and HTTP_DEFAULT_REDIRECT_CHANGED are now handled by ConfigRegistry. However, the function still takes a cookie parameter (previously used to determine which case to handle) that it now ignores. The callback registration at line 91 still passes (void *)REVERSE_CHANGED as the cookie. This works, but the signature with the unused cookie and the remaining RecRegisterConfigUpdateCb for proxy.config.reverse_proxy.enabled should be documented or simplified to make clear that the callback now only handles toggling reverse proxy on and off.
    p.Ready = When.FileContains(ts.Disk.diags_log.Name, "parent.config finished loading", 3)
    p.Timeout = 20
    tr.Processes.Default.StartBefore(p)
    ## TODO: we should have an extension like When.ReloadCompleted(token, success) to validate this inetasd of parsing

Typo in comment: "inetasd" should be "instead".
    void setup_log_objects();

    static int reconfigure(const char *name, RecDataT data_type, RecData data, void *cookie);
    // static int reconfigure(const char *name, RecDataT data_type, RecData data, void *cookie);

The commented-out old declaration // static int reconfigure(const char *name, RecDataT data_type, RecData data, void *cookie); should be removed. The new declaration is directly below it.
    tr = Test.AddTestRun("remap_config reload, test")
    tr.Processes.Default.Env = tm.Env
    tr.Processes.Default.Command = 'sleep 2; traffic_ctl rpc invoke get_reload_config_status -f json | jq'

The remap_reload.test.py appends a test run that invokes get_reload_config_status via traffic_ctl rpc invoke and pipes to jq, but doesn't set a return code expectation or add any validation. This appears to be a debug/exploratory command left in the test. It should either validate something meaningful or be removed before merging.
    if (auto p = _task.lock()) {
      auto child = p->add_child(description);
      // child task will get the full content of the parent task
      // TODO: eventyually we can have a "key" passed so child module

Typo in comment: "eventyually" should be "eventually".
    swoc::bwprint(text, "{} failed to load", new_table->ip_categories_config_file);
    Error("%s\n%s", text.c_str(), swoc::bwprint(text, "{}", errata).c_str());

The IpAllow::reconfigure reuses the same text string variable for both the bwprint formatting call and the Error/Fatal macro. On line 135, swoc::bwprint(text, ...) writes into text, then the Error macro passes text.c_str() as the first %s argument, and swoc::bwprint(text, "{}", errata).c_str() as the second — but this second call overwrites text in-place, so text.c_str() in the first argument now points to the errata content, not the original error message. The same issue exists on line 143/145. Each bwprint call should use a separate buffer for the errata string.
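A minimal sketch of the separate-buffer pattern the comment asks for. It does not use swoc: `format_into` is a hypothetical stand-in for a formatter that writes into a caller-supplied string and returns a reference to it, and `join_lines` stands in for the `"%s\n%s"` log call.

```cpp
#include <string>

// Hypothetical stand-in for a formatter that writes into a caller-supplied
// buffer and returns a reference to that same buffer.
std::string &format_into(std::string &buf, const std::string &content)
{
  buf = content;
  return buf;
}

// Joins two C strings the way a "%s\n%s" log call would.
std::string join_lines(const char *first, const char *second)
{
  return std::string(first) + "\n" + second;
}

// One buffer per formatted string: formatting the details can no longer
// overwrite (or reallocate) the storage behind the header.
std::string report_failure(const std::string &file, const std::string &errata)
{
  std::string header;
  std::string details;
  format_into(header, file + " failed to load");
  format_into(details, errata);
  return join_lines(header.c_str(), details.c_str());
}
```

With a single shared buffer, both `c_str()` pointers refer to the same storage, and the second formatting call may also reallocate it, so the first pointer can end up dangling rather than merely showing the wrong text.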
    // The server will not let you start two reload at the same time. This option will force a new reload
    // even if there is one in progress. Use with caution as this may have unexpected results.
    // This is mostly for debugging and testing purposes. note: Should we keep it here?
    .add_option("--force", "-F", "Force reload even if there are unsaved changes")

The --force option's description says "Force reload even if there are unsaved changes" but according to the PR description and the RPC handler, --force forces a new reload even if one is in progress. The help text is misleading — "unsaved changes" implies something different. The description should be something like "Force a new reload even if one is in progress".
    /// Generate a warning if any configuration name/value is not registered.
    void RecConfigWarnIfUnregistered();
    // void RecConfigWarnIfUnregistered();

The commented-out old declaration // void RecConfigWarnIfUnregistered(); should be removed. It serves no purpose and could be confusing alongside the new declaration immediately below it.
    if(NOT APPLE)
      target_link_options(test_NextHopStrategyFactory PRIVATE -Wl,--allow-multiple-definition)
    endif()

Using --allow-multiple-definition linker flag is a workaround that suppresses legitimate duplicate symbol warnings. This can mask real issues where the same symbol is defined in multiple compilation units. Consider using proper library factoring or weak symbols instead. If this is truly needed as a short-term fix, at minimum add a comment explaining which symbols conflict and why.
    //
    // The server will not let you start two reload at the same time. This option will force a new reload
    // even if there is one in progress. Use with caution as this may have unexpected results.
    // This is mostly for debugging and testing purposes. note: Should we keep it here?

The comment says "note: Should we keep it here?" — this internal discussion question should be resolved before merging. Either remove the option or remove the comment.
New Configuration Reload Framework
TL;DR
How Reload Works — The Token Model
When you run traffic_ctl config reload, the command sends a JSONRPC request to ATS and returns immediately; it does not block until every config handler finishes. Instead, ATS:

1. Assigns the reload a token, either auto-generated (e.g., rldtk-1739808000000, a timestamp-based ID) or user-supplied via -t (e.g., -t deploy-v2.1).
2. Schedules the reload asynchronously (on ET_TASK). Each registered config handler runs, reports its status (in_progress → success or fail), and the results are aggregated into a task tree.

The token is the handle for everything that follows:

- traffic_ctl config reload -t <token> -m
- traffic_ctl config status -t <token>
- traffic_ctl config status -t <token> -l

If you don't supply -t, ATS generates a token automatically and prints it so you can copy-paste it into follow-up commands. If you supply -t deploy-v2.1, that exact string becomes the token, which is useful for CI pipelines, deploy scripts, or any workflow where you want a meaningful, predictable identifier.

In short: the token is a reload's unique ID. You get it when the reload starts, and you use it to ask "what happened?" at any point afterwards.
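A minimal sketch of the token choice described above. The `rldtk-` prefix and the millisecond timestamp mirror the example token in this description; the function names and the exact format ATS generates are assumptions.

```cpp
#include <chrono>
#include <optional>
#include <string>

// Builds an auto-generated token from a millisecond timestamp. The "rldtk-"
// prefix follows the example above; the real format is an assumption.
std::string make_token(std::chrono::milliseconds since_epoch)
{
  return "rldtk-" + std::to_string(since_epoch.count());
}

// A user-supplied token (the -t value) is used verbatim; otherwise a
// timestamp-based one is generated.
std::string resolve_token(const std::optional<std::string> &user_token)
{
  if (user_token && !user_token->empty()) {
    return *user_token;
  }
  auto now = std::chrono::system_clock::now().time_since_epoch();
  return make_token(std::chrono::duration_cast<std::chrono::milliseconds>(now));
}
```

Keeping the user-supplied string untouched is what makes tokens such as deploy-v2.1 predictable for scripts.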
New traffic_ctl commands:

    # Monitor mode with a custom token
    $ traffic_ctl config reload -t deploy-v2.1 -m
    ✔ Reload scheduled [deploy-v2.1]
    ✔ [deploy-v2.1] ████████████████████ 11/11 success (245ms)

Failed reload, monitor mode:

    $ traffic_ctl config reload -t hotfix-ssl-cert -m
    ✔ Reload scheduled [hotfix-ssl-cert]
    ✗ [hotfix-ssl-cert] ██████████████░░░░░░ 9/11 fail (310ms)
    Details : traffic_ctl config status -t hotfix-ssl-cert

Failed reload, status report:

    $ traffic_ctl config status -t hotfix-ssl-cert
    ✗ Reload [fail] — hotfix-ssl-cert
    Started : 2025 Feb 17 14:30:10.500
    Finished: 2025 Feb 17 14:30:10.810
    Duration: 310ms
    ✔ 9 success ◌ 0 in-progress ✗ 2 failed (11 total)
    Tasks:
    ✔ ip_allow.yaml ·························· 18ms
    ✔ remap.config ··························· 42ms
    ✗ logging.yaml ·························· 120ms ✗ FAIL
    ✗ ssl_client_coordinator ················· 85ms ✗ FAIL
    ├─ ✔ sni.yaml ··························· 20ms
    └─ ✗ ssl_multicert.config ··············· 65ms ✗ FAIL
    ...

Inline YAML reload (runtime only, not persisted to disk):

    $ traffic_ctl config reload -d @ip_allow_new.yaml -t update-ip-rules -m
    ✔ Reload scheduled [update-ip-rules]
    ✔ [update-ip-rules] ████████████████████ 1/1 success (18ms)
    Note: Inline configuration is NOT persisted to disk.
    Server restart will revert to file-based configuration.

The -d flag accepts @filename to read from a file, or @- to read from stdin. The YAML file uses registry keys as top-level keys (the key string passed as the first argument to register_config() or register_record_config()). The content under each key is the actual YAML that the config file normally contains; it is passed as-is to the handler via ctx.supplied_yaml(). A single file can target multiple handlers:
New traffic_ctl Commands

- traffic_ctl config reload
- traffic_ctl config reload -m
- traffic_ctl config reload -s -l
- traffic_ctl config reload -t <token>
- traffic_ctl config reload -d @file.yaml
- traffic_ctl config reload -d @-
- traffic_ctl config reload --force
- traffic_ctl config reload -m -r 0.5
- traffic_ctl config reload -m -w 2
- traffic_ctl config reload -m -T 30s: […] EX_TEMPFAIL (75) if not done. Requires -m.
- traffic_ctl config status
- traffic_ctl config status -t <token>
- traffic_ctl config status -c all

All commands support --format json to output the raw JSONRPC response instead of human-readable text. This is useful for automation, CI pipelines, or any tool that consumes structured output directly:

    traffic_ctl config status -t reload1 --format json
    {
      "tasks": [
        {
          "config_token": "reload1",
          "status": "success",
          "description": "Main reload task - 2026 Feb 18 20:03:10",
          "filename": "",
          "meta": {
            "created_time_ms": "1771444990585",
            "last_updated_time_ms": "1771444991015",
            "main_task": "true"
          },
          "log": [],
          "sub_tasks": [
            {
              "config_token": "reload1",
              "status": "success",
              "description": "ip_allow",
              "filename": "/opt/ats/etc/trafficserver/ip_allow.yaml",
              "meta": {
                "created_time_ms": "1771444991013",
                "last_updated_time_ms": "1771444991015",
                "main_task": "false"
              },
              "log": [],
              "logs": [
                "Finished loading"
              ],
              "sub_tasks": []
            },
            {
              "config_token": "reload1",
              "status": "success",
              "description": "ssl_ticket_key",
              "filename": "",
              "meta": {
                "created_time_ms": "1771444991015",
                "last_updated_time_ms": "1771444991015",
                "main_task": "false"
              },
              "log": [],
              "logs": [
                "SSL ticket key reloaded"
              ],
              "sub_tasks": []
            }
          ]
        }
      ]
    }

New JSONRPC APIs

- admin_config_reload: […] configs param is present. Params: token, force, configs.
- get_reload_config_status: […] token or get the last N reloads via count.

Inline reload RPC example:
Background: Issues with the Previous Reload Mechanism
The previous configuration reload relied on a loose collection of independent record callbacks (RecRegisterConfigUpdateCb) wired through FileManager and AddConfigFilesHere.cc. Each config module registered its file independently, and reloads were fire-and-forget:

- There was no way to know which handlers ran, or how long each one took.
- There was no concept of a "reload session" grouping all config updates triggered by a single request.
- There was no way to push YAML content at runtime through the RPC or CLI.
- Registration was split between AddConfigFilesHere.cc (for FileManager) and individual modules (for record callbacks), making it hard to reason about which files were tracked and which records triggered reloads.
- There was no way to query the status of a specific reload or distinguish between overlapping reloads.
What the New Design Solves
- Each handler reports its status (in_progress, success, fail) through a ConfigContext. Results are aggregated into a task tree with per-handler timings and logs.
- ConfigRegistry is the single source of truth for all config files, their filename records, trigger records, and reload handlers.
- Handlers that opt in (ConfigSource::FileAndRpc) can receive YAML content directly through the RPC, without writing to disk. This is runtime-only: the content lives in memory and is lost on restart.
- ReloadCoordinator manages the lifecycle of each reload: token generation, concurrency control (--force to override), timeout detection, and history.
- traffic_ctl config reload -m shows a live progress bar. traffic_ctl config status provides a full post-mortem with task tree, durations, and failure details.
Basic Design
Key components:
- ConfigRegistry: […] FileManager.
- ReloadCoordinator
- ConfigReloadTask
- ConfigContext: […] in_progress(), complete(), fail(), log(), supplied_yaml(), and add_dependent_ctx(). Safe no-op at startup (no active reload task).
- ConfigReloadProgress: […] TIMEOUT.

Thread model:
All reload work runs on ET_TASK threads, never on the RPC thread or event-loop threads:

- RPC thread: receives the request (admin_config_reload), creates the reload token and task via ReloadCoordinator::prepare_reload(), then schedules the actual work on ET_TASK and returns immediately. The RPC response (with the token) is sent back to traffic_ctl before any handler runs.
- ET_TASK, file-based reload: ReloadWorkContinuation fires on ET_TASK. It calls FileManager::rereadConfig(), which walks every registered file and invokes ConfigRegistry::execute_reload() for each changed config. Each handler runs synchronously on this thread.
- ET_TASK, inline (RPC) reload: ScheduledReloadContinuation fires on ET_TASK. It calls ConfigRegistry::execute_reload() directly for the targeted config key(s). Same synchronous execution.
- Deferred handlers: some handlers (e.g., LogConfig) schedule work on other threads and return before completion. The ConfigContext they hold remains valid (weak pointer + ref-counted YAML node), and they must call ctx.complete() or ctx.fail() from whatever thread finishes the work. If they don't, the timeout checker marks the task as TIMEOUT.
- Timeout checker: ConfigReloadProgress is a per-reload continuation on ET_TASK that polls periodically and marks stuck tasks as TIMEOUT.

Handlers block ET_TASK while they run. A slow handler delays all subsequent handlers in the same reload cycle. This is the same behavior as the previous mechanism; the difference is that now it's visible through the task tree and timings.
Stuck reload checker:
ConfigReloadProgress is a periodic continuation scheduled on ET_TASK. It monitors active reload tasks and marks any that exceed the configured timeout as TIMEOUT. This acts as a safety net for handlers that fail to call ctx.complete() or ctx.fail(), for example if a handler crashes, deadlocks, or its deferred thread never executes. The checker reads proxy.config.admin.reload.timeout dynamically at each interval, so the timeout can be adjusted at runtime without a restart. This is a simple record read (RecGetRecordString), not an expensive operation. Setting the timeout to "0" disables it (tasks will run indefinitely until completion).

The checker is not a global poller: a new instance is created per-reload and self-terminates once the task reaches a terminal state. No idle polling when no reload is in progress.
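One tick of such a checker can be sketched as follows. This is a minimal model, not the ConfigReloadProgress code: the `State` enum, the `Task` struct, and `check_once` are hypothetical, and the timeout is passed in as an argument where the real checker reads it from the record at each interval.

```cpp
#include <chrono>

enum class State { CREATED, IN_PROGRESS, SUCCESS, FAIL, TIMEOUT };

// Hypothetical task model: current state plus the time the reload started.
struct Task {
  State state;
  std::chrono::steady_clock::time_point started;
};

bool is_terminal(State s)
{
  return s == State::SUCCESS || s == State::FAIL || s == State::TIMEOUT;
}

// One tick of a per-reload checker. Returns true if the checker should be
// rescheduled, false once the task is terminal and the checker can stop.
// A zero timeout disables the check, so the task is never marked TIMEOUT.
bool check_once(Task &task, std::chrono::steady_clock::time_point now, std::chrono::seconds timeout)
{
  if (is_terminal(task.state)) {
    return false;
  }
  if (timeout.count() > 0 && now - task.started > timeout) {
    task.state = State::TIMEOUT;
    return false;
  }
  return true;
}
```

Returning false from the tick is what models the self-termination: once the task is terminal there is nothing left to reschedule.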
How Handlers Work
Before — scattered registration (ip_allow example):
Registration was split across multiple files with no centralized tracking:
Now — each module self-registers with full tracing:
Each module registers itself directly with ConfigRegistry. No more separate AddConfigFilesHere.cc entry: the registry handles FileManager registration, record callbacks, and status tracking automatically:
Additional triggers can be attached from any module at any time:
Composite configs can declare file dependencies and dependency keys. For example, SSLClientCoordinator owns sni.yaml and ssl_multicert.config as children:

Handler interaction with ConfigContext:

Each config module implements a C++ reload handler, the callback passed to register_config(). The handler reports progress through the ConfigContext:

When a reload fires, the handler receives a ConfigContext:

- For a file-based reload, ctx.supplied_yaml() is undefined; the handler reads from its registered file on disk.
- For an inline reload, ctx.supplied_yaml() contains the YAML node passed via --data / RPC. The content is runtime-only and is never written to disk.
Handlers report progress:
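A minimal sketch of a handler reporting progress, written against a simplified stand-in rather than the real ConfigContext. The method names (in_progress, complete, fail, log, supplied_yaml) come from this description; `MockContext`, its signatures, the `std::string` payload in place of a YAML node, and the `load_rules` / `read_file_from_disk` helpers are assumptions.

```cpp
#include <optional>
#include <string>
#include <utility>
#include <vector>

enum class Status { CREATED, IN_PROGRESS, SUCCESS, FAIL };

// Simplified stand-in for ConfigContext (signatures are assumptions).
class MockContext
{
public:
  explicit MockContext(std::optional<std::string> yaml = std::nullopt) : _yaml(std::move(yaml)) {}

  void in_progress() { _status = Status::IN_PROGRESS; }
  void log(const std::string &msg) { _logs.push_back(msg); }
  void complete(const std::string &msg) { _logs.push_back(msg); _status = Status::SUCCESS; }
  void fail(const std::string &msg) { _logs.push_back(msg); _status = Status::FAIL; }
  const std::optional<std::string> &supplied_yaml() const { return _yaml; }

  Status status() const { return _status; }
  const std::vector<std::string> &logs() const { return _logs; }

private:
  std::optional<std::string> _yaml;
  Status _status = Status::CREATED;
  std::vector<std::string> _logs;
};

// Hypothetical helpers standing in for a module's real parsing and file I/O.
bool load_rules(const std::string &content) { return !content.empty(); }
std::string read_file_from_disk() { return "rules: []"; }

// The handler: pick the source, load, and always end in a terminal state.
void reconfigure(MockContext &ctx)
{
  ctx.in_progress();
  const bool inline_source = ctx.supplied_yaml().has_value();
  const std::string content = inline_source ? *ctx.supplied_yaml() : read_file_from_disk();
  ctx.log(inline_source ? "using supplied YAML" : "using file on disk");
  if (load_rules(content)) {
    ctx.complete("Finished loading");
  } else {
    ctx.fail("Failed to load");
  }
}
```

Every path through the handler ends in complete() or fail(), which is the property the rest of the framework depends on.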
Supplied YAML (inline content via -d / RPC):

When a handler opts into ConfigSource::FileAndRpc, it can receive YAML content directly instead of reading from disk. The handler checks ctx.supplied_yaml() to determine the source:

For composite configs (e.g., SSLClientCoordinator), handlers create child contexts to track each sub-config independently. From SSLClientCoordinator::reconfigure():

The parent task automatically aggregates status from its children. In traffic_ctl config status, this renders as a tree:
Design Challenges
1. Handlers must reach a terminal state — or the task hangs
The entire tracing model relies on handlers calling ctx.complete() or ctx.fail() before returning. If a handler returns without reaching a terminal state, the task stays IN_PROGRESS until the timeout checker marks it as TIMEOUT.

After execute_reload() calls the handler, it checks ctx.is_terminal() and emits a warning if the handler left the task in a non-terminal state:
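A minimal sketch of that post-handler check. The `Ctx` struct, the `execute_handler` name, and the warning text are hypothetical; only the idea of testing is_terminal() after the handler returns comes from the description.

```cpp
#include <functional>
#include <iostream>
#include <string>

enum class Status { CREATED, IN_PROGRESS, SUCCESS, FAIL };

// Hypothetical minimal context exposing only what the check needs.
struct Ctx {
  Status status = Status::CREATED;
  bool is_terminal() const { return status == Status::SUCCESS || status == Status::FAIL; }
};

// Runs the handler, then warns if it returned without reaching a terminal
// state. Returns true when the handler finished properly.
bool execute_handler(const std::string &key, Ctx &ctx, const std::function<void(Ctx &)> &handler)
{
  handler(ctx);
  if (!ctx.is_terminal()) {
    std::cerr << "Warning: handler for '" << key << "' returned without calling complete() or fail()\n";
    return false;
  }
  return true;
}
```

A deferred handler would trip this warning by design, since it finishes its work on another thread after returning.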
The safety net: ConfigReloadProgress runs periodically on ET_TASK and marks stuck tasks as TIMEOUT after the configured duration (proxy.config.admin.reload.timeout, default: 1h).

2. Parent status aggregation from sub-tasks
Parent tasks do not track their own status directly — they derive it from their children.
When a child calls complete() or fail(), it notifies its parent, which re-evaluates its own status as one of FAIL, IN_PROGRESS, or SUCCESS.

This aggregation is recursive: a sub-task can have its own children (e.g., ssl_client_coordinator → sni + ssl_multicert), and status bubbles up through the tree.

One subtle issue: if a handler creates child contexts but forgets to call complete() or fail() on one of them, that child stays CREATED and the parent never reaches SUCCESS. It is the handler developer's responsibility to ensure every ConfigContext (and its children) reaches a terminal state (complete() or fail()). The timeout checker is the ultimate safety net for cases where this is not properly handled.
3. Startup vs. reload — same handler, different context
Handlers are called both at startup (initial config load) and during runtime reloads. At startup,
there is no active ReloadCoordinator task, so all ConfigContext operations (in_progress(), complete(), fail(), log()) are safe no-ops: they check _task.lock() and return immediately if the weak pointer is expired or empty.
This avoids having two separate code paths for startup vs. reload. The handler logic is identical
in both cases:
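A minimal sketch of the weak-pointer no-op pattern. `Task`, `Context`, and `load_config` are hypothetical stand-ins; the only detail taken from the description is that each operation locks `_task` and returns if nothing is there.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical reload task that records what the handler reported.
struct Task {
  std::vector<std::string> logs;
  bool done = false;
};

// Context holding only a weak reference to the task. With no task (startup)
// every operation silently does nothing.
class Context
{
public:
  Context() = default;
  explicit Context(const std::shared_ptr<Task> &task) : _task(task) {}

  void log(const std::string &msg)
  {
    if (auto p = _task.lock()) {
      p->logs.push_back(msg);
    }
  }

  void complete()
  {
    if (auto p = _task.lock()) {
      p->done = true;
    }
  }

private:
  std::weak_ptr<Task> _task;
};

// The same handler body serves both startup and reload.
void load_config(Context ctx)
{
  ctx.log("loading");
  ctx.complete();
}
```

Calling `load_config(Context{})` models startup, while `load_config(Context{task})` models a reload with an active task.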
4. Duplicate handler execution for multi-record triggers (fixed)
Known issue: ssl_client_coordinator registers multiple trigger records and file dependencies, each wiring an independent on_record_change callback. When several of these fire during the same reload, the handler executes more than once, producing duplicate entries in the reload status output.

This is a pre-existing issue that exists on master as well (see #11724), but it was not visible because the old reload path had no status tracking. The new framework's task tree makes the duplicate executions observable.
This issue is resolved in this PR.
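One way to get once-per-cycle execution is to remember which keys have already been scheduled. The sketch below is only an illustration of that idea: `ReloadCycle` and `should_run` are hypothetical names, and the actual ConfigRegistry deduplication logic may work differently.

```cpp
#include <string>
#include <unordered_set>

// Hypothetical per-reload bookkeeping: one instance per reload cycle.
class ReloadCycle
{
public:
  // Returns true the first time a key is seen in this cycle, false for any
  // later trigger of the same key.
  bool should_run(const std::string &key) { return _seen.insert(key).second; }

private:
  std::unordered_set<std::string> _seen;
};
```

Each trigger callback would ask `should_run(key)` before scheduling the handler, so several records firing for ssl_client_coordinator in one cycle lead to a single execution.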
5. Plugin support
Plugins are not supported by ConfigRegistry in this PR. The legacy reload notification mechanism (TSMgmtUpdateRegister) still works: plugins registered through it will continue to be invoked via FileManager::invokeConfigPluginCallbacks() during every reload cycle.

A dedicated plugin API to let plugins register their own config handlers and participate in the reload framework will be addressed in a separate PR.
6. Subtask reservation — ensuring all handlers are tracked from the start
During a file-based reload, handlers are discovered in two phases:

1. File-based handlers (e.g., ip_allow, remap) run synchronously inside FileManager::rereadConfig(). Their subtasks are created and completed immediately on the ET_TASK thread that drives the reload.
2. Record-triggered handlers (e.g., cache_control, ssl_ticket_key, ssl_client_coordinator) are activated by record callbacks. When rereadConfig() detects changed files, it calls RecSetSyncRequired() to flag the associated records as dirty.

To avoid a gap between file-based completions and record-triggered registrations, RecFlushConfigUpdateCbs() is called immediately after rereadConfig(). This synchronously fires all pending record callbacks instead of waiting for the next config_update_cont tick (~3s). Each on_record_change() callback calls reserve_subtask() to pre-register a CREATED subtask on the main task, then schedules the actual handler on ET_TASK.

The result is that all subtasks, both file-based and record-triggered, are registered before the reload executor returns. The task count is stable from the first status poll:

1. rereadConfig() finishes: file-based subtasks are created and completed.
2. RecFlushConfigUpdateCbs() fires: on_record_change() callbacks run synchronously, each calling reserve_subtask() to register a CREATED subtask.
3. The scheduled handlers run on ET_TASK: they activate their reserved subtasks via create_config_context(), execute the handler, and complete the subtask.

As a safety net, add_sub_task() also calls aggregate_status() when the parent has already reached SUCCESS, reverting it to IN_PROGRESS. This handles any edge case where a subtask is registered after all known work has completed (e.g., from a plugin or deferred callback).
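The effect of reserving subtasks up front can be sketched with a small model. `MainTask`, its `Sub` entries, and the `all_done` check are hypothetical; only the reserve-then-complete ordering comes from the description.

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum class Status { CREATED, IN_PROGRESS, SUCCESS };

// Hypothetical main task that only counts as finished when every registered
// subtask has succeeded.
struct MainTask {
  struct Sub {
    std::string key;
    Status status;
  };
  std::vector<Sub> subs;

  // Pre-registers a subtask in CREATED state and returns its index.
  std::size_t reserve_subtask(const std::string &key)
  {
    subs.push_back({key, Status::CREATED});
    return subs.size() - 1;
  }

  void complete(std::size_t index) { subs[index].status = Status::SUCCESS; }

  bool all_done() const
  {
    if (subs.empty()) {
      return false;
    }
    for (const auto &s : subs) {
      if (s.status != Status::SUCCESS) {
        return false;
      }
    }
    return true;
  }
};
```

Because the record-triggered subtask is reserved before its handler runs, a status poll taken right after the file-based handlers finish still sees unfinished work instead of a premature SUCCESS.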
Configs Migrated to ConfigRegistry

| Key | File |
|---|---|
| ip_allow | ip_allow.yaml |
| ([…] ip_allow) | ip_categories.yaml |
| cache_control | cache.config |
| cache_hosting | hosting.config |
| parent_proxy | parent.config |
| split_dns | splitdns.config |
| remap | remap.config ¹ |
| logging | logging.yaml |
| ssl_client_coordinator | |
| ([…] ssl_client_coordinator) | sni.yaml |
| ([…] ssl_client_coordinator) | ssl_multicert.config |
| ssl_ticket_key | |
| records | records.yaml |
| | storage.config |
| | socks.config |
| | volume.config |
| | plugin.config |
| | jsonrpc.yaml |

¹ Remap migration will be refactored after #12813 (remap.yaml) and #12669 (virtual hosts) land.
New Configuration Records
TODO
- […] traffic_ctl.en.rst (commands & options), jsonrpc-api.en.rst (admin_config_reload / get_reload_config_status), and config-reload-framework.en.rst (developer guide).
- […] ConfigUpdateHandler/ConfigUpdateContinuation removed from ConfigProcessor.h. Remaining registerFile() calls in AddConfigFilesHere.cc will be retired via inventory-only ConfigRegistry entries (see below).
- […] AddConfigFilesHere.cc into ConfigRegistry: static files (storage.config, volume.config, plugin.config, etc.) registered as inventory-only entries. records.yaml registered with reload handler. AddConfigFilesHere.cc removed.

Future Work (separate PRs)

- […] ConfigRegistry. Remaining work: fully log detailed errors via ctx.log(), ctx.fail(), etc.
- […] ConfigSource::FileOnly and record-only handlers use ConfigSource::RecordOnly. Migrate file-based handlers to FileAndRpc so they can read YAML directly from the RPC (via ctx.supplied_yaml()).
- […] traffic_ctl config status -t <token>) instead of grepping log files.
- […] trigger_records / RecRegisterConfigUpdateCb) are not currently tracked. Create a main task with a synthetic token so they appear in traffic_ctl config status.

Dependencies and Related Issues

Cherry-pick #12934 for an ArgParser fix.

Fixes #12324 (Improving traffic_ctl config reload).

This PR will likely land after:

There should be no major conflicts with those PRs. Conversation and coordination need to be done before merging.