Added support for lifecycle.started option#4672
Conversation
|
Commit: 7cb5e94
24 interesting tests: 11 flaky, 7 SKIP, 6 RECOVERED
Top 20 slowest tests (at least 2 minutes):
|
| App: input.App, | ||
| SourceCodePath: input.SourceCodePath, | ||
| Config: input.Config, | ||
| GitSource: input.GitSource, |
There was a problem hiding this comment.
Why do we need all these new fields to support 'started' state?
Also, why don't we track 'bool started' here?
There was a problem hiding this comment.
Why do we need all these new fields to support 'started' state?
While working on the feature I realised we don;t track these field in the state correctly, so fixed it. Started should be here as well indeed, will add it
There was a problem hiding this comment.
While working on the feature I realised we don;t track these field in the state correctly, so fixed it.
Interesting, is that possible to have a regression test for this + fix in a separate PR? I wonder how our test suite missed this omission, is that because none of our configs use that fields?
Can we add a new invariant test config or extend existing one with these fields? That should take care of the rest.
There was a problem hiding this comment.
I need to keep these fields here because I introduce AppState and Started field, so I would prefer not to make multiple PRs out of it.
But I will add a regression test and investigate why we missed it in a separate PR, we have tests that included all of these fields including invariant test with SourceCodePath field in
There was a problem hiding this comment.
But I will add a regression test and investigate why we missed it in a separate PR, we have tests that included all of these fields including invariant test with SourceCodePath field in
okay, let's review this first. It still seems that AppState/SourceCodePath/Config/GitSource change is big enough to deserve its own PR / changelog entry but let's see the root cause first.
There was a problem hiding this comment.
SourceCodePath/Config/GitSource fields do not matter for Apps CRUD API because they are no part of it, they are used in Apps.Deploy call which previously only part of bundle run and we didn't really needed this in state.
But now with introducution of Started field we do the Deploy call on Update as well (and as a result on bundle deploy), so worth including it in state now.
So it wasn't a regression which tests did not catch
There was a problem hiding this comment.
But now with introducution of Started field we do the Deploy call on Update as well (and as a result on bundle deploy), so worth including it in state now.
Why include it in the state? What do we get out of it? We usually just snapshot the config there.
There was a problem hiding this comment.
If we do not include it in the state, how do we know if any of these fields changed / drifted?
| App: input.App, | ||
| SourceCodePath: input.SourceCodePath, | ||
| Config: input.Config, | ||
| GitSource: input.GitSource, |
There was a problem hiding this comment.
But I will add a regression test and investigate why we missed it in a separate PR, we have tests that included all of these fields including invariant test with SourceCodePath field in
okay, let's review this first. It still seems that AppState/SourceCodePath/Config/GitSource change is big enough to deserve its own PR / changelog entry but let's see the root cause first.
| // With lifecycle.started=true, ensure the app compute is running and deploy the latest code. | ||
| if config.Started { | ||
| // Start compute if it is stopped (mirrors bundle run behavior). | ||
| app, err := r.client.Apps.GetByName(ctx, id) |
There was a problem hiding this comment.
why do we need separate GetByName call here? does not response already have the same type?
There was a problem hiding this comment.
This is to get a status of the compute below, Update request does not have a status of compute
There was a problem hiding this comment.
I meant update's response, is that not apps.App?###
There was a problem hiding this comment.
Yes, it's not the apps.App, it's a separate data structure which does not have compute status
| AppName: id, | ||
| UpdateMask: updateMask, | ||
| } | ||
| updateWaiter, err := r.client.Apps.CreateUpdate(ctx, request) |
There was a problem hiding this comment.
Could it be possible that the only thing that changed is 'started' field, making this API call unnecessary?
There was a problem hiding this comment.
Yes, but we skip these type of fields, see OverrideChangeDesc definition here
There was a problem hiding this comment.
I don't understand, from the code it looks that we unconditionally call CreateUpdate. This is regardless of what is in changes.
However, we don't always need that, right?
There was a problem hiding this comment.
Yes, in DoUpdate we don't do any checks, but we mark it as Skip action here, so we won't even call Update at all
simonfaltum
left a comment
There was a problem hiding this comment.
Review (automated, 2 agents)
Verdict: Not ready yet | 3 Critical | 3 Major | 2 Gap(Major) | 3 Nit | 1 Suggestion
[Critical] DoCreate never deploys app code when lifecycle.started=true
bundle/direct/dresources/app.go (DoCreate)
DoCreate only flips NoCompute and creates the app shell, but never calls Apps.Deploy. On first bundle deploy with started=true, the app gets compute but no actual deployment.
Suggestion: After create + wait, build deployment and call appdeploy.Deploy when started=true.
[Critical] All local-only fields Skipped, preventing DoUpdate from running
bundle/direct/dresources/app.go (OverrideChangeDesc + DoUpdate)
OverrideChangeDesc marks started, source_code_path, config, and git_source as Skip. If no other app fields change, the planner never calls DoUpdate, so lifecycle.started=true has no effect. The acceptance test masks this by always changing description alongside started.
Suggestion: Model app deployment as its own actionable step, or ensure started changes produce a non-skip action.
[Critical] Clusters and SQL warehouses: started=true on stopped resources is a no-op
bundle/direct/dresources/cluster.go, bundle/direct/dresources/sql_warehouse.go
started is also Skipped for clusters/warehouses. Even if another field triggers DoUpdate, Clusters.Edit on a terminated cluster doesn't start it. The bundle never converges to the requested active state.
Suggestion: Plan an explicit Start step when started=true and resource is stopped.
[Major] LifecycleWithStarted duplicates PreventDestroy instead of embedding Lifecycle
bundle/config/resources/lifecycle.go:18-32
If Lifecycle gains new fields, LifecycleWithStarted won't inherit them. Suggestion: Embed Lifecycle in LifecycleWithStarted.
[Major] plan_test.go lost coverage breadth
bundle/phases/plan_test.go
Old test iterated ALL resource types for checkForPreventDestroy. New tests only cover 2 specific types. Suggestion: Keep a parametric test over all resource types.
[Major] No validation for lifecycle.started on unsupported resource types
bundle/config/mutator/validate_lifecycle_started.go:30-46
Setting lifecycle.started on a job in direct mode only produces a schema warning, not an error. Suggestion: Error explicitly for unsupported types.
[Gap (Major)] Acceptance test never tests started-only toggle
The test always changes description alongside started. No test for: first deploy issuing /deployments, source-only redeploys, or toggling started without other changes.
[Gap (Major)] No acceptance coverage for cluster or SQL warehouse lifecycle.started
Only the app path is tested.
[Nit] Validation error doesn't identify which resource
validate_lifecycle_started.go:40-46 - Include resource key in error message.
[Nit] Duplicate lifecycle entries in schema output
out.fields.txt - Both Lifecycle and LifecycleWithStarted show for apps/clusters/warehouses.
[Nit] Redundant zero-value assignments in RemapState
app.go:93-100 - Explicit zero values are unnecessary in Go struct init.
There was a problem hiding this comment.
Note: This review was posted by Claude (AI assistant).
Priority: HIGH — Several critical correctness issues
MAJOR: Clusters and SQL Warehouses started=true has no effect on subsequent deploys
For clusters, OverrideChangeDesc marks started as Skip, but DoUpdate (which calls Clusters.Edit) does NOT start a terminated cluster. There is no code path that calls Clusters.Start when started=true and the cluster is terminated. Same issue for SQL warehouses. This means lifecycle.started: true only has effect during initial creation — on subsequent deploys, a stopped resource stays stopped.
MAJOR: If only started changes on an app, DoUpdate is never called
OverrideChangeDesc marks started, source_code_path, config, and git_source as Skip. If toggling only started from false→true with no other field changes, all fields get skipped and DoUpdate never fires. The acceptance test masks this by always changing description alongside started.
MAJOR: LifecycleWithStarted duplicates PreventDestroy instead of embedding Lifecycle
type LifecycleWithStarted struct {
PreventDestroy bool `json:"prevent_destroy,omitempty"`
Started *bool `json:"started,omitempty"`
}Should embed Lifecycle instead:
type LifecycleWithStarted struct {
Lifecycle
Started *bool `json:"started,omitempty"`
}Without this, any future fields added to Lifecycle will be silently missing from LifecycleWithStarted.
MAJOR: Field shadowing creates duplicate lifecycle schema entries
Apps, clusters, and SQL warehouses now have TWO lifecycle fields (one from BaseResource, one from the override). The schema output shows duplicate entries which is confusing. Visible in out.fields.txt:
resources.apps.*.lifecycle resources.Lifecycle INPUT
resources.apps.*.lifecycle resources.LifecycleWithStarted INPUT
MEDIUM: ILifecycle naming not idiomatic Go
The I prefix for interfaces is a Java/C# convention. Consider LifecycleConfig or similar.
MEDIUM: Lost parametric test coverage
The old TestCheckPreventDestroyForAllResources iterated over ALL resource types. The new tests only cover Job and App — significant regression in test breadth.
MEDIUM: No unit tests for ValidateLifecycleStarted
The new mutator has no corresponding test file. The error diagnostic also doesn't identify WHICH resource has the issue.
What looks good
appdeploypackage extraction is clean DRY improvement- Test server additions are thorough with proper state management
- Schema and annotation descriptions are clear
- The overall feature design is well thought out
Focus areas for review
- Cluster/warehouse update path —
started=trueineffective after creation - App started-only toggle — silent no-op
- Field embedding —
LifecycleWithStartedshould embedLifecycle - Test coverage restoration
Changes
Added support for lifecycle.started option
Why
This new option allows to start resources such as apps, clusters and sql warehouses in started/active state.
For apps: when this option enabled, on each bundle deploy we automatically will trigger a new app deploy
Tests
Added an acceptance test