Skip to content

Added support for lifecycle.started option#4672

Open
andrewnester wants to merge 12 commits intomainfrom
feat/lifecycle-started
Open

Added support for lifecycle.started option#4672
andrewnester wants to merge 12 commits intomainfrom
feat/lifecycle-started

Conversation

@andrewnester
Copy link
Contributor

Changes

Added support for lifecycle.started option

Why

This new option allows to start resources such as apps, clusters and sql warehouses in started/active state.
For apps: when this option enabled, on each bundle deploy we automatically will trigger a new app deploy

Tests

Added an acceptance test

@eng-dev-ecosystem-bot
Copy link
Collaborator

eng-dev-ecosystem-bot commented Mar 6, 2026

Commit: 7cb5e94

Run: 22947660912

Env 🔄​flaky 💚​RECOVERED 🙈​SKIP ✅​pass 🙈​skip Time
💚​ aws linux 8 7 268 783 6:12
💚​ aws windows 8 7 270 781 5:51
🔄​ aws-ucws linux 2 7 7 364 698 7:39
🔄​ aws-ucws windows 2 7 7 366 696 6:46
💚​ azure linux 2 9 271 781 6:00
💚​ azure windows 2 9 273 779 5:29
🔄​ azure-ucws linux 9 1 9 362 694 9:00
🔄​ azure-ucws windows 4 1 9 369 692 5:08
💚​ gcp linux 2 9 267 784 6:02
💚​ gcp windows 2 9 269 782 4:51
24 interesting tests: 11 flaky, 7 SKIP, 6 RECOVERED
Test Name aws linux aws windows aws-ucws linux aws-ucws windows azure linux azure windows azure-ucws linux azure-ucws windows gcp linux gcp windows
🔄​ TestAccept 💚​R 💚​R 💚​R 🔄​f 💚​R 💚​R 🔄​f 🔄​f 💚​R 💚​R
🙈​ TestAccept/bundle/resources/permissions 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
💚​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions 💚​R 💚​R 💚​R 💚​R 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
💚​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=direct 💚​R 💚​R 💚​R 💚​R
💚​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=terraform 💚​R 💚​R 💚​R 💚​R
💚​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions 💚​R 💚​R 💚​R 💚​R 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
💚​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=direct 💚​R 💚​R 💚​R 💚​R
💚​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=terraform 💚​R 💚​R 💚​R 💚​R
🙈​ TestAccept/bundle/resources/postgres_branches/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/recreate 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/update_protected 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/without_branch_id 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_endpoints/recreate 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/synced_database_tables/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🔄​ TestAccept/ssh/connect-serverless-gpu 🙈​s 🙈​s 🔄​f 🔄​f 🙈​s 🙈​s 🔄​f 🔄​f 🙈​s 🙈​s
🔄​ TestAccept/ssh/connection 💚​R 💚​R 🔄​f 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R
🔄​ TestFsCatDoesNotSupportOutputModeJson ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p 🔄​f 🔄​f ✅​p ✅​p
🔄​ TestFsCpFileToDirForWindowsPaths 🙈​s ✅​p 🙈​s ✅​p 🙈​s ✅​p 🙈​s 🔄​f 🙈​s ✅​p
🔄​ TestFsCpFileToNonExistentDir ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p 🔄​f ✅​p ✅​p ✅​p
🔄​ TestFsCpFileToNonExistentDir/uc-volumes_to_dbfs 🙈​s 🙈​s ✅​p ✅​p 🙈​s 🙈​s 🔄​f ✅​p 🙈​s 🙈​s
🔄​ TestFsRmDirRecursively ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p 🔄​f ✅​p ✅​p ✅​p
🔄​ TestFsRmDirRecursively/dbfs ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p 🔄​f ✅​p ✅​p ✅​p
🔄​ TestFsRmNonEmptyDirectory ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p 🔄​f ✅​p ✅​p ✅​p
🔄​ TestFsRmNonEmptyDirectory/dbfs ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p 🔄​f ✅​p ✅​p ✅​p
Top 20 slowest tests (at least 2 minutes):
duration env testname
3:44 aws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:42 gcp linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:38 gcp linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:29 aws-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:25 aws-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:24 azure windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:24 aws-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:23 aws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:18 aws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:14 azure-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:10 aws-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:09 gcp windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:05 gcp windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:48 azure linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:46 aws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:42 azure-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:40 azure linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:36 azure windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:21 azure-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:14 azure-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct

App: input.App,
SourceCodePath: input.SourceCodePath,
Config: input.Config,
GitSource: input.GitSource,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need all these new fields to support 'started' state?

Also, why don't we track 'bool started' here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need all these new fields to support 'started' state?

While working on the feature I realised we don;t track these field in the state correctly, so fixed it. Started should be here as well indeed, will add it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed here da107eb

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While working on the feature I realised we don;t track these field in the state correctly, so fixed it.

Interesting, is that possible to have a regression test for this + fix in a separate PR? I wonder how our test suite missed this omission, is that because none of our configs use that fields?

Can we add a new invariant test config or extend existing one with these fields? That should take care of the rest.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to keep these fields here because I introduce AppState and Started field, so I would prefer not to make multiple PRs out of it.

But I will add a regression test and investigate why we missed it in a separate PR, we have tests that included all of these fields including invariant test with SourceCodePath field in

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I will add a regression test and investigate why we missed it in a separate PR, we have tests that included all of these fields including invariant test with SourceCodePath field in

okay, let's review this first. It still seems that AppState/SourceCodePath/Config/GitSource change is big enough to deserve its own PR / changelog entry but let's see the root cause first.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SourceCodePath/Config/GitSource fields do not matter for Apps CRUD API because they are no part of it, they are used in Apps.Deploy call which previously only part of bundle run and we didn't really needed this in state.

But now with introducution of Started field we do the Deploy call on Update as well (and as a result on bundle deploy), so worth including it in state now.

So it wasn't a regression which tests did not catch

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But now with introducution of Started field we do the Deploy call on Update as well (and as a result on bundle deploy), so worth including it in state now.

Why include it in the state? What do we get out of it? We usually just snapshot the config there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do not include it in the state, how do we know if any of these fields changed / drifted?

@andrewnester andrewnester requested a review from denik March 6, 2026 13:12
App: input.App,
SourceCodePath: input.SourceCodePath,
Config: input.Config,
GitSource: input.GitSource,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I will add a regression test and investigate why we missed it in a separate PR, we have tests that included all of these fields including invariant test with SourceCodePath field in

okay, let's review this first. It still seems that AppState/SourceCodePath/Config/GitSource change is big enough to deserve its own PR / changelog entry but let's see the root cause first.

// With lifecycle.started=true, ensure the app compute is running and deploy the latest code.
if config.Started {
// Start compute if it is stopped (mirrors bundle run behavior).
app, err := r.client.Apps.GetByName(ctx, id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need separate GetByName call here? does not response already have the same type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to get a status of the compute below, Update request does not have a status of compute

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant update's response, is that not apps.App?###

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's not the apps.App, it's a separate data structure which does not have compute status

AppName: id,
UpdateMask: updateMask,
}
updateWaiter, err := r.client.Apps.CreateUpdate(ctx, request)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could it be possible that the only thing that changed is 'started' field, making this API call unnecessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but we skip these type of fields, see OverrideChangeDesc definition here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand, from the code it looks that we unconditionally call CreateUpdate. This is regardless of what is in changes.

However, we don't always need that, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, in DoUpdate we don't do any checks, but we mark it as Skip action here, so we won't even call Update at all

https://github.com/databricks/cli/pull/4672/changes#diff-898594fcda57afc5088d44c562d35fb693c59f9fac3b01568791273e720c61a0R169

@andrewnester andrewnester requested a review from denik March 10, 2026 16:21
Copy link
Member

@simonfaltum simonfaltum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review (automated, 2 agents)

Verdict: Not ready yet | 3 Critical | 3 Major | 2 Gap(Major) | 3 Nit | 1 Suggestion

[Critical] DoCreate never deploys app code when lifecycle.started=true

bundle/direct/dresources/app.go (DoCreate)

DoCreate only flips NoCompute and creates the app shell, but never calls Apps.Deploy. On first bundle deploy with started=true, the app gets compute but no actual deployment.

Suggestion: After create + wait, build deployment and call appdeploy.Deploy when started=true.

[Critical] All local-only fields Skipped, preventing DoUpdate from running

bundle/direct/dresources/app.go (OverrideChangeDesc + DoUpdate)

OverrideChangeDesc marks started, source_code_path, config, and git_source as Skip. If no other app fields change, the planner never calls DoUpdate, so lifecycle.started=true has no effect. The acceptance test masks this by always changing description alongside started.

Suggestion: Model app deployment as its own actionable step, or ensure started changes produce a non-skip action.

[Critical] Clusters and SQL warehouses: started=true on stopped resources is a no-op

bundle/direct/dresources/cluster.go, bundle/direct/dresources/sql_warehouse.go

started is also Skipped for clusters/warehouses. Even if another field triggers DoUpdate, Clusters.Edit on a terminated cluster doesn't start it. The bundle never converges to the requested active state.

Suggestion: Plan an explicit Start step when started=true and resource is stopped.

[Major] LifecycleWithStarted duplicates PreventDestroy instead of embedding Lifecycle

bundle/config/resources/lifecycle.go:18-32

If Lifecycle gains new fields, LifecycleWithStarted won't inherit them. Suggestion: Embed Lifecycle in LifecycleWithStarted.

[Major] plan_test.go lost coverage breadth

bundle/phases/plan_test.go

Old test iterated ALL resource types for checkForPreventDestroy. New tests only cover 2 specific types. Suggestion: Keep a parametric test over all resource types.

[Major] No validation for lifecycle.started on unsupported resource types

bundle/config/mutator/validate_lifecycle_started.go:30-46

Setting lifecycle.started on a job in direct mode only produces a schema warning, not an error. Suggestion: Error explicitly for unsupported types.

[Gap (Major)] Acceptance test never tests started-only toggle

The test always changes description alongside started. No test for: first deploy issuing /deployments, source-only redeploys, or toggling started without other changes.

[Gap (Major)] No acceptance coverage for cluster or SQL warehouse lifecycle.started

Only the app path is tested.

[Nit] Validation error doesn't identify which resource

validate_lifecycle_started.go:40-46 - Include resource key in error message.

[Nit] Duplicate lifecycle entries in schema output

out.fields.txt - Both Lifecycle and LifecycleWithStarted show for apps/clusters/warehouses.

[Nit] Redundant zero-value assignments in RemapState

app.go:93-100 - Explicit zero values are unnecessary in Go struct init.

Copy link
Contributor

@shreyas-goenka shreyas-goenka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: This review was posted by Claude (AI assistant).

Priority: HIGH — Several critical correctness issues

MAJOR: Clusters and SQL Warehouses started=true has no effect on subsequent deploys

For clusters, OverrideChangeDesc marks started as Skip, but DoUpdate (which calls Clusters.Edit) does NOT start a terminated cluster. There is no code path that calls Clusters.Start when started=true and the cluster is terminated. Same issue for SQL warehouses. This means lifecycle.started: true only has effect during initial creation — on subsequent deploys, a stopped resource stays stopped.

MAJOR: If only started changes on an app, DoUpdate is never called

OverrideChangeDesc marks started, source_code_path, config, and git_source as Skip. If toggling only started from false→true with no other field changes, all fields get skipped and DoUpdate never fires. The acceptance test masks this by always changing description alongside started.

MAJOR: LifecycleWithStarted duplicates PreventDestroy instead of embedding Lifecycle

type LifecycleWithStarted struct {
    PreventDestroy bool  `json:"prevent_destroy,omitempty"`
    Started        *bool `json:"started,omitempty"`
}

Should embed Lifecycle instead:

type LifecycleWithStarted struct {
    Lifecycle
    Started *bool `json:"started,omitempty"`
}

Without this, any future fields added to Lifecycle will be silently missing from LifecycleWithStarted.

MAJOR: Field shadowing creates duplicate lifecycle schema entries

Apps, clusters, and SQL warehouses now have TWO lifecycle fields (one from BaseResource, one from the override). The schema output shows duplicate entries which is confusing. Visible in out.fields.txt:

resources.apps.*.lifecycle  resources.Lifecycle           INPUT
resources.apps.*.lifecycle  resources.LifecycleWithStarted  INPUT

MEDIUM: ILifecycle naming not idiomatic Go

The I prefix for interfaces is a Java/C# convention. Consider LifecycleConfig or similar.

MEDIUM: Lost parametric test coverage

The old TestCheckPreventDestroyForAllResources iterated over ALL resource types. The new tests only cover Job and App — significant regression in test breadth.

MEDIUM: No unit tests for ValidateLifecycleStarted

The new mutator has no corresponding test file. The error diagnostic also doesn't identify WHICH resource has the issue.

What looks good

  • appdeploy package extraction is clean DRY improvement
  • Test server additions are thorough with proper state management
  • Schema and annotation descriptions are clear
  • The overall feature design is well thought out

Focus areas for review

  1. Cluster/warehouse update path — started=true ineffective after creation
  2. App started-only toggle — silent no-op
  3. Field embedding — LifecycleWithStarted should embed Lifecycle
  4. Test coverage restoration

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants