-
Notifications
You must be signed in to change notification settings - Fork 148
Replace regex-based interpolation with character scanner, add "did you mean" suggestions #4715
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
d54e321
fb2fec3
04e8995
8b55146
7cf85e5
04fe53e
12949c5
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,11 @@ | ||
| bundle: | ||
| name: did-you-mean | ||
|
|
||
| variables: | ||
| my_cluster_id: | ||
| default: abc123 | ||
|
|
||
| resources: | ||
| jobs: | ||
| my_job: | ||
| name: "${var.my_clster_id}" |
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,13 @@ | ||
|
|
||
| >>> [CLI] bundle validate | ||
| Error: reference does not exist: ${var.my_clster_id}. did you mean ${var.my_cluster_id}? | ||
|
|
||
| Name: did-you-mean | ||
| Target: default | ||
| Workspace: | ||
| User: [USERNAME] | ||
| Path: /Workspace/Users/[USERNAME]/.bundle/did-you-mean/default | ||
|
|
||
| Found 1 error | ||
|
|
||
| Exit code: 1 |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| trace $CLI bundle validate |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,6 @@ | ||
| bundle: | ||
| name: "${foo.bar-}" | ||
|
|
||
| variables: | ||
| a: | ||
| default: hello |
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,12 @@ | ||
|
|
||
| >>> [CLI] bundle validate | ||
| Warning: invalid variable reference ${foo.bar-}: invalid key "bar-" | ||
| in databricks.yml:2:9 | ||
|
|
||
| Name: ${foo.bar-} | ||
| Target: default | ||
| Workspace: | ||
| User: [USERNAME] | ||
| Path: /Workspace/Users/[USERNAME]/.bundle/${foo.bar-}/default | ||
|
|
||
| Found 1 warning |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| trace $CLI bundle validate |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,47 @@ | ||
| package mutator | ||
|
|
||
| import ( | ||
| "context" | ||
|
|
||
| "github.com/databricks/cli/bundle" | ||
| "github.com/databricks/cli/libs/diag" | ||
| "github.com/databricks/cli/libs/dyn" | ||
| "github.com/databricks/cli/libs/dyn/dynvar" | ||
| ) | ||
|
|
||
| type warnMalformedReferences struct{} | ||
|
|
||
| // WarnMalformedReferences returns a mutator that emits warnings for strings | ||
| // containing malformed variable references (e.g. "${foo.bar-}"). | ||
| func WarnMalformedReferences() bundle.Mutator { | ||
| return &warnMalformedReferences{} | ||
| } | ||
|
|
||
| func (*warnMalformedReferences) Name() string { | ||
| return "WarnMalformedReferences" | ||
| } | ||
|
|
||
| func (*warnMalformedReferences) Validate(ctx context.Context, b *bundle.Bundle) error { | ||
| return nil | ||
| } | ||
|
|
||
| func (*warnMalformedReferences) Apply(ctx context.Context, b *bundle.Bundle) diag.Diagnostics { | ||
| var diags diag.Diagnostics | ||
| err := b.Config.Mutate(func(root dyn.Value) (dyn.Value, error) { | ||
| _, err := dyn.Walk(root, func(p dyn.Path, v dyn.Value) (dyn.Value, error) { | ||
| // Only check values with source locations to avoid false positives | ||
| // from synthesized/computed values. | ||
| if len(v.Locations()) == 0 { | ||
| return v, nil | ||
| } | ||
| _, _, refDiags := dynvar.NewRefWithDiagnostics(v) | ||
| diags = diags.Extend(refDiags) | ||
| return v, nil | ||
| }) | ||
| return root, err | ||
| }) | ||
| if err != nil { | ||
| diags = diags.Extend(diag.FromErr(err)) | ||
| } | ||
| return diags | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,81 @@ | ||
| # Variable Interpolation: Parser & "Did You Mean" Suggestions | ||
|
|
||
| Author: Shreyas Goenka | ||
| Date: 12 March 2026 | ||
|
|
||
| ## Motivation | ||
|
|
||
| DABs variable interpolation (`${...}`) was regex-based. This caused: | ||
|
|
||
| 1. **Silent failures** — `${foo.bar-}` silently treated as literal text with no warning. | ||
| 2. **No suggestions** — `${bundle.nme}` produces "reference does not exist" with no hint. | ||
| 3. **No escape mechanism** — no way to produce a literal `${` in output. | ||
|
|
||
| ## Background: How Other Systems Parse `${...}` | ||
|
|
||
| | System | Strategy | Escape | Error Quality | | ||
| |--------|----------|--------|---------------| | ||
| | Go `text/template` | State-function lexer | None | Line + template name | | ||
| | HCL2 (Terraform) | Ragel FSM + recursive descent | `$${` → literal `${` | Source range + suggestions | | ||
| | Python f-strings | Mode-stack tokenizer | `{{` → `{` | Line/column | | ||
| | Rust `format!` | Iterator-based descent | `{{`/`}}` | Spans + suggestions | | ||
| | Bash | Char-by-char + depth tracking | `\$` | Line number | | ||
|
|
||
| For a syntax as simple as `${path.to.var[0]}` (no nesting, no functions, no | ||
| operators), a full recursive descent parser is overkill. A **two-mode character | ||
| scanner** — the same core pattern used by Go's `text/template` and HCL — gives | ||
| proper error reporting and escape support without the complexity. | ||
|
|
||
| ## Design Decisions | ||
|
|
||
| ### Two-mode character scanner | ||
|
|
||
| A two-mode scanner (TEXT / REFERENCE) that produces a flat list of tokens. | ||
| No AST, no recursive descent. Easy to port to the Python implementation. | ||
|
|
||
| See `libs/dyn/dynvar/interpolation/parse.go`. | ||
|
|
||
| ### Nested `${` handling | ||
|
|
||
| Existing configs use patterns like `${var.foo_${var.tail}}` where the inner | ||
| reference resolves first. The old regex matched only `${var.tail}` (the | ||
| innermost pair). The new parser preserves this: when scanning for `}` inside | ||
| a reference, if another `${` is encountered, the outer `${` is abandoned | ||
| (treated as literal) and scanning restarts from the inner `${`. | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is that what we want? I think we've discussed that we want to reject these constructs. |
||
|
|
||
| ### `$$` escape sequence | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We've discussed in this in core eng meeting and there were a number of concerns with this. Notably, in bash it's current process. I think least controversial option was: |
||
|
|
||
| Following HCL2's precedent, `$$` before `{` produces a literal `$`. This is | ||
| backward compatible — no existing config uses `$${` (the old regex wouldn't | ||
| match it). | ||
|
|
||
| ### Malformed reference warnings | ||
|
|
||
| A standalone `WarnMalformedReferences` mutator walks the config tree once | ||
| before variable resolution. It only checks values with source locations | ||
| (`len(v.Locations()) > 0`) to avoid false positives from synthesized values | ||
| (e.g., normalized/computed paths). | ||
|
|
||
| ### "Did you mean" suggestions | ||
|
|
||
| When a valid-syntax reference fails to resolve (`NoSuchKeyError`), the | ||
| resolver calls a `SuggestFn` that walks the config tree component by | ||
| component using Levenshtein distance. The suggestion is appended to the | ||
| existing error: `did you mean ${var.my_cluster_id}?`. | ||
|
|
||
| The `SuggestFn` receives the raw path from the reference (e.g., `var.X`), | ||
| rewrites it to `variables.X.value` for lookup, then converts the suggestion | ||
| back to `var.X` form for user-facing messages. | ||
|
|
||
| See `libs/dyn/dynvar/suggest.go`. | ||
|
|
||
| ### Token-based resolution | ||
|
|
||
| The resolver's string interpolation changed from `strings.Replace` (with | ||
| count=1 to avoid double-replacing duplicate refs) to a token concatenation | ||
| loop. Each `TokenRef` maps 1:1 to a resolved value, eliminating the ambiguity. | ||
|
|
||
| ## Python sync | ||
|
|
||
| The Python regex in `python/databricks/bundles/core/_transform.py` needs a | ||
| corresponding update in a follow-up PR. | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,50 @@ | ||
| # "Did You Mean?" Suggestions for Invalid Variable References | ||
|
|
||
| Author: Shreyas Goenka | ||
| Date: 12 March 2026 | ||
|
|
||
| ## Problem | ||
|
|
||
| `${bundle.git.origin_urlx}` produces `reference does not exist` with no hint. | ||
| A single character typo in a long path can take minutes to spot. | ||
|
|
||
| ## Design: Fuzzy Path Walk | ||
|
|
||
| When a lookup fails with `NoSuchKeyError`, we do a separate fuzzy walk of the | ||
| normalized config tree (which includes all struct-defined fields via | ||
| `IncludeMissingFields` + all user-defined map keys). | ||
|
|
||
| The walk processes every component independently: | ||
| 1. Exact key match → follow it | ||
| 2. No exact match → Levenshtein fuzzy match against siblings → follow best match | ||
| 3. Index components (`[0]`) → pass through verbatim | ||
| 4. Any component unfixable (all candidates too far) → give up, no suggestion | ||
|
|
||
| This corrects **multiple** typos simultaneously (e.g., `${resources.jbs.my_jb.id}` | ||
| → `did you mean ${resources.jobs.my_job.id}?`). | ||
|
|
||
| Distance threshold: `min(3, max(1, len(key)/2))`. | ||
|
|
||
| See `libs/dyn/dynvar/suggest.go`. | ||
|
|
||
| ## Wiring | ||
|
|
||
| The suggestion callback is passed via `dynvar.WithSuggestFn(...)` into | ||
| `dynvar.Resolve`. On `NoSuchKeyError` in `resolveKey`, the suggestion is | ||
| appended to the error message. | ||
|
|
||
| ## `var` Shorthand | ||
|
|
||
| `${var.foo}` is rewritten to `${variables.foo.value}` before lookup. The | ||
| `SuggestFn` in `resolve_variable_references.go` handles this bidirectionally: | ||
| rewrite `var.X` → `variables.X.value` for the fuzzy walk, then convert the | ||
| suggestion back to `var.X` form for the user-facing message. | ||
|
|
||
| ## Scope | ||
|
|
||
| **Covered**: typos in struct fields, user-defined names, resource types/instances, | ||
| multi-level typos. | ||
|
|
||
| **Not covered**: malformed references (handled by the parser), cross-section | ||
| suggestions (user writes `${bundle.X}` meaning `${var.X}`), array index | ||
| out of bounds. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.