
Conversation

@hanblack26

Issue
An attempt to address rust-analyzer#7857 (“Optmize mbe::match_ for performance”).

Summary
This PR makes small, localized changes to the mbe macro matcher to reduce unnecessary allocations and intermediate work. Collections are initialized with small capacities to avoid repeated reallocations and shave off some time.
Functionally, no behavior has changed; the goal was to make the hot path of the NFA-based matcher cheaper.

Testing
To test, I ran the mbe crate's tests before and after the change.
Before:

```
running 7 tests
test benchmark::benchmark_expand_macro_rules ... ok
test benchmark::benchmark_parse_macro_rules ... ok
test tests::unbalanced_brace ... ok
test tests::token_mapping_smoke_test ... ok
test tests::token_mapping_floats ... ok
test tests::minus_belongs_to_literal ... ok
test tests::expr_2021 ... ok

test result: ok. 7 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.02s
```

After:

```
running 7 tests
test benchmark::benchmark_expand_macro_rules ... ok
test benchmark::benchmark_parse_macro_rules ... ok
test tests::unbalanced_brace ... ok
test tests::token_mapping_smoke_test ... ok
test tests::token_mapping_floats ... ok
test tests::expr_2021 ... ok
test tests::minus_belongs_to_literal ... ok

test result: ok. 7 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
```

All the unit tests and existing benchmark tests continue to pass.

Benchmarks
To measure the impact on benchmark_expand_macro_rules, I ran the slow benchmark before and after:

```
RUN_SLOW_TESTS=1 cargo test --release --package mbe -- benchmark::benchmark_expand_macro_rules --nocapture
```

Single run before:

```
running 1 test
test benchmark::benchmark_expand_macro_rules ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 6 filtered out; finished in 0.26s
```

Five runs after the change:

```
=== Run 1 ===
mbe expand macro rules: 87.29ms, 0b
test result: ... finished in 0.27s

=== Run 2 ===
mbe expand macro rules: 112.52ms, 0b
test result: ... finished in 0.30s

=== Run 3 ===
mbe expand macro rules: 84.34ms, 0b
test result: ... finished in 0.25s

=== Run 4 ===
mbe expand macro rules: 85.42ms, 0b
test result: ... finished in 0.26s

=== Run 5 ===
mbe expand macro rules: 99.77ms, 0b
test result: ... finished in 0.28s
```

Across multiple runs after the change, the end-to-end time stays in the same 0.25–0.30s range as the original single-run measurement (0.26s). Within that noise window, the change does not appear to regress the benchmark_expand_macro_rules microbenchmark. The intended benefit is simpler allocation behavior in the matcher, so that repeated traversals of binding trees incur fewer small allocations.

@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Dec 11, 2025

rustbot commented Dec 11, 2025

⚠️ Warning ⚠️

  • There are issue links (such as #123) in the commit messages of the following commits.
    Please move them to the PR description, to avoid spamming the issues with references to the commit, and so this bot can automatically canonicalize them to avoid issues with subtree.

@ChayimFriedman2 (Contributor) left a comment:
There are one or two places where I believe this is not important. Also, the comments are redundant: anyone knows what Vec::with_capacity() does, so please remove them. Other than that, LGTM.

```rust
self.nodes.push(Vec::with_capacity(4));
let nidx = self.nested.len();
self.nested.push(Vec::new());
self.nested.push(Vec::with_capacity(4));
```
I find it hard to believe that this one, and the one two lines above, will make any difference to perf. 4 is the default initial capacity for Vec (or more, for small types), so only one check+jump will be prevented. This IMO worsens the code, so please revert it.

@Veykril (Member) commented Dec 21, 2025

This PR does not really seem to do much. As was already pointed out, the capacity calls will, at best, slightly worsen things, since the initial capacity of a Vec when it first allocates is already 4 items anyway. And given that your benchmark numbers don't reflect an improvement either way, the changes don't seem too useful to us.

Thanks for looking into this either way!

@Veykril Veykril closed this Dec 21, 2025
@rustbot rustbot removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Dec 21, 2025