-
Notifications
You must be signed in to change notification settings - Fork 4.1k
feat(auto_cl): add error rate threshold for punishment attenuation #3219
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Add new GFlag `auto_cl_error_rate_punish_threshold` to enable error-rate-based punishment attenuation in AutoConcurrencyLimiter. Problem: Low error rates (e.g., 1.3% sporadic timeouts) cause disproportionate avg_latency inflation (+31%), leading the limiter to mistakenly shrink max_concurrency and trigger ELIMIT rejections. Solution: Inspired by Alibaba Sentinel's threshold-based approach: - threshold=0 (default): Original behavior preserved (backward compat) - threshold>0 (e.g., 0.1): Error rates below threshold produce zero punishment; above it, punishment scales linearly from 0 to full Example: With threshold=0.1, a 5% error rate produces no punishment, while a 50% error rate produces 44% of the original punishment. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Adds a configurable “dead-zone” for AutoConcurrencyLimiter’s failed-request punishment so low error rates don’t disproportionately inflate computed latency and shrink max concurrency.
Changes:
- Introduces new GFlag
auto_cl_error_rate_punish_thresholdto attenuate/disable failed-request punishment below a configured error-rate threshold. - Applies threshold-based linear scaling of
failed_punishinAutoConcurrencyLimiter::UpdateMaxConcurrency. - Adds unit tests covering threshold=0 (backward compatible), below/at/above-threshold behavior, and scaling cases.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
src/brpc/policy/auto_concurrency_limiter.cpp |
Adds the new GFlag and applies threshold-based attenuation to failed-request punishment during window updates. |
test/brpc_auto_concurrency_limiter_unittest.cpp |
Adds new unit tests validating the threshold attenuation behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Use synthetic timestamps instead of sleeping for deterministic tests - Fix trigger sample counting to preserve exact error rates - Consolidate 7 tests to 4 core tests with two-sided assertions - Add expected value range validation in assertions Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Document error punishment related GFlags - Add detailed explanation for auto_cl_error_rate_punish_threshold - Include table of all configurable parameters with defaults Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Fix avg_latency comments to reflect std::ceil() rounding behavior - Add cc_test target in BUILD.bazel for Bazel CI coverage Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
b90b9fb to
1a01df3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
1a01df3 to
dbd4089
Compare
- Skip attenuation logic when threshold <= 0 or >= 1 - Update GFlag description to document valid range (0, 1) - Add documentation for the new parameter Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
dbd4089 to
a18e2bf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@wwbmmm fix |
Add new GFlag
auto_cl_error_rate_punish_thresholdto enable error-rate-based punishment attenuation in AutoConcurrencyLimiter.Problem: Low error rates (e.g., 1.3% sporadic timeouts) cause disproportionate avg_latency inflation (+31%), leading the limiter to mistakenly shrink max_concurrency and trigger ELIMIT rejections.
Solution: Inspired by Alibaba Sentinel's threshold-based approach:
Example: With threshold=0.1, a 5% error rate produces no punishment, while a 50% error rate produces 44% of the original punishment.
What problem does this PR solve?
Issue Number: resolve
Problem Summary:
What is changed and the side effects?
Changed:
Side effects:
Performance effects:
Breaking backward compatibility:
Check List: