[Bugfix] Align thinking_budget behavior with ERNIE reasoning flow #6934
jackyYang6 wants to merge 7 commits into PaddlePaddle:develop
Codecov Report
❌ Patch coverage is …
Additional details and impacted files:
@@ Coverage Diff @@
## develop #6934 +/- ##
==========================================
Coverage ? 73.78%
==========================================
Files ? 399
Lines ? 55697
Branches ? 8784
==========================================
Hits ? 41094
Misses ? 11699
Partials ? 2904
Pull request overview
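This PR fixes and unifies how `ThinkingBudgetLogitsProcessor` and the ERNIE reasoning flow truncate the thinking segment. It removes the implicit newline before `</think>`, fixes `thinking_budget` having no effect on the ERNIE GPU path because the prompt-side `<think>` state was not propagated, and extracts the request-side thinking-budget preprocessing into a shared helper that several Processor classes reuse.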
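Changes:
- Update `ThinkingBudgetLogitsProcessor`: once the budget is reached, force `</think>` directly (when a stop sentence is configured, emit the stop sentence first, then `</think>`), and add a GPU fallback for the prompt-state scan (`token_ids_all` + `prompt_lens`).
- Extract and reuse the request-side thinking-budget preprocessing helpers (stop sentence encoding, prompt `<think>` state update, and so on) across the text, ernie, v1, and ernie_vl input processors, aligning on the semantics that the prompt side does not consume budget while the decode side does.
- Update the unit tests and the Chinese and English docs to cover the new semantics and the GPU fallback behavior.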
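The decode-side behavior listed above can be illustrated with a small, self-contained sketch. It is not the code from `thinking_budget.py`: the token id `THINK_END_ID`, the `BudgetState` class, and the `emit` function are hypothetical names, and the rule for when the stop sentence starts (once the remaining budget no longer exceeds its length) is an assumption chosen so that the stop sentence counts against the budget.

```python
from dataclasses import dataclass, field
from typing import List

THINK_END_ID = 2  # hypothetical token id standing in for </think>


@dataclass
class BudgetState:
    """Per-request state; created at the first decode step, so prompt tokens never count."""
    budget: int
    stop_sentence_ids: List[int] = field(default_factory=list)
    used: int = 0       # decode-time thinking tokens counted so far
    stop_pos: int = 0   # number of stop-sentence tokens already forced
    ended: bool = False


def emit(state: BudgetState, sampled: int) -> int:
    """Return the token emitted at this step, overriding the sampled one if needed."""
    if state.ended:
        return sampled
    stop = state.stop_sentence_ids
    remaining = state.budget - state.used
    # Assumption: start (or continue) the stop sentence once it would just fit.
    if state.stop_pos < len(stop) and (state.stop_pos > 0 or remaining <= len(stop)):
        token = stop[state.stop_pos]
        state.stop_pos += 1
        state.used += 1  # stop-sentence tokens consume budget
        return token
    # Budget exhausted or stop sentence finished: force </think> with no leading newline.
    if remaining <= 0 or (stop and state.stop_pos == len(stop)):
        state.ended = True  # the forced </think> itself is not counted
        return THINK_END_ID
    if sampled == THINK_END_ID:  # the model closed the thinking segment on its own
        state.ended = True
        return sampled
    state.used += 1
    return sampled


def run(state: BudgetState, sampled_tokens: List[int]) -> List[int]:
    """Apply emit() to a sequence of sampled tokens."""
    return [emit(state, t) for t in sampled_tokens]
```

With a budget of 3 and no stop sentence, `run` lets three sampled tokens through and then emits `THINK_END_ID`; with a two-token stop sentence and a budget of 5, the last two budgeted positions are taken by the stop sentence before `THINK_END_ID`.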
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| `fastdeploy/model_executor/logits_processor/thinking_budget.py` | Force `</think>` directly once the budget is reached; the stop sentence takes priority; add a `token_ids_all` fallback for the prompt scan. |
| `fastdeploy/input/text_processor.py` | Extract/add shared thinking-budget helpers (stop sentence encoding, prompt state update, literal encoding cache); the tokenize cache supports lazy init. |
| `fastdeploy/input/v1/text_processor.py` | Same as above (v1 version). |
| `fastdeploy/input/ernie4_5_processor.py` | Reuse the shared helpers so the ERNIE text processor writes the prompt-side state and stop sentence token ids that thinking-budget needs. |
| `fastdeploy/input/v1/ernie4_5_processor.py` | Same as above (v1 version). |
| `fastdeploy/input/ernie4_5_vl_processor/ernie4_5_vl_processor.py` | Reuse the shared helpers so the ERNIE-VL path also prepares the thinking-budget parameters and prompt-side state. |
| `fastdeploy/input/v1/ernie4_5_vl_processor/ernie4_5_vl_processor.py` | Same as above (v1 version). |
| `tests/model_executor/test_thinking_budget.py` | Update/add test cases for the new semantics (prompt does not consume budget, direct `</think>`, stop sentence behavior, GPU fallback). |
| `docs/features/thinking_budget.md` | Update the English documentation and practical recommendations to match the new behavior. |
| `docs/zh/features/thinking_budget.md` | Update the Chinese documentation and practical recommendations to match the new behavior. |
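The prompt-scan fallback mentioned for `thinking_budget.py` can be pictured roughly as follows. This is a sketch under stated assumptions, not the PR's implementation: `prompt_tokens`, `scan_prompt_state`, and both token ids are hypothetical, and it assumes `token_ids_all` is a per-request buffer whose first `prompt_len` entries hold the prompt.

```python
from typing import List, Optional, Sequence, Tuple

THINK_START_ID = 1  # hypothetical token id standing in for <think>
THINK_END_ID = 2    # hypothetical token id standing in for </think>


def prompt_tokens(
    prompt_ids: Optional[Sequence[int]],
    token_ids_all: Optional[Sequence[int]],
    prompt_len: Optional[int],
) -> List[int]:
    """Pick the prompt tokens from whichever source is available."""
    if prompt_ids is not None:
        # Primary path, kept for backends that still provide prompt_ids.
        return list(prompt_ids)
    if token_ids_all is not None and prompt_len is not None:
        # Fallback: slice the prompt out of the combined token buffer.
        return list(token_ids_all[:prompt_len])
    return []


def scan_prompt_state(tokens: Sequence[int]) -> Tuple[bool, bool]:
    """Return (started, ended) for the thinking segment seen in the prompt."""
    started = ended = False
    for tok in tokens:
        if tok == THINK_START_ID:
            started, ended = True, False
        elif tok == THINK_END_ID and started:
            ended = True
    return started, ended
```

A prompt that ends inside an open `<think>` segment yields `(True, False)`, which is the state the decode-side budget logic needs in order to start counting from the first generated token.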
Motivation
Fix two behavior inconsistencies in `ThinkingBudgetLogitsProcessor`:
- `thinking_budget` ended thinking with an extra newline before `</think>`, which was not aligned with the existing `reasoning_max_tokens` behavior.
- `thinking_budget` could fail to take effect on GPU because prompt-side `<think>` state was not propagated through ERNIE-specific processors, and the runtime fallback only checked `prompt_ids`, which is not available on the current GPU path.

This PR aligns `thinking_budget` with the current ERNIE reasoning flow and removes the extra newline before `</think>`.
Modifications
- Update `ThinkingBudgetLogitsProcessor` to terminate thinking by forcing `</think>` directly instead of `\n + </think>`.
- Keep `think_stop_sentence` behavior as `think_stop_sentence + </think>`, and remove the implicit leading newline before the stop sentence.
- Add a prompt-state fallback in `ThinkingBudgetLogitsProcessor`:
  - keep the `prompt_ids` scan for XPU/HPU compatibility
  - fall back to `token_ids_all + prompt_lens` when `prompt_ids` is unavailable on GPU
- Reuse the request-side thinking-budget preprocessing in:
  - `fastdeploy/input/text_processor.py`
  - `fastdeploy/input/v1/text_processor.py`
  - `fastdeploy/input/ernie4_5_processor.py`
  - `fastdeploy/input/v1/ernie4_5_processor.py`
  - `fastdeploy/input/ernie4_5_vl_processor/ernie4_5_vl_processor.py`
  - `fastdeploy/input/v1/ernie4_5_vl_processor/ernie4_5_vl_processor.py`
- Budget accounting:
  - prompt-side `<think>` scaffolding or prefilled thinking content does not consume `thinking_budget`
  - decode-time thinking tokens consume `thinking_budget`
  - `think_stop_sentence` still consumes `thinking_budget`
  - the forced `</think>` does not consume `thinking_budget`
Usage or Command
Example request:
    {
      "messages": [
        { "role": "user", "content": "你好" }
      ],
      "enable_thinking": true,
      "reasoning_max_tokens": 10,
      "logits_processors_args": {
        "thinking_budget": 10,
        "think_stop_sentence": "思考已结束,开始回复"
      },
      "logprobs": true,
      "top_logprobs": 1,
      "include_logprobs_decode_token": true,
      "temperature": 0,
      "max_tokens": 20,
      "return_token_ids": true
    }
Accuracy Tests
This PR does not modify kernel math or model forward numerics. The validation focus is output behavior consistency for reasoning truncation.
Behavioral validation on ERNIE thinking model:
- `thinking_budget` only (`thinking_budget=10`): `ThinkingBudgetLogitsProcessor` takes effect on ERNIE and inserts `</think>` after 10 decode-time thinking tokens.
- `thinking_budget` + `think_stop_sentence` (`thinking_budget=10`, `think_stop_sentence="思考已结束,开始回复"`): the stop sentence is emitted, followed by `</think>`.
- `reasoning_max_tokens` only (`reasoning_max_tokens=10`): `</think>` is inserted after 10 thinking tokens.
- `reasoning_max_tokens=10`, `thinking_budget=10`, `think_stop_sentence=...`: the `</think>` insertion stays aligned at 10 for both settings.
Checklist
- Run `pre-commit` before commit.
- If the PR targets the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.