[Optimization] Optimize CPU utilization #6950
luukunn wants to merge 2 commits into PaddlePaddle:develop
Thanks for your contribution!
Pull request overview
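This PR aims to lower server-side CPU overhead when handling streaming and non-streaming output by reducing temporary object creation and repeated reflection checks during decoding. It touches incremental decoding in the input processors and the OpenAI response-processing flow.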
Changes:
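- In `DataProcessor`/`Ernie4_5Processor.ids2tokens`, the token history list is now extended in place, avoiding the O(n) temporary list that `previous + token_id` allocated at every step.
- In `ChatResponseProcessor`, whether `process_response_dict` is a coroutine function is now cached, avoiding a repeated `inspect.iscoroutinefunction(...)` call inside the loop.
- Several internal state lookups are cached in local variables (`status = self.decode_status[task_id]`) to reduce dictionary/index access overhead.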
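The first and third bullets can be illustrated with a toy decoder. This is a minimal sketch, not the FastDeploy code: `ToyIncrementalDecoder`, `VOCAB`, and the shape of `decode_status` are hypothetical, and the toy skips the byte-level merging a real tokenizer needs. It only shows the pattern: look the per-task state up once into a local, then `extend` the history in place so each step costs time proportional to the new tokens rather than to the whole history.

```python
from typing import Dict, List

# Hypothetical toy vocabulary standing in for a real tokenizer.
VOCAB = {0: "Hel", 1: "lo", 2: ",", 3: " world"}


class ToyIncrementalDecoder:
    """Sketch of per-task incremental decoding state (names are illustrative)."""

    def __init__(self) -> None:
        # task_id -> accumulated token ids for that request
        self.decode_status: Dict[str, List[int]] = {}

    def ids2tokens(self, token_ids: List[int], task_id: str) -> str:
        # Look the state up once and keep it in a local variable.
        history = self.decode_status.setdefault(task_id, [])
        # Old pattern: `history + token_ids` built a fresh list (an O(n) copy)
        # on every step. New pattern: extend the existing list in place.
        history.extend(token_ids)
        # Return only the text for the newly generated tokens.
        return "".join(VOCAB[t] for t in token_ids)
```

Because the same list object is reused across steps, the per-step allocation no longer grows with the length of the generated sequence.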
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
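| fastdeploy/input/text_processor.py | Incremental decoding path accumulates tokens in place, reducing temporary list allocations |
| fastdeploy/input/ernie4_5_processor.py | Same in-place token accumulation applied to this processor's incremental decoding path |
| fastdeploy/entrypoints/openai/response_processors.py | Caches the coroutine check to cut reflection overhead inside the loop (and, in doing so, exposes a synchronous-branch bug that needs fixing) |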
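The coroutine-check caching in `response_processors.py` follows a common pattern: run `inspect.iscoroutinefunction` once at construction time and branch on a stored boolean inside the hot loop. The sketch below assumes hypothetical `ToyResponseProcessor`, `SyncProcessor`, and `AsyncProcessor` classes; only the attribute name `_is_async_processor` comes from the PR. It also assumes the processor's method is not swapped out after construction, which is what makes caching safe.

```python
import asyncio
import inspect


class ToyResponseProcessor:
    """Hypothetical sketch: decide sync vs. async once, not per loop iteration."""

    def __init__(self, data_processor) -> None:
        self.data_processor = data_processor
        # Reflection runs once here instead of once per response.
        self._is_async_processor = inspect.iscoroutinefunction(
            data_processor.process_response_dict
        )

    async def process(self, responses):
        results = []
        for response in responses:
            if self._is_async_processor:
                results.append(await self.data_processor.process_response_dict(response))
            else:
                results.append(self.data_processor.process_response_dict(response))
        return results


# Illustrative processors, one synchronous and one asynchronous.
class SyncProcessor:
    def process_response_dict(self, response):
        return response.upper()


class AsyncProcessor:
    async def process_response_dict(self, response):
        return response + "!"
```

`inspect.iscoroutinefunction` unwraps bound methods, so checking `data_processor.process_response_dict` directly works for instance methods.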
```diff
 for part in self._multipart_buffer:
     if part["decode_type"] == 0:
-        if inspect.iscoroutinefunction(self.data_processor.process_response_dict):
+        if self._is_async_processor:
             await self.data_processor.process_response_dict(
                 response_dict=part["request_output"],
```
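After switching to `self._is_async_processor` here, the synchronous branch further down in the same block (the corresponding `else`) still calls `process_response_dict` with the outer `request_output`/`stream` variables instead of the current `part["request_output"]` (and `stream` should be fixed to `False`). In the non-streaming multipart (text+image+text) case, text decoding therefore runs on the wrong input. Suggest changing the synchronous branch to also call `process_response_dict(response_dict=part["request_output"], stream=False, ...)` for each part, consistent with the async path.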
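The suggested fix can be sketched as a loop in which both branches pass the same per-part arguments. This is an illustrative stand-alone function, not the actual `ChatResponseProcessor` method: `decode_text_parts` and `RecordingProcessor` are hypothetical, the extra keyword arguments elided as `...` in the suggestion are omitted, and `decode_type == 0` is assumed to mark text parts, as the diff excerpt suggests.

```python
import asyncio
from typing import Any, Dict, List


async def decode_text_parts(data_processor, multipart_buffer: List[Dict[str, Any]], is_async: bool) -> List[Any]:
    """Decode text parts of a non-streaming multipart response (sketch)."""
    decoded = []
    for part in multipart_buffer:
        if part["decode_type"] != 0:  # assumed: 0 marks a text part
            continue
        if is_async:
            out = await data_processor.process_response_dict(
                response_dict=part["request_output"], stream=False
            )
        else:
            # Same inputs as the async path: this part's output, stream=False,
            # never the outer request_output/stream variables.
            out = data_processor.process_response_dict(
                response_dict=part["request_output"], stream=False
            )
        decoded.append(out)
    return decoded


class RecordingProcessor:
    """Hypothetical synchronous processor that records what it was called with."""

    def __init__(self) -> None:
        self.calls = []

    def process_response_dict(self, response_dict, stream):
        self.calls.append((response_dict["text"], stream))
        return response_dict["text"]
```

With a text+image+text buffer, the synchronous path now decodes each text part with its own `request_output` and `stream=False`, matching the async path.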
Codecov Report ❌ Patch coverage is
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             develop    #6950   +/-   ##
==========================================
  Coverage           ?   73.73%
==========================================
  Files              ?      399
  Lines              ?    55624
  Branches           ?     8766
==========================================
  Hits               ?    41017
  Misses             ?    11707
  Partials           ?     2900
```
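The reported coverage can be reproduced from the table. This assumes Codecov's formula `hits / (hits + misses + partials)` and its default rounding (`round: down`, `precision: 2`). Under those assumptions, 41017 / 55624 ≈ 73.7398%, which truncates to 73.73%. `codecov_percent` is a hypothetical helper. It uses integer floor division to avoid float rounding surprises.

```python
def codecov_percent(hits: int, misses: int, partials: int, precision: int = 2) -> float:
    """Coverage percentage, rounded down to `precision` decimals (assumed Codecov defaults)."""
    total = hits + misses + partials  # partially covered lines count as not covered
    scale = 10 ** precision
    # Floor in integer arithmetic, then scale back to a percentage.
    return (hits * 100 * scale // total) / scale


print(codecov_percent(41017, 11707, 2900))
```

The three counts also sum to the reported `Lines` value: 41017 + 11707 + 2900 = 55624.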
Flags with carried forward coverage won't be shown.
Motivation
Modifications
Usage or Command
Accuracy Tests
Checklist
- [[FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- `pre-commit` before commit.
- `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.