[PD Disaggregation] pd + cache_storage support vl model #6906
juncaipeng wants to merge 3 commits into PaddlePaddle:develop from
Conversation
Thanks for your contribution! |
Pull request overview
This PR makes PD Disaggregation + cache_storage generate consistent cache keys for VL (multimodal) models: the multimodal position/hash information is passed through during PD-disaggregated transfer, and the Decode side includes these extra keys in the block hash computation when writing back to storage.
Changes:
- `Request.to_dict()` now preserves `mm_positions`/`mm_hashes` in addition to `position_ids` in V1 scheduling mode, for use by the decode side.
- `PrefixCacheManager.write_cache_to_storage_decode()` now uses `get_block_hash_extra_keys()` when generating chained hash keys, so `mm_hashes` contribute to each block's hash input.
- Expanded the docstring of `write_cache_to_storage()` to describe the write-back flow.
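The chained block-hash scheme described above can be sketched as follows. This is a minimal, hypothetical illustration, not FastDeploy's actual implementation: the function names mirror the PR (`get_block_hash_extra_keys`), but the signatures, the `(offset, length)` shape of `mm_positions`, and the hashing details are assumptions.

```python
import hashlib
import pickle


def get_block_hash_extra_keys(mm_hashes, mm_positions, block_start, block_end):
    """Hypothetical sketch: return the hashes of multimodal items whose
    token span [offset, offset + length) overlaps [block_start, block_end)."""
    extra = []
    for mm_hash, (offset, length) in zip(mm_hashes, mm_positions):
        if offset < block_end and offset + length > block_start:
            extra.append(mm_hash)
    return tuple(extra)


def chained_block_hashes(token_ids, block_size, mm_hashes=(), mm_positions=()):
    """Each block's key = hash(parent key, block tokens, multimodal extra keys),
    so a change in any earlier block or overlapping image changes all later keys."""
    parent = b""
    keys = []
    for start in range(0, len(token_ids), block_size):
        block = tuple(token_ids[start:start + block_size])
        extra = get_block_hash_extra_keys(mm_hashes, mm_positions, start, start + block_size)
        digest = hashlib.sha256(pickle.dumps((parent, block, extra))).hexdigest()
        keys.append(digest)
        parent = digest.encode()
    return keys
```

Because the keys are chained, including `mm_hashes` on both the Prefill and Decode sides is what keeps the two sides' storage keys identical for the same multimodal request.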
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| fastdeploy/engine/request.py | Relaxes the multimodal_inputs whitelist for PD-disaggregated transfer so the decode side receives mm_positions/mm_hashes |
| fastdeploy/cache_manager/prefix_cache_manager.py | Adds multimodal extra keys to the decode-side storage write-back key generation, keeping it consistent with the RadixTree/Prefill side |
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## develop #6906 +/- ##
==========================================
Coverage ? 72.83%
==========================================
Files ? 399
Lines ? 55716
Branches ? 8776
==========================================
Hits ? 40578
Misses ? 12232
Partials ? 2906
Flags with carried forward coverage won't be shown.
Pull request overview
This PR makes PD Disaggregation + cache_storage generate consistent cache keys for multimodal/VL requests, so that multimodal content participates in prefix cache / storage cache hits and write-back.
Changes:
- In `PrefixCacheManager`'s storage prefetch and D-node write-back paths, introduces `get_block_hash_extra_keys()` to include multimodal hashes in block key computation.
- In `Request.to_dict()`'s V1 filtering logic, additionally passes through `mm_positions`/`mm_hashes` so the decode node can compute consistent cache keys.
- Updates the relevant prefix cache manager test case construction to fill in the `multimodal_inputs` field.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/cache_manager/test_prefix_cache_manager.py | Updates test request construction to accommodate the new multimodal fields |
| fastdeploy/engine/request.py | Adds mm_positions/mm_hashes to the multimodal_inputs filtering whitelist in V1 mode |
| fastdeploy/cache_manager/prefix_cache_manager.py | Adds multimodal extra keys to the key-generation chain for storage prefetch and decode write-back; expands the write-back function's docstring |
Comments suppressed due to low confidence (1)
fastdeploy/engine/request.py:470
- The PR description's Usage/Accuracy Tests sections are still placeholders ("-") and the checklist is unchecked, yet this change affects cache/storage key computation and data pass-through for multimodal requests in PD-disaggregated scenarios. Please add: 1) the configuration/environment variables and usage needed to enable this feature; 2) at least one reproducible verification step or accuracy/regression test result; 3) if unit tests are deliberately omitted, the reason why.
if isinstance(self.multimodal_inputs, dict):
# Optimize multimodal data transfer during PD separation:
# - V1 mode (ENABLE_V1_KVCACHE_SCHEDULER=1): position_ids, mm_positions and mm_hashes needed for decode nodes
# - V0 mode (ENABLE_V1_KVCACHE_SCHEDULER=0): Full field set required for compatibility
# This filtering significantly reduces serialized data size for large numpy arrays
allowed_keys = {"position_ids", "mm_positions", "mm_hashes"}
if not envs.ENABLE_V1_KVCACHE_SCHEDULER:
allowed_keys.update(["input_ids", "token_type_ids", "images", "image_type_ids", "grid_thw"])
data["multimodal_inputs"] = {
key: value for key, value in self.multimodal_inputs.items() if key in allowed_keys
}
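To illustrate the effect of the whitelist filtering above, here is a minimal standalone sketch. The field values are made up; only the key names and the filtering pattern come from the snippet.

```python
# Hypothetical request payload; strings stand in for the real numpy arrays.
multimodal_inputs = {
    "position_ids": [0, 1, 2, 3],
    "mm_positions": [(1, 2)],        # assumed (offset, length) shape
    "mm_hashes": ["9f2a"],
    "images": "large-array",         # dropped in V1 mode
    "grid_thw": "large-array",       # dropped in V1 mode
}

# V1 mode (ENABLE_V1_KVCACHE_SCHEDULER=1): keep only what the decode
# node needs to recompute consistent cache keys.
allowed_keys = {"position_ids", "mm_positions", "mm_hashes"}
filtered = {k: v for k, v in multimodal_inputs.items() if k in allowed_keys}
```

In V0 mode the whitelist is extended with the full field set (`input_ids`, `images`, etc.), so nothing is dropped; the savings come only from the V1 path, where the large image tensors never cross the PD transfer boundary.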
Motivation
pd + cache_storage support vl model
Modifications
Use get_block_hash_extra_keys to calculate cache keys
Usage or Command
Accuracy Tests
Checklist
- PR title tag (one of): [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- Run `pre-commit` before commit.
- For a `release` branch: make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.