【QA TEST Don't merge】support eb5 #6944
Conversation
Thanks for your contribution!
Pull request overview
The PR title is "support eb5". Judging from the diff, the changes mainly adjust model loading/quantization (especially NVFP4 MoE) and the weight post-processing path, insert a large amount of debug logging, and make a breaking change to the layer-count logic in FDConfig.override_name_from_config().
Changes:
- Adjust modules_to_convert(): merge exclude patterns from multiple sources and try to adapt to different model prefix names.
- Modify NVFP4 MoE: change the gate/up weight loading-order switch, skip the blockscale swizzle/interleave step, and add a large amount of runtime logging.
- Add (or keep commented-out) debug logging in the weight transpose, Linear weight loading, and default loader paths, and force num_hidden_layers = 1.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 16 comments.
| File | Description |
|---|---|
| fastdeploy/model_executor/utils.py | Adds logging and branch handling (including an early return) to weight transpose / ckpt suffix rename. |
| fastdeploy/model_executor/model_loader/default_loader_v1.py | Adds commented-out parameter-printing debug code. |
| fastdeploy/model_executor/layers/utils.py | Extends the sources of module exclude rules and the prefix-adaptation logic for quantization conversion. |
| fastdeploy/model_executor/layers/quantization/nvfp4.py | Changes the NVFP4 MoE weight loading order and blockscale handling, and adds many info logs. |
| fastdeploy/model_executor/layers/moe/moe.py | Adds several commented-out debug logs. |
| fastdeploy/model_executor/layers/linear.py | Introduces info logs in multiple places (build / load / sharded-load paths). |
| fastdeploy/config.py | Comments out the original remove_tail_layer logic and forces num_hidden_layers = 1. |
```python
logger.info(f"weight_name:{weight_name}")
weight = getattr(layer, weight_name)
if not weight._is_initialized():
    logger.info("权重没初始化啊!")
    return
```
The logger.info calls added in process_weight_transpose fire for every weight during post-processing, and one message is in Chinese ("权重没初始化啊!", roughly "the weight is not initialized!"), which creates production log noise and internationalization problems. Recommend removing these info logs, or at least downgrading them to logger.debug with English messages.
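A minimal sketch of the suggested fix, assuming `logger` comes from Python's standard `logging` module; `layer` and `weight_name` are as in the excerpt above, and the helper name itself is hypothetical:

```python
import logging

logger = logging.getLogger("fastdeploy.model_executor")

def check_weight_initialized(layer, weight_name):
    """Return True if the weight is ready for transpose post-processing."""
    weight = getattr(layer, weight_name)
    if not weight._is_initialized():
        # debug level + English message instead of an info-level Chinese log
        logger.debug("Skip transpose: weight %s is not initialized", weight_name)
        return False
    return True
```

Debug-level records are dropped by default when the effective level is INFO, so the post-processing path stays quiet in production.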
```python
# if hasattr(self, "num_hidden_layers") and self.runner != "pooling":
#     if hasattr(self, "remove_tail_layer"):
#         if self.remove_tail_layer is True:
#             self.num_hidden_layers -= 1
#         elif isinstance(self.remove_tail_layer, int):
#             self.num_hidden_layers -= self.remove_tail_layer
self.num_hidden_layers = 1
```
Forcing num_hidden_layers = 1 in override_name_from_config affects the layer-count configuration of nearly every model (e.g. the number of decoder layers built, KV cache shapes), which makes this a globally breaking change; commenting out the entire original remove_tail_layer logic also changes existing behavior. Recommend restoring the original logic, adjusting the layer count only for the specific models/configurations that actually need it, and controlling it explicitly through a config option.
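The original behavior can be restored directly from the commented-out block; a standalone sketch follows (the config object and attribute names mirror the excerpt, but this is illustrative, not the actual FDConfig method):

```python
def apply_remove_tail_layer(cfg):
    # Restores the original remove_tail_layer semantics instead of
    # unconditionally forcing num_hidden_layers = 1. The `is True` check
    # must come before the isinstance(int) check, since bool is an int.
    if hasattr(cfg, "num_hidden_layers") and cfg.runner != "pooling":
        if hasattr(cfg, "remove_tail_layer"):
            if cfg.remove_tail_layer is True:
                cfg.num_hidden_layers -= 1
            elif isinstance(cfg.remove_tail_layer, int):
                cfg.num_hidden_layers -= cfg.remove_tail_layer
```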
```diff
 @property
 def load_up_proj_weight_first(self) -> bool:
     # FlashInfer CUTLASS kernel assumes [Up, Gate] Proj as W13
-    return True
+    # defaults to True for now
+    return False
```
Changing NVFP4's load_up_proj_weight_first from True to False directly changes the load order of the two MoE gate/up shards (moe.py computes param_shard_offset from it), which can easily swap the gate/up weights and break accuracy. If this is only to support a specific checkpoint/model, switch it via config or when that format is detected, rather than changing the global behavior for all NVFP4.
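One way to scope the change, sketched with a hypothetical constructor flag (the real wiring would read it from the quant or model config rather than a constructor argument):

```python
class NVFP4MoEMethodSketch:
    """Illustrative stand-in for the NVFP4 MoE quant method."""

    def __init__(self, up_proj_first: bool = True):
        # Default keeps the FlashInfer CUTLASS [Up, Gate] order; only
        # checkpoints that store [Gate, Up] opt out via configuration.
        self._up_proj_first = up_proj_first

    @property
    def load_up_proj_weight_first(self) -> bool:
        return self._up_proj_first
```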
```python
return loaded_weight_name
fd_suffix_map = {}
```
In rename_offline_ckpt_suffix_to_fd_suffix, an fd_suffix_map is already defined in the outer scope (marked noqa); re-creating a variable of the same name inside fn shadows the outer one and adds redundancy, hurting readability. Recommend removing the unused outer fd_suffix_map, or keeping the outer one and only updating it by reference inside fn, avoiding the duplicate definition.
```python
# if hasattr(self, "num_hidden_layers") and self.runner != "pooling":
#     if hasattr(self, "remove_tail_layer"):
#         if self.remove_tail_layer is True:
#             self.num_hidden_layers -= 1
#         elif isinstance(self.remove_tail_layer, int):
#             self.num_hidden_layers -= self.remove_tail_layer
self.num_hidden_layers = 1
```
The PR title does not follow the repository's "[TAG]Title" convention (e.g. [BugFix] ...), and the description is still mostly template placeholders (Motivation/Modifications/Tests unfilled). To avoid difficulty with later merging and tracing, recommend adding the PR title tag and describing the motivation, impact scope, and verification method in the description.
```python
logger.info(f"up_gate_proj_input_scale:{up_gate_proj_input_scale}")
logger.info(f"up_gate_proj_weight_scale_2:{up_gate_proj_weight_scale_2}")
logger.info(f"down_proj_input_scale:{down_proj_input_scale}")
logger.info(f"down_proj_weight_scale_2:{layer.down_proj_weight_scale_2}")
```
The logger.info calls added at the end of process_weights_after_loading() print scale values for every MoE layer after weight loading; this is noisy and may leak internal numeric details, and since the tensors can be large, formatting them also has a cost. Recommend removing them, or downgrading to debug behind a controllable switch.
```python
logger.info(f"self.quant_method:{self.quant_method}")
self.quant_method.process_loaded_weights(self, weight_tensor)
```
The logger.info added in load_weight() fires on every Linear weight load; with large models / many layers it produces a large volume of logs and slows loading. Recommend deleting this info log, or changing it to logger.debug behind a configurable switch.
```python
)
self.hidden_size = fd_config.model_config.hidden_size
logger.info(f"prefix:{prefix}")
```
The logger.info added in ReplicatedLinear.__init__ prints the prefix for every linear layer at model build time; the log volume is large and of limited help for troubleshooting. Recommend removing it or downgrading to debug, enabled via a switch only when debugging.
Suggested change:
```diff
-logger.info(f"prefix:{prefix}")
+logger.debug(f"prefix:{prefix}")
```
```python
    "gate",
], f"loaded_shard_id must be one of ['qkv', 'gate'], but got {loaded_shard_id}"

logger.info(f"loaded_shard_id:{loaded_shard_id}")
```
The logger.info on loaded_shard_id in weight_loader() fires at shard granularity (especially in sharded-loading / TP scenarios) and easily floods the logs. Recommend downgrading to debug or removing it.
Suggested change:
```diff
-logger.info(f"loaded_shard_id:{loaded_shard_id}")
+logger.debug(f"loaded_shard_id:{loaded_shard_id}")
```
AI CI Agent Test: This is a test comment from AI CI Agent.
Request Changes - P0/P1 issue list:

P0 - critical issues:

P1 - important issues:
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
```python
    default_initializer=paddle.nn.initializer.Constant(0),
    is_bias=False,
)
logger.info(f"weight_tmp:{weight_tmp}")
```
logger.info(f"weight_tmp:{weight_tmp}") prints the Parameter object (potentially including device/shape details); called frequently during loading, it noticeably slows things down and pollutes the logs. Recommend deleting this info log, or changing it to debug and printing only summary information such as shape/dtype.
Suggested change:
```diff
-logger.info(f"weight_tmp:{weight_tmp}")
+logger.debug(
+    "Created temporary weight parameter for %s with shape=%s, dtype=%s",
+    weight_name,
+    tuple(weight_tmp.shape),
+    str(weight_tmp.dtype),
+)
```
```python
def fn(loaded_weight_name, is_moe):
    if fd_config.quant_config is None or fd_config.quant_config.is_checkpoint_bf16:
        return loaded_weight_name
    fd_suffix_map = {}
    # Can be extended to other offline quantization suffixes if needed.
```
Re-assigning fd_suffix_map = {} inside fn shadows the outer variable of the same name and is redundant code (it is overwritten in the branches below anyway). Recommend deleting this line or reusing the outer map directly, avoiding the unnecessary variable reset and readability burden.
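A sketch of the de-shadowed structure: define the map once and let `fn` close over it (the factory name and suffix keys below are hypothetical, loosely following the excerpt):

```python
def make_suffix_renamer(quant_config, fd_suffix_map):
    # fn closes over the single outer fd_suffix_map; no same-named local.
    def fn(loaded_weight_name, is_moe):
        if quant_config is None or quant_config.is_checkpoint_bf16:
            return loaded_weight_name
        for ckpt_suffix, fd_suffix in fd_suffix_map.items():
            if loaded_weight_name.endswith(ckpt_suffix):
                return loaded_weight_name[: -len(ckpt_suffix)] + fd_suffix
        return loaded_weight_name
    return fn
```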
```python
for name, weight_scale in [
    ("up_gate", layer.up_gate_proj_weight_scale),
    ("down", layer.down_proj_weight_scale),
]:
    assert weight_scale.shape[2] % 16 == 0, f"Expected {name}_weight_scale.dim(2) to be divisible by 16"
    assert (
        weight_scale.dtype == paddle.float8_e4m3fn
    ), f"{name} Weight Blockscale must be represented as FP8-E4M3"

    if weight_scale.shape[2] % 4 != 0:
        logger.warning(
            "NVFP4 %s_weight_scale K' not multiple of 4: shape=%s, group_size=%s",
```
The hard assertions on weight_scale dtype / dimension divisibility (previously guaranteeing the preconditions of the subsequent scale interleave/swizzle) have been removed here; execution now continues even when the shape does not meet the requirements, which may lead to out-of-bounds reads or numerical errors in the FlashInfer kernel. Recommend keeping the necessary assertions / explicit errors (at least ensure K' is divisible by 4 and dtype is float8_e4m3fn), instead of continuing after only a warning.
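A sketch of explicit precondition checks (ValueError instead of assert, which disappears under `python -O`); the shape/dtype requirements come from the removed assertions, while the function itself is hypothetical and takes the dtype as a plain string to stay framework-agnostic:

```python
def validate_blockscale(name, shape, dtype_name):
    # Preserve the preconditions the interleave/swizzle step relies on.
    if shape[2] % 16 != 0:
        raise ValueError(
            f"Expected {name}_weight_scale.dim(2) divisible by 16, got {shape[2]}"
        )
    if dtype_name != "float8_e4m3fn":
        raise ValueError(
            f"{name} weight blockscale must be FP8-E4M3, got {dtype_name}"
        )
```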
```python
logger.info(f"loaded_shard_id:{loaded_shard_id}")
if loaded_shard_id == "qkv":
```
weight_loader prints loaded_shard_id at info level on a hot path; it is called at high frequency during sharded weight loading, bloating the logs and slowing loading. Recommend deleting it or downgrading to debug, recorded only when a troubleshooting switch is enabled.
```python
param.tensor_track.mark(start=param_shard_offset, end=param_shard_offset + param_shard_size)

param = slice_fn(param, output_dim, start=param_shard_offset, end=param_shard_offset + param_shard_size)
logger.info(f"loaded_weight.shape:{loaded_weight.shape}")
```
The logger.info(loaded_weight.shape) added in qkv_weight_loader sits on a loop / sharded-loading path, producing a large volume of logs and hurting load performance. Recommend deleting the log, or changing it to debug and emitting the necessary information only on the failure branch (before the assert fails).
Suggested change:
```diff
-logger.info(f"loaded_weight.shape:{loaded_weight.shape}")
+if param.shape != loaded_weight.shape:
+    logger.debug(
+        "Shape mismatch before loading qkv weight, param.shape=%s, loaded_weight.shape=%s",
+        param.shape,
+        loaded_weight.shape,
+    )
```
```python
# up_gate_proj_blockscale_swizzled = _process_scale_interleaved(layer.up_gate_proj_weight_scale)
up_gate_proj_blockscale_swizzled = layer.up_gate_proj_weight_scale
create_parameter_and_copy(
    layer, name="up_gate_proj_blockscale_swizzled", weight=up_gate_proj_blockscale_swizzled
)
free_tensor(layer.up_gate_proj_weight_scale)
layer.up_gate_proj_weight_scale = None

# down_proj_blockscale_swizzled = _process_scale_interleaved(layer.down_proj_weight_scale)
down_proj_blockscale_swizzled = layer.down_proj_weight_scale
create_parameter_and_copy(layer, name="down_proj_blockscale_swizzled", weight=down_proj_blockscale_swizzled)
free_tensor(layer.down_proj_weight_scale)
layer.down_proj_weight_scale = None
```
The up_gate/down blockscale_swizzled now reuse the raw weight_scale directly (with _process_scale_interleaved commented out), but the downstream flashinfer_cutlass_fused_moe still consumes a "swizzled blockscale" (passed into quant_scales as view(int32)). If the kernel expects the interleaved/swizzled layout, this will produce wrong results. Recommend restoring _process_scale_interleaved, or explicitly implementing swizzle logic aligned with the kernel, and documenting the reason in comments/docs.
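The minimal fix is to re-enable the swizzle step; sketched below with the real helper replaced by a pass-through placeholder (the actual `_process_scale_interleaved` in FastDeploy performs the FlashInfer-specific reordering, and the wrapper function name here is hypothetical):

```python
def _process_scale_interleaved(scale):
    # Placeholder: the real FastDeploy helper reorders blockscales into the
    # interleaved/swizzled layout the FlashInfer CUTLASS kernel consumes.
    return scale

def swizzle_blockscales(layer, create_parameter_and_copy, free_tensor):
    # Restore the swizzle instead of passing the raw weight_scale through.
    swizzled = _process_scale_interleaved(layer.up_gate_proj_weight_scale)
    create_parameter_and_copy(layer, name="up_gate_proj_blockscale_swizzled", weight=swizzled)
    free_tensor(layer.up_gate_proj_weight_scale)
    layer.up_gate_proj_weight_scale = None

    swizzled = _process_scale_interleaved(layer.down_proj_weight_scale)
    create_parameter_and_copy(layer, name="down_proj_blockscale_swizzled", weight=swizzled)
    free_tensor(layer.down_proj_weight_scale)
    layer.down_proj_weight_scale = None
```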
```python
logger.info(f"up_gate_proj_input_scale:{layer.up_gate_proj_input_scale_quant}")
logger.info(f"g1_alphas:{layer.g1_alphas}")
logger.info(
    f"layer.up_gate_proj_blockscale_swizzled:{layer.up_gate_proj_blockscale_swizzled.view(paddle.float8_e4m3fn)}"
)
logger.info(f"down_proj_input_scale_quant:{layer.down_proj_input_scale_quant}")
logger.info(
    f"layer.down_proj_blockscale_swizzled:{layer.down_proj_blockscale_swizzled.view(paddle.float8_e4m3fn)}"
)
logger.info(f"g2_alphas:{layer.g2_alphas}")
```
This uses logger.info on the MoE apply hot path to print the contents of scales / weight blockscales (including view(float8) tensor values), which severely degrades performance and can flood the logs. Recommend deleting these info logs, or switching to debug and printing only shape/dtype/statistics (max/min), disabled by default.
Suggested change:
```diff
-logger.info(f"up_gate_proj_input_scale:{layer.up_gate_proj_input_scale_quant}")
-logger.info(f"g1_alphas:{layer.g1_alphas}")
-logger.info(
-    f"layer.up_gate_proj_blockscale_swizzled:{layer.up_gate_proj_blockscale_swizzled.view(paddle.float8_e4m3fn)}"
-)
-logger.info(f"down_proj_input_scale_quant:{layer.down_proj_input_scale_quant}")
-logger.info(
-    f"layer.down_proj_blockscale_swizzled:{layer.down_proj_blockscale_swizzled.view(paddle.float8_e4m3fn)}"
-)
-logger.info(f"g2_alphas:{layer.g2_alphas}")
+def _log_tensor_stats(name, tensor):
+    # Debug-only tensor stats to avoid logging full tensor contents in hot path
+    try:
+        t_min = float(paddle.min(tensor))
+        t_max = float(paddle.max(tensor))
+    except Exception:
+        t_min, t_max = None, None
+    logger.debug(
+        "MoE quant tensor stats - %s: shape=%s, dtype=%s, min=%s, max=%s",
+        name,
+        list(tensor.shape),
+        str(tensor.dtype),
+        t_min,
+        t_max,
+    )
+
+_log_tensor_stats("up_gate_proj_input_scale_quant", layer.up_gate_proj_input_scale_quant)
+_log_tensor_stats("g1_alphas", layer.g1_alphas)
+_log_tensor_stats("up_gate_proj_blockscale_swizzled", layer.up_gate_proj_blockscale_swizzled)
+_log_tensor_stats("down_proj_input_scale_quant", layer.down_proj_input_scale_quant)
+_log_tensor_stats("down_proj_blockscale_swizzled", layer.down_proj_blockscale_swizzled)
+_log_tensor_stats("g2_alphas", layer.g2_alphas)
```
```python
# logger.info(f"param:{param}")
output_size = param[expert_id - self.expert_id_offset].shape[SHARD_ID_TO_SHARDED_DIM["gate"]]
shard_offsets = [
    # (shard_id, shard_offset, shard_size)
    ("gate", 0, output_size // 2 * self.tp_size),
    ("up", output_size // 2 * self.tp_size, output_size // 2 * self.tp_size),
]

# logger.info(f"shard_offsets是啥:{shard_offsets}")
```
Several commented-out logger.info debug leftovers were added in this function. Recommend removing this commented-out code before merging to preserve readability; if debugging is needed, use a controlled debug switch.
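For reference, the gate/up shard-offset math quoted above can be checked in isolation (hypothetical standalone function; in FastDeploy the values come from the expert parameter shape and tp_size):

```python
def gate_up_shard_offsets(output_size, tp_size):
    # The fused up_gate projection is split in half: gate occupies the first
    # half of the full (un-sharded) output dim, up the second half.
    half = output_size // 2 * tp_size
    return [
        ("gate", 0, half),  # (shard_id, shard_offset, shard_size)
        ("up", half, half),
    ]
```

Note that this split order is exactly what load_up_proj_weight_first toggles, which is why flipping that flag silently swaps the two shards.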
Details: CI Failure Analysis

Now generating the full analysis report:

🔍 PR #6944 CI Execution Analysis

📋 PR basic info

🚦 CI check overview: ✅ 1 passed | ❌ 1 failed | ⏳ 1 pending

Failed jobs

Passed/pending jobs

❌ Failure Cause Analysis

Trigger Jenkins for PR (Exit Code: 1)

Error summary: Jenkins test job
Specific error:
Detailed execution trace:
Jenkins job link: https://cicd.metax-tech.com/job/paddle_fastdeploy_metax_smoketest/5251/console

🔬 Root-Cause Analysis
💡 Fix Suggestions

1. First priority: resolve the branch conflict

```shell
# 1. Sync the latest code from the develop branch
git checkout liucong-eb5
git fetch origin develop
git rebase origin/develop

# 2. Commit after resolving conflicts
git add .
git rebase --continue
git push origin liucong-eb5 --force
```

2. Check the detailed Jenkins test log

Visit the following link to see exactly which test case failed:

3. Locate the problem from the Jenkins log

Search the Jenkins log for the following keywords:

4. Reproduce the test locally

```shell
# Try running the smoketest locally to verify functionality
# For the exact command, see the Jenkins job configuration or project docs
```

5. Sign the CLA

```shell
# CLA signing status is PENDING; it needs to be signed
# Visit: https://cla-assistant.io/PaddlePaddle/FastDeploy?pullRequest=6944
```
| Action | Command / Link |
|---|---|
| View conflict details | `git checkout develop && git pull && git checkout liucong-eb5 && git merge develop` |
| View Jenkins log | Console Output |
| Re-trigger CI | Post a new comment `/retest` on the PR, or push a new commit |
| Sign the CLA | CLA Assistant |
Analysis time: 2026-03-20

Should I post this analysis report to the comment thread of PR #6944?
Details: CI Failure Analysis

🔍 PR #6944 CI Execution Analysis

🚦 CI check overview: ✅ 1 passed | ❌ 1 failed | ⏳ 1 running

❌ Failure Cause Analysis

Trigger Jenkins for PR (Exit Code: 1)

Error summary: Jenkins job build failed
Specific error:
Root-cause analysis:

Fix suggestions:

📌 Common failure handling

Analysis time: 2026-03-20
AI CI Agent Test - Suggestion Demo: suggested fix / code suggestion / full suggestion
Motivation
Modifications
Usage or Command
Accuracy Tests
Checklist
- Add at least one tag in the PR title, chosen from: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- Run `pre-commit` before commit.
- For a PR targeting the `release` branch, make sure the PR has been submitted to the `develop` branch first, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.