Nvfp4 decode by mpgemm · Pull Request #6955 · PaddlePaddle/FastDeploy

mpgemm · 2026-03-20T09:57:12Z

Motivation

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)


Modifications

Usage or Command

Accuracy Tests

Checklist

Add at least one tag to the PR title.
- Tag list: [[FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- You can add new tags based on the PR content, but their meaning must be clear.
Format your code and run pre-commit before committing.
Add unit tests. If there are none, explain why in this PR.
Provide accuracy results.
If this PR targets a release branch, make sure it has first been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot · 2026-03-20T09:57:17Z

Thanks for your contribution!

CLAassistant · 2026-03-20T09:57:19Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

lizexu123 · 2026-03-20T10:31:27Z

fastdeploy/worker/gpu_model_runner.py

            4 if not self.speculative_decoding else (self.speculative_config.num_speculative_tokens + 1) * 4
        )
        self.infer_seed_increment = paddle.full(
            shape=[self.scheduler_config.max_num_seqs, 1], fill_value=self.increment_value, dtype="int64"


Why was this changed?
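
For context, the quoted lines compute a seed increment and broadcast it into a `[max_num_seqs, 1]` tensor. Below is a minimal plain-Python sketch of that arithmetic, assuming the truncated expression above is what ends up in `self.increment_value`. Both function names are hypothetical, and nested lists stand in for `paddle.full`.

```python
def seed_increment(speculative_decoding: bool, num_speculative_tokens: int = 0) -> int:
    """Mirror the quoted expression for the seed increment value."""
    if not speculative_decoding:
        return 4
    # With speculative decoding, the quoted code scales by (draft tokens + 1).
    return (num_speculative_tokens + 1) * 4


def make_infer_seed_increment(max_num_seqs: int, increment: int) -> list:
    """Stand-in for paddle.full(shape=[max_num_seqs, 1], fill_value=increment)."""
    return [[increment] for _ in range(max_num_seqs)]
```

With three speculative tokens, for example, the increment would be `(3 + 1) * 4`, so any change to this expression changes how far each sequence's seed advances.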

lizexu123 · 2026-03-20T10:31:56Z

fastdeploy/model_executor/layers/quantization/nvfp4.py

+                return scale.max(axis=1).values.cast("float32")
+            raise ValueError(f"{name} rank not supported: shape={scale.shape}")
+
+        w1_alpha = _to_expert_scale_vec(layer.g1_alphas, "g1_alphas")


Change this to use the interface from flashinfer_cutedsl_moe.py.
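
The quoted helper appears to collapse a scale tensor to one value per expert and to reject unsupported ranks. A hedged plain-Python sketch of that idea follows. Nested lists stand in for tensors, the rank-1 pass-through branch is an assumption (only the reduction and the `ValueError` are visible in the diff), and `to_expert_scale_vec` is a hypothetical stand-in, not the PR's code.

```python
def to_expert_scale_vec(scale, name):
    """Return one float per expert from a rank-1 or rank-2 nested list."""
    if scale and all(isinstance(x, (int, float)) for x in scale):
        # Rank 1 (assumed branch): already one scale per expert.
        return [float(x) for x in scale]
    if scale and all(
        isinstance(row, list) and row and all(isinstance(x, (int, float)) for x in row)
        for row in scale
    ):
        # Rank 2: take the maximum along axis 1, as in the quoted diff.
        return [float(max(row)) for row in scale]
    # Anything else (empty input, rank 3 or higher) is rejected.
    raise ValueError(f"{name} rank not supported: scale={scale!r}")
```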

lizexu123 · 2026-03-20T10:32:17Z

fastdeploy/model_executor/layers/quantization/nvfp4.py

        gate: nn.Layer,
        topk_ids_hookfunc: Callable = None,
    ) -> paddle.Tensor:
        pass


Please fill in the flashinfer_cutlass implementation here; it can only run prefill.

lizexu123 · 2026-03-20T10:44:42Z

fastdeploy/model_executor/layers/quantization/nvfp4.py

+    def load_up_proj_weight_first(self) -> bool:
        # FlashInfer CUTLASS kernel assumes [Up, Gate] Proj as W13
+        # Currently defaults to True
+        return False


This must not be hard-coded to False.
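
One way to read this comment is that the W13 ordering should follow the active backend instead of a constant. A minimal sketch under that assumption is shown here. The backend string and both function signatures are hypothetical; only the `[Up, Gate]` expectation for FlashInfer CUTLASS comes from the quoted code comment.

```python
def load_up_proj_weight_first(backend: str) -> bool:
    """Return True when the kernel expects W13 laid out as [Up, Gate]."""
    # Per the quoted comment, FlashInfer CUTLASS assumes [Up, Gate].
    # The backend name string is a placeholder for illustration.
    return backend == "flashinfer-cutlass"


def fuse_w13(gate_rows: list, up_rows: list, up_first: bool) -> list:
    """Concatenate gate and up projection rows in the order the kernel expects."""
    return up_rows + gate_rows if up_first else gate_rows + up_rows
```

Deriving the flag from the backend keeps the loader and the kernel in agreement about which half of W13 is the up projection.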

mpgemm added 2 commits March 20, 2026 17:16

first commit

f2db042

nvfp4

f126507

mpgemm had a problem deploying to Metax_ci March 20, 2026 09:57 — with GitHub Actions Failure

paddle-bot bot added the contributor External developers label Mar 20, 2026

lizexu123 reviewed Mar 20, 2026

mpgemm wants to merge 2 commits into PaddlePaddle:develop from mpgemm:nvfp4-decode


3 participants
