
Support Flashinfer-cutedsl nvfp4 grouped masked gemm#6924

Open
mpgemm wants to merge 11 commits into PaddlePaddle:develop from mpgemm:develop

Conversation


@mpgemm mpgemm commented Mar 18, 2026

Motivation

Add support for flashinfer-cutedsl NVFP4 FusedMoE computation.

Modifications

1. Added /moe/flashinfer_cutedsl_moe.py, which wraps the flashinfer-cutedsl nvfp4 grouped GEMM to support the FFN computation.

2. Added ModelOptNvFp4FusedMoECuteDSL in /quantization/nvfp4.py to support NVFP4 FusedMoE computation.

3. Added test files tests/layer/test_cutedsl_moe.py and tests/layers/test_nvfp4_fusedmoe.py.
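For reviewers unfamiliar with the operator, here is a minimal NumPy sketch of what a grouped *masked* GEMM computes in an MoE FFN: each expert has its own weight matrix, and only the first `masked_m[e]` token rows of expert `e`'s input tile are valid. The function name, parameter names, and shapes below are illustrative assumptions, not the kernel's actual API.

```python
import numpy as np

def grouped_masked_gemm_ref(x, w, masked_m):
    """Reference semantics of a grouped masked GEMM.

    x: [E, M, K] per-expert input tiles (row-padded to M tokens)
    w: [E, K, N] per-expert weight matrices
    masked_m: [E] number of valid token rows for each expert
    Returns out: [E, M, N]; rows beyond masked_m[e] are left as zero.
    """
    E, M, K = x.shape
    N = w.shape[2]
    out = np.zeros((E, M, N), dtype=x.dtype)
    for e in range(E):
        m = masked_m[e]
        # Only the valid rows of expert e participate in the GEMM.
        out[e, :m] = x[e, :m] @ w[e]
    return out
```

The real kernel fuses this loop on-device and additionally handles NVFP4 block scaling; the sketch only pins down the masking semantics.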

Usage or Command

There are compatibility issues between Paddle, Flashinfer, and nvidia-cutlass-dsl; to import them you currently need to patch python3.10/site-packages/flashinfer and nvidia-dsl.
Three issues in total: one in nvidia-dsl and two in flashinfer.

1. In nvidia_cutlass_dsl/python_packages/cutlass/torch.py, change the torch.device annotation to the string "torch.device".

2. In flashinfer/utils.py, replace get_compute_capability with:

```python
@functools.cache
def get_compute_capability(device: torch.device) -> Tuple[int, int]:
    return torch.cuda.get_device_capability(device)
```

This drops the original body, which raised ValueError("device must be a cuda device") for non-cuda devices and returned torch.cuda.get_device_capability(device.index).
Note: if you run into device-related problems, replacing A.place with A.device resolves most of them.

3. In flashinfer/cute_dsl/blockscaled_gemm.py, first add import cuda.bindings.driver as cuda, then replace cutlass_torch.current_stream() with cuda.CUstream(torch.cuda.current_stream().stream_base.raw_stream).
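The A.place vs A.device note in step 2 can be captured in a small compatibility shim instead of editing each call site; `tensor_device` below is a hypothetical helper name, not part of either library.

```python
def tensor_device(t):
    """Return the torch-style device of t if it has one, otherwise
    fall back to Paddle's .place attribute (or None if neither exists)."""
    dev = getattr(t, "device", None)
    return dev if dev is not None else getattr(t, "place", None)
```

A helper like this lets the same code path accept both Paddle tensors (which expose .place) and torch-style tensors (which expose .device).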


Flashinfer-cutedsl nvfp4 grouped masked GEMM operator test: python tests/layer/test_cutedsl_moe.py

Decode accuracy test: python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tests/layers/test_nvfp4_fusedmoe.py TestFusedMoE.test_decode_correctness 2>&1

Prefill test: NVFP4_TEST_MODE=prefill NVFP4_TEST_ITERS=2 python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tests/layers/test_nvfp4_fusedmoe.py

Decode test: NVFP4_TEST_MODE=decode NVFP4_TEST_ITERS=2 python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tests/layers/test_nvfp4_fusedmoe.py

Accuracy Tests

Checklist

  • Add at least one tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code and run pre-commit before committing.
  • Add unit tests. Please state the reason in this PR if there are no unit tests.
  • Provide accuracy results.
  • If the current PR is submitted to the release branch, make sure it has already been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.


paddle-bot bot commented Mar 18, 2026

Thanks for your contribution!

@paddle-bot paddle-bot bot added the contributor External developers label Mar 18, 2026
Author

@mpgemm mpgemm left a comment


Supports flashinfer-cutedsl nvfp4 fusedmoe; prefill still has issues.

zhoutianzi666 previously approved these changes Mar 19, 2026
```python
    raise ValueError(f"Unsupported cute dtype {input.dtype}")


def flashinfer_cutedsl_moe_masked(
```
Collaborator


You wrote this function, but why isn't it used inside apply?


Labels

contributor External developers

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants