
【Hackathon 9th No.39】Add unit tests for custom operator moe_expert_ffn_wint2 #6687

Draft
cloudforge1 wants to merge 4 commits into PaddlePaddle:develop from cloudforge1:task/039-unit-test-moe-expert-ffn-wint2

Conversation

@cloudforge1
Contributor

@cloudforge1 cloudforge1 commented Mar 5, 2026

Motivation

Add unit tests for custom operator moe_expert_ffn_wint2 to improve test coverage and prevent regressions.

Modifications

  • Added operator unit test file: tests/operators/test_moe_expert_ffn_wint2.py
  • Includes a decomposed reference implementation: winx_unzip for WINT2 dequantization (independently validated), followed by an explicit matmul → SwiGLU → matmul pipeline
  • Validates numerical correctness via np.testing.assert_allclose against reference
  • Covers edge cases: zero input, determinism, sparse experts, dtype variants
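
The SwiGLU step in the decomposed reference can be sketched as follows. This is a minimal illustration, not the PR's actual code; the function name `swiglu` and the convention that the fused up_gate projection is split into contiguous gate/up halves along the last axis are assumptions for illustration.

```python
import numpy as np

def swiglu(gate_up: np.ndarray) -> np.ndarray:
    # Split the fused up_gate projection output into gate and up halves
    # (assumed contiguous halves along the last axis), apply SiLU to the
    # gate half, and multiply elementwise with the up half.
    gate, up = np.split(gate_up, 2, axis=-1)
    silu_gate = gate / (1.0 + np.exp(-gate))  # SiLU(x) = x * sigmoid(x)
    return silu_gate * up
```

A zero gate activation yields a zero output regardless of the up half, which is one reason the test's zero-input edge case is a useful sanity check.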

Usage or Command

python -m pytest tests/operators/test_moe_expert_ffn_wint2.py -v

Accuracy Tests

Local verification (no GPU):

  • py_compile syntax check: passes
  • pre-commit (black/isort/flake8/ruff): passes

Tests call CUDA custom ops directly (SM80+ required). Full execution is validated by the CI run_tests_with_coverage job. I will request AI Studio access for on-device verification if needed.

Checklist

  • Add at least a tag in the PR title.
  • Format your code, run pre-commit before commit.
  • Add unit tests.
  • Provide accuracy results. N/A — unit test only.
  • If the current PR is submitting to the release branch, cherry-pick from develop. N/A — targeting develop.

@paddle-bot

paddle-bot bot commented Mar 5, 2026

Thanks for your contribution!

@codecov-commenter

codecov-commenter commented Mar 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload a report for BASE (develop@30f9f33).

Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #6687   +/-   ##
==========================================
  Coverage           ?   72.10%           
==========================================
  Files              ?      392           
  Lines              ?    53835           
  Branches           ?     8459           
==========================================
  Hits               ?    38817           
  Misses             ?    12246           
  Partials           ?     2772           
Flag | Coverage Δ
GPU  | 72.10% <ø> (?)

Flags with carried forward coverage won't be shown.

@cloudforge1 cloudforge1 force-pushed the task/039-unit-test-moe-expert-ffn-wint2 branch from 76f208e to a0dc725 on March 9, 2026 05:20
@cloudforge1 cloudforge1 force-pushed the task/039-unit-test-moe-expert-ffn-wint2 branch from a0dc725 to 61a4357 on March 9, 2026 05:42
Reference pipeline validates fused WINT2 MoE FFN against:
  winx_unzip(raw_weights) → matmul → SwiGLU → matmul
with np.testing.assert_allclose(rtol=5e-2, atol=5e-2).

Uses the independently-validated winx_unzip op for WINT2 dequant.
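
The decomposed reference pipeline described in this commit can be sketched roughly as below. This is a hedged illustration, not the test file's actual code: the function name `reference_moe_expert_ffn`, the shapes, and the injected `dequant` callable (standing in for the winx_unzip op, so the sketch stays framework-free) are all assumptions.

```python
import numpy as np

def reference_moe_expert_ffn(x, up_gate_w_q, down_w_q, dequant):
    # dequant stands in for the winx_unzip WINT2 dequantization op.
    up_gate_w = dequant(up_gate_w_q)    # assumed [hidden, 2 * inter]
    down_w = dequant(down_w_q)          # assumed [inter, hidden]
    h = x @ up_gate_w                   # first matmul
    gate, up = np.split(h, 2, axis=-1)  # SwiGLU activation
    h = (gate / (1.0 + np.exp(-gate))) * up
    return h @ down_w                   # second matmul

# Comparison against the fused op output would then use the commit's
# stated tolerances:
# np.testing.assert_allclose(fused_out, ref_out, rtol=5e-2, atol=5e-2)
```

The loose rtol/atol of 5e-2 reflects that WINT2 dequantization plus low-precision matmuls in the fused kernel accumulate error relative to a float reference.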
@cloudforge1 cloudforge1 marked this pull request as ready for review March 9, 2026 15:25
@cloudforge1 cloudforge1 marked this pull request as draft March 9, 2026 17:28
@cloudforge1 cloudforge1 marked this pull request as ready for review March 9, 2026 17:59
Root cause: reference _reference_moe_expert_ffn passed raw weights
to winx_unzip, but the fused op receives CUTLASS-rearranged weights.
Production applies perm=[0,3,1,4,2] rearrangement during loading.

Fix: pass up_gate_proj_weight/down_proj_weight (rearranged) instead
of _up_weight_raw/_down_weight_raw to winx_unzip in reference.
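
The layout mismatch this commit fixes can be illustrated with a toy axis permutation. The 5-axis shape below is invented purely for illustration; the only fact taken from the commit is that production applies perm=[0,3,1,4,2] to the packed weight during loading, so the reference must dequantize the rearranged tensor, not the raw one.

```python
import numpy as np

# Toy stand-in for the packed weight; the 5-axis shape is hypothetical.
raw = np.arange(2 * 3 * 4 * 5 * 6).reshape(2, 3, 4, 5, 6)

# The CUTLASS-layout rearrangement described in the commit message.
rearranged = np.transpose(raw, (0, 3, 1, 4, 2))
assert rearranged.shape == (2, 5, 3, 6, 4)

# Element positions move, so feeding `raw` where the fused op expects
# `rearranged` (or vice versa) dequantizes the wrong bits -- the root
# cause of the original reference mismatch.
```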
@EmmonsCurse
Collaborator

Hi @cloudforge1, thanks for contributing additional tests to FastDeploy.

We noticed that a large number of PR updates were pushed in a short period of time recently. Each push triggers the full CI pipeline. Since FastDeploy CI includes multi-platform and multi-device tests, these jobs consume shared CI resources.

Currently our CI resources are relatively limited. When many PR updates are triggered in a short time window, the CI queue time for all contributors increases significantly. Recently we observed queue times increase severalfold compared with normal, with some PRs taking several hours from submission to completion.

To help keep the CI system stable and fair for all contributors, we suggest the following:

  • Limit the frequency of PR updates, especially avoiding many pushes in a short period of time.
  • Run local validation before pushing whenever possible, so that CI is mainly used for final verification.
  • Batch multiple fixes into a single update rather than pushing after every small change.

These practices can help reduce unnecessary CI triggers and keep the CI queue running more smoothly for everyone. This does not affect your contribution itself — we just need to manage CI resource usage.

Due to the current CI resource pressure, we will temporarily cancel the PRs for now. Once the changes are validated locally and ready for a more stable submission, please feel free to rerun or submit an updated PR.

Thanks again for your contributions and understanding.

@cloudforge1 cloudforge1 marked this pull request as draft March 11, 2026 04:17
@cloudforge1
Contributor Author

@EmmonsCurse Batch-push protocol adopted: a single push per PR, with full local validation before pushing. All 14 subsequent submissions (#6730, #6771, #6881, #6941) follow this.

Separately, I have proposed CI pipeline optimizations upstream addressing the eight-workflows-per-push architecture.

