[Hackathon 9th No.39] Add unit tests for custom operator moe_expert_ffn_wint2 #6687
cloudforge1 wants to merge 4 commits into PaddlePaddle:develop from
Thanks for your contribution!
Codecov Report: ✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## develop #6687 +/- ##
==========================================
Coverage ? 72.10%
==========================================
Files ? 392
Lines ? 53835
Branches ? 8459
==========================================
Hits ? 38817
Misses ? 12246
Partials ? 2772
Flags with carried forward coverage won't be shown.
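As a quick consistency check on the Codecov numbers: Codecov reports coverage as hits / (hits + misses + partials), so the figures in the report should add up as in this small sketch (the values are copied from the report; nothing else is assumed).

```python
# Figures copied from the Codecov report for this PR.
hits, misses, partials = 38817, 12246, 2772
lines = 53835

# Every tracked line is exactly one of: hit, missed, or partially covered.
assert hits + misses + partials == lines

# Partially covered lines count in the denominator but not the numerator.
coverage_pct = 100 * hits / (hits + misses + partials)
print(f"{coverage_pct:.2f}%")  # 72.10%
```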
Force-pushed 76f208e to a0dc725, then a0dc725 to 61a4357, then 61a4357 to 7ec07ba.
The reference pipeline validates the fused WINT2 MoE FFN against winx_unzip(raw_weights) → matmul → SwiGLU → matmul, compared with np.testing.assert_allclose(rtol=5e-2, atol=5e-2). WINT2 dequantization in the reference uses the independently validated winx_unzip op.
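A minimal NumPy sketch of the matmul → SwiGLU → matmul reference for a single expert, assuming the WINT2 weights have already been dequantized (in the PR that step is winx_unzip; here the weights are plain float arrays). The function names, shapes, and the gate/up split order inside SwiGLU are assumptions for illustration, not the test file's actual code.

```python
import numpy as np


def silu(x):
    # SiLU(x) = x * sigmoid(x)
    return x / (1.0 + np.exp(-x))


def swiglu(h):
    # Assumption: first half of the last axis is the gate, second half the "up"
    # projection. Check this against the fused op's actual convention.
    gate, up = np.split(h, 2, axis=-1)
    return silu(gate) * up


def reference_expert_ffn(x, up_gate_w, down_w):
    """Hypothetical single-expert reference: matmul -> SwiGLU -> matmul.

    In the PR, up_gate_w and down_w would come from winx_unzip (WINT2 dequant);
    here they are ordinary float arrays.
    """
    h = x @ up_gate_w   # [tokens, 2 * inter]
    h = swiglu(h)       # [tokens, inter]
    return h @ down_w   # [tokens, hidden]


rng = np.random.default_rng(0)
tokens, hidden, inter = 4, 8, 16
x = rng.standard_normal((tokens, hidden)).astype(np.float32)
up_gate_w = rng.standard_normal((hidden, 2 * inter)).astype(np.float32)
down_w = rng.standard_normal((inter, hidden)).astype(np.float32)

ref = reference_expert_ffn(x, up_gate_w, down_w)

# Hypothetical stand-in for the fused op's output: the same math in float64.
# assert_allclose passes when |actual - desired| <= atol + rtol * |desired| elementwise.
actual = reference_expert_ffn(
    x.astype(np.float64), up_gate_w.astype(np.float64), down_w.astype(np.float64)
)
np.testing.assert_allclose(actual, ref, rtol=5e-2, atol=5e-2)
```

The loose rtol/atol of 5e-2 leaves room for the quantization and low-precision accumulation error a real WINT2 kernel introduces; a float64 stand-in like the one here would pass far tighter tolerances.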
Root cause: the reference _reference_moe_expert_ffn passed raw weights to winx_unzip, but the fused op receives CUTLASS-rearranged weights; production applies a perm=[0,3,1,4,2] rearrangement during weight loading. Fix: in the reference, pass the rearranged up_gate_proj_weight/down_proj_weight to winx_unzip instead of _up_weight_raw/_down_weight_raw.
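To illustrate why the reference must dequantize the same buffer the fused op sees, here is a toy NumPy sketch of a perm=[0,3,1,4,2] axis rearrangement. The PR does not state how the packed weight is reshaped to 5-D, so the toy shape and the helper names rearrange/undo_rearrange are hypothetical.

```python
import numpy as np

# Axis permutation quoted in the PR discussion for the CUTLASS weight layout.
PERM = (0, 3, 1, 4, 2)


def rearrange(w5d):
    # Hypothetical helper: assumes the packed weight was already reshaped to 5-D.
    return np.ascontiguousarray(np.transpose(w5d, PERM))


def undo_rearrange(w5d_rearranged):
    # argsort of a permutation yields its inverse, here (0, 2, 4, 1, 3).
    inv = np.argsort(PERM)
    return np.ascontiguousarray(np.transpose(w5d_rearranged, inv))


# Toy tensor with distinct values so layout differences are visible.
raw = np.arange(2 * 3 * 4 * 5 * 6, dtype=np.int32).reshape(2, 3, 4, 5, 6)
rearranged = rearrange(raw)

print(rearranged.shape)                                 # (2, 5, 3, 6, 4)
print(np.array_equal(raw.ravel(), rearranged.ravel()))  # False: same values, different memory order
print(np.array_equal(undo_rearrange(rearranged), raw))  # True
```

A kernel that reads the flat buffer assuming the rearranged layout would mis-decode the raw buffer. That explains why feeding the raw weights to the reference produced a mismatch.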
Hi @cloudforge1, thanks for contributing additional tests to FastDeploy. We noticed that a large number of PR updates were pushed in a short period of time. Each push triggers the full CI pipeline, and because FastDeploy CI includes multi-platform and multi-device tests, these jobs consume shared CI resources, which are currently limited. When many PR updates arrive in a short window, CI queue time increases for all contributors: recently, queue times have been several times longer than normal, and some PRs took several hours from submission to completion. To help keep the CI system stable and fair for all contributors, we suggest the following:
These practices can help reduce unnecessary CI triggers and keep the CI queue running smoothly for everyone. This does not affect your contribution itself; we just need to manage CI resource usage. Due to the current CI resource pressure, we will temporarily cancel these PRs. Once the changes are validated locally and ready for a more stable submission, please feel free to rerun or submit an updated PR. Thanks again for your contributions and understanding.
@EmmonsCurse Batch-push protocol adopted: a single push per PR, after full local validation. All 14 subsequent submissions (#6730–#6771, #6881–#6941) follow this. Separately, I proposed upstream CI pipeline optimizations addressing the architecture that runs 8 workflows per push.
Motivation
Add unit tests for the custom operator moe_expert_ffn_wint2 to improve test coverage and prevent regressions.
Modifications
- tests/operators/test_moe_expert_ffn_wint2.py: new test file
- Reference uses winx_unzip for WINT2 dequant (independently validated) plus an explicit matmul → SwiGLU → matmul pipeline
- Outputs are checked with np.testing.assert_allclose against the reference
Usage or Command
Accuracy Tests
Local verification (no GPU):
- py_compile syntax check: passes
Tests call CUDA custom ops directly (SM80+ required), so full execution is validated by the CI run_tests_with_coverage job. I will request AI Studio access for on-device verification if needed.
Checklist
- Run pre-commit before commit.
- For the release branch, cherry-pick from develop. N/A: this PR targets develop.
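The Accuracy Tests section notes that the tests call CUDA custom ops directly and need SM80+. One way to encode that requirement is a unittest skip guard, sketched here under the assumption that Paddle is the runtime. The helper has_sm80_or_newer and the test class name are hypothetical, and the PR's actual test may gate execution differently.

```python
import unittest


def has_sm80_or_newer():
    """Return True only if Paddle is importable, built with CUDA, and the current GPU is SM80+."""
    try:
        import paddle
    except ImportError:
        return False
    if not paddle.is_compiled_with_cuda() or paddle.device.cuda.device_count() == 0:
        return False
    major, minor = paddle.device.cuda.get_device_capability()
    return (major, minor) >= (8, 0)


@unittest.skipUnless(has_sm80_or_newer(), "moe_expert_ffn_wint2 requires an SM80+ GPU")
class TestMoeExpertFfnWint2Guarded(unittest.TestCase):
    def test_runs_only_on_sm80_plus(self):
        # Hypothetical placeholder: a real test would call the fused op here.
        self.assertTrue(has_sm80_or_newer())
```

With this guard, a local run on a machine without a suitable GPU reports the class as skipped instead of failing, while the CI GPU job still executes it.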