HIP: Refactor mma for RDNA and CDNA #17990

zhang-hui-yulo · 2025-12-13T09:38:37Z

Refactor mma.cuh for RDNA and CDNA, clean up the row-major and column-major matrix handling for future development such as FA, and add a dual matrix type for RDNA3.

CDNA is untested because I don't have a CDNA GPU. @JohannesGaessler, could you run a rough test on your MI GPU? Honestly, I will probably also need your coding help to fix any CDNA bugs, since I can't test there myself. Thank you.

Align tile of mfma in mmq.

JohannesGaessler

If you don't have MFMA hardware for development I would suggest that you simply don't touch the corresponding code for now.

JohannesGaessler · 2025-12-13T13:09:58Z

ggml/src/ggml-cuda/mma.cuh

+        DATA_LAYOUT_J_MAJOR           = 10, // Matrix C for CDNA and RDNA4, int and float matrix C for RDNA3.
+        DATA_LAYOUT_I_MAJOR_MIRRORED  = 20,
+        DATA_LAYOUT_J_MAJOR_MIRRORED  = 30,
+        DATA_LAYOUT_I_MAJOR_DUAL      = 40, // Matrix A&B for RDNA3.


Is there a reason why you're not using I_MAJOR_MIRRORED?

I checked I_MAJOR_MIRRORED: it uses ne = I * J / (WARP_SIZE/4), so it is meant for the Volta 8x8 GEMM. That is why I added I_MAJOR_DUAL to handle RDNA3. I don't think mixing Volta and RDNA3 code is a good choice.
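To make the element-count argument concrete, here is a minimal host-side sketch of how the per-thread element count `ne` could depend on the layout. Only the factor 4 for `DATA_LAYOUT_I_MAJOR_MIRRORED` comes from the formula quoted above. Everything else is an assumption for illustration: `DATA_LAYOUT_I_MAJOR = 0`, a 32-wide warp, the same factor for `DATA_LAYOUT_J_MAJOR_MIRRORED`, a factor of 2 for `DATA_LAYOUT_I_MAJOR_DUAL` (inferred from the name), and the helper names `copies_per_element` and `elements_per_thread`, which are hypothetical and not part of mma.cuh.

```cpp
#include <cassert>

// Assumption: 32 threads per warp/wave.
constexpr int WARP_SIZE = 32;

// Values 10..40 are taken from the diff; DATA_LAYOUT_I_MAJOR = 0 is an assumption.
enum data_layout {
    DATA_LAYOUT_I_MAJOR          =  0,
    DATA_LAYOUT_J_MAJOR          = 10,
    DATA_LAYOUT_I_MAJOR_MIRRORED = 20,
    DATA_LAYOUT_J_MAJOR_MIRRORED = 30,
    DATA_LAYOUT_I_MAJOR_DUAL     = 40,
};

// Hypothetical helper: how many threads hold a copy of each tile element.
constexpr int copies_per_element(data_layout dl) {
    return dl == DATA_LAYOUT_I_MAJOR_MIRRORED ? 4   // from ne = I * J / (WARP_SIZE/4)
         : dl == DATA_LAYOUT_J_MAJOR_MIRRORED ? 4   // assumption: same as I_MAJOR_MIRRORED
         : dl == DATA_LAYOUT_I_MAJOR_DUAL     ? 2   // assumption: each element held twice
         : 1;                                       // assumption: no replication
}

// Hypothetical helper: number of tile elements each thread holds.
constexpr int elements_per_thread(data_layout dl, int I, int J) {
    return I * J / (WARP_SIZE / copies_per_element(dl));
}

// Compile-time checks of the arithmetic.
static_assert(elements_per_thread(DATA_LAYOUT_I_MAJOR_MIRRORED,  8,  8) ==  8, "64 / (32/4)");
static_assert(elements_per_thread(DATA_LAYOUT_I_MAJOR_DUAL,     16, 16) == 16, "256 / (32/2)");
static_assert(elements_per_thread(DATA_LAYOUT_I_MAJOR,          16, 16) ==  8, "256 / 32");
```

Under these assumptions a mirrored tile and a dual tile differ only in the replication factor, which is the distinction the two enum values would encode.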

It's not about what is a good choice, it's about what is the least bad choice. For this PR it's fine to add an extra value to the enum but I will refactor this to instead use either I_MAJOR or I_MAJOR_MIRRORED at some later time.

Honestly, I_MAJOR is the worst choice for the RDNA3 A and B matrices. I_MAJOR fits only RDNA3 matrix C, not A and B. Otherwise the only way to tell A/B apart from C is by the tile shape, which is what the current code does.

It can be moved to I_MAJOR_MIRRORED if you think mixing Volta and RDNA3 is acceptable.
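The trade-off discussed in this thread (telling operands apart by shape versus by an explicit layout value) can be sketched as follows. This is a simplified illustration, not the code in mma.cuh: the `tile` struct here is a stripped-down stand-in, `DATA_LAYOUT_I_MAJOR = 0` is an assumption, and `is_dual_operand`, `tile_plain` and `tile_dual` are hypothetical names.

```cpp
#include <cassert>
#include <type_traits>

// Values 10..40 are taken from the diff; DATA_LAYOUT_I_MAJOR = 0 is an assumption.
enum data_layout {
    DATA_LAYOUT_I_MAJOR          =  0,
    DATA_LAYOUT_J_MAJOR          = 10,
    DATA_LAYOUT_I_MAJOR_MIRRORED = 20,
    DATA_LAYOUT_J_MAJOR_MIRRORED = 30,
    DATA_LAYOUT_I_MAJOR_DUAL     = 40,
};

// Simplified stand-in for a tile type: the layout is part of the type itself.
template <int I_, int J_, typename T, data_layout dl_ = DATA_LAYOUT_I_MAJOR>
struct tile {
    static constexpr int         I  = I_;
    static constexpr int         J  = J_;
    static constexpr data_layout dl = dl_;
};

// Hypothetical trait: decide from the layout tag, not from I and J.
template <typename Tile>
constexpr bool is_dual_operand() {
    return Tile::dl == DATA_LAYOUT_I_MAJOR_DUAL;
}

// Two tiles with identical shape and element type but different layouts.
using tile_plain = tile<16, 16, float>;
using tile_dual  = tile<16, 16, float, DATA_LAYOUT_I_MAJOR_DUAL>;

// Same shape, yet distinct types, so overloads can dispatch on the layout.
static_assert(!std::is_same<tile_plain, tile_dual>::value, "layout is part of the type");
static_assert(tile_plain::I == tile_dual::I && tile_plain::J == tile_dual::J, "same shape");
```

With an explicit tag, code that loads or multiplies tiles can be overloaded per layout even when the shapes coincide. Reusing an existing value such as I_MAJOR_MIRRORED would give the same dispatch mechanism with one enum value fewer.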

zhang-hui-yulo · 2025-12-14T03:07:42Z

Honestly, the refactor changes so much code that even keeping the old MFMA path would still need a full test on CDNA, so I think it's worth trying to make the new code correct first.

JohannesGaessler · 2025-12-14T16:37:45Z

If you want to get this PR merged in any reasonable time frame, you either need to fix MFMA yourself or you need to not touch it. I currently have other priorities and don't have the time to fix the MFMA part for you.

zhang-hui-yulo · 2025-12-15T02:19:51Z

> If you want to get this PR merged in any reasonable time frame, you either need to fix MFMA yourself or you need to not touch it. I currently have other priorities and don't have the time to fix the MFMA part for you.

I agree. I also don't want to touch the MFMA part: I've spent more than a month trying to acquire an MI308 without a good response, and I'm not sure I'll be able to get one.

Anyway, could you run a quick MUL_MAT test on your CDNA GPU so that I can decide how to move forward? Thank you.

However, even if I leave the MFMA path alone, this PR still modifies the MFMA code in mmq, so it will still need your help with testing. Thank you.

JohannesGaessler · 2025-12-15T09:32:48Z

test-backend-ops is failing on my MI100: log.txt

I'm willing to give you SSH access for development purposes but the machine with the MI100 would only be running during the daytime in Germany since it's in my living space and very loud.

zhang-hui-yulo · 2025-12-15T13:01:31Z

Thank you for the help. The inf values are a bad sign: they mean the wrong data is being loaded. Let me revert the CDNA part first and then wait a while for AMD's response to see whether I can get access to a CDNA3 GPU.
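For readers who want to reproduce this kind of diagnosis on their own buffers, here is a minimal sketch of the two checks involved: counting non-finite values in a result, and computing a normalized mean squared error against a reference. Both functions are hypothetical helpers written for illustration. They are not taken from test-backend-ops and make no claim about how that tool compares results.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: number of inf or NaN entries in a result buffer.
// A non-zero count on a MUL_MAT output usually points at reads of wrong data
// rather than at ordinary rounding differences.
inline std::size_t count_non_finite(const std::vector<float> & v) {
    std::size_t n = 0;
    for (float x : v) {
        if (!std::isfinite(x)) {
            ++n;
        }
    }
    return n;
}

// Hypothetical helper: normalized mean squared error of `out` against `ref`.
// Assumes both vectors have the same length and `ref` is not all zeros.
inline double nmse(const std::vector<float> & out, const std::vector<float> & ref) {
    double err  = 0.0;
    double norm = 0.0;
    for (std::size_t i = 0; i < ref.size(); ++i) {
        const double d = double(out[i]) - double(ref[i]);
        err  += d * d;
        norm += double(ref[i]) * double(ref[i]);
    }
    return err / norm;
}
```

Checking for non-finite values first is useful because a single inf makes any error metric meaningless, so it separates layout or indexing bugs from precision issues.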

zhang hui added 6 commits December 13, 2025 13:42

mma.cuh for rdna4

318cb5b

mma for rdna3

074b931

mmq for rdna4

98846cb

mmq for rdna3

62e4954

align i-major and j-major

8b26bc3

cdna

afb0e3d

zhang-hui-yulo requested review from JohannesGaessler and am17an as code owners December 13, 2025 09:38

fix cuda error

6b8ed41

zhang-hui-yulo marked this pull request as draft December 13, 2025 11:32

add missing tile of mfma

6acad9c

JohannesGaessler reviewed Dec 13, 2025

github-actions bot added the labels Nvidia GPU (Issues specific to Nvidia GPUs) and ggml (changes relating to the ggml tensor library for machine learning) Dec 13, 2025
