Skip to content

Conversation

@zhang-hui-yulo
Copy link
Contributor

@zhang-hui-yulo zhang-hui-yulo commented Dec 13, 2025

Refactor mma.cuh for RDNA and CDNA, clean up row-major and colum-major matrix for future development like FA, add dual matrix type for RDNA3.

CDNA isn't tested as I don't have a GPU, @JohannesGaessler could you help to do a raw test on your MI GPU? Thank you. Honestly, I probably need your coding help to fix the bug on CDNA as I don't have a GPU, thank you.

  • align tile of mfan in mmq.

Copy link
Collaborator

@JohannesGaessler JohannesGaessler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't have MFMA hardware for development I would suggest that you simply don't touch the corresponding code for now.

DATA_LAYOUT_J_MAJOR = 10, // Matrix C for CDNA and RDNA4, int and float matrix C for RDNA3.
DATA_LAYOUT_I_MAJOR_MIRRORED = 20,
DATA_LAYOUT_J_MAJOR_MIRRORED = 30,
DATA_LAYOUT_I_MAJOR_DUAL = 40, // Matrix A&B for RDNA3.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why you're not using I_MAJOR_MIRRORED?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just have a check in I_MAJOR_MIRRORED, ne = I * J / (WARP_SIZE/4), so it's for volta 8x8 gemm, so I add I_MAJOR_DUAL to handle RDNA3 problems, I don't think that mixing volta and rdna3 codes is a good choice.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not about what is a good choice, it's about what is the least bad choice. For this PR it's fine to add an extra value to the enum but I will refactor this to instead use either I_MAJOR or I_MAJOR_MIRRORED at some later time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, I_MAJOR for RDNA3 matrix A&B is the worst choice, I_MAJOR is only for RDNA3 matrix C not A&B, or you can only judge A&B or C by the shape, this is the current way is doing.

It can be moved to I_MAJOR_MIRRORED if you think mixing Volta and RDNA3 is acceptable.

@github-actions github-actions bot added Nvidia GPU Issues specific to Nvidia GPUs ggml changes relating to the ggml tensor library for machine learning labels Dec 13, 2025
@zhang-hui-yulo
Copy link
Contributor Author

Honestly, as the refactor changes too much code, keeping the old path of MFMA still needs full test on CDNA, so I think it's worth to have a try to make the code correct first.

@JohannesGaessler
Copy link
Collaborator

If you want to get this PR merged in any reasonable time frame, you either need to fix MFMA yourself or you need to not touch it. I currently have other priorities and don't have the time to fix the MFMA part for you.

@zhang-hui-yulo
Copy link
Contributor Author

If you want to get this PR merged in any reasonable time frame, you either need to fix MFMA yourself or you need to not touch it. I currently have other priorities and don't have the time to fix the MFMA part for you.

I agree, I also don't want to touch MFMA part as I've been spending more than one month to acquire a MI308 but there is still no good response, I'm not sure if I'm able to get one.

Anyway, could you help to run a quick test of MUL_MAT on your CDNA then I can decide how to move forward? Thank you.

But, even not touch MFMA way will still modify the code of MFMA in mmq, it still need your help to do test, thank you.

@JohannesGaessler
Copy link
Collaborator

test-backend-ops is failing on my MI100: log.txt

I'm willing to give you SSH access for development purposes but the machine with the MI100 would only be running during the daytime in Germany since it's in my living space and very loud.

@zhang-hui-yulo
Copy link
Contributor Author

Thank you for the help, inf is not a good signal as it loads wrong data, let me revert CDNA part first then wait for AMD's response for a while to see if I'm able to access a CDNA3.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ggml changes relating to the ggml tensor library for machine learning Nvidia GPU Issues specific to Nvidia GPUs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants