-
Notifications
You must be signed in to change notification settings - Fork 14.1k
HIP: Refactor mma for RDNA and CDNA #17990
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
HIP: Refactor mma for RDNA and CDNA #17990
Conversation
JohannesGaessler
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you don't have MFMA hardware for development I would suggest that you simply don't touch the corresponding code for now.
| DATA_LAYOUT_J_MAJOR = 10, // Matrix C for CDNA and RDNA4, int and float matrix C for RDNA3. | ||
| DATA_LAYOUT_I_MAJOR_MIRRORED = 20, | ||
| DATA_LAYOUT_J_MAJOR_MIRRORED = 30, | ||
| DATA_LAYOUT_I_MAJOR_DUAL = 40, // Matrix A&B for RDNA3. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason why you're not using I_MAJOR_MIRRORED?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just have a check in I_MAJOR_MIRRORED, ne = I * J / (WARP_SIZE/4), so it's for volta 8x8 gemm, so I add I_MAJOR_DUAL to handle RDNA3 problems, I don't think that mixing volta and rdna3 codes is a good choice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not about what is a good choice, it's about what is the least bad choice. For this PR it's fine to add an extra value to the enum but I will refactor this to instead use either I_MAJOR or I_MAJOR_MIRRORED at some later time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, I_MAJOR for RDNA3 matrix A&B is the worst choice, I_MAJOR is only for RDNA3 matrix C not A&B, or you can only judge A&B or C by the shape, this is the current way is doing.
It can be moved to I_MAJOR_MIRRORED if you think mixing Volta and RDNA3 is acceptable.
|
Honestly, as the refactor changes too much code, keeping the old path of MFMA still needs full test on CDNA, so I think it's worth to have a try to make the code correct first. |
|
If you want to get this PR merged in any reasonable time frame, you either need to fix MFMA yourself or you need to not touch it. I currently have other priorities and don't have the time to fix the MFMA part for you. |
I agree, I also don't want to touch MFMA part as I've been spending more than one month to acquire a MI308 but there is still no good response, I'm not sure if I'm able to get one. Anyway, could you help to run a quick test of MUL_MAT on your CDNA then I can decide how to move forward? Thank you. But, even not touch MFMA way will still modify the code of MFMA in mmq, it still need your help to do test, thank you. |
|
I'm willing to give you SSH access for development purposes but the machine with the MI100 would only be running during the daytime in Germany since it's in my living space and very loud. |
|
Thank you for the help, inf is not a good signal as it loads wrong data, let me revert CDNA part first then wait for AMD's response for a while to see if I'm able to access a CDNA3. |
Refactor mma.cuh for RDNA and CDNA, clean up row-major and colum-major matrix for future development like FA, add dual matrix type for RDNA3.
CDNA isn't tested as I don't have a GPU, @JohannesGaessler could you help to do a raw test on your MI GPU? Thank you. Honestly, I probably need your coding help to fix the bug on CDNA as I don't have a GPU, thank you.