take fast path if c2c transform does not need padding or trimming #283
ndgrigorian merged 10 commits into IntelPython:master
Can one of the admins verify this patch?
Oops, I didn't realize that the CI would have to be approved again after fixing the whitespace. Sorry! The tests previously approved did run and pass, though.
It's no problem, thanks for this contribution to the project. :) I'm not sure if any of our tests currently cover this case and compare with e.g. numpy, so adding a test would be good too.
antonwolfy left a comment:
Could you please also update the changelog?
I just rebased off the master branch to get rid of the merge conflict, so the conversation looks a bit garbled--apologies.
antonwolfy left a comment:
Thank you very much, @chillenb, for identifying and proposing this improvement.
No more comments from me.
@ndgrigorian, let me know if you have any comments; otherwise we can merge the PR then.
ndgrigorian left a comment:
Assuming tests pass, this LGTM. I've merged in the master branch and will merge when tests pass.
If you do find a way to make such an optimization for rfftn and irfftn, I would look forward to seeing that PR :) Thank you again for the contribution, @chillenb.
Thanks for all your help! Next time, I will try to make sure fewer corrections are required.
Thanks for creating and maintaining this package!
If you try to get MKL C-API performance out of this package, you will probably discover that `fftn` is very sensitive to the input arguments. Here is an example:

This is because `mkl_fft.fftn` always takes a slow path (`_iter_fftnd`) when `s != None`. Furthermore, the NumPy and SciPy interfaces don't pass through `s=None` unchanged, so they are also forced to take this path.

This pull request allows `fftn` to detect when the input `s` argument is equivalent to `s=None` so it can use the faster function `_iter_complementary`.

After these code changes, performance aligns better with expectations:
Test system: dual-socket Xeon Platinum 8268 server.
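
For readers who want to see what "the input `s` argument is equivalent to `s=None`" means in practice, here is a minimal sketch of such a check. It is not the code from this pull request: the helper name `s_matches_shape` is hypothetical, and it works on a plain shape tuple so that it runs without `mkl_fft` installed. It assumes the NumPy convention that when `s` is given and `axes` is omitted, the last `len(s)` axes are transformed.

```python
def s_matches_shape(shape, s=None, axes=None):
    """Return True if `s` would neither pad nor trim an array of this shape.

    Hypothetical helper for illustration only; not part of mkl_fft.
    """
    if s is None:
        # No explicit output shape: the transform uses the input as-is.
        return True

    ndim = len(shape)
    s = tuple(s)
    if axes is None:
        if len(s) > ndim:
            raise ValueError("s has more entries than the array has axes")
        # Assumed NumPy convention: s applies to the last len(s) axes.
        axes = range(ndim - len(s), ndim)
    axes = tuple(axes)
    if len(axes) != len(s):
        raise ValueError("s and axes must have the same length")

    for n, ax in zip(s, axes):
        if not -ndim <= ax < ndim:
            raise ValueError(f"axis {ax} is out of bounds for {ndim} dimensions")
        # `ax % ndim` maps negative axes onto their non-negative equivalents.
        if n != shape[ax % ndim]:
            return False  # this axis would be padded or trimmed
    return True
```

With a check like this, a caller could dispatch to the fast path whenever it returns `True` and fall back to the general path otherwise. For example, `s_matches_shape((4, 5, 6), s=(5, 6))` is `True`, while `s_matches_shape((4, 5, 6), s=(5, 8))` is `False` because the last axis would be padded.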