[DO NOT MERGE] Sasha triage topk test failure amd#1947
oleksandr-pavlyk wants to merge 25 commits into feature/topk
oleksandr-pavlyk
commented
Dec 22, 2024
- Have you provided a meaningful PR description?
- Have you added a test, reproducer or referred to an issue with a reproducer?
- Have you tested your changes locally for CPU and GPU devices?
- Have you made sure that new changes do not introduce compiler warnings?
- Have you checked performance impact of proposed changes?
- Have you added documentation for your changes, if necessary?
- Have you added your changes to the changelog?
- If this PR is a work in progress, are you opening the PR as a draft?
The implementation leverages existing merge-sort code, and partially sorts the array in cases where a partial sort reduces the size of the temporary memory allocation.
Reduces amount of casting. `k` will need to fit in `py::ssize_t` regardless.
Instead of using an overload to handle the `axis=None` case, use std::optional and check for trailing_dims_to_search in validation logic
rounded value of k must be divisible by the merge sort chunk size
Reuse that function call in sorting code-base where argsort is used.
View rendered docs @ https://intelpython.github.io/dpctl/pulls/1947/index.html
Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_368 ran successfully.
de53055 to 3e5e303
Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_370 ran successfully.
3e5e303 to d50092a
Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_340 ran successfully.
Until it is passed over to the host function, and
unique_ptr's ownership is released.
Also reduced allocation sizes, where too much was being
allocated.
Introduce smart_malloc_device, etc.
The smart_malloc_device<T>(count, q) function makes a USM allocation
and returns a unique_ptr<T, USMDeleter> which owns the
allocation. The function throws an exception (std::runtime_error)
if the USM allocation is not successful.
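The ownership pattern described above can be illustrated without SYCL. The following is a minimal sketch, not the dpctl implementation: `FakeQueue`, `FakeUSMDeleter`, and `fake_smart_malloc` are hypothetical names, and `std::malloc`/`std::free` stand in for the USM allocation and deallocation calls. It only shows the shape of the idea: allocate, throw `std::runtime_error` on failure, and hand back a `unique_ptr` whose deleter knows how to free the memory.

```cpp
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <stdexcept>

// Hypothetical stand-in for the queue argument (sycl::queue in the PR).
// It only counts live allocations so the ownership can be observed.
struct FakeQueue {
    std::size_t live_allocations = 0;
};

// Hypothetical deleter playing the role of USMDeleter.
struct FakeUSMDeleter {
    FakeQueue *q = nullptr;
    void operator()(void *p) const
    {
        if (p != nullptr) {
            std::free(p); // stand-in for the USM deallocation call
            if (q != nullptr) {
                --q->live_allocations;
            }
        }
    }
};

// Allocates `count` elements (count > 0 assumed) and returns an owning
// unique_ptr; throws std::runtime_error if the allocation fails.
template <typename T>
std::unique_ptr<T, FakeUSMDeleter> fake_smart_malloc(std::size_t count,
                                                     FakeQueue &q)
{
    // std::malloc is an assumption standing in for the USM allocator
    T *p = static_cast<T *>(std::malloc(count * sizeof(T)));
    if (p == nullptr) {
        throw std::runtime_error("allocation failed");
    }
    ++q.live_allocations;
    return std::unique_ptr<T, FakeUSMDeleter>(p, FakeUSMDeleter{&q});
}
```

With this shape, an early return or an exception between allocation and use cannot leak the temporary, because the `unique_ptr` destructor frees it.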
Introduce async_smart_free.
This function is intended to replace the use of host_task submissions
to manage deallocation of USM temporaries.
The usage is as follows:
```
// returns unique_ptr
auto alloc_owner = smart_malloc_device<T>(count, q);
// get raw pointer for use in kernels
T *data = alloc_owner.get();
[..SNIP..]
// submit host_task that releases the unique_ptr
// after the host task was successfully submitted
// and ownership of USM allocation is transferred to
// the said host task
sycl::event ht_ev =
    async_smart_free(q,
                     dependent_events,
                     alloc_owner);
[...SNIP...]
```
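The order of operations in the usage above (submit the deferred free first, release the `unique_ptr` only once submission has succeeded) can be sketched in plain C++. This is an illustration under stated assumptions, not the dpctl code: `FakeQueue`, `CountingDeleter`, and `deferred_smart_free` are hypothetical, the queue simply stores tasks and runs them on `wait()`, and dependent events are omitted.

```cpp
#include <cstdlib>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical queue: stores tasks and runs them when wait() is called.
struct FakeQueue {
    std::vector<std::function<void()>> pending;
    void submit(std::function<void()> task)
    {
        pending.push_back(std::move(task));
    }
    void wait()
    {
        for (auto &task : pending) {
            task();
        }
        pending.clear();
    }
};

// Hypothetical deleter that frees memory and counts how often it ran.
struct CountingDeleter {
    int *freed = nullptr;
    void operator()(void *p) const
    {
        std::free(p);
        if (freed != nullptr) {
            ++*freed;
        }
    }
};

// Submits a task that frees the allocation later, then gives up ownership.
// If submit() throws, `owner` still owns the memory and frees it normally.
template <typename T>
void deferred_smart_free(FakeQueue &q,
                         std::unique_ptr<T, CountingDeleter> &owner)
{
    T *raw = owner.get();
    CountingDeleter del = owner.get_deleter();
    q.submit([raw, del]() { del(raw); });
    // Submission succeeded: the queued task is now responsible for the free.
    owner.release();
}
```

Releasing only after a successful submission means there is no window in which the allocation is owned by nobody, and no window in which it could be freed twice.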
bbb55f1 to da3fbcc
Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_340 ran successfully.
d50092a to b411407
8214855 to 04b6629
Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_343 ran successfully.
Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_344 ran successfully.
Factored out map_back_impl, which projects indexing from a flat index to a row-wise index. Removed dead code excluded by a preprocessor conditional.
04b6629 to dfb521f
Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_346 ran successfully.
Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_347 ran successfully.
Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_348 ran successfully.
Replaced it with a hand-written implementation of ceil_log2(n),
such that n <= (decltype(n){1} << ceil_log2(n)) is true for all
positive values of `n` in the range.
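One way such a function could look is sketched below. This is not the code from the PR; `ceil_log2_sketch` is a hypothetical name, and the approach (counting the bits needed to represent `n - 1`) is one of several that satisfy the stated inequality. It avoids repeatedly doubling a running power, which could overflow for large `n`.

```cpp
#include <type_traits>

// Returns the smallest exponent e such that n <= 2^e, for positive n.
// Works by counting how many bits are needed to represent n - 1.
template <typename T>
constexpr int ceil_log2_sketch(T n)
{
    static_assert(std::is_integral<T>::value,
                  "ceil_log2_sketch expects an integral type");
    if (n <= T{1}) {
        return 0; // 2^0 == 1 already covers n == 1
    }
    int bits = 0;
    T v = n - T{1};
    while (v > T{0}) {
        v >>= 1; // drop the lowest bit
        ++bits;
    }
    return bits;
}
```

For example, `ceil_log2_sketch(1024)` gives 10 and `ceil_log2_sketch(1025)` gives 11, so the value rounds up exactly at powers of two.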
c351d0b to c1f8a74
Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_349 ran successfully.
Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_350 ran successfully.
0869128 to 210500f
Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_344 ran successfully.
Add check of computed against expected indices
210500f to 387a3d9
Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_345 ran successfully.
One asserts that at least one unique pointer is specified. Another asserts that the specified arguments are unique pointers with USMDeleter.
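Two compile-time checks of this kind could be written roughly as follows. This is a sketch under assumptions, not the PR's code: `USMDeleterStub`, `is_usm_unique_ptr`, `all_true`, and `checked_count` are hypothetical names, and the stub deleter stands in for the real USMDeleter.

```cpp
#include <cstddef>
#include <memory>
#include <type_traits>

// Hypothetical stand-in for USMDeleter.
struct USMDeleterStub {
    void operator()(void *) const {}
};

// Trait: true only for std::unique_ptr<T, USMDeleterStub>.
template <typename T> struct is_usm_unique_ptr : std::false_type {};
template <typename T>
struct is_usm_unique_ptr<std::unique_ptr<T, USMDeleterStub>> : std::true_type {};

// C++11-compatible "all of these booleans are true" helper.
template <bool...> struct bool_pack {};
template <bool... Bs>
using all_true =
    std::is_same<bool_pack<true, Bs...>, bool_pack<Bs..., true>>;

// Accepts one or more owning pointers; anything else fails to compile.
template <typename... Ptrs> std::size_t checked_count(Ptrs &...)
{
    static_assert(sizeof...(Ptrs) > 0,
                  "at least one unique pointer must be specified");
    static_assert(
        all_true<is_usm_unique_ptr<
            typename std::remove_cv<Ptrs>::type>::value...>::value,
        "every argument must be a unique_ptr with the expected deleter");
    return sizeof...(Ptrs);
}
```

Calling `checked_count()` with no arguments, or with a `std::unique_ptr<int>` that uses the default deleter, is rejected at compile time rather than failing at run time.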
387a3d9 to fd65511
Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_347 ran successfully.
84d1388 to 809cb70