Description
Feature Type
- Adding new functionality to pandas
- Changing existing functionality in pandas
- Removing existing functionality in pandas
Problem Description
I’ve seen related issues around alignment and vectorization (e.g., #3146) and understand pandas prioritizes index alignment and general-purpose data structures. However, this question is focused specifically on whether SIMD or other low-level vectorization techniques have been considered or intentionally avoided in the internal hashtable engine.
Feature Description
Hi pandas team,
I'm wondering what the community's current stance is on introducing vectorized (e.g., SIMD-based) operations into pandas' hashtable infrastructure, which is used in groupby, factorize, categorical operations, and similar paths.
Hashtable lookups and insertions are often performance-critical paths, especially when dealing with large, high-cardinality data. With modern CPUs supporting SIMD instructions (e.g., AVX2, AVX-512), has there been any past discussion or interest in exploring:
- SIMD acceleration for probing and inserting into hash tables?
- Potential trade-offs in code complexity, portability, and maintainability?
- Alignment with pandas’ reliance on NumPy, PyArrow, or other external backends for low-level performance?
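To make the first point concrete, here is a rough pure-Python sketch of the kind of group-wise probing I have in mind. It is loosely modeled on the control-byte design used by Swiss-table-style hash maps and is not how pandas' hashtables work today. Every name in it (`GroupedTable`, `GROUP`, `EMPTY`, `factorize`) is hypothetical, and the loop in `_match_mask` only stands in for what would be a single SIMD byte compare plus a movemask in native code.

```python
# Illustrative sketch only: NOT pandas' actual hashtable code.
# All names here are hypothetical and chosen for this example.

GROUP = 8     # slots per group; native code would compare these with one SIMD op
EMPTY = 0x80  # control byte for an unused slot (tags only use the low 7 bits)


class GroupedTable:
    """Open-addressing map from hashable key to integer code, probed group by group."""

    def __init__(self, n_groups=16):
        self.n_groups = n_groups  # must stay a power of two
        n = n_groups * GROUP
        self.ctrl = bytearray([EMPTY]) * n  # one tag byte per slot
        self.keys = [None] * n
        self.vals = [0] * n
        self.size = 0

    def _match_mask(self, group, byte):
        """Bitmask of slots in `group` whose control byte equals `byte`.
        This loop simulates one SIMD compare followed by a movemask."""
        base = group * GROUP
        mask = 0
        for i in range(GROUP):
            if self.ctrl[base + i] == byte:
                mask |= 1 << i
        return mask

    def _find(self, key):
        """Return (slot, found). If not found, slot is the first empty slot probed."""
        h = hash(key)
        tag = h & 0x7F
        group = (h >> 7) & (self.n_groups - 1)
        while True:
            base = group * GROUP
            m = self._match_mask(group, tag)
            while m:  # only slots whose tag matched need a full key comparison
                i = (m & -m).bit_length() - 1
                if self.keys[base + i] == key:
                    return base + i, True
                m &= m - 1
            e = self._match_mask(group, EMPTY)
            if e:  # no deletions in this sketch, so an empty slot ends the probe
                return base + (e & -e).bit_length() - 1, False
            group = (group + 1) & (self.n_groups - 1)

    def _store(self, slot, key, val):
        self.ctrl[slot] = hash(key) & 0x7F
        self.keys[slot] = key
        self.vals[slot] = val

    def _grow(self):
        live = [(k, v) for c, k, v in zip(self.ctrl, self.keys, self.vals) if c != EMPTY]
        self.n_groups *= 2
        n = self.n_groups * GROUP
        self.ctrl = bytearray([EMPTY]) * n
        self.keys = [None] * n
        self.vals = [0] * n
        for k, v in live:
            slot, _ = self._find(k)
            self._store(slot, k, v)

    def get_or_insert(self, key):
        """Return the code for `key`, assigning the next code on first sight."""
        slot, found = self._find(key)
        if found:
            return self.vals[slot]
        if (self.size + 1) * 8 > len(self.keys) * 7:  # keep load factor <= 7/8
            self._grow()
            slot, _ = self._find(key)
        code = self.size
        self._store(slot, key, code)
        self.size += 1
        return code


def factorize(values):
    """Toy factorize: codes in order of first appearance, plus the unique keys."""
    table = GroupedTable()
    codes = [table.get_or_insert(v) for v in values]
    uniques = [None] * table.size
    for c, k, v in zip(table.ctrl, table.keys, table.vals):
        if c != EMPTY:
            uniques[v] = k
    return codes, uniques
```

For example, `factorize(["b", "a", "b", "c", "a"])` returns `([0, 1, 0, 2, 1], ["b", "a", "c"])`. The part that would actually benefit from SIMD is `_match_mask`: one instruction filters a whole group down to the few slots worth a full key comparison.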
I would love to know the core team’s view, especially whether this is considered an area for experimentation or whether existing architectural decisions rule it out.
Thanks for the amazing work on pandas!
Alternative Solutions
.
Additional Context
.