ENH: Vectorization (e.g. SIMD) in pandas' Hashtable Operations for Performance Improvement

### Feature Type

- [x] Adding new functionality to pandas

- [x] Changing existing functionality in pandas

- [ ] Removing existing functionality in pandas


### Problem Description

I’ve seen related issues around alignment and vectorization (e.g., #3146) and understand that pandas prioritizes index alignment and general-purpose data structures. This question, however, is specifically about whether SIMD or other low-level vectorization techniques have been considered, or intentionally avoided, in the internal hashtable engine.

### Feature Description

Hi pandas team,

I'm wondering what the community's current stance is on introducing vectorized (e.g., SIMD-based) operations into pandas' hashtable infrastructure, which is used in groupby, factorize, categorical operations, and similar code paths.
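To make the code path I have in mind concrete, here is a rough pure-Python sketch of the kind of work a factorize-style operation does: one hashtable lookup per element, with an insert on each miss. The function name `factorize_sketch` is made up for illustration, a plain `dict` stands in for the C hashtable, and missing-value handling is ignored, so this is not how pandas actually implements it.

```python
def factorize_sketch(values):
    """Map each value to an integer code; uniques keep first-seen order.

    Illustrative only: a dict stands in for the real hashtable.
    """
    table = {}       # value -> integer code
    codes = []
    uniques = []
    for v in values:
        code = table.get(v)        # lookup (the probe step)
        if code is None:
            code = len(uniques)    # miss: assign the next code and insert
            table[v] = code
            uniques.append(v)
        codes.append(code)
    return codes, uniques


codes, uniques = factorize_sketch(["b", "a", "b", "c", "a"])
print(codes)    # [0, 1, 0, 2, 1]
print(uniques)  # ['b', 'a', 'c']
```

The per-element lookup in that loop is the step I am asking about: whether several probes or slot comparisons could be handled per instruction.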

Hashtable lookups and insertions are often performance-critical paths, especially when dealing with large, high-cardinality data. With modern CPUs supporting SIMD instructions (e.g., AVX2, AVX-512), has there been any past discussion or interest in exploring:

- SIMD acceleration for probing and inserting into hash tables?

- Potential trade-offs in code complexity, portability, and maintainability?

- Alignment with pandas’ reliance on NumPy, PyArrow, or other external backends for low-level performance?
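As an illustration of what I mean by SIMD-style probing, here is a small sketch of a "group match" step in the spirit of Swiss-table designs, where each slot has a one-byte tag and several tags are compared against the probe's tag at once. It emulates the idea in pure Python using SWAR (bit tricks on one 64-bit word covering 8 tags) rather than real SIMD intrinsics. The names `match_byte`, `GROUP`, and the tag layout are assumptions for the example and have nothing to do with pandas' current hashtable code.

```python
# Sketch only: SWAR ("SIMD within a register") emulation of a group probe.
# Native code would use intrinsics (e.g. a byte-wise compare plus movemask).

GROUP = 8                       # tags examined per step
ONES = 0x0101010101010101       # 0x01 repeated in each of the 8 bytes
LOW7 = 0x7F7F7F7F7F7F7F7F       # 0x7F repeated in each of the 8 bytes
MASK64 = 0xFFFFFFFFFFFFFFFF     # Python ints are unbounded, so mask to 64 bits


def match_byte(group_word, tag):
    """Return positions (0..7) of the bytes in group_word equal to tag (0..255)."""
    x = group_word ^ (ONES * tag)          # bytes equal to tag become 0x00
    # Per-byte zero test without cross-byte carries:
    # the high bit of each byte of y is set iff that byte of x is nonzero.
    y = ((x & LOW7) + LOW7) | x | LOW7
    hits = ~y & MASK64                     # 0x80 in every byte that matched
    return [i for i in range(GROUP) if (hits >> (8 * i)) & 0x80]


# Eight hypothetical slot tags, packed little-endian so slot i is byte i.
tags = bytes([5, 9, 5, 0, 127, 5, 1, 200])
word = int.from_bytes(tags, "little")
print(match_byte(word, 5))   # [0, 2, 5]
print(match_byte(word, 42))  # []
```

In a real table, each reported position would still need a full key comparison, since a tag holds only a few bits of the hash.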

I would love to know the core team’s view, especially whether this is considered an area for experimentation or whether existing architectural decisions rule it out.

Thanks for the amazing work on pandas!

### Alternative Solutions

.

### Additional Context

.

ENH: Vectorization (e.g. SIMD) in pandas' Hashtable Operations for Performance Improvement #63374

Feature Type

Problem Description

Feature Description

Alternative Solutions

Additional Context

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

ENH: Vectorization (e.g. SIMD) in pandas' Hashtable Operations for Performance Improvement #63374

Description

Feature Type

Problem Description

Feature Description

Alternative Solutions

Additional Context

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions