Implement optimized movemasks for NEON #1236
Open
+87
−16
While the scalar post-processing required to obtain one bit per lane makes this more expensive than directly supporting variable-sized bit groups (as done in Zstandard[^1]), the result is still an improvement over the current lane-by-lane algorithm.
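The trade-off above can be sketched as follows. This is an illustrative example, not the PR's actual code: the function name `movemask_u8x16` and the exact packing steps are assumptions. On NEON, `vshrn_n_u16` narrows the mask to one nibble per lane (the "variable-sized bit group" Zstandard consumes directly), after which a scalar step compresses the 16 nibbles down to one bit per lane; a portable fallback shows the equivalent lane-by-lane result.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__ARM_NEON)
#  include <arm_neon.h>
#endif

// Illustrative sketch (not the PR's actual API): collapse a 16-lane byte
// mask (each lane 0x00 or 0xFF) into one bit per lane, SSE2-movemask style.
inline std::uint16_t movemask_u8x16(const std::uint8_t lanes[16]) {
#if defined(__ARM_NEON)
    // vshrn_n_u16(..., 4) keeps the top nibble of each even lane and the
    // bottom nibble of each odd lane, yielding a 64-bit value with one
    // 4-bit group per lane (assumes little-endian lane order).
    uint8x16_t v = vld1q_u8(lanes);
    std::uint64_t nibbles = vget_lane_u64(
        vreinterpret_u64_u8(vshrn_n_u16(vreinterpretq_u16_u8(v), 4)), 0);
    // Scalar post-processing: keep one bit per nibble, then pack the 16
    // surviving bits -- the extra cost discussed above.
    nibbles &= 0x1111111111111111ULL;
    std::uint16_t mask = 0;
    for (int i = 0; i < 16; ++i)
        mask |= static_cast<std::uint16_t>((nibbles >> (4 * i)) & 1u) << i;
    return mask;
#else
    // Portable lane-by-lane fallback, equivalent to _mm_movemask_epi8.
    std::uint16_t mask = 0;
    for (int i = 0; i < 16; ++i)
        mask |= static_cast<std::uint16_t>(lanes[i] >> 7) << i;
    return mask;
#endif
}
```

A consumer such as a find-first-match loop would then run `__builtin_ctz` on the returned mask, which is why the one-bit-per-lane form is worth the extra scalar work.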
To reduce duplication, `XSIMD_LITTLE_ENDIAN` is moved from `math/xsimd_rem_pio2.hpp` to `config/xsimd_config.hpp`, and will now be available outside the defining header.
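A compile-time endianness macro in this spirit could be defined as below. This is a sketch, not xsimd's actual definition: the `DEMO_LITTLE_ENDIAN` name and the fallback chain are assumptions, and a runtime probe is included only to cross-check the compile-time answer.

```cpp
#include <cstdint>
#include <cstring>

// Sketch of compile-time little-endian detection, similar in spirit to
// XSIMD_LITTLE_ENDIAN (not the library's exact definition).
#if defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__)
#  define DEMO_LITTLE_ENDIAN (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
#elif defined(_WIN32)
#  define DEMO_LITTLE_ENDIAN 1  // all supported Windows targets are little-endian
#else
#  define DEMO_LITTLE_ENDIAN 0  // conservative fallback
#endif

// Runtime cross-check: inspect the first byte of a 32-bit 1.
inline bool is_little_endian_runtime() {
    const std::uint32_t x = 1;
    unsigned char b;
    std::memcpy(&b, &x, 1);
    return b == 1;
}
```

Hoisting such a macro into `config/xsimd_config.hpp` lets any header branch on byte order without pulling in the math internals that previously defined it.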
Footnotes

[^1]: See "[lazy] Optimize ZSTD_row_getMatchMask for levels 8-10 for ARM" (facebook/zstd#3139), namely `ZSTD_row_matchMaskGroupWidth`.