fix(wc): respect C/POSIX locale for character counting #11006
naoNao89 wants to merge 2 commits into uutils:main
GNU testsuite comparison:
Merging this PR will improve performance by ×2.2
As no human would write such duplication of code, I guess it is LLM-generated ...
In the newly added tests you always use -m or -cm. However, this means only your changes in count_fast.rs are tested due to the logic in word_count_from_reader in wc.rs. To test your changes in wc.rs you also have to provide -w or -L.
Sorry, refactored 💀
d906c13 to 1ed5ccd
src/uu/wc/src/wc.rs (outdated)

    }
    if SHOW_CHARS {
        total.chars += 1;
        if chars_are_bytes {
Seriously?!
Please review your patches before submitting them ...
I thought clippy had the ability to check for an empty if :v
GNU testsuite comparison:
Modify wc -m to count bytes instead of UTF-8 characters when LC_ALL, LC_CTYPE, or LANG is set to C or POSIX. This matches GNU coreutils behavior where MB_CUR_MAX == 1 in these locales.

Changes:
- Add is_c_or_posix_locale() helper in count_fast.rs
- Export and reuse function in wc.rs to avoid duplication
- Update fast path and UTF-8 decoding path
- Add regression tests with Vietnamese text

Fixes uutils#9712, fixes uutils#5831.
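For readers following the thread, here is a minimal sketch of what a helper like is_c_or_posix_locale() could look like. It is not the code from this PR. The `is_c_or_posix_locale_with` variant and its `lookup` parameter are hypothetical, added only so the logic can be exercised without touching the process environment. The sketch assumes the usual POSIX precedence (LC_ALL, then LC_CTYPE, then LANG), treats an empty value as unset, and returns false when nothing is set; the last point is an assumption and the real helper may decide differently.

```rust
use std::env;

/// Hypothetical, testable core of the locale check: `lookup` stands in for
/// reading an environment variable and is not part of the PR.
fn is_c_or_posix_locale_with<F>(lookup: F) -> bool
where
    F: Fn(&str) -> Option<String>,
{
    // POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
    for name in ["LC_ALL", "LC_CTYPE", "LANG"] {
        match lookup(name) {
            // The first non-empty variable decides; an empty value counts as unset.
            Some(value) if !value.is_empty() => {
                return value == "C" || value == "POSIX";
            }
            _ => continue,
        }
    }
    // Assumption of this sketch: with no locale variable set, keep UTF-8 counting.
    false
}

/// Sketch of the helper named in the commit message, reading the real environment.
fn is_c_or_posix_locale() -> bool {
    is_c_or_posix_locale_with(|name| env::var(name).ok())
}
```

Under this sketch, LC_ALL=en_US.UTF-8 with LANG=C yields false, because LC_ALL takes precedence over LANG.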
1ed5ccd to bf04096
GNU testsuite comparison:
Add tests with -w flag to ensure both count_fast.rs and wc.rs paths are tested for locale-aware character counting.
GNU testsuite comparison:
In C/POSIX locale, wc -m now counts bytes (not UTF-8 chars), matching GNU coreutils behavior using MB_CUR_MAX logic.

Fixes #9712
Fixes #5831
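To make the behavior change concrete, the following sketch contrasts the two counting modes on a short Vietnamese string. The `count_chars` function is hypothetical and is not the implementation in count_fast.rs or wc.rs; only the flag name `chars_are_bytes` is borrowed from the diff discussed above. The UTF-8 branch counts every byte that is not a continuation byte, which equals the number of characters for valid UTF-8 input.

```rust
/// Hypothetical illustration of `wc -m` semantics; not the PR's code.
fn count_chars(input: &[u8], chars_are_bytes: bool) -> usize {
    if chars_are_bytes {
        // C/POSIX locale: MB_CUR_MAX == 1, so every byte is one character.
        input.len()
    } else {
        // UTF-8 locale: count bytes that do not have the 10xxxxxx continuation pattern.
        input.iter().filter(|&&b| (b & 0xC0) != 0x80).count()
    }
}

/// Sample input: "Việt Nam" with the precomposed U+1EC7, which takes 3 bytes in UTF-8.
fn sample() -> &'static [u8] {
    "Vi\u{1EC7}t Nam".as_bytes()
}
```

With this input, `count_chars(sample(), true)` gives 10 (bytes) and `count_chars(sample(), false)` gives 8 (characters), which is the kind of difference the regression tests with Vietnamese text are meant to catch.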