tac: support non-UTF-8 separator #10934
The ultimate goal is to get the non-UTF-8 handling to fully match how GNU handles it. Can you try to handle it in the same PR?
6b18317 to 9bfe761
Merging this PR will improve performance by 3.55%
Fixing this should also fix the tac-locale test, so you can use that test to check whether the fix is working properly.
9bfe761 to 1d21b13
GNU testsuite comparison:
1d21b13 to e38c5aa
e38c5aa to 5bde2a8
I found this trick of turning the
match c {
match b {
    _ if inside_brackets && !b.is_ascii() => {
TIL GNU also ignores non-ASCII bytes inside bracket expressions
        last_byte = Some(*b);
    }
    _ if !b.is_ascii() => {
        let _ = write!(result, r"(?-u:\x{b:02x})");
Thanks!
Fixes #9502
Open question: GNU tac also accepts non-UTF-8 separators in regex mode. Right now, in this PR, I return an error. Should I handle this in the same PR, or in a separate one?
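
For anyone following the `(?-u:\x..)` line quoted in the review snippet, here is a minimal sketch of that escaping idea in isolation. It is not the PR's actual code: the function name `escape_non_ascii` is hypothetical, and bracket-expression handling is deliberately left out. The assumption is that the resulting pattern is later compiled with a byte-oriented regex engine (such as the `regex` crate's bytes API), where `(?-u:...)` turns Unicode mode off so that `\xNN` matches a single raw byte.

```rust
use std::fmt::Write;

/// Hypothetical helper, for illustration only: rewrites a raw byte
/// separator into a pattern string that is itself valid UTF-8.
/// ASCII bytes are copied through unchanged; every non-ASCII byte is
/// emitted as `(?-u:\xNN)`.
fn escape_non_ascii(pattern: &[u8]) -> String {
    let mut result = String::with_capacity(pattern.len());
    for &b in pattern {
        if b.is_ascii() {
            // An ASCII byte maps to the same single-byte char.
            result.push(b as char);
        } else {
            // Writing into a String cannot fail, so the Result is ignored.
            // The raw string keeps `\x` literal; `{b:02x}` is the format arg.
            let _ = write!(result, r"(?-u:\x{b:02x})");
        }
    }
    result
}
```

With this sketch, a separator such as `b"a\xffb"` becomes the string `a(?-u:\xff)b`, which can be passed around as an ordinary `String` even though the original separator was not valid UTF-8.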