gh-146169: Prevent re-entrant Parse() calls in pyexpat#146179
gh-146169: Prevent re-entrant Parse() calls in pyexpat#146179okiemute04 wants to merge 4 commits intopython:mainfrom
Conversation
2368c9d to
4557157
Compare
afe297f to
4146ff1
Compare
| .. gh-issue: 146169 | ||
| .. section: Library | ||
There was a problem hiding this comment.
| .. gh-issue: 146169 | |
| .. section: Library |
The news entry format is invalid, these should be removed. The filename is also wrong. I think you can delete this file, and re-generate it by blurb-it or the blurb command line tool.
00844fd to
b6c2df5
Compare
Add a check in Parse() to prevent calls when in_callback is true, as this violates expat's requirements and can cause crashes. Now raises RuntimeError with a clear message. Add tests to verify the fix and ensure normal parsing still works.
b6c2df5 to
c924f57
Compare
|
Hi @okiemute04 please avoid force push in the future, which will make the review work harder, see the Dev Guide: https://devguide.python.org/getting-started/pull-request-lifecycle/#don-t-force-push |
|
@aisk Thanks for the feedback. I apologize for the repeated force pushes. I was trying to clean up the commit history and fix the NEWS file issues. for surei will avoid force pushing going forward and just add new commits normally. Appreciate the guidance! |
Lib/test/test_pyexpat.py
Outdated
| expat.errors.XML_ERROR_FINISHED) | ||
|
|
||
| def test_reentrant_parse_crash(self): | ||
| from xml.parsers import expat |
There was a problem hiding this comment.
I think this is already imported at the begaining of this file: https://github.com/python/cpython/blob/main/Lib/test/test_pyexpat.py#L18
There was a problem hiding this comment.
Yes thats true, Thanks @aisk i already updated the pr with a new push to remove the import.
picnixz
left a comment
There was a problem hiding this comment.
Please address the comments.
I suspect that this has been generated by an LLM. If an LLM replies to me I will not consider this PR as being valid as this goes against our ToS about LLMs. So carefully address those comments and properly review any LLM output before committing anything.
| p = expat.ParserCreate() | ||
| data = "<root><child/></root>".encode('utf-8') | ||
| try: | ||
| p.Parse(data, 1) |
There was a problem hiding this comment.
Don't use 0 or 1 for the second argument of parse but real booleans.
There was a problem hiding this comment.
thanks you righ i will change it booleans
| data = "<root><child/></root>".encode('utf-8') | ||
| try: | ||
| p.Parse(data, 1) | ||
| except RuntimeError: |
There was a problem hiding this comment.
Remove this. If the test fails it must fail with the RuntimeError or any other exception that was unexpected
There was a problem hiding this comment.
yeah okay I will remove the try/except and let any unexpected exceptions propagate, the test should fail naturally.
| self.assertEqual(str(cm.exception), | ||
| "cannot call Parse() from within a handler") | ||
|
|
||
| def test_parse_normal(self): |
There was a problem hiding this comment.
Why do we even need this test in the first place? we already test this somewhere don't we?
There was a problem hiding this comment.
for this, yeah test ensures that normal parsing without re-entrant call still works after the change. i will just check if there's already an existing test that covers normal parsing, if so, i will get rid of it to avoid duplication. If not, I'll keep it but simplify it, what do you think @picnixz ?
There was a problem hiding this comment.
Well yes that is what I am asking.
| data = b"\xff\xfe<\x00a\x00>\x00x\x00" | ||
| with self.assertRaises(RuntimeError) as cm: | ||
| for i in range(len(data)): | ||
| p.Parse(data[i:i+1], i == len(data) - 1) | ||
| self.assertEqual(str(cm.exception), |
There was a problem hiding this comment.
Why this complex test? just have p.Parse(data, False) and that's it.
There was a problem hiding this comment.
okay yeah, you're right, the incremental parsing loop isn't necessary for the test, the re-entrant condition can be triggered with a single call, but I will simplify to p.Parse(data, True). The original reproduction code used incremental parsing, but the test only needs to verify that the RuntimeError is raised.
| p.CharacterDataHandler = handler | ||
|
|
||
| p.StartElementHandler = start | ||
| data = b"\xff\xfe<\x00a\x00>\x00x\x00" |
There was a problem hiding this comment.
the payload b"\xff\xfe<\x00a\x00>\x00x\x00" is a UTF-16 encoded XML fragment from the original issue, the \xff\xfe is a BOM (Byte Order Mark) for UTF-16, followed by x. This specific payload triggered the re-entrant crash in the original report.
There was a problem hiding this comment.
But you do not need such payload or? It should be triggered by just reparsing the same data in the handler.
| expat.errors.XML_ERROR_FINISHED) | ||
|
|
||
| def test_reentrant_parse_crash(self): | ||
| p = expat.ParserCreate(encoding="utf-16") |
There was a problem hiding this comment.
the utf-16 encoding is part of the original reproduction code from issue #146169, the crash was specifically triggered with utf-16 encoded data, so keeping it ensures we're testing the exact scenario that caused the problem.
There was a problem hiding this comment.
It is wrong. We just want to check for reentrancy. Handler doesn't care about the payload encoding as long as it is being called.
|
|
||
| if (self->in_callback) { | ||
| PyErr_SetString(PyExc_RuntimeError, | ||
| "cannot call Parse() from within a handler"); |
There was a problem hiding this comment.
Are other re-entrant calls raising a RuntimeError or something else?
There was a problem hiding this comment.
Currently only the Parse() method is blocked. Other re-entrant scenarios like ParseFile() or ParseBuffer() would still crash. Let me investigate whether those need similar protection.
There was a problem hiding this comment.
Ok this is clearly an LLM answer. Since it will be faster if I were to address this I am closing this PR.
|
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
|
If you want to contribute with an LLM agent:
If you do not follow those guidelines we will restrict your access. |
Fixes #146169
Add a check in
Parse()to prevent calls whenin_callbackis true,as this violates expat's requirements and can cause crashes. Now raises
RuntimeErrorwith a clear message.Added tests to
test_pyexpat.pyto verify the fix:test_reentrant_parse_crash: Verifies re-entrant calls raise RuntimeErrortest_parse_normal: Ensures normal parsing still works