Skip to content

gh-146169: Prevent re-entrant Parse() calls in pyexpat#146179

Closed
okiemute04 wants to merge 4 commits intopython:mainfrom
okiemute04:fix-pyexpat-reentrant-crash-final
Closed

gh-146169: Prevent re-entrant Parse() calls in pyexpat#146179
okiemute04 wants to merge 4 commits intopython:mainfrom
okiemute04:fix-pyexpat-reentrant-crash-final

Conversation

@okiemute04
Copy link
Contributor

@okiemute04 okiemute04 commented Mar 19, 2026

Fixes #146169

Add a check in Parse() to prevent calls when in_callback is true,
as this violates expat's requirements and can cause crashes. Now raises
RuntimeError with a clear message.

Added tests to test_pyexpat.py to verify the fix:

  • test_reentrant_parse_crash: Verifies re-entrant calls raise RuntimeError
  • test_parse_normal: Ensures normal parsing still works

@okiemute04 okiemute04 force-pushed the fix-pyexpat-reentrant-crash-final branch 3 times, most recently from 2368c9d to 4557157 Compare March 19, 2026 17:10
@okiemute04 okiemute04 changed the title gh-146174: Prevent re-entrant Parse() calls in pyexpat gh-146169: Prevent re-entrant Parse() calls in pyexpat Mar 19, 2026
@okiemute04 okiemute04 force-pushed the fix-pyexpat-reentrant-crash-final branch 2 times, most recently from afe297f to 4146ff1 Compare March 19, 2026 17:22
Comment on lines +1 to +4
.. gh-issue: 146169
.. section: Library
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.. gh-issue: 146169
.. section: Library

The news entry format is invalid, these should be removed. The filename is also wrong. I think you can delete this file, and re-generate it by blurb-it or the blurb command line tool.

@okiemute04 okiemute04 force-pushed the fix-pyexpat-reentrant-crash-final branch 2 times, most recently from 00844fd to b6c2df5 Compare March 19, 2026 17:32
Add a check in Parse() to prevent calls when in_callback is true,
as this violates expat's requirements and can cause crashes. Now raises
RuntimeError with a clear message.

Add tests to verify the fix and ensure normal parsing still works.
@okiemute04 okiemute04 force-pushed the fix-pyexpat-reentrant-crash-final branch from b6c2df5 to c924f57 Compare March 19, 2026 17:35
@aisk
Copy link
Member

aisk commented Mar 19, 2026

Hi @okiemute04 please avoid force push in the future, which will make the review work harder, see the Dev Guide: https://devguide.python.org/getting-started/pull-request-lifecycle/#don-t-force-push

@okiemute04
Copy link
Contributor Author

@aisk Thanks for the feedback. I apologize for the repeated force pushes. I was trying to clean up the commit history and fix the NEWS file issues. for surei will avoid force pushing going forward and just add new commits normally. Appreciate the guidance!

expat.errors.XML_ERROR_FINISHED)

def test_reentrant_parse_crash(self):
from xml.parsers import expat
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is already imported at the begaining of this file: https://github.com/python/cpython/blob/main/Lib/test/test_pyexpat.py#L18

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes thats true, Thanks @aisk i already updated the pr with a new push to remove the import.

Copy link
Member

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please address the comments.

I suspect that this has been generated by an LLM. If an LLM replies to me I will not consider this PR as being valid as this goes against our ToS about LLMs. So carefully address those comments and properly review any LLM output before committing anything.

p = expat.ParserCreate()
data = "<root><child/></root>".encode('utf-8')
try:
p.Parse(data, 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use 0 or 1 for the second argument of parse but real booleans.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks you righ i will change it booleans

data = "<root><child/></root>".encode('utf-8')
try:
p.Parse(data, 1)
except RuntimeError:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this. If the test fails it must fail with the RuntimeError or any other exception that was unexpected

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah okay I will remove the try/except and let any unexpected exceptions propagate, the test should fail naturally.

self.assertEqual(str(cm.exception),
"cannot call Parse() from within a handler")

def test_parse_normal(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we even need this test in the first place? we already test this somewhere don't we?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for this, yeah test ensures that normal parsing without re-entrant call still works after the change. i will just check if there's already an existing test that covers normal parsing, if so, i will get rid of it to avoid duplication. If not, I'll keep it but simplify it, what do you think @picnixz ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well yes that is what I am asking.

Comment on lines +288 to +292
data = b"\xff\xfe<\x00a\x00>\x00x\x00"
with self.assertRaises(RuntimeError) as cm:
for i in range(len(data)):
p.Parse(data[i:i+1], i == len(data) - 1)
self.assertEqual(str(cm.exception),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this complex test? just have p.Parse(data, False) and that's it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay yeah, you're right, the incremental parsing loop isn't necessary for the test, the re-entrant condition can be triggered with a single call, but I will simplify to p.Parse(data, True). The original reproduction code used incremental parsing, but the test only needs to verify that the RuntimeError is raised.

p.CharacterDataHandler = handler

p.StartElementHandler = start
data = b"\xff\xfe<\x00a\x00>\x00x\x00"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this payload?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the payload b"\xff\xfe<\x00a\x00>\x00x\x00" is a UTF-16 encoded XML fragment from the original issue, the \xff\xfe is a BOM (Byte Order Mark) for UTF-16, followed by x. This specific payload triggered the re-entrant crash in the original report.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But you do not need such payload or? It should be triggered by just reparsing the same data in the handler.

expat.errors.XML_ERROR_FINISHED)

def test_reentrant_parse_crash(self):
p = expat.ParserCreate(encoding="utf-16")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why using utf16?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the utf-16 encoding is part of the original reproduction code from issue #146169, the crash was specifically triggered with utf-16 encoded data, so keeping it ensures we're testing the exact scenario that caused the problem.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is wrong. We just want to check for reentrancy. Handler doesn't care about the payload encoding as long as it is being called.


if (self->in_callback) {
PyErr_SetString(PyExc_RuntimeError,
"cannot call Parse() from within a handler");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are other re-entrant calls raising a RuntimeError or something else?

Copy link
Contributor Author

@okiemute04 okiemute04 Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently only the Parse() method is blocked. Other re-entrant scenarios like ParseFile() or ParseBuffer() would still crash. Let me investigate whether those need similar protection.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok this is clearly an LLM answer. Since it will be faster if I were to address this I am closing this PR.

@bedevere-app
Copy link

bedevere-app bot commented Mar 20, 2026

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@picnixz picnixz closed this Mar 20, 2026
@picnixz
Copy link
Member

picnixz commented Mar 20, 2026

If you want to contribute with an LLM agent:

  • read our policy in the devguide
  • always review the code
  • never let an agent reply in your stead

If you do not follow those guidelines we will restrict your access.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Segfault/UB from expat when re-entering the XML Parser

3 participants