-
-
Notifications
You must be signed in to change notification settings - Fork 34.3k
gh-146169: Prevent re-entrant Parse() calls in pyexpat #146179
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
c924f57
0e67052
029e408
57faef4
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -276,6 +276,30 @@ def test_parse_again(self): | |
| self.assertEqual(expat.ErrorString(cm.exception.code), | ||
| expat.errors.XML_ERROR_FINISHED) | ||
|
|
||
| def test_reentrant_parse_crash(self): | ||
| p = expat.ParserCreate(encoding="utf-16") | ||
|
|
||
| def start(name, attrs): | ||
| def handler(data): | ||
| p.Parse(data, 0) | ||
| p.CharacterDataHandler = handler | ||
|
|
||
| p.StartElementHandler = start | ||
| data = b"\xff\xfe<\x00a\x00>\x00x\x00" | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why this payload?
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. But you do not need such payload or? It should be triggered by just reparsing the same data in the handler. |
||
| with self.assertRaises(RuntimeError) as cm: | ||
| for i in range(len(data)): | ||
| p.Parse(data[i:i+1], i == len(data) - 1) | ||
| self.assertEqual(str(cm.exception), | ||
|
Comment on lines
+288
to
+292
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why this complex test? just have p.Parse(data, False) and that's it. |
||
| "cannot call Parse() from within a handler") | ||
|
|
||
| def test_parse_normal(self): | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why do we even need this test in the first place? we already test this somewhere don't we?
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Well yes that is what I am asking. |
||
| p = expat.ParserCreate() | ||
| data = "<root><child/></root>".encode('utf-8') | ||
| try: | ||
| p.Parse(data, 1) | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Don't use 0 or 1 for the second argument of parse but real booleans. |
||
| except RuntimeError: | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Remove this. If the test fails it must fail with the RuntimeError or any other exception that was unexpected |
||
| self.fail("Parse() raised RuntimeError during normal operation") | ||
|
|
||
| class NamespaceSeparatorTest(unittest.TestCase): | ||
| def test_legal(self): | ||
| # Tests that make sure we get errors when the namespace_separator value | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,3 @@ | ||
| Prevent re-entrant calls to ``Parse()`` from within expat handlers, | ||
| which could cause a crash. Now raises :exc:`RuntimeError` when such a | ||
| call is attempted. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -857,6 +857,13 @@ pyexpat_xmlparser_Parse_impl(xmlparseobject *self, PyTypeObject *cls, | |
| PyObject *data, int isfinal) | ||
| /*[clinic end generated code: output=8faffe07fe1f862a input=053e0f047e55c05a]*/ | ||
| { | ||
|
|
||
| if (self->in_callback) { | ||
| PyErr_SetString(PyExc_RuntimeError, | ||
| "cannot call Parse() from within a handler"); | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Are other re-entrant calls raising a RuntimeError or something else?
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok this is clearly an LLM answer. Since it will be faster if I were to address this I am closing this PR. |
||
| return NULL; | ||
| } | ||
|
|
||
| const char *s; | ||
| Py_ssize_t slen; | ||
| Py_buffer view; | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why using utf16?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is wrong. We just want to check for reentrancy. Handler doesn't care about the payload encoding as long as it is being called.