@sirreal
Member

@sirreal sirreal commented Dec 10, 2025

Bookmark exhaustion, typically from deep nesting, can cause the HTML Processor to throw an Exception.

The Exception is thrown by a private method. Handle the exception and return false to indicate a failure to process.
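
As a rough illustration of that approach, here is a minimal, self-contained sketch. `Toy_Processor` and its bookmark budget are hypothetical stand-ins invented for this example; they are not the WordPress class and do not reflect its internals. The sketch only shows the general shape: a private method throws, the public stepping method catches, records an error code, and returns false.

```php
<?php
/**
 * Hypothetical stand-in for illustration only; this is NOT WP_HTML_Processor.
 */
final class Toy_Processor {
    const ERROR_EXCEEDED_MAX_BOOKMARKS = 'exceeded-max-bookmarks';

    private $tokens;
    private $max_bookmarks;
    private $bookmarks  = 0;
    private $position   = -1;
    private $last_error = null;

    public function __construct( array $tokens, int $max_bookmarks ) {
        $this->tokens        = $tokens;
        $this->max_bookmarks = $max_bookmarks;
    }

    /** Returns false at the end of input and also when processing fails. */
    public function next_token(): bool {
        if ( null !== $this->last_error || $this->position + 1 >= count( $this->tokens ) ) {
            return false;
        }

        try {
            $this->bookmark_token();
        } catch ( Exception $e ) {
            // Keep the exception internal: record an error code and stop.
            $this->last_error = self::ERROR_EXCEEDED_MAX_BOOKMARKS;
            return false;
        }

        ++$this->position;
        return true;
    }

    public function get_last_error(): ?string {
        return $this->last_error;
    }

    /** Assumed behavior for this sketch: throws once the budget is used up. */
    private function bookmark_token(): void {
        if ( $this->bookmarks >= $this->max_bookmarks ) {
            throw new Exception( 'Bookmark budget exhausted.' );
        }
        ++$this->bookmarks;
    }
}
```

With a budget of two bookmarks and three tokens, the third call to `next_token()` returns false and `get_last_error()` reports the error code, while a document that fits within the budget ends with a null error.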

Trac ticket: https://core.trac.wordpress.org/ticket/64394


This Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.

Comment on lines +6307 to +6308
* @throws Exception When unable to allocate a bookmark for the next token in the input HTML document.
*
Member Author

bookmark_token() actually throws, but it's not handled here. The annotation may not be appropriate.

* otherwise might involve messier calling and return conventions.
*/
return false;
} catch ( Exception $e ) {
Member Author

Exhausted bookmarks throw a generic Exception.

This block catches the exceptions thrown by insert_virtual_token().


$bookmark_name = $this->bookmark_token();
} catch ( Exception $e ) {
if ( self::ERROR_EXCEEDED_MAX_BOOKMARKS === $this->last_error ) {
return false;
Member

Will this not perhaps lead a developer to think that they reached the end of the document, when in reality the nesting is too large (or the max bookmarks exceeded)? In that way, I think an exception is more helpful. Otherwise, wouldn't every loop over tokens in a doc need to do something like:

while ( $p->next_tag() ) {
    // ...
}
if ( WP_HTML_Processor::ERROR_EXCEEDED_MAX_BOOKMARKS === $p->get_last_error() ) {
    // Handle max bookmark error.
}

This would put the exception case in the regular code that always runs. Since exceeding the max bookmarks should be exceptional, I would think an exception is preferred:

try {
    while ( $p->next_tag() ) {
        // ...
    }
} catch ( Exception $e ) {
    if ( WP_HTML_Processor::ERROR_EXCEEDED_MAX_BOOKMARKS === $p->get_last_error() ) {
        // Handle max bookmark error.
    }
}

But since this is the only exception that WP_HTML_Tag_Processor throws (currently), then it could be just:

try {
    while ( $p->next_tag() ) {
        // ...
    }
} catch ( Exception $e ) {
    // Handle max bookmark error.
}

Member Author

> But since this is the only exception that WP_HTML_Tag_Processor throws

This is only in the WP_HTML_Processor. WP_HTML_Tag_Processor should not throw any errors.

> Will this not perhaps lead a developer to think that they reached the end of the document

That's already the case: the HTML Processor has avoided throwing errors and instead exposes problems through getters. Primarily, ::get_last_error() should be used:

<?php
require '/wordpress/wp-load.php';
echo '<plaintext>';
echo "WordPress " . wp_get_wp_version() . "\n";

$p = WP_HTML_Processor::create_fragment('<table><tbody>unsupported');
while ( $p->next_token() ) {
  var_dump( $p->get_tag() );
}
// Need to check error status.
var_dump( $p->get_last_error() );
var_dump( $p->get_unsupported_exception()->getMessage() );

When these APIs throw errors that callers are supposed to handle, it's just too easy to bring down users' sites with errors that aren't actionable for them. It's true that superficially "end of document" is the same as "error." It seems preferable that a document silently fail to fully parse instead of crashing and bringing down a site.

In either case, the developer should do another thing that's not obvious (add exception handling with try/catch or check error status after iteration with the available method). Relying on error status is the least impactful for a site if a developer overlooks this.
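
To make that trade-off concrete, here is a small sketch comparing what happens when a caller forgets both safeguards. `Toy_Walker`, its `$throws` flag, the `'exceeded-limit'` code, and `careless_render()` are all hypothetical names made up for this illustration; nothing here is the real WordPress API.

```php
<?php
/**
 * Hypothetical walker for illustration only. Depending on $throws, it
 * signals failure by throwing or by returning false with an error status.
 */
final class Toy_Walker {
    private $tags;
    private $limit;
    private $throws;
    private $index      = -1;
    private $last_error = null;

    public function __construct( array $tags, int $limit, bool $throws ) {
        $this->tags   = $tags;
        $this->limit  = $limit;
        $this->throws = $throws;
    }

    public function next_tag(): bool {
        if ( null !== $this->last_error || $this->index + 1 >= count( $this->tags ) ) {
            return false;
        }
        if ( $this->index + 1 >= $this->limit ) {
            $this->last_error = 'exceeded-limit';
            if ( $this->throws ) {
                throw new Exception( 'Exceeded limit.' );
            }
            return false;
        }
        ++$this->index;
        return true;
    }

    public function get_tag(): string {
        return $this->tags[ $this->index ];
    }

    public function get_last_error(): ?string {
        return $this->last_error;
    }
}

/** A caller that forgot both the try/catch and the error-status check. */
function careless_render( Toy_Walker $walker ): string {
    $out = '';
    while ( $walker->next_tag() ) {
        $out .= '<' . strtolower( $walker->get_tag() ) . '>';
    }
    return $out;
}
```

In this sketch the quiet variant hands the careless caller a truncated result, whereas the throwing variant lets an exception escape `careless_render()`, which in a real request would surface as a fatal error unless something further up the stack catches it.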

Member

> This is only in the WP_HTML_Processor. WP_HTML_Tag_Processor should not throw any errors.

Yes, sorry, I meant WP_HTML_Processor.

> In either case, the developer should do another thing that's not obvious (add exception handling with try/catch or check error status after iteration with the available method). Relying on error status is the least impactful for a site if a developer overlooks this.

OK, makes sense to me.

Co-authored-by: Weston Ruter <[email protected]>

if ( self::REPROCESS_CURRENT_NODE !== $node_to_process ) {
try {
$bookmark_name = $this->bookmark_token();
Member Author

@sirreal sirreal Dec 12, 2025

These exceptions really aren't helpful here and we want them to remain internal to the class. The affected methods are all private, so there's some flexibility.

I'd consider returning false or null and handling those types of values instead of using the blanket try/catch.
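
A rough sketch of what that alternative could look like, using a hypothetical `Toy_Bookmarker` class rather than the real processor: the private method returns null when no bookmark can be allocated, and the caller checks for null instead of wrapping the call in try/catch. The method names, the `step()` entry point, and the error string are assumptions made for the example.

```php
<?php
/**
 * Hypothetical class for illustration only; not the WordPress implementation.
 */
final class Toy_Bookmarker {
    private $max;
    private $count      = 0;
    private $last_error = null;

    public function __construct( int $max ) {
        $this->max = $max;
    }

    /** Returns a bookmark name, or null when none can be allocated. */
    private function bookmark_token(): ?string {
        if ( $this->count >= $this->max ) {
            return null;
        }
        return 'token-' . ( ++$this->count );
    }

    /** Processes one step; false means the step could not be completed. */
    public function step(): bool {
        $bookmark_name = $this->bookmark_token();
        if ( null === $bookmark_name ) {
            // No exception involved: the failure is an ordinary return value.
            $this->last_error = 'exceeded-max-bookmarks';
            return false;
        }
        return true;
    }

    public function get_last_error(): ?string {
        return $this->last_error;
    }
}
```

The design choice here is that failure handling stays local to each call site, so an unrelated exception raised inside the same block would not be swallowed by a blanket catch.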
