
fix: css_selector ignored in LXML scraping for raw:// URLs (#1484) #1833

Open
hafezparast wants to merge 1 commit into unclecode:develop from hafezparast:fix/maysam-css-selector-raw-1484


@hafezparast

Summary

Changes

  • crawl4ai/content_scraping_strategy.py: Added css_selector filtering before target_elements processing; target_elements now searches within the css_selector result instead of the full body
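
The ordering described above (css_selector scopes the DOM first, then target_elements searches only inside that scope) can be sketched as follows. This is a simplified stand-in and not the code in crawl4ai/content_scraping_strategy.py: it uses the standard library's xml.etree.ElementTree instead of lxml, and `select` and `scope_then_target` are hypothetical helpers that only understand `tag`, `.class`, and `tag.class` selectors.

```python
import xml.etree.ElementTree as ET


def select(root, selector):
    """Hypothetical mini-selector: handles 'tag', '.class', 'tag.class' only."""
    tag, _, cls = selector.partition(".")
    matches = []
    # iter(None) walks every element (including root) in document order.
    for el in root.iter(tag or None):
        if not cls or cls in el.get("class", "").split():
            matches.append(el)
    return matches


def scope_then_target(html, css_selector=None, target_elements=None):
    """Apply css_selector first, then look for target_elements inside it."""
    body = ET.fromstring(html)

    # Step 1: css_selector narrows the DOM; without it the scope is the body.
    scopes = select(body, css_selector) if css_selector else [body]
    if not target_elements:
        return scopes

    # Step 2: target_elements are searched within each scope, not the full body.
    # Nested scopes could yield duplicates; this sketch does not dedupe them.
    found = []
    for scope in scopes:
        for target in target_elements:
            found.extend(select(scope, target))
    return found


html = (
    '<body><div class="main"><p class="keep">in</p></div>'
    '<p class="keep">out</p></body>'
)
scoped = scope_then_target(html, ".main", [".keep"])
unscoped = scope_then_target(html, None, [".keep"])
```

With the scope applied, only the paragraph inside `.main` is returned; without it, both paragraphs match, which mirrors the pre-fix behavior where target_elements searched the full body.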

Test plan

  • New test suite: tests/test_issue_1484_css_selector.py (10 tests)
  • Regression suite: 304 passed, 1 pre-existing failure (no new regressions)

Generated with Claude Code

…#1484)

css_selector was skipped in _scrap() — only target_elements was
applied. Now css_selector filters the DOM first, then target_elements
narrows within that selection.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>