
fix: css_selector for raw://, ~ sibling combinator, background image extraction (#1484, #1254, #1691) #1826

Closed
hafezparast wants to merge 1 commit into unclecode:develop from hafezparast:fix/maysam-css-selector-sibling-bgimg-1484-1254-1691


@hafezparast

Summary

Changes

  • crawl4ai/content_scraping_strategy.py: Added css_selector filtering in _scrap() before target_elements processing; added background image extraction from style attributes and data-* attributes on non-<img> elements
  • crawl4ai/extraction_strategy.py: Updated all 5 _resolve_source() implementations to accept ~ general sibling combinator; updated abstract method docstring
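To illustrate the background image change, here is a minimal standard-library sketch of pulling image URLs out of style attributes and data-* attributes on non-<img> elements. It is not the code from this PR: the names BG_URL_RE, IMAGE_EXTENSIONS, BackgroundImageCollector, and extract_background_images are hypothetical, and the extension list is an assumption.

```python
import re
from html.parser import HTMLParser

# Hypothetical names for illustration only; not the crawl4ai implementation.
# Matches url(...) inside a `background` or `background-image` declaration.
BG_URL_RE = re.compile(
    r"background(?:-image)?\s*:[^;]*?url\(\s*['\"]?([^'\")]+?)['\"]?\s*\)",
    re.IGNORECASE,
)
# Assumed set of extensions that mark a data-* value as an image URL.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".webp", ".avif", ".svg")


class BackgroundImageCollector(HTMLParser):
    """Collects image URLs from style and data-* attributes of non-<img> tags."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            return  # <img> tags are assumed to be handled by the regular image path
        for name, value in attrs:
            if not value:
                continue
            if name == "style":
                self.urls.extend(BG_URL_RE.findall(value))
            elif name.startswith("data-"):
                # Ignore the query string when checking the file extension.
                path = value.split("?", 1)[0].lower()
                if path.endswith(IMAGE_EXTENSIONS):
                    self.urls.append(value)


def extract_background_images(html):
    collector = BackgroundImageCollector()
    collector.feed(html)
    collector.close()
    return collector.urls


sample = """
<div style="background-image: url('https://example.com/hero.jpg')"></div>
<section data-bg="/img/banner.webp?v=2"></section>
<img src="/img/logo.png" data-src="/img/logo.png">
<div data-id="42"></div>
"""
print(extract_background_images(sample))
# ['https://example.com/hero.jpg', '/img/banner.webp?v=2']
```

The <img> tag and the non-image data-id attribute are skipped, which mirrors the intent described above of only adding images that the <img> path would miss.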

Test plan

  • New test suite: tests/test_issue_1484_css_selector.py (10 tests)
  • New test suite: tests/test_issue_1254_sibling_selectors.py (10 tests)
  • New test suite: tests/test_issue_1691_background_images.py (11 tests)
  • Regression suite: 304/305 passing (1 pre-existing HuggingFace failure, no new regressions)

Generated with Claude Code

…extraction (unclecode#1484, unclecode#1254, unclecode#1691)

- unclecode#1484: css_selector param in _scrap() was accepted but never applied for
  raw:// URLs. Now filters content before target_elements processing.
- unclecode#1254: _resolve_source() only supported + (adjacent sibling). Added ~
  (general sibling) combinator across all 5 extraction strategy implementations.
- unclecode#1691: Image extraction only processed <img> tags. Now also extracts CSS
  background-image URLs and data-* attributes with image extensions on
  non-img elements (Elementor, page builders).
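
For context on the #1254 item, the difference between the two CSS sibling combinators can be sketched with the standard library. This only illustrates the selector semantics and is not the _resolve_source() code; the helper names following_siblings, adjacent_sibling, and general_siblings are hypothetical, and matching is by tag name only.

```python
import xml.etree.ElementTree as ET


def following_siblings(parent, anchor):
    """Return the element siblings that come after `anchor` under `parent`."""
    children = list(parent)
    return children[children.index(anchor) + 1:]


def adjacent_sibling(parent, anchor, tag):
    """`anchor + tag`: the immediately following sibling, if its tag matches."""
    rest = following_siblings(parent, anchor)
    return [rest[0]] if rest and rest[0].tag == tag else []


def general_siblings(parent, anchor, tag):
    """`anchor ~ tag`: every later sibling whose tag matches."""
    return [el for el in following_siblings(parent, anchor) if el.tag == tag]


doc = ET.fromstring("<ul><li id='head'/><p/><li id='a'/><li id='b'/></ul>")
head = doc.find("li[@id='head']")

# The element right after `head` is <p>, so `+ li` matches nothing.
print([el.get("id") for el in adjacent_sibling(doc, head, "li")])  # []
# `~ li` skips the <p> and matches both later <li> elements.
print([el.get("id") for el in general_siblings(doc, head, "li")])  # ['a', 'b']
```

A source selector that relies on + fails as soon as any other element sits between the anchor and its target, which is the case ~ is meant to cover.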

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@ntohidi
Collaborator

ntohidi commented Mar 12, 2026

@hafezparast, thank you for your contribution.

Here are some notes:
Issue #1691 is already closed.
Issue #1254 is already fixed and in the develop branch.

Please update the PR to address only the fix for issue #1484.

@ntohidi ntohidi closed this Mar 13, 2026


2 participants