Open
Labels
t-tooling: Issues with this label are in the ownership of the tooling team.
Description
I initialized my project using the Crawlee CLI with Playwright and Camoufox, and I am trying to configure user_data_dir so that login sessions persist between runs.
Although the profile data is correctly created in the specified user_data_dir after logging in to Google, the session is not reused on subsequent runs. When I restart the crawler, it redirects to the Google login page instead of restoring the previous authenticated session.
from camoufox import AsyncNewBrowser
from crawlee import Request
from crawlee._utils.context import ensure_context
from crawlee.browsers import PlaywrightBrowserPlugin, PlaywrightBrowserController, BrowserPool
from crawlee.crawlers import PlaywrightCrawler
from typing_extensions import override

from .constants.HandlerType import HandlerType
from .routes import router


class CamoufoxPlugin(PlaywrightBrowserPlugin):
    """Example browser plugin that uses Camoufox Browser, but otherwise keeps the functionality of
    PlaywrightBrowserPlugin."""

    def __init__(self, user_data_dir: str = None):
        super().__init__()
        self.user_data_dir = user_data_dir

    @ensure_context
    @override
    async def new_browser(self) -> PlaywrightBrowserController:
        if not self._playwright:
            raise RuntimeError('Playwright browser plugin is not initialized.')

        return PlaywrightBrowserController(
            browser=(await AsyncNewBrowser(self._playwright, persistent_context=True,
                                           headless=False,
                                           user_data_dir=self.user_data_dir,
                                           )).browser,
            max_open_pages_per_browser=1,  # Increase, if camoufox can handle it in your use case.
            header_generator=None,  # This turns off the crawlee header_generation. Camoufox has its own.
        )


async def main() -> None:
    """The crawler entry point."""
    crawler = PlaywrightCrawler(
        max_request_retries=0,
        max_requests_per_crawl=10,
        request_handler=router,
        fingerprint_generator=None,
        browser_pool=BrowserPool(
            plugins=[CamoufoxPlugin('xxx')]),
    )

    await crawler.run(
        [
            Request.from_url(url='https://accounts.google.com/', label=HandlerType.GOOGLE_LOGIN, user_data={
                'email': 'xxx',
                'password': 'xxx'
            })
        ]
    )
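The description says that profile data is written to user_data_dir after login. One way to narrow this down might be to check whether any cookies for the login host actually reach disk between runs. The following is a minimal diagnostic sketch, not part of Crawlee or Camoufox. It assumes a Firefox-style profile layout, where persistent cookies are stored in a `cookies.sqlite` file with a `moz_cookies` table, and the helper name `count_persisted_cookies` is hypothetical.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def count_persisted_cookies(user_data_dir: str, host_suffix: str) -> int:
    """Count on-disk cookies whose host ends with host_suffix.

    Assumption: the profile follows the Firefox layout, i.e. persistent
    cookies live in `cookies.sqlite` inside a `moz_cookies` table.
    Returns 0 when the cookie database does not exist.
    """
    cookie_db = Path(user_data_dir) / 'cookies.sqlite'
    if not cookie_db.is_file():
        return 0

    # Query a copy so that a lock held by a running browser does not interfere.
    # Only the main database file is copied, so very recent writes that are
    # still in a separate write-ahead log may be missing from the count.
    with tempfile.TemporaryDirectory() as tmp:
        snapshot = Path(tmp) / 'cookies.sqlite'
        shutil.copy2(cookie_db, snapshot)
        connection = sqlite3.connect(snapshot)
        try:
            row = connection.execute(
                'SELECT COUNT(*) FROM moz_cookies WHERE host LIKE ?',
                (f'%{host_suffix}',),
            ).fetchone()
        finally:
            connection.close()
    return row[0]
```

Calling `count_persisted_cookies('xxx', 'google.com')` after the first run, with `'xxx'` standing in for the real profile path as in the snippet above, would show whether anything was stored for that host. A result of 0 would suggest the session never reached disk. A non-zero result would suggest the cookies are saved but the next run does not load that profile.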