Toward SearchableToolSet and cross-model ToolSearch #3680
| class SearchableToolset(AbstractToolset[AgentDepsT]): | ||
| """A toolset that implements tool search and deferred tool loading.""" | ||
|
|
||
| toolset: AbstractToolset[AgentDepsT] |
Have a look at WrapperToolset which already handles this + properly forwards __aexit__ and __aenter__!
| @@ -0,0 +1,136 @@ | |||
| """Minimal example to test SearchableToolset functionality. | |||
It looks like proper tests need to go into:
- test_toolsets.py has space for unit tests
- the VCR cassettes (stored somewhere in the repo) that record an interaction with an LLM, which could be useful here
I just wanted to get something quick to iterate with an actual LLM. This ended up working on Claude but took a few iterations on the prompt. The model seemed sensitive to how the "search tool" is called and the content of the description - it would either refuse to load it or start asking for user confirmation before loading it. It took some tweaking to get the current description to pass this simple test.
❯ uv run python test_searchable_example.py
============================================================
Testing SearchableToolset
============================================================
Test 1: Calculation task
------------------------------------------------------------
2025-12-11 07:20:48,189 - root - DEBUG - SearchableToolset.get_tools
2025-12-11 07:20:48,189 - root - DEBUG - SearchableToolset.get_tools ==> ['load_tools']
Result: I can calculate that for you directly.
123 multiplied by 456 equals **56,088**.
Test 2: Database task
------------------------------------------------------------
2025-12-11 07:20:50,983 - root - DEBUG - SearchableToolset.get_tools
2025-12-11 07:20:50,984 - root - DEBUG - SearchableToolset.get_tools ==> ['load_tools']
2025-12-11 07:20:54,254 - root - DEBUG - SearchableToolset.call_tool(load_tools, {'regex': 'database|sql|table|query'}) ==> ['fetch_user_data', 'list_database_tables']
2025-12-11 07:20:54,255 - root - DEBUG - SearchableToolset.get_tools
2025-12-11 07:20:54,255 - root - DEBUG - SearchableToolset.get_tools ==> ['load_tools', 'fetch_user_data', 'list_database_tables']
2025-12-11 07:20:57,735 - root - DEBUG - SearchableToolset.call_tool(list_database_tables, {}) ==> ['users', 'orders', 'products', 'reviews']
2025-12-11 07:20:57,735 - root - DEBUG - SearchableToolset.call_tool(fetch_user_data, {'user_id': 42}) ==> {'id': 42, 'name': 'John Doe', 'email': '[email protected]'}
2025-12-11 07:20:57,735 - root - DEBUG - SearchableToolset.get_tools
2025-12-11 07:20:57,736 - root - DEBUG - SearchableToolset.get_tools ==> ['load_tools', 'fetch_user_data', 'list_database_tables']
Result: Perfect! Here are the results:
**Database Tables:**
- users
- orders
- products
- reviews
**User 42 Data:**
- ID: 42
- Name: John Doe
- Email: [email protected]
Test 3: Weather task
------------------------------------------------------------
2025-12-11 07:21:00,605 - root - DEBUG - SearchableToolset.get_tools
2025-12-11 07:21:00,607 - root - DEBUG - SearchableToolset.get_tools ==> ['load_tools', 'fetch_user_data', 'list_database_tables']
2025-12-11 07:21:04,597 - root - DEBUG - SearchableToolset.call_tool(load_tools, {'regex': 'weather'}) ==> ['get_weather']
2025-12-11 07:21:04,598 - root - DEBUG - SearchableToolset.get_tools
2025-12-11 07:21:04,599 - root - DEBUG - SearchableToolset.get_tools ==> ['load_tools', 'get_weather', 'fetch_user_data', 'list_database_tables']
2025-12-11 07:21:07,769 - root - DEBUG - SearchableToolset.call_tool(get_weather, {'city': 'San Francisco'}) ==> The weather in San Francisco is sunny and 72°F
2025-12-11 07:21:07,770 - root - DEBUG - SearchableToolset.get_tools
2025-12-11 07:21:07,771 - root - DEBUG - SearchableToolset.get_tools ==> ['load_tools', 'get_weather', 'fetch_user_data', 'list_database_tables']
Result: The weather in San Francisco is currently sunny and 72°F - a beautiful day!
| from ..tools import ToolDefinition | ||
| from .abstract import AbstractToolset, SchemaValidatorProt, ToolsetTool | ||
|
|
||
| _SEARCH_TOOL_NAME = 'load_tools' |
Another curious bit is that when the tool was called "more_tools", I hit a crash:
Traceback (most recent call last):
File "/Users/anton/code/pydantic-ai/test_searchable_example.py", line 136, in <module>
asyncio.run(main())
File "/Users/anton/.local/share/uv/python/cpython-3.12.11-macos-aarch64-none/lib/python3.12/asyncio/runners.py", line 195, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "/Users/anton/.local/share/uv/python/cpython-3.12.11-macos-aarch64-none/lib/python3.12/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/anton/.local/share/uv/python/cpython-3.12.11-macos-aarch64-none/lib/python3.12/asyncio/base_events.py", line 691, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "/Users/anton/code/pydantic-ai/test_searchable_example.py", line 123, in main
result = await agent.run("Can you list the database tables and then fetch user 42?")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/anton/code/pydantic-ai/pydantic_ai_slim/pydantic_ai/agent/abstract.py", line 226, in run
async with self.iter(
^^^^^^^^^^
File "/Users/anton/.local/share/uv/python/cpython-3.12.11-macos-aarch64-none/lib/python3.12/contextlib.py", line 231, in __aexit__
await self.gen.athrow(value)
File "/Users/anton/code/pydantic-ai/pydantic_ai_slim/pydantic_ai/agent/__init__.py", line 658, in iter
async with graph.iter(
^^^^^^^^^^^
File "/Users/anton/.local/share/uv/python/cpython-3.12.11-macos-aarch64-none/lib/python3.12/contextlib.py", line 231, in __aexit__
await self.gen.athrow(value)
File "/Users/anton/code/pydantic-ai/pydantic_graph/pydantic_graph/beta/graph.py", line 270, in iter
async with GraphRun[StateT, DepsT, OutputT](
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/anton/code/pydantic-ai/pydantic_graph/pydantic_graph/beta/graph.py", line 423, in __aexit__
await self._async_exit_stack.__aexit__(exc_type, exc_val, exc_tb)
File "/Users/anton/.local/share/uv/python/cpython-3.12.11-macos-aarch64-none/lib/python3.12/contextlib.py", line 754, in __aexit__
raise exc_details[1]
File "/Users/anton/.local/share/uv/python/cpython-3.12.11-macos-aarch64-none/lib/python3.12/contextlib.py", line 735, in __aexit__
cb_suppress = cb(*exc_details)
^^^^^^^^^^^^^^^^
File "/Users/anton/.local/share/uv/python/cpython-3.12.11-macos-aarch64-none/lib/python3.12/contextlib.py", line 158, in __exit__
self.gen.throw(value)
File "/Users/anton/code/pydantic-ai/pydantic_graph/pydantic_graph/beta/graph.py", line 978, in _unwrap_exception_groups
raise exception
File "/Users/anton/code/pydantic-ai/pydantic_graph/pydantic_graph/beta/graph.py", line 750, in _run_tracked_task
result = await self._run_task(t_)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/anton/code/pydantic-ai/pydantic_graph/pydantic_graph/beta/graph.py", line 779, in _run_task
output = await node.call(step_context)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/anton/code/pydantic-ai/pydantic_graph/pydantic_graph/beta/step.py", line 253, in _call_node
return await node.run(GraphRunContext(state=ctx.state, deps=ctx.deps))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/anton/code/pydantic-ai/pydantic_ai_slim/pydantic_ai/_agent_graph.py", line 576, in run
async with self.stream(ctx):
^^^^^^^^^^^^^^^^
File "/Users/anton/.local/share/uv/python/cpython-3.12.11-macos-aarch64-none/lib/python3.12/contextlib.py", line 217, in __aexit__
await anext(self.gen)
File "/Users/anton/code/pydantic-ai/pydantic_ai_slim/pydantic_ai/_agent_graph.py", line 590, in stream
async for _event in stream:
File "/Users/anton/code/pydantic-ai/pydantic_ai_slim/pydantic_ai/_agent_graph.py", line 716, in _run_stream
async for event in self._events_iterator:
File "/Users/anton/code/pydantic-ai/pydantic_ai_slim/pydantic_ai/_agent_graph.py", line 677, in _run_stream
async for event in self._handle_tool_calls(ctx, tool_calls):
File "/Users/anton/code/pydantic-ai/pydantic_ai_slim/pydantic_ai/_agent_graph.py", line 732, in _handle_tool_calls
async for event in process_tool_calls(
File "/Users/anton/code/pydantic-ai/pydantic_ai_slim/pydantic_ai/_agent_graph.py", line 925, in process_tool_calls
ctx.state.increment_retries(ctx.deps.max_result_retries, model_settings=ctx.deps.model_settings)
File "/Users/anton/code/pydantic-ai/pydantic_ai_slim/pydantic_ai/_agent_graph.py", line 127, in increment_retries
raise exceptions.UnexpectedModelBehavior(message)
pydantic_ai.exceptions.UnexpectedModelBehavior: Exceeded maximum retries (1) for output validation
Interesting, that suggests that the model was not calling it correctly (wrong args possibly). I suggest adding https://ai.pydantic.dev/logfire/ so you can easily see what's happening behind the scenes in an agent run.
| regex: str | ||
|
|
||
|
|
||
| def _search_tool_def() -> ToolDefinition: |
Check out Tool.from_schema and the Tool constructor that takes a function (as used by FunctionToolset) for easier ways to construct a single tool. The function approach is the easiest by far
| description="""Search and load additional tools to make them available to the agent. | ||
| DO call this to find and load more tools needed for a task. | ||
| NEVER ask the user if you should try loading tools, just try. |
Hmm, I see you explained below that this was needed to pass the tests, even for Sonnet 4.5, but tokens are expensive so it'll be worth another iteration on this.
| parameters_json_schema={ | ||
| 'type': 'object', | ||
| 'properties': { | ||
| 'regex': { |
I like pattern slightly better as an argument name, as we may at some point support different ones. That said, regex is very helpful to the model in knowing what to put here, in case we remove/shorten the description.
| all_tools: dict[str, ToolsetTool[AgentDepsT]] = {} | ||
| all_tools[_SEARCH_TOOL_NAME] = _SearchTool( | ||
| toolset=self, | ||
| max_retries=1, |
We may want to increase this, to give the model a few chances to fix its regex, if it submitted an invalid one the first time
| ) -> Any: | ||
| if isinstance(tool, _SearchTool): | ||
| adapter = TypeAdapter(_SearchToolArgs) | ||
| typed_args = adapter.validate_python(tool_args) |
Arguments will/should already have been validated by this point when used through ToolManager/Agent!
Interesting, was not obvious from the types, but sounds like I can just cast this. Thanks.
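A minimal sketch of the cast, assuming `_SearchToolArgs` is a `TypedDict` with a single `regex: str` key (the diff shows the field and the `args['regex']` access, but not the class definition). `SearchToolArgs` and `read_regex` are illustrative names, not part of the PR.

```python
from typing import Any, TypedDict, cast


class SearchToolArgs(TypedDict):
    """Assumed shape of the search tool's arguments (the PR's `_SearchToolArgs`)."""

    regex: str


def read_regex(tool_args: dict[str, Any]) -> str:
    # cast() does nothing at runtime; it only tells the type checker that the
    # arguments were already validated before they reached this point.
    typed_args = cast(SearchToolArgs, tool_args)
    return typed_args['regex']
```

This only holds if the arguments really were validated upstream; called directly with unvalidated input, the cast would hide a missing or mistyped key.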
| matching_tool_names: list[str] = [] | ||
|
|
||
| for tool_name, tool in toolset_tools.items(): | ||
| rx = re.compile(args['regex']) |
This'll be more efficient one line up :)
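To illustrate, a minimal sketch with the compile hoisted out of the loop. `find_matching_tools` and the plain name-to-description dict are stand-ins for illustration; the PR iterates over `toolset_tools.items()` and reads `tool.tool_def`.

```python
import re


def find_matching_tools(pattern: str, descriptions: dict[str, str]) -> list[str]:
    """Return the names of tools whose name or description matches the pattern."""
    rx = re.compile(pattern)  # compiled once, not once per tool
    return [
        name
        for name, description in descriptions.items()
        if rx.search(name) or rx.search(description)
    ]
```

The `re` module caches compiled patterns internally, so the saving is small in practice, but hoisting also makes it clear that the pattern does not depend on the loop variable.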
|
|
||
| for tool_name, tool in toolset_tools.items(): | ||
| rx = re.compile(args['regex']) | ||
| if rx.search(tool.tool_def.name) or rx.search(tool.tool_def.description): |
For error handling, check out the ModelRetry exception
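A rough sketch of what that could look like for an invalid regex. The `ModelRetry` class below is a local stand-in so the snippet runs without the library (in the PR it would be imported from pydantic_ai), and `compile_search_pattern` is a hypothetical helper name.

```python
import re


class ModelRetry(Exception):
    """Stand-in for pydantic_ai's ModelRetry so this sketch is self-contained."""


def compile_search_pattern(pattern: str) -> re.Pattern[str]:
    try:
        return re.compile(pattern)
    except re.error as exc:
        # Report the problem back to the model so it can submit a corrected regex.
        raise ModelRetry(f'Invalid regex {pattern!r}: {exc}') from exc
```

For the retry to be useful, the search tool's `max_retries` has to leave the model at least one more attempt.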
| """A toolset that implements tool search and deferred tool loading.""" | ||
|
|
||
| toolset: AbstractToolset[AgentDepsT] | ||
| _active_tool_names: set[str] = field(default_factory=set) |
The fact that this has instance variables means that it can't be reused across multiple agent runs, even though the same instance is registered to an agent just once... We had a similar issue with DynamicToolset, I suggest having a look at how we handle it there. We could also leverage ctx.run_id to store state on the same SearchableToolset instance, but separate it for each agent run.
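A minimal sketch of the `ctx.run_id` idea, assuming the run ID is a string that is unique per agent run. `PerRunActiveTools` and its methods are hypothetical names; this is not how DynamicToolset handles it.

```python
from collections import defaultdict


class PerRunActiveTools:
    """Hypothetical helper: tracks revealed tool names separately for each run."""

    def __init__(self) -> None:
        self._active_by_run: defaultdict[str, set[str]] = defaultdict(set)

    def activate(self, run_id: str, tool_names: list[str]) -> None:
        self._active_by_run[run_id].update(tool_names)

    def active(self, run_id: str) -> frozenset[str]:
        # .get avoids creating an empty entry for runs that never loaded anything.
        return frozenset(self._active_by_run.get(run_id, ()))

    def forget(self, run_id: str) -> None:
        # Call when a run finishes so state does not accumulate across runs.
        self._active_by_run.pop(run_id, None)
```

With this shape, one toolset instance can serve concurrent runs without tools loaded in one run leaking into another; the open question is where `forget` gets called.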
|
Largely following #3620 for the Anthropic built-in. I think this can coexist with SearchableTool as suggested; no big surprises here. A couple of small things to note:
|
| This is currently only used by `OpenAIChatModel`, `HuggingFaceModel`, and `GroqModel`. | ||
| """ | ||
|
|
||
| supports_tool_search: bool = False |
In #3456, we're getting supported_builtin_tools so we won't need this dedicated field anymore
| thinking_tags=('<thinking>', '</thinking>'), | ||
| supports_json_schema_output=supports_json_schema_output, | ||
| json_schema_transformer=AnthropicJsonSchemaTransformer, | ||
| supports_tool_search=True, |
Note that this is not actually supported by all models: https://platform.claude.com/docs/en/agents-and-tools/tool-use/tool-search-tool
| needs_tool_search = any(tool.get('defer_loading') for tool in tools) | ||
|
|
||
| if needs_tool_search: | ||
| beta_features.add('advanced-tool-use-2025-11-20') |
Note that different providers use different headers: https://platform.claude.com/docs/en/agents-and-tools/tool-use/tool-search-tool
| ) | ||
| elif item.name in ('bash_code_execution', 'text_editor_code_execution'): # pragma: no cover | ||
| raise NotImplementedError(f'Anthropic built-in tool {item.name!r} is not currently supported.') | ||
| elif item.name in ('tool_search_tool_regex', 'tool_search_tool_bm25'): # pragma: no cover |
Let's move this up so that the NotImplementedErrors are last
|
|
||
| async def get_tools(self, ctx: RunContext[AgentDepsT]) -> dict[str, ToolsetTool[AgentDepsT]]: | ||
| # Models that support built-in tool search are exposed to all the tools as-is. | ||
| if ctx.model.profile.supports_tool_search: |
Unfortunately, this is not going to work correctly with FallbackModel, as I realized and edited into #3666 (comment) a few days ago:
(Edit: Checking ctx.model.profile.supports_deferred_loading from inside SearchableToolset.get_tools is not going to work right when the model is a FallbackModel, as we wouldn't be checking the underlying models' abilities. So similar to #3212, we likely need to arrange things such that the model classes themselves can determine how to handle the search tool and deferred-loading tools...)
#3212 (comment) sketches out an approach for how a toolset could return both an AbstractBuiltinTool, plus tool definitions to use when that builtin tool is not available, so that the model itself can determine which to send to the API.
In the case of tool search, we'd need to have either the built-in ToolSearchTool + tool defs with defer_loading=True, or the custom search_tool + those tool defs that have already been revealed (without defer_loading=True).
The implementation I sketched in that comment does not yet account for an abstract tool coming with its own set of tool definitions.
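To make the two cases concrete, a rough sketch of the choice each underlying model would have to make. Everything here is hypothetical: `ToolDef` is a pared-down stand-in, `tools_to_send` is an invented name, and always sending non-deferred tools in the custom case is an assumption rather than something stated above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ToolDef:
    """Hypothetical, pared-down tool definition."""

    name: str
    defer_loading: bool = False


def tools_to_send(
    tool_defs: list[ToolDef],
    revealed: set[str],
    native_search: bool,
) -> tuple[str, list[ToolDef]]:
    """Return which search tool to use and the definitions to send with it."""
    if native_search:
        # Built-in search: send every definition and keep the defer_loading flags.
        return 'ToolSearchTool', list(tool_defs)
    # Custom search: send only non-deferred or already revealed tools,
    # with the defer_loading flag cleared.
    visible = [
        replace(td, defer_loading=False)
        for td in tool_defs
        if not td.defer_loading or td.name in revealed
    ]
    return 'search_tool', visible
```

The point of placing this decision in the model class is that a FallbackModel could evaluate it once per underlying model, instead of the toolset guessing from a single profile.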
| sequential=sequential, | ||
| requires_approval=requires_approval, | ||
| metadata=metadata, | ||
| defer_loading=defer_loading, |
We need to also add this field to the agent.tool and agent.tool_plain decorators.
| To defer loading a tool's definition until the model finds it, mark it as `defer_loading=True`. | ||
| Note that only models with `ModelProfile.supports_tool_search` use this builtin tool. These models receive all tool | ||
| definitions and natively implement search and loading. All other models rely on `SearchableToolset` instead. |
I'd rather explain this in the future Tools docs section on defer_loading, since the primary way users will interact with this feature will be through that field rather than this builtin tool (see above)
| # TODO proper error handling | ||
| assert tool_name != _SEARCH_TOOL_NAME | ||
|
|
||
| # Expose the tool unless it defers loading and is not yet active. |
When a user wraps a toolset in SearchableToolset, I think we should assume they mean for all of them to be defer_loading. Or perhaps we can support bool | None, and have None be interpreted as False normally, but as True here, with an explicit False still being respected here and having it always be surfaced.
| class ToolSearchTool(AbstractBuiltinTool): | ||
| """A builtin tool that searches for tools during dynamic tool discovery. | ||
| To defer loading a tool's definition until the model finds it, mark it as `defer_loading=True`. |
As mentioned in the penultimate paragraph in #3666 (comment), I envision that users will primarily interact with this feature via the defer_loading=True field on tools (rather than through ToolSearchTool or SearchableToolset), which we'd detect automatically and handle by wrapping those tools in SearchableToolset. Then, as implemented, that would fall back on Anthropic's native functionality when appropriate.
So I'd love to see a test where we register a few @agent.tool(defer_loading=True)s and then test that it works as expected with both Anthropic and OpenAI.