6 changes: 6 additions & 0 deletions docs/output.md
@@ -315,6 +315,12 @@ When the model calls other tools in parallel with an output tool, you can contro

The `'exhaustive'` strategy is useful when tools have important side effects (like logging, sending notifications, or updating metrics) that should always execute.

!!! warning "Streaming vs Sync Behavior Difference"
Collaborator

It's not exactly streaming vs sync; it's just run_stream that's odd. So maybe it makes sense to mention this under https://ai.pydantic.dev/agents/#streaming-events-and-final-output, either as well, or instead. I say "instead" because we already give context there on run_stream always preferring the first final result it sees, which is in line with what we're explaining here. So maybe that place should have the primary explanation, and then here we can link to there

`run_stream()` behaves differently from `run()` and `run_sync()` when choosing the final result:
Collaborator

  • Please link to the API docs for run_stream
  • It's not just run and run_sync, also iter, run_stream_events, etc. So maybe link to agents.md#running-agents and just say "the other run methods"


- **`run_stream()`**: The first called tool that **can** produce a final result (output or deferred) becomes the final result
Collaborator

Let's link to the deferred tool docs

- **`run()` / `run_sync()`**: The first **output** tool becomes the final result. If none are called, all **deferred** tools become the final result as `DeferredToolRequests`
Contributor Author

> If none are called

The wording is like this because we get an UnexpectedModelBehavior if both output and deferred tools are called but none of the output tool calls pass validation.

Contributor Author

If I understand correctly, the deferred tools args are not validated

I guess it means we could check whether any deferred tool calls are present and tool_call_results is None, and in that case ignore invalid output tool calls, as we do here:

yield _messages.FunctionToolCallEvent(call)
output_parts.append(e.tool_retry)
yield _messages.FunctionToolResultEvent(e.tool_retry)

But I'm not sure about it...
And IMHO, if we do this at all, it should be done in a separate PR.

Collaborator

> If I understand correctly, the deferred tools args are not validated

That's true for external tools, but not for function tools that raise CallDeferred, which do have args validated before the function is called

> I guess it means we could check whether any deferred tool calls are present and tool_call_results is None, and in that case ignore invalid output tool calls, as we do here:

I don't quite understand what you mean

Contributor Author (@Danipulok, Dec 16, 2025)

> I don't quite understand what you mean

I mean we could try to detect whether the same model response contains any valid deferred tool calls. If it does, we would not raise UnexpectedModelBehavior, but would ignore the invalid output tool calls instead.

Collaborator

@Danipulok I think the behavior of all-invalid output tool calls + deferred tool calls (valid or otherwise) should be identical to the behavior of all-invalid output tool calls + function tool calls. That means the end_strategy determines whether the tools are executed, but either way, if the model called any output tools (valid or otherwise), we interpret that to mean the model is ready to end the agent run: if an output tool call was valid, we return its result directly, and if not, we tell the model to try again, but we don't send it the results of the other tool calls.

With deferred tool calls, the consistent behavior would be to never return DeferredToolRequests if there were any output tool calls, because that would imply that the run will be resumed later with the deferred tool results. So if none of the output tools are valid, and the model is told to retry and keeps failing, I think the output validation UnexpectedModelBehavior exception is correct.
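The rule described in this comment can be sketched in plain Python. This is only an illustrative model, not pydantic-ai's implementation: `Call`, `resolve_response`, and the outcome strings are hypothetical names invented for the sketch, and `valid` stands in for whatever output validation actually does.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass
class Call:
    """Hypothetical stand-in for one tool call in a model response."""
    name: str
    kind: Literal["output", "deferred", "function"]
    valid: bool = True  # whether the call's args would pass validation


def resolve_response(calls: list[Call]) -> tuple[str, list[str]]:
    """Return (outcome, tool names) for a single model response."""
    output_calls = [c for c in calls if c.kind == "output"]
    if output_calls:
        # Any output tool call means the model is trying to end the run.
        for c in output_calls:
            if c.valid:
                return ("final_output", [c.name])
        # All output calls were invalid: the model is asked to retry, and
        # deferred calls are NOT returned as the result. Repeated failures
        # would eventually surface as UnexpectedModelBehavior.
        return ("retry", [c.name for c in output_calls])
    deferred = [c.name for c in calls if c.kind == "deferred"]
    if deferred:
        # Only without output tool calls do deferred calls end the run.
        return ("deferred_requests", deferred)
    return ("continue", [c.name for c in calls])
```

In this model, an invalid output call next to a deferred call yields `"retry"` rather than `"deferred_requests"`, which matches the conclusion that the existing exception is the consistent outcome.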

Contributor Author

Got it, thank you!
So if both output tool calls and deferred tool calls are made in the same response, and all output tool calls fail validation, the UnexpectedModelBehavior exception is correct.
Then the current behavior is okay.
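For readers skimming this thread, the difference that the proposed docs warning describes can be illustrated with a small sketch. It assumes that every call passes validation, and it uses hypothetical helper names (`pick_run_stream`, `pick_other_run_methods`) that are not part of pydantic-ai.

```python
from typing import Literal, Optional

Kind = Literal["output", "deferred", "function"]
ToolCall = tuple[str, Kind]  # (tool name, kind), in model response order


def pick_run_stream(calls: list[ToolCall]) -> Optional[ToolCall]:
    """run_stream(): the first call that can produce a final result wins."""
    for call in calls:
        if call[1] in ("output", "deferred"):
            return call
    return None


def pick_other_run_methods(calls: list[ToolCall]) -> list[ToolCall]:
    """Other run methods: the first output tool wins; if no output tool
    was called, all deferred calls together become the final result."""
    for call in calls:
        if call[1] == "output":
            return [call]
    return [call for call in calls if call[1] == "deferred"]


# A response where a deferred (external) tool is called before the output tool.
calls: list[ToolCall] = [("external_tool", "deferred"), ("final_result", "output")]
stream_choice = pick_run_stream(calls)        # the deferred call comes first
other_choice = pick_other_run_methods(calls)  # the output tool takes priority
```

With the same response, the streaming rule settles on the deferred call while the other rule settles on the output tool, which is the discrepancy the tests in this PR document.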


#### Native Output

Native Output mode uses a model's native "Structured Outputs" feature (aka "JSON Schema response format"), where the model is forced to only output text matching the provided JSON schema. Note that this is not supported by all models, and sometimes comes with restrictions. For example, Gemini cannot use tools at the same time as structured output, and attempting to do so will result in an error.
9 changes: 8 additions & 1 deletion tests/test_agent.py
@@ -3291,7 +3291,14 @@ def deferred_tool(x: int) -> int: # pragma: no cover
)

def test_early_strategy_with_external_tool_call(self):
-"""Test that early strategy handles external tool calls correctly."""
+"""Test that early strategy handles external tool calls correctly.
+
+Streaming and sync modes differ in how they choose the final result:
+- Streaming: First tool call (in response order) that can produce a final result (output or deferred)
+- Sync: First output tool (if none called, all deferred tools become final result)
+
+See https://github.com/pydantic/pydantic-ai/issues/3636#issuecomment-3618800480 for details.
+"""
tool_called: list[str] = []

def return_model(_: list[ModelMessage], info: AgentInfo) -> ModelResponse:
8 changes: 5 additions & 3 deletions tests/test_streaming.py
@@ -1110,9 +1110,11 @@ def deferred_tool(x: int) -> int: # pragma: no cover
async def test_early_strategy_with_external_tool_call(self):
"""Test that early strategy handles external tool calls correctly.

-Streaming mode expects the first output tool call to be the final result,
-and has different behavior from sync mode in this regard.
-See https://github.com/pydantic/pydantic-ai/issues/3636 for details.
+Streaming and sync modes differ in how they choose the final result:
+- Streaming: First tool call (in response order) that can produce a final result (output or deferred)
+- Sync: First output tool (if none called, all deferred tools become final result)
+
+See https://github.com/pydantic/pydantic-ai/issues/3636#issuecomment-3618800480 for details.
"""
tool_called: list[str] = []
