-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Add force_download support for Anthropic and OAIR models and clarify proper BinaryContent base64 handling
#3694
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from 5 commits
c642ac7
754c0f0
2f5bcf6
5d99ed2
ff5d79a
887a79f
6bba553
7389467
8ad810d
da9ec78
9a55ce2
703b772
35d8745
d9e8a01
0bcd2e8
8a98f77
45d35e5
4a6234d
13c5284
91b1f79
0688875
eab1e14
6e0b72d
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -1034,6 +1034,33 @@ def _add_cache_control_to_last_param( | |
| # Add cache_control to the last param | ||
| last_param['cache_control'] = self._build_cache_control(ttl) | ||
|
|
||
| @staticmethod | ||
| def _map_binary_content(item: BinaryContent) -> BetaContentBlockParam: | ||
| # Anthropic SDK accepts file-like objects (IO[bytes]) and handles base64 encoding internally | ||
| if item.is_image: | ||
| return BetaImageBlockParam( | ||
| source={'data': io.BytesIO(item.data), 'media_type': item.media_type, 'type': 'base64'}, # type: ignore | ||
| type='image', | ||
| ) | ||
| elif item.media_type == 'application/pdf': | ||
| return BetaBase64PDFBlockParam( | ||
|
||
| source=BetaBase64PDFSourceParam( | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm looking at what other sources this supports, and there's also We're adding support for uploaded files in #2611, but that PR has been stale for a bit so may be interesting for you to pick up. |
||
| data=io.BytesIO(item.data), | ||
| media_type='application/pdf', | ||
| type='base64', | ||
| ), | ||
| type='document', | ||
| ) | ||
| elif item.media_type == 'text/plain': | ||
| return BetaBase64PDFBlockParam( | ||
| source=BetaPlainTextSourceParam( | ||
| data=item.data.decode('utf-8'), media_type=item.media_type, type='text' | ||
| ), | ||
| type='document', | ||
| ) | ||
| else: | ||
| raise RuntimeError(f'Unsupported binary content media type for Anthropic: {item.media_type}') | ||
|
|
||
| @staticmethod | ||
| async def _map_user_prompt( | ||
| part: UserPromptPart, | ||
|
|
@@ -1049,24 +1076,20 @@ async def _map_user_prompt( | |
| elif isinstance(item, CachePoint): | ||
| yield item | ||
| elif isinstance(item, BinaryContent): | ||
| if item.is_image: | ||
| yield AnthropicModel._map_binary_content(item) | ||
| elif isinstance(item, ImageUrl): | ||
| if item.force_download: | ||
| downloaded = await download_item(item, data_format='bytes') | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We should also respect
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Would it make sense to use |
||
| yield BetaImageBlockParam( | ||
| source={'data': io.BytesIO(item.data), 'media_type': item.media_type, 'type': 'base64'}, # type: ignore | ||
| source={ | ||
| 'data': io.BytesIO(downloaded['data']), | ||
| 'media_type': item.media_type, | ||
| 'type': 'base64', | ||
| }, # type: ignore | ||
| type='image', | ||
| ) | ||
| elif item.media_type == 'application/pdf': | ||
| yield BetaBase64PDFBlockParam( | ||
| source=BetaBase64PDFSourceParam( | ||
| data=io.BytesIO(item.data), | ||
| media_type='application/pdf', | ||
| type='base64', | ||
| ), | ||
| type='document', | ||
| ) | ||
| else: | ||
| raise RuntimeError('Only images and PDFs are supported for binary content') | ||
| elif isinstance(item, ImageUrl): | ||
| yield BetaImageBlockParam(source={'type': 'url', 'url': item.url}, type='image') | ||
| yield BetaImageBlockParam(source={'type': 'url', 'url': item.url}, type='image') | ||
DouweM marked this conversation as resolved.
Show resolved
Hide resolved
|
||
| elif isinstance(item, DocumentUrl): | ||
| if item.media_type == 'application/pdf': | ||
| yield BetaBase64PDFBlockParam(source={'url': item.url, 'type': 'url'}, type='document') | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Arbitrary" may be a little too strong, as it is specifically a file matching the type in
media_type😄I think if we make this
The binary file data, we can remove the specific mention of base64 that's added in the next linesThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to keep one as it reduces the likelihood of users using it incorrectly