@timsaucer

This PR proposes a single point for converting Python objects to scalar values. It handles three known Arrow libraries: pyarrow, nanoarrow, and arro3. It also tries to convert objects from any other library that exposes the Arrow PyCapsule interface. As a fallback, any regular Python object is first turned into a pyarrow scalar and then imported.

@timsaucer
Author

@kosiew I took a stab at moving all of the scalar value conversion to a single point and supporting all of the libraries that I know about. I haven't added unit tests yet, though.

@timsaucer timsaucer marked this pull request as draft February 6, 2026 14:28
@timsaucer timsaucer marked this pull request as ready for review February 6, 2026 16:27
@timsaucer
Author

@kosiew ready for review

Comment on lines +23 to +24
import arro3.core
import nanoarrow
Owner

These should also be added to the dev dependencies in pyproject.toml.

Comment on lines +42 to +44
let field = Arc::new(Field::new_list_field(
array.data_type().clone(),
array.nulls().is_some(),
Owner

This logic, which is duplicated later, can be extracted into a helper, array_to_list_scalar, to keep behaviour consistent and make future fixes easier.

Comment on lines +123 to +126
let field = Arc::new(Field::new_list_field(
array.data_type().clone(),
array.nulls().is_some(),
));
Owner

This is duplicated logic; it can be extracted into a helper, array_to_list_scalar, to keep behaviour consistent and make future fixes easier.

Comment on lines +82 to +83
let type_name = value.get_type().repr()?;
if type_name.contains("nanoarrow")? && type_name.contains("Scalar")? {
Owner

Can we use is_instance instead of repr?

if let Ok(scalar_type) = na.getattr("Scalar") {
    if value.is_instance(&scalar_type)? {
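To illustrate why an instance check is more robust than matching substrings in the type's repr, here is a small Python sketch. All class and function names are hypothetical stand-ins; no real nanoarrow types are involved.

```python
class Scalar:
    """Stand-in for a library's real scalar type (hypothetical)."""


class NanoarrowScalarReport:
    """Unrelated class whose name and module happen to contain both substrings."""

    __module__ = "reports.nanoarrow_stats"


def looks_like_scalar_by_repr(value):
    # Mirrors the substring check on the type's repr from the diff above.
    type_name = repr(type(value))
    return "nanoarrow" in type_name and "Scalar" in type_name


def is_scalar_by_instance(value, scalar_type):
    # Mirrors the suggested is_instance check against the actual class.
    return isinstance(value, scalar_type)


impostor = NanoarrowScalarReport()
genuine = Scalar()
```

The repr check accepts `impostor` because the substrings appear in its qualified name, whereas the instance check accepts only `genuine`.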

// Does it have a PyCapsule interface but isn't one of our known libraries?
// If so do our "best guess". Try checking type name, and if that fails
// return a single value if the length is 1 and return a List value otherwise
if value.hasattr("__arrow_c_array__")? {
Owner

I could not find tests for the unknown __arrow_c_array__ path in python/tests/test_expr.py.
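One way such a test could reach the unknown-library path is with a thin wrapper that exposes only `__arrow_c_array__`, so none of the known-library checks match. The sketch below is an assumption about how a fixture might look, not code from this PR: `OpaqueArrowArray` and `FakeProducer` are hypothetical, and `FakeProducer` returns placeholder strings instead of real capsules so that the example runs without any Arrow library installed.

```python
class OpaqueArrowArray:
    """Hypothetical wrapper that hides which Arrow library produced the data."""

    def __init__(self, inner):
        self._inner = inner

    def __arrow_c_array__(self, requested_schema=None):
        # Delegate to the wrapped producer; the PyCapsule protocol returns
        # a (schema_capsule, array_capsule) pair.
        return self._inner.__arrow_c_array__(requested_schema)


class FakeProducer:
    """Stand-in producer returning placeholders instead of real capsules."""

    def __arrow_c_array__(self, requested_schema=None):
        return ("schema-capsule", "array-capsule")


wrapped = OpaqueArrowArray(FakeProducer())
schema_capsule, array_capsule = wrapped.__arrow_c_array__()
```

In a real test, the wrapper would presumably hold a one-element and a multi-element array from an actual Arrow library, and the assertions would check that the conversion yields a single value in the first case and a List value in the second, as the code comment in the diff describes.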
