Copilot AI commented Dec 17, 2025

Sampling failed for Array(List(...), ...) and other nested array types because _sample_unchecked assumed inner.sample() returned flat scalars suitable for reshape(). When the inner type is a complex nested type, it returns structured objects that cannot be reshaped.
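The mismatch can be illustrated in plain Python, without polars or dataframely. This is only a sketch under an assumption: that the failing reshape counted the flattened scalar values rather than the number of inner samples, which the "size 49" error message suggests but the PR does not confirm. The helper `count_leaves` and the sample data are hypothetical.

```python
from typing import Any


def count_leaves(value: Any) -> int:
    """Count scalar leaves in an arbitrarily nested list."""
    if isinstance(value, list):
        return sum(count_leaves(v) for v in value)
    return 1


# Two hypothetical List(Bool) samples of different lengths, as needed to
# fill one Array(List(Bool), 2) value. Real samples would be random.
inner_samples = [[True, False, True], [False]]

n_samples, arr_size = 1, 2
expected = n_samples * arr_size       # slots a flat reshape expects to fill
leaves = count_leaves(inner_samples)  # scalars available after flattening

# The two counts only agree by coincidence when the inner values are
# variable-length lists, so a flat reshape cannot be relied on.
print(expected, leaves)
```

With scalar inner types, every sample contributes exactly one leaf, so the two counts always match and the reshape path stays valid.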

Changes

  • dataframely/columns/array.py: Added type checking in _sample_unchecked to detect nested inner types (List, Array, Struct) and construct the array structure manually via a recursive helper function. Scalar inner types continue using the optimized reshape() path.

  • tests/columns/test_sample.py: Added parametrized tests covering Array(List(...)), Array(Array(...)), and Array(Struct(...)) with various shapes and sample counts.
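The idea of building the array structure manually can be sketched in plain Python as follows. This is not the code from dataframely/columns/array.py; the names `build_nested` and `sample_array` are hypothetical, and Python lists stand in for polars values. The point it shows is that each inner sample is treated as an opaque item and grouped by position, rather than being flattened and reshaped.

```python
from math import prod
from typing import Any


def build_nested(values: list[Any], shape: tuple[int, ...]) -> list[Any]:
    """Group a flat list of inner samples into nested lists of `shape`.

    Each element of `values` is treated as opaque, so it may itself be a
    list or a dict (standing in for List or Struct samples).
    """
    if prod(shape) != len(values):
        raise ValueError(f"cannot arrange {len(values)} values into shape {shape}")
    if len(shape) == 1:
        return list(values)
    # Number of inner samples consumed by each entry along the first axis.
    step = prod(shape[1:])
    return [
        build_nested(values[i * step : (i + 1) * step], shape[1:])
        for i in range(shape[0])
    ]


def sample_array(inner_samples: list[Any], n: int, shape: tuple[int, ...]) -> list[Any]:
    """Arrange n * prod(shape) inner samples into n array values."""
    return build_nested(inner_samples, (n, *shape))


# Four hypothetical List(Bool) samples become two Array(List(Bool), 2) values.
inner = [[True], [False, True], [], [True, True]]
rows = sample_array(inner, n=2, shape=(2,))
```

Because the grouping depends only on how many inner samples there are, variable-length lists and structs pass through unchanged.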

Example

import dataframely as dy

class TestSchema(dy.Schema):
    a = dy.Array(dy.List(dy.Bool()), 2)

# Previously: InvalidOperationError: cannot reshape array of size 49 into shape (1, 2)
df = TestSchema.sample(1, generator=dy.random.Generator(0))  # Now works
Original prompt

This section details the original issue you should resolve

<issue_title>Sampling fails for columns of dtype Array(List(...), ...)</issue_title>
<issue_description>Consider:

def test_sample_array_list(arr_size, n_samples):
    class TestSchema(dy.Schema):
        a = dy.Array(dy.List(dy.Bool()), arr_size)
    TestSchema.sample(n_samples, generator=dy.random.Generator(0))

On dataframely==2.3.1:

  • calling test_sample_array_list(2, 1)
    fails with polars.exceptions.InvalidOperationError: cannot reshape array of size 49 into shape (1, 2)
  • and calling test_sample_array_list(1, 1)
    fails with polars.exceptions.SchemaError: type Boolean is incompatible with expected type List(Boolean)

I'd expect both to pass.</issue_description>

Comments on the Issue (you are @copilot in this section)



Copilot AI changed the title from "[WIP] Fix sampling issue for Array(List(...), ...) columns" to "Fix sampling for Array columns with nested inner types (List, Array, Struct)" on Dec 17, 2025
Copilot AI requested a review from borchero on December 17, 2025 12:57
@borchero changed the title from "Fix sampling for Array columns with nested inner types (List, Array, Struct)" to "fix: Properly sample Array columns with nested inner types" on Dec 17, 2025
@github-actions github-actions bot added the fix label Dec 17, 2025

codecov bot commented Dec 17, 2025

Codecov Report

❌ Patch coverage is 66.66667% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 99.74%. Comparing base (b813ac9) to head (c3007ee).

Files with missing lines | Patch % | Lines
dataframely/columns/array.py | 66.66% | 8 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##              main     #244      +/-   ##
===========================================
- Coverage   100.00%   99.74%   -0.26%     
===========================================
  Files           53       53              
  Lines         3066     3089      +23     
===========================================
+ Hits          3066     3081      +15     
- Misses           0        8       +8     


Successfully merging this pull request may close these issues.

Sampling fails for columns of dtype Array(List(...), ...)
