
@mroeschke (Member) commented Dec 15, 2025

@jorisvandenbossche I went with your suggestion from #63306 (comment):

> We changed Series to copy array input by default when we introduced CoW. We should maybe do the same for Index?

@mroeschke added this to the 3.0 milestone Dec 15, 2025
@mroeschke added the labels Index (Related to the Index class or subclasses), Series (Series data structure), and Copy / view semantics Dec 15, 2025
@jorisvandenbossche (Member)

We essentially already do a shallow copy of Index inputs in the Series constructor (by keeping track of the references the Index has). But the problem here is that the Index itself was created without a copy.
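The following toy sketch illustrates the distinction. It is plain Python, not pandas code. The Series shares the Index's buffer and registers the reference, so a write through the Series triggers a copy. The original input `arr` is never registered, so writing to it directly leaks into both objects unless the Index copied it on construction. `ToyIndex`, `ToySeries`, and the `refs` list are hypothetical names.

```python
# Toy model of CoW reference tracking. ToyIndex, ToySeries and `refs` are
# hypothetical names for illustration only; they are not pandas internals.


class ToyIndex:
    def __init__(self, values: list, copy: bool = False) -> None:
        # copy=False keeps the caller's list itself (no defensive copy).
        self._values = list(values) if copy else values
        # Every object that currently shares self._values.
        self.refs = [self]


class ToySeries:
    def __init__(self, index: ToyIndex) -> None:
        # "Shallow copy": share the index's buffer and register the reference.
        self._values = index._values
        self.refs = index.refs
        self.refs.append(self)

    def __setitem__(self, pos: int, value) -> None:
        # Copy-on-Write: detach from shared data before writing.
        if len(self.refs) > 1:
            self.refs.remove(self)
            self._values = list(self._values)
            self.refs = [self]
        self._values[pos] = value


# 1) Writes through the Series are tracked, so the Index is protected.
idx = ToyIndex([1, 2, 3])
ser = ToySeries(idx)
ser[0] = 100
print(idx._values, ser._values)  # [1, 2, 3] [100, 2, 3]

# 2) The original input was never registered, so writing to it leaks.
arr = [1, 2, 3]
ser_view = ToySeries(ToyIndex(arr))  # Index built without a copy
arr[0] = 100
print(ser_view._values)  # [100, 2, 3]

# 3) Copying when the Index is constructed closes that gap.
arr2 = [1, 2, 3]
ser_safe = ToySeries(ToyIndex(arr2, copy=True))
arr2[0] = 100
print(ser_safe._values)  # [1, 2, 3]
```

Reference tracking only covers objects that know about each other. The raw input array sits outside that bookkeeping, which is why copying at Index construction is the fix under discussion.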

To clarify my suggestion:

> We changed Series to copy array input by default when we introduced CoW. We should maybe do the same for Index?

I meant "doing the same for the Index constructor" (and not doing the same for Index input to the Series constructor). I see that the sentence above could have been interpreted both ways, sorry ;)

So that pd.Index(arr) copies arr by default.

@mroeschke (Member, Author) commented Dec 15, 2025

> So that pd.Index(arr) copies arr by default.

Ah, makes sense. No problem: I mimicked the copying we do in the Series constructor in the Index constructor. If you're OK with the approach, I can follow up in another PR with:

  1. Applying the same changes to the index subclasses
  2. Including ExtensionArrays

@mroeschke changed the title from "BUG: copy Index inputs to Series to preserve CoW reference tracking" to "BUG: copy np.ndarray inputs to Index constructor by default" Dec 16, 2025
@jorisvandenbossche (Member) left a comment


Looks good!

```python
if isinstance(data, (set, frozenset)):
    data = list(data)

elif is_ea_or_datetimelike_dtype(data_dtype):
```

For a datetime64 ndarray, we wouldn't get into the elif block below that now copies. Is this what you mentioned handling in the Index subclasses in a follow-up?
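The toy dispatch below shows why a datetime64 ndarray slips through. The datetimelike check runs first, so that input never reaches the branch that now copies. `FakeNDArray`, `is_datetimelike`, and `index_values` are hypothetical stand-ins, not pandas functions. The real check in the excerpt is `is_ea_or_datetimelike_dtype`, which also matches extension dtypes.

```python
# Hypothetical toy dispatch (not pandas code) mirroring the branch order in
# the excerpt: the datetimelike check runs before the plain-ndarray branch.
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FakeNDArray:
    """Stand-in for np.ndarray: a dtype name plus a mutable buffer."""

    dtype: str
    values: list


def is_datetimelike(dtype: str) -> bool:
    # Stand-in for the datetime64/timedelta64 half of is_ea_or_datetimelike_dtype.
    return dtype.startswith(("datetime64", "timedelta64"))


def index_values(data: FakeNDArray, copy: bool = True) -> tuple[list, str]:
    if is_datetimelike(data.dtype):
        # Handed on unchanged: the copy below is never reached.
        return data.values, "datetimelike"
    # Plain ndarray branch: the only place the new default copy happens.
    return (list(data.values) if copy else data.values), "ndarray"


ints = FakeNDArray("int64", [1, 2, 3])
dts = FakeNDArray("datetime64[ns]", [0, 1, 2])

int_vals, int_branch = index_values(ints)
dt_vals, dt_branch = index_values(dts)
print(int_branch, int_vals is ints.values)  # ndarray False
print(dt_branch, dt_vals is dts.values)  # datetimelike True
```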

```diff
     data=None,
     dtype=None,
-    copy: bool = False,
+    copy: bool | None = None,
```
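With the new `None` default, the constructor can distinguish an explicit choice from "not specified". Below is a minimal sketch of resolving such a tri-state. It assumes the Series-like rule discussed in this thread: copy raw array input by default, and rely on reference tracking for existing pandas objects. `array.array` stands in for `np.ndarray`. `Wrapped`, `resolve_copy`, and `make_index_values` are hypothetical names, not pandas API.

```python
# Hypothetical sketch of resolving a tri-state `copy: bool | None = None`.
# array.array stands in for np.ndarray; Wrapped, resolve_copy and
# make_index_values are illustrative names, not pandas API.
from __future__ import annotations

from array import array


class Wrapped:
    """Stand-in for an existing Index/Series input (already reference-tracked)."""

    def __init__(self, values: array) -> None:
        self.values = values


def resolve_copy(data: array | Wrapped, copy: bool | None) -> bool:
    if copy is not None:
        # An explicit True/False from the caller always wins.
        return copy
    # Default (None): copy raw array input, whose later mutation CoW cannot
    # see; existing pandas-like objects rely on reference tracking instead.
    return isinstance(data, array)


def make_index_values(data: array | Wrapped, copy: bool | None = None) -> array:
    values = data.values if isinstance(data, Wrapped) else data
    if resolve_copy(data, copy):
        return array(values.typecode, values.tolist())
    return values


raw = array("q", [1, 2, 3])
vals = make_index_values(raw)  # copy=None: raw array input is copied
raw[0] = 100
print(vals.tolist())  # [1, 2, 3]
print(resolve_copy(Wrapped(raw), None))  # False
print(make_index_values(raw, copy=False) is raw)  # True
```

The old `copy: bool = False` signature could not express "copy only when the input is a raw array". A `None` sentinel is a common way to keep `copy=False` available as an explicit opt-out.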

You can also use the updated description in the Series docstring as a starting point to update the docstring here (the end goal should be to make them similar).



Development

Successfully merging this pull request may close these issues.

BUG: Series stealing references from CategoricalIndex is invalid for read-only arrays

2 participants