Skip to content

Non-Deep copy for Categorical? #26995

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
3 tasks
jbrockmendel opened this issue Jun 22, 2019 · 9 comments · Fixed by #27083
Closed
3 tasks

Non-Deep copy for Categorical? #26995

jbrockmendel opened this issue Jun 22, 2019 · 9 comments · Fixed by #27083
Labels
API Design ExtensionArray Extending pandas with custom dtypes or arrays.
Milestone

Comments

@jbrockmendel
Copy link
Member

jbrockmendel commented Jun 22, 2019

Categorical.copy passes values=self._codes.copy(), and doesn't have a deep=False kwarg (which the base ExtensionArray does). This is causing trouble in #26954. Is this intentional @TomAugspurger?

Pandas-internal EAs with copy behavior not respecting deep kwarg:

  • Categorical (no kwarg at all)
  • DatetimeLike (ignores kwarg)
  • PandasArray

IntegerArray, IntervalArray, and SparseArray look correct.

@TomAugspurger
Copy link
Contributor

Probably a bug.

@jorisvandenbossche
Copy link
Member

In general, we never really properly defined the expected semantics for EA.copy (or exposed it through eg Series.copy) -> #22314

Also currently, the default of deep is False, but I assume this should actually be True (if you use the same meaning as for Series). Or, we could just remove this keyword altogether for EAs (eg ndarray also has no deep argument)

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Jun 24, 2019 via email

@jorisvandenbossche
Copy link
Member

Is there a use case for a non-deep one?

@jreback
Copy link
Contributor

jreback commented Jun 24, 2019

actually need to be careful with this

we don’t normally deep copy the cells of object array
eg day you have dicts or lists inside a Series cell

as that is not python default behavior

however it has causes some bugs over time

@jorisvandenbossche
Copy link
Member

@jbrockmendel we don't get notified of edits of the first post, but I think your updated information is relevant for the discussion: basically none of the EAs implement this.

So before actually starting to implement this, to repeat my previous questions:

  • shouldn't deep=True be the default?
  • is there actually a use case for deep=False, or can we just remove this keyword alltogether?

@jbrockmendel
Copy link
Member Author

shouldn't deep=True be the default?

I would be +1 on this.

is there actually a use case for deep=False, or can we just remove this keyword alltogether?

The only thing that comes to mind is that this is kind of like a view that could be used for some copy-on-write behavior, but if that's the idea then we should just use view. I'd be on board with removing the kwarg.

@jbrockmendel
Copy link
Member Author

is there actually a use case for deep=False, or can we just remove this keyword alltogether?

Come to think of it, we don't have a non-copying view method, which would be helpful for reshape.

@jorisvandenbossche
Copy link
Member

@jbrockmendel can you explain a bit more how that would work? (why is a non-copy view needed there?)

@jreback jreback added API Design ExtensionArray Extending pandas with custom dtypes or arrays. labels Jun 27, 2019
@jreback jreback added this to the 0.25.0 milestone Jun 27, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design ExtensionArray Extending pandas with custom dtypes or arrays.
Projects
None yet
4 participants