-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
REF: EA value_counts -> _value_counts #30673
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@simonjayhawkins suggestions for making mypy happy here? |
does this make possible EA.value_counts? |
Still trying to figure out what the base class implementation looks like. I think it involves _values_for_factorized, but that seems to have inconsistent copy/view semantics. |
adding
to DatetimeLikeArrayMixin works. may want to make DatetimeLikeArrayMixin an abstract base class instead of mixin. The docstring states "Shared Base/Mixin class for DatetimeArray, TimedeltaArray, PeriodArray". |
Can you clarify what you mean with this? Personally, I would even go for removing |
Suppose we just want to go through the existing implementations and substitute in _values_for_factorize wherever feasible:
_values_for_factorize copy/view behavior:
|
And how is the copy/view semantic important for a value_counts implementation? |
I'd also be happy to see ExtensionArray.value_counts go away if possible, in favor of |
really? that is a giant step backwards for the api |
If the existing implementations dont make a copy and the _values_for_factorize-based ones do, thats a performance hit that im not eager to take. Since I don't fully understand the distinction between _values_for_factorize vs _values_for_argsort, I find it worth holding off on implementing a general version until I find the right attribute. |
The method that users can use is |
Since Anyway, regardless of the "possible general implementation" discussion, I think this PR is a step forward, so fine with first focusing on the content right now in this PR. |
Though in its current state, the PR breaks API right? Won't |
what is the reason we are so against an EA.value_counts() method? We have one now, and IIRC you didn't want this on the base class, though this PR actually makes that a very simpl impl now. |
For me its about dependency structure. I don't want our EAs depending on Series/DataFrame/Index (and want to change the handful of places that they currently do) |
Yes. Easy to reinstate i guess |
Yes, for me the same. And next to the actual code dependency structure, there is also the mental model: for me, Array is something independent of Series/Index, while Series/Index holds arrays. It's biizarre to me for an array method to return a Series. Also, I doubt anyone is actually using this. Users have |
@jbrockmendel can you provide a depretion for |
closing to clear the queue, will revisit after indexing fixes are done |
@jorisvandenbossche I'm now trying to implement more ExtensionIndex methods in terms of the backing EA, am having trouble similar to the trouble with |
Instead of returning a Series, return a tuple with the index and values to be passed to Series.
Where possible I've changed the methods to use
_values_for_factorized
in the hopes of converging on a base class implementation. This is proving elusive, suggestions welcome. cc @TomAugspurger @jorisvandenbosschexref #22843, #23074.