Skip to content

Pass method in __finalize__ #33273

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Apr 6, 2020
Merged

Conversation

TomAugspurger
Copy link
Contributor

@TomAugspurger TomAugspurger commented Apr 3, 2020

This passes method everywhere we use __finalize__. I'm trying to get a better sense for

  1. When we call finalize (and when we fail too)
  2. When we finalize multiple times
  3. The overhead of finalize

I haven't called it anywhere new yet (followup PR with that coming though).

This passes `method` everywhere we use `__finalize__`. This will be
useful for future enhancements buliding on `__finalize__`.
@TomAugspurger TomAugspurger added this to the 1.1 milestone Apr 3, 2020
@jbrockmendel
Copy link
Member

maybe a silly question, but for this to be useful, doesnt it entail putting e.g. _getitem_multilevel-specific logic into __finalize__, when that logic would make more sense in _getitem_multilevel?

@TomAugspurger
Copy link
Contributor Author

when that logic would make more sense in _getitem_multilevel?

Depends on just how specific the logic is to _getitem_multilevel. A motivating use case is my allows_duplicate_labels PR. In that I would do something like

if method in {"concat", "merge", "align", ...} # methods involving multiple NDFrames:
    self.allows_duplicate_labels = all(ndframe.allows_duplicate_labels for ndframe in others)

in this case, there's not any logic specific to concat. But we use the method to constrain the type of other to Tuple[NDFrame, ...].

@jbrockmendel
Copy link
Member

jbrockmendel commented Apr 3, 2020 via email

@jorisvandenbossche
Copy link
Member

Do we want such specific method names? Because this is kind of "public" interface, we should maybe try to limit to a set of "categories" (but I didn't really think about whether that defeats the purpose of it or not)

result = pd.Series(values, index=self.index, dtype=self.dtype).__finalize__(self)
result = pd.Series(values, index=self.index, dtype=self.dtype).__finalize__(
self, method="_single_replace"
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this be broken into two lines? probably also should use _constructor instead of pd.Series

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you make method a required parameter in __finalize__ to avoid this problem in the future.

@jreback jreback added the API - Consistency Internal Consistency of API/Behavior label Apr 5, 2020
@jreback
Copy link
Contributor

jreback commented Apr 5, 2020

also needs a whatsnew note

@TomAugspurger
Copy link
Contributor Author

For

Because this is kind of "public" interface

can you make method a required parameter in finalize to avoid this problem in the future.

and

also needs a whatsnew note

Do we make any claim that __finalize__ is public? There's just a couple whatsnew mentions in the docs. And the docstring doesn't say anything about it being public.

I'm OK with these specific names and making method required but not if it's currently part of the public API.

@jorisvandenbossche
Copy link
Member

Do we make any claim that finalize is public?

Looking at the docs, it's indeed not mentioned in the docs on subclassing. But personally, I always assumed this is part of our sublcassing API.

In any case, geopandas is using it for that reason (so that's probably the reason I was assuming that ;)).

@TomAugspurger
Copy link
Contributor Author

I always assumed this is part of our subclassing API.

I was also assuming that, since it does interact with metadata. So my compromise was to not make method required, in case a subclass' method was calling it without a method. But I will push up a doc note that the values in method aren't stable yet.

@jorisvandenbossche
Copy link
Member

Regarding part of our subclassing API or not: since we actually don't use methods internally ourselves, I suppose it's indeed the purpose for subclasses to be able to override finalize to use methods. (or do other custom handling of metadata)

@TomAugspurger is it also your goal that external users can implement similar metadata propagation logic as you are experimenting for the allow_duplicate_labels? (I seem to remember we were discussing this at some point I think)

@TomAugspurger
Copy link
Contributor Author

is it also your goal that external users can implement similar metadata propagation logic as you are experimenting for the allow_duplicate_labels

I'm going back and forth on this. I have a few ideas for what an API would look like, but I'm having trouble deciding what will work well in practice. If I had to guess, I'll propose handling allow_duplicate_labels explicitly in __finalize__, and use lessons learned there to inform an API design.

@jbrockmendel
Copy link
Member

w/r/t the private/public/experimental aspect, the only opinion i have is that we retain flexibility with this (and not need to wait until 2.0 if minds are changed)

@TomAugspurger
Copy link
Contributor Author

I think this PR is compatible with that. The only "API change" is a clarification that the actual method passed by pandas isn't currently stable.

@jorisvandenbossche
Copy link
Member

We should still guarantee some stability for the places where we already specify methods, though.
But fully ok with the others being regarded as experimental.

@TomAugspurger
Copy link
Contributor Author

That makes me hesitant to include method in some of the methods here, which in turn makes me think that's it's premature to say method is stable if it's been in a release.

I'd rather work with subclassers to find what they're relying on. But since we're in an API gray zone that may be hard.

@TomAugspurger
Copy link
Contributor Author

In an effort to unstick things, I've removed the addition of method="_*". So I've only added method= to public methods, which I hope is uncontroversial.

I opened #33338 as a general discussion about __finalize__ and whether it's public API.

@jreback jreback merged commit 8a8544f into pandas-dev:master Apr 6, 2020
@jreback
Copy link
Contributor

jreback commented Apr 6, 2020

thanks @TomAugspurger nice improvement

jbrockmendel pushed a commit to jbrockmendel/pandas that referenced this pull request Apr 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API - Consistency Internal Consistency of API/Behavior
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants