PERF: use "_name" instead of "name" as Series metadata to avoid validation #51784

Merged

Conversation

jorisvandenbossche
Member

See #46505 (comment) for context (and related to #51109).

The _metadata of a pandas object lists the attributes that get propagated in __finalize__ calls. For example, ser.round() preserves its name because of this mechanism. (Of course, we could also just pass name=self.name when constructing the new Series, which is what we already do in some places; this is only to explain how it works currently.)
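To make the propagation concrete, here is a minimal sketch using only public pandas behaviour. The variable names and values are illustrative assumptions; the content of Series._metadata is printed rather than asserted because it depends on the pandas version (this PR changes it).

```python
import pandas as pd

# Illustrative data; the name "price" is an arbitrary choice.
ser = pd.Series([1.26, 2.34], name="price")

# round() builds a new Series and the name is carried over to the result.
rounded = ser.round(1)
print(rounded.name)  # price

# The list of attribute names that __finalize__ copies from the source
# object. Its exact content depends on the pandas version.
print(pd.Series._metadata)
```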

In practice, then, the name we set in such cases was already the name of a Series, so we shouldn't need to validate it again (i.e. check that it is hashable). Setting the "name" attribute (the equivalent of ser.name = value) is a costly part of the current Series.__finalize__; setting the private "_name" instead should (I think?) be functionally equivalent while avoiding the validation.
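The difference between the two metadata entries can be sketched with a toy model. This is not pandas' implementation: ToySeries, its finalize method, and the validations counter are hypothetical names made up for illustration, under the assumption that the public "name" setter performs a hashability check and the private "_name" slot does not.

```python
class ToySeries:
    """Simplified stand-in for a Series; NOT pandas' actual code."""

    def __init__(self, name=None):
        self._name = None
        self.validations = 0  # hypothetical counter, for illustration only
        self.name = name      # goes through the validating setter once

    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, value):
        # Stand-in for the hashability validation done by the public setter.
        self.validations += 1
        hash(value)  # raises TypeError for unhashable values such as lists
        self._name = value

    def finalize(self, other, metadata):
        # Copy each listed attribute from `other`, in the spirit of __finalize__.
        for attr in metadata:
            setattr(self, attr, getattr(other, attr, None))
        return self


src = ToySeries(name="price")

# Propagating "name" triggers the validating setter again ...
via_public = ToySeries().finalize(src, ["name"])
# ... while propagating "_name" writes the already-validated value directly.
via_private = ToySeries().finalize(src, ["_name"])

print(via_public.name, via_public.validations)    # price 2
print(via_private.name, via_private.validations)  # price 1
```

Both objects end up with the same name; only the number of validation calls differs, which is the saving the PR is after.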

So for the functionality of __finalize__, I think this shouldn't break anything. But there might be other places, or subclasses, that rely on the exact content of _metadata = ["name"]?
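For context on the subclassing concern, a subclass that extends the parent's list instead of hard-coding its content should be unaffected by the rename. The sketch below is an assumption-laden illustration: TaggedSeries and its tag attribute are hypothetical, and the pattern follows the documented subclassing approach of overriding _constructor and _metadata.

```python
import pandas as pd


class TaggedSeries(pd.Series):
    # Hypothetical subclass: extend whatever the parent declares rather
    # than assuming the list is exactly ["name"].
    _metadata = pd.Series._metadata + ["tag"]

    @property
    def _constructor(self):
        return TaggedSeries


ser = TaggedSeries([1.26, 2.34], name="price")
ser.tag = "raw"  # custom attribute listed in _metadata

rounded = ser.round(1)
print(type(rounded).__name__, rounded.name, rounded.tag)
```

A subclass that instead compares against or indexes into the literal ["name"] list is the kind of code that could notice the change.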

@jorisvandenbossche added the Performance (Memory or execution speed performance) and Subclassing (Subclassing pandas objects) labels Mar 4, 2023
@jorisvandenbossche marked this pull request as ready for review March 17, 2023 21:18
Member

@jbrockmendel left a comment

LGTM

@jorisvandenbossche added this to the 2.1 milestone Mar 23, 2023
@jorisvandenbossche merged commit 540db96 into pandas-dev:main Mar 23, 2023
@jorisvandenbossche deleted the perf-finalize-name branch March 23, 2023 18:46
Labels
Performance (Memory or execution speed performance), Subclassing (Subclassing pandas objects)
2 participants