Fixed metadata propagation in DataFrame.__getitem__ #37037

Merged: 6 commits, Nov 15, 2020

TomAugspurger
Contributor

xref #28283

@TomAugspurger TomAugspurger added the metadata _metadata, .attrs label Oct 10, 2020
@jreback jreback modified the milestones: 2.0, 1.2 Oct 10, 2020
@jreback
Contributor

jreback commented Oct 11, 2020

Can you merge master? I think it should be good then.

@TomAugspurger
Contributor Author

There's a deeper problem here. NDFrame.__finalize__ iterates through self._metadata, copying each item from other to the new object. When a method has a signature like f(self: DataFrame, ...) -> Series, we end up trying to copy other.name to the result. That causes a problem when the DataFrame has a column called name: getattr(other, "name") returns that column, which is an unhashable Series, and the Series.name setter rejects it:

In [10]: other = pd.DataFrame({"name": [1, 2]})

In [11]: self.__finalize__(other)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-11-0a67385a6694> in <module>
----> 1 self.__finalize__(other)

~/Envs/dask-dev/lib/python3.8/site-packages/pandas/core/generic.py in __finalize__(self, other, method, **kwargs)
   5115             for name in self._metadata:
   5116                 assert isinstance(name, str)
-> 5117                 object.__setattr__(self, name, getattr(other, name, None))
   5118         return self
   5119

~/Envs/dask-dev/lib/python3.8/site-packages/pandas/core/series.py in name(self, value)
    493     def name(self, value: Label) -> None:
    494         if not is_hashable(value):
--> 495             raise TypeError("Series.name must be a hashable type")
    496         object.__setattr__(self, "_name", value)
    497

TypeError: Series.name must be a hashable type

I'll think a bit about this.
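
For readers without a pandas checkout at hand, the failure mode can be sketched with plain Python. This is only an illustrative model, not the pandas implementation: ToySeries, ToyFrame, finalize, and is_hashable are hypothetical stand-ins that mimic the three ingredients visible in the traceback (a metadata loop that uses object.__setattr__, a name setter that rejects unhashable values, and attribute access that falls through to columns).

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds (hypothetical helper for this sketch)."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


class ToySeries:
    """Hypothetical stand-in for a Series: validated ``name``, unhashable."""

    _metadata = ["name"]
    __hash__ = None  # make instances unhashable, as a Series is

    def __init__(self, values, name=None):
        self.values = list(values)
        self.name = name

    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, value):
        if not is_hashable(value):
            raise TypeError("Series.name must be a hashable type")
        object.__setattr__(self, "_name", value)

    def finalize(self, other):
        # Same shape as the loop in the traceback: copy every metadata
        # field from ``other``. object.__setattr__ still goes through the
        # ``name`` property, because properties are data descriptors.
        for field in self._metadata:
            object.__setattr__(self, field, getattr(other, field, None))
        return self


class ToyFrame:
    """Hypothetical stand-in for a DataFrame: columns readable as attributes."""

    def __init__(self, columns):
        self._columns = {k: ToySeries(v, name=k) for k, v in columns.items()}

    def __getattr__(self, key):
        # Reached only when normal attribute lookup fails. ToyFrame has no
        # real ``name`` attribute, so a column called "name" is returned.
        columns = self.__dict__.get("_columns", {})
        if key in columns:
            return columns[key]
        raise AttributeError(key)


frame = ToyFrame({"name": [1, 2]})
result = ToySeries([3, 4])

try:
    result.finalize(frame)
except TypeError as exc:
    error = str(exc)
else:
    error = None

print(error)
```

In this model the metadata field and the column share one attribute namespace on the frame, so the loop cannot tell "the frame's name" from "the column called name". A frame without such a column simply yields None for the field.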

@github-actions
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Nov 12, 2020
@TomAugspurger
Contributor Author

df.eval(...) now propagates attrs with engine="python", but not with engine="numexpr". We can fix numexpr as a followup.

@jreback jreback merged commit 7e96ad6 into pandas-dev:master Nov 15, 2020
@jreback
Contributor

jreback commented Nov 15, 2020

thanks @TomAugspurger
