Skip to content

BUG-24915 fix unhashble Series.name in df aggregation #24920

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

lofoyet
Copy link

@lofoyet lofoyet commented Jan 25, 2019

Before creating a pd.Series and pass {name} as argument, check if {name} is hashable. If not use default None.

@lofoyet
Copy link
Author

lofoyet commented Jan 25, 2019

@YaoquanYe @daisysu0310

@lofoyet
Copy link
Author

lofoyet commented Jan 25, 2019

  1. not sure about what tests to add

@lofoyet
Copy link
Author

lofoyet commented Jan 25, 2019

@youhealthy and I are the same person

@codecov
Copy link

codecov bot commented Jan 25, 2019

Codecov Report

Merging #24920 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #24920      +/-   ##
==========================================
+ Coverage   92.38%   92.38%   +<.01%     
==========================================
  Files         166      166              
  Lines       52404    52406       +2     
==========================================
+ Hits        48412    48414       +2     
  Misses       3992     3992
Flag Coverage Δ
#multiple 90.8% <100%> (ø) ⬆️
#single 42.9% <0%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/base.py 97.76% <100%> (+0.01%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 539c54f...b270371. Read the comment docs.

1 similar comment
@codecov
Copy link

codecov bot commented Jan 25, 2019

Codecov Report

Merging #24920 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #24920      +/-   ##
==========================================
+ Coverage   92.38%   92.38%   +<.01%     
==========================================
  Files         166      166              
  Lines       52404    52406       +2     
==========================================
+ Hits        48412    48414       +2     
  Misses       3992     3992
Flag Coverage Δ
#multiple 90.8% <100%> (ø) ⬆️
#single 42.9% <0%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/base.py 97.76% <100%> (+0.01%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 539c54f...b270371. Read the comment docs.

Copy link
Member

@mroeschke mroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs:

  • Whatsnew entry for 0.24.1
  • Test: Add a test based on the example in the original issue. Construct the DataFrame/Series that replicates the issue and the expected DataFrame/Series

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs a test. I am also not sure this is actually the correct fix. constructio of a Series from scalars with an invalid name with raise now. IOW this should be raising already, so not sure this is a good remedy.

`

In [1]: Series([1,2],name=[1,2])                                                                                                                                                                                                                                        
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-360525e04c11> in <module>
----> 1 Series([1,2],name=[1,2])

~/pandas/pandas/core/series.py in __init__(self, data, index, dtype, name, copy, fastpath)
    266         generic.NDFrame.__init__(self, data, fastpath=True)
    267 
--> 268         self.name = name
    269         self._set_axis(0, index, fastpath=True)
    270 

~/pandas/pandas/core/generic.py in __setattr__(self, name, value)
   5077             object.__setattr__(self, name, value)
   5078         elif name in self._metadata:
-> 5079             object.__setattr__(self, name, value)
   5080         else:
   5081             try:

~/pandas/pandas/core/series.py in name(self, value)
    400     def name(self, value):
    401         if value is not None and not is_hashable(value):
--> 402             raise TypeError('Series.name must be a hashable type')
    403         object.__setattr__(self, '_name', value)
    404 

TypeError: Series.name must be a hashable type

@jreback jreback added the Reshaping Concat, Merge/Join, Stack/Unstack, Explode label Jan 26, 2019
@lofoyet
Copy link
Author

lofoyet commented Jan 28, 2019

@jreback I agree with you. You can refer to #24915 for the problem.

There are 2 ways that I have in mind to fix this:

  1. This PR.
  2. Make change to how getattr works.

For 2. What I looked at is

#pandas.core.generic.py
if (name in self._internal_names_set or name in self._metadata or
    name in self._accessors):
        return object.__getattribute__(self, name)
else:
    if self._info_axis._can_hold_identifiers_and_holds_name(name):
        return self[name]
    return object.__getattribute__(self, name)

getattr(self, "name") will go to else clause, and if "name" is a column, it will return in if clause. But df.name is really an attribute rather not a column name. It should skip that "if" in "else" and goes to object.getattribute().

How do you propose to change? @jreback

@lofoyet
Copy link
Author

lofoyet commented Feb 8, 2019

@jreback any idea?

@jreback
Copy link
Contributor

jreback commented Feb 8, 2019

@lofoyet first thing you need here is a test. Then step thru it until you get to the source of the error, see if you patch fixes.

I would rather shy away from changing the way getattr works. Also I am not sure your change is the right way.

@WillAyd
Copy link
Member

WillAyd commented Feb 28, 2019

Closing as stale. @lofoyet ping if you'd like to continue though as mentioned above this needs a test first and foremost

@WillAyd WillAyd closed this Feb 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cannot do agg if dataframe has column "name"
4 participants