
BUG AttributeError: 'DataFrameGroupBy' object has no attribute '_obj_with_exclusions' #11640

Closed
nbonnotte opened this issue Nov 18, 2015 · 13 comments
Labels
Error Reporting Incorrect or improved errors from pandas Groupby
Milestone

Comments

@nbonnotte
Contributor

I guess it will be clearer with an example. First, let's prepare the dataframe:

In [2]: df = pd.DataFrame(columns=['a','b','c','d'], data=[[1,'b1','c1',3], [1,'b2','c2',4]])

In [3]: df = df.pivot_table(index='a', columns=['b','c'], values='d').reset_index()

In [4]: df
Out[28]: 
b  a b1 b2
c    c1 c2
0  1  3  4

Now, the exception raised:

In [5]: df.groupby('a').mean()
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-29-a830c6135818> in <module>()
----> 1 df.groupby('a').mean()

/home/nicolas/Git/pandas/pandas/core/groupby.py in mean(self)
    764             self._set_selection_from_grouper()
    765             f = lambda x: x.mean(axis=self.axis)
--> 766             return self._python_agg_general(f)
    767 
    768     def median(self):

/home/nicolas/Git/pandas/pandas/core/groupby.py in _python_agg_general(self, func, *args, **kwargs)
   1245                 output[name] = self._try_cast(values[mask], result)
   1246 
-> 1247         return self._wrap_aggregated_output(output)
   1248 
   1249     def _wrap_applied_output(self, *args, **kwargs):

/home/nicolas/Git/pandas/pandas/core/groupby.py in _wrap_aggregated_output(self, output, names)
   3529     def _wrap_aggregated_output(self, output, names=None):
   3530         agg_axis = 0 if self.axis == 1 else 1
-> 3531         agg_labels = self._obj_with_exclusions._get_axis(agg_axis)
   3532 
   3533         output_keys = self._decide_output_index(output, agg_labels)

/home/nicolas/Git/pandas/pandas/core/groupby.py in __getattr__(self, attr)
    557 
    558         raise AttributeError("%r object has no attribute %r" %
--> 559                              (type(self).__name__, attr))
    560 
    561     def __getitem__(self, key):

AttributeError: 'DataFrameGroupBy' object has no attribute '_obj_with_exclusions'

Maybe I'm doing something wrong, and it's not a bug, but then the exception raised should definitely be more explicit than a reference to an internal attribute :-)

This attribute, by the way, is (only) referenced in one file and in issue #5264. It might be connected, but the discussion is a bit long and technical.

I'll try to have a look at what's going on.

@jreback
Contributor

jreback commented Nov 18, 2015

There should be a better error message, but you are grouping on something which is not a column: your columns are a MultiIndex.

In [16]: df.columns
Out[16]: 
MultiIndex(levels=[[u'b1', u'b2', u'a'], [u'c1', u'c2', u'']],
           labels=[[2, 0, 1], [2, 0, 1]],
           names=[u'b', u'c'])

In [17]: df.index
Out[17]: Int64Index([0], dtype='int64')

In [18]: df.columns.values
Out[18]: array([('a', ''), ('b1', 'c1'), ('b2', 'c2')], dtype=object)

what exactly are you trying to do?
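
A minimal sketch of why `'a'` is not a plain column label here: after `pivot_table(...).reset_index()` the column index has two levels, so the `a` column is stored under the tuple `('a', '')`. This assumes a recent pandas release, whose reprs differ from the outputs quoted in this thread; the names `pivoted`, `is_multi` and `column_keys` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    columns=["a", "b", "c", "d"],
    data=[[1, "b1", "c1", 3], [1, "b2", "c2", 4]],
)
pivoted = df.pivot_table(index="a", columns=["b", "c"], values="d").reset_index()

# The columns form a two-level MultiIndex rather than flat labels,
# so each column key is a tuple with one entry per level.
is_multi = isinstance(pivoted.columns, pd.MultiIndex)
column_keys = list(pivoted.columns)

print(is_multi)
print(column_keys)
```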

jreback added the Groupby and Error Reporting (incorrect or improved errors from pandas) labels on Nov 18, 2015
@nbonnotte
Contributor Author

I'm trying to group according to the column a, or ('a',''). What would be the proper way?

@jreback
Contributor

jreback commented Nov 18, 2015

In [27]: df = pd.DataFrame(columns=['a','b','c','d'], data=[[1,'b1','c1',3], [1,'b2','c2',4]])

In [28]: df
Out[28]: 
   a   b   c  d
0  1  b1  c1  3
1  1  b2  c2  4

In [29]: df.groupby('a').mean()
Out[29]: 
     d
a     
1  3.5

jreback added this to the Next Major Release milestone on Nov 18, 2015
@nbonnotte
Contributor Author

But that's not the result I would expect: with my dumb example, I would like to get the same dataframe.

BTW, if df['a'] works whatever the status of a, wouldn't it be nice to be able to group according to a as well?

@jreback
Contributor

jreback commented Nov 18, 2015

What are your expectations for a result here? Please show an example.

a is not a group in your example

@nbonnotte
Contributor Author

i would like that

b  a b1 b2
c    c1 c2
0  1  3  4
1  1  5  5

after grouping by a and taking the mean, yields

b b1   b2
c c1   c2
a        
1  4  4.5

where the first dataframe is for instance obtained with

In [88]: df = pd.DataFrame(columns=['a','b','c','d'], data=[[1,'b1','c1',3], [1,'b2','c2',4], [2,'b1','c1',5], [2,'b2','c2',5]]).pivot_table(index='a', columns=['b','c'], values='d').reset_index()

In [89]: df
Out[89]: 
b  a b1 b2
c    c1 c2
0  1  3  4
1  2  5  5

In [90]: df['a'] = 1

In [91]: df
Out[91]: 
b  a b1 b2
c    c1 c2
0  1  3  4
1  1  5  5

@jreback
Contributor

jreback commented Nov 18, 2015

In [17]: df.groupby([('a','')]).mean()
Out[17]: 
b     b1   b2
c     c1   c2
(a, )        
1      4  4.5
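
The approach above can be reproduced in a self-contained way. This is a sketch that assumes a recent pandas release and builds the frame directly with `pd.MultiIndex.from_tuples` instead of going through `pivot_table`; the variable names are illustrative. The full column tuple is wrapped in a list so that it is read as a single grouping key.

```python
import pandas as pd

# Same shape as the frame in the thread: an 'a' column stored under
# ('a', '') plus two value columns under two-level labels.
columns = pd.MultiIndex.from_tuples(
    [("a", ""), ("b1", "c1"), ("b2", "c2")], names=["b", "c"]
)
df = pd.DataFrame([[1, 3, 4], [1, 5, 5]], columns=columns)

# A one-element list holding the full tuple is treated as one key.
result = df.groupby([("a", "")]).mean()

b1_means = result[("b1", "c1")].tolist()
b2_means = result[("b2", "c2")].tolist()
print(result)
```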

@nbonnotte
Contributor Author

So that was that... I had tried

In [99]: df.groupby(('a', '')).mean()
Out[99]: 
b  a b1 b2
c    c1 c2
   1  5  5
a  1  3  4

(the result of which I don't quite understand, but never mind), but not with the tuple enclosed in brackets. Thanks!

@jreback
Contributor

jreback commented Nov 18, 2015

gr8.

If you are interested in improving the error message for the above case, that would be great.

@nbonnotte
Contributor Author

Sure!

@nbonnotte
Contributor Author

@jreback digging into this issue, I think what is happening here is not so much a reporting problem as a real bug. Indeed, my example shows that issue #11185 was only partially solved by PR #11202:

In [3]: df = pd.DataFrame(columns=['a', 'b', 'c', 'd'],
                       data=[[1, 'b1', 'c1', 3]])

In [4]: df.groupby('z').mean()
Out[4]: <pandas.core.groupby.DataFrameGroupBy object at 0x7f57f363d510>

This should raise a KeyError. Because it does not, execution continues and later fails with the AttributeError that is the subject of this issue. The cause is that the list of keys passed (here ['z']) has the same length as the index, which makes match_axis_length True in the following line:

https://github.com/pydata/pandas/blob/b07dd0cbd6d18c55aaa0043d85f42a483eab7dbb/pandas/core/groupby.py#L2210
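
For comparison, a small sketch of the behaviour argued for here. It assumes a recent pandas release, where grouping by a single string label that is not a column raises `KeyError`. The frame has two rows and the key is a plain string, so the length-matching path described above is not involved; `raised_key_error` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame(
    columns=["a", "b", "c", "d"],
    data=[[1, "b1", "c1", 3], [1, "b2", "c2", 4]],
)

# 'z' is neither a column nor an index level name, so the lookup
# is expected to fail when the groupby object is created.
raised_key_error = False
try:
    df.groupby("z")
except KeyError:
    raised_key_error = True

print(raised_key_error)
```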

I'll dig a bit deeper before making a PR

@jreback
Contributor

jreback commented Nov 24, 2015

Hmm, that does look like a bug. I agree it should give a KeyError (though a bit lower down in the code than where you pointed).

@nbonnotte
Contributor Author

Well, this is quite interesting. I've found a fix for the last bug, but it does not solve the first problem. Digging a bit further, I've found another bug:

In [16]: df = pd.DataFrame(columns=['a', 'b', 'c', 'd'],
                       data=[[1, 'b1', 'c1', 3],
                             [1, 'b2', 'c2', 4]])

In [17]: dg = df.pivot_table(index='a', columns=['b', 'c'], values='d').reset_index()

In [18]: dg.drop('a', axis=1)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-18-90595ac9cb8f> in <module>()
----> 1 dg.drop('a', axis=1)

/home/nicolas/Git/pandas/pandas/core/generic.pyc in drop(self, labels, axis, level, inplace, errors)
   1615                 new_axis = axis.drop(labels, level=level, errors=errors)
   1616             else:
-> 1617                 new_axis = axis.drop(labels, errors=errors)
   1618             dropped = self.reindex(**{axis_name: new_axis})
   1619             try:

/home/nicolas/Git/pandas/pandas/core/index.py in drop(self, labels, level, errors)
   5011                 else:
-> 5012                     inds.extend(lrange(loc.start, loc.stop))
   5013             except KeyError:
   5014                 if errors != 'ignore':

AttributeError: 'numpy.ndarray' object has no attribute 'start'

Turns out, this is the AttributeError which is mistakenly displayed as

AttributeError: 'DataFrameGroupBy' object has no attribute '_obj_with_exclusions'

I've not checked yet if there is already an issue for this.
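
For reference, a hedged sketch of two ways to drop the `a` column from a frame with two-level columns: by its full tuple label, or by naming the level that holds `'a'`. It assumes a recent pandas release and has not been checked against the version in the traceback above; the frame is built directly with `pd.MultiIndex.from_tuples`, and the variable names are illustrative.

```python
import pandas as pd

columns = pd.MultiIndex.from_tuples(
    [("a", ""), ("b1", "c1"), ("b2", "c2")], names=["b", "c"]
)
dg = pd.DataFrame([[1, 3, 4]], columns=columns)

# Option 1: refer to the column by its complete tuple label.
by_tuple = dg.drop(columns=[("a", "")])

# Option 2: match 'a' against the first column level only.
by_level = dg.drop("a", axis=1, level=0)

remaining_by_tuple = list(by_tuple.columns)
remaining_by_level = list(by_level.columns)
print(remaining_by_tuple)
print(remaining_by_level)
```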
