
BUG: in .groupby.SeriesGroupBy plot not accessible? #5102


Closed
nehalecky opened this issue Oct 3, 2013 · 15 comments · Fixed by #5105

Comments

@nehalecky
Contributor

Previously, in 0.12 and earlier, I could quickly visualize groupby objects with a call to .plot(). Currently on master, calling .plot on a groupby object raises the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-63-3760a47a721b> in <module>()
----> 1 grp.kWh.plot()

/pandas/pandas/core/groupby.pyc in __getattr__(self, attr)
    250 
    251         if hasattr(self.obj, attr) and attr != '_cache':
--> 252             return self._make_wrapper(attr)
    253 
    254         raise AttributeError("%r object has no attribute %r" %

/pandas/pandas/core/groupby.pyc in _make_wrapper(self, name)
    265                    "using the 'apply' method".format(kind, name,
    266                                                      type(self).__name__))
--> 267             raise AttributeError(msg)
    268 
    269         f = getattr(self.obj, name)

AttributeError: Cannot access callable attribute 'plot' of 'SeriesGroupBy' objects, try using the 'apply' method
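The traceback shows the mechanism: `__getattr__` checks whether the wrapped object has the attribute and hands it to `_make_wrapper`, which refuses callables that are not explicitly allowed. Below is a minimal stdlib-only sketch of that gating. `Obj`, `GroupByLike`, and the `_WHITELIST` set are hypothetical stand-ins, not pandas' actual internals, and the real code wraps allowed callables so they run per group rather than returning them unchanged.

```python
class Obj:
    """Hypothetical stand-in for the grouped Series/DataFrame."""
    name = "kWh"  # a non-callable attribute

    def mean(self):
        return "mean"

    def plot(self):
        return "plot"


class GroupByLike:
    # Hypothetical allow-list of callables that may be forwarded.
    _WHITELIST = frozenset({"mean"})

    def __init__(self, obj):
        self.obj = obj

    def __getattr__(self, attr):
        # Only called when normal attribute lookup fails (mirrors the traceback).
        if hasattr(self.obj, attr) and attr != "_cache":
            return self._make_wrapper(attr)
        raise AttributeError("%r object has no attribute %r"
                             % (type(self).__name__, attr))

    def _make_wrapper(self, name):
        f = getattr(self.obj, name)
        if callable(f) and name not in self._WHITELIST:
            msg = ("Cannot access callable attribute {0!r} of {1!r} objects, "
                   "try using the 'apply' method".format(name, type(self).__name__))
            raise AttributeError(msg)
        # Sketch only: pandas would wrap f to apply it to each group.
        return f
```

In this model, re-enabling `.plot` amounts to adding `'plot'` to `_WHITELIST`.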
@nehalecky
Contributor Author

Oh yeah, currently at:

print pd.__version__
0.12.0-706-g8e784e7

@cpcloud
Member

cpcloud commented Oct 3, 2013

This is "sort of" by design. We're trying to discourage use of forwarded methods that don't make sense. Hopefully you're not using master in production :) Clearly this one shouldn't have been disabled. Very quick fix.

@ghost ghost assigned cpcloud Oct 3, 2013
@nehalecky
Contributor Author

Hey @cpcloud, thanks for the quick reply, and it makes sense to discourage methods that don't belong.

Not using it in production, just testing master against our current build; I like living on the edge! I traced it to this commit: b709389

Also, it looks like that commit clobbered the helpful tab autocompletion of column names as well?

Thanks again.

@cpcloud
Member

cpcloud commented Oct 3, 2013

That code didn't touch any autocompletion code. Can you show me an example of what you mean?

@jreback
Contributor

jreback commented Oct 3, 2013

@cpcloud there's no _local_dir() method on groupbys... we should add one (similar to the one in core/generic.py, but in this case referring, IIRC, to self.obj, i.e. the object being grouped).
This should probably be done slightly differently in SeriesGroupBy and DataFrameGroupBy.

@jreback
Contributor

jreback commented Oct 3, 2013

Something like this; you can just forward it, I think.

In core/groupby.py, on GroupBy:

    def _local_dir(self):
        """ add the string-like attributes from the info_axis """
        return self.obj._local_dir()
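A minimal sketch of how that forwarding could feed tab completion, assuming (as described later in the thread) that a base class defines `__dir__` and merges in whatever `_local_dir()` returns. `Base`, `Frame`, and `GroupBy` are hypothetical stand-ins, and filtering to valid, non-keyword identifiers is an assumption about which column names are worth offering as attributes.

```python
import keyword


class Base:
    def _local_dir(self):
        # Subclasses override this to expose instance-specific names.
        return []

    def __dir__(self):
        # Default attributes plus the "local" ones (e.g. column names).
        return sorted(set(super().__dir__()) | set(self._local_dir()))


class Frame(Base):
    """Hypothetical stand-in for a DataFrame: only tracks column names."""

    def __init__(self, columns):
        self.columns = list(columns)

    def _local_dir(self):
        # Only names usable as attributes make sense for grp.<tab>.
        return [c for c in self.columns
                if isinstance(c, str) and c.isidentifier()
                and not keyword.iskeyword(c)]


class GroupBy(Base):
    def __init__(self, obj):
        self.obj = obj

    def _local_dir(self):
        # Forward to the grouped object, as suggested above.
        return self.obj._local_dir()
```

With this layout, `dir(GroupBy(Frame(["kWh", "meter_id"])))` includes both column names, which is what IPython uses for completion.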

@cpcloud
Member

cpcloud commented Oct 3, 2013

sounds good

@cpcloud
Member

cpcloud commented Oct 3, 2013

Bonus: figured out the groupby double-plotting issue.

@nehalecky
Contributor Author

Hey @jreback, thanks for explaining more; I wasn't familiar with how pandas exposes column names as attributes on the groupby object. @cpcloud, to answer your question, I have a df like:

In [11]: df
Out[11]: 
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 30852 entries, 2013-01-01 00:00:00-06:00 to 2013-06-26 15:00:00-05:00
Data columns (total 2 columns):
kWh         30852  non-null values
meter_id    30852  non-null values
dtypes: float64(1), object(1)

In [12]: df.head()
Out[12]: 
                           kWh meter_id
2013-01-01 00:00:00-06:00   78  TSU_151
2013-01-01 00:15:00-06:00   72  TSU_151
2013-01-01 00:30:00-06:00   78  TSU_151
2013-01-01 00:45:00-06:00   78  TSU_151
2013-01-01 01:00:00-06:00   84  TSU_151

I create groupings, like:

In [18]: grp = df.groupby(df.meter_id)
In [19]: grp
Out[19]: <pandas.core.groupby.DataFrameGroupBy object at 0x10ef81ed0>

This is where things differ: previously I could access individual column names via grp.k<tab>, which would autocomplete to grp.kWh. I am still able to inspect them directly, like:

In [20]: grp.kWh
Out[20]: <pandas.core.groupby.SeriesGroupBy object at 0x10ef81a90>
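Attribute access like `grp.kWh` keeps working because an unknown attribute name that matches a column is turned into a per-column group object. A hypothetical stdlib-only sketch of that lookup follows; `FrameGroupBy`, `ColumnGroupBy`, and the dict-of-lists "frame" are illustrative stand-ins, not pandas classes, and the data is toy data.

```python
from collections import defaultdict


class ColumnGroupBy:
    """Hypothetical per-column group object (plays the role of SeriesGroupBy)."""

    def __init__(self, name, groups):
        self.name = name
        self.groups = groups  # group key -> list of values

    def max(self):
        # Per-group maximum, keyed by group.
        return {key: max(vals) for key, vals in self.groups.items()}


class FrameGroupBy:
    """Hypothetical stand-in for DataFrameGroupBy over a dict of columns."""

    def __init__(self, data, by):
        self.data = data  # column name -> list of values
        self.by = by      # name of the column to group on

    def __getattr__(self, attr):
        # Reached only when normal lookup fails; treat column names as attributes.
        data = self.__dict__.get("data", {})
        if attr in data:
            groups = defaultdict(list)
            for key, value in zip(data[self.by], data[attr]):
                groups[key].append(value)
            return ColumnGroupBy(attr, dict(groups))
        raise AttributeError("%r object has no attribute %r"
                             % (type(self).__name__, attr))
```

Nothing here makes `kWh` appear in `dir(grp)`: names resolved only through `__getattr__` are invisible to tab completion, which is exactly the gap described in this comment.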

I've also just noticed that a few other helpful methods, like .describe(), which I use all the time, are missing from groupby autocomplete. They are still accessible by typing them out explicitly:

In [21]: grp.describe()
Out[16]: 
                         kWh
meter_id                    
TSU_148  count  13362.000000
         mean     395.213434
         std       81.315125
         min        0.000000
         25%      325.300000
         50%      386.000000
         75%      453.600000
         max      666.500000
TSU_150  count   1672.000000
         mean     315.579725
         std       53.129335
         min       62.300000
         25%      280.700000
         50%      293.900000
         75%      339.825000
         max      577.100000
TSU_151  count  15818.000000
         mean     165.428246
         std       49.841351
         min       60.000000
         25%      108.000000
         50%      180.000000
         75%      204.000000
         max      276.000000
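The same gap explains `.describe()`: a method reachable only through `__getattr__` dispatch works when typed out but is absent from `dir()`. A hypothetical sketch in which a `__dir__` override also advertises the dispatched method names; `Series`, `PlainGroupBy`, `CompletingGroupBy`, and `_WHITELIST` are assumptions for illustration only.

```python
class Series:
    """Hypothetical stand-in for the object being grouped."""

    def describe(self):
        return "summary statistics"


class PlainGroupBy:
    # Hypothetical allow-list of methods dispatched to the grouped object.
    _WHITELIST = frozenset({"describe"})

    def __init__(self, obj):
        self.obj = obj

    def __getattr__(self, attr):
        # Reached only when normal lookup fails, so dispatched names
        # never appear in the default dir() listing.
        obj = self.__dict__.get("obj")
        if obj is not None and attr in self._WHITELIST:
            return getattr(obj, attr)
        raise AttributeError("%r object has no attribute %r"
                             % (type(self).__name__, attr))


class CompletingGroupBy(PlainGroupBy):
    def __dir__(self):
        # Advertise the dispatched names so tab completion can offer them.
        return sorted(set(super().__dir__()) | self._WHITELIST)
```

Both classes answer `.describe()`, but only `CompletingGroupBy` lists it in `dir()`.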

Continuing, after typing out kWh in full, I can tab-autocomplete again:

In [21]: grp.kWh.<tab>
grp.kWh.agg        grp.kWh.first      grp.kWh.last       grp.kWh.min        grp.kWh.ohlc       grp.kWh.sum        
grp.kWh.aggregate  grp.kWh.get_group  grp.kWh.max        grp.kWh.name       grp.kWh.prod       grp.kWh.transform  
grp.kWh.apply      grp.kWh.groups     grp.kWh.mean       grp.kWh.ngroups    grp.kWh.size       grp.kWh.var        
grp.kWh.filter     grp.kWh.indices    grp.kWh.median     grp.kWh.nth        grp.kWh.std       

But again, .describe() is missing from the list! Hope that helps and let me know if I can get you any more info!

Thanks!

@jreback
Contributor

jreback commented Oct 3, 2013

@nehalecky this was just changed: rather than implementing an explicit IPython autocomplete, it's more correct to define __dir__ on the object (which is done in the base class), with a _local_dir() override to add local attributes (e.g. column names and such).

@jtratner
Contributor

jtratner commented Oct 4, 2013

But clearly this needs to be fixed so it still completes column names and describe.

@jreback
Contributor

jreback commented Oct 4, 2013

Does this look right?

In [9]: index = MultiIndex(levels=[['foo', 'bar', 'baz', 'qux'],
                                   ['one', 'two', 'three']],
                           labels=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3],
                                   [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]],
                           names=['first', 'second'])

In [10]: df = DataFrame(np.random.randn(10, 3), index=index,columns=['A', 'B', 'C'])

In [11]: df
Out[11]: 
                     A         B         C
first second                              
foo   one    -0.939610 -0.109232 -0.540813
      two    -0.356905  1.118679  0.497318
      three  -0.262202  1.665174 -0.293807
bar   one     1.111391  2.378450 -0.252010
      two     0.155386 -0.893460  1.228347
baz   two     0.594110 -1.179119 -0.534873
      three  -1.523231  0.992770 -0.100973
qux   one     0.843675  0.546450 -0.669620
      two     1.147754  1.915836 -0.945840
      three   0.030786  0.375839  0.338216

In [12]: grp = df.groupby(level='second')

In [13]: grp.
grp.A          grp.C          grp.aggregate  grp.boxplot    grp.first      grp.groups     grp.last       grp.mean       grp.min        grp.ngroups    grp.ohlc       grp.size       grp.sum        grp.var        
grp.B          grp.agg        grp.apply      grp.filter     grp.get_group  grp.indices    grp.max        grp.median     grp.name       grp.nth        grp.prod       grp.std        grp.transform  

@jreback
Contributor

jreback commented Oct 4, 2013

We have an open issue to put describe there... it's not an official 'method' ATM (though it is dispatched)... hmm

@jreback
Contributor

jreback commented Oct 4, 2013

Revised (@cpcloud is updating with .plot):

In [4]: grp.
grp.A          grp.agg        grp.boxplot    grp.cummin     grp.describe   grp.filter     grp.groups     grp.last       grp.median     grp.ngroups    grp.prod       grp.resample   grp.sum        grp.var        
grp.B          grp.aggregate  grp.count      grp.cumprod    grp.dtype      grp.first      grp.head       grp.max        grp.min        grp.nth        grp.quantile   grp.size       grp.tail       
grp.C          grp.apply      grp.cummax     grp.cumsum     grp.fillna     grp.get_group  grp.indices    grp.mean       grp.name       grp.ohlc       grp.rank       grp.std        grp.transform  

@nehalecky
Contributor Author

@jreback and @cpcloud, that is looking great.

Thanks!
