BUG: in .groupby.SeriesGroupBy plot not accessible? #5102

nehalecky · 2013-10-03T22:46:08Z

Previously, in 0.12 and earlier, I could quickly visualize groupby objects with a call to .plot(). Currently in master, the .plot method on a groupby object raises with the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-63-3760a47a721b> in <module>()
----> 1 grp.kWh.plot()

/pandas/pandas/core/groupby.pyc in __getattr__(self, attr)
    250 
    251         if hasattr(self.obj, attr) and attr != '_cache':
--> 252             return self._make_wrapper(attr)
    253 
    254         raise AttributeError("%r object has no attribute %r" %

/pandas/pandas/core/groupby.pyc in _make_wrapper(self, name)
    265                    "using the 'apply' method".format(kind, name,
    266                                                      type(self).__name__))
--> 267             raise AttributeError(msg)
    268 
    269         f = getattr(self.obj, name)

AttributeError: Cannot access callable attribute 'plot' of 'SeriesGroupBy' objects, try using the 'apply' method


nehalecky · 2013-10-03T22:47:33Z

Oh yeah, currently at:

print pd.__version__
0.12.0-706-g8e784e7

cpcloud · 2013-10-03T22:49:00Z

This is "sort of" by design. We're trying to discourage use of forwarded methods that don't make sense. Hopefully you're not using master in production :) Clearly this one shouldn't have been disabled. Very quick fix.
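
For readers following along, here is a rough sketch of the forwarding pattern visible in the traceback: __getattr__ hands the name to _make_wrapper, which refuses anything not explicitly allowed. The class names Wrapped and GroupLike and the _allowed set are hypothetical stand-ins for illustration, not the actual pandas code.

```python
class Wrapped:
    """Hypothetical stand-in for the object being grouped (not a pandas class)."""

    def plot(self):
        return "plotted"

    def sort(self):
        return "sorted"


class GroupLike:
    """Hypothetical stand-in for a groupby object that forwards some methods."""

    # Assumption: a whitelist of names that may be forwarded to the wrapped object.
    _allowed = frozenset(["plot"])

    def __init__(self, obj):
        self.obj = obj

    def __getattr__(self, attr):
        # __getattr__ is only called when normal attribute lookup fails.
        if attr == "obj":
            # Guard against infinite recursion if self.obj is not set yet.
            raise AttributeError(attr)
        if hasattr(self.obj, attr):
            return self._make_wrapper(attr)
        raise AttributeError(
            "%r object has no attribute %r" % (type(self).__name__, attr)
        )

    def _make_wrapper(self, name):
        if name not in self._allowed:
            raise AttributeError(
                "Cannot access callable attribute %r of %r objects, "
                "try using the 'apply' method" % (name, type(self).__name__)
            )
        # Forward to the bound method on the wrapped object.
        return getattr(self.obj, name)
```

With this sketch, GroupLike(Wrapped()).plot() is forwarded, while .sort raises an AttributeError like the one reported above. Under this reading, re-enabling .plot amounts to adding the name to the allowed set.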

nehalecky · 2013-10-03T22:53:28Z

Hey @cpcloud, thanks for the quick reply, and makes sense to discourage methods that don't belong.

Not using in production, but testing master on our current build—I like living on the edge! Traced to this commit:
b709389

Also, I noticed that the helpful tab auto-complete of column names was clobbered with the commit as well?

Thanks again.

cpcloud · 2013-10-03T22:54:55Z

That code didn't touch any autocompletion code. Can you show me an example of what you mean?

jreback · 2013-10-03T23:02:42Z

@cpcloud there's no _local_dir() method on groupbys... we should add one, similar to the one in core/generic.py, but in this case referring to self.obj (IIRC), the object the groupby wraps.
This should probably be in SeriesGroupBy and DataFrameGroupBy, slightly differently in each.

jreback · 2013-10-03T23:04:46Z

something like this: you can just forward it I think

in core/groupby.py

on Groupby

    def _local_dir(self):
        """ add the string-like attributes from the info_axis """
        return self.obj._local_dir()

cpcloud · 2013-10-03T23:07:56Z

sounds good

cpcloud · 2013-10-03T23:28:27Z

bonus: figured out the groupby double plotting issue

nehalecky · 2013-10-03T23:33:22Z

Hey @jreback, thanks for explaining more—I wasn't familiar with how pandas propagates the column names to be listed like attributes on the groupby object. @cpcloud, to answer your question, I have a df, like:

In [11]: df
Out[11]: 
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 30852 entries, 2013-01-01 00:00:00-06:00 to 2013-06-26 15:00:00-05:00
Data columns (total 2 columns):
kWh         30852  non-null values
meter_id    30852  non-null values
dtypes: float64(1), object(1)

In [12]: df.head()
Out[12]: 
                           kWh meter_id
2013-01-01 00:00:00-06:00   78  TSU_151
2013-01-01 00:15:00-06:00   72  TSU_151
2013-01-01 00:30:00-06:00   78  TSU_151
2013-01-01 00:45:00-06:00   78  TSU_151
2013-01-01 01:00:00-06:00   84  TSU_151

I create groupings, like:

In [18]: grp = df.groupby(df.meter_id)
In [19]: grp
Out[19]: <pandas.core.groupby.DataFrameGroupBy object at 0x10ef81ed0>

And it's here where things aren't the same, as I could previously access individual column names via grp.k<tab>, which would autocomplete to grp.kWh. I am still able to inspect, like:

In [20]: grp.kWh
Out[20]: <pandas.core.groupby.SeriesGroupBy object at 0x10ef81a90>

I've also just noticed that a few other helpful methods are missing from groupby autocomplete, like .describe(), which I use all the time. They are still accessible by typing them out explicitly:

In [21]: grp.describe()
Out[16]: 
                         kWh
meter_id                    
TSU_148  count  13362.000000
         mean     395.213434
         std       81.315125
         min        0.000000
         25%      325.300000
         50%      386.000000
         75%      453.600000
         max      666.500000
TSU_150  count   1672.000000
         mean     315.579725
         std       53.129335
         min       62.300000
         25%      280.700000
         50%      293.900000
         75%      339.825000
         max      577.100000
TSU_151  count  15818.000000
         mean     165.428246
         std       49.841351
         min       60.000000
         25%      108.000000
         50%      180.000000
         75%      204.000000
         max      276.000000

Continuing, after typing out kWh in full, I can once again tab autocomplete, like:

In [21]: grp.kWh.<tab>
grp.kWh.agg        grp.kWh.first      grp.kWh.last       grp.kWh.min        grp.kWh.ohlc       grp.kWh.sum        
grp.kWh.aggregate  grp.kWh.get_group  grp.kWh.max        grp.kWh.name       grp.kWh.prod       grp.kWh.transform  
grp.kWh.apply      grp.kWh.groups     grp.kWh.mean       grp.kWh.ngroups    grp.kWh.size       grp.kWh.var        
grp.kWh.filter     grp.kWh.indices    grp.kWh.median     grp.kWh.nth        grp.kWh.std

But again, .describe() is missing from the list! Hope that helps and let me know if I can get you any more info!

Thanks!

jreback · 2013-10-03T23:34:52Z

@nehalecky this was just changed: rather than doing an explicit IPython autocomplete, it's more correct to define __dir__ on the object (which is done in the base class), with a _local_dir() override to supply local attributes (e.g. column names and such)
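
A minimal sketch of that idea in plain Python, assuming a base class whose __dir__ merges the usual names with whatever _local_dir() returns. The classes Base, FrameLike and GroupByLike are hypothetical and only illustrate the pattern; they are not the pandas implementation.

```python
class Base:
    """Hypothetical base class: __dir__ is defined once, here."""

    def _local_dir(self):
        # Subclasses override this to expose extra, instance-specific names.
        return []

    def __dir__(self):
        names = set(dir(type(self))) | set(self.__dict__)
        names.update(self._local_dir())
        return sorted(names)


class FrameLike(Base):
    """Hypothetical stand-in for a DataFrame with named columns."""

    def __init__(self, columns):
        self.columns = list(columns)

    def _local_dir(self):
        # Only string column names that are valid identifiers can be
        # completed as attributes.
        return [c for c in self.columns
                if isinstance(c, str) and c.isidentifier()]


class GroupByLike(Base):
    """Hypothetical stand-in for a groupby object wrapping a frame."""

    def __init__(self, obj):
        self.obj = obj

    def _local_dir(self):
        # Forward to the wrapped object so its column names show up too.
        return self.obj._local_dir()
```

Since tab completion in IPython consults dir(), calling dir(GroupByLike(FrameLike(["kWh", "meter_id"]))) in this sketch includes "kWh" and "meter_id" alongside the regular attributes, without any completer-specific hooks.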

jtratner · 2013-10-04T00:11:15Z

But clearly this needs to be fixed so it still completes column names and describe.

jreback · 2013-10-04T00:23:56Z

this look right?

In [9]: index = MultiIndex(levels=[['foo', 'bar', 'baz', 'qux'],
                                   ['one', 'two', 'three']],
                           labels=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3],
                                   [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]],
                           names=['first', 'second'])

In [10]: df = DataFrame(np.random.randn(10, 3), index=index,columns=['A', 'B', 'C'])

In [11]: df
Out[11]: 
                     A         B         C
first second                              
foo   one    -0.939610 -0.109232 -0.540813
      two    -0.356905  1.118679  0.497318
      three  -0.262202  1.665174 -0.293807
bar   one     1.111391  2.378450 -0.252010
      two     0.155386 -0.893460  1.228347
baz   two     0.594110 -1.179119 -0.534873
      three  -1.523231  0.992770 -0.100973
qux   one     0.843675  0.546450 -0.669620
      two     1.147754  1.915836 -0.945840
      three   0.030786  0.375839  0.338216

In [12]: grp = df.groupby(level='second')

In [13]: grp.
grp.A          grp.C          grp.aggregate  grp.boxplot    grp.first      grp.groups     grp.last       grp.mean       grp.min        grp.ngroups    grp.ohlc       grp.size       grp.sum        grp.var        
grp.B          grp.agg        grp.apply      grp.filter     grp.get_group  grp.indices    grp.max        grp.median     grp.name       grp.nth        grp.prod       grp.std        grp.transform

jreback · 2013-10-04T00:24:28Z

we have an open issue to put describe there... it's not an official 'method' ATM (though it is dispatched)... hmm

jreback · 2013-10-04T00:36:30Z

revised (@cpcloud is updating with .plot)

In [4]: grp.
grp.A          grp.agg        grp.boxplot    grp.cummin     grp.describe   grp.filter     grp.groups     grp.last       grp.median     grp.ngroups    grp.prod       grp.resample   grp.sum        grp.var        
grp.B          grp.aggregate  grp.count      grp.cumprod    grp.dtype      grp.first      grp.head       grp.max        grp.min        grp.nth        grp.quantile   grp.size       grp.tail       
grp.C          grp.apply      grp.cummax     grp.cumsum     grp.fillna     grp.get_group  grp.indices    grp.mean       grp.name       grp.ohlc       grp.rank       grp.std        grp.transform

nehalecky · 2013-10-04T00:54:09Z

@jreback and @cpcloud, that is looking great.

Thanks!

ghost assigned cpcloud Oct 3, 2013

cpcloud mentioned this issue Oct 4, 2013

BUG: allow plot, boxplot, hist and completion on GroupBy objects #5105

Merged

cpcloud closed this as completed in #5105 Oct 4, 2013

wesm unassigned cpcloud Oct 12, 2016
