
BUG: make sure that we are passing thru kwargs to groupby #15054



Closed (1 commit)


jreback (Contributor) commented Jan 4, 2017

BUG: allow timedelta64 to work in groupby with numeric_only=False

closes #5724

In [6]:         df = pd.DataFrame(
   ...:              {'group': [1, 1, 2],
   ...:               'int': [1, 2, 3],
   ...:               'float': [4., 5., 6.],
   ...:               'string': list('abc'),
   ...:               'category_string': pd.Series(list('abc')).astype('category'),
   ...:               'category_int': [7, 8, 9],
   ...:               'datetime': pd.date_range('20130101', periods=3),
   ...:               'datetimetz': pd.date_range('20130101',
   ...:                                           periods=3,
   ...:                                           tz='US/Eastern'),
   ...:               'timedelta': pd.timedelta_range('1 s', periods=3, freq='s')},
   ...:              columns=['group', 'int', 'float', 'string',
   ...:                       'category_string', 'category_int',
   ...:                       'datetime', 'datetimetz',
   ...:                       'timedelta'])

In [7]: df
Out[7]: 
   group  int  float string category_string  category_int   datetime                datetimetz  timedelta
0      1    1    4.0      a               a             7 2013-01-01 2013-01-01 00:00:00-05:00   00:00:01
1      1    2    5.0      b               b             8 2013-01-02 2013-01-02 00:00:00-05:00   00:00:02
2      2    3    6.0      c               c             9 2013-01-03 2013-01-03 00:00:00-05:00   00:00:03

# same as in master
In [8]: df.groupby('group').mean()
Out[8]: 
       int  float  category_int
group                          
1      1.5    4.5           7.5
2      3.0    6.0           9.0

# works
In [9]: df.groupby('group').mean(numeric_only=False)
Out[9]: 
       int  float  category_int            datetime                datetimetz       timedelta
group                                                                                        
1      1.5    4.5           7.5 2013-01-01 12:00:00 2013-01-01 12:00:00-05:00 00:00:01.500000
2      3.0    6.0           9.0 2013-01-03 00:00:00 2013-01-03 00:00:00-05:00        00:00:03

# same as in master
In [10]: df.groupby('group').sum()
Out[10]: 
       int  float  category_int
group                          
1        3    9.0            15
2        3    6.0             9

# works
In [11]: df.groupby('group').sum(numeric_only=False)
Out[11]: 
       int  float string  category_int  timedelta
group                                            
1        3    9.0     ab            15   00:00:03
2        3    6.0      c             9   00:00:03

# same as in master
In [13]: df.groupby('group').min()
Out[13]: 
       int  float string  category_int   datetime                datetimetz  timedelta
group                                                                                 
1        1    4.0      a             7 2013-01-01 2013-01-01 00:00:00-05:00   00:00:01
2        3    6.0      c             9 2013-01-03 2013-01-03 00:00:00-05:00   00:00:03
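The datetime and timedelta means in Out[9] are the core of this fix. The following is a minimal sketch, assuming a recent pandas install (2.x or later), that reproduces those two columns. It selects them explicitly so the result does not depend on how a given pandas version treats non-numeric columns such as `string`. The trimmed-down frame is illustrative and not part of the PR.

```python
import pandas as pd

# Same values as In [6], restricted to the group key plus the
# datetime-like columns whose mean only appears with numeric_only=False.
df = pd.DataFrame({
    'group': [1, 1, 2],
    'datetime': pd.date_range('20130101', periods=3),
    'timedelta': pd.timedelta_range('1 s', periods=3, freq='s'),
})

# Selecting the columns explicitly keeps the result independent of how
# non-numeric columns (e.g. strings) are handled by the reduction.
result = df.groupby('group')[['datetime', 'timedelta']].mean()
print(result)
```

Group 1 should come out as 2013-01-01 12:00:00 and 00:00:01.5, matching the corresponding row of Out[9].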

jreback added the Groupby and Timedelta labels Jan 4, 2017
codecov-io commented Jan 4, 2017

Current coverage is 84.73% (diff: 64.61%)

Merging #15054 into master will decrease coverage by 0.01%

@@             master     #15054   diff @@
==========================================
  Files           145        145          
  Lines         51232      51270    +38   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          43420      43445    +25   
- Misses         7812       7825    +13   
  Partials          0          0          

Last update 0fe491d...768fce1

jorisvandenbossche (Member) left a comment

Shouldn't we also explicitly test that groupby().mean() now works on a timedelta column? (As I understand it, that is what was reported in the issue.)
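A test along the lines requested might look like the sketch below. The function name `test_groupby_mean_timedelta` is hypothetical (it is not taken from the PR), and the sketch assumes a recent pandas and pytest-style discovery. It checks both the frame-level reduction with `numeric_only=False` and the column-level `mean`.

```python
import pandas as pd


def test_groupby_mean_timedelta():
    # Hypothetical regression test: mean over a timedelta64 column should
    # return timedeltas rather than dropping the column.
    df = pd.DataFrame({
        'group': [1, 1, 2],
        'timedelta': pd.timedelta_range('1 s', periods=3, freq='s'),
    })

    # Frame-level reduction: the timedelta column must survive.
    frame_result = df.groupby('group').mean(numeric_only=False)
    assert list(frame_result.columns) == ['timedelta']

    # Column-level reduction: values and dtype kind ('m' = timedelta64).
    series_result = df.groupby('group')['timedelta'].mean()
    assert series_result.dtype.kind == 'm'
    assert series_result.loc[1] == pd.Timedelta('1.5s')
    assert series_result.loc[2] == pd.Timedelta('3s')

    # Both paths should agree element-wise.
    assert (frame_result['timedelta'] == series_result).all()
```

Comparing against `pd.Timedelta` scalars rather than a full expected Series keeps the check independent of the timedelta resolution a given pandas version chooses.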

    # categoricals are only 1d, so we
    # are not setup for dim transforming
    if is_categorical_dtype(values):
        raise NotImplementedError
Review comment on this code (Member):

According to the coverage report, this error is never reached. Is it needed for the categorical you added in the test?

jreback force-pushed the groupby_arg branch 2 times, most recently from 045bcb5 to 75fc113, January 9, 2017 21:33
jreback (Contributor, Author) commented Jan 9, 2017

So I finally fixed this; it was a PITA!

Passing numeric_only=False simply did not work at all before. Now datetimes and timedeltas work (and even strings do, if the function supports them).
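The "strings work if the function supports it" point can be illustrated with a small sketch, assuming pandas 2.x or later. The column is pinned to `dtype=object` so it stays a classic object-backed string column across versions. The expectation that `mean` on strings raises `TypeError` is our assumption about current pandas behaviour, not something stated in this PR.

```python
import pandas as pd

df = pd.DataFrame({
    'group': [1, 1, 2],
    # dtype=object pins an object-backed string column.
    'string': pd.Series(list('abc'), dtype=object),
    'timedelta': pd.timedelta_range('1 s', periods=3, freq='s'),
})

grouped = df.groupby('group')

# min/max are defined for both strings and timedeltas.
mins = grouped[['string', 'timedelta']].min()

# mean has no meaning for strings, so the reduction is refused
# (assumed to surface as a TypeError in recent pandas).
try:
    grouped['string'].mean()
    mean_raised = False
except TypeError:
    mean_raised = True

print(mins)
print('mean on strings raised TypeError:', mean_raised)
```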

jreback (Contributor, Author) commented Jan 11, 2017

@jorisvandenbossche

jreback closed this in 8e13da2 Jan 18, 2017
AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this pull request Mar 21, 2017
BUG: allow timedelta64 to work in groupby with numeric_only=False
closes pandas-dev#5724

Author: Jeff Reback

Closes pandas-dev#15054 from jreback/groupby_arg and squashes the following commits:

768fce1 [Jeff Reback] BUG: make sure that we are passing thru kwargs to groupby
BUG: allow timedelta64 to work in groupby with numeric_only=False
Labels: Groupby, Timedelta
Linked issue (may be closed by this pull request): groupby.mean, etc, doesn't recognize timedelta64
3 participants