Skip to content

ENH: Allow aggregate numeric operations on timedelta64. #6884

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 9 commits into from

Conversation

danielballan
Copy link
Contributor

closes #5724

Currently, timedelta64 columns can't be used in aggregate group operations. They are quietly dropped or, if no other columns are present, an exception is raised:

In [2]: df
Out[2]: 
     A        E
0  foo 00:00:01
1  bar 00:00:02
2  foo 00:00:03
3  bar 00:00:04


In [3]: df.groupby('A').mean()
Out[3]: 
(...)
DataError: No numeric types to aggregate

But any operation that works on numerical data is also well defined on timedeltas. This PR relaxes the restriction on those methods designated "numeric only" to accept numeric dtypes and timedelta64. I include a unit test with one simple case each for DataFrameGroupBy and SeriesGroupBy.

In [2]: df.groupby('A').mean()
Out[2]: 

           E
A           
bar 00:00:03
foo 00:00:02

@jreback
Copy link
Contributor

jreback commented Apr 15, 2014

FYI their is an issue for adding timedelta support to groupby - see if u can find it

also this needs a fair amount of tests as trit are many paths thru groupby

basically need to mimic most of the datetime64 type tests

@danielballan
Copy link
Contributor Author

Have we established some pattern for skipping timedelta tests for numpy < 1.7?

Also, while I'm here, I'll fix the English in this error....

ValueError: to_timedelta is not support for numpy < 1.7

@jreback
Copy link
Contributor

jreback commented Apr 15, 2014

numpy timedelta is thoroughly broken < 1.7

@cancan101
Copy link
Contributor

@jreback jreback added this to the 0.14.0 milestone Apr 16, 2014
@jreback
Copy link
Contributor

jreback commented Apr 21, 2014

@danielballan hows' this coming?

@danielballan
Copy link
Contributor Author

A quick, perhaps obvious, thought: this is not exactly analogous to what was done for datetime. For example, summing datetimes is (correctly) not defined but summing timedeltas should be.

But many of the tests will be analogous. I have to put this aside until next week, but I can pick it up by the beginning of May. If someone wants to tap in before then, feel free to speak up.

@jreback
Copy link
Contributor

jreback commented Apr 21, 2014

gr8 thanks

would like like to get some minimal set at least in for 0.14

as I agree it's a bit nontrivial
so test away!

@jreback
Copy link
Contributor

jreback commented Apr 28, 2014

@danielballan can you rebase, add a couple more tests, and release note / v0.14.0.txt ?

@keir
Copy link

keir commented Apr 29, 2014

I am working on finishing this patch, and I'm not convinced this is the right approach. What about making timedeltas considered a numeric type? Otherwise we're going to have to update all locations where there are is_numeric checks.

I suppose this opens the question about what "is_numeric" means.

In particular, the current Pandas master does

def _cython_agg_blocks(self, how, numeric_only=True):
    ...
    if numeric_only:
        data = data.get_numeric_data(copy=False)

I just started using Pandas today, so it's possible the original patch is the right approach.

@keir
Copy link

keir commented Apr 29, 2014

@jreback

@danielballan
Copy link
Contributor Author

I considered lumping time deltas in with is_numeric, but I decided it was
clearer, more readable this way. Still up for debate though.

On Tuesday, April 29, 2014, Keir Mierle [email protected] wrote:

@jreback https://github.com/jreback


Reply to this email directly or view it on GitHubhttps://github.com//pull/6884#issuecomment-41642641
.

@danielballan
Copy link
Contributor Author

Rebased.

I can add tests this week, but I also welcome any contributions that @keir has to offer. FYI, @keir, as I revisit this, I find one place where we check is_numeric in a way that excludes timedeltas: above the call to ensure_float. This is why I made them separate.

@keir
Copy link

keir commented Apr 29, 2014

@danielballan, I am operating off trunk, which seems to have changed one of the functions (_cython_agg_blocks). I ported the code tried my approach to make timedelta is_numeric, but I am now not convinced of the approach. Do you want to update to master?

@danielballan
Copy link
Contributor Author

OK. I updated to master this morning. See the latest version of the commit. (N.B. The tests are broken; this is a work-in-progress commit.)

@jreback
Copy link
Contributor

jreback commented May 1, 2014

@danielballan how's this coming?

@jreback jreback modified the milestones: 0.14.1, 0.14.0 May 5, 2014
@danielballan
Copy link
Contributor Author

@jreback If I can push through this this weekend, still time for 0.14? Actually maybe better to wait for 0.14.1 so we can get #5986 also.

@jreback
Copy link
Contributor

jreback commented May 16, 2014

well I want to get a RC1 out, but if you are ready after, then ok

@jreback
Copy link
Contributor

jreback commented May 16, 2014

FYI, I think #5986 for quantile has been fixed by @TomAugspurger very recently

@TomAugspurger
Copy link
Contributor

I don't know if I tested timedeltas at all in quantile, just timestamps.

@keir
Copy link

keir commented May 16, 2014

I don't suggest merging this patch. I have a new, smaller, different patch that changes timedelta64 to be a numeric type. I will try to upload it later today but currently I have some production issues.

@danielballan
Copy link
Contributor Author

@keir Does that approach create problems when com.ensure_float is called on timedelta64 data? I am eager to see it and build out tests. Maybe we are too late for 0.14, which is too bad, but I am eager to have it available on master regardless.

@jreback
Copy link
Contributor

jreback commented May 30, 2014

@danielballan 0.14.0 released (ann coming later today), but this could be on master/0.14.1 in any event

@jreback
Copy link
Contributor

jreback commented Jun 5, 2014

@danielballan can you rebase this?

@jreback
Copy link
Contributor

jreback commented Sep 4, 2014

@danielballan revisit? i'd like to get into 0.15

@jreback jreback modified the milestones: 0.15.1, 0.15.0 Sep 9, 2014
@danielballan
Copy link
Contributor Author

@jreback I won't have time to touch this again until January, but at that point I'll be joining mpl's @tacaswell at Brookhaven, and I will have some solid time to contribute again.

@jreback
Copy link
Contributor

jreback commented Oct 2, 2014

@danielballan np, should be straight forward with the new Timedelta type in any event.

have fun, according to cc @ericdill you have LOTS of data!

@danielballan
Copy link
Contributor Author

I was glad to see Timedelta go in. Yes, we have our work cut out for us!

@jreback
Copy link
Contributor

jreback commented Jan 18, 2015

@danielballan can you revisit / update

@danielballan
Copy link
Contributor Author

OK, I'm back on the job. :- D

@gruen
Copy link
Contributor

gruen commented Jan 18, 2015

My sanity thanks you.

On Jan 18, 2015, at 4:57 PM, Dan Allan [email protected] wrote:

OK, I'm back on the job. :- D


Reply to this email directly or view it on GitHub.

@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 3, 2015
@jreback
Copy link
Contributor

jreback commented May 9, 2015

closing pls reopen if/when updated

@jreback jreback closed this May 9, 2015
@rjp23
Copy link

rjp23 commented Jun 10, 2019

Trying to work out why groupby was dropping my datetime columns and came across this. Did it never get fixed?

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Jun 13, 2019 via email

@rjp23
Copy link

rjp23 commented Jun 13, 2019

Well it should be fixed but I don't think I'm knowledgeable enough to do it.

@jazzlw
Copy link

jazzlw commented Mar 20, 2024

This seems to have been fixed somewhere between version 1.3.4 and version 2.2.0 (two versions I had convenient to test) but I'm not sure when or how. If someone knows, can they put it here for people who end up here confused in the future, as I did?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

groupby.mean, etc, doesn't recognize timedelta64
8 participants