Inconsistency in return value of mean, median, and quantile on timedelta64[ns] series #4984

Closed
nathanwdavis opened this issue Sep 25, 2013 · 6 comments · Fixed by #4985
Labels: API Design, Numeric Operations, Timedelta

@nathanwdavis

If I have a Series of timedelta64[ns] (the result of a diff() on a Timestamp Series in my case), the mean, median, and quantile operations on that series return values of inconsistent types. Example:

In [68]: s
Out[68]: 
0   00:00:14.020705
1   00:00:14.020705
2   00:00:14.020705
3   00:00:28.041410
4   00:00:28.041410
5   00:00:42.062116
6   00:00:42.062116
7   00:00:42.062116
8   00:00:56.082821
9   00:02:20.207052
dtype: timedelta64[ns]

In [69]: s.mean()
Out[69]: 42062115618.0

In [70]: type(s.mean())
Out[70]: numpy.float64

In [71]: s.median()
Out[71]: 35051763015.0

In [72]: type(s.median())
Out[72]: float

In [73]: s.quantile(.95)
Out[73]: numpy.timedelta64(102351148003,'ns')

In [74]: type(s.quantile(.95))
Out[74]: numpy.timedelta64

As you can see, mean and median return a float (numpy.float64 and a plain float respectively, which is why the printed Out differs a little), but quantile returns a single numpy.timedelta64 in nanoseconds.

This is with versions pandas==0.12.0 and numpy==1.7.1
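For comparison, the three values reported above can be brought into a common type by hand. The sketch below copies the numbers from the session and assumes the floats are nanosecond counts (consistent with the timedelta64[ns] dtype). Using `pd.Timedelta` as the common type is a choice made for illustration, not something the report itself proposes, and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Values copied from the session above (pandas 0.12.0, numpy 1.7.1).
mean_ns = 42062115618.0                    # reported as numpy.float64
median_ns = 35051763015.0                  # reported as float
q95 = np.timedelta64(102351148003, "ns")   # reported as numpy.timedelta64

# Assumption: the floats are nanosecond counts, so int() only drops
# a zero fractional part here.
normalized = {
    "mean": pd.Timedelta(int(mean_ns), unit="ns"),
    "median": pd.Timedelta(int(median_ns), unit="ns"),
    "quantile(0.95)": pd.Timedelta(q95),
}

for name, value in normalized.items():
    print(name, value, type(value).__name__)
```

Once all three are the same type they can be compared or subtracted directly, which is the consistency the report is asking for.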

@jreback
Contributor

jreback commented Sep 25, 2013

By definition, mean cannot return a timedelta64[ns] type in the general case (I suppose there are cases where it could). You are averaging integer nanosecond values, and the result might not be an integer.

The same goes for median (though you could make a case that it should just return the closest value; right now it doesn't do that).
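A minimal sketch of the point about non-integer results, using two hypothetical one- and two-nanosecond durations (not data from this issue). It goes through NumPy's int64 view of the values so that the raw nanosecond counts are visible.

```python
import pandas as pd

# Two durations whose average falls between two nanosecond ticks.
td = pd.Series(pd.to_timedelta([1, 2], unit="ns"))

# Raw nanosecond counts as plain integers.
ns = td.to_numpy().astype("int64")

# The arithmetic mean of the integers is fractional, so it cannot be
# stored exactly as a whole number of nanoseconds.
mean_ns = ns.mean()

# One way to get back to a timedelta is to truncate the fraction.
truncated = pd.Timedelta(int(mean_ns), unit="ns")

print(mean_ns, truncated)
```

Truncating loses at most one nanosecond, which is usually negligible for real data.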

In 0.13/master you can convert the results back to timedeltas with pd.to_timedelta:

In [73]: s = Series([Timestamp('20130101') + timedelta(seconds=i*i) for i in range(10) ])

In [74]: s
Out[74]: 
0   2013-01-01 00:00:00
1   2013-01-01 00:00:01
2   2013-01-01 00:00:04
3   2013-01-01 00:00:09
4   2013-01-01 00:00:16
5   2013-01-01 00:00:25
6   2013-01-01 00:00:36
7   2013-01-01 00:00:49
8   2013-01-01 00:01:04
9   2013-01-01 00:01:21
dtype: datetime64[ns]

In [75]: td = s.diff()

In [76]: td
Out[76]: 
0        NaT
1   00:00:01
2   00:00:03
3   00:00:05
4   00:00:07
5   00:00:09
6   00:00:11
7   00:00:13
8   00:00:15
9   00:00:17
dtype: timedelta64[ns]

In [77]: pd.to_timedelta([int(td.quantile(.1))])
Out[77]: 
0   00:00:02.600000
dtype: timedelta64[ns]

In [78]: pd.to_timedelta([int(td.mean())])
Out[78]: 
0   00:00:09
dtype: timedelta64[ns]

In [79]: pd.to_timedelta([int(td.median())])
Out[79]: 
0   00:00:08
dtype: timedelta64[ns]

@ghost ghost assigned jreback Sep 25, 2013
@jreback
Contributor

jreback commented Sep 25, 2013

Let me see if I can do the automatic inferring... probably not too hard.

@nathanwdavis
Author

Yeah, it makes sense that mean and median return float (because they have to). I guess what I would do (if I were designing the API) would be to have quantile also return float even though it does not strictly need to - for the sake of consistency.

Thanks for the tip about the new stuff in 0.13

@jreback
Contributor

jreback commented Sep 25, 2013

@nathanwdavis I just fixed this in #4985; they can easily return timedeltas (and it makes sense), it just truncates the float (at the ns level), so no biggie.

Note that most timedelta ops return a Series (this has to do with boxing them, as the np.timedelta64 scalar is not the greatest):

In [1]: s = Series([Timestamp('20130101') + timedelta(seconds=i*i) for i in range(10) ])

In [2]: td = s.diff()

In [3]: td
Out[3]: 
0        NaT
1   00:00:01
2   00:00:03
3   00:00:05
4   00:00:07
5   00:00:09
6   00:00:11
7   00:00:13
8   00:00:15
9   00:00:17
dtype: timedelta64[ns]

In [4]: td.mean()
Out[4]: 
0   00:00:09
dtype: timedelta64[ns]

In [5]: td.quantile(.1)
Out[5]: 
0   00:00:02.600000
dtype: timedelta64[ns]

In [6]: td.median()
Out[6]: 
0   00:00:08
dtype: timedelta64[ns]

In [7]: pd.to_timedelta('00:00:02.6')
Out[7]: 
0   00:00:02.600000
dtype: timedelta64[ns]
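As a rough cross-check on a current installation: in recent pandas releases (an assumption about the installed version, since this thread only covers 0.12 and 0.13-dev) these reductions return a scalar `pd.Timedelta` rather than a one-element Series. The sketch below rebuilds the same `td` and prints the type of each result. The cast to `datetime64[ns]` is added here to pin nanosecond resolution and is not part of the original session. The printed values may not match the transcript above exactly, because the handling of the leading NaT may differ between versions.

```python
from datetime import timedelta

import pandas as pd

# Same construction as in the session above, with explicit imports.
s = pd.Series(
    [pd.Timestamp("20130101") + timedelta(seconds=i * i) for i in range(10)]
).astype("datetime64[ns]")  # assumption: pin ns resolution for comparability

td = s.diff()  # first element is NaT, the rest are 1s, 3s, ..., 17s

results = {
    "mean": td.mean(),
    "median": td.median(),
    "quantile(0.1)": td.quantile(0.1),
}

for name, value in results.items():
    print(f"{name}: {value} ({type(value).__name__})")
```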

@nathanwdavis
Author

@jreback Very nice!! I like this. You're fast too 👍 Looking forward to 0.13...

@jreback
Contributor

jreback commented Sep 25, 2013

@nathanwdavis

hahah...95% was already done....this was just a last little bit...
