BUG(?): Var of Timedelta with empty / NA #18880

TomAugspurger · 2017-12-20T19:46:49Z

Edit: it seems the bug is that .var returns numeric values rather than timedeltas, as .std does (#18880 (comment)).

I'm working on the empty / all-NA handling; this is related, but separate.

Currently, for an empty (or all-NA) series of timedeltas, .var() returns NaT (see Out[44] for the empty case). I think this should be NaN, since the reduction otherwise returns numeric values, not timedeltas.

In [40]: s = pd.Series(pd.timedelta_range(0, periods=12, freq='S'))

In [41]: s[0] = np.nan

In [42]: s.var()
Out[42]: 1.1e+19

In [43]: s.var(skipna=False)
Out[43]: nan

In [44]: s[:0].var()
Out[44]: NaT
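To see where Out[42] comes from, here is a minimal sketch that repeats the computation on the underlying int64 nanosecond values with NumPy. Treating .var() as a plain sample variance (ddof=1) over those integers is an assumption about what pandas does internally, but it reproduces the number: after s[0] = np.nan the remaining values are 1 through 11 seconds.

```python
import numpy as np

# After s[0] = np.nan, the non-NA values are 1..11 seconds.
seconds = np.arange(1, 12, dtype=np.int64)
nanoseconds = seconds * 10**9  # int64 nanoseconds, as stored by timedelta64[ns]

# Sample variance (ddof=1), the default for Series.var
var_ns2 = np.var(nanoseconds.astype("float64"), ddof=1)  # units: ns**2
var_s2 = np.var(seconds.astype("float64"), ddof=1)       # units: s**2

print(var_ns2)
print(var_s2)
```

So the 1.1e+19 in Out[42] is 11 s**2 expressed in ns**2, which is why it doesn't look like a timedelta at all.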

I've just noticed some other buggy behavior with aggregations on timedeltas, so maybe this can become a meta-issue about timedeltas and numeric aggregations:

In [57]: s.sum(skipna=False)
Out[57]: Timedelta('-106752 days +00:13:49.145223')

I'm guessing we improperly add the timedelta min here.
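A minimal sketch of that guess, assuming the NaT sentinel (numpy.iinfo(numpy.int64).min, which is how NaT is stored in timedelta64[ns]) gets summed as an ordinary integer together with the 1..11 second values:

```python
import numpy as np
import pandas as pd

nat_sentinel = int(np.iinfo(np.int64).min)        # underlying int64 value of NaT
valid_ns = sum(k * 10**9 for k in range(1, 12))   # 66 seconds, in nanoseconds

# If the sentinel leaks into the reduction, the "sum" is just this integer.
leaked = pd.Timedelta(nat_sentinel + valid_ns)
print(leaked)
```

This lands at -106752 days +00:13:49.145..., reproducing Out[57] to within about a microsecond, which is consistent with the sentinel being added rather than propagated as NaT.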


TomAugspurger · 2017-12-20T19:49:36Z

Ah, interesting: .std returns its result as a timedelta, so maybe the bug is that .var isn't handling dtypes correctly:

In [11]: s.std()
Out[11]: Timedelta('0 days 00:00:03.316624')

In [12]: s[:0].std()
Out[12]: NaT
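The .std result is consistent with the numeric variance: taking the square root of a value in ns**2 brings the unit back to ns, which a Timedelta can hold. A minimal sketch using the data from In [40] and In [41] (1 through 11 seconds once s[0] is NaN), cross-checked against pandas' own .std():

```python
import numpy as np
import pandas as pd

nanoseconds = np.arange(1, 12, dtype="float64") * 1e9  # 1..11 s, in ns
var_ns2 = np.var(nanoseconds, ddof=1)  # units: ns**2
std_ns = np.sqrt(var_ns2)              # units: ns, so it fits in a Timedelta

std_td = pd.Timedelta(std_ns, unit="ns")

# pandas computes the same thing directly on the timedelta Series
pandas_std = pd.Series(pd.to_timedelta(np.arange(1, 12), unit="s")).std()
print(std_td, pandas_std)
```

Both come out at roughly 3.3166 seconds, matching Out[11]; the variance has no such way back to a plain duration.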

jreback · 2017-12-20T20:07:36Z

Why would this be numeric? Any reduction should keep the same dtype (except int -> float).

jreback · 2017-12-20T20:09:07Z

Yeah, it probably doesn't handle the ufunc (sqrt) properly, e.g.

This works, but it's probably not what it's doing.

In [7]: pd.Timedelta(np.sqrt(s.std().value))
Out[7]: Timedelta('0 days 00:00:00.000057')

TomAugspurger · 2017-12-20T20:23:06Z

Yeah, I was just saying numeric because the non-NA versions were numeric, but that's the real bug. I'll edit the original bug report.


JesperDramsch · 2018-09-01T12:19:03Z

Should this be closed? @TomAugspurger

TomAugspurger · 2018-09-11T21:07:57Z

I don't think so, do you? Series[timedelta].var() is still numeric for me, rather than a Timedelta.

jbrockmendel · 2018-11-30T02:38:02Z

Shouldn't Series[timedelta].var() raise? std() makes sense.

TomAugspurger · 2019-06-07T13:15:01Z

> Shouldn't Series[timedelta].var() raise? std() makes sense.

Can you clarify this? The meaning of std & var for timedeltas is a bit hazy to me, but why would one make sense and not the other?

jbrockmendel · 2019-06-07T13:49:37Z

> Can you clarify this? The meaning of std & var for timedeltas is a bit hazy to me, but why would one make sense and not the other?

The issue is the units. Var(td64ns) would have to have unit ns^2, which we don't have a way to represent.

TomAugspurger · 2019-06-07T14:07:46Z

One more question: why don't we say that `Var(td64ns) <=> Timedelta(Var(td64ns.asi8))`? In other words:

```python
In [14]: values = [1000, 2000, 3000]

In [15]: pd.Series(values).std()
Out[15]: 1000.0

In [16]: pd.Series(values).var()
Out[16]: 1000000.0

In [18]: pd.Series(pd.to_timedelta(values)).std()
Out[18]: Timedelta('0 days 00:00:00.000001')  # = Timedelta(values.std())

In [19]: pd.Series(pd.to_timedelta(values)).std().value
Out[19]: 1000
```

I agree that ns^2 isn't something we can represent, but what about Timedelta(i8^2)? Like I said, this is hazy to me, so it may be meaningless.
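One way to see the problem with Timedelta(Var(td64ns.asi8)): the result depends on which unit the integers happen to be stored in. A minimal sketch with the same three durations as values above (1, 2, 3 microseconds), expressed once as nanoseconds and once as microseconds; the helper naive_var_as_timedelta is hypothetical, not a pandas function.

```python
import numpy as np
import pandas as pd


def naive_var_as_timedelta(ints, unit):
    """Hypothetical: wrap the numeric sample variance of `ints` as a Timedelta in `unit`."""
    var = float(np.var(np.asarray(ints, dtype="float64"), ddof=1))
    return pd.Timedelta(var, unit=unit)


in_ns = naive_var_as_timedelta([1000, 2000, 3000], unit="ns")  # var = 1e6 -> 1e6 ns (1 ms)
in_us = naive_var_as_timedelta([1, 2, 3], unit="us")           # var = 1.0 -> 1 us
print(in_ns, in_us)
```

The same data gives answers a factor of 1000 apart. The std doesn't have this problem: scaling the inputs by 1000 scales the std by 1000, so converting back to a Timedelta yields the same duration in either unit.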


jbrockmendel · 2019-06-07T14:16:35Z

>>> import numpy as np
>>> import pandas as pd
>>> i8values = np.array([1e9, 2e9, 3e9], dtype='i8')
>>> tdi = pd.to_timedelta(i8values)
>>> ser = pd.Series(tdi)
>>> ser.std()
Timedelta('0 days 00:00:01')

I think we're on the same page that ser.std() makes sense here. If we could do ser.var(), it would have to obey the identity ser.var() == (ser.std())**2, but we can't compute the right-hand side.

Does this help? Whichever way you do it, you need to get ns^2 in there.
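The right-hand side of that identity is a Timedelta squared, and pandas defines no such operation. A minimal sketch using the same ser as above; the check on ser.var() assumes a pandas version that includes #28289, where the reduction is expected to raise TypeError.

```python
import numpy as np
import pandas as pd

i8values = np.array([1e9, 2e9, 3e9], dtype="i8")
ser = pd.Series(pd.to_timedelta(i8values))
std = ser.std()  # a Timedelta of 1 second

# Right-hand side of ser.var() == ser.std() ** 2: Timedelta ** int is not defined.
try:
    std_squared = std ** 2
except TypeError as exc:
    std_squared = None
    print("cannot square a Timedelta:", exc)

# Left-hand side: expected to raise TypeError once #28289 is in place (assumption).
try:
    ser.var()
    var_raises = False
except TypeError:
    var_raises = True
print("var raises:", var_raises)
```

Neither side of the identity can produce a value, which is the argument for making .var() raise rather than return a number in ns^2 or a Timedelta with the wrong units.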

TomAugspurger added this to the Next Major Release milestone Dec 20, 2017

TomAugspurger added the Difficulty Intermediate, Numeric Operations (Arithmetic, Comparison, and Logical operations), and Timedelta (Timedelta data type) labels Dec 20, 2017

TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Dec 21, 2017

BUG: Fixed timedelta numeric operations

4cb07d6

Closes pandas-dev#18880

TomAugspurger mentioned this issue Dec 21, 2017

BUG: Fixed timedelta numeric operations #18892

Closed

TomAugspurger mentioned this issue Jun 7, 2019

Parallel variance computation for dataframes dask/dask#4865

Merged


jbrockmendel mentioned this issue Sep 4, 2019

BUG: Series[timdelta64].var() should _not_ work #28289

Merged


jreback modified the milestones: Contributions Welcome, 1.0 Sep 7, 2019

jbrockmendel removed Difficulty Intermediate labels Oct 21, 2019

jreback closed this as completed in #28289 Nov 2, 2019
