
BUG/ENH: GH10319 added higher_precision argument to rolling_mean/sum #10328


Closed

Conversation

kawochen
Contributor

This is not specifically for #10319, but it should close it for most cases, provided users pass in higher_precision=True. #10319 is not really a bug in the Cython code, just precision loss from doing finite-precision arithmetic naively.

>>> sum([0.00012456, 0.0003, -0.00012456, -0.0003])
-5.421010862427522e-20

The magnitude of this error is 2**-64 and quite negligible, although it can be made much larger with carefully chosen numbers. I think the rolling_* functions that require summation could use some more precision, since doing everything in one pass accumulates rounding error, even if the rolling window is narrow.
The only reason [0.00012456, 0.0003, 0.0, 0.0] currently returns the 'correct' result while [0.00012456, 0.0003, -0.0, -0.0] doesn't is that negative entries are explicitly counted. This counting is done in rolling_mean but not in rolling_sum, and the lack of symmetry doesn't seem desirable. Perhaps we could remove the counting, or add it to rolling_sum as well?
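
For illustration (this is not necessarily what this PR implements), a compensated Kahan sum is one standard way to keep extra precision in a running sum without switching to a wider dtype; on the example above the residue disappears:

def kahan_sum(values):
    # Compensated summation: 'c' carries the low-order bits that a plain
    # running sum throws away at each step.
    total = 0.0
    c = 0.0
    for v in values:
        y = v - c              # re-inject the part lost previously
        t = total + y          # low-order bits of y may be lost here
        c = (t - total) - y    # recover exactly what was lost
        total = t
    return total

>>> kahan_sum([0.00012456, 0.0003, -0.00012456, -0.0003])
0.0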

@jreback
Contributor

jreback commented Jun 11, 2015

This is going to kill performance. I also think an option to do this is really a bad API. I would only not be -1 if this could be done with another non-slow algorithm (e.g. like we did using the Welford method for variance).
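
For reference, a bare-bones sketch of the Welford-style recurrence being referred to here (illustrative Python only, not the Cython code pandas uses); it computes a variance in one pass without the cancellation problems of the naive sum-of-squares formula:

def welford_variance(values):
    # Track the running mean and the running sum of squared deviations (m2),
    # updating both for every new element.
    n = 0
    mean = 0.0
    m2 = 0.0
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)   # note: uses the updated mean
    return m2 / (n - 1) if n > 1 else float('nan')

>>> welford_variance([4.0, 7.0, 13.0, 16.0])
30.0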

@kawochen
Contributor Author

Point taken. The algorithm itself doesn't seem that slow to me, though my implementation certainly is. I will write everything C-style, remove the option (no branching), and measure the performance.

@shoyer
Member

shoyer commented Jun 11, 2015

I agree with @jreback, I don't think this is the right API. Double-doubles are fine (in theory), but the right way to use them would be to cast your column to a double-double explicitly, e.g., via astype. But I would also want to see this float128 type in numpy first rather than trying to implement it in pandas.
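
A rough sketch of the explicit-cast usage described here, using numpy directly (pandas' rolling code does not accept extended precision at this point, and np.longdouble is only wider than float64 on some platforms, e.g. 80-bit extended on typical x86 Linux builds):

>>> import numpy as np
>>> vals = np.array([0.00012456, 0.0003, -0.00012456, -0.0003])
>>> print(vals.sum())                        # plain float64 accumulation
-5.421010862427522e-20
>>> print(vals.astype(np.longdouble).sum())  # widen explicitly first, e.g. via astype
0.0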

@jreback
Contributor

jreback commented Jun 11, 2015

actually float128 DOES exist in numpy (though it is not supported on some systems) - so better would be to add it to the code generators (but these would have to be protected by try/excepts where the signature doesn't exist) - maybe a better solution for that is to use ifdef, though
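
The platform caveat can also be seen from Python: np.float128 is simply not defined on builds whose C long double is 64-bit, so any float128-specialised path has to be optional (a small sketch, separate from the code-generator/ifdef change being discussed):

import numpy as np

# np.float128 only exists where the platform's long double is wider than
# 64 bits; elsewhere the attribute is missing entirely.
try:
    wide_dtype = np.float128
except AttributeError:          # e.g. typical Windows builds
    wide_dtype = np.float64     # fall back to ordinary doubles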

@kawochen
Contributor Author

@shoyer My concern is that the current rolling_sum/mean algorithm, although fast, introduces a lot more numerical instability than the most naive algorithm, without users being aware. Because the window slides by adding each entering value and subtracting each leaving one, the sum for the last window is the result of 2*n operations even if the window size is 1.
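
To make the 2*n point concrete, here is a rough pure-Python sketch of the sliding add/subtract update (the real implementation is Cython and also handles NaN counting and min_periods); every output, including the last one, inherits the rounding error of all earlier updates, while recomputing each window from scratch does not:

import numpy as np

def rolling_sum_sliding(x, window):
    # Add the entering value, subtract the leaving one: O(n) work in total,
    # but rounding error accumulates across the whole series.
    out = np.full(len(x), np.nan)
    acc = 0.0
    for i, v in enumerate(x):
        acc += v
        if i >= window:
            acc -= x[i - window]
        if i >= window - 1:
            out[i] = acc
    return out

def rolling_sum_naive(x, window):
    # Recompute every window from scratch: O(n*window) work, but the error
    # in one output never leaks into the next.
    out = np.full(len(x), np.nan)
    for i in range(window - 1, len(x)):
        out[i] = sum(x[i - window + 1 : i + 1])
    return out

x = [0.00012456, 0.0003, -0.00012456, -0.0003] * 1000
diff = np.abs(rolling_sum_sliding(x, 1) - rolling_sum_naive(x, 1))
print(diff.max())   # small but nonzero: accumulated rounding, even with window == 1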

@jreback
Contributor

jreback commented Jun 11, 2015

@kawochen separately you could short-circuit the rolling functions when window == 1 (though only for sum/mean; var/std etc. are all NaN at that point).
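
What such a short-circuit could look like, as a sketch (the wrapper name and the slow-path helper are hypothetical, not pandas internals):

import numpy as np

def rolling_sum(values, window):
    arr = np.asarray(values, dtype=np.float64)
    if window == 1:
        # each window holds exactly one element, so the rolling sum is just
        # the data itself; skip the accumulating loop entirely
        return arr.copy()
    return _rolling_sum_kernel(arr, window)   # hypothetical slow path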

kawochen closed this Jun 11, 2015