BUG: Avoid cancellations in nanskew/nankurt. #12121

Closed
wants to merge 1 commit into from

jvkersch

closes #11974

This replaces the implementation of nanskew and nankurt with an algorithm that avoids the round-off error caused by subtracting very similar large values. The sample skew and kurtosis are now computed directly from the central moments, rather than from the expanded form of the moments, which is prone to cancellation errors. The new algorithm is about as efficient as the original implementation (in terms of time and memory); a more efficient one-pass algorithm is certainly possible but would require some Cython code.
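
As a rough illustration of computing the skew directly from the moments, here is a minimal sketch using plain Python floats rather than NumPy arrays. The function name `skew_from_moments` and the choice to return 0.0 when the second moment is exactly zero are assumptions made for this example, not code from the pull request; only the final expression follows the formula that appears in the diff.

```python
import math


def skew_from_moments(values):
    """Sample skewness from sums of centered powers (illustrative sketch)."""
    # Drop NaNs, mirroring what a NaN-aware reduction would do.
    x = [v for v in values if not math.isnan(v)]
    count = len(x)
    if count < 3:
        # The bias-corrected formula divides by (count - 2).
        return math.nan

    mean = sum(x) / count
    # Center first, then raise to powers: no large terms are subtracted later.
    adjusted = [v - mean for v in x]
    m2 = sum(d * d for d in adjusted)
    m3 = sum(d * d * d for d in adjusted)

    if m2 == 0:
        # Assumption for this sketch: a zero-variance sample has zero skew.
        return 0.0

    return (count * (count - 1) ** 0.5 / (count - 2)) * (m3 / m2 ** 1.5)
```

Note that `m2` and `m3` here are sums of centered powers, not averages, which is what makes the `count * (count - 1) ** 0.5 / (count - 2)` prefactor come out right.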

@jvkersch jvkersch force-pushed the fix/stable-kurt-skew branch 2 times, most recently from 7335213 to 92bf9a4 Compare January 24, 2016 14:43
@jreback jreback added the Numeric Operations Arithmetic, Comparison, and Logical operations label Jan 24, 2016
@@ -888,6 +888,103 @@ def prng(self):
return np.random.RandomState(1234)


class TestNanskewFixedValues(tm.TestCase):

add the issue number here as a comment

@jreback
Contributor

jreback commented Jan 24, 2016

very minor change. pls rebase. ping when ready.

@jreback jreback added this to the 0.18.0 milestone Jan 24, 2016
@jreback jreback added the Compat pandas objects compatability with Numpy or Python functions label Jan 24, 2016

-    result = ((np.sqrt(count * count - count) * C) /
-              ((count - typ(2)) * np.sqrt(B)**typ(3)))
+    result = count * (count-1) ** 0.5 / (count-2) * m3 / m2 ** 1.5

maybe add parens for clarity of precedence?

@kawochen
Contributor

In computing skewness, is the computation of m3 still potentially unstable? (Just wanted to understand the new behavior)

@jvkersch jvkersch force-pushed the fix/stable-kurt-skew branch from 92bf9a4 to 7b310da Compare January 25, 2016 00:33
@jvkersch
Author

@jreback Thanks for the quick review. I made the changes, and Travis passes.

@kawochen Not sure why you refer to m3 explicitly, but while the calculation with the moments can also be unstable (e.g. when you have a series with huge values and a huge mean), it should be much less so than the previous implementation, which does things like (x ** k).sum() - x.sum() ** k, which quickly picks up a lot of round-off error.
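
To make the cancellation concrete, here is a small sketch that compares the expanded form with the centered form for the second moment only. It uses plain Python floats, and the helper names `sum_sq_dev_expanded` and `sum_sq_dev_centered` are hypothetical; neither is taken from pandas.

```python
def sum_sq_dev_expanded(x):
    """Sum of squared deviations via the expanded form sum(x^2) - (sum x)^2 / n."""
    n = len(x)
    # Both terms are around 3e18 here, so their low-order digits are already
    # rounded away before the subtraction happens.
    return sum(v * v for v in x) - sum(x) ** 2 / n


def sum_sq_dev_centered(x):
    """Sum of squared deviations computed after subtracting the mean."""
    n = len(x)
    mean = sum(x) / n
    return sum((v - mean) ** 2 for v in x)


# Large values with a tiny spread; the exact answer is 1 + 0 + 1 = 2.
data = [1e9 + 1.0, 1e9 + 2.0, 1e9 + 3.0]

centered = sum_sq_dev_centered(data)
expanded = sum_sq_dev_expanded(data)
```

For this input the centered version gives exactly 2.0, because the mean and the deviations are all exactly representable. The expanded version cannot give 2.0: doubles near 3e18 are spaced 512 apart, so the difference of the two large terms is necessarily a multiple of 512.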

@kawochen
Contributor

@jvkersch I referred to m3 because that is one of the calculations that could benefit from summing the positive and negative parts separately. Symmetric samples would then have a skew of exactly 0 more often. But the final skew numbers for symmetric samples are close enough to 0 as they are, so I'm just pointing that out, not suggesting that we do that.
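
The idea of summing the positive and negative parts separately could look roughly like the sketch below. This is not something the pull request implements; the function name `third_moment_sum_split` is hypothetical, and sorting each half before summing is an extra assumption added so that mirrored samples are accumulated in the same order.

```python
def third_moment_sum_split(deviations):
    """Sum of cubed deviations, accumulating positive and negative cubes separately."""
    # d * d * d keeps exact odd symmetry: (-d) * (-d) * (-d) == -(d * d * d).
    cubes = [d * d * d for d in deviations]
    # Sort each half by magnitude so mirrored inputs are summed in the same order.
    positive = sorted(c for c in cubes if c > 0)
    negative = sorted(-c for c in cubes if c < 0)
    return sum(positive) - sum(negative)


# Deviations that are exactly symmetric about zero, in scrambled order.
half = [0.1, 0.7, 1.3, 2.9, 1e-3, 123.456]
symmetric = [half[3], -half[0], half[5], -half[2], half[1], -half[4],
             -half[3], half[0], -half[5], half[2], -half[1], half[4]]

m3 = third_moment_sum_split(symmetric)
```

When the deviations are exactly mirrored, both halves contain the same floating-point values in the same order, so the two partial sums are identical and `m3` is exactly 0.0. In practice the deviations come from subtracting a computed mean, so they are only mirrored if that mean is itself exact, which limits how often this helps.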

@jreback jreback closed this in 419e9a2 Jan 25, 2016
@jreback
Contributor

jreback commented Jan 25, 2016

@jvkersch thanks!

@jvkersch
Author

@kawochen I see, that makes sense; I didn't think about the alternate signs. Thanks for the clarification.

Successfully merging this pull request may close these issues.

Series.skew() for constant series returns inconsistent values
3 participants