Series.diff() not exact for huge numbers: approximation or mistake? #2087

Closed
ktii opened this issue Oct 20, 2012 · 1 comment

ktii commented Oct 20, 2012

Series.diff() is not exact for very large integers. This might be an intentional approximation or simply a bug.

In [40]: a = 10000000000000000

In [41]: log10(a)
Out[41]: 16.0

In [42]: b = a + 1

In [43]: s = Series([a,b])

In [44]: s.diff()
Out[44]:
0 NaN
1 0

numpy.diff() is not to blame:

In [45]: v = s.values

In [46]: v
Out[46]: array([10000000000000000, 10000000000000001], dtype=int64)

In [47]: diff(v)
Out[47]: array([1], dtype=int64)

With one fewer digit the result is correct:

In [48]: a = 1000000000000000

In [48]: log10(a)
Out[48]: 15.0

In [49]: b = a + 1

In [50]: s = Series([a,b])

In [51]: s.diff()
Out[51]:
0 NaN
1 1

Why do I need these huge numbers? Certain timestamps (in my case VMS) have this many digits (tenths of microseconds since the year 1858, if I remember correctly).
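To give a rough sense of the magnitudes involved, here is a small sketch that counts 100-nanosecond ticks from an assumed VMS base date. The epoch of 17 November 1858 and the names `VMS_EPOCH` and `vms_ticks` are assumptions made for illustration; they are not taken from this report or from pandas.

```python
from datetime import datetime, timedelta

# Assumed VMS base date (illustrative; the report only says "the year 1858").
VMS_EPOCH = datetime(1858, 11, 17)

def vms_ticks(moment):
    """Hypothetical helper: whole 100 ns ticks elapsed since VMS_EPOCH."""
    microseconds = (moment - VMS_EPOCH) // timedelta(microseconds=1)
    return microseconds * 10  # ten ticks of 100 ns per microsecond

ticks = vms_ticks(datetime(2012, 10, 20))
digits = len(str(ticks))

# Integers above 2**53 are not all exactly representable as float64.
exceeds_float_range = ticks > 2 ** 53
```

Under these assumptions a timestamp from 2012 has 17 digits, so it sits in the range where the problem shown above appears.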

wesm (Member) commented Oct 20, 2012

This is a casualty of how NA values are represented in integer arrays: the data is upcast to floating point, which cannot represent every integer of this size exactly. diff should be more careful when working with integer dtypes in this case.
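The effect of an upcast to floating point can be reproduced without pandas. The following is a minimal sketch in plain Python, assuming the values are converted to 64-bit floats before subtraction; `float_diff` and `int_diff` are hypothetical helpers for illustration, not pandas internals.

```python
# A 64-bit float has a 53-bit significand, so above 2**53 (about 9.0e15)
# consecutive integers can no longer all be told apart.
FLOAT_EXACT_LIMIT = 2 ** 53

def float_diff(a, b):
    """Subtract after converting both integers to float (mimics an upcast)."""
    return float(b) - float(a)

def int_diff(a, b):
    """Subtract while staying in integer arithmetic."""
    return b - a

small = 10 ** 15  # below the limit: 15 zeros, as in the second session
large = 10 ** 16  # above the limit: 16 zeros, as in the first session

small_result = float_diff(small, small + 1)  # 1.0, the difference survives
large_result = float_diff(large, large + 1)  # 0.0, both values round to the same float
exact_result = int_diff(large, large + 1)    # 1, integer arithmetic is exact
```

This matches the two sessions in the report: 10**15 is below 2**53 and 10**16 is above it, which would explain why one fewer digit gives the expected result.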
