Series near-zero subtraction loss of precision #2760

Closed
nehalecky opened this issue Jan 27, 2013 · 3 comments

Comments

@nehalecky
Contributor

Unless I am unaware of some preset precision in pandas, I've discovered a bug in what I believe to be some part of the Series subtraction operation in pandas 0.10.1, numpy 1.6.2. I haven't tested in other versions.

Data set:

import numpy as np
import pandas as pd

a = [49105.962326,
     49106.062271,
     49106.112167,
     49106.222305,
     49106.332290]
s = pd.Series(a)

A simple slice to calculate deltas:

dt = s[1:] - s[:-1]

results in:

0   NaN
1     0
2     0
3     0
4   NaN

The resultant values (i.e., values stored at dt[1:3]) are exactly zero and of type numpy.float64. These zeros and the final null value are not correct, which can be shown with conversion to a numpy array and performing the same operation:

npa = np.array(s)
npa[1:] - npa[:-1]

results in the expected (and correct) values:

array([ 0.099945,  0.049896,  0.110138,  0.109985])

A little digging didn't turn up an existing report of this, but it may be related to issues #2697 and/or #2069?

@dsm054
Contributor

dsm054 commented Jan 30, 2013

I think this is expected behaviour. When you operate on two Series, they are aligned on the index. s[1:] has labels 1 through 4, and s[:-1] has labels 0 through 3. So you wind up with zero wherever the labels are shared (1, 2, 3), because each value is subtracted from itself, and NaN where the subtraction is undefined because one side is missing (0 and 4).

Either s.diff()[1:] or (s-s.shift())[1:], staying with Series, or s.values[1:] - s.values[:-1], dropping to arrays like you did, should give the numbers you're looking for.
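As a rough sketch of the point above, the snippet below reruns the data from the original report and compares the label-aligned subtraction with the three suggested alternatives. The variable names (`aligned`, `by_diff`, `by_shift`, `by_values`) are made up for illustration, and `.iloc` is used for the slices only to make the positional slicing explicit.

```python
import numpy as np
import pandas as pd

a = [49105.962326, 49106.062271, 49106.112167, 49106.222305, 49106.332290]
s = pd.Series(a)

# Positional slices keep their original labels:
# s.iloc[1:] carries labels 1..4, s.iloc[:-1] carries labels 0..3.
aligned = s.iloc[1:] - s.iloc[:-1]

# On shared labels (1, 2, 3) each value is subtracted from itself;
# labels 0 and 4 exist on only one side, so the result there is NaN.
print(aligned)

# Label-aware ways to get consecutive differences.
by_diff = s.diff()[1:]
by_shift = (s - s.shift())[1:]

# Positional subtraction on the underlying NumPy arrays (no alignment).
by_values = s.values[1:] - s.values[:-1]

print(by_diff)
print(by_values)
```

All three alternatives should agree with each other; `s.diff()` is probably the most direct choice when the result should stay a Series.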

@nehalecky
Contributor Author

Hey @dsm054, I really appreciate your reply. Haha. Wow, you know, I think I'll just get out my Sharpie now and write across the top of my laptop: do not post issues late at night.... After not looking away from a screen for 14 hours and a few 🍸, one can get issue trigger happy. You're right, it is expected behavior (intrinsic to pandas' beauty, in fact), as noted in the first paragraph explaining pandas data structures:

Here is a basic tenet to keep in mind: data alignment is intrinsic. The link between labels and data will not be broken unless done so explicitly by you.

Thanks for taking the time to note this and do know that it wasn't in vain—I am smarter. Regarding myself and posting issues, a basic tenet to keep in mind: sit on it until the next morning.
Sorry for the noise. Closed. Keep calm, carry on.

@Geekly

Geekly commented Feb 6, 2019

Just to test my understanding: when you slice a Series, the index is preserved. Any operation you perform against another Series is aligned by the original Series index, not by position within the slice. Is that correct?
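One way to check this understanding is a small experiment, sketched here with made-up data and hypothetical variable names (`tail`, `head`, `by_label`, `by_position`). It shows that a slice carries the labels of the rows it selected, that arithmetic matches on those labels, and that dropping the labels with `reset_index(drop=True)` makes the subtraction positional.

```python
import pandas as pd

s = pd.Series([10.0, 20.0, 30.0, 40.0])

tail = s.iloc[1:]    # keeps labels 1, 2, 3
head = s.iloc[:-1]   # keeps labels 0, 1, 2

print(list(tail.index))
print(list(head.index))

# Arithmetic matches operands on the labels they carry, not on position.
by_label = tail - head
print(by_label)

# Discarding the labels gives each slice a fresh 0..n-1 index,
# so the same subtraction now pairs elements by position.
by_position = tail.reset_index(drop=True) - head.reset_index(drop=True)
print(by_position)
```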
