Setting a new column with the sum of two existing columns does not work correctly for large dataframes #12088

Closed
kirickt opened this issue Jan 19, 2016 · 5 comments
Labels
Duplicate Report Duplicate issue or pull request

Comments

@kirickt

kirickt commented Jan 19, 2016

Setting a new column to the sum of two existing columns does not work reliably for large DataFrames: X['C']=X['A']+X['B'] intermittently produces wrong results on some runs when the frame has more than 10000 rows. X['A'] and X['B'] may contain any float values. This issue is absent in pandas 0.16.2.
I am running 64-bit Python on Windows 10. See the attached notebook for details:

ReadDebug1.zip

To reproduce:

+++++++++++++++++++++++++++++++++++++

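from __future__ import print_function  # lets the script run on Python 2 and 3

import pandas as pd

DataFrameSize = 10001  # works correctly with 10000 or fewer rows
XR = pd.DataFrame({'A': pd.Series(1, index=list(range(DataFrameSize)), dtype='float32'),
                   'B': pd.Series(2, index=list(range(DataFrameSize)), dtype='float32')})

def CleanDebug(X):
    # Add column C as the element-wise sum of A and B (modifies X in place).
    X.loc[:, 'C'] = X.loc[:, 'A'] + X.loc[:, 'B']
    # X['C'] = X['A'] + X['B']
    return X

for i in range(1000):
    print('iteration', i)
    ry = CleanDebug(XR)
    # Every row of C should be 3.0, so the column sum should be 3 * 10001 = 30003.
    assert abs(ry.C.sum() - 30003) < 1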
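
A possible way to make an intermittent failure like this easier to spot is to compare the pandas result against plain NumPy addition on the underlying arrays, rather than checking only the column sum. The sketch below is an illustration, not part of the original report; the helper name column_sum_matches is hypothetical.

```python
import numpy as np
import pandas as pd


def column_sum_matches(frame):
    """Return True if frame['A'] + frame['B'] agrees with NumPy addition."""
    # Series arithmetic goes through pandas (and numexpr, when pandas uses it).
    via_pandas = (frame['A'] + frame['B']).values
    # Adding the raw arrays bypasses pandas' expression machinery.
    via_numpy = frame['A'].values + frame['B'].values
    return np.array_equal(via_pandas, via_numpy)


size = 10001
frame = pd.DataFrame({
    'A': pd.Series(1, index=range(size), dtype='float32'),
    'B': pd.Series(2, index=range(size), dtype='float32'),
})

# On an unaffected installation the two results should agree.
print(column_sum_matches(frame))
```

Running the check in a loop, as in the reproduction above, would show whether the mismatch appears only on some iterations.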

@jorisvandenbossche
Member

Can you show the output of pd.show_versions()?
Maybe also put the notebook in a gist on GitHub; then it is easier to see the content.

@jreback
Contributor

jreback commented Jan 19, 2016

This is almost certainly the same as #12023.

You probably have an older numexpr; upgrade to 2.4.6 (latest) and reconfirm.
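
To see which numexpr is installed before upgrading, something like the following sketch could be used. It assumes numexpr exposes a __version__ string; the helper names version_tuple and numexpr_status, and the MIN_NUMEXPR constant, are hypothetical and only encode the 2.4.6 recommendation from this comment.

```python
import importlib

MIN_NUMEXPR = (2, 4, 6)  # version recommended in this thread


def version_tuple(text):
    """Turn '2.4.6' into (2, 4, 6); stop at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def numexpr_status():
    """Describe the installed numexpr relative to MIN_NUMEXPR."""
    try:
        numexpr = importlib.import_module("numexpr")
    except ImportError:
        return "numexpr is not installed"
    installed = version_tuple(numexpr.__version__)
    if installed < MIN_NUMEXPR:
        return "numexpr %s is older than 2.4.6; upgrade and rerun" % numexpr.__version__
    return "numexpr %s is at least 2.4.6" % numexpr.__version__


print(numexpr_status())
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as '2.10' sorting before '2.4'.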

@kirickt
Author

kirickt commented Jan 19, 2016

Yes, it was an older numexpr, 2.4.4. After upgrading to 2.4.6 the bug went away. Thanks a lot!
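
Where upgrading is not immediately possible, one way to check whether numexpr is involved is to switch it off inside pandas and rerun the sum. This is a sketch that assumes the pandas option compute.use_numexpr is available in the installed version; it was not part of the original exchange.

```python
import pandas as pd

size = 10001
frame = pd.DataFrame({
    'A': pd.Series(1, index=range(size), dtype='float32'),
    'B': pd.Series(2, index=range(size), dtype='float32'),
})

# Temporarily tell pandas not to use numexpr, even if it is installed.
# The previous setting is restored when the block exits.
with pd.option_context('compute.use_numexpr', False):
    total = (frame['A'] + frame['B']).sum()

# Each row sums to 3.0, so the total should be close to 3 * size.
print(total)
```

If the result is correct with the option off and wrong with it on, that points at the numexpr code path rather than at pandas indexing.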

@kirickt
Author

kirickt commented Jan 19, 2016

Should I delete the post?

@jreback jreback closed this as completed Jan 19, 2016
@jreback
Contributor

jreback commented Jan 19, 2016

nope it's good

@jorisvandenbossche jorisvandenbossche added the Duplicate Report Duplicate issue or pull request label Jan 19, 2016
@jorisvandenbossche jorisvandenbossche added this to the No action milestone Jan 19, 2016