Bug: Bizarre DataFrame printing inconsistency (REPL only) #4531
Comments
In your example, what happens if you add an assert after the assignment?
xx['d'][1] = 42
assert xx["d"][1] == 42, "%r != %r" % (xx['d'][1], 42)
The assert passes, but printing the frame still yields the incorrect result.
Python 2.7.3 (default, Apr 10 2013, 05:13:16)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> import pandas
>>> length = 5
>>> xx = pandas.DataFrame({'a':np.random.randn(length),'b':np.random.randn(length),'c':np.random.randn(length)})
>>> xx['d']=xx['a']
>>> np.may_share_memory(xx['d'],xx['a'])
False
>>> print xx
a b c d
0 0.026757 0.180201 0.370744 0.026757
1 -0.935387 0.784654 1.573150 -0.935387
2 1.069168 0.154530 -2.280620 1.069168
3 3.467050 0.701040 -1.047132 3.467050
4 0.761229 -0.739372 0.537327 0.761229
>>> xx['d'][1] = 42
>>> assert xx["d"][1] == 42, "%r != %r" % (xx['d'][1], 42)
>>> print xx
a b c d
0 0.026757 0.180201 0.370744 0.026757
1 -0.935387 0.784654 1.573150 -0.935387
2 1.069168 0.154530 -2.280620 1.069168
3 3.467050 0.701040 -1.047132 3.467050
4 0.761229 -0.739372 0.537327 0.761229
>>> quit()
This is a chained assignment. There is an existing issue about it (I can't find it right this second). There is a cache reference that is quite tricky to update, but that is fixed for 0.13.
In any event, this might work but cannot be relied upon. In particular, for a frame with multiple dtypes it will not work, but for a homogeneously dtyped frame it might be operating on a view, depending on the exact memory layout, and so it could work.
Bottom line: don't use this syntax. (I would raise on it, but without deep inspection of the call stack it's pretty hard to detect that this is even happening inside the chained setitem.)
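To make the point about chained assignment concrete, here is a minimal sketch of what `xx['d'][1] = 42` expands to. The small frame with fixed values is a hypothetical stand-in for the random one in the report, and the sketch deliberately makes no claim about whether the frame ends up modified, since that depends on the pandas version and the memory layout.

```python
import pandas as pd

# Hypothetical stand-in for the reported frame; fixed values replace randn.
xx = pd.DataFrame({'a': [0.1, 0.2, 0.3], 'b': [1.0, 2.0, 3.0]})
xx['d'] = xx['a']

# xx['d'][1] = 42 is executed as two independent calls:
tmp = xx.__getitem__('d')   # step 1: returns a Series (a view or a copy)
tmp.__setitem__(1, 42)      # step 2: writes into that Series only

# The temporary Series always sees the new value ...
assert tmp[1] == 42

# ... but whether the frame sees it depends on whether step 1 returned a
# view, so the result is reported here rather than assumed.
frame_updated = bool(xx.loc[1, 'd'] == 42)
print("frame updated: %s" % frame_updated)
```

Because pandas only ever sees the second call as an ordinary write to a Series, it cannot reliably tell that the write was meant for the frame.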
@jreback what's the preferred syntax for this?
Always use ix/loc/iloc:
xx.loc[1,'d'] = 42
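As a rough illustration of that advice, the sketch below does the same update with a single indexer call, first by label and then by position. The frame and its values are hypothetical. Note that `ix` was available in the pandas version discussed here but was removed in pandas 1.0, so only `loc` and `iloc` are shown.

```python
import pandas as pd

# Hypothetical stand-in for the reported frame.
xx = pd.DataFrame({'a': [0.1, 0.2, 0.3], 'b': [1.0, 2.0, 3.0]})
xx['d'] = xx['a']

# Label-based: row label 1, column label 'd', in one setitem call on the frame.
xx.loc[1, 'd'] = 42

# Position-based equivalent: look up the integer position of column 'd' first.
col = xx.columns.get_loc('d')
xx.iloc[2, col] = 7

print(xx)
```

In both cases the frame itself receives the row and column together, so there is no intermediate Series that could turn out to be a copy.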
Using loc solved the problem:
Python 2.7.3 (default, Apr 10 2013, 05:13:16)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> import pandas
>>> length = 5
>>> xx = pandas.DataFrame({'a':np.random.randn(length),'b':np.random.randn(length),'c':np.random.randn(length)})
>>> xx['d']=xx['a']
>>> np.may_share_memory(xx['d'],xx['a'])
False
>>> print xx
a b c d
0 -0.554725 -0.024649 -0.044875 -0.554725
1 0.086357 0.974139 0.372063 0.086357
2 1.562558 0.881592 -0.267495 1.562558
3 -0.800493 -0.595808 -1.150203 -0.800493
4 -0.797210 -0.464506 -0.020967 -0.797210
>>> xx.loc[1,'d'] = 42
>>> print xx
a b c d
0 -0.554725 -0.024649 -0.044875 -0.554725
1 0.086357 0.974139 0.372063 42.000000
2 1.562558 0.881592 -0.267495 1.562558
3 -0.800493 -0.595808 -1.150203 -0.800493
4 -0.797210 -0.464506 -0.020967 -0.797210
>>> quit()
I'm glad that this problem has already been addressed in 0.13.0. Thank you!
Pandas 0.12.0 fails to print a column change in this specific example:
Cell D1 should be 42 in the last print statement.
After poking at this a bit, I discovered the following additional required conditions:
I am using Pandas '0.12.0' installed from pip on Linux Mint 14. Here is the (relevant) output from print_versions.py:
I have no idea where the root cause of this bug is -- it may not necessarily be pandas's fault. I just hope someone else can reproduce it...