Bug: Bizarre DataFrame printing inconsistency (REPL only) #4531

Closed
gpcz opened this issue Aug 10, 2013 · 7 comments
Labels
Duplicate Report Duplicate issue or pull request

Comments

@gpcz

gpcz commented Aug 10, 2013

Pandas 0.12.0 fails to print a column change in this specific example:

Python 2.7.3 (default, Apr 10 2013, 05:13:16) 
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> import pandas
>>> length = 5
>>> xx = pandas.DataFrame({'a':np.random.randn(length),'b':np.random.randn(length),'c':np.random.randn(length)})
>>> xx['d']=xx['a']
>>> np.may_share_memory(xx['d'],xx['a'])
False
>>> print xx
          a         b         c         d
0  0.370188  0.437854 -0.020475  0.370188
1  1.839874  0.539278  0.954254  1.839874
2  0.183890 -0.924539 -0.943913  0.183890
3 -2.047403  1.213499 -0.742084 -2.047403
4 -1.003271  0.839525  0.154962 -1.003271
>>> xx['d'][1] = 42
>>> print xx
          a         b         c         d
0  0.370188  0.437854 -0.020475  0.370188
1  1.839874  0.539278  0.954254  1.839874
2  0.183890 -0.924539 -0.943913  0.183890
3 -2.047403  1.213499 -0.742084 -2.047403
4 -1.003271  0.839525  0.154962 -1.003271
>>> quit()

Cell D1 should be 42 in the last print statement.

After poking at this a bit, I discovered the following additional required conditions:

  1. You must enter this in the Python REPL (as in typing "python" at the command line and entering them manually). If you put the commands in a file and run the file, it will suddenly do what's expected.
  2. The np.may_share_memory() call must be there. If you omit it, it will suddenly do what's expected.
  3. A friend of mine attempted to reproduce this bug on Pandas 0.11.0 but could not. Then again, he may have missed conditions 1 and/or 2: he tried before I had identified them.

I am using Pandas '0.12.0' installed from pip on Linux Mint 14. Here is the (relevant) output from print_versions.py:

INSTALLED VERSIONS
------------------
Python: 2.7.3.final.0
OS: Linux 3.5.0-17-generic #28-Ubuntu SMP Tue Oct 9 19:31:23 UTC 2012 x86_64
byteorder: little
LC_ALL: None
LANG: en_CA.UTF-8

Cython: Not installed
Numpy: 1.6.2
Scipy: 0.10.1
matplotlib: 1.1.1
lxml: 2.3.5

I have no idea where the root cause of this bug is -- it may not necessarily be pandas's fault. I just hope someone else can reproduce it...

@jtratner
Contributor

In your example, what happens if you add an assert after the assignment?

xx['d'][1] = 42
assert xx["d"][1] == 42, "%r != %r" % (xx['d'][1], 42)

@gpcz
Author

gpcz commented Aug 10, 2013

The assert passes, but printing it still yields the incorrect result.

Python 2.7.3 (default, Apr 10 2013, 05:13:16) 
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> import pandas
>>> length = 5
>>> xx = pandas.DataFrame({'a':np.random.randn(length),'b':np.random.randn(length),'c':np.random.randn(length)})
>>> xx['d']=xx['a']
>>> np.may_share_memory(xx['d'],xx['a'])
False
>>> print xx
          a         b         c         d
0  0.026757  0.180201  0.370744  0.026757
1 -0.935387  0.784654  1.573150 -0.935387
2  1.069168  0.154530 -2.280620  1.069168
3  3.467050  0.701040 -1.047132  3.467050
4  0.761229 -0.739372  0.537327  0.761229
>>> xx['d'][1] = 42
>>> assert xx["d"][1] == 42, "%r != %r" % (xx['d'][1], 42)
>>> print xx
          a         b         c         d
0  0.026757  0.180201  0.370744  0.026757
1 -0.935387  0.784654  1.573150 -0.935387
2  1.069168  0.154530 -2.280620  1.069168
3  3.467050  0.701040 -1.047132  3.467050
4  0.761229 -0.739372  0.537327  0.761229
>>> quit()

@jreback
Contributor

jreback commented Aug 10, 2013

This is a chained assignment. There is an issue about it (I can't find it right this second). There is a cache reference that is quite tricky to update, but this is fixed for 0.13.

In any event, this might work but cannot be relied upon. In particular, it will not work for a multiple-dtype frame. For a homogeneous-dtype frame the assignment might be operating on a view, depending on the exact memory layout, in which case it could work.

Bottom line: don't use this syntax. (I would raise on it, but without deep inspection of the call stack it's pretty hard to detect that this is even going on inside the chained setitem.)
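To make the two-step nature of the chained form concrete, here is a minimal sketch (Python 3, not taken from the thread). It assumes the intermediate object is a copy and models that with an explicit `.copy()`. Whether pandas really hands back a copy or a view for `xx['d']` depends on the version and memory layout, as noted above. The names `intermediate` and `value_before` are purely illustrative.

```python
import numpy as np
import pandas as pd

length = 5
rng = np.random.RandomState(0)  # fixed seed so the sketch is repeatable
xx = pd.DataFrame({
    "a": rng.randn(length),
    "b": rng.randn(length),
    "c": rng.randn(length),
})
xx["d"] = xx["a"]
value_before = xx.loc[1, "d"]

# xx['d'][1] = 42 is really two operations:
#   step 1: a __getitem__ call that returns some Series object
#   step 2: a __setitem__ call on that Series, not on the frame
# ASSUMPTION: the explicit .copy() stands in for the case where step 1
# returns an object that does not share memory with the frame.
intermediate = xx["d"].copy()
intermediate[1] = 42

# The write is visible on the intermediate object ...
write_seen_on_intermediate = intermediate[1] == 42
# ... but the frame itself still holds the old value.
frame_unchanged = xx.loc[1, "d"] == value_before
```

Under that assumption, a check made against the intermediate object can pass while the frame is unchanged, which is why an assert on the fetched column is not evidence that the frame was updated.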

@jtratner
Contributor

@jreback what's the preferred syntax for this?

@jreback
Contributor

jreback commented Aug 10, 2013

The fix is in #4081, which was closed because I incorporated it into #3482.

@jreback
Contributor

jreback commented Aug 10, 2013

Always use ix/loc/iloc:

xx.loc[1,'d'] = 42
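For comparison, a short sketch of the single-call form in Python 3 syntax (the thread itself uses Python 2). The seeded `RandomState` and the names `value_in_frame` and `column_a_untouched` are illustrative additions, not part of the original report.

```python
import numpy as np
import pandas as pd

length = 5
rng = np.random.RandomState(0)  # fixed seed so the sketch is repeatable
xx = pd.DataFrame({
    "a": rng.randn(length),
    "b": rng.randn(length),
    "c": rng.randn(length),
})
xx["d"] = xx["a"]

# One indexing call on the frame: row label 1, column label 'd'.
# There is no intermediate Series, so the frame itself is updated.
xx.loc[1, "d"] = 42

# Read the value back through the frame, not through a fetched column.
value_in_frame = xx.loc[1, "d"]
# Column 'a' holds separate data and should be unaffected.
column_a_untouched = xx.loc[1, "a"] != 42

print(xx)
```

Because row and column are selected in one `loc` call, the assignment does not depend on whether an intermediate object happens to be a view or a copy.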

@gpcz
Author

gpcz commented Aug 10, 2013

Using loc solved the problem:

Python 2.7.3 (default, Apr 10 2013, 05:13:16) 
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> import pandas
>>> length = 5
>>> xx = pandas.DataFrame({'a':np.random.randn(length),'b':np.random.randn(length),'c':np.random.randn(length)})
>>> xx['d']=xx['a']
>>> np.may_share_memory(xx['d'],xx['a'])
False
>>> print xx
          a         b         c         d
0 -0.554725 -0.024649 -0.044875 -0.554725
1  0.086357  0.974139  0.372063  0.086357
2  1.562558  0.881592 -0.267495  1.562558
3 -0.800493 -0.595808 -1.150203 -0.800493
4 -0.797210 -0.464506 -0.020967 -0.797210
>>> xx.loc[1,'d'] = 42
>>> print xx
          a         b         c          d
0 -0.554725 -0.024649 -0.044875  -0.554725
1  0.086357  0.974139  0.372063  42.000000
2  1.562558  0.881592 -0.267495   1.562558
3 -0.800493 -0.595808 -1.150203  -0.800493
4 -0.797210 -0.464506 -0.020967  -0.797210
>>> quit()

I'm glad that this problem has already been addressed in 0.13.0. Thank you!

3 participants