Bug: Bizarre DataFrame printing inconsistency (REPL only) #4531

gpcz · 2013-08-10T02:33:36Z

Pandas 0.12.0 fails to print a column change in this specific example:

Python 2.7.3 (default, Apr 10 2013, 05:13:16) 
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> import pandas
>>> length = 5
>>> xx = pandas.DataFrame({'a':np.random.randn(length),'b':np.random.randn(length),'c':np.random.randn(length)})
>>> xx['d']=xx['a']
>>> np.may_share_memory(xx['d'],xx['a'])
False
>>> print xx
          a         b         c         d
0  0.370188  0.437854 -0.020475  0.370188
1  1.839874  0.539278  0.954254  1.839874
2  0.183890 -0.924539 -0.943913  0.183890
3 -2.047403  1.213499 -0.742084 -2.047403
4 -1.003271  0.839525  0.154962 -1.003271
>>> xx['d'][1] = 42
>>> print xx
          a         b         c         d
0  0.370188  0.437854 -0.020475  0.370188
1  1.839874  0.539278  0.954254  1.839874
2  0.183890 -0.924539 -0.943913  0.183890
3 -2.047403  1.213499 -0.742084 -2.047403
4 -1.003271  0.839525  0.154962 -1.003271
>>> quit()

The value in column d, row 1 should be 42 in the last print statement.

After poking at this a bit, I discovered the following additional required conditions:

1. You must enter this in the Python REPL (as in typing "python" at the command line and entering them manually). If you put the commands in a file and run the file, it will suddenly do what's expected.
2. The np.may_share_memory() call must be there. If you omit it, it will suddenly do what's expected.
3. A friend of mine attempted to reproduce this bug on Pandas 0.11.0, but he couldn't get it to work. Then again, he may have messed up conditions 1 and/or 2 -- he tried it before I realized the previous two conditions.

I am using Pandas '0.12.0' installed from pip on Linux Mint 14. Here is the (relevant) output from print_versions.py:

INSTALLED VERSIONS
------------------
Python: 2.7.3.final.0
OS: Linux 3.5.0-17-generic #28-Ubuntu SMP Tue Oct 9 19:31:23 UTC 2012 x86_64
byteorder: little
LC_ALL: None
LANG: en_CA.UTF-8

Cython: Not installed
Numpy: 1.6.2
Scipy: 0.10.1
matplotlib: 1.1.1
lxml: 2.3.5

I have no idea where the root cause of this bug is -- it may not necessarily be pandas's fault. I just hope someone else can reproduce it...

jtratner · 2013-08-10T02:40:57Z

in your example, what happens if you add an assert after the assignment?

xx['d'][1] = 42
assert xx["d"][1] == 42, "%r != %r" % (xx['d'][1], 42)

gpcz · 2013-08-10T02:45:16Z

The assert passes, but printing it still yields the incorrect result.

Python 2.7.3 (default, Apr 10 2013, 05:13:16) 
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> import pandas
>>> length = 5
>>> xx = pandas.DataFrame({'a':np.random.randn(length),'b':np.random.randn(length),'c':np.random.randn(length)})
>>> xx['d']=xx['a']
>>> np.may_share_memory(xx['d'],xx['a'])
False
>>> print xx
          a         b         c         d
0  0.026757  0.180201  0.370744  0.026757
1 -0.935387  0.784654  1.573150 -0.935387
2  1.069168  0.154530 -2.280620  1.069168
3  3.467050  0.701040 -1.047132  3.467050
4  0.761229 -0.739372  0.537327  0.761229
>>> xx['d'][1] = 42
>>> assert xx["d"][1] == 42, "%r != %r" % (xx['d'][1], 42)
>>> print xx
          a         b         c         d
0  0.026757  0.180201  0.370744  0.026757
1 -0.935387  0.784654  1.573150 -0.935387
2  1.069168  0.154530 -2.280620  1.069168
3  3.467050  0.701040 -1.047132  3.467050
4  0.761229 -0.739372  0.537327  0.761229
>>> quit()

jreback · 2013-08-10T02:53:37Z

this is a chained assignment. there is an issue about it (but I can't find it right this sec). there is a cache reference that is quite tricky to update, but this is fixed for 0.13

in any event this might work but cannot be relied upon. In particular, for a multiple-dtype frame it will not work, but for a homogeneous-dtype frame it might be operating on a view, depending on the exact memory layout, in which case it could work

bottom line is don't use this syntax (I would raise on it, but without deep inspection of the call stack it's pretty hard to figure out that a chained setitem is even going on)
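
To illustrate the point about chained assignment and a stale cache, here is a toy sketch in plain Python. It is not how pandas is implemented: `ToyFrame`, `set_value`, and `render` are hypothetical names made up for this example, and the assumption is only that a column lookup may hand back a cached copy rather than the stored data. Under that assumption, the chained write lands on the copy, a later read through the same cache sees 42, and anything that reads the stored data (such as printing) does not.

```python
class ToyFrame:
    """Hypothetical stand-in for a DataFrame; not how pandas is implemented."""

    def __init__(self, columns):
        self._data = {name: list(values) for name, values in columns.items()}
        self._cache = {}

    def __getitem__(self, name):
        # Hand out a cached *copy* of the column, never the stored list itself.
        if name not in self._cache:
            self._cache[name] = list(self._data[name])
        return self._cache[name]

    def set_value(self, row, name, value):
        # Single-step write: update the stored data and drop the stale cache entry.
        self._data[name][row] = value
        self._cache.pop(name, None)

    def render(self):
        # "Printing" reads the stored data, not the cache.
        return {name: list(values) for name, values in self._data.items()}


frame = ToyFrame({"a": [1.0, 2.0], "d": [1.0, 2.0]})

# Chained form: frame["d"] returns the cached copy, then [1] = 42 writes to that copy.
frame["d"][1] = 42
chained_read = frame["d"][1]          # read back from the cached copy
chained_render = frame.render()["d"]  # stored data, untouched by the chained write

# Single-step form: one call that owns both the lookup and the write.
frame.set_value(1, "d", 42)
direct_render = frame.render()["d"]
```

In the toy, `chained_read` is 42 while `chained_render` still shows the old value, which mirrors the assert-passes-but-print-is-stale symptom reported above. Whether a real chained assignment behaves this way depends on dtypes, memory layout, and the pandas version.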

jtratner · 2013-08-10T03:00:07Z

@jreback what's the preferred syntax for this?

jreback · 2013-08-10T03:01:09Z

the fix is in #4081, which was closed as I incorporated it into #3482

jreback · 2013-08-10T03:03:13Z

always use ix/loc/iloc

xx.loc[1,'d'] = 42
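
A slightly fuller sketch of the single-step form, assuming Python 3 and a recent pandas/NumPy (the thread itself uses Python 2 and pandas 0.12). The seeded generator and the second write via `iloc` are additions for illustration, not part of the original report. Note that `ix` has since been removed from pandas, so only `loc` and `iloc` are shown.

```python
import numpy as np
import pandas as pd

length = 5
rng = np.random.default_rng(0)  # seeded only so the example is repeatable
xx = pd.DataFrame({
    "a": rng.standard_normal(length),
    "b": rng.standard_normal(length),
    "c": rng.standard_normal(length),
})
xx["d"] = xx["a"]

# Label-based: row label 1 and column label "d" in a single indexing call.
xx.loc[1, "d"] = 42

# Position-based equivalent: row position 2 and the integer position of column "d".
xx.iloc[2, xx.columns.get_loc("d")] = 7

print(xx)
```

Because the row and column are passed to one `loc` or `iloc` call, the write goes to the frame itself and no intermediate Series is involved.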

gpcz · 2013-08-10T03:19:35Z

Using loc solved the problem:

Python 2.7.3 (default, Apr 10 2013, 05:13:16) 
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> import pandas
>>> length = 5
>>> xx = pandas.DataFrame({'a':np.random.randn(length),'b':np.random.randn(length),'c':np.random.randn(length)})
>>> xx['d']=xx['a']
>>> np.may_share_memory(xx['d'],xx['a'])
False
>>> print xx
          a         b         c         d
0 -0.554725 -0.024649 -0.044875 -0.554725
1  0.086357  0.974139  0.372063  0.086357
2  1.562558  0.881592 -0.267495  1.562558
3 -0.800493 -0.595808 -1.150203 -0.800493
4 -0.797210 -0.464506 -0.020967 -0.797210
>>> xx.loc[1,'d'] = 42
>>> print xx
          a         b         c          d
0 -0.554725 -0.024649 -0.044875  -0.554725
1  0.086357  0.974139  0.372063  42.000000
2  1.562558  0.881592 -0.267495   1.562558
3 -0.800493 -0.595808 -1.150203  -0.800493
4 -0.797210 -0.464506 -0.020967  -0.797210
>>> quit()

I'm glad that this problem has already been addressed in 0.13.0. Thank you!

jreback closed this as completed Aug 10, 2013

jreback mentioned this issue Aug 13, 2013

Issue Using Chained Accessors with Multiple dtypes & Performance Tips #4546

Closed
