BUG: Accessing DataFrame multi-index column seems to modify its content #8850

kubaczer · 2014-11-18T14:48:27Z

In the example below, [13] gives a different result than [11], even though you would think that [12] has no effect on data.

Python 2.7.8 (default, Oct 19 2014, 16:02:00) 
Type "copyright", "credits" or "license" for more information.

IPython 2.1.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: d = {'COLA': {0: 0, 1: 0, 2: 0},
   ...:      'COLB': {0: 'a', 1: 'a', 2: 'a'},
   ...:      'COLC': {0: 0, 1: 0, 2: 1}}

In [4]: df = pd.DataFrame(d)

In [5]: df
Out[5]: 
   COLA COLB  COLC
0     0    a     0
1     0    a     0
2     0    a     1

In [6]: df = df.groupby(['COLA','COLB'])['COLC']\
   ...:         .agg({'Zeros': lambda x: 0,
   ...:               'Averages': lambda x: 100.*x.mean(),
   ...:               'Weird_stuff': np.size})\
   ...:     .unstack()

In [7]: df
Out[7]: 
       Averages Weird_stuff Zeros
COLB          a           a     a
COLA                             
0     33.333333           3     0

In [8]: df.columns
Out[8]: 
MultiIndex(levels=[[u'Averages', u'Weird_stuff', u'Zeros'], [u'a']],
           labels=[[0, 1, 2], [0, 0, 0]],
           names=[None, u'COLB'])

In [9]: df.index
Out[9]: Int64Index([0], dtype='int64')

In [10]: df['Weird_stuff'] = df['Weird_stuff'].apply(lambda x: 1000000., axis=1)
/usr/local/lib/python2.7/site-packages/numpy/lib/function_base.py:3612: FutureWarning: in the future negative indices will not be ignored by `numpy.delete`.
  "`numpy.delete`.", FutureWarning)

In [11]: df
Out[11]: 
       Averages Weird_stuff Zeros
COLB          a           a     a
COLA                             
0     33.333333     1000000     0

In [12]: df['Zeros']
Out[12]: 
COLB  a
COLA   
0     0

In [13]: df # Compare the value under ('Weird_stuff', 'a') here with Out[11] above.
Out[13]: 
       Averages Weird_stuff Zeros
COLB          a           a     a
COLA                             
0     33.333333           3     0

In [14]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.8.final.0
python-bits: 64
OS: Darwin
OS-release: 14.0.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: pl_PL.UTF-8

pandas: 0.15.1
nose: 1.3.0
Cython: 0.20.2
numpy: 1.9.1
scipy: 0.14.0
statsmodels: 0.5.0
IPython: 2.1.0
sphinx: 1.2.2
patsy: 0.2.1
dateutil: 2.2
pytz: 2014.7
bottleneck: 0.8.0
tables: 3.1.0
numexpr: 2.3.1
matplotlib: 1.3.1
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 0.7.5
xlsxwriter: None
lxml: 3.3.2
bs4: 4.3.2
html5lib: 0.999
httplib2: 0.9
apiclient: 1.2
rpy2: None
sqlalchemy: 0.9.7
pymysql: None
psycopg2: None

jreback · 2014-11-18T21:39:30Z

I think this is a manifestation of the same issue as in #8844

Essentially, when you unstack, the levels of the multi-index (the columns here) are not changed; only the labels are. (The levels which don't appear are not included in the labels, so it looks right.)

So far so good. However, a further operation can cause this to break. The solution is essentially to make a deep copy of the index during the unstack operation.
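
To illustrate the levels/labels distinction described above, here is a minimal sketch. It is not the fix applied in pandas; it only shows that a subset of a MultiIndex can keep levels it no longer uses. It assumes a recent pandas, where the attribute printed as `labels` in Out[8] is called `codes` and where `MultiIndex.remove_unused_levels()` is available (neither existed under those names in 0.15.1). The variable names `cols`, `sub` and `trimmed` are made up for the example.

```python
import pandas as pd

# Columns shaped like the ones in Out[8]: three top-level entries, one 'COLB' value.
cols = pd.MultiIndex.from_product(
    [["Averages", "Weird_stuff", "Zeros"], ["a"]],
    names=[None, "COLB"],
)

# Slicing keeps only the ('Zeros', 'a') entry ...
sub = cols[2:]

# ... but the levels still list all three names; only the codes
# (called "labels" in pandas 0.15) say which ones are actually in use.
print(list(sub.levels[0]))
print(list(sub.codes[0]))

# Rebuilding the index without the unused levels (assumes pandas >= 0.20).
trimmed = sub.remove_unused_levels()
print(list(trimmed.levels[0]))
```

Code that reads `levels` alone, without consulting the codes, can therefore see entries that are no longer part of the index.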

kubaczer changed the title ~~BUG: Accessing DataFrame column *seems* to modify its content~~ BUG: Accessing DataFrame multi-index column *seems* to modify its content Nov 18, 2014

seth-p mentioned this issue Nov 18, 2014

BUG: .stack(dropna=False) looks through views incorrectly for dataframe views with multi-index columns #8844

Closed

jreback added the Bug, IO SQL and Reshaping labels and removed the IO SQL label Nov 18, 2014

jreback added this to the 0.15.2 milestone Nov 18, 2014

behzadnouri mentioned this issue Nov 19, 2014

BUG: type change breaks BlockManager integrity #8853

Merged

jreback closed this as completed in #8853 Nov 20, 2014
