BUG: Accessing DataFrame multi-index column *seems* to modify its content #8850

Closed
kubaczer opened this issue Nov 18, 2014 · 1 comment · Fixed by #8853
Labels
Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode

@kubaczer

In the example below, Out[13] differs from Out[11], even though In [12] only reads a column and should have no effect on the data.

Python 2.7.8 (default, Oct 19 2014, 16:02:00) 
Type "copyright", "credits" or "license" for more information.

IPython 2.1.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: d = {'COLA': {0: 0, 1: 0, 2: 0},
   ...:      'COLB': {0: 'a', 1: 'a', 2: 'a'},
   ...:      'COLC': {0: 0, 1: 0, 2: 1}}

In [4]: df = pd.DataFrame(d)

In [5]: df
Out[5]: 
   COLA COLB  COLC
0     0    a     0
1     0    a     0
2     0    a     1

In [6]: df = df.groupby(['COLA','COLB'])['COLC']\
   ...:         .agg({'Zeros': lambda x: 0,
   ...:               'Averages': lambda x: 100.*x.mean(),
   ...:               'Weird_stuff': np.size})\
   ...:     .unstack()

In [7]: df
Out[7]: 
       Averages Weird_stuff Zeros
COLB          a           a     a
COLA                             
0     33.333333           3     0

In [8]: df.columns
Out[8]: 
MultiIndex(levels=[[u'Averages', u'Weird_stuff', u'Zeros'], [u'a']],
           labels=[[0, 1, 2], [0, 0, 0]],
           names=[None, u'COLB'])

In [9]: df.index
Out[9]: Int64Index([0], dtype='int64')

In [10]: df['Weird_stuff'] = df['Weird_stuff'].apply(lambda x: 1000000., axis=1)
/usr/local/lib/python2.7/site-packages/numpy/lib/function_base.py:3612: FutureWarning: in the future negative indices will not be ignored by `numpy.delete`.
  "`numpy.delete`.", FutureWarning)

In [11]: df
Out[11]: 
       Averages Weird_stuff Zeros
COLB          a           a     a
COLA                             
0     33.333333     1000000     0

In [12]: df['Zeros']
Out[12]: 
COLB  a
COLA   
0     0

In [13]: df # Look at the value under ('Weird_stuff', 'a') here and in Out[11] above.
Out[13]: 
       Averages Weird_stuff Zeros
COLB          a           a     a
COLA                             
0     33.333333           3     0

In [14]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.8.final.0
python-bits: 64
OS: Darwin
OS-release: 14.0.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: pl_PL.UTF-8

pandas: 0.15.1
nose: 1.3.0
Cython: 0.20.2
numpy: 1.9.1
scipy: 0.14.0
statsmodels: 0.5.0
IPython: 2.1.0
sphinx: 1.2.2
patsy: 0.2.1
dateutil: 2.2
pytz: 2014.7
bottleneck: 0.8.0
tables: 3.1.0
numexpr: 2.3.1
matplotlib: 1.3.1
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 0.7.5
xlsxwriter: None
lxml: 3.3.2
bs4: 4.3.2
html5lib: 0.999
httplib2: 0.9
apiclient: 1.2
rpy2: None
sqlalchemy: 0.9.7
pymysql: None
psycopg2: None
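
The expectation behind this report can be written down as a small check: after a value has been assigned into the unstacked frame, merely reading another column must leave the frame unchanged. The script below is only a sketch of that check, assuming a recent pandas release. It builds the aggregated frame by hand rather than through the original `groupby(...).agg({...})` call, and the names `agg`, `wide` and `snapshot` are illustrative, not taken from the session above.

```python
import pandas as pd

# Hand-built stand-in for the aggregated result shown in Out[7]
# (assumption: the values are copied from the session, not recomputed).
agg = pd.DataFrame(
    {
        "COLA": [0],
        "COLB": ["a"],
        "Averages": [33.333333],
        "Weird_stuff": [3],
        "Zeros": [0],
    }
).set_index(["COLA", "COLB"])

# Unstacking COLB moves it into the columns, giving a two-level column index.
wide = agg.unstack()

# Overwrite the ('Weird_stuff', 'a') column, as In [10] intended to do.
wide[("Weird_stuff", "a")] = 1000000.0

# Take a snapshot, perform a pure read, and compare.
snapshot = wide.copy()
_ = wide["Zeros"]

unchanged = wide.equals(snapshot)
value_after_read = wide.loc[0, ("Weird_stuff", "a")]
```

If the read in `wide["Zeros"]` had any side effect of the kind reported here, `unchanged` would be `False` and `value_after_read` would fall back to the pre-assignment value.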
@kubaczer kubaczer changed the title BUG: Accessing DataFrame column *seems* to modify its content BUG: Accessing DataFrame multi-index column *seems* to modify its content Nov 18, 2014
@jreback jreback added Bug IO SQL to_sql, read_sql, read_sql_query Reshaping Concat, Merge/Join, Stack/Unstack, Explode and removed IO SQL to_sql, read_sql, read_sql_query labels Nov 18, 2014
@jreback jreback added this to the 0.15.2 milestone Nov 18, 2014
@jreback
Contributor

jreback commented Nov 18, 2014

I think this is a manifestation of the same issue as in #8844

Essentially, when you unstack, the levels of the multi-index (the columns here) are not changed; only the labels are. (The levels that no longer appear are not referenced by the labels, so it looks right.)

So far so good. However, a further operation can cause this to break. The solution is essentially to make a deep copy of the index during the unstack operation.
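
The distinction between levels and labels can be seen directly on a small frame. The following is a minimal sketch, assuming a recent pandas release (where the `labels` attribute shown in Out[8] has since been renamed `codes`). It is not the fix from #8853; it only illustrates that a derived MultiIndex can keep level values that no longer appear in it, and that `remove_unused_levels()` produces a compacted copy. The variable names are illustrative.

```python
import pandas as pd

# Two-level column index shaped like the one in the report.
columns = pd.MultiIndex.from_product(
    [["Averages", "Weird_stuff", "Zeros"], ["a"]], names=[None, "COLB"]
)
wide = pd.DataFrame([[33.333333, 3, 0]], columns=columns)

# Selecting a subset of columns changes which entries are referenced,
# but the level values themselves are carried over as-is.
subset = wide[["Averages", "Zeros"]]
stale_levels = list(subset.columns.levels[0])

# remove_unused_levels() returns a new index whose levels contain
# only the values that are actually used.
compact = subset.columns.remove_unused_levels()
compact_levels = list(compact.levels[0])

print(stale_levels)    # still lists 'Weird_stuff'
print(compact_levels)  # only the levels in use
```

The compacted index compares equal to the original one, since both describe the same column tuples; only the internal level storage differs.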
