DataFrame.ix[idx, :] = value sets wrong values when idx is a MultiIndex and DataFrame.columns is also a MultiIndex #11372

Closed
rekcahpassyla opened this issue Oct 19, 2015 · 6 comments · Fixed by #11400
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves MultiIndex

@rekcahpassyla
Contributor

This code is broken in 0.17.0 but not in 0.15.2:

import pandas as pd
import numpy as np

np.random.seed(1)

from itertools import product

from pandas.util.testing import assert_frame_equal

pd.show_versions()

idx = pd.MultiIndex.from_tuples(
    list(
        product(['A', 'B', 'C'], 
                pd.date_range('2015-01-01', '2015-04-01', freq='MS'))
    )
)

sub = pd.MultiIndex.from_tuples(
    [('A', pd.Timestamp('2015-01-01')), ('A', pd.Timestamp('2015-02-01'))]
)
# if cols = ['foo', 'bar', 'baz', 'quux'], there is no error. 
cols = pd.MultiIndex.from_tuples(
    list(
        product(['foo', 'bar'], 
                pd.date_range('2015-01-01', '2015-02-01', freq='MS'))
    )
)

test = pd.DataFrame(np.random.random((12, 4)), index=idx, columns=cols)
vals = pd.DataFrame(np.random.random((2, 4)), index=sub, columns=cols)
test.ix[sub, :] = vals

print(test.ix[sub, :])
print(vals)

assert_frame_equal(test.ix[sub, :], vals)

0.17.0

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 26 Stepping 5, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.17.0
nose: 1.3.7
pip: 7.1.0
setuptools: 18.0.1
Cython: 0.22
numpy: 1.10.1
scipy: 0.16.0
statsmodels: 0.6.1
IPython: 3.2.1
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.4.1
pytz: 2015.4
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.4.4
matplotlib: 1.4.3
openpyxl: None
xlrd: 0.9.4
xlwt: None
xlsxwriter: 0.7.3
lxml: None
bs4: 4.3.2
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 1.0.7
pymysql: None
psycopg2: None
                    foo                   bar
             2015-01-01 2015-02-01 2015-01-01 2015-02-01
A 2015-01-01   0.287775   0.130029   0.019367   0.678836
  2015-02-01   0.287775   0.130029   0.019367   0.678836
                    foo                   bar
             2015-01-01 2015-02-01 2015-01-01 2015-02-01
A 2015-01-01   0.287775   0.130029   0.019367   0.678836
  2015-02-01   0.211628   0.265547   0.491573   0.053363
Traceback (most recent call last):
  File "c:\dev\code\sandbox\multiindex.py", line 41, in <module>
    assert_frame_equal(test.ix[sub, :], vals)
  File "c:\python\envs\pd017\lib\site-packages\pandas\util\testing.py", line 1028, in assert_frame_equal
    obj='DataFrame.iloc[:, {0}]'.format(i))
  File "c:\python\envs\pd017\lib\site-packages\pandas\util\testing.py", line 925, in assert_series_equal
    check_less_precise, obj='{0}'.format(obj))
  File "pandas\src\testing.pyx", line 58, in pandas._testing.assert_almost_equal (pandas\src\testing.c:3809)
  File "pandas\src\testing.pyx", line 147, in pandas._testing.assert_almost_equal (pandas\src\testing.c:2685)
  File "c:\python\envs\pd017\lib\site-packages\pandas\util\testing.py", line 798, in raise_assert_detail
    raise AssertionError(msg)
AssertionError: DataFrame.iloc[:, 0] are different

DataFrame.iloc[:, 0] values are different (50.0 %)
[left]:  [0.287775338586, 0.287775338586]
[right]: [0.287775338586, 0.211628116]

0.15.2

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 26 Stepping 5, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en_GB

pandas: 0.15.2
nose: 1.3.7
Cython: 0.22
numpy: 1.9.2
scipy: 0.15.1
statsmodels: None
IPython: 3.2.1
sphinx: 1.3.1
patsy: 0.3.0
dateutil: 2.4.1
pytz: 2015.4
bottleneck: 1.0.0
tables: 3.2.0
numexpr: 2.4.3
matplotlib: 1.4.3
openpyxl: 1.8.5
xlrd: 0.9.4
xlwt: 0.7.5
xlsxwriter: 0.7.3
lxml: 3.4.4
bs4: 4.3.2
html5lib: 0.999
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: 1.0.7
pymysql: None
psycopg2: None
                    foo                   bar           
             2015-01-01 2015-02-01 2015-01-01 2015-02-01
A 2015-01-01   0.287775   0.130029   0.019367   0.678836
  2015-02-01   0.211628   0.265547   0.491573   0.053363
                    foo                   bar           
             2015-01-01 2015-02-01 2015-01-01 2015-02-01
A 2015-01-01   0.287775   0.130029   0.019367   0.678836
  2015-02-01   0.211628   0.265547   0.491573   0.053363
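
For readers on current pandas, where the .ix indexer no longer exists (it was deprecated in 0.20 and removed in 1.0), the same check can be re-run with .loc. This is a minimal sketch rather than part of the original report. It assumes .loc label-based assignment is a fair stand-in for .ix here, it uses a NumPy Generator instead of np.random.seed (so the numbers differ from those printed above), and it compares raw values instead of calling assert_frame_equal.

```python
from itertools import product

import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Same shapes and labels as the report: 12 rows x 4 columns, both MultiIndex.
idx = pd.MultiIndex.from_tuples(
    list(product(["A", "B", "C"],
                 pd.date_range("2015-01-01", "2015-04-01", freq="MS")))
)
sub = pd.MultiIndex.from_tuples(
    [("A", pd.Timestamp("2015-01-01")), ("A", pd.Timestamp("2015-02-01"))]
)
cols = pd.MultiIndex.from_tuples(
    list(product(["foo", "bar"],
                 pd.date_range("2015-01-01", "2015-02-01", freq="MS")))
)

test = pd.DataFrame(rng.random((12, 4)), index=idx, columns=cols)
vals = pd.DataFrame(rng.random((2, 4)), index=sub, columns=cols)

# Assumption: .loc stands in for the removed .ix indexer.
test.loc[sub, :] = vals

# Compare raw values so the check does not depend on index metadata.
result = test.loc[sub, :].to_numpy()
rows_match = np.allclose(result, vals.to_numpy())
# The 0.17.0 symptom was two identical rows; confirm they are distinct here.
rows_differ = not np.allclose(result[0], result[1])
print(rows_match, rows_differ)
```

On a release that contains the fix, both flags should come out true. Two identical rows would point to the behaviour reported for 0.17.0.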
@rekcahpassyla
Contributor Author

Indexing with a specific set of columns also gives the error:

Code sample:

import pandas as pd
import numpy as np

np.random.seed(1)

from itertools import product

from pandas.util.testing import assert_frame_equal

pd.show_versions()

idx = pd.MultiIndex.from_tuples(
    list(
        product(['A', 'B', 'C'], 
                pd.date_range('2015-01-01', '2015-04-01', freq='MS'))
    )
)
cols = pd.MultiIndex.from_tuples(
    list(
        product(['foo', 'bar'], 
                pd.date_range('2016-01-01', '2016-02-01', freq='MS'))
    )
)

# if cols = ['foo', 'bar', 'baz', 'quux'], there is no error. 

test = pd.DataFrame(np.random.random((12, 4)), index=idx, columns=cols)


subidx = pd.MultiIndex.from_tuples(
    [('A', pd.Timestamp('2015-01-01')), ('A', pd.Timestamp('2015-02-01'))]
)

subcols = pd.MultiIndex.from_tuples(
    [('foo', pd.Timestamp('2016-01-01')), ('foo', pd.Timestamp('2016-02-01'))]
)


vals = pd.DataFrame(np.random.random((2, 2)), index=subidx, columns=subcols)


test.ix[subidx, subcols] = vals

print(test.ix[subidx, subcols])

print(vals)

assert_frame_equal(test.ix[subidx, subcols], vals)

0.17.0

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 26 Stepping 5, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.17.0
nose: 1.3.7
pip: 7.1.0
setuptools: 18.0.1
Cython: 0.22
numpy: 1.10.1
scipy: 0.16.0
statsmodels: 0.6.1
IPython: 3.2.1
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.4.1
pytz: 2015.4
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.4.4
matplotlib: 1.4.3
openpyxl: None
xlrd: 0.9.4
xlwt: None
xlsxwriter: 0.7.3
lxml: None
bs4: 4.3.2
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 1.0.7
pymysql: None
psycopg2: None
                    foo
             2016-01-01 2016-02-01
A 2015-01-01   0.287775   0.130029
  2015-02-01   0.287775   0.130029
                    foo
             2016-01-01 2016-02-01
A 2015-01-01   0.287775   0.130029
  2015-02-01   0.019367   0.678836
Traceback (most recent call last):
  File "c:\dev\code\sandbox\multiindex.py", line 48, in <module>
    assert_frame_equal(test.ix[subidx, subcols], vals)
  File "c:\python\envs\pd017\lib\site-packages\pandas\util\testing.py", line 1028, in assert_frame_equal
    obj='DataFrame.iloc[:, {0}]'.format(i))
  File "c:\python\envs\pd017\lib\site-packages\pandas\util\testing.py", line 925, in assert_series_equal
    check_less_precise, obj='{0}'.format(obj))
  File "pandas\src\testing.pyx", line 58, in pandas._testing.assert_almost_equal (pandas\src\testing.c:3809)
  File "pandas\src\testing.pyx", line 147, in pandas._testing.assert_almost_equal (pandas\src\testing.c:2685)
  File "c:\python\envs\pd017\lib\site-packages\pandas\util\testing.py", line 798, in raise_assert_detail
    raise AssertionError(msg)
AssertionError: DataFrame.iloc[:, 0] are different

DataFrame.iloc[:, 0] values are different (50.0 %)
[left]:  [0.287775338586, 0.287775338586]
[right]: [0.287775338586, 0.0193669578703]

@rekcahpassyla
Contributor Author

(Deleted: I misread something; my previous suggestion was not really a fix.)

@jreback
Contributor

jreback commented Oct 19, 2015

Hmm, surprised that broke. There is not much testing on that sub-section, actually.

The issue is here: https://github.com/pydata/pandas/blob/master/pandas/core/indexing.py#L450

self._align_series is called on a sub-section of the frame, but the aligner looks at the object, decides it is dealing with a frame, and gives back the wrong result.

So we could probably pass in an additional parameter to determine this.
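
To make the aligner's job concrete, here is a small sketch of what label alignment of the right-hand side should produce, next to the symptom seen in the 0.17.0 output above (the first row repeated). It uses only the public reindex method, not the internal _align_series, and every name in it (rows, cols, vals, aligned, tiled) is illustrative rather than taken from pandas internals.

```python
import numpy as np
import pandas as pd

rows = pd.MultiIndex.from_tuples([("A", 1), ("A", 2)])
cols = pd.MultiIndex.from_tuples([("foo", "x"), ("foo", "y")])

# Value frame deliberately stored in reverse row order.
vals = pd.DataFrame([[3.0, 4.0], [1.0, 2.0]], index=rows[::-1], columns=cols)

# Label alignment: reorder the value so each row lands on its own label.
aligned = vals.reindex(index=rows, columns=cols).to_numpy()

# The reported symptom: the first aligned row repeated for every target row.
tiled = np.tile(aligned[0], (len(rows), 1))

print(aligned)
print(tiled)
```

The two arrays agree on the first row only, which matches the "values are different (50.0 %)" message in the tracebacks above.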

@jreback jreback added Bug Indexing Related to indexing on series/frames, not to indexes themselves MultiIndex labels Oct 19, 2015
@jreback jreback added this to the 0.17.1 milestone Oct 19, 2015
@jreback jreback changed the title DataFrame.ix[idx, :] = value sets wrong values when idx is a MultiIndex and DataFrame.columns is also a MultiIndex DataFrame.ix[idx, :] = value sets wrong values when idx is a MultiIndex and DataFrame.columns is also a MultiIndex Oct 19, 2015
@rekcahpassyla
Contributor Author

Since I've already got two test cases, I'd be happy to have a go if I can be pointed in the right direction. I'll start by looking at the history of indexing.py and following any referenced issues / PRs.

@jreback
Contributor

jreback commented Oct 20, 2015

The pointer above is to the relevant issues.

The way to do this is to set up the test cases and the expected results (in test_indexing); they should fail before a fix. Then you can step through to see where to put a fix and go from there.
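
As a rough illustration of that workflow, the two reported cases could be written as round-trip tests along the following lines. This is a sketch under assumptions, not the test that went into the fix. The function names are hypothetical, .loc is used in place of .ix so it runs on current pandas, pandas.testing is the modern home of assert_frame_equal (the report imports it from pandas.util.testing), and the value frame is derived from the selection itself so that only the assigned values can differ.

```python
from itertools import product

import numpy as np
import pandas as pd
import pandas.testing as tm


def make_frame():
    """Build a 12x4 frame with MultiIndex rows and columns, as in the report."""
    idx = pd.MultiIndex.from_tuples(
        list(product(["A", "B", "C"],
                     pd.date_range("2015-01-01", "2015-04-01", freq="MS")))
    )
    cols = pd.MultiIndex.from_tuples(
        list(product(["foo", "bar"],
                     pd.date_range("2016-01-01", "2016-02-01", freq="MS")))
    )
    data = np.arange(48, dtype="float64").reshape(12, 4)
    return pd.DataFrame(data, index=idx, columns=cols)


def check_setitem_roundtrip(col_key):
    """Hypothetical helper: assign a frame to a sub-block and read it back."""
    test = make_frame()
    row_key = test.index[:2]
    # Negative values cannot collide with the non-negative originals.
    vals = -(test.loc[row_key, col_key] + 1.0)
    test.loc[row_key, col_key] = vals
    tm.assert_frame_equal(test.loc[row_key, col_key], vals)
    # Rows outside the selection must be left untouched.
    assert (test.iloc[2:].to_numpy() >= 0).all()
    return test


def test_setitem_all_columns():
    # First report: MultiIndex rows, every column.
    check_setitem_roundtrip(slice(None))


def test_setitem_column_subset():
    # Second report: MultiIndex rows and a MultiIndex subset of columns.
    check_setitem_roundtrip(make_frame().columns[:2])
```

Both tests should fail on a build with the bug, because the second selected row would come back as a copy of the first, and pass once the alignment is corrected.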

@rekcahpassyla
Contributor Author

OK, here is my first attempt: #11400

I added a test for #5206 as well, to check that I hadn't broken that existing functionality.

C:\dev\code\opensource\pandas-rekcahpassyla [multiindex_setitem +2 ~0 -0 !]> C:\python\envs\pandasdev\scripts\nosetests .\pandas\tests\test_indexing.py
...........................................................................................................................................
----------------------------------------------------------------------
Ran 139 tests in 68.094s

OK

I attempted to run the whole test suite, but test_max_ext_len (pandas.tests.test_msgpack.test_limits.TestLimits) eats up 4GB of memory and causes Python to crash.
