MultiIndex copies don't have _id field set. #9967

Closed
Screamik opened this issue Apr 22, 2015 · 4 comments · Fixed by #14936

Comments

@Screamik

I have stumbled across a curious issue when working with pandas.concat.
Let's consider a simple example:

import pandas as pd
from copy import deepcopy
example_multiindex1 = pd.MultiIndex.from_product([['a'], ['b']])
example_dataframe1 = pd.DataFrame([0], index=example_multiindex1)

example_multiindex2 = pd.MultiIndex.from_product([['a'], ['c']])
example_dataframe2 = pd.DataFrame([1], index=example_multiindex2)

example_dict = {'s1': example_dataframe1, 's2': example_dataframe2}
print(pd.concat(example_dict, names=['testname']))

The output is as expected:

              0
testname       
s1       a b  0
s2       a c  1

Something strange happens if we pass concat a deepcopy of the dict instead of the original:

print(pd.concat(deepcopy(example_dict), names=['testname']))

The output, surprisingly, will be:

              0
testname       
s1       a b  0
s2       a b  1

This means that the MultiIndex of the first DataFrame was used twice.
The contents of the copied dict, though, seem correct at first glance:

>>> print(deepcopy(example_dict))
{'s2':      0
a c  1, 's1':      0
a b  0}
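
The reproduction above can also be turned into a self-checking script. This is only a sketch under the assumption of Python 3 and a pandas release that already contains the fix for this issue; the names frames, original, copied, same_index and index_tuples are made up for illustration. On an affected version (0.15.0 through 0.16.0 according to this report) the second index entry is expected to come out as ('s2', 'a', 'b') instead.

```python
from copy import deepcopy

import pandas as pd

# Two single-row frames whose MultiIndexes differ only in the second level.
frames = {
    "s1": pd.DataFrame([0], index=pd.MultiIndex.from_product([["a"], ["b"]])),
    "s2": pd.DataFrame([1], index=pd.MultiIndex.from_product([["a"], ["c"]])),
}

# Concatenate the original dict and a deep copy of it.
original = pd.concat(frames, names=["testname"])
copied = pd.concat(deepcopy(frames), names=["testname"])

# Both results should carry the same three-level index.
same_index = original.index.equals(copied.index)
index_tuples = list(copied.index)

print(same_index)
print(index_tuples)
```

Comparing the two indexes directly, rather than reading the printed frames, makes the duplicated first index easy to spot.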

PS. Version information:

pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.9.final.0
python-bits: 64
OS: Linux
OS-release: 3.11.0-26-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.16.0
nose: 1.3.6
Cython: 0.20.2
numpy: 1.9.2
scipy: 0.13.3
statsmodels: 0.5.0
IPython: 3.1.0
sphinx: None
patsy: 0.2.1
dateutil: 2.4.2
pytz: 2015.2
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.4.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None

The issue is reproducible back to version 0.15.0; version 0.14.1 is fine.

@Screamik
Author

Adding a _set_identity = True line to the deep branch of MultiIndex's copy method fixes the issue. But I wonder why _set_identity is set to False by default?
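
Whether deep copies end up with distinct identities can be probed from the outside without touching the private _id or _set_identity members. The sketch below uses only the public is_ and equals methods and assumes Python 3 with a pandas release that contains the fix; the variable names are illustrative.

```python
from copy import deepcopy

import pandas as pd

index_b = pd.MultiIndex.from_product([["a"], ["b"]])
index_c = pd.MultiIndex.from_product([["a"], ["c"]])

# Deep copies of two indexes that hold different values.
copy_b = deepcopy(index_b)
copy_c = deepcopy(index_c)

# Copies of different indexes should neither share an identity nor compare equal.
copies_share_identity = copy_b.is_(copy_c)
copies_compare_equal = copy_b.equals(copy_c)

# A copy should still compare equal to the index it was copied from.
copy_matches_source = copy_b.equals(index_b)

print(copies_share_identity, copies_compare_equal, copy_matches_source)
```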

@Screamik
Author

Found why concat fails with a deepcopied MultiIndex.
In merge.py, function _make_concat_multiindex:

    if not _all_indexes_same(indexes):
        # functionality to actually concat multiindex
        ...
        ...
    new_index = indexes[0]

So, if all indexes are considered the same, we just take the first one.
Now consider the function _all_indexes_same:

def _all_indexes_same(indexes):
    first = indexes[0]
    for index in indexes[1:]:
        if not first.equals(index):
            return False
    return True

It calls MultiIndex's equals method, which in turn uses the is_ method, which is based on the MultiIndex's _id field. As mentioned in the comment above, all copies of a MultiIndex (even deep ones) are created with _id set to None, which makes the copies indistinguishable from one another.
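
The mechanism can be modeled without pandas. The following is a deliberately simplified sketch, not the pandas implementation: ToyIndex, its set_identity parameter and all_indexes_same are hypothetical names, and the model assumes only what is described above, namely that equals short-circuits on is_ and that is_ compares the _id tokens.

```python
class ToyIndex:
    """Hypothetical, simplified stand-in for an index with an identity token."""

    def __init__(self, values, set_identity=True):
        self.values = list(values)
        # object() is unique per instance; None is shared by every
        # index that skips this step.
        self._id = object() if set_identity else None

    def is_(self, other):
        # Identity shortcut: compares tokens, never values.
        return self._id is other._id

    def equals(self, other):
        if self.is_(other):
            return True  # the value comparison is skipped entirely
        return self.values == other.values


def all_indexes_same(indexes):
    # Same logic as the _all_indexes_same loop, written with all().
    first = indexes[0]
    return all(first.equals(index) for index in indexes[1:])


# Copies created without a token: both _id fields are None.
broken = [ToyIndex([("a", "b")], set_identity=False),
          ToyIndex([("a", "c")], set_identity=False)]

# Copies that receive a fresh token fall through to the value comparison.
fixed = [ToyIndex([("a", "b")]), ToyIndex([("a", "c")])]

broken_result = all_indexes_same(broken)
fixed_result = all_indexes_same(fixed)
print(broken_result, fixed_result)
```

In this model the two token-less indexes are reported as the same even though their values differ, because None is None holds for any pair of them; giving each copy its own token removes the false match.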

I'll rename the issue.

@Screamik Screamik changed the title Concat results in wrong index - Multiindex deepcopy issue. MultiIndex copies doesn't have _id field set. Apr 23, 2015
@Screamik Screamik changed the title MultiIndex copies doesn't have _id field set. MultiIndex copies don't have _id field set. Apr 23, 2015
@jreback
Contributor

jreback commented Apr 23, 2015

I don't recall why this was done this way. You can try changing it and see what pops up in the test suite.

@jreback jreback added this to the Next Major Release milestone Apr 23, 2015
@Screamik
Author

Tests seem to be unharmed:

Ran 7631 tests in 694.204s

OK (SKIP=455)

mroeschke added a commit to mroeschke/pandas that referenced this issue Dec 21, 2016
@jreback jreback modified the milestones: 0.20.0, Next Major Release Dec 21, 2016
mroeschke added a commit to mroeschke/pandas that referenced this issue Dec 22, 2016
jreback pushed a commit that referenced this issue Dec 22, 2016
ShaharBental pushed a commit to ShaharBental/pandas that referenced this issue Dec 26, 2016