BUG: merging with mixed types objects in py3 when unorderable #12814

sbuser · 2016-04-06T16:03:22Z

Code Sample, a copy-pastable example if possible

    df1 = dfAllStudies.copy(deep=True)  #make 2 copies of the same data
    df2 = dfAllStudies.copy(deep=True)

    #take a subset of 1 of the identical frames by dropping rows with certain dates
    df2 = df2[(df2['Reviewed on'] >= np.datetime64(DateWindowStart))]  

    #attempt to merge the frames on indexes
    common = pd.merge(df1, df2, left_index=True, right_index=True)

Whatever my data looks like, a copy, minus some rows, merged (on indexes) with a copy should function without error, no?

Expected Output

a merged dataframe and not:

    \Anaconda3\lib\site-packages\pandas\tools\merge.py", line 535, in _get_join_indexers
        llab, rlab, shape = map(list, zip(* map(fkeys, left_keys, right_keys)))
    TypeError: type object argument after * must be a sequence, not map

output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 30 Stepping 5, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.18.0
nose: 1.3.7
pip: 8.1.0
setuptools: 20.2.2
Cython: 0.23.4
numpy: 1.10.4
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: 4.0.3
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.4.6
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.5.0
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.11
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.39.0
None


jreback · 2016-04-06T16:09:55Z

pls provide a minimal but complete copy-pastable example

sbuser · 2016-04-06T16:30:02Z

Weirdly the following trivial example does work:

    import pandas as pd

    d = {'col1': 'foo', 'col2': 'bar'}
    df = pd.DataFrame(data=d, index=[1, 2, 3, str(1)])

    df1 = df.copy(deep=True)
    print(df1)

    df2 = df.copy(deep=True)
    print(df2)

    df2 = df2[(df2.index != 1)]  # take a subset
    print(df2)

    common = pd.merge(df1, df2, left_index=True, right_index=True)
    print(common)

So the problem must be with my data somehow. What's the best way to get my data into a trivial example? If I copy out strings I potentially lose datatypes and so forth. Will a pickled version of a few rows work?

jreback · 2016-04-06T16:34:30Z

Sure, if you have a reproducible example and you don't mind sharing it, then that would work.
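
For reference, a minimal sketch of how a few rows could be pickled so that dtypes and the index survive the round trip. The frame below is only a stand-in for the real data, and the file name `sample.pkl` is hypothetical:

```python
import os
import tempfile

import pandas as pd

# Stand-in frame with a mixed int/str index; replace with the real data.
df = pd.DataFrame({'col1': 'foo', 'col2': 'bar'}, index=[1, 2, 3, '1'])

# Keep only a few rows and write them to a pickle file.
sample = df.head(3)
path = os.path.join(tempfile.mkdtemp(), 'sample.pkl')  # hypothetical file name
sample.to_pickle(path)

# Reading the file back restores values, dtypes and the index.
restored = pd.read_pickle(path)
print(restored.index.dtype)
print(restored.equals(sample))
```

Unlike copying the printed frame as text, this keeps the index values as the original Python objects. Pickle files should only be loaded from a trusted source.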

adamdivak · 2016-06-13T14:47:42Z

This is rather tricky to reproduce, but I had the same issue. Here is a minimal example that triggers it for me:

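    import pandas as pd
    from math import nan

    # Left index mixes ints and a string, so it has dtype object.
    a = pd.DataFrame({'a': [1, 2, 3]}, index=[1, 2, 'a'])
    # Right index is float and contains two NaN values.
    b = pd.DataFrame({'b': [2, 3, 4]}, index=[1, nan, nan])
    a.join(b)  # raises TypeError on Python 3 with pandas 0.18.1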

I had to try a lot of combinations to nail it down, and it seems that the following conditions are needed to trigger this:

- Exactly one of the indices is of type object and the other one is of type float. That's why one index contains a string in the example. (If both are object or both are float, it does not produce an error.)
- One index contains at least two nan values.
- The column values are irrelevant; this is specifically about the indices.
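
The conditions listed above can be checked directly on the two frames from the example. This is only a small inspection sketch (it uses `np.nan` in place of `math.nan` and does not call `join`):

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({'a': [1, 2, 3]}, index=[1, 2, 'a'])
b = pd.DataFrame({'b': [2, 3, 4]}, index=[1, np.nan, np.nan])

# One index is object (ints mixed with a string), the other is float.
print(a.index.dtype)
print(b.index.dtype)

# Number of NaN labels in the float index.
nan_count = int(b.index.isna().sum())
print(nan_count)
```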

Cheers,
Adam

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.2.0-36-generic
machine: x86_64
processor: 
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8

pandas: 0.18.1
nose: None
pip: 8.1.1
setuptools: 22.0.0
Cython: 0.24
numpy: 1.11.0
scipy: 0.17.1
statsmodels: 0.6.1
xarray: None
IPython: 4.2.0
sphinx: None
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: None
tables: None
numexpr: 2.6.0
matplotlib: 1.5.1
openpyxl: None
xlrd: 0.9.4
xlwt: None
xlsxwriter: 0.8.9
lxml: None
bs4: 4.4.1
html5lib: None
httplib2: 0.9.2
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None

jreback · 2016-06-14T13:04:26Z

You realize that `from math import nan` is unnecessary, since numpy already defines `nan` (the two are the same float value); importing it from `math` is non-idiomatic and just plain confusing.

The issue is mixed-type object indexes. It is never a good idea to have mixed types like this in an index (or in a column, for that matter).

Yes this does trigger an error. If you want to have a look, go for it.

    ipdb> p list(map(fkeys, left_keys, right_keys))
    *** TypeError: unorderable types: str() > int()
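
The same unorderable condition can be seen without pandas. A minimal sketch in plain Python 3, where comparing `str` with `int` is not defined (newer Python versions word the message differently from the 3.5 output above):

```python
mixed = [1, 2, 'a']

try:
    sorted(mixed)  # needs to compare 'a' with an int
    error = None
except TypeError as exc:
    error = exc

print(repr(error))
```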

jreback · 2016-06-14T13:05:32Z

xref #13432 which is the same unsortable condition.

cc @pijucha

1. Added an internal `safe_sort` to safely sort mixed-integer arrays in Python3.
2. Changed Index.difference and Index.symmetric_difference in order to:
   - sort mixed-int Indexes (pandas-dev#13432)
   - improve performance (pandas-dev#12044)
3. Fixed DataFrame.join which raised in Python3 with mixed-int non-unique indexes (issue with sorting mixed-ints, pandas-dev#12814)
4. Fixed Index.union returning an empty Index when one of arguments was a named empty Index (pandas-dev#13432)
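
To illustrate the general idea of sorting mixed integers and strings without comparing them to each other, here is a rough sketch. The helper `sort_mixed` is hypothetical and is not the pandas `safe_sort` implementation; it assumes the input holds only numbers and strings, with no NaN values:

```python
def sort_mixed(values):
    """Hypothetical helper: numbers first, then strings, each group sorted."""
    numbers = sorted(v for v in values if not isinstance(v, str))
    strings = sorted(v for v in values if isinstance(v, str))
    return numbers + strings


print(sort_mixed([3, 'b', 1, 'a', 2]))  # [1, 2, 3, 'a', 'b']
```

Sorting each group separately avoids the `str` versus `int` comparison that Python 3 rejects.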

jreback added the Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) label Apr 6, 2016

jreback added the Dtype Conversions (Unexpected or buggy dtype conversions) and Difficulty Intermediate labels Jun 14, 2016

jreback added this to the Next Major Release milestone Jun 14, 2016

jreback changed the title ~~pd.merge() fails in odd ways?~~ BUG: merging with mixed types objects in py3 when unorderable Jun 14, 2016

This was referenced Jun 22, 2016

BUG: Index set operations issues #13432

Closed

BUG/PERF: Sort mixed-int in Py3, fix Index.difference #13514

Closed

jreback modified the milestones: 0.18.2, Next Major Release Jun 27, 2016

jreback closed this as completed in b225cac Jul 19, 2016
