Skip to content

BUG: merge on multiple columns with mixed colums/index failing #11354

Closed
@jorisvandenbossche

Description

@jorisvandenbossche

From the mailing list, considering this example:

In [56]: s = pd.Series(range(6), pd.MultiIndex.from_product([['A', 'B'], [1,2,3]], names=['lev1', 'lev2']), name='Amount')

In [57]: s
Out[57]:
lev1  lev2
A     1       0
      2       1
      3       2
B     1       3
      2       4
      3       5
Name: Amount, dtype: int64

In [58]: df = pd.DataFrame({'lev1':list('AAABBB'), 'lev2':[1,2,3,1,2,3], 'col':0})

In [59]: df
Out[59]:
   col lev1  lev2
0    0    A     1
1    0    A     2
2    0    A     3
3    0    B     1
4    0    B     2
5    0    B     3

So there are overlapping columns, but as the index in the series and as columns in the dataframe.
To merge both, these work:

In [63]: pd.concat([df.set_index(['lev1', 'lev2']), s], axis=1)
Out[63]:
           col  Amount
lev1 lev2
A    1       0       0
     2       0       1
     3       0       2
B    1       0       3
     2       0       4
     3       0       5

In [64]: pd.merge(df, s.reset_index(), on=['lev1', 'lev2'])
Out[64]:
   col lev1  lev2  Amount
0    0    A     1       0
1    0    A     2       1
2    0    A     3       2
3    0    B     1       3
4    0    B     2       4
5    0    B     3       5

But I would expect this also to work:

In [69]: df.merge(s, left_on=['lev1', 'lev2'], right_index=True)
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-69-1b78b4267e47> in <module>()
----> 1 df.merge(s, left_on=['lev1', 'lev2'], right_index=True)

C:\Anaconda\lib\site-packages\pandas\core\frame.pyc in merge(self, right, how, o
n, left_on, right_on, left_index, right_index, sort, suffixes, copy)
   4069                      left_on=left_on, right_on=right_on,
   4070                      left_index=left_index, right_index=right_index, sor
t=sort,
-> 4071                      suffixes=suffixes, copy=copy)
   4072
   4073     #-------------------------------------------------------------------

---

C:\Anaconda\lib\site-packages\pandas\tools\merge.pyc in merge(left, right, how,
on, left_on, right_on, left_index, right_index, sort, suffixes, copy)
     36                          right_index=right_index, sort=sort, suffixes=su
ffixes,
     37                          copy=copy)
---> 38     return op.get_result()
     39 if __debug__:
     40     merge.__doc__ = _merge_doc % '\nleft : DataFrame'

C:\Anaconda\lib\site-packages\pandas\tools\merge.pyc in get_result(self)
    184
    185     def get_result(self):
--> 186         join_index, left_indexer, right_indexer = self._get_join_info()
    187
    188         ldata, rdata = self.left._data, self.right._data

C:\Anaconda\lib\site-packages\pandas\tools\merge.pyc in _get_join_info(self)
    254     def _get_join_info(self):
    255         left_ax = self.left._data.axes[self.axis]
--> 256         right_ax = self.right._data.axes[self.axis]
    257         if self.left_index and self.right_index:
    258             join_index, left_indexer, right_indexer = \

IndexError: list index out of range

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions