Aligning on multi-index with swapped levels gives unclear error message #9952

Closed
jorisvandenbossche opened this issue Apr 20, 2015 · 6 comments · Fixed by #41389
Labels: good first issue, Needs Tests (unit test(s) needed to prevent regressions)

@jorisvandenbossche
Member

When doing a calculation with two DataFrames that both have multi-indexed rows, but with the levels swapped, you get the error message "NotImplementedError: merging with more than one level overlap on a multi-index is not implemented".

This is not very informative in this case.

In [11]: df = pd.DataFrame({'a':np.random.randn(6)}, index=pd.MultiIndex.from_product([['a', 'b'],[0,1,2]], names=['levA', 'levB']))

In [12]: df2 = df.copy()

In [13]: df2.index = df2.index.swaplevel(0,1)

In [14]: df
Out[14]:
                  a
levA levB
a    0    -0.248626
     1     2.382662
     2     1.349179
b    0     0.720629
     1     0.205297
     2     0.113340

In [15]: df2
Out[15]:
                  a
levB levA
0    a    -0.248626
1    a     2.382662
2    a     1.349179
0    b     0.720629
1    b     0.205297
2    b     0.113340

In [16]: df - df2
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-16-ffb739d7f3d7> in <module>()
----> 1 df - df2

c:\users\vdbosscj\scipy\pandas-joris\pandas\core\ops.pyc in f(self, other, axis, level, fill_value)
    813     def f(self, other, axis=default_axis, level=None, fill_value=None):
    814         if isinstance(other, pd.DataFrame):    # Another DataFrame
--> 815             return self._combine_frame(other, na_op, fill_value, level)
    816         elif isinstance(other, pd.Series):
    817             return self._combine_series(other, na_op, fill_value, axis, level)

c:\users\vdbosscj\scipy\pandas-joris\pandas\core\frame.pyc in _combine_frame(self, other, func, fill_value, level)
   3099
   3100     def _combine_frame(self, other, func, fill_value=None, level=None):
-> 3101         this, other = self.align(other, join='outer', level=level, copy=False)
   3102         new_index, new_columns = this.index, this.columns
   3103

c:\users\vdbosscj\scipy\pandas-joris\pandas\core\generic.pyc in align(self, other, join, axis, level, copy, fill_value, method, limit, fill_axis)
   3146                                      copy=copy, fill_value=fill_value,
   3147                                      method=method, limit=limit,
-> 3148                                      fill_axis=fill_axis)
   3149         elif isinstance(other, Series):
   3150             return self._align_series(other, join=join, axis=axis, level=level,

c:\users\vdbosscj\scipy\pandas-joris\pandas\core\generic.pyc in _align_frame(self, other, join, axis, level, copy, fill_value, method, limit, fill_axis)
   3167                 join_index, ilidx, iridx = \
   3168                     self.index.join(other.index, how=join, level=level,
-> 3169                                     return_indexers=True)
   3170
   3171         if axis is None or axis == 1:

c:\users\vdbosscj\scipy\pandas-joris\pandas\core\index.pyc in join(self, other, how, level, return_indexers)
   1781                 pass
   1782             else:
-> 1783                 return self._join_multi(other, how=how, return_indexers=return_indexers)
   1784
   1785         # join on the level

c:\users\vdbosscj\scipy\pandas-joris\pandas\core\index.pyc in _join_multi(self, other, how, return_indexers)
   1876             raise ValueError("cannot join with no level specified and no overlapping names")
   1877         if len(overlap) > 1:
-> 1878             raise NotImplementedError("merging with more than one level overlap on a multi-index is not implemented")
   1879         jl = overlap[0]
   1880

NotImplementedError: merging with more than one level overlap on a multi-index is not implemented

@jorisvandenbossche
Member Author

xref #9368, #5645

@jreback jreback added Indexing Related to indexing on series/frames, not to indexes themselves Reshaping Concat, Merge/Join, Stack/Unstack, Explode Difficulty Novice Error Reporting Incorrect or improved errors from pandas labels Apr 20, 2015
@jreback jreback added this to the Next Major Release milestone Apr 20, 2015
@jreback
Contributor

jreback commented Apr 20, 2015

I suppose you could do a heuristic which tries to 'compare' two multi-indexes to give better feedback.

@jorisvandenbossche
Member Author

Specific to the error message, I suppose the problem is that this message is also used, e.g., when merging a multi-index with a multi-index when the levels are swapped? So it is probably difficult to make this message 'useful' for all use cases.

And regarding the actual use case, is this something we would want to be possible? Should alignment also try to align the levels of a multi-index first before aligning the index itself?

@jreback
Contributor

jreback commented Apr 20, 2015

I think you could simply raise a more informative message if you 'guess' that the levels are swapped. I wouldn't actually try to align though.
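
A minimal sketch of what such a 'guess' could look like, assuming the check only looks at the level names (in practice, `df.index.names` and `df2.index.names`). The function name `describe_level_mismatch` and the wording of the hint are made up for illustration and are not part of pandas:

```python
from collections import Counter


def describe_level_mismatch(left_names, right_names):
    """Return a hint if both sides carry the same level names in a different order.

    Hypothetical helper for illustration only; returns None when no guess
    can be made (identical order, unnamed levels, or different names).
    """
    left, right = list(left_names), list(right_names)
    if left == right:
        return None  # same order, nothing to report
    if None in left or None in right:
        return None  # unnamed levels: names give no basis for a guess
    if Counter(left) == Counter(right):
        return (
            f"MultiIndex levels appear to be reordered: {left} vs {right}. "
            "Consider calling reorder_levels() on one operand first."
        )
    return None  # different sets of names: not a simple swap


# Level names taken from the example in this issue.
print(describe_level_mismatch(["levA", "levB"], ["levB", "levA"]))
```

This only compares names, so it cannot tell whether the level values themselves match; it is meant as a basis for a clearer error message, not for automatic alignment.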

@shoyer
Member

shoyer commented Apr 20, 2015

Pandas already does automatic alignment in many cases, so I would like to be able to say that we should align the levels of a multi-index -- except for the fact that currently index names are just metadata, so we would have to guess about the right ordering. Given this fact, I agree with @jreback -- we should just raise a better error message.
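
Until then, the user can make the ordering explicit instead of having pandas guess. A small sketch, assuming both frames carry the same level names; the data is deterministic here (instead of `np.random.randn`) so the result is easy to check:

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product(
    [["a", "b"], [0, 1, 2]], names=["levA", "levB"]
)
df = pd.DataFrame({"a": np.arange(6, dtype="float64")}, index=index)

# Same data, but with the two index levels swapped.
df2 = df.copy()
df2.index = df2.index.swaplevel(0, 1)

# Put the levels of df2 back into the order used by df before the
# arithmetic, so both operands have identically structured indexes.
df2_aligned = df2.reorder_levels(df.index.names)
diff = df - df2_aligned
print(diff)
```

Because the level order is stated by the caller, this does not rely on index names being anything more than metadata.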

@mroeschke
Member

This appears to "work" now (and gives a sensible result IMO). I think it would be okay to add a test for this behavior.

In [8]: df = pd.DataFrame({'a':np.random.randn(6)}, index=pd.MultiIndex.from_product([['a', 'b'],[0,1,2]], names=['levA', 'levB']))
   ...: df2 = df.copy()
   ...: df2.index = df2.index.swaplevel(0,1)

In [9]: df - df2
Out[9]:
             a
levA levB
a    0     0.0
     1     0.0
     2     0.0
b    0     0.0
     1     0.0
     2     0.0

In [12]: pd.__version__
Out[12]: '1.3.0.dev0+1351.g04f9a4b10d.dirty'
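
A test for this could look roughly like the sketch below. It assumes a pandas version that shows the behavior above (1.3 or later); the test name is made up, and this is not necessarily the test that was eventually added:

```python
import numpy as np
import pandas as pd


def test_sub_with_swapped_multiindex_levels():
    # Hypothetical regression test sketch for this issue.
    index = pd.MultiIndex.from_product(
        [["a", "b"], [0, 1, 2]], names=["levA", "levB"]
    )
    df = pd.DataFrame({"a": np.arange(6, dtype="float64")}, index=index)
    df2 = df.copy()
    df2.index = df2.index.swaplevel(0, 1)

    # Should align on the named levels rather than raise.
    result = df - df2

    # Every row matches its counterpart, so the difference is all zeros
    # and no NaN rows are introduced by the alignment.
    assert result.shape == (6, 1)
    assert not result["a"].isna().any()
    assert (result["a"] == 0).all()
    assert set(result.index.names) == {"levA", "levB"}
```

Fixed values are used instead of random data so a failure is reproducible.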

@mroeschke mroeschke added Needs Tests Unit test(s) needed to prevent regressions and removed Error Reporting Incorrect or improved errors from pandas Indexing Related to indexing on series/frames, not to indexes themselves Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Apr 18, 2021
@simonjayhawkins simonjayhawkins modified the milestones: Contributions Welcome, 1.3 May 9, 2021
7 participants