BUG: binary operator method alignment with integer level (GH9463) #9475

Merged
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v0.16.0.txt
@@ -228,7 +228,7 @@ Bug Fixes




- Bug in binary operator method (eg ``.mul()``) alignment with integer levels (:issue:`9463`).



17 changes: 6 additions & 11 deletions pandas/core/generic.py
@@ -3172,30 +3172,25 @@ def _align_series(self, other, join='outer', axis=None, level=None,

         else:

-            # for join compat if we have an unnamed index, but
-            # are specifying a level join
-            other_index = other.index
-            if level is not None and other.index.name is None:
-                other_index = other_index.set_names([level])
-
             # one has > 1 ndim
             fdata = self._data
             if axis == 0:
                 join_index = self.index
Contributor

this wasn't being hit anywhere?

Member Author

As far as I understand the code, this was hit, and it fixed the issue in the PR I linked. But it only fixed it for the case where you pass a real level name as level: in that case, other is given the same name, and Index.join then automatically joins on the common level name.
When passing an integer to level, setting that number as the level name does not ensure that Index.join can join correctly, as there are no common level names.

Another option would be to set the correct level name from self here, i.e. the name that corresponds to the provided level number. But since passing level=level to join also works (for both cases) and is simpler, I took that option.

Contributor

ok, your solution seems right/reasonable.

Member Author

ok! still have to add a release note
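
The distinction drawn in the thread above can be seen directly on the index objects. What follows is a minimal sketch, assuming a recent pandas release in which `Index.join` accepts `how`, `level` and `return_indexers` as keyword arguments; the names `midx`, `flat` and `renamed` are illustrative and do not come from the PR.

```python
import pandas as pd

# Unnamed two-level index, mirroring the columns used in the GH9463 test.
midx = pd.MultiIndex.from_product([["A", "B"], ["a", "b"]])
flat = pd.Index(["a", "b"])

# The old approach renamed the flat index to whatever was passed as `level`.
# With an integer, that creates no shared level name: the MultiIndex levels
# are still unnamed.
renamed = flat.rename(1)
print(renamed.name in midx.names)

# Passing `level` to join() selects the level directly, so it does not
# depend on the two indexes sharing a name.
join_index, lidx, ridx = midx.join(
    flat, how="outer", level=1, return_indexers=True
)
print(join_index.equals(midx))
print(flat.take(ridx).tolist())
```

Here `ridx` maps each position of the joined index back to a position in `flat`, which is the mapping the alignment code needs in order to broadcast the Series across the chosen level.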

                 lidx, ridx = None, None
-                if not self.index.equals(other_index):
-                    join_index, lidx, ridx = self.index.join(
-                        other_index, how=join, return_indexers=True)
+                if not self.index.equals(other.index):
+                    join_index, lidx, ridx = \
+                        self.index.join(other.index, how=join, level=level,
+                                        return_indexers=True)

                 if lidx is not None:
                     fdata = fdata.reindex_indexer(join_index, lidx, axis=1)

             elif axis == 1:
                 join_index = self.columns
                 lidx, ridx = None, None
-                if not self.columns.equals(other_index):
+                if not self.columns.equals(other.index):
                     join_index, lidx, ridx = \
-                        self.columns.join(other_index, how=join,
+                        self.columns.join(other.index, how=join, level=level,
                                           return_indexers=True)

                 if lidx is not None:
29 changes: 29 additions & 0 deletions pandas/tests/test_frame.py
@@ -5428,6 +5428,35 @@ def test_binary_ops_align(self):
expected = pd.concat([ opa(df.loc[idx[:,i],:],v) for i, v in x.iteritems() ]).reindex_like(df).sortlevel()
assert_frame_equal(result, expected)

+        ## GH9463 (alignment level of dataframe with series)
+
+        midx = MultiIndex.from_product([['A', 'B'],['a', 'b']])
+        df = DataFrame(np.ones((2,4), dtype='int64'), columns=midx)
+        s = pd.Series({'a':1, 'b':2})
+
+        df2 = df.copy()
+        df2.columns.names = ['lvl0', 'lvl1']
+        s2 = s.copy()
+        s2.index.name = 'lvl1'
+
+        # different cases of integer/string level names:
+        res1 = df.mul(s, axis=1, level=1)
+        res2 = df.mul(s2, axis=1, level=1)
+        res3 = df2.mul(s, axis=1, level=1)
+        res4 = df2.mul(s2, axis=1, level=1)
+        res5 = df2.mul(s, axis=1, level='lvl1')
+        res6 = df2.mul(s2, axis=1, level='lvl1')
+
+        exp = DataFrame(np.array([[1, 2, 1, 2], [1, 2, 1, 2]], dtype='int64'),
+                        columns=midx)
+
+        for res in [res1, res2]:
+            assert_frame_equal(res, exp)
+
+        exp.columns.names = ['lvl0', 'lvl1']
+        for res in [res3, res4, res5, res6]:
+            assert_frame_equal(res, exp)
+
     def test_arith_mixed(self):

         left = DataFrame({'A': ['a', 'b', 'c'],
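
For readers who want to try the behaviour outside the test suite, the case covered by the new test can be written as a standalone script. This is a sketch under stated assumptions, not part of the PR: it assumes a recent pandas release with `pd.testing.assert_frame_equal`, and the names `by_position` and `by_name` are illustrative.

```python
import numpy as np
import pandas as pd

# Two-level columns with named levels; the Series index is left unnamed.
midx = pd.MultiIndex.from_product(
    [["A", "B"], ["a", "b"]], names=["lvl0", "lvl1"]
)
df = pd.DataFrame(np.ones((2, 4), dtype="int64"), columns=midx)
s = pd.Series({"a": 1, "b": 2})

# Align the Series against the second column level, once by integer
# position and once by level name.
by_position = df.mul(s, axis=1, level=1)
by_name = df.mul(s, axis=1, level="lvl1")

# Both spellings of the level should give the same frame.
pd.testing.assert_frame_equal(by_position, by_name)
print(by_position)
```

Each column is multiplied by the Series value whose label matches that column's second-level key, so the `a` columns are scaled by 1 and the `b` columns by 2.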