BUG-24212 fix when other_index has incompatible dtype #25009

Merged 30 commits on May 5, 2019
Changes from 16 commits
Commits
b04cee7
BUG-24212 fix usage of Index.take in pd.merge
JustinZhengBC Jan 11, 2019
a64b8fe
BUG-24212 add comment
JustinZhengBC Jan 11, 2019
022643d
BUG-24212 clarify test
JustinZhengBC Jan 12, 2019
e99dece
BUG-24212 make _create_join_index function
JustinZhengBC Jan 14, 2019
b95e1fe
BUG-24212 add docstring and comments
JustinZhengBC Jan 17, 2019
73be0d0
BUG-24212 fix regression
JustinZhengBC Jan 24, 2019
de3e2c7
BUG-24212 alter old test
JustinZhengBC Jan 24, 2019
1287758
fix typo
JustinZhengBC Jan 24, 2019
bdce7ac
BUG-24212 remove print and move whatsnew note
JustinZhengBC Jan 24, 2019
83ae393
BUG-24212 fix when other_index has incompatible dtype
JustinZhengBC Jan 29, 2019
4cb3ab0
Merge branch 'master' into BUG-24212
JustinZhengBC Jan 29, 2019
66f6fe4
merge issue
JustinZhengBC Jan 29, 2019
cf6fa14
fix whatsnew
JustinZhengBC Jan 29, 2019
0e6de81
BUG-24212 fix test
JustinZhengBC Jan 29, 2019
1da789a
BUG-24212 fix test
JustinZhengBC Jan 29, 2019
a0e5ffc
Merge branch 'BUG-24212' of https://github.com/justinzhengbc/pandas i…
JustinZhengBC Jan 29, 2019
27cdbc8
BUG-24212 simplify take logic
JustinZhengBC Jan 31, 2019
cd326b2
fix import order
JustinZhengBC Jan 31, 2019
2c65ebf
Merge branch 'master' into BUG-24212
JustinZhengBC Mar 26, 2019
d8d3cdf
make logic more generic
JustinZhengBC Mar 26, 2019
f9e7386
make logic more generic
JustinZhengBC Mar 26, 2019
8a36130
Merge branch 'BUG-24212' of https://github.com/justinzhengbc/pandas i…
JustinZhengBC Mar 27, 2019
7da3655
clean up test
JustinZhengBC Mar 29, 2019
17c5497
use compat=False for na_value_for_dtype
JustinZhengBC Mar 29, 2019
720dfbb
Merge branch 'master' into BUG-24212
JustinZhengBC Apr 21, 2019
6772618
clarify whatsnew
JustinZhengBC Apr 22, 2019
dacb4bc
Merge branch 'master' into BUG-24212
JustinZhengBC Apr 22, 2019
cad4398
add PR number to whatsnew
JustinZhengBC Apr 22, 2019
5e2eb0f
Merge branch 'BUG-24212' of https://github.com/justinzhengbc/pandas i…
JustinZhengBC Apr 22, 2019
88cdf8b
Merge branch 'master' into BUG-24212
JustinZhengBC Apr 29, 2019
16 changes: 12 additions & 4 deletions pandas/core/reshape/merge.py
@@ -809,10 +809,18 @@ def _create_join_index(self, index, other_index, indexer,
                 # if values missing (-1) from target index,
                 # take from other_index instead
                 join_list = join_index.to_numpy()
-                other_list = other_index.take(other_indexer).to_numpy()
-                join_list[mask] = other_list[mask]
-                join_index = Index(join_list, dtype=join_index.dtype,
-                                   name=join_index.name)
+                try:
+                    other_list = other_index.take(other_indexer).to_numpy()
+                    join_list[mask] = other_list[mask]
+                    join_index = Index(join_list, dtype=other_index.dtype,
+                                       name=other_index.name)
+                except ValueError:
Contributor

We really don't want a try/except here. What is falling into the except?

Contributor Author

The except branch is hit when 'other_index' has a different dtype, so that inserting its values into the current index raises an exception. I did not compare dtypes directly because some differing dtypes are still compatible (example: an int can be added to a Float64Index).

Contributor

Can you either:

  • always do what you currently have in the except branch, or
  • use is_dtype_equal to test?

Contributor Author

I don't think is_dtype_equal would work, because some combinations of different dtypes are still usable.

I agree with the first option, because joining on the index of the other frame makes the row order of the other frame arbitrary anyway.
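
The point made in this thread can be illustrated with a minimal sketch. This is not code from the PR: it uses plain NumPy arrays in place of index values, and the helper name `can_fill` is hypothetical. It shows that an equality check on dtypes rejects a pair (int64 into float64) that the masked assignment accepts, while a genuinely incompatible pair (string labels into int64) raises.

```python
import numpy as np
from pandas.api.types import is_dtype_equal


def can_fill(target, source, mask):
    """Return True if source[mask] can be written into a copy of target.

    Hypothetical helper, for illustration only.
    """
    trial = target.copy()
    try:
        trial[mask] = source[mask]
    except (ValueError, TypeError):
        return False
    return True


mask = np.array([False, True])

floats = np.array([1.0, 2.0])
ints = np.array([10, 20])
labels = np.array(["a", "b"], dtype=object)

# The dtypes differ, yet the assignment succeeds: ints fit in a float array.
dtypes_equal = is_dtype_equal(floats.dtype, ints.dtype)
ints_into_floats = can_fill(floats, ints, mask)

# The dtypes differ and the assignment fails: "b" cannot be converted to int.
labels_into_ints = can_fill(ints, labels, mask)
```

Under these assumptions, a strict dtype comparison would send the int/float case down the fallback path even though the direct assignment would have worked, which is the objection raised above.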

+                    # if other_index does not have a compatible dtype, naively
+                    # replace missing values by their column position in other
+                    naive_index = np.arange(len(other_index))
+                    other_list = naive_index.take(other_indexer)
+                    join_list[mask] = other_list[mask]
+                    join_index = Index(join_list, name=other_index.name)
         return join_index
 
     def _get_merge_keys(self):
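
The fallback added in the except branch can be sketched in isolation. The following is a simplified stand-in rather than the pandas implementation: the function name `fill_missing_with_positions` and its signature are hypothetical, it works on NumPy arrays, and it skips the final `Index(...)` construction.

```python
import numpy as np


def fill_missing_with_positions(join_values, indexer, other_indexer, other_len):
    """Replace entries whose indexer is -1 with row positions in `other`.

    Hypothetical helper mirroring the fallback logic in the diff.
    """
    # Object dtype lets existing labels and integer positions coexist.
    join_list = np.array(join_values, dtype=object)
    mask = np.asarray(indexer) == -1
    naive_index = np.arange(other_len)
    other_list = naive_index.take(other_indexer)
    join_list[mask] = other_list[mask]
    return join_list


indexer = np.array([0, 1, 2, -1])
other_indexer = np.array([0, 0, 1, 2])

# take() with -1 wraps around to the last element, which is why the
# masked slot has to be overwritten afterwards.
taken = np.array([10, 11, 12]).take(indexer)
filled = fill_missing_with_positions(taken, indexer, other_indexer, other_len=3)
```

Here `taken` ends with a repeated `12` because of the wrap-around, and `filled` replaces that slot with `2`, the position of the unmatched row in the other frame.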
15 changes: 15 additions & 0 deletions pandas/tests/reshape/merge/test_merge.py
@@ -973,6 +973,21 @@ def test_merge_right_index_right(self):
                             how='right')
         tm.assert_frame_equal(result, expected)
 
+    def test_merge_take_missing_values_from_index_of_other_dtype(self):
+        left = pd.DataFrame({'a': [1, 2, 3],
+                             'key': pd.Categorical(['a', 'a', 'b'],
+                                                   categories=list('abc'))})
+        right = pd.DataFrame({'b': [1, 2, 3]},
+                             index=pd.CategoricalIndex(['a', 'b', 'c']))
+        result = left.merge(right, left_on='key',
+                            right_index=True, how='right')
+        expected = pd.DataFrame({'a': [1, 2, 3, None],
+                                 'key': pd.Categorical(['a', 'a', 'b', 'c']),
+                                 'b': [1, 1, 2, 3]},
+                                index=[0, 1, 2, 2])
+        expected = expected.reindex(columns=['a', 'key', 'b'])
+        tm.assert_frame_equal(result, expected)
+
 
 def _check_merge(x, y):
     for how in ['inner', 'left', 'outer']:
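
For readers who want to try the scenario from the new test outside the test suite, here is a standalone sketch using only the public pandas API. The frames are copied from the test; the resulting index labels are deliberately not asserted here, since they may differ between pandas versions.

```python
import pandas as pd

left = pd.DataFrame({
    "a": [1, 2, 3],
    "key": pd.Categorical(["a", "a", "b"], categories=list("abc")),
})
right = pd.DataFrame(
    {"b": [1, 2, 3]},
    index=pd.CategoricalIndex(["a", "b", "c"]),
)

# Right join on a categorical column against a CategoricalIndex: the key "c"
# exists only in `right`, so one row has no matching label in left's index.
result = left.merge(right, left_on="key", right_index=True, how="right")
print(result)
```

The row for key "c" has no counterpart in `left`, so its index label cannot come from `left`'s integer index; that is the case the fallback in `_create_join_index` handles.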