ENH: clearer error msg for unequal categoricals in merge_asof (GH#26136) #26242

Merged
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.25.0.rst
@@ -37,6 +37,7 @@ Other Enhancements
 - :class:`RangeIndex` has gained :attr:`~RangeIndex.start`, :attr:`~RangeIndex.stop`, and :attr:`~RangeIndex.step` attributes (:issue:`25710`)
 - :class:`datetime.timezone` objects are now supported as arguments to timezone methods and constructors (:issue:`25065`)
 - :meth:`DataFrame.query` and :meth:`DataFrame.eval` now supports quoting column names with backticks to refer to names with spaces (:issue:`6508`)
+- :func:`merge_asof` now gives a clearer error message when merge keys are categoricals that are not equal (:issue:`26136`)

 .. _whatsnew_0250.api_breaking:

16 changes: 12 additions & 4 deletions pandas/core/reshape/merge.py
@@ -1446,10 +1446,18 @@ def _get_merge_keys(self):
         # validate index types are the same
         for i, (lk, rk) in enumerate(zip(left_join_keys, right_join_keys)):
             if not is_dtype_equal(lk.dtype, rk.dtype):
-                raise MergeError("incompatible merge keys [{i}] {lkdtype} and "
-                                 "{rkdtype}, must be the same type"
-                                 .format(i=i, lkdtype=lk.dtype,
-                                         rkdtype=rk.dtype))
+                if (is_categorical_dtype(lk.dtype) and
+                        is_categorical_dtype(rk.dtype)):
+                    # The generic error message is confusing for categoricals.
+                    msg = ("incompatible merge keys [{i}] both sides "
+                           "category, but not equal ones"
+                           .format(i=i))
+                else:
+                    msg = ("incompatible merge keys [{i}] {lkdtype} and "
+                           "{rkdtype}, must be the same type"
+                           .format(i=i, lkdtype=lk.dtype,
+                                   rkdtype=rk.dtype))
+                raise MergeError(msg)

         # validate tolerance; must be a Timedelta if we have a DTI
         if self.tolerance is not None:

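The branching in this hunk can be sketched outside pandas. The snippet below is only an illustration under stated assumptions: `SketchDtype` and `merge_key_error_message` are hypothetical names that do not exist in pandas, and the stand-in dtype simply prints as its name, which is how a categorical dtype would produce the confusing "category and category, must be the same type" text under the generic message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchDtype:
    """Hypothetical stand-in for a pandas dtype (not a pandas class)."""
    name: str
    categories: tuple = ()

    def __str__(self):
        # Like a categorical dtype, print only the name, not the categories.
        return self.name


def merge_key_error_message(i, lk_dtype, rk_dtype):
    """Return the error text for mismatched key dtypes, or None if they match."""
    if lk_dtype == rk_dtype:
        return None
    if lk_dtype.name == "category" and rk_dtype.name == "category":
        # Both sides print as "category", so the generic message would not
        # tell the user what actually differs.
        return ("incompatible merge keys [{i}] both sides "
                "category, but not equal ones".format(i=i))
    return ("incompatible merge keys [{i}] {lkdtype} and "
            "{rkdtype}, must be the same type"
            .format(i=i, lkdtype=lk_dtype, rkdtype=rk_dtype))


left = SketchDtype("category", ("a", "b", "c"))
right = SketchDtype("category", ("X", "a", "b", "c"))
categorical_msg = merge_key_error_message(0, left, right)
generic_msg = merge_key_error_message(
    0, SketchDtype("int64"), SketchDtype("float64"))
print(categorical_msg)
print(generic_msg)
```

The sketch returns the message instead of raising, which keeps it easy to inspect; the patch itself builds `msg` the same way and then raises `MergeError(msg)` once for both branches.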
12 changes: 12 additions & 0 deletions pandas/tests/reshape/merge/test_merge_asof.py
@@ -1006,6 +1006,18 @@ def test_merge_datatype_error(self):
         with pytest.raises(MergeError, match=msg):
             merge_asof(left, right, on='a')

+    def test_merge_datatype_categorical_error(self):
+        """ Tests merge datatype mismatch error """
+        msg = r'merge keys \[0\] both sides category, but not equal ones'
+
+        left = pd.DataFrame({'left_val': [1, 5, 10],
+                             'a': pd.Categorical(['a', 'b', 'c'])})
+        right = pd.DataFrame({'right_val': [1, 2, 3, 6, 7],
+                              'a': pd.Categorical(['a', 'X', 'c', 'X', 'b'])})
+
+        with pytest.raises(MergeError, match=msg):
+            merge_asof(left, right, on='a')
+
     @pytest.mark.parametrize('func', [lambda x: x, lambda x: to_datetime(x)],
                              ids=['numeric', 'datetime'])
     @pytest.mark.parametrize('side', ['left', 'right'])
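A note on the `msg` pattern in the new test: `pytest.raises(..., match=...)` checks the pattern with `re.search`, so the brackets around the key index have to be escaped and the pattern only needs to match part of the message. A minimal sketch using only the standard library (the variable names here are illustrative, not part of the patch):

```python
import re

# Full text of the new error message for key index 0.
full_message = ("incompatible merge keys [0] both sides "
                "category, but not equal ones")

# Pattern as written in the test: brackets escaped, leading word omitted.
escaped = r'merge keys \[0\] both sides category, but not equal ones'
# Same text without escaping: "[0]" becomes a character class matching "0".
unescaped = 'merge keys [0] both sides category, but not equal ones'

escaped_match = re.search(escaped, full_message)
unescaped_match = re.search(unescaped, full_message)

print(escaped_match is not None)   # the escaped pattern finds the message
print(unescaped_match is None)     # the unescaped pattern does not
```

Because the search is unanchored, the test still passes even though the pattern drops the leading "incompatible".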