Skip to content

BUG: Raise ValueError for non numerical join columns in merge_asof #34488

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
Merged
Show file tree
Hide file tree
Changes from 17 commits
Commits
Show all changes
30 commits
Select commit Hold shift + click to select a range
660d2f5
Raise ValueError for non numerical join columns in merge_asof
phofl May 30, 2020
cafedcc
Delete unused imports
phofl May 30, 2020
fa1d909
Implement review requests
phofl May 30, 2020
34a5633
Run black pandas
phofl May 30, 2020
783bc60
Change error message
phofl May 31, 2020
e086864
Change whats new entry
phofl May 31, 2020
a4001c6
Merge branch 'master' of https://github.com/pandas-dev/pandas into 29130
phofl May 31, 2020
aaa3480
Change error type
phofl May 31, 2020
ec3147e
Move dtype check
phofl May 31, 2020
129eb77
Use raw string
phofl May 31, 2020
4680833
Move check again
phofl Jun 1, 2020
fb55c9f
Move dtype check to _get_merge_keys
phofl Jun 1, 2020
72fb8b4
Run black pandas
phofl Jun 1, 2020
0103974
Overwrite _maybe_coerce_merge_keys
phofl Jun 1, 2020
4cc8bdc
Merge branch 'master' of https://github.com/pandas-dev/pandas into 29130
phofl Jun 3, 2020
a6048a6
Merge branch 'master' of https://github.com/pandas-dev/pandas into 29130
phofl Sep 16, 2020
cdaa8ef
Move whats new entry
phofl Sep 16, 2020
47e211c
Add docs from other pr
phofl Sep 23, 2020
3129353
Merge branch 'master' of https://github.com/pandas-dev/pandas into 29130
phofl Sep 23, 2020
0242f47
Run black pandas
phofl Sep 23, 2020
ef47721
Revert "Add docs from other pr"
phofl Sep 23, 2020
109c824
Merge branch 'master' of https://github.com/pandas-dev/pandas into 29130
phofl Dec 29, 2020
3a66f05
Move whatsnew
phofl Dec 29, 2020
da377c7
Merge branch 'master' of https://github.com/pandas-dev/pandas into 29130
phofl Jan 1, 2021
c0ce857
Move merge asof validation
phofl Jan 4, 2021
2c164c5
Remove newline
phofl Jan 4, 2021
a00eb9c
Parametrize test further
phofl Jan 4, 2021
f62a873
Merge branch 'master' of https://github.com/pandas-dev/pandas into 29130
phofl Jan 4, 2021
4782233
Add comment
phofl Jan 4, 2021
6ebb7a3
Change Error
phofl Jan 5, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.2.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -345,6 +345,7 @@ Reshaping
- Bug in :func:`union_indexes` where input index names are not preserved in some cases. Affects :func:`concat` and :class:`DataFrame` constructor (:issue:`13475`)
- Bug in func :meth:`crosstab` when using multiple columns with ``margins=True`` and ``normalize=True`` (:issue:`35144`)
- Bug in :meth:`DataFrame.agg` with ``func={'name':<FUNC>}`` incorrectly raising ``TypeError`` when ``DataFrame.columns==['Name']`` (:issue:`36212`)
- :meth:`merge_asof` raises ``ValueError`` instead of cryptic ``TypeError`` in case of non-numerical merge columns (:issue:`29130`)
-

Sparse
Expand Down
14 changes: 14 additions & 0 deletions pandas/core/reshape/merge.py
Original file line number Diff line number Diff line change
Expand Up @@ -1635,6 +1635,7 @@ def _get_merge_keys(self):

# validate index types are the same
for i, (lk, rk) in enumerate(zip(left_join_keys, right_join_keys)):

if not is_dtype_equal(lk.dtype, rk.dtype):
if is_categorical_dtype(lk.dtype) and is_categorical_dtype(rk.dtype):
# The generic error message is confusing for categoricals.
Expand Down Expand Up @@ -1700,6 +1701,19 @@ def _get_merge_keys(self):

return left_join_keys, right_join_keys, join_names

def _maybe_coerce_merge_keys(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this not just a part of _validate_specification? seems like re-inventing the wheel here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this is way easier after knowing the join keys than before. But I think we could probably move this into validate_specification

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah would really like it to be part of the main check

"""Check if the merge keys are not object and call super method."""
len_by_cols = 0 if self.left_by is None else len(self.left_by)
for i, (lk, rk) in enumerate(zip(self.left_join_keys, self.right_join_keys,)):
if i >= len_by_cols:
if is_object_dtype(rk.dtype) or is_object_dtype(lk.dtype):
raise ValueError(
f"Incompatible merge [{i}] dtype, {repr(lk.dtype)} and "
f"{repr(rk.dtype)}, both sides must have numeric dtype"
)

super()._maybe_coerce_merge_keys()

def _get_join_indexers(self):
""" return the join indexers """

Expand Down
32 changes: 32 additions & 0 deletions pandas/tests/reshape/merge/test_merge_asof.py
Original file line number Diff line number Diff line change
Expand Up @@ -1361,3 +1361,35 @@ def test_left_index_right_index_tolerance(self):
tolerance=pd.Timedelta(seconds=0.5),
)
tm.assert_frame_equal(result, expected)


@pytest.mark.parametrize(
"data",
[["2019-06-01 00:09:12", "2019-06-01 00:10:29"], [1.0, "2019-06-01 00:10:29"]],
)
def test_non_numerical_dtype(data):
# GH 29130
left = pd.DataFrame({"x": data})
right = pd.DataFrame({"x": data})
with pytest.raises(
ValueError,
match=r"Incompatible merge \[0\] dtype, .*, both sides must have numeric dtype",
):
pd.merge_asof(left, right, on="x")


def test_some_issues():
left = pd.DataFrame({"a": ["12", "13", "15"], "left_val1": ["a", "b", "c"]})
right = pd.DataFrame({"a": ["a", "b", "c"], "left_val": ["d", "e", "f"]})
with pytest.raises(
ValueError,
match=r"Incompatible merge \[1\] dtype, .*, both sides must have numeric dtype",
):
pd.merge_asof(
left,
right,
left_on="left_val1",
right_on="a",
left_by="a",
right_by="left_val",
)