Skip to content

BUG: Raise ValueError for non numerical join columns in merge_asof #34488

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
Merged
Show file tree
Hide file tree
Changes from 13 commits
Commits
Show all changes
30 commits
Select commit Hold shift + click to select a range
660d2f5
Raise ValueError for non numerical join columns in merge_asof
phofl May 30, 2020
cafedcc
Delete unused imports
phofl May 30, 2020
fa1d909
Implement review requests
phofl May 30, 2020
34a5633
Run black pandas
phofl May 30, 2020
783bc60
Change error message
phofl May 31, 2020
e086864
Change whats new entry
phofl May 31, 2020
a4001c6
Merge branch 'master' of https://github.com/pandas-dev/pandas into 29130
phofl May 31, 2020
aaa3480
Change error type
phofl May 31, 2020
ec3147e
Move dtype check
phofl May 31, 2020
129eb77
Use raw string
phofl May 31, 2020
4680833
Move check again
phofl Jun 1, 2020
fb55c9f
Move dtype check to _get_merge_keys
phofl Jun 1, 2020
72fb8b4
Run black pandas
phofl Jun 1, 2020
0103974
Overwrite _maybe_coerce_merge_keys
phofl Jun 1, 2020
4cc8bdc
Merge branch 'master' of https://github.com/pandas-dev/pandas into 29130
phofl Jun 3, 2020
a6048a6
Merge branch 'master' of https://github.com/pandas-dev/pandas into 29130
phofl Sep 16, 2020
cdaa8ef
Move whats new entry
phofl Sep 16, 2020
47e211c
Add docs from other pr
phofl Sep 23, 2020
3129353
Merge branch 'master' of https://github.com/pandas-dev/pandas into 29130
phofl Sep 23, 2020
0242f47
Run black pandas
phofl Sep 23, 2020
ef47721
Revert "Add docs from other pr"
phofl Sep 23, 2020
109c824
Merge branch 'master' of https://github.com/pandas-dev/pandas into 29130
phofl Dec 29, 2020
3a66f05
Move whatsnew
phofl Dec 29, 2020
da377c7
Merge branch 'master' of https://github.com/pandas-dev/pandas into 29130
phofl Jan 1, 2021
c0ce857
Move merge asof validation
phofl Jan 4, 2021
2c164c5
Remove newline
phofl Jan 4, 2021
a00eb9c
Parametrize test further
phofl Jan 4, 2021
f62a873
Merge branch 'master' of https://github.com/pandas-dev/pandas into 29130
phofl Jan 4, 2021
4782233
Add comment
phofl Jan 4, 2021
6ebb7a3
Change Error
phofl Jan 5, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.1.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -957,6 +957,7 @@ Reshaping
- Bug in :func:`cut` raised an error when non-unique labels (:issue:`33141`)
- Ensure only named functions can be used in :func:`eval()` (:issue:`32460`)
- Fixed bug in :func:`melt` where melting MultiIndex columns with ``col_level`` > 0 would raise a ``KeyError`` on ``id_vars`` (:issue:`34129`)
- :meth:`merge_asof` raises ``ValueError`` instead of cryptic ``TypeError`` in case of non-numerical merge columns (:issue:`29130`)

Sparse
^^^^^^
Expand Down
9 changes: 9 additions & 0 deletions pandas/core/reshape/merge.py
Original file line number Diff line number Diff line change
Expand Up @@ -1637,9 +1637,11 @@ def _get_merge_keys(self):

# note this function has side effects
(left_join_keys, right_join_keys, join_names) = super()._get_merge_keys()
len_by_cols = 0 if self.left_by is None else len(self.left_by)

# validate index types are the same
for i, (lk, rk) in enumerate(zip(left_join_keys, right_join_keys)):

if not is_dtype_equal(lk.dtype, rk.dtype):
if is_categorical_dtype(lk.dtype) and is_categorical_dtype(rk.dtype):
# The generic error message is confusing for categoricals.
Expand All @@ -1661,6 +1663,13 @@ def _get_merge_keys(self):
)
raise MergeError(msg)

if i >= len_by_cols:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, we don't want to do this.

put your changes in the super call to _get_merge_keys. just do it upfront (that is what was not clear before).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you mean _maybe_coerce_merge_keys then? The dtype validation for the regular merge is done there.

The _get_merge_keys function I changed is already from _AsOfMerge, so I could not overwrite this again. This function overwrites the original _get_merge_keys function and calls the super() function before checking if both join cols have the same dtype

if is_object_dtype(rk.dtype) or is_object_dtype(lk.dtype):
raise ValueError(
f"Incompatible merge [{i}] dtype, {repr(lk.dtype)} and "
f"{repr(rk.dtype)}, both sides must have numeric dtype"
)

# validate tolerance; datetime.timedelta or Timedelta if we have a DTI
if self.tolerance is not None:

Expand Down
32 changes: 32 additions & 0 deletions pandas/tests/reshape/merge/test_merge_asof.py
Original file line number Diff line number Diff line change
Expand Up @@ -1339,3 +1339,35 @@ def test_merge_index_column_tz(self):
index=pd.Index([0, 1, 2, 3, 4]),
)
tm.assert_frame_equal(result, expected)


@pytest.mark.parametrize(
"data",
[["2019-06-01 00:09:12", "2019-06-01 00:10:29"], [1.0, "2019-06-01 00:10:29"]],
)
def test_non_numerical_dtype(data):
# GH 29130
left = pd.DataFrame({"x": data})
right = pd.DataFrame({"x": data})
with pytest.raises(
ValueError,
match=r"Incompatible merge \[0\] dtype, .*, both sides must have numeric dtype",
):
pd.merge_asof(left, right, on="x")


def test_some_issues():
left = pd.DataFrame({"a": ["12", "13", "15"], "left_val1": ["a", "b", "c"]})
right = pd.DataFrame({"a": ["a", "b", "c"], "left_val": ["d", "e", "f"]})
with pytest.raises(
ValueError,
match=r"Incompatible merge \[1\] dtype, .*, both sides must have numeric dtype",
):
pd.merge_asof(
left,
right,
left_on="left_val1",
right_on="a",
left_by="a",
right_by="left_val",
)