BUG: Add np.uintc to _factorizers in merge.py to fix KeyError when merging DataFrames with uintc columns #58727

Closed
wants to merge 28 commits into from
Changes from 19 commits
Commits
1774489
Add np.uintc to _factorizers in merge.py to fix KeyError when merging…
Tirthchoksi22 May 15, 2024
5372107
add np.uintc to _factorizers in merge.py
Tirthchoksi22 May 15, 2024
3d75d94
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 15, 2024
0f79322
changes according to review
Tirthchoksi22 May 15, 2024
2c18b6b
Merge branch 'main' of https://github.com/Tirthchoksi22/pandas
Tirthchoksi22 May 15, 2024
1373e05
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 15, 2024
16adf4b
final commit
Tirthchoksi22 May 15, 2024
523efa4
final commit
Tirthchoksi22 May 16, 2024
f3de1f9
Merge branch 'main' into main
Tirthchoksi22 May 16, 2024
67297bd
doc commit
Tirthchoksi22 May 16, 2024
61ff0e2
Merge branch 'main' of https://github.com/Tirthchoksi22/pandas
Tirthchoksi22 May 16, 2024
c05255a
final commit
Tirthchoksi22 May 16, 2024
87cb28d
Merge branch 'main' into main
Tirthchoksi22 May 16, 2024
2adb5fc
final
Tirthchoksi22 May 16, 2024
88a1618
final
Tirthchoksi22 May 16, 2024
90e0b93
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 16, 2024
1b9e3d0
indentation change
Tirthchoksi22 May 16, 2024
4da0b86
indentation error solved
Tirthchoksi22 May 16, 2024
95bca2c
error solved
Tirthchoksi22 May 16, 2024
a37151e
Update pandas/core/reshape/merge.py
Tirthchoksi22 May 16, 2024
c5a3ccc
update
Tirthchoksi22 May 16, 2024
402c33c
Merge branch 'main' of https://github.com/Tirthchoksi22/pandas
Tirthchoksi22 May 16, 2024
7438297
update as said
Tirthchoksi22 May 16, 2024
9105c97
upadte
Tirthchoksi22 May 16, 2024
e3b76a1
Merge branch 'main' into main
Tirthchoksi22 May 16, 2024
8506f78
update
Tirthchoksi22 May 16, 2024
621c8d1
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 16, 2024
50fe143
Merge branch 'main' of https://github.com/Tirthchoksi22/pandas
Tirthchoksi22 May 19, 2024
1 change: 1 addition & 0 deletions doc/source/whatsnew/v3.0.0.rst
@@ -474,6 +474,7 @@ Groupby/resample/rolling
Reshaping
^^^^^^^^^
- Bug in :meth:`DataFrame.join` inconsistently setting result index name (:issue:`55815`)
- Fixed issue in `pd.merge` (`#58713`) where merging DataFrames with `np.intc` or `np.uintc` data types caused unexpected behavior or errors. Comprehensive testing now ensures consistent behavior across diverse data type combinations, enhancing stability and robustness of data merging operations.
Member

Can you format this like the notes around it?

Start with "Bug in ...", keep the description to one short line but be specific (mention "Windows"), and end with (:issue:`58713`).

Author

ok done

-

Sparse
3 changes: 3 additions & 0 deletions pandas/core/reshape/merge.py
@@ -130,6 +130,9 @@
if np.intc is not np.int32:
    _factorizers[np.intc] = libhashtable.Int64Factorizer

if np.uintc is not np.uint32:
    _factorizers[np.uintc] = libhashtable.UInt64Factorizer

_known = (np.ndarray, ExtensionArray, Index, ABCSeries)
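
The hunk above registers `np.uintc` only when it is a distinct type object from `np.uint32`, which depends on the platform. A minimal sketch of that pattern, assuming a simplified table: the `factorizers` dict, the string values, and the `register_c_aliases` and `lookup` helpers are hypothetical stand-ins for illustration, not pandas internals.

```python
import numpy as np

# Hypothetical stand-in for a dtype-keyed dispatch table; the string values
# are placeholders for the real factorizer classes.
factorizers = {
    np.int32: "Int32Factorizer",
    np.int64: "Int64Factorizer",
    np.uint32: "UInt32Factorizer",
    np.uint64: "UInt64Factorizer",
}


def register_c_aliases(table):
    """Add entries for C-named types that are not aliases of sized types."""
    # On platforms where np.uintc *is* np.uint32 this is a no-op, because
    # the existing np.uint32 entry already covers it.
    if np.intc is not np.int32:
        table[np.intc] = "Int64Factorizer"
    if np.uintc is not np.uint32:
        table[np.uintc] = "UInt64Factorizer"
    return table


def lookup(table, values):
    """Look up the entry for an array by its scalar type, as a dict key."""
    return table[values.dtype.type]


table = register_c_aliases(dict(factorizers))
keys = np.array([0, 1, 2], dtype=np.uintc)
print(np.uintc is np.uint32, lookup(table, keys))
```

Without the registration step, the lookup would raise `KeyError` on any platform where `np.uintc` is its own type, which is the failure mode named in the PR title.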


32 changes: 32 additions & 0 deletions pandas/tests/reshape/merge/test_merge.py
@@ -1523,6 +1523,38 @@ def test_join_multi_dtypes(self, any_int_numpy_dtype, d2):
        expected.sort_values(["k1", "k2"], kind="mergesort", inplace=True)
        tm.assert_frame_equal(result, expected)

    @pytest.mark.parametrize("d2", [np.int64, np.float64, np.float32, np.float16])
    def test_join_multi_dtypes_with_uintc(self, d2):
        # np.uintc is not a valid parameter name, so it is used directly here
        dtype1 = np.dtype(np.uintc)
        dtype2 = np.dtype(d2)

        left = DataFrame(
            {
                "k1": np.array([0, 1, 2] * 8, dtype=dtype1),
                "k2": ["foo", "bar"] * 12,
                "v": np.array(np.arange(24), dtype=np.int64),
            }
        )

        index = MultiIndex.from_tuples([(2, "bar"), (1, "foo")])
        right = DataFrame({"v2": np.array([5, 7], dtype=dtype2)}, index=index)

        result = left.join(right, on=["k1", "k2"])

        expected = left.copy()

        # Unmatched rows become NaN, so an integer "v2" is upcast to float64
        if dtype2.kind == "i":
            dtype2 = np.dtype("float64")
        expected["v2"] = np.array(np.nan, dtype=dtype2)
        expected.loc[(expected.k1 == 2) & (expected.k2 == "bar"), "v2"] = 5
        expected.loc[(expected.k1 == 1) & (expected.k2 == "foo"), "v2"] = 7

        tm.assert_frame_equal(result, expected)

        result = left.join(right, on=["k1", "k2"], sort=True)
        expected.sort_values(["k1", "k2"], kind="mergesort", inplace=True)
        tm.assert_frame_equal(result, expected)

    @pytest.mark.parametrize(
        "int_vals, float_vals, exp_vals",
        [
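
For context, the user-facing scenario this PR targets can be sketched as a plain merge on a `np.uintc` key. This is an illustrative example, not the reproducer from the linked issue: the frame contents and the `merge_on_uintc` helper are made up here. On platforms where `np.uintc` is an alias of `np.uint32` it should run on any recent pandas. Going by the PR title, an affected build would instead raise `KeyError` at the `merge` call.

```python
import numpy as np
import pandas as pd


def merge_on_uintc():
    """Inner-merge two small frames whose key column has dtype np.uintc."""
    left = pd.DataFrame(
        {"key": np.array([0, 1, 2], dtype=np.uintc), "v": [10, 20, 30]}
    )
    right = pd.DataFrame(
        {"key": np.array([1, 2, 3], dtype=np.uintc), "v2": [5.0, 7.0, 9.0]}
    )
    # The key dtype selects the hashing helper internally; a dtype missing
    # from that table is what the PR title describes as a KeyError.
    return left.merge(right, on="key", how="inner")


result = merge_on_uintc()
print(result)
```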