-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
[BUG]: Fix ValueError in concat() when at least one Index has duplicates #36290
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
6105102
7c9b8f5
b4c3b77
071be85
7129ced
9fcf794
9cce3ff
d0c4ea5
a96b262
2448326
e21939f
23344a2
a4e1851
354048d
ab8b03b
6fd368d
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2150,3 +2150,24 @@ def _sort_tuples(values: np.ndarray[tuple]): | |
arrays, _ = to_arrays(values, None) | ||
indexer = lexsort_indexer(arrays, orders=True) | ||
return values[indexer] | ||
|
||
|
||
def make_duplicates_of_left_unique_in_right( | ||
left: np.ndarray, right: np.ndarray | ||
) -> np.ndarray: | ||
""" | ||
If left has duplicates, which are also duplicated in right, this duplicated values | ||
are dropped from right, meaning that every duplicate value from left exists only | ||
once in right. | ||
|
||
Parameters | ||
---------- | ||
left: ndarray | ||
right: ndarray | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. dtypes unrestricted? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could not think of anything, why they should be restricted. |
||
|
||
Returns | ||
------- | ||
Duplicates of left are unique in right | ||
""" | ||
left_duplicates = unique(left[duplicated(left)]) | ||
return right[~(duplicated(right) & isin(right, left_duplicates))] |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -167,3 +167,14 @@ def test_concat_dataframe_keys_bug(self, sort): | |
# it works | ||
result = concat([t1, t2], axis=1, keys=["t1", "t2"], sort=sort) | ||
assert list(result.columns) == [("t1", "value"), ("t2", "value")] | ||
|
||
def test_concat_duplicate_indexes(self): | ||
# GH#36263 ValueError with non unique indexes | ||
df1 = DataFrame([1, 2, 3, 4], index=[0, 1, 1, 4], columns=["a"]) | ||
df2 = DataFrame([6, 7, 8, 9], index=[0, 0, 1, 3], columns=["b"]) | ||
result = concat([df1, df2], axis=1) | ||
expected = DataFrame( | ||
{"a": [1, 1, 2, 3, np.nan, 4], "b": [6, 7, 8, 8, 9, np.nan]}, | ||
index=Index([0, 0, 1, 1, 3, 4]), | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. just to make sure i understand this, for any
We expect There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes exactly. Thats perfectly on point. |
||
) | ||
tm.assert_frame_equal(result, expected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this related to or useful for the index.union-with-duplicates stuff?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you pass in the union as left and right, you would get the distinct result. Have to take a look if we can use this.