-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
[BUG]: Fix ValueError in concat() when at least one Index has duplicates #36290
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 12 commits
6105102
7c9b8f5
b4c3b77
071be85
7129ced
9fcf794
9cce3ff
d0c4ea5
a96b262
2448326
e21939f
23344a2
a4e1851
354048d
ab8b03b
6fd368d
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2149,3 +2149,23 @@ def _sort_tuples(values: np.ndarray[tuple]): | |
arrays, _ = to_arrays(values, None) | ||
indexer = lexsort_indexer(arrays, orders=True) | ||
return values[indexer] | ||
|
||
|
||
def make_duplicates_of_left_unique_in_right( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. is this related to or useful for the index.union-with-duplicates stuff? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If you pass in the union as left and right, you would get the distinct result. Have to take a look if we can use this. |
||
left: np.ndarray, right: np.ndarray | ||
) -> np.ndarray: | ||
""" | ||
Drops all duplicates values from left in right, so that they are | ||
unique in right. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The code itself looks good, but this sentence isn't clear to a reader without context There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Improved it? |
||
|
||
Parameters | ||
---------- | ||
left: ndarray | ||
right: ndarray | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. dtypes unrestricted? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could not think of anything, why they should be restricted. |
||
|
||
Returns | ||
------- | ||
Duplicates of left are unique in right | ||
""" | ||
left_duplicates = unique(left[duplicated(left)]) | ||
return right[~(duplicated(right) & np.isin(right, left_duplicates))] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. any reason to prefer np.isin vs the algos.isin? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not that i remember, changed it. |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -167,3 +167,14 @@ def test_concat_dataframe_keys_bug(self, sort): | |
# it works | ||
result = concat([t1, t2], axis=1, keys=["t1", "t2"], sort=sort) | ||
assert list(result.columns) == [("t1", "value"), ("t2", "value")] | ||
|
||
def test_concat_duplicate_indexes(self): | ||
# GH#36263 ValueError with non unique indexes | ||
df1 = DataFrame([1, 2, 3, 4], index=[0, 1, 1, 4], columns=["a"]) | ||
df2 = DataFrame([6, 7, 8, 9], index=[0, 0, 1, 3], columns=["b"]) | ||
result = concat([df1, df2], axis=1) | ||
expected = DataFrame( | ||
{"a": [1, 1, 2, 3, np.nan, 4], "b": [6, 7, 8, 8, 9, np.nan]}, | ||
index=Index([0, 0, 1, 1, 3, 4]), | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. just to make sure i understand this, for any
We expect There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes exactly. Thats perfectly on point. |
||
) | ||
tm.assert_frame_equal(result, expected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
non-unique
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done