-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
[WIP] Test (and more fixes) for duplicate indices with concat #38745
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 1 commit
c6f1677
19c95f0
bb33098
11378a8
ca316ce
e59bef2
f176ad3
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -1,5 +1,6 @@ | ||||||
from collections import abc, deque | ||||||
from decimal import Decimal | ||||||
from pandas.errors import InvalidIndexError | ||||||
from warnings import catch_warnings | ||||||
|
||||||
import numpy as np | ||||||
|
@@ -445,6 +446,26 @@ def test_concat_ordered_dict(self): | |||||
tm.assert_series_equal(result, expected) | ||||||
|
||||||
|
||||||
@pytest.mark.parametrize("index_maker", tm.index_subclass_makers_generator()) | ||||||
ivirshup marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
@pytest.mark.parametrize("join", ["inner", "outer"]) | ||||||
def test_concat_duplicates_error(index_maker, join): | ||||||
# if index_maker is tm.makeMultiIndex: | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. add the issue number as a comment in all of the new tests |
||||||
# TODO: This generator only makes Indexs of size 4 | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. what is this about? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. pandas/pandas/_testing/__init__.py Lines 491 to 492 in 0976c4c
This function only makes a MultiIndex with length 4, ignoring the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||||||
# pytest.skip() | ||||||
index_unique = index_maker(k=4) | ||||||
index_non_unique = index_unique[[0, 0, 1, 2, 3]] | ||||||
with pytest.raises(InvalidIndexError): | ||||||
_ = pd.concat( | ||||||
ivirshup marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
[ | ||||||
pd.DataFrame( | ||||||
np.ones((1, len(index_non_unique))), columns=index_non_unique | ||||||
), | ||||||
pd.DataFrame(np.ones((1, len(index_unique))), columns=index_unique), | ||||||
], | ||||||
join=join, | ||||||
) | ||||||
|
||||||
|
||||||
@pytest.mark.parametrize("pdt", [Series, pd.DataFrame]) | ||||||
@pytest.mark.parametrize("dt", np.sctypes["float"]) | ||||||
def test_concat_no_unnecessary_upcast(dt, pdt): | ||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not the place to do at all. should be in concat itself.
ideally we want a nice error message if this happens (and the indices are not equal).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree I need to add the error message, but I do think this function is where the error should be thrown.
Every function which uses
get_objs_combined_axis
throws an error if there are duplicate indices. Here is a demonstration of this from pandas 1.1.5:Instead of just giving concat a better error message, all these methods could get better (and more consistent) error messages. I think it's important for the rules about unions and intersections of indices to be defined in these more internal methods so behaviour isn't defined (and reimplemented) per top level function.
Additionally, this function essentially returns nonsense for unions when there are duplicated indices since different
Index
types have different definitions for unions with duplicates, so I think it's appropriate to throw errors here instead of passing on those nonsense results.This has also made me realize there is one case where it's okay for there to be duplicates, which is when indices contain duplicates, but the intersection would not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This got a bit more complicated.
idx.get_indexer(keys)
throws an error ifidx
is non-unique, regardless of whetherkeys
have unique indices inidx
.Interestingly enough, it does not throw an error if the
idx
is non-unique, butkeys
is empty (which is the case I had initially seen).Examples using pandas 1.1.5:
Examples with pd.concat
This had the same behaviour despite not using
.get_indexer
directly in 1.1.5.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened a more general issue for this: #38797