-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG/API: Index.append with mixed object/Categorical indices #14545
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG/API: Index.append with mixed object/Categorical indices #14545
Conversation
In the current PR, I just removed the check for category, which is the more intrusive change, as now the return result will always be object dtype if not self and all objects to append are CategoricalIndex. |
6cff6ed
to
1d96bf9
Compare
typs = _concat.get_dtype_kinds(to_concat) | ||
|
||
if 'category' in typs: | ||
if self.is_categorical(): | ||
# if any of the to_concat is category |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then you should change this comment as this is no longer true (the Index must be a CI to return a CI) only.
Current coverage is 85.27% (diff: 100%)@@ master #14545 diff @@
==========================================
Files 140 140
Lines 50693 50693
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
Hits 43229 43229
Misses 7464 7464
Partials 0 0
|
1d96bf9
to
dd461bc
Compare
OK, so I changed the PR to only fix the issue when the calling Index is not of Categorical dtype (which caused the bug), and left the behaviour of That is probably the easiest for now, to have this fix in 0.19.1, but we should then discuss the behaviour of |
+1 on the behavior here (coercing Categorical to Index). I'd feel slightly better if the monkey patching of from contextlib import contextmanager
@contextmanager
def stdout():
sys.stdout = StringIO()
yield
sys.stdout = sys.__stdout__ but not a huge deal. |
I just copied the approach from other tests :-), so would be a nice clean-up in general! |
don't re-invent the wheel, the pattern is:
|
df = pd.DataFrame(np.zeros((2, 2)), index=idx, columns=idx) | ||
|
||
import sys | ||
sys.stdout = StringIO() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the test
dd461bc
to
0d47495
Compare
Version 0.19.1 * tag 'v0.19.1': (43 commits) RLS: v0.19.1 DOC: update whatsnew/release notes for 0.19.1 (pandas-dev#14573) [Backport pandas-dev#14545] BUG/API: Index.append with mixed object/Categorical indices (pandas-dev#14545) DOC: rst fixes [Backport pandas-dev#14567] DEPR: add deprecation warning for com.array_equivalent (pandas-dev#14567) [Backport pandas-dev#14551] PERF: casting loc to labels dtype before searchsorted (pandas-dev#14551) [Backport pandas-dev#14536] BUG: DataFrame.quantile with NaNs (GH14357) (pandas-dev#14536) [Backport pandas-dev#14520] BUG: don't close user-provided file handles in C parser (GH14418) (pandas-dev#14520) [Backport pandas-dev#14392] BUG: Dataframe constructor when given dict with None value (pandas-dev#14392) [Backport pandas-dev#14514] BUG: Don't parse inline quotes in skipped lines (pandas-dev#14514) [Bacport pandas-dev#14543] BUG: tseries ceil doc fix (pandas-dev#14543) [Backport pandas-dev#14541] DOC: Simplify the gbq integration testing procedure for contributors (pandas-dev#14541) [Backport pandas-dev#14527] BUG/ERR: raise correct error when sql driver is not installed (pandas-dev#14527) [Backport pandas-dev#14501] BUG: fix DatetimeIndex._maybe_cast_slice_bound for empty index (GH14354) (pandas-dev#14501) [Backport pandas-dev#14442] DOC: Expand on reference docs for read_json() (pandas-dev#14442) BLD: fix 3.4 build for cython to 0.24.1 [Backport pandas-dev#14492] BUG: Accept unicode quotechars again in pd.read_csv [Backport pandas-dev#14496] BLD: Support Cython 0.25 [Backport pandas-dev#14498] COMPAT/TST: fix test for range testing of negative integers to neg powers [Backport pandas-dev#14476] PERF: performance regression in Series.asof (pandas-dev#14476) ...
Closes #14298, related to #13660 and #13767
This closes the
info()
issue with CategoricalIndex, but the underlying issue is actually related to Index.append, when appending mixed dtype index objects:This error occurs when the calling index is not a Categorical (because of https://github.com/pandas-dev/pandas/blob/v0.19.0/pandas/indexes/base.py#L1439 the Categorical
_is_dtype_compat
method is used anyway).But the question is, what should the result be of the above expression?
CategoricalIndex
? -> IMO not, as the calling index is not a CategoricalIndex, the result should also not be one (regardless of what is appended) -> so the intention of the code is IMO also not correctBut, the more important discussion is what to do when the calling index is a CategoricalIndex ? In 0.18 (and in 0.19.0) this either returned a CategoricalIndex or raised an error:
But, for the
concat
case we decided to return an object dtyped Index in those cases instead of coercing or raising (see discussion in #13767 and the summary table with rules in #13767 (comment)). So the question is, do think thatappend
should follow the same rules asconcat
. Or, do we want to be more flexible forappend
, and let it depend on the type of the calling Index?cc @sinhrks