Skip to content

BUG/API: Index.append with mixed object/Categorical indices #14545

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

jorisvandenbossche
Copy link
Member

Closes #14298, related to #13660 and #13767

This closes the info() issue with CategoricalIndex, but the underlying issue is actually related to Index.append, when appending mixed dtype index objects:

In [11]: pd.Index(['a']).append(pd.CategoricalIndex(['a', 'b']))
...
AttributeError: 'Index' object has no attribute '_is_dtype_compat'

This error occurs when the calling index is not a Categorical (because of https://github.com/pandas-dev/pandas/blob/v0.19.0/pandas/indexes/base.py#L1439 the Categorical _is_dtype_compat method is used anyway).

But the question is, what should the result be of the above expression?

  • a CategoricalIndex ? -> IMO not, as the calling index is not a CategoricalIndex, the result should also not be one (regardless of what is appended) -> so the intention of the code is IMO also not correct
  • an object Index ? -> yes, this is also what 0.18.1 does

But, the more important discussion is what to do when the calling index is a CategoricalIndex ? In 0.18 (and in 0.19.0) this either returned a CategoricalIndex or raised an error:

In [13]: pd.__version__
Out[13]: u'0.18.1'

In [104]: pd.CategoricalIndex(['a', 'b']).append(pd.Index(['a']))
Out[104]: CategoricalIndex([u'a', u'b', u'a'], categories=[u'a', u'b'], ordered=False, dtype='category')

In [105]: pd.CategoricalIndex(['a', 'b']).append(pd.Index(['c']))
...
TypeError: cannot append a non-category item to a CategoricalIndex

But, for the concat case we decided to return an object dtyped Index in those cases instead of coercing or raising (see discussion in #13767 and the summary table with rules in #13767 (comment)). So the question is, do think that append should follow the same rules as concat. Or, do we want to be more flexible for append, and let it depend on the type of the calling Index?

cc @sinhrks

@jorisvandenbossche jorisvandenbossche added Reshaping Concat, Merge/Join, Stack/Unstack, Explode Categorical Categorical Data Type labels Oct 31, 2016
@jorisvandenbossche jorisvandenbossche added this to the 0.19.1 milestone Oct 31, 2016
@jorisvandenbossche
Copy link
Member Author

In the current PR, I just removed the check for category, which is the more intrusive change, as now the return result will always be object dtype if not self and all objects to append are CategoricalIndex.
I could also change the if 'category' in typs: check to if self.is_categorical():, which then keeps the current behaviour if the calling object is a CategoricalIndex.

typs = _concat.get_dtype_kinds(to_concat)

if 'category' in typs:
if self.is_categorical():
# if any of the to_concat is category
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then you should change this comment as this is no longer true (the Index must be a CI to return a CI) only.

@codecov-io
Copy link

codecov-io commented Nov 1, 2016

Current coverage is 85.27% (diff: 100%)

Merging #14545 into master will not change coverage

@@             master     #14545   diff @@
==========================================
  Files           140        140          
  Lines         50693      50693          
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
  Hits          43229      43229          
  Misses         7464       7464          
  Partials          0          0          

Powered by Codecov. Last update 093aa82...0d47495

@jorisvandenbossche
Copy link
Member Author

OK, so I changed the PR to only fix the issue when the calling Index is not of Categorical dtype (which caused the bug), and left the behaviour of CategoricalIndex.append(..) alone.
As a result pd.Index(['a']).append(pd.CategoricalIndex(['a', 'b'])) works again (returning object Index as in 0.18.1).

That is probably the easiest for now, to have this fix in 0.19.1, but we should then discuss the behaviour of Index.append(..) a bit more in detail for 0.20 (whether to follow the same rules as concat or not).

@jorisvandenbossche jorisvandenbossche mentioned this pull request Nov 2, 2016
@TomAugspurger
Copy link
Contributor

+1 on the behavior here (coercing Categorical to Index).

I'd feel slightly better if the monkey patching of sys.stdout was in a context manager

from contextlib import contextmanager

@contextmanager
def stdout():
    sys.stdout = StringIO()
    yield
    sys.stdout = sys.__stdout__

but not a huge deal.

@jorisvandenbossche
Copy link
Member Author

I'd feel slightly better if the monkey patching of sys.stdout was in a context manager

I just copied the approach from other tests :-), so would be a nice clean-up in general!

@jreback
Copy link
Contributor

jreback commented Nov 2, 2016

don't re-invent the wheel, the pattern is:

buf = StringIO()
df.info(buf=buf, null_counts=null_counts)
self.assertTrue(('non-null' in buf.getvalue()) is result)

df = pd.DataFrame(np.zeros((2, 2)), index=idx, columns=idx)

import sys
sys.stdout = StringIO()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the test

@jorisvandenbossche jorisvandenbossche merged commit 252526c into pandas-dev:master Nov 3, 2016
jorisvandenbossche added a commit that referenced this pull request Nov 3, 2016
… indices (#14545)

* BUG/API: Index.append with mixed object/Categorical indices

* Only coerce to object if the calling index is not categorical

* Add test for the df.info() case (GH14298)

(cherry picked from commit 252526c)
yarikoptic added a commit to neurodebian/pandas that referenced this pull request Nov 18, 2016
Version 0.19.1

* tag 'v0.19.1': (43 commits)
  RLS: v0.19.1
  DOC: update whatsnew/release notes for 0.19.1 (pandas-dev#14573)
  [Backport pandas-dev#14545] BUG/API: Index.append with mixed object/Categorical indices (pandas-dev#14545)
  DOC: rst fixes
  [Backport pandas-dev#14567] DEPR: add deprecation warning for com.array_equivalent (pandas-dev#14567)
  [Backport pandas-dev#14551] PERF: casting loc to labels dtype before searchsorted (pandas-dev#14551)
  [Backport pandas-dev#14536] BUG: DataFrame.quantile with NaNs (GH14357) (pandas-dev#14536)
  [Backport pandas-dev#14520] BUG: don't close user-provided file handles in C parser (GH14418) (pandas-dev#14520)
  [Backport pandas-dev#14392] BUG: Dataframe constructor when given dict with None value (pandas-dev#14392)
  [Backport pandas-dev#14514] BUG: Don't parse inline quotes in skipped lines (pandas-dev#14514)
  [Bacport pandas-dev#14543] BUG: tseries ceil doc fix (pandas-dev#14543)
  [Backport pandas-dev#14541] DOC: Simplify the gbq integration testing procedure for contributors (pandas-dev#14541)
  [Backport pandas-dev#14527] BUG/ERR: raise correct error when sql driver is not installed (pandas-dev#14527)
  [Backport pandas-dev#14501] BUG: fix DatetimeIndex._maybe_cast_slice_bound for empty index (GH14354) (pandas-dev#14501)
  [Backport pandas-dev#14442] DOC: Expand on reference docs for read_json() (pandas-dev#14442)
  BLD: fix 3.4 build for cython to 0.24.1
  [Backport pandas-dev#14492] BUG: Accept unicode quotechars again in pd.read_csv
  [Backport pandas-dev#14496] BLD: Support Cython 0.25
  [Backport pandas-dev#14498] COMPAT/TST: fix test for range testing of negative integers to neg powers
  [Backport pandas-dev#14476] PERF: performance regression in Series.asof (pandas-dev#14476)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Categorical Categorical Data Type Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

.info() fails with CategoricalIndex in the columns
4 participants