-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG-19214 int categoricals are formatted as ints #24494
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #24494 +/- ##
==========================================
+ Coverage 92.31% 92.31% +<.01%
==========================================
Files 166 166
Lines 52412 52414 +2
==========================================
+ Hits 48382 48384 +2
Misses 4030 4030
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #24494 +/- ##
==========================================
+ Coverage 92.32% 92.38% +0.05%
==========================================
Files 166 166
Lines 52440 52379 -61
==========================================
- Hits 48417 48388 -29
+ Misses 4023 3991 -32
Continue to review full report at Codecov.
|
@jreback I looked into it and the
instead of like a series. The PR now modifies the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the fix is in .get_values()
do this
n [28]: c.categories.astype(object).take(c.codes, fill_value=np.nan)
Out[28]: Index([1, 2, nan], dtype='object')
if -1 in c.codes and is_integer_dtype(c.categories)
@TomAugspurger any objections here? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the warnings are not needed
doc/source/whatsnew/v0.24.0.rst
Outdated
@@ -1323,6 +1323,7 @@ Categorical | |||
- Bug in many methods of the ``.str``-accessor, which always failed on calling the ``CategoricalIndex.str`` constructor (:issue:`23555`, :issue:`23556`) | |||
- Bug in :meth:`Series.where` losing the categorical dtype for categorical data (:issue:`24077`) | |||
- Bug in :meth:`Categorical.apply` where ``NaN`` values could be handled unpredictably. They now remain unchanged (:issue:`24241`) | |||
- Bug in :meth:`Categorical.get_values` where integers would be formatted as floats if ``NaN`` values were present (:issue:`19214`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not the user facing note though, it is formtting of a Series/Categorical where this happens. .get_values()
its just an implementation detail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback I'm a bit confused as to what a user-facing note is. Is it a specific section somewhere? Or does it mean to phrase the note differently, like "bug in the Categorical repr"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry. what I mean is a user would want to know that the string repr of Series (or categorical dtype) and Categoricals for integer categories will now not be coerced to float. Its maybe worth making this a sub-section (e.g. show previous / new behavior). you can just in-line it here.
pandas/core/arrays/categorical.py
Outdated
@@ -1520,6 +1520,11 @@ def get_values(self): | |||
# if we are a datetime and period index, return Index to keep metadata | |||
if is_datetimelike(self.categories): | |||
return self.categories.take(self._codes, fill_value=np.nan) | |||
elif is_integer_dtype(self.categories) and -1 in self._codes: | |||
warn("Integer values represented as objects to accomodate NaNs", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need for a warning
@@ -240,6 +241,16 @@ def test_categorical_repr_datetime_ordered(self): | |||
|
|||
assert repr(c) == exp | |||
|
|||
@pytest.mark.filterwarnings("ignore:Integer values:RuntimeWarning") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no warnings are needed
doc/source/whatsnew/v0.24.0.rst
Outdated
@@ -1653,6 +1683,7 @@ Reshaping | |||
- :meth:`DataFrame.nlargest` and :meth:`DataFrame.nsmallest` now returns the correct n values when keep != 'all' also when tied on the first columns (:issue:`22752`) | |||
- Constructing a DataFrame with an index argument that wasn't already an instance of :class:`~pandas.core.Index` was broken (:issue:`22227`). | |||
- Bug in :class:`DataFrame` prevented list subclasses to be used to construction (:issue:`21226`) | |||
- Calling :func:`pandas.concat` on a ``Categorical`` of ints with NA values now causes them to be processed as objects (formerly coerced to floats) (:issue:`19214`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This only applies when concating a categorical with a different dtype, right? If I concat two integer cats with the same dtype, it’s still categorical right?
assert repr(c) == c_exp | ||
|
||
s = Series([1, 2, np.nan], dtype="object").astype("category") | ||
s_exp = """0 1\n1 2\n2 NaN\ndtype: category\nCategories (2, int64): [1, 2]""" # noqa |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use parenthesis to wrap this line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me, the change in dtype for concat is the larger change. Just from the release notes it looks like the formatting change is larger.
I don't think this is that big of a deal, do you want a sub-section for this? |
@TomAugspurger you think need to highlite the change more? |
The formatting change seems fine. I'm more concerned about the breaking
change to get_values
with no deprecation warning. Is there any way to deprecate the behavior, or
would that require to
many internal changes so that the warning doesn't show up when printing the
categorical?
…On Wed, Jan 2, 2019 at 6:57 PM Jeff Reback ***@***.***> wrote:
@TomAugspurger <https://github.com/TomAugspurger> you think need to
highlite the change more?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#24494 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABQHIvMndIhl1YZHo1Iye0ANfYxd0P88ks5u_VV8gaJpZM4Zk2sS>
.
|
This is a very small case to be honest (and our current behavior for concatting any non-same types, so this is a bug IMHO). @JustinZhengBC can you make a small sub-section about the changes in Categorical int-like concat, showing previous and current behavior. I think that is fine. |
@TomAugspurger better? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just doc comments.
thanks @JustinZhengBC keep em coming! |
git diff upstream/master -u -- "*.py" | flake8 --diff
This PR alters the
ExtensionArrayFormatter
such that if it receives a categorical series with only integers andNaN
, it will print the integers as integers instead of converting to floats.