BUG-19214 int categoricals are formatted as ints #24494
@@ -1323,6 +1323,7 @@ Categorical
- Bug in many methods of the ``.str``-accessor, which always failed on calling the ``CategoricalIndex.str`` constructor (:issue:`23555`, :issue:`23556`)
- Bug in :meth:`Series.where` losing the categorical dtype for categorical data (:issue:`24077`)
- Bug in :meth:`Categorical.apply` where ``NaN`` values could be handled unpredictably. They now remain unchanged (:issue:`24241`)
- Bug in :meth:`Categorical.get_values` where integers would be formatted as floats if ``NaN`` values were present (:issue:`19214`)

Datetimelike
^^^^^^^^^^^^

@@ -1653,6 +1654,7 @@ Reshaping
- :meth:`DataFrame.nlargest` and :meth:`DataFrame.nsmallest` now returns the correct n values when keep != 'all' also when tied on the first columns (:issue:`22752`)
- Constructing a DataFrame with an index argument that wasn't already an instance of :class:`~pandas.core.Index` was broken (:issue:`22227`).
- Bug in :class:`DataFrame` prevented list subclasses to be used to construction (:issue:`21226`)
- Calling :func:`pandas.concat` on a ``Categorical`` of ints with NA values now causes them to be processed as objects (formerly coerced to floats) (:issue:`19214`)
- Bug in :func:`DataFrame.unstack` and :func:`DataFrame.pivot_table` returning a misleading error message when the resulting DataFrame has more elements than int32 can handle. Now, the error message is improved, pointing towards the actual problem (:issue:`20601`)

.. _whatsnew_0240.bug_fixes.sparse:

Review comment (on the :func:`pandas.concat` entry): This only applies when concatenating a categorical with a different dtype, right? If I concat two integer cats with the same dtype, it's still categorical, right?

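On the same-dtype case raised in the review question, a quick check along the following lines can confirm the behavior. This is only a sketch, assuming a reasonably recent pandas; the names `a`, `b`, and `result` are illustrative and not part of the PR.

```python
import numpy as np
import pandas as pd

# Two integer categoricals that share exactly the same categories (same dtype).
a = pd.Series(pd.Categorical([1, 2, np.nan], categories=[1, 2]))
b = pd.Series(pd.Categorical([2, 1], categories=[1, 2]))

result = pd.concat([a, b], ignore_index=True)

# Concatenating identical categorical dtypes keeps the categorical dtype.
print(isinstance(result.dtype, pd.CategoricalDtype))  # True
print(result.dtype == a.dtype)                        # True
```

The mixed-dtype case (a categorical combined with a non-categorical Series) is the one the release note describes, and it is deliberately left out of this sketch.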
@@ -1520,6 +1520,11 @@ def get_values(self):
        # if we are a datetime and period index, return Index to keep metadata
        if is_datetimelike(self.categories):
            return self.categories.take(self._codes, fill_value=np.nan)
        elif is_integer_dtype(self.categories) and -1 in self._codes:
            warn("Integer values represented as objects to accommodate NaNs",
                 RuntimeWarning)
            return self.categories.astype("object").take(self._codes,
                                                         fill_value=np.nan)
        return np.array(self)

    def check_for_ordered(self, op):

Review comment (on the ``warn(...)`` call): no need for a warning

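To illustrate why the new branch casts the categories to object before the lookup, here is a minimal pure-Python sketch. The helper `take_with_fill` is hypothetical: it only mimics a codes-to-values lookup in which `-1` marks a missing entry, and it is not pandas' implementation.

```python
import math


def take_with_fill(categories, codes, fill_value=math.nan):
    """Hypothetical stand-in for a take-with-fill lookup: return the category
    for each code, using fill_value wherever the code is -1 (missing)."""
    return [fill_value if code == -1 else categories[code] for code in codes]


categories = [1, 2]   # integer categories
codes = [0, 1, -1]    # -1 marks the missing entry

# A homogeneous float container has to turn the ints into floats to hold NaN.
as_float = take_with_fill([float(c) for c in categories], codes)

# An object-like container keeps the ints as ints next to the NaN.
as_object = take_with_fill(categories, codes)

print([str(v) for v in as_float])   # ['1.0', '2.0', 'nan']
print([str(v) for v in as_object])  # ['1', '2', 'nan']
```

The second result is the one that formats integers as integers, which is the effect the `astype("object")` branch in the diff aims for.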
@@ -1,6 +1,7 @@
# -*- coding: utf-8 -*-

import numpy as np
import pytest

from pandas.compat import PY3, u

@@ -240,6 +241,16 @@ def test_categorical_repr_datetime_ordered(self):

        assert repr(c) == exp

    @pytest.mark.filterwarnings("ignore:Integer values:RuntimeWarning")
    def test_categorical_repr_int_with_nan(self):
        c = Categorical([1, 2, np.nan])
        c_exp = """[1, 2, NaN]\nCategories (2, int64): [1, 2]"""
        assert repr(c) == c_exp

        s = Series([1, 2, np.nan], dtype="object").astype("category")
        s_exp = """0      1\n1      2\n2    NaN\ndtype: category\nCategories (2, int64): [1, 2]"""  # noqa
        assert repr(s) == s_exp

    def test_categorical_repr_period(self):
        idx = period_range('2011-01-01 09:00', freq='H', periods=5)
        c = Categorical(idx)

Review comment (on the ``filterwarnings`` decorator): no warnings are needed
Review comment (on the ``s_exp`` line): You can use parentheses to wrap this line.

this is not the user-facing note though; it is the formatting of a Series/Categorical where this happens. .get_values() is just an implementation detail.
@jreback I'm a bit confused as to what a user-facing note is. Is it a specific section somewhere? Or does it mean to phrase the note differently, like "bug in the Categorical repr"?
Sorry. What I mean is that a user would want to know that the string repr of a Series (of categorical dtype) and of Categoricals with integer categories will no longer be coerced to float. It's maybe worth making this a sub-section (e.g. show previous / new behavior). You can just in-line it here.
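As a rough sketch of what such a previous / new behavior note could demonstrate, the snippet below checks the new side only. It assumes a pandas version that already includes this fix; the variable names are illustrative, and the previous float-formatted output is not reproduced here.

```python
import numpy as np
import pandas as pd

# Integer categories with one missing value.
c = pd.Categorical([1, 2, np.nan])
s = pd.Series(c)

# The categories stay integer; the missing value is stored as code -1.
print(c.categories.dtype.kind)  # 'i'
print(c.codes.tolist())         # [0, 1, -1]

# With the fix, neither repr should show float-formatted values such as "1.0".
print(repr(c))
print(repr(s))
```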