API: Change str for CategoricalDtype to category #17783

Merged 3 commits on Oct 5, 2017
9 changes: 4 additions & 5 deletions doc/source/whatsnew/v0.21.0.txt
@@ -157,11 +157,10 @@ The values have been correctly interpreted as integers.

The ``.dtype`` property of a ``Categorical``, ``CategoricalIndex`` or a
``Series`` with categorical type will now return an instance of
-``CategoricalDtype``. For the most part, this is backwards compatible, though
-the string repr has changed. If you were previously using ``str(s.dtype) ==
-'category'`` to detect categorical data, switch to
-:func:`pandas.api.types.is_categorical_dtype`, which is compatible with the old
-and new ``CategoricalDtype``.
+``CategoricalDtype``. This change should be backwards compatible, though the
+repr has changed. ``str(CategoricalDtype())`` is still the string
+``'category'``, but the preferred way to detect categorical data is to use
+:func:`pandas.api.types.is_categorical_dtype`.

See the :ref:`CategoricalDtype docs <categorical.categoricaldtype>` for more.
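
Reviewer sketch of the detection pattern the note above describes, assuming a pandas version that includes this change. The variable names are illustrative. It pairs the string comparison with an `isinstance` check against `CategoricalDtype`. The `is_categorical_dtype` helper named in the note serves the same purpose in this release, but it is left out here because, as far as I know, later pandas releases deprecate it.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

s = pd.Series(list("abca"), dtype="category")

# str() of the dtype stays the short alias after this change.
is_category_by_str = str(s.dtype) == "category"

# Type-based detection does not depend on any string form.
is_category_by_type = isinstance(s.dtype, CategoricalDtype)

# repr() carries the detailed form; its exact text varies by pandas version.
detailed = repr(s.dtype)
```

The type-based check is the more robust of the two, since it keeps working even if the string forms change again.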

2 changes: 1 addition & 1 deletion pandas/core/dtypes/dtypes.py
@@ -220,7 +220,7 @@ def __eq__(self, other):
# both unordered; this could probably be optimized / cached
return hash(self) == hash(other)

-def __unicode__(self):
+def __repr__(self):
tpl = u'CategoricalDtype(categories={}ordered={})'
if self.categories is None:
data = u"None, "
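
Reviewer sketch of the effect of this rename: the detailed template now backs `repr()` only, so `str()` can fall back to a short name. `ToyDtype` is a hypothetical stand-in written for illustration. It is not the pandas class, and the assumption that the short name comes from a `name` attribute is mine.

```python
class ToyDtype(object):
    """Hypothetical stand-in for illustration; not the pandas class."""

    name = "category"

    def __init__(self, categories=None, ordered=False):
        self.categories = categories
        self.ordered = ordered

    def __str__(self):
        # Short, stable alias that callers may compare against.
        return self.name

    def __repr__(self):
        # Detailed form for debugging; free to change between releases.
        return "ToyDtype(categories={!r}, ordered={!r})".format(
            self.categories, self.ordered)


dt = ToyDtype(["a", "b"])
short_form = str(dt)   # uses __str__
long_form = repr(dt)   # uses __repr__
```

Keeping the two methods separate lets string comparisons against the alias keep working while the repr stays informative.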
8 changes: 8 additions & 0 deletions pandas/tests/dtypes/test_dtypes.py
@@ -1,4 +1,5 @@
# -*- coding: utf-8 -*-
+import re
import pytest

from itertools import product
@@ -649,3 +650,10 @@ def test_from_categorical_dtype_both(self):
result = CategoricalDtype._from_categorical_dtype(
c1, categories=[1, 2], ordered=False)
assert result == CategoricalDtype([1, 2], ordered=False)

+def test_str_vs_repr(self):
+c1 = CategoricalDtype(['a', 'b'])
+assert str(c1) == 'category'
+# Py2 will have unicode prefixes
+pat = r"CategoricalDtype\(categories=\[.*\], ordered=False\)"
+assert re.match(pat, repr(c1))
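
Reviewer sketch of why the test uses a regex rather than an exact string: the `.*` inside the brackets tolerates Python 2's `u'...'` prefixes. The two sample reprs below are hand-written assumptions based on the template in `dtypes.py`, not output captured from pandas.

```python
import re

pat = r"CategoricalDtype\(categories=\[.*\], ordered=False\)"

# Hand-written samples; the second mimics Python 2's unicode prefixes.
py3_style = "CategoricalDtype(categories=['a', 'b'], ordered=False)"
py2_style = "CategoricalDtype(categories=[u'a', u'b'], ordered=False)"

matches_py3 = re.match(pat, py3_style) is not None
matches_py2 = re.match(pat, py2_style) is not None

# A repr reporting ordered=True should not match the pattern.
ordered_style = py3_style.replace("ordered=False", "ordered=True")
matches_ordered = re.match(pat, ordered_style) is not None
```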
13 changes: 2 additions & 11 deletions pandas/tests/series/test_analytics.py
@@ -1784,7 +1784,8 @@ class TestNLargestNSmallest(object):
# not supported on some archs
# Series([3., 2, 1, 2, 5], dtype='complex256'),
Series([3., 2, 1, 2, 5], dtype='complex128'),
-Series(list('abcde'))])
+Series(list('abcde')),
+Series(list('abcde'), dtype='category')])
def test_error(self, r):
dt = r.dtype
msg = ("Cannot use method 'n(larg|small)est' with "
@@ -1795,16 +1796,6 @@ def test_error(self, r):
with tm.assert_raises_regex(TypeError, msg):
method(arg)

-def test_error_categorical_dtype(self):
-# same as test_error, but regex hard to escape properly
-msg = ("Cannot use method 'n(larg|small)est' with dtype "
-"CategoricalDtype.+")
-with tm.assert_raises_regex(TypeError, msg):
-Series(list('ab'), dtype='category').nlargest(2)
-
-with tm.assert_raises_regex(TypeError, msg):
-Series(list('ab'), dtype='category').nsmallest(2)

@pytest.mark.parametrize(
"s",
[v for k, v in s_main_dtypes().iteritems()])
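
Reviewer sketch of why the separate categorical test could be folded into the parametrized `test_error`: a message containing the short `'category'` string can be used directly as a regex, while one containing the detailed repr cannot. The message template and the sample repr are assumptions for illustration, since only the message prefix is visible in the diff.

```python
import re

# Assumed message shape; only its prefix appears in the diff above.
template = "Cannot use method 'nlargest' with dtype {}"

short_msg = template.format("category")
detailed_msg = template.format(
    "CategoricalDtype(categories=['a', 'b'], ordered=False)")

# No regex metacharacters: the message works as a pattern for itself.
short_ok = re.search(short_msg, short_msg) is not None

# Parentheses and brackets are regex syntax, so the raw text is not a
# pattern that matches itself...
raw_ok = re.search(detailed_msg, detailed_msg) is not None

# ...unless it is escaped first.
escaped_ok = re.search(re.escape(detailed_msg), detailed_msg) is not None
```

With `str()` back to `'category'`, the shared pattern in `test_error` needs no escaping, which is presumably why the dedicated test was dropped.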