Commit 6980f7c

fixup! ENH: Parametrized CategoricalDtype
1 parent 2ebd1d6 commit 6980f7c

3 files changed, +56 -44 lines changed
doc/source/categorical.rst

+11 -8
@@ -152,13 +152,14 @@ CategoricalDtype
 
 A categorical's type is fully described by
 
-1. its categories: a sequence of unique values and no missing values
-2. its orderedness: a boolean
+1. ``categories``: a sequence of unique values and no missing values
+2. ``ordered``: a boolean
 
 This information can be stored in a :class:`~pandas.api.types.CategoricalDtype`.
 The ``categories`` argument is optional, which implies that the actual categories
 should be inferred from whatever is present in the data when the
-:class:`pandas.Categorical` is created.
+:class:`pandas.Categorical` is created. The categories are assumed to be unordered
+by default.
 
 .. ipython:: python
 
@@ -172,11 +173,13 @@ A :class:`~pandas.api.types.CategoricalDtype` can be used in any place pandas
 expects a `dtype`. For example :func:`pandas.read_csv`,
 :func:`pandas.DataFrame.astype`, or in the Series constructor.
 
-As a convenience, you can use the string ``'category'`` in place of a
-:class:`~pandas.api.types.CategoricalDtype` when you want the default behavior of
-the categories being unordered, and equal to the set values present in the
-array. In other words, ``dtype='category'`` is equivalent to
-``dtype=CategoricalDtype()``.
+.. note::
+
+    As a convenience, you can use the string ``'category'`` in place of a
+    :class:`~pandas.api.types.CategoricalDtype` when you want the default behavior of
+    the categories being unordered, and equal to the set values present in the
+    array. In other words, ``dtype='category'`` is equivalent to
+    ``dtype=CategoricalDtype()``.
 
 Equality Semantics
 ~~~~~~~~~~~~~~~~~~
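
To make the note in the hunk above concrete, here is a minimal sketch of the equivalence it describes. It assumes a pandas release that exposes `pandas.api.types.CategoricalDtype` (0.21.0 or later, going by the `versionadded` notes in this commit); the sample values and variable names are illustrative and do not come from the commit.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Illustrative data; not taken from the commit.
values = ["a", "b", "a", "c"]

# The string 'category' and an argument-less CategoricalDtype() both
# infer the categories from the data and leave them unordered.
s_string = pd.Series(values, dtype="category")
s_empty = pd.Series(values, dtype=CategoricalDtype())
print(list(s_string.cat.categories), s_string.cat.ordered)
print(s_string.dtype == s_empty.dtype)

# Passing categories and ordered explicitly fixes both, independent of
# the order in which values happen to appear in the data.
dtype = CategoricalDtype(categories=["c", "b", "a"], ordered=True)
s_fixed = pd.Series(values, dtype=dtype)
print(list(s_fixed.cat.categories), s_fixed.cat.ordered)
```

The explicit form is the one to reach for when several arrays must share the same categories regardless of which values each one contains.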

doc/source/whatsnew/v0.21.0.txt

+2 -2
@@ -11,7 +11,7 @@ Highlights include:
 
 - Integration with `Apache Parquet <https://parquet.apache.org/>`__, including a new top-level :func:`read_parquet` and :func:`DataFrame.to_parquet` method, see :ref:`here <io.parquet>`.
 - New user-facing :class:`pandas.api.types.CategoricalDtype` for specifying
-  categoricals independent of the data (:issue:`14711`, :issue:`15078`)
+  categoricals independent of the data, see :ref:`here <whatsnew_0210.enhancements.categorical_dtype>`.
 
 Check the :ref:`API Changes <whatsnew_0210.api_breaking>` and :ref:`deprecations <whatsnew_0210.deprecations>` before updating.
 
@@ -99,7 +99,7 @@ Setting a list-like data structure into a new attribute now raise a ``UserWarnin
 expanded to include the ``categories`` and ``ordered`` attributes. A
 ``CategoricalDtype`` can be used to specify the set of categories and
 orderedness of an array, independent of the data themselves. This can be useful,
-e.g., when converting string data to a ``Categorical``:
+e.g., when converting string data to a ``Categorical`` (:issue:`14711`, :issue:`15078`):
 
 .. ipython:: python
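
The hunk above ends before the example it introduces. As a hedged illustration of the use case it names (converting string data with a dtype defined independently of the data), the following sketch may help. The labels and the `"unknown"` value are made up for illustration, and `Series.astype` is used as one of the entry points that the categorical.rst hunk in this commit lists.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical string data; "unknown" is deliberately not a category.
s = pd.Series(["low", "high", "medium", "unknown"])

# The dtype is declared up front, independent of what the data contains.
dtype = CategoricalDtype(categories=["low", "medium", "high"], ordered=True)
converted = s.astype(dtype)

# Values outside the declared categories become missing (code -1).
print(converted.cat.codes.tolist())
print(converted.isna().tolist())
```

Declaring the dtype first means the category order is set by the declaration rather than by whichever strings happen to appear in the column.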

pandas/core/categorical.py

+43 -34
@@ -139,33 +139,6 @@ def maybe_to_categorical(array):
 setter to change values in the categorical.
 """
 
-_categories_doc = """The categories of this categorical.
-
-Setting assigns new values to each category (effectively a rename of
-each individual category).
-
-The assigned value has to be a list-like object. All items must be unique and
-the number of items in the new categories must be the same as the number of
-items in the old categories.
-
-Assigning to `categories` is a inplace operation!
-
-Raises
-------
-ValueError
-    If the new categories do not validate as categories or if the number of new
-    categories is unequal the number of old categories
-
-See also
---------
-rename_categories
-reorder_categories
-add_categories
-remove_categories
-remove_unused_categories
-set_categories
-"""
-
 
 class Categorical(PandasObject):
     """
@@ -192,6 +165,10 @@ class Categorical(PandasObject):
     ordered : boolean, (default False)
         Whether or not this categorical is treated as a ordered categorical.
         If not given, the resulting categorical will not be ordered.
+    dtype : CategoricalDtype
+        An instance of ``CategoricalDtype`` to use for this categorical
+
+        .. versionadded:: 0.21.0
 
     Attributes
     ----------
@@ -203,6 +180,10 @@ class Categorical(PandasObject):
     ordered : boolean
         Whether or not this Categorical is ordered.
     dtype : CategoricalDtype
+        The instance of ``CategoricalDtype`` storing the ``categories``
+        and ``ordered``.
+
+        .. versionadded:: 0.21.0
 
     Raises
     ------
@@ -212,7 +193,6 @@ class Categorical(PandasObject):
         If an explicit ``ordered=True`` is given but no `categories` and the
         `values` are not sortable.
 
-
     Examples
     --------
     >>> from pandas import Categorical
@@ -224,17 +204,17 @@ class Categorical(PandasObject):
     [a, b, c, a, b, c]
     Categories (3, object): [a < b < c]
 
+    Only ordered `Categoricals` can be sorted (according to the order
+    of the categories) and have a min and max value.
+
     >>> a = Categorical(['a','b','c','a','b','c'], ['c', 'b', 'a'],
                         ordered=True)
     >>> a.min()
     'c'
-    """
-    _dtype = CategoricalDtype()
-    """The dtype (always "category")"""
-    """Whether or not this Categorical is ordered.
 
-    Only ordered `Categoricals` can be sorted (according to the order
-    of the categories) and have a min and max value.
+    Notes
+    -----
+    See the :ref:`user guide <categorical>` for more.
 
     See also
     --------
@@ -247,6 +227,7 @@ class Categorical(PandasObject):
     # For comparisons, so that numpy uses our implementation if the compare
     # ops, which raise
     __array_priority__ = 1000
+    _dtype = CategoricalDtype()
     _typ = 'categorical'
 
     def __init__(self, values, categories=None, ordered=None, dtype=None,
@@ -359,6 +340,32 @@ def __init__(self, values, categories=None, ordered=None, dtype=None,
 
     @property
     def categories(self):
+        """The categories of this categorical.
+
+        Setting assigns new values to each category (effectively a rename of
+        each individual category).
+
+        The assigned value has to be a list-like object. All items must be
+        unique and the number of items in the new categories must be the same
+        as the number of items in the old categories.
+
+        Assigning to `categories` is a inplace operation!
+
+        Raises
+        ------
+        ValueError
+            If the new categories do not validate as categories or if the
+            number of new categories is unequal the number of old categories
+
+        See also
+        --------
+        rename_categories
+        reorder_categories
+        add_categories
+        remove_categories
+        remove_unused_categories
+        set_categories
+        """
         return self.dtype.categories
 
     @categories.setter
@@ -372,10 +379,12 @@ def categories(self, categories):
 
     @property
     def ordered(self):
+        """Whether the categories have an ordered relationship"""
         return self.dtype.ordered
 
     @property
     def dtype(self):
+        """The :ref:`~pandas.api.types.CategoricalDtype` for this instance"""
         return self._dtype
 
     def __dir__(self):
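
The three properties documented in the hunks above (`categories`, `ordered`, `dtype`) and the new `dtype` constructor argument can be exercised roughly as follows. This is a sketch under the assumption of a pandas release that includes this change. It reuses the values from the class docstring example, and it calls `rename_categories` (listed under "See also") instead of assigning to `categories` directly, since the in-place assignment described in the docstring may not be available in every pandas version.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Same values and category order as the docstring example in this diff.
dtype = CategoricalDtype(categories=["c", "b", "a"], ordered=True)
cat = pd.Categorical(["a", "b", "c", "a", "b", "c"], dtype=dtype)

# categories and ordered are read through the dtype stored on the instance.
print(list(cat.categories))
print(cat.ordered)
print(cat.dtype == dtype)

# Ordered categoricals have a min and max, taken in category order.
print(cat.min())

# Renaming needs exactly as many new names as there are old categories;
# rename_categories returns a new Categorical and leaves `cat` unchanged.
renamed = cat.rename_categories(["z", "y", "x"])
print(list(renamed.categories))
```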
