Commit d10db1a

REF/ENH/API: Add parametrized CategoricalDtype
Moves the Categorical.ordered and Categorical.categories information to the dtype, an instance of CategoricalDtype.
1 parent 0f55de1 commit d10db1a
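
A minimal sketch of what the commit message describes, assuming a pandas release that includes this change (the sample values are illustrative and not taken from the commit): the categories and the ordered flag are held by the dtype, and the `Categorical` reports the same information.

```python
import pandas as pd

# Build the parametrized dtype first, then hand it to the Categorical.
dtype = pd.CategoricalDtype(categories=["b", "c", "d"], ordered=False)
cat = pd.Categorical(["a", "b", "c", "a"], dtype=dtype)

# The dtype carries both pieces of information.
print(cat.dtype.categories)
print(cat.dtype.ordered)

# The Categorical's own attributes agree with its dtype.
same_categories = cat.categories.equals(cat.dtype.categories)
same_ordered = cat.ordered == cat.dtype.ordered
print(same_categories, same_ordered)
```

Values that are not among the dtype's categories (here `"a"`) become missing values.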

12 files changed: +426 -157 lines changed

doc/source/categorical.rst (+22 -2)
@@ -96,12 +96,14 @@ By passing a :class:`pandas.Categorical` object to a `Series` or assigning it to
     df["B"] = raw_cat
     df
 
-You can also specify differently ordered categories or make the resulting data ordered, by passing these arguments to ``astype()``:
+You can also specify differently ordered categories or make the resulting data
+ordered by passing a :class:`CategoricalDtype`:
 
 .. ipython:: python
 
     s = pd.Series(["a","b","c","a"])
-    s_cat = s.astype("category", categories=["b","c","d"], ordered=False)
+    cat_type = pd.CategoricalDtype(categories=["b", "c", "d"], ordered=False)
+    s_cat = s.astype(cat_type)
     s_cat
 
 Categorical data has a specific ``category`` :ref:`dtype <basics.dtypes>`:
@@ -140,6 +142,24 @@ constructor to save the factorize step during normal constructor mode:
     splitter = np.random.choice([0,1], 5, p=[0.5,0.5])
     s = pd.Series(pd.Categorical.from_codes(splitter, categories=["train", "test"]))
 
+
+CategoricalDtype
+----------------
+
+A categorical's type is fully described by 1.) its categories (an iterable with
+unique values and no missing values), and 2.) its orderedness (a boolean).
+This information can be stored in a :class:`~pandas.CategoricalDtype`.
+The ``categories`` argument is optional, which implies that the actual categories
+should be inferred from whatever is present in the data.
+
+A :class:`~pandas.CategoricalDtype` can be used in any place pandas expects a
+`dtype`. For example :func:`pandas.read_csv`, :func:`pandas.DataFrame.astype`,
+the Series constructor, etc.
+
+As a convenience, you can use the string `'category'` in place of a
+:class:`pandas.CategoricalDtype` when you want the default behavior of
+the categories being unordered, and equal to the set values present in the array.
+
 Description
 -----------
 
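
The documentation added above says a `CategoricalDtype` can be used wherever pandas expects a dtype, and that the string `'category'` is shorthand for the default (unordered, inferred categories). The following is a rough sketch of those entry points, assuming a pandas release that includes this change; the column name `col` and the inline CSV text are made up for illustration.

```python
import io

import pandas as pd

cat_type = pd.CategoricalDtype(categories=["b", "c", "d"], ordered=True)

# 1) Series constructor
s_ctor = pd.Series(["a", "b", "c", "a"], dtype=cat_type)

# 2) astype on existing data
s_cast = pd.Series(["a", "b", "c", "a"]).astype(cat_type)

# 3) read_csv, with a per-column dtype mapping (hypothetical CSV content)
csv_text = "col\na\nb\nc\na\n"
df = pd.read_csv(io.StringIO(csv_text), dtype={"col": cat_type})

# The string shorthand: unordered, categories inferred from the data.
s_default = pd.Series(["a", "b", "c", "a"], dtype="category")

print(s_ctor.dtype == cat_type, s_cast.dtype == cat_type)
print(list(df["col"].cat.categories), df["col"].cat.ordered)
print(list(s_default.cat.categories), s_default.cat.ordered)
```

In the first three cases the value `"a"` is not among the declared categories, so it ends up as a missing value.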

pandas/core/api.py (+1)
@@ -6,6 +6,7 @@
 
 from pandas.core.algorithms import factorize, unique, value_counts
 from pandas.core.dtypes.missing import isnull, notnull
+from pandas.core.dtypes.dtypes import CategoricalDtype
 from pandas.core.categorical import Categorical
 from pandas.core.groupby import Grouper
 from pandas.io.formats.format import set_eng_float_format
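
The import added to `pandas/core/api.py` is presumably what makes the class reachable as `pd.CategoricalDtype`, the spelling used in the documentation change. A quick check, assuming an installed pandas that includes this change (the internal module path is an implementation detail and may move between versions):

```python
import pandas as pd
from pandas.api.types import CategoricalDtype as ApiCategoricalDtype
from pandas.core.dtypes.dtypes import CategoricalDtype as InternalCategoricalDtype

# All three names should refer to the same class object.
same_as_internal = pd.CategoricalDtype is InternalCategoricalDtype
same_as_api_types = pd.CategoricalDtype is ApiCategoricalDtype
print(same_as_internal, same_as_api_types)
```

User code would normally rely on the top-level or `pandas.api.types` name rather than the internal path.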
