Commit 30233fd

ENH: upgrade categoricals to a first class pandas type
GH3943, GH5313, GH5314, GH7444

ENH: delegate _reduction and ops from Series to the categorical to support min/max and raise TypeError on other ops (numerical) and reduction

Add Categorical Properties to Series

Default to 'ordered' Categoricals if values are ordered

Categorical: add level assignments and reordering + changed default for ordered

Add a `Categorical.reorder_levels()` method. Change some naming in `Series`, so that the methods do not clash with established standards and rename the other categorical methods accordingly.

Also change the default for `ordered` to True if values + levels are passed in at creation time.

Initial doc version for working with Categorical data

Categorical: add Categorical.mode() and use that in Series.mode()

Categorical: implement remove_unused_levels()

Categorical: implement value_count() for categorical series

Categorical: make Series.astype("category") work

ENH: add setitem to Categorical

BUG: assigning to levels not in level set now raises ValueError

API: disallow numpy ufuncs with categoricals

Categorical: Categorical assignment to int/obj column

ENH: add support for fillna to Categoricals

API: deprecate old style categorical constructor usage and change default

Before it was possible to pass in precomputed labels/pointers and the corresponding levels (e.g.: `Categorical([0,1,2], levels=["a","b","c"])`). This could lead to subtle errors in case of integer categoricals: the following could be interpreted either as "precomputed pointers and levels" or as "values and levels", but converting it back to an integer array would result in different arrays:

`np.array(Categorical([1,2], levels=[1,2,3]))`
interpreted as pointers: `[2,3]`
interpreted as values: `[1,2]`

Up to now we would favour old style "pointer and levels" if these values could be interpreted as such (see code for details...).

With this commit we favour new style "values and levels" and only attempt to interpret them as "pointers and levels" if "compat=True" is passed to the constructor.

BREAKS: This will break code which uses Categoricals with "pointer and levels". A short google search and a search on stackoverflow revealed no such usage.

Categorical: document constructor changes and small fixes

Categorical: document that inappropriate numpy functions won't work anymore

ENH: concat support
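The ambiguity described in the commit message can be sketched in plain Python. This is only an illustration of the two readings, not the pandas constructor; the helper names `as_pointers` and `as_values` are hypothetical and do not appear in the commit.

```python
def as_pointers(data, levels):
    """Old-style reading: each entry is a position into `levels`."""
    return [levels[i] for i in data]


def as_values(data, levels):
    """New-style reading: each entry is already a value taken from `levels`."""
    return list(data)


# The integer example from the commit message.
data, levels = [1, 2], [1, 2, 3]

pointer_result = as_pointers(data, levels)  # [2, 3]
value_result = as_values(data, levels)      # [1, 2]
```

The same input gives two different arrays, which is why the commit makes "values and levels" the default and requires an explicit opt-in for the pointer reading.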
1 parent 66e1763 commit 30233fd

30 files changed, +3233 -224 lines changed

.gitignore

+2
@@ -88,3 +88,5 @@ doc/source/vbench
 doc/source/vbench.rst
 doc/source/index.rst
 doc/build/html/index.html
+# Windows specific leftover:
+doc/tmp.sv

doc/source/api.rst

+49 -1
@@ -429,7 +429,7 @@ Time series-related
    Series.tz_localize
 
 String handling
-~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~
 ``Series.str`` can be used to access the values of the series as
 strings and apply several methods to it. Due to implementation
 details the methods show up here as methods of the
@@ -468,6 +468,54 @@ details the methods show up here as methods of the
    StringMethods.upper
    StringMethods.get_dummies
 
+.. _api.categorical:
+
+Categorical
+~~~~~~~~~~~
+
+.. currentmodule:: pandas.core.categorical
+
+If the Series is of dtype ``category``, ``Series.cat`` can be used to access the underlying
+``Categorical``. This data type is similar to the otherwise underlying numpy array
+and has the following usable methods and properties (all available as
+``Series.cat.<method_or_property>``).
+
+
+.. autosummary::
+   :toctree: generated/
+
+   Categorical
+   Categorical.levels
+   Categorical.ordered
+   Categorical.reorder_levels
+   Categorical.remove_unused_levels
+   Categorical.min
+   Categorical.max
+   Categorical.mode
+
+To create compatibility with `pandas.Series` and `numpy` arrays, the following (non-API) methods
+are also introduced. Apart from these methods, ``np.asarray(categorical)`` works by implementing the
+array interface (`Categorical.__array__()`). Be aware that this converts the
+Categorical back to a numpy array, so levels and order information is not preserved!
+
+.. autosummary::
+   :toctree: generated/
+
+   Categorical.from_array
+   Categorical.get_values
+   Categorical.copy
+   Categorical.dtype
+   Categorical.ndim
+   Categorical.sort
+   Categorical.describe
+   Categorical.equals
+   Categorical.unique
+   Categorical.order
+   Categorical.argsort
+   Categorical.fillna
+   Categorical.__array__
+
+
 Plotting
 ~~~~~~~~
 .. currentmodule:: pandas
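
The added API text notes that ``np.asarray(categorical)`` returns a plain numpy array without the level and order information. A minimal sketch of that behaviour follows, assuming a current pandas release, where the commit's "levels" are named ``categories`` and the integer pointers are exposed as ``codes``. Those names, and ``Categorical.from_codes``, come from present-day pandas, not from this commit.

```python
import numpy as np
import pandas as pd

# Assumption: current pandas naming ("categories", "codes"); the commit says "levels".
cat = pd.Categorical([1, 2], categories=[1, 2, 3], ordered=True)

category_list = list(cat.categories)  # declared categories, including the unused 3
code_list = cat.codes.tolist()        # integer positions into the categories

# Going through the array interface yields a plain ndarray of the values;
# the categories and the ordered flag are not carried along.
arr = np.asarray(cat)
value_list = arr.tolist()

# In current pandas the "pointers and levels" reading is spelled out explicitly.
from_codes = pd.Categorical.from_codes([1, 2], categories=[1, 2, 3])
from_codes_values = np.asarray(from_codes).tolist()
```

Here ``value_list`` holds the values as passed in, while ``from_codes_values`` holds the categories looked up by position, mirroring the "values" and "pointers" readings from the commit message.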
