Commit 570584c

Merge pull request #7217 from jreback/categorical
WIP: categoricals as an internal CategoricalBlock GH5313
2 parents 4ddb73a + ea0a13c commit 570584c

39 files changed, +3707 -394 lines changed

.gitignore

+2
@@ -88,3 +88,5 @@ doc/source/vbench
 doc/source/vbench.rst
 doc/source/index.rst
 doc/build/html/index.html
+# Windows specific leftover:
+doc/tmp.sv

doc/source/api.rst

+55-1
@@ -429,7 +429,7 @@ Time series-related
    Series.tz_localize
 
 String handling
-~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~
 ``Series.str`` can be used to access the values of the series as
 strings and apply several methods to it. Due to implementation
 details the methods show up here as methods of the
@@ -468,6 +468,60 @@ details the methods show up here as methods of the
    StringMethods.upper
    StringMethods.get_dummies
 
+.. _api.categorical:
+
+Categorical
+~~~~~~~~~~~
+
+.. currentmodule:: pandas.core.categorical
+
+If the Series is of dtype ``category``, ``Series.cat`` can be used to access the underlying
+``Categorical``. This data type is similar to the otherwise underlying numpy array
+and has the following usable methods and properties (all available as
+``Series.cat.<method_or_property>``).
+
+
+.. autosummary::
+   :toctree: generated/
+
+   Categorical
+   Categorical.from_codes
+   Categorical.levels
+   Categorical.ordered
+   Categorical.reorder_levels
+   Categorical.remove_unused_levels
+   Categorical.min
+   Categorical.max
+   Categorical.mode
+   Categorical.describe
+
+``np.asarray(categorical)`` works by implementing the array interface. Be aware that this converts
+the Categorical back to a numpy array, so levels and order information are not preserved!
+
+.. autosummary::
+   :toctree: generated/
+
+   Categorical.__array__
+
+To create compatibility with `pandas.Series` and `numpy` arrays, the following (non-API) methods
+are also introduced.
+
+.. autosummary::
+   :toctree: generated/
+
+   Categorical.from_array
+   Categorical.get_values
+   Categorical.copy
+   Categorical.dtype
+   Categorical.ndim
+   Categorical.sort
+   Categorical.equals
+   Categorical.unique
+   Categorical.order
+   Categorical.argsort
+   Categorical.fillna
+
+
 Plotting
 ~~~~~~~~
 .. currentmodule:: pandas
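
The ``Series.cat`` accessor and the ``np.asarray`` caveat documented in the api.rst change above can be illustrated with a short sketch. It assumes a current pandas release, where the accessor exposes ``categories`` and ``codes`` (the ``levels`` name used in this commit is the earlier spelling of ``categories``); the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# A Series of dtype "category" is what makes the ``Series.cat`` accessor available.
s = pd.Series(["a", "b", "a", "c"], dtype="category")

# Assumption: current pandas naming. This commit calls the same concept "levels".
categories = list(s.cat.categories)
codes = s.cat.codes.tolist()

# ``s.values`` is the underlying Categorical; np.asarray goes through
# ``__array__`` and returns a plain ndarray of the values.
arr = np.asarray(s.values)

# The result is an ordinary ndarray, so the categorical metadata is gone.
has_categories = hasattr(arr, "categories")
```

After the conversion only the values remain, so anything that needs the category set or its ordering has to read it from the ``Categorical`` before calling ``np.asarray``.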

doc/source/basics.rst

+7-1
@@ -1574,7 +1574,8 @@ dtypes:
                       'float64': np.arange(4.0, 7.0),
                       'bool1': [True, False, True],
                       'bool2': [False, True, False],
-                      'dates': pd.date_range('now', periods=3).values})
+                      'dates': pd.date_range('now', periods=3).values,
+                      'category': pd.Categorical(list("ABC"))})
    df['tdeltas'] = df.dates.diff()
    df['uint64'] = np.arange(3, 6).astype('u8')
    df['other_dates'] = pd.date_range('20130101', periods=3).values
@@ -1630,6 +1631,11 @@ All numpy dtypes are subclasses of ``numpy.generic``:
 
    subdtypes(np.generic)
 
+.. note::
+
+   Pandas also defines an additional ``category`` dtype, which is not integrated into the normal
+   numpy hierarchy and won't show up with the above function.
+
 .. note::
 
    The ``include`` and ``exclude`` parameters must be non-string sequences.
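
The note added in the basics.rst change above says that the ``category`` dtype sits outside the numpy dtype hierarchy. A minimal sketch of what that means in practice follows; it checks the behaviour of a current pandas release, which is an assumption and may differ from the implementation at the time of this commit.

```python
import numpy as np
import pandas as pd

# Build a categorical column the same way the documentation example does.
s = pd.Series(pd.Categorical(list("ABC")))

dtype = s.dtype
dtype_name = str(dtype)

# Assumption: in current pandas the category dtype is a pandas-defined
# object rather than a numpy dtype instance.
is_numpy_dtype = isinstance(dtype, np.dtype)

# For comparison, an ordinary float column does carry a numpy dtype.
float_is_numpy_dtype = isinstance(pd.Series([1.0, 2.0]).dtype, np.dtype)
```

This is why a helper that walks the subclasses of ``numpy.generic`` will not list ``category``.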
