Commit 918c01a

docs
1 parent fde7dce commit 918c01a

4 files changed, +131 -3 lines changed

.gitignore

+1 -1

@@ -55,7 +55,7 @@ dist
 ######################
 .directory
 .gdb_history
-.DS_Store?
+.DS_Store
 ehthumbs.db
 Icon?
 Thumbs.db

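The `.gitignore` change drops the trailing `?` from `.DS_Store?`. In gitignore patterns `?` matches exactly one character, so the old pattern would not have matched a file named `.DS_Store` itself. The sketch below illustrates this with Python's `fnmatch.fnmatchcase`; that is an assumption standing in for git's own matcher, which behaves the same way for `?` on plain file names without a `/`.

```python
from fnmatch import fnmatchcase

# Assumption: fnmatchcase stands in for git's gitignore matcher. For plain
# file names (no "/"), "?" behaves the same in both: it matches exactly
# one character, never zero characters.
OLD_PATTERN = ".DS_Store?"
NEW_PATTERN = ".DS_Store"

for name in [".DS_Store", ".DS_Store1", "Thumbs.db"]:
    # Print whether the old and the new pattern match this file name.
    print(name, fnmatchcase(name, OLD_PATTERN), fnmatchcase(name, NEW_PATTERN))

# prints:
# .DS_Store False True
# .DS_Store1 True False
# Thumbs.db False False
```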
doc/source/advanced.rst

+70 -2

@@ -594,7 +594,76 @@ faster than fancy indexing.
    timeit ser.ix[indexer]
    timeit ser.take(indexer)
 
-.. _indexing.float64index:
+.. _indexing.categoricalindex:
+
+CategoricalIndex
+----------------
+
+.. versionadded:: 0.16.1
+
+We introduce a ``CategoricalIndex``, a new type of index object that is useful for supporting
+indexing with duplicates. This is a container around a ``Categorical`` (introduced in v0.15.0)
+and allows efficient indexing and storage of an index with a large number of duplicated elements. Prior to 0.16.1,
+setting the index of a ``DataFrame/Series`` with a ``category`` dtype would convert it to a regular object-based ``Index``.
+
+.. ipython:: python
+
+   df = DataFrame({'A' : np.arange(6),
+                   'B' : Series(list('aabbca')).astype('category',
+                                                       categories=list('cab'))
+                   })
+   df
+   df.dtypes
+   df.B.cat.categories
+
+Setting the index will create a ``CategoricalIndex``.
+
+.. ipython:: python
+
+   df2 = df.set_index('B')
+   df2.index
+   df2.index.categories
+
+Indexing works similarly to an ``Index`` with duplicates.
+
+.. ipython:: python
+
+   df2.loc['a']
+
+   # and preserves the CategoricalIndex
+   df2.loc['a'].index
+   df2.loc['a'].index.categories
+
+Sorting will order by the order of the categories.
+
+.. ipython:: python
+
+   df2.sort_index()
+
+Groupby operations on the index will preserve its categorical nature as well.
+
+.. ipython:: python
+
+   df2.groupby(level=0).sum()
+   df2.groupby(level=0).sum().index
+
+.. warning::
+
+   Reshaping and comparison operations on a ``CategoricalIndex`` require the same categories,
+   otherwise a ``TypeError`` will be raised.
+
+   .. code-block:: python
+
+      In [10]: df3 = DataFrame({'A' : np.arange(6),
+                                'B' : Series(list('aabbca')).astype('category',
+                                                                    categories=list('abc'))
+                                }).set_index('B')
+
+      In [11]: df3.index.categories
+      Out[11]: Index([u'a', u'b', u'c'], dtype='object')
+
+      In [12]: pd.concat([df2, df3])
+      TypeError: categories must match existing categories when appending
 
 Float64Index
 ------------
@@ -706,4 +775,3 @@ Of course if you need integer based selection, then use ``iloc``
 .. ipython:: python
 
    dfir.iloc[0:5]
-

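The new `CategoricalIndex` section describes the index as a container around a `Categorical`: each distinct label is stored once in `categories`, and every row holds an integer code pointing into it. The pure-Python sketch below models that idea to show why a label lookup returns every duplicate, why `sort_index` follows category order rather than alphabetical order, and why appending requires identical categories. `ToyCategoricalIndex` and its methods are hypothetical illustrations, not pandas internals or API; the sample data mirrors the `'aabbca'` / `'cab'` example.

```python
class ToyCategoricalIndex:
    """Hypothetical model of a categorical index: integer codes plus categories."""

    def __init__(self, values, categories):
        self.categories = list(categories)
        # Map each category label to its integer code once.
        lookup = {label: code for code, label in enumerate(self.categories)}
        # Each row keeps only an integer code; label text lives in categories.
        self.codes = [lookup[v] for v in values]

    def labels(self):
        # Decode the codes back into the original labels.
        return [self.categories[c] for c in self.codes]

    def get_locs(self, label):
        # A label may occur many times, so lookup returns every position.
        code = self.categories.index(label)
        return [pos for pos, c in enumerate(self.codes) if c == code]

    def argsort(self):
        # Sorting compares codes, so rows follow the order of the categories.
        # sorted() is stable: tied rows keep their original relative order.
        return sorted(range(len(self.codes)), key=lambda pos: self.codes[pos])

    def append(self, other):
        # Mirrors the warning above: combining requires identical categories.
        if self.categories != other.categories:
            raise TypeError("categories must match existing categories when appending")
        return ToyCategoricalIndex(self.labels() + other.labels(), self.categories)


idx = ToyCategoricalIndex(list("aabbca"), categories=list("cab"))
labels = idx.labels()
print(idx.codes)                           # [1, 1, 2, 2, 0, 1]
print(idx.get_locs("a"))                   # [0, 1, 5]
print([labels[p] for p in idx.argsort()])  # ['c', 'a', 'a', 'a', 'b', 'b']

other = ToyCategoricalIndex(list("aabbca"), categories=list("abc"))
try:
    idx.append(other)
except TypeError as exc:
    print(exc)  # categories must match existing categories when appending
```

Comparing codes instead of label strings is what makes `'c'` sort first here: its code is 0 because it comes first in `categories`.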
doc/source/api.rst

+20

@@ -1289,6 +1289,26 @@ Selecting
    Index.slice_indexer
    Index.slice_locs
 
+.. _api.categoricalindex:
+
+CategoricalIndex
+----------------
+
+.. autosummary::
+   :toctree: generated/
+
+   CategoricalIndex
+
+Categorical Components
+~~~~~~~~~~~~~~~~~~~~~~
+
+.. autosummary::
+   :toctree: generated/
+
+   CategoricalIndex.codes
+   CategoricalIndex.categories
+   CategoricalIndex.ordered
+
 .. _api.datetimeindex:
 
 DatetimeIndex

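The API section lists three components: `CategoricalIndex.codes`, `CategoricalIndex.categories` and `CategoricalIndex.ordered`. As a rough mental model, not the pandas implementation: `categories` holds each distinct label once, `codes` holds one integer per row pointing into `categories` (pandas uses `-1` for a missing value), and `ordered` records whether the order of the categories is meaningful for comparisons. The hypothetical `encode` and `decode` helpers below show how the first two components relate; `None` is an assumed stand-in for a missing value.

```python
from typing import List, Optional, Sequence, Tuple


def encode(values: Sequence[Optional[str]],
           categories: Sequence[str]) -> Tuple[List[int], List[str]]:
    """Hypothetical helper: turn labels into (codes, categories)."""
    position = {label: code for code, label in enumerate(categories)}
    # None stands in for a missing value and is encoded as -1,
    # the sentinel pandas uses for missing entries in ``codes``.
    codes = [-1 if v is None else position[v] for v in values]
    return codes, list(categories)


def decode(codes: Sequence[int], categories: Sequence[str]) -> List[Optional[str]]:
    """Hypothetical helper: rebuild the labels from codes and categories."""
    return [None if c == -1 else categories[c] for c in codes]


codes, categories = encode(["a", "a", "b", None, "c", "a"], categories=["c", "a", "b"])
print(codes)                      # [1, 1, 2, -1, 0, 1]
print(decode(codes, categories))  # ['a', 'a', 'b', None, 'c', 'a']
```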
doc/source/whatsnew/v0.16.1.txt

+40

@@ -7,6 +7,10 @@ This is a minor bug-fix release from 0.16.0 and includes a large number of
 bug fixes along with several new features, enhancements, and performance improvements.
 We recommend that all users upgrade to this version.
 
+Highlights include:
+
+- Support for a ``CategoricalIndex``, a category-based index; see :ref:`here <whatsnew_0161.enhancements.categoricalindex>`.
+
 .. contents:: What's new in v0.16.1
     :local:
     :backlinks: none
@@ -17,10 +21,46 @@ We recommend that all users upgrade to this version.
 Enhancements
 ~~~~~~~~~~~~
 
+.. _whatsnew_0161.enhancements.categoricalindex:
+
+CategoricalIndex
+^^^^^^^^^^^^^^^^
+
+We introduce a ``CategoricalIndex``, a new type of index object that is useful for supporting
+indexing with duplicates. This is a container around a ``Categorical`` (introduced in v0.15.0)
+and allows efficient indexing and storage of an index with a large number of duplicated elements. Prior to 0.16.1,
+setting the index of a ``DataFrame/Series`` with a ``category`` dtype would convert it to a regular object-based ``Index``.
+
+.. ipython:: python
+
+   df = DataFrame({'A' : np.arange(6),
+                   'B' : Series(list('aabbca')).astype('category',
+                                                       categories=list('cab'))
+                   })
+   df
+   df.dtypes
+   df.B.cat.categories
+
+   # setting the index will create a CategoricalIndex
+   df2 = df.set_index('B')
+   df2.index
+   df2.index.categories
+
+   # indexing works similarly to an Index with duplicates
+   df2.loc['a']
 
+   # and preserves the CategoricalIndex
+   df2.loc['a'].index
+   df2.loc['a'].index.categories
 
+   # sorting will order by the order of the categories
+   df2.sort_index()
 
+   # groupby operations on the index will preserve its categorical nature as well
+   df2.groupby(level=0).sum()
+   df2.groupby(level=0).sum().index
 
+See the :ref:`documentation <indexing.categoricalindex>` for more. (:issue:`7629`)
 
 .. _whatsnew_0161.api:
 

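The whatsnew example ends with `df2.groupby(level=0).sum()`, whose result keeps a `CategoricalIndex`. The pure-Python sketch below illustrates only the ordering side of that behavior, under a simplifying assumption: rows are grouped by category code, so the aggregated groups come out in category order (`c`, `a`, `b`) rather than in alphabetical or first-appearance order. `groupby_sum` is a hypothetical helper, not a pandas function, and it returns only groups that occur in the data, which may differ from how pandas treats unused categories.

```python
from collections import defaultdict


def groupby_sum(labels, values, categories):
    """Hypothetical helper: sum ``values`` per label, ordered by ``categories``."""
    code_of = {label: code for code, label in enumerate(categories)}
    totals = defaultdict(int)
    for label, value in zip(labels, values):
        # Accumulate by integer code rather than by label string.
        totals[code_of[label]] += value
    # Emit groups in code order, i.e. in the order of the categories.
    return [(categories[code], totals[code]) for code in sorted(totals)]


# Same data as the example: column A is 0..5, index B is 'aabbca' with categories 'cab'.
print(groupby_sum(list("aabbca"), range(6), categories=list("cab")))
# [('c', 4), ('a', 6), ('b', 5)]
```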