Categorical doc fixups #8413

Merged
merged 4 commits into from
Sep 29, 2014
Changes from 2 commits
8 changes: 4 additions & 4 deletions doc/source/10min.rst
@@ -640,16 +640,14 @@ Categoricals
------------

Since version 0.15, pandas can include categorical data in a ``DataFrame``. For full docs, see the
:ref:`Categorical introduction <categorical>` and the :ref:`API documentation <api.categorical>` .
:ref:`categorical introduction <categorical>` and the :ref:`API documentation <api.categorical>` .

.. ipython:: python

df = pd.DataFrame({"id":[1,2,3,4,5,6], "raw_grade":['a', 'b', 'b', 'a', 'a', 'e']})

# convert the raw grades to a categorical
df["grade"] = pd.Categorical(df["raw_grade"])

# Alternative: df["grade"] = df["raw_grade"].astype("category")
df["grade"] = df["raw_grade"].astype("category")
df["grade"]

# Rename the categories inplace
@@ -658,7 +656,9 @@ Since version 0.15, pandas can include categorical data in a ``DataFrame``. For
# Reorder the categories and simultaneously add the missing categories
df["grade"] = df["grade"].cat.set_categories(["very bad", "bad", "medium", "good", "very good"])
df["grade"]
# Sorting is per order in the categories
Contributor:

Can you put a space before the '#'? Alternatively, you can do these in separate ipython blocks (I personally like the 2nd style, but I do the '#' comments too....)

Contributor Author:

before the #?

Will convert to separate ipython blocks in 10min.

Contributor:

yes, so they render in mini-blocks (or separate blocks are cleaner IMHO)


df.sort("grade")
# groupby shows also empty categories
df.groupby("grade").size()
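
For anyone trying the 10min snippet on a recent pandas, here is a minimal sketch of the same flow. It assumes a modern release, where `df.sort` is spelled `sort_values` and `groupby` is given an explicit `observed=False` so that empty categories are kept. The rename mapping is illustrative only, since the hunk above elides that step.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6],
                   "raw_grade": ["a", "b", "b", "a", "a", "e"]})

# Convert the raw grades to a categorical column.
df["grade"] = df["raw_grade"].astype("category")

# Rename the categories (illustrative mapping, not taken from the diff).
df["grade"] = df["grade"].cat.rename_categories(
    {"a": "very good", "b": "good", "e": "very bad"})

# Reorder the categories and add the ones that have no rows yet.
df["grade"] = df["grade"].cat.set_categories(
    ["very bad", "bad", "medium", "good", "very good"])

# Sorting follows the category order, not the lexical order of the labels.
sorted_df = df.sort_values("grade")

# Grouping keeps the empty categories when observed=False.
counts = df.groupby("grade", observed=False).size()

print(sorted_df)
print(counts)
```

The point of the example is that "very bad" sorts first and that "bad" and "medium" show up in the group sizes with a count of zero.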


9 changes: 3 additions & 6 deletions doc/source/v0.15.0.txt
@@ -540,21 +540,18 @@ Categoricals in Series/DataFrame
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:class:`~pandas.Categorical` can now be included in `Series` and `DataFrames` and gained new
methods to manipulate. Thanks to Jan Schultz for much of this API/implementation. (:issue:`3943`, :issue:`5313`, :issue:`5314`,
methods to manipulate. Thanks to Jan Schulz for much of this API/implementation. (:issue:`3943`, :issue:`5313`, :issue:`5314`,
:issue:`7444`, :issue:`7839`, :issue:`7848`, :issue:`7864`, :issue:`7914`, :issue:`7768`, :issue:`8006`, :issue:`3678`,
:issue:`8075`, :issue:`8076`, :issue:`8143`).

For full docs, see the :ref:`Categorical introduction <categorical>` and the
For full docs, see the :ref:`categorical introduction <categorical>` and the
:ref:`API documentation <api.categorical>`.

.. ipython:: python

df = pd.DataFrame({"id":[1,2,3,4,5,6], "raw_grade":['a', 'b', 'b', 'a', 'a', 'e']})

# convert the raw grades to a categorical
df["grade"] = pd.Categorical(df["raw_grade"])

# Alternative: df["grade"] = df["raw_grade"].astype("category")
df["grade"] = df["raw_grade"].astype("category")
df["grade"]

# Rename the categories
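
This hunk swaps `pd.Categorical(...)` for `astype("category")` in the whatsnew example. A small sketch, assuming a current pandas install, of why the two spellings are interchangeable for this data (the variable names are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6],
                   "raw_grade": ["a", "b", "b", "a", "a", "e"]})

# Old spelling in the docs: build a Categorical explicitly.
via_constructor = pd.Series(pd.Categorical(df["raw_grade"]))

# New spelling in the docs: cast the existing column.
via_astype = df["raw_grade"].astype("category")

print(via_constructor.dtype, via_astype.dtype)
print(list(via_astype.cat.categories))
```

Both columns end up with the `category` dtype and the same inferred categories, so the shorter `astype` form loses nothing in this case.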
40 changes: 29 additions & 11 deletions pandas/tools/tile.py
@@ -34,7 +34,8 @@ def cut(x, bins, right=True, labels=None, retbins=False, precision=3,
right == True (the default), then the bins [1,2,3,4] indicate
(1,2], (2,3], (3,4].
labels : array or boolean, default None
Labels to use for bins, or False to return integer bin labels.
Used as labels for the resulting bins. Must be of the same length as the resulting
bins. If False, return only integer indicators of the bins.
retbins : bool, optional
Whether to return the bins or not. Can be useful if bins is given
as a scalar.
@@ -47,7 +48,8 @@ def cut(x, bins, right=True, labels=None, retbins=False, precision=3,
-------
out : Categorical or Series or array of integers if labels is False
The return type (Categorical or Series) depends on the input: a Series of type category if
input is a Series else Categorical.
input is a Series else Categorical. Bins are represented as categories when categorical
data is returned.
bins : ndarray of floats
Returned only if `retbins` is True.

@@ -63,12 +65,15 @@ def cut(x, bins, right=True, labels=None, retbins=False, precision=3,

Examples
--------
>>> cut(np.array([.2, 1.4, 2.5, 6.2, 9.7, 2.1]), 3, retbins=True)
(array([(0.191, 3.367], (0.191, 3.367], (0.191, 3.367], (3.367, 6.533],
(6.533, 9.7], (0.191, 3.367]], dtype=object),
array([ 0.1905 , 3.36666667, 6.53333333, 9.7 ]))
>>> cut(np.ones(5), 4, labels=False)
array([2, 2, 2, 2, 2])
>>> pd.cut(np.array([.2, 1.4, 2.5, 6.2, 9.7, 2.1]), 3, retbins=True)
([(0.191, 3.367], (0.191, 3.367], (0.191, 3.367], (3.367, 6.533], (6.533, 9.7], (0.191, 3.367]]
Categories (3, object): [(0.191, 3.367] < (3.367, 6.533] < (6.533, 9.7]],
array([ 0.1905 , 3.36666667, 6.53333333, 9.7 ]))
>>> pd.cut(np.array([.2, 1.4, 2.5, 6.2, 9.7, 2.1]), 3, labels=["good","medium","bad"])
[good, good, good, medium, bad, good]
Categories (3, object): [good < medium < bad]
>>> pd.cut(np.ones(5), 4, labels=False)
array([1, 1, 1, 1, 1], dtype=int64)
"""
# NOTE: this binning code is changed a bit from histogram for var(x) == 0
if not np.iterable(bins):
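
A runnable sketch of the three `labels` modes described in the reworded `cut` docstring, reusing the docstring's own input array. It assumes a recent pandas, where the printed repr of the bins differs from the 0.15-era output shown above, so the example checks values and not the repr. The two-label list used to trigger the error is made up for illustration.

```python
import numpy as np
import pandas as pd

values = np.array([.2, 1.4, 2.5, 6.2, 9.7, 2.1])

# labels given: one label per resulting bin, returned as categories.
labelled = pd.cut(values, 3, labels=["good", "medium", "bad"])

# labels=False: only integer indicators of the bins are returned.
codes = pd.cut(values, 3, labels=False)

# retbins=True: the computed bin edges come back as a second value.
binned, edges = pd.cut(values, 3, retbins=True)

# A label list whose length does not match the number of bins is rejected.
try:
    pd.cut(values, 3, labels=["low", "high"])
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True

print(list(labelled))
print(codes)
print(edges)
```

Three bins need four edges, which is why `edges` has one more entry than there are labels.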
@@ -126,7 +131,8 @@ def qcut(x, q, labels=None, retbins=False, precision=3):
Number of quantiles. 10 for deciles, 4 for quartiles, etc. Alternately
array of quantiles, e.g. [0, .25, .5, .75, 1.] for quartiles
labels : array or boolean, default None
Labels to use for bin edges, or False to return integer bin labels
Used as labels for the resulting bins. Must be of the same length as the resulting
bins. If False, return only integer indicators of the bins.
retbins : bool, optional
Whether to return the bins or not. Can be useful if bins is given
as a scalar.
@@ -135,15 +141,27 @@ def qcut(x, q, labels=None, retbins=False, precision=3):

Returns
-------
cat : Categorical or Series
Returns a Series of type category if input is a Series else Categorical.
out : Categorical or Series or array of integers if labels is False
The return type (Categorical or Series) depends on the input: a Series of type category if
input is a Series else Categorical. Bins are represented as categories when categorical
data is returned.
bins : ndarray of floats
Returned only if `retbins` is True.

Notes
-----
Out of bounds values will be NA in the resulting Categorical object

Examples
--------
>>> pd.qcut(range(5), 4)
[[0, 1], [0, 1], (1, 2], (2, 3], (3, 4]]
Categories (4, object): [[0, 1] < (1, 2] < (2, 3] < (3, 4]]
>>> pd.qcut(range(5), 3, labels=["good","medium","bad"])
[good, good, medium, bad, bad]
Categories (3, object): [good < medium < bad]
>>> pd.qcut(range(5), 4, labels=False)
array([0, 0, 1, 2, 3], dtype=int64)
"""
if com.is_integer(q):
quantiles = np.linspace(0, 1, q + 1)
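
The new `qcut` docstring examples can be checked with a short sketch like the one below. It assumes a recent pandas and passes `np.arange(5)` in place of `range(5)`, which holds the same values. As with `cut`, the interval repr in current releases differs from the 0.15-era output, so only labels and integer codes are compared.

```python
import numpy as np
import pandas as pd

data = np.arange(5)

# Quantile-based bins with explicit labels: one label per resulting bin.
labelled = pd.qcut(data, 3, labels=["good", "medium", "bad"])

# labels=False: only integer indicators of the quantile bins.
codes = pd.qcut(data, 4, labels=False)

# retbins=True also returns the quantile edges that were used.
quartiles, edges = pd.qcut(data, 4, retbins=True)

print(list(labelled))
print(codes)
print(edges)
```

With four quantile bins over the values 0 through 4, the lowest bin is closed on the left, so both 0 and 1 land in bin 0.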