Skip to content

DOC: Clarify sorting and order of categoricals and fix typo #47846

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 9 additions & 5 deletions doc/source/user_guide/categorical.rst
Original file line number Diff line number Diff line change
Expand Up @@ -439,15 +439,19 @@ Sorting and order
.. _categorical.sort:

If categorical data is ordered (``s.cat.ordered == True``), then the order of the categories has a
meaning and certain operations are possible. If the categorical is unordered, ``.min()/.max()`` will raise a ``TypeError``.
meaning and certain operations are possible. If the categorical is unordered, the data can still be sorted,
but ``.min()/.max()`` will raise a ``TypeError``.

.. ipython:: python

s = pd.Series(pd.Categorical(["a", "b", "c", "a"], ordered=False))
s.sort_values(inplace=True)
s = pd.Series(["a", "b", "c", "a"]).astype(CategoricalDtype(ordered=True))
s.sort_values(inplace=True)
s
s.sort_values(ascending=False)
try:
s.min()
except TypeError as e:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would better to demonstrate this with the :okexcept: directive

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, if I look at examples where the :okexcept: directive is used, for instance here or here, I'm not sure if this is so good an idea in the given case, as it inserts quite a lengthy (~15 lines) traceback which, in my opinion, distracts from reading and makes the example less clear than with try / except.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you show an image of how the docs would render if :okexcept: was used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see the links I provided in the reply:

.. ipython:: python
    :okexcept:

    if pd.Series([False, True, False]):
        print("I was true")

renders as
grafik

and

.. ipython:: python
   :okexcept:

   mi.levels[0].name = "name via level"

renders as
grafik

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to clarify, how would the docs you modified in this PR render with :okexcept:?

Copy link
Contributor Author

@StefRe StefRe Aug 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.. ipython:: python
    :okexcept:

    s = pd.Series(pd.Categorical(["a", "b", "c", "a"], ordered=False))
    s.sort_values(ascending=False)
    s.min()

renders as

out

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would still be partial to adding :okexcept: here. Though verbose, it shows the exact behavior that the user should see.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid I can't agree here: IMO the purpose here is just to demonstrate that an exception will be raised when trying to do min or max on a not ordered categorical series. If we then catch it or let it be thrown doesn't play a role for this purpose, so I'd prefer the visually less distracting option of catching it. This is the way it's done all the way in the given section of the user guide (9 times), why should we treat this 10th time differently? Maybe we can leave the try/except here just for uniformity's sake.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I think those other sections should also use :okexcept: as well.

print(f"TypeError: {e}")
s = s.astype(CategoricalDtype(ordered=True))
s.sort_values()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will need to be s = s.sort_values()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course you could write s = s.sort_values() followed by s to show the result but as we don't need the sorted series further down (I mean for the following code it doesn't play a role if s has been sorted or not) I think this could be shortened to just s.sort_values() which will output the returned sorted Series.

In principle, one could remove this line completely, because if we can sort an unordered categories series, we can sort an ordered categories series all the more (it's a leftover from the beginning, when sorting of an unordered categories series was not allowed, this was changed in 87fec5b).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure removing this line would be better then

s.min(), s.max()

You can set categorical data to be ordered by using ``as_ordered()`` or unordered by using ``as_unordered()``. These will by
Expand Down
2 changes: 1 addition & 1 deletion pandas/core/arrays/categorical.py
Original file line number Diff line number Diff line change
Expand Up @@ -256,7 +256,7 @@ class Categorical(NDArrayBackedExtensionArray, PandasObject, ObjectStringArrayMi
"""
Represent a categorical variable in classic R / S-plus fashion.

`Categoricals` can only take on only a limited, and usually fixed, number
`Categoricals` can only take on a limited, and usually fixed, number
of possible values (`categories`). In contrast to statistical categorical
variables, a `Categorical` might have an order, but numerical operations
(additions, divisions, ...) are not possible.
Expand Down