Skip to content

DOC: Clarify sorting and order of categoricals and fix typo #47846

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

StefRe
Copy link
Contributor

@StefRe StefRe commented Jul 25, 2022

Without the addition in the text it‘s unclear, what s.sort_values(inplace=True)
should demonstrate. Show some meaningful output now. Also, when using astype to
contrast ordered/unordered behavior, I think it‘s clearer to use the existing,
unordered Series than to create a new one.

Fix minor typo in categorical.py.

Without the addition in the text it‘s unclear, what s.sort_values(inplace=True)
should demonstrate. Show some meaningful output now. Also, when using astype to
contrast ordered/unordered behavior, I think it‘s clearer to use the existing,
unordered Series than to create a new one.
s.sort_values(ascending=False)
try:
s.min()
except TypeError as e:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would better to demonstrate this with the :okexcept: directive

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, if I look at examples where the :okexcept: directive is used, for instance here or here, I'm not sure if this is so good an idea in the given case, as it inserts quite a lengthy (~15 lines) traceback which, in my opinion, distracts from reading and makes the example less clear than with try / except.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you show an image of how the docs would render if :okexcept: was used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see the links I provided in the reply:

.. ipython:: python
    :okexcept:

    if pd.Series([False, True, False]):
        print("I was true")

renders as
grafik

and

.. ipython:: python
   :okexcept:

   mi.levels[0].name = "name via level"

renders as
grafik

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to clarify, how would the docs you modified in this PR render with :okexcept:?

Copy link
Contributor Author

@StefRe StefRe Aug 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.. ipython:: python
    :okexcept:

    s = pd.Series(pd.Categorical(["a", "b", "c", "a"], ordered=False))
    s.sort_values(ascending=False)
    s.min()

renders as

out

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would still be partial to adding :okexcept: here. Though verbose, it shows the exact behavior that the user should see.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid I can't agree here: IMO the purpose here is just to demonstrate that an exception will be raised when trying to do min or max on a not ordered categorical series. If we then catch it or let it be thrown doesn't play a role for this purpose, so I'd prefer the visually less distracting option of catching it. This is the way it's done all the way in the given section of the user guide (9 times), why should we treat this 10th time differently? Maybe we can leave the try/except here just for uniformity's sake.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I think those other sections should also use :okexcept: as well.

except TypeError as e:
print(f"TypeError: {e}")
s = s.astype(CategoricalDtype(ordered=True))
s.sort_values()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will need to be s = s.sort_values()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course you could write s = s.sort_values() followed by s to show the result but as we don't need the sorted series further down (I mean for the following code it doesn't play a role if s has been sorted or not) I think this could be shortened to just s.sort_values() which will output the returned sorted Series.

In principle, one could remove this line completely, because if we can sort an unordered categories series, we can sort an ordered categories series all the more (it's a leftover from the beginning, when sorting of an unordered categories series was not allowed, this was changed in 87fec5b).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure removing this line would be better then

@mroeschke mroeschke added Docs Categorical Categorical Data Type labels Jul 25, 2022
@StefRe StefRe mentioned this pull request Aug 5, 2022
Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@StefRe what do you think if you implement the suggested changes here, and then you open an issue to discuss the problem of too long tracebacks. I totally see your point on that, and would be great to not have these huge tracebacks in the middle of the documentation. But let's better discuss that separately, maybe there are better solutions than using try/except blocks, which aren't great either. Thanks!

@mroeschke
Copy link
Member

xref #10715 for a discussion on limiting tracebacks.

I do sympathize and agree that having very long tracebacks in the documentation can be distract, but my primary objection with try/except in an ad-hoc fashion is that it can over-obscure the expected exception behavior.

@StefRe
Copy link
Contributor Author

StefRe commented Sep 2, 2022

@datapythonista Sounds good in principle but I just don't want to be the guy who insanely blew up the documentation by adding estimated 400 to 500 lines of traceback, making it practically unreadable.

So maybe we can do it just the other way round: introduce this one additional try/execpt to the 9 already existing in this tutorial and open another issue about replacing try/except constructs by (preferably shortened) tracebacks (there is a small number of try/expcepts in other pandas tutorials too). What do you think about this, @mroeschke ?

@mroeschke
Copy link
Member

I would slightly prefer #10715 addressed before adding more try/except patterns such that the other try/except patterns could be eliminated too.

@StefRe
Copy link
Contributor Author

StefRe commented Sep 3, 2022

just for myself a short summary of previous activity on this:

It's not mentioned what exactly was the error on CI (#10727 (comment) just states "doc builds are stopping since this commit").

Today, pandas uses the upstream IPython extension instead of its local copy (#19657), so we'd

  1. make a PR upstream
  2. verify that pandas docs build locally with this PR
  3. somehow use this updated upstream version for building the docs in CI and see if it still works.

This last part (# 3) is not clear to me how to accomplish - any thoughts about it?

@datapythonista
Copy link
Member

Limiting the tracebacks doesn't necessarily need to be addressed via ipython. We can also use other code directives in sphinx where we specify the output instead of being autogenerated, or maybe the python interpreter itself can be configured or monkeypatched to change the traceback behaviour.

Maybe for now you can explain in the docs about the exception instead of showing it, so we don't have the try/except or the long traceback.

@StefRe
Copy link
Contributor Author

StefRe commented Sep 3, 2022

We can also use other code directives in sphinx where we specify the output instead of being autogenerated

certainly one could use just code blocks (.. code-block:: ipython) as done here but in my opinion this is worse than try/except (in any case you'll need to manually adjust the command numbers to avoid such glitches where after # 68 come # 55 and 56 and then # 69:
image


My idea was more to change IPython's exception handling by setting a custom handler. This is rather straightforward, just want to see how to make it configurable via a parameter to okexcept or something. This would also cover cases where the error message stretches over more than one line (as in the concrete case here).

@StefRe StefRe marked this pull request as draft September 4, 2022 20:46
@StefRe
Copy link
Contributor Author

StefRe commented Sep 4, 2022

Converted to DRAFT until #10715 is resolved.

(I added an optional no_traceback argument to the :okexcept: option, works fine locally, will test it on CI next week:
image)

@mroeschke
Copy link
Member

mroeschke commented Sep 6, 2022

Thanks for looking into limiting the tracebacks @StefRe!

@github-actions
Copy link
Contributor

github-actions bot commented Oct 7, 2022

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Oct 7, 2022
@StefRe
Copy link
Contributor Author

StefRe commented Oct 7, 2022

respond to this comment if you're still interested in working on this.

still interested but IPython seem to ignore PR at the moment (no reaction whatsoever for a month already)

@simonjayhawkins
Copy link
Member

respond to this comment if you're still interested in working on this.

still interested but IPython seem to ignore PR at the moment (no reaction whatsoever for a month already)

ok let's close for now to clear the queue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Categorical Categorical Data Type Docs Stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants