-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
DOC: Clarify sorting and order of categoricals and fix typo #47846
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Without the addition in the text it‘s unclear, what s.sort_values(inplace=True) should demonstrate. Show some meaningful output now. Also, when using astype to contrast ordered/unordered behavior, I think it‘s clearer to use the existing, unordered Series than to create a new one.
s.sort_values(ascending=False) | ||
try: | ||
s.min() | ||
except TypeError as e: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would better to demonstrate this with the :okexcept:
directive
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, if I look at examples where the :okexcept:
directive is used, for instance here or here, I'm not sure if this is so good an idea in the given case, as it inserts quite a lengthy (~15 lines) traceback which, in my opinion, distracts from reading and makes the example less clear than with try
/ except
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you show an image of how the docs would render if :okexcept:
was used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry to clarify, how would the docs you modified in this PR render with :okexcept:
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would still be partial to adding :okexcept:
here. Though verbose, it shows the exact behavior that the user should see.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm afraid I can't agree here: IMO the purpose here is just to demonstrate that an exception will be raised when trying to do min
or max
on a not ordered categorical series. If we then catch it or let it be thrown doesn't play a role for this purpose, so I'd prefer the visually less distracting option of catching it. This is the way it's done all the way in the given section of the user guide (9 times), why should we treat this 10th time differently? Maybe we can leave the try/except
here just for uniformity's sake.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I think those other sections should also use :okexcept:
as well.
except TypeError as e: | ||
print(f"TypeError: {e}") | ||
s = s.astype(CategoricalDtype(ordered=True)) | ||
s.sort_values() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this will need to be s = s.sort_values()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course you could write s = s.sort_values()
followed by s
to show the result but as we don't need the sorted series further down (I mean for the following code it doesn't play a role if s
has been sorted or not) I think this could be shortened to just s.sort_values()
which will output the returned sorted Series.
In principle, one could remove this line completely, because if we can sort an unordered categories series, we can sort an ordered categories series all the more (it's a leftover from the beginning, when sorting of an unordered categories series was not allowed, this was changed in 87fec5b).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure removing this line would be better then
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@StefRe what do you think if you implement the suggested changes here, and then you open an issue to discuss the problem of too long tracebacks. I totally see your point on that, and would be great to not have these huge tracebacks in the middle of the documentation. But let's better discuss that separately, maybe there are better solutions than using try/except blocks, which aren't great either. Thanks!
xref #10715 for a discussion on limiting tracebacks. I do sympathize and agree that having very long tracebacks in the documentation can be distract, but my primary objection with |
@datapythonista Sounds good in principle but I just don't want to be the guy who insanely blew up the documentation by adding estimated 400 to 500 lines of traceback, making it practically unreadable. So maybe we can do it just the other way round: introduce this one additional try/execpt to the 9 already existing in this tutorial and open another issue about replacing try/except constructs by (preferably shortened) tracebacks (there is a small number of try/expcepts in other pandas tutorials too). What do you think about this, @mroeschke ? |
I would slightly prefer #10715 addressed before adding more |
just for myself a short summary of previous activity on this:
It's not mentioned what exactly was the error on CI (#10727 (comment) just states "doc builds are stopping since this commit"). Today, pandas uses the upstream IPython extension instead of its local copy (#19657), so we'd
This last part (# 3) is not clear to me how to accomplish - any thoughts about it? |
Limiting the tracebacks doesn't necessarily need to be addressed via ipython. We can also use other code directives in sphinx where we specify the output instead of being autogenerated, or maybe the python interpreter itself can be configured or monkeypatched to change the traceback behaviour. Maybe for now you can explain in the docs about the exception instead of showing it, so we don't have the try/except or the long traceback. |
certainly one could use just code blocks ( My idea was more to change IPython's exception handling by setting a custom handler. This is rather straightforward, just want to see how to make it configurable via a parameter to |
Converted to DRAFT until #10715 is resolved. (I added an optional |
Thanks for looking into limiting the tracebacks @StefRe! |
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this. |
still interested but IPython seem to ignore PR at the moment (no reaction whatsoever for a month already) |
ok let's close for now to clear the queue. |
Without the addition in the text it‘s unclear, what
s.sort_values(inplace=True)
should demonstrate. Show some meaningful output now. Also, when using
astype
tocontrast ordered/unordered behavior, I think it‘s clearer to use the existing,
unordered Series than to create a new one.
Fix minor typo in categorical.py.