-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ERR: Shorten traceback in groupby _cython_agg_general #52992
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
What's your opinion on raising
instead? This seems more deliberate than an internal bug like th other does to me |
I think that message would be misleading, e.g.
mean on dtype object is implemented, it just takes the Python fallback path.
Not sure what you mean by internal bug here. |
If I get an error message like the above, my first instinct would be that the maintainers of the library made a mistake, not that my computation is wrong. Wdyt about an error message like: |
My reaction to this is "of course you can't convert, I shouldn't be trying an operations that tries to convert the string to numeric".
What is "type" referring to here? It seems to me users could interpret this as object dtype which is technically not accurate. |
Yeah this is the tricky part, specifying what exactly is wrong. I think I illustrated my example in a bad way. It's hard to see which operation was wrong with this error. For example:
No reference to column/operation and dtype isn't great I think |
I think your last line should be:
Agreed this should be improved. |
@phofl - how does something like this look:
The negatives in my opinion are that it's back to being rather long, and it seems to me the user could think it means the function is not implemented for object dtype (which can be incorrect). However, it specifies the method of aggregation, the dtype, and the value on which it failed which can help users understand where it's coming from. |
I'd prefer that one. In general, I am not a fan that this works sometimes for object, but this is a different discussion. This message is much clearer based on what I'd expect |
…pby_mean_stack � Conflicts: � doc/source/whatsnew/v2.1.0.rst
@phofl - I reworked the raising here. We now raise the same type of exception as main; I think this is better than raising a NotImplementedError in general, and has the added benefit of being less likely to break user code. |
@phofl - friendly ping. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Sorry I was travelling the last 3 weeks
msg = r"Cannot convert \['one' 'three' 'two'\] to numeric" | ||
if agg_function in ("std", "sem"): | ||
klass = ValueError | ||
msg = "could not convert string to float: 'one'" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the difference intended here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously we checked multiple possibilities for the message, of which this was one. By separating out the ValueError from the TypeError cases, we only have to check the message associated with the ValuError.
At least, I think that's what you're asking about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I meant that std and see have different errors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see! I do think there is something to improve here. If the algorithm actually tries to convert the value to a numeric one and fails, that to me is a ValueError. If, on the other hand, it sees that the object is a string and then doesn't even make an attempt to convert it, that would be a TypeError.
It would be good if we made this consistent by either (a) always trying to convert or (b) never trying to convert. However, even without this consistency, currently mean
et al don't try to make the conversion and still raise "Could not convert":
df = DataFrame({'a': [1, 1, 2], 'b': list('456')})
gb = df.groupby('a')
gb.mean()
# TypeError: Could not convert string '45' to numeric
This should be improved, but separate from this PR.
thx @rhshadrach |
How would you be able to solve this? import pandas as pd highest_spread = df.sort_values('Spread', ascending=False) ERR: TypeError Traceback (most recent call last) 15 frames The above exception was the direct cause of the following exception: TypeError Traceback (most recent call last) TypeError: agg function failed [how->median,dtype->object] |
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.The code
now raises