ENH: Add mode method to Series and DataFrame #5380
# and also when not empty
df["A"][1] = 0
df["B"][4] = df["B"][3]
df["C"][5] = 'e'
chained indexing! not in the tests!
my bad - it's so natural feeling haha
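For reference, the same three assignments can be written without chained indexing by using a single `.loc` call per assignment. This is only a sketch: the frame below is a hypothetical stand-in, since the actual test fixture is not shown in this thread, and it assumes the default integer index so that labels and positions coincide.

```python
import pandas as pd

# Hypothetical frame; the real test data is not part of this thread.
df = pd.DataFrame({
    "A": [12, 12, 11, 12, 19, 11],
    "B": [10, 10, 10, 5, 3, 4],
    "C": list("abcdef"),
})

# One indexing operation per assignment, instead of df["A"][1] = 0 etc.
df.loc[1, "A"] = 0
df.loc[4, "B"] = df.loc[3, "B"]
df.loc[5, "C"] = "e"
```

With `.loc`, the write goes to `df` itself rather than to an intermediate object returned by `df["A"]`.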
@jreback should I put this on hold until after 0.13?
@jtratner seems reasonably tested... I think it's ok, need a doc mention? (in api, but near value_counts somewhere?)
Yeah, I'll add it.
Can you also add entries to api.rst?
I missed putting it there. I'll do that. Thanks for noticing that!
added doc mention for Series and DataFrame
Abstract building up the tables with counts so value_count and mode can share. Aww!
DOC: Add documentation about mode()
ENH: Add mode method to Series and DataFrame
Closes #5367. Generally as fast as or faster than value_counts (which makes sense, because value_counts has to construct a Series), so performance should be relatively good. Also, it doesn't get stuck on value_counts' pathological case (a huge array where the number of uniques is close to or equal to the size of the array).
Not using the result of hashtable.value_count() under the hood, and instead iterating over the klib table directly, gives a huge speedup. I just moved the value_count table creation into a separate function (zero perf hit). For performance breakdowns, check out this gist:
https://gist.github.com/jtratner/7225878
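The idea of sharing one count table and scanning it directly can be illustrated with a rough pure-Python analogy. This is not the PR's Cython code: a plain `dict` stands in for the klib hash table, and the function names `build_count_table` and `mode_from_table` are hypothetical.

```python
def build_count_table(values):
    """Count occurrences of each value; a dict stands in for the C hash table."""
    table = {}
    for v in values:
        table[v] = table.get(v, 0) + 1
    return table


def mode_from_table(table):
    """Scan the table once, keeping every key tied for the highest count."""
    best = 0
    modes = []
    for key, count in table.items():
        if count > best:
            best, modes = count, [key]
        elif count == best:
            modes.append(key)
    return sorted(modes)


table = build_count_table([1, 2, 2, 3, 3])
modes = mode_from_table(table)
```

The point of the sketch is that the mode falls out of a single pass over the table, without first materializing a sorted counts object the way a value-counts result would.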
DataFrame version delegates to Series' version at each level.
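A small sketch of what that delegation looks like from the user's side, assuming the `Series.mode` and `DataFrame.mode` methods this PR adds. The data is made up for illustration, and the `apply` call is an approximation of the delegation rather than a copy of the PR's code.

```python
import pandas as pd

# Made-up data: column A has two tied modes, column B has one.
df = pd.DataFrame({"A": [1, 1, 2, 2], "B": [3, 3, 3, 4]})

# Series-level mode for one column.
a_modes = df["A"].mode()

# Frame-level mode; columns with fewer modes are padded with NaN.
frame_modes = df.mode()

# Roughly equivalent: apply the Series method to each column.
per_column = df.apply(lambda s: s.mode())
```

Because columns can have different numbers of modes, the frame-level result has as many rows as the column with the most modes.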