ENH: Add mode method to Series and DataFrame #5380

Merged
jtratner merged 2 commits into pandas-dev:master from add-mode-to-series-and-frame
Nov 5, 2013

jtratner
Contributor

Closes #5367. Generally as fast as or faster than value_counts (which makes sense, because value_counts has to construct a Series), so it should be relatively good performance-wise. It also doesn't get stuck on value_counts' pathological case (a huge array where the number of unique values is close to or equal to the size of the array).

Not using the result of hashtable.value_count() under the hood, and instead iterating over the klib table directly, gives a huge speedup. I just moved the value_count table creation into a separate function (zero perf hit). For performance breakdowns, check out this gist:

https://gist.github.com/jtratner/7225878
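
As a rough illustration of the idea described above (and not the Cython/klib code in this PR), the mode can be read straight off a table of counts in one pass, without first building a full sorted result the way value_counts does. The helper name `mode_from_counts` is hypothetical, a plain dict stands in for the hash table, and returning every tied value is an assumption of this sketch:

```python
def mode_from_counts(values):
    """Return every most-frequent value, sorted.

    Hypothetical pure-Python stand-in: a dict plays the role of the
    hash table of counts. This is not the pandas implementation.
    """
    # Step 1: build the table of counts (value -> number of occurrences).
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1

    # Step 2: iterate over the table directly, tracking the highest
    # count seen so far and the values that reach it.
    best = 0
    modes = []
    for value, count in counts.items():
        if count > best:
            best = count
            modes = [value]
        elif count == best:
            modes.append(value)
    return sorted(modes)


print(mode_from_counts([1, 2, 2, 3, 3]))
```

Ties are all reported, and an empty input gives an empty list. How the merged method treats values that occur only once is not covered by this sketch.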

DataFrame version delegates to Series' version at each level.
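
The delegation can be pictured with a small pure-Python sketch, assuming a "frame" is just a dict mapping column names to lists. Both `series_mode` and `frame_mode` are hypothetical names for illustration, not the functions added in this PR:

```python
from collections import Counter


def series_mode(values):
    """Hypothetical one-column helper: every most-frequent value, sorted."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


def frame_mode(columns):
    """Apply series_mode to each column of a dict-of-lists 'frame'."""
    return {name: series_mode(vals) for name, vals in columns.items()}


result = frame_mode({"A": [1, 1, 2], "B": ["x", "y", "y"], "C": [5, 6, 5]})
print(result)
```

Each column is handled independently, so columns can end up with different numbers of modes. This sketch leaves them as lists of unequal length rather than aligning them into a table.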

# and also when not empty
df["A"][1] = 0
df["B"][4] = df["B"][3]
df["C"][5] = 'e'
Contributor

chained indexing! not in the tests!

Contributor Author

my bad - it's so natural feeling haha

@jtratner
Contributor Author

@jreback should I put this on hold until after 0.13?

@jreback
Contributor

jreback commented Oct 30, 2013

@jtratner seems reasonably tested... I think it's ok. Need a doc mention? (in api, but near value_counts somewhere?)

@jtratner
Contributor Author

Yeah, I'll add it.

@jorisvandenbossche
Member

Can you also add entries to api.rst?

@jtratner
Contributor Author

jtratner commented Nov 4, 2013

I missed putting it there. I'll do that--thanks for noticing that!

@jtratner
Contributor Author

jtratner commented Nov 5, 2013

added doc mention for Series and DataFrame

Abstract building up the tables with counts so value_count and mode can share. Aww!
DOC: Add documentation about mode()
jtratner added a commit that referenced this pull request Nov 5, 2013
ENH: Add mode method to Series and DataFrame
@jtratner jtratner merged commit 2d2e8b5 into pandas-dev:master Nov 5, 2013
@jtratner jtratner deleted the add-mode-to-series-and-frame branch November 5, 2013 06:00
Successfully merging this pull request may close these issues.

Add mode() function to pandas.Series
3 participants