BENCH: programmatically create benchmarks for large ngroups (GH6787) #8410

Merged (2 commits) on Sep 30, 2014

dlovell commented Sep 28, 2014

closes #6787

Uses ngroups=10000 as suggested in the issue, which takes about 1 hour on my desktop.

For results (vb_suite.log, pkl file) see: https://gist.github.com/dlovell/ea3400273314e7612f6e
Note: the gist references a different commit hash. I changed the commit message and added a modification to doc/source/v0.15.0.txt, but the actual modifications to vb_suite/groupby.py are essentially the same.

jreback commented Sep 28, 2014

@dlovell this is great!

Can you add benchmarks for small groups as well (say 100?), named very similarly?

That might leave some duplicates (well, some existing ones that do very similar things), so you can eliminate those and just use your new ones.

Please also create an issue to investigate the perf of some of these (just paste your test run in the new issue for reference).

jreback added the Groupby and Performance (memory or execution speed) labels Sep 28, 2014
jreback added this to the 0.15.0 milestone Sep 28, 2014
dlovell commented Sep 30, 2014

Commit c494c03 runs the tests for ngroups values 100 and 10000. The output looks like:

groupby_ngroups_100_describe                 | 780.9353 |
groupby_ngroups_10_cumcount                  |   0.3180 |
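
For readers who want a feel for how names like the ones above can be generated in a loop, here is a minimal sketch. It is not the code from vb_suite/groupby.py: the method list, the setup template, and the helper name `make_benchmark_specs` are all assumptions made for illustration, and the sketch only builds (statement, setup) string pairs rather than registering them with any benchmark runner.

```python
# Hypothetical sketch: build benchmark names and code strings for several
# ngroups values. Nothing here is taken from the actual PR diff.
from itertools import product

NGROUPS_VALUES = [100, 10000]        # the two values mentioned in this thread
METHODS = ["describe", "cumcount"]   # only the two methods shown in the output

# Assumed setup code: a frame with roughly `ngroups` distinct keys.
SETUP_TEMPLATE = """\
import numpy as np
import pandas as pd

np.random.seed(1234)
ngroups = {ngroups}
size = ngroups * 2
df = pd.DataFrame({{
    "key": np.random.randint(0, ngroups, size=size),
    "value": np.random.randint(0, size, size=size),
}})
"""


def make_benchmark_specs(ngroups_values=NGROUPS_VALUES, methods=METHODS):
    """Return {benchmark_name: (statement, setup)} for every combination."""
    specs = {}
    for ngroups, method in product(ngroups_values, methods):
        name = "groupby_ngroups_{}_{}".format(ngroups, method)
        stmt = "df.groupby('key')['value'].{}()".format(method)
        setup = SETUP_TEMPLATE.format(ngroups=ngroups)
        specs[name] = (stmt, setup)
    return specs


specs = make_benchmark_specs()
for name in sorted(specs):
    print(name)
```

Each (statement, setup) pair could then be handed to whatever timing harness is in use. Keeping the name pattern fixed makes the small-group and large-group results easy to compare side by side.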

jreback commented Sep 30, 2014

@dlovell thanks!

Successfully merging this pull request may close these issues.

PERF: add vbenchs for groupby functions with different group sizes