BUG/PERF: perf issues in object groupby aggregations (GH7555) #7568

Merged on Jun 25, 2014 (1 commit)

Conversation

@jreback
Contributor

jreback commented Jun 25, 2014

related #7555

-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------
groupby_object_last                          |  15.2340 | 4175.6846 |   0.0036 |
groupby_object_first                         |  15.6957 | 4166.6920 |   0.0038 |
groupby_transform_ufunc                      |   5.4990 |   5.7740 |   0.9524 |
groupby_indices                              |   6.7443 |   7.0153 |   0.9614 |
groupby_multi_different_functions            |  12.6866 |  12.9040 |   0.9832 |
groupby_multi_series_op                      |  13.2560 |  13.4466 |   0.9858 |
groupby_multi_cython                         |  15.0924 |  15.2720 |   0.9882 |
groupby_multi_different_numpy_functions      |  11.1883 |  11.2376 |   0.9956 |
groupby_first_float32                        |   3.4907 |   3.4963 |   0.9984 |
groupby_last_float32                         |   3.6677 |   3.6586 |   1.0025 |
groupby_frame_apply_overhead                 |   9.2754 |   9.2440 |   1.0034 |
groupby_first                                |   3.5706 |   3.5577 |   1.0036 |
groupby_pivot_table                          |  17.2007 |  17.1343 |   1.0039 |
groupby_transform                            | 174.6823 | 173.8520 |   1.0048 |
groupby_frame_singlekey_integer              |   2.4533 |   2.4397 |   1.0056 |
groupby_frame_nth                            |   2.7863 |   2.7630 |   1.0085 |
groupby_simple_compress_timing               |  35.6836 |  35.3360 |   1.0098 |
groupby_sum_booleans                         |   1.2880 |   1.2727 |   1.0121 |
groupby_transform2                           | 162.5650 | 160.2323 |   1.0146 |
groupby_apply_dict_return                    |  39.7460 |  39.1597 |   1.0150 |
groupby_multi_size                           |  23.2656 |  22.9087 |   1.0156 |
groupby_datetimes_last                       |  12.0660 |  11.8797 |   1.0157 |
groupby_series_simple_cython                 | 192.2614 | 188.9360 |   1.0176 |
groupby_last                                 |   3.8017 |   3.7254 |   1.0205 |
groupby_int_count                            |   4.6777 |   4.5680 |   1.0240 |
groupby_mixed_first                          |  12.3363 |  12.0014 |   1.0279 |
groupby_frame_median                         |   7.6403 |   7.4034 |   1.0320 |
groupby_frame_apply                          |  45.0197 |  43.6123 |   1.0323 |
groupby_object_nth                           | 430.5693 | 416.2673 |   1.0344 |
groupby_datetimes_nth                        | 428.6557 | 414.3833 |   1.0344 |
groupby_frame_cython_many_columns            |   4.3843 |   4.2194 |   1.0391 |
groupby_multi_python                         | 142.0707 | 135.8123 |   1.0461 |
groupby_nth_float32                          |  63.5343 |  59.9463 |   1.0599 |
groupby_nth_float64                          |  64.0316 |  59.8179 |   1.0704 |
groupby_multi_count                          |   9.0540 |   8.2114 |   1.1026 |
-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------

Ratio < 1.0 means the target commit is faster than the baseline.
Seed used: 1234

Target [e28ec0d] : BUG/PERF: perf issues in object groupby aggregations (GH7555)
Base   [69bb0e8] : Merge pull request #7556 from onesandzeroes/expand-grid

DOC: Cookbook recipe for emulating R's expand.grid() (#7426)
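
For reading the table: the ratio column is head divided by base, so the two object-dtype rows correspond to a speedup of a few hundred times. A minimal sketch of that arithmetic, using three rows copied from the output above (the `ratio` helper is just an illustrative name, not part of vbench):

```python
# (head_ms, base_ms) pairs copied from the vbench output above
timings = {
    "groupby_object_last": (15.2340, 4175.6846),
    "groupby_object_first": (15.6957, 4166.6920),
    "groupby_multi_count": (9.0540, 8.2114),
}


def ratio(head_ms, base_ms):
    """head / base; a value below 1.0 means the target commit is faster."""
    return head_ms / base_ms


for name, (head, base) in timings.items():
    r = ratio(head, base)
    # 1 / r is how many times faster (or slower, if below 1) head is than base
    print(f"{name:<22} ratio={r:.4f} speedup={1 / r:.1f}x")
```

The first two rows reproduce the 0.0036 and 0.0038 ratios in the table; the last row shows what a ratio slightly above 1.0 looks like.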

jreback added this to the 0.14.1 milestone Jun 25, 2014
@cpcloud
Member

cpcloud commented Jun 25, 2014

hm sorry for the churn here, i would've thought that float64 output for object would be faster than object output

@cpcloud
Member

cpcloud commented Jun 25, 2014

moral of the story: i should vbench more often

@cpcloud
Member

cpcloud commented Jun 25, 2014

can you show the vbench?

@jreback
Contributor Author

jreback commented Jun 25, 2014

no the problem with the float64 was that it was blowing up on object dtypes themselves (e.g. strings), forcing a slow python path for those.
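
A toy illustration of the failure mode described here, in plain Python rather than pandas internals: a fast path that assumes a float64 buffer raises on strings, the exception is trapped, and the work silently moves to a generic path. The function name `first_per_group` and the use of `array("d", ...)` as a stand-in for a float64 output buffer are assumptions made for this sketch only.

```python
from array import array


def first_per_group(keys, values):
    """Return ({key: first value seen}, name of the path taken)."""
    try:
        # Fast path: only works if every value fits in a float64 buffer.
        buf = array("d", values)
        path = "fast"
    except TypeError:
        # Strings cannot be stored as float64. The error is swallowed
        # and a generic object path is used instead.
        buf = list(values)
        path = "fallback"
    out = {}
    for key, value in zip(keys, buf):
        out.setdefault(key, value)  # keep only the first value per key
    return out, path


print(first_per_group(["a", "a", "b"], [1.0, 2.0, 3.0]))
print(first_per_group(["a", "a", "b"], ["x", "y", "z"]))
```

Both calls return correct results, which is why this kind of regression shows up only in timings: nothing fails, the string case just takes the slower route.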

@cpcloud
Member

cpcloud commented Jun 25, 2014

ah ok

@cpcloud
Member

cpcloud commented Jun 25, 2014

is it just me or are these kind of hard to test...

@jreback
Contributor Author

jreback commented Jun 25, 2014

yeh...didn't have vbenches for them, that was a problem
further the exception messages are trapped....hmmm let me add that to #7569

I added vbenches for more dtypes

jreback added a commit that referenced this pull request Jun 25, 2014
BUG/PERF: perf issues in object groupby aggregations (GH7555)
jreback merged commit c98548b into pandas-dev:master Jun 25, 2014
Labels
Bug, Dtype Conversions (unexpected or buggy dtype conversions), Groupby, Performance (memory or execution speed performance)
2 participants