-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
PERF: Fix performance regression with Series statistical ops (#25952) #25953
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PERF: Fix performance regression with Series statistical ops (#25952) #25953
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls create an asv for this or show a current one that is affected
u don’t need a note as this is unreleased code
Can you post ASV results for this? Also does this impact performance of DataFrame ops? |
Here is the affected ASV tests:
Most I'll run a full ASV tonight to see if there are any other major affected areas. |
Codecov Report
@@ Coverage Diff @@
## master #25953 +/- ##
==========================================
- Coverage 91.82% 91.8% -0.02%
==========================================
Files 175 175
Lines 52581 52582 +1
==========================================
- Hits 48280 48271 -9
- Misses 4301 4311 +10
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #25953 +/- ##
==========================================
- Coverage 91.98% 91.96% -0.02%
==========================================
Files 175 175
Lines 52372 52369 -3
==========================================
- Hits 48172 48161 -11
- Misses 4200 4208 +8
Continue to review full report at Codecov.
|
whatsnew line has been removed. Here are the 'full' ASV results. I re-ran all the deviant tests and removed those that weren't consistent. Here are the final results:
Notably, |
Hello @ArtificialQualia! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2019-04-28 20:04:36 UTC |
Anything more needed on this PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you show the asv's that improved for this change. I believe the regression is in 0.25, correct? so no whatsnew is then needed. (pls confirm)
@@ -520,21 +520,16 @@ def _get_grouper(obj, key=None, axis=0, level=None, sort=True, | |||
any_arraylike = any(isinstance(g, (list, tuple, Series, Index, np.ndarray)) | |||
for g in keys) | |||
|
|||
try: | |||
if (not any_callable and not any_arraylike and not any_groupers and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add some comments here on what is being checked
merge master as well |
…x-groupby-performance
Master has been merged Correct, the change this is fixing was merged in 0.25. No need for a whatsnew. (see #25743) Here are the ASV results from a previous comment:
|
thanks |
git diff upstream/master -u -- "*.py" | flake8 --diff
#25743 introduced a performance regression in
Series
MultiIndex
statistical operations.This happened due to the additional required check for Series groupby operations that already existed for
DataFrame
. By ensuring that the check needs to be run, this regression can be avoided also potentially slightly speeding up similarDataFrame
operations.