Unstack performance regression #19289

Closed
TomAugspurger opened this issue Jan 17, 2018 · 3 comments · Fixed by #20703
Labels
Performance Memory or execution speed performance

@TomAugspurger
Contributor

http://pandas.pydata.org/speed/pandas/#reshape.SparseIndex.time_unstack

#18460 (comment)

cc @toobaz

@TomAugspurger added the Performance (Memory or execution speed performance) label Jan 17, 2018
@TomAugspurger added this to the 0.23.0 milestone Jan 17, 2018
@TomAugspurger
Contributor Author

@toobaz do you have time to take a look at this for 0.23?

@toobaz
Member

toobaz commented Mar 29, 2018

I'll try to look at this, and hopefully fix it, next week.

@toobaz
Member

toobaz commented Apr 15, 2018

@TomAugspurger .unstack() needs to find unused levels. The problem is that since #18460, this is done twice for each level: once when building the index and once when building the values (previously, it was done twice only for the level being unstacked). There is room for improving the code, but it's not simple, and I won't have time soon.
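For readers unfamiliar with "unused levels", here is a minimal illustration, assuming a recent pandas. The index and variable names are made up for this example and are not taken from the benchmark.

```python
import pandas as pd

# Illustrative index (not from the benchmark): 2 x 3 combinations.
mi = pd.MultiIndex.from_product([["a", "b"], [1, 2, 3]], names=["key", "num"])

# Filtering rows does not shrink the levels: 3 stays in the "num" level
# even though no row refers to it any more.
sub = mi[mi.get_level_values("num") != 3]
print(list(sub.levels[1]))      # level still lists 1, 2, 3

# remove_unused_levels() rebuilds the levels so only used entries remain.
cleaned = sub.remove_unused_levels()
print(list(cleaned.levels[1]))  # only 1 and 2 remain
```

The cleaned index compares equal to the filtered one; only its internal levels differ.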

#20703 recovers the performance drop by just making MultiIndex.remove_unused_levels more performant when there are no unused levels. It does not tackle the problem of checking twice, nor does it bring any improvement when all levels have unused items. We can leave this bug open if you want to keep a reminder for a more general refactoring.
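The general idea of a "nothing unused" fast path can be sketched in plain Python. This is only an illustration under stated assumptions, not the code from #20703: `remove_unused` is a hypothetical helper, a level is modelled as a list of labels, and codes are integer positions into it, with -1 assumed to mark a missing value.

```python
def remove_unused(level, codes):
    """Hypothetical helper: drop level entries that no code refers to.

    `level` is a list of labels; `codes` are integer positions into
    `level` (-1 is assumed to mark a missing value). Not pandas code.
    """
    used = sorted({c for c in codes if c != -1})

    # Fast path: every level entry is referenced, so return the inputs
    # unchanged instead of rebuilding anything.
    if len(used) == len(level):
        return level, codes

    # Slow path: keep only the used entries and remap codes to the
    # new, compacted positions.
    remap = {old: new for new, old in enumerate(used)}
    new_level = [level[old] for old in used]
    new_codes = [remap[c] if c != -1 else -1 for c in codes]
    return new_level, new_codes


# Entry "b" (position 1) is unused, so the slow path runs.
print(remove_unused(["a", "b", "c"], [0, 2, 2, -1]))
```

When all entries are in use, the function returns the very same objects it was given, which is where the saving would come from. As noted above, such a shortcut does not help when every level has unused items.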

@jreback modified the milestones: Next Major Release, 0.23.0 Apr 15, 2018