PERF/API: fast paths for product MultiIndex? #15503
Comments
Creating a separate MI type of index for this would be quite a lot of work. The bottleneck is not indexing anyhow; it's the reshaping.
makes this about 10x faster, BUT there are several cases that are failing. You are welcome to have a look and see if this can pass the test suite.
(Note that my final version is about 2x slower than before, because I have to do multiple reshapings to get things in the correct order, but it is still about 4x faster.) You generally cannot simply do a reshaping, and esp. directly on
That's awesome, thanks! I was thinking we'd need a separate type to avoid checking the index each time, but I guess that's not an issue. With
.stack and .sorting are separate issues. Why don't you profile a bit and see where the hotspots are?
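One way to follow the profiling suggestion is sketched below, using the standard-library `cProfile` on `DataFrame.stack()`. The data shape (200 x 50) and the variable names are assumptions chosen for illustration, not values from the thread.

```python
import cProfile
import io
import pstats

import numpy as np
import pandas as pd

# Assumed example data: a Series on a full product index, unstacked to a frame.
index = pd.MultiIndex.from_product([range(200), range(50)], names=["a", "b"])
values = np.random.default_rng(0).standard_normal(len(index))
frame = pd.Series(values, index=index).unstack()

# Profile only the call of interest.
profiler = cProfile.Profile()
profiler.enable()
restacked = frame.stack()
profiler.disable()

# Collect the ten most expensive entries by cumulative time.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(10)
report = buffer.getvalue()
print(report)
```

Swapping `frame.stack()` for a `swaplevel().sortlevel()` chain in the profiled region gives the matching report for the sorting path.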
closes pandas-dev#15503 Author: Jeff Reback <[email protected]> Closes pandas-dev#15510 from jreback/reshape3 and squashes the following commits: ec29226 [Jeff Reback] PERF: faster unstacking
Feature Proposal
At the moment, we have a few different methods for storing indexed higher-dimensional arrays:
For some datasets, I've found the PMI to be the best option, together with occasional workarounds for performance bottlenecks. Operations which are slow for a general MultiIndex, like `unstack()` or `swaplevel().sortlevel()`, can be sped up for PMIs (see below).
It would be great if we could do something like this more generally, with fast paths for PMIs. We could maybe have `MultiIndex.from_product()` return a PMI object, which would upcast to MultiIndex when necessary. We could also have `stack()` and `unstack()` create PMI objects where possible, and perhaps add an argument to `concat()` and `set_index()` to create PMIs. Slow MultiIndex operations could then have a fast path for PMI objects.
Code Sample