New engine for MultiIndex? #18519
Comments
See also previous discussion in #1752.
Aha, and #16324, which was a response to #16319, which was a reaction to #15245, which was a response to #13904, which was a follow-up to the issue @chris-b1 mentioned. And while the
that is, large indexes are free from #18485 and friends.
OK, after reviewing this evidence, I guess we probably don't need a new engine: but I think we should (be able to) improve the |
closes pandas-dev#18519 closes pandas-dev#18818 closes pandas-dev#18520 closes pandas-dev#18485 closes pandas-dev#15994
closes pandas-dev#18519 closes pandas-dev#18818 closes pandas-dev#18520 closes pandas-dev#18485 closes pandas-dev#15994 closes pandas-dev#19086
Currently, `MultiIndex.get_loc()` and `MultiIndex.get_indexer()` both rely on an `_engine` which is either a `MultiIndexObjectEngine` or a `MultiIndexHashEngine`: but both of these are thin layers over the flat `ObjectEngine`. This means that the actual structure of labels and levels is completely discarded (except e.g. for partial indexing, see `_get_level_indexer()`).

In principle, a completely different scheme could be used:

- first, look up each element of the key in the `levels`, and find the corresponding code
- then, search for those codes in the `.labels`

In most cases, the second part should be the computationally expensive one. It would consist of running `nlevels` searches in arrays of `dtype=int` (the `.labels`) rather than (as it is now) one search in an `object` array in which each element is actually a `tuple` of `nlevels` elements. My guess is that thanks to vectorization the former should be much faster than the latter.

Moreover (and maybe more importantly), with the current engine fixing a bug such as #18485 is a nightmare. And the same applies to [...] and probably others. This is because even though levels are not mixed, the elements are compared as `objects`.

One caveat is that the single levels would very often be non-unique, and I'm not sure what the impact of this is with the current implementation of hash tables.
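To make the proposed scheme concrete, here is a minimal pure-Python sketch of the two steps (per-level lookup of codes, then `nlevels` searches over integer codes). It is an illustration under stated assumptions, not the pandas implementation: `levels`, `labels`, `level_maps` and `get_locs` are hypothetical stand-ins using plain lists, and a real engine would presumably run the second step on vectorized integer arrays rather than Python loops.

```python
# Hypothetical, simplified stand-ins for MultiIndex.levels and MultiIndex.labels.
# The index represented here is: (a, 1), (a, 3), (b, 1), (b, 2), (b, 3).
levels = [["a", "b"], [1, 2, 3]]
labels = [
    [0, 0, 1, 1, 1],  # integer codes into levels[0]
    [0, 2, 0, 1, 2],  # integer codes into levels[1]
]

# Step 1 support: one small lookup table per level (value -> integer code).
level_maps = [
    {value: code for code, value in enumerate(level)} for level in levels
]


def get_locs(key):
    """Return the positions whose entry equals `key`; raise KeyError if none."""
    if len(key) != len(levels):
        raise KeyError(key)

    # Step 1: translate each element of the key into its per-level code.
    try:
        codes = [level_map[k] for level_map, k in zip(level_maps, key)]
    except KeyError:
        raise KeyError(key) from None

    # Step 2: nlevels searches over integer codes, narrowing the candidates.
    # Each level is typically non-unique, so a single search can match
    # many positions; only the intersection identifies the full key.
    candidates = range(len(labels[0]))
    for level_labels, code in zip(labels, codes):
        candidates = [i for i in candidates if level_labels[i] == code]

    if not candidates:
        raise KeyError(key)
    return candidates


print(get_locs(("b", 2)))  # position of the entry (b, 2)
```

Note that the comparison in the second step only ever involves integers, and each key element is only compared against values of its own level. The sketch also shows where the non-uniqueness caveat bites: each per-level search returns a set of positions rather than a single one, so the cost depends on how those sets are intersected.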