
PERF: use fastpath=True in Index methods (delete/drop/insert/etc)? #6933


Closed
immerrr opened this issue Apr 22, 2014 · 3 comments · Fixed by #7040
Labels
Indexing Related to indexing on series/frames, not to indexes themselves Performance Memory or execution speed performance

Comments

@immerrr
Contributor

immerrr commented Apr 22, 2014

Been hit by this when optimizing index-oblivious Blocks. In my case,

is_deleted = np.zeros(len(index), dtype=np.bool_)
is_deleted[deleted_loc] = True
index = index[~is_deleted]

was still a lot faster than index.delete(deleted_loc).
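A minimal sketch of the mask-based deletion above, using a plain NumPy object array as a stand-in for the index's values (the helper name `delete_by_mask` is hypothetical). Boolean indexing keeps the existing dtype, so there is no per-element inference step.

```python
import numpy as np

def delete_by_mask(values, deleted_loc):
    """Drop the positions in `deleted_loc` via a boolean keep-mask.

    Hypothetical helper mirroring the snippet in the comment; `values`
    stands in for an Index's underlying ndarray.
    """
    is_deleted = np.zeros(len(values), dtype=np.bool_)
    is_deleted[deleted_loc] = True
    # Boolean indexing returns a new array with the same dtype,
    # so no element-by-element type inference happens here.
    return values[~is_deleted]

values = np.array(["a", "b", "c", "d", "e"], dtype=object)
result = delete_by_mask(values, [1, 3])
print(list(result))   # ['a', 'c', 'e']
print(result.dtype)   # object

# np.delete on the raw array gives the same values for the same positions.
assert list(np.delete(values, [1, 3])) == list(result)
```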

It appears that delete doesn't pass fastpath=True to the constructor, and that triggers type inference for string (object) indices. There seem to be plenty of methods that do it the same way and hence are slow too; can we do something about that?
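A rough sketch of what passing fastpath=True buys, using a hypothetical `ToyIndex` class rather than the actual pandas Index constructor (whose internals are not shown in this thread). The slow path inspects every element to pick a dtype; `delete` hands an already-typed array back with `fastpath=True` and skips that pass.

```python
import numpy as np

class ToyIndex:
    """Hypothetical stand-in for an Index whose constructor can skip inference."""

    inference_calls = 0  # counts how often the slow path ran

    def __init__(self, data, fastpath=False):
        if fastpath:
            # Trust the caller: data is already a correctly typed ndarray.
            self._data = data
        else:
            ToyIndex.inference_calls += 1
            self._data = self._infer(data)

    @staticmethod
    def _infer(data):
        arr = np.asarray(data, dtype=object)
        # Slow path: visit every element to decide on a narrower dtype.
        if all(isinstance(x, (int, np.integer)) for x in arr):
            return arr.astype(np.int64)
        return arr

    def delete(self, loc):
        # Reuse the existing dtype and skip inference on the result.
        return ToyIndex(np.delete(self._data, loc), fastpath=True)

idx = ToyIndex(["a", "b", "c"])
trimmed = idx.delete(1)
print(ToyIndex.inference_calls)  # 1: only the original construction inferred
print(list(trimmed._data))       # ['a', 'c']
```

The counter stays at 1 after `delete`, which is the saving being asked for; the catch is that the result never gets a chance to narrow its dtype.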

@jreback
Contributor

jreback commented Apr 22, 2014

Sure, want to do a quick PR?

vbenches are a plus but not that necessary; maybe a spot check is OK.

@jtratner
Contributor

So what's the downside here?

@immerrr
Contributor Author

immerrr commented Apr 24, 2014

@jtratner, it's the same as with one of my earlier PRs about __getitem__: we drop the type inference that might've changed the index type after non-matching elements are deleted. That is, if we go with the "trivial" solution.
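To illustrate the downside with plain NumPy (the `infer_dtype_slow` helper is hypothetical and much cruder than real index inference): if a mixed object array loses its only string, the trivial fastpath keeps dtype object, whereas re-running inference would narrow it to int64.

```python
import numpy as np

def infer_dtype_slow(values):
    """Hypothetical per-element inference: narrow to int64 if every item is an int."""
    if all(isinstance(x, (int, np.integer)) for x in values):
        return values.astype(np.int64)
    return values

mixed = np.array(["a", 1, 2], dtype=object)
fast = np.delete(mixed, 0)      # "trivial" fastpath: dtype carried over
slow = infer_dtype_slow(fast)   # re-inferred after the deletion

print(fast.dtype)  # object
print(slow.dtype)  # int64
```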

If the trivial solution is not required, dtype inference could be preserved without traversing every element of the dtype=O array. If one kept an int8 array of object class codes (or int16 if 255 classes are not enough) alongside the main one, then instead of enumerating all entries each time, a single np.bincount(x.ravel()).nonzero() would suffice to find out which classes are present. The memory overhead of that int8 array should be acceptable, but keeping the two arrays in sync may take some non-trivial effort.
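A minimal sketch of the proposed side array, assuming a hypothetical `TYPE_CODES` table that maps each Python class to a small integer code. The object array is scanned once to build the codes; afterwards, which classes are present is read off `np.bincount(...).nonzero()` on the codes alone, and every deletion has to be applied to both arrays to keep them in sync.

```python
import numpy as np

# Hypothetical class-code table: each object type gets a small integer code.
TYPE_CODES = {str: 0, int: 1, float: 2}
CODE_NAMES = {code: t.__name__ for t, code in TYPE_CODES.items()}

def build_codes(values):
    """One pass over the object array to build the parallel int8 codes array."""
    return np.array([TYPE_CODES[type(v)] for v in values], dtype=np.int8)

def present_types(codes):
    """Find which classes occur without touching the object array."""
    present = np.bincount(codes.ravel()).nonzero()[0]
    return {CODE_NAMES[int(c)] for c in present}

values = np.array(["a", 1, 2, 3.5], dtype=object)
codes = build_codes(values)

# Deleting from the index must delete the same positions from `codes`.
keep = np.ones(len(values), dtype=np.bool_)
keep[0] = False
values, codes = values[keep], codes[keep]

print(sorted(present_types(codes)))  # ['float', 'int']
```

np.bincount only accepts non-negative integers, so a signed int8 leaves codes 0..127 usable; uint8 would allow 256 classes before a wider type such as int16 is needed.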
