-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: break reference cycle in Index._engine #27607
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is much better done by removing @cache_readonly
and then use a weakref with the same types of caching (because cache_readonly does this in a very specific way)
@jreback I don't really understand what you mean. The issue is not the cache but that |
and I am not really sure what you suggested actually does anything. so let's revisit. what is the actual problem? having a cyclic reference in not a bug. |
You're right. I should have labeled it as a performance optimization. The issue with the reference cycle is that (similar to the one we had some time ago with the cycle between the indexer and the dataframe) that indices are not cleared from memory w/o running the GC. Under notebook / low-load conditions, this might not be an issue. For high-load system like dask / dask.distributed, this can be an issue since the python interpreter requires more memory than it should (peak memory consumption). I have actually observed exactly this issue under dask.distributed and depending on the load pattern, users can expect a temporary (i.e. until the GC finds the Index) over a gigabyte per worker. So what I would like to archive with this PR is that the index does not have a reference cycle and is cleared instantly when not used anymore. If we agree on that change, I'll change the changelog entry and commit message before merging to make sure this is not labeled as a bug but as an enhancement. |
not objecting to the change in principle. But not sure how adding an indirection different from the lambda helps here. |
I've changed it to a more elegant solution and added a comment why this is helpful. |
can you add an asv for this as well (the memory kind) |
Did so. Not sure if this is the right place, but it demonstrates the issue as well: Before (current master):
After (this PR):
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tiny comment, otherwise lgtm. don't push yet though, the CI is acting up.
86814d2
to
2923512
Compare
lgtm. you have some lint issues I think. ping on green. |
@jreback ping :) |
Thanks @crepererum! |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff