-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Resolve Heisenbug in StringHashTable._unique #55530
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Closed
Changes from all commits
Commits
Show all changes
4 commits
Select commit
Hold shift + click to select a range
c300b46
Resolve Heisenbug in StringHashTable._unique
MichaelTiemannOSC b4157f0
Added test for #45929 and removed superfluous single_cpu mark
MichaelTiemannOSC fd9f388
Merge branch 'main' into issue_45929
MichaelTiemannOSC f33cdc0
Update v2.2.0.rst
MichaelTiemannOSC File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't the same issue happen when the function exits if
keep_rval_refs
gets GC'ed but the exception message still persists? This may delay the issue from happening but I'm not sure it fully solves itThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is bounded in time between the scanning of
values
in the firstrange(n)
loop and the finalization ofuniques
within the function. The exception message does not convey the ephemeral repr value. Anduniques
consists only of Python objects. When the function returns, only Python objects, not ephemeral C objects, are reachable.To say this a bit more clearly, we only need the ephemeral repr values to live long enough to become Python objects in the
kh_get_str/kh_put_str
processing (the secondrange(n)
loop).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't the return value have the same lifecycle issues though? The lifecycle of
uniques
that is getting returned as well asself.table
outlive this function, so oncekeep_rval_refs
gets GC'ed I think you still have the same issueThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not at all, for two reasons. The first reason is that the rvals in question only need to be kept alive because they are what controls whether the ephemeral C strings live or die when the next GC runs. When there is no longer a need to reference those C strings, there's no need to protect the C strings from GC. The second reason is that the thing that keeps the rvals alive is an actual live reference to the rval. That is what
keep_rval_refs
does. Anything thatself.table
can reference, anything thatuniques
can reference is just as live a reference as anything else. So the very fact thatself.table
oruniques
live beyond the scope of this function is all that's needed to keep the Python objects they contain from being reaped. There's nothing special about the contents ofkeep_rval_refs
after the function exits.The
vecs
vector contains halflings (C strings contained within Python objects). Once we no longer need to touch the C strings, once we are only dealing with full-fledged Python objects, we can rely on Python's GC to do the right thing with the objects we still need and with the objects we no longer need. It is because we need to touch the strings within ephemeral repr objects that we have to keep those objects alive as long as the strings need to live. Not longer, and not because somebody else also might need the Python objects (because that need itself will keep the object alive).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The solution here is to figure out how to use the CPython API and not introduce any new lifecycle management. I am guessing that
PyUnicode_AsUTF8AndSize
is throwing an error on the surrogate pair, since that is what happens in the interpreter:Is there a way to handle that surrogate better in the C-API? (we can get rid of
get_c_string
in the first place; I think this was just left around as a Py2/3 compat function but is equivaelent toPyUnicode_AsUTF8AndSize
)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that the initial implementation touches a live wire when it reaches into Python to touch its C strings. All I did was create the industrial-strength insulating gloves to make that safe. Alas, what I don't know about Unicode could fill several books, so I would defer to somebody who has that knowledge. I just want to get my complex number tests integrated 🤓