-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Fix segfault with printing dataframe #47097
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
roberthdevries
wants to merge
4
commits into
pandas-dev:main
from
roberthdevries:46848-fix-segfault-when-printing-dataframe
Closed
Changes from all commits
Commits
Show all changes
4 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any idea why this makes a difference? this looks deeply weird
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbrockmendel i've posted the arguments that cause
take_2d_axis1_object_object
to segfault in #46848 (comment) if that helps.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps should explicitly check for NULL for clarity. see #46848 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps should explicitly check for NULL for clarity. see #46848 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we fix here, probably should also patch
take_2d_axis0_
, see #46848 (comment)@roberthdevries would changing L159 like done in https://github.com/pandas-dev/pandas/pull/13764/files also work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmmpff, It seems
empty_like
does not fill the object array withNone
, but just ensuresNULL
'ing. NumPy supports this (mostly?), it is perfectly fine if an array is filled with NULL's, this is the same as if the array was filled withNone
.It is probably best if you also add support for that in pandas. That said, it would also be better to explicitly fill the array with
None
in NumPy. (And I do think we should define that as the right thing, even if there are branches where NULL's must be expected a fully initialized array filled withNULL
s should never happen!)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To proof the point:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@seberg How would you effectively detect
NULL
values in a numpy array? I assume that @simonjayhawkins prefers a fix somewhere in thesetitem
method ofDataFrame
, but I am at a loss how to intercept those values and convert them toNone
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, it seems to me that cython may not know about the possibility of NULLs for object typed buffers (either that way or typed memoryviews). The generated code doesn't look like it. So I am not sure what a decent way forward would be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, a work-around that is
relativelyquick may be the following:But maybe its nicer to just drop that weird
PyArray_GetPtr
:which will be as fast as possible without any micro-optimizations (not quite sure on Cython 3, but maybe it doesn't matter).
I am have planning on plugging this hole in NumPy (numpy/numpy#21817), but I am not sure if it is better there or in Cython.
(IMO, it is clearly right to fix this either in NumPy or in Cython, although if we fix it in NumPy, pandas still needs the work-around since you can rely on a new Cython, but not a new NumPy.)