Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
BUG: Convert tuple to list before
_list_to_arrays
when construct DataFrame. #25731New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Convert tuple to list before
_list_to_arrays
when construct DataFrame. #25731Changes from 6 commits
2f00c7f
17827de
e4e68ce
4aa629e
d6c029d
2c896f9
d9e3fe7
24fc8de
e9a0ab3
4e41de3
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you move this down even futher into to_object_array
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to clarify:
to_object_array_tuples
is used to convert list of tuples, andto_object_array
is used to convert list of lists. How could these two case be unified toto_object_array
?The tuple (or list) needs to be converted to list before send to
to_object_array_tuples
, otherwise we will see incorrect type error forpd.DataFrame(((),()))
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what i mean is push the listifying down to these 2 routines, in fact
to_object_array
already has this, need to add similar into_object_array_tuples
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, the segmentation fault happens in #25691 is caused by
to_object_array
, rather thanto_object_array_tuples
. Theinput_rows = <list>rows
does nothing than an assignment:The
rows
andinput_rows
are bothPyObject *
. The whole story is, when we invoketo_object_array
with([], [])
, we use list's[]
(__Pyx_GetItemInt_List_Fast), it results a null pointer. Then wePy_INCREF
it, thus the segmentation fault.I have revised the patch to remove the unused conversion in
to_object_array
and restrict its argument type tolist
. Another possible solution should be change the type of argumentrows
toobject
fromlist
the both two methods, but I think lists is used more commonly to construct DataFrame than tuples.__Pyx_GetItemInt_List_Fast
should be faster than__Pyx_GetItemInt_Generic
and the conversion from tuple to list in python side is fairly ok.The backtrace of the segmentation fault
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, IIRC this was put in here pretty specifically, can you see why that was? in any event, I believe you can simply type the input arg to
to_object_array_tuples
no?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In
to_object_array_tuples
, if the element inrows
is not a tuple, it will be cast to a tuple.https://github.com/pandas-dev/pandas/blob/master/pandas/_libs/lib.pyx#L2325-L2337
However I still think it make sense to treat
list of lists
andlist of tuples
differently and dispatch them to different methods to avoid the overhead caused by castinglist
totuple
. I thinklist of lists
may be used more frequently. I have add a comment toto_object_array_tuples
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sighingnow I agree with your point, I am not suggesting combing these routines at all, rather inside each routine the data is first cast to a list
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.