BUG: Convert tuple to list before _list_to_arrays when construct DataFrame. #25731

Merged 10 commits on Apr 5, 2019
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.25.0.rst
@@ -383,6 +383,7 @@ Reshaping
 - Bug in :func:`concat` where order of ``OrderedDict`` (and ``dict`` in Python 3.6+) is not respected, when passed in as ``objs`` argument (:issue:`21510`)
 - Bug in :func:`concat` where the resulting ``freq`` of two :class:`DatetimeIndex` with the same ``freq`` would be dropped (:issue:`3232`).
 - Bug in :func:`merge` where merging with equivalent Categorical dtypes was raising an error (:issue:`22501`)
+- Bug in :class:`DataFrame` constructor when passing non-empty tuples would cause a segmentation fault (:issue:`25691`)

 Sparse
 ^^^^^^
35 changes: 29 additions & 6 deletions pandas/_libs/lib.pyx
@@ -2267,7 +2267,7 @@ def to_object_array(rows: object, int min_width=0):
         list input_rows
         list row

-    input_rows = <list>rows
+    input_rows = list(rows)
     n = len(input_rows)

     k = min_width
@@ -2304,31 +2304,54 @@ def tuples_to_object_array(ndarray[object] tuples):
     return result


-def to_object_array_tuples(rows: list):
+def to_object_array_tuples(rows: object):
+    """
+    Convert a list of tuples into an object array. Any subclass of
+    tuple in `rows` will be casted to tuple.
+
+    Parameters
+    ----------
+    rows : 2-d array (N, K)
+        A list of tuples to be converted into an array
+    min_width : int
+        The minimum width of the object array. If a tuple
+        in `rows` contains fewer than `width` elements,
+        the remaining elements in the corresponding row
+        will all be `NaN`.
+
+    Returns
+    -------
+    obj_array : numpy array of the object dtype
+    """
     cdef:
         Py_ssize_t i, j, n, k, tmp
         ndarray[object, ndim=2] result
+        list input_rows
         tuple row

-    n = len(rows)
+    input_rows = list(rows)
Review comment (Contributor):
leave this as rows=list(rows) and remove input_rows

Reply (Contributor Author):
Done, and removed the input_rows in to_object_array as well.

+    n = len(input_rows)

     k = 0
     for i in range(n):
-        tmp = 1 if checknull(rows[i]) else len(rows[i])
+        tmp = 1 if checknull(input_rows[i]) else len(input_rows[i])
         if tmp > k:
             k = tmp

     result = np.empty((n, k), dtype=object)

     try:
         for i in range(n):
-            row = rows[i]
+            row = input_rows[i]
             for j in range(len(row)):
                 result[i, j] = row[j]
     except Exception:
         # upcast any subclasses to tuple
         for i in range(n):
-            row = (rows[i],) if checknull(rows[i]) else tuple(rows[i])
+            if (checknull(input_rows[i])):
+                row = (input_rows[i],)
+            else:
+                row = tuple(input_rows[i])
             for j in range(len(row)):
                 result[i, j] = row[j]
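
For readers who want to trace the logic of the hunk above without building the Cython extension, here is a rough pure-Python sketch. It is not the pandas implementation: `rows_to_grid` and `is_null` are hypothetical names, `is_null` is only an approximate stand-in for `checknull`, and nested lists stand in for the 2-d object ndarray.

```python
import math


def is_null(value):
    # Hypothetical stand-in for pandas' checknull (assumption):
    # treats None and float NaN as missing.
    return value is None or (isinstance(value, float) and math.isnan(value))


def rows_to_grid(rows):
    """Pure-Python sketch of the padding logic; returns nested lists."""
    # Materialise the input first, so a tuple of rows works as well as a list.
    rows = list(rows)
    n = len(rows)

    # The widest row sets the column count; a null row occupies one cell.
    k = 0
    for row in rows:
        width = 1 if is_null(row) else len(row)
        k = max(k, width)

    # Cells that no row fills stay None.
    grid = [[None] * k for _ in range(n)]
    for i, row in enumerate(rows):
        cells = (row,) if is_null(row) else tuple(row)
        for j, cell in enumerate(cells):
            grid[i][j] = cell
    return grid
```

With this sketch, `rows_to_grid(((1, 2), (3,)))` returns `[[1, 2], [3, None]]`, and an empty tuple of rows returns `[]`. The point it illustrates is the one the PR makes: copying the input with `list(rows)` up front means the rest of the function never has to care whether the caller passed a list or a tuple.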

18 changes: 16 additions & 2 deletions pandas/tests/frame/test_constructors.py
@@ -1209,12 +1209,26 @@ def test_constructor_mixed_type_rows(self):
         expected = DataFrame([[1, 2], [3, 4]])
         tm.assert_frame_equal(result, expected)

-    def test_constructor_tuples(self):
+    @pytest.mark.parametrize("tuples,lists", [
+        ((), []),
+        ((()), []),
+        (((), ()), [(), ()]),
+        (((), ()), [[], []]),
+        (([], []), [[], []]),
+        (([1, 2, 3], [4, 5, 6]), [[1, 2, 3], [4, 5, 6]])
+    ])
+    def test_constructor_tuple(self, tuples, lists):
+        # GH 25691
+        result = DataFrame(tuples)
+        expected = DataFrame(lists)
+        tm.assert_frame_equal(result, expected)
+
+    def test_constructor_list_of_tuples(self):
         result = DataFrame({'A': [(1, 2), (3, 4)]})
         expected = DataFrame({'A': Series([(1, 2), (3, 4)])})
         tm.assert_frame_equal(result, expected)

-    def test_constructor_namedtuples(self):
+    def test_constructor_list_of_namedtuples(self):
         # GH11181
         from collections import namedtuple
         named_tuple = namedtuple("Pandas", list('ab'))
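
A side note on the parametrized cases in this hunk: in Python, wrapping an empty tuple in parentheses does not nest it, so the literal `(())` in the second case is the same value as `()` in the first. The snippet below is a plain-Python check, independent of pandas; the variable names are illustrative only.

```python
# Redundant parentheses: this is still the empty tuple.
redundant_parens = (())

# A trailing comma is what creates a one-element tuple.
one_element = ((),)

# Two empty tuples inside an outer tuple, as in the third and fourth cases.
pair_of_empties = ((), ())

assert redundant_parens == ()
assert len(one_element) == 1
assert len(pair_of_empties) == 2
```

If a test for "a tuple containing one empty tuple" were wanted, `((),)` would be the literal to use; whether that case is worth adding is a call for the maintainers.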