BUG: Don't overflow in DataFrame init #18624

Merged
3 changes: 1 addition & 2 deletions doc/source/whatsnew/v0.22.0.txt
@@ -186,7 +186,7 @@ Conversion
^^^^^^^^^^

- Bug in :class:`Index` constructor with `dtype='uint64'` where int-like floats were not coerced to :class:`UInt64Index` (:issue:`18400`)
-
- Bug in the :class:`DataFrame` constructor in which data containing very large positive or very large negative numbers was causing ``OverflowError`` (:issue:`18584`)
-

Indexing
@@ -262,4 +262,3 @@ Other
- Fixed a bug where creating a Series from an array that contains both tz-naive and tz-aware values will result in a Series whose dtype is tz-aware instead of object (:issue:`16406`)
- Fixed construction of a :class:`Series` from a ``dict`` containing ``NaN`` as key (:issue:`18480`)
- Adding a ``Period`` object to a ``datetime`` or ``Timestamp`` object will now correctly raise a ``TypeError`` (:issue:`17983`)
-
Contributor

Move to Conversion (we should probably move some of the other entries to their appropriate sections as well, but that is for another PR).

Member Author

Yep, done.

15 changes: 12 additions & 3 deletions pandas/_libs/src/inference.pyx
@@ -181,14 +181,22 @@ cdef class Seen(object):
"""
Set flags indicating that an integer value was encountered.

+ In addition to setting a flag that an integer was seen, we
+ also set two flags depending on the type of integer seen:
+
+ 1) sint_ : a negative (signed) number in the
+ range of [-2**63, 0) was encountered
+ 2) uint_ : a positive number in the range of
+ [2**63, 2**64) was encountered
+
Parameters
----------
val : Python int
Value with which to set the flags.
"""
self.int_ = 1
- self.sint_ = self.sint_ or (val < 0)
- self.uint_ = self.uint_ or (val > oINT64_MAX)
+ self.sint_ = self.sint_ or (oINT64_MIN <= val < 0)
+ self.uint_ = self.uint_ or (oINT64_MAX < val <= oUINT64_MAX)

@property
def numeric_(self):
@@ -1263,7 +1271,8 @@ def maybe_convert_objects(ndarray[object] objects, bint try_float=0,
if not seen.null_:
seen.saw_int(int(val))

- if seen.uint_ and seen.sint_:
+ if ((seen.uint_ and seen.sint_) or
+ val > oUINT64_MAX or val < oINT64_MIN):
seen.object_ = 1
break

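A minimal pure-Python sketch of how the patched flag logic decides between int64, uint64, and object for a list of Python ints. It assumes integer-only input and collapses the rest of `maybe_convert_objects` (floats, nulls, bools, and so on) into a final three-way choice. `SeenSketch` and `infer_int_result` are hypothetical names, not pandas APIs; the module constants stand in for `oINT64_MIN`, `oINT64_MAX`, and `oUINT64_MAX`.

```python
# Pure-Python model of the integer flags in the Cython `Seen` class after
# this change. SeenSketch / infer_int_result are hypothetical names.

INT64_MIN = -2**63
INT64_MAX = 2**63 - 1
UINT64_MAX = 2**64 - 1


class SeenSketch:
    """Tracks which kinds of integers have been encountered."""

    def __init__(self):
        self.int_ = False     # any integer seen
        self.sint_ = False    # a negative value in [-2**63, 0)
        self.uint_ = False    # a value in [2**63, 2**64)
        self.object_ = False  # must fall back to object dtype

    def saw_int(self, val):
        # Mirrors the patched saw_int: flags are set only for values
        # that actually fit in int64 / uint64.
        self.int_ = True
        self.sint_ = self.sint_ or (INT64_MIN <= val < 0)
        self.uint_ = self.uint_ or (INT64_MAX < val <= UINT64_MAX)


def infer_int_result(values):
    """Return "int64", "uint64", or "object" for a list of Python ints.

    Simplified assumption: the real maybe_convert_objects handles many
    more input types and has more dtype rules than this.
    """
    seen = SeenSketch()
    for val in values:
        seen.saw_int(val)
        # The patched bail-out: mixed signed/unsigned, or out of range.
        if ((seen.uint_ and seen.sint_) or
                val > UINT64_MAX or val < INT64_MIN):
            seen.object_ = True
            break
    if seen.object_:
        return "object"
    return "uint64" if seen.uint_ else "int64"
```

With this sketch, `infer_int_result([2**63])` gives `"uint64"`, `infer_int_result([-1, 2**63])` gives `"object"` because both flags are set, and `infer_int_result([2**64])` gives `"object"` because the value is past `UINT64_MAX`.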
7 changes: 7 additions & 0 deletions pandas/tests/dtypes/test_inference.py
@@ -388,6 +388,13 @@ def test_convert_numeric_int64_uint64(self, case, coerce):
result = lib.maybe_convert_numeric(case, set(), coerce_numeric=coerce)
tm.assert_almost_equal(result, expected)

+ @pytest.mark.parametrize("value", [-2**63 - 1, 2**64])
+ def test_convert_int_overflow(self, value):
+ # see gh-18584
+ arr = np.array([value], dtype=object)
+ result = lib.maybe_convert_objects(arr)
+ tm.assert_numpy_array_equal(arr, result)
+
def test_maybe_convert_objects_uint64(self):
# see gh-4471
arr = np.array([2**63], dtype=object)
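The two boundary values in `test_convert_int_overflow` sit one step outside the int64 and uint64 ranges. The sketch below, using hypothetical helpers `old_flags` and `new_flags`, compares the predicates removed from and added to `Seen.saw_int`. Under the old predicates each value still sets a flag, so inference presumably carried on toward a 64-bit conversion (consistent with the `OverflowError` reported in gh-18584); under the new ones neither flag is set, and the added range check sends the value to object dtype.

```python
# Compare the saw_int predicates before and after the fix for the
# boundary values used in test_convert_int_overflow.
# old_flags / new_flags are hypothetical helper names.

INT64_MIN = -2**63
INT64_MAX = 2**63 - 1
UINT64_MAX = 2**64 - 1


def old_flags(val):
    # Pre-fix: any negative value counted as "signed" and any value
    # above INT64_MAX counted as "unsigned", with no upper/lower bound.
    return {"sint": val < 0, "uint": val > INT64_MAX}


def new_flags(val):
    # Post-fix: only values that fit in int64 / uint64 set a flag.
    return {"sint": INT64_MIN <= val < 0,
            "uint": INT64_MAX < val <= UINT64_MAX}


for value in (-2**63 - 1, 2**64):
    print(value, "old:", old_flags(value), "new:", new_flags(value))
```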
12 changes: 12 additions & 0 deletions pandas/tests/frame/test_constructors.py
@@ -195,6 +195,18 @@ def test_constructor_overflow_int64(self):
df_crawls = DataFrame(data)
assert df_crawls['uid'].dtype == np.uint64

+ @pytest.mark.parametrize("values", [np.array([2**64], dtype=object),
+ np.array([2**65]), [2**64 + 1],
+ np.array([-2**63 - 4], dtype=object),
+ np.array([-2**64 - 1]), [-2**65 - 2]])
+ def test_constructor_int_overflow(self, values):
+ # see gh-18584
+ value = values[0]
+ result = DataFrame(values)
+
+ assert result[0].dtype == object
+ assert result[0][0] == value
+
def test_constructor_ordereddict(self):
import random
nitems = 100
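Every value parametrized in `test_constructor_int_overflow` lies outside the combined int64/uint64 range, which is why the test expects an object column. The sketch below (hypothetical helpers `needs_object` and `wrapped_to_int64`) checks that, and also shows that keeping only the low 64 bits of each value would change it, so a 64-bit column cannot hold these values losslessly. This is an illustration only: the symptom reported in gh-18584 was an `OverflowError`, not truncation.

```python
# Check the overflow cases from test_constructor_int_overflow against the
# 64-bit integer ranges. Helper names here are hypothetical.

INT64_MIN = -2**63
INT64_MAX = 2**63 - 1
UINT64_MAX = 2**64 - 1

OVERFLOW_CASES = [2**64, 2**65, 2**64 + 1,
                  -2**63 - 4, -2**64 - 1, -2**65 - 2]


def needs_object(val):
    """True if neither int64 nor uint64 can represent val."""
    return val < INT64_MIN or val > UINT64_MAX


def wrapped_to_int64(val):
    """Keep only the low 64 bits of val and read them as a signed int64."""
    low = val & UINT64_MAX  # Python's & treats negatives as two's complement
    return low - 2**64 if low > INT64_MAX else low


for value in OVERFLOW_CASES:
    # Each case needs object dtype, and squeezing it into 64 bits changes it.
    print(value, needs_object(value), wrapped_to_int64(value))
```

By contrast, a value such as `2**63` fits in uint64 and needs no object fallback, consistent with the uint64 dtype asserted in `test_constructor_overflow_int64`.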