BUG: integer overflow in csv_reader #47167
Comments
A possible solution is to include a check after the cast, like the one used for the extension dtypes in pandas/pandas/core/arrays/integer.py, lines 53 to 55 (at commit c355145).
A PR follows...
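A minimal sketch of such a post-cast check, assuming the parsed values are already a plain NumPy integer array. The name checked_astype is hypothetical, and this is not the code at integer.py lines 53 to 55: it casts unsafely, then compares the result with the input and raises if any element changed.

```python
import numpy as np


def checked_astype(values: np.ndarray, dtype) -> np.ndarray:
    """Hypothetical helper: cast `values` to `dtype`, then verify nothing was lost.

    Assumes `values` is an integer array; not pandas' actual implementation.
    """
    dtype = np.dtype(dtype)
    # An unsafe cast never refuses; out-of-range integers silently wrap around.
    casted = values.astype(dtype, casting="unsafe")
    # Any element that differs after the cast did not fit in the target dtype.
    if not np.array_equal(casted, values):
        raise OverflowError(f"values cannot be losslessly converted to {dtype}")
    return casted
```

For example, checked_astype(np.array([32768]), np.int16) raises OverflowError instead of returning -32768, while in-range input comes back in exactly the requested dtype.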
This fails a bit differently now:

=========================== short test summary info ============================
FAILED ../startup.py::test_integer_overflow_with_user_dtype[int64] - Failed: DID NOT RAISE <class 'Exception'>
FAILED ../startup.py::test_integer_overflow_with_user_dtype[uint32] - Failed: DID NOT RAISE <class 'Exception'>
FAILED ../startup.py::test_integer_overflow_with_user_dtype[int32] - Failed: DID NOT RAISE <class 'Exception'>
FAILED ../startup.py::test_integer_overflow_with_user_dtype[uint16] - Failed: DID NOT RAISE <class 'Exception'>
FAILED ../startup.py::test_integer_overflow_with_user_dtype[int16] - Failed: DID NOT RAISE <class 'Exception'>
FAILED ../startup.py::test_integer_overflow_with_user_dtype[uint8] - Failed: DID NOT RAISE <class 'Exception'>
FAILED ../startup.py::test_integer_overflow_with_user_dtype[int8] - Failed: DID NOT RAISE <class 'Exception'>
FAILED ../startup.py::test_integer_overflow_with_user_dtype[Int64] - Failed: DID NOT RAISE <class 'Exception'>
FAILED ../startup.py::test_integer_overflow_with_user_dtype[UInt32] - Failed: DID NOT RAISE <class 'Exception'>
FAILED ../startup.py::test_integer_overflow_with_user_dtype[Int32] - Failed: DID NOT RAISE <class 'Exception'>
FAILED ../startup.py::test_integer_overflow_with_user_dtype[UInt16] - Failed: DID NOT RAISE <class 'Exception'>
FAILED ../startup.py::test_integer_overflow_with_user_dtype[Int16] - Failed: DID NOT RAISE <class 'Exception'>
FAILED ../startup.py::test_integer_overflow_with_user_dtype[UInt8] - Failed: DID NOT RAISE <class 'Exception'>
FAILED ../startup.py::test_integer_overflow_with_user_dtype[Int8] - Failed: DID NOT RAISE <class 'Exception'>
=========================== 14 failed, 2 passed in 0.58s ===========================

However:

>>> from io import StringIO
>>> import numpy as np
>>> import pandas as pd
>>> dtype = np.dtype(np.int16)
>>> maxint = np.iinfo(dtype).max
>>> text = f"{maxint + 1}"
>>> pd.DataFrame([maxint + 1], dtype=dtype)  # fails
ValueError: Values are too large to be losslessly converted to int16. To cast anyway, use pd.Series(values).astype(int16)
>>> pd.read_csv(StringIO(text), header=None, dtype=dtype) # overflows
       0
0 -32768

I.e., the DataFrame constructor raises, while read_csv silently wraps the value around to -32768.
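The -32768 in the read_csv output is what two's-complement wraparound gives for 32768 stored in 16 bits. The sketch below reproduces it with NumPy alone, once with an unsafe cast and once with the equivalent modular arithmetic (wrap_to_signed is a hypothetical helper). It does not call read_csv, so it only illustrates the arithmetic, not the parser's code path. It also shows why dtype-level casting rules cannot catch this: they never look at the values.

```python
import numpy as np

dtype = np.dtype(np.int16)
maxint = int(np.iinfo(dtype).max)  # 32767 for int16


def wrap_to_signed(value: int, bits: int) -> int:
    """Hypothetical helper: two's-complement wraparound into a signed `bits`-wide range."""
    half = 1 << (bits - 1)
    return (value + half) % (1 << bits) - half


# An unsafe NumPy cast of the out-of-range int64 value wraps it around.
wrapped = np.array([maxint + 1], dtype=np.int64).astype(dtype, casting="unsafe")

print(int(wrapped[0]))                                 # -32768
print(wrap_to_signed(maxint + 1, dtype.itemsize * 8))  # -32768

# Casting rules only compare dtypes: "safe" refuses the narrowing even for
# in-range data, while "same_kind" allows it and would wrap just the same.
print(np.can_cast(np.int64, np.int16, casting="safe"))       # False
print(np.can_cast(np.int64, np.int16, casting="same_kind"))  # True
```

Because neither rule inspects values, a value-based check after the cast is needed to reject 32768 while still accepting 32767.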
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
As the example shows, all invocations with the extension dtype variants (UInt64, etc.) and with the non-extension dtype uint64 parse the maximum value but raise an exception at max + 1 (more specifically, an OverflowError for uint64, a ValueError for UInt64 and a TypeError for all other extension dtypes, so I simply check for any exception in the example). This is the safe and, IMHO, expected behavior.
The issue arises when parsing an integer value with a user-defined dtype (TextReader(..., dtype != None)), and only for non-extension dtypes:

The second problem comes from pandas/pandas/_libs/parsers.pyx, line 1191 (at commit c355145), where the casting="unsafe" parameter is used. Furthermore, for int64 we never reach this line and simply return the result from _try_uint64.

Expected Behavior
Non-extension integer dtypes should behave like the extension dtypes, i.e. return exactly the requested dtype (if specified by the user) and raise when that dtype cannot hold the parsed value.
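The expected behavior for the non-extension dtypes can be sketched independently of the parser: bound-check each parsed token against np.iinfo of the requested dtype using Python's arbitrary-precision ints, then build the array in exactly that dtype. The names to_requested_int_dtype and INT_DTYPES are hypothetical, and raising OverflowError for every dtype is an assumption (per the description, pandas currently raises OverflowError, ValueError or TypeError depending on the dtype). The loop mirrors the max / max + 1 check described in the issue, for NumPy dtypes only.

```python
import numpy as np

INT_DTYPES = ["int8", "int16", "int32", "int64",
              "uint8", "uint16", "uint32", "uint64"]


def to_requested_int_dtype(tokens, dtype) -> np.ndarray:
    """Hypothetical sketch: convert parsed tokens to exactly `dtype`, or raise.

    Not pandas' implementation; it only illustrates the expected semantics.
    """
    dtype = np.dtype(dtype)
    info = np.iinfo(dtype)
    # Python ints have arbitrary precision, so the range check itself
    # cannot overflow, whatever the requested dtype is.
    values = [int(tok) for tok in tokens]
    for v in values:
        if not info.min <= v <= info.max:
            raise OverflowError(f"{v} is out of bounds for {dtype}")
    # Always return the requested dtype, never a wider fallback such as uint64.
    return np.array(values, dtype=dtype)


for name in INT_DTYPES:
    top = np.iinfo(name).max
    print(name, to_requested_int_dtype([str(top)], name).dtype)
    try:
        to_requested_int_dtype([str(top + 1)], name)
    except OverflowError as exc:
        print("  raised:", exc)
```

Checking bounds before building the array, rather than casting first, also covers the int64 case, where a value such as 2**63 would otherwise come back as uint64.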
Installed Versions
1.5.0.dev0+839.gc355145c7f