In case of dtype issues, read_csv doesn't give an error as useful as pd.to_numeric does #15898
Comments
@p-himik yeah I suppose it's possible to try/except the Then you could run
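A rough sketch of that kind of workaround, assuming the idea is to catch the read_csv failure and then re-check the column with pd.to_numeric. The helper name `find_unparseable` and the sample data are made up for illustration; it re-reads the file as strings and uses errors="coerce" to list the entries that fail to parse.

```python
from io import StringIO

import pandas as pd


def find_unparseable(csv_text, column, **read_csv_kwargs):
    """Hypothetical helper: return the entries of `column` that fail to parse."""
    # Read everything as strings so read_csv itself cannot fail on dtype.
    raw = pd.read_csv(StringIO(csv_text), dtype=str, **read_csv_kwargs)
    # errors="coerce" turns unparseable entries into NaN instead of raising.
    converted = pd.to_numeric(raw[column], errors="coerce")
    # Bad entries had a value originally but are NaN after conversion.
    bad = raw[column].notna() & converted.isna()
    return raw.loc[bad, column]


csv_text = "a,b\nx,1.5\ny,oops\nz,2.5\n"  # made-up sample data
try:
    df = pd.read_csv(StringIO(csv_text), dtype={"b": "float64"})
except (ValueError, TypeError):
    # The read_csv error may not say which row failed, so ask to_numeric.
    bad_rows = find_unparseable(csv_text, "b")
    print(bad_rows)
```

This parses the file twice in the failure case, which is probably acceptable for debugging but not something to leave in a hot path.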
This is actually kind of complex, as we're only using I do think it would be possible, and nice, to catch the case where the user dtype is float and float parsing fails, basically would need to bubble up the error from this line, conditional on being passed a float dtype. Line 1780 in d9e00d2
Does anyone know if this issue is stale or has it been sorted? I've not tried to reproduce the code sample yet.
I'd encourage you to try the sample, or make a simpler reproducing case, but as far as I know this is still an issue.
Thanks @chris-b1, I'm unsure how to reproduce the sample code, so I guess coming up with a simpler case is the way forward.
simpler sample code example:
>>> import pandas as pd
>>>
>>> pd.to_numeric('1.7976931348623157e+308')
Traceback (most recent call last):
File "pandas/_libs/src\inference.pyx", line 1152, in pandas._libs.lib.maybe_convert_numeric
ValueError: Unable to parse string "1.7976931348623157e+308"
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\simon\Anaconda3\lib\site-packages\pandas\core\tools\numeric.py", line 133, in to_numeric
coerce_numeric=coerce_numeric)
File "pandas/_libs/src\inference.pyx", line 1185, in pandas._libs.lib.maybe_convert_numeric
ValueError: Unable to parse string "1.7976931348623157e+308" at position 0
>>> from io import StringIO
>>>
>>> import numpy as np
>>> import pandas as pd
>>>
>>> csv_str = StringIO(('1.7976931348623157e+308, 1.7976931348623157e+308 '))
>>> df = pd.read_csv(csv_str, engine='c', names=["a", "b"], dtype={
... "a": np.str, "b": np.float64})
Traceback (most recent call last):
File "pandas\_libs\parsers.pyx", line 1156, in pandas._libs.parsers.TextReader._convert_tokens
TypeError: Cannot cast array from dtype('O') to dtype('float64') according to the rule 'safe'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
File "C:\Users\simon\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 678, in parser_f
return _read(filepath_or_buffer, kwds)
File "C:\Users\simon\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 446, in _read
data = parser.read(nrows)
File "C:\Users\simon\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1036, in read
ret = self._engine.read(nrows)
File "C:\Users\simon\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1848, in read
data = self._reader.read(nrows)
File "pandas\_libs\parsers.pyx", line 876, in pandas._libs.parsers.TextReader.read
File "pandas\_libs\parsers.pyx", line 891, in pandas._libs.parsers.TextReader._read_low_memory
File "pandas\_libs\parsers.pyx", line 968, in pandas._libs.parsers.TextReader._read_rows
File "pandas\_libs\parsers.pyx", line 1094, in pandas._libs.parsers.TextReader._convert_column_data
File "pandas\_libs\parsers.pyx", line 1164, in pandas._libs.parsers.TextReader._convert_tokens
ValueError: cannot safely convert passed user dtype of float64 for object dtyped data in column 1
This looks to work on master now. Could use a test
A follow-up to #13237. Copied examples:
Here's what to_numeric shows:
And here's what read_csv shows (the data is at ftp://ftp.sanger.ac.uk/pub/consortia/ibdgenetics/iibdgc-trans-ancestry-summary-stats.tar):
@jreback
I've finally started looking into it, and it seems that I can't implement it in a good way without changing NumPy because, in the end, it's NumPy that doesn't give any row/value information, although pandas conditionally replaces the exception with its own.
I can write an ad-hoc implementation for numeric conversion using pd.to_numeric though, and use its row/value information in case it raises an exception. What do you think?
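For what it's worth, the ad-hoc approach could look roughly like this. It's only a sketch: `read_csv_checked` and its `numeric_columns` argument are hypothetical names, not pandas API, and it converts after parsing rather than inside the C parser, so it costs an extra pass over the data.

```python
from io import StringIO

import pandas as pd


def read_csv_checked(buffer, numeric_columns, **read_csv_kwargs):
    """Hypothetical helper: read_csv, then convert columns via pd.to_numeric."""
    # Load the requested columns as strings; conversion happens below.
    str_dtypes = {name: str for name in numeric_columns}
    df = pd.read_csv(buffer, dtype=str_dtypes, **read_csv_kwargs)
    for name in numeric_columns:
        try:
            df[name] = pd.to_numeric(df[name])
        except (ValueError, TypeError) as exc:
            # Keep to_numeric's own message and add the column name,
            # which to_numeric does not know about.
            raise ValueError(f"column {name!r}: {exc}") from exc
    return df


# Made-up data: the second row of column "b" is not numeric.
message = None
try:
    read_csv_checked(StringIO("a,b\nx,1.5\ny,oops\n"), ["b"])
except ValueError as err:
    message = str(err)
print(message)
```

The column name comes from the wrapper and the rest of the message is whatever pd.to_numeric reports, so its exact wording depends on the pandas version.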