cannot safely convert passed user dtype of <f8 for object dtyped data in column *** #13237

Closed
wildwild opened this issue May 20, 2016 · 13 comments
Labels
Dtype Conversions (unexpected or buggy dtype conversions), Usage Question

Comments

@wildwild

Code Sample, a copy-pastable example if possible

The problem occurs under 64-bit Windows.

It works on the win32 platform, but not under 64-bit unless I remove the dtype, read the DataFrame, and then convert the column to float afterwards.

I can't debug into the file.

import io
import numpy as np
import pandas as pd

market_head_list = ['TradingDay', 'ClosePrice']
dt = {}  # column name -> dtype mapping passed to read_csv
dt['TradingDay'] = str
dt['ClosePrice'] = float  # I used np.float64 too, but it still doesn't work
csv_str_now = '1.7976931348623157e+308, 1.7976931348623157e+308 '

csv_str = io.StringIO(csv_str_now)

df = pd.read_csv(csv_str, engine='c', names=market_head_list, dtype=dt)
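
The workaround described above (skip the float dtype at read time and convert afterwards) might look roughly like the following. This is a sketch, not the reporter's actual code: the sample row is made up, and reading every column with dtype=str is an assumption chosen so that the parser performs no float conversion at all.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample row; the real file is not shown in the report.
csv_text = "2016-05-20,1.7976931348623157e+308\n"
names = ["TradingDay", "ClosePrice"]

# Read everything as text so the C parser does not attempt a float conversion.
df = pd.read_csv(io.StringIO(csv_text), names=names, dtype=str)

# Convert afterwards with astype, which does not go through the CSV parser.
df["ClosePrice"] = df["ClosePrice"].astype(np.float64)

print(df.dtypes)
```

The trade-off is that the column is held as strings first, which costs extra memory on large files.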

output of pd.show_versions()

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 20.3
Cython: 0.23.4
numpy: 1.11.0
scipy: 0.17.0
statsmodels: None
xarray: None
IPython: 4.1.2
sphinx: 1.3.5
patsy: 0.4.0
dateutil: 2.5.1
pytz: 2016.2
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.5.2
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.6.0
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.39.0
pandas_datareader: None

File "pandas\parser.pyx", line 805, in pandas.parser.TextReader.read (pandas\parser.c:8620)
File "pandas\parser.pyx", line 827, in pandas.parser.TextReader._read_low_memory (pandas\parser.c:8876)
File "pandas\parser.pyx", line 904, in pandas.parser.TextReader._read_rows (pandas\parser.c:9893)
File "pandas\parser.pyx", line 1011, in pandas.parser.TextReader._convert_column_data (pandas\parser.c:11286)
File "pandas\parser.pyx", line 1092, in pandas.parser.TextReader._convert_tokens (pandas\parser.c:12659)
ValueError: cannot safely convert passed user dtype of <f8 for object dtyped data in column 1

@jreback
Contributor

jreback commented May 20, 2016

Well the error message speaks for itself. Your values are above the float limit. Why would you actually want to do this?

In [11]: np.finfo(np.float64).max
Out[11]: 1.7976931348623157e+308

jreback closed this as completed May 20, 2016
jreback added the Dtype Conversions and Usage Question labels May 20, 2016
@wildwild
Author

I hope it can read the value correctly. It works well on 32-bit, but not on 64-bit.

@jreback
Contributor

jreback commented May 25, 2016

This gives the same error on 32-bit windows as well. You are trying to fool with floats right at the limit, a recipe for disaster. No reason at all to do this.

@wildwild
Author

I hope it can be converted correctly, since max_dbl is still a legal float: it is not an invalid value, just the boundary of the float range.

@p-himik

p-himik commented Feb 28, 2017

Isn't it bad that read_csv works in one way and astype in another? Personally, I find it extremely confusing.
I spent a good couple of hours trying to figure out what on Earth read_csv didn't like about my data (and it's a large set, so no easy search), given that astype worked just fine.
Maybe astype should give the same error message for very high floats? Or maybe it's possible to print an actual conflicting value with the error message?

@jreback
Contributor

jreback commented Feb 28, 2017

@p-himik well there are a couple of considerations

  • string -> float parsing is dependent on the exact precision selection (e.g. regular / high / whatever)
  • .astype is deferring to numpy astype, so it is possible to control, though you would have to specify casting=, but by default (in pandas) we don't do this
  • an alternative for safe casting is .to_numeric (though not really sure that is actually safe for very large floats either).

None of these are really good options. So sure, in an ideal world things are unified. I would say that .to_numeric is more friendly to a user, and it's a pandas function, so it is easier to fix/control, while with .astype we have less control.
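
The difference between the casting rules and pd.to_numeric can be sketched as follows. The sample values are invented, and the exact wording of the error messages may differ between NumPy and pandas versions, so the code only captures them rather than relying on a specific text.

```python
import numpy as np
import pandas as pd

obj = np.array(["1.5", "2.0"], dtype=object)

# NumPy's astype defaults to casting="unsafe", so object -> float64 is allowed.
unsafe = obj.astype(np.float64)

# With casting="safe" the same conversion is refused outright.
try:
    obj.astype(np.float64, casting="safe")
    safe_error = None
except TypeError as exc:
    safe_error = str(exc)

values = pd.Series(["1.5", "2.0", "abc"], dtype=object)

# pd.to_numeric raises on an unparseable string and names it in the message.
try:
    pd.to_numeric(values)
    to_numeric_error = None
except ValueError as exc:
    to_numeric_error = str(exc)

# errors="coerce" turns unparseable entries into NaN instead of raising.
coerced = pd.to_numeric(values, errors="coerce")

print(safe_error)
print(to_numeric_error)
```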

@p-himik

p-himik commented Feb 28, 2017

Oh, at least to_numeric gives the actual value in the error message. Thanks for the hint and the explanation.

But can a similar thing be added to read_csv internals, so the problematic value is displayed?

@jreback
Contributor

jreback commented Feb 28, 2017

can you show a copy-pastable example of what you mean?

@p-himik

p-himik commented Feb 28, 2017

Here's what to_numeric shows:

In [137]: pd.to_numeric(o)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
pandas/src/inference.pyx in pandas.lib.maybe_convert_numeric (pandas/lib.c:55708)()

ValueError: Unable to parse string "52721156299871854681072370812488856336599863860274272781560003496046130816295143841376557767523688876482371118681808060318819187029855011862637267061548684520491431327693943042785705302632892888308342787190965977140539558800921069356177102235514666302335984730634641934384020650987601849970936578094137344.00000"

And here's what read_csv shows (the data is at ftp://ftp.sanger.ac.uk/pub/consortia/ibdgenetics/iibdgc-trans-ancestry-summary-stats.tar):

In [138]: d = pd.read_csv('EUR.UC.gwas.assoc', delim_whitespace=True, usecols=['OR'], dtype={'OR': np.float64})
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
pandas/parser.pyx in pandas.parser.TextReader._convert_tokens (pandas/parser.c:14411)()

TypeError: Cannot cast array from dtype('O') to dtype('float64') according to the rule 'safe'

During handling of the above exception, another exception occurred:

[... long stacktrace ...]

pandas/parser.pyx in pandas.parser.TextReader._convert_tokens (pandas/parser.c:14632)()

ValueError: cannot safely convert passed user dtype of float64 for object dtyped data in column 8

The problem here is that it's impossible to tell anything about what's wrong with the data without loading and checking it manually. Apart from being a mild inconvenience, in some cases with strict data regulations manual checking can be just unfeasible.
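
Until the error message improves, one programmatic way to locate the offending entries is to read the column as text and convert it separately. This is only a sketch: the inline sample data is invented (the real file is the one linked above), and it assumes the column fits in memory as strings. Values that parse to infinity are flagged too, which may or may not be wanted.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical whitespace-delimited sample standing in for the real file.
sample = io.StringIO("SNP OR\nrs1 1.05\nrs2 not_a_number\nrs3 0.98\n")

# Read the column as text so nothing is rejected at parse time.
raw = pd.read_csv(sample, sep=r"\s+", usecols=["OR"], dtype={"OR": str})

# Convert separately; unparseable strings become NaN instead of raising.
converted = pd.to_numeric(raw["OR"], errors="coerce")

# Rows that had text but did not yield a finite float are the suspects.
bad = raw.loc[~np.isfinite(converted) & raw["OR"].notna(), "OR"]

print(bad)
```

The index of bad gives the row positions, so the problem values can be reported without opening the data manually.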

@jreback
Contributor

jreback commented Feb 28, 2017

@p-himik ahh I c. Well it would be quite simple to make a better error message (e.g. adding the problematic value, line number etc.).

Want to do a PR? (or if not, please open an issue with this example).

@p-himik

p-himik commented Feb 28, 2017

I'll try to do a PR in a week or so. I've never been to the C part of Pandas before. :)

@jreback
Contributor

jreback commented Feb 28, 2017

actually this is cython so should be pretty straightforward :>

@xmduhan

xmduhan commented Mar 18, 2019

Just pass float_precision='high' to pd.read_csv
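
A minimal sketch of passing that option, assuming the C engine (float_precision is only supported there). The helper name read_prices and the sample row are made up for illustration. The example uses 'round_trip', which hands the conversion to Python's own float parser; whether 'high' behaves the same for values at the limit may depend on the pandas version.

```python
import io

import pandas as pd

# Hypothetical sample row using the value from the original report.
csv_text = "2016-05-20,1.7976931348623157e+308\n"


def read_prices(text, precision):
    """Read the two-column sample with the given float_precision setting."""
    return pd.read_csv(
        io.StringIO(text),
        names=["TradingDay", "ClosePrice"],
        float_precision=precision,
    )


df = read_prices(csv_text, "round_trip")
print(df["ClosePrice"].iloc[0])
```

Calling read_prices(csv_text, "high") exercises the setting suggested in the comment above.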
