to_numeric(..., downcast='float') is too aggressive #19729
Comments
The docstring isn't really clear on what our policy is, but I think what you expected is reasonable. The change would be around here, adding an equality check: pandas/pandas/core/dtypes/cast.py, line 149 in df38f66
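To make the suggested equality check concrete, here is a minimal sketch using NumPy only. The helper name `downcast_float_if_lossless` is hypothetical and this is not the code in pandas' cast.py; it only illustrates the idea of keeping float64 whenever a round trip through float32 would change a value.

```python
import numpy as np


def downcast_float_if_lossless(values):
    """Hypothetical helper: return float32 only when no element changes."""
    arr = np.asarray(values, dtype=np.float64)
    # Newer NumPy versions warn when a value overflows the float32 range
    # during a cast; silence that here because the check below handles it.
    with np.errstate(over="ignore"):
        candidate = arr.astype(np.float32)
    # Comparing float32 with float64 upcasts to float64, so this is an exact
    # round-trip check. NaN never equals itself, so it is allowed explicitly.
    unchanged = (candidate == arr) | np.isnan(arr)
    return candidate if bool(unchanged.all()) else arr


print(downcast_float_if_lossless([5.0, 0.5]).dtype)         # float32
print(downcast_float_if_lossless([5.0, 16786415.0]).dtype)  # float64
print(downcast_float_if_lossless([2.0 ** 128]).dtype)       # float64
```

An exact comparison is the strictest possible policy; a real fix might prefer a tolerance, which is a design decision for the maintainers.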
|
I think there may be a further (maybe related?) problem.
import pandas as pd
pd.set_option('display.float_format', '{:.6f}'.format)
df = pd.DataFrame([{'val': 5.0},
                   {'val': 16786415.0},
                   ])
df['val_downcast_flt'] = pd.to_numeric(df['val'], downcast='float')
print(df.info())
print(df)
gives
|
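The value 16786415.0 in the snippet above lies just past 2**24, where float32 can only represent even integers. A minimal NumPy sketch of what a forced cast to float32 does to that value follows; it deliberately avoids pandas, so it does not depend on how any particular pandas version implements `downcast='float'`.

```python
import numpy as np

original = 16786415.0          # value from the snippet above
forced = np.float32(original)  # what an unconditional cast produces

print(2 ** 24)                        # 16777216
print(float(forced))                  # 16786416.0
print(float(forced) == original)      # False
print(float(np.float32(5.0)) == 5.0)  # True
```

Small values such as 5.0 survive the cast unchanged, which is why the loss is easy to miss until a larger value appears in the same column.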
Having the same issue here and lost information because of this! I now have to stick to the beefy float64... |
Having same problems here too |
The same problem (pandas 1.0.4) |
Encountered the same problem. |
A more aggressive example:
>>> pd.to_numeric(pd.Series([2.0 ** 128]), downcast='float')
0    inf
dtype: float32 |
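The `inf` above comes from overflow rather than rounding: 2.0 ** 128 is finite in float64 but larger than the biggest finite float32. A short NumPy sketch of that boundary, again independent of pandas:

```python
import numpy as np

big = np.array([2.0 ** 128])               # float64, finite
f32_max = float(np.finfo(np.float32).max)  # largest finite float32

with np.errstate(over="ignore"):           # newer NumPy warns on overflow
    forced = big.astype(np.float32)

print(np.isfinite(big[0]))   # True
print(big[0] > f32_max)      # True
print(np.isinf(forced[0]))   # True
```

A range check against `np.finfo(np.float32).max` before casting would catch this case, although it would not catch the precision loss seen with smaller values.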
Short summary
to_numeric downcasts integers "safely," that is, it only returns a downcasted result if that result == the argument. But it downcasts floats "non-safely" / too aggressively, that is, it forces a downcasted result even when that result != the argument.
Illustration for integers: Behavior is as expected
For big integers that must be represented by int64 (because they are greater than np.iinfo('int32').max), forcing a downcast to int32 by using .astype('int32') is destructive in that the result is no longer == the argument. But to_numeric with downcast='integer' is "safe" in that it will refuse to downcast and instead return a result that is still int64.
It looks like this behavior was discussed in the resolved issue #14941.
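The difference between a forced and a safe integer downcast can be sketched with NumPy alone. The helper `downcast_int_if_lossless` is hypothetical, and its range check is an assumption about how a safe downcast could work, not a copy of the pandas implementation.

```python
import numpy as np

int32_max = np.iinfo("int32").max  # 2147483647
big = np.array([int32_max + 1], dtype=np.int64)

# A forced cast wraps around silently, so the result != the argument.
forced = big.astype(np.int32)
print(forced[0])  # -2147483648


def downcast_int_if_lossless(arr, dtype=np.int32):
    """Hypothetical helper: downcast only if every value fits in dtype."""
    info = np.iinfo(dtype)
    if arr.size == 0 or (arr.min() >= info.min and arr.max() <= info.max):
        return arr.astype(dtype)
    return arr  # refuse to downcast, keep the original dtype


print(downcast_int_if_lossless(big).dtype)                                # int64
print(downcast_int_if_lossless(np.array([1, 2], dtype=np.int64)).dtype)  # int32
```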
Illustration for floats: Behavior is unexpected and potentially harmful
For big floats, using to_numeric with downcast='float' appears to be just as forceful as using .astype('float32'), in that it returns a downcasted result even if that result is no longer == the argument.
Expected output:
Output of pd.show_versions()
pandas: 0.22.0
pytest: 3.2.1
pip: 9.0.1
setuptools: 36.5.0.post20170921
Cython: 0.26.1
numpy: 1.14.0
scipy: 0.19.1
pyarrow: 0.8.0
xarray: None
IPython: 6.1.0
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.1.0
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.2.0
xlsxwriter: 1.0.2
lxml: 4.1.0
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: 1.2.1
pymysql: 0.7.11.None
psycopg2: 2.7.3.2 (dt dec pq3 ext lo64)
jinja2: 2.9.6
s3fs: None
fastparquet: 0.1.4
pandas_gbq: None
pandas_datareader: None