pd.to_numeric(series, downcast='integer') does not properly handle floats over 10,000 #14941

gryBox · 2016-12-21T11:54:27Z

Hi - I came across this issue on Stack Overflow while testing pd.to_numeric()

If the floats in a column are over 10000, it loses precision and converts them to integers.

import datetime
import pandas as pd

tst_df = pd.DataFrame({'colA':['a','b','c','a','z', 'q'],
                      'colB': pd.date_range(end=datetime.datetime.now() , periods=6),
                      'colC' : ['a1','b2','c3','a4','z5', 'q6'],
                      'colD': [10000.0, 20000, 3000, 40000.36, 50000, 50000.00]})

# Reported behavior: the result is an integer column, so 40000.36 loses its fraction
pd.to_numeric(tst_df['colD'],  downcast='integer')

This doesn't seem like the desired behavior.

jreback · 2016-12-21T12:55:31Z

So this is correct in the sense that it is effectively doing an .astype(int). Though I suppose this is too aggressive, and the downcast from float -> int should only succeed if the values are actually equal.

@gfyoung thoughts
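
The stricter rule suggested here (downcast only when the values are actually equal) can be sketched in plain Python. This is only an illustration of the idea, not the pandas implementation; the helper name downcast_if_integral is hypothetical.

```python
def downcast_if_integral(values):
    """Return ints only when every value is exactly integral (hypothetical helper).

    Otherwise the values are returned unchanged as floats, so nothing is lost.
    """
    floats = [float(v) for v in values]
    # float.is_integer() is False for fractional values, NaN, and infinities.
    if all(f.is_integer() for f in floats):
        return [int(f) for f in floats]
    return floats


print(downcast_if_integral([10000.0, 20000, 50000.0]))     # [10000, 20000, 50000]
print(downcast_if_integral([10000.0, 40000.36, 50000.0]))  # [10000.0, 40000.36, 50000.0]
```

With this rule a single fractional value such as 40000.36 is enough to keep the whole column as floats, regardless of how large the numbers are.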

gryBox · 2016-12-21T14:47:27Z

Just so I am clear, this is the behavior expected? Not this?

tst_df = pd.DataFrame({'colA':['a','b','c','a','z', 'q'],
                      'colB': pd.date_range(end=datetime.datetime.now() , periods=6),
                      'colC' : ['a1','b2','c3','a4','z5', 'q6'],
                      'colD': [1000.0, 2000, 3000, 4000.36, 5000, 50000.00]})

pd.to_numeric(tst_df['colD'],  downcast='integer')

jreback · 2016-12-21T16:28:34Z

Hmm, this looks buggy. If you want to have a look inside and see what's going on, that would be great.

gryBox · 2016-12-21T16:52:28Z

@jreback I will try - I am sort of new to this. Is this the module I should be looking at?
pandas/pandas/tools/util.py

jreback · 2016-12-21T17:05:11Z

yes

gfyoung · 2016-12-21T17:07:49Z

@gryBox : First of all, thanks for pointing this out! If you follow the code from where you started, the bug traces to an np.allclose call. You can see it for yourself:

>>> import numpy as np
>>>
>>> arr = [1000000]
>>> arr2 = [1000000.5]
>>>
>>> np.allclose(arr, arr2)  # This is what we do now
True
>>> np.allclose(arr, arr2, rtol=0)  # This is what we probably should do
False

I think passing rtol=0 should patch this behavior. It also explains why it breaks with large numbers: the allowed difference scales with the values, so for numbers this large the default rtol is not small enough to indicate that they're not close.
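
To make the arithmetic concrete, here is a small sketch of the comparison that np.allclose documents, |a - b| <= atol + rtol * |b|, with its defaults rtol=1e-05 and atol=1e-08. The helper is_close is a made-up scalar stand-in written in plain Python, not the pandas code path; the values are taken from the two colD examples earlier in this thread.

```python
def is_close(a, b, rtol=1e-05, atol=1e-08):
    """Scalar form of the check documented for np.allclose (hypothetical helper)."""
    return abs(a - b) <= atol + rtol * abs(b)


# Compare the integer each float would become against the original float.
large = is_close(40000, 40000.36)            # allowed gap is about 0.4, gap is 0.36
small = is_close(4000, 4000.36)              # allowed gap is about 0.04, gap is 0.36
strict = is_close(40000, 40000.36, rtol=0)   # allowed gap is only atol (1e-08)

print(large, small, strict)  # True False False
```

For a fractional part of 0.36, the default tolerance starts hiding the difference once the value passes roughly 36,000, which matches the two examples: 40000.36 is treated as equal to 40000, while 4000.36 is not treated as equal to 4000.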

gryBox · 2016-12-22T15:17:00Z

@gfyoung : The credit goes to tworec on Stack Overflow.

gfyoung · 2016-12-22T15:44:46Z

@gryBox : I'm not sure there is such a method. Also, numpy dtype checking is not 100% compatible with all of our pandas objects (deliberate), hence we prefer to stay away from such methods in numpy.

jreback added the Numeric Operations label Dec 21, 2016

jreback added the Bug and Difficulty Novice labels Dec 21, 2016

jreback added this to the Next Major Release milestone Dec 21, 2016

gfyoung added a commit to forking-repos/pandas that referenced this issue Dec 31, 2016

BUG: Patch float and uint handling in to_numeric

9e35819

Closes pandas-devgh-14941.

gfyoung mentioned this issue Dec 31, 2016

BUG: Patch float and uint handling in to_numeric #15024

Closed

jreback modified the milestones: 0.20.0, Next Major Release Dec 31, 2016

jreback closed this as completed in 5353e59 Dec 31, 2016

jake-westfall mentioned this issue Feb 16, 2018

to_numeric(..., downcast='float') is too aggressive #19729

Closed
