BUG: to_numeric doesn't work uint64 numbers #14422

verhalenn · 2016-10-14T02:15:33Z

uint64 isn't very well supported right now, but this is something to consider.

In [10]: pd.to_numeric(pd.Series([0, 9223372036854775808]), downcast = 'unsigned')
---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
<ipython-input-10-6e8272095758> in <module>()
----> 1 pd.to_numeric(pd.Series([0, 9223372036854775808]), downcast = 'unsigned')

/home/verhalenn/Documents/Open-Source/pandas/pandas/tools/util.py in to_numeric(arg, errors, downcast)
    193             coerce_numeric = False if errors in ('ignore', 'raise') else True
    194             values = lib.maybe_convert_numeric(values, set(),
--> 195                                                coerce_numeric=coerce_numeric)
    196 
    197     except Exception:

/home/verhalenn/Documents/Open-Source/pandas/pandas/src/inference.pyx in pandas.lib.maybe_convert_numeric (pandas/lib.c:53043)()
    667             seen_float = True
    668         elif util.is_integer_object(val):
--> 669             floats[i] = ints[i] = val
    670             seen_int = True
    671         elif util.is_bool_object(val):

OverflowError: Python int too large to convert to C long

gfyoung · 2016-10-14T03:54:15Z

@jreback : The bug can be traced to this line here, where we try to store uint64 in an int64 array. Do we need to create another array for integers beyond np.iinfo(np.int64).max?
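To illustrate the overflow described above, here is a minimal sketch using plain NumPy rather than the Cython code path in the traceback. The variable names are illustrative only; it assumes nothing beyond the value from the report. The value is one past the int64 maximum, so it cannot be stored in an int64 buffer, but it fits in uint64.

```python
import numpy as np

INT64_MAX = np.iinfo(np.int64).max      # largest value an int64 slot can hold
UINT64_MAX = np.iinfo(np.uint64).max    # largest value a uint64 slot can hold
value = 9223372036854775808             # the value from the report (INT64_MAX + 1)

fits_int64 = value <= INT64_MAX
fits_uint64 = 0 <= value <= UINT64_MAX

# Forcing the value into an int64 array fails, analogous to the
# `ints[i] = val` assignment in the traceback.
try:
    np.array([0, value], dtype=np.int64)
    overflowed = False
except OverflowError:
    overflowed = True

# A uint64 array holds the same values exactly.
as_uint64 = np.array([0, value], dtype=np.uint64)
print(fits_int64, fits_uint64, overflowed, as_uint64.dtype)
```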

jreback · 2016-10-14T10:05:34Z

Yep, we don't have many tests or much support for uint64 at the moment, so pull requests to fix this are welcome.

gfyoung · 2016-10-14T15:13:12Z

@jreback : What do you think would be the best way to patch this? The problematic case arises when we have both negative integers and positive integers that exceed np.iinfo(np.int64).max:

>>> import numpy as np
>>> np.array([-1, np.iinfo(np.int64).max + 1])  # Even numpy has issues.
array([ -1.00000000e+00,   9.22337204e+18])
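The float result above comes from NumPy's type promotion: no integer dtype covers both negative values and values above the int64 maximum, so int64 and uint64 promote to float64. The sketch below shows why that is lossy and that an object array keeps the exact integers. It is an illustration of the problem, not a proposed patch, and the variable names are made up for the example.

```python
import numpy as np

big = np.iinfo(np.int64).max + 1        # 2**63, too large for int64

# int64 and uint64 have no common integer dtype, so NumPy promotes to float64.
common = np.result_type(np.int64, np.uint64)

# float64 cannot represent every integer at this magnitude, so big + 1
# does not survive a round trip through a float array.
as_float = np.array([-1, big + 1], dtype=np.float64)
round_trip = int(as_float[1])

# An object array stores Python ints and keeps the values exact,
# at the cost of losing vectorized integer arithmetic.
as_object = np.array([-1, big + 1], dtype=object)

print(common, round_trip == big + 1, as_object[1] == big + 1)
```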

mroeschke · 2019-10-27T18:27:41Z

Looks to work on master. Could use a test.

In [73]: pd.to_numeric(pd.Series([0, 9223372036854775808]), downcast = 'unsigned')
    ...:
Out[73]:
0                      0
1    9223372036854775808
dtype: uint64

In [74]: pd.__version__
Out[74]: '0.26.0.dev0+684.g953757a3e'
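A regression test for the behavior shown above could look roughly like the following. This is only a sketch: the function name is hypothetical, and it is not necessarily the test that was eventually added to pandas. It assumes a pandas version where the call succeeds, as in the output above.

```python
import pandas as pd
import pandas.testing as tm


def check_to_numeric_uint64_downcast():
    # Hypothetical test name; the input reproduces the original report.
    ser = pd.Series([0, 9223372036854775808])
    result = pd.to_numeric(ser, downcast="unsigned")

    # The values only fit in uint64, so no smaller unsigned dtype applies.
    expected = pd.Series([0, 9223372036854775808], dtype="uint64")
    tm.assert_series_equal(result, expected)
    return result


result = check_to_numeric_uint64_downcast()
print(result.dtype)
```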

jreback added the Bug, Dtype Conversions (Unexpected or buggy dtype conversions), and Difficulty Intermediate labels Oct 14, 2016

jreback added this to the Next Major Release milestone Oct 14, 2016

jbrockmendel removed Effort Medium labels Oct 21, 2019

mroeschke added the good first issue and Needs Tests (Unit test(s) needed to prevent regressions) labels and removed the Bug and Dtype Conversions (Unexpected or buggy dtype conversions) labels Oct 27, 2019

pv8473h12 mentioned this issue Nov 2, 2019

GH14422: BUG: to_numeric doesn't work uint64 numbers #29348

Merged

jreback modified the milestones: Contributions Welcome, 1.0 Nov 5, 2019

jreback closed this as completed in #29348 Nov 5, 2019
