BUG: to_numeric doesn't work uint64 numbers #14422

Closed
verhalenn opened this issue Oct 14, 2016 · 4 comments · Fixed by #29348
Labels
good first issue Needs Tests Unit test(s) needed to prevent regressions

Comments

@verhalenn

uint64 isn't very well supported right now, but this is something to consider.

In [10]: pd.to_numeric(pd.Series([0, 9223372036854775808]), downcast = 'unsigned')
---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
<ipython-input-10-6e8272095758> in <module>()
----> 1 pd.to_numeric(pd.Series([0, 9223372036854775808]), downcast = 'unsigned')

/home/verhalenn/Documents/Open-Source/pandas/pandas/tools/util.py in to_numeric(arg, errors, downcast)
    193             coerce_numeric = False if errors in ('ignore', 'raise') else True
    194             values = lib.maybe_convert_numeric(values, set(),
--> 195                                                coerce_numeric=coerce_numeric)
    196 
    197     except Exception:

/home/verhalenn/Documents/Open-Source/pandas/pandas/src/inference.pyx in pandas.lib.maybe_convert_numeric (pandas/lib.c:53043)()
    667             seen_float = True
    668         elif util.is_integer_object(val):
--> 669             floats[i] = ints[i] = val
    670             seen_int = True
    671         elif util.is_bool_object(val):

OverflowError: Python int too large to convert to C long
@gfyoung
Member

gfyoung commented Oct 14, 2016

@jreback : The bug can be traced to this line here, where we try to store uint64 in an int64 array. Do we need to create another array for integers beyond np.iinfo(np.int64).max?
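
To make the overflow concrete, here is a minimal sketch of why the value from the report cannot be stored in an int64 array but does fit in a uint64 one. The variable names are illustrative only and do not come from the pandas source.

```python
import numpy as np

# The value from the report is exactly one past the int64 maximum.
value = 9223372036854775808

int64_max = np.iinfo(np.int64).max    # 9223372036854775807
uint64_max = np.iinfo(np.uint64).max  # 18446744073709551615

# Too large for a C long / int64 slot, which is what raises OverflowError.
exceeds_int64 = value > int64_max

# Still within the unsigned 64-bit range, so a uint64 array can hold it.
fits_uint64 = 0 <= value <= uint64_max

# Requesting uint64 explicitly stores the value without loss.
as_uint64 = np.array([0, value], dtype=np.uint64)
```

This only illustrates the range problem; it says nothing about how `maybe_convert_numeric` should be restructured.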

@jreback
Contributor

jreback commented Oct 14, 2016

yep, we don't have lots of tests / support for uint64's atm. so pull requests to fix welcome.

@jreback jreback added Bug Dtype Conversions Unexpected or buggy dtype conversions Difficulty Intermediate labels Oct 14, 2016
@jreback jreback added this to the Next Major Release milestone Oct 14, 2016
@gfyoung
Member

gfyoung commented Oct 14, 2016

@jreback : What do you think would be the best way to patch this? The problematic case arises when negative integers are mixed with positive integers that exceed np.iinfo(np.int64).max:

>>> import numpy as np
>>> np.array([-1, np.iinfo(np.int64).max + 1])  # Even numpy has issues.
array([ -1.00000000e+00,   9.22337204e+18])
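
One way to reason about that mixed case is sketched below. `pick_integer_dtype` is a hypothetical helper written for this illustration, not a pandas or NumPy function, and it assumes a non-empty sequence of Python integers. It returns `None` when no single 64-bit integer dtype can hold every value, which is the situation where NumPy falls back to floats in the example above.

```python
import numpy as np

INT64_MIN = np.iinfo(np.int64).min
INT64_MAX = np.iinfo(np.int64).max
UINT64_MAX = np.iinfo(np.uint64).max


def pick_integer_dtype(values):
    """Hypothetical helper: choose int64 or uint64 for a non-empty
    sequence of Python ints, or None if neither dtype fits them all."""
    lo, hi = min(values), max(values)
    if lo >= INT64_MIN and hi <= INT64_MAX:
        return np.dtype("int64")
    if lo >= 0 and hi <= UINT64_MAX:
        return np.dtype("uint64")
    # Negative values mixed with values above the int64 maximum:
    # no 64-bit integer dtype can represent both.
    return None


all_signed = pick_integer_dtype([-1, 5])
all_unsigned = pick_integer_dtype([0, 9223372036854775808])
mixed = pick_integer_dtype([-1, 9223372036854775808])
```

What a caller should do when the result is `None` (raise, coerce to float, or keep object dtype) is a design decision this sketch leaves open.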

@mroeschke
Member

This appears to work on master. It could use a test.

In [73]: pd.to_numeric(pd.Series([0, 9223372036854775808]), downcast = 'unsigned')
Out[73]:
0                      0
1    9223372036854775808
dtype: uint64

In [74]: pd.__version__
Out[74]: '0.26.0.dev0+684.g953757a3e'
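
A regression test along these lines might look like the sketch below. The test name is made up for illustration, and it assumes a pandas version in which the behaviour shown in the session above holds; it is not necessarily the test that was eventually merged.

```python
import numpy as np
import pandas as pd
from pandas.testing import assert_series_equal


def test_to_numeric_downcast_uint64():
    # Hypothetical test name; the input mirrors the original report.
    ser = pd.Series([0, 9223372036854775808])
    expected = pd.Series([0, 9223372036854775808], dtype=np.uint64)

    # Should no longer raise OverflowError and should keep uint64,
    # since no smaller unsigned dtype can hold the second value.
    result = pd.to_numeric(ser, downcast="unsigned")
    assert_series_equal(result, expected)
```

Run under pytest, this fails with the `OverflowError` from the original report on affected versions and passes where the conversion works.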

@mroeschke mroeschke added good first issue Needs Tests Unit test(s) needed to prevent regressions and removed Bug Dtype Conversions Unexpected or buggy dtype conversions labels Oct 27, 2019
@jreback jreback modified the milestones: Contributions Welcome, 1.0 Nov 5, 2019
5 participants