Skip to content

Fix to numeric on decimal fields #14842

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

RogerThomas
Copy link
Contributor

@RogerThomas RogerThomas commented Dec 9, 2016

@RogerThomas RogerThomas force-pushed the fix_to_numeric_on_decimal_fields branch from 3996fe8 to 2d2488c Compare December 9, 2016 15:35
@RogerThomas
Copy link
Contributor Author

@jreback Does this look ok to you?

@jorisvandenbossche
Copy link
Member

@RogerThomas You don't need to open a new PR to update. You can just push again the same branch, and the PR on github will automatically be updated.

@jorisvandenbossche jorisvandenbossche added Enhancement Numeric Operations Arithmetic, Comparison, and Logical operations labels Dec 9, 2016
@RogerThomas
Copy link
Contributor Author

@jorisvandenbossche, thanks yeah, I thought the same, but the other PR doesn't seem to have been updated, maybe it only updates if its status is open. Anyway, I'll leave this one here and leave the other one closed if it's ok with you.

@jorisvandenbossche
Copy link
Member

maybe it only updates if its status is open.

indeed

@@ -173,6 +174,8 @@ def to_numeric(arg, errors='raise', downcast=None):
values = arg.values
elif isinstance(arg, (list, tuple)):
values = np.array(arg, dtype='O')
elif isinstance(arg, Decimal):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should not be special cased

instead remove the use of np.isscalar and use is_scalar, which I think should work with decimal

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pandas.lib.isscalar doesn't return true for decimal

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try changing that and see if anything breaks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback I did, I tried

if lib.isscalar(arg):
    ...

But I don't see how lib.isscalar could return True for decimal

this is the isscalar code

def isscalar(object val):
    """
    Return True if given value is scalar.

    This includes:
    - numpy array scalar (e.g. np.int64)
    - Python builtin numerics
    - Python builtin byte arrays and strings
    - None
    - instances of datetime.datetime
    - instances of datetime.timedelta
    - Period

    """

    return (np.PyArray_IsAnyScalar(val)
            # As of numpy-1.9, PyArray_IsAnyScalar misses bytearrays on Py3.
            or PyBytes_Check(val)
            # We differ from numpy (as of 1.10), which claims that None is
            # not scalar in np.isscalar().
            or val is None
            or PyDate_Check(val)
            or PyDelta_Check(val)
            or PyTime_Check(val)
            or util.is_period_object(val))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @jreback any more thoughts on this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback I understand where you're coming from, but I don't see how to get around it in this case.
If you're happy with it, this PR should be ready to go

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

were you able to move addtl logic to is_scalar as well (IOW, did that break things irrespective of this change)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback I can easily move the isinstance(value, Decimal) check to the isscalar function, but I don't see the point, I'd still have to check again after this in order to call float(value)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok this is what I'd like to see; we are trying to treat decimal.Decimal in a more thoughtful way

In [7]: d = decimal.Decimal('1.0')

In [8]: pandas.types.common.is_scalar(d)
Out[8]: False

In [9]: pandas.types.common.is_number(d)
Out[9]: True

I'd like [7] to be True.
further I'd like to add a is_decimal in pandas.types.inference

then the last part can be something like:

    elif isinstance(arg, (list, tuple)):
        values = np.array(arg, dtype='O')
    elif is_scalar(arg): # previous np.isscalar(arg):
        if is_decimal(arg):
            # coerce
            arg = float(arg)
        if is_number(arg):
            return arg
        is_scalar = True
        values = np.array([arg], dtype='O')

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also need to add some tests where decimals are in a list/array (they will be then subseuqnetly converted via maybe_convert_numeric which is also fine.

@codecov-io
Copy link

codecov-io commented Dec 9, 2016

Current coverage is 84.64% (diff: 100%)

Merging #14842 into master will decrease coverage by <.01%

@@             master     #14842   diff @@
==========================================
  Files           144        144          
  Lines         51016      51019     +3   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          43185      43187     +2   
- Misses         7831       7832     +1   
  Partials          0          0          

Powered by Codecov. Last update 8e630b6...91d989b

@@ -173,7 +174,9 @@ def to_numeric(arg, errors='raise', downcast=None):
values = arg.values
elif isinstance(arg, (list, tuple)):
values = np.array(arg, dtype='O')
elif np.isscalar(arg):
elif lib.isscalar(arg):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

import these from pandas.types.common (like everything else). you prob have to fix is_decimal to import also (it should be imported into pandas.types.inference, which pandas.types.common subseuqently imports). pandas.types.common is the internal API.

@jreback
Copy link
Contributor

jreback commented Dec 19, 2016

almost perfect. left some comments.

@RogerThomas
Copy link
Contributor Author

Thanks @jreback, I actually had it like that originally but decided to just directly import it from lib. Changed back now

@jreback jreback added this to the 0.20.0 milestone Dec 19, 2016
@jreback
Copy link
Contributor

jreback commented Dec 19, 2016

ok lgtm. ping on green.

@RogerThomas
Copy link
Contributor Author

RogerThomas commented Dec 20, 2016

@jreback all ready to go

@jreback jreback closed this in 5faf32a Dec 20, 2016
@jreback
Copy link
Contributor

jreback commented Dec 20, 2016

thanks!

ShaharBental pushed a commit to ShaharBental/pandas that referenced this pull request Dec 26, 2016
closes pandas-dev#14827

Author: Roger Thomas <[email protected]>
Author: Roger Thomas <[email protected]>

Closes pandas-dev#14842 from RogerThomas/fix_to_numeric_on_decimal_fields and squashes the following commits:

91d989b [Roger Thomas] Merge branch 'master' of github.com:pandas-dev/pandas into fix_to_numeric_on_decimal_fields
d7972d7 [Roger Thomas] Move isdecimal to internal api
1f1c62c [Roger Thomas] Add Test And Refactor is_decimal
f1b69da [Roger Thomas] Merge branch 'master' of github.com:pandas-dev/pandas into fix_to_numeric_on_decimal_fields
2d2488c [Roger Thomas] Fix To Numeric on Decimal Fields
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Numeric Operations Arithmetic, Comparison, and Logical operations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pandas.to_numeric raises ValueError when applied to python Decimal object
4 participants