Skip to content

PERF: to_numeric for numeric dtypes #12777

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

sinhrks
Copy link
Member

@sinhrks sinhrks commented Apr 2, 2016

  • tests added / passed
  • passes git diff upstream/master | flake8 --diff
  • whatsnew entry

Skip object conversion if input is numeric already.

-  146.41ms    26.45μs      0.00  miscellaneous.to_numeric.time_from_float

@sinhrks sinhrks added Performance Memory or execution speed performance Dtype Conversions Unexpected or buggy dtype conversions Numeric Operations Arithmetic, Comparison, and Logical operations labels Apr 2, 2016
@sinhrks sinhrks added this to the 0.18.1 milestone Apr 2, 2016
@jreback
Copy link
Contributor

jreback commented Apr 2, 2016

lgtm. ping on green.

@jreback
Copy link
Contributor

jreback commented Apr 2, 2016

In [1]: pd.date_range('20130101',periods=3,tz='US/Eastern')
Out[1]: DatetimeIndex(['2013-01-01 00:00:00-05:00', '2013-01-02 00:00:00-05:00', '2013-01-03 00:00:00-05:00'], dtype='datetime64[ns, US/Eastern]', freq='D')

In [2]: pd.to_numeric(pd.date_range('20130101',periods=3,tz='US/Eastern'))
TypeError: 'Index' does not have the buffer interface

In [3]: pd.to_numeric(pd.date_range('20130101',periods=3,tz='US/Eastern').to_series())
Out[3]: 
2013-01-01 00:00:00-05:00    1357016400000000000
2013-01-02 00:00:00-05:00    1357102800000000000
2013-01-03 00:00:00-05:00    1357189200000000000
Freq: D, dtype: int64

@jreback
Copy link
Contributor

jreback commented Apr 25, 2016

@sinhrks this looked fine. can you just rebase (I think this was sent when codecov was misbehaving).

@jreback
Copy link
Contributor

jreback commented Apr 28, 2016

we can push this as well, lmk.

@codecov-io
Copy link

codecov-io commented Apr 29, 2016

Current coverage is 83.91%

Merging #12777 into master will increase coverage by +<.01%

@@             master     #12777   diff @@
==========================================
  Files           136        136          
  Lines         49918      49931    +13   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          41885      41899    +14   
+ Misses         8033       8032     -1   
  Partials          0          0          
  1. File ...das/tseries/index.py (not in diff) was modified. more
    • Misses -1
    • Partials 0
    • Hits +1

Powered by Codecov. Last updated by 2d2b45a

@sinhrks
Copy link
Member Author

sinhrks commented Apr 29, 2016

@jreback Current master has following issues. I've updated the PR to included the fixes.

1. to_numeric(Index) returns np.ndarray, not Index

To be compat with to_datetime, it must be Index.

pd.to_numeric(pd.Index([1, 2, 3]))
# array([1, 2, 3])

2. datetime-likes are not actually supported

Based on your examples, it should be asi8 repr.

pd.to_numeric(pd.date_range('2011-01-01', freq='M', periods=3))
# TypeError: 'Index' does not have the buffer interface

pd.to_numeric(pd.Series(pd.date_range('2011-01-01', freq='M', periods=3)))
# TypeError: Invalid object type

tm.assert_index_equal(res, pd.Index(idx.asi8, name='xxx'))

# ToDo: enable when we can support native PeriodDtype
# res = pd.to_numeric(pd.Series(idx, name='xxx'))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoided to use is_period_array ATM. we can have faster impl when period dtype is added.

elif getattr(arg, 'ndim', 1) > 1:
raise TypeError('arg must be a list, tuple, 1-d array, or Series')
else:
values = arg
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will fail with a scalar (though that might be another issue).

Copy link
Member Author

@sinhrks sinhrks Apr 29, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently it fails on master. Will include the fix.

pd.to_numeric(1)
# ValueError: Buffer has wrong number of dimensions (expected 1, got 0)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gr8!. I think there might be an issue about this somewhere.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#12136 (comment)

not a separate issue.

@sinhrks
Copy link
Member Author

sinhrks commented Apr 29, 2016

Added scalar impl and tests (test_scalar). Now green except for codecov/changes.

@jreback
Copy link
Contributor

jreback commented Apr 29, 2016

thanks!

@jreback jreback closed this in 84725fa Apr 29, 2016
@sinhrks sinhrks deleted the tonumeric_perf branch April 29, 2016 17:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dtype Conversions Unexpected or buggy dtype conversions Numeric Operations Arithmetic, Comparison, and Logical operations Performance Memory or execution speed performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants