
pd.PeriodIndex is a lot slower since version 0.15.2 #12889


Closed
rs2 opened this issue Apr 13, 2016 · 4 comments
Labels: Performance (Memory or execution speed performance), Period (Period data type)

Comments

rs2 (Contributor) commented Apr 13, 2016

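PeriodIndex is one of the key tools for time series analysis, and constructing one has become much slower since pandas 0.15.2: the timing below goes from about 3 ms per loop in 0.15.2 to about 95 ms in 0.16.2 and about 220 ms in 0.17.0 and 0.18.0. The snippet below reproduces the slowdown and is easy to profile.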

python --version
python -c "import pandas as pd; print('Pandas version', pd.__version__)"
python -m timeit -s "import pandas as pd" "pd.PeriodIndex(pd.date_range('2015-1-1', '2016-1-1', freq='D').to_pydatetime(), freq='D')"

Python 3.4.4
Pandas version 0.15.2
10 loops, best of 3: 3.27 msec per loop

Python 3.4.4
Pandas version 0.16.2
10 loops, best of 3: 94.8 msec per loop

Python 3.4.3 :: Continuum Analytics, Inc.
Pandas version 0.17.0
10 loops, best of 3: 221 msec per loop

Python 3.5.1 :: Anaconda 4.0.0 (32-bit)
Pandas version 0.18.0
10 loops, best of 3: 220 msec per loop
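The same measurement can also be run as a standalone script. A minimal sketch, assuming a recent pandas: it builds the daily range once, times only the conversion step, and puts the original `to_pydatetime()` route next to a direct `DatetimeIndex.to_period` call. The function names are hypothetical, and absolute numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd

# The daily range used in the report above (both endpoints included).
dti = pd.date_range("2015-1-1", "2016-1-1", freq="D")


def via_python_objects():
    # Original approach: box every timestamp into a datetime.datetime,
    # then let PeriodIndex convert each object back.
    return pd.PeriodIndex(dti.to_pydatetime(), freq="D")


def via_to_period():
    # Stay on the datetime64 values and convert in one vectorized step.
    return dti.to_period("D")


# Both paths should build the same index before their speed is compared.
assert via_python_objects().equals(via_to_period())

for func in (via_python_objects, via_to_period):
    # Best of 3 repeats, 10 calls each, reported per call in milliseconds.
    best = min(timeit.repeat(func, repeat=3, number=10)) / 10
    print(f"{func.__name__}: {best * 1e3:.3f} ms per call")
```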
rs2 changed the title from "pd.PeriodIndex is a lot slower since version 0.16.2" to "pd.PeriodIndex is a lot slower since version 0.15.2" on Apr 13, 2016
jreback (Contributor) commented Apr 13, 2016

In [10]: %timeit pd.PeriodIndex(pd.date_range('2015-1-1', '2016-1-1', freq='D').to_pydatetime(), freq='D')
100 loops, best of 3: 12.8 ms per loop

In [11]: %timeit pd.PeriodIndex(pd.date_range('2015-1-1', '2016-1-1', freq='D'))
10000 loops, best of 3: 192 µs per loop

In [13]: %timeit pd.date_range('2015-1-1', '2016-1-1', freq='D').to_period()
1000 loops, best of 3: 205 µs per loop

In [15]: %timeit pd.period_range('2015-1-1','2016-1-1',freq='D')
1000 loops, best of 3: 447 µs per loop

In [17]: result1 = pd.period_range('2015-1-1','2016-1-1',freq='D')

In [18]: result2 = pd.date_range('2015-1-1', '2016-1-1', freq='D').to_period()

In [19]: result3 = pd.PeriodIndex(pd.date_range('2015-1-1', '2016-1-1', freq='D'))

In [20]: result4 = pd.PeriodIndex(pd.date_range('2015-1-1', '2016-1-1', freq='D').to_pydatetime(), freq='D')

In [21]: result1.equals(result2)
Out[21]: True

In [22]: result1.equals(result3)
Out[22]: True

In [23]: result1.equals(result4)
Out[23]: True

That's quite a convoluted way of creating a PI, since it converts to Python objects and back. There are many much better ways, such as the three shown above, which avoid that round trip.
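To make the point about Python objects concrete, here is a small sketch (assuming a recent pandas and NumPy) of what each route hands to `PeriodIndex`: `to_pydatetime()` produces an object-dtype array holding one `datetime.datetime` per element, each of which has to be converted individually, whereas the `DatetimeIndex` itself is backed by a single `datetime64` array that `to_period` can convert in bulk.

```python
import datetime

import numpy as np
import pandas as pd

dti = pd.date_range("2015-1-1", "2016-1-1", freq="D")

# Vectorized storage: one contiguous NumPy datetime64 array.
native = dti.values
# Boxed storage: an object array with one Python datetime per element.
boxed = dti.to_pydatetime()

print(native.dtype, isinstance(native, np.ndarray))
print(boxed.dtype, all(isinstance(x, datetime.datetime) for x in boxed))

# Both routes still end up with the same PeriodIndex.
pi_boxed = pd.PeriodIndex(boxed, freq="D")
pi_native = dti.to_period("D")
print(pi_boxed.equals(pi_native))
```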

jreback closed this as completed on Apr 13, 2016
jreback added the Performance (Memory or execution speed performance) and Period (Period data type) labels on Apr 13, 2016
rs2 (Contributor, Author) commented Apr 14, 2016

Jeff, any recommendations for an alternative way of creating a PI?

jorisvandenbossche (Member) commented

@rs2 Jeff showed 3 alternative ways to create a PI above. Or is the problem that you cannot avoid starting from Python datetime.datetime objects?
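If the input really has to start as Python `datetime.datetime` objects, one option (a sketch, not a recommendation made in this thread) is to let `pd.to_datetime` convert the whole list into a `DatetimeIndex` in one vectorized call and only then call `.to_period('D')`. The helper name `periods_from_datetimes` is hypothetical, and whether this beats the direct `PeriodIndex(..., freq='D')` call depends on the pandas version, so it is worth timing on real data.

```python
import datetime

import pandas as pd


def periods_from_datetimes(dts, freq="D"):
    """Hypothetical helper: list of datetime.datetime -> PeriodIndex.

    Converts the whole list to datetime64 in one call, then maps the
    timestamps to periods, instead of building one Period per object.
    """
    return pd.to_datetime(dts).to_period(freq)


start = datetime.datetime(2015, 1, 1)
dts = [start + datetime.timedelta(days=i) for i in range(366)]

pi = periods_from_datetimes(dts)
# Same result as the direct (per-object) constructor call.
print(pi.equals(pd.PeriodIndex(dts, freq="D")))
```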

@jreback But this is potentially related to the slowdown in time series plotting (#11831): for the timeit above I see a similar difference between Python 2 and Python 3 (ca. 20 ms vs. ca. 300 ms, both with 0.17.1).

rs2 (Contributor, Author) commented Apr 14, 2016

@jreback, @jorisvandenbossche: I need a PeriodIndex with irregular dates, and the performance of the following snippet has deteriorated significantly after 0.15.2. Can you recommend an alternative way of creating a Series with data at irregular intervals that preserves those irregular dates, rather than resampling the series with interpolation?

pd.PeriodIndex(['20160101', '20160106', '20160128'], freq='D')
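A PeriodIndex does not have to be regular, so a Series can keep irregular dates as they are without resampling. A minimal sketch, assuming the dates arrive as strings in a fixed `YYYYMMDD` layout: parsing them all at once with `pd.to_datetime(..., format='%Y%m%d')` and then converting with `.to_period('D')` avoids building a `Period` per string. The values are made up for illustration, and the speed benefit is an assumption worth measuring.

```python
import pandas as pd

dates = ["20160101", "20160106", "20160128"]
values = [1.0, 2.5, 4.0]  # Illustrative data only.

# Parse all strings in one call with an explicit format, then convert to periods.
index = pd.to_datetime(dates, format="%Y%m%d").to_period("D")
s = pd.Series(values, index=index)

print(s)
# The gaps between dates are preserved; nothing is resampled or interpolated.
print(s.index.equals(pd.PeriodIndex(["2016-01-01", "2016-01-06", "2016-01-28"], freq="D")))
```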
