Skip to content

Fix performance issues when creating multiple instances of Period (#12903, #11831) #12909

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 6 commits into from

Conversation

rs2
Copy link
Contributor

@rs2 rs2 commented Apr 16, 2016

closes #12903
closes #11831

  • tests added / passed
  • passes git diff upstream/master | flake8 --diff
  • whatsnew entry

@@ -0,0 +1,7 @@
US_RESO = 0
MS_RESO = 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't add another module instead put these in frequencies

@jreback
Copy link
Contributor

jreback commented Apr 16, 2016

pls run the asv suite for relevant benchmarks

@jreback
Copy link
Contributor

jreback commented Apr 16, 2016

needs a whatsnew entry

@jreback jreback added Performance Memory or execution speed performance Period Period data type labels Apr 17, 2016
@jreback
Copy link
Contributor

jreback commented Apr 17, 2016

also pls show the asv runs for the period benchmarks

@rs2
Copy link
Contributor Author

rs2 commented Apr 20, 2016

@jreback , @tswicegood Is there a doc describing the hoops I need to jump through to get Travis and Jeff happy to merge? 😄 I.e. the names of the scripts I need to run to add asv benchmarks, pass coverage and pep8 requirements?

@jorisvandenbossche
Copy link
Member

@rs2 see @jreback's previous comments:

@jreback
Copy link
Contributor

jreback commented Apr 25, 2016

can you show some benchmark results (prob need some asv's for this as well).

@rs2 rs2 closed this Apr 26, 2016
@rs2
Copy link
Contributor Author

rs2 commented Apr 26, 2016

Will submit another pull request

@jreback
Copy link
Contributor

jreback commented Apr 26, 2016

just force push to this PR
opening multiple PRs for the same issue is confusing and more work for others

@rs2
Copy link
Contributor Author

rs2 commented Apr 26, 2016

The failing test is not mine.

@rs2 rs2 reopened this Apr 26, 2016
@@ -243,7 +243,7 @@ Performance Improvements


- Improved performance of ``DataFrame.to_sql`` when checking case sensitivity for tables. Now only checks if table has been created correctly when table name is not lower case. (:issue:`12876`)

- Improved performance of creating multiple instances of Period. This has regressed from ``O(1)`` to ``O(n)`` in terms on number of calls to ``_find_and_load`` between ``0.15.2`` and ``0.18.0``. (:issue:`12903`, :issue:`11831`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just say improved perf in period construction and plotting of periods

@jreback
Copy link
Contributor

jreback commented Apr 26, 2016

@rs2 that test test runner has been known failing for a while.

@codecov-io
Copy link

codecov-io commented Apr 26, 2016

Current coverage is 83.84%

Merging #12909 into master will increase coverage by +0.10%

@@           master     #12909   diff @@
========================================
  Files         134        136     +2   
  Lines       49565      49716   +151   
  Methods         0          0          
  Branches        0          0          
========================================
+ Hits        41507      41684   +177   
+ Misses       8058       8032    -26   
  Partials        0          0          
  1. 6 files in pandas/tseries were modified. more
    • Misses +4
    • Hits +3
  2. 2 files in pandas/tools were modified. more
    • Misses +1
    • Hits +53
  3. 2 files in pandas/core were modified. more
    • Misses +2
    • Hits -2
  4. 4 files in pandas were modified. more
    • Misses +2
    • Hits +8
  5. 2 files (not in diff) in pandas were modified. more
    • Misses -7
    • Hits +7

Sunburst

Powered by Codecov. Last updated by 5cbe9f7

@rs2
Copy link
Contributor Author

rs2 commented Apr 26, 2016

Adding asv benchmarks...

@rs2
Copy link
Contributor Author

rs2 commented Apr 26, 2016

asv benchmarks

From 5.89s to 525ms:

Before the change:

~/pandas_master/asv_bench$ asv dev -b create_period
· Discovering benchmarks
· Running 1 total benchmarks (1 commits * 1 environments * 1 benchmarks)
[  0.00%] ·· Building for /home/pashok/anaconda3/bin/python
[  0.00%] ·· Benchmarking /home/pashok/anaconda3/bin/python
[100.00%] ··· Running period.create_period_index_from_date_range.time_period_index                                                                                                    5.89s

After the change

~/pandas/asv_bench$ asv dev -b create_period
· Discovering benchmarks
· Running 1 total benchmarks (1 commits * 1 environments * 1 benchmarks)
[  0.00%] ·· Building for /home/pashok/anaconda3/bin/python
[  0.00%] ·· Benchmarking /home/pashok/anaconda3/bin/python
[100.00%] ··· Running period.create_period_index_from_date_range.time_period_index                                                                                   525.84ms


def time_period_index(self):
# Simulate irregular PeriodIndex
PeriodIndex(date_range('1985', periods=10000).to_pydatetime(), freq='D')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just make this 1000 (it will be the same ratio, but the previous versions will take less time)

@jreback jreback added this to the 0.18.1 milestone Apr 26, 2016
@jorisvandenbossche
Copy link
Member

Nice speedup! Did you run the benchmarks on py2 or py3?

@jreback
Copy link
Contributor

jreback commented Apr 26, 2016

~/pandas/asv_bench$ asv continuous -b period (-b just matches so you can see results (will compare)

@rs2
Copy link
Contributor Author

rs2 commented Apr 26, 2016

@jorisvandenbossche : plot asv benchmarks:

Before the change

~/pandas_master/asv_bench$ asv dev -b foo
· Discovering benchmarks
· Running 2 total benchmarks (1 commits * 1 environments * 2 benchmarks)
[  0.00%] ·· Building for /home/pashok/anaconda3/bin/python
[  0.00%] ·· Benchmarking /home/pashok/anaconda3/bin/python
[ 50.00%] ··· Running foo_bar.test_foo_bar.time_plot_date_ranged_df                                                                                                                   1.81s
[100.00%] ··· Running foo_bar.test_foo_bar.time_plot_plain_df                                                                                                                      129.33ms
~/pandas_master/asv_bench$ python --version
Python 3.5.1 :: Anaconda 2.5.0 (64-bit)
~/pandas_master/asv_bench$

After the change

~/pandas/asv_bench$ asv dev -b foo
· Discovering benchmarks
· Running 2 total benchmarks (1 commits * 1 environments * 2 benchmarks)
[  0.00%] ·· Building for /home/pashok/anaconda3/bin/python
[  0.00%] ·· Benchmarking /home/pashok/anaconda3/bin/python
[ 50.00%] ··· Running foo_bar.test_foo_bar.time_plot_date_ranged_df                                                                                                  214.24ms
[100.00%] ··· Running foo_bar.test_foo_bar.time_plot_plain_df                                                                                                        123.06ms
~/pandas/asv_bench$ python --version
Python 3.5.1 :: Anaconda 2.5.0 (64-bit)
~/pandas/asv_bench$
import numpy as np
import pandas as pd


class test_foo_bar(object):

    goal_time = 0.2
    N = 2000

    def setup(self):
        self.date_ranged_df = pd.DataFrame(np.random.randn(self.N, 5), index=pd.date_range('1/1/1975', periods=self.N))
        self.plain_df = pd.DataFrame(np.random.randn(self.N, 5))

    def time_plot_date_ranged_df(self):
        self.date_ranged_df.plot()

    def time_plot_plain_df(self):
        self.plain_df.plot()

@rs2
Copy link
Contributor Author

rs2 commented Apr 26, 2016

Anything else needs to be done to be able to merge?

@jreback
Copy link
Contributor

jreback commented Apr 26, 2016

looks good. ping when green.

@jreback jreback closed this in 7fbc600 Apr 26, 2016
@jreback
Copy link
Contributor

jreback commented Apr 26, 2016

thanks @rs2

@blbradley
Copy link
Contributor

This was a big TODO I had when moving Period to Cython. Thank you for working it out @rs2.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Performance Memory or execution speed performance Period Period data type
Projects
None yet
5 participants