-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
PERF: faster plotting of timeseries with PeriodIndex #4722
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@jorisvandenbossche I think you could try to create a vbench for this, nothing there now but create a file tlike this: (named say plotting or graphics): https://github.com/pydata/pandas/blob/master/vb_suite/ctors.py, and add it to the test using:
(lots of options to test_perf, e.g. you might want to start all of the tests with |
@jorisvandenbossche matplotlib actually tests this by comparing images. I don't know whether it's usable in client packages though. |
@jreback and @jtratner Thanks for the comments. Adding tests with comparing images does lead a little bit too far for this PR (and for me) I think (that is another discussion), but I will look at adding it to the current tests (the What is actually the difference between pandas/bench and pandas/vb_suite? |
vb_suite tests pandas performance against itself - used for testing that changes don't introduce performance regressions. bench - pandas vs. other software. On testing images - that seems fine to me, don't know what @jreback thinks. |
I added something to the vbench suite (see commit). This is the output I get when I run it with
The base is now taking 21 s to run (the target 0.26 s), should I decrease the size of the dataframe so it runs faster? |
prob make a bit smaller so that the older/slower one doesn't take so long |
OK, I added a test (actually there was already a PeriodSeries defined, but not used in a test) and changed the dimensions of the dataframe in the benchmark (it now runs in 0.9 s for base and 0.07 s for target) Should something be added to release notes? (it is not a bug fix, so maybe not?) |
Sure it's a bug-fix (it fixes 4705) so you should add to release.rst, also mention you've added it to vbench. :) |
Oeps, git-newbie here. I wanted to finish this up, but while rebasing interactive to squash some commits I also picked the last three commits from @jreback How can I remove them from this PR? (or is it not a problem for the history that it is included here in the overview?) |
rebase again to 1 commit prior to yours and force push |
@jreback What do you mean exactly? I dit |
I think this might work |
@jorisvandenbossche @jreback's cmd will only work if your better to do git remote add upstream git://github.com/pydata/pandas.git
git fetch upstream
git rebase upstream/master |
TST: add test for plotting PeriodSeries to SeriesPlots test
@cpcloud just about to ask! you have git-fu. I think I solved this once by
but a little kludgy |
OK, I also just did a simple |
great! |
@jorisvandenbossche you might want to read this wiki section. it gives a brief overview of rebase and merge |
@cpcloud Yeah, the basic rebase and merge I (think I) understand now, but it was because I picked too many commits with a I don't know there is something else that I have to do for this PR, but for me this is ready to merge. |
also @jorisvandenbossche if you ever run into this problem again one way to resolve it is this: let's say you're 2 commits ahead of git fetch upstream # don't forget this otherwise the next line will put you in a different state than current master!
git reset HEAD~2
git stash # push on to the stash stack
git reset --hard upstream/master # get back on master
git stash pop # pop off the stack
git add .
git commit -m'my aw-sum commit mess-ige' |
@jreback how's that for kludgy? 😄 |
I think I prefer the |
@cpcloud and I thought mine was kludgy! |
But back tot the PR, anything else I have to do? For me it is ready to merge. |
looks okay on my end |
PERF: faster plotting of timeseries with PeriodIndex
@jorisvandenbossche thanks much! |
@jreback Enjoying that I can help! |
or just use |
See discussion in #4705.
This improves the speed of plotting timeseries with PeriodIndex (or with DatetimeIndex that is converted to a PeriodIndex during plotting) drastically: from 30 s in 0.12 (but 7 s in 0.11) to 0.2 s with this commit for the plotting of following data:
Some questions I still have:
PeriodIndex
, but I left the originalif isinstance(values, Index)
in place. Is this needed? Is it possible that another index than PeriodIndex ends up in PeriodConverter?