Skip to content

PERF: Speed up DatetimeConverter by using Matplotlib's epoch2num when possible... #6650

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Mar 17, 2014
Merged

PERF: Speed up DatetimeConverter by using Matplotlib's epoch2num when possible... #6650

merged 4 commits into from
Mar 17, 2014

Conversation

agijsberts
Copy link
Contributor

closes #6636

This fixes the performance bottleneck described in #6636 by using the vectorized epoch2num for DataIndex arrays. The included test ensures that the DatetimeConverter produces the same results as the Matplotlib's date2num for individual Timestamp objects as well as DatetimeIndex array objects.

def _check_equal(self, ts1, ts2 = None):
from pandas.tseries.converter import DatetimeConverter
from matplotlib import dates
from numpy.testing import assert_almost_equal
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Numpy's assert_almost_equal is necessary, since pandas' variant does not seem to allow to specify the required significance.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe let's modify pandas version to pass thru the significance chrck

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cython function decimal_almost_equal (https://github.com/pydata/pandas/blob/master/pandas/src/testing.pyx#L37) seems to do what we need, but I don't know how/where it is exposed.

Edit: Sorry, decimal_almost_equal implements just the scalar variant. The problem with pandas' assert_almost_equal is not only that it hardcodes the number of decimal, but also that the meaning of decimal is different. Besides numpy.testing.assert_almost_equal, also numpy.allclose could be an option here.

@agijsberts
Copy link
Contributor Author

Will do. A quick question though: I just noticed there are already tests in pandas/tseries/test/test_converter.py. Would that be the correct place to add them instead, rather than pandas/tests/test_tseries.py?

@jreback
Copy link
Contributor

jreback commented Mar 15, 2014

ok to put then in the converter test suite

@jorisvandenbossche
Copy link
Member

Maybe also add a vbench speed benchmark?

@agijsberts
Copy link
Contributor Author

I created a simple vbench benchmark for datetimeindex conversion in vb_suite/timeseries.py. Before I commit though, 2 small questions:

  1. What should I use as start_date? I would imagine the date that DatetimeConverter was added to the project, any way to find out when that was?
  2. DatetimeConverter depends on matplotlib. Should I deal with possible import failures?

@jreback
Copy link
Contributor

jreback commented Mar 16, 2014

you can use 1/1/13 is fine
vbench will catch import failures so just import

@jreback jreback added this to the 0.14.0 milestone Mar 17, 2014
from pandas.tseries.converter import DatetimeConverter
"""

datetimeindex_converter = \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need to define rng in the setup. pls post a run of this vbench as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be sure, rng is already defined in common_setup; it is best practice to redefine it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry....you are right..ok then

@agijsberts
Copy link
Contributor Author

vbench run:

-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------
datetimeindex_converter                      |   0.8059 | 411.0680 |   0.0020 |
-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------

Ratio < 1.0 means the target commit is faster then the baseline.
Seed used: 1234

Target [dc95c4c] : Added vbench performance benchmark for speed up of DatetimeConverter.
Base   [4216178] : Merge pull request #5505 from jsexauer/fix5505

@jreback
Copy link
Contributor

jreback commented Mar 17, 2014

@TomAugspurger ok with this?

@TomAugspurger
Copy link
Contributor

Yep this looks good. Should I merge it?

@jreback
Copy link
Contributor

jreback commented Mar 17, 2014

@agijsberts just need a release note

@agijsberts
Copy link
Contributor Author

Latest commit implements comments and adds release notes. Ok like this?

@agijsberts
Copy link
Contributor Author

Sorry, some tests fail in the last commit. If I replace from pandas import Series with from pandas.core.series import Series all is fine. Anybody knows why this is?

@jreback
Copy link
Contributor

jreback commented Mar 17, 2014

the first causes a circular import

@agijsberts
Copy link
Contributor Author

Good to know, thanks!

@jreback
Copy link
Contributor

jreback commented Mar 17, 2014

@TomAugspurger you have the honors!

TomAugspurger pushed a commit that referenced this pull request Mar 17, 2014
PERF: Speed up DatetimeConverter by using Matplotlib's epoch2num when possible...
@TomAugspurger TomAugspurger merged commit 19ab5ed into pandas-dev:master Mar 17, 2014
@TomAugspurger
Copy link
Contributor

Thanks @agijsberts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Performance Memory or execution speed performance Visualization plotting
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Speed up DatetimeConverter for plotting
4 participants