PERF: improve perf of index iteration (GH7683) #7720

Merged
merged 1 commit into from
Jul 16, 2014

Conversation

jreback
Contributor

@jreback jreback commented Jul 10, 2014

closes #7683

PeriodIndex creation still happens in Python space, so this change does not help much there.

-------------------------------------------------------------------------------
Test name                                    | head[ms]  | base[ms]  |  ratio   |
-------------------------------------------------------------------------------
timeseries_iter_datetimeindex_preexit        |   12.7254 | 3657.9890 |   0.0035 |
timeseries_iter_datetimeindex                |  679.8913 | 3726.6284 |   0.1824 |
timeseries_iter_periodindex_preexit          |   69.0370 |   62.8881 |   1.0978 |
timeseries_iter_periodindex                  | 6941.9633 | 6024.2947 |   1.1523 |
-------------------------------------------------------------------------------

Ratio < 1.0 means the target commit is faster than the baseline.
Seed used: 1234

Target [142718e] : PERF: DatetimeIndex.__iter__ now uses ints_to_pydatetime with boxing
Base   [f9493ea] : Merge pull request #7713 from jorisvandenbossche/doc-fixes3

DOC: fix doc build warnings
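
As a quick sanity check on the table above, the ratio column appears to be the head time divided by the base time. The sketch below recomputes it from the reported figures; the dictionary and variable names are illustrative only and are not part of the benchmark tool.

```python
# Timings in milliseconds, copied from the benchmark table: (head, base).
timings = {
    "timeseries_iter_datetimeindex_preexit": (12.7254, 3657.9890),
    "timeseries_iter_datetimeindex": (679.8913, 3726.6284),
    "timeseries_iter_periodindex_preexit": (69.0370, 62.8881),
    "timeseries_iter_periodindex": (6941.9633, 6024.2947),
}

# Assumption: ratio = head / base, so a value below 1.0 means head is faster.
ratios = {name: round(head / base, 4) for name, (head, base) in timings.items()}

for name, ratio in ratios.items():
    verdict = "faster" if ratio < 1.0 else "slower"
    print(f"{name:40s} {ratio:.4f}  ({verdict} than base)")
```

Under that reading, full iteration over a DatetimeIndex takes roughly 18% of its previous time (about 5.5 times faster), while the two PeriodIndex benchmarks are slightly slower.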

if box and util.is_string_object(offset):
    from pandas.tseries.frequencies import to_offset
    offset = to_offset(offset)

if tz is not None:
Contributor

there's a lot of duplicated code under this if/else clause. could the branches be combined?

Contributor Author

these are not completely dupes
this is how cython goes though
you could write an inline function to do it
why don't you give it a try!
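
The idea of pulling the shared step out of near-duplicate branches can be sketched in plain Python as follows. This is not the pandas tslib code: the function names `_naive_from_ns` and `ints_to_datetimes` are hypothetical, and only the standard library is used, so sub-microsecond precision is dropped.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1)


def _naive_from_ns(value_ns):
    """Hypothetical shared helper: epoch nanoseconds -> naive datetime."""
    # datetime has microsecond resolution, so the last three digits are dropped.
    return EPOCH + timedelta(microseconds=value_ns // 1000)


def ints_to_datetimes(values_ns, tz=None):
    """Convert epoch-nanosecond integers to datetimes, optionally in tz."""
    if tz is None:
        # Naive branch: the helper result is used directly.
        return [_naive_from_ns(v) for v in values_ns]
    # Timezone branch: same construction, plus a UTC -> tz conversion.
    return [
        _naive_from_ns(v).replace(tzinfo=timezone.utc).astimezone(tz)
        for v in values_ns
    ]


print(ints_to_datetimes([0, 1_000_000_000]))
print(ints_to_datetimes([0], tz=timezone(timedelta(hours=2))))
```

The two branches still differ, but the construction step they have in common lives in one place.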

Contributor

you make such a good case. ok then!
though i don't know how this works. how can i pull this change and continue to work on it?

Contributor Author

like this:

git checkout -b jreback-iter origin/master
git pull https://github.com/jreback/pandas.git iter

Then you will have a local branch, jreback-iter, where you can make changes and such.
You can then push this branch up (it will be your own branch); just ping me and I'll pick up your commits.

@yrlihuan
Contributor

5 times faster! that's great!

PERF: DatetimeIndex.__iter__ now uses ints_to_pydatetime with boxing

tslib: remove code dup in ints_to_pydatetime

PERF: add inline to create_timestamp_from_*
jreback added a commit that referenced this pull request Jul 16, 2014
PERF: improve perf of index iteration (GH7683)
@jreback jreback merged commit 27b2b7d into pandas-dev:master Jul 16, 2014
@jreback
Contributor Author

jreback commented Jul 16, 2014

@yrlihuan thanks for the fixes on this!

Labels
Datetime (Datetime data dtype), Performance (Memory or execution speed performance)
Development

Successfully merging this pull request may close these issues.

.iterrows takes too long and generate large memory footprint
2 participants