BUG: DatetimeIndex.__iter__ creates a temp array of Timestamp (GH7683) #7709
gr8! pls also post the result of the vbench in the top section
idx1 = date_range(start='20140101', freq='T', periods=N)
idx2 = period_range(start='20140101', freq='T', periods=N)

def iter_n(iterable, n=None):
you don't need this, the vbench will run on different versions
hmm, so time increases for the iteration? I get the early exit issue.
From the vbench result, iteration time over PeriodIndex increases 16%. It's consistent for full/partial iteration so I guess that could be from change of using Full iteration over DatetimeIndex increases 33% and that's probably because vectorization is no longer used.
ok, I think that for replace There is quite a bit of overhead of going back/forth between python/cython. This won't work for Periods (as that is python defined). E.g. take the approach like
then, the call e.g.
that's a real improvement. I'd love to see that happen, because Timestamp can be a real pain, but I'm not familiar with cython and that's beyond me
This is from
@yrlihuan ok no problem....take a stab if you'd like (cython is actually pretty much like python code, just with types, and a bit harder to debug). if not, I don't think it is hard to do. always a learning experience!
see #7720. I think it's possible to do some sort of chunked conversion, e.g. convert 10000 elements at a time and provide them to the iterator, and so on. This will allow early exit to work and at the same time make the whole process quite a bit faster. want to take a stab at fixing that? (it's all in python space)
You can do almost exactly this for a chunked iterator: https://github.com/pydata/pandas/blob/master/pandas/io/pytables.py#L1316
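The chunked conversion suggested above can be sketched roughly as follows. This is a minimal illustration, not the code that went into #7720: the names `chunked_iter` and `seconds_to_datetimes` are hypothetical, and plain `datetime` objects built from epoch seconds stand in for Timestamp so the example needs only the standard library.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def seconds_to_datetimes(chunk):
    # Hypothetical stand-in for a vectorized conversion of raw values
    # (here: seconds since the epoch) into boxed datetime objects.
    return [EPOCH + timedelta(seconds=s) for s in chunk]


def chunked_iter(raw_values, convert_chunk, chunksize=10000):
    """Yield converted elements, converting `chunksize` raw values at a time.

    Only the chunks that are actually consumed get converted, so breaking
    out of the loop early avoids converting the rest of the data.
    """
    for start in range(0, len(raw_values), chunksize):
        chunk = raw_values[start:start + chunksize]
        for item in convert_chunk(chunk):
            yield item


# Usage: one raw value per minute, stop after the first three elements.
raw = list(range(0, 60 * 100000, 60))
for i, ts in enumerate(chunked_iter(raw, seconds_to_datetimes)):
    if i >= 2:
        break
```

With the default chunk size, the early `break` in the usage loop means only the first 10000 raw values are converted; the remaining chunks are never touched.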
closing in favor of #7720 (I put in the iterator, will update the perf in a minute)
closes #7683
Move __iter__ from DatetimeIndex/PeriodIndex to DatetimeIndexOpsMixin
Adding perf tests for change