Quick dataframe shift #5609
Comments
This would need to be done in core/internals.py. It needs to handle non-trivial shifts (e.g. something less straightforward than this) and multiple dtypes.
@halleygithub Putting this on the plate for 0.14. Care to do a PR for this?
@jreback, I am not familiar with GitHub, so I am not sure what I can do. But you are free to handle the code as you want. :)
It's a nice excuse to get familiar with it; see here: http://pandas.pydata.org/developers.html
@jreback I was wondering if I can take a stab at this?
Of course, let me know if you need help.
@jreback I see a shift function in core/internals.py in the Block class and in core/generic.py in the NDFrame class. Can you point me to a place where I can understand the basic code flow? Or can you give me some quick pointers?
shift in core/generic.py sets up what to do, e.g. how many periods to shift, and translates that into moving the data up by 3 or whatever. It then calls the internals shift. Block.shift in core/internals.py is where to make this change: simply swap out the implementation for the new one.
@jreback Thanks. I will do that.
FYI, I just fixed a bug in shift, see here: #6373. The solution proposed above is still valid in any case... make a change in
Sure. I will take your changes and merge them with mine.
@jreback I swapped the current shift with a roll to see whether I get a speedup. I tried the swap in core/internals.py, but this did not give any performance improvement. On the other hand, when I use a roll in core/generic.py (just as a test), I see a good speedup. The two implementations are shown below:
Do you have any ideas why np.roll speeds up the NDFrame but not the Block class? The timings for the above mentioned example are as follows:
You can't do it in generic.py because that only handles a single dtype. The orientation needs to be changed, as the blocks store values 'flipped' from what you think they are. You need to adjust the roll (which IIRC is just an axis swap) to account for this. In internals you are effectively using axis 1 for the shift (on a row shift); that's what the block_axis is.
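To make the orientation point concrete, here is a minimal sketch of a roll-based shift on a plain NumPy array. It is not the pandas implementation: `roll_shift` is a hypothetical helper, it assumes float data only, and the "block layout" is modelled simply as the transpose of the frame's values, as the comment above describes.

```python
import numpy as np


def roll_shift(values, periods, axis=0, fill_value=np.nan):
    """Shift a float array along `axis`, filling vacated slots.

    Hypothetical helper for illustration only; float data assumed.
    """
    values = np.asarray(values, dtype=float)
    if periods == 0:
        return values.copy()
    # np.roll wraps elements around, so the wrapped slots must be overwritten.
    out = np.roll(values, periods, axis=axis)
    indexer = [slice(None)] * values.ndim
    if periods > 0:
        indexer[axis] = slice(None, periods)   # leading slots were wrapped in
    else:
        indexer[axis] = slice(periods, None)   # trailing slots were wrapped in
    out[tuple(indexer)] = fill_value
    return out


# Frame layout: 4 rows x 3 columns. Assumed block layout: columns x rows.
frame_values = np.arange(12, dtype=float).reshape(4, 3)
block_values = frame_values.T

# A row shift is axis 0 on the frame layout but axis 1 on the block layout.
shifted_frame = roll_shift(frame_values, 1, axis=0)
shifted_block = roll_shift(block_values, 1, axis=1)
np.testing.assert_array_equal(shifted_frame, shifted_block.T)
```

Rolling along axis 0 of the transposed values would instead shift columns, so results can differ between the two layouts if the axis is not swapped.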
[goat-jreback-~/pandas] git diff
you can also remove
Please add a vbench for this as well (vb_suite/frame_methods.py).
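For a quick local check before writing the vbench entry, a plain `timeit` measurement can serve as a rough stand-in. This is only a sketch: it is not in vbench format, `best_shift_time` is a hypothetical helper, and the frame sizes are arbitrary assumptions.

```python
import timeit

import numpy as np
import pandas as pd


def best_shift_time(n_rows=10000, n_cols=50, periods=1, axis=0,
                    number=5, repeat=3):
    """Return the best per-call time (seconds) of DataFrame.shift.

    Hypothetical helper; sizes and repeat counts are arbitrary.
    """
    df = pd.DataFrame(np.random.rand(n_rows, n_cols))
    timings = timeit.repeat(lambda: df.shift(periods, axis=axis),
                            number=number, repeat=repeat)
    # Take the minimum: it is the run least disturbed by other activity.
    return min(timings) / number


if __name__ == "__main__":
    for axis in (0, 1):
        seconds = best_shift_time(axis=axis)
        print(f"shift(1, axis={axis}): {seconds * 1e3:.3f} ms per call")
```

Running it before and after the change on the same machine gives a comparable pair of numbers for each axis.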
Thanks. I will try this and add one to vbench. |
Quick implementation of dataframe shift
Perf benchmark:
Related to #4095