PERF: Panel.shift vs 0.13.1 #6826

Closed
jreback opened this issue Apr 6, 2014 · 10 comments
Labels
Performance Memory or execution speed performance Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Milestone

Comments

@jreback
Contributor

jreback commented Apr 6, 2014

related: #6605

-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------
panel_shift                                  | 263.2827 |   0.0746 | 3528.0756 |
-------------------------------------------------------------------------------

Ratio < 1.0 means the target commit is faster then the baseline.
Seed used: 1234

Target [e84efe5] : Merge pull request #6825 from cpcloud/replace-dict-vbenches

BENCH: add vbench for issue 6697
Base   [d10a658] : RLS: set released to True. v0.13.1
@dalejung
Contributor

dalejung commented Apr 7, 2014

Moved from PR thread

So thinking about this. Is there a reason that this logic wouldn't also apply to DataFrame.shift?

In [26]: df = pd.DataFrame(np.random.randn(10000, 10000))

In [38]: panel = pd.Panel(np.random.randn(464, 464, 465))

In [47]: df.values.size
Out[47]: 100000000

In [39]: panel.values.size
Out[39]: 100112640

In [40]: %timeit df.shift(1)
1 loops, best of 3: 468 ms per loop

In [43]: %timeit df.shift(1, axis=1)
1 loops, best of 3: 543 ms per loop

In [41]: %timeit panel.shift(1)
1 loops, best of 3: 470 ms per loop

In [42]: %timeit panel.shift(1, axis=1)
1 loops, best of 3: 484 ms per loop

I suppose one could assume that Panel will generally be larger than DataFrame.

iirc, the reason we don't just take a slice for DataFrame is that it just defers work to later realignment.

Would be nice if numpy could understand a logical array consisting of multiple physical ones, in our case the shifted data + na data.

I'm going to poke at returning view slices for DataFrame.shift, but that's probably a no go since anything that doesn't use pandas label alignment would get a different set of values (no NA row). I'm more curious about what it does to the vbench.
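To make the view-versus-copy distinction above concrete, here is a minimal NumPy sketch. The helper names `copy_shift` and `view_shift` are hypothetical and are not pandas internals; they only illustrate why a sliced view is cheap but hands a different set of values (no NA row) to any consumer that ignores labels. Positive `periods` is assumed.

```python
import numpy as np


def copy_shift(values, periods=1):
    """Shift rows down by `periods` (> 0), padding the top with NaN.

    Allocates a new array of the same shape as the input.
    """
    out = np.empty_like(values, dtype=float)
    out[:periods] = np.nan
    out[periods:] = values[:-periods]
    return out


def view_shift(values, periods=1):
    """Return a view that drops the last `periods` rows (periods > 0).

    No data is copied and no NaN rows are added, so the result is shorter.
    """
    return values[:-periods]


values = np.arange(12, dtype=float).reshape(4, 3)
copied = copy_shift(values)
viewed = view_shift(values)

print(copied.shape, np.shares_memory(copied, values))
print(viewed.shape, np.shares_memory(viewed, values))
```

The copy keeps the original shape at the cost of a full allocation, while the view shares memory with the input and relies on relabeling the axis later to line up with the original data.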

@dalejung
Contributor

dalejung commented Apr 7, 2014

http://nbviewer.ipython.org/gist/dalejung/10013405

In [4]:
%timeit df.slice_shift(1) # supa fast
10000 loops, best of 3: 152 µs per loop

In [5]:
%timeit df.shift(1)

In [9]:
%timeit df + df.shift()
1 loops, best of 3: 942 ms per loop

In [10]:
%timeit df + df.slice_shift()
1 loops, best of 3: 1.35 s per loop

While the slice shifting is fast in itself, it ends up costing more due to reindexing. So in a sense, we pay a higher price for deferring since the _align_frame logic is generalized and the non-view shift is specific.
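The deferred cost described above can be reproduced on a small frame. This is only a sketch: later pandas releases removed `slice_shift`, so the code assumes a hand-rolled stand-in (slice off the last row, then relabel with the trailing index labels). The arithmetic on the sliced frame has to realign to the full index before adding, which is the work that `shift` had already done up front.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12, dtype=float).reshape(4, 3))

# Copying shift: same shape as df, first row filled with NaN.
shifted = df.shift(1)

# Assumed stand-in for slice_shift(1): drop the last row and
# relabel the remaining rows with the trailing index labels.
sliced = df.iloc[:-1].set_axis(df.index[1:], axis=0)

# Same index on both sides, so no reindexing is needed.
total_from_shift = df + shifted

# Indexes differ (4 labels vs 3), so pandas aligns on the union first.
total_from_slice = df + sliced

print(len(shifted), len(sliced))
print(total_from_shift.equals(total_from_slice))
```

Both sums end up with the same labels and values; the difference is only in where the alignment work happens.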

@jreback
Contributor Author

jreback commented Apr 7, 2014

I agree realignment is expensive, BUT I think if we can return a view it is better to delay that penalty (as it might not be paid, e.g. if you are indexing or something).

Did the previous impl do something different?

@dalejung
Contributor

dalejung commented Apr 9, 2014

The previous Panel shift just took that view slice. DataFrame shift always returned a copy.

I'm going to make a general view_shift and switch Panels to use that. Then afterwards I'll explore the implications of switching DataFrame.shift. I think that might be a no go because of backwards compat, but it'll probably lead to some fast path in the _combine_frame to short circuit cases where right is a subset of left.

I might not get to this for a few days, but the revert itself will be straightforward.
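One way to picture the short-circuit idea mentioned above is the sketch below. It is not the pandas `_combine_frame` implementation; `add_with_tail_fast_path` is a hypothetical helper that assumes numeric data, a unique index, and that the right frame's index is exactly the trailing block of the left frame's index (the shape a sliced shift produces). In that case the sum can be computed on the raw arrays without reindexing; anything else falls back to ordinary label alignment.

```python
import numpy as np
import pandas as pd


def add_with_tail_fast_path(left, right):
    """Hypothetical helper: add two numeric frames, skipping reindexing
    when right's index is the trailing block of left's index and the
    columns match. Assumes unique index labels."""
    n = len(right)
    start = len(left) - n
    if (
        0 < n <= len(left)
        and left.columns.equals(right.columns)
        and left.index[start:].equals(right.index)
    ):
        # Rows with no partner in `right` stay NaN, as alignment would give.
        out = np.full(left.shape, np.nan)
        out[start:] = left.to_numpy()[start:] + right.to_numpy()
        return pd.DataFrame(out, index=left.index, columns=left.columns)
    # General path: full label alignment.
    return left + right


left = pd.DataFrame(np.arange(12, dtype=float).reshape(4, 3))
right = left.iloc[:-1].set_axis(left.index[1:], axis=0)

fast = add_with_tail_fast_path(left, right)
general = left + right
print(np.allclose(fast.to_numpy(), general.to_numpy(), equal_nan=True))
```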

@jreback
Contributor Author

jreback commented Apr 9, 2014

ok thanks

pls update when u r ready

@jreback
Contributor Author

jreback commented Apr 21, 2014

@dalejung ?

@jreback
Contributor Author

jreback commented Apr 22, 2014

@dalejung update on this?

@dalejung
Contributor

@jreback I'll either push out the above or just a simple reversal shortly. Both paths are straight-forward. Dog had back surgery so I've been pre-occupied. She's fine but needs sling-walking every 2-3 hours 24/7 due to steroids.

@jreback
Contributor Author

jreback commented Apr 22, 2014

ahh...hope dog feels better! I hear ya

ok....lmk when ready

dalejung added a commit to dalejung/pandas that referenced this issue Apr 27, 2014
…#6826

TST: Make sure Panel.shift retains dtypes

DOC: removed previous doc entries for pandas-dev#6605

Re-add note about dropping shifted periods

DOC: added note about bug fix

don't pass on freq
@jreback
Contributor Author

jreback commented Apr 28, 2014

closed by #6974

@jreback jreback closed this as completed Apr 28, 2014