Skip to content

Quick Frame Shift Implementation #6404

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 138 commits into from

Conversation

gouthambs
Copy link
Contributor

Fix for issue #5609

@jreback
Copy link
Contributor

jreback commented Feb 19, 2014

ok....when you run the tests...lots of failure....pls investigate....

@gouthambs
Copy link
Contributor Author

I see that the indexer in pandas/core/internal is unused. Removing this argument will mean we can take out _shift_indexer. Should I remove the indexer argument in pandas/core/internal ?

@jreback
Copy link
Contributor

jreback commented Feb 20, 2014

yep indexer can go

@gouthambs
Copy link
Contributor Author

Taking indexer out clashes with the shift in the SparseBlock. Should the same roll logic work here ? Any idea what are the other shift calls that may need to be changed here?

@jreback
Copy link
Contributor

jreback commented Feb 20, 2014

So I would take the indexer out of the Block.shift method as you don't need it their. Take the argument out of the SparseBlock.shfit as well, and compute the indexer their (move the _shift_indexer code into that function).

@gouthambs
Copy link
Contributor Author

The indexer is obtained as:
indexer = com._shift_indexer(len(self._get_axis(axis)), periods)

The first argument to _shift_indexer requires the length of the axis using _get_axis method defined in NDFrame. Is there an equivalent call inside Block?

Is it the same as saying len(self.values) or len(self.values.T)?

@jreback
Copy link
Contributor

jreback commented Feb 21, 2014

just take the guts of _shift_indexer (and remove it from common.py). The length it is looking for is len(self.values) i believe.

@jreback
Copy link
Contributor

jreback commented Mar 9, 2014

can you post the results of the vbench?

If you can rebase/squash would be great

@jreback jreback added this to the 0.14.0 milestone Mar 9, 2014
@gouthambs
Copy link
Contributor Author

Are there any docs on how to setup and run vbench? I couldn't get it running! I tested the old and new using timeit.

@jreback
Copy link
Contributor

jreback commented Mar 11, 2014

go to vb_suite; run ./test_perf.sh -b master -t HEAD

@TomAugspurger
Copy link
Contributor

I put some notes I had written up for statsmodels at https://github.com/pydata/pandas/wiki/Performance-Testing

@jreback hopefully everything is accurate. I'll add more later.

@jreback
Copy link
Contributor

jreback commented Mar 12, 2014

@TomAugspurger awesome just what was needed!

jreback and others added 19 commits March 12, 2014 16:31
Work on pandas-dev#6175. Changes instances of assert_([not] isinstance(a,b)) to specialized assert[Not]IsInstance(a, b).
Work on pandas-dev#6175. Change superclass of tests/test_compat, to use tm.TestCase
Finishes pandas-dev#6175, to the extent that everything remaining can be mapped to assertTrue or assertFalse, unless larger refactorings are desired.
TST dont ignore subselection in groupby
TST for selected groupby add resample ohlc and filter
This is an implementation of quick shift logic

Added a vbench to reflect quick shift implementation

This change is a working version that gives the performance improvement and passes tests. Refine in next steps.

Slightly modified and cleaner logic.

Removed unused indexer, _shift_indexer

Fixed the failing tests for SparseDataFrame
@gouthambs
Copy link
Contributor Author

@jreback I couldn't run the test like you said. Perhaps it has to do with the fact that I use a Windows system. Can some one run this? I do have the squashed commit here.

@jreback
Copy link
Contributor

jreback commented Mar 12, 2014

ok just post the results them of the benchmark before and after the change

@gouthambs
Copy link
Contributor Author

Test code

import pandas as pd
import numpy as np
import timeit

xdim = 10000
ydim = 500
repeats = 10
df = pd.DataFrame(np.random.rand(xdim,ydim))

# axis 0 shift
s = timeit.default_timer()
for i in range(repeats):
    df1 = df.shift(1,axis=0)
e = timeit.default_timer()
print "Axis 0 shift",e-s

# axis 1 shift
s = timeit.default_timer()
for i in range(repeats):
    df1 = df.shift(1,axis=1)
e = timeit.default_timer()
print "Axis 1 shift",e-s

Results

V0.13.1

Axis 0 shift 0.958936203491
Axis 1 shift: IndexError: index 500 is out of bounds for size 500

----------------------------------------------------------------------------------
V0.13.1-330-gb562eb2

Axis 0 shift 0.1622
Axis 1 shift 0.3466 

@gouthambs
Copy link
Contributor Author

@jreback I am not sure if the squashed set is supposed to have all these commits. Also I notice that after squashing, the shift has the axis in reverse. Not sure what happened here!!!!!

@jreback
Copy link
Contributor

jreback commented Mar 13, 2014

try

git  rebase upstream/master
git push yourforkname yourbranchname -f 

@gouthambs
Copy link
Contributor Author

@jreback The rebase was a little confusing. I am getting a lot of conflicts. Not sure why! Sorry I am a little new to git. My squashed commits are in b562eb2 + ff18ac9 shown above. Any chance we can work with this. As of the last commit, I checked that all the tests were passing.

@jreback
Copy link
Contributor

jreback commented Mar 17, 2014

ok...here's what you need to do:

git checkout -b new_branch --track origin/master
git cherry-pick <commit1> thisbranch
git cherry-pick <commit2> thisbranch
git push myfork thisbranch

where and 2 are your commits from above

this takes the commits out and puts them in a new branch

push this branch up and do a new PR, closing this one.

@gouthambs
Copy link
Contributor Author

Please refer to #6672 for the new pull request.

@gouthambs gouthambs closed this Mar 20, 2014
@gouthambs gouthambs deleted the bug-fix-shift branch March 20, 2014 20:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Performance Memory or execution speed performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.