Tips and tricks for pandas devs #3156
Comments
Adding
to a remote will fetch all pull requests and make them available as branches named "pr/xyz".
Very cool. Add this function to your .bashrc:

    function gcopr {
        git fetch upstream
        git checkout "upstream/pr/$1"
    }

Then "gcopr <number>" fetches upstream and checks out that pull request.
Maybe create a milestone, something like "info", for items like this?
I'll move it to the docs when I get a chance.
Can we make the nosetests line in tox configurable? None of the tests I add get run :(
I'd like to consolidate all the test_*.sh scripts into a single Python script, with bells and perhaps whistles.
FYI, ghi now allows you to see the issues that you created.
@y-p should
Another useful tool is nose-progressive, which runs your tests with much cleaner output and shows a nice, if somewhat superfluous, progress bar in the terminal. Also, scm_breeze gives you a bunch of aliases for common git operations.
A nice way to time tests, so that you know which ones are running slowly.
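As a rough, plain-Python illustration of the idea (a hypothetical helper, not a nose plugin and not necessarily the tool the comment had in mind), time_tests below runs each test callable once, measures it with time.perf_counter, and lists the slowest first.

```python
import time
from typing import Callable, Dict, List, Tuple


def time_tests(tests: Dict[str, Callable[[], None]]) -> List[Tuple[str, float]]:
    """Run each test callable once and return (name, seconds), slowest first.

    Hypothetical helper for illustration; it is not part of nose or pandas.
    """
    timings = []
    for name, test in tests.items():
        start = time.perf_counter()
        test()  # an exception propagates, just as a failing test would
        timings.append((name, time.perf_counter() - start))
    # Sort by elapsed time, descending, so the slow tests surface at the top
    return sorted(timings, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    def quick():
        sum(range(1000))

    def sleepy():
        time.sleep(0.05)

    for name, secs in time_tests({"quick": quick, "sleepy": sleepy}):
        print(f"{secs:8.4f}s  {name}")
```

A real test runner would also need to record failures rather than stop at the first exception; the sketch keeps only the timing part.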
Now there's a set of commands in the
like this could go on the tips page
should note that
Stale; the wiki pages replace this (somewhat). Also, a slight whiff of unbecoming hubris.
Having worked on pandas for a while now, I've picked up a bunch of tools and tricks.
Here's a list to help pandas devs slip into the zone:
Use ipdb rather than pdb with nose: --ipdb --ipdb-fail
https://github.com/flavioamieiro/nose-ipdb
Because tab-completion is not optional.
Re-running only failed tests
nosetests --with-id --failed
will rerun only the tests that failed the last time you ran nosetests --with-id. If you use test_fast.sh,
will do what you expect after some tests have failed.
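Conceptually, --failed works by remembering which tests failed on the previous --with-id run and selecting only those next time. The hypothetical run_tests below sketches that bookkeeping in plain Python; the JSON state file and its default name are assumptions made for the sketch and are not meant to mirror how nose stores its IDs.

```python
import json
import os
from typing import Callable, Dict, List


def run_tests(tests: Dict[str, Callable[[], None]], failed_only: bool = False,
              state_file: str = "failed_tests.json") -> List[str]:
    """Run tests, record the names of those that fail, and return them.

    With failed_only=True, only the tests recorded as failing on the previous
    run are executed; if there is no record, or nothing failed last time,
    every test runs.
    """
    selected = tests
    if failed_only and os.path.exists(state_file):
        with open(state_file) as fh:
            previous = set(json.load(fh))
        if previous:
            selected = {name: fn for name, fn in tests.items() if name in previous}

    failed = []
    for name, fn in selected.items():
        try:
            fn()
        except Exception:  # any exception counts as a failure in this sketch
            failed.append(name)

    # Overwrite the record so the next failed_only run sees this run's failures
    with open(state_file, "w") as fh:
        json.dump(failed, fh)
    return failed
```

Typical use: run the whole suite once, fix code, then call run_tests(suite, failed_only=True) repeatedly until it returns an empty list.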
Better integration of GitHub and the git command-line flow
hub is a wrapper around git with GitHub
sugar. First and foremost:
adds a remote, fetches it, creates a branch for it, and generally puts you right there.
Note: see the comments for a way to do this with pure git, if you don't
mind thousands of remote branches.
GH issues from the command line
ghi lets you open and manipulate GitHub issues from the command line.
I use it to open issues when I hit a bug and want to quickly
open a reminder to fix, without breaking my focus.
Testing across Python versions locally
tox lets you run the test suite across multiple Python versions using virtualenvs.
Everything is set up in the repo; just install and run.
detox parallelizes tox.
Faster pandas builds/testing
Note: the build cache was baked into setup.py from roughly 0.9.1. As of 0.11.0
it has been factored out into scripts/use_build_cache.py, which rewrites setup.py
to use the build cache. The script has been tested as far back as 0.7.0.
Put the following in your .bashrc:
c69e3aa can be any recent commit; it needs to be bumped if there are updates to the script.
The pandas build cache code caches cythonization, compilation and
2to3 artifacts for reuse in subsequent builds.
To compile, use "git reset --hard" to get to the commit you're after, then run
cdev to build pandas. setup.py will reuse what it can to speed this up.
Note that setup.py gets overwritten, but also restored when the build completes.
With a warm cache, moving to a given commit takes just a few seconds rather than
the several minutes of a full compile.
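The general idea behind this kind of cache can be sketched as content-addressed storage: key each build artifact on a hash of its input, and copy the stored result back instead of rebuilding when the key matches. The cached_build helper below is a hypothetical illustration, not the code in scripts/use_build_cache.py, whose keying scheme may differ.

```python
import hashlib
import os
import shutil
from typing import Callable


def cached_build(source: str, output: str, cache_dir: str,
                 build: Callable[[str, str], None]) -> bool:
    """Produce `output` from `source`, reusing a cached artifact when possible.

    The cache key is a SHA-256 of the source file's bytes, so any edit to the
    source forces a rebuild. Returns True on a cache hit, False otherwise.
    """
    with open(source, "rb") as fh:
        key = hashlib.sha256(fh.read()).hexdigest()
    cached = os.path.join(cache_dir, key)

    if os.path.exists(cached):
        shutil.copyfile(cached, output)  # warm cache: skip the expensive step
        return True

    build(source, output)  # cold cache: do the real (slow) work
    os.makedirs(cache_dir, exist_ok=True)
    shutil.copyfile(output, cached)  # store the artifact for next time
    return False
```

Keying only on the source bytes keeps the sketch short; a real build cache would also have to fold in anything else that affects the output, such as compiler flags or the Cython version.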
You may also run scripts/use_build_cache.py prior to launching tox to speed up testing.
Use ccache
The build cache just described works at a very coarse level: if there's
any change to the .pyx (Cython) files, all the files will be re-cythonized and rebuilt.
Using ccache (an apt-get and an environment variable away on most distros these days) can speed
up the compilation step by caching the gcc compilation results. Yes, this overlaps
with the caching from the previous section, only ccache also caches the compilation
of the cythonized C files.
Benchmarking commits
test_perf.sh lets you compare the performance of one commit against
another, or benchmark the current HEAD.
It produces a table of results suitable for posting in a PR, and can serialize
the results DataFrame into a pickle file for analysis in pandas.
It can also print summary stats over multiple runs, among other things;
see test_perf.sh --help.
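The kind of comparison such a tool reports can be sketched in a few lines: collect several timing runs per benchmark for each commit, summarize each with the median, and report the target/baseline ratio, where a value above 1.0 means the target commit is slower. compare_runs, format_table and the dict-of-lists input format are hypothetical; this is not test_perf.sh's implementation or its exact output.

```python
import statistics
from typing import Dict, List, Tuple

Row = Tuple[str, float, float, float]


def compare_runs(baseline: Dict[str, List[float]],
                 target: Dict[str, List[float]]) -> List[Row]:
    """Return (name, baseline_median, target_median, ratio), worst ratio first.

    Timings are seconds per run; only benchmarks present for both commits
    are compared. A ratio above 1.0 means the target commit is slower.
    """
    rows = []
    for name in sorted(baseline.keys() & target.keys()):
        base = statistics.median(baseline[name])
        tgt = statistics.median(target[name])
        rows.append((name, base, tgt, tgt / base))
    return sorted(rows, key=lambda row: row[3], reverse=True)


def format_table(rows: List[Row]) -> str:
    """Render rows as a fixed-width text table, e.g. for pasting into a PR."""
    lines = [f"{'benchmark':<20}{'base':>10}{'target':>10}{'ratio':>8}"]
    for name, base, tgt, ratio in rows:
        lines.append(f"{name:<20}{base:>10.4f}{tgt:>10.4f}{ratio:>8.2f}")
    return "\n".join(lines)
```

The median is used rather than the mean so that a single noisy run does not dominate the comparison.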
Easily generate dataframes of different kinds
mkdf lets you easily fabricate dataframes of varying dimensions and arbitrary data.
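As a hypothetical stand-in that shows the same idea (this is not mkdf's signature or output), make_df builds an nrows x ncols DataFrame with labelled rows and columns, where an optional data_gen(row, col) callable supplies each cell.

```python
from typing import Any, Callable, Optional

import pandas as pd


def _label_cell(row: int, col: int) -> str:
    """Default cell value: encodes its own position, e.g. 'R2C1'."""
    return f"R{row}C{col}"


def make_df(nrows: int, ncols: int,
            data_gen: Optional[Callable[[int, int], Any]] = None) -> pd.DataFrame:
    """Fabricate an nrows x ncols DataFrame with labelled rows and columns.

    Hypothetical helper: data_gen(row, col) supplies each cell. The default
    makes every value name its own position, so it is easy to see where a
    value ended up after reshaping, joining or alignment.
    """
    gen = data_gen if data_gen is not None else _label_cell
    data = [[gen(r, c) for c in range(ncols)] for r in range(nrows)]
    index = [f"R{r}" for r in range(nrows)]
    columns = [f"C{c}" for c in range(ncols)]
    return pd.DataFrame(data, index=index, columns=columns)
```

For example, make_df(4, 3) gives self-describing string cells, while make_df(1000, 5, data_gen=lambda r, c: r * c) gives a larger numeric frame for quick experiments.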
IPython startup file
Your IPython installation has a ~/.ipython/profile_default/startup directory;
put your imports, monkey-patches and utility functions there to have them
always available.
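For instance, a startup file for pandas work might look like the sketch below. IPython runs the .py and .ipy files in that directory at startup, in lexicographic order, so a numeric prefix controls ordering; the file name, the display option value and the which_pandas helper are illustrative assumptions, not part of any pandas tooling.

```python
# ~/.ipython/profile_default/startup/00-pandas-dev.py  (file name is illustrative)
# Everything defined here is available in every interactive IPython session.
import numpy as np  # noqa: F401  (imported so it is always at hand)
import pandas as pd

# Personal preference: show more columns before pandas truncates the repr
pd.set_option("display.max_columns", 50)


def which_pandas() -> str:
    """Report which pandas build this session imported.

    Handy when switching between an installed release and an in-place
    development build, to confirm which one is actually being used.
    """
    return f"pandas {pd.__version__} from {pd.__file__}"
```

Calling which_pandas() right after starting IPython is a quick sanity check before debugging a build.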
Speel checking GitHub issues
Issues can quickly become a stream-of-consciousness thing once
you start doing a lot of them. If you'd like an easy way to get red squigglies
when your comment contains silly mistaces, you might consider installing
After the Deadline, available as an extension for Firefox and Chrome.
Handy git commands
There are too many git tricks to cover, but the following are both useful and less commonly known:
Generate a new hash for the current commit, without any other changes to the repo state.
git commit --amend -C HEAD
Report the author of a given commit hash:
and properly assign authorship of a commit:
where foohash is any previous commit authored by that contributor.
To locate the merge commit that introduced a commit into the branch:
https://github.com/jianli/git-get-merge