-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
DOC/DEV: document approach to bisect regressions #35685
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
This would be good...I've got a local script
and then put whatever example I'm trying to check in
. It took me more than I'm proud to admit to figure this out, so making a note of this (or of whatever the proper way of doing it is) would be useful IMO |
good idea. I'll use this issue to post a few notes in the first instance. as well as finding bug/regressions it can also be useful to know when issues are fixed, e.g. #35650 In that issue the old behaviour raised ValueError and the new behaviour didn't. (more detail on exit status will be needed in doc) so the script needed to exit with a non-zero exit code for the new behaviour and with a zero exit code for the old behaviour so in this case, this script was used (print statements not needed) import sys
import numpy as np
import pandas as pd
print(pd.__version__)
arr = np.array([np.datetime64("2015-11-20T15:06:58.000")])
arr.flags.writeable = False
try:
res = pd.factorize(arr)
print(res)
except:
pass
else:
sys.exit(1) |
I've been wondering how exactly @simonjayhawkins does this but was too afraid to ask 😊. Great idea to document this. |
for an regression which is not fixed on master and now raises an Exception can just use code sample (this will give a non-zero exit status for a bad commit and zero exit status for good commit eg. #35700 (comment) (don't need to assert correct result but probably better to document a better practice) ~/test.py import pandas as pd
print(pd.__version__)
df = pd.DataFrame({"a": [1, 2, 3] * 10000, "b": ["x", "y", "z"] * 10000})
res = df == "x"
print(res) and execute the following commands in the shell (git bash on Windows) see examples in https://git-scm.com/docs/git-bisect#_examples git bisect start HEAD v1.0.5
git bisect run /bin/bash -c "python setup.py build_ext -i;python ~/test.py"
git bisect reset
python setup.py build_ext -i # need to rebuild after reset |
We could even put something like this into a script in scripts/, possibly even accessible in via |
if it for dev use that's probably ok. but if we also want to encourage users to bisect the issues themselves we need to be Windows friendly or have a guide that includes Linux/Mac and Windows instructions. |
Thanks for writing these examples up so clearly here @simonjayhawkins, made it very easy to get started! A potentially useful addition is prepending |
Is this only true for bisecting? Any reason to not use it all the time? Anyway, here's a fully worked notebook I use to find regressions: https://www.kaggle.com/code/marcogorelli/pandas-regression-example |
Right, let's try to move this forwards - I think this could be a nice addition to this page https://github.com/pandas-dev/pandas/blob/0dadc71dd4653e5b858d7b4153df1d7aded4ba46/doc/source/development/maintaining.rst Doesn't need to be in-depth, I think it just needs:
Marking as good-first-issue then, if anyone wants to work on this feel free to open a pull request and to tag me |
Gives a tradeoff between compile time and performance. No optimizations gives much slower code, but for a case like bisecting when (usually) there's a bunch of compiling going on (since cython files often change across bisection steps), it can be much faster provided that the bisection test case still runs quickly |
Hi @MarcoGorelli I'd like to take a crack at writing this up. |
@simonjayhawkins you have been doing great work on finding the causes of bugs/regressions!
To make it easier for other contributors to also do this, I think it would be nice to document this is the contributing docs (even though it is relying on git, using it with pandas (which needs to be rebuilt for each commit) requires a bit of custom setup I think?)
The text was updated successfully, but these errors were encountered: