-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Fix errors='ignore' being ignored in astype #30324 #30670
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: nbonnin <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add test(s)? Should be the first part of any PR
Hello @naomi172839! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-03-05 03:46:10 UTC |
@WillAyd I added unit tests to test_generic.py. I have never used pytest before and am having trouble building the c components to test it myself. Hopefully it was done right. |
I seemed to have messed up the GIT... Let me know if I need to start over:/ |
You can still use this PR/branch - fix it up locally and use —force if needed |
…was not working before. Signed-off-by: nbonnin <[email protected]>
OK so I fixed the git. The tests actually seemed to already exist to make sure that the error was thrown. I am not sure why it did not fail before. |
can you add a whatsnew note |
Signed-off-by: nbonnin <[email protected]>
Signed-off-by: nbonnin <[email protected]>
@jreback When I add my own whatsnew entry it seems to create a conflict that I am unsure how to resolve. I already updated my local whatsnew file to the latest version. |
Hmm, looking back at the original issue, I'm not sure if this is what we want to do. The |
Apologies for not catching that earlier @naomi172839. |
@TomAugspurger would you then suggest adding an additional parameter to the astype method to allow the user to ignore the KeyErrors? |
That's one option, though I'm not sure I'd find it useful enough. Others may though. Perhaps @climatebrad can share his original usecase. |
My usecase is that I was joining several dataframes generated from different years of NYPD stop-and-frisk datasets into one multiyear dataframe. A lot of the work to keep the dataframes manageable was making sure that categorical data was saved as such. So I had a list of all of the columns that needed to be categorical data and was setting them. (Technically I converted to object type, did more processing, then converted to category type.) This code failed:
Here's the diff where I dealt with what I consider the incorrect climatebrad/stop-and-frisk@d755084#diff-a0ebe5d9008a8b884ea03629345f9735R178
|
@climatebrad I do agree that errors='ignore' at least implies that errors will be ignored but @TomAugspurger does have a point. It would overload the method and that's probably not the best solution. |
Looks like the 1.0 whatsnew got mangled in this - can you check that out from master? |
Sorry that it took so long, I had to figure out how to check out a single file. |
@naomi172839 not sure why the whatsnew files are collected here, but can you try the following on your branch? git fetch upstream/master
git checkout upstream/master pandas/doc/source/whatsnew/v1* Then re-push? I think should fix the issues here (can add a whatsnew note after we fix these up) |
@WillAyd So I tried that with no real luck. I tried deleting the files and rechecking them out which got me to where we are now. I dont have high hopes. |
Your last push looks a lot better. The only outstanding issue is that the file permissions have changed for the v1.0.0 whatsnew. If you do |
Output is below: nothing to commit, working tree clean So I don't think that there is anything staged |
Hmm OK. I guess git checkout might not grab file permissions. Since you are on macOS you can just do |
Done, though its not really showing any changes and zero differences from upstream/master for that file |
Is your |
Output |
Just for posterity try this one more time - it worked for the other whatsnew files so not sure what makes the 1.0.0 any different but maybe the changed permissions prevented this from working git fetch upstream/master
git checkout upstream/master pandas/doc/source/whatsnew/v1.0.0.rst |
Doing that, git thought that there were changes so I went ahead and pushed it. |
Hmm still didn't work. If you run If all else fails maybe just download the file directly from GitHub |
Output: v1.0.0.rst isn't listed. Nevertheless I tried downloading straight from github. It still seems to have made no difference. |
Hmm OK. Actually just needed to do a I did that to fix things up here, so just |
Ok great! that did alot. At this point it should be just adding the what's new message correct? I'm not super great at git (obviously). Just want to make sure before I push another commit. |
Hmm just reading back through the comment history I do agree with @TomAugspurger that this blurs the purpose of the |
The only real alternative that I came up with was to add another optional argument to the method, but I am not sure that that is really the best solution. Looking into it now, it could also be an option like mode.sim_interactive is. I don't really know if that is any better though. |
I think I'm still against adding this change. I think the errors keyword should just be for controlling the values conversion. If we want to allow unused keys to be provided, we can add a keyword dedicated for that. There are other context like df = pd.DataFrame({"A": [1, 2]})
df.astype({"A": "int", "B": "float"}, ignore_unused_keys=True)
df.rename(columns={"A": "a", "B", "b"}, ignore_unused_keys=False) # raises |
Although I am in the camp of "errors='ignore'" should mean "ignore errors" I'd be perfectly happy with a separate keyword to allow for this functionality. I'd make it consistent with the missing_cols : {'raise', 'ignore'}, default 'raise' |
Your proposal should be |
Personally, I feel that ignore_unused_keys is a more descriptive argument. If everyone thinks this is a good/acceptable route to go, I would be happy to work on it. In addition, if we change it here, it should probably be changed everywhere. I.e. the rename method and anyothers |
Yea I think that would be clearer |
@naomi172839 still active? Want to try to incorporate latest feedback? |
@naomi172839 closing as stale. ping if you want to continue. |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff