Fix errors='ignore' being ignored in astype #30324 #30670

naomi172839 · 2020-01-04T00:19:24Z

closes errors='ignore' does not work on df.replace({col : type}) if col not found #30324
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

Sync

Signed-off-by: nbonnin <[email protected]>

WillAyd

Can you add test(s)? Should be the first part of any PR

pep8speaks · 2020-01-04T06:38:42Z

Hello @naomi172839! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-03-05 03:46:10 UTC

naomi172839 · 2020-01-04T06:40:45Z

@WillAyd I added unit tests to test_generic.py. I have never used pytest before and am having trouble building the C components to test it myself. Hopefully it was done right.

pandas/tests/generic/test_generic.py

naomi172839 · 2020-01-04T23:19:08Z

I seemed to have messed up the GIT... Let me know if I need to start over:/

alimcmaster1 · 2020-01-05T00:35:20Z

> I seemed to have messed up the GIT... Let me know if I need to start over:/

You can still use this PR/branch: fix it up locally and use --force if needed.

naomi172839 · 2020-01-08T01:55:36Z

OK so I fixed the git. The tests actually seemed to already exist to make sure that the error was thrown. I am not sure why it did not fail before.

jreback · 2020-01-09T03:37:01Z

can you add a whatsnew note

naomi172839 · 2020-01-09T14:00:05Z

@jreback When I add my own whatsnew entry it seems to create a conflict that I am unsure how to resolve. I already updated my local whatsnew file to the latest version.

TomAugspurger · 2020-01-09T17:04:07Z

Hmm, looking back at the original issue, I'm not sure if this is what we want to do.

The errors keyword is used to handle raising when there's an issue with the conversion of values. This PR overloads errors to handle raising when there are non-conversion issues (incorrect keys in the dtype dict, for example). I don't think we want to overload the meaning of errors like that.
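
The distinction between the two kinds of failure can be sketched as follows. This is an illustrative example, not code from the PR; it assumes a pandas version whose astype accepts errors='ignore' and raises KeyError for dict keys that are not columns, and the column names are made up.

```python
import pandas as pd

# Object-dtype column whose values cannot be parsed as integers.
df = pd.DataFrame({"A": pd.Series(["x", "y"], dtype=object)})

# Value-conversion failure: this is the case `errors` is meant to control.
try:
    df.astype({"A": "int64"})  # errors="raise" is the default
    value_error_raised = False
except ValueError:
    value_error_raised = True

# With errors="ignore" the failed conversion is suppressed and the
# original values come back unchanged.
unchanged = df.astype({"A": "int64"}, errors="ignore")

# Key failure: "B" is not a column. This check is separate from value
# conversion, so errors="ignore" does not suppress it.
try:
    df.astype({"B": "float64"}, errors="ignore")
    key_error_raised = False
except KeyError:
    key_error_raised = True
```

The second case is the behavior reported in #30324; the PR would have routed it through the same keyword as the first.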

TomAugspurger · 2020-01-09T17:04:49Z

Apologies for not catching that earlier @naomi172839.

naomi172839 · 2020-01-09T17:13:15Z

@TomAugspurger would you then suggest adding an additional parameter to the astype method to allow the user to ignore the KeyErrors?

TomAugspurger · 2020-01-09T17:46:28Z

That's one option, though I'm not sure I'd find it useful enough. Others may though. Perhaps @climatebrad can share his original usecase.

climatebrad · 2020-01-10T22:45:58Z

My usecase is that I was joining several dataframes generated from different years of NYPD stop-and-frisk datasets into one multiyear dataframe. A lot of the work to keep the dataframes manageable was making sure that categorical data was saved as such. So I had a list of all of the columns that needed to be categorical data and was setting them. (Technically I converted to object type, did more processing, then converted to category type.) This code failed:

cat_set = set(CLEAN_CAT_VALUES) | set(CAT_FILL_NA_VALUES) 
data = data.astype({cat : 'object' for cat in cat_set}, errors='ignore')

Here's the diff where I dealt with what I consider the incorrect errors='ignore' behavior. (I had to explicitly exclude columns that didn't exist in the dataframe I was editing.) Not the hardest workaround, but I do still think that errors='ignore' should, you know, ignore errors.

climatebrad/stop-and-frisk@d755084#diff-a0ebe5d9008a8b884ea03629345f9735R178

cat_set = set(CLEAN_CAT_VALUES) | set(CAT_FILL_NA_VALUES) 
cat_set = cat_set.intersection(set(data.columns))
data = data.astype({cat : 'object' for cat in cat_set}, errors='ignore')
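
For readers who want to try the workaround, here is a self-contained sketch of the same idea. The contents of CLEAN_CAT_VALUES and CAT_FILL_NA_VALUES and the column names are hypothetical stand-ins, since the real ones are not shown in this thread.

```python
import pandas as pd

# Hypothetical stand-ins for the two collections used above.
CLEAN_CAT_VALUES = {"race": {}, "sex": {}}
CAT_FILL_NA_VALUES = {"build": "U"}

# A frame for one year that happens to lack the "build" column.
data = pd.DataFrame({"race": ["A", "B"], "sex": ["M", "F"], "age": [30, 41]})

cat_set = set(CLEAN_CAT_VALUES) | set(CAT_FILL_NA_VALUES)

# Keep only the names that actually exist as columns, so astype never
# sees a key it would reject with KeyError.
present = cat_set.intersection(data.columns)
data = data.astype({cat: "object" for cat in present})
```

Filtering first means errors='ignore' is no longer needed for the missing-column case; it would only matter for values that fail to convert.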

naomi172839 · 2020-01-11T03:39:10Z

@climatebrad I do agree that errors='ignore' at least implies that errors will be ignored but @TomAugspurger does have a point. It would overload the method and that's probably not the best solution.

WillAyd · 2020-02-12T00:48:00Z

Looks like the 1.0 whatsnew got mangled in this - can you check that out from master?

naomi172839 · 2020-03-04T23:08:34Z

Sorry that it took so long, I had to figure out how to check out a single file.

WillAyd · 2020-03-05T02:33:11Z

@naomi172839 not sure why the whatsnew files are collected here, but can you try the following on your branch?

git fetch upstream
git checkout upstream/master -- doc/source/whatsnew/v1*

Then re-push? I think that should fix the issues here (we can add a whatsnew note after we fix these up).

naomi172839 · 2020-03-05T02:46:52Z

@WillAyd So I tried that with no real luck. I tried deleting the files and checking them out again, which got me to where we are now. I don't have high hopes.

WillAyd · 2020-03-05T02:48:49Z

Your last push looks a lot better. The only outstanding issue is that the file permissions have changed for the v1.0.0 whatsnew.

If you run git checkout upstream/master -- doc/source/whatsnew/v1.0.0.rst now, does git status show anything as staged?

naomi172839 · 2020-03-05T02:53:37Z

Output is below:
nbonnin@Naomis-MacBook-Pro pandas % git checkout upstream/master doc/source/whatsnew/v1.0.0.rst
Updated 0 paths from 247718065
nbonnin@Naomis-MacBook-Pro pandas % git status
On branch #30324
Your branch is up to date with 'origin/#30324'.

nothing to commit, working tree clean

So I don't think that there is anything staged

WillAyd · 2020-03-05T02:54:51Z

Hmm OK. I guess git checkout might not grab file permissions.

Since you are on macOS you can just do chmod 644 doc/source/whatsnew/v1.0.0.rst instead to restore it to where it was.

naomi172839 · 2020-03-05T03:00:18Z

Done, though it's not really showing any changes, and there are zero differences from upstream/master for that file.

WillAyd · 2020-03-05T03:01:49Z

Is your upstream pointing to pandas? If not sure, can you post the output of git remote -v show?

naomi172839 · 2020-03-05T03:02:37Z

Output
nbonnin@Naomis-MacBook-Pro pandas % git remote -v show
origin https://github.com/naomi172839/pandas.git (fetch)
origin https://github.com/naomi172839/pandas.git (push)
upstream https://github.com/pandas-dev/pandas.git (fetch)
upstream https://github.com/pandas-dev/pandas.git (push)

WillAyd · 2020-03-05T03:06:42Z

Just for posterity try this one more time - it worked for the other whatsnew files so not sure what makes the 1.0.0 any different but maybe the changed permissions prevented this from working

git fetch upstream
git checkout upstream/master -- doc/source/whatsnew/v1.0.0.rst

naomi172839 · 2020-03-05T03:11:31Z

Doing that, git thought that there were changes so I went ahead and pushed it.

WillAyd · 2020-03-05T03:25:59Z

Hmm still didn't work. If you run git diff upstream/master --name-only locally none of the whatsnew files should show, but they seem to here on GitHub

If all else fails maybe just download the file directly from GitHub

naomi172839 · 2020-03-05T03:36:33Z

Output:
...
doc/source/whatsnew/v0.21.0.rst
doc/source/whatsnew/v0.23.0.rst
doc/source/whatsnew/v0.25.0.rst
doc/source/whatsnew/v0.25.3.rst
doc/source/whatsnew/v1.0.1.rst
doc/source/whatsnew/v1.0.2.rst
doc/source/whatsnew/v1.1.0.rst
...

v1.0.0.rst isn't listed. Nevertheless I tried downloading straight from github. It still seems to have made no difference.

WillAyd · 2020-03-05T03:47:00Z

Hmm OK. Actually just needed to do a git merge upstream/master and repush

I did that to fix things up here, so just git pull on your local branch to grab the changes and you should be able to move forward.

naomi172839 · 2020-03-05T03:53:38Z

Ok great! That did a lot. At this point it should just be a matter of adding the whatsnew message, correct? I'm not super great at git (obviously). Just want to make sure before I push another commit.

WillAyd · 2020-03-05T03:58:05Z

Hmm just reading back through the comment history I do agree with @TomAugspurger that this blurs the purpose of the errors keyword; any chance you thought about alternative implementations?

naomi172839 · 2020-03-05T04:03:46Z

The only real alternative that I came up with was to add another optional argument to the method, but I am not sure that that is really the best solution.

Looking into it now, it could also be an option like mode.sim_interactive is. I don't really know if that is any better though.

TomAugspurger · 2020-03-05T12:51:30Z

I think I'm still against adding this change. I think the errors keyword should just be for controlling the values conversion.

If we want to allow unused keys to be provided, we can add a keyword dedicated to that. There are other contexts like .rename where people have expressed a desire for extra / unused keys to raise (they're currently ignored in rename).

df = pd.DataFrame({"A": [1, 2]})
df.astype({"A": "int", "B": "float"}, ignore_unused_keys=True)

df.rename(columns={"A": "a", "B": "b"}, ignore_unused_keys=False)  # raises
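
Since ignore_unused_keys is only a proposal, the intended behavior can be approximated today with a small wrapper. The helper name astype_existing below is hypothetical and not part of pandas; the rename calls use the existing errors keyword of DataFrame.rename.

```python
import pandas as pd


def astype_existing(df, dtype_map, ignore_unused_keys=False):
    """Hypothetical helper (not a pandas API) emulating the proposed keyword."""
    if ignore_unused_keys:
        # Drop mapping entries whose key is not a column of df.
        dtype_map = {k: v for k, v in dtype_map.items() if k in df.columns}
    # Any unknown key still left in the mapping raises KeyError here.
    return df.astype(dtype_map)


df = pd.DataFrame({"A": [1, 2]})
out = astype_existing(df, {"A": "float64", "B": "float64"}, ignore_unused_keys=True)

# rename skips unknown keys by default ...
renamed = df.rename(columns={"A": "a", "B": "b"})

# ... and raises KeyError for them only when asked to.
try:
    df.rename(columns={"A": "a", "B": "b"}, errors="raise")
    rename_raised = False
except KeyError:
    rename_raised = True
```

Keeping the filtering in a separate argument leaves errors free to mean only "what to do when a value fails to convert", which is the separation argued for above.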

climatebrad · 2020-03-05T14:42:03Z

Although I am in the camp of "errors='ignore'" should mean "ignore errors", I'd be perfectly happy with a separate keyword to allow for this functionality.

I'd make it consistent with the errors syntax though. Here is language consistent with the df.rename documentation.

missing_cols : {'raise', 'ignore'}, default 'raise'
If ‘raise’, raise a KeyError when a dict of columns contains labels that are not present. If ‘ignore’, existing keys will be retyped and extra keys will be ignored.

TomAugspurger · 2020-03-05T14:44:12Z

Your proposal should be extra_columns: {'raise', 'ignore'}, or unused_columns, right? Not "missing"?

naomi172839 · 2020-03-05T22:50:13Z

Personally, I feel that ignore_unused_keys is a more descriptive argument. If everyone thinks this is a good/acceptable route to go, I would be happy to work on it.

In addition, if we change it here, it should probably be changed everywhere, i.e. in the rename method and any others.

WillAyd · 2020-03-17T00:16:29Z

> Personally, I feel that ignore_unused_keys is a more descriptive argument. If everyone thinks this is a good/acceptable route to go, I would be happy to work on it.

Yea I think that would be clearer

WillAyd · 2020-04-07T16:08:44Z

@naomi172839 still active? Want to try to incorporate latest feedback?

simonjayhawkins · 2020-05-08T16:01:37Z

@naomi172839 closing as stale. ping if you want to continue.

naomi172839 added 2 commits December 26, 2019 06:23

Merge pull request #1 from pandas-dev/master

adca461

Sync

Fixed errors='ignore' being ignored

ffdc947

Signed-off-by: nbonnin <[email protected]>

WillAyd requested changes Jan 4, 2020

alimcmaster1 added the Error Reporting Incorrect or improved errors from pandas label Jan 4, 2020

alimcmaster1 reviewed Jan 4, 2020

jreback requested changes Jan 4, 2020

jreback changed the title ~~Fix errors='ignore' being ignored #30324~~ Fix errors='ignore' being ignored in astype #30324 Jan 4, 2020

naomi172839 closed this Jan 4, 2020

naomi172839 reopened this Jan 4, 2020

naomi172839 force-pushed the #30324 branch from 29d4f96 to ffdc947 Compare January 8, 2020 01:49

Added comments to the tests. Test is already in place, unsure why it …

c88b98e

…was not working before. Signed-off-by: nbonnin <[email protected]>

naomi172839 added 2 commits January 9, 2020 07:54

Added a test for a not covered case. Added whats new entry.

b73279e

Signed-off-by: nbonnin <[email protected]>

Updated whatsnew to the latest version, added my own whatsnew entry

88aa2f3

Signed-off-by: nbonnin <[email protected]>

Fix whats new entry

79ff6e3

Fix whats new entry

0cd4929

Trying again to fix whats new entry

94eb49e

Updated file permissions

061e02a

Trying yet again to fix the whatsnew

7a76a71

naomi172839 added 2 commits March 4, 2020 22:33

Tried downloading directly from github.

1272c8a

Tried downloading directly from github.

5010732

Merge remote-tracking branch 'upstream/master' into pandas-dev#30324

bee2222

simonjayhawkins closed this May 8, 2020
