Skip to content

BUG: fix df.where(cond) when cond is empty #21947

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

pajachiet
Copy link
Contributor

  • when cond is empty, cond.dtypes are objects, which raised
    ValueError: Boolean array expected for the condition, not object
  • closes #xxxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

@gfyoung
Copy link
Member

gfyoung commented Jul 17, 2018

@pajachiet : Thanks for the PR! Can you add a test and a whatsnew entry in 0.24.0.txt as well?

From a higher level, this seems like a corner case to me here: why would you want to pass in an empty object for cond for .where ?

@gfyoung gfyoung added the Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff label Jul 17, 2018
@jreback
Copy link
Contributor

jreback commented Jul 17, 2018

how is this not an error?

if you don’t have a conf then where doesn’t make sure

@gfyoung
Copy link
Member

gfyoung commented Jul 17, 2018

how is this not an error?

@jreback : That's what I was wondering too myself. Perhaps a better error message is in order?

@codecov
Copy link

codecov bot commented Jul 17, 2018

Codecov Report

Merging #21947 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #21947      +/-   ##
==========================================
+ Coverage   92.25%   92.25%   +<.01%     
==========================================
  Files         161      161              
  Lines       51169    51170       +1     
==========================================
+ Hits        47207    47208       +1     
  Misses       3962     3962
Flag Coverage Δ
#multiple 90.64% <100%> (ø) ⬆️
#single 42.28% <0%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/generic.py 96.81% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c6366f5...f930315. Read the comment docs.

@pajachiet
Copy link
Contributor Author

Hello

I wanted to write a test, but I did not find where were the unit tests for the 'where' function.

This is indeed a corner case, when both df and cond are empty. Example:

import pandas as pd
data = [1]
df = pd.DataFrame(columns=['a'], data=data)
cond = df.applymap(lambda x: x > 0)
df.where(cond)  # Ok

data = []
df = pd.DataFrame(columns=['a'], data=data)
cond = df.applymap(lambda x: x > 0)
df.where(cond)  # Error

This kind of corner case do happen :

  • when you do unit testing
  • when your dataframe comes from real 'empty' data.

I had the same kind of corner case with df.duplicated in previous version of pandas.

@jreback
Copy link
Contributor

jreback commented Jul 17, 2018

yeah certainly could have a better error message (e.g. you can detect the .empty), but this should just be an error.

@jreback
Copy link
Contributor

jreback commented Jul 17, 2018

the df empty is ok, but the cond cannot be empty (unless the df itself is empty).

@jreback jreback added the Error Reporting Incorrect or improved errors from pandas label Jul 17, 2018
@pajachiet
Copy link
Contributor Author

pajachiet commented Jul 17, 2018

Yes, cond can be empty if and only if df is empty.

But it is a bug to raise an error when they are both empty, no ?

As the bug is on cond dtype checking, I thought the easiest way to solve it was to avoid dtypes checking on empty dataframe (as it does not makes sense anyway).

One could also explicitly check for df.empty, but this would be weird as it already works when cond is empty with correct dtypes :

data = []
df = pd.DataFrame(columns=['a'], data=data)
cond = df > 0 
df.where(cond)  # Ok
cond.dtypes  # bool

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs a test

add in pandas/tests/frame/test_indexing (you may need one for series, though I don't know if this path is hit, in pandas/tests/series/tests_indexing)

@pep8speaks
Copy link

pep8speaks commented Jul 25, 2018

Hello @pajachiet! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on August 09, 2018 at 11:45 Hours UTC

@pajachiet pajachiet force-pushed the fix-bug-where-with-empty-cond-df branch from 8f799db to 6772d0e Compare July 25, 2018 12:11
@pajachiet pajachiet force-pushed the fix-bug-where-with-empty-cond-df branch 2 times, most recently from d04e677 to 319f1e2 Compare August 5, 2018 12:40
@pajachiet
Copy link
Contributor Author

Hello,
I added a simple test as requested.
I changed the commit two times because of pep8 mistake, then trying to fix the build, but it is still failing for a reason I do not understand (it does not seem related to the change).
Could you help fixing it ?

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls add a whatsnew entry in v0.24.0 / bug fixes / reshaping

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use this PR number as the issue number (as I don't think there is an issue about this)

@jreback
Copy link
Contributor

jreback commented Nov 4, 2018

closing as stale. if you'd like to continue, pls ping.

@jreback jreback closed this Nov 4, 2018
@pajachiet
Copy link
Contributor Author

I believe the bug in the CI was not caused by changes in this PR.

This PR fixed a simple but real bug. I managed to fix it in another way for my client, so I only did this PR if it can improve pandas, especially in those edges cases that can be really annoying.

I added a test, a whatsnew entry and answered your comments.
Could you please manage to merge it ?

Otherwise keep it close, it's too much an investment for such a minor improvment.

@gfyoung
Copy link
Member

gfyoung commented Nov 4, 2018

@pajachiet : Do you want to finish this up? I think it requires a rebase, that's all.

@jreback
Copy link
Contributor

jreback commented Nov 4, 2018

Otherwise keep it close, it's too much an investment for such a minor improvment.

well if it’s a bug then it will probably come up again

it’s always helpful to improve things

@gfyoung
Copy link
Member

gfyoung commented Nov 5, 2018

Let's take this for another spin.

@gfyoung gfyoung reopened this Nov 5, 2018
When `cond` is empty, `cond.dtypes` are objects,
which raises a `ValueError`. Now, it does not.

Original author: @pajachiet
@gfyoung gfyoung force-pushed the fix-bug-where-with-empty-cond-df branch from 92ccd06 to 3342c09 Compare November 5, 2018 00:42
@gfyoung gfyoung added this to the 0.24.0 milestone Nov 5, 2018
@gfyoung gfyoung added the Bug label Nov 5, 2018
@jreback jreback removed this from the 0.24.0 milestone Nov 6, 2018
@gfyoung
Copy link
Member

gfyoung commented Nov 6, 2018

@jreback : Is there a reason why we can't attempt to merge this for 0.24.0 ? I addressed the last comment (regarding the issue reference). It's also rebased and green. Did you have any other comments to add?

@gfyoung
Copy link
Member

gfyoung commented Nov 6, 2018

Just saw your email regarding 0.24.0. That answers my question above.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comment.

@jreback jreback added this to the 0.24.0 milestone Nov 6, 2018
@jreback jreback merged commit eb4e11e into pandas-dev:master Nov 6, 2018
@jreback
Copy link
Contributor

jreback commented Nov 6, 2018

thanks!

@pajachiet
Copy link
Contributor Author

Many thanks ! :)

JustinZhengBC pushed a commit to JustinZhengBC/pandas that referenced this pull request Nov 14, 2018
tm9k1 pushed a commit to tm9k1/pandas that referenced this pull request Nov 19, 2018
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Bug Error Reporting Incorrect or improved errors from pandas
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants