-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Make sure series-series boolean comparions are label based (GH4947) #4953
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@jtratner I think you might just want to incorporate this in your ops changes (or I can merge directly first), u lmk @hayd, @jtratner what do you think about non-matching cases. I am basically only matching on the lhs labels, |
I'd prefer you merge this first (if you can do it tonight/tomorrow). I need to restart the arithmetic refactor soon, otherwise it's going to cover waay too much, but it'd be helpful to have this change in already with tests, so I don't break it or abstract too much. |
do u think I should be filling these with False |
Not sure about:
shouldn't this be |
Realised you were saying exactly the same above: I also think this should be filled with False. |
updated...default is now |
Not quite, I think Cripes, it's pretty tricky to be consistent here cos of bool(nan)... |
This is why I think default for currently (default is False)
and this is odd
THIS should return a (and will if |
@hayd ? |
meh... so two things:
I suspect this is too much overhead (too slow?).... :S but that would give what I would expect.
I guess I should just accept that NaN is True. (...never!!) |
@hayd these are not comparing bools (except maybe in a special case). They are comparing values (possilby) nan, so the issue is that a nan compare with another nan gives a nan (I think in all cases). which you then should fill |
@jreback but then afterwards it's being forced to bool rather than values? (I think that confuses things...!) |
no....its a comparison operation between 2 objects, you should get a bool dtype |
What do you think about the reindex thing? I think we agree that when both are NaN it should be False, however the weird behaviour imo is when only one is NaN - I think then This is also a bit sketch as Should it be commute? I think yes (because we're talking bool):
Apologies if I'm not making any sense here... |
I originally was fillna for
'special' case of: |
@hayd what do you think? default 'or' to True? (as I show above) |
@jreback Does that mean that in psuedo: (e | a) == a == (a | e) ? What do you think about the index thing (should we | the indexes result)? |
To separate the issues here:
I think the rest of the logic follows neatly from these... |
I think the answer is yes to both. as I think the identity so for 2) that would essentially be an |
Shouldn't this be assert_series_equal(result, a) ? |
nope. this is a boolean evaluation, say basically, return me the 'truth' value element-wise with self 'or' other. The input series dtype is completely irrelevant; actually the only relevant characteristic of the input is the index.
|
fyi...just found a bug when doing this with a scalar on the rhs |
These were blowing up before
These are really not 'valid' comparsions as they are effectiviely bitwise but numpy just gives an answer
|
Ok, I think the thing I'm upset with is that NaN should be overridden to be falsey before applying this, otherwise the results don't make sense (I don't see when you would ever use "or" to mean "or, or NaN"). This logic is overridden at other times e.g. when masking with NaNs, and I think it should be here too. As it stands, I disagree with... :(. I think we need to fillna left and right before op. To me the above seem valid (imo & is a relational operator just like ==... I get the feeling we are thinking about this differently...). The commutative/alignment thing raises it's head again here with |
@hayd you are talking about the scalar rhs (which have to admit is an odd thing to do in any event). or the Series that has nans? filling is very difficult as what do you fill with? |
I still don't see why this wouldn't work ? (We special case the NaN)
|
why are you filling with
so then easy to define what to do. |
Maybe I'm confused, my thinking was that this result is bool so why does it matter that self/other could be anything (non-bool)? |
because you are comparing self and other for I guess you DO have a good point, what is the expected behavior for non-bool dtype series that are passed in? I guess I have been assuming you can pass arbitrary stuff.....maybe just filling NaN with False is enough then.... ok..let me try what you suggested above |
I'd expect it to refill with nan with a mask, right? |
Nvm, that's not existing behavior on anything else. Whoops. |
@@ -4523,8 +4523,10 @@ def f(): | |||
def test_logical_with_nas(self): | |||
d = DataFrame({'a': [np.nan, False], 'b': [True, True]}) | |||
|
|||
# GH4947 | |||
# bool comparisons should return bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I've been confused by this statement (that bool comparisons always return bool?) This isn't #4947.... :s
I think this looks good. |
BUG: Make sure series-series boolean comparions are label based (GH4947)
closes #4947
You ask for a boolean comparsion, that is what you get
these ok?
scalars
Invalid scalars