Mismatch in Series/DataFrame boolean ops behavior #22724

Closed
jbrockmendel opened this issue Sep 15, 2018 · 4 comments
Labels: API - Consistency, Bug, DataFrame, Missing-data, Numeric Operations

Comments

@jbrockmendel
Member

Based on test_bool_ops_df_compat from tests.series.test_operators:

import numpy as np
import pandas as pd

s1 = pd.Series([True, False, np.nan, True], dtype=object)
s2 = pd.Series([True, True, False, np.nan], dtype=object)

>>> s1 & s2
0     True
1    False
2    False
3    False
dtype: bool

>>> s1.to_frame() & s2.to_frame()
       0
0   True
1  False
2    NaN
3    NaN

The DataFrame behavior strikes me as More Correct. Is the mismatch intentional?
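
To make the difference between the two outputs concrete, here is a minimal pure-Python sketch of the two semantics. It does not call pandas; the helper names (`is_missing`, `and_fill_false`, `and_propagate`) are hypothetical and exist only for illustration. One function treats NaN as False before AND-ing, the other returns NaN whenever either operand is NaN.

```python
import math

NAN = float("nan")


def is_missing(x):
    """Return True for float NaN, the missing-value marker used in the issue."""
    return isinstance(x, float) and math.isnan(x)


def and_fill_false(left, right):
    """Element-wise AND that treats NaN as False (same values as the Series output)."""
    return [
        (False if is_missing(a) else a) and (False if is_missing(b) else b)
        for a, b in zip(left, right)
    ]


def and_propagate(left, right):
    """Element-wise AND that yields NaN if either operand is NaN (same values as the DataFrame output)."""
    return [
        NAN if is_missing(a) or is_missing(b) else (a and b)
        for a, b in zip(left, right)
    ]


# Plain lists holding the same values as s1 and s2 in the issue.
s1 = [True, False, NAN, True]
s2 = [True, True, False, NAN]

filled = and_fill_false(s1, s2)
propagated = and_propagate(s1, s2)

print(filled)      # [True, False, False, False]
print(propagated)  # [True, False, nan, nan]
```

The first list has the same values as the `s1 & s2` result reported above, and the second has the same values as the `to_frame()` result. This is only a model of the observed outputs, not a description of how pandas computes them.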

@WillAyd
Member

WillAyd commented Sep 17, 2018

I wouldn't think so and agree with your evaluation

WillAyd added the Missing-data and DataFrame labels on Sep 17, 2018
@TomAugspurger
Contributor

Agreed.

(totally different code-path, but the index version seems buggy as well)

In [12]: pd.Index(s1) & pd.Index(s2)
Out[12]: Index([True, True, False, nan], dtype='object')

I believe we already have an issue for multiset ops being incorrect.

@h-vetinari
Contributor

h-vetinari commented Sep 17, 2018

> I believe we already have an issue for multiset ops being incorrect.

Not sure I understand the multiset aspect, but the code above with the non-commutativity of boolean operations is the subject of #6528 and #13806.

Self-quoting from the former about the Series case:

> The main issue seems that in pandas.core.ops.wrapper (https://github.com/pandas-dev/pandas/blob/master/pandas/core/ops.py#L1572), after aligning the Series, only other is filled (transforming NaN to False), but self isn't, which introduces this symmetry break.
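
The symmetry break described in that quote can be illustrated with a small toy model. This is a hypothetical sketch, not the pandas implementation: `toy_or` and its inputs are made up for illustration, and the rule that a NaN left in `self` makes the result missing is an assumption that the quote does not state.

```python
import math

NAN = float("nan")


def is_missing(x):
    """Return True for float NaN."""
    return isinstance(x, float) and math.isnan(x)


def toy_or(self_vals, other_vals):
    """Hypothetical model of an asymmetric fill; not pandas code."""
    # Step taken from the quoted description: only `other` has NaN replaced by False.
    other_filled = [False if is_missing(b) else b for b in other_vals]
    # Assumption for this sketch: a NaN still present in `self` makes the result missing.
    return [
        NAN if is_missing(a) else (a or b)
        for a, b in zip(self_vals, other_filled)
    ]


left = [True, NAN]
right = [NAN, True]

forward = toy_or(left, right)   # `left` plays the role of self
backward = toy_or(right, left)  # `right` plays the role of self

print(forward)   # [True, nan]
print(backward)  # [nan, True]
```

Swapping the operands moves the missing value to a different position, so the operation is not commutative under this model. Filling both operands, or neither, would restore the symmetry.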

@jbrockmendel
Member Author

Closed by #29531 (that adds the test, not sure when the actual fix was)
