
QST: | (or) behavior when working with two series that have different indexes #41604


Open
2 tasks done
ThomasDelorme opened this issue May 21, 2021 · 2 comments
Labels
Bug; Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate); Numeric Operations (Arithmetic, Comparison, and Logical operations)


@ThomasDelorme

ThomasDelorme commented May 21, 2021

  • I have searched the [pandas] tag on StackOverflow for similar questions.

  • I have asked my usage related question on StackOverflow.


Question about pandas

When using the | operator between two DataFrames, the order of the operands can change the result if the data contain NaN or the indexes differ.

import pandas as pd

df_1 = pd.DataFrame([None], index=[1])
df_2 = pd.DataFrame([True], index=[1])

print(df_1 | df_2)
print(df_2 | df_1)

will give you different results:

       0
1  False
      0
1  True

The same happens when the indexes differ:

df_1 = pd.DataFrame([True], index=[1])
df_2 = pd.DataFrame([True], index=[2])

print(df_1 | df_2)
print(df_2 | df_1)

will return

       0
1   True
2  False
       0
1  False
2   True

Is this expected behavior or a bug? If it is expected, what is the logic behind it?

Thanks!

@ThomasDelorme ThomasDelorme added Needs Triage Issue that has not been reviewed by a pandas team member Usage Question labels May 21, 2021
@mzeitlin11
Member

Thanks for the report @ThomasDelorme! These both look suspicious - I think both are bugs. I think the root cause is along the lines of weirdness with truthy operators for object arrays - see the whole host of issues around #12863 in both numpy and pandas.

The lack of commutativity in the first case looks to come from the logic for computing the binary op: it evaluates None | True, which raises a TypeError and falls back to giving the left operand (which then gets filled with False when the result is coerced to a bool type). So None | True -> None -> False, while True | None -> True. This logic can be seen in

pandas/pandas/_libs/ops.pyx

Lines 244 to 250 in e5f1f9c

try:
    result[i] = op(x, y)
except TypeError:
    if x is None or is_nan(x):
        result[i] = x
    elif y is None or is_nan(y):
        result[i] = y
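
A pure-Python sketch of this fallback may help to see the asymmetry. It is not the Cython code itself: is_missing, or_with_fallback and sketch_or are hypothetical names. It also assumes that a missing value in the right operand is replaced with False before the element-wise op runs, which is not stated in this thread. Without that assumption the quoted branch alone would return None for either order.

```python
import math
import operator


def is_missing(value):
    """True for None or a float NaN (stand-in for the None / is_nan checks)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def or_with_fallback(x, y):
    """Mimic the quoted branch: try x | y, fall back to a missing operand."""
    try:
        return operator.or_(x, y)
    except TypeError:
        if is_missing(x):
            return x
        elif is_missing(y):
            return y
        raise


def sketch_or(x, y):
    # ASSUMPTION (not stated in the thread): a missing right operand is
    # replaced with False before the element-wise op runs.
    if is_missing(y):
        y = False
    result = or_with_fallback(x, y)
    # Coercing the result to bool turns a leftover missing value into False.
    return False if is_missing(result) else bool(result)


print(sketch_or(None, True))  # False: None | True raises, falls back to None
print(sketch_or(True, None))  # True: evaluated as True | False
```

Under these assumptions the sketch reproduces the order-dependent output from the first example in the report.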

I haven't looked into the second case yet, but I'd guess it's the same issue: since the indices don't match, the frames will be reindexed before the binary op is computed, which leads to the same NaN-versus-bool problem.
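
To illustrate the reindexing step, here is a minimal sketch that uses plain dicts in place of DataFrames. align_outer is a hypothetical helper, and filling absent labels with NaN on the union of the labels is an assumption about how the frames are aligned, not code taken from pandas.

```python
NAN = float("nan")


def align_outer(left, right):
    """Hypothetical helper: reindex two {label: value} mappings on the
    sorted union of their labels, filling absent labels with NaN."""
    labels = sorted(set(left) | set(right))
    left_aligned = {label: left.get(label, NAN) for label in labels}
    right_aligned = {label: right.get(label, NAN) for label in labels}
    return left_aligned, right_aligned


df_1 = {1: True}  # stands in for pd.DataFrame([True], index=[1])
df_2 = {2: True}  # stands in for pd.DataFrame([True], index=[2])

left, right = align_outer(df_1, df_2)
for label in left:
    # Each row now pairs True with NaN, in opposite positions.
    print(label, left[label], right[label])
```

After alignment, row 1 pairs True with NaN and row 2 pairs NaN with True, so every row ends up combining a bool with a missing value, as in the first case.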

Further investigations into fixing this are welcome!

@mzeitlin11 mzeitlin11 added Bug Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Numeric Operations Arithmetic, Comparison, and Logical operations and removed Needs Triage Issue that has not been reviewed by a pandas team member Usage Question labels May 21, 2021
@mzeitlin11 mzeitlin11 added this to the Contributions Welcome milestone May 21, 2021
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
@corneliusroemer

corneliusroemer commented Feb 9, 2023

It would be great if this could be fixed. NaN | True == False contradicts ordinary logic: if one operand is True, then or should yield True, irrespective of what NaN stands for.

NaN | False is trickier: the result could be either False or True, depending on what NaN stands for, so NaN seems appropriate. Alternatively, it could throw an error, like numpy does.
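
The semantics suggested here correspond to three-valued (Kleene) logic, which pandas documents for its nullable boolean dtype. A minimal pure-Python sketch, with None standing in for the missing value; kleene_or is a hypothetical helper, not a pandas function:

```python
def kleene_or(a, b):
    """Three-valued OR, with None standing for a missing value."""
    if a is True or b is True:
        return True  # one True decides the result, whatever the other is
    if a is None or b is None:
        return None  # False | missing stays unknown
    return False


for a, b in [(None, True), (True, None), (None, False), (False, None)]:
    print(a, "|", b, "->", kleene_or(a, b))
```

Unlike the behavior reported above, this definition gives the same result for both operand orders.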

This is probably a duplicate: #51267

Also, it would be good to rename the issue from QST: to BUG: as this clearly is a bug ;)
