API: flex comparisons DataFrame vs Series inconsistency #28079

jbrockmendel · 2019-08-21T22:14:21Z

In tests.frame.test_arithmetic we test the DataFrame part of the following, but we do not test the Series behavior (at least not in that file).

```python
import numpy as np
import pandas as pd

arr = np.array([np.nan, 1, 6, np.nan])
arr2 = np.array([2j, np.nan, 7, None])  # the None forces object dtype
ser = pd.Series(arr)
ser2 = pd.Series(arr2)
df = pd.DataFrame(ser)
df2 = pd.DataFrame(ser2)

ser < ser2  # raises TypeError, makes sense
df < df2  # raises TypeError, makes sense

ser.lt(ser2)  # raises TypeError, makes sense

df.lt(df2)  # does not raise; at the time of this report it returned:
#        0
# 0  False
# 1  False
# 2   True
# 3  False
```

The df.lt(df2) version (the "flex" op) masks positions where either df or df2 is null, returning False there. That is fine, but why doesn't the Series version do the same thing?
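A minimal sketch of what "masking" means here, applied to a Series. `masked_lt` is a hypothetical helper written for illustration, not a pandas API, and it assumes both Series already share the same index. It starts from all-False, then compares only the positions where both sides are non-null, which mirrors the DataFrame flex result above for this particular example.

```python
import numpy as np
import pandas as pd


def masked_lt(left: pd.Series, right: pd.Series) -> pd.Series:
    """Hypothetical helper: element-wise `left < right` that treats nulls as False.

    Mimics the masking the DataFrame flex op applied in this report;
    it is not part of pandas.
    """
    # Assumption: both Series are already aligned on the same index.
    if not left.index.equals(right.index):
        raise ValueError("masked_lt expects Series with identical indexes")

    # Positions where both sides hold a non-null value.
    valid = (left.notna() & right.notna()).to_numpy()

    # Start with False everywhere; null positions stay False.
    out = np.zeros(len(left), dtype=bool)

    # Compare only the valid positions with Python's `<` on each pair,
    # so mixed types inside object dtype behave as they do in plain Python.
    lvals = left.to_numpy(dtype=object)[valid]
    rvals = right.to_numpy(dtype=object)[valid]
    out[valid] = [a < b for a, b in zip(lvals, rvals)]

    return pd.Series(out, index=left.index)
```

With the arrays from the report, `masked_lt(ser, ser2)` gives False, False, True, False: only position 2 (6 vs 7) survives the mask. Note that masking only removes nulls; a non-null incomparable pair such as 1.0 vs 2j would still raise TypeError.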


jbrockmendel · 2019-08-31T02:02:48Z

@jreback @jorisvandenbossche @TomAugspurger this won't be hard to fix, but it needs input on what the desired behavior is.

TomAugspurger · 2019-09-03T11:46:45Z

What's the proposal here? To always mask before doing the op, so that ser.lt(ser2) doesn't raise?

jbrockmendel · 2019-09-03T14:26:30Z

Two options:

a) make Series.lt behave like DataFrame.lt
b) make DataFrame.lt behave like Series.lt

The main argument for a) is that it is less breaking for users relying on the DataFrame behavior.
The main argument for b) is that the behavior only matters in exactly one test, and it is a test with truly weird complex-dtype behavior (#28050).

TomAugspurger · 2019-09-03T14:43:41Z

OK, I'm not sure what's best here. My inclination is to follow NumPy where possible, which I think argues for option a, right? (I'm ignoring the None, which forces object dtype. Maybe I shouldn't ignore that.)
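The NumPy point and the object-dtype caveat can be seen side by side in a small sketch. The arrays here are simplified illustrations, not the exact arrays from the report: on plain float64 arrays NumPy's `<` returns False wherever a NaN is involved instead of raising, while with object dtype (which the None in `arr2` forces) NumPy falls back to Python's element-wise comparison, where float < complex raises TypeError.

```python
import numpy as np

# float64 vs float64: NaN positions compare as False, nothing raises.
a = np.array([np.nan, 1.0, 6.0, np.nan])
b = np.array([2.0, np.nan, 7.0, np.nan])
float_result = a < b

# object vs object: NumPy calls Python's `<` on each pair, and
# `1.0 < 2j` raises TypeError, which NumPy propagates.
obj_left = np.array([1.0, 6.0], dtype=object)
obj_right = np.array([2j, 7], dtype=object)
try:
    _ = obj_left < obj_right
    object_raised = False
except TypeError:
    object_raised = True
```

Here `float_result` is `[False, False, True, False]` and `object_raised` is True, so "follow NumPy" points to masking-like results for numeric dtypes but to raising once object dtype is involved.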

jbrockmendel added the "API Design" and "Numeric Operations" (Arithmetic, Comparison, and Logical operations) labels Aug 21, 2019

jbrockmendel mentioned this issue Sep 24, 2019 in "OPS: Remove mask_cmp_op fallback behavior #28601" (merged)

jreback added this to the 1.0 milestone Sep 25, 2019

jreback closed this as completed in #28601 Sep 26, 2019
