BUG: df.agg(sum, axis=1) gives wrong result when Nan value is in frame #21134

topper-123 · 2018-05-19T14:38:49Z

Using the built-in sum function gives correct result in df.agg(sum, axis=0), but wrong result in df.agg(sum, axis=1).

>>> df = pd.DataFrame([[np.nan, 2], [3, 4]])
>>> df.agg(sum, axis=0)
0    3.0
1    6.0
dtype: float64
>>> df.agg(sum, axis=1)
0    NaN
1    7.0
dtype: float64

The NaN in the last example should be 2.0.

Also, operation using the builtin sum in agg with axis=1 are very slow:

>>> n = 1_000
>>> df = pd.DataFrame({'a': range(n), 'b': range(n)})
>>> %timeit df.agg(sum, axis=1)
16.5 ms ± 25.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> %timeit df.T.agg(sum)  # correct result *and* faster
312 µs ± 3.17 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Problem description

Currently, df.agg(func, axis=1) defers to df.apply(func, axis=1). This is not done for axis=0, and the operation may therefore give unexpected results and slow the operation down (because df.apply can be very slow).

Expected Output

The expected output is:

>>> df.agg(sum, axis=1)
0    2.0
1    7.0
dtype: float64

Solution proposal

I'm thinking about putting in df.T.agg(func, axis=0) rather than df.apply(func, axis=1) in a few strategic places. This should ensure both getting correct results and faster operations. will report back if this succeeds.

The text was updated successfully, but these errors were encountered:

topper-123 · 2018-06-06T07:31:36Z

closed as a duplicate of #16679.

topper-123 mentioned this issue May 19, 2018

ENH: add np.nan funcs to _cython_table #21123

Closed

3 tasks

gfyoung added Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Compat pandas objects compatability with Numpy or Python functions labels May 21, 2018

This was referenced May 27, 2018

BUG: df.agg(sum, axis=1) uses different method than when axis=0 #21222

Closed

BUG: df.agg, df.transform and df.apply use different methods when axis=1 than when axis=0 #21224

Merged

jreback added this to the 0.23.1 milestone May 29, 2018

topper-123 closed this as completed Jun 6, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: df.agg(sum, axis=1) gives wrong result when Nan value is in frame #21134

BUG: df.agg(sum, axis=1) gives wrong result when Nan value is in frame #21134

topper-123 commented May 19, 2018 •

edited

Loading

topper-123 commented Jun 6, 2018

BUG: df.agg(sum, axis=1) gives wrong result when Nan value is in frame #21134

BUG: df.agg(sum, axis=1) gives wrong result when Nan value is in frame #21134

Comments

topper-123 commented May 19, 2018 • edited Loading

Problem description

Expected Output

Solution proposal

topper-123 commented Jun 6, 2018

topper-123 commented May 19, 2018 •

edited

Loading