
BUG: df.agg(sum, axis=1) gives wrong result when Nan value is in frame #21134

Closed
topper-123 opened this issue May 19, 2018 · 1 comment
Labels
Algos (Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff)
Compat (pandas objects compatibility with NumPy or Python functions)

topper-123 commented May 19, 2018

Using the built-in sum function gives the correct result in df.agg(sum, axis=0), but the wrong result in df.agg(sum, axis=1).

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame([[np.nan, 2], [3, 4]])
>>> df.agg(sum, axis=0)
0    3.0
1    6.0
dtype: float64
>>> df.agg(sum, axis=1)
0    NaN
1    7.0
dtype: float64

The NaN in the last example should be 2.0.

Also, operations using the built-in sum in agg with axis=1 are very slow:

>>> n = 1_000
>>> df = pd.DataFrame({'a': range(n), 'b': range(n)})
>>> %timeit df.agg(sum, axis=1)
16.5 ms ± 25.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> %timeit df.T.agg(sum)  # correct result *and* faster
312 µs ± 3.17 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Problem description

Currently, df.agg(func, axis=1) defers to df.apply(func, axis=1), whereas the axis=0 path does not. The axis=1 path may therefore give unexpected results and run slowly (df.apply can be very slow).
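
The NaN in the axis=1 result is what one would expect if Python's built-in sum is called directly on each row's values, which is what a row-wise apply would do. A minimal sketch in plain Python, assuming a row is just the list of its values; `nan_skipping_sum` is a hypothetical helper that imitates the NaN-skipping behaviour seen in the axis=0 result and is not a pandas function:

```python
import math

row = [float("nan"), 2.0]  # values of row 0 in the example frame

# Built-in sum adds every element, so a single NaN makes the total NaN.
plain_total = sum(row)


def nan_skipping_sum(values):
    """Hypothetical helper: add up values while ignoring NaN entries."""
    return sum(v for v in values if not math.isnan(v))


skipped_total = nan_skipping_sum(row)

print(math.isnan(plain_total))  # True
print(skipped_total)            # 2.0
```

This only illustrates the arithmetic; it says nothing about how pandas dispatches the call internally.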

Expected Output

The expected output is:

>>> df.agg(sum, axis=1)
0    2.0
1    7.0
dtype: float64

Solution proposal

I'm thinking about using df.T.agg(func, axis=0) rather than df.apply(func, axis=1) in a few strategic places. This should give both correct results and faster operations. I will report back if this succeeds.
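
The transpose idea can be tried from user code, as a rough sketch rather than the eventual fix. It assumes the string "sum" in place of the built-in, because the way agg handles the built-in sum may differ between pandas versions, and it wraps the built-in in a lambda to show the row-wise behaviour reported above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[np.nan, 2], [3, 4]])

# Row-wise call of the built-in sum through a lambda: NaN propagates in row 0.
row_wise = df.apply(lambda row: sum(row), axis=1)

# Transpose, then reduce down the columns with the NaN-skipping "sum" method.
via_transpose = df.T.agg("sum", axis=0)

# The direct reduction method gives the same values.
direct = df.sum(axis=1)

print(via_transpose.tolist())  # [2.0, 7.0]
print(direct.tolist())         # [2.0, 7.0]
```

Transposing copies the data for mixed-dtype frames, so whether this is faster in general would need to be measured case by case.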

@gfyoung gfyoung added Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Compat pandas objects compatability with Numpy or Python functions labels May 21, 2018
@jreback jreback added this to the 0.23.1 milestone May 29, 2018
topper-123 commented

Closed as a duplicate of #16679.
