sum in pandas can concatenate strings #13916

Open
shoyer opened this issue Aug 5, 2016 · 1 comment
Labels
Bug; Reduction Operations (sum, mean, min, max, etc.); Strings (String extension data type and string data)

Comments

@shoyer
Member

shoyer commented Aug 5, 2016

Possibly related: #13912

This looks wrong to me -- probably a bug?

In [36]: pd.Series(['a', 'b', 'c']).sum()
Out[36]: 'abc'

In [37]: pd.__version__
Out[37]: '0.18.1'

This happens on DataFrames, as well:

In [8]: pd.Series(['a', 'b', 'c']).to_frame().sum()
Out[8]:
0    abc
dtype: object

Note, of course, that summing strings with Python's built-in sum raises a TypeError:

In [2]: sum(['a', 'b', 'c'])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-eba0487fc411> in <module>()
----> 1 sum(['a', 'b', 'c'])

TypeError: unsupported operand type(s) for +: 'int' and 'str'
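
To make the contrast concrete, here is a minimal sketch of why the built-in fails: `sum` starts from the integer 0, so the first step is `0 + 'a'`. Passing a string as the start value does not help, because CPython rejects string starts outright. The variable names are only illustrative.

```python
import operator
from functools import reduce

letters = ['a', 'b', 'c']

# Default start value is 0, so the first addition is 0 + 'a'.
try:
    sum(letters)
except TypeError as exc:
    default_start_error = str(exc)

# A str start value is rejected explicitly by the built-in.
try:
    sum(letters, '')
except TypeError as exc:
    str_start_error = str(exc)

# A plain left fold with + has no such guard, which is effectively
# what the pandas result above corresponds to.
folded = reduce(operator.add, letters)

# The idiomatic way to concatenate strings in Python.
joined = ''.join(letters)

print(default_start_error)
print(str_start_error)
print(folded, joined)
```

So the built-in goes out of its way to refuse string concatenation, while an unguarded fold over `+` happily produces 'abc'.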

Interestingly, NumPy has the same (buggy?) behavior on dtype=object arrays, though not on fixed-width string arrays:

In [11]: pd.Series(['a', 'b', 'c']).values.sum()
Out[11]: 'abc'

In [12]: pd.Series(['a', 'b', 'c']).values.astype(str).sum()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-12-4e25bf0067a8> in <module>()
----> 1 pd.Series(['a', 'b', 'c']).values.astype(str).sum()

/Users/shoyer/conda/envs/xarray-dev/lib/python3.5/site-packages/numpy/core/_methods.py in _sum(a, axis, dtype, out, keepdims)
     30
     31 def _sum(a, axis=None, dtype=None, out=None, keepdims=False):
---> 32     return umr_sum(a, axis, dtype, out, keepdims)
     33
     34 def _prod(a, axis=None, dtype=None, out=None, keepdims=False):

TypeError: cannot perform reduce with flexible type
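
A short sketch of the mechanism behind the object-dtype result, assuming only that NumPy is installed: on an object array, `np.add.reduce` defers to each element's own `+`, so the reduction is equivalent to folding the Python objects together. The fixed-width string array from the second example has a different dtype kind ('U'), which takes a different code path.

```python
import operator
from functools import reduce

import numpy as np

obj_arr = np.array(['a', 'b', 'c'], dtype=object)

# On object arrays the add ufunc calls each element's own + operator.
via_numpy = np.add.reduce(obj_arr)

# The same fold written in pure Python.
via_python = reduce(operator.add, ['a', 'b', 'c'])

# Converting to a fixed-width string dtype changes the dtype kind
# from 'O' (object) to 'U' (unicode).
kinds = (obj_arr.dtype.kind, obj_arr.astype(str).dtype.kind)

print(via_numpy, via_python, kinds)
```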
@jreback
Contributor

jreback commented Aug 5, 2016

This is just following NumPy's example. I agree it is a bit confusing, though; xref #9733. Shouldn't something like this be fixed upstream? Though I suspect a pandas fix would be much quicker.
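
Until the behavior changes in either library, a user-side guard is one possible workaround. The sketch below is not a pandas API and not the fix discussed here; `strict_sum` is a hypothetical helper name, and the choice to raise a TypeError is an assumption modeled on the built-in `sum`.

```python
import pandas as pd


def strict_sum(series: pd.Series):
    """Hypothetical helper: sum a Series, refusing any string values."""
    # Check element by element so the guard does not depend on the dtype
    # pandas inferred for the Series.
    if series.map(lambda value: isinstance(value, str)).any():
        raise TypeError("refusing to sum strings; use ''.join(...) instead")
    return series.sum()


numeric_total = strict_sum(pd.Series([1, 2, 3]))

try:
    strict_sum(pd.Series(['a', 'b', 'c']))
    string_sum_rejected = False
except TypeError:
    string_sum_rejected = True

print(numeric_total, string_sum_rejected)
```

The element-wise check costs a full pass over the data, so this is meant for validation at the edges of a pipeline rather than for hot loops.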

jreback added the Dtype Conversions (Unexpected or buggy dtype conversions), API Design, and Strings (String extension data type and string data) labels on Aug 5, 2016
jreback added this to the Next Major Release milestone on Aug 5, 2016
mroeschke added the Bug label and removed the Dtype Conversions (Unexpected or buggy dtype conversions) and API Design labels on May 3, 2020
jbrockmendel added the Numeric Operations (Arithmetic, Comparison, and Logical operations) and Reduction Operations (sum, mean, min, max, etc.) labels on Sep 20, 2020
jreback modified the milestones: Contributions Welcome, 1.4 on Nov 13, 2021
jreback modified the milestones: 1.4, Contributions Welcome on Dec 23, 2021
mroeschke removed this from the Contributions Welcome milestone on Oct 13, 2022
jbrockmendel removed the Numeric Operations (Arithmetic, Comparison, and Logical operations) label on Mar 30, 2023
4 participants