Series.str.cat() with NaN in series returns NaN, rather than ignoring NaN #11435

Closed
hack-c opened this issue Oct 26, 2015 · 5 comments
Labels
API Design; Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate); Strings (String extension data type and string data)

hack-c commented Oct 26, 2015

I find this surprising as the rest of the pandas Series.str.* API ignores NaN values.

In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: pd.__version__
Out[3]: u'0.17.0'

In [4]: s = pd.Series(['asdf','sdfg',np.nan,'qwer','wert'])

In [5]: s.str.cat(sep=' ')
Out[5]: nan

I think this should return

In [5]: s.str.cat(sep=' ')
Out[5]: 'asdf sdfg qwer wert'
@hack-c hack-c closed this as completed Oct 26, 2015

hack-c commented Oct 26, 2015

Didn't read the docs closely enough: the na_rep keyword more or less takes care of this. It's still a little strange that even if you set

In [6]: s.str.cat(sep=' ', na_rep='')
Out[6]: 'asdf sdfg  qwer wert'

you get a double sep around the NaN. It's easy to fix with a regex, but shouldn't NaN just be ignored?
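
For anyone hitting the same double separator, here is a minimal sketch of two workarounds, using the same Series as above. It is not code from this thread: the variable names are illustrative, and the regex is one assumed way to do the "fix with a regex" mentioned above.

```python
import re

import numpy as np
import pandas as pd

s = pd.Series(['asdf', 'sdfg', np.nan, 'qwer', 'wert'])

# Option 1: drop the missing values first, so no empty slot is ever joined.
joined_dropna = s.dropna().str.cat(sep=' ')

# Option 2: keep na_rep='' and collapse the repeated separator afterwards.
# Assumption: the separator is a single space and the data contains no
# intentional runs of spaces that should be preserved.
joined_regex = re.sub(r' {2,}', ' ', s.str.cat(sep=' ', na_rep=''))

print(joined_dropna)
print(joined_regex)
```

The dropna() route is probably the safer of the two: the regex also collapses repeated spaces inside the values themselves, and it leaves a stray leading or trailing separator when the NaN sits at either end of the Series.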

@hack-c hack-c reopened this Oct 26, 2015

hack-c commented Oct 26, 2015

I actually think Series.str.cat() should ignore NaN by default, to match the behavior of the rest of the API. But I'm curious whether there's a reason why this is a bad idea.

@Winterflower
Contributor

I think this would benefit from an additional doc example involving na_rep. I just read the API doc, and it's not very clear from the explanation that calling Series.str.cat() without specifying anything for na_rep returns NaN when the Series contains a NaN.
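
A doc example along those lines might look like the sketch below. It is only an illustration, not the wording of any actual pandas docstring, and the '?' placeholder is an arbitrary choice.

```python
import numpy as np
import pandas as pd

s = pd.Series(['asdf', 'sdfg', np.nan, 'qwer', 'wert'])

# With na_rep given, each missing value is replaced by that string
# before the elements are joined with sep.
with_placeholder = s.str.cat(sep=' ', na_rep='?')
print(with_placeholder)  # asdf sdfg ? qwer wert
```

The call without na_rep is left out on purpose: this thread reports that it returns nan on 0.17.0, and other versions may behave differently.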


hack-c commented Oct 26, 2015

That sounds like a good approach. I can do a pull request for the docs. I also opened another issue (#11334) for str_cat that I haven't gotten around to yet; it might also be best solved with a doc example rather than a change to the internals.

@jreback jreback added the Missing-data, API Design, and Strings labels Oct 27, 2015
@jorisvandenbossche
Member

I agree it seems more logical to ignore the NaN value by default.

@jorisvandenbossche jorisvandenbossche added this to the Someday milestone Oct 27, 2015
@jreback jreback modified the milestones: 0.18.0, Someday Feb 12, 2016
4 participants