
Series.astype(str, skipna=True) vanished in the 1.0 release #31708


Closed
languitar opened this issue Feb 5, 2020 · 16 comments
Labels
Astype Bug Closing Candidate May be closeable, needs more eyeballs Needs Discussion Requires discussion from core team before further action Regression Functionality that used to work in a prior pandas version

Comments

@languitar

Code Sample, a copy-pastable example if possible

import pandas as pd

pd.Series([None, 42]).astype(str, skipna=True)

results in TypeError: astype() got an unexpected keyword argument 'skipna'

Problem description

The snippet above used to work before the 1.0 release. With the skipna argument, None was preserved as a real np.nan value instead of being converted to the string nan. The skipna argument seems to have been removed in the 1.0 release, and I can't find any way to restore this behavior without passing a custom function to apply.

Expected Output

A series with [np.nan, '42']

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit : None
python : 3.8.1.final.0
python-bits : 64
OS : Linux
OS-release : 5.5.1-arch1-1
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_US.utf8
LOCALE : en_US.UTF-8

pandas : 1.0.0
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 19.3
setuptools : 44.0.0
Cython : None
pytest : 5.3.5
hypothesis : None
sphinx : 2.2.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.4.2
html5lib : 1.0.1
pymysql : None
psycopg2 : 2.8.4 (dt dec pq3 ext lo64)
jinja2 : 2.11.1
IPython : 7.12.0
pandas_datareader: 0.8.1
bs4 : 4.8.2
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.4.2
matplotlib : 3.1.3
numexpr : 2.7.1
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 5.3.5
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.13
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None

@MarcoGorelli MarcoGorelli added the Regression Functionality that used to work in a prior pandas version label Feb 5, 2020
@MarcoGorelli
Member

MarcoGorelli commented Feb 5, 2020

Can confirm this used to work in 0.25.3, but no longer works now.

DataFrame.astype used to accept kwargs but they were removed in #28646

@jorisvandenbossche
Member

Thanks for opening the issue!
As I said on gitter, I was not aware of this keyword, which is not surprising as it has never been documented AFAIK (and clearly has no tests either).

The reason it worked before (in certain cases) is that the kwargs were passed through to astype_nansafe, which does indeed have a skipna keyword:

def astype_nansafe(arr, dtype, copy: bool = True, skipna: bool = False):
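
To illustrate the mechanism described here, the following is a toy sketch in plain Python, not pandas code. The names `_astype_nansafe_toy`, `astype_with_kwargs` and `astype_strict` are hypothetical stand-ins; the sketch only shows how blind `**kwargs` forwarding can make an undocumented keyword appear to work, and how removing the forwarding produces the TypeError reported above.

```python
import math


def _astype_nansafe_toy(values, dtype, skipna=False):
    """Hypothetical stand-in for an internal helper that knows about skipna."""
    out = []
    for v in values:
        is_missing = v is None or (isinstance(v, float) and math.isnan(v))
        # With skipna=True, missing values are passed through unconverted.
        out.append(v if (skipna and is_missing) else dtype(v))
    return out


def astype_with_kwargs(values, dtype, **kwargs):
    # Old style: any extra keyword is forwarded blindly to the helper,
    # so skipna "works" without ever being part of the public signature.
    return _astype_nansafe_toy(values, dtype, **kwargs)


def astype_strict(values, dtype):
    # New style: no **kwargs, so unknown keywords are rejected up front.
    return _astype_nansafe_toy(values, dtype)


print(astype_with_kwargs([None, 42], str, skipna=True))  # [None, '42']
print(astype_strict([None, 42], str))                    # ['None', '42']

try:
    astype_strict([None, 42], str, skipna=True)
except TypeError as exc:
    print("TypeError:", exc)
```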

@jorisvandenbossche jorisvandenbossche added this to the 1.0.2 milestone Feb 5, 2020
@jreback
Contributor

jreback commented Feb 6, 2020

this may have accidentally worked (passing skipna through)
but is not supported in any way

@languitar
Author

Actually, I found this only after reading some issues in this project. I can't exactly recall the source issue for my information on this argument, but the behavior was, for instance, also documented here: #25353

@languitar
Author

In any case, is there an alternative way to handle this case that also works in 1.0?

@jreback
Contributor

jreback commented Feb 6, 2020

this is not documented in #25353
this is a straight-up bug with a PR that hasn’t been merged yet.

canonically something like

s[s.notna()] = s.astype(str)

would work
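
A minimal sketch of that mask-assignment workaround on a single Series. The `dtype=object` and the sample values are assumptions made here for illustration, so that the strings can be written back without changing the Series dtype; behavior on other dtypes may differ.

```python
import pandas as pd

# Assumption: an object-dtype Series, so strings can be stored in place.
s = pd.Series([None, 42, "x"], dtype=object)

# Only the non-missing positions are overwritten with their string form;
# the missing entry is left untouched.
s[s.notna()] = s.astype(str)

print(s.tolist())
print(s.isna().tolist())
```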

@jreback
Contributor

jreback commented Feb 6, 2020

this is a duplicate issue

@jorisvandenbossche
Member

@languitar And you are certainly not alone in finding it from that issue .. (according to the 👍's)

In practice, it would be easy to restore this for 1.0.2. But in principle, I would personally prefer not to do this, since it was never actually documented (apart from the comment on that issue), only worked kind of accidentally, and is relatively easy to work around until the actual issue is fixed.

However, depending on the discussion in #28176, we might still want to add it back anyhow (if we would introduce such a keyword to handle the deprecation cycle). But, that will first need some discussion in #28176 then.

smorken pushed a commit to cat-cfs/libcbm_py that referenced this issue Feb 7, 2020
@MarcoGorelli MarcoGorelli added the Needs Discussion Requires discussion from core team before further action label Feb 16, 2020
@TomAugspurger
Contributor

I don't think there's anything being done for 1.0.2 here. Pushing.

@TomAugspurger TomAugspurger removed this from the 1.0.2 milestone Mar 11, 2020
@sl224

sl224 commented Mar 23, 2020

Any slick ways to handle this in versions > 1.0?
jreback's solution seems the most straightforward: s[s.notna()] = s.astype(str)

@simonjayhawkins
Member

However, depending on the discussion in #28176, we might still want to add it back anyhow (if we would introduce such a keyword to handle the deprecation cycle). But, that will first need some discussion in #28176 then.

Makes sense, see #28176 (comment). Can we do this separately from #28176 to resolve this regression?

@mroeschke mroeschke added the Bug label May 11, 2020
@t0ma-sz

t0ma-sz commented Jun 29, 2020

Is there any update on this one? In pandas 1.0.5 it’s still complaining about unexpected keyword argument.

I think this may block the migration of some projects to 1.0.x.

@simonjayhawkins
Member

@t0ma-sz #28176 was closed as stale.

if you would like to submit a PR and address #28176 (comment), we can review again.

@dbalduini

dbalduini commented Apr 6, 2021

This solution s[s.notna()] = s.astype(str) will not work with mixed types.

TypeError: Cannot do inplace boolean setting on mixed-types with a non np.nan value

This is another workaround that does work with mixed types: s = s.where(s.isna(), s.astype(str))

How to reproduce:

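df = pd.DataFrame({'number': pd.Series(range(3), dtype=float)})  # float from the start, so no upcast is needed
df["name"] = "abc"
df.loc[2, "number"] = None  # .loc on the frame avoids chained assignment
df.loc[1, "name"] = None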

   number  name
0     0.0   abc
1     1.0  None
2     NaN   abc

df['number'].to_list()
> [0.0, 1.0, nan]
df['name'].to_list()
> ['abc', None, 'abc']

df[df.notna()] = df.astype(str)
> TypeError: Cannot do inplace boolean setting on mixed-types with a non np.nan value

df = df.where(df.isnull(), df.astype(str))

df['number'].to_list()
> ['0.0', '1.0', nan]
df['name'].to_list()
> ['abc', None, 'abc']
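
The `where`-based workaround can be wrapped in a small helper. This is only a sketch: `stringify_keep_missing` is a hypothetical name, not a pandas function, and it assumes plain NumPy-backed or object columns like the ones in the example above.

```python
import pandas as pd


def stringify_keep_missing(obj):
    """Hypothetical helper: cast non-missing values to str, keep missing ones.

    Works on a Series or a DataFrame. where() keeps the original value
    wherever the condition is True (here: where the value is missing) and
    takes the string version everywhere else.
    """
    return obj.where(obj.isna(), obj.astype(str))


df = pd.DataFrame({"number": [0.0, 1.0, None], "name": ["abc", None, "abc"]})
out = stringify_keep_missing(df)

print(out["number"].tolist())
print(out["name"].tolist())
```

Because `where` returns a new object instead of assigning in place, the mixed-type in-place error shown above is not triggered.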

@hans2520

hans2520 commented Apr 7, 2022

This solution s[s.notna()] = s.astype(str) will not work with mixed types.

TypeError: Cannot do inplace boolean setting on mixed-types with a non np.nan value

This is another workaround that does work with mixed types: s = s.where(s.isna(), s.astype(str))

This workaround does not work with Int64 columns:

*** TypeError: object cannot be converted to an IntegerDtype

That leaves neither workaround usable in this case.
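
One possible way around the Int64 limitation, offered as a sketch and not as something confirmed in this thread: cast the column to object first, so that the result of `where` is allowed to hold strings. The sample data is an assumption for illustration.

```python
import pandas as pd

# Assumption: a nullable-integer column with one missing entry.
s = pd.Series([1, None, 3], dtype="Int64")

# Casting to object first lets where() mix strings and missing values;
# calling where() directly on the Int64 Series would try to keep Int64.
out = s.astype(object).where(s.isna(), s.astype(str))

print(out.tolist())
print(out.isna().tolist())
```

The result is an object-dtype Series, so the nullable-integer dtype is given up in exchange for keeping the missing entries missing.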

@jbrockmendel
Member

This hasn't been around for a while, not likely to come back. Closeable?
