Astype keeps nan when converting into string #28176

Closed
29 commits
294dd2e
astype keeps nan when converting into string
makbigc Aug 27, 2019
39b4294
Move the entry to API change section and make it prominent
makbigc Sep 10, 2019
b82e02f
Fix entry in v1.0
makbigc Sep 16, 2019
55d1cf7
Move the whatsnew entry to deprecation section
makbigc Sep 19, 2019
99dc246
merge for update
makbigc Oct 6, 2019
f44afcf
Add skipna keyword into astype
makbigc Oct 6, 2019
b5428c3
Fix linting and docstring format
makbigc Oct 7, 2019
aa62364
merge for update
makbigc Dec 5, 2019
d48dc29
fix black format
makbigc Dec 5, 2019
b8724e5
Add skipna parameter into DatetimeBlock.astype and Block.astype
makbigc Dec 7, 2019
7f48697
Fix black format
makbigc Dec 7, 2019
9c624c5
resolve mypy issue
makbigc Dec 7, 2019
369c641
Remove kwarg parameter in astype function
makbigc Dec 9, 2019
2754677
Add FutureWarning for string-type conversion
makbigc Dec 9, 2019
ce39f6a
Fix black format
makbigc Dec 9, 2019
8e7cd3c
Merge for update
makbigc Dec 9, 2019
35fd58f
Add okwarning to suppress FutureWarning
makbigc Dec 9, 2019
1d29cd0
Add :okwarning: in whatsnew to suppress FutureWarning
makbigc Dec 9, 2019
34c51e0
Add :okwarning: to suppress FutureWarning
makbigc Dec 9, 2019
d879778
Add :okwarning: into getting_started/basic.rst
makbigc Dec 9, 2019
9765497
Add :okwarning: into integer_na.rst
makbigc Dec 9, 2019
ffed0a0
merge for update
makbigc Dec 10, 2019
c0cbe9a
merge for resolving conflict
makbigc Jan 1, 2020
5ff30e0
merge for update
makbigc Jan 3, 2020
3057073
Remove skipna parameter and set skipna=True in astype_nansafe
makbigc Jan 3, 2020
59030b1
Change test_astype_str_map
makbigc Jan 3, 2020
11c2015
fix test_astype_str
makbigc Jan 4, 2020
68c8e85
Fix black format
makbigc Jan 4, 2020
4b80090
merge for update
makbigc Jan 4, 2020
8 changes: 7 additions & 1 deletion doc/source/getting_started/basics.rst
@@ -974,12 +974,14 @@ On a ``Series``, multiple functions return a ``Series``, indexed by the function
Passing a ``lambda`` function will yield a ``<lambda>`` named row:

.. ipython:: python
:okwarning:

tsdf['A'].agg(['sum', lambda x: x.mean()])

Passing a named function will yield that name for the row:

.. ipython:: python
:okwarning:

def mymean(x):
return x.mean()
@@ -1034,6 +1036,7 @@ With ``.agg()`` is it possible to easily create a custom describe function, simi
to the built in :ref:`describe function <basics.describe>`.

.. ipython:: python
:okwarning:

from functools import partial

@@ -1066,7 +1069,6 @@ Transform the entire frame. ``.transform()`` allows input functions as: a NumPy
function name or a user defined function.

.. ipython:: python
:okwarning:

tsdf.transform(np.abs)
tsdf.transform('abs')
@@ -1093,13 +1095,15 @@ The first level will be the original frame column names; the second level
will be the names of the transforming functions.

.. ipython:: python
:okwarning:

tsdf.transform([np.abs, lambda x: x + 1])

Passing multiple functions to a Series will yield a DataFrame. The
resulting column names will be the transforming functions.

.. ipython:: python
:okwarning:

tsdf['A'].transform([np.abs, lambda x: x + 1])

@@ -1111,6 +1115,7 @@ Transforming with a dict
Passing a dict of functions will allow selective transforming per column.

.. ipython:: python
:okwarning:

tsdf.transform({'A': np.abs, 'B': lambda x: x + 1})

@@ -1138,6 +1143,7 @@ a single value and returning a single value. For example:
df4 = df_orig.copy()

.. ipython:: python
:okwarning:

df4


2 changes: 2 additions & 0 deletions doc/source/user_guide/integer_na.rst
@@ -99,6 +99,7 @@ Missing values will be propagated, and the data will be coerced to another
dtype if needed.

.. ipython:: python
:okwarning:

s = pd.Series([1, 2, None], dtype="Int64")

@@ -129,6 +130,7 @@ These dtypes can operate as part of of ``DataFrame``.
These dtypes can be merged & reshaped & casted.

.. ipython:: python
:okwarning:

pd.concat([df[['A']], df[['B', 'C']]], axis=1).dtypes
df['A'].astype(float)

1 change: 1 addition & 0 deletions doc/source/user_guide/io.rst
@@ -392,6 +392,7 @@ Or you can use the :func:`~pandas.to_numeric` function to coerce the
dtypes after reading in the data,

.. ipython:: python
:okwarning:

df2 = pd.read_csv(StringIO(data))
df2['col_1'] = pd.to_numeric(df2['col_1'], errors='coerce')

2 changes: 2 additions & 0 deletions doc/source/user_guide/sparse.rst
@@ -26,6 +26,7 @@ The sparse objects exist for memory efficiency reasons. Suppose you had a
large, mostly NA ``DataFrame``:

.. ipython:: python
:okwarning:

df = pd.DataFrame(np.random.randn(10000, 4))
df.iloc[:9998] = np.nan
@@ -300,6 +301,7 @@ meth:`Series.sparse.to_coo` is implemented for transforming a ``Series`` with sp
The method requires a ``MultiIndex`` with two or more levels.

.. ipython:: python
:okwarning:

s = pd.Series([3.0, np.nan, 1.0, 3.0, np.nan, np.nan])
s.index = pd.MultiIndex.from_tuples([(1, 2, 'a', 0),

1 change: 1 addition & 0 deletions doc/source/user_guide/text.rst
@@ -86,6 +86,7 @@ l. For ``StringDtype``, :ref:`string accessor methods<api.series.str>`
Both outputs are ``Int64`` dtype. Compare that with object-dtype

.. ipython:: python
:okwarning:

s.astype(object).str.count("a")
s.astype(object).dropna().str.count("a")

1 change: 1 addition & 0 deletions doc/source/user_guide/timedeltas.rst
@@ -237,6 +237,7 @@ or by astyping to a specific timedelta type. These operations yield Series and p
Note that division by the NumPy scalar is true division, while astyping is equivalent of floor division.

.. ipython:: python
:okwarning:

december = pd.Series(pd.date_range('20121201', periods=4))
january = pd.Series(pd.date_range('20130101', periods=4))

1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.11.0.rst
@@ -305,6 +305,7 @@ Furthermore ``datetime64[ns]`` columns are created by default, when passed datet
Astype conversion on ``datetime64[ns]`` to ``object``, implicitly converts ``NaT`` to ``np.nan``

.. ipython:: python
:okwarning:

s = pd.Series([datetime.datetime(2001, 1, 2, 0, 0) for i in range(3)])
s.dtype

1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.13.0.rst
@@ -532,6 +532,7 @@ Enhancements
is frequency conversion. See :ref:`the docs<timedeltas.timedeltas_convert>` for the docs.

.. ipython:: python
:okwarning:

import datetime
td = pd.Series(pd.date_range('20130101', periods=4)) - pd.Series(

2 changes: 2 additions & 0 deletions doc/source/whatsnew/v0.24.0.rst
@@ -57,6 +57,7 @@ marker of ``np.nan`` will infer to integer dtype. The display of the ``Series``
Operations on these dtypes will propagate ``NaN`` as other pandas operations.

.. ipython:: python
:okwarning:

# arithmetic
s + 1
@@ -85,6 +86,7 @@ These dtypes can operate as part of a ``DataFrame``.
These dtypes can be merged, reshaped, and casted.

.. ipython:: python
:okwarning:

pd.concat([df[['A']], df[['B', 'C']]], axis=1).dtypes
df['A'].astype(float)

24 changes: 24 additions & 0 deletions doc/source/whatsnew/v1.0.0.rst
@@ -560,6 +560,30 @@ Documentation Improvements
Deprecations
~~~~~~~~~~~~

String conversion of Series with nan
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:meth:`Series.astype` with ``str`` previously coerced ``np.nan`` to the string ``"nan"``. pandas now preserves the missing value indicator (:issue:`25353`).

Review comment (Contributor):
IIUC, @jschendel and I both prefer a deprecation cycle. It looks like you're making a breaking change. Is that correct?


*Previous behavior*:

.. code-block:: ipython

In [1]: pd.Series(['foo', np.nan]).astype(str)
Out[1]:
0 foo
1 nan
dtype: object

*New behavior*:

.. ipython:: python
pd.Series(['foo', np.nan]).astype(str)
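
To make the two behaviours described in this entry concrete without depending on a particular pandas version, here is a minimal pure-Python sketch. The helpers `is_missing` and `to_str`, and the `skipna` flag on `to_str`, are hypothetical names used only for illustration (the flag loosely mirrors the `skipna` argument of `astype_nansafe` touched by this PR). This is a model of the semantics, not the pandas implementation.

```python
import math


def is_missing(value):
    """Return True for None and float NaN (a simplified stand-in for pandas' isna)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def to_str(values, skipna=True):
    """Convert each element to str; with skipna=True, leave missing values untouched."""
    return [v if (skipna and is_missing(v)) else str(v) for v in values]


data = ["foo", float("nan"), 1]

old_style = to_str(data, skipna=False)  # NaN becomes the literal string "nan"
new_style = to_str(data, skipna=True)   # NaN stays a float NaN

print(old_style)
print(new_style)
```

With `skipna=False` the missing value can no longer be told apart from a genuine string `"nan"`, which is the loss of information the entry is concerned with.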


Other deprecations
^^^^^^^^^^^^^^^^^^

- :meth:`Series.item` and :meth:`Index.item` have been _undeprecated_ (:issue:`29250`)
- ``Index.set_value`` has been deprecated. For a given index ``idx``, array ``arr``,
value in ``idx`` of ``idx_val`` and a new value of ``val``, ``idx.set_value(arr, idx_val, val)``

2 changes: 1 addition & 1 deletion pandas/core/dtypes/cast.py
@@ -789,7 +789,7 @@ def conv(r, dtype):
return [conv(r, dtype) for r, dtype in zip(result, dtypes)]


def astype_nansafe(arr, dtype, copy: bool = True, skipna: bool = False):
def astype_nansafe(arr, dtype, copy: bool = True, skipna: bool = True):
"""
Cast the elements of an array to a given dtype in a nan-safe manner.


11 changes: 10 additions & 1 deletion pandas/core/generic.py
@@ -5526,6 +5526,8 @@ def astype(
- ``raise`` : allow exceptions to be raised
- ``ignore`` : suppress exceptions. On error return original object.

.. versionadded:: 0.20.0

Returns
-------
casted : same type as caller
@@ -5603,6 +5605,13 @@ def astype(
1 2
dtype: int64
"""
if isna(self.values).any():
msg = (
"The meaning of the missing value indicator is preserved "
"by default in the future version."
)
warnings.warn(msg, FutureWarning, stacklevel=2)

if is_dict_like(dtype):
if self.ndim == 1: # i.e. Series
if len(dtype) > 1 or self.name not in dtype:
@@ -5623,7 +5632,7 @@ def astype(
for col_name, col in self.items():
if col_name in dtype:
results.append(
col.astype(dtype=dtype[col_name], copy=copy, errors=errors)
col.astype(dtype=dtype[col_name], copy=copy, errors=errors,)
)
else:
results.append(col.copy() if copy else col)
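
As written in this hunk, the `FutureWarning` is raised whenever the object contains a missing value, whatever the target dtype, which is consistent with the many `:okwarning:` additions in unrelated documentation files. One possible refinement, sketched here with the standard `warnings` module, is to warn only when the target is a string type. `astype_with_warning`, `_is_string_target` and `_has_missing` are hypothetical helpers, not pandas functions, and plain lists stand in for a Series.

```python
import math
import warnings


def _is_string_target(dtype):
    """Hypothetical check: treat the builtin str or the name 'str' as a string target."""
    return dtype is str or dtype == "str"


def _has_missing(values):
    """Return True if any element is None or a float NaN."""
    return any(v is None or (isinstance(v, float) and math.isnan(v)) for v in values)


def astype_with_warning(values, dtype):
    """Cast a list element-wise, warning only for string targets with missing values."""
    if _is_string_target(dtype) and _has_missing(values):
        warnings.warn(
            "Missing values will be preserved when casting to str in a future version.",
            FutureWarning,
            stacklevel=2,
        )
    caster = str if _is_string_target(dtype) else dtype
    return [caster(v) for v in values]


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning, even repeated ones
    as_float = astype_with_warning([1.0, float("nan")], float)  # no warning expected
    as_text = astype_with_warning(["a", float("nan")], str)     # one FutureWarning

print([w.category.__name__ for w in caught])
```

Scoping the check this way would keep casts such as `astype(float)` silent, at the cost of one extra dtype test per call.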

6 changes: 4 additions & 2 deletions pandas/core/internals/blocks.py
@@ -532,8 +532,10 @@ def f(mask, val, idx):
return self.split_and_operate(None, f, False)

def astype(self, dtype, copy: bool = False, errors: str = "raise"):
"""
Coerce to the new dtype.
return self._astype(dtype, copy=copy, errors=errors)

def _astype(self, dtype, copy=False, errors="raise"):
"""Coerce to the new type

Parameters
----------

8 changes: 4 additions & 4 deletions pandas/tests/frame/test_dtypes.py
@@ -580,13 +580,13 @@ def test_astype_str(self):
tm.assert_frame_equal(result, expected)

def test_astype_str_float(self):
# see gh-11302
# GH 25353
result = DataFrame([np.NaN]).astype(str)
expected = DataFrame(["nan"])

Review comment (Contributor):
@jreback do you recall, was the intent of this test that np.nan be converted to the string 'nan'?

Reply (Contributor):
IIRC wanted to match numpy


expected = DataFrame([np.nan], dtype=object)
tm.assert_frame_equal(result, expected)
result = DataFrame([1.12345678901234567890]).astype(str)

# see gh-11302
result = DataFrame([1.12345678901234567890]).astype(str)
# < 1.14 truncates
# >= 1.14 preserves the full repr
val = "1.12345678901" if _np_version_under1p14 else "1.1234567890123457"
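
The version-dependent expectation above comes down to how many digits the float-to-string conversion keeps. As a point of comparison only, Python's own `str` on a float produces the shortest string that parses back to the same value, which can be checked directly. The snippet is plain Python, not NumPy or pandas, and the fixed 11-decimal format is just an illustrative stand-in for the older truncating behaviour.

```python
value = 1.12345678901234567890  # same literal as in the test above

text = str(value)             # shortest string that round-trips to the same float
truncated = f"{value:.11f}"   # fixed 11 decimals, an illustrative truncation

print(text)
print(truncated)
print(float(text) == value, float(truncated) == value)
```

A truncated string no longer converts back to the original float, which is why the expected value in the test has to depend on the NumPy version.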

6 changes: 4 additions & 2 deletions pandas/tests/reductions/test_reductions.py
@@ -1102,7 +1102,10 @@ def test_mode_numerical_nan(self, dropna, expected):

@pytest.mark.parametrize(
"dropna, expected1, expected2, expected3",
[(True, ["b"], ["bar"], ["nan"]), (False, ["b"], [np.nan], ["nan"])],
[
(True, ["b"], ["bar"], Series(["bar"])),
(False, ["b"], [np.nan], Series([np.nan], dtype=object)),
],
)
def test_mode_str_obj(self, dropna, expected1, expected2, expected3):
# Test string and object types.
@@ -1124,7 +1127,6 @@ def test_mode_str_obj(self, dropna, expected1, expected2, expected3):

s = Series(data, dtype=object).astype(str)
result = s.mode(dropna)
expected3 = Series(expected3, dtype=str)
tm.assert_series_equal(result, expected3)

@pytest.mark.parametrize(

25 changes: 17 additions & 8 deletions pandas/tests/series/test_dtypes.py
@@ -119,15 +119,11 @@ def test_astype_datetime64tz(self):
tm.assert_series_equal(result, expected)

@pytest.mark.parametrize("dtype", [str, np.str_])
@pytest.mark.parametrize(
"series",
[
Series([string.digits * 10, tm.rands(63), tm.rands(64), tm.rands(1000)]),
Series([string.digits * 10, tm.rands(63), tm.rands(64), np.nan, 1.0]),
],
)
def test_astype_str_map(self, dtype, series):
def test_astype_str_map(self, dtype):
# see gh-4405
series = Series(
[string.digits * 10, tm.rands(63), tm.rands(64), tm.rands(1000)]
)
result = series.astype(dtype)
expected = series.map(str)
tm.assert_series_equal(result, expected)
@@ -152,6 +148,19 @@ def test_astype_str_cast(self):
expected = Series([str("1 days 00:00:00.000000000")])
tm.assert_series_equal(s, expected)

def test_astype_str(self):
# GH 25353
ser = Series([1, "a", np.nan])
result = ser.astype(str)
expected = Series(["1", "a", np.nan])
tm.assert_series_equal(result, expected)

def test_deprecate_astype_str(self):
# GH 25353
ser = Series([1, "a", np.nan])
with tm.assert_produces_warning(expected_warning=FutureWarning):
ser.astype(str)
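
`tm.assert_produces_warning` is a helper from pandas' own testing module. Outside the pandas test suite, a similar check can be approximated with the standard library. The sketch below assumes a hypothetical `legacy_cast` function as the code under test and a hypothetical `expect_warning` context manager; neither exists in pandas.

```python
import warnings
from contextlib import contextmanager


@contextmanager
def expect_warning(category):
    """Fail unless the wrapped block emits at least one warning of `category`."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure nothing is filtered out
        yield caught
    if not any(issubclass(w.category, category) for w in caught):
        raise AssertionError(f"expected a {category.__name__}, got {caught!r}")


def legacy_cast(values):
    """Hypothetical function under test: warns, then casts everything to str."""
    warnings.warn("behaviour will change", FutureWarning, stacklevel=2)
    return [str(v) for v in values]


with expect_warning(FutureWarning) as seen:
    result = legacy_cast([1, "a"])

print(result, len(seen))
```

In a pytest-based suite, `pytest.warns(FutureWarning)` serves the same purpose without a custom helper.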

def test_astype_unicode(self):
# see gh-7758: A bit of magic is required to set
# default encoding to utf-8