BUG: fillna('') on a Int64 column causes TypeError: <U3 cannot be converted to an IntegerDtype #44289

Closed
3 tasks done
the21st opened this issue Nov 2, 2021 · 3 comments
Labels: API - Consistency (Internal Consistency of API/Behavior), Bug, Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate), NA - MaskedArrays (Related to pd.NA and nullable extension arrays)

the21st commented Nov 2, 2021

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

import pandas as pd
import numpy as np

df = pd.DataFrame({"A": [1, 2, np.nan], "B": [4, np.nan, 8]}, dtype="Int64")
df.fillna('nan')

Issue Description

The code above causes the following error:

TypeError                                 Traceback (most recent call last)
<ipython-input-2-fcfc27adad85> in <module>
      2 import numpy as np
      3 df = pd.DataFrame({"A": [1, 2, np.nan], "B": [4, np.nan, 8]}, dtype="Int64")
----> 4 df.fillna('nan')

~/venv/lib/python3.8/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
    309                     stacklevel=stacklevel,
    310                 )
--> 311             return func(*args, **kwargs)
    312 
    313         return wrapper

~/venv/lib/python3.8/site-packages/pandas/core/frame.py in fillna(self, value, method, axis, inplace, limit, downcast)
   5174         downcast=None,
   5175     ) -> DataFrame | None:
-> 5176         return super().fillna(
   5177             value=value,
   5178             method=method,

~/venv/lib/python3.8/site-packages/pandas/core/generic.py in fillna(self, value, method, axis, inplace, limit, downcast)
   6380 
   6381             elif not is_list_like(value):
-> 6382                 new_data = self._mgr.fillna(
   6383                     value=value, limit=limit, inplace=inplace, downcast=downcast
   6384                 )

~/venv/lib/python3.8/site-packages/pandas/core/internals/managers.py in fillna(self, value, limit, inplace, downcast)
    408 
    409     def fillna(self: T, value, limit, inplace: bool, downcast) -> T:
--> 410         return self.apply(
    411             "fillna", value=value, limit=limit, inplace=inplace, downcast=downcast
    412         )

~/venv/lib/python3.8/site-packages/pandas/core/internals/managers.py in apply(self, f, align_keys, ignore_failures, **kwargs)
    325                     applied = b.apply(f, **kwargs)
    326                 else:
--> 327                     applied = getattr(b, f)(**kwargs)
    328             except (TypeError, NotImplementedError):
    329                 if not ignore_failures:

~/venv/lib/python3.8/site-packages/pandas/core/internals/blocks.py in fillna(self, value, limit, inplace, downcast)
   1570         self, value, limit=None, inplace: bool = False, downcast=None
   1571     ) -> list[Block]:
-> 1572         values = self.values.fillna(value=value, limit=limit)
   1573         return [self.make_block_same_class(values=values)]
   1574 

~/venv/lib/python3.8/site-packages/pandas/core/arrays/masked.py in fillna(self, value, method, limit)
    174                 # fill with value
    175                 new_values = self.copy()
--> 176                 new_values[mask] = value
    177         else:
    178             new_values = self.copy()

~/venv/lib/python3.8/site-packages/pandas/core/arrays/masked.py in __setitem__(self, key, value)
    186         if _is_scalar:
    187             value = [value]
--> 188         value, mask = self._coerce_to_array(value)
    189 
    190         if _is_scalar:

~/venv/lib/python3.8/site-packages/pandas/core/arrays/integer.py in _coerce_to_array(self, value)
    332 
    333     def _coerce_to_array(self, value) -> tuple[np.ndarray, np.ndarray]:
--> 334         return coerce_to_array(value, dtype=self.dtype)
    335 
    336     def astype(self, dtype, copy: bool = True) -> ArrayLike:

~/venv/lib/python3.8/site-packages/pandas/core/arrays/integer.py in coerce_to_array(values, dtype, mask, copy)
    202 
    203     elif not (is_integer_dtype(values) or is_float_dtype(values)):
--> 204         raise TypeError(f"{values.dtype} cannot be converted to an IntegerDtype")
    205 
    206     if mask is None:

TypeError: <U3 cannot be converted to an IntegerDtype
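
The last frames of the traceback suggest that the failure happens when the masked array assigns the fill value into its own storage, rather than in DataFrame.fillna itself. The following is a minimal sketch of that step at the array level, assuming a pandas version that ships nullable integer arrays. The exception class and message differ between versions (TypeError on 1.3.4 above, ValueError reported on master later in this thread), so the sketch catches both; the variable names are illustrative only.

```python
import pandas as pd

# Nullable integer array with one missing value (stored as pd.NA).
arr = pd.array([1, 2, None], dtype="Int64")
mask = arr.isna()  # boolean array marking the missing slots

try:
    # Same kind of assignment as `new_values[mask] = value` in the traceback.
    arr[mask] = "nan"
    outcome = "assigned"
except (TypeError, ValueError) as exc:
    # The exception class depends on the pandas version.
    outcome = type(exc).__name__

print(outcome)
```

Since an Int64 array cannot change its dtype in place, the assignment has nowhere to put a string, which is consistent with the error surfacing from `__setitem__`.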

Expected Behavior

df.fillna('nan') should return a new DataFrame in which the missing values are replaced by the string 'nan'.
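
One possible way to get the expected result today is to give up the nullable integer dtype explicitly before filling. This is a sketch of a workaround, not something proposed in this thread, and it assumes that object-dtype columns are acceptable for the result:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, np.nan], "B": [4, np.nan, 8]}, dtype="Int64")

# Casting to object first lets each column hold integers and strings
# side by side; the Int64 dtype is lost in the process.
filled = df.astype(object).fillna("nan")

print(filled)
print(filled.dtypes)
```

The trade-off is that the result no longer benefits from the nullable integer dtype, so the cast is probably best done only at the point where the string placeholder is actually needed (for example, just before display or export).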

Installed Versions

INSTALLED VERSIONS

commit : 945c9ed
python : 3.8.12.final.0
python-bits : 64
OS : Linux
OS-release : 5.10.47-linuxkit
Version : #1 SMP Sat Jul 3 21:51:47 UTC 2021
machine : x86_64
processor :
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.3.4
numpy : 1.19.5
pytz : 2021.3
dateutil : 2.8.2
pip : 21.2.4
setuptools : 58.2.0
Cython : None
pytest : 6.2.5
hypothesis : None
sphinx : 4.2.0
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.6.3
html5lib : None
pymysql : 1.0.2
psycopg2 : 2.9.1 (dt dec pq3 ext lo64)
jinja2 : 3.0.2
IPython : 7.28.0
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.4.3
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 5.0.0
pyxlsb : None
s3fs : None
scipy : 1.7.1
sqlalchemy : 1.4.25
tables : None
tabulate : 0.8.9
xarray : None
xlrd : None
xlwt : None
numba : None

the21st added the Bug and Needs Triage (Issue that has not been reviewed by a pandas team member) labels Nov 2, 2021
the21st changed the title from "BUG: fillna('') causes TypeError: <U3 cannot be converted to an IntegerDtype" to "BUG: fillna('') on a Int64 column causes TypeError: <U3 cannot be converted to an IntegerDtype" Nov 2, 2021
the21st changed the title from "BUG: fillna('') on a Int64 column causes TypeError: <U3 cannot be converted to an IntegerDtype" to "BUG: fillna('') on a Int64 column causes TypeError: <U3 cannot be converted to an IntegerDtype" Nov 2, 2021
mroeschke added the Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate) and NA - MaskedArrays (Related to pd.NA and nullable extension arrays) labels and removed the Needs Triage (Issue that has not been reviewed by a pandas team member) label Nov 6, 2021
@gabrieldi95
Contributor

On master I get the error ValueError: invalid literal for int() with base 10: 'nan', but is this really a bug? I think this is expected: if you set the dtype to Int64, you shouldn't call fillna with a string.
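
For contrast, filling with a value that the column can hold keeps the nullable integer dtype. A short sketch, where -1 is a hypothetical sentinel chosen only for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, np.nan], "B": [4, np.nan, 8]}, dtype="Int64")

# An integer fill value is compatible with Int64, so no cast is needed.
filled = df.fillna(-1)

print(filled.dtypes)  # both columns remain Int64
```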

Author

the21st commented Nov 19, 2021

> On master I get the error ValueError: invalid literal for int() with base 10: 'nan', but is this really a bug? I think this is expected: if you set the dtype to Int64, you shouldn't call fillna with a string.

I would expect pandas to be consistent across types, and Int64 is not consistent with the case when you don't specify a dtype:

import numpy as np
import pandas as pd
df = pd.DataFrame({"A": [1, 2, np.nan], "B": [4, np.nan, 8]})  # column A is float64
df.fillna('nan')  # column A becomes object: it holds both floats and strings
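
The inconsistency described here can be checked side by side. The helper below is a hypothetical convenience written for this comparison (it is not a pandas API); it reports either the dtype that fillna produces or the name of the exception it raises, since both outcomes depend on the installed pandas version:

```python
import numpy as np
import pandas as pd


def describe_fill(series, value):
    """Return the dtype after fillna, or the name of the exception raised."""
    try:
        return str(series.fillna(value).dtype)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__


data = [1, 2, np.nan]
report = {
    "float64": describe_fill(pd.Series(data), "nan"),
    "Int64": describe_fill(pd.Series(data, dtype="Int64"), "nan"),
}
print(report)
```

On the versions discussed in this thread, the float64 column is upcast to object while the Int64 column raises; running the sketch shows whether that still holds for a given installation.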

Member

phofl commented Dec 17, 2022

This is similar to #45729 and related discussions. Closing here to keep the discussion in one place. As of right now, this is expected behavior.

@phofl phofl closed this as completed Dec 17, 2022