[ x ] I have checked that this issue has not already been reported.
[ x ] I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
import pandas as pd
import numpy as np

# Fillna without inplace - everything works OK.
df = pd.DataFrame({"a": [1, 5, 7, 3, 7], "b": ["a", "b", "c", "d", np.nan]})
df["b"] = df["b"].astype("category")
df["b"] = df["b"].fillna("a")
df = df.sort_values(by="a")
print(df)

# As above, but the fillna() is done inplace
df = pd.DataFrame({"a": [1, 5, 7, 3, 7], "b": ["a", "b", "c", "d", np.nan]})
df["b"] = df["b"].astype("category")
df["b"].fillna("a", inplace=True)
# df looks normal here - the np.nan has been replaced
print(df)
df = df.sort_values(by="a")
# Crash
Traceback:
Traceback (most recent call last):
  File "test.py", line 17, in <module>
    df = df.sort_values(by="a")
  File "/home/jbanorthwest.co.uk/danielevans/venvs/farmcat3/lib64/python3.6/site-packages/pandas/core/frame.py", line 5301, in sort_values
    indexer, axis=self._get_block_manager_axis(axis), verify=False
  File "/home/jbanorthwest.co.uk/danielevans/venvs/farmcat3/lib64/python3.6/site-packages/pandas/core/internals/managers.py", line 1415, in take
    new_axis=new_labels, indexer=indexer, axis=axis, allow_dups=True
  File "/home/jbanorthwest.co.uk/danielevans/venvs/farmcat3/lib64/python3.6/site-packages/pandas/core/internals/managers.py", line 1259, in reindex_indexer
    for blk in self.blocks
  File "/home/jbanorthwest.co.uk/danielevans/venvs/farmcat3/lib64/python3.6/site-packages/pandas/core/internals/managers.py", line 1259, in <listcomp>
    for blk in self.blocks
  File "/home/jbanorthwest.co.uk/danielevans/venvs/farmcat3/lib64/python3.6/site-packages/pandas/core/internals/blocks.py", line 1720, in take_nd
    new_values = self.values.take(indexer, fill_value=fill_value, allow_fill=True)
  File "/home/jbanorthwest.co.uk/danielevans/venvs/farmcat3/lib64/python3.6/site-packages/pandas/core/series.py", line 829, in take
    nv.validate_take(tuple(), kwargs)
  File "/home/jbanorthwest.co.uk/danielevans/venvs/farmcat3/lib64/python3.6/site-packages/pandas/compat/numpy/function.py", line 68, in __call__
    validate_kwargs(fname, kwargs, self.defaults)
  File "/home/jbanorthwest.co.uk/danielevans/venvs/farmcat3/lib64/python3.6/site-packages/pandas/util/_validators.py", line 148, in validate_kwargs
    _check_for_invalid_keys(fname, kwargs, compat_args)
  File "/home/jbanorthwest.co.uk/danielevans/venvs/farmcat3/lib64/python3.6/site-packages/pandas/util/_validators.py", line 122, in _check_for_invalid_keys
    raise TypeError(f"{fname}() got an unexpected keyword argument '{bad_arg}'")
TypeError: take() got an unexpected keyword argument 'allow_fill'
Problem description
If a dataframe contains a categorical column, df["col"].fillna(..., inplace=True) is called on that column, and the dataframe is subsequently sorted by a different column, Pandas will crash due to an internal error.
This error only occurs in Pandas 1.1.0/1.1.1 (latest at time of writing); Pandas 1.0.5 and below behave correctly.
Briefly poking around the internals of Pandas, the difference seems to be that self.values.take in take_nd() is operating on a pandas.core.series.Series object in the latest Pandas (i.e. self.values is a Series), but was formerly working with a pandas.core.arrays.categorical.Categorical object in 1.0.5 and below. As the traceback states, Series does not support allow_fill in its take() function, while most other implementations of take() in Pandas do allow it.
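The difference between the two take() implementations can be seen in isolation, without going through sort_values. This is only a minimal sketch built on the public Categorical.take and Series.take methods; the variable names are made up for illustration, and the exact error text may vary between Pandas versions.

```python
import pandas as pd

# A small categorical to experiment with (illustrative data only).
cat = pd.Categorical(["a", "b", "c"])

# Categorical.take accepts allow_fill; with allow_fill=True,
# an index of -1 marks a slot to be filled with a missing value.
taken = cat.take([0, -1], allow_fill=True)

# Series.take validates its extra keyword arguments and rejects allow_fill,
# which is the TypeError seen at the bottom of the traceback.
series_error = None
try:
    pd.Series(cat).take([0, 1], allow_fill=True)
except TypeError as exc:
    series_error = str(exc)

print(taken)
print(series_error)
```

If self.values ends up holding a Series rather than a Categorical, the call in take_nd() would hit the second case, which is consistent with the traceback.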
Expected Output
The two versions of the code both run, producing the sorted, na-filled output dataframe:
   a  b
0  1  a
3  3  d
1  5  b
2  7  c
4  7  a
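For anyone hitting this on an affected version, the non-inplace form from the first half of the code sample seems to work as a workaround. The sketch below wraps it in a helper; fill_and_sort is a hypothetical name, not part of Pandas, and it assumes the fill value is already one of the column's categories.

```python
import numpy as np
import pandas as pd


def fill_and_sort(df, column, fill_value, by):
    """Hypothetical helper: fill NaNs by assignment (not inplace), then sort."""
    out = df.copy()
    # Assigning the result back replaces the column with the filled values,
    # instead of mutating the existing column in place.
    out[column] = out[column].fillna(fill_value)
    return out.sort_values(by=by)


df = pd.DataFrame({"a": [1, 5, 7, 3, 7], "b": ["a", "b", "c", "d", np.nan]})
df["b"] = df["b"].astype("category")

result = fill_and_sort(df, column="b", fill_value="a", by="a")
print(result)
```

The column stays categorical after the fill, so nothing else in the code sample needs to change.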
Output of pd.show_versions()
pd.show_versions()
INSTALLED VERSIONS
commit : d9fff27
python : 3.6.8.final.0
python-bits : 64
OS : Linux
OS-release : 3.10.0-1127.13.1.el7.x86_64
Version : #1 SMP Tue Jun 23 10:32:27 CDT 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8
pandas : 1.1.0
numpy : 1.19.1
pytz : 2020.1
dateutil : 2.8.1
pip : 20.2.2
setuptools : 45.2.0
Cython : 0.29.14
pytest : 5.4.3
hypothesis : 5.16.0
sphinx : 2.4.0
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.8.4 (dt dec pq3 ext)
jinja2 : 2.11.2
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : 0.6.2
fastparquet : None
gcsfs : None
matplotlib : 3.1.2
numexpr : 2.7.1
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.15.1
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.2.1
sqlalchemy : None
tables : 3.5.2
tabulate : 0.8.6
xarray : None
xlrd : 1.2.0
xlwt : None
numba : 0.51.0
On staring at the backlog again (now knowing better what the bug is), I think this is the same underlying issue as #35731. In that case, it was fixed yesterday in Master.
Edit: Yes, there's a reproduction there with the same fillna example.