BUG: where(...) unexpectedly casts None to np.NA for series with pd.Int64Dtype #34536
Comments
If you convert it to object dtype first, None is preserved:

>>> s.astype(object).where(pd.notnull(s), None)
0       1
1       2
2       3
3    None
dtype: object

Personally, I might find it good behaviour that it actually by default tries to preserve the dtype. But it's certainly inconsistent with how it works for other dtypes.
The question is a bit: what values are considered "valid" for the current dtype, and what not? Because for example for the numpy integer dtype, we also coerce the
And only when the value cannot be interpreted as an int, it gets upcasted to float or object dtype. Now, for the nullable integer dtype, we actually raise an error if it doesn't fit in integers:
(so in that light coercing None to NA certainly makes sense)
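The distinction drawn in the comment above can be sketched roughly as follows. This is an illustrative example, not the code from the original comment: the Series contents, the mask, and the replacement value 10.5 are assumptions chosen for the demonstration, and the exact exception raised for the nullable dtype may differ between pandas versions.

```python
import pandas as pd

# Hypothetical mask: replace only the middle element.
cond = pd.Series([True, False, True])

# NumPy-backed int64: a replacement that cannot be held as an integer
# upcasts the whole Series.
s_np = pd.Series([1, 2, 3])
upcast = s_np.where(cond, 10.5)
print(upcast.dtype)  # float64

# Nullable Int64: None counts as a valid missing value for the dtype,
# so it is stored as pd.NA and the dtype is preserved.
s_ea = pd.Series([1, 2, 3], dtype="Int64")
none_result = s_ea.where(cond, None)
print(none_result.dtype)  # Int64

# A value that does not fit in integers is reported in this thread to
# raise instead of upcasting; record what happens without assuming it.
try:
    s_ea.where(cond, 10.5)
    outcome = "no error"
except (TypeError, ValueError) as exc:
    outcome = type(exc).__name__
print(outcome)
```

Seen this way, None is simply one of the values the nullable dtype can hold, which is why no upcast is triggered.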
Related
@jorisvandenbossche is right, the solution here is to cast to object dtype
Closing as expected behavior, masked dtypes don't upcast when setting incompatible values
I think this is worth keeping open, as the lack of consistency with other dtypes is a pretty common complaint.
I'd rather open an issue about documenting this? Was planning on doing this anyway after the rc for 2.0 is out
The issue isn't the docs, it's the desired behavior. I'd be open to changing/deprecating the non-nullable behavior to not cast, but think we should be consistent throughout.
Problem description
When trying to replace all pd.NA values with None using where(...) on a pd.Series with dtype pd.Int64Dtype, None values are automatically coerced to pd.NA in a way inconsistent with other dtypes. Additionally, the try_cast flag doesn't control this behavior.

Expected Output
I expect that when try_cast=False, None will not be coerced into pd.NA, and the dtype of the series will instead be cast to object to accommodate the newly introduced None value, as occurs for other dtypes.

Example
In the following:
the None I'm trying to replace pd.NA with is automatically cast back into pd.NA. This occurs even though try_cast is set to False by default, which I would expect to control this casting behavior.

This is in contrast to the behavior of other dtypes, like numpy's float dtype:
which outputs the series with the dtype coerced to object after incorporating None:

Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.8.2.final.0
python-bits : 64
OS : Darwin
OS-release : 18.7.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : en_US.UTF-8
pandas : 1.0.4
numpy : 1.18.4
pytz : 2020.1
dateutil : 2.8.1
pip : 19.2.3
setuptools : 41.2.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None
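
For readers following along, here is a minimal sketch of the behaviour reported in the problem description, together with the object-dtype workaround suggested in the comments. It is not the original poster's code (that snippet did not survive in this copy of the issue); the Series contents are an assumption. The try_cast argument is deliberately left out, since it is not needed to show the coercion.

```python
import pandas as pd

# Hypothetical nullable-integer Series with one missing value.
s = pd.Series([1, 2, 3, None], dtype="Int64")

# Reported behaviour: the None passed to where() is coerced back to
# pd.NA and the dtype stays Int64.
kept = s.where(s.notna(), None)
print(kept.dtype)              # Int64
print(kept.iloc[3] is pd.NA)   # True

# Workaround from the comments: cast to object before calling where(),
# so that None can be stored as-is.
as_obj = s.astype(object).where(s.notna(), None)
print(as_obj.dtype)            # object
print(as_obj.iloc[3] is None)  # True
```

The float comparison from the issue is left out on purpose: how a NumPy float Series handles None in where() is exactly the version-dependent behaviour being debated in the comments.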