BUG: Replacing pd.NA by None has no effect #45601
Comments
I tried to investigate this issue. In the updated method, there is a new comment: Line 6597 in 20d0964
So there has been a change in response to issue #36984. Further investigation is needed to determine whether this change is working as intended. |
Change in behavior is from merged PR #45081 |
I did further debugging. I found that PR #45081 is not what's causing this issue. The issue is from a change in a different method: I did a basic test with your reproducible example; using the |
take |
I investigated this issue thoroughly. I considered sending a pull request at first, but now I'm starting to think this might be intended behavior after all. I need help in determining whether or not this is really intended behavior. My explanation is below.

First, here's a way to make the Reproducible Example work on pandas 1.4.0 and above (also tested on the main branch):

import pandas as pd
df = pd.DataFrame({"value": [42, None]}).astype({"value": "Int64"})
assert df.astype(object).replace({pd.NA: None}).to_dict() == {"value": {0: 42, 1: None}}

The addition here is explicit casting of To further explain: When using data types in pandas, It is worth noting, however, that if the value is not explicitly cast initially then the implicit conversion to

import pandas as pd
import numpy as np
df = pd.DataFrame({"value": [42, None]})
assert df.replace({np.nan: None}).to_dict() == {"value": {0: 42, 1: None}}

Although this requires using NumPy's |
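The explicit-cast workaround from the comment above can be wrapped in a small helper. This is only a sketch: the name `na_to_none` is made up for illustration, and it assumes it is acceptable for every column to end up as object dtype.

```python
import pandas as pd


def na_to_none(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with pd.NA swapped for None."""
    # Casting to object first gives every column a dtype that can hold None.
    as_object = df.astype(object)
    # With object dtype in place, the replacement from the comment above applies.
    return as_object.replace({pd.NA: None})


df = pd.DataFrame({"value": [42, None]}).astype({"value": "Int64"})
result = na_to_none(df)
```

The original `df` is left untouched, so the nullable Int64 column is still available for numeric work.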
@roib20 Thanks for your thorough investigation and your explanation of how the code can be made compatible with pandas 1.4.0! I'm also interested to know if this new behavior is intended and will be kept as it is. |
It turns out that it is not yet fully decided if this should be intended behavior or not. On a different but related issue I received this comment:
For now my recommendation would be to be explicit with the types you want to use with I did however write a small fix for this bug that makes the behavior more consistent with 1.3.5. I wrote this commit 0dd5a1d that appears to solve this issue and #45729, however I am waiting for more input on the desired behavior before making a pull request with more tests. |
@roib20 thanks for the investigation! I also wanted to highlight the solution / workaround that you already suggested as well to actually get
This way you ensure that you have a data type that can hold any value exactly as you want it (in this case But so the underlying question, as you mentioned above, is indeed: should operations like
Small note on this: I don't think a potential solution can use Aside: we should maybe add a |
@jorisvandenbossche the problem with explicit casting is that in more complex dataframes, the whole dataframe will become of type "object", not just "where necessary". The usual reason to replace NaN / NaT (at least for my use case) is to get valid JSON after Now using Obviously we could loop through the columns, analyze "does this contain

A very crude example for this would be the following:

pd.DataFrame([[pd.NaT, 5], [pd.NaT, 3]]).replace({pd.np.NaN:None})

The dtypes after replacement are as follows:
Notice that only the necessary types have changed. using Maybe a |
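One way to read the per-column idea above is to cast only the columns that actually contain missing values and leave the rest alone. The sketch below is not pandas API: `none_where_missing` is a made-up name, it assumes unique column names, and it rebuilds each affected column in plain Python, so it sidesteps `replace` entirely.

```python
import pandas as pd


def none_where_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: put None only into columns that contain missing values."""
    out = df.copy()
    for name in out.columns:
        col = out[name]
        missing = col.isna()
        if not missing.any():
            continue  # nothing to replace, so the original dtype is kept
        # Rebuild the column as object dtype so that None is stored as-is.
        values = [None if is_missing else value for value, is_missing in zip(col, missing)]
        out[name] = pd.Series(values, index=col.index, dtype=object)
    return out


df = pd.DataFrame(
    {
        "count": pd.array([42, None], dtype="Int64"),
        "ratio": [1.5, 2.5],
    }
)
result = none_where_missing(df)
```

Here `count` ends up as object dtype with None in the second row, while `ratio` keeps its float dtype because it has nothing to replace.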
When replacing with None, and you actually want to have None in the values, then object dtype is the only dtype that can hold Writing this, another aside: we should maybe add a keyword to |
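To illustrate the point about object dtype: the same Python list ends up with a different missing-value marker depending on the dtype it is stored in. The variable names below are only for illustration.

```python
import pandas as pd

data = [42, None]

as_default = pd.Series(data)                  # inferred as float64; None becomes NaN
as_nullable = pd.Series(data, dtype="Int64")  # nullable integer; None becomes pd.NA
as_object = pd.Series(data, dtype=object)     # object dtype keeps None itself

# Show each dtype next to the value stored in the second position.
for series in (as_default, as_nullable, as_object):
    print(series.dtype, repr(series.iloc[1]))
```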
The best way forward here is maybe to be more "strict" about inferring the type from the replaced values. So if the replaced values have |
can confirm first bad commit: [9cd1c6f] BUG: nullable dtypes not preserved in Series.replace (#44940) |
mentioned in #45836, one option would be to check if both |
moving to 1.4.2 |
I've opened a PR that reverts to the 1.3.5 behavior for now. Splitting the block so that only the columns where values are replaced are cast to object can be done as a follow-up on the main branch. |
Should we upgrade to the latest pandas version to get this fix? I think I'm also getting the issue with pandas 2.2.2. Any advice for initializing a DataFrame for an integer dtype that avoids this? When assigning to a row a list containing None values with I'd probably just switch pd.NA to be my only allowed null value in integer DataFrames in my code, departing from the usual Python syntax and functions that seamlessly club together the various null types. as I understand, |
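On the question of mixing null types, a small sketch may help, assuming the nullable `Int64` dtype is the one wanted. It checks how different null markers are stored in a nullable integer array and how `pd.isna` treats them. The variable names are only for illustration.

```python
import numpy as np
import pandas as pd

# None and np.nan are both accepted on input to a nullable integer array ...
arr = pd.array([1, None, np.nan], dtype="Int64")

# ... and both are stored as pd.NA.
stored_as_na = [value is pd.NA for value in arr]

# pd.isna gives the same answer for each of the usual null markers.
all_null = all(pd.isna(marker) for marker in (None, np.nan, pd.NA, pd.NaT))
```

So inside an `Int64` column pd.NA is effectively the only null value, whichever marker was used when the data was written.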
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
Since version 1.4.0, .replace({pd.NA: None}) has no effect: pd.NA is no longer replaced by None.
Expected Behavior
The assertion in the example shown above is fulfilled in version 1.3.5.
Installed Versions
INSTALLED VERSIONS
commit : bb1f651
python : 3.10.1.final.0
python-bits : 64
OS : Linux
OS-release : 5.15.13-arch1-1
Version : #1 SMP PREEMPT Wed, 05 Jan 2022 16:20:59 +0000
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : de_DE.UTF-8
pandas : 1.4.0
numpy : 1.21.5
pytz : 2021.3
dateutil : 2.8.2
pip : 21.2.4
setuptools : 58.1.0
Cython : None
pytest : 6.2.5
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.0.3
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.5.1
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : 1.4.29
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None