BUG: dataframe.replace({np.nan: None}) failed when replaced even number of times #44485

wchengit · 2021-11-16T15:36:52Z

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

import numpy as np
import pandas as pd


def check_nan(df):
    for entry in df.to_dict(orient='records'):
        if any([isinstance(v, float) and np.isnan(v) for v in entry.values()]):
            return True

    return False


def generate_df():
    return pd.DataFrame(dict(age=[5, 6, np.NaN],
                             born=[pd.NaT, pd.Timestamp('1939-05-27'),
                                   pd.Timestamp('1940-04-25')],
                             name=['Alfred', 'Batman', ''],
                             toy=[None, 'Batmobile', 'Joker']))


def where_not_working():
    df = generate_df()
    df = df.where((pd.notnull(df)), None)
    print('where not working', check_nan(df))

def replace_working():
    df = generate_df()
    df = df.replace({np.nan: None})
    print('replace working', check_nan(df))

def replace_twice_not_working():
    df = generate_df()
    df = df.replace({np.nan: None})
    df = df.replace({np.nan: None})
    print('replace twice not working', check_nan(df))

def replace_odd_num_working():
    df = generate_df()
    df = df.replace({np.nan: None})
    df = df.replace({np.nan: None})
    df = df.replace({np.nan: None})
    print('replace three times working', check_nan(df))


where_not_working()
replace_working()
replace_twice_not_working()
replace_odd_num_working()

Issue Description

As shown in the example above:
dataframe.replace({np.nan: None}) works when it's applied to a dataframe once, 3 times, but not working (or "reverse" the change) when run twice, 4 times, etc.

Expected Behavior

It should consistently have the result dataframe containing no np.nan, doesn't matter how many replace attempts have been made.

Installed Versions

INSTALLED VERSIONS

commit : 945c9ed
python : 3.8.12.final.0
python-bits : 64
OS : Linux
OS-release : 4.18.0-240.1.1.el8_3.x86_64
Version : #1 SMP Fri Oct 16 13:36:46 EDT 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : None
LOCALE : en_US.UTF-8

pandas : 1.3.4
numpy : 1.21.2
pytz : 2021.3
dateutil : 2.8.2
pip : 21.2.4
setuptools : 58.0.4
Cython : None
pytest : 6.2.4
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.6.3
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.0.2
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : 1.3.2
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : 2.7.3
odfpy : None
openpyxl : 3.0.9
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.7.1
sqlalchemy : 1.4.22
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None

The text was updated successfully, but these errors were encountered:

mroeschke · 2021-11-21T19:57:19Z

Simplifying the example, agreed that it would be nice It would be better behavior if In[7] was a no-op

In [1]: df = pd.DataFrame(dict(age=[5, 6, np.NaN],
   ...:                              born=[pd.NaT, pd.Timestamp('1939-05-27'),
   ...:                                    pd.Timestamp('1940-04-25')],
   ...:                              name=['Alfred', 'Batman', ''],
   ...:                              toy=[None, 'Batmobile', 'Joker']))

In [3]: df = df.replace({np.nan: None})

In [5]: df
Out[5]:
    age                 born    name        toy
0   5.0                 None  Alfred       None
1   6.0  1939-05-27 00:00:00  Batman  Batmobile
2  None  1940-04-25 00:00:00              Joker

In [7]: df = df.replace({np.nan: None})

In [8]: df
Out[8]:
   age       born    name        toy
0  5.0        NaT  Alfred       None
1  6.0 1939-05-27  Batman  Batmobile
2  NaN 1940-04-25              Joker

wchengit · 2021-11-21T21:59:33Z

Thanks for looking into it. The issue is more than df.replace({np.nan: None}) not working. It acutally has an odd behavior that for the same dataframe, if replaced by odd times, the result is correct. If replaced by even times, it's back the the original, with NaT and NaN.

benlindsay · 2021-11-29T22:29:58Z

I just ran into this problem today, which took a while to track down. Glad it's not just me being affected by this

Runningwater23 · 2021-12-08T03:38:30Z

Perhaps this has to do with the pandas feature and is not a problem.
In Python, NaN is of the float type, and None is of the NoneType type.
For the data frame column age of the float type that contains NaN, the type of the age column changes from float to object after the df.replace({np.nan: None}) command is executed successfully.
When the command is executed again, pandas automatically identifies the type in the age column, determines that the type is float, and fills the value of None with NaN by default.
The preceding is the reason why NaN is replaced with None for odd times and then is replaced with NaN again for even times.

eloyfelix · 2022-01-13T16:53:58Z

This a bit annoying if the df = df.replace({np.nan: None}) needs to be run a second time because new columns have been added to the dataframe, example:

In [1]: import pandas as pd
   ...: import numpy as np
   ...: import random

In [2]: df = pd.DataFrame(dict(age=[5, 6, np.NaN],
   ...:                        born=[pd.NaT, pd.Timestamp('1939-05-27'), pd.Timestamp('1940-04-25')],
   ...:                        name=['Alfred', 'Batman', ''],
   ...:                        toy=[None, 'Batmobile', 'Joker']))

In [3]: df = df.replace({np.nan: None})
   ...: df
Out[3]:
    age                 born    name        toy
0   5.0                 None  Alfred       None
1   6.0  1939-05-27 00:00:00  Batman  Batmobile
2  None  1940-04-25 00:00:00              Joker

New columns added with apply:

In [4]: def do_stuff(age):
   ...:     return age
   ...:
   ...: df['new_col'] = df['age'].apply(do_stuff)

In [5]: df
Out[5]:
    age                 born    name        toy  new_col
0   5.0                 None  Alfred       None      5.0
1   6.0  1939-05-27 00:00:00  Batman  Batmobile      6.0
2  None  1940-04-25 00:00:00              Joker      NaN

running df = df.replace({np.nan: None}) the second time flips the original None to NaN while changing the new NaNs to None so the values can't ever be in sync

In [6]: df = df.replace({np.nan: None})
   ...: df
Out[6]:
   age       born    name        toy new_col
0  5.0        NaT  Alfred       None     5.0
1  6.0 1939-05-27  Batman  Batmobile     6.0
2  NaN 1940-04-25              Joker    None

running it a third time flips eveything again:

In [7]: df = df.replace({np.nan: None})
   ...: df
Out[7]:
    age                 born    name        toy  new_col
0   5.0                 None  Alfred       None      5.0
1   6.0  1939-05-27 00:00:00  Batman  Batmobile      6.0
2  None  1940-04-25 00:00:00              Joker      NaN

frcode · 2022-03-15T10:50:26Z

I found that, in this df = df.where(pd.notnull(df), None) pattern, if the nan in columns is type of 'float' then the pattern works, it the nan is type of numpy.float64 then it not works.
I just want to replace all the nan (what ever the type) to None to insert this df to database.

wchengit added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Nov 16, 2021

mroeschke added Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate replace replace method and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Nov 21, 2021

eloyfelix mentioned this issue Jan 12, 2022

pd.where OverflowError with large numbers #31687

Closed

jorisvandenbossche mentioned this issue Nov 12, 2024

BUG?: using None as replacement value in replace() typically upcasts to object dtype #60284

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: dataframe.replace({np.nan: None}) failed when replaced even number of times #44485

BUG: dataframe.replace({np.nan: None}) failed when replaced even number of times #44485

wchengit commented Nov 16, 2021 •

edited

Loading

mroeschke commented Nov 21, 2021

wchengit commented Nov 21, 2021

benlindsay commented Nov 29, 2021

Runningwater23 commented Dec 8, 2021 •

edited

Loading

eloyfelix commented Jan 13, 2022 •

edited

Loading

frcode commented Mar 15, 2022

BUG: dataframe.replace({np.nan: None}) failed when replaced even number of times #44485

BUG: dataframe.replace({np.nan: None}) failed when replaced even number of times #44485

Comments

wchengit commented Nov 16, 2021 • edited Loading

Reproducible Example

Issue Description

Expected Behavior

Installed Versions

INSTALLED VERSIONS

mroeschke commented Nov 21, 2021

wchengit commented Nov 21, 2021

benlindsay commented Nov 29, 2021

Runningwater23 commented Dec 8, 2021 • edited Loading

eloyfelix commented Jan 13, 2022 • edited Loading

frcode commented Mar 15, 2022

wchengit commented Nov 16, 2021 •

edited

Loading

Runningwater23 commented Dec 8, 2021 •

edited

Loading

eloyfelix commented Jan 13, 2022 •

edited

Loading