pd.DataFrame.replace regression causes dtype to remain object #26632

Kjili · 2019-06-03T14:34:31Z

Code Sample, a copy-pastable example if possible

import pandas as pd

# broken after pandas 0.23.4 when only "a" actually occurs in the data
def return_replace(initial):
    return initial.replace({"a": 1.0, "b": 0.0})

# working: the mapping contains a single key
def return_replace_just_one(initial):
    return initial.replace({"a": 1.0})

# the following should all be float64
print("problem:", return_replace(pd.DataFrame(["a"])).dtypes[0])
print("works:", return_replace(pd.DataFrame(["b"])).dtypes[0])
print("works:", return_replace_just_one(pd.DataFrame(["a"])).dtypes[0])
print("works:", return_replace(pd.DataFrame(["a", "b"])).dtypes[0])

Problem description

The behaviour shown above is inconsistent and hard to spot. In my case, it broke one of my tests because of mismatching types (I was expecting float64).
This appears to be a regression introduced after pandas 0.23.4: versions 0.24.0, 0.24.1 and 0.24.2 all show the same issue.
Restoring the old behaviour, i.e. converting the dtype in the failing first case as well (whenever a conversion is possible), would be more consistent and would not require setting the type manually.

This is probably related to #23305.
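
Until the behaviour is consistent again, the manual type definition mentioned above can be made explicit. The following is only a sketch of such a workaround, not a fix for the underlying bug: the helper name `replace_as_float` is hypothetical, and `dtype=object` is passed to the constructor (an assumption) so that the starting dtype does not depend on the installed pandas version.

```python
import pandas as pd


def replace_as_float(frame, mapping):
    """Replace values, then cast explicitly instead of relying on inference.

    Hypothetical helper: astype raises ValueError if any value left after
    the replacement cannot be converted to float.
    """
    return frame.replace(mapping).astype("float64")


# dtype=object is an assumption made to pin the starting dtype.
df = pd.DataFrame(["a"], dtype=object)
result = replace_as_float(df, {"a": 1.0, "b": 0.0})
print(result.dtypes.iloc[0])  # float64, because of the explicit cast
```

The explicit cast trades convenience for predictability: the result no longer depends on how `replace` infers the dtype, but every remaining value must be convertible to float.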

Expected Output

problem: float64
works: float64
works: float64
works: float64

Output of `pd.show_versions()`

Note: the output below is from the working version of pandas (0.23.4)!

INSTALLED VERSIONS

commit: None
python: 3.7.3.final.0
python-bits: 64
OS: Linux
OS-release: 5.1.6-arch1-1-ARCH
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8
LOCALE: de_DE.UTF-8

pandas: 0.23.4
pytest: 4.4.2
pip: 19.1.1
setuptools: 41.0.1
Cython: None
numpy: 1.16.4
scipy: 1.3.0
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.8.0
pytz: 2019.1
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.1.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

jschendel · 2019-06-03T19:17:50Z

Please provide a reproducible example with sample data: https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports

Kjili · 2019-06-04T09:39:03Z

The code above shows the inconsistency of the behaviour and should be reproducible (it works in my Python console). For a truly minimal example of only the problematic part, you can copy and paste the following:

import pandas as pd
df = pd.DataFrame(["a"])
print(df.replace({"a": 1.0, "b": 0.0}).dtypes[0])

This should print float64.

TomAugspurger · 2019-06-04T11:46:49Z

I suspect this is the same root cause as #26632.

qtux · 2019-08-26T18:31:48Z

This bug is still present as of version 0.25.0. Bisecting with Kjili's minimal working example revealed commit 720d263 to be the one that introduced this bug.
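
A bisect of this kind can be automated with `git bisect run`, which treats exit code 0 as "good" and 1 as "bad". The script below is a sketch of such a check built on the minimal example from this thread; it is not the script that was actually used. The module name `check_replace` and the function names are hypothetical, and pandas would have to be rebuilt at each bisect step for the check to exercise the checked-out commit.

```python
import pandas as pd

# Exit codes understood by `git bisect run`: 0 marks a good commit, 1 a bad one.
GOOD, BAD = 0, 1


def exit_code_for(dtype) -> int:
    """Map the dtype produced by replace to a bisect exit code."""
    return GOOD if str(dtype) == "float64" else BAD


def main() -> int:
    """Run the minimal example and report whether float64 was inferred."""
    # dtype=object is an assumption made to pin the starting dtype.
    df = pd.DataFrame(["a"], dtype=object)
    result = df.replace({"a": 1.0, "b": 0.0})
    return exit_code_for(result.dtypes.iloc[0])
```

Assuming the code is saved as `check_replace.py` in the repository root, it could be driven with `git bisect run python -c "import sys, check_replace; sys.exit(check_replace.main())"` after marking one good and one bad commit.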

TomAugspurger · 2019-08-26T19:36:01Z

Thanks for bisecting.

cc @peterpanmj.

peterpanmj · 2019-08-30T08:13:29Z

I am looking into it.

TomAugspurger · 2019-08-30T13:29:35Z

Thanks!

peterpanmj · 2019-11-01T09:00:00Z

This is indeed a bug.

In [1]: import pandas as pd
   ...: df = pd.DataFrame(["a"])
   ...: print(df.replace({"a": 1.0, "b": 0.0}).dtypes[0])
object
In [2]: import pandas as pd
   ...: df = pd.DataFrame(["a"])
   ...: print(df.replace({"a": 1.0}).dtypes[0])
float64

When there is a second replacer in the dict, the results are different. I've come up with a solution to it in my PR.

jschendel added the Needs Info Clarification about behavior needed to assess issue label Jun 3, 2019

peterpanmj added a commit to peterpanmj/pandas that referenced this issue Oct 16, 2019

BUG:fix replacer's dtype is not respected (pandas-dev#26632)

54b05d8

peterpanmj added a commit to peterpanmj/pandas that referenced this issue Nov 1, 2019

BUG:fix replacer's dtype is not respected (pandas-dev#26632)

b714af9

peterpanmj mentioned this issue Nov 1, 2019

fix bugs cause replacer's dtype not respected #29317

Merged

5 tasks

jreback added Bug Dtype Conversions Unexpected or buggy dtype conversions and removed Needs Info Clarification about behavior needed to assess issue labels Nov 2, 2019

jreback added this to the 1.0 milestone Nov 2, 2019

peterpanmj added a commit to peterpanmj/pandas that referenced this issue Nov 19, 2019

BUG: fix replacer's dtypes not respected for frame replace (pandas-de…

deea374

…v#26632)

peterpanmj added a commit to peterpanmj/pandas that referenced this issue Nov 20, 2019

BUG: fix bugs causes dataframe replace to not respect replacer's dtype (

a2ca9c4

pandas-dev#26632)

peterpanmj added a commit to peterpanmj/pandas that referenced this issue Nov 20, 2019

BUG: fix replacer's dtypes not respected for frame replace (pandas-de…

f9c8d27

…v#26632)

peterpanmj added a commit to peterpanmj/pandas that referenced this issue Nov 20, 2019

BUG: fix replacer's dtypes not respected for frame replace (pandas-de…

eeeae46

…v#26632)

WillAyd closed this as completed in #29317 Nov 20, 2019

WillAyd pushed a commit that referenced this issue Nov 20, 2019

BUG: fix replacer's dtypes not respected for frame replace (#26632) (#…

958756a

…29317)

proost pushed a commit to proost/pandas that referenced this issue Dec 19, 2019

BUG: fix replacer's dtypes not respected for frame replace (pandas-de…

96996b2

…v#26632) (pandas-dev#29317)

proost pushed a commit to proost/pandas that referenced this issue Dec 19, 2019

BUG: fix replacer's dtypes not respected for frame replace (pandas-de…

b39f085

…v#26632) (pandas-dev#29317)
