pd.DataFrame.replace regression causes dtype to remain object #26632
Comments
Please provide a reproducible example with sample data: https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports |
The above shows the inconsistency of the behaviour and should be reproducible (works in my Python console). For a true minimal example of the problematic part only, you can copy&paste the following:
This should print |
I suspect this is the same root cause as #26632. |
This bug is still present as of version 0.25.0. Bisecting with Kjilis minimal working example revealed the commit 720d263 to be the one which introduced this bug. |
Thanks for bisecting. cc @peterpanmj. |
I am looking into it. |
Thanks!
|
This is indeed a bug.
In [1]: import pandas as pd
   ...: df = pd.DataFrame(["a"])
   ...: print(df.replace({"a": 1.0, "b": 0.0}).dtypes[0])
object
In [2]: import pandas as pd
   ...: df = pd.DataFrame(["a"])
   ...: print(df.replace({"a": 1.0}).dtypes[0])
float64
When there is a second replacer in the dict, the results are different. I've come up with a solution in my PR |
Code Sample, a copy-pastable example if possible
Problem description
The behaviour shown above is inconsistent and hard to spot. In my case, it broke one of my tests due to mismatching types (as I was expecting a float64).
The problem seems to involve a regression when upgrading from pandas 0.23.4 to any later version (tested with 0.24.0, 0.24.1 and 0.24.2, all of which have the same issue).
Reverting to the old behaviour, so that the type also changes in the first case that fails above (i.e. changing the type whenever possible?), would be more consistent and would not require a manual type definition.
This is probably related to #23305.
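As a stopgap, the manual type definition mentioned above might look like the sketch below. The frame contents and the explicit `dtype=object` are illustrative assumptions, not taken from the failing test. `astype` and `infer_objects` are general pandas methods used here as a workaround; neither is the fix for this issue.

```python
import pandas as pd

# Illustrative frame; dtype=object is set explicitly (an assumption) so the
# column starts as object regardless of the pandas version's string defaults.
df = pd.DataFrame(["a"], dtype=object)

# Replacement dict with a second key that matches nothing in the frame.
replaced = df.replace({"a": 1.0, "b": 0.0})

# Depending on the pandas version, this may print object or float64.
print(replaced.dtypes.iloc[0])

# Option 1: cast explicitly, so the result does not depend on replace()
# inferring the dtype.
as_float = replaced.astype("float64")
print(as_float.dtypes.iloc[0])

# Option 2: let pandas re-infer a better dtype for object columns.
inferred = replaced.infer_objects()
print(inferred.dtypes.iloc[0])
```

The explicit cast fails loudly if a non-numeric value is left in the column, which is usually what a test wants. `infer_objects` would silently leave such a column as object.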
Expected Output
Output of
pd.show_versions()
Note: The below is for the working version of pandas!
INSTALLED VERSIONS
commit: None
python: 3.7.3.final.0
python-bits: 64
OS: Linux
OS-release: 5.1.6-arch1-1-ARCH
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8
LOCALE: de_DE.UTF-8
pandas: 0.23.4
pytest: 4.4.2
pip: 19.1.1
setuptools: 41.0.1
Cython: None
numpy: 1.16.4
scipy: 1.3.0
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.8.0
pytz: 2019.1
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.1.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None