DataFrame.replace() overwrites when values are non-numeric #16051
Comments
Yeah, I'd consider this more a bug than intended behavior. Deep in the replace code, if pandas/pandas/core/internals.py Line 3271 in 2522efa
Today, when replacing values in a DataFrame with a nested mapping {col: {key: value}}, an error is raised when keys and values overlap. However, that is actually due to this bug, which occurs whether or not the mapping is nested. Wouldn't it be more consistent to raise the error whenever keys and values overlap and the values are non-numeric, instead of only when the mapping is nested? This would make DataFrame replacement with a nested mapping {col: {key: value}} usable wherever a loop could be used, while still raising the error in every place it is necessary.
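For illustration, here is a minimal sketch of what a per-column replacement with overlapping keys and values could look like when each cell is looked up exactly once. The helper name `replace_per_column` is hypothetical and is not part of pandas; it only uses `Series.map` with a dictionary lookup, and it assumes the cell values are hashable.

```python
import pandas as pd


def replace_per_column(df, nested):
    """Apply a {col: {key: value}} mapping in one pass per column.

    Hypothetical helper for illustration only, not pandas' implementation.
    """
    out = df.copy()
    for col, mapping in nested.items():
        # Each cell is looked up once in the mapping, so a value that is
        # also a key is never replaced a second time.
        out[col] = out[col].map(lambda v, m=mapping: m.get(v, v))
    return out


df = pd.DataFrame({"x": ["a", "b"], "y": ["a", "b"]})

# Keys and values overlap: "a" and "b" are swapped in column "x" only.
swapped = replace_per_column(df, {"x": {"a": "b", "b": "a"}})
print(swapped)
```

Because every lookup goes against the original cell value, the swap is well defined even though the keys and values overlap, and columns absent from the mapping are left untouched.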
I don't know if I'm hitting the same bug, but
however,
I have no idea how
Looks like this is fixed on master. Could use a test.
Code Sample, a copy-pastable example if possible
Problem description
I'd expect replacement over values in a dataframe to be non-transitive. Suppose that we would like to replace `a` with `b`, and `b` with `c`. When this replacement is applied to an entry containing the value `a`, the replacement rules are propagated and therefore `c` is returned instead of `b`. The same replacement is not transitive (as shown in the example code) for numeric values.

I think this default behavior should be mentioned explicitly in the documentation. It would also be nice to have a Boolean option to turn the transitivity on or off.
Expected Output
Output of `pd.show_versions()`
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Darwin
OS-release: 16.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
pandas: 0.18.1
nose: 1.3.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.11.1
scipy: 0.18.1
statsmodels: 0.6.1
xarray: None
IPython: 5.1.0
sphinx: 1.4.6
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: 1.1.0
tables: 3.2.3.1
numexpr: 2.6.1
matplotlib: 1.5.3
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.3
lxml: 3.6.4
bs4: 4.5.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.42.0
pandas_datareader: None