BUG: inconsistent replace #35376
Comments
Can confirm this occurs on master as well. |
I'm guessing this has something to do with downcasting? Floats can be downcast to ints, which is why replacing with a float works in both cases, whereas replacing with an int only replaces ints. |
Thanks @asishm for the report. This definitely looks like a bug. From https://pandas.pydata.org/pandas-docs/dev/reference/api/pandas.DataFrame.replace.html
So the definition of "equal" needs clarification. But if we consider pandas' notion of equality, just like Python's, 1 == 1.0.
I would therefore expect, for consistency, the result to be
|
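To illustrate the equality argument above, here is a minimal sketch showing that both plain Python and pandas' elementwise comparison treat 1 and 1.0 as equal, regardless of the column dtype. The variable names are illustrative only, and the snippet deliberately does not call replace, since its output depends on the pandas version.

```python
import pandas as pd

# Plain Python treats an int and an integer-valued float as equal.
assert 1 == 1.0

ints = pd.Series([1, 2])        # int64 column
floats = pd.Series([1.0, 2.0])  # float64 column

# Elementwise comparison matches across dtypes in both directions.
int_matches = (ints == 1.0).tolist()
float_matches = (floats == 1).tolist()
print(int_matches, float_matches)
```

Under this notion of equality, replace(1, 5) and replace(1.0, 5) would be expected to hit the same cells.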
0.25.3 gave the expected output, so marking as a regression for now, pending further investigation into whether the change was intentional
|
The behaviour changed with #27768, so it was not intentional. 01f90c1 is the first bad commit.
|
After a few checks, I've found that these lines are the source of the bug (from @simonjayhawkins):
if not isinstance(to_replace, list):
    if inplace:
        return [self]
    return [self.copy()]
With these lines, I get:
>>> pd.DataFrame([[1,1.0],[2,2.0]]).replace(1.0, 5)
   0    1
0  1  5.0
1  2  2.0
Without these lines, I get:
>>> pd.DataFrame([[1,1.0],[2,2.0]]).replace(1.0, 5)
   0    1
0  5  5.0
1  2  2.0 |
In fact, with the 4 lines in place, I can force the change by passing to_replace as a list:
>>> pd.DataFrame([[1,1.0],[2,2.0]]).replace([1.0], 5)
   0    1
0  5  5.0
1  2  2.0 |
Edit: qlieumontadv was my company account, I will do a PR with my personal account (this one :) |
@jbrockmendel can we remove these lines without affecting the results?
if not isinstance(to_replace, list):
    if inplace:
        return [self]
    return [self.copy()]
I have the impression that they improve performance by skipping some cases, as explained in the comment that was deleted in this commit:
# TODO: we should be able to infer at this point that there is
# nothing to replace |
The line before the ones you quoted is |
After some tests, I may have found the problem: I'm not very satisfied with this fix, I feel it's more of a DIY fix. |
Here are the changes I made: |
|
Added return is_integer(element) or is_float(element) to the IntBlock._can_hold_element method because a Block of ints can be replaced from an int. Read pandas-dev#35376 for more info
As @jbrockmendel said in pandas-dev#35376, you can replace an int by a float in an IntBlock only if the element is an integer.
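The two commit messages above can be combined into one check, sketched here as a standalone function. This is an assumption-laden illustration and not the actual pandas internals: `can_hold_in_int_block` is a hypothetical helper name, and only the public `pandas.api.types.is_integer` and `is_float` functions are used.

```python
from pandas.api.types import is_float, is_integer


def can_hold_in_int_block(element):
    """Hypothetical helper: can `element` match a value in an integer block?

    Ints always can; floats only when they are integer-valued (1.0 but not 1.5).
    """
    if is_integer(element):
        return True
    if is_float(element):
        # float(...) normalises NumPy float scalars before calling is_integer()
        return float(element).is_integer()
    return False


print(can_hold_in_int_block(1), can_hold_in_int_block(1.0), can_hold_in_int_block(1.5))
```

Restricting floats to integer-valued ones keeps the fast path for values such as 1.5, which can never equal an entry of an integer block.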
I've created a PR for this. |
It might be useful to add some unit tests for this. |
This is required as part of any PR. Use the example in the OP as a test. |
Sorry but what is an OP ? 😞 |
It normally means 'original poster'. I actually meant the first post, so use the code sample in #35376 (comment) |
Ok, thx 😄 |
I'm working on the test for the PR and I want to put |
the issue number, i.e. 35376 |
Added the pandas-dev#35376 error as a test.
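A regression test along these lines might look like the sketch below. It is not the test from the PR: the function name is hypothetical, and the expected frame assumes the consistent result argued for earlier in the thread (both the int and the float cell are replaced). It would be expected to fail on the affected versions.

```python
import pandas as pd
import pandas.testing as tm


def test_replace_int_and_float_are_consistent():
    # GH 35376: replacing 1 and replacing 1.0 should hit the same cells
    df = pd.DataFrame([[1, 1.0], [2, 2.0]])
    # Assumed expected output: int column stays int, float column stays float
    expected = pd.DataFrame([[5, 5.0], [2, 2.0]])

    tm.assert_frame_equal(df.replace(1, 5), expected)
    tm.assert_frame_equal(df.replace(1.0, 5), expected)
```

Referencing the issue number in a comment, as suggested above, makes it easy to trace the test back to this report.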
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.
Problem description
Maybe I don't understand something, or this is just nonsense.
Expected Output
Or
Output of
pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.6.9.final.0
python-bits : 64
OS : Linux
OS-release : 5.3.0-62-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.0.5
numpy : 1.19.0
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 49.2.0
Cython : None
pytest : 5.4.3
hypothesis : None
sphinx : 3.1.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.15.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : 0.6.2
lxml.etree : None
matplotlib : 3.2.2
numexpr : 2.7.1
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.17.1
pytables : None
pytest : 5.4.3
pyxlsb : None
s3fs : None
scipy : 1.5.1
sqlalchemy : None
tables : 3.6.1
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : 0.50.1