Replace with nested dict raises for overlapping keys #27696

Merged
merged 10 commits into from
Aug 27, 2019
6 changes: 5 additions & 1 deletion pandas/core/generic.py
@@ -6640,7 +6640,11 @@ def replace(

 for k, v in items:
     keys, values = list(zip(*v.items())) or ([], [])
-    if set(keys) & set(values):
+    # add another check to avoid boolean being regarded
+    # as binary in python set
+    if set(keys) & set(values) and set(map(str, keys)) & set(
+        map(str, values)
Member

I think this might be a little bit too permissive now, as it will allow {0: 1.0, 1: 'a'}, which was previously rejected (might not actually matter but is a change in behavior we should be cognizant of).
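
To make the behavior change concrete, here is a minimal sketch that reproduces only the two overlap conditions from the diff in plain Python. The helper names `old_check` and `new_check` are hypothetical and exist only for this illustration; they are not pandas functions.

```python
def old_check(mapping):
    """Overlap condition before this change: plain set intersection."""
    keys, values = list(zip(*mapping.items())) or ([], [])
    return bool(set(keys) & set(values))


def new_check(mapping):
    """Overlap condition from this diff: also requires the str forms to overlap."""
    keys, values = list(zip(*mapping.items())) or ([], [])
    return bool(
        set(keys) & set(values) and set(map(str, keys)) & set(map(str, values))
    )


# The mapping from the comment above: 1 == 1.0, but "1" != "1.0".
mixed = {0: 1.0, 1: "a"}
print(old_check(mixed), new_check(mixed))  # True False

# The boolean case the change targets: False == 0, but "False" != "0".
booleans = {False: 0, True: 1}
print(old_check(booleans), new_check(booleans))  # True False
```

A return value of `True` corresponds to the branch that raises `ValueError`, so `{0: 1.0, 1: 'a'}` moves from rejected to accepted under the new condition.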

Member Author

Yeah, I should have thought it through more thoroughly.

Member

Can this just not be removed altogether? Not clear on the purpose of it

+    ):
         raise ValueError(
             "Replacement not allowed with "
             "overlapping keys and values"
12 changes: 12 additions & 0 deletions pandas/tests/generic/test_generic.py
@@ -948,3 +948,15 @@ def test_deprecated_get_dtype_counts(self):
         df = DataFrame([1])
         with tm.assert_produces_warning(FutureWarning):
             df.get_dtype_counts()
+
+    def test_boolean_in_replace(self):
+        # GH 27660
+        df = DataFrame({"col": [False, True, 0, 1]})
+
+        result = df.replace({"col": {False: 0, True: 1}})
Member

If you were to switch the replacement to {False: 1, True: 0}, this test would fail because the integers get incorrectly replaced:

In [2]: df = pd.DataFrame({'col': [False, True, 0, 1]})

In [3]: df.replace({'col': {False: 1, True: 0}})
Out[3]: 
   col
0    1
1    0
2    1
3    0

I think the current version of the test is passing because 0/1 are just getting replaced by themselves.

This is a consequence of how Python handles hashing: 0 and False have the same hash and compare equal under == (the check Python uses to confirm a match once the hashes agree), so they are treated as the same element in set operations and as the same key in dict lookups (likewise for 1 and True).
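
The hashing point can be checked directly in plain Python, without pandas. The sketch below also shows one possible way to keep booleans and integers apart by pairing each value with its type; `typed_key` is a hypothetical helper for illustration only and is not what this PR or pandas does.

```python
# bool is a subclass of int, so these pairs hash and compare the same.
assert hash(0) == hash(False) and 0 == False
assert hash(1) == hash(True) and 1 == True

# Four values collapse to two set elements.
print(len({False, True, 0, 1}))  # 2

# Integer keys find the entries stored under the boolean keys.
lookup = {False: 1, True: 0}
print(lookup[0], lookup[1])  # 1 0


def typed_key(value):
    """Hypothetical helper: pair a value with its exact type."""
    return (type(value), value)


# With type-tagged keys, 0 no longer matches False.
typed_lookup = {typed_key(k): v for k, v in lookup.items()}
print(typed_key(False) in typed_lookup)  # True
print(typed_key(0) in typed_lookup)      # False
```

The tuple comparison fails on the first element (`int` versus `bool`), which is why the type-tagged lookup distinguishes the two even though the bare values are equal.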

Member Author

Ehh, you are quite right, it's not a good solution @jschendel

+        expected = DataFrame({"col": [0, 1, 0, 1]})
+        assert_frame_equal(result, expected)
+
+        msg = "Replacement not allowed with overlapping keys and values"
+        with pytest.raises(ValueError, match=msg):
+            df.replace({"col": {0: 1, 1: "a"}})