-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
DataFrame.replace(dict) has weird behaviour in some cases #5338
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
In the case a original value (e.g. 0-9A-F) was part of a color defintion, using df.replace{dict} would do a double replace on the same cell, resulting in colors like "#12345#ABCDEF", which would then throw an error during plotting. Using apply(lambda x: replacements[x]) is probably slower, so should be replaced when the pandas bug is gone. See pandas-dev/pandas#5338 for the bugreport in pandas.
@cpcloud can you look at this? |
yep |
I guess I'm mr. replace now |
:) |
@JanSchulz @jreback As I'm sure you both know df.replace({"color":{"1":"2","2":"3","3":"4","4":"5"}}) is not well-defined. For example, |
What should |
With a non-nested dict like df.replace({'1': '2', '2': '3'}) exactly what I described above is happening |
I think u should raise on a dict that had keys and values overlap however an ordereddict would be ok |
I think if set(d.keys()) & set(d.values()) is not empty then u raise |
ordereddict would still have the same problem, since you'll just end up replace each current value with the next replacement |
suppose they're iterated over in order. then
|
ok then just raise if their is overlap then |
yep good call |
Hm okay it seems to be only happening with strings .. . let me investigate some more |
I would find it interesting if the individual replacements are not done one after another to the whole row but all together so that that Right now (and if that raises),I have to figure out the replacement order in python when I construct a replacement dict (probably via a loop), but I would like to have the code which does that in pandas so that it is fast. |
Yep. I'll check it out tonight. It should work the way you expect, since it works for ints. |
It looks like there's a somewhat subtle bug here. It's only apparent in certain cases ( This "works":
But this is broken (it shouldn't return the same as the previous call to
So, in fact, it doesn't work except in this trivial case, which inadvertently depends on how the I vote for banning this as @jreback suggested above by raising if the keys and values overlap. |
agreed |
i'm kind of down on these nested dict replaces right now 😡 (probably just debugging anger) Is there a use case for them other than convenience? |
@JanSchulz If you want to do this kind of replacement you can do
|
So given a |
That seems like it should be possible I'll investigate tonight. |
@cpcloud This issue is closed, but your last comment indicates that something is still to be done here. |
Any updates on this? Apparently it shouldn't be closed. |
In the case a original value (e.g. 0-9A-F) was part of a color defintion, using df.replace{dict} would do a double replace on the same cell, resulting in colors like "#12345#ABCDEF", which would then throw an error during plotting. Using apply(lambda x: replacements[x]) is probably slower, so should be replaced when the pandas bug is gone. See pandas-dev/pandas#5338 for the bugreport in pandas.
So, my expected behaviour would be:
I found the problem when I tried to match string values to colors and got blown up color values: like
{"3":"#123456","4":"#000000"}
wouldn't convert"3"
into"#123#00000056"
Edit: insert string cell cases and my expected behaviour and deleted the intial comments which had these examples
The text was updated successfully, but these errors were encountered: