Skip to content

Surprising behavior of DataFrame.replace #3582

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
cpcloud opened this issue May 11, 2013 · 21 comments · Fixed by #3675
Closed

Surprising behavior of DataFrame.replace #3582

cpcloud opened this issue May 11, 2013 · 21 comments · Fixed by #3675
Milestone

Comments

@cpcloud
Copy link
Member

cpcloud commented May 11, 2013

This seems a bit surprising:

>>> from string import ascii_lowercase as letters
>>> from pandas import DataFrame
>>> df = DataFrame(list(letters[:4]), columns=['a'])
>>> df
   a
0  a
1  b
2  c
3  d
>>> df.replace({'a': 'b'})
   a
0  a
1  a
2  c
3  d
>>> df.replace({'a': 'c'})
   a
0  a
1  b
2  b
3  d

Does this have to do with padding?

@jreback
Copy link
Contributor

jreback commented May 11, 2013

could be a bug

@cpcloud
Copy link
Member Author

cpcloud commented May 11, 2013

also no doc for the first parameter of this method, will submit a pr for that...

@jreback
Copy link
Contributor

jreback commented May 11, 2013

also if u have time:

I think we need clarification in the main docs on usage of when to use

filter, replace, select, update, lookup

I also don't think there is an example of filter anywhere

@cpcloud
Copy link
Member Author

cpcloud commented May 11, 2013

np, not sure if will be done 2day, trying to work out the regex replace on frames and just wanted to note this strangeness when i was looking at one of the simplest possible cases of replacement in an object block

@cpcloud
Copy link
Member Author

cpcloud commented May 11, 2013

@jreback A quick glance at indexing.rst shows docs for select and lookup (they could be fleshed out a bit), but nothing for the other 3. Will add the 3 and flesh the other 2 out.

@jreback
Copy link
Contributor

jreback commented May 12, 2013

@cpcloud I think replace is somewhere
a complete example

@cpcloud
Copy link
Member Author

cpcloud commented May 12, 2013

ah yes, grin '\.replace' doc shows it

@jankatins
Copy link
Contributor

Wow, this was worrying, but fortunately df.a.replace({'a': 'b'}) and df.replace({"a":{"a":"b"}}) works as intended...

@cpcloud
Copy link
Member Author

cpcloud commented May 12, 2013

@jreback seems the magic is already in _interpolate, I will leave it out in replace though.

@cpcloud
Copy link
Member Author

cpcloud commented May 12, 2013

@jreback The "bug" here is that when you call

df.replace({'a': 'b'})

the column 'a' replaces the value at index 'b' with the previous non-null value because the fill method default is pad. This seems a bit weird. Can fix if u want. I would change to "search the whole frame for occurrences of 'a' and replace with 'b'.

@jreback
Copy link
Contributor

jreback commented May 12, 2013

you want this equivalent to

df.replace('a','b')

?

@cpcloud
Copy link
Member Author

cpcloud commented May 12, 2013

yes, is that ok?

@jreback
Copy link
Contributor

jreback commented May 12, 2013

let me take a look a little later

was always fuzzy on this anyhow

are there tests for this usage anyhow ?
do they break?

@cpcloud
Copy link
Member Author

cpcloud commented May 12, 2013

There are tests. However, the replace method doesn't actually do anything so they don't break. Here's a pared down example from the test_replace_interpolate method of TestDataFrame from pandas/tests/test_frame.py:

# get the test frame
from pandas.tests.test_frame import _tsframe as df
from numpy.testing import assert_array_equal
res = df.replace({'A': nan}, method='pad', axis=1)
assert_array_equal(df, res)  # replace == identity function here

@jreback
Copy link
Contributor

jreback commented May 13, 2013

I think it makes sense to not have automatic interpolation by default

so, if say df.replace({ 'a' : 'b' }) would be translated to df.replace('a','b'), IOW, only allow None for the value if to_replace is a dict/Series (right now this will interpolate)

then make _interpolate an explicit method interpolate. Even though this is an API change, I doubt this is expected behavor

@wesm, @y-p any thoughts?

@cpcloud why don't you put up the PR when you are ready (as this would be an easy change anyhow)

@cpcloud
Copy link
Member Author

cpcloud commented May 13, 2013

okay. kicking tires a bit right now (added 12 new tests so far); adding the regex functionality is proving to be quite a bit more involved than i thought, but i have a much better understanding (although not complete) of DataFrame internals.

@cpcloud
Copy link
Member Author

cpcloud commented May 14, 2013

@jreback @y-p Can I address this issue in the regex pr? I think it will create fewer merge conflicts if I just do this all at once.

@jreback
Copy link
Contributor

jreback commented May 14, 2013

yes, I would

@cpcloud
Copy link
Member Author

cpcloud commented May 14, 2013

just realized the axis parameter in replace never gets used if the call to _interpolate is nixed...remove the parameter?

@jreback
Copy link
Contributor

jreback commented May 17, 2013

@cploud close this too?

@cpcloud
Copy link
Member Author

cpcloud commented May 17, 2013

Not just yet. Need to submit 0.12 pr.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants