Skip to content

BUG: Handle Series arguments in DataFrame.replace, fixes GH2994 #3064

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

dieterv77
Copy link
Contributor

No description provided.

@ghost
Copy link

ghost commented Mar 16, 2013

needs a test case

@ghost
Copy link

ghost commented Mar 16, 2013

should mean() be cast to int if the series has int dtype or vice-versa?

In [30]: from pandas.util.testing import makeCustomDataframe as mkdf
In [26]: df=mkdf(3,2,data_gen_f=lambda *args: np.random.randint(5))

In [27]: df
Out[27]: 
C0       C_l0_g0  C_l0_g1
R0                       
R_l0_g0        1        0
R_l0_g1        3        4
R_l0_g2        0        3

In [28]: df.mean()
Out[28]: 
C0
C_l0_g0    1.333333
C_l0_g1    2.333333
dtype: float64

In [29]: df.replace(3, df.mean())
Out[29]: 
C0       C_l0_g0  C_l0_g1
R0                       
R_l0_g0        1        0
R_l0_g1        1        4
R_l0_g2        0        2

@jreback
Copy link
Contributor

jreback commented Mar 16, 2013

BlockManager handles this kind of upcasting for many other cases (e.g. the Int -> Float blocks), BUT, a casual glance at replace/replace_list in internals/BlockManager shows that this was never updated.

Basically even though an inplace update happens, blocks are ALWAYS returned (which may be different/more than one), so that you have to return the blocks (see for example the eval methods)....

So instead of returning self (which COULD have the old blocks if new ones are created), need to create a new block manager (which you then assign at a higer level), e.g., DataFrame.where does this

    def replace_list(self, src_lst, dest_lst, inplace=False):
        """ do a list replace """
        if not inplace:
            self = self.copy()

        sset = set(src_lst)
        if any([k in sset for k in dest_lst]):
            masks = {}
            for s in src_lst:
                masks[s] = [b.values == s for b in self.blocks]

            THIS NEEDS A CHANGE
            for s, d in zip(src_lst, dest_lst):
                [b.putmask(masks[s][i], d, inplace=True) for i, b in
                 enumerate(self.blocks)]
        else:
            for s, d in zip(src_lst, dest_lst):
                self.replace(s, d, inplace=True)

        return self

Something like (you are doing what the apply method does, you can prob call apply in fact to do this)

   for s, d in zip(src_lst, dest_lst):
      result _blocks= []
      for i, b in enumerate(self.blocks):
         result_blocks.extend(b.putmask(masks[s][i], d, inplace=True))

    bm = self.__class__(result_blocks, self.axes)
    bm._consolidate_inplace()
    return bm

@jreback
Copy link
Contributor

jreback commented Mar 16, 2013

@dieterv77 rebase on master and @y-p test case should work (with your fix)

@jreback
Copy link
Contributor

jreback commented Mar 16, 2013

this is still broken....working on a fix

@dieterv77
Copy link
Contributor Author

@y-p The original commit contains tests for handling Series, though not any related to the upcasting issues. I can add some tests for that, though once that is sorted out.

@jreback
Copy link
Contributor

jreback commented Mar 16, 2013

@dieterv77 update to master, this has been sorted

@dieterv77
Copy link
Contributor Author

I did a rebase on my branch and resolved the conflicts. However, now one of the unittests i added is failing:

df = DataFrame({'zero': {'a': 0.0, 'b': 1.0}, 'one': {'a': 2.0, 'b': 0.0}})
result = df.replace(0.0, {'zero': 0.5, 'one': 1.0})
expected = DataFrame({'zero': {'a': 0.5, 'b': 1}, 'one': {'a': 2.0, 'b': 1.0}})
print result, expected

Is my expectation incorrect? i thought it should make the df['one']['b'] entry equal to 1.0, but instead it is giving 0.5

thanks for all your time on this.

@jreback
Copy link
Contributor

jreback commented Mar 17, 2013

I believe your expected is correct
I think I know where the issue is
I will put a revised up soon

had consolidated a bunch of code
and must have missed this

thxs

@jreback
Copy link
Contributor

jreback commented Mar 17, 2013

@dieterv77 i fixed replace to work with your test case (it was an issue on some sub-filtering in internals.py)

I incorporated your test cases and code changes....just easier that way

pls feel free to put in any more test cases if you'd like

this turned into a rabbit hole .....

@dieterv77
Copy link
Contributor Author

Thanks a lot for all your time on this! And sorry for taking you into this rabbit hole.

@dieterv77 dieterv77 closed this Mar 17, 2013
@dieterv77 dieterv77 deleted the FixGH2994 branch March 17, 2013 17:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants