BUG: Handle Series arguments in DataFrame.replace, fixes GH2994 #3064

dieterv77 · 2013-03-16T02:27:03Z

No description provided.

ghost · 2013-03-16T02:57:39Z

needs a test case

ghost · 2013-03-16T03:06:59Z

Should mean() be cast to int if the Series has int dtype, or vice versa?

In [30]: from pandas.util.testing import makeCustomDataframe as mkdf
In [26]: df=mkdf(3,2,data_gen_f=lambda *args: np.random.randint(5))

In [27]: df
Out[27]: 
C0       C_l0_g0  C_l0_g1
R0                       
R_l0_g0        1        0
R_l0_g1        3        4
R_l0_g2        0        3

In [28]: df.mean()
Out[28]: 
C0
C_l0_g0    1.333333
C_l0_g1    2.333333
dtype: float64

In [29]: df.replace(3, df.mean())
Out[29]: 
C0       C_l0_g0  C_l0_g1
R0                       
R_l0_g0        1        0
R_l0_g1        1        4
R_l0_g2        0        2
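The truncated values in Out[29] (1 and 2 instead of 1.333333 and 2.333333) are what happens when a float is written into an integer array without upcasting: NumPy casts the assigned value to the array's existing dtype. The minimal NumPy sketch below uses the two columns from the session above. It assumes nothing about pandas internals, and both helper names are hypothetical. It contrasts assigning into the int array directly with upcasting to a dtype that can hold the value first:

```python
import numpy as np

# Column values from the session above.
col0 = np.array([1, 3, 0])  # C_l0_g0, mean 4/3
col1 = np.array([0, 4, 3])  # C_l0_g1, mean 7/3


def assign_without_upcast(values, to_replace, value):
    """Hypothetical helper: write into a copy that keeps the int dtype."""
    out = values.copy()
    out[out == to_replace] = value  # NumPy truncates the float to int here
    return out


def replace_with_upcast(values, to_replace, value):
    """Hypothetical helper: upcast to a dtype that can hold `value` first."""
    dtype = np.result_type(values.dtype, np.asarray(value).dtype)
    out = values.astype(dtype)
    out[out == to_replace] = value
    return out


print(assign_without_upcast(col0, 3, col0.mean()))  # int result, like Out[29]
print(replace_with_upcast(col0, 3, col0.mean()))    # float result, mean kept
```

Upcasting changes the column's dtype, which is the trade-off behind the question above.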

jreback · 2013-03-16T04:13:51Z

BlockManager handles this kind of upcasting in many other cases (e.g. int -> float blocks), BUT a casual glance at replace/replace_list on BlockManager in internals.py shows that they were never updated to do so.

Basically, even though the update happens in place, block methods ALWAYS return blocks (which may be different blocks, or more than one), so you have to collect and use the returned blocks (see, for example, the eval methods).

So instead of returning self (which COULD still hold the old blocks if new ones were created), you need to create a new block manager (which you then assign at a higher level); DataFrame.where does this, for example.

    def replace_list(self, src_lst, dest_lst, inplace=False):
        """ do a list replace """
        if not inplace:
            self = self.copy()

        sset = set(src_lst)
        if any([k in sset for k in dest_lst]):
            masks = {}
            for s in src_lst:
                masks[s] = [b.values == s for b in self.blocks]

            # THIS NEEDS A CHANGE: putmask may return new (upcast) blocks,
            # but the blocks it returns are discarded here
            for s, d in zip(src_lst, dest_lst):
                [b.putmask(masks[s][i], d, inplace=True) for i, b in
                 enumerate(self.blocks)]
        else:
            for s, d in zip(src_lst, dest_lst):
                self.replace(s, d, inplace=True)

        return self
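One reading of the `any([k in sset for k in dest_lst])` branch in the replace_list code above is that it guards against chained replacements. When a destination value is also a source value, replacing pair by pair lets a later pair overwrite values an earlier pair just wrote. Computing every mask from the original values first avoids that. The small NumPy sketch below shows the difference; both function names are hypothetical:

```python
import numpy as np


def replace_sequential(values, src_lst, dest_lst):
    """Replace pairs one at a time; later pairs see earlier results."""
    out = values.copy()
    for s, d in zip(src_lst, dest_lst):
        out[out == s] = d
    return out


def replace_precomputed(values, src_lst, dest_lst):
    """Compute every mask from the original values, then apply them."""
    out = values.copy()
    masks = {s: values == s for s in src_lst}
    for s, d in zip(src_lst, dest_lst):
        out[masks[s]] = d
    return out


vals = np.array([1, 2, 3])
# 1 -> 2 and 2 -> 3: sequentially, the freshly written 2 also becomes a 3.
print(replace_sequential(vals, [1, 2], [2, 3]))
print(replace_precomputed(vals, [1, 2], [2, 3]))
```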

Something like this (you are doing what the apply method does; in fact, you can probably just call apply to do this):

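    blocks = self.blocks
    for s, d in zip(src_lst, dest_lst):
        result_blocks = []
        for i, b in enumerate(blocks):
            # putmask may upcast a block, so keep what it returns;
            # assumes one block out per block in, so masks[s][i] still lines up
            result_blocks.extend(b.putmask(masks[s][i], d, inplace=True))
        # the next pair must see the (possibly new) blocks
        blocks = result_blocks

    # build a new manager from the returned blocks instead of returning self
    bm = self.__class__(blocks, self.axes)
    bm._consolidate_inplace()
    return bm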
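To make the "collect the returned blocks" pattern concrete outside of pandas internals, here is a toy sketch. `ToyBlock` and `ToyManager` are hypothetical stand-ins, not pandas classes, and real blocks also track which columns they hold. Each block's `putmask` always returns a list of blocks: either itself, modified in place, or a new upcast block. The manager builds a new instance from whatever comes back instead of returning `self`:

```python
import numpy as np


class ToyBlock:
    """Hypothetical stand-in for a pandas Block: one array of one dtype."""

    def __init__(self, values):
        self.values = values

    def putmask(self, mask, new):
        """Write `new` where `mask` is True and return the resulting blocks.

        Always returns a list: either [self] (modified in place) or a
        single new block whose values were upcast to hold `new`.
        """
        if not mask.any():
            return [self]  # nothing to write, no upcast needed
        dtype = np.result_type(self.values.dtype, np.asarray(new).dtype)
        if dtype != self.values.dtype:
            values = self.values.astype(dtype)  # upcast copy: a new block
            values[mask] = new
            return [ToyBlock(values)]
        self.values[mask] = new  # value fits the dtype: modify in place
        return [self]


class ToyManager:
    """Hypothetical stand-in for BlockManager: just holds a list of blocks."""

    def __init__(self, blocks):
        self.blocks = blocks

    def replace(self, to_replace, value):
        result_blocks = []
        for b in self.blocks:
            mask = b.values == to_replace
            # keep whatever putmask returns; it may no longer be `b`
            result_blocks.extend(b.putmask(mask, value))
        # a new manager, so callers never keep a stale list of old blocks
        return ToyManager(result_blocks)


mgr = ToyManager([ToyBlock(np.array([[1, 3, 0], [0, 4, 3]]))])
new_mgr = mgr.replace(3, 1.5)
print(new_mgr.blocks[0].values.dtype)  # float64
print(mgr.blocks[0].values.dtype)      # still an integer dtype
```

The old manager still holds the original int block. That is why returning `self` after an upcasting putmask would silently drop the replacement.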

jreback · 2013-03-16T15:01:01Z

@dieterv77 rebase on master and @y-p's test case should work (with your fix).

jreback · 2013-03-16T15:33:56Z

This is still broken... working on a fix.

dieterv77 · 2013-03-16T17:38:46Z

@y-p The original commit contains tests for handling Series, though none related to the upcasting issues. I can add some tests for that once it is sorted out.

jreback · 2013-03-16T20:20:48Z

@dieterv77 update to master; this has been sorted out.

dieterv77 · 2013-03-17T00:38:52Z

I rebased my branch and resolved the conflicts. However, one of the unit tests I added is now failing:

from pandas import DataFrame

df = DataFrame({'zero': {'a': 0.0, 'b': 1.0}, 'one': {'a': 2.0, 'b': 0.0}})
result = df.replace(0.0, {'zero': 0.5, 'one': 1.0})
expected = DataFrame({'zero': {'a': 0.5, 'b': 1}, 'one': {'a': 2.0, 'b': 1.0}})
print(result)
print(expected)

Is my expectation incorrect? I thought it should set the df['one']['b'] entry to 1.0, but instead it is 0.5.

Thanks for all your time on this.
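The expected frame follows from reading a dict `value` per column. The scalar 0.0 is replaced by 0.5 in 'zero' and by 1.0 in 'one', and no other column is touched. The minimal sketch below expresses that semantics with plain `Series.mask`, so it can serve as an independent check for the test. The helper `replace_per_column` is hypothetical, not part of pandas:

```python
import pandas as pd


def replace_per_column(df, to_replace, values_by_column):
    """Replace `to_replace` with a column-specific value, column by column."""
    out = df.copy()
    for col, new in values_by_column.items():
        # only this column is touched, and only with this column's value
        out[col] = out[col].mask(out[col] == to_replace, new)
    return out


df = pd.DataFrame({'zero': {'a': 0.0, 'b': 1.0}, 'one': {'a': 2.0, 'b': 0.0}})
expected = pd.DataFrame({'zero': {'a': 0.5, 'b': 1.0},
                         'one': {'a': 2.0, 'b': 1.0}})

result = replace_per_column(df, 0.0, {'zero': 0.5, 'one': 1.0})
pd.testing.assert_frame_equal(result, expected)
```

A result of 0.5 in df['one']['b'] means the 'zero' value leaked into another column, which is the filtering problem discussed next.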

jreback · 2013-03-17T01:07:54Z

I believe your expected result is correct. I think I know where the issue is; I will put up a revised version soon.

I had consolidated a bunch of code and must have missed this.

Thanks.

jreback · 2013-03-17T03:56:29Z

@dieterv77 I fixed replace to work with your test case (it was an issue with some sub-filtering in internals.py).

I incorporated your test cases and code changes... just easier that way.

Please feel free to add any more test cases if you'd like.

This turned into a rabbit hole...

dieterv77 · 2013-03-17T17:00:04Z

Thanks a lot for all your time on this! And sorry for taking you into this rabbit hole.

dieterv77 mentioned this pull request Mar 16, 2013: DataFrame.replace(<scalar>, ...) is not handled #2994 (Closed)

jreback mentioned this pull request Mar 16, 2013: BUG/ENH: guarantee blocks will upcast as needed, and split as needed #3065 (Merged)

jreback mentioned this pull request Mar 16, 2013: BUG: fixes in replace to deal with block upcasting #3068 (Merged)

Commit 8f8b9c7: BUG: Handle Series arguments in DataFrame.replace, fixes GH2994

jreback mentioned this pull request Mar 17, 2013: BUG: replace with a dict misbehaving (GH 3064), due to incorrect filtering #3072 (Merged)

dieterv77 closed this Mar 17, 2013

dieterv77 deleted the FixGH2994 branch March 17, 2013 17:02
