-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Handle Series arguments in DataFrame.replace, fixes GH2994 #3064
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
needs a test case |
should mean() be cast to int if the series has int dtype or vice-versa? In [30]: from pandas.util.testing import makeCustomDataframe as mkdf
In [26]: df=mkdf(3,2,data_gen_f=lambda *args: np.random.randint(5))
In [27]: df
Out[27]:
C0 C_l0_g0 C_l0_g1
R0
R_l0_g0 1 0
R_l0_g1 3 4
R_l0_g2 0 3
In [28]: df.mean()
Out[28]:
C0
C_l0_g0 1.333333
C_l0_g1 2.333333
dtype: float64
In [29]: df.replace(3, df.mean())
Out[29]:
C0 C_l0_g0 C_l0_g1
R0
R_l0_g0 1 0
R_l0_g1 1 4
R_l0_g2 0 2 |
BlockManager handles this kind of upcasting for many other cases (e.g. the Int -> Float blocks), BUT, a casual glance at replace/replace_list in internals/BlockManager shows that this was never updated. Basically even though an inplace update happens, blocks are ALWAYS returned (which may be different/more than one), so that you have to return the blocks (see for example the eval methods).... So instead of returning self (which COULD have the old blocks if new ones are created), need to create a new block manager (which you then assign at a higer level), e.g., DataFrame.where does this
Something like (you are doing what the apply method does, you can prob call apply in fact to do this)
|
@dieterv77 rebase on master and @y-p test case should work (with your fix) |
this is still broken....working on a fix |
@y-p The original commit contains tests for handling Series, though not any related to the upcasting issues. I can add some tests for that, though once that is sorted out. |
@dieterv77 update to master, this has been sorted |
I did a rebase on my branch and resolved the conflicts. However, now one of the unittests i added is failing: df = DataFrame({'zero': {'a': 0.0, 'b': 1.0}, 'one': {'a': 2.0, 'b': 0.0}}) Is my expectation incorrect? i thought it should make the df['one']['b'] entry equal to 1.0, but instead it is giving 0.5 thanks for all your time on this. |
I believe your expected is correct had consolidated a bunch of code thxs |
@dieterv77 i fixed replace to work with your test case (it was an issue on some sub-filtering in internals.py) I incorporated your test cases and code changes....just easier that way pls feel free to put in any more test cases if you'd like this turned into a rabbit hole ..... |
Thanks a lot for all your time on this! And sorry for taking you into this rabbit hole. |
No description provided.