Skip to content

BUG: fix nested dict replace #6365

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Feb 16, 2014
Merged

BUG: fix nested dict replace #6365

merged 2 commits into from
Feb 16, 2014

Conversation

cpcloud
Copy link
Member

@cpcloud cpcloud commented Feb 16, 2014

closes #6342

@jreback
Copy link
Contributor

jreback commented Feb 16, 2014

see u r hacking the biggest hack in internals, putmask
because of the dtype changes that are possible

@cpcloud
Copy link
Member Author

cpcloud commented Feb 16, 2014

yes this is definitely a hack. np.putmask will cycle values after a false is found if the size of the array and replacement values are not equal but it will only use the next value in the replacements if the Trues are contiguous ... highly counterintuitive behavior for that if u ask me ... is there a more optimal way that i'm overlooking?

@cpcloud
Copy link
Member Author

cpcloud commented Feb 16, 2014

internals is kind of fun .... always have to clear out my weekend when a bug involves internals.py 😄

@cpcloud
Copy link
Member Author

cpcloud commented Feb 16, 2014

fyi copyto raises in the case that the sizes are unequal, but only available in numpy >= 1.7

@jreback
Copy link
Contributor

jreback commented Feb 16, 2014

I avoided copyto for that reason

yeh...putmask has a lot of 'special' cases....which I think really are just weird things numpy does

@cpcloud
Copy link
Member Author

cpcloud commented Feb 16, 2014

replace is turning into a god function ... worth opening an issue?

@cpcloud cpcloud self-assigned this Feb 16, 2014
@jreback
Copy link
Contributor

jreback commented Feb 16, 2014

what do you mean?

@cpcloud
Copy link
Member Author

cpcloud commented Feb 16, 2014

It's just really, really large ...

@jreback
Copy link
Contributor

jreback commented Feb 16, 2014

hahah so true!

next project...you can effectively replace update with replace!

@cpcloud
Copy link
Member Author

cpcloud commented Feb 16, 2014

@jreback What do u think about this? I'll run a vbench to make sure this doesn't cause a hit.

@jreback
Copy link
Contributor

jreback commented Feb 16, 2014

sure....looks fine otherwise.....not even sure how often that path is actually hit (can you put a case that hits it up?)

@cpcloud
Copy link
Member Author

cpcloud commented Feb 16, 2014

Nested dicts hit it if they contain keys that aren't in the frame/series

df = DataFrame({'col': range(1, 5)})
df.replace({'col': {-1: 'a', 1: 'b', 2: 'c'}})

@jreback
Copy link
Contributor

jreback commented Feb 16, 2014

ok...then prob ok....not hit a lot...but do a perf check anyhow

@cpcloud
Copy link
Member Author

cpcloud commented Feb 16, 2014

Perf looks good:

packers_write_pack                           |   0.9952 |   0.9129 |   1.0901 |
stats_rank_average                           |  23.8746 |  21.2846 |   1.1217 |
timeseries_asof                              |   7.9971 |   7.0529 |   1.1339 |
join_dataframe_index_single_key_small        |   6.4060 |   5.5654 |   1.1510 |
timeseries_asof_nan                          |   8.0025 |   6.8281 |   1.1720 |
-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------

@cpcloud
Copy link
Member Author

cpcloud commented Feb 16, 2014

Random network failure unrelated to this

https://travis-ci.org/cpcloud/pandas/jobs/18993144

merging

cpcloud added a commit that referenced this pull request Feb 16, 2014
@cpcloud cpcloud merged commit 5e64e88 into pandas-dev:master Feb 16, 2014
@cpcloud cpcloud deleted the replace-nested-dict-6342 branch February 16, 2014 21:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DataFrame.replace() doesn't work correctly when passing a nested dict
2 participants