ENH: add regex functionality to DataFrame.replace #3584
if you are passed a non-compiled regex, e.g.
the difference between
ok, so in order to use a regex I have to pass to_replace=my_regex, and alternatively could make it so:
would be the equivalent of:
?
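For readers following along, the two spellings under discussion can be sketched against pandas as it was later released. The frame and the pattern here are made up for illustration and are not taken from the thread:

```python
import pandas as pd

# Illustrative data only; not from the original discussion.
df = pd.DataFrame({"a": ["foo", "bar", "baz"]})

# Spelling 1: pattern in to_replace, with regex=True switching on regex matching.
via_to_replace = df.replace(to_replace=r"^ba.$", value="new", regex=True)

# Spelling 2: pattern passed directly through the regex keyword.
via_regex_kwarg = df.replace(regex=r"^ba.$", value="new")

print(via_to_replace.equals(via_regex_kwarg))
```

Both calls should leave "foo" untouched and rewrite the two strings matching the pattern.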
I chose to not detect compiled regexes, so those must also have a
@jreback Requested functionality added, along with more tests for it.
possible to change this to 0.11.1 or can I add
@y-p what do you think, 0.11.1 or 0.12?
I take that back. Probably 2 API changes should be left for 0.12. I also need a bit more time to document interpolate because of this.
@jreback @y-p This API might be more drastic than I thought, (exposing
ok how about we add your new functionality
cool. will also allow me to document the relationship between fillna/interpolate/replace more thoroughly.
oh sweetness, got nested dicts of regexes working, i.e.,
df = DataFrame({'a': list(letters[:4]), 'b': list(letters[4:8]), 'c': range(4)})
df.replace({'b': {'.*e.*': nan}}, regex=True)
# or
df.replace(regex={'b': {'.*e.*': nan}})
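A self-contained version of the nested-dict example above, written against pandas as later released and with the imports the snippet leaves implicit. The variable names `nested` and `kwarg` are just labels chosen for this sketch:

```python
import numpy as np
import pandas as pd
from string import ascii_letters as letters

df = pd.DataFrame({"a": list(letters[:4]),
                   "b": list(letters[4:8]),
                   "c": range(4)})

# Outer key selects the column; inner dict maps a regex to its replacement.
nested = df.replace({"b": {".*e.*": np.nan}}, regex=True)

# Same mapping, passed through the regex keyword instead.
kwarg = df.replace(regex={"b": {".*e.*": np.nan}})

print(nested)
```

Only column b is searched, so the single value containing an "e" becomes NaN and the other columns are left alone.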
Pushing even though the next build will fail because of 2 empty tests, just in case someone wants to play with it.
@jreback So far, I'm not seeing the point of having interpolate and replace since they are virtually the same thing. I could see interpolate being a
I agree, though And I would be ok with removing the interpolating is just a form of filling
okay. sounds good. will remove
I've also added an
and forever more you get to be
i hope that is a good thing...fyi goodbye
yep :) so you are going to do a PR for 0.11.1, then all this stuff in a separate one for 0.12?
oh crap...i thought you meant combine them in your previous message :( oh well, git cherry-pick here i come.
although the current state of regex replace gh is solid, the failing tests have to do with interpolate; i will remove them so you can merge and then open a separate PR for the stuff we just discussed.
sorry for the confusion...go ahead and leave that for 0.12 (the interpolate stuff)...that is an API change, while your other stuff is just added functionality...(and seems to be done anyhow)
I am a little iffy on the a string -> string replacement (that is actually a number)?
No to your last question. Here's an example:
from string import ascii_letters as ltrs
a, b, c = list(ltrs[:4]), list(ltrs[4:8]), list(ltrs[:3]) + [4]
df = DataFrame({'a': a, 'b': b, 'c': c})
df.replace(regex={'c': {'[a-c]': nan}}, infer_types=True, inplace=True)
print(df.c.dtype)  # should be float64
(This won't work yet, I forgot to add the
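The `infer_types` keyword in the example above is the parameter proposed in this thread, and the sketch below does not assume it exists. Instead it shows one way to reach the same float64 outcome in pandas as later released, using an explicit `pd.to_numeric` call after the replacement. That conversion step is a substitute chosen for this sketch, not the mechanism discussed here:

```python
import numpy as np
import pandas as pd
from string import ascii_letters as ltrs

a, b, c = list(ltrs[:4]), list(ltrs[4:8]), list(ltrs[:3]) + [4]
df = pd.DataFrame({"a": a, "b": b, "c": c})

# Replace the letters in the mixed column c with NaN; the integer 4 is not a
# string, so the regex leaves it alone.
replaced = df.replace(regex={"c": {"[a-c]": np.nan}})

# Explicit conversion (an assumption of this sketch, standing in for infer_types).
as_float = pd.to_numeric(replaced["c"])
print(as_float.dtype)
```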
I think you can dispense with it, and just do a
Right now there's a call into the blkmgr method
sure ok, will remove the param.
convert_objects: I don't think you need this (though if you wanted to get really fancy, maybe, but it shouldn't be the default)
ok so i will just use
ah %%capture magic. am so going to write a magic to output to gfm...
@jreback will it be annoying if I branch off of this to remove interpolate and then submit a PR based on that branch?
nope...close em and open 2 new PRs, just fine
not sure i follow. close this one, open it again and then open a separate one based on this...? why not just branch off this and open a new PR? sorry if being a little thick here.
sorry....I think we talked about a new PR for the interpolate stuff (for 0.12), so this one is for 0.11.1? just rebase as you need on this one, and make a new one (I guess you are asking if the new one is going to be based off of this one, ok by me)
yup thanks.
add default of None to to_replace
add ability to pass regex as to_replace regex
Remove cruft
more tests and add ability to pass regex and value
Make exceptions more clear; push examples to missing_data.rst
remove interpolation call
make inplace work across axes in interpolate method
ability to use nested dicts for regexs and others
mostly doc updates
formatting
infer_types correction
rls notes
@jreback this is ready to go (modulo travis build passing).
@cpcloud fyi...you don't need the imports in the docs, all of that is imported at the top of each file (unless you need something special)
Ah ok. Thanks. I'm paranoid about impenetrable sphinx errors :) I guess. Will remove.
@cpcloud merged..thanks! I added a link in v0.11.1 to the place in the docs. maybe add a couple of examples for string replacement in v0.11.1? (you can pull them right out of the docs)
I just happened upon this. Does this close #1479? I don't follow closely how interpolation came into a regex PR. The interpolate needs documentation AFAICT. I didn't read through things closely.
@jseabold Doesn't close #1479 (at least in full). There's a As per my conversation with @jreback above we decided that the functionality that would've been provided the would-be
Ah, ok. I did not know about the Block method and that does clear it up. Thanks.
Oh nice! Thanks. Bonus that it is faster too!
addresses #2285. cc @jreback and #3582.