
More tolerant dataframe drop method for multiple columns deletion #5300


Closed
halleygithub opened this issue Oct 23, 2013 · 5 comments · Fixed by #6736
Labels
API Design Indexing Related to indexing on series/frames, not to indexes themselves
Milestone

Comments

@halleygithub

Currently, DataFrame.drop raises an error for the code below, because 'non_exist_in_df_col' is not a column of df:

df = df.drop(['col_1', 'col_2', 'non_exist_in_df_col'], axis=1)

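The helper below is more convenient, because it accepts such names:

from pandas import DataFrame

def drop_cols(df, del_cols):
    # Drop only the requested columns that actually exist in df;
    # names missing from df.columns are silently skipped.
    for col in set(del_cols) & set(df.columns):
        df = df.drop([col], axis=1)
    return df
# Attach the helper so it can be called as df.drop_cols([...])
DataFrame.drop_cols = drop_cols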
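A minimal sketch of the same helper that issues a single drop call instead of one per column, assuming the goal is only to skip missing names. It filters with a list comprehension rather than a set intersection so the caller's column order is kept; the name `drop_cols` is just the one used above, not a pandas API.

```python
import pandas as pd

def drop_cols(df, del_cols):
    # Keep only the requested names that are actually columns of df,
    # in the order the caller listed them.
    existing = [c for c in del_cols if c in df.columns]
    # One drop call for all of them, instead of one per column.
    return df.drop(existing, axis=1)

df = pd.DataFrame({"col_1": [1, 2], "col_2": [3, 4], "col_3": [5, 6]})
result = drop_cols(df, ["col_1", "col_2", "non_exist_in_df_col"])
print(list(result.columns))  # ['col_3']
```

The original df is left untouched; only the returned frame lacks the dropped columns.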

It would also be good to add an 'inplace' option, to avoid (and hopefully speed up) the repeated df self-assignment.

@jreback
Contributor

jreback commented Oct 23, 2013

Not sure this is a good idea, because then you can easily get silent errors when you simply misspell something, e.g.

In [8]: df = DataFrame(randn(10,2),columns=['foo','bar'])

In [9]: df.drop('bah')
ValueError: labels ['bah'] not contained in axis

If drop were silent, this would succeed but be a no-op.
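To make the concern concrete, here is a small sketch contrasting the strict drop with a tolerant, filter-first drop on the same typo. It drops along the columns (axis=1) rather than the rows used in the transcript; the exception type is an assumption hedged in the code, since the transcript shows ValueError while recent pandas versions raise KeyError.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10, 2), columns=["foo", "bar"])

# Strict behaviour: the misspelled label 'bah' is reported immediately.
# (Older pandas raised ValueError here; recent versions raise KeyError.)
try:
    df.drop("bah", axis=1)
    strict_raised = False
except (KeyError, ValueError):
    strict_raised = True

# Tolerant behaviour: the same typo is silently filtered out, so the
# "drop" is a no-op and both columns survive.
tolerant = df.drop(columns=[c for c in ["bah"] if c in df.columns])
print(strict_raised, list(tolerant.columns))  # True ['foo', 'bar']
```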

As for the inplace suggestion, that is being added in 0.14. However, it does not speed anything up;
it just makes the method work in place. Most operations require a copy anyway to avoid data aliasing.

@halleygithub
Author

Or add a "tolerant_option=True|False" parameter to the DataFrame.drop method?
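For readers on a newer pandas: a switch of this kind exists on DataFrame.drop as the `errors` keyword (to my knowledge available since pandas 0.16.1), with `errors="raise"` as the strict default. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({"col_1": [1], "col_2": [2], "col_3": [3]})

# errors="ignore" skips labels that are not present instead of raising;
# the default errors="raise" keeps the strict behaviour.
out = df.drop(["col_1", "col_2", "non_exist_in_df_col"], axis=1, errors="ignore")
print(list(out.columns))  # ['col_3']
```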

@halleygithub
Author

Many times I just want to make sure that certain columns no longer exist in a DataFrame after the drop, regardless of whether they exist in the input df; I just don't want to have to check.
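If the requirement is only the postcondition "none of these columns remain", a boolean mask over the column Index expresses it directly. This is a sketch using `Index.isin`; the helper name `without_columns` is hypothetical. Names that df does not have simply match nothing, and the remaining columns keep their original order.

```python
import pandas as pd

def without_columns(df, cols):
    # Hypothetical helper: keep every column whose name is NOT in cols.
    return df.loc[:, ~df.columns.isin(cols)]

df = pd.DataFrame({"a": [1], "b": [2], "c": [3]})
out = without_columns(df, ["b", "zzz"])
assert not out.columns.isin(["b", "zzz"]).any()  # postcondition holds
print(list(out.columns))  # ['a', 'c']
```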

@jtratner
Contributor

jtratner commented Apr 5, 2014

Given that an index is basically a hash table (so membership tests are cheap), you could wrap it like this for the moment:

cols = ['a', 'b', 'c']
# keep only the labels that actually exist in the index
cols = [c for c in cols if c in df.index]
df = df.drop(cols)

@cpcloud
Member

cpcloud commented Apr 5, 2014

You can also perform set operations with two indices, like df.index - Index(['a','b']) (in later pandas versions this set difference is spelled df.index.difference(Index(['a','b']))).
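The set-difference idea applied to columns, as a sketch using `Index.difference`: names in the second Index that df does not have are ignored, so it is another tolerant way to pick the columns to keep. Note that `difference` sorts its result by default when it can, so for unsorted column names the order may change; the column names here are already sorted to keep the example unambiguous.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# Columns of df minus {"b", "zzz"}; "zzz" is not a column and is ignored.
keep = df.columns.difference(pd.Index(["b", "zzz"]))
out = df[keep]
print(list(out.columns))  # ['a', 'c']
```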

@jreback jreback modified the milestones: 0.14.1, 0.14.0 May 5, 2014
@jreback jreback modified the milestones: 0.15.0, 0.14.1 Jun 5, 2014
@jreback jreback modified the milestones: 0.16.1, 0.16.0 Mar 5, 2015

4 participants