
API: disable to_csv and friends on GroupBy objects #4887


Merged: 1 commit, merged Sep 27, 2013


cpcloud (Member) commented Sep 19, 2013

No description provided.

ghost assigned cpcloud Sep 19, 2013

jreback (Contributor) commented Sep 19, 2013

better than doing a whitelist?

cpcloud (Member, Author) commented Sep 19, 2013

my hunch is that there are many more things to whitelist than blacklist

cpcloud (Member, Author) commented Sep 19, 2013

i could be wrong...

jreback (Contributor) commented Sep 19, 2013

so many things don't make sense

e.g.

abs, apply, shift, tshift, where, mask, align, groupby, clip... practically every method

allowed, of course:

last, first, cum*, mean, sum, min, max, head, tail...

much smaller list
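The whitelist approach can be sketched in plain Python: the grouped object forwards only names on a short allowed list and raises AttributeError for everything else. This is a minimal illustration under stated assumptions, not pandas' actual GroupBy code; `GroupedSketch`, `APPLY_WHITELIST`, `_REDUCERS` and the dict-of-lists group representation are all hypothetical stand-ins.

```python
# Hypothetical whitelist; the real pandas list is longer.
APPLY_WHITELIST = frozenset({"first", "last", "mean", "sum", "min", "max"})


class GroupedSketch:
    """Toy grouped object: maps each group key to a list of numbers."""

    def __init__(self, groups):
        self._groups = groups

    def __getattr__(self, name):
        # __getattr__ only runs when normal attribute lookup fails, so every
        # per-group method request is routed through this check.
        if name not in APPLY_WHITELIST:
            raise AttributeError("%r object has no attribute %r"
                                 % (type(self).__name__, name))
        func = _REDUCERS[name]

        def wrapper():
            # Apply the reducer to each group and collect results by key.
            return {key: func(values) for key, values in self._groups.items()}

        return wrapper


# Per-group implementations of the whitelisted names (hypothetical).
_REDUCERS = {
    "first": lambda v: v[0],
    "last": lambda v: v[-1],
    "mean": lambda v: sum(v) / len(v),
    "sum": sum,
    "min": min,
    "max": max,
}
```

With `g = GroupedSketch({"a": [1, 2, 3], "b": [10, 20]})`, `g.mean()` gives a per-group result, while `g.to_csv` raises AttributeError. A blacklist version would instead have to enumerate every unsafe name, which is the longer list discussed above.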

cpcloud (Member, Author) commented Sep 22, 2013

@jreback merge?

jreback (Contributor) commented Sep 22, 2013

Could your test try out some more methods?
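One way for a test to exercise many methods at once is to check a list of allowed names and a list of blocked names against the grouped object in a single pass. This is a hedged sketch: `check_dispatch` and `DemoGrouped` are hypothetical names, and the method lists are examples taken from this thread, not the test that was actually merged.

```python
def check_dispatch(obj, allowed, blocked):
    """Return a list of names whose availability on ``obj`` is wrong.

    ``allowed`` names must resolve; ``blocked`` names must raise
    AttributeError (in Python 3, hasattr returns False only in that case).
    """
    problems = []
    for name in allowed:
        if not hasattr(obj, name):
            problems.append("missing allowed method: %s" % name)
    for name in blocked:
        if hasattr(obj, name):
            problems.append("blocked method still reachable: %s" % name)
    return problems


class DemoGrouped:
    """Stand-in object exposing only a couple of reductions."""

    def mean(self):
        return None

    def sum(self):
        return None
```

An empty result means every name behaved as expected, so a test can simply assert `check_dispatch(obj, allowed, blocked) == []` and get the full list of offending names in the failure message.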

cpcloud (Member, Author) commented Sep 27, 2013

@jreback Anything else to add here?

                             (type(self).__name__, attr))

    def __getitem__(self, key):
        raise NotImplementedError

    def _make_wrapper(self, name):
        # Only whitelisted names are dispatched; anything else raises AttributeError.
        if name not in _apply_whitelist:
            raise AttributeError("%r object has no attribute %r" %
                                 (type(self).__name__, name))
Contributor review comment:

can you make this a bit more informative, maybe 'This is a groupby object and cannot perform the operation ....'? Try using apply on this object.
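The reviewer's suggestion, an error that names the object type and points the user to apply, could look something like the following. The exact wording that was merged is not shown in this thread, so the message text, `blocked_attribute_error`, `GroupByLike` and the small `_apply_whitelist` subset here are assumptions for illustration.

```python
# Hypothetical subset of the whitelist, for illustration only.
_apply_whitelist = frozenset({"mean", "sum", "head", "tail"})


def blocked_attribute_error(obj, name):
    """Build a more informative AttributeError for a non-whitelisted name."""
    kind = type(obj).__name__
    msg = ("Cannot access callable attribute %r of %r objects, "
           "try using the 'apply' method" % (name, kind))
    return AttributeError(msg)


class GroupByLike:
    """Toy stand-in for a GroupBy object."""

    def _make_wrapper(self, name):
        # Reject anything outside the whitelist with the friendlier message.
        if name not in _apply_whitelist:
            raise blocked_attribute_error(self, name)
        # Placeholder for the real per-group dispatch.
        return lambda: "would dispatch %s to each group" % name
```

Keeping it an AttributeError (rather than, say, a TypeError) preserves the usual `hasattr`/`getattr(obj, name, default)` semantics while still telling the user what to do instead.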

cpcloud (Member, Author) commented Sep 27, 2013

okay, improved the message. Good to go?

jreback (Contributor) commented Sep 27, 2013

looks good

cpcloud added a commit that referenced this pull request Sep 27, 2013
API: disable to_csv and friends on GroupBy objects
cpcloud merged commit 03ac0bf into pandas-dev:master Sep 27, 2013
cpcloud deleted the groupby-blacklist branch September 27, 2013 17:02
jreback (Contributor) commented Sep 30, 2013

closes #4882

gdraps (Contributor) commented Nov 9, 2013

fyi, while testing pandas master, I hit a method that is not in the new groupby dispatch whitelist: value_counts (on a SeriesGroupBy object). Another possible addition: shift, as used in this SO answer.

I tried to generate a list of blocked methods for DataFrame and Series (see below). It needs further filtering, but it may reveal useful methods that are now blocked. Any thoughts on the remaining methods?

In [18]: sorted([x for x in set(dir(pd.DataFrame)) - pd.core.groupby._apply_whitelist - set(dir(pd.core.groupby.GroupBy)) if not x.startswith(('_', 'T', 'to', 'from', 'as'))])
Out[18]: ['abs', 'add', 'add_prefix', 'add_suffix', 'align', 'all', 'any', 'append', 'applymap', 'at', 'at_time', 'axes', 'between_time', 'bfill', 'blocks', 'bool', 'clip', 'clip_lower', 'clip_upper', 'columns', 'combine', 'combineAdd', 'combineMult', 'combine_first', 'compound', 'consolidate', 'convert_objects', 'copy', 'corr', 'corrwith', 'cov', 'delevel', 'diff', 'div', 'divide', 'dot', 'drop', 'drop_duplicates', 'dropna', 'dtypes', 'duplicated', 'empty', 'eq', 'eval', 'ffill', 'filter', 'first_valid_index', 'floordiv', 'ftypes', 'ge', 'get', 'get_dtype_counts', 'get_ftype_counts', 'get_value', 'get_values', 'groupby', 'gt', 'iat', 'icol', 'idxmax', 'idxmin', 'iget_value', 'iloc', 'index', 'info', 'insert', 'interpolate', 'irow', 'isin', 'isnull', 'iteritems', 'iterkv', 'iterrows', 'itertuples', 'ix', 'join', 'keys', 'kurt', 'kurtosis', 'last_valid_index', 'le', 'load', 'loc', 'lookup', 'lt', 'mad', 'mask', 'merge', 'mod', 'mode', 'mul', 'multiply', 'ndim', 'ne', 'notnull', 'pct_change', 'pivot', 'pivot_table', 'pop', 'pow', 'product', 'query', 'radd', 'rdiv', 'reindex', 'reindex_axis', 'reindex_like', 'rename', 'rename_axis', 'reorder_levels', 'replace', 'reset_index', 'rfloordiv', 'rmod', 'rmul', 'rpow', 'rsub', 'rtruediv', 'save', 'select', 'set_index', 'set_value', 'shape', 'shift', 'skew', 'sort', 'sort_index', 'sortlevel', 'squeeze', 'stack', 'sub', 'subtract', 'swapaxes', 'swaplevel', 'take', 'transpose', 'truediv', 'truncate', 'tshift', 'tz_convert', 'tz_localize', 'unstack', 'update', 'values', 'where', 'xs']

In [19]: sorted([x for x in set(dir(pd.Series)) - pd.core.groupby._apply_whitelist - set(dir(pd.core.groupby.GroupBy)) if not x.startswith(('_', 'T', 'to', 'from', 'as'))]) 
Out[19]: ['abs', 'add', 'add_prefix', 'add_suffix', 'align', 'all', 'any', 'append', 'argmax', 'argmin', 'argsort', 'at', 'at_time', 'autocorr', 'axes', 'base', 'between', 'between_time', 'bfill', 'blocks', 'bool', 'clip', 'clip_lower', 'clip_upper', 'combine', 'combine_first', 'compound', 'consolidate', 'convert_objects', 'copy', 'corr', 'cov', 'data', 'diff', 'div', 'divide', 'dot', 'drop', 'drop_duplicates', 'dropna', 'duplicated', 'empty', 'eq', 'ffill', 'filter', 'first_valid_index', 'flags', 'floordiv', 'ftype', 'ge', 'get', 'get_dtype_counts', 'get_ftype_counts', 'get_value', 'get_values', 'groupby', 'gt', 'iat', 'idxmax', 'idxmin', 'iget', 'iget_value', 'iloc', 'imag', 'index', 'interpolate', 'irow', 'is_time_series', 'isin', 'isnull', 'item', 'iteritems', 'iterkv', 'ix', 'keys', 'kurt', 'kurtosis', 'last_valid_index', 'le', 'load', 'loc', 'lt', 'mad', 'map', 'mask', 'mod', 'mode', 'mul', 'multiply', 'ndim', 'ne', 'nonzero', 'notnull', 'nunique', 'order', 'pct_change', 'pop', 'pow', 'product', 'ptp', 'put', 'radd', 'ravel', 'rdiv', 'real', 'reindex', 'reindex_axis', 'reindex_like', 'rename', 'rename_axis', 'reorder_levels', 'repeat', 'replace', 'reset_index', 'reshape', 'rfloordiv', 'rmod', 'rmul', 'round', 'rpow', 'rsub', 'rtruediv', 'save', 'select', 'set_value', 'shape', 'shift', 'skew', 'sort', 'sort_index', 'sortlevel', 'squeeze', 'str', 'strides', 'sub', 'subtract', 'swapaxes', 'swaplevel', 'take', 'transpose', 'truediv', 'truncate', 'tshift', 'tz_convert', 'tz_localize', 'unique', 'unstack', 'update', 'valid', 'value_counts', 'values', 'view', 'weekday', 'where', 'xs']
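The set difference in the session above can be wrapped in a small helper that works on any pair of classes. This is a sketch assuming plain Python classes: `candidate_blocked_methods`, `ToyFrame` and `ToyGrouped` are hypothetical, and the real call would pass `pd.DataFrame`, the GroupBy class and the whitelist as in the session (those internal pandas names may differ in later versions).

```python
def candidate_blocked_methods(source_cls, grouped_cls, whitelist,
                              skip_prefixes=("_", "T", "to", "from", "as")):
    """Names on source_cls that grouped_cls neither whitelists nor defines."""
    names = set(dir(source_cls)) - set(whitelist) - set(dir(grouped_cls))
    # str.startswith accepts a tuple, mirroring the filter used in the session.
    return sorted(n for n in names if not n.startswith(skip_prefixes))


class ToyFrame:
    def mean(self): ...
    def shift(self): ...
    def to_csv(self): ...
    def value_counts(self): ...


class ToyGrouped:
    def apply(self, func): ...
```

Here `candidate_blocked_methods(ToyFrame, ToyGrouped, {"mean"})` keeps `shift` and `value_counts`, the two kinds of method flagged above, while `to_csv` is filtered out by the `"to"` prefix just as in the original one-liner.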

jtratner (Contributor) commented

@gdraps can you move this to a separate issue?

hayd (Contributor) commented Nov 28, 2013

Just to throw a spanner in the works here, I wonder if this is going to break existing code which uses this "feature" (i.e. user adds their own method to DataFrame then uses it on a groupby)...
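The concern can be reproduced with toy classes: a method a user attaches to the frame class was reachable through blanket attribute forwarding, but a whitelist-based wrapper rejects it. A hypothetical sketch only; `Frame`, `ForwardingGrouped`, `WhitelistGrouped` and `my_stat` are illustrative names, not pandas API.

```python
class Frame:
    """Toy frame holding a list of numbers."""

    def __init__(self, values):
        self.values = values

    def mean(self):
        return sum(self.values) / len(self.values)


class ForwardingGrouped:
    """Pre-whitelist behaviour: forward any unknown attribute to every group."""

    def __init__(self, groups):
        self._groups = groups  # dict: key -> Frame

    def __getattr__(self, name):
        def wrapper(*args, **kwargs):
            return {k: getattr(f, name)(*args, **kwargs)
                    for k, f in self._groups.items()}
        return wrapper


class WhitelistGrouped(ForwardingGrouped):
    """Post-whitelist behaviour: only known names are forwarded."""

    whitelist = frozenset({"mean"})

    def __getattr__(self, name):
        if name not in self.whitelist:
            raise AttributeError("%r object has no attribute %r"
                                 % (type(self).__name__, name))
        return super().__getattr__(name)


# A user-defined extension method, monkeypatched onto the frame class.
def my_stat(self):
    return max(self.values) - min(self.values)


Frame.my_stat = my_stat
```

With the forwarding wrapper, `my_stat` works per group; with the whitelist, the same call fails and such code would presumably need to go through an explicit apply-style call on each group instead.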
