Add df.split with predicate function #3066
Comments
Maybe this merits a new API? Essentially "split with predicate function", so we'd do:
I was thinking more of a scalar or list of values for that axis, maybe with interval=open/closed? And don't return empty groups (e.g. if I select the last column). If cols are list('abcdefg'), groups of
Another addition would be to introspect the lambda for its argcount, e.g. df.groupby(lambda prev, curr: curr != prev), and add a
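For illustration, the two-argument predicate idea can be approximated today with public pandas operations. The following is only a sketch: `split_on_change` is a hypothetical helper name (not a pandas API), and it assumes the predicate is applied to consecutive values of a single column.

```python
import pandas as pd


def split_on_change(df, column, predicate=lambda prev, curr: curr != prev):
    """Hypothetical helper: open a new group wherever predicate(prev, curr) is True."""
    values = df[column].tolist()
    if not values:
        return []
    # The first row always opens a group; each later row is compared to its predecessor.
    starts = [True] + [bool(predicate(p, c)) for p, c in zip(values, values[1:])]
    # Cumulative sum of the start markers yields one integer id per run of rows.
    group_ids = pd.Series(starts, index=df.index).cumsum()
    return [chunk for _, chunk in df.groupby(group_ids)]


df = pd.DataFrame({"a": ["A", "A", "B", "B", "A"], "x": range(5)})
parts = split_on_change(df, "a")
```

Here `parts` holds one sub-frame per run of equal values in column `a`. The argcount introspection mentioned above is not attempted; the sketch simply assumes a two-argument callable.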
Implementing that would also answer #414.
I wonder if a general edge binner would be useful here.

def edge_groupby(df, edges):
    edges[0] = True
    edges.iloc[-1] = True
    trues = edges[edges].index.values
    trues[-1] = trues[-1] + 1 # make sure we include last value
    bins = lib.generate_bins_dt64(edges.index, trues, closed='left')
    binlabels = [0] + list(bins[:-1]) # label=left
    grouper = BinGrouper(bins, binlabels)
    return df.groupby(grouper)

grouped = edge_groupby(df, df.a == 'B')

That would take in a bool series where the True values are the edges.
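The same "True values are the edges" behaviour can be sketched using only public pandas calls, without the internal `lib.generate_bins_dt64` and `BinGrouper` pieces used above. This is a swapped-in technique (a cumulative sum over the boolean edges), not the binning approach from the comment, and it assumes each edge row opens a new group (left-closed), with any rows before the first edge forming their own group.

```python
import pandas as pd


def edge_groupby(df, edges):
    """Group consecutive rows; each True in `edges` opens a new group."""
    edges = edges.astype(bool)
    # The running count of edges seen so far serves as the group id for each row.
    group_ids = edges.cumsum()
    return df.groupby(group_ids)


df = pd.DataFrame({"a": list("ABAABA"), "x": range(6)})
grouped = edge_groupby(df, df.a == "B")
sizes = grouped.size().tolist()
```

Unlike the version above, this sketch does not modify the `edges` series passed in, and it places no requirement on the index type.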
Maybe a sliding window with df.groupby_reduce(lambda acc, prev, curr: acc + (prev and prev == 'B')) or df.groupby_reduce(lambda acc, *vs: acc + (vs[0] and vs[0] == 'B'), 2, 'right') with
Closing this in favor of a cleaner implementation for #4059.
xref: http://stackoverflow.com/questions/13353233/best-way-to-split-a-dataframe-given-an-edge/15449992#15449992
Here's one for you, jeff.