Add df.split with predicate function #3066

ghost · 2013-03-16T15:09:42Z

xref: http://stackoverflow.com/questions/13353233/best-way-to-split-a-dataframe-given-an-edge/15449992#15449992

Here's one for you, Jeff.

wesm · 2013-03-16T16:31:01Z

Maybe this merits a new API? Essentially, "split with a predicate function", so we'd do:

df.split(lambda x: x == 'B', axis=0)
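A minimal sketch of what such a split could do today with the public pandas API, assuming the predicate is applied to the values of one column and that each matching row starts a new piece. `split_by_predicate` is a hypothetical helper name, not an existing pandas method; it tags rows with a running count of matches and groups on that count.

```python
import pandas as pd

def split_by_predicate(df, pred, column):
    # Hypothetical helper, not a pandas method: approximates the proposed
    # df.split(pred, axis=0) by applying pred to one column's values.
    starts = df[column].map(pred).astype(bool)  # True where a new piece begins
    piece_ids = starts.cumsum()                  # running count of matches so far
    # Rows before the first match share id 0, so no empty pieces are produced.
    return [piece for _, piece in df.groupby(piece_ids)]

df = pd.DataFrame({"a": ["A", "B", "A", "A", "B", "A"]})
pieces = split_by_predicate(df, lambda x: x == "B", "a")
print([len(p) for p in pieces])  # [1, 3, 2]
```

Here each 'B' row lands at the start of its piece; putting it at the end of the previous piece instead would need a shifted mask, which is the open/closed question raised in the next comment.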

jreback · 2013-03-16T17:49:50Z

I'd think more of a scalar or list of values for that axis,
to allow multiple pieces,
but then how do you control which piece the split value goes into?

Maybe interval=open/closed? And don't return empty groups (e.g. if I select the last column).

If the columns are list('abcdefg'), then

df.split(['a','c'])

would give the groups
a
bc
defg
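A rough sketch of the list-of-labels variant along the columns, assuming unique column labels. `split_at_labels` and its `closed` parameter are hypothetical illustrations of the open/closed idea: with `closed="right"` each split label ends its piece (reproducing a | bc | defg above), with `closed="left"` it starts one. Deduplicating the cut points means no empty pieces come back.

```python
import pandas as pd

def split_at_labels(df, labels, closed="right"):
    # Hypothetical sketch of the proposed df.split([...]) along the columns;
    # assumes unique column labels.
    # closed="right": each split label ends its piece  -> a | bc | defg
    # closed="left":  each split label starts a piece  -> ab | cdefg
    offset = 1 if closed == "right" else 0
    cuts = {df.columns.get_loc(label) + offset for label in labels}
    # Adding both ends and deduplicating means no empty pieces
    # (e.g. splitting at the last column with closed="right").
    bounds = sorted(cuts | {0, len(df.columns)})
    return [df.iloc[:, lo:hi] for lo, hi in zip(bounds[:-1], bounds[1:])]

df = pd.DataFrame([range(7)], columns=list("abcdefg"))
print(["".join(p.columns) for p in split_at_labels(df, ["a", "c"])])  # ['a', 'bc', 'defg']
```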

ghost · 2013-03-16T22:52:18Z

Another addition would be to introspect the lambda for its argcount
and provide a moving window of values:

df.groupby(lambda prev,curr: curr != prev)

and add a win_offset arg to specify nvals before and nvals ahead.
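A minimal sketch of the argcount idea, assuming the window is taken from the lambda's parameter count via `inspect.signature` and that a True result starts a new group at the current row. `window_groupby` is a hypothetical name, and only look-behind windows are covered; the proposed `win_offset` (values ahead) is left out.

```python
import inspect
import pandas as pd

def window_groupby(df, column, func):
    # Hypothetical sketch: the window size is taken from func's parameter
    # count (e.g. lambda prev, curr: ... sees two consecutive values).
    # A True result starts a new group at the current row. Only look-behind
    # windows are covered; the proposed win_offset (values ahead) is not.
    nargs = len(inspect.signature(func).parameters)
    values = df[column].tolist()
    group_ids, gid = [], 0
    for i in range(len(values)):
        window = values[max(0, i - nargs + 1) : i + 1]  # oldest first, current last
        # Row 0 always opens group 0; evaluate only once a full window exists.
        if i > 0 and len(window) == nargs and func(*window):
            gid += 1
        group_ids.append(gid)
    return df.groupby(pd.Series(group_ids, index=df.index))

df = pd.DataFrame({"a": [1, 1, 2, 2, 2, 1]})
runs = window_groupby(df, "a", lambda prev, curr: curr != prev)
print(runs.size().tolist())  # [2, 3, 1]
```

For the two-argument `curr != prev` case this matches the vectorized idiom `df.a.ne(df.a.shift()).cumsum()`, which labels consecutive runs starting from 1.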

ghost · 2013-03-17T15:51:26Z

Implementing that would also address #414.

dalejung · 2013-03-17T17:05:29Z

I wonder if a general edge binner would be useful here.

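import pandas.lib as lib                    # pandas-internal (2013-era module path)
from pandas.core.groupby import BinGrouper  # pandas-internal (2013-era module path)

def edge_groupby(df, edges):
    # edges: boolean Series aligned with df (default integer index assumed);
    # True marks the first row of a new group
    edges = edges.copy()   # avoid mutating the caller's mask
    edges.iloc[0] = True   # positional: the first row always opens a group
    edges.iloc[-1] = True

    trues = edges[edges].index.values
    trues[-1] = trues[-1] + 1  # make sure we include the last value

    bins = lib.generate_bins_dt64(edges.index, trues, closed='left')
    binlabels = [0] + list(bins[:-1])  # label=left
    grouper = BinGrouper(bins, binlabels)
    return df.groupby(grouper)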

grouped = edge_groupby(df, df.a == 'B')

It takes a boolean Series whose True values mark the edges.
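For comparison, a sketch of the same left-closed edge binning using only public NumPy/pandas calls instead of `lib.generate_bins_dt64` and `BinGrouper`. `edge_groupby_public` is a hypothetical name. It assigns each row to the most recent edge at or before it (via `np.searchsorted`) and labels each group by its left edge's index value, so it does not assume an integer index.

```python
import numpy as np
import pandas as pd

def edge_groupby_public(df, edges):
    # Hypothetical public-API variant of edge_groupby: edges is a boolean
    # Series aligned with df; each True row starts a new (left-closed) group,
    # and each group is labelled by the index of its left edge (label=left).
    flags = edges.to_numpy(dtype=bool).copy()
    flags[0] = True                       # the first row always opens a group
    edge_pos = np.flatnonzero(flags)      # positions where groups start
    rows = np.arange(len(flags))
    # Position of the most recent edge at or before each row.
    owner = edge_pos[np.searchsorted(edge_pos, rows, side="right") - 1]
    return df.groupby(df.index[owner], sort=False)

df = pd.DataFrame({"a": list("ABAABA")}, index=[10, 11, 12, 13, 14, 15])
grouped = edge_groupby_public(df, df.a == "B")
print(grouped.size().to_dict())  # {10: 1, 11: 3, 14: 2}
```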

ghost · 2013-03-17T17:36:12Z

Maybe a sliding window with a reduce-style operation?

df.groupby_reduce(lambda acc,prev,curr: acc + (prev and prev == 'B'))

or

df.groupby_reduce(lambda acc,*vs: acc + (vs[0] and vs[0] == 'B'),2,'right')

with acc initialized to 0.
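A minimal sketch of the reduce-style idea, assuming a right-aligned window (current value last) padded with None at the start, and that the accumulator after each row becomes that row's group key. `groupby_reduce` and its `window`/`init` parameters are hypothetical. The example lambda uses `prev == 'B'` rather than `prev and prev == 'B'`: with None padding the latter evaluates to None on the first row, and `0 + None` raises TypeError.

```python
import pandas as pd

def groupby_reduce(df, column, func, window=2, init=0):
    # Hypothetical sketch of the proposed df.groupby_reduce: func(acc, *vals)
    # is called once per row with the current value and the (window - 1)
    # values before it (None-padded at the start); the returned acc
    # becomes that row's group key.
    values = [None] * (window - 1) + df[column].tolist()
    acc, keys = init, []
    for i in range(len(df)):
        acc = func(acc, *values[i : i + window])
        keys.append(acc)
    return df.groupby(pd.Series(keys, index=df.index))

df = pd.DataFrame({"a": list("ABAABA")})
g = groupby_reduce(df, "a", lambda acc, prev, curr: acc + (prev == "B"))
print(g.size().tolist())  # [2, 3, 1]
```

With this lambda each 'B' closes its group (A B | A A B | A), the opposite placement from splitting where 'B' opens a group.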

ghost · 2013-11-22T10:01:18Z

Closing this in favor of a cleaner implementation in #4059.

ghost mentioned this issue Mar 20, 2013

API experiment: lambda grouper based on sliding window #3101

Closed

ghost mentioned this issue Apr 2, 2013

ENH: groupby().apply(f) accepts combine=0 arg, to return results unmolested #3241 #3242

Closed

jreback mentioned this issue Jun 27, 2013

API for splitting pandas objects #4059

Closed

ghost closed this as completed Nov 22, 2013

cpcloud mentioned this issue Jun 7, 2014

Split/Partition Master Issue #7387

Closed
