API experiment: lambda grouper based on sliding window #3101

ghost · 2013-03-20T00:27:52Z

Related: #3066, #414. Looking for an expressive, flexible
way of splitting a frame into multiple frames based on
predicate functions.

Not optimized, just trying it out.

The number of args in the predicate function determines the width
of the sliding window.
The df.groupby() `offset` kwd determines the window's offset
relative to the current element.
The df.groupby() `fill` kwd determines the value passed
to the predicate in place of "phantom" locations when the window
slides past the edge of the array.
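The window rules above can be sketched in plain Python. This is a minimal sketch, not the PR's implementation: `window_at` and `window_width` are hypothetical helpers, the width is read from the predicate's parameter count via `inspect.signature`, and `offset` is assumed to be the position of the current element inside the window. That assumption matches how the examples below behave: `offset=-1` makes the current element the last argument, `offset=1` the second.

```python
import inspect

def window_at(keys, i, width, offset=0, fill=None):
    """Return the `width` keys around position i (hypothetical helper).

    Assumes `offset` is the index of the current element inside the
    window (negative values count from the end, as in Python indexing);
    positions that fall outside `keys` are replaced by `fill`.
    """
    pos = offset % width          # normalise, e.g. -1 -> width - 1
    start = i - pos               # index of the window's first element
    return tuple(
        keys[j] if 0 <= j < len(keys) else fill
        for j in range(start, start + width)
    )

def window_width(pred):
    # Window width = number of parameters the predicate declares.
    return len(inspect.signature(pred).parameters)

keys = ['X0', 'X1', 'B', 'X3']
pred = lambda p, c: p == 'B'
w = window_width(pred)            # 2
print([window_at(keys, i, w, offset=-1) for i in range(len(keys))])
# [(None, 'X0'), ('X0', 'X1'), ('X1', 'B'), ('B', 'X3')]
```

The first window contains a phantom slot (`None` here), which is exactly where `fill` comes into play.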

The predicate function also gets a `_` variable injected into its
context, which is initially 0 and is updated after each call with the value
just returned. That is, `_` holds the group label of the previous item.
By updating the value as we go along we can always compute the
label for the "next" group, somewhat like reduce.
#3066 was prompted by this SO question.
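The accumulator behaviour can be emulated without injecting `_` into the lambda's scope by passing the previous label explicitly. In this sketch the hypothetical `sliding_groupby` calls `pred(prev_label, *window)`; that signature change, and reading `offset` as the current element's position in the window, are assumptions, not the PR's API. The computed labels are handed to the ordinary `DataFrame.groupby`, which accepts an array of group keys.

```python
import inspect

import numpy as np
import pandas as pd

def sliding_groupby(df, pred, offset=0, fill=None):
    """Group consecutive rows with a windowed predicate (hypothetical API).

    pred receives the previous label first (playing the role of `_`),
    then one argument per window slot, drawn from df.index.
    """
    keys = list(df.index)
    width = len(inspect.signature(pred).parameters) - 1  # minus the label arg
    pos = offset % width                  # current element's slot in the window
    label, labels = 0, []                 # `_` starts at 0
    for i in range(len(keys)):
        start = i - pos
        window = [keys[j] if 0 <= j < len(keys) else fill
                  for j in range(start, start + width)]
        label = pred(label, *window)      # like reduce: feed the result back in
        labels.append(label)
    return df.groupby(np.asarray(labels), sort=False)

df = pd.DataFrame(range(4), index=['X0', 'X1', 'B', 'X3'])
g = sliding_groupby(df, lambda _, p, c: _ + int(p == 'B'), offset=-1)
print([list(sub.index) for _, sub in g])
# [['X0', 'X1', 'B'], ['X3']]
```

Passing the label as an argument avoids mutating the function's globals, at the cost of a slightly noisier lambda.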

Note how concise this is with this idiom (much more complex
splitting is easy as well):

In [1]: def prnt_dfs(g):
   ...:    for d in zip(*list(g))[-1]:
   ...:        print d 
   ...:        print ""
   ...: 
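`prnt_dfs` is Python 2 code: `print d` is a statement, and in Python 3 `zip` returns an iterator, so `zip(*list(g))[-1]` raises `TypeError`. A Python 3 equivalent (the name `print_groups` is ours) unpacks the `(key, frame)` pairs that iterating a groupby object yields:

```python
import pandas as pd

def print_groups(grouped):
    """Python 3 take on prnt_dfs: print each sub-frame, then a blank line."""
    for _key, frame in grouped:   # groupby iterates (key, sub-frame) pairs
        print(frame)
        print()

df = pd.DataFrame(range(4), index=['X0', 'X1', 'B', 'X3'])
print_groups(df.groupby(df.index == 'B'))
```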

In [23]: ix = ['A','A','A','B','A','A','B','A','A']
    ...: df = pd.DataFrame(np.random.randn(9),index=ix)
    ...: prnt_dfs(df.groupby(lambda p,c: _+int(p=='B'),offset=-1))
          0
A -0.214422
A  1.373191
A -1.209503
B -0.258765

          0
A  1.014395
A -0.381113
B  0.707706

          0
A -2.038331
A  1.719545

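With released pandas the same split can be had without a windowed predicate: mark the rows that close a group, shift the mask by one so the new group starts on the following row, and take a cumulative sum to get labels. This is a common pandas idiom, not something this PR added; `Series.shift(..., fill_value=...)` needs pandas 0.24 or later.

```python
import numpy as np
import pandas as pd

ix = ['A', 'A', 'A', 'B', 'A', 'A', 'B', 'A', 'A']
df = pd.DataFrame(np.random.randn(9), index=ix)

is_b = pd.Series(df.index == 'B')
# Previous row was 'B' -> start a new group; row 0 has no previous row.
starts = is_b.shift(1, fill_value=False)
labels = starts.astype(int).cumsum().to_numpy()   # [0 0 0 0 1 1 1 2 2]
for _, sub in df.groupby(labels):
    print(sub, end='\n\n')
```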
In [2]: df = pd.DataFrame(range(4),index=['X0','X1','B','X3'])
   ...: 

# start new df if current is B
In [3]: g=(df.groupby(lambda c,n: _+int(c == 'B'))) 
   ...: prnt_dfs(g)
   ...: 
    0
X0  0
X1  1

    0
B   2
X3  3

# start new df if next is B
In [4]: g=(df.groupby(lambda c,n: _+int(n == 'B'))) 
   ...: prnt_dfs(g)
   ...: 
    0
X0  0

    0
X1  1
B   2
X3  3

# start new df if prev was B
In [5]: g=(df.groupby(lambda p,c: _+int(p == 'B'),offset=-1)) 
   ...: prnt_dfs(g)
   ...: 
    0
X0  0
X1  1
B   2

    0
X3  3

# start new df if prevPrev was B
In [25]: g=(df.groupby(lambda pp,p,c: _+int(pp == 'B'),offset=-1)) 
    ...: prnt_dfs(g)
          0
A  0.174714
A -0.761156
A  0.260196
B  0.381065
A -2.015324

          0
A  0.387559
B  0.800947
A -1.007098

          0
A -0.217266

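The width-3 window can also be built literally with NumPy's `sliding_window_view` (NumPy 1.20 or later), padding the front with phantom values in the role of `fill`. A sketch that reproduces the 5/3/1 split above from the index labels alone:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

ix = ['A', 'A', 'A', 'B', 'A', 'A', 'B', 'A', 'A']
is_b = (np.asarray(ix) == 'B').astype(int)

# Pad two phantom slots in front (the PR's `fill`), so row i of the view
# is (pp, p, c) = (is_b[i-2], is_b[i-1], is_b[i]).
padded = np.concatenate([np.zeros(2, dtype=int), is_b])
windows = sliding_window_view(padded, 3)
labels = np.cumsum(windows[:, 0])          # new group when pp was 'B'
print(labels)                              # [0 0 0 0 0 1 1 1 2]
```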


In [38]: df = pd.DataFrame(range(10),index=[1,2,5,3,7,8,1,1,3,1])
    ...: g=(df.groupby(lambda p,c,n,nn: _+int(c+p<=3 or nn >5 ),offset=1,fill=0)) 
    ...: prnt_dfs(g)
   0
1  0

   0
2  1

   0
5  2

   0
3  3
7  4
8  5
1  6

   0
1  7
3  8
1  9
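Because the predicate only compares keys at fixed positions, each window slot can be written as a shifted copy of the index, with `fill_value=0` standing in for `fill=0`. This vectorised sketch (ours, standard pandas only) reproduces the five groups above:

```python
import pandas as pd

df = pd.DataFrame(range(10), index=[1, 2, 5, 3, 7, 8, 1, 1, 3, 1])
c = pd.Series(df.index)            # current key, on a positional RangeIndex
p = c.shift(1, fill_value=0)       # previous key; phantom slot -> 0
nn = c.shift(-2, fill_value=0)     # key two ahead; phantom slots -> 0

new_group = (c + p <= 3) | (nn > 5)
labels = new_group.astype(int).cumsum().to_numpy()
print([sub[0].tolist() for _, sub in df.groupby(labels)])
# [[0], [1], [2], [3, 4, 5, 6], [7, 8, 9]]
```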

The number of args in the function determines the window width. The df.groupby() `offset` kwd determines the window offset relative to the current element. The df.groupby() `fill` kwd determines the value provided for "phantom" locations at the edges. The grouper function gets a `_` variable injected into its context, initialized to 0 and containing the value returned from the last call to the function; it serves as an "accumulator" that you can use to return the previous or a new grouping label. Not optimized, just experimenting with the API.

ghost · 2013-05-05T09:15:58Z

way too funky.

ghost mentioned this pull request Apr 7, 2013

Idea: Frame/series should support grep-like context specification #3269

Closed

ghost closed this May 5, 2013

cpcloud mentioned this pull request Jun 7, 2014

Split/Partition Master Issue #7387

Closed

8 tasks

This pull request was closed.
