API experiment: lambda grouper based on sliding window #3101

Closed

@ghost ghost commented Mar 20, 2013

Related to #3066 and #414. Looking for an expressive, flexible
way of splitting a frame into multiple frames based on
predicate functions.

Not optimized, just trying it out.

  • The number of args in the predicate function determines the width
    of the sliding window
  • The df.groupby() offset kwd determines window offset
    relative to current element.
  • The df.groupby() fill kwd determines the value provided
    to the predicate instead of "phantom" locations when the window
    slides past the edge of the array.
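
A rough sketch of how the three knobs above could interact, in plain Python. The helper name `sliding_windows` is hypothetical and not part of this PR or of pandas, and `fill=None` as the default is a guess. It also assumes that `offset` is the slot of the current element within the argument list, with negative values counting from the end. That reading is consistent with the examples in this description (`lambda p,c` with `offset=-1`, `lambda p,c,n,nn` with `offset=1`), but it is an inference, not something the patch documents.

```python
def sliding_windows(values, width, offset=0, fill=None):
    """Yield one tuple of length `width` per element of `values`.

    Assumption (inferred, not documented by the PR): `offset` is the slot
    of the current element inside the tuple, negative values counting
    from the end, as with Python indexing.
    """
    values = list(values)
    n = len(values)
    current_slot = offset % width  # e.g. -1 -> last slot, 0 -> first slot
    for i in range(n):
        start = i - current_slot
        # Positions outside the array are "phantom" and receive `fill`.
        yield tuple(
            values[j] if 0 <= j < n else fill
            for j in range(start, start + width)
        )


# Two-wide window ending at the current element: (previous, current).
pairs = list(sliding_windows(["X0", "X1", "B", "X3"], 2, offset=-1))

# Four-wide window (p, c, n, nn) with the current element in slot 1.
quads = list(sliding_windows([1, 2, 5], 4, offset=1, fill=0))
```

Each tuple would then be unpacked into the predicate's positional arguments, which is why the predicate's arity fixes the window width.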

The predicate function also gets a _ variable injected into its
context, which is initially 0 and is updated after each call with the value
just returned. That is, _ holds the group label of the previous item.
By updating the value as we go along we can always compute the
label for the "next" group, somewhat like reduce.
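
The accumulator idea can be sketched without any injection magic. In the snippet below the previous label is passed as an explicit first argument (`acc`) instead of being injected as `_`, and only the (previous, current) window is handled. `running_labels` is a hypothetical name used for illustration; it is not how the patch is implemented.

```python
def running_labels(index, predicate, fill=None):
    """Return one label per element, threading the last label through.

    `acc` plays the role of `_`: it starts at 0 and is replaced by
    whatever the predicate returned on the previous call.
    """
    labels = []
    acc = 0
    prev = fill  # phantom "previous" value for the first element
    for cur in index:
        acc = predicate(acc, prev, cur)
        labels.append(acc)
        prev = cur
    return labels


ix = ["A", "A", "A", "B", "A", "A", "B", "A", "A"]

# Bump the label whenever the previous element was 'B'.
labels = running_labels(ix, lambda acc, p, c: acc + int(p == "B"))
```

Strictly speaking this is a scan rather than a reduce, since every intermediate value is kept as that element's group label.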
#3066 was prompted by this SO question

Note how concise doing that is, using this idiom (much more complex
splitting is easy as well):

In [1]: def prnt_dfs(g):
   ...:    for d in zip(*list(g))[-1]:
   ...:        print d 
   ...:        print ""
   ...: 

In [23]: ix = ['A','A','A','B','A','A','B','A','A']
    ...: df = pd.DataFrame(np.random.randn(9),index=ix)
    ...: prnt_dfs(df.groupby(lambda p,c: _+int(p=='B'),offset=-1))
          0
A -0.214422
A  1.373191
A -1.209503
B -0.258765

          0
A  1.014395
A -0.381113
B  0.707706

          0
A -2.038331
A  1.719545

In [2]: df = pd.DataFrame(range(4),index=['X0','X1','B','X3'])
   ...: 

# start new df if current is B
In [3]: g=(df.groupby(lambda c,n: _+int(c == 'B'))) 
   ...: prnt_dfs(g)
   ...: 
    0
X0  0
X1  1

    0
B   2
X3  3

# start new df if next is B
In [4]: g=(df.groupby(lambda c,n: _+int(n == 'B'))) 
   ...: prnt_dfs(g)
   ...: 
    0
X0  0

    0
X1  1
B   2
X3  3

# start new df if prev was B
In [5]: g=(df.groupby(lambda p,c: _+int(p == 'B'),offset=-1)) 
   ...: prnt_dfs(g)
   ...: 
    0
X0  0
X1  1
B   2

    0
X3  3

# start new df if prevPrev was B (the output below is for an A/B-indexed frame built as in In [23])
In [25]: g=(df.groupby(lambda pp,p,c: _+int(pp == 'B'),offset=-1)) 
    ...: prnt_dfs(g)
          0
A  0.174714
A -0.761156
A  0.260196
B  0.381065
A -2.015324

          0
A  0.387559
B  0.800947
A -1.007098

          0
A -0.217266



In [38]: df = pd.DataFrame(range(10),index=[1,2,5,3,7,8,1,1,3,1])
    ...: g=(df.groupby(lambda p,c,n,nn: _+int(c+p<=3 or nn >5 ),offset=1,fill=0)) 
    ...: prnt_dfs(g)
   0
1  0

   0
2  1

   0
5  2

   0
3  3
7  4
8  5
1  6

   0
1  7
3  8
1  9
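
For comparison, the single-step lookback split from In [23] can be reproduced with stock pandas by building the labels by hand: flag the rows whose previous index entry is 'B', then take a running sum. This is only a sketch of that one case, not a replacement for the general windowed predicate. The variable names are illustrative, and the data is `np.arange(9)` rather than the random values shown above.

```python
import numpy as np
import pandas as pd

ix = ["A", "A", "A", "B", "A", "A", "B", "A", "A"]
df = pd.DataFrame(np.arange(9), index=ix)

# True where the previous index entry is 'B'. The first row has no
# predecessor, so shift() leaves NaN there and the comparison is False.
prev_is_b = pd.Series(df.index).shift() == "B"

# A running count of the flags gives one integer group label per row.
labels = prev_is_b.cumsum().to_numpy()

# Passing an array of the same length as the frame groups rows by it.
pieces = [piece for label, piece in df.groupby(labels)]
sizes = [len(piece) for piece in pieces]
```

With this index the frame splits into three pieces of 4, 3 and 2 rows, matching the grouping shown for In [23].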

The number of args in the function determines the window width.
The df.groupby() `offset` kwd determines the window offset
relative to the current element.

The df.groupby() `fill` kwd determines the value provided
for "phantom" locations at the edges.

The grouper function gets a `_` variable injected into its
context, initialized to 0, and containing the value returned
from the last call to the function. It serves as an "accumulator"
that you can use to return the previous/new grouping label.
Not optimized, just experimenting with the API.

ghost commented May 5, 2013

way too funky.

@ghost ghost closed this May 5, 2013
@cpcloud cpcloud mentioned this pull request Jun 7, 2014
This pull request was closed.