BUG/ENH: groupby head returns empty result for negative n #9214

Closed
TomAugspurger opened this issue Jan 8, 2015 · 9 comments

Comments

@TomAugspurger
Contributor

More drive-by reports. I'll try to close some of these issues sometime...

In [8]: s = pd.Series(np.random.randn(10))

In [9]: s.groupby(s > 0).apply(lambda x: x.head(-1))  # works nicely
Out[9]:
False  0   -0.277400
       1   -0.875231
       3   -0.315087
       4   -0.966649
       6   -0.474389
       7   -0.121225
True   2    0.237861
       5    1.311601
dtype: float64

In [10]: s.groupby(s > 0).head(-1)  # doesn't.
Out[10]: Series([], dtype: float64)
@cpcloud
Member

cpcloud commented Jan 8, 2015

this seems like a strange way to express what you're showing here. why wouldn't one use tail(1) instead?

@TomAugspurger
Contributor Author

I wanted all but the last one.
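
To make the distinction concrete, here is a minimal sketch on a plain Series (the values are made up for illustration): tail(1) keeps only the last row, while head(-1) keeps everything except the last row.

```python
import pandas as pd

# Illustrative data, not taken from the report above.
s = pd.Series([10, 20, 30, 40])

last_one = s.tail(1)       # only the final row
all_but_last = s.head(-1)  # every row except the final one

print(last_one.tolist())
print(all_but_last.tolist())
```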

@jorisvandenbossche
Member

I found it a bit strange that it works like this. The docstring of head does not really mention this either ("Returns first n rows").
But apparently that is the way it works for DataFrame.head(), so it should also work for groupby.head()?

In [32]: df = pd.DataFrame(np.random.randn(5,2))

In [33]: df
Out[33]:
          0         1
0  0.902403  0.270362
1 -0.139103 -0.680811
2  1.284748 -0.404867
3  0.857957 -0.341599
4  0.741684 -1.402058

In [34]: df.head(-1)
Out[34]:
          0         1
0  0.902403  0.270362
1 -0.139103 -0.680811
2  1.284748 -0.404867
3  0.857957 -0.341599

@jreback
Contributor

jreback commented Jan 9, 2015

In [103]: df = DataFrame(np.arange(10).reshape(5,2))

In [104]: df
Out[104]: 
   0  1
0  0  1
1  2  3
2  4  5
3  6  7
4  8  9

# .head(2)
In [105]: df.iloc[0:2]
Out[105]: 
   0  1
0  0  1
1  2  3

# head(-2)
In [106]: df.iloc[0:-2]
Out[106]: 
   0  1
0  0  1
1  2  3
2  4  5

So this is how it's implemented. It does look odd, but it follows slicing semantics.

@TomAugspurger
Contributor Author

I realized last night that I could, and should, just use iloc. I'm ok with closing this as not a priority.
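
A rough sketch of what the iloc approach might look like per group. The data and the names `key` and `result` are made up for illustration, and `group_keys=False` plus `sort_index()` are just one way to get the rows back in their original order.

```python
import pandas as pd

# Illustrative data: three non-positive and three positive values.
s = pd.Series([-1.0, -2.0, 3.0, -4.0, 5.0, 6.0])
key = s > 0

# Slice positionally inside each group to drop its last row,
# then restore the original row order.
result = (
    s.groupby(key, group_keys=False)
     .apply(lambda x: x.iloc[:-1])
     .sort_index()
)

print(result)
```

Here the False group holds rows 0, 1, 3 and the True group holds rows 2, 4, 5, so rows 3 and 5 are the ones dropped.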


@jorisvandenbossche
Member

I think we should aim for consistency with DataFrame.head(), so I wouldn't close it.

@hayd
Contributor

hayd commented Nov 18, 2015

Commented on the SO question:

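def negative_head(g, n):
    # All but the last n rows of each group (n is a positive count).
    return g._selected_obj[g.cumcount(ascending=False) >= n]

def negative_tail(g, n):
    # All but the first n rows of each group (n is a positive count).
    return g._selected_obj[g.cumcount() >= n]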

I didn't realize this was a thing when implementing this, sorry.
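
For anyone trying this out, here is a sketch of the same cumcount idea that avoids the private `_selected_obj` attribute by passing the original object and the grouping key explicitly. The function names `all_but_last` and `all_but_first` and the sample data are hypothetical, chosen only for this example.

```python
import pandas as pd

def all_but_last(obj, key, n):
    # cumcount(ascending=False) numbers rows within each group from the end,
    # so the last n rows of a group get the values n-1, ..., 0.
    return obj[obj.groupby(key).cumcount(ascending=False) >= n]

def all_but_first(obj, key, n):
    # cumcount() numbers rows within each group from the start,
    # so the first n rows of a group get the values 0, ..., n-1.
    return obj[obj.groupby(key).cumcount() >= n]

# Illustrative data: rows 0, 1, 3 are non-positive; rows 2, 4, 5 are positive.
s = pd.Series([-1.0, -2.0, 3.0, -4.0, 5.0, 6.0])
key = s > 0

trimmed_end = all_but_last(s, key, 1)     # drops rows 3 and 5
trimmed_start = all_but_first(s, key, 1)  # drops rows 0 and 2

print(trimmed_end.index.tolist())
print(trimmed_start.index.tolist())
```

Because the mask is built on the original index, the result keeps the original row order without needing apply.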

@smcinerney

Unaware that this one from 2015 existed, I independently just filed #30192. Its title and body make clear that it's suggesting the undocumented behavior is useful and should be documented.

@TomAugspurger
Contributor Author

Closing in favor of #30192
