API: update nth to use the _set_selection_from_grouper makes first==nth(0) and last==nth(-1) #7044

Merged on May 12, 2014 (2 commits)

jreback
Contributor

@jreback jreback commented May 5, 2014

closes #6732

nth now sets the index the same way first/last do.

This makes nth less like head/tail: as_index now determines whether the result is indexed by the group keys.

Here's the revised behavior:

In [1]: df = DataFrame([[1, np.nan], [1, 4], [5, 6]], columns=['A', 'B'])

In [3]: df
Out[3]: 
   A   B
0  1 NaN
1  1   4
2  5   6

[3 rows x 2 columns]

In [2]: g = df.groupby('A')

In [9]: gni = df.groupby('A',as_index=False)

In [5]: g.first()
Out[5]: 
   B
A   
1  4
5  6

[2 rows x 1 columns]

# this was a regression from 0.13.1 (before this PR, nth returned its result as if ``as_index=False`` had been passed)
In [6]: g.nth(0)
Out[6]: 
    B
A    
1 NaN
5   6

[2 rows x 1 columns]

In [7]: g.nth(0,dropna='all')
Out[7]: 
   B
A   
1  4
5  6

[2 rows x 1 columns]

In [10]: gni.nth(0)
Out[10]: 
   A   B
0  1 NaN
2  5   6

[2 rows x 2 columns]

In [11]: gni.nth(0,dropna='all')
Out[11]: 
   A  B
0  1  4
1  5  6

[2 rows x 2 columns]
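
The difference between `g.first()`, `g.nth(0)` and `g.nth(0, dropna='all')` in the output above can be modeled in plain Python. This is only a toy sketch of the observable behavior on column B, not how pandas implements it; `group_values`, `nth_model` and `first_model` are hypothetical names made up for illustration, and with a single value column the distinction between `dropna='any'` and `dropna='all'` does not arise.

```python
import math
from collections import defaultdict

# (A, B) pairs taken from the example frame above.
rows = [(1, math.nan), (1, 4.0), (5, 6.0)]


def group_values(rows):
    """Collect the B values of each group, keyed by A, in row order."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return dict(groups)


def nth_model(rows, n, dropna=False):
    """Toy model of nth: pick the value at position n within each group.

    With dropna=True, NaN values are discarded before picking.
    Groups that have no position n contribute nothing.
    """
    result = {}
    for key, values in group_values(rows).items():
        if dropna:
            values = [v for v in values if not math.isnan(v)]
        if -len(values) <= n < len(values):
            result[key] = values[n]
    return result


def first_model(rows):
    """Toy model of first: the first non-NaN value per group."""
    return nth_model(rows, 0, dropna=True)


plain = nth_model(rows, 0)                  # group 1 keeps its leading NaN
dropped = nth_model(rows, 0, dropna=True)   # group 1 yields 4.0 instead
first = first_model(rows)                   # same as the dropna variant

print(plain, dropped, first)
```

In this model, `first` agrees with `nth(0)` only once NaN values are dropped, which mirrors Out[5], Out[6] and Out[7] above.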

@jreback jreback added this to the 0.14.0 milestone May 5, 2014
@jreback
Contributor Author

jreback commented May 5, 2014

@hayd @jorisvandenbossche pls review for saneness

@jorisvandenbossche
Member

The question is whether we see nth as a filtering function or a reducing function. It looks like #6569 changed it to be filtering, while this reverts it to reducing?

@jreback
Contributor Author

jreback commented May 5, 2014

I get it, the reason I 'changed' it was two-fold:

  • nth is a reducer, much like first and last, in fact first == nth(0)
  • you can now pass as_index=False to get a filter, so this preserves that behavior while making the default the more natural "show me the nth element".

so to answer your question, this is really a reducer (as it can return at most 1 row per group!)

head/tail can return multiple per group (so they are filters)
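
The reducer/filter distinction can be seen with `first` and `head`, whose behavior is not touched by this PR. A minimal sketch, assuming a reasonably recent pandas and reusing the example frame from the PR description; the variable names are only illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, np.nan], [1, 4], [5, 6]], columns=["A", "B"])
g = df.groupby("A")

# Reducer: exactly one row per group, indexed by the group key A.
reduced = g.first()

# Filter: a subset of the original rows, keeping the original index.
filtered = g.head(1)

# A filter may return more than one row per group.
filtered_two = g.head(2)

print(reduced)
print(filtered)
print(filtered_two)
```

The reduced result is labeled by the group keys, while the filtered results keep the row labels of `df`; that is the difference being discussed for nth.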

@jreback
Contributor Author

jreback commented May 8, 2014

@hayd ping!

@jreback
Contributor Author

jreback commented May 12, 2014

@hayd any comments?

jreback added a commit that referenced this pull request May 12, 2014
API: update nth to use the _set_selection_from_grouper makes first==nth(0) and last==nth(-1)
@jreback jreback merged commit 809d9d1 into pandas-dev:master May 12, 2014
@hayd
Contributor

hayd commented May 28, 2014

Looks good, thanks for putting this together, apologies for being AWOL of late!

@jreback
Contributor Author

jreback commented May 28, 2014

glad 2 have u back!

Development

Successfully merging this pull request may close these issues.

groupby().first() skips NaN values
3 participants