groupby.nth lost multiindex #11830

jesrael · 2015-12-12T10:04:38Z

Is this a bug or not? With mean and first it works as expected.

import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2, 2], 'b': ['b', 'b', 'b', 'a'], 'c': [1, 2, 3, 4]})
print(df)
#   a  b  c
#0  1  b  1
#1  1  b  2
#2  2  b  3
#3  2  a  4

# lost MultiIndex
print(df.groupby(['a', 'b']).c.nth(0))
#0    1
#2    3
#3    4
#Name: c, dtype: int64

print(df.groupby(['a', 'b']).c.mean())
#a  b
#1  b    1.5
#2  a    4.0
#   b    3.0
#Name: c, dtype: float64
print(df.groupby(['a', 'b']).c.first())
#a  b
#1  b    1
#2  a    4
#   b    3
#Name: c, dtype: int64
print(df.groupby(['a', 'b']).nth(0).c)
#a  b
#1  b    1
#2  a    4
#   b    3
#Name: c, dtype: int64
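If you need the group keys in the index no matter which `nth` semantics your pandas version implements, one workaround is to select the rows yourself with `GroupBy.cumcount()` and then move the keys into the index. This is a minimal sketch, not how pandas implements `nth`; `nth_keyed` is a hypothetical helper name. Like `nth` (and unlike `first()`), it does not skip NaN values.

```python
import pandas as pd


def nth_keyed(df, keys, col, n):
    """Hypothetical helper: the n-th row of `col` per group, indexed by the group keys.

    Non-negative n counts from the start of each group; negative n counts
    from the end (n=-1 is the last row). NaN values are not skipped.
    """
    g = df.groupby(keys)
    if n >= 0:
        # 0, 1, 2, ... for the rows of each group, in original order
        mask = g.cumcount() == n
    else:
        # 0 marks the last row of each group, 1 the second to last, ...
        mask = g.cumcount(ascending=False) == (-n - 1)
    # Keep the selected rows, move the group keys into the index, and sort
    # so the row order matches what first()/mean() return.
    return df[mask].set_index(keys)[col].sort_index()


df = pd.DataFrame({'a': [1, 1, 2, 2], 'b': ['b', 'b', 'b', 'a'], 'c': [1, 2, 3, 4]})
print(nth_keyed(df, ['a', 'b'], 'c', 0))
```

On this frame the result has the same (a, b) MultiIndex and values as `df.groupby(['a', 'b']).c.first()`; groups with fewer than n + 1 rows are simply absent from the output.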


jreback · 2015-12-12T14:03:11Z

looks like a bug, but this is quite a complicated area, see here

If you'd like to dig in, that would be great.

pwaller · 2016-01-09T19:36:03Z

By the way, I just hit this I think, and you don't need a multi-index to cause it:

df.groupby(df.b).a.nth(0), df.groupby(df.b).a.first()

gives:

(0    1
 3    2
 Name: a, dtype: int64,
 b
 a    2
 b    1
 Name: a, dtype: int64)

(I would expect the two to be equal.) It seems that, for whatever reason, nth is losing its index.

pwaller · 2016-01-09T20:21:14Z

I just built several past versions to see if this was a regression. It doesn't look like one; I've tested as far back as 0.12.

pwaller · 2016-01-09T20:31:51Z

Sorry. I think my test may have been depending on a feature which didn't exist in older versions (the .name property of indices). It now looks like it gives the correct result in v0.13.0. This is my test:

$ python3 -c 'import pandas as P; raise SystemExit(not P.DataFrame({"a": [1, 2, 3, 4], "b": [9, 9, 9, 9]}).groupby("a").a.nth(0).index.equals([1, 2, 3, 4]))'
$ echo $?
0

The latest version gives a non-zero exit status.

Going to try an automated git bisect. Any hints on how to make pandas build faster in development mode?

jreback · 2016-01-09T21:15:14Z

http://pandas.pydata.org/pandas-docs/stable/groupby.html#groupby-nth

nth actually has slightly different semantics than .first. we have been discussing this in a couple of issues: #7569, #11038, and #11039
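One concrete difference in semantics, separate from the index question in this issue: `first()` returns the first non-null value in each group, while `nth(0)` returns whatever is in the first row of the group, NaN included. A small sketch on a made-up frame; because the index of the `nth` result differs between pandas versions, only the values are compared.

```python
import numpy as np
import pandas as pd

# Illustrative frame: group 'x' starts with a missing value.
df = pd.DataFrame({'k': ['x', 'x', 'y'], 'v': [np.nan, 2.0, 3.0]})
g = df.groupby('k')

first_vals = g.v.first().tolist()  # first non-null value per group
nth_vals = g.v.nth(0).tolist()     # value in the first row of each group, NaN kept

print(first_vals)
print(nth_vals)
```

For group `x`, `first()` yields 2.0 while `nth(0)` yields NaN; for group `y` both yield 3.0.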

pwaller · 2016-01-09T21:57:15Z

OK. Not that it matters but I was able to bisect the behaviour change to c444c73.

FWIW, for me this wasn't very clearly explained in the documentation, for example here. I guess it comes down to how you read the word "rows". For me, that just meant that it was picking out a row within the group, but I still expected the result to have the index for that group, just as first() and last() do. It would be very useful if the docs made this behaviour difference clear.

Another thing that is a surprise to me:

>>> P.DataFrame({"a": [1, 2, 3, 4], "b": [9, 9, 9, 9]}).groupby("a").b.nth(0)
0    9
1    9
2    9
3    9
Name: b, dtype: int64

is different from this (with b.nth(0) flipped to nth(0).b):

>>> P.DataFrame({"a": [1, 2, 3, 4], "b": [9, 9, 9, 9]}).groupby("a").nth(0).b
a
1    9
2    9
3    9
4    9
Name: b, dtype: int64

(Edited: I accidentally pressed submit prematurely!)

jreback · 2016-01-09T22:15:31Z

the issue is we need a bunch more tests, as there is a bug in the results. Can you do a PR with some more tests? It would help move this along.

adneu · 2016-05-24T06:19:20Z

I believe this was fixed with 445d1c6. The discussion for #11039 mentions that the commit now makes GroupBy.nth reducing for a Series (keeping the original index), whereas the former behavior was filtering (position-based index). That commit also added these tests, which I think address this bug:

+        assert_series_equal(g.B.nth(0), df.set_index('A').B.iloc[[0, 2]])
+        assert_series_equal(g.B.nth(1), df.set_index('A').B.iloc[[1]])

jreback · 2016-05-24T13:30:45Z

ok #12839 closed this, but see the comment on the actual PR (we ended up re-opening that one) because the state was not being preserved.

You're welcome to address that issue.

thanks!

jreback added the Groupby label Dec 12, 2015

jreback added Bug Difficulty Intermediate labels Dec 12, 2015

jreback added this to the Next Major Release milestone Dec 12, 2015

jreback closed this as completed May 24, 2016

jreback modified the milestones: 0.18.2, Next Major Release May 24, 2016
