BUG: groupby apply throws error if custom func doesn't return non-None value. #9684

Closed
nickeubank opened this issue Mar 19, 2015 · 6 comments
Labels: Error Reporting (Incorrect or improved errors from pandas), Groupby

@nickeubank
Contributor

This case may be unusual, but I was writing a test to make sure results are sorted within groupby calls, and found that if the func in a df.groupby().apply(func) call doesn't return a value, it causes an error.

Minimal code to replicate:

import pandas as pd

test_df = pd.DataFrame({'groups': [0, 0, 1, 1], 'random_vars': [8, 7, 4, 5]})
def test_func(x):
    pass  # implicitly returns None for every group
test_df.groupby('groups').apply(test_func)

Error:
Traceback (most recent call last):

  File "<ipython-input-18-d520af3d5d86>", line 11, in <module>
    z.groupby('d').apply(sort_test)

  File "/Users/Nick/GitHub/pandas/pandas/core/groupby.py", line 663, in apply
    return self._python_apply_general(f)

  File "/Users/Nick/GitHub/pandas/pandas/core/groupby.py", line 670, in _python_apply_general
    not_indexed_same=mutated)

  File "/Users/Nick/GitHub/pandas/pandas/core/groupby.py", line 2811, in _wrap_applied_output
    v = next(v for v in values if v is not None)

StopIteration

The problem seems to be due to the function test_func() not returning a value, though the error persists even if I return None in test_func().
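
To isolate the failure, here is a minimal sketch that needs no pandas. It reuses only the expression from the last frame of the traceback; the helper names `first_non_none` and `first_non_none_or_default` and the `results` list are made up for illustration and are not part of pandas.

```python
def first_non_none(values):
    # Same expression as the last traceback frame: with no default,
    # next() raises StopIteration once the generator is exhausted.
    return next(v for v in values if v is not None)


def first_non_none_or_default(values, default=None):
    # Passing a second argument to next() returns it instead of raising.
    return next((v for v in values if v is not None), default)


# One entry per group; a func that returns nothing yields None each time.
results = [None, None]

try:
    first_non_none(results)
    outcome = "returned"
except StopIteration:
    outcome = "StopIteration"

print(outcome)
print(first_non_none_or_default(results, default="no usable groups"))
```

This also explains why an explicit `return None` changes nothing: both versions of the function produce the same all-`None` list.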

My actual use case, as an example of a situation where this might come up:

import numpy as np
import pandas as pd
from pandas.util.testing import assert_frame_equal

test = pd.DataFrame({'groups': np.random.randint(0, 10, size=100), 'random_vars': np.random.rand(100)})
test.sort('random_vars', inplace=True)

def sort_test(x):
    # Raises if the group is not already sorted; otherwise returns None.
    assert_frame_equal(x, x.sort('random_vars'))

test.groupby('groups').apply(sort_test)

@shoyer
Member

shoyer commented Mar 19, 2015

The issue here is that pandas uses None as a sentinel value for "skip this group" with apply. This lets you write things like this:

In [9]: test_df.groupby('groups').apply(lambda x: x if (x.index == 0).any() else None)
Out[9]:
          groups  random_vars
groups
0      0       0            8
       1       0            7

(not entirely sure why that extra groups column appears, but you get the idea)

So apparently GroupBy.apply doesn't know what to do when it gets back no non-None elements. It should probably default to creating an empty object?

@nickeubank
Contributor Author

Seems reasonable. I think it's even fine if it throws an informative error, since this is such an odd case. It just took a while to figure out what was going on there.
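
The two options discussed so far (an empty result, or an informative error) can be sketched in plain Python. This is an illustration only, not the pandas implementation or the eventual fix: `apply_by_group`, its `on_all_skipped` parameter, and `keep_group_zero` are hypothetical names, and a list of dicts stands in for the DataFrame.

```python
from collections import defaultdict


def apply_by_group(rows, key, func, on_all_skipped="empty"):
    """Hypothetical stand-in for groupby(key).apply(func) on a list of dicts."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    # None is treated as "skip this group".
    kept = {}
    for name, members in groups.items():
        result = func(members)
        if result is not None:
            kept[name] = result

    if not kept and on_all_skipped == "raise":
        raise ValueError("func returned None for every group; nothing to combine")
    # With on_all_skipped="empty", an empty result is returned instead.
    return kept


rows = [
    {"groups": 0, "random_vars": 8},
    {"groups": 0, "random_vars": 7},
    {"groups": 1, "random_vars": 4},
    {"groups": 1, "random_vars": 5},
]


def keep_group_zero(members):
    # Keep group 0 and skip every other group by returning None.
    return members if members[0]["groups"] == 0 else None


print(apply_by_group(rows, "groups", keep_group_zero))
print(apply_by_group(rows, "groups", lambda members: None))
```

Either behavior avoids the bare StopIteration; the difference is whether a func that never returns a value is treated as valid input or as a likely mistake.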

@shoyer
Member

shoyer commented Mar 19, 2015

Indeed, we all like informative errors. Want to give fixing this a try?

@nickeubank
Contributor Author

:) Sure!

@nickeubank
Contributor Author

OK @shoyer, I've added a fix. The nosetests pass, as does my own experimentation, but groupby is a pretty complex bit of machinery, so if you wouldn't mind taking a quick look I'd appreciate it.

jreback added the Error Reporting (Incorrect or improved errors from pandas) and Groupby labels Mar 19, 2015
jreback added this to the 0.16.1 milestone Mar 19, 2015
nickeubank pushed a commit to nickeubank/pandas that referenced this issue Apr 12, 2015
nickeubank pushed a commit to nickeubank/pandas that referenced this issue Apr 12, 2015
@jreback
Contributor

jreback commented Apr 28, 2015

closed by #9685
