group_by produces 'minlength must be positive error' when applied to empty DataFrame #11699

Sereger13 · 2015-11-25T11:31:27Z

This used to work fine in previous versions but appears to be broken in 0.17.1

The following code:

import pandas as pd
df = pd.DataFrame({'A': [], 'B': []})
gb = df.groupby('A').size()

Produces this error:

ValueError: minlength must be positive

In v 0.16.2 the same code produced an empty DataFrame. We'd really like to upgrade to 0.17.1 but heavily rely on this functionality so have to hold the upgrade. Checking for empty DataFrame is not going to work for us either as there are too many places where it can actually be empty.

If you can suggest any workaround in the meantime so we could upgrade that would be appreciated.

INSTALLED VERSIONS

commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.18-238.9.1.el5
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US

pandas: 0.16.2
...


jreback · 2015-11-25T13:43:07Z

cc @behzadnouri

@Sereger13 I don't think there is an easy way around this w/o resorting to patching DataFrame.groupby to catch this situation (which, while messy and normally not recommended, may work for you temporarily).

Sereger13 · 2015-11-25T13:54:10Z

I see...

We found that this code:
count().iloc[:, 0]
produces very similar results to size() and seems to be working for us - but does not look particularly attractive so still deciding whether to have it or not.
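A minimal sketch of how the two approaches compare, assuming a frame with at least one non-key column (the helper name `size_via_count` is hypothetical, introduced only for illustration). The results match only when the first non-key column has no missing values, because count() skips NaN while size() counts every row:

```python
import numpy as np
import pandas as pd


def size_via_count(df, key):
    """Approximate groupby(key).size() with count() on the first non-key column.

    Hypothetical helper: it needs at least one column besides the key,
    and it undercounts groups whose first non-key column contains NaN.
    """
    return df.groupby(key).count().iloc[:, 0]


df = pd.DataFrame({'A': [1, 2, 1], 'B': [1.0, np.nan, 3.0]})

by_size = df.groupby('A').size()    # counts rows per group
by_count = size_via_count(df, 'A')  # counts non-NaN values of B per group

# The empty frame from the original report goes through the same path.
empty = size_via_count(pd.DataFrame({'A': [], 'B': []}), 'A')
```

Here group 2 has one row, but its only B value is NaN, so the two results differ for that group.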

If you do decide to fix size() - is there any idea when the next version/patch is going to be available? Thanks..

jreback · 2015-11-25T13:59:59Z

will be fixed; 0.18.0 prob later january

Sereger13 · 2015-11-25T14:02:42Z

Thanks.

jreback · 2015-11-25T14:08:45Z

@Sereger13 my point about patching is that you can avoid any code changes.

note again that this is a 'hack' but will work.

e.g.

In [109]: df1 = pd.DataFrame({'A': [], 'B': []})

In [110]: df2 = pd.DataFrame({'A': [1,2,1], 'B': [1,2,3]})

In [116]: def size(self):
   .....:     try:
   .....:         return self.grouper.size()
   .....:     except ValueError:
   .....:         self._set_selection_from_grouper()
   .....:         return self._selected_obj[0:0]
   .....:     

In [117]: pd.core.groupby.GroupBy.size = size

In [118]: df1.groupby('A').size()
Out[118]: 
Empty DataFrame
Columns: [B]
Index: []

In [119]: df2.groupby('A').size()
Out[119]: 
A
1    2
2    1
dtype: int64

Sereger13 · 2015-11-25T15:25:59Z

Great - thanks for your help.

behzadnouri · 2015-11-26T15:34:38Z

This is more a bug in np.bincount, because it unnecessarily requires minlength to be strictly positive. Though kind of ugly, the work-around would be simple:

diff --git a/pandas/core/groupby.py b/pandas/core/groupby.py
index e9aa906..d722ef8 100644
--- a/pandas/core/groupby.py
+++ b/pandas/core/groupby.py
@@ -1439,7 +1439,8 @@ class BaseGrouper(object):
         """
         ids, _, ngroup = self.group_info
         ids = com._ensure_platform_int(ids)
-        out = np.bincount(ids[ids != -1], minlength=ngroup)
+        mask = ids != -1
+        out = np.bincount(ids[mask], minlength=ngroup) if ngroup != 0 else []
         return Series(out, index=self.result_index, dtype='int64')

     @cache_readonly
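The same guard can be sketched outside pandas as a standalone function. This is an illustration of the idea in the diff, not the code that was merged. The function name `group_sizes` is hypothetical, and the sketch assumes `ids` holds group codes with -1 marking rows that belong to no group:

```python
import numpy as np


def group_sizes(ids, ngroup):
    """Count rows per group, tolerating ngroup == 0.

    ids: integer group codes, with -1 meaning "no group".
    ngroup: number of groups; 0 for an empty input.
    """
    ids = np.asarray(ids, dtype=np.intp)
    if ngroup == 0:
        # Avoid calling np.bincount with minlength=0, which the NumPy
        # release discussed in this thread rejects.
        return np.zeros(0, dtype=np.int64)
    mask = ids != -1
    return np.bincount(ids[mask], minlength=ngroup).astype(np.int64)


sizes = group_sizes([0, 1, 0, -1], 3)
empty_sizes = group_sizes([], 0)
```

Returning a typed empty array instead of a bare `[]` keeps the result dtype the same on both paths.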

Sereger13 · 2015-11-26T18:13:33Z

Interesting... thanks for the update. Yes, they could have made np.bincount() better indeed - letting None and 0 have the same meaning would make it more usable.

So it looks like simply setting ngroup to None should also do the trick:

if not ngroup:
    ngroup = None
out = np.bincount(ids[ids != -1], minlength=ngroup)

Not sure this is more readable than @behzadnouri's solution though. Looking forward to a new pandas with the workaround!

jreback added Bug Groupby Difficulty Intermediate labels Nov 25, 2015

jreback added this to the 0.18.0 milestone Nov 25, 2015

behzadnouri mentioned this issue Nov 26, 2015

BUG: work around for np.bincount with minlength=0 #11709

Merged

jreback closed this as completed in #11709 Nov 29, 2015
