BUG: 1.4.0rc1 Error vectorizing grouping aggregation on empty dataframe with object column #45231

TheNeuralBit · 2022-01-06T18:05:10Z

Pandas version checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

In [20]: df = pd.DataFrame({'group': pd.Series(dtype='object'), 'str': pd.Series(dtype='object')})                  
                                                                                                                    
In [21]: df.groupby('group').any()                        
---------------------------------------------------------------------------                                         
ValueError                                Traceback (most recent call last)                                         
<ipython-input-21-d0b9fa3e2ddd> in <module>               
----> 1 df.groupby('group').any()                                                                                   

~/.pyenv/versions/3.8.6/envs/beam/lib/python3.8/site-packages/pandas/core/groupby/groupby.py in any(self, skipna)                                                                                                                       
   1804             is True within its respective group, False otherwise.                                           
   1805         """                                       
-> 1806         return self._bool_agg("any", skipna)                                                                
   1807                                                   
   1808     @final                                        

~/.pyenv/versions/3.8.6/envs/beam/lib/python3.8/site-packages/pandas/core/groupby/groupby.py in _bool_agg(self, val_test, skipna)
   1774                 return result.astype(inference, copy=False)                                                 
   1775                                                   
-> 1776         return self._get_cythonized_result(                                                                 
   1777             libgroupby.group_any_all,             
   1778             numeric_only=False,                                                                             

~/.pyenv/versions/3.8.6/envs/beam/lib/python3.8/site-packages/pandas/core/groupby/groupby.py in _get_cythonized_result(self, base_func, cython_dtype, numeric_only, needs_counts, needs_nullable, needs_mask, pre_processing, post_processing, **kwargs)
   3383             mgr = mgr.get_numeric_data()                                                                    
   3384                                                                                                             
-> 3385         res_mgr = mgr.grouped_reduce(blk_func, ignore_failures=True)                                        
   3386                                                   
   3387         if not is_ser and len(res_mgr.items) != len(mgr.items):                                             
                                                                                                                    
~/.pyenv/versions/3.8.6/envs/beam/lib/python3.8/site-packages/pandas/core/internals/managers.py in grouped_reduce(self, func, ignore_failures)
   1338                 for sb in blk._split():           
   1339                     try: 
-> 1340                         applied = sb.apply(func)  
   1341                     except (TypeError, NotImplementedError):                                                
   1342                         if not ignore_failures:                                                             
                                                                                                                    
~/.pyenv/versions/3.8.6/envs/beam/lib/python3.8/site-packages/pandas/core/internals/blocks.py in apply(self, func, **kwargs)                                                                                                            
    388         one                                       
    389         """                                                                                                 
--> 390         result = func(self.values, **kwargs)      
    391                                                                                                             
    392         return self._split_op_result(result)      
                                                          
~/.pyenv/versions/3.8.6/envs/beam/lib/python3.8/site-packages/pandas/core/groupby/groupby.py in blk_func(values)    
   3342             vals = values                         
   3343             if pre_processing:                    
-> 3344                 vals, inferences = pre_processing(vals)                                                     
   3345                                                   
   3346             vals = vals.astype(cython_dtype, copy=False)                                                    
                                                                                                                    
~/.pyenv/versions/3.8.6/envs/beam/lib/python3.8/site-packages/pandas/core/groupby/groupby.py in objs_to_bool(vals)  
   1752                 if skipna:                        
   1753                     func = np.vectorize(lambda x: bool(x) if not isna(x) else True)                         
-> 1754                     vals = func(vals)                                                                       
   1755                 else:                             
   1756                     vals = vals.astype(bool, copy=False)                                                    

~/.pyenv/versions/3.8.6/envs/beam/lib/python3.8/site-packages/numpy/lib/function_base.py in __call__(self, *args, **kwargs)                                                                                                             
   2106             vargs.extend([kwargs[_n] for _n in names])                                                      
   2107                                                   
-> 2108         return self._vectorize_call(func=func, args=vargs)                                                  
   2109                                                   
   2110     def _get_ufunc_and_otypes(self, func, args):  

~/.pyenv/versions/3.8.6/envs/beam/lib/python3.8/site-packages/numpy/lib/function_base.py in _vectorize_call(self, func, args)    
   2184             res = func()                                                                                    
   2185         else:                                     
-> 2186             ufunc, otypes = self._get_ufunc_and_otypes(func=func, args=args)                                
   2187                                                   
   2188             # Convert args to object arrays first                                                           

~/.pyenv/versions/3.8.6/envs/beam/lib/python3.8/site-packages/numpy/lib/function_base.py in _get_ufunc_and_otypes(self, func, args)                                                                                                     
   2140             args = [asarray(arg) for arg in args] 
   2141             if builtins.any(arg.size == 0 for arg in args):                                                 
-> 2142                 raise ValueError('cannot call `vectorize` on size 0 inputs '                                
   2143                                  'unless `otypes` is set')                                                  
   2144                                                   
                                                                                                                    
ValueError: cannot call `vectorize` on size 0 inputs unless `otypes` is set

Issue Description

Some grouped aggregations raise a ValueError (from numpy vectorization code) when operating on an empty DataFrame with an object dtype column. I've only observed this in any and all (perhaps because other aggregations drop the object column by default).

I've only observed this behavior in 1.4.0rc1. I've verified this code works fine in previous pandas versions, but I haven't tested with master.
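
For reference, the numpy-level failure can be reproduced without pandas. This is a minimal sketch, not the pandas source: `is_missing` is a hypothetical stand-in for the `isna` call seen in the traceback, and the lambda mirrors the one shown in `objs_to_bool`.

```python
import numpy as np


def is_missing(x):
    # Hypothetical, simplified stand-in for pandas' isna (assumption for this sketch):
    # treats None and float NaN as missing.
    return x is None or (isinstance(x, float) and x != x)


# No otypes given, so np.vectorize must call the function on a first element
# to infer the output dtype. With a size-0 input there is no such element.
to_bool = np.vectorize(lambda x: bool(x) if not is_missing(x) else True)

empty = np.array([], dtype=object)

raised = False
message = ""
try:
    to_bool(empty)
except ValueError as exc:
    raised = True
    message = str(exc)

print(raised, message)
```

The same vectorized function works on a non-empty object array, which would explain why the error only shows up for empty frames.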

Expected Behavior

Pandas should produce an empty result, as in previous versions.
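
A regression check along these lines would capture that expectation. This is only a sketch, assuming it runs on a pandas version without the bug. It asserts only that the results have no rows, since the exact columns and dtypes of the empty result are not stated in this report.

```python
import pandas as pd

# Same empty frame as in the reproducible example.
df = pd.DataFrame(
    {"group": pd.Series(dtype="object"), "str": pd.Series(dtype="object")}
)

# Both aggregations named in the report should return an empty result
# instead of raising.
results = {name: getattr(df.groupby("group"), name)() for name in ("any", "all")}

for name, result in results.items():
    assert len(result) == 0, f"{name} should produce an empty result"
```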

Installed Versions

1.4.0rc1

jreback · 2022-01-06T22:18:54Z

cc @jbrockmendel @rhshadrach

rhshadrach · 2022-01-08T19:51:16Z

@TheNeuralBit thanks for the report! Confirmed on master; adding otypes fixes it. PR coming shortly.
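
A rough sketch of what passing `otypes` looks like at the numpy level. This is not the diff from the eventual PR: `objs_to_bool_sketch` and `is_missing` are hypothetical names used for illustration, and only the `skipna=True` branch shown in the traceback is modeled.

```python
import numpy as np


def is_missing(x):
    # Hypothetical, simplified stand-in for pandas' isna (assumption for this sketch).
    return x is None or (isinstance(x, float) and x != x)


def objs_to_bool_sketch(vals):
    # Declaring otypes gives np.vectorize the output dtype up front, so it
    # does not need a first element to infer it and accepts size-0 inputs.
    func = np.vectorize(
        lambda x: bool(x) if not is_missing(x) else True,
        otypes=[bool],
    )
    return func(vals)


empty_result = objs_to_bool_sketch(np.array([], dtype=object))
filled_result = objs_to_bool_sketch(np.array(["a", "", None], dtype=object))
```

With `otypes=[bool]`, the empty input yields an empty boolean array rather than a `ValueError`, and non-empty inputs are converted as before.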

TheNeuralBit · 2022-01-13T01:39:09Z

Thank you! I confirmed that this is fixed on master.

TheNeuralBit added the Bug and Needs Triage labels Jan 6, 2022

jreback added this to the 1.4 milestone Jan 6, 2022

jreback added the Groupby label and removed the Needs Triage label Jan 6, 2022

rhshadrach self-assigned this Jan 8, 2022

rhshadrach mentioned this issue Jan 8, 2022

Fix groupby any/all on an empty series/frame #45274 (merged)

jreback closed this as completed in #45274 Jan 10, 2022

simonjayhawkins added a commit to simonjayhawkins/pandas that referenced this issue Jan 11, 2022

code sample for pandas-dev#45231

a304e47
