BUG: 1.4.0rc1 Error vectorizing grouping aggregation on empty dataframe with object column #45231

Closed
2 of 3 tasks
TheNeuralBit opened this issue Jan 6, 2022 · 3 comments · Fixed by #45274
@TheNeuralBit
Contributor

Pandas version checks

  • [x] I have checked that this issue has not already been reported.

  • [x] I have confirmed this bug exists on the latest version of pandas.

  • [ ] I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

In [20]: df = pd.DataFrame({'group': pd.Series(dtype='object'), 'str': pd.Series(dtype='object')})                  
                                                                                                                    
In [21]: df.groupby('group').any()                        
---------------------------------------------------------------------------                                         
ValueError                                Traceback (most recent call last)                                         
<ipython-input-21-d0b9fa3e2ddd> in <module>               
----> 1 df.groupby('group').any()                                                                                   

~/.pyenv/versions/3.8.6/envs/beam/lib/python3.8/site-packages/pandas/core/groupby/groupby.py in any(self, skipna)                                                                                                                       
   1804             is True within its respective group, False otherwise.                                           
   1805         """                                       
-> 1806         return self._bool_agg("any", skipna)                                                                
   1807                                                   
   1808     @final                                        

~/.pyenv/versions/3.8.6/envs/beam/lib/python3.8/site-packages/pandas/core/groupby/groupby.py in _bool_agg(self, val_test, skipna)
   1774                 return result.astype(inference, copy=False)                                                 
   1775                                                   
-> 1776         return self._get_cythonized_result(                                                                 
   1777             libgroupby.group_any_all,             
   1778             numeric_only=False,                                                                             

~/.pyenv/versions/3.8.6/envs/beam/lib/python3.8/site-packages/pandas/core/groupby/groupby.py in _get_cythonized_result(self, base_func, cython_dtype, numeric_only, needs_counts, needs_nullable, needs_mask, pre_processing, post_processing, **kwargs)
   3383             mgr = mgr.get_numeric_data()                                                                    
   3384                                                                                                             
-> 3385         res_mgr = mgr.grouped_reduce(blk_func, ignore_failures=True)                                        
   3386                                                   
   3387         if not is_ser and len(res_mgr.items) != len(mgr.items):                                             
                                                                                                                    
~/.pyenv/versions/3.8.6/envs/beam/lib/python3.8/site-packages/pandas/core/internals/managers.py in grouped_reduce(self, func, ignore_failures)
   1338                 for sb in blk._split():           
   1339                     try: 
-> 1340                         applied = sb.apply(func)  
   1341                     except (TypeError, NotImplementedError):                                                
   1342                         if not ignore_failures:                                                             
                                                                                                                    
~/.pyenv/versions/3.8.6/envs/beam/lib/python3.8/site-packages/pandas/core/internals/blocks.py in apply(self, func, **kwargs)                                                                                                            
    388         one                                       
    389         """                                                                                                 
--> 390         result = func(self.values, **kwargs)      
    391                                                                                                             
    392         return self._split_op_result(result)      
                                                          
~/.pyenv/versions/3.8.6/envs/beam/lib/python3.8/site-packages/pandas/core/groupby/groupby.py in blk_func(values)    
   3342             vals = values                         
   3343             if pre_processing:                    
-> 3344                 vals, inferences = pre_processing(vals)                                                     
   3345                                                   
   3346             vals = vals.astype(cython_dtype, copy=False)                                                    
                                                                                                                    
~/.pyenv/versions/3.8.6/envs/beam/lib/python3.8/site-packages/pandas/core/groupby/groupby.py in objs_to_bool(vals)  
   1752                 if skipna:                        
   1753                     func = np.vectorize(lambda x: bool(x) if not isna(x) else True)                         
-> 1754                     vals = func(vals)                                                                       
   1755                 else:                             
   1756                     vals = vals.astype(bool, copy=False)                                                    

~/.pyenv/versions/3.8.6/envs/beam/lib/python3.8/site-packages/numpy/lib/function_base.py in __call__(self, *args, **kwargs)                                                                                                             
   2106             vargs.extend([kwargs[_n] for _n in names])                                                      
   2107                                                   
-> 2108         return self._vectorize_call(func=func, args=vargs)                                                  
   2109                                                   
   2110     def _get_ufunc_and_otypes(self, func, args):  

~/.pyenv/versions/3.8.6/envs/beam/lib/python3.8/site-packages/numpy/lib/function_base.py in _vectorize_call(self, func, args)    
   2184             res = func()                                                                                    
   2185         else:                                     
-> 2186             ufunc, otypes = self._get_ufunc_and_otypes(func=func, args=args)                                
   2187                                                   
   2188             # Convert args to object arrays first                                                           

~/.pyenv/versions/3.8.6/envs/beam/lib/python3.8/site-packages/numpy/lib/function_base.py in _get_ufunc_and_otypes(self, func, args)                                                                                                     
   2140             args = [asarray(arg) for arg in args] 
   2141             if builtins.any(arg.size == 0 for arg in args):                                                 
-> 2142                 raise ValueError('cannot call `vectorize` on size 0 inputs '                                
   2143                                  'unless `otypes` is set')                                                  
   2144                                                   
                                                                                                                    
ValueError: cannot call `vectorize` on size 0 inputs unless `otypes` is set

Issue Description

Some grouped aggregations raise a ValueError (from NumPy's vectorize code) when run on an empty DataFrame with an object-dtype column. I have seen this only with any and all, perhaps because other aggregations drop the object column by default.

I have seen this behavior only in 1.4.0rc1. The same code works in earlier pandas versions; I have not tested master.
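
The last frames of the traceback suggest the error can be reproduced with NumPy alone. The following is a minimal sketch, not the pandas code itself: the name `to_bool` and the empty object array standing in for the empty column are assumptions made for illustration.

```python
import numpy as np

# Same kind of call as in the traceback: a vectorized Python callable
# created without `otypes`.
to_bool = np.vectorize(lambda x: bool(x))

# Stand-in for the values of the empty object column (an assumption).
empty = np.array([], dtype=object)

try:
    to_bool(empty)
except ValueError as exc:
    # With no elements to inspect and no `otypes`, np.vectorize cannot
    # infer an output dtype and raises.
    error_message = str(exc)
else:
    error_message = None

print(error_message)
```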

Expected Behavior

Pandas should produce an empty result, as in previous versions.

Installed Versions

1.4.0rc1

@TheNeuralBit TheNeuralBit added Bug and Needs Triage labels Jan 6, 2022
@jreback jreback added this to the 1.4 milestone Jan 6, 2022
@jreback jreback added Groupby and removed Needs Triage labels Jan 6, 2022
@jreback
Contributor

jreback commented Jan 6, 2022

cc @jbrockmendel @rhshadrach

@rhshadrach
Member

@TheNeuralBit thanks for the report! Confirmed on master; passing otypes to np.vectorize fixes it. PR coming shortly.
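
For readers following along, here is a rough sketch of what setting otypes could look like. It is not the code from the linked PR. `objs_to_bool_sketch` and `is_missing` are hypothetical names, and `is_missing` is a simplified stand-in for the `isna` call seen in the traceback.

```python
import numpy as np


def is_missing(x):
    # Simplified stand-in for pandas' isna (assumption): None or float NaN.
    return x is None or (isinstance(x, float) and x != x)


def objs_to_bool_sketch(vals, skipna=True):
    """Hypothetical version of the pre-processing step from the traceback."""
    if skipna:
        # otypes=[bool] tells np.vectorize the output dtype up front, so it
        # does not need a sample element and accepts size-0 input.
        func = np.vectorize(
            lambda x: bool(x) if not is_missing(x) else True,
            otypes=[bool],
        )
        return func(vals)
    return vals.astype(bool, copy=False)


empty_result = objs_to_bool_sketch(np.array([], dtype=object))
filled_result = objs_to_bool_sketch(np.array(["a", "", None], dtype=object))
print(empty_result.dtype, empty_result.shape)
print(filled_result.tolist())
```

With the output dtype fixed in advance, the empty input yields an empty boolean array instead of raising, which matches the empty result the report expects.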

@rhshadrach rhshadrach self-assigned this Jan 8, 2022
simonjayhawkins added a commit to simonjayhawkins/pandas that referenced this issue Jan 11, 2022
@TheNeuralBit
Contributor Author

Thank you! I confirmed that this is fixed on master.
