
CI: numpydev failure #22960

Closed
TomAugspurger opened this issue Oct 3, 2018 · 1 comment
Labels
CI Continuous Integration

@TomAugspurger

https://travis-ci.org/pandas-dev/pandas/jobs/436434493#L2766

____________ TestSelection.test_groupby_duplicated_column_errormsg _____________
[gw1] linux -- Python 3.7.0 /home/travis/miniconda3/envs/pandas/bin/python
self = <pandas.tests.groupby.test_grouping.TestSelection object at 0x7fb0ebdafeb8>
    def test_groupby_duplicated_column_errormsg(self):
        # GH7511
        df = DataFrame(columns=['A', 'B', 'A', 'C'],
                       data=[range(4), range(2, 6), range(0, 8, 2)])
    
>       pytest.raises(ValueError, df.groupby, 'A')
pandas/tests/groupby/test_grouping.py:42: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
pandas/core/generic.py:7152: in groupby
    observed=observed, **kwargs)
pandas/core/groupby/groupby.py:1966: in groupby
    return klass(obj, by, **kwds)
pandas/core/groupby/groupby.py:365: in __init__
    mutated=self.mutated)
pandas/core/groupby/grouper.py:588: in _get_grouper
    if is_categorical_dtype(gpr) and len(gpr) != obj.shape[axis]:
pandas/core/dtypes/common.py:574: in is_categorical_dtype
    return CategoricalDtype.is_dtype(arr_or_dtype)
pandas/core/dtypes/base.py:93: in is_dtype
    return cls.construct_from_string(dtype) is not None
pandas/core/dtypes/dtypes.py:359: in construct_from_string
    if string == 'category':
pandas/core/ops.py:1923: in f
    try_cast=False)
pandas/core/frame.py:4930: in _combine_const
    return ops.dispatch_to_series(self, other, func)
pandas/core/ops.py:1714: in dispatch_to_series
    new_data = expressions.evaluate(column_op, str_rep, left, right)
pandas/core/computation/expressions.py:205: in evaluate
    return _evaluate(op, op_str, a, b, **eval_kwargs)
pandas/core/computation/expressions.py:65: in _evaluate_standard
    return op(a, b)
pandas/core/ops.py:1694: in column_op
    for i in range(len(a.columns))}
pandas/core/ops.py:1694: in <dictcomp>
    for i in range(len(a.columns))}
pandas/core/ops.py:1546: in wrapper
    res = na_op(values, other)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
x = array([0, 2, 0]), y = 'category'
    def na_op(x, y):
        # TODO:
        # should have guarantess on what x, y can be type-wise
        # Extension Dtypes are not called here
    
        # Checking that cases that were once handled here are no longer
        # reachable.
        assert not (is_categorical_dtype(y) and not is_scalar(y))
    
        if is_object_dtype(x.dtype):
            result = _comp_method_OBJECT_ARRAY(op, x, y)
    
        elif is_datetimelike_v_numeric(x, y):
            return invalid_comparison(x, y, op)
    
        else:
    
            # we want to compare like types
            # we only want to convert to integer like if
            # we are not NotImplemented, otherwise
            # we would allow datetime64 (but viewed as i8) against
            # integer comparisons
    
            # we have a datetime/timedelta and may need to convert
            assert not needs_i8_conversion(x)
            mask = None
            if not is_scalar(y) and needs_i8_conversion(y):
                mask = isna(x) | isna(y)
                y = y.view('i8')
                x = x.view('i8')
    
            method = getattr(x, op_name, None)
            if method is not None:
                with np.errstate(all='ignore'):
>                   result = method(y)
E                   FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
pandas/core/ops.py:1429: FutureWarning

I haven't been able to reproduce locally yet.

@TomAugspurger TomAugspurger added the CI Continuous Integration label Oct 3, 2018
@TomAugspurger

I can reproduce locally now. Not sure what was going on earlier.

#22901 introduced the failure, but we didn't catch it there because the numpydev build is in the "allowed failures" section of the CI matrix. I think we should move it out of that section so that a failure there fails the build.

I think the root cause is calling CategoricalDtype.construct_from_string(dtype) on a DataFrame. We used to make that call inside a bare except:, but now we catch only TypeError, so the FutureWarning escapes. Ideally we wouldn't call it on data in the first place, since comparing arbitrary data to a string gives unpredictable results.
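The effect of narrowing the handler can be sketched without pandas. This is a simplified model, not the pandas source: WarnOnCompare is a hypothetical stand-in for the DataFrame, from_string_bare and from_string_typeerror approximate the old and new construct_from_string, and is_dtype treats any TypeError as "not this dtype". It assumes, as in the failing run, that the FutureWarning surfaces as an exception; here warnings.simplefilter("error") arranges that.

```python
import warnings


class WarnOnCompare:
    """Hypothetical stand-in for a DataFrame: comparing it to a string
    emits a FutureWarning instead of returning an elementwise result."""

    def __eq__(self, other):
        warnings.warn("elementwise comparison failed; returning scalar instead",
                      FutureWarning)
        return False


def from_string_bare(string):
    # Old behaviour (simplified): a bare except hides *any* exception raised by
    # the comparison, including a FutureWarning promoted to an error.
    try:
        if string == "category":
            return "category-dtype"
    except:  # noqa: E722
        pass
    raise TypeError("cannot construct a dtype from %r" % (string,))


def from_string_typeerror(string):
    # New behaviour (simplified): only TypeError is suppressed,
    # so a promoted FutureWarning propagates to the caller.
    try:
        if string == "category":
            return "category-dtype"
    except TypeError:
        pass
    raise TypeError("cannot construct a dtype from %r" % (string,))


def is_dtype(construct, obj):
    # Simplified is_dtype: a TypeError just means "not this dtype".
    try:
        return construct(obj) is not None
    except TypeError:
        return False


with warnings.catch_warnings():
    warnings.simplefilter("error")  # turn warnings into exceptions for this block
    print(is_dtype(from_string_bare, WarnOnCompare()))  # prints False
    try:
        is_dtype(from_string_typeerror, WarnOnCompare())
    except FutureWarning as exc:
        # prints: escaped: elementwise comparison failed; returning scalar instead
        print("escaped:", exc)
```

With the bare except the warning is silently swallowed and the object is simply "not categorical"; with except TypeError the same warning escapes through is_dtype, which matches the traceback above.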

ipdb> pp string
   A  A
0  0  2
1  2  4
2  0  4

The comparison then fails:

ipdb> string == 'category'
*** FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison

That FutureWarning is elevated to an error. I didn't realize this, but once a warning is raised as an error it can be caught like any other exception, so the simplest fix is to also catch FutureWarning here. That feels fragile, though, so I'll investigate a better fix...
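The two options can be compared in a short sketch. Both functions are hypothetical and simplified, not pandas code: from_string_catch_warning is the "also catch FutureWarning" quick fix, and from_string_guarded is one possible alternative, assumed here for illustration, that rejects non-string input before any comparison happens, in the spirit of not calling this on data at all. WarnOnCompare again stands in for the DataFrame.

```python
import warnings


class WarnOnCompare:
    """Hypothetical stand-in for a DataFrame whose == 'category' warns."""

    def __eq__(self, other):
        warnings.warn("elementwise comparison failed; returning scalar instead",
                      FutureWarning)
        return False


def from_string_catch_warning(string):
    # Quick fix: also swallow a FutureWarning raised as an error.
    # Fragile: it depends on which warning the comparison happens to emit.
    try:
        if string == "category":
            return "category-dtype"
    except (TypeError, FutureWarning):
        pass
    raise TypeError("cannot construct a dtype from %r" % (string,))


def from_string_guarded(string):
    # Alternative: never compare arbitrary data; reject non-strings up front,
    # so no comparison (and no warning) happens at all.
    if not isinstance(string, str):
        raise TypeError("expected a string, got %s" % type(string).__name__)
    if string == "category":
        return "category-dtype"
    raise TypeError("cannot construct a dtype from %r" % (string,))


with warnings.catch_warnings():
    warnings.simplefilter("error")  # a FutureWarning now surfaces as an exception
    for construct in (from_string_catch_warning, from_string_guarded):
        try:
            construct(WarnOnCompare())
        except TypeError as exc:
            # prints "<function name> -> TypeError" for both functions
            print(construct.__name__, "->", type(exc).__name__)
```

Both end in a TypeError, but only the guarded version avoids relying on the exact warning NumPy or pandas emits for a failed elementwise comparison.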
