Skip to content

BUG: groupby.cummin/max changing passed values to nan uncesessarily #15109

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jreback opened this issue Jan 11, 2017 · 3 comments
Closed

BUG: groupby.cummin/max changing passed values to nan uncesessarily #15109

jreback opened this issue Jan 11, 2017 · 3 comments
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions Groupby Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
Milestone

Comments

@jreback
Copy link
Contributor

jreback commented Jan 11, 2017

after #15053 is failing. (puzzled why its not actually failing on master though).

This is local in osx (3.5), so same as the travis test (though that used numpy 1.10.4), its possible that is the cause.

(pandas) bash-3.2$ nosetests pandas/tests/groupby/test_groupby.py  -s -m cummin_cummax
F
======================================================================
FAIL: test_cummin_cummax (test_groupby.TestGroupBy)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/jreback/pandas/pandas/tests/groupby/test_groupby.py", line 5798, in test_cummin_cummax
    tm.assert_frame_equal(result, expected)
  File "/Users/jreback/pandas/pandas/util/testing.py", line 1313, in assert_frame_equal
    obj='DataFrame.iloc[:, {0}]'.format(i))
  File "/Users/jreback/pandas/pandas/util/testing.py", line 1154, in assert_series_equal
    assert_attr_equal('dtype', left, right)
  File "/Users/jreback/pandas/pandas/util/testing.py", line 878, in assert_attr_equal
    left_attr, right_attr)
  File "/Users/jreback/pandas/pandas/util/testing.py", line 1018, in raise_assert_detail
    raise AssertionError(msg)
AssertionError: Attributes are different

Attribute "dtype" are different
[left]:  float64
[right]: int64

----------------------------------------------------------------------
Ran 1 test in 0.046s

FAILED (failures=1)

because the inf values are == iNaT, and we convert these on exit from the function as we are using these to mark NaT in all int64, not just for datetimelike.

jreback@2af5806 fixes, but is kludgey. I pre-check for iNaT an if so, convert to float.

A better soln is to pass in the is_datetimelike flag so that the check could be done internally in cython. IOW, if is_datetimelike==True, then go ahead and treat iNaT as null, otherwise ignore it. Ideally we would actually be able to return a correctly dtyped array in the first place, but the templating code won't allow this, e.g. we have a 1-1 mapping between in-out and they are done a-priori.

part of this fix could also remove the need for passing in accum as its wholly internal to the group by cython functions (as part of this could remove the fused typed, numeric and replace with templated code).

@jreback jreback added Bug Dtype Conversions Unexpected or buggy dtype conversions Groupby Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate labels Jan 11, 2017
@jreback jreback added this to the 0.20.0 milestone Jan 11, 2017
@jreback
Copy link
Contributor Author

jreback commented Jan 11, 2017

cc @mroeschke
@chris-b1

jreback added a commit to jreback/pandas that referenced this issue Jan 13, 2017
jreback added a commit that referenced this issue Jan 13, 2017
@jreback
Copy link
Contributor Author

jreback commented Jan 13, 2017

cc @mroeschke turns out we weren't testing groupby on travis after the refactor :<

I merged in #15127 which now will run these tests, but I skip the portions where this issue is causing things to fail.

@mroeschke
Copy link
Member

Ah so that's why that test didn't trip during the build. Thanks for keeping me posted @jreback

I'll uncomment those tests once I look into the fix.

AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this issue Mar 21, 2017
AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this issue Mar 21, 2017
closes pandas-dev#15109

Author: Matt Roeschke <[email protected]>

Closes pandas-dev#15205 from mroeschke/fix_15109 and squashes the following commits:

717afb4 [Matt Roeschke] BUG: cummin/cummax nans inf values for int64
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions Groupby Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants