Skip to content

BUG: EWM silently failed float32 #42650

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Aug 4, 2021
Merged

Conversation

debnathshoham
Copy link
Member

@debnathshoham debnathshoham commented Jul 21, 2021

@jbrockmendel
Copy link
Member

cc @mroeschke



@pytest.mark.parametrize("func", ["mean", "std", "var"])
@pytest.mark.parametrize("dtype", [np.float32, np.float16, np.float64, float, "float"])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use the float_dtype pytest fixture instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a float_dtype fixture. Are you suggesting to add that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad. Added.

@pytest.mark.parametrize("dtype", [np.float32, np.float16, np.float64, float, "float"])
def test_float_dtype_ewma(dtype, func):
# GH#42452
df = DataFrame(np.random.rand(20, 3), dtype=dtype)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use constant data, build a expected = DataFrame(...) and use tm.assert_frame_equal(result, expected)?

Copy link
Member

@mroeschke mroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also needs a whatsnew entry for 1.4

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we certainly don't support float16; this is not actually testing the result is correct

@debnathshoham
Copy link
Member Author

Hi @jreback . Sorry I don't follow.

we certainly don't support float16

Are you saying I should remove float16 here?
converted_dtypes.extend([np.float64, np.float32, np.float16])

this is not actually testing the result is correct

I have changed the earlier test (where I was just comparing the shape), to include constant data. I am comparing the result now(after @mroeschke suggested).

elif dtype == "float":
# GH#42452 : np.dtype("float") coerces to np.float64 from Numpy 1.20
converted_dtypes.extend(
[np.float64, np.float32, np.float16] # type: ignore[list-item]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think here @jreback means we don't explicitly support np.float64. We would just like to capture np.float32/64

@pytest.mark.parametrize("func", ["mean", "std", "var"])
def test_float_dtype_ewma(func, float_dtype):
# GH#42452
expected_mean = DataFrame(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you parameterize the expected results too like:

@pytest.mark.parametrize(
    "func, expected_data", 
    [["mean", [...]],
     ["std", [...],
     ...])

Testing on 1 column of data is fine as well e.g. range(5)

@debnathshoham debnathshoham requested a review from mroeschke July 25, 2021 12:28
@jreback jreback added Bug Dtype Conversions Unexpected or buggy dtype conversions Window rolling, ewma, expanding labels Jul 25, 2021
@jreback jreback added this to the 1.4 milestone Jul 25, 2021
@@ -4279,6 +4279,11 @@ def check_int_infer_dtype(dtypes):
# error: Argument 1 to "append" of "list" has incompatible type
# "Type[signedinteger[Any]]"; expected "Type[signedinteger[Any]]"
converted_dtypes.append(np.int64) # type: ignore[arg-type]
elif dtype == "float":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you think this is actually doing? the else clase below should handle no?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is really far down the chain from the caller. what exactly is calling this? this has all kinds of implications changing here (which is really puzzling why nothing else breaks)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is being called below in obj.select_dtypes

def _create_data(self, obj: FrameOrSeries) -> FrameOrSeries:
"""
Split data into blocks & return conformed data.
"""
# filter out the on from the object
if self.on is not None and not isinstance(self.on, Index) and obj.ndim == 2:
obj = obj.reindex(columns=obj.columns.difference([self.on]), copy=False)
if self.axis == 1:
# GH: 20649 in case of mixed dtype and axis=1 we have to convert everything
# to float to calculate the complete row at once. We exclude all non-numeric
# dtypes.
obj = obj.select_dtypes(include=["integer", "float"], exclude=["timedelta"])
obj = obj.astype("float64", copy=False)
obj._mgr = obj._mgr.consolidate()
return obj

Yes, ideally the else should have handled this. But this infer_dtype_from_object calls pandas_dtype, which ultimately calls np.dtype(dtype) as below (which defaults to np.float64)

npdtype = np.dtype(dtype)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok can you add some unit tests which exercise this particular piece of code (you will have to look for which ones)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback added another relevant test. Please let me know if this is fine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While the added tests were good. I think what was meant was tests for the select_dtypes method specifically.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mroeschke added test for select_dtypes. Please let me know if this is ok?

@debnathshoham debnathshoham requested a review from jreback July 26, 2021 13:15
@@ -1424,3 +1424,11 @@ def test_rolling_zero_window():
result = s.rolling(0).min()
expected = Series([np.nan])
tm.assert_series_equal(result, expected)


def test_rolling_float_dtype(float_dtype):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this worked in 1.1.5, see #42452 (comment)

we may want to consider backporting this fix.

@simonjayhawkins
Copy link
Member

@debnathshoham does this fix also close #41779

@debnathshoham
Copy link
Member Author

hi @simonjayhawkins,
I am getting the below right now (i.e. columns a with float16 is getting dropped)

      b     c     d     e     f
0   1.0   1.0   2.0   3.0   4.0
1   7.0   7.0   8.0   9.0  10.0
2  13.0  13.0  14.0  15.0  16.0
3  19.0  19.0  20.0  21.0  22.0

Adding float16 as well to the converted_dtypes list, would fix that as well.

@debnathshoham
Copy link
Member Author

alternatively, i can see that just replacing with "number" below also fixes both (as suggested in the other BUG)
obj = obj.select_dtypes(include=["integer", "float"], exclude=["timedelta"])

@jreback
Copy link
Contributor

jreback commented Aug 3, 2021

we are not supporting float16 in any way

@debnathshoham
Copy link
Member Author

Hi @jreback, then this patch should work fine. Pls let me know if you think of any other changes I should make.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. @mroeschke

@mroeschke mroeschke merged commit 226876a into pandas-dev:master Aug 4, 2021
@mroeschke
Copy link
Member

Thanks @debnathshoham

@debnathshoham debnathshoham deleted the 42452 branch August 4, 2021 04:31
feefladder pushed a commit to feefladder/pandas that referenced this pull request Sep 7, 2021
* BUG: EWM silently failed float32

* added tests

* resolved mypy error

* added constant data in test

* added pytest.fixture & whatsnew

* parametrized expected df; removed float16

* added test for float32

* added tests on select_dtypes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions Window rolling, ewma, expanding
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: pandas EWM fails silently if data types are float32 instead of float64
5 participants