BUG: EWM silently failed float32 #42650

debnathshoham · 2021-07-21T18:14:18Z

closes BUG: pandas EWM fails silently if data types are float32 instead of float64 #42452
tests added / passed
Ensure all linting tests pass, see here for how to run them
whatsnew entry

jbrockmendel · 2021-07-23T23:55:19Z

mroeschke · 2021-07-24T04:55:22Z

pandas/tests/window/test_ewm.py

+
+
+@pytest.mark.parametrize("func", ["mean", "std", "var"])
+@pytest.mark.parametrize("dtype", [np.float32, np.float16, np.float64, float, "float"])


Could you use the float_dtype pytest fixture instead?

I don't see a float_dtype fixture. Are you suggesting to add that?

My bad. Added.

mroeschke · 2021-07-24T04:57:00Z

pandas/tests/window/test_ewm.py

+@pytest.mark.parametrize("dtype", [np.float32, np.float16, np.float64, float, "float"])
+def test_float_dtype_ewma(dtype, func):
+    # GH#42452
+    df = DataFrame(np.random.rand(20, 3), dtype=dtype)


Could you use constant data, build an expected = DataFrame(...) and use tm.assert_frame_equal(result, expected)?
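A minimal sketch of the kind of test being requested here, assuming fixed input data and a hand-computed expected frame. The helper name check_ewm_mean_float_dtype is hypothetical, the public pd.testing.assert_frame_equal stands in for the internal tm.assert_frame_equal, and the sketch uses the default axis for simplicity; it is not the test that was merged.

```python
import numpy as np
import pandas as pd


def check_ewm_mean_float_dtype(dtype):
    # Constant input instead of np.random.rand, so the expected values are fixed.
    df = pd.DataFrame({"a": range(5)}, dtype=dtype)
    result = df.ewm(alpha=0.5).mean()

    # Hand-computed for alpha=0.5 with the default adjust=True:
    # y_t = sum(0.5**i * x_{t-i}) / sum(0.5**i)
    expected = pd.DataFrame({"a": [0.0, 2 / 3, 10 / 7, 34 / 15, 98 / 31]})

    # check_dtype=False: this sketch only compares values, not the result dtype.
    pd.testing.assert_frame_equal(result, expected, check_dtype=False)
    return result


# Hypothetical driver loop; in the test suite a fixture would supply the dtype.
for dtype in (np.float32, np.float64):
    check_ewm_mean_float_dtype(dtype)
```

Comparing against fixed values, rather than only checking the shape, is what lets the test catch a result that is silently wrong.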

mroeschke

Also needs a whatsnew entry for 1.4

jreback

we certainly don't support float16; this is not actually testing the result is correct

debnathshoham · 2021-07-24T15:56:17Z

Hi @jreback . Sorry I don't follow.

we certainly don't support float16

Are you saying I should remove float16 here?
converted_dtypes.extend([np.float64, np.float32, np.float16])

this is not actually testing the result is correct

I have changed the earlier test (where I was just comparing the shape) to use constant data. I am now comparing the result, as @mroeschke suggested.

mroeschke · 2021-07-24T20:08:25Z

pandas/core/frame.py

+                elif dtype == "float":
+                    # GH#42452 : np.dtype("float") coerces to np.float64 from Numpy 1.20
+                    converted_dtypes.extend(
+                        [np.float64, np.float32, np.float16]  # type: ignore[list-item]


I think here @jreback means we don't explicitly support np.float16. We would just like to capture np.float32/64

mroeschke · 2021-07-24T20:13:09Z

pandas/tests/window/test_ewm.py

+@pytest.mark.parametrize("func", ["mean", "std", "var"])
+def test_float_dtype_ewma(func, float_dtype):
+    # GH#42452
+    expected_mean = DataFrame(


Could you parameterize the expected results too like:

@pytest.mark.parametrize("func, expected_data", [["mean", [...]], ["std", [...]], ...])

Testing on 1 column of data is fine as well e.g. range(5)

jreback · 2021-07-25T13:59:29Z

pandas/core/frame.py

@@ -4279,6 +4279,11 @@ def check_int_infer_dtype(dtypes):
                    # error: Argument 1 to "append" of "list" has incompatible type
                    # "Type[signedinteger[Any]]"; expected "Type[signedinteger[Any]]"
                    converted_dtypes.append(np.int64)  # type: ignore[arg-type]
+                elif dtype == "float":


what do you think this is actually doing? the else clause below should handle this, no?

this is really far down the chain from the caller. what exactly is calling this? this has all kinds of implications changing here (which is really puzzling why nothing else breaks)

This is being called below in obj.select_dtypes

pandas/pandas/core/window/rolling.py

Lines 218 to 232 in b5bceb7

def _create_data(self, obj: FrameOrSeries) -> FrameOrSeries:
    """
    Split data into blocks & return conformed data.
    """
    # filter out the on from the object
    if self.on is not None and not isinstance(self.on, Index) and obj.ndim == 2:
        obj = obj.reindex(columns=obj.columns.difference([self.on]), copy=False)
    if self.axis == 1:
        # GH: 20649 in case of mixed dtype and axis=1 we have to convert everything
        # to float to calculate the complete row at once. We exclude all non-numeric
        # dtypes.
        obj = obj.select_dtypes(include=["integer", "float"], exclude=["timedelta"])
        obj = obj.astype("float64", copy=False)
        obj._mgr = obj._mgr.consolidate()
    return obj

Yes, ideally the else should have handled this. But infer_dtype_from_object calls pandas_dtype, which ultimately calls np.dtype(dtype) as below (which resolves "float" to np.float64)

pandas/pandas/core/dtypes/common.py

Line 1775 in b5bceb7

npdtype = np.dtype(dtype)
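A small sketch of the behaviour described above, assuming only public NumPy and pandas calls. The helper selected_columns and the column names are hypothetical and exist only for illustration.

```python
import numpy as np
import pandas as pd

# The string "float" resolves to a single concrete dtype, float64.
float_alias = np.dtype("float")
resolves_to_float64 = float_alias == np.float64
resolves_to_float32 = float_alias == np.float32


def selected_columns(include):
    # Hypothetical helper: report which columns select_dtypes keeps.
    df = pd.DataFrame(
        {
            "f32": np.array([1.0, 2.0], dtype=np.float32),
            "f64": np.array([1.0, 2.0], dtype=np.float64),
        }
    )
    return list(df.select_dtypes(include=include).columns)


print(resolves_to_float64, resolves_to_float32)
# Whether "f32" appears here depends on the pandas version in use.
print(selected_columns(["float"]))
# np.floating is the abstract parent of both widths, so both columns match.
print(selected_columns([np.floating]))
```

Because "float" maps to exactly one concrete type, a float32 column is not a subclass match for it, which is why the string needs special handling before it reaches infer_dtype_from_object.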

ok can you add some unit tests which exercise this particular piece of code (you will have to look for which ones)

@jreback added another relevant test. Please let me know if this is fine.

While the added tests were good, I think what was meant was tests for the select_dtypes method specifically.

@mroeschke added a test for select_dtypes. Please let me know if this is ok.

simonjayhawkins · 2021-08-03T10:31:58Z

pandas/tests/window/test_rolling.py

@@ -1424,3 +1424,11 @@ def test_rolling_zero_window():
    result = s.rolling(0).min()
    expected = Series([np.nan])
    tm.assert_series_equal(result, expected)
+
+
+def test_rolling_float_dtype(float_dtype):


this worked in 1.1.5, see #42452 (comment)

we may want to consider backporting this fix.

simonjayhawkins · 2021-08-03T10:34:42Z

@debnathshoham does this fix also close #41779?

debnathshoham · 2021-08-03T13:27:53Z

hi @simonjayhawkins,
I am getting the below right now (i.e. column a, which is float16, is getting dropped)

      b     c     d     e     f
0   1.0   1.0   2.0   3.0   4.0
1   7.0   7.0   8.0   9.0  10.0
2  13.0  13.0  14.0  15.0  16.0
3  19.0  19.0  20.0  21.0  22.0

Adding float16 to the converted_dtypes list would fix that as well.

debnathshoham · 2021-08-03T13:37:07Z

Alternatively, I can see that just replacing ["integer", "float"] with "number" in the line below also fixes both (as suggested in the other BUG)
obj = obj.select_dtypes(include=["integer", "float"], exclude=["timedelta"])
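A rough sketch of the "number" alternative mentioned above, assuming a small frame with mixed dtypes. The column names are hypothetical, and this only illustrates what select_dtypes keeps; it is not the change that was merged.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "f16": np.array([1.0, 2.0], dtype=np.float16),
        "f32": np.array([1.0, 2.0], dtype=np.float32),
        "f64": np.array([1.0, 2.0], dtype=np.float64),
        "i64": np.array([1, 2], dtype=np.int64),
        "td": pd.to_timedelta([1, 2], unit="s"),
        "obj": ["x", "y"],
    }
)

# "number" matches every NumPy numeric scalar type regardless of width.
# timedelta64 is also a np.number subclass, hence the explicit exclude.
numeric = df.select_dtypes(include=["number"], exclude=["timedelta"])
kept = list(numeric.columns)
print(kept)
```

With this selection the float and integer columns of every width should be kept, while the timedelta and object columns are dropped.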

jreback · 2021-08-03T13:52:01Z

we are not supporting float16 in any way

debnathshoham · 2021-08-03T18:27:54Z

Hi @jreback, then this patch should work fine. Pls let me know if you think of any other changes I should make.

jreback

lgtm. @mroeschke

mroeschke · 2021-08-04T04:23:34Z

Thanks @debnathshoham

* BUG: EWM silently failed float32
* added tests
* resolved mypy error
* added constant data in test
* added pytest.fixture & whatsnew
* parametrized expected df; removed float16
* added test for float32
* added tests on select_dtypes

mroeschke reviewed Jul 24, 2021

jreback requested changes Jul 24, 2021

mroeschke requested changes Jul 24, 2021

mroeschke reviewed Jul 24, 2021

debnathshoham requested a review from mroeschke July 25, 2021 12:28

jreback added the Bug, Dtype Conversions (Unexpected or buggy dtype conversions), and Window (rolling, ewma, expanding) labels Jul 25, 2021

jreback added this to the 1.4 milestone Jul 25, 2021

jreback requested changes Jul 25, 2021

debnathshoham requested a review from jreback July 26, 2021 13:15

debnathshoham added 7 commits July 31, 2021 13:31

BUG: EWM silently failed float32

78c8d83

added tests

3917ab8

resolved mypy error

346e03f

added constant data in test

2677e83

added pytest.fixture & whatsnew

a0a531e

parametrized expected df; removed float16

e2e4fb6

added test for float32

aafef93

debnathshoham force-pushed the 42452 branch from 03d7cbf to aafef93 Compare July 31, 2021 08:02

added tests on select_dtypes

c866bed

simonjayhawkins mentioned this pull request Aug 3, 2021

BUG: pandas EWM fails silently if data types are float32 instead of float64 #42452

Closed

1 task

simonjayhawkins reviewed Aug 3, 2021

jreback approved these changes Aug 4, 2021

mroeschke approved these changes Aug 4, 2021

mroeschke merged commit 226876a into pandas-dev:master Aug 4, 2021

debnathshoham deleted the 42452 branch August 4, 2021 04:31
