-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: EWM silently failed float32 #42650
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
debnathshoham
commented
Jul 21, 2021
•
edited
Loading
edited
- closes BUG: pandas EWM fails silently if data types are float32 instead of float64 #42452
- tests added / passed
- Ensure all linting tests pass, see here for how to run them
- whatsnew entry
cc @mroeschke |
pandas/tests/window/test_ewm.py
Outdated
|
||
|
||
@pytest.mark.parametrize("func", ["mean", "std", "var"]) | ||
@pytest.mark.parametrize("dtype", [np.float32, np.float16, np.float64, float, "float"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you use the float_dtype
pytest fixture instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see a float_dtype
fixture. Are you suggesting to add that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad. Added.
pandas/tests/window/test_ewm.py
Outdated
@pytest.mark.parametrize("dtype", [np.float32, np.float16, np.float64, float, "float"]) | ||
def test_float_dtype_ewma(dtype, func): | ||
# GH#42452 | ||
df = DataFrame(np.random.rand(20, 3), dtype=dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you use constant data, build a expected = DataFrame(...)
and use tm.assert_frame_equal(result, expected)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also needs a whatsnew entry for 1.4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we certainly don't support float16; this is not actually testing the result is correct
Hi @jreback . Sorry I don't follow.
Are you saying I should remove float16 here?
I have changed the earlier test (where I was just comparing the shape), to include constant data. I am comparing the result now(after @mroeschke suggested). |
pandas/core/frame.py
Outdated
elif dtype == "float": | ||
# GH#42452 : np.dtype("float") coerces to np.float64 from Numpy 1.20 | ||
converted_dtypes.extend( | ||
[np.float64, np.float32, np.float16] # type: ignore[list-item] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think here @jreback means we don't explicitly support np.float64
. We would just like to capture np.float32/64
pandas/tests/window/test_ewm.py
Outdated
@pytest.mark.parametrize("func", ["mean", "std", "var"]) | ||
def test_float_dtype_ewma(func, float_dtype): | ||
# GH#42452 | ||
expected_mean = DataFrame( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you parameterize the expected results too like:
@pytest.mark.parametrize(
"func, expected_data",
[["mean", [...]],
["std", [...],
...])
Testing on 1 column of data is fine as well e.g. range(5)
pandas/core/frame.py
Outdated
@@ -4279,6 +4279,11 @@ def check_int_infer_dtype(dtypes): | |||
# error: Argument 1 to "append" of "list" has incompatible type | |||
# "Type[signedinteger[Any]]"; expected "Type[signedinteger[Any]]" | |||
converted_dtypes.append(np.int64) # type: ignore[arg-type] | |||
elif dtype == "float": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what do you think this is actually doing? the else clase below should handle no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is really far down the chain from the caller. what exactly is calling this? this has all kinds of implications changing here (which is really puzzling why nothing else breaks)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is being called below in obj.select_dtypes
pandas/pandas/core/window/rolling.py
Lines 218 to 232 in b5bceb7
def _create_data(self, obj: FrameOrSeries) -> FrameOrSeries: | |
""" | |
Split data into blocks & return conformed data. | |
""" | |
# filter out the on from the object | |
if self.on is not None and not isinstance(self.on, Index) and obj.ndim == 2: | |
obj = obj.reindex(columns=obj.columns.difference([self.on]), copy=False) | |
if self.axis == 1: | |
# GH: 20649 in case of mixed dtype and axis=1 we have to convert everything | |
# to float to calculate the complete row at once. We exclude all non-numeric | |
# dtypes. | |
obj = obj.select_dtypes(include=["integer", "float"], exclude=["timedelta"]) | |
obj = obj.astype("float64", copy=False) | |
obj._mgr = obj._mgr.consolidate() | |
return obj |
Yes, ideally the else should have handled this. But this infer_dtype_from_object
calls pandas_dtype
, which ultimately calls np.dtype(dtype) as below (which defaults to np.float64)
pandas/pandas/core/dtypes/common.py
Line 1775 in b5bceb7
npdtype = np.dtype(dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok can you add some unit tests which exercise this particular piece of code (you will have to look for which ones)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback added another relevant test. Please let me know if this is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While the added tests were good. I think what was meant was tests for the select_dtypes
method specifically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mroeschke added test for select_dtypes
. Please let me know if this is ok?
@@ -1424,3 +1424,11 @@ def test_rolling_zero_window(): | |||
result = s.rolling(0).min() | |||
expected = Series([np.nan]) | |||
tm.assert_series_equal(result, expected) | |||
|
|||
|
|||
def test_rolling_float_dtype(float_dtype): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this worked in 1.1.5, see #42452 (comment)
we may want to consider backporting this fix.
@debnathshoham does this fix also close #41779 |
hi @simonjayhawkins,
Adding float16 as well to the |
alternatively, i can see that just replacing with "number" below also fixes both (as suggested in the other BUG) |
we are not supporting float16 in any way |
Hi @jreback, then this patch should work fine. Pls let me know if you think of any other changes I should make. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. @mroeschke
Thanks @debnathshoham |
* BUG: EWM silently failed float32 * added tests * resolved mypy error * added constant data in test * added pytest.fixture & whatsnew * parametrized expected df; removed float16 * added test for float32 * added tests on select_dtypes