BUG: DataFrame with Int64 columns casts to float64 with .max()/.min() #34210

1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.1.0.rst
@@ -865,6 +865,7 @@ ExtensionArray
- Fixed bug where :meth:`StringArray.memory_usage` was not implemented (:issue:`33963`)
- Fixed bug where :meth:`DataFrameGroupBy` would ignore the ``min_count`` argument for aggregations on nullable boolean dtypes (:issue:`34051`)
- Fixed bug that `DataFrame(columns=.., dtype='string')` would fail (:issue:`27953`, :issue:`33623`)
- Fixed bug where :meth:`DataFrame.min` and :meth:`DataFrame.max` with Int64 columns casts to float64 (:issue:`32651`)

Other
^^^^^
2 changes: 1 addition & 1 deletion pandas/core/arrays/sparse/array.py
@@ -1220,7 +1220,7 @@ def any(self, axis=0, *args, **kwargs):

return values.any().item()

- def sum(self, axis=0, *args, **kwargs):
+ def sum(self, axis=0, min_count=0, *args, **kwargs):
"""
Sum of non-NA/null values
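
The hunk above only shows the new `min_count` parameter in the signature, not how the sparse array uses it. As a rough illustration of what `min_count` conventionally means in pandas reductions (the result is NA when fewer than `min_count` valid values are present), here is a plain-Python sketch. The helper name `sum_with_min_count` is hypothetical and is not the `SparseArray` implementation.

```python
import math


def sum_with_min_count(values, min_count=0):
    """Sum the non-missing values; return NaN if too few of them exist.

    Hypothetical helper for illustration only. Missing values are
    represented here as None or float NaN.
    """
    valid = [
        v
        for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    ]
    # With fewer valid entries than min_count, the result is "missing".
    if len(valid) < min_count:
        return math.nan
    return sum(valid)


print(sum_with_min_count([1, None, 2]))                # 3
print(sum_with_min_count([None, None]))                # 0
print(sum_with_min_count([None, None], min_count=1))   # nan
```

With the default `min_count=0`, an all-missing input sums to 0, which is why a caller that needs NA in that case has to pass `min_count=1`.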

4 changes: 3 additions & 1 deletion pandas/core/frame.py
@@ -8419,7 +8419,9 @@ def _get_data(axis_matters):
raise NotImplementedError(msg)
return data

- if numeric_only is not None and axis in [0, 1]:
+ if (
+ self._is_homogeneous_type and self._mgr.any_extension_types and axis == 0
Contributor

do we really need all of these conditions?

Member Author

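No, I don't think so: the extra conditions are hiding some bugs (or inconsistencies). For example, `DataFrame.all` succeeds on a categorical column while `Series.all` raises: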

>>> import pandas as pd
>>>
>>> pd.__version__
'1.1.0.dev0+1600.gdede2c7a1e'
>>>
>>> df = pd.DataFrame({"A": pd.Series([0, 1], dtype="category")})
>>> df
   A
0  0
1  1
>>>
>>> df.all()
A    False
dtype: bool
>>>
>>> df.A.all()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\simon\pandas\pandas\core\generic.py", line 11394, in logical_func
    return self._reduce(
  File "C:\Users\simon\pandas\pandas\core\series.py", line 4028, in _reduce
    return delegate._reduce(name, skipna=skipna, **kwds)
  File "C:\Users\simon\pandas\pandas\core\arrays\categorical.py", line 2076, in _reduce
    raise TypeError(f"Categorical cannot perform the operation {name}")
TypeError: Categorical cannot perform the operation all
>>>

Member Author

will need to fix #21020 first

+ ) or (numeric_only is not None and axis in [0, 1]):
Contributor

The `axis in [0, 1]` check is superfluous.

Member Author

`test_any_all_np_func` passes `axis=None` to reduce along both axes, so the `axis in [0, 1]` check is still needed.
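
To make the thread above easier to follow, the combined condition from the hunk can be written as a standalone predicate. This is only a sketch: `condition_holds` and its plain boolean arguments are hypothetical stand-ins for `self._is_homogeneous_type`, `self._mgr.any_extension_types`, `axis`, and `numeric_only`, and it says nothing about what the guarded branch does.

```python
def condition_holds(is_homogeneous, any_extension_types, axis, numeric_only):
    """Evaluate the boolean condition shown in the frame.py hunk.

    Hypothetical helper: the arguments stand in for the DataFrame
    attributes and keyword arguments used in the real code.
    """
    # New clause added by the PR: a single extension dtype, reduced along axis 0.
    extension_case = is_homogeneous and any_extension_types and axis == 0
    # Original clause: numeric_only was given explicitly and axis is 0 or 1.
    original_case = numeric_only is not None and axis in [0, 1]
    return extension_case or original_case


# An Int64-only frame reduced along axis 0 qualifies even with numeric_only=None.
print(condition_holds(True, True, 0, None))       # True
# axis=None (reduce over both axes) is excluded by the `axis in [0, 1]` check.
print(condition_holds(False, False, None, True))  # False
```

The second call is the case raised in the reply: without `axis in [0, 1]`, a call with `axis=None` and an explicit `numeric_only` would satisfy the original clause.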

df = self
if numeric_only is True:
df = _get_data(axis_matters=True)
13 changes: 13 additions & 0 deletions pandas/tests/frame/test_analytics.py
@@ -907,6 +907,19 @@ def test_mean_extensionarray_numeric_only_true(self):
expected = pd.DataFrame(arr).mean()
tm.assert_series_equal(result, expected)

@pytest.mark.parametrize("numeric_only", [True, False, None])
@pytest.mark.parametrize("method", ["min", "max"])
def test_minmax_extensionarray(self, method, numeric_only):
# https://github.com/pandas-dev/pandas/issues/32651
int64_info = np.iinfo("int64")
ser = Series([int64_info.max, None, int64_info.min], dtype=pd.Int64Dtype())
df = DataFrame({"Int64": ser})
result = getattr(df, method)(numeric_only=numeric_only)
expected = Series(
[getattr(int64_info, method)], index=pd.Index(["Int64"], dtype="object")
)
tm.assert_series_equal(result, expected)
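
The test builds its column from the two extremes of int64, which is what makes a silent float64 cast detectable. A minimal plain-Python sketch of the underlying arithmetic follows; it does not use pandas, and the constant names are illustrative only.

```python
INT64_MAX = 2**63 - 1
INT64_MIN = -(2**63)

# A float64 has a 53-bit significand, so INT64_MAX cannot be stored
# exactly and rounds to the nearest representable value, 2**63.
max_via_float = int(float(INT64_MAX))
# INT64_MIN is a power of two and survives the round trip unchanged.
min_via_float = int(float(INT64_MIN))

print(max_via_float == INT64_MAX)  # False
print(max_via_float - INT64_MAX)   # 1
print(min_via_float == INT64_MIN)  # True
```

A maximum routed through float64 would therefore be off by one from `int64_info.max`, so the exact comparison in the test would no longer hold.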

def test_stats_mixed_type(self, float_string_frame):
# don't blow up
float_string_frame.std(1)