
API: make min/max on empty datetime df consistent with datetime serie… #33911


Merged
Changes from 1 commit
3 changes: 1 addition & 2 deletions pandas/core/nanops.py
@@ -384,8 +384,7 @@ def _na_for_min_count(
     else:
         assert axis is not None  # assertion to make mypy happy
         result_shape = values.shape[:axis] + values.shape[axis + 1 :]
-        result = np.empty(result_shape, dtype=values.dtype)
-        result.fill(fill_value)
+        result = np.full(result_shape, fill_value)
Contributor

does this not have a dtype parameter? does it matter?

Member Author

Setting the dtype on result here is what causes all the trouble. We set the dtype to datetime, but our NaT is treated as NaN, which NumPy internally tries to cast to an integer, raising:
ValueError: cannot convert float NaN to integer
The conversion happens somewhere inside NumPy's C code, which I'm not familiar with.

To come back to your question: specifying the dtype is not necessary for our result, since the outputs all match the type of the input. Good point, though; I should probably add this check to my tests.
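A small standalone NumPy sketch (not the pandas code itself) of why dropping the explicit dtype works here: when no dtype is given, np.full takes the result dtype from the fill value, so a NaN fill yields float64 and a datetime64 NaT fill yields a datetime64 array. The shape (2,) is an arbitrary choice for illustration.

```python
import numpy as np

result_shape = (2,)  # arbitrary shape, for illustration only

# Without a dtype argument, np.full infers the dtype from the fill value.
float_result = np.full(result_shape, np.nan)
nat_result = np.full(result_shape, np.datetime64("NaT", "ns"))

print(float_result.dtype)  # float64
print(nat_result.dtype)    # datetime64[ns]

# An explicit dtype still works when it is compatible with the fill value.
explicit = np.full(result_shape, np.datetime64("NaT", "ns"), dtype="datetime64[ns]")
print(np.isnat(explicit).all())  # True
```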

         return result
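The result_shape line drops the reduced axis from the input shape, so every remaining position gets one missing value. The helper below, na_result_sketch, is hypothetical (it is not a pandas function) and only mirrors the patched 2-D branch on a plain NumPy array: reducing an empty (0, 1) datetime array over axis=0 gives one NaT, while axis=1 gives an empty result.

```python
import numpy as np


def na_result_sketch(values: np.ndarray, axis: int, fill_value) -> np.ndarray:
    """Hypothetical helper mirroring the patched branch; not part of pandas."""
    # Remove the reduced axis: (0, 1) reduced over axis 0 -> (1,), over axis 1 -> (0,).
    result_shape = values.shape[:axis] + values.shape[axis + 1 :]
    # As in the patched code, the dtype follows the fill value.
    return np.full(result_shape, fill_value)


empty_values = np.empty((0, 1), dtype="datetime64[ns]")
nat = np.datetime64("NaT", "ns")

print(na_result_sketch(empty_values, axis=0, fill_value=nat))        # ['NaT']
print(na_result_sketch(empty_values, axis=1, fill_value=nat).shape)  # (0,)
```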


15 changes: 15 additions & 0 deletions pandas/tests/arithmetic/test_datetime64.py
@@ -2471,3 +2471,18 @@ def test_dt64arr_addsub_object_dtype_2d():
     assert result2.shape == (4, 1)
     assert result2.freq is None
     assert (result2.asi8 == 0).all()
+
+
+def test_sum_empty_df_series():
+    # Calling min/max on an empty datetime DataFrame raised an error, while the
+    # same call on a Series returned NaT. Check that the API is consistent when
+    # operating on empty Series/DataFrames. See GH:33704 for more information.
Contributor

Frame tests go here (you can rename slightly), in tests/frame/test_analytics.py; these should eventually be moved into test_reductions.py:

class TestDataFrameReductions:
    def test_min_max_dt64_with_NaT

Series tests are here:
tests/series/test_reductions.py

Member Author

done

+    df = pd.DataFrame(dict(x=pd.to_datetime([])))
+    series = pd.Series(pd.to_datetime([]))
Contributor

For the frame, we also need to test both axis= values (0 and 1).

Member Author

done

+    assert (df.min().x is NaT) == (series.min() is NaT)
+    assert (df.max().x is NaT) == (series.max() is NaT)
+
+    df = pd.DataFrame(dict(x=[np.nan]))
+    series = pd.Series([np.nan])
+    assert np.isnan(df.min().x) == np.isnan(series.min())
+    assert np.isnan(df.max().x) == np.isnan(series.max())
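The reviewer asked for the frame case to be covered along both axes and grouped under TestDataFrameReductions. Below is a minimal sketch of such a test, assuming a pandas version that includes this fix. The method names are hypothetical, and the axis=1 expectation (an empty result, because the frame has no rows) is an assumption rather than something stated in this PR. The assertions also check each result directly instead of only comparing the frame and Series outcomes.

```python
import numpy as np
import pandas as pd


class TestDataFrameReductions:
    # Hypothetical test names following the reviewer's suggestion.
    def test_min_max_dt64_with_NaT_empty(self):
        df = pd.DataFrame({"x": pd.to_datetime([])})
        ser = pd.Series(pd.to_datetime([]))

        for method in ["min", "max"]:
            # axis=0 reduces over the (zero) rows: one NaT per column.
            assert getattr(df, method)(axis=0)["x"] is pd.NaT
            assert getattr(ser, method)() is pd.NaT

            # axis=1 reduces across columns per row; with no rows the
            # result is assumed to be empty.
            assert len(getattr(df, method)(axis=1)) == 0

    def test_min_max_all_nan_float(self):
        df = pd.DataFrame({"x": [np.nan]})
        ser = pd.Series([np.nan])
        for method in ["min", "max"]:
            assert np.isnan(getattr(df, method)()["x"])
            assert np.isnan(getattr(ser, method)())
```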