API: make min/max on empty datetime df consistent with datetime serie… #33911
@@ -2471,3 +2471,18 @@ def test_dt64arr_addsub_object_dtype_2d():
    assert result2.shape == (4, 1)
    assert result2.freq is None
    assert (result2.asi8 == 0).all()


def test_sum_empty_df_series():
    # Calling the following defined sum function returned an error for dataframes but
    # returned NaT for series. Check that the API is consistent in this sense when
    # operating on empty Series/DataFrames. See GH:33704 for more information
Review comment: frame tests go here (you can rename slightly); these should eventually be moved into test_reductions.py class TestDataFrameReductions: series are here: in tests/frame/test_analytics.py
Reply: done

    df = pd.DataFrame(dict(x=pd.to_datetime([])))
    series = pd.Series(pd.to_datetime([]))
Review comment: for frame also need to test both axis=
Reply: done

    assert (df.min().x is NaT) == (series.min() is NaT)
    assert (df.max().x is NaT) == (series.max() is NaT)

    df = pd.DataFrame(dict(x=[np.nan]))
    series = pd.Series([np.nan])
    assert np.isnan(df.min().x) == np.isnan(series.min())
    assert np.isnan(df.max().x) == np.isnan(series.max())
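The intent of the test above can be tried outside the test suite with a short standalone sketch. It is not part of this PR: the helper name `empty_datetime_reductions` is hypothetical, and the sketch assumes a pandas version that already includes the fix, so that the DataFrame reductions return a missing value instead of raising. It covers only the default axis.

```python
import pandas as pd


def empty_datetime_reductions():
    """Collect min/max of an empty datetime Series and a one-column DataFrame.

    Hypothetical helper for illustration only; not code from the PR.
    """
    series = pd.Series(pd.to_datetime([]))
    df = pd.DataFrame({"x": pd.to_datetime([])})
    return {
        "series_min": series.min(),
        "series_max": series.max(),
        "frame_min": df.min()["x"],
        "frame_max": df.max()["x"],
    }


results = empty_datetime_reductions()
# With the fix in place, every reduction should yield a missing value (NaT),
# so Series and DataFrame behave consistently.
all_missing = all(pd.isna(value) for value in results.values())
print(all_missing)
```

Using `pd.isna` rather than an identity check against `NaT` keeps the sketch tolerant of how the missing value is represented.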
Review comment: Does this not have a dtype parameter? Does it matter?
Reply: The result parameter here is what causes all the trouble. We set the dtype to datetime, but our NaT is treated as NaN, which numpy internally tries to cast to an integer. That yields the error: ValueError: cannot convert float NaN to integer. The conversion happens somewhere inside the C part of numpy, which I'm not familiar with.
But to come back to your question: for our result, specifying the dtype is not necessary. The outputs all match the type of the input. Good point, though; I should probably add this check to my tests.
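The error message quoted in the reply is the same one plain Python raises when a float NaN is converted to an integer. The sketch below only reproduces that message with the standard library. It does not exercise the numpy code path the author describes, and the helper name `try_int` is hypothetical.

```python
import math


def try_int(value):
    """Attempt int(value); return the integer, or the error message on failure."""
    try:
        return int(value)
    except ValueError as exc:
        return str(exc)


message = try_int(math.nan)
print(message)  # cannot convert float NaN to integer
```

This shows why a missing datetime has to stay NaT: NaN has no integer representation, so any path that casts it to an integer fails.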