API: make min/max on empty datetime df consistent with datetime serie… #33911

CloseChoice · 2020-05-01T08:41:32Z

closes BUG: min/max of empty datetime dataframe raises #33704
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

Fixes the following issue:

import pandas as pd
df = pd.DataFrame(dict(x=pd.to_datetime([])))
df.max()

throws a ValueError but

pd.Series(pd.to_datetime([])).max()

results in NaT.
With this change, calling max on a DataFrame built from an empty pd.to_datetime([]) results in

In[1]: df = pd.DataFrame(dict(x=pd.to_datetime([])))
In[2]: df.max()
Out[2]:
x   NaT
dtype: datetime64[ns]
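Below is a minimal script that checks the behaviour described above. It assumes a pandas release that contains this fix (the 1.1 milestone or later), and the variable names are only illustrative. It runs the same reduction on a Series and on a DataFrame built from identical empty datetime data.

```python
import pandas as pd

# Empty datetime64 data, as in the original report (GH 33704)
empty_dt = pd.to_datetime([])

# Series reduction: returns the NaT scalar
series_max = pd.Series(empty_dt).max()

# DataFrame reduction: with the fix, one NaT per column instead of a ValueError
frame_max = pd.DataFrame(dict(x=empty_dt)).max()

print(series_max)
print(frame_max)
```

The DataFrame result is indexed by column name, so `frame_max["x"]` is the value to compare with `series_max`.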


jreback

pls add a whatsnew note as well, 1.1 bug fixes datetimelike section

jreback · 2020-05-01T12:43:06Z

pandas/core/nanops.py

@@ -384,8 +384,7 @@ def _na_for_min_count(
    else:
        assert axis is not None  # assertion to make mypy happy
        result_shape = values.shape[:axis] + values.shape[axis + 1 :]
-        result = np.empty(result_shape, dtype=values.dtype)
-        result.fill(fill_value)
+        result = np.full(result_shape, fill_value)


does this not have a dtype parameter? does it matter?

The `result` array here is what causes all the trouble. We set its dtype to datetime64, but our NaT is treated as NaN, which NumPy internally tries to cast to an integer. That cast raises the error:
ValueError: cannot convert float NaN to integer.
The conversion happens somewhere inside the C part of NumPy, which I'm not familiar with.

But to come back to your question: for our result, specifying the dtype is not necessary, since the outputs all match the dtype of the input. Good point, though; I should probably add a dtype check to my tests.
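The dtype question can be illustrated with plain NumPy. NumPy has its own missing-value scalar for datetimes, `np.datetime64("NaT")`, and it can fill a `datetime64[ns]` array directly while keeping the dtype. The error quoted above comes from a float NaN being cast to an integer instead. This is only a sketch of NumPy's dtype handling, assuming a native NaT fill value. It is not the code merged in this PR, which fills with `pd.NaT`.

```python
import numpy as np

shape = (3,)

# NumPy's native NaT fills a datetime64 array and the dtype is kept
dt_result = np.full(shape, np.datetime64("NaT"), dtype="datetime64[ns]")
print(dt_result.dtype, np.isnat(dt_result).all())

# For float data, NaN is the natural missing value and fills without issue
float_result = np.full(shape, np.nan, dtype="float64")
print(float_result.dtype, np.isnan(float_result).all())
```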

jreback · 2020-05-01T12:48:12Z

pandas/tests/arithmetic/test_datetime64.py

+def test_sum_empty_df_series():
+    # Calling the following defined sum function returned an error for dataframes but
+    # returned NaT for series. # Check that the API is consistent in this sense when
+    # operating on empty Series/DataFrames. See GH:33704 for more information


frame tests go here (you can rename slightly); these should eventually be moved into test_reductions.py

class TestDataFrameReductions:
    def test_min_max_dt64_with_NaT

in tests/frame/test_analytics.py

series are here:
tests/series/test_reductions.py

jreback · 2020-05-01T12:50:33Z

pandas/tests/arithmetic/test_datetime64.py

+    # returned NaT for series. # Check that the API is consistent in this sense when
+    # operating on empty Series/DataFrames. See GH:33704 for more information
+    df = pd.DataFrame(dict(x=pd.to_datetime([])))
+    series = pd.Series(pd.to_datetime([]))


for frame also need to test both axis=

pep8speaks · 2020-05-01T15:54:38Z

Hello @CloseChoice! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-05-07 22:16:22 UTC

CloseChoice · 2020-05-01T15:59:56Z

pandas/tests/reductions/test_reductions.py

+        assert (df.max(axis=0).x is NaT) == (expected_dt_series.max() is NaT)
+
+        # check axis 1
+        tm.assert_series_equal(df.min(axis=1), expected_dt_series)


I just want to mention that calling min with axis=1 on an empty DataFrame does not return the same result as calling it on a Series. Given how reduce operations are implemented, this is currently not possible, and I don't think it is desirable. This test checks that a DataFrame behaves like a Series when reducing over the index (axis=0).
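This sketch shows the axis behaviour described above and assumes pandas 1.1 or later. A reduction along axis=0 returns one value per column, here a single NaT. A reduction along axis=1 returns one value per row, and an empty frame has no rows.

```python
import pandas as pd

df = pd.DataFrame(dict(x=pd.to_datetime([])))

# axis=0: reduce over the (empty) index, giving one value per column
per_column = df.min(axis=0)

# axis=1: reduce across columns, giving one value per row; there are none
per_row = df.min(axis=1)

print(per_column)
print(per_row)
```

This is why the test compares the axis=0 result with the Series scalar but compares the axis=1 result with an empty Series.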


CloseChoice · 2020-05-04T11:02:33Z

pls add a whatsnew note as well, 1.1 bug fixes datetimelike section

done

jreback · 2020-05-06T23:10:06Z

doc/source/whatsnew/v1.1.0.rst

@@ -557,6 +557,7 @@ Datetimelike
 - Bug in :meth:`DatetimeIndex.intersection` losing ``freq`` and timezone in some cases (:issue:`33604`)
 - Bug in :class:`DatetimeIndex` addition and subtraction with some types of :class:`DateOffset` objects incorrectly retaining an invalid ``freq`` attribute (:issue:`33779`)
 - Bug in :class:`DatetimeIndex` where setting the ``freq`` attribute on an index could silently change the ``freq`` attribute on another index viewing the same data (:issue:`33552`)
+- Bug in :meth:`nanops._na_for_min_count` when called with empty :class:`DataFrame` of ``timedelta64`` dtype (:issue:`33911`)


can you make a user-facing note? IOW a user wants to know what changed; you are referring to an internal routine.

jreback · 2020-05-06T23:10:36Z

pandas/core/nanops.py

+        # calling np.full with dtype parameter throws an ValueError when called
+        # with np.datetime64 and pd.NaT
+        try:
+            result = np.full(result_shape, fill_value, dtype=values.dtype)


what hits each of these branches?

When np.full is called with dtype=np.datetime64 and fill_value=pd.NaT, it raises ValueError: cannot convert float NaN to integer. In that case I call np.full without an explicit dtype. Giving the result array an explicit dtype has no effect on the DataFrame column, but I thought it was cleaner to at least try to give it the correct one. I hoped the comment explained this; if not, I'll improve it.
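The two branches discussed here can be written as a standalone helper. `full_with_fallback` is a hypothetical name and not the pandas function. The sketch only mirrors the try/except idea from the reply. The reply reports a ValueError. TypeError is caught as well on the assumption that the exception type may differ across NumPy and Python versions. Whichever branch runs, the result should be an all-missing array of the requested shape.

```python
import numpy as np
import pandas as pd


def full_with_fallback(shape, fill_value, dtype):
    """Hypothetical helper mirroring the try/except branches from the review."""
    try:
        # Preferred branch: keep the dtype of the input values
        return np.full(shape, fill_value, dtype=dtype)
    except (TypeError, ValueError):
        # Fallback branch: the PR reports ValueError for datetime64 + pd.NaT
        # (TypeError is caught too as a precaution, an assumption here).
        # Without a dtype, np.full infers it from the fill value instead.
        return np.full(shape, fill_value)


result = full_with_fallback((2,), pd.NaT, np.dtype("datetime64[ns]"))
print(result.dtype, pd.isna(result).all())
```

Without an explicit dtype, NumPy infers the dtype from `pd.NaT` and typically produces an object array. This matches the reply's remark that the explicit dtype does not end up mattering for the DataFrame column.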

jreback · 2020-05-06T23:10:48Z

pandas/tests/frame/test_analytics.py

@@ -1258,3 +1258,26 @@ def test_min_max_dt64_with_NaT(self):
        res = df.max()
        exp = pd.Series([pd.NaT], index=["foo"])
        tm.assert_series_equal(res, exp)
+
+        # Calling the following sum functions returned an error for dataframes but


make a new test


jreback · 2020-05-09T20:01:06Z

thanks @CloseChoice

API: make min/max on empty datetime df consistent with datetime series (pandas-dev#33704)

6c255d0

jreback requested changes May 1, 2020


jreback added Missing-data and Datetime labels May 1, 2020

update nanops and tests as result of PR discussion

363898d

CloseChoice added 2 commits May 1, 2020 17:55

removed git tracking of test file

0d9a754

removed git tracking of test file

6e14ff8

CloseChoice commented May 1, 2020


CloseChoice added 5 commits May 1, 2020 18:24

added whatsnew entry

60ea00f

move tests to test_analytics.py

5e57597

removed blank line at-end-of-file test_analytics.py

c62921b

Merge branch 'master' into API-consistent-empty-datetime-max-or-min

b958e29

Merge branch 'master' of https://github.com/pandas-dev/pandas into API-consistent-empty-datetime-max-or-min

e309459

CloseChoice requested a review from jreback May 6, 2020 19:40

jreback requested changes May 6, 2020


CloseChoice added 3 commits May 7, 2020 20:46

splitted tests, update comment in nanops and update whatsnew entry

77fd5c6

update test_analytics for linting

9219ec3

Merge branch 'master' of https://github.com/pandas-dev/pandas into API-consistent-empty-datetime-max-or-min

6dfec8f

CloseChoice requested a review from jreback May 7, 2020 23:09

jreback added this to the 1.1 milestone May 9, 2020

jreback approved these changes May 9, 2020


jreback merged commit ddbeca6 into pandas-dev:master May 9, 2020

CloseChoice deleted the API-consistent-empty-datetime-max-or-min branch May 9, 2020 20:31

rhshadrach pushed a commit to rhshadrach/pandas that referenced this pull request May 10, 2020

API: make min/max on empty datetime df consistent with datetime serie… (pandas-dev#33911)

9dbbe30
