BUG: Fix ts precision issue with groupby and NaT (#19526) #19530

jbandlow · 2018-02-04T06:58:50Z

closes #19526 (Rounding errors with Timestamps and `.last()`)
tests added / passed
passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
whatsnew entry
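A minimal reproduction of the symptom in #19526, assuming pandas 0.23.0 or later (the milestone this fix was merged into). It reuses the timestamp from the PR's test. On a version that includes the fix, the surviving timestamp compares equal to the original, down to the nanosecond.

```python
import pandas as pd

# Same data as the regression test in this PR: group 0 holds a
# millisecond-precision timestamp, group 1 holds only NaT.
ts = pd.Timestamp('2016-10-14 21:00:44.557')
df = pd.DataFrame({'a': [0, 1], 'b': [ts, pd.NaT]})

result = df.groupby('a')['b'].last()
print(result.loc[0] == ts)     # True on versions that include this fix
print(pd.isna(result.loc[1]))  # True: the all-NaT group stays NaT
```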

codecov · 2018-02-04T08:16:59Z

Codecov Report

Merging #19530 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #19530      +/-   ##
==========================================
+ Coverage    91.6%    91.6%   +<.01%     
==========================================
  Files         150      150              
  Lines       48750    48750              
==========================================
+ Hits        44657    44658       +1     
+ Misses       4093     4092       -1

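| Flag | Coverage Δ | |
|------|------------|--|
| #multiple | `89.97% <100%> (ø)` | ⬆️ |
| #single | `41.75% <0%> (ø)` | ⬆️ |

| Impacted Files | Coverage Δ | |
|----------------|------------|--|
| pandas/core/groupby.py | `92.17% <100%> (ø)` | ⬆️ |
| pandas/io/parsers.py | `95.56% <0%> (+0.06%)` | ⬆️ |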

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 93c86aa...2fb23d6.

jreback · 2018-02-04T15:16:44Z

doc/source/whatsnew/v0.23.0.txt

@@ -548,7 +548,7 @@ Groupby/Resample/Rolling
 - Bug in :func:`DataFrame.resample` which silently ignored unsupported (or mistyped) options for ``label``, ``closed`` and ``convention`` (:issue:`19303`)
 - Bug in :func:`DataFrame.groupby` where tuples were interpreted as lists of keys rather than as keys (:issue:`17979`, :issue:`18249`)
 - Bug in ``transform`` where particular aggregation functions were being incorrectly cast to match the dtype(s) of the grouped data (:issue:`19200`)
-
+- Bug in :func:`DataFrame.groupby` where the use of cython aggregation functions was causing timestamps to lose precision (:issue:`19526`)


can you list the affected functions here (first/last/min/max)?

jreback · 2018-02-04T15:16:55Z

pandas/tests/groupby/test_groupby.py

@@ -2758,6 +2758,27 @@ def test_tuple_correct_keyerror(self):
        with tm.assert_raises_regex(KeyError, "(7, 8)"):
            df.groupby((7, 8)).mean()

+    def test_cython_with_timestamp_and_nat(self):


can you parameterize over the aggregation functions instead?

also parameterize over a timedelta-dtyped Series.
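A sketch of the parameterized test requested above, assuming pytest and the public `pandas.testing.assert_frame_equal`. The test name matches the one in the diff, but the body is an illustration, not the version that was merged. The timedelta value is a hypothetical choice of similar magnitude to the timestamp, well above 2**53 ns, the point beyond which float64 can no longer represent every nanosecond.

```python
import pytest
from pandas import DataFrame, Index, NaT, Timedelta, Timestamp
from pandas.testing import assert_frame_equal


@pytest.mark.parametrize('func', ['first', 'last', 'max', 'min'])
@pytest.mark.parametrize('data', [
    Timestamp('2016-10-14 21:00:44.557'),
    Timedelta('17088 days 21:00:44.557'),  # hypothetical value, same magnitude
])
def test_cython_with_timestamp_and_nat(func, data):
    # GH 19526: a group holding only NaT must not push the datetimelike
    # values of the other groups through float64 and lose precision.
    df = DataFrame({'a': [0, 1], 'b': [data, NaT]})
    expected = DataFrame({'b': [data, NaT]}, index=Index([0, 1], name='a'))

    result = getattr(df.groupby('a'), func)()
    assert_frame_equal(result, expected)
```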

jreback · 2018-02-04T15:19:09Z

pandas/tests/groupby/test_groupby.py

+        # https://github.com/pandas-dev/pandas/issues/19526
+        ts = pd.Timestamp('2016-10-14 21:00:44.557')
+        df = pd.DataFrame({'a': [0, 1], 'b': [ts, pd.NaT]})
+        index = pd.Int64Index([0, 1], dtype='int64', name='a')


you can write the expected result as:
`expected = DataFrame({'b': [ts, NaT]}, index=Index([0, 1], name='a'))`

note that we don't use `pd.` anywhere (the imports are at the top of the file)

jreback · 2018-02-04T15:19:37Z

pandas/tests/groupby/test_groupby.py

+        # We will group by a and test the cython aggregations
+        expected = pd.DataFrame({'b': [ts, pd.NaT]}, index=index)
+
+        result = df.groupby('a').max()


this test needs to be moved to pandas/tests/groupby/aggregate/test_cython.py

jreback · 2018-02-04T15:20:38Z

pandas/core/groupby.py

@@ -2324,7 +2324,7 @@ def _cython_operation(self, kind, values, how, axis, min_count=-1):
            result = self._transform(
                result, values, labels, func, is_numeric, is_datetimelike)

-        if is_integer_dtype(result):
+        if is_integer_dtype(result) and not is_datetimelike:
            mask = result == iNaT


I think you can then remove this masking business (inside the `is_integer_dtype` check).

this might also fix #16674; can you add a test to check that (and, if so, add it to the whatsnew)?

It doesn't look like I can (easily) do this. If resample introduces missing values into an integer series, those get recorded as iNaT. Since the user expects NaN for missing values, we have to recast the series to float and explicitly convert the iNaT values.

#16674 is about the case where iNaT was in the input dataframe as a legitimate integer value. At this point in the code, there is no way to disambiguate between that and true missing values.
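To make this exchange concrete, here is a pure-Python model of the masking branch. It assumes, as described in the reply, that the branch casts integer results to float64 and writes NaN wherever the value equals iNaT (`-2**63`). `mask_inat`, `before_fix` and `after_fix` are hypothetical names, not pandas internals. The model shows why datetimelike results must skip the float cast, why plain integer results still need it, and why #16674 cannot be resolved at this point in the code.

```python
from datetime import datetime, timezone

# Assumption: pandas encodes NaT as the smallest int64 ("iNaT") when
# datetimelike values are viewed as integers.
INAT = -(2 ** 63)


def epoch_ns(dt):
    """Exact nanoseconds since 1970-01-01 UTC, using integer arithmetic only."""
    delta = dt - datetime(1970, 1, 1, tzinfo=timezone.utc)
    return (delta.days * 86_400 + delta.seconds) * 10**9 + delta.microseconds * 1_000


def mask_inat(values):
    """Hypothetical masking step: cast to float64, NaN wherever value == iNaT."""
    if any(v == INAT for v in values):
        return [float('nan') if v == INAT else float(v) for v in values]
    return list(values)


def before_fix(values):
    # Old condition `is_integer_dtype(result)`: datetimelike results were masked too.
    return mask_inat(values)


def after_fix(values, is_datetimelike):
    # New condition `is_integer_dtype(result) and not is_datetimelike`.
    return list(values) if is_datetimelike else mask_inat(values)


ts = epoch_ns(datetime(2016, 10, 14, 21, 0, 44, 557000, tzinfo=timezone.utc))

# Datetimes, before the fix: the float64 detour silently moves the timestamp.
print(int(before_fix([ts, INAT])[0]) - ts)  # -64

# Datetimes, after the fix: int64 values (including iNaT) pass through untouched.
print(after_fix([ts, INAT], is_datetimelike=True) == [ts, INAT])  # True

# Plain integers (#16674): a genuine -2**63 is indistinguishable from missing.
print(after_fix([5, INAT], is_datetimelike=False))  # [5.0, nan]
```

The -64 is specific to this value: near 1.5e18, adjacent float64 values are 256 apart, and this timestamp sits 64 ns above the nearest representable one.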

jreback · 2018-02-06T23:49:40Z

thanks!

keep em coming!

jbandlow · 2018-02-06T23:59:09Z

thanks! this was a good experience for a first commit.

closes pandas-dev#19526
Author: Jason Bandlow
Closes pandas-dev#19530 from jbandlow/timestamp_float_conversion and squashes the following commits:
2fb23d6 [Jason Bandlow] merge
af37225 [Jason Bandlow] BUG: Fix ts precision issue with groupby and NaT (pandas-dev#19526)

jreback requested changes Feb 4, 2018

jreback added the Datetime, Groupby and Missing-data labels Feb 4, 2018

BUG: Fix ts precision issue with groupby and NaT (pandas-dev#19526)

af37225

jbandlow force-pushed the timestamp_float_conversion branch from 8071bbb to af37225 February 6, 2018 15:22

merge

2fb23d6

jbandlow force-pushed the timestamp_float_conversion branch from a44916a to 2fb23d6 February 6, 2018 17:58

jreback added this to the 0.23.0 milestone Feb 6, 2018

jreback approved these changes Feb 6, 2018

jreback closed this in 983d71f Feb 6, 2018
