Fix Timestamp rounding #21507

alimcmaster1 · 2018-06-15T23:39:29Z

Closes #21262 (Timestamp ceil rounds up when it should not when using the '15min' frequency in 0.23.0).
Tests added (thanks to @Safrone's PR)

This change-set is to avoid rounding a timestamp when the timestamp is a multiple of the frequency string passed in.
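
To make the intent above concrete, here is a minimal sketch in plain Python integers. It is not the pandas implementation: `ceil_to_unit` and `floor_to_unit` are hypothetical helpers, and timestamps are assumed to be integer nanoseconds since the epoch.

```python
# Hypothetical helpers, not part of pandas: round integer nanosecond
# values to a multiple of `unit_ns`, leaving exact multiples untouched.

NS_PER_MINUTE = 60 * 10**9


def ceil_to_unit(value_ns: int, unit_ns: int) -> int:
    """Round up to the next multiple of unit_ns, unless already a multiple."""
    remainder = value_ns % unit_ns
    if remainder == 0:
        # Already on the frequency grid: nothing to do.
        return value_ns
    return value_ns + (unit_ns - remainder)


def floor_to_unit(value_ns: int, unit_ns: int) -> int:
    """Round down to the previous multiple of unit_ns."""
    return value_ns - (value_ns % unit_ns)


unit = 15 * NS_PER_MINUTE
on_grid = 4 * unit        # exactly 01:00:00 after the epoch
off_grid = on_grid + 1    # one nanosecond later

print(ceil_to_unit(on_grid, unit) == on_grid)           # True
print(ceil_to_unit(off_grid, unit) == on_grid + unit)   # True
print(floor_to_unit(off_grid, unit) == on_grid)         # True
```

Working in integers sidesteps float round-off entirely, which is one way to see why a value that is already a multiple of the frequency should come back unchanged.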

"Values" param passed into round_ns can either be a np array or int. So relevant handling added for both.

FYI I haven't used Cython much before, so I'm keen to get people's thoughts/feedback.

Thanks

pep8speaks · 2018-06-15T23:39:31Z

Hello @alimcmaster1! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on June 28, 2018 at 20:01 Hours UTC

codecov · 2018-06-16T00:19:11Z

Codecov Report

Merging #21507 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master   #21507   +/-   ##
=======================================
  Coverage    91.9%    91.9%           
=======================================
  Files         154      154           
  Lines       49555    49555           
=======================================
  Hits        45542    45542           
  Misses       4013     4013

Flag	Coverage Δ
#multiple	`90.27% <ø> (ø)`	⬆️
#single	`42.03% <ø> (ø)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update e0f978d...2d0fa8b.

mroeschke · 2018-06-16T02:40:36Z

A few thoughts:

1. With your solution, I think it's better to make round_ns only take arrays (therefore wrap scalar inputs as array) to avoid checking the type of the input (and moreover cdefing them) and iterating over values. It appears the solution can still work by computing values % unit as a boolean array and operating with that. We do something similar with tz_localize_to_utc.
2. Include tests for DatetimeIndex.round
3. Looks like there are some linting errors (your spacing seems to be 2 instead of 4 in some places).

alimcmaster1 · 2018-06-17T15:01:05Z

Thanks @mroeschke for the comments. I've done (2) and (3), and added a few additional test cases for DatetimeIndex ceil/floor, plus one that clearly shows how round should behave for this bug.

On (1): yes, this definitely makes sense. Just to clarify, are you thinking that in _round we check the type of value and, if it's an int, wrap it in an np array, so that we can cdef round_ns?

Best,

Alistair

mroeschke · 2018-06-17T19:38:53Z

Rather, have round_ns (which we can at best cpdef since it's imported by a Python file) only accept numpy arrays. Since this method is used by Timestamps, we would have to wrap its self.value input as a numpy array. The relevant rounding code for Timestamps would turn into:

def _round(self, freq, rounder):
    if self.tz is not None:
        value = self.tz_localize(None).value
    else:
        value = self.value
    value = np.array([value], dtype=np.int64)
    r = round_ns(value, rounder, freq)
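
The wrapping step can be tried outside pandas with a stand-in for round_ns. This is a rough sketch under stated assumptions: `round_ns_stub` and `round_scalar` are hypothetical names, and the stub takes the unit in nanoseconds directly rather than a freq.

```python
import numpy as np


def round_ns_stub(values: np.ndarray, rounder, unit: int) -> np.ndarray:
    """Hypothetical array-only stand-in for round_ns (not the pandas code)."""
    return (unit * rounder(values / unit)).astype(np.int64)


def round_scalar(value: int, rounder, unit: int) -> int:
    # Wrap the scalar so the array-only function needs no type checks.
    wrapped = np.array([value], dtype=np.int64)
    result = round_ns_stub(wrapped, rounder, unit)
    # A one-element input yields a one-element result.
    return int(result[0])


unit = 60 * 10**9     # one minute in nanoseconds
value = 90 * 10**9    # 00:01:30 after the epoch

print(round_scalar(value, np.floor, unit))  # 60000000000
print(round_scalar(value, np.ceil, unit))   # 120000000000
```

Keeping a single array code path means the scalar caller pays for one tiny allocation, while the rounding function itself stays free of isinstance branches.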

mroeschke · 2018-06-17T19:42:03Z

pandas/tests/scalar/timestamp/test_unary_ops.py

+        dt = Timestamp(test_input)
+        expected = Timestamp(expected)
+
+        result_ceil = dt.ceil(freq)


Could you also parametrize over the rounding methods (ceil, floor, and round)? It would help reduce this duplication

Sure makes sense to me
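
For readers unfamiliar with the suggestion, parametrizing over the rounding methods could look roughly like the sketch below. The test name and the input values are illustrative assumptions, not the cases added in this PR.

```python
import pandas as pd
import pytest


@pytest.mark.parametrize("method", ["ceil", "floor", "round"])
@pytest.mark.parametrize("test_input, freq", [
    ("2018-01-01 00:15:00", "15min"),
    ("2018-01-01 00:20:00", "20min"),
    ("2018-01-01 03:00:00", "3min"),
])
def test_round_multiple_of_freq_is_noop(method, test_input, freq):
    # A timestamp that already sits on the frequency grid should come
    # back unchanged from every rounding method.
    dt = pd.Timestamp(test_input)
    result = getattr(dt, method)(freq)
    assert result == dt
```

Stacking the two decorators makes pytest run every method against every input, so one test body replaces three near-identical ones.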

jreback · 2018-06-18T10:36:05Z

pandas/_libs/tslibs/timestamps.pyx

@@ -72,30 +72,50 @@ def round_ns(values, rounder, freq):
    -------
    int or :obj:`ndarray`
    """
+    def _round_non_int_multiple(value):


why don't you just have the int check inside here? this is convoluting the logic a lot

jreback · 2018-06-18T10:37:45Z

pandas/_libs/tslibs/timestamps.pyx


-        r = (unit * rounder((values * (divisor / float(unit))) / divisor)
-             .astype('i8'))
+    if type(values) is int:


why is this not just another part of the if?

Safrone · 2018-06-20T19:49:02Z

So it shows up on the issue properly: #21262

alimcmaster1 · 2018-06-21T00:52:03Z

Cleaned up my implementation here:

Params added in test cases
round_ns now only takes np.array (hence cleaned up the logic in here)

Thoughts, @mroeschke?

jreback · 2018-06-21T01:09:50Z

pandas/_libs/tslibs/timestamps.pyx

-    return r
+        return r
+
+    return np.fromiter((_round_non_int_multiple(item) for item in values), np.int64)


Why are you iterating? It doesn't make any sense to do so with a vectorized function.

mroeschke · 2018-06-21T05:46:18Z

pandas/_libs/tslibs/timestamps.pyx

    """
    Applies rounding function at given frequency

    Parameters
    ----------
-    values : int, :obj:`ndarray`
-    rounder : function
+    values : np.array


I'd leave this as :obj:`ndarray`

mroeschke · 2018-06-21T05:46:34Z

pandas/_libs/tslibs/timestamps.pyx

    freq : str, obj

    Returns
    -------
-    int or :obj:`ndarray`
+    np.array


mroeschke · 2018-06-21T05:50:06Z

pandas/_libs/tslibs/timestamps.pyx

+        value = np.array([value], dtype=np.int64)
+
+        # Will only ever contain 1 element for timestamp
+        r = round_ns(value, rounder, freq).item()


Nit. I think just indexing into this array (i.e. round_ns(value, rounder, freq)[0]) is just fine. Looks like item returns a copy of a Python scalar (and we may want to keep this a numpy scalar just in case)

Up to you but one advantage I did see of item() is that it will throw if the size of the array is > 1. We could do [0] and justify this by asserting len(r) == 1.

self.value from Timestamp will always be a scalar, so we implicitly know the result of this will be a one-element array.
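
The trade-off discussed in this thread can be checked directly with NumPy. A small sketch (the array contents are arbitrary):

```python
import numpy as np

r = np.array([42], dtype=np.int64)

by_index = r[0]      # numpy scalar
by_item = r.item()   # plain Python int

print(type(by_index).__name__)  # int64
print(type(by_item).__name__)   # int

# item() refuses arrays with more than one element; indexing does not.
two = np.array([1, 2], dtype=np.int64)
print(two[0])  # 1
try:
    two.item()
except ValueError as exc:
    print("item() raised:", type(exc).__name__)  # item() raised: ValueError
```

So `[0]` keeps the NumPy scalar type but silently ignores extra elements, while `item()` converts to a Python int and doubles as a size check.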

mroeschke · 2018-06-21T06:00:01Z

Thanks for the revision. As @jreback mentions, I don't think it's necessary to iterate. You should be able to perform the adjustments with vectorization. At a high level the logic should look like:

mask = values % unit == 0
if mask.all():
    return values
values[~mask] = _round_non_int_multiple(values[~mask])
return values
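
A runnable version of this outline, as a sketch only: `round_off_grid` is a hypothetical name, the unit is passed in nanoseconds, and a plain float division stands in for `_round_non_int_multiple`, which is adequate for the small values used here but is not the precision-aware logic in pandas.

```python
import numpy as np


def round_off_grid(values: np.ndarray, unit: int, rounder) -> np.ndarray:
    """Round only the entries that are not already multiples of unit."""
    values = values.copy()       # do not mutate the caller's array
    mask = values % unit == 0    # True where no rounding is needed
    if mask.all():
        return values
    off_grid = values[~mask]
    # Assumption: simple float-based rounding, fine for this small example.
    values[~mask] = (unit * rounder(off_grid / unit)).astype(np.int64)
    return values


unit = 15 * 60 * 10**9   # 15 minutes in nanoseconds
values = np.array([0, unit, unit + 1, 2 * unit - 1], dtype=np.int64)

print((round_off_grid(values, unit, np.ceil) // unit).tolist())   # [0, 1, 2, 2]
print((round_off_grid(values, unit, np.floor) // unit).tolist())  # [0, 1, 1, 1]
```

The boolean mask does the work that the per-element generator did before: entries already on the grid are never touched, and the rest are rounded in one array operation.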

alimcmaster1 · 2018-06-21T21:51:22Z

Thanks @mroeschke, your logic above seems much neater. Let me refactor!

mroeschke · 2018-06-25T02:02:16Z

pandas/_libs/tslibs/timestamps.pyx

-    values : int, :obj:`ndarray`
-    rounder : function
+    values : :obj:`ndarray`
+    rounder : function, eg. 'Ceil', 'Floor', 'round'


Nit: Could you lowercase ceil and floor

mroeschke · 2018-06-25T02:02:46Z

pandas/_libs/tslibs/timestamps.pyx


-        r = (unit * rounder((values * (divisor / float(unit))) / divisor)
-             .astype('i8'))
+    values = np.copy(values)


Why is a copy needed here?

This is to handle the case where 'NaT' exists in the DatetimeIndex. Removing it will cause test_ceil_floor_edge in test_scalar_compat.py to fail.

I found that self.hasnans in _maybe_mask_results (datetimelike.py) will return False if we don't do the copy. By copying, we ensure that we aren't referencing the base of the input array. I think that is the issue here. What do you think?

cc @jreback.

Thanks for the investigation. That sounds reasonable, but I am not too familiar with NaN/NaT ops with respect to references.

jreback

I haven't fully reviewed yet

jreback · 2018-06-25T22:58:39Z

pandas/_libs/tslibs/timestamps.pyx


-        r = (unit * rounder((values * (divisor / float(unit))) / divisor)
-             .astype('i8'))
+    values = np.copy(values)


you are mutating in place, so you DO need to copy.

that's fine. though pls use values.copy()
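
Why in-place mutation calls for a copy can be seen in a few lines of NumPy. The function names below are hypothetical and the clamping is only a stand-in for the masked rounding assignment:

```python
import numpy as np


def clamp_in_place(values: np.ndarray) -> np.ndarray:
    values[values < 0] = 0   # boolean-mask assignment mutates the input
    return values


def clamp_on_copy(values: np.ndarray) -> np.ndarray:
    values = values.copy()   # work on a private copy instead
    values[values < 0] = 0
    return values


original = np.array([-1, 5, -3], dtype=np.int64)

clamp_on_copy(original)
print(original.tolist())   # [-1, 5, -3]

clamp_in_place(original)
print(original.tolist())   # [0, 5, 0]
```

Without the copy, the caller's array is rewritten as a side effect, so anything the caller later derives from its original values sees the overwritten data.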

jreback · 2018-06-29T00:26:43Z

thanks @alimcmaster1

alimcmaster1 · 2018-07-01T13:12:21Z

thanks @jreback and @mroeschke for helping review!

Fix Timestamp rounding

ee8ba61

Pep8 fixes

50986c6

mroeschke added the Datetime (Datetime data dtype) label Jun 16, 2018

alimcmaster1 added 2 commits June 17, 2018 14:07

Pep8 fixes and add additional test cases

d1c6e6f

Futher test cases

03a42b8

mroeschke reviewed Jun 17, 2018

jreback requested changes Jun 18, 2018

jreback mentioned this pull request Jun 19, 2018

TST: Add failing tests for minute rounding #21265

Closed

5 tasks

alimcmaster1 added 2 commits June 21, 2018 01:27

Refactor timestamp rounding

a302074

Parameterize test cases

33335b4

Update function error

0363a72

jreback requested changes Jun 21, 2018

mroeschke reviewed Jun 21, 2018

pandas/_libs/tslibs/timestamps.pyx (outdated)

    freq : str, obj

    Returns
    -------
-    int or :obj:`ndarray`
+    np.array

mroeschke · Jun 21, 2018

Same

mroeschke reviewed Jun 21, 2018

Perform manipulation with vectorization

7353c2f

mroeschke reviewed Jun 25, 2018

Lower case doc string

82c7db1

jreback requested changes Jun 25, 2018

Update copy function

559bb50

jorisvandenbossche and others added 20 commits June 28, 2018 20:39

DOC: fixup old whatsnew for dtype coercing change (#21456) (#21634)

79d982a

DEPR: MultiIndex.to_hierarchical (#21613)

367ce07

TST: xfail flaky 3.7 test, xref #21636 (#21637)

8db9303

[ENH] Add read support for Google Cloud Storage (#20729)

e8f5ede

* Google Cloud Storage support using gcsfs

PKG: Exclude data test files. (#19535)

58a1a08

DOC: fix typo in cookbook.rst (#21635)

c6660f6

Removing the semicolon delimiter at the end of the modified line of code allows the line's output to be displayed.

DOC: minor correction to v0.23.2.txt (#21644)

7555378

Cleanup clipboard tests (#21163)

45cfa62

ENH: Update to_gbq and read_gbq to pandas-gbq 0.5.0 (#21628)

d746bee

* Add link to Pandas-GBQ 0.5.0 in what's new. * Remove unnecessary sleep in GBQ tests. Closes googleapis/python-bigquery-pandas#177 Closes #21627

More speedups for Period comparisons (#21606)

476717c

use ccalendar instead of np_datetime (#21549)

001dc78

ENH: Function to walk the group hierarchy of a PyTables HDF5 file.

59286da

closes #10143

DOC: Fix versionadded directive typos in IntervalIndex (#21649)

dad1252

TST: Use absolute path for datapath (#21647)

b3b047e

DOC: update DataFrame.dropna's axis argument docs (#21652)

242ccbc

BUG: Let IntervalIndex constructor override inferred closed (#21584)

8cbfcbf

TST: Use fixtures in dtypes/test_cast.py (#21661)

d07e61b

TST: Clean old timezone issues PT2 (#21612)

0829063

Whatsnew Timestamp bug

da3b903

Merge branch 'master' into timestamp-fixes

2d0fa8b

jreback approved these changes Jun 29, 2018

jreback merged commit 76ef7c4 into pandas-dev:master Jun 29, 2018

jreback added the Needs Backport label Jun 29, 2018

alimcmaster1 deleted the timestamp-fixes branch July 1, 2018 13:12

jorisvandenbossche removed the Needs Backport label Jul 2, 2018

jorisvandenbossche pushed a commit to jorisvandenbossche/pandas that referenced this pull request Jul 2, 2018

Fix Timestamp rounding (pandas-dev#21507)

3ee7c6c

(cherry picked from commit 76ef7c4)

jorisvandenbossche pushed a commit that referenced this pull request Jul 5, 2018

Fix Timestamp rounding (#21507)

0a42f18

(cherry picked from commit 76ef7c4)

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018

Fix Timestamp rounding (pandas-dev#21507)

e479f30
