Fix Timestamp rounding #21507

Merged
merged 114 commits into from
Jun 29, 2018
Changes from 7 commits
Commits
ee8ba61
Fix Timestamp rounding
alimcmaster1 Jun 15, 2018
50986c6
Pep8 fixes
alimcmaster1 Jun 16, 2018
d1c6e6f
Pep8 fixes and add additional test cases
alimcmaster1 Jun 17, 2018
03a42b8
Futher test cases
alimcmaster1 Jun 17, 2018
a302074
Refactor timestamp rounding
alimcmaster1 Jun 21, 2018
33335b4
Parameterize test cases
alimcmaster1 Jun 21, 2018
0363a72
Update function error
alimcmaster1 Jun 21, 2018
7353c2f
Perform manipulation with vectorization
alimcmaster1 Jun 24, 2018
82c7db1
Lower case doc string
alimcmaster1 Jun 25, 2018
559bb50
Update copy function
alimcmaster1 Jun 26, 2018
bfe0d16
Refactor to use just one function
alimcmaster1 Jun 26, 2018
4249fd7
Pep8
alimcmaster1 Jun 27, 2018
2abd8e2
DOC: MultiIndex Fixes (#21414)
alimcmaster1 Jun 11, 2018
3546c00
DOC: fix grammar of deprecation message (#21421)
jorisvandenbossche Jun 11, 2018
342fd40
MAINT: Deprecate encoding from stata reader/writer (#21400)
bashtage Jun 12, 2018
a1111b3
DOC: Add 0.23.2 whatsnew template (#21433)
gfyoung Jun 12, 2018
4423dc9
MAINT: More friendly error msg on Index overflow (#21377)
gfyoung Jun 12, 2018
0550ef7
DOC: follow 0.23.1 template for 0.23.2 whatsnew (#21435)
jorisvandenbossche Jun 12, 2018
57c8222
DOC: Loading sphinxcontrib.spelling to sphinx only if it's available …
datapythonista Jun 12, 2018
8a51d07
Fix flake8 in conf.py (#21438)
jorisvandenbossche Jun 12, 2018
7f2f340
Doc Fixes (#21415)
alimcmaster1 Jun 12, 2018
56477a0
Reapply all patches by @testvinder against master (#21413)
testvinder Jun 12, 2018
ea922c6
Two tests didn't properly assert an exception was raised. Fixed. (#21…
akaihola Jun 12, 2018
a37c3d2
DOC: 0.23.1 release (#21446)
TomAugspurger Jun 12, 2018
9f9b636
DOC: Fixed warning in doc build (#21449)
TomAugspurger Jun 13, 2018
c5a11d6
DOC: add favicon to doc pages (#21440)
joelostblom Jun 13, 2018
bcc6b8c
Fix tests fragile to PATH (#21453)
alimcmaster1 Jun 13, 2018
402905d
use ccalendar.get_days_in_month, deprecate tslib.monthrange (#21451)
jbrockmendel Jun 13, 2018
22d65e1
BUG: Construct Timestamp with tz correctly near DST border (#21407)
mroeschke Jun 13, 2018
2908346
parametrize tests, unify repeated tests (#21405)
jbrockmendel Jun 13, 2018
0e05de4
DOC: isin() docstring change DataFrame to pd.DataFrame (#21403)
beepscore Jun 13, 2018
159756e
BUG: fix get_indexer_non_unique with CategoricalIndex key (#21457)
toobaz Jun 13, 2018
70a3d6d
BUG: Fix DateOffset eq to depend on normalize attr (#21404)
jbrockmendel Jun 13, 2018
7d664d5
API/BUG: DatetimeIndex correctly localizes integer data (#21216)
mroeschke Jun 14, 2018
7463576
PERF: improve performance of groupby rank (#21237) (#21285)
peterpanmj Jun 14, 2018
f504300
PERF: typing and cdefs for tslibs.resolution (#21452)
jbrockmendel Jun 14, 2018
07381bb
disallow normalize=True with Tick classes (#21427)
jbrockmendel Jun 14, 2018
e2fb27a
CLN: Comparison methods for MultiIndex should have consistent behavio…
KalyanGokhale Jun 14, 2018
f57e0eb
PERF: Add __contains__ to CategoricalIndex (#21369)
topper-123 Jun 14, 2018
0921273
CLN: Index imports and 0.23.1 whatsnew (#21490)
mroeschke Jun 15, 2018
d2de507
improve speed of nans in CategoricalIndex (#21493)
topper-123 Jun 15, 2018
aa5e1f1
perf improvements in tslibs.period (#21447)
jbrockmendel Jun 15, 2018
676ae59
BUG: Fix Series.nlargest for integer boundary values (#21432)
jschendel Jun 15, 2018
8ca172d
Removing SimpleMock test from pandas.util.testing (#21482)
uds5501 Jun 15, 2018
28780c2
TST: adding test cases for verifying correct values shown by pivot_ta…
uds5501 Jun 15, 2018
a231fb2
PERF: remove useless overrides (#21523)
toobaz Jun 18, 2018
5f44af0
TST: Add unit tests for older timezone issues (#21491)
mroeschke Jun 18, 2018
1427b69
BUG: Timedelta.__bool__ (#21485)
TomAugspurger Jun 18, 2018
2741967
BUG: Fix Index construction when given empty generator (#21470). (#21…
Liam3851 Jun 18, 2018
34b77d1
BUG/REG: file-handle object handled incorrectly in to_csv (#21478)
minggli Jun 18, 2018
e788e47
Append Mode for ExcelWriter with openpyxl (#21251)
WillAyd Jun 19, 2018
fb16555
DOC: Improve code example for Index.get_indexer (#21511)
topper-123 Jun 19, 2018
84533b1
DOC: remove grammar duplication in groupby docs (#21534)
adamjstewart Jun 19, 2018
8cbcafd
remove daytime attr, move getstate and setstate to base class (#21533)
jbrockmendel Jun 19, 2018
c53e001
BUG: Handle read_csv corner case (#21176)
r00ta Jun 19, 2018
34d74ce
De-duplicate code for indexing with list-likes of keys (#21503)
toobaz Jun 19, 2018
a10216d
Update "See Also" section of pandas/core/generic.py (#21550)
wil Jun 20, 2018
4e6a11f
Fixing documentation lists indentation (#21519)
datapythonista Jun 20, 2018
ca9ce8d
PERF: add method Categorical.__contains__ (#21508)
topper-123 Jun 20, 2018
6289c76
REGR: Fixes first_valid_index when DataFrame or Series has duplicate …
KalyanGokhale Jun 20, 2018
b19219d
API/BUG: Raise when int-dtype coercions fail (#21456)
gfyoung Jun 20, 2018
3814d0c
DOC: Add documentation for freq='infer' option of DatetimeIndex and T…
jschendel Jun 21, 2018
ea205c0
BUG: Fix group index calculation to prevent hitting maximum recursion…
Jun 21, 2018
fd8d6bc
BUG: Fix passing empty label to df drop (#21515)
alimcmaster1 Jun 21, 2018
f3a89f3
ERR: Raise a simpler backtrace for missing key (#21558)
toobaz Jun 21, 2018
eb47287
Fixed HDFSTore.groups() performance. (#21543)
spott Jun 21, 2018
c815e0b
DOC: Fixing spaces around backticks, and linting (#21570)
datapythonista Jun 21, 2018
91c9ec3
fix hashing string-casting error (#21187)
jbrockmendel Jun 21, 2018
c019f6d
make DateOffset immutable (#21341)
jbrockmendel Jun 21, 2018
c03ed38
REF: multi_take is now able to tackle all list-like (non-bool) cases …
toobaz Jun 21, 2018
4668dba
TST: Use int fixtures in test_construction.py (#21588)
gfyoung Jun 22, 2018
07e161d
DOC: Adding clarification on return dtype of to_numeric (#21585)
uds5501 Jun 22, 2018
15b040c
Update v0.24.0.txt (#21586)
uds5501 Jun 22, 2018
b21efcc
clarifying regex pipe behavior (#21589)
mitchnegus Jun 22, 2018
a7d5bb7
DOC: Note assert_almost_equal impl. detail (#21580)
gfyoung Jun 22, 2018
8944d35
DOC: update the Series.any / Dataframe.any docstring (#21579)
strickvl Jun 22, 2018
8280084
TST: Clean up tests in test_take.py (#21591)
gfyoung Jun 22, 2018
5f956ce
TST: Add interval closed fixture to top-level conftest (#21595)
jschendel Jun 22, 2018
19b78ca
cache DateOffset attrs now that they are immutable (#21582)
jbrockmendel Jun 22, 2018
303a279
BUG: Series dot product __rmatmul__ doesn't allow matrix vector multi…
minggli Jun 22, 2018
a4798c3
BUG: first/last lose timezone in groupby with as_index=False (#21573)
reidy-p Jun 22, 2018
f901431
add test case when to_csv argument is sys.stdout (#21572)
r00ta Jun 22, 2018
8794ef2
BUG: Fix json_normalize throwing TypeError (#21536) (#21540)
vuminhle Jun 22, 2018
6eb0341
DOC: updated the Series.str.rsplit and Series.str.split docstrings (#…
ryankarlos Jun 22, 2018
289f3ad
TST: Use multiple instances of parametrize instead of product (#21602)
jschendel Jun 23, 2018
a65ea90
MyPy cleanup and absolute imports in pandas.core.dtypes.common (#21008)
WillAyd Jun 23, 2018
13a3d5a
remove unused cimport (#21619)
jbrockmendel Jun 25, 2018
afdcac1
CI: Test against Python 3.7 (#21604)
TomAugspurger Jun 25, 2018
9f6c154
DOC: Do no use 'type' as first word when specifying a return type (#2…
aberres Jun 25, 2018
c19017b
CLN: make CategoricalIndex._create_categorical a classmethod (#21618)
topper-123 Jun 25, 2018
5e4882e
PERF: do not check for label presence preventively (#21594)
toobaz Jun 25, 2018
b3443da
TST: Refactor test_maybe_match_name and test_hash_pandas_object (#21600)
minggli Jun 25, 2018
ebeccfc
API/COMPAT: support axis=None for logical reduction (reduce over all …
TomAugspurger Jun 26, 2018
db233f3
DOC: Move tz cleanup whatsnew entries to v0.24 (#21631)
mroeschke Jun 26, 2018
79d982a
DOC: fixup old whatsnew for dtype coercing change (#21456) (#21634)
jorisvandenbossche Jun 26, 2018
367ce07
DEPR: MultiIndex.to_hierarchical (#21613)
KalyanGokhale Jun 26, 2018
8db9303
TST: xfail flaky 3.7 test, xref #21636 (#21637)
jreback Jun 26, 2018
e8f5ede
[ENH] Add read support for Google Cloud Storage (#20729)
bnaul Jun 26, 2018
58a1a08
PKG: Exclude data test files. (#19535)
TomAugspurger Jun 26, 2018
c6660f6
DOC: fix typo in cookbook.rst (#21635)
bgroveben Jun 26, 2018
7555378
DOC: minor correction to v0.23.2.txt (#21644)
topper-123 Jun 26, 2018
45cfa62
Cleanup clipboard tests (#21163)
david-liu-brattle-1 Jun 26, 2018
d746bee
ENH: Update to_gbq and read_gbq to pandas-gbq 0.5.0 (#21628)
tswast Jun 26, 2018
476717c
More speedups for Period comparisons (#21606)
jbrockmendel Jun 26, 2018
001dc78
use ccalendar instead of np_datetime (#21549)
jbrockmendel Jun 26, 2018
59286da
ENH: Function to walk the group hierarchy of a PyTables HDF5 file.
Aug 30, 2015
dad1252
DOC: Fix versionadded directive typos in IntervalIndex (#21649)
jschendel Jun 27, 2018
b3b047e
TST: Use absolute path for datapath (#21647)
elmq0022 Jun 27, 2018
242ccbc
DOC: update DataFrame.dropna's axis argument docs (#21652)
taljaards Jun 27, 2018
8cbfcbf
BUG: Let IntervalIndex constructor override inferred closed (#21584)
jschendel Jun 27, 2018
d07e61b
TST: Use fixtures in dtypes/test_cast.py (#21661)
gfyoung Jun 28, 2018
0829063
TST: Clean old timezone issues PT2 (#21612)
mroeschke Jun 28, 2018
da3b903
Whatsnew Timestamp bug
alimcmaster1 Jun 28, 2018
2d0fa8b
Merge branch 'master' into timestamp-fixes
alimcmaster1 Jun 28, 2018
64 changes: 39 additions & 25 deletions pandas/_libs/tslibs/timestamps.pyx
@@ -59,43 +59,54 @@ cdef inline object create_timestamp_from_ts(int64_t value,


 def round_ns(values, rounder, freq):
+
     """
     Applies rounding function at given frequency

     Parameters
     ----------
-    values : int, :obj:`ndarray`
-    rounder : function
+    values : np.array
Member

I'd leave this as :obj:`ndarray`

+    rounder : function, eg. 'Ceil', 'Floor', 'round'
Member

Nit: Could you lowercase ceil and floor

     freq : str, obj

     Returns
     -------
-    int or :obj:`ndarray`
+    np.array
Member

Same

"""
-    from pandas.tseries.frequencies import to_offset
-    unit = to_offset(freq).nanos
-    if unit < 1000:
-        # for nano rounding, work with the last 6 digits separately
-        # due to float precision
-        buff = 1000000
-        r = (buff * (values // buff) + unit *
-             (rounder((values % buff) * (1 / float(unit)))).astype('i8'))
-    else:
-        if unit % 1000 != 0:
-            msg = 'Precision will be lost using frequency: {}'
-            warnings.warn(msg.format(freq))
-
-        # GH19206
-        # to deal with round-off when unit is large
-        if unit >= 1e9:
-            divisor = 10 ** int(np.log10(unit / 1e7))
+    def _round_non_int_multiple(value):
Contributor

why don't you just have the int check inside here? this is convoluting the logic a lot


+        from pandas.tseries.frequencies import to_offset
+        unit = to_offset(freq).nanos
+
+        # GH21262 If the Timestamp is multiple of the freq str
+        # don't apply any rounding
+        if value % unit == 0:
+            return value
+
+        if unit < 1000:
+            # for nano rounding, work with the last 6 digits separately
+            # due to float precision
+            buff = 1000000
+            r = (buff * (value // buff) + unit *
+                 (rounder((value % buff) * (1 / float(unit)))).astype('i8'))
         else:
-            divisor = 10
+            if unit % 1000 != 0:
+                msg = 'Precision will be lost using frequency: {}'
+                warnings.warn(msg.format(freq))
+
+            # GH19206
+            # to deal with round-off when unit is large
+            if unit >= 1e9:
+                divisor = 10 ** int(np.log10(unit / 1e7))
+            else:
+                divisor = 10

-        r = (unit * rounder((values * (divisor / float(unit))) / divisor)
-             .astype('i8'))
+            r = (unit * rounder((value * (divisor / float(unit))) / divisor)
+                 .astype('i8'))

-    return r
+        return r
+
+    return np.fromiter((_round_non_int_multiple(item) for item in values), np.int64)
Contributor

Why are you iterating? This doesn't make any sense to do with a vectorized function.



# This is PITA. Because we inherit from datetime, which has very specific
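The reviewer's objection to the per-element `np.fromiter` loop can be illustrated with a mask-based sketch. This is only an assumption about one way to vectorize the "skip exact multiples" check, not the code that was eventually merged: `round_skipping_multiples` is a hypothetical name, it takes the unit in nanoseconds directly instead of a `freq` string, and it omits the sub-microsecond and large-unit branches of `round_ns`.

```python
import numpy as np


def round_skipping_multiples(values, rounder, unit):
    """Hypothetical helper (not the merged pandas code): round int64
    nanosecond values to multiples of `unit` without a Python-level loop."""
    values = np.asarray(values, dtype=np.int64)
    # Mask of values that already sit exactly on the frequency grid (GH21262).
    is_multiple = (values % unit) == 0
    # Simplified float rounding; the sub-microsecond and large-unit branches
    # of round_ns are deliberately left out of this sketch.
    rounded = (unit * rounder(values / float(unit))).astype(np.int64)
    # Keep the original value wherever no rounding is needed.
    return np.where(is_multiple, values, rounded)


unit = 2 * 60 * 10**9  # two minutes, in nanoseconds
values = np.array([0, unit, unit + 1, 3 * unit - 1], dtype=np.int64)
ceiled = round_skipping_multiples(values, np.ceil, unit)
floored = round_skipping_multiples(values, np.floor, unit)
```

Because the mask and the rounding are both computed on the whole array, the exact-multiple check costs one modulo and one `np.where` rather than a Python function call per element.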
@@ -649,7 +660,10 @@ class Timestamp(_Timestamp):
         else:
             value = self.value

-        r = round_ns(value, rounder, freq)
+        value = np.array([value], dtype=np.int64)
+
+        # Will only ever contain 1 element for timestamp
+        r = round_ns(value, rounder, freq).item()
Member

Nit. I think just indexing into this array (i.e. `round_ns(value, rounder, freq)[0]`) is just fine. Looks like `item` returns a copy as a Python scalar (and we may want to keep this a numpy scalar just in case)

Member Author

Up to you but one advantage I did see of item() is that it will throw if the size of the array is > 1. We could do [0] and justify this by asserting len(r) == 1.

Member

self.value from Timestamp will always be a scalar, so we implicitly know the result of this will be a one-element array.

         result = Timestamp(r, unit='ns')
         if self.tz is not None:
             result = result.tz_localize(self.tz)
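The trade-off discussed in this thread can be checked with a few lines of NumPy. The sketch below is illustrative only; the array contents are arbitrary nanosecond values chosen for the example, not taken from the PR.

```python
import numpy as np

# A one-element int64 array, like the one round_ns returns for a Timestamp.
r = np.array([1514764920000000000], dtype=np.int64)

as_python_int = r.item()   # converted to a built-in Python int
as_numpy_scalar = r[0]     # stays a NumPy scalar (np.int64)

# item() with no arguments refuses arrays that hold more than one element,
# which is the safety net the PR author mentions.
two_elements = np.array([1, 2], dtype=np.int64)
try:
    two_elements.item()
    item_raised = False
except ValueError:
    item_raised = True

# Indexing silently returns the first element instead.
first_only = two_elements[0]
```

Both forms yield the same numeric value for a one-element array; they differ in the returned type and in how they behave when the size assumption is violated.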
19 changes: 19 additions & 0 deletions pandas/tests/indexes/datetimes/test_scalar_compat.py
@@ -134,6 +134,21 @@ def test_round(self, tz):
         ts = '2016-10-17 12:00:00.001501031'
         DatetimeIndex([ts]).round('1010ns')

+    def test_no_rounding_occurs(self, tz):
+        # GH 21262
+        rng = date_range(start='2016-01-01', periods=5,
+                         freq='2Min', tz=tz)
+
+        expected_rng = DatetimeIndex([
+            Timestamp('2016-01-01 00:00:00', tz=tz, freq='2T'),
+            Timestamp('2016-01-01 00:02:00', tz=tz, freq='2T'),
+            Timestamp('2016-01-01 00:04:00', tz=tz, freq='2T'),
+            Timestamp('2016-01-01 00:06:00', tz=tz, freq='2T'),
+            Timestamp('2016-01-01 00:08:00', tz=tz, freq='2T'),
+        ])
+
+        tm.assert_index_equal(rng.round(freq='2T'), expected_rng)
+
     @pytest.mark.parametrize('test_input, rounder, freq, expected', [
         (['2117-01-01 00:00:45'], 'floor', '15s', ['2117-01-01 00:00:45']),
         (['2117-01-01 00:00:45'], 'ceil', '15s', ['2117-01-01 00:00:45']),
@@ -143,6 +158,10 @@ def test_round(self, tz):
          ['1823-01-01 00:00:01.000000020']),
         (['1823-01-01 00:00:01'], 'floor', '1s', ['1823-01-01 00:00:01']),
         (['1823-01-01 00:00:01'], 'ceil', '1s', ['1823-01-01 00:00:01']),
+        (['2018-01-01 00:15:00'], 'ceil', '15T', ['2018-01-01 00:15:00']),
+        (['2018-01-01 00:15:00'], 'floor', '15T', ['2018-01-01 00:15:00']),
+        (['1823-01-01 03:00:00'], 'ceil', '3H', ['1823-01-01 03:00:00']),
+        (['1823-01-01 03:00:00'], 'floor', '3H', ['1823-01-01 03:00:00']),
         (('NaT', '1823-01-01 00:00:01'), 'floor', '1s',
          ('NaT', '1823-01-01 00:00:01')),
         (('NaT', '1823-01-01 00:00:01'), 'ceil', '1s',
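The behaviour that `test_no_rounding_occurs` pins down can be reproduced from the public API. This is a minimal sketch assuming a pandas version that includes the fix; it uses the `'2min'` spelling of the frequency rather than `'2T'` and is timezone-naive, unlike the parametrized test.

```python
import pandas as pd

# Five timestamps that already fall on a two-minute grid.
rng = pd.date_range(start='2016-01-01', periods=5, freq='2min')

# Rounding, flooring or ceiling to the same frequency should be a no-op.
rounded = rng.round('2min')
floored = rng.floor('2min')
ceiled = rng.ceil('2min')
```

Each result is expected to hold the same five timestamps as `rng`.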
20 changes: 19 additions & 1 deletion pandas/tests/scalar/timestamp/test_unary_ops.py
@@ -118,6 +118,25 @@ def test_ceil_floor_edge(self, test_input, rounder, freq, expected):
         expected = Timestamp(expected)
         assert result == expected

+    @pytest.mark.parametrize('test_input, freq, expected', [
+        ('2018-01-01 00:02:06', '2s', '2018-01-01 00:02:06'),
+        ('2018-01-01 00:02:00', '2T', '2018-01-01 00:02:00'),
+        ('2018-01-01 00:04:00', '4T', '2018-01-01 00:04:00'),
+        ('2018-01-01 00:15:00', '15T', '2018-01-01 00:15:00'),
+        ('2018-01-01 00:20:00', '20T', '2018-01-01 00:20:00'),
+        ('2018-01-01 03:00:00', '3H', '2018-01-01 03:00:00'),
+    ])
+    @pytest.mark.parametrize('rounder', ['ceil', 'floor', 'round'])
+    def test_round_minute_freq(self, test_input, freq, expected, rounder):
+        # Ensure timestamps that shouldnt round dont!
+        # GH#21262
+
+        dt = Timestamp(test_input)
+        expected = Timestamp(expected)
+        func = getattr(dt, rounder)
+        result = func(freq)
+        assert result == expected
+
     def test_ceil(self):
         dt = Timestamp('20130101 09:10:11')
         result = dt.ceil('D')
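The invariant behind `test_round_minute_freq` (a value that is already a multiple of the unit must come back unchanged for ceil, floor and round) can be modelled on plain integers. The sketch below is not pandas' implementation: `round_int_ns` and `to_ns` are hypothetical helpers, the unit is passed in nanoseconds rather than as a frequency string, and ties in `'round'` mode go upward as a simplifying assumption.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def to_ns(dt):
    """Whole seconds since the epoch, expressed in nanoseconds."""
    return ((dt - EPOCH) // timedelta(seconds=1)) * 10**9


def round_int_ns(value, unit, mode):
    """Hypothetical helper: round an integer nanosecond count to a multiple
    of `unit` using integer arithmetic only, so no float error can occur."""
    quotient, remainder = divmod(value, unit)
    if remainder == 0:
        # Already on the grid: nothing to do (the GH21262 case).
        return value
    if mode == 'floor':
        return quotient * unit
    if mode == 'ceil':
        return (quotient + 1) * unit
    if mode == 'round':
        # Simplification: ties go up.
        return (quotient + (1 if 2 * remainder >= unit else 0)) * unit
    raise ValueError('unknown mode: {}'.format(mode))


# (timestamp, unit in ns) pairs mirroring some of the parametrized cases.
on_grid_cases = [
    (datetime(2018, 1, 1, 0, 2, 6), 2 * 10**9),          # '2s'
    (datetime(2018, 1, 1, 0, 15, 0), 15 * 60 * 10**9),   # '15T'
    (datetime(2018, 1, 1, 3, 0, 0), 3 * 3600 * 10**9),   # '3H'
]
unchanged = all(
    round_int_ns(to_ns(dt), unit, mode) == to_ns(dt)
    for dt, unit in on_grid_cases
    for mode in ('ceil', 'floor', 'round')
)
```

Working on integers sidesteps the float round-off that the `divisor` logic in `round_ns` compensates for, at the cost of not handling arrays.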
@@ -257,7 +276,6 @@ def test_timestamp(self):
         if PY3:
             # datetime.timestamp() converts in the local timezone
             with tm.set_timezone('UTC'):
-
                 # should agree with datetime.timestamp method
                 dt = ts.to_pydatetime()
                 assert dt.timestamp() == ts.timestamp()