Skip to content

BUG: fix Period.asfreq conversion near datetime(1, 1, 1) #19650

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Feb 21, 2018
Merged
Show file tree
Hide file tree
Changes from 7 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.23.0.txt
Original file line number Diff line number Diff line change
Expand Up @@ -714,6 +714,7 @@ Datetimelike
- Bug in :class:`Timestamp` and :func:`to_datetime` where a string representing a barely out-of-bounds timestamp would be incorrectly rounded down instead of raising ``OutOfBoundsDatetime`` (:issue:`19382`)
- Bug in :func:`Timestamp.floor` :func:`DatetimeIndex.floor` where time stamps far in the future and past were not rounded correctly (:issue:`19206`)
- Bug in :func:`to_datetime` where passing an out-of-bounds datetime with ``errors='coerce'`` and ``utc=True`` would raise ``OutOfBoundsDatetime`` instead of parsing to ``NaT`` (:issue:`19612`)
- Bug in :func:`Period.asfreq` where periods near ``datetime(1, 1, 1)`` could be converted incorrectly (:issue:`19643`)
-

Timezones
Expand Down
5 changes: 3 additions & 2 deletions pandas/_libs/src/period_helper.c
Original file line number Diff line number Diff line change
Expand Up @@ -138,7 +138,7 @@ PANDAS_INLINE npy_int64 transform_via_day(npy_int64 ordinal,
}

static npy_int64 DtoB_weekday(npy_int64 absdate) {
return (((absdate) / 7) * 5) + (absdate) % 7 - BDAY_OFFSET;
return floordiv(absdate, 7) * 5 + mod_compat(absdate, 7) - BDAY_OFFSET;
}

static npy_int64 DtoB(struct date_info *dinfo,
Expand Down Expand Up @@ -245,7 +245,8 @@ static npy_int64 asfreq_UpsampleWithinDay(npy_int64 ordinal,
static npy_int64 asfreq_BtoDT(npy_int64 ordinal, asfreq_info *af_info) {
ordinal += BDAY_OFFSET;
ordinal =
(((ordinal - 1) / 5) * 7 + mod_compat(ordinal - 1, 5) + 1 - ORD_OFFSET);
(floordiv(ordinal - 1, 5) * 7 + mod_compat(ordinal - 1, 5) + 1 -
ORD_OFFSET);

return upsample_daytime(ordinal, af_info);
}
Expand Down
118 changes: 109 additions & 9 deletions pandas/_libs/tslibs/period.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -154,12 +154,32 @@ cdef inline int get_freq_group(int freq) nogil:
return (freq // 1000) * 1000


@cython.cdivision
# specifically _dont_ use cdvision or else ordinals near -1 are assigned to
# incorrect dates GH#19643
@cython.cdivision(False)
cdef int64_t get_period_ordinal(int year, int month, int day,
int hour, int minute, int second,
int microseconds, int picoseconds,
int freq) nogil:
"""generate an ordinal in period space"""
"""
Generate an ordinal in period space

Parameters
----------
year : int
month : int
day : int
hour : int
minute : int
second : int
microseconds : int
picoseconds : int
freq : int

Returns
-------
period_ordinal : int64_t
"""
cdef:
int64_t absdays, unix_date, seconds, delta
int64_t weeks
Expand Down Expand Up @@ -190,7 +210,7 @@ cdef int64_t get_period_ordinal(int year, int month, int day,
if month >= fmonth:
mdiff += 12

return (year - 1970) * 4 + (mdiff - 1) / 3
return (year - 1970) * 4 + (mdiff - 1) // 3

elif freq == FR_MTH:
return (year - 1970) * 12 + month - 1
Expand All @@ -202,14 +222,14 @@ cdef int64_t get_period_ordinal(int year, int month, int day,
seconds = unix_date * 86400 + hour * 3600 + minute * 60 + second

if freq == FR_MS:
return seconds * 1000 + microseconds / 1000
return seconds * 1000 + microseconds // 1000

elif freq == FR_US:
return seconds * 1000000 + microseconds

elif freq == FR_NS:
return (seconds * 1000000000 +
microseconds * 1000 + picoseconds / 1000)
microseconds * 1000 + picoseconds // 1000)

else:
return seconds
Expand All @@ -229,7 +249,7 @@ cdef int64_t get_period_ordinal(int year, int month, int day,
elif freq == FR_BUS:
# calculate the current week assuming sunday as last day of a week
# Jan 1 0001 is a Monday, so subtract 1 to get to end-of-week
weeks = (unix_date + ORD_OFFSET - 1) / 7
weeks = (unix_date + ORD_OFFSET - 1) // 7
# calculate the current weekday (in range 1 .. 7)
delta = (unix_date + ORD_OFFSET - 1) % 7 + 1
# return the number of business days in full weeks plus the business
Expand All @@ -241,7 +261,7 @@ cdef int64_t get_period_ordinal(int year, int month, int day,

elif freq_group == FR_WK:
day_adj = freq - FR_WK
return (unix_date + ORD_OFFSET - (1 + day_adj)) / 7 + 1 - WEEK_OFFSET
return (unix_date + ORD_OFFSET - (1 + day_adj)) // 7 + 1 - WEEK_OFFSET

# raise ValueError

Expand Down Expand Up @@ -272,6 +292,15 @@ cdef int64_t get_python_ordinal(int64_t period_ordinal, int freq) nogil:
This corresponds to the number of days since Jan., 1st, 1AD.
When the instance has a frequency less than daily, the proleptic date
is calculated for the last day of the period.

Parameters
----------
period_ordinal : int64_t
freq : int

Returns
-------
absdate : int64_t number of days since datetime(1, 1, 1)
"""
cdef:
asfreq_info af_info
Expand All @@ -290,6 +319,21 @@ cdef int dInfoCalc_SetFromAbsDateTime(date_info *dinfo,
"""
Set the instance's value using the given date and time.
Assumes GREGORIAN_CALENDAR.

Parameters
----------
dinfo : date_info*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pointer to

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what the asterisk is for (this notation is used in docstrings elsewhere). Have a preferred syntax?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, you are removing this anyhow IIRC? (in follow on)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

absdate : int64_t days elapsed since datetime(1, 1, 1)
abstime : double
seconds elapsed since beginning of day described by absdate

Returns
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should no longer be needed. IOW this is the point of an errors declaration (and raising)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, will be removing the leftover-from-C bits (and camelCase) in the next pass. For now this PR is focused on just the one bug.

-------
code : int (always 0)

Notes
-----
Updates dinfo inplace
"""
# Bounds check
# The calling function is responsible for ensuring that
Expand All @@ -307,6 +351,19 @@ cdef int dInfoCalc_SetFromAbsDate(date_info *dinfo, int64_t absdate) nogil:
"""
Sets the date part of the date_info struct
Assumes GREGORIAN_CALENDAR

Parameters
----------
dinfo : date_info*
unix_date : int64_t

Returns
-------
code : int (always 0)

Notes
-----
Updates dinfo inplace
"""
cdef:
pandas_datetimestruct dts
Expand All @@ -319,9 +376,23 @@ cdef int dInfoCalc_SetFromAbsDate(date_info *dinfo, int64_t absdate) nogil:


@cython.cdivision
cdef int dInfoCalc_SetFromAbsTime(date_info *dinfo, double abstime) nogil:
cdef void dInfoCalc_SetFromAbsTime(date_info *dinfo, double abstime) nogil:
"""
Sets the time part of the DateTime object.

Parameters
----------
dinfo : date_info*
abstime : double
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you be more specific here , e.g. what is abstime

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The next line seconds elapsed since beginning of day described by absdate isn't clear?

seconds elapsed since beginning of day described by absdate

Returns
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't have Returns for None, remove

-------
None

Notes
-----
Updates dinfo inplace
"""
cdef:
int inttime
Expand All @@ -336,7 +407,7 @@ cdef int dInfoCalc_SetFromAbsTime(date_info *dinfo, double abstime) nogil:
dinfo.hour = hour
dinfo.minute = minute
dinfo.second = second
return 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do you need a return?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't. Will remove when killing this function off.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just remove the return

return


@cython.cdivision
Expand Down Expand Up @@ -370,7 +441,18 @@ cdef int64_t absdate_from_ymd(int year, int month, int day) nogil:
Find the absdate (days elapsed since datetime(1, 1, 1)
for the given year/month/day.
Assumes GREGORIAN_CALENDAR

Parameters
----------
year : int
month : int
day : int

Returns
-------
absdate : int days elapsed since datetime(1, 1, 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doc-string format, put the expl on the next line

"""

# /* Calculate the absolute date
cdef:
pandas_datetimestruct dts
Expand All @@ -385,6 +467,24 @@ cdef int64_t absdate_from_ymd(int year, int month, int day) nogil:


cdef int get_yq(int64_t ordinal, int freq, int *quarter, int *year):
"""
Find the year and quarter of a Period with the given ordinal and frequency

Parameters
----------
ordinal : int64_t
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very odd that these take pointers

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point these functions are translated as directly as possible from the C versions. In future passes I'd like to make them more pythonic too, will need to be careful about perf.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that’s fine

freq : int
quarter : *int
year : *int

Returns
-------
qtr_freq : int describing the implied quarterly frequency
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same


Notes
-----
Sets quarter and year inplace
"""
cdef:
asfreq_info af_info
int qtr_freq
Expand Down
23 changes: 22 additions & 1 deletion pandas/tests/tslibs/test_period_asfreq.py
Original file line number Diff line number Diff line change
@@ -1,10 +1,31 @@
# -*- coding: utf-8 -*-

import pytest

from pandas.errors import OutOfBoundsDatetime
from pandas._libs.tslibs.frequencies import get_freq
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would not be averse to makeing this a submodule in future
tests/tslibls/period/test_asfreq

from pandas._libs.tslibs.period import period_ordinal, period_asfreq
from pandas._libs.tslibs.period import period_ordinal, period_asfreq, Period


class TestPeriodFreqConversion(object):
@pytest.mark.parametrize('freq', ['A', 'Q', 'M', 'W', 'B', 'D'])
def test_asfreq_near_zero(self, freq):
Copy link
Contributor

@jreback jreback Feb 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a whatsnew note for this
nvm saw that

# GH#19643, GH#19650
per = Period('0001-01-01', freq=freq)
tup1 = (per.year, per.hour, per.day)

prev = per - 1
assert (per - 1).ordinal == per.ordinal - 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think as a general rule if you are importing Period, then it should go in pandas/tests/scalar/period/* else it can go here (e.g. you are testing a helper function)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeing as how the recent addition of tests.scalar.period will necessitate a new look at this, I propose we merge this bug fix and the next pass (later today) can attempt to standardize this convention.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can also simply move the tests there now (as travis has to run again anyhow since you just added a commit)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you update for this comment

tup2 = (prev.year, prev.month, prev.day)
assert tup2 < tup1

@pytest.mark.xfail
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a reason

def test_to_timestamp_out_of_bounds(self):
# GH#19643, currently gives Timestamp('1754-08-30 22:43:41.128654848')
per = Period('0001-01-01', freq='B')
with pytest.raises(OutOfBoundsDatetime):
per.to_timestamp()

def test_intraday_conversion_factors(self):
assert period_asfreq(1, get_freq('D'), get_freq('H'), False) == 24
assert period_asfreq(1, get_freq('D'), get_freq('T'), False) == 1440
Expand Down