Skip to content

BUG: fix Period.asfreq conversion near datetime(1, 1, 1) #19650

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Feb 21, 2018
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.23.0.txt
Original file line number Diff line number Diff line change
Expand Up @@ -714,6 +714,7 @@ Datetimelike
- Bug in :class:`Timestamp` and :func:`to_datetime` where a string representing a barely out-of-bounds timestamp would be incorrectly rounded down instead of raising ``OutOfBoundsDatetime`` (:issue:`19382`)
- Bug in :func:`Timestamp.floor` :func:`DatetimeIndex.floor` where time stamps far in the future and past were not rounded correctly (:issue:`19206`)
- Bug in :func:`to_datetime` where passing an out-of-bounds datetime with ``errors='coerce'`` and ``utc=True`` would raise ``OutOfBoundsDatetime`` instead of parsing to ``NaT`` (:issue:`19612`)
- Bug in :func:`Period.asfreq` where periods near ``datetime(1, 1, 1)`` could be converted incorrectly (:issue:`19643`)
-

Timezones
Expand Down
5 changes: 3 additions & 2 deletions pandas/_libs/src/period_helper.c
Original file line number Diff line number Diff line change
Expand Up @@ -138,7 +138,7 @@ PANDAS_INLINE npy_int64 transform_via_day(npy_int64 ordinal,
}

static npy_int64 DtoB_weekday(npy_int64 absdate) {
return (((absdate) / 7) * 5) + (absdate) % 7 - BDAY_OFFSET;
return floordiv(absdate, 7) * 5 + mod_compat(absdate, 7) - BDAY_OFFSET;
}

static npy_int64 DtoB(struct date_info *dinfo,
Expand Down Expand Up @@ -245,7 +245,8 @@ static npy_int64 asfreq_UpsampleWithinDay(npy_int64 ordinal,
static npy_int64 asfreq_BtoDT(npy_int64 ordinal, asfreq_info *af_info) {
ordinal += BDAY_OFFSET;
ordinal =
(((ordinal - 1) / 5) * 7 + mod_compat(ordinal - 1, 5) + 1 - ORD_OFFSET);
(floordiv(ordinal - 1, 5) * 7 + mod_compat(ordinal - 1, 5) + 1 -
ORD_OFFSET);

return upsample_daytime(ordinal, af_info);
}
Expand Down
110 changes: 103 additions & 7 deletions pandas/_libs/tslibs/period.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -154,12 +154,30 @@ cdef inline int get_freq_group(int freq) nogil:
return (freq // 1000) * 1000


@cython.cdivision
@cython.cdivision(False) # specifically _dont_ use cdvision GH#19643
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a comment on why

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather does this actually matter now that you are always using floor division?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather does this actually matter now that you are always using floor division?

Yes. With cdivision -1 // 7 returns 0. I find it really counter-intuitive, not at all surprising that this bug slipped through.

cdef int64_t get_period_ordinal(int year, int month, int day,
int hour, int minute, int second,
int microseconds, int picoseconds,
int freq) nogil:
"""generate an ordinal in period space"""
"""
Generate an ordinal in period space

Parameters
----------
year : int
month : int
day : int
hour : int
minute : int
second : int
microseconds : int
picoseconds : int
freq : int

Returns
-------
period_ordinal : int64_t
"""
cdef:
int64_t absdays, unix_date, seconds, delta
int64_t weeks
Expand Down Expand Up @@ -190,7 +208,7 @@ cdef int64_t get_period_ordinal(int year, int month, int day,
if month >= fmonth:
mdiff += 12

return (year - 1970) * 4 + (mdiff - 1) / 3
return (year - 1970) * 4 + (mdiff - 1) // 3

elif freq == FR_MTH:
return (year - 1970) * 12 + month - 1
Expand All @@ -202,14 +220,14 @@ cdef int64_t get_period_ordinal(int year, int month, int day,
seconds = unix_date * 86400 + hour * 3600 + minute * 60 + second

if freq == FR_MS:
return seconds * 1000 + microseconds / 1000
return seconds * 1000 + microseconds // 1000

elif freq == FR_US:
return seconds * 1000000 + microseconds

elif freq == FR_NS:
return (seconds * 1000000000 +
microseconds * 1000 + picoseconds / 1000)
microseconds * 1000 + picoseconds // 1000)

else:
return seconds
Expand All @@ -229,7 +247,7 @@ cdef int64_t get_period_ordinal(int year, int month, int day,
elif freq == FR_BUS:
# calculate the current week assuming sunday as last day of a week
# Jan 1 0001 is a Monday, so subtract 1 to get to end-of-week
weeks = (unix_date + ORD_OFFSET - 1) / 7
weeks = (unix_date + ORD_OFFSET - 1) // 7
# calculate the current weekday (in range 1 .. 7)
delta = (unix_date + ORD_OFFSET - 1) % 7 + 1
# return the number of business days in full weeks plus the business
Expand All @@ -241,7 +259,7 @@ cdef int64_t get_period_ordinal(int year, int month, int day,

elif freq_group == FR_WK:
day_adj = freq - FR_WK
return (unix_date + ORD_OFFSET - (1 + day_adj)) / 7 + 1 - WEEK_OFFSET
return (unix_date + ORD_OFFSET - (1 + day_adj)) // 7 + 1 - WEEK_OFFSET

# raise ValueError

Expand Down Expand Up @@ -272,6 +290,15 @@ cdef int64_t get_python_ordinal(int64_t period_ordinal, int freq) nogil:
This corresponds to the number of days since Jan., 1st, 1AD.
When the instance has a frequency less than daily, the proleptic date
is calculated for the last day of the period.

Parameters
----------
period_ordinal : int64_t
freq : int

Returns
-------
absdate : int64_t number of days since datetime(1, 1, 1)
"""
cdef:
asfreq_info af_info
Expand All @@ -290,6 +317,20 @@ cdef int dInfoCalc_SetFromAbsDateTime(date_info *dinfo,
"""
Set the instance's value using the given date and time.
Assumes GREGORIAN_CALENDAR.

Parameters
----------
dinfo : date_info*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pointer to

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what the asterisk is for (this notation is used in docstrings elsewhere). Have a preferred syntax?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, you are removing this anyhow IIRC? (in follow on)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

absdate : int64_t (as returned from get_python_ordinal or absdate_from_ymd)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a great comment (nor does it follow doc-string conventions)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will change.

abstime : double

Returns
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should no longer be needed. IOW this is the point of an errors declaration (and raising)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, will be removing the leftover-from-C bits (and camelCase) in the next pass. For now this PR is focused on just the one bug.

-------
code : int (always 0)

Notes
-----
Updates dinfo inplace
"""
# Bounds check
# The calling function is responsible for ensuring that
Expand All @@ -307,6 +348,19 @@ cdef int dInfoCalc_SetFromAbsDate(date_info *dinfo, int64_t absdate) nogil:
"""
Sets the date part of the date_info struct
Assumes GREGORIAN_CALENDAR

Parameters
----------
dinfo : date_info*
unix_date : int64_t

Returns
-------
code : int (always 0)

Notes
-----
Updates dinfo inplace
"""
cdef:
pandas_datetimestruct dts
Expand All @@ -322,6 +376,19 @@ cdef int dInfoCalc_SetFromAbsDate(date_info *dinfo, int64_t absdate) nogil:
cdef int dInfoCalc_SetFromAbsTime(date_info *dinfo, double abstime) nogil:
"""
Sets the time part of the DateTime object.

Parameters
----------
dinfo : date_info*
abstime : double
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you be more specific here , e.g. what is abstime

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The next line seconds elapsed since beginning of day described by absdate isn't clear?


Returns
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't have Returns for None, remove

-------
code : int (always 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

success / error

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's actually a dummy; there is no error-state. I guess just make it return None?


Notes
-----
Updates dinfo inplace
"""
cdef:
int inttime
Expand Down Expand Up @@ -370,7 +437,18 @@ cdef int64_t absdate_from_ymd(int year, int month, int day) nogil:
Find the absdate (days elapsed since datetime(1, 1, 1)
for the given year/month/day.
Assumes GREGORIAN_CALENDAR

Parameters
----------
year : int
month : int
day : int

Returns
-------
absdate : int days elapsed since datetime(1, 1, 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doc-string format, put the expl on the next line

"""

# /* Calculate the absolute date
cdef:
pandas_datetimestruct dts
Expand All @@ -385,6 +463,24 @@ cdef int64_t absdate_from_ymd(int year, int month, int day) nogil:


cdef int get_yq(int64_t ordinal, int freq, int *quarter, int *year):
"""
Find the year and quarter of a Period with the given ordinal and frequency

Parameters
----------
ordinal : int64_t
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very odd that these take pointers

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point these functions are translated as directly as possible from the C versions. In future passes I'd like to make them more pythonic too, will need to be careful about perf.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that’s fine

freq : int
quarter : *int
year : *int

Returns
-------
qtr_freq : int describing the implied quarterly frequency
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same


Notes
-----
Sets quarter and year inplace
"""
cdef:
asfreq_info af_info
int qtr_freq
Expand Down
Loading