API: which "anchor point" for datetime properties of Periods ? #20324

Open
jorisvandenbossche opened this issue Mar 13, 2018 · 8 comments
Labels
Docs Period Period data type

Comments

@jorisvandenbossche
Member

In several docstring PRs for Period datetime properties, we ran into confusion about which point in the period those attributes are computed from (start or end?). E.g., see the discussion in #20277 (comment)

Small example to illustrate:

In [30]: p1 = pd.Period('2017-01-01', freq='D')

In [31]: p2 = pd.Period('2017-01-01', freq='M')

In [32]: p1
Out[32]: Period('2017-01-01', 'D')

In [33]: p2
Out[33]: Period('2017-01', 'M')

In [34]: p1.start_time
Out[34]: Timestamp('2017-01-01 00:00:00')

In [35]: p2.start_time
Out[35]: Timestamp('2017-01-01 00:00:00')

In [36]: p1.day
Out[36]: 1

In [37]: p2.day
Out[37]: 31

The discussion arose from how to describe the summary of such an attribute: "The day of the month" -> but which day of the period span? -> should this be "The day of the month of the start of the Period"? -> ah, no, because it is not always the start; it depends on the frequency.

In the above example, M is actually the freq string for "MonthEnd", and the datetime properties then apparently use the end of the period as the date from which those properties are calculated.
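To make the comparison above easier to reproduce, here is a minimal sketch that prints `day` next to the same field taken explicitly from `start_time` and `end_time`. It only uses attributes already shown in this issue; the variable names are illustrative.

```python
import pandas as pd

# The same date string, interpreted at two different frequencies.
day_period = pd.Period("2017-01-01", freq="D")
month_period = pd.Period("2017-01-01", freq="M")

for p in (day_period, month_period):
    # Compare the Period-level attribute with both explicit endpoints.
    print(p, "| day:", p.day,
          "| start day:", p.start_time.day,
          "| end day:", p.end_time.day)
```

For the daily period all three values agree, while for the monthly period `day` matches the end of the span rather than its start.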

Questions:

  • How best to document this? Can we use a certain phrase in all docstrings?
  • Is there a way to know, given a certain freq, what the "anchor point" is? (I am using "anchor point" here; I don't know if we have existing terminology for that.) Is there a way to know if the freq is an "End" frequency?
  • It's rather confusing behaviour, is this actually the behaviour we want?

cc @jreback @jbrockmendel @sinhrks

@dukebody
Contributor

About the third question, I'd say the current behavior is quite confusing. I'd even be in favor of deprecating these properties on Period and forcing the user to write e.g. period.start_time.dayofweek or period.end_time.dayofweek to avoid confusion... although that might be too nuclear :)
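As an illustration of the explicit spelling suggested here, a short sketch (the variable names are illustrative, and the monthly period is the one from the original example):

```python
import pandas as pd

period = pd.Period("2017-01", freq="M")

# Spell out which end of the span is meant, instead of relying on
# the implicit anchor used by period.dayofweek.
start_dow = period.start_time.dayofweek  # 2017-01-01 is a Sunday -> 6
end_dow = period.end_time.dayofweek      # 2017-01-31 is a Tuesday -> 1

print(start_dow, end_dow)
```

Going through `start_time` or `end_time` leaves no doubt about which timestamp the value describes.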

@dukebody
Contributor

Another option would be to only report these properties if they are the same for start and end date. For example, if I select a period of one hour starting from now, the dayofweek is the same for start and end. If I select a period of one month, the dayofweek is somewhat undefined for the period, because there are multiple days of week during this period.
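A rough sketch of this idea, assuming a hypothetical helper named `field_if_unambiguous` (it is not part of pandas). It reports a field only when the start and the end of the period agree on it, and returns `None` otherwise.

```python
from typing import Optional

import pandas as pd


def field_if_unambiguous(period: pd.Period, field: str) -> Optional[int]:
    """Hypothetical helper: return `field` only if start and end agree on it."""
    start_value = getattr(period.start_time, field)
    end_value = getattr(period.end_time, field)
    return start_value if start_value == end_value else None


one_day = pd.Period("2017-01-01", freq="D")
one_month = pd.Period("2017-01", freq="M")

print(field_if_unambiguous(one_day, "dayofweek"))    # same weekday at both ends
print(field_if_unambiguous(one_month, "dayofweek"))  # None: the span covers many weekdays
print(field_if_unambiguous(one_month, "month"))      # the month is well defined
```

Whether `None`, `NaN`, or an exception is the right signal for the ambiguous case is a design choice this sketch leaves open.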

@jorisvandenbossche
Member Author

jorisvandenbossche commented Mar 13, 2018

It gets even stranger when you use a multiple of a freq:

In [56]: p3 = pd.Period('2017-01-01', freq='2M')

In [57]: p3.start_time
Out[57]: Timestamp('2017-01-01 00:00:00')

In [58]: p3.end_time
Out[58]: Timestamp('2017-02-28 23:59:59.999999999')

In [59]: p3.day
Out[59]: 31

So in this case, for the datetime properties, it discards the multiplier in the freq, which is for sure a bug.

@jorisvandenbossche
Member Author

And another "strange" thing: for the day attribute it seems to take the end time for MonthEnd, but not for hour, minute, and so on:

In [61]: p2 = pd.Period('2017-01-01', freq='M')

In [62]: p2.start_time
Out[62]: Timestamp('2017-01-01 00:00:00')

In [63]: p2.end_time
Out[63]: Timestamp('2017-01-31 23:59:59.999999999')

In [64]: p2.day
Out[64]: 31

In [65]: p2.hour
Out[65]: 0

In [66]: p2.minute
Out[66]: 0
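One way to check such inconsistencies empirically is to compare each Period attribute with the same attribute on both endpoints. The sketch below assumes a hypothetical helper named `matching_endpoint` (not a pandas function) and relies on the behaviour shown in the session above.

```python
import pandas as pd


def matching_endpoint(period: pd.Period, field: str) -> str:
    """Hypothetical helper: report which endpoint a Period attribute agrees with."""
    value = getattr(period, field)
    at_start = value == getattr(period.start_time, field)
    at_end = value == getattr(period.end_time, field)
    if at_start and at_end:
        return "both"
    if at_start:
        return "start"
    if at_end:
        return "end"
    return "neither"


p = pd.Period("2017-01-01", freq="M")
for field in ("day", "hour", "minute"):
    print(field, "->", matching_endpoint(p, field))
```

With the outputs reported above, `day` lines up with the end of the month while `hour` and `minute` line up with the start, which is the mismatch described in this comment.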

@jbrockmendel
Member

Related: #18378.

@jreback
Contributor

jreback commented Mar 13, 2018

I think we should just (deprecate) and then remove all of these properties. I think these were just carried over from Timestamp.

@mroeschke mroeschke added the Deprecate Functionality to remove in pandas label May 11, 2020
@mroeschke
Member

I don't think these attributes can be easily deprecated. Since Periods can represent time spans beyond the Timestamp bounds, Periods have property definitions that remain valid outside those bounds. It is probably best to document this better.

@mroeschke mroeschke added Docs and removed Deprecate Functionality to remove in pandas labels Jun 8, 2022
@jbrockmendel
Member

I think the Technically Correct solution would be for high-frequency attributes e.g. Period("2016", "Y").day to return NaN instead of an int.


5 participants