WIP: Add DayEnd, DayBegin Offsets (Help Wanted) #27087

Closed
wants to merge 2 commits into from

@ghost ghost commented Jun 27, 2019

This is missing tests, but I'm not sure I'm doing this properly (timezones, DST) so comments/advice welcome. If reviewers are inclined to merge a complete PR, I'll gladly add the missing tests.

I think the + timedelta(days=1, nanoseconds=-1) trick is safe: the pytz database shows no DST transitions at midnight in the history of the world. I did, however, encounter a trippy "non-existent time" exception, which should merit an achievement badge in some other context.

In [4]: import pandas as pd
   ...: index=pd.date_range("2011-12-31","2012-01-01",freq="12H")
   ...: index
Out[4]: 
DatetimeIndex(['2011-12-31 00:00:00', '2011-12-31 12:00:00',
               '2012-01-01 00:00:00'],
              dtype='datetime64[ns]', freq='12H')

In [5]: index+pd.offsets.DayEnd()
Out[5]: 
DatetimeIndex(['2011-12-31 23:59:59.999999999',
               '2011-12-31 23:59:59.999999999',
               '2012-01-01 23:59:59.999999999'],
              dtype='datetime64[ns]', freq='12H')

In [6]: index-pd.offsets.DayEnd()
Out[6]: 
DatetimeIndex(['2011-12-30 23:59:59.999999999',
               '2011-12-30 23:59:59.999999999',
               '2011-12-31 23:59:59.999999999'],
              dtype='datetime64[ns]', freq='12H')

In [2]: index-pd.offsets.DayBegin()
Out[2]: DatetimeIndex(['2011-12-30', '2011-12-30', '2011-12-31'], dtype='datetime64[ns]', freq='12H')

In [3]: index+pd.offsets.DayBegin()
Out[3]: DatetimeIndex(['2012-01-01', '2012-01-01', '2012-01-02'], dtype='datetime64[ns]', freq='12H')
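
For comparison, the DayEnd outputs above can be reproduced with existing pandas calls. This is a minimal sketch, not the PR's implementation: the helper names `day_end_forward` and `day_end_backward` are hypothetical, and the sketch only covers tz-naive data, so it says nothing about the DST question raised above.

```python
import numpy as np
import pandas as pd

# Same three timestamps as the date_range example, built explicitly.
index = pd.DatetimeIndex(
    np.array(
        ["2011-12-31T00:00", "2011-12-31T12:00", "2012-01-01T00:00"],
        dtype="datetime64[ns]",
    )
)

ONE_NS = pd.Timedelta(1, unit="ns")


def day_end_forward(idx):
    """Hypothetical helper: last nanosecond of each timestamp's own day."""
    return idx.normalize() + pd.Timedelta(days=1) - ONE_NS


def day_end_backward(idx):
    """Hypothetical helper: last nanosecond of the previous day."""
    return idx.normalize() - ONE_NS


forward = day_end_forward(index)    # compare with index + DayEnd() above
backward = day_end_backward(index)  # compare with index - DayEnd() above
print(forward)
print(backward)
```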

I needed DayEnd in #26959: when you compute a rolling statistic by day, you want the backward-looking window to always start from the end of the day, not from whatever random time the last event of that day occurred.

Related: #7049, insofar as users need a wider range of aliases than is currently supported.

@pep8speaks

pep8speaks commented Jun 27, 2019

Hello @pilkibun! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 941:23: E226 missing whitespace around arithmetic operator
Line 954:80: E501 line too long (106 > 79 characters)

Comment last updated at 2019-06-28 15:50:51 UTC

Contributor

@jreback jreback left a comment

does .normalize() not actually do this already?

@ghost
Author

ghost commented Jun 27, 2019

You're right, that does make DayBegin redundant in principle. That is, DayBegin doesn't do anything that .normalize() doesn't.

But since all other offsets seem to come in pairs, and for symmetry, I thought it's best to provide it. Not attached to it, though.

@mroeschke
Member

I'm -1 on adding more offsets, especially in this case where DayBegin is the same as .normalize().

@ghost
Author

ghost commented Jun 28, 2019

@mroeschke, what then is the idiomatic way for the equivalent of index+3*pd.offsets.DayEnd()?

@mroeschke
Member

(index + pd.offsets.Day(4)).normalize() - pd.offsets.Nano(1)

We haven't seen any demand for DayEnd until your PR. It'd be best to open up an issue to gauge interest first.

@jreback
Contributor

jreback commented Jun 28, 2019

Idiomatically, we can already do this pretty easily, though I am not sure of an actual use case where you would need this.

In [1]: dr = pd.date_range("2011-12-31","2012-01-01",freq="12H")

In [2]: dr.to_period('D').to_timestamp(how='end')
Out[2]: 
DatetimeIndex(['2011-12-31 23:59:59.999999999',
               '2011-12-31 23:59:59.999999999',
               '2012-01-01 23:59:59.999999999'],
              dtype='datetime64[ns]', freq=None)

In [3]: dr.to_period('D').to_timestamp(how='start')
Out[3]: DatetimeIndex(['2011-12-31', '2011-12-31', '2012-01-01'], dtype='datetime64[ns]', freq=None)
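
As a quick cross-check, the period round trip above and a normalize-based idiom like the one suggested earlier in the thread should give the same values for tz-naive data. The following is a minimal sketch under that assumption; the variable names are illustrative only, and time zones are not covered.

```python
import numpy as np
import pandas as pd

# Same timestamps as dr above, built explicitly.
dr = pd.DatetimeIndex(
    np.array(
        ["2011-12-31T00:00", "2011-12-31T12:00", "2012-01-01T00:00"],
        dtype="datetime64[ns]",
    )
)

# End of day, two ways.
via_period_end = dr.to_period("D").to_timestamp(how="end")
via_normalize_end = (
    dr.normalize() + pd.Timedelta(days=1) - pd.Timedelta(1, unit="ns")
)

# Start of day, two ways.
via_period_start = dr.to_period("D").to_timestamp(how="start")
via_normalize_start = dr.normalize()

ends_match = bool((via_period_end == via_normalize_end).all())
starts_match = bool((via_period_start == via_normalize_start).all())
print(ends_match, starts_match)
```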

@jreback jreback added the Frequency DateOffsets label Jun 28, 2019
@ghost
Author

ghost commented Jun 28, 2019

Compare (after #27090)

df.ceil('DE')

which matches what you currently use if you want end of month:

df.ceil('M')

A day is just another frequency; it's bad usability to require entirely different ways to accomplish what is essentially the same thing.

@ghost
Author

ghost commented Jul 31, 2019

I still think this would be useful, but the silence on this strongly suggests a definite no.

@ghost ghost closed this Jul 31, 2019
@ghost ghost deleted the add_endofday_offset branch July 31, 2019 14:11
This pull request was closed.