
BUG: seems that subtracting a YearBegin offset from an index loses the months/days #4804

Closed
jreback opened this issue Sep 10, 2013 · 17 comments
Labels
Bug Datetime Datetime data dtype Frequency DateOffsets Timedelta Timedelta data type

Comments

@jreback
Contributor

jreback commented Sep 10, 2013

related is #4637 (on freq for WeekDay)

This might be expected (e.g. it conforms the values), or is this a bug?

In [48]: index = date_range('20140101 00:15:00',freq='D',periods=10)
In [44]: df.index.values
Out[44]: 
array(['2013-12-31T19:15:00.000000000-0500',
       '2014-01-01T19:15:00.000000000-0500',
       '2014-01-02T19:15:00.000000000-0500',
       '2014-01-03T19:15:00.000000000-0500',
       '2014-01-04T19:15:00.000000000-0500',
       '2014-01-05T19:15:00.000000000-0500',
       '2014-01-06T19:15:00.000000000-0500',
       '2014-01-07T19:15:00.000000000-0500',
       '2014-01-08T19:15:00.000000000-0500',
       '2014-01-09T19:15:00.000000000-0500'], dtype='datetime64[ns]')

In [46]: (df.index - pd.offsets.YearBegin(1)).values
Out[46]: 
array(['2012-12-31T19:15:00.000000000-0500',
       '2013-12-31T19:15:00.000000000-0500',
       '2013-12-31T19:15:00.000000000-0500',
       '2013-12-31T19:15:00.000000000-0500',
       '2013-12-31T19:15:00.000000000-0500',
       '2013-12-31T19:15:00.000000000-0500',
       '2013-12-31T19:15:00.000000000-0500',
       '2013-12-31T19:15:00.000000000-0500',
       '2013-12-31T19:15:00.000000000-0500',
       '2013-12-31T19:15:00.000000000-0500'], dtype='datetime64[ns]')
@rockg
Contributor

rockg commented Sep 11, 2013

I agree this is a bug. It seems to me something is going on with the timezones. When you look at df.index - pd.offsets.YearBegin(1) itself (not its .values), you get the right times. If you provide a timezone initially, such as US/Eastern, the times you expect are returned. Also, I believe there is a bug in dateutil: it doesn't observe the timezone information, so all resulting times are shifted by the offset of the initial date rather than that of the resulting date. For example, taking a March DST time and shifting to the beginning of the month retains the DST offset of the start date (so you actually get the last hour of February returned).

tz = pytz.timezone('US/Eastern')
dt = tz.localize(datetime(2013,3,20))
dt
datetime.datetime(2013, 3, 20, 0, 0, tzinfo=<DstTzInfo 'US/Eastern' EDT-1 day, 20:00:00 DST>)
dt - MonthBegin()
datetime.datetime(2013, 3, 1, 0, 0, tzinfo=<DstTzInfo 'US/Eastern' EDT-1 day, 20:00:00 DST>)

The last result should carry the standard-time offset (19:00:00 STD), not 20:00:00 DST, because March 1 falls before the DST change.
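A minimal sketch of the stale-offset behavior described above, assuming pytz is installed. It uses `datetime.replace` as a stand-in for the offset arithmetic, which is an assumption for illustration and not necessarily what `MonthBegin` does internally.

```python
from datetime import datetime

import pytz

tz = pytz.timezone("US/Eastern")
dt = tz.localize(datetime(2013, 3, 20))  # March 20 is in EDT (UTC-4)

# Changing only the calendar fields keeps the old tzinfo object, so the
# result still claims the EDT offset on a date that is actually in EST.
stale = dt.replace(day=1)

# Option 1: do the arithmetic on the naive value, then localize again.
fixed = tz.localize(dt.replace(day=1, tzinfo=None))

# Option 2: normalize() keeps the instant and corrects the offset, which
# moves the wall clock back into the last hour of February.
normalized = tz.normalize(stale)

print(stale.utcoffset(), fixed.utcoffset(), normalized)
```

Re-localizing keeps the wall-clock time (midnight on March 1), while `normalize` keeps the instant, so which one is appropriate depends on what the offset is meant to preserve.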

@rockg
Contributor

rockg commented Sep 12, 2013

Actually I think this is just a representation issue of datetime64 that casts the time as local to what seems to be the machine's timezone. I cannot find the code where this happens, though. Case in point:

indextz = date_range('20140101 00:15:00', freq='D', periods=10, tz='UTC')
indextz.values
array(['2013-12-31T19:15:00.000000000-0500',
       '2014-01-01T19:15:00.000000000-0500',
       '2014-01-02T19:15:00.000000000-0500',
       '2014-01-03T19:15:00.000000000-0500',
       '2014-01-04T19:15:00.000000000-0500',
       '2014-01-05T19:15:00.000000000-0500',
       '2014-01-06T19:15:00.000000000-0500',
       '2014-01-07T19:15:00.000000000-0500',
       '2014-01-08T19:15:00.000000000-0500',
       '2014-01-09T19:15:00.000000000-0500'], dtype='datetime64[ns]')
indextz
<class 'pandas.tseries.index.DatetimeIndex'>
[2014-01-01 00:15:00, ..., 2014-01-10 00:15:00]
Length: 10, Freq: D, Timezone: UTC

@rockg
Contributor

rockg commented Sep 24, 2013

If I set my machine's TZ to UTC and do the above, I get the following:

index = date_range('20140101 00:15:00',freq='D',periods=10)
(index - pd.offsets.YearBegin(1)).values
array(['2013-01-01T00:15:00.000000000+0000',
       '2014-01-01T00:15:00.000000000+0000',
       '2014-01-01T00:15:00.000000000+0000',
       '2014-01-01T00:15:00.000000000+0000',
       '2014-01-01T00:15:00.000000000+0000',
       '2014-01-01T00:15:00.000000000+0000',
       '2014-01-01T00:15:00.000000000+0000',
       '2014-01-01T00:15:00.000000000+0000',
       '2014-01-01T00:15:00.000000000+0000',
       '2014-01-01T00:15:00.000000000+0000'], dtype='datetime64[ns]')

So I'm confident that it is just a display issue.
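One way to check this without relying on the array repr is to compare against explicit timestamps. The sketch below assumes a reasonably recent pandas and NumPy, where datetime64 values print without a local offset.

```python
import pandas as pd

index = pd.date_range("2014-01-01 00:15:00", freq="D", periods=10)
shifted = index - pd.offsets.YearBegin(1)

# .values is a tz-naive datetime64[ns] array; inspect it directly.
print(shifted.values[:2])

# The time of day survives; only the date is rolled back to a January 1.
time_kept = bool((shifted.hour == 0).all() and (shifted.minute == 15).all())
print(time_kept)
```

The first element is already on a year start, so it rolls back a full year to 2013-01-01 00:15:00; the rest roll back to 2014-01-01 00:15:00, matching the output above.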

@jreback
Contributor Author

jreback commented Sep 25, 2013

@rockg so there is no easy way to effectively just subtract a year? Like a 'fixed' year, one that does not reset to the beginning/end of the year? (Month obviously cannot support this, as that can cause a month/year roll.) It seems we need a Year offset that is essentially:

In [18]: index = date_range('20121230 00:01:15',periods=5,freq='D')

In [19]: index.values
Out[19]: 
array(['2012-12-29T19:01:15.000000000-0500',
       '2012-12-30T19:01:15.000000000-0500',
       '2012-12-31T19:01:15.000000000-0500',
       '2013-01-01T19:01:15.000000000-0500',
       '2013-01-02T19:01:15.000000000-0500'], dtype='datetime64[ns]')

In [20]: from dateutil.relativedelta import relativedelta

In [21]: Index([  i + relativedelta(i,years=-1) for i in index ]).values
Out[21]: 
array(['2011-12-29T19:01:15.000000000-0500',
       '2011-12-30T19:01:15.000000000-0500',
       '2011-12-31T19:01:15.000000000-0500',
       '2012-01-01T19:01:15.000000000-0500',
       '2012-01-02T19:01:15.000000000-0500'], dtype='datetime64[ns]')

@rockg
Contributor

rockg commented Sep 26, 2013

@jreback -- sorry, I misunderstood your statement (I thought you were commenting on the seemingly incorrect days/hours when using YearBegin). I agree that a Year offset is basic functionality. A couple more would be useful as well: WeekendDay (Saturday/Sunday, the complement to BDay; maybe WDay?) and Easter (one of the trickier holidays, but dateutil has most of the work done already). I'll take these.

@jreback
Contributor Author

jreback commented Sep 26, 2013

I would call WeekendDay just that (to avoid ambiguity). If you want to do holiday offsets, ok, but create a generic HolidayOffset first.

@jreback
Contributor Author

jreback commented Sep 26, 2013

@rockg heads up that cc @Cancan01 has a PR in progress for parts of this (Weekday I think), see #4511

@rockg
Contributor

rockg commented Sep 26, 2013

I would like to get some thoughts on HolidayOffset and the next logical step of HolidayCalendar. I worked extensively with these in the past so have some ideas.

  1. A holiday falls into two categories:
    is a Month / Day combination with a possible offset/observance. For example, Thanksgiving is always the 4th Thursday of the month so would be 11/1 + 4 Thursdays.

@jreback
Contributor Author

jreback commented Sep 26, 2013

@rockg there is already some functionality in 0.12, see here: http://pandas.pydata.org/pandas-docs/dev/timeseries.html#custom-business-days-experimental

@rockg
Contributor

rockg commented Sep 26, 2013

@jreback it was sent too early...

The functionality I would provide would allow for holidays to be retrieved from pandas in a nice way, but I need to understand what the expectations are for what lives in pandas vs not.

In general

  1. A holiday falls into two categories:
    a. Month / Day combination with a possible offset. For example, Thanksgiving is always the 4th Thursday of the month so would be 11/1 + 4 Thursdays.
    b. Month / Day combination with possible observance. For example, New Year's always occurs on 1/1 but may be observed on the following Monday or the previous Friday depending on what the rules are defined for that calendar.
  2. Holiday calendars are collections of holidays with certain observance rules and possible effective dates. For example, the US government has a different calendar than the CME Exchange, which has a different holiday calendar than the UK. Given a holiday calendar, there would be a method called holidays which would return the relevant holidays in a given date range.

So, if we create a generic HolidayOffset would pandas then hold definitions for other holiday rules (i.e., would pandas have NewYears, LaborDay, MemorialDay, UKBankHoliday etc.)? An argument into some of these would be the observance when creating an instance of the class. Further, would pandas implement various holiday calendars (e.g., US, UK, CME, etc.) or would these subclasses just be for users to implement?
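The two holiday categories above can be sketched with the standard library alone. The helper names `nth_weekday` and `nearest_workday` are hypothetical and chosen for illustration; the observance rule shown (Saturday moves to Friday, Sunday moves to Monday) is one possible rule, not one any particular calendar is known to use.

```python
from datetime import date, timedelta


def nth_weekday(year, month, weekday, n):
    """Category (a): the n-th given weekday (Monday=0) of a month."""
    first = date(year, month, 1)
    days_ahead = (weekday - first.weekday()) % 7
    return first + timedelta(days=days_ahead + 7 * (n - 1))


def nearest_workday(day):
    """Category (b): a fixed date observed on the closest weekday."""
    if day.weekday() == 5:  # Saturday is observed on the previous Friday
        return day - timedelta(days=1)
    if day.weekday() == 6:  # Sunday is observed on the following Monday
        return day + timedelta(days=1)
    return day


THURSDAY = 3
thanksgiving_2013 = nth_weekday(2013, 11, THURSDAY, 4)
new_year_2012 = nearest_workday(date(2012, 1, 1))
new_year_2011 = nearest_workday(date(2011, 1, 1))

print(thanksgiving_2013, new_year_2012, new_year_2011)
# 2013-11-28 2012-01-02 2010-12-31
```

Passing the observance function in as an argument, rather than hard-coding it, would match the idea that observance is chosen when a holiday instance is created.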

@jreback
Contributor Author

jreback commented Sep 26, 2013

Calendars are a bit tricky; I do in fact keep a separate calendar for each country (or you could have a sub-entity as well, e.g. CME/NYSE).

Thus the numpy calendar facility DOES become attractive (though it should maybe be wrapped), to allow instantiation of calendars that are then simply passed to a BDay (or CDay) type of object.

I think an extension along those lines would be pretty useful.
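As a rough sketch of that idea, a NumPy `busdaycalendar` can be built once and handed to a custom business day offset. The single-holiday calendar below is made up for illustration, and the example assumes a pandas version where `CustomBusinessDay` accepts a `calendar` argument.

```python
import numpy as np
import pandas as pd

# Hypothetical one-holiday calendar, for illustration only.
cal = np.busdaycalendar(holidays=["2013-07-04"])

# NumPy can step over the holiday directly...
next_np = np.busday_offset("2013-07-03", 1, busdaycal=cal)

# ...and the same calendar object can be passed to a custom business day offset.
cday = pd.offsets.CustomBusinessDay(calendar=cal)
next_pd = pd.Timestamp("2013-07-03") + cday

print(next_np, next_pd)
```

Both results skip Thursday, July 4 and land on Friday, July 5.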

@rockg
Contributor

rockg commented Sep 26, 2013

The numpy calendars are missing a lot of nuances and still need a list of holidays to pass in. Nevertheless, I think it's in keeping with how things have been implemented so far so I will subclass. So the expectation is that pandas will have the definitions for some basic set of holidays and calendar observances?

@jreback
Contributor Author

jreback commented Sep 26, 2013

@rockg I would have some 'standard' calendar subclasses, with each holiday added through a method. Then it's easy to create a 'new' calendar: just subclass your starting one and add some holidays.

@jreback
Contributor Author

jreback commented Sep 26, 2013

@rockg it should take a year / country combination as parameters (as some holidays occur only in certain years).

You can have country subclasses with lots of holiday definitions, and various business subclasses of those.
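A minimal sketch of that subclassing idea, using only the standard library. The class names, the `rules` attribute, and the chosen holidays are all hypothetical; this illustrates the shape of the design, not an existing pandas API.

```python
from datetime import date


class HolidayCalendar:
    """Base calendar: maps a holiday name to a rule taking a year."""

    rules = {}

    def holidays(self, start, end):
        """Return sorted (date, name) pairs falling within [start, end]."""
        found = []
        for year in range(start.year, end.year + 1):
            for name, rule in self.rules.items():
                day = rule(year)
                if start <= day <= end:
                    found.append((day, name))
        return sorted(found)


class USCalendar(HolidayCalendar):
    rules = {
        "New Year's Day": lambda year: date(year, 1, 1),
        "Independence Day": lambda year: date(year, 7, 4),
    }


class ExchangeCalendar(USCalendar):
    # A business-specific calendar starts from the country one and adds to it.
    rules = {**USCalendar.rules, "Christmas Day": lambda year: date(year, 12, 25)}


result = ExchangeCalendar().holidays(date(2013, 7, 1), date(2014, 1, 1))
for day, name in result:
    print(day, name)
```

Because each rule receives the year, a holiday that exists only in certain years could be handled by a rule that checks the year before returning a date, though that is left out here.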

@rockg
Contributor

rockg commented Sep 26, 2013

@jreback Back to Year offset, there's not a Month offset either, but I simply use DateOffset(months/years=n). Probably still worth having explicit classes for these.

@jreback
Contributor Author

jreback commented Sep 26, 2013

@rockg duh! But I agree... having explicit Month/Year offsets would be nice.

@jbrockmendel
Member

This is solved by subtracting pd.DateOffset(years=1). The related issue of naming being confusing is still in #18854.
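For reference, a short sketch of that approach on the index from the earlier comment, assuming a recent pandas. Unlike YearBegin, the month, day, and time of day are kept.

```python
import pandas as pd

index = pd.date_range("2012-12-30 00:01:15", periods=5, freq="D")

# Shift every timestamp back by one calendar year.
one_year_back = index - pd.DateOffset(years=1)
print(one_year_back)

# A leap day has no counterpart in the previous year, so it is clipped
# to the last valid day of the month.
leap = pd.Timestamp("2012-02-29") - pd.DateOffset(years=1)
print(leap)  # 2011-02-28 00:00:00
```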
