Skip to content

BUG: Cannot add non-vectorized DateOffset to empty DatetimeIndex #12724

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
zhengyu-inboc opened this issue Mar 28, 2016 · 11 comments · Fixed by #30336
Closed

BUG: Cannot add non-vectorized DateOffset to empty DatetimeIndex #12724

zhengyu-inboc opened this issue Mar 28, 2016 · 11 comments · Fixed by #30336
Assignees
Labels
good first issue Needs Tests Unit test(s) needed to prevent regressions Numeric Operations Arithmetic, Comparison, and Logical operations
Milestone

Comments

@zhengyu-inboc
Copy link

After the holidays refactor in pandas 0.17.0+, I'm seeing errors when calling the AbstractHolidayCalendar.holidays() method with a date range that is before the declared Holiday rule. Consider the example below, I'm declaring a custom MLK holiday for CME equities market with a specific start date of 1998-01-01, with a non-vectorised DateOffset (third monday of January).

If I call calendar.holidays(start='2015-01-01', end='2015-12-31'), it blows up with an exception:

Traceback (most recent call last):
  File "/tests/test_pandas_features.py", line 35, in test_no_holidays_generated_if_not_in_range
    calendar.holidays(start='1995-01-01', end='1995-12-31'))
  File "/python3/lib/python3.5/site-packages/pandas/tseries/holiday.py", line 377, in holidays
    rule_holidays = rule.dates(start, end, return_name=True)
  File "/python3/lib/python3.5/site-packages/pandas/tseries/holiday.py", line 209, in dates
    holiday_dates = self._apply_rule(dates)
  File "/python3/lib/python3.5/site-packages/pandas/tseries/holiday.py", line 278, in _apply_rule
    dates += offset
  File "/python3/lib/python3.5/site-packages/pandas/tseries/base.py", line 412, in __add__
    return self._add_delta(other)
  File "/python3/lib/python3.5/site-packages/pandas/tseries/index.py", line 736, in _add_delta
    result = DatetimeIndex(new_values, tz=tz, name=name, freq='infer')
  File "/python3/lib/python3.5/site-packages/pandas/util/decorators.py", line 89, in wrapper
    return func(*args, **kwargs)
  File "/python3/lib/python3.5/site-packages/pandas/tseries/index.py", line 231, in __new__
    raise ValueError("Must provide freq argument if no data is "
ValueError: Must provide freq argument if no data is supplied

After some digging, it appears that if the supplied date range is a year before the declared holiday rule start date, the DatetimeIndex constructed in tseries/holiday.py is an empty index (a reasonable optimization), but when a non-vectorized DateOffset is being applied to an empty DatetimeIndex, the _add_offset method in tseries/index.py returns a plain empty Index, rather than DatetimeIndex, on which .asi8 is subsequently called and returns None instead of an empty numpy i8 array.

So the underlying problem is that an empty DatetimeIndex doesn't like a non-vectorized DateOffset applied to it.

Observe the testcase below, which reproduces the issue.

Test case

import unittest

import pandas as pd
import pandas.util.testing as pdt
from dateutil.relativedelta import MO
from pandas.tseries.holiday import AbstractHolidayCalendar, Holiday
from pandas.tseries.offsets import DateOffset

CME1998StartUSMartinLutherKingJr = Holiday('Dr. Martin Luther King Jr.', month=1, day=1, offset=DateOffset(weekday=MO(3)), start_date='1998-01-01')


class CustomHolidayCalendar(AbstractHolidayCalendar):
    rules = [CME1998StartUSMartinLutherKingJr]


class TestPandasIndexFeatures(unittest.TestCase):
    def test_empty_datetime_index_added_to_non_vectorized_date_offset(self):
        empty_ts_index = pd.DatetimeIndex([], freq='infer')
        # This blows up
        new_ts_index = empty_ts_index + DateOffset(weekday=MO(3))
        self.assertEqual(0, len(new_ts_index))

    def test_no_holidays_generated_if_not_in_range(self):
        calendar = CustomHolidayCalendar()

        # This blows up
        pdt.assert_index_equal(
            pd.DatetimeIndex([]),
            calendar.holidays(start='1995-01-01', end='1995-12-31'))

    def test_holidays_generated_if_in_range(self):
        calendar = CustomHolidayCalendar()

        pdt.assert_index_equal(
            pd.DatetimeIndex(['2015-01-19']),
            calendar.holidays(start='2015-01-01', end='2015-12-31'))

Output

The following tests fail:

  • test_empty_datetime_index_added_to_non_vectorized_date_offset
  • test_no_holidays_generated_if_not_in_range

output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: 5870731f32ae569e01e3c0a8972cdd2c6e0301f8
python: 3.5.1.final.0
python-bits: 64
OS: Darwin
OS-release: 15.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8

pandas: 0.18.0+44.g5870731.dirty
nose: 1.3.7
pip: 8.1.1
setuptools: 20.3.1
Cython: 0.23.4
numpy: 1.10.4
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: 4.1.2
sphinx: None
patsy: 0.4.1
dateutil: 2.5.1
pytz: 2016.1
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.1
openpyxl: None
xlrd: 0.9.4
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: 2.6.1 (dt dec pq3 ext lo64)
jinja2: 2.8
boto: None
@zhengyu-inboc
Copy link
Author

Currently, I'm using the following patch as a temporarily workaround. The workaround is to simply return the index if the length is 0, which should be ok as the index is immutable.

def _add_offset(self, offset):
    import warnings
    try:
        from pandas.io.common import PerformanceWarning
    except:
        from pandas.core.common import PerformanceWarning

    try:
        if self.tz is not None:
            values = self.tz_localize(None)
        else:
            values = self
        result = offset.apply_index(values)
        if self.tz is not None:
            result = result.tz_localize(self.tz)
        return result

    except NotImplementedError:
        warnings.warn("Non-vectorized DateOffset being applied to Series "
                      "or DatetimeIndex", PerformanceWarning)
        if len(self) == 0:
            return self
        return self.astype('O') + offset


def patch_pandas_index():
    pd.DatetimeIndex._add_offset = _add_offset


# Patch pandas empty datetimeindex and offset operations
patch_pandas_index()

@zhengyu-inboc zhengyu-inboc changed the title Cannot add non-vectorized DateOffset to empty DatetimeIndex BUG: Cannot add non-vectorized DateOffset to empty DatetimeIndex Mar 28, 2016
@jreback
Copy link
Contributor

jreback commented Mar 29, 2016

not really sure this is supported, meaning this kind of dateutil.relativedetla offset. What are you intending this to actually do?

@jreback jreback added the Frequency DateOffsets label Mar 29, 2016
@zhengyu-inboc
Copy link
Author

This definitely used to be a supported feature, as least in 0.16.x

The custom MLK rule in my testcase is simply a modified example taken from tseries/holiday.py. The holiday.py module is shipped with pandas and it uses dateutil.relativedelta, so it must be a supported feature.

USMartinLutherKingJr = Holiday('Dr. Martin Luther King Jr.', start_date=datetime(1986,1,1), month=1, day=1,
                               offset=DateOffset(weekday=MO(3)))

This kind of holiday rule is quite common for exchanges across the world, e.g. as new national holidays introduced they would all have a start date. With this bug generating holidays before the starting date would not be possible. one would simply expect an empty index returned if rule does not yield any holidays.

@jreback
Copy link
Contributor

jreback commented Mar 29, 2016

cc @chris-b1
cc @rockg

hmm, maybe something broke. Though the [19] is expected to be honest, its not really clear what to do with this. you are trying to add a frequency aware object to something with no frequency, so what should you do? coerce or raise? I think this error message is fine, maybe something needs an adjustment in the holiday logic.

In [18]: DatetimeIndex(['20160101'],freq='D') + DateOffset(weekday=MO(3))
Out[18]: DatetimeIndex(['2016-01-18'], dtype='datetime64[ns]', freq=None)

In [19]: DatetimeIndex([],freq='D') + DateOffset(weekday=MO(3))
ValueError: Must provide freq argument if no data is supplied

@jreback jreback added the Compat pandas objects compatability with Numpy or Python functions label Mar 29, 2016
@jreback jreback added this to the 0.18.1 milestone Mar 29, 2016
@zhengyu-inboc
Copy link
Author

in [19], I think the expected output should be:

 DatetimeIndex([], dtype='datetime64[ns]', freq='D')

i.e. the same index as before. I think that was the behaviour in pandas 0.16.x (I haven't had time to check)

I think the behaviour for empty DateTimeIndex should be consistent with other empty indexes, as well as empty numpy arrays.

In [11]: Index([1,2,3]) + 1
Out[11]: Int64Index([2, 3, 4], dtype='int64')

In [12]: Index([]) + 1
Out[12]: Index([], dtype='object')

@zhengyu-inboc
Copy link
Author

It is probably easier to fix this within the holiday.py module, maybe DatetimeIndex should be special.

@jreback
Copy link
Contributor

jreback commented Mar 29, 2016

@yerikzheng that example is not the same. Those don't have frequencies, so datetimelikes ARE special. Adding a frequency aware object to a non-freq aware is an explicit error.

@zhengyu-inboc
Copy link
Author

Yes, thinking about it again I agree with you. However, I do expect AbstractHolidayCalendar.holidays() method to handle the case for which no holiday is generated by the rule, so this must have broke since 0.17.0 release as it did work correctly before.

@jreback jreback modified the milestones: 0.18.2, 0.18.1 Apr 27, 2016
@jreback
Copy link
Contributor

jreback commented Apr 27, 2016

@yerikzheng would love for a PR for this.

@nickpollari
Copy link

Funny that I just came across this post. I found this error today with the same exact issue (MLK for start of 1998). I have begun looking into a work around but this definitely used to be supported. It should be fixed in the next release.

@jorisvandenbossche jorisvandenbossche added the Regression Functionality that used to work in a prior pandas version label Aug 21, 2016
@jreback jreback modified the milestones: 0.19.0, Next Major Release Sep 28, 2016
@jbrockmendel jbrockmendel mentioned this issue Dec 19, 2017
39 tasks
@jbrockmendel jbrockmendel self-assigned this Jan 28, 2019
@mroeschke
Copy link
Member

This looks fixed on master (Jeff's example). Guess this could use a test

In [62]: DatetimeIndex([],freq='D') + DateOffset(weekday=MO(3))
/Users/matthewroeschke/pandas-mroeschke/pandas/core/arrays/datetimes.py:835: PerformanceWarning: Non-vectorized DateOffset being applied to Series or DatetimeIndex
  PerformanceWarning,
Out[62]: DatetimeIndex([], dtype='datetime64[ns]', freq=None)

In [63]: pd.__version__
Out[63]: '0.26.0.dev0+576.gde67bb72e'

@mroeschke mroeschke removed Compat pandas objects compatability with Numpy or Python functions Difficulty Intermediate Frequency DateOffsets labels Oct 16, 2019
@mroeschke mroeschke added good first issue Needs Tests Unit test(s) needed to prevent regressions and removed Regression Functionality that used to work in a prior pandas version labels Oct 16, 2019
@jbrockmendel jbrockmendel added the Numeric Operations Arithmetic, Comparison, and Logical operations label Oct 16, 2019
@jreback jreback modified the milestones: Contributions Welcome, 1.0 Dec 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Needs Tests Unit test(s) needed to prevent regressions Numeric Operations Arithmetic, Comparison, and Logical operations
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants