Vectorised addition of MonthOffset(n=0) returns different values to item-by-item addition #11370

Closed
rekcahpassyla opened this issue Oct 19, 2015 · 4 comments · Fixed by #11427
Labels
Bug Frequency DateOffsets
Milestone

Comments

@rekcahpassyla
Contributor

This code returns different values in 0.17.0 and 0.15.2

import pandas as pd
from pandas.util.testing import assert_index_equal

pd.show_versions()

offsets = [
    pd.offsets.Day, pd.offsets.MonthBegin,
    pd.offsets.QuarterBegin, pd.offsets.YearBegin,
]

dates = pd.date_range('2011-01-01', '2011-01-05', freq='D')

for offset in offsets:
    # adding each item individually or vectorised should give same answer
    expected_vec = dates + offset(n=0)
    expected = pd.DatetimeIndex([d + offset(n=0) for d in dates])

    msg = "offset: {}, vectorised: {}, individual: {}".format(
        offset, expected_vec, expected
    )
    try:
        if pd.__version__ == '0.17.0':
            assert_index_equal(expected_vec, expected, check_names=False)
        else:
            assert_index_equal(expected_vec, expected)
    except AssertionError as er:
        raise Exception(msg + str(er))

0.17.0

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 26 Stepping 5, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.17.0
nose: 1.3.7
pip: 7.1.0
setuptools: 18.0.1
Cython: 0.22
numpy: 1.10.1
scipy: 0.16.0
statsmodels: 0.6.1
IPython: 3.2.1
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.4.1
pytz: 2015.4
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.4.4
matplotlib: 1.4.3
openpyxl: None
xlrd: 0.9.4
xlwt: None
xlsxwriter: 0.7.3
lxml: None
bs4: 4.3.2
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 1.0.7
pymysql: None
psycopg2: None
Traceback (most recent call last):
  File "c:\dev\code\sandbox\pandas_17_vs_15_dateoffsets.py", line 24, in <module>
    raise Exception(msg + str(er))
Exception: offset: <class 'pandas.tseries.offsets.MonthBegin'>, vectorised: DatetimeIndex(['2010-12-01', '2011-01-01', '2011-01-01', '2011-01-01',
               '2011-01-01'],
              dtype='datetime64[ns]', freq=None), individual: DatetimeIndex(['2011-01-01', '2011-02-01', '2011-02-01', '2011-02-01',
               '2011-02-01'],
              dtype='datetime64[ns]', freq=None)Index are different

Index values are different (100.0 %)
[left]:  DatetimeIndex(['2010-12-01', '2011-01-01', '2011-01-01', '2011-01-01',
               '2011-01-01'],
              dtype='datetime64[ns]', freq=None)
[right]: DatetimeIndex(['2011-01-01', '2011-02-01', '2011-02-01', '2011-02-01',
               '2011-02-01'],
              dtype='datetime64[ns]', freq=None)
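
Reading the two indexes in the output above side by side, each vectorised value is exactly one calendar month earlier than its item-by-item counterpart. A small standard-library check of that observation, with the values copied from the traceback; `add_one_month` is a hypothetical helper written for this illustration, not part of pandas:

```python
from datetime import date

def add_one_month(d):
    """Hypothetical helper: same day in the following month.

    Only safe for days that exist in every month; every value here is the 1st.
    """
    if d.month == 12:
        return date(d.year + 1, 1, d.day)
    return date(d.year, d.month + 1, d.day)

# [left] values from the traceback: dates + MonthBegin(n=0), vectorised
vectorised = [date(2010, 12, 1)] + [date(2011, 1, 1)] * 4
# [right] values from the traceback: [d + MonthBegin(n=0) for d in dates]
individual = [date(2011, 1, 1)] + [date(2011, 2, 1)] * 4

# Shifting every vectorised value forward by one month reproduces the item-by-item result
shifted = [add_one_month(d) for d in vectorised]
print(shifted == individual)
```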

0.15.2

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 26 Stepping 5, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en_GB

pandas: 0.15.2
nose: 1.3.7
Cython: 0.22
numpy: 1.9.2
scipy: 0.15.1
statsmodels: None
IPython: 3.2.1
sphinx: 1.3.1
patsy: 0.3.0
dateutil: 2.4.1
pytz: 2015.4
bottleneck: 1.0.0
tables: 3.2.0
numexpr: 2.4.3
matplotlib: 1.4.3
openpyxl: 1.8.5
xlrd: 0.9.4
xlwt: 0.7.5
xlsxwriter: 0.7.3
lxml: 3.4.4
bs4: 4.3.2
html5lib: 0.999
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: 1.0.7
pymysql: None
psycopg2: None
@rekcahpassyla rekcahpassyla changed the title from "Vectorised addition of MonthOffset returns different values to item-by-item addition" to "Vectorised addition of MonthOffset(n=0) returns different values to item-by-item addition" Oct 19, 2015
@chris-b1
Contributor

This comes from #10744; I didn't have the n=0 semantics right (and apparently didn't test them!). It'll be a couple of days, but I'll submit a fix.
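
For reference, the item-by-item results in the report are consistent with n=0 meaning "roll forward to the next on-offset date, and leave the date alone if it is already on one". A minimal standard-library sketch of that rule for month starts, assuming that reading of the semantics; `month_begin_n0` is a hypothetical helper for illustration and not how pandas implements it:

```python
from datetime import date, timedelta

def month_begin_n0(d):
    """Hypothetical helper: roll d forward to a month start.

    A date that is already the 1st of a month is returned unchanged.
    """
    if d.day == 1:
        return d
    # Otherwise jump to the 1st of the following month, wrapping December into January
    if d.month == 12:
        return date(d.year + 1, 1, 1)
    return date(d.year, d.month + 1, 1)

# Same five days as pd.date_range('2011-01-01', '2011-01-05', freq='D')
dates = [date(2011, 1, 1) + timedelta(days=i) for i in range(5)]
rolled = [month_begin_n0(d) for d in dates]
print(rolled)
```

Under this rule the first date stays at 2011-01-01 and the other four move to 2011-02-01, which matches the "individual" index in the traceback.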

@jreback jreback added Bug Frequency DateOffsets labels Oct 19, 2015
@jreback jreback added this to the 0.17.1 milestone Oct 19, 2015
@rekcahpassyla
Contributor Author

Many thanks for the quick response!

@rekcahpassyla
Contributor Author

MonthEnd also not working:

Test script

import pandas as pd
from pandas.util.testing import assert_index_equal

pd.show_versions()

offsets = [
    pd.offsets.MonthEnd,
    pd.offsets.QuarterEnd, pd.offsets.YearEnd,
]


dates = pd.date_range('2011-01-01', '2011-01-05', freq='D')

for offset in offsets:
    # adding each item individually or vectorised should give same answer
    expected_vec = dates + offset(n=0)
    expected = pd.DatetimeIndex([d + offset(n=0) for d in dates])

    msg = "offset: {}, vectorised: {}, individual: {}".format(
        offset, expected_vec, expected
    )
    try:
        if pd.__version__ == '0.17.0':
            assert_index_equal(expected_vec, expected, check_names=False)
        else:
            assert_index_equal(expected_vec, expected)
    except AssertionError as er:
        raise Exception(msg + str(er))

0.17.0

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 26 Stepping 5, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.17.0
nose: 1.3.7
pip: 7.1.0
setuptools: 18.0.1
Cython: 0.22
numpy: 1.10.1
scipy: 0.16.0
statsmodels: 0.6.1
IPython: 3.2.1
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.4.1
pytz: 2015.4
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.4.4
matplotlib: 1.4.3
openpyxl: None
xlrd: 0.9.4
xlwt: None
xlsxwriter: 0.7.3
lxml: None
bs4: 4.3.2
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 1.0.7
pymysql: None
psycopg2: None
Traceback (most recent call last):
  File "c:\dev\code\sandbox\pandas_17_vs_15_dateoffsets.py", line 34, in <module>
    raise Exception(msg + str(er))
Exception: offset: <class 'pandas.tseries.offsets.MonthEnd'>, vectorised: DatetimeIndex(['2010-12-31', '2010-12-31', '2010-12-31', '2010-12-31',
               '2010-12-31'],
              dtype='datetime64[ns]', freq=None), individual: DatetimeIndex(['2011-01-31', '2011-01-31', '2011-01-31', '2011-01-31',
               '2011-01-31'],
              dtype='datetime64[ns]', freq=None)Index are different

Index values are different (100.0 %)
[left]:  DatetimeIndex(['2010-12-31', '2010-12-31', '2010-12-31', '2010-12-31',
               '2010-12-31'],
              dtype='datetime64[ns]', freq=None)
[right]: DatetimeIndex(['2011-01-31', '2011-01-31', '2011-01-31', '2011-01-31',
               '2011-01-31'],
              dtype='datetime64[ns]', freq=None)

0.15.2

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 26 Stepping 5, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en_GB

pandas: 0.15.2
nose: 1.3.7
Cython: 0.22
numpy: 1.9.2
scipy: 0.15.1
statsmodels: None
IPython: 3.2.1
sphinx: 1.3.1
patsy: 0.3.0
dateutil: 2.4.1
pytz: 2015.4
bottleneck: 1.0.0
tables: 3.2.0
numexpr: 2.4.3
matplotlib: 1.4.3
openpyxl: 1.8.5
xlrd: 0.9.4
xlwt: 0.7.5
xlsxwriter: 0.7.3
lxml: 3.4.4
bs4: 4.3.2
html5lib: 0.999
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: 1.0.7
pymysql: None
psycopg2: None

@chris-b1
Contributor

Probably wrong for YearEnd and QuarterEnd too, as the counting
logic is shared IIRC.
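
To illustrate why one mistake could affect all three End offsets, here is a rough standard-library sketch in which a single rule covers month, quarter and year ends. It assumes n=0 means "roll forward to the end of the current period" and assumes calendar-aligned quarters ending in March, June, September and December; `period_end_n0` is a hypothetical helper, not the pandas code path:

```python
import calendar
from datetime import date

def period_end_n0(d, months_per_period):
    """Hypothetical helper: last day of the period containing d.

    months_per_period is 1 (month), 3 (quarter) or 12 (year).
    A date already on a period end is returned unchanged.
    """
    # Final month of the period containing d, e.g. 3 for January with quarterly periods
    end_month = ((d.month - 1) // months_per_period + 1) * months_per_period
    last_day = calendar.monthrange(d.year, end_month)[1]
    return date(d.year, end_month, last_day)

d = date(2011, 1, 1)
month_end = period_end_n0(d, 1)     # analogue of d + MonthEnd(n=0)
quarter_end = period_end_n0(d, 3)   # analogue of d + QuarterEnd(n=0)
year_end = period_end_n0(d, 12)     # analogue of d + YearEnd(n=0)
print(month_end, quarter_end, year_end)
```

For 2011-01-01 the month-end case gives 2011-01-31, the value in the "individual" index reported for MonthEnd.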


@jreback jreback changed the title Vectorised addition of MonthOffset(n=0) returns different values to item-by-item addition Vectorised addition of MonthOffset(n=0) returns different values to item-by-item addition Oct 19, 2015
@jreback jreback modified the milestones: Next Major Release, 0.17.1 Nov 13, 2015
@jreback jreback modified the milestones: 0.18.0, Next Major Release Dec 1, 2015