Skip to content

shift() broken for datetime64 when used with fill_value #31971

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
BayerSe opened this issue Feb 14, 2020 · 10 comments · Fixed by #32591
Closed

shift() broken for datetime64 when used with fill_value #31971

BayerSe opened this issue Feb 14, 2020 · 10 comments · Fixed by #32591
Labels
Datetime Datetime data dtype Regression Functionality that used to work in a prior pandas version
Milestone

Comments

@BayerSe
Copy link

BayerSe commented Feb 14, 2020

Code Sample, a copy-pastable example if possible

import pandas as pd
from pandas.testing import assert_series_equal

dt_series = pd.Series((pd.to_datetime('2020-01-01'), pd.to_datetime('2020-01-02')))

actual_result = dt_series.shift(1, fill_value=0)
expected_result = pd.Series((pd.to_datetime('1970-01-01'), pd.to_datetime('2020-01-01')))

assert_series_equal(actual_result, expected_result)

Problem description

Above code snippet works on version 0.25.3 but breaks with 1.0.1.

print(actual_result)
0                      0
1    1577836800000000000
dtype: object

print(expected_result)
0   1970-01-01
1   2020-01-01
dtype: datetime64[ns]
AssertionError: Attributes of Series are different
Attribute "dtype" are different
[left]:  object
[right]: datetime64[ns]

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.6.final.0
python-bits : 64
OS : Linux
OS-release : 4.15.0-72-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.0.1
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 45.2.0.post20200210
Cython : None
pytest : 5.3.2
hypothesis : None
sphinx : 2.3.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.11.1
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.2
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.15.1
pytables : None
pytest : 5.3.2
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.13
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None

@jorisvandenbossche
Copy link
Member

@BayerSe thanks for the report!

I vaguely seem to remember some changes related to fill_value for datetimelikes (cc @jbrockmendel ?), eg to start requiring an actual timestamp, but the above also doesn't seem like very useful behaviour.

@jorisvandenbossche jorisvandenbossche added this to the 1.0.2 milestone Feb 14, 2020
@jorisvandenbossche
Copy link
Member

Caused by #28852: I suppose the change was done on purpose, but it also didn't explicitly test that change or have a mention in the whatsnew

@jorisvandenbossche
Copy link
Member

@jbrockmendel can you comment on whether this was intentional? I find several PRs that are related to this maybe_promote, but not one (or an issue) that discusses the reasoning or the implications.

@jbrockmendel
Copy link
Member

Looks like in Block.shift it is calling new_values, fill_value = maybe_upcast(self.values, 0) and new_values is coming back as object-dtype of ints, which seems wacky.

We either need to patch Series.shift to dispatch to self._values when that is an EA, or patch DatetimeBlock to use array_values (probably also TimedeltaBlock, and that only handles the 1D cases)

@BayerSe until we get this patched i think you're best off passing fill_value=pd.Timestamp(0) (this is also more likely to be future-proof than passing an int)

@BayerSe
Copy link
Author

BayerSe commented Feb 20, 2020

@jbrockmendel Did as you suggested, thanks for the hint.

@jbrockmendel
Copy link
Member

Should this be marked as a regression?

@TomAugspurger
Copy link
Contributor

I think so.

For now, my preference would be to just catch this case specifically, I think as explicitly as

if is_datetime64ns_dtype(self.values.dtype) and isinstance(periods, int):
    periods = pd.Timestamp(periods)

I also think that this should warn and eventually in the future (1.1), instructing users to fill with a Timestamp.

@TomAugspurger TomAugspurger added Regression Functionality that used to work in a prior pandas version Datetime Datetime data dtype labels Mar 10, 2020
@jbrockmendel
Copy link
Member

For now, my preference would be to just catch this case specifically, I think as explicitly as [...]

Longer-term the solution is to patch DatetimeBlock.shift to dispatch to DTA.shift. ATM DTA.shift accepts ints, so I think that would fix the regression for 1.0.2. That said, it shouldnt accept ints, so we'll need to deprecate that behavior on the DTA, which will in turn effectively deprecate it for Series too.

@TomAugspurger
Copy link
Contributor

That sounds good to me. Did you say you're able to work on a patch to fix this regression for 1.0.2 to (temporarily) allow ints again?

@jbrockmendel
Copy link
Member

yah, im running the tests locally now, with luck will push in a few minutes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Datetime Datetime data dtype Regression Functionality that used to work in a prior pandas version
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants