-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Timestamp.replace chaining not compat with datetime.replace #15683
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
This is correct according to
|
actually I find the |
Okay, thx for the feedback. |
We did some more research on this issue and found the following: The problem occurs during DST-changes when we (de)normalize input dates from dates with tzinfo to UTC dates and back from tz-less UTC dates to dates with a tzinfo. The stdlib docs states: “Return a date with the same value, except for those parameters given new values by whichever keyword arguments are specified.” Lets test this:
Pandas 0.18.1:
Pandas 0.19.2:
The datetime in the last row of the Pandas 0.19.2 output is incorrect. This readily occurs in the context of pytz as Interestingly, the issue only occurs if we do the “double replace” but not if we directly initialize a datetime with a tzinfo:
Pandas 0.18.1:
Pandas 0.19.2:
I looked at the change logs and coudn’t find anything related to this issue. |
I guess this is a bug,
All that said, I would never use
you are welcome to submit a PR to fix. |
f8bd08e is the change (has been modified slightly since then). |
Unfortunately, we currently don't have the time to get familiar with the pandas internals and fix this issue ourselves. |
I added a regression test for this issues as follows: diff --git a/pandas/tests/tseries/test_timezones.py b/pandas/tests/tseries/test_timezones.py
index 1fc0e1b..75d4872 100644
--- a/pandas/tests/tseries/test_timezones.py
+++ b/pandas/tests/tseries/test_timezones.py
@@ -1233,6 +1233,18 @@ class TestTimeZones(tm.TestCase):
self.assertEqual(result_pytz.to_pydatetime().tzname(),
result_dateutil.to_pydatetime().tzname())
+ # issue 15683
+ dt = datetime(2016, 3, 27, 1)
+ tzinfo = pytz.timezone('CET').localize(dt, is_dst=False).tzinfo
+ # This should work:
+ result_dt = dt.replace(tzinfo=tzinfo)
+ result_pd = Timestamp(dt).replace(tzinfo=tzinfo)
+ self.assertEqual(result_dt.timestamp(), result_pd.timestamp())
+ # self.assertEqual(result_dt, result_pd.to_datetime()) # This fails!!!
+ # This should fail:
+ result_dt = dt.replace(tzinfo=tzinfo).replace(tzinfo=None)
+ result_pd = Timestamp(dt).replace(tzinfo=tzinfo).replace(tzinfo=None)
+ self.assertEqual(result_dt.timestamp(), result_pd.timestamp())
+ # self.assertEqual(result_dt, result_pd.to_datetime())
+
def test_index_equals_with_tz(self):
left = date_range('1/1/2011', periods=100, freq='H', tz='utc')
right = date_range('1/1/2011', periods=100, freq='H', tz='US/Eastern') Surprisingly, the
I still have no Idea how to fix this. |
I played around with with it a little bit more and something looks very broken: >>> import datetime, pandas, pytz
>>>
>>> # Two equal datetimes:
>>> dt = datetime.datetime(2016, 3, 27, 1)
>>> pd = pandas.Timestamp(dt)
>>> dt == pd
True
>>> dt == pd.to_pydatetime()
True
>>> dt.timestamp() == pd.timestamp()
True
>>>
>>> # Let's introduce timezones and stuff breaks:
>>>
>>> tzinfo = pytz.timezone('CET')
>>> rdt = dt.replace(tzinfo=tzinfo)
>>> rpd = pd.replace(tzinfo=tzinfo)
>>>
>>> rdt == rpd # What?
False
>>> rdt == rpd.to_pydatetime() # Really?
False
>>> rdt.timestamp() == rpd.timestamp() # Why is this True now?
True
>>> # What do we have?
>>> rdt
datetime.datetime(2016, 3, 27, 1, 0, tzinfo=<DstTzInfo 'CET' CET+1:00:00 STD>)
>>> rpd # This *looks* like rdt but is *not equal* to it.
Timestamp('2016-03-27 01:00:00+0100', tz='CET')
>>> rpd.to_pydatetime() # This is cleary not wanted:
datetime.datetime(2016, 3, 27, 0, 0, tzinfo=<DstTzInfo 'CET' CET+1:00:00 STD>)
>>>
>>> # This seems to be the logical result of the above bug:
>>> ndt = rdt.replace(tzinfo=None)
>>> npd = rpd.replace(tzinfo=None)
>>> ndt
datetime.datetime(2016, 3, 27, 1, 0)
>>> npd
Timestamp('2016-03-27 00:00:00')
>>> npd.to_pydatetime()
datetime.datetime(2016, 3, 27, 0, 0)
>>> ndt == dt
True
>>> npd == pd
False |
The >>> dttz = datetime.datetime(2016, 3, 27, 1, tzinfo=tzinfo)
>>> pdtz = pandas.Timestamp(2016, 3, 27, 1, tzinfo=tzinfo)
>>> dttz
datetime.datetime(2016, 3, 27, 1, 0, tzinfo=<DstTzInfo 'CET' CET+1:00:00 STD>)
>>> pdtz # Where is the tzinfo?
Timestamp('2016-03-27 01:00:00')
>>> dttz.timestamp() == pdtz.timestamp() # Expected
True
>>> dttz == pdtz # Unexpected
False
>>> dttz == pdtz.to_pydatetime() # Unexpected
False |
@sscherfke
You are encourage to use the construction as a small issue xref in #15777 |
Okay, maybe my last example might then not be related to this issue. But the problem with I'd really like to help fixing this issue but Pandas has a very huge code base and I'm a very new Pandas user... :-/ |
@sscherfke well |
Yes, the other things is the problem. Finding out what they are supposed to do and what they actually to and which other thing actually it the culprit for this issue. :) |
now that #15934 is merged the construction issues should be fixed, FYI. |
Yes, the construction issues are fixed now. I hoped that this might (accidentally) fix this issue, but it doesn't: >>> import datetime, pandas, pytz
>>> tzinfo = pytz.timezone('CET')
>>> dt = datetime.datetime(2016, 3, 27, 1)
>>> pd = pandas.Timestamp(dt)
>>> dttz = dt.replace(tzinfo=tzinfo)
>>> pdtz1 = pd.replace(tzinfo=tzinfo)
>>> pdtz2 = pandas.Timestamp('2016-03-27 01:00', tz='CET')
>>> dttz == pdtz1
False
>>> dttz == pdtz2
True
>>> for x in [pdtz1, pdtz2]:
... print(x, x.tzinfo, x.timestamp(), x.value, x.to_pydatetime())
...
2016-03-27 01:00:00+01:00 CET 1459036800.0 1459033200000000000 2016-03-27 00:00:00+01:00
2016-03-27 01:00:00+01:00 CET 1459036800.0 1459036800000000000 2016-03-27 01:00:00+01:00
As you can see, the
The
So I think |
Yes, |
yeah this should not be converting, instead it should be localizing.
|
this breaks another test, but passes (so you would make this into an actual test)
|
I am very confused about what's happening inside Pandas: >>> dt = datetime.datetime(2016, 3, 27, 1)
>>> datetime.datetime.fromtimestamp(pandas.Timestamp(dt).timestamp())
datetime.datetime(2016, 3, 27, 1, 0)
>>> datetime.datetime.fromtimestamp(pandas.Timestamp(dt).value / 1000000000)
datetime.datetime(2016, 3, 27, 3, 0) I thought When *edit: Saw you comments only after I wrote this comment. |
A test case like this will show the issue and pass when your proposed fix is applied: def test_issue_15683(self):
# issue 15683
dt = datetime(2016, 3, 27, 1)
tzinfo = pytz.timezone('CET').localize(dt, is_dst=False).tzinfo
result_dt = dt.replace(tzinfo=tzinfo)
result_pd = Timestamp(dt).replace(tzinfo=tzinfo)
self.assertEqual(result_dt.timestamp(), result_pd.timestamp())
self.assertEqual(result_dt, result_pd.to_pydatetime())
self.assertEqual(result_dt, result_pd)
result_dt = dt.replace(tzinfo=tzinfo).replace(tzinfo=None)
result_pd = Timestamp(dt).replace(tzinfo=tzinfo).replace(tzinfo=None)
self.assertEqual(result_dt.timestamp(), result_pd.timestamp())
self.assertEqual(result_dt, result_pd.to_pydatetime())
self.assertEqual(result_dt, result_pd) |
happy to take a PR to fix as I said. |
What about the breaking test? |
if you'd like to delve into that would be helpful |
More problems (with our fix): >>> import datetime, pandas, pytz
>>> tzinfo = pytz.timezone('CET')
>>>
>>> # Reference case with datetime.datetime object
>>> pd = pandas.Timestamp('2016-10-30 01:15').to_pydatetime()
>>> pd
datetime.datetime(2016, 10, 30, 1, 15)
>>> tzinfo.localize(pd, is_dst=True)
datetime.datetime(2016, 10, 30, 1, 15, tzinfo=<DstTzInfo 'CET' CEST+2:00:00 DST>)
>>>
>>> # Error in Pandas
>>> pd = pandas.Timestamp('2016-10-30 01:15')
>>> pd
Timestamp('2016-10-30 01:15:00')
>>> tzinfo.localize(pd, is_dst=True)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File ".../envs/pandas/lib/python3.6/site-packages/pytz/tzinfo.py", line 314, in localize
loc_dt = tzinfo.normalize(dt.replace(tzinfo=tzinfo))
File ".../envs/pandas/lib/python3.6/site-packages/pytz/tzinfo.py", line 242, in normalize
return self.fromutc(dt)
File ".../envs/pandas/lib/python3.6/site-packages/pytz/tzinfo.py", line 187, in fromutc
return (dt + inf[0]).replace(tzinfo=self._tzinfos[inf])
File "pandas/_libs/tslib.pyx", line 735, in pandas._libs.tslib.Timestamp.replace (pandas/_libs/tslib.c:14931)
value = tz_localize_to_utc(np.array([value], dtype='i8'), _tzinfo,
File "pandas/_libs/tslib.pyx", line 4582, in pandas._libs.tslib.tz_localize_to_utc (pandas/_libs/tslib.c:77718)
raise pytz.AmbiguousTimeError(
pytz.exceptions.AmbiguousTimeError: Cannot infer dst time from Timestamp('2016-10-30 02:15:00'), try using the 'ambigu
ous' argument This is weired as 01:15 is actually not ambiguous (02:15 would be). I guess the problem arises because we convert from our destination tz to UTC in In the old version (without the fix) there would be no error but a wrong result (1h offset). |
If a Timestamp has a tzinfo (e.g., UTC or CET), If a Timestamp does not have a tzinfo, |
@sscherfke not sure what you are asking. |
I finally found a solution that works. All tests in diff --git a/pandas/_libs/tslib.pyx b/pandas/_libs/tslib.pyx
index c471d46..c418059 100644
--- a/pandas/_libs/tslib.pyx
+++ b/pandas/_libs/tslib.pyx
@@ -685,14 +685,16 @@ class Timestamp(_Timestamp):
cdef:
pandas_datetimestruct dts
int64_t value
- object _tzinfo, result, k, v
+ object _tzinfo, result, k, v, ts_input
_TSObject ts
# set to naive if needed
_tzinfo = self.tzinfo
value = self.value
if _tzinfo is not None:
- value = tz_convert_single(value, 'UTC', _tzinfo)
+ value_tz = tz_convert_single(value, _tzinfo, 'UTC')
+ offset = value - value_tz
+ value += offset
# setup components
pandas_datetime_to_datetimestruct(value, PANDAS_FR_ns, &dts)
@@ -726,16 +728,14 @@ class Timestamp(_Timestamp):
_tzinfo = tzinfo
# reconstruct & check bounds
- value = pandas_datetimestruct_to_datetime(PANDAS_FR_ns, &dts)
+ ts_input = datetime(dts.year, dts.month, dts.day, dts.hour, dts.min,
+ dts.sec, dts.us, tzinfo=_tzinfo)
+ ts = convert_to_tsobject(ts_input, _tzinfo, None, 0, 0)
+ value = ts.value + (dts.ps // 1000)
if value != NPY_NAT:
_check_dts_bounds(&dts)
- # set tz if needed
- if _tzinfo is not None:
- value = tz_convert_single(value, _tzinfo, 'UTC')
-
- result = create_timestamp_from_ts(value, dts, _tzinfo, self.freq)
- return result
+ return create_timestamp_from_ts(value, dts, _tzinfo, self.freq)
def isoformat(self, sep='T'):
base = super(_Timestamp, self).isoformat(sep=sep)
diff --git a/pandas/tests/tseries/test_timezones.py b/pandas/tests/tseries/test_timezones.py
index 06b6bbb..08b8040 100644
--- a/pandas/tests/tseries/test_timezones.py
+++ b/pandas/tests/tseries/test_timezones.py
@@ -1280,6 +1280,25 @@ class TestTimeZones(tm.TestCase):
self.assertEqual(result_pytz.to_pydatetime().tzname(),
result_dateutil.to_pydatetime().tzname())
+ def test_tzreplace_issue_15683(self):
+ """Regression test for issue 15683."""
+ dt = datetime(2016, 3, 27, 1)
+ tzinfo = pytz.timezone('CET').localize(dt, is_dst=False).tzinfo
+
+ result_dt = dt.replace(tzinfo=tzinfo)
+ result_pd = Timestamp(dt).replace(tzinfo=tzinfo)
+
+ self.assertEqual(result_dt.timestamp(), result_pd.timestamp())
+ self.assertEqual(result_dt, result_pd)
+ self.assertEqual(result_dt, result_pd.to_pydatetime())
+
+ result_dt = dt.replace(tzinfo=tzinfo).replace(tzinfo=None)
+ result_pd = Timestamp(dt).replace(tzinfo=tzinfo).replace(tzinfo=None)
+
+ self.assertEqual(result_dt.timestamp(), result_pd.timestamp())
+ self.assertEqual(result_dt, result_pd)
+ self.assertEqual(result_dt, result_pd.to_pydatetime())
+
def test_index_equals_with_tz(self):
left = date_range('1/1/2011', periods=100, freq='H', tz='utc')
right = date_range('1/1/2011', periods=100, freq='H', tz='US/Eastern') |
ok if u want to put a PR |
Adjustments to coding guidelines Only perform timestamp() check if meth is available. Added sv
Code Sample, a copy-pastable example if possible
Problem description
The above code runs with Pandas 0.18 but raises the following exception with Pandas 0.19:
Is this an intentional API breakage of 0.19 or a bug?
Expected Output
Output of
pd.show_versions()
commit: None
python: 3.6.0.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.12-100.fc24.x86_64+debug
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8
LOCALE: de_DE.UTF-8
pandas: 0.19.2
nose: None
pip: 9.0.1
setuptools: 34.3.0
Cython: None
numpy: 1.12.0
scipy: 0.18.1
statsmodels: None
xarray: None
IPython: 5.3.0
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: 1.5.0
bottleneck: None
tables: None
numexpr: None
matplotlib: 2.0.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.1.5
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None
The text was updated successfully, but these errors were encountered: