-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: Correctly localize naive datetime strings with Series and datetimetztype (#17415) #17603
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hello @mroeschke! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on January 12, 2018 at 05:11 Hours UTC |
pandas/core/dtypes/cast.py
Outdated
) | ||
is_dt_string = is_string_dtype(value) | ||
value = to_datetime(value, errors=errors) | ||
if is_dt_string: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm adding this incremental logic is error prone. I think this should simply be converted higher up. can you see if this can be simplied that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic is pretty high up the stack already.
The Series constructor calls _sanitize_array
which then calls maybe_cast_to_datetime
(which I am editing here) fairly early. The line I am editing is the first instance where arrays of numbers (and strings) are converted with to_datetime
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
try passing utc=True to to_datetime
this logic is too specific here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually it might be that
to_datetime(...., tz=dtype.tz) might be the correct idiom here (another PR implements this)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah #17413 will make this easier.
However I think there will need to be logic to determine if the incoming data is naive or already localized to UTC. The code comments that were already here note that (numeric) data is already UTC while string data (what I am trying to fix here) may be naive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my point is solving this here is wrong
this is a bug in to_datetime itself
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
passing utc=True to to_datetime will make this work (then just converting to the dtype)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you try passing utc=True
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utc=True
does not work here because the timestamp string should be localized to the specified dtype tz directly intead of UTC in between.
# with utc=True
In [5]: ts_naive = '2013-01-01 00:00:00'
...: result = Series([ts_naive], dtype='datetime64[ns, CET]')
...: expected = Series([pd.Timestamp(ts_naive, tz='CET')])
In [6]: result
Out[6]:
0 2013-01-01 01:00:00+01:00
dtype: datetime64[ns, CET]
In [7]: expected
Out[7]:
0 2013-01-01 00:00:00+01:00
dtype: datetime64[ns, CET]
b2a3b0d
to
8910a54
Compare
Codecov Report
@@ Coverage Diff @@
## master #17603 +/- ##
==========================================
- Coverage 91.19% 91.18% -0.02%
==========================================
Files 163 163
Lines 49625 49629 +4
==========================================
- Hits 45257 45252 -5
- Misses 4368 4377 +9
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #17603 +/- ##
==========================================
+ Coverage 91.52% 91.54% +0.02%
==========================================
Files 148 147 -1
Lines 48778 48777 -1
==========================================
+ Hits 44642 44655 +13
+ Misses 4136 4122 -14
Continue to review full report at Codecov.
|
pls rebase and move note to 0.21.1 |
doc/source/whatsnew/v0.21.0.txt
Outdated
@@ -488,6 +488,7 @@ Conversion | |||
- Bug in :func:`Series.fillna` returns frame when ``inplace=True`` and ``value`` is dict (:issue:`16156`) | |||
- Bug in :attr:`Timestamp.weekday_name` returning a UTC-based weekday name when localized to a timezone (:issue:`17354`) | |||
- Bug in ``Timestamp.replace`` when replacing ``tzinfo`` around DST changes (:issue:`15683`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move to 0.21.1
let me have a look |
can you rebase |
pandas/core/dtypes/cast.py
Outdated
) | ||
is_dt_string = is_string_dtype(value) | ||
value = to_datetime(value, errors=errors) | ||
if is_dt_string: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you try passing utc=True
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok I guess this is ok, until we have tz=
in pd.to_datetime
this must be done here. some comments.
doc/source/whatsnew/v0.21.1.txt
Outdated
@@ -65,6 +65,7 @@ Conversion | |||
- Bug in :meth:`IntervalIndex.copy` when copying and ``IntervalIndex`` with non-default ``closed`` (:issue:`18339`) | |||
- Bug in :func:`DataFrame.to_dict` where columns of datetime that are tz-aware were not converted to required arrays when used with ``orient='records'``, raising``TypeError` (:issue:`18372`) | |||
- Bug in :class:`DateTimeIndex` and :meth:`date_range` where mismatching tz-aware ``start`` and ``end`` timezones would not raise an err if ``end.tzinfo`` is None (:issue:`18431`) | |||
- Bug in localization of a naive, datetime string in a ``Series`` constructor with a ``datetime[ns, timezone]`` dtype (:issue:`174151`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move to 0.22
@@ -574,6 +574,12 @@ def test_constructor_with_datetime_tz(self): | |||
expected = Series(pd.DatetimeIndex(['NaT', 'NaT'], tz='US/Eastern')) | |||
assert_series_equal(s, expected) | |||
|
|||
# GH 17415: With naive string | |||
ts_naive = '2013-01-01 00:00:00' | |||
result = Series([ts_naive], dtype='datetime64[ns, CET]') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make this a separate test
parametrize over adding a NaT, nan, None here as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In [3]: Series([pd.NaT], dtype='datetime64[ns, CET]') # same for np.nan and None
Out[3]:
0 NaT
dtype: datetime64[ns, CET]
In [4]: Series(pd.Timestamp(pd.NaT, tz='CET')) # same for np.nan and None
Out[4]:
0 NaT
dtype: datetime64[ns]
So I figure both of these should be equivalent and In[4]
should be the correct behavior?
if you can change this to 0.22 for the whatsnew and rebase can have a look |
Rebased and moved the whatsnew to v0.22. Still curious how to handle the following situation:
|
@mroeschke a way you can handle is like this
so in a try/except if you succeed you have a valid dtype, otherwise you can pass it directly to the constructor. |
@jreback After some digging, the reason why:
occurs is that this happens up the stack:
Curious to get your opinion if this should raise instead. |
@mroeschke this is pretty consistent. NaT can be part of a tz-aware Index/Series. It doesn't have a tz itself though. |
Makes sense, thanks for confirming. |
I realized I was concerned about a non-issue since:
makes sense then
makes sense as well. All green. |
…type (pandas-dev#17415) Add whatsnew Fix lint error Rebase and move whatsnew note Move whatsnew and add tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some comments. otherwise lgtm. ping on green.
doc/source/whatsnew/v0.23.0.txt
Outdated
@@ -380,7 +380,7 @@ Conversion | |||
- Fixed bug where comparing :class:`DatetimeIndex` failed to raise ``TypeError`` when attempting to compare timezone-aware and timezone-naive datetimelike objects (:issue:`18162`) | |||
- Bug in :class:`DatetimeIndex` where the repr was not showing high-precision time values at the end of a day (e.g., 23:59:59.999999999) (:issue:`19030`) | |||
- Bug where dividing a scalar timedelta-like object with :class:`TimedeltaIndex` performed the reciprocal operation (:issue:`19125`) | |||
- | |||
- Bug in localization of a naive, datetime string in a ``Series`` constructor with a ``datetime[ns, timezone]`` dtype (:issue:`174151`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
datetime64[ns, tz]
pandas/core/dtypes/cast.py
Outdated
.tz_localize('UTC') | ||
.tz_convert(dtype.tz) | ||
) | ||
# This block can be simplified once PR #17413 is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add to the comment what that PR is doing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add an xref (with this PR) to that issue as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change to #13712 here (the issue and not the PR which was closed)
thanks! |
git diff upstream/master -u -- "*.py" | flake8 --diff