Fix unrecognized 'Z' UTC designator #8832

Merged 1 commit on Nov 27, 2014.
doc/source/whatsnew/v0.15.2.txt (2 changes: 1 addition & 1 deletion)
@@ -151,9 +151,9 @@ Bug Fixes
 - Bug in `pd.infer_freq`/`DataFrame.inferred_freq` that prevented proper sub-daily frequency inference
   when the index contained DST days (:issue:`8772`).
 - Bug where index name was still used when plotting a series with ``use_index=False`` (:issue:`8558`).
-
 - Bugs when trying to stack multiple columns, when some (or all)
   of the level names are numbers (:issue:`8584`).
 - Bug in ``MultiIndex`` where ``__contains__`` returns wrong result if index is
   not lexically sorted or unique (:issue:`7724`)
 - BUG CSV: fix problem with trailing whitespace in skipped rows, (:issue:`8679`), (:issue:`8661`)
+- Regression in ``Timestamp`` does not parse 'Z' zone designator for UTC (:issue:`8771`)
pandas/src/datetime/np_datetime_strings.c (11 changes: 8 additions & 3 deletions)
@@ -363,7 +363,8 @@ convert_datetimestruct_local_to_utc(pandas_datetimestruct *out_dts_utc,
  * to be cast to the 'unit' parameter.
  *
  * 'out' gets filled with the parsed date-time.
- * 'out_local' gets whether returned value contains timezone. 0 for UTC, 1 for local time.
+ * 'out_local' gets set to 1 if the parsed time contains timezone,
+ * to 0 otherwise.
  * 'out_tzoffset' gets set to timezone offset by minutes
  * if the parsed time was in local time,
  * to 0 otherwise. The values 'now' and 'today' don't get counted
@@ -785,11 +786,15 @@ parse_iso_8601_datetime(char *str, int len,
 
     /* UTC specifier */
     if (*substr == 'Z') {
-        /* "Z" means not local */
+        /* "Z" should be equivalent to tz offset "+00:00" */
         if (out_local != NULL) {
-            *out_local = 0;
+            *out_local = 1;
         }
 
+        if (out_tzoffset != NULL) {
+            *out_tzoffset = 0;
+        }
+
         if (sublen == 1) {
             goto finish;
         }
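For illustration, the effect of the patched 'Z' branch on the two output parameters can be mirrored in Python. This is a minimal sketch, not pandas code: parse_tz_suffix is a hypothetical helper that only sees the text after the time part and only understands an empty suffix, a bare 'Z', or a '+HH:MM'/'-HH:MM' offset. Before the patch, 'Z' set out_local to 0, the same as having no suffix at all. After it, 'Z' yields exactly what '+00:00' does.

```python
def parse_tz_suffix(suffix):
    """Return (out_local, out_tzoffset) for the text after the time part.

    Hypothetical Python mirror of the patched C logic, not pandas code:
    out_local is 1 whenever the string carries zone information, and
    out_tzoffset is the offset in minutes.
    """
    if suffix == "":
        # No zone information: a naive time with no offset.
        return 0, 0
    if suffix == "Z":
        # Patched behaviour: 'Z' is treated exactly like '+00:00'.
        return 1, 0
    if (len(suffix) == 6 and suffix[0] in "+-" and suffix[3] == ":"
            and suffix[1:3].isdigit() and suffix[4:6].isdigit()):
        minutes = int(suffix[1:3]) * 60 + int(suffix[4:6])
        return 1, (-minutes if suffix[0] == "-" else minutes)
    # Anything else (including 'Z0' or 'Z00') is rejected.
    raise ValueError("unsupported time-zone suffix: %r" % (suffix,))
```

Presumably, putting 'Z' on the same (out_local, out_tzoffset) path as '+00:00' is what lets the parsed timestamp come back tz-aware, which is the behaviour the new test_utc_z_designator checks.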
pandas/tseries/tests/test_tslib.py (5 changes: 4 additions & 1 deletion)
@@ -6,7 +6,7 @@
 import datetime
 
 from pandas.core.api import Timestamp, Series
-from pandas.tslib import period_asfreq, period_ordinal
+from pandas.tslib import period_asfreq, period_ordinal, get_timezone
 from pandas.tseries.index import date_range
 from pandas.tseries.frequencies import get_freq
 import pandas.tseries.offsets as offsets
@@ -298,6 +298,9 @@ def test_barely_oob_dts(self):
         # One us more than the maximum is an error
         self.assertRaises(ValueError, Timestamp, max_ts_us + one_us)
 
+    def test_utc_z_designator(self):
+        self.assertEqual(get_timezone(Timestamp('2014-11-02 01:00Z').tzinfo), 'UTC')
Contributor

Hah... you took out the tests? Are they just not legit/undefined? (It's OK, just trying to see.)

Author

Yes, indeed: after checking ISO 8601 and RFC 3339, I don't think ...Z0 and ...Z00 are actually legit.
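To make the standards argument concrete: RFC 3339 (section 5.6) defines time-offset = "Z" / time-numoffset, with time-numoffset = ("+" / "-") time-hour ":" time-minute, and ISO 8601 additionally allows the basic forms ±hhmm and ±hh. In every case a 'Z' must stand alone, so nothing may follow it. The sketch below is a hypothetical validator (is_valid_offset is not a pandas function) that accepts exactly those suffixes. Range checks on the hour and minute fields, and the lowercase 'z' that RFC 3339 also permits, are left out.

```python
import re

# 'Z' alone, or a numeric offset +hh:mm / +hhmm / +hh (or with '-').
# RFC 3339 only allows 'Z' and the +hh:mm form; ISO 8601 also allows the
# shorter basic forms. Hour/minute range checks are omitted in this sketch.
_OFFSET_RE = re.compile(r"Z|[+-][0-9]{2}(?::?[0-9]{2})?")


def is_valid_offset(suffix):
    """Return True if `suffix` is a complete UTC designator or offset."""
    return _OFFSET_RE.fullmatch(suffix) is not None
```

Because fullmatch requires the whole suffix to match, 'Z0' and 'Z00' fail: the 'Z' alternative matches but leaves trailing characters, and the numeric alternative cannot start with 'Z'.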

Contributor

OK, then they should raise, yes? So maybe the parsing actually needs to be a bit stricter, e.g. if you see a Z, then it must be the end of the string OR be followed by a full-format offset.

Author

Totally agree, and as I understand it this is exactly the behavior of the parser in np_datetime_strings.c. But there is currently a fallback (dateutil's parser) if anything goes wrong, and that fallback is what we are actually testing with ...Z0 and ...Z00. I think the fallback is there for a reason, so I did not want to mess with it.

Contributor

Ahh, so the fallback (i.e. dateutil) was producing an incorrect result. Not surprised. OK, so now we KNOW that these 2 cases are not legit. Is it easy to catch them? (E.g. either you have Z then end-of-string, or Z then a legit value.) (Maybe it's easiest to simply test whether 0/00 is present after the Z and raise.)

Contributor

After line 800, where sublen is 1, you are done (i.e. you got a Z only). Otherwise it passes through. I would then check the next 1 and then 2 characters (if you can't, it's an error): if they are 0, then it's an error, otherwise pass through.
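The reviewer's targeted check can be sketched as follows. This is a hypothetical helper (check_after_z is not a pandas function), and it simplifies "check the next 1 and then 2 characters" to comparing the remainder after the 'Z' with '0' and '00'. As the author notes in the next comment, the C parser already rejects any trailing text after 'Z', so this only illustrates the suggestion.

```python
def check_after_z(s):
    """Reject the two known-bad forms '...Z0' and '...Z00'.

    Hypothetical sketch of the reviewer's suggestion: a bare trailing 'Z'
    is fine, '0' or '00' right after it is an error, and anything else is
    passed through unchanged for the rest of the parser to handle.
    """
    z = s.find("Z")
    if z == -1:
        return s  # no UTC designator: nothing to check
    rest = s[z + 1:]
    if rest == "":
        return s  # 'Z' only: done (the sublen == 1 case)
    if rest in ("0", "00"):
        raise ValueError("invalid characters after 'Z': %r" % (s,))
    return s  # pass through
```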

Author

No, it does not pass through: check line 880, there is a goto parse_error if anything remains after the Z. parse_error raises a PyExc_ValueError, but this ValueError is caught in tslib.pyx/convert_to_tsobject(), hence the problem.
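The problem described here, a strict parser whose ValueError is swallowed by a lenient fallback, can be reproduced with toy functions. Everything below is hypothetical: strict_parse, lenient_parse and to_timestamp are illustrative names, and lenient_parse does not model what dateutil actually returns for these inputs. It only shows that an input the strict parser correctly rejects ends up accepted through a different code path instead of raising.

```python
def strict_parse(s):
    """Toy stand-in for the C ISO 8601 parser: accepts only '<time>Z'."""
    head, sep, rest = s.partition("Z")
    if not sep or rest:
        # Mirrors 'goto parse_error' when anything follows the 'Z'.
        raise ValueError("parse_error: %r" % (s,))
    return ("UTC", head)


def lenient_parse(s):
    """Toy permissive fallback: drops everything from the first 'Z' on."""
    return ("naive", s.split("Z")[0])


def to_timestamp(s):
    # Same shape as the problem in convert_to_tsobject(): every ValueError
    # from the strict parser triggers the fallback, so input the strict
    # parser correctly rejected is silently accepted.
    try:
        return strict_parse(s)
    except ValueError:
        return lenient_parse(s)
```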

Contributor

OK, so check there. These are cases that should then be caught as errors.

Author

You mean adding checks in convert_to_tsobject? Sorry if I am a bit nitpicky here, but I think all those checks should be the responsibility of the parser only and should not be spread across modules. That's why I am a bit reluctant to do it. I think the cleanest way to do it currently is to let the internal parser somehow bypass the fallback and raise errors of its own: if I am correct, the internal parser currently raises the same kind of exception (ValueError) for two different things: legit but unsupported (not implemented) ISO 8601 strings on one side, and genuinely ill-formed datetime strings on the other.

Benoît.

(Sent by email on Sunday, November 16, 2014, in reply to jreback's comment above.)
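The author's proposal, letting the internal parser raise errors that the fallback cannot swallow, can be sketched with two ValueError subclasses. This is a hypothetical illustration, not pandas code: the class names UnsupportedISOFormat and MalformedDatetime and the functions fast_parse, fallback_parse and parse_datetime are made up, and the toy fast_parse treats every input without a 'Z' as legit-but-unsupported.

```python
class UnsupportedISOFormat(ValueError):
    """Valid ISO 8601 input that the fast parser does not implement."""


class MalformedDatetime(ValueError):
    """Input that is not a valid datetime string at all."""


def fast_parse(s):
    if s.endswith("Z") and s.count("Z") == 1:
        return ("UTC", s[:-1])
    if "Z" in s:
        # Something follows the 'Z' (e.g. 'Z0'): definitely invalid.
        raise MalformedDatetime("unexpected characters after 'Z': %r" % (s,))
    # Toy assumption: treat every other input as legit but unimplemented.
    raise UnsupportedISOFormat(s)


def fallback_parse(s):
    # Toy fallback: return the input unchanged as a naive value.
    return ("naive", s)


def parse_datetime(s):
    try:
        return fast_parse(s)
    except UnsupportedISOFormat:
        # Only legit-but-unsupported input reaches the fallback;
        # MalformedDatetime propagates to the caller unchanged.
        return fallback_parse(s)
```

Subclassing ValueError keeps existing callers that catch ValueError working, while the dispatch code can tell the two failure modes apart by type rather than by message text.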

Contributor

Not sure where you got the idea that I was suggesting you modify convert_to_tsobject. That's very complicated and not warranted. I suggest that you make a simple modification in the file you are currently working in, np_datetime_strings.c, to handle the case of reading 1 and 2 characters past the Z.

I would maybe make out_local == -1 signal an error; then in the except block (in convert_to_tsobject), if out_local is -1, you can simply re-raise the ValueError.

Or you can put a specific message in the ValueError, which is checked in convert_to_tsobject. You are right, the problem is that a ValueError actually means 2 things here. We need to disambiguate them (or allow your change and simply disregard the 'error' cases that we had before).
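The sentinel option (out_local == -1) can be sketched by mimicking the C out-parameters with a small Python object. Again this is a sketch under stated assumptions: ParseOutputs, parse_iso and convert are hypothetical names, and the "fallback" simply returns the input as a naive value. The key point is that the parser writes the sentinel before raising, so the except block can inspect it.

```python
class ParseOutputs:
    """Hypothetical stand-in for the C output parameters."""

    def __init__(self):
        self.out_local = 0
        self.out_tzoffset = 0


def parse_iso(s, out):
    if s.endswith("Z") and s.count("Z") == 1:
        out.out_local, out.out_tzoffset = 1, 0
        return s[:-1]
    if "Z" in s:
        out.out_local = -1  # sentinel: the string is definitely malformed
        raise ValueError("malformed datetime string: %r" % (s,))
    # Toy assumption: anything else is legit but unsupported (out_local stays 0).
    raise ValueError("unsupported ISO 8601 form: %r" % (s,))


def convert(s):
    out = ParseOutputs()
    try:
        return parse_iso(s, out), out.out_local, out.out_tzoffset
    except ValueError:
        if out.out_local == -1:
            raise  # real error: re-raise instead of falling back
        return s, 0, 0  # toy fallback: treat the input as naive
```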



 class TestDatetimeParsingWrappers(tm.TestCase):
     def test_does_not_convert_mixed_integer(self):