
BUG: Timestamp cannot parse nanosecond from string #7907


Merged
merged 1 commit into pandas-dev:master on Aug 11, 2014

Conversation

@sinhrks (Member) commented on Aug 3, 2014

Fixes 2 problems related to Timestamp string parsing:

# Problem 1 (GH7833): string offset combined with tz. Result after the fix:

# If the string contains an offset, it is parsed using a fixed timezone offset. The following corresponds to 2013-11-01 05:00:00 in UTC (there are existing tests checking this behaviour).
repr(pd.Timestamp('2013-11-01 00:00:00-0500'))
# Timestamp('2013-11-01 00:00:00-0500', tz='pytz.FixedOffset(-300)')

# If tz is specified as well, the result is converted to that timezone.
repr(pd.Timestamp('2013-11-01 00:00:00-0500', tz='America/Chicago'))
# Timestamp('2013-11-01 00:00:00-0500', tz='America/Chicago')

repr(pd.Timestamp('2013-11-01 00:00:00-0500', tz='Asia/Tokyo'))
# Timestamp('2013-11-01 14:00:00+0900', tz='Asia/Tokyo')
# Problem 2 (GH7878): nanosecond parsing. Result after the fix:

repr(pd.Timestamp('2001-01-01 00:00:00.000000005'))
# Timestamp('2001-01-01 00:00:00.000000005')

repr(pd.Timestamp('2001-01-01 00:00:00.000000005', tz='US/Eastern'))
# Timestamp('2001-01-01 00:00:00.000000005-0500', tz='US/Eastern')

repr(pd.Timestamp('2001-01-01 00:00:00.000001234+09:00'))
# Timestamp('2001-01-01 00:00:00.000001234+0900', tz='pytz.FixedOffset(540)')

repr(pd.Timestamp('2001-01-01 00:00:00.000001234+09:00', tz='Asia/Tokyo'))
# Timestamp('2001-01-01 00:00:00.000001234+0900', tz='Asia/Tokyo')

# Non-ISO 8601 formats are parsed by dateutil, so the nanosecond component is lost (no change).
repr(pd.Timestamp('01-01-01 00:00:00.000000001'))
# Timestamp('2001-01-01 00:00:00')

repr(pd.Timestamp('01-01-01 00:00:00.000000001+9:00'))
# Timestamp('2001-01-01 00:00:00+0900', tz='tzoffset(None, 32400)')
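
As a side note (not part of the PR), a quick sanity check of the examples above, assuming the existing Timestamp.tz_convert and Timestamp.nanosecond accessors:

import pandas as pd

# The fixed-offset string parses to the expected instant, 2013-11-01 05:00:00 UTC.
pd.Timestamp('2013-11-01 00:00:00-0500').tz_convert('UTC')
# Timestamp('2013-11-01 05:00:00+0000', tz='UTC')

# The nanosecond component survives ISO 8601 parsing.
pd.Timestamp('2001-01-01 00:00:00.000000005').nanosecond
# 5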

CC @cyber42 @ischwabacher @rockg @adamgreenhall

obj.value = pandas_datetimestruct_to_datetime(PANDAS_FR_ns, &obj.dts)
_check_dts_bounds(&obj.dts)
if out_tzoffset != -1:
    obj.tzinfo = pytz.FixedOffset(out_tzoffset)
@sinhrks (Member, Author) commented on the diff:

Which is preferable for a fixed offset, pytz.FixedOffset or dateutil.tz.tzoffset?

A contributor replied:

Create a pytz fixed offset; dateutil must be explicitly specified. In theory we could have an option for this, but we'll defer that to the future. You can create an issue if you would like; please copy @dbew on it.
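
For context, a minimal sketch of the difference between the two fixed-offset types discussed above (pytz.FixedOffset takes the offset in minutes, dateutil.tz.tzoffset takes an optional name plus the offset in seconds):

import pytz
from dateutil import tz

# Both of these represent UTC-05:00, the offset in the examples above.
pytz_fixed = pytz.FixedOffset(-300)         # offset in minutes
dateutil_fixed = tz.tzoffset(None, -18000)  # offset in seconds

# Either can be passed as tz=, e.g. pd.Timestamp('2013-11-01 00:00:00', tz=pytz_fixed)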

@jreback added this to the 0.15.0 milestone on Aug 3, 2014
@jreback (Contributor) commented on Aug 3, 2014

@sinhrks can you run a perf test and post if anything is amiss (I don't think so, but just to be sure)

@sinhrks (Member, Author) commented on Aug 8, 2014

As the original version was a little slower, I modified the logic slightly. Below is a perf comparison before/after the modification (ratio = head/base, so values above 1 are slower). Results vary, but no test is consistently worse.

Perf with original logic in this PR (for ref)

-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------
read_parse_dates_iso8601                     |   3.5694 |   2.3197 |   1.5387 |

1st run with current logic

-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------
read_parse_dates_iso8601                     |   2.3770 |   2.3616 |   1.0065 |
...
packers_read_pickle                          | 306.5277 | 254.1726 |   1.2060 |
frame_apply_lambda_mean                      |   9.4993 |   7.8301 |   1.2132 |
write_store_table_dc                         | 246.1410 | 197.7520 |   1.2447 |
frame_dtypes                                 |   0.2213 |   0.1760 |   1.2579 |
reindex_daterange_pad                        |   2.0043 |   1.5914 |   1.2595 |
read_store_table_mixed                       |  21.8923 |  17.3550 |   1.2614 |
frame_mask_bools                             |  33.6537 |  26.1207 |   1.2884 |
frame_get_numeric_data                       |   0.2956 |   0.2290 |   1.2908 |
indexing_dataframe_boolean_rows_object       |   1.3426 |   0.9793 |   1.3710 |
index_float64_div                            |   8.8797 |   6.3930 |   1.3890 |
frame_dropna_axis1_all                       | 426.2613 | 299.9337 |   1.4212 |

2nd run

-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------
read_parse_dates_iso8601                     |   2.3813 |   2.3716 |   1.0041 |
...
strings_match                                |  12.7503 |  10.5516 |   1.2084 |
timeseries_1min_5min_mean                    |   1.5577 |   1.2853 |   1.2119 |
reindex_daterange_backfill                   |   1.8867 |   1.5500 |   1.2172 |
frame_ctor_dtindex_CBMonthEndx1              |   7.7756 |   6.3493 |   1.2246 |
eval_frame_add_all_threads                   |  35.2084 |  28.6183 |   1.2303 |
frame_ctor_dtindex_CustomBusinessDayx1       |   3.8170 |   3.0283 |   1.2604 |
eval_frame_mult_one_thread                   |  37.0866 |  29.2060 |   1.2698 |
eval_frame_chained_cmp_python_one_thread     | 135.0936 | 104.4240 |   1.2937 |
query_with_boolean_selection                 |  74.9604 |  57.4800 |   1.3041 |
dtindex_from_series_ctor                     |   0.0354 |   0.0270 |   1.3088 |
strings_center                               |   9.6043 |   7.3230 |   1.3115 |
stat_ops_frame_sum_int_axis_1                |   2.3720 |   1.7384 |   1.3645 |
strings_pad                                  |  10.5731 |   7.6413 |   1.3837 |
groupby_multi_different_functions            |  38.6817 |  25.0644 |   1.5433 |

@jreback (Contributor) commented on Aug 8, 2014

looks ok to me

cc @ischwabacher
cc @rockg

can you give it a once-over pls

@rockg (Contributor) commented on Aug 9, 2014

This is kind of difficult to go through. The tests look okay to me... I cannot comment on the specific changes, though.

@@ -399,7 +401,8 @@ Bug Fixes
- Bug in ``Series.str.cat`` with an index which was filtered as to not include the first item (:issue:`7857`)



- Bug in ``Timestamp`` cannot parse ``nanosecond`` from string (:issue:`7878`)
- Bug in ``Timestamp`` with string offset and ``tz`` results in incorrect (:issue:`7833`)
A contributor commented on the diff:

you mean to say just incorrect, right?

@sinhrks (Member, Author) replied:

Yeah, should it be more specific, or is it grammatically wrong?

The contributor replied:

It was just a minor grammar point; just say incorrect (not "in incorrect").

@sinhrks (Member, Author) replied:

Thanks, fixed.

@jreback (Contributor) commented on Aug 10, 2014

ok, ping when green

@sinhrks (Member, Author) commented on Aug 11, 2014

@jreback Yep, now green.

jreback added a commit that referenced this pull request Aug 11, 2014
BUG: Timestamp cannot parse nanosecond from string
@jreback merged commit fa63e76 into pandas-dev:master on Aug 11, 2014
@jreback (Contributor) commented on Aug 11, 2014

thanks!

Labels: Bug, Datetime, Timezones