
BUG: Fixed to_datetime for array with both Tz-aware datetimes and NaTs #5962

Merged: 1 commit, Jan 16, 2014


cancan101
Contributor

Closes #5961

@jreback
Contributor

jreback commented Jan 16, 2014

Please do a perf check on this.
IIRC a similar bit of code recently took a big hit on this.

That was solved by using

util._checknull or is NaT

The full checknull is not needed here because you already know it's not a float.
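For readers unfamiliar with the pattern, here is a minimal pure-Python sketch of the "cheap null check plus an identity test against NaT" idea suggested above. It is an illustration only, not pandas code: the real helpers are Cython, and `NAT`, `_basic_checknull`, and `is_null_datetime` are hypothetical stand-ins. What the real `util._checknull` tests is not shown in this thread, so the stand-in simply checks for `None`.

```python
from datetime import datetime, timezone


class _NaTType:
    """Hypothetical sentinel standing in for pandas' NaT singleton."""

    def __repr__(self):
        return "NaT"


NAT = _NaTType()


def _basic_checknull(value):
    # Hypothetical stand-in for a lightweight null check.
    # Assumption: it only needs to recognise None here.
    return value is None


def is_null_datetime(value):
    # Cheap check followed by an identity comparison with the sentinel,
    # mirroring the "util._checknull or is NaT" shape from the comment.
    return _basic_checknull(value) or value is NAT


# An object array mixing tz-aware datetimes and the NaT stand-in.
values = [
    datetime(2014, 1, 16, tzinfo=timezone.utc),
    NAT,
    datetime(2014, 1, 17, tzinfo=timezone.utc),
]
mask = [is_null_datetime(v) for v in values]
```

The identity test (`is`) is a single pointer comparison, which is presumably why it is attractive in a per-element loop when the values are already known not to be floats.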

@cancan101
Contributor Author

Okay. I can make that change and then run vbench. FWIW, there are (IMHO) too many versions of checknull, which leads to errors like this. They are also very poorly documented.

@jreback
Contributor

jreback commented Jan 16, 2014

They certainly could be named better!

You know how it goes, though: you need slightly different versions of a function, but the code doesn't get cleaned up.

You can change them as long as it doesn't affect perf (FYI, sometimes it is better in Cython to manually inline things), though that is harder to read and harder to maintain.

@cancan101
Contributor Author

frame_xs_row                                 |   0.0490 |   0.0476 |   1.0301 |
sort_level_zero                              |   4.5211 |   4.3830 |   1.0315 |
timeseries_asof_single                       |   0.0320 |   0.0310 |   1.0333 |
strings_lstrip                               |   2.8183 |   2.7260 |   1.0339 |
reshape_unstack_simple                       |   3.3143 |   3.2057 |   1.0339 |
strings_repeat                               |   3.5940 |   3.4694 |   1.0359 |
timeseries_to_datetime_YYYYMMDD              |   7.7993 |   7.5119 |   1.0383 |
frame_boolean_row_select                     |   0.2743 |   0.2640 |   1.0391 |
timestamp_ops_diff2                          |  20.5154 |  19.7160 |   1.0405 |
plot_timeseries_period                       |  66.7570 |  63.9970 |   1.0431 |
strings_join_split                           |  25.4734 |  24.3703 |   1.0453 |
timeseries_asof                              |   7.1043 |   6.7916 |   1.0460 |
stats_rank_average                           |  27.6279 |  26.4047 |   1.0463 |
strings_slice                                |   2.2720 |   2.1567 |   1.0535 |
frame_iteritems_cached                       |   0.5523 |   0.5236 |   1.0548 |
dti_reset_index                              |   0.2627 |   0.2484 |   1.0576 |
join_dataframe_integer_2key                  |   5.9144 |   5.5893 |   1.0582 |
timeseries_1min_5min_mean                    |   0.6526 |   0.6157 |   1.0600 |
join_dataframe_index_single_key_small        |   6.1546 |   5.7720 |   1.0663 |
series_constructor_ndarray                   |   0.0149 |   0.0140 |   1.0682 |
frame_mult                                   |   5.8010 |   5.4300 |   1.0683 |
frame_html_repr_trunc_si                     |  31.9476 |  29.3944 |   1.0869 |
timeseries_period_downsample_mean            |  10.3723 |   9.5270 |   1.0887 |
stats_rolling_mean                           |   1.0570 |   0.9584 |   1.1029 |
frame_add                                    |   6.0227 |   5.4283 |   1.1095 |
stat_ops_level_series_sum                    |   2.5946 |   2.3270 |   1.1150 |
timestamp_series_compare                     |   2.3543 |   2.0733 |   1.1355 |
dti_reset_index_tz                           |  10.1180 |   8.8783 |   1.1396 |
timestamp_ops_diff1                          |   7.9810 |   6.7777 |   1.1775 |
eval_frame_mult_python_one_thread            |  20.3840 |  16.5833 |   1.2292 |
datetimeindex_unique                         |   0.2013 |   0.1214 |   1.6588 |
timeseries_asof_nan                          |  10.7890 |   6.3873 |   1.6891 |
-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------

Ratio < 1.0 means the target commit is faster than the baseline.
Seed used: 1234

Target [31537d1] : BUG: Fixed ``to_datetime`` for array with both Tz-aware datetimes and
``NaT``s  (GH5961)
Base   [99910a3] : Merge pull request #5944 from unutbu/str-extract

ENH: Improve perf of str_extract

@cancan101
Contributor Author

and run again (same commit):

timeseries_asof_single                       |   0.0316 |   0.0307 |   1.0311 |
frame_getitem_single_column                  |  18.8413 |  18.2431 |   1.0328 |
reindex_fillna_pad                           |   0.4267 |   0.4127 |   1.0339 |
stat_ops_level_series_sum                    |   2.4223 |   2.3400 |   1.0352 |
timeseries_timestamp_tzinfo_cons             |   0.0117 |   0.0113 |   1.0352 |
frame_fancy_lookup                           |   2.8434 |   2.7457 |   1.0356 |
strings_get                                  |   2.4127 |   2.3274 |   1.0367 |
datetimeindex_unique                         |   0.1171 |   0.1127 |   1.0388 |
match_strings                                |   0.4327 |   0.4160 |   1.0399 |
strings_contains_few_noregex                 |   2.0041 |   1.9246 |   1.0413 |
series_drop_duplicates_string                |   0.4907 |   0.4690 |   1.0461 |
timeseries_to_datetime_YYYYMMDD              |   7.9597 |   7.5970 |   1.0477 |
datetimeindex_infer_dst                      |   3.6183 |   3.4370 |   1.0527 |
datetime_index_union                         |   0.0590 |   0.0560 |   1.0539 |
lib_fast_zip                                 |   8.1453 |   7.7287 |   1.0539 |
dti_reset_index_tz                           |   9.1577 |   8.6746 |   1.0557 |
frame_reindex_columns                        |   0.4126 |   0.3904 |   1.0570 |
packers_write_csv                            | 444.5443 | 420.1533 |   1.0581 |
append_frame_single_homogenous               |   0.2710 |   0.2553 |   1.0616 |
concat_small_frames                          |  12.6719 |  11.8536 |   1.0690 |
datetimeindex_normalize                      |   2.7840 |   2.5953 |   1.0727 |
timeseries_slice_minutely                    |   0.0720 |   0.0670 |   1.0747 |
read_parse_dates_iso8601                     |   1.2953 |   1.2027 |   1.0771 |
eval_frame_chained_cmp_python_one_thread     |  28.6223 |  26.5460 |   1.0782 |
reindex_multiindex                           |   1.4643 |   1.3556 |   1.0801 |
frame_drop_dup_inplace                       |   2.8343 |   2.5744 |   1.1010 |
frame_to_csv_date_formatting                 |  13.4597 |  12.1140 |   1.1111 |
strings_title                                |   3.4863 |   3.1323 |   1.1130 |
eval_frame_mult_one_thread                   |  12.8667 |  11.1853 |   1.1503 |
-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------

Ratio < 1.0 means the target commit is faster than the baseline.
Seed used: 1234

Target [31537d1] : BUG: Fixed ``to_datetime`` for array with both Tz-aware datetimes and
``NaT``s  (GH5961)
Base   [99910a3] : Merge pull request #5944 from unutbu/str-extract

ENH: Improve perf of str_extract
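The two runs compare the same pair of commits, yet some benchmarks report quite different ratios (for example `datetimeindex_unique` at 1.6588 and then 1.0388), which suggests a fair amount of run-to-run noise. The sketch below shows one way to line up the two outputs and measure that spread. The parser and function names are hypothetical helpers written for this illustration, not part of vbench, and only the rows that appear in both runs above are included.

```python
RUN_1 = """
timeseries_asof_single                       |   0.0320 |   0.0310 |   1.0333 |
timeseries_to_datetime_YYYYMMDD              |   7.7993 |   7.5119 |   1.0383 |
stat_ops_level_series_sum                    |   2.5946 |   2.3270 |   1.1150 |
dti_reset_index_tz                           |  10.1180 |   8.8783 |   1.1396 |
datetimeindex_unique                         |   0.2013 |   0.1214 |   1.6588 |
Test name                                    | head[ms] | base[ms] |  ratio   |
"""

RUN_2 = """
timeseries_asof_single                       |   0.0316 |   0.0307 |   1.0311 |
stat_ops_level_series_sum                    |   2.4223 |   2.3400 |   1.0352 |
datetimeindex_unique                         |   0.1171 |   0.1127 |   1.0388 |
timeseries_to_datetime_YYYYMMDD              |   7.9597 |   7.5970 |   1.0477 |
dti_reset_index_tz                           |   9.1577 |   8.6746 |   1.0557 |
Test name                                    | head[ms] | base[ms] |  ratio   |
"""


def parse_ratios(text):
    """Map benchmark name -> reported ratio, skipping header and rule lines."""
    ratios = {}
    for line in text.splitlines():
        parts = [part.strip() for part in line.split("|")]
        if len(parts) < 4:
            continue  # blank line or dashed separator
        try:
            ratios[parts[0]] = float(parts[3])
        except ValueError:
            continue  # header row: the ratio column is not numeric
    return ratios


def ratio_spread(first, second):
    """Absolute difference in ratio for benchmarks present in both runs."""
    shared = sorted(set(first) & set(second))
    return {name: abs(first[name] - second[name]) for name in shared}


spread = ratio_spread(parse_ratios(RUN_1), parse_ratios(RUN_2))
for name, delta in sorted(spread.items(), key=lambda item: -item[1]):
    print(f"{name:<35} {delta:.4f}")
```

A benchmark whose ratio moves by several tenths between identical runs is probably not a reliable signal of a regression on its own.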

jreback added a commit that referenced this pull request Jan 16, 2014
BUG: Fixed to_datetime for array with both Tz-aware datetimes and NaTs
jreback merged commit 9e21718 into pandas-dev:master Jan 16, 2014
@jreback
Contributor

jreback commented Jan 16, 2014

thanks!

Successfully merging this pull request may close these issues.

to_datetime broken for Tz-aware datetimes and NaTs in same array