PERF: consider changing default of infer_datetime_format to True #12061

chris-b1 · 2016-01-16T16:17:31Z

xref #12060

jreback · 2016-01-16T19:13:30Z

if someone ran the benchmarks with this turned on, we could see whether this affects anything.

jreback · 2016-02-09T22:26:54Z

@chris-b1 can you do a PR for this?

and see if any of the asv timings change?

chris-b1 · 2016-02-10T00:50:40Z

@jreback - here's the general tradeoff - adds some overhead (large relative, small absolute) to parsing iso8601, and of course speeds up non-iso8601 a lot. Does that seem worth it?

In [35]: s = pd.date_range('1900-1-1', periods=1000).strftime('%Y-%m-%d')

In [36]: %timeit pd.to_datetime(s)
1000 loops, best of 3: 316 µs per loop

In [37]: %timeit pd.to_datetime(s, infer_datetime_format=True)
1000 loops, best of 3: 642 µs per loop

In [38]: s_noniso = pd.date_range('1900-1-1', periods=1000).strftime('%m/%d/%Y')

In [42]: %timeit pd.to_datetime(s_noniso)
10 loops, best of 3: 79.9 ms per loop

In [43]: %timeit pd.to_datetime(s_noniso, infer_datetime_format=True)
100 loops, best of 3: 3.82 ms per loop

jreback · 2016-02-10T00:56:59Z

so is the iso parsing overhead just a couple of function calls? can you update with 10x (or 100x) the size as well to see how this scales?

chris-b1 · 2016-02-10T01:51:08Z

below has this same test at different sizes. diff is increase (decrease) in time, ratio is (infer) / (don't infer).

Infer only looks at the first element, so not completely sure why the overhead isn't flat.

n	iso_diff	iso_ratio	noniso_diff	noniso_ratio
1	0.000282747	4.23485	0.00024401	2.40649
100	0.000324795	4.44211	-0.00697929	0.0946573
1000	0.000332079	2.02872	-0.0772536	0.0462967
100000	0.00309632	1.11218	-7.62098	0.0440372
1000000	0.0303454	1.10112	-82.5801	0.0421022
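For readers who want the "infer from the first element" idea spelled out, here is a minimal stdlib-only sketch. It is not pandas' implementation: `CANDIDATE_FORMATS`, `guess_format`, and `parse_all` are hypothetical names, and the candidate list is an assumption chosen for illustration. The point is that the guess is paid once per call and every element is then parsed with one fixed format, which is why a roughly constant overhead would be expected.

```python
from datetime import datetime

# Hypothetical candidate list for illustration; real inference is more general.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d/%m/%Y")


def guess_format(sample):
    """Return the first candidate format that parses `sample`, else None."""
    for fmt in CANDIDATE_FORMATS:
        try:
            datetime.strptime(sample, fmt)
        except ValueError:
            continue
        return fmt
    return None


def parse_all(values):
    """Guess a format from values[0] only, then apply it to every element."""
    if not values:
        return []
    fmt = guess_format(values[0])
    if fmt is None:
        raise ValueError(f"no candidate format matches {values[0]!r}")
    # One fixed format for the whole input: no per-element guessing.
    return [datetime.strptime(v, fmt) for v in values]


dates = parse_all(["01/31/1900", "02/01/1900", "02/02/1900"])
```

Because only the first element is inspected, the result depends on that element being representative of the rest of the input.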

jreback · 2016-02-10T18:59:18Z

would it be dumb to change the default to None, then set it based on the length of the passed array? e.g. say <= 100 -> False, > 100 -> True. (of course if it's explicitly passed then just use that).

This way you get the perf benefit without overhead on smaller samples.
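A rough sketch of that proposal, assuming a hypothetical helper named `resolve_infer` and the threshold of 100 floated above (neither exists in pandas): `None` means "decide from the input length", while an explicit `True` or `False` is always honored.

```python
def resolve_infer(infer_datetime_format, n, threshold=100):
    """Pick the effective setting for an input of length `n`.

    `resolve_infer` and `threshold` are illustrative names only.
    """
    if infer_datetime_format is not None:
        # The caller passed a value explicitly; use it unchanged.
        return infer_datetime_format
    # Default (None): infer only when the input is long enough to repay
    # the fixed cost of guessing the format.
    return n > threshold
```

With these rules, `resolve_infer(None, 100)` gives `False`, `resolve_infer(None, 101)` gives `True`, and `resolve_infer(False, 10**6)` stays `False`.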

jreback · 2016-02-15T20:24:05Z

@chris-b1 so what do you think?

chris-b1 · 2016-02-15T20:36:28Z

It didn't seem unreasonable - but I got to thinking - there are actually cases with ambiguous dates where False and True don't do the same thing (e.g. below). So now I'm not sure what to do.

In [3]: pd.to_datetime(['24/1/2015', '1/2/2015'], infer_datetime_format=True)
Out[3]: DatetimeIndex(['2015-01-24', '2015-02-01'], dtype='datetime64[ns]', freq=None)

In [4]: pd.to_datetime(['24/1/2015', '1/2/2015'])
Out[4]: DatetimeIndex(['2015-01-24', '2015-01-02'], dtype='datetime64[ns]', freq=None)
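The divergence can be reproduced without pandas. The sketch below uses only `datetime.strptime`; `parse_fixed` and `parse_each` are hypothetical helpers that mimic the two strategies and are not pandas internals. One applies a single format (the one implied by the first element) to every value, the other tries month-first and falls back to day-first for each value independently.

```python
from datetime import datetime


def parse_fixed(values, fmt):
    """Apply a single format to every element."""
    return [datetime.strptime(v, fmt) for v in values]


def parse_each(values, formats=("%m/%d/%Y", "%d/%m/%Y")):
    """Try month-first, then day-first, independently for each element."""
    out = []
    for v in values:
        for fmt in formats:
            try:
                out.append(datetime.strptime(v, fmt))
                break
            except ValueError:
                continue
        else:
            # No format matched this element.
            raise ValueError(f"cannot parse {v!r}")
    return out


values = ["24/1/2015", "1/2/2015"]
# '24/1/2015' can only be day-first, so that format is applied to everything.
fixed = parse_fixed(values, "%d/%m/%Y")
# Per-element parsing reads '1/2/2015' as month-first because that succeeds.
each = parse_each(values)
```

Here `fixed` reads the second value as 1 February 2015, while `each` reads it as 2 January 2015, matching the two outputs above.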

jreback · 2016-02-15T20:42:52Z

ahh, so the yearfirst & dayfirst may not be respected. So I think these need to be of 'standard' format (e.g. dayfirst=False and yearfirst=False) or we should raise if infer_datetime_format=True as these are incompatible (or just make it False)

MarcoGorelli · 2022-12-19T08:59:38Z

as of PDEP4, this parameter is deprecated and a stricter version is the default - closing then

jreback added the Datetime, API Design, and IO CSV labels Jan 16, 2016

jreback added this to the 0.18.0 milestone Jan 16, 2016

jreback mentioned this issue Jan 27, 2016

better docs for read_csv() argument infer_datetime_format #12152

Closed

jreback modified the milestones: 0.18.1, 0.18.0 Feb 27, 2016

jorisvandenbossche mentioned this issue Mar 10, 2016

BUG: Unexpected pd.to_datetime result #12583

Closed

jreback modified the milestones: 0.18.1, 0.18.2 Apr 25, 2016

jorisvandenbossche modified the milestones: 0.20.0, 0.19.0 Aug 15, 2016

jreback modified the milestones: 0.20.0, Next Major Release Mar 23, 2017

mroeschke added the Performance label and removed the IO CSV label Mar 31, 2020

mroeschke removed the API Design label Apr 21, 2021

mroeschke added the Enhancement label Apr 21, 2021

mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022

MarcoGorelli closed this as completed Dec 19, 2022
