
PERF: infer_datetime_format without padding #11142 #11146

Merged: 1 commit, Sep 20, 2015

Conversation

chris-b1
Contributor

Closes #11142

(('month',), '%b'),
(('month',), '%m'),
(('year', 'month', 'day'), '%Y%m%d', 0),
(('year',), '%Y', 4),
Member

I think users expect %y when they pass 2 digits. Should this be zfilled?

Contributor Author

hmm, I hadn't planned on adding %y because it adds more ambiguity, e.g. '1-1-1'. But I suppose it would be possible to follow the dateutil semantics (%m-%d-%y)
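
To make the ambiguity concrete, here is a minimal sketch that uses only the standard library's `datetime.strptime`, not the pandas inference code. The sample string `01-02-03` and the three candidate formats are illustrative assumptions; the fields are zero-padded because the stdlib `%y` directive expects two digits.

```python
from datetime import datetime

# Illustrative sample: every field is a valid month, day, and two-digit year.
text = "01-02-03"

# Hypothetical candidate formats, all of which accept the same string.
candidates = ["%m-%d-%y", "%d-%m-%y", "%y-%m-%d"]

# Parse the same text under each format and keep the resulting dates.
parsed = {fmt: datetime.strptime(text, fmt).date() for fmt in candidates}

for fmt, value in parsed.items():
    print(fmt, "->", value.isoformat())
```

Each format parses without error but yields a different date, which is why a guesser that admits `%y` has to pick a convention such as `%m-%d-%y`.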

Member

Ah, what I meant is to keep %Y without padding, without adding %y as this is ambiguous.

Contributor Author

Not sure I'm following, could you show an example date that would be affected?
I see what you're saying. I couldn't find any cases where it causes a problem, but I'll remove the padding on the year.

@chris-b1
Contributor Author

@sinhrks pushed changes for your notes. The perf impact on _guess_datetime_format is negligible because it is only ever called once. Nothing showed up in asv, here are a couple timings.

In [1]: s, l = pd.date_range('2014-1-1', periods=10), pd.date_range('2014-1-1', periods=100000)
In [2]: s = s.strftime('%Y-%m-%d')
In [3]: l = l.strftime('%Y-%m-%d')

# master
In [4]: %timeit pd.to_datetime(l, infer_datetime_format=True)
10 loops, best of 3: 30.3 ms per loop

In [5]: %timeit pd.to_datetime(s, infer_datetime_format=True)
1000 loops, best of 3: 300 µs per loop

# PR
In [5]: %timeit pd.to_datetime(l, infer_datetime_format=True)
10 loops, best of 3: 30.3 ms per loop

In [6]: %timeit pd.to_datetime(s, infer_datetime_format=True)
1000 loops, best of 3: 308 µs per loop
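
For readers wondering why the guess adds so little cost, the pattern can be sketched roughly as follows. This is a simplified stand-in built on the standard library, not pandas' actual `_guess_datetime_format`: the `CANDIDATES` list and the helpers `guess_format` and `parse_all` are hypothetical names used only for illustration. The format is guessed once from the first element and then reused for every value, so the guessing cost does not grow with the length of the input.

```python
from datetime import datetime

# Hypothetical candidate list; pandas' real inference logic is different.
CANDIDATES = ["%Y-%m-%d", "%m/%d/%Y", "%Y%m%d"]


def guess_format(sample):
    """Return the first candidate format that parses `sample`, else None."""
    for fmt in CANDIDATES:
        try:
            datetime.strptime(sample, fmt)
        except ValueError:
            continue
        return fmt
    return None


def parse_all(values):
    """Guess the format once from the first value, then reuse it for all."""
    if not values:
        return []
    fmt = guess_format(values[0])  # called a single time per input
    if fmt is None:
        raise ValueError("could not guess a format for %r" % values[0])
    return [datetime.strptime(v, fmt) for v in values]


# strptime accepts months and days without zero padding, so mixed input works.
result = parse_all(["2014-1-1", "2014-1-2", "2014-01-03"])
print(result[0], result[-1])
```

The same idea is why the 10-element and 100000-element timings above barely move between master and the PR: only the one-off guess changed, not the per-element parsing.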

@jreback jreback added the Datetime (Datetime data dtype) and Performance (Memory or execution speed performance) labels Sep 19, 2015
@jreback jreback added this to the 0.17.0 milestone Sep 19, 2015
jreback added a commit that referenced this pull request Sep 20, 2015
PERF: infer_datetime_format without padding #11142
@jreback jreback merged commit 9e7dc17 into pandas-dev:master Sep 20, 2015
@jreback
Contributor

jreback commented Sep 20, 2015

thanks!

@chris-b1 chris-b1 deleted the inference-padding branch September 20, 2015 20:40