PERF: add exact kw to to_datetime to enable faster regex format parsing #8904
Yeah, changing back to [...]. Actually, there are other use cases for this.
8c7f505 to cc8a3de
@@ -195,6 +195,9 @@ def to_datetime(arg, errors='ignore', dayfirst=False, utc=None, box=True,
If True returns a DatetimeIndex, if False returns ndarray of values
format : string, default None
strftime to parse time, eg "%d/%m/%Y"
exact : boolean, True by default
if True, require an exact format match
Can you add punctuation and capitals here? (When rendered to HTML, Sphinx does not keep the newline, so the visual formatting does not work there.)
Does this also work for the example that currently fails (reading in nanoseconds when specifying a format)?
@@ -79,6 +79,8 @@ Performance
~~~~~~~~~~~
- Reduce memory usage when skiprows is an integer in read_csv (:issue:`8681`)

- Performance boost for ``to_datetime`` conversions with a passed ``format=`` kw, and the new ``exact=False`` for kw with non-exact matching (:issue:`8904`)
Can you add the new exact keyword as a separate entry in the enhancements section?
For the keyword name: [...] Out of curiosity, where does the speed-up come from? The extra Cython type declarations?
@jorisvandenbossche the speedup comes from not having to do an [...]
ok with [...]
This works; is there an issue about this?
No, no issue I think, and using this keyword solves the error, but of course not the problem that the nanoseconds are not read in. But I don't know if we can do something about that, as this is defined by Python format strings, where [...]
You don't actually need to use a formatting string. But we could easily change the 'f' to allow for nanoseconds, I think. Let me see.
"""
timeseries_with_format = Benchmark("to_datetime(s,format='%d%b%y',exact=False)", \
setup, start_date=datetime(2014, 11, 26))
timeseries_with_format_replace = Benchmark("to_datetime(s.str.replace(':\S+$',''),format='%d%b%y')", \
Maybe timeseries_with_format_exact is a better name? (The replace does not really matter for what it is about.)
@jreback yes, I know, the format string is not needed in this example because I used a default format, but in general that does not always hold.
Opened an issue for the nanoseconds: #8989
And I copied only part of the line. I meant [...]
2e165e2 to 5d972d0
@jorisvandenbossche ok on the kw?
You mean [...]
OK, merging then.
PERF: add exact kw to to_datetime to enable faster regex format parsing
closes #8989
closes #8903
The default is exact=True for backward compatibility.
With exact=False, the match must still start at the beginning of the string (as it always has),
but the string is not required to match ONLY the format; in other words, extra characters may follow the match.
This avoids having to do a regex replace first.
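
To illustrate the distinction described in the commit message, here is a minimal standard-library sketch. It is not the pandas implementation: the helper `parse_day_mon_year` and the hand-written regex are hypothetical and exist only for this example. The format `'%d%b%y'` is borrowed from the benchmark in this PR. The sketch assumes an English locale for the month abbreviation.

```python
import re
from datetime import datetime

# Hypothetical regex (not taken from pandas) for text shaped like '%d%b%y',
# e.g. '26Nov14': two digits, a three-letter month, two digits.
_DAY_MON_YEAR = re.compile(r"\d{2}[A-Za-z]{3}\d{2}")


def parse_day_mon_year(text, exact=True):
    """Parse `text` using the format '%d%b%y'.

    exact=True  : the whole string must match the format.
    exact=False : the match must start at the beginning of the string,
                  but trailing characters after the match are ignored.
    """
    if exact:
        match = _DAY_MON_YEAR.fullmatch(text)
    else:
        match = _DAY_MON_YEAR.match(text)  # anchored at the start only
    if match is None:
        raise ValueError(f"{text!r} does not match format '%d%b%y'")
    # Only the matched prefix is handed to strptime.
    return datetime.strptime(match.group(0), "%d%b%y")


# Trailing text is tolerated only when exact=False.
lenient = parse_day_mon_year("26Nov14:extra", exact=False)
strict = parse_day_mon_year("26Nov14")
```

With the non-exact path, the caller does not need to strip the trailing text (for example with a regex replace) before parsing, which is the saving the commit message refers to.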