Skip to content

BUG: Timestamp retains frequency of input Timestamps #23503

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Nov 7, 2018

Conversation

mroeschke
Copy link
Member

@pep8speaks
Copy link

Hello @mroeschke! Thanks for submitting the PR.

@jbrockmendel
Copy link
Member

LGTM.

Do any of the other constructors lose freq? (Eg to_datetime)

@mroeschke
Copy link
Member Author

to_datetime looks good on master:

In [1]: pd.to_datetime(pd.Timestamp('2010-08-08', freq='D')).freq
Out[1]: <Day>

DatetimeIndex looks like it drops the frequency (but I am not sure if it should be kept in this case?)

In [3]: pd.DatetimeIndex([pd.Timestamp('2010-08-08', freq='D')]).freq

@gfyoung gfyoung added Datetime Datetime data dtype Frequency DateOffsets labels Nov 5, 2018
@@ -731,8 +732,11 @@ class Timestamp(_Timestamp):
if ts.value == NPY_NAT:
return NaT

if is_string_object(freq):
if is_string_object(freq) or isinstance(freq, _BaseOffset):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't do this as could have an adverse perf impact (this is a python object). use is_offset_object

@@ -733,6 +734,9 @@ class Timestamp(_Timestamp):

if is_string_object(freq):
freq = to_offset(freq)
elif not is_offset_object(freq):
# GH 22311: Try to extract the frequency of a given Timestamp input
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do we expect it to be in this case? Aside from None I guess

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be a Timestamp in which we want to extract its frequency and pass along.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC you're referring to ts_input being a Timestamp. I'm asking what freq is in this case. More specifically, anything other than None seems like it should be invalid. Or am I missing something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry for misunderstanding. You're correct in that this is essentially swallowing any nonsense frequency and just returning None

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

freq should only ever be None, a offset string, or offset object

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this silently pass if freq has invalid type? Couldn't we just say that freq = getattr(ts_input, 'freq', None) if freq is None else freq?

@codecov
Copy link

codecov bot commented Nov 6, 2018

Codecov Report

Merging #23503 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #23503   +/-   ##
=======================================
  Coverage   92.25%   92.25%           
=======================================
  Files         161      161           
  Lines       51207    51207           
=======================================
  Hits        47239    47239           
  Misses       3968     3968
Flag Coverage Δ
#multiple 90.63% <ø> (ø) ⬆️
#single 42.28% <ø> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 528527b...218e418. Read the comment docs.

@jreback jreback added this to the 0.24.0 milestone Nov 6, 2018
@jreback
Copy link
Contributor

jreback commented Nov 6, 2018

lgtm. can you rebase and also do a quick perf check.

@mroeschke
Copy link
Member Author

Looks like there's a little hit (I think the asvs make it seem worse than reality), but imo not too bad

$ asv continuous -f 1.1 upstream/master HEAD -b ^groupby

       before           after         ratio
     [d0691e05]       [2dcb1fbb]
+     1.29±0.04μs       2.21±0.6μs     1.71  timestamp.TimestampConstruction.time_parse_iso8601_no_tz
+     1.27±0.03μs       2.07±0.2μs     1.62  timestamp.TimestampConstruction.time_fromordinal
+     1.63±0.04μs       2.29±0.1μs     1.41  timestamp.TimestampConstruction.time_parse_today
+     1.59±0.01μs      2.22±0.06μs     1.40  timestamp.TimestampConstruction.time_parse_now
+      1.56±0.1μs       2.08±0.1μs     1.34  timestamp.TimestampConstruction.time_fromtimestamp

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.


# These two are the benchmark from timestamp.TimestampConstruction.time_parse_iso8601_no_tz
--branch
In [1]: %timeit pd.Timestamp('2017-08-25 08:16:14')
1.62 µs ± 42.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

--master
In [1]: %timeit pd.Timestamp('2017-08-25 08:16:14')
1.12 µs ± 13.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

@jreback jreback merged commit 691cb86 into pandas-dev:master Nov 7, 2018
@jreback
Copy link
Contributor

jreback commented Nov 7, 2018

thanks @mroeschke

@toobaz toobaz mentioned this pull request Nov 7, 2018
2 tasks
@mroeschke mroeschke deleted the timestamp_roundtrip_frequency branch November 7, 2018 16:07
JustinZhengBC pushed a commit to JustinZhengBC/pandas that referenced this pull request Nov 14, 2018
tm9k1 pushed a commit to tm9k1/pandas that referenced this pull request Nov 19, 2018
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Datetime Datetime data dtype Frequency DateOffsets
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pd.Timestamp(timestamp) looses freq
6 participants