-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: including offset/freq in Timestamp repr (#4553) #6575
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thanks for putting this together! The tests look good. One suggestion/tweak is to I have tz and freq in the repr only if they're not None? I think that would be an improvement (less noisy and still having |
I don't have strong feelings on it, I just followed the pattern already there with tz. On the one hand, it's nice to be explicit and show users something they may have missed ("oh, I forgot to set time zone on this thing"), on the other hand, it's noisy/less elegant. I'll follow whatever you senior pandas folks deem most desirable. |
@@ -215,7 +219,9 @@ class Timestamp(_Timestamp): | |||
pass | |||
zone = "'%s'" % zone if zone else 'None' | |||
|
|||
return "Timestamp('%s', tz=%s)" % (result, zone) | |||
freq = "'%s'" % self.offset.freqstr if self.offset else 'None' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
conditionally add tz and offset if they are not None
also use the new style formatting
e.g. something like
freq = ", offset={0}".format(self.offset.freqstr) if self.offset is not None else ""
tz = ", tz={0}".format(self.tz) if self.tz is not None else ""
"Timestamp('{stamp}'{zone}{freq}})".format(stamp=stamp,zone=zone,freq=freq)
also pls do a perf check |
Good catch on Playing around with it, I've just found one case where a repr can be generated that isn't executable: In [4]: pd.Timestamp('2014-03-05 00:00:00-0500')
Out[4]: Timestamp('2014-03-05 00:00:00-0500', tz='tzoffset(None, -18000)') This is a pre-existing bug. I'll need to do another pass to handle that. By "do a perf check" do you mean add a new benchmark to vbench or just run the vbench suite? (not that I've ever done either) |
I mean just run test_perf.sh -b master -t HEAD and report if anything > 1.2 or so |
Oddly it took 2 tries for test_perf.sh to work, initially I got the same exact error as in: All seems okay on the second run though, and nothing came in over ~1.15. I'll try and handle the tzoffset case and test again. |
Well, if I include code like: tz = ""
from dateutil.tz import tzoffset
if isinstance(zone, tzoffset):
tz = ", tz={0}".format(zone)
elif zone is not None:
tz = ", tz='{0}'".format(zone) which I find somewhat gross/inelegant, then we can get the following:
which if we have tzoffset imported, would be eval-able. Yes? No? If people like it I can add a test to exercise this case and then hopefully we are done. |
tz should appear as a common tz string not an offset |
I just want to make sure we're on the same page here. My present understanding is there are 2 different types of objects which can be found stored in Timestamp.tzinfo:
Are you saying that Timezone's repr() should display the 2nd type as if it was the 1st type? (so that if someone eval'd the string, they'd get an instance with different internals than what was originally repr'd) |
@rosnfeld forgot about that...ok so the repr (and str) for a So I think these needed to be treated separately, so only print the tz field if its not None and not tzoffset (or you can do it if only a pytz)....annoying that this has to be done.... |
Ok, the latest commit is rebased on top of master, I handled this new tzoffset case and fleshed out the tests a bit more, Travis passes (after an initial network failure accessing yahoo) and a perf test came up clean. |
gr8! can you post the tests reprs (the 4 or 5 cases that you have in the test) just to see |
In [1]: from pandas import Timestamp
In [2]: date = '2014-03-07'
In [3]: tz = 'US/Eastern'
In [4]: freq = 'M'
In [5]: repr(Timestamp(date)) # date_only_repr
Out[5]: "Timestamp('2014-03-07 00:00:00')"
In [6]: repr(Timestamp(date, tz=tz)) # date_tz_repr
Out[6]: "Timestamp('2014-03-07 00:00:00-0500', tz='US/Eastern')"
In [7]: repr(Timestamp(date, offset=freq)) # date_freq_repr
Out[7]: "Timestamp('2014-03-07 00:00:00', offset='M')"
In [8]: repr(Timestamp(date, tz=tz, offset=freq)) # date_tz_freq_repr
Out[8]: "Timestamp('2014-03-07 00:00:00-0500', tz='US/Eastern', offset='M')"
In [9]: repr(Timestamp("2014-03-13 00:00:00-0500")) # date_with_utc_offset_repr
Out[9]: "Timestamp('2014-03-13 00:00:00-0500')" |
hmm isn't it more correct for
to be
as the original is showing essentially double information? |
Well, somewhat. US/Eastern does not always mean -0500, like today and the next few weeks for example: In [4]: repr(Timestamp('2014-03-13', tz='US/Eastern'))
Out[4]: "Timestamp('2014-03-13 00:00:00-0400', tz='US/Eastern')" IMO it's handy to see the UTC offset separately. |
Ahem, and more than just a few weeks, it's not like UTC has DST. |
ahh..ok...that makes sense |
cc @rockg comments? |
@jreback I see your comment about it being a little confusing about having the duplicate information but I think it's complete this way. For the last test, it might be informative to have tz=None or something (maybe tzoffset(...)) to inform that there actually isn't a tz, but just a straight offset. |
Sure, I like passing |
@rosnfeld |
Done |
tz = 'US/Eastern' | ||
freq = 'M' | ||
|
||
date_only_repr = repr(Timestamp(date)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these need a round-trip check for each of these, e.g.
ts = Timestamp(date)
self.assertEqual(ts,eval(repr(ts)))
normally frown on using eval
but that is the point here
(leave all the other tests as well) as they are checking what the repr looks like
Good idea, I added the round-trip check. While playing with it, I noted that offset is not being used in equality testing, e.g. In [2]: Timestamp('2014-03-13') == Timestamp('2014-03-13', offset='D')
Out[2]: True Not sure what your thoughts are on that. |
(Whoops, tiny stylistic inconsistency in the previous push. Fixed.) |
equality in Timestamp space is only dependent on the UTC (e.g. the value). I think they are equal regardless of the freq/offset as they represent a single instant in time. You could argue otherwise I suppose. I am not sure of the implications of changing this. |
Yeah, on reflection I think that makes sense. This 670edaa commit has the desired code. Hopefully this is it. |
ok...this looks good..ping when travis is all green |
ping |
|
||
return "Timestamp('%s', tz=%s)" % (result, zone) | ||
from dateutil.tz import tzoffset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can move this particular import to the very top of tslib.pyx
Ok, done. I feel I have already seen a few import patterns within pandas, and have read there is a tiny perf boost to doing a function-local import... are there policies on this? |
in general try not to import locally if at all possible; cython is a bit tricky because you can't you can't directly import from the pandas namespace. If the function is called a lot their IS a big perf drag (as an import is several lookups), I don't think its an issue here. but just trying to be clean. generally you only import localy if you you can't otherwise, e.g. its a circular-import problem, otherwise need to refactor into different modules, but sometimes that's too complicated . again doesnt matter here |
ENH: including offset/freq in Timestamp repr (#4553)
thanks! |
fixes #4553
Hopefully the
test_repr()
implementation I included was sufficient. I see sometest_repr()
instances in the code that are more minimal - literally just making sure that repr(obj) doesn't blow up. And some which are pretty detailed, which makes me worry they might be a bit fragile. I went more middle-of-the-road.