ENH: including offset/freq in Timestamp repr (#4553) #6575

rosnfeld · 2014-03-07T22:19:12Z

Hopefully the test_repr() implementation I included was sufficient. I see some test_repr() instances in the code that are more minimal - literally just making sure that repr(obj) doesn't blow up. And some which are pretty detailed, which makes me worry they might be a bit fragile. I went more middle-of-the-road.

In [1]: from pandas import Timestamp

In [2]: date = '2014-03-07'

In [3]: tz = 'US/Eastern'

In [4]: freq = 'M'

In [5]: repr(Timestamp(date))  # date_only_repr
Out[5]: "Timestamp('2014-03-07 00:00:00')"

In [6]: repr(Timestamp(date, tz=tz))  # date_tz_repr
Out[6]: "Timestamp('2014-03-07 00:00:00-0500', tz='US/Eastern')"

In [7]: repr(Timestamp(date, offset=freq))  # date_freq_repr
Out[7]: "Timestamp('2014-03-07 00:00:00', offset='M')"

In [8]: repr(Timestamp(date, tz=tz, offset=freq))  # date_tz_freq_repr
Out[8]: "Timestamp('2014-03-07 00:00:00-0500', tz='US/Eastern', offset='M')"

In [9]: repr(Timestamp('2014-03-13 00:00:00-0400', tz=None))  # date_with_utc_offset_repr
Out[9]: "Timestamp('2014-03-13 00:00:00-0400')"

hayd · 2014-03-08T01:10:09Z

Thanks for putting this together! The tests look good.

One suggestion/tweak: could we include tz and freq in the repr only if they're not None? I think that would be an improvement (less noisy, while still keeping eval(repr(t)) == t)... ?

rosnfeld · 2014-03-08T01:29:16Z

I don't have strong feelings on it, I just followed the pattern already there with tz. On the one hand, it's nice to be explicit and show users something they may have missed ("oh, I forgot to set time zone on this thing"), on the other hand, it's noisy/less elegant.

I'll follow whatever you senior pandas folks deem most desirable.

jreback · 2014-03-09T13:29:31Z

pandas/tslib.pyx

@@ -215,7 +219,9 @@ class Timestamp(_Timestamp):
            pass
        zone = "'%s'" % zone if zone else 'None'

-        return "Timestamp('%s', tz=%s)" % (result, zone)
+        freq = "'%s'" % self.offset.freqstr if self.offset else 'None'


conditionally add tz and offset if they are not None

also use the new style formatting

e.g. something like

        freq = ", offset='{0}'".format(self.offset.freqstr) if self.offset is not None else ""
        zone = ", tz='{0}'".format(self.tz) if self.tz is not None else ""
        return "Timestamp('{stamp}'{zone}{freq})".format(stamp=result, zone=zone, freq=freq)
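A minimal, self-contained sketch of the suggested pattern, using a hypothetical `ToyStamp` class in place of the real Cython `Timestamp` (the class, its attribute names, and the quoting of the tz/offset values are assumptions for illustration): each optional part is rendered only when its attribute is not None, and the pieces are joined with `str.format`.

```python
class ToyStamp:
    """Hypothetical stand-in for Timestamp; not the pandas implementation."""

    def __init__(self, stamp, tz=None, freqstr=None):
        self.stamp = stamp      # already-formatted date/time string
        self.tz = tz            # zone name such as 'US/Eastern', or None
        self.freqstr = freqstr  # frequency string such as 'M', or None

    def __repr__(self):
        # Only emit the optional keyword arguments when they are set.
        zone = ", tz='{0}'".format(self.tz) if self.tz is not None else ""
        freq = ", offset='{0}'".format(self.freqstr) if self.freqstr is not None else ""
        return "Timestamp('{stamp}'{zone}{freq})".format(
            stamp=self.stamp, zone=zone, freq=freq)


print(repr(ToyStamp('2014-03-07 00:00:00')))
# Timestamp('2014-03-07 00:00:00')
print(repr(ToyStamp('2014-03-07 00:00:00-0500', tz='US/Eastern', freqstr='M')))
# Timestamp('2014-03-07 00:00:00-0500', tz='US/Eastern', offset='M')
```

Building the optional fragments first keeps the final format string fixed, so adding another optional field later only means adding one more fragment.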

jreback · 2014-03-09T13:31:19Z

also pls do a perf check

rosnfeld · 2014-03-09T20:32:04Z

Good catch on if self.offset not being the same as if self.offset is not None.
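The distinction matters for any object that defines its own truthiness: `if obj:` calls `__bool__` (or `__len__`), while `if obj is not None:` only checks identity with None. A small sketch with a hypothetical `FalsyOffset` class (whether any real pandas offset evaluates as False is not established here):

```python
class FalsyOffset:
    """Hypothetical object that is set (not None) but evaluates as False."""

    def __bool__(self):
        return False


def describe(offset):
    truthy_check = "set" if offset else "unset"
    identity_check = "set" if offset is not None else "unset"
    return truthy_check, identity_check


print(describe(FalsyOffset()))  # ('unset', 'set')
print(describe(None))           # ('unset', 'unset')
```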

Playing around with it, I've just found one case where a repr can be generated that isn't executable:

In [4]: pd.Timestamp('2014-03-05 00:00:00-0500')
Out[4]: Timestamp('2014-03-05 00:00:00-0500', tz='tzoffset(None, -18000)')

This is a pre-existing bug. I'll need to do another pass to handle that.

By "do a perf check" do you mean add a new benchmark to vbench or just run the vbench suite? (not that I've ever done either)

jreback · 2014-03-09T20:47:13Z

I mean just run

test_perf.sh -b master -t HEAD

and report if anything > 1.2 or so

rosnfeld · 2014-03-09T23:06:26Z

Oddly, it took two tries for test_perf.sh to work; initially I got exactly the same error as in:
#5978 (comment)
and
#5283 (comment)

All seems okay on the second run though, and nothing came in over ~1.15.

I'll try and handle the tzoffset case and test again.

rosnfeld · 2014-03-10T02:58:51Z

Well, if I include code like:

        tz = ""
        from dateutil.tz import tzoffset
        if isinstance(zone, tzoffset):
            tz = ", tz={0}".format(zone)
        elif zone is not None:
            tz = ", tz='{0}'".format(zone)

which I find somewhat gross/inelegant, then we can get the following:

In [3]: pd.Timestamp('2014-03-05 00:00:00-0500')
Out[3]: Timestamp('2014-03-05 00:00:00-0500', tz=tzoffset(None, -18000))

which if we have tzoffset imported, would be eval-able.

Yes? No? If people like it I can add a test to exercise this case and then hopefully we are done.
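The same trade-off exists in the standard library: a fixed-offset tzinfo has a repr that is valid Python, but it only evaluates when the names it mentions are available in the caller's namespace. A minimal sketch using `datetime.timezone` as a stand-in for dateutil's `tzoffset` (this is an analogy, not the pandas code path):

```python
import datetime

# A fixed-offset tzinfo from the standard library, analogous to dateutil's tzoffset.
fixed = datetime.timezone(datetime.timedelta(hours=-5))
text = repr(fixed)  # module-qualified, e.g. it starts with "datetime.timezone("

# eval() succeeds only when the names used in the repr are in scope.
restored = eval(text, {"datetime": datetime})
assert restored == fixed

try:
    eval(text, {})  # empty namespace: 'datetime' is not defined
except NameError as exc:
    print("not eval-able without the import:", exc)
```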

jreback · 2014-03-10T03:02:20Z

tz should appear as a common tz string not an offset

rosnfeld · 2014-03-10T03:23:42Z

I just want to make sure we're on the same page here. My present understanding is there are 2 different types of objects which can be found stored in Timestamp.tzinfo:

1. pytz timezones, which str() nicely as strings like "US/Eastern". These strings can be passed back into the Timestamp constructor and it converts them back to the pytz instance.
2. dateutil tzoffsets, which are represented as seconds from UTC, and don't have a nice str() implementation. They just show a repr() value like "tzoffset(None, -18000)".

Are you saying that Timestamp's repr() should display the 2nd type as if it was the 1st type? (so that if someone eval'd the string, they'd get an instance with different internals than what was originally repr'd)

jreback · 2014-03-10T12:44:54Z

@rosnfeld forgot about that...ok

so the repr (and str) for a tzoffset does NOT include the tz then (it's embedded in the actual time field), while for a pytz zone it is in the tz field.

So I think these need to be treated separately: only print the tz field if it's not None and not a tzoffset (or only if it's a pytz zone)... annoying that this has to be done.
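A minimal sketch of the rule described above, with `datetime.timezone` standing in for dateutil's `tzoffset` and a hypothetical `NamedZone` class standing in for a pytz zone (it mimics the `.zone` attribute pytz zones carry). The real code checks for `tzoffset`; these stand-ins are assumptions so the sketch runs with only the standard library:

```python
import datetime


class NamedZone(datetime.tzinfo):
    """Hypothetical stand-in for a pytz zone: carries its name in `.zone`."""

    def __init__(self, name, hours):
        self.zone = name
        self._offset = datetime.timedelta(hours=hours)

    def utcoffset(self, dt):
        return self._offset

    def dst(self, dt):
        return datetime.timedelta(0)

    def tzname(self, dt):
        return self.zone


def tz_repr_part(tz):
    """Return the ", tz='...'" fragment, or '' when it would add nothing."""
    if tz is None:
        return ""
    if isinstance(tz, datetime.timezone):
        # Fixed offset: it is already visible in the '-0500' part of the stamp.
        return ""
    return ", tz='{0}'".format(tz.zone)


print(repr(tz_repr_part(None)))                                              # ''
print(repr(tz_repr_part(datetime.timezone(datetime.timedelta(hours=-4)))))  # ''
print(repr(tz_repr_part(NamedZone('US/Eastern', -5))))                      # ", tz='US/Eastern'"
```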

rosnfeld · 2014-03-10T15:15:32Z

Ok, the latest commit is rebased on top of master. I handled this new tzoffset case and fleshed out the tests a bit more; Travis passes (after an initial network failure accessing Yahoo) and a perf test came up clean.

jreback · 2014-03-10T15:24:53Z

gr8!

can you post the test reprs (the 4 or 5 cases that you have in the test), just to see?

rosnfeld · 2014-03-10T15:40:16Z

In [1]: from pandas import Timestamp

In [2]: date = '2014-03-07'

In [3]: tz = 'US/Eastern'

In [4]: freq = 'M'

In [5]: repr(Timestamp(date))  # date_only_repr
Out[5]: "Timestamp('2014-03-07 00:00:00')"

In [6]: repr(Timestamp(date, tz=tz))  # date_tz_repr
Out[6]: "Timestamp('2014-03-07 00:00:00-0500', tz='US/Eastern')"

In [7]: repr(Timestamp(date, offset=freq))  # date_freq_repr
Out[7]: "Timestamp('2014-03-07 00:00:00', offset='M')"

In [8]: repr(Timestamp(date, tz=tz, offset=freq))  # date_tz_freq_repr
Out[8]: "Timestamp('2014-03-07 00:00:00-0500', tz='US/Eastern', offset='M')"

In [9]: repr(Timestamp("2014-03-13 00:00:00-0500"))  # date_with_utc_offset_repr
Out[9]: "Timestamp('2014-03-13 00:00:00-0500')"

jreback · 2014-03-10T15:51:43Z

hmm

isn't it more correct for

In [6]: repr(Timestamp(date, tz=tz))  # date_tz_repr
Out[6]: "Timestamp('2014-03-07 00:00:00-0500', tz='US/Eastern')"

to be

In [6]: repr(Timestamp(date, tz=tz))  # date_tz_repr
Out[6]: "Timestamp('2014-03-07 00:00:00', tz='US/Eastern')"

as the original is showing essentially double information?

rosnfeld · 2014-03-10T15:57:12Z

Well, somewhat. US/Eastern does not always mean -0500, like today and the next few weeks for example:

In [4]: repr(Timestamp('2014-03-13', tz='US/Eastern'))
Out[4]: "Timestamp('2014-03-13 00:00:00-0400', tz='US/Eastern')"

IMO it's handy to see the UTC offset separately.

rosnfeld · 2014-03-10T16:02:26Z

Ahem, and more than just a few weeks, it's not like UTC has DST.
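The point that the zone name alone does not fix the UTC offset can be checked with a simplified calculation. This sketch hard-codes the post-2007 US daylight-saving rule for US/Eastern instead of using a timezone database such as pytz, so it is an approximation for illustration only (the helper names are hypothetical):

```python
import datetime


def nth_sunday(year, month, n):
    """Date of the n-th Sunday of the given month."""
    first = datetime.date(year, month, 1)
    # weekday(): Monday=0 ... Sunday=6
    days_to_sunday = (6 - first.weekday()) % 7
    return first + datetime.timedelta(days=days_to_sunday + 7 * (n - 1))


def eastern_midnight_offset(day):
    """UTC offset (hours) of US/Eastern at local midnight on `day`.

    Simplified: uses the post-2007 US rule (DST from the second Sunday in
    March to the first Sunday in November, switching at 02:00), so it is
    only meant for midnight timestamps in 2007 and later.
    """
    start = nth_sunday(day.year, 3, 2)
    end = nth_sunday(day.year, 11, 1)
    # Midnight on the start Sunday is still EST; midnight on the end Sunday is still EDT.
    return -4 if start < day <= end else -5


print(eastern_midnight_offset(datetime.date(2014, 3, 7)))   # -5
print(eastern_midnight_offset(datetime.date(2014, 3, 13)))  # -4
```

This matches the two reprs in the thread: 2014-03-07 shows -0500 and 2014-03-13 shows -0400, both under tz='US/Eastern'.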

jreback · 2014-03-10T16:03:26Z

ahh..ok...that makes sense

jreback · 2014-03-10T16:04:01Z

cc @rockg

comments?

rockg · 2014-03-10T16:15:51Z

@jreback I see your point that the duplicated information is a little confusing, but I think it's complete this way. For the last test, it might be informative to pass tz=None or something (maybe tzoffset(...)) to show that there actually isn't a tz, just a straight offset.

rosnfeld · 2014-03-10T16:24:05Z

Sure, I like passing tz=None. I'll also update the offset to -0400 in that test so that it is consistent with US/Eastern as used in the earlier tests.

jreback · 2014-03-10T16:39:35Z

@rosnfeld
can you put the reprs (the above post with your most recent changes) in the top of the PR?

rosnfeld · 2014-03-10T16:45:12Z

Done

jreback · 2014-03-10T16:48:43Z

pandas/tseries/tests/test_tslib.py

+        tz = 'US/Eastern'
+        freq = 'M'
+
+        date_only_repr = repr(Timestamp(date))


I think these need a round-trip check for each of these, e.g.

        ts = Timestamp(date)
        self.assertEqual(ts, eval(repr(ts)))

normally frown on using eval but that is the point here

(leave all the other tests as well) as they are checking what the repr looks like
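A sketch of that round-trip pattern as a unittest test case. To keep it runnable without pandas, it exercises standard-library `datetime` objects rather than `Timestamp`, and passes an explicit namespace to `eval` so the repr's module-qualified names resolve (the test class and helper names are hypothetical):

```python
import datetime
import unittest


class TestReprRoundTrip(unittest.TestCase):
    """Round-trip check in the suggested style, applied to stdlib datetimes."""

    def assert_round_trips(self, obj, namespace):
        # eval() is normally discouraged, but re-parsing the repr is the point here.
        self.assertEqual(obj, eval(repr(obj), namespace))

    def test_naive(self):
        self.assert_round_trips(datetime.datetime(2014, 3, 7),
                                {"datetime": datetime})

    def test_fixed_offset(self):
        tz = datetime.timezone(datetime.timedelta(hours=-4))
        self.assert_round_trips(datetime.datetime(2014, 3, 13, tzinfo=tz),
                                {"datetime": datetime})


if __name__ == "__main__":
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestReprRoundTrip)
    unittest.TextTestRunner(verbosity=2).run(suite)
```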

rosnfeld · 2014-03-10T17:29:44Z

Good idea, I added the round-trip check. While playing with it, I noted that offset is not being used in equality testing, e.g.

In [2]: Timestamp('2014-03-13') == Timestamp('2014-03-13', offset='D')
Out[2]: True

Not sure what your thoughts are on that.

rosnfeld · 2014-03-10T17:31:46Z

(Whoops, tiny stylistic inconsistency in the previous push. Fixed.)

jreback · 2014-03-10T17:32:03Z

equality in Timestamp space depends only on the UTC instant (i.e. the value).

I think they are equal regardless of the freq/offset as they represent a single instant in time. You could argue otherwise I suppose. I am not sure of the implications of changing this.
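The standard library follows the same convention for aware datetimes, which may make the Timestamp behaviour less surprising: two aware values compare equal when they denote the same instant, even though their wall-clock fields and tzinfo differ. A small illustration (stdlib `datetime`, not pandas):

```python
import datetime

eastern_edt = datetime.timezone(datetime.timedelta(hours=-4))

local = datetime.datetime(2014, 3, 13, 0, 0, tzinfo=eastern_edt)
utc = datetime.datetime(2014, 3, 13, 4, 0, tzinfo=datetime.timezone.utc)

# Different wall-clock fields and tzinfo, same instant -> equal.
print(local == utc)                # True
print(local.tzinfo == utc.tzinfo)  # False
```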

rosnfeld · 2014-03-10T17:36:57Z

Yeah, on reflection I think that makes sense.

This 670edaa commit has the desired code. Hopefully this is it.

jreback · 2014-03-10T17:52:26Z

ok...this looks good..ping when travis is all green

rosnfeld · 2014-03-10T18:46:58Z

ping

jreback · 2014-03-10T18:53:54Z

pandas/tslib.pyx


-        return "Timestamp('%s', tz=%s)" % (result, zone)
+        from dateutil.tz import tzoffset


I think you can move this particular import to the very top of tslib.pyx

rosnfeld · 2014-03-10T19:09:31Z

Ok, done. I feel like I've already seen a few different import patterns within pandas, and I have read that there is a tiny perf boost to doing a function-local import... are there policies on this?

jreback · 2014-03-10T19:15:07Z

in general try not to import locally if at all possible; cython is a bit tricky because you can't directly import from the pandas namespace. If the function is called a lot there IS a big perf drag (as an import is several lookups); I don't think it's an issue here, but just trying to be clean.

generally you only import locally if you can't otherwise, e.g. it's a circular-import problem and you would otherwise need to refactor into different modules, but sometimes that's too complicated. again, doesn't matter here
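Whether a function-local import costs or saves time depends on how often the function runs, so for a hot path it is worth measuring. A minimal pure-Python sketch with `timeit` (the functions are hypothetical, this does not model Cython, and the numbers vary by machine and Python version, so no result is claimed here):

```python
import math
import timeit


def module_level():
    # 'math' is resolved through a global name lookup.
    return math.pi


def function_local():
    # The import statement runs on every call. The module is already cached in
    # sys.modules, so nothing is reloaded, but the lookup and binding still cost time.
    import math as local_math
    return local_math.pi


n = 100_000
t_module = timeit.timeit(module_level, number=n)
t_local = timeit.timeit(function_local, number=n)
print("module-level: {0:.4f}s  function-local: {1:.4f}s".format(t_module, t_local))
```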


jreback · 2014-03-10T20:01:35Z

thanks!

jreback added the Output-Formatting label Mar 9, 2014

jreback added this to the 0.14.0 milestone Mar 9, 2014

jreback reviewed Mar 9, 2014

jreback reviewed Mar 10, 2014

ENH: including offset/freq in Timestamp repr (pandas-dev#4553)

92e813f

jreback added a commit that referenced this pull request Mar 10, 2014

Merge pull request #6575 from rosnfeld/issue_4553

26e8fa8

ENH: including offset/freq in Timestamp repr (#4553)

jreback merged commit 26e8fa8 into pandas-dev:master Mar 10, 2014

rosnfeld deleted the issue_4553 branch March 10, 2014 21:29
