
FMT: allow datetime/timedelta formatting on a column to be better #3401


Closed
jreback opened this issue Apr 19, 2013 · 28 comments · Fixed by #5701
Labels
Enhancement Ideas Long-Term Enhancement Discussions Output-Formatting __repr__ of pandas objects, to_string Timedelta Timedelta data type

Comments

@jreback
Contributor

jreback commented Apr 19, 2013

http://stackoverflow.com/questions/16103238/pandas-timedelta-in-days/16104567#16104567

It's easy to think about: e.g. if no dates in a particular column have time values, then we can print them without the time values (e.g. just 2001-01-01). I think this would be nice as the default (maybe add an option for it?)

In [17]: df
Out[17]: 
                  age               today                diff      years
0 2001-01-01 00:00:00 2013-04-19 00:00:00 4491 days, 00:00:00  12.304110
1 2004-06-01 00:00:00 2013-04-19 00:00:00 3244 days, 00:00:00   8.887671

Better?

In [17]: df
Out[17]: 
         age      today      diff     years
0 2001-01-01 2013-04-19 4491 days 12.304110
1 2004-06-01 2013-04-19 3244 days  8.887671
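A minimal sketch of the date-only rule, assuming plain Python datetime objects (naive, no time zone) stand in for a datetime64 column. format_datetime_column is a hypothetical helper, not a pandas function. The whole column is checked first so that every row ends up in the same format.

```python
from datetime import datetime, time


def format_datetime_column(values):
    """Hypothetical helper: format a column of naive datetimes.

    If no value has a time component, print dates only (e.g. 2001-01-01);
    otherwise print the full timestamp for every row.
    """
    dates_only = all(v.time() == time(0) for v in values)
    if dates_only:
        return [v.date().isoformat() for v in values]
    return [v.isoformat(sep=" ") for v in values]


print(format_datetime_column([datetime(2001, 1, 1), datetime(2004, 6, 1)]))
```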
@cancan101
Contributor

Any reason that Timedelta64Formatter subclasses Datetime64Formatter rather than directly subclassing GenericArrayFormatter?

@jtratner
Contributor

@cancan101 There's no need to preserve that hierarchy. Heck, most of those formatters could be collapsed into one just by setting a _default_formatter property.
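A minimal sketch of what collapsing the formatters could look like, assuming a single hypothetical ArrayFormatter class whose subclasses differ only in a _default_formatter class attribute (a plain class attribute is used here instead of a property, for simplicity). These names are illustrative and do not reflect pandas' actual formatter classes.

```python
from datetime import date


class ArrayFormatter:
    """Hypothetical single formatter class; not pandas' actual hierarchy."""

    # Subclasses only need to change this one attribute.
    _default_formatter = staticmethod(str)

    def __init__(self, values, formatter=None):
        self.values = values
        # An explicit formatter wins; otherwise use the class-level default.
        self.formatter = formatter if formatter is not None else self._default_formatter

    def get_result(self):
        return [self.formatter(v) for v in self.values]


class DateFormatter(ArrayFormatter):
    _default_formatter = staticmethod(lambda d: d.isoformat())


class FloatFormatter(ArrayFormatter):
    _default_formatter = staticmethod(lambda x: "%.6f" % x)


print(DateFormatter([date(2001, 1, 1)]).get_result())
print(FloatFormatter([12.30411, 8.887671]).get_result())
```

The base class does all the work; a new dtype would only need one more one-line subclass (or a formatter argument at the call site).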

@cancan101
Contributor

Also, for repr_timedelta64, would there be objections to changing this method to not output the h:m:s section of a timedelta if those fields are all zero?

In other words, rather than: 36 days, 00:00:00, just 36 days.

It would be a relatively simple change:

if days:
    return "%s%d days, %02d:%02d:%s" % (sign_pretty, days, hours, minutes,
                                           seconds_pretty)

to something like:

if days:
    # also check whole seconds, assuming frac holds only the sub-second remainder
    if hours or minutes or seconds or frac:
        return "%s%d days, %02d:%02d:%s" % (sign_pretty, days, hours, minutes,
                                           seconds_pretty)
    else:
        return "%s%d days" % (sign_pretty, days)
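A runnable sketch of the proposed change, assuming a Python datetime.timedelta stands in for the timedelta64 value and that, as in the snippet above, the sign is printed once in front of the absolute value. repr_timedelta_sketch is a hypothetical name. Whole seconds are checked alongside the fractional part so that a value like 36 days, 00:00:05 is not collapsed to 36 days.

```python
from datetime import timedelta


def repr_timedelta_sketch(td):
    """Hypothetical helper: format a timedelta, dropping an all-zero h:m:s part."""
    # Print the sign once and work with the absolute value.
    sign_pretty = "-" if td < timedelta(0) else ""
    td = abs(td)
    days = td.days
    hours, rem = divmod(td.seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    frac = td.microseconds  # sub-second remainder only
    if frac:
        seconds_pretty = "%02d.%06d" % (seconds, frac)
    else:
        seconds_pretty = "%02d" % seconds
    if days:
        # Proposed change: whole seconds are checked as well as the fraction.
        if hours or minutes or seconds or frac:
            return "%s%d days, %02d:%02d:%s" % (sign_pretty, days, hours,
                                               minutes, seconds_pretty)
        return "%s%d days" % (sign_pretty, days)
    return "%s%02d:%02d:%s" % (sign_pretty, hours, minutes, seconds_pretty)


print(repr_timedelta_sketch(timedelta(days=36)))             # 36 days
print(repr_timedelta_sketch(timedelta(days=36, seconds=5)))  # 36 days, 00:00:05
```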

@jreback
Contributor Author

jreback commented Dec 13, 2013

I think the change to repr_timedelta should involve a keyword arg. For the most part you DO want a full repr, I think (e.g. in csv and such),

but for output formatting to, say, the console you could collapse it, so we need both modes.

@jtratner
Contributor

My two cents: I like it for your age and today examples above with dates, but don't like it at all for timedeltas. Feels like it obscures the data that you actually have in the column.

@cancan101
Contributor

@jreback The full repr argument is a bit weak since even now it does not show the day section of the string when the number of days is 0.

@cancan101
Contributor

@jtratner I agree that for timedeltas it obscures the output of the data.

@jreback
Contributor Author

jreback commented Dec 13, 2013

open to various display options

@jtratner
Contributor

I think this would be good:

In [17]: df
Out[17]: 
         age      today               diff     years
0 2001-01-01 2013-04-19 4491 days, 0:00:00 12.304110
1 2004-06-01 2013-04-19 3244 days, 0:00:00  8.887671

@jtratner
Contributor

and it has the benefit of being round-trippable

@jreback
Contributor Author

jreback commented Dec 13, 2013

@jtratner iirc the short formats are round-trippable as well

@jtratner
Contributor

'3244 days'? nice

@jreback
Contributor Author

jreback commented Dec 13, 2013

Yep, I actually needed it for storing timedeltas in HDFStore, for when you query (to make the query easy).

to_timedelta is pretty well tested for this
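To illustrate why both the short and long forms can round-trip, here is a minimal parser sketch, assuming the string shapes shown in this thread ("3244 days", "3244 days, 0:00:00", "00:00:01") and a leading "-" that negates the whole value. parse_timedelta_sketch is hypothetical; in pandas itself, to_timedelta is the tool for this. Python's own str() uses a different convention for negative values ("-1 day, 23:59:59"), so only non-negative values round-trip through str() here.

```python
from datetime import timedelta


def parse_timedelta_sketch(text):
    """Hypothetical parser for 'N days', 'N days, H:MM:SS' and 'HH:MM:SS[.ffffff]'."""
    text = text.strip()
    # A leading '-' negates the whole value (the convention assumed here).
    sign = -1 if text.startswith("-") else 1
    text = text.lstrip("-")
    days, clock = 0, text
    if "day" in text:
        day_part, _, clock = text.partition(",")
        days = int(day_part.split()[0])
        clock = clock.strip()
    hours = minutes = 0
    seconds = 0.0
    if clock:
        h, m, s = clock.split(":")
        hours, minutes, seconds = int(h), int(m), float(s)
    return sign * timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


# The short form parses to the same value as the long form.
print(parse_timedelta_sketch("3244 days") == parse_timedelta_sketch("3244 days, 0:00:00"))
```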

@jtratner
Contributor

well, if that's the case, I guess I'm fine either way.

@cancan101
Contributor

@jtratner Actually, on re-reading your post, I think I disagree: as long as to_timedelta('1 days') works (i.e. it will be round-trippable), I would prefer to drop the extraneous ", 0:00:00" from the timedelta values.

@jtratner
Contributor

go for it

@cancan101
Contributor

I'll throw together a PR.

@jreback
Contributor Author

jreback commented Dec 13, 2013

You can do something like the test below to determine whether every element is a whole number of days, with no time part (and this is very fast):

In [87]: x = pd.to_timedelta(range(5),unit='D')

In [88]: y = pd.to_timedelta(range(5),unit='s')

In [89]: x
Out[89]: 
0           00:00:00
1   1 days, 00:00:00
2   2 days, 00:00:00
3   3 days, 00:00:00
4   4 days, 00:00:00
dtype: timedelta64[ns]

In [90]: y
Out[90]: 
0   00:00:00
1   00:00:01
2   00:00:02
3   00:00:03
4   00:00:04
dtype: timedelta64[ns]

In [109]:  (x.astype(np.int64)%(86400*1e9)).sum() == 0
Out[109]: True

In [110]:  (y.astype(np.int64)%(86400*1e9)).sum() == 0
Out[110]: False

In [111]:  ((x+y).astype(np.int64)%(86400*1e9)).sum() == 0
Out[111]: False

Also, you might need to handle the 0 days case (instead of just printing 00:00:00).
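The same whole-day test can be sketched with exact integer arithmetic, assuming the column is available as int64 nanosecond counts (which is how timedelta64[ns] data is stored) and that missing values (NaT) have been mapped to None beforehand. Because 86400*1e9 is a float, the expressions above convert the int64 values to float64, which can round away sub-day nanoseconds once values exceed 2**53 ns (roughly 104 days); an integer divisor avoids that. all_whole_days is a hypothetical helper.

```python
NS_PER_DAY = 86400 * 10**9  # exact integer; 86400*1e9 would be a float


def all_whole_days(values_ns):
    """Hypothetical helper: True if every non-missing value is a whole number of days.

    values_ns: iterable of int nanosecond counts, with None for missing values.
    """
    # Python's % with a positive divisor is non-negative, so negative
    # timedeltas such as -1 day also pass the test.
    return all(v % NS_PER_DAY == 0 for v in values_ns if v is not None)


x = [d * NS_PER_DAY for d in range(5)]  # 0..4 days
y = [s * 10**9 for s in range(5)]       # 0..4 seconds
print(all_whole_days(x))                              # True
print(all_whole_days(y))                              # False
print(all_whole_days([a + b for a, b in zip(x, y)]))  # False
```

On an int64 NumPy array the equivalent would be (values % NS_PER_DAY == 0).all(), which also stops relying on a sum of remainders.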

@cancan101
Contributor

@jreback Gotcha. That being said, since repr_timedelta64 currently outputs different string formats for different inputs, would it be unacceptable to make the default behavior of repr_timedelta64 act like in #3401 (comment), i.e. output just 3 days for a days-only value?

The 0 case is somewhat interesting, since you would want to know how the rest of the column is output. In other words, when the timedelta is 0, do you write "00:00:00", "0 days, 00:00:00", or "0 days"?

@jreback
Contributor Author

jreback commented Dec 13, 2013

I think you should determine a priori which format you want (e.g. a, b, or c), then pass an argument to repr_timedelta to actually format it, maybe called format='short'|'long'|None: 'short' is only days (with 0 as '0 days'), 'long' means always the long format, and None keeps the existing behavior.
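A minimal sketch of the proposed keyword, assuming Python's datetime.timedelta in place of timedelta64 and a hypothetical repr_timedelta_sketch(td, format=None) rather than pandas' real repr_timedelta64 signature. 'short' prints only days (zero becomes '0 days'), 'long' always prints the days plus the clock, and None keeps the existing behavior of omitting the days when they are zero. Raising on 'short' for a value that has a time part is a design choice made here so that no data is silently dropped.

```python
from datetime import timedelta


def repr_timedelta_sketch(td, format=None):
    """Hypothetical formatter with format='short'|'long'|None."""
    if format not in (None, "short", "long"):
        raise ValueError("format must be 'short', 'long' or None")
    sign = "-" if td < timedelta(0) else ""
    td = abs(td)
    if format == "short":
        # Only valid for whole days; refuse rather than drop the time part.
        if td.seconds or td.microseconds:
            raise ValueError("'short' format needs a whole number of days")
        return "%s%d days" % (sign, td.days)
    hours, rem = divmod(td.seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    clock = "%02d:%02d:%02d" % (hours, minutes, seconds)
    if td.microseconds:
        clock += ".%06d" % td.microseconds
    if format == "long" or td.days:
        return "%s%d days, %s" % (sign, td.days, clock)
    # format=None: existing behavior, days omitted when they are zero.
    return sign + clock


for fmt in ("short", "long", None):
    print(repr(repr_timedelta_sketch(timedelta(0), format=fmt)))
```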

@cancan101
Contributor

@jreback Do you think it makes sense, then, for Timedelta64Formatter to use the code you provided above to determine the correct format and then call repr_timedelta64 with either short or long (i.e. be explicit and not allow jagged formatting)?

@jreback
Contributor Author

jreback commented Dec 13, 2013

Exactly what I would do. Timedelta64Formatter can figure out what it wants, then pass the appropriate args to repr_timedelta64 to actually format the string.
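A sketch of that split, assuming a column of Python timedelta objects and two hypothetical helpers: format_timedelta_column decides once for the whole column (short if every value is a whole number of days, long otherwise), and _format_one only renders. In long mode the days are always printed, so the column is never jagged.

```python
from datetime import timedelta

ONE_DAY = timedelta(days=1)


def _format_one(td, short):
    """Render a single timedelta in the format chosen for the whole column."""
    sign = "-" if td < timedelta(0) else ""
    td = abs(td)
    if short:
        return "%s%d days" % (sign, td.days)
    hours, rem = divmod(td.seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    clock = "%02d:%02d:%02d" % (hours, minutes, seconds)
    if td.microseconds:
        clock += ".%06d" % td.microseconds
    return "%s%d days, %s" % (sign, td.days, clock)


def format_timedelta_column(values):
    """Hypothetical Timedelta64Formatter logic: one format decision per column."""
    short = all(td % ONE_DAY == timedelta(0) for td in values)
    return [_format_one(td, short) for td in values]


print(format_timedelta_column([timedelta(days=4491), timedelta(days=3244)]))
```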

This is almost the same as what you need to do for Datetime64 formats as well.

You can do it in the formatters, though I think this should actually live in the DatetimeIndex / TimedeltaIndex (which doesn't exist ATM) classes, as part of the _format_as_native_types method (which is how the csv formatting happens).

so for now, go ahead and put it in the formatters (and a big note to move to the Index classes at some point)

@jtratner what do you think?

@jreback
Contributor Author

jreback commented Dec 13, 2013

slight change:

I would create _format_datetime64_array and _format_timedelta64_array, which do exactly what we are talking about. These should live in core/format.py, right next to _format_datetime64 and _format_timedelta64. Then the formatters are easy: they just call _format_datetime64_array with the list, which formats it.

Then we can change code in core/tseries/index.py/DatetimeIndex if possible to use these

@cancan101
Contributor

@jreback What code in DatetimeIndex are you referring to? All that I can see is the __unicode__ method, which prints only the first and last elements.

@jreback
Contributor Author

jreback commented Dec 15, 2013

@cancan101
Contributor

@jreback Do you know why the if ... d.tzinfo is not None: check is there? As in, why is the d.time() != zero_time check not enough?

@jreback
Contributor Author

jreback commented Dec 15, 2013

If you have a time zone on anything, then you need to write it out fully no matter the date/time, as a date (without a time) with a time zone is not a common format at all.
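A minimal sketch of that check, assuming Python datetime objects; is_dates_only is a hypothetical name, while d.tzinfo and zero_time follow the names in the question above. The tz test is needed because datetime.time() drops the tzinfo, so a tz-aware value at midnight would pass the time test on its own.

```python
from datetime import datetime, time, timezone

zero_time = time(0, 0)


def is_dates_only(values):
    """Hypothetical check: True only if every value is naive and at midnight."""
    for d in values:
        # Any time zone forces the full format, even at midnight.
        if d.tzinfo is not None:
            return False
        if d.time() != zero_time:
            return False
    return True


print(is_dates_only([datetime(2001, 1, 1), datetime(2004, 6, 1)]))  # True
print(is_dates_only([datetime(2001, 1, 1, tzinfo=timezone.utc)]))   # False
```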

@cancan101
Contributor

@jreback That makes sense. Please take a look at #5701 when you have a chance. Cleaning up datetime formatting will be done similarly.

3 participants