-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Smarter formatting of timedelta and datetime columns #5701
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@jreback What do you think shouold be done about the |
@cancan101 for DataTimeIndex, I think you should leave it unless: you determine for the entire series if their are time elements or not (using the quick test and it doesn't affect perf) - this should actually become a method of the DateTimeIndex (as a cached property) then you could change unicode |
@jreback I can see what can be done about unicode. Either way, cleaning up that method seems less important. |
yep....no biggie either way |
if isnull(x): | ||
return 'NaT' | ||
return nat_rep | ||
|
||
stamp = lib.Timestamp(x, tz=tz) | ||
return stamp._repr_base |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to make sure this works for having tzinfo
set. For example:
In [11]: pd.to_datetime([datetime(2013,1,1,tzinfo=pytz.utc)]).format()
Out[11]: [u'2013-01-01 00:00:00+00:00']
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NaT
should be returned if its null
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean here? I needed to allow passing in the nat_rep
for: https://github.com/pydata/pandas/blob/master/pandas/tseries/index.py#L630
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my bad....u r right
Should just be a matter of using code similar to https://github.com/pydata/pandas/pull/5701/files#diff-23878beaf55672cdc92c119f79fe492aR1797 and making sure to cache the call to |
yep....just try not to repeat the code |
pls rebase and squash a bit |
also pls add a release notes (and in this case a whatsnew entry - in v0.13.1.txt, giving a short demo) |
@jreback Done, let me know what you think. |
limit precision based on the values in the array (:issue:`3401`) | ||
|
||
Previously output might look like: | ||
.. ipython:: python |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first one needs to be a code-block, and make the second one actual code (e.g. create a dataframe that has those values and print it). Need to skip a line after the directive as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean the first one needs to be a code-block?
Something like this:
.. code-block:: python
age today diff
0 2001-01-01 00:00:00 2013-04-19 00:00:00 4491 days, 00:00:00
1 2004-06-01 00:00:00 2013-04-19 00:00:00 3244 days, 00:00:00
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I leave off the example output (ie will the compilation of the release notes actually run the code)?
For example, is this enough:
.. ipython:: python
df = DataFrame([ Timestamp('20010101'),
Timestamp('20040601') ], columns=['age'])
df['today'] = Timestamp('20130419')
df['diff'] = df['today']-df['age']
df
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
build the docs and see (and will who why u need a code block)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I doing something wrong here with make doc
:
File "/home/alex/git/pandas/doc/source/_templates/autosummary/class.rst", line 1, in top-level template code
{% extends "!autosummary/class.rst" %}
File "/home/alex/git/pandas/doc/source/_templates/autosummary/class.rst", line 1, in top-level template code
{% extends "!autosummary/class.rst" %}
File "/usr/local/lib/python2.7/dist-packages/jinja2/utils.py", line 339, in get
return self[key]
File "/usr/local/lib/python2.7/dist-packages/jinja2/utils.py", line 389, in __getitem__
if self._queue[-1] != key:
RuntimeError: maximum recursion depth exceeded in cmp
> /usr/local/lib/python2.7/dist-packages/jinja2/utils.py(389)__getitem__()
-> if self._queue[-1] != key:
(Pdb)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which version of Sphinx do you have?
I noticed some problems when using the newest version (v1.2, still had to file an issue about it), and if I remember well, it was a similar error message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have 1.2 installed.
@jreback Anything else to get this merged? |
@jorisvandenbossche @jtratner @y-p any comments? |
good for me |
pls rebase...can you run the vbenchs just to confirm nothing reallly changes with this? (don't think it will but just to check) |
These stick out over a few runs of vbench:
|
Assuming you are benching this against master; the However, the other test |
What is the easiest way to run one of the vbench Benchmarks individually in IPython so that I can benchmark? |
just copy the code and run it |
This function is expensive:
|
Changing the isnull to checknull gives a nice speed boost. |
For example If I strip it down to:
the time per 50,000 calls goes from 0.310s to 0.280s |
Whereas this:
takes 0.119s |
@jreback Because of #5912, the current code is actually broken:
leads to:
|
Even for formatting jsut the date, using the native format appears superior:
|
Performance looks good with the newest change (all results > 3%):
|
can you show a perf check (just > 3%), but show all output otherwise (e.g. the commits which are refed) |
|
and again:
|
@jreback Let me know what you think about the newest code and results. |
def _format_native_types(self, na_rep=u('NaT'), | ||
date_format=None, **kwargs): | ||
data = self._get_object_index() | ||
from pandas.core.format import Datetime64Formatter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you move to module level imports? (or does it cause a problem)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are circular import problems from that I recall.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
np them
just couple of minor comments. pls squash any commits together if you can/make sense, rebase and can merge |
@jreback done |
one more time.....(I just merged something..) |
For Datetime this means that only the date is shown when for all values there is no timezone and time is midnight and for Timedelta this means that the deltas are all whole days (GH3401). str(NaT) fixed to be "NaT" (GH5708).
@jreback done again |
Smarter formatting of timedelta and datetime columns
thanks! nice PR! keep em coming |
side issue - I think this was there before....the 0's in timedeltas when displaying long format should be full-rank
|
@jreback Yes, I believe that issues was very much there before. That one is a little tougher to deal with and will require further generalizing the format from just long/short. |
okay be create an issue for it and get 2 it when possible |
I think that this commit broke some tests: cancan101@cfb9c98
I'm on a Mac if that matters. I can do a bit of digging. |
@TomAugspurger you resetup? (this has library changes that need to be recompiled) |
Ahh that was it. I was doing a |
gr8! |
Closes #3401
Closes #5708
Closes #5912