Skip to content

Return from to_timedelta is forced to dtype timedelta64[ns]. #9040

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Dec 9, 2014

Conversation

ahjulstad
Copy link
Contributor

This is a revised change, fixing the issue I experienced (issue #9011). The proper change seems to be setting the dtype to 'timedelta64[ns]', as was already done for the other cases in the to_timedelta routine.

I have also added more tests, and moved them to the test_timedeltas function (next to the other tests of the functionality of the constructor method). I have not looked too much at #8886, it is difficult for me to understand the issue.

@shoyer shoyer added Bug Timedelta Timedelta data type labels Dec 8, 2014
@shoyer shoyer added this to the 0.15.2 milestone Dec 8, 2014
@shoyer
Copy link
Member

shoyer commented Dec 8, 2014

Could you please add a note about the bug fix to what's new for 0.15.2? Depending on timing I think we could still get this into 0.15.2.

@jreback
Copy link
Contributor

jreback commented Dec 8, 2014

I indicated in a comment that the change needs to happen here: https://github.com/pydata/pandas/blob/master/pandas/tseries/tdi.py#L184

@jreback jreback modified the milestones: 0.16.0, 0.15.2 Dec 8, 2014
@ahjulstad
Copy link
Contributor Author

@jreback It is not obvious to me how to best do the change in tdi.py as you mentioned. This is both my first look at pandas code and first pull request, so apologies if I am fumbling.

Anyways, the docstring in to_timedelta states

Returns
-------
ret : timedelta64/arrays of timedelta64 if parsing succeeded

But it currently returns with a dtype of '<m8[ns]'

In [6]: to_timedelta(range(3), unit='s')
Out[6]:
<class 'pandas.tseries.tdi.TimedeltaIndex'>
['00:00:00', ..., '00:00:02']
Length: 3, Freq: None

In [7]: a = to_timedelta(range(3), unit='s')

In [8]: a.dtype
Out[8]: dtype('<m8[ns]')

@shoyer
Copy link
Member

shoyer commented Dec 9, 2014

@ahjulstad Thanks for taking a look at this. to_timedelta should return a numpy timedelta64[ns] array (or a TimedeltaIndex, if box=True) -- does that make things clearer?

Can you reproduce this only using to_timedelta with box=False? (if so, then it's not a TimedeltaIndex issue directly)

@ahjulstad
Copy link
Contributor Author

In 0.15.1:

In [18]: from pandas.tseries.timedeltas import to_timedelta

In [19]: a = to_timedelta(range(3), unit='s', box=False)

In [20]: print(a)
[         0 1000000000 2000000000]

In [21]: print(a.dtype)
int64

After my change:


In [7]: from pandas.tseries.timedeltas import to_timedelta

In [8]: a = to_timedelta(range(3), unit='s', box=False)

In [9]: print(a)
[         0 1000000000 2000000000]

In [10]: print(a.dtype)
timedelta64[ns]

@shoyer
Copy link
Member

shoyer commented Dec 9, 2014

OK, this does look like the right fix to me. I don't see how it fits in TimedeltaIndex. @jreback thoughts?

It would be great if you add this box=False example as another test case (slightly more direct than testing TimedeltaIndex).

@jreback
Copy link
Contributor

jreback commented Dec 9, 2014

see the change here: jreback@e38e94f

The constructor code is buggy here, not to_timedelta. The constructor internally always tries to convert to ns based. (m8[ns]). While to_timedelta is user facing.

I always try to avoid using astype because if could be hiding subtle bugs. (In this case it DOES work though).

@ahjulstad
Copy link
Contributor Author

This last test fails, though. Floating point arithmetic won't give accurate results, but I was surprised it was so much off:

>>> to_timedelta(1.0/3, unit='h')
Timedelta('0 days 00:19:59.998800')

The test is:

# Tests with fractional hours as input:
expected = np.array([0, 20*60*1000000000, 30*60*1000000000, 40*60*1000000000], dtype='timedelta64[ns]')
result = to_timedelta([0., 1.0/3, 0.5, 2.0/3], unit='h', box=False)
tm.assert_almost_equal(expected, result)

@jreback jreback modified the milestones: 0.15.2, 0.16.0 Dec 9, 2014
@jreback
Copy link
Contributor

jreback commented Dec 9, 2014

@ahjulstad

I would create another issue for the floating point. Their is some internal rounding that is necessary to avoid precision issues. It may be that for say h,min its too strict.

@ahjulstad
Copy link
Contributor Author

Floating point issue raised in issue #9048.

As for the astype, I have no strong opinions, of course; I just thought that since all the other main if-branches in _convert_listlike included an astype conversion, it would make sense to include it for this specific case as well.

What is really the difference between 'm8[ns]' and 'timedelta64[ns]'?

@jreback
Copy link
Contributor

jreback commented Dec 9, 2014

those dtypes are exactly the same (m8 is just a short name)

@jreback
Copy link
Contributor

jreback commented Dec 9, 2014

ok this is fine. pls add a release note. and squash. ping me when green.

@ahjulstad
Copy link
Contributor Author

OK. Please clarify what you mean with squash (git is somewhat new to me). Shall I rebase from pydata/pandas/master, and squash all my commits to a single commit with the interactive rebase functionality? Thanks for helping.

From: jreback [mailto:[email protected]]
Sent: 9. desember 2014 13:25
To: pydata/pandas
Cc: Åsmund Hjulstad
Subject: Re: [pandas] Return from to_timedelta is forced to dtype timedelta64[ns]. (#9040)

ok this is fine. pls add a release note. and squash. ping me when green.


Reply to this email directly or view it on GitHubhttps://github.com//pull/9040#issuecomment-66274955.


The information contained in this message may be CONFIDENTIAL and is
intended for the addressee only. Any unauthorised use, dissemination of the
information or copying of this message is prohibited. If you are not the
addressee, please notify the sender immediately by return e-mail and delete
this message.
Thank you

@TomAugspurger
Copy link
Contributor

@ahjulstad That's correct.

@ahjulstad
Copy link
Contributor Author

@TomAugspurger but since origin (ahjulstad/pandas) has the old rework-fix-9011 branch, how can I make it accept my squashed version (that I have created locally?) I did the rebase from upstream directly, without pulling 'through' origin.

> git push origin 

To https://github.com/ahjulstad/pandas.git
 ! [rejected]        rework-fix-9011 -> rework-fix-9011 (non-fast-forward)
error: failed to push some refs to 'https://github.com/ahjulstad/pandas.git'
hint: Updates were rejected because the tip of your current branch is behind
hint: its remote counterpart. Integrate the remote changes (e.g.
hint: 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.

@jreback
Copy link
Contributor

jreback commented Dec 9, 2014

git push origin thisbranchname -f

forces a replacement of the commits to the new ones

@ahjulstad
Copy link
Contributor Author

@jreback Done.

jreback added a commit that referenced this pull request Dec 9, 2014
Return from to_timedelta is forced to dtype timedelta64[ns].
@jreback jreback merged commit 7ea921e into pandas-dev:master Dec 9, 2014
@jreback
Copy link
Contributor

jreback commented Dec 9, 2014

@ahjulstad thanks!

@ahjulstad ahjulstad deleted the rework-fix-9011 branch December 12, 2014 14:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Timedelta Timedelta data type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants