ENH: support for nanosecond time in offset and period #3060
Conversation
you have some failing tests..... I think you need to skip by
thanks for the hint, the tests are green again
@wesm take a look?
I'll have a look as soon as I can and we can merge for 0.11
fwiw, I suggest putting this off to 0.12, it's already seen some subtle breakage and it should
I agree, it may be better to have a longer period of testing.
OK let's merge right after 0.11. We'll want to have a 0.11.x maintenance branch in case we need to fix any critical bugs in 0.11.1/2/...
There's a slew of stuff for "right after 0.11", it looks like we're naturally moving towards a
@wuan, any last minute tweaks before we go live in master?
No, not at the moment. I will have a look at #3375 now.
you rebased on top of master? can you also post a perf check? e.g. `test_perf -b before_commit -t last_commit`, anything say > 15% diff
This is a current performance check; there is quite a distribution of times:
As times are more often wrapped with Timestamp, there is a performance impact. I can have a further look into this.
performance is tricky - sometimes multiple runs will give different results, so obviously we care about the biggest ratios. Do a `%prun -l func()` in ipython to profile - I usually do it on master and then on my PR to see what the hotspots are
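The `%prun` workflow above can also be reproduced outside IPython with the standard library's cProfile; a minimal sketch, where `func` is just a stand-in for whatever pandas operation is being profiled:

```python
import cProfile
import io
import pstats

def func():
    # stand-in for the pandas operation being profiled
    return sum(i * i for i in range(10000))

profiler = cProfile.Profile()
profiler.enable()
func()
profiler.disable()

# print the 5 biggest cumulative-time hotspots, like %prun -l 5
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())
```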
it would be great to have this in 0.13 ... I've just pulled down and I'm going to do a little smoke testing. @wuan any thoughts? it's been a while... just want to bump this
Thanks for restarting activity ... At the moment I'm investigating some broken tests due to features introduced in the meantime. I will have some more time in mid-August to help bring things forward. Benchmark test results are very strange at the moment, as some of the tests show speed differences but do not involve new code at all. Still looking into this topic as well.
Pulled master and rebased. There are still problems with `stata._datetime_to_stata_elapsed` raising a ValueError when there is a np.datetime64 instead of a datetime.datetime object. To me, this seems to be a bug in the stata code. Any thoughts?
do you have an example where I can reproduce this? It appears (brief glance) that the stata code uses iterrows over the frame; if the data is not homogeneous (e.g. all dates), then I think the np.datetime64 will be coerced to datetime.datetime. Agreed that this is a 'bug', or at least a bad assumption.
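A small sketch of the coercion described here, as it behaves on a modern pandas (illustrative, not the original stata reproduction):

```python
import pandas as pd

# a mixed-dtype frame: iterrows() upcasts each row to object dtype,
# so the datetime64 value comes back as a Timestamp (a datetime subclass)
df = pd.DataFrame({"d": pd.to_datetime(["2013-01-01"]), "x": [1]})
_, row = next(df.iterrows())
print(type(row["d"]))
```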
Oops, the real problem was cause by a type cast bug introduced with this pull request which is finally gone now. Tests are green again. At least there is now an open issue #4558 regarding export of np.datetime64 based types in index. |
```python
if isinstance(key, datetime):
    # needed to localize naive datetimes
    stamp = Timestamp(key, tz=self.tz)
elif isinstance(key, Timestamp):
```
this is not necessary, a Timestamp is an instance of datetime (it's a subclass)
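This can be checked directly; a quick illustration with any recent pandas:

```python
from datetime import datetime

from pandas import Timestamp

# Timestamp subclasses datetime, so the extra isinstance branch is redundant
ts = Timestamp("2013-01-01")
print(isinstance(ts, datetime))  # True
```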
What happens on numpy < 1.7? There are 2 travis versions (the 2.6 and 3.2) that run with numpy 1.6.1, so you can put in a specific test for this, see first item here: https://github.com/pydata/pandas/wiki/Tips-&-Tricks (you could even do something in the code too); there will very shortly be a variable in
Nano existed well before this pull request and issue #1812. It just did not work, the effective period was zero.
this PR adds the ability to index by nanos, correct? (e.g. more than just representing) can you add an example to the top of the PR? (this is with current master)
What does it do in < 1.7? (can you give an example?)
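For reference, a sketch of the nanosecond indexing this PR enables, as it looks on a modern pandas with numpy >= 1.7 (illustrative values; per the PR description, numpy 1.6 falls back to the pre-PR behaviour):

```python
import pandas as pd

# three consecutive nanoseconds starting at midnight
idx = pd.date_range("2013-01-01", periods=3, freq="ns")
s = pd.Series([10, 20, 30], index=idx)

# index by an exact nanosecond-resolution timestamp
value = s[pd.Timestamp("2013-01-01 00:00:00.000000001")]
print(value)  # 20
```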
@wuan can you rerun the perf impact?
Here is the current performance comparison. I will have a look at the slow operations in the next days.
@wuan any remaining issues in this? quick perf check? pls rebase as well. I know this has been a long time... but it can get merged soon
@wuan give me one more rebase on this pls
no prob, I hope that the code is now mature enough
This is great work. Happy to see the legacy scikits.timeseries code worked on to add features like this. 👍 from me
```diff
@@ -1664,7 +1664,7 @@ def test_timedelta64(self):
     from pandas import date_range
     from datetime import datetime, timedelta

-    Series(np.array([1100, 20], dtype='timedelta64[s]')).to_string()
+    Series(np.array([1100, 20], dtype='timedelta64[ns]')).to_string()
```
this is a smoke test i presume? result is not being used
This line can be deleted, it really has no effect.
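If one wanted the line to assert something rather than just smoke-test, a possible form (illustrative, not the actual test change):

```python
import numpy as np
import pandas as pd

s = pd.Series(np.array([1100, 20], dtype="timedelta64[ns]"))
# capture and check the rendered result instead of discarding it
text = s.to_string()
print(isinstance(text, str) and len(text) > 0)  # True
```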
bombs away
ENH: support for nanosecond time in offset and period
@wuan thanks for this! pls review docs and examples (they are not currently built with 1.7 so prob will fail, but make sure that they look ok locally)....
@wuan just recompiled with ur new PR (which is great) but i'm getting an unreachable code warning from Cython
offending lines:
The first line returning the new Timestamp object can be removed to enable the unreachable code again. This is only a fix with numpy < 1.7, but this is just the behaviour before this PR. I added a branch wuan/post_nanosecond_time with the related fixes.
Also, the second Timestamp line should be:

```python
return Timestamp((self.offset * other).apply(self))
```

in case the multiplication needs to happen on the other side. (Sent by email in reply to Phillip Cloud, Mon, Sep 30, 2013.)
can take a look at the vbench for:
this used to take about 1/3 the time in v0.12.0... I think the period conversion was inlined back then... pls let me know
you mean inlined methods in period.c? I removed them to be compatible with Microsoft CC. If it helps, there should be a workaround using preprocessor logic to enable inlining again. Should I give it a try?
brief look at the prior was that convert_daytime was effectively inlined (< 0.12)... that seems to be the bottleneck... pls give it a try and lmk
any luck with this?
sorry, had a very packed week. Will do it this weekend
I added inlining of functions again to the methods in period.c (please see branch wuan/pandas/c_inline). Unfortunately I do not see a huge change, the above benchmark results are ... I will have a closer look at the resample method
I think I found the problem. Series.resample(...) uses tslib.period_asfreq_array(...) to convert all timestamps to the target period. The conversion function is determined and called within a loop over all samples. When there is an intraday conversion (like in our case), the column and row indices and the desired conversion factor are determined for each sample, although they do not change. As the only additional operation is the multiplication/division by the factor, this unnecessary work easily leads to a runtime increased by a factor of two or more. The fix would be to determine and store the conversion factor in the get_asfreq_info method, which is only called once per operation, and to use that conversion factor in convert_daytime without further lookups. How much time do I have to implement the improvement?
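The fix described above can be sketched in plain Python (the factor table and function names are illustrative, not the actual period.c symbols):

```python
# illustrative intraday conversion factors (seconds -> finer units)
FACTORS = {
    ("s", "ms"): 1_000,
    ("s", "us"): 1_000_000,
    ("s", "ns"): 1_000_000_000,
}

def convert_per_sample(ordinals, src, dst):
    # old behaviour: factor looked up for every sample inside the loop
    return [o * FACTORS[(src, dst)] for o in ordinals]

def convert_hoisted(ordinals, src, dst):
    # fix: determine the factor once per operation, then reuse it
    factor = FACTORS[(src, dst)]
    return [o * factor for o in ordinals]

print(convert_hoisted([1, 2, 3], "s", "ms"))  # [1000, 2000, 3000]
```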
5 min! hahh... still have a week or so..... see what you can do
@wuan heads-up: if it's at Python-level you might be able to use
Thanks, unfortunately it's at C-level. But the good thing is that fixing it seems to make the code a lot cleaner.
I'm now at 11.4 ms benchmark time. For now I think that this is the maximum improvement one can get. The old code is so fast because the function in use was just multiplying a constant with the current ordinal. At the moment at least some conditionals are still required.
gr8
The code available with #5204 now has a loop time of 10.5 ms (vs. 13.8 ms on current master)
closes #1812
This pull request fixes handling of nanosecond times in periods and offsets. In addition, it simplifies the internal processing of intraday periods.
It requires numpy 1.7 to work; using numpy 1.6 yields the original behaviour before this PR.
This is the continuation of #2555 after it was merged without effectively fixing issue #1812.
The biggest part is a refactoring of pandas/src/period.c to use a conversion factor matrix for intraday conversions. This minimizes the impact of adding three more intraday time units (ms, us and ns). Thus the number of frequency conversion methods is reduced to one per intraday frequency source or target. The intraday conversion factor is looked up in the conversion factor matrix inside the method.
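The conversion factor matrix idea can be sketched as follows (unit names and table layout are illustrative, not the actual period.c tables):

```python
# intraday units ordered from coarse to fine
UNITS = ["D", "h", "min", "s", "ms", "us", "ns"]
# multiplicative step between each adjacent pair of units
STEPS = [24, 60, 60, 1000, 1000, 1000]

# matrix[i][j] = number of unit-j intervals per unit-i interval (for j >= i)
matrix = [[1] * len(UNITS) for _ in UNITS]
for i in range(len(UNITS)):
    factor = 1
    for j in range(i + 1, len(UNITS)):
        factor *= STEPS[j - 1]
        matrix[i][j] = factor

# one day holds 86,400,000,000,000 nanoseconds
print(matrix[UNITS.index("D")][UNITS.index("ns")])  # 86400000000000
```

With such a table, each intraday conversion reduces to a single lookup plus one multiplication or division, regardless of how many units exist.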
Nanosecond support is only available when using numpy >= 1.7. When used with numpy < 1.7, everything except nanosecond support should work as expected.
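As a quick illustration of what nanosecond support means in practice on a modern pandas (a sketch, not taken from the PR itself):

```python
import pandas as pd

# nanosecond-resolution timestamps round-trip without truncation
ts = pd.Timestamp("2013-05-01 00:00:00.000000005")
print(ts.nanosecond)  # 5

# nanosecond offset arithmetic
shifted = ts + pd.offsets.Nano(3)
print(shifted.nanosecond)  # 8
```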