Resampling a Series with a timezone using kind='period' Crashes with ~6000 Values #5430
Comments
what version of pandas? |
Sorry, 0.12.0 |
can u try in master? |
Also, is 6K the minimum that causes this to occur? |
Yeah, error still persists. I created a new virtualenv, cloned master at c435e72. After installation of pandas from source, my
|
Well, the error happens with 5998 entries, so it's not a 6k boundary, but didn't seem to happen with ~5400 entries. I just kept moving the date and re-ran the test. |
ok thanks for the report; marking as a bug |
Okay, narrows it down. |
I'm still tracing the execution, but I wouldn't be surprised if this failure to correct a non-monotonic index is leading to the crash. https://github.com/pydata/pandas/blob/master/pandas/tseries/resample.py#L80 |
So if you put a sort_index() call there it works? |
The exception path seems to call if I change it from a built-in function (like
|
Another data point, switching from
|
What happens if you call sort_index() on it? Does it crash? |
Yeah, it still crashes. I think I was a bit misguided on the |
What? I'm confused now. |
It doesn't seem to be due to the sorting of the index. It looks like it's because the grouper is creating an extra group for the day during the daylight saving transition. Even small sets that cross a day boundary fail, like this case with a range from 11/1 to 11/2, so it's not just daylight saving related.
|
Here's the problem: https://github.com/pydata/pandas/blob/master/pandas/tseries/resample.py#L195 It's converting the periods back into timestamps, but it lost the timezone in the process. So, it's incorrectly partitioning. Also, it's hard coded to a
|
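To illustrate the diagnosis above, here is a minimal standard-library sketch (not the pandas code itself) of how bin edges that lose their timezone can put a timestamp in the wrong day. The fixed UTC-7 offset and the helper names day_edges and bin_index are assumptions made for this example only.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical fixed-offset zone (UTC-7), chosen only for illustration.
LOCAL = timezone(timedelta(hours=-7))


def day_edges(first_day, n_days, tz):
    """Return n_days + 1 midnight bin edges; tz=None mimics a dropped timezone."""
    start = datetime(first_day.year, first_day.month, first_day.day, tzinfo=tz)
    return [start + timedelta(days=i) for i in range(n_days + 1)]


def bin_index(ts, edges):
    """Index of the half-open bin [edges[i], edges[i + 1]) holding ts, or None."""
    for i in range(len(edges) - 1):
        if edges[i] <= ts < edges[i + 1]:
            return i
    return None


# 22:00 local time on 1 Nov 2013, which is 05:00 UTC on 2 Nov.
ts = datetime(2013, 11, 1, 22, 0, tzinfo=LOCAL)

# Edges that keep the zone: local midnights.
aware_edges = day_edges(ts, 2, LOCAL)
# Edges that lost the zone: naive midnights that end up being treated as UTC.
naive_edges = [e.replace(tzinfo=timezone.utc) for e in day_edges(ts, 2, None)]

correct_bin = bin_index(ts, aware_edges)  # first local day
wrong_bin = bin_index(ts, naive_edges)    # shifted into the next bin
print(correct_bin, wrong_bin)
```

With zone-aware edges the timestamp falls in the first day; with the zone dropped, the same timestamp lands one bin later, which is the kind of mis-partitioning described in the comment.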
Nice detective work there! Do you want to submit a pull request as well? |
@kevinastone can you try the test cases for #4076 and #3609 to see if your fix helps there too? (Put them in separate commits.) If they don't work, they can easily be discarded (or could go in a separate PR). |
Negative on #4076, still adding an extra period:
Affirmative on #3609:
|
gr8, so add that as a test to the PR (3609) |
Done |
I wrote a test case that consistently crashes the entire process. It looks like it requires a Series with data localized to a timezone that has a DST and the data crosses the DST boundary. Finally, you have to use kind='period' for the resample() operation. Oddly, it's not just the actual boundary, because I can create a smaller dataset, and it resamples fine (included in the test case with the _works suffix).
With that combination, the code crashes the entire process with a glibc error.
Crashing Test Case
https://gist.github.com/kevinastone/7297033
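As a rough illustration of what "crossing the DST boundary" means for day-sized bins, the sketch below counts hourly timestamps per local calendar day using only the standard library. It is not the gist's test case and does not reproduce the crash. The zone America/Los_Angeles and the date range are assumptions, since the thread does not state which timezone the gist uses.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# Assumed zone; the timezone used in the gist is not stated in this thread.
TZ = ZoneInfo("America/Los_Angeles")

# 07:00 UTC on 2 Nov 2013 is local midnight (PDT, UTC-7) on 2 Nov.
start = datetime(2013, 11, 2, 7, 0, tzinfo=timezone.utc)

# 72 hourly timestamps, generated in UTC so none are skipped or repeated.
stamps = [start + timedelta(hours=i) for i in range(72)]

# Group by the local calendar date each timestamp falls on.
hours_per_day = Counter(t.astimezone(TZ).date() for t in stamps)

for day, count in sorted(hours_per_day.items()):
    print(day, count)
```

Under these assumptions the local day on which DST ends (3 Nov 2013) holds 25 hourly points rather than 24, so daily bins over zone-aware data are not uniform in length.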