BUG: resample seems to convert hours to 00:00 #34833
Comments
You need to try a much newer version.
Can confirm the same behavior on 1.0.4 and master.
1.0.4:
['2018-06-07T00:00:00.000000000' '2018-06-09T00:00:00.000000000'
INSTALLED VERSIONS
commit: None
pandas: 1.0.4
master:
['2018-06-07T00:00:00.000000000' '2018-06-09T00:00:00.000000000'
INSTALLED VERSIONS
commit: 5fdd6f5
pandas: 1.1.0.dev0+1887.g5fdd6f50a
This is because the default value of "origin" in resample is "start_day" (see lines 7754 to 7768 at commit 4a267c6, which define 'start' and 'start_day'). With 'start', the origin is the first value of the time series; with 'start_day', it is midnight of the first day. Since the default is 'start_day', the bins are anchored at midnight of the first day, so the hours you mentioned are disregarded.
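A minimal sketch of the difference, assuming pandas ≥ 1.1.0 (where the `origin` keyword exists) and a hypothetical hourly series that starts at 18:00, like the one in the report:

```python
import pandas as pd

# Hypothetical hourly data starting at 18:00 (the reporter's actual data is not shown).
idx = pd.date_range("2018-06-07 18:00", "2018-06-12 18:00", freq=pd.Timedelta(hours=1))
s = pd.Series(1, index=idx)

# Default origin="start_day": bins are anchored at midnight of the first day.
by_day = s.resample("2D").sum()

# origin="start": bins are anchored at the first timestamp of the series.
by_start = s.resample("2D", origin="start").sum()

print(by_day.index.strftime("%Y-%m-%d %H:%M").tolist())
# ['2018-06-07 00:00', '2018-06-09 00:00', '2018-06-11 00:00']
print(by_start.index.strftime("%Y-%m-%d %H:%M").tolist())
# ['2018-06-07 18:00', '2018-06-09 18:00', '2018-06-11 18:00']
```

The same data lands in different bins, so the sums differ too: 30, 48 and 43 rows per bin with the default, versus 48, 48 and 25 with origin="start".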
It looks like this was changed to fix an issue described in #31809. In my opinion, "start" would be a more accurate default. @jreback @hasB4K @mroeschke, can you please comment further on this?
@dsandeep0138 Please read that issue and look at the number of issues this change patches: it is much more consistent with respect to the frequency, rather than something that just happens to work.
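One way to see the consistency argument, as an illustrative sketch assuming pandas ≥ 1.1.0 (the helper `hourly` is hypothetical): with the default origin, series that start at different hours are binned on the same grid, whereas origin="start" gives each series its own grid.

```python
import pandas as pd


def hourly(start, end):
    """Hypothetical helper: an hourly series of ones between start and end."""
    idx = pd.date_range(start, end, freq=pd.Timedelta(hours=1))
    return pd.Series(1, index=idx)


# Two series that start at different hours of the same day.
a = hourly("2018-06-07 11:00", "2018-06-10 11:00")
b = hourly("2018-06-07 13:00", "2018-06-10 11:00")
twelve_hours = pd.Timedelta(hours=12)

# Default origin ("start_day"): both results fall on the same 00:00/12:00 grid,
# so they can be compared or joined bin by bin.
a_day = a.resample(twelve_hours).sum()
b_day = b.resample(twelve_hours).sum()
print(sorted(set(a_day.index.hour.tolist())), sorted(set(b_day.index.hour.tolist())))
# [0, 12] [0, 12]

# origin="start": each result is anchored at its own first timestamp,
# so the two grids no longer line up.
a_start = a.resample(twelve_hours, origin="start").sum()
b_start = b.resample(twelve_hours, origin="start").sum()
print(a_start.index[0], b_start.index[0])
# 2018-06-07 11:00:00 2018-06-07 13:00:00
```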
@jreback Thanks for the comment. Yes, I understand the change is awesome, deals with the inconsistencies in frequencies, and fixes many issues :) Can you please comment on whether this particular behavior is expected, then? If so, should we document it to avoid confusion? Thanks.
@dsandeep0138 I'll reply a bit later to explain everything, no worries 😉
@deepandas11 That makes sense: on master, setting origin="start" fixes the issue. However, it will still be an issue in 0.24 and 1.0.4, since the origin keyword had not been introduced yet. If I am using 0.24, what's the way to preserve the hours?
@liverpool1026 The behavior has been to follow […] The behavior of […] Here is the proof that your code relies on a bug fixed in v0.24:

import datetime as dt
import pandas as pd

def print_resample(example_nb, start, end, resample_freq):
    print(f"\nEXAMPLE {example_nb}: {start} - {end} [{resample_freq}]")
    time_index_df = pd.date_range(start, end, freq="1H", name="datetime").to_frame(index=False)
    time_index_df["test"] = 1
    time_index_df = time_index_df.resample(resample_freq, convention="end", on="datetime").sum().reset_index()
    print(time_index_df)

print_resample(1, "2018-06-07 11:00", "2018-06-10 11:00", "2D")
print_resample(2, "2018-06-07 11:00", "2018-06-10 11:00", "12H")
print_resample(3, "2018-06-07 13:00", "2018-06-10 11:00", "12H")

Outputs (v0.23):
Now, that being said, what should you do to align the bins with the start of your time series under those constraints, before version 1.1.0? First, I would advise waiting a few months for the release of 1.1.0. But you could hack around it by temporarily converting the datetimes into Timedeltas, and it should work:

import datetime as dt
import pandas as pd

def print_resample_simulate_origin_start(example_nb, start, end, resample_freq):
    print(f"\nEXAMPLE {example_nb}: {start} - {end} [{resample_freq}]")
    time_index_df = pd.date_range(start, end, freq="1H", name="datetime").to_frame(index=False)
    time_index_df["test"] = 1
    # hack: transform the datetimes into Timedeltas (since the epoch), whose bins start at the first value
    time_index_df["datetime"] -= pd.Timestamp(0)
    time_index_df = time_index_df.resample(resample_freq, convention="end", on="datetime").sum().reset_index()
    # hack: transform the Timedeltas back into datetimes
    time_index_df["datetime"] += pd.Timestamp(0)
    print(time_index_df)

print_resample_simulate_origin_start(1, "2018-06-07 11:00", "2018-06-10 11:00", "2D")
print_resample_simulate_origin_start(2, "2018-06-07 11:00", "2018-06-10 11:00", "12H")
print_resample_simulate_origin_start(3, "2018-06-07 13:00", "2018-06-10 11:00", "12H")

Outputs (v0.23):
I know this solution is not ideal, but again, this has been fixed in #31809, and the fix will be in the upcoming 1.1.0 release. I hope I have answered your questions, @liverpool1026 and @dsandeep0138.
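On pandas ≥ 1.1.0 both routes are available, so one can check that the Timedelta workaround and origin="start" produce the same bins. This is a sketch under that assumption; the function names are hypothetical, and it converts back by adding `pd.Timestamp(0)` rather than calling `pd.to_datetime`, which (as far as I know) refuses timedelta64 input in newer pandas versions.

```python
import pandas as pd


def resample_timedelta_hack(df, rule):
    """Pre-1.1.0 workaround: resample on Timedeltas instead of datetimes."""
    out = df.copy()
    # Datetimes -> Timedeltas since the epoch; Timedelta bins start at the first value.
    out["datetime"] = out["datetime"] - pd.Timestamp(0)
    out = out.resample(rule, on="datetime").sum().reset_index()
    # Timedeltas -> datetimes again.
    out["datetime"] = out["datetime"] + pd.Timestamp(0)
    return out


def resample_origin_start(df, rule):
    """pandas >= 1.1.0: anchor the bins at the first timestamp directly."""
    return df.resample(rule, on="datetime", origin="start").sum().reset_index()


def same_bins(df, rule):
    """Compare labels and sums as plain lists, so the datetime unit does not matter."""
    hack = resample_timedelta_hack(df, rule)
    direct = resample_origin_start(df, rule)
    return (hack["datetime"].tolist() == direct["datetime"].tolist()
            and hack["test"].tolist() == direct["test"].tolist())


# Same shape of data as example 3 above: hourly values starting at 13:00.
df = pd.date_range("2018-06-07 13:00", "2018-06-10 11:00",
                   freq=pd.Timedelta(hours=1), name="datetime").to_frame(index=False)
df["test"] = 1

print(same_bins(df, "2D"), same_bins(df, pd.Timedelta(hours=12)))
```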
I have checked that this issue has not already been reported. (As far as I can see by using the search)
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
Problem description
On pandas 0.23, resample keeps the correct time of day (18:00 in this case) in the resampled datetimes.
But starting from 0.24, the resampled datetimes are converted to 00:00.
Expected Output
On 0.23:
['2018-06-07T18:00:00.000000000' '2018-06-09T18:00:00.000000000'
'2018-06-11T18:00:00.000000000' '2018-06-13T18:00:00.000000000'
'2018-06-15T18:00:00.000000000' '2018-06-17T18:00:00.000000000'
'2018-06-19T18:00:00.000000000' '2018-06-21T18:00:00.000000000'
'2018-06-23T18:00:00.000000000' '2018-06-25T18:00:00.000000000'
'2018-06-27T18:00:00.000000000']
But starting from 0.24, it gives me:
['2018-06-07T00:00:00.000000000' '2018-06-09T00:00:00.000000000'
'2018-06-11T00:00:00.000000000' '2018-06-13T00:00:00.000000000'
'2018-06-15T00:00:00.000000000' '2018-06-17T00:00:00.000000000'
'2018-06-19T00:00:00.000000000' '2018-06-21T00:00:00.000000000'
'2018-06-23T00:00:00.000000000' '2018-06-25T00:00:00.000000000'
'2018-06-27T00:00:00.000000000']
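The reporter's code sample did not survive, so the following is only a sketch under assumptions: a hypothetical hourly series from 2018-06-07 18:00 to 2018-06-27 18:00, resampled into 2-day bins (the spacing of the outputs above). On pandas ≥ 1.1.0, passing origin="start" anchors the bins at the first timestamp, which gives the same 18:00 labels as the 0.23 output for this data.

```python
import pandas as pd

# Hypothetical data: hourly values from 2018-06-07 18:00 to 2018-06-27 18:00.
idx = pd.date_range("2018-06-07 18:00", "2018-06-27 18:00", freq=pd.Timedelta(hours=1))
s = pd.Series(1, index=idx)

# pandas >= 1.1.0: anchor the 2-day bins at the first timestamp instead of midnight.
labels = s.resample("2D", origin="start").sum().index
print(labels.strftime("%Y-%m-%dT%H:%M").tolist())
# ['2018-06-07T18:00', '2018-06-09T18:00', ..., '2018-06-27T18:00'] (11 labels)
```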
Output of pd.show_versions()
0.23
INSTALLED VERSIONS
commit: None
python: 3.6.9.final.0
python-bits: 64
OS: Linux
OS-release: 5.3.0-59-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_AU.UTF-8
LOCALE: en_AU.UTF-8
pandas: 0.23.0
pytest: None
pip: 20.0.2
setuptools: 46.0.0
Cython: None
numpy: 1.18.5
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.8.1
pytz: 2020.1
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: 2.5.8
xlrd: 0.9.4
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: 1.2.19
pymysql: None
psycopg2: 2.8.5 (dt dec pq3 ext lo64)
jinja2: 2.11.2
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
None
0.24
INSTALLED VERSIONS
commit: None
python: 3.6.9.final.0
python-bits: 64
OS: Linux
OS-release: 5.3.0-59-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_AU.UTF-8
LOCALE: en_AU.UTF-8
pandas: 0.24.2
pytest: None
pip: 20.0.2
setuptools: 46.0.0
Cython: None
numpy: 1.18.5
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.8.1
pytz: 2020.1
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: 2.5.8
xlrd: 1.1.0
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: 1.2.19
pymysql: None
psycopg2: 2.8.5 (dt dec pq3 ext lo64)
jinja2: 2.11.2
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None
None