BUG: can't resample if bin goes into negative numbers #51291

Closed
3 tasks done
MarcoGorelli opened this issue Feb 10, 2023 · 3 comments
Labels
Bug Non-Nano datetime64/timedelta64 with non-nanosecond resolution

Comments

@MarcoGorelli
Member

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import numpy as np
import pandas as pd

idx = pd.date_range('0300-01-01', '2000-01-01', unit='s')
ser = pd.Series(np.ones(len(idx)), index=idx)
ser.resample('299Y').mean()  # works
ser.resample('300Y').mean()  # fails

# NOTE: this requires https://github.com/pandas-dev/pandas/pull/51274 to have gone in first, else they both fail

Issue Description

The second call, ser.resample('300Y').mean(), fails with:

Traceback (most recent call last):
  File "t.py", line 17, in <module>
    print(ser.resample('300Y').mean())
  File "/home/marcogorelli/pandas-dev/pandas/core/series.py", line 5671, in resample
    return super().resample(
  File "/home/marcogorelli/pandas-dev/pandas/core/generic.py", line 8872, in resample
    return get_resampler(
  File "/home/marcogorelli/pandas-dev/pandas/core/resample.py", line 1518, in get_resampler
    return tg._get_resampler(obj, kind=kind)
  File "/home/marcogorelli/pandas-dev/pandas/core/resample.py", line 1682, in _get_resampler
    return DatetimeIndexResampler(
  File "/home/marcogorelli/pandas-dev/pandas/core/resample.py", line 171, in __init__
    self.binner, self.grouper = self._get_binner()
  File "/home/marcogorelli/pandas-dev/pandas/core/resample.py", line 233, in _get_binner
    binner, bins, binlabels = self._get_binner_for_time()
  File "/home/marcogorelli/pandas-dev/pandas/core/resample.py", line 1252, in _get_binner_for_time
    return self._timegrouper._get_time_bins(self.ax)
  File "/home/marcogorelli/pandas-dev/pandas/core/resample.py", line 1732, in _get_time_bins
    first, last = _get_timestamp_range_edges(
  File "/home/marcogorelli/pandas-dev/pandas/core/resample.py", line 2035, in _get_timestamp_range_edges
    first = Timestamp(first - freq)
  File "pandas/_libs/tslibs/offsets.pyx", line 504, in pandas._libs.tslibs.offsets.BaseOffset.__sub__
    return (-other).__add__(self)
  File "pandas/_libs/tslibs/offsets.pyx", line 488, in pandas._libs.tslibs.offsets.BaseOffset.__add__
    return self._apply(other)
  File "pandas/_libs/tslibs/offsets.pyx", line 163, in pandas._libs.tslibs.offsets.apply_wraps.wrapper
    result = func(self, other)
  File "pandas/_libs/tslibs/offsets.pyx", line 2225, in pandas._libs.tslibs.offsets.YearOffset._apply
    return shift_month(other, months, self._day_opt)
  File "pandas/_libs/tslibs/offsets.pyx", line 4449, in pandas._libs.tslibs.offsets.shift_month
    return stamp.replace(year=year, month=month, day=day)
  File "pandas/_libs/tslibs/timestamps.pyx", line 2277, in pandas._libs.tslibs.timestamps.Timestamp.replace
    ts_input = datetime(**kwargs)
ValueError: year 0 is out of range
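
The last frames of the traceback suggest why 299Y works and 300Y does not: the bin edge is computed as `first - freq`, and the shifted year is handed to the standard library's `datetime`. Below is a minimal stdlib-only sketch of that arithmetic. `shift_years` is a hypothetical helper written for illustration, not a pandas function, and it ignores the year-end anchoring that the real offset applies.

```python
from datetime import datetime, MINYEAR

def shift_years(stamp, years):
    """Hypothetical helper: move `stamp` by whole years via datetime.replace."""
    return stamp.replace(year=stamp.year + years)

first = datetime(300, 1, 1)

# 300 - 299 = 1, which is exactly MINYEAR, so this is accepted.
ok = shift_years(first, -299)

# 300 - 300 = 0, which is below MINYEAR, so the stdlib raises ValueError.
try:
    shift_years(first, -300)
    failed = False
except ValueError:
    failed = True
```

Under these assumptions, a one-year difference in the resampling frequency is enough to push the left bin edge from the smallest representable year to an unrepresentable one.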

Expected Behavior

Both calls should work.

Installed Versions

INSTALLED VERSIONS

commit : 21899b6
python : 3.8.16.final.0
python-bits : 64
OS : Linux
OS-release : 5.10.102.1-microsoft-standard-WSL2
Version : #1 SMP Wed Mar 2 00:30:59 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8

pandas : 2.0.0.dev0+1498.g21899b63d5
numpy : 1.23.5
pytz : 2022.7.1
dateutil : 2.8.2
setuptools : 66.1.1
pip : 22.3.1
Cython : 0.29.32
pytest : 7.2.1
hypothesis : 6.64.0
sphinx : 5.3.0
blosc : 1.11.1
feather : None
xlsxwriter : 3.0.7
lxml.etree : 4.9.2
html5lib : 1.1
pymysql : 1.0.2
psycopg2 : 2.9.5
jinja2 : 3.1.2
IPython : 8.8.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : 1.3.6
brotli :
fastparquet : 2023.1.0
fsspec : 2022.11.0
gcsfs : 2022.11.0
matplotlib : 3.6.3
numba : 0.56.4
numexpr : 2.8.4
odfpy : None
openpyxl : 3.0.10
pandas_gbq : None
pyarrow : 10.0.1
pyreadstat : 1.2.0
pyxlsb : 1.0.10
s3fs : 2022.11.0
scipy : 1.10.0
snappy :
sqlalchemy : 1.4.45
tables : 3.8.0
tabulate : 0.9.0
xarray : 2023.1.0
xlrd : 2.0.1
zstandard : 0.19.0
tzdata : 2022.7
qtpy : None
pyqt5 : None
None

@MarcoGorelli MarcoGorelli added Bug Non-Nano datetime64/timedelta64 with non-nanosecond resolution labels Feb 10, 2023
@MarcoGorelli
Member Author

It's because the Python standard library's datetime is limited to years between 1 and 9999, and that's what we go through here:

# reconstruct & check bounds
if tzobj is not None and treat_tz_as_pytz(tzobj):
    # replacing across a DST boundary may induce a new tzinfo object
    # see GH#18319
    ts_input = tzobj.localize(datetime(dts.year, dts.month, dts.day,
                                       dts.hour, dts.min, dts.sec,
                                       dts.us),
                              is_dst=not bool(fold))
    tzobj = ts_input.tzinfo
else:
    kwargs = {"year": dts.year, "month": dts.month, "day": dts.day,
              "hour": dts.hour, "minute": dts.min, "second": dts.sec,
              "microsecond": dts.us, "tzinfo": tzobj,
              "fold": fold}
    ts_input = datetime(**kwargs)
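
The bound mentioned above can be checked with the standard library alone. The sketch below builds a `datetime` from a kwargs dict, mirroring the shape of the `else` branch, and reports whether the constructor accepts the year. `try_build` is a hypothetical name used only for this illustration.

```python
from datetime import datetime, MINYEAR, MAXYEAR

def try_build(year):
    """Return a datetime for Jan 1 of `year`, or None if the stdlib rejects it."""
    kwargs = {"year": year, "month": 1, "day": 1}
    try:
        return datetime(**kwargs)
    except ValueError:
        return None

print(MINYEAR, MAXYEAR)   # 1 9999
print(try_build(1))       # 0001-01-01 00:00:00
print(try_build(0))       # None
print(try_build(10000))   # None
```

Any code path that round-trips through `datetime` inherits this range, even when the underlying non-nanosecond timestamp could represent a wider span.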

@MarcoGorelli
Member Author

simpler reproducer:

In [3]: Timestamp('0300-01-01').replace(year=0)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[3], line 1
----> 1 Timestamp('0300-01-01').replace(year=0)

File ~/pandas-dev/pandas/_libs/tslibs/timestamps.pyx:2277, in pandas._libs.tslibs.timestamps.Timestamp.replace()
   2275               "microsecond": dts.us, "tzinfo": tzobj,
   2276               "fold": fold}
-> 2277     ts_input = datetime(**kwargs)
   2278 
   2279 ts = convert_datetime_to_tsobject(

ValueError: year 0 is out of range

@MarcoGorelli
Member Author

this was fixed in #50348
