Commit eff6566

ENH: infer Timestamp(iso8601string) resolution (#49737)
* ENH: infer Timestamp(iso8601string) resolution
* xfail windows
* fix assertion
* post-merge fixup
* Update doc/source/whatsnew/v2.0.0.rst
* CLN: remove commented-out
* update test
* test for unit on parsing iso8601 with offset
* test for round
* fix offset tests

Co-authored-by: Matthew Roeschke <[email protected]>
1 parent c54ce8e commit eff6566

File tree

21 files changed: +160 -71 lines changed


doc/source/whatsnew/v2.0.0.rst

+3-1
@@ -465,7 +465,8 @@ Other API changes
 - :meth:`Index.astype` now allows casting from ``float64`` dtype to datetime-like dtypes, matching :class:`Series` behavior (:issue:`49660`)
 - Passing data with dtype of "timedelta64[s]", "timedelta64[ms]", or "timedelta64[us]" to :class:`TimedeltaIndex`, :class:`Series`, or :class:`DataFrame` constructors will now retain that dtype instead of casting to "timedelta64[ns]"; timedelta64 data with lower resolution will be cast to the lowest supported resolution "timedelta64[s]" (:issue:`49014`)
 - Passing ``dtype`` of "timedelta64[s]", "timedelta64[ms]", or "timedelta64[us]" to :class:`TimedeltaIndex`, :class:`Series`, or :class:`DataFrame` constructors will now retain that dtype instead of casting to "timedelta64[ns]"; passing a dtype with lower resolution for :class:`Series` or :class:`DataFrame` will be cast to the lowest supported resolution "timedelta64[s]" (:issue:`49014`)
-- Passing a ``np.datetime64`` object with non-nanosecond resolution to :class:`Timestamp` will retain the input resolution if it is "s", "ms", or "ns"; otherwise it will be cast to the closest supported resolution (:issue:`49008`)
+- Passing a ``np.datetime64`` object with non-nanosecond resolution to :class:`Timestamp` will retain the input resolution if it is "s", "ms", "us", or "ns"; otherwise it will be cast to the closest supported resolution (:issue:`49008`)
+- Passing a string in ISO-8601 format to :class:`Timestamp` will retain the resolution of the parsed input if it is "s", "ms", "us", or "ns"; otherwise it will be cast to the closest supported resolution (:issue:`49737`)
 - The ``other`` argument in :meth:`DataFrame.mask` and :meth:`Series.mask` now defaults to ``no_default`` instead of ``np.nan`` consistent with :meth:`DataFrame.where` and :meth:`Series.where`. Entries will be filled with the corresponding NULL value (``np.nan`` for numpy dtypes, ``pd.NA`` for extension dtypes). (:issue:`49111`)
 - Changed behavior of :meth:`Series.quantile` and :meth:`DataFrame.quantile` with :class:`SparseDtype` to retain sparse dtype (:issue:`49583`)
 - When creating a :class:`Series` with a object-dtype :class:`Index` of datetime objects, pandas no longer silently converts the index to a :class:`DatetimeIndex` (:issue:`39307`, :issue:`23598`)

@@ -798,6 +799,7 @@ Datetimelike
 - Bug in :func:`to_datetime` was raising ``ValueError`` when parsing empty string and non-ISO8601 format was passed. Now, empty strings will be parsed as :class:`NaT`, for compatibility with how is done for ISO8601 formats (:issue:`50251`)
 - Bug in :class:`Timestamp` was showing ``UserWarning``, which was not actionable by users, when parsing non-ISO8601 delimited date strings (:issue:`50232`)
 - Bug in :func:`to_datetime` was showing misleading ``ValueError`` when parsing dates with format containing ISO week directive and ISO weekday directive (:issue:`50308`)
+- Bug in :meth:`Timestamp.round` when the ``freq`` argument has zero-duration (e.g. "0ns") returning incorrect results instead of raising (:issue:`49737`)
 - Bug in :func:`to_datetime` was not raising ``ValueError`` when invalid format was passed and ``errors`` was ``'ignore'`` or ``'coerce'`` (:issue:`50266`)
 - Bug in :class:`DateOffset` was throwing ``TypeError`` when constructing with milliseconds and another super-daily argument (:issue:`49897`)
 -
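The two new whatsnew entries above can be illustrated with a short sketch (assuming pandas >= 2.0, where `Timestamp.unit` reports the stored resolution):

```python
import numpy as np
import pandas as pd

# A date-only ISO-8601 string parses at second resolution ...
ts_day = pd.Timestamp("2016-01-01")
print(ts_day.unit)  # "s"

# ... while fractional seconds bump the inferred resolution.
ts_ms = pd.Timestamp("2016-01-01 02:03:04.500")
print(ts_ms.unit)  # "ms"

# np.datetime64 input behaves the same way: a supported unit is retained.
ts_us = pd.Timestamp(np.datetime64("2016-01-01T02:03:04.000001"))
print(ts_us.unit)  # "us"
```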

pandas/_libs/tslibs/conversion.pyx

+33-18
@@ -405,7 +405,8 @@ cdef _TSObject convert_datetime_to_tsobject(


 cdef _TSObject _create_tsobject_tz_using_offset(npy_datetimestruct dts,
-                                                int tzoffset, tzinfo tz=None):
+                                                int tzoffset, tzinfo tz=None,
+                                                NPY_DATETIMEUNIT reso=NPY_FR_ns):
     """
     Convert a datetimestruct `dts`, along with initial timezone offset
     `tzoffset` to a _TSObject (with timezone object `tz` - optional).

@@ -416,6 +417,7 @@ cdef _TSObject _create_tsobject_tz_using_offset(npy_datetimestruct dts,
     tzoffset: int
     tz : tzinfo or None
         timezone for the timezone-aware output.
+    reso : NPY_DATETIMEUNIT, default NPY_FR_ns

     Returns
     -------

@@ -427,16 +429,19 @@ cdef _TSObject _create_tsobject_tz_using_offset(npy_datetimestruct dts,
         datetime dt
         Py_ssize_t pos

-    value = npy_datetimestruct_to_datetime(NPY_FR_ns, &dts)
+    value = npy_datetimestruct_to_datetime(reso, &dts)
     obj.dts = dts
     obj.tzinfo = timezone(timedelta(minutes=tzoffset))
-    obj.value = tz_localize_to_utc_single(value, obj.tzinfo)
+    obj.value = tz_localize_to_utc_single(
+        value, obj.tzinfo, ambiguous=None, nonexistent=None, creso=reso
+    )
+    obj.creso = reso
     if tz is None:
-        check_overflows(obj, NPY_FR_ns)
+        check_overflows(obj, reso)
         return obj

     cdef:
-        Localizer info = Localizer(tz, NPY_FR_ns)
+        Localizer info = Localizer(tz, reso)

     # Infer fold from offset-adjusted obj.value
     # see PEP 495 https://www.python.org/dev/peps/pep-0495/#the-fold-attribute

@@ -454,6 +459,7 @@ cdef _TSObject _create_tsobject_tz_using_offset(npy_datetimestruct dts,
                   obj.dts.us, obj.tzinfo, fold=obj.fold)
     obj = convert_datetime_to_tsobject(
         dt, tz, nanos=obj.dts.ps // 1000)
+    obj.ensure_reso(reso)  # TODO: more performant to get reso right up front?
     return obj


@@ -490,7 +496,7 @@ cdef _TSObject _convert_str_to_tsobject(object ts, tzinfo tz, str unit,
         int out_local = 0, out_tzoffset = 0, string_to_dts_failed
         datetime dt
         int64_t ival
-        NPY_DATETIMEUNIT out_bestunit
+        NPY_DATETIMEUNIT out_bestunit, reso

     if len(ts) == 0 or ts in nat_strings:
         ts = NaT

@@ -513,19 +519,26 @@ cdef _TSObject _convert_str_to_tsobject(object ts, tzinfo tz, str unit,
             &out_tzoffset, False
         )
         if not string_to_dts_failed:
+            reso = get_supported_reso(out_bestunit)
             try:
-                check_dts_bounds(&dts, NPY_FR_ns)
+                check_dts_bounds(&dts, reso)
                 if out_local == 1:
-                    return _create_tsobject_tz_using_offset(dts,
-                                                            out_tzoffset, tz)
+                    return _create_tsobject_tz_using_offset(
+                        dts, out_tzoffset, tz, reso
+                    )
                 else:
-                    ival = npy_datetimestruct_to_datetime(NPY_FR_ns, &dts)
+                    ival = npy_datetimestruct_to_datetime(reso, &dts)
                     if tz is not None:
                         # shift for _localize_tso
-                        ival = tz_localize_to_utc_single(ival, tz,
-                                                         ambiguous="raise")
-
-                    return convert_to_tsobject(ival, tz, None, False, False)
+                        ival = tz_localize_to_utc_single(
+                            ival, tz, ambiguous="raise", nonexistent=None, creso=reso
+                        )
+                    obj = _TSObject()
+                    obj.dts = dts
+                    obj.value = ival
+                    obj.creso = reso
+                    maybe_localize_tso(obj, tz, obj.creso)
+                    return obj

             except OutOfBoundsDatetime:
                 # GH#19382 for just-barely-OutOfBounds falling back to dateutil

@@ -538,10 +551,12 @@ cdef _TSObject _convert_str_to_tsobject(object ts, tzinfo tz, str unit,
                 pass

         try:
-            dt = parse_datetime_string(ts, dayfirst=dayfirst,
-                                       yearfirst=yearfirst)
-        except (ValueError, OverflowError):
-            raise ValueError("could not convert string to Timestamp")
+            # TODO: use the one that returns reso
+            dt = parse_datetime_string(
+                ts, dayfirst=dayfirst, yearfirst=yearfirst
+            )
+        except (ValueError, OverflowError) as err:
+            raise ValueError("could not convert string to Timestamp") from err

         return convert_datetime_to_tsobject(dt, tz)

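The tz-offset path above localizes at the inferred resolution rather than forcing nanoseconds. A sketch of the user-visible effect (assuming pandas >= 2.0):

```python
import pandas as pd

# An ISO-8601 string with a UTC offset still parses at the resolution
# implied by its precision; the fixed offset is built from the string.
ts = pd.Timestamp("2016-01-01 04:05:06+05:00")

print(ts.unit)         # second resolution, inferred from the string
print(ts.utcoffset())  # 5:00:00
```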
pandas/_libs/tslibs/offsets.pyx

+5-1
@@ -162,7 +162,11 @@ def apply_wraps(func):

         result = func(self, other)

-        result = (<_Timestamp>Timestamp(result))._as_creso(other._creso)
+        result2 = Timestamp(result).as_unit(other.unit)
+        if result == result2:
+            # i.e. the conversion is non-lossy, not the case for e.g.
+            #  test_milliseconds_combination
+            result = result2

         if self._adjust_dst:
             result = result.tz_localize(tz)
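The guard above only keeps the unit-converted result when the round-trip is lossless. The distinction can be sketched with `Timestamp.as_unit` directly (assuming pandas >= 2.0):

```python
import pandas as pd

ts = pd.Timestamp("2016-01-01 00:00:00.000000001")  # nanosecond resolution

# Converting within the same unit is trivially lossless.
assert ts.as_unit("ns") == ts

# Casting to a coarser unit rounds away the 1ns component, so the
# converted value no longer compares equal -- the lossy case the
# offsets code declines to keep.
print(ts.as_unit("ms") == ts)  # False
```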

pandas/_libs/tslibs/timestamps.pyx

+12-2
@@ -448,6 +448,7 @@ cdef class _Timestamp(ABCTimestamp):
             # cython semantics, args have been switched and this is __radd__
             # TODO(cython3): remove this it moved to __radd__
             return other.__add__(self)
+
         return NotImplemented

     def __radd__(self, other):

@@ -1560,8 +1561,17 @@ class Timestamp(_Timestamp):
         cdef:
             int64_t nanos

-        to_offset(freq).nanos  # raises on non-fixed freq
-        nanos = delta_to_nanoseconds(to_offset(freq), self._creso)
+        freq = to_offset(freq)
+        freq.nanos  # raises on non-fixed freq
+        nanos = delta_to_nanoseconds(freq, self._creso)
+        if nanos == 0:
+            if freq.nanos == 0:
+                raise ValueError("Division by zero in rounding")
+
+            # e.g. self.unit == "s" and sub-second freq
+            return self
+
+        # TODO: problem if nanos==0

         if self.tz is not None:
             value = self.tz_localize(None).value
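The new branch in `Timestamp.round` can be exercised directly (a sketch, assuming pandas >= 2.0):

```python
import pandas as pd

ts = pd.Timestamp("2020-01-02 03:04:05").as_unit("s")

# A sub-second freq on a second-resolution timestamp converts to zero
# units of that resolution, so rounding is a no-op and the timestamp
# is returned unchanged.
assert ts.round("ns") == ts

# A genuinely zero-duration freq now raises instead of returning an
# incorrect result.
try:
    ts.round("0ns")
except ValueError as err:
    print(err)
```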

pandas/core/computation/pytables.py

+4-6
@@ -11,7 +11,6 @@
 import numpy as np

 from pandas._libs.tslibs import (
-    NaT,
     Timedelta,
     Timestamp,
 )

@@ -216,17 +215,16 @@ def stringify(value):
             if isinstance(v, (int, float)):
                 v = stringify(v)
             v = ensure_decoded(v)
-            v = Timestamp(v)
-            if v is not NaT:
-                v = v.as_unit("ns")  # pyright: ignore[reportGeneralTypeIssues]
+            v = Timestamp(v).as_unit("ns")
             if v.tz is not None:
                 v = v.tz_convert("UTC")
             return TermValue(v, v.value, kind)
         elif kind in ("timedelta64", "timedelta"):
             if isinstance(v, str):
-                v = Timedelta(v).value
+                v = Timedelta(v)
             else:
-                v = Timedelta(v, unit="s").value
+                v = Timedelta(v, unit="s")
+            v = v.as_unit("ns").value
             return TermValue(int(v), v, kind)
         elif meta == "category":
             metadata = extract_array(self.metadata, extract_numpy=True)
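The fix above pins values to nanoseconds before reading `.value`, since the surrounding pytables code assumes nanosecond integers. A sketch of that conversion (assuming pandas >= 2.0):

```python
import pandas as pd

# Explicitly cast to nanosecond unit before taking the integer value,
# so later arithmetic can safely assume nanoseconds even when the
# input was parsed at a coarser resolution.
ts = pd.Timestamp("2018-01-01")  # parsed at second resolution
print(ts.as_unit("ns").value)    # 1514764800000000000

td = pd.Timedelta("1 hour")
print(td.as_unit("ns").value)    # 3600000000000
```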

pandas/core/resample.py

+1-1
@@ -2085,7 +2085,7 @@ def _adjust_dates_anchored(
     elif origin == "start":
         origin_nanos = first.value
     elif isinstance(origin, Timestamp):
-        origin_nanos = origin.value
+        origin_nanos = origin.as_unit("ns").value
     elif origin in ["end", "end_day"]:
         origin_last = last if origin == "end" else last.ceil("D")
         sub_freq_times = (origin_last.value - first.value) // freq.nanos

pandas/tests/arithmetic/test_datetime64.py

+2-2
@@ -1699,15 +1699,15 @@ def test_datetimeindex_sub_timestamp_overflow(self):
         dtimax = pd.to_datetime(["2021-12-28 17:19", Timestamp.max])
         dtimin = pd.to_datetime(["2021-12-28 17:19", Timestamp.min])

-        tsneg = Timestamp("1950-01-01")
+        tsneg = Timestamp("1950-01-01").as_unit("ns")
         ts_neg_variants = [
             tsneg,
             tsneg.to_pydatetime(),
             tsneg.to_datetime64().astype("datetime64[ns]"),
             tsneg.to_datetime64().astype("datetime64[D]"),
         ]

-        tspos = Timestamp("1980-01-01")
+        tspos = Timestamp("1980-01-01").as_unit("ns")
         ts_pos_variants = [
             tspos,
             tspos.to_pydatetime(),

pandas/tests/arrays/test_timedeltas.py

+1-1
@@ -102,7 +102,7 @@ def test_add_pdnat(self, tda):
     # TODO: 2022-07-11 this is the only test that gets to DTA.tz_convert
     #  or tz_localize with non-nano; implement tests specific to that.
     def test_add_datetimelike_scalar(self, tda, tz_naive_fixture):
-        ts = pd.Timestamp("2016-01-01", tz=tz_naive_fixture)
+        ts = pd.Timestamp("2016-01-01", tz=tz_naive_fixture).as_unit("ns")

         expected = tda.as_unit("ns") + ts
         res = tda + ts

pandas/tests/indexes/datetimes/methods/test_astype.py

+1-1
@@ -276,7 +276,7 @@ def _check_rng(rng):
     )
     def test_integer_index_astype_datetime(self, tz, dtype):
         # GH 20997, 20964, 24559
-        val = [Timestamp("2018-01-01", tz=tz).value]
+        val = [Timestamp("2018-01-01", tz=tz).as_unit("ns").value]
         result = Index(val, name="idx").astype(dtype)
         expected = DatetimeIndex(["2018-01-01"], tz=tz, name="idx")
         tm.assert_index_equal(result, expected)

pandas/tests/indexes/datetimes/test_constructors.py

+1-1
@@ -804,7 +804,7 @@ def test_constructor_timestamp_near_dst(self):
     )
     def test_constructor_with_int_tz(self, klass, box, tz, dtype):
         # GH 20997, 20964
-        ts = Timestamp("2018-01-01", tz=tz)
+        ts = Timestamp("2018-01-01", tz=tz).as_unit("ns")
         result = klass(box([ts.value]), dtype=dtype)
         expected = klass([ts])
         assert result == expected

pandas/tests/io/json/test_pandas.py

+3-1
@@ -973,7 +973,9 @@ def test_mixed_timedelta_datetime(self):
         ts = Timestamp("20130101")
         frame = DataFrame({"a": [td, ts]}, dtype=object)

-        expected = DataFrame({"a": [pd.Timedelta(td).as_unit("ns").value, ts.value]})
+        expected = DataFrame(
+            {"a": [pd.Timedelta(td).as_unit("ns").value, ts.as_unit("ns").value]}
+        )
         result = read_json(frame.to_json(date_unit="ns"), dtype={"a": "int64"})
         tm.assert_frame_equal(result, expected, check_index_type=False)

pandas/tests/scalar/timedelta/test_arithmetic.py

+3-2
@@ -99,13 +99,14 @@ def test_td_add_datetimelike_scalar(self, op):
         assert result is NaT

     def test_td_add_timestamp_overflow(self):
+        ts = Timestamp("1700-01-01").as_unit("ns")
         msg = "Cannot cast 259987 from D to 'ns' without overflow."
         with pytest.raises(OutOfBoundsTimedelta, match=msg):
-            Timestamp("1700-01-01") + Timedelta(13 * 19999, unit="D")
+            ts + Timedelta(13 * 19999, unit="D")

         msg = "Cannot cast 259987 days 00:00:00 to unit='ns' without overflow"
         with pytest.raises(OutOfBoundsTimedelta, match=msg):
-            Timestamp("1700-01-01") + timedelta(days=13 * 19999)
+            ts + timedelta(days=13 * 19999)

     @pytest.mark.parametrize("op", [operator.add, ops.radd])
     def test_td_add_td(self, op):

pandas/tests/scalar/timestamp/test_arithmetic.py

+5-5
@@ -38,7 +38,7 @@ def test_overflow_offset_raises(self):
         # xref https://github.com/statsmodels/statsmodels/issues/3374
         # ends up multiplying really large numbers which overflow

-        stamp = Timestamp("2017-01-13 00:00:00")
+        stamp = Timestamp("2017-01-13 00:00:00").as_unit("ns")
         offset_overflow = 20169940 * offsets.Day(1)
         msg = (
             "the add operation between "

@@ -59,7 +59,7 @@ def test_overflow_offset_raises(self):
         # xref https://github.com/pandas-dev/pandas/issues/14080
         # used to crash, so check for proper overflow exception

-        stamp = Timestamp("2000/1/1")
+        stamp = Timestamp("2000/1/1").as_unit("ns")
         offset_overflow = to_offset("D") * 100**5

         lmsg3 = (

@@ -77,8 +77,8 @@ def test_overflow_offset_raises(self):
     def test_overflow_timestamp_raises(self):
         # https://github.com/pandas-dev/pandas/issues/31774
         msg = "Result is too large"
-        a = Timestamp("2101-01-01 00:00:00")
-        b = Timestamp("1688-01-01 00:00:00")
+        a = Timestamp("2101-01-01 00:00:00").as_unit("ns")
+        b = Timestamp("1688-01-01 00:00:00").as_unit("ns")

         with pytest.raises(OutOfBoundsDatetime, match=msg):
             a - b

@@ -239,7 +239,7 @@ def test_add_int_with_freq(self, ts, other):
     @pytest.mark.parametrize("shape", [(6,), (2, 3)])
     def test_addsub_m8ndarray(self, shape):
         # GH#33296
-        ts = Timestamp("2020-04-04 15:45")
+        ts = Timestamp("2020-04-04 15:45").as_unit("ns")
         other = np.arange(6).astype("m8[h]").reshape(shape)

         result = ts + other
