Commit 5d98799

Merge remote-tracking branch 'upstream/main' into bisect
2 parents 63fecf0 + 997f84b commit 5d98799

30 files changed: +349 -121 lines changed

doc/source/reference/testing.rst

Lines changed: 1 addition & 0 deletions
@@ -35,6 +35,7 @@ Exceptions and warnings
    errors.MergeError
    errors.NullFrequencyError
    errors.NumbaUtilError
+   errors.NumExprClobberingError
    errors.OptionError
    errors.OutOfBoundsDatetime
    errors.OutOfBoundsTimedelta

doc/source/whatsnew/v1.5.0.rst

Lines changed: 9 additions & 2 deletions
@@ -151,7 +151,8 @@ Other enhancements
 - A :class:`errors.PerformanceWarning` is now thrown when using ``string[pyarrow]`` dtype with methods that don't dispatch to ``pyarrow.compute`` methods (:issue:`42613`)
 - Added ``numeric_only`` argument to :meth:`Resampler.sum`, :meth:`Resampler.prod`, :meth:`Resampler.min`, :meth:`Resampler.max`, :meth:`Resampler.first`, and :meth:`Resampler.last` (:issue:`46442`)
 - ``times`` argument in :class:`.ExponentialMovingWindow` now accepts ``np.timedelta64`` (:issue:`47003`)
-- :class:`DataError`, :class:`SpecificationError`, :class:`SettingWithCopyError`, and :class:`SettingWithCopyWarning` are now exposed in ``pandas.errors`` (:issue:`27656`)
+- :class:`DataError`, :class:`SpecificationError`, :class:`SettingWithCopyError`, :class:`SettingWithCopyWarning`, and :class:`NumExprClobberingError` are now exposed in ``pandas.errors`` (:issue:`27656`)
+- Added ``check_like`` argument to :func:`testing.assert_series_equal` (:issue:`47247`)

 .. ---------------------------------------------------------------------------
 .. _whatsnew_150.notable_bug_fixes:
@@ -434,6 +435,8 @@ Other API changes
   <https://developers.googleblog.com/2022/02/making-oauth-flows-safer.html?m=1#disallowed-oob>`_.
   The ``auth_local_webserver = False`` option is planned to stop working in
   October 2022. (:issue:`46312`)
+- :func:`read_json` now raises ``FileNotFoundError`` (previously ``ValueError``) when input is a string ending in ``.json``, ``.json.gz``, ``.json.bz2``, etc. but no such file exists. (:issue:`29102`)
+- Operations with :class:`Timestamp` or :class:`Timedelta` that would previously raise ``OverflowError`` instead raise ``OutOfBoundsDatetime`` or ``OutOfBoundsTimedelta`` where appropriate (:issue:`47268`)
 -

 .. ---------------------------------------------------------------------------
@@ -682,6 +685,8 @@ Other Deprecations
 - Deprecated the ``closed`` argument in :class:`IntervalArray` in favor of ``inclusive`` argument; In a future version passing ``closed`` will raise (:issue:`40245`)
 - Deprecated the ``closed`` argument in :class:`intervaltree` in favor of ``inclusive`` argument; In a future version passing ``closed`` will raise (:issue:`40245`)
 - Deprecated the ``closed`` argument in :class:`ArrowInterval` in favor of ``inclusive`` argument; In a future version passing ``closed`` will raise (:issue:`40245`)
+- Deprecated allowing ``unit="M"`` or ``unit="Y"`` in :class:`Timestamp` constructor with a non-round float value (:issue:`47267`)
+-

 .. ---------------------------------------------------------------------------
 .. _whatsnew_150.performance:
@@ -726,12 +731,14 @@ Datetimelike
 - Bug in :meth:`SeriesGroupBy.value_counts` index when passing categorical column (:issue:`44324`)
 - Bug in :meth:`DatetimeIndex.tz_localize` localizing to UTC failing to make a copy of the underlying data (:issue:`46460`)
 - Bug in :meth:`DatetimeIndex.resolution` incorrectly returning "day" instead of "nanosecond" for nanosecond-resolution indexes (:issue:`46903`)
+- Bug in :class:`Timestamp` with an integer or float value and ``unit="Y"`` or ``unit="M"`` giving slightly-wrong results (:issue:`47266`)
 -

 Timedelta
 ^^^^^^^^^
 - Bug in :func:`astype_nansafe` astype("timedelta64[ns]") fails when np.nan is included (:issue:`45798`)
 - Bug in constructing a :class:`Timedelta` with a ``np.timedelta64`` object and a ``unit`` sometimes silently overflowing and returning incorrect results instead of raising ``OutOfBoundsTimedelta`` (:issue:`46827`)
+- Bug in constructing a :class:`Timedelta` from a large integer or float with ``unit="W"`` silently overflowing and returning incorrect results instead of raising ``OutOfBoundsTimedelta`` (:issue:`47268`)
 -

 Time Zones
@@ -765,7 +772,7 @@ Strings
 Interval
 ^^^^^^^^
 - Bug in :meth:`IntervalArray.__setitem__` when setting ``np.nan`` into an integer-backed array raising ``ValueError`` instead of ``TypeError`` (:issue:`45484`)
--
+- Bug in :class:`IntervalDtype` when using datetime64[ns, tz] as a dtype string (:issue:`46999`)

 Indexing
 ^^^^^^^^

pandas/_libs/tslibs/conversion.pyx

Lines changed: 24 additions & 0 deletions
@@ -1,5 +1,7 @@
 cimport cython

+import warnings
+
 import numpy as np

 cimport numpy as cnp
@@ -248,13 +250,35 @@ cdef _TSObject convert_to_tsobject(object ts, tzinfo tz, str unit,
         if ts == NPY_NAT:
             obj.value = NPY_NAT
         else:
+            if unit in ["Y", "M"]:
+                # GH#47266 cast_from_unit leads to weird results e.g. with "Y"
+                #  and 150 we'd get 2120-01-01 09:00:00
+                ts = np.datetime64(ts, unit)
+                return convert_to_tsobject(ts, tz, None, False, False)
+
             ts = ts * cast_from_unit(None, unit)
             obj.value = ts
             dt64_to_dtstruct(ts, &obj.dts)
     elif is_float_object(ts):
         if ts != ts or ts == NPY_NAT:
             obj.value = NPY_NAT
         else:
+            if unit in ["Y", "M"]:
+                if ts == int(ts):
+                    # GH#47266 Avoid cast_from_unit, which would give weird results
+                    #  e.g. with "Y" and 150.0 we'd get 2120-01-01 09:00:00
+                    return convert_to_tsobject(int(ts), tz, unit, False, False)
+                else:
+                    # GH#47267 it is clear that 2 "M" corresponds to 1970-02-01,
+                    #  but not clear what 2.5 "M" corresponds to, so we will
+                    #  disallow that case.
+                    warnings.warn(
+                        f"Conversion of non-round float with unit={unit} is ambiguous "
+                        "and will raise in a future version.",
+                        FutureWarning,
+                        stacklevel=1,
+                    )
+
             ts = cast_from_unit(ts, unit)
             obj.value = ts
             dt64_to_dtstruct(ts, &obj.dts)
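
Note on the GH#47266 comments in this hunk: the "2120-01-01 09:00:00" figure can be reproduced with plain arithmetic. The sketch below is not the pandas implementation. It assumes the old path scaled the value by an average Gregorian year of 365.2425 days (31,556,952 seconds); that constant does not appear in this diff, and the function names are illustrative only.

```python
from datetime import datetime, timedelta

# Assumption: average Gregorian year, 365.2425 days expressed in seconds.
AVG_SECONDS_PER_YEAR = 31_556_952

EPOCH = datetime(1970, 1, 1)


def years_by_scaling(n_years: int) -> datetime:
    """Treat a year as a fixed number of seconds (gives the odd result)."""
    return EPOCH + timedelta(seconds=n_years * AVG_SECONDS_PER_YEAR)


def years_by_calendar(n_years: int) -> datetime:
    """Count whole calendar years from the epoch instead."""
    return datetime(EPOCH.year + n_years, 1, 1)


scaled = years_by_scaling(150)
calendar = years_by_calendar(150)
print(scaled)             # 2120-01-01 09:00:00
print(calendar)           # 2120-01-01 00:00:00
print(scaled - calendar)  # 9:00:00
```

The nine-hour drift comes from the 36 leap days that actually fall between 1970 and 2120 versus the 36.375 days implied by the average-year constant.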

pandas/_libs/tslibs/fields.pyi

Lines changed: 1 addition & 0 deletions
@@ -28,6 +28,7 @@ def get_date_field(
 def get_timedelta_field(
     tdindex: npt.NDArray[np.int64],  # const int64_t[:]
     field: str,
+    reso: int = ...,  # NPY_DATETIMEUNIT
 ) -> npt.NDArray[np.int32]: ...
 def isleapyear_arr(
     years: np.ndarray,

pandas/_libs/tslibs/fields.pyx

Lines changed: 10 additions & 6 deletions
@@ -48,8 +48,8 @@ from pandas._libs.tslibs.np_datetime cimport (
     get_unit_from_dtype,
     npy_datetimestruct,
     pandas_datetime_to_datetimestruct,
+    pandas_timedelta_to_timedeltastruct,
     pandas_timedeltastruct,
-    td64_to_tdstruct,
 )


@@ -491,7 +491,11 @@ def get_date_field(const int64_t[:] dtindex, str field, NPY_DATETIMEUNIT reso=NP
 
 @cython.wraparound(False)
 @cython.boundscheck(False)
-def get_timedelta_field(const int64_t[:] tdindex, str field):
+def get_timedelta_field(
+    const int64_t[:] tdindex,
+    str field,
+    NPY_DATETIMEUNIT reso=NPY_FR_ns,
+):
     """
     Given a int64-based timedelta index, extract the days, hrs, sec.,
     field and return an array of these values.
@@ -510,7 +514,7 @@ def get_timedelta_field(const int64_t[:] tdindex, str field):
                     out[i] = -1
                     continue

-                td64_to_tdstruct(tdindex[i], &tds)
+                pandas_timedelta_to_timedeltastruct(tdindex[i], reso, &tds)
                 out[i] = tds.days
         return out

@@ -521,7 +525,7 @@ def get_timedelta_field(const int64_t[:] tdindex, str field):
                     out[i] = -1
                     continue

-                td64_to_tdstruct(tdindex[i], &tds)
+                pandas_timedelta_to_timedeltastruct(tdindex[i], reso, &tds)
                 out[i] = tds.seconds
         return out

@@ -532,7 +536,7 @@ def get_timedelta_field(const int64_t[:] tdindex, str field):
                     out[i] = -1
                     continue

-                td64_to_tdstruct(tdindex[i], &tds)
+                pandas_timedelta_to_timedeltastruct(tdindex[i], reso, &tds)
                 out[i] = tds.microseconds
         return out

@@ -543,7 +547,7 @@ def get_timedelta_field(const int64_t[:] tdindex, str field):
                     out[i] = -1
                     continue

-                td64_to_tdstruct(tdindex[i], &tds)
+                pandas_timedelta_to_timedeltastruct(tdindex[i], reso, &tds)
                 out[i] = tds.nanoseconds
         return out

pandas/_libs/tslibs/np_datetime.pxd

Lines changed: 0 additions & 1 deletion
@@ -77,7 +77,6 @@ cdef check_dts_bounds(npy_datetimestruct *dts, NPY_DATETIMEUNIT unit=?)
 
 cdef int64_t dtstruct_to_dt64(npy_datetimestruct* dts) nogil
 cdef void dt64_to_dtstruct(int64_t dt64, npy_datetimestruct* out) nogil
-cdef void td64_to_tdstruct(int64_t td64, pandas_timedeltastruct* out) nogil

 cdef int64_t pydatetime_to_dt64(datetime val, npy_datetimestruct *dts)
 cdef int64_t pydate_to_dt64(date val, npy_datetimestruct *dts)

pandas/_libs/tslibs/np_datetime.pyx

Lines changed: 0 additions & 8 deletions
@@ -221,14 +221,6 @@ cdef inline void dt64_to_dtstruct(int64_t dt64,
     return


-cdef inline void td64_to_tdstruct(int64_t td64,
-                                  pandas_timedeltastruct* out) nogil:
-    """Convenience function to call pandas_timedelta_to_timedeltastruct
-    with the by-far-most-common frequency NPY_FR_ns"""
-    pandas_timedelta_to_timedeltastruct(td64, NPY_FR_ns, out)
-    return
-
-
 # just exposed for testing at the moment
 def py_td64_to_tdstruct(int64_t td64, NPY_DATETIMEUNIT unit):
     cdef:

pandas/_libs/tslibs/timedeltas.pyx

Lines changed: 68 additions & 43 deletions
@@ -161,42 +161,61 @@ def ints_to_pytimedelta(ndarray m8values, box=False):
     array of Timedelta or timedeltas objects
     """
     cdef:
+        NPY_DATETIMEUNIT reso = get_unit_from_dtype(m8values.dtype)
         Py_ssize_t i, n = m8values.size
         int64_t value
-        object[::1] result = np.empty(n, dtype=object)
-        NPY_DATETIMEUNIT reso = get_unit_from_dtype(m8values.dtype)
+        object res_val
+
+        # Note that `result` (and thus `res_flat`) is C-order and
+        #  `it` iterates C-order as well, so the iteration matches
+        #  See discussion at
+        #  github.com/pandas-dev/pandas/pull/46886#discussion_r860261305
+        ndarray result = cnp.PyArray_EMPTY(m8values.ndim, m8values.shape, cnp.NPY_OBJECT, 0)
+        object[::1] res_flat = result.ravel()     # should NOT be a copy

-    arr = m8values.view("i8")
+        ndarray arr = m8values.view("i8")
+        cnp.flatiter it = cnp.PyArray_IterNew(arr)

     for i in range(n):
+        # Analogous to: value = arr[i]
+        value = (<int64_t*>cnp.PyArray_ITER_DATA(it))[0]

-        value = arr[i]
         if value == NPY_NAT:
-            result[i] = <object>NaT
+            res_val = <object>NaT
         else:
             if box:
-                result[i] = _timedelta_from_value_and_reso(value, reso=reso)
+                res_val = _timedelta_from_value_and_reso(value, reso=reso)
             elif reso == NPY_DATETIMEUNIT.NPY_FR_ns:
-                result[i] = timedelta(microseconds=int(value) / 1000)
+                res_val = timedelta(microseconds=int(value) / 1000)
             elif reso == NPY_DATETIMEUNIT.NPY_FR_us:
-                result[i] = timedelta(microseconds=value)
+                res_val = timedelta(microseconds=value)
             elif reso == NPY_DATETIMEUNIT.NPY_FR_ms:
-                result[i] = timedelta(milliseconds=value)
+                res_val = timedelta(milliseconds=value)
             elif reso == NPY_DATETIMEUNIT.NPY_FR_s:
-                result[i] = timedelta(seconds=value)
+                res_val = timedelta(seconds=value)
             elif reso == NPY_DATETIMEUNIT.NPY_FR_m:
-                result[i] = timedelta(minutes=value)
+                res_val = timedelta(minutes=value)
             elif reso == NPY_DATETIMEUNIT.NPY_FR_h:
-                result[i] = timedelta(hours=value)
+                res_val = timedelta(hours=value)
             elif reso == NPY_DATETIMEUNIT.NPY_FR_D:
-                result[i] = timedelta(days=value)
+                res_val = timedelta(days=value)
             elif reso == NPY_DATETIMEUNIT.NPY_FR_W:
-                result[i] = timedelta(weeks=value)
+                res_val = timedelta(weeks=value)
             else:
                 # Month, Year, NPY_FR_GENERIC, pico, fempto, atto
                 raise NotImplementedError(reso)

-    return result.base  # .base to access underlying np.ndarray
+        # Note: we can index result directly instead of using PyArray_MultiIter_DATA
+        #  like we do for the other functions because result is known C-contiguous
+        #  and is the first argument to PyArray_MultiIterNew2.  The usual pattern
+        #  does not seem to work with object dtype.
+        #  See discussion at
+        #  github.com/pandas-dev/pandas/pull/46886#discussion_r860261305
+        res_flat[i] = res_val
+
+        cnp.PyArray_ITER_NEXT(it)
+
+    return result


 # ----------------------------------------------------------------------
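
Note on the `ints_to_pytimedelta` hunk above: the per-resolution branch and the shape-preserving traversal can be sketched in pure Python. This is an illustration under stated assumptions, not the Cython code: the string keys stand in for the `NPY_DATETIMEUNIT` members, nested lists stand in for an n-dimensional array, and the `NaT` and `box` branches are left out. Both function names are hypothetical.

```python
from datetime import timedelta

# Hypothetical string keys standing in for the NPY_DATETIMEUNIT enum members.
_KWARG_BY_RESO = {
    "us": "microseconds",
    "ms": "milliseconds",
    "s": "seconds",
    "m": "minutes",
    "h": "hours",
    "D": "days",
    "W": "weeks",
}


def int_to_pytimedelta(value: int, reso: str) -> timedelta:
    """Convert one integer at the given resolution to datetime.timedelta."""
    if reso == "ns":
        # datetime.timedelta has microsecond precision, so nanoseconds are
        # divided down and any sub-microsecond part is rounded away.
        return timedelta(microseconds=value / 1000)
    if reso not in _KWARG_BY_RESO:
        # Mirrors the fall-through branch for unsupported resolutions.
        raise NotImplementedError(reso)
    return timedelta(**{_KWARG_BY_RESO[reso]: value})


def ints_to_pytimedelta_nested(rows, reso):
    """Convert a 2-D list row by row (C order), keeping its shape."""
    return [[int_to_pytimedelta(v, reso) for v in row] for row in rows]


converted = ints_to_pytimedelta_nested([[1, 2], [3, 4]], "W")
print(converted[1][0])  # 21 days, 0:00:00
```

The output has the same 2-by-2 layout as the input, which is the property the new `result` array with `m8values.ndim` and `m8values.shape` provides in the diff.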
@@ -316,6 +335,8 @@ cdef convert_to_timedelta64(object ts, str unit):
 
     Return an ns based int64
     """
+    # Caller is responsible for checking unit not in ["Y", "y", "M"]
+
     if checknull_with_nat(ts):
         return np.timedelta64(NPY_NAT, "ns")
     elif isinstance(ts, _Timedelta):
@@ -329,17 +350,9 @@ cdef convert_to_timedelta64(object ts, str unit):
         if ts == NPY_NAT:
             return np.timedelta64(NPY_NAT, "ns")
         else:
-            if unit in ["Y", "M", "W"]:
-                ts = np.timedelta64(ts, unit)
-            else:
-                ts = cast_from_unit(ts, unit)
-                ts = np.timedelta64(ts, "ns")
+            ts = _maybe_cast_from_unit(ts, unit)
     elif is_float_object(ts):
-        if unit in ["Y", "M", "W"]:
-            ts = np.timedelta64(int(ts), unit)
-        else:
-            ts = cast_from_unit(ts, unit)
-            ts = np.timedelta64(ts, "ns")
+        ts = _maybe_cast_from_unit(ts, unit)
     elif isinstance(ts, str):
         if (len(ts) > 0 and ts[0] == "P") or (len(ts) > 1 and ts[:2] == "-P"):
             ts = parse_iso_format_string(ts)
@@ -356,6 +369,20 @@ cdef convert_to_timedelta64(object ts, str unit):
     return ts.astype("timedelta64[ns]")


+cdef _maybe_cast_from_unit(ts, str unit):
+    # caller is responsible for checking
+    #  assert unit not in ["Y", "y", "M"]
+    try:
+        ts = cast_from_unit(ts, unit)
+    except OverflowError as err:
+        raise OutOfBoundsTimedelta(
+            f"Cannot cast {ts} from {unit} to 'ns' without overflow."
+        ) from err
+
+    ts = np.timedelta64(ts, "ns")
+    return ts
+
+
 @cython.boundscheck(False)
 @cython.wraparound(False)
 def array_to_timedelta64(
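
Note on `_maybe_cast_from_unit` above: the pattern is to catch the low-level `OverflowError` and re-raise a domain-specific exception with `from err`, so the original cause stays attached. A minimal pure-Python sketch follows. Everything in it is a stand-in: `OutOfBoundsTimedeltaSketch` is a hypothetical class (its `ValueError` base is chosen for the sketch), the unit table is abbreviated, and the explicit int64 bounds check replaces the overflow that C integers would hit.

```python
INT64_MAX = 2**63 - 1
INT64_MIN = -(2**63)

# Nanoseconds per unit; only a few units are listed for illustration.
NS_PER_UNIT = {"s": 10**9, "D": 86_400 * 10**9, "W": 7 * 86_400 * 10**9}


class OutOfBoundsTimedeltaSketch(ValueError):
    """Hypothetical stand-in for the library's out-of-bounds exception."""


def cast_to_ns(value: int, unit: str) -> int:
    """Scale value to nanoseconds, failing if it leaves the int64 range."""
    result = value * NS_PER_UNIT[unit]
    if not INT64_MIN <= result <= INT64_MAX:
        # Python ints never overflow, so the int64 limit is checked by hand.
        raise OverflowError(f"{value} {unit} does not fit in int64 nanoseconds")
    return result


def maybe_cast_to_ns(value: int, unit: str) -> int:
    """Translate the low-level overflow into the domain-specific error."""
    try:
        return cast_to_ns(value, unit)
    except OverflowError as err:
        raise OutOfBoundsTimedeltaSketch(
            f"Cannot cast {value} from {unit} to 'ns' without overflow."
        ) from err


print(maybe_cast_to_ns(1, "W"))  # 604800000000000
```

Calling `maybe_cast_to_ns(10**9, "W")` raises `OutOfBoundsTimedeltaSketch`, and its `__cause__` is the original `OverflowError`.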
@@ -370,6 +397,8 @@ def array_to_timedelta64(
     -------
     np.ndarray[timedelta64ns]
     """
+    # Caller is responsible for checking
+    assert unit not in ["Y", "y", "M"]

     cdef:
         Py_ssize_t i, n = values.size
@@ -652,24 +681,20 @@ cdef inline timedelta_from_spec(object number, object frac, object unit):
     cdef:
         str n

-    try:
-        unit = ''.join(unit)
-
-        if unit in ["M", "Y", "y"]:
-            warnings.warn(
-                "Units 'M', 'Y' and 'y' do not represent unambiguous "
-                "timedelta values and will be removed in a future version.",
-                FutureWarning,
-                stacklevel=2,
-            )
+    unit = ''.join(unit)
+    if unit in ["M", "Y", "y"]:
+        warnings.warn(
+            "Units 'M', 'Y' and 'y' do not represent unambiguous "
+            "timedelta values and will be removed in a future version.",
+            FutureWarning,
+            stacklevel=3,
+        )

-        if unit == 'M':
-            # To parse ISO 8601 string, 'M' should be treated as minute,
-            # not month
-            unit = 'm'
-        unit = parse_timedelta_unit(unit)
-    except KeyError:
-        raise ValueError(f"invalid abbreviation: {unit}")
+    if unit == 'M':
+        # To parse ISO 8601 string, 'M' should be treated as minute,
+        # not month
+        unit = 'm'
+    unit = parse_timedelta_unit(unit)

     n = ''.join(number) + '.' + ''.join(frac)
     return cast_from_unit(float(n), unit)
@@ -696,7 +721,7 @@ cpdef inline str parse_timedelta_unit(str unit):
         return unit
     try:
         return timedelta_abbrevs[unit.lower()]
-    except (KeyError, AttributeError):
+    except KeyError:
         raise ValueError(f"invalid unit abbreviation: {unit}")

 # ----------------------------------------------------------------------
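
Note on the `parse_timedelta_unit` hunk above: narrowing the handler to `KeyError` means only unknown abbreviations become `ValueError`; a non-string argument is no longer masked. The sketch below shows that difference in plain Python. The lookup table and function name are hypothetical and much smaller than the real ones. In the Cython function the parameter is declared `str unit`, which presumably is why the `AttributeError` branch was unnecessary, but that motive is an inference and is not stated in the diff.

```python
# Hypothetical, abbreviated lookup table; the real one is much larger.
ABBREVS = {"w": "W", "d": "D", "days": "D", "h": "h", "min": "m", "s": "s"}


def parse_unit(unit):
    """Map a case-insensitive abbreviation to its canonical unit code."""
    if unit is None:
        return "ns"
    try:
        return ABBREVS[unit.lower()]
    except KeyError:
        # Only a failed lookup is translated; other errors propagate as-is.
        raise ValueError(f"invalid unit abbreviation: {unit}")


print(parse_unit("Days"))  # D
print(parse_unit(None))    # ns
```

With this version, `parse_unit("fortnight")` raises `ValueError`, while `parse_unit(5)` raises `AttributeError` because an `int` has no `lower` method.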

pandas/_libs/tslibs/timestamps.pyx

Lines changed: 5 additions & 2 deletions
@@ -90,7 +90,10 @@ from pandas._libs.tslibs.np_datetime cimport (
     pydatetime_to_dt64,
 )

-from pandas._libs.tslibs.np_datetime import OutOfBoundsDatetime
+from pandas._libs.tslibs.np_datetime import (
+    OutOfBoundsDatetime,
+    OutOfBoundsTimedelta,
+)

 from pandas._libs.tslibs.offsets cimport (
     BaseOffset,
@@ -455,7 +458,7 @@ cdef class _Timestamp(ABCTimestamp):
             # Timedelta
             try:
                 return Timedelta(self.value - other.value)
-            except (OverflowError, OutOfBoundsDatetime) as err:
+            except (OverflowError, OutOfBoundsDatetime, OutOfBoundsTimedelta) as err:
                 if isinstance(other, _Timestamp):
                     if both_timestamps:
                         raise OutOfBoundsDatetime(
