Commit e8b47d9

jbrockmendel authored and phofl committed
API: Series.astype(td64_unsupported) raise (pandas-dev#49290)
* API: Series.astype(td64_unsupported) raise
* update docs
1 parent 7933557 commit e8b47d9

11 files changed: +168 -110 lines changed

doc/source/user_guide/timedeltas.rst

+9 -8
@@ -236,9 +236,7 @@ Numeric reduction operation for ``timedelta64[ns]`` will return ``Timedelta`` ob
 Frequency conversion
 --------------------

-Timedelta Series, ``TimedeltaIndex``, and ``Timedelta`` scalars can be converted to other 'frequencies' by dividing by another timedelta,
-or by astyping to a specific timedelta type. These operations yield Series and propagate ``NaT`` -> ``nan``.
-Note that division by the NumPy scalar is true division, while astyping is equivalent of floor division.
+Timedelta Series and ``TimedeltaIndex``, and ``Timedelta`` can be converted to other frequencies by astyping to a specific timedelta dtype.

 .. ipython:: python

@@ -250,14 +248,17 @@ Note that division by the NumPy scalar is true division, while astyping is equiv
    td[3] = np.nan
    td

-   # to days
-   td / np.timedelta64(1, "D")
-   td.astype("timedelta64[D]")
-
    # to seconds
-   td / np.timedelta64(1, "s")
    td.astype("timedelta64[s]")

+For timedelta64 resolutions other than the supported "s", "ms", "us", "ns",
+an alternative is to divide by another timedelta object. Note that division by the NumPy scalar is true division, while astyping is equivalent of floor division.
+
+.. ipython:: python
+
+   # to days
+   td / np.timedelta64(1, "D")
+
    # to months (these are constant months)
    td / np.timedelta64(1, "M")
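
Reviewer note: the revised doc text contrasts true division with floor division. The sketch below illustrates that distinction only. It uses the standard library's `datetime.timedelta` instead of pandas objects (an assumption, so that it runs without pandas installed), and the 36-hour sample value is hypothetical.

```python
from datetime import timedelta

# Hypothetical sample value, chosen so that the two divisions differ.
span = timedelta(hours=36)
one_day = timedelta(days=1)

# True division keeps the fractional part; this is the analogue of
# dividing a timedelta by a one-day timedelta scalar.
true_days = span / one_day

# Floor division discards the fractional part; the docs describe
# astyping as equivalent to this.
floor_days = span // one_day

# The remainder is the part that floor division drops.
leftover = span % one_day

print(true_days, floor_days, leftover)
```

Here `true_days` is a float and `floor_days` is an int, so the half day in the sample value survives only in the true-division result.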

doc/source/whatsnew/v0.13.0.rst

+1
@@ -532,6 +532,7 @@ Enhancements
   is frequency conversion. See :ref:`the docs<timedeltas.timedeltas_convert>` for the docs.

   .. ipython:: python
+     :okexcept:

      import datetime
      td = pd.Series(pd.date_range('20130101', periods=4)) - pd.Series(

doc/source/whatsnew/v2.0.0.rst

+85
@@ -100,6 +100,91 @@ notable_bug_fix2
 Backwards incompatible API changes
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+
+.. _whatsnew_200.api_breaking.astype_to_unsupported_datetimelike:
+
+Disallow astype conversion to non-supported datetime64/timedelta64 dtypes
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+In previous versions, converting a :class:`Series` or :class:`DataFrame`
+from ``datetime64[ns]`` to a different ``datetime64[X]`` dtype would return
+with ``datetime64[ns]`` dtype instead of the requested dtype. In pandas 2.0,
+support is added for "datetime64[s]", "datetime64[ms]", and "datetime64[us]" dtypes,
+so converting to those dtypes gives exactly the requested dtype:
+
+*Previous behavior*:
+
+.. ipython:: python
+
+   idx = pd.date_range("2016-01-01", periods=3)
+   ser = pd.Series(idx)
+
+*Previous behavior*:
+
+.. code-block:: ipython
+
+   In [4]: ser.astype("datetime64[s]")
+   Out[4]:
+   0   2016-01-01
+   1   2016-01-02
+   2   2016-01-03
+   dtype: datetime64[ns]
+
+With the new behavior, we get exactly the requested dtype:
+
+*New behavior*:
+
+.. ipython:: python
+
+   ser.astype("datetime64[s]")
+
+For non-supported resolutions e.g. "datetime64[D]", we raise instead of silently
+ignoring the requested dtype:
+
+*New behavior*:
+
+.. ipython:: python
+   :okexcept:
+
+   ser.astype("datetime64[D]")
+
+For conversion from ``timedelta64[ns]`` dtypes, the old behavior converted
+to a floating point format.
+
+*Previous behavior*:
+
+.. ipython:: python
+
+   idx = pd.timedelta_range("1 Day", periods=3)
+   ser = pd.Series(idx)
+
+*Previous behavior*:
+
+.. code-block:: ipython
+
+   In [7]: ser.astype("timedelta64[s]")
+   Out[7]:
+   0     86400.0
+   1    172800.0
+   2    259200.0
+   dtype: float64
+
+   In [8]: ser.astype("timedelta64[D]")
+   Out[8]:
+   0    1.0
+   1    2.0
+   2    3.0
+   dtype: float64
+
+The new behavior, as for datetime64, either gives exactly the requested dtype or raises:
+
+*New behavior*:
+
+.. ipython:: python
+   :okexcept:
+
+   ser.astype("timedelta64[s]")
+   ser.astype("timedelta64[D]")
+
 .. _whatsnew_200.api_breaking.deps:

 Increased minimum versions for dependencies

pandas/core/arrays/timedeltas.py

+5 -3
@@ -45,7 +45,6 @@
 from pandas.compat.numpy import function as nv
 from pandas.util._validators import validate_endpoints

-from pandas.core.dtypes.astype import astype_td64_unit_conversion
 from pandas.core.dtypes.common import (
     TD64NS_DTYPE,
     is_dtype_equal,
@@ -330,8 +329,11 @@ def astype(self, dtype, copy: bool = True):
                 return type(self)._simple_new(
                     res_values, dtype=res_values.dtype, freq=self.freq
                 )
-
-            return astype_td64_unit_conversion(self._ndarray, dtype, copy=copy)
+            else:
+                raise ValueError(
+                    f"Cannot convert from {self.dtype} to {dtype}. "
+                    "Supported resolutions are 's', 'ms', 'us', 'ns'"
+                )

         return dtl.DatetimeLikeArrayMixin.astype(self, dtype, copy=copy)
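
Reviewer note: the new `else` branch turns an unsupported target resolution into a `ValueError`. The standalone sketch below mirrors only that decision, without pandas. The helper `check_timedelta_unit` and the constant `SUPPORTED_RESOLUTIONS` are hypothetical names introduced for illustration and are not part of pandas; the message text is copied from the diff.

```python
# Hypothetical constant: the resolutions the error message lists as supported.
SUPPORTED_RESOLUTIONS = ("s", "ms", "us", "ns")


def check_timedelta_unit(current_dtype: str, unit: str) -> str:
    """Return the target dtype string, or raise for an unsupported unit.

    Illustrative stand-in for the branch added to ``astype``; it does
    no conversion, it only reproduces the accept-or-raise decision.
    """
    target = f"timedelta64[{unit}]"
    if unit not in SUPPORTED_RESOLUTIONS:
        raise ValueError(
            f"Cannot convert from {current_dtype} to {target}. "
            "Supported resolutions are 's', 'ms', 'us', 'ns'"
        )
    return target


# A supported unit passes through unchanged.
accepted = check_timedelta_unit("timedelta64[ns]", "s")

# An unsupported unit raises instead of falling back to floats.
try:
    check_timedelta_unit("timedelta64[ns]", "D")
    error_text = ""
except ValueError as err:
    error_text = str(err)

print(accepted)
print(error_text)
```

Raising here, instead of returning a float array, keeps the result dtype predictable: a caller either gets the dtype it asked for or gets an exception.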

pandas/core/dtypes/astype.py

+3 -57
@@ -13,11 +13,6 @@
 import numpy as np

 from pandas._libs import lib
-from pandas._libs.tslibs import (
-    get_unit_from_dtype,
-    is_supported_unit,
-    is_unitless,
-)
 from pandas._libs.tslibs.timedeltas import array_to_timedelta64
 from pandas._typing import (
     ArrayLike,
@@ -131,12 +126,10 @@ def astype_nansafe(
         elif dtype.kind == "m":
             # give the requested dtype for supported units (s, ms, us, ns)
             # and doing the old convert-to-float behavior otherwise.
-            if is_supported_unit(get_unit_from_dtype(arr.dtype)):
-                from pandas.core.construction import ensure_wrapped_if_datetimelike
+            from pandas.core.construction import ensure_wrapped_if_datetimelike

-                arr = ensure_wrapped_if_datetimelike(arr)
-                return arr.astype(dtype, copy=copy)
-            return astype_td64_unit_conversion(arr, dtype, copy=copy)
+            arr = ensure_wrapped_if_datetimelike(arr)
+            return arr.astype(dtype, copy=copy)

         raise TypeError(f"cannot astype a timedelta from [{arr.dtype}] to [{dtype}]")

@@ -291,20 +284,6 @@ def astype_array_safe(
         # Ensure we don't end up with a PandasArray
         dtype = dtype.numpy_dtype

-    if (
-        is_datetime64_dtype(values.dtype)
-        # need to do np.dtype check instead of is_datetime64_dtype
-        # otherwise pyright complains
-        and isinstance(dtype, np.dtype)
-        and dtype.kind == "M"
-        and not is_unitless(dtype)
-        and not is_dtype_equal(dtype, values.dtype)
-        and not is_supported_unit(get_unit_from_dtype(dtype))
-    ):
-        # Supported units we handle in DatetimeArray.astype; but that raises
-        # on non-supported units, so we handle that here.
-        return np.asarray(values).astype(dtype)
-
     try:
         new_values = astype_array(values, dtype, copy=copy)
     except (ValueError, TypeError):
@@ -316,36 +295,3 @@ def astype_array_safe(
             raise

     return new_values
-
-
-def astype_td64_unit_conversion(
-    values: np.ndarray, dtype: np.dtype, copy: bool
-) -> np.ndarray:
-    """
-    By pandas convention, converting to non-nano timedelta64
-    returns an int64-dtyped array with ints representing multiples
-    of the desired timedelta unit. This is essentially division.
-
-    Parameters
-    ----------
-    values : np.ndarray[timedelta64[ns]]
-    dtype : np.dtype
-        timedelta64 with unit not-necessarily nano
-    copy : bool
-
-    Returns
-    -------
-    np.ndarray
-    """
-    if is_dtype_equal(values.dtype, dtype):
-        if copy:
-            return values.copy()
-        return values
-
-    # otherwise we are converting to non-nano
-    result = values.astype(dtype, copy=False)  # avoid double-copying
-    result = result.astype(np.float64)
-
-    mask = isna(values)
-    np.putmask(result, mask, np.nan)
-    return result

pandas/tests/dtypes/test_inference.py

+2 -2
@@ -1868,8 +1868,8 @@ def test_is_timedelta(self):
         assert is_timedelta64_ns_dtype(tdi.astype("timedelta64[ns]"))

         # Conversion to Int64Index:
-        assert not is_timedelta64_ns_dtype(tdi.astype("timedelta64"))
-        assert not is_timedelta64_ns_dtype(tdi.astype("timedelta64[h]"))
+        assert not is_timedelta64_ns_dtype(Index([], dtype=np.float64))
+        assert not is_timedelta64_ns_dtype(Index([], dtype=np.int64))


 class TestIsScalar:

pandas/tests/frame/constructors/test_from_records.py

+2 -1
@@ -44,7 +44,8 @@ def test_from_records_with_datetimes(self):
         dtypes = [("EXPIRY", "<M8[m]")]
         recarray = np.core.records.fromarrays(arrdata, dtype=dtypes)
         result = DataFrame.from_records(recarray)
-        expected["EXPIRY"] = expected["EXPIRY"].astype("M8[m]")
+        # we get the closest supported unit, "s"
+        expected["EXPIRY"] = expected["EXPIRY"].astype("M8[s]")
         tm.assert_frame_equal(result, expected)

     def test_from_records_sequencelike(self):
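
Reviewer note: the updated expectation relies on construction falling back to the closest supported unit ("s" for an "m" input), while an explicit `astype` to an unsupported unit now raises. The sketch below models only the fallback cases that the tests in this commit mention ("m", "h", "D" map to "s"). `nearest_supported_unit` and both constants are hypothetical names and this is not the pandas implementation.

```python
# Hypothetical constants covering only the units these tests exercise.
SUPPORTED_UNITS = ("s", "ms", "us", "ns")
COARSER_THAN_SECOND = ("m", "h", "D")


def nearest_supported_unit(unit: str) -> str:
    """Map a datetime64/timedelta64 unit to the unit expected after construction."""
    if unit in SUPPORTED_UNITS:
        # Supported units are kept exactly as requested.
        return unit
    if unit in COARSER_THAN_SECOND:
        # The tests expect "s" for minute, hour and day inputs.
        return "s"
    # Anything else is outside what this commit's tests describe.
    raise ValueError(f"unit {unit!r} is not covered by this sketch")


mapped = {unit: nearest_supported_unit(unit) for unit in ("ns", "ms", "m", "h", "D")}
print(mapped)
```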

pandas/tests/frame/methods/test_astype.py

+47 -30
@@ -418,14 +418,28 @@ def test_astype_to_datetime_unit(self, unit):
         idx = pd.Index(ser)
         dta = ser._values

-        result = df.astype(dtype)
-
         if unit in ["ns", "us", "ms", "s"]:
             # GH#48928
             exp_dtype = dtype
+            result = df.astype(dtype)
         else:
             # we use the nearest supported dtype (i.e. M8[s])
             exp_dtype = "M8[s]"
+            msg = rf"Cannot cast DatetimeArray to dtype datetime64\[{unit}\]"
+            with pytest.raises(TypeError, match=msg):
+                df.astype(dtype)
+
+            with pytest.raises(TypeError, match=msg):
+                ser.astype(dtype)
+
+            with pytest.raises(TypeError, match=msg.replace("Array", "Index")):
+                idx.astype(dtype)
+
+            with pytest.raises(TypeError, match=msg):
+                dta.astype(dtype)
+
+            return
+
         # TODO(2.0): once DataFrame constructor doesn't cast ndarray inputs.
         # can simplify this
         exp_values = arr.astype(exp_dtype)
@@ -437,32 +451,22 @@ def test_astype_to_datetime_unit(self, unit):

         tm.assert_frame_equal(result, exp_df)

-        # TODO(2.0): make Series/DataFrame raise like Index and DTA?
         res_ser = ser.astype(dtype)
         exp_ser = exp_df.iloc[:, 0]
         assert exp_ser.dtype == exp_dtype
         tm.assert_series_equal(res_ser, exp_ser)

-        if unit in ["ns", "us", "ms", "s"]:
-            exp_dta = exp_ser._values
+        exp_dta = exp_ser._values

-            res_index = idx.astype(dtype)
-            # TODO(2.0): should be able to just call pd.Index(exp_ser)
-            exp_index = pd.DatetimeIndex._simple_new(exp_dta, name=idx.name)
-            assert exp_index.dtype == exp_dtype
-            tm.assert_index_equal(res_index, exp_index)
+        res_index = idx.astype(dtype)
+        # TODO(2.0): should be able to just call pd.Index(exp_ser)
+        exp_index = pd.DatetimeIndex._simple_new(exp_dta, name=idx.name)
+        assert exp_index.dtype == exp_dtype
+        tm.assert_index_equal(res_index, exp_index)

-            res_dta = dta.astype(dtype)
-            assert exp_dta.dtype == exp_dtype
-            tm.assert_extension_array_equal(res_dta, exp_dta)
-        else:
-            msg = rf"Cannot cast DatetimeIndex to dtype datetime64\[{unit}\]"
-            with pytest.raises(TypeError, match=msg):
-                idx.astype(dtype)
-
-            msg = rf"Cannot cast DatetimeArray to dtype datetime64\[{unit}\]"
-            with pytest.raises(TypeError, match=msg):
-                dta.astype(dtype)
+        res_dta = dta.astype(dtype)
+        assert exp_dta.dtype == exp_dtype
+        tm.assert_extension_array_equal(res_dta, exp_dta)

     @pytest.mark.parametrize("unit", ["ns"])
     def test_astype_to_timedelta_unit_ns(self, unit):
@@ -483,22 +487,35 @@ def test_astype_to_timedelta_unit(self, unit):
         dtype = f"m8[{unit}]"
         arr = np.array([[1, 2, 3]], dtype=dtype)
         df = DataFrame(arr)
+        ser = df.iloc[:, 0]
+        tdi = pd.Index(ser)
+        tda = tdi._values
+
         if unit in ["us", "ms", "s"]:
             assert (df.dtypes == dtype).all()
+            result = df.astype(dtype)
         else:
             # We get the nearest supported unit, i.e. "s"
             assert (df.dtypes == "m8[s]").all()

-        result = df.astype(dtype)
-        if unit in ["m", "h", "D"]:
-            # We don't support these, so we use the pre-2.0 logic to convert to float
-            # (xref GH#48979)
-
-            expected = DataFrame(df.values.astype(dtype).astype(float))
-        else:
-            # The conversion is a no-op, so we just get a copy
-            expected = df
+            msg = (
+                rf"Cannot convert from timedelta64\[s\] to timedelta64\[{unit}\]. "
+                "Supported resolutions are 's', 'ms', 'us', 'ns'"
+            )
+            with pytest.raises(ValueError, match=msg):
+                df.astype(dtype)
+            with pytest.raises(ValueError, match=msg):
+                ser.astype(dtype)
+            with pytest.raises(ValueError, match=msg):
+                tdi.astype(dtype)
+            with pytest.raises(ValueError, match=msg):
+                tda.astype(dtype)
+
+            return

+        result = df.astype(dtype)
+        # The conversion is a no-op, so we just get a copy
+        expected = df
         tm.assert_frame_equal(result, expected)

     @pytest.mark.parametrize("unit", ["ns", "us", "ms", "s", "h", "m", "D"])
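
Reviewer note: the new tests escape the square brackets in `msg` because `pytest.raises(match=...)` treats its argument as a regular expression and applies `re.search` to the exception text. The sketch below uses only the standard library's `re` to show why the escaping matters. The sample `message` is assembled from the error text in this commit, and `unit = "D"` is an arbitrary choice for illustration.

```python
import re

unit = "D"  # arbitrary unsupported unit, for illustration only

# Error text in the shape raised by the new ``else`` branch in this commit.
message = (
    "Cannot convert from timedelta64[s] to timedelta64[D]. "
    "Supported resolutions are 's', 'ms', 'us', 'ns'"
)

# Pattern built the same way as in the test: brackets are escaped so
# they match literally instead of opening a character class.
escaped_pattern = (
    rf"Cannot convert from timedelta64\[s\] to timedelta64\[{unit}\]. "
    "Supported resolutions are 's', 'ms', 'us', 'ns'"
)
escaped_matches = re.search(escaped_pattern, message) is not None

# Without escaping, "[D]" is a character class that matches the single
# letter "D", so the pattern looks for the text "timedelta64D".
unescaped_pattern = f"timedelta64[{unit}]"
unescaped_matches = re.search(unescaped_pattern, message) is not None

print(escaped_matches, unescaped_matches)
```

The unescaped `.` after the closing bracket in `msg` still matches here, because `.` in a regular expression accepts any character, including a literal period.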
