DEPR/API: Non-ns precision in Index constructors #24806

Merged
merged 5 commits into from
Jan 21, 2019
2 changes: 2 additions & 0 deletions doc/source/whatsnew/v0.24.0.rst
@@ -436,6 +436,7 @@ Backwards incompatible API changes
 - Incorrectly passing a :class:`DatetimeIndex` to :meth:`MultiIndex.from_tuples`, rather than a sequence of tuples, now raises a ``TypeError`` rather than a ``ValueError`` (:issue:`24024`)
 - :func:`pd.offsets.generate_range` argument ``time_rule`` has been removed; use ``offset`` instead (:issue:`24157`)
 - In 0.23.x, pandas would raise a ``ValueError`` on a merge of a numeric column (e.g. ``int`` dtyped column) and an ``object`` dtyped column (:issue:`9780`). We have re-enabled the ability to merge ``object`` and other dtypes; pandas will still raise on a merge between a numeric and an ``object`` dtyped column that is composed only of strings (:issue:`21681`)
+- :class:`DatetimeIndex` and :class:`TimedeltaIndex` no longer ignore the dtype precision. Passing a non-nanosecond resolution dtype will raise a ``ValueError`` (:issue:`24753`)
Contributor Author

What do people think about this? In 0.23.4 we ignore the dtype precision

In [3]: pd.DatetimeIndex(['2000'], dtype='datetime64[us]')
Out[3]: DatetimeIndex(['2000-01-01'], dtype='datetime64[ns]', freq=None)

Should that raise now? Or should we deprecate first?

Contributor

we actually can convert this (and I think we do?)

Contributor

though I would not be averse to deprecating for consistency reasons

Contributor Author

We do convert an array with non-ns precision dtype. That's similar to converting any sequence of datetime-like things

In [7]: arr = np.array([1, 2], dtype='datetime64[us]')

In [8]: pd.DatetimeIndex(arr)
Out[8]: DatetimeIndex(['1970-01-01 00:00:00.000001', '1970-01-01 00:00:00.000002'], dtype='datetime64[ns]', freq=None)

The issue is when the user says DatetimeIndex(..., dtype='datetime64[us]'). That's like saying "I want the dtype of the output DatetimeIndex to be datetime64[us]", which isn't supported.
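The distinction drawn in this comment can be sketched without pandas. What follows is a minimal illustration in plain Python, not pandas's implementation: `to_ns` and `make_index` are hypothetical helpers, and the unit table is an assumption that covers only a few units. Input values at another precision are converted to nanoseconds on the way in, while an explicit request for a non-nanosecond output dtype is rejected.

```python
# Hypothetical helpers for illustration only; these are not pandas functions.
NS_PER_UNIT = {"ns": 1, "us": 1_000, "ms": 1_000_000, "s": 1_000_000_000}


def to_ns(values, unit):
    """Convert integer offsets expressed in `unit` to integer nanoseconds."""
    return [v * NS_PER_UNIT[unit] for v in values]


def make_index(values, unit="ns", dtype=None):
    """Always store nanoseconds; reject a request for any other output dtype."""
    if dtype is not None and dtype != "datetime64[ns]":
        raise ValueError("Unexpected value for 'dtype': {!r}".format(dtype))
    return to_ns(values, unit)


# Microsecond input data is fine: it is converted on the way in.
converted = make_index([1, 2], unit="us")

# Asking for a microsecond *output* dtype is what gets rejected.
try:
    make_index([1, 2], dtype="datetime64[us]")
    rejected = False
except ValueError:
    rejected = True
```

In this sketch the precision of the data and the precision of the requested result are separate inputs, which is the separation the comment argues for.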


Percentage change on groupby
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -1310,6 +1311,7 @@ Deprecations
 - :meth:`Series.nonzero` is deprecated and will be removed in a future version (:issue:`18262`)
 - Passing an integer to :meth:`Series.fillna` and :meth:`DataFrame.fillna` with ``timedelta64[ns]`` dtypes is deprecated, will raise ``TypeError`` in a future version. Use ``obj.fillna(pd.Timedelta(...))`` instead (:issue:`24694`)
 - ``Series.cat.categorical``, ``Series.cat.name`` and ``Series.cat.index`` have been deprecated. Use the attributes on ``Series.cat`` or ``Series`` directly. (:issue:`24751`).
+- Passing a dtype without a precision like ``np.dtype('datetime64')`` or ``timedelta64`` to :class:`DatetimeIndex` and :class:`TimedeltaIndex` is now deprecated. Use the nanosecond-precision dtype instead (:issue:`24753`).

 .. _whatsnew_0240.deprecations.datetimelike_int_ops:

11 changes: 11 additions & 0 deletions pandas/core/arrays/datetimes.py
@@ -1,5 +1,6 @@
 # -*- coding: utf-8 -*-
 from datetime import datetime, time, timedelta
+import textwrap
 import warnings

 import numpy as np
@@ -1986,6 +1987,16 @@ def _validate_dt64_dtype(dtype):
     """
     if dtype is not None:
         dtype = pandas_dtype(dtype)
+
+        if isinstance(dtype, np.dtype) and dtype == np.dtype("M8"):
+            # no precision, warn
+            dtype = _NS_DTYPE
+            msg = textwrap.dedent("""\
+                Passing in 'datetime64' dtype with no precision is deprecated
+                and will raise in a future version. Please pass in
+                'datetime64[ns]' instead.""")
+            warnings.warn(msg, FutureWarning, stacklevel=5)
+
         if ((isinstance(dtype, np.dtype) and dtype != _NS_DTYPE)
                 or not isinstance(dtype, (np.dtype, DatetimeTZDtype))):
             raise ValueError("Unexpected value for 'dtype': '{dtype}'. "
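The order of checks in this hunk (warn and fall back when no precision is given, then raise for any other non-nanosecond dtype) can be tried out in isolation. The sketch below is a simplified stand-in, not the pandas function: it works on plain strings rather than `np.dtype` objects, it leaves out the `DatetimeTZDtype` branch, and `validate_dt64_dtype_sketch`, `NS_DTYPE`, and the shortened error message are hypothetical.

```python
import warnings

NS_DTYPE = "datetime64[ns]"  # string stand-in for pandas' internal _NS_DTYPE


def validate_dt64_dtype_sketch(dtype):
    """Plain-string sketch of the validation order; not the pandas function."""
    if dtype is None:
        return None
    if dtype == "datetime64":
        # No precision given: fall back to nanoseconds, but warn about it.
        warnings.warn(
            "Passing in 'datetime64' dtype with no precision is deprecated "
            "and will raise in a future version. Please pass in "
            "'datetime64[ns]' instead.",
            FutureWarning,
            stacklevel=2,
        )
        return NS_DTYPE
    if dtype != NS_DTYPE:
        # Any explicit non-nanosecond precision is an error.
        raise ValueError("Unexpected value for 'dtype': '{}'.".format(dtype))
    return dtype


# Record the warning instead of printing it, so it can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = validate_dt64_dtype_sketch("datetime64")
```

Because the precision-less case is rewritten to the nanosecond dtype before the mismatch check, it never reaches the `ValueError` branch; only an explicit precision such as `datetime64[us]` does.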
42 changes: 29 additions & 13 deletions pandas/core/arrays/timedeltas.py
@@ -2,6 +2,7 @@
 from __future__ import division

 from datetime import timedelta
+import textwrap
 import warnings

 import numpy as np
@@ -160,16 +161,8 @@ def __init__(self, values, dtype=_TD_DTYPE, freq=None, copy=False):
             # nanosecond UTC (or tz-naive) unix timestamps
             values = values.view(_TD_DTYPE)

-        if values.dtype != _TD_DTYPE:
-            raise TypeError(_BAD_DTYPE.format(dtype=values.dtype))
-
-        try:
-            dtype_mismatch = dtype != _TD_DTYPE
-        except TypeError:
-            raise TypeError(_BAD_DTYPE.format(dtype=dtype))
-        else:
-            if dtype_mismatch:
-                raise TypeError(_BAD_DTYPE.format(dtype=dtype))
+        _validate_td64_dtype(values.dtype)
+        dtype = _validate_td64_dtype(dtype)

         if freq == "infer":
             msg = (
@@ -204,9 +197,8 @@ def _simple_new(cls, values, freq=None, dtype=_TD_DTYPE):
     @classmethod
     def _from_sequence(cls, data, dtype=_TD_DTYPE, copy=False,
                        freq=None, unit=None):
-        if dtype != _TD_DTYPE:
-            raise ValueError("Only timedelta64[ns] dtype is valid.")
-
+        if dtype:
+            _validate_td64_dtype(dtype)
         freq, freq_infer = dtl.maybe_infer_freq(freq)

         data, inferred_freq = sequence_to_td64ns(data, copy=copy, unit=unit)
@@ -997,6 +989,30 @@ def objects_to_td64ns(data, unit="ns", errors="raise"):
     return result.view('timedelta64[ns]')


+def _validate_td64_dtype(dtype):
+    try:
+        if dtype == np.dtype("timedelta64"):
Contributor

same here, use is_dtype_equal

+            dtype = _TD_DTYPE
+            msg = textwrap.dedent("""\
+                Passing in 'timedelta' dtype with no precision is deprecated
+                and will raise in a future version. Please pass in
+                'timedelta64[ns]' instead.""")
+            warnings.warn(msg, FutureWarning, stacklevel=4)
+    except TypeError:
+        # extension dtype
+        pass
+
+    try:
Contributor

You can avoid all of this try/except/else if you use ``not is_dtype_equal(dtype, _TD_DTYPE)``.

+        dtype_mismatch = dtype != _TD_DTYPE
+    except TypeError:
+        raise ValueError(_BAD_DTYPE.format(dtype=dtype))
Contributor Author

Note the change from TypeError to ValueError, when compared with https://github.com/pandas-dev/pandas/pull/24806/files#diff-57368e94ebd4c37c96935ab9862e3bc7L169. ValueError is more consistent with DatetimeArray.

+    else:
+        if dtype_mismatch:
+            raise ValueError(_BAD_DTYPE.format(dtype=dtype))
+
+    return dtype
+
+
 def _generate_regular_range(start, end, periods, offset):
     stride = offset.nanos
     if periods is None:
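The review suggestion in this file, to replace the try/except/else with a comparison helper, and the switch from `TypeError` to `ValueError` can both be illustrated in a few lines. This is a plain-Python sketch under stated assumptions: `safe_equal` is a hypothetical stand-in for a helper like `is_dtype_equal`, `Incomparable` simulates a dtype-like object whose `==` raises `TypeError`, and the error text only approximates `_BAD_DTYPE` as suggested by the test patterns.

```python
TD_DTYPE = "timedelta64[ns]"  # string stand-in for pandas' internal _TD_DTYPE


def safe_equal(left, right):
    """Return False instead of propagating a TypeError raised by `==`."""
    try:
        return bool(left == right)
    except TypeError:
        return False


class Incomparable:
    """Stand-in for a dtype-like object whose `==` raises TypeError."""

    def __eq__(self, other):
        raise TypeError("cannot compare")


def validate_td64_dtype_sketch(dtype):
    # One branch covers both "incomparable" and "comparable but different",
    # and both surface as ValueError.
    if not safe_equal(dtype, TD_DTYPE):
        raise ValueError(
            "dtype {!r} cannot be converted to timedelta64[ns]".format(dtype)
        )
    return dtype


def raises_value_error(value):
    """Report whether validating `value` raises ValueError."""
    try:
        validate_td64_dtype_sketch(value)
    except ValueError:
        return True
    return False
```

With the comparison wrapped this way, the validator needs a single `if`, and callers see one exception type regardless of why the dtype was rejected.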
3 changes: 2 additions & 1 deletion pandas/core/indexes/base.py
@@ -311,7 +311,8 @@ def __new__(cls, data=None, dtype=None, copy=False, name=None,
             elif (is_timedelta64_dtype(data) or
                   (dtype is not None and is_timedelta64_dtype(dtype))):
                 from pandas import TimedeltaIndex
-                result = TimedeltaIndex(data, copy=copy, name=name, **kwargs)
+                result = TimedeltaIndex(data, copy=copy, name=name, dtype=dtype,
+                                        **kwargs)
                 if dtype is not None and _o_dtype == dtype:
                     return Index(result.to_pytimedelta(), dtype=_o_dtype)
                 else:
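The one-line change in this hunk matters because a validator only sees the arguments it is given. The sketch below is a plain-Python illustration, not pandas code: `timedelta_index`, `index_before`, and `index_after` are hypothetical names, and the "before" behaviour is inferred from the removed line, where `dtype` was used to pick the branch but was not passed on.

```python
def timedelta_index(data, dtype=None):
    """Hypothetical constructor that validates the dtype it receives."""
    if dtype is not None and dtype != "timedelta64[ns]":
        raise ValueError("Unexpected value for 'dtype': {!r}".format(dtype))
    return list(data)


def index_before(data, dtype=None):
    # dtype selects the branch but is not forwarded, so it is never validated.
    if dtype is not None and dtype.startswith("timedelta64"):
        return timedelta_index(data)
    return list(data)


def index_after(data, dtype=None):
    # dtype is forwarded, so the constructor's validation now applies.
    if dtype is not None and dtype.startswith("timedelta64"):
        return timedelta_index(data, dtype=dtype)
    return list(data)


# The old dispatch silently accepts a microsecond dtype.
silently_accepted = index_before([1], dtype="timedelta64[us]")

# The new dispatch surfaces the constructor's error.
try:
    index_after([1], dtype="timedelta64[us]")
    forwarded_error = False
except ValueError:
    forwarded_error = True
```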
8 changes: 4 additions & 4 deletions pandas/tests/arrays/test_timedeltas.py
@@ -23,18 +23,18 @@ def test_non_array_raises(self):
             TimedeltaArray([1, 2, 3])

     def test_other_type_raises(self):
-        with pytest.raises(TypeError,
+        with pytest.raises(ValueError,
                            match="dtype bool cannot be converted"):
             TimedeltaArray(np.array([1, 2, 3], dtype='bool'))

     def test_incorrect_dtype_raises(self):
         # TODO: why TypeError for 'category' but ValueError for i8?
-        with pytest.raises(TypeError,
+        with pytest.raises(ValueError,
                            match=r'category cannot be converted '
                                  r'to timedelta64\[ns\]'):
             TimedeltaArray(np.array([1, 2, 3], dtype='i8'), dtype='category')

-        with pytest.raises(TypeError,
+        with pytest.raises(ValueError,
                            match=r"dtype int64 cannot be converted "
                                  r"to timedelta64\[ns\]"):
             TimedeltaArray(np.array([1, 2, 3], dtype='i8'),
@@ -52,7 +52,7 @@ def test_copy(self):

 class TestTimedeltaArray(object):
     def test_from_sequence_dtype(self):
-        msg = r"Only timedelta64\[ns\] dtype is valid"
+        msg = "dtype .*object.* cannot be converted to timedelta64"
         with pytest.raises(ValueError, match=msg):
             TimedeltaArray._from_sequence([], dtype=object)

17 changes: 17 additions & 0 deletions pandas/tests/indexes/datetimes/test_construction.py
@@ -634,6 +634,23 @@ def test_construction_with_nat_and_tzlocal(self):
         expected = DatetimeIndex([Timestamp('2018', tz=tz), pd.NaT])
         tm.assert_index_equal(result, expected)

+    def test_constructor_no_precision_warns(self):
+        # GH-24753, GH-24739
+        expected = pd.DatetimeIndex(['2000'], dtype='datetime64[ns]')
+
+        # we set the stacklevel for DatetimeIndex
+        with tm.assert_produces_warning(FutureWarning):
+            result = pd.DatetimeIndex(['2000'], dtype='datetime64')
+        tm.assert_index_equal(result, expected)
+
+        with tm.assert_produces_warning(FutureWarning, check_stacklevel=False):
+            result = pd.Index(['2000'], dtype='datetime64')
+        tm.assert_index_equal(result, expected)
+
+    def test_constructor_wrong_precision_raises(self):
+        with pytest.raises(ValueError):
+            pd.DatetimeIndex(['2000'], dtype='datetime64[us]')
+

 class TestTimeSeries(object):

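The tests above check the stack level for `pd.DatetimeIndex` but pass `check_stacklevel=False` for `pd.Index`. A fixed `stacklevel` counts frames upward from the `warnings.warn` call, so it can point at the user's line for only one call depth. The sketch below shows this with the standard `warnings` module alone; all function names are hypothetical and the depths are far shallower than in pandas.

```python
import warnings


def validate(dtype):
    # stacklevel=3 skips validate() and constructor(), so the warning is
    # attributed to whoever called constructor().
    warnings.warn("dtype without precision is deprecated",
                  FutureWarning, stacklevel=3)
    return dtype


def constructor(dtype):
    return validate(dtype)


def wrapper(dtype):
    # One extra frame: the same stacklevel now lands here, not on the user.
    return constructor(dtype)


def call_directly():
    return constructor("datetime64")


def call_through_wrapper():
    return wrapper("datetime64")


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    call_directly()
    call_through_wrapper()

# Line numbers the two warnings were attributed to.
direct_line, wrapped_line = caught[0].lineno, caught[1].lineno
```

In this sketch `direct_line` is the line inside `call_directly`, while `wrapped_line` is the line inside `wrapper` rather than the one in `call_through_wrapper`. A warning that reaches the validator through an extra layer is attributed one frame short of user code, which is presumably why the `pd.Index` test does not assert on the stack level.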
17 changes: 17 additions & 0 deletions pandas/tests/indexes/timedeltas/test_construction.py
@@ -180,3 +180,20 @@ def test_constructor_name(self):
         # GH10025
         idx2 = TimedeltaIndex(idx, name='something else')
         assert idx2.name == 'something else'
+
+    def test_constructor_no_precision_warns(self):
+        # GH-24753, GH-24739
+        expected = pd.TimedeltaIndex(['2000'], dtype='timedelta64[ns]')
+
+        # we set the stacklevel for DatetimeIndex
+        with tm.assert_produces_warning(FutureWarning):
+            result = pd.TimedeltaIndex(['2000'], dtype='timedelta64')
+        tm.assert_index_equal(result, expected)
+
+        with tm.assert_produces_warning(FutureWarning, check_stacklevel=False):
+            result = pd.Index(['2000'], dtype='timedelta64')
+        tm.assert_index_equal(result, expected)
+
+    def test_constructor_wrong_precision_raises(self):
+        with pytest.raises(ValueError):
+            pd.TimedeltaIndex(['2000'], dtype='timedelta64[us]')