API: to_datetime, required unit with numerical (#15836) #15896

Closed: wants to merge 1 commit

80 changes: 78 additions & 2 deletions doc/source/whatsnew/v0.21.0.txt
@@ -131,13 +131,89 @@ Other Enhancements
- :func:`pd.read_sas()` now recognizes many more of the most frequently used date (datetime) formats in SAS7BDAT files (:issue:`15871`).
- :func:`DataFrame.items` and :func:`Series.items` are now present in both Python 2 and 3 and are lazy in all cases (:issue:`13918`, :issue:`17213`)



.. _whatsnew_0210.api_breaking:

Backwards incompatible API changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. _whatsnew_0210.api_breaking.pandas_to_datetime:

Numerical values need an explicit unit in pd.to_datetime
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- :func:`to_datetime` now requires an explicit ``unit`` when ``arg`` is numeric (a number, or a non-empty list-like of numbers) and no ``format`` is given. Without a unit it raises a ``ValueError`` instead of silently assuming nanoseconds (:issue:`15836`). For example:

Previous behaviour:

.. code-block:: ipython

   In [1]: pd.to_datetime(42)
   Out[1]: Timestamp('1970-01-01 00:00:00.000000042')

New behaviour:

.. code-block:: ipython

   In [1]: pd.to_datetime(42)
   ---------------------------------------------------------------------------
   ValueError                                Traceback (most recent call last)
   ...
   ValueError: a unit is required in case of numerical arg

   In [2]: pd.to_datetime(42, unit='ns')
   Out[2]: Timestamp('1970-01-01 00:00:00.000000042')
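The rule can be made concrete outside pandas with a minimal stdlib-only sketch. It assumes numeric epoch offsets in the units listed in the ``to_datetime`` docstring (D, s, ms, us, ns). ``to_epoch_ns`` and ``to_datetime_utc`` are hypothetical helpers, not pandas APIs. The checks run in the same order as the patch: a missing unit is reported first with the same ``ValueError`` message, and only then is a bool rejected with ``TypeError``.

```python
import datetime as dt

# Hypothetical lookup table (not a pandas API): nanoseconds per unit.
_NS_PER_UNIT = {
    "D": 86_400 * 10**9,
    "s": 10**9,
    "ms": 10**6,
    "us": 10**3,
    "ns": 1,
}


def to_epoch_ns(value, unit=None):
    """Convert a numeric epoch offset to integer nanoseconds since 1970-01-01.

    Sketch of the proposed rule: a numeric value without an explicit unit
    is rejected instead of being silently read as nanoseconds.
    """
    # bool passes this check because bool is a subclass of int in Python.
    if not isinstance(value, (int, float)):
        raise TypeError(f"{value!r} is not numeric")
    if unit is None:
        raise ValueError("a unit is required in case of numerical arg")
    if isinstance(value, bool):
        raise TypeError("bool is not convertible to datetime")
    try:
        factor = _NS_PER_UNIT[unit]
    except KeyError:
        raise ValueError(f"unsupported unit {unit!r}") from None
    return round(value * factor)


def to_datetime_utc(value, unit=None):
    """Same conversion as an aware datetime (truncated to microseconds)."""
    ns = to_epoch_ns(value, unit)
    epoch = dt.datetime(1970, 1, 1, tzinfo=dt.timezone.utc)
    return epoch + dt.timedelta(microseconds=ns // 1000)
```

For example, ``to_epoch_ns(42, unit='ns')`` returns ``42``, ``to_epoch_ns(42)`` raises ``ValueError``, and ``to_datetime_utc(1, unit='D')`` is midnight UTC on 1970-01-02.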

This change also fixes a bug with boolean values, which were previously converted as if they were the integers 0 and 1:

Previous behaviour:

.. code-block:: ipython

   In [1]: pd.to_datetime(True, unit='ms')
   Out[1]: Timestamp('1970-01-01 00:00:00.001000')

New behaviour:

.. code-block:: ipython

   In [2]: pd.to_datetime(True, unit='ms')
   ---------------------------------------------------------------------------
   TypeError                                 Traceback (most recent call last)
   ...
   TypeError: bool is not convertible to datetime

Boolean values now raise a ``TypeError`` whenever ``errors='raise'`` (the default).

.. _whatsnew_0210.api_breaking.deps:

19 changes: 17 additions & 2 deletions pandas/_libs/tslib.pyx
@@ -31,7 +31,7 @@ cdef extern from "Python.h":
from libc.stdlib cimport free

from util cimport (is_integer_object, is_float_object, is_datetime64_object,
is_timedelta64_object, INT64_MAX)
is_bool_object, is_timedelta64_object, INT64_MAX)
cimport util

# this is our datetime.pxd
@@ -2242,6 +2242,9 @@ cpdef array_with_unit_to_datetime(ndarray values, unit, errors='coerce'):
m = cast_from_unit(None, unit)

if is_raise:
if np.issubdtype(values.dtype, np.bool_):
raise TypeError("{0} is not convertible to datetime"
.format(values.dtype))

# try a quick conversion to i8
# if we have nulls that are not type-compat
@@ -2277,6 +2280,16 @@ cpdef array_with_unit_to_datetime(ndarray values, unit, errors='coerce'):
if _checknull_with_nat(val):
iresult[i] = NPY_NAT

elif is_bool_object(val):
if is_raise:
raise TypeError(
"{0} is not convertible to datetime"
.format(values.dtype)
)
elif is_ignore:
raise AssertionError
iresult[i] = NPY_NAT

elif is_integer_object(val) or is_float_object(val):

if val != val or val == NPY_NAT:
@@ -2320,7 +2333,7 @@ cpdef array_with_unit_to_datetime(ndarray values, unit, errors='coerce'):
else:

if is_raise:
raise ValueError("non convertible value {0}"
raise ValueError("non convertible value {0} "
"with the unit '{1}'".format(
val,
unit))
@@ -2344,6 +2357,8 @@ cpdef array_with_unit_to_datetime(ndarray values, unit, errors='coerce'):

if _checknull_with_nat(val):
oresult[i] = NaT
elif is_bool_object(val):
oresult[i] = val
elif is_integer_object(val) or is_float_object(val):

if val != val or val == NPY_NAT:
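The patched loop in ``array_with_unit_to_datetime`` handles booleans according to the ``errors`` mode:

- ``'raise'`` raises ``TypeError``.
- ``'coerce'`` stores ``NaT``.
- ``'ignore'`` falls back to an object result in which the bool is kept unchanged.

Below is a rough pure-Python model of that policy. It assumes ``None`` stands in for ``NaT`` and integer nanoseconds stand in for ``Timestamp`` objects. ``convert_with_unit`` is a hypothetical name, not part of pandas.

```python
import math

# Hypothetical table: nanoseconds per unit (D, s, ms, us, ns).
_NS_PER_UNIT = {"D": 86_400 * 10**9, "s": 10**9, "ms": 10**6, "us": 10**3, "ns": 1}


def convert_with_unit(values, unit, errors="raise"):
    """Model of the per-element error policy; None plays the role of NaT."""
    if errors not in ("raise", "coerce", "ignore"):
        raise ValueError(f"invalid errors value {errors!r}")
    factor = _NS_PER_UNIT[unit]
    result = []
    for val in values:
        if val is None or (isinstance(val, float) and math.isnan(val)):
            result.append(None)  # nulls become NaT in every mode
        elif isinstance(val, bool):
            # Checked before the number branch because bool subclasses int.
            if errors == "raise":
                raise TypeError("bool is not convertible to datetime")
            # 'ignore' keeps the original value; 'coerce' gives NaT.
            result.append(val if errors == "ignore" else None)
        elif isinstance(val, (int, float)):
            result.append(round(val * factor))
        else:
            if errors == "raise":
                raise ValueError(f"non convertible value {val} with the unit '{unit}'")
            result.append(val if errors == "ignore" else None)
    return result
```

With ``errors='coerce'``, ``[True, 2]`` in milliseconds becomes ``[None, 2000000]``. With ``errors='ignore'``, the bool survives as ``True``.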
8 changes: 5 additions & 3 deletions pandas/core/dtypes/cast.py
@@ -155,7 +155,7 @@ def trans(x): # noqa
if dtype.tz:
# convert to datetime and change timezone
from pandas import to_datetime
result = to_datetime(result).tz_localize('utc')
result = to_datetime(result, unit='ns').tz_localize('utc')
result = result.tz_convert(dtype.tz)

except:
@@ -963,11 +963,13 @@ def maybe_cast_to_datetime(value, dtype, errors='raise'):
dtype):
try:
if is_datetime64:
value = to_datetime(value, errors=errors)._values
value = to_datetime(value, unit='ns',
errors=errors)._values
elif is_datetime64tz:
# input has to be UTC at this point, so just
# localize
value = (to_datetime(value, errors=errors)
value = (to_datetime(value, unit='ns',
errors=errors)
.tz_localize('UTC')
.tz_convert(dtype.tz)
)
3 changes: 2 additions & 1 deletion pandas/core/indexes/datetimes.py
@@ -277,6 +277,7 @@ def __new__(cls, data=None,

dayfirst = kwargs.pop('dayfirst', None)
yearfirst = kwargs.pop('yearfirst', None)
unit = kwargs.pop('unit', None)

freq_infer = False
if not isinstance(freq, DateOffset):
@@ -333,7 +334,7 @@ def __new__(cls, data=None,
if not (is_datetime64_dtype(data) or is_datetimetz(data) or
is_integer_dtype(data)):
data = tools.to_datetime(data, dayfirst=dayfirst,
yearfirst=yearfirst)
unit=unit, yearfirst=yearfirst)

if issubclass(data.dtype.type, np.datetime64) or is_datetimetz(data):

15 changes: 11 additions & 4 deletions pandas/core/tools/datetimes.py
@@ -235,7 +235,7 @@ def to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False,
- If True, require an exact format match.
- If False, allow the format to match anywhere in the target string.

unit : string, default 'ns'
unit : string, default None
unit of the arg (D,s,ms,us,ns) denote the unit, which is an
integer or float number. This will be based off the origin.
Example, with unit='ms' and origin='unix' (the default), this
@@ -342,6 +342,7 @@ def to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False,
pandas.to_timedelta : Convert argument to timedelta.
"""
from pandas.core.indexes.datetimes import DatetimeIndex
from pandas.core.frame import DataFrame

tz = 'utc' if utc else None

@@ -451,8 +452,15 @@ def _convert_listlike(arg, box, format, name=None, tz=tz):
except (ValueError, TypeError):
raise e

def check_numerical_arg():
return ((is_scalar(arg) and (is_integer(arg) or is_float(arg))) or
(is_numeric_dtype(np.asarray(arg)) and np.asarray(arg).size))

Contributor: why are you adding the .size check?

Contributor Author: Without it, pd.to_datetime([]) raises ValueError("a unit is required in case of numerical arg"), which breaks the behaviour some other tests expect (for example pandas/tests/test_resample.py::TestPeriodIndex::test_resample_empty_dtypes).

Contributor: ok, then just do len(arg) instead (it's by definition already an iterable here)

Contributor Author: It can be a scalar argument too (example: pd.to_datetime(2, unit='ns'); arg = 2 here), so I have to do a np.asarray.

Contributor: no it can't, that would hit the first clause (the is_scalar)

Contributor Author: You're right, wrong example. It is bool elements that go wrong:

In [8]: pd.to_datetime(True, unit='ns')
TypeError: object of type 'bool' has no len()

if arg is None:
return None
elif ((not isinstance(arg, DataFrame)) and
(check_numerical_arg() and unit is None and format is None)):
raise ValueError("a unit is required in case of numerical arg")

# handle origin
if origin == 'julian':
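``check_numerical_arg()`` decides when the missing-unit error fires. Below is a stdlib-only approximation that assumes plain Python sequences in place of NumPy arrays. ``is_numerical_arg`` and ``unit_is_missing`` are hypothetical helpers.

As in the patch, bools count as numeric, because NumPy's bool dtype passes ``is_numeric_dtype``. An empty sequence does not count, which is the effect of the ``.size`` guard discussed in review. The patch also skips the check for DataFrame input, and this sketch does not model that.

```python
from collections.abc import Mapping, Sequence

# Element/scalar types treated as numeric; bool is included via int.
_NUMERIC = (int, float, complex)


def is_numerical_arg(arg):
    """Approximate the patch's check_numerical_arg() with the stdlib only."""
    if isinstance(arg, (str, bytes, Mapping)):
        return False
    if isinstance(arg, _NUMERIC):
        return True
    if isinstance(arg, Sequence):
        # Mirrors the `.size` guard: an empty sequence is not "numerical",
        # so to_datetime([]) keeps working without a unit.
        return len(arg) > 0 and all(isinstance(x, _NUMERIC) for x in arg)
    return False


def unit_is_missing(arg, unit=None, format=None):
    """True when the proposed to_datetime() would raise 'a unit is required'."""
    return (
        arg is not None
        and is_numerical_arg(arg)
        and unit is None
        and format is None
    )
```

Under these assumptions, ``unit_is_missing(42)`` is true. ``unit_is_missing([])``, ``unit_is_missing(42, unit='s')`` and ``unit_is_missing(['2017-01-01'])`` are all false.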
@@ -479,8 +487,7 @@ def _convert_listlike(arg, box, format, name=None, tz=tz):

# arg must be a numeric
original = arg
if not ((is_scalar(arg) and (is_integer(arg) or is_float(arg))) or
is_numeric_dtype(np.asarray(arg))):
if not check_numerical_arg():
raise ValueError(
"'{arg}' is not compatible with origin='{origin}'; "
"it must be numeric with a unit specified ".format(
@@ -605,7 +612,7 @@ def f(value):
if len(excess):
raise ValueError("extra keys have been passed "
"to the datetime assemblage: "
"[{excess}]".format(','.join(excess=excess)))
"[{}]".format(','.join(excess)))

def coerce(values):
# we allow coercion to if errors allows
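The last hunk fixes a latent bug in the check for extra keys passed to the datetime assemblage. ``str.join()`` accepts no keyword arguments, so the old line raised ``TypeError`` before the intended ``ValueError`` message could be built. The demonstration below uses hypothetical function names.

```python
def excess_keys_message(excess):
    # Fixed form: join the keys first, then format them into the message.
    return "extra keys have been passed to the datetime assemblage: [{}]".format(
        ",".join(excess)
    )


def old_excess_keys_message(excess):
    # Original form: str.join() takes no keyword arguments, so this line
    # raises TypeError instead of producing a message.
    return "[{excess}]".format(",".join(excess=excess))
```

Calling ``excess_keys_message(['foo', 'bar'])`` yields a readable message ending in ``[foo,bar]``. ``old_excess_keys_message`` fails before any message exists.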