-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
API: to_datetime, required unit with numerical (#15836) #15896
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pandas/tseries/tools.py
Outdated
@@ -437,6 +437,10 @@ def _convert_listlike(arg, box, format, name=None, tz=tz): | |||
if arg is None: | |||
return None | |||
|
|||
check_arg = np.array(arg) if np.iterable(arg) else np.array([arg]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do
if unit is None:
....
move the actual is_numeric_dtype / is_scalar check (same one that is in origin) to a function (that you call in both places)
@@ -715,6 +715,13 @@ def test_to_datetime_unprocessable_input(self): | |||
) | |||
self.assertRaises(TypeError, to_datetime, [1, '1'], errors='raise') | |||
|
|||
def test_to_datetime_numerical_input(self): | |||
self.assertRaises(ValueError, to_datetime, arg=int(1000)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use with pytest.raises(...):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can also do this with parametrize: http://pandas-docs.github.io/pandas-docs-travis/contributing.html#how-to-use-parametrize
I took into account what you said, but I still have issues with two tests What behaviour to adopt in those cases ? |
to_datetime(False) | ||
self.assertTrue(to_datetime(False, errors="coerce") is NaT) | ||
self.assertEqual(to_datetime(False, errors="ignore"), False) | ||
to_datetime(False, unit='ns') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ideally you would parametrize this on units to test all the units, see https://github.com/pandas-dev/pandas/blob/master/pandas/tests/indexes/datetimes/test_tools.py#L1520
(move the units fixture to the top of the file)
@@ -715,6 +716,14 @@ def test_to_datetime_unprocessable_input(self): | |||
) | |||
self.assertRaises(TypeError, to_datetime, [1, '1'], errors='raise') | |||
|
|||
@pytest.mark.parametrize('arg', [int(1), float(1), range(5), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move this to a separate class that inherits from object
(to avoid using staticmethod
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
alternatively. We can simply remove the inheritance from tm.TestCase
from this class. It was prob just copy-pasted, and prob not needed. (this is what causes parametrize to not work).
you would need to change most self.assert*
methods to use tm.assert*
methods.
pandas/tseries/tools.py
Outdated
@@ -434,9 +434,17 @@ def _convert_listlike(arg, box, format, name=None, tz=tz): | |||
except (ValueError, TypeError): | |||
raise e | |||
|
|||
def check_numerical_arg(): | |||
return ((is_scalar(arg) and (is_integer(arg) or is_float(arg))) or |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so need to exclude DataFrame
from this check I think.
pandas/tseries/tools.py
Outdated
if arg is None: | ||
return None | ||
|
||
if unit is None and format is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
alternatively:
if unit is not None:
if isinstance(arg, DataFrame):
raise nice message here
elif format is None and check_numerical_arg():
.....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a specific test for passing a DataFrame with a not-None unit as well.
Given the changes that we have to make in our own test suite, we should maybe consider to make this a deprecation first before actually changing? (raising a warning you should provide |
I think its better to actually make this an error now. If you are doing this would almost always don't mean |
f723be4
to
0520b70
Compare
can you rebase |
a411af6
to
904e45c
Compare
Codecov Report
@@ Coverage Diff @@
## master #15896 +/- ##
==========================================
+ Coverage 90.96% 91.1% +0.13%
==========================================
Files 161 161
Lines 49301 49307 +6
==========================================
+ Hits 44849 44923 +74
+ Misses 4452 4384 -68
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #15896 +/- ##
==========================================
+ Coverage 91.01% 91.1% +0.09%
==========================================
Files 163 161 -2
Lines 49570 49307 -263
==========================================
- Hits 45116 44923 -193
+ Misses 4454 4384 -70
Continue to review full report at Codecov.
|
doc/source/whatsnew/v0.21.0.txt
Outdated
@@ -40,6 +40,7 @@ Other Enhancements | |||
- :func:`DataFrame.clip()` and :func:`Series.clip()` have gained an ``inplace`` argument. (:issue:`15388`) | |||
- :func:`crosstab` has gained a ``margins_name`` parameter to define the name of the row / column that will contain the totals when ``margins=True``. (:issue:`15972`) | |||
- :func:`Dataframe.select_dtypes` now accepts scalar values for include/exclude as well as list-like. (:issue:`16855`) | |||
- :func:`to_datetime` requires an unit with numerical arg (:issue:`15836`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is in API breaking changes. say that it will now raise if a numeric arg is provided w/o a unit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also needs a mini-sub-section as this is a non-obvious change. show the previous and new behavior.
pandas/_libs/tslib.pyx
Outdated
@@ -2230,6 +2230,11 @@ cpdef array_with_unit_to_datetime(ndarray values, unit, errors='coerce'): | |||
m = cast_from_unit(None, unit) | |||
|
|||
if is_raise: | |||
from pandas.core.dtypes.common import is_bool_dtype | |||
|
|||
if is_bool_dtype(values): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would raise on non-numeric instead here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I raise only on non-numeric, boolean values go through and do not raise error. Plus, it causes issues with list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, instead of using is_bool_dtype
here, can do issubclass(isvalues.dtype, np.bool_)
is slightly more idiomatic here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I use np.issubdtype(values.dtype, np.bool_) instead of is_bool_dtype
pandas/core/tools/datetimes.py
Outdated
@@ -234,7 +234,7 @@ def to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False, | |||
- If True, require an exact format match. | |||
- If False, allow the format to match anywhere in the target string. | |||
|
|||
unit : string, default 'ns' | |||
unit : string | |||
unit of the arg (D,s,ms,us,ns) denote the unit, which is an | |||
integer or float number. This will be based off the origin. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add to the raises section in doc-string that a ValueError on non-numeric input when a unit is specified
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is default to None
@@ -446,8 +447,15 @@ def _convert_listlike(arg, box, format, name=None, tz=tz): | |||
except (ValueError, TypeError): | |||
raise e | |||
|
|||
def check_numerical_arg(): | |||
return ((is_scalar(arg) and (is_integer(arg) or is_float(arg))) or |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are you adding the .size check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without it, pd.to_datetime([])
raises a ValueError("a unit is required in case of numerical arg") which is not the actual behaviour in some others tests (pandas/tests/test_resample.py::TestPeriodIndex::test_resample_empty_dtypes
for example).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, then just do len(arg) instead (its by-definition already an interable here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can be a scalar argument too (example: pd.to_datetime(2, unit='ns')
; arg = 2 here), i have to do a np.asarray
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no it can't that would hit the first clause (the is_scalar)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, wrong example. It is bool element that go wrong.
In [8]: pd.to_datetime(True, unit='ns')
TypeError: object of type 'bool' has no len()
can you rebase and update to comments. |
All numerical types now need a unit in to_datetime call. * Update tests * Add bool verification in tslib.pyx * Bug correction with bool and unit in ['ms', 'us', 's', 'D'] * to_datetime(True, unit='ms') did run without error * to_datetime(True, unit='ms', errors='ignore') returned None Issue: pandas-dev#15836
can you rebase and update |
closing as stale. if you'd like to continue pls ping. |
git diff upstream/master --name-only -- '*.py' | flake8 --diff