Skip to content

API: to_datetime, required unit with numerical (#15836) #15896

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

atbd
Copy link
Contributor

@atbd atbd commented Apr 4, 2017

  • add test_to_datetime_numerical_input
  • check arg for numerical type

@@ -437,6 +437,10 @@ def _convert_listlike(arg, box, format, name=None, tz=tz):
if arg is None:
return None

check_arg = np.array(arg) if np.iterable(arg) else np.array([arg])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do

if unit is None:
    ....

move the actual is_numeric_dtype / is_scalar check (same one that is in origin) to a function (that you call in both places)

@@ -715,6 +715,13 @@ def test_to_datetime_unprocessable_input(self):
)
self.assertRaises(TypeError, to_datetime, [1, '1'], errors='raise')

def test_to_datetime_numerical_input(self):
self.assertRaises(ValueError, to_datetime, arg=int(1000))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use with pytest.raises(...):

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@atbd
Copy link
Contributor Author

atbd commented Apr 5, 2017

I took into account what you said, but I still have issues with two tests test_dataframe_dtypes and test_dataframe which do not need unit to work.
The units are directly in the keys of the structures.

What behaviour to adopt in those cases ?

to_datetime(False)
self.assertTrue(to_datetime(False, errors="coerce") is NaT)
self.assertEqual(to_datetime(False, errors="ignore"), False)
to_datetime(False, unit='ns')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ideally you would parametrize this on units to test all the units, see https://github.com/pandas-dev/pandas/blob/master/pandas/tests/indexes/datetimes/test_tools.py#L1520

(move the units fixture to the top of the file)

@@ -715,6 +716,14 @@ def test_to_datetime_unprocessable_input(self):
)
self.assertRaises(TypeError, to_datetime, [1, '1'], errors='raise')

@pytest.mark.parametrize('arg', [int(1), float(1), range(5),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this to a separate class that inherits from object (to avoid using staticmethod).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alternatively. We can simply remove the inheritance from tm.TestCase from this class. It was prob just copy-pasted, and prob not needed. (this is what causes parametrize to not work).

you would need to change most self.assert* methods to use tm.assert* methods.

@@ -434,9 +434,17 @@ def _convert_listlike(arg, box, format, name=None, tz=tz):
except (ValueError, TypeError):
raise e

def check_numerical_arg():
return ((is_scalar(arg) and (is_integer(arg) or is_float(arg))) or
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so need to exclude DataFrame from this check I think.

if arg is None:
return None

if unit is None and format is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alternatively:

if unit is not None:
    if isinstance(arg, DataFrame):
         raise nice message here
    elif format is None and check_numerical_arg():
        .....

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a specific test for passing a DataFrame with a not-None unit as well.

@jorisvandenbossche
Copy link
Member

Given the changes that we have to make in our own test suite, we should maybe consider to make this a deprecation first before actually changing? (raising a warning you should provide unit, and in future not providing will raise?)

@jreback
Copy link
Contributor

jreback commented Apr 6, 2017

I think its better to actually make this an error now. If you are doing this would almost always don't mean ns, so that is the only case that is really affected. Otherwise its an error (not so obvious though), so better to fail this. IOW, this is a bug, not an API change.

@jreback jreback added API Design Bug Datetime Datetime data dtype labels Apr 6, 2017
@atbd atbd force-pushed the 15836 branch 2 times, most recently from f723be4 to 0520b70 Compare April 25, 2017 13:52
@atbd atbd changed the title [WIP] API: to_datetime, required unit with numerical (#15836) API: to_datetime, required unit with numerical (#15836) Apr 25, 2017
@jreback
Copy link
Contributor

jreback commented May 13, 2017

can you rebase

@codecov
Copy link

codecov bot commented Jul 12, 2017

Codecov Report

Merging #15896 into master will increase coverage by 0.13%.
The diff coverage is 90%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #15896      +/-   ##
==========================================
+ Coverage   90.96%    91.1%   +0.13%     
==========================================
  Files         161      161              
  Lines       49301    49307       +6     
==========================================
+ Hits        44849    44923      +74     
+ Misses       4452     4384      -68
Flag Coverage Δ
#multiple 88.88% <90%> (+0.15%) ⬆️
#single 40.2% <60%> (-0.06%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/datetimes.py 95.33% <100%> (ø) ⬆️
pandas/core/dtypes/cast.py 86.89% <100%> (ø) ⬆️
pandas/core/tools/datetimes.py 85.2% <83.33%> (+18.26%) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.71% <0%> (-0.1%) ⬇️
pandas/plotting/_converter.py 65.05% <0%> (+1.81%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a9421af...904e45c. Read the comment docs.

@codecov
Copy link

codecov bot commented Jul 12, 2017

Codecov Report

Merging #15896 into master will increase coverage by 0.09%.
The diff coverage is 90%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #15896      +/-   ##
==========================================
+ Coverage   91.01%    91.1%   +0.09%     
==========================================
  Files         163      161       -2     
  Lines       49570    49307     -263     
==========================================
- Hits        45116    44923     -193     
+ Misses       4454     4384      -70
Flag Coverage Δ
#multiple 88.88% <90%> (+0.1%) ⬆️
#single 40.2% <60%> (-0.12%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/datetimes.py 95.33% <100%> (ø) ⬆️
pandas/core/dtypes/cast.py 86.89% <100%> (-0.93%) ⬇️
pandas/core/tools/datetimes.py 85.2% <83.33%> (+18.17%) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/indexes/accessors.py 88.09% <0%> (-1.16%) ⬇️
pandas/core/internals.py 93.43% <0%> (-0.87%) ⬇️
pandas/util/_decorators.py 65.3% <0%> (-0.7%) ⬇️
pandas/plotting/_timeseries.py 60.2% <0%> (-0.53%) ⬇️
pandas/core/sparse/frame.py 94.26% <0%> (-0.45%) ⬇️
... and 66 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 77bfe21...3a7b1c0. Read the comment docs.

@@ -40,6 +40,7 @@ Other Enhancements
- :func:`DataFrame.clip()` and :func:`Series.clip()` have gained an ``inplace`` argument. (:issue:`15388`)
- :func:`crosstab` has gained a ``margins_name`` parameter to define the name of the row / column that will contain the totals when ``margins=True``. (:issue:`15972`)
- :func:`Dataframe.select_dtypes` now accepts scalar values for include/exclude as well as list-like. (:issue:`16855`)
- :func:`to_datetime` requires an unit with numerical arg (:issue:`15836`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is in API breaking changes. say that it will now raise if a numeric arg is provided w/o a unit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also needs a mini-sub-section as this is a non-obvious change. show the previous and new behavior.

@@ -2230,6 +2230,11 @@ cpdef array_with_unit_to_datetime(ndarray values, unit, errors='coerce'):
m = cast_from_unit(None, unit)

if is_raise:
from pandas.core.dtypes.common import is_bool_dtype

if is_bool_dtype(values):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would raise on non-numeric instead here

Copy link
Contributor Author

@atbd atbd Jul 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I raise only on non-numeric, boolean values go through and do not raise error. Plus, it causes issues with list.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, instead of using is_bool_dtype here, can do issubclass(isvalues.dtype, np.bool_) is slightly more idiomatic here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I use np.issubdtype(values.dtype, np.bool_) instead of is_bool_dtype

@@ -234,7 +234,7 @@ def to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False,
- If True, require an exact format match.
- If False, allow the format to match anywhere in the target string.

unit : string, default 'ns'
unit : string
unit of the arg (D,s,ms,us,ns) denote the unit, which is an
integer or float number. This will be based off the origin.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add to the raises section in doc-string that a ValueError on non-numeric input when a unit is specified

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is default to None

@@ -446,8 +447,15 @@ def _convert_listlike(arg, box, format, name=None, tz=tz):
except (ValueError, TypeError):
raise e

def check_numerical_arg():
return ((is_scalar(arg) and (is_integer(arg) or is_float(arg))) or
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are you adding the .size check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without it, pd.to_datetime([]) raises a ValueError("a unit is required in case of numerical arg") which is not the actual behaviour in some others tests (pandas/tests/test_resample.py::TestPeriodIndex::test_resample_empty_dtypes for example).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, then just do len(arg) instead (its by-definition already an interable here)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be a scalar argument too (example: pd.to_datetime(2, unit='ns') ; arg = 2 here), i have to do a np.asarray

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no it can't that would hit the first clause (the is_scalar)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, wrong example. It is bool element that go wrong.

In [8]: pd.to_datetime(True, unit='ns')
TypeError: object of type 'bool' has no len()

@jreback
Copy link
Contributor

jreback commented Aug 17, 2017

can you rebase and update to comments.

All numerical types now need a unit in to_datetime call.

* Update tests
* Add bool verification in tslib.pyx
* Bug correction with bool and unit in ['ms', 'us', 's', 'D']
  * to_datetime(True, unit='ms') did run without error
  * to_datetime(True, unit='ms', errors='ignore') returned None

Issue: pandas-dev#15836
@jreback
Copy link
Contributor

jreback commented Oct 28, 2017

can you rebase and update

@jreback
Copy link
Contributor

jreback commented Nov 12, 2017

closing as stale. if you'd like to continue pls ping.

@jreback jreback closed this Nov 12, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design Bug Datetime Datetime data dtype
Projects
None yet
Development

Successfully merging this pull request may close these issues.

API: to_datetime with ordinals and no unit
3 participants