Skip to content

BUG: Restrict DTA to 1D #27027

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jun 27, 2019
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 13 additions & 6 deletions pandas/core/algorithms.py
Original file line number Diff line number Diff line change
Expand Up @@ -16,12 +16,12 @@
from pandas.core.dtypes.common import (
ensure_float64, ensure_int64, ensure_object, ensure_platform_int,
ensure_uint64, is_array_like, is_bool_dtype, is_categorical_dtype,
is_complex_dtype, is_datetime64_any_dtype, is_datetime64tz_dtype,
is_datetimelike, is_extension_array_dtype, is_float_dtype, is_integer,
is_integer_dtype, is_interval_dtype, is_list_like, is_numeric_dtype,
is_object_dtype, is_period_dtype, is_scalar, is_signed_integer_dtype,
is_sparse, is_timedelta64_dtype, is_unsigned_integer_dtype,
needs_i8_conversion)
is_complex_dtype, is_datetime64_any_dtype, is_datetime64_ns_dtype,
is_datetime64tz_dtype, is_datetimelike, is_extension_array_dtype,
is_float_dtype, is_integer, is_integer_dtype, is_interval_dtype,
is_list_like, is_numeric_dtype, is_object_dtype, is_period_dtype,
is_scalar, is_signed_integer_dtype, is_sparse, is_timedelta64_dtype,
is_unsigned_integer_dtype, needs_i8_conversion)
from pandas.core.dtypes.generic import ABCIndex, ABCIndexClass, ABCSeries
from pandas.core.dtypes.missing import isna, na_value_for_dtype

Expand Down Expand Up @@ -104,6 +104,13 @@ def _ensure_data(values, dtype=None):
dtype = values.dtype
else:
# Datetime
if values.ndim > 1 and is_datetime64_ns_dtype(values):
# Avoid calling the DatetimeIndex constructor as it is 1D only
# Note: this is reached by DataFrame.rank calls GH#27027
asi8 = values.view('i8')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that ensure_data should only take 1d input at all times. Is there a case where it does not? (nb we shoul dprob document / type this)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this gets called with 2D values from DataFrame.rank.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know if it is only rank? Because if so, it might be useful to add that as a comment for somebody later reading the code and wondering the same question where 2D things are passed to this.

dtype = values.dtype
return asi8, dtype, 'int64'

from pandas import DatetimeIndex
values = DatetimeIndex(values)
dtype = values.dtype
Expand Down
2 changes: 2 additions & 0 deletions pandas/core/arrays/datetimes.py
Original file line number Diff line number Diff line change
Expand Up @@ -309,6 +309,8 @@ def __init__(self, values, dtype=_NS_DTYPE, freq=None, copy=False):
"ndarray, or Series or Index containing one of those."
)
raise ValueError(msg.format(type(values).__name__))
if values.ndim != 1:
raise ValueError("Only 1-dimensional input arrays are supported.")

if values.dtype == 'i8':
# for compat with datetime/timedelta/period shared methods,
Expand Down
8 changes: 8 additions & 0 deletions pandas/io/formats/format.py
Original file line number Diff line number Diff line change
Expand Up @@ -1273,6 +1273,8 @@ def format_percentiles(percentiles):

def _is_dates_only(values):
# return a boolean if we are only dates (and don't have a timezone)
assert values.ndim == 1

values = DatetimeIndex(values)
if values.tz is not None:
return False
Expand Down Expand Up @@ -1324,6 +1326,12 @@ def _get_format_datetime64(is_dates_only, nat_rep='NaT', date_format=None):

def _get_format_datetime64_from_values(values, date_format):
""" given values and a date_format, return a string format """

if isinstance(values, np.ndarray) and values.ndim > 1:
# We don't actaully care about the order of values, and DatetimeIndex
# only accepts 1D values
values = values.ravel()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would even move it down one step further to to_native_types (there you don't need the if check, other part of the code there is already doing ravel() as well), but no strong feelings if you want to keep it here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


is_dates_only = _is_dates_only(values)
if is_dates_only:
return date_format or "%Y-%m-%d"
Expand Down
12 changes: 12 additions & 0 deletions pandas/tests/arrays/test_datetimes.py
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,18 @@


class TestDatetimeArrayConstructor:

def test_only_1dim_accepted(self):
arr = np.array([0, 1, 2, 3], dtype='M8[h]').astype('M8[ns]')

with pytest.raises(ValueError, match="Only 1-dimensional"):
# 2-dim
DatetimeArray(arr.reshape(2, 2))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear: this already fails currently, right? You are mainly catching the error early / providing a better error message ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is currently accepted incorrectly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With latest master:

In [15]: pd.__version__
Out[15]: '0.25.0.dev0+791.gf0919f272'

In [16]: arr = np.array([0, 1, 2, 3], dtype='M8[h]').astype('M8[ns]') 

In [17]: pd.arrays.DatetimeArray(arr.reshape(2, 2)) 
...
~/scipy/pandas/pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_to_tsobject()

TypeError: Cannot convert input [['1970-01-01T00:00:00.000000000' '1970-01-01T01:00:00.000000000']] of type <class 'numpy.ndarray'> to Timestamp

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try changing [17] to res = pd.arrays.DatetimeArray(arr.reshape(2, 2)). I'm pretty sure that error is coming from an attempt to call __repr__.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah .. yes :-)


with pytest.raises(ValueError, match="Only 1-dimensional"):
# 0-dim
DatetimeArray(arr[[0]].squeeze())

def test_freq_validation(self):
# GH#24623 check that invalid instances cannot be created with the
# public constructor
Expand Down