-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Restrict DTA to 1D #27027
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Restrict DTA to 1D #27027
Changes from 1 commit
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -104,6 +104,12 @@ def _ensure_data(values, dtype=None): | |
dtype = values.dtype | ||
else: | ||
# Datetime | ||
if values.ndim > 1: | ||
# Avoid calling the DatetimeIndex constructor as it is 1D only | ||
asi8 = values.view('i8') | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I believe that ensure_data should only take 1d input at all times. Is there a case where it does not? (nb we shoul dprob document / type this) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, this gets called with 2D values from There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do you know if it is only rank? Because if so, it might be useful to add that as a comment for somebody later reading the code and wondering the same question where 2D things are passed to this. |
||
dtype = values.dtype | ||
return asi8, dtype, 'int64' | ||
|
||
from pandas import DatetimeIndex | ||
jreback marked this conversation as resolved.
Show resolved
Hide resolved
|
||
values = DatetimeIndex(values) | ||
dtype = values.dtype | ||
|
Original file line number | Diff line number | Diff line change | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -1273,6 +1273,11 @@ def format_percentiles(percentiles): | |||||||||||
|
||||||||||||
def _is_dates_only(values): | ||||||||||||
# return a boolean if we are only dates (and don't have a timezone) | ||||||||||||
if isinstance(values, np.ndarray) and values.ndim > 1: | ||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In which case do you run into this? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Per above, this is hit with 2D ndarray inputs, which ATM are incorrectly accepted but will now raise. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, but my question is: when is this actually hit with 2D ndarray input? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. same question as @jorisvandenbossche There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Just added an assertion for 1D-ness (in master) and the first test that failed is effectively:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. OK. I would rather do a ravel in the code calling it: pandas/pandas/core/internals/blocks.py Lines 2171 to 2175 in f0919f2
as it is also done for the actual formatting function right below. In fact, this is also kind of a bug in our formatting. As the formatting should be done column by column (the frequency of one column should not influence the formatting of another column) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. OK. I think is_dates_only is only called with non-ravelled data in one place, so I can move the maybe ravel there and put an assertion in is_dates_only. Is there anything else that should go along with that? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. did you update this? |
||||||||||||
# We don't actaully care about the order of values, and DatetimeIndex | ||||||||||||
# only accepts 1D values | ||||||||||||
values = values.ravel() | ||||||||||||
|
||||||||||||
values = DatetimeIndex(values) | ||||||||||||
if values.tz is not None: | ||||||||||||
return False | ||||||||||||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -15,6 +15,18 @@ | |
|
||
|
||
class TestDatetimeArrayConstructor: | ||
|
||
def test_only_1dim_accepted(self): | ||
arr = np.array([0, 1, 2, 3], dtype='M8[h]').astype('M8[ns]') | ||
|
||
with pytest.raises(ValueError, match="Only 1-dimensional"): | ||
# 2-dim | ||
DatetimeArray(arr.reshape(2, 2)) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. To be clear: this already fails currently, right? You are mainly catching the error early / providing a better error message ? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No, this is currently accepted incorrectly. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. With latest master:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Try changing [17] to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ah .. yes :-) |
||
|
||
with pytest.raises(ValueError, match="Only 1-dimensional"): | ||
# 0-dim | ||
DatetimeArray(arr[[0]].squeeze()) | ||
|
||
def test_freq_validation(self): | ||
# GH#24623 check that invalid instances cannot be created with the | ||
# public constructor | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what exactly hits this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DataFrame.rank
with all-datetime64 columns. #27015 has a terrible terrible hack instead of this 5-line workaround,.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we haven't reviewed #27015 not averse, just want to avoid hacks on hacks here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yah, this is distinctly the non-hack solution.