-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Restrict DTA to 1D #27027
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Restrict DTA to 1D #27027
Conversation
Codecov Report
@@ Coverage Diff @@
## master #27027 +/- ##
==========================================
- Coverage 91.99% 91.99% -0.01%
==========================================
Files 180 180
Lines 50774 50782 +8
==========================================
+ Hits 46712 46716 +4
- Misses 4062 4066 +4
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #27027 +/- ##
==========================================
+ Coverage 91.99% 92.04% +0.04%
==========================================
Files 180 180
Lines 50774 50723 -51
==========================================
- Hits 46712 46688 -24
+ Misses 4062 4035 -27
Continue to review full report at Codecov.
|
|
||
with pytest.raises(ValueError, match="Only 1-dimensional"): | ||
# 2-dim | ||
DatetimeArray(arr.reshape(2, 2)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear: this already fails currently, right? You are mainly catching the error early / providing a better error message ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this is currently accepted incorrectly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With latest master:
In [15]: pd.__version__
Out[15]: '0.25.0.dev0+791.gf0919f272'
In [16]: arr = np.array([0, 1, 2, 3], dtype='M8[h]').astype('M8[ns]')
In [17]: pd.arrays.DatetimeArray(arr.reshape(2, 2))
...
~/scipy/pandas/pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_to_tsobject()
TypeError: Cannot convert input [['1970-01-01T00:00:00.000000000' '1970-01-01T01:00:00.000000000']] of type <class 'numpy.ndarray'> to Timestamp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try changing [17] to res = pd.arrays.DatetimeArray(arr.reshape(2, 2))
. I'm pretty sure that error is coming from an attempt to call __repr__
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah .. yes :-)
pandas/io/formats/format.py
Outdated
@@ -1273,6 +1273,11 @@ def format_percentiles(percentiles): | |||
|
|||
def _is_dates_only(values): | |||
# return a boolean if we are only dates (and don't have a timezone) | |||
if isinstance(values, np.ndarray) and values.ndim > 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In which case do you run into this?
(I was assuming the format_array is working column by column)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per above, this is hit with 2D ndarray inputs, which ATM are incorrectly accepted but will now raise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but my question is: when is this actually hit with 2D ndarray input?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same question as @jorisvandenbossche
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just added an assertion for 1D-ness (in master) and the first test that failed is effectively:
pd.DataFrame({"A": pd.date_range('2016-01-01', periods=3)}).to_csv()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I would rather do a ravel in the code calling it:
pandas/pandas/core/internals/blocks.py
Lines 2171 to 2175 in f0919f2
fmt = _get_format_datetime64_from_values(values, date_format) | |
result = tslib.format_array_from_datetime( | |
i8values.ravel(), tz=getattr(self.values, 'tz', None), | |
format=fmt, na_rep=na_rep).reshape(i8values.shape) |
as it is also done for the actual formatting function right below.
In fact, this is also kind of a bug in our formatting. As the formatting should be done column by column (the frequency of one column should not influence the formatting of another column)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I think is_dates_only is only called with non-ravelled data in one place, so I can move the maybe ravel there and put an assertion in is_dates_only. Is there anything else that should go along with that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did you update this?
pandas/core/algorithms.py
Outdated
@@ -104,6 +104,12 @@ def _ensure_data(values, dtype=None): | |||
dtype = values.dtype | |||
else: | |||
# Datetime | |||
if values.ndim > 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what exactly hits this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we haven't reviewed #27015 not averse, just want to avoid hacks on hacks here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yah, this is distinctly the non-hack solution.
pandas/io/formats/format.py
Outdated
@@ -1273,6 +1273,11 @@ def format_percentiles(percentiles): | |||
|
|||
def _is_dates_only(values): | |||
# return a boolean if we are only dates (and don't have a timezone) | |||
if isinstance(values, np.ndarray) and values.ndim > 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same question as @jorisvandenbossche
@@ -104,6 +104,12 @@ def _ensure_data(values, dtype=None): | |||
dtype = values.dtype | |||
else: | |||
# Datetime | |||
if values.ndim > 1: | |||
# Avoid calling the DatetimeIndex constructor as it is 1D only | |||
asi8 = values.view('i8') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that ensure_data should only take 1d input at all times. Is there a case where it does not? (nb we shoul dprob document / type this)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this gets called with 2D values from DataFrame.rank
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know if it is only rank? Because if so, it might be useful to add that as a comment for somebody later reading the code and wondering the same question where 2D things are passed to this.
if isinstance(values, np.ndarray) and values.ndim > 1: | ||
# We don't actaully care about the order of values, and DatetimeIndex | ||
# only accepts 1D values | ||
values = values.ravel() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would even move it down one step further to to_native_types
(there you don't need the if check, other part of the code there is already doing ravel()
as well), but no strong feelings if you want to keep it here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apart from the two minor comments, looks good
comments addressed and green |
git diff upstream/master -u -- "*.py" | flake8 --diff
Bug identified in #27015. Much less-kludgy patch for NDFrame.rank.