BUG: Restrict DTA to 1D #27027

jbrockmendel · 2019-06-25T00:10:23Z

closes #xxxx
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

Bug identified in #27015. Much less-kludgy patch for NDFrame.rank.

codecov · 2019-06-25T00:44:53Z

Codecov Report

Merging #27027 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #27027      +/-   ##
==========================================
- Coverage   91.99%   91.99%   -0.01%     
==========================================
  Files         180      180              
  Lines       50774    50782       +8     
==========================================
+ Hits        46712    46716       +4     
- Misses       4062     4066       +4

Flag	Coverage Δ
#multiple	`90.63% <100%> (ø)`	⬆️
#single	`41.85% <37.5%> (-0.06%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/arrays/datetimes.py	`97.8% <100%> (ø)`	⬆️
pandas/core/algorithms.py	`94.76% <100%> (+0.03%)`	⬆️
pandas/io/formats/format.py	`97.91% <100%> (ø)`	⬆️
pandas/io/gbq.py	`88.88% <0%> (-11.12%)`	⬇️
pandas/core/frame.py	`96.89% <0%> (-0.12%)`	⬇️

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 83fe8d7...73411ec. Read the comment docs.

codecov · 2019-06-25T00:44:54Z

Codecov Report

Merging #27027 into master will increase coverage by 0.04%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #27027      +/-   ##
==========================================
+ Coverage   91.99%   92.04%   +0.04%     
==========================================
  Files         180      180              
  Lines       50774    50723      -51     
==========================================
- Hits        46712    46688      -24     
+ Misses       4062     4035      -27

Flag	Coverage Δ
#multiple	`90.68% <100%> (+0.05%)`	⬆️
#single	`41.87% <44.44%> (-0.04%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/arrays/datetimes.py	`97.8% <100%> (ø)`	⬆️
pandas/core/algorithms.py	`94.77% <100%> (+0.04%)`	⬆️
pandas/io/formats/format.py	`97.91% <100%> (ø)`	⬆️
pandas/io/gbq.py	`88.88% <0%> (-11.12%)`	⬇️
pandas/core/indexing.py	`93.3% <0%> (-0.19%)`	⬇️
pandas/core/frame.py	`96.89% <0%> (-0.12%)`	⬇️
pandas/core/ops.py	`94.66% <0%> (-0.03%)`	⬇️
pandas/core/generic.py	`94.18% <0%> (-0.02%)`	⬇️
pandas/core/indexes/datetimelike.py	`98.14% <0%> (-0.01%)`	⬇️
pandas/core/sorting.py	`98.35% <0%> (ø)`	⬆️
... and 11 more

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 83fe8d7...8f99a00. Read the comment docs.

jorisvandenbossche · 2019-06-25T10:02:12Z

pandas/tests/arrays/test_datetimes.py

+
+        with pytest.raises(ValueError, match="Only 1-dimensional"):
+            # 2-dim
+            DatetimeArray(arr.reshape(2, 2))


To be clear: this already fails currently, right? You are mainly catching the error early / providing a better error message?

No, this is currently accepted incorrectly.

With latest master:

In [15]: pd.__version__
Out[15]: '0.25.0.dev0+791.gf0919f272'

In [16]: arr = np.array([0, 1, 2, 3], dtype='M8[h]').astype('M8[ns]')

In [17]: pd.arrays.DatetimeArray(arr.reshape(2, 2))
...
~/scipy/pandas/pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_to_tsobject()

TypeError: Cannot convert input [['1970-01-01T00:00:00.000000000' '1970-01-01T01:00:00.000000000']] of type <class 'numpy.ndarray'> to Timestamp

Try changing [17] to res = pd.arrays.DatetimeArray(arr.reshape(2, 2)). I'm pretty sure that error is coming from an attempt to call __repr__.

ah .. yes :-)
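For anyone reading this thread later, here is a minimal sketch of the distinction discussed above: a constructor that accepts 2D input silently, with the failure only surfacing in `__repr__`, versus a check that rejects it up front. `LazyDatetimeBox` and its `validate` flag are hypothetical stand-ins, not pandas code, and the error messages are illustrative.

```python
import numpy as np


class LazyDatetimeBox:
    """Hypothetical stand-in for an array wrapper; not pandas code."""

    def __init__(self, values, validate=False):
        values = np.asarray(values)
        if validate and values.ndim != 1:
            # Fail early, at construction time, with a clear message.
            raise ValueError("Only 1-dimensional input arrays are supported.")
        self._values = values

    def __repr__(self):
        # Formatting assumes each element is a scalar, so 2D input only
        # blows up here, when the REPL tries to display the object.
        if self._values.ndim != 1:
            raise TypeError("Cannot convert a nested array to a scalar")
        return "LazyDatetimeBox({})".format(list(self._values))


arr = np.array([0, 1, 2, 3], dtype="M8[h]").astype("M8[ns]")

# Without validation, construction succeeds when assigned to a name ...
res = LazyDatetimeBox(arr.reshape(2, 2))

# ... and the failure only appears once the object is displayed.
try:
    repr(res)
    repr_failed = False
except TypeError:
    repr_failed = True

# With validation, the same input is rejected up front.
try:
    LazyDatetimeBox(arr.reshape(2, 2), validate=True)
    rejected_early = False
except ValueError:
    rejected_early = True
```

This is why assigning the result to `res` in the session above hides the traceback: nothing asks for the repr.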

jorisvandenbossche · 2019-06-25T10:05:00Z

pandas/io/formats/format.py

@@ -1273,6 +1273,11 @@ def format_percentiles(percentiles):

 def _is_dates_only(values):
    # return a boolean if we are only dates (and don't have a timezone)
+    if isinstance(values, np.ndarray) and values.ndim > 1:


In which case do you run into this?
(I was assuming the format_array is working column by column)

Per above, this is hit with 2D ndarray inputs, which ATM are incorrectly accepted but will now raise.

Yes, but my question is: when is this actually hit with 2D ndarray input?

same question as @jorisvandenbossche

Just added an assertion for 1D-ness (in master) and the first test that failed is effectively:

pd.DataFrame({"A": pd.date_range('2016-01-01', periods=3)}).to_csv()

OK. I would rather do a ravel in the code calling it:

pandas/pandas/core/internals/blocks.py

Lines 2171 to 2175 in f0919f2

fmt = _get_format_datetime64_from_values(values, date_format)

result = tslib.format_array_from_datetime(
    i8values.ravel(), tz=getattr(self.values, 'tz', None),
    format=fmt, na_rep=na_rep).reshape(i8values.shape)

as it is also done for the actual formatting function right below.

In fact, this is also kind of a bug in our formatting. As the formatting should be done column by column (the frequency of one column should not influence the formatting of another column)
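To make the column-by-column point concrete, a NumPy-only sketch follows. `column_is_dates_only` is a hypothetical helper, not the pandas function; it assumes tz-naive `datetime64[ns]` data and simply checks whether every non-NaT value falls on midnight.

```python
import numpy as np

NS_PER_DAY = 24 * 60 * 60 * 10**9


def column_is_dates_only(column):
    # Hypothetical helper, not a pandas function: True when every
    # non-NaT value in a 1D datetime64[ns] array falls on midnight.
    i8 = column.view("i8")[~np.isnat(column)]
    return bool((i8 % NS_PER_DAY == 0).all())


# Column 0 holds pure dates, column 1 carries times of day.
values = np.array(
    [["2016-01-01", "2016-01-01T06:00"],
     ["2016-01-02", "2016-01-02T18:30"]],
    dtype="M8[ns]",
)

# Deciding once for the ravelled block lets column 1 affect column 0.
whole_block = column_is_dates_only(values.ravel())

# Deciding per column keeps the two formats independent.
per_column = [
    column_is_dates_only(values[:, i]) for i in range(values.shape[1])
]
```

Here the ravelled check reports that the block is not dates-only, while the per-column check lets column 0 keep its date-only format.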

OK. I think is_dates_only is only called with non-ravelled data in one place, so I can move the maybe ravel there and put an assertion in is_dates_only. Is there anything else that should go along with that?

did you update this?
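A rough sketch of the arrangement proposed above, with the ravel done by the caller and a 1D assertion in the helper. `format_1d` and `format_block` are hypothetical names, and `np.datetime_as_string` merely stands in for the real formatting routine.

```python
import numpy as np


def format_1d(values, na_rep="NaT"):
    # Hypothetical 1D-only formatter standing in for the real routine.
    assert values.ndim == 1, "expected 1D input"
    out = np.datetime_as_string(values, unit="s").astype(object)
    out[np.isnat(values)] = na_rep
    return out


def format_block(values, na_rep="NaT"):
    # The caller flattens, formats, then restores the original shape,
    # so the helper never has to deal with 2D input.
    return format_1d(values.ravel(), na_rep=na_rep).reshape(values.shape)


block = np.array(
    [["2016-01-01", "2016-01-02"],
     ["2016-01-03", "NaT"]],
    dtype="M8[ns]",
)
result = format_block(block, na_rep="")

# Passing the 2D block straight to the helper trips the assertion.
try:
    format_1d(block)
    helper_rejected_2d = False
except AssertionError:
    helper_rejected_2d = True
```

The assertion documents the 1D contract for later readers, and the single ravel in the caller keeps the `if` check out of the helper.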

jreback · 2019-06-25T14:34:10Z

pandas/core/algorithms.py

@@ -104,6 +104,12 @@ def _ensure_data(values, dtype=None):
            dtype = values.dtype
        else:
            # Datetime
+            if values.ndim > 1:


what exactly hits this?

DataFrame.rank with all-datetime64 columns. #27015 has a terrible, terrible hack instead of this 5-line workaround.

We haven't reviewed #27015. Not averse, just want to avoid hacks on hacks here.

Yah, this is distinctly the non-hack solution.

pandas/core/arrays/datetimes.py

jreback · 2019-06-25T14:34:41Z

pandas/io/formats/format.py

@@ -1273,6 +1273,11 @@ def format_percentiles(percentiles):

 def _is_dates_only(values):
    # return a boolean if we are only dates (and don't have a timezone)
+    if isinstance(values, np.ndarray) and values.ndim > 1:


same question as @jorisvandenbossche

jreback · 2019-06-27T02:54:44Z

pandas/core/algorithms.py

@@ -104,6 +104,12 @@ def _ensure_data(values, dtype=None):
            dtype = values.dtype
        else:
            # Datetime
+            if values.ndim > 1:
+                # Avoid calling the DatetimeIndex constructor as it is 1D only
+                asi8 = values.view('i8')


I believe that ensure_data should only take 1D input at all times. Is there a case where it does not? (NB: we should probably document / type this.)

Yes, this gets called with 2D values from DataFrame.rank.

Do you know if it is only rank? Because if so, it might be useful to add that as a comment for somebody later reading the code and wondering the same question where 2D things are passed to this.
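For context on the `values.view('i8')` line in the hunk above, a small NumPy-only sketch of why the view is enough for the 2D case. The array below is made up for illustration; the sketch only shows that the int64 view keeps the shape and the ordering, which is what a rank-style computation needs.

```python
import numpy as np

# 2D datetime64[ns] values, shaped like an all-datetime frame with
# two rows and two columns (illustrative data only).
values = np.array([0, 3, 2, 1], dtype="M8[h]").astype("M8[ns]").reshape(2, 2)

# Reinterpret the same 8-byte buffer as int64. No 1D-only constructor
# is involved, and the shape is left untouched.
asi8 = values.view("i8")

# Ordering within each column is the same for both representations.
same_order = bool(
    (np.argsort(asi8, axis=0) == np.argsort(values, axis=0)).all()
)
```

Because the view shares memory with the original array, no copy is made either.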

pandas/core/algorithms.py

jorisvandenbossche · 2019-06-27T20:22:41Z

pandas/io/formats/format.py

+    if isinstance(values, np.ndarray) and values.ndim > 1:
+        # We don't actaully care about the order of values, and DatetimeIndex
+        #  only accepts 1D values
+        values = values.ravel()


I would even move it down one step further to to_native_types (there you don't need the if check, other part of the code there is already doing ravel() as well), but no strong feelings if you want to keep it here

@jbrockmendel

jorisvandenbossche

Apart from the two minor comments, looks good

jbrockmendel · 2019-06-27T21:22:04Z

comments addressed and green

Restrict DTA to 1D

73411ec

jorisvandenbossche reviewed Jun 25, 2019

jreback requested changes Jun 25, 2019

gfyoung added Bug Datetime Datetime data dtype labels Jun 26, 2019

jreback requested changes Jun 27, 2019

jreback reviewed Jun 27, 2019

address comments

e861530

jreback added this to the 0.25.0 milestone Jun 27, 2019

typo fixup

d59f41c

jorisvandenbossche reviewed Jun 27, 2019

jorisvandenbossche approved these changes Jun 27, 2019

rank comment

8f99a00

jreback approved these changes Jun 27, 2019

jreback merged commit 8b48f5c into pandas-dev:master Jun 27, 2019

jbrockmendel deleted the dtarr2d branch June 27, 2019 21:36
