DatetimeIndex comparison with ndim>1 incorrect or raises #19088

Closed
jbrockmendel opened this issue Jan 5, 2018 · 10 comments · Fixed by #53406
Labels: good first issue; Needs Tests (Unit test(s) needed to prevent regressions); Numeric Operations (Arithmetic, Comparison, and Logical operations)

Comments

@jbrockmendel
Member

Discovered while testing #18162.

import numpy as np
import pandas as pd

dti = pd.date_range('2016-01-01', periods=3, tz='US/Pacific')
df = pd.Series(dti).to_frame()
arr = np.array([dti.tolist()]).T  # <-- object dtype, correctly retains TZ
arr2 = np.array([dti]).T          # <-- datetime64[ns], drops TZ

>>> dti == df
[...]
  File "pandas/core/indexes/datetimes.py", line 145, in _ensure_datetime64
    raise TypeError('%s type object %s' % (type(other), str(other)))

>>> df == dti
  File "<stdin>", line 1, in <module>
  File "pandas/core/ops.py", line 1302, in f
    try_cast=False)
  File "pandas/core/frame.py", line 4012, in _combine_const
[...]
  File "pandas/core/internals.py", line 121, in __init__
    '{mgr}'.format(val=len(self.values), mgr=len(self.mgr_locs)))
ValueError: Wrong number of items passed 3, placement implies 1

>>> dti == arr
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/core/indexes/datetimes.py", line 126, in wrapper
    o_mask = other.view('i8') == libts.iNaT
  File "/usr/local/lib/python2.7/site-packages/numpy/core/_internal.py", line 377, in _view_is_safe
    raise TypeError("Cannot change data-type for object array.")
TypeError: Cannot change data-type for object array.

>>> dti == arr2    # <-- this one is fixed by #18162
array([[ True, False, False],
       [False,  True, False],
       [False, False,  True]], dtype=bool)

>>> df == arr   # <-- this one is OK
      0
0  True
1  True
2  True

>>> df == arr2  # <-- should raise TypeError
       0
0  False
1  False
2  False

>>> df[0] == arr2[:, 0]  # <-- should raise TypeError
0    True
1    True
2    True
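
The dtype comments in the setup above rest on a general distinction: an object array holds `Timestamp` objects that still carry their timezone, while a raw `datetime64` array holds only UTC instants. The following is a minimal sketch of that distinction, not a reproduction of `arr` and `arr2`; it uses `to_numpy(dtype=object)` and `.values` as stand-ins, and what `np.array([dti])` itself returns may differ between pandas and NumPy versions.

```python
import numpy as np
import pandas as pd

dti = pd.date_range("2016-01-01", periods=3, tz="US/Pacific")

# Object array: each element is a pd.Timestamp that keeps its tz.
obj_arr = dti.to_numpy(dtype=object)

# Raw datetime64 array: only the UTC instant survives, the tz is gone.
utc_arr = dti.values

print(obj_arr.dtype, obj_arr[0].tz)
print(utc_arr.dtype, utc_arr[0])
```

Midnight on 2016-01-01 in US/Pacific corresponds to 08:00 UTC, which is why the two arrays can look different when printed even though they describe the same instants.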
@mroeschke
Member

This looks correct now @jbrockmendel?

In [60]: dti == df
ValueError: Unable to coerce to Series, length must be 1: given 3

In [61]: df == dti
ValueError: Unable to coerce to Series, length must be 1: given 3

In [62]: dti == arr
ValueError: Buffer has wrong number of dimensions (expected 1, got 2)

In [63]: df == arr2
TypeError: Cannot compare tz-naive and tz-aware datetime-like objects

df[0] == arr2[:, 0]
TypeError: Cannot compare tz-naive and tz-aware datetime-like objects

@jbrockmendel
Member Author

I’d prefer if 62 broadcast, but that’s not something we do with any consistency. The rest are clearly much improved.
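
For context on what broadcasting would mean here: under NumPy's rules, comparing a length-3 array with a (3, 1) column gives a (3, 3) result, which is the shape of the `dti == arr2` output in the original report. The sketch below uses plain `datetime64` arrays only, so it illustrates the NumPy rule and says nothing about what `DatetimeIndex` does in any given pandas version.

```python
import numpy as np

# Three consecutive days as a 1-D datetime64 array, shape (3,)
row = np.array(["2016-01-01", "2016-01-02", "2016-01-03"], dtype="datetime64[D]")

# The same values as a column, shape (3, 1)
col = row.reshape(-1, 1)

# Broadcasting compares every element of `row` with every element of `col`,
# so the result has shape (3, 3) and is True only on the diagonal.
result = row == col
print(result.shape)
print(result)
```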

@mroeschke mroeschke added the Datetime Datetime data dtype label Jan 13, 2019
@jbrockmendel jbrockmendel added the Numeric Operations Arithmetic, Comparison, and Logical operations label Jul 23, 2019
@mroeschke mroeschke added the Bug label Apr 1, 2020
@mroeschke
Member

I think these all look correct now that we store Timestamp objects in numpy arrays. Could use tests

In [19]: dti == df
ValueError: Unable to coerce to Series, length must be 1: given 3

In [20]: df == dti
ValueError: Unable to coerce to Series, length must be 1: given 3

In [21]: dti == arr
Out[21]:
array([[ True, False, False],
       [False,  True, False],
       [False, False,  True]])

In [22]: df == arr2
Out[22]:
      0
0  True
1  True
2  True

In [23]: df[0] == arr2[:, 0]
Out[23]:
0    True
1    True
2    True
Name: 0, dtype: bool

@mroeschke mroeschke added good first issue Needs Tests Unit test(s) needed to prevent regressions and removed Bug Numeric Operations Arithmetic, Comparison, and Logical operations Datetime Datetime data dtype labels Jun 12, 2021
@jbrockmendel jbrockmendel added the Numeric Operations Arithmetic, Comparison, and Logical operations label Dec 21, 2021
@ashishthanki

Hey @mroeschke have the tests for this been written? Can I work on it as my first time issue?

@jbrockmendel
Member Author

Can I work on it as my first time issue?

Yes.

have the tests for this been written?

Hard to tell. I'd suggest adding a check that intentionally raises in the relevant cases and see if anything in the test suite breaks.

@ashishthanki

No problem,

As a first step, I have created a development environment and set up pandas so that it runs in a Docker devcontainer.

I have been through the test suite and nothing stands out immediately. Do you suggest having two tests, one for Series and one for DataFrames, that cover the relevant scenarios above?

@mcgeestocks
Contributor

Looks to be abandoned, jumping in to pick this up.

@mcgeestocks
Contributor

take

@mcgeestocks
Contributor

Seems like df[0] == arr2[:, 0] fails an equality check. It looks like one is tz-aware and the other is not.
What would you like to do here, @mroeschke?

last line of my test:

    def test_date_range_index_comparison(self):
        rng = date_range("2011-01-01", periods=3, tz="US/Eastern")
        df = Series(rng).to_frame()
        arr = np.array([rng.to_list()]).T
        arr2 = np.array([rng]).T

        with pytest.raises(ValueError):
            rng == df

        with pytest.raises(ValueError):
            df == rng

        tm.assert_numpy_array_equal(df.values, arr)
        tm.assert_numpy_array_equal(df.values, arr2)
        tm.assert_numpy_array_equal(df[0].values, arr2[:, 0])

Results in:

E           AssertionError: numpy array are different
E           
E           numpy array values are different (100.0 %)
E           [left]:  [2011-01-01T05:00:00.000000000, 2011-01-02T05:00:00.000000000, 2011-01-03T05:00:00.000000000]
E           [right]: [2011-01-01 00:00:00-05:00, 2011-01-02 00:00:00-05:00, 2011-01-03 00:00:00-05:00]

@mroeschke
Member

I think you should be doing

result = dti == arr
expected = np.array/pd.DataFrame/pd.Series(...
tm.assert_...
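
One way to read this suggestion is sketched below: compute the comparison itself, build the expected boolean result by hand, and assert that the two match, rather than comparing the raw `.values` arrays. This is an illustration only, under a few assumptions: it covers just the 1-D Series case with an object array of `Timestamp` objects, it uses the public `pd.testing.assert_series_equal` in place of the internal `tm` helper, and the variable names are hypothetical rather than taken from the eventual fix.

```python
import numpy as np
import pandas as pd

dti = pd.date_range("2016-01-01", periods=3, tz="US/Pacific")
ser = pd.Series(dti)

# 1-D object array of tz-aware Timestamps (assumed stand-in for one column of `arr`)
arr_1d = np.array(dti.tolist(), dtype=object)

# Follow the result/expected/assert pattern: test the comparison, not the raw values.
result = ser == arr_1d
expected = pd.Series([True, True, True])
pd.testing.assert_series_equal(result, expected)
```

The DataFrame and 2-D cases from the thread would follow the same pattern with `pd.DataFrame(...)` or `np.array(...)` as the expected value, or `pytest.raises` where an error is the intended behavior.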

4 participants