Skip to content

BUG: read-only values in cython funcs #37613

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Nov 4, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions doc/source/whatsnew/v1.2.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -399,11 +399,13 @@ Datetimelike
- Bug in :meth:`TimedeltaIndex.sum` and :meth:`Series.sum` with ``timedelta64`` dtype on an empty index or series returning ``NaT`` instead of ``Timedelta(0)`` (:issue:`31751`)
- Bug in :meth:`DatetimeArray.shift` incorrectly allowing ``fill_value`` with a mismatched timezone (:issue:`37299`)
- Bug in adding a :class:`BusinessDay` with nonzero ``offset`` to a non-scalar other (:issue:`37457`)
- Bug in :func:`to_datetime` with a read-only array incorrectly raising (:issue:`34857`)

Timedelta
^^^^^^^^^
- Bug in :class:`TimedeltaIndex`, :class:`Series`, and :class:`DataFrame` floor-division with ``timedelta64`` dtypes and ``NaT`` in the denominator (:issue:`35529`)
- Bug in parsing of ISO 8601 durations in :class:`Timedelta`, :meth:`pd.to_datetime` (:issue:`37159`, fixes :issue:`29773` and :issue:`36204`)
- Bug in :func:`to_timedelta` with a read-only array incorrectly raising (:issue:`34857`)

Timezones
^^^^^^^^^
Expand Down
2 changes: 1 addition & 1 deletion pandas/_libs/join.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -268,7 +268,7 @@ ctypedef fused join_t:

@cython.wraparound(False)
@cython.boundscheck(False)
def left_join_indexer_unique(join_t[:] left, join_t[:] right):
def left_join_indexer_unique(ndarray[join_t] left, ndarray[join_t] right):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe these need const?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can't use const join_t until cython3

cdef:
Py_ssize_t i, j, nleft, nright
ndarray[int64_t] indexer
Expand Down
4 changes: 2 additions & 2 deletions pandas/_libs/tslibs/strptime.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@ from _thread import allocate_lock as _thread_allocate_lock
import numpy as np
import pytz

from numpy cimport int64_t
from numpy cimport int64_t, ndarray

from pandas._libs.tslibs.nattype cimport (
NPY_NAT,
Expand Down Expand Up @@ -51,7 +51,7 @@ cdef dict _parse_code_table = {'y': 0,
'u': 22}


def array_strptime(object[:] values, object fmt, bint exact=True, errors='raise'):
def array_strptime(ndarray[object] values, object fmt, bint exact=True, errors='raise'):
"""
Calculates the datetime structs represented by the passed array of strings

Expand Down
2 changes: 1 addition & 1 deletion pandas/_libs/tslibs/timedeltas.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -227,7 +227,7 @@ cdef convert_to_timedelta64(object ts, str unit):

@cython.boundscheck(False)
@cython.wraparound(False)
def array_to_timedelta64(object[:] values, str unit=None, str errors="raise"):
def array_to_timedelta64(ndarray[object] values, str unit=None, str errors="raise"):
"""
Convert an ndarray to an array of timedeltas. If errors == 'coerce',
coerce non-convertible objects to NaT. Otherwise, raise.
Expand Down
3 changes: 1 addition & 2 deletions pandas/core/arrays/datetimelike.py
Original file line number Diff line number Diff line change
Expand Up @@ -1025,9 +1025,8 @@ def _addsub_object_array(self, other: np.ndarray, op):
result : same class as self
"""
assert op in [operator.add, operator.sub]
if len(other) == 1:
if len(other) == 1 and self.ndim == 1:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbrockmendel do you recall if tests were failing without this change, or unrelated to the tests added here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, a test failed without this change

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on the backport tests fail with and without this change. with this change, we get an additional 13 performance warnings. without this change, we don't get performance warnings in 3 tests

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

without this change, we don't get performance warnings in 3 tests

Probably the same three I saw on master that motivated this edit. I just checked out the backport branch locally, will try to troubleshoot.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

im not wild about not-having a better solution, but i think the immediate issue can be addressed by making the condiiton and (self.ndim == 1 or other.shape == self.shape == (1, 1))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jbrockmendel for looking into this. I have updated the backport PR with this change for now, pending further discussion there on best approach. #37633 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the suggested update applied to the backport still failed. 458e0ce

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbrockmendel I'm struggling with this. (working on 1.1x so maybe differences with master)

can you justify this change other than tests were failing.

somethings changed in the internals dispatch where other in the _addsub_object_array call is now a scalar (in a 2d array with shape 1,1) for the failing test_td64arr_add_sub_object_array test instead of a 2d array on 1.1.x

so not raising a performance warning makes sense if other is now a 'scalar'

(also the repr fails for a 2d timedelta array)

on 1.1.x

(Pdb) other
array([[Timedelta('1 days 00:00:00')],
       [<2 * Days>]], dtype=object)
(Pdb) a
Traceback (most recent call last):
...
ValueError: Value must be Timedelta, string, integer, float, timedelta or convertible, not TimedeltaArray

(Pdb) self._data
array([[ 86400000000000],
       [172800000000000]], dtype='timedelta64[ns]')
(Pdb)

with the patch applied ( #37633)

c:\users\simon\pandas\pandas\core\arrays\datetimelike.py(1337)_addsub_object_array()
-> assert op in [operator.add, operator.sub]
(Pdb) other
array([[<2 * Days>]], dtype=object)
(Pdb) self._data
array([[172800000000000]], dtype='timedelta64[ns]')
(Pdb)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you justify this change other than tests were failing.

I never did manage to track down the underlying source of the problem that made this particular edit necessary. Its still early-ish afternoon for me so I can spend some more time today trying again if you think its a priority.

(also the repr fails for a 2d timedelta array)

FWIW that was fixed in #37164

# If both 1D then broadcasting is unambiguous
# TODO(EA2D): require self.ndim == other.ndim here
return op(self, other[0])

warnings.warn(
Expand Down
7 changes: 6 additions & 1 deletion pandas/tests/libs/test_join.py
Original file line number Diff line number Diff line change
Expand Up @@ -135,9 +135,14 @@ def test_cython_inner_join(self):
tm.assert_numpy_array_equal(rs, exp_rs, check_dtype=False)


def test_left_join_indexer_unique():
@pytest.mark.parametrize("readonly", [True, False])
def test_left_join_indexer_unique(readonly):
a = np.array([1, 2, 3, 4, 5], dtype=np.int64)
b = np.array([2, 2, 3, 4, 4], dtype=np.int64)
if readonly:
# GH#37312, GH#37264
a.setflags(write=False)
b.setflags(write=False)

result = libjoin.left_join_indexer_unique(b, a)
expected = np.array([1, 1, 2, 3, 3], dtype=np.int64)
Expand Down
10 changes: 10 additions & 0 deletions pandas/tests/tools/test_to_datetime.py
Original file line number Diff line number Diff line change
Expand Up @@ -36,6 +36,16 @@


class TestTimeConversionFormats:
@pytest.mark.parametrize("readonly", [True, False])
def test_to_datetime_readonly(self, readonly):
# GH#34857
arr = np.array([], dtype=object)
if readonly:
arr.setflags(write=False)
result = to_datetime(arr)
expected = to_datetime([])
tm.assert_index_equal(result, expected)

@pytest.mark.parametrize("cache", [True, False])
def test_to_datetime_format(self, cache):
values = ["1/1/2000", "1/2/2000", "1/3/2000"]
Expand Down
10 changes: 10 additions & 0 deletions pandas/tests/tools/test_to_timedelta.py
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,16 @@


class TestTimedeltas:
@pytest.mark.parametrize("readonly", [True, False])
def test_to_timedelta_readonly(self, readonly):
# GH#34857
arr = np.array([], dtype=object)
if readonly:
arr.setflags(write=False)
result = to_timedelta(arr)
expected = to_timedelta([])
tm.assert_index_equal(result, expected)

def test_to_timedelta(self):

result = to_timedelta(["", ""])
Expand Down