TST/BUG in test_categorical.py: test_constructor_unsortable breaks after recent commit #13714

Closed
pijucha opened this issue Jul 20, 2016 · 5 comments
Labels
Compat pandas objects compatability with Numpy or Python functions
Milestone

Comments

@pijucha
Contributor

pijucha commented Jul 20, 2016

This is a follow up to #13514 (safe sort of mixed-int arrays).
After merging this commit, test_constructor_unsortable in test_categorical.py breaks.

According to the code there, numpy.sort should sort a mixed int-datetime array in Python 2 with numpy >= 1.10. But it doesn't.

In [3]: arr = np.array([1, 2, datetime.now(), 0, 3], dtype='O')

In [4]: np.sort(arr)
/home/users/piotr/workspace/pandas-pijucha/pandas_dev_python2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:825: RuntimeWarning: tp_compare didn't return -1 or -2 for exception
  a.sort(axis, kind, order)
Out[4]: array([1, 2, datetime.datetime(2016, 7, 19, 9, 49, 28, 214675), 0, 3], dtype=object)

In [6]: np.__version__
Out[6]: '1.11.0'

IPython probably interferes here, because in plain Python 2.7 I get:

TypeError: can't compare datetime.datetime to int

In the old code in factorize, there was a list comprehension similar to this:

ordered = [np.sort(np.array([e for e in arr if f(e)], dtype=object))
           for f in [lambda x: True, lambda x: False]]

I haven't pinned it down precisely, but it looks as if this sometimes swallowed an exception. (The new code in safe_sort is simpler: it sorts each of the two arrays separately, but still with np.sort.)

It looks to me that Categorical.from_array(arr, ordered=True) should always raise now. And maybe test_constructor_unsortable from test_categorical.py needs to be rewritten.
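
To make the expectation concrete, here is a minimal sketch of the shape a rewritten test could take. It assumes Python 3 and uses the built-in sorted as a stand-in for the real sorting step; sort_strict and raises_type_error are hypothetical helper names, not pandas or numpy functions.

```python
from datetime import datetime


def sort_strict(values):
    """Hypothetical stand-in for the sorting step; lets TypeError propagate."""
    return sorted(values)


def raises_type_error(func, *args):
    """Return True if func(*args) raises TypeError, False otherwise."""
    try:
        func(*args)
    except TypeError:
        return True
    return False


mixed = [1, 2, datetime(2016, 7, 19, 9, 49, 28), 0, 3]

# An int/datetime mix has no defined ordering, so sorting must raise.
assert raises_type_error(sort_strict, mixed)

# Homogeneous values still sort without error.
assert not raises_type_error(sort_strict, [3, 1, 2])
```

The point of the sketch is only that the test would assert a TypeError for unsortable input instead of accepting whatever array comes back.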


Weird numpy behaviour

I tested numpy's behaviour for several versions between 1.7 and 1.11 in Python 2.7, both in a script and in interactive Python (not IPython).

Script

Running the following script:

from datetime import datetime
import numpy as np
import sys

print sys.version
print np.__version__

arr = np.array([1, 2, datetime.now(), 0, 3], dtype=object)
arr2 = np.sort(arr)
print arr2

gives for numpy < 1.10:

2.7.11 (default, Mar 30 2016, 15:33:06) 
[GCC 5.3.0 20151204 (release)]
1.9.1
Traceback (most recent call last):
  File "/home/users/piotr/workspace/pandas-tests/unsortable2.py", line 9, in <module>
    arr2 = np.sort(arr)
  File "/home/users/piotr/workspace/numpy/numpy_dev_py2/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 791, in sort
    a.sort(axis, kind, order)
TypeError: can't compare datetime.datetime to int

and for numpy >= 1.10:

2.7.11 (default, Mar 30 2016, 15:33:06) 
[GCC 5.3.0 20151204 (release)]
1.11.0
/home/users/piotr/workspace/numpy/numpy_dev_py2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:825: RuntimeWarning: tp_compare didn't return -1 or -2 for exception
  a.sort(axis, kind, order)
Traceback (most recent call last):
  File "/home/users/piotr/workspace/pandas-tests/unsortable2.py", line 10, in <module>
    print arr2
  File "/home/users/piotr/workspace/numpy/numpy_dev_py2/lib/python2.7/site-packages/numpy/core/numeric.py", line 1869, in array_str
    return array2string(a, max_line_width, precision, suppress_small, ' ', "", str)
  File "/home/users/piotr/workspace/numpy/numpy_dev_py2/lib/python2.7/site-packages/numpy/core/arrayprint.py", line 442, in array2string
    elif reduce(product, a.shape) == 0:
TypeError: can't compare datetime.datetime to int

Note that the exception is raised on the line after arr2 = np.sort(arr), i.e. while printing arr2, not by the sort call itself.

When I remove print arr2 from the script, I get an "exception ignored" message instead:

2.7.11 (default, Mar 30 2016, 15:33:06) 
[GCC 5.3.0 20151204 (release)]
1.11.0
/home/users/piotr/workspace/numpy/numpy_dev_py2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:825: RuntimeWarning: tp_compare didn't return -1 or -2 for exception
  a.sort(axis, kind, order)
Exception TypeError: "can't compare datetime.datetime to int" in <module 'threading' from '/usr/lib64/python2.7/threading.pyc'> ignored

Interactive mode

In interactive mode (not IPython):

For numpy < 1.10:

>>> sys.version
'2.7.11 (default, Mar 30 2016, 15:33:06) \n[GCC 5.3.0 20151204 (release)]'
>>> np.__version__
'1.9.1'

>>> arr = np.array([1, 2, datetime.now(), 0, 3], dtype=object)

>>> np.sort(arr)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/users/piotr/workspace/numpy/numpy_dev_py2/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 791, in sort
    a.sort(axis, kind, order)
TypeError: can't compare datetime.datetime to int

>>> arr2 = np.sort(arr)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/users/piotr/workspace/numpy/numpy_dev_py2/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 791, in sort
    a.sort(axis, kind, order)
TypeError: can't compare datetime.datetime to int

>>> order = [np.sort(np.array([e for e in arr if f(e)], dtype=object)) for f in [lambda x: True, lambda x: False]]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/users/piotr/workspace/numpy/numpy_dev_py2/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 791, in sort
    a.sort(axis, kind, order)
TypeError: can't compare datetime.datetime to int

For numpy >= 1.10:

>>> sys.version
'2.7.11 (default, Mar 30 2016, 15:33:06) \n[GCC 5.3.0 20151204 (release)]'
>>> np.__version__
'1.11.0'

>>> arr = np.array([1, 2, datetime.now(), 0, 3], dtype=object)

>>> np.sort(arr)
/home/users/piotr/workspace/numpy/numpy_dev_py2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:825: RuntimeWarning: tp_compare didn't return -1 or -2 for exception
  a.sort(axis, kind, order)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/users/piotr/workspace/numpy/numpy_dev_py2/lib/python2.7/site-packages/numpy/core/numeric.py", line 1807, in array_repr
    ', ', "array(")
  File "/home/users/piotr/workspace/numpy/numpy_dev_py2/lib/python2.7/site-packages/numpy/core/arrayprint.py", line 442, in array2string
    elif reduce(product, a.shape) == 0:
TypeError: can't compare datetime.datetime to int

>>> arr2 = np.sort(arr)
>>> arr2
TypeError: can't compare datetime.datetime to int
>>> print(arr2)
[1 2 datetime.datetime(2016, 7, 20, 0, 24, 6, 50903) 0 3]
>>> arr2
array([1, 2, datetime.datetime(2016, 7, 20, 0, 24, 6, 50903), 0, 3], dtype=object)

>>> order = [np.sort(np.array([e for e in arr if f(e)], dtype=object)) for f in [lambda x: True, lambda x: False]]
>>> order
[array([1, 2, datetime.datetime(2016, 7, 20, 0, 24, 6, 50903), 0, 3], dtype=object), array([], dtype=object)]

(I pasted this literally from a console, line by line, adding only empty lines for clarity. The calls to arr2 puzzle me: the first one raises, the second one doesn't.)

The behaviour of the list comprehension above may depend on whether np.sort raises on the second array in the list (here, an empty one).
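
Since a swallowed exception leaves a partially sorted array behind, one way to detect it is to check the result afterwards. The following is only a sketch in plain Python (assuming Python 3, no numpy); is_sorted is a hypothetical helper, and the values are taken from the array printed above.

```python
from datetime import datetime


def is_sorted(values):
    """Return True if every adjacent pair is in non-decreasing order.

    Pairs that cannot be compared at all count as "not sorted".
    """
    try:
        return all(a <= b for a, b in zip(values, values[1:]))
    except TypeError:
        return False


# The "sorted" result reported for numpy >= 1.10 in the session above.
reported = [1, 2, datetime(2016, 7, 20, 0, 24, 6, 50903), 0, 3]

print(is_sorted(reported))      # False
print(is_sorted([0, 1, 2, 3]))  # True
```

A check like this would flag the returned array as unsorted even when no exception reaches the caller.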

safe_sort

Calling safe_sort on arr always raises. (But I don't really know why.)

>>> np.__version__
'1.11.0'
>>> sys.version
'2.7.11 (default, Mar 30 2016, 15:33:06) \n[GCC 5.3.0 20151204 (release)]'
>>> pd.__version__
u'0.18.1+221.g8acfad3'

>>> pd.core.algorithms.safe_sort(arr)
/home/users/piotr/workspace/numpy/numpy_dev_py2/lib/python2.7/site-packages/pandas-0.18.1+221.g8acfad3-py2.7-linux-x86_64.egg/pandas/core/algorithms.py:223: RuntimeWarning: tp_compare didn't return -1 or -2 for exception
  sorter = values.argsort()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/users/piotr/workspace/numpy/numpy_dev_py2/lib/python2.7/site-packages/pandas-0.18.1+221.g8acfad3-py2.7-linux-x86_64.egg/pandas/core/algorithms.py", line 227, in safe_sort
    ordered = sort_mixed(values)
  File "/home/users/piotr/workspace/numpy/numpy_dev_py2/lib/python2.7/site-packages/pandas-0.18.1+221.g8acfad3-py2.7-linux-x86_64.egg/pandas/core/algorithms.py", line 214, in sort_mixed
    strs = np.sort(values[str_pos])
  File "/home/users/piotr/workspace/numpy/numpy_dev_py2/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 824, in sort
    a = asanyarray(a).copy(order="K")
TypeError: can't compare datetime.datetime to int

I didn't test much in IPython, but it also swallows the exception (at least sometimes): it prints a warning (as above) and returns a partially sorted arr2.
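
For readers unfamiliar with the split-by-type idea behind sort_mixed, here is a simplified sketch in plain Python 3. It is an assumption-laden illustration, not the pandas implementation: sort_mixed_sketch is a hypothetical name, and the built-in sorted stands in for np.sort.

```python
from datetime import datetime


def sort_mixed_sketch(values):
    """Sort strings and non-strings separately, non-strings first (simplified)."""
    strs = sorted(v for v in values if isinstance(v, str))
    others = sorted(v for v in values if not isinstance(v, str))
    return others + strs


# An int/str mix becomes sortable once the two groups are handled separately.
print(sort_mixed_sketch([3, "b", 1, "a"]))  # [1, 3, 'a', 'b']

# An int/datetime mix is not helped: both kinds land in the non-string group.
try:
    sort_mixed_sketch([1, 2, datetime(2016, 7, 20), 0, 3])
except TypeError as exc:
    print("still unsortable:", exc)
```

In this sketch the split only rescues mixes of strings and mutually comparable non-strings, so an int/datetime array is expected to raise either way.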

@pijucha pijucha changed the title BUG in test_categorical.py: test_constructor_unsortable TST/BUG in test_categorical.py: test_constructor_unsortable breaks after recent commit Jul 20, 2016
@jreback
Contributor

jreback commented Jul 20, 2016

thanks @pijucha

can you also post a numpy issue; use plain Python to replicate (not IPython), just to isolate the issue

@pijucha
Contributor Author

pijucha commented Jul 20, 2016

@jreback Updated the post. It's more complicated than I thought 2 hours ago.

@jreback
Contributor

jreback commented Jul 20, 2016

jreback@626acfa

changes the 2.7 build to use the latest numpy, so this reproduces the error.

if you can incorporate it, that would be great, thanks.

@jreback jreback added Bug Compat pandas objects compatability with Numpy or Python functions and removed Bug labels Jul 20, 2016
@jreback jreback added this to the 0.19.0 milestone Jul 20, 2016
@pijucha
Contributor Author

pijucha commented Jul 20, 2016

Comments in numpy/numpy#3879 shed some light on what's happening inside sort.

@jreback
Contributor

jreback commented Jul 21, 2016

closed by 622297c

@jreback jreback closed this as completed Jul 21, 2016