Skip to content

ENH: Handle numpy-like arrays in assert_almost_equal #39790

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

znicholls
Copy link
Contributor

  • closes #xxxx
  • tests added / passed
  • Ensure all linting tests pass, see here for how to run them
  • whatsnew entry

@znicholls znicholls changed the title Add failing test ENH: Handle numpy-like arrays in assert_almost_equal Feb 13, 2021
@znicholls
Copy link
Contributor Author

znicholls commented Feb 13, 2021

At this point, the test fails because assert_almost_equal doesn't have any real way to support numpy-like (but not numpy) arrays. I've put in a tweak which gets numpy-like arrays to go to the right branch of assert_almost_equal, however the comparison fails in array_equivalent. The problem is that the inputs in array_equivalent are converted using np.asarray. After this operation, they are identified as object type (even though their dtype attribute isn't object), hence array_equivalent goes into the is_string_dtype branch so _array_equivalent_object tries to zip the things being compared and then things explode because a numpy-scalar isn't iterable.

@znicholls
Copy link
Contributor Author

I added an __array__ method to the mock numpy-like object which seems to have fixed the issue in #39790 (comment). So, the question is, do the changes to testing made here look ok?

@jreback jreback added the Compat pandas objects compatability with Numpy or Python functions label Feb 15, 2021
@jreback jreback added this to the 1.3 milestone Feb 15, 2021
@jreback
Copy link
Contributor

jreback commented Feb 15, 2021

At this point, the test fails because assert_almost_equal doesn't have any real way to support numpy-like (but not numpy) arrays. I've put in a tweak which gets numpy-like arrays to go to the right branch of assert_almost_equal, however the comparison fails in array_equivalent. The problem is that the inputs in array_equivalent are converted using np.asarray. After this operation, they are identified as object type (even though their dtype attribute isn't object), hence array_equivalent goes into the is_string_dtype branch so _array_equivalent_object tries to zip the things being compared and then things explode because a numpy-scalar isn't iterable.

sound like you need to add a function which can return true if not is_array but has size / dtype; and then use it in array_equivalent. call it is_arraylike?

but stepping back here. I thought Pint is using the EA interface; if this is the case then no changes should be needed here at all. if this is not the case, then can you explain why?

@znicholls
Copy link
Contributor Author

I thought Pint is using the EA interface; if this is the case then no changes should be needed here at all

Because the comparisons should be defined elsewhere?

In any case, it does appear this is the case. I think I've started at the wrong spot. I'll try again in a different MR (feel free to close for now, I can re-open if needed).

@znicholls
Copy link
Contributor Author

@jreback I have dug deeper on this. The test in which this is an issue is

def test_unstack(self, data, index, obj):

Casting both expected and result to object causes the raise to be hit. I am guessing we need to do something similar to what is done for JSONArray and overwrite assert_series_equal and assert_frame_equal like is done here

def assert_frame_equal(cls, left, right, *args, **kwargs):
?

@znicholls
Copy link
Contributor Author

Or we just skip that test entirely like

@pytest.mark.xfail(reason="dict for NA")

@jbrockmendel
Copy link
Member

The problem is that the inputs in array_equivalent are converted using np.asarray. After this operation, they are identified as object type (even though their dtype attribute isn't object)

do the inputs have an __array__ method? id expect that to govern what dtype np.asarray gets you

@@ -119,7 +121,7 @@ cpdef assert_almost_equal(a, b,
f"Can't compare objects without length, one or both is invalid: ({a}, {b})"
)

if a_is_ndarray and b_is_ndarray:
if (a_is_ndarray and b_is_ndarray) or (a_has_size_and_shape and b_has_size_and_shape):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

arent the is_array checks redundant here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably, I had kept them in case there was some edge case I hadn't thought of but I can also remove them

@znicholls
Copy link
Contributor Author

do the inputs have an __array__ method? id expect that to govern what dtype np.asarray gets you

Yep sorry I realised that after the fact (comment here #39790 (comment), fix here ad5f714)

@jbrockmendel
Copy link
Member

I added an array method to the mock numpy-like object which seems to have fixed the issue in #39790 (comment). So, the question is, do the changes to testing made here look ok?

so with the __array__ method, are the test changes needed? they're going to hurt performance a small-but-frequent amount

@znicholls
Copy link
Contributor Author

The test changes are still needed. The reason is that you need something over and above is_array because is_array is equivalent to isinstance(val, np.ndarray). The array like implementations are not numpy arrays so they fail that check (even though they have an __array__ method).

@github-actions
Copy link
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Apr 14, 2021
@simonjayhawkins simonjayhawkins removed this from the 1.3 milestone May 25, 2021
@simonjayhawkins
Copy link
Member

@znicholls Thanks for the PR. closing as stale.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Compat pandas objects compatability with Numpy or Python functions Stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants