[ENH]: Add self.__eq__() to Text object #23273

Open
noatamir opened this issue Jun 14, 2022 · 9 comments

Comments

@noatamir (Contributor) commented Jun 14, 2022

Problem

When writing tests for Matplotlib plots, one often needs to check that the text shows what's expected.
Adding an __eq__() would make this much simpler to do.

Proposed solution

Add self.__eq__() to the Text object.

@noatamir noatamir changed the title [ENH]: Add __eq__ to Text object [ENH]: Add __eq__() to Text object Jun 14, 2022
@noatamir noatamir changed the title [ENH]: Add __eq__() to Text object [ENH]: Add self.__eq__() to Text object Jun 14, 2022
@tacaswell (Member)

Seems reasonable; however, I think there are a couple of judgement calls we will need to make about which parts of the state participate in deciding equality, and reasonable people will disagree about which way to go. On the "clearly participates" side I would put the string itself and the font size, but everything else seems ambiguous to me.

The position is complicated because it is run through the transform (which is what de facto sets the coordinate system the position is in). If we are strict and only compare positions when the transform is the same, that is clear but probably not all that useful (the only Texts equal to each other would be literally over-struck text). I think we could get away with something like: "if both objects have a transform, the transform is one of the standard transforms on their parent Axes or Figure, and they are the same notional transform, then the transforms are effectively equal". My first thought was that this would be hard, but having typed it out it does not seem so bad. If we go that route, we are also deciding that which Axes / Figure a Text object is in is not part of the equality.
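That "same notional transform" heuristic could be sketched roughly as below. Note that same_notional_transform is a hypothetical helper, not existing Matplotlib API, and only transData and transAxes are checked here:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt

def same_notional_transform(t1, t2):
    # Hypothetical heuristic: each Text's transform must be one of the
    # standard transforms of its own parent Axes, and both Texts must have
    # picked the same notional one (identity check against their own Axes).
    def which(t):
        ax = t.axes
        if ax is None:
            return None
        for name in ("transData", "transAxes"):
            if t.get_transform() is getattr(ax, name):
                return name
        return None
    n1, n2 = which(t1), which(t2)
    return n1 is not None and n1 == n2

fig, (ax1, ax2) = plt.subplots(1, 2)
ta = ax1.text(0.5, 0.5, "hi", transform=ax1.transAxes)
tb = ax2.text(0.5, 0.5, "hi", transform=ax2.transAxes)
tc = ax1.text(0.5, 0.5, "hi")  # defaults to ax1.transData

print(same_notional_transform(ta, tb))  # True: both use their Axes' transAxes
print(same_notional_transform(ta, tc))  # False: transAxes vs transData
```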

You could also compare the position by going all the way through the transform, which would let text in different figures be equal (so axis labels at the same position, on same-sized axes, in different figures) but not the same axis-label strings on different axes in the same figure. It also adds a weird coupling to the overall figure size, which seems wrong.
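The "all the way through the transform" option might look like this sketch, where display_position is a hypothetical helper; the equal result holds only because both figures have the same size and dpi, which is exactly the coupling described above:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np

def display_position(t):
    # Push the stored position all the way through the Text's
    # transform into display (pixel) space.
    return t.get_transform().transform(t.get_position())

# Two same-sized figures, labels at the same axes-relative spot.
fig1, ax1 = plt.subplots(figsize=(4, 3))
fig2, ax2 = plt.subplots(figsize=(4, 3))
t1 = ax1.text(0.5, 0.5, "hi", transform=ax1.transAxes)
t2 = ax2.text(0.5, 0.5, "hi", transform=ax2.transAxes)

# Equal display positions, but only because figure size and dpi match;
# change one figsize and the same two labels compare unequal.
print(np.allclose(display_position(t1), display_position(t2)))  # True
```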

The other thing I see problems with is the font / font spec. I do not recall off the top of my head how aggressively / when we resolve the selected font. Because there are heuristics for finding the font closest to what the user asked for, it is possible for more than one input (in the worst case, random strings that all fall back to the default font, since we think "wrong font" is better than "no text") to resolve to the same font. This also seems tractable: either we resolve it early and I'm worried for no reason, or we pay the cost of fully resolving the fonts when we do the comparison.

@anntzer (Contributor) commented Jun 15, 2022

Actually, I think we should not add such an __eq__ (for such complex objects, equality should just be identity). For example, given the test at pandas-dev/pandas@a0cd359 (#47344), I'd say the text elements should compare unequal, as the parent axes, transform, font, etc. are likely all different between the tick Texts and the manually generated ones. Instead, you should just compare whatever attributes matter for you.

(Also, it would be rather strange to special-case Texts to have their own __eq__ and not have the same on all other artists.)

@jklymak (Member) commented Jun 15, 2022

If __eq__ is not completely unambiguous, we should be very careful about adding it.

@oscargus (Member)

you should just compare whatever attributes matter for you

Hence, one can possibly add one of two things (if we want to make that easier):

  1. An equals method on Artist that takes the other object and a list of properties.
  2. A helper function in testing that takes two objects and a list of properties.

I guess the second is probably more appropriate? Then the question is whether we have enough benefit from it locally.
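The first option could look something like this toy sketch, in which Artist and Text are stand-in classes and equals() is a hypothetical method, not existing Matplotlib API:

```python
class Artist:
    # Toy base class: equals() compares only the properties the caller names.
    def equals(self, other, properties):
        return all(
            getattr(self, f"get_{p}")() == getattr(other, f"get_{p}")()
            for p in properties
        )

class Text(Artist):
    def __init__(self, text, fontsize):
        self._text, self._fontsize = text, fontsize
    def get_text(self):
        return self._text
    def get_fontsize(self):
        return self._fontsize

a, b = Text("hi", 12), Text("hi", 14)
print(a.equals(b, ["text"]))              # True: same string
print(a.equals(b, ["text", "fontsize"]))  # False: 12 != 14
```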

@timhoffm (Member)

A helper function in testing that takes two objects and a list of properties.

That is the way to go. It does not create new API surface.

Then the question is whether we have enough benefit from it locally.

We can let such functionality emerge as needed: If you do a thing more than twice, consider whether there is a pattern that we want to extract into a function. That function could also start in tests/test_text.py and only move when we need it in other modules as well.

@tacaswell (Member)

Although the consensus seems to be not to implement __eq__, one more issue we would have is how it would interact with __hash__. Currently identity provides both hash and equality (so we can use Artists as keys in dictionaries and put them in sets). If we implement __eq__, I think we would have to make them un-hashable. On one hand, mutable objects whose hash depends on mutable state lead to chaos, but so does having an object where equality and hash equality are not always in sync (I would have to go re-read the object model docs to get all the details of why this would go badly).
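A minimal illustration of that interaction, using a toy class rather than Matplotlib code: in Python, defining __eq__ without also defining __hash__ implicitly sets __hash__ to None, so instances can no longer go in sets or serve as dict keys.

```python
class Label:
    def __init__(self, text):
        self.text = text
    def __eq__(self, other):
        # Value-based equality...
        return isinstance(other, Label) and self.text == other.text
    # ...and no __hash__, so Python sets __hash__ = None for this class.

a, b = Label("x"), Label("x")
print(a == b)  # True: value equality now holds

try:
    {a}  # set members must be hashable
except TypeError as exc:
    print("unhashable:", exc)
```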

@story645 (Member) commented Jun 16, 2022

That function could also start in tests/test_text.py and only move when we need it in other modules as well.

One of the pandas devs expressed interest in an equality method, which makes me think that it should be public and live in testing, especially if it's written as a general test_artist_equality(A1, A2, properties).

@tacaswell (Member)

I think a minimal version of this looks like:

def test_artist_equality(a, b, properties, attributes):
    for p in properties:
        # TODO deal with numpy arrays
        assert getattr(a, f'get_{p}')() == getattr(b, f'get_{p}')()
    for attr in attributes:
        # TODO support numpy arrays
        assert getattr(a, attr) == getattr(b, attr)

which only uses public API (but takes a very literal view of "equal").
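For illustration, here is a usage sketch of that minimal helper, rewritten to return a bool instead of asserting (check_artist_equality is a hypothetical name, not Matplotlib API):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

def check_artist_equality(a, b, properties=(), attributes=()):
    # Same literal property-by-property comparison as the sketch above,
    # but returning a bool rather than asserting.
    for p in properties:
        if getattr(a, f"get_{p}")() != getattr(b, f"get_{p}")():
            return False
    for attr in attributes:
        if getattr(a, attr) != getattr(b, attr):
            return False
    return True

fig, ax = plt.subplots()
t1 = ax.text(0, 0, "label", fontsize=12)
t2 = ax.text(1, 1, "label", fontsize=12)

print(check_artist_equality(t1, t2, properties=("text", "fontsize")))  # True
print(check_artist_equality(t1, t2, properties=("text", "position")))  # False
```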

However, moving from __eq__ to a function does not solve the ambiguity of what we mean by "equal" (in addition to the issues I mentioned above, color will likely have some fun normalize / do-not-normalize issues). As a standalone function, I guess we could provide ways to parameterize that:

def test_artist_equality(a, b, properties: Dict[str, Callable[[Any, Any], bool]]):
    for k, func in properties.items():
        assert func(getattr(a, f'get_{k}')(), getattr(b, f'get_{k}')())

# or

import operator

def test_artist_equality(a, b, properties, comparators: Dict[str, Callable[[Any, Any], bool]]):
    for k in properties:
        func = comparators.get(k, operator.eq)
        assert func(getattr(a, f'get_{k}')(), getattr(b, f'get_{k}')())

but that would get pretty awkward to use, so I expect it would grow a bunch of sensible defaults depending on the key (and the associated complexity). In that case, it may be simpler for users to write the loop / hard-code the case they want in the test themselves (or write their own helper if they need to do it a bunch of times) rather than try to understand the general function we provide!

In addition to the variations on how to normalize / which "equal" heuristic to use per property/attribute, I could also see an alternate API where instead of "include lists" we pass "exclude lists" (at least for the properties).

I do not think we have felt a need for this internally, as we are either only comparing a small number of properties/attributes or we rely on "it rendered the same" as the ultimate arbiter (since what the user sees is what actually matters). However, I see why neither of those is a good option for testing pandas' plotting.

It is not clear to me that we can sort out "the right API" without a good motivating use case (or several). If pandas has that use case, I think this should start as a (private) helper function in the pandas test suite; once it has stabilized / proved to be useful, we can pull it back into Matplotlib (mostly for practical reasons... I assume pandas does not want to wait for whatever version we release this function in to become their oldest tested version of Matplotlib!).

@story645 (Member) commented Jun 16, 2022

If pandas has that use case I think this should start as a (private) helper function in the pandas test suite

I think part of the reason the function doesn't exist yet is that it requires an understanding of Matplotlib artist semantics and internals (as seen in this discussion) that might be out of scope for pandas.

What about either asking to expand the scope of https://github.com/matplotlib/pytest-mpl or creating a small standalone testing library that initially contains this function and then maybe grows other testing utilities?

7 participants