bug#10511 Make Ordering methods consistent #6323


Closed
wants to merge 1 commit

Conversation

NthPortal
Contributor

Remove overridden method implementations in Ordering.

Reverts 460bbc1

Fixes scala/bug#10511

@scala-jenkins scala-jenkins added this to the 2.13.0-M4 milestone Feb 13, 2018
@NthPortal
Contributor Author

/sync

@SethTisue
Member

see also scala/bug#5104

this is the kind of thing it can be hard to get a reviewer for...

I'm somewhat uncomfortable with losing the ScalaCheck tests that were added in 460bbc1. Are any of them still applicable, or easily adapted to be applicable?

@khernyo are you still around to weigh in...?

@NthPortal
Contributor Author

@SethTisue all of the ScalaCheck tests check that various operations are inconsistent with Ordering.compare, and thus are not applicable (given the premise of this PR/issue is accepted)

@NthPortal
Contributor Author

NthPortal commented Feb 28, 2018

The root problem is that IEEE floating point values do not have a total ordering; only a partial ordering. Thus, Ordering cannot be implemented for Float or Double without either violating the contract for Ordering or the IEEE spec.
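The conflict is easy to see directly in the REPL; here is a small sketch (plain Scala, no library assumptions) of the two specs pulling apart:

```scala
// With NaN involved, every IEEE primitive comparison is false,
// so no compare-style total order can agree with all of them.
val nan = Double.NaN
assert(!(nan < 1.0) && !(nan > 1.0) && !(nan == 1.0) && !(nan == nan))

// A total ordering must still put NaN somewhere. java.lang.Double.compare
// chooses "NaN is greater than everything" and orders -0.0 before 0.0:
assert(java.lang.Double.compare(Double.NaN, Double.PositiveInfinity) > 0)
assert(java.lang.Double.compare(-0.0, 0.0) < 0)
```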

To complicate matters further, Numeric extends Ordering (rather than, perhaps, PartialOrdering). Float and Double are certainly Numeric (they're numbers!), but now cannot properly fulfill part of their contract. As can be seen in 460bbc1, it is the use of Numeric to compare floating point numbers that is particularly confusing. To me at least, it is not a stretch to say that Ordering.max and Ordering.min violate the IEEE spec in order to make a total ordering; however, I would expect Numeric.max and Numeric.min to conform to the IEEE spec. But Numeric.FloatIsFractional inherits from Ordering.FloatOrdering (and similarly for Double).

Since 460bbc1, Floats and Doubles could not reliably be used in sorted collections if the collection used Ordering.lteq, as putting NaN (and perhaps -0.0) in the collection would create inconsistencies (and perhaps lead to non-deterministic sort order). For that reason, I think it is imperative that Ordering.FloatOrdering and Ordering.DoubleOrdering be made internally consistent, and behave as a total ordering.

A possible solution is to use Orderings other than Ordering.FloatOrdering and Ordering.DoubleOrdering to implement Numeric.FloatIsFractional and Numeric.DoubleIsFractional. With this behaviour, it would be possible to keep the ScalaCheck tests from 460bbc1. However, Numeric[Float] and Numeric[Double] would no longer behave the same as Ordering[Float] and Ordering[Double], which could potentially be very confusing (especially if you have the wrong one in the implicit scope by accident).

Another solution is to change Numeric to extend PartialOrdering instead of Ordering (Integral could still extend Ordering). Then, Numeric.FloatIsFractional and Numeric.DoubleIsFractional could implement PartialOrderings complying with the IEEE spec, and Ordering.FloatOrdering and Ordering.DoubleOrdering could implement consistent total orderings which violate the IEEE spec. Of course, this would be a significant change, potentially breaking lots of code.

Remove overridden method implementations in Ordering.

Reverts 460bbc1

Fixes scala/bug#10511
@khernyo
Contributor

khernyo commented Feb 28, 2018

I just skimmed it for now but as I see it, you are trading one bugfix for another. I can understand where you are coming from, but accepting this MR will break currently working code, so let's not do that.

@adriaanm
Contributor

adriaanm commented Feb 28, 2018

Sorry for not responding sooner. I'm afraid I'm not the best person to review this. (I like floats, but don't know much about comparing them :-))

@adriaanm adriaanm removed their request for review February 28, 2018 15:50
@NthPortal
Contributor Author

NthPortal commented Feb 28, 2018

No worries. To be honest, I'm not sure who's the best person to review it.

Edit: Sorry the bot added you back @adriaanm - I've removed the original comment, and hopefully it won't do that again

@NthPortal
Contributor Author

@khernyo can I ask what currently working code it breaks?

@khernyo
Contributor

khernyo commented Mar 3, 2018

@NthPortal, the commit you reverted is about consistency between the primitive relational operators and the Ordering of floats. So any code that was counting on these behaving the same might break.
E.g. you might write a function using Double and then rewrite it using Numeric[T] (which extends Ordering[T]). The two functions will look exactly the same, but behave subtly differently. Or at least that's what I remember about the problem.

I'd say that 460bbc1 was a short sighted fix but the consistency it provided was good to have. The fixes in the current MR are also desired, but as you concluded, the two fixes are incompatible. I have no idea which one is more important to have, so it's fine by me either way.

I'd vote for using PartialOrdering instead of Ordering for floats but I know it's a huge undertaking. AFAIK, all other solutions will end up with inconsistencies.

Contributor

@Ichoran Ichoran left a comment


There is a fundamental problem with ordering floating-point values, as people have already pointed out. However, this commit doesn't, I think, meaningfully improve the situation because it breaks as much as it fixes.

The fundamental problem is in the IEEE spec, but we still should try to make things work as naturally as possible. In particular, there are three really natural identities, at least one of which we will have to break.

def invariants(x: Double, y: Double) = {
  val O = implicitly[Ordering[Double]]
  def bits(d: Double) =
    java.lang.Double.doubleToRawLongBits(d)
  ((x < y) == O.lt(x, y)) &&                       // #1
  !(O.lt(x, y) && bits(O.min(x, y)) != bits(x)) && // #2
  O.min(x, y) == (x min y)                         // #3
}

The current state of affairs is that #1 and #3 are true but #2 is false; this PR swaps it so that #1 and #3 are false but #2 is true (but has the advantage of a simpler implementation).

There is an additional constraint, which is that orderings should be usable for sorting things. Unfortunately, #1 is incompatible with this; it fails the basic invariant that a total ordering must satisfy, !(x < y && y < x). Sorts often assume this is true, and breaking it can cause not just mis-sorting but runtime failures in the algorithm. Unfortunately, the current state of affairs is that the ordering isn't an ordering.

If we want an ordering, we have to abandon #1, but arguably < has pretty batty behavior anyway for an operator named that.

We still can choose, though, whether #2 or #3 is true. And here, I think, the resolution falls in favor of #3 being true.

This still leaves us with two possibilities regarding the behavior of min. And here I think it's very dangerous to switch from min returning NaN to it returning a value in any case where you might stumble into it by accident. (This could be literally dangerous if anyone relies upon the behavior to detect an error condition.) The most problematic place this could occur subtly is in something like List(1.0, 2.0, Double.NaN).min.

It is also possible to have a typeclass weaker than Ordering, something like Extrema, which just defines min and max methods. It would be implemented by Ordering, but with a higher-precedence implicit for Extrema[Double] and Extrema[Float], so that the collections' min/max can work the "right way". Failing that, I would say that overriding min and max in FloatOrdering and DoubleOrdering is needed in this PR for it to be considered (it's always more dangerous to go from an error-signaling to an error-ignoring behavior).
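That idea could look roughly like this. This is a hypothetical sketch: the Extrema name, the low-priority trait, and the instance names are illustrative, not an existing API:

```scala
// A typeclass weaker than Ordering: only min and max.
trait Extrema[A] {
  def min(x: A, y: A): A
  def max(x: A, y: A): A
}

trait LowPriorityExtrema {
  // Default: any Ordering induces an Extrema.
  implicit def fromOrdering[A](implicit ord: Ordering[A]): Extrema[A] =
    new Extrema[A] {
      def min(x: A, y: A): A = ord.min(x, y)
      def max(x: A, y: A): A = ord.max(x, y)
    }
}

object Extrema extends LowPriorityExtrema {
  // Higher-precedence instance following IEEE semantics (NaN-propagating),
  // shadowing the Ordering-derived one for Double.
  implicit val doubleExtrema: Extrema[Double] = new Extrema[Double] {
    def min(x: Double, y: Double): Double = math.min(x, y)
    def max(x: Double, y: Double): Double = math.max(x, y)
  }
}
```

The low-priority trait makes the Double instance win implicit resolution over the generic Ordering-derived one, which is the precedence scheme described above.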

I'm still not sure that this PR is an improvement even with that change. It might be better to simply add a separate total ordering that users can apply when they know they want the total ordering.

@NthPortal
Contributor Author

NthPortal commented Mar 3, 2018

The problem is, Ordering[Double] already violates #3. The primary method of Ordering - compare - is already inconsistent with math.min and math.max. The question is, should Ordering also violate #2 (sometimes)?

My concern isn't that methods in Ordering do not conform to an actual total ordering; if the methods were consistent with each other but conformed to IEEE instead of a total ordering, I'd be fine. However, some of the methods are consistent with a total ordering, some are consistent with IEEE (and some with neither), but never both at once. It should not matter whether you say O.lt(x, y) or O.compare(x, y) < 0, but it does. It should not matter whether you say if (O.lteq(x, y)) x else y or O.min(x, y), but it does. The various methods lt, lteq, gt, max, etc. on Ordering are meant (I believe) for convenience, to make code more readable - O.min(x, y) is much clearer than if (O.compare(x, y) <= 0) x else y. But as it is now, which method you use changes behaviour in often unexpected ways. TraversableOnce.min is implemented using Ordering.lteq, which leads to the unexpected and incorrect result that Seq(5.0, 3.0, NaN, 4.0).min == 4.0.
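That TraversableOnce.min behaviour is easy to reproduce by hand; a sketch of the same reduceLeft, with lteq written as the primitive IEEE <=:

```scala
// Mirrors TraversableOnce.min's reduceLeft((x, y) => if (ord.lteq(x, y)) x else y)
// when lteq is the primitive IEEE <=.
def minByLteq(xs: Seq[Double]): Double =
  xs.reduceLeft((x, y) => if (x <= y) x else y)

// Every comparison against NaN is false, so NaN "resets" the running minimum:
// (5.0, 3.0) -> 3.0; (3.0, NaN) -> NaN; (NaN, 4.0) -> 4.0
minByLteq(Seq(5.0, 3.0, Double.NaN, 4.0))  // 4.0
```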

The purpose of this PR is to make it so that Ordering does one thing, consistently. And due to the signature of compare, it is not actually possible for it to comply with IEEE, unless it throws an exception when NaN is encountered (which would be very bad).

I could change the PR so that Ordering.min and .max forward to math.min and math.max, but that still leaves Ordering internally inconsistent (and with TraversableOnce.min and .max not being consistent with Ordering.min and .max).


Regarding #1, I don't believe there are any values for which x < y && y < x; feel free to correct me if I'm wrong.

@NthPortal
Contributor Author

NthPortal commented Mar 3, 2018

If others are amenable to it, I would be happy to work on the significant change of having Numeric extend PartialOrdering instead of Ordering (I'm with you here @khernyo). I don't know how much this would break, but I think it would make more sense in the long run (for the reasons given by several people above).

I don't know how much code this would break in the community. We could potentially add min and max methods to Fractional (or Numeric?) which don't inherit from Ordering, if it helps reduce breakage, though of course PartialOrderings are not guaranteed to have min and max for arbitrary elements.

@Ichoran
Contributor

Ichoran commented Mar 6, 2018

@NthPortal - Right now, #3 is not violated:

Welcome to Scala 2.12.4 (OpenJDK 64-Bit Server VM, Java 1.8.0_151).
Type in expressions for evaluation. Or try :help.

scala> val o = implicitly[Ordering[Double]]
o: Ordering[Double] = scala.math.Ordering$Double$@5699578f

scala> o.min(1.0, Double.NaN)
res0: Double = NaN

scala> (1.0 min Double.NaN)
res1: Double = NaN

However, the inconsistency you point out is a problem. But I think the resolution is that min and max should be inconsistent, but that implementations of min and max should defer to those of their ordering and not try to recompute it. In particular, this is the law we would break:

xs.isEmpty || (xs.sorted.head == xs.reduce(_ min _))

rather than breaking

xs.isEmpty || (xs.reduce(_ min _) == xs.min)

Both are surprising, but the latter seems even more surprising to me.
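For what it's worth, the first law can be checked against NaN without any proposed change, since sorted uses the compare-based total order (which places NaN last) while min on doubles is IEEE math.min:

```scala
val xs = List(1.0, Double.NaN, 2.0)
// sorted uses Ordering.compare (java.lang.Double.compare: NaN sorts last)
assert(xs.sorted.head == 1.0)
// but reduce(_ min _) uses IEEE math.min, which propagates NaN
assert(xs.reduce(_ min _).isNaN)
// so xs.sorted.head == xs.reduce(_ min _) already fails when NaN is present
```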

Anyway, I am not fine with Ordering not being an ordering in the case of IEEE floating point, except if we think it's necessary to avoid breaking people's stuff. But I am fine with having min and max propagate errors instead of blindly following lt and such.

Note that you can't actually have the consistency you wanted if you follow IEEE rules, because IEEE rules are that everything is false. But of course we want O.lt(x, y) == !O.gteq(x, y)--so, no, you can't have a consistent result for compare, but that's because it needs to be consistent with two contradictory things (which are contradictory regardless of whether compare is there at all).

So I think the marginally sensible possibilities are

  1. compare is the odd one out, giving an ordering, and everything else follows IEEE rules
  2. min and max are the odd ones out, following IEEE rules, while all else follows the ordering of compare
  3. We don't even have a default ordering for Float and Double (but provide them as options the user can choose), but we do have a default min/max that follows IEEE rules.

None of these are great, but I maintain that everything else is worse. Again, I don't think it's acceptable that xs.min be different from xs.reduce(_ min _). That's about as fundamental of a relation as you can get.

@Ichoran
Contributor

Ichoran commented Mar 6, 2018

Incidentally, I don't think PartialOrdering really helps here--it isn't to IEEE spec, it's more expensive to compute, and it doesn't preserve the API for things that we need (like min over floats, because min isn't definable). As I mentioned before, I do think a new typeclass that computed extrema would help.

@NthPortal
Contributor Author

NthPortal commented Mar 6, 2018

@Ichoran ah, but xs.min is already different from xs.reduce(_ min _), because TraversableOnce.min is implemented as reduceLeft((x, y) => if (ord.lteq(x, y)) x else y)

Edit: perhaps that should be changed, but it is the current behaviour

@Ichoran
Contributor

Ichoran commented Mar 6, 2018

@NthPortal - Yes, but TraversableOnce is doing min wrong by not deferring to Ordering#min.

@NthPortal
Contributor Author

NthPortal commented Mar 6, 2018

@Ichoran Why doesn't PartialOrdering help? AFAICT, it agrees with IEEE spec

trait DoublePartialOrdering extends PartialOrdering[Double] {
  def tryCompare(x: Double, y: Double): Option[Int] = {
    if (x < y) Some(-1)
    else if (x > y) Some(1)
    else if (x == y) Some(0) // includes -0.0 == 0.0
    else None                // one or both is NaN
  }
  def lteq(x: Double, y: Double): Boolean = x <= y
  override def gteq(x: Double, y: Double): Boolean = x >= y
  override def lt(x: Double, y: Double): Boolean = x < y
  override def gt(x: Double, y: Double): Boolean = x > y
  override def equiv(x: Double, y: Double): Boolean = x == y
}

And if you don't use tryCompare, it just uses primitive operations

@Ichoran
Contributor

Ichoran commented Mar 6, 2018

My mistake--PartialOrdering can agree with IEEE spec. But it doesn't solve the min/max problem which is what we had trouble with to begin with; and it doesn't let us sort doubles, which was also what we wanted to do to begin with. It's just a more faithful model of (the comparison operation part of) the spec for IEEE floating point behavior.

@NthPortal
Contributor Author

If we have Numeric[Double] use a PartialOrdering instead, then we can have Ordering[Double] be fully consistent and ignore IEEE spec (like Java does)

@Ichoran
Contributor

Ichoran commented Mar 6, 2018

Does that resolve the max/min problem?

@NthPortal
Contributor Author

I think so? It makes it so that Numeric[Double].max/.min follow IEEE (as one would expect), and Ordering[Double].max/.min follow the total ordering (which one also might expect).

As you mentioned, it would involve adding an Extrema typeclass as well, probably (or the methods can just be added to Numeric itself).
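Concretely, the two mins under that split would behave like this (a sketch; the names totalMin and ieeeMin are illustrative, standing in for the proposed Ordering[Double].min and Numeric[Double].min):

```scala
// Total-ordering min, as Ordering[Double].min would behave under the proposal
// (java.lang.Double.compare treats NaN as greater than everything).
def totalMin(x: Double, y: Double): Double =
  if (java.lang.Double.compare(x, y) <= 0) x else y

// IEEE min, as Numeric[Double].min would behave (NaN-propagating).
def ieeeMin(x: Double, y: Double): Double = math.min(x, y)

totalMin(1.0, Double.NaN)  // 1.0 (NaN sorts after every other value)
ieeeMin(1.0, Double.NaN)   // NaN
```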

@Ichoran
Contributor

Ichoran commented Mar 6, 2018

So the proposal is to change the behavior of Array(1.0, Double.NaN, 2.0).min such that it returns 1.0, instead of returning 2.0 (which is just nonsense, as it does now) or NaN (which is what you'd get from .reduce(_ min _))?

@NthPortal
Contributor Author

Correct; I am proposing that it would return 1.0 (and .max would return NaN).

If you wanted both min and max to return NaN, you could do Array(1.0, Double.NaN, 2.0).reduce(Numeric[Double].min) as well. Or potentially, the signatures could be changed to require an implicit Extrema instead of Ordering, in which case .min(Numeric[Double]) would work (I don't know if that would break the implicit search).

@Ichoran
Contributor

Ichoran commented Mar 6, 2018

I think the xs.min == xs.reduce(_ min _) invariant is more important than the min-according-to-ordering.

But I guess there is an issue with sorted sets, then, as the trivial implementation wouldn't work.

I do think we should do something; I just don't think this reversion is enough to get there.

@NthPortal
Contributor Author

NthPortal commented Mar 6, 2018

I have some other issues with Numeric/Ordering as well, so I might start a thread over at https://contributors.scala-lang.org/

Edit: your concerns are another good reason to start a thread

@NthPortal
Contributor Author

Funnily enough, I've looked at sorted sets, and there's no easy way to get the min/max by the set's Ordering - it just uses TraversableOnce's implementation (which also seems like something worth changing). So the trivial implementation only exists for maxBefore and minAfter, but not max and min (which is really weird).

@Ichoran
Contributor

Ichoran commented Mar 7, 2018

Do you think it's worth going to all the trouble of changing the ordering type, adding new traits, and so on? Or do you think it's maybe better to define a total ordering in, say, Double.totalOrdering and let people invoke it explicitly when it is desired?

@NthPortal
Contributor Author

I do think it's worth it. Though adding a Double.totalOrdering would help a little bit, most people would never know it exists, and it wouldn't solve the underlying problems.

I've started working on a rewrite/redesign over at #6375

@NthPortal
Contributor Author

NthPortal commented Mar 7, 2018

Suppose we change Numeric to extend PartialOrdering instead. What does Numeric.reverse return?

If it returns a PartialOrdering, then it might break existing code. But if it returns an Ordering, then it breaks future code that uses the Numeric as a PartialOrdering.

(It shouldn't be called regardless, but still)

@Ichoran
Contributor

Ichoran commented Mar 7, 2018

It has to return PartialOrdering because there is no "reverse total ordering" because there is no total ordering.

@NthPortal
Contributor Author

Hopefully no one calls reverse explicitly on a Numeric (which, again, doesn't make a lot of sense in the first place)

@NthPortal
Contributor Author

@Ichoran I just realized a major problem with using the current Ordering in a SortedMap or SortedSet. If Ordering.equiv is used, NaN keys will not be retrievable from the map, because they are never equal.
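The pitfall in two lines: IEEE equality never matches NaN, while the compare-based total order does, so whether a lookup uses equiv or compare determines whether a NaN key can ever be found:

```scala
// IEEE ==: NaN is not equal to anything, including itself,
// so an equiv-based lookup can never match a NaN key.
assert(!(Double.NaN == Double.NaN))

// The compare-based total order treats all NaNs as equivalent,
// so a compare-based tree lookup can still find a NaN key.
assert(java.lang.Double.compare(Double.NaN, Double.NaN) == 0)
```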

@Ichoran
Contributor

Ichoran commented Mar 11, 2018

@NthPortal - Good point. However, this can be fixed by making sure that SortedMap and SortedSet always use compare to decide what to return.

@NthPortal
Contributor Author

@Ichoran true (and I believe that's what they already do); it's still a possible pitfall though

@NthPortal
Contributor Author

I agree with the suggestion you made on Discourse, though I'm not certain of the best way to implement it

@Ichoran
Contributor

Ichoran commented Mar 11, 2018

@NthPortal - Just figure out wherever Ordering[Double] comes from and make sure that both Numeric[Double] and RealOrder[Double] are brought into scope together in such a situation. Then you'll get a conflict, and people will have to go look at the docs where we can explain carefully how to select one to have priority over the other. They might need to come in at low priority so that it's easier to override them.

@NthPortal
Contributor Author

superseded by #6410


6 participants