Double Equality

Fri, Feb 10, 2023

Today I once again was hit by a simple question: when are two double (or float) values equal?

Intro

Please note that I refer to double in the following text, but everything said is true for float, and not only for Java: most programming languages share some of these problems.

If you get here you are probably familiar with the problem:

the simple question “are two double values equal?” does not have a simple answer on a computer. The standard ANSI / IEEE Std 754, which nowadays governs virtually all floating point arithmetic on computers and is implemented directly in their processors, defines some special numbers:

There is not just one zero, there is a positive and a negative zero. The idea behind that is that you can distinguish from which side an underflow was reached, and that you can decide whether dividing by the given value results in positive or negative infinity. Still the standard says that they represent the same number and have to be handled as if they are equal.
So there are two infinities, too: positive and negative infinity. Again dividing a standard number by either will give positive or negative zero.
Last there is also NaN (Not a Number) which you e.g. get by calculating 0.0/0.0. Indeed, there is not only one such number, but a whole universe of them, which is sometimes used to hide data inside a double (a technique known as NaN Boxing). Nearly any math calculation with a NaN involved will result in NaN. The surprising fact regarding NaN is that nearly all comparisons where NaN is involved result in false, because it is considered out of the range of numbers. So it is unequal to any number including itself: Double.NaN == Double.NaN is false. And it does not compare, so the same happens for <, <=, >=, and >. Only Double.NaN != Double.NaN is true.
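The special values listed above can be tried out in a few lines. The following is only a minimal sketch: the class name SpecialValuesDemo and the helper isNegativeZero are made up for this illustration and are not part of any library.

```java
public class SpecialValuesDemo {

    /** Hypothetical helper: true only for the zero that has its sign bit set. */
    public static boolean isNegativeZero(double v) {
        // == cannot tell the two zeros apart, but dividing by them can:
        // 1.0 / -0.0 is negative infinity, 1.0 / +0.0 is positive infinity
        return v == 0.0 && 1.0 / v < 0.0;
    }

    public static void main(String[] args) {
        double nz = -0.0;
        double pz = +0.0;
        double nan = 0.0 / 0.0;

        System.out.println(nz == pz);                        // true
        System.out.println(1.0 / nz);                        // -Infinity
        System.out.println(1.0 / pz);                        // Infinity
        System.out.println(1.0 / Double.NEGATIVE_INFINITY);  // -0.0
        System.out.println(nan == nan);                      // false
        System.out.println(nan != nan);                      // true
        System.out.println(nan < 1.0 || nan >= 1.0);         // false
        System.out.println(isNegativeZero(nz));              // true
    }
}
```

Dividing by the value is used here because it is the one arithmetic operation where the sign of a zero becomes visible.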

It’s obvious that this last feature of NaN makes it hard to check two floating point values for equality. Keeping the above in mind the following simple solution will return false when called with two NaN values:

// DON'T USE THIS!
        static boolean areEqual(double d1, double d2)
        {
          return d1 == d2;
        }

Please note that there is a related question: is a double value equal to a constant? In this case using == is no problem, as you know that at least one of the values is not NaN (unless you use NaN deliberately, but why should you). So you can still use something like if (x == 3.0) (see the section A Last Warning below for why this is still a bad idea).

Standard Solution

The standard solution in Java is falling back to java.lang.Double#compare(double, double). If it returns 0, both values are considered equal:

// YOU SHOULDN'T USE THIS! But sadly that's how Java works.
        static boolean areEqual(double d1, double d2)
        {
          return Double.compare(d1, d2) == 0;
        }

It avoids the basic NaN quirk indeed, but let’s see how it is implemented:

// CITATION FROM JAVA SOURCECODE OF java.lang.Double
    public static int compare(double d1, double d2) {
        if (d1 < d2)
            return -1;           // Neither val is NaN, thisVal is smaller
        if (d1 > d2)
            return 1;            // Neither val is NaN, thisVal is larger

        // Cannot use doubleToRawLongBits because of possibility of NaNs.
        long thisBits    = Double.doubleToLongBits(d1);
        long anotherBits = Double.doubleToLongBits(d2);

        return (thisBits == anotherBits ?  0 : // Values are equal
                (thisBits < anotherBits ? -1 : // (-0.0, 0.0) or (!NaN, NaN)
                 1));                          // (0.0, -0.0) or (NaN, !NaN)
    }

The implementor added helpful comments, so its workings should be clear. For two equal values, or if at least one is NaN, everything falls through, and then the bits are compared directly. It will indeed return 0 when called with two Double.NaN values. All NaN values will be placed after everything else including the infinities.

But there is a remaining problem. It is not the NaN values, although it may look like it: there is a myriad of possible NaN values (exactly 9,007,199,254,740,990 in a 64-bit double precision value, split almost evenly between quiet and signalling NaNs), but Double.doubleToLongBits() maps all of them to one canonical NaN, so the compare method above treats any two NaN values as equal. This agrees with Double.isNaN(), which returns true for any of them. Only Double.doubleToRawLongBits() exposes the differences. In general the rest of Java is basically designed as if there is only one NaN value.

The real problem is that it still distinguishes between the two 0.0 values, which is quite unnatural as they represent the same mathematical number, and the standard == comparison considers them the same.
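How Double.compare treats the special values can be checked directly. This is a small sketch, not an exhaustive test; the class name CompareDemo and the helper otherNaN are hypothetical. It relies on the documented behavior of Double.doubleToLongBits, which returns the canonical NaN bit pattern for every NaN.

```java
public class CompareDemo {

    /** Hypothetical helper: a NaN whose payload bits differ from those of Double.NaN. */
    public static double otherNaN() {
        // exponent bits all ones plus any non-zero mantissa is a NaN
        return Double.longBitsToDouble(0x7ff8000000000123L);
    }

    public static void main(String[] args) {
        System.out.println(Double.isNaN(otherNaN()));                             // true
        System.out.println(Double.compare(Double.NaN, Double.NaN));               // 0
        System.out.println(Double.compare(otherNaN(), Double.NaN));               // 0
        System.out.println(Double.compare(Double.POSITIVE_INFINITY, Double.NaN)); // -1
        System.out.println(Double.compare(-0.0, 0.0));                            // -1
        System.out.println(-0.0 == 0.0);                                          // true
    }
}
```

The last two lines show the mismatch: == says the zeros are equal, while compare orders negative zero before positive zero.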

Improved Solution

The following solution which is contained in the de.caff.generics.Primitives class in the generics module of my de·caff Commons provides what I consider the most natural equality check. It is shown for both double and float, and you don’t need the whole jar, you can easily copy the following into your own project. It collapses both positive and negative zero as well as all possible NaN values.

  /**
   * Are two primitive {@code double} values equal?
   * <p>
   * Thanks to the underlying standard ANSI / IEEE Std 754-1985 defining floating point arithmetic
   * {@code Double.NaN} is not equal to anything, not even to itself. This is often not what you want.
   * {@link java.lang.Double#compare(double, double)} falls back to the binary representation in
   * some cases, which again is not always a good idea as it differs between {@code -0.0} and
   * {@code +0.0} which are considered not equal, although they represent the same mathematical number.
   * <p>
   * This method considers two values equal if they both represent the same number.
   * So two {@code NaN} values are equal, and both {@code 0.0} values are equal in any combination, too.
   * This seems the most natural way to compare doubles.
   * @param v1 first value to compare
   * @param v2 second value to compare
   * @return {@code true}: if both values are considered equal according to the above definition<br>
   *         {@code false}: otherwise
   */
  public static boolean areEqual(double v1, double v2)
  {
    if (v1 == v2) { // this handles -0.0 == 0.0 correctly
      return true;
    }
    return Double.isNaN(v1)  &&  Double.isNaN(v2);  // make any 2 NaN values also equal
  }

  /**
   * Are two primitive {@code float} values equal?
   * <p>
   * Thanks to the underlying standard ANSI / IEEE Std 754-1985 defining floating point arithmetic
   * {@code Float.NaN} is not equal to anything, not even to itself. This is often not what you want.
   * {@link java.lang.Float#compare(float, float)} falls back to the binary representation in
   * some cases, which again is not always a good idea as it differs between {@code -0.0f} and
   * {@code +0.0f} which are considered not equal, although they represent the same mathematical number.
   * <p>
   * This method considers two values equal if they both represent the same number.
   * So two {@code NaN} values are equal, and both {@code 0.0f} values are equal in any combination, too.
   * This seems the most natural way to compare floats.
   * @param v1 first value to compare
   * @param v2 second value to compare
   * @return {@code true}: if both values are considered equal according to the above definition<br>
   *         {@code false}: otherwise
   */
  public static boolean areEqual(float v1, float v2)
  {
    if (v1 == v2) { // this handles -0.0f == 0.0f correctly
      return true;
    }
    return Float.isNaN(v1)  &&  Float.isNaN(v2); // make any 2 NaN values also equal
  }

Porting to other languages should be dead simple: the library method isNaN(v) just returns the result of v != v, which is only true for NaN values.

Still this does not handle hashing (which is also broken in that it distinguishes between -0 and +0, although it does collapse all possible NaN values) and sorting. Thanks to automatic boxing any double value can become a Double with different behavior (see examples in the next section), which makes it impossible to circumvent these problems without too much overhead, even if you want to.

The Primitives class above still provides methods for hashing double and float values in a way that agrees with the equality methods above.
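To give an idea of what such a hash can look like, here is a minimal sketch. It is not the code from the Primitives class: the class name DoubleHashSketch and the method hash are assumptions made for this illustration. It only has to ensure that values which areEqual treats as equal get the same hash code.

```java
public class DoubleHashSketch {

    /** Same logic as the areEqual method shown above. */
    public static boolean areEqual(double v1, double v2) {
        return v1 == v2 || (Double.isNaN(v1) && Double.isNaN(v2));
    }

    /** Hypothetical hash that agrees with areEqual. */
    public static int hash(double v) {
        // v == 0.0 is true for both zeros, so -0.0 is replaced by +0.0 here.
        // Double.hashCode is based on doubleToLongBits, which already maps
        // every NaN to the same canonical bit pattern.
        return Double.hashCode(v == 0.0 ? 0.0 : v);
    }

    public static void main(String[] args) {
        System.out.println(hash(-0.0) == hash(0.0));                        // true
        System.out.println(Double.hashCode(-0.0) == Double.hashCode(0.0));  // false
    }
}
```

All other values keep the hash code they get from Double.hashCode, so only the two zeros are treated differently.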

A Dark Corner of Java

The standard solution for equality checks in Java is the comparison method mentioned above. This leads to a dark corner because it makes java.lang.Double and double behave differently and can lead to subtle bugs.

All asserts in the following code are fulfilled:

    // === Negative and positive zero ===
    final double nz = -0.0;  // negative zero
    final double pz = +0.0;  // positive zero, same as 0.0

    final Double abNZ = nz; // automatically boxed negative zero (i.e. Double.valueOf())
    final Double abPZ = pz; // automatically boxed positive zero (i.e. Double.valueOf())

    // Equality    
    assert nz == pz;           // equal because they represent the same number (IEEE 754)

    assert !abNZ.equals(abPZ); // but Java considers them to be different when boxed

    // Hashing
    assert Double.hashCode(nz) != Double.hashCode(pz);  // hashcodes differ
    assert Double.hashCode(nz) == abNZ.hashCode();
    assert Double.hashCode(pz) == abPZ.hashCode();

    // Comparison
    assert !(nz < pz);                          // still the same, so neither is smaller or greater
    assert abNZ.compareTo(abPZ) < 0;            // Java thinks differently
    
    // Automatic boxing/unboxing
    assert abNZ == pz  &&  !abNZ.equals(pz);    // same, but not equal

    // === NaN ===
    final double nan = Double.NaN;
    final Double bNaN = nan;

    assert bNaN != nan  &&   bNaN.equals(nan);  // deliberately: different values, but equal

Thanks to automatic boxing and unboxing, especially the line commented with “same, but not equal” is basically pure evil: these two things are identical, but they are not equal.

As a result any Map with Double keys (or Set with Double values) will show strange behavior if you use both -0.0 and +0.0. This may easily happen if the key comes from a calculation which results in an underflow of the possible range.
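The map problem is easy to reproduce. The following sketch uses a made-up class ZeroKeyDemo, and the underflowing product is just one arbitrary example of a calculation that ends up as negative zero.

```java
import java.util.HashMap;
import java.util.Map;

public class ZeroKeyDemo {

    /** Builds a map that uses both zero values as keys. */
    public static Map<Double, String> zeroMap() {
        Map<Double, String> map = new HashMap<>();
        map.put(0.0, "positive zero");
        map.put(-0.0, "negative zero");   // does NOT replace the entry above
        return map;
    }

    /** A calculation that underflows from the negative side and yields -0.0. */
    public static double underflow() {
        return -1e-200 * 1e-200;
    }

    public static void main(String[] args) {
        Map<Double, String> map = zeroMap();
        double key = underflow();
        System.out.println(map.size());     // 2
        System.out.println(key == 0.0);     // true
        System.out.println(map.get(key));   // negative zero
        System.out.println(map.get(0.0));   // positive zero
    }
}
```

Although key == 0.0 is true, looking up key does not find the entry stored under 0.0.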

As this behavior is codified deep inside the implementation of java.lang.Double since Java 1.0 there is not much hope that the above will ever change. Software might depend on it, although I’m pretty sure that for more than 99% of usages this behavior regarding the two zero values is unwanted, unexpected and therefore dangerous.

A Last Warning

In most cases you should not check floating point values for exact equality as is done above. Floating point calculations usually introduce inaccuracies, so you should always be prepared for deviations by comparing with a small margin, usually a small number named epsilon.
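A classic example of such an inaccuracy, as a small sketch: the class name RoundingDemo, the helper nearlyEqual and the margin of 1e-9 are arbitrary choices for this illustration.

```java
public class RoundingDemo {

    /** Compares with a margin instead of exact equality. */
    public static boolean nearlyEqual(double v1, double v2, double eps) {
        return Math.abs(v1 - v2) <= eps;
    }

    public static void main(String[] args) {
        double sum = 0.1 + 0.2;
        System.out.println(sum);                          // 0.30000000000000004
        System.out.println(sum == 0.3);                   // false
        System.out.println(nearlyEqual(sum, 0.3, 1e-9));  // true
    }
}
```

Neither 0.1 nor 0.2 nor 0.3 can be represented exactly in binary, so the rounded sum misses the rounded 0.3 by one bit.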

It is often not simple to decide on a good epsilon, as it depends on the expected range of handled numbers.

Nevertheless, for completeness here are also implementations for the good way to compare floating numbers:

  /**
   * Are two primitive {@code double} values nearly equal?
   * @param v1 first value to compare
   * @param v2 second value to compare
   * @param eps allowed deviation for both values to be still considered equal, a small non-negative number
   * @return {@code true}: if both values are considered equal within the given allowance<br>
   *         {@code false}: if they differ too much
   */
  public static boolean areEqual(double v1, double v2, double eps)
  {
    assert eps >= 0.0;
    return Math.abs(v1 - v2) <= eps;
  }

  /**
   * Are two primitive {@code float} values nearly equal?
   * @param v1 first value to compare
   * @param v2 second value to compare
   * @param eps allowed deviation for both values to be still considered equal, a small non-negative number
   * @return {@code true}: if both values are considered equal within the given allowance<br>
   *         {@code false}: if they differ too much
   */
  public static boolean areEqual(float v1, float v2, float eps)
  {
    assert eps >= 0.0f;
    return Math.abs(v1 - v2) <= eps;
  }

Using an epsilon of 0.0 in the above methods would also provide an exact equality method which usually does okay, but involves a few more calculations. As expected a NaN value for any parameter will result in false, i.e. not equal, which again implies NaN != NaN. Note that the same happens for two infinities of the same sign, because their difference is NaN. But as you use these methods on calculated and/or expected values, calling them with NaN would be a sign of a problem elsewhere, and the assertions (if enabled) would take care of the stupid idea of making epsilon NaN.
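The edge cases of the epsilon comparison can be seen in a short sketch. The class name EpsilonEdgeCases is made up, and the variant areEqualOrSame is only a hypothetical suggestion for the case that equal infinities should count as equal; it is not part of the library described above.

```java
public class EpsilonEdgeCases {

    /** Same check as the three-argument double method above. */
    public static boolean areEqual(double v1, double v2, double eps) {
        assert eps >= 0.0;
        return Math.abs(v1 - v2) <= eps;
    }

    /** Hypothetical variant: exact equality is tested first, so equal infinities match. */
    public static boolean areEqualOrSame(double v1, double v2, double eps) {
        assert eps >= 0.0;
        return v1 == v2 || Math.abs(v1 - v2) <= eps;
    }

    public static void main(String[] args) {
        double inf = Double.POSITIVE_INFINITY;
        System.out.println(areEqual(-0.0, 0.0, 0.0));               // true
        System.out.println(areEqual(Double.NaN, Double.NaN, 1.0));  // false
        System.out.println(areEqual(inf, inf, 0.0));                // false, inf - inf is NaN
        System.out.println(areEqualOrSame(inf, inf, 0.0));          // true
    }
}
```

Whether infinities should be considered nearly equal at all depends on the use case, so the variant is a design choice rather than a fix.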