Bug Hunting 01

I create bugs all the time, but usually they are plain stupid. But sometimes I can become quite creative…

A Shock in the Evening

When I have released a new version of my viewer I have the habit to test it after work in the evening some hours after I have finished work and feel relaxed. Quite often I find smaller problems like typos which I missed during the day.

But yesterday I got a shock when with a special file my viewer looked like this:

Strange bug screenshot

Screenshot of viewer showing strange bug (click to open)

Obviously there was something seriously wrong. What was drawn was completely different from that what should have been drawn which is mostly black background. But as it was late I postponed fixing for the next day.

Happy Hunting

I opened the problematic file on my work machine which is using Linux (testing was on Windows) and easily confirmed that the bug was independent of the operation system. Indeed on Linux I managed to even break the main view and not only the two smaller overview images which I didn’t manage on Windows. I checked a few other files but the error appeared only on the one. That was strange because the file was not using special features. Indeed it not even used some of the features I implemented in the last days.

To see from where the lines were originating in the original DWG which was quite large I made use of the picking feature my viewer provides. You can click with a mouse on any drawn shape and see from which item it was created. But picking these strange lines was not possible.

All drawing components in my viewer are double buffered, and as buffer image I’m using the best fit for the platform. This can speed up copying the image to screen. The code is found in the de.caff.util.Utility of the Caff Commons and looks like

  /**
   * Create a buffered image which is optimized for the current platform.
   *
   * Please note that the transparency type usually has an impact on performance.
   * Best is opaque, followed by bitmap, and translucent images may be a drawn a lot slower.
   *
   * Regardless of that it's always a good idea to call the image's {@code flush()} method
   * before it is discarded.
   * @param width        image width
   * @param height       image height
   * @param transparency transparency type, either {@link java.awt.Transparency#OPAQUE},
   *                     {@link java.awt.Transparency#BITMASK}, or {@link java.awt.Transparency#TRANSLUCENT}
   * @return image of given size, optimized for the platform
   */
  @NotNull
  public static BufferedImage createPlatformImage(int width, int height, int transparency)
  {
    final GraphicsEnvironment localGraphicsEnvironment = GraphicsEnvironment.getLocalGraphicsEnvironment();
    if (localGraphicsEnvironment.isHeadlessInstance()) {
      return new BufferedImage(width, height,
                               transparency == Transparency.OPAQUE
                                       ? BufferedImage.TYPE_INT_RGB
                                       : BufferedImage.TYPE_INT_ARGB);
    }
    else {
      GraphicsDevice gd = localGraphicsEnvironment.getDefaultScreenDevice();
      GraphicsConfiguration gc = gd.getDefaultConfiguration();

      return gc.createCompatibleImage(width, height, transparency);
    }
  }

I remembered having problems with platform images before, so I changed above method so it is always returning a standard BufferedImage. But that didn’t help, so I reverted that change.

On the way I noticed another strange fact. The two images to the right were always broken, but the main image was becoming okay as soon as I forced it to redraw. This already showed that finding the problem couldn’t be easy.

Another old error I remembered was that when you use very large coordinates on a component or image the java lib is not very good at clipping these lines. They are just drawn somewhere even if they are complete outside of the component or image. Although basically there shouldn’t appear coordinates like this in the three images as everything should fit perfectly in, but their are many layers of transformations and clipping when bringing the internal model to screen, so it not to difficult to make something wrong there. So I went to the point deep down under these layers of transformations where something is actually drawn to a java.awt.Graphics.

The code looked like

  public void drawLine(@NotNull IPoint3D p1, @NotNull IPoint3D p2, @NotNull Color color, double lineWeight)
  {
    g.setColor(color);
    setLineWeight(lineWeight);
    g.drawLine((int)p1.getX(), (int)p1.getY(), (int)p2.getX(), (int)p2.getY());
  }

I set a conditional breakpoint on the g.drawLine() line with a condition of p1.getX() > 10000 which obviously was not meant to happen.

But it happened what I already expected: nothing. But not nothing in the meaning that the breakpoint was not hit, but that it was called so often that everything hung. Evaluating the condition at a point which was called 100,000th of times was taking too long.

Luckily there is an easy solution for this, so after waiting for a minute I stopped the debugger and inserted a bit of code:

  public void drawLine(@NotNull IPoint3D p1, @NotNull IPoint3D p2, @NotNull Color color, double lineWeight)
  {
    g.setColor(color);
    setLineWeight(lineWeight);
    if (p1.getX() > 10000 || p2.getX() > 10000) {
      System.out.println("Ouch!");
    }
    g.drawLine((int)p1.getX(), (int)p1.getY(), (int)p2.getX(), (int)p2.getY());
  }

This is a neat trick to know. Evaluating a condition at a break point is a very slow operation. If the line is rarely hit, you can live with that. But if it is heavily executed converting the condition to code will speed up everything by several magnitudes. You’ll just have to remember to remove the code when you are finished.

A fast test showed that indeed there were a lot of Ouch!es appearing in the console. So I set an unconditional breakpoint to the println line and it was eventually hit.

This allowed me to have a look at the stack, which looked okay, but somehow the transformations were wrong. It took me a few minutes to understand what was happening: the three images are drawn in different threads, and although the relevant objects were cloned before usage, the cloning missed a few items, especially in one case a transformation stack. So all threads were pushing and popping transformations to the same stack, and therefore this mess appeared. Larger data makes it more probable that the error appears, so in my other tests I was just lucky.

Because I always find Java cloning complex to handle right (especially when you don’t want to miss something crucial like here) I switched the code to copy construction which is more to write but at least for older guys like me easier to get right.