Tuesday, 8 July 2014

Concurrent Bugs: Size Matters

{This post is part of a long running series on lock free queues, checkout the full index to get more context here}As part of my ongoing work on JCTools (no relation to that awfully popular Mexican dude) I've implemented SPSC/MPSC/MPSC/MPMC lock free queues, all of which conform to the java.util.Queue interface. The circular array/ring buffered variants all shared a similar data structure where a consumer thread (calling the poll() method) writes to the consumerIndex and a producer thread (calling the offer() method) writes to the producerIndex. They merrily chase each other around the array as discussed in this previous post.
Recently I've had the pleasure of having Martin Thompson pore over the code and contribute some bits and pieces. Righteous brother that Mr. T is, he caught this innocent looking nugget (we're playing spot the bug, can you see it?):
What could possibly go wrong?


Like bugs? you'll looove Concurrency!

The code above can return a negative number as the size. How can that be you wonder? Let us see the same method with some slow motion comments:

See it?
  • We need not get suspended for this to happen. The independent loads could have the second one hit a cache miss and get delayed, or maybe the consumer and the producer are just moving furiously fast. The important thing to realise is that the loads are independent and so while there may not be much time between the 2, there is the potential for some time by definition.
  • Because the 2 loads are volatile they cannot get re-ordered.
  • Doing it all on one line, or on 3 different lines (in the same order) makes no difference at all.

So how do we solve this?

Is it really solved?
  • This will at least return a value that is in the correct range.
  • This will NOT ALWAYS return the right or correct number. The only way to get the absolutely correct number would be to somehow read both indexes atomically.
  • Atomically reading 2 disjoint longs is not possible... You can (on recent x86) atomically read 2 adjacent longs, but that is:
    • Not possible in Java
    • A recipe for false sharing
  • We could block the producers/consumers from making progress while we read the 2 values, but that would kill lock freedom.
  • This method is lock free, not wait free. In theory the while loop can spin forever in the same way that a CAS loop can spin forever. That is very unlikely however.
This is as good as it gets baby.

Lesson?

We all know that a thread can get suspended and threads execution can interleave, but it is easy to forget/overlook that when looking at a line of code. This issue is in a way just as naive as incrementing a volatile field and expecting atomicity.
Sometimes there is no atomicity to be found and we just have to come to terms with the best approximation to a good answer we can get...
A further lesson here is that having your code reviewed by others is an amazingly effective way to find bugs, and learn :-)


17 comments:

  1. ...or, you employ the optimistic locking on read path, e.g. with StampedLock, which regains the read atomicity for two longs. Not sure if you can afford write locking on size updater side, or whether you can relax it with tryWriteLock , etc.

    ReplyDelete
    Replies
    1. This will:
      1. kill lock freedom
      2. involve a further deref to the lock object
      3. inevitably slow down the methods we care about more (offer/poll)
      Consistency just comes at too high a price for my taste in this particular instance...

      Delete
  2. The answer is of course that you shouldn't call "size" on a concurrent data structure :-)

    ReplyDelete
  3. ConcurrentHashMapV8 suffers from the same problem. Internally it uses a bunch of counter cells (similar to Striped64) that is incremented and decremented concurrently. One thread might add a value to the map which increments the count of cell1 to 1. Another thread might remove the value and update the count of cell2 to -1. The size is basically just the sum of all cells after, in this case after the add and remove: 1 + -1 = 0

    Obvious the calculation of the sum is non atomically so it can get negative when puts/removes are interleaved with the size calculation.

    I remember the first version of ConcurrentHashMapV8 just returned the actual sum.
    The pragmatic solution was just to return 0 in case the sum was negative. And that is basically how it still works.

    ReplyDelete
  4. I think most programmers anyway expect a rough estimation when calling size() on a concurrent queue. Even an "isEmpty() { return q.size() == 0 }" will return eventually consistent results.

    E.g. i use size() for (heuristic) flow control and to control spinlocking, so the fix introduces cost without improving anything. Is there a point in obtaining a "less-fuzzy" result ?

    ReplyDelete
    Replies
    1. There's a few other ways to cook this:
      size1 = Math.max(lvProducerIndex() - lvConsumerIndex(),0); -> underestimate size
      size2 = Math.min(-(lvConsumerIndex() -lvProducerIndex()),capacity); -> over estimate size
      As for fuzziness, size() is fuzzy by definition as long as it is lock free. The implementation in the post acknowledges the possibility of error and attempts to correct when it is observed. The 'cost' is invested in accuracy, but is arguably not worth it. If you go for one of the above implementations, I'd go for over estimates rather than under estimates.

      Delete
  5. @Override
    public int size() {
    long currentConsumerIndexBefore;
    long currentProducerIndex;
    long currentConsumerIndexAfter;

    do {
    currentConsumerIndexBefore = lvConsumerIndex();
    // LoadLoad
    currentProducerIndex = lvProducerIndex();
    // LoadLoad
    currentConsumerIndexAfter = lvConsumerIndex();
    } while (currentConsumerIndexBefore != currentConsumerIndexAfter);

    return (int)(currentProducerIndex - currentConsumerIndexBefore);
    }

    ReplyDelete
    Replies
    1. Nice, and more accurate. Now why didn't you think of that in the first place?
      ;-)

      Delete
    2. Was sitting in the bath earlier.... ;-)

      Delete
    3. Concurrency does require some back burner thinking...

      Delete
    4. Is moving one volatile read out of cycle too obvious that no one mentions it or does it break something?

      public int size() {
      long currentConsumerIndexBefore;
      long currentProducerIndex;
      long currentConsumerIndexAfter = lvConsumerIndex();

      do {
      currentConsumerIndexBefore = currentConsumerIndexAfter;
      currentProducerIndex = lvProducerIndex();
      currentConsumerIndexAfter = lvConsumerIndex();
      } while (currentConsumerIndexBefore != currentConsumerIndexAfter);

      return (int)(currentProducerIndex - currentConsumerIndexBefore);
      }

      Delete
    5. That optimisation should help with a busy queue and I believe it would be safe.

      Delete
  6. why is "while (size < capacity)"? if size < 0, it will break the while loop and return negative value, too.

    ReplyDelete
    Replies
    1. Assumption is that consumerIndex is less than or equal to producerIndex, which is enforced by the offer/poll methods. Reading consumerIndex before producerIndex allows the distance between them to grow. So given a time line of:
      T0: When we read consumerIndex
      T1: When we read producerIndex
      We can say that consumerIndex@T0 <= producerIndex@T0 <= producerIndex@T1
      size = producerIndex@T1 - consumerIndex@T0
      So we expect size to be positive. Still, maths and computers are not the same. Size is an int and can therefore overflow.
      For size to be negative producerIndex@T1 would have to be more than MAX_INT elements away from consumerIndex@T0. This could happen (rather unlikely, but could), I will consider it a minor bug to be fixed in future release.
      Thanks :-)

      Delete
    2. I do not belive it would be possible to have the producer index more than Integer.MAX_VALUE away from the consumer index when the underlying array cannot be greater than Interger.MAX_VALUE in size due to Java language restrictions.

      Delete
    3. Assuming the code in question is:
      int size;
      do {
      final long currentConsumerIndex = lvConsumerIndex(); // T0
      final long currentProducerIndex = lvProducerIndex(); // T1
      size = (int)(currentProducerIndex - currentConsumerIndex);
      } while (size > capacity);
      return size;
      Then I think it can happen for the same reason the loop is required.

      Delete
    4. Indeed. I believe you are correct on this :-) Missed that.

      Delete