Psychosomatic, Lobotomy, Saw: Java Concurrency Torture Update: Don't Stress

Wednesday, 24 July 2013

Java Concurrency Torture Update: Don't Stress

A piece of late news: the concurrency suite is back. And so is my little test on top of it.
A few months ago I posted on yet another piece of joyful brilliance crafted by Master Shipilev, the java-concurrency-suite. I used the same framework to demonstrate some valid concerns regarding unaligned access to memory, remember children:

Unaligned access is not atomic
Unaligned access is potentially slow, in particular:

On older processors
If you cross the cache line

Unaligned access can lead to SEG_FAULT on non-intel architectures, and the result of that might be severe performance hit or your process crashing...

It was so much fun that someone had to put a stop to it, and indeed they did.
The java-concurrency-suite had to be removed from github, and the Master Inquisitor asked me to stop fucking around and remove my fork too, if it's not too much trouble... so I did.
Time flew by and at some point the powers that be decided the tool can return, but torture was too much of a strong word for those corporate types and so it has been rebranded JCStress. JC is nothing to do with the Jewish Community, as some might assume, it is the same Java Concurrency but now it's not torture (it's sanctioned after all), it's stress. Aleksey is simply stressing your JVM, he is not being excessively and creatively sadistic, that would be too much!
Following the article I had a short discussion with Mr T and Gil T with regards to unaligned access in the comments to Martin's blog post on off-heap tuple like storage. The full conversation is long, so I won't bore you, but the questions asked were:

Is unaligned access a performance only, or also a correctness issue?
Are 'volatile' write/reads safer than normal writes/reads?

The answers to the best of my understanding are:

Unaligned access is not atomic[Correction 23/09/2013: On later Intel processor unaligned access within the cache line is atomic, but access across the line is not. See update below and related later post], and therefore can lead to the sort of trouble you will not usually experience in your programs (i.e half written long/int/short values). This is a problem even if you use memory barriers correctly. It adds a 'happened-badly' eventuality to the usual happens before/after reasoning and special care must be taken to not get slapped in the face with a telephone pole.
Volatile read/writes are NOT special. In particular doing a CAS[Correction 23/09/2013: CAS is atomic, all others are not. See update below and related later post]/putOrdered/putVolatile write to an unaligned location such that the value is written across the cache line is NOT ATOMIC.

This came up in a discussion recently, which prompted me to go and fork JCStress on to bitbucket and rerun the experiments on both a Core2Duo and a Xeon processor and re-confirmed the result. This is a problem for folks considering the Atomics for ByteBuffer suggestion discussed on the concurrency-interest mailing list. There is nothing in the ByteBuffer interface to stop you from doing unaligned access...

UPDATE (1/08/2013): Shipilev's slides from JVMLS lightning talk.

UPDATE (23/09/2013): I've discussed the issue further with Gil and Martin, which led to the following post. Some corrections to the above observations have been made which I now inlined above.

5 comments:

Seth Williams5 Sept 2013, 05:37:00
I hope oracle comes up with some kind of fix for these issues with Java concurrency.
ReplyDelete
Replies
forked_franz30 Jun 2015, 18:20:00
Hi Nitsan,
the header of each message in the experimental ipc implementation of a spsc ringbuffer of JCTools is alligned properly (to avoid crossing the cache line) ? If not, because the load volatile and ordered store of the state (ready/busy) of the message could cross the cache line it could turn to a atomicity issue for the producer/consumer?
ReplyDelete
Replies

Note: only a member of this blog may post a comment.