
A few months back I was satisfying my OCD by reading up on Java object memory layout. Now Java, as we all know and love, is all about taking care of such pesky details as memory layout for you. You just leave it to the JVM, son, and don't lose sleep over it.
Sometimes though... sometimes you do care. And when you do, here's how to find out.
In theory, theory and practice are the same
Here's an excellent article from a few years back which tells you all about how Java should lay out your objects. To summarise:
- Objects are 8 bytes aligned in memory (address A is K aligned if A % K == 0)
- All fields are type aligned (long/double are 8 aligned, int/float 4, short/char 2)
- Fields are packed in the order of their size, except for references, which come last
- Class fields are never mixed, so if B extends A, an object of class B will be laid out in memory with A's fields first, then B's
- Subclass fields start at a 4 byte alignment
- If the first field of a class is a long/double and the class starting point (after the header, or after the superclass fields) is not 8 aligned, a smaller field may be swapped in to fill the 4 byte gap.
The reasoning behind these rules:
- Unaligned access is bad, so the JVM saves you from bad layout (unaligned access to memory can cause all sorts of ill side effects, including crashing your process on some architectures)
- Naive layout of your fields would waste memory, so the JVM reorders fields to reduce the overall size of your object
- The JVM implementation requires types to have a consistent layout, hence the subclass rules
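To make the rules concrete, here is a minimal sketch that applies them to a single class. It is a toy model, not HotSpot's actual algorithm: the names `LayoutRulesSketch`, `FieldSpec` and `layout` are made up for illustration, the 12 byte header and 4 byte references are assumptions matching a 64-bit JVM with compressed references, and only the simple "4 byte gap before a long" case is handled. The input is the field list of AbstractHistogram, a class discussed later in this post.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the rules listed above. It is not HotSpot's real algorithm. */
final class LayoutRulesSketch {

    /** A declared field: its name, its size in bytes, and whether it is a reference. */
    static final class FieldSpec {
        final String name;
        final int size;
        final boolean reference;

        FieldSpec(String name, int size, boolean reference) {
            this.name = name;
            this.size = size;
            this.reference = reference;
        }
    }

    /** Rounds offset up to the next multiple of alignment (which must be a power of two). */
    static int alignUp(int offset, int alignment) {
        return (offset + alignment - 1) & -alignment;
    }

    /** Assigns offsets to one class's fields, starting right after the header or superclass. */
    static Map<String, Integer> layout(int start, List<FieldSpec> declared) {
        List<FieldSpec> pending = new ArrayList<>(declared);
        // Primitives by descending size, references last. The sort is stable,
        // so declaration order survives within each group.
        pending.sort((a, b) -> {
            if (a.reference != b.reference) {
                return a.reference ? 1 : -1;
            }
            return Integer.compare(b.size, a.size);
        });

        Map<String, Integer> offsets = new LinkedHashMap<>();
        int offset = start;

        // Gap rule: an 8 byte field comes first but the start is not 8 aligned,
        // so pull the first 4 byte primitive forward to fill the hole.
        if (!pending.isEmpty() && pending.get(0).size == 8 && offset % 8 != 0) {
            for (int i = 0; i < pending.size(); i++) {
                FieldSpec candidate = pending.get(i);
                if (!candidate.reference && candidate.size == 4) {
                    offsets.put(candidate.name, offset);
                    offset += 4;
                    pending.remove(i);
                    break;
                }
            }
        }

        for (FieldSpec field : pending) {
            offset = alignUp(offset, field.size); // each field is aligned to its own size
            offsets.put(field.name, offset);
            offset += field.size;
        }
        return offsets;
    }

    /** AbstractHistogram's fields in declaration order; header and reference sizes are assumed. */
    static Map<String, Integer> abstractHistogramExample() {
        List<FieldSpec> fields = new ArrayList<>();
        fields.add(new FieldSpec("highestTrackableValue", 8, false));
        fields.add(new FieldSpec("numberOfSignificantValueDigits", 4, false));
        fields.add(new FieldSpec("bucketCount", 4, false));
        fields.add(new FieldSpec("subBucketCount", 4, false));
        fields.add(new FieldSpec("countsArrayLength", 4, false));
        fields.add(new FieldSpec("histogramData", 4, true));
        fields.add(new FieldSpec("subBucketHalfCountMagnitude", 4, false));
        fields.add(new FieldSpec("subBucketHalfCount", 4, false));
        fields.add(new FieldSpec("subBucketMask", 8, false));
        return layout(12, fields);
    }

    public static void main(String[] args) {
        abstractHistogramExample().forEach((name, off) -> System.out.println(off + "\t" + name));
    }
}
```

Under these assumptions the model puts numberOfSignificantValueDigits in the gap at offset 12, the two longs at 16 and 24, the remaining ints from 32 to 48 and the reference at 52. A real JVM is free to do something else, which is the point of the rest of this post.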
False False Sharing Protection
For one thing, the rules are not part of the JLS; they are just implementation details. If you read Martin Thompson's article about false sharing you'll notice Mr. T had a solution to false sharing which worked on JDK 6, but no longer worked on JDK 7. Here are the 2 versions:
// No false sharing on 6, but happens on 7
public final static class VolatileLong
{
    public volatile long value = 0L;
    public long p1, p2, p3, p4, p5, p6;
}

// No false sharing on 6 or 7
public static class PaddedAtomicLong extends AtomicLong
{
    public volatile long p1, p2, p3, p4, p5, p6 = 7L;
}
It turns out the JVM changed the way it orders the fields between 6 and 7, and that was enough to break the spell. In fairness, there is no rule specified above which requires the field order to correlate to the order in which the fields were defined, but... it's a lot to worry about and it can trip you up.
Just as the above rules were still fresh in my mind, LMAX (who kindly open sourced the Disruptor) released the Coalescing Ring Buffer. I read through the code and came across the following:
public final class CoalescingRingBuffer<K, V> implements CoalescingBuffer<K, V> {
    private volatile long nextWrite = 1; // <-- producer access (my comment)
    private volatile long lastCleaned = 0; // <-- producer access (my comment)
    private volatile long rejectionCount = 0;
    private final K[] keys;
    private final AtomicReferenceArray<V> values;
    private final K nonCollapsibleKey = (K) new Object();
    private final int mask;
    private final int capacity;
    private volatile long nextRead = 1; // <-- consumer access (my comment)
    private volatile long lastRead = 0; // <-- consumer access (my comment)
    ...
}
I approached Nick Zeeb on the blog post which introduced the CoalescingRingBuffer and raised my concern that the fields accessed by the producer/consumer might be suffering from false sharing. Nick's reply:
I’ve tried to order the fields such that the risk of false-sharing is minimized. I am aware that Java 7 can re-order fields however. I’ve run the performance test using Martin Thompson’s PaddedAtomicLong instead but got no performance increase on Java 7. Perhaps I’ve missed something so feel free to try it yourself.
Now Nick is a savvy dude, and I'm not quoting him here to criticise him. I'm quoting him to show that this is confusing stuff (so in a way, I quote him to comfort myself in the company of other equally confused professionals). How can we know? Here's one way I thought of after talking to Nick:
public class FalseSharingTest {
    @Test
    public void test() throws NoSuchFieldException, SecurityException {
        // Offset of each field from the start of a CoalescingRingBuffer instance
        long nextWriteOffset = UnsafeAccess.unsafe.objectFieldOffset(
            CoalescingRingBuffer.class.getDeclaredField("nextWrite"));
        long lastReadOffset = UnsafeAccess.unsafe.objectFieldOffset(
            CoalescingRingBuffer.class.getDeclaredField("lastRead"));
        // Fail if the producer and consumer fields are less than 64 bytes apart
        assertTrue(Math.abs(nextWriteOffset - lastReadOffset) >= 64);
    }
}
Using Unsafe I can get the offset of each field from the start of the object. If 2 fields are less than a cache line apart (64 bytes in the test above) they can suffer from false sharing, depending on where the object ends up in memory. Sure, it's a bit of a hackish way to verify things, but it can become part of your build, so when versions change you won't get caught out.
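The test above leans on a helper called UnsafeAccess. Here is a minimal, self-contained sketch of how such a check could be put together with nothing but the JDK. The class and method names (`FieldDistanceSketch`, `Counters`, `distance`, `mayShareCacheLine`) are made up for illustration, the 64 byte cache line is an assumption, and `sun.misc.Unsafe` is an unsupported API that recent JDKs deprecate, so treat this as a sketch rather than something to rely on.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

final class FieldDistanceSketch {

    /** Assumed cache line size; 64 bytes is common on x86 but not universal. */
    static final long CACHE_LINE_BYTES = 64;

    /** Hypothetical class with one producer-side and one consumer-side field. */
    static final class Counters {
        volatile long produced;
        volatile long consumed;
    }

    /** Grabs the Unsafe singleton through its private static field. */
    static Unsafe unsafe() {
        try {
            Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe");
            theUnsafe.setAccessible(true);
            return (Unsafe) theUnsafe.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not accessible", e);
        }
    }

    /** Offset of an instance field from the start of its object, in bytes. */
    static long offsetOf(Class<?> type, String fieldName) {
        try {
            return unsafe().objectFieldOffset(type.getDeclaredField(fieldName));
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Absolute distance in bytes between two fields declared on the same class. */
    static long distance(Class<?> type, String first, String second) {
        return Math.abs(offsetOf(type, first) - offsetOf(type, second));
    }

    /** True if the two fields are close enough to land on one cache line. */
    static boolean mayShareCacheLine(Class<?> type, String first, String second) {
        return distance(type, first, second) < CACHE_LINE_BYTES;
    }

    public static void main(String[] args) {
        System.out.println("distance = "
            + distance(Counters.class, "produced", "consumed"));
        System.out.println("may share a cache line = "
            + mayShareCacheLine(Counters.class, "produced", "consumed"));
    }
}
```

A distance of a full cache line or more between two fields of the same object means they cannot land on the same line. The reverse does not hold: a smaller distance only means they might, which is why the helper is named mayShareCacheLine.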

Enough of that false sharing thing... so negative... why would you care about memory layout apart from false sharing? Here's another example.
The Hot Bunch
Through the blessings of the gods, at about the same time LMAX released the CoalescingRingBuffer, Gil Tene (CTO of Azul) released HdrHistogram. Now Gil is seriously, awesomely, bright and knows more about JVMs than most mortals (here's his InfoQ talk, watch it) so I was keen to look into his code. And what do you know, a bunch of hot fields:
public abstract class AbstractHistogram implements Serializable {
    // "Cold" accessed fields. Not used in the recording code path:
    long highestTrackableValue;
    int numberOfSignificantValueDigits;
    int bucketCount;
    int subBucketCount;
    int countsArrayLength;
    HistogramData histogramData;

    // Bunch "Hot" accessed fields (used in the the value recording code path) here, near the end, so
    // that they will have a good chance of ending up in the same cache line as the counts array reference
    // field that subclass implementations will add.
    int subBucketHalfCountMagnitude;
    int subBucketHalfCount;
    long subBucketMask;
    ...
}
What Gil is doing here is good stuff: he's trying to get relevant fields to huddle together in memory, which will improve the likelihood of them ending up on the same cache line, saving the CPU a potential cache miss. Sadly the JVM has other plans...
So here is another tool to add to your tool belt, to help make sense of your memory layout: Java Object Layout. I bumped into it by accident, not while obsessing about memory layout at all. Here's the output for Histogram:
Running 64-bit HotSpot VM.
Using compressed references with 3-bit shift.
Objects are 8 bytes aligned.

org.HdrHistogram.Histogram
 offset  size           type  description
      0    12                 (assumed to be the object header + first field alignment)
     12     4            int  AbstractHistogram.numberOfSignificantValueDigits
     16     8           long  AbstractHistogram.highestTrackableValue
     24     8           long  AbstractHistogram.subBucketMask
     32     4            int  AbstractHistogram.bucketCount
     36     4            int  AbstractHistogram.subBucketCount
     40     4            int  AbstractHistogram.countsArrayLength
     44     4            int  AbstractHistogram.subBucketHalfCountMagnitude
     48     4            int  AbstractHistogram.subBucketHalfCount
     52     4  HistogramData  AbstractHistogram.histogramData
     56     8           long  Histogram.totalCount
     64     4         long[]  Histogram.counts
     68     4                 (loss due to the next object alignment)
     72                       (object boundary, size estimate)
VM agent is not enabled, use -javaagent: to add this JAR as Java agent
Note how histogramData jumps to the bottom and subBucketMask is moved to the top, breaking up our hot bunch. The solution is ugly but effective: move all fields but the hot bunch to an otherwise pointless parent class:
abstract class AbstractHistogramColdFields implements Serializable {
    // "Cold" accessed fields. Not used in the recording code path:
    long highestTrackableValue;
    int numberOfSignificantValueDigits;
    int bucketCount;
    int subBucketCount;
    int countsArrayLength;
    HistogramData histogramData;
}

public abstract class AbstractHistogram extends AbstractHistogramColdFields {
    // Bunch "Hot" accessed fields (used in the the value recording code path) here, near the end, so
    // that they will have a good chance of ending up in the same cache line as the counts array reference
    // field that subclass implementations will add.
    int subBucketHalfCountMagnitude;
    int subBucketHalfCount;
    long subBucketMask;
    ...
}
And the new layout:
Running 64-bit HotSpot VM.
Using compressed references with 3-bit shift.
Objects are 8 bytes aligned.

org.HdrHistogram.Histogram
 offset  size           type  description
      0    12                 (assumed to be the object header + first field alignment)
     12     4            int  AbstractHistogramColdFields.numberOfSignificantValueDigits
     16     8           long  AbstractHistogramColdFields.highestTrackableValue
     24     4            int  AbstractHistogramColdFields.bucketCount
     28     4            int  AbstractHistogramColdFields.subBucketCount
     32     4            int  AbstractHistogramColdFields.countsArrayLength
     36     4  HistogramData  AbstractHistogramColdFields.histogramData
     40     8           long  AbstractHistogram.subBucketMask
     48     4            int  AbstractHistogram.subBucketHalfCountMagnitude
     52     4            int  AbstractHistogram.subBucketHalfCount
     56     8           long  Histogram.totalCount
     64     4         long[]  Histogram.counts
     68     4                 (loss due to the next object alignment)
     72                       (object boundary, size estimate)
VM agent is not enabled, use -javaagent: to add this JAR as Java agent
Joy! I shall be sending Mr. Tene a pull request shortly :-)
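If you don't have the layout tool at hand, a rough version of the same listing can be pieced together from Unsafe field offsets. What follows is a minimal sketch and not how the tool itself works: `LayoutDumpSketch`, `Slot`, `ColdFields` and `HotFields` are made-up names, the two classes are simplified stand-ins for the cold/hot split above, and `sun.misc.Unsafe` is unsupported and deprecated in recent JDKs. The offsets printed depend on the JVM and its flags.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import sun.misc.Unsafe;

final class LayoutDumpSketch {

    /** Hypothetical stand-in for the cold parent class. */
    static class ColdFields {
        long highestTrackableValue;
        int numberOfSignificantValueDigits;
        int bucketCount;
        int subBucketCount;
        int countsArrayLength;
        Object histogramData;
    }

    /** Hypothetical stand-in for the class holding the hot bunch. */
    static class HotFields extends ColdFields {
        int subBucketHalfCountMagnitude;
        int subBucketHalfCount;
        long subBucketMask;
    }

    /** One instance field together with its offset in the object. */
    static final class Slot {
        final long offset;
        final Field field;

        Slot(long offset, Field field) {
            this.offset = offset;
            this.field = field;
        }
    }

    /** Grabs the Unsafe singleton through its private static field. */
    static Unsafe unsafe() {
        try {
            Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe");
            theUnsafe.setAccessible(true);
            return (Unsafe) theUnsafe.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not accessible", e);
        }
    }

    /** Collects the instance fields of type and its superclasses, sorted by offset. */
    static List<Slot> layoutOf(Class<?> type) {
        Unsafe unsafe = unsafe();
        List<Slot> slots = new ArrayList<>();
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                if (!Modifier.isStatic(f.getModifiers())) {
                    slots.add(new Slot(unsafe.objectFieldOffset(f), f));
                }
            }
        }
        slots.sort(Comparator.comparingLong(s -> s.offset));
        return slots;
    }

    public static void main(String[] args) {
        for (Slot s : layoutOf(HotFields.class)) {
            System.out.printf("%4d  %-8s %s.%s%n",
                s.offset,
                s.field.getType().getSimpleName(),
                s.field.getDeclaringClass().getSimpleName(),
                s.field.getName());
        }
    }
}
```

Unlike the real tool, this prints neither the header nor the alignment losses, but it is enough to see whether the hot fields ended up next to each other.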
UPDATE 16/01/2014: The excellent JOL has now been released under OpenJDK here. It's even better than before and supports many a funky feature (worthy of a separate post). I've updated the links to point to the new project. Also check out Shipilev's blog post on heap dumps demonstrating the use of this tool.
Did you see the BlackHole class from JMH? It uses (for its inner classes) a more interesting padding strategy based on sub/super class field layout.
I didn't, and have now had both you and Mr. Shipilev point it out, and it is a great example. Not sure what you mean by more interesting though, it's the same technique but padding both sides instead of one. I totally agree it's one of the right ways to avoid false sharing.
Code discussed is here: http://hg.openjdk.java.net/code-tools/jmh/file/tip/jmh-core/src/main/java/org/openjdk/jmh/logic/BlackHole.java
Thanks :-)
Pull request: https://github.com/giltene/HdrHistogram/pull/6
Thanks for teaching me about Java memory layout! I shall add that test and see if I can squeeze out some more performance.
Just goes to show that open-sourcing your code is an awesome way to get better code and a better understanding :-)
Thanks for sharing the Coalescing RB, looking forward to seeing more great contributions from LMAX :-)
Just merged your pull request, and added some cleanup/comments. Thanks Nitsan!
Thanks :-) Just started using the HdrHistogram in a couple of projects last month, it's a great piece of work.
Jinhou was here
Thanks for the great writing.