    class Foo {
        volatile Foo next;

        Foo getNextNext() {
            // Commented out code as reminder of silly bug
            // if (next != null) {
            //     // This can still result in NPE, next can change between reads
            //     return next.next;
            // }
            // This is how we do it!
            Foo currNextVal = next;
            if (currNextVal != null) {
                return currNextVal.next;
            }
            return null;
        }
    }
Q: "But... I not be doing no concurrency or nuffin' guv"
A: Using Unsafe to gain a view of on-heap addresses is concurrent access by definition.
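To watch the double-read bug happen, here is a minimal stress sketch. The names (ReadOnceDemo, Node, countNpes) are mine, made up for illustration. A writer thread keeps flipping head.next between a node and null while the reader calls either the racy or the read-once version. The racy count depends on timing and may well be zero on a given run; the read-once count cannot be anything but zero.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class ReadOnceDemo {
    static final class Node {
        volatile Node next;

        // Racy: two separate volatile reads; 'next' may turn null in between.
        Node nextNextRacy() {
            if (next != null) {
                return next.next;
            }
            return null;
        }

        // Safe: one volatile read, then only the local copy is used.
        Node nextNextSafe() {
            Node curr = next;
            return (curr != null) ? curr.next : null;
        }
    }

    /** Counts NullPointerExceptions seen by a reader while a writer flips head.next. */
    static int countNpes(boolean racy, int iterations) {
        Node head = new Node();
        Node tail = new Node();
        AtomicBoolean stop = new AtomicBoolean(false);
        Thread writer = new Thread(() -> {
            while (!stop.get()) {
                head.next = tail;
                head.next = null;
            }
        });
        writer.setDaemon(true);
        writer.start();

        int npes = 0;
        for (int i = 0; i < iterations; i++) {
            try {
                if (racy) {
                    head.nextNextRacy();
                } else {
                    head.nextNextSafe();
                }
            } catch (NullPointerException e) {
                npes++;
            }
        }
        stop.set(true);
        return npes;
    }

    public static void main(String[] args) {
        System.out.println("racy NPEs: " + countNpes(true, 5_000_000));
        System.out.println("safe NPEs: " + countNpes(false, 5_000_000));
    }
}
```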
Unsafe address: What is it good for?
Absolutely nothing! sayitagain-huh! I exaggerate; if it were good for nothing it would not be there. Let's look at the friggin manual:
    /**
     * Allocates a new block of native memory, of the given size in bytes.  The
     * contents of the memory are uninitialized; they will generally be
     * garbage.  The resulting native pointer will never be zero, and will be
     * aligned for all value types.  Dispose of this memory by calling {@link
     * #freeMemory}, or resize it with {@link #reallocateMemory}.
     *
     * @throws IllegalArgumentException if the size is negative or too large
     *         for the native size_t type
     *
     * @throws OutOfMemoryError if the allocation is refused by the system
     */
    public native long allocateMemory(long bytes);

    /**
     * Fetches a native pointer from a given memory address.  If the address is
     * zero, or does not point into a block obtained from {@link
     * #allocateMemory}, the results are undefined.
     * <p> If the native pointer is less than 64 bits wide, it is extended as
     * an unsigned number to a Java long.  The pointer may be indexed by any
     * given byte offset, simply by adding that offset (as a simple integer) to
     * the long representing the pointer.  The number of bytes actually read
     * from the target address may be determined by consulting {@link
     * #addressSize}.
     */
    public native long getAddress(long address);

    /**
     * Stores a native pointer into a given memory address.  If the address is
     * zero, or does not point into a block obtained from {@link
     * #allocateMemory}, the results are undefined.
     * <p> The number of bytes actually written at the target address may be
     * determined by consulting {@link #addressSize}.
     */
    public native void putAddress(long address, long x);
As we can see, the behaviour is only defined if we use the methods together: get/putAddress are only useful with an address that lies within a block of memory allocated by allocateMemory. Now, undefined is an important word here. It means it might work some of the time... or it might not... or it might crash your VM. Let's think about this.
Q: What type of addresses are produced by allocateMemory?
A: Off-Heap memory addresses -> unmanaged memory, not touched by GC or any other JVM processes
The off-heap addresses are stable from the VM's point of view. It has no intention of running around changing them; once allocated they are all yours to manage, and whether or not you cut your fingers in the process is completely in your control. This is why the behaviour is defined. On-heap addresses, on the other hand, are a different story.
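For contrast with what follows, here is a minimal sketch of the defined use of these methods: allocate a block, store a pointer-sized value in it, read it back, free it. The class and method names (OffHeapAddressDemo, roundTrip) are hypothetical, and grabbing the Unsafe instance reflectively through its private theUnsafe field is the usual trick rather than a supported API, so treat it as an assumption about your JDK. Recent JDKs may also print deprecation warnings for these calls.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

final class OffHeapAddressDemo {
    static final Unsafe UNSAFE = loadUnsafe();

    // Assumption: the JDK still exposes sun.misc.Unsafe and its 'theUnsafe' field.
    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not available", e);
        }
    }

    /** Stores a pointer-sized value off-heap, reads it back, and frees the block. */
    static long roundTrip(long value) {
        // The block is ours: the GC will never move or reclaim it.
        long block = UNSAFE.allocateMemory(UNSAFE.addressSize());
        try {
            UNSAFE.putAddress(block, value);
            return UNSAFE.getAddress(block);
        } finally {
            UNSAFE.freeMemory(block);
        }
    }

    public static void main(String[] args) {
        System.out.println(Long.toHexString(roundTrip(0x1234L))); // prints 1234
    }
}
```

The address stays valid from allocateMemory until freeMemory, which is exactly the guarantee an on-heap address never gives you.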
Playing With Fire: Converting An Object Ref to An Address
So imagine you just had to know the actual memory address of a given instance... perhaps you just can't resist a good dig under the hood, or maybe you are concerned about memory layout... Here's how you'd go about it:
    static final int REF_SIZE = ...;
    static final int OBJECT_ARRAY_BASE = UNSAFE.arrayBaseOffset(Object[].class);
    static final boolean USE_COMPRESSED_REFS = ...;
    static final int COMPRESSED_REF_SHIFT = ...;

    public static long addressOf(Object o) {
        return addressOf(o, REF_SIZE);
    }

    public static long addressOf(Object o, int oopSize) {
        // Park the reference in an array so its raw bits can be read as a number
        Object[] array = new Object[1];
        array[0] = o;
        long objectAddress;
        switch (oopSize) {
            case 4:
                // Mask to treat the 32-bit reference as unsigned
                objectAddress = UNSAFE.getInt(array, OBJECT_ARRAY_BASE) & 0xFFFFFFFFL;
                break;
            case 8:
                objectAddress = UNSAFE.getLong(array, OBJECT_ARRAY_BASE);
                break;
            default:
                throw new Error("unsupported address size: " + oopSize);
        }
        array[0] = null;
        return toNativeAddress(objectAddress);
    }

    public static long toNativeAddress(long address) {
        if (USE_COMPRESSED_REFS) {
            return address << COMPRESSED_REF_SHIFT;
        } else {
            return address;
        }
    }
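The two adjustments in addressOf, the unsigned widening and the compressed-reference shift, are plain arithmetic and can be checked without touching Unsafe at all. A small sketch follows; CompressedRefMath, widen and decode are made-up names, the shift of 3 is only an illustrative value, and like the code above it assumes a zero heap base.

```java
final class CompressedRefMath {
    // Widen a 32-bit reference without sign extension (same effect as '& 0xFFFFFFFFL').
    static long widen(int rawRef) {
        return Integer.toUnsignedLong(rawRef);
    }

    // Scale a widened compressed reference by the alignment shift.
    // Assumption: zero heap base, as in the addressOf code above.
    static long decode(int rawRef, int shift) {
        return widen(rawRef) << shift;
    }

    public static void main(String[] args) {
        int raw = 0x80000000; // top bit set, so negative as a Java int
        System.out.println(Long.toHexString((long) raw));    // sign-extended: ffffffff80000000
        System.out.println(Long.toHexString(widen(raw)));     // 80000000
        System.out.println(Long.toHexString(decode(raw, 3))); // 400000000
    }
}
```

Skip the mask and any reference in the upper half of the 32-bit range turns into a wildly wrong 64-bit address.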
Now... you'll notice the object ref needs a bit of cuddling to turn into an address. Did I come up with such devilishly clever code myself? No... I will divulge a pro-tip here:
If you are going to scratch around the underbelly of the JVM, learn from as close to the JVM as you can -> from the JDK classes, or failing that, from an OpenJDK project like JOL (another Shipilev production).
In fact, the above code could be re-written to:
    // Let Shipilev do all the heavy lifting!
    import org.openjdk.jol.util.VMSupport;
    ...
    // Suddenly, I feel the urge to get an object address
    long address = VMSupport.addressOf(anUnsuspectingObject);
    // Satisfied. I move on to never using that address...
    ...
Key Point: On-Heap Addresses Are NOT Stable
Consider the fact that at any time your code may be paused and the whole heap can be moved around... any address value you had which pointed to the heap is now pointing to a location holding data which may be trashed/outdated/wrong and using that data will lead to a funky result indeed. Also consider that this applies to class metadata or any other internal accounting managed by the JVM.
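The flip side is that a reference plus an offset stays valid no matter where the GC shuffles the object to, because the VM keeps the reference up to date for you. A minimal sketch, with hypothetical names (RefPlusOffsetDemo, Point, readX) and the usual reflective grab of theUnsafe, which is an assumption about your JDK:

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

final class RefPlusOffsetDemo {
    static final class Point {
        long x = 42L;
    }

    static final Unsafe UNSAFE = loadUnsafe();
    static final long X_OFFSET = xOffset();

    // Assumption: the JDK still exposes sun.misc.Unsafe and its 'theUnsafe' field.
    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not available", e);
        }
    }

    private static long xOffset() {
        try {
            return UNSAFE.objectFieldOffset(Point.class.getDeclaredField("x"));
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Reads Point.x through Unsafe using the reference and a field offset, never an address. */
    static long readX(Point p) {
        return UNSAFE.getLong(p, X_OFFSET);
    }

    public static void main(String[] args) {
        Point p = new Point();
        System.gc(); // the object may be relocated here; the reference still tracks it
        System.out.println(readX(p)); // prints 42
    }
}
```

Had we stashed a raw address before the System.gc() call and read through it afterwards, all bets would be off.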
Case Study: SizeOf an Object (Don't do this)
This dazzling fit of hackery cropped up first (to my knowledge) here on the HighScalability blog:
    public static long sizeOf(Object object) {
        Unsafe unsafe = getUnsafe();
        // Original : return unsafe.getAddress( normalize( unsafe.getInt(object, 4L) ) + 12L );
        // This is my elaborate breakdown of original one liner
        int addressOfKlassInObjectHeader = unsafe.getInt(object, 4L);
        long nativeAddressOfKlass = normalize(addressOfKlassInObjectHeader);
        long addressOfLayoutHelper = nativeAddressOfKlass + 12L;
        return unsafe.getAddress(addressOfLayoutHelper);
    }

    public static long normalize(int value) {
        if (value >= 0) return value;
        return (~0L >>> 32) & value;
    }
This is some sweet machete swinging action :-). The dude who wrote this is not suggesting it is safe, and only claims it is correct on a 32-bit VM. And indeed, it can work and passes cursory examination. The author also states correctly that this will not work for arrays and that with some corrections this can be made to work for 64-bit JVMs as well. I'm not going to try and fix it for 64-bit JVMs, though most of the work is already done in the JOL code above. The one flaw in this code that cannot be reliably fixed is that it relies on the native Klass address (nativeAddressOfKlass, line 6) to remain valid long enough for it to chase the pointer through to read the layout helper (the getAddress call, line 8). Spot the similarity to the volatile bug above?
This same post demonstrates how to forge references from on-heap objects to off-heap 'objects', which in effect lets you cast a native pointer to an object reference. It goes on to state that this is a BAD IDEA, and indeed it can easily crash your VM when GC comes a knocking (but it might not, I didn't try).
Case Study: Shallow Off-Heap Object Copy (Don't do this)
Consider the following method of making an off-heap copy of an object (from here, Mishadof's blog):
    static Object shallowCopy(Object obj) {
        long size = sizeOf(obj);
        long start = toAddress(obj);
        long address = getUnsafe().allocateMemory(size);
        getUnsafe().copyMemory(start, address, size);
        return fromAddress(address);
    }

    static long toAddress(Object obj) {
        Object[] array = new Object[] {obj};
        long baseOffset = getUnsafe().arrayBaseOffset(Object[].class);
        return normalize(getUnsafe().getInt(array, baseOffset));
    }

    static Object fromAddress(long address) {
        Object[] array = new Object[] {null};
        long baseOffset = getUnsafe().arrayBaseOffset(Object[].class);
        getUnsafe().putLong(array, baseOffset, address);
        return array[0];
    }

    static long sizeOf(Object object) {
        return getUnsafe().getAddress(
            normalize(getUnsafe().getInt(object, 4L)) + 12L);
    }

    static long normalize(int value) {
        if (value >= 0) return value;
        return (~0L >>> 32) & value;
    }
We see the above is using the exact same method for computing size as demonstrated above. It's getting the on-heap object address (limited correctness, see addresses discussion above), then copying the object off-heap and reading it back as a new object copy... Calling Unsafe.copyMemory(srcAddress, destAddress, length) is inviting the same concurrency bug discussed above. A similar method is demonstrated in the HighScalability post, but there the copy method used is Unsafe.copyMemory(srcRef, srcOffset, destRef, destOffset, length). This is important, as the reference-based method is not exposed to the same concurrency issue.
Both are playing with fire, of course, by converting off-heap memory to objects. Imagine this scenario:
- a copy of object A is made which refers to another object B, the copy is presented as object C
- object A is de-referenced, leading to A and B being collected in the next GC cycle
- object C is still storing a stale reference to B, which is no longer managed by the VM
What will happen if we read that stale reference? I've seen the VM crash in similar cases, but it might just give you back some garbage values, or let you silently corrupt some other instance state... oh, the fun you will have chasing that bugger down...
If you are keen to use Unsafe in the heap, use object references, not addresses. I would urge you not to mix the 2 together (i.e. have object references to off-heap memory) as that can easily lead to a very confused GC trying to chase references into the unknown and crashing your VM.
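To make the difference between the two copyMemory flavours concrete, here is a sketch of the reference-based form moving data between the heap and off-heap memory. I deliberately copy a long[] rather than an arbitrary object: primitive array contents hold no references, so there is nothing for the GC to trip over. The names (RefCopyDemo, roundTrip) are hypothetical, and the reflective grab of theUnsafe is an assumption about your JDK.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

final class RefCopyDemo {
    static final Unsafe UNSAFE = loadUnsafe();

    // Assumption: the JDK still exposes sun.misc.Unsafe and its 'theUnsafe' field.
    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not available", e);
        }
    }

    /** Copies a long[] off-heap using the (reference, offset) form and reads it back. */
    static long[] roundTrip(long[] src) {
        long bytes = (long) src.length * Long.BYTES;
        long base = UNSAFE.arrayBaseOffset(long[].class);
        long block = UNSAFE.allocateMemory(bytes);
        try {
            // On-heap side is named by (reference, offset), never by a raw address.
            // Off-heap side uses a null base plus the absolute address.
            UNSAFE.copyMemory(src, base, null, block, bytes);
            long[] copy = new long[src.length];
            UNSAFE.copyMemory(null, block, copy, base, bytes);
            return copy;
        } finally {
            UNSAFE.freeMemory(block);
        }
    }

    public static void main(String[] args) {
        long[] copy = roundTrip(new long[] {1L, 2L, 3L});
        System.out.println(java.util.Arrays.toString(copy)); // prints [1, 2, 3]
    }
}
```

At no point does the code hold an on-heap address as a number, so there is no window in which a GC can pull the rug from under it.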

Apologies
I don't mean to present either of the above post authors as fools, they are certainly clever and have presented interesting findings for their readers to contemplate without pretending their readers should run along and build on their samples. I have personally commented on some of the code on Mishadof's post and admit my comments were incomplete in identifying the issues discussed above. If anything I aim to highlight that this hidden concurrency aspect can catch out even the clever.
Finally, I would be a hypocrite if I told people not to use Unsafe, I end up using it myself for all sorts of things. But as Mr. Maker keeps telling us "Be careful, because scissors are sharp!"