Wednesday 26 March 2014

Where is my safepoint?

My new job (at Azul Systems) leads me to look at JIT compiler generated assembly quite a bit. I enjoy it despite, or perhaps because, the amount of time I spend scratching my increasingly balding cranium in search of meaning. On one of these exploratory rummages I found a nicely annotated line in the Zing (the Azul JVM) generated assembly:
gs:cmp4i [0x40 tls._please_self_suspend],0
jnz 0x500a0186
Zing is such a lady of a JVM, always minding her Ps and Qs! But why is self suspending a good thing?

Safepoints and Checkpoints

There are a few posts out there on what is a safepoint (here's a nice one going into when it happens, and here is a long quote from Mechnical Sympthy mailing list on the topic). Here's the HotSpot glossary entry:
safepoint

A point during program execution at which all GC roots are known and all heap object contents are consistent. From a global point of view, all threads must block at a safepoint before the GC can run. (As a special case, threads running JNI code can continue to run, because they use only handles. During a safepoint they must block instead of loading the contents of the handle.) From a local point of view, a safepoint is a distinguished point in a block of code where the executing thread may block for the GC. Most call sites qualify as safepoints. There are strong invariants which hold true at every safepoint, which may be disregarded at non-safepoints. 
To summarize, a safepoint is a known state of the JVM. Many operations the JVM needs to do happen only at safepoints. The OpenJDK safepoints are global, while Zing has a thread level safepoint called a checkpoint. The thing about them is that at a safepoint/checkpoint your code must volunteer to be suspended to allow the JVM to capitalize on this known state.
What will happen while you get suspended varies. Objects may move in memory, classes may get unloaded, code will be optimized or deoptimized, biased locks will unbias.... or maybe your JVM will just chill for a bit and catch its breath. At some point you'll get your CPU back and get on with whatever you were doing.
This will not happen often, but it can happen which is why the JVM makes sure you are never too far from a safepoint  and voluntary suspension. The above instruction from Zing's generated assembly of my code is simply that check. This is called safepoint polling.
The safepoint polling mechanism for Zing is comparing a thread local flag with 0. The comparison is harmless as long as the checkpoint flag is 0, but if the flag is set to 1 it will trigger a checkpoint call (the JNZ following the CMP4i will take us there) for the particular thread. This is key to Zing's pause-less GC algorithm as application threads are allowed to operate independently.

Reader Safpoint

Having happily grokked all of the above I went looking for the OpenJDK safepoint.

Oracle/OpenJDK Safepoints

I was hoping for something equally polite in the assembly output from Oracle, but no such luck. Beautifully annotated though the Oracle assembly output is when it comes to your code, it maintains some opaqueness when it's internals are concerned. After some digging I found this:
test   DWORD PTR [rip+0xa2b0966],eax        # 0x00007fd7f7327000
                                                ;   {poll}
No 'please', but still a safepoint poll. The OpenJDK mechanism for safepoint polling is by accessing a page that is protected when requiring suspension at a safepoint, and unprotected otherwise. Accessing a
protected page will cause a SEGV (think exception) which the JVM will handle (nice explanation here). To quote from the excellent Alexey Ragozin blog:
Safepoint status check itself is implemented in very cunning way. Normal memory variable check would require expensive memory barriers. Though, safepoint check is implemented as memory reads a barrier. Then safepoint is required, JVM unmaps page with that address provoking page fault on application thread (which is handled by JVM’s handler). This way, HotSpot maintains its JITed code CPU pipeline friendly, yet ensures correct memory semantic (page unmap is forcing memory barrier to processing cores).
The [rip+0xa2b0966] addressing is a way to save on space when storing the page address in the assembly code. The address commented on the right is the actual page address, and is equal to the rip (Relative Instruction Pointer) + given constant. This saves space as the constant is much smaller than the full address representation. I thank Mr. Tene for clarifying that one up for me.
If we were to look at safepoint polls throughout the assembly of the same process they would all follow the above pattern of pointing at the same global magic address (via this local relative trick). Setting the magic page to protected will trigger the SEGV for ALL threads. Note that the Time To Safe Point (TTSP) is not reported as GC time and may prove a hidden performance killer for your application. The effective cost of this global safepoint approach goes up the more runnable (and scheduled) threads your application has (all threads must wait for a safepoint consensus before the operation to be carried out at the safepoint can start).


Find The Safpoint Summary

In short, when looking for safepoints in Oracle/OpenJDK assembly search for poll. When looking at Zing assembly search for _please_self_suspend.



4 comments:

  1. Another great blog post Nitsan.

    It's worth mentioning that if you use "-XX:+PrintGCApplicationStoppedTime" you can get pause times including the TTSP in your GC Logs. Its not just GC pause times though it includes all hotspot pauses, for example bulk lock inflation/deflation. On the other hand if you care about pauses you probably don't care about just GC pauses.

    ReplyDelete
    Replies
    1. Thanks :)
      True, true, I wanted to highlight the TTSP will escape your notice if all you look at is GC logs, but you are correct that people should indeed look at the wider picture provided by "-XX:+PrintGCApplicationStoppedTime". I'm not sure this option gets the press it should (though Alexey does mention it in the referenced article)...

      Delete
  2. Good explanation! But why is Zing using the cmp & jnz-pair then? Afaict, the OpenJDK:s safepoint instruction is both faster and shorter.

    ReplyDelete
    Replies
    1. Shorter, yes. Faster is an interesting question. We need to consider 2 cases:
      1. Safe point is not triggered:
      This is the common case, and for this case Zing has an extra instruction overhead. But we should consider that safepoint polls are not that frequent (while loop edge or method call exit), so in normal circumstances the cost difference is going to be very insignificant. This is a branch, but a highly predictable one, so we can expect the CPU to predict it correctly and get on with life. I can say (as the Lead Performance Eng. for Azul) that we have never had safepoint polls overhead come up as an issue either externally or internally.
      2. Safe point is triggered:
      For Zing we hit the unpredicted side of the branch, which has a cost, and then get on with the safepoint in user code. For OpenJDK we hit a page protection fault, which is raised in the kernel, the fault is then reported to the JVM which handles the safepoint. The overhead for OpenJDK in this case is much much higher. Is it likely to dominate the application pause time? I wouldn't think it's that bad, after all the JVM hits many safepoints that are hardly noticeable to most applications.
      To summarize, it's an implementation choice, and both implementations are fine. I see the Zing implementation as consistent with the general goal of Zing to deliver consistently low pause times, but it's not the most important part in that general effort.

      Delete

Note: only a member of this blog may post a comment.