Escape Analysis was a much celebrated optimisation added to the JVM in Java 6u23:
"Based on escape analysis, an object's escape state might be one of the following:
- GlobalEscape – An object escapes the method and thread. For example, an object stored in a static field, or stored in a field of an escaped object, or returned as the result of the current method.
- ArgEscape – An object passed as an argument or referenced by an argument but does not globally escape during a call. This state is determined by analyzing the bytecode of the called method.
- NoEscape – A scalar replaceable object, meaning its allocation could be removed from generated code.
After escape analysis, the server compiler eliminates scalar replaceable object allocations and associated locks from generated code. The server compiler also eliminates locks for all non-globally escaping objects. It does not replace a heap allocation with a stack allocation for non-globally escaping objects." - from Java 7 performance enhancements
Alas, proving an object never escapes is a difficult problem, and many people feel they cannot rely on this optimisation to kick in and "do the right thing" for them. Part of the problem is that there is no easy way to discover if a particular allocation has been eliminated (on a debug OpenJDK build one can use -XX:+UnlockDiagnosticVMOptions -XX:+PrintEscapeAnalysis -XX:+PrintEliminateAllocations).
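To make the three escape states concrete, here is a small sketch (mine, not from the quoted document) of methods whose allocations fall into each category:

```java
public class EscapeStates {
    static StringBuilder cache; // anything stored here is reachable globally

    // GlobalEscape: the object is stored in a static field (and also returned),
    // so it outlives the method and may be seen by other threads.
    static StringBuilder globalEscape() {
        StringBuilder sb = new StringBuilder("escapes");
        cache = sb;
        return sb;
    }

    // ArgEscape: the object is passed to a callee but never leaks beyond the
    // call; whether it stays ArgEscape depends on what the callee does with it.
    static int argEscape() {
        StringBuilder sb = new StringBuilder("arg");
        return lengthOf(sb);
    }

    static int lengthOf(StringBuilder sb) {
        return sb.length();
    }

    // NoEscape: the array never leaves this method, so it is scalar
    // replaceable -- the JIT may replace the allocation with the two values
    // 'lo' and 'hi' kept in registers or on the stack.
    static int noEscape(int lo, int hi) {
        int[] range = {lo, hi};
        return range[1] - range[0];
    }

    public static void main(String[] args) {
        System.out.println(globalEscape()); // escapes
        System.out.println(argEscape());    // 3
        System.out.println(noEscape(3, 10)); // 7
    }
}
```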
The EscapeAnalysis skepticism leads some people to go as far as claiming that the JIT compiler fails to eliminate the iterator allocation of collections, which are everywhere:
But why skepticize what you can analyze?
Theory: even simple iterators do not benefit from Escape Analysis
So let's give this some thought. I generally think the best of the JVM developer bunch. Sure, they may miss here and there, they're only human after all, but we are talking about a good strong bunch of engineers. I tend to assume that if there is a simple case for an optimization that is worthwhile, then they have not failed to capitalize on it. I would therefore be quite surprised if indeed iterators do not benefit from escape analysis, as they seem quite natural candidates, particularly in the syntactic-sugar case, but even in the explicit case. Still, my trust in the JVM engineers is no evidence; how can I prove this works for a given setup?
- Use a debug build... I invite readers to try this method out; I didn't do it myself.
- Use a memory profiler, like the one packed with VisualVM or YourKit
- Set up an experiment to observe the before/after effect of the desired optimization. Use -XX:+/-DoEscapeAnalysis and examine the GC logs to observe the effect.
- Look at the assembly...
Experiment: Observe a loop over an array list
Reach for your tool belt and pick the Swiss army knife that is JMH. Here's the benchmark I will use for this:
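The benchmark gist seems to have fallen victim to the Blogspot/Gist integration trouble mentioned in the comments below. As a rough reconstruction of what the measured method looks like, here is a self-contained sketch; the JMH scaffolding is indicated in comments so the snippet compiles without the JMH jars (in the real benchmark the class would carry @State(Scope.Thread), size would be a @Param, the list would be filled in a @Setup method, and sumIteratorOverList would be annotated @Benchmark):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Reconstruction of the measured code path, not the original gist.
public class IteratorBench {
    int size = 1000;              // @Param in the real benchmark
    List<Integer> list = new ArrayList<>();

    void setup() {                // @Setup in the real benchmark
        for (int i = 0; i < size; i++) {
            list.add(i);
        }
    }

    // The interesting part: each call allocates an ArrayList$Itr --
    // unless escape analysis proves it never escapes and eliminates it.
    int sumIteratorOverList() {   // @Benchmark in the real benchmark
        int sum = 0;
        for (Iterator<Integer> it = list.iterator(); it.hasNext(); ) {
            sum += it.next();
        }
        return sum;
    }

    public static void main(String[] args) {
        IteratorBench bench = new IteratorBench();
        bench.setup();
        System.out.println(bench.sumIteratorOverList()); // 499500 for size=1000
    }
}
```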
This is one of those times when I'm not particularly concerned with the performance of this bit of code as such, but JMH makes a good crucible for Java code. The same effort that went into correct measurement enables the examination of code in an isolated manner. This makes JMH a good tool for testing the JIT compiler.
Measurement: Profile the experiment
I want to plug the experiment into a profiler, so I set the number of iterations to 1000 and get into it man, profiling, doing it you know, like a... like a sex machine, can I count it all? Here's YourKit reporting:
Ouch! That ArrayList iterator is right there at the top! All that trust I put in the JVM developers? GONE! Let's get a second opinion from JVisualVM, just to be sure:
Oh me, oh my, this is bad...
Finally, with tears in our eyes, let us try see what Java Mission Control has to say:
How curious! The iterator allocation is gone! But JMC is outnumbered 2 to 1 here. Could it be that some profilers are pooping their pants?
Measurement: Measure with -XX:+/-DoEscapeAnalysis
Sometimes we get lucky and the optimization we want to examine comes with a handy flag to turn it on and off. We expect escape analysis to remove the iterator allocation and thus leave us with a benchmark which generates no garbage. We are also fortunate that JMH comes with a handy GC profiler which simply examines the GC JMX bean to inform us if any collections happened. Running the benchmark with a short list size and a small heap should trigger plenty of young generation GC cycles in each measurement iteration. Lo and behold, with escape analysis on:
$java -jar target/microbenchmarks.jar -f 1 -i 10 -wi 10 -p size=1000 \
-jvmArgs="-Xms64m -Xmx64m -XX:+DoEscapeAnalysis" -prof gc ".*.sumIteratorOverList"
Benchmark (size) Score Error Units
sumIteratorOverList 1000 816.367 ± 17.768 ns/op
sumIteratorOverList:@gc.count.profiled 1000 0.000 ± NaN counts
sumIteratorOverList:@gc.count.total 1000 0.000 ± NaN counts
sumIteratorOverList:@gc.time.profiled 1000 0.000 ± NaN ms
sumIteratorOverList:@gc.time.total 1000 0.000 ± NaN ms
$java -jar target/microbenchmarks.jar -f 1 -i 10 -wi 10 -p size=1000 \
-jvmArgs="-Xms64m -Xmx64m -XX:-DoEscapeAnalysis" -prof gc ".*.sumIteratorOverList"
Benchmark (size) Score Error Units
sumIteratorOverList 1000 940.567 ± 94.156 ns/op
sumIteratorOverList:@gc.count.profiled 1000 19.000 ± NaN counts
sumIteratorOverList:@gc.count.total 1000 42.000 ± NaN counts
sumIteratorOverList:@gc.time.profiled 1000 9.000 ± NaN ms
sumIteratorOverList:@gc.time.total 1000 27.000 ± NaN ms
Now that is what we hoped for: escape analysis saves the day! Please note the above numbers are from running on my laptop using Oracle Java 8u20; I was not aiming for measurement accuracy, I just wanted to verify the compilation/GC aspects. Laptops are good enough for that.
WTF Profilers? Why you lie?
There's a big difference in how JVisualVM/YourKit work and how JMC works, in particular:
- JVisualVM/YourKit: treat the JVM as a black box and profile via the JVMTI interfaces and bytecode injection. They work on all JVMs.
- Java Mission Control: uses internal JVM counters/reporting APIs only available on the Oracle JVM (so it can't profile OpenJDK/Zing/IBM J9, etc.)
void C2Compiler::compile_method(ciEnv* env, ciMethod* target, int entry_bci) {
  assert(is_initialized(), "Compiler thread must be initialized");
  bool subsume_loads = SubsumeLoads;
  bool do_escape_analysis = DoEscapeAnalysis && !env->jvmti_can_access_local_variables();

This explains a similar issue, but not this one. The above code is relevant for debuggers and other tools relying on stepping through code and examining variable values.
This issue is the result of some instrumentation trapping the reference, at least for JVisualVM. We can look at the assembly to be certain. This is the iterator non-allocation before we connect JVisualVM to the process:
Note how the iterator is initialized with no allocation actually taking place (note also how the line 4 annotation is a big fat lie; this is actually getting modCount from the list). The method is recompiled to the same code when we attach the agent.
And here's what happens when we turn on memory profiling: the iterator is now allocated (see the 'new' keyword at line 18 above). We can see that the instrumenting agent added a traceObjAlloc call into the iterator constructor (line 52 above).
JVisualVM is open source, so we can dig into the code of the above trace method and the instrumentation code; I leave it as an exercise to the reader. If I were a better man I might have a go at fixing it, maybe another day.
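To illustrate what the agent's rewrite amounts to, here is a source-level sketch. The names Profiler and traceObjAlloc below are hypothetical stand-ins for JVisualVM's actual hook (which is injected at the bytecode level): once the constructor hands 'this' to profiler state, the reference has escaped globally and the allocation can no longer be eliminated.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the profiler's tracking hook; the real agent
// injects a call like this into the constructor via bytecode instrumentation.
class Profiler {
    static final AtomicLong allocations = new AtomicLong();

    static void traceObjAlloc(Object obj, int classId) {
        // 'obj' is now reachable from profiler state: a GlobalEscape.
        allocations.incrementAndGet();
    }
}

// Toy iterator standing in for ArrayList$Itr after instrumentation.
class InstrumentedIterator {
    private int cursor;
    private final int size;

    InstrumentedIterator(int size) {
        this.size = size;
        // The injected call: passing 'this' out of the constructor means
        // escape analysis can no longer prove the iterator does not escape.
        Profiler.traceObjAlloc(this, 42); // 42: an arbitrary class id
    }

    boolean hasNext() {
        return cursor < size;
    }

    int next() {
        return cursor++;
    }
}

public class InstrumentationEffect {
    public static void main(String[] args) {
        int sum = 0;
        InstrumentedIterator it = new InstrumentedIterator(5);
        while (it.hasNext()) {
            sum += it.next();
        }
        System.out.println(sum);                        // 10
        System.out.println(Profiler.allocations.get()); // 1
    }
}
```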
Summary
- EscapeAnalysis works, at least for some trivial cases. It is not as powerful as we'd like it to be, and code that is not hot enough will not enjoy it, but for hot code it will happen. I'd be happier if the flags for tracking when it happens were not debug-only.
- Profilers can kill EscapeAnalysis. In the example above we have very little code to look at, so it is easy to cross-examine. In the real world you'd be profiling a much larger codebase looking for allocation elimination opportunities; I suggest you maintain a healthy doubt in your profiler.
- Java Mission Control is my current favourite profiler for the Oracle JVM. It's a shame the underlying APIs are not made public and standard so that tool makers can rely on them for all JVMs. Perhaps one day at least part of these APIs will be made public.
Nice article, combines a few of my favourite things, measuring performance and memory allocation!
BTW, in the last paragraph before the Summary, the one that starts "The iterator is now allocated...": it refers to "line 18 above" and "line 52 above". Which lines do those refer to? I can't see a line 52 anywhere, and line 18 doesn't make sense in any of the code samples, or am I getting confused?
Friggin' Blogspot and Gist integration!!!
Thanks for the heads up, fixed now.
> Friggin' Blogspot and Gist integration!!!
Yeah, I've had the same pain with Wordpress!
Thanks for fixing that, makes much more sense now.
From my (limited) knowledge of tools that do bytecode insertion, this is a hard problem to solve. These tools generally look at the bytecode and then instrument based on that, before handing off the modified bytecode to the JITter. If the JITter then optimises things drastically, you get a scenario like the one you've mentioned above, where the instrumentation no longer makes sense.
FYI using a JVMTI agent disables escape analysis.
http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6837472
That is true if the JVMTI agent registers an interest in "env->jvmti_can_access_local_variables()" [caching a view on JvmtiExport::can_access_local_variables(), which is set by the following lines in JvmtiManageCapabilities::update():

// This controls whether the compilers keep extra locals live to
// improve the debugging experience so only set them if the selected
// capabilities look like a debugger.
JvmtiExport::set_can_access_local_variables(
  avail.can_access_local_variables ||
  avail.can_generate_breakpoint_events ||
  avail.can_generate_frame_pop_events);

] which is not a necessary capability if I understand correctly. The post does mention this particular edge case, but I don't think connecting any JVMTI agent will trigger this.
For comparison, I've run the same benchmark with the JVMTI agent for honest-profiler and observed the expected behaviour (no GC cycles). Honest-profiler does not register interest in the above-mentioned capabilities and thus does not prevent escape analysis.
Great article, very informative.
Using wrong tools/settings means getting wrong results. Clearly you should use Sampler - Memory or get a heap dump to do this kind of analysis in JVisualVM. Using memory instrumentation just doesn't make any sense even if it didn't affect the results.
Making tautological statements is always true (wrong is wrong and right is right, right?).
I wonder why you say "clearly"... is that "clear" from the tool's description? Stated "clearly" in the documentation?
I was describing a confusion I have met from some pretty experienced people; the clear solution you speak of eluded them. Perhaps you are cleverer than others, and what is clear to you isn't clear to others.
Thanks for the article. Useful info.