Psychosomatic, Lobotomy, Saw

Thanks for the article. Useful info.

2020-06-03T13:39:34.160+01:00

Thanks for the article. Useful info.

"What they don't do is set up some benchm...

2019-11-21T17:42:15.157+00:00

"What they don't do is set up some benchmarks where the hotspot is known and use those to understand what it is that safepoint biased profilers see"
actually they do - they add a fibonacci routine that they can use to control the amount of work (section 5.1 of the paper)
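
For readers who have not seen the paper, here is a rough sketch of the idea of planting a hotspot whose cost you control. It is not the paper's harness: the class name, the `knob` parameter, and the `plantedHotspot`/`background` methods are all made up for illustration. The only thing taken from the comment above is the use of a Fibonacci routine as the adjustable amount of work.

```java
final class KnownHotspotSketch {
    // Naive recursive Fibonacci: deliberately expensive, cost grows quickly with n.
    static long fib(int n) {
        return n < 2 ? n : fib(n - 1) + fib(n - 2);
    }

    // Hypothetical method we make hot on purpose by injecting fib(knob).
    static long plantedHotspot(int knob) {
        return fib(knob);
    }

    // Cheap work that should account for only a small share of the run.
    static long background(int x) {
        return x * 31L + 7;
    }

    public static void main(String[] args) {
        // Hypothetical knob: a larger value makes the planted method hotter.
        int knob = args.length > 0 ? Integer.parseInt(args[0]) : 25;
        long sink = 0;
        long hotNanos = 0;
        long start = System.nanoTime();
        for (int i = 0; i < 200; i++) {
            sink += background(i);
            long t0 = System.nanoTime();
            sink += plantedHotspot(knob);
            hotNanos += System.nanoTime() - t0;
        }
        long total = System.nanoTime() - start;
        // Ground truth to compare against whatever a profiler reports.
        System.out.printf("sink=%d, planted hotspot share=%.1f%%%n",
                sink, 100.0 * hotNanos / total);
    }
}
```

Running this under a profiler and comparing its report with the printed share is one way to see how far a safepoint-biased view drifts from a hotspot you placed yourself.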

It will be interesting to see if JVMTI will make u...

2019-09-21T04:50:39.260+01:00

It will be interesting to see if JVMTI will make use of Thread Local Handshakes so there is no more reliance on global safepoints. This could make JVMTI-based profilers a bit more reliable.

Sorry Nitsan, I forgot to remove this comment as I...

2019-09-04T16:05:44.971+01:00

Sorry Nitsan, I forgot to remove this comment as I've moved forward and understood the mechanics here, but thanks for the answer anyway!

Why do you think "countOdds is forbidden by p...

2019-09-04T15:43:19.724+01:00

Why do you think "countOdds is forbidden by policy to be inlined (ie the caller is not compiled)" is a relevant piece of information? Also, how is "invokestatic will call the most optimized version of it without dropping the {poll_return} on method exit" relevant?

We have enough profiling data to compile the countOdds method, we hope it's been compiled before we call it from Thread::run, where the parameter we give it means it will never return, so the safepoint poll at the end of the method is not a concern. We are counting on the removal of the counted loop safepoints.

The important thing here is that countOdds

There are many VMOperations triggers, see the post...

2019-09-04T15:37:04.848+01:00

There are many VMOperations triggers, see the post on safepoints for details and instructions on how to get the JVM to print out safepoint details. The main thread is never coming back from sleeping because the JVM is waiting for a VMOperation to start, and there's no point unsuspending Java threads before it's finished.

What about a value and a timestamp together ? I wo...

2019-08-02T08:37:41.214+01:00

What about a value and a timestamp together? I would like to get minimum values for a period. Is that possible?

Hi, First of all, awesome post!!! I was trying t...

2019-06-04T16:02:31.050+01:00

Hi,

First of all, awesome post!!!

I was trying to run your example and the result is as expected, but I was not able to understand why. Based on my understanding, the JVM dies when all user threads finish, so my assumption was that after sleeping 5 seconds the main thread would finish but the JVM would not die, because the counted loop does not reach a safepoint and so the JVM cannot check whether all the user threads have finished. But after running the same code and adding System.out.println("Before Sleep"); Thread.sleep(5000); System.out.println("After Sleep"); the "After Sleep" line is never printed. So my new assumption is that the main thread is timed_waiting and after waiting 5 seconds it could be moved to the runnable queue, but maybe this move is done as a VM operation? Is that correct? Could you help me understand this? Thanks in advance.
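
For anyone following along without the post open, this is a minimal sketch of the kind of program being discussed. It is a reconstruction under assumptions, not the post's exact code: the class name, the `limit` and `rounds` arguments, and the daemon flag are guesses for illustration. Only `countOdds` and the two print statements around `Thread.sleep(5000)` come from the comments here. The defaults are kept small so the program always exits; whether larger values reproduce the missing "After Sleep" depends on the JVM version and flags, and newer JVMs may poll for safepoints inside counted loops.

```java
final class CountedLoopSketch {
    // Counted loop over an int induction variable. Per the thread, the JIT may
    // compile this without a safepoint poll inside the loop body.
    static long countOdds(int limit) {
        long odds = 0;
        for (int i = 0; i < limit; i++) {
            if ((i & 1) == 1) {
                odds++;
            }
        }
        return odds;
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical knobs: pass larger values to lengthen the background work.
        int limit = args.length > 0 ? Integer.parseInt(args[0]) : 1_000_000;
        int rounds = args.length > 1 ? Integer.parseInt(args[1]) : 100;

        Thread worker = new Thread(() -> {
            long total = 0;
            for (int r = 0; r < rounds; r++) {
                total += countOdds(limit);
            }
            System.out.println("How odd: " + total);
        });
        worker.setDaemon(true); // assumption: a daemon thread, so it alone cannot keep the JVM alive
        worker.start();

        System.out.println("Before Sleep");
        Thread.sleep(5000);
        System.out.println("After Sleep");
    }
}
```

If the worker is stuck in a compiled loop with no poll, any VM operation that needs a safepoint has to wait for it, which is the situation the reply above describes.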

See comment above on PMA: https://psy-lob-saw.blo...

2019-04-10T09:55:41.955+01:00

See comment above on PMA:
https://psy-lob-saw.blogspot.com/2016/02/why-most-sampling-java-profilers-are.html?showComment=1461659480296#c5664411013658265777

Great tool! I tried it to together with the impact...

2019-02-24T04:27:58.245+00:00

Great tool!
I tried it together with the impact of +-UseBiasedLocking. I used the tcpserver/client spin, which has no contention between threads, so I expected +UseBiasedLocking to bring some improvement, as the JDK socket writing and reading involves lots of synchronized blocks (e.g. synchronized (readLock) {...} synchronized (stateLock) {...}).

However, I can consistently reproduce that -UseBiasedLocking is better on nearly all lines (typical line below):
@4285,4675,4865,5341,19046,24489,30971

while +UseBiasedLocking is typically:
@4454,4869,5250,5499,19050,27714,39104

May I ask an expert like you why?

Why not use oprofile, perf OR vtune. They all use ...

2019-02-19T20:21:30.957+00:00

Why not use oprofile, perf or VTune? They all use the hardware on Intel CPUs to sample the PC. They all give a very good picture of a Java program running at full CPU (or of the whole system, including your kernel). They only consume 1% CPU.

I see all of the implementation use Nio. I compare...

2018-08-01T09:01:27.568+01:00

I see all of the implementations use NIO. I compared NIO DatagramChannel vs DatagramSocket and found DatagramSocket to be slightly better on all percentiles, especially regarding outliers.
DatagramSocket max RTT was 2.7 msec compared to 26 msec on DatagramChannel.
Have you tested a DatagramSocket implementation?

I'm not familiar with this one, I suggest you ...

2018-07-11T19:04:22.164+01:00

I'm not familiar with this one, I suggest you hit up Vladimir Ivanov on twitter with this question: https://twitter.com/iwan0www

Nitsan, I was told about a "should this be in...

2018-07-11T15:22:41.463+01:00

Nitsan, I was told about a "should this be inlined?" heuristic the other day that I had not heard of before. I have been trying to find more information about it.

As it was explained to me, the heuristic has to do with the cost of de-optimization. If a method were inlined in a large number of places and then the method needed to be de-optimized, then there would be a high cost to modifying all of the call sites.

Is that true? It seems reasonable. Can you point me toward any resources that explain the heuristic in more detail? Even Google search terms would be helpful. So far I have only found explanations of the more commonly known heuristics.

You mentioned: "There are better options out ...

2018-06-22T09:17:13.081+01:00

You mentioned:
"There are better options out there! I'll get into some of them in following posts:
- Java Mission Control
- Solaris Studio
- Honest-Profiler
- Perf + perf-map-agent (or perfasm if your workload is wrapped in a JMH benchmark)"
My question is:
Java Mission Control is a cross platform utility provided as part of the JDK.
As far as I know, Windows does not support SIGPROF. So how does Java Mission Control work on Windows? (In sampling mode, of course.)

Thanks for the detailed analysis. Graal recently u...

2018-06-15T12:37:31.477+01:00

Thanks for the detailed analysis.
Graal recently updated some intrinsics for put/getOrdered methods.
The fixes will land on RC3, in the meantime here's how to test/build Graal from source:
https://gist.github.com/mukel/bc21a0acfe8c924fcc5fec1f480166cc

Should be easy to submit a PR with a fix for the G...

2018-06-11T08:17:18.140+01:00

Should be easy to submit a PR with a fix for the Graal compiler.

It has passed a long time but there is a very besi...

2018-06-01T18:33:24.926+01:00

A long time has passed, but there is a very basic thing I'm not sure
I have understood about the JIT mechanics here:
- Thread::run contains a call to countOdds that will be executed by the interpreter, given that Thread::run isn't compiled yet.
- countOdds is forbidden by policy to be inlined (ie the caller is not compiled)
- given that countOdds has been already compiled C2/Level 4, invokestatic will call the most optimized version of it without dropping the {poll_return} on method exit

Sorry for the step-by-step explanation, but is it correct? Am I missing anything?

"sz = align_up(sz, HeapWordSize);" impli...

2018-05-14T10:11:55.495+01:00

"sz = align_up(sz, HeapWordSize);" implies to me that the allocation is word aligned, not size aligned. Common case is aligned to 8 bytes, which means that being page/cache line aligned is the user's problem.
As for doing nothing special for malloc, I assume this is relying on the malloc contract.
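
To make the word-alignment point concrete, here is a small sketch of what an align-up on the size does. It is plain Java arithmetic, not the HotSpot source: `alignUp` and `HEAP_WORD_SIZE` are hypothetical names, and the value 8 assumes a 64-bit JVM.

```java
final class AlignUpSketch {
    // Hypothetical stand-in for HotSpot's HeapWordSize on a 64-bit JVM (assumption).
    static final long HEAP_WORD_SIZE = 8;

    // Round size up to the next multiple of alignment.
    // The bit trick only works when alignment is a power of two.
    static long alignUp(long size, long alignment) {
        if (alignment <= 0 || (alignment & (alignment - 1)) != 0) {
            throw new IllegalArgumentException("alignment must be a power of two");
        }
        return (size + alignment - 1) & ~(alignment - 1);
    }

    public static void main(String[] args) {
        for (long sz : new long[] {1, 8, 13, 64}) {
            System.out.println(sz + " -> " + alignUp(sz, HEAP_WORD_SIZE));
        }
    }
}
```

Note that this only rounds the requested size. It says nothing about where the returned address lands, so page or cache line alignment of the address would still be up to the caller.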

Nice, that old blog post is still good. Looking i...

2018-04-10T19:44:20.699+01:00

Nice, that old blog post is still good.

Looking into the code of OpenJDK - http://hg.openjdk.java.net/jdk10/jdk10/hotspot/file/5ab7a67bc155/src/share/vm/prims/unsafe.cpp#l503 - it is obvious that the size is aligned, but it is a bit unclear how the address gets aligned.

I don't see any (obvious) address alignment tricks in os::malloc either: http://hg.openjdk.java.net/jdk10/jdk10/hotspot/file/5ab7a67bc155/src/share/vm/runtime/os.cpp#l649

yes, the GC default change is the issue. JDK8 + G1...

2018-02-14T11:09:28.352+00:00

Yes, the GC default change is the issue. JDK8 + G1 is the same.

Very nice read Nitsan. Is the performance equally...

2018-02-14T07:15:28.561+00:00

Very nice read Nitsan.

Is the performance with Java 8 + G1 just as bad as with Java 9 + G1? Or is there a regression?

Thank You for Great Informative article. Keep it U...

2018-02-12T11:54:38.410+00:00

Thank you for the great, informative article. Keep it up.

> Safepoint operation interval -> Sometimes ...

2018-01-22T18:57:46.146+00:00

> Safepoint operation interval -> Sometimes in your control (be careful how you profile/monitor your application), sometimes not.

It seems there are no benchmark results that show effects of increasing the frequency of safepoint operation requests in the post (that must be configured with `@Param int intervalMs;`).
The higher the frequency of requests for safepoint operations, the bigger the impact on the user code, is that correct?

Thank you for the post, it's wonderful!

Fair enough, I leave it to you to do a more rigoro...

2018-01-06T19:32:20.074+00:00

Fair enough, I leave it to you to do a more rigorous comparison.

Happy to hear the benchmark exposed some opportunities here, would make a nice blog post to do a before/after and work through the tweaks. Is there a bug link for tracking this?

The ops are indeed tiny. It is however a throughput test, and the aux counters make no sense at all in the avgt mode... There's a timing/cost benchmark you can look at if you prefer: QueueBurstCost

Hope that helps.