Comments on Psychosomatic, Lobotomy, Saw: On Arrays.fill, Intrinsics, SuperWord and SIMD instructions
(12 comments, newest first; feed updated 2023-05-14)

Nitsan (2015-12-09 15:20 +00:00):
In an ideal world (with AVX2 support) you can have that loop transformed into a non-destructive, three-register vectorized instruction. The JIT compiler won't do that for you, but I think it would vectorize it into a wide copy and an in-place increment.
The thing to do is write some JMH code around it and look at the generated assembly with perfasm.
Let me know how it goes :-)

Antoine Chambille (2015-09-23 11:38 +01:00):
Great reading.
Do you know if similar SIMD optimizations are implemented for summing arrays?

    for (int i = 0; i < LENGTH; i++) {
        result[i] = a[i] + b[i];
    }

Nitsan (2015-04-21 15:53 +01:00):
Right tool for the job: it takes a second to knock up and you get all the tooling for free. Once you get in the habit you'll thank me.

Rüdiger Möller (2015-04-13 20:08 +01:00):
Time. It was just a five-minute thingy written inline; the results are stable and give a sufficient impression.

Nitsan (2015-04-13 19:56 +01:00):
Why, oh why would you not use JMH?

Rüdiger Möller (2015-04-13 19:38 +01:00):
ARGG!
Fell into the autoboxing trap with Arrays.fill(Object[], 0).
The correct number for clearing an Object[] with Arrays.fill is 1390 ms!

Rüdiger Möller (2015-04-13 19:36 +01:00):
Follow-up: a quick and dirty test measuring the performance of clearing int[] and Object[] arrays:

clear int array using Arrays.fill: 500ms
clear int array using System.arraycopy of static empty array: 700ms
clear Object array using Arrays.fill: 3280ms (!!)
clear int array using System.arraycopy of static empty array: 700ms

So indeed the trickery only pays off for object arrays.

(bench source: https://github.com/RuedigerMoeller/fast-serialization/blob/master/src/main/java/org/nustaq/serialization/util/FSTUtil.java)

Rüdiger Möller (2015-04-13 19:23 +01:00):
Thanks for the not-advice. Hm... I also use this technique to clear int arrays; I can't remember whether I benchmarked those too or just assumed it would also be faster (as with Object arrays). Smells like I can save zem nanos :-)

Nitsan (2015-04-13 16:14 +01:00):
The JVM is pretty cool on that score, yes.
I'm not sure how well GCC handles vectorization compared to C2; that would make an interesting topic. The thing that bothers people about Java and SIMD is that the compiler can't always figure out what they want to do. In C they could use SIMD intrinsics or assembly; in Java they are stuck.

Nitsan (2015-04-13 16:03 +01:00):
The pattern-matching replacement thingy has been in the JVM since 1.6, but it doesn't cover Object[]/long[]/double[], so it really is no help for what you're trying to do.
If you want a hacky way of clearing the array using memset, you can use Unsafe.setMemory on the array and set the elements to 0 (so something like: UNSAFE.setMemory(arrayB, UNSAFE.arrayBaseOffset(Object[].class), length * UNSAFE.arrayIndexScale(Object[].class), (byte) 0);). Since you're nulling it, the card marking is irrelevant anyhow (well, unless you use G1GC? or some other GC algo where it matters?). This is a terrible idea, so please don't do it.

Rüdiger Möller (2015-04-13 14:57 +01:00):
Great post (again)! I benchmarked Arrays.fill on Object[] some 2-4 years ago and found that the fastest way to null an object array was to System.arraycopy from a static empty object array.
Do you know since which release the "pattern driven" optimization has been present in the JDK?

(from fast-serialization):

    public static void clear(Object[] arr, int arrlen) {
        // EmptyObjArray is the static, all-null Object[] mentioned above (defined elsewhere).
        int count = 0;
        final int length = EmptyObjArray.length;
        // Copy whole chunks of nulls while more than one chunk remains.
        while (arrlen - count > length) {
            System.arraycopy(EmptyObjArray, 0, arr, count, length);
            count += length;
        }
        // Copy the remaining tail (at most one chunk).
        System.arraycopy(EmptyObjArray, 0, arr, count, arrlen - count);
    }

EricMED (2015-04-13 13:26 +01:00):
Nitsan,
Thanks for the wonderful post... Very timely (I've been looking into SIMD compiler optimizations after I saw Mike Barker's video where he compared C and Java implementations of SimpleBinaryEncoding):

http://www.infoq.com/presentations/performance-safety

(At about 35:15 he noted that the Java JIT compiler uses SIMD under the covers by default, whereas C requires you to "opt in" with compiler flags.)

"Java is fast by default"

Cheers
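
The array-clearing approaches discussed in this thread, together with the autoboxing trap that skewed the first Object[] measurement, can be sketched roughly as follows. This is a minimal illustration, not the fast-serialization code: the class name `ArrayClearSketch`, the `EMPTY` array and its size of 1024, and the method names are all assumptions made for the example. It shows only that the variants behave as described; it makes no performance claim, and measuring that properly would need a JMH harness as suggested above.

```java
import java.util.Arrays;

final class ArrayClearSketch {
    // Hypothetical stand-in for the "static empty object array"; the size is arbitrary.
    private static final Object[] EMPTY = new Object[1024];

    /** Nulls the first len slots by copying chunks from an all-null array. */
    static void clearByCopy(Object[] arr, int len) {
        int count = 0;
        while (len - count > EMPTY.length) {
            System.arraycopy(EMPTY, 0, arr, count, EMPTY.length);
            count += EMPTY.length;
        }
        System.arraycopy(EMPTY, 0, arr, count, len - count);
    }

    /** Nulls the first len slots with the plain library call. */
    static void clearByFill(Object[] arr, int len) {
        Arrays.fill(arr, 0, len, null);
    }

    /** The trap: the int literal 0 is boxed, so every slot holds an Integer, not null. */
    static void fillWithBoxedZero(Object[] arr) {
        Arrays.fill(arr, 0);
    }

    public static void main(String[] args) {
        Object[] arr = new Object[3000];
        fillWithBoxedZero(arr);
        System.out.println("slot holds Integer: " + (arr[0] instanceof Integer));
        clearByCopy(arr, arr.length);
        System.out.println("slot is null after clearByCopy: " + (arr[2999] == null));
    }
}
```

The boxed-zero variant is the one to watch for in a benchmark: it compiles without complaint, but it stores object references rather than clearing the array, so it measures a different operation from the two clearing methods.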