This one took some interesting turns.
First I tried to have a look at the assembly that is generated with the original method. However, the JIT does quite some inlining and optimizations, and this includes the System.out.println
calls, so the resulting assembly output was too huge (for me) to sensibly analyze it in reasonable time.
So I simplified the whole thing, in order to be able to focus on the actual question. Finally, I ran the following program:
class Test04
{
public static void main(String args[])
{
long sum = 0;
for (int i=1000; i<12000; i++)
{
sum += fizzBuzz1(i);
sum += fizzBuzz2(i);
}
System.out.println(sum);
}
public static long fizzBuzz1(int n)
{
long sum = 0;
for(int i = 1; i <= n; i++)
{
sum += fizzBuzzCore1(i);
}
return sum;
}
public static long fizzBuzzCore1(int i)
{
boolean fizzed = false;
boolean buzzed = false;
if(i % 3 == 0)
fizzed = true;
if(i % 5 == 0)
buzzed = true;
if(fizzed && buzzed)
return 4;
else if(fizzed)
return 3;
else if(buzzed)
return 2;
else
return 1;
}
public static long fizzBuzz2(int n)
{
long sum = 0;
for(int i = 1; i <= n; i++)
{
sum += fizzBuzzCore2(i);
}
return sum;
}
public static long fizzBuzzCore2(int i)
{
if(i % 3 == 0 && i % 5 == 0)
return 4;
else if(i % 3 == 0)
return 3;
else if(i % 5 == 0)
return 2;
else
return 1;
}
}
The return values are intended to prevent him from optimizing away the calls completely, and extracting the "core" methods aimed at keeping the size of the assembly output that has to be compared as small as possible.
(Note: Of course, these modifications may affect the optimization. For example, the JIT has a limit of the number of bytecode instructions that a method may have before it is considered to be too large to be inlined, -XX:MaxInlineSize=35
. But it the effects should roughly be the same on both methods, so it should still be possible to derive the desired information about the actual question).
And, not such a great surprise: After the last optimization, the assembly code for both methods contains equal instructions - here the assembly for fizzBuzzCore1
as a reference:
Decoding compiled method 0x00000000026c0090:
Code:
[Entry Point]
[Verified Entry Point]
[Constants]
# {method} {0x0000000057260528} 'fizzBuzzCore1' '(I)J' in 'Test04'
# parm0: rdx = int
# [sp+0x20] (sp of caller)
0x00000000026c01c0: sub $0x18,%rsp
0x00000000026c01c7: mov %rbp,0x10(%rsp) ;*synchronization entry
; - Test04::fizzBuzzCore1@-1 (line 27)
0x00000000026c01cc: movslq %edx,%r10
0x00000000026c01cf: mov %edx,%r11d
0x00000000026c01d2: sar $0x1f,%r11d ;*irem
; - Test04::fizzBuzzCore1@6 (line 29)
0x00000000026c01d6: imul $0x66666667,%r10,%r8
0x00000000026c01dd: imul $0x55555556,%r10,%r10
0x00000000026c01e4: sar $0x21,%r8
0x00000000026c01e8: sar $0x20,%r10
0x00000000026c01ec: mov %r8d,%r8d
0x00000000026c01ef: sub %r11d,%r8d ;*irem
; - Test04::fizzBuzzCore1@14 (line 31)
0x00000000026c01f2: mov %r10d,%r10d
0x00000000026c01f5: sub %r11d,%r10d ;*irem
; - Test04::fizzBuzzCore1@6 (line 29)
0x00000000026c01f8: mov %r8d,%r11d
0x00000000026c01fb: shl $0x2,%r11d
0x00000000026c01ff: add %r8d,%r11d ;*irem
; - Test04::fizzBuzzCore1@14 (line 31)
0x00000000026c0202: mov %r10d,%r9d
0x00000000026c0205: shl %r9d
0x00000000026c0208: add %r10d,%r9d ;*irem
; - Test04::fizzBuzzCore1@6 (line 29)
0x00000000026c020b: cmp %r9d,%edx
0x00000000026c020e: jne 0x00000000026c021c ;*ifeq
; - Test04::fizzBuzzCore1@21 (line 33)
0x00000000026c0210: cmp %r11d,%edx
0x00000000026c0213: jne 0x00000000026c021c ;*ifeq
; - Test04::fizzBuzzCore1@25 (line 33)
0x00000000026c0215: mov $0x4,%eax
0x00000000026c021a: jmp 0x00000000026c0239 ;*iload_1
; - Test04::fizzBuzzCore1@32 (line 35)
0x00000000026c021c: cmp %r9d,%edx
0x00000000026c021f: jne 0x00000000026c0228 ;*ifeq
; - Test04::fizzBuzzCore1@33 (line 35)
0x00000000026c0221: mov $0x3,%eax
0x00000000026c0226: jmp 0x00000000026c0239
0x00000000026c0228: cmp %r11d,%edx
0x00000000026c022b: jne 0x00000000026c0234 ;*ifeq
; - Test04::fizzBuzzCore1@41 (line 37)
0x00000000026c022d: mov $0x2,%eax
0x00000000026c0232: jmp 0x00000000026c0239
0x00000000026c0234: mov $0x1,%eax ;*irem
; - Test04::fizzBuzzCore1@6 (line 29)
0x00000000026c0239: add $0x10,%rsp
0x00000000026c023d: pop %rbp
0x00000000026c023e: test %eax,-0x2470244(%rip) # 0x0000000000250000
; {poll_return}
0x00000000026c0244: retq
0x00000000026c0245: hlt
0x00000000026c0246: hlt
0x00000000026c0247: hlt
0x00000000026c0248: hlt
0x00000000026c0249: hlt
0x00000000026c024a: hlt
0x00000000026c024b: hlt
0x00000000026c024c: hlt
0x00000000026c024d: hlt
0x00000000026c024e: hlt
0x00000000026c024f: hlt
0x00000000026c0250: hlt
0x00000000026c0251: hlt
0x00000000026c0252: hlt
0x00000000026c0253: hlt
0x00000000026c0254: hlt
0x00000000026c0255: hlt
0x00000000026c0256: hlt
0x00000000026c0257: hlt
0x00000000026c0258: hlt
0x00000000026c0259: hlt
0x00000000026c025a: hlt
0x00000000026c025b: hlt
0x00000000026c025c: hlt
0x00000000026c025d: hlt
0x00000000026c025e: hlt
0x00000000026c025f: hlt
[Exception Handler]
[Stub Code]
0x00000000026c0260: jmpq 0x000000000261c560 ; {no_reloc}
[Deopt Handler Code]
0x00000000026c0265: callq 0x00000000026c026a
0x00000000026c026a: subq $0x5,(%rsp)
0x00000000026c026f: jmpq 0x00000000025f6f40 ; {runtime_call}
0x00000000026c0274: hlt
0x00000000026c0275: hlt
0x00000000026c0276: hlt
0x00000000026c0277: hlt
But...
... what may be surprising, however, is: It does not compute a modulo operation at all!
At least, not explicitly: There is no idiv
instruction appearing in this code! So the JIT really tries hard to avoid the costly divisions, by doing some nasty, nasty bit twiddling hacks: The instructions
0x00000000026c01d6: imul $0x66666667,%r10,%r8
0x00000000026c01dd: imul $0x55555556,%r10,%r10
0x00000000026c01e4: sar $0x21,%r8
0x00000000026c01e8: sar $0x20,%r10
(and following...)
are a "division-free" implementation of a division. For example, the method
private static int divideBy3(int n)
{
long r10 = n;
r10 *= 0x55555556L;
r10 >>>= 0x20;
long r10d = r10 & 0xFFFFFFFFL;
return (int)r10d;
}
uses these magic constants and shifts to compute a division by 3 (similarly, for 5 with the other constant). I did not do the math myself, but an explaination of how the modulo operation is derived from that can be found at Page 32 of the "INTEGER DIVISION BY CONSTANTS" document from Hacker's Delight.