I'm trying to understand the rdpmc instruction. As such I have the following asm code:
segment .text
global _start
_start:
    xor eax, eax
    mov ebx, 10
.loop:
    dec ebx
    jnz .loop
    mov ecx, 1<<30
    ; calling rdpmc with ecx = (1<<30) gives number of retired instructions
    rdpmc
    ; but only if you do a bizarre incantation: (Why u do dis Intel?)
    shl rdx, 32
    or rax, rdx
    mov rdi, rax ; return number of instructions retired.
    mov eax, 60
    syscall
(The implementation is a translation of rdpmc_instructions().)
I count that this code should execute 2*ebx+3 instructions before hitting the rdpmc
instruction, so I expect (in this case) that I should get a return status of 23.
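To make my count explicit, here is the same sequence up to the rdpmc with a running tally in the comments. This is only a sketch: it assumes each instruction retires exactly once and that nothing outside my own code is counted, and it exits with the hard-coded expected value 23 instead of executing rdpmc.

```nasm
; Tally of instructions I expect to have retired before rdpmc.
; Assumption: every instruction counts once, nothing else is counted.
segment .text
global _start
_start:
    xor eax, eax        ; running total: 1
    mov ebx, 10         ; running total: 2
.loop:
    dec ebx             ; executed 10 times
    jnz .loop           ; executed 10 times -> 2 + 2*10 = 22
    mov ecx, 1<<30      ; running total: 23
    ; rdpmc would execute here; exit with the expected count instead
    mov edi, 23         ; exit status = 2*10 + 3
    mov eax, 60         ; sys_exit on x86-64 Linux
    syscall
```

Assembled with nasm -f elf64 and linked with ld, this should exit with status 23, which is the value I was hoping the real program would return.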
If I run perf stat -e instructions:u ./a.out on this binary, perf tells me that I've executed 30 instructions, which looks about right. But if I execute the binary directly, I get a return status of 58 or 0; the result is not deterministic.
What have I done wrong here?