List: android-virt
Subject: Re: Instruction/Cycle Counting in Guest Using the Kvm PMU
From: Andrew Murray <andrew.murray () arm ! com>
Date: 2018-11-30 0:23:30
Message-ID: 20181130002327.GB16311 () e119886-lin ! cambridge ! arm ! com
[Download RAW message or body]
On Thu, Nov 29, 2018 at 11:30:55AM +0000, Jan Bolke wrote:
> Hi,
> And thanks for the fast replies.
>
> > The PMU emulation works by creating a perf event in the host, however it is
> > pinned to the KVM process, so the real PMU counters are stopped and started
> > as the KVM process is scheduled in and out. This means that it will include any
> > CPU time associated with that process, of which your guest is only a subset.
>
> Thanks for the clarification.
> As this is the case, the counted cycles from the host should deliver a larger
> number than the executed instructions inside the guest.
Indeed.
>
> > The patchset that James refers to will ensure that the real PMU counters
> > underlying the guest-only events will only be enabled upon entering the
> > guest (and disabled on leaving). Thus you will need to apply this (to your host)
> > for more accurate counting. (You could also then use the perf modifiers in the
> > host to count guest cycles, e.g. perf -e instructions:G.)
>
> So I applied your patch to a 4.19.5 kernel and also your other patch series for
> the perf events in the host [0]. So what I do now is run:
> perf stat -e instructions:G -- ./run_loop_in_kvm
>
> run_loop_in_kvm is a small C program that starts a VM, executes a little loop in
> the guest, and then exits. I get output from perf like the following:
> 159732 instructions:Gu
> ....
> My problem is that I am still not sure how to interpret these values, as my
> bare-metal code runs a loop 1048577 times, executing 3 instructions per iteration.
I notice you are using :Gu. The 'u' will record only EL0, are you confident
that your code runs at this level? Perhaps drop the 'u'.
Also are you compiling with compiler optimisations turned off - or have you
inspected an objdump of the binary to verify it is as you expect?
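As an aside, an easy way to see the effect of each modifier is to request several
variants of the same event at once (just a sketch, assuming the exclude_guest/
exclude_host patches are applied on your host, and reusing your run_loop_in_kvm
binary):

```shell
# u = user (EL0), k = kernel (EL1), G = guest, H = host
perf stat -e instructions:G,instructions:Gu,instructions:Gk -- ./run_loop_in_kvm
# instructions:G should roughly equal instructions:Gu + instructions:Gk;
# if :Gu dominates, your bare metal code is probably running at EL0.
```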
>
> My question is where this discrepancy in the counted values comes from.
> The perf count from the host delivers a value significantly smaller than the
> number of instructions executed in the guest.
> I am struggling to interpret the perf counter values as an indication of how
> many instructions my guest performed. What am I missing?
Your expectations are correct. For my own sanity I attempted something similar:
# cat startup.s
.global _start
_start:
	ldr	x30, =stack_top
	mov	sp, x30
	bl	main
halt:
	wfi
	b	halt
# cat test.c
void main() {
	int x = 0;
	for (x = 0; x < 20000000; x++)
		;
}
# cat script.ld
ENTRY(_start)
SECTIONS
{
. = 0x80080000;
.startup . : { startup.o(.text) }
.text : { *(.text) }
.data : { *(.data) }
.bss : { *(.bss COMMON) }
. = ALIGN(16);
. = . + 0x1000;
stack_top = .;
}
And then:
# aarch64-linux-gnu-gcc -c test.c -o test.o
# aarch64-linux-gnu-as startup.s -o startup.o
# aarch64-linux-gnu-ld -T script.ld test.o startup.o -o out
# aarch64-linux-gnu-objcopy -O binary -S out
On the host:
# perf stat -e instructions:G /lkvm-static run -k /out -m 1024 -c 1 --console serial --pmu
...
# KVM session ended normally.
Performance counter stats for '/lkvm-static run -k /out -m 1024 -c 1 --console serial --pmu':
160000016 instructions:G
This matches exactly with what I would expect from the disassembly:
Disassembly of section .startup:
0000000080080000 <_start>:
80080000: 580000de ldr x30, 80080018 <_start+0x18>
80080004: 910003df mov sp, x30
80080008: 94000006 bl 80080020 <main>
8008000c: d503207f wfi
80080010: 17ffffff b 8008000c <_start+0xc>
80080014: 00000000 .inst 0x00000000 ; undefined
80080018: 80081060 .word 0x80081060
8008001c: 00000000 .word 0x00000000
Disassembly of section .text:
0000000080080020 <main>:
80080020: d10043ff sub sp, sp, #0x10
80080024: b9000fff str wzr, [sp, #12]
80080028: b9000fff str wzr, [sp, #12]
8008002c: 14000004 b 8008003c <main+0x1c>
80080030: b9400fe0 ldr w0, [sp, #12]
80080034: 11000400 add w0, w0, #0x1
80080038: b9000fe0 str w0, [sp, #12]
8008003c: b9400fe1 ldr w1, [sp, #12]
80080040: 52859fe0 mov w0, #0x2cff // #11519
80080044: 72a02620 movk w0, #0x131, lsl #16
80080048: 6b00003f cmp w1, w0
8008004c: 54ffff2d b.le 80080030 <main+0x10>
80080050: d503201f nop
80080054: 910043ff add sp, sp, #0x10
80080058: d65f03c0 ret
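For anyone wanting to check the arithmetic, one accounting that reproduces the
figure exactly (my breakdown, derived from the disassembly above): 3 instructions
in _start, 4 in main's prologue, the 5-instruction compare sequence executed
20000001 times (including the final failing check), the 3-instruction loop body
executed 20000000 times, the 3-instruction epilogue, and the single wfi:

```shell
# 3 (_start) + 4 (prologue) + 5*20000001 (compare) + 3*20000000 (body)
# + 3 (epilogue) + 1 (wfi)
echo $(( 3 + 4 + 5*20000001 + 3*20000000 + 3 + 1 ))
# → 160000016
```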
>
> Also I get the following output for perf stat -e cycles:G ls: 647284 cycles:Gu.
> Is this an indicator that my guest/host modifiers do not work, or am I
> misunderstanding the whole concept here? Sorry for the silly question and thanks
> in advance!
For 'perf stat -e cycles:G ls' I would expect 0. I'm not quite sure why it's
showing data for the 'u' modifier. It smells a little like the patch hasn't
applied or the userspace perf tool is falling back to not using exclude_guest
and recording just EL0 (which would explain all your figures assuming your
bare metal code actually runs at EL1).
Try running 'perf stat -vv -e instructions:G ls'. Make sure that the last
'exclude_host' you see in the output is set to 1.
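For reference, the relevant lines can be fished straight out of the -vv output
(a sketch, assuming a perf tool new enough to dump the perf_event_attr fields on
stderr):

```shell
perf stat -vv -e instructions:G ls 2>&1 | grep -E 'exclude_(guest|host)'
# For a working :G event the last exclude_host should be 1; if you instead
# see exclude_guest 1, the tool has fallen back to its host-only defaults.
```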
Thanks,
Andrew Murray
>
> > Also you may want to refer to kvm-unit-tests, as there are test cases that
> > demonstrate bare metal code for PMU enabling.
> Thanks for the hint, these tests are very useful examples!
>
> [0]: http://lists.infradead.org/pipermail/linux-arm-kernel/2018-November/614985.html
>
>
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm