
List:       qemu-discuss
Subject:    Re: spin loop 100x faster in user mode (CPL=3) than superuser (CPL=0)?
From:       Garrick Toubassi <gtoubassi () gmail ! com>
Date:       2021-10-29 19:06:09
Message-ID: CAKGOLmRtFnS1Q3JqQzcemG_G3BqVzJH1dL7n3awvkFPnTr8dQA () mail ! gmail ! com

I went ahead and created a short repro case which can be found at
https://github.com/gtoubassi/qemu-spinrepro.  Would appreciate thoughts
from anyone or guidance on how to debug.
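For anyone wanting a baseline to compare against, here is a minimal hosted sketch of the spin loop described in the quoted message below, timed with a monotonic clock. It is not the code from the repro repository: the names spin and time_spin are hypothetical, and it assumes a POSIX host with clock_gettime. A bare-metal guest would need its own timer source instead.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Spin for n iterations. The volatile counter keeps the compiler from
 * deleting the otherwise empty loop. Returns the final counter value. */
uint64_t spin(uint64_t n)
{
    volatile uint64_t i;
    for (i = 0; i < n; i++) {
        /* intentionally empty */
    }
    return i;
}

/* Wall-clock seconds taken by spin(n), measured on the host
 * (assumption: POSIX clock_gettime is available). */
double time_spin(uint64_t n)
{
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    spin(n);
    clock_gettime(CLOCK_MONOTONIC, &end);
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Calling time_spin(10000000) on the host gives a rough native figure to set against the in-guest timings at CPL=0 and CPL=3.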

On Tue, Oct 19, 2021 at 3:05 PM Garrick Toubassi <gtoubassi@gmail.com>
wrote:

> Hello
>
> I have a mystery I haven't been able to run down and would appreciate any
> explanation or advice.
>
> On an Intel Mac I am running qemu-system-x86_64 on a simple image which
> bootstraps into 64-bit long mode and then runs a simple spin loop
> (literally for (int i = 0; i < 10000000; i++) {}).  This completes in ~5
> seconds of wall time.  After completion it then enters user mode (CPL=3)
> via a fabricated interrupt stack frame and an iretq, returning to the same
> spin loop.  In this case it runs about 100x faster.
>
> At first I thought maybe the TCG JIT somehow isn't kicking in and there
> is some pure interpretation going on, but I've run with "-trace
> exec_tb -trace translate_block -d out_asm,guest_errors,nochain,int,plugin"
> and it seems to be running "translation blocks", just a lot more of them
> when running the slow loop (or, to be more precise, running one TB many more
> times according to exec_tb logging).  Upon inspection the relevant
> generated assembly is morally equivalent between the two as best I can
> tell, which implies to me it's something outside of the TB.  I was thinking
> perhaps it's regenerating the code every time, but logging doesn't show that.
>
> I also was wondering if something about the MMU implementation might slow
> things down when in user mode?  In this case both loops are running under
> the same GDT/page table which just happens to mark all pages as "user"
> pages so that when jumping to CPL=3 it will still run.
>
> I can package up a reproducible case if it's helpful but wanted to see if
> there is something obvious I am missing in terms of expected behavior
> before doing that.
>
> Thanks!
>
> gt
>
>
>
>
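To make the "fabricated interrupt stack frame" in the quoted message concrete, here is a minimal sketch of the five quadwords that iretq pops in 64-bit mode (RIP, CS, RFLAGS, RSP, SS, from the lowest address upward). It is an illustration under assumptions, not the repro's code: build_user_iretq_frame and the FRAME_* names are hypothetical, and the selector values passed in depend entirely on the GDT layout, which the message does not give.

```c
#include <stdint.h>

#define RFLAGS_RESERVED1 (1ULL << 1)  /* bit 1 of RFLAGS always reads as 1 */
#define RPL_USER         3u           /* requested privilege level 3 */

/* Order of the quadwords iretq pops, starting at the stack pointer. */
enum { FRAME_RIP, FRAME_CS, FRAME_RFLAGS, FRAME_RSP, FRAME_SS, FRAME_WORDS };

/* Fill in a frame that makes iretq resume at `entry` with CPL=3.
 * code_sel and data_sel are GDT selectors for user code and data
 * segments (hypothetical values; they depend on the GDT in use). */
void build_user_iretq_frame(uint64_t frame[FRAME_WORDS],
                            uint64_t entry, uint64_t user_stack,
                            uint16_t code_sel, uint16_t data_sel,
                            uint64_t rflags)
{
    frame[FRAME_RIP]    = entry;
    frame[FRAME_CS]     = (uint64_t)(code_sel | RPL_USER);
    frame[FRAME_RFLAGS] = rflags | RFLAGS_RESERVED1;
    frame[FRAME_RSP]    = user_stack;
    frame[FRAME_SS]     = (uint64_t)(data_sel | RPL_USER);
}
```

In a kernel, the values would be pushed in reverse order (SS first, RIP last) before executing iretq. The function above only builds the frame in memory, so it can be checked on a host without privileged instructions.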



