List:       xen-users
Subject:    Re: [Xen-users] IO APIC interrupt stuck with irr=1 (was: Re: xen hypervisor does not like my Dom0 LV
From:       Jeff Swicegood <jguru108 () gmail ! com>
Date:       2016-12-03 16:10:28
Message-ID: CAJgWp+Hq7m0Qea95Nh-shRELGoS0MW6xqhP8+k6Xzz2+McJTrA () mail ! gmail ! com

On Thu, Dec 1, 2016 at 7:21 AM Jan Beulich <JBeulich@suse.com> wrote:

> >>> On 01.12.16 at 11:29, <roger.pau@citrix.com> wrote:
> >> (XEN) Enabling APIC mode:  Flat.  Using 2 I/O APICs
> >> (XEN) [VT-D]  RMRR address range bf7da000..bf7d9fff not in reserved memory; need "iommu_inclusive_mapping=1"?
> >> (XEN) [VT-D]  RMRR (bf7da000, bf7d9fff) is incorrect
> >> (XEN) Failed to parse ACPI DMAR.  Disabling VT-d.
>
> Do things work better with this worked around (as suggested by the
> message)?
>
> >> (XEN)     IRQ 20 Vec 41:
> >> (XEN)       Apic 0x00, Pin 20: vec=29 delivery=LoPri dest=L status=1 polarity=1 irr=1 trig=L mask=0 dest_id:8
> >
> > So this IO APIC vector seems to be stuck with irr=1, I've assumed that Xen
> > would ack the interrupt if a certain timeout has passed and the guest has
> > not done it, but I could be mistaken.
>
> Interrupts in IRR can't be acked, they first need to propagate to
> ISR (in LAPIC terms). Hence we'd need to know the state of the
> corresponding ISR bit (and for completeness also the IRR one) in
> the LAPIC. I would suspect that the ISR bit is also set, and _that_
> would then indicate we may have missed issuing an EOI. But it
> might also be that some interrupt at a higher priority (larger
> vector number) is not disappearing from ISR, effectively masking
> the relatively low numbered vector here.
>
> Also, are you positive about the IRR bit here being _permanently_
> set, rather than just at the point this one sample was taken?
>
> > I've also seen similar issues on some boxes, this seems to always happen
> > on boxes with more than one IO APIC. In the past I've solved it by setting
> > ioapic_ack=old, but that doesn't seem to work for his case.
>
> Not so here (for the last so many years), so I wonder whether it
> matters what Dom0 kernel is in use.
>
> >> And here are the messages to prove there was a lost interrupt:
> >>
> >> 11/30/16 5:09 PM    jaga-Desktop    kernel    [10056.569371] ata2: lost interrupt (Status 0x58)
> >> 11/30/16 5:09 PM    jaga-Desktop    kernel    [10056.569402] ata3: lost interrupt (Status 0x58)
> >> 11/30/16 6:00 PM    jaga-Desktop    kernel    [    0.187813] DMAR-IR: This system BIOS has enabled interrupt remapping
> >> 11/30/16 6:00 PM    jaga-Desktop    kernel    [    0.187813] interrupt remapping is being disabled.  Please
>
> These two messages are suspicious: The kernel should keep its hands
> off any IOMMU things when running under Xen.
>
> Jan
>
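Jan's two hypotheses above (a missed EOI versus a higher-priority vector stuck in ISR) can be told apart once the LAPIC ISR contents are known. What follows is a minimal Python sketch of that reasoning, not Xen code. It assumes the usual local APIC rule that a pending vector is dispatched only when its priority class (vector >> 4) exceeds that of every in-service vector, and it ignores the task priority register. The function names and the sample ISR contents are hypothetical; 0x29 is the vec= value from the dump, assuming it is hexadecimal (which matches "Vec 41").

```python
from typing import Iterable, List


def priority_class(vector: int) -> int:
    """Priority class of an 8-bit interrupt vector: its upper four bits."""
    return (vector & 0xFF) >> 4


def blockers(pending: int, in_service: Iterable[int]) -> List[int]:
    """In-service vectors whose priority class is >= that of the pending one.

    Simplified model (assumption): TPR is ignored, only ISR is considered.
    """
    cls = priority_class(pending)
    return sorted(v for v in in_service if priority_class(v) >= cls)


def diagnose(pending: int, in_service: Iterable[int]) -> str:
    """Classify why a vector pending in IRR is not being dispatched."""
    isr = set(in_service)
    if pending in isr:
        # The vector's own ISR bit is still set: an EOI may have been missed.
        return "own ISR bit set: possible missed EOI"
    others = blockers(pending, isr)
    if others:
        # Another vector of equal or higher class has not left ISR.
        return "masked by in-service vector(s): " + ", ".join(hex(v) for v in others)
    return "not blocked by ISR"


if __name__ == "__main__":
    stuck = 0x29  # vec=29 from the IO APIC dump, assumed hexadecimal
    print(diagnose(stuck, [0x29]))  # hypothetical ISR contents
    print(diagnose(stuck, [0xD1]))  # hypothetical ISR contents
    print(diagnose(stuck, []))
```

Feeding it the real ISR bits from a LAPIC dump, rather than the made-up values here, would show which of the two cases applies.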

iommu_inclusive_mapping=1 does not help. Booting a different kernel
makes no difference either.

Jeff
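For reference, the RMRR range quoted in the boot log, bf7da000..bf7d9fff, has an end address one byte below its base, so it describes an empty region. The short Python sketch below only illustrates that arithmetic. The helper names are hypothetical, and treating "base must be below end" as the condition behind the "is incorrect" message is an assumption, not something confirmed in this thread.

```python
def rmrr_size(base: int, end: int) -> int:
    """Size in bytes of an inclusive [base, end] range (may be <= 0 if bogus)."""
    return end - base + 1


def rmrr_is_sane(base: int, end: int) -> bool:
    """Assumed sanity rule: a usable region must start below its end address."""
    return base < end


if __name__ == "__main__":
    base, end = 0xBF7DA000, 0xBF7D9FFF  # values from the (XEN) [VT-D] log lines
    print(hex(base), hex(end), "size:", rmrr_size(base, end))
    print("sane:", rmrr_is_sane(base, end))
```

A range like this comes from the firmware's ACPI DMAR table, so a BIOS update may be worth checking for.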
