[prev in list] [next in list] [prev in thread] [next in thread] 

List:       xen-devel
Subject:    Re: [Xen-devel] [PATCH v3 5/8] ARM: VGIC: factor out vgic_connect_hw_irq()
From:       Julien Grall <julien.grall () linaro ! org>
Date:       2018-01-31 16:30:28
Message-ID: d60ec8ef-81e9-d4a3-4e35-8e27bccffa30 () linaro ! org
[Download RAW message or body]



On 31/01/18 15:54, Andre Przywara wrote:
> Hi,
> 
> Yeah! Locking discussions! Have fun below ;-)
> 
> On 30/01/18 13:19, Julien Grall wrote:
>> Hi Andre,
>>
>> On 24/01/18 18:10, Andre Przywara wrote:
>>> At the moment we happily access VGIC internal data structures like
>>> the rank and struct pending_irq in gic.c, which should be VGIC agnostic.
>>>
>>> Factor out a new function vgic_connect_hw_irq(), which allows a virtual
>>> IRQ to be connected to a hardware IRQ (using the hw bit in the LR).
>>>
>>> This removes said accesses to VGIC data structures and improves
>>> abstraction.
>>
>> You are modifying the locking of the 2 functions. But I don't see how
>> this is safe. Can you explain it?
> 
> Are you worried about anything particular? I will explain my reasoning
> below, but feel free to point me to the cause of your gripes.

In general, it is quite nice to explain roughly in the commit message 
why the new locking order is ok. It would avoid reviewers to spend time 
guessing why it is fine.

In that particular case I am concerned about any potential concurrent 
access on anything related to a vIRQ.

[...]

>>>        set_bit(_IRQ_GUEST, &desc->status);
>>
>> This looks wrong to me. You don't want to execute any of the code below
>> if you are not able to route the pIRQ. For instance because the vIRQ has
>> already a desc assigned.
> 
> Ah, good point. Indeed I didn't consider the failure path. Should be
> easily fixed, though. Thanks for catching this.
> 
>>>    @@ -156,31 +141,19 @@ int gic_route_irq_to_guest(struct domain *d,
>>> unsigned int virq,
>>>            gic_set_irq_type(desc, desc->arch.type);
>>>        gic_set_irq_priority(desc, priority);
>>>    -    p->desc = desc;
>>> -    res = 0;
>>> -
>>> -out:
>>> -    vgic_unlock_rank(v_target, rank, flags);
>>> -
>>> -    return res;
>>> +    return vgic_connect_hw_irq(d, NULL, virq, desc);
>>>    }
>>>      /* This function only works with SPIs for now */
>>>    int gic_remove_irq_from_guest(struct domain *d, unsigned int virq,
>>>                                  struct irq_desc *desc)
>>>    {
>>> -    struct vcpu *v_target = vgic_get_target_vcpu(d->vcpu[0], virq);
>>> -    struct vgic_irq_rank *rank = vgic_rank_irq(v_target, virq);
>>> -    struct pending_irq *p = irq_to_pending(v_target, virq);
>>> -    unsigned long flags;
>>> +    int ret;
>>>          ASSERT(spin_is_locked(&desc->lock));
>>>        ASSERT(test_bit(_IRQ_GUEST, &desc->status));
>>> -    ASSERT(p->desc == desc);
>>
>> You dropped this assert but I don't see any replacement in the new code?
>> We really want to make sure the caller will not do something dumb here
>> (like passing a different desc).
> 
> So the first thing here is that I can't have anything dereferencing
> struct pending_irq here. Secondly the rank lock (protecting the p->
> structure) is only taken below, so nothing prevents this from changing
> between the ASSERT and the lock, AFAICS.

You could move the ASSERT within the lock, right?

> And to be honest, I don't really get the purpose of this ASSERT: the
> desc pointer is taken from the pending_irq in the caller, but without
> any locks. So if I am not mistaken, it could race with a
> gic_route_irq_to_xen(), and that would lead to the ASSERT triggering,
> just because of this race and not because of the code being broken
> ultimately.
> I *could* get the irq_desc by calling the new vgic_get_hw_irq_desc() -
> again. Not sure if that is useful, though.
> Another possibility would be to rethink this whole functionality:
> The only caller (release_guest_irq() in irq.c) gets a virtual IRQ
> number, then finds the associated irq_desc, only to lock it. Then it
> passes both the virtual IRQ number and the irq_desc to this function,
> where both are rechecked. The reason for this redundancy seems to be
> some locking order (irq_desc first?), however I can't find any
> documentation about this.
> So I wonder if we could just pass on only the virtual IRQ number, and
> let it up to this function here to safely retrieve the right irq_desc.

While I agree that the ASSERT without any lock is dangerous, it could at 
least catch someone passing the wrong irq_desc. Something will really go 
wrong if you disable pIRQ A but the irq_desc was belonging to pIRQ B.

And I agree that the code does not prevent that today. But it at least 
limit the scope of the problem.

So I think the code should be:

gic_remove_irq_from_guest(.....)
{

    vgic_lock_rank(...)
    if ( p->desc != desc )
    {
	vgic_unlock_rank(...)
         return -1;
    }

    do the vGIC removal

    vgic_unlock_rank(...)

    return 0;
}

> 
>>>        ASSERT(!is_lpi(virq));
>>>    -    vgic_lock_rank(v_target, rank, flags);
> 
> I couldn't find what this lock protects here, so early at least. Until
> the actual "p->desc = NULL;" line below nothing needs to be protected by
> this lock, it's all already covered by the desc lock.
> We only need the lock to eventually atomically remove the connection
> between the h/w and the virtual IRQ, which is done in
> vgic_connect_hw_irq() now.

See above.

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic