
List:       qemu-ppc
Subject:    Re: [PATCH] target/ppc: handle vcpu hotplug failure gracefully
From:       "Nicholas Piggin" <npiggin () gmail ! com>
Date:       2024-05-17 3:57:47
Message-ID: D1BMB9HRF50E.2UC36WJ1ZHAFU () gmail ! com

On Thu May 16, 2024 at 2:31 PM AEST, Harsh Prateek Bora wrote:
> Hi Nick,
>
> On 5/14/24 08:39, Nicholas Piggin wrote:
> > On Tue Apr 23, 2024 at 4:30 PM AEST, Harsh Prateek Bora wrote:
> >> + qemu-devel
> >>
> >> On 4/23/24 11:40, Harsh Prateek Bora wrote:
> >>> On ppc64, the PowerVM hypervisor runs with limited memory and a VCPU
> >>> creation during hotplug may fail during kvm_ioctl for KVM_CREATE_VCPU,
> >>> leading to termination of guest since errp is set to &error_fatal while
> >>> calling kvm_init_vcpu. This unexpected behaviour can be avoided by
> >>> pre-creating vcpu and parking it on success or return error otherwise.
> >>> This enables graceful error delivery for any vcpu hotplug failures while
> >>> the guest can keep running.
> > 
> > So this puts it on the park list so when kvm_init_vcpu() later runs it
> > will just take it off the park list instead of issuing another
> > KVM_CREATE_VCPU ioctl.
> > 
> > And kvm_init_vcpu() runs in the vcpu thread function, which does not
> > have a good way to indicate failure to the caller.
> > 
> > I don't know a lot about this part of qemu but it seems like a good
> > idea to move fail-able initialisation out of the vcpu thread in that
> > case. So the general idea seems good to me.
> > 
>
> Yeh ..
>
> >>>
> >>> Based on api refactoring to create/park vcpus introduced in 1/8 of patch series:
> >>> https://lore.kernel.org/qemu-devel/20240312020000.12992-2-salil.mehta@huawei.com/
> > 
> > So from this series AFAIKS you're just using kvm_create / kvm_park
> > routines? You could easily pull that patch 1 out ahead of that larger
> > series if progress is slow on it, it's a decent cleanup by itself by
> > the looks.
> > 
>
> Yeh, patch 1 of that series is all we need, but the author mentioned on 
> the list that he is about to post the next version soon.
>
> >>>
> >>> Tested OK by repeatedly doing a hotplug/unplug of vcpus as below:
> >>>
> >>>    #virsh setvcpus hotplug 40
> >>>    #virsh setvcpus hotplug 70
> >>> error: internal error: unable to execute QEMU command 'device_add':
> >>> kvmppc_cpu_realize: vcpu hotplug failed with -12
> >>>
> >>> Reported-by: Anushree Mathur <anushree.mathur@linux.vnet.ibm.com>
> >>> Suggested-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
> >>> Suggested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
> >>> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
> >>> ---
> >>>    target/ppc/kvm.c | 42 ++++++++++++++++++++++++++++++++++++++++++
> >>>    1 file changed, 42 insertions(+)
> >>>
> >>> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
> >>> index 8231feb2d4..c887f6dfa0 100644
> >>> --- a/target/ppc/kvm.c
> >>> +++ b/target/ppc/kvm.c
> >>> @@ -48,6 +48,8 @@
> >>>    #include "qemu/mmap-alloc.h"
> >>>    #include "elf.h"
> >>>    #include "sysemu/kvm_int.h"
> >>> +#include "sysemu/kvm.h"
> >>> +#include "hw/core/accel-cpu.h"
> >>>    
> >>>    #define PROC_DEVTREE_CPU      "/proc/device-tree/cpus/"
> >>>    
> >>> @@ -2339,6 +2341,43 @@ static void alter_insns(uint64_t *word, uint64_t flags, bool on)
> >>>        }
> >>>    }
> >>>    
> >>> +static int max_cpu_index = 0;
> >>> +
> >>> +static bool kvmppc_cpu_realize(CPUState *cs, Error **errp)
> >>> +{
> >>> +    int ret;
> >>> +
> >>> +    cs->cpu_index = max_cpu_index++;
> >>> +
> >>> +    POWERPC_CPU(cs)->vcpu_id = cs->cpu_index;
> > 
> > So you're overriding the cpu_get_free_index() allocator here.
> > And you need to because vcpu_id needs to be assigned before
> > the KVM create, I guess.
> > 
>
> Yes ..
>
> > I guess it works. I would add a comment like s390x has.
> > 
> Not sure which comment you were referring to but with exporting
> cpu_get_free_index as suggested later, not sure if we still need any
> comment.

Yeah that's true.

> >>> +
> >>> +    if (cs->parent_obj.hotplugged) {
> > 
> > Can _all_ kvm cpu creation go via this path? Why just limit it to
> > hotplugged?
>
> For the initial bootup, we actually want to abort if the requested vCPUs
> can't be allocated, so that the user can retry until the requested vCPUs
> are allocated. For a hotplug failure, bringing down the entire guest
> isn't fair, hence the fix.

But you could make the error handling depend on hotplugged, no?
Perhaps put that error handling decision in common code so policy
is the same for all targets and back ends.

[...]

> >>> +}
> >>> +
> >>>    static void kvmppc_host_cpu_class_init(ObjectClass *oc, void *data)
> >>>    {
> >>>        PowerPCCPUClass *pcc = POWERPC_CPU_CLASS(oc);
> >>> @@ -2963,4 +3002,7 @@ bool kvm_arch_cpu_check_are_resettable(void)
> >>>    
> >>>    void kvm_arch_accel_class_init(ObjectClass *oc)
> >>>    {
> >>> +    AccelClass *ac = ACCEL_CLASS(oc);
> >>> +    ac->cpu_common_realize = kvmppc_cpu_realize;
> >>> +    ac->cpu_common_unrealize = kvmppc_cpu_unrealize;
> >>>    }

One other thing I noticed -- cpu_common_realize seems to be for
core code and cpu_target_realize for targets. Should we be
using the latter here? If not, a comment would be warranted and
probably also a comment in accel_cpu_common_realize().

Thanks,
Nick

