
List:       vdsm-devel
Subject:    [ovirt-devel] Re: RHV and oVirt CBT issue
From:       Nir Soffer <nsoffer () redhat ! com>
Date:       2021-07-30 15:37:46
Message-ID: CAMRbyytTR8BE8Wp30MLoERNAKG8SswD6y9fGKCCVEJp1_1Zvmg () mail ! gmail ! com

On Fri, Jul 30, 2021 at 5:47 AM luwen.zhang <luwen.zhang@vinchin.com> wrote:
> Sorry, I was trying to open a new thread for this issue, but it seems I failed to
> submit. Here let me explain how the issue is reproduced.
> It's a regular backup using the CBT+imageip API. After a series of successful
> backups, at the beginning of one backup session, when we try to obtain the VM config
> and the snapshot list (obtaining the snapshot list can determine whether the VM
> virtual disk format is RAW or QCOW2)

Why do you need the snapshot list when doing incremental backup? What you need
is the list of disks in the VM, accessible via:

    GET /vms/{vm-id}/diskattachments

For each disk attachment, get the disk using the diskattachment.disk.id:

    GET /disks/{disk-id}/

Please check how we do this in backup_vm.py example:
https://github.com/oVirt/ovirt-engine-sdk/blob/master/sdk/examples/backup_vm.py#L445
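A minimal sketch of the two requests above, not the code from backup_vm.py.
It assumes a hypothetical `fetch(path)` callable that performs the GET against
the engine API and returns the parsed JSON body, and it assumes the JSON
collection is keyed by "disk_attachment" and that each disk document carries a
"format" field (for example "cow" or "raw"). Check both against your engine
version before relying on them.

```python
def list_vm_disks(fetch, vm_id):
    """Return the disk document for every disk attached to a VM.

    fetch is a hypothetical callable: fetch(path) -> parsed JSON dict.
    """
    # GET /vms/{vm-id}/diskattachments
    attachments = fetch(f"/vms/{vm_id}/diskattachments")

    disks = []
    # Assumed JSON shape: {"disk_attachment": [{"disk": {"id": ...}}, ...]}
    for attachment in attachments.get("disk_attachment", []):
        disk_id = attachment["disk"]["id"]
        # GET /disks/{disk-id}
        disks.append(fetch(f"/disks/{disk_id}"))
    return disks
```

With the disk documents in hand, the disk format can be read from each disk
directly, without listing snapshots.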

> by using `GET vms/<vm-id>/snapshots`, but get the following error.
> 
> <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
> 
> <fault>
> 
> <detail>duplicate key acf1edaa-e950-4c4f-94df-1bd6b3da49c1 (attempted merging
> values org.ovirt.engine.core.common.businessentities.storage.diskimage@5103046c and
> org.ovirt.engine.core.common.businessentities.storage.diskimage@d973046c)</detail>
> <reason>Operation Failed</reason>
> 
> </fault>

We need much more detailed steps.

This is a typical backup flow:

1. Start incremental backup

2. Wait until backup is ready (phase == READY)

3. Start image transfer for incremental backup

4. Wait until image transfer is ready (phase == TRANSFERRING)

5. Download disk incremental data

6. Finalize transfer

7. Wait until transfer is finished (phase == FINISHED_SUCCESS/FINISHED_FAILURE)

This is not easy, see this example:
    https://github.com/oVirt/ovirt-engine-sdk/blob/ac6f05bb5dcd8fdee2a67b2a296ade377661836a/sdk/examples/helpers/imagetransfer.py#L269


8. Finalize backup

9. Wait until backup is finished (phase == FINISHED/FAILED)

This is easier, but possible only since 4.4.7:
    https://github.com/oVirt/ovirt-engine-sdk/blob/ac6f05bb5dcd8fdee2a67b2a296ade377661836a/sdk/examples/backup_vm.py#L341


10. Rebase backup image on previous backup (if you store backup as qcow2 layers)
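The ordering of steps 1-9 can be sketched as below. This is only an
illustration of the waits, not the SDK code linked above: `ops` is a
hypothetical adapter object whose methods (start_backup, backup_phase,
start_transfer, transfer_phase, download, finalize_transfer, finalize_backup)
you would implement on top of the SDK or the REST API, and the phases are
compared as plain strings. It does not handle the complications that make
step 7 hard in practice.

```python
import time


class PhaseTimeout(Exception):
    """The wanted phase was not reached before the deadline."""


def wait_for_phase(get_phase, wanted, timeout=300.0, interval=1.0,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll get_phase() until it returns one of the phases in wanted."""
    deadline = clock() + timeout
    while True:
        phase = get_phase()
        if phase in wanted:
            return phase
        if clock() >= deadline:
            raise PhaseTimeout(f"still in phase {phase!r}")
        sleep(interval)


def run_backup(ops, **wait_args):
    """Run steps 1-9 in order; ops is a hypothetical adapter object."""
    ops.start_backup()                                          # step 1
    try:
        wait_for_phase(ops.backup_phase, {"READY"}, **wait_args)  # step 2
        ops.start_transfer()                                    # step 3
        try:
            wait_for_phase(ops.transfer_phase, {"TRANSFERRING"},
                           **wait_args)                         # step 4
            ops.download()                                      # step 5
        finally:
            ops.finalize_transfer()                             # step 6
            transfer = wait_for_phase(
                ops.transfer_phase,
                {"FINISHED_SUCCESS", "FINISHED_FAILURE"},
                **wait_args)                                    # step 7
    finally:
        ops.finalize_backup()                                   # step 8
        backup = wait_for_phase(ops.backup_phase,
                                {"FINISHED", "FAILED"},
                                **wait_args)                    # step 9
    return transfer, backup
```

The point of the sketch is that run_backup does not return until both the
image transfer and the backup have reached a final phase, so a caller that
runs it in a loop cannot start the next backup while the previous transfer is
still finalizing.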

Where in this flow do you get the snapshot list (and other stuff)?

Getting the snapshot list is likely not needed for backup, but we need to fix it
if it breaks while backups or image transfers are running.

Do you run this flow in a loop? Maybe you do not wait until the previous image
transfer has finished before starting a new backup?

> After this, on the oVirt engine web console, the VM shows 2 disks (actually it only
> has 1), and the disk status always shows "Finalizing". It's been more than 30 hours
> now, and during this time we cannot modify the VM disk or take snapshots.
> Before upgrading oVirt engine to 4.4.7.7-1.el8 this problem happened frequently;
> after upgrading, the frequency is reduced.
> Here I'm adding the engine logs and vdsm logs.
> Engine logs: https://drive.google.com/file/d/1T3-EOxYYl3oFZOA9VMMBte5WyBoUO48U/view?usp=sharing
>  VDSM logs: https://drive.google.com/file/d/1x0B8lGqnKEDrgn666CuN3hqUGwD7fcYv/view?usp=sharing
> 

Thanks, we will check the logs next week.

> Thanks & regards!
> On 07/29/2021 19:20,Nir Soffer<nsoffer@redhat.com> wrote:
> 
> On Thu, Jul 29, 2021 at 10:08 AM luwen.zhang <luwen.zhang@vinchin.com> wrote:
> 
> The problem occurred yesterday, but we waited for more than 20 hours, still 2 disks
> and in Finalizing state.
> 
> If the image transfer is "finalizing" it means the image transfer is
> trying to finalize, but the operation could not complete.
> 
> In this phase the disk remains locked, and it should not be possible
> to start a new image transfer
> (e.g perform another backup).
> 
> Engine and vdsm logs should explain why the image transfer is stuck in
> the finalizing phase.
> 
> Can you add detailed instructions on how to reproduce this issue?
> 
_______________________________________________
Devel mailing list -- devel@ovirt.org
To unsubscribe send an email to devel-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/devel@ovirt.org/message/T4JXFOILOB6CLNJIXHTV3Y2J6RY4VAFA/
