List:       libvir-list
Subject:    Re: [libvirt] [PATCH v5 3/3] libvirtd: fix crash on termination
From:       Nikolay Shirokovskiy <nshirokovskiy () virtuozzo ! com>
Date:       2017-12-25 7:42:41
Message-ID: 62c98f2a-506a-bf10-f836-807e09e3bfc5 () virtuozzo ! com



On 22.12.2017 17:13, John Ferlan wrote:
> [...]
> 
> > > 
> > > Still adding the "virHashRemoveAll(dmn->servers);" into
> > > virNetDaemonClose doesn't help the situation as I can still either crash
> > > randomly or hang, so I'm less convinced this would really fix anything.
> > > It does change the "nature" of the hung thread stack trace though, as
> > > the second thread is now:
> > 
> > virHashRemoveAll is not enough now. Due to the unref reordering, the last ref
> > to @srv is unrefed after virStateCleanup. So we need to
> > virObjectUnref(srv|srvAdm) before virStateCleanup. Or we can call
> > virThreadPoolFree from virNetServerClose (as in the first version of the
> > patch, and as Erik suggests) instead of virHashRemoveAll.
> > 
> 
> Patches w/
> 
> 1. Long pause before GetAllStats (without using [u]sleep)
> 2. Adjustment to call virNetServerServiceToggle in
> virNetServerServiceClose (instead of virNetServerDispose)
> 3. Call virHashRemoveAll in virNetDaemonClose
> 4. Call virThreadPoolFree in virNetServerClose
> 5. Perform Unref (adminProgram, srvAdm, qemuProgram, lxcProgram,
> remoteProgram, and srv) before virNetDaemonClose
> 
> Still has the virCondWait's - so as Daniel points out there's quite a
> bit more work to be done. Like most Red Hat engineers - I will not be
> very active over the next week or so (until the New Year) as it's a
> holiday break/vacation for us.
> 
> So unless you have the burning desire to put together some patches and
> do the work yourself, more thoughts/work will need to wait.
> 
> John
> 

I've checked what happens after applying the patches you described above
(although it would be enough to apply only 3 (or 4) and part of 5, besides
the pause hunk). I get hangs too, and this kind of hang is fixed by the
second series - '[PATCH 0/4] libvirtd: fix hang on termination in qemu driver'.
That is, besides the hang in the thread freeing the thread pool that you
already mentioned, there is another hang with this backtrace:

#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007ffff7335c58 in virCondWait (c=0x7fffc4000e18, m=0x7fffc4000df0) at util/virthread.c:154
#2  0x00007fffd9605983 in qemuMonitorSend (mon=0x7fffc4000de0, msg=0x7fffe70bd1f0) at qemu/qemu_monitor.c:1067
#3  0x00007fffd961b68f in qemuMonitorJSONCommandWithFd (mon=0x7fffc4000de0, cmd=0x7fffb0005310, scm_fd=-1, reply=0x7fffe70bd2d0) at qemu/qemu_monitor_json.c:300
#4  0x00007fffd961b7c1 in qemuMonitorJSONCommand (mon=0x7fffc4000de0, cmd=0x7fffb0005310, reply=0x7fffe70bd2d0) at qemu/qemu_monitor_json.c:330
#5  0x00007fffd9629f0b in qemuMonitorJSONGetObjectListPaths (mon=0x7fffc4000de0, path=0x7fffd96a7c96 "/machine/peripheral", paths=0x7fffe70bd380) at qemu/qemu_monitor_json.c:5715
#6  0x00007fffd962dcc4 in qemuMonitorJSONFindObjectPathByAlias (mon=0x7fffc4000de0, name=0x7fffd969f3cd "virtio-balloon-pci", alias=0x7fffcc1e8d30 "balloon0", path=0x7fffe70bd450) at qemu/qemu_monitor_json.c:7235
#7  0x00007fffd962e231 in qemuMonitorJSONFindLinkPath (mon=0x7fffc4000de0, name=0x7fffd969f3cd "virtio-balloon-pci", alias=0x7fffcc1e8d30 "balloon0", path=0x7fffe70bd450) at qemu/qemu_monitor_json.c:7349
#8  0x00007fffd9605bf7 in qemuMonitorInitBalloonObjectPath (mon=0x7fffc4000de0, balloon=0x7fffcc1e8e60) at qemu/qemu_monitor.c:1157
#9  0x00007fffd96098d3 in qemuMonitorGetMemoryStats (mon=0x7fffc4000de0, balloon=0x7fffcc1e8e60, stats=0x7fffe70bd5b0, nr_stats=10) at qemu/qemu_monitor.c:2133
#10 0x00007fffd964e70c in qemuDomainMemoryStatsInternal (driver=0x7fffcc1872a0, vm=0x7fffcc2737e0, stats=0x7fffe70bd5b0, nr_stats=10) at qemu/qemu_driver.c:11453
#11 0x00007fffd9667013 in qemuDomainGetStatsBalloon (driver=0x7fffcc1872a0, dom=0x7fffcc2737e0, record=0x7fffb00008c0, maxparams=0x7fffe70bd6b0, privflags=1) at qemu/qemu_driver.c:19478
#12 0x00007fffd9669597 in qemuDomainGetStats (conn=0x7fffb80030e0, dom=0x7fffcc2737e0, stats=127, record=0x7fffe70bd790, flags=1) at qemu/qemu_driver.c:20133
#13 0x00007fffd966997f in qemuConnectGetAllDomainStats (conn=0x7fffb80030e0, doms=0x7fffb0005220, ndoms=1, stats=127, retStats=0x7fffe70bd8e0, flags=0) at qemu/qemu_driver.c:20226
#14 0x00007ffff7424fd7 in virDomainListGetStats (doms=0x7fffb0005220, stats=0, retStats=0x7fffe70bd8e0, flags=0) at libvirt-domain.c:11595
#15 0x00005555555ac030 in remoteDispatchConnectGetAllDomainStats (server=0x55555612a3a0, client=0x555556151d10, msg=0x555556152540, rerr=0x7fffe70bda20, args=0x7fffb00036e0, ret=0x7fffb0002d20) at remote.c:6538
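
To illustrate the general shape of this kind of hang: the worker thread above sits in a condition wait for a monitor reply that will never arrive once shutdown has started. Below is a minimal stand-alone sketch in plain pthreads, not libvirt code and not a description of what either patch series does. All names in it (struct mon, mon_send, mon_close, demo_close_wakes_waiter) are hypothetical. It only shows the usual remedy: the close path sets a flag and broadcasts on the condition, so the waiter returns with an error instead of blocking forever.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a monitor object; not libvirt's qemuMonitor. */
struct mon {
    pthread_mutex_t lock;
    pthread_cond_t notify;
    bool reply_ready;   /* would be set when a reply arrives */
    bool closed;        /* set by mon_close() on shutdown */
};

/* Wait for a reply. Returns 0 on reply, -1 if the monitor was closed. */
static int mon_send(struct mon *m)
{
    int ret;

    pthread_mutex_lock(&m->lock);
    /* Without the 'closed' check this loop would wait forever once
     * nothing is left to deliver a reply. */
    while (!m->reply_ready && !m->closed)
        pthread_cond_wait(&m->notify, &m->lock);
    ret = m->reply_ready ? 0 : -1;
    pthread_mutex_unlock(&m->lock);
    return ret;
}

/* Shutdown path: mark the monitor closed and wake every waiter. */
static void mon_close(struct mon *m)
{
    pthread_mutex_lock(&m->lock);
    m->closed = true;
    pthread_cond_broadcast(&m->notify);
    pthread_mutex_unlock(&m->lock);
}

static void *worker(void *opaque)
{
    /* Pass the int result back through the thread's return pointer. */
    return (void *)(intptr_t)mon_send(opaque);
}

/* Start a waiting worker, close the monitor, join the worker.
 * Returns the worker's mon_send() result. */
static int demo_close_wakes_waiter(void)
{
    struct mon m = { .reply_ready = false, .closed = false };
    pthread_t thr;
    void *res = NULL;
    int rc;

    pthread_mutex_init(&m.lock, NULL);
    pthread_cond_init(&m.notify, NULL);
    rc = pthread_create(&thr, NULL, worker, &m);
    assert(rc == 0);
    mon_close(&m);              /* no reply ever arrives; close instead */
    pthread_join(thr, &res);
    pthread_cond_destroy(&m.notify);
    pthread_mutex_destroy(&m.lock);
    return (int)(intptr_t)res;
}
```

Calling demo_close_wakes_waiter() joins the worker cleanly whether the close happens before or after the worker starts waiting, because the flag is checked under the same mutex that protects the wait.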

I'm not writing this to pull you back into the work, and I don't expect a
reply. It's the holidays :) This is only to document my research.

Nikolay
