[prev in list] [next in list] [prev in thread] [next in thread]
List: vdsm-devel
Subject: Re: [ovirt-devel] [OST Failure Report] [oVirt master] [09.02.2017] [test-repo_ovirt_experimental_mas
From: Piotr Kliczewski <pkliczew () redhat ! com>
Date: 2017-02-21 15:53:28
Message-ID: CAGEt_QZDEX7SR3GidL6O1FY2V7LkT_cjK-8jQ2cLNZTSkONFTw () mail ! gmail ! com
[Download RAW message or body]
[Attachment #2 (multipart/alternative)]
Dominik,
Here is the exception which caused "Not able to update response"
2017-02-09 06:30:41,828-05 ERROR
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(org.ovirt.thread.pool-7-thread-13) [1abe9b01] EVENT_ID:
VDS_BROKER_COMMAND_FAILURE(10,802), Correlation ID: null, Call Stack:
null, Custom Event ID: -1, Message: VDSM lago-basic-suite-master-host1
command HostSetupNetworksVDS failed: 880 > 879
2017-02-09 06:30:41,829-05 ERROR
[org.ovirt.engine.core.vdsbroker.vdsbroker.HostSetupNetworksVDSCommand]
(org.ovirt.thread.pool-7-thread-13) [1abe9b01] Command
'HostSetupNetworksVDSCommand(HostName = lago-basic-suite-master-host1,
HostSetupNetworksVdsCommandParameters:{runAsync='true',
hostId='42fa76b6-7bf9-463e-80fa-fcca9b0d3e81',
vds='Host[lago-basic-suite-master-host1,42fa76b6-7bf9-463e-80fa-fcca9b0d3e81]',
rollbackOnFailure='true', connectivityTimeout='120',
networks='[HostNetwork:{defaultRoute='false', bonding='false',
networkName='Labeled_Network', nicName='eth0', vlan='600', mtu='0',
vmNetwork='false', stp='false', properties='null',
ipv4BootProtocol='NONE', ipv4Address='null', ipv4Netmask='null',
ipv4Gateway='null', ipv6BootProtocol='NONE', ipv6Address='null',
ipv6Prefix='null', ipv6Gateway='null'}]', removedNetworks='[]',
bonds='[]', removedBonds='[]', clusterSwitchType='LEGACY'})' execution
failed: VDSGenericException: VDSNetworkException: 880 > 879
2017-02-09 06:30:41,829-05 DEBUG
[org.ovirt.engine.core.vdsbroker.vdsbroker.HostSetupNetworksVDSCommand]
(org.ovirt.thread.pool-7-thread-13) [1abe9b01] Exception:
org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException:
VDSGenericException: VDSNetworkException: 880 > 879
at org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase.proceedProxyReturnValue(BrokerCommandBase.java:188)
[vdsbroker.jar:]
at org.ovirt.engine.core.vdsbroker.vdsbroker.FutureVDSCommand.get(FutureVDSCommand.java:70)
[vdsbroker.jar:]
at org.ovirt.engine.core.bll.network.host.HostSetupNetworksCommand.executeCommand(HostSetupNetworksCommand.java:290)
[bll.jar:]
at org.ovirt.engine.core.bll.CommandBase.executeWithoutTransaction(CommandBase.java:1251)
[bll.jar:]
at org.ovirt.engine.core.bll.CommandBase.executeActionInTransactionScope(CommandBase.java:1391)
[bll.jar:]
at org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:2055)
[bll.jar:]
at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInSuppressed(TransactionSupport.java:164)
[utils.jar:]
at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInScope(TransactionSupport.java:103)
[utils.jar:]
at org.ovirt.engine.core.bll.CommandBase.execute(CommandBase.java:1451)
[bll.jar:]
at org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:397)
[bll.jar:]
This could be connected with the specific version of the jackson that we use.
On Tue, Feb 21, 2017 at 4:42 PM, Dominik Holler <dholler@redhat.com> wrote:
> A deep analysis of the logfiles gives details about the
> unexpected behavior, but I regret to not provide the fault causing the
> unexpected behavior.
>
> To get this fault, the help of someone familiar with
> org.ovirt.vdsm.jsonrpc.client.JsonRpcClient is needed.
>
> In the failing test "assign_labeled_network" a (labeld) network is
> assigned to the cluster. For this reason the network has to be added to
> the hosts. After that, the test "assign_labeled_network" checks, if the
> engine acknowledges that hosts are in the labeld network. This
> execution of the test failed, because this acknowledge of the engine
> is missing after 180 seconds [3].
>
> There are two hosts lago-basic-suite-master-host0 and
> lago-basic-suite-master-host1 in the scenario.
> lago-basic-suite-master-host1 fails and
> lago-basic-suite-master-host0 succeeds, so only
> lago-basic-suite-master-host1 is analyzed below.
>
> Please find here the most relevant steps causing this error:
> 1. The engine sends Host.setupNetworks to the hosts in
> line 40279 - 40295 in [1] with
> "id":"02298344-165f-47e4-9ea4-7c17a55d37f8".
> 2. The host executes the Host.setupNetworks RPC call successfully in
> line 1286 in [2].
> 3. The engine receives the acknowledgment of the successful execution
> in line 40716 and 40717 in [1].
> 4. The error occurs in line 40718:
> '[org.ovirt.vdsm.jsonrpc.client.JsonRpcClient] (ResponseWorker) []
> Not able to update response for
> "02298344-165f-47e4-9ea4-7c17a55d37f8"'. This means the engine can
> not process the acknowledgment of the successful execution.
> 5. The command HostSetupNetworksVDS is aborted.
> So Host.getCapabilities is skipped and the engine database is not
> updated with the new network configuration of the host.
> 6. Since the test script relays on the information from database
> about host network configuration, it does not see that
> Host.setupNetworks is successfully executed and stops with the
> error "False != True after 180 seconds" [3]
>
> So the fault happens before or in step 4 and is around the jsonrpc
> communication.
>
> It is an open action item to precise the location of the fault.
>
>
>
> [1]
> http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_
> master/5217/artifact/exported-artifacts/basic-suit-master-
> el7/test_logs/basic-suite-master/post-005_network_by_
> label.py/lago-basic-suite-master-engine/_var_log/ovirt-engine/engine.log
>
> [2]
> http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_
> master/5217/artifact/exported-artifacts/basic-suit-master-
> el7/test_logs/basic-suite-master/post-005_network_by_
> label.py/lago-basic-suite-master-host1/_var_log/vdsm/vdsm.log
>
> [3]
> http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_
> master/5217/testReport/junit/(root)/005_network_by_label/
> assign_labeled_network/
>
>
>
> On Thu, 9 Feb 2017 14:52:52 +0200
> Shlomo Ben David <sbendavi@redhat.com> wrote:
>
> > Hi,
> >
> >
> > *Test failed:* [test-repo_ovirt_experimental_master]
> >
> > *Link to suspected patches:* n/a
> >
> > *Link to Job:*
> > http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/5217
> >
> > *Link to all logs:*
> > http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_
> master/5217/artifact/exported-artifacts/basic-suit-master-
> el7/test_logs/basic-suite-master/post-005_network_by_label.py/
> >
> > *Error snippet from the log: *
> >
> > <error>
> >
> > ifup/VLAN100_Network::ERROR::2017-02-09
> > 06:21:15,236::concurrent::189::root::(run) FINISH thread
> > <Thread(ifup/VLAN100_Network, started daemon 140189099861760)> failed
> > Traceback (most recent call last):
> > File "/usr/lib/python2.7/site-packages/vdsm/concurrent.py", line
> > 185, in run ret = func(*args, **kwargs)
> > File
> > "/usr/lib/python2.7/site-packages/vdsm/network/configurators/ifcfg.py",
> > line 949, in _exec_ifup _exec_ifup_by_name(iface.name, cgroup)
> > File
> > "/usr/lib/python2.7/site-packages/vdsm/network/configurators/ifcfg.py",
> > line 935, in _exec_ifup_by_name raise
> > ConfigNetworkError(ERR_FAILED_IFUP, out[-1] if out else '')
> > ConfigNetworkError: (29, 'Determining IPv6 information for
> > VLAN100_Network... failed.')
> >
> > </error>
> >
> > Best Regards,
> >
> > Shlomi Ben-David | Software Engineer | Red Hat ISRAEL
> > RHCSA | RHCVA | RHCE
> > IRC: shlomibendavid (on #rhev-integ, #rhev-dev, #rhev-ci)
> >
> > OPEN SOURCE - 1 4 011 && 011 4 1
>
>
[Attachment #5 (text/html)]
<div dir="ltr"><div>Dominik,<br><br></div>Here is the exception which caused \
"Not able to update response"<br><br><pre>2017-02-09 06:30:41,828-05 ERROR \
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] \
(org.ovirt.thread.pool-7-thread-13) [1abe9b01] EVENT_ID: \
VDS_BROKER_COMMAND_FAILURE(10,802), Correlation ID: null, Call Stack: null, Custom \
Event ID: -1, Message: VDSM lago-basic-suite-master-host1 command \
HostSetupNetworksVDS failed: 880 > 879 2017-02-09 06:30:41,829-05 ERROR \
[org.ovirt.engine.core.vdsbroker.vdsbroker.HostSetupNetworksVDSCommand] \
(org.ovirt.thread.pool-7-thread-13) [1abe9b01] Command \
'HostSetupNetworksVDSCommand(HostName = lago-basic-suite-master-host1, \
HostSetupNetworksVdsCommandParameters:{runAsync='true', \
hostId='42fa76b6-7bf9-463e-80fa-fcca9b0d3e81', \
vds='Host[lago-basic-suite-master-host1,42fa76b6-7bf9-463e-80fa-fcca9b0d3e81]', \
rollbackOnFailure='true', connectivityTimeout='120', \
networks='[HostNetwork:{defaultRoute='false', bonding='false', \
networkName='Labeled_Network', nicName='eth0', vlan='600', \
mtu='0', vmNetwork='false', stp='false', \
properties='null', ipv4BootProtocol='NONE', \
ipv4Address='null', ipv4Netmask='null', ipv4Gateway='null', \
ipv6BootProtocol='NONE', ipv6Address='null', \
ipv6Prefix='null', ipv6Gateway='null'}]', \
removedNetworks='[]', bonds='[]', removedBonds='[]', \
clusterSwitchType='LEGACY'})' execution failed: VDSGenericException: \
VDSNetworkException: 880 > 879 2017-02-09 06:30:41,829-05 DEBUG \
[org.ovirt.engine.core.vdsbroker.vdsbroker.HostSetupNetworksVDSCommand] \
(org.ovirt.thread.pool-7-thread-13) [1abe9b01] Exception: \
org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: VDSGenericException: \
VDSNetworkException: 880 > 879 at \
org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase.proceedProxyReturnValue(BrokerCommandBase.java:188) \
[vdsbroker.jar:] at \
org.ovirt.engine.core.vdsbroker.vdsbroker.FutureVDSCommand.get(FutureVDSCommand.java:70) \
[vdsbroker.jar:] at \
org.ovirt.engine.core.bll.network.host.HostSetupNetworksCommand.executeCommand(HostSetupNetworksCommand.java:290) \
[bll.jar:] at org.ovirt.engine.core.bll.CommandBase.executeWithoutTransaction(CommandBase.java:1251) \
[bll.jar:] at org.ovirt.engine.core.bll.CommandBase.executeActionInTransactionScope(CommandBase.java:1391) \
[bll.jar:] at org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:2055) \
[bll.jar:] at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInSuppressed(TransactionSupport.java:164) \
[utils.jar:] at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInScope(TransactionSupport.java:103) \
[utils.jar:] at org.ovirt.engine.core.bll.CommandBase.execute(CommandBase.java:1451) \
[bll.jar:] at org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:397) \
[bll.jar:]<br><br></pre><pre>This could be connected with the specific version of the \
jackson that we use.<br></pre></div><div class="gmail_extra"><br><div \
class="gmail_quote">On Tue, Feb 21, 2017 at 4:42 PM, Dominik Holler <span \
dir="ltr"><<a href="mailto:dholler@redhat.com" \
target="_blank">dholler@redhat.com</a>></span> wrote:<br><blockquote \
class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc \
solid;padding-left:1ex">A deep analysis of the logfiles gives details about the<br> \
unexpected behavior, but I regret to not provide the fault causing the<br> unexpected \
behavior.<br> <br>
To get this fault, the help of someone familiar with<br>
org.ovirt.vdsm.jsonrpc.client.<wbr>JsonRpcClient is needed.<br>
<br>
In the failing test "assign_labeled_network" a (labeld) network is<br>
assigned to the cluster. For this reason the network has to be added to<br>
the hosts. After that, the test "assign_labeled_network" checks, if the<br>
engine acknowledges that hosts are in the labeld network. This<br>
execution of the test failed, because this acknowledge of the engine<br>
is missing after 180 seconds [3].<br>
<br>
There are two hosts lago-basic-suite-master-host0 and<br>
lago-basic-suite-master-host1 in the scenario.<br>
lago-basic-suite-master-host1 fails and<br>
lago-basic-suite-master-host0 succeeds, so only<br>
lago-basic-suite-master-host1 is analyzed below.<br>
<br>
Please find here the most relevant steps causing this error:<br>
1. The engine sends Host.setupNetworks to the hosts in<br>
line <a href="tel:40279%20-%2040295" value="+14027940295">40279 - 40295</a> in \
[1] with<br>
"id":"02298344-165f-47e4-9ea4-<wbr>7c17a55d37f8".<br>
2. The host executes the Host.setupNetworks RPC call successfully in<br>
line 1286 in [2].<br>
3. The engine receives the acknowledgment of the successful execution<br>
in line 40716 and 40717 in [1].<br>
4. The error occurs in line 40718:<br>
'[org.ovirt.vdsm.jsonrpc.<wbr>client.JsonRpcClient] (ResponseWorker) []<br>
Not able to update response for<br>
"02298344-165f-47e4-9ea4-<wbr>7c17a55d37f8"'. This means the \
engine can<br> not process the acknowledgment of the successful execution.<br>
5. The command HostSetupNetworksVDS is aborted.<br>
So Host.getCapabilities is skipped and the engine database is not<br>
updated with the new network configuration of the host.<br>
6. Since the test script relays on the information from database<br>
about host network configuration, it does not see that<br>
Host.setupNetworks is successfully executed and stops with the<br>
error "False != True after 180 seconds" [3]<br>
<br>
So the fault happens before or in step 4 and is around the jsonrpc<br>
communication.<br>
<br>
It is an open action item to precise the location of the fault.<br>
<br>
<br>
<br>
[1]<br>
<a href="http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/5217/arti \
fact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-005_ne \
twork_by_label.py/lago-basic-suite-master-engine/_var_log/ovirt-engine/engine.log" \
rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/<wbr>test-repo_ovirt_exp \
erimental_<wbr>master/5217/artifact/exported-<wbr>artifacts/basic-suit-master-<wbr>el7 \
/test_logs/basic-suite-<wbr>master/post-005_network_by_<wbr>label.py/lago-basic-suite-<wbr>master-engine/_var_log/ovirt-<wbr>engine/engine.log</a><br>
<br>
[2]<br>
<a href="http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/5217/arti \
fact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-005_network_by_label.py/lago-basic-suite-master-host1/_var_log/vdsm/vdsm.log" \
rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/<wbr>test-repo_ovirt_exp \
erimental_<wbr>master/5217/artifact/exported-<wbr>artifacts/basic-suit-master-<wbr>el7 \
/test_logs/basic-suite-<wbr>master/post-005_network_by_<wbr>label.py/lago-basic-suite-<wbr>master-host1/_var_log/vdsm/<wbr>vdsm.log</a><br>
<br>
[3]<br>
<a href="http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/5217/testReport/junit/(root)/005_network_by_label/assign_labeled_network/" \
rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/<wbr>test-repo_ovirt_exp \
erimental_<wbr>master/5217/testReport/junit/(<wbr>root)/005_network_by_label/<wbr>assign_labeled_network/</a><br>
<br>
<br>
<br>
On Thu, 9 Feb 2017 14:52:52 +0200<br>
Shlomo Ben David <<a href="mailto:sbendavi@redhat.com">sbendavi@redhat.com</a>> \
wrote:<br> <br>
> Hi,<br>
><br>
><br>
> *Test failed:* [test-repo_ovirt_experimental_<wbr>master]<br>
><br>
> *Link to suspected patches:* n/a<br>
><br>
> *Link to Job:*<br>
> <a href="http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/5217" \
rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/<wbr>test-repo_ovirt_experimental_<wbr>master/5217</a><br>
><br>
> *Link to all logs:*<br>
> <a href="http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/5217/ar \
tifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-005_network_by_label.py/" \
rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/<wbr>test-repo_ovirt_exp \
erimental_<wbr>master/5217/artifact/exported-<wbr>artifacts/basic-suit-master-<wbr>el7/test_logs/basic-suite-<wbr>master/post-005_network_by_<wbr>label.py/</a><br>
><br>
> *Error snippet from the log: *<br>
><br>
> <error><br>
><br>
> ifup/VLAN100_Network::ERROR::<wbr>2017-02-09<br>
> 06:21:15,236::concurrent::189:<wbr>:root::(run) FINISH thread<br>
> <Thread(ifup/VLAN100_Network, started daemon 140189099861760)> failed<br>
> Traceback (most recent call last):<br>
> File "/usr/lib/python2.7/site-<wbr>packages/vdsm/concurrent.py", \
line<br> > 185, in run ret = func(*args, **kwargs)<br>
> File<br>
> "/usr/lib/python2.7/site-<wbr>packages/vdsm/network/<wbr>configurators/ifcfg.py",<br>
> line 949, in _exec_ifup _exec_ifup_by_name(<a href="http://iface.name" \
rel="noreferrer" target="_blank">iface.name</a>, cgroup)<br> > File<br>
> "/usr/lib/python2.7/site-<wbr>packages/vdsm/network/<wbr>configurators/ifcfg.py",<br>
> line 935, in _exec_ifup_by_name raise<br>
> ConfigNetworkError(ERR_FAILED_<wbr>IFUP, out[-1] if out else '')<br>
> ConfigNetworkError: (29, 'Determining IPv6 information for<br>
> VLAN100_Network... failed.')<br>
><br>
> </error><br>
><br>
> Best Regards,<br>
><br>
> Shlomi Ben-David | Software Engineer | Red Hat ISRAEL<br>
> RHCSA | RHCVA | RHCE<br>
> IRC: shlomibendavid (on #rhev-integ, #rhev-dev, #rhev-ci)<br>
><br>
> OPEN SOURCE - 1 4 011 && 011 4 1<br>
<br>
</blockquote></div><br></div>
_______________________________________________
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel
[prev in list] [next in list] [prev in thread] [next in thread]
Configure |
About |
News |
Add a list |
Sponsored by KoreLogic