[prev in list] [next in list] [prev in thread] [next in thread] 

List:       vdsm-devel
Subject:    Re: [ovirt-devel] [OST Failure Report] [oVirt master] [09.02.2017] [test-repo_ovirt_experimental_mas
From:       Piotr Kliczewski <pkliczew () redhat ! com>
Date:       2017-02-21 15:53:28
Message-ID: CAGEt_QZDEX7SR3GidL6O1FY2V7LkT_cjK-8jQ2cLNZTSkONFTw () mail ! gmail ! com
[Download RAW message or body]

[Attachment #2 (multipart/alternative)]


Dominik,

Here is the exception which caused "Not able to update response"

2017-02-09 06:30:41,828-05 ERROR
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(org.ovirt.thread.pool-7-thread-13) [1abe9b01] EVENT_ID:
VDS_BROKER_COMMAND_FAILURE(10,802), Correlation ID: null, Call Stack:
null, Custom Event ID: -1, Message: VDSM lago-basic-suite-master-host1
command HostSetupNetworksVDS failed: 880 > 879
2017-02-09 06:30:41,829-05 ERROR
[org.ovirt.engine.core.vdsbroker.vdsbroker.HostSetupNetworksVDSCommand]
(org.ovirt.thread.pool-7-thread-13) [1abe9b01] Command
'HostSetupNetworksVDSCommand(HostName = lago-basic-suite-master-host1,
HostSetupNetworksVdsCommandParameters:{runAsync='true',
hostId='42fa76b6-7bf9-463e-80fa-fcca9b0d3e81',
vds='Host[lago-basic-suite-master-host1,42fa76b6-7bf9-463e-80fa-fcca9b0d3e81]',
rollbackOnFailure='true', connectivityTimeout='120',
networks='[HostNetwork:{defaultRoute='false', bonding='false',
networkName='Labeled_Network', nicName='eth0', vlan='600', mtu='0',
vmNetwork='false', stp='false', properties='null',
ipv4BootProtocol='NONE', ipv4Address='null', ipv4Netmask='null',
ipv4Gateway='null', ipv6BootProtocol='NONE', ipv6Address='null',
ipv6Prefix='null', ipv6Gateway='null'}]', removedNetworks='[]',
bonds='[]', removedBonds='[]', clusterSwitchType='LEGACY'})' execution
failed: VDSGenericException: VDSNetworkException: 880 > 879
2017-02-09 06:30:41,829-05 DEBUG
[org.ovirt.engine.core.vdsbroker.vdsbroker.HostSetupNetworksVDSCommand]
(org.ovirt.thread.pool-7-thread-13) [1abe9b01] Exception:
org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException:
VDSGenericException: VDSNetworkException: 880 > 879
	at org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase.proceedProxyReturnValue(BrokerCommandBase.java:188)
 [vdsbroker.jar:]
	at org.ovirt.engine.core.vdsbroker.vdsbroker.FutureVDSCommand.get(FutureVDSCommand.java:70)
 [vdsbroker.jar:]
	at org.ovirt.engine.core.bll.network.host.HostSetupNetworksCommand.executeCommand(HostSetupNetworksCommand.java:290)
 [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.executeWithoutTransaction(CommandBase.java:1251)
 [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.executeActionInTransactionScope(CommandBase.java:1391)
 [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:2055)
[bll.jar:]
	at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInSuppressed(TransactionSupport.java:164)
 [utils.jar:]
	at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInScope(TransactionSupport.java:103)
 [utils.jar:]
	at org.ovirt.engine.core.bll.CommandBase.execute(CommandBase.java:1451)
[bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:397)
[bll.jar:]

This could be connected with the specific version of the jackson that we use.


On Tue, Feb 21, 2017 at 4:42 PM, Dominik Holler <dholler@redhat.com> wrote:

> A deep analysis of the logfiles gives details about the
> unexpected behavior, but I regret to not provide the fault causing the
> unexpected behavior.
> 
> To get this fault, the help of someone familiar with
> org.ovirt.vdsm.jsonrpc.client.JsonRpcClient is needed.
> 
> In the failing test "assign_labeled_network" a (labeld) network is
> assigned to the cluster. For this reason the network has to be added to
> the hosts. After that, the test "assign_labeled_network" checks, if the
> engine acknowledges that hosts are in the labeld network. This
> execution of the test failed, because this acknowledge of the engine
> is missing after 180 seconds [3].
> 
> There are two hosts lago-basic-suite-master-host0 and
> lago-basic-suite-master-host1 in the scenario.
> lago-basic-suite-master-host1 fails and
> lago-basic-suite-master-host0 succeeds, so only
> lago-basic-suite-master-host1 is analyzed below.
> 
> Please find here the most relevant steps causing this error:
> 1. The engine sends Host.setupNetworks to the hosts in
> line 40279 - 40295 in [1] with
> "id":"02298344-165f-47e4-9ea4-7c17a55d37f8".
> 2. The host executes the Host.setupNetworks RPC call successfully in
> line 1286 in [2].
> 3. The engine receives the acknowledgment of the successful execution
> in line 40716 and 40717 in [1].
> 4. The error occurs in line 40718:
> '[org.ovirt.vdsm.jsonrpc.client.JsonRpcClient] (ResponseWorker) []
> Not able to update response for
> "02298344-165f-47e4-9ea4-7c17a55d37f8"'. This means the engine can
> not process the acknowledgment of the successful execution.
> 5. The command HostSetupNetworksVDS is aborted.
> So Host.getCapabilities is skipped and the engine database is not
> updated with the new network configuration of the host.
> 6. Since the test script relays on the information from database
> about host network configuration, it does not see that
> Host.setupNetworks is successfully executed and stops with the
> error "False != True after 180 seconds" [3]
> 
> So the fault happens before or in step 4 and is around the jsonrpc
> communication.
> 
> It is an open action item to precise the location of the fault.
> 
> 
> 
> [1]
> http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_
> master/5217/artifact/exported-artifacts/basic-suit-master-
> el7/test_logs/basic-suite-master/post-005_network_by_
> label.py/lago-basic-suite-master-engine/_var_log/ovirt-engine/engine.log
> 
> [2]
> http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_
> master/5217/artifact/exported-artifacts/basic-suit-master-
> el7/test_logs/basic-suite-master/post-005_network_by_
> label.py/lago-basic-suite-master-host1/_var_log/vdsm/vdsm.log
> 
> [3]
> http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_
> master/5217/testReport/junit/(root)/005_network_by_label/
> assign_labeled_network/
> 
> 
> 
> On Thu, 9 Feb 2017 14:52:52 +0200
> Shlomo Ben David <sbendavi@redhat.com> wrote:
> 
> > Hi,
> > 
> > 
> > *Test failed:* [test-repo_ovirt_experimental_master]
> > 
> > *Link to suspected patches:* n/a
> > 
> > *Link to Job:*
> > http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/5217
> > 
> > *Link to all logs:*
> > http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_
> master/5217/artifact/exported-artifacts/basic-suit-master-
> el7/test_logs/basic-suite-master/post-005_network_by_label.py/
> > 
> > *Error snippet from the log: *
> > 
> > <error>
> > 
> > ifup/VLAN100_Network::ERROR::2017-02-09
> > 06:21:15,236::concurrent::189::root::(run) FINISH thread
> > <Thread(ifup/VLAN100_Network, started daemon 140189099861760)> failed
> > Traceback (most recent call last):
> > File "/usr/lib/python2.7/site-packages/vdsm/concurrent.py", line
> > 185, in run ret = func(*args, **kwargs)
> > File
> > "/usr/lib/python2.7/site-packages/vdsm/network/configurators/ifcfg.py",
> > line 949, in _exec_ifup _exec_ifup_by_name(iface.name, cgroup)
> > File
> > "/usr/lib/python2.7/site-packages/vdsm/network/configurators/ifcfg.py",
> > line 935, in _exec_ifup_by_name raise
> > ConfigNetworkError(ERR_FAILED_IFUP, out[-1] if out else '')
> > ConfigNetworkError: (29, 'Determining IPv6 information for
> > VLAN100_Network... failed.')
> > 
> > </error>
> > 
> > Best Regards,
> > 
> > Shlomi Ben-David | Software Engineer | Red Hat ISRAEL
> > RHCSA | RHCVA | RHCE
> > IRC: shlomibendavid (on #rhev-integ, #rhev-dev, #rhev-ci)
> > 
> > OPEN SOURCE - 1 4 011 && 011 4 1
> 
> 


[Attachment #5 (text/html)]

<div dir="ltr"><div>Dominik,<br><br></div>Here is the exception which caused \
&quot;Not able to update response&quot;<br><br><pre>2017-02-09 06:30:41,828-05 ERROR \
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] \
(org.ovirt.thread.pool-7-thread-13) [1abe9b01] EVENT_ID: \
VDS_BROKER_COMMAND_FAILURE(10,802), Correlation ID: null, Call Stack: null, Custom \
Event ID: -1, Message: VDSM lago-basic-suite-master-host1 command \
HostSetupNetworksVDS failed: 880 &gt; 879 2017-02-09 06:30:41,829-05 ERROR \
[org.ovirt.engine.core.vdsbroker.vdsbroker.HostSetupNetworksVDSCommand] \
(org.ovirt.thread.pool-7-thread-13) [1abe9b01] Command \
&#39;HostSetupNetworksVDSCommand(HostName = lago-basic-suite-master-host1, \
HostSetupNetworksVdsCommandParameters:{runAsync=&#39;true&#39;, \
hostId=&#39;42fa76b6-7bf9-463e-80fa-fcca9b0d3e81&#39;, \
vds=&#39;Host[lago-basic-suite-master-host1,42fa76b6-7bf9-463e-80fa-fcca9b0d3e81]&#39;, \
rollbackOnFailure=&#39;true&#39;, connectivityTimeout=&#39;120&#39;, \
networks=&#39;[HostNetwork:{defaultRoute=&#39;false&#39;, bonding=&#39;false&#39;, \
networkName=&#39;Labeled_Network&#39;, nicName=&#39;eth0&#39;, vlan=&#39;600&#39;, \
mtu=&#39;0&#39;, vmNetwork=&#39;false&#39;, stp=&#39;false&#39;, \
properties=&#39;null&#39;, ipv4BootProtocol=&#39;NONE&#39;, \
ipv4Address=&#39;null&#39;, ipv4Netmask=&#39;null&#39;, ipv4Gateway=&#39;null&#39;, \
ipv6BootProtocol=&#39;NONE&#39;, ipv6Address=&#39;null&#39;, \
ipv6Prefix=&#39;null&#39;, ipv6Gateway=&#39;null&#39;}]&#39;, \
removedNetworks=&#39;[]&#39;, bonds=&#39;[]&#39;, removedBonds=&#39;[]&#39;, \
clusterSwitchType=&#39;LEGACY&#39;})&#39; execution failed: VDSGenericException: \
VDSNetworkException: 880 &gt; 879 2017-02-09 06:30:41,829-05 DEBUG \
[org.ovirt.engine.core.vdsbroker.vdsbroker.HostSetupNetworksVDSCommand] \
(org.ovirt.thread.pool-7-thread-13) [1abe9b01] Exception: \
org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: VDSGenericException: \
VDSNetworkException: 880 &gt; 879  at \
org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase.proceedProxyReturnValue(BrokerCommandBase.java:188) \
[vdsbroker.jar:]  at \
org.ovirt.engine.core.vdsbroker.vdsbroker.FutureVDSCommand.get(FutureVDSCommand.java:70) \
[vdsbroker.jar:]  at \
org.ovirt.engine.core.bll.network.host.HostSetupNetworksCommand.executeCommand(HostSetupNetworksCommand.java:290) \
[bll.jar:]  at org.ovirt.engine.core.bll.CommandBase.executeWithoutTransaction(CommandBase.java:1251) \
[bll.jar:]  at org.ovirt.engine.core.bll.CommandBase.executeActionInTransactionScope(CommandBase.java:1391) \
[bll.jar:]  at org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:2055) \
[bll.jar:]  at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInSuppressed(TransactionSupport.java:164) \
[utils.jar:]  at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInScope(TransactionSupport.java:103) \
[utils.jar:]  at org.ovirt.engine.core.bll.CommandBase.execute(CommandBase.java:1451) \
[bll.jar:]  at org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:397) \
[bll.jar:]<br><br></pre><pre>This could be connected with the specific version of the \
jackson that we use.<br></pre></div><div class="gmail_extra"><br><div \
class="gmail_quote">On Tue, Feb 21, 2017 at 4:42 PM, Dominik Holler <span \
dir="ltr">&lt;<a href="mailto:dholler@redhat.com" \
target="_blank">dholler@redhat.com</a>&gt;</span> wrote:<br><blockquote \
class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc \
solid;padding-left:1ex">A deep analysis of the logfiles gives details about the<br> \
unexpected behavior, but I regret to not provide the fault causing the<br> unexpected \
behavior.<br> <br>
To get this fault, the help of someone familiar with<br>
org.ovirt.vdsm.jsonrpc.client.<wbr>JsonRpcClient is needed.<br>
<br>
In the failing test &quot;assign_labeled_network&quot; a (labeld) network is<br>
assigned to the cluster. For this reason the network has to be added to<br>
the hosts. After that, the test &quot;assign_labeled_network&quot; checks, if the<br>
engine acknowledges that hosts are in the labeld network. This<br>
execution of the test failed, because this acknowledge of the engine<br>
is missing after 180 seconds [3].<br>
<br>
There are two hosts lago-basic-suite-master-host0 and<br>
lago-basic-suite-master-host1 in the scenario.<br>
lago-basic-suite-master-host1 fails and<br>
lago-basic-suite-master-host0 succeeds, so only<br>
lago-basic-suite-master-host1 is analyzed below.<br>
<br>
Please find here the most relevant steps causing this error:<br>
1. The engine sends Host.setupNetworks to the hosts in<br>
     line <a href="tel:40279%20-%2040295" value="+14027940295">40279 - 40295</a> in \
                [1] with<br>
     &quot;id&quot;:&quot;02298344-165f-47e4-9ea4-<wbr>7c17a55d37f8&quot;.<br>
2. The host executes the Host.setupNetworks RPC call successfully in<br>
     line 1286 in [2].<br>
3. The engine receives the acknowledgment of the successful execution<br>
     in line 40716 and 40717 in [1].<br>
4. The error occurs in line 40718:<br>
     &#39;[org.ovirt.vdsm.jsonrpc.<wbr>client.JsonRpcClient] (ResponseWorker) []<br>
     Not able to update response for<br>
     &quot;02298344-165f-47e4-9ea4-<wbr>7c17a55d37f8&quot;&#39;. This means the \
engine can<br>  not process the acknowledgment of the successful execution.<br>
5. The command HostSetupNetworksVDS is aborted.<br>
     So Host.getCapabilities is skipped and the engine database is not<br>
     updated with the new network configuration of the host.<br>
6. Since the test script relays on the information from database<br>
     about host network configuration, it does not see that<br>
     Host.setupNetworks is successfully executed and stops with the<br>
     error &quot;False != True after 180 seconds&quot; [3]<br>
<br>
So the fault happens before or in step 4 and is around the jsonrpc<br>
communication.<br>
<br>
It is an open action item to precise the location of the fault.<br>
<br>
<br>
<br>
[1]<br>
   <a href="http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/5217/arti \
fact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-005_ne \
twork_by_label.py/lago-basic-suite-master-engine/_var_log/ovirt-engine/engine.log" \
rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/<wbr>test-repo_ovirt_exp \
erimental_<wbr>master/5217/artifact/exported-<wbr>artifacts/basic-suit-master-<wbr>el7 \
/test_logs/basic-suite-<wbr>master/post-005_network_by_<wbr>label.py/lago-basic-suite-<wbr>master-engine/_var_log/ovirt-<wbr>engine/engine.log</a><br>
 <br>
[2]<br>
   <a href="http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/5217/arti \
fact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-005_network_by_label.py/lago-basic-suite-master-host1/_var_log/vdsm/vdsm.log" \
rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/<wbr>test-repo_ovirt_exp \
erimental_<wbr>master/5217/artifact/exported-<wbr>artifacts/basic-suit-master-<wbr>el7 \
/test_logs/basic-suite-<wbr>master/post-005_network_by_<wbr>label.py/lago-basic-suite-<wbr>master-host1/_var_log/vdsm/<wbr>vdsm.log</a><br>
 <br>
[3]<br>
   <a href="http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/5217/testReport/junit/(root)/005_network_by_label/assign_labeled_network/" \
rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/<wbr>test-repo_ovirt_exp \
erimental_<wbr>master/5217/testReport/junit/(<wbr>root)/005_network_by_label/<wbr>assign_labeled_network/</a><br>
 <br>
<br>
<br>
On Thu, 9 Feb 2017 14:52:52 +0200<br>
Shlomo Ben David &lt;<a href="mailto:sbendavi@redhat.com">sbendavi@redhat.com</a>&gt; \
wrote:<br> <br>
&gt; Hi,<br>
&gt;<br>
&gt;<br>
&gt; *Test failed:* [test-repo_ovirt_experimental_<wbr>master]<br>
&gt;<br>
&gt; *Link to suspected patches:* n/a<br>
&gt;<br>
&gt; *Link to Job:*<br>
&gt; <a href="http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/5217" \
rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/<wbr>test-repo_ovirt_experimental_<wbr>master/5217</a><br>
 &gt;<br>
&gt; *Link to all logs:*<br>
&gt; <a href="http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/5217/ar \
tifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-005_network_by_label.py/" \
rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/<wbr>test-repo_ovirt_exp \
erimental_<wbr>master/5217/artifact/exported-<wbr>artifacts/basic-suit-master-<wbr>el7/test_logs/basic-suite-<wbr>master/post-005_network_by_<wbr>label.py/</a><br>
 &gt;<br>
&gt; *Error snippet from the log: *<br>
&gt;<br>
&gt; &lt;error&gt;<br>
&gt;<br>
&gt; ifup/VLAN100_Network::ERROR::<wbr>2017-02-09<br>
&gt; 06:21:15,236::concurrent::189:<wbr>:root::(run) FINISH thread<br>
&gt; &lt;Thread(ifup/VLAN100_Network, started daemon 140189099861760)&gt; failed<br>
&gt; Traceback (most recent call last):<br>
&gt;     File &quot;/usr/lib/python2.7/site-<wbr>packages/vdsm/concurrent.py&quot;, \
line<br> &gt; 185, in run ret = func(*args, **kwargs)<br>
&gt;     File<br>
&gt; &quot;/usr/lib/python2.7/site-<wbr>packages/vdsm/network/<wbr>configurators/ifcfg.py&quot;,<br>
 &gt; line 949, in _exec_ifup _exec_ifup_by_name(<a href="http://iface.name" \
rel="noreferrer" target="_blank">iface.name</a>, cgroup)<br> &gt;     File<br>
&gt; &quot;/usr/lib/python2.7/site-<wbr>packages/vdsm/network/<wbr>configurators/ifcfg.py&quot;,<br>
 &gt; line 935, in _exec_ifup_by_name raise<br>
&gt; ConfigNetworkError(ERR_FAILED_<wbr>IFUP, out[-1] if out else &#39;&#39;)<br>
&gt; ConfigNetworkError: (29, &#39;Determining IPv6 information for<br>
&gt; VLAN100_Network... failed.&#39;)<br>
&gt;<br>
&gt; &lt;/error&gt;<br>
&gt;<br>
&gt; Best Regards,<br>
&gt;<br>
&gt; Shlomi Ben-David | Software Engineer | Red Hat ISRAEL<br>
&gt; RHCSA | RHCVA | RHCE<br>
&gt; IRC: shlomibendavid (on #rhev-integ, #rhev-dev, #rhev-ci)<br>
&gt;<br>
&gt; OPEN SOURCE - 1 4 011 &amp;&amp; 011 4 1<br>
<br>
</blockquote></div><br></div>



_______________________________________________
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel

[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic