[prev in list] [next in list] [prev in thread] [next in thread] 

List:       fedora-extras-commits
Subject:    adamwill pushed to openqa (master). "Patch an issue in job grabbing that could cause workers to stal
From:       notifications () fedoraproject ! org
Date:       2017-07-31 23:08:58
Message-ID: 20170731230858.C0F876074A77 () bastion01 ! phx2 ! fedoraproject ! org
[Download RAW message or body]

From 6f43bafe5096d7114749f997fb7d5b4433ec8913 Mon Sep 17 00:00:00 2001
From: Adam Williamson <awilliam@redhat.com>
Date: Mon, 31 Jul 2017 16:08:45 -0700
Subject: Patch an issue in job grabbing that could cause workers to stall

---
 1410.patch  | 40 ++++++++++++++++++++++++++++++++++++++++
 openqa.spec |  9 ++++++++-
 2 files changed, 48 insertions(+), 1 deletion(-)
 create mode 100644 1410.patch

diff --git a/1410.patch b/1410.patch
new file mode 100644
index 0000000..aedc2da
--- /dev/null
+++ b/1410.patch
@@ -0,0 +1,40 @@
+From e760214d7db8a8c8d3aaf70fae76bf743f72ef61 Mon Sep 17 00:00:00 2001
+From: Adam Williamson <awilliam@redhat.com>
+Date: Mon, 31 Jul 2017 12:32:25 -0700
+Subject: [PATCH] check_job: check if we really got a job hash, not an error
+
+After updating an openQA instance to current git, I noticed that
+many workers stopped picking up jobs. Looking at their logs, I
+see a bunch of errors indicating that first the line modified
+in this commit (the `$job->{id}` part) and then a line in
+`websocket_commands` (where it does `$job->{URL}`) are trying to
+use a string as a hash ref. In most cases the string was some
+sort of DBus error, but in one case the string was *itself* an
+error about trying to use a string as a hash ref (holy recursion,
+batman).
+
+It seems like what's going on is that, occasionally, when
+`check_job` tries to grab a job, it gets back a response that is
+ not a proper job hash ref, but some kind of error string. We
+can also look at the `grab_job` end of this problem and see if
+we can stop it from doing this, but making this check more
+robust should also help (it should cause the `else` clause to
+kick in properly and make the worker ignore the response and
+keep polling for a job on the next tick).
+---
+ lib/OpenQA/Worker/Jobs.pm | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/lib/OpenQA/Worker/Jobs.pm b/lib/OpenQA/Worker/Jobs.pm
+index 06eeee143..3af443bd2 100644
+--- a/lib/OpenQA/Worker/Jobs.pm
++++ b/lib/OpenQA/Worker/Jobs.pm
+@@ -112,7 +112,7 @@ sub check_job {
+             my ($res) = @_;
+             return unless ($res);
+             $job = $res->{job};
+-            if ($job && $job->{id}) {
++            if ($job && ref($job) eq "HASH" && $job->{id}) {
+                 Mojo::IOLoop->next_tick(sub { start_job($host) });
+             }
+             else {
diff --git a/openqa.spec b/openqa.spec
index d283b45..9acc080 100644
--- a/openqa.spec
+++ b/openqa.spec
@@ -40,7 +40,7 @@ systemd-tmpfiles --create %{?*} >/dev/null 2>&1 || :
 
 Name:           openqa
 Version:        %{github_version}
-Release:        52%{?github_date:.%{github_date}git%{shortcommit}}%{?dist}
+Release:        53%{?github_date:.%{github_date}git%{shortcommit}}%{?dist}
 Summary:        OS-level automated testing framework
 License:        GPLv2+
 Url:            http://os-autoinst.github.io/openQA/
@@ -59,6 +59,10 @@ Source2:        FedoraUpdateRestart.pm
 # at least. It also insulates us from general wackiness...
 Patch0:         0001-Duplicate-jobs-on-died-as-well-as-quit.patch
 
+# Avoid workers stalling when job grabbing goes wonky
+# https://github.com/os-autoinst/openQA/pull/1410
+Patch1:         1410.patch
+
 # For the httpd subpackage split in 4.3-7, needed for updates to work right
 Obsoletes:      openqa < 4.3-7
 
@@ -454,6 +458,9 @@ fi
 %{_datadir}/openqa/lib/OpenQA/WebAPI/Plugin/FedoraUpdateRestart.pm
 
 %changelog
+* Mon Jul 31 2017 Adam Williamson <awilliam@redhat.com> - 4.4-53.20170730git4c72a17
+- Patch an issue in job grabbing that could cause workers to stall
+
 * Thu Jul 27 2017 Adam Williamson <awilliam@redhat.com> - 4.4-52.20170730git4c72a17
 - Update to latest git, drop merged patches
 - Merge latest SUSE spec file changes
-- 
cgit v1.1


	https://src.fedoraproject.org/cgit/openqa.git/commit/?h=master&id=6f43bafe5096d7114749f997fb7d5b4433ec8913
 _______________________________________________
scm-commits mailing list -- scm-commits@lists.fedoraproject.org
To unsubscribe send an email to scm-commits-leave@lists.fedoraproject.org


[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic