
List:       distcc
Subject:    Re: [distcc] small redesign...
From:       Fergus Henderson <fergus () google ! com>
Date:       2014-11-02 10:24:34
Message-ID: CAPXkjd9jpPTtG+xfoAF_5qUWCGYAnan8YpSTGWwXqD4eZHuhCA () mail ! gmail ! com



On 1 Nov 2014 15:08, "Łukasz Tasz" <lukasz@tasz.eu> wrote:
>
> Sure, I just made a quick fix to test my test case, and immediately shared it with you.

Sure, understood -- that's great, thanks.

> I will try to send a more polished fix :)
> Regards
> lt
>
> On 1 Nov 2014 09:06, "Fergus Henderson" <fergus@google.com> wrote:
>
>> Well, perhaps it would be a good idea to add a distccd flag or environment
>> variable to control the queue length rather than hard-coding 10 or 256?
>>
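For illustration, something along these lines could do it. This is only a rough sketch, not an actual distcc patch; the DISTCC_LISTEN_BACKLOG variable name and the dcc_listen_backlog() helper are invented here:

#include <stdlib.h>

/* Return the listen() backlog, taken from a (hypothetical)
 * DISTCC_LISTEN_BACKLOG environment variable, falling back to the
 * old hard-coded value of 10 from src/srvnet.c. */
static int dcc_listen_backlog(void)
{
    const char *s = getenv("DISTCC_LISTEN_BACKLOG");
    if (s && *s) {
        int n = atoi(s);
        if (n > 0)
            return n;
    }
    return 10;
}

/* In srvnet.c the call would then become:
 *     if (listen(fd, dcc_listen_backlog())) { ... }
 */

That way the queue length could be tuned per installation without rebuilding, instead of picking between 10 and 256 at compile time.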
>> On 31 Oct 2014 11:37, "Łukasz Tasz" <lukasz@tasz.eu> wrote:
>>>
>>> Hi Guys,
>>>
>>> I'm very, very happy: the reason for my failures has been identified.
>>> The issue is in:
>>> --- src/srvnet.c        (revision 177)
>>> +++ src/srvnet.c        (working copy)
>>> @@ -99,7 +99,7 @@
>>>      rs_log_info("listening on %s", sa_buf ? sa_buf : "UNKNOWN");
>>>      free(sa_buf);
>>>
>>> -    if (listen(fd, 10)) {
>>> +    if (listen(fd, 256)) {
>>>          rs_log_error("listen failed: %s", strerror(errno));
>>>          close(fd);
>>>          return EXIT_BIND_FAILED;
>>> Index: src/io.c
>>>
>>> The queue for new connections was limited to 10; that's why, when the
>>> cluster is overloaded, many connections are reset.
>>> The aim is to wait even 5 minutes for cluster availability before compiling locally.
>>>
>>> @Jarek, thanks for support!
>>>
>>> let's discuss if we should fix it or not.
>>>
>>> regards
>>> Lukasz
>>>
>>>
>>> Łukasz Tasz
>>>
>>>
>>> 2014-10-24 10:27 GMT+02:00 Łukasz Tasz <lukasz@tasz.eu>:
>>> > Hi Martin
>>> >
>>> > What I have noticed:
>>> > The client tries to connect to distccd 3 times, with 500 ms delays in between.
>>> > The Linux kernel by default accepts 128 connections.
>>> > If the client creates a connection, even if no executors are available,
>>> > the connection is accepted and queued by the kernel running distccd.
>>> > This leads to a situation where the client thinks that distccd is reserved,
>>> > but in fact the connection is still waiting to be accepted by the distccd server.
>>> > I suspect that the client then starts communicating too fast, distcc won't
>>> > receive the DIST token, both sides wait, communication is broken, and
>>> > then timeouts are applied: for the client the default is applied, for the
>>> > server there are no defaults.
>>> >
>>> > The failure scenario is:
>>> > one distccd and two distcc users; both of them try to compile
>>> > with DISTCC_HOSTS=distccd/1,cpp,lzo, both users have a lot of big
>>> > objects, and the cluster is overloaded by a factor of 2.
>>> > It should still be OK for a third and a fourth user to join the cluster.
>>> >
>>> > An easy reproducer is to set up one distccd and set DISTCC_HOSTS=distccd/20;
>>> > this is a broken configuration, but it simulates an overload of 20 - 20
>>> > developers using the cluster at the same time.
>>> > Please remember that these are exceptional situations, but a developer
>>> > can start a compilation with -j 1000 from his laptop, and if the cluster
>>> > times out, then receiving 1000 jobs on the laptop will end with the
>>> > out-of-memory killer :D
>>> > Those are exceptional situations, and somehow the cluster should handle that.
>>> >
>>> > In the attachment, next to some pump changes, you can find a change
>>> > which moves making the connection to the very beginning: when distcc is
>>> > picking a host, the remote connection is also made. If this fails, distcc
>>> > follows the default behaviour, sleeps for one second, and picks a host
>>> > again. But this requires an additional administrative change on the distccd
>>> > machine:
>>> > iptables -I INPUT -p tcp --dport 3632 -m connlimit --connlimit-above
>>> > <NUMBER OF DISTCCD> --connlimit-mask 0 -j REJECT --reject-with
>>> > tcp-reset
>>> > which accepts only a number of connections equal to the number of
>>> > executors.
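A rough sketch of that client-side loop (this is not the attached patch; the helper names and structure are only illustrative): try to open the remote connection while picking a host, and if the server side rejects it, for example with the TCP reset from the connlimit rule above, back off for a second and pick a host again.

#include <netdb.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>

/* Try one host; returns a connected socket, or -1 if the connection
 * was refused/reset (treat the host as busy). */
static int try_host(const char *host, const char *port)
{
    struct addrinfo hints, *res;
    int fd = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;
    fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    if (fd >= 0 && connect(fd, res->ai_addr, res->ai_addrlen) != 0) {
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}

/* Cycle through the host list until some executor accepts us,
 * backing off one second between attempts, as described above. */
static int acquire_executor(const char **hosts, int nhosts)
{
    int i = 0;
    for (;;) {
        int fd = try_host(hosts[i], "3632");   /* default distccd port */
        if (fd >= 0)
            return fd;
        sleep(1);
        i = (i + 1) % nhosts;
    }
}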
>>> >
>>> > So far so good!
>>> > One remark: the patch is done on top of arankine_distcc_issue16-r335, since
>>> > his pump changes make pump mode work in my environment.
>>> > But I also tested the distccd allocation on the latest official distcc
>>> > release.
>>> >
>>> > let me know what you think!
>>> >
>>> > with best regards
>>> > Lukasz
>>> >
>>> >
>>> >
>>> > Łukasz Tasz
>>> >
>>> >
>>> > 2014-10-24 2:42 GMT+02:00 Martin Pool <mbp@sourcefrog.net>:
>>> >> It seems like if there's nowhere to execute the job, we want the client
>>> >> program to just pause, before using too many resources, until it gets
>>> >> unqueued by a server ready to do the job. (Or, by a local slot being
>>> >> available.)
>>> >>
>>> >>
>>> >> On Thu Oct 16 2014 at 2:43:35 AM Łukasz Tasz <lukasz@tasz.eu> wrote:
>>> >>>
>>> >>> Hi Martin,
>>> >>>
>>> >>> Let's assume that you can trigger more compilation tasks than you
>>> >>> have executors.
>>> >>> In this scenario you are facing a situation where the cluster is saturated.
>>> >>> When such a compilation is triggered by two developers, or two CI
>>> >>> (e.g. Jenkins) jobs, then the cluster is saturated twice over...
>>> >>>
>>> >>> The default behaviour is to lock a local slot and try to connect three
>>> >>> times; if that fails, fall back, and if fallback is disabled CI gets a
>>> >>> failed build (fallback is not an option, since the local machine cannot
>>> >>> handle -j $(distcc -j)).
>>> >>>
>>> >>> Consider a scenario: I have 1000 objects and 500 executors,
>>> >>> - a clean build on one machine takes
>>> >>>   1000 * 20 sec (one obj) = 20000 sec / 16 processors = 1250 sec,
>>> >>> - on the cluster: (1000/500) * 20 sec = 40 sec
>>> >>>
>>> >>> Saturating the cluster was impossible without pump mode, but now, with
>>> >>> pump mode, after the "warm up" effect pump can dispatch many tasks, and I
>>> >>> have faced the situation that a saturated cluster destroys almost every
>>> >>> compilation.
>>> >>>
>>> >>> My expectation is that the cluster won't reject my connection, or that
>>> >>> the rejection will be handled, either by the client or by the server.
>>> >>>
>>> >>> By the server (sketched below this list):
>>> >>> - accept every connection,
>>> >>> - fork a child if the connection is not accepted by an existing child,
>>> >>> - in the case of pump mode, prepare the local dir structure and receive headers,
>>> >>> - --critical section starts here-- a multi-value semaphore with value
>>> >>> maxchild:
>>> >>>   - execute the job,
>>> >>> - release the semaphore.
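A compile-only sketch of that critical section (not distccd's real code; only the maxchild gating is shown): a process-shared POSIX semaphore initialised to maxchild, taken by each forked child before it runs the job and released afterwards, so every connection is accepted and forked but at most maxchild jobs run at once.

#include <semaphore.h>
#include <sys/mman.h>

static sem_t *job_slots;

/* Called once in the master, before it starts accepting connections:
 * put the semaphore in shared memory so forked children see it. */
static int init_job_slots(unsigned maxchild)
{
    job_slots = mmap(NULL, sizeof *job_slots, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (job_slots == MAP_FAILED)
        return -1;
    return sem_init(job_slots, 1 /* shared between processes */, maxchild);
}

/* Called in each forked child around the actual compilation. */
static void run_job(int client_fd)
{
    (void)client_fd;            /* the real code would read the request from it */
    sem_wait(job_slots);        /* block here while the cluster is saturated */
    /* ... receive the request, run the compiler, send back the result ... */
    sem_post(job_slots);        /* free the slot for the next queued child */
}

Since the pump-mode preparation in the list above happens before the critical section, the directory setup and header transfer would stay parallel; only the compiler invocations are throttled.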
>>> >>>
>>> >>>
>>> >>> Also, what you suggested may be an even better solution, since the client
>>> >>> will pick the first available executor instead of entering a queue, so
>>> >>> distcc could make the connection already in the function dcc_lock_one().
>>> >>>
>>> >>> I already tried to set DISTCC_DIR on a common NFS share, but when you are
>>> >>> triggering so many jobs this started to become a bottleneck... I won't even
>>> >>> talk about locking on NFS, or the scenario where somebody takes a lock on
>>> >>> NFS and the machine then crashes - that will not work by design :)
>>> >>>
>>> >>> I know that this scenario does not happen very often, and it has a more or
>>> >>> less peaky characteristic, but we should be happy that the distcc cluster
>>> >>> is saturated, and this case should be handled.
>>> >>>
>>> >>> hope it's more clear now!
>>> >>> br
>>> >>> LT
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>> Łukasz Tasz
>>> >>>
>>> >>>
>>> >>> 2014-10-16 1:39 GMT+02:00 Martin Pool <mbp@sourcefrog.net>:
>>> >>> > Can you try to explain more clearly what difference in queueing behavior
>>> >>> > you expect from this change?
>>> >>> >
>>> >>> > I think probably the main change that's needed is for the client to ask
>>> >>> > all masters if they have space, to avoid needing to effectively poll by
>>> >>> > retrying, or getting stuck waiting for a particular server.
>>> >>> >
>>> >>> > On Wed, Oct 15, 2014 at 12:53 PM, Łukasz Tasz <lukasz@tasz.eu> wrote:
>>> >>> >>
>>> >>> >> Hi Guys,
>>> >>> >>
>>> >>> >> please correct me if I'm wrong,
>>> >>> >> - currently distcc tries to connect to the server 3 times, with a small
>>> >>> >> delay in between,
>>> >>> >> - the server forks x children and all of them try to accept incoming
>>> >>> >> connections.
>>> >>> >> If the server runs out of children (all of them are busy), the client will
>>> >>> >> fall back, and within the next 60 sec will not try this machine.
>>> >>> >>
>>> >>> >> What do you think about redesigning distcc in such a way that the master
>>> >>> >> server always accepts an incoming connection and forks a child, but at the
>>> >>> >> same time only x of them are able to enter the compilation task
>>> >>> >> (dcc_spawn_child)? (Maybe preforking could still be used?)
>>> >>> >>
>>> >>> >> This may create a kind of queue; the client can always decide on its own
>>> >>> >> whether it can wait some time (the maximum being DISTCC_IO_TIMEOUT), but it
>>> >>> >> is still faster to wait, since on the cluster side it's probably just a
>>> >>> >> peak of saturation, than to fall back to the local machine.
>>> >>> >>
>>> >>> >> Currently I'm facing the situation that many jobs are falling back, and
>>> >>> >> the local machine is being killed by make's -j calculated for distccd...
>>> >>> >>
>>> >>> >> Another trick may be to pick a different machine if the current one is
>>> >>> >> busy, but this may be much more complex in my opinion.
>>> >>> >>
>>> >>> >> what do you think?
>>> >>> >> regards
>>> >>> >> Łukasz Tasz
>>> >>> >> __
>>> >>> >> distcc mailing list            http://distcc.samba.org/
>>> >>> >> To unsubscribe or change options:
>>> >>> >> https://lists.samba.org/mailman/listinfo/distcc
>>> >>> >
>>> >>> >
>>> >>> >
>>> >>> >
>>> >>> > --
>>> >>> > Martin
>>> __
>>> distcc mailing list            http://distcc.samba.org/
>>> To unsubscribe or change options:
>>> https://lists.samba.org/mailman/listinfo/distcc

__
distcc mailing list            http://distcc.samba.org/
To unsubscribe or change options:
https://lists.samba.org/mailman/listinfo/distcc
