'Re: mapreduce application which only has 17000+ map tasks runs very slow on yarn after 16000+ map co'


List:       hadoop-user
Subject:    Re: mapreduce application which only has 17000+ map tasks runs very slow on yarn after 16000+ map co
From:       Alexey Eremikhin <a.eremihin () corp ! badoo ! com ! INVALID>
Date:       2018-09-13 8:30:27
Message-ID: eb5a9f41-2496-3745-2190-be12d4395a7b () corp ! badoo ! com

We experienced a similar issue with Fair Scheduler dynamic allocation.

In our case most of the resources were allocated to the application's
reducers, and the mappers did not have enough resources to start.

That was clearly visible on the MR ApplicationMaster page.

The solution was to set
mapreduce.job.reduce.slowstart.completedmaps to 1. Yes, it may slightly
delay short queries, but for large queries it is essential in order to
leave enough resources for the mappers.
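
To put rough numbers on that setting, here is a minimal sketch. It assumes
reducers are held back until ceil(slowstart * total maps) map tasks have
completed; that formula is an assumption about the ApplicationMaster's
behavior, and the helper name is made up for illustration. The value 0.05 is
the stock default, and 17000 is roughly the job size from this thread.

```python
import math


def maps_before_reducers(total_maps: int, slowstart: float) -> int:
    """Hypothetical helper: completed maps required before reducers may be scheduled.

    Assumes the threshold is ceil(slowstart * total_maps).
    """
    if not 0.0 <= slowstart <= 1.0:
        raise ValueError("slowstart must be between 0.0 and 1.0")
    return math.ceil(slowstart * total_maps)


total_maps = 17000  # roughly the job size mentioned in this thread
for fraction in (0.05, 0.5, 1.0):  # 0.05 is the stock default
    print(fraction, maps_before_reducers(total_maps, fraction))
```

With the value set to 1, no reducer is requested until every map has
finished, so reducers cannot hold containers that mappers are waiting for.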



On 13.09.2018 03:51, esri_lxc@sina.com wrote:
> Hi everyone!
>
>         I'm running a simple SQL query (select xx,xx... from viewXXX where
> xxxxx) using Hive 0.13.1 on Hadoop 2.6.0 (the framework is MRv2, not
> Tez). After submitting it, I find that it is an MR job with
> 17000+ map tasks and no reduce tasks.
>
>         The job runs very quickly for the first 15 minutes (all 400+
> containers on my cluster of 20+ nodes are allocated to run tasks during
> this period), but becomes very slow after that (no other jobs are
> running on my cluster).
>
>         I ran it a couple of times and found that the number of
> containers allocated to the job decreases (roughly, not strictly) as
> time goes on, and after about 15 minutes the number of containers
> allocated to the job drops to 1 (the ApplicationMaster's
> container)! From then on the AM is always waiting for the RM to give it
> a container to run a map task. The RM is not busy (not much GC) and has
> a lot of containers available (I see that in the RM log), but it assigns
> the AM 1 container per MINUTE. So the job ends up taking 7 hours to
> finish. :(
>
>        Parts of my AM and RM logs are in the attachment.
>        Any help will be appreciated!


