
List:       hadoop-user
Subject:    Re: Equally split a RDD partition into two partition at the same node
From:       Fei Hu <hufei68 () gmail ! com>
Date:       2017-01-15 16:36:09
Message-ID: CANaLfm8zQRLm_REiHg95ViTAjuYCg4F3ZnWCGOYTnin8t292qQ () mail ! gmail ! com

Hi Rishi,

Thanks for your reply! The RDD has 24 partitions, and the cluster has one
master node plus 24 compute nodes (12 cores per node). Each node holds one
partition, and I want to split each partition into two sub-partitions on the
same node to increase parallelism while keeping high data locality.
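
To make the intent concrete, here is a rough sketch of the bookkeeping in
plain Python. It is a model only, not Spark code: `Part`, its `host` field,
and `split_in_two` are hypothetical names made up for illustration. In Spark
itself, one plausible route would be a custom RDD subclass that overrides
`getPartitions`, `compute`, and `getPreferredLocations`, with child
partitions 2i and 2i+1 both pointing back to parent partition i.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Part:
    """Hypothetical stand-in for one RDD partition."""
    index: int        # partition index
    host: str         # node holding the data (assumed known)
    rows: List[int]   # elements of the partition


def split_in_two(parents: Sequence[Part]) -> List[Part]:
    """Split every parent into two halves that stay on the parent's host."""
    children: List[Part] = []
    for p in parents:
        mid = len(p.rows) // 2  # odd sizes put the extra element in the second half
        # Children 2i and 2i+1 inherit the host of parent i.
        children.append(Part(2 * p.index, p.host, p.rows[:mid]))
        children.append(Part(2 * p.index + 1, p.host, p.rows[mid:]))
    return children


parents = [Part(0, "node-0", [1, 2, 3, 4]), Part(1, "node-1", [5, 6, 7])]
children = split_in_two(parents)
for c in children:
    print(c.index, c.host, c.rows)
```

The parent of child `k` is recovered as `k // 2`, which is the mapping a
narrow one-to-two dependency would need. In a real RDD, each half would have
to read its parent partition, so caching the parent first would likely
matter.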

Thanks,
Fei


On Sun, Jan 15, 2017 at 2:33 AM, Rishi Yadav <rishi@infoobjects.com> wrote:

> Can you provide some more details:
> 1. How many partitions does the RDD have?
> 2. How big is the cluster?
> On Sat, Jan 14, 2017 at 3:59 PM Fei Hu <hufei68@gmail.com> wrote:
>
>> Dear all,
>>
>> I want to divide an RDD partition equally into two partitions. That is,
>> the first half of the elements in the partition will form one new
>> partition, and the second half will form another. The two new partitions
>> must reside on the same node as their parent partition, which helps keep
>> data locality high.
>>
>> Does anyone know how to implement this, or have any hints?
>>
>> Thanks in advance,
>> Fei
>>
>>



