
List:       hadoop-commits
Subject:    [Hadoop Wiki] Update of "Hive/JoinOptimization" by LiyinTang
From:       Apache Wiki <wikidiffs@apache.org>
Date:       2010-11-30 22:04:01
Message-ID: 20101130220401.63388.1075@eosnew.apache.org

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Hive/JoinOptimization" page has been changed by LiyinTang.
http://wiki.apache.org/hadoop/Hive/JoinOptimization?action=diff&rev1=4&rev2=5

--------------------------------------------------

  == 1.1 Using Distributed Cache to Propagate Hashtable File ==
  Previously, when two large tables need to be joined, Hive launches two different Mappers to sort these tables on the join key and emit intermediate files, and the Reducer then takes the intermediate files as input and does the real join work. This mechanism works well when both tables are large, but if one of the join tables is small enough to fit into the Mapper's memory, there is no need to launch a Reducer at all. The Reduce stage is in fact very expensive, because the Map/Reduce framework has to sort and merge the intermediate files.
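  To make the mechanics concrete, here is a minimal sketch of such a reduce-side (common) join in plain Hadoop MapReduce. It is illustrative only: the CSV layout, the hard-coded table tag, and the class names are assumptions, and Hive's actual join operators are far more general.
{{{
// Minimal sketch of a reduce-side (common) join. Each record is tagged
// with its source table, the framework sorts and shuffles by join key,
// and the reducer buffers one side and cross-products it with the other.
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class CommonJoin {
  // Assumes CSV rows whose first column is the join key. A real job would
  // run one such mapper per table and derive the tag from the input path.
  public static class TaggingMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable offset, Text row, Context ctx)
        throws IOException, InterruptedException {
      String[] cols = row.toString().split(",", 2);
      ctx.write(new Text(cols[0]), new Text("A\t" + cols[1])); // tag side "A"
    }
  }

  public static class JoinReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> vals, Context ctx)
        throws IOException, InterruptedException {
      List<String> left = new ArrayList<String>();
      List<String> right = new ArrayList<String>();
      for (Text v : vals) {                       // separate rows by tag
        String s = v.toString();
        (s.startsWith("A") ? left : right).add(s.substring(2));
      }
      for (String l : left)                       // emit the cross product
        for (String r : right)
          ctx.write(key, new Text(l + "\t" + r));
    }
  }
}
}}}
  The sort and merge that feed JoinReducer are exactly the cost the rest of this section tries to avoid.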
- {{attachment:fig1.jpg||height="764px",width="911px"}}
+ {{attachment:fig1.jpg||height="763px",width="909px"}}
  
  '''Fig 1. The Previous Map Join'''
  
  So the basic idea of map join is to hold the data of the small table in the Mapper's memory and do the join work in the Map stage, which saves the Reduce stage entirely. As shown in Fig 1, the previous map join implementation does not scale to large data, because each Mapper reads the small table's data directly from HDFS. If the big table is large enough, thousands of Mappers are launched to read different records of it, and all of those Mappers read the same small table from HDFS into memory. This makes the small table a performance bottleneck, and sometimes Mappers hit many time-outs while reading the small file, which can cause the task to fail.
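  A minimal sketch of that earlier scheme, where every Mapper opens the small table on HDFS by itself, so thousands of tasks hit the same file at once (the warehouse path and the CSV record format here are hypothetical):
{{{
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Sketch of the pre-HIVE-1641 map join: every Mapper re-reads the small
// table from HDFS in setup(), which is the bottleneck described above.
public class NaiveMapJoinMapper extends Mapper<LongWritable, Text, Text, Text> {
  private final Map<String, String> small = new HashMap<String, String>();

  @Override
  protected void setup(Context ctx) throws IOException {
    // Hypothetical location of the small table; each Mapper opens it itself.
    Path p = new Path("/user/hive/warehouse/small_table/part-00000");
    FileSystem fs = FileSystem.get(ctx.getConfiguration());
    BufferedReader in = new BufferedReader(new InputStreamReader(fs.open(p)));
    String line;
    while ((line = in.readLine()) != null) {
      String[] kv = line.split(",", 2);
      small.put(kv[0], kv[1]);
    }
    in.close();
  }

  @Override
  protected void map(LongWritable offset, Text row, Context ctx)
      throws IOException, InterruptedException {
    String[] cols = row.toString().split(",", 2);
    String match = small.get(cols[0]);        // probe the in-memory table
    if (match != null)
      ctx.write(new Text(cols[0]), new Text(cols[1] + "\t" + match));
  }
}
}}}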
- {{attachment:fig2.jpg}}
+ {{attachment:fig2.jpg||height="881px",width="1184px"}}
  
  HIVE-1641 ([[http://issues.apache.org/jira/browse/HIVE-1641|HIVE-1641]]) solves this problem. As shown in Fig 2, the basic idea is to create a new task, the MapReduce Local Task, before the original Join Map/Reduce Task. This new task reads the small table's data from HDFS into an in-memory hashtable, then serializes the in-memory hashtable into files on disk and compresses the hashtable files into one tar file. In the next stage, when the MapReduce task is launched, the tar file is put into the Hadoop Distributed Cache, which copies it onto each Mapper's local disk and decompresses it there. All the Mappers can then deserialize the hashtable file back into memory and do the join work as before.
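  The following sketch shows the shape of this flow under stated assumptions: the archive path, the file name hashtable.ser, and the helper names are invented for illustration, and only the two DistributedCache calls are real Hadoop API.
{{{
import java.io.FileInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.net.URI;
import java.util.HashMap;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;

// Sketch of the HIVE-1641 flow. The local task is assumed to have already
// serialized the in-memory hashtable and packed it into an archive on HDFS.
public class MapJoinCacheSketch {
  // Driver side: register the archive before submitting the join job.
  // DistributedCache copies it to every Mapper's local disk and unpacks it.
  static void registerHashtable(Configuration conf) throws IOException {
    DistributedCache.addCacheArchive(
        URI.create("hdfs:///tmp/hive/smalltable-hashtable.tar"), conf);
  }

  // Mapper side (e.g. in setup()): locate the unpacked archive on the
  // local disk and deserialize the hashtable back into memory.
  @SuppressWarnings("unchecked")
  static HashMap<String, String> loadHashtable(Configuration conf)
      throws IOException, ClassNotFoundException {
    Path[] archives = DistributedCache.getLocalCacheArchives(conf);
    // Assume the first archive is ours and contains hashtable.ser.
    ObjectInputStream in = new ObjectInputStream(
        new FileInputStream(archives[0].toString() + "/hashtable.ser"));
    HashMap<String, String> table = (HashMap<String, String>) in.readObject();
    in.close();
    return table;
  }
}
}}}
  The design point is that HDFS serves the small table only once, to the local task, while the Distributed Cache fans it out to all the Mappers.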
  == 1.2 Removing JDBM ==
- When profiing the Map Join,
+ Previously, Hive used JDBM ([[http://jdbm.sourceforge.net/|JDBM]]) as a persistent hashtable: whenever the in-memory hashtable could not hold any more data, it swapped key/value pairs into the JDBM table. However, when profiling the Map Join, we found that this JDBM component takes more than 70% of the CPU time, as shown in Fig 3. Also, the persistent file JDBM generates is too large to put into the Distributed Cache; for example, putting 67,000 simple integer key/value pairs into JDBM generates a hashtable file of more than 22 MB. So JDBM is too heavyweight for Map Join, and it is better to remove this component from Hive. Map Join is designed to hold the small table's data in memory; if the table is too large to hold, the query should simply run as a Common Join, so there is no need for a persistent hashtable at all.
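  With JDBM gone, the in-memory table only needs a simple guard: if the small table overflows some limit, abandon the map join and re-plan the query as a Common Join. A minimal sketch of that policy follows; the row limit and the exception used to signal fallback are illustrative, not Hive's actual ones.
{{{
import java.util.HashMap;
import java.util.Map;

// Sketch of the post-JDBM policy: a plain HashMap plus a hard limit.
// There is no spill-to-disk path any more; overflowing the limit means
// the caller falls back to a Common Join.
public class BoundedJoinTable {
  private static final int MAX_ROWS = 25000000;   // illustrative limit
  private final Map<String, String> rows = new HashMap<String, String>();

  void put(String key, String value) {
    if (rows.size() >= MAX_ROWS) {
      // Signal the caller to abandon the map join and re-plan the query
      // as a Common Join instead of spilling to a persistent hashtable.
      throw new IllegalStateException("small table too large for map join");
    }
    rows.put(key, value);
  }

  String get(String key) {
    return rows.get(key);
  }
}
}}}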
  == 1.3 Performance Evaluation ==
  = 2. Converting Join into Map Join dynamically =

