List: hadoop-user
Subject: Re: WordCount MapReduce error
From: Ravi Prakash <ravihadoop () gmail ! com>
Date: 2017-02-23 21:23:47
Message-ID: CAMs9kVjyCHZuXC5j0cy1spB8aGpykaDVUDKO9-OCuWVfRdtmQw () mail ! gmail ! com
Hi Vasil!
Thanks a lot for replying with your solution. Hopefully someone else will
find it useful. I know that the pi example (among others) is in
hadoop-mapreduce-examples-2.7.3.jar . I'm sorry, I do not know of a
matrix-vector multiplication example bundled with the Apache Hadoop
source, though I'm sure plenty of people on GitHub have tried it.
Glad it worked for you finally! :-)
Regards
Ravi
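
For anyone searching later: the examples jar ships inside the binary
distribution, and invoking it with no program name lists the bundled
examples. A typical run of the pi estimator looks like the sketch below
(the jar path assumes the standard 2.7.3 layout under the distribution
root; adjust it to wherever your jar actually lives):

```shell
# List every example bundled in the jar (pi, wordcount, grep, ...)
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar

# Estimate pi using a quasi-Monte Carlo method:
# 16 map tasks, 1000 samples per map (both numbers are tunable)
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar pi 16 1000
```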
On Thu, Feb 23, 2017 at 5:41 AM, Васил Григоров <vaskogr@abv.bg> wrote:
> Dear Ravi,
>
> Even though I was unable to understand most of what you suggested due to
> my lack of experience in the field, one of your suggestions did guide me
> in the right direction and I was able to solve my error. I decided to
> share it since you mentioned that you're adding this conversation to the
> user mailing list for other people to see in case they run into a similar
> problem.
>
> It turns out that my Windows username consisting of two words, "Vasil
> Grigorov", had messed up the application's paths somewhere because of the
> space between the words. I thought I had fixed it by changing the
> HADOOP_IDENT_STRING variable from the default %USERNAME% to "Vasil
> Grigorov", but that only sidestepped my actual username. Since there is no
> way to change my Windows username, I created another account called
> "Vadoop" and ran the code there. To my surprise, the WordCount code ran
> with no issue, completing both the Map and Reduce tasks to 100% and
> producing the correct output in the output directory. It's a bit annoying
> that I had to go through all this trouble just because Hadoop hasn't been
> modified to escape space characters in usernames, but then again, I don't
> know how hard that would be to do. Anyway, I really appreciate the help
> and I hope this will help someone else in the future.
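
The %20 in the FileNotFoundException quoted further down in this thread is
exactly this problem. Below is a minimal, self-contained Java sketch of
the failure mode; this is not Hadoop's actual code, and the class and
method names are made up for illustration. The map side writes under the
literal directory name containing a space, while the reduce-side fetcher
ends up with a URL-encoded form of the path that does not exist on disk:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class SpacePathDemo {
    // Illustrative only: mimics how a URL-encoding step turns the space in
    // a username like "Vasil Grigorov" into %20, so the fetcher looks for
    // ".../hadoop-Vasil%20Grigorov/..." while the map output actually
    // lives under ".../hadoop-Vasil Grigorov/..." -- hence the
    // FileNotFoundException on file.out.index.
    static String encodedLocalDir(String userName) throws UnsupportedEncodingException {
        // URLEncoder encodes a space as '+'; path encoding uses %20
        String encoded = URLEncoder.encode(userName, "UTF-8").replace("+", "%20");
        return "D:/tmp/hadoop-" + encoded + "/mapred/local";
    }

    public static void main(String[] args) throws Exception {
        // A username with a space yields a path that no longer matches
        // the directory the map task created:
        System.out.println(encodedLocalDir("Vasil Grigorov"));
        // A space-free username (like the "Vadoop" account) round-trips
        // unchanged, which is why that workaround succeeds:
        System.out.println(encodedLocalDir("Vadoop"));
    }
}
```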
>
> Additionally, I'm about to try some more of the examples provided in the
> Hadoop documentation, just to get more familiar with how it works. I have
> heard about the well-known examples of matrix-vector multiplication and
> estimating the value of pi, but I have been unable to find them online
> myself. Do you know whether the documentation provides those examples,
> and if so, could you point me to them? Thank you in advance!
>
> Best regards,
> Vasil Grigorov
>
>
>
> > -------- Original message --------
> > From: Ravi Prakash ravihadoop@gmail.com
> > Subject: Re: WordCount MapReduce error
> > To: Васил Григоров <vaskogr@abv.bg>, user <user@hadoop.apache.org>
> > Sent: 23.02.2017 02:22
>
> Hi Vasil!
>
> I'm taking the liberty of adding the user mailing list back, in the hope
> that someone may chance upon this conversation in the future and find it
> useful.
>
> Could you please try setting HADOOP_IDENT_STRING="Vasil"? Although I do
> see https://issues.apache.org/jira/browse/HADOOP-10978, and I'm not sure
> it was fixed in 2.7.3.
>
> Could you please inspect the OS process that is launched for the Map Task?
> What user does it run as? In Linux, we have the strace utility that would
> let me see all the system calls that a process makes. Is there something
> similar in Windows?
> If you can ensure only 1 Map Task, you could try setting
> "mapred.child.java.opts" to
> "-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=1047",
> then connecting with a remote debugger such as Eclipse or jdb and
> stepping through to see where the failure happens.
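
In mapred-site.xml, that could look like the sketch below. The port 1047
matches Ravi's suggestion; note that suspend=y makes the task JVM block
until a debugger attaches, so attach promptly or the task will appear
hung:

```xml
<!-- mapred-site.xml: make the (single) task JVM wait for a JDWP debugger -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=1047</value>
</property>
```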
>
> That is interesting. I am guessing the MapTask is trying to write
> intermediate results to "mapreduce.cluster.local.dir", which defaults to
> "${hadoop.tmp.dir}/mapred/local". hadoop.tmp.dir in turn defaults to
> "/tmp/hadoop-${user.name}".
>
> Could you please try setting mapreduce.cluster.local.dir (and maybe even
> hadoop.tmp.dir) to preferably some location without space? Once that works,
> you could try narrowing down the problem.
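
A sketch of those overrides, assuming D:/hadoop-tmp is a writable,
space-free location on your machine (hadoop.tmp.dir belongs in
core-site.xml, mapreduce.cluster.local.dir in mapred-site.xml):

```xml
<!-- core-site.xml: keep all temp data out of any path containing spaces -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>D:/hadoop-tmp</value>
</property>

<!-- mapred-site.xml: where intermediate map output is spilled -->
<property>
  <name>mapreduce.cluster.local.dir</name>
  <value>D:/hadoop-tmp/mapred/local</value>
</property>
```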
>
> HTH
> Ravi
>
>
> On Wed, Feb 22, 2017 at 4:02 PM, Васил Григоров <vaskogr@abv.bg> wrote:
> Hello Ravi, thank you for the fast reply.
>
> 1. I did have a problem with my username having a space, but I worked
> around it by changing set HADOOP_IDENT_STRING=%USERNAME% to
> set HADOOP_IDENT_STRING="Vasil Grigorov" in the last line of
> hadoop-env.cmd. I can't change my Windows username, however, so do you
> know of another file where I should specify it?
> 2. I do have a D:\tmp directory and about 500GB free space on that drive
> so space shouldn't be the issue.
> 3. The application has all the required permissions.
>
> Additionally, something I've tested is that if I set the number of reduce
> tasks to 0 in WordCount.java (job.setNumReduceTasks(0)), then I do get
> the success files for the Map tasks in my output directory. So the Map
> tasks work fine but the Reduce is failing. Is it possible that my build
> is somehow incorrect even though it reported a successful build?
>
> Thanks again, I really appreciate the help!
>
>
>
> > -------- Original message --------
> > From: Ravi Prakash ravihadoop@gmail.com
> > Subject: Re: WordCount MapReduce error
> > To: Васил Григоров <vaskogr@abv.bg>
> > Sent: 22.02.2017 21:36
>
> Hi Vasil!
>
> It seems like the WordCount application is expecting to open an
> intermediate file but failing. Do you see a directory under
> D:/tmp/hadoop-Vasil Grigorov/ ? I can think of a few reasons; I'm sorry,
> I am not familiar with the filesystem on Windows 10.
> 1. Spaces in the file name are not being encoded / decoded properly. Can
> you try changing your name / username to remove the space?
> 2. There's not enough space on the D:/tmp directory?
> 3. The application does not have the right permissions to create the file.
>
> HTH
> Ravi
>
> On Wed, Feb 22, 2017 at 10:51 AM, Васил Григоров <vaskogr@abv.bg> wrote:
> Hello, I've been trying to run the WordCount example provided on the
> website on my Windows 10 machine. I have built the latest Hadoop version
> (2.7.3) successfully and I want to run the code in Local (Standalone)
> mode, so I have not specified any configuration apart from setting the
> JAVA_HOME path in the "hadoop-env.cmd" file. When I try to run the
> WordCount job it completes the Map tasks but fails to run the Reduce
> task. I get the following output:
>
>
> D:\Programs\hadoop-2.7.3-src\hadoop-dist\target\hadoop-2.7.3\WordCount>hadoop jar wc.jar WordCount D:\Programs\hadoop-2.7.3-src\hadoop-dist\target\hadoop-2.7.3\WordCount\input D:\Programs\hadoop-2.7.3-src\hadoop-dist\target\hadoop-2.7.3\WordCount\output
> 17/02/22 18:40:43 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
> 17/02/22 18:40:43 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
> 17/02/22 18:40:43 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
> 17/02/22 18:40:43 WARN mapreduce.JobResourceUploader: No job jar file set. User classes may not be found. See Job or Job#setJar(String).
> 17/02/22 18:40:44 INFO input.FileInputFormat: Total input paths to process : 2
> 17/02/22 18:40:44 INFO mapreduce.JobSubmitter: number of splits:2
> 17/02/22 18:40:44 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local334410887_0001
> 17/02/22 18:40:45 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
> 17/02/22 18:40:45 INFO mapreduce.Job: Running job: job_local334410887_0001
> 17/02/22 18:40:45 INFO mapred.LocalJobRunner: OutputCommitter set in config null
> 17/02/22 18:40:45 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
> 17/02/22 18:40:45 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
> 17/02/22 18:40:45 INFO mapred.LocalJobRunner: Waiting for map tasks
> 17/02/22 18:40:45 INFO mapred.LocalJobRunner: Starting task: attempt_local334410887_0001_m_000000_0
> 17/02/22 18:40:45 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
> 17/02/22 18:40:45 INFO util.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
> 17/02/22 18:40:45 INFO mapred.Task: Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@3019d00f
> 17/02/22 18:40:45 INFO mapred.MapTask: Processing split: file:/D:/Programs/hadoop-2.7.3-src/hadoop-dist/target/hadoop-2.7.3/WordCount/input/file02:0+27
> 17/02/22 18:40:45 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
> 17/02/22 18:40:45 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
> 17/02/22 18:40:45 INFO mapred.MapTask: soft limit at 83886080
> 17/02/22 18:40:45 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
> 17/02/22 18:40:45 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
> 17/02/22 18:40:45 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
> 17/02/22 18:40:45 INFO mapred.LocalJobRunner:
> 17/02/22 18:40:45 INFO mapred.MapTask: Starting flush of map output
> 17/02/22 18:40:45 INFO mapred.MapTask: Spilling map output
> 17/02/22 18:40:45 INFO mapred.MapTask: bufstart = 0; bufend = 44; bufvoid = 104857600
> 17/02/22 18:40:45 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214384(104857536); length = 13/6553600
> 17/02/22 18:40:45 INFO mapred.MapTask: Finished spill 0
> 17/02/22 18:40:45 INFO mapred.Task: Task:attempt_local334410887_0001_m_000000_0 is done. And is in the process of committing
> 17/02/22 18:40:45 INFO mapred.LocalJobRunner: map
> 17/02/22 18:40:45 INFO mapred.Task: Task 'attempt_local334410887_0001_m_000000_0' done.
> 17/02/22 18:40:45 INFO mapred.LocalJobRunner: Finishing task: attempt_local334410887_0001_m_000000_0
> 17/02/22 18:40:45 INFO mapred.LocalJobRunner: Starting task: attempt_local334410887_0001_m_000001_0
> 17/02/22 18:40:46 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
> 17/02/22 18:40:46 INFO util.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
> 17/02/22 18:40:46 INFO mapred.Task: Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@39ef3a7
> 17/02/22 18:40:46 INFO mapred.MapTask: Processing split: file:/D:/Programs/hadoop-2.7.3-src/hadoop-dist/target/hadoop-2.7.3/WordCount/input/file01:0+25
> 17/02/22 18:40:46 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
> 17/02/22 18:40:46 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
> 17/02/22 18:40:46 INFO mapred.MapTask: soft limit at 83886080
> 17/02/22 18:40:46 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
> 17/02/22 18:40:46 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
> 17/02/22 18:40:46 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
> 17/02/22 18:40:46 INFO mapred.LocalJobRunner:
> 17/02/22 18:40:46 INFO mapred.MapTask: Starting flush of map output
> 17/02/22 18:40:46 INFO mapred.MapTask: Spilling map output
> 17/02/22 18:40:46 INFO mapred.MapTask: bufstart = 0; bufend = 42; bufvoid = 104857600
> 17/02/22 18:40:46 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214384(104857536); length = 13/6553600
> 17/02/22 18:40:46 INFO mapred.MapTask: Finished spill 0
> 17/02/22 18:40:46 INFO mapred.Task: Task:attempt_local334410887_0001_m_000001_0 is done. And is in the process of committing
> 17/02/22 18:40:46 INFO mapred.LocalJobRunner: map
> 17/02/22 18:40:46 INFO mapreduce.Job: Job job_local334410887_0001 running in uber mode : false
> 17/02/22 18:40:46 INFO mapred.Task: Task 'attempt_local334410887_0001_m_000001_0' done.
> 17/02/22 18:40:46 INFO mapreduce.Job: map 100% reduce 0%
> 17/02/22 18:40:46 INFO mapred.LocalJobRunner: Finishing task: attempt_local334410887_0001_m_000001_0
> 17/02/22 18:40:46 INFO mapred.LocalJobRunner: map task executor complete.
> 17/02/22 18:40:46 INFO mapred.LocalJobRunner: Waiting for reduce tasks
> 17/02/22 18:40:46 INFO mapred.LocalJobRunner: Starting task: attempt_local334410887_0001_r_000000_0
> 17/02/22 18:40:46 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
> 17/02/22 18:40:46 INFO util.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
> 17/02/22 18:40:46 INFO mapred.Task: Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@13ac822f
> 17/02/22 18:40:46 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@6c4d20c4
> 17/02/22 18:40:46 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=334338464, maxSingleShuffleLimit=83584616, mergeThreshold=220663392, ioSortFactor=10, memToMemMergeOutputsThreshold=10
> 17/02/22 18:40:46 INFO reduce.EventFetcher: attempt_local334410887_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
> 17/02/22 18:40:46 INFO mapred.LocalJobRunner: reduce task executor complete.
> 17/02/22 18:40:46 WARN mapred.LocalJobRunner: job_local334410887_0001
> java.lang.Exception: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in localfetcher#1
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
> Caused by: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in localfetcher#1
>     at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.FileNotFoundException: D:/tmp/hadoop-Vasil%20Grigorov/mapred/local/localRunner/Vasil%20Grigorov/jobcache/job_local334410887_0001/attempt_local334410887_0001_m_000000_0/output/file.out.index
>     at org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:200)
>     at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:769)
>     at org.apache.hadoop.io.SecureIOUtils.openFSDataInputStream(SecureIOUtils.java:156)
>     at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:71)
>     at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:62)
>     at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:57)
>     at org.apache.hadoop.mapreduce.task.reduce.LocalFetcher.copyMapOutput(LocalFetcher.java:124)
>     at org.apache.hadoop.mapreduce.task.reduce.LocalFetcher.doCopy(LocalFetcher.java:102)
>     at org.apache.hadoop.mapreduce.task.reduce.LocalFetcher.run(LocalFetcher.java:85)
> 17/02/22 18:40:47 INFO mapreduce.Job: Job job_local334410887_0001 failed with state FAILED due to: NA
> 17/02/22 18:40:47 INFO mapreduce.Job: Counters: 18
>     File System Counters
>         FILE: Number of bytes read=1158
>         FILE: Number of bytes written=591978
>         FILE: Number of read operations=0
>         FILE: Number of large read operations=0
>         FILE: Number of write operations=0
>     Map-Reduce Framework
>         Map input records=2
>         Map output records=8
>         Map output bytes=86
>         Map output materialized bytes=89
>         Input split bytes=308
>         Combine input records=8
>         Combine output records=6
>         Spilled Records=6
>         Failed Shuffles=0
>         Merged Map outputs=0
>         GC time elapsed (ms)=0
>         Total committed heap usage (bytes)=574095360
>     File Input Format Counters
>         Bytes Read=52
>
> I have followed every tutorial available and looked for a potential
> solution to this error, but I have been unsuccessful. As I mentioned
> before, I have not set any further configuration because I want to run in
> Standalone mode rather than pseudo-distributed or fully distributed mode.
> I've spent a lot of time and effort to get this far and I've hit a brick
> wall with this error, so any help would be GREATLY appreciated.
>
> Thank you in advance!
>
>
>
>
<div>
<b> at \
org.apache.hadoop.mapred.<wbr>LocalJobRunner$Job$<wbr>ReduceTaskRunnable.run(<wbr>LocalJobRunner.java:319)</b> \
</div>
<div>
<b> at \
java.util.concurrent.<wbr>Executors$RunnableAdapter.<wbr>call(Executors.java:511)</b> \
</div>
<div>
<b> at \
java.util.concurrent.<wbr>FutureTask.run(FutureTask.<wbr>java:266)</b> </div>
<div>
<b> at \
java.util.concurrent.<wbr>ThreadPoolExecutor.runWorker(<wbr>ThreadPoolExecutor.java:1142)</b> \
</div>
<div>
<b> at \
java.util.concurrent.<wbr>ThreadPoolExecutor$Worker.run(<wbr>ThreadPoolExecutor.java:617)</b> \
</div>
<div>
<b> at java.lang.Thread.run(Thread.<wbr>java:745)</b>
</div>
<div>
<b>Caused by: java.io.FileNotFoundException: \
D:/tmp/hadoop-Vasil%<wbr>20Grigorov/mapred/local/<wbr>localRunner/Vasil%20Grigorov/<wb \
r>jobcache/job_local334410887_<wbr>0001/attempt_local334410887_<wbr>0001_m_000000_0/output/file.<wbr>out.index</b> \
</div>
<div>
<b> at \
org.apache.hadoop.fs.<wbr>RawLocalFileSystem.open(<wbr>RawLocalFileSystem.java:200)</b> \
</div>
<div>
<b> at \
org.apache.hadoop.fs.<wbr>FileSystem.open(FileSystem.<wbr>java:769)</b> </div>
<div>
<b> at <a href="http://org.apache.hadoop.io" \
target="_blank">org.apache.hadoop.io</a>.<wbr>SecureIOUtils.<wbr>openFSDataInputStream(<wbr>SecureIOUtils.java:156)</b> \
</div>
<div>
<b> at \
org.apache.hadoop.mapred.<wbr>SpillRecord.<init>(<wbr>SpillRecord.java:71)</b> \
</div> <div>
<b> at \
org.apache.hadoop.mapred.<wbr>SpillRecord.<init>(<wbr>SpillRecord.java:62)</b> \
</div> <div>
<b> at \
org.apache.hadoop.mapred.<wbr>SpillRecord.<init>(<wbr>SpillRecord.java:57)</b> \
</div> <div>
<b> at \
org.apache.hadoop.mapreduce.<wbr>task.reduce.LocalFetcher.<wbr>copyMapOutput(LocalFetcher.<wbr>java:124)</b> \
</div>
<div>
<b> at \
org.apache.hadoop.mapreduce.<wbr>task.reduce.LocalFetcher.<wbr>doCopy(LocalFetcher.java:102)</b> \
</div>
<div>
<b> at \
org.apache.hadoop.mapreduce.<wbr>task.reduce.LocalFetcher.run(<wbr>LocalFetcher.java:85)</b> \
</div>
<div>
<b>17/02/22 18:40:47 INFO mapreduce.Job: Job job_local334410887_0001 \
failed with state FAILED due to: NA</b> </div>
<div>
<b>17/02/22 18:40:47 INFO mapreduce.Job: Counters: 18</b>
</div>
<div>
<b> File System Counters</b>
</div>
<div>
<b> FILE: Number of bytes read=1158</b>
</div>
<div>
<b> FILE: Number of bytes written=591978</b>
</div>
<div>
<b> FILE: Number of read operations=0</b>
</div>
<div>
<b> FILE: Number of large read operations=0</b> \
</div>
<div>
<b> FILE: Number of write operations=0</b>
</div>
<div>
<b> Map-Reduce Framework</b>
</div>
<div>
<b> Map input records=2</b>
</div>
<div>
<b> Map output records=8</b>
</div>
<div>
<b> Map output bytes=86</b>
</div>
<div>
<b> Map output materialized bytes=89</b>
</div>
<div>
<b> Input split bytes=308</b>
</div>
<div>
<b> Combine input records=8</b>
</div>
<div>
<b> Combine output records=6</b>
</div>
<div>
<b> Spilled Records=6</b>
</div>
<div>
<b> Failed Shuffles=0</b>
</div>
<div>
<b> Merged Map outputs=0</b>
</div>
<div>
<b> GC time elapsed (ms)=0</b>
</div>
<div>
<b> Total committed heap usage \
(bytes)=574095360</b> </div>
<div>
<b> File Input Format Counters</b>
</div>
<div>
<b> Bytes Read=52</b>
</div>
<div>
<br>
</div>
</div>
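[The root cause visible in the stack trace above is that the space in the Windows username has been URL-encoded to %20 somewhere along the way, so the local filesystem is asked for a directory literally named "Vasil%20Grigorov", which does not exist. A minimal sketch (with made-up temp-dir paths, not Hadoop's actual code) of why the encoded path cannot match the real one:

```java
import java.io.File;

public class SpacePathDemo {
    public static void main(String[] args) throws Exception {
        // A directory whose name contains a space, like the Windows
        // user profile "Vasil Grigorov".
        File dir = new File(System.getProperty("java.io.tmpdir"), "Vasil Grigorov");
        dir.mkdirs();
        File real = new File(dir, "file.out.index");
        real.createNewFile();

        // The same path with the space URL-encoded as %20 -- to the
        // filesystem this names a completely different directory.
        File encoded = new File(System.getProperty("java.io.tmpdir"),
                "Vasil%20Grigorov/file.out.index");

        System.out.println(real.exists());     // true
        System.out.println(encoded.exists());  // false -> FileNotFoundException
    }
}
```

Because %20 is only a URL escape, not a filesystem convention, nothing on disk ever decodes it back to a space. -- ed.]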
I have followed every tutorial available and looked for a potential solution to this error, but I have been unsuccessful. As I mentioned before, I have not set any further configuration in any files because I want to run in standalone mode, rather than pseudo-distributed or fully distributed mode. I've spent a lot of time and effort to get this far and I've hit a brick wall with this error, so any help would be GREATLY appreciated.
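[The failing path D:/tmp/hadoop-Vasil%20Grigorov/... follows the hadoop.tmp.dir default of /tmp/hadoop-${user.name}, so one possible workaround for the space-in-username problem is to point hadoop.tmp.dir at a space-free directory. A sketch, assuming a hypothetical D:/hdp-tmp directory; note this does require editing core-site.xml, which the poster hoped to avoid in standalone mode:

```xml
<!-- etc/hadoop/core-site.xml: override the default
     /tmp/hadoop-${user.name} layout that produced the failing path -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>D:/hdp-tmp</value>
  </property>
</configuration>
```
-- ed.]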

Thank you in advance!