List: hadoop-user
Subject: Re: WordCount MapReduce error
From: Ravi Prakash <ravihadoop () gmail ! com>
Date: 2017-02-23 21:23:47
Message-ID: CAMs9kVjyCHZuXC5j0cy1spB8aGpykaDVUDKO9-OCuWVfRdtmQw () mail ! gmail ! com
Hi Vasil!
Thanks a lot for replying with your solution. Hopefully someone else will
find it useful. I know that the pi example (among others) is in
hadoop-mapreduce-examples-2.7.3.jar . I'm sorry, I do not know of a
matrix-vector multiplication example bundled with the Apache Hadoop
source, though I'm sure plenty of people on GitHub have tried it.
Glad it worked for you finally! :-)
Regards
Ravi
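
For anyone searching later: the examples jar ships inside the binary
distribution, and invoking it with no program name lists the bundled
examples. A typical run of the pi estimator looks like the sketch below
(the jar path assumes the standard 2.7.3 layout under the distribution
root; adjust it to wherever your jar actually lives):

```shell
# List every example bundled in the jar (pi, wordcount, grep, ...)
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar

# Estimate pi using a quasi-Monte Carlo method:
# 16 map tasks, 1000 samples per map (both numbers are tunable)
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar pi 16 1000
```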
On Thu, Feb 23, 2017 at 5:41 AM, Васил Григоров <vaskogr@abv.bg> wrote:
> Dear Ravi,
>
> Even though I was unable to understand most of what you suggested due to
> my lack of experience in the field, one of your suggestions did guide me
> in the right direction and I was able to solve my error. I decided to
> share it since you mentioned that you're adding this conversation to the
> user mailing list for other people to see in case they run into a similar
> problem.
>
> It turns out that my Windows username consisting of two words, "Vasil
> Grigorov", had messed up the application's paths somewhere because of the
> space between the words. I thought I had fixed it by changing the
> HADOOP_IDENT_STRING variable from the default %USERNAME% to "Vasil
> Grigorov", but that only sidestepped my actual username. Since there is no
> way to change my Windows username, I created another account called
> "Vadoop" and ran the code there. To my surprise, the WordCount code ran
> with no issue, completing both the Map and Reduce tasks to 100% and
> producing the correct output in the output directory. It's a bit annoying
> that I had to go through all this trouble just because Hadoop hasn't been
> modified to escape space characters in usernames, but then again, I don't
> know how hard that would be to do. Anyway, I really appreciate the help
> and I hope this will help someone else in the future.
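
The %20 in the FileNotFoundException quoted further down in this thread is
exactly this problem. Below is a minimal, self-contained Java sketch of
the failure mode; this is not Hadoop's actual code, and the class and
method names are made up for illustration. The map side writes under the
literal directory name containing a space, while the reduce-side fetcher
ends up with a URL-encoded form of the path that does not exist on disk:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class SpacePathDemo {
    // Illustrative only: mimics how a URL-encoding step turns the space in
    // a username like "Vasil Grigorov" into %20, so the fetcher looks for
    // ".../hadoop-Vasil%20Grigorov/..." while the map output actually
    // lives under ".../hadoop-Vasil Grigorov/..." -- hence the
    // FileNotFoundException on file.out.index.
    static String encodedLocalDir(String userName) throws UnsupportedEncodingException {
        // URLEncoder encodes a space as '+'; path encoding uses %20
        String encoded = URLEncoder.encode(userName, "UTF-8").replace("+", "%20");
        return "D:/tmp/hadoop-" + encoded + "/mapred/local";
    }

    public static void main(String[] args) throws Exception {
        // A username with a space yields a path that no longer matches
        // the directory the map task created:
        System.out.println(encodedLocalDir("Vasil Grigorov"));
        // A space-free username (like the "Vadoop" account) round-trips
        // unchanged, which is why that workaround succeeds:
        System.out.println(encodedLocalDir("Vadoop"));
    }
}
```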
>
> Additionally, I'm about to try some more of the examples provided in the
> Hadoop documentation, just to get more familiar with how it works. I have
> heard about the well-known examples of matrix-vector multiplication and
> estimating the value of pi, but I have been unable to find them online
> myself. Do you know whether the documentation provides those examples,
> and if so, could you point me to them? Thank you in advance!
>
> Best regards,
> Vasil Grigorov
>
>
>
> > -------- Original message --------
> > From: Ravi Prakash ravihadoop@gmail.com
> > Subject: Re: WordCount MapReduce error
> > To: Васил Григоров <vaskogr@abv.bg>, user <user@hadoop.apache.org>
> > Sent: 23.02.2017 02:22
>
> Hi Vasil!
>
> I'm taking the liberty of adding the user mailing list back, in the hope
> that someone may chance upon this conversation in the future and find it
> useful.
>
> Could you please try setting HADOOP_IDENT_STRING="Vasil"? Although I do
> see https://issues.apache.org/jira/browse/HADOOP-10978, and I'm not sure
> it was fixed in 2.7.3.
>
> Could you please inspect the OS process that is launched for the Map Task?
> What user does it run as? In Linux, we have the strace utility that would
> let me see all the system calls that a process makes. Is there something
> similar in Windows?
> If you can ensure only 1 Map Task, you could try setting
> "mapred.child.java.opts" to
> "-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=1047",
> then connecting with a remote debugger such as Eclipse or jdb and
> stepping through to see where the failure happens.
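
In mapred-site.xml, that could look like the sketch below. The port 1047
matches Ravi's suggestion; note that suspend=y makes the task JVM block
until a debugger attaches, so attach promptly or the task will appear
hung:

```xml
<!-- mapred-site.xml: make the (single) task JVM wait for a JDWP debugger -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=1047</value>
</property>
```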
>
> That is interesting. I am guessing the MapTask is trying to write
> intermediate results to "mapreduce.cluster.local.dir", which defaults to
> "${hadoop.tmp.dir}/mapred/local". hadoop.tmp.dir in turn defaults to
> "/tmp/hadoop-${user.name}".
>
> Could you please try setting mapreduce.cluster.local.dir (and maybe even
> hadoop.tmp.dir) to preferably some location without space? Once that works,
> you could try narrowing down the problem.
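
A sketch of those overrides, assuming D:/hadoop-tmp is a writable,
space-free location on your machine (hadoop.tmp.dir belongs in
core-site.xml, mapreduce.cluster.local.dir in mapred-site.xml):

```xml
<!-- core-site.xml: keep all temp data out of any path containing spaces -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>D:/hadoop-tmp</value>
</property>

<!-- mapred-site.xml: where intermediate map output is spilled -->
<property>
  <name>mapreduce.cluster.local.dir</name>
  <value>D:/hadoop-tmp/mapred/local</value>
</property>
```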
>
> HTH
> Ravi
>
>
> On Wed, Feb 22, 2017 at 4:02 PM, Васил Григоров <vaskogr@abv.bg> wrote:
> Hello Ravi, thank you for the fast reply.
>
> 1. I did have a problem with my username having a space, but I worked
> around it by changing set HADOOP_IDENT_STRING=%USERNAME% to
> set HADOOP_IDENT_STRING="Vasil Grigorov" in the last line of
> hadoop-env.cmd. I can't change my Windows username, however, so do you
> know of another file where I should specify it?
> 2. I do have a D:\tmp directory and about 500GB free space on that drive
> so space shouldn't be the issue.
> 3. The application has all the required permissions.
>
> Additionally, something I've tested is that if I set the number of reduce
> tasks to 0 in WordCount.java (job.setNumReduceTasks(0)), then I do get
> the success files for the Map tasks in my output directory. So the Map
> tasks work fine but the Reduce is failing. Is it possible that my build
> is somehow incorrect even though it reported a successful build?
>
> Thanks again, I really appreciate the help!
>
>
>
> > -------- Original message --------
> > From: Ravi Prakash ravihadoop@gmail.com
> > Subject: Re: WordCount MapReduce error
> > To: Васил Григоров <vaskogr@abv.bg>
> > Sent: 22.02.2017 21:36
>
> Hi Vasil!
>
> It seems like the WordCount application is expecting to open an
> intermediate file but failing. Do you see a directory under
> D:/tmp/hadoop-Vasil Grigorov/ ? I can think of a few reasons; I'm sorry,
> I am not familiar with the filesystem on Windows 10.
> 1. Spaces in the file name are not being encoded / decoded properly. Can
> you try changing your name / username to remove the space?
> 2. There's not enough space on the D:/tmp directory?
> 3. The application does not have the right permissions to create the file.
>
> HTH
> Ravi
>
> On Wed, Feb 22, 2017 at 10:51 AM, Васил Григоров <vaskogr@abv.bg> wrote:
> Hello, I've been trying to run the WordCount example provided on the
> website on my Windows 10 machine. I have built the latest Hadoop version
> (2.7.3) successfully and I want to run the code in Local (Standalone)
> mode, so I have not specified any configuration apart from setting the
> JAVA_HOME path in the "hadoop-env.cmd" file. When I try to run the
> WordCount job it completes the Map tasks but fails to run the Reduce
> task. I get the following output:
>
>
> D:\Programs\hadoop-2.7.3-src\hadoop-dist\target\hadoop-2.7.3\WordCount>hadoop jar wc.jar WordCount D:\Programs\hadoop-2.7.3-src\hadoop-dist\target\hadoop-2.7.3\WordCount\input D:\Programs\hadoop-2.7.3-src\hadoop-dist\target\hadoop-2.7.3\WordCount\output
> 17/02/22 18:40:43 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
> 17/02/22 18:40:43 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
> 17/02/22 18:40:43 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
> 17/02/22 18:40:43 WARN mapreduce.JobResourceUploader: No job jar file set. User classes may not be found. See Job or Job#setJar(String).
> 17/02/22 18:40:44 INFO input.FileInputFormat: Total input paths to process : 2
> 17/02/22 18:40:44 INFO mapreduce.JobSubmitter: number of splits:2
> 17/02/22 18:40:44 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local334410887_0001
> 17/02/22 18:40:45 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
> 17/02/22 18:40:45 INFO mapreduce.Job: Running job: job_local334410887_0001
> 17/02/22 18:40:45 INFO mapred.LocalJobRunner: OutputCommitter set in config null
> 17/02/22 18:40:45 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
> 17/02/22 18:40:45 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
> 17/02/22 18:40:45 INFO mapred.LocalJobRunner: Waiting for map tasks
> 17/02/22 18:40:45 INFO mapred.LocalJobRunner: Starting task: attempt_local334410887_0001_m_000000_0
> 17/02/22 18:40:45 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
> 17/02/22 18:40:45 INFO util.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
> 17/02/22 18:40:45 INFO mapred.Task: Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@3019d00f
> 17/02/22 18:40:45 INFO mapred.MapTask: Processing split: file:/D:/Programs/hadoop-2.7.3-src/hadoop-dist/target/hadoop-2.7.3/WordCount/input/file02:0+27
> 17/02/22 18:40:45 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
> 17/02/22 18:40:45 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
> 17/02/22 18:40:45 INFO mapred.MapTask: soft limit at 83886080
> 17/02/22 18:40:45 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
> 17/02/22 18:40:45 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
> 17/02/22 18:40:45 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
> 17/02/22 18:40:45 INFO mapred.LocalJobRunner:
> 17/02/22 18:40:45 INFO mapred.MapTask: Starting flush of map output
> 17/02/22 18:40:45 INFO mapred.MapTask: Spilling map output
> 17/02/22 18:40:45 INFO mapred.MapTask: bufstart = 0; bufend = 44; bufvoid = 104857600
> 17/02/22 18:40:45 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214384(104857536); length = 13/6553600
> 17/02/22 18:40:45 INFO mapred.MapTask: Finished spill 0
> 17/02/22 18:40:45 INFO mapred.Task: Task:attempt_local334410887_0001_m_000000_0 is done. And is in the process of committing
> 17/02/22 18:40:45 INFO mapred.LocalJobRunner: map
> 17/02/22 18:40:45 INFO mapred.Task: Task 'attempt_local334410887_0001_m_000000_0' done.
> 17/02/22 18:40:45 INFO mapred.LocalJobRunner: Finishing task: attempt_local334410887_0001_m_000000_0
> 17/02/22 18:40:45 INFO mapred.LocalJobRunner: Starting task: attempt_local334410887_0001_m_000001_0
> 17/02/22 18:40:46 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
> 17/02/22 18:40:46 INFO util.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
> 17/02/22 18:40:46 INFO mapred.Task: Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@39ef3a7
> 17/02/22 18:40:46 INFO mapred.MapTask: Processing split: file:/D:/Programs/hadoop-2.7.3-src/hadoop-dist/target/hadoop-2.7.3/WordCount/input/file01:0+25
> 17/02/22 18:40:46 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
> 17/02/22 18:40:46 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
> 17/02/22 18:40:46 INFO mapred.MapTask: soft limit at 83886080
> 17/02/22 18:40:46 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
> 17/02/22 18:40:46 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
> 17/02/22 18:40:46 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
> 17/02/22 18:40:46 INFO mapred.LocalJobRunner:
> 17/02/22 18:40:46 INFO mapred.MapTask: Starting flush of map output
> 17/02/22 18:40:46 INFO mapred.MapTask: Spilling map output
> 17/02/22 18:40:46 INFO mapred.MapTask: bufstart = 0; bufend = 42; bufvoid = 104857600
> 17/02/22 18:40:46 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214384(104857536); length = 13/6553600
> 17/02/22 18:40:46 INFO mapred.MapTask: Finished spill 0
> 17/02/22 18:40:46 INFO mapred.Task: Task:attempt_local334410887_0001_m_000001_0 is done. And is in the process of committing
> 17/02/22 18:40:46 INFO mapred.LocalJobRunner: map
> 17/02/22 18:40:46 INFO mapreduce.Job: Job job_local334410887_0001 running in uber mode : false
> 17/02/22 18:40:46 INFO mapred.Task: Task 'attempt_local334410887_0001_m_000001_0' done.
> 17/02/22 18:40:46 INFO mapreduce.Job: map 100% reduce 0%
> 17/02/22 18:40:46 INFO mapred.LocalJobRunner: Finishing task: attempt_local334410887_0001_m_000001_0
> 17/02/22 18:40:46 INFO mapred.LocalJobRunner: map task executor complete.
> 17/02/22 18:40:46 INFO mapred.LocalJobRunner: Waiting for reduce tasks
> 17/02/22 18:40:46 INFO mapred.LocalJobRunner: Starting task: attempt_local334410887_0001_r_000000_0
> 17/02/22 18:40:46 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
> 17/02/22 18:40:46 INFO util.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
> 17/02/22 18:40:46 INFO mapred.Task: Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@13ac822f
> 17/02/22 18:40:46 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@6c4d20c4
> 17/02/22 18:40:46 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=334338464, maxSingleShuffleLimit=83584616, mergeThreshold=220663392, ioSortFactor=10, memToMemMergeOutputsThreshold=10
> 17/02/22 18:40:46 INFO reduce.EventFetcher: attempt_local334410887_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
> 17/02/22 18:40:46 INFO mapred.LocalJobRunner: reduce task executor complete.
> 17/02/22 18:40:46 WARN mapred.LocalJobRunner: job_local334410887_0001
> java.lang.Exception: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in localfetcher#1
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
> Caused by: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in localfetcher#1
>     at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.FileNotFoundException: D:/tmp/hadoop-Vasil%20Grigorov/mapred/local/localRunner/Vasil%20Grigorov/jobcache/job_local334410887_0001/attempt_local334410887_0001_m_000000_0/output/file.out.index
>     at org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:200)
>     at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:769)
>     at org.apache.hadoop.io.SecureIOUtils.openFSDataInputStream(SecureIOUtils.java:156)
>     at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:71)
>     at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:62)
>     at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:57)
>     at org.apache.hadoop.mapreduce.task.reduce.LocalFetcher.copyMapOutput(LocalFetcher.java:124)
>     at org.apache.hadoop.mapreduce.task.reduce.LocalFetcher.doCopy(LocalFetcher.java:102)
>     at org.apache.hadoop.mapreduce.task.reduce.LocalFetcher.run(LocalFetcher.java:85)
> 17/02/22 18:40:47 INFO mapreduce.Job: Job job_local334410887_0001 failed with state FAILED due to: NA
> 17/02/22 18:40:47 INFO mapreduce.Job: Counters: 18
>     File System Counters
>         FILE: Number of bytes read=1158
>         FILE: Number of bytes written=591978
>         FILE: Number of read operations=0
>         FILE: Number of large read operations=0
>         FILE: Number of write operations=0
>     Map-Reduce Framework
>         Map input records=2
>         Map output records=8
>         Map output bytes=86
>         Map output materialized bytes=89
>         Input split bytes=308
>         Combine input records=8
>         Combine output records=6
>         Spilled Records=6
>         Failed Shuffles=0
>         Merged Map outputs=0
>         GC time elapsed (ms)=0
>         Total committed heap usage (bytes)=574095360
>     File Input Format Counters
>         Bytes Read=52
>
> I have followed every tutorial available and looked for a potential
> solution to this error, but I have been unsuccessful. As I mentioned
> before, I have not set any further configuration because I want to run in
> Standalone mode rather than pseudo-distributed or fully distributed mode.
> I've spent a lot of time and effort to get this far and I've hit a brick
> wall with this error, so any help would be GREATLY appreciated.
>
> Thank you in advance!
>
>
>
>
<div>
<b> at \
org.apache.hadoop.mapred.<wbr>LocalJobRunner$Job$<wbr>ReduceTaskRunnable.run(<wbr>LocalJobRunner.java:319)</b> \
</div>
<div>
<b> at \
java.util.concurrent.<wbr>Executors$RunnableAdapter.<wbr>call(Executors.java:511)</b> \
</div>
<div>
<b> at \
java.util.concurrent.<wbr>FutureTask.run(FutureTask.<wbr>java:266)</b> </div>
<div>
<b> at \
java.util.concurrent.<wbr>ThreadPoolExecutor.runWorker(<wbr>ThreadPoolExecutor.java:1142)</b> \
</div>
<div>
<b> at \
java.util.concurrent.<wbr>ThreadPoolExecutor$Worker.run(<wbr>ThreadPoolExecutor.java:617)</b> \
</div>
<div>
<b> at java.lang.Thread.run(Thread.<wbr>java:745)</b>
</div>
<div>
<b>Caused by: java.io.FileNotFoundException: \
D:/tmp/hadoop-Vasil%<wbr>20Grigorov/mapred/local/<wbr>localRunner/Vasil%20Grigorov/<wb \
r>jobcache/job_local334410887_<wbr>0001/attempt_local334410887_<wbr>0001_m_000000_0/output/file.<wbr>out.index</b> \
</div>
<div>
<b> at \
org.apache.hadoop.fs.<wbr>RawLocalFileSystem.open(<wbr>RawLocalFileSystem.java:200)</b> \
</div>
<div>
<b> at \
org.apache.hadoop.fs.<wbr>FileSystem.open(FileSystem.<wbr>java:769)</b> </div>
<div>
<b> at <a href="http://org.apache.hadoop.io" \
target="_blank">org.apache.hadoop.io</a>.<wbr>SecureIOUtils.<wbr>openFSDataInputStream(<wbr>SecureIOUtils.java:156)</b> \
</div>
<div>
<b> at \
org.apache.hadoop.mapred.<wbr>SpillRecord.<init>(<wbr>SpillRecord.java:71)</b> \
</div> <div>
<b> at \
org.apache.hadoop.mapred.<wbr>SpillRecord.<init>(<wbr>SpillRecord.java:62)</b> \
</div> <div>
<b> at \
org.apache.hadoop.mapred.<wbr>SpillRecord.<init>(<wbr>SpillRecord.java:57)</b> \
</div> <div>
<b> at \
org.apache.hadoop.mapreduce.<wbr>task.reduce.LocalFetcher.<wbr>copyMapOutput(LocalFetcher.<wbr>java:124)</b> \
</div>
<div>
<b> at \
org.apache.hadoop.mapreduce.<wbr>task.reduce.LocalFetcher.<wbr>doCopy(LocalFetcher.java:102)</b> \
</div>
<div>
<b> at \
org.apache.hadoop.mapreduce.<wbr>task.reduce.LocalFetcher.run(<wbr>LocalFetcher.java:85)</b> \
</div>
<div>
<b>17/02/22 18:40:47 INFO mapreduce.Job: Job job_local334410887_0001 \
failed with state FAILED due to: NA</b> </div>
<div>
<b>17/02/22 18:40:47 INFO mapreduce.Job: Counters: 18</b>
</div>
<div>
<b> File System Counters</b>
</div>
<div>
<b> FILE: Number of bytes read=1158</b>
</div>
<div>
<b> FILE: Number of bytes written=591978</b>
</div>
<div>
<b> FILE: Number of read operations=0</b>
</div>
<div>
<b> FILE: Number of large read operations=0</b> \
</div>
<div>
<b> FILE: Number of write operations=0</b>
</div>
<div>
<b> Map-Reduce Framework</b>
</div>
<div>
<b> Map input records=2</b>
</div>
<div>
<b> Map output records=8</b>
</div>
<div>
<b> Map output bytes=86</b>
</div>
<div>
<b> Map output materialized bytes=89</b>
</div>
<div>
<b> Input split bytes=308</b>
</div>
<div>
<b> Combine input records=8</b>
</div>
<div>
<b> Combine output records=6</b>
</div>
<div>
<b> Spilled Records=6</b>
</div>
<div>
<b> Failed Shuffles=0</b>
</div>
<div>
<b> Merged Map outputs=0</b>
</div>
<div>
<b> GC time elapsed (ms)=0</b>
</div>
<div>
<b> Total committed heap usage \
(bytes)=574095360</b> </div>
<div>
<b> File Input Format Counters</b>
</div>
<div>
<b> Bytes Read=52</b>
</div>
<div>
<br>
</div>
</div>
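[The root cause visible in the stack trace above is that the space in the Windows username has been URL-encoded to %20 somewhere along the way, so the local filesystem is asked for a directory literally named "Vasil%20Grigorov", which does not exist. A minimal sketch (with made-up temp-dir paths, not Hadoop's actual code) of why the encoded path cannot match the real one:

```java
import java.io.File;

public class SpacePathDemo {
    public static void main(String[] args) throws Exception {
        // A directory whose name contains a space, like the Windows
        // user profile "Vasil Grigorov".
        File dir = new File(System.getProperty("java.io.tmpdir"), "Vasil Grigorov");
        dir.mkdirs();
        File real = new File(dir, "file.out.index");
        real.createNewFile();

        // The same path with the space URL-encoded as %20 -- to the
        // filesystem this names a completely different directory.
        File encoded = new File(System.getProperty("java.io.tmpdir"),
                "Vasil%20Grigorov/file.out.index");

        System.out.println(real.exists());     // true
        System.out.println(encoded.exists());  // false -> FileNotFoundException
    }
}
```

Because %20 is only a URL escape, not a filesystem convention, nothing on disk ever decodes it back to a space. -- ed.]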
I have followed every tutorial available and looked for a potential solution to this error, but I have been unsuccessful. As I mentioned before, I have not set any further configuration in any files because I want to run in standalone mode, rather than pseudo-distributed or fully distributed mode. I've spent a lot of time and effort to get this far and I've hit a brick wall with this error, so any help would be GREATLY appreciated.
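[The failing path D:/tmp/hadoop-Vasil%20Grigorov/... follows the hadoop.tmp.dir default of /tmp/hadoop-${user.name}, so one possible workaround for the space-in-username problem is to point hadoop.tmp.dir at a space-free directory. A sketch, assuming a hypothetical D:/hdp-tmp directory; note this does require editing core-site.xml, which the poster hoped to avoid in standalone mode:

```xml
<!-- etc/hadoop/core-site.xml: override the default
     /tmp/hadoop-${user.name} layout that produced the failing path -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>D:/hdp-tmp</value>
  </property>
</configuration>
```
-- ed.]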

Thank you in advance!