List:       avro-dev
Subject:    [jira] [Updated] (AVRO-2787) Hadoop Mapreduce job fails when creating Writer
From:       "Anton Oellerer (Jira)" <jira () apache ! org>
Date:       2020-03-30 15:49:00
Message-ID: JIRA.13295023.1585582787000.26602.1585583340217 () Atlassian ! JIRA


     [ https://issues.apache.org/jira/browse/AVRO-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anton Oellerer updated AVRO-2787:
---------------------------------
    Description: 
Hey,

I am trying to build a Hadoop pipeline that computes the chi-squared value for tokens in reviews stored as JSON.

For this, I created multiple Hadoop jobs; the communication between them happens partly via Avro data container files.

When running this pipeline, I get the following error at the end of the first reduce job. The reducer has this signature:
{code:java}
public class CategoryTokensReducer extends Reducer<Text, StringArrayWritable, AvroKey<CharSequence>, AvroValue<CategoryData>>
{code}
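For context, the reduce method emits Avro key/value pairs roughly like this (a simplified, hypothetical sketch; the real implementation is in the attached CategoryTokensReducer.java):
{code:java}
// Hypothetical sketch only; the actual code is attached as CategoryTokensReducer.java.
@Override
protected void reduce(Text key, Iterable<StringArrayWritable> values, Context context)
        throws IOException, InterruptedException {
    CategoryData categoryData = buildCategoryData(values); // placeholder for the real aggregation
    context.write(new AvroKey<CharSequence>(key.toString()), new AvroValue<>(categoryData));
}
{code}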

Error:
{code:java}
java.lang.Exception: java.lang.NoSuchMethodError: org.apache.avro.Schema$Field.<init>(Ljava/lang/String;Lorg/apache/avro/Schema;Ljava/lang/String;Ljava/lang/Object;)V
        at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:492)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:559)
Caused by: java.lang.NoSuchMethodError: org.apache.avro.Schema$Field.<init>(Ljava/lang/String;Lorg/apache/avro/Schema;Ljava/lang/String;Ljava/lang/Object;)V
        at org.apache.avro.hadoop.io.AvroKeyValue.getSchema(AvroKeyValue.java:111)
        at org.apache.avro.mapreduce.AvroKeyValueRecordWriter.<init>(AvroKeyValueRecordWriter.java:84)
        at org.apache.avro.mapreduce.AvroKeyValueOutputFormat.getRecordWriter(AvroKeyValueOutputFormat.java:70)
        at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.<init>(ReduceTask.java:542)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:615)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390)
        at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:347)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
{code}
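From what I can tell, the missing constructor is the variant Avro 1.9.x added, while 1.8.x (and the older Avro that, I believe, ships with Hadoop 3.2.1) only has the Jackson-based one, so this looks like a compile-time vs. runtime version mismatch. My reading of the two APIs, please correct me if wrong:
{code:java}
// Avro 1.9.x declares (what my job is compiled against):
//     public Field(String name, Schema schema, String doc, Object defaultValue)
// Avro 1.8.x / 1.7.x only declare the Jackson-based variant:
//     public Field(String name, Schema schema, String doc, JsonNode defaultValue)
// So a call compiled against 1.9.2, such as
Schema.Field field = new Schema.Field("key", Schema.create(Schema.Type.STRING), null, (Object) null);
// fails with NoSuchMethodError when an older avro jar wins on the runtime classpath.
{code}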
The job is set up like this:
{code:java}
Job jsonToCategoryTokensJob = Job.getInstance(conf, "json to category data");
AvroJob.setOutputKeySchema(jsonToCategoryTokensJob, Schema.create(Schema.Type.STRING));
AvroJob.setOutputValueSchema(jsonToCategoryTokensJob, CategoryData.getClassSchema());

jsonToCategoryTokensJob.setJarByClass(TextprocessingfundamentalsApplication.class);

jsonToCategoryTokensJob.setMapperClass(JsonToCategoryTokensMapper.class);
jsonToCategoryTokensJob.setMapOutputKeyClass(Text.class);
jsonToCategoryTokensJob.setMapOutputValueClass(StringArrayWritable.class);

jsonToCategoryTokensJob.setReducerClass(CategoryTokensReducer.class);
jsonToCategoryTokensJob.setOutputFormatClass(AvroKeyValueOutputFormat.class);

String in = otherArgs.get(0);
String out = otherArgs.get(1);

FileInputFormat.addInputPath(jsonToCategoryTokensJob, new Path(in));
FileOutputFormat.setOutputPath(jsonToCategoryTokensJob, new Path(out, "outCategoryData"));
{code}
The pipeline is run by first building a shadow JAR from the source in the development environment and then executing it in a Podman container.

With Avro 1.8.2 and gradle-avro-plugin 0.16.0, the reduce job works.
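A quick check to confirm which Avro jar is actually loaded at runtime (a generic JVM trick, nothing Avro-specific) would be:
{code:java}
// Prints the location of the jar that org.apache.avro.Schema was loaded from.
System.out.println(org.apache.avro.Schema.class.getProtectionDomain().getCodeSource().getLocation());
{code}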

Does anyone know what the problem might be?

Best regards

Anton



> Hadoop Mapreduce job fails when creating Writer
> -----------------------------------------------
> 
> Key: AVRO-2787
> URL: https://issues.apache.org/jira/browse/AVRO-2787
> Project: Apache Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.9.2
> Environment: Development
> * OS: Fedora 31
> * Java version 8
> * Gradle version 6.2.2
> * Avro version 1.9.2
> * Shadow version 5.2.0
> * Gradle-avro-plugin version 0.19.1
> Running in a Podman container
> * OS: Ubuntu 18.04
> * Podman 1.8.2
> * Hadoop version 3.2.1
> * Java version 8
> Reporter: Anton Oellerer
> Priority: Blocker
> Attachments: CategoryData.avsc, CategoryTokensReducer.java, TextprocessingfundamentalsApplication.java



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

