
List:       avro-user
Subject:    Re: CustomSerializer throws org.apache.avro.AvroRuntimeException: not open
From:       Sunita Arvind <sunitarvind () gmail ! com>
Date:       2016-07-28 16:27:23
Message-ID: CANSBmbxjjDBdbP3H1cFRNxXU7_zGFBtJ6Yjpqa4PcQuZavji6Q () mail ! gmail ! com

Nicolas,

For my use case, I have the schema, so I used the Maven plugin to generate
the Avro classes for it. However, for your use case, where you do not have the
schema and just have an .avro file, this looks like an easy way:

java -jar ~/avro-tools-1.7.7.jar getschema twitter.avro > twitter.avsc

Found it here - https://github.com/miguno/avro-cli-examples
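
If you would rather do the same thing programmatically, below is a rough Java
sketch of what getschema does under the hood (the file name is just an example;
no schema needs to be supplied up front, because the writer's schema is stored
in the container file's header):

import java.io.File;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

public class GetSchema {
    public static void main(String[] args) throws Exception {
        // The reader picks up the writer schema from the file header,
        // so none is passed in here.
        GenericDatumReader<GenericRecord> datumReader =
            new GenericDatumReader<GenericRecord>();
        DataFileReader<GenericRecord> fileReader =
            new DataFileReader<GenericRecord>(new File("twitter.avro"), datumReader);
        try {
            Schema schema = fileReader.getSchema();    // the embedded schema
            System.out.println(schema.toString(true)); // pretty-printed .avsc

            // The records follow the header; iterating yields the values.
            for (GenericRecord record : fileReader) {
                System.out.println(record);
            }
        } finally {
            fileReader.close();
        }
    }
}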

Another option, since you are not using Java, is:
On Mac:
1. ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)" < /dev/null 2> /dev/null
2. brew install avro-tools
3. avro-tools getschema file.avro
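
If you need the values too, and not just the schema, avro-tools tojson
file.avro dumps each record as a line of JSON.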

Typing avro-tools with no arguments should print the usage and the list of
subcommands, which might get you further. Hope this helps.
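
One more thought, since you are working in C++: if I remember correctly, the
Avro C++ library ships an avro::DataFileReader (in avro/DataFile.hh) that opens
a container file and exposes the schema embedded in it, so the C++ API docs you
linked may be worth a look for that class.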

regards
Sunita

On Thu, Jul 28, 2016 at 6:29 AM, Nicolas Ranc <rnc.nicolas@gmail.com> wrote:

> Dear Sunita,
> 
> Thank you for your fast answer.
> 
> It's not exactly what I expected.
> I am using Apache Avro C++ and I would like to deserialize an .avro file.
> With just the one .avro file and the Avro C++ functions, I am trying to
> extract the schema, the keys in the schema, and the data at the end of the
> file ("avroProgram2.avro").
> 
> I saw different functions: for example, in the file resolving.cc from
> http://avro.apache.org/docs/1.7.6/api/cpp/html/, the program uses load(...) to
> import a schema and then uses it in the avro::resolvingDecoder. In my case, I
> cannot import a schema for deserialization: I only have the .avro file, and I
> am looking for functions that extract the information from that file (the
> schema, the keys, and the values). I need this information because I then
> have to create new objects in Matlab, using Mex functions (in C++).
> 
> Thank you for your time,
> Nicolas Ranc
> 
> 
> 2016-07-27 19:10 GMT+02:00 Sunita Arvind <sunitarvind@gmail.com>:
> 
> > For the benefit of anyone else hitting the same issue, here is what I found:
> > 
> > The serializer I was using extends AbstractAvroEventSerializer. This class
> > has a lot of adoption, so it is not likely to be an issue in the abstract
> > class itself. However, I got rid of the issue by overriding the configure
> > method of AbstractAvroEventSerializer in my custom serializer, as below:
> > 
> > 
> > public void configure(Context context) {
> >     int syncIntervalBytes = context.getInteger("syncIntervalBytes",
> >             Integer.valueOf(2048000)).intValue();
> >     String compressionCodec = context.getString("compressionCodec", "null");
> >     this.writer = new ReflectDatumWriter(this.getSchema());
> >     this.dataFileWriter = new DataFileWriter(this.writer);
> >     this.dataFileWriter.setSyncInterval(syncIntervalBytes);
> >     try {
> >         CodecFactory e = CodecFactory.fromString(compressionCodec);
> >         this.dataFileWriter.setCodec(e);
> >         this.dataFileWriter.create(schema, out); // --> added the creation
> >     } catch (AvroRuntimeException var5) {
> >         logger.warn("Unable to instantiate avro codec with name ("
> >                 + compressionCodec + "). Compression disabled. Exception follows.", var5);
> >     } catch (IOException io) {
> >         logger.warn("Could not open dataFileWriter. Exception follows.", io);
> >     }
> > }
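> > 
> > (For context: DataFileWriter.append() throws "not open" unless create(...)
> > or appendTo(...) has been called on the writer first, which is why adding
> > the create call fixes the exception from my original message below.)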
> > 
> > After this, the files are getting created in HDFS just right.
> > I was also able to view the files in Spark using the spark-avro package.
> > Hope this is the right way to do it and that the solution helps someone.
> > Would love to hear if anyone in the Avro or Flume community knows of a
> > better way to do it.
> > regards
> > Sunita
> > 
> > 
> > On Tue, Jul 26, 2016 at 12:45 PM, Sunita Arvind <sunitarvind@gmail.com>
> > wrote:
> > 
> > > Hello Experts,
> > > 
> > > I am trying to convert a custom data source received in Flume into Avro
> > > and push it to HDFS. What I am attempting to do is:
> > > syslog -> flume -> flume interceptor that converts into
> > > avroObject.toByteArray -> hdfs serializer that decodes the byteArray back
> > > to Avro
> > > 
> > > The flume configuration looks like:
> > > 
> > > tier1.sources.syslogsource.interceptors.i2.type=timestamp
> > > tier1.sources.syslogsource.interceptors.i2.preserveExisting=true
> > > tier1.sources.syslogsource.interceptors.i1.dataSourceType=DataSource1
> > > tier1.sources.syslogsource.interceptors.i1.type =
> > > com.flume.CustomToAvroConvertInterceptor$Builder
> > > 
> > > #hdfs sink for archival and batch analysis
> > > tier1.sinks.hdfssink.type = hdfs
> > > tier1.sinks.hdfssink.hdfs.writeFormat = Text
> > > tier1.sinks.hdfssink.hdfs.fileType = DataStream
> > > 
> > > tier1.sinks.hdfssink.hdfs.filePrefix=%{flumeHost}-%{host}%{customerId}-%Y%m%d-%H
> > > tier1.sinks.hdfssink.hdfs.inUsePrefix=_
> > > tier1.sinks.hdfssink.hdfs.path=/hive/rawavro/customer_id=%{customerId}/date=%Y%m%d/hr=%H
> > > tier1.sinks.hdfssink.hdfs.fileSuffix=.avro
> > > # roll file if it's been 10 * 60 seconds = 600
> > > tier1.sinks.hdfssink.hdfs.rollInterval=600
> > > # roll file if we get 50,000 log lines (~25MB)
> > > tier1.sinks.hdfssink.hdfs.rollCount=0
> > > tier1.sinks.hdfssink.hdfs.batchSize = 100
> > > tier1.sinks.hdfssink.hdfs.rollSize=0
> > > tier1.sinks.hdfssink.serializer=com.flume.RawAvroHiveSerializer$Builder
> > > tier1.sinks.hdfssink.serializer.compressionCodec=snappy
> > > tier1.sinks.hdfssink.channel = hdfsmem
> > > 
> > > When I use tier1.sinks.hdfssink.serializer=avro_event,
> > > I get binary data stored in HDFS, which is
> > > CustomToAvroConvertInterceptor.intercept(event.getBody()).toByteArray();
> > > however, this data cannot be parsed in Hive. As a result, I see all nulls
> > > in the column values.
> > > Based on
> > > https://cwiki.apache.org/confluence/display/AVRO/FAQ#FAQ-HowcanIserializedirectlyto/fromabytearray
> > > all I am doing in RawAvroHiveSerializer.convert is to decode using a
> > > BinaryDecoder.
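> > > (For reference, the decode side of that FAQ looks roughly like the sketch
> > > below; MyRecord stands in for the generated record class and is only
> > > illustrative:)
> > > 
> > > import java.io.IOException;
> > > 
> > > import org.apache.avro.io.BinaryDecoder;
> > > import org.apache.avro.io.DecoderFactory;
> > > import org.apache.avro.specific.SpecificDatumReader;
> > > 
> > > // Turn an Avro-serialized byte[] back into a record object. A bare byte
> > > // array does not carry the writer schema, so the reader must already know
> > > // it (here it comes from the generated MyRecord class).
> > > public static MyRecord fromBytes(byte[] bytes) throws IOException {
> > >     SpecificDatumReader<MyRecord> reader =
> > >         new SpecificDatumReader<MyRecord>(MyRecord.class);
> > >     BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(bytes, null);
> > >     return reader.read(null, decoder);
> > > }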
> > > The exception I get seems to be unrelated to the code itself, hence I am
> > > pasting the stack trace. I will share the code if it is required to
> > > identify the root cause:
> > > 
> > > 2016-07-26 19:15:27,187 ERROR org.apache.flume.SinkRunner: Unable to deliver event. Exception follows.
> > > org.apache.flume.EventDeliveryException: org.apache.avro.AvroRuntimeException: not open
> > >     at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:463)
> > >     at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
> > >     at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
> > >     at java.lang.Thread.run(Thread.java:745)
> > > Caused by: org.apache.avro.AvroRuntimeException: not open
> > >     at org.apache.avro.file.DataFileWriter.assertOpen(DataFileWriter.java:82)
> > >     at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:299)
> > >     at org.apache.flume.serialization.AbstractAvroEventSerializer.write(AbstractAvroEventSerializer.java:108)
> > >     at org.apache.flume.sink.hdfs.HDFSDataStream.append(HDFSDataStream.java:124)
> > >     at org.apache.flume.sink.hdfs.BucketWriter$7.call(BucketWriter.java:550)
> > >     at org.apache.flume.sink.hdfs.BucketWriter$7.call(BucketWriter.java:547)
> > >     at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:679)
> > >     at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
> > > 
> > > I can reproduce this on the local file system as well. In the test case,
> > > I tried opening the file with append=true and still encounter the same
> > > exception.
> > > 
> > > Appreciate any guidance in this regard.
> > > 
> > > regards
> > > Sunita
> > > 
> > 
> > 
> 

