
List:       flume-user
Subject:    Re: Using Python and Flume to store avro data
From:       Bart Verwilst <lists () verwilst ! be>
Date:       2012-11-16 10:54:15
Message-ID: 13752e7c3c0b4eddfbc26f67283a69f1 () verwilst ! be

Hello, 

You send Avro to Flume, but how is it stored? I would like to end up
with Avro files in HDFS, not sequence files containing JSON or
something. Not sure if that's possible? Basically and conceptually, I
want to query my MySQL database and write that data to Avro files in
HDFS. I can't use Sqoop, because for every row of table X, I have an
extra array of rows from table Y that is included in the same Avro
record. The idea is to create a fairly continuous flow from MySQL into
HDFS. 

This is how I would like to store it in HDFS (Avro schema): 

{
  "type": "record",
  "name": "trace",
  "namespace": "asp",
  "fields": [
    { "name": "id", "type": "long" },
    { "name": "timestamp", "type": "long" },
    { "name": "terminalid", "type": "int" },
    { "name": "mileage", "type": ["int", "null"] },
    { "name": "creationtime", "type": "long" },
    { "name": "type", "type": "int" },
    { "name": "properties", "type": {
      "type": "array",
      "items": {
        "name": "property",
        "type": "record",
        "fields": [
          { "name": "id", "type": "long" },
          { "name": "value", "type": "string" },
          { "name": "key", "type": "string" }
        ]
      }
    }}
  ]
}

How do you suggest I go about this (knowing my Java foo is very limited ;) )? 
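On the Flume side, the HDFS sink does not have to write SequenceFiles. A hedged sketch of the relevant sink properties (agent and sink names are invented; note that the built-in `avro_event` serializer writes Avro container files in Flume's own event schema, not a custom one, so the serializer options of your Flume version need checking for custom-schema output):

agent1.sinks.hdfs1.type = hdfs
agent1.sinks.hdfs1.hdfs.path = hdfs://namenode/flume/traces
# DataStream skips the SequenceFile wrapper; the serializer decides the format
agent1.sinks.hdfs1.hdfs.fileType = DataStream
agent1.sinks.hdfs1.hdfs.fileSuffix = .avro
agent1.sinks.hdfs1.serializer = avro_event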

Thanks! 

Kind regards, 

Bart 

Andrew Jones wrote on 13.11.2012 10:28: 

> We also use Thrift to send from multiple languages, but have written
> a custom source to accept the messages. 
> 
> Writing a custom source was quite easy. Start by looking at the code
> for ThriftLegacySource and AvroSource. 
> 
> Andrew 
> 
> On 12 November 2012 19:52, Camp, Roy <rcamp@ebay.com> wrote:
> 
>> We use Thrift to send from Python, PHP & Java. Unfortunately, with
>> Flume-NG you must use the legacyThrift source, which works well but
>> does not handle a confirmation/ack back to the app. We have found
>> that failures usually result in a connection exception, allowing us
>> to reconnect and retry, so we have virtually no data loss.
>> Everything downstream from that localhost Flume instance (after it
>> is written to the file channel) is E2E safe.
>> 
>> Roy
>> 
>> -----Original Message-----
>> From: Juhani Connolly [mailto:juhani_connolly@cyberagent.co.jp]
>> Sent: Thursday, November 08, 2012 5:46 PM
>> To: user@flume.apache.org
>> Subject: Re: Using Python and Flume to store avro data
>> 
>> Hi Bart,
>> 
>> We send data from Python to the scribe source and it works fine. We
>> had everything set up in scribe before, which made the switchover
>> simple. If you don't mind the extra overhead of HTTP, go for that,
>> but if you want to keep things to a minimum, using the scribe source
>> can be viable.
>> 
>> You can't send data to the avro source, because the Python support
>> in avro is missing the appropriate encoder (I can't remember what it
>> was, I'd have to check over the code again).
>> 
>> On 11/09/2012 03:45 AM, Bart Verwilst wrote:
>> > Hi,
>> >
>> > I've been spending quite a few hours trying to push avro data to
>> > Flume so I can store it on HDFS, all of this with Python.
>> > It seems like something that is impossible for now, since the only
>> > way to push avro data to Flume is through the deprecated thrift
>> > bindings, which look pretty cumbersome to get working.
>> > I would like to know the best way to import avro data into Flume
>> > with Python. Maybe Flume isn't the right tool and I should use
>> > something else? My goal is to have multiple Python workers pushing
>> > data to HDFS, which (by means of Flume in this case) consolidates
>> > this all into one file there.
>> >
>> > Any thoughts?
>> >
>> > Thanks!
>> >
>> > Bart
>> >
>> >
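Juhani's suggestion of HTTP above maps to Flume's HTTP source, whose default JSONHandler accepts a JSON array of events, each with `headers` and `body` fields. A minimal Python sketch of building such a batch; the host, port, and a running HTTP source are assumptions:

```python
import json

def make_batch(bodies):
    """Wrap raw string bodies in the structure Flume's JSONHandler expects."""
    return json.dumps([{"headers": {}, "body": body} for body in bodies])

payload = make_batch(['{"id": 1}', '{"id": 2}'])

# Shipping it requires an agent with an HTTP source listening, e.g.:
#   import urllib.request
#   req = urllib.request.Request("http://flume-host:5140", payload.encode(),
#                                {"Content-Type": "application/json"})
#   urllib.request.urlopen(req)
```

Per-event headers (e.g. a timestamp for HDFS path escaping) would go in each event's `headers` dict.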
 