[prev in list] [next in list] [prev in thread] [next in thread]
List: flume-user
Subject: Re: Json over netcat source
From: Deepak Subhramanian <deepak.subhramanian () gmail ! com>
Date: 2014-05-09 11:02:11
Message-ID: CA+UubijKn9nUu5vX8k9SFcGCTkfDF8boN59r1E_Kmhy7Nq0_5g () mail ! gmail ! com
Sorry. My mistake. It is loading JSON data properly after the temporary fix.
On Thu, May 8, 2014 at 6:24 PM, Deepak Subhramanian <deepak.subhramanian@gmail.com> wrote:
> Hi Ashish,
>
> Thanks for the solution. I made the changes and I can now see the JSON
> message. A JIRA has already been raised for the same issue:
>
> https://issues.apache.org/jira/browse/FLUME-2126
>
>
> From Hive, when I load JSON data it automatically splits the JSON fields into
> different columns. For some reason the ESSink doesn't load it the same way.
> I am not sure if I am setting the correct type. There is a parameter,
> es.input.json, that I have to set to true in the Hive table. Is there any
> similar variable I have to set for the ESSink?
>
> Here is the raw data I am getting in Kibana.
>
> {
>   "_index": "test-2014-05-08",
>   "_type": "parsed_logs",
>   "_id": "7qSBgRx-Q_GLaCDWARs_Cg",
>   "_score": null,
>   "_source": {
>     "@message": "{\"action\":{\"id\":\"00001\"}}",
>     "@timestamp": "2014-05-08T16:48:44.180Z",
>     "@type": "application/json",
>     "@fields": {
>       "_attachment_mimetype": "application/json",
>       "timestamp": "1399567724180",
>       "_type": "application/json",
>       "type": "application/json"
>     }
>   },
>   "sort": [
>     1399567724180
>   ]
> }
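
The `@message` value in the Kibana output above appears to be stored as a JSON string (note the escaped quotes) rather than as a nested object, which would explain why its keys are not split into separate fields. The minimal Java sketch below has no Elasticsearch dependency and simply contrasts the two encodings. The class and method names (`JsonEmbedding`, `asStringField`, `asRawField`) are hypothetical and only illustrate the difference; they are not how the ES sink serializes events.

```java
// Minimal sketch (no Elasticsearch dependency): contrasts storing a JSON
// payload as a string value versus splicing it in as a nested object.
// All names here are hypothetical, for illustration only.
final class JsonEmbedding {

    // Escape backslashes and quotes so the value is a valid JSON string literal.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Payload stored as a JSON *string*: its quotes get escaped and the
    // whole payload is one opaque text value.
    static String asStringField(String name, String json) {
        return "{" + quote(name) + ":" + quote(json) + "}";
    }

    // Payload spliced in verbatim: its keys become real nested fields.
    // Assumes `json` is already valid JSON.
    static String asRawField(String name, String json) {
        return "{" + quote(name) + ":" + json + "}";
    }

    public static void main(String[] args) {
        String body = "{\"action\":{\"id\":\"00001\"}}";
        System.out.println(asStringField("@message", body));
        System.out.println(asRawField("@message", body));
    }
}
```

Running `main` prints the escaped form first (the same shape as the `@message` line above) and the nested form second; only the nested form exposes `action` and `id` as fields of their own.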
>
>
>
> On Sun, Apr 13, 2014 at 4:56 PM, Ashish <paliwalashish@gmail.com> wrote:
>
> > A little more on the issue:
> >
> > builder.field(fieldName, tmp); calls the XContentBuilder API, which
> > determines the class type of the value and calls the appropriate method.
> > Since tmp, an instance of XContentBuilder, doesn't match any of the
> > defined if conditions, it falls through to the final else, where
> > tmp.toString() is called and its result is passed to field(String,
> > String), so we get the object address in the index.
> >
> > Replacing
> > builder.field(fieldName, tmp);
> > with
> > builder.field(fieldName, tmp.string());
> >
> > should make things work, but I am not sure this is the best way to use
> > the API.
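
The fall-through described above can be reproduced without Elasticsearch using a toy builder. `MiniBuilder` below is a hypothetical, heavily simplified model of an `instanceof`-based `field(String, Object)` dispatcher, not the real `XContentBuilder` source: passing a nested builder hits the final else and stores the default `Object.toString()` text, while passing `tmp.string()` stores the rendered JSON.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, heavily simplified model of an instanceof-based dispatcher.
// This is NOT the Elasticsearch XContentBuilder source.
final class MiniBuilder {
    private final List<String> parts = new ArrayList<>();

    // Escape backslashes and quotes so a value is a valid JSON string.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Typed overload: the value is written as a quoted JSON string.
    MiniBuilder field(String name, String value) {
        parts.add("\"" + name + "\":\"" + escape(value) + "\"");
        return this;
    }

    // Generic overload: chooses a branch from the value's runtime class.
    MiniBuilder field(String name, Object value) {
        if (value instanceof String) {
            return field(name, (String) value);
        } else if (value instanceof Number || value instanceof Boolean) {
            parts.add("\"" + name + "\":" + value);
            return this;
        } else {
            // Final else: MiniBuilder matches no branch above and does not
            // override toString(), so Object's "ClassName@hash" text is stored.
            return field(name, value.toString());
        }
    }

    // Counterpart of tmp.string(): render this builder's content as JSON.
    String string() {
        return "{" + String.join(",", parts) + "}";
    }

    public static void main(String[] args) {
        MiniBuilder tmp = new MiniBuilder().field("id", "00001");
        // Like builder.field(fieldName, tmp): resolves to field(String, Object).
        System.out.println(new MiniBuilder().field("action", tmp).string());
        // Like builder.field(fieldName, tmp.string()): field(String, String).
        System.out.println(new MiniBuilder().field("action", tmp.string()).string());
    }
}
```

In this model the workaround stores the nested JSON as an escaped string value rather than as an object, which matches the `@message` value in the Kibana output earlier in this thread.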
> >
> > Got the answer from the ES user list :)
> >
> > http://elasticsearch-users.115913.n3.nabble.com/Issue-with-posting-json-data-to-elastic-search-via-Flume-td4054017.html
> >
> > Can ES experts comment on the best way forward?
> >
> >
> >
> > On Sun, Apr 13, 2014 at 8:10 PM, Ashish <paliwalashish@gmail.com> wrote:
> >
> > > Have been able to reproduce the problem locally using the existing test
> > > cases inside ES Sink. The problem does exist.
> > >
> > > I did some initial investigation: the framework is able to detect the
> > > JSON content and tries to add it as a complex field. The timestamp is
> > > added only if it is present in the header.
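
A rough sketch of the two behaviours just described, assuming a body counts as JSON when its first non-whitespace byte is `{`, and that the timestamp comes from a header named `timestamp` (the key Flume's timestamp interceptor sets). `EventFieldPlanner` and its methods are hypothetical; the real sink relies on Elasticsearch's own content-type detection, which is more thorough than this check.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the two behaviours described above; not Flume code.
final class EventFieldPlanner {

    // Simplified stand-in for content-type sniffing (assumption): treat the
    // body as JSON if its first non-whitespace byte opens an object.
    static boolean looksLikeJson(byte[] body) {
        for (byte b : body) {
            if (b == ' ' || b == '\t' || b == '\r' || b == '\n') {
                continue; // skip leading whitespace
            }
            return b == '{';
        }
        return false; // empty or all-whitespace body
    }

    // Records how each part of an event would be added. The header name
    // "timestamp" is an assumption (the key set by Flume's timestamp interceptor).
    static Map<String, String> plan(byte[] body, Map<String, String> headers) {
        Map<String, String> plan = new LinkedHashMap<>();
        plan.put("@message", looksLikeJson(body) ? "complex field" : "simple field");
        String ts = headers.get("timestamp");
        if (ts != null) {
            plan.put("@timestamp", ts); // added only when the header is present
        }
        return plan;
    }
}
```

Here only the shape of the body decides between a complex and a simple field, and an event without a `timestamp` header simply gets no timestamp entry.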
> > >
> > > In the class org.apache.flume.sink.elasticsearch.ContentBuilderUtil:
> > >
> > > public static void addComplexField(XContentBuilder builder, String fieldName,
> > >     XContentType contentType, byte[] data) throws IOException {
> > >   XContentParser parser = null;
> > >   try {
> > >     XContentBuilder tmp = jsonBuilder();
> > >     parser = XContentFactory.xContent(contentType).createParser(data);
> > >     parser.nextToken();
> > >     tmp.copyCurrentStructure(parser);
> > >     builder.field(fieldName, tmp); // <<<< This is where we might have an issue
> > >                                    // (the real action happens inside this call)
> > >     // ... (remainder of the method not quoted)
> > >
> > > Can someone familiar with this part look further into this? I shall
> > > debug further as soon as I have free cycles.
> > >
> > > thanks
> > > ashish
> > >
> > >
> > >
> > > On Fri, Apr 11, 2014 at 5:24 PM, Deepak Subhramanian <deepak.subhramanian@gmail.com> wrote:
> > >
> > > > Thanks Simon. I am also struggling with this, with no luck. I tried
> > > > the latest Flume Elasticsearch sink jar built from 1.5-SNAPSHOT, but
> > > > still no luck. I will check whether it is an issue with the
> > > > Elasticsearch API. When I loaded JSON using Hive, it loaded properly,
> > > > but we have to pass the property es.input.json in Hive. Is there a way
> > > > to pass the same in Flume?
> > > >
> > > > CREATE EXTERNAL TABLE json (data STRING)
> > > > STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
> > > > TBLPROPERTIES('es.resource' = '...',
> > > >               'es.input.json' = 'yes');
> > > >
> > >
> > >
> > > --
> > > thanks
> > > ashish
> > >
> > > Blog: http://www.ashishpaliwal.com/blog
> > > My Photo Galleries: http://www.pbase.com/ashishpaliwal
> > >
> >
> >
> >
> > --
> > thanks
> > ashish
> >
> > Blog: http://www.ashishpaliwal.com/blog
> > My Photo Galleries: http://www.pbase.com/ashishpaliwal
> >
>
>
>
> --
> Deepak Subhramanian
>
--
Deepak Subhramanian