List:       avro-user
Subject:    Re: Schema import dependencies
From:       Doug Cutting <cutting () apache ! org>
Date:       2014-05-28 22:54:54
Message-ID: CALEq1Z-VnpWBYzoEJoMiJs6CQNbBdnhZwYBxaPyKRjXPx7q5ig () mail ! gmail ! com

IDL is a language-independent way to merge two schema files into one
standalone schema file.

Doug


On Wed, May 28, 2014 at 3:40 PM, Wai Yip Tung <wy@tungwaiyip.info> wrote:

> Let's say we want to keep 2 schema files because they come from 2
> separate organizations. When we generate a data file, they need to be merged
> into one standalone schema. The maven plugin does this; otherwise we have
> to merge them ourselves. That is not too hard. I just want to make sure
> I'm not missing some existing tool or API.
>
> Wai Yip
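Merging by hand amounts to replacing the bare type name with the full definition at its first use, which yields the standalone form Doug describes in the reply quoted below. A minimal Python sketch, assuming both schemas are already loaded as plain dicts (e.g. with `json.load`); `inline_named_types` is a hypothetical helper, not part of the Avro Python API, and the sample schemas are cut down to a field or two.

```python
import json
from typing import Any, Dict


def inline_named_types(schema: Any, definitions: Dict[str, Any]) -> Any:
    """Return a copy of `schema` where the first reference to each name in
    `definitions` is replaced by that full definition.

    Hypothetical helper, not an Avro API. Simplifications: names are matched
    as plain strings (namespaces are not resolved), and later references stay
    as bare names, because a name may only be defined once per schema.
    """
    inlined = set()

    def walk(node: Any) -> Any:
        if isinstance(node, str):
            if node in definitions and node not in inlined:
                inlined.add(node)
                return walk(definitions[node])  # the definition may refer to others
            return node
        if isinstance(node, list):  # a union, or a record's list of fields
            return [walk(item) for item in node]
        if isinstance(node, dict):
            # Only these keys hold schemas; "name", "default", "doc" etc. are copied as-is.
            return {key: walk(value) if key in ("type", "items", "values", "fields") else value
                    for key, value in node.items()}
        return node

    return walk(schema)


mailing_address = {"type": "record", "name": "mailing_address",
                   "fields": [{"name": "zip", "type": "string"}]}
user_info = {"type": "record", "name": "userInfo", "namespace": "my.example",
             "fields": [{"name": "username", "type": "string"},
                        {"name": "address", "type": "mailing_address"}]}

merged = inline_named_types(user_info, {"mailing_address": mailing_address})
print(json.dumps(merged, indent=2))
```

Once inlined, `mailing_address` has no namespace of its own, so under the Avro naming rules it picks up the enclosing `my.example` namespace; anything referring to it by full name would then use `my.example.mailing_address`.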
>
>   Doug Cutting <cutting@apache.org>
>  Wednesday, May 28, 2014 12:09 PM
> Your userInfo.avsc is not a standalone schema since it depends on
> mailing_address already being defined.  A schema included in a data file is
> always standalone, and would include the mailing_address schema definition
> within the userInfo schema's "address" field.
>
> Some tools will process such non-standalone schemas in separate files.
>  For example, the Java schema compiler will accept multiple schema files on
> the command line, and those later on the command line may reference types
> defined earlier.  Java's maven tasks also permit references to other files,
> but these are probably not of interest to a Python developer.
>
> The IDL tool uses the JVM as its runtime but is not Java-specific.
>
> Doug
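Both situations above can be checked mechanically: a schema is standalone when every named type it references is also defined inside it, and a command-line-style sequence of files is acceptable when each file only references names defined in itself or in an earlier file. A rough Python sketch with stated simplifications: names are compared as plain strings without namespace qualification, ordering inside a single schema is not checked, and `scan`, `undefined_references` and `check_in_order` are hypothetical helpers, not part of any Avro library.

```python
from typing import Any, List, Set

PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}
NAMED = ("record", "error", "enum", "fixed")


def scan(node: Any, defined: Set[str], referenced: List[str]) -> None:
    """Collect named-type definitions and bare name references in `node`."""
    if isinstance(node, str):
        if node not in PRIMITIVES:
            referenced.append(node)
    elif isinstance(node, list):  # a union
        for branch in node:
            scan(branch, defined, referenced)
    elif isinstance(node, dict):
        kind = node.get("type")
        if kind in NAMED:
            defined.add(node["name"])
            for field in node.get("fields", []):  # enums and fixed have no fields
                scan(field["type"], defined, referenced)
        elif kind == "array":
            scan(node["items"], defined, referenced)
        elif kind == "map":
            scan(node["values"], defined, referenced)
        else:  # e.g. {"type": "string"} or {"type": "mailing_address"}
            scan(kind, defined, referenced)


def undefined_references(schema: Any) -> Set[str]:
    """Names used but not defined inside `schema`; empty means standalone."""
    defined: Set[str] = set()
    referenced: List[str] = []
    scan(schema, defined, referenced)
    return set(referenced) - defined


def check_in_order(schemas: List[Any]) -> Set[str]:
    """Accept schemas in order, like files listed on a command line: each may
    use names defined earlier, never later. Returns all defined names and
    raises ValueError on a reference that is not yet defined."""
    known: Set[str] = set()
    for position, schema in enumerate(schemas):
        defined: Set[str] = set()
        referenced: List[str] = []
        scan(schema, defined, referenced)
        missing = set(referenced) - defined - known
        if missing:
            raise ValueError(f"schema #{position} uses undefined {sorted(missing)}")
        known |= defined
    return known


mailing_address = {"type": "record", "name": "mailing_address",
                   "fields": [{"name": "zip", "type": "string"}]}
user_info = {"type": "record", "name": "userInfo", "namespace": "my.example",
             "fields": [{"name": "address", "type": "mailing_address"}]}

print(undefined_references(user_info))                       # {'mailing_address'}
print(sorted(check_in_order([mailing_address, user_info])))  # ['mailing_address', 'userInfo']
```

Passing the same two schemas in the opposite order raises `ValueError`, mirroring the rule that later files may reference earlier ones but not the reverse.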
>
>
>
>   Wai Yip Tung <wy@tungwaiyip.info>
>  Wednesday, May 28, 2014 11:53 AM
> I want to extend this question somewhat. I'm beginning to realize that Avro
> can compose a schema from user-defined types. I want to check whether I
> understand this correctly, and also the proper way to use it.
>
> I took a single, two-level nested schema from the web (see "using an
> embedded record"):
>
> http://docs.oracle.com/cd/E26161_02/html/GettingStartedGuide/avroschemas.html
>
> I broke it down into two separate records, the main `userInfo` record and
> the embedded `mailing_address` record, as two separate JSON objects.
>
>
> ------------------------------------------------------------------------
> userInfo.avsc
>
> {
>   "type" : "record",
>   "name" : "userInfo",
>   "namespace" : "my.example",
>   "fields" : [
>     {"name" : "username",
>      "type" : "string",
>      "default" : "NONE"},
>
>     {"name" : "age",
>      "type" : "int",
>      "default" : -1},
>
>     {"name" : "phone",
>      "type" : "string",
>      "default" : "NONE"},
>
>     {"name" : "housenum",
>      "type" : "string",
>      "default" : "NONE"},
>
>     {"name" : "address",
>      "type" : "mailing_address",   <--- user defined type
>      "default" : "NONE"}
>   ]
> }
>
> ------------------------------------------------------------------------
> mailing_address.avsc
>
> {
>  "type" : "record",
>  "name" : "mailing_address",                 <--- defined here
>  "fields" : [
>     {"name" : "street",
>      "type" : "string",
>      "default" : "NONE"},
>
>     {"name" : "city",
>      "type" : "string",
>      "default" : "NONE"},
>
>     {"name" : "state_prov",
>      "type" : "string",
>      "default" : "NONE"},
>
>     {"name" : "country",
>      "type" : "string",
>      "default" : "NONE"},
>
>     {"name" : "zip",
>      "type" : "string",
>      "default" : "NONE"}
>  ]
> }
> ------------------------------------------------------------------------
>
> Is this a valid composite avro schema definition?
>
> The second question is how we can actually use this in practice. If we
> have two separate files, is there a standard API that loads them both?
> Hrishikesh P mentioned the Avro maven plugin. I mainly use the Python API,
> so I am unfamiliar with it. Does a comparable API exist?
>
> I understand the IDL form has explicit linking of schema files. I will
> look into it next.
>
> Wai Yip
>
>
>   Doug Cutting <cutting@apache.org>
>  Thursday, May 22, 2014 2:57 PM
> You might instead use Avro IDL to define your schemas. It permits you to
> define multiple schemas in a single file, so that you can determine
> the order they're defined in. It also permits ordered inclusion of
> types from other files, both IDL files and schema files.
>
> Doug
>
> On Thu, May 22, 2014 at 10:46 AM, Hrishikesh P
>
>   Hrishikesh P <hrishi.engineer@gmail.com>
>  Thursday, May 22, 2014 10:46 AM
> I have a few Avro schemas that I am generating code from using the
> Avro maven plugin. There are dependencies between the schemas, which I was
> able to resolve by putting the schemas in separate folders and/or prefixing
> the schema file names with 01-, 02-, etc. so that the dependencies get
> compiled first. However, this only works on Mac, not on RHEL (probably
> because directories are read differently on each?). Does anybody know
> the best way to handle schema dependencies? If I list individual schema
> files in the imports section of the POM, the schemas get compiled, but I
> have listed the folders instead and would like to avoid listing
> individual files if possible.
>
> Here's a related issue: https://issues.apache.org/jira/browse/AVRO-1367
>
> Thanks in advance.
>
>
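One way to stop depending on directory listing order is to compute the order from the dependencies themselves. Directory listing APIs such as Java's `File.list` and Python's `os.listdir` make no promise about ordering, which may explain why numbered prefixes behaved differently on Mac and RHEL. A minimal sketch using Python's standard `graphlib` module (Python 3.9+), assuming the file-level dependencies have already been extracted; the `deps` mapping and the file `order.avsc` are hypothetical, and this illustrates the ordering idea only, not a feature of the Avro maven plugin.

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set


def compile_order(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Order schema files so that each file comes after the files it depends on.

    `dependencies` maps a file name to the set of files defining the types it
    uses. Files that become ready at the same time are emitted alphabetically,
    so the result does not depend on how a directory happens to be listed.
    Raises CycleError if the files depend on each other circularly.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # detects cycles and raises CycleError
    order: List[str] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # deterministic tie-breaking
        order.extend(ready)
        sorter.done(*ready)
    return order


# Hypothetical example: order.avsc is an invented third file.
deps = {
    "userInfo.avsc": {"mailing_address.avsc"},
    "mailing_address.avsc": set(),
    "order.avsc": {"userInfo.avsc", "mailing_address.avsc"},
}
print(compile_order(deps))  # ['mailing_address.avsc', 'userInfo.avsc', 'order.avsc']
```

Breaking ties alphabetically is what makes the result reproducible across machines; a plain topological sort alone would still allow several valid orders.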
