List:       avro-user
Subject:    Re: Converting SpecificRecord to GenericRecord of different Schema Versions
From:       Scott Reynolds <sreynolds () twilio ! com>
Date:       2021-12-06 15:39:48
Message-ID: CAFk0wff_87af2aEDLSYL0rHkRmLm9RUAJwKZ4v2NEten6awhsQ () mail ! gmail ! com

I wrote code to do this and it is used in production at high volume.

It is a recursive implementation: it walks the desired schema field by field,
descending into nested records, and gets each field from the original object
by name.

The tricky part is dealing with enumerations (as you noted). The code used
to make that translation is below; fieldSchema is the schema of the desired
enum field in the GenericRecord and unknownValue is the Enum from the
SpecificRecord:

new GenericData.EnumSymbol(fieldSchema, unknownValue.toString().toUpperCase());

Fixed also required a special branch:

new GenericData.Fixed(fieldSchema, (byte[])unknownValue);

And finally logical types required the use of conversions:

Conversions.convertToRawType(unknownValue, fieldSchema,
fieldSchema.getLogicalType(),
conversions.get(fieldSchema.getLogicalType().getName()));
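
The approach described above might be sketched roughly as follows. This is
not the production code referred to in this message: RecordProjector, project,
convert and singleNonNullBranch are hypothetical names. It assumes Avro 1.9 or
later on the classpath, registers only the timestamp-millis conversion, and
handles only nullable unions. Unlike the enum snippet above, it does not
upper-case the symbol name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.avro.Conversion;
import org.apache.avro.Conversions;
import org.apache.avro.LogicalType;
import org.apache.avro.Schema;
import org.apache.avro.data.TimeConversions;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericFixed;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.generic.IndexedRecord;

/** Hypothetical helper illustrating the by-name, schema-driven copy. */
public final class RecordProjector {
  // Logical type name -> conversion, mirroring the `conversions` lookup above.
  private final Map<String, Conversion<?>> conversions = new HashMap<>();

  public RecordProjector() {
    // Assumption: only timestamp-millis (java.time.Instant) is registered here.
    Conversion<?> instant = new TimeConversions.TimestampMillisConversion();
    conversions.put(instant.getLogicalTypeName(), instant);
  }

  /** Builds a GenericRecord of schema `target` from `source`, matching fields by name. */
  public GenericRecord project(IndexedRecord source, Schema target) {
    GenericRecord out = new GenericData.Record(target);
    for (Schema.Field field : target.getFields()) {
      Schema.Field sourceField = source.getSchema().getField(field.name());
      Object value = sourceField == null
          ? GenericData.get().getDefaultValue(field) // throws if the field has no default
          : convert(source.get(sourceField.pos()), field.schema());
      out.put(field.pos(), value);
    }
    return out;
  }

  private Object convert(Object value, Schema schema) {
    if (value == null) {
      return null;
    }
    // Logical types: turn e.g. an Instant back into the raw long.
    LogicalType logicalType = schema.getLogicalType();
    if (logicalType != null) {
      Conversion<?> conversion = conversions.get(logicalType.getName());
      if (conversion != null && conversion.getConvertedType().isInstance(value)) {
        return Conversions.convertToRawType(value, schema, logicalType, conversion);
      }
    }
    switch (schema.getType()) {
      case RECORD:
        return project((IndexedRecord) value, schema);
      case ENUM:
        return new GenericData.EnumSymbol(schema, value.toString());
      case FIXED: {
        byte[] bytes = value instanceof GenericFixed
            ? ((GenericFixed) value).bytes()
            : (byte[]) value;
        return new GenericData.Fixed(schema, bytes);
      }
      case ARRAY: {
        List<Object> items = new ArrayList<>();
        for (Object item : (Iterable<?>) value) {
          items.add(convert(item, schema.getElementType()));
        }
        return items;
      }
      case MAP: {
        Map<String, Object> entries = new HashMap<>();
        for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
          entries.put(e.getKey().toString(), convert(e.getValue(), schema.getValueType()));
        }
        return entries;
      }
      case UNION:
        return convert(value, singleNonNullBranch(schema));
      default:
        return value; // primitives, strings and bytes are passed through unchanged
    }
  }

  // Simplification: only unions of the form [null, X] (in either order) are handled.
  private static Schema singleNonNullBranch(Schema union) {
    Schema found = null;
    for (Schema branch : union.getTypes()) {
      if (branch.getType() == Schema.Type.NULL) {
        continue;
      }
      if (found != null) {
        throw new UnsupportedOperationException("Only nullable unions are handled: " + union);
      }
      found = branch;
    }
    if (found == null) {
      throw new UnsupportedOperationException("No non-null branch in: " + union);
    }
    return found;
  }
}
```

With a generated SpecificRecord in hand, the call would look something like
new RecordProjector().project(specificRecord, olderSchema). Fields that exist
only in the source schema are dropped, and fields that exist only in the
target schema are filled from their declared defaults.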

On Mon, Dec 6, 2021, 4:32 AM Martin Grigorov <mgrigorov@apache.org> wrote:

> Hi,
>
> You will need to write a new org.apache.avro.io.Encoder.
> If you succeed in making it, then please share it with the community via a
> Pull Request!
> If you don't - please create an issue and we will try to help!
>
> On Sun, Dec 5, 2021 at 6:31 PM KV 59 <kvajjala59@gmail.com> wrote:
>
>> Hi All,
>>
>> Here is my situation: I have a SpecificRecord for a Schema S and I want
>> to convert it into a GenericRecord of a compatible Schema T (an older
>> version of S). I have seen many such examples, but all strategies point to
>> serializing the SpecificRecord to either bytes or JSON and deserializing
>> back to the GenericRecord. This seems inefficient, especially if the
>> records are huge and in a high-volume streaming scenario like mine.
>>
>> I cannot simply cast the SpecificRecord to GenericRecord because of some
>> type incompatibilities like Enums and Instants.
>>
>> I have been looking at the SpecificRecordDatumWriter/Reader sources and
>> trying to build a Mapper which just sets the value in the GenericRecord, but
>> I cannot write such a mapper without the help of the protected and
>> private methods in them.
>>
>> There is the same problem when converting a POJO to a GenericRecord as well.
>>
>>
>> Appreciate your inputs and recommendations
>>
>> Thanks
>> Kishore



