
List:       avro-dev
Subject:    [jira] [Commented] (AVRO-2147) Proto to Avro serialization is unnecessarily slow due to repeated sch
From:       "Doug Cutting (JIRA)" <jira () apache ! org>
Date:       2018-02-21 17:58:00
Message-ID: JIRA.13139267.1518900065000.244505.1519235880186 () Atlassian ! JIRA


    [ https://issues.apache.org/jira/browse/AVRO-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16371757#comment-16371757 ]

Doug Cutting commented on AVRO-2147:
------------------------------------

Patch looks fine to me.  I don't think we need to commit the perf test.

Does anyone object to committing this patch?



> Proto to Avro serialization is unnecessarily slow due to repeated schema creation
> ---------------------------------------------------------------------------------
> 
> Key: AVRO-2147
> URL: https://issues.apache.org/jira/browse/AVRO-2147
> Project: Avro
> Issue Type: Improvement
> Components: java
> Affects Versions: 1.8.1, 1.8.2
> Reporter: Tobi Vollebregt
> Priority: Major
> Labels: java, optimization, protobuf
> Attachments: AVRO-2147.patch, TestProtobufPerf.java, test.proto
> 
> 
> Hi,
> I discovered that proto to avro serialization is unnecessarily slow in certain
> cases due to repeated schema creation. Specifically, the slowness shows up when
> serializing protocol buffer messages that contain nested protocol buffer messages
> that contain enums with many possible values. Profiling showed this is because
> the {{Schema}} objects for the nested message/enum are not cached in this case.
> An example that reproduces this is to add the following to {{test.proto}}:
> {{message Foo {}}
> {{   ...}}
> {{   optional MessageWithLargeEnum bar = 21;}}
> {{}}}
> {{message MessageWithLargeEnum {}}
> {{   optional LargeEnum enum = 1;}}
> {{}}}
> {{enum LargeEnum {}}
> {{   AA = 1;}}
> {{   AB = 2;}}
> {{   AC = 3;}}
> {{   ...}}
> {{   ZZ = 676;}}
> {{}}}
> Then, a test like the following will exhibit the slow behavior:
> {{@Test public void perf() throws Exception {}}
> {{   Foo.Builder builder = Foo.newBuilder();}}
> {{   builder.setInt32(0);}}
> {{   builder.setInt64(2);}}
> {{   builder.setUint32(3);}}
> {{   builder.setUint64(4);}}
> {{   builder.setSint32(5);}}
> {{   builder.setSint64(6);}}
> {{   builder.setFixed32(7);}}
> {{   builder.setFixed64(8);}}
> {{   builder.setSfixed32(9);}}
> {{   builder.setSfixed64(10);}}
> {{   builder.setFloat(1.0F);}}
> {{   builder.setDouble(2.0);}}
> {{   builder.setBool(true);}}
> {{   builder.setString("foo");}}
> {{   builder.setBytes(ByteString.copyFromUtf8("bar"));}}
> {{   builder.setEnum(org.apache.avro.protobuf.Test.A.X);}}
> {{   builder.addIntArray(27);}}
> {{   builder.addSyms(org.apache.avro.protobuf.Test.A.Y);}}
> {{   builder.setBar(MessageWithLargeEnum.newBuilder().setEnum(LargeEnum.AA));}}
> {{   Foo objToConvert = builder.build();}}
> {{   Schema schema = ProtobufData.get().getSchema(Foo.class);}}
> {{   ByteArrayOutputStream bao = new ByteArrayOutputStream();}}
> {{   Encoder e = EncoderFactory.get().binaryEncoder(bao, null);}}
> {{   ProtobufDatumWriter<Foo> w = new ProtobufDatumWriter<Foo>(schema);}}
> {{   GenericDatumReader gdr = new GenericDatumReader(schema, schema);}}
> {{   BinaryDecoder d = null;}}
> {{   long startTime = System.nanoTime();}}
> {{   for (int i = 0; i < 1000000; ++i) {}}
> {{      bao.reset();}}
> {{      w.write(objToConvert, e);}}
> {{      e.flush();}}
> {{      d = DecoderFactory.get().binaryDecoder(bao.toByteArray(), d);}}
> {{      gdr.read(null, d);}}
> {{   }}}
> {{   long endTime = System.nanoTime();}}
> {{   System.out.println("Elapsed: " + (endTime - startTime) / 1000000 + " ms");}}
> {{}}}
> I will attach a patch that optimizes this.
> With the attached patch this test reports a runtime of about 4 seconds, while the
> runtime without the patch is 30+ seconds, so this is a 7.5-8x improvement for this
> particular test enum.
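
The caching idea in the quoted description can be sketched roughly as below. This is a minimal illustration and not the attached patch, which may work differently. The class `SchemaCacheSketch`, the stand-in type `EnumSchema`, and the methods `buildSchema` and `cachedSchema` are all hypothetical names invented for this sketch; none of them are Avro or protobuf APIs. The sketch only shows the general technique: building a schema for a 676-symbol enum is relatively expensive, so it is built once per type name and reused on every later lookup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class SchemaCacheSketch {
    /** Hypothetical stand-in for an Avro enum schema; NOT org.apache.avro.Schema. */
    static final class EnumSchema {
        final String name;
        final List<String> symbols;

        EnumSchema(String name, List<String> symbols) {
            this.name = name;
            this.symbols = symbols;
        }
    }

    /** Counts how often the expensive build path actually runs. */
    static final AtomicInteger builds = new AtomicInteger();

    /** Cache keyed by type name (an assumption; a real cache might key on a descriptor). */
    private static final Map<String, EnumSchema> CACHE = new ConcurrentHashMap<>();

    /** Simulates the costly step: materializes the 676 symbols AA..ZZ. */
    static EnumSchema buildSchema(String name) {
        builds.incrementAndGet();
        List<String> symbols = new ArrayList<>();
        for (char a = 'A'; a <= 'Z'; a++) {
            for (char b = 'A'; b <= 'Z'; b++) {
                symbols.add("" + a + b);
            }
        }
        return new EnumSchema(name, symbols);
    }

    /** Returns the cached schema, building it only on the first request for a name. */
    static EnumSchema cachedSchema(String name) {
        return CACHE.computeIfAbsent(name, SchemaCacheSketch::buildSchema);
    }

    public static void main(String[] args) {
        // Mirrors the shape of the perf loop: many lookups of the same nested enum type.
        for (int i = 0; i < 1_000_000; i++) {
            cachedSchema("LargeEnum");
        }
        System.out.println("schema builds: " + builds.get());
    }
}
```

With the cache in place the build counter stays at 1 no matter how many times the loop runs; calling `buildSchema` directly inside the loop would instead rebuild the symbol list on every iteration, which is the kind of repeated work the report describes.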



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

