List: avro-dev
Subject: [jira] [Commented] (AVRO-2147) Proto to Avro serialization is unnecessarily slow due to repeated schema creation
From: "Doug Cutting (JIRA)" <jira () apache ! org>
Date: 2018-02-21 17:58:00
Message-ID: JIRA.13139267.1518900065000.244505.1519235880186 () Atlassian ! JIRA
[ https://issues.apache.org/jira/browse/AVRO-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16371757#comment-16371757 ]
Doug Cutting commented on AVRO-2147:
------------------------------------
Patch looks fine to me. I don't think we need to commit the perf test.
Does anyone object to committing this patch?
> Proto to Avro serialization is unnecessarily slow due to repeated schema creation
> ---------------------------------------------------------------------------------
>
> Key: AVRO-2147
> URL: https://issues.apache.org/jira/browse/AVRO-2147
> Project: Avro
> Issue Type: Improvement
> Components: java
> Affects Versions: 1.8.1, 1.8.2
> Reporter: Tobi Vollebregt
> Priority: Major
> Labels: java, optimization, protobuf
> Attachments: AVRO-2147.patch, TestProtobufPerf.java, test.proto
>
>
> Hi,
> I discovered that proto-to-Avro serialization is unnecessarily slow in certain
> cases due to repeated schema creation. Specifically, the slowdown shows up when
> serializing protocol buffer messages that contain nested protocol buffer messages,
> which in turn contain enums with many possible values. Some profiling showed that
> this is because the {{Schema}} objects for the nested message/enum are not cached
> in this case. An example that reproduces this is to add the following to {{test.proto}}:
> {code}
> message Foo {
>   ...
>   optional MessageWithLargeEnum bar = 21;
> }
>
> message MessageWithLargeEnum {
>   optional LargeEnum enum = 1;
> }
>
> enum LargeEnum {
>   AA = 1;
>   AB = 2;
>   AC = 3;
>   ...
>   ZZ = 676;
> }
> {code}
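The elided values of {{LargeEnum}} follow a simple pattern: every two-letter uppercase name from AA to ZZ, numbered 1 through 26 × 26 = 676. Writing them out by hand is tedious, so a small generator like the one below may help when reproducing the issue. It is a hypothetical helper (the class and method names are made up for illustration and it is not one of the issue's attachments), assuming exactly the AA = 1 ... ZZ = 676 numbering shown above.

```java
// Hypothetical helper, not part of the AVRO-2147 attachments: prints the
// LargeEnum definition from test.proto, numbering AA = 1 through ZZ = 676.
public class LargeEnumGenerator {

    /** Builds the enum definition with all 26 * 26 two-letter constants. */
    static String largeEnumDefinition() {
        StringBuilder sb = new StringBuilder("enum LargeEnum {\n");
        int value = 1; // numbering starts at 1, matching AA = 1 above
        for (char first = 'A'; first <= 'Z'; first++) {
            for (char second = 'A'; second <= 'Z'; second++) {
                sb.append("  ").append(first).append(second)
                  .append(" = ").append(value++).append(";\n");
            }
        }
        return sb.append("}\n").toString();
    }

    public static void main(String[] args) {
        // Paste the output into test.proto in place of the elided enum.
        System.out.print(largeEnumDefinition());
    }
}
```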
> Then, a test like the following will exhibit the slow behavior:
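> {code:java}
> // Imports assumed for this snippet; Foo, MessageWithLargeEnum and LargeEnum are
> // generated from test.proto into the outer class org.apache.avro.protobuf.Test.
> import java.io.ByteArrayOutputStream;
> import com.google.protobuf.ByteString;
> import org.apache.avro.Schema;
> import org.apache.avro.generic.GenericDatumReader;
> import org.apache.avro.io.BinaryDecoder;
> import org.apache.avro.io.DecoderFactory;
> import org.apache.avro.io.Encoder;
> import org.apache.avro.io.EncoderFactory;
> import org.apache.avro.protobuf.ProtobufData;
> import org.apache.avro.protobuf.ProtobufDatumWriter;
> import org.apache.avro.protobuf.Test.Foo;
> import org.apache.avro.protobuf.Test.LargeEnum;
> import org.apache.avro.protobuf.Test.MessageWithLargeEnum;
> import org.junit.Test;
>
> @Test public void perf() throws Exception {
>   // Populate the scalar, enum and repeated fields plus the nested message holding the large enum.
>   Foo.Builder builder = Foo.newBuilder();
>   builder.setInt32(0);
>   builder.setInt64(2);
>   builder.setUint32(3);
>   builder.setUint64(4);
>   builder.setSint32(5);
>   builder.setSint64(6);
>   builder.setFixed32(7);
>   builder.setFixed64(8);
>   builder.setSfixed32(9);
>   builder.setSfixed64(10);
>   builder.setFloat(1.0F);
>   builder.setDouble(2.0);
>   builder.setBool(true);
>   builder.setString("foo");
>   builder.setBytes(ByteString.copyFromUtf8("bar"));
>   builder.setEnum(org.apache.avro.protobuf.Test.A.X);
>   builder.addIntArray(27);
>   builder.addSyms(org.apache.avro.protobuf.Test.A.Y);
>   builder.setBar(MessageWithLargeEnum.newBuilder().setEnum(LargeEnum.AA));
>   Foo objToConvert = builder.build();
>
>   Schema schema = ProtobufData.get().getSchema(Foo.class);
>   ByteArrayOutputStream bao = new ByteArrayOutputStream();
>   Encoder e = EncoderFactory.get().binaryEncoder(bao, null);
>   ProtobufDatumWriter<Foo> w = new ProtobufDatumWriter<Foo>(schema);
>   GenericDatumReader<Object> gdr = new GenericDatumReader<Object>(schema, schema);
>   BinaryDecoder d = null;
>
>   // Round-trip the same message one million times: write Avro binary, then read it back.
>   long startTime = System.nanoTime();
>   for (int i = 0; i < 1000000; ++i) {
>     bao.reset();
>     w.write(objToConvert, e);
>     e.flush();
>     d = DecoderFactory.get().binaryDecoder(bao.toByteArray(), d); // reuse the decoder
>     gdr.read(null, d);
>   }
>   long endTime = System.nanoTime();
>   System.out.println("Elapsed: " + (endTime - startTime) / 1000000 + " ms"); // ns -> ms
> }
> {code}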
> I will attach a patch that optimizes this.
> With the attached patch this test reports a runtime of about 4 seconds, while
> without the patch it takes 30+ seconds, so this is a 7.5-8x improvement for this
> particular test enum.
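The description attributes the slowdown to {{Schema}} objects for the nested message/enum being rebuilt instead of cached. As a hedged illustration of that general technique (memoizing an expensive per-type computation), here is a minimal, self-contained Java sketch. It is not the code in AVRO-2147.patch and does not use Avro or protobuf: the "schema" is just a list of the 676 enum symbol names, and the type-name key as well as the names SchemaCacheSketch, buildEnumSchema and cachedSchema are hypothetical stand-ins for Avro Schema objects keyed by a protobuf descriptor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Sketch of per-type schema memoization (hypothetical names, not the AVRO-2147 patch).
 * A list of symbol names stands in for an Avro enum Schema.
 */
public class SchemaCacheSketch {

    /** Counts how often the expensive builder actually runs. */
    static int builds = 0;

    /** Stand-in for building an enum schema: one entry per symbol, AA..ZZ. */
    static List<String> buildEnumSchema(String typeName) {
        builds++;
        List<String> symbols = new ArrayList<>();
        for (char a = 'A'; a <= 'Z'; a++) {
            for (char b = 'A'; b <= 'Z'; b++) {
                symbols.add("" + a + b);
            }
        }
        return symbols;
    }

    /** Shared cache; ConcurrentHashMap.computeIfAbsent applies the builder at most once per key. */
    static final Map<String, List<String>> CACHE = new ConcurrentHashMap<>();

    static List<String> cachedSchema(String typeName) {
        return CACHE.computeIfAbsent(typeName, SchemaCacheSketch::buildEnumSchema);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            buildEnumSchema("LargeEnum"); // uncached: rebuilt on every write
        }
        int uncachedBuilds = builds;
        builds = 0;
        for (int i = 0; i < 1000; i++) {
            cachedSchema("LargeEnum"); // cached: built once, then reused
        }
        System.out.println("uncached builds: " + uncachedBuilds + ", cached builds: " + builds);
    }
}
```

Running main prints "uncached builds: 1000, cached builds: 1". A concurrent map is used here only because a shared writer may be called from several threads; the key a real cache should use (class, descriptor, or something else) is a design question this sketch does not answer.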
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)