
List:       avro-dev
Subject:    [jira] [Commented] (AVRO-1704) Standardized format for encoding messages with Avro
From:       "Niels Basjes (JIRA)" <jira () apache ! org>
Date:       2016-03-24 16:49:25
Message-ID: JIRA.12845517.1437035915000.42712.1458838165704 () Atlassian ! JIRA


    [ https://issues.apache.org/jira/browse/AVRO-1704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15210535#comment-15210535 ]

Niels Basjes commented on AVRO-1704:
------------------------------------

Over the last week I did some experimenting and pushed my modified version of Avro here: https://github.com/nielsbasjes/avro/tree/AVRO-1704

What I did so far:
# Added a getFingerPrint() method to Schema that uses CRC-64-AVRO to calculate the schema fingerprint.
# Added a few SchemaStorage-related classes that allow storing schemas in memory.
# Added a toBytes() method and a static fromBytes() method to the generated classes. Both effectively call the 'real' implementations, which are in the SpecificRecordBase class.
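
For reference, the CRC-64-AVRO fingerprint itself is small. The sketch below follows the algorithm given in the Avro specification; the class name Crc64Avro is only illustrative and this is not the code from my branch. It assumes the caller passes in the schema's Parsing Canonical Form (as UTF-8 bytes or as a string), since that is what the specification fingerprints.

```java
import java.nio.charset.StandardCharsets;

// Illustrative class name; a sketch of CRC-64-AVRO as described in the Avro spec.
final class Crc64Avro {
    // Fingerprint of the empty byte string, also used to build the lookup table.
    static final long EMPTY = 0xc15d213aa4d7a795L;
    private static final long[] TABLE = new long[256];

    static {
        // Precompute the per-byte lookup table.
        for (int i = 0; i < 256; i++) {
            long fp = i;
            for (int j = 0; j < 8; j++) {
                fp = (fp >>> 1) ^ (EMPTY & -(fp & 1L));
            }
            TABLE[i] = fp;
        }
    }

    // Fingerprints raw bytes (assumed to be a canonical-form schema in UTF-8).
    static long fingerprint64(byte[] buf) {
        long fp = EMPTY;
        for (byte b : buf) {
            fp = (fp >>> 8) ^ TABLE[(int) (fp ^ b) & 0xff];
        }
        return fp;
    }

    // Convenience overload for a canonical-form schema held as a string.
    static long fingerprint64(String canonicalSchemaJson) {
        return fingerprint64(canonicalSchemaJson.getBytes(StandardCharsets.UTF_8));
    }
}
```

Calling Crc64Avro.fingerprint64(canonicalJson) yields the 64-bit value that a schema storage could use as its lookup key.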

All of this passes the existing Java unit tests.

On the application side, my test code (using 3 slightly different variations of the same schema) looks like this, and it works exactly as I expect it to.
{code:java}
// Register all three schema versions with the schema storage.
SchemaFactory.put(com.bol.measure.v1.Measurement.getClassSchema());
SchemaFactory.put(com.bol.measure.v2.Measurement.getClassSchema());
SchemaFactory.put(com.bol.measure.v3.Measurement.getClassSchema());

// Serialize a record written with the v1 schema.
com.bol.measure.v1.Measurement measurement =
    DummyMeasurementFactory.createTestMeasurement(timestamp);
byte[] bytesV1 = measurement.toBytes();

// Read the v1 bytes back through the v2 and v3 generated classes.
com.bol.measure.v2.Measurement newBornV2 =
    com.bol.measure.v2.Measurement.fromBytes(bytesV1);
com.bol.measure.v3.Measurement newBornV3 =
    com.bol.measure.v3.Measurement.fromBytes(bytesV1);
{code}

Things currently missing: Documentation, extra tests, etc.

I could really use some feedback on the structure of my change, and advice on how to approach the need to call a 'close()' method on the schema storage part.
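
To make the close() question concrete, here is one possible shape, offered as a sketch rather than what the branch currently does: the storage implements AutoCloseable so that callers can scope it with try-with-resources. All names here (InMemorySchemaStore, put, get, isClosed) are hypothetical, and schemas are kept as plain JSON strings only to keep the example self-contained.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory schema storage keyed by 64-bit schema fingerprint.
final class InMemorySchemaStore implements AutoCloseable {
    private final Map<Long, String> schemasByFingerprint = new ConcurrentHashMap<>();
    private volatile boolean closed = false;

    // Stores a schema (as JSON text) under its fingerprint.
    void put(long fingerprint, String schemaJson) {
        ensureOpen();
        schemasByFingerprint.put(fingerprint, schemaJson);
    }

    // Looks up a schema; empty if the fingerprint is unknown.
    Optional<String> get(long fingerprint) {
        ensureOpen();
        return Optional.ofNullable(schemasByFingerprint.get(fingerprint));
    }

    boolean isClosed() {
        return closed;
    }

    // Releases the stored schemas; a real backend might close connections here.
    @Override
    public void close() {
        closed = true;
        schemasByFingerprint.clear();
    }

    private void ensureOpen() {
        if (closed) {
            throw new IllegalStateException("schema store is closed");
        }
    }
}
```

With this shape, an application would write try (InMemorySchemaStore store = new InMemorySchemaStore()) { ... } and close() would run automatically. Whether that fits a storage that is registered globally, as in my test code, is exactly the part I am unsure about.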

Thanks.

> Standardized format for encoding messages with Avro
> ---------------------------------------------------
> 
> Key: AVRO-1704
> URL: https://issues.apache.org/jira/browse/AVRO-1704
> Project: Avro
> Issue Type: Improvement
> Reporter: Daniel Schierbeck
> 
> I'm currently using the Datafile format for encoding messages that are written to Kafka and Cassandra. This seems rather wasteful:
> 1. I only encode a single record at a time, so there's no need for sync markers and other metadata related to multi-record files.
> 2. The entire schema is inlined every time.
> However, the Datafile format is the only one that has been standardized, meaning that I can read and write data with minimal effort across the various languages in use in my organization. If there was a standardized format for encoding single values that was optimized for out-of-band schema transfer, I would much rather use that. I think the necessary pieces of the format would be:
> 1. A format version number.
> 2. A schema fingerprint type identifier, i.e. Rabin, MD5, SHA256, etc.
> 3. The actual schema fingerprint (according to the type.)
> 4. Optional metadata map.
> 5. The encoded datum.
> The language libraries would implement a MessageWriter that would encode datums in this format, as well as a MessageReader that, given a SchemaStore, would be able to decode datums. The reader would decode the fingerprint and ask its SchemaStore to return the corresponding writer's schema. The idea is that SchemaStore would be an abstract interface that allowed library users to inject custom backends. A simple, file system based one could be provided out of the box.
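
The layout proposed in the description above could be framed roughly as follows. This is only a sketch under assumptions that are not settled in this issue: one byte for the format version, one byte for the fingerprint type, an 8-byte big-endian CRC-64-AVRO fingerprint, then the encoded datum. The optional metadata map is left out, and the class name and constant values are hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical framing: [version:1][fingerprint type:1][fingerprint:8][datum:n].
final class SingleMessageFrame {
    static final byte FORMAT_VERSION = 1;      // assumed value
    static final byte FP_TYPE_CRC64_AVRO = 0;  // assumed type identifier
    static final int HEADER_LENGTH = 1 + 1 + 8;

    final long fingerprint;
    final byte[] datum;

    SingleMessageFrame(long fingerprint, byte[] datum) {
        this.fingerprint = fingerprint;
        this.datum = datum;
    }

    // Prepends the header to an already Avro-encoded datum.
    static byte[] encode(long fingerprint, byte[] datum) {
        return ByteBuffer.allocate(HEADER_LENGTH + datum.length)
                .put(FORMAT_VERSION)
                .put(FP_TYPE_CRC64_AVRO)
                .putLong(fingerprint)   // ByteBuffer is big-endian by default
                .put(datum)
                .array();
    }

    // Splits a message into the writer schema's fingerprint and the raw datum.
    static SingleMessageFrame decode(byte[] message) {
        if (message.length < HEADER_LENGTH) {
            throw new IllegalArgumentException("message too short");
        }
        ByteBuffer in = ByteBuffer.wrap(message);
        byte version = in.get();
        if (version != FORMAT_VERSION) {
            throw new IllegalArgumentException("unsupported format version: " + version);
        }
        byte type = in.get();
        if (type != FP_TYPE_CRC64_AVRO) {
            throw new IllegalArgumentException("unsupported fingerprint type: " + type);
        }
        long fingerprint = in.getLong();
        byte[] datum = new byte[in.remaining()];
        in.get(datum);
        return new SingleMessageFrame(fingerprint, datum);
    }
}
```

A reader would call SingleMessageFrame.decode(bytes), hand the fingerprint to its SchemaStore to obtain the writer's schema, and then decode the datum with that schema.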



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

