List:       avro-dev
Subject:    Re: Proposal: RFCs for Avro 2.x
From:       Zoltan Farkas <zolyfarkas () yahoo ! com ! INVALID>
Date:       2020-04-29 13:32:34
Message-ID: 7F5DF04B-5683-4A4E-8F06-0DB4E5574F97 () yahoo ! com

I am all for expanding the core types… the current logical type shortcuts in the
IDL lang will make this a bit more interesting to implement (I think they confuse
more than they help)…

Regarding ID-based field tracking, I am not sure I understand what problem it
solves, and there might be better solutions for it.

But these discussions should take place as part of the AEP process…
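
For anyone following along, the scenario Ryan describes in the quoted message
below (drop a column, then add a new one with the same name) can be sketched
roughly as follows. This is only a toy model: the dict-based rows and the
helpers `read_by_name` and `read_by_id` are hypothetical and are not Avro or
Iceberg APIs.

```python
# Toy model only: these dict "rows" and helper names are hypothetical,
# not Avro or Iceberg APIs.

def read_by_name(stored_row, reader_fields):
    """Resolve the reader's fields against stored data by field name."""
    return {name: stored_row.get(name) for name in reader_fields}


def read_by_id(stored_row, reader_fields):
    """Resolve the reader's fields against stored data by field id.

    reader_fields maps a field id to the name the reader uses for it.
    """
    return {name: stored_row.get(field_id)
            for field_id, name in reader_fields.items()}


# A row written while the table still had the original "status" column (id 2).
old_row_by_name = {"user": "ann", "status": "active"}
old_row_by_id = {1: "ann", 2: "active"}

# "status" (id 2) is dropped; a new, unrelated "status" column is added (id 3).
reader_names = ["user", "status"]
reader_ids = {1: "user", 3: "status"}

by_name = read_by_name(old_row_by_name, reader_names)
by_id = read_by_id(old_row_by_id, reader_ids)

print(by_name)  # the dropped column's data shows up under the new column
print(by_id)    # the new column is empty for rows written before it existed
```

With name-based resolution the old value is picked up by the new column; with
id-based resolution it is not, because id 3 never existed in the old row.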

—Z


> On Apr 28, 2020, at 8:50 PM, Ryan Blue <rblue@netflix.com.INVALID> wrote:
> 
> +1 for removing code that isn't maintained. We can still bring it back if
> anyone is interested, but I like the idea of retiring it so that users get
> a clear idea of its state (unmaintained) and so it doesn't slow down
> development (releases blocked by code rot). I support separate versioning
> and updating to semantic versioning, too!
> 
> For the 2.0 format, I think there may be some other reasons to consider it
> as well.
> 
> First, it would be great to expand the core set of types to include
> timestamps, dates, decimals, and maps with non-string keys. These are
> available through logical types, but logical types are difficult to
> configure and require deserialization and conversion instead of just
> deserialization. We could gain performance and make Avro much easier to use
> by adding to the core set of types.
> 
> Second, I would like to see Avro adopt or support id-based field tracking
> in schemas. We've built this in Apache Iceberg so that schema evolution in
> Iceberg tables never has unintended side-effects. For example, dropping a
> column and adding one with the same name never mixes the dropped column's
> data with the new column's data; and it's still possible to un-delete
> columns. Another benefit of id-based schemas is that producers and
> consumers don't need to coordinate schema changes or keep old aliases. The
> name of a column is whatever the id is labelled with in the reader's schema.
> 
> I'm not sure that even these are enough to break compatibility with v1, but
> I think it's worth a discussion.
> 
> On Tue, Apr 28, 2020 at 1:01 AM Ismaël Mejía <iemejia@gmail.com> wrote:
> 
> > Huge +1 to recover the Avro Enhancement Proposals (AEP)
> > 
> > The experimental features Ryan mentioned definitely merit(ed) being
> > part of it, in particular the procedure to decide when they will
> > become 'stable' or default, for example for fastread. Other
> > proposals/discussions, like the split release or semantic versioning,
> > should also be part of it.
> > 
> > About Avro 2.0.0, I think breaking binary compatibility of the format
> > is going to be a hard sell (are named unions valuable enough
> > to break backwards compatibility?). If we can extend the binary format
> > in a compatible way, there is no reason to have 2.0.0. I agree that
> > there is a delicate balance to strike, because strict stability
> > could also leave us ostracized.
> > 
> > What I personally would like is to make Avro as lean and efficient as
> > possible and focus mostly on the binary format and tools, probably
> > removing the less used parts (IPC/RPC/trevni), so it is good to see
> > that other people are starting to agree on that.
> > 
> > One more radical idea I would like to try is to unify the
> > implementations a bit, probably by having a robust low-level one in a
> > systems language (C or Rust) and bindings for all the languages that rely
> > on it. This is probably more because of my frustration at seeing
> > projects that take this approach slowly becoming the standard and
> > Apache Avro being relegated (this is already happening on the Python front).
> > 
> > In general, the critical issue with Avro is the downstream
> > consequences of our actions. Of course we will always have
> > incomplete information, but we can investigate and see if changes are
> > worth it.
> > 
> > Regards,
> > Ismaël
> > 
> > On Mon, Apr 27, 2020 at 6:51 PM Ryan Skraba <ryan@skraba.com> wrote:
> > > 
> > > Hello!
> > > 
> > > You bring up some good points -- I'm glad Avro is so widely used, but
> > > it does make me nervous to see any changes that might break other
> > > projects, or change any behaviour.
> > > 
> > > Currently, we've talked about managing developer expectations with
> > > semantic versioning (especially with the necessary Jackson API cleanup
> > > that happened in 1.9.x), or versioning artifacts separately.
> > > 
> > > We also have a couple of experimental/feature flags for some behaviour
> > > changes:
> > > https://cwiki.apache.org/confluence/display/AVRO/Experimental+features+in+Avro
> > > 
> > > And there is already a page for Avro Enhancement Proposals that looks
> > > largely out of date:
> > > 
> > > https://cwiki.apache.org/confluence/display/AVRO/Avro+Enhancement+Proposals
> > > 
> > > Moving some of the extras to a separate repo brings many of the same
> > > problems as versioning artifacts separately (nobody wants to deal with
> > > a compatibility matrix).  I'm definitely not against it, but I'm not
> > > sure how it would improve the situation.
> > > 
> > > There's a fine line between being extremely stable and being
> > > paralyzed! I would be enthusiastic about any process changes that
> > > would help us encourage and adopt new features (and fixes) more
> > > quickly.
> > > 
> > > All my best, Ryan
> > > 
> > > 
> > > On Sun, Apr 26, 2020 at 11:18 AM Driesprong, Fokko <fokko@driesprong.frl> wrote:
> > > > 
> > > > Hi Andy,
> > > > 
> > > > Thanks for reaching out. Sorry for not being so active in the community
> > > > lately.
> > > > 
> > > > Since Avro 1.8.2 there has been some activity on the repository again,
> > > > fixing things like security issues and migrating to later versions of
> > > > Java. Avro has been around for 10 years now, and I would like to keep
> > > > (some) backward compatibility to make sure that people are still going
> > > > to use it for another 10 years :) In the past, the idea was to keep the
> > > > format backward compatible; this excludes the Java API. So we made some
> > > > changes to the API, such as removing Jackson from the public API and
> > > > aggressively migrating from Joda Time to Java JSR-310. This caused a lot
> > > > of issues because Avro is deeply nested in a lot of projects. For
> > > > example, it is a huge task to update Avro in Hive or Hadoop. Therefore we
> > > > believe that backward compatibility is very important.
> > > > 
> > > > And I agree that we should mainly focus on the Avro spec itself, and not
> > > > too much on File I/O and Network etc :) However, if we decide to break an
> > > > API, we should do it for a good reason.
> > > > 
> > > > Cheers, Fokko
> > > > 
> > > > Op wo 22 apr. 2020 om 16:09 schreef Andy Le <anhldbk@gmail.com>:
> > > > 
> > > > > Hi guys,
> > > > > 
> > > > > I'm new to this vibrant open source community. My story with Avro can
> > > > > be found here [1]
> > > > > 
> > > > > While implementing the feature, I got stuck and had various discussions
> > > > > with Doug Cutting, Fokko Driesprong.... You may see here [2]
> > > > > 
> > > > > Here are my (biased) observations about our current Avro 1.9.x:
> > > > > 
> > > > > - Some improvements can't be made due to fear of backward
> > > > > incompatibilities. For example: specifications about named Union.
> > > > > 
> > > > > - If `Apache Avro™ is a data serialization system.` then the repository
> > > > > `apache/avro` should solely focus on (de)serialization, right? Currently
> > > > > our repository contains many nice-to-have-but-not-critical things like:
> > > > > File I/O, Network I/O....
> > > > > 
> > > > > IMHO, I think:
> > > > > 
> > > > > - We should publicly gather RFCs for Avro 2.x
> > > > > 
> > > > > - We should move such nice things out of Avro 2.x (maybe to other
> > > > > dedicated repositories)
> > > > > 
> > > > > What do you think about my suggestions? Please kindly let me know.
> > > > > 
> > > > > Thank you & be strong.
> > > > > 
> > > > > [1] My fork: https://github.com/anhldbk/avro-fork#why-this-fork
> > > > > [2] My opened issue:
> > > > > 
> > > > > https://issues.apache.org/jira/browse/AVRO-2808?jql=reporter%3Danhldbk%20AND%20resolution%20is%20EMPTY
> > 
> > > > > 
> > > > > 
> > > > > 
> > 
> 
> 
> -- 
> Ryan Blue
> Software Engineer
> Netflix
