
List:       flume-dev
Subject:    Re: [DISCUSSION] Flume-Kafka 0.9 Support
From:       Brock Noland <brock@apache.org>
Date:       2015-12-22 20:10:13
Message-ID: CAASjJObi-rgjwxKKUMkqsBk=UKgfgYPmVQK+yFCsC-T7exKJPQ@mail.gmail.com


There are many ways to approach this problem, and I think I have worked on
at least one project that has tried each of them:

Apache MRUnit => Kept versions in Maven modules and eventually separate
source trees
Apache Hive (and Parquet) => Used an extensive "shim" layer to allow Hive
to work with dozens of Hadoop releases
StreamSets => each component is a classloader-isolated component

The approach StreamSets has taken is the most sustainable approach to
working with many versions. Back when I worked on
https://issues.apache.org/jira/browse/FLUME-1735 we discussed doing
something similar with Flume. Having now worked on an implementation of
that, I can say it was one of the more difficult problems I've worked on.
Doing this in Flume would be a many-month effort which would likely break
backwards compatibility.
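
To make the classloader-isolation idea concrete, here is a minimal sketch
(not StreamSets or Flume code) of the mechanism involved: each component's
jars, including whichever Kafka client version it needs, are loaded through
a dedicated classloader whose parent exposes only the framework API, so two
components can depend on conflicting Kafka versions without collision:

    import java.net.URL;
    import java.net.URLClassLoader;

    public class ComponentLoader {
        // apiOnlyParent is a classloader that sees only the shared framework
        // API, not any Kafka classes; componentJars carry the component and
        // its own Kafka client version.
        public static Object newComponent(URL[] componentJars,
                                          String implClass,
                                          ClassLoader apiOnlyParent)
                throws Exception {
            ClassLoader cl = new URLClassLoader(componentJars, apiOnlyParent);
            // The instance must only be referenced through interfaces defined
            // in the parent API classloader.
            return Class.forName(implClass, true, cl)
                        .getDeclaredConstructor()
                        .newInstance();
        }
    }

The hard part is not this snippet but keeping the API boundary between the
parent and child loaders clean, which is a large part of why it is a
many-month effort.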

The shim approach Hive took ended up being a complete nightmare, with
someone always complaining whenever we removed support for a Hadoop version.
This approach built up a tremendous amount of technical debt. I would not
take that approach again.

The module approach MRUnit took also became a maintenance nightmare. All
patches need to be tested multiple times, as unrelated changes can break one
profile due to dependency version changes, and you have to be able to output
multiple artifacts.
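
For context, the profile pattern in question looks roughly like this (a
hypothetical pom.xml fragment illustrating the idea; MRUnit's actual
profiles targeted Hadoop versions, not Kafka):

    <profiles>
      <profile>
        <id>kafka-0.8</id>
        <properties>
          <kafka.version>0.8.2.2</kafka.version>
        </properties>
      </profile>
      <profile>
        <id>kafka-0.9</id>
        <properties>
          <kafka.version>0.9.0.0</kafka.version>
        </properties>
      </profile>
    </profiles>

Every patch then has to be verified once per profile (mvn test -Pkafka-0.8,
mvn test -Pkafka-0.9, ...), which is the repeated-testing burden described
above.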

For my money, the best approach was the separate source tree approach,
which is the approach Jeff is suggesting. Users who want the older Kafka can
use Flume 1.6, and those who want the newer Kafka can use Flume 1.7. Anyone
who wants a 1.7 feature with 1.6 can cherry-pick it. Based on the
aforementioned experiences, I would strongly suggest we take this approach.




*From:* Ralph Goers <ralph.goers@dslextreme.com>
*Date:* December 22, 2015 at 1:29:08 PM EST
*To:* dev@flume.apache.org
*Subject:* *Re: [DISCUSSION] Flume-Kafka 0.9 Support*
*Reply-To:* dev@flume.apache.org

Why not simply move the "old" Kafka components to their own Maven module?
Then you can keep them as part of the distribution for the next release or
two.

Ralph

On Dec 22, 2015, at 8:10 AM, Jarek Jarcec Cecho <jarcec@apache.org> wrote:


It's unfortunate that in order to support the new features in Kafka 0.9
(primarily the security additions), one has to lose support for the previous
version (0.8).


I do believe that the security additions that have been added recently are
important enough for us to migrate to the new version of Kafka and use it
for the next Flume release. If some people need to continue using a future
Flume version with Kafka 0.8, they should be able to simply take the 1.6.0
versions of the Kafka Channel/Source/Sink jars and use them with the new
agent, so we do have a mitigation plan if needed.
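
For example, using Flume's plugins.d layout, that mitigation could look
something like this (the jar names here are illustrative):

    $FLUME_HOME/plugins.d/kafka-0.8-compat/
        lib/flume-kafka-channel-1.6.0.jar     # the 1.6.0 component jars
        lib/flume-kafka-source-1.6.0.jar
        lib/flume-ng-kafka-sink-1.6.0.jar
        libext/kafka_2.10-0.8.2.2.jar         # matching 0.8.x client libraries
        libext/kafka-clients-0.8.2.2.jar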


Jarcec


On Dec 22, 2015, at 3:26 PM, Jeff Holoman <jholoman@cloudera.com> wrote:


With the new release of Kafka, I wanted to start the discussion on how best
to handle updating Flume to be able to make use of some of the new features
available in 0.9.

First, it is important for Flume to adopt the 0.9 Kafka clients, as the new
Consumer / Producer APIs are the only APIs that support the new security
features put into the latest Kafka release, such as SSL.
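
To make that concrete: SSL on 0.9 is just client configuration, but only on
the new org.apache.kafka.clients APIs; the old consumer has no equivalent.
A minimal sketch (broker address, topic, and keystore paths are made up for
illustration):

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class SecureConsumerSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9093");
            props.put("group.id", "flume");
            props.put("key.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
            props.put("value.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
            // These security settings exist only on the new 0.9 clients.
            props.put("security.protocol", "SSL");
            props.put("ssl.truststore.location", "/etc/flume/truststore.jks");
            props.put("ssl.truststore.password", "changeit");
            KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props);
            consumer.subscribe(Collections.singletonList("flume-topic"));
        }
    }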


If we agree that this is important, then we need to consider how best to
make this switch. With many projects we could just update the jars/clients
and move along happily; however, the Kafka compatibility story complicates
this.



-


Kafka promises to be backward compatible with clients

-


   i.e. A 0.8.x client can talk to a 0.9.x broker

   -


Kafka does not promise to be forward compatible (at all) from client

perspective:

-


   i.e. A 0.9.x client can not talk to a 0.8.x broker

   -


   If it works, its is by luck and not reliable, even for old

   functionality

   -


   This is due to protocol changes and no way for the client to know the

   version of Kafka it's talking to. Hopefully KIP-35 (Retrieving protocol

   version) will move this in the right direction.




-


Integrations that utilize Kafka 0.9.x clients will not be able to talk

to Kafka 0.8.x brokers at all and may get cryptic error messages when doing

so.

-


Integrations will only be able to support one major version of Kafka at

a time without more complex class-loading

-


   Note: The kafka_2.10 artifact depends on the kafka-clients artifact

   so you cannot have both kafka-clients & kafka_2.10 of different

versions at

   the same time without collision

   -


However older clients (0.8.x) will work when talking to 0.9.x server.

-


   But that is pretty much useless as the benefits of 0.9.x (security

   features) won't be available.



Given these constraints, and after careful consideration, I propose that we
do the following:

1) Update the Kafka libraries to the latest 0.9/0.9+ release and update the
Source, Sink and Channel implementations to make use of the new Kafka
clients
2) Document that Flume no longer supports Kafka brokers < 0.9


Given that both the producer and consumer clients will be updated, there
will need to be changes in agent configurations to support the new clients.

This means that, if upgrading Flume, existing agent configurations will
break. I don't see a clean way around this, unfortunately. This seems to be
a situation where we break things, and document that to be the case.
