
List:       cassandra-dev
Subject:    Re: MSc Project - compaction strategy
From:       Chris Mattmann <mattmann () apache ! org>
Date:       2016-07-19 3:09:55
Message-ID: C6BC9885-E34A-4A69-9031-44C18FB550C4 () jpl ! nasa ! gov

Dev discussion about the project should ideally be on the dev list.

Further, all *decisions* must be on the dev list for the project.
JIRA has the drawback that its notifications get lost in many people's
email filters, and it is hard to separate the signal from the noise.

I would consider some well-formed emails to the dev list as part
of your plan as well, so that the community can follow along.

Cheers,
Chris




On 7/18/16, 10:42 PM, "steve landiss" <steve.landiss@yahoo.com.INVALID> wrote:

>So much for compaction of information eh?   
>
>    On Tuesday, July 12, 2016 10:06 AM, Pedro Gordo <pedro.gordo1986@gmail.com> wrote:
> 
>
> Hi
>
>Yes, I just saw Marcus's reply now, sorry for the duplicate email. The email
>filters were not set up correctly. Thanks to both!
>
>Best regards
>
>Pedro Gordo
>
>On 12 July 2016 at 12:39, Robert Stupp <snazy@snazy.de> wrote:
>
>> As Markus already mentioned, the best place to discuss the idea of your
>> compaction strategy is a JIRA ticket.
>> It would be best to include as much detail (written, not coded) as
>> necessary to understand why this compaction strategy is useful and how it
>> works.
>>
>> Implementation questions and clarifications belong on #cassandra-dev IRC.
>>
>> Robert
>>
>> —
>> Robert Stupp
>> @snazy
>>
>> > On 12 Jul 2016, at 19:42, Pedro Gordo <pedro.gordo1986@gmail.com> wrote:
>> >
>> > Hi all
>> >
>> > I'm finishing an MSc in which my final project is to implement a new
>> > compaction strategy in Cassandra. I've discussed the main points of the
>> > strategy with other community members and received valuable feedback.
>> > I understand this will be a tough challenge for someone who has never
>> > worked with Cassandra, but after getting to know the technology, I've
>> > found it fascinating. Since I wanted to contribute to an open source
>> > project in my MSc Project, this makes Cassandra the ideal technology to
>> > go forward with, which is why I've chosen it.
>> >
>> > However, since this is my first time contributing to an open source
>> > project, I have some questions on how to proceed correctly. Looking at
>> > the How To Contribute <http://wiki.apache.org/cassandra/HowToContribute>
>> > page, I see that we're supposed to create a ticket before starting work
>> > on it. In this case, does someone need to validate the usefulness of the
>> > strategy first, or can I just proceed and implement it, or do something
>> > else? Also, is this the correct mailing list for this sort of question?
>> > :)
>> >
>> > As for the code itself, if I have a question like "Should we be using an
>> > abstract class for compaction classes?" or "What is this method supposed
>> > to do?", can I ask it here? What is the best course of action to learn
>> > about the details of the code in Cassandra? I already saw that it has
>> > some comments, but they probably won't be enough.
>> >
>> > The strategy I have in mind will be very simple until I finish the MSc.
>> > After the submission, I'll improve it with other features and the
>> > feedback I got, but for the moment, I'll keep it at a basic level. The
>> > strategy will run only during certain periods of time (for example, a
>> > time of day when the cluster has little traffic (1)), during which the
>> > rows will be made unique across all SSTables. These new tables will be
>> > capped at a configurable size, so after compaction, we can have multiple
>> > tables created. This operation only happens if, after a prior analysis,
>> > we find that the row exists in a number of SSTables above a certain
>> > threshold. What I'm trying to address here is the continuous high CPU
>> > usage of LCS (1), but also the need for lots of disc space when we have
>> > big SSTables resulting from STCS. I suppose it's a naive strategy, but
>> > the aim here is to give me experience with C*, and of course I'll be
>> > happy to take suggestions. But I'll probably only use the ideas after
>> > delivering the project because, at the moment, I need to keep it simple.
>> > Otherwise, I'll never be able to submit it. :)
>> >
>> > Sorry for the long email, and thanks for all the help in advance! I'm
>> > very excited about this project and look forward to being part of this
>> > community!
>> >
>> > Best regards
>> > Pedro Gordo
>>
>>
>
>
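
The strategy outlined in Pedro's original message (run only in a quiet window, merge rows that are spread over more than a threshold number of SSTables, cap the size of the output tables) can be sketched roughly as below. This is only an illustration under loud assumptions, not Cassandra code: an SSTable is modelled as a plain dict of row key to (write timestamp, value), the newest timestamp wins for a whole row, the size cap is counted in rows rather than bytes, and the names in_window, fragmented_keys and compact are hypothetical.

```python
from collections import Counter
from datetime import time

# Hypothetical model, not Cassandra's API:
# an "SSTable" is a dict of row_key -> (write_timestamp, value).


def in_window(now, start, end):
    """True if `now` is inside [start, end); the window may wrap past midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def fragmented_keys(sstables, threshold):
    """Prior analysis: row keys present in more than `threshold` SSTables."""
    counts = Counter(key for table in sstables for key in table)
    return {key for key, n in counts.items() if n > threshold}


def compact(sstables, threshold, max_rows):
    """Make fragmented rows unique and write them to size-capped new tables."""
    hot = fragmented_keys(sstables, threshold)
    if not hot:
        # Nothing exceeds the threshold, so no compaction happens.
        return [dict(table) for table in sstables]

    merged = {}     # newest version of each fragmented row
    leftovers = []  # what remains of the input tables
    for table in sstables:
        rest = {}
        for key, cell in table.items():
            if key in hot:
                # Assumption: the newest write timestamp wins for the whole row.
                if key not in merged or cell[0] > merged[key][0]:
                    merged[key] = cell
            else:
                rest[key] = cell
        if rest:
            leftovers.append(rest)

    # Cap each output table at `max_rows` rows (a stand-in for a byte limit).
    keys = sorted(merged)
    new_tables = [
        {k: merged[k] for k in keys[i:i + max_rows]}
        for i in range(0, len(keys), max_rows)
    ]
    return leftovers + new_tables
```

A scheduler would call compact only when in_window(current_time, start, end) is true. The sketch ignores tombstones, per-column reconciliation and on-disk sizes, all of which a real strategy would have to handle.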
