
List:       cassandra-user
Subject:    Re: Production with Single Node
From:       Nikolay Mihaylov <nmmm () nmmm ! nu>
Date:       2016-01-27 19:52:14
Message-ID: CAAgaqhFfMKjy4JVtj0rqzyR_W+y0pnXxrF2j9sMvixY_1Ngf3A () mail ! gmail ! com

Hi,

We have 2-3 installations with single-node Cassandra. They work fine, with
no problems, except that if Cassandra stops, everything stops. Even on one
node, we usually "roll" 500-600 GB of data, sometimes even 2-3 TB. We use a
mostly standard configuration with almost no changes.

Here are some considerations for bloom filter configuration, but they are
for an old Cassandra version:
http://nmmm.nu/bloomfilter.htm
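
As a rough illustration of the trade-off behind bloom filter tuning, here is
a minimal Python sketch. It uses the textbook formula for an optimally sized
Bloom filter, not Cassandra's actual implementation, so treat the numbers as
estimates only. The function names are made up for this example; fp_chance
stands for the false-positive probability that the bloom_filter_fp_chance
table option controls.

```python
import math


def bloom_bits_per_key(fp_chance: float) -> float:
    """Bits per key for an optimally sized Bloom filter (textbook formula).

    Assumption: the optimal number of hash functions is used. Cassandra's
    real memory use may differ from this estimate.
    """
    if not 0.0 < fp_chance < 1.0:
        raise ValueError("fp_chance must be between 0 and 1 (exclusive)")
    return -math.log(fp_chance) / (math.log(2) ** 2)


def bloom_filter_bytes(num_keys: int, fp_chance: float) -> int:
    """Estimated filter size in bytes for num_keys keys."""
    return math.ceil(num_keys * bloom_bits_per_key(fp_chance) / 8)


if __name__ == "__main__":
    # Hypothetical key count, chosen only to show how the size scales.
    keys = 1_000_000_000
    for p in (0.01, 0.1):
        size_mib = bloom_filter_bytes(keys, p) / 2**20
        print(f"fp_chance={p}: {bloom_bits_per_key(p):.2f} bits/key, "
              f"about {size_mib:.0f} MiB for {keys} keys")
```

A higher false-positive chance costs less memory per key but causes more
unnecessary disk reads, which matters more when one node holds all the data.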

https://whoisrequest.com/ - this uses single-node Cassandra with about
600 GB of data.

We found that it works much better and faster than MySQL. We did test
Postgres, but it was terribly slow. We were in a big hurry, so we did not
analyze why Postgres was so slow.

Another lesson we learned: when you run a single node, put only Cassandra
on that server. Keep the web server / client on a different server.

In our latest project we used TokuDB. It is something like a MySQL
"plugin". We have known Toku for 5-6 years, but until recently it was paid
software with a free demo. TokuDB is currently GPL.

Here is what we researched 5 years ago:

http://www.novini.net/2010/12/mysql-storage-engines-comparison.html

We also tested MongoDB. It is quite fast, but it used up our HDD space
very quickly.

So, a little recap of what we have:

- Cassandra single nodes - 600-700 GB of data
- MySQL with MyISAM - 30-40 GB of data
- TokuDB - 100 GB of data (equivalent to 500 GB in MyISAM / InnoDB).

Feel free to contact me if you have non-Cassandra-related questions.


On Sat, Jan 23, 2016 at 7:10 AM, Anuj Wadehra <anujw_2003@yahoo.co.in>
wrote:

> And I think in a 3 node cluster, RAID 0 would do the job instead of RAID 5.
> So you will need less storage to get the same disk space. But you will get
> protection against disk failures and in fact entire node failure.
>
> Anuj
>
> Sent from Yahoo Mail on Android
> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>
> On Sat, 23 Jan, 2016 at 10:30 am, Anuj Wadehra
> <anujw_2003@yahoo.co.in> wrote:
> I think Jonathan said it earlier. You may be happy with the performance
> for now as you are using the same commitlog settings that you use in large
> clusters. Test the new setting recommended so that you know the real
> picture. Or be prepared to lose some data in case of failure.
>
> Other than durability, your single node cluster would be a Single Point of
> Failure for your site. RAID 5 will only protect you against a disk failure.
> But a server may be down for other reasons too. The question is: are you OK
> with the site going down?
>
> I would suggest using hardware with a smaller configuration to save on
> cost for smaller sites and going ahead with a 3 node minimum. That way you
> will provide all the good features of your design irrespective of the site.
> Cassandra is known to work on commodity servers too.
>
>
>
> Thanks
> Anuj
>
>
>
>
> Sent from Yahoo Mail on Android
> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>
> On Sat, 23 Jan, 2016 at 4:23 am, Jack Krupansky
> <jack.krupansky@gmail.com> wrote:
> You do of course have the simple technical matters, most of which need to
> be addressed with a proof of concept implementation, related to memory,
> storage, latency, and throughput. I mean, with a scaled cluster you can
> always add nodes to increase capacity and throughput, and reduce latency,
> but with a single node you have limited flexibility.
>
> Just to be clear, Cassandra is still not recommended for "fat nodes" -
> even if you can fit tons of data on the node, you may not have the computes
> to satisfy throughput and latency requirements. And if you don't have
> enough system memory the amount of storage is irrelevant.
>
> Back to my original question:
> How much data (rows, columns), what kind of load pattern (heavy write,
> heavy update, heavy query), and what types of queries (primary key-only,
> slices, filtering, secondary indexes, etc.)?
>
> I do recall a customer who ran into problems because they had SSD but only
> a very limited amount so they were running out of storage. Having enough
> system memory for file system caching and offheap data is important as well.
>
>
> -- Jack Krupansky
>
> On Fri, Jan 22, 2016 at 5:07 PM, John Lammers <
> john.lammers@karoshealth.com> wrote:
>
>> Thanks for your response Jack.
>>
>> We are already sold on distributed databases, HA and scaling.  We just
>> have some small deployments coming up where there's no money for servers to
>> run multiple Cassandra nodes.
>>
>> So, aside from the lack of HA, I'm asking if a single Cassandra node
>> would be viable in a production environment.  (There would be RAID 5 and
>> the RAID controller cache is backed by flash memory).
>>
>> I'm asking because I'm concerned about using Cassandra in a way that it's
>> not designed for.  That to me is the unsettling aspect.
>>
>> If this is a bad idea, give me the ammo I need to shoot it down.  I need
>> specific technical reasons.
>>
>> Thanks!
>>
>> --John
>>
>> On Fri, Jan 22, 2016 at 4:47 PM, Jack Krupansky <jack.krupansky@gmail.com
>> > wrote:
>>
>>> If single-node Cassandra has the performance (and capacity) you need and
>>> the NoSQL data model and API are sufficient for your app, and your dev and
>>> ops and support teams are already familiar with and committed to Cassandra,
>>> and you don't need HA or scaling, then it sounds like you are set.
>>>
>>> You asked about risks, and normally lack of HA and scaling are
>>> unacceptable risks when people are looking at distributed databases.
>>>
>>> Most people on this list are dedicated to and passionate about
>>> distributed databases, HA, and scaling, so it is distinctly unsettling when
>>> somebody comes along who isn't interested in and committed to those same
>>> three qualities. But if single-node happens to work for you, then that's
>>> great.
>>>
>>> -- Jack Krupansky
>>>
>>
>>
>



