List:       cassandra-user
Subject:    Re: Huge single-node DCs (?)
From:       Bowen Song <bowen () bso ! ng>
Date:       2021-04-08 17:38:20
Message-ID: 8ad67800-6be2-4bb2-f794-938f574d52dc () bso ! ng

This is off-topic, but if your goal is to maximise storage density while 
also ensuring data durability and availability, this is what you should 
be looking at:

  * hardware:
    https://www.backblaze.com/blog/open-source-data-storage-server/
  * architecture and software:
    https://www.backblaze.com/blog/vault-cloud-storage-architecture/


On 08/04/2021 17:50, Joe Obernberger wrote:
> I am also curious about this question.  Say your use case is to store 
> 10PBytes of data in a new server room / data-center with new 
> equipment, what makes the most sense?  If your database is primarily 
> write with little read, I think you'd want to maximize disk space per 
> rack space.  So you may opt for a 2U server with 24 3.5" disks at 
> 16TBytes each for a node with 384TBytes of disk - so ~27 servers for 
> 10PBytes.
>
> Cassandra doesn't seem to be a good choice for that configuration; 
> the rule of thumb that I'm hearing is ~2TBytes per node, in which case 
> we'd need over 5000 servers.  This seems really unreasonable.
>
> -Joe
>
> On 4/8/2021 9:56 AM, Lapo Luchini wrote:
>> Hi, one project I wrote is using Cassandra to back the huge amount of 
>> data it needs (data is written only once and read very rarely, but 
>> needs to be accessible for years, so the storage needs become huge in 
>> time and I chose Cassandra mainly for its horizontal scalability 
>> regarding disk size) and a client of mine needs to install that on 
>> his hosts.
>>
>> Problem is, while I usually use a cluster of 6 "smallish" nodes 
>> (which can grow in time), he only has big ESX servers with huge disk 
>> space (which is already RAID-6 redundant) but wouldn't have the 
>> possibility to have 3+ nodes per DC.
>>
>> This is out of my usual experience with Cassandra and, as far as I 
>> read around, out of most use-cases found on the website or this 
>> mailing list, so the question is:
>> does it make sense to use Cassandra with a big (let's talk 6TB today, 
>> up to 20TB in a few years) single-node DataCenter, and another 
>> single-node DataCenter (to act as disaster recovery)?
>>
>> Thanks in advance for any suggestion or comment!
>>
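
For reference, the node counts in the quoted message can be reproduced with
a rough sketch like the one below. It assumes binary units (1 PByte = 1024
TBytes), which is what makes the 2 TBytes/node figure come out at "over
5000". The function name and the replication_factor parameter are
hypothetical and not from the thread; the quoted figures correspond to raw
capacity, i.e. a replication factor of 1.

```python
import math

TB_PER_PB = 1024  # assumption: binary units, 1 PByte = 1024 TBytes


def nodes_needed(total_pb, tb_per_node, replication_factor=1):
    """Rough count of nodes needed to hold total_pb of data.

    replication_factor is a hypothetical knob; the thread's figures
    use raw capacity, which corresponds to replication_factor=1.
    """
    total_tb = total_pb * TB_PER_PB * replication_factor
    return math.ceil(total_tb / tb_per_node)


dense_node_tb = 24 * 16  # 24 disks at 16 TBytes each = 384 TBytes per node

print(nodes_needed(10, dense_node_tb))  # dense 2U servers
print(nodes_needed(10, 2))              # ~2 TBytes per node rule of thumb
```

Any replication multiplies both counts by the same factor, so the ratio
between the two layouts stays the same.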



