List:       bacula-devel
Subject:    Re: [Bacula-devel] De-duplication friendly Volume Format change
From:       Radosław_Korzeniewski <radoslaw () korzeniewski ! net>
Date:       2011-01-05 9:37:29
Message-ID: AANLkTimEQ73h-wKN+Jr_1Ps8R8_iDYgvSWMkrybP9yPF () mail ! gmail ! com

2011/1/5 Kern Sibbald <kern@sibbald.com>

> > I hope that by keeping hashes on the Director you actually mean
> > keeping them on both?
>
> We are considering every possibility -- each solution has its own
> advantages and disadvantages, so it is very hard to say that one way
> of doing this is the correct or right way.
>
> For example, it is faster to deduplicate if the hashes are stored on the
> client machine than if they are stored on a server such as the Director,
> but not every client machine has enough disk space to store them.  Most
> estimates indicate that about 30% more disk space is required to keep
> hash codes.  In addition, your deduplication ratio will drop
> significantly if you are only deduping a single client machine and do
> not use a deduplication "pool" of hashes from multiple machines.
>
> Unless you run tests, which may vary from machine to machine, it is very
> difficult to know what algorithm is best.  One major factor is that the
> machine might be connected to a server by a very slow 100Mb Internet
> connection or a fast 10Gb LAN.
>
> We will probably start with something very simple and add to it over time.
>

One question is: how will we store the data block hashes?
An SQL database seems very easy to implement, but it has a lot of
disadvantages.
Another option is to use an open-source key/value database such as Redis
(http://redis.io/), which has very good performance.
The last option is to implement our own solution, which requires a lot of
work and testing (it is all a matter of time).
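
To make the key/value option a bit more concrete, here is a rough Python
sketch of what such a hash index could look like. Everything in it is an
assumption for illustration only: the BlockHashIndex class, the backup()
helper, the fixed 4096-byte block size and the use of SHA-256 are all
hypothetical and are not part of Bacula. A plain dict stands in for the
real backend (Redis, an SQL table, or our own store).

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical fixed block size, not a Bacula setting


class BlockHashIndex:
    """Toy key/value index mapping a block hash to a storage location."""

    def __init__(self):
        # Stand-in for the real backend (Redis, SQL table, custom store).
        self._store = {}

    def add_block(self, data: bytes, location: str) -> bool:
        """Record a block; return True if it was new, False if a duplicate."""
        key = hashlib.sha256(data).digest()
        if key in self._store:
            return False
        self._store[key] = location
        return True


def backup(index: BlockHashIndex, client: str, payload: bytes) -> int:
    """Split payload into fixed-size blocks; return how many were stored."""
    stored = 0
    for offset in range(0, len(payload), BLOCK_SIZE):
        block = payload[offset:offset + BLOCK_SIZE]
        if index.add_block(block, f"{client}:{offset}"):
            stored += 1
    return stored


# Two clients holding identical data share one pooled index.
pool = BlockHashIndex()
data = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE
first = backup(pool, "client1", data)   # both blocks are new to the pool
second = backup(pool, "client2", data)  # both blocks are already known
```

With a shared pool the second client stores nothing new, which is the
effect Kern describes above; with one index per client both would store
every block. Whatever backend we pick only has to provide the lookup and
insert done in add_block().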

What do you think about it?

-- 
Radosław Korzeniewski
radoslaw@korzeniewski.net


_______________________________________________
Bacula-devel mailing list
Bacula-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-devel

