List:       opensolaris-help
Subject:    Re: [osol-help] ZFS Dedup question
From:       Edward Ned Harvey <opensolarisisdeadlongliveopensolaris () nedharvey ! com>
Date:       2011-01-29 14:27:38
Message-ID: 000101cbbfc0$b1c38d00$154aa700$ () nedharvey ! com

> From: opensolaris-help-bounces@opensolaris.org
> [mailto:opensolaris-help-bounces@opensolaris.org] On Behalf Of Igor P
>
> The thing I was wondering about was it seems like ZFS only dedups at the
> file level and not the block. When I make multiple copies of a file to the
> store I see an increase in the dedup ratio, but when I copy similar files
> the ratio stays at 1.00x.

It is block level, and the matching happens when the blocks get written to
disk.  So if you have a chunk of data in one file, and the same chunk of
data offset by one byte in another file, those two will not match and will
not dedup.  But if you have two separate files that both start with the
same data, and that shared data covers at least one whole block (128K by
default), then those leading blocks will dedup while the rest of the file
will not, because the rest of the file is different.
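
To make the offset point concrete, here is a rough sketch in Python.  It
is not how ZFS is implemented; it only chops data into fixed 128K blocks
and compares a checksum per block.  The block size, the use of SHA-256 as
the checksum, and the helper names are all assumptions made up for the
illustration.

```python
import hashlib
import os

BLOCK_SIZE = 128 * 1024  # assumed block size, matching the 128K figure above


def block_hashes(data, block_size=BLOCK_SIZE):
    """Split data into fixed-size blocks and return one SHA-256 digest per block."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def shared_blocks(a, b):
    """Count the blocks of b whose checksum also appears among the blocks of a."""
    seen = set(block_hashes(a))
    return sum(1 for h in block_hashes(b) if h in seen)


common = os.urandom(2 * BLOCK_SIZE)        # two full blocks of identical data
file_a = common + os.urandom(BLOCK_SIZE)   # same start, different tail
file_b = common + os.urandom(BLOCK_SIZE)   # same start, different tail
file_c = b"\x00" + file_a                  # file_a's content shifted by one byte

same_prefix = shared_blocks(file_a, file_b)  # only the two leading blocks match
shifted = shared_blocks(file_a, file_c)      # no block boundary lines up any more

print("shared start:", same_prefix, "blocks; shifted by one byte:", shifted, "blocks")
```

The two files with a common start share exactly their leading blocks, while
the copy shifted by a single byte shares none, even though nearly all of
its bytes are the same.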

Generally speaking, filesystems don't have a lot of files whose internal
contents are partially the same in sections and partially different...
Dedup is mostly meant to benefit filesystems which have a lot of identical
copies of the same files.  For example, a file server in a company where a
bunch of developers are all doing work in their home directories, with
working copies of something like a svn repo, or stuff like that.  Most
users, most files, in that case, will be duplicates of files that other
users have too.

It is also worth mentioning:  Suppose you have virtual machines on your
machine, and there are files inside the virtual machines which are
duplicates of each other.  Since the filesystem of the virtual machine is
unlikely to map identical guest VM blocks to identical host machine blocks,
it's very unlikely you'll dedup anything at all inside the guest VM.
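
A toy illustration of the VM case, again in Python and again only a
sketch.  It assumes a 128K host block and a 4K guest block, and builds
hypothetical "disk images" as byte strings with the same file placed at
different guest offsets.  None of these names or sizes come from ZFS or
from any particular hypervisor.

```python
import hashlib
import os

HOST_BLOCK = 128 * 1024   # assumed host (ZFS) block size
GUEST_BLOCK = 4 * 1024    # assumed guest filesystem block size


def host_block_hashes(image):
    """Return the set of SHA-256 digests of the image's host-sized blocks."""
    return {
        hashlib.sha256(image[i:i + HOST_BLOCK]).digest()
        for i in range(0, len(image), HOST_BLOCK)
    }


def make_image(payload, guest_offset, size):
    """Build a hypothetical guest image: payload at guest_offset, random data elsewhere."""
    image = bytearray(os.urandom(size))
    image[guest_offset:guest_offset + len(payload)] = payload
    return bytes(image)


payload = os.urandom(4 * HOST_BLOCK)   # the file both guests contain
size = 8 * HOST_BLOCK

vm1 = make_image(payload, 0, size)
vm2 = make_image(payload, 3 * GUEST_BLOCK, size)  # guest-aligned, not host-aligned
vm3 = make_image(payload, 2 * HOST_BLOCK, size)   # happens to be host-aligned

unaligned_matches = len(host_block_hashes(vm1) & host_block_hashes(vm2))
aligned_matches = len(host_block_hashes(vm1) & host_block_hashes(vm3))

print("unaligned:", unaligned_matches, "aligned:", aligned_matches)
```

Only the image where the file happens to start on a host block boundary
shares blocks with the first one.  The guest has no reason to pick such an
offset, which is why the duplicates usually go unnoticed by the host.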

Also, while you set dedup=on for each zfs dataset, the deduplication is
pool-wide.  Meaning:  Suppose you have 3 zfs filesystems, tank/A, tank/B,
and tank/C.  You have dedup on for A and C.  Then the system won't try any
dedup for B, but the system will dedup A and C together.
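
The A/B/C example can be modeled roughly like this.  It is a toy model,
not ZFS internals: the Pool class, its fields, and the write_block method
are hypothetical names invented here.  The only point is that the table of
checksums belongs to the pool, while the on/off switch belongs to each
dataset.

```python
import hashlib


class Pool:
    """Toy pool: one shared dedup table, one dedup flag per dataset."""

    def __init__(self):
        self.dedup_enabled = {}    # dataset name -> bool
        self.ddt = {}              # checksum -> reference count, shared pool-wide
        self.blocks_allocated = 0  # blocks that actually consumed space

    def create(self, dataset, dedup=False):
        self.dedup_enabled[dataset] = dedup

    def write_block(self, dataset, data):
        if not self.dedup_enabled[dataset]:
            # Dedup is off for this dataset: always allocate, never consult the table.
            self.blocks_allocated += 1
            return
        key = hashlib.sha256(data).digest()
        if key in self.ddt:
            self.ddt[key] += 1     # duplicate: bump the refcount, allocate nothing
        else:
            self.ddt[key] = 1
            self.blocks_allocated += 1


tank = Pool()
tank.create("tank/A", dedup=True)
tank.create("tank/B", dedup=False)
tank.create("tank/C", dedup=True)

block = b"x" * (128 * 1024)
for dataset in ("tank/A", "tank/B", "tank/C"):
    tank.write_block(dataset, block)

print("blocks allocated:", tank.blocks_allocated)
```

Three identical writes end up as two allocated blocks: one shared by A and
C, and one for B, which never takes part in the dedup.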


_______________________________________________
opensolaris-help mailing list
opensolaris-help@opensolaris.org