List: tarsnap-users
Subject: Re: Client-side deduplication during extraction
From: James Cass <jamescass.sc () gmail ! com>
Date: 2017-11-20 12:21:03
Message-ID: CAF81nx-rRgD99_ohi7vL3P8Xzv3989n0PCXaSojVBAmzTPjCzw () mail ! gmail ! com
+1 for me. This sounds like a good idea.
That's my 2 satoshis. :-)
On Sun, Nov 19, 2017 at 8:03 PM, Colin Percival <cperciva@tarsnap.com>
wrote:
> On 11/19/17 12:37, Robie Basak wrote:
> > On Sat, Apr 08, 2017 at 07:52:54PM -0700, Colin Percival wrote:
> >> On 04/04/17 13:06, Robie Basak wrote:
> >>> Since the redundancy is there and my client has all the details,
> >>> is there any way I can take advantage of this?
> >>
> >> Not right now. This is something I've been thinking about implementing,
> >> but it's rather complicated (the tarsnap "read" path would need to look at
> >> data on disk to see what it can "reuse", and normally it doesn't read any
> >> files from disk).
> >
> > In case it helps others, I hacked together a client-side cache for this
> > one task. It appears to have worked. Patch below.
>
> Ah yes, I was thinking in terms of "notice that we're extracting the file
> 'foo' and there is already a file 'foo', then read that file in and split
> it into blocks in case any can be reused" -- the case you've covered here
> of keeping a cache of downloaded blocks is much simpler (but only covers
> the "multiple downloads of the same data" case, not the more general case
> of "synchronizing" a system with an archive).
>
> > This is absolutely a hack and not production ready (no concurrency, bad
> > error handling, hardcoded cache path whose directory must be created in
> > advance and permissions set manually, etc), but for a one-off task it
> > was enough for me to get my data out.
> > [snip patch]
>
> Yes, this patch definitely looks like it does what you want. I'd consider
> including it (well, with details tidied up) but I'm not sure if anyone else
> would want to use this functionality... anyone else on the list interested?
>
> --
> Colin Percival
> Security Officer Emeritus, FreeBSD | The power to serve
> Founder, Tarsnap | www.tarsnap.com | Online backups for the truly paranoid
>