List:       rsync
Subject:    Memory usage
From:       Eric Ziegast <ziegast () CERF ! net>
Date:       1998-05-10 10:38:50

Let's say I have a directory tree with 20GB across 800,000 files of
customers' web hosting data.  They control the data.  It's my job
to replicate it across data centers.

If I rsync the whole directory in one go, it never finishes,
because the rsync processes on both the server and the client
become so large that they just spend their time thrashing
through memory.  Really, I'm looking at a 70+ MB process before
I give up on it or it lulls itself to sleep.

Thinking I'm clever, I decided to make a process to parallelize
the directories that I'm trying to rsync.  It works great, but
some of my less clueful customers have some very large directories
full of directories.  An example:
	# /bin/ls -l | egrep -c '^d'
	8312

If I run 10 processes at a time, I get a bunch of large rsync
processes for the large directories and some spare process
slots to whittle through the rest of the directories.
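
For illustration, here is a minimal sketch of that kind of per-directory
driver.  It is not the actual script used here.  The function name
parallel_rsync, the RSYNC override variable, and the paths in the usage
comment are all hypothetical, and it assumes a find and xargs that
support -maxdepth, -print0, -0 and -P (GNU/BSD extensions).

```shell
#!/bin/sh
# Sketch only: run one rsync per immediate subdirectory of $1, copying
# each into $2, with at most $3 (default 10) transfers running at once.
# Assumes find/xargs with -maxdepth, -print0, -0 and -P (GNU/BSD extensions).
# RSYNC is a hypothetical override so the commands can be previewed with echo.
parallel_rsync() {
    src=$1
    dest=$2
    jobs=${3:-10}
    find "$src" -mindepth 1 -maxdepth 1 -type d -print0 |
        xargs -0 -P "$jobs" -I{} "${RSYNC:-rsync}" -a {} "$dest/"
}

# Preview the commands without transferring anything (hypothetical paths):
#   RSYNC=echo parallel_rsync /export/web otherhost:/export/web
```

A driver like this only caps how many rsyncs run at once.  It does
nothing to bound the size of any single one, so a customer directory
holding thousands of subdirectories still ends up as one large process.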

Client:
  USER   PID %CPU %MEM   SZ  RSS TT       S    START  TIME COMMAND
  root   718  1.4  1.0 3192 2520 pts/0    S 09:40:10  0:55 /usr/sbin/rsync
  root  8133  1.1  3.2 8232 7888 pts/0    S 10:06:25  0:45 /usr/sbin/rsync
  root  1155  0.6  1.9 6904 4672 pts/0    S 09:47:21  0:43 /usr/sbin/rsync
  root  1143  0.4  1.2 4288 2816 pts/0    S 09:47:00  0:32 /usr/sbin/rsync
  root  8104  0.3  1.1 2864 2672 pts/0    S 10:05:49  0:11 /usr/sbin/rsync
  root  1114  0.2  0.8 2272 1816 pts/0    S 09:45:46  0:19 /usr/sbin/rsync
  root  8740  0.1  0.3  936  792 pts/0    S 10:10:04  0:00 /usr/sbin/rsync
  root  8748  0.1  0.3  936  792 pts/0    S 10:10:05  0:00 /usr/sbin/rsync
  root  1007  0.0  2.6 15400 6472 pts/0    S 09:44:04  1:50 /usr/sbin/rsync

Server:
  USER   PID %CPU %MEM   SZ  RSS TT       S    START  TIME COMMAND
  root  2616 11.3  3.0 5520 3712 ?        R 09:50:52  2:11 rsync --server
  root  2569 10.4  1.6 2728 1936 ?        S 09:48:47  2:25 rsync --server
  root  2617 10.1  4.8 9048 5960 ?        S 09:52:02  1:58 rsync --server
  root  9573  9.5  2.0 3528 2400 ?        S 10:09:11  0:17 rsync --server
  root  2418  9.4  2.2 3944 2648 ?        R 09:44:14  2:59 rsync --server
  root  9572  7.7  6.3 8240 7848 ?        R 10:08:48  0:19 rsync --server
  root  2451  4.8 17.6 22568 21952 ?       S 09:46:26  1:12 rsync --server
  root  9545  4.7  2.7 3528 3296 ?        S 10:08:11  0:14 rsync --server
  root  5860  1.9  4.1 26896 5048 ?        S 10:02:48  0:22 rsync --server
  root  2181  0.0  3.0 3912 3696 ?        S 09:42:32  0:17 rsync --server
  root  2559  0.0  2.0 2696 2512 ?        S 09:48:08  0:11 rsync --server
  root  2588  0.0  4.3 5488 5312 ?        S 09:49:22  0:30 rsync --server
  root  2599  0.0  7.1 9016 8800 ?        S 09:49:43  0:51 rsync --server

My question is, "Why does the rsync process size keep growing?"
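
To put numbers on that growth, the listings can be totalled and sampled
over time.  A minimal sketch, assuming the column order shown in the
listings (SZ in field 5, RSS in field 6) and that both are reported in
KB.  The helper name total_rsync_mem and the ps invocation in the
comment are illustrative and may need adjusting for the local ps.

```shell
#!/bin/sh
# Sketch only: total the SZ and RSS columns of ps-style lines on stdin.
# Assumes the layout USER PID %CPU %MEM SZ RSS ... and sizes in KB.
# Lines whose second field is not a numeric PID (the header) are skipped.
total_rsync_mem() {
    awk '$2 ~ /^[0-9]+$/ { sz += $5; rss += $6; n++ }
         END { printf "%d procs, SZ %d KB, RSS %d KB\n", n, sz, rss }'
}

# Sample once a minute to watch the totals grow (ps options are an assumption):
#   while :; do
#       ps -eo user,pid,pcpu,pmem,vsz,rss,args | grep '[r]sync' | total_rsync_mem
#       sleep 60
#   done
```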

I would have hoped that rsync, like "du" or "tar", would not
try to remember any information about a subdirectory once it's done
with it.

Related:
	Has anyone run Pure Atria's "Purify" against rsync?  Sounds
	like a good idea if someone hasn't already done so.  I don't
	have it, otherwise I'd run it myself.

--
Eric Ziegast
ziegast@cerf.net
