[prev in list] [next in list] [prev in thread] [next in thread]
List: amanda-users
Subject: Stuck dumper
From: Mike Leong <leongmzlist () gmail ! com>
Date: 2014-02-19 18:43:48
Message-ID: CAJ=Epp8drFV=jG_hfBAJ1_2vE-s=udiRJJo7W_s+kxUUs2P0HQ () mail ! gmail ! com
[Download RAW message or body]
Hi,
I'm running into an occasional problem where a dumper would run w/ 100% CPU
for days doesn't seem to stop. The only fix I have is to manually kill the
dumper. I'm running amanda-backup_server-3.3.3-1.rhel5
-------------
Here's the dumper.<DATE>.debug:
Sun Feb 16 23:16:34 2014: thd-0x13f128e0: dumper:
security_getdriver(name=ssh) returns 0x2aca5c8fe520
Sun Feb 16 23:16:34 2014: thd-0x13f128e0: dumper:
security_handleinit(handle=0x13f8e960, driver=0x2aca5c8fe520 (SSH))
Sun Feb 16 23:16:34 2014: thd-0x13f128e0: dumper:
security_streaminit(stream=0x13f96eb0, driver=0x2aca5c8fe520 (SSH))
Sun Feb 16 23:16:34 2014: thd-0x13f128e0: dumper: exec: /usr/bin/ssh -x -o
BatchMode=yes -o PreferredAuthentications=publickey -l amandabackup -i
/var/lib/amanda/.ssh/id_rsa
colo1pchef01.company.com/usr/libexec/amanda/amandad -auth=ssh
Sun Feb 16 23:16:34 2014: thd-0x13f128e0: dumper: tcpm_send_token: data is
still flowing
Sun Feb 16 23:16:34 2014: thd-0x13f128e0: dumper: got response:
----
CONNECT DATA 499999 MESG 499998 INDEX 499997
OPTIONS features=ffffffff9efefbffffffffff0f;
----
Sun Feb 16 23:16:34 2014: thd-0x13f128e0: dumper:
security_streaminit(stream=0x13f9ef10, driver=0x2aca5c8fe520 (SSH))
Sun Feb 16 23:16:34 2014: thd-0x13f128e0: dumper:
security_streaminit(stream=0x13fa6f70, driver=0x2aca5c8fe520 (SSH))
Sun Feb 16 23:16:34 2014: thd-0x13f128e0: dumper:
security_streaminit(stream=0x13faefd0, driver=0x2aca5c8fe520 (SSH))
Sun Feb 16 23:16:34 2014: thd-0x13f128e0: dumper:
security_close(handle=0x13f8e960, driver=0x2aca5c8fe520 (SSH))
Sun Feb 16 23:16:34 2014: thd-0x13f128e0: dumper:
security_stream_close(0x13f96eb0)
Sun Feb 16 23:16:35 2014: thd-0x13f128e0: dumper: execute: /usr/bin/gzip
--best
Sun Feb 16 23:16:35 2014: thd-0x13f128e0: dumper: Building type FILE header
of 32768-32768 bytes with name='colo1pchef01' disk='/var' dumplevel=2 and
blocksize=32768
Sun Feb 16 23:16:35 2014: thd-0x13f128e0: dumper: Sending data to
127.0.0.1:3809
Sun Feb 16 23:16:35 2014: thd-0x13f128e0: dumper: make_socket opening
socket with family 2
Sun Feb 16 23:16:35 2014: thd-0x13f128e0: dumper: connect_port: Try port
1024: available - Success
Sun Feb 16 23:16:35 2014: thd-0x13f128e0: dumper: connected to
127.0.0.1:3809
Sun Feb 16 23:16:35 2014: thd-0x13f128e0: dumper: our side is 0.0.0.0:1024
Sun Feb 16 23:16:35 2014: thd-0x13f128e0: dumper: try_socksize: send buffer
size is 65536
Sun Feb 16 23:16:45 2014: thd-0x13f128e0: dumper: stream_read_callback:
data is still flowing
Sun Feb 16 23:16:56 2014: thd-0x13f128e0: dumper: stream_read_callback:
data is still flowing
afterwards, the log line "dumper: stream_read_callback: data is still
flowing" would keep repeating about every 10 seconds.
--------------
Here's the process list, stuck dumper has PID 29924:
[root@backup1:/var/log/amanda/server/CompanyDaily]# ps -ef|grep "^511"
511 1905 29921 0 Feb16 ? 00:00:11 chunker1 CompanyDaily
511 1906 29924 0 Feb16 ? 00:01:21 [ssh] <defunct>
511 1907 29924 0 Feb16 ? 00:00:00 /usr/bin/gzip --best
511 3786 29928 0 Feb16 ? 00:00:00 [ssh] <defunct>
511 3788 29925 0 Feb16 ? 00:00:00 [ssh] <defunct>
511 29882 4670 0 Feb16 ? 00:00:00 crond
511 29886 29882 0 Feb16 ? 00:00:00 /usr/bin/perl
/usr/sbin/amdump CompanyDaily
511 29918 29882 0 Feb16 ? 00:00:00 /usr/sbin/sendmail
-FCronDaemon -i -odi -oem -oi -t
511 29919 29918 0 Feb16 ? 00:00:00 /usr/sbin/postdrop -r
511 29921 29886 0 Feb16 ? 00:00:04 /usr/libexec/amanda/driver
CompanyDaily
511 29922 29921 0 Feb16 ? 00:05:02 /usr/bin/perl
/usr/libexec/amanda/taper CompanyDaily
511 29923 29921 0 Feb16 ? 00:01:09 dumper0 CompanyDaily
511 29924 29921 99 Feb16 ? 2-10:58:00 dumper1 CompanyDaily
511 29925 29921 0 Feb16 ? 00:00:20 dumper2 CompanyDaily
511 29926 29921 0 Feb16 ? 00:04:06 dumper3 CompanyDaily
511 29927 29921 0 Feb16 ? 00:02:27 dumper4 CompanyDaily
511 29928 29921 0 Feb16 ? 00:02:21 dumper5 CompanyDaily
--
heres' the pstree output via pstree -lpa :
|-crond,4670
| `-crond,29882
| |-amdump,29886 /usr/sbin/amdump LithiumDaily
| | `-driver,29921 LithiumDaily
| | |-chunker,1905 LithiumDaily
| | |-dumper,29923 LithiumDaily
| | |-dumper,29924 LithiumDaily
| | | |-gzip,1907 --best
| | | `-(ssh,1906)
| | |-dumper,29925 LithiumDaily
| | | `-(ssh,3788)
| | |-dumper,29926 LithiumDaily
| | |-dumper,29927 LithiumDaily
| | |-dumper,29928 LithiumDaily
| | | `-(ssh,3786)
| | `-taper,29922 /usr/libexec/amanda/taper LithiumDaily
| `-sendmail,29918 -FCronDaemon -i -odi -oem -oi -t
| `-postdrop,29919 -r
----
Here's the netstat output for those amanda processes:
unix 3 [ ] STREAM CONNECTED 223556345 1905/chunker1
unix 3 [ ] STREAM CONNECTED 223556344 29921/driver
unix 3 [ ] STREAM CONNECTED 223529657 29928/dumper5
unix 3 [ ] STREAM CONNECTED 223529656 29921/driver
unix 3 [ ] STREAM CONNECTED 223529655 29927/dumper4
unix 3 [ ] STREAM CONNECTED 223529654 29921/driver
unix 3 [ ] STREAM CONNECTED 223529653 29926/dumper3
unix 3 [ ] STREAM CONNECTED 223529652 29921/driver
unix 3 [ ] STREAM CONNECTED 223529651 29925/dumper2
unix 3 [ ] STREAM CONNECTED 223529650 29921/driver
unix 3 [ ] STREAM CONNECTED 223529649 29924/dumper1
unix 3 [ ] STREAM CONNECTED 223529648 29921/driver
unix 3 [ ] STREAM CONNECTED 223529647 29923/dumper0
unix 3 [ ] STREAM CONNECTED 223529646 29921/driver
unix 3 [ ] STREAM CONNECTED 223529645 29922/perl
unix 3 [ ] STREAM CONNECTED 223529644 29921/driver
unix 2 [ ] DGRAM 223529619 29919/postdrop
unix 3 [ ] STREAM CONNECTED 223529616 29918/sendmail
unix 3 [ ] STREAM CONNECTED 223529615 29919/postdrop
unix 2 [ ] DGRAM 223529609 29918/sendmail
--------
Any ideas on why it's stuck?
thx
m
[Attachment #3 (text/html)]
<div dir="ltr">Hi,<br><br>I'm running into an occasional problem where a dumper \
would run w/ 100% CPU for days doesn't seem to stop. The only fix I have is to \
manually kill the dumper. I'm running amanda-backup_server-3.3.3-1.rhel5<br> \
<br>-------------<br>Here's the dumper.<DATE>.debug:<br><br>Sun Feb 16 \
23:16:34 2014: thd-0x13f128e0: dumper: security_getdriver(name=ssh) returns \
0x2aca5c8fe520<br>Sun Feb 16 23:16:34 2014: thd-0x13f128e0: dumper: \
security_handleinit(handle=0x13f8e960, driver=0x2aca5c8fe520 (SSH))<br> Sun Feb 16 \
23:16:34 2014: thd-0x13f128e0: dumper: security_streaminit(stream=0x13f96eb0, \
driver=0x2aca5c8fe520 (SSH))<br>Sun Feb 16 23:16:34 2014: thd-0x13f128e0: dumper: \
exec: /usr/bin/ssh -x -o BatchMode=yes -o PreferredAuthentications=publickey -l \
amandabackup -i /var/lib/amanda/.ssh/id_rsa <a \
href="http://colo1pchef01.company.com">colo1pchef01.company.com</a> \
/usr/libexec/amanda/amandad -auth=ssh<br> Sun Feb 16 23:16:34 2014: thd-0x13f128e0: \
dumper: tcpm_send_token: data is still flowing<br>Sun Feb 16 23:16:34 2014: \
thd-0x13f128e0: dumper: got response:<br>----<br>CONNECT DATA 499999 MESG 499998 \
INDEX 499997<br>OPTIONS features=ffffffff9efefbffffffffff0f;<br> <br>----<br><br>Sun \
Feb 16 23:16:34 2014: thd-0x13f128e0: dumper: security_streaminit(stream=0x13f9ef10, \
driver=0x2aca5c8fe520 (SSH))<br>Sun Feb 16 23:16:34 2014: thd-0x13f128e0: dumper: \
security_streaminit(stream=0x13fa6f70, driver=0x2aca5c8fe520 (SSH))<br> Sun Feb 16 \
23:16:34 2014: thd-0x13f128e0: dumper: security_streaminit(stream=0x13faefd0, \
driver=0x2aca5c8fe520 (SSH))<br>Sun Feb 16 23:16:34 2014: thd-0x13f128e0: dumper: \
security_close(handle=0x13f8e960, driver=0x2aca5c8fe520 (SSH))<br> Sun Feb 16 \
23:16:34 2014: thd-0x13f128e0: dumper: security_stream_close(0x13f96eb0)<br>Sun Feb \
16 23:16:35 2014: thd-0x13f128e0: dumper: execute: /usr/bin/gzip --best<br>Sun Feb 16 \
23:16:35 2014: thd-0x13f128e0: dumper: Building type FILE header of 32768-32768 bytes \
with name='colo1pchef01' disk='/var' dumplevel=2 and \
blocksize=32768<br> Sun Feb 16 23:16:35 2014: thd-0x13f128e0: dumper: Sending data to \
<a href="http://127.0.0.1:3809">127.0.0.1:3809</a><br><br>Sun Feb 16 23:16:35 2014: \
thd-0x13f128e0: dumper: make_socket opening socket with family 2<br>Sun Feb 16 \
23:16:35 2014: thd-0x13f128e0: dumper: connect_port: Try port 1024: available - \
Success<br> Sun Feb 16 23:16:35 2014: thd-0x13f128e0: dumper: connected to <a \
href="http://127.0.0.1:3809">127.0.0.1:3809</a><br>Sun Feb 16 23:16:35 2014: \
thd-0x13f128e0: dumper: our side is <a \
href="http://0.0.0.0:1024">0.0.0.0:1024</a><br> Sun Feb 16 23:16:35 2014: \
thd-0x13f128e0: dumper: try_socksize: send buffer size is 65536<br>Sun Feb 16 \
23:16:45 2014: thd-0x13f128e0: dumper: stream_read_callback: data is still \
flowing<br>Sun Feb 16 23:16:56 2014: thd-0x13f128e0: dumper: stream_read_callback: \
data is still flowing<br> <br><br>afterwards, the log line "dumper: \
stream_read_callback: data is still flowing" would keep repeating about every 10 \
seconds.<br>--------------<br>Here's the process list, stuck dumper has PID \
29924:<br> <br>[root@backup1:/var/log/amanda/server/CompanyDaily]# ps -ef|grep \
"^511"<br>511 1905 29921 0 Feb16 ? 00:00:11 chunker1 \
CompanyDaily<br>511 1906 29924 0 Feb16 ? 00:01:21 [ssh] \
<defunct><br> 511 1907 29924 0 Feb16 ? 00:00:00 /usr/bin/gzip \
--best<br>511 3786 29928 0 Feb16 ? 00:00:00 [ssh] \
<defunct><br>511 3788 29925 0 Feb16 ? 00:00:00 [ssh] \
<defunct><br>511 29882 4670 0 Feb16 ? 00:00:00 crond<br> 511 \
29886 29882 0 Feb16 ? 00:00:00 /usr/bin/perl /usr/sbin/amdump \
CompanyDaily<br>511 29918 29882 0 Feb16 ? 00:00:00 /usr/sbin/sendmail \
-FCronDaemon -i -odi -oem -oi -t<br>511 29919 29918 0 Feb16 ? 00:00:00 \
/usr/sbin/postdrop -r<br> 511 29921 29886 0 Feb16 ? 00:00:04 \
/usr/libexec/amanda/driver CompanyDaily<br>511 29922 29921 0 Feb16 ? \
00:05:02 /usr/bin/perl /usr/libexec/amanda/taper CompanyDaily<br>511 29923 29921 \
0 Feb16 ? 00:01:09 dumper0 CompanyDaily<br> 511 29924 29921 99 Feb16 ? \
2-10:58:00 dumper1 CompanyDaily<br>511 29925 29921 0 Feb16 ? 00:00:20 \
dumper2 CompanyDaily<br>511 29926 29921 0 Feb16 ? 00:04:06 dumper3 \
CompanyDaily<br>511 29927 29921 0 Feb16 ? 00:02:27 dumper4 \
CompanyDaily<br> 511 29928 29921 0 Feb16 ? 00:02:21 dumper5 \
CompanyDaily<br><br>--<br>heres' the pstree output via pstree -lpa :<br><br><br> \
|-crond,4670<br> | `-crond,29882<br> | |-amdump,29886 /usr/sbin/amdump \
LithiumDaily<br> | | `-driver,29921 LithiumDaily<br> | | \
|-chunker,1905 LithiumDaily<br> | | |-dumper,29923 LithiumDaily<br> | \
| |-dumper,29924 LithiumDaily<br> | | | |-gzip,1907 --best<br> \
| | | `-(ssh,1906)<br> | | |-dumper,29925 LithiumDaily<br> \
| | | `-(ssh,3788)<br> | | |-dumper,29926 LithiumDaily<br> \
| | |-dumper,29927 LithiumDaily<br> | | |-dumper,29928 \
LithiumDaily<br> | | | `-(ssh,3786)<br> | | `-taper,29922 \
/usr/libexec/amanda/taper LithiumDaily<br> | `-sendmail,29918 -FCronDaemon -i \
-odi -oem -oi -t<br> | `-postdrop,29919 -r<br><br>----<br>Here's the \
netstat output for those amanda processes:<br><br>unix 3 [ ] STREAM \
CONNECTED 223556345 1905/chunker1<br>unix 3 [ ] STREAM \
CONNECTED 223556344 29921/driver<br> unix 3 [ ] STREAM \
CONNECTED 223529657 29928/dumper5<br>unix 3 [ ] STREAM \
CONNECTED 223529656 29921/driver<br>unix 3 [ ] STREAM CONNECTED \
223529655 29927/dumper4<br> unix 3 [ ] STREAM CONNECTED \
223529654 29921/driver<br>unix 3 [ ] STREAM CONNECTED 223529653 \
29926/dumper3<br>unix 3 [ ] STREAM CONNECTED 223529652 \
29921/driver<br> unix 3 [ ] STREAM CONNECTED 223529651 \
29925/dumper2<br>unix 3 [ ] STREAM CONNECTED 223529650 \
29921/driver<br>unix 3 [ ] STREAM CONNECTED 223529649 \
29924/dumper1<br> unix 3 [ ] STREAM CONNECTED 223529648 \
29921/driver<br>unix 3 [ ] STREAM CONNECTED 223529647 \
29923/dumper0<br>unix 3 [ ] STREAM CONNECTED 223529646 \
29921/driver<br> unix 3 [ ] STREAM CONNECTED 223529645 \
29922/perl<br>unix 3 [ ] STREAM CONNECTED 223529644 \
29921/driver<br>unix 2 [ ] DGRAM 223529619 \
29919/postdrop<br> unix 3 [ ] STREAM CONNECTED 223529616 \
29918/sendmail<br>unix 3 [ ] STREAM CONNECTED 223529615 \
29919/postdrop<br>unix 2 [ ] DGRAM 223529609 \
29918/sendmail<br> <br>--------<br><br>Any ideas on why it's \
stuck?<br><br>thx<br>m</div>
[prev in list] [next in list] [prev in thread] [next in thread]
Configure |
About |
News |
Add a list |
Sponsored by KoreLogic