[prev in list] [next in list] [prev in thread] [next in thread] 

List:       amanda-users
Subject:    Stuck dumper
From:       Mike Leong <leongmzlist () gmail ! com>
Date:       2014-02-19 18:43:48
Message-ID: CAJ=Epp8drFV=jG_hfBAJ1_2vE-s=udiRJJo7W_s+kxUUs2P0HQ () mail ! gmail ! com
[Download RAW message or body]

Hi,

I'm running into an occasional problem where a dumper would run w/ 100% CPU
for days doesn't seem to stop.  The only fix I have is to manually kill the
dumper.  I'm running amanda-backup_server-3.3.3-1.rhel5

-------------
Here's the dumper.<DATE>.debug:

Sun Feb 16 23:16:34 2014: thd-0x13f128e0: dumper:
security_getdriver(name=ssh) returns 0x2aca5c8fe520
Sun Feb 16 23:16:34 2014: thd-0x13f128e0: dumper:
security_handleinit(handle=0x13f8e960, driver=0x2aca5c8fe520 (SSH))
Sun Feb 16 23:16:34 2014: thd-0x13f128e0: dumper:
security_streaminit(stream=0x13f96eb0, driver=0x2aca5c8fe520 (SSH))
Sun Feb 16 23:16:34 2014: thd-0x13f128e0: dumper: exec: /usr/bin/ssh -x -o
BatchMode=yes -o PreferredAuthentications=publickey -l amandabackup -i
/var/lib/amanda/.ssh/id_rsa
colo1pchef01.company.com/usr/libexec/amanda/amandad -auth=ssh
Sun Feb 16 23:16:34 2014: thd-0x13f128e0: dumper: tcpm_send_token: data is
still flowing
Sun Feb 16 23:16:34 2014: thd-0x13f128e0: dumper: got response:
----
CONNECT DATA 499999 MESG 499998 INDEX 499997
OPTIONS features=ffffffff9efefbffffffffff0f;

----

Sun Feb 16 23:16:34 2014: thd-0x13f128e0: dumper:
security_streaminit(stream=0x13f9ef10, driver=0x2aca5c8fe520 (SSH))
Sun Feb 16 23:16:34 2014: thd-0x13f128e0: dumper:
security_streaminit(stream=0x13fa6f70, driver=0x2aca5c8fe520 (SSH))
Sun Feb 16 23:16:34 2014: thd-0x13f128e0: dumper:
security_streaminit(stream=0x13faefd0, driver=0x2aca5c8fe520 (SSH))
Sun Feb 16 23:16:34 2014: thd-0x13f128e0: dumper:
security_close(handle=0x13f8e960, driver=0x2aca5c8fe520 (SSH))
Sun Feb 16 23:16:34 2014: thd-0x13f128e0: dumper:
security_stream_close(0x13f96eb0)
Sun Feb 16 23:16:35 2014: thd-0x13f128e0: dumper: execute: /usr/bin/gzip
--best
Sun Feb 16 23:16:35 2014: thd-0x13f128e0: dumper: Building type FILE header
of 32768-32768 bytes with name='colo1pchef01' disk='/var' dumplevel=2 and
blocksize=32768
Sun Feb 16 23:16:35 2014: thd-0x13f128e0: dumper: Sending data to
127.0.0.1:3809

Sun Feb 16 23:16:35 2014: thd-0x13f128e0: dumper: make_socket opening
socket with family 2
Sun Feb 16 23:16:35 2014: thd-0x13f128e0: dumper: connect_port: Try  port
1024: available - Success
Sun Feb 16 23:16:35 2014: thd-0x13f128e0: dumper: connected to
127.0.0.1:3809
Sun Feb 16 23:16:35 2014: thd-0x13f128e0: dumper: our side is 0.0.0.0:1024
Sun Feb 16 23:16:35 2014: thd-0x13f128e0: dumper: try_socksize: send buffer
size is 65536
Sun Feb 16 23:16:45 2014: thd-0x13f128e0: dumper: stream_read_callback:
data is still flowing
Sun Feb 16 23:16:56 2014: thd-0x13f128e0: dumper: stream_read_callback:
data is still flowing


afterwards, the log line "dumper: stream_read_callback: data is still
flowing" would keep repeating about every 10 seconds.
--------------
Here's the process list, stuck dumper has PID 29924:

[root@backup1:/var/log/amanda/server/CompanyDaily]# ps -ef|grep "^511"
511       1905 29921  0 Feb16 ?        00:00:11 chunker1 CompanyDaily
511       1906 29924  0 Feb16 ?        00:01:21 [ssh] <defunct>
511       1907 29924  0 Feb16 ?        00:00:00 /usr/bin/gzip --best
511       3786 29928  0 Feb16 ?        00:00:00 [ssh] <defunct>
511       3788 29925  0 Feb16 ?        00:00:00 [ssh] <defunct>
511      29882  4670  0 Feb16 ?        00:00:00 crond
511      29886 29882  0 Feb16 ?        00:00:00 /usr/bin/perl
/usr/sbin/amdump CompanyDaily
511      29918 29882  0 Feb16 ?        00:00:00 /usr/sbin/sendmail
-FCronDaemon -i -odi -oem -oi -t
511      29919 29918  0 Feb16 ?        00:00:00 /usr/sbin/postdrop -r
511      29921 29886  0 Feb16 ?        00:00:04 /usr/libexec/amanda/driver
CompanyDaily
511      29922 29921  0 Feb16 ?        00:05:02 /usr/bin/perl
/usr/libexec/amanda/taper CompanyDaily
511      29923 29921  0 Feb16 ?        00:01:09 dumper0 CompanyDaily
511      29924 29921 99 Feb16 ?        2-10:58:00 dumper1 CompanyDaily
511      29925 29921  0 Feb16 ?        00:00:20 dumper2 CompanyDaily
511      29926 29921  0 Feb16 ?        00:04:06 dumper3 CompanyDaily
511      29927 29921  0 Feb16 ?        00:02:27 dumper4 CompanyDaily
511      29928 29921  0 Feb16 ?        00:02:21 dumper5 CompanyDaily

--
heres' the pstree output via pstree -lpa :


  |-crond,4670
  |   `-crond,29882
  |       |-amdump,29886 /usr/sbin/amdump LithiumDaily
  |       |   `-driver,29921 LithiumDaily
  |       |       |-chunker,1905 LithiumDaily
  |       |       |-dumper,29923 LithiumDaily
  |       |       |-dumper,29924 LithiumDaily
  |       |       |   |-gzip,1907 --best
  |       |       |   `-(ssh,1906)
  |       |       |-dumper,29925 LithiumDaily
  |       |       |   `-(ssh,3788)
  |       |       |-dumper,29926 LithiumDaily
  |       |       |-dumper,29927 LithiumDaily
  |       |       |-dumper,29928 LithiumDaily
  |       |       |   `-(ssh,3786)
  |       |       `-taper,29922 /usr/libexec/amanda/taper LithiumDaily
  |       `-sendmail,29918 -FCronDaemon -i -odi -oem -oi -t
  |           `-postdrop,29919 -r

----
Here's the netstat output for those amanda processes:

unix  3      [ ]         STREAM     CONNECTED     223556345 1905/chunker1
unix  3      [ ]         STREAM     CONNECTED     223556344 29921/driver
unix  3      [ ]         STREAM     CONNECTED     223529657 29928/dumper5
unix  3      [ ]         STREAM     CONNECTED     223529656 29921/driver
unix  3      [ ]         STREAM     CONNECTED     223529655 29927/dumper4
unix  3      [ ]         STREAM     CONNECTED     223529654 29921/driver
unix  3      [ ]         STREAM     CONNECTED     223529653 29926/dumper3
unix  3      [ ]         STREAM     CONNECTED     223529652 29921/driver
unix  3      [ ]         STREAM     CONNECTED     223529651 29925/dumper2
unix  3      [ ]         STREAM     CONNECTED     223529650 29921/driver
unix  3      [ ]         STREAM     CONNECTED     223529649 29924/dumper1
unix  3      [ ]         STREAM     CONNECTED     223529648 29921/driver
unix  3      [ ]         STREAM     CONNECTED     223529647 29923/dumper0
unix  3      [ ]         STREAM     CONNECTED     223529646 29921/driver
unix  3      [ ]         STREAM     CONNECTED     223529645 29922/perl
unix  3      [ ]         STREAM     CONNECTED     223529644 29921/driver
unix  2      [ ]         DGRAM                    223529619 29919/postdrop
unix  3      [ ]         STREAM     CONNECTED     223529616 29918/sendmail
unix  3      [ ]         STREAM     CONNECTED     223529615 29919/postdrop
unix  2      [ ]         DGRAM                    223529609 29918/sendmail

--------

Any ideas on why it's stuck?

thx
m

[Attachment #3 (text/html)]

<div dir="ltr">Hi,<br><br>I&#39;m running into an occasional problem where a dumper \
would run w/ 100% CPU for days doesn&#39;t seem to stop.  The only fix I have is to \
manually kill the dumper.  I&#39;m running amanda-backup_server-3.3.3-1.rhel5<br> \
<br>-------------<br>Here&#39;s the dumper.&lt;DATE&gt;.debug:<br><br>Sun Feb 16 \
23:16:34 2014: thd-0x13f128e0: dumper: security_getdriver(name=ssh) returns \
0x2aca5c8fe520<br>Sun Feb 16 23:16:34 2014: thd-0x13f128e0: dumper: \
security_handleinit(handle=0x13f8e960, driver=0x2aca5c8fe520 (SSH))<br> Sun Feb 16 \
23:16:34 2014: thd-0x13f128e0: dumper: security_streaminit(stream=0x13f96eb0, \
driver=0x2aca5c8fe520 (SSH))<br>Sun Feb 16 23:16:34 2014: thd-0x13f128e0: dumper: \
exec: /usr/bin/ssh -x -o BatchMode=yes -o PreferredAuthentications=publickey -l \
amandabackup -i /var/lib/amanda/.ssh/id_rsa <a \
href="http://colo1pchef01.company.com">colo1pchef01.company.com</a> \
/usr/libexec/amanda/amandad -auth=ssh<br> Sun Feb 16 23:16:34 2014: thd-0x13f128e0: \
dumper: tcpm_send_token: data is still flowing<br>Sun Feb 16 23:16:34 2014: \
thd-0x13f128e0: dumper: got response:<br>----<br>CONNECT DATA 499999 MESG 499998 \
INDEX 499997<br>OPTIONS features=ffffffff9efefbffffffffff0f;<br> <br>----<br><br>Sun \
Feb 16 23:16:34 2014: thd-0x13f128e0: dumper: security_streaminit(stream=0x13f9ef10, \
driver=0x2aca5c8fe520 (SSH))<br>Sun Feb 16 23:16:34 2014: thd-0x13f128e0: dumper: \
security_streaminit(stream=0x13fa6f70, driver=0x2aca5c8fe520 (SSH))<br> Sun Feb 16 \
23:16:34 2014: thd-0x13f128e0: dumper: security_streaminit(stream=0x13faefd0, \
driver=0x2aca5c8fe520 (SSH))<br>Sun Feb 16 23:16:34 2014: thd-0x13f128e0: dumper: \
security_close(handle=0x13f8e960, driver=0x2aca5c8fe520 (SSH))<br> Sun Feb 16 \
23:16:34 2014: thd-0x13f128e0: dumper: security_stream_close(0x13f96eb0)<br>Sun Feb \
16 23:16:35 2014: thd-0x13f128e0: dumper: execute: /usr/bin/gzip --best<br>Sun Feb 16 \
23:16:35 2014: thd-0x13f128e0: dumper: Building type FILE header of 32768-32768 bytes \
with name=&#39;colo1pchef01&#39; disk=&#39;/var&#39; dumplevel=2 and \
blocksize=32768<br> Sun Feb 16 23:16:35 2014: thd-0x13f128e0: dumper: Sending data to \
<a href="http://127.0.0.1:3809">127.0.0.1:3809</a><br><br>Sun Feb 16 23:16:35 2014: \
thd-0x13f128e0: dumper: make_socket opening socket with family 2<br>Sun Feb 16 \
23:16:35 2014: thd-0x13f128e0: dumper: connect_port: Try  port 1024: available - \
Success<br> Sun Feb 16 23:16:35 2014: thd-0x13f128e0: dumper: connected to <a \
href="http://127.0.0.1:3809">127.0.0.1:3809</a><br>Sun Feb 16 23:16:35 2014: \
thd-0x13f128e0: dumper: our side is <a \
href="http://0.0.0.0:1024">0.0.0.0:1024</a><br> Sun Feb 16 23:16:35 2014: \
thd-0x13f128e0: dumper: try_socksize: send buffer size is 65536<br>Sun Feb 16 \
23:16:45 2014: thd-0x13f128e0: dumper: stream_read_callback: data is still \
flowing<br>Sun Feb 16 23:16:56 2014: thd-0x13f128e0: dumper: stream_read_callback: \
data is still flowing<br> <br><br>afterwards, the log line &quot;dumper: \
stream_read_callback: data is still flowing&quot; would keep repeating about every 10 \
seconds.<br>--------------<br>Here&#39;s the process list, stuck dumper has PID \
29924:<br> <br>[root@backup1:/var/log/amanda/server/CompanyDaily]# ps -ef|grep \
&quot;^511&quot;<br>511       1905 29921  0 Feb16 ?        00:00:11 chunker1 \
CompanyDaily<br>511       1906 29924  0 Feb16 ?        00:01:21 [ssh] \
&lt;defunct&gt;<br> 511       1907 29924  0 Feb16 ?        00:00:00 /usr/bin/gzip \
--best<br>511       3786 29928  0 Feb16 ?        00:00:00 [ssh] \
&lt;defunct&gt;<br>511       3788 29925  0 Feb16 ?        00:00:00 [ssh] \
&lt;defunct&gt;<br>511      29882  4670  0 Feb16 ?        00:00:00 crond<br> 511      \
29886 29882  0 Feb16 ?        00:00:00 /usr/bin/perl /usr/sbin/amdump \
CompanyDaily<br>511      29918 29882  0 Feb16 ?        00:00:00 /usr/sbin/sendmail \
-FCronDaemon -i -odi -oem -oi -t<br>511      29919 29918  0 Feb16 ?        00:00:00 \
/usr/sbin/postdrop -r<br> 511      29921 29886  0 Feb16 ?        00:00:04 \
/usr/libexec/amanda/driver CompanyDaily<br>511      29922 29921  0 Feb16 ?        \
00:05:02 /usr/bin/perl /usr/libexec/amanda/taper CompanyDaily<br>511      29923 29921 \
0 Feb16 ?        00:01:09 dumper0 CompanyDaily<br> 511      29924 29921 99 Feb16 ?    \
2-10:58:00 dumper1 CompanyDaily<br>511      29925 29921  0 Feb16 ?        00:00:20 \
dumper2 CompanyDaily<br>511      29926 29921  0 Feb16 ?        00:04:06 dumper3 \
CompanyDaily<br>511      29927 29921  0 Feb16 ?        00:02:27 dumper4 \
CompanyDaily<br> 511      29928 29921  0 Feb16 ?        00:02:21 dumper5 \
CompanyDaily<br><br>--<br>heres&#39; the pstree output via pstree -lpa :<br><br><br>  \
|-crond,4670<br>  |   `-crond,29882<br>  |       |-amdump,29886 /usr/sbin/amdump \
LithiumDaily<br>  |       |   `-driver,29921 LithiumDaily<br>  |       |       \
|-chunker,1905 LithiumDaily<br>  |       |       |-dumper,29923 LithiumDaily<br>  |   \
|       |-dumper,29924 LithiumDaily<br>  |       |       |   |-gzip,1907 --best<br>  \
|       |       |   `-(ssh,1906)<br>  |       |       |-dumper,29925 LithiumDaily<br> \
|       |       |   `-(ssh,3788)<br>  |       |       |-dumper,29926 LithiumDaily<br> \
|       |       |-dumper,29927 LithiumDaily<br>  |       |       |-dumper,29928 \
LithiumDaily<br>  |       |       |   `-(ssh,3786)<br>  |       |       `-taper,29922 \
/usr/libexec/amanda/taper LithiumDaily<br>  |       `-sendmail,29918 -FCronDaemon -i \
-odi -oem -oi -t<br>  |           `-postdrop,29919 -r<br><br>----<br>Here&#39;s the \
netstat output for those amanda processes:<br><br>unix  3      [ ]         STREAM     \
CONNECTED     223556345 1905/chunker1<br>unix  3      [ ]         STREAM     \
CONNECTED     223556344 29921/driver<br> unix  3      [ ]         STREAM     \
CONNECTED     223529657 29928/dumper5<br>unix  3      [ ]         STREAM     \
CONNECTED     223529656 29921/driver<br>unix  3      [ ]         STREAM     CONNECTED \
223529655 29927/dumper4<br> unix  3      [ ]         STREAM     CONNECTED     \
223529654 29921/driver<br>unix  3      [ ]         STREAM     CONNECTED     223529653 \
29926/dumper3<br>unix  3      [ ]         STREAM     CONNECTED     223529652 \
29921/driver<br> unix  3      [ ]         STREAM     CONNECTED     223529651 \
29925/dumper2<br>unix  3      [ ]         STREAM     CONNECTED     223529650 \
29921/driver<br>unix  3      [ ]         STREAM     CONNECTED     223529649 \
29924/dumper1<br> unix  3      [ ]         STREAM     CONNECTED     223529648 \
29921/driver<br>unix  3      [ ]         STREAM     CONNECTED     223529647 \
29923/dumper0<br>unix  3      [ ]         STREAM     CONNECTED     223529646 \
29921/driver<br> unix  3      [ ]         STREAM     CONNECTED     223529645 \
29922/perl<br>unix  3      [ ]         STREAM     CONNECTED     223529644 \
29921/driver<br>unix  2      [ ]         DGRAM                    223529619 \
29919/postdrop<br> unix  3      [ ]         STREAM     CONNECTED     223529616 \
29918/sendmail<br>unix  3      [ ]         STREAM     CONNECTED     223529615 \
29919/postdrop<br>unix  2      [ ]         DGRAM                    223529609 \
29918/sendmail<br> <br>--------<br><br>Any ideas on why it&#39;s \
stuck?<br><br>thx<br>m</div>



[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic