[prev in list] [next in list] [prev in thread] [next in thread] 

List:       snort-devel
Subject:    Re: [Snort-devel] Unable to kill a non-zombie process with -9
From:       elof2 () sentor ! se
Date:       2014-10-31 17:58:38
Message-ID: alpine.BSF.2.00.1410311819410.32996 () farmermaggot ! shire ! sentor ! se
[Download RAW message or body]


No one have any thoughts about this?

Its happening sporadically on several FreeBSD 10 machines of mine, while 
all of the FreeBSD 9-machines work just fine.

If the problem isn't fixed, people won't be able to upgrade to and run 
snort on FreeBSD 10.

log:

> I'm starting snort (as root).
> 
> <<<lots of startup logs for pid 22646>>>
> Oct 15 08:46:59 snort[22646]: Initializing daemon mode
> Oct 15 08:46:59 snort[22648]: Daemon initialized, signaled parent pid: 22646
> Oct 15 08:46:59 snort[22648]: Reload thread starting...
> Oct 15 08:46:59 snort[22648]: Reload thread started, thread 0x8146e8800 (22648)
> End of log.
> 
> Error! Nothing more happens with the snort process!
> Normally it should continue and log the following lines as well:
> 
> 
> snort[nnn]: Decoding Ethernet
> snort[nnn]: Checking PID path...
> snort[nnn]: PID path stat checked out ok, PID path set to /var/run/
> snort[nnn]: Writing PID "7627" to file "/var/run//snort_mon0.pid"
> snort[nnn]: Chroot directory = /usr/foobar/log
> snort[nnn]: Set gid to 100
> snort[nnn]: Set uid to 100
> snort[nnn]:
> snort[nnn]:         --== Initialization Complete ==--
> snort[nnn]: Commencing packet processing (pid=nnn)
> 
> 
> 
> 
> 
> When looking at this half-started snort process with 'ps', it looks like this:
> 
> ps faxulwwj 22648
> USER   PID %CPU %MEM    VSZ    RSS TT  STAT STARTED     TIME COMMAND        UID \
> PPID CPU PRI NI MWCHAN  PGID   SID JOBC root 22648 51.8  1.1 488552 179344  -  Rs   \
> 8:46AM 53:06.52 /usr/local/bin/s 0    1   0  88  0 -      22648 22648    0 
> 
> The process is still owned by root, so just as the missing log lines are
> saying, it has not yet performed any change of uid/gid.
> 
> 
> 
> 
> So there seem to be two questions.
> 
> Q1)
> What happens between "Reload thread started, thread 0x8146e8800 (22648)" and
> "Decoding Ethernet"?
> Apparently something goes wrong here on FreeBSD 10.0.
> (this problem does not always occur, sometimes snort start just fine)
> 
> Q2)
> When the process has frozen in this half-started state, it can't be killed even
> with a kill -9. Why?
> 
> 
> 
> 
> John-Mark asked me for some debugging info. Here it is:
> 
> I now run 'kill 22648' on the above semi-started process:
> 
> USER   PID %CPU %MEM    VSZ    RSS TT  STAT STARTED     TIME COMMAND UID
> PPID CPU PRI NI MWCHAN  PGID   SID JOBC
> old root 22648 51.8  1.1 488552 179344  -  Rs    8:46AM 53:06.52
> /usr/local/bin/s   0    1   0  88  0 -      22648 22648    0
> new root 22648 52.3  1.1 488552 179344  -  Rs    8:46AM 53:36.48
> /usr/local/bin/s   0    1   0  52  0 -      22648 22648    0
> 
> No change.
> 
> 
> 
> kill -9 22648
> 
> USER   PID %CPU %MEM    VSZ    RSS TT  STAT STARTED     TIME COMMAND UID
> PPID CPU PRI NI MWCHAN  PGID   SID JOBC
> old root 22648 51.8  1.1 488552 179344  -  Rs    8:46AM 53:06.52
> /usr/local/bin/s   0    1   0  88  0 -      22648 22648    0
> new root 22648 37.7  1.1 488552 179344  -  Ts    8:46AM 53:50.87
> /usr/local/bin/s   0    1   0  52  0 -      22648 22648    0
> 
> Less CPU-usage and STAT changed to "Ts".
> 
> 
> 
> 
> kill -CONT 22648
> 
> USER   PID %CPU %MEM    VSZ    RSS TT  STAT STARTED     TIME COMMAND UID
> PPID CPU PRI NI MWCHAN  PGID   SID JOBC
> old root 22648 51.8  1.1 488552 179344  -  Rs    8:46AM 53:06.52
> /usr/local/bin/s   0    1   0  88  0 -      22648 22648    0
> new root 22648  0.0  1.1 488552 179344  -  Ts    8:46AM 53:50.88
> /usr/local/bin/s   0    1   0  52  0 -      22648 22648    0
> 
> No change except cpu is down to 0.
> 
> 
> I now start 'kgdb'
> info threads
> I found two threads for snort, doing a bt for both of them:
> 372 Thread 100602 (PID=22648: snort)  sched_switch (td=0xfffff802c061f490,
> newtd=<value optimized out>, flags=<value optimized out>) at
> /usr/src/sys/kern/sched_ule.c:1962
> 371 Thread 100598 (PID=22648: snort)  sched_switch (td=0xfffff80221857000,
> newtd=<value optimized out>, flags=<value optimized out>) at
> /usr/src/sys/kern/sched_ule.c:1962
> thread 372
> [Switching to thread 372 (Thread 100602)]#0  sched_switch
> (td=0xfffff802c061f490, newtd=<value optimized out>, flags=<value optimized
> out>) at /usr/src/sys/kern/sched_ule.c:1962
> 1962    in /usr/src/sys/kern/sched_ule.c
> bt
> #0  sched_switch (td=0xfffff802c061f490, newtd=<value optimized out>,
> flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1962
> #1  0xffffffff808b8c1e in mi_switch (flags=266, newtd=0x0) at
> /usr/src/sys/kern/kern_synch.c:494
> #2  0xffffffff808c04b0 in thread_suspend_switch (td=0xfffff802c061f490) at
> /usr/src/sys/kern/kern_thread.c:883
> #3  0xffffffff808c0276 in thread_single (mode=1) at
> /usr/src/sys/kern/kern_thread.c:713
> #4  0xffffffff8087c1bb in exit1 (td=0xfffff802c061f490, rv=9) at
> /usr/src/sys/kern/kern_exit.c:180
> #5  0xffffffff808b2faf in sigexit (td=<value optimized out>, sig=<value
> optimized out>) at /usr/src/sys/kern/kern_sig.c:2935
> #6  0xffffffff808b3669 in postsig (sig=<value optimized out>) at
> /usr/src/sys/kern/kern_sig.c:2822
> #7  0xffffffff808f6f57 in ast (framep=<value optimized out>) at
> /usr/src/sys/kern/subr_trap.c:271
> #8  0xffffffff80c75870 in Xfast_syscall () at
> /usr/src/sys/amd64/amd64/exception.S:416
> #9  0x0000000801d6f19a in ?? ()
> Previous frame inner to this frame (corrupt stack?)
> 
> 
> thread 371
> [Switching to thread 371 (Thread 100598)]#0  sched_switch
> (td=0xfffff80221857000, newtd=<value optimized out>, flags=<value optimized
> out>) at /usr/src/sys/kern/sched_ule.c:1962
> 1962    in /usr/src/sys/kern/sched_ule.c
> bt
> #0  sched_switch (td=0xfffff80221857000, newtd=<value optimized out>,
> flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1962
> #1  0xffffffff808b8c1e in mi_switch (flags=260, newtd=0x0) at
> /usr/src/sys/kern/kern_synch.c:494
> #2  0xffffffff808f2e3a in sleepq_wait (wchan=0x0, pri=0) at
> /usr/src/sys/kern/subr_sleepqueue.c:620
> #3  0xffffffff80864aad in _cv_wait (cvp=0xffffffff8147a500,
> lock=0xffffffff8147a480) at /usr/src/sys/kern/kern_condvar.c:139
> #4  0xffffffff808fb05f in vmem_xalloc (vm=0xffffffff8147a480, size0=<value
> optimized out>, align=<value optimized out>, phase=0, nocross=<value optimized
> out>, minaddr=0, maxaddr=18446735286768857088, flags=8194, addrp=<value
> optimized out>) at /usr/src/sys/kern/subr_vmem.c:1196
> #5  0xffffffff808fae6b in vmem_alloc (vm=0x0, size=0, flags=<value optimized
> out>, addrp=0xfffffe0466e1d6e8) at /usr/src/sys/kern/subr_vmem.c:1082
> #6  0xffffffff80b0fa58 in kmem_malloc (vmem=0xffffffff8147a480,
> size=2139729920, flags=2) at /usr/src/sys/vm/vm_kern.c:314
> #7  0xffffffff80b08dfb in uma_large_malloc (size=<value optimized out>, wait=2)
> at /usr/src/sys/vm/uma_core.c:1006
> #8  0xffffffff80898cf3 in malloc (size=2139729920, mtp=0xffffffff813a0450,
> flags=0) at /usr/src/sys/kern/kern_malloc.c:520
> #9  0xffffffff8096307b in bpf_buffer_ioctl_sblen (d=0xfffff80159ea9000,
> i=<value optimized out>) at /usr/src/sys/net/bpf_buffer.c:183
> #10 0xffffffff80960a3c in bpfioctl (dev=0x0, cmd=<value optimized out>,
> addr=0xfffff801fbd06b40 "", flags=0, td=0xfffff80221857000) at
> /usr/src/sys/net/bpf.c:408
> #11 0xffffffff807ac1df in devfs_ioctl_f (fp=0xfffff8002b3d9d20, com=3221504614,
> data=0xfffff801fbd06b40, cred=<value optimized out>, td=0xfffff80221857000) at
> /usr/src/sys/fs/devfs/devfs_vnops.c:757
> #12 0xffffffff808fdfae in kern_ioctl (td=0xfffff80221857000, fd=<value
> optimized out>, com=0) at file.h:319
> #13 0xffffffff808fdd2f in sys_ioctl (td=0xfffff80221857000,
> uap=0xfffffe0466e1da40) at /usr/src/sys/kern/sys_generic.c:702
> #14 0xffffffff80c8f117 in amd64_syscall (td=0xfffff80221857000, traced=0) at
> subr_syscall.c:134
> #15 0xffffffff80c7580b in Xfast_syscall () at
> /usr/src/sys/amd64/amd64/exception.S:391
> #16 0x0000000801d8f08a in ?? ()
> Previous frame inner to this frame (corrupt stack?)
> 
> 
> Let me know if I can debug this any further.
> 
> /Elof
> 
> 
> 
> On Thu, 9 Oct 2014, John-Mark Gurney wrote:
> 
> > elof2@sentor.se wrote this message on Wed, Oct 08, 2014 at 13:30 +0200:
> > > 
> > > I guess this is a bug report for FreeBSD 10.0.
> > > 
> > > 
> > > 
> > > Sometimes I can't kill my snort process on FreeBSD 10.0.
> > > It won't die, even with kill -9.
> > > 
> > > I'm not talking about a zombie process. Snort is a process that should
> > > die normally.
> > > I've run snort on over 100 nodes since FreeBSD v6.x and I've never seen
> > > this behavior until now in FreeBSD 10.0.
> > > 
> > > 
> > > Example:
> > > 
> > > #ps faxuw
> > > USER      PID  %CPU %MEM    VSZ    RSS TT  STAT STARTED        TIME
> > > COMMAND
> > > root    49222  53.4  2.2 492648 183012  -  Rs   11:46AM     7:05.59
> > > /usr/local/bin/snort -q -D -c snort.conf
> > > root    47937   0.0  2.2 488552 182864  -  Ts   10:56AM    29:35.98
> > > /usr/local/bin/snort -q -D -c snort.conf
> > 
> > What is the MWCHAN?  add l to the ps command...
> > 
> > > The pid 47937 has been killed (repeatedly) with -9.
> > > Its status is "Ts" meaning it is Stopped.
> > 
> > have you tried to kill -CONT <pid> to resume it?
> > 
> > > But it won't actually die and disappear. The only way to get rid of it
> > > seem to be to reboot the machine. :-(
> > > 
> > > (pid 49222 is the new process that was started after 47937 was killed)
> > > 
> > > 
> > > The problem doesn't happen all the time and I haven't found any patterns
> > > as to when it does. :-(
> > > If I restart snort once every day, it fails to die approximately 2-4 times
> > > per month.
> > > Even though the problem doesn't happen on every kill, it is a definately a
> > > recurring event.
> > 
> > Can you run kgdb on the machine? (yes, it works on a live machine), use
> > info threads to find the thread id, and then use thread <threadid> to
> > switch to it, and run bt to get a back trace...
> > 
> > > I began to see it on a heavily loaded 10GE sensor, so I thought it could
> > > have something to do with the ix driver, or the heavy load.
> > > But now another FreeBSD 10.0-sensor had the exact same problem, and this
> > > sensor don't have any 10GE NICs. In fact, this sensor has been running
> > > just fine with both FreeBSD 9.1 and 9.3 for the past years. Snort has
> > > always terminated correctly! After I reinstalled this machine with FreeBSD
> > > 10.0 last friday, snort has then terminated correctly every day until
> > > today, when it failed with the above pid 47937. (this sensor use the 'em'
> > > driver, not 'ixgbe')
> > > 
> > > I'm running snort with the same configuration, settings, version, daq,
> > > libs, etc on 10.0 as I do on 9.3.
> > > None of the 9.3 sensors have this problem, so it has to be something new
> > > in FreeBSD 10.0.
> > 
> > --
> > John-Mark Gurney				Voice: +1 415 225 5579
> > 
> > "All that I will do, has been done, All that I have, has not."
> > 
> 
> 
> ------------------------------------------------------------------------------
> Comprehensive Server Monitoring with Site24x7.
> Monitor 10 servers for $9/Month.
> Get alerted through email, SMS, voice calls or mobile push notifications.
> Take corrective actions from your mobile device.
> http://p.sf.net/sfu/Zoho
> _______________________________________________
> Snort-devel mailing list
> Snort-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/snort-devel
> Archive:
> http://sourceforge.net/mailarchive/forum.php?forum_name=snort-devel
> 
> Please visit http://blog.snort.org for the latest news about Snort!
> 

------------------------------------------------------------------------------
_______________________________________________
Snort-devel mailing list
Snort-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/snort-devel
Archive:
http://sourceforge.net/mailarchive/forum.php?forum_name=snort-devel

Please visit http://blog.snort.org for the latest news about Snort!


[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic