
List:       zeromq-dev
Subject:    Re: [zeromq-dev] zctx_destroy is hanging
From:       Stephen Hemminger <stephen () networkplumber ! org>
Date:       2013-05-22 18:12:59
Message-ID: 20130522111259.19dbc305 () nehalam ! linuxnetplumber ! net

On Wed, 22 May 2013 09:44:52 +0200
Pieter Hintjens <ph@imatix.com> wrote:

> Can you provide a minimal reproducible case?
> 
> -Pieter
> 
> 
> On Wed, May 22, 2013 at 12:32 AM, Stephen Hemminger <
> stephen@networkplumber.org> wrote:
> 
> > We have a ZMQ based application (in C) using CZMQ and ZMQ 2.2.0
> > When daemon is due to be restarted or shutdown
> >  1. it receives a SIGTERM
> >  2. The signal is caught, and flag is set
> >  3. all the worker threads exit
> >  4. main thread waits for workers and does some other cleanup
> >  5. calls zctx_destroy()
> > and hangs there; any clues? maybe the zctx_destroy() is redundant anyway.
> >
> >
> > int
> > main(int argc, char **argv)
> > {
> >    ...
> >
> >         zctx_destroy(&zmq_ctx); << hang here
> >
> >         return 0;
> > }
> >
> > There were several ZMQ sockets created, instrumenting CZMQ, it looks
> > like ZMQ is hanging in zctx__socket_destroy() of the ZMQ_REQ socket
> > > which was bound twice, once to an ipc: endpoint and again to a
> > > tcp://lo:5910 endpoint.
> >
> > Internally it looks like ZMQ reaper isn't working.
> >
> > The back trace of main thread is:
> > [Switching to thread 1 (Thread 0x7f1267625c80 (LWP 2065))]#0
> >  0x00007f126626ec13 in poll () from /lib/libc.so.6
> > (gdb) where
> > #0  0x00007f126626ec13 in poll () from /lib/libc.so.6
> > > #1  0x00007f1266bd5df0 in zmq::signaler_t::wait (this=<value optimized out>,
> > >     timeout_=-1) at signaler.cpp:145
> > #2  0x00007f1266bc6aae in zmq::mailbox_t::recv (this=0x1b4c808,
> >     cmd_=0x7fff010baee0, timeout_=-1) at mailbox.cpp:74
> > > #3  0x00007f1266bc059d in zmq::ctx_t::terminate (this=0x1b4c770) at ctx.cpp:146
> > #4  0x00007f1266be100c in zmq_term (ctx_=0x1b4c770) at zmq.cpp:292
> > #5  0x00007f1266df8efe in zctx_destroy (self_p=0x7107a0) at zctx.c:122
> > #6  0x000000000040ae53 in main (argc=<value optimized out>,
> >
> > Some other threads:
> > (gdb) thread 4
> > [Switching to thread 4 (Thread 0x7f1241bf9700 (LWP 2149))]#0
> >  0x00007f126627a163 in epoll_wait () from /lib/libc.so.6
> > (gdb) where
> > #0  0x00007f126627a163 in epoll_wait () from /lib/libc.so.6
> > > #1  0x00007f1266bc3a90 in zmq::epoll_t::loop (this=0x1b4e680) at epoll.cpp:142
> > #2  0x00007f1266bdbdeb in thread_routine (arg_=0x1b4e6f0) at thread.cpp:75
> > #3  0x00007f12665128ca in start_thread () from /lib/libpthread.so.0
> > #4  0x00007f1266279b6d in clone () from /lib/libc.so.6
> > #5  0x0000000000000000 in ?? ()
> > (gdb) thread 5
> > [Switching to thread 5 (Thread 0x7f12423fa700 (LWP 2148))]#0
> >  0x00007f126627a163 in epoll_wait () from /lib/libc.so.6
> > (gdb) where
> > #0  0x00007f126627a163 in epoll_wait () from /lib/libc.so.6
> > > #1  0x00007f1266bc3a90 in zmq::epoll_t::loop (this=0x1b4d050) at epoll.cpp:142
> > #2  0x00007f1266bdbdeb in thread_routine (arg_=0x1b4d0c0) at thread.cpp:75
> > #3  0x00007f12665128ca in start_thread () from /lib/libpthread.so.0
> > #4  0x00007f1266279b6d in clone () from /lib/libc.so.6
> > #5  0x0000000000000000 in ?? ()
> > (gdb) thread 6
> > [Switching to thread 6 (Thread 0x7f1242bfb700 (LWP 2102))]#0
> >  0x00007f126651a14d in read () from /lib/libpthread.so.0
> > (gdb) where
> > #0  0x00007f126651a14d in read () from /lib/libpthread.so.0
> > #1  0x00000000004c6938 in eal_thread_loop ()
> > #2  0x00007f12665128ca in start_thread () from /lib/libpthread.so.0
> > #3  0x00007f1266279b6d in clone () from /lib/libc.so.6
> > #4  0x0000000000000000 in ?? ()
> > (gdb) thread 7
> > [Switching to thread 7 (Thread 0x7f12433fc700 (LWP 2101))]#0
> >  0x00007f126651a14d in read () from /lib/libpthread.so.0
> > (gdb) where
> > #0  0x00007f126651a14d in read () from /lib/libpthread.so.0
> > #1  0x00000000004c6938 in eal_thread_loop ()
> > #2  0x00007f12665128ca in start_thread () from /lib/libpthread.so.0
> > #3  0x00007f1266279b6d in clone () from /lib/libc.so.6
> > #4  0x0000000000000000 in ?? ()
> > (gdb) thread 8
> > [Switching to thread 8 (Thread 0x7f1243bfd700 (LWP 2100))]#0
> >  0x00007f126651a14d in read () from /lib/libpthread.so.0
> > (gdb) where
> > #0  0x00007f126651a14d in read () from /lib/libpthread.so.0
> > #1  0x00000000004c6938 in eal_thread_loop ()
> > #2  0x00007f12665128ca in start_thread () from /lib/libpthread.so.0
> > #3  0x00007f1266279b6d in clone () from /lib/libc.so.6
> > #4  0x0000000000000000 in ?? ()
> > _______________________________________________
> > zeromq-dev mailing list
> > zeromq-dev@lists.zeromq.org
> > http://lists.zeromq.org/mailman/listinfo/zeromq-dev
> >

Found it; not a ZMQ problem per se.
Like any other application, ours has grown, and off in a new feature there is
another zthread that was being started as a detached thread, using the same ctx
and never exiting. Having it watch the same exit flag and giving it its own
context solved the issue.
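
For anyone hitting the same hang: zmq_term() blocks until every socket
created in that context has been closed, so a thread that never exits keeps
zctx_destroy() waiting. Below is a minimal sketch of the exit-flag half of the
fix only. It uses plain POSIX threads instead of CZMQ so that it stays
self-contained, and every name in it (exit_requested, poll_exit_flag,
run_shutdown_demo) is hypothetical, not taken from our application.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <time.h>

/* Shared exit flag: set by the SIGTERM handler, polled by every thread. */
static atomic_int exit_requested;
/* Counts threads that left their loop, so the detached one can be waited for. */
static atomic_int threads_finished;

static void on_sigterm(int signo)
{
    (void)signo;
    atomic_store(&exit_requested, 1);   /* assumes atomic_int is lock-free */
}

/* Body shared by the joinable workers and the detached thread. */
static void *poll_exit_flag(void *arg)
{
    const struct timespec tick = { 0, 10 * 1000 * 1000 };   /* 10 ms */
    (void)arg;
    while (!atomic_load(&exit_requested))
        nanosleep(&tick, NULL);
    /* Assumption: a real thread would close its sockets (or destroy its
     * own context) here, before reporting that it is done. */
    atomic_fetch_add(&threads_finished, 1);
    return NULL;
}

/* Returns how many threads observed the flag and exited, or -1 on error. */
int run_shutdown_demo(void)
{
    enum { JOINABLE = 2 };
    const struct timespec tick = { 0, 10 * 1000 * 1000 };
    struct sigaction sa;
    pthread_attr_t attr;
    pthread_t workers[JOINABLE], background;
    int i;

    atomic_store(&exit_requested, 0);
    atomic_store(&threads_finished, 0);

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigterm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGTERM, &sa, NULL) != 0)
        return -1;

    for (i = 0; i < JOINABLE; i++)
        if (pthread_create(&workers[i], NULL, poll_exit_flag, NULL) != 0)
            return -1;

    /* The thread that is easy to forget: detached, so it cannot be joined. */
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    if (pthread_create(&background, &attr, poll_exit_flag, NULL) != 0)
        return -1;
    pthread_attr_destroy(&attr);

    raise(SIGTERM);                     /* stands in for the external SIGTERM */

    for (i = 0; i < JOINABLE; i++)
        pthread_join(workers[i], NULL);

    /* Wait up to about 2 s for the detached thread to report in as well. */
    for (i = 0; i < 200 && atomic_load(&threads_finished) < JOINABLE + 1; i++)
        nanosleep(&tick, NULL);

    return atomic_load(&threads_finished);
}
```

Called from main(), run_shutdown_demo() should return 3 once all three threads
have seen the flag. In an application like ours, only at that point would the
final zctx_destroy() be expected to return promptly.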