List: zeromq-dev
Subject: Re: [zeromq-dev] zctx_destroy is hanging
From: Stephen Hemminger <stephen () networkplumber ! org>
Date: 2013-05-22 18:12:59
Message-ID: 20130522111259.19dbc305 () nehalam ! linuxnetplumber ! net
On Wed, 22 May 2013 09:44:52 +0200
Pieter Hintjens <ph@imatix.com> wrote:
> Can you provide a minimal reproducible case?
>
> -Pieter
>
>
> On Wed, May 22, 2013 at 12:32 AM, Stephen Hemminger <
> stephen@networkplumber.org> wrote:
>
> > We have a ZMQ based application (in C) using CZMQ and ZMQ 2.2.0
> > When daemon is due to be restarted or shutdown
> > 1. it receives a SIGTERM
> > 2. The signal is caught, and flag is set
> > 3. all the worker threads exit
> > 4. main thread waits for workers and does some other cleanup
> > 5. calls zctx_destroy()
> > and hangs there; any clues? maybe the zctx_destroy() is redundant anyway.
> >
> >
> > int
> > main(int argc, char **argv)
> > {
> > ...
> >
> > zctx_destroy(&zmq_ctx); << hang here
> >
> > return 0;
> > }
> >
> > There were several ZMQ sockets created. After instrumenting CZMQ, it looks
> > like ZMQ is hanging in zctx__socket_destroy() of the ZMQ_REQ socket,
> > which was bound twice: once to an ipc: endpoint and again to a
> > tcp://lo:5910 endpoint.
> >
> > Internally it looks like ZMQ reaper isn't working.
> >
> > The back trace of main thread is:
> > [Switching to thread 1 (Thread 0x7f1267625c80 (LWP 2065))]#0 0x00007f126626ec13 in poll () from /lib/libc.so.6
> > (gdb) where
> > #0 0x00007f126626ec13 in poll () from /lib/libc.so.6
> > #1 0x00007f1266bd5df0 in zmq::signaler_t::wait (this=<value optimized out>, timeout_=-1) at signaler.cpp:145
> > #2 0x00007f1266bc6aae in zmq::mailbox_t::recv (this=0x1b4c808, cmd_=0x7fff010baee0, timeout_=-1) at mailbox.cpp:74
> > #3 0x00007f1266bc059d in zmq::ctx_t::terminate (this=0x1b4c770) at ctx.cpp:146
> > #4 0x00007f1266be100c in zmq_term (ctx_=0x1b4c770) at zmq.cpp:292
> > #5 0x00007f1266df8efe in zctx_destroy (self_p=0x7107a0) at zctx.c:122
> > #6 0x000000000040ae53 in main (argc=<value optimized out>,
> >
> > Some other threads:
> > (gdb) thread 4
> > [Switching to thread 4 (Thread 0x7f1241bf9700 (LWP 2149))]#0 0x00007f126627a163 in epoll_wait () from /lib/libc.so.6
> > (gdb) where
> > #0 0x00007f126627a163 in epoll_wait () from /lib/libc.so.6
> > #1 0x00007f1266bc3a90 in zmq::epoll_t::loop (this=0x1b4e680) at epoll.cpp:142
> > #2 0x00007f1266bdbdeb in thread_routine (arg_=0x1b4e6f0) at thread.cpp:75
> > #3 0x00007f12665128ca in start_thread () from /lib/libpthread.so.0
> > #4 0x00007f1266279b6d in clone () from /lib/libc.so.6
> > #5 0x0000000000000000 in ?? ()
> > (gdb) thread 5
> > [Switching to thread 5 (Thread 0x7f12423fa700 (LWP 2148))]#0 0x00007f126627a163 in epoll_wait () from /lib/libc.so.6
> > (gdb) where
> > #0 0x00007f126627a163 in epoll_wait () from /lib/libc.so.6
> > #1 0x00007f1266bc3a90 in zmq::epoll_t::loop (this=0x1b4d050) at epoll.cpp:142
> > #2 0x00007f1266bdbdeb in thread_routine (arg_=0x1b4d0c0) at thread.cpp:75
> > #3 0x00007f12665128ca in start_thread () from /lib/libpthread.so.0
> > #4 0x00007f1266279b6d in clone () from /lib/libc.so.6
> > #5 0x0000000000000000 in ?? ()
> > (gdb) thread 6
> > [Switching to thread 6 (Thread 0x7f1242bfb700 (LWP 2102))]#0 0x00007f126651a14d in read () from /lib/libpthread.so.0
> > (gdb) where
> > #0 0x00007f126651a14d in read () from /lib/libpthread.so.0
> > #1 0x00000000004c6938 in eal_thread_loop ()
> > #2 0x00007f12665128ca in start_thread () from /lib/libpthread.so.0
> > #3 0x00007f1266279b6d in clone () from /lib/libc.so.6
> > #4 0x0000000000000000 in ?? ()
> > (gdb) thread 7
> > [Switching to thread 7 (Thread 0x7f12433fc700 (LWP 2101))]#0 0x00007f126651a14d in read () from /lib/libpthread.so.0
> > (gdb) where
> > #0 0x00007f126651a14d in read () from /lib/libpthread.so.0
> > #1 0x00000000004c6938 in eal_thread_loop ()
> > #2 0x00007f12665128ca in start_thread () from /lib/libpthread.so.0
> > #3 0x00007f1266279b6d in clone () from /lib/libc.so.6
> > #4 0x0000000000000000 in ?? ()
> > (gdb) thread 8
> > [Switching to thread 8 (Thread 0x7f1243bfd700 (LWP 2100))]#0 0x00007f126651a14d in read () from /lib/libpthread.so.0
> > (gdb) where
> > #0 0x00007f126651a14d in read () from /lib/libpthread.so.0
> > #1 0x00000000004c6938 in eal_thread_loop ()
> > #2 0x00007f12665128ca in start_thread () from /lib/libpthread.so.0
> > #3 0x00007f1266279b6d in clone () from /lib/libc.so.6
> > #4 0x0000000000000000 in ?? ()
> > _______________________________________________
> > zeromq-dev mailing list
> > zeromq-dev@lists.zeromq.org
> > http://lists.zeromq.org/mailman/listinfo/zeromq-dev
> >
Found it, not a zmq problem per se.
Like any other application, ours has grown, and off in a new feature
there is another zthread that was started as a detached thread but was using
the same ctx and never exiting. zmq_term() blocks until every socket opened in
the context has been closed, so that thread kept zctx_destroy() waiting.
Having it watch the same exit flag, and giving it its own context, solved the
issue.
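
For anyone hitting the same hang, here is a minimal sketch of the shape of the fix. It is not our application code: it uses plain pthreads instead of zthread so that it builds without CZMQ, and the names exit_flag, struct worker, worker_main, worker_start, worker_join and install_sigterm_handler are all made up for illustration. The places where the thread would create and destroy its own context are marked only with comments.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <time.h>

/* Hypothetical shared exit flag: set by the SIGTERM handler and polled by
 * every thread, including ones added later for new features. */
static atomic_int exit_flag;

static void on_sigterm(int signo)
{
    (void)signo;
    atomic_store(&exit_flag, 1);
}

struct worker {
    pthread_t tid;
    int cleaned_up;   /* set by the thread after releasing its own resources */
};

static void *worker_main(void *arg)
{
    struct worker *w = arg;
    const struct timespec tick = { 0, 10 * 1000 * 1000 };   /* 10 ms */

    /* Assumption: the thread would create its OWN context and sockets here
     * instead of borrowing the main thread's context. */
    while (!atomic_load(&exit_flag))
        nanosleep(&tick, NULL);   /* stand-in for a poll with a finite timeout */

    /* Assumption: close the sockets and destroy the thread's context here,
     * so nothing is left open when the main thread tears down. */
    w->cleaned_up = 1;
    return NULL;
}

/* Start the worker; returns 0 on success like pthread_create(). */
static int worker_start(struct worker *w)
{
    assert(w != NULL);
    w->cleaned_up = 0;
    return pthread_create(&w->tid, NULL, worker_main, w);
}

/* Wait for the worker to notice the flag and finish its cleanup. */
static int worker_join(struct worker *w)
{
    assert(w != NULL);
    return pthread_join(w->tid, NULL);
}

/* Install the handler that sets the shared exit flag on SIGTERM. */
static int install_sigterm_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigterm;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGTERM, &sa, NULL);
}
```

The sketch joins the thread so the main thread knows cleanup has finished. A truly detached thread cannot be joined, which is why it needs its own context: the main thread's teardown then no longer depends on when that thread gets around to closing its sockets.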