List:       bacula-devel
Subject:    Re: [Bacula-devel] Bacula 5.2.13 client fails ASSERT when queried for status
From:       Kern Sibbald <kern () sibbald ! com>
Date:       2013-04-13 8:31:16
Message-ID: 516917D4.8030303 () sibbald ! com

Hello,

This appears to me to be a bad build.  First, it looks like you have enabled
the DEVELOPER define in src/version.h.  You might start by turning that
off.  Second, I recommend that you review your build procedures for Solaris.
On Sparc systems for Bacula Enterprise, we build the binaries
with the Solaris C++ compiler.  If you are using the GNU compiler,
perhaps you are running into problems (we have never used it on Sparc,
so I simply do not know).  On i86 Solaris systems, the
GNU compiler works perfectly well.
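
For context, the failing check "current >= 0" is reported from lockmgr.c
in the mutex unlock path (lmgr_thread_t::do_V called from
bthread_mutex_unlock_p in the traceback below).  The following is only a
minimal sketch of that kind of bookkeeping, not the Bacula source.  It
assumes a per-thread table of held mutexes whose index "current" is -1
when nothing is held; LockTracker, on_lock and on_unlock are hypothetical
names used for illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for per-thread lock bookkeeping (not Bacula's lockmgr).
struct LockTracker {
    enum { kMaxLocks = 32 };
    const void *held[kMaxLocks];  // mutexes this thread currently holds
    int current;                  // index of the newest held lock, -1 if none

    LockTracker() : current(-1) {}

    // Record that this thread took mutex m. Returns false if the table is full.
    bool on_lock(const void *m) {
        if (current + 1 >= kMaxLocks) {
            return false;
        }
        held[++current] = m;
        return true;
    }

    // Record that this thread released mutex m. Returns false in the two
    // cases a tracking build would flag: nothing is held at all (the
    // "current >= 0" condition) or m is not in this thread's table.
    bool on_unlock(const void *m) {
        if (current < 0) {
            return false;
        }
        for (int i = current; i >= 0; --i) {
            if (held[i] == m) {
                // Close the gap so the table stays contiguous.
                for (int j = i; j < current; ++j) {
                    held[j] = held[j + 1];
                }
                --current;
                return true;
            }
        }
        return false;
    }
};
```

In this sketch, calling on_unlock() for a mutex that the calling thread
never recorded with on_lock() returns false, which is the situation an
assertion of this form is meant to catch.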

If you have modified the source, then that is the most likely cause of
the problems.

Finally, we have not built and tested the community version on Sparc.
The Enterprise version, which is very similar, does build and work perfectly.

Best regards,
Kern

On 04/10/2013 08:35 PM, Michael Hocke wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> 
> 
> Hello everybody,
> 
> I am dealing with a segmentation fault error on one of my bacula-fd clients. It's
> running 5.2.13 on SPARC Solaris 10 Generic_147440-01. According to the debug output
> it is caused by
> 
> abudhabi-rad1-sys-fd: lockmgr.c:360-0 ASSERT failed at bnet_server.c:209: current >= 0
> Bacula interrupted by signal 11: Segmentation Fault
> 
> I am able to reproduce this by querying the client status from bconsole a SECOND
> time after restarting bacula-fd. The first time it works fine but the second time
> it crashes. It happens even when I try to run backup jobs. The first one succeeds,
> the second one crashes with the same assert problem. Or if I try to query the
> status of the client while a backup is running, i.e. the second connection after
> restart.
> 
> Here is the backtrace of the time when it crashes:
> 
> - ----- bacula.1203.traceback ------
> [New process 1203]
> Retry #1:
> Retry #2:
> Retry #3:
> Retry #4:
> [Thread debugging using libthread_db enabled]
> [New LWP    3        ]
> [New LWP    2        ]
> [New Thread 1        ]
> [New Thread 2 (LWP 2)]
> [New Thread 3        ]
> [Switching to Thread 1        ]
> 0xfebca710 in __lwp_park () from /lib/libc.so.1
> $1 = '\000' <repeats 29 times>
> $2 = 0x47530 "bacula-fd"
> $3 = 0x0
> $4 = 0x0
> $5 = 0xff2985d0 "5.2.13 (19 February 2013)"
> $6 = 0xff2985a8 "sparc-sun-solaris2.10"
> $7 = 0xff2985a0 "solaris"
> $8 = 0xff298598 "5.10"
> $9 = "abudhabi-rad1-sys", '\000' <repeats 20 times>
> $10 = 0xff2985c0 "solaris 5.10"
> $11 = 0
> Environment variable "TestName" not defined.
> #0  0xfebca710 in __lwp_park () from /lib/libc.so.1
> #1  0xfebc2a78 in mutex_lock_queue () from /lib/libc.so.1
> #2  0xfebace94 in _flockget () from /lib/libc.so.1
> #3  0xfebadbf8 in fclose () from /lib/libc.so.1
> Backtrace stopped: previous frame inner to this frame (corrupt stack?)
> 
> Thread 6 (Thread 3        ):
> #0  0xfebca710 in __lwp_park () from /lib/libc.so.1
> #1  0xfebc459c in cond_sleep_queue () from /lib/libc.so.1
> #2  0xfebc4760 in cond_wait_queue () from /lib/libc.so.1
> #3  0xfebc4ba4 in cond_wait_common () from /lib/libc.so.1
> #4  0xfebc4d38 in _cond_timedwait () from /lib/libc.so.1
> #5  0xfebc4e2c in cond_timedwait () from /lib/libc.so.1
> #6  0xfebc4e6c in pthread_cond_timedwait () from /lib/libc.so.1
> #7  0xff2919ec in bthread_cond_timedwait_p (cond=0xff2ae3d0, m=0xff2ae3b8, abstime=0xfe8fbf18, file=0xff29ab68 "watchdog.c", line=321) at lockmgr.c:824
> #8  0xff28af50 in watchdog_thread (arg=<optimized out>) at watchdog.c:321
> #9  0xff2912d4 in lmgr_thread_launcher (x=0x493f0) at lockmgr.c:939
> #10 0xfebca678 in _lwp_start () from /lib/libc.so.1
> #11 0xfebca678 in _lwp_start () from /lib/libc.so.1
> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
> 
> Thread 5 (Thread 2 (LWP 2)):
> #0  0xfebce658 in _waitid () from /lib/libc.so.1
> #1  0xfeb6f81c in _waitpid () from /lib/libc.so.1
> #2  0xfebbe000 in waitpid () from /lib/libc.so.1
> #3  0xff281dc0 in signal_handler (sig=11) at signal.c:237
> #4  <signal handler called>
> #5  0xff2924cc in lmgr_thread_t::do_V (this=0x5dcd8, m=0xff2ae158, f=0xff293a20 "bnet_server.c", l=209) at lockmgr.c:360
> #6  0xff2918b0 in bthread_mutex_unlock_p (m=0xff2ae158, file=0xff293a20 "bnet_server.c", line=209) at lockmgr.c:793
> #7  0xff262bf4 in bnet_thread_server (addr_list=<optimized out>, max_clients=20, client_wq=0x47180, handle_client_request=0x242dc <handle_client_request(void*)>) at bnet_server.c:209
> #8  0x0002e4ec in main (argc=<optimized out>, argv=<optimized out>) at filed.c:278
> 
> Thread 4 (Thread 1        ):
> #0  0xfebca710 in __lwp_park () from /lib/libc.so.1
> #1  0xfebc2a78 in mutex_lock_queue () from /lib/libc.so.1
> #2  0xfebace94 in _flockget () from /lib/libc.so.1
> #3  0xfebadbf8 in fclose () from /lib/libc.so.1
> Backtrace stopped: previous frame inner to this frame (corrupt stack?)
> 
> Thread 3 (LWP    2        ):
> #0  0xfebce658 in _waitid () from /lib/libc.so.1
> #1  0xfeb6f81c in _waitpid () from /lib/libc.so.1
> #2  0xfebbe000 in waitpid () from /lib/libc.so.1
> #3  0xff281dc0 in signal_handler (sig=11) at signal.c:237
> #4  <signal handler called>
> #5  0xff2924cc in lmgr_thread_t::do_V (this=0x5dcd8, m=0xff2ae158, f=0xff293a20 "bnet_server.c", l=209) at lockmgr.c:360
> #6  0xff2918b0 in bthread_mutex_unlock_p (m=0xff2ae158, file=0xff293a20 "bnet_server.c", line=209) at lockmgr.c:793
> #7  0xff262bf4 in bnet_thread_server (addr_list=<optimized out>, max_clients=20, client_wq=0x47180, handle_client_request=0x242dc <handle_client_request(void*)>) at bnet_server.c:209
> #8  0x0002e4ec in main (argc=<optimized out>, argv=<optimized out>) at filed.c:278
> 
> Thread 2 (LWP    3        ):
> #0  0xfebca710 in __lwp_park () from /lib/libc.so.1
> #1  0xfebc459c in cond_sleep_queue () from /lib/libc.so.1
> #2  0xfebc4760 in cond_wait_queue () from /lib/libc.so.1
> #3  0xfebc4ba4 in cond_wait_common () from /lib/libc.so.1
> #4  0xfebc4d38 in _cond_timedwait () from /lib/libc.so.1
> #5  0xfebc4e2c in cond_timedwait () from /lib/libc.so.1
> #6  0xfebc4e6c in pthread_cond_timedwait () from /lib/libc.so.1
> #7  0xff2919ec in bthread_cond_timedwait_p (cond=0xff2ae3d0, m=0xff2ae3b8, abstime=0xfe8fbf18, file=0xff29ab68 "watchdog.c", line=321) at lockmgr.c:824
> #8  0xff28af50 in watchdog_thread (arg=<optimized out>) at watchdog.c:321
> #9  0xff2912d4 in lmgr_thread_launcher (x=0x493f0) at lockmgr.c:939
> #10 0xfebca678 in _lwp_start () from /lib/libc.so.1
> #11 0xfebca678 in _lwp_start () from /lib/libc.so.1
> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
> 
> Thread 1 (LWP    1        ):
> #0  0xfebca710 in __lwp_park () from /lib/libc.so.1
> #1  0xfebc2a78 in mutex_lock_queue () from /lib/libc.so.1
> #2  0xfebace94 in _flockget () from /lib/libc.so.1
> #3  0xfebadbf8 in fclose () from /lib/libc.so.1
> Backtrace stopped: previous frame inner to this frame (corrupt stack?)
> #0  0xfebca710 in __lwp_park () from /lib/libc.so.1
> No symbol table info available.
> #1  0xfebc2a78 in mutex_lock_queue () from /lib/libc.so.1
> No symbol table info available.
> #2  0xfebace94 in _flockget () from /lib/libc.so.1
> No symbol table info available.
> #3  0xfebadbf8 in fclose () from /lib/libc.so.1
> No symbol table info available.
> #0  0x00000000 in ?? ()
> No symbol table info available.
> #0  0x00000000 in ?? ()
> No symbol table info available.
> #0  0x00000000 in ?? ()
> No symbol table info available.
> #0  0x00000000 in ?? ()
> No symbol table info available.
> - ----- SNIP -----
> 
> And here is the debug output from bacula-fd -c ../etc/bacula-fd.conf -v -f -d799
> 
> - ------ SNIP -----
> 
> # ./bacula-fd -c ../etc/bacula-fd.conf -v -f -d799
> bacula-fd: lex.c:185-0 Open config file: ../etc/bacula-fd.conf
> bacula-fd: filed_conf.c:452-0 Inserting director res: bacula-mon
> bacula-fd: lex.c:185-0 Open config file: ../etc/bacula-fd.conf
> abudhabi-rad1-sys-fd: message.c:504-0 Close_msg jcr=0
> abudhabi-rad1-sys-fd: message.c:347-0 Copy message resource 4a040 to 48300
> abudhabi-rad1-sys-fd: bsys.c:556-0 Could not open state file. sfd=-1 size=192: ERR=No such file or directory
> abudhabi-rad1-sys-fd: fd_plugins.c:1100-0 plugin dir is NULL
> abudhabi-rad1-sys-fd: filed.c:276-0 filed: listening on port 9102
> abudhabi-rad1-sys-fd: bnet_server.c:112-0 Addresses host[ipv4:0.0.0.0:9102]
> abudhabi-rad1-sys-fd: bnet.c:766-0 who=client host=128.122.128.60 port=9102
> abudhabi-rad1-sys-fd: find.c:81-0 init_find_files ffb9d0
> abudhabi-rad1-sys-fd: job.c:270-0 <dird: Hello Director bacula-dir calling
> abudhabi-rad1-sys-fd: job.c:286-0 Executing Hello command.
> abudhabi-rad1-sys-fd: job.c:436-0 Calling Authenticate
> abudhabi-rad1-sys-fd: cram-md5.c:72-0 send: auth cram-md5 <146880531.1365609328@abudhabi-rad1-sys-fd> ssl=0
> abudhabi-rad1-sys-fd: cram-md5.c:131-0 cram-get received: auth cram-md5 <1124129763.1365609328@bacula-dir> ssl=0
> abudhabi-rad1-sys-fd: cram-md5.c:150-0 sending resp to challenge: vSJPvl/W69/Ag6Zs83+6+C
> abudhabi-rad1-sys-fd: job.c:440-0 OK Authenticate
> abudhabi-rad1-sys-fd: job.c:270-0 <dird: JobId=0 Job=-Console-.2013-04-10_11.40.56_40 SDid=0 SDtime=0 Authorization=dummy
> abudhabi-rad1-sys-fd: job.c:286-0 Executing JobId= command.
> abudhabi-rad1-sys-fd: job.c:1737-0 set sd auth key
> abudhabi-rad1-sys-fd: job.c:544-0 JobId=0 Auth=dummy
> abudhabi-rad1-sys-fd: fd_plugins.c:1197-0 plugin list is NULL
> abudhabi-rad1-sys-fd: job.c:270-0 <dird: statusabudhabi-rad1-sys-fd: job.c:286-0 Executing status command.
> abudhabi-rad1-sys-fd: runscript.c:108-0 runscript: running all RUNSCRIPT object (ClientAfterJob) JobStatus=C
> abudhabi-rad1-sys-fd: job.c:399-0 Calling term_find_files
> abudhabi-rad1-sys-fd: job.c:404-0 Done with term_find_files
> abudhabi-rad1-sys-fd: runscript.c:286-0 runscript: freeing all RUNSCRIPTS object
> abudhabi-rad1-sys-fd: message.c:504-0 Close_msg jcrb580
> abudhabi-rad1-sys-fd: message.c:504-0 Close_msg jcr=0
> abudhabi-rad1-sys-fd: job.c:406-0 Done with free_jcr
> abudhabi-rad1-sys-fd: mem_pool.c:375-0 garbage collect memory pool
> 
> That was the first client status query. This is the debug output of the second
> query:
> 
> abudhabi-rad1-sys-fd: lockmgr.c:360-0 ASSERT failed at bnet_server.c:209: current >= 0
> Bacula interrupted by signal 11: Segmentation Fault
> Kaboom! bacula-fd, abudhabi-rad1-sys-fd got signal 11 - Segmentation Fault. Attempting traceback.
> Kaboom! exepath=/usr/local/bacula/sbin
> abudhabi-rad1-sys-fd: signal.c:205-0 Working=/usr/local/bacula/var
> abudhabi-rad1-sys-fd: signal.c:206-0 btpath=/usr/local/bacula/sbin/btraceback
> abudhabi-rad1-sys-fd: signal.c:207-0 exepath=/usr/local/bacula/sbin/bacula-fd
> abudhabi-rad1-sys-fd: signal.c:236-0 Doing waitpid
> Calling: /usr/local/bacula/sbin/btraceback /usr/local/bacula/sbin/bacula-fd 1203 /usr/local/bacula/var
> gcore: /usr/local/bacula/var/bacula-fd.1203 dumped
> /usr/local/bacula/sbin/btraceback: /usr/local/bacula/sbin/bsmtp: not found
> abudhabi-rad1-sys-fd: signalThe btraceback call returned 1
> Dumping: /usr/local/bacula/var/abudhabi-rad1-sys-fd.1203.bactrace
> cat: write error: Broken pipe
> - ----- SNIP -----
> 
> I'll be happy to provide more information if needed.
> 
> Thanks!
> 
> - - Michael
> - --
> Michael Hocke					    New York University
> Sr UNIX Systems Administrator		Information Technology Services
> 				               C&CS COS
> 
> 
> -----BEGIN PGP SIGNATURE-----
> Version: PGP Desktop 10.0.3 (Build 1)
> Charset: us-ascii
> 
> wsBVAwUBUWWw/5bfnpCg64TVAQGAWggAp+gq0qVwciCCYarrO/3fSshpl7svySeK
> wtvxEcGx90c86Hb8KMb33F7XmB2uiwM/e2roMeHh7Q8qrD2RxmFVkUmrZvp5usq6
> ttL2NC72nVWkqtg6axeOjcQkcFQc6m6bsObDJv11p3LIcD78aHXUYellhU8RNXSZ
> Zjh/zE2iIJ5MRJk9gcoaOOmicfMIaGLXScQAw2EJsD3TF/QsxoiXUbc3pwu9b5eI
> yZZ4C5P1Z1RdZROp/AU3i417znTPCXaObaulEnnt96uGqaKU79lNq/g5eb/58qpd
> lFy8/goB1Fd94J4/KG0zfoWMSf9POmeaBosBkrza9UkAU5TZ00yPVA=> =PVlP
> -----END PGP SIGNATURE-----
> 
> ------------------------------------------------------------------------------
> Precog is a next-generation analytics platform capable of advanced
> analytics on semi-structured data. The platform includes APIs for building
> apps and a phenomenal toolset for data science. Developers can use
> our toolset for easy data analysis & visualization. Get a free account!
> http://www2.precog.com/precogplatform/slashdotnewsletter
> _______________________________________________
> Bacula-devel mailing list
> Bacula-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/bacula-devel
> 