List:       dbi-users
Subject:    Re: DBD::Oracle - execute_array core dumps intermittently [SOLVED]
From:       Martin Evans <martin.evans () easysoft ! com>
Date:       2008-03-14 11:47:03
Message-ID: 47DA65B7.1060403 () easysoft ! com

Scott T. Hildreth wrote:
> On Fri, 2008-03-07 at 16:36 -0600, Scott T. Hildreth wrote:
> > This seems to be a threading error with the linux kernel version.
> > I am running this process on newer kernels (2.6.22.x) and the error
> > never occurs.  We are also seeing a lot of the "Futex WAIT" issues
> > with Oracle and the 2.6.20 kernels.
> 
> The kernel upgrade didn't solve the problem.  Since the process crashed
> on some of our servers and not on others, I narrowed down the
> differences between the servers.  I concluded that all the SuSE Enterprise
> 10 servers had the problem and that the crash only occurred when the
> execute_array() method was used.  All of our servers have Oracle 10.2.0.3,
> so there wasn't a difference there, and it didn't seem to matter whether
> DBD::Oracle was 1.19 or 1.20.  We basically decided that this was a race
> problem with the threads, especially since it was intermittent.  I decided
> to compile DBD::Oracle with debugging symbols, hoping I would get better
> info from the core file and gdb.  When I ran perl Makefile.PL, a message
> appeared that I had often ignored.
> 
> WARNING: If you have problems you may need to rebuild perl with threading enabled.
> 
> I build our own Perl in /usr/local/ and leave the vendor Perl alone.  I
> never compile with threads, since we have not found a need for them yet.
> So I used /usr/bin/perl, which is always compiled with threading, and
> the process stopped crashing.  So the WARNING never applied until now.  I
> guess I will start building a threaded Perl on our SuSE Enterprise servers
> from now on.  This seems to have fixed the problem (knocking on wood).  I
> thought I would share my findings, just in case someone else runs into the
> same situation.  Save yourself time, read the WARNINGS. :-)
> 
> Any ideas on why array processing would cause this to occur?  Did I just get lucky
> and hit the right scenario for this to happen? Just curious.
> 
> Thanks.

As far as I recall, the Oracle client libraries are built thread-safe, 
with -pthread or whatever option the compiler needs on the platform. 
I seem to remember that on Linux I could build code as thread-safe, 
link it with libraries that were not built that way, and the linker 
would not complain. The problem (although I'm not saying it still 
exists) is that when code was built thread-safe, some structures in the 
C header files changed size (one of them was something to do with 
longjmp, I think). So if you mixed thread-safe code with a library that 
was not built that way, the two had different ideas of the longjmp 
structure, and that could lead to all sorts of apparently random seg 
faults etc.

On Linux, my company to this day distributes its ODBC drivers and the 
unixODBC driver manager in both thread-safe and non-thread-safe builds 
for this very reason, as we have no idea how the applications using 
them were built.

A sure sign to look for is:

ldd libclntsh.so.10.1
         linux-gate.so.1 =>  (0x00598000)
         libnnz10.so => /usr/lib/oracle/xe/app/oracle/product/10.2.0/server/lib/libnnz10.so (0x00111000)
         libdl.so.2 => /lib/libdl.so.2 (0x00320000)
         libm.so.6 => /lib/libm.so.6 (0x00324000)
         libpthread.so.0 => /lib/libpthread.so.0 (0x00349000)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

         libnsl.so.1 => /lib/libnsl.so.1 (0x0035d000)
         libc.so.6 => /lib/libc.so.6 (0x00373000)
         /lib/ld-linux.so.2 (0x00599000)

If you mix code that depends on libpthread with code that doesn't, the 
two were probably compiled with incompatible compiler options.
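
For anyone who wants to run that comparison on their own box, here is a
rough shell sketch. The helper names links_pthread and pthread_status are
made up for illustration, and the check only looks for libpthread in the
ldd output, so treat the answer as a hint rather than proof of how
something was compiled.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the ldd output on stdin lists libpthread.
links_pthread() {
    grep -q 'libpthread\.so'
}

# Hypothetical helper: prints "threaded" or "non-threaded" for one binary
# or shared library, based only on whether ldd reports libpthread for it.
# A file that ldd cannot read is reported as "non-threaded".
pthread_status() {
    if ldd "$1" 2>/dev/null | links_pthread; then
        echo "threaded"
    else
        echo "non-threaded"
    fi
}
```

You would call it once for the perl binary and once for the Oracle client
library, e.g. pthread_status "$(command -v perl)" and then pthread_status
on wherever your libclntsh.so lives (that path is site-specific). If the
two answers differ, you may have the mix described above. Running
perl -V:usethreads asks perl directly whether it was built with threads.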

Sounds like this may have been your problem.

Martin
-- 
Martin J. Evans
Easysoft Limited
http://www.easysoft.com

> > Thanks for listening.
> > 
> > On Fri, 2008-03-07 at 11:25 -0600, Scott T. Hildreth wrote:
> > > Should have posted to users not dev.  This is really a bizarre problem.
> > > I can get it to fail about every fifth iteration; otherwise the process
> > > works.  I ran it from another server connected to the same database and
> > > it will intermittently fail.  I ran it from a third server and I can't
> > > get it to core dump.  All 3 servers have the same kernel & Perl
> > > versions.  I've tried recompiling Perl, DBI, DBD::Oracle, still no luck.
> > > I created a test case, which uses execute_array and of course I can't
> > > get it to core dump.  If anyone has any ideas on what might be going on
> > > here,  I would love to hear them!
> > > 
> > > Thanks
> > > STH
> > > 
> > > On Wed, 2008-03-05 at 15:21 -0600, Scott T. Hildreth wrote:
> > > > I am not sure how to describe this.  My co-worker will run his process and get
> > > > a core dump (I pasted the back trace below) and then run the process again
> > > > with no core dump.  Sometimes it will core dump several times in a row and
> > > > then finish fine on the next run.  I ran the process with DBI_TRACE=9 and
> > > > this is what shows up at the end of the log:
> > > > 
> > > > 1   -> execute_for_fetch for DBD::Oracle::st (DBI::st=HASH(0xN)~INNER CODE(0xN) undef)
> > > > ora_st_execute_array UPDATE count=10 (ARRAY(0xN) undef undef)...
> > > > OCIBindByName(112df38,1132188,10a9138,":p1",placeh_len=3,value_p=0,value_sz=-1517274788,dty=1,indp=0,alenp=0,rcodep=0,maxarr_len=0,curelep=0 (*=0),mode=2)=ERROR
> > > > OCIErrorGet(10a9138,1,"<NULL>",7fff058d684c,"ORA-02005: implicit (-1) length not valid for this bind or define datatype ",1024,2)=SUCCESS
> > > > OCIErrorGet after OCIBindByName (er1:ok): -1, 2005: ORA-02005: implicit (-1) length not valid for this bind or define datatype
> > > > OCIErrorGet(10a9138,2,"<NULL>",7fff058d684c,"ORA-02005: implicit (-1) length not valid for this bind or define datatype ",1024,2)=NO_DATA
> > > > 
> > > > At first I thought it was a 32-bit library with a 64-bit Perl problem, but
> > > > Oracle.so & Perl are both linked with the correct 64-bit libs.  The Oracle
> > > > client is 10.2.0.3 and the DBI versions are:
> > > > Perl            : 5.008008    (x86_64-linux)
> > > > OS              : linux       (2.6.20.19)
> > > > DBI             : 1.602
> > > > DBD::mysql      : 4.005
> > > > DBD::Sponge     : 12.010002
> > > > DBD::SQLite     : 1.13
> > > > DBD::Proxy      : 0.2004
> > > > DBD::Oracle     : 1.20
> > > > DBD::Multiplex  : 2.04
> > > > DBD::Gofer      : 0.010103
> > > > DBD::File       : 0.35
> > > > DBD::ExampleP   : 12.010007
> > > > DBD::DBM        : 0.03
> > > > 
> > > > I am going to try to isolate a small test case, but right now I wanted to
> > > > post what I have found so far.
> > > > 
> > > > Thanks,
> > > > STH
> > > > 
> > > > ############## Back Trace #############################################################
> > > > (gdb) bt
> > > > #0  0x00002b66ec7d9b95 in raise () from /lib64/libc.so.6
> > > > #1  0x00002b66ec7daf90 in abort () from /lib64/libc.so.6
> > > > #2  0x00002b66ec81035b in __libc_message () from /lib64/libc.so.6
> > > > #3  0x00002b66ec81534e in malloc_printerr () from /lib64/libc.so.6
> > > > #4  0x00002b66ec81695c in free () from /lib64/libc.so.6
> > > > #5  0x00002b66ef0ac102 in ora_st_execute_array () from /usr/local/perl-5.8.8/lib/site_perl/5.8.8/x86_64-linux/auto/DBD/Oracle/Oracle.so
> > > > #6  0x00002b66ef0a62bf in XS_DBD__Oracle__st_ora_execute_array () from /usr/local/perl-5.8.8/lib/site_perl/5.8.8/x86_64-linux/auto/DBD/Oracle/Oracle.so
> > > > #7  0x000000000046bc47 in Perl_pp_entersub ()
> > > > #8  0x000000000046a29e in Perl_runops_standard ()
> > > > #9  0x000000000041e82d in Perl_call_sv ()
> > > > #10 0x00002b66ec9ee038 in XS_DBI_dispatch () from /usr/local/perl-5.8.8/lib/site_perl/5.8.8/x86_64-linux/auto/DBI/DBI.so
> > > > #11 0x000000000046bc47 in Perl_pp_entersub ()
> > > > #12 0x000000000046a29e in Perl_runops_standard ()
> > > > #13 0x000000000041e82d in Perl_call_sv ()
> > > > #14 0x00002b66ec9ee038 in XS_DBI_dispatch () from /usr/local/perl-5.8.8/lib/site_perl/5.8.8/x86_64-linux/auto/DBI/DBI.so
> > > > #15 0x000000000046bc47 in Perl_pp_entersub ()
> > > > #16 0x000000000046a29e in Perl_runops_standard ()
> > > > #17 0x000000000041f1d1 in perl_run ()
> > > > #18 0x000000000041ba2c in main ()
> > > > 
> > > > ########################################################################################
> > > >  
> > > > *** glibc detected *** /usr/local/bin/perl: double free or corruption (!prev): 0x0000000001163e10 ***
> > > > ======= Backtrace: =========
> > > > /lib64/libc.so.6[0x2ad39584e34e]
> > > > /lib64/libc.so.6(__libc_free+0x6c)[0x2ad39584f95c]
> > > > /usr/local/perl-5.8.8/lib/site_perl/5.8.8/x86_64-linux/auto/DBD/Oracle/Oracle.so(ora_st_execute_array+0xfa4)[0x2ad3980e6e94]
> > > > /usr/local/perl-5.8.8/lib/site_perl/5.8.8/x86_64-linux/auto/DBD/Oracle/Oracle.so(XS_DBD__Oracle__st_ora_execute_array+0xef)[0x2ad3980e0f9f]
> > > > /usr/local/bin/perl(Perl_pp_entersub+0x6b7)[0x46bae7]
> > > > /usr/local/bin/perl(Perl_runops_standard+0xe)[0x46a13e]
> > > > /usr/local/bin/perl(Perl_call_sv+0x49d)[0x41e80d]
> > > > /usr/local/perl-5.8.8/lib/site_perl/5.8.8/x86_64-linux/auto/DBI/DBI.so(XS_DBI_dispatch+0x7a8)[0x2ad395a27068]
> > > > /usr/local/bin/perl(Perl_pp_entersub+0x6b7)[0x46bae7]
> > > > /usr/local/bin/perl(Perl_runops_standard+0xe)[0x46a13e]
> > > > /usr/local/bin/perl(Perl_call_sv+0x49d)[0x41e80d]
> > > > /usr/local/perl-5.8.8/lib/site_perl/5.8.8/x86_64-linux/auto/DBI/DBI.so(XS_DBI_dispatch+0x7a8)[0x2ad395a27068]
> > > > /usr/local/bin/perl(Perl_pp_entersub+0x6b7)[0x46bae7]
> > > > /usr/local/bin/perl(Perl_runops_standard+0xe)[0x46a13e]
> > > > /usr/local/bin/perl(perl_run+0x2c1)[0x41f1b1]
> > > > /usr/local/bin/perl(main+0xac)[0x41ba2c]
> > > > /lib64/libc.so.6(__libc_start_main+0xf4)[0x2ad395800154]
> > > > /usr/local/bin/perl[0x41b8e9]
> > > > ======= Memory map: ========
> > > > 00400000-004fc000 r-xp 00000000 08:02 427095                             /usr/local/perl-5.8.8/bin/perl
> > > > 005fb000-00601000 rw-p 000fb000 08:02 427095                             /usr/local/perl-5.8.8/bin/perl
> > > > 00601000-01183000 rw-p 00601000 00:00 0                                  [heap]
> > > > 2ad39511b000-2ad395136000 r-xp 00000000 08:02 4295                       /lib64/ld-2.4.so
> > > > 2ad395136000-2ad395137000 rw-p 2ad395136000 00:00 0
> > > > 2ad395147000-2ad395148000 rw-p 2ad395147000 00:00 0
> > > > 2ad395235000-2ad395237000 rw-p 0001a000 08:02 4295                       /lib64/ld-2.4.so
> > > > 2ad395237000-2ad39524a000 r-xp 00000000 08:02 4168                       /lib64/libnsl-2.4.so
> > > > 2ad39524a000-2ad395349000 ---p 00013000 08:02 4168                       /lib64/libnsl-2.4.so
> > > > 2ad395349000-2ad39534b000 rw-p 00012000 08:02 4168                       /lib64/libnsl-2.4.so
> > > > 2ad39534b000-2ad39534d000 rw-p 2ad39534b000 00:00 0
> > > > 2ad39534d000-2ad39534f000 r-xp 00000000 08:02 4163                       /lib64/libdl-2.4.so
> > > > 2ad39534f000-2ad39544f000 ---p 00002000 08:02 4163                       /lib64/libdl-2.4.so
> > > > 2ad39544f000-2ad395451000 rw-p 00002000 08:02 4163                       /lib64/libdl-2.4.so
> > > > 2ad395451000-2ad3954a5000 r-xp 00000000 08:02 4165                       /lib64/libm-2.4.so
> > > > 2ad3954a5000-2ad3955a4000 ---p 00054000 08:02 4165                       /lib64/libm-2.4.so
> > > > 2ad3955a4000-2ad3955a6000 rw-p 00053000 08:02 4165                       /lib64/libm-2.4.so
> > > > 2ad3955a6000-2ad3955a7000 rw-p 2ad3955a6000 00:00 0
> > > > 2ad3955a7000-2ad3955b0000 r-xp 00000000 08:02 4161                       /lib64/libcrypt-2.4.so
> > > > 2ad3955b0000-2ad3956af000 ---p 00009000 08:02 4161                       /lib64/libcrypt-2.4.so
> > > > 2ad3956af000-2ad3956b2000 rw-p 00008000 08:02 4161                       /lib64/libcrypt-2.4.so
> > > > 2ad3956b2000-2ad3956e0000 rw-p 2ad3956b2000 00:00 0
> > > > 2ad3956e0000-2ad3956e2000 r-xp 00000000 08:02 4191                       /lib64/libutil-2.4.so
> > > > 2ad3956e2000-2ad3957e1000 ---p 00002000 08:02 4191                       /lib64/libutil-2.4.so
> > > > 2ad3957e1000-2ad3957e3000 rw-p 00001000 08:02 4191                       /lib64/libutil-2.4.so
> > > > 2ad3957e3000-2ad39590a000 r-xp 00000000 08:02 4157                       /lib64/libc-2.4.so
> > > > 2ad39590a000-2ad395a0a000 ---p 00127000 08:02 4157                       /lib64/libc-2.4.so
> > > > 2ad395a0a000-2ad395a0d000 r--p 00127000 08:02 4157                       /lib64/libc-2.4.so
> > > > 2ad395a0d000-2ad395a0f000 rw-p 0012a000 08:02 4157                       /lib64/libc-2.4.so
> > > > 2ad395a0f000-2ad395a16000 rw-p 2ad395a0f000 00:00 0
> > > > 2ad395a16000-2ad395a30000 r-xp 00000000 08:02 442510                     /usr/local/perl-5.8.8/lib/site_perl/5.8.8/x86_64-linux/auto/DBI/DBI.so
> > > > 2ad395a30000-2ad395b30000 ---p 0001a000 08:02 442510                     /usr/local/perl-5.8.8/lib/site_p
> > > > zsh: 26245 abort (core dumped)  DBI_TRACE=2
> 
> 

