
List:       intermezzo-devel
Subject:    Dieing lentos
From:       "Peter J. Braam" <braam () cs ! cmu ! edu>
Date:       2000-03-19 18:26:18

>>>>> "Andreas" == Andreas J Koenig <andreas.koenig@anima.de> writes:

    Andreas> Is it possible, all errors are due to clock skew? I had a
    Andreas> working intermezzo for many, many hours when the two
    Andreas> clocks were in sync. I changed one clock by several
    Andreas> minutes and from then on I could watch four lentos
    Andreas> die. 

Hi, 

Clock skew between the machines is OK. However, changing the time on a
machine can cause a disconnect, and that can have bad consequences.

Oh boy, looking at the logs below, there really is some bug, eh?

- Peter -


    Andreas> Interestingly, the errors I had reported earlier all
    Andreas> happened with unsynced clocks.

    Andreas> Here are the memos of the four dying lentos:

    Andreas> After hours of flawless operation, I had a lento dying
    Andreas> at line 55 of Reintegrate. The line reads

    Andreas>      open(FILE, ">$filename") || die;

    Andreas> I had no debugging on, I wasn't even watching, so no
    Andreas> information available:-( Please change these die()s to
    Andreas> contain at least something.
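
A minimal sketch of what a more informative die() could look like for
the open() quoted above. The helper name open_for_write is hypothetical
and is not taken from Reintegrate; the only point is that the message
carries the file name and the OS error in $!.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Lento: open a file for writing and,
# on failure, die with the file name and the OS error string from $!.
sub open_for_write {
    my ($filename) = @_;
    open(my $fh, '>', $filename)
        or die "cannot open '$filename' for writing: $!\n";
    return $fh;
}
```

With a message like this, even a lento running without debugging would
leave the failing path and the reason in its last words.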

    Andreas> ----------------------------------------------------------------------

    Andreas> I had a server die with debuglevel=1. Last words were:

    Andreas>   [Lento/Replicator.pm: 168] Replicator p66 shared:
    Andreas> SENDING SET 0 (CURRENT STATE 3)
    Andreas> POE::Session::POE/Session.pm (line 241) [Lento/List.pm:
    Andreas> 94] Iterator on (POE::Session=HASH(0x8adf11c)) class
    Andreas> Lento::Replicator, file Lento/Replicator.pm, line 145
    >>>>>> POE::Session=HASH(0x8a958a4); POE::Session=HASH(0x8a51260);
    >>>>>> POE::Session=HASH(0x8ad1540); POE::Session=HASH(0x8aeee00);
    >>>>>> POE::Session=HASH(0x8acc014); POE::Session=HASH(0x8938868);
    >>>>>> POE::Session=HASH(0x8b22170); POE::Session=HASH(0x8a513b0);
    >>>>>> POE::Session=HASH(0x8ae25ac); POE::Kernel=ARRAY(0x83c8940);
    >>>>>> POE::Session=HASH(0x8ad73e0) <<<<<
    Andreas>   -- No target - SOURCE: shared to p66 CML for send_done
    Andreas> to POE::Session=HASH(0x8adf11c) -- Poll Journal -
    Andreas> POE::Session=HASH(0x8a958a4) -- Acceptor -- FSDB shared
    Andreas> to p66 CML -- ReqDispatcher -- Journal -- shared to p66
    Andreas> CML -- Connection (from from ) -- GetML No such
    Andreas> pseudo-hash field "name" at POE/Kernel.pm line 898.

    Andreas> -------------------------------------------------------------------------


    Andreas> A client died with

    Andreas>   ==> [06:20:55] Reintegrate -> do_one_record [sender: Reintegrate]
    Andreas>   [Lento/InterMezzo/ReqHandler.pm: 513] record: 953443625, 1024, 0, 41952, -1072506109, 953443631, 953443631, 953443631, 16895, 136136136, 65, /koenigs/t/84>.D@4DB-6, SETATTR
    Andreas>   bad lstat - /izo0/koenigs/t/84>.D@4DB-6


    Andreas> Yes, I am using funny filenames during testing.

    Andreas> --------------------------------------------------------------------------


    Andreas> Another client died with

    Andreas>   ==> [06:41:47] Reintegrate -> do_one_record [sender: Reintegrate]
    Andreas>   [Lento/InterMezzo/ReqHandler.pm: 513] record: 953444874, 1024, /koenigs/t/232, 7D94, UNLINK
    Andreas>   [Lento/Reintegrate.pm: 147] Executing: unlink(/izo0/koenigs/t/232/7D94)
    Andreas>   ==> [06:41:47] Reintegrate -> complete_record [sender: Reintegrate]
    Andreas>   ***RH*** NTS= 0 NTE= 9505
    Andreas>   ==> [06:41:47] Reintegrate -> do_one_record [sender: Reintegrate]
    Andreas>   [Lento/InterMezzo/ReqHandler.pm: 513] record: 953444882, 1024, /koenigs/t, 232, RMDIR
    Andreas>   [Lento/Reintegrate.pm: 159] Executing: rmdir(/izo0/koenigs/t/232)
    Andreas>   Died at Lento/Reintegrate.pm line 160.
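
The bare "Died at ... line 160" above does not say why the rmdir
failed. A sketch, under the assumption that the failing call is a plain
rmdir(), of how the reason could be reported. The helper name
rmdir_or_die is hypothetical; %! is the errno hash provided by Perl's
core Errno module.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Lento: remove a directory and, on
# failure, die with the path, the OS error, and a hint for common cases.
sub rmdir_or_die {
    my ($dir) = @_;
    return 1 if rmdir($dir);
    my $msg = "rmdir($dir) failed: $!";
    # %! is true for the errno that is currently set in $!.
    $msg .= " (directory still has entries)" if $!{ENOTEMPTY} || $!{EEXIST};
    $msg .= " (directory does not exist)"    if $!{ENOENT};
    die "$msg\n";
}
```

A message of this kind would at least separate "directory not empty"
from "directory already gone" without anyone having to dig through the
big logfiles.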

    Andreas> --------------------------------------------------------------------------


    Andreas> Logfiles are available on request, but they are big:

    Andreas> # ls -l /home/lento@71.*
    Andreas> -rw-r--r-- 1 root root  2057772 Mar 19 06:27 /home/lento@71.poe-kernel-name.out
    Andreas> -rw-r--r-- 1 root root 88118077 Mar 19 07:10 /home/lento@71.server_of_rmdir.out
    Andreas> # ls -l /usr/raid/lento@66.*
    Andreas> -rw-r--r-- 1 root root  1288225 Mar 19 06:20 /usr/raid/lento@66.bad-lstat.out
    Andreas> -rw-r--r-- 1 root root 49500471 Mar 19 06:41 /usr/raid/lento@66.rmdir.out




    Andreas> -- andreas

