
List:       linux-ha-dev
Subject:    [Linux-ha-dev] Overflow on longclock_t?
From:       Guillem Anguera <ganguera () datagrama ! net>
Date:       2008-03-28 9:06:13
Message-ID: 47ECB505.1080804 () datagrama ! net

Hi list!

I asked this on the main mailing list, but nobody seems to know the answer.

I'm running heartbeat on two active/active firewall systems. Within the
last 48 hours, coinciding with an uptime of 49 days and a few hours, all
servers have suffered the same problem: /var/log/heartbeat.log grows
until it fills the /var partition with messages like those in the
attached file. From the first message on, the other node takes over all
resources even though the original node is unable to release them; at
that point the original node stops working.

Looking at the source code (include/clplumbing/longclock.h), I see that
longclock_t is defined as a variable of at least 64 bits, which seems to
be enough. But I think that on my servers it is defined as a 32-bit
variable:

2^32 = 4294967296 / 1000 (milliseconds to seconds) = 4294967.296 / 3600
(seconds to hours) = 1193.046471111 / 24 (hours to days) = 49.71026963
days, which matches the systems' uptime.
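
The arithmetic above can be checked with a short C sketch. The function
name wrap_days and its parameters are hypothetical, chosen only for
illustration; it assumes an unsigned counter that simply rolls over
after 2^bits ticks.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Days until an unsigned counter of `bits` bits rolls over, given how
 * many ticks it advances per second. Illustrative helper only; it is
 * not part of heartbeat.
 */
static double wrap_days(unsigned bits, double ticks_per_second)
{
    assert(bits < 64);                 /* the shift below needs bits < 64 */
    assert(ticks_per_second > 0.0);

    double ticks = (double)(UINT64_C(1) << bits);  /* 2^bits ticks in total */
    double seconds = ticks / ticks_per_second;

    return seconds / 3600.0 / 24.0;    /* seconds -> hours -> days */
}
```

Under these assumptions, wrap_days(32, 1000.0) comes out at roughly
49.71 days, the same figure as the hand calculation.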

What do you think? Is that possible?
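
To make the question concrete, here is a minimal sketch of how a 32-bit
tick counter could be widened to 64 bits by counting wraparounds. This
is not heartbeat's actual code: struct wide_clock and
wide_clock_update are hypothetical names, and the sketch assumes the
raw counter rolls over at exactly 2^32 and is sampled at least once per
wrap period.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state for the sketch; not a heartbeat data structure. */
struct wide_clock {
    uint32_t last;   /* most recent raw 32-bit reading */
    uint64_t wraps;  /* number of rollovers observed so far */
};

/*
 * Feed in a new raw reading and get back a 64-bit tick count.
 * A reading smaller than the previous one is treated as a rollover,
 * which is only valid under the assumptions stated above.
 */
static uint64_t wide_clock_update(struct wide_clock *wc, uint32_t now)
{
    assert(wc != NULL);

    if (now < wc->last)
        wc->wraps++;          /* counter went "backwards": count one wrap */
    wc->last = now;

    return (wc->wraps << 32) | now;
}
```

If the raw counter rolls over at some other value, or really does jump
backwards, a scheme like this would produce wrong results.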

Additional Information:
Debian version: Sarge (3.1)
Vanilla kernel version: 2.4.34.5
Debian heartbeat version: 2.0.7-2

P.S.: Sorry for my poor English.

-- 
Guillem Anguera
Systems Administrator
Jazztel - DATAGRAMA
Tel: 900 80 83 80
Fax: +34 93 289 63 10
Em@il: ganguera () datagrama ! net
http://www.jazztel.es


["heartbeat.log" (text/x-log)]

Mar 25 11:09:41 fw02 heartbeat: [24261]: info: Daily informational memory statistics
Mar 25 11:09:41 fw02 heartbeat: [24261]: info: MSG stats: 11/12254511 ms age 0 \
                [pid24261/MST_CONTROL]
Mar 25 11:09:41 fw02 heartbeat: [24261]: info: ha_malloc stats: 729/386969605  \
                103240/48586 [pid24261/MST_CONTROL]
Mar 25 11:09:41 fw02 heartbeat: [24261]: info: RealMalloc stats: 327624 total malloc \
                bytes. pid [24261/MST_CONTROL]
Mar 25 11:09:41 fw02 heartbeat: [24261]: info: Current arena value: 0
Mar 25 11:09:41 fw02 heartbeat: [24261]: info: MSG stats: 0/5 ms age -282384916 \
                [pid24264/HBFIFO]
Mar 25 11:09:41 fw02 heartbeat: [24261]: info: ha_malloc stats: 348/493  43568/21103 \
                [pid24264/HBFIFO]
Mar 25 11:09:41 fw02 heartbeat: [24261]: info: RealMalloc stats: 45396 total malloc \
                bytes. pid [24264/HBFIFO]
Mar 25 11:09:41 fw02 heartbeat: [24261]: info: Current arena value: 0
Mar 25 11:09:41 fw02 heartbeat: [24261]: info: MSG stats: 0/0 ms age -18800606 \
                [pid24265/HBWRITE]
Mar 25 11:09:41 fw02 heartbeat: [24261]: info: ha_malloc stats: 350/2593550  \
                43800/21267 [pid24265/HBWRITE]
Mar 25 11:09:41 fw02 heartbeat: [24261]: info: RealMalloc stats: 52740 total malloc \
                bytes. pid [24265/HBWRITE]
Mar 25 11:09:41 fw02 heartbeat: [24261]: info: Current arena value: 0
Mar 25 11:09:41 fw02 heartbeat: [24261]: info: MSG stats: 0/0 ms age -18800606 \
                [pid24266/HBREAD]
Mar 25 11:09:41 fw02 heartbeat: [24261]: info: ha_malloc stats: 352/9774446  \
                43968/21355 [pid24266/HBREAD]
Mar 25 11:09:41 fw02 heartbeat: [24261]: info: RealMalloc stats: 44592 total malloc \
                bytes. pid [24266/HBREAD]
Mar 25 11:09:41 fw02 heartbeat: [24261]: info: Current arena value: 0
Mar 25 11:09:41 fw02 heartbeat: [24261]: info: MSG stats: 0/0 ms age -18800606 \
                [pid24267/HBWRITE]
Mar 25 11:09:41 fw02 heartbeat: [24261]: info: ha_malloc stats: 352/2593560  \
                43968/21355 [pid24267/HBWRITE]
Mar 25 11:09:41 fw02 heartbeat: [24261]: info: RealMalloc stats: 52908 total malloc \
                bytes. pid [24267/HBWRITE]
Mar 25 11:09:41 fw02 heartbeat: [24261]: info: Current arena value: 0
Mar 25 11:09:41 fw02 heartbeat: [24261]: info: MSG stats: 0/0 ms age -18800606 \
                [pid24268/HBREAD]
Mar 25 11:09:41 fw02 heartbeat: [24261]: info: ha_malloc stats: 352/9774410  \
                43968/21355 [pid24268/HBREAD]
Mar 25 11:09:41 fw02 heartbeat: [24261]: info: RealMalloc stats: 44592 total malloc \
                bytes. pid [24268/HBREAD]
Mar 25 11:09:41 fw02 heartbeat: [24261]: info: Current arena value: 0
Mar 25 11:09:41 fw02 heartbeat: [24261]: info: These are nothing to worry about.
Mar 25 16:23:00 fw02 logd: [24003]: CRIT: time_longclock: clock_t from times(2) \
                appears to have jumped backwards (in error)!
Mar 25 16:23:00 fw02 logd: [23999]: CRIT: time_longclock: clock_t from times(2) \
                appears to have jumped backwards (in error)!
Mar 25 16:23:00 fw02 logd: [24003]: CRIT: time_longclock: old value was 429496648, \
                new value is 19, diff is 429496629, callcount 16650912
Mar 25 16:23:00 fw02 logd: [23999]: CRIT: time_longclock: old value was 429496648, \
                new value is 19, diff is 429496629, callcount 33311431
Mar 25 16:23:00 fw02 logd: [24003]: CRIT: time_longclock: clock_t from times(2) \
                appears to have jumped backwards (in error)!
Mar 25 16:23:00 fw02 logd: [23999]: CRIT: time_longclock: clock_t from times(2) \
                appears to have jumped backwards (in error)!
Mar 25 16:23:00 fw02 logd: [24003]: CRIT: time_longclock: old value was 429496648, \
                new value is 19, diff is 429496629, callcount 16650913
Mar 25 16:23:00 fw02 logd: [23999]: CRIT: time_longclock: old value was 429496648, \
                new value is 19, diff is 429496629, callcount 33311432
Mar 25 16:23:00 fw02 logd: [24003]: CRIT: time_longclock: clock_t from times(2) \
                appears to have jumped backwards (in error)!
Mar 25 16:23:00 fw02 logd: [23999]: CRIT: time_longclock: clock_t from times(2) \
                appears to have jumped backwards (in error)!
Mar 25 16:23:00 fw02 logd: [24003]: CRIT: time_longclock: old value was 429496648, \
                new value is 19, diff is 429496629, callcount 16650914
Mar 25 16:23:00 fw02 logd: [23999]: CRIT: time_longclock: old value was 429496648, \
                new value is 19, diff is 429496629, callcount 33311433
Mar 25 16:23:00 fw02 logd: [23999]: CRIT: time_longclock: clock_t from times(2) \
                appears to have jumped backwards (in error)!
Mar 25 16:23:00 fw02 logd: [24003]: CRIT: time_longclock: clock_t from times(2) \
                appears to have jumped backwards (in error)!
Mar 25 16:23:00 fw02 logd: [23999]: CRIT: time_longclock: old value was 429496648, \
                new value is 19, diff is 429496629, callcount 33311434
Mar 25 16:23:00 fw02 logd: [24003]: CRIT: time_longclock: old value was 429496648, \
                new value is 19, diff is 429496629, callcount 16650915
Mar 25 16:23:00 fw02 logd: [23999]: CRIT: time_longclock: clock_t from times(2) \
                appears to have jumped backwards (in error)!
Mar 25 16:23:00 fw02 logd: [23999]: CRIT: time_longclock: old value was 429496648, \
                new value is 19, diff is 429496629, callcount 33311435
Mar 25 16:23:00 fw02 logd: [23999]: CRIT: time_longclock: clock_t from times(2) \
                appears to have jumped backwards (in error)!
...



_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/

