List: openbsd-bugs
Subject: Re: OpenBSD -current panic under heavy load on a squid proxy
From: Frederic URBAN <frederic.urban () ircad ! fr>
Date: 2016-03-29 15:26:24
Message-ID: 56FA9EA0.6050208 () ircad ! fr
Hello,
It looks like this "workaround" breaks the TCP stack. I was unable to
connect to the server through ssh after modifying tcp_input.c.
Any other ideas? To repeat, the same setup under 5.4 ran properly with
400+ days of uptime :O
On 26/03/2016 18:25, Alexander Bluhm wrote:
> On Thu, Mar 24, 2016 at 05:21:00PM +0100, Frederic URBAN wrote:
>> panic: kernel diagnostic assertion "sotoinpcb(inp->inp_socket) == inp"
>> failed: file "../../../../netinet/tcp_input.c", line 632
>> Stopped at      Debugger+0x9:   leave
>>     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
>>   40563  40563    515        0x32       0x80    2  squid
>> *  1402   1402      0     0x14000      0x210    4  softnet
>> Debugger() at Debugger+0x9
>> panic() at panic+0xfe
>> __assert() at __assert+0x25
>> tcp_input() at tcp_input+0x122c
>> ipv4_input() at ipv4_input+0x32e
>> ipintr() at ipintr+0x1e
>> netintr() at netintr+0x64
>> softintr_dispatch() at softintr_dispatch+0x8b
>> Xsoftnet() at Xsoftnet+0x1f
>> --- interrupt ---
>> end trace frame: 0x0, count: 6
>> taskq_thread+0x6c:
> Interesting, I have been trying to find and fix this bug for years.
> We know that the pointers within the kernel are inconsistent when it
> crashes. But it is unclear what caused the corruption.
>
>> Very specific setup: squid + squidGuard + pf (pf.conf attached).
>> It was working under OpenBSD 5.4.
> I added this assertion in OpenBSD 5.5. The bug was there before,
> but did not show up that clearly. Back then it panicked with some
> use after free of the pcb.
>
> ----------------------------
> revision 1.268
> date: 2013/09/06 18:35:16; author: bluhm; state: Exp; lines: +3 -1;
> In one core dump the pointers to socket, inpcb, tcpcb on the stack
> of tcp_input() and tcp_output() were very inconsistent. Especially
> the so->so_pcb is NULL which can only happen after the inp has been
> detached. The whole issue looks similar to the old panic:
> pool_do_get(inpcbpl): free list modified.
> http://marc.info/?l=openbsd-bugs&m=132630237316970&w=2
>
> To get more information, add some asserts that guarantee the
> consistency of the socket, inpcb, tcpcb linking. They should trigger
> when an inp is taken from the pcb hashes after it has been freed.
> OK henning@
> ----------------------------
>
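To make the invariant behind the failed assertion concrete, here is a minimal userland sketch of the check `sotoinpcb(inp->inp_socket) == inp`. The structs are heavily simplified stand-ins for the kernel ones, and `inp_linked()`, `model_attach()` and `model_detach()` are hypothetical helpers that exist only in this illustration. It is a model of the linking that the asserts guard, not the kernel code itself.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; the real kernel structures have many more fields. */
struct inpcb;

struct socket {
	struct inpcb *so_pcb;       /* back pointer, NULL once the pcb is detached */
};

struct inpcb {
	struct socket *inp_socket;  /* forward pointer to the owning socket */
};

/* Models the sotoinpcb() macro: go from a socket to its inpcb. */
static struct inpcb *
sotoinpcb(struct socket *so)
{
	return so->so_pcb;
}

/* Hypothetical helper: 1 if socket and inpcb point at each other. */
static int
inp_linked(struct inpcb *inp)
{
	return inp->inp_socket != NULL && sotoinpcb(inp->inp_socket) == inp;
}

/* Hypothetical helper: link a socket and an inpcb in both directions. */
static void
model_attach(struct socket *so, struct inpcb *inp)
{
	so->so_pcb = inp;
	inp->inp_socket = so;
}

/*
 * Hypothetical helper: model only the socket side of a detach.
 * A stale inpcb pointer kept elsewhere still has inp_socket set,
 * so the two directions no longer agree.
 */
static void
model_detach(struct socket *so)
{
	so->so_pcb = NULL;
}
```

In this model, an inpcb that is still reachable after its socket has dropped `so_pcb` fails `inp_linked()`, which matches the situation the commit message describes for an inp taken from the pcb hashes after it has been freed.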
>> This squid proxy is a transparent proxy using squid and
>> squidguard. pf diverts packets to the lo interface.
> This is similar to the setup of our customers where I saw the crash.
>
>> ddb{1}> mach ddbcpu 0x02
>> Stopped at      Debugger+0x9:   leave
>> Debugger() at Debugger+0x9
>> x86_ipi_handler() at x86_ipi_handler+0x76
>> Xresume_lapic_ipi() at Xresume_lapic_ipi+0x1c
>> --- interrupt ---
>> __mp_lock() at __mp_lock+0x48
>> __mp_acquire_count() at __mp_acquire_count+0x2b
>> mi_switch() at mi_switch+0x21e
>> sleep_finish() at sleep_finish+0xb1
>> tsleep() at tsleep+0x154
>> kqueue_scan() at kqueue_scan+0x138
>> sys_kevent() at sys_kevent+0x282
>> syscall() at syscall+0x368
>> --- syscall (number 72) ---
>> end of kernel
>> end trace frame: 0x7f7ffffd1758, count: 4
>> 0xa9630355e9a:
> Another process is waiting for kqueue. Not surprising. I have also
> seen this with select.
>
>> ddb{3}> mach ddbcpu 0x04
>> Stopped at      Debugger+0x9:   leave
>> Debugger() at Debugger+0x9
>> panic() at panic+0xfe
>> __assert() at __assert+0x25
>> tcp_input() at tcp_input+0x122c
>> ipv4_input() at ipv4_input+0x32e
>> ipintr() at ipintr+0x1e
>> netintr() at netintr+0x64
>> softintr_dispatch() at softintr_dispatch+0x8b
>> Xsoftnet() at Xsoftnet+0x1f
>> --- interrupt ---
>> end trace frame: 0x0, count: 6
>> taskq_thread+0x6c:
> And that is the CPU where it panics.
>
>> pass quick proto carp keep state (no-sync)
>> pass quick on sync proto pfsync keep state (no-sync)
> I have seen this on machines without carp and without pfsync. So
> I think it is not related.
>
>> pass in log quick on proxy inet proto tcp from <lan_networks> to any port
>> 80 route-to lo0 divert-to 127.0.0.1 port 3128
> This seems to be the rule that diverts all the traffic and causes
> the trouble.
>
> Thanks for the bug report. I am sorry that I have no solution for
> you. I will continue thinking about it.
>
> As a workaround you could try the following diff. Normally a pf
> state is used to find a socket. This is used as a speed optimization.
> It is also necessary when using source transparent relays without
> a divert-reply rule. Your setup should work without it, so try
> this diff which disables it.
>
> I would be interested whether your setup still works without the
> pf_inp_lookup(). And does this diff make the panic go away?
>
> bluhm
>
> Index: netinet/tcp_input.c
> ===================================================================
> RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/tcp_input.c,v
> retrieving revision 1.315
> diff -u -p -r1.315 tcp_input.c
> --- netinet/tcp_input.c 21 Mar 2016 15:52:27 -0000 1.315
> +++ netinet/tcp_input.c 26 Mar 2016 17:14:56 -0000
> @@ -579,7 +579,7 @@ tcp_input(struct mbuf *m, ...)
> /*
> * Locate pcb for segment.
> */
> -#if NPF > 0
> +#if 0
> inp = pf_inp_lookup(m);
> #endif
> findpcb:
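For reference, the idea behind the diff can be modelled in userland as a cached pointer with a table lookup as fallback. This is only a sketch under assumptions: `struct state`, `table_lookup()` and `find_pcb()` are hypothetical names, the linear search merely stands in for the kernel's pcb hash lookup, and it assumes that the code after the `findpcb:` label falls back to a regular lookup whenever `inp` is still NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified pcb, identified only by its local port. */
struct pcb {
	unsigned short lport;
};

/* Hypothetical stand-in for a pf state that may cache a pcb pointer. */
struct state {
	struct pcb *cached;
};

/* Slow path: linear search standing in for the pcb hash lookup. */
static struct pcb *
table_lookup(struct pcb *tbl, size_t n, unsigned short lport)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].lport == lport)
			return &tbl[i];
	return NULL;
}

/*
 * Try the cached pointer first (the role pf_inp_lookup() plays),
 * then fall back to the table. use_cache == 0 models the "#if 0".
 */
static struct pcb *
find_pcb(const struct state *st, int use_cache,
    struct pcb *tbl, size_t n, unsigned short lport)
{
	struct pcb *p = NULL;

	if (use_cache && st != NULL)
		p = st->cached;
	if (p == NULL)
		p = table_lookup(tbl, n, lport);
	return p;
}
```

In this model, turning the cache off only costs the extra search and returns the same pcb. That is the behaviour the workaround was expected to have, which is why the broken ssh connectivity is surprising.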
--
Frédéric URBAN
Network Engineer
frederic.urban@ircad.fr
Tel.: +33 (0)3 88 119 038
IRCAD France
http://www.ircad.fr/
Follow IRCAD on Facebook
<http://www.facebook.com/pages/IRCAD/193785273990141>
Hôpitaux Universitaires - 1, place de l'Hôpital - 67091 Strasbourg Cedex
- FRANCE