List:       openbsd-bugs
Subject:    Re: OpenBSD -current panic under heavy load on a squid proxy
From:       Frederic URBAN <frederic.urban () ircad ! fr>
Date:       2016-03-29 15:26:24
Message-ID: 56FA9EA0.6050208 () ircad ! fr

Hello,

It looks like this "workaround" breaks the TCP stack. I was unable to 
connect to the server through ssh after modifying tcp_input.c.

Any other ideas? I repeat, the same setup under 5.4 ran properly for 
400+ days of uptime :O

On 26/03/2016 18:25, Alexander Bluhm wrote:
> On Thu, Mar 24, 2016 at 05:21:00PM +0100, Frederic URBAN wrote:
>> panic: kernel diagnostic assertion "sotoinpcb(inp->inp_socket) == inp"
>> failed: file "../../../../netinet/tcp_input.c", line 632
>> Stopped at      Debugger+0x9:   leave
>>    TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
>>  40563  40563    515        0x32       0x80    2  squid
>> * 1402   1402      0     0x14000      0x210    4  softnet
>> Debugger() at Debugger+0x9
>> panic() at panic+0xfe
>> __assert() at __assert+0x25
>> tcp_input() at tcp_input+0x122c
>> ipv4_input() at ipv4_input+0x32e
>> ipintr() at ipintr+0x1e
>> netintr() at netintr+0x64
>> softintr_dispatch() at softintr_dispatch+0x8b
>> Xsoftnet() at Xsoftnet+0x1f
>> --- interrupt ---
>> end trace frame: 0x0, count: 6
>> taskq_thread+0x6c:
> Interesting, I have been trying to find and fix this bug for years.  We
> know that the pointers within the kernel are inconsistent when it
> crashes.  But it is unclear what caused the corruption.
>
>> Very specific setup, squid + squidGuard + pf (pf.conf
>> attached). It was working under OpenBSD 5.4
> I added this assertion in OpenBSD 5.5.  The bug was there before,
> but did not show up that clearly.  Back then it panicked with a
> use-after-free of the pcb.
>
> ----------------------------
> revision 1.268
> date: 2013/09/06 18:35:16;  author: bluhm;  state: Exp;  lines: +3 -1;
> In one core dump the pointers to socket, inpcb, tcpcb on the stack
> of tcp_input() and tcp_output() were very inconsistent.  Especially
> the so->so_pcb is NULL which can only happen after the inp has been
> detached.  The whole issue looks similar to the old panic:
> pool_do_get(inpcbpl): free list modified.
> http://marc.info/?l=openbsd-bugs&m=132630237316970&w=2
>
> To get more information, add some asserts that guarantee the
> consistency of the socket, inpcb, tcpcb linking.  They should trigger
> when an inp is taken from the pcb hashes after it has been freed.
> OK henning@
> ----------------------------
>
>> This squid proxy is a transparent proxy using squid and
>> squidguard. pf diverts packets to the lo interface.
> This is similar to the setup of our customers where I saw the crash.
>
>> ddb{1}> mach ddbcpu 0x02
>> Stopped at      Debugger+0x9:   leave
>> Debugger() at Debugger+0x9
>> x86_ipi_handler() at x86_ipi_handler+0x76
>> Xresume_lapic_ipi() at Xresume_lapic_ipi+0x1c
>> --- interrupt ---
>> __mp_lock() at __mp_lock+0x48
>> __mp_acquire_count() at __mp_acquire_count+0x2b
>> mi_switch() at mi_switch+0x21e
>> sleep_finish() at sleep_finish+0xb1
>> tsleep() at tsleep+0x154
>> kqueue_scan() at kqueue_scan+0x138
>> sys_kevent() at sys_kevent+0x282
>> syscall() at syscall+0x368
>> --- syscall (number 72) ---
>> end of kernel
>> end trace frame: 0x7f7ffffd1758, count: 4
>> 0xa9630355e9a:
> Another process is waiting for kqueue.  Not surprising.  I have also
> seen this with select.
>
>> ddb{3}> mach ddbcpu 0x04
>> Stopped at      Debugger+0x9:   leave
>> Debugger() at Debugger+0x9
>> panic() at panic+0xfe
>> __assert() at __assert+0x25
>> tcp_input() at tcp_input+0x122c
>> ipv4_input() at ipv4_input+0x32e
>> ipintr() at ipintr+0x1e
>> netintr() at netintr+0x64
>> softintr_dispatch() at softintr_dispatch+0x8b
>> Xsoftnet() at Xsoftnet+0x1f
>> --- interrupt ---
>> end trace frame: 0x0, count: 6
>> taskq_thread+0x6c:
> And that is the CPU where it panics.
>
>> pass quick proto carp keep state (no-sync)
>> pass quick on sync proto pfsync keep state (no-sync)
> I have seen this on machines without carp and without pfsync.  So
> I think it is not related.
>
>> pass in log quick on proxy inet proto tcp from <lan_networks> to any port
>> 80 route-to lo0 divert-to 127.0.0.1 port 3128
> This seems to be the rule that diverts all the traffic and causes
> the trouble.
>
> Thanks for the bug report.  I am sorry that I have no solution for
> you.  I will continue thinking about it.
>
> As a workaround you could try the following diff.  Normally a pf
> state is used to find a socket.  This is used as a speed optimization.
> It is also necessary when using source transparent relays without
> a divert-reply rule.  Your setup should work without it, so try
> this diff which disables it.
>
> I would be interested whether your setup still works without the
> pf_inp_lookup().  And does this diff make the panic go away?
>
> bluhm
>
> Index: netinet/tcp_input.c
> ===================================================================
> RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/tcp_input.c,v
> retrieving revision 1.315
> diff -u -p -r1.315 tcp_input.c
> --- netinet/tcp_input.c	21 Mar 2016 15:52:27 -0000	1.315
> +++ netinet/tcp_input.c	26 Mar 2016 17:14:56 -0000
> @@ -579,7 +579,7 @@ tcp_input(struct mbuf *m, ...)
>   	/*
>   	 * Locate pcb for segment.
>   	 */
> -#if NPF > 0
> +#if 0
>   	inp = pf_inp_lookup(m);
>   #endif
>   findpcb:

-- 
Frédéric URBAN
Network Engineer

frederic.urban@ircad.fr
Tel.: +33 (0)3 88 119 038
http://www.ircad.fr/

Follow IRCAD on Facebook:
http://www.facebook.com/pages/IRCAD/193785273990141

*IRCAD France*
Hôpitaux Universitaires - 1, place de l'Hôpital - 67091 Strasbourg Cedex 
- FRANCE

