From: VANHULLEBUS Yvan
To: freebsd-net@freebsd.org
Date: Sat, 18 Aug 2007 12:28:03 +0200
Message-ID: <20070818102803.GA1319@jayce.zen.inc>
Subject: Re: Racoon(ipsec-tools) enters sbwait state or 100% CPU utilization quite often on RELENG_1_2

On Fri, Aug 17, 2007 at 04:53:56PM -0400, Scott Ullrich wrote:
> Hello!

Hi.

> We are trying to track down a problem that involves a large number of
> ipsec tunnels (in this case 80).  Frequently racoon (ipsec-tools
> 0.7rc1 and also 0.6) will deadlock into the sbwait state or will enter
> a 100% cpu usage state and will not recover without killing the
> process and restarting.
>
[....]
[backtrace]
> #0  0x2827a187 in recvfrom () from /lib/libc.so.6
> #1  0x28225904 in recv () from /lib/libc.so.6
> #2  0x0805f4f5 in pk_recv (so=11, lenp=0xbfbfe558) at pfkey.c:2826
> #3  0x0805f622 in pfkey_dump_sadb (satype=3) at pfkey.c:314
[....]

> Does anyone know what we can look at further to try and eliminate the
> problem or does anyone have suggestions on how we can debug further?

This really looks like an old "known" (well, at least known by me...)
problem with the PFKey interface: it is nearly impossible to set up more
than 50-100 tunnels on a standard FreeBSD (and probably on any other
KAME-based stack), because some kind of socket-related problem occurs
when racoon tries to fetch the SPD or SADB entries.

When the problem occurs with the SPD, racoon won't be able to negotiate
some tunnels (because it doesn't have the SPD entries in its own
table). When the problem occurs with the SADB, it can lead to the 100%
CPU usage you are seeing.

Some workarounds are possible depending on your configuration: you may
be able to reduce the number of SAs in use (merge some phase 2s with
contiguous subnets, use REQUIRE instead of UNIQUE for some tunnels,
etc.). But if you have 80 peers, each with only ONE phase 2, that's
another problem.

To solve that problem, the only solution we found was a big PFKey hack:
use a single request/response, with all the SPD/SAD entries exchanged
through a single buffer shared by the kernel and racoon.

I also know of an old bug in the sbspace macro (found in FreeBSD 4.x),
but it seems to have been fixed, at least in FreeBSD 6.


Yvan.

-- 
NETASQ
http://www.netasq.com