From: John Baldwin
To: freebsd-sparc64@FreeBSD.org
Cc: net@FreeBSD.org, rwatson@FreeBSD.org, bmilekic@FreeBSD.org, sparc64@FreeBSD.org, Kris Kennaway
Date: Tue, 1 Mar 2005 13:40:18 -0500
Subject: Re: Race condition in mb_free_ext()?
Message-Id: <200503011340.18162.jhb@FreeBSD.org>
In-Reply-To: <20050301000436.GA33346@xor.obsecurity.org>

On Monday 28 February 2005 07:04 pm, Kris Kennaway wrote:
> I'm seeing an easily-provoked livelock on quad-CPU sparc64 machines
> running RELENG_5. It's hard to get a good trace because the processes
> running on other CPUs cannot be traced from DDB, but I've been lucky a
> few times:
>
> db> show alllocks
> Process 15 (swi1: net) thread 0xfffff8001fb07480 (100008)
> exclusive sleep mutex so_snd r = 0 (0xfffff800178432a8) locked @ netinet/tcp_input.c:2189
> exclusive sleep mutex inp (tcpinp) r = 0 (0xfffff800155c3b08) locked @ netinet/tcp_input.c:744
> exclusive sleep mutex tcp r = 0 (0xc0bdf788) locked @ netinet/tcp_input.c:617
> db> wh 15
> Tracing pid 15 tid 100008 td 0xfffff8001fb07480
> sab_intr() at sab_intr+0x40
> psycho_intr_stub() at psycho_intr_stub+0x8
> intr_fast() at intr_fast+0x88
> -- interrupt level=0xd pil=0 %o7=0xc01a0040 --
> mb_free_ext() at mb_free_ext+0x28
> sbdrop_locked() at sbdrop_locked+0x19c
> tcp_input() at tcp_input+0x2aa0
> ip_input() at ip_input+0x964
> netisr_processqueue() at netisr_processqueue+0x7c
> swi_net() at swi_net+0x120
> ithread_loop() at ithread_loop+0x24c
> fork_exit() at fork_exit+0xd4
> fork_trampoline() at fork_trampoline+0x8
> db>
>
> That code is here in mb_free_ext():
>
>	/*
>	 * This is tricky. We need to make sure to decrement the
>	 * refcount in a safe way but to also clean up if we're the
>	 * last reference. This method seems to do it without race.
>	 */
>	while (dofree == 0) {
>		cnt = *(m->m_ext.ref_cnt);
>		if (atomic_cmpset_int(m->m_ext.ref_cnt, cnt, cnt - 1)) {
>			if (cnt == 1)
>				dofree = 1;
>			break;
>		}
>	}

Well, this is obtuse at least.
A simpler version would be:

	do {
		cnt = *m->m_ext.ref_cnt;
	} while (atomic_cmpset_int(m->m_ext.ref_cnt, cnt, cnt - 1) == 0);
	dofree = (cnt == 1);

-- 
John Baldwin <>< http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve" = http://www.FreeBSD.org
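
For anyone who wants to experiment with the compare-and-swap decrement outside the kernel, here is a rough userland sketch of the same loop using C11 <stdatomic.h>. It is only an illustration under some assumptions: ref_release() is a made-up name (not a kernel function), atomic_compare_exchange_weak() stands in for atomic_cmpset_int(), and the default sequentially consistent ordering is used, which is not necessarily what the kernel primitive provides. One difference to keep in mind: the C11 call writes the current value back into cnt when it fails, so the explicit re-read inside the loop is not needed.

```c
#include <stdatomic.h>

/*
 * Drop one reference and report whether it was the last one.
 * ref_release() is a hypothetical helper for illustration only.
 */
static int
ref_release(atomic_uint *ref_cnt)
{
	unsigned int cnt;

	cnt = atomic_load(ref_cnt);
	/*
	 * Retry until the swap succeeds.  On failure C11 stores the
	 * value it actually found into cnt, so the next attempt uses
	 * a fresh snapshot.  On success cnt still holds the old value.
	 */
	while (!atomic_compare_exchange_weak(ref_cnt, &cnt, cnt - 1))
		;
	/* The caller that saw the count go from 1 to 0 does the free. */
	return (cnt == 1);
}
```

With a counter that starts at 2, the first call returns 0 and the second returns 1, which corresponds to dofree being set only for the last reference.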