Date: Thu, 12 Aug 2004 13:16:19 -0400 (EDT)
From: Robert Watson
To: Don Lewis
Cc: freebsd-current@FreeBSD.org, mb@imp.ch
Subject: Re: SCHEDULE and high load situations
In-Reply-To: <200408121709.i7CH98H8020875@gw.catspoiler.org>

On Thu, 12 Aug 2004, Don Lewis wrote:

> > (gdb) l *unp_connect2+0x2a
> > 0x1f93 is in unp_connect2 (/usr/src/sys/kern/uipc_usrreq.c:892).
> > 887             UNP_LOCK_ASSERT();
> > 888
> > 889             if (so2->so_type != so->so_type)
> > 890                     return (EPROTOTYPE);
> > 891             unp2 = sotounpcb(so2);
> > 892             unp->unp_conn = unp2;
> > 893             switch (so->so_type) {
> > 894
> > 895             case SOCK_DGRAM:
> > 896                     LIST_INSERT_HEAD(&unp2->unp_refs, unp, unp_reflink);
>
> Looks like unp is NULL here.
>
> My first suspicion would be the recent memory allocation changes that
> affected the type safety of various dynamically allocated data
> structures, though I'm not sure that fits the symptoms.

Hmm. I thought unix domain sockets weren't affected by those changes, but
I could be wrong. However, it does look like a null pointer dereference,
and in particular a possible race between two threads accessing either
the same end or opposite ends of a unix domain socket.

Martin's dropping a core dump, kernel, and source tree for me to look at.
Some early debugging shows that the unix domain socket is a datagram
socket, and that the SS_NOFDREF flag is set in so->so_state (meaning no
file descriptor references the socket any more), suggesting that we may
have a race between connect() and close() in the application. However, I
need to sit down with the core for a bit. I would have expected the more
likely race to be between the two endpoints of a unix domain socket,
since I would think most applications don't mishandle their file
descriptors.

In any case, more details soon. I'm guessing the race was present
previously, but the move to ADAPTIVE_GIANT has caused it to trigger more
frequently on Martin's system.

Robert N M Watson          FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org   Principal Research Scientist, McAfee Research
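To make the connect()/close() hypothesis concrete, here is a deterministic user-space model in C. It is not FreeBSD kernel code: model_socket, model_unpcb, model_close(), model_connect2() and MODEL_SS_NOFDREF are hypothetical names. The model assumes (a) that dropping the last descriptor reference sets a no-fd-reference flag and detaches the protocol control block, leaving so_pcb NULL, and (b) that unp in the listing is obtained from so the same way unp2 is obtained from so2. Under those assumptions, if the close() path finishes before the connect path reaches the equivalent of line 892, unp is NULL, which would match the faulting store. The model adds a NULL check where the listing has none; it illustrates the failure rather than proposing a fix.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical user-space model of the suspected race.  These are NOT the
 * FreeBSD kernel structures; the model_ prefix marks every invented name.
 */
#define MODEL_SOCK_DGRAM  2
#define MODEL_SS_NOFDREF  0x0001  /* hypothetical flag value */

struct model_unpcb {
	struct model_unpcb *unp_conn;  /* peer this endpoint is connected to */
	int unp_refcount;              /* datagram peers pointing at us (model only) */
};

struct model_socket {
	int so_type;
	int so_state;
	struct model_unpcb *so_pcb;    /* NULL once the endpoint is torn down */
};

/*
 * Models the close() path under the stated assumption: mark the socket as
 * having no fd reference and detach its control block.
 */
static void
model_close(struct model_socket *so)
{
	so->so_state |= MODEL_SS_NOFDREF;
	so->so_pcb = NULL;
}

/*
 * Models the connect path from the listing: compare socket types, then
 * link unp to unp2.  Unlike the listing, it checks for a detached control
 * block before storing through unp.
 */
static int
model_connect2(struct model_socket *so, struct model_socket *so2)
{
	struct model_unpcb *unp = so->so_pcb;
	struct model_unpcb *unp2 = so2->so_pcb;

	if (so2->so_type != so->so_type)
		return (EPROTOTYPE);
	if (unp == NULL || unp2 == NULL)
		return (EINVAL);        /* model choice: endpoint already closed */
	unp->unp_conn = unp2;           /* the store that faulted at line 892 */
	if (so->so_type == MODEL_SOCK_DGRAM)
		unp2->unp_refcount++;   /* stands in for LIST_INSERT_HEAD on unp_refs */
	return (0);
}
```

Calling model_connect2() before model_close() links the two endpoints; calling it after model_close() sees a NULL so_pcb and returns EINVAL instead of faulting. In the kernel the two paths can run concurrently, so whether that second ordering can actually occur depends on what serializes them, which is what the core dump should help answer.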