From: Frode Nordahl
Date: Thu, 8 Feb 2007 21:49:23 +0100
To: Kostik Belousov
Cc: freebsd-stable@freebsd.org, Václav Haisman, tegge@freebsd.org, bde@freebsd.org, freebsd-current@freebsd.org
Subject: Re: kqueue LOR
Message-Id: <84F23118-A6C3-44F8-B3FD-AE21C50D0EF9@nordahl.net>
In-Reply-To: <20061127092146.GA69556@deviant.kiev.zoral.com.ua>
References: <456950AF.3090308@sh.cvut.cz> <20061127092146.GA69556@deviant.kiev.zoral.com.ua>
List-Id: Production branch of FreeBSD source code

On 27. nov. 2006, at 10.21, Kostik Belousov wrote:

> On Sun, Nov 26, 2006 at 09:30:39AM +0100, Václav Haisman wrote:
>> Hi,
>> the attached lor.txt contains LOR I got this yesterday. It is FreeBSD 6.1
>> with relatively recent kernel, from last week or so.
>>
>> --
>> VH
>
>> +lock order reversal:
>> + 1st 0xc537f300 kqueue (kqueue) @ /usr/src/sys/kern/kern_event.c:1547
>> + 2nd 0xc45c22dc struct mount mtx (struct mount mtx) @ /usr/src/sys/ufs/ufs/ufs_vnops.c:138
>> +KDB: stack backtrace:
>> +kdb_backtrace(c07f9879,c45c22dc,c07fd31c,c07fd31c,c080c7b2,...) at kdb_backtrace+0x2f
>> +witness_checkorder(c45c22dc,9,c080c7b2,8a,c07fc6bd,...) at witness_checkorder+0x5fe
>> +_mtx_lock_flags(c45c22dc,0,c080c7b2,8a,e790ba20,...) at _mtx_lock_flags+0x32
>> +ufs_itimes(c47a0dd0,c47a0e90,e790ba78,c060e1cc,c47a0dd0,...) at ufs_itimes+0x6c
>> +ufs_getattr(e790ba54,e790baec,c0622af6,c0896f40,e790ba54,...) at ufs_getattr+0x20
>> +VOP_GETATTR_APV(c0896f40,e790ba54,c08a5760,c47a0dd0,e790ba74,...) at VOP_GETATTR_APV+0x3a
>> +filt_vfsread(c4cf261c,6,c07f445e,60b,0,...) at filt_vfsread+0x75
>> +knote(c4f57114,6,1,1f30c2af,1f30c2af,...) at knote+0x75
>> +VOP_WRITE_APV(c0896f40,e790bbec,c47a0dd0,227,e790bcb4,...) at VOP_WRITE_APV+0x148
>> +vn_write(c45d5120,e790bcb4,c5802a00,0,c4b73a80,...) at vn_write+0x201
>> +dofilewrite(c4b73a80,1b,c45d5120,e790bcb4,ffffffff,...) at dofilewrite+0x84
>> +kern_writev(c4b73a80,1b,e790bcb4,8220c71,0,...) at kern_writev+0x65
>> +write(c4b73a80,e790bd04,c,c07d899c,3,...) at write+0x4f
>> +syscall(3b,3b,bfbf003b,0,bfbfeae4,...) at syscall+0x295
>> +Xint0x80_syscall() at Xint0x80_syscall+0x1f
>> +--- syscall (4, FreeBSD ELF32, write), eip = 0x2831d727, esp = 0xbfbfea1c, ebp = 0xbfbfea48 ---
>
> Thank you for the report. The LOR is caused by my commit into
> sys/ufs/ufs/ufs_vnops.c, rev. 1.280.

While debugging a problem I have with 6.2-RELEASE on one of my servers, I saw this LOR. After being up for a short while the server freezes, not responding to serial console, network or keyboard. I can't even get to DDB by sending BREAK on the serial console.

Enabling INVARIANTS, INVARIANT_SUPPORT, WITNESS and WITNESS_SKIPSPIN did not give more information about the freeze, other than printing the LOR now and then.

The LOR I am getting is exactly the same, except that the calls are made to writev instead of write.

> What application you run that triggers the LOR ? Patch below is one
> possible approach to fixing it.

I am seeing this on a front-end MX server. I can trigger it by running "tail -f /var/log/maillog"; the LOR is printed before tail prints any output.

After it has triggered once, it does not trigger again reliably until some time has passed. Waiting 180 seconds seems to be enough to make it happen every time, but it can be triggered earlier. My maillog grows by about 976K during that time.

Could this LOR have something to do with the system freeze I am experiencing?

Should I try the patch in your mail from November 27 or December 13? Or has some other fix emerged since then?

--
Frode Nordahl