Date: Wed, 12 May 1999 02:09:00 -0700 (PDT)
From: Cliff Skolnick
To: David Greenman
Cc: Mike Tancsa, freebsd-stable@FreeBSD.ORG, luoqi@FreeBSD.ORG, Matthew Dillon
Subject: Re: vm_fault deadlock and PR 8416 ... NOT fixed!
In-Reply-To: <199905120755.AAA01361@implode.root.com>

On Wed, 12 May 1999, David Greenman wrote:

> >Well a few minutes ago my system went into deadlock - and this is with the
> >kern_lock.c dated 5/11. This patch is different than the one in 8416 that
> >solved my problem before. I'd say the problem is still there.
>
>    Time is very short for getting this fixed before the release deadline. I
> think Luoqi's patch that was in the PR was susceptible to a priority inversion
> problem and has risks associated with using it. The fix that Matt Dillon
> made for -current that I back-ported to -stable was an attempt to fix the
> problem while minimizing the side effects. If it doesn't fix the problem
> then we'll proceed with plan B which is probably to just go with Luoqi's
> fix or to possibly troubleshoot Matt's fix (but as I said, time is short).

I'll do whatever I can. I've been looking at the code and am thinking
about some debugging strategies to try to figure out what is happening.
Unfortunately this code is complex with regard to potential interactions,
so I am going through it quite slowly.

I really can't afford to back out the patch at this point, but I can add
any code to my kernel to gather debugging info for people working on this.
It would probably be better for you to tell me what you want to know than
for me to guess. I have lots of disk space, so super verbose output is OK.

My system is a PII 350 with 128MB of memory and 512MB of swap, 3 SCSI buses
(a 2940 and a 3940), an Intel EtherExpress 10/100, and a ZNYX 4-port 10/100
card. This problem started at the beginning of this month after an
installworld and kernel update. This system had been running 3.1 since
3.1-RELEASE and 2.2.x since mid-summer without a single crash, except for
that bad SCSI cable in September, which was my fault.

The problem does not occur until there is some paging activity. It also
usually seems to happen when users with large mailboxes are writing to
their mail spools. Usually there are 2-6 people reading mail on the machine
at any given time, with 15 active users in total. 20% of the users run Pine
against the spool directly, 20% run Pine but access the spool via IMAP, and
60% use IMAP only, from other systems (at times also with a shell open on
the system, but not reading mail). It seems the first two Pine categories
are much more likely to cause the problem, but those users are also the
ones with bigger spool files.

The last deadlock happened while I was running screen and Pine and
compiling egcs. One other user was logged into the system with an idle
shell open; that user was also reading mail via IMAP from Netscape. This is
not really a heavy load for this system, but with 128MB it will probably
start to use swap at this point.

The same machine runs:

Samba, active mounts - no transfers in progress
Apache, no requests being served
DNS, constant traffic, but not high volume
sendmail, was attempting a few local and about 10 remote deliveries at the time
IMAP had two connections active: my Pine session and the Netscape session on a remote machine
No X server is ever run on this machine; ssh, telnet, and an occasional console login are how this machine is used
Routing, it's the router between a few very low-use LANs and the firewall router

> >Once again my server is useless, deadlocked. No panic, responding to pings,
> >no ability to do disk I/O or any VM related stuff.
> >
> >An unhappy FreeBSD user once again,
>
>    Is this really necessary? It sure doesn't help the debugging process.

This was not a flame, but an expression of my frustration. Necessary, no.
True, yes.

I make my living helping other people with their systems and figuring out
solutions. I've recommended FreeBSD a hell of a lot; it is my #1 choice due
to its stability. I used to use Linux, but stopped recommending it for
anything except a firewall almost a year ago, after bad luck with almost
any user-level process running for a long period of time, especially named.

When I install FreeBSD for or at a client site and it crashes, it is I who
looks bad. I know there will be more mail in my mailbox about people who
can't get access to their mail and about web sites that are down. I think
it is perfectly reasonable to be unhappy, and to express that in a limited
manner.

Anyways, it's late and I hope this post made sense. My writing skills
decrease rapidly when I'm falling asleep at the keyboard.

Cliff

--
Cliff Skolnick           | "They that can give up essential liberty to obtain
Steam Tunnel Operations  |  a little temporary safety deserve neither liberty
cliff@steam.com          |  nor safety."
http://www.steam.com/    |      -- Benjamin Franklin, 1759