From owner-freebsd-current@FreeBSD.ORG Wed Aug 4 22:05:09 2004
Return-Path:
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id AC69716A4CE; Wed, 4 Aug 2004 22:05:09 +0000 (GMT)
Received: from smtp-gw-cl-c.dmv.com (smtp-gw-cl-c.dmv.com [216.240.97.41]) by mx1.FreeBSD.org (Postfix) with ESMTP id 4CAF843D53; Wed, 4 Aug 2004 22:05:09 +0000 (GMT) (envelope-from sven@dmv.com)
Received: from lanshark.dmv.com (lanshark.dmv.com [216.240.97.46]) i74M569D030765; Wed, 4 Aug 2004 18:05:06 -0400 (EDT) (envelope-from sven@dmv.com)
From: Sven Willenberger
To: Scott Long
In-Reply-To: <411154D8.1050001@freebsd.org>
References: <20040804204915.8337A5D08@ptavv.es.net> <411154D8.1050001@freebsd.org>
Content-Type: text/plain
Date: Wed, 04 Aug 2004 18:03:21 -0400
Message-Id: <1091657001.29488.64.camel@lanshark.dmv.com>
Mime-Version: 1.0
X-Mailer: Evolution 1.5.9
Content-Transfer-Encoding: 7bit
X-Scanned-By: MIMEDefang 2.39
cc: freebsd-current@freebsd.org
Subject: Re: Postgresql locks up server - no response at all
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 04 Aug 2004 22:05:09 -0000

On Wed, 2004-08-04 at 15:27 -0600, Scott Long wrote:
> Sven Willenberger wrote:
>
> > On Wed, 2004-08-04 at 13:49 -0700, Kevin Oberman wrote:
> >
> >>>Date: Wed, 4 Aug 2004 13:34:56 -0700
> >>>From: Jeremy Chadwick
> >>>Sender: owner-freebsd-current@freebsd.org
> >>>
> >>>I've seen this with our SuperMicro SuperServer 5013C-T, running mysqld.
> >>>Please note that the server is "heavily loaded" (note the quotes); usually
> >>>a load of around 0.50 to 1.00 at all times, with mysqld being the top
> >>>process. Server runs all latest -CURRENT builds.
> >>>
> >>>Many people over in freebsd-threads mentioned this problem, and recommended
> >>>all sorts of different workarounds. I tried every one available to me,
> >>>except mucking with PREEMPTION (as I did not feel comfortable tinkering
> >>>with a random .h file on the box; seemed to be a kernel-related thing,
> >>>so I'd rather have just an "options" line for it -- I'm conditionally
> >>>lazy).
> >>
> >>Please note that PREEMPTION is now NOT enabled in CURRENT. scottl
> >>changed that a day or two ago because of all of these lock-ups. He and
> >>Julian are listed as working to isolate the problem. Scott believes it's
> >>in the scheduler. It's not specific to either ULE or 4BSD.
> >>
> >>So cvsup, rebuild the kernel and you should be fine. At least for a while.
> >
> >
> > Based on this and Jeremy C.'s response it would appear that I should
> > either try to upgrade my 5.2.1-P8 system to -CURRENT (which is scary
> > because of the vinum array - root is not mounted on a vinum device, but
> > the data directory is - will gvinum simply read this correctly? It is a
> > stripe+mirror array of 4 drives) or start from scratch and go back to
> > 4.10 (STABLE) for a while. I am assuming that the lockups I am seeing
> > were exacerbated by the PREEMPTION episodes of the past couple of weeks? If
> > I choose the upgrade to -CURRENT, are there any caveats or
> > recommendations? (besides reading "/usr/src/UPDATING" which I do
> > religiously anyway)
>
> I'm a bit nervous about asking you to upgrade to -current. PREEMPTION is
> practically disabled in 5.2.1 so upgrading has a low chance of fixing
> the problem except maybe by sheer luck. The best action would be to
> get a crashdump.
> If your system has an NMI button, then there are some
> trivial patches that will assist with this. If not, then you might want
> to look at backporting the ichwd watchdog driver and letting that do a
> chip-assisted NMI.
>
> In any case, finding out exactly what each CPU is doing at the time of
> the lockup is going to be vital. The lockups that I've been able to
> reproduce happen when a TAILQ in the scheduler gets corrupted, resulting
> in one CPU spinning on the list forever with the scheduler lock held.
> All other CPUs then quickly grind to a halt while they wait for the
> sched lock to become free, which it never does.
>

The case unfortunately does not have a button (although the mobo does have an NMI header/jumper). Backporting the watchdog driver sounds doable; other than downloading the sys/dev/ichwd directory from a repository and adding "options ichwd" to my kernel config file, what else would be needed? I am willing to try to get at least one crashdump before I have to go back to a -STABLE setup or try something so I can get some uptime on this box.