From owner-freebsd-current@FreeBSD.ORG  Wed Aug  4 22:05:09 2004
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id AC69716A4CE; Wed,  4 Aug 2004 22:05:09 +0000 (GMT)
Received: from smtp-gw-cl-c.dmv.com (smtp-gw-cl-c.dmv.com [216.240.97.41])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 4CAF843D53; Wed,  4 Aug 2004 22:05:09 +0000 (GMT)
	(envelope-from sven@dmv.com)
Received: from lanshark.dmv.com (lanshark.dmv.com [216.240.97.46])
	i74M569D030765;	Wed, 4 Aug 2004 18:05:06 -0400 (EDT)
	(envelope-from sven@dmv.com)
From: Sven Willenberger <sven@dmv.com>
To: Scott Long <scottl@freebsd.org>
In-Reply-To: <411154D8.1050001@freebsd.org>
References: <20040804204915.8337A5D08@ptavv.es.net>
	<411154D8.1050001@freebsd.org>
Content-Type: text/plain
Date: Wed, 04 Aug 2004 18:03:21 -0400
Message-Id: <1091657001.29488.64.camel@lanshark.dmv.com>
Mime-Version: 1.0
X-Mailer: Evolution 1.5.9 
Content-Transfer-Encoding: 7bit
X-Scanned-By: MIMEDefang 2.39
cc: freebsd-current@freebsd.org
Subject: Re: Postgresql locks up server - no response at all
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 04 Aug 2004 22:05:09 -0000

On Wed, 2004-08-04 at 15:27 -0600, Scott Long wrote:
> Sven Willenberger wrote:
> 
> > On Wed, 2004-08-04 at 13:49 -0700, Kevin Oberman wrote:
> > 
> >>>Date: Wed, 4 Aug 2004 13:34:56 -0700
> >>>From: Jeremy Chadwick <freebsd@jdc.parodius.com>
> >>>Sender: owner-freebsd-current@freebsd.org
> >>>
> >>>I've seen this with our SuperMicro SuperServer 5013C-T, running mysqld.
> >>>Please note that the server is "heavily loaded" (note the quotes); usually
> >>>a load of around 0.50 to 1.00 at all times, with mysqld being the top
> >>>process.  Server runs all latest -CURRENT builds.
> >>>
> >>>Many people over in freebsd-threads mentioned this problem, and recommended
> >>>all sorts of different workarounds.  I tried every one available to me,
> >>>except mucking with PREEMPTION (as I did not feel comfortable tinkering
> >>>with a random .h file on the box; seemed to be a kernel-related thing,
> >>>so I'd rather have just an "options" line for it -- I'm conditionally
> >>>lazy).
> >>
> >>Please note that PREEMPTION is now NOT enabled in CURRENT. scottl
> >>changed that a day or two ago because of all of these lock-ups. He and
> >>Julian are listed as working to isolate the problem. Scott believes it's
> >>in the scheduler. It's not specific to either ULE or 4BSD.
> >>
> >>So cvsup, rebuild the kernel and you should be fine. At least for a while.
> > 
> > 
> > Based on this and Jeremy C.'s response it would appear that I should
> > either try to upgrade my 5.2.1-P8 system to -CURRENT (which is scary
> > because of the vinum array - root is not mounted on a vinum device, but
> > the data directory is - will gvinum simply read this correctly? it is a
> > stripe+mirror array of 4 drives) or start from scratch and go back to
> > 4.10 (STABLE) for a while. I am assuming that the lockups I am seeing
> > were exacerbated by the PREEMPTION episodes of the past couple weeks? If
> > I choose the upgrade to -CURRENT, are there any caveats or
> > recommendations? (besides reading "/usr/src/UPDATING" which I do
> > religiously anyway)
> 
> I'm a bit nervous about asking you to upgrade to -current.  PREEMPTION is
> practically disabled in 5.2.1 so upgrading has a low chance of fixing
> the problem except maybe by sheer luck.  The best action would be to
> get a crashdump.  If your system has an NMI button, then there are some
> trivial patches that will assist with this.  If not, then you might want
> to look at backporting the ichwd watchdog driver and letting that do a
> chip-assisted NMI.
> 
> In any case, finding out exactly what each CPU is doing at the time of
> the lockup is going to be vital.  The lockups that I've been able to
> reproduce happen when a TAILQ in the scheduler gets corrupted,
> resulting in one CPU spinning on the list forever with the scheduler
> lock held.  All other CPUs then quickly grind to a halt while they wait
> for the sched lock to become free, which it never does.
> 

The case unfortunately does not have a button (although the mobo does
have an NMI header/jumper). Backporting the watchdog driver sounds
doable; other than downloading the sys/dev/ichwd directory from a
repository and adding "options ichwd" to my kernel config file, what
else would be needed? I am willing to try to get at least one crashdump
before I have to go back to a -STABLE setup or try something else so I can
get some uptime on this box.