From owner-freebsd-current@FreeBSD.ORG  Sat Jan 14 18:52:53 2006
Date: Sat, 14 Jan 2006 10:52:46 -0800
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Scott Long <scottl@samsco.org>
Message-ID: <20060114185246.GA773@troutmask.apl.washington.edu>
References: <20060114052117.GA16773@xor.obsecurity.org>
	<20060114055347.GA45580@troutmask.apl.washington.edu>
	<43C9188B.3050300@samsco.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <43C9188B.3050300@samsco.org>
User-Agent: Mutt/1.4.2.1i
Cc: current@freebsd.org, Kris Kennaway <kris@obsecurity.org>
Subject: Re: Extreme contigmalloc() slowness with mpt driver
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>

On Sat, Jan 14, 2006 at 08:28:11AM -0700, Scott Long wrote:
> Steve Kargl wrote:
> >On Sat, Jan 14, 2006 at 12:21:17AM -0500, Kris Kennaway wrote:
> >
> >>I have an amd64 machine with 16GB of RAM that takes ages to boot (~40
> >>minutes on 7.0).  This is because the mpt driver takes 20 minutes to
> >>attach (with 2 instances).  This in turn is because the following code
> >>from dev/mpt/mpt_pci.c:mpt_dma_mem_alloc() takes about 5 seconds to
> >>execute, and it is run 256 times in a loop:
> >>
> >>               error = bus_dmamap_create(mpt->buffer_dmat, 0, &req->dmap);
> >>
> >>When I set vm.old_contigmalloc=1, the system boots without delay.
> >>
> >>This points to a bug in contigmalloc.
> >>
> >
> >This is probably related to my recent reports of extremely
> >slow probing of fxp0.  I have 12 GB on a Tyan K8S Pro and
> >fxp0 takes on the order of 7 minutes to probe.
> >
> 
> Yep, that's the same reason.  The issue here is that bus_dmamap_create
> is using contigmalloc to allocate bounce pages for the device.  At the
> request of Soeren, I recently upped the max limit on bounce pages from
> 512 to 4096.  Before that, drivers would quickly reach the max and then
> move on.  Now that the max is a lot higher, I guess it points to a
> scalability problem in the page search algorithm of contigmalloc.
> 

Thanks for the confirmation.  There may be more serious problems than
just long boot times.  I'm seeing recurring lock-ups on my system.  There
is no panic and no keyboard or network response from the system.  The
system sits in my office and acts as a very expensive heater.

In my attempts to diagnose the problem, I've cleaned out all installed
ports, all old shared libraries, and all old bin/, sbin/, usr/bin, and
usr/sbin binaries.  Then, I rebuilt kernel and world and booted a fairly
clean system.  I have INVARIANTS/WITNESS/DDB in my kernel.  I rebuilt
multiple ports from multiple vty terms without a problem.  I then fired
up X11, opened several xterms, and built more ports.  All seemed fine.

Then, I rebuilt my Monte Carlo simulation code.  This program forks
2 children.  Each child allocates up to 1 GB of memory, including a
few 250 MB arrays.  Each child runs for 7 minutes, writes a few files,
then exits.  The parent waits on the children, and then forks 2 more
children.  The parent should run for 24 to 96 hours for a complete
simulation.  At some point the system locks up.  This happens while
X11 is running.  I haven't had a lock-up at a vty term.
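
The process pattern described above can be sketched roughly as follows.
This is not the actual simulation code: child_work() and run_generation()
are hypothetical names, the children only allocate and touch memory
(no computation, no file output), and the array count and size are
parameters so the load can be scaled from a quick check up to a few
250 MB arrays per child.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for one simulation child: allocate narrays
 * arrays of array_bytes each and write to every byte so the pages are
 * actually backed by memory.  Returns 0 on success, 1 on failure.
 */
static int
child_work(size_t narrays, size_t array_bytes)
{
	int rc = 0;
	char **arrays = calloc(narrays, sizeof(*arrays));

	if (arrays == NULL)
		return (1);
	for (size_t i = 0; i < narrays; i++) {
		arrays[i] = malloc(array_bytes);
		if (arrays[i] == NULL) {
			rc = 1;
			break;
		}
		memset(arrays[i], 0x5a, array_bytes);
	}
	/* Unallocated slots are still NULL from calloc(); free(NULL) is a no-op. */
	for (size_t i = 0; i < narrays; i++)
		free(arrays[i]);
	free(arrays);
	return (rc);
}

/*
 * Fork nchildren children, let each run child_work(), and wait for all
 * of them.  Returns the number of children that exited with status 0.
 */
static int
run_generation(int nchildren, size_t narrays, size_t array_bytes)
{
	int started = 0, ok = 0;

	assert(nchildren > 0);
	for (int i = 0; i < nchildren; i++) {
		pid_t pid = fork();
		if (pid == -1)
			break;		/* could not fork; wait for those started */
		if (pid == 0)
			_exit(child_work(narrays, array_bytes));
		started++;
	}
	for (int i = 0; i < started; i++) {
		int status;
		if (wait(&status) > 0 && WIFEXITED(status) &&
		    WEXITSTATUS(status) == 0)
			ok++;
	}
	return (ok);
}
```

Calling run_generation(2, 4, (size_t)250 << 20) in a loop from main()
would approximate the memory footprint of one generation after another,
assuming four arrays per child; the real children also compute for
about 7 minutes each, which this sketch does not model.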

Note, I've used memtest86+ to check the memory and I've used Adaptec's
low-level verification to check my hard drives.

Anyhow, I'll keep hunting for the root of the problem.

-- 
Steve