From: Steve Kargl
To: Scott Long
Cc: current@freebsd.org, Kris Kennaway
Date: Sat, 14 Jan 2006 10:52:46 -0800
Subject: Re: Extreme contigmalloc() slowness with mpt driver
Message-ID: <20060114185246.GA773@troutmask.apl.washington.edu>
In-Reply-To: <43C9188B.3050300@samsco.org>
References: <20060114052117.GA16773@xor.obsecurity.org> <20060114055347.GA45580@troutmask.apl.washington.edu> <43C9188B.3050300@samsco.org>
List-Id: Discussions about the use of FreeBSD-current

On Sat, Jan 14, 2006 at 08:28:11AM -0700, Scott Long wrote:
> Steve Kargl wrote:
> >On Sat, Jan 14, 2006 at 12:21:17AM -0500, Kris Kennaway wrote:
> >
> >>I have an amd64 machine with 16GB of RAM
> >>that takes ages to boot (~40
> >>minutes on 7.0). This is because the mpt driver takes 20 minutes to
> >>attach (with 2 instances). This in turn is because the following code
> >>from dev/mpt/mpt_pci.c:mpt_dma_mem_alloc() takes about 5 seconds to
> >>execute, and it is run 256 times in a loop:
> >>
> >>    error = bus_dmamap_create(mpt->buffer_dmat, 0, &req->dmap);
> >>
> >>When I set vm.old_contigmalloc=1, the system boots without delay.
> >>
> >>This points to a bug in contigmalloc.
> >>
> >
> >This is probably related to my recent reports of extremely
> >slow probing of fxp0. I have 12 GB on a Tyan K8S Pro and
> >fxp0 takes on the order of 7 minutes to probe.
> >
>
> Yep, that's the same reason. The issue here is that bus_dmamap_create
> is using contigmalloc to allocate bounce pages for the device. At the
> request of Soeren, I recently upped the max limit on bounce pages from
> 512 to 4096. Before that, drivers would quickly reach the max and then
> move on. Now that the max is a lot higher, I guess it points to a
> scalability problem in the page search algorithm of contigmalloc.
>

Thanks for the confirmation.

There may be more serious problems than just long boot times.
I'm seeing recurring lock-ups on my system. There is no panic
and no keyboard/network response from the system. The system
sits in my office and acts as a very expensive heater.

In my attempts to diagnose the problem, I've cleaned out all
installed ports, all old shared libraries, and all old bin/, sbin/,
usr/bin, and usr/sbin binaries. Then, I rebuilt kernel and world
and booted a fairly clean system. I have INVARIANTS/WITNESS/DDB
in my kernel.

I rebuilt multiple ports from multiple vty terms without a problem.
I fired up X11, opened several xterms, and built more ports. All
seemed fine.

Then, I rebuilt my Monte Carlo simulation code. This program
forks 2 children. Each child allocates up to 1 GB of memory,
including a few 250 MB arrays.
Each child runs for 7 minutes,
writes a few files, then exits. The parent waits on the children,
and then forks 2 more children. The parent should run for 24 to
96 hours for a complete simulation. At some point the system
locks up. This happens while X11 is running; I haven't had a
lock-up at a vty term.

Note, I've used memtest86+ to check the memory and I've used
Adaptec's low-level verification to check my hard drives.

Anyhow, I'll keep hunting for the root of the problem.

-- 
Steve
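
For anyone trying to reproduce the lock-up, the fork/allocate/wait pattern described above can be approximated with a small userland sketch. This is not the actual simulation code, which is not shown in this thread. The function names child_work() and run_generation() are hypothetical, the arithmetic and file output are left out, and the child only touches its memory and exits instead of running for 7 minutes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for one child: allocate n_arrays buffers of
 * array_bytes each and write to every byte so the pages are really
 * backed by memory.  Returns 0 on success, 1 if an allocation failed.
 */
int
child_work(size_t n_arrays, size_t array_bytes)
{
	char **arr;
	size_t i, allocated = 0;
	int rc = 0;

	arr = malloc(n_arrays * sizeof(*arr));
	if (arr == NULL)
		return (1);
	for (i = 0; i < n_arrays; i++) {
		arr[i] = malloc(array_bytes);
		if (arr[i] == NULL) {
			rc = 1;
			break;
		}
		allocated++;
		memset(arr[i], (int)(i + 1), array_bytes);
	}
	for (i = 0; i < allocated; i++)
		free(arr[i]);
	free(arr);
	return (rc);
}

/*
 * Fork n_children children, let each run child_work(), and wait for
 * all of them.  Returns the number of children that exited with 0.
 */
int
run_generation(int n_children, size_t n_arrays, size_t array_bytes)
{
	int i, forked = 0, ok = 0, status;
	pid_t pid;

	assert(n_children > 0);
	for (i = 0; i < n_children; i++) {
		pid = fork();
		if (pid < 0)
			break;		/* fork failed; wait for what we have */
		if (pid == 0)
			_exit(child_work(n_arrays, array_bytes));
		forked++;
	}
	for (i = 0; i < forked; i++) {
		if (wait(&status) > 0 && WIFEXITED(status) &&
		    WEXITSTATUS(status) == 0)
			ok++;
	}
	return (ok);
}
```

To approximate the workload in this report, a main() would call run_generation(2, 4, 250UL * 1024 * 1024) in a loop for many hours on a 64-bit machine; the count of 4 arrays is an assumption taken from "a few 250 MB arrays" and the 1 GB limit. Start with much smaller sizes to check that the harness itself behaves.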