From: Scott Long
Date: Fri, 22 Apr 2005 09:56:50 -0600
To: Daniel Eriksson
cc: 'FreeBSD Current'
Subject: Re: Serious I/O problems (bad performance and live-lock)
Message-ID: <42691EC2.7080106@samsco.org>

Daniel Eriksson wrote:
> With recent CURRENT (at least for the last 2 days, but probably longer), two
> of my systems can be brought to their knees (live-lock) with a simple "dd
> if=/dev/zero of=test bs=128k" command. I have not tested any other systems.
>
> I keep both servers synced running 6-CURRENT:
>
> Server #1: dual AthlonMP 2600+, Compaq SmartArray 5302/64 hardware raid card
> (ciss). The card hosts two arrays, one RAID-5 built from 4 discs that holds
> the system and one RAID-0 built from 14 discs. All the discs are 36GB 10krpm
> and I have one array on each channel on the card.
>
> Server #2: AthlonXP 2500+ with an old Maxtor 27GB UDMA66 disc for the
> system.
>
>
> What made me take notice was that server #2 ran through a "make
> installkernel; make installworld" faster than server #1 during a recent
> upgrade. This makes no sense given the superior I/O performance of the
> hardware scsi raid array on server #1, and I know that in the past server #1
> has finished the process ahead of server #2.
>
> After the upgrade was done I ran some simple tests with 'dd', and it only
> took ~1 minute for the system to live-lock. Breaking into DDB and killing
> the 'dd' process brought the machine back to life. I assumed the problem was
> ciss-related, CAM-related or SMP-related, but I just tried doing the same
> thing on the UP machine (server #2), and it too live-locked within a minute.
>
> Both systems use pretty much the same config, with the only major difference
> being SMP or not:
> * SCHED_4BSD, PREEMPTION, ADAPTIVE_GIANT, DEVICE_POLLING, HZ=2000
> * debug.mpsafenet="1", debug.mpsafevfs="1"

What happens if you turn off mpsafevfs?

Scott
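
For reference, a minimal sketch of the change being asked about. It assumes the two debug.* settings quoted above are boot-time tunables kept in /boot/loader.conf; the quoted syntax matches that file, but the thread does not name it, so treat the location as an assumption. Only the mpsafevfs value changes, and a reboot would be needed before repeating the dd test.

```
# /boot/loader.conf (assumed location; not stated in the thread)
debug.mpsafenet="1"   # left as in the quoted configuration
debug.mpsafevfs="0"   # turned off for the test; was "1"
```

Changing just this one setting keeps the rest of the quoted configuration identical, so a difference in the dd result would point at the MPSAFE VFS code path.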