To: "Daniel Eriksson"
In-reply-to: Your message of "Fri, 22 Apr 2005 15:13:43 +0200."
Date: Fri, 22 Apr 2005 08:33:20 -0700
From: "Kevin Oberman"
Message-Id: <20050422153320.E99665D07@ptavv.es.net>
cc: 'FreeBSD Current'
Subject: Re: Serious I/O problems (bad performance and live-lock)
List-Id: Discussions about the use of FreeBSD-current

> From: "Daniel Eriksson"
> Date: Fri, 22 Apr 2005 15:13:43 +0200
> Sender: owner-freebsd-current@freebsd.org
>
> With recent CURRENT (at least for the last 2 days, but probably longer), two
> of my systems can be brought to their knees (live-lock) with a simple "dd
> if=/dev/zero of=test bs=128k" command. I have not tested any other systems.
>
> I keep both servers synced running 6-CURRENT:
>
> Server #1: dual AthlonMP 2600+, Compaq SmartArray 5302/64 hardware raid card
> (ciss). The card hosts two arrays, one RAID-5 built from 4 discs that holds
> the system and one RAID-0 built from 14 discs. All the discs are 36GB 10krpm
> and I have one array on each channel on the card.
>
> Server #2: AthlonXP 2500+ with an old Maxtor 27GB UDMA66 disc for the
> system.
>
> What made me take notice was that server #2 ran through a "make
> installkernel; make installworld" faster than server #1 during a recent
> upgrade. This makes no sense given the superior I/O performance of the
> hardware scsi raid array on server #1, and I know that in the past server #1
> has finished the process ahead of server #2.
>
> After the upgrade was done I ran some simple tests with 'dd', and it only
> took ~1 minute for the system to live-lock. Breaking into DDB and killing
> the 'dd' process brought the machine back to life. I assumed the problem was
> ciss-related, CAM-related or SMP-related, but I just tried doing the same
> thing on the UP machine (server #2), and it too live-locked within a minute.
>
> Both systems use pretty much the same config, with the only major difference
> being SMP or not:
> * SCHED_4BSD, PREEMPTION, ADAPTIVE_GIANT, DEVICE_POLLING, HZ=2000
> * debug.mpsafenet="1", debug.mpsafevfs="1"
>
> The problem manifests itself like this:
> Shortly after 'dd' is started, the machine starts to swap.
> The swapping makes the machine very unresponsive.
> After about a minute or so the machine enters some sort of live-lock where
> the IP-stack replies to icmp echos, but nothing else can be done.
>
> The last test I did was on a system compiled from sources dated
> 2005.04.22.01.00.00 (earlier today). The oldest system I've tested is from
> 2005.04.20.14.30.00 (but I did notice the system being slightly sluggish
> earlier in the week too, so I think the problem is older than that).
>
> This is a serious regression! I don't know when I last did any testing with
> 'dd', but I'm pretty sure it was less than 3 months ago (and back then
> neither system live-locked).

I had been seeing similar problems (very painful to get anything done)
for several days, but the kernel I built from CURRENT as of
2005.04.20.16.29.00 seems to have fixed the problem for me.

-- 
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: oberman@es.net
Phone: +1 510 486-8634