Date:      Fri, 22 Apr 2005 08:33:20 -0700
From:      "Kevin Oberman" <oberman@es.net>
To:        "Daniel Eriksson" <daniel_k_eriksson@telia.com>
Cc:        'FreeBSD Current' <freebsd-current@freebsd.org>
Subject:   Re: Serious I/O problems (bad performance and live-lock) 
Message-ID:  <20050422153320.E99665D07@ptavv.es.net>
In-Reply-To: Your message of "Fri, 22 Apr 2005 15:13:43 +0200." <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAA0VcX9IoJqUaXPS8MjT1PdsKAAAAQAAAAKDq8qQ2O9UK7PKMOCt2NqwEAAAAA@telia.com> 

> From: "Daniel Eriksson" <daniel_k_eriksson@telia.com>
> Date: Fri, 22 Apr 2005 15:13:43 +0200
> Sender: owner-freebsd-current@freebsd.org
> 
> 
> With recent CURRENT (at least for the last 2 days, but probably longer), two
> of my systems can be brought to their knees (live-lock) with a simple "dd
> if=/dev/zero of=test bs=128k" command. I have not tested any other systems.
> 
> I keep both servers in sync, running 6-CURRENT:
> 
> Server #1: dual AthlonMP 2600+, Compaq SmartArray 5302/64 hardware raid card
> (ciss). The card hosts two arrays, one RAID-5 built from 4 discs that holds
> the system and one RAID-0 built from 14 discs. All the discs are 36GB 10krpm
> and I have one array on each channel on the card.
> 
> Server #2: AthlonXP 2500+ with an old Maxtor 27GB UDMA66 disc for the
> system.
> 
> 
> What made me take notice was that server #2 ran through a "make
> installkernel; make installworld" faster than server #1 during a recent
> upgrade. This makes no sense given the superior I/O performance of the
> hardware SCSI RAID array on server #1, and I know that in the past server #1
> has finished the process ahead of server #2.
> 
> After the upgrade was done I ran some simple tests with 'dd', and it only
> took ~1 minute for the system to live-lock. Breaking into DDB and killing
> the 'dd' process brought the machine back to life. I assumed the problem was
> ciss-related, CAM-related or SMP-related, but I just tried doing the same
> thing on the UP machine (server #2), and it too live-locked within a minute.
> 
> Both systems use pretty much the same config, with the only major difference
> being SMP or not:
> * SCHED_4BSD, PREEMPTION, ADAPTIVE_GIANT, DEVICE_POLLING, HZ=2000
> * debug.mpsafenet="1", debug.mpsafevfs="1"
> 
> The problem manifests itself like this:
> Shortly after 'dd' is started, the machine starts to swap.
> The swapping makes the machine very unresponsive.
> After about a minute or so the machine enters some sort of live-lock where
> the IP stack still replies to ICMP echoes, but nothing else can be done.
> 
> The last test I did was on a system compiled from sources dated
> 2005.04.22.01.00.00 (earlier today). The oldest system I've tested is from
> 2005.04.20.14.30.00 (but I did notice the system being slightly sluggish
> earlier in the week too, so I think the problem is older than that).
> 
> This is a serious regression! I don't know when I last did any testing with
> 'dd', but I'm pretty sure it was less than 3 months ago (and back then
> neither system live-locked).

I had been seeing similar problems (very painful to get anything done)
for several days, but the kernel I built from CURRENT as of
2005.04.20.16.29.00 seems to have fixed the problem for me.
-- 
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: oberman@es.net			Phone: +1 510 486-8634


