Date:      Wed, 20 Sep 2006 09:06:51 -0700 (PDT)
From:      Jeremy Chadwick <freebsd@jdc.parodius.com>
To:        FreeBSD-gnats-submit@FreeBSD.org
Subject:   i386/103435: Kernel appears somewhat deadlocked during heavy ATA I/O (post-August 4th)
Message-ID:  <20060920160651.C79AC1FA035@icarus.home.lan>
Resent-Message-ID: <200609201610.k8KGAUkJ056277@freefall.freebsd.org>

>Number:         103435
>Category:       i386
>Synopsis:       Kernel appears somewhat deadlocked during heavy ATA I/O (post-August 4th)
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-i386
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Wed Sep 20 16:10:30 GMT 2006
>Closed-Date:
>Last-Modified:
>Originator:     Jeremy Chadwick
>Release:        FreeBSD 6.2-PRERELEASE i386
>Organization:
Parodius Networking
>Environment:
System: FreeBSD icarus.home.lan 6.2-PRERELEASE FreeBSD 6.2-PRERELEASE #0: Mon Sep 18 03:38:31 PDT 2006 root@icarus.home.lan:/usr/obj/usr/src/sys/ICARUS i386
>Description:
Sometime between August 4th and September 12th, a change
went into the FreeBSD code which is breaking things badly.
In particular, I now see the following:

ad12: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=41171803
ad12: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=51392291
ad12: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=31011999
em0: watchdog timeout -- resetting
em0: link state changed to DOWN
em0: link state changed to UP
ad12: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
ad12: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=84946719

The em0 timeouts often (but not always!) happen at the same
time as the ATA timeouts:

Sep 20 08:47:42 icarus kernel: em0: watchdog timeout -- resetting
Sep 20 08:47:42 icarus kernel: em0: link state changed to DOWN
Sep 20 08:47:51 icarus kernel: em0: link state changed to UP
Sep 20 08:47:51 icarus kernel: ad12: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
Sep 20 08:47:51 icarus kernel: ad12: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=84946719
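
To see how often the two kinds of timeout coincide, the log can be
bucketed by minute.  The following is only a sketch: the function
name coinciding_minutes is hypothetical, the message patterns are
taken from the excerpts above, and the per-minute granularity is an
assumption (events a few seconds apart but on either side of a
minute boundary will be missed).

```shell
#!/bin/sh
# Read syslog-style lines on stdin and print each minute
# (e.g. "Sep 20 08:47") in which both an em watchdog timeout
# and an ATA TIMEOUT/WARNING message were logged.
# coinciding_minutes is a hypothetical helper name.
coinciding_minutes() {
    awk '
        {
            # Fields 1-3 are month, day, HH:MM:SS; keep HH:MM only.
            minute = $1 " " $2 " " substr($3, 1, 5)
        }
        /em[0-9]+: watchdog timeout/     { net[minute] = 1 }
        /ad[0-9]+: (TIMEOUT|WARNING) /   { disk[minute] = 1 }
        END {
            for (m in net)
                if (m in disk)
                    print m
        }
    '
}
```

Possible usage, assuming kernel messages are logged to the usual
/var/log/messages:

  coinciding_minutes < /var/log/messages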

The hardware in this box hasn't changed -- but because of the
ATA errors I was seeing, I decided to swap the disk out on ad12
with a completely different (and brand new) disk.  It does the
same thing you see above.  I also have a disk on ad14 (SATA port
#1), which can induce the same thing.

The controller being used is an ICH5-based controller.  There is
no RAID being used (all pure JBOD).

The motherboard is an Intel D865GLC, running the second-most-recent
BIOS.  (The latest version only fixes some VGA adapter issues.)
Hyperthreading is enabled in the BIOS, but the kernel itself is
NOT using SMP (although it DOES have the apic device enabled).

As far as IRQs go, it looks as if the ICH5 and the em0 are
sharing an IRQ.  This is bizarre, as I would expect the APIC
to pick separate IRQs for these devices:

atapci2: <Intel ICH5 SATA150 controller> port 0xe800-0xe807,0xe400-0xe403,0xe000-0xe007,0xdc00-0xdc03,0xd800-0xd80f irq 18 at device 31.2 on pci0
em0: <Intel(R) PRO/1000 Network Connection Version - 6.1.4> port 0xac00-0xac1f mem 0xff800000-0xff81ffff irq 18 at device 1.0 on pci1
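
A quick way to list every shared IRQ from the boot messages is to
group the probe lines by their "irq N" field.  This is a minimal
sketch, not a definitive tool: the function name shared_irqs is
hypothetical, and it assumes probe lines of the form shown above
("<dev>: ... irq <N> at device ...").

```shell
#!/bin/sh
# Read boot-message lines on stdin and print each IRQ that more
# than one device attached to, with the device names.
# shared_irqs is a hypothetical helper name.
shared_irqs() {
    awk '
        / irq [0-9]+ at device / {
            dev = $1
            sub(/:$/, "", dev)          # "em0:" -> "em0"
            for (i = 1; i < NF; i++)
                if ($i == "irq") { irq = $(i + 1); break }
            if (irq in devs)
                devs[irq] = devs[irq] " " dev
            else
                devs[irq] = dev
            count[irq]++
        }
        END {
            for (irq in count)
                if (count[irq] > 1)
                    print "irq " irq ": " devs[irq]
        }
    '
}
```

Possible usage:

  dmesg | shared_irqs

On FreeBSD, 'vmstat -i' additionally shows the interrupt counts per
source, which may help confirm whether the shared line is busy.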

I've also built a kernel as of the 18th (you can see the above
uname output), and it has the same problem.

>How-To-Repeat:
I can reproduce this problem easily: during heavy disk
activity, the system will "stall" as if the kernel is spending
too much time doing something (deadlocked).  The best way I've
found to do this is to pick a FreeBSD port that relies on a
lot of dependencies and do a 'make clean' over and over:

  cd /usr/ports/databases/phpmyadmin
  make clean & make clean & make clean &
  {watch above problem occur}

Pressing Control-C to interrupt running applications has no
effect while this is going on.
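
The repeated background 'make clean' runs can also be wrapped in a
small helper, which makes it easier to vary the amount of
concurrency.  This is only a sketch; run_parallel is a hypothetical
name, and the count of 3 simply mirrors the commands above.

```shell
#!/bin/sh
# Launch COUNT copies of a command in the background, then wait
# for all of them to finish.  run_parallel is a hypothetical name.
run_parallel() {
    count=$1
    shift
    i=0
    while [ "$i" -lt "$count" ]; do
        "$@" &
        i=$((i + 1))
    done
    wait
}
```

Possible usage:

  cd /usr/ports/databases/phpmyadmin
  run_parallel 3 make clean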
	
>Fix:
I haven't tried a different motherboard (I won't deny there's a
chance the MB is going bad -- hardware goes bad all the time in this
day and age), but I didn't have this problem until I built the
September 12th kernel.

I also have not tried booting without ACPI.
>Release-Note:
>Audit-Trail:
>Unformatted:


