Date:      Fri, 12 Sep 2003 13:51:56 -0700 (PDT)
From:      John Polstra <jdp@polstra.com>
To:        Mike Tancsa <mike@sentex.net>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: recent stability problems with fxp driver
Message-ID:  <XFMail.20030912135156.jdp@polstra.com>
In-Reply-To: <6.0.0.22.0.20030912134112.05891060@209.112.4.2>

On 12-Sep-2003 Mike Tancsa wrote:
> At 12:26 PM 12/09/2003, Info Account wrote:
>>I've spent the past four days or so updating machines here to 4.8/4.9-stable via
>>cvsup, and have done a complete make buildworld/kernel on each machine (some
>>SMP, some single processor).  Something seems to be broken in the latest fxp
>>driver: on every machine (different motherboards and hardware configs), heavy
>>network traffic on fxp NICs causes timeouts and random kernel panics.
> 
> I have a few boxes pushing over 50 Mb/s with fxp cards and haven't seen this
> problem.  What type of fxp cards do you have?  What does
>   pciconf -v -l
> show for the Intel types?
> 
> Also, I have found in the past that I would see this behavior if I changed
> NICs and didn't do a PCI config reset in the motherboard BIOS.  Something
> about Intel NICs and Adaptec and 3ware cards seems to particularly require
> this.  Also, make sure that you don't have any duplex mismatches on the
> NICs.  I have seen excessive errors combined with high traffic cause
> panics.
> 
> Also, please post the actual error messages on each of the machines.

The problem is real, at least on some hardware.  I had to give up on
using the two integrated fxp devices on my Dell 1550 -- which is a
real bummer, since it's a 1U box that only has two PCI slots.  With
the latest -stable driver, I couldn't fetch a 560 MB file from
another machine on the LAN using FTP without killing the fxp device.
The messages vary in detail, but this will give you the general idea:

    Sep 12 10:18:22 thin /kernel: fxp0: SCB timeout: 0x80 0x0 0x90 0x0
    Sep 12 10:18:31 thin /kernel: fxp0: SCB timeout: 0x70 0x0 0x90 0x0
    Sep 12 10:18:32 thin su: jdp to root on /dev/ttyp1
    Sep 12 10:18:39 thin /kernel: fxp0: DMA timeout
    Sep 12 10:18:39 thin last message repeated 2 times
    Sep 12 10:18:49 thin /kernel: fxp0: SCB timeout: 0x70 0x0 0x50 0x0
    Sep 12 10:18:51 thin /kernel: fxp0: SCB timeout: 0x80 0x0 0x50 0x0
    Sep 12 10:18:54 thin /kernel: fxp0: SCB timeout: 0x70 0x0 0x50 0x0
    Sep 12 10:18:56 thin /kernel: fxp0: device timeout
    Sep 12 10:18:56 thin /kernel: fxp0: DMA timeout
    Sep 12 10:19:10 thin last message repeated 5 times
    Sep 12 10:19:10 thin /kernel: fxp0: SCB timeout: 0x1 0x20 0x80 0x0
    Sep 12 10:19:13 thin /kernel: fxp0: SCB timeout: 0x70 0x0 0x50 0x0
    Sep 12 10:19:14 thin /kernel: fxp0: SCB timeout: 0x80 0x0 0x50 0x0
    Sep 12 10:19:15 thin /kernel: fxp0: SCB timeout: 0x70 0x0 0x50 0x0
    Sep 12 10:19:16 thin /kernel: fxp0: SCB timeout: 0x70 0x0 0x50 0x0
    Sep 12 10:19:36 thin /kernel: fxp0: SCB timeout: 0x80 0x0 0x50 0x0
    Sep 12 10:19:38 thin /kernel: fxp0: device timeout
    Sep 12 10:19:38 thin /kernel: fxp0: DMA timeout
    Sep 12 10:19:38 thin last message repeated 2 times
    Sep 12 10:19:52 thin /kernel: fxp0: SCB timeout: 0x80 0x0 0x50 0x0
    Sep 12 10:19:54 thin /kernel: fxp0: device timeout
    Sep 12 10:19:54 thin /kernel: fxp0: DMA timeout
    Sep 12 10:19:54 thin last message repeated 2 times
    Sep 12 10:20:00 thin /kernel: fxp0: SCB timeout: 0x80 0x0 0x50 0x0
    Sep 12 10:20:21 thin /kernel: fxp0: device timeout
    Sep 12 10:20:21 thin /kernel: fxp0: DMA timeout
    Sep 12 10:20:21 thin last message repeated 2 times
    Sep 12 10:20:29 thin /kernel: fxp0: SCB timeout: 0x70 0x0 0x50 0x0
    Sep 12 10:20:35 thin /kernel: fxp0: SCB timeout: 0x80 0x0 0x50 0x0
    Sep 12 10:20:35 thin /kernel: fxp0: SCB timeout: 0x80 0x0 0x50 0x0
    Sep 12 10:21:04 thin /kernel: fxp0: SCB timeout: 0x70 0x0 0x90 0x0
    Sep 12 10:21:09 thin /kernel: fxp0: device timeout
    Sep 12 10:21:09 thin /kernel: fxp0: DMA timeout
    Sep 12 10:21:09 thin last message repeated 2 times
    Sep 12 10:21:09 thin /kernel: fxp0: command queue timeout
    Sep 12 10:21:12 thin shutdown: reboot by jdp: 
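
A rough way to quantify a log like this is to tally each kind of fxp error, crediting syslogd's "last message repeated N times" lines to the message they stand in for. The Python sketch below is illustrative only: `summarize` is a hypothetical helper, and its patterns assume the FreeBSD 4.x `/kernel:` line format and the four message texts that appear in the excerpt above.

```python
import re
from collections import Counter

# Hypothetical patterns, based only on the messages in the excerpt above.
FXP_MSG = re.compile(
    r"/kernel: (fxp\d+): "
    r"(SCB timeout|DMA timeout|device timeout|command queue timeout)"
)
# syslogd folds consecutive duplicates into "last message repeated N times".
REPEATED = re.compile(r"last message repeated (\d+) times?")


def summarize(lines):
    """Return a Counter keyed by (interface, message kind)."""
    counts = Counter()
    last = None  # fxp message a following "repeated" line refers to, if any
    for line in lines:
        match = FXP_MSG.search(line)
        if match:
            last = (match.group(1), match.group(2))
            counts[last] += 1
            continue
        repeat = REPEATED.search(line)
        if repeat and last is not None:
            # Credit the folded duplicates to the message they repeat.
            counts[last] += int(repeat.group(1))
        else:
            # Any other line (e.g. the su entry) breaks the chain.
            last = None
    return counts
```

It accepts any iterable of lines, so an open file works as well, e.g. `summarize(open("/var/log/messages"))`, assuming the stock FreeBSD syslog.conf, which routes kernel messages to that file.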

This morning I tried regressing the driver to earlier versions in an
attempt to find the commit that broke it.  Not good news:

    RELENG_4_8_0_RELEASE        bad
    RELENG_4_7_0_RELEASE        bad
    RELENG_4_6_0_RELEASE        bad
    RELENG_4_4_0_RELEASE        bad
    RELENG_4_2_0_RELEASE        bad
    RELENG_4_1_0_RELEASE        bad

The problem is easier to reproduce in recent versions of the
driver than in older versions.  With the current -stable driver, I
can almost always kill the chips with a single transfer of that 560
MB file.  With the 4.7.0 driver, it takes about 5 transfers before
it fails.  With the 4.2.0 driver, it took 15+ transfers.
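
Stepping back through release tags is a coarse, manual form of bisection: had a good tag turned up, a binary search over the tags (and then over the commits between them) would narrow the regression in a logarithmic number of builds. A minimal Python sketch, assuming a hypothetical `is_bad(tag)` callback that builds the driver at that tag and runs the FTP transfer test, and assuming the bug, once present, stays present. Because older drivers only fail after several transfers, such an `is_bad` would have to repeat the transfer well past those counts before calling a tag good.

```python
from typing import Callable, Optional, Sequence


def first_bad(tags: Sequence[str], is_bad: Callable[[str], bool]) -> Optional[str]:
    """Binary-search tags (ordered oldest to newest) for the earliest bad one.

    Assumes monotonicity: every tag after the first bad one is also bad.
    Returns None if the newest tag is good. If the result is tags[0], the
    range contains no known-good baseline, so no regression point was found.
    """
    if not tags or not is_bad(tags[-1]):
        return None
    lo, hi = 0, len(tags) - 1  # invariant: tags[hi] is bad, tags[:lo] are good
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(tags[mid]):
            hi = mid
        else:
            lo = mid + 1
    return tags[lo]
```

With every tag reporting bad, as in the table above, the search simply returns the oldest tag, RELENG_4_1_0_RELEASE: there is no good version in range to bisect against.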

The devices are Intel 82559 chips.  Here's their pciconf output:

none0@pci0:1:0: class=0x020000 card=0x00da1028 chip=0x12298086 rev=0x08 hdr=0x00
    vendor   = 'Intel Corporation'
    device   = '82557/8/9 EtherExpress PRO/100(B) Ethernet Adapter'
    class    = network
    subclass = ethernet
none1@pci0:2:0: class=0x020000 card=0x00da1028 chip=0x12298086 rev=0x08 hdr=0x00
    vendor   = 'Intel Corporation'
    device   = '82557/8/9 EtherExpress PRO/100(B) Ethernet Adapter'
    class    = network
    subclass = ethernet
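
The `chip` and `card` values in that output each pack two 16-bit PCI IDs: pciconf prints the (sub)device ID in the upper 16 bits and the (sub)vendor ID in the lower 16. The Python sketch below splits them; `decode_pciconf_line` is an illustrative name, not part of any FreeBSD tool, and the vendor-name table covers only the two vendors that appear here.

```python
# PCI vendor IDs that appear in the output above.
VENDOR_NAMES = {0x8086: "Intel", 0x1028: "Dell"}


def split_id(value):
    """Split a packed 32-bit ID into (upper 16 bits, lower 16 bits)."""
    return (value >> 16) & 0xFFFF, value & 0xFFFF


def decode_pciconf_line(line):
    """Decode one 'pciconf -l' summary line into its component IDs."""
    selector, *fields = line.split()
    # int(..., 16) accepts the "0x" prefix pciconf prints.
    values = {key: int(val, 16) for key, val in (f.split("=", 1) for f in fields)}
    device_id, vendor_id = split_id(values["chip"])        # device<<16 | vendor
    subdevice_id, subvendor_id = split_id(values["card"])  # subdevice<<16 | subvendor
    return {
        "selector": selector.rstrip(":"),
        "vendor": VENDOR_NAMES.get(vendor_id, hex(vendor_id)),
        "vendor_id": vendor_id,
        "device_id": device_id,
        "subvendor": VENDOR_NAMES.get(subvendor_id, hex(subvendor_id)),
        "subvendor_id": subvendor_id,
        "subdevice_id": subdevice_id,
        "revision": values["rev"],
    }
```

For the none0 line this gives vendor Intel (0x8086), device 0x1229, subvendor Dell (0x1028), subdevice 0x00da, revision 0x08: Intel silicon on a Dell-specific subsystem, i.e. Dell's integrated implementation rather than a retail Intel card.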

Maybe the problem really is in the Dell 1550.  I have various flavors
of fxp card in several other machines, and I never have trouble with
them.  I did check my firmware and BIOS versions, though, and they're
fully up to date.  I have a suspicion that our driver may not be
dealing properly with Dell's power management or IPMI stuff, but it's
just a vague suspicion without any real evidence.

John


