Date:      Tue, 07 Nov 2006 14:43:16 -0700
From:      Scott Long <scottl@samsco.org>
To:        Clayton Milos <clay@milos.co.za>
Cc:        freebsd-stable@freebsd.org, Jack Vogel <jfvogel@gmail.com>, ke han <ke.han@redstarling.com>
Subject:   Re: em driver testing
Message-ID:  <4550FDF4.90908@samsco.org>
In-Reply-To: <001601c702af$9d355940$9603a8c0@claylaptop>
References:  <68011C68-0962-4946-88E1-F36EE7C707DA@redstarling.com><20061106221219.GA66676@hugo10.ka.punkt.de><041201c701f9$37b2aed0$9603a8c0@claylaptop>	<2a41acea0611061614n478efe77y82c0ebc2e1b01e19@mail.gmail.com> <001601c702af$9d355940$9603a8c0@claylaptop>

We've basically identified problems with the way that watchdogs are
handled.  It is very fragile and sensitive to timing, so it's not
surprising that adjusting the timing in one driver will affect
another driver.  The solution is to push the timeout/watchdog
logic entirely into the NIC drivers, like we did for if_em.  That
will take some time, and I doubt that xl specifically will get fixed
for 6.2.

Scott


Clayton Milos wrote:
> Hi Jack
> 
> 
> I patched the driver and re-compiled the kernel and userland.
> 
> All appears well with the em driver now. No more errors on it.
> I am getting watchdog timeouts on the xl driver now though. It was
> happening before at the same time as the em ones. Now I've passed a lot
> of traffic on the em interface but the xl interface gets watchdog
> errors. The em interface still works fine but the xl one is not usable
> after this.
> 
> The motherboard has 2 onboard xl's and I am using one for a live IP
> and the other one is doing nothing. It is a server motherboard with an
> AMD762 north bridge. It has 64-bit 66MHz PCI slots, one of which has the em
> card in it. The em card is a 32-bit 33MHz PCI card though.
> 
> Here's what the xl card is with pciconf -lhv
> xl0@pci0:15:0:  class=0x020000 card=0x246210f1 chip=0x980010b7 rev=0x78 hdr=0x00
>    vendor   = '3COM Corp, Networking Division'
>    device   = '3C980-TX Fast EtherLink XL Server Adapter2'
> 
> The em card is such:
> em0@pci0:9:0:   class=0x020000 card=0x13768086 chip=0x107c8086 rev=0x05 hdr=0x00
>    vendor   = 'Intel Corporation'
>    device   = 'PRO/1000 GT'
> 
> 
> Any help would be greatly appreciated.
> 
> 
> Clay
> 
> 
> 
> ----- Original Message -----
> From: "Jack Vogel" <jfvogel@gmail.com>
> To: "Clayton Milos" <clay@milos.co.za>
> Cc: <freebsd-stable@freebsd.org>; "ke han" <ke.han@redstarling.com>
> Sent: Tuesday, November 07, 2006 2:14 AM
> Subject: Re: em driver testing
> 
> 
>> Well, so run 6.2 BETA3 plus the patch I posted as Patrick
>> mentioned and then report on that. You've got a lot of
>> potential problem areas here, I have no experience with
>> samba on FreeBSD. And that motherboard only has PCI
>> as I recall, yes? Still, it should get rid of the watchdogs
>> unless you have real hardware issues.
>>
>> Good luck,
>>
>> Jack
>>
>>
>> On 11/6/06, Clayton Milos <clay@milos.co.za> wrote:
>>
>>> Hi there
>>>
>>> I am having similar issues. Running 6.1-RELEASE.
>>>
>>> I'm using the box as a samba server with pure-ftpd on it too with 2.5T of
>>> raid storage in it. The box is running the generic MP kernel on a Tyan
>>> Thunder K7 with the latest bios v2.14 and dual AthlonMP's. ECC Reg ram that
>>> passed all tests with memtest.
>>>
>>> When I pull a few concurrent files over samba or if I pull a big file (say
>>> 2-3G) over ftp to my laptop it runs at 30MB/sec but usually locks up the box
>>> with watchdog timeout on the em interface. Usually it pops up with timeouts
>>> on the xl interface at the same time and after a few seconds on the ahc
>>> (onboard adaptec scsi) interface and I have to hard boot the box to get it
>>> back to life.
>>>
>>> I've tried the same box with a 3com 3C996B-T NIC which has a Broadcom
>>> BCM5701TKHB chipset on it. It crashes within minutes with no traffic on the
>>> interface. In fact the interface will accept an IP address but times out
>>> pinging anything on the LAN.
>>>
>>> If a kernel developer would like access to the box to check it out please
>>> mail me.
>>>
>>> Regards
>>>
>>> Clay
>>>
>>>
>>> > Hello!
>>> >
>>> > On Tue, Nov 07, 2006 at 04:55:50AM +0800, ke han wrote:
>>> >
>>> >> I have a Sun X4100 which uses Intel ethernet.  I would like to
>>> >> install amd64 6.2beta3 on this server and put it through some tests.
>>> >> But I have no idea what tests to run or how to run them.
>>> >> Can someone provide some pointers?  I am happy to post my findings.
>>> >
>>> > Put some CPU load on the machine, e.g. by running
>>> >
>>> > cd /usr/src
>>> > sh
>>> > while true
>>> > do
>>> > make -j4 buildworld
>>> > done >mk.log
>>> >
>>> > on one terminal and then transfer some data to the system, e.g.
>>> > by fetch(1)ing via FTP from another box connected to the same
>>> > LAN. On all systems I have, there is no need to saturate the
>>> > Gbit-Link. 100 Mbit/s local connection will trigger the problem, too.
>>> >
>>> > If the problem exists on your system, you will see emN - watchdog timeout
>>> > messages on the console and in /var/log/messages, followed by a
>>> > reset of the interface and a short and recoverable, but complete,
>>> > loss of connectivity. A couple of seconds, maybe. This is enough
>>> > to frustrate people, who e.g. run large backup jobs over a single
>>> > TCP connection that takes a couple of hours to complete - the interface
>>> > reset aborts the backup :-/
>>> >
>>> > I must say that it seems to me, these guys are putting a hell of
>>> > a lot of effort into this problem and "we" are making progress.
>>> > Things look quite good to me for 6.2-RELEASE.
>>> >
>>> > HTH,
>>> > Patrick
>>> > --
>>> > punkt.de GmbH         Internet - Dienstleistungen - Beratung
>>> > Vorholzstr. 25        Tel. 0721 9109 -0 Fax: -100
>>> > 76137 Karlsruhe       http://punkt.de
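
As a rough aid for the test described in the thread, the watchdog timeout
messages in /var/log/messages can be tallied per interface. The sketch below
is not from any of the posters. The function name count_watchdog_timeouts is
hypothetical, and it assumes each relevant log line contains the interface
name followed by ": watchdog timeout" (for example "em0: watchdog timeout").
The exact wording may differ between drivers and releases, so adjust the
pattern to what your console actually shows.

```shell
#!/bin/sh
# Tally "watchdog timeout" log lines per network interface.
# Assumption: a matching line contains "<driver><unit>: watchdog timeout",
# e.g. "em0: watchdog timeout"; this pattern is a guess at the log format.
# Reads log text on stdin and prints "<interface> <count>" lines, sorted.
count_watchdog_timeouts() {
    awk '
        match($0, /[a-z]+[0-9]+: watchdog timeout/) {
            # Keep only the interface name in front of the first colon.
            name = substr($0, RSTART, RLENGTH)
            sub(/:.*/, "", name)
            count[name]++
        }
        END { for (n in count) print n, count[n] }
    ' | sort
}
```

It could be run after a load test as
"count_watchdog_timeouts < /var/log/messages". No output means no matching
lines were found, which only rules out messages in the assumed format.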


